Solving problems with math and computers is my favourite kind of work. In the early days of my career, I aspired to be a good software engineer, and I pursued that goal passionately until I gradually transitioned to a research career. While I write less code as a researcher than I did as a software engineer, I emphasize good software engineering practices and open-source my tools under a permissive license. In the beginning (2012-2016) I wrote much of my code in Java/Groovy/Scala, but in recent years (2016-now) Python has become my go-to choice. I have released several tools on PyPI.
I sometimes participate in Stack Overflow Q&A threads.
Here are some selected projects:
RTG: Reader Translator Generator
Neural Machine Translation Toolkit.
- Code: github.com/isi-nlp/rtg-xt
- Docs: isi-nlp.github.io/rtg/
- Installer: pypi.org/project/rtg/
MTData: Machine Translation Data
A tool that locates, downloads, and prepares parallel data for machine translation from many data sources.
- Code: github.com/thammegowda/mtdata
- Installer+Docs: pypi.org/project/mtdata/
NLCodec: Natural Language CoDec
A library for encoding and decoding natural language text with word, character, and byte-pair-encoding (BPE) schemes.
- Code: github.com/isi-nlp/nlcodec/
- Installer+Docs: pypi.org/project/nlcodec/
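For readers unfamiliar with byte-pair encoding, here is a minimal sketch of the idea in plain Python. It is not NLCodec's API: the function names `learn_bpe`, `encode`, and `decode`, and the `</w>` end-of-word marker, are assumptions made purely for illustration.

```python
from collections import Counter

END = "</w>"  # hypothetical end-of-word marker, chosen for this sketch


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merges by repeatedly joining the most frequent adjacent pair."""
    # Start from characters, with an end-of-word marker appended to each word.
    vocab = Counter(tuple(w) + (END,) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Segment a word by applying the learned merges in the order they were learned."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


def decode(symbols):
    """Join the symbols and strip the end-of-word marker."""
    return "".join(symbols).replace(END, "")


if __name__ == "__main__":
    merges = learn_bpe(["ab", "ab", "abc"], num_merges=2)
    print(merges)
    print(encode("abc", merges))
```

Frequent character sequences collapse into single symbols, while rare words stay split into smaller pieces, which is what keeps the vocabulary small.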
awkg: Python awk
An awk-like line-processing tool with Python as the scripting language.
- Code: github.com/thammegowda/awkg
- Installer+Docs: pypi.org/project/awkg/
VirtChar: Virtual Characters
Dialog systems that imitate characters from the popular TV show F.R.I.E.N.D.S.
- Code: github.com/thammegowda/virtchar
- Dataset: github.com/thammegowda/dialog-data
- Report and Presentation
JunkDetect: Junk Detector
A tool that classifies text as junk or not junk, with support for 100 languages.
- Code: github.com/thammegowda/junkdetect
- Installer+Docs: pypi.org/project/junkdetect/
Sparkler: Spark Crawler
A large-scale web crawler on Apache Spark, with an Apache Solr backend for the crawler database.
Auto Extractor
A tool that clusters HTML web pages by DOM structure and CSS style similarity.
- Code: github.com/USCDataScience/autoextractor
- Docs: github.com/USCDataScience/autoextractor/wiki
- Paper: ieeexplore.ieee.org/abstract/document/7785739
Supervising UI
A simple web UI for labelling images to be used for image recognition.
More Tools
- CoreNLP + Apache Tika: github.com/thammegowda/tika-ner-corenlp
  - Contributed to Apache Tika: TikaAndNER
- Keras model deployment on the JVM using Deeplearning4j: github.com/USCDataScience/dl4j-kerasimport-examples
  - Contributed to Apache Tika: PR #125
- TensorFlow model deployment on the JVM using gRPC: github.com/thammegowda/tensorflow-grpc-java
- Large-scale image recognition using Apache Spark: github.com/thammegowda/tika-dl4j-spark-imgrec
- Document similarity using Apache Spark and Solr: github.com/thammegowda/solr-similarity
- Keyboard layout map of OSX for Kannada (my native language): github.com/thammegowda/kannada-osx-keylayout
