Sequence Transduction: Generalization and Challenges

Sequence-to-sequence transduction, e.g. neural machine translation (NMT), is a general problem: transforming a sequence of symbols into another sequence of symbols, where both input and output can have varying lengths. Challenges:

- Sequential information: the order of input and output symbols is important.
- Variable-length sequences with long-term dependencies: sequences can be extremely long, and symbols may contain dependencies across the sequence. E.g. consider the text in a book, and dependencies across chapters.
- Unbounded vocabulary, e.g. the vocabulary of natural languages.
- Imbalanced distribution: some symbols appear frequently while others appear rarely. E.g. the distribution of types in natural languages. ...
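The imbalanced-distribution point can be illustrated with a minimal sketch (a toy corpus invented here for illustration, not data from the post): counting word types shows a few types dominating while most occur only once, the Zipf-like skew typical of natural language.

```python
from collections import Counter

# Toy corpus; real natural-language corpora show the same Zipf-like skew,
# only far more extreme.
corpus = "the cat sat on the mat and the dog sat on the rug".split()

counts = Counter(corpus)

# A few frequent types ("the", "sat", "on") account for most tokens,
# while the majority of types ("cat", "mat", "dog", ...) appear once.
for word, freq in counts.most_common():
    print(word, freq)
```

Metrics that average over tokens are dominated by the frequent types; macro-averaging over types (as discussed in the post below) gives the rare types equal weight.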

May 4, 2021 · 2 min · Thamme Gowda

Many-to-English Machine Translation Tools, Data, and Pretrained Models


April 25, 2021 · 3 min · Thamme Gowda

Macro-Average: Rare Types Are Important Too


March 11, 2021 · 8 min · Thamme Gowda

Finding the Optimal Vocabulary for Neural Machine Translation


November 1, 2020 · 2 min · Thamme Gowda