Sequence Transduction: Generalization and Challenges

Sequence-to-sequence transduction, e.g. neural machine translation (NMT), is a general problem: transforming a sequence of symbols into another sequence of symbols, where both input and output can have varying lengths. Challenges:

- Sequential information: the order of input and output symbols is important.
- Variable-length sequences with long-term dependencies: sequences can be extremely long, and symbols may contain dependencies across the sequence. E.g. consider the text in a book, and dependencies across chapters.
- Unbounded vocabulary, e.g. the vocabulary of natural languages.
- Imbalanced distribution: some symbols appear frequently while others appear rarely. E.g. the distribution of types in natural languages. ...
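The imbalanced-distribution point can be illustrated with a minimal sketch (a toy corpus invented here for illustration, not data from the post): counting word types shows a few types dominating while most occur only once, the Zipf-like skew typical of natural language.

```python
from collections import Counter

# Toy corpus; real natural-language corpora show the same Zipf-like skew,
# only far more extreme.
corpus = "the cat sat on the mat and the dog sat on the rug".split()

counts = Counter(corpus)

# A few frequent types ("the", "sat", "on") account for most tokens,
# while the majority of types ("cat", "mat", "dog", ...) appear once.
for word, freq in counts.most_common():
    print(word, freq)
```

Metrics that average over tokens are dominated by the frequent types; macro-averaging over types (as discussed in the post below) gives the rare types equal weight.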

May 4, 2021 · 2 min · Thamme Gowda

Many-to-English Machine Translation Tools, Data, and Pretrained Models


April 25, 2021 · 3 min · Thamme Gowda

Macro-Average: Rare Types Are Important Too


March 11, 2021 · 8 min · Thamme Gowda

Finding the Optimal Vocabulary for Neural Machine Translation


November 1, 2020 · 2 min · Thamme Gowda