MTData Search

⬇Save IDs
Dataset IDs are of the form: Group-name-version-lang1-lang2
Regular expressions (JavaScript flavored) are welcome! Examples:
  • .*crawl matches both commoncrawl and paracrawl
  • -eng$|-eng- matches all English datasets without a country code
  • -eng(_US)?$|-eng(_US)?- matches all English datasets with either no country code or the US country code
  • -eng(_[A-Z]{2})?$|-eng(_[A-Z]{2})?- matches all English datasets, regardless of country code
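To try the patterns above outside this page, here is a minimal sketch in JavaScript. It assumes a query is applied as an unanchored, case-sensitive RegExp search over each ID, which is what the example patterns suggest but is not confirmed here. The dataset IDs and the filterIds helper are hypothetical, made up for illustration.

```javascript
// Hypothetical dataset IDs in the Group-name-version-lang1-lang2 form
// (made up for illustration, not taken from the real index).
const datasetIds = [
  "ExampleGroup-commoncrawl-1-deu-eng",
  "ExampleGroup-paracrawl-9-eng-fra",
  "ExampleGroup-news-1-eng_US-spa",
  "ExampleGroup-news-2-deu-eng_GB",
  "ExampleGroup-talks-1-fra-deu",
];

// Hypothetical helper: keep the IDs in which the pattern matches anywhere.
// Assumption: the search is unanchored and case-sensitive.
function filterIds(ids, pattern) {
  const regex = new RegExp(pattern);
  return ids.filter((id) => regex.test(id));
}

const crawl = filterIds(datasetIds, ".*crawl");
const engNoCountry = filterIds(datasetIds, "-eng$|-eng-");
const engUsOrPlain = filterIds(datasetIds, "-eng(_US)?$|-eng(_US)?-");
const engAnyCountry = filterIds(datasetIds, "-eng(_[A-Z]{2})?$|-eng(_[A-Z]{2})?-");

console.log({ crawl, engNoCountry, engUsOrPlain, engAnyCountry });
```

Because (_US)? is optional, the third pattern also keeps IDs that carry no country code; only the eng_GB entry is excluded by it in this sample.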
Click "Save IDs" to download the matching IDs as a text file.
All search queries run locally in your browser.