Written by Steven Bussey
on February 20, 2020

A corpus (pl. corpora) is a large body of machine-readable text used for research purposes.

Corpora are either monolingual or multilingual. They often include annotations, such as part-of-speech tags or the alignment of corresponding segments across languages. Some corpora are kept private by their owners, while others are freely available to everyone. Large translation memories can also serve as multilingual corpora.
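To make the idea of annotation and alignment concrete, here is a minimal sketch of how one aligned segment of a bilingual corpus might be represented. The class name `AlignedSegment`, its fields, the tag labels, and the two sample sentences are all assumptions made for illustration; real corpora and translation memories use their own formats.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AlignedSegment:
    """One translation unit: a source sentence paired with its translation.

    Hypothetical structure for illustration, not a standard corpus format.
    """
    source: str
    target: str
    # Optional (token, part-of-speech tag) pairs for the source side.
    source_pos: List[Tuple[str, str]] = field(default_factory=list)


# Tiny hand-made English-French corpus, for illustration only.
corpus = [
    AlignedSegment(
        "The cat sleeps.",
        "Le chat dort.",
        [("The", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    ),
    AlignedSegment(
        "The dog barks.",
        "Le chien aboie.",
        [("The", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    ),
]


def nouns(segments):
    """Collect every source-side token tagged as a noun."""
    return [tok for seg in segments for tok, tag in seg.source_pos if tag == "NOUN"]
```

Calling `nouns(corpus)` on this sample returns the source tokens tagged `NOUN`, which shows how part-of-speech annotation lets a researcher query a corpus by grammatical category rather than by raw string.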

Research on monolingual corpora supports language teaching, voice recognition, and terminology mining. Bilingual corpora are fundamental to training statistical machine translation engines.
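As a rough illustration of terminology mining, the sketch below counts two-word sequences that recur in a monolingual corpus and treats the frequent ones as candidate terms. The function name `candidate_terms`, the small stopword list, and the sample sentences are assumptions for this example; production term extractors typically add linguistic filtering and statistical scoring on top of raw counts.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, far from complete).
STOPWORDS = {"the", "a", "an", "of", "is", "in", "and", "to", "for", "are", "on"}


def candidate_terms(texts, min_count=2):
    """Return (bigram, count) pairs that occur at least min_count times.

    Bigrams containing a stopword are skipped. Each text is tokenized
    separately, so no bigram spans two texts.
    """
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        for first, second in zip(tokens, tokens[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[f"{first} {second}"] += 1
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


# Hand-made sample sentences, for illustration only.
sample = [
    "Machine translation relies on a translation memory.",
    "A translation memory stores segments for machine translation.",
]
```

With this sample, `candidate_terms(sample)` keeps only "machine translation" and "translation memory", each seen twice, and drops the one-off bigrams.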

Some of the largest freely available English corpora can be found online here.
