Corpus


Written by Steven Bussey
on February 20, 2020

A corpus (pl. corpora) is a large body of machine-readable text used for research purposes.

Corpora are either monolingual or multilingual. They often include annotations such as part-of-speech tags or the alignment of segments across languages. Some corpora are kept private by their owners, while others are freely available to everyone. Large translation memories can be used as multilingual corpora.
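As an illustration of what aligned segments with part-of-speech information can look like in practice, here is a minimal Python sketch. The class name `AlignedSegment`, the tag labels, and the toy English-French sentences are assumptions made up for this example; real corpora and translation memories use their own formats.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlignedSegment:
    """One aligned unit: a source segment paired with its translation."""
    source: str
    target: str
    # Optional part-of-speech tags for the source tokens (illustrative labels).
    source_pos: List[str] = field(default_factory=list)


# Toy English-French data, invented purely for illustration.
corpus = [
    AlignedSegment("The cat sleeps.", "Le chat dort.", ["DET", "NOUN", "VERB"]),
    AlignedSegment("The dog barks.", "Le chien aboie.", ["DET", "NOUN", "VERB"]),
]


def find_translations(segments, word):
    """Return target segments whose source segment contains `word` as a whole token."""
    needle = word.lower()
    return [
        seg.target
        for seg in segments
        if needle in re.findall(r"\w+", seg.source.lower())
    ]


print(find_translations(corpus, "cat"))
```

Looking up a word this way is roughly how a translation memory behaves when it is queried as a bilingual corpus: the source side is searched and the aligned target side is returned.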

Research on monolingual corpora is used in language teaching, voice recognition, and terminology mining. Bilingual corpora are fundamental to training statistical machine translation engines.
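Terminology mining can be sketched very roughly as counting which content words recur across a monolingual corpus. The Python example below is a simplified assumption of how such a first pass might look: the stopword list, the function name `candidate_terms`, and the sample sentences are all invented here, and real term extraction tools use far more sophisticated methods.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "of", "is", "in", "and", "to", "are"}


def candidate_terms(texts, top_n=3):
    """Count non-stopword tokens across all texts and return the most frequent."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


# Toy monolingual corpus, invented purely for illustration.
docs = [
    "The translation memory stores segments.",
    "A translation memory is a corpus of aligned segments.",
    "Segments in the memory are reused.",
]

for term, freq in candidate_terms(docs, top_n=2):
    print(term, freq)
```

Frequency alone surfaces many words that are not real terms, so in practice the candidates would be reviewed by a terminologist or filtered further.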

Some of the largest freely available English corpora can be found online here.

