Harman Patil (Editor)

Lancaster Oslo Bergen Corpus


The Lancaster-Oslo/Bergen Corpus (often abbreviated as LOB Corpus) is a million-word collection of British English texts compiled in the 1970s as a collaboration between the University of Lancaster, the University of Oslo, and the Norwegian Computing Centre for the Humanities, Bergen. It was created to provide a British counterpart to the Brown Corpus, which Henry Kučera and W. Nelson Francis compiled for American English in the 1960s.

Its composition was designed to match the original Brown Corpus as closely as possible in size and genres, using documents published in the UK by British authors. Both corpora consist of 500 samples of about 2,000 words each, which gives the total of roughly one million words, spread across a shared set of genres.


The corpus has also been tagged, i.e. a part-of-speech category has been assigned to every word.
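To illustrate what a tagged corpus makes possible, the following is a minimal Python sketch that reads word/tag pairs and counts the categories. The `word_TAG` layout, the sample sentence, and the tag names (`DET`, `NOUN`, `VERB`, `ADJ`) are assumptions chosen for illustration; they are not taken from the LOB tagset or its distribution files.

```python
from collections import Counter

# Hypothetical tagged sentence. The word_TAG layout and the tag names
# are illustrative assumptions, not the actual LOB tagset.
TAGGED = "the_DET corpus_NOUN contains_VERB British_ADJ texts_NOUN"


def parse_tagged(line):
    """Split a line of word_TAG tokens into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # rpartition splits on the last underscore, so a word that
        # itself contains an underscore is kept intact.
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs


pairs = parse_tagged(TAGGED)

# Count how often each part-of-speech category occurs.
tag_counts = Counter(tag for _, tag in pairs)

print(pairs)
print(tag_counts.most_common())
```

With every word carrying a category, questions such as "how frequent are nouns in a given genre" reduce to simple counting over the pairs.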
