Synchronic English Web Corpus
This corpus consists of 467,713,650 words (tokens) from web-extracted texts.
It covers the period 2000-2010 and is split into the sub-corpora below.
339,907,995 words from 100,000 randomly selected web pages, forming a sample of the distribution of texts across the web.
127,805,655 words from 56,000 pages selected according to the Open Directory classification of web pages: 14 domains of 4,000 pages each.
|Kids and Teens||9,776,391 words|