The WebCorp Linguist's Search Engine is a tool for the study of language on the web. The corpora below were built by crawling the web and extracting textual content from web pages. Searches can find words or phrases and support pattern matching, wildcards and part-of-speech constraints. Results are given as concordance lines in KWIC (key word in context) format. Post-search analyses include time series, collocation tables, sorting and summaries of metadata from the matched web pages.
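
For readers unfamiliar with KWIC output, the following is a minimal sketch of how concordance lines can be produced from plain text. It is not WebCorp's implementation: the function name `kwic`, the `width` parameter and the sample sentence are all hypothetical, chosen only to illustrate the format, in which each match is centred with a fixed amount of context on either side.

```python
import re


def kwic(text, keyword, width=30):
    """Return KWIC lines for every whole-word match of `keyword`.

    Illustrative only: `width` is the number of context characters
    shown on each side of the match (an assumed parameter).
    """
    # Collapse runs of whitespace so line breaks do not disturb alignment.
    text = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} {match.group(0)} {right}")
    return lines


if __name__ == "__main__":
    sample = "The cat sat on the mat. A cat is a small animal."
    for line in kwic(sample, "cat", width=10):
        print(line)
```

Because the left context is padded to a fixed width, the matched word lines up vertically across results, which is what makes patterns to the left and right of the keyword easy to scan and sort.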

Synchronic English Web Corpus
A 470-million-word corpus built from web-extracted texts, including a randomly selected 'mini-web' and high-level subject classification.
Diachronic English Web Corpus
A 130-million-word corpus randomly selected from a larger collection and balanced to contain the same number of words per month.
Birmingham Blog Corpus
A 630-million-word corpus built from blogging websites, including a 180-million-word sub-section separated into posts and comments.
Anglo-Norman Correspondence Corpus
A corpus of approximately 150 personal letters written by users of Anglo-Norman, with bespoke part-of-speech annotation.
Novels of Charles Dickens
A searchable collection of the novels of Charles Dickens. Results can be visualised across chapters and novels.