The WebCorp Linguist's Search Engine is a tool for the study of language on the web. The corpora below were built by crawling the web and extracting textual content from web pages. Searches can find words or phrases and support pattern matching, wildcards and part-of-speech filtering. Results are given as concordance lines in KWIC (key word in context) format. Post-search analyses include time series, collocation tables, sorting and summaries of metadata from the matched web pages.
|470 million word corpus built from web-extracted texts, including a randomly selected 'mini-web' and high-level subject classification.||Search ► About|
|130 million word corpus randomly selected from a larger collection and balanced to contain the same number of words per month.||Search ► About|
|630 million word corpus built from blogging websites, including a 180 million word sub-section separated into posts and comments.||Search ► About|
|A corpus of approximately 150 personal letters written by users of Anglo-Norman, with bespoke part-of-speech annotation.||Search ► About|
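
To illustrate the KWIC format mentioned above, the following is a minimal sketch of how concordance lines can be produced from plain text. It is not WebCorp's implementation: the function name `kwic`, the `width` parameter and the bracketed keyword display are assumptions made for this example only.

```python
import re


def kwic(text, keyword, width=30):
    """Return concordance lines with the keyword aligned in one column.

    Hypothetical helper for illustration; `width` is the number of
    characters of context kept on each side of the match.
    """
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines


if __name__ == "__main__":
    sample = "The corpus is large. A corpus of letters."
    for line in kwic(sample, "corpus", width=10):
        print(line)
```

Because the left context is padded to a fixed width, the matched word lines up vertically across results, which is what makes sorting by left or right context easy to read.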