WebCorp Linguist's Search Engine (WebCorp LSE) is a search engine tailored specifically to the study of language on the web.
Commercial search engines are inadequate for studying language on the web, so a specialised search engine designed around linguists' needs was required. The original WebCorp (now WebCorp Live) uses commercial search engines to extract results from the web and organises the information for linguistic study. To overcome the limitations of that approach, we developed a fully tailored linguistic search engine: WebCorp LSE.
WebCorp LSE is powered by our own search engine, developed at Birmingham City University. Our specially designed web crawler, parser, tokeniser, indexer and other components allow us to cache and process large sections of the web. The new architecture has allowed us to enhance the sentence boundary detection, date identification, 'junk' (or 'boilerplate') removal, collocation and other statistical analysis options currently available in WebCorp Live. Additional pre-processing includes grammatical tagging and language detection. The search interface supports queries for words and phrases, including wildcards and pattern matching.
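To make the tokenise, index and query stages more concrete, here is a minimal Python sketch. It is not WebCorp LSE's implementation: the function names (`tokenise`, `build_index`, `wildcard_search`, `collocates`), the simple regular-expression tokeniser, the shell-style wildcard syntax and the fixed collocation window are all assumptions made purely for illustration.

```python
import re
from collections import Counter, defaultdict
from fnmatch import fnmatchcase


def tokenise(text):
    """Lowercase the text and split it into word tokens (illustrative rule only)."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def build_index(docs):
    """Map each token to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos, token in enumerate(tokenise(text)):
            index[token].append((doc_id, pos))
    return index


def wildcard_search(index, pattern):
    """Return the indexed tokens that match a shell-style pattern such as 'corp*'."""
    return sorted(t for t in index if fnmatchcase(t, pattern))


def collocates(docs, index, node, span=2):
    """Count the tokens found within `span` positions either side of `node`."""
    tokens = [tokenise(d) for d in docs]
    counts = Counter()
    for doc_id, pos in index.get(node, []):
        start = max(0, pos - span)
        window = tokens[doc_id][start:pos + span + 1]
        # Skip the node word itself; count everything else in the window.
        counts.update(t for i, t in enumerate(window, start=start) if i != pos)
    return counts


# Hypothetical three-document "corpus" standing in for cached web pages.
docs = [
    "The corpus is large.",
    "Corpora are searched by linguists.",
    "A large corpus helps linguists.",
]
index = build_index(docs)
print(wildcard_search(index, "corp*"))                # ['corpora', 'corpus']
print(collocates(docs, index, "corpus", span=2)["large"])  # 2
```

A positional index of this kind is what lets phrase queries and collocation windows be answered without rescanning the cached text; a real system would add the sentence boundary detection, boilerplate removal and grammatical tagging described above before indexing.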
WebCorp LSE is currently being tested by members of the corpus linguistics community. If you have an interest in language and would like to help us test the search engine and provide feedback, please email rdues @ bcu.ac.uk.
WebCorp LSE is being developed and operated by the Research and Development Unit for English Studies (RDUES) in the School of English at Birmingham City University.