Users Test Corpus
Made up of domains built from social bookmarking websites in English, French and German, together with Guardian articles downloaded from the Guardian website (2006 only).
Old version of the search engine software.
Multi-Domain Test Corpus
Made up of domains built from social bookmarking websites in English, French and German, together with Guardian articles downloaded from the Guardian website (Jan 2000 - April 2007).
Old version of the search engine software.
French Newspaper Corpus
Made up of three French newspapers (La Dépêche, L'Humanité and Le Monde), each broken up into six domains (culture, editorial, finance, international news, national news and sport). Retrieved in 2002 and 2003.
New version of the search engine software.
Works of Thomas Carlyle
Old version of the search engine software.
Science Fiction Corpus
Old version of the search engine software.
Charles Dickens Novels
Made up of novels written by Charles Dickens, downloaded from Project Gutenberg.
New version of the search engine software. Some features disabled because they don't apply to this corpus. Extra Book / Chapter graphs.
Newspaper Corpus (tagged)
Independent and Guardian 1984 - 2006
Newest version of the search engine software. All the latest developments will be tested with this version.
Newspaper Corpus (old interface)
Independent and Guardian 1989 - 2006
First version of the search engine software.
Test Corpus
Content changes frequently
Newest version of the search engine software. All the latest developments will be tested with this version.