View Document

A document in our corpus can be viewed in three ways: as text, as the live webpage, or as the cached copy.

All are accessible by clicking on a concordance on the search results page.

Text

This view displays the text of a document containing a match for your query, with the matching string highlighted in red. To reach it, click on a concordance on the results page, then click the 'Text' link.

At the top of the page the following meta information is displayed:

Following this there are two links, 'Show POS Tags' and 'Goto Match', both described further down this page.

Below this, the text of the document is displayed as it is stored in our corpus. You may notice that some content is missing from this display compared to the cached or live versions of the document. This is due to our clean-up and boilerplate removal processes, which are essential to ensuring the quality of the text in our corpus. The cached version contains the full content of the document.

Show POS Tags

Clicking this will change the display to show the corresponding part-of-speech tag in brackets alongside each word. See the description of POS tagsets.
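As a rough illustration of this display, the short Python sketch below renders a list of (word, tag) pairs with each tag in brackets after its word. The function name, the use of square brackets, and the sample tags are assumptions made for the example; they are not taken from the search engine itself, and the actual tags depend on the tagset in use.

```python
def show_pos_tags(tagged_tokens):
    """Render (word, tag) pairs as 'word [TAG]' separated by spaces.

    Hypothetical helper: the bracket style is an assumption,
    not the search engine's documented output format.
    """
    return " ".join(f"{word} [{tag}]" for word, tag in tagged_tokens)


# Illustrative tagged sentence; the tag names are examples only.
tagged = [("The", "DT"), ("cat", "NN"), ("sat", "VBD")]
print(show_pos_tags(tagged))  # prints: The [DT] cat [NN] sat [VBD]
```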

Goto Match

Clicking this will scroll the page down to the match that you selected on the concordance results screen.

Live Webpage

You can view the live version of a webpage by clicking on a link to it from either the concordance results or view text screens.

A live webpage is external to the search engine and represents the document as it currently exists on the internet. Note that the page may have changed since it was downloaded and included in our corpus. For this reason we also store a cached version of the document in the form in which it was downloaded.

WebCorp and RDUES are not associated with any external webpages and take no responsibility for their content.

Cache

Every document in our corpus is stored in the exact form in which it was downloaded. We call this store of pages our cache. To access the cached version of a document, click on the link to it on the concordance results or view text screens.

You may notice that some cached pages look different to their live versions. This is because external elements of a page are not stored as part of the cached version. These may be media (such as images and video), JavaScript files or stylesheets. All of the textual content used within our corpus will be present in the cached document.
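To see which parts of a page fall into this category, the Python sketch below lists the external resources that a piece of HTML refers to, using only the standard library. It is an illustration of the idea rather than a description of how our cache is built; the class and function names are hypothetical, and the set of tags checked is an assumption.

```python
from html.parser import HTMLParser


class ExternalResourceFinder(HTMLParser):
    """Collect (tag, URL) pairs for resources a page loads from separate files."""

    # Tags whose 'src' attribute points at media or script files (assumed set).
    SRC_TAGS = ("img", "video", "audio", "source", "script")

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SRC_TAGS and attrs.get("src"):
            self.resources.append((tag, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            # Only <link rel="stylesheet"> counts as a stylesheet reference.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.resources.append((tag, attrs["href"]))


def find_external_resources(html):
    """Return the external resources referenced by an HTML string."""
    finder = ExternalResourceFinder()
    finder.feed(html)
    finder.close()
    return finder.resources


page = (
    '<html><head><link rel="stylesheet" href="style.css">'
    '<script src="app.js"></script></head>'
    '<body><img src="pic.png"><p>Text</p></body></html>'
)
for tag, url in find_external_resources(page):
    print(tag, url)
```

Resources listed by a check of this kind are the ones most likely to be missing when a cached page is displayed, while the paragraph text itself remains in the document.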