EuroGOV: Engineering a Multilingual Web Corpus
| Authors | |
|---|---|
| Publication date | 2005 |
| Book title | Working Notes for the CLEF 2005 Workshop |
| Organisations | |
| Abstract | EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawled from the European Union portal, European Union member state governmental web sites, and Russian government web sites. The corpus contains over 3 million documents written in more than 20 different European languages. In this paper we provide a detailed description of the EuroGOV collection. |
| Document type | Conference contribution |
