EuroGOV: Engineering a Multilingual Web Corpus

Open Access
Authors
Publication date 2005
Book title Working Notes for the CLEF 2005 Workshop
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawled from the European Union portal, European Union member state governmental web sites, and Russian government web sites. The corpus contains over 3 million documents written in more than 20 different European languages. In this paper we provide a detailed description of the EuroGOV collection.
Document type Conference contribution