- Digital sustainable publication of legacy parliamentary proceedings
- 11th Annual International Digital Government Research Conference on Public Administration Online: Challenges and Opportunities (dg.o '10), Puebla, Mexico
- Book/source title: Proceedings 11th International Digital Government Research Conference (dg.o 2010)
- Digital Government Society of North America
- Document type: Conference contribution
- Faculty of Science (FNWI)
- Informatics Institute (IVI)
We address the problem of publishing parliamentary proceedings in a digitally sustainable manner. We give an extensive requirements analysis and, based on it, propose a uniform XML format. We evaluated our approach by collecting and automatically processing proceedings from six parliaments, spanning almost 200 years in total. Most of this data is genuine legacy data consisting of scanned and OCRed documents. The approach scales very well and produces high-quality data.
All documents are transformed into UTF-8 encoded XML files with extensive metadata in the Dublin Core standard. The text itself is divided into pages, which are in turn divided into paragraphs. Every document, page and paragraph has a unique URN that resolves to a web page. Every page element in the XML files is linked to a facsimile image of that page in PDF or JPEG format. We created a viewer in which the text and the facsimile can be inspected side by side. A search engine for the complete collection is available online.
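The document structure described above can be illustrated with a minimal Python sketch. It is not the format proposed in the paper: the element names (`proceedings`, `meta`, `page`, `p`), the `facsimile` attribute, the `urn:example:` prefix and the way page and paragraph URNs are derived from the document URN are all assumptions made for illustration. Only the Dublin Core element namespace is the real one.

```python
import xml.etree.ElementTree as ET

# Real Dublin Core element namespace; everything else below is hypothetical.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_document(doc_urn, title, date, pages):
    """Build one proceedings document as an XML tree.

    pages is a list of (facsimile_href, [paragraph_text, ...]) tuples.
    Element and attribute names are illustrative, not the paper's schema.
    """
    root = ET.Element("proceedings", {"id": doc_urn})

    # Dublin Core metadata block.
    meta = ET.SubElement(root, "meta")
    ET.SubElement(meta, f"{{{DC_NS}}}title").text = title
    ET.SubElement(meta, f"{{{DC_NS}}}date").text = date
    ET.SubElement(meta, f"{{{DC_NS}}}identifier").text = doc_urn

    # Pages contain paragraphs; each level gets its own unique identifier.
    for page_no, (facsimile, paragraphs) in enumerate(pages, start=1):
        page_urn = f"{doc_urn}:p{page_no}"
        page = ET.SubElement(
            root, "page", {"id": page_urn, "facsimile": facsimile}
        )
        for par_no, text in enumerate(paragraphs, start=1):
            par = ET.SubElement(page, "p", {"id": f"{page_urn}.{par_no}"})
            par.text = text
    return root


def serialize(root):
    """Return the tree as UTF-8 encoded bytes with an XML declaration."""
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    doc = build_document(
        "urn:example:proceedings:1850-01-01",  # placeholder URN
        "Sample sitting",
        "1850-01-01",
        [("scans/0001.jpg", ["First paragraph.", "Second paragraph."])],
    )
    print(serialize(doc).decode("utf-8"))
```

Deriving page and paragraph identifiers from the document identifier is one simple way to keep them unique across a collection. A real deployment would also need a resolver that maps each URN to its web page, which this sketch does not cover.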