Digital weight watching: reconstruction of scanned documents
| Authors | |
|---|---|
| Publication date | 2009 |
| Host editors | |
| Book title | AND 2009 : proceedings of the Third Workshop on Analytics for Noisy Unstructured Text Data |
| Book subtitle | July 23-24, 2009, Barcelona, Spain |
| ISBN (electronic) | |
| Event | Third Workshop on Analytics for Noisy Unstructured Text Data (AND 2009), Barcelona, Spain |
| Pages (from-to) | 25-31 |
| Publisher | New York, NY: ACM Press |
| Organisations | |
| Abstract | Scanned and OCRed data leads to large file sizes if facsimile images are included. This makes storing large data sets, and providing online access to them, costly. Manually analyzing such data is cumbersome because of long download and processing times. It may thus be advantageous to reconstruct the scanned documents as documents without scanned images that nevertheless closely resemble the originals. We have done this reconstruction for a data set of Dutch parliamentary proceedings with positive results: the reconstructed documents needed 1.5% of the original storage space while resembling the originals to a high degree. We describe the reconstruction process and evaluate the costs, the benefits and the quality. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1145/1568296.1568303 |
| Permalink to this page | |
