Making PDFs Accessible for Visually Impaired Users (and Findable for Everybody Else)

R. van Heusden; H. Ling; L. Nelissen; M. Marx

doi: 10.1007/978-3-031-43849-3_21

Authors	R. van Heusden; H. Ling; L. Nelissen; M. Marx
Publication date	2023
Host editors	O. Alonso; H. Cousijn; G. Silvello; M. Marrero; C. Teixeira Lopes; S. Marchesin
Book title	Linking Theory and Practice of Digital Libraries
Book subtitle	27th International Conference on Theory and Practice of Digital Libraries, TPDL 2023, Zadar, Croatia, September 26–29, 2023 : proceedings
ISBN	9783031438486
ISBN (electronic)	9783031438493
Series	Lecture Notes in Computer Science
Event	27th International Conference on Theory and Practice of Digital Libraries
Pages (from-to)	239-245
Number of pages	6
Publisher	Cham: Springer
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	We treat documents released under the Dutch Freedom of Information Act as FAIR scientific data and find that they are neither findable nor accessible, due to text malformations caused by redaction software. Our aim is to repair these documents. We propose a simple but strong heuristic for detecting wrongly OCRed text segments, and we then repair only these OCR mistakes by prompting a large language model. This makes the documents more findable through full-text search, but the repaired PDFs still do not adhere to accessibility standards. Converting them into HTML documents that keep all essential layout and markup not only makes them accessible to visually impaired users but also reduces their size by up to two orders of magnitude. This way of repairing costs roughly one dollar for the 17K pages in our corpus, which is very little compared to the large gains in information quality.
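The abstract describes a two-stage pipeline: a cheap heuristic first flags suspect OCR segments, and only those segments are sent to a large language model for repair, which is what keeps the cost low. The abstract does not state the heuristic itself, so the Python sketch below uses a hypothetical stand-in (a segment is flagged when too few of its tokens look like ordinary words). The names looks_garbled and repair_document and the 0.7 threshold are illustrative assumptions, not the authors' code, and the LLM call is abstracted as a plain callable rather than any real API.

```python
from typing import Callable, List

# Punctuation stripped from token edges before the word-likeness check.
EDGE_PUNCT = ".,;:!?()\"'"


def looks_garbled(segment: str, min_word_ratio: float = 0.7) -> bool:
    """Hypothetical heuristic (not the paper's): flag a segment when the
    fraction of purely alphabetic tokens falls below min_word_ratio."""
    tokens = segment.split()
    if not tokens:
        return False  # empty segments have nothing to repair
    wordlike = sum(1 for t in tokens if t.strip(EDGE_PUNCT).isalpha())
    return wordlike / len(tokens) < min_word_ratio


def repair_document(segments: List[str], repair: Callable[[str], str]) -> List[str]:
    """Send only flagged segments to the (expensive) repair step, e.g. an
    LLM prompt; all other segments are passed through unchanged."""
    return [repair(s) if looks_garbled(s) else s for s in segments]
```

With a stub in place of the model, a clean Dutch sentence such as "Dit is een normale zin." passes through untouched, while a string like "D1t 1s g3$% #@! xq9" is flagged and handed to the repair callable. A token-level heuristic like this will also flag legitimate segments dominated by numbers or codes, so a real threshold would need tuning on the corpus.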
Document type	Conference contribution
Language	English
Related dataset	Increasing Accessibility of Government Documents Dataset
Published at	https://doi.org/10.1007/978-3-031-43849-3_21 (Final published version)
Other links	https://github.com/irlabamsterdam/accessibilifier
