Cochrane-auto: An Aligned Dataset for the Simplification of Biomedical Abstracts
| Authors | |
|---|---|
| Publication date | 2024 |
| Host editors | |
| Book title | The Third Workshop on Text Simplification, Accessibility and Readability : proceedings of the workshop |
| Book subtitle | TSAR 2024 : November 15, 2024 |
| ISBN (electronic) | |
| Event | 3rd Workshop on Text Simplification, Accessibility and Readability |
| Pages (from-to) | 41-51 |
| Number of pages | 11 |
| Publisher | Kerrville, TX: Association for Computational Linguistics |
| Organisations | |
| Abstract | The most reliable and up-to-date information on health questions is found in the biomedical literature, but this literature is inaccessible to many readers because of its complex, jargon-heavy language. Domain-specific scientific text simplification holds the promise of making this literature accessible to a lay audience. We therefore create Cochrane-auto: a large corpus of aligned sentence, paragraph, and abstract pairs drawn from biomedical abstracts and lay summaries. Experiments demonstrate that a plan-guided simplification system trained on Cochrane-auto outperforms a strong baseline trained on unaligned abstracts and lay summaries. More generally, our freely available corpus, which complements Newsela-auto and Wiki-auto, facilitates text simplification research beyond the sentence level and beyond direct lexical and grammatical revisions. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.18653/v1/2024.tsar-1.5 |
| Downloads | 2024.tsar-1.5 (Final published version) |
