Data quality control and research asset discovery for open science

N. Li

Authors	N. Li
Supervisors	P. Grosso, Z. Zhao
Award date	27-05-2025
ISBN	9798897785766
Number of pages	141
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Open science represents a transformative movement advocating for more open and collaborative research practices, where publications, data, software, and other academic outputs are shared at the earliest stages and made available for reuse. In line with this trend, Data Quality Control (DQC) plays a crucial role in ensuring the quality of data and, thus, the correctness and reliability of scientific findings. We investigate Active Learning (AL), which interactively queries human annotators to label the most informative data points, thereby reducing the labeling burden on experts. Besides data, there has been a growing emphasis on sharing other types of research assets, such as code, computational notebooks, and software tools, to improve the reproducibility of research and facilitate collaboration across disciplines. However, the proliferation of research assets introduced by the open science movement can lead to information overload. We propose DeCNR, which models computational notebooks as bi-modal data (text and code) and uses a fused sparse-dense model for computational notebook retrieval. Building on this research, we propose MRAS, a search system capable of indexing various types of research assets from heterogeneous data sources, enabling users to discover a wide range of research resources through a single search interface. In summary, this thesis addresses two crucial aspects of open science: Data Quality Control (DQC) and Research Asset Discovery (RAD). By focusing on DQC, we aim to ensure that data is reliable and trustworthy, while RAD seeks to facilitate the efficient retrieval of high-quality research assets.
Document type	PhD thesis
Language	English

UvA-DARE

Digital Academic Repository