Leveraging active learning for ocean data quality assessment: reducing labeling workload and addressing severe data imbalance challenges
| Authors | |
|---|---|
| Publication date | 10-2025 |
| Journal | International Journal of Data Science and Analytics |
| Volume | 20 |
| Issue number | 5 |
| Pages (from-to) | 4777-4798 |
| Organisations |
|
| Abstract |
Oceanic research initiatives like Argo, GLOSS, and EMSO aim to enhance our understanding of the oceans and climate through extensive data collection. Maintaining the quality of collected data is essential for effective data analysis and real-world applications. While automated and semi-automated tests can provide real-time or near-real-time validation, thorough quality control still depends on operator review. Consequently, current Quality Control (QC) processes continue to be labor-intensive. Machine Learning (ML) methods, which can analyze vast amounts of data and learn complex patterns autonomously, offer significant potential for improving QC processes. However, challenges like severe data disproportion persist for ML approaches. This article proposes exploiting active learning (AL) to assist QC experts, reducing their workload by proactively selecting informative data points for labeling. Targeting the data distribution challenge, AL, coupled with imbalance-resilient classifiers, enhances model performance in recognizing erroneous data points. To mitigate the cold-start problem in AL, we propose outlier detection for initializing classifiers, significantly reducing annotation costs. Our approach is tested on data generated by 5 Argo floats, demonstrating its feasibility to lessen the labeling workload for experts and tackle significant data imbalance. Although the experiments are limited in scale, the findings indicate a promising outlook for using active learning in ocean data quality assessment, facilitating an effective semi-automated quality control framework.
|
| Document type | Article |
| Language | English |
| Published at | https://doi.org/10.1007/s41060-025-00751-w |
| Other links | https://www.scopus.com/pages/publications/105002454070 |
| Downloads |
Leveraging active learning for ocean data quality assessment
(Final published version)
|
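
The workflow summarized in the abstract (seed the labeled set with outlier detection to avoid a cold start, then let the model choose which points an expert labels next) can be illustrated with a minimal sketch. This is not the authors' implementation. The median/MAD outlier score, the one-dimensional threshold classifier standing in for an imbalance-resilient classifier, the synthetic values, and every function name below are assumptions made purely for illustration.

```python
from statistics import median


def outlier_scores(values):
    """Robust deviation score |x - median| / MAD (stand-in outlier detector)."""
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1.0  # guard against MAD == 0
    return [abs(v - med) / mad for v in values]


def fit_threshold(scores, labels):
    """Midpoint between the highest labeled-good and lowest labeled-bad score.

    The midpoint does not depend on how many points each class has,
    so heavy class imbalance does not shift it.
    """
    good = [scores[i] for i, y in labels.items() if y == 0]
    bad = [scores[i] for i, y in labels.items() if y == 1]
    if not bad:
        return float("inf")  # nothing flagged as bad yet
    if not good:
        return 0.0
    return (max(good) + min(bad)) / 2


def active_learning(values, oracle, n_seed=2, budget=5):
    """Return (predictions, labels) after n_seed * 2 + budget expert queries."""
    scores = outlier_scores(values)
    ranked = sorted(range(len(values)), key=lambda i: scores[i])
    # Cold start: ask the expert about the least and most anomalous points.
    seed = ranked[:n_seed] + ranked[-n_seed:]
    labels = {i: oracle(i) for i in seed}
    for _ in range(budget):
        threshold = fit_threshold(scores, labels)
        unlabeled = [i for i in range(len(values)) if i not in labels]
        if not unlabeled:
            break
        # Uncertainty sampling: query the point closest to the decision boundary.
        query = min(unlabeled, key=lambda i: abs(scores[i] - threshold))
        labels[query] = oracle(query)
    threshold = fit_threshold(scores, labels)
    predictions = [int(s > threshold) for s in scores]
    return predictions, labels


# Synthetic, heavily imbalanced data: 100 plausible values and 5 erroneous ones.
good = [i * 0.01 for i in range(-50, 50)]
bad = [3.0, -3.5, 4.0, 2.5, -2.8]
values = good + bad
truth = [0] * len(good) + [1] * len(bad)

# The oracle plays the role of the QC expert (1 = erroneous, 0 = good).
predictions, labels = active_learning(values, oracle=lambda i: truth[i])
print(f"expert labeled {len(labels)} of {len(values)} points")
```

In this toy setting the expert labels only a small fraction of the points, and the outlier-based seed guarantees that the rare erroneous class is represented before the first query. Real float profiles are multivariate and far noisier, so the threshold rule here should be read only as a placeholder for a proper classifier.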
