Table Representation Learning

M. Hulsebos

Authors	M. Hulsebos
Supervisors	P.T. Groth
Cosupervisors	C. Demiralp
Award date	23-02-2024
ISBN	9789464837438
Number of pages	153
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	The increasing amount of data being collected, stored, and analyzed induces a need for efficient, scalable, and robust methods to handle the data. A large fraction of this data is stored in structured formats such as relational tables and spreadsheets. To automate data management and analysis tasks for such data, this thesis investigates how the success of representation learning for data modalities like text and images can be extended to tabular data, which we refer to as Table Representation Learning (TRL). First, we present the results of our exploration of neural embedding methods for automatic table comprehension. We contribute Sherlock, a deep learning model for detecting the semantic types of table columns in a scalable, robust, and accurate manner. We also present AdaTyper, a system that effectively and efficiently adapts such semantic type detection models to unseen data distributions and semantic types. As existing TRL models need to be pre-trained on large-scale representative datasets, we introduce GitTables, a large corpus of relational tables extracted from CSV files stored on GitHub. The tables in GitTables better resemble typical database tables and are enriched with column semantics. Finally, we present Observatory, a framework and tool for analyzing what learned table embeddings capture with regard to the structural and content characteristics of relational tables. With Observatory, we identify strengths and weaknesses of existing TRL models and the table embeddings they generate. The thesis concludes with a summary of our findings and a discussion of open challenges and future opportunities for Table Representation Learning.
Document type	PhD thesis
Language	English

UvA-DARE

Digital Academic Repository
