Normalized information distance
| Authors | |
|---|---|
| Publication date | 2009 |
| Host editors | |
| Book title | Information theory and statistical learning |
| ISBN | |
| Pages (from-to) | 45-82 |
| Publisher | New York, NY: Springer |
| Organisations | |
| Abstract | The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can then be applied to machine learning tasks, especially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations of the normalized information distance and both practical realizations. It presents numerous examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation. |
| Document type | Chapter |
| Note | The original publication is available at www.springerlink.com |
| Language | English |
| Published at | https://doi.org/10.1007/978-0-387-84816-7_3 |
| Permalink to this page | |
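The abstract's first practical realization, approximating Kolmogorov complexity with a real compressor, leads to the normalized compression distance. As a minimal sketch (not code from the chapter), the standard formula NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)) can be instantiated with any off-the-shelf compressor; here `zlib` is an assumed, illustrative choice:

```python
import zlib

def compressed_len(data: bytes) -> int:
    """Length of the zlib-compressed data, used as a stand-in C(.)
    for the (uncomputable) Kolmogorov complexity."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = compressed_len(x)
    cy = compressed_len(y)
    cxy = compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

With an ideal compressor, NCD(x, x) would be 0 and the distance between unrelated strings close to 1; real compressors like zlib only approximate this, so similar inputs yield small values and dissimilar inputs larger ones, which is what feature-free clustering relies on.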