The similarity metric

M. Li; X. Chen; X. Li; B. Ma; P. Vitanyi

The similarity metric

Authors	M. Li X. Chen X. Li B. Ma P. Vitanyi
Publication date	2003
Book title	Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms
Book subtitle	SODA '03 : Baltimore, MD, January 12-13, 2003
ISBN	9780898715385
Event	14th ACM-SIAM symposium on discrete algorithms
Pages (from-to)	863-872
Publisher	New York: Association for Computing Machinery
Organisations	Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	A new class of metrics appropriate for measuring effective similarity relations between sequences, say one type of similarity per metric, is studied. We propose a new "normalized information distance", based on the noncomputable notion of Kolmogorov complexity, and show that it minorizes every metric in the class (that is, it is universal in that it discovers all effective similarities). We demonstrate that it too is a metric and takes values in [0, 1]; hence it may be called the similarity metric. This is a theory foundation for a new general practical tool. We give two distinctive applications in widely divergent areas (the experiments by necessity use just computable approximations to the target notions). First, we computationally compare whole mitochondrial genomes and infer their evolutionary history. This results in a first completely automatic computed whole mitochondrial phylogeny tree. Secondly, we give fully automatically computed language tree of 52 different language based on translated versions of the "Universal Declaration of Human Rights".
Document type	Conference contribution
Language	English
Published at	https://dl.acm.org/doi/10.5555/644108.644253 (Final published version)
Permalink to this page

Back

UvA-DARE

Digital Academic Repository

The similarity metric