- The normalization of occurrence and Co-occurrence matrices in bibliometrics using Cosine similarities and Ochiai coefficients
- Journal of the Association for Information Science and Technology
- Volume 67, Issue 11
- Faculty of Social and Behavioural Sciences (FMG)
- Amsterdam School of Communication Research (ASCoR)
We prove that the Ochiai similarity of the co-occurrence matrix is equal to the cosine similarity in the underlying occurrence matrix. Neither the cosine nor the Pearson correlation should be used for the normalization of co-occurrence matrices because the similarity is then normalized twice, and therefore overestimated; the Ochiai coefficient can be used instead. Results are shown using a small matrix (5 cases, 4 variables) for didactic reasons, and also Ahlgren et al.'s (2003) co-occurrence matrix of 24 authors in library and information sciences. The overestimation is shown numerically and illustrated using multidimensional scaling and cluster dendrograms. If the occurrence matrix is not available (such as in internet research or author cocitation analysis), using Ochiai for the normalization is preferable to using the cosine.
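The equality stated in the abstract can be checked numerically with a minimal sketch. The 5 x 4 binary matrix `A` below is a hypothetical example and not the matrix used in the paper; the helper names (`column`, `cosine`, `ochiai`) are illustrative only. The sketch assumes that the diagonal of the co-occurrence matrix holds each variable's total number of occurrences, and it uses only the Python standard library.

```python
import math

# Hypothetical binary occurrence matrix: 5 cases (rows) x 4 variables (columns).
# This is illustrative data, not the matrix from the paper.
A = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
]


def column(m, j):
    """Return column j of a list-of-rows matrix."""
    return [row[j] for row in m]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def ochiai(c, i, j):
    """Ochiai coefficient from a co-occurrence matrix c with occurrence totals on the diagonal."""
    return c[i][j] / math.sqrt(c[i][i] * c[j][j])


n = len(A[0])

# Co-occurrence matrix C = A^T A; C[i][i] is the number of cases in which variable i occurs.
C = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]

# Cosine between the columns of the occurrence matrix.
cos_occurrence = [[cosine(column(A, i), column(A, j)) for j in range(n)] for i in range(n)]

# Ochiai coefficient computed directly on the co-occurrence matrix.
ochiai_cooccurrence = [[ochiai(C, i, j) for j in range(n)] for i in range(n)]

# Cosine applied to the columns of the co-occurrence matrix itself (the second normalization).
cos_cooccurrence = [[cosine(column(C, i), column(C, j)) for j in range(n)] for i in range(n)]

for i in range(n):
    for j in range(i + 1, n):
        print(
            f"variables {i},{j}: "
            f"cosine(A)={cos_occurrence[i][j]:.3f}  "
            f"Ochiai(C)={ochiai_cooccurrence[i][j]:.3f}  "
            f"cosine(C)={cos_cooccurrence[i][j]:.3f}"
        )
```

In this example the first two columns of the printed output agree for every pair of variables, while the third column, which normalizes the already aggregated co-occurrence counts a second time, gives different values. For variables 0 and 1, for instance, it is larger than the Ochiai coefficient.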
- Accepted author manuscript