| Author | Sjaak Verbeek |
| Title | An Information Theoretic Approach to Finding Word Groups for Text Classification |
| Year | 2000 |
| Faculty | Faculteit der Natuurwetenschappen, Wiskunde en Informatica |
| Institute/dept. | FNWI/FGw: Institute for Logic, Language and Computation (ILLC) |
| Series | ILLC Master of Logic Theses / ILLC ; MoL-2000-03 |
| Abstract | An Information Theoretic Approach to Finding Word Groups
for Text Classification
Sjaak Verbeek
This thesis concerns finding the `optimal' number of (non-overlapping)
word groups for text classification. We present a method to select
_which_ words to cluster in word groups and _how many_ such word
groups to use on the basis of a set of pre-classified texts. The
method involves a greedy search through the space of possible word
groups. The criterion used to navigate this space is
based on `mutual information' and is known as `Jensen-Shannon
divergence'. The criterion for deciding _how many_ word groups to
use is based on Rissanen's MDL Principle. We present empirical results
that indicate that the proposed method performs well at its task. The
prediction model used is based on the Naive Bayes model and the data
set used for the experiments is a subset of the `20 Newsgroup
Dataset'. |
| Document type | Preprint |
Use this address to link to this page: http://dare.uva.nl/record/419859
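The abstract names Jensen-Shannon divergence as the criterion that guides the search over word groups. As a rough illustration only, the sketch below computes a weighted Jensen-Shannon divergence between two discrete distributions. It is not taken from the thesis: the function names, the choice of comparing class distributions P(class | word), the weighting scheme, and the example numbers are all assumptions made here for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms where p_i == 0 contribute nothing to the sum.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q, weight_p=0.5):
    """Weighted Jensen-Shannon divergence between distributions p and q.

    weight_p must lie strictly between 0 and 1, so that the mixture is
    positive wherever p or q is positive.
    """
    if not 0.0 < weight_p < 1.0:
        raise ValueError("weight_p must be strictly between 0 and 1")
    weight_q = 1.0 - weight_p
    mixture = [weight_p * pi + weight_q * qi for pi, qi in zip(p, q)]
    return (weight_p * kl_divergence(p, mixture)
            + weight_q * kl_divergence(q, mixture))


# Hypothetical class distributions P(class | word) over three classes.
word_a = [0.8, 0.1, 0.1]
word_b = [0.7, 0.2, 0.1]
word_c = [0.1, 0.1, 0.8]

print(js_divergence(word_a, word_b))  # small: similar class profiles
print(js_divergence(word_a, word_c))  # larger: different class profiles
```

Under these assumptions, a greedy procedure would tend to merge word pairs such as `word_a` and `word_b`, whose class profiles diverge least. How the thesis weights the distributions and where it stops merging (the MDL criterion) is not specified in the abstract and is not modelled here.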