Frequencies

Once preprocessing is finished, the frequencies of terms within individual documents and across the complete corpus can be computed. The Text Processing plugin provides nodes for computing the most widely used frequency measures in text mining, namely term frequency (TF) and inverse document frequency (IDF). In addition to IDF, an inverse category frequency node is available, which is analogous to IDF but based on categories instead of documents. Based on the computed frequencies, the Frequency Filter node can then filter terms, e.g. to keep only the high-frequency ones. The filter can either keep the terms whose frequency values lie within a specified range, or keep only the k terms with the highest frequencies.
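
To make the measures concrete, the following Python sketch computes TF and IDF for a tiny corpus and then applies the two filtering modes described above. It is an illustration only and not the plugin's implementation: the function names, the toy documents, and the exact formulas (relative TF as count divided by document length, IDF as log(N / df)) are assumptions chosen for the example, and the nodes may use different variants.

```python
import math
from collections import Counter


def term_frequency(doc_tokens, relative=True):
    """Count the terms of one document; optionally normalise by document length."""
    counts = Counter(doc_tokens)
    if not relative:
        return dict(counts)
    total = len(doc_tokens)
    return {term: count / total for term, count in counts.items()}


def inverse_document_frequency(docs):
    """Assumed variant: idf(t) = log(N / df(t)), df = number of documents containing t."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # each document counts a term at most once
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def filter_by_range(freqs, low, high):
    """Keep the terms whose frequency lies in the closed interval [low, high]."""
    return {term: f for term, f in freqs.items() if low <= f <= high}


def filter_top_k(freqs, k):
    """Keep the k terms with the highest frequency (ties broken alphabetically)."""
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:k])


# Hypothetical, already preprocessed documents (lists of terms).
docs = [
    ["text", "mining", "text", "node"],
    ["mining", "data", "node"],
    ["text", "data", "data", "corpus"],
]

tf_first = term_frequency(docs[0])                      # relative TF of document 1
idf = inverse_document_frequency(docs)                  # IDF over the whole corpus
tfidf_first = {t: tf_first[t] * idf[t] for t in tf_first}

kept_by_range = filter_by_range(tf_first, 0.3, 1.0)     # range-based filtering
kept_top_1 = filter_top_k(term_frequency(docs[2], relative=False), 1)  # top-k filtering
```

An inverse category frequency could be sketched in the same way by counting, for each term, the number of categories in which it occurs instead of the number of documents.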