In the enrichment step, semantic information is added by named entity recognition and tagging. The Enrichment category contains nodes that assign part-of-speech tags or recognize named entities: standard named entities, such as names of persons, organizations, or locations; biomedical named entities, such as names of genes or proteins; and chemical structures. Each recognized named entity is assigned a tag value, e.g. “Person”, and a tag type, e.g. “NE” for named entity. The tag type represents the domain or type of a tagger, e.g. biomedical or chemical named entities, and the tag value represents a particular characteristic in that domain, e.g. gene in the biomedical field. The POS tagger, for example, assigns part-of-speech tags: the assigned tag type is “POS” and the values are those of the Penn Treebank tag set. German texts can be tagged with the Stanford tagger, using the STTS tag set. Each tagger usually assigns tags of its own domain and thus uses its own tag type and set of values. Based on these tag types and values, terms can be filtered afterwards, and the named entities can be extracted and visualized.
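The relationship between tag types, tag values, and later filtering can be illustrated with a minimal Python sketch. It is not the API of the tagger nodes: the names `Tag`, `Term`, and `filter_by_tag` are hypothetical and the sample terms are made up for illustration. Only the tag type and value strings (“NE”, “Person”, “POS”, and Penn Treebank values such as `NN`) follow the description above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tag:
    tag_type: str  # domain of the tagger, e.g. "POS" or "NE"
    value: str     # characteristic within that domain, e.g. "NN" or "Person"


@dataclass
class Term:
    text: str
    tags: list = field(default_factory=list)


def filter_by_tag(terms, tag_type, values=None):
    """Keep terms carrying a tag of the given type (and one of the values, if given)."""
    return [
        term
        for term in terms
        if any(
            tag.tag_type == tag_type and (values is None or tag.value in values)
            for tag in term.tags
        )
    ]


# Hypothetical tagged terms, as they might look after a POS and an NE tagger.
terms = [
    Term("Marie Curie", [Tag("NE", "Person")]),
    Term("discovered", [Tag("POS", "VBD")]),
    Term("polonium", [Tag("POS", "NN")]),
]

persons = filter_by_tag(terms, "NE", {"Person"})
pos_tagged = filter_by_tag(terms, "POS")
```

Here `persons` holds only the term tagged as a person, while `pos_tagged` holds every term that carries any tag of type “POS”, regardless of its value.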
Named entities can be set to “unmodifiable” to prevent them from being split, modified, or filtered by subsequent nodes of the preprocessing category in the workflow (e.g. Stemmer). Recognized named entities, e.g. gene names, usually should not be filtered or stemmed by subsequent nodes. To protect them from manipulation by preprocessing nodes, the corresponding tagger node sets these terms to “unmodifiable” by default. The dialog of each tagger node allows specifying whether recognized named entities are set to “unmodifiable” or not.
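The effect of the “unmodifiable” flag can be sketched as follows. This is an illustration under stated assumptions, not the behavior of any specific node: `Term`, `toy_stem`, and `apply_preprocessing` are hypothetical names, and `toy_stem` is a deliberately naive stand-in for a real stemmer.

```python
from dataclasses import dataclass


@dataclass
class Term:
    text: str
    unmodifiable: bool = False


def toy_stem(word):
    # Toy stand-in for a real stemmer: strips a single trailing "s" only.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def apply_preprocessing(terms, transform):
    """Apply transform to every term except those flagged as unmodifiable."""
    result = []
    for term in terms:
        if term.unmodifiable:
            result.append(term)  # protected term passes through unchanged
        else:
            result.append(Term(transform(term.text), False))
    return result


# Hypothetical input: one ordinary term and one term protected by a tagger.
terms = [Term("proteins"), Term("caspases", unmodifiable=True)]
stemmed = apply_preprocessing(terms, toy_stem)
```

In this sketch the ordinary term is reduced to `protein`, whereas the protected term keeps its original form and its flag.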
If two named entity recognizers are applied one after the other, the latter overwrites the tagging of the former in case of conflicts. For example, if the ABNER Tagger node first recognizes “interleukin 7” as a named entity and the Dictionary Tagger node then recognizes “interleukin” as a named entity (based on a specified dictionary), the previously recognized term “interleukin 7” is split up and only “interleukin” (without “7”) is tagged as a named entity.
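The overwrite rule can be made concrete with a small sketch that models annotations as token spans. This is one possible way to model the described behavior, not the nodes' actual implementation: `apply_tagger` and the labels `FIRST_TAGGER` and `SECOND_TAGGER` are hypothetical placeholders, and the token list is invented around the “interleukin 7” example.

```python
def apply_tagger(annotations, new_spans):
    """Merge new spans into existing annotations; later spans win on overlap.

    Spans are (start, end, label) tuples over token indices, end exclusive.
    """
    kept = []
    for start, end, label in annotations:
        overlaps = any(
            start < new_end and new_start < end
            for new_start, new_end, _ in new_spans
        )
        if not overlaps:
            kept.append((start, end, label))  # no conflict: keep earlier tagging
    return sorted(kept + list(new_spans))


tokens = ["interleukin", "7", "binds", "receptors"]

# First tagger marks "interleukin 7" (tokens 0-1) as one named entity.
after_first = apply_tagger([], [(0, 2, "FIRST_TAGGER")])

# Second tagger marks only "interleukin" (token 0); it conflicts and wins.
after_second = apply_tagger(after_first, [(0, 1, "SECOND_TAGGER")])

tagged_texts = [" ".join(tokens[s:e]) for s, e, _ in after_second]
```

After the second pass only the span covering “interleukin” remains, and the token “7” no longer belongs to any named entity.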