The Palladian Geo Nodes are a subset of the Palladian Nodes and provide functionality for working with geographic data (currently focused on point-based data). They contain basic components such as a "GeoCoordinate" cell type, which represents a WGS84 latitude/longitude pair, a Haversine-based distance measure, and aggregation methods for coordinate collections. The nodes include an extractor for locations mentioned in text, street-address geocoding, and reverse coordinate lookup.
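The basic components can be illustrated with a minimal Python sketch. It is not Palladian's Java implementation: `GeoCoordinate`, `haversine_km` and `midpoint` are hypothetical names. The Haversine formula assumes a spherical Earth with a mean radius of 6,371 km, which ignores the ellipsoidal shape that WGS84 actually defines. The geographic midpoint is shown as just one possible aggregation and is an assumption made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Iterable

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a spherical Earth


@dataclass(frozen=True)
class GeoCoordinate:
    """Hypothetical stand-in for a WGS84 latitude/longitude pair in decimal degrees."""
    lat: float
    lng: float

    def __post_init__(self) -> None:
        if not -90.0 <= self.lat <= 90.0:
            raise ValueError(f"latitude out of range: {self.lat}")
        if not -180.0 <= self.lng <= 180.0:
            raise ValueError(f"longitude out of range: {self.lng}")


def haversine_km(a: GeoCoordinate, b: GeoCoordinate) -> float:
    """Great-circle distance between two coordinates, in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lng - a.lng)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing h slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def midpoint(coords: Iterable[GeoCoordinate]) -> GeoCoordinate:
    """Aggregate coordinates by averaging their 3D unit vectors (geographic midpoint)."""
    x = y = z = 0.0
    n = 0
    for c in coords:
        phi, lmb = math.radians(c.lat), math.radians(c.lng)
        x += math.cos(phi) * math.cos(lmb)
        y += math.cos(phi) * math.sin(lmb)
        z += math.sin(phi)
        n += 1
    if n == 0:
        raise ValueError("cannot aggregate an empty collection")
    x, y, z = x / n, y / n, z / n
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lng = math.degrees(math.atan2(y, x))
    # Clamp to guard against floating-point overshoot at the range limits.
    return GeoCoordinate(max(-90.0, min(90.0, lat)), max(-180.0, min(180.0, lng)))
```

For Berlin (52.52, 13.41) and Paris (48.8566, 2.3522), `haversine_km` gives about 878 km. Averaging unit vectors instead of raw degrees keeps the midpoint correct near the antimeridian, where longitudes 179 and -179 would otherwise average to 0.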
- LocationExtractor: Extracts geographic locations from English text. This node uses Palladian's location extraction mechanism, which first performs several steps to recognize potential locations in a given text and then disambiguates them. The disambiguation step checks hierarchical ("contains") relations and identifies the correct locations by their proximity to other locations mentioned in the text.
- GoogleAddressGeocoder, MapQuestGeocoder, MapzenGeocoder: These nodes geocode street addresses such as "1600 Amphitheatre Parkway, Mountain View, CA" into geo coordinates, using Google's, MapQuest's, or Mapzen's API, respectively.
- ReverseLocationLookup: Given a latitude/longitude pair and a radius, retrieves all locations within that radius. For example, given the coordinate (52.52, 13.41) and an appropriate location source, the node retrieves the location "Berlin", Germany.
- LatitudeLongitudeToCoordinate and CoordinateToLatitudeLongitude: Convert a pair of double latitude/longitude values into a GeoCoordinate value and vice versa.
- Geo distances: Provides the Haversine distance between two coordinates. This distance measure can, for example, be used with KNIME's clustering nodes to perform spatial clustering of geo coordinates.
- MapViewer: Displays geographic locations on a map, with a choice of several tile providers. Point colors and sizes can be set using KNIME's Color Manager and Size Manager nodes.
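The proximity idea behind the LocationExtractor's disambiguation step can be illustrated with a simplified Python sketch. Mentions with exactly one gazetteer candidate serve as anchors, and each ambiguous mention is resolved to the candidate with the smallest summed Haversine distance to those anchors. This is an assumption-laden stand-in, not Palladian's actual algorithm: it omits the hierarchical/contains checks entirely, and the `disambiguate` function and its input format are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance in km, assuming a spherical Earth (radius 6371 km)."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def disambiguate(candidates: Dict[str, List[Tuple[str, Coord]]]) -> Dict[str, str]:
    """Pick one candidate id per mention by proximity to the unambiguous mentions.

    `candidates` maps each mention found in a text to its gazetteer candidates,
    given as (candidate_id, coordinate) pairs (hypothetical input format).
    """
    anchors = [cands[0][1] for cands in candidates.values() if len(cands) == 1]
    resolved: Dict[str, str] = {}
    for mention, cands in candidates.items():
        if not cands:
            continue  # no gazetteer entry: leave the mention unresolved
        if len(cands) == 1 or not anchors:
            resolved[mention] = cands[0][0]  # nothing to compare against: keep the first
            continue
        # Choose the candidate closest (in total) to all anchor locations.
        best_id, _ = min(cands, key=lambda c: sum(haversine_km(c[1], a) for a in anchors))
        resolved[mention] = best_id
    return resolved
```

If a text mentions both "Paris" and "Dallas", the candidate Paris, Texas wins over Paris, France because it lies far closer to the unambiguous anchor Dallas.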
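A radius query like the ReverseLocationLookup example can be sketched in Python as a linear scan that keeps every gazetteer entry whose Haversine distance to the query point lies within the radius. The in-memory `SAMPLE_GAZETTEER` with approximate city-centre coordinates and the `reverse_lookup` function are hypothetical illustrations, not the node's implementation; a real location source would typically answer such queries with a spatial index rather than a full scan.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees

# Hypothetical in-memory gazetteer with approximate city-centre coordinates.
SAMPLE_GAZETTEER: List[Tuple[str, Coord]] = [
    ("Berlin", (52.5200, 13.4050)),
    ("Potsdam", (52.3906, 13.0645)),
    ("Hamburg", (53.5511, 9.9937)),
]


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance in km, assuming a spherical Earth (radius 6371 km)."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def reverse_lookup(gazetteer: List[Tuple[str, Coord]], query: Coord,
                   radius_km: float) -> List[Tuple[str, float]]:
    """Return (name, distance_km) for all entries within radius_km, nearest first."""
    hits = [(name, haversine_km(query, coord)) for name, coord in gazetteer]
    within = [hit for hit in hits if hit[1] <= radius_km]
    return sorted(within, key=lambda hit: hit[1])
```

With the query (52.52, 13.41), a 10 km radius returns only Berlin; widening it to 50 km also returns Potsdam, which lies under 30 km away, while Hamburg stays out of range.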
Adding a location source
To use the LocationExtractor and ReverseLocationLookup nodes, you need to configure a location source (also called a gazetteer). Open the KNIME preferences and go to the preferences page KNIME → Palladian Location Extractor.
- GeoNames: The Palladian community contributions currently include a location source for the GeoNames API. The free version of the GeoNames API allows 30,000 REST requests per day and 2,000 REST requests per hour. To add the GeoNames source, click the New... button and follow the instructions, which provide links for creating a free GeoNames account. Enabling the option to retrieve location hierarchies improves the LocationExtractor's results, but it causes an additional API request for every location found.
- Local Gazetteer: If you want to keep your data private, are running into the GeoNames request limits, or want to speed up processing significantly, we provide a separate plugin for setting up a local gazetteer on your machine that works without accessing the Web. Contact us if you are interested.
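Because the free GeoNames limits apply both per hour and per day, a script that calls the API directly may want to throttle itself on the client side. The following Python sketch of a sliding-window limiter enforces both limits at once. `SlidingWindowLimiter` and `GEONAMES_FREE_LIMITS` are hypothetical names, and the sketch makes no claim about how Palladian's GeoNames source handles the limits internally. Since enabling location hierarchies costs one extra request per found location, every request, including the hierarchy lookups, should pass through `try_acquire()`.

```python
import time
from collections import deque
from typing import Callable, Deque, Iterable, List, Tuple

# (max_requests, window_seconds) pairs for the free GeoNames limits stated above.
GEONAMES_FREE_LIMITS: List[Tuple[int, float]] = [(2_000, 3_600.0), (30_000, 86_400.0)]


class SlidingWindowLimiter:
    """Grant a request only if every (max_requests, window) limit still has room."""

    def __init__(self, limits: Iterable[Tuple[int, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limits = list(limits)
        self.clock = clock  # injectable time source, e.g. a fake clock in tests
        self.longest = max(window for _, window in self.limits)
        self.granted: Deque[float] = deque()  # timestamps of granted requests, oldest first

    def try_acquire(self) -> bool:
        now = self.clock()
        # Timestamps older than the longest window no longer count against any limit.
        while self.granted and now - self.granted[0] >= self.longest:
            self.granted.popleft()
        for max_requests, window in self.limits:
            in_window = sum(1 for t in self.granted if now - t < window)
            if in_window >= max_requests:
                return False  # caller should wait and retry later
        self.granted.append(now)
        return True
```

Counting timestamps with a linear scan keeps the sketch short; for high request rates, a per-window counter or a sorted array with binary search would be cheaper.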
Configuring MapQuest and Mapzen
The MapQuestGeocoder and MapzenGeocoder nodes require an API key, which is available for free. Open the KNIME preferences, go to KNIME → Palladian Geocoders, and, if necessary, follow the instructions there to obtain a key.
Good or bad holidays: Extracting locations from text
This workflow uses Palladian's WebSearcher node to run the queries "best holiday locations" and "worst holiday locations" on Bing. Several other Palladian and Textprocessing nodes then download the web pages found, parse them, and extract their text content. The LocationExtractor node extracts the location mentions from these texts, and the MapViewer node displays their coordinates on a map. Download the workflow
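The first stage of such an extraction, finding candidate location mentions in the downloaded text, can be approximated by looking up capitalized word sequences in a gazetteer. The Python sketch below uses a greedy longest match, so that "New York" wins over "York". It is a deliberately naive, hypothetical stand-in for the LocationExtractor, which applies further recognition and disambiguation steps; the `find_candidates` function and the toy gazetteer format (name mapped to a list of candidate labels) are assumptions for illustration.

```python
import re
from typing import Dict, List, Tuple

# Unicode-aware words, including hyphenated names such as "Aix-en-Provence".
WORD = re.compile(r"\w+(?:['-]\w+)*")


def find_candidates(text: str, gazetteer: Dict[str, List[str]],
                    max_words: int = 3) -> List[Tuple[str, List[str]]]:
    """Return (mention, gazetteer candidates) pairs in the order they appear in text."""
    words = WORD.findall(text)
    found: List[Tuple[str, List[str]]] = []
    i = 0
    while i < len(words):
        step = 1
        if words[i][0].isupper():  # only capitalized words can start a mention
            # Try the longest word sequence first so "New York" beats "York".
            for n in range(min(max_words, len(words) - i), 0, -1):
                name = " ".join(words[i:i + n])
                if name in gazetteer:
                    found.append((name, gazetteer[name]))
                    step = n  # skip past the matched words
                    break
        i += step
    return found
```

Ambiguous mentions such as "Paris" are returned with all of their candidates; resolving them to a single location is the job of a subsequent disambiguation step.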
Pizza connection: Geocoding, MapViewer, and clustering
This workflow demonstrates geocoding, the MapViewer node, and the use of the Geo distances for clustering the resulting geo coordinates. The example data is a pizzeria dataset from this Google tutorial. Download the workflow
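The clustering part of this workflow can be sketched outside KNIME as DBSCAN with the Haversine distance. DBSCAN is chosen here only for illustration and is not necessarily the clustering node the workflow uses. It needs nothing but a distance function and a neighbourhood radius in kilometres, so it never has to average raw latitudes and longitudes. The `dbscan` function and the sample points (three around Manhattan, two in San Francisco, one in Chicago) are hypothetical and do not come from the pizzeria dataset.

```python
import math
from typing import List, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance in km, assuming a spherical Earth (radius 6371 km)."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def dbscan(points: List[Coord], eps_km: float, min_pts: int) -> List[int]:
    """Return a cluster id (0, 1, ...) per point, or -1 for noise."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited

    def neighbours(i: int) -> List[int]:
        # Brute-force scan; the result includes i itself, as in standard DBSCAN.
        return [j for j, q in enumerate(points) if haversine_km(points[i], q) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return [label if label is not None else -1 for label in labels]
```

With `eps_km=2.0` and `min_pts=2`, the six sample points in the test yield two clusters and one noise point: `[0, 0, 0, 1, 1, -1]`. The brute-force neighbour search is quadratic in the number of points, which is fine for a sketch but not for large datasets.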