Overview
The HTTP nodes provide a connection to HTTP- and REST-based services. The HttpRetriever node can execute the HTTP methods GET, POST, HEAD, PUT, and DELETE. It can submit content, which must be given as binary data, it handles cookies, and it lets you specify arbitrary HTTP headers for requests. For accessing OAuth-based APIs, a dedicated node is available which simplifies signing requests: just enter your credentials and create the request to perform.
Results of the HttpRetriever node are provided as the HttpResult cell type. The HttpResult type bundles the actual binary content of the result, the status code, and all HTTP response headers. To extract header information from an HttpResult, use an HttpResultDataExtractor node.
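To make the shape of such a result concrete, here is a minimal Python sketch of a value that bundles content, status code, and headers. It is only an illustration: the class name HttpResultSketch, its fields, and the header() helper are hypothetical and do not reflect the actual HttpResult implementation or the extractor node.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class HttpResultSketch:
    """Hypothetical stand-in for an HTTP result: body, status, headers."""

    content: bytes                      # raw binary body of the response
    status_code: int                    # e.g. 200, 404
    headers: Dict[str, str] = field(default_factory=dict)

    def header(self, name: str) -> Optional[str]:
        """Look up a response header; HTTP header names are case-insensitive."""
        wanted = name.lower()
        for key, value in self.headers.items():
            if key.lower() == wanted:
                return value
        return None


result = HttpResultSketch(
    content=b"<html></html>",
    status_code=200,
    headers={"Content-Type": "text/html; charset=utf-8"},
)
content_type = result.header("content-type")
```

The case-insensitive lookup mirrors how HTTP treats header names, which is the main thing to keep in mind when pulling a single header out of a result.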
Form-encoded requests
To send content with your HTTP requests (typically for POST and PUT), select a Binary cell as input in the HttpRetriever node's configuration. To perform a form-encoded request, use the FormEncodedHttpEntityCreator node, which transforms string columns into encoded key-value data. Do not forget to specify the HTTP entity content type in the HttpRetriever's configuration afterwards.
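As an illustration of what form encoding produces, the following Python sketch turns key-value strings into a binary body and names the matching content type. It uses only the Python standard library; the function name form_encode and the sample fields are made up for this example and are not part of the nodes.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode


def form_encode(fields: Dict[str, str]) -> Tuple[bytes, str]:
    """Return (binary body, content type) for a form-encoded request."""
    # urlencode percent-encodes reserved characters and joins pairs with '&'.
    body = urlencode(fields).encode("ascii")
    return body, "application/x-www-form-urlencoded"


body, content_type = form_encode({"user": "alice", "comment": "hello world & more"})
print(body)          # b'user=alice&comment=hello+world+%26+more'
print(content_type)  # application/x-www-form-urlencoded
```

The body is plain bytes, which is why the request needs the content type set separately: the receiver cannot tell from the bytes alone how they were encoded.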
Multipart-encoded requests
Multipart-encoded requests can be created using the MultipartEncodedHttpEntityCreator node. It requires one or more binary input columns and creates (1) a combined multipart-encoded column and (2) a column with the content type header, including the delimiter. In a downstream HttpRetriever node, select the appended binary column as HTTP entity input, and use a flow variable to set the proper HTTP entity content type.
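The two outputs, a combined body and a content type that carries the delimiter, can be sketched in Python as follows. This is a simplified illustration of multipart/form-data encoding and not the node's implementation; the function name multipart_encode, the part name, and the fixed boundary are assumptions chosen for the example.

```python
import uuid
from typing import Dict, Optional, Tuple


def multipart_encode(parts: Dict[str, bytes],
                     boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Combine binary parts into one multipart body plus its content type."""
    # A random boundary is used unless the caller supplies one.
    boundary = boundary or uuid.uuid4().hex
    chunks = []
    for name, data in parts.items():
        chunks.append(f"--{boundary}\r\n".encode("ascii"))
        chunks.append(
            f'Content-Disposition: form-data; name="{name}"\r\n'.encode("utf-8")
        )
        chunks.append(b"Content-Type: application/octet-stream\r\n\r\n")
        chunks.append(data)
        chunks.append(b"\r\n")
    # The closing delimiter has two extra dashes at the end.
    chunks.append(f"--{boundary}--\r\n".encode("ascii"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


body, content_type = multipart_encode(
    {"file": b"\x00\x01binary"}, boundary="example-boundary"
)
```

Because the boundary appears both inside the body and in the content type header, the two values must travel together, which is why the content type is passed on separately to the request.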
Cookies
Cookies created during a node's execution are output at the HttpRetriever's second output port. To send cookies with a request, use the second (optional) input port of the HttpRetriever node. When performing sequential requests with multiple HttpRetriever nodes, you can simply chain the cookie in- and out-ports to hand the cookies through the workflow.
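Conceptually, chaining cookie ports means taking the cookies set by one response and sending them with the next request. The Python sketch below shows that hand-over with the standard library; the function names and the sample Set-Cookie values are hypothetical and only illustrate the idea, not how the nodes store cookies internally.

```python
from http.cookies import SimpleCookie
from typing import Dict, Iterable


def cookies_from_response(set_cookie_headers: Iterable[str]) -> Dict[str, str]:
    """Collect name/value pairs from the Set-Cookie headers of a response."""
    jar: Dict[str, str] = {}
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)
        for name, morsel in parsed.items():
            jar[name] = morsel.value  # attributes such as Path are ignored here
    return jar


def cookie_header(jar: Dict[str, str]) -> str:
    """Build the Cookie request header for the follow-up request."""
    return "; ".join(f"{name}={value}" for name, value in jar.items())


# Cookies "output" by the first request ...
jar = cookies_from_response(
    ["sessionid=abc123; Path=/; HttpOnly", "lang=en; Path=/"]
)
# ... become the "input" of the second request.
header = cookie_header(jar)
```

This sketch deliberately ignores cookie scoping (domain, path, expiry); a real client has to honor those attributes before sending a cookie back.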
Example workflows
Form-based login with GET, POST, and cookies
This workflow demonstrates how to log in to the KNIME forums, similar to a browser-based login flow. The workflow extracts the number of unread items for each subforum. Before running the workflow, you need to specify your own user credentials as workflow variables: Right-click on the workflow in the "KNIME Explorer" view, choose "Workflow Variables..." and enter your data. Download the workflow.
Crawl paginated pages
This workflow shows how article URLs can be extracted from paginated overview pages (as typically found on news websites and blogs) using a recursive loop. The example uses the Ars Technica website, but the workflow can easily be adapted to your needs: (1) specify a start URL, (2) specify an XPath expression for extracting the desired article URLs, (3) specify an XPath expression for extracting the 'next' link. Download the workflow.
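The three settings (start URL, article XPath, 'next' XPath) map onto a simple loop, sketched here in Python. To stay self-contained, the pages are an in-memory dictionary of well-formed markup instead of real HTTP responses, and the loop is an ordinary while loop standing in for the recursive loop of the workflow. The page contents, the crawl function, and the XPath expressions are all made up for the illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# Hypothetical stand-in for a website: URL -> well-formed page markup.
PAGES = {
    "page1": (
        '<html><body>'
        '<a class="article" href="/a1">A1</a>'
        '<a class="article" href="/a2">A2</a>'
        '<a class="next" href="page2">next</a>'
        '</body></html>'
    ),
    "page2": '<html><body><a class="article" href="/a3">A3</a></body></html>',
}


def crawl(start_url: str, article_xpath: str, next_xpath: str,
          fetch: Callable[[str], str]) -> List[str]:
    """Follow 'next' links from start_url and collect all article URLs."""
    urls: List[str] = []
    seen = set()
    url = start_url
    while url is not None and url not in seen:  # 'seen' guards against cycles
        seen.add(url)
        root = ET.fromstring(fetch(url))
        urls.extend(a.get("href") for a in root.findall(article_xpath))
        next_link = root.find(next_xpath)
        url = next_link.get("href") if next_link is not None else None
    return urls


article_urls = crawl(
    "page1", ".//a[@class='article']", ".//a[@class='next']", PAGES.__getitem__
)
print(article_urls)  # ['/a1', '/a2', '/a3']
```

Note that ElementTree only supports a small XPath subset and requires well-formed markup; real-world HTML usually needs a tolerant HTML parser before XPath can be applied.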
Accessing the Twitter API with OAuth authentication
In this example workflow, the OAuth nodes and further Palladian HTTP nodes are used to access the Twitter API. The example shows how to perform a GET request to access a specific Tweet and a POST request to create a new Tweet. The OAuth node is not limited to Twitter, but can be used for any OAuth 1.0 signing purpose.
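For readers who want to see what OAuth 1.0 signing involves, here is a minimal Python sketch of the HMAC-SHA1 signature as described in RFC 5849. It is not the OAuth node's implementation: the function names, the example URL, and all keys and secrets are placeholders, and details such as nonce and timestamp generation or building the Authorization header are left out.

```python
import base64
import hashlib
import hmac
from typing import Dict
from urllib.parse import quote


def pct(value: str) -> str:
    """Percent-encode everything except unreserved characters."""
    return quote(value, safe="~")


def signature_base_string(method: str, url: str, params: Dict[str, str]) -> str:
    """Join method, base URL, and the sorted, encoded parameters."""
    encoded = sorted((pct(k), pct(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in encoded)
    return "&".join([method.upper(), pct(url), pct(param_string)])


def sign_request(method: str, url: str, params: Dict[str, str],
                 consumer_secret: str, token_secret: str = "") -> str:
    """Return the base64-encoded HMAC-SHA1 signature for the request."""
    base = signature_base_string(method, url, params)
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode("ascii"), base.encode("ascii"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


# Placeholder values; real requests also carry oauth_consumer_key,
# oauth_timestamp, oauth_token, and so on.
params = {"oauth_nonce": "abc", "q": "a b"}
base = signature_base_string("get", "https://api.example.com/1/items", params)
signature = sign_request("get", "https://api.example.com/1/items", params,
                         consumer_secret="consumer-secret",
                         token_secret="token-secret")
```

The signature covers the method, the URL, and every parameter, so changing any of them after signing invalidates the request. This is the bookkeeping a dedicated signing node takes off your hands.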
Before running the example workflow, make sure to enter your own credentials in the OAuth node’s configuration. Follow these instructions to get the necessary keys and access tokens. Important: you explicitly need to enable write access if required.
Download the workflow. More information on OAuth is available here, here, and here.