Using KNIME Server REST API for file uploads and downloads

A recent support request prompted us to take a closer look at the KNIME REST API. The question was how to set up and call a workflow via REST that consumes a file resource and produces a file output. Good starting points for exploring the API are the blog post introducing the KNIME Server REST API and the KNIME Server billboard, which is part of the KNIME Server installation. If you haven't discovered it yet, have a look at this URL (after pointing it to your server address): https://localhost:8443/com.knime.enterprise.server/rest/v4/_profile/kni…

So let's look at the workflow first.

After uploading this workflow to the KNIME Server, it can be called from any REST client (Java, Python, Perl, KNIME, ...). It has three quickforms: two defining inputs ("File Upload" and "Integer Input") and one defining the output resource ("File Download"). The purpose of the workflow is to read a file resource and extract the first n lines from it. The output is written to a new file and made available as the result.

The use case of the workflow is simple, but its invocation as a REST service is not, because we have mixed input types: parameters as JSON (the number of lines to extract) and a generic stream of data (the file itself).

We'll be using a local installation of KNIME Server and curl as the REST client. This might appear overly cryptic, as it is all command line, but it keeps the example as basic as possible. We'll go through the following steps, assuming that the workflow is already uploaded to the KNIME Server.

  1. Discover the workflow in the repository via REST (simple repository listing)
  2. Spawn a new instance of the workflow and query its in- and output parameters
  3. Draft the payload of the "mixed content" type and invoke the service
  4. Read the result file
  5. Discard the workflow job

To shorten the command inputs and outputs below we abbreviate the base server address to <server>. That is, each occurrence of <server> really is https://server.company.name:8443/com.knime.enterprise.server

1. Discover the workflow in the repository via REST (simple repository listing)

Listing the full repository content is a simple GET operation:

~ $ curl -u 'knime:knime' <server>/rest/v4/repository/
{
  "_class" : "com.knime.enterprise.server.rest.api.v4.repository.ent.WorkflowGroup",
  ...
  "path" : "/",
  "type" : "WorkflowGroup",
  "children" : [ {
    "_class" : "com.knime.enterprise.server.rest.api.v4.repository.ent.Workflow",
    "@controls" : {
      "self" : {
        "href" : "<server>/rest/v4/repository/File-HEAD-Example",
        "method" : "GET"
      }
    },
    "path" : "/File-HEAD-Example",
    "type" : "Workflow"
  }, {
    ...
  } ],
  "owner" : "wiswedel",
  "@namespaces" : {
    "knime" : { "name" : "http://www.knime.com/server/rels#" }
  }
}
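For anything beyond a quick look, the listing is easier to handle programmatically. The following is a minimal Python sketch that walks such a response and collects every workflow together with the URL from its "self" control. The sample data is a trimmed copy of the response above, find_workflows is a hypothetical helper name, and we assume that nested workflow groups, if the server returns them inline, use the same "children" layout.

```python
import json

# Trimmed copy of the repository listing shown above; a real response
# carries more fields ("_class", "owner", "@namespaces", ...).
SAMPLE_LISTING = """
{
  "path": "/",
  "type": "WorkflowGroup",
  "children": [
    {
      "@controls": {
        "self": {
          "href": "<server>/rest/v4/repository/File-HEAD-Example",
          "method": "GET"
        }
      },
      "path": "/File-HEAD-Example",
      "type": "Workflow"
    }
  ]
}
"""


def find_workflows(node):
    """Recursively collect (path, self-href) pairs for every Workflow entry.

    Assumption: nested groups list their content under "children" in the
    same shape as the root group.
    """
    found = []
    if node.get("type") == "Workflow":
        href = node.get("@controls", {}).get("self", {}).get("href")
        found.append((node["path"], href))
    for child in node.get("children", []):
        found.extend(find_workflows(child))
    return found


workflows = find_workflows(json.loads(SAMPLE_LISTING))
for path, href in workflows:
    print(path, "->", href)
```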

2. Spawn a new instance of the workflow and query its in- and output parameters

The repository tree result also lists further URLs that can be visited; for our "File-HEAD-Example" it suggests visiting this URL with a GET request: <server>/rest/v4/repository/File-HEAD-Example.

Following this URL we find the "jobs" URL (...rest/v4/repository/File-HEAD-Example:jobs), including the information that a new job is to be instantiated using a POST request:

~ $ curl -X POST -u 'knime:knime' <server>/rest/v4/repository/File-HEAD-Example:jobs
{
  "@controls" : {
    "knime:execute-job" : {
      "href" : "<server>/rest/v4/jobs/88adc60e-0440-465e-801c-82066555abcc{?async}",
      "isHrefTemplate" : true,
      "type" : "multipart/form-data",
      "method" : "POST"
    },
    ....
  },
  "id" : "88adc60e-0440-465e-801c-82066555abcc",
  "workflow" : "/File-HEAD-Example",
  "isOutdated" : false,
  "hasReport" : false,
  "inputParameters" : {
    "line-count-3" : { "integer" : 100 }
  },
  "notifications" : { },
  "inputResources" : {
    "file-upload-1" : "file:/tmp/example.txt"
  },
  "outputValues" : {
    "file-download-7" : null
  },
  "owner" : "knime",
  "state" : "IDLE",
  "name" : "/File-HEAD-Example job 88adc60e-0440-465e-801c-82066555abcc",
  "@namespaces" : {
    "knime" : { "name" : "http://www.knime.com/server/rels#" }
  }
}

The call has instantiated a new (idle) job with a given ID, and it also tells us what the expected input and output parameters are. Here, it's expecting a "line-count-3" input parameter (with default 100) and an input resource "file-upload-1". If all inputs were just plain simple parameters (like strings, numbers, or basic JSON), the response would also contain an example of how to draft a request. However, this workflow is a little more complicated as it also expects a file stream.
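A client does not have to hard-code any of this, since everything it needs is in the job response. As a rough Python sketch (the helper names are ours, and the sample is a trimmed copy of the response above), the following collects the part names the job expects and expands the "{?async}" URL template. We assume that "true" is an accepted value for the async flag; check the billboard for the exact semantics.

```python
import json

# Trimmed copy of the job-creation response shown above.
JOB_RESPONSE = """
{
  "@controls": {
    "knime:execute-job": {
      "href": "<server>/rest/v4/jobs/88adc60e-0440-465e-801c-82066555abcc{?async}",
      "isHrefTemplate": true,
      "type": "multipart/form-data",
      "method": "POST"
    }
  },
  "id": "88adc60e-0440-465e-801c-82066555abcc",
  "inputParameters": {"line-count-3": {"integer": 100}},
  "inputResources": {"file-upload-1": "file:/tmp/example.txt"},
  "state": "IDLE"
}
"""


def expected_parts(job):
    """Names of all inputs (JSON parameters and file resources) the job accepts."""
    names = list(job.get("inputParameters", {})) + list(job.get("inputResources", {}))
    return sorted(names)


def execute_url(job, run_async=False):
    """Expand the execute-job href; "{?async}" is an optional query parameter."""
    control = job["@controls"]["knime:execute-job"]
    href = control["href"]
    if control.get("isHrefTemplate"):
        # Assumption: the server accepts async=true; omitting the
        # parameter gives the synchronous call used in this post.
        href = href.replace("{?async}", "?async=true" if run_async else "")
    return href


job = json.loads(JOB_RESPONSE)
part_names = expected_parts(job)
url = execute_url(job)
print(part_names)
print(url)
```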

3. Draft the payload of the "mixed content" type and invoke the service

In order to execute the workflow we need to make another POST request using a "multipart/form-data" input. How to compose this multipart payload is described in detail in the billboard. All higher-level languages support the creation of such payloads via a well-defined API, but since we want to use curl here, we construct the payload by hand in a file (/tmp/rest-call.data in the invocation below). The file content is:

--bounds-1234
Content-Type: application/octet-stream
Content-Disposition: form-data; name="line-count-3"

{"integer":2}
--bounds-1234
Content-Type: application/octet-stream
Content-Disposition: form-data; name="file-upload-1"

test line 1
test line 2
test line 3
test line 4
test line 5
--bounds-1234--

The boundary, Content-Type, and Content-Disposition lines describe the metadata. The payload contains two parts, and the name of each part corresponds to one of the identifiers reported when the job was created ("line-count-3" and "file-upload-1"). It's important to note that the Content-Type of each part is (currently) ignored; the content is expected to match the type defined by the workflow (so simple parameters need to contain JSON, and file uploads, aka resources, can be binary). In the example the line count parameter has value 2 and the file to be uploaded is a plain text file with "test line 1", ..., "test line 5".
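If you would rather not write this file by hand, the same payload can be assembled in a few lines. The sketch below uses only the Python standard library; build_multipart is a hypothetical helper of ours, not part of any KNIME client, and it simply mirrors the layout above (one part per JSON parameter, one part per file, CRLF line endings as multipart/form-data prescribes).

```python
import json
import uuid


def build_multipart(parameters, resources, boundary=None):
    """Return (content_type_header, body_bytes) for a multipart/form-data request.

    parameters: dict of part name -> JSON-serialisable value
    resources:  dict of part name -> raw bytes of the file to upload
    """
    boundary = boundary or "bounds-" + uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")

    # JSON parameters are serialised; file resources are passed through as-is.
    parts = [(name, json.dumps(value).encode("utf-8")) for name, value in parameters.items()]
    parts += list(resources.items())

    chunks = []
    for name, payload in parts:
        chunks += [
            delimiter,
            b"Content-Type: application/octet-stream",
            b'Content-Disposition: form-data; name="' + name.encode("ascii") + b'"',
            b"",  # empty line separates the part headers from the part content
            payload,
        ]
    chunks += [delimiter + b"--", b""]  # closing delimiter plus final line break
    return "multipart/form-data; boundary=" + boundary, b"\r\n".join(chunks)


content_type, body = build_multipart(
    parameters={"line-count-3": {"integer": 2}},
    resources={"file-upload-1": b"test line 1\ntest line 2\ntest line 3\n"},
    boundary="bounds-1234",
)
print(content_type)
print(body.decode("utf-8"))
```

The returned header value goes into the Content-Type header of the POST request and the bytes into its body. The boundary must not occur anywhere in the uploaded file, which is why the helper defaults to a random one.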

The (synchronous) invocation via curl looks like this:

~ $ curl -X POST -k -H "Content-Type: multipart/form-data; boundary=bounds-1234" \
      --data-binary @/tmp/rest-call.data -u 'knime:knime' \
      <server>/rest/v4/jobs/88adc60e-0440-465e-801c-82066555abcc
{
  "@controls" : {
    ...
    "knime:output-resource" : {
      "href" : "<server>/rest/v4/jobs/88adc60e-0440-465e-801c-82066555abcc/output-resources/{resourceId}",
      "isHrefTemplate" : true,
      "method" : "GET"
    },
    ....
  },
  "id" : "88adc60e-0440-465e-801c-82066555abcc",
  ...
  "outputResources" : {
    "file-download-7" : "output.txt"
  },
  "owner" : "knime",
  "state" : "EXECUTED",
  "name" : "/File-HEAD-Example job 88adc60e-0440-465e-801c-82066555abcc",
  ...
}

 

4. Read the result file

The output generated by the call indicates the workflow is now fully executed (state: EXECUTED) and it also points us to the resulting output file, which is to be retrieved via GET on .../rest/v4/jobs/88adc60e-0440-465e-801c-82066555abcc/output-resources/{resourceId}, whereby the only valid resource ID is "file-download-7". Downloading the file and looking into it reveals this content:

~ $ curl -u 'knime:knime' <server>/rest/v4/jobs/88adc60e-0440-465e-801c-82066555abcc/output-resources/file-download-7
"test line 1"
"test line 2"

This is the expected output, as we wanted the first two lines of the uploaded file. Note also that if the workflow has an associated report, it can be downloaded in a similar way (and, as usual, you will find the URL pointing to the report file in the response when executing the workflow).
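The download URL can also be derived from the execution response instead of being typed in. Here is a small Python sketch, using a trimmed copy of the response from step 3. The helper name is hypothetical, and we assume that the values in "outputResources" ("output.txt" here) are the file names suggested by the workflow.

```python
from urllib.parse import quote

# Trimmed copy of the response returned by the execute call in step 3.
EXECUTED_JOB = {
    "@controls": {
        "knime:output-resource": {
            "href": "<server>/rest/v4/jobs/88adc60e-0440-465e-801c-82066555abcc"
                    "/output-resources/{resourceId}",
            "isHrefTemplate": True,
            "method": "GET",
        }
    },
    "outputResources": {"file-download-7": "output.txt"},
    "state": "EXECUTED",
}


def output_resource_urls(job):
    """Map each output file name to the URL its content can be fetched from."""
    if job.get("state") != "EXECUTED":
        raise ValueError("job has not finished executing: %s" % job.get("state"))
    template = job["@controls"]["knime:output-resource"]["href"]
    return {
        # Assumption: the value is the file name, the key is the resource ID.
        file_name: template.replace("{resourceId}", quote(resource_id, safe=""))
        for resource_id, file_name in job["outputResources"].items()
    }


urls = output_resource_urls(EXECUTED_JOB)
for file_name, url in urls.items():
    print(file_name, "<-", url)
```

Each URL is then fetched with an authenticated GET request, exactly like the curl call above.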

5. Discard the workflow job

The above workflow can now be called multiple times with different input. The server takes care of resetting just the right input nodes and populating the web service arguments (so the line count and the input file in this example). This pattern of calling the workflow is very convenient in case there are many different inputs to be processed, e.g. a predictor workflow would be spawned once and then called iteratively many times with just the new data to be predicted. The rest of the workflow (e.g. the part that loads the predictive model) is only executed once.

When no more REST calls are needed, the job should be discarded. This is done with a DELETE request, which is also documented in the "@controls" section of the response returned when spawning and querying the job:

~ $ curl -X DELETE -k -u 'knime:knime' <server>/rest/v4/jobs/88adc60e-0440-465e-801c-82066555abcc
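For completeness, the same clean-up step from Python, using only the standard library. This is a sketch under the assumptions of this post: basic authentication with the example "knime:knime" credentials, the job ID from above, and the example server address; discard_job_request is a hypothetical helper name. The request is only constructed here. Sending it needs a reachable server and, for a self-signed certificate, an SSL context that plays the role of curl's -k.

```python
import base64
import urllib.request


def discard_job_request(base_url, job_id, user, password):
    """Build (but do not send) the DELETE request that discards a job."""
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/rest/v4/jobs/{job_id}",
        headers={"Authorization": "Basic " + credentials},
        method="DELETE",
    )


request = discard_job_request(
    "https://server.company.name:8443/com.knime.enterprise.server",
    "88adc60e-0440-465e-801c-82066555abcc",
    "knime",
    "knime",
)
print(request.get_method(), request.full_url)

# To actually discard the job (requires a running server):
# urllib.request.urlopen(request)
```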