We all know the terms Black Friday and Cyber Monday - the days following Thanksgiving Day in the United States when stores offer massive discounts on their goods. According to Wikipedia, Black Friday has routinely been the busiest shopping day of the year in the U.S. since 2005 (we're all familiar with footage of shoppers camping outside stores to be the first to secure the best deals). Black Friday's counterpart, Cyber Monday, is the day when you'll find so-called killer deals on tech items in online stores. It started when online stores began promoting their own sales to compete with those offered by brick-and-mortar shops on Black Friday. This year, retailers are predicting more online shoppers than ever before because of the coronavirus pandemic.
To cope with the influx of orders, businesses need a good system to generate their invoices. Today, we're going to help the accountants of a superstore company and generate invoices automatically based on order information. Each invoice contains the order ID, order date, customer name, ordered products and their costs, and possibly other information.
The company stores huge amounts of order data on Google Cloud Storage. We need to access, filter, and put the data into a shape that fits an invoice report. After that, we export the report as a pdf file. However, an official document from one person to another is always nicer, isn’t it? Therefore, we use the company internal data about customer contact persons and their regions of responsibility, blend this information with the region where the order was submitted, and add the person’s name in the footer of the pdf. The company internal data resides in the collaboration space of the company, Microsoft SharePoint.
Figure 1. Blending data from Google Cloud Storage and SharePoint Online, and exporting the results into a pdf file. In our example we automatically generate invoice reports based on order data and company internal data.
Topic: Generate invoice reports as pdfs
Challenge: Blend data from Google Cloud Storage and Microsoft SharePoint to access all required information for an invoice report. Export the invoice report as a pdf file.
Access mode: CSV files
Integrated Tools: Microsoft SharePoint, Google Cloud Storage
The superstore company sells furniture, technology, and office supplies in North America to both corporate and consumer customers. For each order, we have the order ID, order date, customer name and location, and the sales amount and profit. In addition, we have company internal data containing the names of the persons responsible for the West/East/Central/South regions. The original Sample-Superstore.xls file is provided by Tableau.
Step-by-step guide to accessing Google Cloud Storage
- We start by accessing the order data available on Google Cloud Storage, which is one of the services in Google Cloud Platform. We can access the data with the Google Cloud Storage Connector and Google Authentication (API Key) nodes.
- In the configuration dialog of the Google Authentication (API Key) node, we provide the service account email and P12 key file. We can get these by following the steps described in detail in the Tutorial: Importing Bike Data from Google BigQuery blog post: signing in with a Google account, creating a project, creating a service account, and downloading the API key in P12 format.
- Further below in the configuration dialog of the Google Authentication (API Key) node, in the scopes field, we select the Google services, such as Cloud Storage, Drive, and Sheets, that are granted for this connection. We select “Google Cloud Storage (Read)” in the dropdown menu, and click the Add button. When we do, we see the following line in the scopes field: https://www.googleapis.com/auth/devstorage.read_only
- Next, we access the order data with the Google Cloud Storage Connector node. In the configuration dialog, we provide the project ID, and select a working directory where the subsequent Reader nodes in the workflow start browsing for files. The project ID is shown in the top left corner of the Google Cloud Platform dashboard. Our data resides in the “Superstore” bucket in the “Orders” folder, so we write “/Superstore/Orders” in the working directory field, and click OK.
- Now, we’re ready to read the data into KNIME! We simply use the CSV Reader (Labs) node, and connect the Google Cloud Storage Connector node to its optional File System Connection input port. In the configuration dialog of the CSV Reader (Labs) node, we click the Browse button, and select the Sample-Superstore.csv file.
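Outside of KNIME, the same read could also be scripted. Below is a minimal sketch using the official google-cloud-storage Python client; the project, bucket, folder, and file names mirror the example above, and valid service-account credentials are assumed to be configured:

```python
def gcs_object_path(folder: str, filename: str) -> str:
    """Build the object name for a file inside a 'folder' of a bucket."""
    return f"{folder.strip('/')}/{filename}"

def read_orders_csv(project_id: str, bucket_name: str, object_name: str) -> str:
    """Download a CSV object from Google Cloud Storage and return its text.

    Requires the google-cloud-storage package and application credentials
    (e.g. the service-account key used above) at call time.
    """
    from google.cloud import storage  # lazy import; only needed when actually downloading

    client = storage.Client(project=project_id)
    blob = client.bucket(bucket_name).blob(object_name)
    return blob.download_as_text()

# Hypothetical usage, mirroring the "/Superstore/Orders" working directory:
# csv_text = read_orders_csv("my-project-id", "Superstore",
#                            gcs_object_path("Orders", "Sample-Superstore.csv"))
```

This is only a sketch of the equivalent API calls, not what the Connector node does internally.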
Step-by-step guide to accessing data on SharePoint Online
- In this second step, we access the company internal data on Microsoft SharePoint, the collaboration space of the company. We can access SharePoint Online, one of the Microsoft 365 cloud services, with the Microsoft Authentication and SharePoint Online Connector nodes.
- In the configuration dialog of the Microsoft Authentication node, we have two options for authentication: interactive, or username and password. We select “Interactive authentication” in the dropdown menu, and click Login. A dialog opens asking for the Microsoft account credentials and security consent before proceeding. Agreeing creates the connection, confirmed by a green “logged in” message next to the Login button in the configuration dialog. Alternatively, we could select Username/password authentication in the dropdown menu, and write the Microsoft account credentials in the fields that activate.
- Further below in the configuration dialog we can define the reach of the authentication. The “memory” setting resets the connection when the workflow closes. The “file” setting enables using the same connection in multiple workflows. The “node” setting enables closing and opening the workflow without recreating the connection.
- Finally, we define the access rights for the connection. For this experiment, the SharePoint files (Read) access is enough.
- Next, we access the data in the collaboration space with the SharePoint Online Connector node. In its configuration dialog, we can connect directly to the “root site” (the uppermost folder level in the company’s SharePoint site), or to a Web URL such as superstore.sharepoint.com. If admin consent was given in the authentication step, we can also select a group site directly.
- We can select a working directory by clicking the Browse button next to the Working directory field, and the subsequent Reader nodes will browse for folders and files starting from this folder level.
- Finally, we use the CSV Reader (Labs) node to read the Superstore-People.csv file containing the names of the responsible persons for different regions.
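For comparison, the same file could be fetched programmatically through the Microsoft Graph API. The sketch below assumes an OAuth bearer token with at least the Files.Read delegated scope (comparable to the "SharePoint files (Read)" access above); the site ID and file path are hypothetical:

```python
def graph_download_url(site_id: str, file_path: str) -> str:
    """Build the Microsoft Graph URL that returns a file's content
    from a site's default document library."""
    return (f"https://graph.microsoft.com/v1.0/sites/{site_id}"
            f"/drive/root:/{file_path.strip('/')}:/content")

def download_sharepoint_file(access_token: str, site_id: str, file_path: str) -> bytes:
    """Download a file from SharePoint Online via Microsoft Graph."""
    import requests  # lazy import; the network call happens only here

    resp = requests.get(graph_download_url(site_id, file_path),
                        headers={"Authorization": f"Bearer {access_token}"})
    resp.raise_for_status()
    return resp.content

# Hypothetical usage:
# data = download_sharepoint_file(token, "superstore.sharepoint.com,<site-guid>",
#                                 "Superstore-People.csv")
```

Again, this only illustrates the underlying REST call, not the Connector node's implementation.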
Blending, wrangling, and exporting data into a pdf file
The data blending and preprocessing steps are shown in the Microsoft SharePoint meets Google Cloud Storage workflow in Figure 2, available on the KNIME Hub. After accessing the data, we start a chunk loop to handle only one order at a time. Next, we reshape the data for each order so that
- Each ordered product appears in a separate row
- The customer name, order date, and total value appear at the bottom of the table
- The order ID appears in the invoice report title
- The employee name appears in the footer
To achieve this, we join the employee name to the order data based on the region where the order was submitted. We concatenate the rows for the products, customer name, and total value, because this information is shown in the invoice table. We combine the string “Invoice:” with the order ID to customize the title of the invoice report. Finally, we provide the invoice table as the data input, and the employee name and report title as flow variables, to the Table to PDF node.
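The join-and-reshape logic is simple to express outside the workflow as well. Here is a minimal pure-Python sketch, with tiny inline sample rows standing in for the two CSV files (the field names and values are invented for illustration):

```python
# Tiny stand-ins for the two data sources (the real data comes from the CSVs).
orders = [
    {"order_id": "CA-2020-1001", "date": "2020-11-27", "customer": "Ann Lee",
     "region": "West", "product": "Desk Chair", "cost": 120.0},
    {"order_id": "CA-2020-1001", "date": "2020-11-27", "customer": "Ann Lee",
     "region": "West", "product": "Monitor", "cost": 250.0},
]
people = [{"region": "West", "person": "Anna Andrews"},
          {"region": "East", "person": "Ben Brown"}]

def build_invoice(order_rows, people_rows):
    """Join the responsible person by region and reshape one order:
    one row per product, customer/date/total at the bottom,
    order ID in the title, employee name in the footer."""
    person_by_region = {p["region"]: p["person"] for p in people_rows}
    first = order_rows[0]
    table = [(r["product"], r["cost"]) for r in order_rows]
    table.append(("Customer", first["customer"]))
    table.append(("Order date", first["date"]))
    table.append(("Total", sum(r["cost"] for r in order_rows)))
    return {"title": f"Invoice: {first['order_id']}",
            "table": table,
            "footer": person_by_region[first["region"]]}

invoice = build_invoice(orders, people)
print(invoice["title"])   # Invoice: CA-2020-1001
print(invoice["footer"])  # Anna Andrews
```

In the workflow, the chunk loop plays the role of calling this once per order ID.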
Figure 2. Blending data from Google Cloud Storage and Microsoft SharePoint, preprocessing the table into a shape that fits an invoice report, and exporting the table into a pdf with a custom title and footer. The procedure is automatically repeated for all orders in the data. The Microsoft SharePoint meets Google Cloud Storage workflow is available on the KNIME Hub.
The Table to PDF node generates a pdf (Figure 3), where the input table appears as an image, with a custom title and author. The pdfs are exported to the location specified in the configuration dialog - whatever kind of storage the accountants are using!
Figure 3. Example PDF invoice report generated with the Table to PDF node. The input data table is shown as an image in the middle, and the custom title and author can be provided as flow variables.
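Scripted alternatives to this export step exist in several Python PDF libraries. As one hedged example (not the Table to PDF node's actual output), a simple invoice page could be rendered with the fpdf2 package; the layout and helper names below are illustrative assumptions:

```python
def invoice_filename(order_id: str) -> str:
    """Name each pdf after the order ID, as the workflow does."""
    return f"invoice_{order_id}.pdf"

def write_invoice_pdf(title: str, rows, footer: str, path: str) -> None:
    """Render a one-page invoice: title on top, one line per table row,
    responsible person in the footer.

    Requires the fpdf2 package (pip install fpdf2).
    """
    from fpdf import FPDF  # lazy import; only needed when actually rendering

    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Helvetica", size=16)
    pdf.cell(0, 10, title, ln=True)
    pdf.set_font("Helvetica", size=11)
    for label, value in rows:
        pdf.cell(0, 8, f"{label}: {value}", ln=True)
    pdf.ln(4)
    pdf.set_font("Helvetica", "I", size=9)
    pdf.cell(0, 8, footer, ln=True)
    pdf.output(path)

# Hypothetical usage:
# write_invoice_pdf("Invoice: CA-2020-1001",
#                   [("Desk Chair", 120.0), ("Total", 370.0)],
#                   "Anna Andrews", invoice_filename("CA-2020-1001"))
```

The sketch only shows the shape of such an export; the node itself handles image rendering and flow-variable wiring for you.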
Yes, they blend! We generated invoice reports with data from two different sources, Google Cloud Storage and Microsoft SharePoint. We defined the layout of the invoice report with the preprocessing nodes in KNIME, and then let the workflow generate all invoice reports with one click, each one named according to the order ID. A lot of manual work was saved. So the next time you receive your invoice in record time, you might know the reason!