Eliminating the Need to Manually Read Through 30,000 Work Order Descriptions
BGIS manages large capital projects for its clients, some of which involve targeted optimization of building systems and equipment to drive efficiency, in line with the company’s sustainability focus. This case focuses on a lighting retrofit carried out across hundreds of a client’s retail sites. After the retrofit, the client wanted to assess whether the value delivered had been accurately calculated, compare the results with the original business case, and use the findings to inform further retrofit decisions.
Clients accrue several benefits through lighting retrofits, including reductions in electricity consumption, GHG emissions and work order service calls. While electricity and GHG emission reductions are relatively easy to measure or calculate from utility meter bills, quantifying the savings from fewer work orders is complicated for several reasons. In this case, a work order represents an event in which a technician attends a building site to service or replace affected lights, and each record contains both numerical and free-text fields. To complicate matters further, a lighting retrofit isn’t always a switch to a completely new type of light bulb. It can be an upgrade to a newer model of the same type, which was the case here: fluorescent tube light type A1 was upgraded to type A2. This level of detail is not categorized in an easy-to-analyze way, yet the information is hidden in plain sight, in the text.
To extract this information, the traditional approach would have been for someone to manually read through the work order problem descriptions (30,000 of them, to be precise) and categorize them subjectively. Furthermore, not all sites participated in the retrofit project, and because the project spanned multiple years, sites were not all retrofitted at the same time. The typical way to analyze such a large volume of data is to aggregate it at the client level, which would dilute any savings at the retrofit (test) sites with results from the non-retrofit (control) sites.
In a nutshell: although the data existed in the database, extracting information from a large number of work orders and their text fields was complex. Proving retrofit savings simply wasn’t as easy as it might have appeared.
The client was relying on a detailed analysis to inform future lighting retrofit decisions, so the analysis had to be spot on to ensure the right decision.
Using Data Science to Deep Dive into Topics and Ensure Accuracy
The first step in defining an objective and efficient way to quantify savings was to establish a baseline: when did the retrofit happen, and which sites were retrofitted? Comparing costs on either side of the baseline for both retrofit (test) sites and non-retrofit (control) sites makes it possible to isolate the impact of the retrofit.
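The before/after comparison of test and control sites can be sketched as a difference-in-differences estimate. This is a minimal illustration in Python, not the workflow BGIS built in KNIME; it assumes monthly work order costs have already been totalled per site, treats each retrofit site’s retrofit month as its baseline, and assigns each control site a hypothetical reference month. The function names, data layout and numbers are illustrative assumptions.

```python
from statistics import mean


def site_change(monthly_costs, baseline_month):
    """Mean monthly cost from the baseline onward minus mean monthly cost before it.

    monthly_costs: {month_index: total work order cost for that month}.
    Counting the baseline month as "post" is an assumption of this sketch.
    """
    pre = [cost for month, cost in monthly_costs.items() if month < baseline_month]
    post = [cost for month, cost in monthly_costs.items() if month >= baseline_month]
    return mean(post) - mean(pre)


def diff_in_diff(sites, baselines, retrofit_sites):
    """Average change at retrofit (test) sites minus average change at control sites.

    sites: {site_id: {month_index: cost}}
    baselines: {site_id: baseline month}; for control sites this is a
        hypothetical reference month, since they were never retrofitted.
    retrofit_sites: set of site_ids in the test group.
    """
    test = [site_change(sites[s], baselines[s]) for s in sites if s in retrofit_sites]
    control = [site_change(sites[s], baselines[s]) for s in sites if s not in retrofit_sites]
    return mean(test) - mean(control)


# Illustrative data only: A and B are retrofitted in different months, C is a control.
sites = {
    "A": {0: 100, 1: 100, 2: 60, 3: 60},
    "B": {0: 90, 1: 90, 2: 90, 3: 50, 4: 50},
    "C": {0: 80, 1: 80, 2: 90, 3: 90},
}
baselines = {"A": 2, "B": 3, "C": 2}
print(diff_in_diff(sites, baselines, {"A", "B"}))  # -50: monthly cost change attributed to the retrofit
```

Using each site’s own retrofit month as its baseline handles the staggered rollout: a single calendar cut-off would mix pre- and post-retrofit months for sites retrofitted at different times. Subtracting the control sites’ change removes cost trends that affect all sites.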
Proving that the drop in costs was in fact driven by the change from fluorescent tube light type A1 to A2 was the next challenge, because that level of detail is not kept as a tabular record and is difficult to extract. The solution was to apply data science. Topic modelling, an unsupervised natural language processing (NLP) technique, was used to read through the descriptions and resolutions of all work orders to understand what issue occurred and what work was performed at the site. This technique categorized the service calls objectively and provided per-topic statistics that could be examined in depth to verify accuracy. For example, topic modelling detected a category of work orders (characterized by the terms “ceil, height, standard, fluoresce”) in which service calls had been initiated to change a fluorescent tube light at ceiling height. This activity was clearly within the scope of the retrofit, which aimed to reduce exactly this type of work order. Several other in-scope service call themes were also identified in the pre-retrofit period.
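The case does not name the topic-modelling algorithm. Latent Dirichlet allocation (LDA) is a common unsupervised choice, so the sketch below assumes LDA, fitted with collapsed Gibbs sampling in plain Python to stay self-contained. The stopword list, hyperparameters (`alpha`, `beta`) and sample work orders are illustrative assumptions, and unlike the stemmed terms quoted above (for example “fluoresce”), this sketch does not stem words.

```python
import random
import re
from collections import Counter

# Illustrative stopword list; a real pipeline would use a fuller list and a stemmer.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "is", "of", "on", "the", "to", "was"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens and drop stopwords (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def lda_gibbs(docs, k, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to token lists with collapsed Gibbs sampling.

    Returns (topic_words, doc_topics): topic_words[t] is a Counter of words
    assigned to topic t; doc_topics[d] holds the k topic proportions of doc d.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * k for _ in docs]            # tokens in doc d assigned to topic t
    topic_words = [Counter() for _ in range(k)]    # word counts per topic
    topic_total = [0] * k                          # total tokens per topic
    assign = []                                    # current topic of each token

    for d, doc in enumerate(docs):                 # random initial assignment
        assign.append([])
        for w in doc:
            t = rng.randrange(k)
            assign[d].append(t)
            doc_topic[d][t] += 1
            topic_words[t][w] += 1
            topic_total[t] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = assign[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][t] -= 1
                topic_words[t][w] -= 1
                topic_total[t] -= 1
                # Full conditional p(z = j | all other assignments), up to a constant.
                weights = [
                    (doc_topic[d][j] + alpha) * (topic_words[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                assign[d][i] = t
                doc_topic[d][t] += 1
                topic_words[t][w] += 1
                topic_total[t] += 1

    doc_topics = [
        [(doc_topic[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
        for d, doc in enumerate(docs)
    ]
    return topic_words, doc_topics


# Hypothetical work order descriptions.
orders = [
    "Replace fluorescent tube at ceiling height",
    "Fluorescent tube out, standard ceiling height",
    "Back door lock broken, replace lock",
    "Lock jammed on back door",
]
docs = [tokenize(o) for o in orders]
topic_words, doc_topics = lda_gibbs(docs, k=2)
for t, counts in enumerate(topic_words):
    print(t, [w for w, n in counts.most_common(4) if n > 0])
```

Each work order would then typically be assigned its highest-probability topic, and analysts would inspect the top words per topic to decide which topics fall within the retrofit’s scope. With only four short documents, the topics found here depend on the random seed.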
Topic modelling was conducted for both the pre-retrofit and post-retrofit periods to identify (1) the types and counts of work orders created, (2) whether the underlying issues were those the retrofit was designed to address, and (3) the associated costs of each topic.
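Once each work order carries a dominant topic and a pre- or post-retrofit label (relative to its own site’s baseline), these three questions reduce to a grouped count and sum. The sketch below assumes a simple list of `(phase, topic, cost)` tuples; the function names, the topic name `ceiling_fluorescent` and the costs are hypothetical.

```python
from collections import defaultdict


def topic_summary(work_orders):
    """Count work orders and total their cost per (phase, topic).

    work_orders: iterable of (phase, topic, cost), where phase is "pre" or
    "post" and topic is the order's dominant topic from the topic model.
    Returns {(phase, topic): (count, total_cost)}.
    """
    counts = defaultdict(int)
    costs = defaultdict(float)
    for phase, topic, cost in work_orders:
        counts[(phase, topic)] += 1
        costs[(phase, topic)] += cost
    return {key: (counts[key], costs[key]) for key in counts}


def in_scope_change(summary, in_scope_topics):
    """Post-minus-pre change in count and cost for each topic the retrofit targeted."""
    change = {}
    for topic in in_scope_topics:
        pre_n, pre_cost = summary.get(("pre", topic), (0, 0.0))
        post_n, post_cost = summary.get(("post", topic), (0, 0.0))
        change[topic] = (post_n - pre_n, post_cost - pre_cost)
    return change


# Hypothetical labelled work orders.
orders = [
    ("pre", "ceiling_fluorescent", 250.0),
    ("pre", "ceiling_fluorescent", 300.0),
    ("pre", "door_lock", 120.0),
    ("post", "ceiling_fluorescent", 200.0),
    ("post", "door_lock", 150.0),
]
summary = topic_summary(orders)
print(in_scope_change(summary, ["ceiling_fluorescent"]))  # {'ceiling_fluorescent': (-1, -350.0)}
```

If the pre- and post-retrofit periods differ in length, the totals would need to be converted to monthly averages before they are compared.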
The issues the retrofit was designed to address, like the example above, declined dramatically in both volume and overall priority after the retrofit. A comparison with non-retrofit sites further confirmed that the retrofit reduced costs. Topic modelling allowed BGIS to attribute these savings to the particular type of light bulb replaced under the retrofit.
At a high level, the lighting retrofit project was budgeted at approximately $4M.
The savings combined reductions in electricity consumption and in maintenance and repair (M & R) work order costs. Energy savings were measurable from the bills, while the M & R savings required the topic modelling approach. The annualized M & R savings alone, $420K, are substantial relative to the overall project budget and helped justify the project costs. Specific results were:
$420K worth of annual cost savings
$35K reduction in average monthly work order costs
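The two headline figures are consistent with each other, and their size relative to the approximately $4M budget can be worked out directly (M & R savings only; energy savings are not included):

```latex
\begin{aligned}
\$35\text{K/month} \times 12\ \text{months/year} &= \$420\text{K/year} \\
\frac{\$420\text{K/year}}{\$4\text{M}} &\approx 10.5\%\ \text{of the project budget per year}
\end{aligned}
```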
Potential for Even More Savings
For BGIS as a service provider, this analysis demonstrated that savings could have been higher had all sites been retrofitted: the sites that did not go through the retrofit in fact saw costs rise over time relative to the baseline. Additionally, future retrofits could be completed over shorter time frames to capture cost avoidance benefits sooner, leading to stronger business cases and greater savings for clients. From an analysis perspective, modern technology can quickly address common business problems in an objective, transparent, and repeatable fashion. A risk with analyses conducted at an aggregate level is that they can easily point in the wrong direction (Simpson’s paradox); in this case, an aggregated method could have understated the savings delivered to the client.
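The dilution effect of aggregating can be shown with a few lines of arithmetic. In this hypothetical example, two retrofit sites see costs fall while two control sites see them rise; pooling all four sites at the client level hides most of the drop. The site names, costs and function names are invented for illustration.

```python
from statistics import mean


def aggregate_change(sites):
    """Client-level view: pool every site's monthly costs before and after."""
    pre = mean(c for s in sites.values() for c in s["pre"])
    post = mean(c for s in sites.values() for c in s["post"])
    return post - pre


def group_change(sites, site_ids):
    """Site-level view: average of each listed site's own post-minus-pre change."""
    return mean(mean(sites[i]["post"]) - mean(sites[i]["pre"]) for i in site_ids)


# Hypothetical monthly costs: R1 and R2 were retrofitted; C1 and C2 were not.
sites = {
    "R1": {"pre": [100, 100], "post": [60, 60]},
    "R2": {"pre": [120, 120], "post": [70, 70]},
    "C1": {"pre": [90, 90], "post": [110, 110]},
    "C2": {"pre": [80, 80], "post": [100, 100]},
}
print(aggregate_change(sites))            # -12.5 per month across all sites
print(group_change(sites, ["R1", "R2"]))  # -45 per month at retrofit sites
print(group_change(sites, ["C1", "C2"]))  # 20 per month at control sites
```

The pooled figure shows a drop of 12.5 per month, while the retrofit sites alone fell by 45 and the control sites rose by 20, so the aggregate understates the retrofit’s effect in the same direction the case describes.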
A review of independent research papers identified KNIME as a Leader in Gartner’s 2018 Magic Quadrant for Data Science and Machine Learning Platforms, a position KNIME had held for the four preceding years. Additionally, KNIME’s total cost of ownership was dramatically lower than that of other vendors in the same quadrant.
As a brand-new tool for the organization, KNIME Analytics Platform was simple to learn thanks to an extensive example library (KNIME Hub), several free and paid online courses, a buzzing online community on the KNIME Forum (as is typical of many open source tools), and a responsive support team. A key reason for selecting KNIME was the no-fee, one-click download. Other data science tools were considered, but their high licensing fees quickly made the total cost of ownership unpalatable. Also, because KNIME works alongside existing tools rather than competing with them, it offered peace of mind that the business could keep using the tools it was familiar with, including Access, SQL and Tableau.
Getting started with KNIME was also very easy thanks to the free resources available online. Paras Gupta, Director, BI & Advanced Analytics at BGIS, went from “having zero experience to being an advanced user in under two weeks.”
Furthermore, in this specific case, the client was able to go back and justify further business cases, helping BGIS prove value and continue delivering value to clients.
Watch Paras Gupta present this topic at KNIME Fall Summit 2019.
Hear Paras explain the value KNIME delivers not only to clients, but also to BGIS.