Data Science Learnathon: From Raw Data to Deployment. The Data Science Cycle with KNIME, R, or Python

- HomeAway, 1800 Domain Blvd, Austin, TX, USA

This meetup will be a Learnathon: a session where we allow ourselves the luxury of learning new tools and new techniques. For this event, we will cover the whole data science cycle, from raw data to the final application on a production machine: data access, data blending, data preparation, model training, optimization, testing, and finally deployment. The tool of choice for this Learnathon will be KNIME Analytics Platform.

KNIME Analytics Platform is an open-source, GUI-driven data analytics platform that covers all your data needs, from data import to final deployment. Because it is open, KNIME Analytics Platform offers extensive integrations with R, Python, SQL, and Spark, including an IDE environment for scripting in them.

After an initial introduction to the tool and to the data science cycle, we will split into groups. Each group will focus on one of three aspects of the data science cycle:

  • Just pure raw data: data access and data preparation
  • Machine learning: which model should I use? Which parameters?
  • I have a great model. Now what? The deployment phase

Each participant can implement their part using KNIME Analytics Platform directly, or using R or Python from within the IDE provided inside KNIME Analytics Platform. In theory, it would also be possible to run the analysis in Spark. However, using Spark requires installing and connecting to an externally accessible Spark cluster. Because of this overhead, we will cover Spark at a future Learnathon event.

On our side, we will provide a few datasets, example workflows to complete according to the chosen task, experts in KNIME, R, and Python, and of course food and drinks. Please bring your own laptop with KNIME Analytics Platform pre-installed. To install KNIME Analytics Platform, follow the instructions in these YouTube videos:

If you would like to get familiar with KNIME Analytics Platform beforehand, you can explore the content of this e-learning course. In particular, we advise you to read and watch the units in Chapter 1.

Here is a more detailed agenda of the event.

  • 6:30 – 6:50pm: Introduction to KNIME Analytics Platform
  • 6:50 – 7:10pm: The data science cycle: from raw data to deployment
  • 7:10 – 7:20pm: Presentation of datasets and tasks; group formation
  • 7:20 – 10:00pm: Let’s work & learn!