How This Workflow Works
This workflow connects to Snowflake, retrieves and processes customer transaction and demographic data directly within the database, and uses the Elbow method to visually estimate a suitable number of customer groups. It then applies k-Means clustering to segment existing customers and assigns new customers to the resulting segments. Finally, the results are visualized and written back to Snowflake for further use.
Key Features:
- Load and transform customer data in Snowflake
- Identify meaningful segments using k-Means clustering
- Estimate a suitable number of customer groups with a visual method (the Elbow method)
- Assign new customers to existing segments at scale directly within Snowflake
- Visualize and assess the quality of customer segments for actionable insights
Step-by-step:
1. Extract Customer Data from Snowflake and Join It:
The workflow connects to a Snowflake database, retrieves transaction and demographic data, and combines it into a unified customer dataset.
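In the workflow this step runs inside Snowflake. The sketch below mirrors the logic locally with pandas so the shape of the unified dataset is easy to see. It assumes hypothetical tables keyed by `customer_id`, with an `amount` column for transactions and `age`/`income` for demographics, and it assumes transactions are aggregated per customer before the join. None of these names come from the workflow itself.

```python
import pandas as pd

# Hypothetical stand-ins for the Snowflake source tables (illustrative schema only).
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "amount": [20.0, 35.0, 12.5, 8.0, 40.0, 22.0],
})
demographics = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "age": [34, 51, 27, 45],
    "income": [52000, 71000, 38000, 60000],
})


def build_customer_table(tx: pd.DataFrame, demo: pd.DataFrame) -> pd.DataFrame:
    """Aggregate transactions per customer, then join them onto demographics."""
    per_customer = (
        tx.groupby("customer_id")["amount"]
        .agg(total_spend="sum", n_transactions="count")
        .reset_index()
    )
    # A left join keeps customers without transactions; their spend columns
    # become NaN and are left for the cleaning step to handle.
    return demo.merge(per_customer, on="customer_id", how="left")


customers = build_customer_table(transactions, demographics)
print(customers)
```

A left join from the demographics side is one design choice among several; an inner join would silently drop customers who have not transacted yet.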
2. Partition, Clean and Transform Data:
The dataset is partitioned into existing and new customers and then preprocessed: missing values are handled, outliers are managed, and numerical features are normalized. Normalization matters because k-Means relies on Euclidean distances, so features with large numeric ranges would otherwise dominate the clustering. These transformations give the algorithm clean, comparably scaled inputs.
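A minimal sketch of this step, assuming median imputation, clipping at the 5th/95th percentiles, and min-max scaling; the workflow's actual settings are not stated, and the `is_new` flag and column names are hypothetical. The statistics are learned from existing customers only and then reused for new customers, so both groups land in the same feature space as the cluster centroids.

```python
import numpy as np
import pandas as pd

# Hypothetical unified table; "is_new" marks customers to be assigned later.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "total_spend": [55.0, 12.5, np.nan, 900.0, 30.0, 41.0],
    "age": [34, 51, 27, np.nan, 45, 38],
    "is_new": [False, False, False, False, True, True],
})
FEATURES = ["total_spend", "age"]


def fit_preprocessor(existing, cols, lower_q=0.05, upper_q=0.95):
    """Learn imputation, clipping and scaling statistics from existing customers."""
    x = existing[cols]
    return {
        "median": x.median().to_numpy(),  # NaNs are skipped
        "lo": x.quantile(lower_q).to_numpy(),
        "hi": x.quantile(upper_q).to_numpy(),
    }


def transform(df, cols, stats):
    """Apply the learned statistics; returns a float array scaled to [0, 1]."""
    x = df[cols].to_numpy(dtype=float)
    # Median imputation, column by column (medians broadcast across rows).
    x = np.where(np.isnan(x), stats["median"], x)
    # Cap outliers at the learned quantiles.
    x = np.clip(x, stats["lo"], stats["hi"])
    # Min-max scaling; constant columns get a span of 1 to avoid dividing by zero.
    span = np.where(stats["hi"] > stats["lo"], stats["hi"] - stats["lo"], 1.0)
    return (x - stats["lo"]) / span


existing = customers[~customers["is_new"]]
new = customers[customers["is_new"]]
stats = fit_preprocessor(existing, FEATURES)
X_existing = transform(existing, FEATURES, stats)
X_new = transform(new, FEATURES, stats)
print(X_existing)
print(X_new)
```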
3. Determine Optimal Number of Clusters:
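The workflow applies the Elbow method to help estimate a suitable number of customer segments. Users visually inspect how clustering quality, typically measured as the within-cluster sum of squared distances, improves as the number of clusters grows, and pick the point (the "elbow") beyond which additional clusters yield only marginal gains. This choice balances simplicity and accuracy.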
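The Elbow curve can be sketched in NumPy as follows. This assumes the within-cluster sum of squares (WCSS) as the quality measure and uses Lloyd's k-Means with k-means++ seeding and a few restarts per k; the seeding, restart count and synthetic data are assumptions for illustration, not the workflow's configuration.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=100):
    """Lloyd's algorithm with k-means++ seeding; returns (centroids, labels, wcss)."""
    n = len(X)
    # k-means++: first centroid uniform, later ones weighted by squared distance.
    C = X[[rng.integers(n)]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        p = d2 / d2.sum() if d2.sum() > 0 else np.full(n, 1.0 / n)
        C = np.vstack([C, X[rng.choice(n, p=p)]])
    for _ in range(n_iter):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    wcss = float(d2[np.arange(n), labels].sum())
    return C, labels, wcss


def elbow_curve(X, k_values, n_init=5, seed=0):
    """Lowest WCSS over n_init restarts for each candidate k."""
    rng = np.random.default_rng(seed)
    return {k: min(kmeans(X, k, rng)[2] for _ in range(n_init)) for k in k_values}


# Synthetic features with three well-separated groups (illustrative only).
data_rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + data_rng.normal(scale=0.5, size=(50, 2)) for c in centers])

curve = elbow_curve(X, range(1, 7))
for k, wcss in curve.items():
    print(f"k={k}: WCSS={wcss:.1f}")
```

On this data the WCSS drops sharply up to k=3 and only slightly afterwards, which is the bend a user would read off the plot.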
4. Discover Customer Segments and Cluster New Data In-Database:
Using the selected number of clusters, the workflow applies the k-Means algorithm to group existing customers into segments. New customer data stored in Snowflake can then be assigned to the cluster with the nearest centroid directly within the database, allowing the segmentation strategy to scale as new data arrives.
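One way to score new customers in-database is to embed the fitted centroids in a SQL query, so Snowflake computes each row's squared Euclidean distance to every centroid and keeps the nearest. The sketch below generates such a query; it is an assumption about the mechanism, not the workflow's actual node setup, and the table, column and centroid values are hypothetical. It relies on Snowflake's `LEAST`, which returns NULL if any argument is NULL, so the feature columns should already be imputed and scaled with the same statistics as the training data. A plain-Python version of the same rule is included for cross-checking.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def nearest_centroid_sql(table, feature_cols, centroids, label_col="cluster_id"):
    """Return a SELECT that tags each row of `table` with its nearest centroid index."""
    # Only plain identifiers are interpolated, to keep the generated SQL safe.
    for name in [table, label_col, *feature_cols]:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if len(centroids) < 2:
        raise ValueError("need at least two centroids")
    dist_names, dist_exprs = [], []
    for j, centroid in enumerate(centroids):
        if len(centroid) != len(feature_cols):
            raise ValueError("centroid length must match the number of feature columns")
        # Squared Euclidean distance, written as (x - c) * (x - c) per feature.
        terms = " + ".join(
            f"({col} - ({float(v)!r})) * ({col} - ({float(v)!r}))"
            for col, v in zip(feature_cols, centroid)
        )
        dist_names.append(f"dist_{j}")
        dist_exprs.append(f"{terms} AS dist_{j}")
    least = f"LEAST({', '.join(dist_names)})"
    # The first matching WHEN wins, so ties go to the lowest cluster index.
    cases = " ".join(f"WHEN {d} = {least} THEN {j}" for j, d in enumerate(dist_names))
    return (
        f"SELECT *, CASE {cases} END AS {label_col} "
        f"FROM (SELECT *, {', '.join(dist_exprs)} FROM {table}) AS scored"
    )


def assign_nearest(rows, centroids):
    """Same rule in plain Python: index of the closest centroid (ties -> lowest index)."""
    def sq_dist(x, c):
        return sum((a - b) * (a - b) for a, b in zip(x, c))
    return [min(range(len(centroids)), key=lambda j: sq_dist(x, centroids[j])) for x in rows]


# Hypothetical centroids in the scaled (total_spend, age) feature space.
centroids = [[0.1, 0.2], [0.8, 0.9], [0.2, 0.85]]
print(nearest_centroid_sql("new_customers", ["total_spend", "age"], centroids))
print(assign_nearest([[0.15, 0.1], [0.5, 0.5]], centroids))  # → [0, 2]
```

Only the k centroid vectors travel to the database, so the cost of scoring grows with the number of new rows inside Snowflake rather than with data transferred to the client.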
5. Visualize and Assess Cluster Quality:
The segmented customer groups are visualized using interactive plots. Users can inspect the characteristics of each segment, evaluate clustering quality, and write clustered customers back to Snowflake for further analysis or reporting.
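The description does not say which quality measure is used. As one common option, the sketch below computes the mean silhouette coefficient (near 1 for compact, well-separated clusters; below 0 when points sit closer to another cluster than their own) together with a simple per-segment profile of sizes and feature means. The metric choice, feature names and toy data are assumptions for illustration.

```python
import numpy as np


def silhouette_score(X, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # singleton clusters score 0 by convention
        # a: mean distance to the other members of the own cluster (self adds 0).
        a = D[i, own].sum() / (own.sum() - 1)
        # b: mean distance to the closest other cluster.
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())


def segment_profile(X, labels, feature_names):
    """Size and per-feature mean of each segment, for comparing segments."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    return {
        int(c): {"size": int((labels == c).sum()),
                 **{name: float(X[labels == c, j].mean()) for j, name in enumerate(feature_names)}}
        for c in np.unique(labels)
    }


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
print(silhouette_score(X, [0, 0, 1, 1]))  # well separated: close to 1
print(silhouette_score(X, [0, 1, 0, 1]))  # poor split: negative
print(segment_profile(X, [0, 0, 1, 1], ["total_spend", "age"]))
```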