Data Generation

Generating clusters with Gaussian distribution

Each cluster is built from three Gaussian-distributed values, one per dimension. The workflow demonstrates how the data generation nodes can be combined to produce a complex data set. First, each data point is assigned to a cluster. Then, for each cluster, the three dimensions are generated from a Gaussian distribution. Finally, the Stresser node is used to add some random noise.
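
The three steps above can be sketched outside the workflow in plain Python. This is only an illustration under stated assumptions, not what the nodes do internally: the function name generate_clusters, its parameters, and the noise model (overwriting a small fraction of values with uniform random numbers) are all hypothetical.

```python
import random

def generate_clusters(n_points, centers, std_dev=0.5, noise_rate=0.05, seed=0):
    """Return a list of (cluster_index, [x, y, z]) rows.

    centers holds one (x, y, z) mean per cluster. All names and defaults
    here are illustrative assumptions, not taken from the workflow.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_points):
        # Step 1: assign the data point to a cluster.
        cluster = rng.randrange(len(centers))
        # Step 2: draw each of the three dimensions from a Gaussian
        # centred on that cluster's mean.
        point = [rng.gauss(mu, std_dev) for mu in centers[cluster]]
        # Step 3 (assumed noise model): with probability noise_rate,
        # overwrite a value with a uniform random number.
        point = [
            rng.uniform(-10.0, 10.0) if rng.random() < noise_rate else value
            for value in point
        ]
        rows.append((cluster, point))
    return rows

if __name__ == "__main__":
    data = generate_clusters(10, [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)])
    for cluster, point in data:
        print(cluster, [round(v, 2) for v in point])
```

Fixing the seed keeps the generated data reproducible; setting noise_rate to 0 yields clean clusters for comparison.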

Generating data sets containing association rules

This workflow takes an existing set of shopping baskets. Each basket is given as a basket ID together with its product IDs. First, the support of apple, chips and gummi bears is increased so that the rule has sufficient support. Then the rule is inserted by increasing its confidence: gummi bears are added to baskets that already contain apple and chips. Finally, the new rules are validated with the Association Rule Learner.
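
The same idea can be sketched in plain Python to make support and confidence concrete. This is a minimal illustration under assumptions: baskets are modelled as a dict mapping a basket ID to a set of product names, and the helper names (support, confidence, boost_support, insert_rule) and the fractions used are hypothetical, not part of the workflow.

```python
import random

def support(baskets, items):
    """Fraction of baskets that contain every item in items."""
    items = set(items)
    return sum(items <= basket for basket in baskets.values()) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(baskets, antecedent)
    if base == 0:
        return 0.0
    return support(baskets, set(antecedent) | set(consequent)) / base

def boost_support(baskets, items, fraction, rng):
    """Add each item independently to roughly `fraction` of the baskets."""
    for item in items:
        for basket in baskets.values():
            if rng.random() < fraction:
                basket.add(item)

def insert_rule(baskets, antecedent, consequent, fraction, rng):
    """Raise the rule's confidence by adding the consequent to baskets
    that already contain the whole antecedent."""
    antecedent = set(antecedent)
    for basket in baskets.values():
        if antecedent <= basket and rng.random() < fraction:
            basket.update(consequent)

if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical starting data: 100 baskets that each contain bread.
    baskets = {basket_id: {"bread"} for basket_id in range(100)}
    boost_support(baskets, ["apple", "chips", "gummi bears"], 0.3, rng)
    print("confidence before:", confidence(baskets, ["apple", "chips"], ["gummi bears"]))
    insert_rule(baskets, ["apple", "chips"], ["gummi bears"], 0.9, rng)
    print("confidence after:", confidence(baskets, ["apple", "chips"], ["gummi bears"]))
```

Raising support first matters because a rule whose antecedent appears in very few baskets would be filtered out by a minimum-support threshold, regardless of its confidence.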
