Each cluster is based on three Gaussian-distributed values, which together form the final cluster. The workflow demonstrates how the data generation nodes can be used in combination to generate a complex data set. First, each data point is assigned to a cluster. Then, for each cluster, the three dimensions are generated based on a Gaussian distribution. Finally, the stresser node is used to add some random noise.
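The three steps above can be sketched in NumPy as follows; the cluster centers, counts, and noise levels are invented for illustration, since the workflow's actual parameters are not given here.

```python
import numpy as np

rng = np.random.default_rng(42)

n_points, n_clusters = 300, 3
# Hypothetical cluster centers (the real workflow configures these in its nodes).
centers = rng.uniform(-10.0, 10.0, size=(n_clusters, 3))

# Step 1: assign each data point to a cluster.
labels = rng.integers(0, n_clusters, size=n_points)

# Step 2: generate the three dimensions from a Gaussian around each cluster center.
points = centers[labels] + rng.normal(0.0, 1.0, size=(n_points, 3))

# Step 3: "stress" the data by adding some extra random noise.
points += rng.normal(0.0, 0.2, size=points.shape)
```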
Similar to flow 01_Generating_clusters_with_Gaussian_distribution, this flow generates three clusters with three dimensions. Here, however, they are generated in parallel and joined afterwards.
Generates a sample model database: for each model we create, e.g., the height, shoe size, agency, and number of jobs.
Shows how missing values can be randomly added to cells of a column.
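One way to sketch this in pandas: draw a random mask over the column and blank out the selected cells. The column name and the 10% missing rate are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"height": rng.normal(175.0, 10.0, size=100)})

# Select roughly 10% of the cells at random and replace them with missing values.
mask = rng.random(len(df)) < 0.1
df.loc[mask, "height"] = np.nan
```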
Split the table into two parts, delete the column in one part, and concatenate the two parts again.
This workflow generates three types of clusters. The first has the form of a boomerang, while the second simulates a T and the third resembles a cup.
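Such non-spherical clusters can be built by sampling points along a parametric shape and adding Gaussian noise. The sketch below shows only the "boomerang" case with invented parameters; the T and cup shapes would follow the same pattern with different curves.

```python
import numpy as np

rng = np.random.default_rng(7)

# A "boomerang" cluster: points along a half-circle arc, plus Gaussian noise.
t = rng.uniform(0.0, np.pi, size=200)
x = np.cos(t) + rng.normal(0.0, 0.1, size=200)
y = np.sin(t) + rng.normal(0.0, 0.1, size=200)
boomerang = np.column_stack([x, y])
```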
Combines two tables randomly: the rows of table 1 are randomly filled with rows of table 2.
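A minimal sketch of this random combination, with invented table contents: each row of table 1 is paired with a randomly chosen row of table 2.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
table1 = pd.DataFrame({"basket_id": range(5)})
table2 = pd.DataFrame({"product": ["apple", "chips", "gummi bears"]})

# For each row of table 1, pick a random row index into table 2.
picks = rng.integers(0, len(table2), size=len(table1))
combined = table1.assign(product=table2["product"].to_numpy()[picks])
```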
Manipulate each part of the data individually and rejoin the data.
When manipulating data, some operations have to be performed per nominal value. This workflow demonstrates how that can be done.
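The split-apply-rejoin pattern per nominal value can be sketched with a pandas group-by; the column names and the centering operation are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "agency": ["A", "A", "B", "B", "B"],
    "jobs":   [3, 5, 2, 4, 6],
})

# Split by the nominal value, manipulate each group (here: center the values),
# and rejoin the result back into the original table.
df["jobs_centered"] = df.groupby("agency")["jobs"].transform(
    lambda s: s - s.mean()
)
```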
This workflow takes an existing set of shopping baskets, provided via a basket ID plus the product IDs. First, the support of apples, chips, and gummi bears is increased to obtain sufficient support for our rule. Then the rule is inserted by increasing its confidence: gummi bears are inserted into baskets where apples and chips are already present. Finally, the new rules are validated with the Association Rule Learner.
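The confidence-boosting step can be sketched as follows; the basket contents are invented. Inserting the consequent into every basket that already holds the antecedent raises the confidence of the rule {apple, chips} → {gummi bears} without touching unrelated baskets.

```python
# Hypothetical baskets: basket ID -> set of product IDs.
baskets = {
    1: {"apple", "chips"},
    2: {"apple", "chips", "bread"},
    3: {"milk"},
    4: {"apple", "chips"},
}

# Increase the rule's confidence: add the consequent ("gummi bears")
# to every basket that already contains the antecedent ({apple, chips}).
for items in baskets.values():
    if {"apple", "chips"} <= items:
        items.add("gummi bears")
```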
Shows how to generate a whole shopping basket, how to use existing information for it, and how everything can finally be exported for mining.
This workflow shows the advantage of quasi-random generation when generating multidimensional numerical data. As seen in the final Concatenate node, the data points are distributed more evenly.