How This Workflow Works
This workflow stacks several tables of fictitious records drawn from different distributions. It shows how to append the tables one after another while keeping either all columns (union) or only the columns they have in common (intersection), so you can decide how columns that are missing from some tables are handled.
Key Features:
- Combine multiple datasets into a single table for unified analysis
- Choose between stacking all available columns or only shared columns
- Handle columns that are missing from some tables flexibly when stacking
Step-by-step:
1. Combine Tables Using Intersection of Columns:
The workflow first demonstrates combining tables by including only the columns that all tables share. This method produces a cleaner, more consistent table, but excludes any columns that aren't present in every dataset.
2. Combine Tables Using Union of Columns:
Next, the workflow combines the tables by including all columns from each source. If a column is missing in one table, its values are set as missing in that portion of the stacked table. This approach ensures you retain all available columns from every dataset.
3. Repeat for Additional Data Sources:
The union-of-columns stacking is then repeated with a third dataset, showing how to add more input ports dynamically and combine all available data into one unified table.
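The three steps above can be illustrated outside the workflow with a minimal Python sketch. It is not the workflow's own implementation: the helper names `column_names` and `stack_tables`, the sample tables, and the use of `None` as the missing-value marker are all assumptions made for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]
Table = List[Row]


def column_names(table: Table) -> List[str]:
    """Return the column names of a table in first-seen order."""
    names: List[str] = []
    for row in table:
        for name in row:
            if name not in names:
                names.append(name)
    return names


def stack_tables(tables: List[Table], mode: str = "union") -> Table:
    """Append tables row-wise, keeping the union or intersection of columns."""
    per_table = [column_names(t) for t in tables]

    # Union: every column that appears in any table, in first-seen order.
    all_columns: List[str] = []
    for names in per_table:
        for name in names:
            if name not in all_columns:
                all_columns.append(name)

    if mode == "union":
        columns = all_columns
    elif mode == "intersection":
        # Keep only the columns that are present in every table.
        columns = [c for c in all_columns if all(c in names for names in per_table)]
    else:
        raise ValueError("mode must be 'union' or 'intersection'")

    # Rows are appended table by table; None marks a value that is missing
    # because the source table does not have that column.
    return [{c: row.get(c) for c in columns} for table in tables for row in table]


# Hypothetical sample tables standing in for the workflow's inputs.
table_a: Table = [{"id": 1, "name": "Ann", "age": 34}, {"id": 2, "name": "Bob", "age": 51}]
table_b: Table = [{"id": 3, "name": "Cho", "city": "Oslo"}, {"id": 4, "name": "Dev", "city": "Pune"}]
table_c: Table = [{"id": 5, "name": "Eve", "score": 0.7}]

shared_only = stack_tables([table_a, table_b], mode="intersection")  # step 1
all_columns_ab = stack_tables([table_a, table_b], mode="union")      # step 2
all_columns_abc = stack_tables([table_a, table_b, table_c])          # step 3
```

In this sketch the intersection keeps only `id` and `name`, while the union also keeps `age` and `city` and fills them with `None` for rows whose source table lacks the column. Adding the third table only means passing one more item in the list, which loosely mirrors adding another input port.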