Section 3.4. Transformation: Conversion, Replacement, Standardization, and New Feature Generation

Data are often standardized and otherwise transformed before being stored, analyzed, or reported. String and date & time values are converted to a common style and format, numbers are normalized, and new features are created from the existing ones.
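As an illustration of date & time standardization, here is a minimal plain-Python sketch that parses dates written in several styles and rewrites them all in one format, ISO 8601 (YYYY-MM-DD). The helper `standardize_date` and the list `INPUT_FORMATS` are hypothetical. The sketch also assumes that slash-separated dates are day-first (so `03/12/2021` means 3 December) and that month names are in English; both choices would need to be confirmed for real data.

```python
from datetime import datetime

# Hypothetical list of input formats accepted by this sketch.
# Assumption: "%d/%m/%Y" means slash-separated dates are day-first.
# "%B" matches English month names under the default C locale.
INPUT_FORMATS = ["%d/%m/%Y", "%Y-%m-%d", "%B %d, %Y"]

def standardize_date(text: str) -> str:
    """Parse a date written in any known format and return it as YYYY-MM-DD."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"Unrecognized date format: {text!r}")

raw_dates = ["03/12/2021", "2021-12-03", "December 3, 2021"]
print([standardize_date(d) for d in raw_dates])
```

Trying the formats in a fixed order keeps the behavior predictable. Values that match none of them raise an error instead of being silently guessed.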

Typical string manipulation operations include extracting substrings, converting text to lower or upper case, and adding a prefix or suffix to string values.
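The three string operations just listed can be sketched in plain Python with built-in `str` methods and slicing. The product codes (a two-letter prefix, a hyphen, and four digits) and the `ID_`/`_v1` affixes are hypothetical values chosen only for illustration.

```python
# Hypothetical product codes with inconsistent letter case.
codes = ["ab-1234", "Cd-5678", "EF-9012"]

# Extract substrings: the letters before the hyphen and the digits after it.
prefixes = [c.split("-")[0] for c in codes]
numbers = [c[3:] for c in codes]  # assumes the fixed "XX-9999" layout

# Standardize case: convert every code to upper case.
upper_codes = [c.upper() for c in codes]

# Add a prefix and a suffix to each (already standardized) value.
labeled = ["ID_" + c + "_v1" for c in upper_codes]

print(prefixes)     # ['ab', 'Cd', 'EF']
print(numbers)      # ['1234', '5678', '9012']
print(upper_codes)  # ['AB-1234', 'CD-5678', 'EF-9012']
print(labeled)      # ['ID_AB-1234_v1', 'ID_CD-5678_v1', 'ID_EF-9012_v1']
```

Standardizing the case before adding affixes means that two codes differing only in case end up identical, which is usually the point of the exercise.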

Numbers can be subjected to mathematical transformations, such as normalization or a logarithmic transformation.
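Two common numeric transformations are sketched below in plain Python. The first is min-max normalization, which rescales values linearly to the range [0, 1] via (x - min) / (max - min). The second is a base-10 logarithm, which compresses values spanning several orders of magnitude. The function names and sample values are hypothetical, and min-max is only one of several normalization schemes (z-score normalization is another).

```python
import math

def min_max_normalize(xs):
    """Rescale values linearly to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]  # constant column: avoid division by zero
    return [(x - lo) / (hi - lo) for x in xs]

def log10_transform(xs):
    """Apply a base-10 logarithm; defined only for strictly positive values."""
    if any(x <= 0 for x in xs):
        raise ValueError("log transform requires strictly positive values")
    return [math.log10(x) for x in xs]

values = [1.0, 10.0, 100.0, 1000.0]
print(min_max_normalize(values))
print(log10_transform(values))
```

On these sample values the normalized results crowd near 0 because of the single large value, while the logarithm spreads them evenly. This difference is one reason to choose the transformation based on how the data are distributed.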

More generally, existing data can be transformed to generate new and, hopefully, more informative input features.
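As a small example of generating new features, the following plain-Python sketch derives extra columns from hypothetical order records. It combines two numeric fields into a total and extracts calendar information from a date. The field names (`order_date`, `unit_price`, `quantity`) and the derived features are assumptions chosen for illustration.

```python
from datetime import date

# Hypothetical order records; field names are illustrative only.
orders = [
    {"order_date": date(2021, 12, 3), "unit_price": 2.5, "quantity": 4},
    {"order_date": date(2022, 1, 15), "unit_price": 10.0, "quantity": 1},
]

for order in orders:
    # New numeric feature: combine two existing columns.
    order["total"] = order["unit_price"] * order["quantity"]
    # New calendar features derived from the date column.
    order["month"] = order["order_date"].month
    order["weekday"] = order["order_date"].weekday()  # 0 = Monday ... 6 = Sunday
    order["is_weekend"] = order["weekday"] >= 5

for order in orders:
    print(order)
```

The day of the week is stored as an integer rather than a name, which keeps the feature independent of locale settings.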

This section aims to cover as many of these transformation operations as possible, on strings, numbers, date & time values, and other data types.