Tidy Data
Tidy datasets have the following structure:
- Each variable is a column.
- Each observation is a row.
- Each type of observational unit is a table.
A dataset is a collection of values (numbers, strings, etc.). Every value belongs to a variable and an observation.
A variable contains all values that measure the same underlying attribute (e.g., a metric, a score, or a temperature).
An observation contains all values measured on the same unit (e.g., a person or a day) across all attributes.
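As a minimal sketch (Python standard library only), a tidy dataset can be held as a list of records: each dict is one observation and each key names one variable, so reading a column means collecting one key across all rows. The people, days, and temperatures are hypothetical toy values, and the helper names `column` and `observation` are illustrative, not from any library.

```python
# Hypothetical toy data: the observational unit is one person on one day.
tidy = [
    {"person": "Ann", "day": "Mon", "temperature": 36.6},
    {"person": "Ann", "day": "Tue", "temperature": 36.9},
    {"person": "Bob", "day": "Mon", "temperature": 37.1},
]

# Each variable is a column: every row carries the same set of keys.
variables = set(tidy[0])
assert all(set(row) == variables for row in tidy)


def column(rows, variable):
    """Collect every value of one variable, i.e. one column."""
    return [row[variable] for row in rows]


def observation(rows, person, day):
    """Return the single row holding all values measured on one unit."""
    matches = [r for r in rows if r["person"] == person and r["day"] == day]
    # Each observation is a row: exactly one match per (person, day).
    assert len(matches) == 1
    return matches[0]


print(column(tidy, "temperature"))  # [36.6, 36.9, 37.1]
print(observation(tidy, "Bob", "Mon"))
# {'person': 'Bob', 'day': 'Mon', 'temperature': 37.1}
```

Because each value sits at exactly one (row, key) position, a question about one attribute touches one key and a question about one unit touches one row.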
Common patterns of messy (non-tidy) datasets include:
- Column headers are values, not variable names.
- Multiple variables are stored in one column.
- Variables are stored in both rows and columns.
- Multiple types of observational units are stored in the same table.
- A single observational unit is stored in multiple tables.
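The first two patterns often appear together, for example when column headers such as `m_0-14` encode both a sex and an age range. Below is a minimal standard-library Python sketch of fixing them: a hypothetical `melt` helper turns value-bearing headers into rows, and a hypothetical `split_column` helper separates the combined variable into two columns. The countries, year, and counts are made-up toy values.

```python
# Hypothetical messy table: headers like "m_0-14" are values, not variable
# names, and each one packs two variables (sex and age range) together.
messy = [
    {"country": "A", "year": 2000, "m_0-14": 5, "f_0-14": 7},
    {"country": "B", "year": 2000, "m_0-14": 2, "f_0-14": 3},
]


def melt(rows, id_vars, var_name, value_name):
    """Turn every non-identifier column into its own (header, value) row."""
    out = []
    for row in rows:
        ids = {k: row[k] for k in id_vars}
        for key, value in row.items():
            if key not in id_vars:
                out.append({**ids, var_name: key, value_name: value})
    return out


def split_column(rows, column, into, sep):
    """Split one combined column into several, keeping its position."""
    out = []
    for row in rows:
        new = {}
        for key, value in row.items():
            if key == column:
                # "m_0-14".split("_", 1) -> ["m", "0-14"]
                new.update(zip(into, value.split(sep, len(into) - 1)))
            else:
                new[key] = value
        out.append(new)
    return out


long = melt(messy, ["country", "year"], var_name="sex_age", value_name="cases")
tidy = split_column(long, "sex_age", into=["sex", "age"], sep="_")
for row in tidy:
    print(row)
# {'country': 'A', 'year': 2000, 'sex': 'm', 'age': '0-14', 'cases': 5}
# {'country': 'A', 'year': 2000, 'sex': 'f', 'age': '0-14', 'cases': 7}
# {'country': 'B', 'year': 2000, 'sex': 'm', 'age': '0-14', 'cases': 2}
# {'country': 'B', 'year': 2000, 'sex': 'f', 'age': '0-14', 'cases': 3}
```

In pandas, the same two steps are commonly done with `DataFrame.melt` and `Series.str.split(..., expand=True)`; the hand-written helpers here only make each step explicit.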