Tidy data

Updated on Oct 01, 2024

Tidy data is data obtained as the result of a process called data tidying. Tidying is one of the important cleaning processes in big data processing and a recognized step in the practice of data science. Tidy data sets have a consistent structure, which makes them easy to work with: they are easy to manipulate, model and visualize. The main concept behind a tidy data set is to arrange the data so that each variable is a column and each observation is a row.
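The sketch below reshapes a small table into tidy form using only the Python standard library. It assumes a "wide" table in which one variable (the year) is spread across the column headers. The country names, years and counts are hypothetical values chosen only for illustration. The helper `melt` is also hypothetical. It moves each header value into its own row, so that each variable (country, year, count) becomes a column and each observation (one country in one year) becomes a row.

```python
from typing import Dict, List

# Hypothetical "wide" (untidy) table: the year variable is stored in the
# column headers instead of in a column of its own.
wide_rows: List[Dict[str, object]] = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def melt(rows: List[Dict[str, object]], id_col: str,
         var_name: str, value_name: str) -> List[Dict[str, object]]:
    """Turn every non-id column into separate rows (hypothetical helper).

    The result has one column per variable (id_col, var_name, value_name)
    and one row per observation.
    """
    tidy: List[Dict[str, object]] = []
    for row in rows:
        for col, value in row.items():
            if col == id_col:
                continue  # the identifier is repeated on each new row instead
            tidy.append({id_col: row[id_col], var_name: col, value_name: value})
    return tidy


tidy_rows = melt(wide_rows, id_col="country", var_name="year", value_name="count")
for r in tidy_rows:
    print(r)
# {'country': 'A', 'year': '2019', 'count': 10}
# {'country': 'A', 'year': '2020', 'count': 12}
# {'country': 'B', 'year': '2019', 'count': 7}
# {'country': 'B', 'year': '2020', 'count': 9}
```

If pandas is available, `pd.melt(df, id_vars="country", var_name="year", value_name="count")` performs the same wide-to-long reshaping on a DataFrame.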

Tidy data provides standards and concepts for data cleaning, so there is no need to start from scratch and reinvent cleaning methods for each new data set.

Characteristics

In his book The Elements of Data Analytic Style, Jeff Leek summarizes the characteristics of tidy data in the following points:

Each variable you measure should be in one column
Each different observation of that variable should be in a different row
There should be one table for each “kind” of variable
If you have multiple tables, they should include a column that allows them to be linked
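The last two points can be illustrated with a small sketch. It stores two "kinds" of variables in separate tables: country-level attributes in one, and yearly measurements in the other. The tables are linked through a shared `country` column. All table contents and the `inner_join` helper are hypothetical, written in plain Python for illustration. The helper assumes the key is unique in the right-hand table.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical table of country-level attributes: one row per country.
countries: List[Row] = [
    {"country": "A", "region": "North"},
    {"country": "B", "region": "South"},
]

# Hypothetical table of measurements: one row per (country, year) observation.
counts: List[Row] = [
    {"country": "A", "year": 2019, "count": 10},
    {"country": "A", "year": 2020, "count": 12},
    {"country": "B", "year": 2019, "count": 7},
]


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Link two tables through a shared key column (an inner join).

    Assumes `key` is unique within `right`; rows of `left` whose key
    has no match in `right` are dropped.
    """
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


joined = inner_join(counts, countries, key="country")
for row in joined:
    print(row)
# {'country': 'A', 'year': 2019, 'count': 10, 'region': 'North'}
# {'country': 'A', 'year': 2020, 'count': 12, 'region': 'North'}
# {'country': 'B', 'year': 2019, 'count': 7, 'region': 'South'}
```

Keeping the region in its own table means it is recorded once per country rather than repeated on every yearly row. The linking column lets the two tables be combined whenever an analysis needs both. With pandas, `counts_df.merge(countries_df, on="country")` performs the same inner join.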

References

Tidy data Wikipedia

(Text) CC BY-SA