Tidyverse
The tidyverse is a collection of open-source packages for the R programming language introduced by Hadley Wickham and his team that "share an underlying design philosophy, grammar, and data structures" of tidy data. Characteristic features of tidyverse packages include extensive use of non-standard evaluation and the encouragement of piping.
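As a minimal sketch of these two features, the following R snippet chains several dplyr verbs with the pipe operator. It assumes the dplyr package is installed and uses the mtcars dataset that ships with R; the choice of columns and the names `result` and `mean_mpg` are illustrative only.

```r
library(dplyr)

# mtcars is a built-in example dataset (datasets package).
result <- mtcars %>%
  # Non-standard evaluation: `cyl` is looked up as a column of the
  # data frame, not as a variable in the calling environment.
  filter(cyl == 4) %>%
  group_by(gear) %>%
  # One output row per gear value, with the group mean and group size.
  summarise(mean_mpg = mean(mpg), n = n()) %>%
  arrange(desc(mean_mpg))

print(result)
```

Each step receives the output of the previous one as its first argument, so the pipeline reads top to bottom in the order the operations are applied. A roughly equivalent base R filter would be written `mtcars[mtcars$cyl == 4, ]`, with the data frame named explicitly each time a column is used.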
As of November 2018, the tidyverse package and some of its individual packages comprised 5 of the top 10 most downloaded R packages. The tidyverse is the subject of multiple books and papers. In 2019, a paper describing the ecosystem was published in the Journal of Open Source Software.
Its syntax has been referred to as "supremely readable", and some have argued that the tidyverse is an effective way to introduce complete beginners to programming, as pedagogically it allows students to quickly begin doing data processing tasks. Moreover, some practitioners have pointed out that data processing tasks are intuitively easier to chain together with the tidyverse than with pandas, Python's equivalent data processing package. There is also an active R community around the tidyverse. For example, the TidyTuesday social data project, organised by the Data Science Learning Community (DSLC), releases varied real-world datasets each week so that community members can participate, share, practice, and learn to work with data more easily. Critics of the tidyverse have argued that it promotes tools that are harder to teach and learn than their built-in, base R equivalents and that are too dissimilar to some other programming languages.
More generally, the tidyverse principles encourage a universe of streamlined packages, which in principle helps alleviate dependency issues and maintain compatibility with current and future features. An example of such a tidyverse-principled approach is the pharmaverse, a collection of R packages for clinical reporting in the pharmaceutical industry.
Packages
The core tidyverse packages, which provide functionality to model, transform, and visualize data, include:
- ggplot2 – for data visualization
- dplyr – for wrangling and transforming data
- tidyr – helps transform data into tidy data, where each variable is a column, each observation is a row, and each value is a cell
- readr – helps read in common delimited text files
- purrr – a functional programming toolkit
- tibble – a modern implementation of the built-in data frame data structure
- stringr – helps to manipulate string data types
- forcats – helps to manipulate categorical data types (factors)
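The tidy data layout that tidyr targets can be illustrated with a short sketch. The example below assumes the tidyr and tibble packages are installed; the table `wide`, its values, and the column names `year` and `cases` are hypothetical and chosen only for illustration.

```r
library(tibble)
library(tidyr)

# Hypothetical "wide" table: the year variable is spread across
# column names, so the table is not tidy.
wide <- tibble(
  country = c("A", "B"),
  `2019`  = c(10, 20),
  `2020`  = c(11, 22)
)

# pivot_longer() gathers the year columns into one `year` column and
# one `cases` column, giving one row per country-year observation.
long <- pivot_longer(
  wide,
  cols      = c(`2019`, `2020`),
  names_to  = "year",
  values_to = "cases"
)

print(long)
```

In the reshaped table each variable (country, year, cases) has its own column and each country-year observation has its own row, which is the form that other tidyverse functions such as those in dplyr and ggplot2 generally expect.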
Additional packages assist the core collection. Other packages based on the tidy data principles are regularly developed, such as tidytext for text analysis, tidymodels for machine learning, or tidyquant for financial operations.