library('tidyverse')
It’s simple to create a tibble - instead of using base R’s
data.frame()
function, use tibble’s tibble()
function. If you’re
looking to coerce an object into a tibble, use as_tibble()
instead of
as.data.frame()
. The function as_tibble()
was created with speed in
mind, it is much quicker than the base R counterpart.
Using tibbles instead of data frames is an easy habit to form, and the
benefits of using tibbles make it time well spent. Tibbles never change
input types like data frames do, they also never adjust the names of
variables. Tibbles evaluate arguments lazily and sequentially, resulting
in more user-friendly structure creation and manipulation. They also
don’t use rownames()
and store variables as special attributes;
tibbles are a standardized data frame that consistently simplify the
user experience.
In addition to the previously mentioned benefits of tibbles, here are perhaps the three most important changes made from the outdated data frame.
Printing
Objects as a data.frame
will print every column in the data frame.
This behavior is rarely useful, so I’ve used the head()
function to
limit the output.
head(iris, n = 10)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
When an object is stored as a tibble, calling it will automatically limit the output to ten rows.
iris.tib <- as_tibble(iris)
iris.tib
## # A tibble: 150 x 5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <dbl> <dbl> <dbl> <dbl> <fct>
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## # … with 140 more rows
You’ll also notice that tibbles inform you on the data structures and
dimensions, data frames do not. If you want to view the entire dataset,
the View()
function in RStudio is a great option.
Subsetting
Tibbles are more strict on subsetting; remember that a single bracket
[
will produce another tibble (multiple vectors) and a double bracket
[[
will produce a single vector.
[
= Multiple Vectors[[
= Single Vector
You can also use the $
to pull single vector of information, but only
by its name.
When using $
within a tibble, don’t expect the partial matching
behavior that’s found in data frames.
df <- data.frame(abc = 1)
df$a
## [1] 1
df2 <- tibble(abc = 1)
df2$a
## Warning: Unknown or uninitialised column: 'a'.
## NULL
If you’re a fan of the magrittr pipe like I am, you’ll need to use the
special character .
to subset the tibble.
df <- tibble(
x = runif(5),
y = rnorm(5)
)
df %>% .$x
## [1] 0.8407117 0.5250497 0.9844013 0.3676194 0.2659309
df %>% .[["x"]]
## [1] 0.8407117 0.5250497 0.9844013 0.3676194 0.2659309
Recycling
My favorite from data frames is the lack of vector recycling in tibbles. Within data.frames, if a vector doesn’t fit the structures dimensions it is repeated or “recycled” until it does.
data.frame(a = 1:6, b = 1:2)
## a b
## 1 1 1
## 2 2 2
## 3 3 1
## 4 4 2
## 5 5 1
## 6 6 2
Tibbles don’t recycle vectors, unless they’re of length 1.
tibble(a = 1:6, b = 1:2)
## Tibble columns must have consistent lengths, only values of length one are recycled:
## * Length 2: Column `b`
## * Length 6: Column `a`
That’s all for now!
- Fisher
Comments