## Library

```
library('tidyverse')
```

## Iteration by Hand

Iteration allows you to conduct the same operation on multiple inputs
without tediously copying and pasting code. To illustrate the need for
iteration, I’ll set up a repetitive task and complete it by hand. Once
we have a baseline of tediousness, I’ll complete the same task using a
for loop. After that, we’ll move to the purrr package’s `map` functions
for maximum-awesome.

Let’s compute a simple summary statistic on a dataset. First we’ll
create a dataframe with 4 variables: a, b, c, and d. Each of these
variables will contain 10 randomly generated numbers from the normal
distribution using the function `rnorm`.

```
df <- tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)
df
## # A tibble: 10 x 4
## a b c d
## <dbl> <dbl> <dbl> <dbl>
## 1 -0.554 -0.0567 -0.550 0.527
## 2 -0.364 0.697 -0.00842 -0.329
## 3 -0.247 0.395 0.179 -0.477
## 4 0.701 0.943 -0.760 -1.26
## 5 0.724 0.0190 -1.66 -0.0738
## 6 -0.0406 -0.427 -0.315 -0.645
## 7 1.16 -0.185 -0.931 0.529
## 8 0.498 -0.378 0.636 0.00134
## 9 -0.832 -0.594 0.355 -0.176
## 10 -0.786 0.468 0.882 -1.27
```

Numbers drawn from the standard normal distribution can range from around negative 3 to positive 3, but most of them fall between -1 and 1, as they do here. This is typical of the normal distribution. To see this distribution in action, check out my previous blog post, A Roll of the Dice.

Now to compute the mean of each of the variables a-d:

```
mean(df)
## Warning in mean.default(df): argument is not numeric or logical: returning
## NA
## [1] NA
```

Well that doesn’t work…

I guess we’ll have to specify each variable.

```
mean(df$a)
## [1] 0.02607439
mean(df$b)
## [1] 0.08807291
mean(df$c)
## [1] -0.2174574
mean(df$d)
## [1] -0.3171327
```

That’s better. But imagine if we have 100 variables that we want to
calculate the mean of… That would be awful! Good thing we have for loops
to rescue us from all of this *work*.

## For Loops

Let’s make a for loop, and use it to calculate the mean of each of these variables.

```
# Step 1: create the output vector
output_vector <- vector("double", ncol(df))
# Step 2: define the sequence to loop over
for (i in seq_along(df)) {
  # Step 3: the body, run once for each column
  output_vector[[i]] <- mean(df[[i]])
# close for loop
}
# display results
output_vector
## [1] 0.02607439 0.08807291 -0.21745738 -0.31713274
```

We’ve successfully created a for loop to iterate through the variables! It may seem like more work at first, and maybe it is for such a simple task as calculating the mean of 4 variables. But for loops are an integral programming technique and can be extremely useful, even if you’re a master at purrr.

Each for loop is broken down into three steps.

In the first step, you create an output vector to store your results.
It’s easy to create an empty vector of the correct size using the
`vector()` function. In the first argument of `vector()` you specify
what kind of vector to create. This can be any of the basic vector
types, e.g. integer, logical, character, or double. The second
argument of `vector()` is the length of the vector. Clever use of
`ncol()`, `nrow()`, or `length()` will give you the proper length for
your output vector.
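
As a quick illustration of that first step, here is a minimal sketch of what `vector()` hands back for a few common types. The variable names are made up for this example.

```r
# Pre-allocate empty vectors of a given type and length.
# (dbl_out, chr_out, lgl_out and lst_out are hypothetical names.)
dbl_out <- vector("double", 3)     # three zeros
chr_out <- vector("character", 2)  # two empty strings
lgl_out <- vector("logical", 2)    # two FALSE values
lst_out <- vector("list", 2)       # a list holding two NULLs

dbl_out
## [1] 0 0 0
chr_out
## [1] "" ""
```

Each "empty" vector is really filled with a placeholder value for its type, which the loop body then overwrites.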

The second step of a for loop is to define the sequence. Here we
determine what to loop over using base R’s `seq_along()` function, and
create the variable `i` as a counter variable.
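
To see what `seq_along()` actually produces, and one reason it is often preferred over writing `1:length(x)`, here is a small sketch. The vectors are invented for the example.

```r
# seq_along() returns one index per element of its input
x <- c("a", "b", "c")
idx <- seq_along(x)
idx
## [1] 1 2 3

# With an empty input the two approaches differ
empty <- character(0)
safe_idx <- seq_along(empty)   # zero-length: a for loop over it never runs
risky_idx <- 1:length(empty)   # 1:0 counts down, giving c(1, 0)
```

A loop over `risky_idx` would run twice on an input that has nothing in it, which is rarely what you want.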

The third and most complicated step of creating a for loop is the body of the loop. Here we describe to R exactly what we want to do as we loop through our defined dataframe. Sometimes we simply want to take the mean of each variable and store that number in our output vector. Other times we may want to insert a series of if-then statements, or generate graphics of the data we’re looping over. The possibilities are endless.
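
Here is a rough sketch of a loop body that contains a condition. The tibble `mixed` and the vector `col_means` are hypothetical and only serve this example: numeric columns get their mean, anything else gets `NA`.

```r
library(tibble)

# Hypothetical example data: two numeric columns and one character column
mixed <- tibble(
  x = c(1, 2, 3),
  y = c(10, 20, 30),
  z = c("a", "b", "c")
)

# Step 1: output vector
col_means <- vector("double", ncol(mixed))
# Step 2: sequence
for (i in seq_along(mixed)) {
  # Step 3: body, where a condition decides what happens to each column
  if (is.numeric(mixed[[i]])) {
    col_means[[i]] <- mean(mixed[[i]])
  } else {
    col_means[[i]] <- NA_real_
  }
}
col_means
## [1]  2 20 NA
```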

Great, so we’ve covered iteration by hand and a simple for loop. Now let’s get to the good stuff. The reason you came here. The purrr package!

## Purrr Map Function

In our quest to find the mean of each of the four variables in `df`, we
can use the most basic purrr function `map()`.

```
df %>%
  map(mean)
## $a
## [1] 0.02607439
##
## $b
## [1] 0.08807291
##
## $c
## [1] -0.2174574
##
## $d
## [1] -0.3171327
```

The function `map` takes two main arguments: a target to iterate over
and a function to apply during that iteration. `map` returns a list of
values, but if you don’t want a generic list you can use the variants
of the `map` function.

- `map_lgl()` returns a logical vector.
- `map_int()` returns an integer vector.
- `map_dbl()` returns a double vector.
- `map_chr()` returns a character vector.
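
To make the difference between the variants concrete, here is a small sketch on a made-up list (`items` is not part of the original example). Each variant expects the supplied function to return a single value of the matching type for every element.

```r
library(purrr)

# Hypothetical example list with elements of different types and lengths
items <- list(a = 1:3, b = c(2.5, 4), c = "text")

is_num  <- map_lgl(items, is.numeric)  # a = TRUE, b = TRUE, c = FALSE
lens    <- map_int(items, length)      # a = 3, b = 2, c = 1
classes <- map_chr(items, class)       # a = "integer", b = "numeric", c = "character"
```

The names of the input carry over to the output, which is why `map_dbl(df, mean)` labels its results with the column names.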

Here’s what `map_dbl()` looks like when applied to our favorite
iteration-situation.

```
df %>%
  map_dbl(mean)
## a b c d
## 0.02607439 0.08807291 -0.21745738 -0.31713274
```

That’s smooooth.

So let’s make it more complex! How about we create our own function and
test out the `map_chr` function. I want to run through a dataframe of
arbitrary size and return the word “positive”, “negative”, or “zero”
for each element, according to its sign. I want this function to be
applied to every variable in the dataframe.

First things first, let’s make the custom function!

```
# name the function 'classify_chr' with input = 'input'
classify_chr <- function(input) {
  # set a counter equal to 1
  i <- 1
  # create a save vector of input length
  save_vect <- vector("character", length(input))
  # main while loop, while counter is less than or equal
  # to input length, classify the elements and put that
  # character classification into the save vector.
  # Then, add one to the counter to move on to the next
  # element.
  while (i <= length(input)) {
    if (input[[i]] > 0) {
      save_vect[[i]] <- "positive"
    }
    if (input[[i]] < 0) {
      save_vect[[i]] <- "negative"
    }
    if (input[[i]] == 0) {
      save_vect[[i]] <- "zero"
    }
    i <- i + 1
  }
  # When the save vector is full of output, collapse the
  # results and separate them by a space. Finally, print
  # the resulting string. print() also returns it invisibly,
  # so it doubles as the function's return value.
  output <- str_c(save_vect, collapse = " ")
  print(output)
}
```

Alright. That function was relatively easy to create! Let’s test it out.

```
# test classify_chr function
classify_chr(-1)
## [1] "negative"
classify_chr(0)
## [1] "zero"
classify_chr(1)
## [1] "positive"
```

Single entries are working as expected. Let’s give the custom function a numeric vector and see what it does.

```
# test classify_chr on a numeric vector
a <- c(-1,0,1)
classify_chr(a)
## [1] "negative zero positive"
```

Looking good! How about we step it up to the final level of complexity: running through a tibble that has vectors of numbers as its variables.

```
# test classify_chr on a tibble
test_tib <- tibble(
  a = sample(-10:10, 10, replace = TRUE),
  b = sample(1:10, 10, replace = TRUE),
  c = 0
)
test_tib
## # A tibble: 10 x 3
## a b c
## <int> <int> <dbl>
## 1 -10 7 0
## 2 2 5 0
## 3 4 7 0
## 4 -10 5 0
## 5 -8 1 0
## 6 -10 10 0
## 7 1 7 0
## 8 5 1 0
## 9 10 5 0
## 10 0 8 0
classify(test_tib)
## Error in classify(test_tib): could not find function "classify"
```

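An Error! Oh no! Strictly speaking, this particular error is only a typo: the function is called `classify_chr`, not `classify`. But calling `classify_chr(test_tib)` directly doesn’t work either. The function would treat each whole column as a single element, and `if` needs a single TRUE or FALSE to test (recent versions of R stop with an error here, while older versions only warn and use the first value). That’s exactly why we went through all of this. That’s where purrr comes in! Let’s give it a shot.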

```
# use map_chr to apply custom function to a tibble
test_tib %>%
map_chr(classify_chr)
## [1] "negative positive positive negative negative negative positive positive positive zero"
## [1] "positive positive positive positive positive positive positive positive positive positive"
## [1] "zero zero zero zero zero zero zero zero zero zero"
## a
## "negative positive positive negative negative negative positive positive positive zero"
## b
## "positive positive positive positive positive positive positive positive positive positive"
## c
## "zero zero zero zero zero zero zero zero zero zero"
```

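Alright! Not the prettiest of outputs, but it does what it’s designed to do. The first three lines come from the `print()` call inside `classify_chr`, and the named character vector below them is what `map_chr` actually returns. And it’s a great illustration of when we need to use the map function.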
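
If you would rather keep one label per element instead of one long string per column, one possible alternative is a vectorized helper combined with plain `map()`. This is only a sketch: `classify_vec` and `small_tib` are hypothetical names, and nested `ifelse()` is just one of several ways to write it.

```r
library(purrr)
library(tibble)

# Hypothetical helper: classify every element of a numeric vector at once,
# without an explicit loop or a print() call.
classify_vec <- function(x) {
  ifelse(x > 0, "positive", ifelse(x < 0, "negative", "zero"))
}

# Hypothetical example data
small_tib <- tibble(a = c(-2, 0, 3), b = c(1, 5, 0))

# map() returns a list with one character vector per column
result <- map(small_tib, classify_vec)
result$a  # "negative" "zero" "positive"
result$b  # "positive" "positive" "zero"
```

Because nothing is printed inside the helper, the output only shows up once.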

### Purrr map2 & pmap Function

When we want to map over two arguments in our function, we can use the `map2` function. Say you want to generate a random number from the normal distribution with a specific mean and standard deviation.

Using the `rnorm` function, you could do something like this.

```
rnorm(n = 1, mean = 10, sd = 5)
## [1] 16.51869
```

There we’ve taken a number from the normal curve with mean (mu) 10 and
standard deviation (sigma) 5. Now say we want to do this 5 times each
with 10 different pairs of inputs. Instead of writing 50 `rnorm`
statements, `map2` can help us out.

```
# create a variety of mean inputs
mu <- rep(20:24, 2)
# create a variety of standard deviation inputs
sigma <- sample(1:5, size = 10, replace = TRUE)
# check the inputs
mu
## [1] 20 21 22 23 24 20 21 22 23 24
sigma
## [1] 3 5 1 2 5 1 3 4 1 5
# apply mu and sigma inputs to the rnorm function,
# produce 5 outputs for each pair of input arguments
map2(mu, sigma, rnorm, n = 5)
## [[1]]
## [1] 21.17404 22.56106 22.49158 22.15317 16.70522
##
## [[2]]
## [1] 19.08679 17.91927 21.51969 17.75422 12.80090
##
## [[3]]
## [1] 21.73921 20.89284 22.85076 22.23587 19.92050
##
## [[4]]
## [1] 25.29599 23.64171 23.27595 20.67421 21.70236
##
## [[5]]
## [1] 19.87137 15.18446 15.11524 19.59864 20.91303
##
## [[6]]
## [1] 20.39417 19.47562 18.83630 19.57695 19.72240
##
## [[7]]
## [1] 12.27347 19.89696 26.24954 18.58605 20.20357
##
## [[8]]
## [1] 14.32382 22.86421 18.62573 25.46024 21.88317
##
## [[9]]
## [1] 24.44066 23.07704 23.02711 23.51332 22.88832
##
## [[10]]
## [1] 28.77960 22.31481 33.42296 20.08290 29.53275
```
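
Since `rnorm` output is random, the pairing can be hard to see. Here is a deterministic sketch, with made-up inputs, showing that `map2` walks through its two inputs in parallel, and that extra named arguments such as `n = 5` are passed along to the function unchanged.

```r
library(purrr)

# Hypothetical inputs
base <- c(1, 2, 3)
exponent <- c(2, 3, 4)

# Elements are paired by position: (1, 2), (2, 3), (3, 4)
powers <- map2_dbl(base, exponent, function(x, y) x^y)
powers
## [1]  1  8 81

# The first input fills rnorm's mean, the second fills sd,
# and n = 5 is forwarded to every call
draws <- map2(c(0, 100), c(1, 10), rnorm, n = 5)
lengths(draws)
## [1] 5 5
```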

It’s easy to see how you could want 3 arguments, or 4 or 5. For the
generalized case of **p** inputs, we use the `pmap` function in much
the same way we would use `map2`.

As a final example, we want to again use the `rnorm` function to choose
a number from the normal distribution with a set mean and standard
deviation. This time, we also want to vary the number of outputs the
function returns by changing the `n =` argument in `rnorm`. Three
varied arguments is a job for `pmap`.

```
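# number of draws, means, and standard deviations
outs <- c(10, 15, 20)
mu <- c(5, 6, 7)
sigma <- c(1, 2, 3)
# the list is unnamed, so its elements are matched to
# rnorm's arguments by position: n, mean, sd
arguments <- list(outs, mu, sigma)
arguments %>%
  pmap(rnorm)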
## [[1]]
## [1] 5.752082 4.529751 5.569735 3.719392 2.931446 5.323930 3.992918
## [8] 3.253352 4.440078 6.312741
##
## [[2]]
## [1] 6.363228 4.626116 7.005189 7.429690 9.127814 6.009882 4.589295
## [8] 6.587369 6.612209 3.677911 6.607614 8.443922 5.242299 5.776972
## [15] 4.124226
##
## [[3]]
## [1] 5.1630993 9.4077378 0.4833954 6.3163107 3.2688190 9.4852517
## [7] 2.4587297 9.8187866 3.0978455 7.3941717 2.1337123 7.4064330
## [13] 7.5908101 6.4423399 9.2909349 8.8124667 12.9292212 8.6884592
## [19] 8.1997732 6.3380038
```
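
The call above relies on the order of the list matching the order of `rnorm`'s arguments. A slightly more defensive sketch names the list elements after the arguments `n`, `mean`, and `sd`, so the matching no longer depends on position. The variable names `named_args` and `named_draws` are made up for this example.

```r
library(purrr)

# Naming the elements after rnorm()'s arguments makes the matching explicit,
# so the order of the list no longer matters.
named_args <- list(
  sd = c(1, 2, 3),
  n = c(10, 15, 20),
  mean = c(5, 6, 7)
)
named_draws <- pmap(named_args, rnorm)

# One result per set of arguments, with the requested number of draws each
lengths(named_draws)
## [1] 10 15 20
```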

That’s all for now!

- Fisher
