For more details see R Functionals

A **higher-order function** is a function that takes a function as an input or returns a function as output. We’ve already seen one type of higher-order function: closures, functions returned by another function. The complement to a closure is a **functional**, a function that takes a function as an input and returns a vector as output. Here’s a simple functional: it calls the function provided as input with 1000 random uniform numbers.

randomise <- function(f) f(runif(1e3))

randomise(mean)

#> [1] 0.4903837

randomise(mean)

#> [1] 0.494241

randomise(sum)

#> [1] 505.3871

The chances are that you’ve already used a functional: the three most frequently used are **lapply(), apply(), and tapply()**. All three take a function as input (among other things) and return a vector as output.

A common use of functionals is as an alternative to for loops. **For loops** have a bad rap in R. They have a reputation for **being slow** (although that reputation is only partly true, see modification in place for more details). But the real downside of for loops is that they’re not very expressive. A for loop conveys that it’s iterating over something, but doesn’t clearly convey a high-level goal. **Instead of using a for loop, it’s better to use a functional**. Each functional is tailored for a specific task, so when you recognise the functional you know immediately why it’s being used. Functionals **play other roles as well as replacements for for-loops**. They are useful for **encapsulating common data manipulation tasks like split-apply-combine**, for thinking “functionally”, and for working with mathematical functions.

Functionals reduce bugs in your code by better communicating intent. Functionals implemented in base R are well tested (i.e., bug-free) and efficient, because they’re used by so many people. Many are written in C, and use special tricks to enhance performance. That said, **using functionals will not always produce the fastest code. Instead, they help you communicate clearly and build tools that solve a wide range of problems**. It’s a mistake to focus on speed until you know it’ll be a problem. Once you have clear, correct code you can make it fast using the techniques you’ll learn in improving the speed of your code.

Outline

- My first functional: lapply() introduces your first functional: lapply().
- For loop functionals shows you variants of lapply() that produce different outputs, take different inputs, and distribute computation in different ways.
- Data structure functionals discusses functionals that work with more complex data structures like matrices and arrays.
- Functional programming teaches you about the powerful Reduce() and Filter() functions which are useful for working with lists.
- Mathematical functionals discusses functionals that you might be familiar with from mathematics, like root finding, integration, and optimisation.
- Loops that shouldn’t be converted to functions provides some important caveats about when you shouldn’t attempt to convert a loop into a functional.
- A family of functions finishes off the chapter by showing you how functionals can take a simple building block and use it to create a set of powerful and consistent tools.

My first functional: lapply()

The simplest functional is lapply(), which you may already be familiar with. **lapply() takes a function, applies it to each element in a list, and returns the results in the form of a list**. lapply() is the building block for many other functionals, so it’s important to understand how it works.

lapply() is written in C for performance, but we can create a simple R implementation that does the same thing:

lapply2 <- function(x, f, ...) {

out <- vector("list", length(x))

for (i in seq_along(x)) {

out[[i]] <- f(x[[i]], ...)

}

out

}
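As a quick check on lapply2(), the following sketch runs it side by side with lapply() on a small, made-up list (xs is a hypothetical input, not from the text). It also shows that arguments given after f are forwarded to f through `...`, and one difference worth knowing: lapply() keeps the names of its input, while this simplified version does not.

```r
# Simple R re-implementation of lapply(), as defined in the text
lapply2 <- function(x, f, ...) {
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], ...)
  }
  out
}

# Hypothetical input: a named list of numeric vectors, one with an NA
xs <- list(a = c(1, 2, NA), b = c(4, 5, 6))

# na.rm = TRUE is passed on to mean() through `...` in both cases
res1 <- lapply(xs, mean, na.rm = TRUE)
res2 <- lapply2(xs, mean, na.rm = TRUE)

# The computed values agree once the names are set aside
identical(unname(res1), res2)
#> [1] TRUE

# lapply() preserves names; lapply2() returns an unnamed list
names(res1)
#> [1] "a" "b"
names(res2)
#> NULL
```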

From the code of lapply2(), you can see that **lapply() is a wrapper for a common for loop pattern**: create a container for output, apply f() to each component of a list, and fill the container with the results. All other for loop functionals are variations on this theme: they simply use different types of input or output.

lapply() makes it easier to work with lists by eliminating much of the boilerplate associated with looping. This allows you to focus on the function that you’re applying:

# Create some random data

l <- replicate(20, runif(sample(1:10, 1)), simplify = FALSE)

# With a for loop

out <- vector("list", length(l))

for (i in seq_along(l)) {

out[[i]] <- length(l[[i]])

}

unlist(out)

#> [1] 2 6 10 2 4 3 6 5 6 9 8 4 1 5 6 5 6 5 3 4

# With lapply

unlist(lapply(l, length))

#> [1] 2 6 10 2 4 3 6 5 6 9 8 4 1 5 6 5 6 5 3 4

(I’m **using unlist() to convert the output from a list to a vector** to make it more compact. We’ll see other ways of making the output a vector shortly.)

Since data frames are also lists, lapply() is also useful when you want to do something to each column of a data frame:

# What class is each column?

unlist(lapply(mtcars, class))

#>       mpg       cyl      disp        hp      drat        wt      qsec

#> "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"

#>        vs        am      gear      carb

#> "numeric" "numeric" "numeric" "numeric"

# Divide each column by the mean

mtcars[] <- lapply(mtcars, function(x) x / mean(x))
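The empty brackets in mtcars[] matter: lapply() on its own returns a plain list, and assigning into `[]` replaces the columns while keeping the data frame. A minimal sketch, using a small hypothetical data frame df so that mtcars is left alone:

```r
# Hypothetical two-column data frame, used only for illustration
df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

# On its own, lapply() returns a plain list, not a data frame
as_list <- lapply(df, function(col) col / mean(col))
class(as_list)
#> [1] "list"

# Assigning into df[] replaces each column but keeps the data frame
df[] <- lapply(df, function(col) col / mean(col))
class(df)
#> [1] "data.frame"
df$x
#> [1] 0.5 1.0 1.5
```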

The pieces of x are always supplied as the first argument to f. If you want to vary a different argument, you can use an **anonymous function**. The following example varies the amount of trimming applied when computing the mean of a fixed x.

trims <- c(0, 0.1, 0.2, 0.5)

x <- rcauchy(1000)

unlist(lapply(trims, function(trim) mean(x, trim = trim)))

#> [1] -1.13849692 -0.05370041 -0.02565192 0.02389956
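Because the example above uses random Cauchy draws, its output changes on every run. The sketch below repeats the idea with a small fixed vector (chosen here only for illustration) so the results can be checked by hand. It also contrasts the two cases: an argument that varies needs an anonymous function, while an argument that stays fixed can be handed to lapply() directly and is forwarded through `...`.

```r
x <- c(1, 2, 3, 4, 100)   # fixed data with one extreme value
trims <- c(0, 0.2, 0.5)   # amounts of trimming to try

# Varying `trim`: each element of trims would otherwise be passed as
# the first argument of mean(), so an anonymous function is needed
varied <- unlist(lapply(trims, function(trim) mean(x, trim = trim)))
varied
#> [1] 22  3  3

# Fixed `trim`: the extra argument is forwarded unchanged to mean()
samples <- list(c(1, 2, 3, 4, 100), c(5, 5, 5, 5, 50))
fixed <- unlist(lapply(samples, mean, trim = 0.2))
fixed
#> [1] 3 5
```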

**Looping patterns**

It’s useful to remember that there are three basic ways to loop over a vector:

- **loop over the elements**: for (x in xs)
- **loop over the numeric indices**: for (i in seq_along(xs))
- **loop over the names**: for (nm in names(xs))

**The first form is usually not a good choice for a for loop because it leads to inefficient ways of saving output**. With this form it’s very natural to save the output by extending a data structure, like in this example:

xs <- runif(1e3)

res <- c()

for (x in xs) {

# This is slow!

res <- c(res, sqrt(x))

}

This is slow because each time you extend the vector, R has to copy all of the existing elements. Avoid copies discusses this problem in more depth. Instead, it’s much better to create the space you’ll need for the output and then fill it in. This is easiest with the second form:

res <- numeric(length(xs))

for (i in seq_along(xs)) {

res[i] <- sqrt(xs[i])

}

Just as there are three basic ways to use a for loop, there are three basic ways to use lapply():

lapply(xs, function(x) {})

lapply(seq_along(xs), function(i) {})

lapply(names(xs), function(nm) {})
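The three skeletons above can be filled in as follows. This is only a sketch on a hypothetical named vector xs: the first form sees just the value, while the index and name forms also know which element they are working on.

```r
xs <- c(a = 1, b = 4, c = 9)  # hypothetical named vector

# 1. Over the elements: only the value x is available
by_value <- unlist(lapply(xs, function(x) sqrt(x)))
by_value
#> a b c 
#> 1 2 3 

# 2. Over the indices: position i and value xs[[i]] are available
by_index <- unlist(lapply(seq_along(xs), function(i) {
  paste0(i, ": ", sqrt(xs[[i]]))
}))
by_index
#> [1] "1: 1" "2: 2" "3: 3"

# 3. Over the names: name nm and value xs[[nm]] are available
by_name <- unlist(lapply(names(xs), function(nm) {
  paste0(nm, " = ", sqrt(xs[[nm]]))
}))
by_name
#> [1] "a = 1" "b = 2" "c = 3"
```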

**Typically you’d use the first form because lapply() takes care of saving the output for you**. However, if you need to know the position or name of the element you’re working with, you should use the second or third form. Both give you an element’s position (i, nm) and value (xs[[i]], xs[[nm]]). If you’re struggling to solve a problem using one form, you might find it easier with another.

For loop functionals: friends of lapply()

The key to using functionals in place of for loops is recognising that common looping patterns are already implemented in existing base functionals. Once you’ve mastered these existing functionals, the next step is to start writing your own: if you discover you’re duplicating the same looping pattern in many places, you should extract it out into its own function.

The following sections build on lapply() and discuss:

- **sapply() and vapply(), variants of lapply() that produce vectors, matrices, and arrays as output, instead of lists.**
- **Map() and mapply() which iterate over multiple input data structures in parallel.**
- **mclapply() and mcMap(), parallel versions of lapply() and Map().**
- **Writing a new function, rollapply(), to solve a new problem.**

Vector output: sapply and vapply

**sapply() and vapply() are very similar to lapply() except they simplify their output to produce an atomic vector**. While sapply() guesses, vapply() takes an additional argument specifying the output type. **sapply() is great for interactive use because it saves typing**, but if you use it inside your functions you’ll get weird errors if you supply the wrong type of input. **vapply() is more verbose, but gives more informative error messages and never fails silently**. It is better suited for use inside other functions.

The following example illustrates these differences. When given a data frame, sapply() and vapply() return the same results. When given an empty list, sapply() returns another empty list instead of the more correct zero-length logical vector.

sapply(mtcars, is.numeric)

#> mpg cyl disp hp drat wt qsec vs am gear carb

#> TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

vapply(mtcars, is.numeric, logical(1))

#> mpg cyl disp hp drat wt qsec vs am gear carb

#> TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

sapply(list(), is.numeric)

#> list()

vapply(list(), is.numeric, logical(1))

#> logical(0)

If the function returns results of different types or lengths, sapply() will silently return a list, while vapply() will throw an error. sapply() is fine for interactive use because you’ll normally notice if something goes wrong, but it’s dangerous when writing functions.

The following example illustrates a possible problem when extracting the class of columns in a data frame: if you falsely assume that class only has one value and use sapply(), you won’t find out about the problem until some future function is given a list instead of a character vector.
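A minimal sketch of this kind of problem, assuming two hypothetical data frames: df1, where every column has a single class, and df2, where a date-time column carries two classes. The names and data are invented for illustration.

```r
# Hypothetical data frames for illustration
df1 <- data.frame(x = 1:3, y = c("a", "b", "c"))
df2 <- data.frame(x = 1:3, y = Sys.time() + 1:3)

# Every column has exactly one class: sapply() gives a character vector
classes1 <- sapply(df1, class)
is.character(classes1)
#> [1] TRUE

# The date-time column has two classes ("POSIXct" and "POSIXt"), so the
# results differ in length and sapply() silently returns a list
classes2 <- sapply(df2, class)
is.list(classes2)
#> [1] TRUE

# vapply() refuses to change the output type and signals an error,
# which is caught here so the message can be inspected
res <- tryCatch(
  vapply(df2, class, character(1)),
  error = function(e) conditionMessage(e)
)
res
```

Code written against classes1 keeps working until it meets something like classes2, at which point it receives a list. With vapply() the mismatch surfaces at the call that caused it.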