Apply family functions - Part 1

The apply() function

The apply family of functions belongs to base R. These functions are especially useful in combination with other functions for manipulating slices of matrices, arrays, lists and data frames. They let you traverse data in several ways without writing explicit for loops, which are typically more verbose and are sometimes slower.

The first function covered in this series is apply(). In its simplest form, it applies a function over the margins of a matrix or an array, where margin 1 means rows and margin 2 means columns.

As a first example, we start from a matrix with three rows and three columns.

mat <- matrix(c(2, 4, 6, 7, 8, 9, 1, 12, 21), nrow = 3, ncol = 3)
mat
##      [,1] [,2] [,3]
## [1,]    2    7    1
## [2,]    4    8   12
## [3,]    6    9   21

For example, to obtain the sum of each column, you can use the apply() function as follows.

apply(mat, 2, sum)
## [1] 12 24 34
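
To make the comparison with a for loop concrete, the following sketch computes the same column sums both ways. The name col_totals is only illustrative and is not part of the original example.

```r
# Same matrix as in the example above
mat <- matrix(c(2, 4, 6, 7, 8, 9, 1, 12, 21), nrow = 3, ncol = 3)

# Explicit loop: preallocate the result, then fill in one column sum at a time
col_totals <- numeric(ncol(mat))
for (j in seq_len(ncol(mat))) {
  col_totals[j] <- sum(mat[, j])
}

# apply() expresses the same computation in a single call
all.equal(col_totals, apply(mat, 2, sum))
## [1] TRUE
```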

We can also calculate the average of each row.

apply(mat, 1, mean)
## [1]  3.333333  8.000000 12.000000

Base R also provides dedicated functions that reproduce the previous results directly. For example, the colSums() function calculates the sum of each column, and rowMeans() returns the arithmetic mean of each row.

colSums(mat)
## [1] 12 24 34
rowMeans(mat)
## [1]  3.333333  8.000000 12.000000
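
As a quick check, the sketch below uses all.equal() to confirm that the dedicated functions and apply() agree on this matrix. The names same_sums and same_means are assumptions made for this illustration.

```r
mat <- matrix(c(2, 4, 6, 7, 8, 9, 1, 12, 21), nrow = 3, ncol = 3)

# Compare each dedicated function with its apply() counterpart
same_sums  <- all.equal(colSums(mat),  apply(mat, 2, sum))
same_means <- all.equal(rowMeans(mat), apply(mat, 1, mean))
c(same_sums, same_means)
## [1] TRUE TRUE
```

Base R also offers rowSums() and colMeans() for the two remaining combinations.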

The two cases shown above illustrate a basic use of the apply() function. The function is more powerful than that, however, and can work across several dimensions. Consider first an object in two dimensions (rows and columns) similar to the one created previously, that is, a matrix.

mat2 <- matrix(1:9, nrow = 3, ncol = 3)
mat2
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

The mat2 object is a particular case of an array, and it can also be created using the array() function.

array(data=1:9, dim = c(3,3))
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
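
The relationship between the two constructors can be checked directly. The sketch below, which uses the illustrative name arr2, shows that R treats a matrix as an array with two dimensions.

```r
mat2 <- matrix(1:9, nrow = 3, ncol = 3)
arr2 <- array(data = 1:9, dim = c(3, 3))

is.array(mat2)    # a matrix is an array
## [1] TRUE
is.matrix(arr2)   # and an array with two dimensions is a matrix
## [1] TRUE
dim(arr2)         # both objects carry a dim attribute of length 2
## [1] 3 3
```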

The array() function also allows you to label the rows and columns using its dimnames argument, which takes the row names first and the column names second.

# "FILA" is Spanish for "row"; "COL" abbreviates "columna" (column)
nombres.columnas <- c("COL1","COL2","COL3")
nombres.filas <- c("FILA1","FILA2","FILA3")
arreglo <- array(data=1:9, dim = c(3,3), 
                 dimnames = list(nombres.filas, nombres.columnas))
arreglo
##       COL1 COL2 COL3
## FILA1    1    4    7
## FILA2    2    5    8
## FILA3    3    6    9
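
Once an array has dimnames, elements can be selected by name, and apply() carries the names over to its result. The following is a minimal sketch; the object name etiquetado is hypothetical and used only here.

```r
# dimnames expects the row names first and the column names second
etiquetado <- array(data = 1:9, dim = c(3, 3),
                    dimnames = list(c("FILA1", "FILA2", "FILA3"),
                                    c("COL1", "COL2", "COL3")))

# Index by name instead of by position
etiquetado["FILA2", "COL3"]
## [1] 8

# Row sums: the result is a vector named FILA1, FILA2, FILA3
# with the values 12, 15 and 18
totales <- apply(etiquetado, 1, sum)
```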

An array can have more than two dimensions. Suppose you want a three-dimensional array whose third dimension holds four 3 x 3 layers with the following contents:

  • DIM1: Numbers from 1 to 9.
  • DIM2: Numbers from 1 to 9 multiplied by 10.
  • DIM3: Numbers from 1 to 9 multiplied by 100.
  • DIM4: Numbers from 1 to 9 multiplied by 1000.

One way to build this array is by using the following code:

nombres.dimensiones <- c("DIM1","DIM2","DIM3","DIM4")
arreglo <- array(data = c(seq(from=1, to=9, by=1),           # 1 to 9
                          seq(from=10, to=90, by=10),        # 10 to 90
                          seq(from=100, to=900, by=100),     # 100 to 900
                          seq(from=1000, to=9000, by=1000)), # 1000 to 9000
                 dim = c(3, 3, 4),                           # 3 rows, 3 columns, 4 layers
                 dimnames = list(nombres.filas,
                                 nombres.columnas,
                                 nombres.dimensiones))
arreglo
## , , DIM1
## 
##       COL1 COL2 COL3
## FILA1    1    4    7
## FILA2    2    5    8
## FILA3    3    6    9
## 
## , , DIM2
## 
##       COL1 COL2 COL3
## FILA1   10   40   70
## FILA2   20   50   80
## FILA3   30   60   90
## 
## , , DIM3
## 
##       COL1 COL2 COL3
## FILA1  100  400  700
## FILA2  200  500  800
## FILA3  300  600  900
## 
## , , DIM4
## 
##       COL1 COL2 COL3
## FILA1 1000 4000 7000
## FILA2 2000 5000 8000
## FILA3 3000 6000 9000
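
Before applying functions to this array, it can help to inspect its shape and pull out individual pieces. The sketch below rebuilds the same array in a more compact way (an alternative construction, not the one used above) and indexes it by name. The name capa2 is illustrative.

```r
arreglo <- array(data = c(1:9, (1:9) * 10, (1:9) * 100, (1:9) * 1000),
                 dim = c(3, 3, 4),
                 dimnames = list(c("FILA1", "FILA2", "FILA3"),
                                 c("COL1", "COL2", "COL3"),
                                 c("DIM1", "DIM2", "DIM3", "DIM4")))

dim(arreglo)                       # rows, columns, layers
## [1] 3 3 4

arreglo["FILA2", "COL3", "DIM3"]   # a single element
## [1] 800

capa2 <- arreglo[, , "DIM2"]       # one whole layer, returned as a 3 x 3 matrix
```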

Starting from the previous array, suppose that you want to obtain the maximum value of each row within each layer of the third dimension.

apply(arreglo, c(3,1), max)
##      FILA1 FILA2 FILA3
## DIM1     7     8     9
## DIM2    70    80    90
## DIM3   700   800   900
## DIM4  7000  8000  9000
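
The order of the values in the MARGIN argument determines the layout of the result. As a sketch, swapping c(3,1) for c(1,3) gives the same maxima arranged as the transpose. The array is rebuilt compactly so the block runs on its own, and the names por_capa and por_fila are assumptions made for this example.

```r
arreglo <- array(data = c(1:9, (1:9) * 10, (1:9) * 100, (1:9) * 1000),
                 dim = c(3, 3, 4),
                 dimnames = list(c("FILA1", "FILA2", "FILA3"),
                                 c("COL1", "COL2", "COL3"),
                                 c("DIM1", "DIM2", "DIM3", "DIM4")))

por_capa <- apply(arreglo, c(3, 1), max)   # layers as rows, 4 x 3
por_fila <- apply(arreglo, c(1, 3), max)   # array rows as rows, 3 x 4

# Same values, transposed layout
all.equal(t(por_capa), por_fila)
## [1] TRUE
```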

Or you may want to obtain the maximum value of each column within each layer.

apply(arreglo, c(3,2), max)
##      COL1 COL2 COL3
## DIM1    3    6    9
## DIM2   30   60   90
## DIM3  300  600  900
## DIM4 3000 6000 9000

The following result shows the minimum of each column in each layer, this time with the columns of the array as the rows of the result.

apply(arreglo, c(2,3), min)
##      DIM1 DIM2 DIM3 DIM4
## COL1    1   10  100 1000
## COL2    4   40  400 4000
## COL3    7   70  700 7000
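
MARGIN can also be a single dimension of a three-dimensional array, in which case each slice along that dimension is passed to the function as a whole. The following sketch sums every 3 x 3 layer; the array is rebuilt compactly so the block is self-contained, and the name suma_capas is illustrative.

```r
arreglo <- array(data = c(1:9, (1:9) * 10, (1:9) * 100, (1:9) * 1000),
                 dim = c(3, 3, 4),
                 dimnames = list(c("FILA1", "FILA2", "FILA3"),
                                 c("COL1", "COL2", "COL3"),
                                 c("DIM1", "DIM2", "DIM3", "DIM4")))

# One total per layer: 45, 450, 4500 and 45000, named DIM1 to DIM4
suma_capas <- apply(arreglo, 3, sum)
```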

The same approach extends to arrays with more dimensions: it is enough to build a suitable array and pass the corresponding margins to the apply() function.
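
As an illustration with four dimensions, the sketch below builds a small 2 x 3 x 2 x 2 array and sums over the first two dimensions while keeping the last two. The object names x and res are assumptions made for this example.

```r
# 24 values arranged in four dimensions
x <- array(1:24, dim = c(2, 3, 2, 2))

# Keep dimensions 3 and 4; each cell is the sum of one 2 x 3 slice
res <- apply(x, c(3, 4), sum)
res
##      [,1] [,2]
## [1,]   21   93
## [2,]   57  129
```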

ORCID iD: https://orcid.org/0000-0001-6733-4759