Indexing | Practical Data Science

What follows is mostly a refresher for those who have already used R quite a bit, and you have presumably had enough exposure to be aware of some of it. Because much of data processing concerns data frames, or other tables of mixed data types, more time will be spent on slicing and dicing data frames. Even so, it would be impossible to use R effectively without knowing how to handle basic data types.

Slicing Vectors

Taking individual parts of a vector of values is straightforward and something you’ll likely need to do a lot. The basic idea is to provide the indices of the elements you want to extract.

letters[4:6]  # lower case letters a-z

[1] "d" "e" "f"

letters[c(13, 10, 3)]  # indices can be given in any order

[1] "m" "j" "c"

Slicing Matrices/data.frames

With 2-d objects we can specify rows and columns. Rows are indexed to the left of the comma, columns to the right.

myMatrix[1, 2:3]  # matrix[rows, columns]
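The line above uses `myMatrix` without showing how it was built. As a minimal sketch, assuming a small hypothetical 3 x 4 matrix (the values are made up for illustration), the row and column positions work like this:

```r
# Hypothetical 3 x 4 matrix, filled column by column with 1 to 12
myMatrix = matrix(1:12, nrow = 3, ncol = 4)

myMatrix[1, 2:3]   # row 1, columns 2 and 3: returns 4 7
myMatrix[1, ]      # all of row 1
myMatrix[, 2]      # all of column 2
```

Note that selecting a single row or column of a matrix returns a plain vector by default.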

Label-based Indexing

We can index rows and columns by name if names are available.
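As a rough illustration, suppose `mydf` is a small data frame with row names `row1` to `row3` and columns `a` and `b`. These names and values are assumptions made for this sketch, not necessarily the data used elsewhere in the text.

```r
# Hypothetical data frame; row and column names are assumed for illustration
mydf = data.frame(
  a = 1:3,
  b = c(10, 20, 30),
  row.names = c('row1', 'row2', 'row3')
)

mydf['row1', 'b']             # one element, by row name and column name: 10
mydf[c('row1', 'row3'), 'a']  # two rows of column a: 1 3
```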

Position-based Indexing

Otherwise we can index by number.
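A minimal sketch of the same idea with positions, again using a hypothetical `mydf` whose names and values are assumed for illustration:

```r
# Hypothetical data frame; contents are assumed for illustration
mydf = data.frame(
  a = 1:3,
  b = c(10, 20, 30),
  row.names = c('row1', 'row2', 'row3')
)

mydf[1, 2]     # first row, second column: 10
mydf[2:3, 1]   # rows 2 and 3 of the first column: 2 3
```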

Mixed Indexing

Even both!
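For instance, a name can be used for one dimension and a number for the other. The data frame below is a hypothetical stand-in, with names and values assumed for illustration:

```r
# Hypothetical data frame; contents are assumed for illustration
mydf = data.frame(
  a = 1:3,
  b = c(10, 20, 30),
  row.names = c('row1', 'row2', 'row3')
)

mydf['row1', 2]      # row by name, column by position: 10
mydf[c(1, 3), 'b']   # rows by position, column by name: 10 30
```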

If the row/column value is empty, all rows/columns are retained.

mydf['row1', ]
mydf[, 'b']

Non-contiguous

Note that the indices supplied do not have to be in order or in sequence.
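As a small sketch with a hypothetical `mydf` (names and values assumed for illustration), the rows come back in whatever order the indices are supplied:

```r
# Hypothetical data frame; contents are assumed for illustration
mydf = data.frame(
  a = 1:3,
  b = c(10, 20, 30),
  row.names = c('row1', 'row2', 'row3')
)

mydf[c(3, 1), ]   # row3 first, then row1; row2 is skipped
```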

Boolean

Boolean indexing requires a logical (TRUE/FALSE) vector with one value per row or element. In the following, if column A has a value greater than or equal to 2, the indicator is TRUE and the row is selected. Otherwise it is FALSE and the row is dropped.
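A minimal sketch of this kind of filter, assuming the column is named `a` in a hypothetical `mydf` (the helper name `keep` and the values are also made up for illustration):

```r
# Hypothetical data frame; contents are assumed for illustration
mydf = data.frame(
  a = 1:3,
  b = c(10, 20, 30),
  row.names = c('row1', 'row2', 'row3')
)

keep = mydf$a >= 2   # logical vector: FALSE TRUE TRUE
mydf[keep, ]         # only the rows where keep is TRUE (row2 and row3)
```

The comparison is usually written inline, as in `mydf[mydf$a >= 2, ]`; storing it in `keep` first just makes the intermediate TRUE/FALSE vector visible.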

Indexing Exercises

The following is a refresher of base R indexing only.

Here is a matrix, a data.frame and a list.

mymatrix = matrix(rnorm(100), 10, 10)
mydf = cars
mylist = list(mymatrix, thisdf = mydf)

Exercise 1

For the matrix, in separate operations, take a slice of rows, a selection of columns, and a single element.

Exercise 2

For the data.frame, grab a column in 3 different ways.

Exercise 3

For the list, grab an element by number and by name.