Friday, 21 September 2018

R tip: Learn dplyr’s case_when() function

Hi, this is Sharon Machlis, Director of Editorial Data Analytics at IDG Communications.

In this second episode of Do More with R, we’ll see how dplyr’s case_when() function helps you avoid a lot of nested ifelse statements.

For data, I have a list of US states and their estimated populations, which you can see here.

I also set up R variables showing which states are in each region.

SHOW FILE 02_USRegions.R

First, let me load the state population data and the USRegions file, and also load the dplyr package.

RUN ALL COMMANDS IN FILE 02_case_when.R

And look at the structure of the data.

TYPE AND RUN str(USpops)

The task here is to assign each state to its proper region.

There are a couple of different ways to do this. One common way is R’s ifelse function, that’s ifelse all one word. A regular if statement tests only a single value, so if you want to run an if test across an entire vector at once, you typically use ifelse instead.
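To see what "across an entire vector at once" means, here’s a minimal sketch (the vector x and the "big"/"small" labels are made up for illustration). ifelse() takes a test, a value for TRUE, and a value for FALSE, and applies them to every element.

```r
# ifelse() evaluates the test for every element of the vector
x <- c(3, 8, 12, 5)

# For each element: "big" if greater than 6, otherwise "small"
size <- ifelse(x > 6, "big", "small")
print(size)
# returns "small" "big" "big" "small"
```

A plain if (x > 6) would only work on a single value; in recent R versions it throws an error when the condition has length greater than one.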

In this case, it might look something like this:

SHOW FILE 02_case_when_ifelse

That does work if you’ve only got a few alternatives. But I find that format difficult to read with more than a couple of options, and it’s easy to make mistakes like a missing closing parenthesis or a comma in the wrong place.
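For readers without the video files, here’s a rough sketch of what nested ifelse() calls for regions can look like. The region vectors below are hypothetical, heavily abbreviated stand-ins for the ones in 02_USRegions.R, and the variable names are my own. Notice that each new region adds another ifelse() inside the "no" branch, plus another closing parenthesis at the end.

```r
# Hypothetical, abbreviated region vectors (not the full lists of states)
northeast <- c("Maine", "New York", "Pennsylvania")
midwest   <- c("Ohio", "Iowa")
south     <- c("Texas", "Georgia")
west      <- c("California", "Oregon")

states <- c("Iowa", "Texas", "Oregon", "Maine", "Puerto Rico")

# Each extra region means one more ifelse() nested in the "else" slot
region <- ifelse(states %in% northeast, "Northeast",
            ifelse(states %in% midwest, "Midwest",
              ifelse(states %in% south, "South",
                ifelse(states %in% west, "West", "Other"))))
# returns "Midwest" "South" "West" "Northeast" "Other"
```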

And what if I wanted to assign states by division instead of region?

SHOW FILE 02_USDivisions.R

That’s 9 levels of nested if-elses!

dplyr’s case_when() has an easier format. Here’s the syntax:

SHOW FILE 02_format_case_when.R

Each if-then statement has its own line. The condition (the “if” test) is on the left, then there’s a tilde (~), and then the value is on the right. Each line needs a comma, except the last line. case_when() checks the conditions in order and uses the first one that’s TRUE for each element; anything that matches no condition gets NA. So if you want a catch-all value for everything you haven’t defined, put TRUE as the last condition. TRUE is always true, so it catches whatever the earlier lines missed. Then put your catch-all value on the right. Done.
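Here’s a small sketch of that syntax using made-up test scores (the scores vector and grade cutoffs are purely illustrative). It also shows why order matters: 95 is also at least 80, but the first matching line wins, so it gets an "A".

```r
library(dplyr)

scores <- c(95, 82, 67, 40)

# condition ~ value, one pair per line, separated by commas;
# the first condition that is TRUE for an element decides its value
grade <- case_when(
  scores >= 90 ~ "A",
  scores >= 80 ~ "B",
  scores >= 65 ~ "C",
  TRUE ~ "F"   # catch-all: TRUE matches anything not caught above
)
# returns "A" "B" "C" "F"

# Without a catch-all line, elements that match nothing become NA
no_default <- case_when(scores >= 90 ~ "A")
# returns "A" NA NA NA
```

Newer dplyr releases (1.1.0 and later) also accept a .default argument for the catch-all value; the TRUE ~ form still works.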

Let me show a simple example testing whether a few numbers are even, odd, or (if they’re not whole numbers) neither.

SHOW FILE 02_format_case_when_2.R

I’ll create a vector of the numbers 1, 2, 3, 4, and 5.7. Let’s run case_when() to see whether they’re even or odd. If the remainder when dividing by two is 0, it’s even. If the remainder is 1, it’s odd. Otherwise, it’s neither. The results should be odd, even, odd, even, neither. Let me run this code block:

RUN CODE STARTING LINE 10 IN 02_format_case_when_2.R

And we’ve got odd, even, odd, even, neither!
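If you don’t have 02_format_case_when_2.R handy, a version of that code block might look like this. The variable names my_numbers and parity are my own guesses, not necessarily what’s in the file. R’s %% operator returns the remainder, and 5.7 %% 2 is about 1.7, which is neither 0 nor 1, so it falls through to the TRUE catch-all.

```r
library(dplyr)

my_numbers <- c(1, 2, 3, 4, 5.7)

# %% is R's remainder (modulo) operator
parity <- case_when(
  my_numbers %% 2 == 0 ~ "even",
  my_numbers %% 2 == 1 ~ "odd",
  TRUE ~ "neither"   # 5.7 %% 2 is about 1.7, so it lands here
)
# returns "odd" "even" "odd" "even" "neither"
```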

Now let’s see what that looks like for the state region example.

SHOW FILE 02_case_when_case_when

Here I’m importing my state population file into R, then adding a column called Division with dplyr’s mutate() function. The values of Division come from the case_when() statement: if State is in my vector of Northeast state names, the value is Northeast, and so on for the other regions.
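Here’s a self-contained sketch of that mutate() plus case_when() pattern. The region vectors are hypothetical, abbreviated stand-ins for 02_USRegions.R, and USpops_demo is a made-up data frame with only a State column (the real file also has population estimates). The %in% operator is what tests whether each State is in a vector of names.

```r
library(dplyr)

# Hypothetical, abbreviated region vectors (not the full lists of states)
northeast <- c("Maine", "New York", "Pennsylvania")
midwest   <- c("Ohio", "Iowa")
south     <- c("Texas", "Georgia")
west      <- c("California", "Oregon")

# Hypothetical stand-in for the imported state population data
USpops_demo <- data.frame(State = c("Iowa", "Texas", "Oregon", "Maine"))

# Add a Division column whose value depends on which vector State is in
USpops_demo <- USpops_demo %>%
  mutate(Division = case_when(
    State %in% northeast ~ "Northeast",
    State %in% midwest   ~ "Midwest",
    State %in% south     ~ "South",
    State %in% west      ~ "West",
    TRUE ~ "Other"   # anything not in the four vectors
  ))
```

Compared with the nested ifelse() version, each region is one flat line, so adding nine divisions means adding lines rather than nesting deeper.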

Let me run that.

SOURCE THE FILE 02_case_when_case_when

And then look at the results.

CLICK ON USPops OBJECT IN TOP RIGHT PANE, SCROLL A BIT

Looks good. And there’s no “Other” value, so every state was assigned.
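Scrolling works for 50 rows, but you can also check for the catch-all value in code. This is a minimal sketch on a hypothetical result data frame (in the video, you’d use USpops after the mutate() step instead). dplyr’s count() tallies how many rows got each value, and any() tells you whether “Other” shows up at all.

```r
library(dplyr)

# Hypothetical result; in practice, use the data frame returned by mutate()
result <- data.frame(
  State = c("Iowa", "Texas", "Oregon", "Maine"),
  Division = c("Midwest", "South", "West", "Northeast")
)

# One row per assigned value, with the number of states in column n
division_counts <- count(result, Division)

# TRUE if any state fell through to the "Other" catch-all
any_unassigned <- any(result$Division == "Other")
```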

That’s it for this episode, thanks for watching! For more R tips, head to the More With R video page at bit.ly/morewithR. That’s https B I T period L Y slash more with R, all lowercase except for the R. So long!
