R doParallel: A Brain-Friendly Introduction to Parallelism in R


Implementing parallel execution in your code can be both a blessing and a curse. On the one hand, you're leveraging more of the power your CPU has to offer and increasing execution speed, but on the other, you're sacrificing the simplicity of single-threaded programs.

Luckily, <b>parallel processing in R is extremely developer-friendly</b>. You don't have to change much on your end, and R works its magic behind the scenes. Today you'll learn the basics of parallel execution in R with the R doParallel package. By the end, you'll know how to parallelize loop operations in R and will know exactly how much faster multi-threaded R computations are.
<blockquote>Looking to speed up your data processing pipelines? <a href="https://appsilon.com/r-data-processing-frameworks/" target="_blank" rel="noopener">Take a look at our detailed R Data Processing Frameworks comparison</a>.</blockquote>
<h3>Table of contents:</h3><ul><li><strong><a href="#introduction">What is Parallelism and Why Should You Care?</a></strong></li><li><strong><a href="#r-doparallel">R doParallel - Everything You Need to Know</a></strong></li><li><strong><a href="#summary">Summing up R doParallel</a></strong></li></ul>

<hr />

<h2 id="introduction">What is Parallelism and Why Should You Care?</h2>
If you're new to R and programming in general, it can be tough to fully wrap your head around parallelism. Let's make things easier with an analogy.

Imagine you're a restaurant owner and the only employee there. Guests are coming in, and it's your responsibility alone to show them to their table, take their order, prepare the food, and serve it. The problem is - <b>there's only so much you can do</b>. Most of the guests will be waiting a long time since you're busy fulfilling earlier orders.

On the other hand, if you employ two chefs and three waiters, you'll drastically reduce the wait time. This will also allow you to serve more customers at once and prepare more meals simultaneously.

The first approach (when you do everything by yourself) is what we refer to as <b>single-threaded execution</b> in computer science, while the second one is known as <b>multi-threaded execution</b>. The ultimate goal is to find the best approach for your specific use case. If you're only serving 10 customers per day, maybe it makes sense to do everything yourself. But if you have a thriving business, you don't want to leave your customers waiting.

You should also note that just because the second approach has 6 workers in total (two chefs, three waiters, and you), it doesn't mean you'll be exactly six times faster than when doing everything by yourself. Managing workers has some overhead time, just like managing CPU cores does.

But in general, the concept of parallelism is a game changer. Here are a few reasons why you must learn it as an R developer:
<ul><li><b>Speed up computation: </b>If you're old enough to remember the days of single-core CPUs, you know that upgrading to a multi-core one made all the difference. Simply put, more cores equals more speed - that is, if your R scripts take advantage of parallel processing (they don't by default).</li><li><b>Efficient use of resources: </b>Modern CPUs have multiple cores, and your R scripts only utilize one by default. Now, do you think that one core doing all the work while the rest sit idle is the best way to utilize compute resources? Likely not - you're far better off distributing tasks across more cores.</li><li><b>Working on larger problems: </b>When you can distribute the load of processing data, you can handle larger datasets and more complex analyses that were previously out of reach. This is especially the case if you're working in data science, since modern datasets are typically huge in size.</li></ul>
Up next, let's go over R's implementation of parallel execution - with the <code>doParallel</code> package.
<h2 id="r-doparallel">R doParallel - Everything You Need to Know</h2>
<h3>What is R doParallel?</h3>
The R doParallel package enables parallel computing using the <code>foreach</code> package, so you'll have to install both (explained in the following section). In a nutshell, it allows you to run foreach loops in parallel by combining the <code>foreach()</code> function with the <code>%dopar%</code> operator. Anything that's inside the loop's body will be executed in parallel.
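To make the distinction concrete, here's a minimal sketch contrasting the sequential <code>%do%</code> operator with the parallel <code>%dopar%</code> operator. The cluster setup it uses is explained step by step later in this section:

```r
library(foreach)
library(doParallel)

# Register a small cluster so %dopar% has workers to use
cluster <- makeCluster(2)
registerDoParallel(cluster)

# Sequential version - iterations run one after another
seq_result <- foreach(i = 1:4, .combine = c) %do% {
  i^2
}

# Parallel version - iterations are distributed across the workers
par_result <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

stopCluster(cluster)

print(seq_result)  # 1 4 9 16
print(par_result)  # 1 4 9 16
```

Both versions return identical results - only the execution model differs.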

The rest of this section will connect the concept of parallelism with a practical example by leveraging the <code>foreach</code> and <code>doParallel</code> R packages.

Let's begin with the installation.
<h3>How to Install R doParallel</h3>
You can install both packages by running the <code>install.packages()</code> command from the R console. If you don't want to install them one by one, pass in package names as a vector:
<pre><code class="language-r">install.packages(c("foreach", "doParallel"))</code></pre>
That's it - you're good to go!
<h3>Basic Usage Example</h3>
This section will walk you through a basic usage example of R doParallel. The goal here is to wrap your head around the logic and steps required to parallelize a loop in R.

We'll start by importing the packages and calling the <code>detectCores()</code> function from the <code>parallel</code> package (loaded automatically with <code>doParallel</code>). As the name suggests, it returns the number of cores your CPU has, which we'll store in an <code>n_cores</code> variable:
<pre><code class="language-r">library(foreach)
library(doParallel)

# How many cores does your CPU have
n_cores &lt;- detectCores()</code></pre>
My M2 Pro MacBook Pro has 12 CPU cores:

<img class="size-full wp-image-22907" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65e9e66be20815b2e93c7d17_1dc6950d_Image-1-Number-of-CPU-cores.webp" alt="Image 1 - Number of CPU cores" width="357" height="91" /> Image 1 - Number of CPU cores

The next step is to create a cluster for parallel computation. The <code>makeCluster(n_cores - 1)</code> call will initialize one for you with one core less than you have available. The reason for that is simple - <b>you want to leave at least one core free for other system tasks</b>.

Then, the <code>registerDoParallel()</code> function sets up the cluster for use with the <code>foreach</code> package, enabling parallel execution in your code:
<pre><code class="language-r"># Register cluster
cluster &lt;- makeCluster(n_cores - 1)
registerDoParallel(cluster)</code></pre>
And that's all there is to it setup-wise. You now have everything needed to run R code in parallel.
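If you want to double-check that the cluster was registered correctly, <code>doParallel</code> provides the <code>getDoParWorkers()</code> function, which returns the number of workers <code>foreach</code> will distribute work across. A small self-contained sketch:

```r
library(foreach)
library(doParallel)

# Create and register a two-worker cluster
cluster <- makeCluster(2)
registerDoParallel(cluster)

# Number of workers %dopar% will distribute work across
n_workers <- getDoParWorkers()
print(n_workers)  # 2

stopCluster(cluster)
```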

To demonstrate, we'll parallelize a loop that will run 1000 times (<code>n_iterations</code>), square the number on each iteration, and store it in a list.

To run the loop in parallel, you need to use the <code>foreach()</code> function, followed by the <code>%dopar%</code> operator. Everything inside the curly brackets (the loop body) will be executed in parallel.

After running this code, it's also a good idea to stop your cluster.

Here's the entire snippet:
<pre><code class="language-r"># How many times will the loop run
n_iterations &lt;- 1000

# Use foreach and %dopar% to run the loop in parallel
# foreach() collects the iteration results into a list
results &lt;- foreach(i = 1:n_iterations) %dopar% {
  i^2
}

# Don't forget to stop the cluster
stopCluster(cl = cluster)</code></pre>
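One detail worth knowing: <code>foreach()</code> returns a list by default, which is why the snippet above produces one. If you'd rather get a vector back, pass a combining function through the optional <code>.combine</code> argument. A minimal sequential sketch (using <code>%do%</code>, so no cluster is needed):

```r
library(foreach)

# Default: results come back as a list
as_list <- foreach(i = 1:3) %do% i^2

# With .combine = c, results are flattened into a numeric vector
as_vector <- foreach(i = 1:3, .combine = c) %do% i^2

str(as_list)      # List of 3
print(as_vector)  # 1 4 9
```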
The output of this code snippet is irrelevant, but here it is if you're interested:

<img class="size-full wp-image-22909" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65e9e66cb05f24d472276fed_b1831510_Image-2-Computation-results.webp" alt="Image 2 - Computation results" width="998" height="449" />

Image 2 - Computation results

And that's how you can run a loop in parallel in R. The question is - <b>will parallelization make your code run faster?</b> That's what we'll answer next.
<h3>Does R doParallel Make Your Code Execution Faster?</h3>
There's a good reason why most code you'll ever see is single-threaded - it's simple to write, has no overhead in start time, and usually results in fewer errors.

Setting up R loops to run in parallel involves some overhead time in setting up a cluster and managing the runtime (which happens behind the scenes). Depending on what you do, <b>your single-threaded code will sometimes be faster compared to its parallelized version</b>, all due to the mentioned overhead.

That's why we'll set up a test in this section and see how much time it takes to run the same piece of code on different numbers of cores, for different numbers of iterations.

The <code>test()</code> function does the following:
<ul><li>Creates and registers a new cluster with <code>n_cores</code> CPU cores, and stops it after the computation.</li><li>Uses <code>foreach</code> to perform the iteration <code>n_iter</code> number of times.</li><li>Keeps track of the time needed in total, and the time needed to do the actual computation.</li><li>Returns a <code>data.frame</code> displaying the number of cores used, iterations made, total running time, and total computation time.</li></ul>
Here's what this function looks like in the code:
<pre><code class="language-r">test &lt;- function(n_cores, n_iter) {
  # Keep track of the start time
  time_start &lt;- Sys.time()

  # Create and register cluster
  cluster &lt;- makeCluster(n_cores)
  registerDoParallel(cluster)

  # Only for measuring computation time
  time_start_processing &lt;- Sys.time()

  # Do the processing
  results &lt;- foreach(i = 1:n_iter) %dopar% {
    i^2
  }

  # Only for measuring computation time
  time_finish_processing &lt;- Sys.time()

  # Stop the cluster
  stopCluster(cl = cluster)

  # Keep track of the end time
  time_end &lt;- Sys.time()

  # Return the report
  data.frame(
    cores = n_cores,
    iterations = n_iter,
    total_time = difftime(time_end, time_start, units = "secs"),
    compute_time = difftime(time_finish_processing, time_start_processing, units = "secs")
  )
}</code></pre>
And now for the test itself. We'll test every combination of 1, 6, and 11 cores with running the <code>test()</code> function for 1K, 10K, 100K, and 1M iterations. After each run, the results will be appended to <code>res_df</code>:
<pre><code class="language-r">res_df &lt;- data.frame()

# Arbitrary numbers of cores
for (n_cores in c(1, 6, 11)) {
  # Arbitrary numbers of iterations
  for (n_iter in c(1000, 10000, 100000, 1000000)) {
    # Run the test and measure the runtime
    current &lt;- test(n_cores, n_iter)
    # Append to results
    res_df &lt;- rbind(res_df, current)
  }
}</code></pre>
Here are the results:

<img class="size-full wp-image-22911" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65e9e66de5412ad18a02de1e_7b8611e3_Image-3-Runtime-comparisons.webp" alt="Image 3 - Runtime comparisons" width="666" height="424" /> Image 3 - Runtime comparisons

Looking at the table data can only get you so far. Let's visualize our runtimes to get a better grasp of compute and overhead times.

There's a <b>significant overhead</b> to utilizing and managing multiple CPU cores when there aren't many computations to go through. As you can see, using only 1 CPU core took the longest in compute time, but managing 11 of them adds a lot of overhead:

<img class="size-full wp-image-22913" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65e9e66ea8460f13cd30ca30_1d6e8d4b_Image-4-Runtime-for-1K-iterations.webp" alt="Image 4 - Runtime for 1K iterations" width="1389" height="1067" /> Image 4 - Runtime for 1K iterations

If we take this to 10K iterations, things look interesting. <b>There's no point in leveraging parallelization at this amount of data</b>, as it increases both compute and overhead time:

<img class="size-full wp-image-22915" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65e9e66e0b432bb87a94ca9b_cbc872a0_Image-5-Runtime-for-10K-iterations.webp" alt="Image 5 - Runtime for 10K iterations" width="1389" height="1067" /> Image 5 - Runtime for 10K iterations

Taking things one step further, to 100K iterations, we have an overall win when using 6 CPU cores. The 11-core simulation had the fastest runtime, but the <b>overhead of managing so many cores</b> took its toll on the total time:


<img class="size-full wp-image-22917" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65e9e66fa25310f3344d4f39_a16d5570_Image-6-Runtime-for-100K-iterations.webp" alt="Image 6 - Runtime for 100K iterations" width="1389" height="1067" /> Image 6 - Runtime for 100K iterations

And finally, let's take a look at 1M iterations. This is the point where the <b>overhead time becomes insignificant</b>. Both 6-core and 11-core implementations come close, and both are faster than a single-core implementation:

<img class="size-full wp-image-22919" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65e9e67017713690960d85b9_65939d1e_Image-7-Runtime-for-1M-iterations.webp" alt="Image 7 - Runtime for 1M iterations" width="1389" height="1067" /> Image 7 - Runtime for 1M iterations

To summarize the results - parallelization makes the compute time faster when there's a significant amount of data to process. When that's not the case, you're better off sticking to a single-threaded execution. The code is simpler, and there's no overhead to pay in terms of time required to manage multiple CPU cores.

<hr />

<h2 id="summary">Summing up R doParallel</h2>
R does a great job of hiding the complexities of parallelism behind the <code>foreach()</code> function and the <code>%dopar%</code> operator. But just because you can parallelize some operation, it doesn't mean you should. This point was made perfectly clear in the previous section. Code that runs in parallel is often more complex behind the scenes, and a lot more things can go wrong.

You've learned the basics of R doParallel today. You now know how to run an iterative process in parallel, provided the underlying data structure stays the same. Make sure to stay tuned to <a href="https://appsilon.com/blog/" target="_blank" rel="noopener">Appsilon Blog</a>, as we definitely plan to cover parallelization in the realm of data frames and processing large datasets.
<blockquote>Did you know R supports Object-Oriented Programming? <a href="https://appsilon.com/object-oriented-programming-in-r-part-1/" target="_blank" rel="noopener">Take a look at the first of many articles in our new series</a>.</blockquote>
