Can I Do that In R? Advanced Excel Functions in R for Data Manipulation

Estimated time:

Does Excel.. excel at everything? Should you consider R programming language instead? Well, yes - but not for the reasons you might initially think. Excel is fun and intuitive but has some obvious problems with reproducibility and scalability. More on that later.

This article will walk you through several examples of Excel functions in R. We'll see how things are done in Excel and in R, and then do our best to compare the two. Maybe you're a seasoned Excel user and want to dip your toes in data manipulation in R. You've found just the right place to start.
<blockquote>Looking to combine R and Excel? <a href="" target="_blank" rel="noopener">We bring you list of packages to combine the best of both worlds</a>.</blockquote>
Table of contents:
<ul><li><a href="#file-io">Creating and Reading Data in Excel vs. R</a></li><li><a href="#the-basics">Basic Excel Functions in R - Getting Your Feet Wet</a></li><li><a href="#charts">Excel Functions in R for Data Summarization and Charts</a></li><li><a href="#the-benefit">Why Transition from Excel to R - Is R a Viable Excel Alternative?</a></li><li><a href="#summary">Summing up Excel Functions in R</a></li></ul>
<h3>Before We Start</h3>
In this article, we'll delve into the world of data manipulation in R and explore how it compares to Excel, especially when dealing with advanced functions. We'll walk you through various examples, showing how to perform tasks that you might be familiar with in Excel but with the power and flexibility of R. But before we dive into the exciting world of data manipulation and analysis, let's address an essential aspect for beginners: package installation.
<h4>Package Installation</h4>
If you're new to R, it's important to note that R's strength lies in its extensibility through packages. We'll use several R packages, such as dplyr, readxl, lubridate, and ggplot2, to enhance data manipulation and visualization. For our beginner audience, it could be helpful to understand how to install these packages. To get started, you can install these packages using the <code>install.packages()</code> function, which ensures that you have the necessary tools at your disposal to follow along with the examples in this article.

This video is a <a href="" target="_blank" rel="noopener">brief walkthrough for installing R packages</a> in RStudio and from GitHub.

So, let's embark on this journey of data manipulation in R, armed with the right tools to <em>excel</em> in your data tasks!

<hr />

<h2 id="file-io">Creating and Reading Data in Excel vs. R</h2>
First things first, we need some data. Creating it in Excel is easy, fast, and intuitive. After all, it was built for creating and manipulating somewhat decently large data sheets.

For this article, we'll create a dataset of employee information. We'll keep track of their first name, last name, age, department, salary, and the date they joined our made-up company. Feel free to create your own Excel data sheet for your needs. I recommend keeping it relatively simple for the tutorial.

Here's what the data looks like:

<img class="size-full wp-image-19860" src="" alt="Image 1 - Tabular data created in Excel" width="1376" height="340" /> Image 1 - Tabular data created in Excel

The question now is - how can you replicate this in R? Well, you can declare a dataset from scratch by wrapping it into a <code>data.frame()</code> and providing a list of column names and values, as shown below:
<pre><code class="language-r">data &lt;- data.frame(
 "ID" = c(1, 2, 3, 4, 5),
 "FirstName" = c("John", "Jane", "Mark", "Bob", "Susan"),
 "LastName" = c("Doe", "Dean", "Markson", "Barston", "Cup"),
 "Age" = c(31, 33, 27, 41, 39),
 "Department" = c("IT", "Sales", "Marketing", "IT", "IT"),
 "Salary" = c(5000, 4500, 5000, 6500, 7000),
 "DateJoined" = c(as.Date("2018-05-27"), as.Date("2020-08-14"), as.Date("2021-07-13"), as.Date("2015-01-10"), as.Date("2012-09-04"))
Here's what the dataset looks like:

<img class="size-full wp-image-19862" src="" alt="Image 2 - Tabular data created in R" width="1238" height="368" /> Image 2 - Tabular data created in R

Or, you can leverage the already existing Excel file, and use packages such as <code>readxl</code> to load it:
<pre><code class="language-r">library(readxl)
<br>data &lt;- read_xlsx("data.xlsx", sheet = "Sheet1")
This is the output you'll see:

<img class="size-full wp-image-19864" src="" alt="Image 3 - Reading an Excel File in R" width="1672" height="478" /> Image 3 - Reading an Excel File in R

The thing you need to remember about R is that it connects to virtually any data source, from local files to databases and anything in between. That's not a luxury you get with Excel.

Sure, you can connect to the data source manually and export the data, but that's not the same. We like things automated at Appsilon.
<h2 id="the-basics">Basic Excel Functions in R - Getting Your Feet Wet</h2>
This section will walk you through some basic operations, such as summary statistics, string/date manipulation, and conditionals. We'll implement the identical logic in R and Excel to see how they compare.

Let's start with simple summary statistics.
<h3>Simple Summary Statistics</h3>
Think of these as points you always want to know the value for. What's the average of something, or what is the range of values in a column? These functions will give you the answer.

We'll use the <code>Salary</code> column for the calculation. In Excel, it's located in a column <code>F</code> and spans between rows <code>2:6</code>. Yours may differ, so change the values in the formulas accordingly.

Here's an example set of functions you can use to find the minimum, average (mean), median, and maximum, but also its standard deviation:
<pre><code class="language-excel">=MIN(F2:F6)
Below are the results you'll see in Excel:

<img class="size-full wp-image-19866" src="" alt="Image 4 - Summary statistics in Excel" width="578" height="242" /> Image 4 - Summary statistics in Excel

Neat, and something you're used to, but how can you translate it into R?

R programming language has functions built-in for taking care of calculations like these. For example, you can call the <code>mean()</code> function and pass in the column of interest (<code>dataframe_name$column_name</code>) to get the value out.

The surrounding <code>paste()</code> function is here just to give some context, or to print the function name in front of the value:
<pre><code class="language-r">paste("Min. salary:", min(data$Salary))
paste("Average. salary:", mean(data$Salary))
paste("Median salary:", median(data$Salary))
paste("Max salary:", max(data$Salary))
paste("STdev. salary:", sd(data$Salary))</code></pre>
These are the results you'll see:

<img class="size-full wp-image-19868" src="" alt="Image 5 - Summary statistics in R" width="1088" height="530" /> Image 5 - Summary statistics in R

And that's the basics of summary statistics for you. Up next, let's discuss string concatenation.
String concatenation is one of the typical data manipulation and preprocessing operations. The idea is to take two (or more) columns and combine their values into a single new column.

Luckily for us, we have the <code>First Name</code> and <code>Last Name</code> columns that beg to be combined into a <code>Full Name</code> column.

In Excel, you'll want to call the <code>CONCAT()</code> function and pass in both values with a single space in between. Here's an example:
<pre><code class="language-excel">=CONCAT(B2;" ";C2)</code></pre>
And here's what everything will look like when you apply the formula to the entire column:

<img class="size-full wp-image-19870" src="" alt="Image 6 - String concatenation in Excel" width="1352" height="724" /> Image 6 - String concatenation in Excel

In R, you can use the <code>mutate()</code> function from <code>dplyr</code> to do the same:
<pre><code class="language-r">library(dplyr)
<br>data %&gt;%
 mutate(FullName = paste(FirstName, LastName, sep = " "))</code></pre>
The results are identical, not taking into consideration the UI differences:

<img class="size-full wp-image-19872" src="" alt="Image 7 - String concatenation in R" width="1510" height="412" /> Image 7 - String concatenation in R

The <code>mutate()</code> function is responsible for creating new attributes. In it, you're basically assigning values to a new column. The <code>paste()</code> function will combine multiple fields to produce the final results.
<blockquote>Are you new to R dplyr and want to learn more? <a href="" target="_blank" rel="noopener">We have a complete beginners guide to dplyr on our blog</a>.</blockquote>
Up next, let's tackle conditions. In here, you want the cell value to populate differently, depending on values from different columns.

We'll create a new column named <code>Higher than avg. sal</code> that will have a value of "Y" if the salary for a given employee is greater than the average salary across the company, and "N" otherwise.

Here's the Excel formula you need to use:
<pre><code class="language-excel">=IF(F2&gt;AVERAGE(F$2:F$6); "Y"; "N")</code></pre>
And these are the results:

<img class="size-full wp-image-19874" src="" alt="Image 8 - Conditional statements in Excel" width="1450" height="658" /> Image 8 - Conditional statements in Excel

In R, you can once again use the <code>mutate()</code> function and assign the result to be the output from the <code>ifelse()</code> function. The declaration is pretty much self-explanatory:
<pre><code class="language-r">data %&gt;%
 mutate(HigherThanAvgSal = ifelse(Salary &gt; mean(Salary), "Y", "N"))</code></pre>
The results are once again identical:

<img class="size-full wp-image-19876" src="" alt="Image 9 - Conditional statements in R" width="1598" height="420" /> Image 9 - Conditional statements in R

And that's conditionals for you. Up next, let's go over string manipulation.
<h3>String Manipulation</h3>
You can do all sorts of stuff with strings in Excel and R. We'll demonstrate that on a rather simple example of extracting initials from first and last names.

To achieve this in Excel, you'll need to use the familiar <code>CONCAT()</code> function, with its inner elements being the results of a <code>LEFT()</code> function. This one will extract the first N letters from a string.

If we extract only the first letter for both first and last names and then combine them, we will get the initials.

Here's the implementation in Excel:
<pre><code class="language-excel">=CONCAT(LEFT(B2;1); LEFT(C2;1))</code></pre>
These are the results you should see:

<img class="size-full wp-image-19878" src="" alt="Image 10 - String manipulation in Excel" width="1458" height="598" /> Image 10 - String manipulation in Excel

Things are quite similar in R. Instead of <code>LEFT</code>, you'll need to use <code>substr()</code> and paste the start and end points. These are then surrounded by the familiar <code>paste()</code> function:
<pre><code class="language-r">data %&gt;%
 mutate(Initials = paste(substr(FirstName, 1, 1), substr(LastName, 1, 1), sep = ""))</code></pre>
And you get the same output as with Excel:

<img class="size-full wp-image-19880" src="" alt="Image 11 - String manipulation in R" width="1818" height="470" /> Image 11 - String manipulation in R

And finally, let's go over date manipulations.
<h3>Date Calculations</h3>
There's no getting around dates. They're an integral part of any dataset, and in ours, they signify the exact day at which the employee has joined the company.

What we're aiming to accomplish is simple - calculate how many years has the employee been in our company.

In Excel, one way of doing this is by subtracting the year of the hire date from the year of the current date. Here's an example:
<pre><code class="language-excel">=YEAR(TODAY())-YEAR(G2)</code></pre>
This is what you should see populated in your column:

<img class="size-full wp-image-19882" src="" alt="Image 12 - Date calculations in Excel" width="1458" height="596" /> Image 12 - Date calculations in Excel

Doing the same in R is much easier with the <code>lubridate</code> pacakge. Make sure to install it first - <code>install.packages("lubridate")</code>, and then use the function call similar to the one from Excel:
<pre><code class="language-r">library(lubridate)
<br>data %&gt;%
 mutate(YearsExp = year(today()) - year(DateJoined))</code></pre>
Once again, the results are identical:

<img class="size-full wp-image-19884" src="" alt="Image 13 - Date calculations in R" width="1438" height="424" /> Image 13 - Date calculations in R

But we're not done with this section yet. R has a secret weapon that will leave a lot of Excel users jealous.
<h3>R dplyr - Do Everything With One Command</h3>
And the secret is - the <code>dplyr</code> package allows you to chain operations together. For example, you can create all of the previously seen attributes in a single call.

Here's the code for doing so:
<pre><code class="language-r">library(dplyr)
<br>data &lt;- data %&gt;%
   FullName = paste(FirstName, LastName, sep = " "),
   HigherThanAvgSal = ifelse(Salary &gt; mean(Salary), "Y", "N"),
   Initials = paste(substr(FirstName, 1, 1), substr(LastName, 1, 1), sep = ""),
   YearsExp = year(today()) - year(DateJoined)
This is the output you will see:

<img class="size-full wp-image-19886" src="" alt="Image 14 - Multiple column creation with R dplyr mutate()" width="2278" height="322" /> Image 14 - Multiple column creation with R dplyr mutate()

With R, there's no guesswork in what's going on. You can clearly read the code and the order of operations.
<h2 id="charts">Excel Functions in R for Data Summarization and Charts</h2>
This section will consider one other important aspect of the R vs. Excel debate - charts. Excel makes it easy to produce good-looking data visualizations, but you'll have to wait and see if it's up to par with what R has to offer.
<h3>Conditional Aggregation and Charting in Excel</h3>
Our chart logic is quite simple. We want to display the total monthly salaries across departments. This calculation doesn't involve any work for the Sales and Marketing departments, since there's only one employee there. But still, the formula will scale to future employees.

The <code>SUMIF()</code> function in Excel allows us to calculate the sum only on those records where a defined condition is met. You first define the condition range (departments), then the criterion (IT, Sales, Marketing), and finally the range from which the values will be summed up:
<pre><code class="language-excel">=SUMIF(E2:E6;"IT";F2:F6)
Once calculated, you can select the attributes and create your chart with a couple of clicks:

<img class="size-full wp-image-19888" src="" alt="Image 15 - Chart in Excel" width="1294" height="812" /> Image 15 - Chart in Excel

Neat. Let's see what we have to do to achieve the same in R.
<h3>Conditional Aggregation and Charting in R</h3>
R's <code>dplyr</code> packs a convenient <code>group_by()</code> function that allows us to group the dataset by an apartment, and calculate the summary statistics from there.

All of the results will be stored to a new <code>data.frame</code>, just for the sake of easier visualization:
<pre><code class="language-r">department_salaries &lt;- data %&gt;%
 group_by(Department) %&gt;%
 summarize(Salary = sum(Salary))
This is what the summarized <code>data.frame</code> looks like:

<img class="size-full wp-image-19890" src="" alt="Image 16 - Summary statistics in R" width="532" height="362" /> Image 16 - Summary statistics in R

You can now leverage the <code>ggplot2</code> package to make a bar chart. There's no point in explaining the function call line by line, so just read it and see what happens if you remove some bits and pieces:
<pre><code class="language-r">library(ggplot2)
<br>bar_chart &lt;- ggplot(department_salaries, aes(x = Department, y = Salary)) +
 geom_bar(stat = "identity", fill = "steelblue") +
 labs(title = "Total monthly salaries by department",
      y = "Salary in $",
      x = "Department")
This is the chart you'll see after running the code:

<img class="size-full wp-image-19892" src="" alt="Image 17 - Chart in R with ggplot2" width="1816" height="1536" /> Image 17 - Chart in R with ggplot2

Looking to dive deeper into ggplot2? <a href="" target="_blank" rel="noopener">We have an entire article series on making stunning charts, bar charts included</a>.

And that's basically it for the "how" part. The only thing left to do is discuss the reasons why you might consider switching from Excel to R.
<h2 id="the-benefit">Why Transition from Excel to R - Is R a Viable Excel Alternative?</h2>
Excel isn't going anywhere, that's a fact. It's a tool enjoyed by millions worldwide, while R is a more niched technology. And that's fine, R was never meant to replace Excel for average users.

Where R becomes interesting is if you consider yourself an Excel power user. That's where you run into Excel limitations, such as:
<ol><li><b>Limited data capacity: </b>This will depend on your Excel version, but the more recent ones have a cap of 1048576 rows and 16384 columns. It might be insufficient for handling huge datasets.</li><li><b>Performance issues: </b>Excel can become slow and unresponsive when dealing with complex calculations and large datasets, whereas R is designed to handle such tasks efficiently and without hiccups.</li><li><b>Limited statistical analysis: </b>Excel provides basic statistical functions, while R offers a comprehensive suite of statistical packages, allowing you to do more advanced things.</li><li><b>Lack of version control: </b>Excel lacks built-in version control, making it difficult to track changes and collaborate on projects compared to R, which can use version control systems like Git. Sure, you can version control an Excel file with Git in theory, but no one does it for a reason.</li><li><b>Reproducibility and automation challenges: </b>Excel lacks built-in features for reproducibility, making it harder to document and reproduce data analysis compared to R, which supports scripts and notebooks. Also, Excel requires manual execution of tasks, making it less suitable for automated data processing and analysis compared to R.</li></ol>
This is just a handful of reasons why we think Excel is a no-go for power users and companies working with big datasets. Your mileage may vary, and you're likely to find additional advantages/disadvantages in your area of work.

But still, we think R is a viable Excel alternative whenever you do anything south of basic analysis and visualization.

<hr />

<h2 id="summary">Summing up Excel Functions in R</h2>
And there you have it - Excel functions in R, and why R is a viable alternative to Excel, especially as the amount of data grows. We hope the transition to R doesn't scare you. It's a quite simple programming language, but will definitely require some time to get used to. It's a new way of thinking and implementing actions, after all.

The examples provided today should be enough to get you started. R's <code>dplyr</code> package supports pretty much anything Excel does, at least in the realm of data analysis and manipulation. If you want to visualize your data, look no further than <code>ggplot2</code> for static visualizations.

<i>Did you successfully transition from Excel to R? Or did you find a reason not to?</i> Whatever the case, make sure to share your thoughts with us in the comment section below. You can also reach out on Twitter - <a href="" target="_blank" rel="noopener">@appsilon</a>.
<blockquote>Is Excel obsolete? <a href="" target="_blank" rel="noopener">Here's how (and why) to switch to R Shiny instead</a>.</blockquote>

Contact us!
Damian's Avatar
Damian Rodziewicz
Head of Sales
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.