How to Make Stunning Boxplots in R: A Complete Guide to ggplot Boxplot


<em><strong>Updated</strong>: July 14, 2022.</em> <h2>Boxplots with R and ggplot2</h2> Are your data visualizations an eyesore? It’s a common problem in the data science world. The solution is easier than you think, as R provides many ways to make stunning visuals. Today you’ll learn how to create impressive boxplots with R and the <code>ggplot2</code> package. Need more than boxplots? Explore more of the ggplot2 series: <ul><li><a title="How to Make Stunning Bar Charts with R" href="" target="_blank" rel="noopener noreferrer">Bar Charts with R</a></li><li><a title="How to Make Stunning Line Charts with R" href="" target="_blank" rel="noopener noreferrer">Line Charts with R</a></li><li><a href="" target="_blank" rel="noopener noreferrer">Scatter Plots with R</a></li></ul> This article demonstrates how to <strong>make stunning boxplots</strong> with ggplot based on any dataset. We’ll start simple with a brief introduction to boxplots and how to interpret them, and then dive deep into <strong>visualizing</strong> and <strong>styling</strong> ggplot boxplots. Table of contents: <ul><li><a href="#what-is-a-boxplot">What Is a Boxplot?</a></li><li><a href="#ggplot-vs-ggplot2">ggplot, ggplot2, and ggplot()?</a></li><li><a href="#first-boxplot">Make Your First ggplot Boxplot</a></li><li><a href="#styling">Style ggplot Boxplots — Change Theme, Outline, and Fill Color</a></li><li><a href="#labels">Add Text, Titles, Subtitles, Captions, and Axis Labels to ggplot Boxplots</a></li><li><a href="#advanced">Advanced ggplot Boxplot Examples</a></li><li><a href="#conclusion">Conclusion</a></li></ul> <hr /> <h2 id="what-is-a-boxplot">What Is a ggplot Boxplot?</h2> A boxplot is one of the simplest ways of representing the distribution of a continuous variable. It consists of two parts: <ul><li><strong>Box</strong> — Extends from the first to the third quartile (Q1 to Q3) with a line in the middle that represents the median. 
The range of values between Q1 and Q3 is known as the <em>interquartile range (IQR)</em>.</li><li><strong>Whiskers</strong> — Lines extending from both ends of the box indicate variability outside Q1 and Q3. Each whisker reaches the most extreme data point that still lies within 1.5 * IQR of the box, so the lower whisker never goes below Q1 - 1.5 * IQR and the upper whisker never goes above Q3 + 1.5 * IQR. Points beyond those limits are drawn individually as outliers.</li></ul> Take a look at the following visual representation of a horizontal box plot: <img class="size-full wp-image-8692" src="" alt="Image 1 - Boxplot representation" width="2560" height="780" /> Image 1 - Boxplot representation In short, boxplots pack a ton of information into a single chart. They're great for <strong>summary statistics</strong>. Boxplots show whether a distribution is roughly symmetric or skewed in either direction, although on their own they can't confirm that a variable is normally distributed. You can also easily spot the outliers, which always helps. It's an <strong>excellent data visualization</strong> for statisticians and researchers looking to visualize data distributions, compare several distributions, and, of course, identify outlier points. Boxplots also come in many shapes and styles, with options including horizontal box plots, vertical box plots, notched box plots, violin plots, and more, so be sure to choose the one that fits your needs. <h2 id="ggplot-vs-ggplot2">ggplot, ggplot2, and ggplot()?</h2> Let's clarify something before we begin. <ul><li>ggplot() is the main function</li><li>ggplot is the name of the <a href="" target="_blank" rel="noopener">archived package</a></li><li>ggplot2 is the name of the <a href="" target="_blank" rel="noopener">current package</a></li></ul> Often, you'll hear or see people refer to the ggplot2 package as 'ggplot'. That's because the original package was named, you guessed it, 'ggplot', and old habits die hard. When you call the <strong>function</strong>, it's simply ggplot(), but the current <strong>package</strong> is 'ggplot2'.
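In code, the distinction looks like this. The minimal sketch below installs ggplot2 only if it is missing, loads the package by its package name (ggplot2), and then builds a chart with the function named ggplot(). Because the original ggplot package is archived, install.packages("ggplot") is not the command you want.

```r
# Install the current package, ggplot2, only if it isn't available yet.
# (The original "ggplot" package is archived, so don't try to install that one.)
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Load the package by its package name: ggplot2 ...
library(ggplot2)

# ... but build charts with the function named ggplot()
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_boxplot()

# Confirm what you're working with
print(packageVersion("ggplot2"))  # version of the installed package
print(inherits(p, "ggplot"))      # the plot object was created by ggplot()
```

The package name appears only in `install.packages()` and `library()`; everything after that uses function names such as `ggplot()` and `geom_boxplot()`.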
If you try to install a package called ggplot, you'll run into a wall. Search for ggplot2 instead. <img class="aligncenter wp-image-13427" src="" alt="'ggplot' package installation suggestions from CRAN repository based on input text" width="400" height="288" /> Let’s see how you can use R and ggplot2 to visualize boxplots. <h2 id="first-boxplot">Make Your First ggplot Boxplot</h2> <h3>Data frame for Your Boxplot</h3> R ships with many built-in datasets, one of them being <code>mtcars</code>. It’s a small, easy-to-explore dataset we’ll use today to draw boxplots. You’ll need only <code>ggplot2</code> installed to follow along with most of the examples; the advanced section also uses <code>dplyr</code> and <code>ggExtra</code>. In most of the charts, we’ll visualize boxplots of the <code>mpg</code> (miles per gallon) variable for each <code>cyl</code> (number of cylinders) option. The <code>cyl</code> column is stored as a number, so you’ll have to convert it to a factor first. Here’s how: <pre>library(ggplot2)

df &lt;- mtcars
df$cyl &lt;- as.factor(df$cyl)
head(df)
</pre> The <code>head()</code> function prints the first six rows of the dataset: <img class="size-full wp-image-8693" src="" alt="Image 2 - Head of MTCars dataset" width="1488" height="364" /> Image 2 - Head of MTCars dataset From the image alone, you can see that <code>mpg</code> is continuous, and <code>cyl</code> is now categorical. That’s exactly the variable-type combination you want when working with boxplots. <h3>Visualization</h3> You can make ggplot boxplots look stunning with a bit of work, but at first they’ll look pretty plain. Think of this as a blank canvas for your boxplot story. The <code>geom_boxplot()</code> function is what draws boxplots in <code>ggplot2</code>.
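To connect the 1.5 * IQR whisker rule from the What Is a Boxplot section with what geom_boxplot() actually draws, the sketch below computes Q1, the median, Q3, the whisker ends, and the outliers by hand for the eight-cylinder cars, then reads the same statistics back from the built plot with ggplot2's layer_data(). It assumes geom_boxplot()'s default coef = 1.5 and R's default quantile() method (type 7). Base R's boxplot() computes its hinges slightly differently, so its numbers can differ a little.

```r
library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)

# --- By hand: quartiles and the 1.5 * IQR rule for 8-cylinder cars ---
y <- df$mpg[df$cyl == "8"]

q <- unname(quantile(y, c(0.25, 0.5, 0.75)))  # Q1, median, Q3 (type 7)
iqr <- q[3] - q[1]                             # same value as IQR(y)
lower_limit <- q[1] - 1.5 * iqr
upper_limit <- q[3] + 1.5 * iqr

is_outlier <- y < lower_limit | y > upper_limit
whiskers <- range(y[!is_outlier])   # most extreme points inside the limits
outliers <- sort(y[is_outlier])     # everything else is drawn as a dot

# --- From ggplot2: the statistics geom_boxplot() actually draws ---
p <- ggplot(df, aes(x = cyl, y = mpg)) +
  geom_boxplot()
box_stats <- layer_data(p)          # one row per box, in factor-level order

box_stats[, c("ymin", "lower", "middle", "upper", "ymax")]
box_stats$outliers[[3]]             # outliers of the third box (cyl == "8")
```

The third row of `box_stats` is the eight-cylinder group because the boxes follow the order of the factor levels ("4", "6", "8"). Comparing it with `q`, `whiskers`, and `outliers` is a quick way to check how a given boxplot was computed.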
Here’s how to use <code>geom_boxplot()</code> to make a default-looking boxplot of the <em>miles per gallon</em> variable: <pre>ggplot(df, aes(x = mpg)) +
  geom_boxplot()
</pre> <img class="size-full wp-image-8694" src="" alt="Image 3 - Simple boxplot with ggplot2" width="2550" height="1594" /> Image 3 - Simple boxplot with ggplot2 And boy, is it ugly. We’ll deal with styling after we go over the basics. Often, you’ll want to visualize multiple boxplots on a single chart, each showing the distribution of the variable for one group. For example, we can visualize the distribution of <em>miles per gallon</em> for every possible <em>cylinder</em> value. The latter is already converted to a factor, so you’re ready to go. Here’s the code: <pre>ggplot(df, aes(x = cyl, y = mpg)) +
  geom_boxplot()
</pre> <img class="size-full wp-image-8695" src="" alt="Image 4 - Miles per gallon among different cylinder numbers" width="2550" height="1594" /> Image 4 - Miles per gallon among different cylinder numbers The result makes sense: cars with more cylinders tend to get fewer miles per gallon. The eight-cylinder group also has outliers, drawn as dots above and below its whiskers. You can change the orientation of the chart if you prefer horizontal boxes. Just add the <code>coord_flip()</code> function to the chart: <pre>ggplot(df, aes(x = cyl, y = mpg)) +
  geom_boxplot() +
  coord_flip()
</pre> <img class="size-full wp-image-8696" src="" alt="Image 5 - Changing the orientation" width="2550" height="1594" /> Image 5 - Changing the orientation We’ll stick with the default orientation moving forward. Let’s say you want to display every data point on the boxplot. The <code>mtcars</code> dataset is relatively small, so it might actually be a good idea.
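A common way to show every observation is to overlay jittered points on top of the boxes. The sketch below is an option added here for comparison, not the approach used in the rest of the article: `outlier.shape = NA` hides the boxplot's own outlier markers so those cars aren't drawn twice, and `position_jitter()` spreads the points sideways only (`height = 0`), with a fixed seed (42 is an arbitrary choice) so the chart looks the same on every run.

```r
library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)

p <- ggplot(df, aes(x = cyl, y = mpg)) +
  # Hide the boxplot's outlier markers; the points layer shows every car anyway
  geom_boxplot(outlier.shape = NA) +
  # Jitter horizontally only, so each point keeps its exact mpg value
  geom_point(
    position = position_jitter(width = 0.2, height = 0, seed = 42),
    alpha = 0.6
  )

p
```

Jittering gives up exact horizontal positions, which carry no meaning on a categorical axis anyway, in exchange for keeping overlapping points visible.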
The <code>geom_dotplot()</code> function does this by binning the observations along the y-axis (<code>binaxis = "y"</code>) and stacking the dots around the middle of each box (<code>stackdir = "center"</code>): <pre>ggplot(df, aes(x = cyl, y = mpg)) +
  geom_boxplot() +
  geom_dotplot(binaxis = "y", stackdir = "center", dotsize = 0.5)
</pre> <img class="size-full wp-image-8698" src="" alt="Image 6 - Displaying all data points on the boxplot" width="2550" height="1594" /> Image 6 - Displaying all data points on the boxplot Be extra careful if you’re doing this with a larger dataset: the dots can bury the boxes, outliers become harder to spot, and the chart quickly gets overwhelming. Let’s explore how you can make boxplots more appealing to the eye. <h2 id="styling">Style a ggplot Boxplot — Change Theme, Outline, and Fill Color</h2> <h3>Boxplot Outline</h3> Let’s start with the outline color. It might be just enough to give your visualization an extra punch. Map a variable to the <code>color</code> aesthetic inside <code>aes()</code> to decide which color each box gets, and then use the <code>scale_color_manual()</code> function to provide a vector of colors: <pre>ggplot(df, aes(x = cyl, y = mpg, color = cyl)) +
  geom_boxplot() +
  scale_color_manual(values = c("#0099f8", "#e74c3c", "#2ecc71"))
</pre> <img class="size-full wp-image-8699" src="" alt="Image 7 - Changing the outline color" width="2550" height="1594" /> Image 7 - Changing the outline color There are other ways to specify a color palette, including ready-made and custom palettes, but we find the option above to be the most customizable. <h3>Fill Boxplot</h3> If you want to change the fill color instead, you have options.
You can pass a color to the <code>fill</code> parameter inside <code>geom_boxplot()</code> if you want all boxplots to have the same color: <pre>ggplot(df, aes(x = cyl, y = mpg)) +
  geom_boxplot(fill = "#0099f8")
</pre> <img class="size-full wp-image-8700" src="" alt="Image 8 - Changing the fill color" width="2550" height="1594" /> Image 8 - Changing the fill color The alternative is to apply the same logic we used for the outline color: map a variable to the <code>fill</code> aesthetic to control which color is applied where, and use the <code>scale_fill_manual()</code> function to change the colors: <pre>ggplot(df, aes(x = cyl, y = mpg, fill = cyl)) +
  geom_boxplot() +
  scale_fill_manual(values = c("#0099f8", "#e74c3c", "#2ecc71"))
</pre> <img class="size-full wp-image-8701" src="" alt="Image 9 - Changing the fill color (2)" width="2550" height="1594" /> Image 9 - Changing the fill color (2) <h3>Changing Your Boxplot Theme</h3> Now we’re getting somewhere. The only thing we haven’t addressed is that horrendous gray background. You can get rid of it by changing the theme. For example, adding <code>theme_classic()</code> gives your chart a cleaner, more minimalistic look: <pre>ggplot(df, aes(x = cyl, y = mpg, fill = cyl)) +
  geom_boxplot() +
  scale_fill_manual(values = c("#0099f8", "#e74c3c", "#2ecc71")) +
  theme_classic()
</pre> <img class="size-full wp-image-8702" src="" alt="Image 10 - Changing the theme" width="2550" height="1594" /> Image 10 - Changing the theme Style boils down to personal preference, but this one is much easier to look at in our opinion. There’s still one gigantic elephant in the room: titles and labels. No one knows what your ggplot boxplot represents without them. <h2 id="labels">Add Text, Titles, Subtitles, Captions, and Axis Labels to a ggplot Boxplot</h2> <h3>Labeling ggplot Boxplots</h3> Let’s start with text labels.
It’s somewhat unusual to add them to boxplots, as text labels are typically used on charts that display exact values (bar charts, line charts, etc.). Nevertheless, you can display any text you want on ggplot boxplots; you’ll just have to get a bit more creative. For example, if you want to display the number of observations, mean, and median above every boxplot, you’ll first have to declare a function that computes that information. We decided to name ours <code>get_box_stats()</code>: <pre>get_box_stats &lt;- function(y, upper_limit = max(df$mpg) * 1.15) {
  return(data.frame(
    y = 0.95 * upper_limit,
    label = paste(
      "Count =", length(y), "\n",
      "Mean =", round(mean(y), 2), "\n",
      "Median =", round(median(y), 2), "\n"
    )
  ))
}
</pre> Note that the default <code>upper_limit</code> reads <code>df$mpg</code> directly, so every label is placed at the same height, slightly above the highest <code>mpg</code> value in the data. <blockquote><strong>Discover more Boxplot arguments in the <a href="" target="_blank" rel="noopener noreferrer">ggplot2 boxplot documentation</a>.</strong></blockquote> You can now pass it to the <code>fun.data</code> argument of <code>stat_summary()</code> when drawing boxplots: <pre>ggplot(df, aes(x = cyl, y = mpg, fill = cyl)) +
  geom_boxplot() +
  scale_fill_manual(values = c("#0099f8", "#e74c3c", "#2ecc71")) +
  stat_summary(fun.data = get_box_stats, geom = "text", hjust = 0.5, vjust = 0.9) +
  theme_classic()
</pre> <img class="size-full wp-image-8703" src="" alt="Image 11 - Adding text" width="2550" height="1594" /> Image 11 - Adding text Neat, right? Much better than displaying values directly on the chart. <h3>Titling Boxplots</h3> Let’s cover titles and axis labels next. These are mandatory for production-ready charts, as without them, users don’t know what they’re looking at.
The <code>labs()</code> function adds a title, subtitle, caption, x-axis label, and y-axis label: <pre>ggplot(df, aes(x = cyl, y = mpg)) +
  geom_boxplot(fill = "#0099f8") +
  labs(
    title = "Miles per gallon among different cylinder options",
    subtitle = "Made by Appsilon",
    caption = "Source: MTCars dataset",
    x = "Number of cylinders",
    y = "Miles per gallon"
  ) +
  theme_classic()
</pre> <img class="size-full wp-image-8704" src="" alt="Image 12 - Adding title, subtitle, caption, and axis labels" width="2550" height="1594" /> Image 12 - Adding title, subtitle, caption, and axis labels If you think these look a bit plain, you’re not alone. You can use the <code>theme()</code> function to style them. Be aware that a complete theme such as <code>theme_classic()</code> <strong>overrides any custom styles</strong> declared before it, so add your <code>theme()</code> call after it: <pre>ggplot(df, aes(x = cyl, y = mpg)) +
  geom_boxplot(fill = "#0099f8") +
  labs(
    title = "Miles per gallon among different cylinder options",
    subtitle = "Made by Appsilon",
    caption = "Source: MTCars dataset",
    x = "Number of cylinders",
    y = "Miles per gallon"
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(color = "#0099f8", size = 16, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(face = "bold.italic", hjust = 0.5),
    plot.caption = element_text(face = "italic")
  )
</pre> <img class="size-full wp-image-8705" src="" alt="Image 13 - Styling title, subtitle, and caption" width="2550" height="1594" /> Image 13 - Styling title, subtitle, and caption Much better, assuming you like the blue color. <h2 id="advanced">Advanced ggplot Boxplot Examples</h2> We'll now cover a couple of advanced things you can do with R ggplot boxplots. These might not be super handy for everyday tasks, but you'll know when you need them. Let's start with something on the simple side: adding the mean value.
<h3>Adding mean value to boxplots</h3> As you already know, boxplots show the median as a thick line somewhere in the box. But what if you also want to show the mean? That's what this subsection will teach you. The <code>stat_summary()</code> function does the trick. You can pass it any summary function and point shape, but we'll stick with the mean, drawn as a hollow triangle (<code>shape = 2</code>): <pre><code class="language-r">ggplot(df, aes(x = cyl, y = mpg)) +
  geom_boxplot() +
  stat_summary(fun = "mean", geom = "point", shape = 2, size = 3, color = "blue")</code></pre> <img class="size-full wp-image-14626" src="" alt="Image 14 - Adding mean values to a boxplot" width="1760" height="1306" /> Image 14 - Adding mean values to a boxplot Again, not something you'll use daily, but you'll know exactly when you need this functionality. <h3>Highlight individual boxplots</h3> Sometimes you want to shift the user's focus to a certain area of the chart. Doing so is a bit tricky, as it involves adding a new variable to your dataset that specifies which rows belong to the highlighted group. We can use the <code>dplyr</code> package to create it and pipe the result into <code>ggplot()</code>. Take a look for yourself: <pre><code class="language-r">library(dplyr)

df %&gt;%
  mutate(hlt = ifelse(cyl == 4, "Highlighted", "Normal")) %&gt;%
  ggplot(aes(x = cyl, y = mpg, fill = hlt, alpha = hlt)) +
    geom_boxplot() +
    scale_fill_manual(values = c("#0099f8", "grey")) +
    scale_alpha_manual(values = c(1, 0.5)) +
    theme(legend.position = "none")</code></pre> <img class="size-full wp-image-14628" src="" alt="Image 15 - Highlighting individual boxplot" width="1758" height="1304" /> Image 15 - Highlighting individual boxplot The manual scales assign their values in the alphabetical order of the labels, so "Highlighted" gets the blue fill and full opacity. We've modified both the color and transparency of individual boxplots, but you're free to stick with coloring only. <h3>Add boxplots as marginal distributions to scatter plots</h3> In statistics, this is done all the time.
It saves both time and space: the scatter plot shows the relationship between two variables, and the margins show the distribution of each one. <code>ggplot2</code> doesn't ship with this functionality, so you'll have to install an additional package, <code>ggExtra</code>. Once installed, create a scatter plot as you normally would, and then pass it to <code>ggMarginal()</code>, which can show a marginal distribution as a histogram, density plot, or boxplot: <pre><code class="language-r">library(ggExtra)

p &lt;- ggplot(df, aes(x = wt, y = mpg, color = cyl, size = cyl)) +
  geom_point(alpha = 0.7) +
  theme_minimal() +
  theme(legend.position = "none")

ggMarginal(p, type = "boxplot")
</code></pre> <img class="size-full wp-image-14630" src="" alt="Image 16 - Adding marginal distributions" width="1764" height="1300" /> Image 16 - Adding marginal distributions Neat, right? Set <code>type</code> to <code>"histogram"</code> or <code>"density"</code> to change the charts on the margins. Everything covered so far is enough to get you on the right track when making ggplot boxplots, so we’ll stop here. <blockquote><strong>Looking for more examples of Boxplots? Check out the <a href="" target="_blank" rel="noopener noreferrer">r-bloggers boxplot feed to see what the R community has to say</a>. </strong></blockquote> <hr /> <h2 id="conclusion">Conclusion to ggplot Boxplot in R</h2> Today you’ve learned what boxplots are and how to draw them with R and the <code>ggplot2</code> package. You've also learned how to make them aesthetically pleasing by changing colors and adding text, titles, and axis labels. You now have the knowledge to style boxplots however you'd like. You know what to tweak, and now it’s up to you to pick fonts and colors. When creating data visualizations with R, you're only limited by your creativity (and R knowledge).
If you need help finding inspiration or tools, be sure to check out what can be achieved with <a href="" target="_blank" rel="noopener noreferrer">advanced R programming</a>. At Appsilon, we’ve used the <code>ggplot2</code> package frequently when developing enterprise R Shiny dashboards for Fortune 500 companies. If you have a keen eye for design and know a thing or two about R and Shiny, reach out. We have several <a href="" target="_blank" rel="noopener noreferrer">R Shiny developer positions</a> available. <blockquote>Read more: <a title="How to Start a Career as an R Shiny Developer" href="" target="_blank" rel="noopener noreferrer">How to Start a Career as an R Shiny Developer</a></blockquote>
