MANOVA in R - How To Implement and Interpret One-Way MANOVA

Reading time:

time

min

January 5, 2022

The R programming language packs a rich set of statistical functions. It makes it easy to do any kind of statistical test, including the analysis of variance. Today you’ll learn all about MANOVA in R, and apply it to a real dataset.

We’ll start with the theory and discuss use-cases in which you should consider MANOVA instead of regular, univariate ANOVA.
<blockquote><strong>Want to learn the basics? Read our guide to <a href="https://appsilon.com/anova-in-r/" target="_blank" rel="noopener noreferrer">One-way ANOVA in R from scratch.</a> </strong></blockquote>
We strongly recommend reading the above article first, as it lays the foundation for analysis of variance in R.

Table of contents:
<ul><li><a href="#manova-theory">The Theory of MANOVA in R</a></li><li><a href="#manova-implementation">MANOVA in R: Implementation</a></li><li><a href="#manova-post-hoc">Interpret MANOVA in R With a Post-Hoc Test</a></li><li><a href="#conclusion">Conclusion</a></li></ul>

<hr />

<h2 id="manova-theory">The Theory of MANOVA in R</h2>
MANOVA stands for <em>Multivariate ANOVA</em> or <em>Multivariate Analysis Of Variance</em>. It’s an extension of regular ANOVA. The general idea is the same, but the MANOVA test has to include at least two dependent variables to
analyze differences between multiple groups (factors) of the independent variable.

If you only have a single dependent variable, then there’s no point in using MANOVA. Regular ANOVA will suit you fine. For example, if you want to see if a petal length is associated with different types of Iris species, there’s no point in using MANOVA, as you only have a single dependent variable (petal length). On the contrary, if you have data on petal length and petal width, then using MANOVA would be a wise thing to do.

MANOVA in R uses <strong>Pillai’s Trace</strong> test for the calculations, which is then converted to an F-statistic when we want to check the significance of the group mean differences. You can use other tests, such as Wilk’s Lambda, Roy’s Largest Root, or Hotelling-Lawley’s test, but Pillai’s Trace test is the most powerful one.
<h3>Errors, hypotheses, and assumptions</h3>
A regular ANOVA test often suffers from a lot of type I errors. These are caused when you perform multiple ANOVA tests for each dependent variable. MANOVA provides a solution, as it’s able to capture group differences based on combined information of the multiple dependent variables.

Because MANOVA uses more than one dependent variable, the null and the alternative hypotheses are slightly changed:
<ul><li><strong>H0</strong>: Group mean vectors are the same for all groups or they don’t differ significantly.</li><li><strong>H1</strong>: At least one of the group mean vectors is different from the rest.</li></ul>
MANOVA in R won’t tell you which group differs from the rest, but that’s easy to determine via a post-hoc test. We’ll use Linear Discriminant Analysis (LDA) to answer this question later.

As you would expect, MANOVA statistical test has many strict assumptions. The ones from ANOVA carry over - independence of observations and homogeneity of variances - and some new are introduced:
<ul><li><strong>Multivariate normality</strong> - Each combination of independent or dependent variables should have a multivariate normal distribution. Use Shapiro-Wilk’s test to verify.</li><li><strong>Linearity</strong> - Dependent variables should have a linear relationship with each group (factor) of the independent variable.</li><li><strong>No multicollinearity</strong> - Dependent variables should have very high correlations.</li><li><strong>No outliers</strong> - There shouldn’t be any outliers in the dependent variables.</li></ul>
Checking all of these is time-consuming and dataset-specific. To keep things simple, we’ll assume all of the requirements are met and explore only how to use MANOVA in R in the following section.
<h2 id="manova-implementation">MANOVA in R: Implementation</h2>
As with most of the things in R, performing a MANOVA statistical test boils down to a single function call. But we’ll need a dataset first. The Iris dataset is well-known among the data science crowd, and it is built into R:

<script src="https://gist.github.com/darioappsilon/cb7edc12cf8d08400a2e6f845c9b7a91.js"></script>

<img class="size-full wp-image-11894" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01f49099015d6184eac82_iris-dataset-in-r.webp" alt="Image 1 - The Iris dataset in R" width="1290" height="360" /> Image 1 - The Iris dataset in R

It doesn’t matter if you use the same dataset as us, as long as one critical condition is met - the dataset must have more observations (rows) per group in the independent variable than a number of the dependent variables. For example, the Iris dataset has 3 groups and 4 dependent variables. This means we need more than 4 observations for each of the flower species. We have 50 for each, so we're good to go.
<h3>Dependent variables</h3>
As MANOVA cares for the difference in means for each factor, let’s visualize the boxplot for every dependent variable. There will be 4 plots in total arranged in a 2x2 grid, each having a dedicated boxplot for a specific flower species:

<script src="https://gist.github.com/darioappsilon/e7da9bb56f810cfbd2a0d6c299e73398.js"></script>

<img class="size-full wp-image-11892" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01f4bcb8cd72a3d44f924_boxplots-for-all-dependent-variables.webp" alt="Image 2 - Boxplots for all dependent variables and all factors of the independent variable" width="2154" height="1492" /> Image 2 - Boxplots for all dependent variables and all factors of the independent variable
<blockquote><strong>Do you want to learn more about boxplots? <a href="https://appsilon.com/ggplot2-boxplots/" target="_blank" rel="noopener noreferrer">Check our complete guide to stunning boxplots with R</a>.</strong></blockquote>
It seems like the <em>setosa</em> species is more separated than the other two, but let’s not jump to conclusions.
<h3>One-way MANOVA in R</h3>
We can now perform a one-way MANOVA in R. The best practice is to separate the dependent from the independent variable before calling the <code>manova()</code> function. Once the test is done, you can print its summary:

<script src="https://gist.github.com/darioappsilon/d861a81a12687389c75e02ceff6d2648.js"></script>

<img class="size-full wp-image-11904" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01f4bfd01060a37df767a_manova-in-r-test-summary.webp" alt="Image 3 - MANOVA in R test summary" width="1380" height="268" /> Image 3 - MANOVA in R test summary

By default, MANOVA in R uses Pillai’s Trace test statistic. The P-value is practically zero, which means we can safely reject the null hypothesis in the favor of the alternative one - at least one group mean vector differs from the rest.

While we’re here, we could also measure the effect size. One metric often used with MANOVA is <strong>Partial Eta Squared</strong>. It measures the effect the independent variable has on the dependent variables. If the value is 0.14 or greater, we can say the effect size is large. Here’s how to calculate it in R:

<script src="https://gist.github.com/darioappsilon/8fb8cd8dd5004b697c77c59e344fa56b.js"></script>

<img class="size-full wp-image-11906" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01f4dc8798807239e967b_partial-eta-squared-value-for-the-manova-test.webp" alt="Image 4 - Partial Eta Squared value for the MANOVA test" width="1040" height="278" /> Image 4 - Partial Eta Squared value for the MANOVA test

The value is 0.6, which means the effect size is large. It’s a great way to double-check the summary results of a MANOVA test, but how can we actually know which group mean vector differs from the rest? That’s where a post-hoc test comes into play.
<h2 id="manova-post-hoc">Interpret MANOVA in R With a Post-Hoc Test</h2>
The P-Value is practically zero, and the Partial Eta Squared suggests a large effect size - but which group or groups are different from the rest? There’s no way to tell without a post-hoc test. We’ll use
<strong>Linear Discriminant Analysis (LDA),</strong> which finds a linear combination of features that best separates two or more groups.

By doing so, we’ll be able to visualize a scatter plot showing the two linear discriminants on the X and Y axes, and color code them to match our independent variable - the flower species.

You can implement Linear Discriminant Analysis in R using the <code>lda()</code> function from the <code>MASS</code> package:

<script src="https://gist.github.com/darioappsilon/66056eccdcbff3606b4941dda1d20043.js"></script>

<img class="size-full wp-image-11900" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01f4e3e47f6a3573007d0_linear-discriminant-analysis-results.webp" alt="Image 5 - Linear Discriminant Analysis results" width="1628" height="1194" /> Image 5 - Linear Discriminant Analysis results

Take a look at the coefficients to see how the dependent variables are used to form the LDA decision rule. LD1 is calculated as <code>LD1 = 0.83 * Sepal.Length + 1.53 * Sepal.Width - 2.20 * Petal Length - 2.81 * Petal.Width</code>, when rounded to two decimal points.

The snippet below uses the <code>predict()</code> function to get the linear discriminants and combines them with our independent variable:

<script src="https://gist.github.com/darioappsilon/494364f9930f991b2ed07d07d8e74005.js"></script>

<img class="size-full wp-image-11898" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01f50fb4af06205134252_LDA-dataset.webp" alt="Image 6 - LDA dataset" width="840" height="364" /> Image 6 - LDA dataset

The final step in this post-hoc test is to visualize the above <code>lda_df</code> as a scatter plot. Ideally, we should see one or multiple groups stand out:

<script src="https://gist.github.com/darioappsilon/64639c1e0799927cf83a5c0a10a8cef9.js"></script>

<img class="size-full wp-image-11896" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01f50fb4af06205134288_LDA-dataset-scatter-plot.webp" alt="Image 7 - LDA dataset as a scatter plot" width="2144" height="1494" /> Image 7 - LDA dataset as a scatter plot

The <em>setosa</em> species is significantly different when compared to <em>virginica</em> and <em>versicolor</em>. These two are more similar, suggesting that it was the <em>setosa</em> group that had the most impact for us to reject
the null hypothesis.

<strong>To summarize</strong> - the group mean vector of the <em>setosa</em> class is significantly different from the other group means, so it’s safe to assume it was a crucial factor for rejecting the null hypothesis.

<hr />

<h2 id="conclusion">Conclusion</h2>
And there you have it - your go-to guide to MANOVA in R. You now know what MANOVA is, when you should use it, and how to implement and interpret it with the R programming language. For additional practice, we recommend you apply the above code to a dataset of your choice. Just make sure to satisfy all MANOVA prerequisites. You could also remove the <em>setosa</em> class from the Iris dataset and repeat the test. Any ideas on what would happen then?
If you want to learn more about inferential statistics in R, stay tuned to Appsilon’s blog.

Have the latest in R/Shiny and data science delivered weekly in your inbox. Subscribe to our newsletter today.

MANOVA in R - How To Implement and Interpret One-Way MANOVA

Open source, pharma, and AI insights - once a week.

Share Your Data Goals with Us