# Deep Learning with R and Keras: Build a Handwritten Digit Classifier in 10 Minutes

<em><strong>Updated</strong>: December 30, 2022.</em>
<h2><span data-preserver-spaces="true">Deep Learning in R - MNIST Classifier with R Keras</span></h2>
<span data-preserver-spaces="true">In a day and age where everyone seems to know how to solve at least basic deep learning tasks with Python, one question arises: </span><em><span data-preserver-spaces="true">How does R fit into the whole deep learning picture</span></em><span data-preserver-spaces="true">?</span>
<blockquote><span data-preserver-spaces="true">You don't need deep learning algorithms to solve basic image classification tasks. </span><a class="editor-rtfLink" href="https://wordpress.appsilon.com/r-mnist-random-forests/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Here's how to classify handwritten digits with R and Random Forests</span></a><span data-preserver-spaces="true">.</span></blockquote>
<span data-preserver-spaces="true">Here is some good news for R fans - both TensorFlow and Keras are available to you, and they're easy to configure. Today you'll learn how to solve the well-known MNIST handwritten digit classification problem with Keras.</span>
<span data-preserver-spaces="true">Navigate to a section:</span>
<ul><li><a href="#install">Installing TensorFlow and Keras with R</a></li><li><a href="#dataset">Dataset Loading and Preparation</a></li><li><a href="#training">Model Training</a></li><li><a href="#evaluation">Model Evaluation</a></li><li><a href="#conclusion">Conclusion</a></li></ul>
<hr />
<h2 id="install"><span data-preserver-spaces="true">Installing TensorFlow and Keras with R</span></h2>
<span data-preserver-spaces="true">To build an image classifier with Keras, you'll have to install the library first. But before you can install Keras, you'll have to install TensorFlow.</span>
<span data-preserver-spaces="true">The procedure is a bit different from installing other libraries. You'll still use the <code>install.packages()</code> function, but there's an extra step involved: calling the package's own installer function, which sets up the Python backend.</span>
<span data-preserver-spaces="true">Here's how to install TensorFlow from the R console:</span>
<pre><code class="language-r">install.packages("tensorflow")
library(tensorflow)
install_tensorflow()</code></pre>
<span data-preserver-spaces="true">Most likely, you'll be prompted to install Miniconda, which is something you should do - assuming you don't have it already.</span>
<span data-preserver-spaces="true">The installation process for Keras is identical - just be aware that TensorFlow has to be installed first:</span>
<pre><code class="language-r">install.packages("keras")
library(keras)
install_keras()</code></pre>
<span data-preserver-spaces="true">Restart the R session if you're asked to - failing to do so could result in some DLL issues, at least according to R.</span>
<span data-preserver-spaces="true">And that's all you need to install the libraries. Let's load and prepare the dataset next.</span>
<h2 id="dataset"><span data-preserver-spaces="true">Dataset Loading and Preparation</span></h2>
<span data-preserver-spaces="true">Luckily for us, the MNIST dataset is built into the Keras library. You can get it by calling the <code>dataset_mnist()</code> function once the library is imported.</span>
<span data-preserver-spaces="true">Further, you should split the dataset into four objects:</span>
<ul><li><span data-preserver-spaces="true"><code>X_train</code> - contains digits for the training set</span></li><li><span data-preserver-spaces="true"><code>X_test</code> - contains digits for the testing set</span></li><li><span data-preserver-spaces="true"><code>y_train</code> - contains labels for the training set</span></li><li><span data-preserver-spaces="true"><code>y_test</code> - contains labels for the testing set</span></li></ul>
<span data-preserver-spaces="true">You can use the following code snippet to import Keras and unpack the data:</span>
<pre><code class="language-r">library(keras)
<br>mnist <- dataset_mnist()
X_train <- mnist$train$x
X_test <- mnist$test$x
y_train <- mnist$train$y
y_test <- mnist$test$y</code></pre>
<span data-preserver-spaces="true">It's a good start, but we're not done yet. This article will only use dense (fully connected) layers, with no convolutions, so you'll have to reshape each input image from 28x28 to a flat vector of 784 values. You can do so with the <code>array_reshape()</code> function from Keras. Further, you'll also divide each value of the image matrix by 255, so all pixel values are in the [0, 1] range.</span>
<span data-preserver-spaces="true">That will handle the input images, but we also have to convert the labels. These are stored as integers by default, and we'll convert them to categories with the <code>to_categorical()</code> function.</span>
<span data-preserver-spaces="true">Here's the entire code snippet:</span>
<pre><code class="language-r">X_train <- array_reshape(X_train, c(nrow(X_train), 784))
X_train <- X_train / 255
<br>X_test <- array_reshape(X_test, c(nrow(X_test), 784))
X_test <- X_test / 255
<br>y_train <- to_categorical(y_train, num_classes = 10)
y_test <- to_categorical(y_test, num_classes = 10)</code></pre>
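To see what the scaling and label conversion do, here is a minimal base R sketch on toy data. It assumes a tiny made-up pixel matrix and label vector, and the <code>one_hot()</code> helper is hypothetical - it only mimics the idea behind <code>to_categorical()</code>, it is not the Keras implementation.

```r
# Toy stand-ins for MNIST: three flattened "images" with four pixels each
pixels <- matrix(c(0, 128, 255, 64,
                   255, 255, 0, 0,
                   32, 16, 8, 4), nrow = 3, byrow = TRUE)
labels <- c(5, 0, 9)

# Scale pixel intensities from [0, 255] to [0, 1]
pixels_scaled <- pixels / 255

# Hypothetical helper: one row per label, with a single 1 in column label + 1
# (labels start at 0, R columns start at 1)
one_hot <- function(labels, num_classes) {
  encoded <- matrix(0, nrow = length(labels), ncol = num_classes)
  encoded[cbind(seq_along(labels), labels + 1)] <- 1
  encoded
}

labels_encoded <- one_hot(labels, num_classes = 10)

print(range(pixels_scaled))
print(labels_encoded)
```

Each row of the encoded matrix contains exactly one 1, which is the format the softmax output layer and the categorical cross-entropy loss expect.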
<span data-preserver-spaces="true">And that's all we need to start with model training. Let's do that next.</span>
<h2 id="training"><span data-preserver-spaces="true">Model Training</span></h2>
<span data-preserver-spaces="true">MNIST is a large and simple dataset, so a simple model architecture should result in a near-perfect model.</span>
<span data-preserver-spaces="true">We'll have three hidden layers with 256, 128, and 64 neurons, respectively, and an output layer with ten neurons since there are ten distinct classes in the MNIST dataset.</span>
<span data-preserver-spaces="true">Every hidden layer is followed by a dropout layer with a rate of 0.25 in order to reduce overfitting.</span>
<span data-preserver-spaces="true">Once you declare the model, you can use the <code>summary()</code> function to print its architecture:</span>
<pre><code class="language-r">model <- keras_model_sequential() %>%
layer_dense(units = 256, activation = "relu", input_shape = c(784)) %>%
layer_dropout(rate = 0.25) %>%
layer_dense(units = 128, activation = "relu") %>%
layer_dropout(rate = 0.25) %>%
layer_dense(units = 64, activation = "relu") %>%
layer_dropout(rate = 0.25) %>%
layer_dense(units = 10, activation = "softmax")
summary(model)</code></pre>
<span data-preserver-spaces="true">The results are shown in the following figure:</span>
<img class="size-full wp-image-6640" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b3958506a4963c1d08d28a_1-5.webp" alt="Image 1 - Summary of our neural network architecture" width="640" height="338" /> Image 1 - Summary of our neural network architecture
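If you want to double-check the parameter counts in the summary, the arithmetic can be reproduced in base R. This is only a sketch that assumes the layer sizes declared above and the usual rule that a dense layer has one weight per input-unit pair plus one bias per unit; dropout layers add no parameters.

```r
# Layer sizes from the model above: 784 inputs, three hidden layers, 10 outputs
layer_sizes <- c(784, 256, 128, 64, 10)

# A dense layer has (inputs * units) weights plus one bias per unit
n_in <- head(layer_sizes, -1)
n_out <- tail(layer_sizes, -1)
params_per_layer <- n_in * n_out + n_out

print(params_per_layer)       # parameters in each dense layer
print(sum(params_per_layer))  # total trainable parameters
```

You can compare the printed total with the one reported by <code>summary(model)</code>.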
<span data-preserver-spaces="true">One step remains before we can begin training - compiling the model. This step involves choosing how loss is measured, choosing a function for reducing loss, and choosing a metric that measures overall performance.</span>
<span data-preserver-spaces="true">Let's go with categorical cross-entropy, Adam, and accuracy, respectively:</span>
<pre><code class="language-r">model %>% compile(
loss = "categorical_crossentropy",
optimizer = optimizer_adam(),
metrics = c("accuracy")
)</code></pre>
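To get a feel for what the loss and the metric measure, here is a small base R sketch that computes both by hand. The one-hot labels and predicted probabilities are made up for illustration, and the calculation follows the textbook definitions rather than Keras internals.

```r
# Toy one-hot labels and predicted probabilities for two samples, three classes
y_true <- rbind(c(1, 0, 0),
                c(0, 1, 0))
y_prob <- rbind(c(0.7, 0.2, 0.1),
                c(0.1, 0.8, 0.1))

# Categorical cross-entropy: mean negative log-probability of the true class
cce <- -mean(rowSums(y_true * log(y_prob)))

# Accuracy: share of samples where the most probable class is the true class
acc <- mean(max.col(y_prob, ties.method = "first") ==
            max.col(y_true, ties.method = "first"))

print(cce)
print(acc)
```

The loss shrinks as the probability assigned to the correct class approaches 1, while accuracy only checks whether the most probable class is the right one.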
<span data-preserver-spaces="true">You can now call the <code>fit()</code> function to train the model. The following snippet trains the model for 50 epochs, feeding 128 images at a time and holding out 15% of the training data for validation:</span>
<pre><code class="language-r">history <- model %>%
fit(X_train, y_train, epochs = 50, batch_size = 128, validation_split = 0.15)</code></pre>
<span data-preserver-spaces="true">Once this line of code is executed, you'll see the following output:</span>
<img class="size-full wp-image-6641" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d554c92cb2a06ec0e9cc_ab16d21f_2-5.webp" alt="Image 2 - Model training (step 1)" width="1050" height="339" /> Image 2 - Model training (step 1)
<span data-preserver-spaces="true">After a minute or so, all 50 epochs will have finished. Here's the final output you should see:</span>
<img class="size-full wp-image-6642" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d55500fd9b7af134460b_48ee29b1_3-5.webp" alt="Image 3 - Model training (step 2)" width="1059" height="151" /> Image 3 - Model training (step 2)
<span data-preserver-spaces="true">At the same time, you'll see a chart updating as the model trains. It shows both loss and accuracy on the training and validation subsets. Here's what it should look like when the training process is complete:</span>
<img class="size-full wp-image-6643" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d5552c3f301b93864697_ee89119d_4-5.webp" alt="Image 4 - Loss and accuracy on training and validation sets" width="2314" height="1154" /> Image 4 - Loss and accuracy on training and validation sets
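The number of batches per epoch in the training log follows from the arguments passed to <code>fit()</code>. Here's a quick base R sketch of that arithmetic, assuming the standard 60,000-image MNIST training set and that the validation share is simply set aside before batching:

```r
n_samples <- 60000         # size of the MNIST training set
validation_split <- 0.15   # share held out for validation
batch_size <- 128          # images per gradient update

# Images set aside for validation and images left for training
n_val <- round(n_samples * validation_split)
n_train <- n_samples - n_val

# The last batch may be smaller, so round up
steps_per_epoch <- ceiling(n_train / batch_size)

print(c(train = n_train, validation = n_val, steps = steps_per_epoch))
```

A larger batch size means fewer, bigger updates per epoch, while a smaller one means more updates and usually a longer epoch.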
<span data-preserver-spaces="true">And that's it - you're ready to evaluate the model. Let's do that next.</span>
<h2 id="evaluation"><span data-preserver-spaces="true">Model Evaluation</span></h2>
<span data-preserver-spaces="true">You can use the <code>evaluate()</code> function from Keras to evaluate the performance on the test set. Here's the code snippet for doing so:</span>
<pre><code class="language-r">model %>%
evaluate(X_test, y_test)</code></pre>
<span data-preserver-spaces="true">And here are the results:</span>
<img class="size-full wp-image-6645" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d556a6ad2b083ed5d84b_e9e9f295_5-5.webp" alt="Image 5 - Model evaluation on the test set" width="711" height="45" /> Image 5 - Model evaluation on the test set
<span data-preserver-spaces="true">As you can see, the model achieved above 98% accuracy on previously unseen data.</span>
<span data-preserver-spaces="true">To make predictions on a new subset of data, older versions of Keras offered the <code>predict_classes()</code> function. It no longer works in recent versions (UPDATE 2022), so call <code>predict()</code> instead and pipe the result into <code>k_argmax()</code>, which picks the most probable class for each image:</span>
<pre><code class="language-r">predictions <- model %>%
predict(X_test) %>%
k_argmax()
<br>predictions$numpy()</code></pre>
<span data-preserver-spaces="true">Here are the results:</span>
<img class="size-full wp-image-6644" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d5577e3fcb2832abaa90_3e8cc2e4_6-4.webp" alt="Image 6 - Class predictions on the test set" width="734" height="376" /> Image 6 - Class predictions on the test set
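Conceptually, the argmax step turns each row of class probabilities into a single label. The base R sketch below shows the same idea on a made-up probability matrix with three classes instead of ten; it uses <code>max.col()</code> as a stand-in and is not how Keras implements <code>k_argmax()</code>.

```r
# Toy softmax output: one row per image, one column per class (0, 1, 2)
probs <- rbind(
  c(0.05, 0.90, 0.05),
  c(0.60, 0.30, 0.10),
  c(0.10, 0.20, 0.70)
)

# max.col() returns 1-based column positions, so subtract 1
# to get zero-based class labels
predicted <- max.col(probs, ties.method = "first") - 1

print(predicted)
```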
You can take the R Keras model evaluation a step further by creating a custom evaluation dataframe. The one below contains the actual test labels and the predictions. Keep in mind that the <code>argmax()</code> function from <code>ramify</code> returns 1-based positions, so 1 is subtracted from the actual values to get labels in the range 0 to 9 instead of 1 to 10:
<pre><code class="language-r">library(ramify)
<br>eval_df <- data.frame(
y = argmax(y_test),
y_hat = predictions$numpy()
)
eval_df$y <- lapply(eval_df$y, function(x) x - 1)
<br>head(eval_df)</code></pre>
<img class="size-full wp-image-17198" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d5582c764ec4771084a4_081d4fae_7-2.webp" alt="Image 7 - Head of the evaluation data.frame" width="358" height="414" /> Image 7 - Head of the evaluation data.frame
You can now derive additional columns as you wish. For example, here's how you can add a column that compares the two existing ones to see if the classification is correct:
<pre><code class="language-r">eval_df$is_correct <- ifelse(eval_df$y == eval_df$y_hat, 1, 0)
<br>head(eval_df)</code></pre>
<img class="size-full wp-image-17200" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d5587b3b08462745a577_b976ea1b_8-1.webp" alt="Image 8 - Head of the evaluation data.frame (2)" width="472" height="408" /> Image 8 - Head of the evaluation data.frame (2)
Probably the biggest reason you'd want a custom evaluation dataframe is so you can calculate custom evaluation metrics. One caveat: the <code>y</code> column was created with <code>lapply()</code>, so it is stored as a list, and the dataframe has to be unlisted before you can construct a confusion matrix:
<pre><code class="language-r">eval_df <- as.data.frame(lapply(eval_df, unlist))
table(ACTUAL = eval_df$y, PREDICTED = eval_df$y_hat)</code></pre>
<img class="size-full wp-image-17202" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d55985b6bd7053b71d06_d307b4b2_9-2.webp" alt="Image 9 - MNIST confusion matrix" width="1248" height="678" /> Image 9 - MNIST confusion matrix
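MNIST is a 10-class classification problem, so the confusion matrix isn't easy to read at a glance. Calculating metrics such as precision and recall by hand is also tedious with this many classes.
Luckily, you can use dedicated R packages to calculate these metrics easily. Keep in mind that <code>Precision()</code>, <code>Recall()</code>, and <code>F1_Score()</code> from <code>MLmetrics</code> score a single "positive" class, which you can choose with the <code>positive</code> argument; when it is left out, as below, the score refers to one class rather than an average over all ten: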
<pre><code class="language-r">library(MLmetrics)
<br>MLmetrics::Accuracy(y_pred = eval_df$y_hat, y_true = eval_df$y)
MLmetrics::Precision(y_pred = eval_df$y_hat, y_true = eval_df$y)
MLmetrics::Recall(y_pred = eval_df$y_hat, y_true = eval_df$y)
MLmetrics::F1_Score(y_pred = eval_df$y_hat, y_true = eval_df$y)</code></pre>
<img class="size-full wp-image-17204" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d55ac92cb2a06ec0ef08_a91006aa_10.webp" alt="Image 10 - Test set metrics" width="1442" height="414" /> Image 10 - Test set metrics
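If you'd like precision and recall for every class at once, they can be read straight off the confusion matrix. The following base R sketch uses small made-up label vectors with three classes; the same steps should carry over to <code>eval_df$y</code> and <code>eval_df$y_hat</code> with levels 0 to 9, which is an assumption rather than something shown above.

```r
# Toy actual and predicted labels for three classes
y     <- c(0, 0, 1, 1, 2, 2, 2)
y_hat <- c(0, 1, 1, 1, 2, 2, 0)

# Fix the factor levels so every class gets a row and a column
classes <- 0:2
cm <- table(ACTUAL = factor(y, levels = classes),
            PREDICTED = factor(y_hat, levels = classes))

# Precision: correct predictions / all predictions of that class (columns)
precision <- diag(cm) / colSums(cm)
# Recall: correct predictions / all actual members of that class (rows)
recall <- diag(cm) / rowSums(cm)
# Overall accuracy: correct predictions / all predictions
accuracy <- sum(diag(cm)) / sum(cm)

print(round(rbind(precision, recall), 3))
print(accuracy)
```

Averaging the per-class values gives macro-averaged precision and recall, which weigh all digits equally.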
<span data-preserver-spaces="true">And that's how you can use Keras in R! Let's wrap things up in the next section.</span>
<hr />
<h2 id="conclusion"><span data-preserver-spaces="true">Summing up R Keras</span></h2>
<span data-preserver-spaces="true">Most of the online resources for deep learning are written in Python - I'll give you that. That doesn't mean R is obsolete in this area. Both TensorFlow and Keras have official R support, and model development is just as easy as with Python.</span>
<strong><span data-preserver-spaces="true">If you want to implement machine learning in your organization, you can always reach out to </span></strong><a class="editor-rtfLink" href="https://wordpress.appsilon.com/" target="_blank" rel="noopener noreferrer"><strong><span data-preserver-spaces="true">Appsilon</span></strong></a><strong><span data-preserver-spaces="true"> for help.</span></strong>
<h3><span data-preserver-spaces="true">Learn More</span></h3><ul><li><a class="editor-rtfLink" href="https://wordpress.appsilon.com/r-linear-regression/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Machine Learning with R: A Complete Guide to Linear Regression</span></a></li><li><a class="editor-rtfLink" href="https://wordpress.appsilon.com/r-logistic-regression/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Machine Learning with R: A Complete Guide to Logistic Regression</span></a></li><li><a class="editor-rtfLink" href="https://wordpress.appsilon.com/r-decision-treees/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Machine Learning with R: A Complete Guide to Decision Trees</span></a></li><li><a class="editor-rtfLink" href="https://wordpress.appsilon.com/r-xgboost/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Machine Learning with R: A Complete Guide to Gradient Boosting and XGBoost</span></a></li><li><a class="editor-rtfLink" href="https://wordpress.appsilon.com/object-detection-yolo-algorithm/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">YOLO Algorithm and YOLO Object Detection: An Introduction</span></a></li></ul>
