PyTorch: How to Train and Optimize A Neural Network in 10 Minutes

December 6, 2022

Deep learning might seem like a challenging field to newcomers, but it's gotten easier over the years thanks to amazing libraries and a great community. The PyTorch library for Python is no exception: it allows you to train deep learning models from scratch on any dataset. <blockquote>Sometimes it's easier to visualize deep learning models - you can do so with these <a href="https://appsilon.com/visualize-pytorch-neural-networks/" target="_blank" rel="noopener">3 examples for visualizing PyTorch neural networks</a>.</blockquote> In this article, you'll get hands-on experience with PyTorch by coding your first neural network from scratch and optimizing it. For the sake of speed, the network will be trained on the small <a href="https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv" target="_blank" rel="nofollow noopener">Iris dataset</a>, but you can always modify the model to fit your data. <blockquote>Want to use Keras in R? <a href="https://appsilon.com/r-keras-mnist/" target="_blank" rel="noopener">Follow this guide to build a handwritten digit classifier</a>.</blockquote> Table of contents: <ul><li><a href="#data">How to Load Data with PyTorch Data Loaders</a></li><li><a href="#single-model">Train Your First Neural Network with PyTorch</a></li><li><a href="#optimize">How to Optimize Your Neural Network Models in PyTorch</a></li><li><a href="#summary">Summing up Model Training & Optimization in PyTorch</a></li></ul> <hr /> <h2 id="data">How to Load Data with PyTorch Data Loaders</h2> First things first, we need a way to load the dataset. You could simply load a CSV file and convert it to a PyTorch tensor, but this approach doesn't scale well to bigger datasets. That's why we'll use PyTorch's <code>DataLoader</code> class, which loads data in batches.
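As a quick illustration of batching, here is a minimal sketch on synthetic tensors rather than the Iris data (the 10 samples, 4 features, and batch size of 4 are arbitrary demo values). <code>TensorDataset</code> pairs each feature row with its label, and <code>DataLoader</code> hands them out in chunks. When the sample count isn't divisible by the batch size, the last batch is smaller unless you pass <code>drop_last=True</code>.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data: 10 samples with 4 features each, plus integer labels 0-2
features = torch.randn(10, 4)
labels = torch.randint(0, 3, (10,))

dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=4)  # no shuffling, so batch order is fixed

batch_sizes = [X_batch.shape[0] for X_batch, _ in loader]
print(batch_sizes)  # [4, 4, 2]: the 2 leftover samples form a smaller final batch

# drop_last=True discards the incomplete final batch
print(len(DataLoader(dataset, batch_size=4, drop_last=True)))  # 2
```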
We still have to load the dataset with Python, so here's a code snippet with all the library imports and the dataset loading: <pre><code class="language-python">import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split

import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from IPython import display
display.set_matplotlib_formats("svg")

iris = pd.read_csv("https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv")
iris.head()</code></pre> The <code>head()</code> method from Pandas shows the first five rows of the dataset: <img class="size-full wp-image-15829" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d2e58b0a609886a95a9f_e98e6e24_1-4.webp" alt="Image 1 - Head of the Iris dataset" width="548" height="214" /> Image 1 - Head of the Iris dataset We want to predict the <code>variety</code> variable (dependent) based on the four independent variables (sepal and petal lengths and widths). A common approach is to separate the features (<code>X</code>) from the target variable (<code>y</code>) and convert both to PyTorch tensors.
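One detail worth checking before the conversion: pandas and NumPy store decimals as 64-bit floats by default, while <code>nn.Linear</code> layers create 32-bit weights. The sketch below uses a single hand-typed Iris feature row and a standalone layer, both only for illustration, to show why the features get an explicit <code>dtype=torch.float</code> when the tensor is created.

```python
import numpy as np
import torch
import torch.nn as nn

# One hand-typed feature row; NumPy defaults to float64
values = np.array([[5.1, 3.5, 1.4, 0.2]])
print(torch.tensor(values).dtype)  # torch.float64

layer = nn.Linear(4, 16)  # weights and biases are float32 by default

# Feeding float64 input into a float32 layer raises a dtype mismatch error
try:
    layer(torch.tensor(values))
    dtype_error = False
except RuntimeError as err:
    dtype_error = True
    print("RuntimeError:", err)

# Casting to torch.float (float32) when creating the tensor avoids the mismatch
out = layer(torch.tensor(values, dtype=torch.float))
print(out.shape)  # torch.Size([1, 16])
```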
Keep in mind that Torch tensors must be numeric, so we'll have to encode the target variable: <pre><code class="language-python">X = torch.tensor(iris.drop("variety", axis=1).values, dtype=torch.float)
# Encode the varieties as class indices: Setosa=0, Versicolor=1, Virginica=2
y = torch.tensor(
    [0 if vty == "Setosa" else 1 if vty == "Versicolor" else 2 for vty in iris["variety"]],
    dtype=torch.long
)

print(X.shape, y.shape)</code></pre> <img class="size-full wp-image-15831" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d2e69bd3e45a2189cc86_94a894b3_2-4.webp" alt="Image 2 - Shape of the feature data tensor and target variable tensor" width="391" height="29" /> Image 2 - Shape of the feature data tensor and target variable tensor As you would expect, the feature tensor has a shape of 150 rows by 4 columns, while the target tensor is one-dimensional, holding one label for each of the 150 rows. The next step when training a classification model is the <b>train/test split</b>. The idea is to train the model on most of the data (e.g., 80%) and evaluate it on the remainder. Because the test samples are never used for training, a high test accuracy tells us the model generalizes rather than just memorizing (overfitting) the training data. With the 80:20 split, the training set will have 120 samples and the test set 30. We'll load the training set in 10 batches of 12 samples each, and we'll load the test set all at once.
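The nested conditional used for encoding works, but it silently maps any unexpected string (a typo, for instance) to class 2. A dictionary lookup is an alternative that's easier to extend and fails loudly on unknown labels. Here is a sketch, where <code>label_map</code> and the sample list are hypothetical names used only for illustration:

```python
import torch

# Hypothetical mapping from variety name to class index (same codes as above)
label_map = {"Setosa": 0, "Versicolor": 1, "Virginica": 2}

varieties = ["Setosa", "Virginica", "Versicolor", "Setosa"]
y_encoded = torch.tensor([label_map[v] for v in varieties], dtype=torch.long)
print(y_encoded)  # tensor([0, 2, 1, 0])

# An unknown label raises KeyError instead of being silently encoded as 2
try:
    label_map["Versicolour"]
except KeyError as err:
    print("Unknown label:", err)
```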
Batches aren't really necessary for such a simple problem, but we want to provide a workflow that's easy for you to copy/paste between projects: <pre><code class="language-python">X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=42)

train_data = TensorDataset(X_train, y_train)
test_data = TensorDataset(X_test, y_test)

# 120 training samples / batch_size 12 = 10 shuffled batches per epoch
train_loader = DataLoader(train_data, shuffle=True, batch_size=12)
# The whole test set (30 samples) in a single batch
test_loader = DataLoader(test_data, batch_size=len(test_data))

print("Training data batches:")
for X_batch, y_batch in train_loader:
    print(X_batch.shape, y_batch.shape)
print("\nTest data batches:")
for X_batch, y_batch in test_loader:
    print(X_batch.shape, y_batch.shape)</code></pre> Here are the shapes of data coming in on each batch: <img class="size-full wp-image-15833" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d2e79202e0e0262489d9_6ab52ecf_3-4.webp" alt="Image 3 - Batch contents" width="297" height="245" /> Image 3 - Batch contents We now have the data ready for training, so next, let's see how to build a PyTorch neural network model. <h2 id="single-model">Train Your First Neural Network with PyTorch</h2> There are multiple ways to build a neural network model in PyTorch. You could go with a simple <code>nn.Sequential</code> model for this dataset, but we'll stick to the more flexible class-based approach. Besides the input layer (4 features in, 16 nodes out), the first model we'll build will have a single hidden layer of 16 nodes before the output layer.
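For comparison, here is how the <code>nn.Sequential</code> route mentioned above might look for the same 4 → 16 → 16 → 3 architecture. It's a sketch: <code>seq_model</code> is just an illustrative name, and the input batch is random data.

```python
import torch
import torch.nn as nn

# Sequential version of a 4 -> 16 -> 16 -> 3 network with ReLU activations
seq_model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 16),
    nn.ReLU(),
    nn.Linear(16, 3),  # raw scores (logits), one per class
)

logits = seq_model(torch.randn(5, 4))  # a batch of 5 random samples
print(logits.shape)  # torch.Size([5, 3])

# (4*16 + 16) + (16*16 + 16) + (16*3 + 3) weights and biases
n_params = sum(p.numel() for p in seq_model.parameters())
print(n_params)  # 403
```

Both styles produce the same kind of model; the class-based version simply gives you full control over what happens inside <code>forward()</code>.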
Just make sure that the number of <code>out_features</code> on layer L-1 matches the number of <code>in_features</code> on layer L - otherwise, the matrix multiplication won't be possible and you'll get an error: <pre><code class="language-python">class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.input = nn.Linear(in_features=4, out_features=16)
        self.hidden_1 = nn.Linear(in_features=16, out_features=16)
        self.output = nn.Linear(in_features=16, out_features=3)

    def forward(self, x):
        x = F.relu(self.input(x))
        x = F.relu(self.hidden_1(x))
        # Raw scores (logits) for the 3 classes; CrossEntropyLoss applies log-softmax internally
        return self.output(x)


model = Net()
print(model)</code></pre> Printing the model gives you a glimpse into its architecture. There are better ways to do this, but this one is adequate for today: <img class="size-full wp-image-15835" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d2e83778b40ed0781a83_9319e089_4-4.webp" alt="Image 4 - Model architecture" width="640" height="113" /> Image 4 - Model architecture Now comes the training part. In PyTorch, you write the training loop yourself and compute the loss explicitly. Backpropagation (the learning step) also happens inside the training loop. We'll keep track of the training and testing accuracies per epoch for visualizations later.
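One step of that loop deserves a closer look: PyTorch accumulates gradients in each parameter's <code>.grad</code> attribute instead of overwriting them, which is why a training loop clears the gradients before every <code>loss.backward()</code>. A minimal sketch with a single scalar parameter and a toy loss of w²:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# Toy loss: w**2, whose gradient is 2 * w = 4 at w = 2
(w ** 2).backward()
first_grad = w.grad.item()
print(first_grad)  # 4.0

# A second backward pass without clearing ADDS to the stored gradient
(w ** 2).backward()
accumulated_grad = w.grad.item()
print(accumulated_grad)  # 8.0

# Zeroing first gives the correct gradient again; optimizer.zero_grad() does the
# equivalent for every parameter (recent versions reset .grad to None instead)
w.grad.zero_()
(w ** 2).backward()
reset_grad = w.grad.item()
print(reset_grad)  # 4.0
```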
Configuration-wise, we'll use <code>CrossEntropyLoss</code> as the loss function and <code>Adam</code>, an adaptive variant of gradient descent, as the optimizer: <pre><code class="language-python">num_epochs = 200
train_accuracies, test_accuracies = [], []

loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model.parameters(), lr=0.01)

for epoch in range(num_epochs):
    # Train set: one optimization step per mini-batch
    model.train()
    batch_accuracies = []
    for X_batch, y_batch in train_loader:
        preds = model(X_batch)
        pred_labels = torch.argmax(preds, dim=1)
        loss = loss_function(preds, y_batch)

        optimizer.zero_grad()  # clear gradients from the previous step
        loss.backward()        # backpropagation
        optimizer.step()       # update the weights

        batch_accuracies.append(100 * torch.mean((pred_labels == y_batch).float()).item())
    # Average training accuracy over all batches in this epoch
    train_accuracies.append(np.mean(batch_accuracies))

    # Test set: evaluate without tracking gradients
    model.eval()
    with torch.no_grad():
        X_test_batch, y_test_batch = next(iter(test_loader))
        pred_labels = torch.argmax(model(X_test_batch), dim=1)
    test_accuracies.append(100 * torch.mean((pred_labels == y_test_batch).float()).item())</code></pre> The code should finish within a few seconds, as we're dealing with a small dataset. We now have the training and testing accuracies saved into variables, so let's visualize them to see how the learning went: <pre><code class="language-python">fig = plt.figure(tight_layout=True)
gs = gridspec.GridSpec(nrows=2, ncols=1)

ax = fig.add_subplot(gs[0, 0])
ax.plot(train_accuracies)
ax.set_xlabel("Epoch")
ax.set_ylabel("Training accuracy")

ax = fig.add_subplot(gs[1, 0])
ax.plot(test_accuracies)
ax.set_xlabel("Epoch")
ax.set_ylabel("Test accuracy")

fig.align_labels()
plt.show()</code></pre> <img class="size-full wp-image-15837" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d2e9b3d8e6cd986d6bf0_0bd82304_5-4.webp" alt="Image 5 - Training and testing accuracies per epoch" width="755" height="563" /> Image 5 - Training and testing accuracies per epoch On the test set, the model is almost always between 97% and 100% accurate (29 or 30 of the 30 test samples), with a single dip to about 93%. That's an impressive performance for a model with only one hidden layer!
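Two details from the training loop can be checked on a handful of made-up logits: <code>nn.CrossEntropyLoss</code> applies log-softmax internally (so the output layer needs no softmax), and accuracy is simply the share of rows whose largest score sits at the true class index. The tensors below are illustrative values, not real model outputs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up raw outputs (logits) for 4 samples and 3 classes
logits = torch.tensor([
    [2.0, 0.5, -1.0],
    [0.1, 1.5, 0.3],
    [-0.5, 0.2, 3.0],
    [1.0, 2.0, 0.0],
])
targets = torch.tensor([0, 1, 2, 0])

# CrossEntropyLoss == negative log-likelihood of the log-softmax probabilities
ce = nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, manual))  # True

# Accuracy: index of the largest logit per row compared with the true label
pred_labels = torch.argmax(logits, dim=1)
print(pred_labels)  # tensor([0, 1, 2, 1])
accuracy = 100 * torch.mean((pred_labels == targets).float()).item()
print(accuracy)  # 75.0
```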
It raises the question, though: <b>Is this the optimal model architecture?</b> Well, we can't know without trying to optimize it. Let's do that in the next section. <h2 id="optimize">How to Optimize Your Neural Network Models in PyTorch</h2> The optimization process boils down to trying out different combinations of the number of layers and the number of nodes per layer. The PyTorch model class allows you to introduce this variability with <code>nn.ModuleDict()</code>, but more on that in a bit. Let's start with something simpler. Just wrap the entire training logic into a <code>train_model()</code> function, and make sure to pass the data loaders and the model in as function arguments. This function will do the training for us and return the training and testing accuracies from the last epoch: <pre><code class="language-python">def train_model(train_loader, test_loader, model, lr=0.01, num_epochs=200):
    train_accuracies, test_accuracies = [], []
    loss_function = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(params=model.parameters(), lr=lr)

    for epoch in range(num_epochs):
        # Training pass over all mini-batches
        model.train()
        batch_accuracies = []
        for X_batch, y_batch in train_loader:
            preds = model(X_batch)
            pred_labels = torch.argmax(preds, dim=1)
            loss = loss_function(preds, y_batch)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            batch_accuracies.append(100 * torch.mean((pred_labels == y_batch).float()).item())
        train_accuracies.append(np.mean(batch_accuracies))

        # Evaluation on the test set (a single batch) without gradient tracking
        model.eval()
        with torch.no_grad():
            X_test_batch, y_test_batch = next(iter(test_loader))
            pred_labels = torch.argmax(model(X_test_batch), dim=1)
        test_accuracies.append(100 * torch.mean((pred_labels == y_test_batch).float()).item())

    # Only the accuracies from the final epoch are returned
    return train_accuracies[-1], test_accuracies[-1]


train_model(train_loader, test_loader, Net())</code></pre> The last line of the code snippet tests the function - you can see it works without any issues: <img class="size-full wp-image-15839" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d2e97d51e4abfae6c683_976c51ea_6-2.webp" alt="Image 6 - Testing the train_model() function" width="278" height="43" /> Image 6 - Testing the train_model() function Onto the
model itself. We'll declare a new class, <code>Net2</code>, that accepts the number of hidden layers and the number of nodes per layer as parameters. The idea is to store each layer in a PyTorch <code>nn.ModuleDict()</code>. This way, we can have a variable number of layers with a variable number of nodes per layer. The rest of the class is more or less the same as <code>Net</code>, just written in a slightly different style: <pre><code class="language-python">class Net2(nn.Module):
    def __init__(self, n_units, n_layers):
        super().__init__()
        self.n_layers = n_layers
        # ModuleDict registers every layer, so model.parameters() sees them all
        self.layers = nn.ModuleDict()
        self.layers["input"] = nn.Linear(in_features=4, out_features=n_units)
        for i in range(self.n_layers):
            self.layers[f"hidden_{i}"] = nn.Linear(in_features=n_units, out_features=n_units)
        self.layers["output"] = nn.Linear(in_features=n_units, out_features=3)

    def forward(self, x):
        # ReLU after the input layer as well, matching Net
        x = F.relu(self.layers["input"](x))
        for i in range(self.n_layers):
            x = F.relu(self.layers[f"hidden_{i}"](x))
        return self.layers["output"](x)</code></pre> The optimization can now begin. We'll go from 1 to 4 hidden layers, with each layer having 8, 16, 24, 32, 40, 48, 56, or 64 nodes.
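Why <code>nn.ModuleDict()</code> instead of a regular Python dictionary? PyTorch only tracks layers that are registered as submodules, and a plain dict skips that registration, so <code>model.parameters()</code> (and therefore the optimizer) never sees those layers. The two toy classes below are illustrative only and are not part of the article's model:

```python
import torch
import torch.nn as nn

class PlainDictNet(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain Python dict: PyTorch does not register these layers
        self.layers = {"input": nn.Linear(4, 8), "output": nn.Linear(8, 3)}

class ModuleDictNet(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleDict registers each layer as a submodule
        self.layers = nn.ModuleDict({"input": nn.Linear(4, 8), "output": nn.Linear(8, 3)})

def count_params(model):
    """Total number of values reported by model.parameters()."""
    return sum(p.numel() for p in model.parameters())

print(count_params(PlainDictNet()))   # 0
print(count_params(ModuleDictNet()))  # 67 = (4*8 + 8) + (8*3 + 3)

# An optimizer built on the plain-dict model has nothing to train
try:
    torch.optim.Adam(PlainDictNet().parameters(), lr=0.01)
    empty_params_error = False
except ValueError as err:
    empty_params_error = True
    print("ValueError:", err)
```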
It's quite a number of combinations (4 × 8 = 32 models to train), but in the end, we should see what works best for the Iris dataset: <pre><code class="language-python">n_layers = np.arange(1, 5)     # 1, 2, 3, 4 hidden layers
n_units = np.arange(8, 65, 8)  # 8, 16, ..., 64 nodes per layer
train_accuracies, test_accuracies = [], []

for units in n_units:
    for layers in n_layers:
        model = Net2(n_units=int(units), n_layers=int(layers))
        train_acc, test_acc = train_model(train_loader, test_loader, model)
        train_accuracies.append({"n_layers": int(layers), "n_units": int(units), "accuracy": train_acc})
        test_accuracies.append({"n_layers": int(layers), "n_units": int(units), "accuracy": test_acc})

train_accuracies = pd.DataFrame(train_accuracies).sort_values(by=["n_layers", "n_units"]).reset_index(drop=True)
test_accuracies = pd.DataFrame(test_accuracies).sort_values(by=["n_layers", "n_units"]).reset_index(drop=True)
test_accuracies.head()</code></pre> Here are the first five rows of the test set accuracies: <img class="size-full wp-image-15841" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d2ea7d51e4abfae6c6d1_00fb4ba9_7-2.webp" alt="Image 7 - Head of the test accuracies dataset" width="303" height="217" /> Image 7 - Head of the test accuracies dataset The question remains: <b>How can we extract the best model architecture?</b> Simple - just keep the rows that have the highest accuracy on the test set: <pre><code class="language-python">test_accuracies[test_accuracies["accuracy"] == test_accuracies["accuracy"].max()]</code></pre> <img class="size-full wp-image-15843" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d2ebeb1c1eedd3644dbd_0cc589ea_8-1.webp" alt="Image 8 - Model architectures with the highest test accuracy" width="303" height="503" /> Image 8 - Model architectures with the highest test accuracy As you can see, there isn't just one perfect combination. Multiple model architectures reach 100% test accuracy, at least on this dataset.
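When several architectures tie, a reasonable tie-breaker is the number of trainable parameters. The sketch below assumes that "simplest" means "fewest weights and biases", uses a hypothetical <code>net2_param_count()</code> helper that mirrors the layer sizes of <code>Net2</code>, and runs on a small placeholder table whose accuracies are made up for illustration (they are not the results shown in Image 8):

```python
import pandas as pd

def net2_param_count(n_layers, n_units, n_in=4, n_out=3):
    """Hypothetical helper: weights + biases of a Net2-style model."""
    input_layer = n_in * n_units + n_units
    hidden_layers = n_layers * (n_units * n_units + n_units)
    output_layer = n_units * n_out + n_out
    return input_layer + hidden_layers + output_layer

# Placeholder accuracies for illustration only (NOT results from this article)
results = pd.DataFrame({
    "n_layers": [1, 1, 2, 3],
    "n_units": [8, 16, 8, 32],
    "accuracy": [96.7, 100.0, 100.0, 100.0],
})

# Keep the best-scoring rows, then prefer the one with the fewest parameters
best = results[results["accuracy"] == results["accuracy"].max()].copy()
best["n_params"] = [net2_param_count(l, u) for l, u in zip(best["n_layers"], best["n_units"])]
simplest = best.sort_values("n_params").iloc[0]
print(simplest)  # n_layers=2, n_units=8, n_params=211
```

With this rule, the 2-layer, 8-node model (211 parameters) beats the 1-layer, 16-node model (403 parameters) even though it has more layers, so it's worth deciding up front whether "simple" means fewer layers or fewer parameters.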
If your test set has thousands of samples, it's unlikely that this many combinations will tie on accuracy. Overall, you can't really go wrong with any of these, but it's recommended to stick with the simplest architecture that yields the best results. <hr /> <h2 id="summary">Summing up Model Training & Optimization in PyTorch</h2> Training your first neural network in PyTorch wasn't that difficult, was it? It boils down to writing a scalable, dataset- and parameter-agnostic framework, which you can then copy/paste between projects. For example, you could also extract the loss function and the optimizer as function parameters, and see what happens as you change them. It's a good idea for a homework assignment, so why don't you give it a try? <i>What are your thoughts on building PyTorch neural networks? Do you prefer some other library, such as TensorFlow, and why?</i> Please let us know in the comment section below. Also, feel free to move the discussion to Twitter - <a href="https://twitter.com/appsilon" target="_blank" rel="noopener">@appsilon</a> - we'd love to hear your feedback. <blockquote>Dive deep into image classification - <a href="https://appsilon.com/cnn-for-image-classification/" target="_blank" rel="noopener">Convolutional Neural Networks for seed classification</a>.</blockquote>
