PyTorch Sequential Model: What Is It?

The power of neural networks is no mystery. As data and compute continue to grow, the tools we use for deep learning matter more than ever, and it is important that we pick the right one. PyTorch is not everyone’s cup of tea, but it certainly checks many of the boxes for building powerful neural networks out of the box.

PyTorch is a library developed by Facebook for GPU-accelerated tensor computation. It is heavily optimized for these kinds of tensor operations (much as NumPy is optimized for mathematical operations on the CPU) and can run seamlessly on a GPU for vastly faster training. Facebook is aware that users of PyTorch will likely also use NumPy, so the library provides easy ways to convert between the two.
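For example, here is a minimal sketch of moving data between the two libraries, and onto a GPU when one is available:

import numpy as np
import torch

# Create a NumPy array and convert it to a PyTorch tensor
# (the two share the same underlying memory on the CPU)
arr = np.ones((2, 3), dtype=np.float32)
t = torch.from_numpy(arr)

# Convert back to NumPy
back = t.numpy()

# Move the tensor to the GPU if one is available
if torch.cuda.is_available():
    t = t.to("cuda")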

Background

At its core, PyTorch operates on tensors, multi-dimensional matrices containing elements of a single type. Tensors in PyTorch bear a striking resemblance to NumPy’s ndarray, but with a significant distinction: they are optimized for GPU-accelerated computations.

The essence of deep learning is the propagation of data (in the form of tensors) through a series of transformations, often represented by layers in a neural network. As data flows through these layers, PyTorch’s dynamic computation graph keeps track of all operations performed on each tensor. This facilitates automatic differentiation, a powerful mechanism by which gradients are computed, enabling the optimization of model parameters using backpropagation.
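Here is a small sketch of that mechanism: setting requires_grad=True tells PyTorch to record operations on a tensor, and calling backward() computes the gradients.

import torch

# Track operations on x so gradients can be computed
x = torch.tensor([2.0, 3.0], requires_grad=True)

# y = sum(x^2); PyTorch records this computation in its dynamic graph
y = (x ** 2).sum()

# Backpropagation populates x.grad with dy/dx = 2x
y.backward()
print(x.grad)  # tensor([4., 6.])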

Sequential vs. Functional Models

In PyTorch, one common way to define models is using the Sequential class. This provides a linear stack of layers, where the output of one layer becomes the input to the next. It’s a straightforward and intuitive way to design neural networks, especially when the data flow is unidirectional.

However, as neural network architectures evolved and became more complex, a need arose for a more flexible approach. This is where functional models come in. Instead of being limited to a simple sequence, the functional style (writing your own forward pass with operations from PyTorch’s torch.nn.functional, analogous in spirit to Keras’s functional API) allows for more intricate architectures. Layers can receive inputs from multiple previous layers and send their outputs to multiple subsequent ones. This opens up the possibility of designing networks with skip connections, multiple input or output branches, and shared layers.
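As a minimal sketch of what the sequential style cannot express, here is a custom nn.Module whose forward method adds a skip connection (the channel count and input size are illustrative assumptions):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Two 3x3 convolutions that preserve the spatial size (padding=1)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        # The output of conv2 is added to the original input: a skip
        # connection, which a plain stack of layers cannot represent
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        return F.relu(out + x)

block = SkipBlock(16)
print(block(torch.randn(1, 16, 8, 8)).shape)  # torch.Size([1, 16, 8, 8])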

Addressing Overfitting

Highly interconnected models, while being powerful, can sometimes fit the training data too closely, capturing not just the underlying patterns but also the noise or random fluctuations in the data. This phenomenon, known as overfitting, reduces a model’s ability to generalize to unseen data.

One of the strategies to mitigate overfitting is the introduction of dropout. The dropout technique involves randomly “dropping out” a fraction of neurons during training, meaning they are temporarily removed from the network. This prevents any single neuron from becoming overly specialized and fosters a more distributed and robust representation. It’s like adding a regularizing effect, forcing the network to learn redundant patterns, which in turn improves its generalization capabilities.
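A quick sketch makes the behavior concrete: dropout is active in training mode and becomes a no-op in evaluation mode.

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

# Training mode: roughly half the elements are zeroed, and the survivors
# are scaled by 1/(1-p) = 2 so the expected sum stays the same
drop.train()
print(drop(x))

# Evaluation mode: dropout does nothing and the input passes through unchanged
drop.eval()
print(drop(x))  # tensor([1., 1., 1., 1., 1., 1., 1., 1.])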

Implementing with PyTorch


One of the basic building blocks in PyTorch is the module. Modules are typically used to define a layer (or a group of layers) of a model, and each one subclasses torch.nn.Module, the base class for all neural network components in PyTorch.

Implementing a sequential neural network is straightforward. As described above, a sequential model simply stacks layers on top of one another.

In PyTorch, this is done with the container torch.nn.Sequential, which houses an ordered list of modules. Here’s how you can create a simple sequential model with convolutional layers, ReLU activations, and max pooling:

import torch.nn as nn
# Define the sequential model
model = nn.Sequential(
    # First convolutional layer with 3 input channels, 16 output channels, and a kernel size of 3x3
    nn.Conv2d(3, 16, 3),
    # Activation function
    nn.ReLU(),
    # Max pooling with a 2x2 window
    nn.MaxPool2d(2, 2),
    
    # Another convolutional layer with 16 input channels and 32 output channels
    nn.Conv2d(16, 32, 3),
    nn.ReLU(),
    nn.MaxPool2d(2, 2)
)
print(model)

In this example:

  1. We use two convolutional layers (nn.Conv2d) to extract features from the input. Each of these layers is followed by a ReLU activation function (nn.ReLU) to introduce non-linearity.
  2. After each convolutional layer, there’s a max-pooling layer (nn.MaxPool2d) to reduce the spatial dimensions of the output and to enhance the prominent features.
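To sanity-check the architecture, you can push a dummy batch through the model and inspect the output shape (the 32x32 input size below is just an assumption for illustration):

import torch

# A dummy batch of one RGB image: (batch, channels, height, width)
x = torch.randn(1, 3, 32, 32)
out = model(x)
print(out.shape)  # torch.Size([1, 32, 6, 6]) for a 32x32 input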

With the nn.Sequential container, adding new layers or changing the order becomes as simple as modifying the list of modules. It provides a convenient way to organize and encapsulate the different components of your model.

Using an Ordered Dictionary as Input to the PyTorch Sequential Model

The nn.Sequential container in PyTorch allows for an additional level of clarity and organization by accepting an OrderedDict. This helps in giving explicit names to each layer or operation, making it easier to reference and modify them later on.

While ReLU layers introduce non-linearity to the model, dropout serves a different purpose: preventing overfitting. By randomly setting a fraction of the input units to zero at each update during training, dropout keeps the model from relying too heavily on any specific neuron, promoting a more generalized representation.

Let’s modify the previous example by using an OrderedDict to create a sequential model that also includes dropout:

import torch.nn as nn
from collections import OrderedDict
# Define the sequential model using an OrderedDict
model = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(3, 16, 3)),
    ('relu1', nn.ReLU()),
    ('pool1', nn.MaxPool2d(2, 2)),
    ('dropout1', nn.Dropout(p=0.5)),
    
    ('conv2', nn.Conv2d(16, 32, 3)),
    ('relu2', nn.ReLU()),
    ('pool2', nn.MaxPool2d(2, 2)),
    ('dropout2', nn.Dropout(p=0.5))
]))
print(model)

In this example:

  1. The OrderedDict provides a way to name each module. For instance, the first convolutional layer is named 'conv1'.
  2. We’ve added nn.Dropout layers after each max-pooling operation. This will randomly set a fraction (in this case, 50%) of the input units to 0 during training, which can help in preventing overfitting.
  3. The model can be accessed using the defined names. For example, to get the weights of the first convolutional layer, you can use model.conv1.weight.

With the OrderedDict, it becomes intuitive to add, modify, or access specific layers in your model, providing both structure and flexibility.
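For instance, here is a short sketch of retrieving layers by the names defined in the OrderedDict:

# Layers can be retrieved as attributes using their OrderedDict names
print(model.conv1)
print(model.conv1.weight.shape)  # torch.Size([16, 3, 3, 3])

# named_children() iterates over (name, module) pairs in definition order
for name, module in model.named_children():
    print(name, "->", type(module).__name__)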

Concluding Remarks

With so many neural network libraries available, PyTorch is certainly one of the best. Adding a layer to a sequential network is as simple as adding another line to the nn.Sequential constructor, making the implementation of sequential neural networks in PyTorch a breeze. Layers can also be composed non-sequentially by writing a custom forward method in the functional style.
