1. Introduction

In general, you will need the following things to train a model (a minimal sketch of how they fit together appears right after this list):

  • A Model
  • A Dataset
  • A Dataloader
  • A Loss Function (Criterion)
  • An Optimizer
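
Before looking at each component in detail, here is a minimal sketch of how the pieces interact in a typical PyTorch training loop. Every name below is a placeholder standing in for the concrete classes we build in the following sections, not the tutorial's actual code.

# A minimal sketch of a training loop; every component here is a
# placeholder for the concrete pieces built in the sections below.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(10, 2)                                  # a Model
dataset = TensorDataset(torch.randn(8, 10),               # a Dataset (toy data)
                        torch.randint(0, 2, (8,)))
dataloader = DataLoader(dataset, batch_size=4)            # a Dataloader
criterion = nn.CrossEntropyLoss()                         # a Loss Function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # an Optimizer

for x, y in dataloader:
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = criterion(model(x), y)  # forward pass, then compute the loss
    loss.backward()                # backpropagate to compute gradients
    optimizer.step()               # update the model's parameters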

2. Model

We will build a simple model for demonstration. The model takes a tensor of shape (batch_size, 10) as input and outputs a tensor of shape (batch_size, 2).

# @file simple_model.py
import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)  # One fully connected layer: 10 input features -> 2 output logits

    def forward(self, x):
        return self.fc(x)

if __name__ == "__main__":
    model = SimpleModel()
    x = torch.randn(4, 10)  # Shape: (4, 10)
    y = model(x)
    print(y.shape)  # Shape: (4, 2)

You can run the script to check how the model works:

python simple_model.py

3. Dataset

We will build a simple dataset for demonstration. The dataset generates random data and labels.

# @file simple_dataset.py
import torch
from torch.utils.data import Dataset

class SimpleDataset(Dataset):
    def __init__(self, size):
        self.size = size

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        x = torch.randn(10)  # Shape: (10,); Element type: float32
        y = torch.randint(0, 2, (1,))  # Shape: (1,); Element type: int64
        return x, y

if __name__ == "__main__":
    dataset = SimpleDataset(4)
    x, y = dataset[0]
    print(x.shape, y.shape)  # Shape: (10,), (1,)

You can run the script to check how the dataset works:

python simple_dataset.py

4. Dataloader

Once the dataset is built, creating a dataloader is straightforward.

A dataloader will provide batch_size samples in each iteration. For example:

# @file temp.py
from torch.utils.data import DataLoader
from simple_dataset import SimpleDataset

dataset = SimpleDataset(100)
# Get a sample, shape: (10,), (1,)
sample_x, sample_y = dataset[0]

# With batch_size=16, the dataloader yields 16 samples in each iteration
dataloader = DataLoader(dataset, batch_size=16, shuffle=True, drop_last=True)
for i, (x, y) in enumerate(dataloader):
    print(x.shape, y.shape)  # Shape: (16, 10), (16, 1)
    break

You can run the script to check how the dataloader works (note that with 100 samples and batch_size=16, drop_last=True yields 6 full batches per epoch and drops the remaining 4 samples):

python temp.py

5. Loss Function

Different tasks require different loss functions. For example, a 2-class classification task can use nn.CrossEntropyLoss, while a regression task can use nn.MSELoss.

In our case, we will use nn.CrossEntropyLoss.
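
As a quick shape check: nn.CrossEntropyLoss expects raw logits of shape (batch_size, num_classes) and integer class labels of shape (batch_size,). The snippet below is only an illustration with random tensors matching our model's output shape.

# Illustrative only: random tensors standing in for real model outputs
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(4, 2)          # raw model outputs: (batch_size, num_classes)
labels = torch.randint(0, 2, (4,))  # integer class labels: (batch_size,)
loss = criterion(logits, labels)    # a scalar tensor (mean loss over the batch)
print(loss.item())

Note that the criterion applies log-softmax internally, so the model should output raw logits rather than probabilities. This label shape is also why the training pipeline in Section 7 squeezes the (batch_size, 1) labels from our dataset down to (batch_size,).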

6. Optimizer

We will use torch.optim.SGD as the optimizer. torch.optim.Adam is also a good choice. The choice of optimizer and its learning rate are hyperparameters that you can tune.
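
Both are constructed the same way from the model's parameters. A minimal sketch (the learning rates here are illustrative defaults, not tuned values):

# Illustrative only: a stand-in model and untuned learning rates
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for SimpleModel

# Plain stochastic gradient descent
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Adam adapts per-parameter learning rates and is often a solid default
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)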

7. Training Pipeline

Now we can assemble the training pipeline, which trains the model on the training set and evaluates it on a validation set after every epoch.

# @file trainpipeline.py
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# This is the model we built
from simple_model import SimpleModel
# This is the dataset we built
from simple_dataset import SimpleDataset

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
BATCH_SIZE = 16
EPOCHS = 100
LEARNING_RATE = 0.01


def train():
    # Create a model and move it to DEVICE
    model = SimpleModel().to(DEVICE)

    # Create train/validation datasets and dataloaders
    train_dataset = SimpleDataset(1000)
    val_dataset = SimpleDataset(100)
    train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, drop_last=True)
    val_dataloader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, drop_last=False)

    # Create a loss function and an optimizer; The optimizer will update the model's parameters
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)

    for epoch in range(EPOCHS):
        model.train()  # Set the model to training mode
        for i, (x, y) in enumerate(train_dataloader):
            x, y = x.to(DEVICE), y.to(DEVICE)
            optimizer.zero_grad()
            y_pred = model(x)
            loss = criterion(y_pred, y.squeeze(1))  # labels: (batch_size, 1) -> (batch_size,)
            loss.backward()
            optimizer.step()

        model.eval()  # Set the model to evaluation mode
        with torch.no_grad():  # Disable gradient calculation
            total_loss = 0
            total_correct = 0
            total_samples = 0
            for i, (x, y) in enumerate(val_dataloader):
                x, y = x.to(DEVICE), y.to(DEVICE)
                y_pred = model(x)
                loss = criterion(y_pred, y.squeeze(1))  # labels: (batch_size, 1) -> (batch_size,)
                total_loss += loss.item() * x.size(0)  # criterion returns the batch mean, so re-weight by batch size
                total_correct += (y_pred.argmax(dim=1) == y.squeeze(1)).sum().item()
                total_samples += y.size(0)

            print(f"Epoch: {epoch}, Loss: {total_loss / total_samples}, Accuracy: {total_correct / total_samples}")

if __name__ == "__main__":
    train()

You can run the script to check how the training pipeline works. Note that SimpleDataset generates fresh random inputs and labels each time a sample is accessed, so there is no real pattern to learn and the accuracy will hover around 0.5; the goal is to demonstrate the mechanics of the loop, not to reach a good score:

python trainpipeline.py