Tensorflow & Pytorch comparison with CIFAR10
The purpose of this notebook is to lay out the typical workflow for a simple feed-forward classification problem using Tensorflow/Keras and Pytorch.
About
The third assignment of the course Pytorch Zero to GANS, run by JOVIAN.ML, is to work through a simple classification problem using the CIFAR10 dataset. The course uses Pytorch, and as an option attendees were asked to repeat the task in Tensorflow.
The training was done on Google Colab with a GPU. The good thing about running on Colab (and Binder or Kaggle, to name a few others) is that there is not much setup involved: import the required libraries and off you go!
The Tensorflow version comes first, followed by the Pytorch version. The course assignment was in Pytorch (as the course title suggests), so the TF example was made to match that setup.
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
import os
root_dir = '/content/drive/My Drive/Colab Notebooks/jovian/'
from __future__ import print_function
import tensorflow as tf
import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.utils import to_categorical
import os
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
%matplotlib inline
Data augmentation is not used because it was not used in the Pytorch example below.
batch_size=128
epochs=20
data_augmentation=False
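For reference, if augmentation were switched on, the already-imported ImageDataGenerator could be configured roughly like this (a sketch only; the flip and shift settings are arbitrary choices and it is not used in this comparison):
# Hypothetical augmentation setup (not used here)
datagen = ImageDataGenerator(horizontal_flip=True,
                             width_shift_range=0.1,
                             height_shift_range=0.1)
# Training would then iterate over datagen.flow(x_train, y_train, batch_size=batch_size)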
(X,y), (x_test,y_test) = cifar10.load_data()
# collapse-hide
print(X.shape)
print(y.shape)
print(x_test.shape)
print(y_test.shape)
y_squeeze = np.squeeze(y)
# Create a classes list
classes = ['airplane',
'automobile',
'bird',
'cat',
'deer',
'dog',
'frog',
'horse',
'ship',
'truck']
#unique classes
unique_classes = np.unique(y)
num_classes = len(unique_classes)
unique_classes
unique, count = np.unique(y, return_counts=True)
for i in range(len(unique)):
print(f' class {classes[i] } has {count[i].item()} images')
#collapse-hide
fig = plt.figure(figsize=(6,6))
for i in range(9):
plt.subplot(3,3,i+1)
plt.imshow(X[i])
plt.show()
x_train, x_val, y_train, y_val= train_test_split(X, y,
test_size=0.1,
random_state=42)
x_train=x_train.astype('float32')/255.
x_val=x_val.astype('float32')/255.
y_train = to_categorical(y_train, num_classes)
y_val = to_categorical(y_val, num_classes)
print(x_train.shape)
print(y_train.shape)
y_train = np.squeeze(y_train)
y_val=np.squeeze(y_val)
model = Sequential()
model.add(Flatten(input_shape=(32,32,3)))
model.add(Dense(32,activation = 'relu'))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.summary()
The number of model parameters should match that of the Pytorch model defined later.
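As a quick sanity check, the count can be worked out by hand: the flattened input has 32 × 32 × 3 = 3072 features, so the Dense(32) hidden layer contributes 3072 × 32 + 32 = 98,336 parameters and the Dense(10) output layer contributes 32 × 10 + 10 = 330, giving 98,666 in total.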
opt = keras.optimizers.SGD(learning_rate=1e-3)
model.compile(loss='categorical_crossentropy',
optimizer=opt,
metrics = ['accuracy'])
H = model.fit(x_train, y_train,
batch_size=batch_size,
              epochs=epochs,
validation_data=(x_val,y_val))
print(H.history.keys())
plt.title("ACCURACY")
plt.plot(H.history['accuracy'], label='train_acc')
plt.plot(H.history['val_accuracy'], label = 'val_acc')
plt.legend()
plt.show()
x_test =x_test.astype('float32')/255.
y_test = to_categorical(y_test, num_classes)
y_test.shape
model.evaluate(x_test, y_test)
import torch
import torchvision
import numpy as np
import matplotlib.pyplot as plt
import torch.nn as nn
import torch.nn.functional as F
from torchvision.datasets import CIFAR10
from torchvision.transforms import ToTensor
from torchvision.utils import make_grid
from torch.utils.data.dataloader import DataLoader
from torch.utils.data import random_split
%matplotlib inline
# Project name used for jovian.commit
project_name = '03-cifar10-feedforward'
proj_dir = os.path.join(root_dir, project_name)
proj_dir
dataset = CIFAR10(root=proj_dir, download=True, transform=ToTensor())
test_dataset = CIFAR10(root=proj_dir, train=False, transform=ToTensor())
dataset_size = len(dataset)
dataset_size
test_dataset_size = len(test_dataset)
test_dataset_size
classes = dataset.classes
classes
num_classes = len(dataset.classes)
num_classes
Note that this dataset consists of 3-channel color images (RGB). Let us look at a sample image from the dataset. matplotlib expects channels to be the last dimension of the image tensors (whereas in PyTorch they are the first dimension), so we'll use the .permute tensor method to shift channels to the last dimension. Let's also print the label for the image.
#get the label of the dataset using the [1] index
img, label = dataset[1]
label_of_image_1 = dataset[1][1]
title= str(label_of_image_1) + ' is a ' + classes[label_of_image_1]
plt.imshow(img.permute(1,2,0))
plt.title(title)
plt.show()
label_of_train_images=[]
for i in range(len(dataset)):
label_of_train_image = dataset[i][1]
label_of_train_images.append(label_of_train_image)
num_unique_train_labels = np.unique(label_of_train_images)
uniq_image_count = torch.stack([(torch.tensor(label_of_train_images)==i).sum() for i in num_unique_train_labels])
for i in range(len(uniq_image_count)):
print(f' class {classes[i] } has {uniq_image_count[i].item()} images')
torch.manual_seed(43)
val_size = 5000
train_size = len(dataset) - val_size
Let's use the random_split
method to create the training & validation sets
train_ds, val_ds = random_split(dataset, [train_size, val_size])
len(train_ds), len(val_ds)
We can now create data loaders to load the data in batches.
batch_size=128
train_loader = DataLoader(train_ds, batch_size, shuffle=True, num_workers=4, pin_memory=True)
val_loader = DataLoader(val_ds, batch_size*2, num_workers=4, pin_memory=True)
test_loader = DataLoader(test_dataset, batch_size*2, num_workers=4, pin_memory=True)
Let's visualize a batch of data using the make_grid
helper function from Torchvision.
#collapse-hide
viz_loader = DataLoader(train_ds, 9, shuffle=True, num_workers=4, pin_memory=True)
for images, _ in viz_loader:
plt.figure(figsize=(6,6))
plt.axis('off')
plt.imshow(make_grid(images, nrow=3).permute((1, 2, 0)))
break
Base Model class & Training on GPU
Let's create a base model class, which contains everything except the model architecture, i.e. it will not contain the __init__ and forward methods. We will later extend this class to try out different architectures. In fact, you can extend this class to solve any image classification problem.
def accuracy(outputs, labels):
_, preds = torch.max(outputs, dim =1)
return torch.tensor(torch.sum(preds== labels).item()/len(preds))
class ImageClassificationBase(nn.Module):
def training_step(self, batch):
images, labels = batch
out = self(images) # Generate predictions
loss = F.cross_entropy(out, labels) # Calculate loss
return loss
def validation_step(self, batch):
images, labels = batch
out = self(images) # Generate predictions
        loss = F.cross_entropy(out, labels)   # Calculate loss
acc = accuracy(out, labels) # Calculate accuracy
return {'val_loss': loss, 'val_acc': acc}
def validation_epoch_end(self, outputs):
batch_losses = [x['val_loss'] for x in outputs]
epoch_loss = torch.stack(batch_losses).mean() # Combine losses
batch_accs = [x['val_acc'] for x in outputs]
epoch_acc = torch.stack(batch_accs).mean() # Combine accuracies
return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}
def epoch_end(self, epoch, result):
print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(epoch, result['val_loss'], result['val_acc']))
We can also use the exact same training loop as before. I hope you're starting to see the benefits of refactoring our code into reusable functions.
def evaluate(model, val_loader):
outputs = [model.validation_step(batch) for batch in val_loader]
return model.validation_epoch_end(outputs)
def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
history = []
optimizer = opt_func(model.parameters(), lr)
for epoch in range(epochs):
# Training Phase
for batch in train_loader:
loss = model.training_step(batch)
loss.backward()
optimizer.step()
optimizer.zero_grad()
# Validation phase
result = evaluate(model, val_loader)
model.epoch_end(epoch, result)
history.append(result)
return history
Finally, let's also define some utilities for moving our data & labels to the GPU, if one is available.
Let us also define a couple of helper functions for plotting the losses & accuracies.
def plot_losses(history):
losses = [x['val_loss'] for x in history]
plt.plot(losses, '-x')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.title('Loss vs. No. of epochs');
def plot_accuracies(history):
accuracies = [x['val_acc'] for x in history]
plt.plot(accuracies, '-o')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.title('Pytorch Accuracy vs. No. of epochs');
Now let's define the model: the same single-hidden-layer feed-forward network as the Keras model above. The data loaders will be moved to the appropriate device further down.
input_size = 3*32*32
hidden_size = 32
output_size = 10
class CIFAR10Model(ImageClassificationBase):
def __init__(self):
super().__init__()
        # add hidden layer
self.linear1 = nn.Linear(input_size, hidden_size)
#output layer
self.linear2 = nn.Linear(hidden_size, output_size)
def forward(self, xb):
# Flatten images into vectors
out = xb.view(xb.size(0), -1)
# Apply layers & activation functions
out = self.linear1(out)
out = F.relu(out)
out = self.linear2(out)
return out
You can now instantiate the model, and move it to the appropriate device.
#USING A GPU
torch.cuda.is_available()
def get_default_device():
"""Pick GPU if available, else CPU"""
if torch.cuda.is_available():
return torch.device('cuda')
else:
return torch.device('cpu')
device = get_default_device()
print(device)
def to_device(data, device):
"""Move tensor(s) to chosen device"""
if isinstance(data, (list,tuple)):
return [to_device(x, device) for x in data]
return data.to(device, non_blocking=True)
for images, labels in train_loader:
print(images.shape)
images = to_device(images, device)
print(images.device)
break
model = to_device(CIFAR10Model(), device)
print(model.parameters)
for t in model.parameters():
print(t.shape)
Before you train the model, it's a good idea to check the validation loss & accuracy with the initial set of weights.
#collapse-hide
pytorch_total_params = sum(p.numel() for p in model.parameters())
print('Total number of parameters: ',pytorch_total_params)
pytorch_trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print('Trainable parameters: ', pytorch_trainable_params)
print('layers + activations',len(list(model.parameters())))
The number of model parameters matches that of TF/Keras
class DeviceDataLoader():
"""Wrap a dataloader to move data to a device"""
def __init__(self, dl, device):
self.dl = dl
self.device = device
def __iter__(self):
"""Yield a batch of data after moving it to device"""
for b in self.dl:
yield to_device(b, self.device)
def __len__(self):
"""Number of batches"""
return len(self.dl)
train_loader = DeviceDataLoader(train_loader, device)
val_loader = DeviceDataLoader(val_loader, device)
test_loader= DeviceDataLoader(test_loader, device)
for xb, yb in val_loader:
print('xb.device:', xb.device)
xb = xb.view(xb.size(0), -1)
break
history = [evaluate(model, val_loader)]
history
history = fit(20, 1e-3, model, train_loader, val_loader)
history
Plot the losses and the accuracies to check if you're starting to hit the limits of how well your model can perform on this dataset. You can train some more if you can see the scope for further improvement.
plot_accuracies(history)
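The validation loss curve can be drawn with the plot_losses helper defined earlier, and if the curves suggest there is still headroom you can simply keep training and appending to the history (the extra 10 epochs below are an arbitrary choice):
plot_losses(history)
# Optionally resume training and extend the history
# history += fit(10, 1e-3, model, train_loader, val_loader)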
evaluate(model, test_loader)
Results
Test accuracy:
- Tensorflow/Keras: 0.3620
- Pytorch: 0.3374
The differences could be due to the very short training run (and hence lack of convergence), the randomness of the weight initialisation, differences in the library implementations in TF and Pytorch, and other things I am not aware of ;)
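To reduce the run-to-run variation from random weight initialisation, you could fix the seeds up front in both frameworks; a minimal sketch, assuming TF 2.x and an arbitrary seed value:
# Fix random seeds before building the models (seed value is arbitrary)
tf.random.set_seed(42)    # Tensorflow/Keras
torch.manual_seed(42)     # Pytorch (a seed of 43 was already set above)
np.random.seed(42)        # numpy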
Concluding comments
The aim of this exercise was not to get an exact match in accuracy but to demonstrate the equivalent constructs in TF and Pytorch.
The Jovian course Pytorch Zero to GANS is a great introduction to Machine Learning. I am enjoying:
- Collaborating with others
- Working through examples
- Finishing assignments and submitting for approval
- Blogging about my experiences
- Staying enthusiastic about ML!
Thanks and appreciation to:
- Jeremy Howard and Hamel Husain for the Fastpages framework in which this blog is written as a Jupyter notebook.