Introduction to Machine Learning with PyTorch

Day 1 - Part 3: Deep Learning with Flexibility

Juan F. Imbet

Master 2 (203) in Financial Markets, Paris Dauphine - PSL University

2025-10-31

What is PyTorch?

  • Deep learning framework developed by Facebook/Meta
  • Dynamic computational graphs (define-by-run; see the sketch below)
  • Pythonic and intuitive API
  • Strong focus on research and flexibility
  • Excellent GPU acceleration support
  • Powerful automatic differentiation engine
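To make "define-by-run" concrete, here is a minimal sketch (the function toy_forward is purely illustrative, not a PyTorch API): the graph is recorded as ordinary Python code executes, so control flow such as if statements works naturally.

import torch

def toy_forward(x):
    # Ordinary Python control flow decides which operations get recorded,
    # so the graph is built while the code runs ("define-by-run")
    if x.sum() > 0:
        return (x ** 2).sum()
    return (-x).sum()

x = torch.randn(3, requires_grad=True)
y = toy_forward(x)
y.backward()  # gradients flow through whichever branch actually ran
print(x.grad)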

Why PyTorch?

  • More control than Keras/TensorFlow
  • Easier debugging (standard Python debugger works)
  • Research-friendly: easy to implement custom architectures
  • Growing ecosystem: torchvision, torchaudio, etc.
  • Used by OpenAI, Tesla, Microsoft, and top universities
  • Production-ready with TorchScript

Installation and Setup

Installing PyTorch:

# For CPU only
pip install torch torchvision

# For CUDA (GPU support) - check pytorch.org for your CUDA version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# With conda
conda install pytorch torchvision torchaudio -c pytorch

Verify installation:

import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

PyTorch Tensors

Tensors are the fundamental data structure:

import torch
import numpy as np

# Create tensors
x = torch.tensor([1, 2, 3, 4, 5])
y = torch.zeros(3, 4)
z = torch.ones(2, 3)
random = torch.randn(2, 3)  # Normal distribution

# From numpy
arr = np.array([1, 2, 3])
tensor_from_numpy = torch.from_numpy(arr)

# To numpy
numpy_array = tensor_from_numpy.numpy()

print(f"Tensor shape: {x.shape}")
print(f"Tensor dtype: {x.dtype}")

Tensor Operations

# Basic operations
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

# Element-wise operations
c = a + b
d = a * b
e = torch.sin(a)

# Matrix operations
A = torch.randn(3, 4)
B = torch.randn(4, 5)
C = torch.mm(A, B)  # Matrix multiplication

# Reshaping
x = torch.randn(12)
y = x.view(3, 4)  # Reshape to 3x4
z = x.view(-1, 2)  # Reshape to (?, 2)

print(f"Result shape: {C.shape}")

GPU Acceleration

# Check if CUDA is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Move tensors to GPU
x = torch.randn(1000, 1000)
x_gpu = x.to(device)

# Or create directly on GPU
y_gpu = torch.randn(1000, 1000, device=device)

# Compute on GPU
z_gpu = torch.mm(x_gpu, y_gpu)

# Move back to CPU
z_cpu = z_gpu.to('cpu')
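A rough way to see the benefit is to time a large matrix multiplication on both devices. This is only a sketch: timings depend entirely on your hardware, and GPU kernels run asynchronously, so torch.cuda.synchronize() is needed before reading the clock.

import time
import torch

n = 2000
a = torch.randn(n, n)
b = torch.randn(n, n)

start = time.time()
torch.mm(a, b)
print(f"CPU: {time.time() - start:.3f} s")

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()   # wait for the transfers to finish
    start = time.time()
    torch.mm(a_gpu, b_gpu)
    torch.cuda.synchronize()   # wait for the kernel to finish before timing
    print(f"GPU: {time.time() - start:.3f} s")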

Automatic Differentiation (Autograd)

PyTorch’s most powerful feature:

import torch

# Create tensor with gradient tracking
x = torch.tensor([2.0], requires_grad=True)

# Define function: y = x^3 + 2x^2 + 5
y = x**3 + 2*x**2 + 5

# Compute gradients (dy/dx = 3x^2 + 4x)
y.backward()

print(f"x = {x.item()}")
print(f"y = {y.item()}")
print(f"dy/dx = {x.grad.item()}")  # Should be 3(4) + 4(2) = 20

Example: Computing Derivatives

Useful for optimization and understanding functions:

import torch
import matplotlib.pyplot as plt

# Define function: f(x) = sin(x) + x^2/10
x = torch.linspace(-5, 5, 100, requires_grad=True)
f = torch.sin(x) + x**2 / 10

# Compute gradient
f.sum().backward()  # Need scalar for backward()
df_dx = x.grad

# Plotting
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(x.detach().numpy(), f.detach().numpy())
plt.title('Function f(x)')
plt.subplot(1, 2, 2)
plt.plot(x.detach().numpy(), df_dx.numpy())
plt.title("Derivative f'(x)")
plt.show()

Building Neural Networks

Using torch.nn module:

import torch
import torch.nn as nn

# Define a simple neural network
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Create model
model = SimpleNet(input_size=10, hidden_size=64, output_size=1)
print(model)
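When forward contains no custom logic, the same architecture can be written more compactly with nn.Sequential. An equivalent sketch of the network above:

import torch.nn as nn

model_seq = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)
print(model_seq)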

Example 1: Regression with PyTorch

Approximating a non-linear function:

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

# Generate data
X = np.linspace(-5, 5, 1000).reshape(-1, 1)
y = np.sin(X) + np.cos(2*X) + np.random.randn(1000, 1) * 0.1

# Convert to tensors
X_tensor = torch.FloatTensor(X)
y_tensor = torch.FloatTensor(y)

# Split data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X_tensor, y_tensor, test_size=0.2, random_state=42
)

Define the Model

class RegressionNet(nn.Module):
    def __init__(self):
        super(RegressionNet, self).__init__()
        self.fc1 = nn.Linear(1, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 32)
        self.fc4 = nn.Linear(32, 1)
        self.relu = nn.ReLU()
    
    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        x = self.fc4(x)
        return x

# Create model
model = RegressionNet()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

Training Loop

# Training loop
num_epochs = 100
losses = []

for epoch in range(num_epochs):
    # Forward pass
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    
    # Backward pass and optimization
    optimizer.zero_grad()  # Clear gradients
    loss.backward()        # Compute gradients
    optimizer.step()       # Update weights
    
    losses.append(loss.item())
    
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

# Evaluate
model.eval()
with torch.no_grad():
    y_pred = model(X_test)
    test_loss = criterion(y_pred, y_test)
    print(f'Test Loss: {test_loss.item():.4f}')
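A quick sanity check is to plot the recorded losses; a small sketch using the losses list collected above:

import matplotlib.pyplot as plt

plt.plot(losses)
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.title('Training loss')
plt.show()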

Example 2: Binary Classification

Creating a classifier:

from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Generate data
X, y = make_moons(n_samples=1000, noise=0.1, random_state=42)

# Scale data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Convert to tensors
X_tensor = torch.FloatTensor(X_scaled)
y_tensor = torch.FloatTensor(y).view(-1, 1)

# Split
X_train, X_test, y_train, y_test = train_test_split(
    X_tensor, y_tensor, test_size=0.2, random_state=42
)

Classification Model

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.fc1 = nn.Linear(2, 32)
        self.fc2 = nn.Linear(32, 16)
        self.fc3 = nn.Linear(16, 1)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.3)
        self.sigmoid = nn.Sigmoid()
    
    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.sigmoid(self.fc3(x))
        return x

model = Classifier()
criterion = nn.BCELoss()  # Binary Cross Entropy
optimizer = optim.Adam(model.parameters(), lr=0.001)
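A common variant (not used in this example) is to drop the final Sigmoid from the network and use nn.BCEWithLogitsLoss, which fuses the sigmoid and the cross-entropy into a single, numerically more stable step. A sketch of the change:

import torch.nn as nn

# forward() would then return raw logits (no Sigmoid layer at the end)
criterion_logits = nn.BCEWithLogitsLoss()
# probabilities for prediction come from torch.sigmoid(model(x))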

Training the Classifier

# Training
num_epochs = 100
train_losses = []
train_accuracies = []

for epoch in range(num_epochs):
    model.train()
    
    # Forward pass
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    
    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    # Calculate accuracy
    predictions = (outputs > 0.5).float()
    accuracy = (predictions == y_train).float().mean()
    
    train_losses.append(loss.item())
    train_accuracies.append(accuracy.item())
    
    if (epoch + 1) % 20 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}, Acc: {accuracy.item():.4f}')

Evaluation and Predictions

# Evaluate on test set
model.eval()
with torch.no_grad():
    test_outputs = model(X_test)
    test_predictions = (test_outputs > 0.5).float()
    test_accuracy = (test_predictions == y_test).float().mean()
    
    print(f'Test Accuracy: {test_accuracy.item():.4f}')

# Confusion matrix
from sklearn.metrics import confusion_matrix, classification_report

y_pred_np = test_predictions.numpy()
y_test_np = y_test.numpy()

print("Confusion Matrix:")
print(confusion_matrix(y_test_np, y_pred_np))
print("\nClassification Report:")
print(classification_report(y_test_np, y_pred_np))

DataLoader for Batch Training

Efficient data loading:

from torch.utils.data import TensorDataset, DataLoader

# Create dataset
train_dataset = TensorDataset(X_train, y_train)
test_dataset = TensorDataset(X_test, y_test)

# Create data loaders
batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# Training with batches
for epoch in range(num_epochs):
    model.train()  # re-enable dropout during training
    for batch_X, batch_y in train_loader:
        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
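Evaluation works the same way, looping over test_loader and accumulating the statistics. A sketch reusing the model and loaders defined above:

# Batch-wise evaluation on the test set
model.eval()
correct, total = 0, 0
with torch.no_grad():
    for batch_X, batch_y in test_loader:
        preds = (model(batch_X) > 0.5).float()
        correct += (preds == batch_y).sum().item()
        total += batch_y.size(0)
print(f"Test accuracy: {correct / total:.4f}")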

Custom Loss Functions

# Define custom loss
class CustomLoss(nn.Module):
    def __init__(self):
        super(CustomLoss, self).__init__()
    
    def forward(self, predictions, targets):
        # Example: MSE plus an L1 (absolute-error) penalty on the residuals
        mse = torch.mean((predictions - targets)**2)
        l1 = torch.mean(torch.abs(predictions - targets))
        return mse + 0.1 * l1

# Use custom loss
custom_criterion = CustomLoss()
loss = custom_criterion(outputs, y_train)

Learning Rate Scheduling

from torch.optim.lr_scheduler import StepLR, ReduceLROnPlateau

# Step LR: reduce LR every 30 epochs
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

# Reduce on plateau: reduce when metric stops improving
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=10)

# In training loop
for epoch in range(num_epochs):
    # ... training code ...
    
    # Step the scheduler
    scheduler.step()  # For StepLR
    # scheduler.step(val_loss)  # For ReduceLROnPlateau
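To see the schedule take effect, the current learning rate can be read back from the optimizer (an illustrative snippet; StepLR and most other schedulers also expose scheduler.get_last_lr()):

# The learning rate currently used by the optimizer
current_lr = optimizer.param_groups[0]['lr']
print(f"Current learning rate: {current_lr}")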

Advanced: Computing Gradients

A small helper for evaluating the gradient of an arbitrary scalar function:

import torch

def compute_gradient(func, x_val):
    """Compute gradient of function at x_val"""
    x = torch.tensor([x_val], requires_grad=True)
    y = func(x)
    y.backward()
    return x.grad.item()

# Example: gradient of x^2 at x=3
def f(x):
    return x**2

grad = compute_gradient(f, 3.0)
print(f"Gradient of x^2 at x=3: {grad}")  # Should be 6

# Example: gradient of sin(x) at x=0
grad = compute_gradient(lambda x: torch.sin(x), 0.0)
print(f"Gradient of sin(x) at x=0: {grad}")  # Should be 1

Numerical Optimization Example

Using gradients to find a minimum:

import torch
import torch.optim as optim

# Function to minimize: f(x) = (x-3)^2 + 5
x = torch.tensor([0.0], requires_grad=True)
optimizer = optim.SGD([x], lr=0.1)

# Optimization loop
for i in range(100):
    optimizer.zero_grad()
    
    # Compute function value
    y = (x - 3)**2 + 5
    
    # Compute gradient and update
    y.backward()
    optimizer.step()
    
    if i % 20 == 0:
        print(f'Iter {i}: x={x.item():.4f}, f(x)={y.item():.4f}')

print(f'Minimum at x={x.item():.4f}')  # Should be close to 3

Saving and Loading Models

# Save model
torch.save(model.state_dict(), 'model_weights.pth')

# Save entire model
torch.save(model, 'complete_model.pth')

# Save checkpoint (for resuming training)
checkpoint = {
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss
}
torch.save(checkpoint, 'checkpoint.pth')

# Load model
model = Classifier()
model.load_state_dict(torch.load('model_weights.pth'))
model.eval()

# Load checkpoint
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
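One practical detail: a checkpoint saved on a GPU machine can be loaded on a CPU-only machine by passing map_location. A short sketch:

# Load a GPU-trained checkpoint onto the CPU
checkpoint = torch.load('checkpoint.pth', map_location=torch.device('cpu'))
model.load_state_dict(checkpoint['model_state_dict'])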

PyTorch vs Other Frameworks

Strengths:

  • Dynamic computation graphs (easier debugging)
  • More “Pythonic” and intuitive
  • Better for research and custom architectures
  • Excellent automatic differentiation

Considerations:

  • More verbose than Keras
  • Need to write training loops manually
  • Steeper learning curve initially

Key Takeaways: PyTorch

  • Flexible and powerful deep learning framework
  • Automatic differentiation is the core feature
  • Dynamic graphs make debugging easier
  • Use DataLoader for efficient batch processing
  • GPU acceleration with .to(device)
  • More control than Keras, but more code
  • Perfect for research and custom architectures

When to Use PyTorch?

Use for:

  • Research and experimentation
  • Custom architectures and loss functions
  • When you need fine-grained control
  • Understanding deep learning internals
  • Production with TorchScript

Not ideal for:

  • Quick prototyping (Keras is faster)
  • When you want minimal code

Next Steps

  • Explore torchvision for computer vision
  • Learn about custom datasets and transforms
  • Implement Convolutional Neural Networks (CNNs)
  • Explore Transfer Learning with pretrained models
  • Compare with TensorFlow 2.x

Resources: