
Gradient-Based Optimization

and how we do it in G-ADOPT (and Firedrake)


Sia Ghelichkhan and the G-ADOPT team

Objectives

By the end of this workshop, we want to understand:

  • The inverse problem: How to find optimal parameters from observations
  • The reduced functional: Connecting control parameters to objectives
  • Why we need derivatives: The key to efficient optimization
  • How it works in Firedrake: Understanding ReducedFunctional.derivative()

Recall: Forward Modelling

Forward modelling as a functional:

\mathbf{u} = \mathcal{F}(\mathbf{m})

Where:

  • \mathbf{u} are the state variables (temperature, velocity, pressure)
  • \mathbf{m} are the control parameters (material properties, initial conditions)
  • \mathcal{F} is the forward operator (solves the governing PDE)

The forward problem:

Given \mathbf{m}, compute \mathbf{u}

Example: Given initial temperature T_{IC}, simulate mantle convection to get velocity \mathbf{u} and temperature T

The Inverse Problem

[Figure: forward vs. inverse problem]

The inverse problem asks the opposite question:

Given observations \mathbf{u}_{obs}, find the control parameters \mathbf{m}

Example 1: Given observed surface velocities, find the initial temperature T_{IC} that best explains them.

Example 2: Given observed surface deformation, find the rheology or ice history that best explains it.

Challenge: This is usually ill-posed and requires regularization!


Key insight: We solve inverse problems by formulating them as optimization problems!

The Objective Functional

Measure misfit between predictions and observations:

J(\mathbf{u}, \mathbf{m}) = \frac{1}{2} \int_V \|\mathbf{u} - \mathbf{u}_{obs}\|^2 \, dx

Add regularization to stabilize the problem:

J = \underbrace{\frac{1}{2} \int_V (T - T_{obs})^2 \, dx}_{\text{data misfit}} + \underbrace{\frac{\beta_d}{2} \int_V (T_{IC} - \bar{T})^2 \, dx}_{\text{damping}} + \underbrace{\frac{\beta_s}{2} \int_V |\nabla T_{IC}|^2 \, dx}_{\text{smoothing}}

Components:

  • Data misfit: How well do we match observations?
  • Damping: Penalize large deviations from prior
  • Smoothing: Penalize rough solutions

The optimization problem is then:

\mathbf{m}^* = \arg\min_{\mathbf{m}} J(\mathbf{u}(\mathbf{m}), \mathbf{m})
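
In Firedrake/UFL these three terms could be assembled along the following lines. A sketch only: the names T, T_obs, T_IC, T_bar, beta_d and beta_s are assumed to be defined beforehand (Functions and floats respectively):

from firedrake import assemble, dot, grad, dx

# Sketch: T, T_obs, T_IC, T_bar are Functions; beta_d, beta_s are floats.
J = assemble(0.5 * (T - T_obs)**2 * dx)                         # data misfit
J += assemble(0.5 * beta_d * (T_IC - T_bar)**2 * dx)            # damping
J += assemble(0.5 * beta_s * dot(grad(T_IC), grad(T_IC)) * dx)  # smoothing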

The Reduced Functional

The key idea:

J depends on both \mathbf{u} (state) and \mathbf{m} (control)

But \mathbf{u} is determined by \mathbf{m} through the forward problem!

Reduced functional: \hat{J}(\mathbf{m}) = J(\mathbf{u}(\mathbf{m}), \mathbf{m})

Now we have an unconstrained optimization problem: \mathbf{m}^* = \arg\min_{\mathbf{m}} \hat{J}(\mathbf{m})

[Figure: the reduced functional eliminates the state variables]

The reduced functional “hides” the state variables by solving the forward problem implicitly.
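
Concretely, evaluating a reduced functional at a new control value replays the forward problem behind the scenes. A minimal sketch (J_hat and T_IC_guess are illustrative names, anticipating the ReducedFunctional introduced later):

# Hypothetical: J_hat wraps the taped forward model with control T_IC.
# Evaluating it at a new control value re-runs the forward solve
# internally and returns the corresponding objective value.
J_val = J_hat(T_IC_guess)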

Why We Need Derivatives

Gradient-free methods (e.g., grid search):

  • Try many values of \mathbf{m}
  • Evaluate \hat{J}(\mathbf{m}) for each
  • Pick the best one

Cost: O(N_{controls}) forward solves per iteration

For mantle convection: Millions of unknowns → impractical!

Why We Need Derivatives

Gradient-based methods:

  • Compute gradient \frac{\partial \hat{J}}{\partial \mathbf{m}}
  • Move “downhill” in the direction of steepest descent
  • Update: \mathbf{m}_{k+1} = \mathbf{m}_k - \alpha \frac{\partial \hat{J}}{\partial \mathbf{m}}

Cost: One gradient computation (adjoint solve) per iteration

Key advantage: Gradient computation cost is independent of the number of control parameters!
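
A minimal fixed-step steepest-descent loop, sketched with illustrative names (m, alpha and max_iters are assumptions; production runs would use a line search or trust region, e.g. via ROL):

# Steepest descent with a fixed step size alpha (illustrative sketch only).
for k in range(max_iters):
    J_val = J_hat(m)            # one (taped) forward solve
    dJdm = J_hat.derivative()   # one adjoint solve, cost independent of len(m)
    m.assign(m - alpha * dJdm)  # move downhill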

Computing the Gradient: The Challenge

Naive approach (finite differences):

\frac{\partial \hat{J}}{\partial m_i} \approx \frac{\hat{J}(\mathbf{m} + \epsilon \mathbf{e}_i) - \hat{J}(\mathbf{m})}{\epsilon}

Problem: Need N_{controls} forward solves to compute full gradient!

For mantle convection: Millions of parameters → millions of forward solves!
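
To make the cost concrete, a naive finite-difference gradient loop (perturb is a hypothetical helper that offsets the i-th control component by eps):

import numpy as np

J0 = J_hat(m)                                # baseline forward solve
grad_fd = np.zeros(n_controls)
for i in range(n_controls):
    m_pert = perturb(m, i, eps)              # hypothetical helper
    grad_fd[i] = (J_hat(m_pert) - J0) / eps  # one forward solve per component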

[Figure: finite difference stencil]

Solution: The adjoint method computes the gradient with only ONE additional solve!

The Adjoint Method

The gradient via chain rule:

\frac{d\hat{J}}{d\mathbf{m}} = \frac{\partial J}{\partial \mathbf{m}} + \frac{\partial J}{\partial \mathbf{u}} \frac{\partial \mathbf{u}}{\partial \mathbf{m}}

The expensive part: \frac{\partial \mathbf{u}}{\partial \mathbf{m}} requires solving sensitivity equations

Adjoint trick: Writing the governing equations in residual form \mathcal{F}(\mathbf{u}, \mathbf{m}) = 0 (rather than the explicit map \mathbf{u} = \mathcal{F}(\mathbf{m})), introduce adjoint variables \boldsymbol{\lambda} such that:

\frac{d\hat{J}}{d\mathbf{m}} = \frac{\partial J}{\partial \mathbf{m}} - \boldsymbol{\lambda}^T \frac{\partial \mathcal{F}}{\partial \mathbf{m}}

The adjoint equation:

\left(\frac{\partial \mathcal{F}}{\partial \mathbf{u}}\right)^T \boldsymbol{\lambda} = \left(\frac{\partial J}{\partial \mathbf{u}}\right)^T

Cost: One forward solve + one adjoint solve
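
Where does the adjoint equation come from? A standard derivation introduces the Lagrangian (sketched briefly here):

\mathcal{L}(\mathbf{u}, \mathbf{m}, \boldsymbol{\lambda}) = J(\mathbf{u}, \mathbf{m}) - \boldsymbol{\lambda}^T \mathcal{F}(\mathbf{u}, \mathbf{m})

Since \mathcal{F}(\mathbf{u}, \mathbf{m}) = 0 for any solution of the forward problem, \hat{J} and \mathcal{L} coincide there. Choosing \boldsymbol{\lambda} so that \partial \mathcal{L} / \partial \mathbf{u} = 0 reproduces exactly the adjoint equation above and eliminates the expensive \partial \mathbf{u} / \partial \mathbf{m} term, leaving d\hat{J}/d\mathbf{m} = \partial \mathcal{L} / \partial \mathbf{m}.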

ReducedFunctional in Firedrake

Firedrake + pyadjoint automates the adjoint derivation:

from firedrake import *
from firedrake.adjoint import *

continue_annotation()  # start recording operations on the tape
                       # (required in recent Firedrake versions)

# Define control (Q is a FunctionSpace defined earlier)
T_IC = Function(Q, name="InitialTemperature")
control = Control(T_IC)

# Solve forward problem (user-defined routine)
T = solve_forward_problem(T_IC)

# Define objective functional
J = assemble(0.5 * (T - T_obs)**2 * dx)

# Create reduced functional
J_hat = ReducedFunctional(J, control)

# Compute derivative automatically!
dJ_dm = J_hat.derivative()

What happens under the hood:

  • pyadjoint “tapes” all forward operations
  • Automatically derives adjoint equations
  • Solves adjoint system backwards
  • Computes gradient
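
Before using the gradient in an optimizer, it is worth verifying it against the functional itself. pyadjoint provides taylor_test for this; a sketch, reusing Q, T_IC and J_hat from the snippet above:

# Taylor test: the remainder J_hat(m + h) - J_hat(m) - <dJ/dm, h>
# should shrink at second order as h is halved; a convergence rate
# near 2 indicates the adjoint gradient is consistent.
h = Function(Q)
h.assign(0.1)                       # any nonzero perturbation
rate = taylor_test(J_hat, T_IC, h)  # expect approximately 2.0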

How pyadjoint Works: The Tape

During forward solve, pyadjoint records:

  • Every solve statement
  • Every function assignment
  • Dependencies between variables

The tape stores:

T^0 \xrightarrow{\text{project}} T^1 \xrightarrow{\text{Stokes}} \mathbf{u}^1 \xrightarrow{\text{energy}} T^2 \cdots

For adjoint solve, pyadjoint:

  • Traverses tape backwards
  • For each forward operation, executes its adjoint
  • Assembles the full gradient via chain rule
[Figure: computational tape visualisation]

Key advantage: Works at PDE level, not line-by-line differentiation → maximum efficiency!
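
The tape is also a first-class object that can be inspected and controlled; a short sketch using pyadjoint calls re-exported by firedrake.adjoint:

tape = get_working_tape()   # access the global tape

pause_annotation()          # stop recording (e.g. for diagnostics/IO)
# ... operations here are not taped ...
continue_annotation()       # resume recording

tape.clear_tape()           # discard all recorded operations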

Complete Workflow in G-ADOPT

from gadopt import *
from gadopt.inverse import *

# 1. Set up forward problem
T_IC = Function(Q1, name="Control")

# 2. Solve forward problem
T_final, u = solve_mantle_convection(T_IC)

# 3. Define objective
J = 0.5 * assemble((T_final - T_obs)**2 * dx)
J += 0.5 * beta_s * assemble(dot(grad(T_IC),
                                  grad(T_IC)) * dx)

# 4. Create reduced functional
J_hat = ReducedFunctional(J, Control(T_IC))

# 5. Optimize using ROL
T_IC_opt = minimize(J_hat, method='ROL',
                    options={'General': {...}})

Key steps:

  1. Define control parameters
  2. Solve forward problem
  3. Compute objective
  4. Automatic adjoint derivation
  5. Gradient-based optimization

Summary

Core Concepts:

  • Inverse problem: Find \mathbf{m} from observations
  • Reduced functional: \hat{J}(\mathbf{m}) = J(\mathbf{u}(\mathbf{m}), \mathbf{m})
  • Adjoint method: Efficient gradient computation
  • Optimization: Iteratively minimize \hat{J}

Firedrake + G-ADOPT provide:

  • Automatic adjoint derivation (pyadjoint)
  • Reduced functional creation
  • Gradient computation via .derivative()
  • Integration with ROL optimizer
  • Mesh-independent convergence

The power: Write forward problem → Get optimization framework for free!