
Gradient-Based Optimization

and how we do it in G-ADOPT (and Firedrake)


Sia Ghelichkhan and the G-ADOPT team

Objectives

By the end of this workshop, we want to understand:

  • The inverse problem: How to find optimal parameters from observations
  • The reduced functional: Connecting control parameters to objectives
  • Why we need derivatives: The key to efficient optimization
  • How it works in Firedrake: Understanding ReducedFunctional.derivative()

Recall: Forward Modelling

Forward modelling as a functional:

\mathbf{u} = \mathcal{F}(\mathbf{m})

Where:

  • \mathbf{u} are the state variables (temperature, velocity, pressure)
  • \mathbf{m} are the control parameters (material properties, initial conditions)
  • \mathcal{F} is the forward operator (solves the governing PDE)

The forward problem:

Given \mathbf{m}, compute \mathbf{u}

Example: Given initial temperature T_{IC}, simulate mantle convection to get velocity \mathbf{u} and temperature T

The Inverse Problem

[Figure: forward vs. inverse problem]

The inverse problem asks the opposite question:

Given observations \mathbf{u}_{obs}, find the control parameters \mathbf{m}

Example 1: Given observed surface velocities, find the initial temperature T_{IC} that best explains them.

Example 2: Given observed surface deformation, find the rheology or ice history that best explains it.

Challenge: This is usually ill-posed and requires regularization!


Key insight: We solve inverse problems by formulating them as optimization problems!

The Objective Functional

Measure misfit between predictions and observations:

J(\mathbf{u}, \mathbf{m}) = \frac{1}{2} \int_V \|\mathbf{u} - \mathbf{u}_{obs}\|^2 \, dx

Add regularization to stabilize the problem:

J = \underbrace{\frac{1}{2} \int_V (T - T_{obs})^2 \, dx}_{\text{data misfit}} + \underbrace{\frac{\beta_d}{2} \int_V (T_{IC} - \bar{T})^2 \, dx}_{\text{damping}} + \underbrace{\frac{\beta_s}{2} \int_V |\nabla T_{IC}|^2 \, dx}_{\text{smoothing}}

Components:

  • Data misfit: How well do we match observations?
  • Damping: Penalize large deviations from prior
  • Smoothing: Penalize rough solutions

The optimization problem is then:

\mathbf{m}^* = \arg\min_{\mathbf{m}} J(\mathbf{u}(\mathbf{m}), \mathbf{m})
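
In Firedrake/UFL these three terms could be assembled along the following lines. A sketch only: the names T, T_obs, T_IC, T_bar, beta_d and beta_s are assumed to be defined beforehand (Functions and floats respectively):

from firedrake import assemble, dot, grad, dx

# Sketch: T, T_obs, T_IC, T_bar are Functions; beta_d, beta_s are floats.
J = assemble(0.5 * (T - T_obs)**2 * dx)                         # data misfit
J += assemble(0.5 * beta_d * (T_IC - T_bar)**2 * dx)            # damping
J += assemble(0.5 * beta_s * dot(grad(T_IC), grad(T_IC)) * dx)  # smoothing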

The Reduced Functional

The key idea:

J depends on both \mathbf{u} (state) and \mathbf{m} (control)

But \mathbf{u} is determined by \mathbf{m} through the forward problem!

Reduced functional: \hat{J}(\mathbf{m}) = J(\mathbf{u}(\mathbf{m}), \mathbf{m})

Now we have an unconstrained optimization problem: \mathbf{m}^* = \arg\min_{\mathbf{m}} \hat{J}(\mathbf{m})

[Figure: the reduced functional eliminates the state variables]

The reduced functional “hides” the state variables by solving the forward problem implicitly.
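
Concretely, evaluating a reduced functional at a new control value replays the forward problem behind the scenes. A minimal sketch (J_hat and T_IC_guess are illustrative names, anticipating the ReducedFunctional introduced later):

# Hypothetical: J_hat wraps the taped forward model with control T_IC.
# Evaluating it at a new control value re-runs the forward solve
# internally and returns the corresponding objective value.
J_val = J_hat(T_IC_guess)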

Why We Need Derivatives

Gradient-free methods (e.g., grid search):

  • Try many values of \mathbf{m}
  • Evaluate \hat{J}(\mathbf{m}) for each
  • Pick the best one

Cost: O(N_{controls}) forward solves per iteration

For mantle convection: Millions of unknowns → impractical!

Why We Need Derivatives

Gradient-based methods:

  • Compute gradient \frac{\partial \hat{J}}{\partial \mathbf{m}}
  • Move “downhill” in the direction of steepest descent
  • Update: \mathbf{m}_{k+1} = \mathbf{m}_k - \alpha \frac{\partial \hat{J}}{\partial \mathbf{m}}

Cost: One gradient computation (adjoint solve) per iteration

Key advantage: Gradient computation cost is independent of the number of control parameters!
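
A minimal fixed-step steepest-descent loop, sketched with illustrative names (m, alpha and max_iters are assumptions; production runs would use a line search or trust region, e.g. via ROL):

# Steepest descent with a fixed step size alpha (illustrative sketch only).
for k in range(max_iters):
    J_val = J_hat(m)            # one (taped) forward solve
    dJdm = J_hat.derivative()   # one adjoint solve, cost independent of len(m)
    m.assign(m - alpha * dJdm)  # move downhill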

Computing the Gradient: The Challenge

Naive approach (finite differences):

\frac{\partial \hat{J}}{\partial m_i} \approx \frac{\hat{J}(\mathbf{m} + \epsilon \mathbf{e}_i) - \hat{J}(\mathbf{m})}{\epsilon}

Problem: Need N_{controls} forward solves to compute full gradient!

For mantle convection: Millions of parameters → millions of forward solves!
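
To make the cost concrete, a naive finite-difference gradient loop (perturb is a hypothetical helper that offsets the i-th control component by eps):

import numpy as np

J0 = J_hat(m)                                # baseline forward solve
grad_fd = np.zeros(n_controls)
for i in range(n_controls):
    m_pert = perturb(m, i, eps)              # hypothetical helper
    grad_fd[i] = (J_hat(m_pert) - J0) / eps  # one forward solve per component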

[Figure: finite difference stencil]

Solution: The adjoint method computes the gradient with only ONE additional solve!

The Adjoint Method

The gradient via chain rule:

\frac{d\hat{J}}{d\mathbf{m}} = \frac{\partial J}{\partial \mathbf{m}} + \frac{\partial J}{\partial \mathbf{u}} \frac{\partial \mathbf{u}}{\partial \mathbf{m}}

The expensive part: \frac{\partial \mathbf{u}}{\partial \mathbf{m}} requires solving sensitivity equations

Adjoint trick: Writing the governing equations in residual form \mathcal{F}(\mathbf{u}, \mathbf{m}) = 0 (rather than the explicit map \mathbf{u} = \mathcal{F}(\mathbf{m})), introduce adjoint variables \boldsymbol{\lambda} such that:

\frac{d\hat{J}}{d\mathbf{m}} = \frac{\partial J}{\partial \mathbf{m}} - \boldsymbol{\lambda}^T \frac{\partial \mathcal{F}}{\partial \mathbf{m}}

The adjoint equation:

\left(\frac{\partial \mathcal{F}}{\partial \mathbf{u}}\right)^T \boldsymbol{\lambda} = \left(\frac{\partial J}{\partial \mathbf{u}}\right)^T

Cost: One forward solve + one adjoint solve
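
Where does the adjoint equation come from? A standard derivation introduces the Lagrangian (sketched briefly here):

\mathcal{L}(\mathbf{u}, \mathbf{m}, \boldsymbol{\lambda}) = J(\mathbf{u}, \mathbf{m}) - \boldsymbol{\lambda}^T \mathcal{F}(\mathbf{u}, \mathbf{m})

Since \mathcal{F}(\mathbf{u}, \mathbf{m}) = 0 for any solution of the forward problem, \hat{J} and \mathcal{L} coincide there. Choosing \boldsymbol{\lambda} so that \partial \mathcal{L} / \partial \mathbf{u} = 0 reproduces exactly the adjoint equation above and eliminates the expensive \partial \mathbf{u} / \partial \mathbf{m} term, leaving d\hat{J}/d\mathbf{m} = \partial \mathcal{L} / \partial \mathbf{m}.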

ReducedFunctional in Firedrake

Firedrake + pyadjoint automates the adjoint derivation:

from firedrake import *
from firedrake.adjoint import *

continue_annotation()  # start recording operations on the tape
                       # (required in recent Firedrake versions)

# Define control (Q is a FunctionSpace defined earlier)
T_IC = Function(Q, name="InitialTemperature")
control = Control(T_IC)

# Solve forward problem (user-defined routine)
T = solve_forward_problem(T_IC)

# Define objective functional
J = assemble(0.5 * (T - T_obs)**2 * dx)

# Create reduced functional
J_hat = ReducedFunctional(J, control)

# Compute derivative automatically!
dJ_dm = J_hat.derivative()

What happens under the hood:

  • pyadjoint “tapes” all forward operations
  • Automatically derives adjoint equations
  • Solves adjoint system backwards
  • Computes gradient
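
Before using the gradient in an optimizer, it is worth verifying it against the functional itself. pyadjoint provides taylor_test for this; a sketch, reusing Q, T_IC and J_hat from the snippet above:

# Taylor test: the remainder J_hat(m + h) - J_hat(m) - <dJ/dm, h>
# should shrink at second order as h is halved; a convergence rate
# near 2 indicates the adjoint gradient is consistent.
h = Function(Q)
h.assign(0.1)                       # any nonzero perturbation
rate = taylor_test(J_hat, T_IC, h)  # expect approximately 2.0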

How pyadjoint Works: The Tape

During forward solve, pyadjoint records:

  • Every solve statement
  • Every function assignment
  • Dependencies between variables

The tape stores:

T^0 \xrightarrow{\text{project}} T^1 \xrightarrow{\text{Stokes}} \mathbf{u}^1 \xrightarrow{\text{energy}} T^2 \cdots

For adjoint solve, pyadjoint:

  • Traverses tape backwards
  • For each forward operation, executes its adjoint
  • Assembles the full gradient via chain rule
[Figure: computational tape visualisation]

Key advantage: Works at PDE level, not line-by-line differentiation → maximum efficiency!
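
The tape is also a first-class object that can be inspected and controlled; a short sketch using pyadjoint calls re-exported by firedrake.adjoint:

tape = get_working_tape()   # access the global tape

pause_annotation()          # stop recording (e.g. for diagnostics/IO)
# ... operations here are not taped ...
continue_annotation()       # resume recording

tape.clear_tape()           # discard all recorded operations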

Complete Workflow in G-ADOPT

from gadopt import *
from gadopt.inverse import *

# 1. Set up forward problem
T_IC = Function(Q1, name="Control")

# 2. Solve forward problem
T_final, u = solve_mantle_convection(T_IC)

# 3. Define objective
J = 0.5 * assemble((T_final - T_obs)**2 * dx)
J += 0.5 * beta_s * assemble(dot(grad(T_IC),
                                  grad(T_IC)) * dx)

# 4. Create reduced functional
J_hat = ReducedFunctional(J, Control(T_IC))

# 5. Optimize using ROL
T_IC_opt = minimize(J_hat, method='ROL',
                    options={'General': {...}})

Key steps:

  1. Define control parameters
  2. Solve forward problem
  3. Compute objective
  4. Automatic adjoint derivation
  5. Gradient-based optimization

Summary

Core Concepts:

  • Inverse problem: Find \mathbf{m} from observations
  • Reduced functional: \hat{J}(\mathbf{m}) = J(\mathbf{u}(\mathbf{m}), \mathbf{m})
  • Adjoint method: Efficient gradient computation
  • Optimization: Iteratively minimize \hat{J}

Firedrake + G-ADOPT provide:

  • Automatic adjoint derivation (pyadjoint)
  • Reduced functional creation
  • Gradient computation via .derivative()
  • Integration with ROL optimizer
  • Mesh-independent convergence

The power: Write forward problem → Get optimization framework for free!