DerivKit logo black Hyper-Hessian#

This section shows how to compute the hyper-Hessian (third-derivative tensor) of a function using DerivKit.

The hyper-Hessian generalizes the Hessian to third order. For a set of parameters \(\theta\) and a function \(f(\theta)\), it contains all third-order partial derivatives evaluated at a single point in parameter space.

Notation

  • p denotes the number of model parameters (theta has shape (p,)).

If f(theta) returns a scalar and theta has shape (p,), the hyper-Hessian has shape (p, p, p).

If f(theta) returns an array with shape out_shape, the hyper-Hessian has shape (*out_shape, p, p, p) and is computed independently for each output component.

See also Gradient, Jacobian, and Hessian. For more information on higher-order derivatives, see CalculusKit.

The primary low-level interface for computing the hyper-Hessian is derivkit.calculus.hyper_hessian.build_hyper_hessian(). You can choose the derivative backend via method and pass backend-specific options via **dk_kwargs (forwarded to derivkit.derivative_kit.DerivativeKit.differentiate()).

Basic usage#

>>> import numpy as np
>>> from derivkit.calculus.hyper_hessian import build_hyper_hessian
>>> # Define a scalar-valued function with non-zero third derivatives.
>>> def func(theta):
...     return theta[0] ** 3 + 2.0 * theta[0] * theta[1] + theta[1] ** 2
>>> theta0 = np.array([0.5, 2.0])
>>> hess3 = build_hyper_hessian(func, theta0)
>>> print(hess3.shape)
(2, 2, 2)
>>> # Analytic reference:
>>> # d^3 f / dtheta0^3 = 6, and all other third partials are zero.
>>> ref = np.zeros((2, 2, 2), dtype=float)
>>> ref[0, 0, 0] = 6.0
>>> np.allclose(hess3, ref, atol=1e-6, rtol=0.0)
True

Symmetry#

For smooth functions, the hyper-Hessian is symmetric under permutations of its parameter indices:

>>> import numpy as np
>>> from derivkit.calculus.hyper_hessian import build_hyper_hessian
>>>
>>> def func(theta):
...     return theta[0] ** 3 + 2.0 * theta[0] * theta[1] + theta[1] ** 2
>>>
>>> theta0 = np.array([0.5, 2.0])
>>> hess3 = build_hyper_hessian(func, theta0)
>>> # Check symmetry for a representative index triplet.
>>> bool(hess3[0, 1, 0] == hess3[1, 0, 0] == hess3[0, 0, 1])
True

Tensor-valued outputs#

If the function returns a tensor, the hyper-Hessian is computed independently for each output component and afterwards reshaped back to (*out_shape, p, p, p).

>>> import numpy as np
>>> from derivkit.calculus.hyper_hessian import build_hyper_hessian
>>> # Two-output function.
>>> # f0(theta) = theta0^3
>>> # f1(theta) = theta0*theta1 + theta1^2
>>> def func(theta):
...     return np.array([
...         theta[0] ** 3,
...         theta[0] * theta[1] + theta[1] ** 2,
...     ])
>>> theta0 = np.array([0.5, 2.0])
>>> hess3 = build_hyper_hessian(func, theta0)
>>> print(hess3.shape)
(2, 2, 2, 2)
>>>
>>> # Component 0: only d^3/dtheta0^3 = 6 is non-zero.
>>> ref0 = np.zeros((2, 2, 2), dtype=float)
>>> ref0[0, 0, 0] = 6.0
>>> np.allclose(hess3[0], ref0, atol=1e-6, rtol=0.0)
True
>>> # Component 1: all third derivatives are zero (quadratic/linear terms only).
>>> np.allclose(hess3[1], 0.0, atol=1e-6, rtol=0.0)
True

Finite differences (Ridders)#

You can select a derivative backend using method and pass backend-specific options via **dk_kwargs.

>>> import numpy as np
>>> from derivkit.calculus.hyper_hessian import build_hyper_hessian
>>> def func(theta):
...     return theta[0] ** 3 + 2.0 * theta[0] * theta[1] + theta[1] ** 2
>>> theta0 = np.array([0.5, 2.0])
>>> hess3 = build_hyper_hessian(
...     func,
...     theta0,
...     method="finite",
...     n_workers=4,
...     extrapolation="ridders",
... )
>>> ref = np.zeros((2, 2, 2), dtype=float)
>>> ref[0, 0, 0] = 6.0
>>> np.allclose(hess3, ref, atol=5e-3, rtol=0.0)
True

Notes#

  • For scalar outputs, only unique entries with i <= j <= k are evaluated and the tensor is symmetrized by filling all index permutations.

  • For tensor outputs, each component is treated as a scalar internally (flattened, differentiated, and reshaped back).

  • n_workers parallelizes work across entries (scalar outputs) or across output components (tensor outputs).