Hessian

This section shows how to compute the Hessian (matrix of second derivatives) using DerivKit.

The Hessian describes the local curvature of a function with respect to the model parameters.

For a function f(theta), entry (i, j) of the Hessian is the second partial derivative of f with respect to theta[i] and theta[j].

Notation

p denotes the number of model parameters (theta has shape (p,)).

Depending on the output type of f(theta), the Hessian has the following shape:

scalar output f(theta) → Hessian shape (p, p)
tensor output f(theta) with shape out_shape → Hessian shape (*out_shape, p, p)
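
In index notation, the definitions above can be restated as follows. This is only a reference restatement: the multi-index a over output components is notation introduced here, and the worked matrix is for the example function used in the usage sections of this page.

```latex
% Scalar output: H has shape (p, p)
H_{ij}(\theta) = \frac{\partial^2 f(\theta)}{\partial \theta_i \, \partial \theta_j},
\qquad i, j = 0, \dots, p - 1

% Tensor output with components f_a, where a runs over out_shape:
% H has shape (*out_shape, p, p)
H_{a\,ij}(\theta) = \frac{\partial^2 f_a(\theta)}{\partial \theta_i \, \partial \theta_j}

% Example: f(\theta) = \sin\theta_0 + \theta_0 \theta_1 + \theta_1^2
H(\theta) =
\begin{pmatrix}
  -\sin\theta_0 & 1 \\
  1 & 2
\end{pmatrix}
```

If f is twice continuously differentiable, the mixed partials commute, so H is symmetric in its last two indices.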

See also Gradient for first derivatives and Jacobian for vector-valued outputs. For more details on the hessian() method, see CalculusKit.

The primary interface for computing the Hessian is derivkit.calculus_kit.CalculusKit.hessian(). For advanced usage and backend-specific keyword arguments, see derivkit.calculus.hessian.build_hessian(). Select the derivative backend with the method argument and pass backend-specific options as **dk_kwargs, which are forwarded to derivkit.derivative_kit.DerivativeKit.differentiate().

Basic usage (scalar-valued function)

>>> import numpy as np
>>> from derivkit import CalculusKit
>>> # Define a scalar-valued function
>>> def func(theta):
...     return np.sin(theta[0]) + theta[0] * theta[1] + theta[1] ** 2
>>> # Point at which to compute the Hessian
>>> x0 = np.array([0.5, 2.0])
>>> # Create CalculusKit instance and compute Hessian
>>> calc = CalculusKit(func, x0=x0)
>>> hess = calc.hessian()
>>> print(np.round(hess, 6))
[[-0.479426  1.      ]
 [ 1.        2.      ]]
>>> print(hess.shape)
(2, 2)
>>> ref = np.array([
...     [-np.sin(0.5), 1.0],
...     [1.0, 2.0],
... ])
>>> print(np.round(ref, 6))
[[-0.479426  1.      ]
 [ 1.        2.      ]]

Hessian diagonal only

For large parameter spaces you may only need the diagonal of the Hessian. CalculusKit.hessian_diag() computes only these p entries and skips the mixed partial derivatives.

>>> import numpy as np
>>> from derivkit import CalculusKit
>>> # Define a scalar-valued function
>>> def func(theta):
...     return np.sin(theta[0]) + theta[0] * theta[1] + theta[1] ** 2
>>> # Instantiate CalculusKit and compute Hessian diagonal
>>> calc = CalculusKit(func, x0=np.array([0.5, 2.0]))
>>> hess_diag = calc.hessian_diag()
>>> print(np.round(np.asarray(hess_diag).reshape(-1), 6))
[-0.479426  2.      ]
>>> ref = np.array([-np.sin(0.5), 2.0])
>>> np.allclose(hess_diag, ref)
True
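
To illustrate why a diagonal-only computation is cheaper, here is a minimal NumPy-only sketch that uses a plain second-order central difference. It is not DerivKit's implementation; the helper name hessian_diag_central and the step size h are assumptions made for this illustration. Each diagonal entry needs only two extra function evaluations, so the total cost grows linearly in p rather than quadratically.

```python
import numpy as np


def hessian_diag_central(f, x0, h=1e-3):
    """Diagonal of the Hessian of a scalar f (hypothetical helper, not DerivKit)."""
    x0 = np.asarray(x0, dtype=float)
    f0 = f(x0)  # shared by every diagonal entry
    diag = np.empty(x0.size)
    for i in range(x0.size):
        step = np.zeros_like(x0)
        step[i] = h
        # Second-order central difference for d^2 f / d theta_i^2.
        diag[i] = (f(x0 + step) - 2.0 * f0 + f(x0 - step)) / h**2
    return diag


def func(theta):
    return np.sin(theta[0]) + theta[0] * theta[1] + theta[1] ** 2


diag = hessian_diag_central(func, [0.5, 2.0])
ref = np.array([-np.sin(0.5), 2.0])
print(np.allclose(diag, ref, atol=1e-6))
```

The sketch uses 2p + 1 function evaluations in total and never touches the mixed partials.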

Tensor-valued outputs

If the function returns a tensor, the Hessian is computed independently for each output component.

The result is reshaped back to (*out_shape, p, p).

>>> import numpy as np
>>> from derivkit import CalculusKit
>>> # Define a tensor-valued function
>>> def func(theta):
...     return np.array([
...         np.sin(theta[0]),
...         theta[0] * theta[1] + theta[1] ** 2,
...     ])
>>> # Point at which to compute the Hessian
>>> x0 = np.array([0.5, 2.0])
>>> # Create CalculusKit instance and compute Hessian
>>> calc = CalculusKit(func, x0=x0)
>>> hess = calc.hessian()
>>> print(hess.shape)
(2, 2, 2)
>>> hess0_ref = np.array([
...     [-np.sin(0.5), 0.0],
...     [0.0, 0.0],
... ])
>>> hess1_ref = np.array([
...     [0.0, 1.0],
...     [1.0, 2.0],
... ])
>>> np.allclose(hess[0], hess0_ref)
True
>>> np.allclose(hess[1], hess1_ref)
True
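
The flatten, differentiate, and reshape pattern described in this section can be sketched with NumPy alone. This is an illustration of the idea under simple assumptions (a fixed-step four-point central stencil), not DerivKit's code; hessian_scalar and hessian_tensor are hypothetical helper names.

```python
import numpy as np


def hessian_scalar(f, x0, h=1e-3):
    """Full Hessian of a scalar-valued f via a four-point central stencil."""
    x0 = np.asarray(x0, dtype=float)
    p = x0.size
    hess = np.empty((p, p))
    for i in range(p):
        ei = np.zeros(p)
        ei[i] = h
        for j in range(p):
            ej = np.zeros(p)
            ej[j] = h
            hess[i, j] = (
                f(x0 + ei + ej) - f(x0 + ei - ej)
                - f(x0 - ei + ej) + f(x0 - ei - ej)
            ) / (4.0 * h * h)
    return hess


def hessian_tensor(f, x0, h=1e-3):
    """Hessian of a tensor-valued f, returned with shape (*out_shape, p, p)."""
    x0 = np.asarray(x0, dtype=float)
    p = x0.size
    out_shape = np.shape(f(x0))
    n_out = int(np.prod(out_shape))
    flat = np.empty((n_out, p, p))
    for k in range(n_out):
        # Treat the k-th flattened output component as a scalar function.
        def component(theta, k=k):
            return np.ravel(f(theta))[k]

        flat[k] = hessian_scalar(component, x0, h)
    # Restore the original output shape in front of the (p, p) block.
    return flat.reshape(*out_shape, p, p)


def func(theta):
    return np.array([np.sin(theta[0]), theta[0] * theta[1] + theta[1] ** 2])


hess = hessian_tensor(func, [0.5, 2.0])
print(hess.shape)  # (2, 2, 2)
```

Because each component is handled independently, the per-component work is a natural unit for parallel execution.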

Finite differences (Ridders) via `dk_kwargs`

>>> import numpy as np
>>> from derivkit import CalculusKit
>>> # Define a scalar-valued function
>>> def func(theta):
...     return np.sin(theta[0]) + theta[0] * theta[1] + theta[1] ** 2
>>> # Create CalculusKit instance and compute Hessian
>>> calc = CalculusKit(func, x0=np.array([0.5, 2.0]))
>>> hess = calc.hessian(
...     method="finite",
...     n_workers=4,
...     stepsize=1e-2,
...     num_points=5,
...     extrapolation="ridders",
...     levels=4,
... )
>>> print(np.round(hess, 6))
[[-0.479426  1.      ]
 [ 1.        2.      ]]
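
For intuition about what extrapolation buys, the sketch below applies a Richardson-style extrapolation tableau to a one-dimensional second derivative. This tableau is the core idea behind Ridders' method, but the sketch omits the error-estimate-based stopping rule of the full method, and it does not claim to reproduce how DerivKit interprets stepsize, num_points, or levels. The function name and the parameters h, levels, and ratio are assumptions made for this illustration.

```python
import numpy as np


def second_derivative_richardson(f, x, h=1e-2, levels=4, ratio=2.0):
    """Second derivative of a 1-D function with Richardson extrapolation.

    Hypothetical helper for illustration; not part of DerivKit.
    """
    # Central second differences at steps h, h/ratio, h/ratio**2, ...
    steps = h / ratio ** np.arange(levels)
    row = [(f(x + s) - 2.0 * f(x) + f(x - s)) / s**2 for s in steps]
    # The central-difference error contains only even powers of the step,
    # so pass j cancels the h**(2*j) term.
    for j in range(1, levels):
        factor = ratio ** (2 * j)
        row = [
            (factor * row[i + 1] - row[i]) / (factor - 1.0)
            for i in range(len(row) - 1)
        ]
    return row[0]


x = 0.5
exact = -np.sin(x)
h = 1e-2
plain = (np.sin(x + h) - 2.0 * np.sin(x) + np.sin(x - h)) / h**2
extrapolated = second_derivative_richardson(np.sin, x, h=h, levels=4)
err_plain = abs(plain - exact)
err_extrapolated = abs(extrapolated - exact)
print(err_extrapolated < err_plain)
```

With the same starting step, the extrapolated estimate is several orders of magnitude closer to the exact value than the single central difference, at the cost of extra function evaluations.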

Notes

For scalar outputs, only the upper triangle of the Hessian is evaluated; the lower triangle is filled in by symmetry, so p(p+1)/2 entries are computed instead of p*p.
For tensor outputs, each component is treated as a scalar internally (flattened, differentiated, and reshaped back).
n_workers parallelizes Hessian tasks (entries and/or components).
When using CalculusKit.hessian_diag(), mixed partial derivatives are skipped for speed.
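
The upper-triangle strategy from the first note can be sketched as follows. This is a NumPy-only illustration with a fixed-step central stencil, not DerivKit's internal code; hessian_upper_mirrored and its entry counter are hypothetical names added for the example.

```python
import numpy as np


def hessian_upper_mirrored(f, x0, h=1e-3):
    """Hessian of a scalar f, evaluating only entries with i <= j."""
    x0 = np.asarray(x0, dtype=float)
    p = x0.size
    hess = np.zeros((p, p))
    n_entries = 0  # number of entries actually differentiated
    for i in range(p):
        ei = np.zeros(p)
        ei[i] = h
        for j in range(i, p):
            ej = np.zeros(p)
            ej[j] = h
            # Four-point central stencil; for i == j it reduces to a
            # second difference with step 2h.
            value = (
                f(x0 + ei + ej) - f(x0 + ei - ej)
                - f(x0 - ei + ej) + f(x0 - ei - ej)
            ) / (4.0 * h * h)
            hess[i, j] = value
            hess[j, i] = value  # mirror into the lower triangle
            n_entries += 1
    return hess, n_entries


def func(theta):
    return np.sin(theta[0]) + theta[0] * theta[1] + theta[1] ** 2


hess, n_entries = hessian_upper_mirrored(func, [0.5, 2.0])
print(n_entries)  # 3
```

For p = 2 this computes 3 entries rather than 4; the saving approaches a factor of two as p grows.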