.. |dklogo| image:: ../../assets/logos/logo-black.png :alt: DerivKit logo black :width: 32px |dklogo| Hyper-Hessian ====================== This section shows how to compute the hyper-Hessian (third-derivative tensor) of a function using DerivKit. The hyper-Hessian generalizes the Hessian to third order. For a set of parameters :math:`\theta` and a function :math:`f(\theta)`, it contains all third-order partial derivatives evaluated at a single point in parameter space. **Notation** - ``p`` denotes the number of model parameters (``theta`` has shape ``(p,)``). If ``f(theta)`` returns a scalar and ``theta`` has shape ``(p,)``, the hyper-Hessian has shape ``(p, p, p)``. If ``f(theta)`` returns an array with shape ``out_shape``, the hyper-Hessian has shape ``(*out_shape, p, p, p)`` and is computed independently for each output component. See also :doc:`gradient`, :doc:`jacobian`, and :doc:`hessian`. For more information on higher-order derivatives, see :doc:`../../about/kits/calculus_kit`. The primary low-level interface for computing the hyper-Hessian is :func:`derivkit.calculus.hyper_hessian.build_hyper_hessian`. You can choose the derivative backend via ``method`` and pass backend-specific options via ``**dk_kwargs`` (forwarded to :meth:`derivkit.derivative_kit.DerivativeKit.differentiate`). Basic usage ----------- .. doctest:: hyper_hessian_basic >>> import numpy as np >>> from derivkit.calculus.hyper_hessian import build_hyper_hessian >>> # Define a scalar-valued function with non-zero third derivatives. >>> def func(theta): ... return theta[0] ** 3 + 2.0 * theta[0] * theta[1] + theta[1] ** 2 >>> theta0 = np.array([0.5, 2.0]) >>> hess3 = build_hyper_hessian(func, theta0) >>> print(hess3.shape) (2, 2, 2) >>> # Analytic reference: >>> # d^3 f / dtheta0^3 = 6, and all other third partials are zero. >>> ref = np.zeros((2, 2, 2), dtype=float) >>> ref[0, 0, 0] = 6.0 >>> np.allclose(hess3, ref, atol=1e-6, rtol=0.0) True Symmetry -------- For smooth functions, the hyper-Hessian is symmetric under permutations of its parameter indices: .. doctest:: hyper_hessian_symmetry >>> import numpy as np >>> from derivkit.calculus.hyper_hessian import build_hyper_hessian >>> >>> def func(theta): ... return theta[0] ** 3 + 2.0 * theta[0] * theta[1] + theta[1] ** 2 >>> >>> theta0 = np.array([0.5, 2.0]) >>> hess3 = build_hyper_hessian(func, theta0) >>> # Check symmetry for a representative index triplet. >>> bool(hess3[0, 1, 0] == hess3[1, 0, 0] == hess3[0, 0, 1]) True Tensor-valued outputs --------------------- If the function returns a tensor, the hyper-Hessian is computed independently for each output component and afterwards reshaped back to ``(*out_shape, p, p, p)``. .. doctest:: hyper_hessian_tensor >>> import numpy as np >>> from derivkit.calculus.hyper_hessian import build_hyper_hessian >>> # Two-output function. >>> # f0(theta) = theta0^3 >>> # f1(theta) = theta0*theta1 + theta1^2 >>> def func(theta): ... return np.array([ ... theta[0] ** 3, ... theta[0] * theta[1] + theta[1] ** 2, ... ]) >>> theta0 = np.array([0.5, 2.0]) >>> hess3 = build_hyper_hessian(func, theta0) >>> print(hess3.shape) (2, 2, 2, 2) >>> >>> # Component 0: only d^3/dtheta0^3 = 6 is non-zero. >>> ref0 = np.zeros((2, 2, 2), dtype=float) >>> ref0[0, 0, 0] = 6.0 >>> np.allclose(hess3[0], ref0, atol=1e-6, rtol=0.0) True >>> # Component 1: all third derivatives are zero (quadratic/linear terms only). >>> np.allclose(hess3[1], 0.0, atol=1e-6, rtol=0.0) True Finite differences (Ridders) ---------------------------- You can select a derivative backend using ``method`` and pass backend-specific options via ``**dk_kwargs``. .. doctest:: hyper_hessian_finite_ridders >>> import numpy as np >>> from derivkit.calculus.hyper_hessian import build_hyper_hessian >>> def func(theta): ... return theta[0] ** 3 + 2.0 * theta[0] * theta[1] + theta[1] ** 2 >>> theta0 = np.array([0.5, 2.0]) >>> hess3 = build_hyper_hessian( ... func, ... theta0, ... method="finite", ... n_workers=4, ... extrapolation="ridders", ... ) >>> ref = np.zeros((2, 2, 2), dtype=float) >>> ref[0, 0, 0] = 6.0 >>> np.allclose(hess3, ref, atol=5e-3, rtol=0.0) True Notes ----- - For scalar outputs, only unique entries with ``i <= j <= k`` are evaluated and the tensor is symmetrized by filling all index permutations. - For tensor outputs, each component is treated as a scalar internally (flattened, differentiated, and reshaped back). - ``n_workers`` parallelizes work across entries (scalar outputs) or across output components (tensor outputs).