ForecastKit#

ForecastKit provides derivative-based tools for local likelihood and posterior analysis. It uses numerical derivatives to construct controlled approximations to a model’s likelihood or posterior around a chosen expansion point.

The toolkit includes multiple Fisher-matrix formalisms, Fisher bias, Laplace approximations, and higher-order DALI expansions. It also provides utilities for contour visualization and posterior sampling based on these local approximations.

All methods rely on DerivativeKit for numerical differentiation and work with arbitrary user-defined models.

Runnable examples illustrating these methods are collected in Examples.

Fisher Information Matrix#

The Fisher matrix [1] quantifies how precisely model parameters can be determined from a set of observables under a local Gaussian approximation.

Given:

  • parameters \(\theta = (\theta_1, \theta_2, \ldots)\)

  • a model mapping parameters to observables \(\nu(\theta)\)

  • a data covariance matrix \(C\)

ForecastKit computes the Jacobian

\[J_{i a} = \frac{\partial \nu_i}{\partial \theta_a},\]

using DerivativeKit and CalculusKit, and constructs the standard Fisher matrix

\[F = J^\top C^{-1} J.\]

Inverting the Fisher matrix yields the Cramér–Rao lower bound [2] on the parameter covariance matrix: the inverse Fisher matrix bounds from below the achievable variances of unbiased parameter estimators, independent of the specific inference algorithm used. When the likelihood is approximately Gaussian near its maximum, this bound is attained, so Fisher-matrix methods offer a fast and computationally efficient way to forecast expected parameter constraints without performing full likelihood sampling.

Interpretation: The Fisher matrix provides a fast, local forecast of expected parameter constraints under a Gaussian likelihood approximation.

Example: A basic Fisher matrix computation is shown in Fisher matrix.
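As a minimal sketch of this construction (not ForecastKit's actual interface), the snippet below builds the Jacobian of a hypothetical two-parameter model with a central finite difference standing in for DerivativeKit, then assembles \(F = J^\top C^{-1} J\) and the Cramér–Rao covariance. The model, fiducial values, step size, and covariance are illustrative assumptions.

```python
import numpy as np

def model(theta):
    """Toy observable vector nu(theta) -- purely illustrative."""
    a, b = theta
    x = np.linspace(0.1, 1.0, 10)
    return a * x + b * x**2

theta0 = np.array([1.0, 0.5])           # assumed fiducial parameters
C = np.diag(np.full(10, 0.01**2))       # assumed data covariance

def jacobian(f, theta, eps=1e-5):
    """Central-difference Jacobian J[i, a] = d nu_i / d theta_a."""
    nu0 = f(theta)
    J = np.zeros((nu0.size, theta.size))
    for a in range(theta.size):
        dp, dm = theta.copy(), theta.copy()
        dp[a] += eps
        dm[a] -= eps
        J[:, a] = (f(dp) - f(dm)) / (2 * eps)
    return J

J = jacobian(model, theta0)
F = J.T @ np.linalg.solve(C, J)         # F = J^T C^{-1} J
param_cov = np.linalg.inv(F)            # Cramer-Rao bound on the covariance
print(np.sqrt(np.diag(param_cov)))      # forecast 1-sigma errors
```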

Generalized Gaussian Fisher#

When the data covariance depends on the model parameters, the standard Fisher matrix must be generalized to include derivatives of both the mean and the covariance [3].

For a Gaussian likelihood with mean \(\mu(\theta) = \langle d \rangle\) and covariance \(C(\theta) = \langle (d - \mu)(d - \mu)^{\mathrm T} \rangle\), the Fisher matrix is

\[F_{\alpha\beta} = \frac{1}{2} \mathrm{Tr} \!\left[ C^{-1} C_{,\alpha} C^{-1} C_{,\beta} \right] + \mu_{,\alpha}^{\mathrm T} C^{-1} \mu_{,\beta}.\]

Here \(C_{,\alpha} \equiv \partial C / \partial \theta_\alpha\) and \(\mu_{,\alpha} \equiv \partial \mu / \partial \theta_\alpha\).

This expression reduces to the standard Fisher matrix when the covariance is independent of the parameters. In ForecastKit, the mean-derivative term is always included, while the covariance-derivative term is included only when cov is provided as a callable C(theta).

Interpretation: The generalized Gaussian Fisher provides a consistent local approximation when both the signal and noise depend on the model parameters.

Example: A worked example is provided in Gaussian Fisher matrix.
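The sketch below evaluates this generalized expression with plain NumPy and central finite differences. The toy mean and covariance functions, fiducial point, and step size are assumptions for illustration and do not reflect ForecastKit's interface.

```python
import numpy as np

def mean(theta):
    """Toy mean vector mu(theta) -- illustrative only."""
    a, s = theta
    return a * np.linspace(0.1, 1.0, 8)

def cov(theta):
    """Toy parameter-dependent covariance C(theta) -- illustrative only."""
    a, s = theta
    return s**2 * np.eye(8)

def generalized_fisher(mean, cov, theta, eps=1e-5):
    """F_ab = mu_,a^T C^-1 mu_,b + 0.5 Tr[C^-1 C_,a C^-1 C_,b]."""
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    Cinv = np.linalg.inv(cov(theta))
    dmu, dC = [], []
    for a in range(n):                  # central-difference derivatives
        dp, dm = theta.copy(), theta.copy()
        dp[a] += eps
        dm[a] -= eps
        dmu.append((mean(dp) - mean(dm)) / (2 * eps))
        dC.append((cov(dp) - cov(dm)) / (2 * eps))
    F = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            F[a, b] = (dmu[a] @ Cinv @ dmu[b]
                       + 0.5 * np.trace(Cinv @ dC[a] @ Cinv @ dC[b]))
    return F

print(generalized_fisher(mean, cov, [1.0, 0.1]))
```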

X–Y Fisher Formalism#

The X–Y Fisher formalism [4] applies when the observables are naturally split into measured inputs \(X\) and outputs \(Y\), both of which are noisy and possibly correlated.

The joint data covariance is written in block form as

\[\begin{split}C = \begin{pmatrix} C_{XX} & C_{XY} \\ C_{XY}^{\mathrm T} & C_{YY} \end{pmatrix},\end{split}\]

and the model predicts the expectation value of the outputs as \(\mu(X, \theta)\).

Assuming the model can be linearized in the latent true inputs \(x\) around the measured inputs \(X\),

\[\mu(x,\theta) \simeq \mu(X,\theta) + T(X,\theta)\,(x - X), \qquad T_{ij} \equiv \frac{\partial \mu_i}{\partial x_j}\Big|_{x=X},\]

the latent variables can be marginalized analytically.

The resulting likelihood for \(Y\) is Gaussian with an effective covariance

\[R = C_{YY} - C_{XY}^{\mathrm T} T^{\mathrm T} - T C_{XY} + T C_{XX} T^{\mathrm T}.\]

The Fisher matrix then takes the same form as the generalized Gaussian Fisher, with the replacement \(C \rightarrow R\):

\[F_{\alpha\beta} = \frac{1}{2} \mathrm{Tr} \!\left[ R^{-1} R_{,\alpha} R^{-1} R_{,\beta} \right] + \mu_{,\alpha}^{\mathrm T} R^{-1} \mu_{,\beta}.\]

In ForecastKit, the covariance blocks \(C_{XX}\), \(C_{XY}\), and \(C_{YY}\) are treated as fixed; parameter dependence enters through the local sensitivity matrix \(T\), which propagates input uncertainty into the effective output covariance \(R\).

Interpretation: The X–Y Fisher matrix consistently propagates uncertainty in the measured inputs into the output covariance, enabling Fisher forecasts when both inputs and outputs are noisy.

Example: A worked example is provided in X–Y Gaussian Fisher matrix.
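A short NumPy sketch of the effective-covariance construction follows; the covariance blocks, the sensitivity matrix \(T\), and the mean derivatives are illustrative placeholders, not quantities produced by ForecastKit.

```python
import numpy as np

# Assumed covariance blocks for 3 measured inputs X and 3 outputs Y.
C_XX = 0.02**2 * np.eye(3)
C_YY = 0.05**2 * np.eye(3)
C_XY = 1e-4 * np.eye(3)                 # assumed input-output correlation

# Local sensitivity T_ij = d mu_i / d x_j at the measured inputs
# (a toy linear response; in practice it comes from the model).
T = np.diag([1.2, 0.8, 1.5])

# Effective output covariance after marginalizing the latent inputs.
R = C_YY - C_XY.T @ T.T - T @ C_XY + T @ C_XX @ T.T

# R replaces C in the generalized Gaussian Fisher matrix; here only the
# mean term mu_,a^T R^{-1} mu_,b is shown, with toy parameter derivatives.
dmu = np.array([[0.9, 0.1],
                [0.5, 0.4],
                [0.2, 1.1]])            # d mu / d theta: (3 obs) x (2 params)
F = dmu.T @ np.linalg.solve(R, dmu)
print(F)
```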

Fisher Bias#

Small systematic deviations in the observables can bias inferred parameters [9]. These deviations are encoded as a difference data vector

\[\Delta \nu_i = \nu^{\mathrm{biased}}_i - \nu^{\mathrm{unbiased}}_i.\]

ForecastKit computes the first-order Fisher bias vector

\[b_a = \sum_{i,j} J_{i a}\, C^{-1}_{i j}\, \Delta \nu_j,\]

and the resulting parameter shift

\[\Delta \theta_a = \sum_b (F^{-1})_{a b}\, b_b.\]

ForecastKit returns:

  • the bias vector

  • the induced parameter shift

  • optional visualization of the bias relative to Fisher contours

Interpretation: Fisher bias estimates how small systematic errors in the observables translate into shifts in best-fit parameters.

[Figure: Fisher bias demonstration with 1σ and 2σ contours.]

Example: A worked example is provided in Fisher bias.
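For concreteness, the sketch below evaluates both formulas with NumPy; the Jacobian (random here), covariance, and systematic offset are toy placeholders rather than ForecastKit output.

```python
import numpy as np

# Toy inputs: a random Jacobian, a diagonal covariance, and the Fisher matrix.
rng = np.random.default_rng(0)
J = rng.normal(size=(10, 2))
C = np.diag(np.full(10, 0.01**2))
F = J.T @ np.linalg.solve(C, J)

# Assumed systematic offset between biased and unbiased observables.
delta_nu = 1e-3 * np.ones(10)

# First-order Fisher bias vector and induced parameter shift.
b = J.T @ np.linalg.solve(C, delta_nu)   # b_a = sum_ij J_ia C^-1_ij dnu_j
delta_theta = np.linalg.solve(F, b)      # dtheta_a = sum_b (F^-1)_ab b_b
print(delta_theta)
```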

Laplace Approximation#

The Laplace approximation [8] replaces the posterior distribution near its maximum by a multivariate Gaussian obtained from a second-order Taylor expansion of the negative log-posterior.

Let the negative log-posterior be

\[\mathcal{L}(\theta) \equiv -\ln p(\theta \mid d) = -\ln p(d \mid \theta) - \ln p(\theta) + \mathrm{const},\]

where the additive constant is independent of \(\theta\).

Expanding around the maximum a posteriori (MAP) point \(\hat{\theta}\), where \(\nabla \mathcal{L}(\hat{\theta}) = 0\), gives

\[\mathcal{L}(\theta) \simeq \mathcal{L}(\hat{\theta}) + \frac{1}{2} (\theta - \hat{\theta})^{\mathrm T} H (\theta - \hat{\theta}),\]

where the Hessian matrix is

\[H_{ab} \equiv \left. \frac{\partial^2 \mathcal{L}}{\partial \theta_a \partial \theta_b} \right|_{\theta = \hat{\theta}}.\]

Under this approximation, the posterior is Gaussian,

\[p(\theta \mid d) \approx \mathcal{N} \!\left( \hat{\theta}, \, H^{-1} \right),\]

with covariance given by the inverse Hessian of the negative log-posterior.

In the special case of a flat prior and a Gaussian likelihood, the Hessian reduces to the Fisher information matrix, and the Laplace approximation coincides with the Fisher forecast.
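The sketch below illustrates the construction on a toy negative log-posterior: the MAP is located with scipy.optimize.minimize, the Hessian is estimated by central finite differences, and its inverse gives the Laplace covariance. The posterior and step size are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(theta):
    """Toy negative log-posterior -- illustrative only."""
    a, b = theta
    return 0.5 * ((a - 1.0)**2 / 0.1**2 + (b + 0.5)**2 / 0.2**2) + 0.3 * a * b

# Locate the MAP point numerically.
theta_map = minimize(neg_log_posterior, x0=np.zeros(2)).x

def hessian(f, theta, eps=1e-4):
    """Central-difference Hessian of f at theta."""
    n = theta.size
    H = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            def shifted(sa, sb):
                return f(theta + eps * (sa * np.eye(n)[a] + sb * np.eye(n)[b]))
            H[a, b] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * eps**2)
    return H

H = hessian(neg_log_posterior, theta_map)
cov = np.linalg.inv(H)                   # Laplace covariance = inverse Hessian
print(theta_map, np.sqrt(np.diag(cov)))  # MAP and approximate 1-sigma errors
```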

DALI (Higher-Order Forecasting)#

The DALI expansion (Derivative Approximation for LIkelihoods; [5]) extends Fisher and Laplace approximations by retaining higher-order derivatives of the likelihood around a chosen expansion point.

Expanding the log-posterior locally in parameter displacements \(\Delta\theta = \theta - \hat{\theta}\), DALI approximates the posterior as

\[\log p(\theta \mid d) \simeq \log p(\hat{\theta} \mid d) - \frac{1}{2} F_{\alpha\beta}\, \Delta\theta_\alpha \Delta\theta_\beta - \frac{1}{3!} G_{\alpha\beta\gamma}\, \Delta\theta_\alpha \Delta\theta_\beta \Delta\theta_\gamma - \frac{1}{4!} H_{\alpha\beta\gamma\delta}\, \Delta\theta_\alpha \Delta\theta_\beta \Delta\theta_\gamma \Delta\theta_\delta + \cdots,\]

where

  • \(F_{\alpha\beta}\) is the Fisher matrix,

  • \(G_{\alpha\beta\gamma}\) is the third-derivative (skewness) tensor,

  • \(H_{\alpha\beta\gamma\delta}\) is the fourth-derivative (kurtosis) tensor,

all evaluated at the expansion point \(\hat{\theta}\).

For Gaussian data models with parameter-independent covariance, these tensors can be expressed directly in terms of derivatives of the model predictions, allowing DALI to be constructed using numerical derivatives alone.

At second order (“doublet DALI”), the posterior takes the form

\[p(\theta \mid d) \propto \exp\!\Bigg[ -\frac{1}{2} F_{\alpha\beta}\, \Delta\theta_\alpha \Delta\theta_\beta -\frac{1}{2} G_{\alpha\beta\gamma}\, \Delta\theta_\alpha \Delta\theta_\beta \Delta\theta_\gamma -\frac{1}{8} H_{\alpha\beta\gamma\delta}\, \Delta\theta_\alpha \Delta\theta_\beta \Delta\theta_\gamma \Delta\theta_\delta \Bigg].\]

Including higher-order terms (“triplet DALI”) systematically improves the local approximation, capturing skewness and non-elliptical curvature while keeping the approximate posterior normalizable.

Interpretation: DALI provides a controlled hierarchy of local posterior approximations, reducing to the Fisher and Laplace limits when higher-order derivatives vanish.

[Figures: DALI vs. Fisher vs. exact posterior in one dimension; DALI vs. Fisher contours at 1σ and 2σ in two dimensions.]

Example: A worked example is provided in DALI tensors.
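Assuming Gaussian data with a parameter-independent covariance, the doublet-DALI tensors can be assembled from first and second model derivatives, as sketched below. The toy model, fiducial point, covariance, and finite-difference steps are illustrative assumptions, not ForecastKit's interface.

```python
import numpy as np

def model(theta):
    """Toy nonlinear observable vector -- illustrative only."""
    a, b = theta
    return a * np.exp(b * np.linspace(0.1, 1.0, 12))

theta0 = np.array([1.0, 0.5])            # assumed expansion point
Cinv = np.linalg.inv(np.diag(np.full(12, 0.05**2)))
n, eps = theta0.size, 1e-4

def d1(a):
    """Central-difference first derivative of the model."""
    e = eps * np.eye(n)[a]
    return (model(theta0 + e) - model(theta0 - e)) / (2 * eps)

def d2(a, b):
    """Central-difference second derivative of the model."""
    ea, eb = eps * np.eye(n)[a], eps * np.eye(n)[b]
    return (model(theta0 + ea + eb) - model(theta0 + ea - eb)
            - model(theta0 - ea + eb) + model(theta0 - ea - eb)) / (4 * eps**2)

mu1 = np.array([d1(a) for a in range(n)])                      # (n, ndata)
mu2 = np.array([[d2(a, b) for b in range(n)] for a in range(n)])

F = np.einsum('ai,ij,bj->ab', mu1, Cinv, mu1)                  # Fisher matrix
G = np.einsum('abi,ij,cj->abc', mu2, Cinv, mu1)                # cubic tensor
H = np.einsum('abi,ij,cdj->abcd', mu2, Cinv, mu2)              # quartic tensor

def dali_log_post(dtheta):
    """Doublet-DALI log-posterior (up to a constant) at displacement dtheta."""
    d = np.asarray(dtheta)
    return (-0.5 * np.einsum('ab,a,b', F, d, d)
            - 0.5 * np.einsum('abc,a,b,c', G, d, d, d)
            - 0.125 * np.einsum('abcd,a,b,c,d', H, d, d, d, d))

print(dali_log_post([0.05, -0.02]))
```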


Posterior Sampling and Visualization#

ForecastKit provides utilities to draw samples directly from Fisher, Laplace, or DALI-expanded posteriors and to convert them into GetDist-compatible [6] MCSamples objects.

This enables:

  • posterior sampling based on local likelihood approximations using emcee [7]

  • easy integration with GetDist for contour plotting and statistical summaries

  • direct contour visualization and uncertainty propagation

  • comparison between Fisher, Laplace, and DALI forecasts

These workflows are designed for forecasting and local posterior analysis, providing fast, controlled approximations to parameter constraints without running full MCMC or nested sampling on the exact likelihood.

Examples: Worked examples are provided in:

  • Fisher contours for Fisher-based GetDist samples

  • DALI contours for DALI-based posterior sampling

  • Laplace contours for Laplace approximations
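As an illustration of this workflow using emcee and GetDist directly (rather than ForecastKit's own wrappers), the sketch below samples a local Gaussian (Fisher) approximation and wraps the chain in a GetDist MCSamples object; the Fisher matrix and expansion point are assumed toy values.

```python
import numpy as np
import emcee
from getdist import MCSamples, plots

# Assumed local approximation: a Gaussian (Fisher) log-posterior around theta_hat.
theta_hat = np.array([1.0, 0.5])
F = np.array([[400.0, 50.0],
              [50.0, 900.0]])            # assumed Fisher matrix

def log_post(theta):
    d = theta - theta_hat
    return -0.5 * d @ F @ d

# Sample the local approximation with emcee.
ndim, nwalkers, nsteps = 2, 16, 2000
p0 = theta_hat + 1e-3 * np.random.randn(nwalkers, ndim)
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_post)
sampler.run_mcmc(p0, nsteps, progress=False)
chain = sampler.get_chain(discard=500, flat=True)

# Wrap in a GetDist MCSamples object for contour plots and summaries.
samples = MCSamples(samples=chain, names=['a', 'b'], labels=['a', 'b'])
g = plots.get_subplot_plotter()
g.triangle_plot(samples, filled=True)
```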

Backend Notes#

  • If method is omitted, the adaptive derivative backend is used.

  • Any DerivativeKit backend may be selected (finite differences, Ridders, Gauss–Richardson, polynomial fits, etc.).

  • Changing the derivative backend affects only how derivatives are computed, not the forecasting logic itself.

ForecastKit is fully modular and designed to scale from simple Gaussian forecasts to higher-order likelihood expansions with minimal code changes.

References#