.. |dklogo| image:: ../../assets/logos/logo-black.png
   :alt: DerivKit logo black
   :width: 32px

|dklogo| ForecastKit
====================

ForecastKit provides derivative-based tools for local likelihood and posterior
analysis. It uses numerical derivatives to construct controlled approximations
to a model’s likelihood or posterior around a chosen expansion point.

The toolkit includes multiple Fisher-matrix formalisms, Fisher bias,
Laplace approximations, and higher-order DALI expansions. It also provides
utilities for contour visualization and posterior sampling based on these
local approximations.

All methods rely on DerivativeKit for numerical differentiation and work with
arbitrary user-defined models.

Runnable examples illustrating these methods are collected in
:doc:`../../examples/index`.


Fisher Information Matrix
-------------------------

The Fisher matrix [#fisher]_ quantifies how precisely model parameters can be determined
from a set of observables under a local Gaussian approximation.

Given:

- parameters :math:`\theta = (\theta_1, \theta_2, \ldots)`
- a model mapping parameters to observables :math:`\nu(\theta)`
- a data covariance matrix :math:`C`

ForecastKit computes the Jacobian

.. math::

   J_{i a} = \frac{\partial \nu_i}{\partial \theta_a},

using DerivativeKit and CalculusKit, and constructs the standard Fisher matrix

.. math::

   F = J^\top C^{-1} J.

The inverse Fisher matrix gives the Cramér–Rao lower bound [#crlb]_ on the
covariance of any unbiased parameter estimator, independent of the specific
inference algorithm used. When the likelihood is locally Gaussian near its
maximum, this bound is approximately saturated and :math:`F^{-1}` can be used
as an estimate of the parameter covariance matrix. As a result, Fisher matrix
methods offer a fast way to forecast expected parameter constraints without
performing full likelihood sampling.
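
The construction :math:`F = J^\top C^{-1} J` can be sketched with plain NumPy.
This is an illustrative sketch only: the toy straight-line ``model``, the
``jacobian`` helper, and the covariance values are assumptions made for this
example, and a simple central difference stands in for DerivativeKit.

.. code-block:: python

   import numpy as np

   def model(theta):
       """Toy observables (hypothetical): a straight line sampled at three points."""
       intercept, slope = theta
       x = np.array([0.0, 1.0, 2.0])
       return intercept + slope * x

   def jacobian(func, theta0, step=1e-5):
       """Central-difference Jacobian J[i, a] = d nu_i / d theta_a."""
       theta0 = np.asarray(theta0, dtype=float)
       columns = []
       for a in range(theta0.size):
           dtheta = np.zeros_like(theta0)
           dtheta[a] = step
           columns.append((func(theta0 + dtheta) - func(theta0 - dtheta)) / (2.0 * step))
       return np.stack(columns, axis=1)

   theta0 = np.array([1.0, 2.0])
   cov = np.diag([0.01, 0.04, 0.01])  # data covariance C (assumed values)

   jac = jacobian(model, theta0)
   fisher = jac.T @ np.linalg.solve(cov, jac)  # F = J^T C^-1 J
   param_cov = np.linalg.inv(fisher)           # forecast parameter covariance
   sigma = np.sqrt(np.diag(param_cov))         # marginalized 1-sigma errors

Using ``np.linalg.solve`` avoids forming :math:`C^{-1}` explicitly, which tends
to be better conditioned for large data vectors.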


Interpretation
^^^^^^^^^^^^^^

The Fisher matrix provides a fast, local forecast of expected parameter constraints
under a Gaussian likelihood approximation.


Examples
^^^^^^^^

The following examples illustrate typical Fisher-forecast workflows:

- A basic Fisher matrix computation:
  :doc:`../../examples/forecasting/fisher`

- Visualization of Fisher constraints using ``GetDist``:
  :doc:`../../examples/forecasting/fisher_contours`


Generalized Gaussian Fisher
---------------------------

When the data covariance depends on the model parameters, the standard Fisher
matrix must be generalized to include derivatives of both the mean and the
covariance [#gaussfisher]_.

For a Gaussian likelihood with mean
:math:`\mu(\theta) = \langle d \rangle`
and covariance
:math:`C(\theta) = \langle (d - \mu)(d - \mu)^{\mathrm T} \rangle`,
the Fisher matrix is

.. math::

   F_{\alpha\beta}
   =
   \frac{1}{2}
   \mathrm{Tr}
   \!\left[
     C^{-1} C_{,\alpha} C^{-1} C_{,\beta}
   \right]
   +
   \mu_{,\alpha}^{\mathrm T}
   C^{-1}
   \mu_{,\beta}.

Here :math:`C_{,\alpha} \equiv \partial C / \partial \theta_\alpha` and
:math:`\mu_{,\alpha} \equiv \partial \mu / \partial \theta_\alpha`.

This expression reduces to the standard Fisher matrix when the covariance
is independent of the parameters.
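
As a minimal sketch of the two terms, consider a toy problem in which one
parameter sets a constant mean and the other sets a common variance. The
functions ``mean``, ``covariance``, and ``central_diff`` below are hypothetical
helpers written for this illustration and are not part of ForecastKit.

.. code-block:: python

   import numpy as np

   N_DATA = 4

   def mean(theta):
       """Constant mean mu = theta[0] for every data point (toy assumption)."""
       return theta[0] * np.ones(N_DATA)

   def covariance(theta):
       """Diagonal covariance with common variance theta[1] (toy assumption)."""
       return theta[1] * np.eye(N_DATA)

   def central_diff(func, theta0, index, step=1e-5):
       """Central-difference derivative of func with respect to theta[index]."""
       dtheta = np.zeros_like(theta0)
       dtheta[index] = step
       return (func(theta0 + dtheta) - func(theta0 - dtheta)) / (2.0 * step)

   theta0 = np.array([1.0, 0.5])
   n_params = theta0.size
   cov_inv = np.linalg.inv(covariance(theta0))
   dmu = [central_diff(mean, theta0, a) for a in range(n_params)]
   dcov = [central_diff(covariance, theta0, a) for a in range(n_params)]

   fisher = np.empty((n_params, n_params))
   for a in range(n_params):
       for b in range(n_params):
           # (1/2) Tr[C^-1 C_,a C^-1 C_,b]
           trace_term = 0.5 * np.trace(cov_inv @ dcov[a] @ cov_inv @ dcov[b])
           # mu_,a^T C^-1 mu_,b
           mean_term = dmu[a] @ cov_inv @ dmu[b]
           fisher[a, b] = trace_term + mean_term

For this toy case the result can be checked by hand: the mean term gives
:math:`N/\sigma^2` for the mean parameter, the trace term gives
:math:`N/(2\sigma^4)` for the variance parameter, and the cross terms vanish.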

Interpretation
^^^^^^^^^^^^^^

The generalized Gaussian Fisher provides a consistent local approximation
when both the signal and noise depend on the model parameters.


Examples
^^^^^^^^

A worked example is provided in :doc:`../../examples/forecasting/fisher_gauss`.


X–Y Fisher Formalism
--------------------

The X–Y Fisher formalism [#xyfisher]_ applies when the observables are naturally split into
measured inputs :math:`X` and outputs :math:`Y`, both of which are noisy and
possibly correlated.

The joint data covariance is written in block form as

.. math::

   C =
   \begin{pmatrix}
     C_{XX} & C_{XY} \\
     C_{XY}^{\mathrm T} & C_{YY}
   \end{pmatrix},

and the model predicts the expectation value of the outputs as
:math:`\mu(X, \theta)`.

Assuming the model can be linearized in the latent true inputs :math:`x`,

.. math::

   \mu(x) \simeq \mu(X) + T(X)\,(x - X),
   \qquad
   T_{ij} \equiv
   \frac{\partial \mu_i}{\partial x_j}\Big|_{x=X},

the latent variables can be marginalized analytically.

The resulting likelihood for :math:`Y` is Gaussian with an effective covariance

.. math::

   R
   =
   C_{YY}
   -
   C_{XY}^{\mathrm T} T^{\mathrm T}
   -
   T C_{XY}
   +
   T C_{XX} T^{\mathrm T}.

The Fisher matrix then takes the same form as the generalized Gaussian Fisher,
with the replacement :math:`C \rightarrow R`:

.. math::

   F_{\alpha\beta}
   =
   \frac{1}{2}
   \mathrm{Tr}
   \!\left[
     R^{-1} R_{,\alpha} R^{-1} R_{,\beta}
   \right]
   +
   \mu_{,\alpha}^{\mathrm T}
   R^{-1}
   \mu_{,\beta}.
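
The effective covariance :math:`R` is a short matrix expression. The sketch
below evaluates it for a straight-line model in which each output depends only
on its own input, so that :math:`T` is diagonal. The function name
``effective_covariance`` and all numerical values are assumptions made for
illustration.

.. code-block:: python

   import numpy as np

   def effective_covariance(c_xx, c_xy, c_yy, t_matrix):
       """R = C_YY - C_XY^T T^T - T C_XY + T C_XX T^T."""
       return (
           c_yy
           - c_xy.T @ t_matrix.T
           - t_matrix @ c_xy
           + t_matrix @ c_xx @ t_matrix.T
       )

   slope = 2.0
   # For mu_i = intercept + slope * x_i, T_ij = d mu_i / d x_j = slope * delta_ij.
   t_matrix = slope * np.eye(3)

   c_xx = 0.01 * np.eye(3)   # input noise covariance (assumed)
   c_yy = 0.09 * np.eye(3)   # output noise covariance (assumed)
   c_xy = 0.005 * np.eye(3)  # input-output cross covariance (assumed)

   r_eff = effective_covariance(c_xx, c_xy, c_yy, t_matrix)

   # With no cross-correlation, R reduces to sigma_y^2 + slope^2 * sigma_x^2.
   r_uncorrelated = effective_covariance(c_xx, np.zeros((3, 3)), c_yy, t_matrix)

The uncorrelated case recovers the familiar rule of adding the input variance,
scaled by the squared slope, to the output variance.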


Interpretation
^^^^^^^^^^^^^^

The X–Y Fisher matrix consistently propagates uncertainty in the measured
inputs into the output covariance, enabling Fisher forecasts when both inputs
and outputs are noisy.

Examples
^^^^^^^^

A worked example is provided in :doc:`../../examples/forecasting/fisher_xy`.


Fisher Bias
-----------

Small systematic deviations in the observables can bias inferred parameters [#fisherbias]_.
These deviations are encoded as a difference data vector

.. math::

   \Delta \nu_i = \nu^{\mathrm{biased}}_i - \nu^{\mathrm{unbiased}}_i.

ForecastKit computes the first-order Fisher bias vector

.. math::

   b_a = \sum_{i,j} J_{i a}\, C^{-1}_{i j}\, \Delta \nu_j,

and the resulting parameter shift

.. math::

   \Delta \theta_a = \sum_b (F^{-1})_{a b}\, b_b.

ForecastKit returns:

- the bias vector
- the induced parameter shift
- optional visualization of the bias relative to Fisher contours
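
The two formulas above amount to a pair of linear solves. In the sketch below
the Jacobian, covariance, and systematic offset are made-up values for a
straight-line model; the systematic is chosen to lie in the span of the
Jacobian so that the recovered shift can be verified directly. None of the
names correspond to ForecastKit functions.

.. code-block:: python

   import numpy as np

   # Jacobian of a straight-line model at x = 0, 1, 2 (assumed example).
   jac = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
   cov = np.diag([0.01, 0.04, 0.01])

   # Systematic offset in the observables, built from a known parameter shift.
   true_shift = np.array([0.1, -0.05])
   delta_nu = jac @ true_shift

   cov_inv_jac = np.linalg.solve(cov, jac)          # C^-1 J
   fisher = jac.T @ cov_inv_jac                     # F = J^T C^-1 J
   bias_vec = cov_inv_jac.T @ delta_nu              # b = J^T C^-1 delta_nu (C symmetric)
   delta_theta = np.linalg.solve(fisher, bias_vec)  # delta_theta = F^-1 b

Because ``delta_nu`` was generated by a linear shift of the model, the
first-order estimate ``delta_theta`` reproduces ``true_shift``. For a
systematic that is not of this form, the result is the projection of the
offset onto the parameter directions.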

Interpretation
^^^^^^^^^^^^^^

Fisher bias estimates how small systematic errors in the observables translate
into shifts in best-fit parameters.

.. image:: ../../assets/plots/fisher_bias_demo_1and2sigma.png
   :width: 60%

Examples
^^^^^^^^

A worked example is provided in :doc:`../../examples/forecasting/fisher_bias`.


Laplace Approximation
---------------------

The Laplace approximation [#laplace]_ replaces the posterior distribution near its maximum
by a multivariate Gaussian obtained from a second-order Taylor expansion of the
negative log-posterior.

Let the negative log-posterior be

.. math::

   \mathcal{L}(\theta)
   =
   -\ln p(\theta \mid d)
   =
   -\ln p(d \mid \theta)
   -\ln p(\theta)
   + \mathrm{const}.

Expanding around the maximum a posteriori (MAP) point
:math:`\hat{\theta}`, where
:math:`\nabla \mathcal{L}(\hat{\theta}) = 0`, gives

.. math::

   \mathcal{L}(\theta)
   \simeq
   \mathcal{L}(\hat{\theta})
   +
   \frac{1}{2}
   (\theta - \hat{\theta})^{\mathrm T}
   H
   (\theta - \hat{\theta}),

where the Hessian matrix is

.. math::

   H_{ab}
   \equiv
   \left.
   \frac{\partial^2 \mathcal{L}}{\partial \theta_a \partial \theta_b}
   \right|_{\theta = \hat{\theta}}.

Under this approximation, the posterior is Gaussian,

.. math::

   p(\theta \mid d)
   \approx
   \mathcal{N}
   \!\left(
     \hat{\theta},
     \, H^{-1}
   \right),

with covariance given by the inverse Hessian of the negative log-posterior.

In the special case of a flat prior and a Gaussian likelihood, the Hessian
reduces to the Fisher information matrix, and the Laplace approximation
coincides with the Fisher forecast.
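
A minimal sketch of the procedure, assuming a linear model with a Gaussian
likelihood and a zero-mean Gaussian prior so that the answer is known in closed
form. The helpers ``gradient`` and ``hessian`` are plain finite differences
written for this example (they are not DerivativeKit calls), and the MAP point
is located with a few Newton steps.

.. code-block:: python

   import numpy as np

   design = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # linear model matrix
   data = np.array([1.1, 2.9, 5.2])                         # made-up data vector
   data_prec = np.diag([100.0, 25.0, 100.0])                # C^-1
   prior_prec = np.eye(2)                                   # zero-mean Gaussian prior

   def neg_log_post(theta):
       """Negative log-posterior up to an additive constant."""
       resid = data - design @ theta
       return 0.5 * resid @ data_prec @ resid + 0.5 * theta @ prior_prec @ theta

   def gradient(func, theta, step=1e-3):
       grad = np.zeros(theta.size)
       for a in range(theta.size):
           e_a = np.zeros(theta.size)
           e_a[a] = step
           grad[a] = (func(theta + e_a) - func(theta - e_a)) / (2.0 * step)
       return grad

   def hessian(func, theta, step=1e-3):
       n = theta.size
       hess = np.zeros((n, n))
       for a in range(n):
           for b in range(n):
               e_a = np.zeros(n)
               e_b = np.zeros(n)
               e_a[a] = step
               e_b[b] = step
               hess[a, b] = (
                   func(theta + e_a + e_b) - func(theta + e_a - e_b)
                   - func(theta - e_a + e_b) + func(theta - e_a - e_b)
               ) / (4.0 * step**2)
       return hess

   # Newton iterations towards the MAP point.
   theta_map = np.zeros(2)
   for _ in range(5):
       theta_map = theta_map - np.linalg.solve(
           hessian(neg_log_post, theta_map), gradient(neg_log_post, theta_map)
       )

   hess_map = hessian(neg_log_post, theta_map)
   laplace_cov = np.linalg.inv(hess_map)  # covariance of the Gaussian approximation

For this linear-Gaussian case the Hessian equals
:math:`J^\top C^{-1} J + C_{\mathrm{prior}}^{-1}`, so the Laplace covariance
matches the Fisher forecast with the prior precision added.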


Examples
^^^^^^^^

The following examples illustrate the Laplace approximation workflow:

- A basic Laplace approximation of the posterior:
  :doc:`../../examples/forecasting/laplace_approx`

- Visualization of Laplace-approximated posteriors using ``GetDist``:
  :doc:`../../examples/forecasting/laplace_contours`


Priors
------

ForecastKit methods can be used either as *likelihood* approximations or as
*posterior* approximations. The distinction is whether a prior contribution is
included.

Given data :math:`d`, the log-posterior is

.. math::

   \log p(\theta \mid d)
   =
   \log p(d \mid \theta)
   +
   \log p(\theta)
   + \mathrm{const}.

ForecastKit represents priors as callables that evaluate a log-density
(up to an additive constant). Hard exclusions are encoded by returning
``-np.inf``, corresponding to zero probability outside the allowed region
of parameter space.

Where Priors Enter
^^^^^^^^^^^^^^^^^^

Priors affect different forecasting methods in distinct ways:

- **Fisher and generalized Fisher**

  Fisher forecasts are Gaussian *likelihood* approximations by default.
  A Gaussian prior may be incorporated analytically by adding its precision
  matrix to the Fisher matrix.

  For a Gaussian prior with covariance :math:`C_{\mathrm{prior}}`,

  .. math::

     F_{\mathrm{post}}
     =
     F_{\mathrm{like}} + C_{\mathrm{prior}}^{-1}.

  This yields a Gaussian approximation to the *posterior* and preserves the
  Fisher formalism.

- **Laplace and DALI approximations**

  Laplace and DALI operate directly on the log-posterior when a prior is
  supplied. In this case, the prior contribution is evaluated explicitly
  alongside the likelihood expansion, and hard bounds may exclude regions
  of parameter space.

- **Sampling and visualization**

  Plotting tools such as ``GetDist`` do not apply priors implicitly. If samples
  are drawn from a Fisher, Laplace, or DALI approximation without including
  a prior term, the resulting contours correspond to the likelihood
  approximation alone.

  To visualize posterior constraints, the prior must be included explicitly
  when generating samples.
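
The analytic Gaussian-prior update for Fisher forecasts can be illustrated in a
few lines of NumPy. The Fisher matrix and prior widths below are assumed
example values, not output from ForecastKit.

.. code-block:: python

   import numpy as np

   fisher_like = np.array([[225.0, 225.0], [225.0, 425.0]])  # likelihood-only Fisher

   prior_sigma = np.array([0.05, 0.5])  # assumed 1-sigma prior widths
   prior_cov = np.diag(prior_sigma**2)

   # F_post = F_like + C_prior^-1
   fisher_post = fisher_like + np.linalg.inv(prior_cov)

   sigma_like = np.sqrt(np.diag(np.linalg.inv(fisher_like)))
   sigma_post = np.sqrt(np.diag(np.linalg.inv(fisher_post)))

Adding a positive semi-definite precision matrix can only tighten the
marginalized errors, so ``sigma_post`` is never larger than ``sigma_like``.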

Prior Construction
^^^^^^^^^^^^^^^^^^

ForecastKit provides a unified prior interface via
:func:`derivkit.forecasting.priors_core.build_prior`.

A prior is constructed by combining one or more *prior terms*, each of which
defines a log-density contribution, and optional hard bounds. The resulting
callable evaluates

.. math::

   \log p(\theta) = \sum_k \log p_k(\theta),

returning ``-np.inf`` if any term or bound is violated.

Supported prior terms include:

- uniform (hard bounds)
- multivariate Gaussian and diagonal Gaussian
- log-uniform / Jeffreys
- half-normal and half-Cauchy
- log-normal
- Beta distributions
- Gaussian mixture priors
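
The idea of summing log-density terms and returning ``-np.inf`` on a violation
can be sketched as follows. The helpers ``uniform_term``, ``gaussian_term``,
and ``combine_terms`` are hypothetical names invented for this illustration;
they do not reproduce the signature of ``build_prior``.

.. code-block:: python

   import numpy as np

   def uniform_term(lower, upper):
       """Hard bounds: log-density 0 inside the box, -inf outside."""
       lower = np.asarray(lower, dtype=float)
       upper = np.asarray(upper, dtype=float)

       def log_density(theta):
           inside = np.all((theta >= lower) & (theta <= upper))
           return 0.0 if inside else -np.inf

       return log_density

   def gaussian_term(mean, cov):
       """Multivariate Gaussian log-density up to an additive constant."""
       mean = np.asarray(mean, dtype=float)
       precision = np.linalg.inv(np.asarray(cov, dtype=float))

       def log_density(theta):
           delta = theta - mean
           return -0.5 * delta @ precision @ delta

       return log_density

   def combine_terms(*terms):
       """Sum the log-densities; return -inf as soon as any term excludes theta."""

       def log_prior(theta):
           theta = np.asarray(theta, dtype=float)
           total = 0.0
           for term in terms:
               value = term(theta)
               if not np.isfinite(value):
                   return -np.inf
               total += value
           return total

       return log_prior

   log_prior = combine_terms(
       uniform_term([0.0, -5.0], [2.0, 5.0]),
       gaussian_term([1.0, 0.0], np.diag([0.25, 4.0])),
   )

Closures keep each term self-contained, so new term types can be added without
changing the combining logic.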

Interpretation
^^^^^^^^^^^^^^

Priors control identifiability and regularization in local forecasts. They can:

- stabilize ill-conditioned Fisher matrices,
- enforce physical parameter support (e.g. positivity),
- encode external information or weakly informative assumptions,
- determine whether a local approximation should be interpreted as a
  likelihood forecast or a posterior forecast.

Care should be taken to distinguish numerical regularization from genuine
prior information when interpreting forecasted parameter constraints.

Examples
^^^^^^^^

Practical examples demonstrating how priors are incorporated are provided in:

- :ref:`Including priors in Fisher contours <fisher-including-priors>`
  for Fisher forecasts
- :ref:`Including priors in DALI contours <dali-including-priors>`
  for DALI-based posterior sampling


DALI (Higher-Order Forecasting)
-------------------------------

The DALI expansion (Derivative Approximation for LIkelihoods; [#dali]_) extends
Fisher and Laplace approximations by retaining higher-order derivatives of the
likelihood around a chosen expansion point.

Expanding the log-posterior locally in parameter displacements
:math:`\Delta\theta = \theta - \hat{\theta}`, DALI approximates the posterior as

.. math::

   \log p(\theta \mid d)
   \simeq
   \log p(\hat{\theta} \mid d)
   - \frac{1}{2}
     F_{\alpha\beta}\,
     \Delta\theta_\alpha \Delta\theta_\beta
   - \frac{1}{3!}
     D^{(1)}_{\alpha\beta\gamma}\,
     \Delta\theta_\alpha \Delta\theta_\beta \Delta\theta_\gamma
   - \frac{1}{4!}
     D^{(2)}_{\alpha\beta\gamma\delta}\,
     \Delta\theta_\alpha \Delta\theta_\beta
     \Delta\theta_\gamma \Delta\theta_\delta
   + \cdots,

where

- :math:`F_{\alpha\beta}` is the Fisher matrix,
- :math:`D^{(1)}_{\alpha\beta\gamma}` and :math:`D^{(2)}_{\alpha\beta\gamma\delta}`
  are the second-order (doublet) DALI correction terms,
- :math:`T^{(1)}_{\alpha\beta\gamma\delta}`,
  :math:`T^{(2)}_{\alpha\beta\gamma\delta\epsilon}`,
  and :math:`T^{(3)}_{\alpha\beta\gamma\delta\epsilon\zeta}`
  denote third-order (triplet) DALI correction terms,

all evaluated at the expansion point :math:`\hat{\theta}`.


For Gaussian data models with parameter-independent covariance, these tensors
can be expressed directly in terms of derivatives of the model predictions,
allowing DALI to be constructed using numerical derivatives alone.

At second order (“doublet DALI”), the posterior takes the form

.. math::

   p(\theta \mid d)
   \propto
   \exp\!\Bigg[
     -\frac{1}{2}
     F_{\alpha\beta}\,
     \Delta\theta_\alpha \Delta\theta_\beta
     -\frac{1}{2}
     D^{(1)}_{\alpha\beta\gamma}\,
     \Delta\theta_\alpha \Delta\theta_\beta \Delta\theta_\gamma
     -\frac{1}{8}
     D^{(2)}_{\alpha\beta\gamma\delta}\,
     \Delta\theta_\alpha \Delta\theta_\beta
     \Delta\theta_\gamma \Delta\theta_\delta
   \Bigg].
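
A minimal numerical sketch of the doublet expansion, assuming Gaussian data
with a fixed covariance and data equal to the fiducial model. Under that
assumption the tensors take the standard doublet form
:math:`F_{\alpha\beta} = \mu_{,\alpha}^{\mathrm T} C^{-1} \mu_{,\beta}`,
:math:`D^{(1)}_{\alpha\beta\gamma} = \mu_{,\alpha\beta}^{\mathrm T} C^{-1} \mu_{,\gamma}`,
and
:math:`D^{(2)}_{\alpha\beta\gamma\delta} = \mu_{,\alpha\beta}^{\mathrm T} C^{-1} \mu_{,\gamma\delta}`,
with the cubic and quartic terms entering the exponent with coefficients
:math:`-1/2` and :math:`-1/8`. The toy model and its hand-written derivatives
are assumptions for illustration; ForecastKit would obtain the derivatives
numerically.

.. code-block:: python

   import numpy as np

   def model(theta):
       """Toy model (hypothetical) that is exactly quadratic in the parameters."""
       t0, t1 = theta
       return np.array([t0 + t1**2, t0 * t1, t1])

   def first_derivs(theta):
       """d1[i, a] = d mu_i / d theta_a (analytic, for this toy model)."""
       t0, t1 = theta
       return np.array([[1.0, 2.0 * t1], [t1, t0], [0.0, 1.0]])

   def second_derivs():
       """d2[i, a, b] = d^2 mu_i / d theta_a d theta_b (constant for this model)."""
       d2 = np.zeros((3, 2, 2))
       d2[0, 1, 1] = 2.0
       d2[1, 0, 1] = d2[1, 1, 0] = 1.0
       return d2

   theta_hat = np.array([1.0, 0.5])
   cov_inv = np.diag([100.0, 25.0, 100.0])
   d1 = first_derivs(theta_hat)
   d2 = second_derivs()

   fisher = np.einsum("ia,ij,jb->ab", d1, cov_inv, d1)
   dali_d1 = np.einsum("iab,ij,jc->abc", d2, cov_inv, d1)
   dali_d2 = np.einsum("iab,ij,jcd->abcd", d2, cov_inv, d2)

   def doublet_log_post(delta):
       """Doublet DALI log-posterior, up to a constant, at displacement delta."""
       quad = np.einsum("ab,a,b->", fisher, delta, delta)
       cubic = np.einsum("abc,a,b,c->", dali_d1, delta, delta, delta)
       quartic = np.einsum("abcd,a,b,c,d->", dali_d2, delta, delta, delta, delta)
       return -0.5 * quad - 0.5 * cubic - 0.125 * quartic

   def exact_log_like(delta):
       """Exact Gaussian log-likelihood when the data equal model(theta_hat)."""
       resid = model(theta_hat + delta) - model(theta_hat)
       return -0.5 * resid @ cov_inv @ resid

Because this toy model is exactly quadratic in the parameters,
``doublet_log_post`` and ``exact_log_like`` agree for any displacement. For a
general model they agree only near the expansion point.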

Including third-order (“triplet DALI”) terms introduces additional correction
tensors :math:`T^{(i)}`, which capture higher-order non-Gaussian structure while
preserving positive definiteness of the approximation.

Keep in mind that:

- DALI is a *local* approximation around the expansion point
  :math:`\hat{\theta}` (``theta0``) and may degrade far from it.
- DALI may perform poorly for models with weak or sublinear parameter dependence,
  or when increasing the expansion order does not improve convergence
  (see discussion in e.g. [#dali]_ and section VI.B of
  `arXiv:2211.06534 <https://arxiv.org/abs/2211.06534>`_).
- If DALI does not stabilize as the expansion order is increased,
  numerical posterior sampling (e.g. ``emcee``) should be used to validate
  the approximation.


Interpretation
^^^^^^^^^^^^^^

DALI provides a controlled hierarchy of local posterior approximations,
reducing to the Fisher and Laplace limits when higher-order derivatives vanish.


.. image:: ../../assets/plots/dali_vs_fisher_exact_1d.png
   :width: 60%
.. image:: ../../assets/plots/dali_vs_fisher_2d_1and2sigma.png
   :width: 60%


Examples
^^^^^^^^

The following examples illustrate the DALI approximation workflow:

- A basic DALI expansion of the posterior:
  :doc:`../../examples/forecasting/dali`

- Visualization of DALI-expanded posteriors using ``GetDist``:
  :doc:`../../examples/forecasting/dali_contours`


Posterior Sampling and Visualization
------------------------------------

ForecastKit provides utilities to draw samples directly from Fisher, Laplace,
or DALI-expanded posteriors and to convert them into GetDist-compatible [#getdist]_
``MCSamples`` objects.

This enables:

- posterior sampling based on local likelihood approximations using emcee [#emcee]_
- easy integration with GetDist for contour plotting and statistical summaries
- direct contour visualization and uncertainty propagation
- comparison between Fisher, Laplace, and DALI forecasts

These workflows are designed for forecasting and local posterior analysis,
providing fast and controlled approximations to parameter constraints without
running MCMC or nested sampling on the full likelihood.
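
For the Gaussian (Fisher or Laplace) case, sampling reduces to drawing from a
multivariate normal distribution. The sketch below uses NumPy's random
generator and applies a hard prior bound by rejection. The Fisher matrix, the
expansion point, and the bound are assumed example values, and ForecastKit's
own sampling and conversion helpers are not shown.

.. code-block:: python

   import numpy as np

   theta_hat = np.array([1.0, 2.0])                      # expansion point (assumed)
   fisher = np.array([[225.0, 225.0], [225.0, 425.0]])   # Fisher matrix (assumed)
   param_cov = np.linalg.inv(fisher)

   rng = np.random.default_rng(seed=0)
   samples = rng.multivariate_normal(theta_hat, param_cov, size=20000)

   # Hard prior bound (hypothetical): the second parameter must be >= 2.0.
   # Samples that violate it have zero prior probability and are dropped.
   inside_prior = samples[:, 1] >= 2.0
   posterior_samples = samples[inside_prior]

   # Sanity check: the sample covariance should approximate the inverse Fisher.
   sample_cov = np.cov(samples, rowvar=False)

An array such as ``posterior_samples`` is the kind of input that GetDist's
``MCSamples`` accepts through its ``samples`` argument. Without the rejection
step, the contours would reflect the likelihood approximation alone.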

Examples
^^^^^^^^

Worked examples are provided in:

- :doc:`../../examples/forecasting/fisher_contours` for Fisher-based GetDist samples
- :doc:`../../examples/forecasting/dali_contours` for DALI-based posterior sampling
- :doc:`../../examples/forecasting/laplace_contours` for Laplace approximations


Backend Notes
-------------

- If ``method`` is omitted, the adaptive derivative backend is used.
- Any DerivativeKit backend may be selected
  (finite differences, Ridders, Gauss–Richardson, polynomial fits, etc.).
- Changing the derivative backend affects only how derivatives are computed,
  not the forecasting logic itself.
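
The separation between derivative backend and forecasting logic can be
illustrated with a toy design in which the Jacobian routine is passed in as a
callable. The functions ``forward_diff``, ``central_diff``, and
``fisher_matrix`` are hypothetical stand-ins written for this sketch; they do
not mirror DerivativeKit's backends or ForecastKit's ``method`` argument.

.. code-block:: python

   import numpy as np

   def model(theta):
       """Toy nonlinear model (hypothetical): amplitude times an exponential."""
       amp, rate = theta
       x = np.array([0.0, 1.0, 2.0])
       return amp * np.exp(rate * x)

   def forward_diff(func, theta0, step=1e-6):
       """First-order forward-difference Jacobian."""
       base = func(theta0)
       columns = []
       for a in range(theta0.size):
           dtheta = np.zeros_like(theta0)
           dtheta[a] = step
           columns.append((func(theta0 + dtheta) - base) / step)
       return np.stack(columns, axis=1)

   def central_diff(func, theta0, step=1e-4):
       """Second-order central-difference Jacobian."""
       columns = []
       for a in range(theta0.size):
           dtheta = np.zeros_like(theta0)
           dtheta[a] = step
           columns.append((func(theta0 + dtheta) - func(theta0 - dtheta)) / (2.0 * step))
       return np.stack(columns, axis=1)

   def fisher_matrix(func, theta0, cov, jacobian_fn):
       """Forecasting logic: identical regardless of how the Jacobian is computed."""
       jac = jacobian_fn(func, theta0)
       return jac.T @ np.linalg.solve(cov, jac)

   theta0 = np.array([1.0, 0.5])
   cov = np.diag([0.01, 0.04, 0.01])

   fisher_forward = fisher_matrix(model, theta0, cov, forward_diff)
   fisher_central = fisher_matrix(model, theta0, cov, central_diff)

Comparing the two results is a quick way to check that a forecast is not
dominated by derivative noise.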

ForecastKit is modular and designed to scale from simple Gaussian
forecasts to higher-order likelihood expansions with minimal code changes.


References
----------

.. [#fisher] Wikipedia, *Fisher information matrix*,
   https://en.wikipedia.org/wiki/Fisher_information
.. [#crlb] Wikipedia, *Cramér–Rao bound*,
   https://en.wikipedia.org/wiki/Cramér–Rao_bound
.. [#gaussfisher] M. Tegmark et al., *Karhunen-Loève eigenvalue problems in cosmology: how should we tackle large data sets?*,
   https://arxiv.org/abs/astro-ph/9603021
.. [#xyfisher] A. Heavens et al., *Generalised Fisher Matrices*,
   https://arxiv.org/abs/1404.2854
.. [#dali] E. Sellentin et al., *Breaking the spell of Gaussianity: forecasting with higher order Fisher matrices*,
   https://arxiv.org/abs/1401.6892
.. [#getdist] A. Lewis, *GetDist: a Python package for analysing Monte Carlo samples*,
   https://arxiv.org/abs/1910.13970
.. [#emcee] D. Foreman-Mackey et al., *emcee: The MCMC Hammer*,
   https://arxiv.org/abs/1202.3665
.. [#laplace] Wikipedia, *Laplace's approximation*,
   https://en.wikipedia.org/wiki/Laplace%27s_approximation
.. [#fisherbias] A. Amara and A. Réfrégier, *Systematic Bias in Cosmic Shear: Beyond the Fisher Matrix*,
   https://arxiv.org/abs/0710.5171