.. |dklogo| image:: ../../assets/logos/logo-black.png
   :alt: DerivKit logo black
   :width: 32px

|dklogo| Gradient
=================

This section shows how to compute the gradient of a scalar-valued function
using DerivKit. The gradient describes how a scalar output changes with
respect to each model parameter. For a set of parameters :math:`\theta` and a
scalar-valued function :math:`f(\theta)`, the gradient is the vector of first
derivatives of :math:`f` with respect to :math:`\theta`.
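Spelled out in components (with ``p`` parameters, as in the notation below),
this is

.. math::

   \nabla f(\theta) =
   \left(
     \frac{\partial f}{\partial \theta_1},
     \ldots,
     \frac{\partial f}{\partial \theta_p}
   \right).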
**Notation**

- ``p`` denotes the number of model parameters (``theta`` has shape ``(p,)``).

If ``f(theta)`` returns a scalar and ``theta`` has shape ``(p,)``, the
gradient has shape ``(p,)``, with one component per parameter.

See also :doc:`jacobian` for vector-valued outputs and :doc:`hessian` for
second derivatives. For more information on gradients, see
:doc:`../../about/kits/calculus_kit`.

The primary interface for computing the gradient is
:meth:`derivkit.calculus_kit.CalculusKit.gradient`. For advanced usage and
backend-specific keyword arguments, see
:func:`derivkit.calculus.gradient.build_gradient`. You can choose the
derivative backend via ``method`` and pass backend-specific options via
``**dk_kwargs`` (forwarded to
:meth:`derivkit.derivative_kit.DerivativeKit.differentiate`).

Basic usage
-----------

.. doctest:: gradient_basic

   >>> import numpy as np
   >>> from derivkit.calculus_kit import CalculusKit

   >>> # Define a scalar-valued function
   >>> def func(theta):
   ...     return np.sin(theta[0]) + theta[1] ** 2

   >>> # Point at which to compute the gradient
   >>> x0 = np.array([0.5, 2.0])

   >>> # Create CalculusKit instance and compute gradient
   >>> calc = CalculusKit(func, x0=x0)
   >>> grad = calc.gradient()
   >>> print(np.round(np.asarray(grad).reshape(-1), 6))  # shape (p,)
   [0.877583 4.      ]
   >>> ref = np.array([np.cos(0.5), 4.0])
   >>> print(np.round(ref, 6))
   [0.877583 4.      ]

Finite differences (Ridders) via ``dk_kwargs``
----------------------------------------------

.. doctest:: gradient_finite_ridders

   >>> import numpy as np
   >>> from derivkit.calculus_kit import CalculusKit

   >>> # Define a scalar-valued function
   >>> def func(theta):
   ...     return np.sin(theta[0]) + theta[1] ** 2

   >>> # Create CalculusKit instance and compute gradient
   >>> calc = CalculusKit(func, x0=np.array([0.5, 2.0]))
   >>> grad = calc.gradient(
   ...     method="finite",
   ...     stepsize=1e-2,
   ...     num_points=5,
   ...     extrapolation="ridders",
   ...     levels=4,
   ... )
   >>> print(np.round(np.asarray(grad).reshape(-1), 6))
   [0.877583 4.      ]

Adaptive backend via ``dk_kwargs``
----------------------------------

.. doctest:: gradient_adaptive

   >>> import numpy as np
   >>> from derivkit.calculus_kit import CalculusKit

   >>> # Define a scalar-valued function
   >>> def func(theta):
   ...     return np.sin(theta[0]) + theta[1] ** 2

   >>> # Create CalculusKit instance and compute gradient
   >>> calc = CalculusKit(func, x0=np.array([0.5, 2.0]))
   >>> grad = calc.gradient(
   ...     method="adaptive",
   ...     n_points=12,
   ...     spacing="auto",
   ...     ridge=1e-10,
   ... )
   >>> print(np.round(np.asarray(grad).reshape(-1), 6))
   [0.877583 4.      ]

Parallelism across parameters
-----------------------------

Different gradient components can be computed in parallel. The number of
parallel processes can be tuned with the ``n_workers`` parameter.

.. doctest:: gradient_parallel

   >>> import numpy as np
   >>> from derivkit.calculus_kit import CalculusKit

   >>> # Define a scalar-valued function
   >>> def f(theta):
   ...     return np.sin(theta[0]) + theta[1] ** 2 + np.cos(theta[2])

   >>> # Create CalculusKit instance and compute gradient
   >>> calc = CalculusKit(f, x0=np.array([0.5, 2.0, 0.1]))
   >>> grad = calc.gradient(
   ...     method="finite",
   ...     n_workers=3,
   ...     stepsize=1e-2,
   ...     num_points=5,
   ... )
   >>> print(np.round(np.asarray(grad).reshape(-1), 6))
   [ 0.877583  4.       -0.099833]

Notes
-----

- ``n_workers`` can speed up expensive functions by parallelizing gradient
  components. For cheap functions, overhead may dominate.
- The function must return a **scalar**. If it returns a vector or
  higher-rank tensor, :meth:`derivkit.CalculusKit.gradient` raises
  ``TypeError``; see the sketch below.
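A minimal sketch of this failure mode, using only the ``CalculusKit``
interface shown above (the function name ``vec_func`` is illustrative, and
the exact error message is implementation-specific):

.. code-block:: python

   import numpy as np

   from derivkit.calculus_kit import CalculusKit

   def vec_func(theta):
       # Returns a length-2 vector rather than a scalar.
       return np.array([np.sin(theta[0]), theta[1] ** 2])

   calc = CalculusKit(vec_func, x0=np.array([0.5, 2.0]))
   try:
       calc.gradient()
   except TypeError as exc:
       # gradient() rejects non-scalar outputs.
       print("gradient() requires a scalar-valued function:", exc)

For vector-valued outputs, use :doc:`jacobian` instead.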