kafe2 - Data Visualisation and Model Fitting

          Link to github Repository of the kafe2 project.


kafe2

  is an open-source Python package for the likelihood-based estimation of parameters from measured data. Its selling points are

  1. that it offers state-of-the-art statistical methods (e.g. confidence intervals/regions based on the profile likelihood method),
  2. that it only uses open-source software which allows users to understand and reproduce results with relative ease,
  3. and that it offers an easy-to-use and performance-optimized pipeline that takes numerical data and produces for example parameter confidence intervals or publication-quality plots.

Let’s look at a brief example:

import kafe2

# Just some numerical data:
x_data = [1.0, 2.0, 3.0, 4.0]
y_data = [2.3, 4.2, 7.5, 9.4]

kafe2.xy_fit("line", x_data, y_data, x_error=0.1, y_error=[0.40, 0.45, 0.40, 0.25],
             y_error_cor_rel=0.05)
kafe2.plot(x_label="$t$ [s]", y_label="$h$ [m]")

With just two function calls we get the following plot:

Example kafe2 plot

Ignoring imports and variable definitions the example consists of just two function calls but a lot is happening under the hood:

  1. A negative log-likelihood (NLL) function given the data and uncertainties is constructed. Uncertainties in x and y direction can be defined as either absolute or relative, and as either correlated or uncorrelated. If a simple float is provided as input the same amount of uncertainty is applied to each data point. Since the user did not specify a model function a line is used by default.
  2. A numerical optimization algorithm is applied to the likelihood function to find the best estimates for the model parameter values.
  3. Because the user defined uncertainties in x direction and uncertainties relative to the y model values the total covariance matrix has become a function of the model parameters. It is therefore necessary to re-calculate the total covariance matrix at each optimization step; while somewhat computationally expensive this ensures minimal bias and good statistical coverage of the confidence intervals. Due to the parameter-dependent uncertainties the regression problem as a whole has also become nonlinear. kafe2 recognizes the change and switches from estimating symmetrical parameter uncertainties from the Rao-Cramér-Fréchet bound to estimating confidence intervals from the profile likelihood.
  4. The data and model are plotted along with a confidence band for the model function. A legend containing information about the model, the parameter estimates, and the results of a hypothesis test (Pearson’s chi-squared test) is added automatically.
  5. Because the regression problem is nonlinear kafe2 by default also produces plots of the confidence intervals of single model parameters as well as plots of the confidence regions of pairs of parameters:
Example kafe2 plot of confidence intervals/regions

The above example is of course highly configurable: among other things users can define arbitrarily complex Python functions to model the data, they can switch to different data types (e.g. histogram data), they can use different likelihoods (e.g. Poisson), or they can simultaneously fit multiple models with shared parameters. Since tools have an inherent trade-off between usability and complex functionality kafe2 offers several interfaces for fitting:

By offering several interfaces with varying levels of complexity kafe2 aims to meet the needs of both beginners and experts. To learn more we invite you to take a look at the full documentation.