Covariance matrices representations#

For this tutorial, it is required to install covmats with the required modules for the notebook. The easiest way is through pip:

pip install covmats[examples]

Or alternatively using conda

conda install covmats[examples]

You might also clone the repository and install from source

pip install -e ".[examples]"

Then import the modules

[1]:
import covmats
import marimo as mo
import numpy as np
import scipy as sp

# potential licence issue under the cholesky factorization section.

Once the installation is done, covmats gives access to various covariance matrix representations for large scale inversion, fast linear algebra and fast sampling.

The common interface CovarianceMatrix can be seen as a extension of the class scipy.stats.Covariance as it inherits from it (thus making it compatible with all scipy.stats functions and classes) and dope it with LinearOperator capabilities.

Derived from this base class, covmats provides several full-rank representation of covariance matrices:

as well as several low-rank (approximate) representation of covariance matrices:

In addition, it is possible to build low-rank approximations from a given Kernel. But it only provides the linear operations capabilities (no sampling not statistical calculations).

It is also possible to obtain low-rank factorization from any :py:class:CovarianceMatrix and :py:class:CovKernelAsLinop using randomized Eigen factorization:

By convention, we will refer to all covariance matrices as \(\mathbf{A}\) and their respective inverses, aka precision matrices, as \(\mathbf{Q}\).

Full-rank covariance matrix decomposition#

CovViaDiagonal#

Let’s start with the simplest case, i.e., a diagonal matrix, to explore the API:

[2]:
d3 = [1, 2, 3]
A33 = np.diag(d3)  # a diagonal covariance matrix
cov_diag_33 = covmats.CovViaDiagonal(d3)
# a point of interest
v3 = [4, -2, 5]
# repreat 4 times to obtain an array
V34 = np.vstack([v3] * 4).T

It behaves as a linear operator:

[3]:
# Test the equality between a dense array and the diagonal cov representation
np.testing.assert_allclose(A33 @ v3, cov_diag_33 @ v3)
np.testing.assert_allclose(A33 @ V34, cov_diag_33 @ V34)
cov_diag_33 @ V34
[3]:
array([[ 4.,  4.,  4.,  4.],
       [-4., -4., -4., -4.],
       [15., 15., 15., 15.]])

It is possible to get the inverse (get \(\mathbf{Q}\)), and solve \(\mathbf{Ax} = \mathbf{b}\):

[4]:
np.testing.assert_allclose(np.linalg.inv(A33), cov_diag_33.precision)
np.testing.assert_allclose(cov_diag_33.solve(cov_diag_33 @ V34), V34)
[5]:
dist = sp.stats.multivariate_normal(mean=[0, 0, 0], cov=A33)
dist.pdf(v3)
[5]:
np.float64(4.9595685102808205e-08)

TODO: continue the tutorial

⚠️ About the following code that performs sparse cholesky factorization

Scikit-sparse depends on external libraries with GPL licenses, such as SuiteSparse. As a consequence the following piece of code must adopt that license as well. Please look into the terms of this license before creating a dynamic link to this pieces in your downstream package and understand commercial use limitations. We are not lawyers and cannot provide any guidance on the terms of this license.

Please see https://www.gnu.org/licenses/licenses.html#LGPL

[6]:
from typing import no_type_check


@no_type_check  # for ty checks
def _get_L_D_P(A: sp.sparse.sparray):
    """
    Return L, D and P from the factorization L @ D @ L' = P @ A @ P' using sksparse.

    Note that sksparse uses SuiteSparse which is LGPL licence.
    """
    import sksparse.cholmod as cholmod

    # Need to take the API change into account
    try:
        # sksparse 4.x
        L, D, P = cholmod.ldl(A, order="amd")
    except AttributeError:
        # sksparse 5.x
        f = cholmod.cholesky(A)
        (L, D), P = f.L_D(), f.P()
    return L, D, P

Low-rank covariance matrix decomposition#