biweight_midcovariance#
- astropy.stats.biweight_midcovariance(data, c=9.0, M=None, modify_sample_size=False)[source]#
Compute the biweight midcovariance between pairs of multiple variables.
The biweight midcovariance is a robust and resistant estimator of the covariance between two variables.
This function computes the biweight midcovariance between all pairs of the input variables (rows) in the input data. The output array will have a shape of (N_variables, N_variables). The diagonal elements will be the biweight midvariances of each input variable (see
biweight_midvariance()
). The off-diagonal elements will be the biweight midcovariances between each pair of input variables.For example, if the input array
data
contains three variables (rows)x
,y
, andz
, the outputndarray
midcovariance matrix will be:\[\begin{split}\begin{pmatrix} \zeta_{xx} & \zeta_{xy} & \zeta_{xz} \\ \zeta_{yx} & \zeta_{yy} & \zeta_{yz} \\ \zeta_{zx} & \zeta_{zy} & \zeta_{zz} \end{pmatrix}\end{split}\]where \(\zeta_{xx}\), \(\zeta_{yy}\), and \(\zeta_{zz}\) are the biweight midvariances of each variable. The biweight midcovariance between \(x\) and \(y\) is \(\zeta_{xy}\) (\(= \zeta_{yx}\)). The biweight midcovariance between \(x\) and \(z\) is \(\zeta_{xz}\) (\(= \zeta_{zx}\)). The biweight midcovariance between \(y\) and \(z\) is \(\zeta_{yz}\) (\(= \zeta_{zy}\)).
The biweight midcovariance between two variables \(x\) and \(y\) is given by:
\[\zeta_{xy} = n_{xy} \ \frac{\sum_{|u_i| < 1, \ |v_i| < 1} \ (x_i - M_x) (1 - u_i^2)^2 (y_i - M_y) (1 - v_i^2)^2} {(\sum_{|u_i| < 1} \ (1 - u_i^2) (1 - 5u_i^2)) (\sum_{|v_i| < 1} \ (1 - v_i^2) (1 - 5v_i^2))}\]where \(M_x\) and \(M_y\) are the medians (or the input locations) of the two variables and \(u_i\) and \(v_i\) are given by:
\[ \begin{align}\begin{aligned}u_{i} = \frac{(x_i - M_x)}{c * MAD_x}\\v_{i} = \frac{(y_i - M_y)}{c * MAD_y}\end{aligned}\end{align} \]where \(c\) is the biweight tuning constant and \(MAD_x\) and \(MAD_y\) are the median absolute deviation of the \(x\) and \(y\) variables. The biweight midvariance tuning constant
c
is typically 9.0 (the default).If \(MAD_x\) or \(MAD_y\) are zero, then zero will be returned for that element.
For the standard definition of biweight midcovariance, \(n_{xy}\) is the total number of observations of each variable. That definition is used if
modify_sample_size
isFalse
, which is the default.However, if
modify_sample_size = True
, then \(n_{xy}\) is the number of observations for which \(|u_i| < 1\) and/or \(|v_i| < 1\), i.e.\[n_{xx} = \sum_{|u_i| < 1} \ 1\]\[n_{xy} = n_{yx} = \sum_{|u_i| < 1, \ |v_i| < 1} \ 1\]\[n_{yy} = \sum_{|v_i| < 1} \ 1\]which results in a value closer to the true variance for small sample sizes or for a large number of rejected values.
- Parameters:
- data2D or 1D array_like
Input data either as a 2D or 1D array. For a 2D array, it should have a shape (N_variables, N_observations). A 1D array may be input for observations of a single variable, in which case the biweight midvariance will be calculated (no covariance). Each row of
data
represents a variable, and each column a single observation of all those variables (same as thenumpy.cov
convention).- c
float
, optional Tuning constant for the biweight estimator (default = 9.0).
- M
float
or 1D array_like, optional The location estimate of each variable, either as a scalar or array. If
M
is an array, then its must be a 1D array containing the location estimate of each row (i.e.a.ndim
elements). IfM
is a scalar value, then its value will be used for each variable (row). IfNone
(default), then the median of each variable (row) will be used.- modify_sample_sizebool, optional
If
False
(default), then the sample size used is the total number of observations of each variable, which follows the standard definition of biweight midcovariance. IfTrue
, then the sample size is reduced to correct for any rejected values (see formula above), which results in a value closer to the true covariance for small sample sizes or for a large number of rejected values.
- Returns:
- biweight_midcovariance
ndarray
A 2D array representing the biweight midcovariances between each pair of the variables (rows) in the input array. The output array will have a shape of (N_variables, N_variables). The diagonal elements will be the biweight midvariances of each input variable. The off-diagonal elements will be the biweight midcovariances between each pair of input variables.
- biweight_midcovariance
References
Examples
Compute the biweight midcovariance between two random variables:
>>> import numpy as np >>> from astropy.stats import biweight_midcovariance >>> # Generate two random variables x and y >>> rng = np.random.default_rng(1) >>> x = rng.normal(0, 1, 200) >>> y = rng.normal(0, 3, 200) >>> # Introduce an obvious outlier >>> x[0] = 30.0 >>> # Calculate the biweight midcovariances between x and y >>> bicov = biweight_midcovariance([x, y]) >>> print(bicov) [[0.83435568 0.02379316] [0.02379316 7.15665769]] >>> # Print standard deviation estimates >>> print(np.sqrt(bicov.diagonal())) [0.91343072 2.67519302]