biweight_midcovariance#
- astropy.stats.biweight_midcovariance(data, c=9.0, M=None, modify_sample_size=False)[source]#
Compute the biweight midcovariance between pairs of multiple variables.
The biweight midcovariance is a robust and resistant estimator of the covariance between two variables.
This function computes the biweight midcovariance between all pairs of the input variables (rows) in the input data. The output array will have a shape of (N_variables, N_variables). The diagonal elements will be the biweight midvariances of each input variable (see
biweight_midvariance()
). The off-diagonal elements will be the biweight midcovariances between each pair of input variables.For example, if the input array
data
contains three variables (rows)x
,y
, andz
, the outputndarray
midcovariance matrix will be:where
, , and are the biweight midvariances of each variable. The biweight midcovariance between and is ( ). The biweight midcovariance between and is ( ). The biweight midcovariance between and is ( ).The biweight midcovariance between two variables
and is given by:where
and are the medians (or the input locations) of the two variables and and are given by:where
is the biweight tuning constant and and are the median absolute deviation of the and variables. The biweight midvariance tuning constantc
is typically 9.0 (the default).If
or are zero, then zero will be returned for that element.For the standard definition of biweight midcovariance,
is the total number of observations of each variable. That definition is used ifmodify_sample_size
isFalse
, which is the default.However, if
modify_sample_size = True
, then is the number of observations for which and/or , i.e.which results in a value closer to the true variance for small sample sizes or for a large number of rejected values.
- Parameters:
- data2D or 1D array_like
Input data either as a 2D or 1D array. For a 2D array, it should have a shape (N_variables, N_observations). A 1D array may be input for observations of a single variable, in which case the biweight midvariance will be calculated (no covariance). Each row of
data
represents a variable, and each column a single observation of all those variables (same as thenumpy.cov
convention).- c
float
, optional Tuning constant for the biweight estimator (default = 9.0).
- M
float
or 1D array_like, optional The location estimate of each variable, either as a scalar or array. If
M
is an array, then its must be a 1D array containing the location estimate of each row (i.e.a.ndim
elements). IfM
is a scalar value, then its value will be used for each variable (row). IfNone
(default), then the median of each variable (row) will be used.- modify_sample_sizebool, optional
If
False
(default), then the sample size used is the total number of observations of each variable, which follows the standard definition of biweight midcovariance. IfTrue
, then the sample size is reduced to correct for any rejected values (see formula above), which results in a value closer to the true covariance for small sample sizes or for a large number of rejected values.
- Returns:
- biweight_midcovariance
ndarray
A 2D array representing the biweight midcovariances between each pair of the variables (rows) in the input array. The output array will have a shape of (N_variables, N_variables). The diagonal elements will be the biweight midvariances of each input variable. The off-diagonal elements will be the biweight midcovariances between each pair of input variables.
- biweight_midcovariance
References
Examples
Compute the biweight midcovariance between two random variables:
>>> import numpy as np >>> from astropy.stats import biweight_midcovariance >>> # Generate two random variables x and y >>> rng = np.random.default_rng(1) >>> x = rng.normal(0, 1, 200) >>> y = rng.normal(0, 3, 200) >>> # Introduce an obvious outlier >>> x[0] = 30.0 >>> # Calculate the biweight midcovariances between x and y >>> bicov = biweight_midcovariance([x, y]) >>> print(bicov) [[0.83435568 0.02379316] [0.02379316 7.15665769]] >>> # Print standard deviation estimates >>> print(np.sqrt(bicov.diagonal())) [0.91343072 2.67519302]