API Reference
Matching
Weighted exact matching using coarsened predictor variables
match(data, treatment)
Weight observations based on the global and local (per-stratum) population of each treatment level.
Only observations from strata that contain examples of every treatment level receive a non-zero weight. If the treatment column contains continuous values, it is highly likely that every observation will receive a weight of zero.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | The data on which we shall perform coarsened exact matching | required |
treatment | str | The name of the column in data containing the treatment variable | required |
Returns:
Type | Description |
---|---|
Series | The weight for each observation in data, given the coarsening already applied to it |
Source code in cem/match.py
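To make the weighting rule concrete, here is a minimal sketch of the usual CEM weights. It is not the cem package's implementation, and `cem_weights_sketch` is a hypothetical name. It assumes that every column other than the treatment column is already coarsened (so identical rows form a stratum) and that the largest treatment level acts as the reference level with weight 1. Every other row gets (global count of its level / global count of the reference level) × (reference rows in its stratum / rows of its level in its stratum), with counts taken over matched rows only.

```python
import pandas as pd


def cem_weights_sketch(data: pd.DataFrame, treatment: str) -> pd.Series:
    """Hypothetical CEM weighting sketch; not the cem package's implementation.

    Assumes every column other than `treatment` is already coarsened, so rows
    sharing the same values in those columns form one stratum.
    """
    strata_cols = [c for c in data.columns if c != treatment]
    weights = pd.Series(0.0, index=data.index)

    # Keep only strata that contain every treatment level seen in the data.
    n_levels = data[treatment].nunique()
    levels_per_stratum = data.groupby(strata_cols)[treatment].transform("nunique")
    matched = data[levels_per_stratum == n_levels]
    if matched.empty:
        return weights

    # Assumption: the largest treatment level is the reference level (weight 1).
    reference = matched[treatment].max()
    global_counts = matched[treatment].value_counts()  # matched rows per level

    by_stratum = matched.groupby(strata_cols)[treatment]
    # Number of rows in this row's stratum that share this row's treatment level.
    local_counts = by_stratum.transform(lambda s: s.map(s.value_counts()))
    # Number of reference-level rows in this row's stratum.
    local_reference = by_stratum.transform(lambda s: (s == reference).sum())

    global_ratio = matched[treatment].map(global_counts) / global_counts.loc[reference]
    weights.loc[matched.index] = global_ratio * local_reference / local_counts
    return weights


df = pd.DataFrame({"x": [0, 0, 0, 1, 1, 2], "t": [0, 1, 1, 0, 1, 0]})
print(cem_weights_sketch(df, "t").tolist())
```

Under this scheme the weights of each non-reference level sum to that level's number of matched rows; in the example, the row in stratum `x == 2` has no treated counterpart and gets weight 0.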
Imbalance
Multidimensional histogram imbalance between two or more collections of observations
L1(data, treatment, weights=None)
(Weighted) Multidimensional L1 imbalance between groups of observations of differing treatment levels
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | Observations | required |
treatment | str | Name of column containing the treatment level | required |
weights | Series | Example weights | None |
Source code in cem/imbalance.py
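A minimal sketch of a weighted L1 imbalance, assuming the usual definition: build a multidimensional histogram whose cells are the distinct combinations of the non-treatment columns, convert each group's (weighted) counts to relative frequencies, and take half the sum of absolute differences. The result is 0 for identical distributions and 1 for completely separated ones. `l1_imbalance_sketch` is a hypothetical name, and unlike the documented function it handles exactly two treatment levels only.

```python
import pandas as pd


def l1_imbalance_sketch(data, treatment, weights=None):
    """Hypothetical weighted L1 imbalance for exactly two treatment levels."""
    cells = [c for c in data.columns if c != treatment]
    w = pd.Series(1.0, index=data.index) if weights is None else weights.astype(float)

    # Weighted count per histogram cell, one column per treatment level.
    keys = [data[c] for c in cells] + [data[treatment]]
    hist = w.groupby(keys).sum().unstack(treatment, fill_value=0.0)
    if hist.shape[1] != 2:
        raise ValueError("this sketch only handles two treatment levels")

    # Relative frequencies within each level, then half the absolute difference.
    rel = hist / hist.sum(axis=0)
    a, b = rel.columns
    return 0.5 * (rel[a] - rel[b]).abs().sum()


same = pd.DataFrame({"x": [0, 0, 1, 1], "t": [0, 1, 0, 1]})
apart = pd.DataFrame({"x": [0, 0, 1, 1], "t": [0, 0, 1, 1]})
print(l1_imbalance_sketch(same, "t"), l1_imbalance_sketch(apart, "t"))
```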
L2(data, treatment, weights=None)
(Weighted) Multidimensional L2 imbalance between groups of observations of differing treatment levels
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | Observations | required |
treatment | str | Name of column containing the treatment level | required |
weights | Series | Example weights | None |
Source code in cem/imbalance.py
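A minimal sketch of an L2 variant, under a loud assumption: here L2 is taken to be the Euclidean norm of the difference between the two groups' relative-frequency histograms, with no extra scaling factor. The page above does not state the exact normalization, so check the source in cem/imbalance.py before relying on specific values. `l2_imbalance_sketch` is a hypothetical name and handles exactly two treatment levels.

```python
import numpy as np
import pandas as pd


def l2_imbalance_sketch(data, treatment, weights=None):
    """Hypothetical weighted L2 imbalance for exactly two treatment levels.

    Assumption: Euclidean norm of the difference between the two
    relative-frequency histograms, with no extra scaling factor.
    """
    cells = [c for c in data.columns if c != treatment]
    w = pd.Series(1.0, index=data.index) if weights is None else weights.astype(float)

    # Weighted count per histogram cell, one column per treatment level.
    keys = [data[c] for c in cells] + [data[treatment]]
    hist = w.groupby(keys).sum().unstack(treatment, fill_value=0.0)
    if hist.shape[1] != 2:
        raise ValueError("this sketch only handles two treatment levels")

    rel = hist / hist.sum(axis=0)
    a, b = rel.columns
    diff = rel[a] - rel[b]
    return float(np.sqrt((diff ** 2).sum()))


apart = pd.DataFrame({"x": [0, 0, 1, 1], "t": [0, 0, 1, 1]})
print(l2_imbalance_sketch(apart, "t"))
```

Compared with L1, squaring the per-cell differences makes a few large discrepancies count more than many small ones.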
Automatic Coarsening
Coarsening predictor variables for a collection of observations
coarsen(data, treatment, measure='l1', lower=1, upper=10, columns=None)
Automatic coarsening by binning numeric columns using the number of bins, H, that results in the median (unweighted) imbalance over a range of candidate values for H bounded by lower and upper.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | Data to coarsen | required |
treatment | str | Name of the column containing the treatment level | required |
measure | str | Imbalance measure ('l1' or 'l2') | 'l1' |
lower | int | Minimum value for H | 1 |
upper | int | Maximum value for H | 10 |
columns | Optional[Sequence[str]] | Columns to coarsen | None |
Source code in cem/coarsen.py
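The median-imbalance search described above can be sketched as follows. This is not the cem package's code; `coarsen_sketch` and `_imbalance` are hypothetical names. Several details are assumptions: numeric columns are cut into H equal-width bins with `pandas.cut`, the same H is used for every coarsened column, H ranges over lower..upper inclusive, the L1/L2 definitions match the two-level sketches given for the imbalance functions, ties are broken by taking the H whose imbalance is closest to the median (smallest H first), and the coarsened DataFrame is returned.

```python
import numpy as np
import pandas as pd


def _imbalance(data, treatment, measure):
    """Unweighted L1/L2 imbalance between exactly two treatment levels (assumed definitions)."""
    cells = [c for c in data.columns if c != treatment]
    hist = data.groupby(cells + [treatment]).size().unstack(treatment, fill_value=0)
    rel = hist / hist.sum(axis=0)
    a, b = rel.columns  # raises ValueError unless there are exactly two levels
    diff = rel[a] - rel[b]
    if measure == "l1":
        return 0.5 * diff.abs().sum()
    if measure == "l2":
        return float(np.sqrt((diff ** 2).sum()))
    raise ValueError(f"unknown measure: {measure!r}")


def coarsen_sketch(data, treatment, measure="l1", lower=1, upper=10, columns=None):
    """Hypothetical automatic coarsening; not the cem package's implementation."""
    if columns is None:
        # Assumption: default to every numeric column except the treatment.
        numeric = data.select_dtypes(include="number").columns
        columns = [c for c in numeric if c != treatment]

    def binned(h):
        out = data.copy()
        for c in columns:
            # Equal-width bins; labels=False returns integer bin codes 0..h-1.
            out[c] = pd.cut(data[c], bins=h, labels=False)
        return out

    # Imbalance for every candidate number of bins, lower <= H <= upper.
    scores = {h: _imbalance(binned(h), treatment, measure) for h in range(lower, upper + 1)}
    median = np.median(list(scores.values()))
    # Pick the H whose imbalance is closest to the median (smallest H on ties).
    best_h = min(scores, key=lambda h: abs(scores[h] - median))
    return binned(best_h)


df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0], "t": [0, 1, 0, 1, 0, 1]})
print(coarsen_sketch(df, "t", lower=1, upper=4))
```

Choosing the median rather than the minimum avoids the trivial answer H = 1, which puts every observation in a single bin and always reports zero imbalance.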