HurdleDMR.jl is a Julia implementation of the Hurdle Distributed Multinomial Regression (HDMR), as described in:
Kelly, Bryan, Asaf Manela, and Alan Moreira (2018). Text Selection. Working paper.
It includes a Julia implementation of the Distributed Multinomial Regression (DMR) model of Taddy (2015).
This tutorial explains how to use this package from Python via the PyJulia package.
First, install Julia itself. The easiest way to do that is from the download site https://julialang.org/downloads/. An alternative is to install JuliaPro from https://juliacomputing.com.
Once it is installed, open julia in a terminal (or in Juno), press ] to activate the package manager, and add the following packages:
pkg> add HurdleDMR GLM Lasso
Next, install PyJulia; see the PyJulia documentation for installation instructions.
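Because I use Miniconda, I also had to run the following, but you might not need to:
from julia.api import Julia
jl = Julia(compiled_modules=False)  # PyJulia workaround needed by some Python builds, e.g. conda's
If you do not need the workaround, still run the import and create the interpreter with jl = Julia(), since jl is used below.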
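Next, start some worker processes and load HurdleDMR on each of them, so that the per-category regressions can run in parallel:
jl.eval("using Distributed")              # Julia's standard library for multiprocessing
from julia.Distributed import addprocs
addprocs(4)                               # add 4 local worker processes
from julia import HurdleDMR as hd         # HurdleDMR functions, called from Python as hd.<name>
jl.eval("@everywhere using HurdleDMR")    # also load HurdleDMR on every worker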
Set up your data as an n-by-p covars matrix and a (sparse) n-by-d counts matrix. Here we generate some random data:
import numpy as np
from scipy import sparse
n = 100  # observations
p = 3    # covariates
d = 4    # categories (columns of counts)
np.random.seed(123)
m = 1 + np.random.poisson(5, n)           # total count for each observation
covars = np.random.uniform(0, 1, (n, p))
# category weights proportional to j * (row sum of covars), normalized into probabilities
q = [[j*sum(covars[i,:]) for j in range(d)] for i in range(n)]
q = [np.array(q[i])/sum(q[i]) for i in range(n)]
counts = np.array([np.random.multinomial(m[i], q[i]) for i in range(n)])
# scipy alternative (not used below): sparse.csr_matrix(counts)
counts
array([[0, 2, 3, 3], [0, 2, 2, 2], [0, 1, 2, 2], [0, 1, 1, 7], [0, 1, 1, 3], [0, 1, 1, 7], [0, 1, 2, 5], [0, 1, 0, 5], [0, 1, 5, 4], [0, 0, 1, 4], [0, 1, 2, 1], [0, 0, 3, 2], [0, 2, 3, 3], [0, 0, 2, 7], [0, 1, 5, 0], [0, 0, 2, 5], [0, 1, 0, 4], [0, 0, 1, 3], [0, 1, 3, 4], [0, 1, 3, 5], [0, 1, 1, 3], [0, 1, 0, 6], [0, 0, 2, 5], [0, 1, 3, 4], [0, 1, 3, 3], [0, 0, 2, 4], [0, 0, 1, 2], [0, 1, 0, 3], [0, 0, 2, 2], [0, 0, 2, 6], [0, 1, 1, 3], [0, 1, 0, 6], [0, 1, 2, 3], [0, 1, 2, 4], [0, 2, 0, 5], [0, 1, 1, 2], [0, 0, 2, 3], [0, 1, 1, 3], [0, 2, 2, 3], [0, 1, 6, 0], [0, 0, 0, 5], [0, 0, 2, 3], [0, 0, 3, 0], [0, 0, 2, 3], [0, 0, 2, 3], [0, 2, 2, 0], [0, 0, 2, 1], [0, 3, 3, 6], [0, 0, 1, 3], [0, 0, 1, 1], [0, 2, 1, 1], [0, 0, 2, 5], [0, 1, 3, 3], [0, 0, 0, 4], [0, 2, 0, 2], [0, 2, 2, 6], [0, 2, 1, 4], [0, 2, 1, 5], [0, 0, 1, 2], [0, 1, 2, 2], [0, 1, 1, 4], [0, 0, 1, 7], [0, 0, 4, 6], [0, 0, 2, 2], [0, 1, 1, 3], [0, 0, 0, 5], [0, 2, 2, 2], [0, 0, 1, 3], [0, 2, 2, 1], [0, 0, 2, 3], [0, 1, 1, 8], [0, 1, 1, 1], [0, 0, 2, 3], [0, 0, 2, 5], [0, 1, 2, 3], [0, 2, 2, 2], [0, 2, 2, 1], [0, 4, 1, 2], [0, 3, 0, 4], [0, 0, 3, 1], [0, 0, 0, 5], [0, 0, 4, 1], [0, 2, 1, 2], [0, 2, 4, 3], [0, 2, 3, 5], [0, 0, 0, 8], [0, 1, 0, 4], [0, 1, 0, 3], [0, 1, 3, 2], [0, 1, 1, 6], [0, 0, 1, 5], [0, 0, 1, 4], [0, 0, 3, 2], [0, 1, 3, 2], [0, 3, 0, 3], [0, 2, 7, 5], [0, 0, 4, 3], [0, 3, 1, 1], [0, 0, 3, 3], [0, 2, 3, 0]])
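A quick sanity check on how the data above are simulated. This is a standalone sketch with its own random generator, so it does not alter the tutorial's variables. Every unnormalized weight in a row is j times the same row sum, so normalizing cancels the covariates and each row of q equals (0, 1/6, 2/6, 3/6). In this example the counts are therefore unrelated to covars, and the first category is never observed, which matches the all-zero first column printed above.

```python
import numpy as np

rng = np.random.default_rng(123)   # separate generator: leaves the tutorial's data untouched
x = rng.uniform(0, 1, (100, 3))    # stand-in covariates
n_cat = 4
# same recipe as above: weight j * (row sum of covariates), then normalize each row
weights = np.array([[j * x[i, :].sum() for j in range(n_cat)] for i in range(len(x))])
probs = weights / weights.sum(axis=1, keepdims=True)

expected = np.arange(n_cat) / np.arange(n_cat).sum()   # (0, 1/6, 2/6, 3/6)
print(np.allclose(probs, expected))                    # True
```

To simulate counts that do depend on the covariates, the normalized probabilities would need to vary across rows, for example through a softmax of covariate-dependent scores.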
The Distributed Multinomial Regression (DMR) model of Taddy (2015) is a highly scalable approximation to the multinomial using distributed (independent, parallel) Poisson regressions, one for each of the d categories (columns) of a large counts matrix, on the covars.
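To make the idea concrete, here is a minimal numpy sketch of the distributed part: one independent Poisson regression per column of counts, with log(row total) as an offset, as in Taddy (2015). It is a simplification and not HurdleDMR's code. The regressions are unpenalized (no gamma lasso path), fitted by plain Newton steps, and run in a serial loop rather than across workers. It also assumes every row total is positive. The helper names poisson_newton and dmr_sketch are hypothetical.

```python
import numpy as np

def poisson_newton(X, y, offset, max_iter=50, tol=1e-10):
    """Unpenalized Poisson regression (log link) with a fixed offset, fitted by Newton's method.
    X must include an intercept column; y must contain at least one positive count."""
    beta = np.zeros(X.shape[1])
    # start from the intercept-only MLE so the first Newton step is well behaved
    beta[0] = np.log(y.sum() / np.exp(offset).sum())
    for _ in range(max_iter):
        mu = np.exp(offset + X @ beta)
        grad = X.T @ (y - mu)               # score
        hess = X.T @ (X * mu[:, None])      # Fisher information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def dmr_sketch(covars, counts):
    """One independent Poisson regression per column of counts, offset by log(row total).
    Returns a (p+1) x d matrix: intercepts in row 0, covariate loadings below."""
    n, p = covars.shape
    m = counts.sum(axis=1)                  # row totals, assumed positive
    X = np.column_stack([np.ones(n), covars])
    coefs = np.zeros((p + 1, counts.shape[1]))
    for j in range(counts.shape[1]):        # columns are independent, so this loop could run in parallel
        y = counts[:, j]
        if y.sum() > 0:                     # never-observed categories keep zero coefficients
            coefs[:, j] = poisson_newton(X, y, np.log(m))
    return coefs

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, (3000, 2))
c = rng.multinomial(10, [0.0, 0.2, 0.3, 0.5], size=3000)
coefs = dmr_sketch(x, c)
print(coefs.round(2))
```

Because each column is fitted separately, the loop could be handed to a process pool. That is the kind of parallelism the Julia worker processes provide for HurdleDMR.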
To fit a DMR:
m = hd.dmr(covars, counts)
We can get the coefficient matrix, with an intercept row followed by one row per covariate and one column per category, as usual with:
hd.coef(m)
array([[ 0. , -1.94507425, -1.28828706, -0.59041306], [ 0. , 0.1056461 , 0. , 0. ], [ 0. , 0. , 0. , 0. ], [ 0. , 0. , 0.1268672 , 0. ]])
By default, we only return the AICc-optimal (minimizing) coefficients. To also get back the entire regularization paths, run:
paths = hd.dmrpaths(covars, counts)
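For reference, AICc (the corrected Akaike information criterion) scores each segment of a path as -2 log L + 2 df + 2 df (df + 1) / (n - df - 1). Here df is the segment's degrees of freedom (for a plain lasso, the number of nonzero coefficients) and n is the number of observations. The default picks the segment with the smallest score. Below is a minimal sketch of that rule, assuming you already have each segment's -2 log-likelihood and df. The path values are made up for illustration, and this is not Lasso.jl's implementation.

```python
import numpy as np

def aicc(neg2loglik, df, n):
    """Corrected AIC: -2 log L + 2 df + 2 df (df + 1) / (n - df - 1)."""
    df = np.asarray(df, dtype=float)
    return np.asarray(neg2loglik, dtype=float) + 2 * df + 2 * df * (df + 1) / (n - df - 1)

def min_aicc_segment(neg2loglik, df, n):
    """Index of the path segment with the smallest AICc."""
    return int(np.argmin(aicc(neg2loglik, df, n)))

# hypothetical path: the fit improves as coefficients enter, with diminishing returns
neg2loglik = [250.0, 230.0, 222.0, 220.5, 220.0]
df = [1, 2, 3, 4, 5]
print(min_aicc_segment(neg2loglik, df, n=100))  # 2
```

The penalty grows with df, so AICc stops adding coefficients once the improvement in fit no longer pays for them. Here that happens after the third segment.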
We can now select, for example, the coefficients that minimize the 10-fold cross-validation MSE (this takes a while):
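jl.eval("using Lasso: MinCVmse")            # make MinCVmse available in Julia's Main module
from julia import Lasso                      # Lasso.jl module, also used below for AllSeg
gen = jl.eval("MinCVKfold{MinCVmse}(10)")    # segment selector: minimum 10-fold CV MSE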
hd.coef(paths, gen)
array([[ 0.00000000e+00, -1.89167038e+00, -1.22050226e+00, -5.90413062e-01], [ 0.00000000e+00, 3.18787704e-11, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 2.97862348e-07, 0.00000000e+00]])
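The selector MinCVKfold{MinCVmse}(10) is based on k-fold cross-validation. The path is refit with each fold held out, the held-out MSE of every segment is computed, and the segment with the smallest average is kept. Below is a generic numpy sketch of that procedure. For brevity it assumes a closed-form ridge-regression path without an intercept as a stand-in for the gamma-lasso Poisson path. Both function names are hypothetical.

```python
import numpy as np

def ridge_path(X, y, lambdas):
    """Closed-form ridge coefficients for each penalty (a stand-in for a lasso path)."""
    p = X.shape[1]
    return [np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y) for lam in lambdas]

def min_cv_mse_segment(X, y, lambdas, k=10, seed=0):
    """Index of the penalty with the smallest mean out-of-fold MSE."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    mse = np.zeros((k, len(lambdas)))
    for f, test in enumerate(folds):
        train = np.setdiff1d(np.arange(len(y)), test)
        # refit the whole path without fold f, then score every segment on fold f
        for s, beta in enumerate(ridge_path(X[train], y[train], lambdas)):
            mse[f, s] = np.mean((y[test] - X[test] @ beta) ** 2)
    return int(np.argmin(mse.mean(axis=0)))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)
lambdas = [1e4, 1e2, 1.0, 1e-2]
best = min_cv_mse_segment(X, y, lambdas, k=10)
print(lambdas[best])
```

Each fold requires refitting the full path, which is why CV-based selection takes noticeably longer than AICc.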
For highly sparse counts, as is often the case with text that is selected for various reasons, the Hurdle Distributed Multinomial Regression (HDMR) model of Kelly, Manela, and Moreira (2018) may be superior to the DMR. It approximates a higher-dispersion multinomial using distributed (independent, parallel) hurdle regressions, one for each of the d categories (columns) of a large counts matrix, on the covars. It allows potentially different sets of covariates to explain category inclusion ($h = \mathbf{1}\{c > 0\}$) and repetition ($c > 0$).
Both the model for zeros and the model for positive counts are regularized by default using GammaLassoPath, which picks the AICc-optimal segment of the regularization path.
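To make the two parts of a hurdle model concrete, here is a sketch of the log-likelihood for a single category. It combines a binary model for inclusion h = 1{c > 0} with a model for the positive counts, each using its own covariate matrix. The specific choices are assumptions for illustration: logit inclusion probabilities, a zero-truncated Poisson for positive counts, and no offsets or penalties. HurdleDMR's own specifications may differ.

```python
import numpy as np
from scipy.special import expit, gammaln

def hurdle_loglik(c, Xzero, delta, Xpos, phi):
    """Log-likelihood of one category's counts c under a simple hurdle model:
    inclusion h = 1{c > 0} ~ Bernoulli(expit(Xzero @ delta)) and, when c > 0,
    c ~ zero-truncated Poisson with rate exp(Xpos @ phi)."""
    h = c > 0
    pi = expit(Xzero @ delta)
    # inclusion part: log pi where the hurdle is crossed, log(1 - pi) otherwise
    ll_zero = np.sum(np.where(h, np.log(pi), np.log1p(-pi)))
    lam = np.exp(Xpos[h] @ phi)
    cp = c[h]
    # positive part: log of lam^c e^(-lam) / (c! (1 - e^(-lam))), with 1 - e^(-lam) = -expm1(-lam)
    ll_pos = np.sum(cp * np.log(lam) - lam - gammaln(cp + 1) - np.log(-np.expm1(-lam)))
    return ll_zero + ll_pos

# one observation with an intercept-only design: probabilities over c = 0, 1, 2, ... sum to 1
Xz, Xp = np.ones((1, 1)), np.ones((1, 1))
total = sum(np.exp(hurdle_loglik(np.array([k]), Xz, np.array([0.3]), Xp, np.array([0.5])))
            for k in range(60))
print(round(total, 10))  # 1.0
```

The two terms share no parameters, so each can be maximized separately. This is what lets the zero and positive models use different covariates.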
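To fit an HDMR, here using only the first two covariates in the model for positive counts and all three in the model for zeros (inpos and inzero take Julia-style, 1-based column indices of covars):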
m = hd.hdmr(covars, counts, inpos=[1,2], inzero=[1,2,3])
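As before, hd.coef returns the coefficients, now as a pair of matrices: one for the model of positive counts and one for the model of zeros, each with an intercept row followed by one row per included covariate: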
coefspos, coefszero = hd.coef(m)
print("coefspos:\n", coefspos)
print("coefszero:\n", coefszero)
coefspos:
 [[ 0.         -2.18288411 -1.18060442 -0.41828599]
 [ 0.          0.33062404  0.          0.        ]
 [ 0.          0.          0.02997958  0.08338104]]
coefszero:
 [[ 0.          0.04616614  1.36309252  3.07912436]
 [ 0.          0.          0.         -0.67010761]
 [ 0.          0.          0.          0.        ]
 [ 0.          0.          0.          0.        ]]
By default, we only return the AICc-optimal (minimizing) coefficients. To also get back the entire regularization paths, and the coefficients for every segment along them, run:
paths = hd.hdmrpaths(covars, counts)
hd.coef(paths, Lasso.AllSeg())
(array([[[ 0.00000000e+00, -2.02133392e+00, -1.16575159e+00, -3.76235470e-01], [ 0.00000000e+00, 2.90768861e-08, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 1.88213201e-12, 1.36664985e-10], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.05931209e+00, -1.17116227e+00, -3.80396163e-01], [ 0.00000000e+00, 7.93643648e-02, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 1.09392297e-02, 8.30076060e-03], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.09423553e+00, -1.17609960e+00, -3.84191708e-01], [ 0.00000000e+00, 1.51431219e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 2.09033261e-02, 1.58632356e-02], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.12633252e+00, -1.18060442e+00, -3.87653804e-01], [ 0.00000000e+00, 2.16918343e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 2.99795817e-02, 2.27532006e-02], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.15581588e+00, -1.18671462e+00, -3.90811453e-01], [ 0.00000000e+00, 2.76460942e-01, 4.61259056e-03, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 3.76960464e-02, 2.90305459e-02], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.18288411e+00, -1.19351483e+00, -3.93691184e-01], [ 0.00000000e+00, 3.30624044e-01, 1.16365381e-02, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 4.43915698e-02, 3.47498089e-02], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.18509703e+00, -1.19971939e+00, -3.96317254e-01], [ 0.00000000e+00, 3.87485773e-01, 1.80355614e-02, 0.00000000e+00], [ 0.00000000e+00, -5.11025873e-02, 5.04946902e-02, 3.99606571e-02], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.18608377e+00, -1.20537987e+00, -3.98711836e-01], [ 
0.00000000e+00, 4.39887722e-01, 2.38654722e-02, 0.00000000e+00], [ 0.00000000e+00, -1.00972759e-01, 5.60576447e-02, 4.47083262e-02], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.18722497e+00, -1.21054341e+00, -4.00895193e-01], [ 0.00000000e+00, 4.87737174e-01, 2.91769653e-02, 0.00000000e+00], [ 0.00000000e+00, -1.46676441e-01, 6.11281004e-02, 4.90340183e-02], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.18846599e+00, -1.21525323e+00, -4.02885839e-01], [ 0.00000000e+00, 5.31416284e-01, 3.40163139e-02, 0.00000000e+00], [ 0.00000000e+00, -1.88537708e-01, 6.57495256e-02, 5.29752634e-02], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.18976444e+00, -1.21954865e+00, -4.04700683e-01], [ 0.00000000e+00, 5.71279194e-01, 3.84251907e-02, 0.00000000e+00], [ 0.00000000e+00, -2.26859921e-01, 6.99616237e-02, 5.65662484e-02], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.19108719e+00, -1.22346595e+00, -4.06355168e-01], [ 0.00000000e+00, 6.07651872e-01, 4.24422459e-02, 0.00000000e+00], [ 0.00000000e+00, -2.61926040e-01, 7.38005241e-02, 5.98381161e-02], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.19240871e+00, -1.22703811e+00, -4.07863396e-01], [ 0.00000000e+00, 6.40833982e-01, 4.61022281e-02, 0.00000000e+00], [ 0.00000000e+00, -2.93999344e-01, 7.72992307e-02, 6.28192369e-02], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.19238025e+00, -1.23029533e+00, -4.09238237e-01], [ 0.00000000e+00, 6.70631777e-01, 4.94369808e-02, 0.00000000e+00], [ 0.00000000e+00, -3.23271113e-01, 8.04878185e-02, 6.55354565e-02], [ 0.00000000e+00, -2.16424573e-03, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.18350673e+00, -1.23326513e+00, -4.10491440e-01], [ 0.00000000e+00, 6.94673188e-01, 5.24752660e-02, 0.00000000e+00], [ 
0.00000000e+00, -3.49641464e-01, 8.33937415e-02, 6.80103213e-02], [ 0.00000000e+00, -1.87391863e-02, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.17555276e+00, -1.23597274e+00, -4.11633728e-01], [ 0.00000000e+00, 7.16571980e-01, 5.52435290e-02, 0.00000000e+00], [ 0.00000000e+00, -3.73689671e-01, 8.60420042e-02, 7.02652830e-02], [ 0.00000000e+00, -3.37752323e-02, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.16841470e+00, -1.23844115e+00, -4.12674883e-01], [ 0.00000000e+00, 7.36519747e-01, 5.77657182e-02, 0.00000000e+00], [ 0.00000000e+00, -3.95617699e-01, 8.84554223e-02, 7.23198857e-02], [ 0.00000000e+00, -4.74203130e-02, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.16200061e+00, -1.23950189e+00, -4.13623832e-01], [ 0.00000000e+00, 7.54688730e-01, 5.97651354e-02, 0.00000000e+00], [ 0.00000000e+00, -4.15610148e-01, 9.07380663e-02, 7.41919352e-02], [ 0.00000000e+00, -5.98077251e-02, -2.01295591e-03, 0.00000000e+00]], [[ 0.00000000e+00, -2.15623264e+00, -1.24027038e+00, -4.14488717e-01], [ 0.00000000e+00, 7.71240849e-01, 6.15374039e-02, 0.00000000e+00], [ 0.00000000e+00, -4.33836708e-01, 9.28323970e-02, 7.58976545e-02], [ 0.00000000e+00, -7.10563849e-02, -4.18515353e-03, 0.00000000e+00]], [[ 0.00000000e+00, -2.15103859e+00, -1.24097200e+00, -4.15276967e-01], [ 0.00000000e+00, 7.86316683e-01, 6.31524154e-02, 0.00000000e+00], [ 0.00000000e+00, -4.50451628e-01, 9.47411754e-02, 7.74518242e-02], [ 0.00000000e+00, -8.12746149e-02, -6.16430919e-03, 0.00000000e+00]], [[ 0.00000000e+00, -2.14635745e+00, -1.24161255e+00, -4.15995356e-01], [ 0.00000000e+00, 8.00048953e-01, 6.46243290e-02, 0.00000000e+00], [ 0.00000000e+00, -4.65596577e-01, 9.64807535e-02, 7.88679109e-02], [ 0.00000000e+00, -9.05591315e-02, -7.96755219e-03, 0.00000000e+00]], [[ 0.00000000e+00, -2.14213433e+00, -1.24219749e+00, -4.16500855e-01], [ 0.00000000e+00, 8.12556610e-01, 6.59662780e-02, 0.00000000e+00], [ 0.00000000e+00, -4.79400722e-01, 9.80660122e-02, 8.01228005e-02], [ 
0.00000000e+00, -9.89975077e-02, -9.61045464e-03, -2.53526513e-04]], [[ 0.00000000e+00, -2.13832223e+00, -1.24273090e+00, -4.16524455e-01], [ 0.00000000e+00, 8.23950636e-01, 6.71883457e-02, 0.00000000e+00], [ 0.00000000e+00, -4.91982430e-01, 9.95108644e-02, 8.11625791e-02], [ 0.00000000e+00, -1.06668271e-01, -1.11074815e-02, -1.22738202e-03]], [[ 0.00000000e+00, -2.13487848e+00, -1.24321785e+00, -4.16546101e-01], [ 0.00000000e+00, 8.34330240e-01, 6.83025049e-02, 0.00000000e+00], [ 0.00000000e+00, -5.03449495e-01, 1.00827492e-01, 8.21099308e-02], [ 0.00000000e+00, -1.13642695e-01, -1.24714014e-02, -2.11472118e-03]], [[ 0.00000000e+00, -2.13176475e+00, -1.24366215e+00, -4.16566026e-01], [ 0.00000000e+00, 8.43784853e-01, 6.93178643e-02, 0.00000000e+00], [ 0.00000000e+00, -5.13900160e-01, 1.02027331e-01, 8.29732207e-02], [ 0.00000000e+00, -1.19985376e-01, -1.37141128e-02, -2.92321139e-03]], [[ 0.00000000e+00, -2.12894736e+00, 0.00000000e+00, -4.16584271e-01], [ 0.00000000e+00, 8.52396617e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -5.23424173e-01, 0.00000000e+00, 8.37597669e-02], [ 0.00000000e+00, -1.25754604e-01, 0.00000000e+00, -3.65987824e-03]], [[ 0.00000000e+00, -2.12639732e+00, 0.00000000e+00, -4.16601040e-01], [ 0.00000000e+00, 8.60242101e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -5.32103692e-01, 0.00000000e+00, 8.44765171e-02], [ 0.00000000e+00, -1.31002791e-01, 0.00000000e+00, -4.33108646e-03]], [[ 0.00000000e+00, -2.12408719e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 8.67388179e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -5.40013206e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -1.35777947e-01, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.12199430e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 8.73898999e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -5.47221165e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -1.40122941e-01, 0.00000000e+00, 0.00000000e+00]], [[ 
0.00000000e+00, -2.12009624e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 8.79829144e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -5.53789376e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -1.44077371e-01, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.11837504e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 8.85232119e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -5.59774802e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -1.47676414e-01, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.11681330e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 8.90154267e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -5.65228992e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -1.50952442e-01, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.11539572e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 8.94638435e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -5.70199047e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -1.53934713e-01, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.11410858e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 8.98723642e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -5.74727898e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -1.56649788e-01, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.11293952e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 9.02445426e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -5.78854679e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -1.59121790e-01, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.11187742e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 9.05836154e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -5.82615060e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -1.61372631e-01, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.11091236e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 
9.08925514e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -5.86041586e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -1.63422184e-01, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.11003505e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 9.11739915e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -5.89163815e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -1.65288631e-01, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.10923735e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 9.14303869e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -5.92008759e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -1.66988416e-01, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.10851208e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 9.16640013e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -5.94601088e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -1.68536426e-01, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.10785258e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 9.18768670e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -5.96963235e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -1.69946260e-01, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, -2.10725249e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 9.20707718e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -5.99115538e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -1.71230413e-01, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 
0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]]]), array([[[ 0.00000000e+00, 4.61661434e-02, 1.36309252e+00, 2.70806114e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -2.06142611e-05], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]], [[ 0.00000000e+00, 2.85616460e-02, 1.31383731e+00, 2.79335381e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -1.59061397e-01], [ 0.00000000e+00, 3.38630981e-02, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 9.36637611e-02, 0.00000000e+00]], [[ 0.00000000e+00, 1.25399866e-02, 1.26935403e+00, 2.87277112e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -3.04323034e-01], [ 0.00000000e+00, 6.47156833e-02, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 1.79023813e-01, 0.00000000e+00]], [[ 0.00000000e+00, -2.04337784e-03, 1.22912935e+00, 2.94665932e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -4.37162628e-01], [ 0.00000000e+00, 9.28269244e-02, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 2.56859150e-01, 0.00000000e+00]], [[ 0.00000000e+00, -1.53193133e-02, 1.19271630e+00, 3.01534195e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -5.58750354e-01], [ 0.00000000e+00, 1.18441468e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 3.27862418e-01, 0.00000000e+00]], [[ 0.00000000e+00, -2.74064001e-02, 1.15972356e+00, 3.07912436e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -6.70107611e-01], [ 0.00000000e+00, 1.41781788e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 3.92652538e-01, 0.00000000e+00]], [[ 0.00000000e+00, -3.84121692e-02, 1.12980669e+00, 3.16724819e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -7.80647397e-01], [ 0.00000000e+00, 1.63050287e-01, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 4.51785276e-01, 
-4.66554241e-02]], [[ 0.00000000e+00, -4.20435289e-02, 1.10266103e+00, 3.32311386e+00], [ 0.00000000e+00, -1.24407718e-02, 0.00000000e+00, -8.94467551e-01], [ 0.00000000e+00, 1.82447093e-01, 0.00000000e+00, -7.28615975e-02], [ 0.00000000e+00, 0.00000000e+00, 5.05761709e-01, -1.43806892e-01]], [[ 0.00000000e+00, -4.22947674e-02, 1.08817310e+00, 3.48370582e+00], [ 0.00000000e+00, -2.97187640e-02, 0.00000000e+00, -1.00091662e+00], [ 0.00000000e+00, 2.00137850e-01, -1.89963162e-02, -1.66064974e-01], [ 0.00000000e+00, 0.00000000e+00, 5.54656529e-01, -2.34399985e-01]], [[ 0.00000000e+00, -4.25159707e-02, 1.09214696e+00, 3.63334616e+00], [ 0.00000000e+00, -4.54694266e-02, 0.00000000e+00, -1.09944000e+00], [ 0.00000000e+00, 2.16267024e-01, -6.81794362e-02, -2.52240140e-01], [ 0.00000000e+00, 0.00000000e+00, 5.98688726e-01, -3.17458672e-01]], [[ 0.00000000e+00, -4.27110815e-02, 1.09598990e+00, 3.77260703e+00], [ 0.00000000e+00, -5.98277741e-02, 0.00000000e+00, -1.19056324e+00], [ 0.00000000e+00, 2.30972233e-01, -1.13162886e-01, -3.31887477e-01], [ 0.00000000e+00, 0.00000000e+00, 6.38926994e-01, -3.93632966e-01]], [[ 0.00000000e+00, -4.28834834e-02, 1.09967816e+00, 3.90204279e+00], [ 0.00000000e+00, -7.29167919e-02, 0.00000000e+00, -1.27477720e+00], [ 0.00000000e+00, 2.44378916e-01, -1.54302018e-01, -4.05464127e-01], [ 0.00000000e+00, 0.00000000e+00, 6.75698055e-01, -4.63502203e-01]], [[ 0.00000000e+00, -4.30360850e-02, 1.10319550e+00, 4.02219279e+00], [ 0.00000000e+00, -8.48485452e-02, 0.00000000e+00, -1.35254253e+00], [ 0.00000000e+00, 2.56601440e-01, -1.91920779e-01, -4.73391855e-01], [ 0.00000000e+00, 0.00000000e+00, 7.09299114e-01, -5.27587358e-01]], [[ 0.00000000e+00, -3.82712274e-02, 1.10653190e+00, 4.13358329e+00], [ 0.00000000e+00, -9.68005407e-02, 0.00000000e+00, -1.42429398e+00], [ 0.00000000e+00, 2.67852884e-01, -2.26315411e-01, -5.36062913e-01], [ 0.00000000e+00, -8.20534596e-03, 7.40001074e-01, -5.86360459e-01]], [[ 0.00000000e+00, -3.15994970e-02, 
1.10968221e+00, 4.23672728e+00], [ 0.00000000e+00, -1.08205050e-01, 0.00000000e+00, -1.49044244e+00], [ 0.00000000e+00, 2.78161708e-01, -2.57757175e-01, -5.93843955e-01], [ 0.00000000e+00, -1.95769068e-02, 7.68051260e-01, -6.40251593e-01]], [[ 0.00000000e+00, -2.55146946e-02, 1.11264511e+00, 4.33212374e+00], [ 0.00000000e+00, -1.18599123e-01, 0.00000000e+00, -1.55137660e+00], [ 0.00000000e+00, 2.87559334e-01, -2.86494678e-01, -6.47078855e-01], [ 0.00000000e+00, -2.99430237e-02, 7.93675740e-01, -6.89654330e-01]], [[ 0.00000000e+00, -1.99650707e-02, 1.11542225e+00, 4.42021706e+00], [ 0.00000000e+00, -1.28073136e-01, 0.00000000e+00, -1.60741931e+00], [ 0.00000000e+00, 2.96126217e-01, -3.12755904e-01, -6.96078474e-01], [ 0.00000000e+00, -3.93925449e-02, 8.17081326e-01, -7.34922008e-01]], [[ 0.00000000e+00, -1.49038722e-02, 1.11801753e+00, 4.50155802e+00], [ 0.00000000e+00, -1.36708468e-01, 0.00000000e+00, -1.65901256e+00], [ 0.00000000e+00, 3.03935620e-01, -3.36749999e-01, -7.41172412e-01], [ 0.00000000e+00, -4.80063459e-02, 8.38457334e-01, -7.76403840e-01]], [[ 0.00000000e+00, -1.02887156e-02, 1.12043652e+00, 4.57655239e+00], [ 0.00000000e+00, -1.44578676e-01, 0.00000000e+00, -1.70643374e+00], [ 0.00000000e+00, 3.11054287e-01, -3.58668855e-01, -7.82632585e-01], [ 0.00000000e+00, -5.58580898e-02, 8.57977122e-01, -8.14397247e-01]], [[ 0.00000000e+00, -6.08037107e-03, 1.12268601e+00, 4.64563080e+00], [ 0.00000000e+00, -1.51751775e-01, 0.00000000e+00, -1.74999079e+00], [ 0.00000000e+00, 3.17543186e-01, -3.78688542e-01, -8.20727078e-01], [ 0.00000000e+00, -6.30150741e-02, 8.75799475e-01, -8.49184614e-01]], [[ 0.00000000e+00, -2.24332009e-03, 1.12477360e+00, 4.70920463e+00], [ 0.00000000e+00, -1.58289182e-01, 0.00000000e+00, -1.78997342e+00], [ 0.00000000e+00, 3.23457859e-01, -3.96970594e-01, -8.55707535e-01], [ 0.00000000e+00, -6.95385938e-02, 8.92069840e-01, -8.81026555e-01]], [[ 0.00000000e+00, 1.25539588e-03, 1.12670744e+00, 4.76766418e+00], [ 0.00000000e+00, 
-1.64247867e-01, 0.00000000e+00, -1.82665222e+00], [ 0.00000000e+00, 3.28849064e-01, -4.13663185e-01, -8.87809443e-01], [ 0.00000000e+00, -7.54846745e-02, 9.06921443e-01, -9.10163355e-01]], [[ 0.00000000e+00, 4.44509906e-03, 1.12849596e+00, 4.82138310e+00], [ 0.00000000e+00, -1.69678320e-01, 0.00000000e+00, -1.86028452e+00], [ 0.00000000e+00, 3.33762945e-01, -4.28902200e-01, -9.17254349e-01], [ 0.00000000e+00, -8.09042219e-02, 9.20476306e-01, -9.36817475e-01]], [[ 0.00000000e+00, 7.35305627e-03, 1.13014770e+00, 4.87071040e+00], [ 0.00000000e+00, -1.74627500e-01, 0.00000000e+00, -1.89110658e+00], [ 0.00000000e+00, 3.38241680e-01, -4.42812220e-01, -9.44247692e-01], [ 0.00000000e+00, -8.58437803e-02, 9.32846172e-01, -9.61193345e-01]], [[ 0.00000000e+00, 1.00040425e-02, 1.12883320e+00, 4.91597576e+00], [ 0.00000000e+00, -1.79138000e-01, 4.66945358e-03, -1.91933946e+00], [ 0.00000000e+00, 3.42323714e-01, -4.55409688e-01, -9.68981782e-01], [ 0.00000000e+00, -9.03457673e-02, 9.44779814e-01, -9.83479628e-01]], [[ 0.00000000e+00, 1.24206671e-02, 1.12242638e+00, 4.95748940e+00], [ 0.00000000e+00, -1.83248644e-01, 1.75247911e-02, -1.94518989e+00], [ 0.00000000e+00, 3.46044105e-01, -4.66723824e-01, -9.91635464e-01], [ 0.00000000e+00, -9.44488630e-02, 9.56861949e-01, -1.00385033e+00]], [[ 0.00000000e+00, 1.46234871e-02, 1.11659996e+00, 4.99554147e+00], [ 0.00000000e+00, -1.86994679e-01, 2.92466160e-02, -1.96884932e+00], [ 0.00000000e+00, 3.49434810e-01, -4.77045390e-01, -1.01237514e+00], [ 0.00000000e+00, -9.81883203e-02, 9.67888982e-01, -1.02246561e+00]], [[ 0.00000000e+00, 1.66315708e-02, 1.11130021e+00, 5.03040178e+00], [ 0.00000000e+00, -1.90408819e-01, 3.99346509e-02, -1.99049400e+00], [ 0.00000000e+00, 3.52525030e-01, -4.86460681e-01, -1.03135473e+00], [ 0.00000000e+00, -1.01596360e-01, 9.77951866e-01, -1.03947263e+00]], [[ 0.00000000e+00, 1.84618627e-02, 1.10647945e+00, 5.06232549e+00], [ 0.00000000e+00, -1.93520033e-01, 4.96785042e-02, -2.01029116e+00], [ 
0.00000000e+00, 3.55341303e-01, -4.95048649e-01, -1.04871804e+00], [ 0.00000000e+00, -1.04702251e-01, 9.87133691e-01, -1.05500752e+00]], [[ 0.00000000e+00, 2.01300610e-02, 1.10209333e+00, 5.09154573e+00], [ 0.00000000e+00, -1.96355178e-01, 5.85618462e-02, -2.02839078e+00], [ 0.00000000e+00, 3.57907873e-01, -5.02881363e-01, -1.06459689e+00], [ 0.00000000e+00, -1.07532737e-01, 9.95510780e-01, -1.06919452e+00]], [[ 0.00000000e+00, 0.00000000e+00, 1.09810241e+00, 5.11828176e+00], [ 0.00000000e+00, 0.00000000e+00, 6.66598653e-02, -2.04493444e+00], [ 0.00000000e+00, 0.00000000e+00, -5.10024747e-01, -1.07911401e+00], [ 0.00000000e+00, 0.00000000e+00, 1.00315282e+00, -1.08214837e+00]], [[ 0.00000000e+00, 0.00000000e+00, 1.09446977e+00, 5.14273550e+00], [ 0.00000000e+00, 0.00000000e+00, 7.40431334e-02, -2.06005137e+00], [ 0.00000000e+00, 0.00000000e+00, -5.16538961e-01, -1.09238217e+00], [ 0.00000000e+00, 0.00000000e+00, 1.01012386e+00, -1.09397413e+00]], [[ 0.00000000e+00, 0.00000000e+00, 1.09116391e+00, 5.16509509e+00], [ 0.00000000e+00, 0.00000000e+00, 8.07727873e-02, -2.07386193e+00], [ 0.00000000e+00, 0.00000000e+00, -5.22479109e-01, -1.10450591e+00], [ 0.00000000e+00, 0.00000000e+00, 1.01648199e+00, -1.10476847e+00]], [[ 0.00000000e+00, 0.00000000e+00, 1.08815441e+00, 5.18553297e+00], [ 0.00000000e+00, 0.00000000e+00, 8.69076785e-02, -2.08647533e+00], [ 0.00000000e+00, 0.00000000e+00, -5.27895407e-01, -1.11558111e+00], [ 0.00000000e+00, 0.00000000e+00, 1.02228082e+00, -1.11461975e+00]], [[ 0.00000000e+00, 0.00000000e+00, 1.08541467e+00, 5.20420969e+00], [ 0.00000000e+00, 0.00000000e+00, 9.24998230e-02, -2.09799362e+00], [ 0.00000000e+00, 0.00000000e+00, -5.32833787e-01, -1.12569633e+00], [ 0.00000000e+00, 0.00000000e+00, 1.02756907e+00, -1.12360927e+00]], [[ 0.00000000e+00, 0.00000000e+00, 1.08292030e+00, 5.22127316e+00], [ 0.00000000e+00, 0.00000000e+00, 9.75970846e-02, -2.10851033e+00], [ 0.00000000e+00, 0.00000000e+00, -5.37336182e-01, -1.13493309e+00], [ 
0.00000000e+00, 0.00000000e+00, 1.03239136e+00, -1.13181148e+00]], [[ 0.00000000e+00, 0.00000000e+00, 1.08064899e+00, 5.23685710e+00], [ 0.00000000e+00, 0.00000000e+00, 1.02243417e-01, -2.11810873e+00], [ 0.00000000e+00, 0.00000000e+00, -5.41440875e-01, -1.14336549e+00], [ 0.00000000e+00, 0.00000000e+00, 1.03678851e+00, -1.13929406e+00]], [[ 0.00000000e+00, 0.00000000e+00, 1.07858114e+00, 5.25108969e+00], [ 0.00000000e+00, 0.00000000e+00, 1.06477782e-01, -2.12687077e+00], [ 0.00000000e+00, 0.00000000e+00, -5.45182848e-01, -1.15106314e+00], [ 0.00000000e+00, 0.00000000e+00, 1.04079763e+00, -1.14612002e+00]], [[ 0.00000000e+00, 0.00000000e+00, 1.07669795e+00, 5.26408253e+00], [ 0.00000000e+00, 0.00000000e+00, 1.10337383e-01, -2.13486484e+00], [ 0.00000000e+00, 0.00000000e+00, -5.48593989e-01, -1.15808803e+00], [ 0.00000000e+00, 0.00000000e+00, 1.04445287e+00, -1.15234576e+00]], [[ 0.00000000e+00, 0.00000000e+00, 1.07498285e+00, 5.27594377e+00], [ 0.00000000e+00, 0.00000000e+00, 1.13855278e-01, -2.14215972e+00], [ 0.00000000e+00, 0.00000000e+00, -5.51703426e-01, -1.16449873e+00], [ 0.00000000e+00, 0.00000000e+00, 1.04778529e+00, -1.15802400e+00]], [[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 5.28676967e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -2.14881517e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -1.17034803e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -1.16320235e+00]], [[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 5.29664721e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -2.15488455e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -1.17568375e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -1.16792409e+00]], [[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 5.30565687e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -2.16041717e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -1.18055034e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -1.17222887e+00]], [[ 
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 5.31388116e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -2.16546813e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -1.18499049e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -1.17615477e+00]]]))
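The printed output suggests a particular layout for the returned tuple: positives first, zeros second, each stacking one (p+1)-by-d coefficient matrix per path segment along its first axis, with the intercept in row 0. Under that assumption, a short numpy helper (the name is hypothetical) can count how many covariate loadings are active in each segment. The toy array stands in for the real output.

```python
import numpy as np

def nonzero_loadings_per_segment(path_coefs):
    """For a stack of coefficient matrices shaped (segments, p + 1, d), count the nonzero
    covariate loadings in each segment, ignoring the intercept row 0."""
    return np.count_nonzero(path_coefs[:, 1:, :], axis=(1, 2))

toy = np.zeros((3, 4, 4))
toy[:, 0, 1:] = -1.0      # intercepts, ignored by the count
toy[1, 1, 1] = 0.5
toy[2, 1, 1] = 0.7
toy[2, 3, 2] = -0.1
print(nonzero_loadings_per_segment(toy))  # [0 1 2]

# with the real output (requires the Julia session above):
# coefspos_path, coefszero_path = hd.coef(paths, Lasso.AllSeg())
# nonzero_loadings_per_segment(coefspos_path)
```

A single segment's coefficients are then simply coefspos_path[k], under the same layout assumption.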
A sufficient reduction (SR) projection summarizes the counts, much like a sufficient statistic, and is useful for reducing the d-dimensional counts to a potentially much lower-dimensional matrix z.
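To get an SR projection in the direction of the first covariate for the HDMR example above (the last two arguments pick that covariate in the positive and zero models, using Julia's 1-based indexing):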
z = hd.srproj(m,counts,1,1)
z
array([[ 0.08265601, -0.2233692 , 8. , 3. ], [ 0.11020801, -0.2233692 , 6. , 3. ], [ 0.06612481, -0.2233692 , 5. , 3. ], [ 0.036736 , -0.2233692 , 9. , 3. ], [ 0.06612481, -0.2233692 , 5. , 3. ], [ 0.036736 , -0.2233692 , 9. , 3. ], [ 0.04132801, -0.2233692 , 8. , 3. ], [ 0.05510401, -0.33505381, 6. , 2. ], [ 0.0330624 , -0.2233692 , 10. , 3. ], [ 0. , -0.33505381, 5. , 2. ], [ 0.08265601, -0.2233692 , 4. , 3. ], [ 0. , -0.33505381, 5. , 2. ], [ 0.08265601, -0.2233692 , 8. , 3. ], [ 0. , -0.33505381, 9. , 2. ], [ 0.05510401, 0. , 6. , 2. ], [ 0. , -0.33505381, 7. , 2. ], [ 0.06612481, -0.33505381, 5. , 2. ], [ 0. , -0.33505381, 4. , 2. ], [ 0.04132801, -0.2233692 , 8. , 3. ], [ 0.036736 , -0.2233692 , 9. , 3. ], [ 0.06612481, -0.2233692 , 5. , 3. ], [ 0.04723201, -0.33505381, 7. , 2. ], [ 0. , -0.33505381, 7. , 2. ], [ 0.04132801, -0.2233692 , 8. , 3. ], [ 0.04723201, -0.2233692 , 7. , 3. ], [ 0. , -0.33505381, 6. , 2. ], [ 0. , -0.33505381, 3. , 2. ], [ 0.08265601, -0.33505381, 4. , 2. ], [ 0. , -0.33505381, 4. , 2. ], [ 0. , -0.33505381, 8. , 2. ], [ 0.06612481, -0.2233692 , 5. , 3. ], [ 0.04723201, -0.33505381, 7. , 2. ], [ 0.05510401, -0.2233692 , 6. , 3. ], [ 0.04723201, -0.2233692 , 7. , 3. ], [ 0.09446401, -0.33505381, 7. , 2. ], [ 0.08265601, -0.2233692 , 4. , 3. ], [ 0. , -0.33505381, 5. , 2. ], [ 0.06612481, -0.2233692 , 5. , 3. ], [ 0.09446401, -0.2233692 , 7. , 3. ], [ 0.04723201, 0. , 7. , 2. ], [ 0. , -0.67010761, 5. , 1. ], [ 0. , -0.33505381, 5. , 2. ], [ 0. , 0. , 3. , 1. ], [ 0. , -0.33505381, 5. , 2. ], [ 0. , -0.33505381, 5. , 2. ], [ 0.16531202, 0. , 4. , 2. ], [ 0. , -0.33505381, 3. , 2. ], [ 0.08265601, -0.2233692 , 12. , 3. ], [ 0. , -0.33505381, 4. , 2. ], [ 0. , -0.33505381, 2. , 2. ], [ 0.16531202, -0.2233692 , 4. , 3. ], [ 0. , -0.33505381, 7. , 2. ], [ 0.04723201, -0.2233692 , 7. , 3. ], [ 0. , -0.67010761, 4. , 1. ], [ 0.16531202, -0.33505381, 4. , 2. ], [ 0.06612481, -0.2233692 , 10. , 3. ], [ 0.09446401, -0.2233692 , 7. , 3. ], [ 0.08265601, -0.2233692 , 8. , 3. ], [ 0. , -0.33505381, 3. , 2. ], [ 0.06612481, -0.2233692 , 5. , 3. ], [ 0.05510401, -0.2233692 , 6. , 3. ], [ 0. , -0.33505381, 8. , 2. ], [ 0. , -0.33505381, 10. , 2. ], [ 0. , -0.33505381, 4. , 2. ], [ 0.06612481, -0.2233692 , 5. , 3. ], [ 0. , -0.67010761, 5. , 1. ], [ 0.11020801, -0.2233692 , 6. , 3. ], [ 0. , -0.33505381, 4. , 2. ], [ 0.13224962, -0.2233692 , 5. , 3. ], [ 0. , -0.33505381, 5. , 2. ], [ 0.0330624 , -0.2233692 , 10. , 3. ], [ 0.11020801, -0.2233692 , 3. , 3. ], [ 0. , -0.33505381, 5. , 2. ], [ 0. , -0.33505381, 7. , 2. ], [ 0.05510401, -0.2233692 , 6. , 3. ], [ 0.11020801, -0.2233692 , 6. , 3. ], [ 0.13224962, -0.2233692 , 5. , 3. ], [ 0.18892802, -0.2233692 , 7. , 3. ], [ 0.14169602, -0.33505381, 7. , 2. ], [ 0. , -0.33505381, 4. , 2. ], [ 0. , -0.67010761, 5. , 1. ], [ 0. , -0.33505381, 5. , 2. ], [ 0.13224962, -0.2233692 , 5. , 3. ], [ 0.07347201, -0.2233692 , 9. , 3. ], [ 0.06612481, -0.2233692 , 10. , 3. ], [ 0. , -0.67010761, 8. , 1. ], [ 0.06612481, -0.33505381, 5. , 2. ], [ 0.08265601, -0.33505381, 4. , 2. ], [ 0.05510401, -0.2233692 , 6. , 3. ], [ 0.04132801, -0.2233692 , 8. , 3. ], [ 0. , -0.33505381, 6. , 2. ], [ 0. , -0.33505381, 5. , 2. ], [ 0. , -0.33505381, 5. , 2. ], [ 0.05510401, -0.2233692 , 6. , 3. ], [ 0.16531202, -0.33505381, 6. , 2. ], [ 0.04723201, -0.2233692 , 14. , 3. ], [ 0. , -0.33505381, 7. , 2. ], [ 0.19837443, -0.2233692 , 5. , 3. ], [ 0. , -0.33505381, 6. , 2. ], [ 0.13224962, 0. , 5. , 2. ]])
Here, the first column is the SR projection from the model for positive counts, the second is the SR projection from the model for hurdle crossing (zeros), the third is the total count for each observation, and the fourth is the number of categories with positive counts in each observation.
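As far as we can tell from the numbers above, the first two columns are normalized weighted sums: zpos is the counts weighted by the positive-model loadings on covariate 1, divided by the total count m, and zzero is the hurdle indicators (count > 0) weighted by the zero-model loadings, divided by the number of positive categories. The numpy sketch below encodes that assumed formula using the covariate-1 loadings printed by hd.coefbwd(cir) later in this tutorial; the helper name sr_projection is hypothetical and this is not the package's own implementation, only a check that reproduces the first rows of z.

```python
import numpy as np

# First three rows of the counts matrix printed earlier in this tutorial
counts = np.array([[0, 2, 3, 3],
                   [0, 2, 2, 2],
                   [0, 1, 2, 2]])

# Covariate-1 rows of the backward coefficients (dirpos=1, dirzero=1),
# copied from the hd.coefbwd(cir) output in this tutorial
phi_pos = np.array([0.0, 0.33062404, 0.0, 0.0])     # positive-count model
phi_zero = np.array([0.0, 0.0, 0.0, -0.67010761])   # hurdle (zero) model


def sr_projection(counts, phi_pos, phi_zero):
    """Hypothetical helper: assumed HDMR SR projection [zpos, zzero, m, l]."""
    counts = np.asarray(counts, dtype=float)
    included = (counts > 0).astype(float)   # hurdle-crossing indicators
    m = counts.sum(axis=1)                  # total count per observation
    l = included.sum(axis=1)                # number of categories with positive counts
    # Normalize by m and l, leaving 0 for empty rows instead of dividing by zero
    zpos = np.divide(counts @ phi_pos, m, out=np.zeros_like(m), where=m > 0)
    zzero = np.divide(included @ phi_zero, l, out=np.zeros_like(l), where=l > 0)
    return np.column_stack([zpos, zzero, m, l])


z = sr_projection(counts, phi_pos, phi_zero)
print(np.round(z, 8))  # should agree with the first three rows of z above
```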
Counts inverse regression (CIR) allows us to predict a covariate using the counts and the other covariates. Here we use HDMR for the backward regression and another model for the forward regression. This can be accomplished with a single command, by fitting a CIR{HDMR,FM}, where the forward model FM <: RegressionModel.
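from julia import Lasso  # makes Lasso.MinBIC() available in Python
jl.eval("using GLM: LinearModel")
spec = jl.eval("CIR{HDMR,LinearModel}")
# projdir=1: the first covariate is the one we want to predict
cir = hd.fit(spec, covars, counts, 1,
             select=Lasso.MinBIC(), nocounts=True)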
cir
<PyCall.jlwrap CIR{HDMR,LinearModel}(1, [1, 2], HDMRCoefs{InclusionRepetition,Array{Float64,2},Lasso.MinBIC,UnitRange{Int64}}([0.0 -2.18288 -1.1806 -0.415995; 0.0 0.330624 0.0 0.0; 0.0 0.0 0.0299796 0.0788679; 0.0 0.0 0.0 0.0], [0.0 0.0461661 1.36309 3.07912; 0.0 0.0 0.0 -0.670108; 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0], true, 100, 4, 1:3, 1:3, Lasso.MinBIC()), LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}}:

Coefficients:
───────────────────────────────────────────────────────────────────────
      Estimate  Std. Error    t value  Pr(>|t|)  Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────────────
x1    0.887227     0.23313    3.80572    0.0003   0.424277    1.35018
x2 -0.00275704    0.101541  -0.027152    0.9784  -0.204397   0.198883
x3   -0.132434    0.101139   -1.30943    0.1936  -0.333275  0.0684076
x4    0.393855    0.697003   0.565069    0.5734  -0.990255    1.77797
x5    0.236098    0.319183   0.739695    0.4613  -0.397736   0.869933
x6 -0.00979175   0.0158525  -0.617678    0.5383 -0.0412717  0.0216882
x7  -0.0817407   0.0717278    -1.1396    0.2574  -0.224178  0.0606965
───────────────────────────────────────────────────────────────────────
, LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}}:

Coefficients:
──────────────────────────────────────────────────────────────────────
      Estimate  Std. Error    t value  Pr(>|t|)  Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────
x1     0.57844   0.0794739    7.27837    <1e-10   0.420707   0.736174
x2   0.0068076      0.1002  0.0679403    0.9460  -0.192061   0.205676
x3    -0.13156   0.0990448   -1.32829    0.1872  -0.328136  0.0650164
──────────────────────────────────────────────────────────────────────
)>
Here, nocounts=True means we also fit a benchmark forward model without counts, and select=Lasso.MinBIC() picks, for each category, the segment of the Lasso regularization path that minimizes BIC.
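To make the selection rule concrete: along a category's Lasso regularization path, each segment (value of the penalty λ) has a deviance and a number of nonzero coefficients (degrees of freedom), and a BIC-minimizing rule keeps the segment with the smallest deviance + df·log(n). The sketch below assumes that textbook form of BIC and uses hypothetical path values; Lasso.jl's MinBIC may differ in details such as how degrees of freedom are counted, and min_bic_segment is an illustrative name, not part of any package.

```python
import numpy as np


def min_bic_segment(deviance, df, n):
    """Index of the path segment minimizing BIC = deviance + df * log(n).

    Assumes the textbook BIC; Lasso.jl's MinBIC may differ in details.
    """
    deviance = np.asarray(deviance, dtype=float)
    df = np.asarray(df, dtype=float)
    bic = deviance + df * np.log(n)
    return int(np.argmin(bic))


# Hypothetical path for one category: deviance falls as lambda shrinks,
# while each extra nonzero coefficient costs log(n) in the BIC penalty
deviance = [120.0, 95.0, 90.0, 89.5, 89.4]
df = [1, 2, 3, 4, 5]
best = min_bic_segment(deviance, df, n=100)
print(best)
```

Past the selected segment, the small deviance improvements no longer pay for the log(n) penalty on each added coefficient, which is why this rule tends to select sparse loadings.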
We can get the backward and forward model coefficients with
hd.coefbwd(cir)
(array([[ 0. , -2.18288411, -1.18060442, -0.41599536], [ 0. , 0.33062404, 0. , 0. ], [ 0. , 0. , 0.02997958, 0.07886791], [ 0. , 0. , 0. , 0. ]]), array([[ 0. , 0.04616614, 1.36309252, 3.07912436], [ 0. , 0. , 0. , -0.67010761], [ 0. , 0. , 0. , 0. ], [ 0. , 0. , 0. , 0. ]]))
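hd.coefbwd returns a tuple: the coefficient matrix of the positive-count model, then that of the hurdle (zero) model. Each is (p+1)-by-d, with the intercepts in the first row, one row per covariate after that, and one column per category. Assuming that layout, a short numpy sketch can list which covariate loadings the Lasso kept; the matrices are copied from the output above, and the helper nonzero_loadings is hypothetical.

```python
import numpy as np

# Backward coefficients copied from hd.coefbwd(cir) above
# (assumed layout: row 0 = intercepts, row j = covariate j, one column per category)
coefs_pos = np.array([[0., -2.18288411, -1.18060442, -0.41599536],
                      [0., 0.33062404, 0., 0.],
                      [0., 0., 0.02997958, 0.07886791],
                      [0., 0., 0., 0.]])
coefs_zero = np.array([[0., 0.04616614, 1.36309252, 3.07912436],
                       [0., 0., 0., -0.67010761],
                       [0., 0., 0., 0.],
                       [0., 0., 0., 0.]])


def nonzero_loadings(coefs):
    """Hypothetical helper: (covariate row, 0-based category column, value) for
    every nonzero loading, skipping the intercept row. The covariate row number
    matches the dirpos/dirzero argument used with hd.srproj."""
    rows, cols = np.nonzero(coefs[1:, :])
    return [(int(r) + 1, int(c), float(coefs[r + 1, c])) for r, c in zip(rows, cols)]


print("positive-count model:", nonzero_loadings(coefs_pos))
print("hurdle (zero) model:", nonzero_loadings(coefs_zero))
```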
hd.coeffwd(cir)
array([ 0.88722732, -0.00275704, -0.13243369, 0.39385493, 0.23609827, -0.00979175, -0.08174069])
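These forward coefficients come from regressing the target covariate on an intercept, the remaining covariates, and the SR projection. The numpy sketch below illustrates that forward step on synthetic, noise-free data, so the known coefficients are recovered exactly. The helper fit_forward_ols and the column order [1, other covariates, zpos, zzero, m, l] are assumptions for illustration (consistent with the seven forward coefficients above: one intercept, two other covariates, four SR columns), not HurdleDMR's API.

```python
import numpy as np


def fit_forward_ols(y, other_covars, z):
    """Hypothetical helper: OLS of y on [1, other_covars, z] (assumed column order)."""
    n = len(y)
    X = np.column_stack([np.ones(n), other_covars, z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic, noise-free data so the true coefficients are recoverable
rng = np.random.default_rng(0)
n = 50
other = rng.uniform(0, 1, (n, 2))   # stand-in for the two remaining covariates
z = rng.normal(size=(n, 4))         # stand-in for the four SR projection columns
true_coef = np.array([0.5, 1.0, -2.0, 0.3, 0.0, 0.1, -0.4])
y = np.column_stack([np.ones(n), other, z]) @ true_coef

coef = fit_forward_ols(y, other, z)
print(np.round(coef, 6))
```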
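The fitted model can be used to predict the first covariate (the projection direction, covars[:,0]) with new data. Here we simply reuse rows 1 through 9 of the training data (Python's range(1,10) is 0-based and excludes 10)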
hd.predict(cir, covars[range(1,10),:], counts[range(1,10),:])
array([0.55374217, 0.44339513, 0.41348183, 0.44716221, 0.42806591, 0.46489547, 0.48025307, 0.45795602, 0.56790706])
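With a linear forward model, each with-counts prediction is a linear combination of an intercept, the remaining covariates, and the four SR projection columns. The sketch below assumes that column order, uses the coefficients printed by hd.coeffwd(cir), and calls a hypothetical helper predict_forward; it is not HurdleDMR's own predict. The observation is hypothetical: its other covariates are set to 0.5, and its SR projection is copied from one row of z above.

```python
import numpy as np

# Forward coefficients copied from hd.coeffwd(cir) above
coef_fwd = np.array([0.88722732, -0.00275704, -0.13243369, 0.39385493,
                     0.23609827, -0.00979175, -0.08174069])


def predict_forward(coef, other_covars, z):
    """Hypothetical helper: yhat = [1, other_covars, z] @ coef (assumed column order)."""
    other_covars = np.atleast_2d(other_covars)
    z = np.atleast_2d(z)
    X = np.column_stack([np.ones(len(z)), other_covars, z])
    return X @ coef


# Hypothetical observation: covariates 2 and 3, plus its SR projection [zpos, zzero, m, l]
other_new = np.array([[0.5, 0.5]])
z_new = np.array([[0.11020801, -0.2233692, 6.0, 3.0]])
yhat = predict_forward(coef_fwd, other_new, z_new)
print(yhat)
```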
We can also predict using only the other covariates (the benchmark model fit because we passed nocounts=True), which in this case is just a linear regression
hd.predict(cir, covars[range(1,10),:], counts[range(1,10),:], nocounts=True)
array([0.55869446, 0.46528463, 0.48635113, 0.4689304 , 0.49773856, 0.51824815, 0.45533037, 0.53886851, 0.55713808])
Kelly, Manela, and Moreira (2018) show that the differences between DMR and HDMR can be substantial in some cases, especially when the count data are highly sparse.
Please reference the paper for additional details and example applications.