\begin{align*} y_{i} & \sim p(y_{i}; \eta_{i}, \phi) \\ \eta_{i} & = \sum_{t=1}^{T} \alpha_{t} d_{i,t} + \sum_{j=1}^{J} \beta_{j} x_{i,j}, i=1,\ldots, n \end{align*}
Current Population Survey data for 2010 and 2019
228 covariates
204 treatments
There is an obvious over-selection variance inflation, which makes it harder to detect weaker effects
There is also a more subtle over-selection bias
- c {\alpha \phi \over \alpha^2+\phi} {\log J \over n}
\texttt{cil} function in the R package \texttt{mombf}
\begin{align*} \prod_{t=1}^{T} p(\alpha_{t} \mid \delta_{t}, \phi) \prod_{j=1}^{J} p(\beta_{j} \mid \gamma_{j}, \phi) \end{align*}
Inclusion indicators $\delta_t, \gamma_j \in \{0,1\}$
State-of-the-art non-local priors on inclusion
Heterogeneous covariate inclusion probabilities
\begin{align*} \prod_{t=1}^{T} \text{Bern}(\delta_{t}; 1/2) \prod_{j=1}^{J} \text{Bern}(\gamma_{j}; \pi_{j}(\bm{\theta})) \end{align*}
\begin{align*} \pi_{j}(\bm{\theta}) = \rho + (1 - 2 \rho) \left( 1 + \exp \left\{ - \theta_{0} - \sum_{t=1}^{T} \theta_{t} f_{j,t} \right\} \right)^{-1} \end{align*}
$f_{j,t}$ are positive measures of (joint) association between treatment $t$ and covariate $j$
$\rho$ sets the lower and upper bounds on the inclusion probabilities, $\pi_{j}(\bm{\theta}) \in (\rho, 1-\rho)$
$\bm{\theta}$ is learnt from data and moderates inclusion
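The inclusion probability formula above can be sketched numerically. This is a minimal illustration, not the authors' implementation: the function name \texttt{inclusion\_prob}, the matrix \texttt{F} of association measures and all numeric values are hypothetical.

```python
import numpy as np

def inclusion_prob(theta0, theta, F, rho):
    """pi_j(theta) = rho + (1 - 2*rho) * sigmoid(theta0 + sum_t theta_t * f_{j,t}).

    F is a (J, T) array with F[j, t] = f_{j,t}; theta has length T.
    """
    F = np.asarray(F, dtype=float)
    theta = np.asarray(theta, dtype=float)
    linear = theta0 + F @ theta                 # one linear predictor per covariate
    sigmoid = 1.0 / (1.0 + np.exp(-linear))
    return rho + (1.0 - 2.0 * rho) * sigmoid    # squeezed into (rho, 1 - rho)

# Hypothetical inputs: J = 3 covariates, T = 2 treatments.
F = np.array([[0.0, 0.1], [1.5, 0.2], [4.0, 3.0]])
pi_flat = inclusion_prob(0.0, np.zeros(2), F, rho=0.05)             # theta = 0
pi_tilt = inclusion_prob(-1.0, np.array([1.0, 0.5]), F, rho=0.05)   # theta > 0
```

With $\theta_0 = 0$ and $\bm{\theta} = \bm{0}$ every covariate gets probability $1/2$, whatever the value of $\rho$. With positive $\theta_t$, covariates more strongly associated with the treatments get higher inclusion probabilities, but never outside $(\rho, 1-\rho)$.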
\begin{align*} p(\bm{y} \mid \bm{\theta})= \sum_{\bm{\delta}, \bm{\gamma}} p(\bm{y} \mid \bm{\delta}, \bm{\gamma}) p(\bm{\delta}, \bm{\gamma} \mid \bm{\theta}) \end{align*}
with the first term either analytically available or approximated, e.g., by the approximate Laplace approximation (ALA) of Rossell et al. (2021)
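For intuition, the sum over models can be evaluated by brute force when $T + J$ is tiny. The sketch below assumes a user-supplied function \texttt{log\_ml} returning $\log p(\bm{y} \mid \bm{\delta}, \bm{\gamma})$; that function, and the toy likelihood used to exercise it, are hypothetical stand-ins for an exact or ALA-based marginal likelihood. Enumeration is infeasible at the scale of the application, which is why cheaper alternatives are needed.

```python
import itertools
import numpy as np

def log_marginal_theta(log_ml, pi_delta, pi_gamma):
    """log p(y | theta) by enumerating all 2^(T+J) models (tiny problems only).

    log_ml(delta, gamma) returns log p(y | delta, gamma).
    pi_delta, pi_gamma are the prior inclusion probabilities implied by theta.
    """
    pi = np.concatenate([pi_delta, pi_gamma])
    T = len(pi_delta)
    terms = []
    for z in itertools.product([0, 1], repeat=len(pi)):
        z = np.array(z)
        # Log of the product of independent Bernoulli prior probabilities.
        log_prior = np.sum(z * np.log(pi) + (1 - z) * np.log1p(-pi))
        terms.append(log_ml(z[:T], z[T:]) + log_prior)
    terms = np.array(terms)
    m = terms.max()                      # log-sum-exp for numerical stability
    return m + np.log(np.sum(np.exp(terms - m)))

# Toy check with a hypothetical likelihood that doubles for every included term.
toy_log_ml = lambda delta, gamma: np.log(2.0) * (delta.sum() + gamma.sum())
value = log_marginal_theta(toy_log_ml, np.array([0.5]), np.array([0.2, 0.9]))
```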
To avoid several MCMC runs (for different $\bm{\theta}$) and multi-modality, we also consider the following cheaper alternative
We consider the localized posterior $p_{0}(\bm{\delta}, \bm{\gamma} \mid \bm{y}) = p(\bm{\delta}, \bm{\gamma} \mid \bm{y}, \bm{\theta}=\bm{0})$
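Note the identity
\begin{align*} p(\bm{y} \mid \bm{\theta}) = p(\bm{y} \mid \bm{\theta}=\bm{0}) \sum_{\bm{\delta},\bm{\gamma}} p_{0}(\bm{\delta},\bm{\gamma} \mid \bm{y}) \frac{p(\bm{\delta},\bm{\gamma} \mid \bm{\theta})}{p(\bm{\delta},\bm{\gamma} \mid \bm{\theta}=\bm{0})} \end{align*}
At $\bm{\theta}=\bm{0}$ all inclusion probabilities equal $1/2$, so $p(\bm{\delta},\bm{\gamma} \mid \bm{\theta}=\bm{0}) = 2^{-(T+J)}$ is constant and, as a function of $\bm{\theta}$, $p(\bm{y} \mid \bm{\theta}) \propto \sum_{\bm{\delta},\bm{\gamma}} p_{0}(\bm{\delta},\bm{\gamma} \mid \bm{y}) p(\bm{\delta},\bm{\gamma} \mid \bm{\theta})$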
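One way to use this reweighting in practice is a Monte Carlo average over draws from $p_{0}$. Since $\bm{\theta}=\bm{0}$ gives $\pi_{j} = 1/2$, the prior at $\bm{\theta}=\bm{0}$ is the constant $2^{-(T+J)}$, and $p(\bm{y} \mid \bm{\theta}) / p(\bm{y} \mid \bm{\theta}=\bm{0})$ is the expectation under $p_{0}$ of the prior ratio $p(\bm{\delta},\bm{\gamma} \mid \bm{\theta}) / p(\bm{\delta},\bm{\gamma} \mid \bm{\theta}=\bm{0})$. The sketch below assumes a single set of posterior draws of $\bm{\gamma}$ obtained at $\bm{\theta}=\bm{0}$ is available. The function name and the sample values are hypothetical, and this is an illustration of the idea rather than the authors' code.

```python
import numpy as np

def log_ratio_estimate(gamma_samples, pi_gamma):
    """Monte Carlo estimate of log[p(y | theta) / p(y | theta = 0)].

    gamma_samples: (S, J) array of 0/1 draws of gamma from p_0(. | y).
    pi_gamma: length-J inclusion probabilities pi_j(theta).
    The delta terms cancel: their Bern(1/2) prior does not depend on theta.
    """
    g = np.asarray(gamma_samples, dtype=float)
    pi = np.asarray(pi_gamma, dtype=float)
    J = g.shape[1]
    # log of p(gamma | theta) / p(gamma | theta = 0), where p(gamma | 0) = 2^{-J}
    log_w = g @ np.log(pi) + (1.0 - g) @ np.log1p(-pi) + J * np.log(2.0)
    m = log_w.max()                      # log-mean-exp for numerical stability
    return m + np.log(np.mean(np.exp(log_w - m)))

# Hypothetical draws for J = 2 covariates.
samples = np.array([[1, 0], [0, 1]])
est = log_ratio_estimate(samples, np.array([0.8, 0.3]))
```

Because the same draws are reused for every $\bm{\theta}$, the estimate can be maximized over $\bm{\theta}$ without rerunning MCMC. The estimate becomes noisy when $\bm{\theta}$ moves the prior far from the uniform one used to obtain the draws.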