
# I have too many control variables … which ones should I include in my regression model? – Health economist

Let’s say you have some data on healthcare spending across different people and you want to know which patient characteristics drive healthcare spending. While this seems like something any health economist could do, measuring the relationship requires knowing (i) which independent variables to include in the data analysis and (ii) their functional form. Item (i) can be determined based on previous studies and expert judgment, but even that is imperfect. Item (ii) is very difficult to decipher. Is there a data-driven way to do this?

A paper by Belloni, Chernozhukov and Hansen (2014) proposes using post-double selection (PDS) to identify the relevant controls and their functional form. Consider the case where we want to model the following:

$$y_i = g(w_i) + \zeta_i$$

where

$$E[\zeta_i \mid g(w_i)] = 0$$

The Belloni paper treats $g(w)$ as a high-dimensional, approximately linear model, where:

$$g(w_i) = \sum_{j=1}^{p} \beta_j x_{i,j} + r_{p,i}$$

Note that in the Belloni framework, the number of control variables ($p$) can be greater than the number of observations ($n$). How can you have more regressors than observations? Basically because Belloni requires that the causal relationship be approximately sparse, which means that out of the $p$ control variables, only $s$ of them are different from 0, where $s \ll n$.

Belloni proposes identifying these $s$ important variables using a Least Absolute Shrinkage and Selection Operator (LASSO) model from Frank and Friedman (1993) as follows:
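
A LASSO objective matching the description in the next paragraph can be written as shown here. This is a reconstruction from that description, not a verbatim copy of the authors' equation; the symbol $b_j$ for candidate coefficient values is an assumption introduced for illustration:

$$\hat{\beta} = \arg\min_{b} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} x_{i,j} b_j \Big)^2 + \lambda \sum_{j=1}^{p} |b_j| \, \gamma_j$$

The first term is the sum of squared residuals, and the second is the penalty, with level $\lambda$ and loadings $\gamma_j$.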

In LASSO, the coefficients are chosen to minimize the sum of the squared residuals plus a penalty term that penalizes the size of the model through the sum of the absolute values of the coefficients. The term λ is the penalty level, which sets how heavily non-zero coefficients are penalized and therefore how many variables end up with coefficients of exactly zero. Papers like Belloni et al. (2012) and Belloni et al. (2016) provide some reasonable estimates for the value of λ. The γ coefficients are the “penalty loadings”, which are intended to ensure equivariance of the coefficient estimates to rescaling of x. For example, if one variable was schooling on a scale of 1 to 16 and another variable was income in dollars, a 1-year increase in education is a much larger change than a \$1 increase in annual income. Penalty loadings are intended to correct this disparity. The authors note that:

The penalty function in LASSO is special in that it has a kink at 0, which results in a sparse estimator with many coefficients set exactly to zero.
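
To see why a kink at 0 produces exact zeros, it can help to look at a one-coefficient version of the problem. The sketch below is an illustration added here, not taken from the paper: `soft_threshold` gives the closed-form minimizer under an absolute-value penalty, and `ridge_shrink` (a squared penalty, included only as a contrast) shows a smooth penalty that shrinks but never reaches zero. Both function names and the test values are hypothetical.

```python
def soft_threshold(z, lam):
    """Minimizer over b of 0.5 * (z - b)**2 + lam * abs(b) (absolute-value penalty)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    # Inside [-lam, lam] the kink at zero wins: the coefficient is exactly 0.
    return 0.0


def ridge_shrink(z, lam):
    """Minimizer over b of 0.5 * (z - b)**2 + 0.5 * lam * b**2 (squared penalty)."""
    # A smooth penalty rescales z but never sets a nonzero z to exactly 0.
    return z / (1.0 + lam)


# Compare the two penalties for a large, a small, and a tiny unpenalized estimate.
for z in (2.0, 0.3, -0.1):
    print(z, soft_threshold(z, 0.5), ridge_shrink(z, 0.5))
```

Small unpenalized estimates are set to exactly zero by the absolute-value penalty, while large ones are shifted towards zero by the penalty level, which is the source of the bias discussed next.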

However, one problem with the LASSO approach is that the resulting coefficients are biased towards zero. The remedy Belloni proposes is post-LASSO estimation, a two-step approach:

First, LASSO is applied to determine which variables can be ruled out from a prediction point of view. Then, the coefficients of the remaining variables are estimated by ordinary least squares regression using only the variables with nonzero first-pass estimated coefficients. The Post-LASSO estimator is convenient to implement and … performs as well as, and often better than, LASSO in terms of rates of convergence and bias.
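
The two steps can be sketched in Python with NumPy on simulated data. This is a minimal illustration under stated assumptions, not the authors' implementation: the coordinate-descent solver, the penalty level `lam` (picked by hand for this simulation), and the loadings (each column's root-mean-square scale) are all simplifications, whereas the papers cited above derive data-driven choices for both. All function and variable names are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z towards zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, loadings, n_iter=200):
    """Coordinate descent for: sum((y - X b)^2) + lam * sum(loadings * |b|)."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)   # x_j' x_j for each column
    resid = y.copy()                # residual for the current b (all zeros)
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]             # partial residual without x_j
            rho = X[:, j] @ resid
            b[j] = soft_threshold(rho, lam * loadings[j] / 2.0) / col_ss[j]
            resid -= X[:, j] * b[j]             # restore the full residual
    return b


def post_lasso(X, y, lam, loadings):
    """Step 1: LASSO selects variables. Step 2: OLS on the selected ones only."""
    b_lasso = lasso_cd(X, y, lam, loadings)
    selected = np.flatnonzero(b_lasso != 0.0)
    b_post = np.zeros(X.shape[1])
    if selected.size > 0:
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        b_post[selected] = coef
    return b_lasso, b_post, selected


# Simulated sparse design with more regressors than observations (p > n).
rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]    # only s = 3 controls matter
y = X @ beta_true + 0.5 * rng.standard_normal(n)

loadings = np.sqrt((X ** 2).mean(axis=0))   # assumed: simple scale-based loadings
lam = 40.0                                  # assumed: hand-picked penalty level

b_lasso, b_post, selected = post_lasso(X, y, lam, loadings)
print("selected columns:", selected)
print("LASSO coefficients:", b_lasso[selected])
print("post-LASSO coefficients:", b_post[selected])
```

In a run like this, the LASSO coefficients on the selected columns should sit closer to zero than the post-LASSO ones, since the second step refits them by ordinary least squares without the penalty.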

More details are in the paper, which also includes a variety of empirical examples. Read the full study.

Also, a recent paper by Kugler et al. (2021), published just last month, used the Belloni approach to examine the impact of salary expectations on the decision to become a nurse.
