A Bayesian Reanalysis of Results from the Enhanced Services for the Hard-to-Employ Demonstration and Evaluation Project
Social policy evaluations usually use classical statistical methods, which may, for example, compare outcomes for program and comparison groups and determine whether the estimated differences (or impacts) are statistically significant, meaning they are unlikely to have been generated by a program with no effect. This approach has two important shortcomings. First, it is geared toward testing hypotheses regarding specific possible program effects, most commonly whether a program has zero effect. It is difficult within this framework to test a hypothesis that, say, the program’s true impact is larger than 10 (whether 10 percentage points, $10, or some other measure). Second, readers often view results through the lens of their own expectations. A program developer may interpret results positively even if they are not statistically significant (that is, even if they do not confirm the program’s effectiveness), while a skeptic might treat with caution statistically significant impact estimates that do not follow theoretical expectations.
This paper uses Bayesian methods — an alternative to classical statistics — to reanalyze results from three studies in the Enhanced Services for the Hard-to-Employ (HtE) Demonstration and Evaluation Project, which is testing interventions to increase employment and reduce welfare dependency for low-income adults with serious barriers to employment. In interpreting new data from a social policy evaluation, a Bayesian analysis formally incorporates prior beliefs, or expectations (known as "priors"), about the social policy into the statistical analysis and characterizes results in terms of the distribution of possible effects, instead of whether the effects are consistent with a true effect of zero.
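To make the idea of combining a prior with new data concrete, here is a minimal sketch of a normal-normal conjugate update. It is an illustration only, not the model used in the paper: it assumes the prior belief about the impact and the sampling distribution of the estimate are both normal, and every number and function name in it is hypothetical.

```python
from statistics import NormalDist


def normal_posterior(prior_mean, prior_sd, estimate, std_error):
    """Combine a normal prior with a normally distributed impact estimate.

    Returns the posterior mean and standard deviation of the true impact.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    # Posterior mean is a precision-weighted average of prior mean and estimate.
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return post_mean, post_var ** 0.5


def prob_impact_exceeds(post_mean, post_sd, threshold):
    """Posterior probability that the true impact is larger than `threshold`."""
    return 1.0 - NormalDist(post_mean, post_sd).cdf(threshold)


# Hypothetical inputs: a prior centered on a small positive impact,
# and a new study estimate with its standard error.
post_mean, post_sd = normal_posterior(
    prior_mean=2.0, prior_sd=4.0, estimate=6.0, std_error=3.0
)
p_positive = prob_impact_exceeds(post_mean, post_sd, 0.0)
p_above_10 = prob_impact_exceeds(post_mean, post_sd, 10.0)
print(f"posterior mean {post_mean:.2f}, sd {post_sd:.2f}")
print(f"P(impact > 0) = {p_positive:.3f}, P(impact > 10) = {p_above_10:.3f}")
```

Because the output is a full distribution, the same posterior answers questions about any threshold, such as whether the impact exceeds 10, rather than only whether it differs from zero.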
The main question addressed in the paper is whether a Bayesian approach tends to confirm or contradict published results. Results of the Bayesian analysis generally confirm the published findings that impacts from the three HtE programs examined here tend to be small. This is partly because results for the three sites are broadly consistent with findings from similar studies, and partly because each of the sites included a relatively large sample, which gives the new data substantial weight relative to the prior. The Bayesian framework may be more informative when applied to smaller studies that might not be expected to provide statistically significant impact estimates on their own.
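The role of sample size can be illustrated with a short sketch. Under a normal prior and a simple difference in means between two equal-sized groups (both assumptions made here for illustration, not taken from the paper), the posterior mean is a weighted average of the prior mean and the study estimate, and the weight on the study grows with the sample. The function name and all values below are hypothetical.

```python
def data_weight(prior_sd, outcome_sd, n_per_group):
    """Share of the posterior mean that comes from the study estimate.

    Assumes a normal prior and a difference in means between two groups
    of equal size with a common outcome standard deviation.
    """
    # Sampling variance of the difference in means.
    se_squared = 2.0 * outcome_sd ** 2 / n_per_group
    # Equivalent to data_precision / (prior_precision + data_precision).
    return prior_sd ** 2 / (prior_sd ** 2 + se_squared)


# Hypothetical values: the same prior and outcome spread, two sample sizes.
small_study = data_weight(prior_sd=4.0, outcome_sd=40.0, n_per_group=50)
large_study = data_weight(prior_sd=4.0, outcome_sd=40.0, n_per_group=1000)
print(f"weight on data, n=50 per group:   {small_study:.2f}")
print(f"weight on data, n=1000 per group: {large_study:.2f}")
```

In this sketch the small study leaves most of the weight on the prior, while the large study is driven mainly by its own data, which is one way to see why a Bayesian reanalysis of large samples would tend to agree with the published estimates.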