U.S. Department of Health and Human Services
Agency for Healthcare Research and Quality

This information is for reference purposes only. It was current when produced and may now be outdated. Archive material is no longer maintained, and some links may not work. Persons with disabilities having difficulty accessing this information should contact us at: Let us know the nature of the problem, the Web address of what you want, and your contact information.

Please go to for current information.

Tools for Monitoring the Health Care Safety Net

Estimating the Size of the Uninsured and Other Vulnerable Populations in a Local Area (continued)

Approaches for Local Estimation of Safety Net Demand

Given the technical limitations of national surveys for estimating local safety net demand, chiefly their small sample sizes at the local level, we now turn to other methods for estimating this demand locally. The three primary approaches are:

  1. Conducting your own household survey.
  2. Using other data sources to estimate demand.
  3. Developing a statistical model using the available local data.

If local analysts are not able to conduct their own household survey (Option 1), they will likely use one of the State or national surveys as a data source for estimating demand at the local level (Options 2 and 3). Each approach has strengths and weaknesses, and analysts will need to assess each one carefully.

Conducting Your Own Household Survey

The direct approach to estimating health insurance coverage within a small area (e.g., a county or city) can be characterized by two features:

  1. Using a measurement instrument (e.g., a State survey questionnaire) to directly measure health insurance coverage.
  2. Using measurements from a sample of people drawn from the actual population of interest (e.g., the county or city of interest).

For example, to directly measure health insurance coverage within a specific county, researchers could construct a survey instrument designed to measure health insurance (or use existing instruments from ongoing national surveys) and draw a sample of people from the county to serve as survey respondents.

Three conditions must be met to obtain high-quality direct estimates of health insurance coverage. First, the instrument used to measure the concept should be valid; that is, the survey items need to accurately determine whether someone has health insurance coverage.3 Second, each member of the population of interest should have a known probability of selection into the sample. For example, if you are conducting a survey of 500 people and you draw a simple random sample from a population list that includes all 5,000 people in the county, then each person's probability of selection would be 10 percent. Third, the sample must be large enough. A good rule of thumb is that the equivalent of 100 simple random sample cases (an effective sample size of 100) is needed for each population of interest.
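To make the second and third conditions concrete, the following Python sketch computes the selection probability from the example above and, as an added illustration not taken from this chapter, the approximate 95 percent margin of error for a hypothetical 15 percent uninsurance rate measured with the rule-of-thumb 100 cases. It assumes simple random sampling and the usual normal approximation (z = 1.96); the function names are illustrative only.

```python
import math

def selection_probability(sample_size: int, population_size: int) -> float:
    """Probability that any one person is drawn in a simple random sample."""
    return sample_size / population_size

def margin_of_error(p: float, n_effective: float, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n_effective)

prob = selection_probability(500, 5000)   # 10 percent, as in the text
weight = 1 / prob                          # each respondent represents 10 people

# Hypothetical uninsurance rate of 15 percent with 100 effective cases.
moe = margin_of_error(0.15, 100)
print(f"selection probability = {prob:.0%}, design weight = {weight:.0f}")
print(f"estimated rate 15% +/- {moe:.1%}")
```

Even at the rule-of-thumb minimum, the interval spans roughly plus or minus 7 percentage points, which suggests treating 100 effective cases as a floor rather than a target.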

It is important to explore your own State's survey efforts and determine whether resources are already being spent on a State survey. It might be possible to add sample to an ongoing State survey to obtain the numbers needed for local estimation. This builds on existing capacity and provides an economical way to collect data. In addition, piggybacking on an existing State survey provides a benchmark for comparison, answering the question "How well is your county or region doing relative to the State overall?"4

3 SHADAC has an interactive tool that allows researchers to compare the content and question wording of 10 different State surveys on health insurance coverage. The following link also has links to nine State health access survey web pages that contain information on State-specific reports, data tables and survey methodology (

4 For additional information on survey design and implementation, go to the chapter by Joel Cantor.

Developing a Proxy Measure

Although the direct survey estimates described above are the most defensible, they are also the most costly to produce. Indeed, the cost of producing high-quality direct estimates for small areas is often prohibitive. When at least one of the three conditions for direct estimates is not met, analysts often turn to one or more of the other approaches, depending on their expertise, resources, and available data. The simplest of these alternatives is the "proxy direct" approach to small area estimation.

The proxy approach uses a measure that can serve as a proxy for health insurance coverage and generally applies it to a proxy population within a county. A commonly used, although somewhat controversial, "proxy measure" of uninsurance uses administrative records from all the hospitals within a county to determine the percentage of discharges with specified diagnoses that were coded as "self-pay." Specifically, this entails extracting the expected primary source of reimbursement for a chosen set of diagnoses from the hospital discharge data sets of all hospitals in a given area. Patients discharged with one of these diagnoses and classified as "self-pay" (i.e., the individual, not an insurance company or the government, was expected to pay the bill) are designated as uninsured. For example, if 8 percent of all patients with these diagnoses were expected to be self-pay, then the county's uninsurance rate could be set at 8 percent as well.
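The self-pay calculation just described amounts to a filtered share. The Python sketch below assumes a hypothetical record layout (`diagnosis_code`, `expected_payer`, `patient_county`) and made-up codes such as `"DX1"`; real discharge data sets use their own fields and coding schemes. The optional residence filter is an added assumption, not part of the method as described, and shows how non-resident patients can distort the rate.

```python
from dataclasses import dataclass

@dataclass
class Discharge:
    # Hypothetical fields; actual discharge abstracts differ by State.
    diagnosis_code: str
    expected_payer: str   # e.g. "self-pay", "medicaid", "private"
    patient_county: str

def self_pay_proxy_rate(discharges, selected_diagnoses, county=None):
    """Share of discharges with a selected diagnosis whose expected payer is self-pay.

    If `county` is given, only patients residing in that county are counted.
    This removes non-residents but cannot recover residents treated elsewhere.
    """
    eligible = [
        d for d in discharges
        if d.diagnosis_code in selected_diagnoses
        and (county is None or d.patient_county == county)
    ]
    if not eligible:
        raise ValueError("no discharges match the selected diagnoses")
    self_pay = sum(1 for d in eligible if d.expected_payer == "self-pay")
    return self_pay / len(eligible)

records = (
    [Discharge("DX1", "self-pay", "A")] * 2
    + [Discharge("DX1", "private", "A")] * 23
    + [Discharge("DX1", "self-pay", "B")] * 5    # non-residents treated in county A
    + [Discharge("DX9", "self-pay", "A")] * 10   # diagnosis outside the selected set
)
all_patients = self_pay_proxy_rate(records, {"DX1"})
residents = self_pay_proxy_rate(records, {"DX1"}, county="A")
print(f"all patients: {all_patients:.0%}, county A residents: {residents:.0%}")
```

In this toy data the non-resident self-pay cases push the rate from 8 percent to about 23 percent, one concrete form of the bias concerns raised for this measure.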

A major strength of this and other proxy measures is their low cost. These data are relatively inexpensive to compile and are routinely collected in a majority of States. Moreover, the use of this particular proxy measure avoids the problem of basing estimates on small survey samples, as there generally will be adequately large numbers of discharges for the selected diagnoses within a specific geographic area.

However, there are also concerns with bias and measurement error. For example, not everyone discharged from each hospital will be a resident of that county, which can bias the estimate for the referent county. In addition, a county's estimated uninsurance rate can differ from its actual rate because not every patient living in the county will have gone to one of the county's hospitals.

Furthermore, for the diagnoses selected for use in this analysis, it is critical that the decision to admit a patient to a hospital be completely independent of whether the patient has insurance coverage. For example, for a diagnosis where there is some "discretion" about the need for hospitalization, individuals with insurance coverage are more likely to be admitted than those without coverage. To the extent that you include this type of diagnosis in the set of diagnoses forming your overall proxy measure of uninsurance, you would underestimate the amount of uninsurance in the county. Although this "self-pay" proxy measure uses data from the county of interest, it still relies on a proxy population (hospitalized patients with the selected diagnoses), so the accuracy of the resulting uninsurance estimate will vary.

Finally, because actual insurance coverage is only correlated with expected self-pay and is not the same thing, use of this proxy measure of coverage can involve error. For example, an individual may be classified as "self-pay" at the time of discharge, but may later receive retroactive Medicaid coverage for this hospital expense. As this example shows, using this proxy measure would yield too high an estimated uninsurance rate unless some adjustment could be made to account for this kind of error. This type of adjustment is difficult to do—and subject to imprecision—with only expected primary payment data available.
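Because the share of self-pay cases later covered retroactively is usually unknown, one way to see the size of this error is a simple sensitivity check. The Python sketch assumes a hypothetical retroactive-coverage share and removes that fraction from the raw self-pay rate; it is an illustration of the direction of the bias, not an adjustment method proposed in this chapter.

```python
def adjust_for_retroactive_coverage(raw_rate: float, retro_share: float) -> float:
    """Remove the share of self-pay cases later covered retroactively (e.g., by Medicaid).

    `retro_share` is an assumed, hypothetical fraction of self-pay discharges
    that eventually receive retroactive coverage.
    """
    if not 0.0 <= retro_share <= 1.0:
        raise ValueError("retro_share must be between 0 and 1")
    return raw_rate * (1.0 - retro_share)

raw = 0.08   # the 8 percent self-pay example from the text
for share in (0.0, 0.10, 0.25):
    adjusted = adjust_for_retroactive_coverage(raw, share)
    print(f"assumed retroactive share {share:.0%}: adjusted rate {adjusted:.1%}")
```

The spread of adjusted rates across plausible assumptions gives a rough sense of how imprecise such a correction is when only expected payer data are available.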

Although proxy measures often rest on fairly large numbers of cases (for example, expected payer information on discharges from all hospitals within a county for an entire year), the proxy measure approach is generally considered a last resort. Because the potential for bias is high, analysts may want to consider proxy measures only if nothing else is available. If they are used, exercise great care in selecting the proxy, preferably using only those that have been rigorously evaluated for potential bias.

Using a Model-Based Approach

The simple modeling approach predicts health insurance coverage for a specific geographic area using one or more variables correlated with health insurance coverage, where the correlation is estimated from data for areas in which coverage has been measured. Data used to establish this correlation typically come from State surveys or from national estimates of State-level uninsurance, such as those produced from the CPS. It is then possible to predict coverage for other geographic areas that lack a direct measure of health insurance coverage by inserting their values of the correlated measures into the model and using the resulting model-based estimate as the health insurance coverage estimate.

An example of this approach is using unemployment rates to estimate the level of uninsurance. The use of unemployment rates is attractive for two reasons:

  1. Unemployment rates are correlated with health insurance coverage rates.
  2. Timely reports of unemployment rates are available for every county in the United States from the Bureau of Labor Statistics.

If, for example, statistical analysis across a large number of counties found that the uninsurance rate was, on average, 1.5 times the unemployment rate, then in counties without any direct measure of uninsurance, uninsurance could be estimated as 1.5 times the county's unemployment rate. With such a simple model, it is clearly preferable that the counties used to develop the model be as similar as possible in demographic characteristics, be located within the same State, and be as close as possible to the counties using the model to predict their uninsurance rates.
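The unemployment example can be sketched as a one-parameter model with no intercept. The Python below fits the ratio by least squares through the origin, which is one reasonable reading of "on average, 1.5 times"; a ratio of means or the mean of county ratios would also be defensible. The county rates are hypothetical.

```python
def fit_ratio(unemployment, uninsurance):
    """Least-squares slope b of a no-intercept model: uninsurance ~ b * unemployment."""
    if len(unemployment) != len(uninsurance) or not unemployment:
        raise ValueError("need matching, non-empty series")
    num = sum(x * y for x, y in zip(unemployment, uninsurance))
    den = sum(x * x for x in unemployment)
    return num / den

def predict(b, unemployment_rate):
    """Model-based uninsurance estimate for a county lacking a direct measure."""
    return b * unemployment_rate

# Hypothetical reference counties (rates in percent) where uninsurance was measured.
unemp = [4.0, 5.0, 6.0, 8.0]
uninsured = [6.2, 7.4, 9.1, 11.9]

b = fit_ratio(unemp, uninsured)
print(f"b = {b:.2f}; predicted uninsurance at 7% unemployment: {predict(b, 7.0):.1f}%")
```

The single fitted ratio is only as good as the similarity between the reference counties and the counties it is applied to, which is why the text stresses choosing comparable counties.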

The pre-eminent example of a model-based approach is the Census Bureau's Small-Area Income and Poverty Estimates (SAIPE) program. In the SAIPE program, up-to-date estimates of the number of school-age children living in poverty in U.S. counties are obtained from a combination of two estimates. First, for counties that have been sampled by the annual March Supplement to the Current Population Survey (CPS), this survey provides a direct estimate of the number of school-age children in poverty. Even for sampled counties, however, this direct estimate is usually based on very small samples. As a result, even if 3 years of March CPS information are combined to form one direct estimate, it is still likely to be subject to too much sampling error to be of much policy use on its own. In addition, only about one-third of counties nationwide are included in the March CPS sample in any given year. Consequently, no direct estimate is possible for the majority of counties in the country.

To overcome this deficiency, researchers have developed regression models to provide indirect, or synthetic, estimates of a county's number of school-age children in poverty. This approach begins by assembling a large data set on all the counties in the country that have been included in the CPS samples. The data for this project combine CPS counts of each county's school-age children in poverty with Internal Revenue Service (IRS) data on individual tax returns and data from the Federal food stamp program, all aggregated to the county level to yield predictors of school-age children in poverty. These predictors include such county-specific measures as the number of child exemptions reported by families in poverty and the number of people receiving food stamps. The data are then used in regression models to establish the statistical relationship between the expected number of school-age children in poverty in each county and the levels of these predictor variables for the county.
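The regression step can be illustrated with a small NumPy sketch. A log-linear form (log counts regressed on log predictors) is assumed here for illustration; it is not a reproduction of the Census Bureau's actual SAIPE specification, and the predictor columns and coefficients are synthetic. Note that exponentiating a predicted log count ignores retransformation bias.

```python
import numpy as np

def fit_log_linear(predictors: np.ndarray, outcome: np.ndarray) -> np.ndarray:
    """OLS of log(outcome) on an intercept plus log(predictors).

    predictors: (n_counties, k) array of positive county-level counts
    outcome:    (n_counties,) array of positive direct-survey counts
    Returns [intercept, b_1, ..., b_k].
    """
    X = np.column_stack([np.ones(len(outcome)), np.log(predictors)])
    coef, *_ = np.linalg.lstsq(X, np.log(outcome), rcond=None)
    return coef

def predict_counts(coef: np.ndarray, predictors: np.ndarray) -> np.ndarray:
    """Synthetic (indirect) estimates for any county with current predictor values."""
    X = np.column_stack([np.ones(predictors.shape[0]), np.log(predictors)])
    return np.exp(X @ coef)   # ignores retransformation bias

# Synthetic, noise-free data standing in for two hypothetical predictors
# (e.g., child exemptions in poverty and food stamp recipients).
rng = np.random.default_rng(0)
preds = rng.uniform(100, 5000, size=(200, 2))
true_coef = np.array([0.5, 0.6, 0.3])
outcome = np.exp(true_coef[0]) * preds[:, 0] ** 0.6 * preds[:, 1] ** 0.3

coef = fit_log_linear(preds, outcome)
print(np.round(coef, 3))
```

With real survey outcomes the fit would be noisy, which is why the direct and model-based estimates are combined rather than the regression being used alone.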

Importantly, these predictor variables are selected in part because of the feasibility (for the Census Bureau) of obtaining reasonably up-to-date values for them for all the counties in the country. Thus it is possible to use these up-to-date predictor values to estimate each county's number of school-age children in poverty.

Finally, since this regression model has been estimated on a large data set (all counties in the country with CPS samples), the synthetic or indirect estimates derived from it are capable of achieving reasonably high levels of 'predictive' accuracy.

The SAIPE model estimates of school-age children in poverty are formed as a mixture of the direct estimates (for counties included in the March CPS sample) and the model predictions, or indirect estimates. Because the two estimates are blended in a way that takes into account the accuracy of each, the resulting estimate is better than either the direct or the model-based estimate would be alone. Importantly, the approach also provides an estimate for those counties not included in the March CPS samples. Other advantages of the SAIPE model estimates are that they can be updated on an annual or biennial schedule and can be expected to have less error than the main alternative, outdated census estimates. The major disadvantage is that producing these model-based estimates requires substantial resources. The models must be developed initially and then evaluated by highly trained statisticians; they require access to large amounts of data, preferably nationwide, not all of which may be in the public domain; and the models themselves must be updated periodically, which also entails large resource costs.
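The blending idea can be sketched as precision weighting: the noisier the direct estimate, the more weight goes to the model prediction. The Python below is a simplified illustration assuming known, independent error variances; SAIPE's actual shrinkage procedure estimates these quantities from the data and is considerably more involved.

```python
from typing import Optional, Tuple

def blend(direct: Optional[float], direct_var: Optional[float],
          synthetic: float, model_var: float) -> Tuple[float, float]:
    """Precision-weighted composite of a direct and a synthetic estimate.

    Assumes the two errors are independent with known variances.
    Counties with no direct estimate fall back to the synthetic estimate.
    Returns (estimate, approximate variance).
    """
    if direct is None or direct_var is None:
        return synthetic, model_var
    w = model_var / (model_var + direct_var)      # weight on the direct estimate
    estimate = w * direct + (1 - w) * synthetic
    variance = (model_var * direct_var) / (model_var + direct_var)
    return estimate, variance

# Hypothetical sampled county: a noisy direct count (variance 400) and a model
# prediction (variance 100). Here w = 0.2, so the blend leans toward the model.
est, var = blend(1200.0, 400.0, 1000.0, 100.0)

# Hypothetical unsampled county: only the model prediction is available.
est_unsampled, var_unsampled = blend(None, None, 850.0, 100.0)
```

Under these assumptions the blended variance is smaller than either input variance, which is the sense in which the combination beats either estimate alone.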

The Census Bureau's Small Area Estimates Branch (SAEB) is currently developing model-based estimates of the number of the uninsured at the State and county levels using the CPS. The SAEB's current focus is on developing State-level estimates of the number of children (ages 0-18) who are in families with incomes at or below 200 percent of the Federal poverty line (FPL) and the number of these low-income children who are uninsured. The focus on low-income children is a natural extension of SAEB's work on the SAIPE; and, because the CPS is the primary source for State-level estimates of uninsurance, it provides a basis for comparison with which they can evaluate the model-based estimates. The methodologies used by the SAEB are described in Fisher and Campbell (2002). The authors have found that at least initially, the covariates that are available for modeling uninsurance rates for low-income children are not as useful as they were in predicting child poverty rates in the SAIPE effort. Nonetheless, State and county estimates of the uninsured (perhaps by age) are expected in the fall of 2003.

The Census Bureau's sophisticated models are developed by some of the best statisticians and modelers in the country, and it is hoped that this expertise will produce estimates that States and localities can use to better understand rates of health insurance coverage. However, State and local analysts can also build their own simple models to provide some level of information, and can develop more complex models using expertise located within State agencies or at local universities.


Comments on Estimating State Health Care Program Coverage

Monitoring and estimating public health care program enrollment is important to understanding the dynamics of shifting demand for safety net services. Surveys such as the CPS provide one source of estimates for public program participation, and the same direct and indirect (or model-based) approaches that can be used to generate uninsurance rates (described in the previous section) can be applied to public health care program participation. In fact, the Census Bureau's Small Area Estimates Branch (SAEB) anticipates that it will be able to provide State and county estimates of Medicaid program participation by fall 2003 using model-based approaches.

The estimation of public program participation from the CPS is not as straightforward as estimating uninsurance rates, however. While the Census Bureau is confident that people identified as having privately purchased insurance are actually insured, how accurately the questions classify respondents into public coverage is unknown. Analogous evidence strongly suggests that those enrolled in Government health care programs may misidentify the program they are in. Call et al. (2001) found that public program participants were for the most part able to identify whether they were insured or uninsured and whether they were on public or private insurance, but they were not always able to distinguish the type of public program in which they were enrolled. Perhaps as a result of this confusion, survey estimates of public coverage often fall below actual enrollment numbers generated by administrative data.

Researchers have speculated about other sources of this underreporting bias (Blumberg and Cynamon, 2001). The stigma associated with public assistance may lead respondents to avoid mentioning enrollment in such programs. Alternatively, Medicaid managed care plans may operate like, and appear to be, private insurance companies. In addition, cognitive testing undertaken at the Census Bureau demonstrated that people use several different terms to refer to Medicaid (e.g., medical assistance and medical care) and have particular difficulty distinguishing Medicaid from Medicare in the CPS (Loomis, 2000).

Brown et al. (1997) describe a number of other sources of the observed discrepancies between CPS estimates of public health care program participation and administrative data provided by States to the Centers for Medicare & Medicaid Services (CMS). One possible source of the difference between Medicaid administrative data and CPS estimates of Medicaid coverage is that the former includes both institutionalized and noninstitutionalized populations, while the CPS includes only noninstitutionalized populations. Another source arises because certain population groups who have higher rates of Medicaid coverage (e.g., homeless persons) may be underrepresented in the CPS sampling frame. Conversely, people, especially children, who live in two States during the time period of interest may be counted twice in administrative data (Blumberg and Cynamon, 1999). Finally, the discrepancies also may be due to different forms of measurement error (Brown et al., 1997). For example, the CPS and the administrative data totals may not cover the same period of time. Specifically, the CPS respondents may be responding according to a different time frame than that covered in the administrative data.

Regardless of the cause, the number of people enrolled in public health care programs like Medicaid, based on survey information, is often lower than the number of enrollees in administrative databases maintained by States and by CMS. According to figures provided in Nelson and Mills (2002), the CPS estimate of SCHIP enrollment (2.3 million) is substantially lower than the administrative count (3.3 million). However, the CPS estimates of SCHIP coverage are closer to the administrative enrollment counts in States that have separate SCHIP programs than in States that use Medicaid expansions. Nelson and Mills (2002) speculate that this is because, in Medicaid expansion States with no separate SCHIP program, the Medicaid and SCHIP questions in the CPS refer to the same program names. As a result of the observed discrepancies, the Census Bureau cautions against using single-year CPS estimates of SCHIP participation at the State level and suggests that researchers and policymakers use the combined 2001-02 CPS file that is currently available (Nelson and Mills, 2002).

Health care policy research organizations such as The Urban Institute, with its Transfer Income Model (TRIM), and Mathematica Policy Research, Inc., with the Micro Analysis of Transfers to Households (MATH) program, try to correct for these public program undercounts when estimating the impacts of different policy options. The TRIM and MATH models take CPS respondents who report that they are not enrolled in a public health care program (e.g., Medicaid) but who appear eligible based on their responses to other survey items (e.g., family income, family composition) and allocate them to enrollment in the public program. Through this process, the CPS estimates of public program participation approach the enrollment numbers provided by administrative data.
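The allocation step can be sketched in a few lines. The Python below is a simplified, hypothetical stand-in, not TRIM's or MATH's actual procedure: it randomly assigns eligible non-reporters to enrollment until the weighted total reaches an administrative count. The record fields and the toy figures (scaled to mirror the 2.3 versus 3.3 million gap) are assumptions.

```python
import random
from dataclasses import dataclass

@dataclass
class Person:
    # Hypothetical survey record fields for illustration only.
    weight: float          # survey weight (number of people represented)
    reports_public: bool   # respondent reported public coverage
    eligible: bool         # eligibility inferred from income, family composition, etc.
    imputed_public: bool = False

def impute_enrollment(people, admin_total, seed=0):
    """Assign eligible non-reporters to enrollment, in random order, until the
    weighted enrollment reaches `admin_total` or candidates run out.
    Returns the resulting weighted enrollment."""
    rng = random.Random(seed)
    enrolled = sum(p.weight for p in people if p.reports_public)
    candidates = [p for p in people if p.eligible and not p.reports_public]
    rng.shuffle(candidates)
    for p in candidates:
        if enrolled >= admin_total:
            break
        p.imputed_public = True
        enrolled += p.weight
    return enrolled

# Toy file: 23 weighted reporters (23,000 people) against an administrative
# count of 33,000, with 15 eligible non-reporters and 40 ineligible people.
people = (
    [Person(1000.0, True, True) for _ in range(23)]
    + [Person(1000.0, False, True) for _ in range(15)]
    + [Person(1000.0, False, False) for _ in range(40)]
)
total = impute_enrollment(people, admin_total=33_000)
print(total)
```

Only people judged eligible can be reassigned, so the adjusted total depends heavily on how well eligibility is simulated from the survey items.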

It is important for analysts to understand that differences will exist between State administrative records (Medicaid and SCHIP enrollment files) and any estimate of such coverage from national, State, or local surveys. Much work remains to be done on reconciling these two sources of data, and efforts are ongoing to develop a gold standard for coverage data. To date, however, methodological issues remain with both survey data and data derived from administrative records.


Summary of Issues and Recommendations

Estimates of the uninsured with desirable levels of accuracy for well-defined subpopulations, specific time periods, and sub-State areas can be obtained only at very substantial cost, because they require large-sample direct estimates. Thus, localities need to be aware of and use State and national surveys, particularly given the potential for sub-State estimates from existing data sources. These include the HRSA-funded State survey initiative as well as CPS, SLAITS, and MEPS-IC. Conversely, estimates using proxy measures are generally possible with low resource costs but are very unlikely to provide sufficient accuracy or sensitivity to be useful for most evaluation purposes. Again, developing proxy measures and models relies on existing data, so knowledge of both State and national sources of information on health insurance coverage is critical. In particular, proxy measure and model-based approaches generally will not be sensitive to specific interventions within a geographic area.

For example, if a county implements an intervention to increase insurance coverage, its impact will only be detectable from a model if either:

  • One or more of the correlates are directly impacted by the intervention itself and hence are directly related to uninsurance status (e.g., "self-pay" status for specific diagnoses).
  • There is a significant number of directly measured cases from the area in the blended model (in which case it really becomes best to use the direct estimate approach).
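Both conditions can be seen in a stylized blended estimate. Assume, for illustration only, that a county's estimate is a weighted combination of its direct estimate \(\hat{y}_c\) and a linear synthetic prediction \(\mathbf{x}_c^{\top}\hat{\boldsymbol{\beta}}\), with weight \(w_c\) on the direct part, and that \(w_c\) and \(\hat{\boldsymbol{\beta}}\) stay fixed between two periods. If an intervention changes the true rate by \(\Delta\), the direct estimate is assumed to track that change on average, while the predictors change by \(\Delta\mathbf{x}_c\):

```latex
\hat{\theta}_c = w_c\,\hat{y}_c + (1 - w_c)\,\mathbf{x}_c^{\top}\hat{\boldsymbol{\beta}},
  \qquad 0 \le w_c \le 1

\mathrm{E}\!\left[\Delta\hat{\theta}_c\right]
  = w_c\,\Delta + (1 - w_c)\,(\Delta\mathbf{x}_c)^{\top}\hat{\boldsymbol{\beta}}
  = w_c\,\Delta \quad \text{when } \Delta\mathbf{x}_c = \mathbf{0}
```

Unless the correlates themselves move (the first condition), only the fraction \(w_c\) of a real change shows up, and \(w_c\) is small precisely when the county has few directly measured cases (the second condition).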

Thus, complex, difficult-to-achieve, and/or costly requirements are placed on proxy measure and blended-model approaches if they are to meet most evaluation uses.

Model-based estimates currently being developed by the Census Bureau for uninsurance in small areas may prove immensely useful in future years as methods are refined. The estimates will be based on a large data set, again including all the counties in the country with a sample in the March Supplement to the CPS, and they will use many predictor variables available to the Census Bureau on a timely basis. Thus, it is likely such models would be capable of generating estimates with reasonably high predictive accuracy and in a reasonably timely manner. But like the SAIPE model estimates for children in poverty, the model-based estimates of uninsurance will likely have to be based on a 3-year average estimate for many if not all uses to which they would be put. This 3-year time frame would not accommodate many evaluation uses, although it might prove satisfactory for less rigorous monitoring.

In summary, selection of the appropriate estimation approach is not straightforward and requires an assessment of the principal strengths and weaknesses of each approach (Table 4). Unfortunately, each of the previously listed desired properties for small-area estimates of uninsurance is achievable only at the price of steep trade-offs among the properties. When evaluating the relative merits of the various approaches described, one must also consider the ease or difficulty with which the results can be described. Specifically, if an approach other than direct estimation is pursued, it will be important (and difficult) to provide policymakers with an appropriate understanding of the complex statistical and methodological issues associated with the proxy direct and model-based approaches. End users of the information generated by the approaches must also be informed of the requisite cautions to guard against over-interpretation of the data.

In this chapter, we have attempted to address key issues and sources of data to estimate the demand for safety net services. We have focused on the uninsured and the Medicaid/SCHIP populations, but other populations, including the homeless, mentally ill, and other vulnerable populations, also rely on safety net services. Getting detailed information on the characteristics of coverage and access for these populations is hard because of their small size, the challenge of reaching them, and the costs associated with developing and implementing surveys targeted at specific population groups. We have focused on the largest group of the population that relies on the health care safety net.


Below are some straightforward but targeted messages for local analysts trying to understand the dynamics of the changing need for safety net services.

  • Be knowledgeable and aware of different data sources and approaches to data collection and measurement issues. Understand the CPS methodology and try to use data that are pooled over 3 years whenever possible, even for State-level estimates. CPS will continue to play a dominant role in discussions about the numbers of uninsured and it will be crucial to discuss estimates in the context of the State CPS estimates. In Figure 1, we used a 2-year pooled sample because of changes in the 2000 CPS that affect the estimates. Data from 1999 were not comparable, and 2002 data are not yet available.
  • Be realistic about what can be measured and the difficulties in detecting significant change in demand at the local level. While it may be important to understand the dynamics of shifting demand at the local level, obtaining data that will demonstrate change over time may be too costly. It may be more valuable to understand safety net capacity and use estimates of local demand that may not measure change over time.
  • Use existing resources and data. Data and information are available at both the State and local levels. Before collecting new data, explore what currently exists and whether expertise is available to develop proxy or model-based estimates using this existing data.
  • Use national and local experts. Many Federal analysts spend their careers studying survey and sample design as well as development of statistical adjustments or model-based approaches. In addition, local university-based experts are often interested in these technical issues. Finally, privately funded technical assistance and research centers, including the State Health Access Data Assistance Center (SHADAC), (, now support the application of State and national survey data on coverage policy issues. Find these experts and use them.
  • Build on existing surveys. This is especially important if there is an ongoing State survey that could be expanded or adapted to meet local data needs. Alternatively, explore buying additional sample in a larger national survey such as NSAF, MEPS-IC, or SLAITS. The uniform methods and approach of a national survey will be necessary to compare and contrast different approaches to access and services across localities.
  • Be flexible. Issues related to measuring and monitoring the safety net will be complex and require various data collection and estimation efforts. No single measure can adequately monitor the safety net over time. Analysts will need an approach that captures this complexity while providing a framework and information that can be useful longitudinally.
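The first recommendation, pooling survey years, can be sketched as follows. The Python below assumes hypothetical weighted totals and case counts for three years; dividing the pooled weighted totals by the number of years gives annual-average counts, and the standard error uses a simple-random-sampling approximation, so actual CPS standard errors, which reflect a clustered design, will be larger.

```python
import math

def pooled_rate(yearly):
    """Pool several survey years into one annual-average estimate.

    `yearly` holds (weighted_uninsured, weighted_population, n_cases) tuples,
    one per year. Dividing pooled weighted totals by the number of years
    gives annual-average counts; the rate is their ratio.
    """
    k = len(yearly)
    uninsured = sum(u for u, _, _ in yearly) / k
    population = sum(p for _, p, _ in yearly) / k
    n_cases = sum(n for _, _, n in yearly)
    return uninsured / population, n_cases

def srs_standard_error(rate, n_cases):
    """Simple-random-sampling approximation; design-based CPS errors are larger."""
    return math.sqrt(rate * (1 - rate) / n_cases)

# Hypothetical weighted totals and unweighted case counts for three years.
years = [(130_000, 1_000_000, 400), (150_000, 1_010_000, 420), (140_000, 990_000, 380)]
rate, n = pooled_rate(years)
print(f"pooled rate {rate:.1%} from {n} cases; SE {srs_standard_error(rate, n):.2%}")
```

Tripling the number of cases cuts the standard error by about the square root of 3, at the cost of describing an average over the period rather than the most recent year.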



It is important that State and local analysts clearly define the policy questions to be answered through the estimation of the size of the uninsured and other populations in a local area. This is a necessary condition for assessing how well the estimation approaches described above will work for their intended purpose. For example, proxy measurement approaches are not well-suited to tracking changes over time and may not be the best methods for evaluating the impact of a program designed to decrease the number of uninsured in a local area.

Moreover, State and local analysts should be aware of the limitations of existing approaches and the costs associated with conducting additional household surveys. The sample sizes required to track health insurance coverage at the local level are often too costly to collect on a routine basis.

Finally, technical work being done at the Federal level may produce new model-based estimates that can be useful at the local level. The data and methods used to develop these estimates must be rigorous and the information disseminated in a timely fashion. A balanced and informed approach to estimating the numbers of uninsured and Medicaid/SCHIP covered individuals in a local area can be one key component of assessing the health care safety net.




