Integrated State Data Systems
By Walter P. "Pete" Bailey, M.P.H.
Editor's Note: Needs assessments for and evaluations of health and social programs generally focus on data from a single program, describing the population served or the effectiveness of one program at a time. Whether at the Federal, State, or local level, such efforts rarely examine data from programs other than the one being studied. Broader-based analyses can allow program officials to understand issues such as the extent of Medicaid enrollment among children eligible for free- and reduced-price school lunches, or the impact of alcohol and drug treatment programs on use of the emergency department. A crucial step in moving toward a broader-based understanding of population needs and program effectiveness is developing data allowing those analyses.
Because data from different programs are developed for different purposes, they are generally stored separately and have no common individual identifiers to link them. This chapter describes an innovative approach to conducting more comprehensive analyses across a range of health and social service programs.
Making Use of Administrative Data
Why Use Integrated Data?
Case Studies of Data Linkage Projects
Components for Successful Data Integration
South Carolina has invested heavily in integrated data systems and benefited significantly from using these systems. This paper describes the power of integrated data systems and provides practical advice to States and local communities about some operating principles and capacities that will enable them to build and use their own systems. This paper also discusses the principles that underlie South Carolina's success with this type of data system. These principles are derived from years of developing and encouraging a collaborative model that joins the efforts of State agencies, private-sector organizations, and researchers.
Integration of data from multiple programs and sources can take the policymaker far beyond the limited knowledge that originates with data supplied for a single program, both in understanding the underlying problems of the program participants and in evaluating the impact of the services that the program provides. It is only through understanding more about the populations we serve that we can hope to provide government services that move us in the direction of improving the quality of life and fostering the independence of those who rely on the safety net.
In this paper, the reader will get a sense of what "leaps of understanding" are possible with integrated data systems. These tools may assist each State and locality in developing its own framework and vision, given the structure of its government and the needs of its programs. The concepts presented in this paper will perhaps have universal applicability, but may need to be "tweaked" to accommodate the infrastructure in a particular State.
When designing an integrated data system, it is critical not to limit the types of functional areas included in the data system—all areas of human services can provide information, even though they may at first appear to be unrelated. Attention should be paid to data systems used to determine eligibility for and/or participation in:
- Health services.
- Social services.
- Mental health services.
- Disability services.
- Alcohol and drug abuse programs.
- Educational programs.
- The criminal justice system.
- Elderly services.
- Housing programs.
- Public safety programs.
- Disease, immunization, and child abuse registries.
Data systems used for bill payments, e.g., Medicaid and insurance products, should also be considered for inclusion.
In addition to those sponsored by government programs, there are other valuable sources of data that are worthy of inclusion in an integrated data system. The importance of private-sector data cannot be overemphasized. Information on hospitalizations, emergency department utilization, outpatient or ambulatory surgeries, office visits, home health, and nursing services provides a comprehensive picture of health services utilization. Additionally, the not-for-profit sector can augment existing data systems, particularly since its programs provide care for populations who do not meet the eligibility requirements for publicly sponsored programs. Information on the underinsured, the uninsured, and the working poor can be located in these organizations' data systems. By integrating diverse sources of data from government, private-sector, and not-for-profit programs, researchers can get closer to understanding the prevalence of many health conditions and can more fully assess a population's health services utilization.
To illustrate the types of available administrative datasets that can potentially be integrated, Table 1 lists datasets that are routinely integrated in South Carolina.
Table 1. Data Sets Integrated in South Carolina
| Agency or Program | Data Sets |
|---|---|
| | Decennial and estimates/projections |
| SC First Steps | Needs assessment data for children age 5 and |
| SC Department of Disabilities & Special Needs | |
| SC Vocational Rehabilitation | |
| SC Department of Mental Health | |
| SC Labor, Licensure, and Regulation | Licensed Physicians Database |
| SC Department of Health & Environmental Control | Vital Records, Emergency Medical Services, Ambulance, BabyNet, Children's Rehabilitative Services, various Maternal & Child Health files |
| SC Department of Public Safety | Motor Vehicle Crashes |
| SC Department of Juvenile Justice | Juvenile Justice Referral Database |
| SC Private Healthcare Providers | Inpatient hospitalizations, emergency department visits, outpatient surgeries, home health visits |
| SC Department of Education | Student demographics, Palmetto Achievement Challenge Test (PACT, a standardized test) and Exit Exams, 1st grade readiness |
| SC Department of Social Services | Temporary Assistance to Needy Families (TANF), Wage Match and Work Support, Food Stamps, Foster Care Tracking, Child Protective Services, Adult Protective Services, Child Support Services |
| SC State Law Enforcement Division | Criminal History File, Crime Incidents |
| SC Department of Health & Human Services | Medicaid claims data, Child Care Voucher System, Community Long Term Care, Division on Aging |
| SC State Health Plan | Medical claims data for State employees |
| SC Department of Alcohol & Other Drug Abuse Services | Client service files |
Ideally, an integrated data system attempts to capture the full range of health and human services experiences of the populations being served, so a range of information from administrative data systems can be relevant and yield insight into many areas of interest.
Making Use of Administrative Data
Administrative data systems provide cost efficiency and convenience for engaging in targeting, program planning, evaluation, and monitoring. All States have administrative data systems that manage programs and services, such as Medicaid and various social services. While employing these data systems individually in a "silo-fashion" can provide a wealth of statistical information, integrating these systems at the client level can provide a comprehensive understanding of the health and human services experience. By integrating data systems, researchers and managers can obtain a fuller appreciation of the constituents they are serving and problems they are addressing. Household structure, diagnoses, employment, income level, education status, disability, and migrant status are just a few examples of what may be available through integrated administrative datasets.
An integrated data system allows agencies to track individuals over time and across the different health and human services agencies. If empowered with address-matching software and a geographic information component, such a system can bring about strong cooperation and collaboration as well as the capacity to target problems, pretest programs, and conduct outcome evaluations that will more accurately reflect the health and human service experience. Most of this information has been sitting in government data systems for decades. In the 21st century, the technology to move forward has come to fruition, and government now has the technological capacity to create a fully integrated health and human services data system using the administrative data systems it has always employed.
Why Use Integrated Data?
Knowing the prevalence of diseases and diagnoses is useful in policymaking, health planning, and treatment. To the extent that individuals use services for particular diseases, linking administrative data from the programs that provide these services can provide prevalence rates for specific populations. For example, by linking Medicaid, hospital discharge, and emergency department data to an administratively defined population such as recipients of Food Stamps or Temporary Assistance to Needy Families (TANF), it is possible to determine the prevalence of conditions such as hypertension, diabetes, and low birth weight among the Food Stamp or TANF recipients who use health services and are enrolled in Medicaid. Although this information is very useful, administrative data reflect only the experiences of clients who have sought care, received services, are listed on registries, or have a health insurance plan. They do not account for individuals who lack health insurance or who have not sought care or services.
Population Definitions and Needs Assessments
The key to a solid program evaluation is understanding the population served and the nature of their needs. One of the many advantages of an integrated administrative data system is the ability to define a population. In South Carolina, there has been much interest in the safety net population, specifically their resource utilization and health outcomes. To meet this need, data from the Medicaid, TANF, and Food Stamps programs, as well as from uninsured hospitalizations, emergency department visits, and outpatient surgeries, were linked and unduplicated to create a safety net dataset for research and analysis.
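The linking and unduplication step can be sketched in a few lines of Python. This is a minimal illustration, not South Carolina's actual procedure: it assumes every source record already carries a shared person-level linkage key (here called `person_id`, a hypothetical name), and all records shown are invented.

```python
from typing import Iterable

# Hypothetical extracts. Each record is ASSUMED to carry a shared
# person-level linkage key ("person_id"); real files would need a
# linkage step to create one.
medicaid = [{"person_id": "P1"}, {"person_id": "P2"}]
tanf = [{"person_id": "P2"}, {"person_id": "P3"}]
food_stamps = [{"person_id": "P3"}, {"person_id": "P4"}]
uninsured_visits = [{"person_id": "P4"}, {"person_id": "P5"}, {"person_id": "P5"}]


def build_unduplicated(sources: dict[str, Iterable[dict]]) -> dict[str, set[str]]:
    """Map each person_id to the set of source names in which it appears."""
    people: dict[str, set[str]] = {}
    for source_name, records in sources.items():
        for record in records:
            people.setdefault(record["person_id"], set()).add(source_name)
    return people


safety_net = build_unduplicated({
    "medicaid": medicaid,
    "tanf": tanf,
    "food_stamps": food_stamps,
    "uninsured_visits": uninsured_visits,
})

print(len(safety_net))           # unduplicated number of people
print(sorted(safety_net["P2"]))  # sources in which person P2 appears
```

Keeping the set of contributing sources for each person, rather than only a count, preserves the information needed later to ask which programs serve the same individuals.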
When defining a population, there are two types of data systems to consider integrating:
- Those that contain specific variables that support the definition of the population to be studied.
- Those that, by virtue of the services their programs provide, are affiliated with the population being defined.
Because most data systems code an individual on a variety of variables (e.g., diagnoses, age, income, gender), it is relatively simple to define a population using the existing codes provided by the administrative data systems. For example, in creating a study cohort of children with special health care needs, children with disabilities who participated in State agency programs were included by virtue of their program participation. Additional children were added by identifying them through ICD-9 diagnoses included in Medicaid claims data.
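A cohort built this way is the union of two groups: children included through program participation and children identified through diagnosis codes. The sketch below assumes a shared `person_id` key and uses two ICD-9 code prefixes purely for illustration; they are not the definition used in South Carolina, and all records are invented.

```python
# Illustrative only: these ICD-9 prefixes are NOT the definition used by
# any South Carolina program. 343 = infantile cerebral palsy, 493 = asthma.
QUALIFYING_ICD9_PREFIXES = ("343", "493")

# Hypothetical roster of children enrolled in State disability programs,
# keyed by an assumed shared "person_id".
program_participants = {"C1", "C2"}

# Hypothetical Medicaid claims.
claims = [
    {"person_id": "C2", "icd9": "493.90"},
    {"person_id": "C3", "icd9": "343.9"},
    {"person_id": "C4", "icd9": "462"},  # does not match a qualifying prefix
]

# Children identified through diagnoses on claims.
identified_by_diagnosis = {
    claim["person_id"]
    for claim in claims
    if claim["icd9"].startswith(QUALIFYING_ICD9_PREFIXES)
}

# The cohort is the union of both routes; set union removes duplicates.
cohort = program_participants | identified_by_diagnosis

print(sorted(cohort))
```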
After defining the study population, the needs assessment can proceed by measuring indicators of interest, such as service encounters. For example, if investigating health services encounters for individuals with mental illness, integrated data can illustrate the frequency and types of encounters across different components of the health care system by date and diagnosis, giving the researcher a more robust needs assessment.
Experience has demonstrated that data requests for needs assessments and program planning are largely based either on the general population within a geographic area of interest or on a subpopulation of interest, such as the safety net population. Most indicators that are used for needs assessment in the general population are also available for subpopulations, such as low birth weight babies, individuals with disabilities, food stamp recipients, and children with special health care needs. Program planning, budget development, and program implementation all are aided by using data that are focused on the subpopulation served. Table 2 provides examples of needs assessment topics and indicators that are routinely asked of the general population, but also can be asked of any subpopulation of interest via an integrated data system.
Table 2. Example of Needs Assessment Indicators
| Topic | Indicators |
|---|---|
| | Rates of fertility, birth, prenatal death, postnatal death, infant death, premature birth, teen pregnancy, birth defects, low and very low birth weight births, birth to single mothers, high-risk babies, maternal disease, and adequacy of prenatal care |
| Health and Other Service Utilization | Rates and reasons for doctor's office visits, emergency department visits, hospitalizations and preventable hospitalizations, prescription drug utilization; cost of health care by type of care; Department of Mental Health use and rates by type; type of specialist providing care; Early and Periodic Screening, Diagnosis and Treatment (EPSDT) visits; and number and percent of clients using multiple agencies' services |
| | 4-year-old kindergarten participation; school readiness; test scores, number and rate of children attending alternative schools; percent of students with individual education plans; Head Start participation; and First Steps participation |
| Home and Family Life | Household structure; volume and rates of child care by type; marriage and divorce rates; abuse and neglect rates by type; rates of children in foster care; free and reduced lunch rates; information on family resources; housing or rental costs, transportation availability; and parents' educational levels |
| Quality of Life Indicators | Children with special health care needs by type; disability rates by type; accident and injury rates by type; fire injuries; death rates by type; urban/rural comparisons; prevalence of health conditions; and vocational rehabilitation rates |
| | Alcohol and drug use rates by type; arrest rates by type; oral health statistics; rates of seat belt use; car seat use; and rates of Department of Juvenile Justice involvement by type |
Correlational analysis also can be used to suggest other interventions that may affect the subpopulation of interest. For example, governments invest millions of dollars in educational improvement initiatives. Rarely are these funds allocated for interventions targeting life issues that influence educational performance. In South Carolina, by linking the birth record files with educational performance files, the following factors were identified as being related to school readiness:
- Mother's educational attainment.
- Adequacy of prenatal care.
- Number of previous births.
- Gestation period.
- Apgar score.
- Mother's age.
Had this correlational analysis been conducted prior to the allocation of funds for educational improvement programs, the funding source would have realized that there also are interventions that need to occur during the prenatal period that could improve student educational performance.
Evaluation of most programs involves more knowledge than can be gleaned from data systems specific to those programs. A classic illustration is the evaluation of programs related to mental health and alcohol and drug abuse. Measurement of outcomes for these programs requires additional knowledge of substantive areas other than the traditional mental health, alcohol, and drug abuse program areas. For example, successful treatment programs should (1) prevent law enforcement encounters, e.g., juvenile justice involvement and imprisonment; (2) reduce school drop-out rates; (3) improve educational scores; (4) reduce alternative school participation; (5) reduce teen pregnancies; (6) reduce use of social services programs, e.g., TANF and food stamps; (7) prevent abuse and neglect of vulnerable populations, such as children and the elderly; (8) reduce emergency department utilization and associated costs; and (9) lower suicide and attempted suicide rates.
All of these outcomes involve linkages to major administrative datasets such as vital records, hospitalization and emergency department data, and social and human services data that are not typically a part of the mental health and alcohol and drug abuse service data systems. Having integrated linkages to these systems enables comparative program evaluation in which programs that affect these areas as well as alcohol and substance abuse may be identified. In this way, programs with comprehensive impacts may yield information on best practices with wide applicability.
Interventions and programs are only as good as the entities that provide services and the people staffing them. With integrated administrative data systems, comparisons can be conducted at the provider or staff level. As in the case of mental health, integrated data can provide detailed information at the staff level, enabling program staff to learn from the experiences of other staff. This comparative knowledge can be useful in continuing education programs designed to improve the quality of services offered by providers.
However, one must use caution when employing integrated data systems to evaluate community-level interventions. It can be difficult to determine whether the intervention under investigation generated change. The traditional threats to analytic validity should be considered when using an integrated data system, as the evaluation design may not be able to account for all confounders. For example, one county in South Carolina showed high rates of economic and population growth along with a decrease in its poverty rate. The decline in the poverty rate was deceptive because the denominator, the county's total population, had changed. The number of poor people actually increased during the same period, but because the number of non-poor people increased even more, the rate fell, and erroneous conclusions could have been drawn.
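The denominator effect is easy to see with a small calculation. The figures below are hypothetical and chosen only to reproduce the pattern described; they are not the county's actual counts.

```python
# HYPOTHETICAL figures chosen only to illustrate the denominator effect;
# they are not data from any South Carolina county.
poor_before, total_before = 10_000, 50_000
poor_after, total_after = 11_000, 70_000


def poverty_rate(poor: int, total: int) -> float:
    """Share of the total population counted as poor."""
    return poor / total


rate_before = poverty_rate(poor_before, total_before)
rate_after = poverty_rate(poor_after, total_after)

print(f"Poor people: {poor_before} -> {poor_after}")
print(f"Poverty rate: {rate_before:.1%} -> {rate_after:.1%}")
```

In this invented case the number of poor people rises by 1,000 while the rate falls, because the total population grows faster than the poor population. Reporting the count alongside the rate guards against the misreading.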
Integrated data can be used to measure the impact of an intervention. Using integrated data provides rich feedback on the outcome measures of interest, and it can also enrich the understanding of tangential outcomes. To illustrate, Commun-I-Care is a not-for-profit organization providing pharmaceutical assistance to patients who could not otherwise afford their medications. The organization is funded through grants and contributions from the medical and hospital communities as well as the pharmaceutical industry. By integrating Commun-I-Care's program data with private-sector hospitalization data, the organization was able to demonstrate that its program significantly reduced emergency department visits among the participants. The intervention thus saved the hospitals in its service area unnecessary emergency department costs.
Case Studies of Data Linkage Projects
The following three case scenarios are representative of South Carolina's efforts to use integrated data systems. They illustrate how safety net populations of interest can be defined operationally in ways that are relevant for program managers and elected officials. The examples also demonstrate the utility of understanding safety net service utilization across program boundaries as well as the potential for identifying powerful performance measures, like educational achievement, outside the confines of standard health outcome indicators. Moreover, the Geographic Information System (GIS) analyses that are part of the case examples show how integrated data can elucidate differences among neighborhoods that may not be obvious from ordinarily available demographic data. This information can provide leads for those who want to address safety net concerns in communities by targeting key hotspots with tailored interventions. As described in the cases, findings often stimulate new investigations, resulting in an ongoing partnership among data analysts and those who provide services or allocate resources.
Example 1. Children With Special Health Care Needs
A health department-sponsored program with the mission of serving children with special health care needs desired a comprehensive understanding of the conditions and services of their constituents. The program, Children's Rehabilitative Services (CRS), estimated that it had served approximately 10,000 children over a 3-year period.
Defining the Population
In an effort to move toward prevalence measures, all the available relevant administrative datasets were linked together to estimate the total number of children with special health care needs. CRS defined "children with special health care needs" based on ICD-9 codes. This definition was applied to health service data systems including Medicaid, inpatient hospitalization, outpatient surgeries, emergency department visits, and the State government employee health plan (which accounts for 10 percent of the overall State population). Some data systems had other diagnostic information, but not at the ICD-9 level. Nonetheless, these systems were integrated for the analysis because of their mission, and included files from the Department of Mental Health, CRS, BabyNet, the Department of Disabilities and Special Needs, the Department of Vocational Rehabilitation, the Department of Education, and the Department of Social Services. By integrating all of these separate systems using the definition provided, an unduplicated count of more than 340,000 children with special health care needs was developed. This analysis had tremendous ramifications for CRS' budget and program agenda, which had previously been based on serving the 10,000 children they had identified through analysis of individual datasets.
After defining the cohort of children with special health care needs, additional information was gleaned from the integrated data. One area of interest was "crossover services," that is, multiple organizations serving the same children. Given the complexity of the problems faced by these children, it was hypothesized that they were receiving many different services from a variety of providers. The integrated data demonstrated a very different outcome: only a very small portion of the cohort was identified as being served by multiple State agencies and programs.
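One way to measure crossover is to count the distinct agencies that appear for each child in the linked data. The sketch below assumes linked service records in the form of (child identifier, agency) pairs; the identifiers and records are invented for illustration.

```python
from collections import Counter

# Hypothetical linked service records: (assumed shared child ID, agency).
service_records = [
    ("K1", "CRS"),
    ("K1", "Medicaid"),
    ("K2", "CRS"),
    ("K3", "Department of Mental Health"),
    ("K3", "Department of Mental Health"),  # repeat visit, same agency
    ("K4", "BabyNet"),
]

# Collect the distinct agencies serving each child.
agencies_by_child: dict[str, set[str]] = {}
for child_id, agency in service_records:
    agencies_by_child.setdefault(child_id, set()).add(agency)

# Distribution of children by number of distinct agencies.
distribution = Counter(len(agencies) for agencies in agencies_by_child.values())

# Share of the cohort served by two or more agencies.
crossover_children = [c for c, a in agencies_by_child.items() if len(a) >= 2]
crossover_share = len(crossover_children) / len(agencies_by_child)

print(dict(distribution))
print(f"{crossover_share:.0%} of children served by multiple agencies")
```

Using a set per child means repeat encounters with the same agency are not mistaken for crossover.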
The integrated data also provided important information on the economic well-being of families of children with special health care needs. By linking data at the client level, CRS gained a better understanding of how many families were receiving financial assistance from programs such as TANF and Food Stamps, the number of families with private insurance, and an estimate of the number of families who were uninsured.
A few of the data sources captured information on household structure, including variables on the number of people dwelling in the house, head of household, and household income. This information provided CRS with an insight into the family lives of their constituents. Additionally, the number of abused or neglected children and the number living in foster care also could be counted. The information for this analysis was available only through an integrated data system, as all of this information did not reside in one data source.
Educational performance of the cohort of children with special health care needs was also explored. School readiness, Palmetto Achievement Challenge Test (PACT, a State-administered examination) scores, and Exit Examination (required for graduation) results were all analyzed to compare children with special health care needs to children without special health care needs. The results demonstrated remarkable differences, with the former performing at a much poorer level than the latter. Figures 1 and 2 show the results of these comparisons.
Geographic Information Systems and Environmental Factors
Having defined the cohort, the number and rate of children with special health care needs were mapped at the county level. The analysis demonstrated that two counties most noted for their environmental pollution had the highest rate of children with special health care needs. In addition, the results showed that high numbers of children with special health care needs live in the Interstate 95 corridor, which is also known for its high rates of poverty.
After viewing the map of children with special health care needs, data on diagnosis categories for the county with the highest rate of children with special health care needs were analyzed. It was determined that this county had the highest rate of childhood asthma in the State. In response, the State's Medicaid agency embarked on a campaign to address the issue by initiating preventive programs, environmental analysis, and working with health care providers and the educational community.
Other Uses of the Integrated Data
Once appropriate consent and privacy issues have been addressed, it is possible to use integrated data for sampling purposes. For example, while integrated administrative data yielded much information for CRS to process, there were additional questions the staff wished to ask families of children with special health care needs. Specifically, they wanted more information on the families' satisfaction with child-to-adult transition services, families' perception of whether or not their children had medical homes, satisfaction with medical homes, and satisfaction with State agency services. These topics were later addressed through a written survey, and the participants were sampled from the integrated database.
Example 2. "Covering Kids" Project
This project was undertaken at the request of the South Carolina Hospital Association in cooperation with several State and non-profit agencies. With funding from The Robert Wood Johnson Foundation and The Duke Endowment, the workgroup set out to identify geographic pockets with large numbers of uninsured children and to implement strategies for increasing the numbers of these children covered by Medicaid and/or the State Children's Health Insurance Program (SCHIP).
Locating the Population
Using 2 years of data from the inpatient hospitalization and emergency department visit datasets, uninsured children were identified from the payer field. In addition, datasets with eligibility information for TANF and Food Stamps were incorporated to locate potentially uninsured children. All these records were then integrated with Medicaid eligibility files to remove any children who had subsequently obtained Medicaid coverage. The demographic and geographic variables found on the datasets enabled a detailed description of the identified population.
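The identification logic amounts to three set operations: select children whose hospital or emergency department records show no insurance, add children found in TANF and Food Stamps eligibility files, then subtract anyone present in Medicaid eligibility files. The sketch below is an illustration under stated assumptions: the `person_id` key, the payer code `"self-pay"`, and the age cutoff are all hypothetical, and the records are invented.

```python
# ASSUMPTIONS for illustration only: the payer code and age cutoff are
# hypothetical, as is the shared "person_id" linkage key.
UNINSURED_PAYER_CODES = {"self-pay"}
MAX_CHILD_AGE = 18

# Hypothetical inpatient and emergency department records.
hospital_and_ed_visits = [
    {"person_id": "U1", "age": 7, "payer": "self-pay"},
    {"person_id": "U2", "age": 10, "payer": "Medicaid"},
    {"person_id": "U3", "age": 34, "payer": "self-pay"},  # adult, excluded
    {"person_id": "U4", "age": 3, "payer": "self-pay"},
]

# Hypothetical children found in TANF / Food Stamps eligibility files.
tanf_food_stamp_children = {"U4", "U5"}

# Hypothetical Medicaid eligibility file.
medicaid_enrolled = {"U2", "U4"}

# Step 1: uninsured children identified from the payer field.
uninsured_from_visits = {
    visit["person_id"]
    for visit in hospital_and_ed_visits
    if visit["payer"] in UNINSURED_PAYER_CODES and visit["age"] <= MAX_CHILD_AGE
}

# Step 2: add potentially uninsured children from the assistance programs.
candidates = uninsured_from_visits | tanf_food_stamp_children

# Step 3: remove children who appear in Medicaid eligibility files.
confirmed_non_medicaid = candidates - medicaid_enrolled

print(sorted(confirmed_non_medicaid))
```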
It is important to note that the hospital discharge and emergency department data were used for targeting uninsured populations, rather than developing a precise estimate of the uninsured. The aim was to identify the most likely census blocks with high concentrations of uninsured individuals to target outreach efforts. Discharge data have limited utility in estimating levels of insurance coverage for small areas. For example, because many individuals without health insurance seek health care at alternative locations, such as free clinics and community health centers, it is very difficult to calculate the prevalence of the uninsured. Since this study was conducted, South Carolina has added free clinic information to the integrated data system and is currently partnering with community health centers concerning potential inclusion of their data. As more datasets become integrated, more precise estimates will be possible.
The resulting dataset of "confirmed non-Medicaid" uninsured children was also fed through address-matching software, and density maps by county were produced. These maps provided critical information to the project for determining pilot intervention locations.
Improving Intervention Strategies
By linking Medicaid eligibility datasets with Department of Education free and reduced lunch datasets, it was possible to map children who were likely to be Medicaid-eligible at the school attendance zone level. The ability to target particular schools for enrolling uninsured children in Medicaid or SCHIP would not have been possible without the use of the integrated datasets.
While the numbers of Medicaid-eligible children were monitored monthly, additional program effects were measured by assessing changes in utilization of hospital services by uninsured children, especially the use of emergency department services. Over the first 2 years of the project, the number of uninsured children declined, and emergency department visits by uninsured children fell by more than 30 percent. This added to the fiscal health of the State's hospitals and improved access to health care for the State's children.
As alluded to previously, the integrated data system also helped in targeting "hot spots," or concentrated areas of children eligible for Medicaid or SCHIP. By linking files from the Department of Education with the hospital discharge and emergency department data, communities and schools that would benefit from targeted outreach were identified.
Early in the project, information on the amount of charity care provided to children was compiled for each hospital and mailed to its Chief Executive Officer (CEO). Hospitals were offered the opportunity to receive lists of their confirmed uninsured patients as a means of assisting them in identifying children who would be eligible for the project's intervention.
Example 3. Healthy Start
One of South Carolina's three Healthy Start projects desired to know the location of high concentrations of infants born with serious health problems. This information would be used by the project to:
- Target interventions.
- Recruit clients.
- Tailor preventive services.
Locating the Population
The service area consisted of several rural, economically depressed counties along the Interstate 95 corridor. Vital records and Medicaid files were linked to define a cohort of babies with serious health problems. Prenatal and postnatal diagnoses and events were used in defining the cohort.
Once the cohort was defined, the data were processed through address-matching software. Babies with serious health problems were then mapped at the census block level. Using the linked data system, census blocks with the highest concentration of infants with serious health problems were compared to census blocks with very few such children along a variety of indicators including demographics, community resources, and health diagnoses. It became important to explain these differences so that interventions and strategies could be appropriately designed.
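Once address matching has assigned each infant to a census block, finding the blocks to compare is a counting exercise. The sketch below assumes that geocoding has already been done and that each record carries a census block identifier; the block labels and records are invented. It ranks blocks by raw counts, whereas an analysis of concentration might instead use rates, which would require a count of births per block as a denominator.

```python
from collections import Counter

# Hypothetical output of address matching: (infant ID, census block ID).
# Block IDs are made-up labels, not real census identifiers.
geocoded_cohort = [
    ("B1", "block_A"),
    ("B2", "block_A"),
    ("B3", "block_A"),
    ("B4", "block_B"),
    ("B5", "block_C"),
    ("B6", "block_C"),
]

# Count cohort infants in each census block.
infants_per_block = Counter(block for _, block in geocoded_cohort)

# Rank blocks from highest to lowest count.
ranked_blocks = infants_per_block.most_common()
highest_block, highest_count = ranked_blocks[0]
lowest_block, lowest_count = ranked_blocks[-1]

print(f"Highest: {highest_block} ({highest_count} infants)")
print(f"Lowest: {lowest_block} ({lowest_count} infants)")
```

The highest- and lowest-ranked blocks can then be compared on other linked indicators, such as demographics or community resources.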
Catalyst for Communication
The comparison work with the integrated data and GIS technology yielded no significant differences on any of the variables between the census blocks with high concentrations of infants with serious health problems and those census blocks with low concentrations of such infants.
When the data were presented to the project's staff, consortia, and consumers, an interesting discussion ensued. Using their own familiarity with the area and their own qualitative, ethnographic assessments, they discussed the individual community and neighborhood dynamics that might explain the differences in infant health for the census blocks.
This example is important because even though the integrated data could not explain the differences in health outcomes, it served as a communication catalyst among stakeholders in a very worthwhile project. The group reached consensus on important differences between the census blocks. They identified the census blocks with better birth outcomes as having had a network of strong, viable churches with full-time ministers and major community or civic programs such as the Boy Scouts. In comparison, the census blocks with poor birth outcomes had churches with itinerant ministers who visited once a month for religious services. Additionally, they were deficient in major civic and community programs.
On an anecdotal note, it also seems that those blocks with high concentrations of infants with problems had "juke joints" or establishments that served alcohol and promoted congregation among the young African Americans in the area. The stakeholders believed that the "juke joints" promoted an environment of poor personal health behaviors that could result in poor birth outcomes.