Moving Toward International Standards in Primary Care Informatics

Highlights of Moving Toward International Standards in Primary Care Informatics: Clinical Vocabulary

by C. G. Chute, M.D., Dr.P.H.

Throughout the history of medicine, reliance on observation and inference has been the major operational principle of good practice. For those trained in the late 20th century, this tradition has been reinforced by extending the notion of observation beyond the physical exam and careful history to include diagnostic evaluations. A consequence of this march of progress, then, is an explosion of information and knowledge that can be brought to bear in the treatment and management of patients. In short, medical practice, whether we like it or not, has become an information-intensive enterprise, relying in unprecedented ways on the synthesis of detailed and technical observations with complex and interrelated knowledge that embodies our best notions of current practice. Were this not sufficient to capture our attention, the pace of new knowledge discovery, the integration of that knowledge into guidelines, and expectations of infallible excellence all conspire to make the management of patient information and practice knowledge a major priority in medical education and practice; its role in applied patient care research also remains preeminent.

The emphasis on characterizing patient information—including presenting conditions, findings, symptoms, working diagnoses, interventions, and outcomes—is manifest in a broad spectrum of health analyses. Clinical epidemiology, outcomes analysis, health services research, guideline development, continuous quality improvement, and health economics are among the traditions that rely fundamentally on a consistent representation of underlying patient data. If this premise is so, then surely great attention must have been paid to the basic problems of how to consistently represent clinical information in a standardized way. Knowing readers realize that nothing could be further from the truth.

Classifications and Nomenclatures

Since the London Bills of Mortality were published in 1662 (1), periodic efforts have been made to categorize human mortality (2). International coordination of these efforts is manifest in the current International Classification of Diseases (ICD) (3). During the middle of this century, efforts to address human morbidity became a focus of attention, galvanized by the introduction of the American Clinical Modification of the ICD (ICD-9-CM) (4) in 1977. In parallel with these large-scale efforts, the multiaxial Standard Nomenclature of Diseases and Operations (5), begun in 1928, evolved through the pathology classification SNOP (Systematized Nomenclature of Pathology) (6) to become what we now know as SNOMED (Systematized Nomenclature of Human and Veterinary Medicine) International (7). Similarly, the Read Codes (8) have become the basis of patient care coding in the United Kingdom. With such varied and intensive activity in data representation, surely the problem of robustly representing patient data must be solved? The evidence does not entirely support this notion.

The vast majority of patient data in the United States are coded exclusively according to the ICD-9-CM, primarily for reimbursement purposes. These data form the basis of national data sets, compiled by the Health Care Financing Administration and other major insurers, which are used to establish health policy and practice standards. Yet, a simple review of the classification reveals that it is devoid of any notion regarding disease severity. Indeed, two patients with widely differing conditions having profoundly different natural histories and outcomes will often be coded to the identical rubric. For example, a man with a microscopic focus of indolent prostate cancer found accidentally in the course of a transurethral resection of the prostate for benign prostatic hypertrophy will be coded identically to an unfortunate man with widely metastatic prostate cancer involving multiple bone sites, liver, brain, and lungs. Clearly these two men do not have the same prognosis and should not be collapsed into the same category, yet the best available health data in the United States do precisely this. How well, then, do the major clinical classifications work?

In a study undertaken by the Computer-based Patient Record Institute (CPRI), an attempt was made to quantify how well patient data are captured by major clinical coding systems (9). Employing narrative texts drawn from four medical centers and a variety of chart components (history and physical, procedure notes, nursing notes, etc.), the authors extracted 3,061 clinical concepts to create a consistent set of findings to be coded by different terminology systems. After encoding, the quality of each assignment was judged on a 0-2 scale and averaged over all concepts. Table 1 summarizes the salient points of a recent publication by Chute and colleagues (9).

Table 1. Clinical content capture by major terminologies

Terminology system   Diagnoses   Findings   Modifiers   Others   Treatment and procedures   Overall
Read V2              1.47        1.36       0.65        1.50     1.26                       1.05

Note: Average of 0-2 subjective scores for 3,061 clinical concepts from narrative texts. Adapted from: Chute CG, Cohn SP, Campbell KE, et al. The content coverage of clinical classifications. Journal of the American Medical Informatics Association 1996;3:224-33.

ICD-10 is International Classification of Diseases, 10th Revision. ICD-9-CM is International Classification of Diseases, 9th Revision, Clinical Modification. CPT is Current Procedural Terminology. SNOMED is Systematized Nomenclature of Human and Veterinary Medicine. Read V2 is Version 2 of Read Codes.

Among the striking observations is that, overall, ICD-9-CM captures considerably less than half (0.77/2) of the information considered important within the texts. ICD-10 does even less well, suggesting that it alone will not solve our problems with classifying and capturing patient data. Of the systems evaluated, SNOMED performed in a clearly superior way, albeit not without some measure of information loss (about 13 percent). The overall conclusion is that major amounts of information go unrecognized, inevitably resulting in significant misclassification problems for analyses based on data encoded with these terminologies. Hence, the major sources of clinical data in the United States and throughout the world may be misleading.
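The scoring arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration of averaging 0-2 ratings per category and over all concepts, not the study's actual procedure or data: the function names and the toy ratings below are hypothetical. It also shows the conversion used above, where an average of 0.77 on a 0-2 scale corresponds to 0.77/2 of the maximum.

```python
from collections import defaultdict
from statistics import mean

MAX_SCORE = 2  # top of the 0-2 rating scale described in the text


def average_scores(ratings):
    """Average 0-2 ratings per category and over all concepts.

    `ratings` is an iterable of (category, score) pairs, one pair per
    clinical concept. Returns (per_category_means, overall_mean).
    """
    ratings = list(ratings)  # allow a generator; we iterate twice
    by_category = defaultdict(list)
    for category, score in ratings:
        if not 0 <= score <= MAX_SCORE:
            raise ValueError(f"score {score!r} is outside the 0-{MAX_SCORE} scale")
        by_category[category].append(score)
    per_category = {cat: mean(scores) for cat, scores in by_category.items()}
    # The overall figure averages over every concept, so categories that
    # contribute more concepts carry more weight.
    overall = mean(score for _, score in ratings)
    return per_category, overall


def fraction_of_maximum(average):
    """Express a 0-2 average as a fraction of the best possible score."""
    return average / MAX_SCORE


# Hypothetical toy ratings, invented for illustration only.
toy_ratings = [
    ("Diagnoses", 2), ("Diagnoses", 1),
    ("Findings", 2),
    ("Modifiers", 0), ("Modifiers", 1), ("Modifiers", 0),
]
per_category, overall = average_scores(toy_ratings)
print(per_category, overall, fraction_of_maximum(overall))
```

Because the overall score is a mean over concepts rather than a mean of the category means, a category with many poorly captured concepts (modifiers, in this toy data) can pull the overall figure below most of the per-category averages.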

Several efforts are presently underway to develop consistent and robust terminologies intended to capture the clinical detail and substance of patient findings and events. Among them is the Convergent Medical Terminology (CMT) project (10). The CMT project, being undertaken by Mayo Foundation and Kaiser Permanente with funding from the National Library of Medicine (NLM) and the Agency for Health Care Policy and Research (AHCPR), intends to expand a clinically relevant subset of the Large-Scale Vocabulary, using a knowledge representation environment (IBM's prototype K-Rep) (11) to better capture the relationships between observations and to capture pertinent modifiers of these conditions such as severity.

The seminal contributions of the International Classification of Primary Care (ICPC) (12) provide another dimension of functionality, constituting a comprehensive classification specifically organized for primary care. Further, only the ICD can claim a larger international contribution to editorial content and widespread implementation. Nevertheless, the ICPC is not focused on the clinical detail so important to the valid and unbiased analyses of clinical information relating to care practice and outcomes. 

Information Exchange

Few would argue that it is sufficient to capture and classify clinical data; practical care delivery depends critically on being able to exchange relevant information when and where it is needed. The domain of clinical messaging standards, perhaps best exemplified by HL7 (Health Level Seven), attempts to accommodate these requirements. It has been widely noted that the health care industry is farther ahead in establishing standards to exchange messages than it is in standardizing a consistent content for them (13). Most users of message interchange technologies readily acknowledge that considerable latitude exists in how these standards are implemented, detracting considerably from the vision of "plug and play" levels of comparability.

CEN (Comité Européen de Normalisation) Technical Committee (TC) 251 on Health Informatics coordinates European efforts in this domain. Having first focused on a body of fundamental specifications about health information standards, the constituent working groups propose to evolve a spectrum of integrated standards that will support consistent clinical content. TC 251 working groups understand the difficulties of language-independent representations and interchange, and they may produce a level of standards specification that can contribute to an international solution for patient data.

While the technical limitations and challenges of clinical information representation and exchange are formidable, they pale against the political issue of whether people wish to have patient data managed with such facility at all. This is at the heart of the confidentiality question, which is central to any consideration of primary care information policies. 

Confidentiality and Commitment

Although the societal benefit of deriving new knowledge from systematically collected repositories of patient experience seems obvious to me, it must be balanced against the risk and concern associated with potential misuse of confidential data. Ample case reports testify to the genuine risk to future insurability or employment associated with inappropriate access to patient records. Public attitudes are justifiably distrustful of any efforts to facilitate collection, interchange, or access to patient information, regardless of how noble the motivation. This has become manifest in many pending congressional bills, which range from reasonable restriction of unjustified data exchange to a complete ban on collecting health care data for any reason. Indeed, many in the confidentiality community regard the use of patient data for research or knowledge-generation purposes as a dangerous loophole in much of current draft legislation. Sentiment to close these loopholes by prohibiting all research use of patient information has an interested audience in the Federal legislature and among many State lawmakers.

The informatics community may perform no more valuable service within the next decade than to help ensure the confidentiality of primary patient information. This includes standards for repositories of patient data that require encryption of information to ensure against its misuse yet enable the linkage of subsequent outcomes or followup events with earlier episodes of care. In this way, longitudinal profiles of patient experience can be used to improve our understanding of disease natural histories or to empirically evaluate management options with respect to patient outcomes. Indeed, without the assurance of research access to patient information, the great promise of this information-intensive age of medicine will pass unmet, and our opportunity to efficiently and effectively improve the quality of care we deliver will go unaddressed. 
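One way to picture a repository that hides identities yet still links followup events to earlier episodes is to replace each patient identifier with a keyed pseudonym. The sketch below uses a keyed hash (HMAC-SHA256) rather than encryption proper, which is a substitution made here for illustration and not a technique named in the text; the function names, the sample identifiers, and the key are all hypothetical. It assumes the secret key is held by a trusted party and never stored alongside the data.

```python
import hashlib
import hmac
from collections import defaultdict


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from a patient identifier.

    The same identifier and key always give the same pseudonym, so later
    events can be linked; without the key the mapping cannot be recomputed.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def link_episodes(records, secret_key: bytes):
    """Group (patient_id, event) pairs into longitudinal profiles.

    The returned dict is keyed by pseudonym only; raw identifiers are
    not retained in the output.
    """
    profiles = defaultdict(list)
    for patient_id, event in records:
        profiles[pseudonymize(patient_id, secret_key)].append(event)
    return dict(profiles)


# Hypothetical records and key, for illustration only.
demo_key = b"example-key-held-by-a-trusted-party"
demo_records = [
    ("patient-001", "initial visit"),
    ("patient-002", "initial visit"),
    ("patient-001", "followup outcome"),
]
demo_profiles = link_episodes(demo_records, demo_key)
```

A keyed hash is only one piece of such a design. Real deployments would also need key management, access controls, and protection against re-identification from the clinical content itself, none of which this sketch attempts.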

Our needs for standards in primary care range across many challenges. Perhaps the most fundamental are those that address the capture and consistent representation of patient findings, conditions, or events to enable the generation of new knowledge, insights, and understanding for improving the care we deliver. In parallel with these content standards are standards supporting the efficient exchange of data and knowledge at the time and place of need. Finally, the most strategic requirement concerns our ability to assure patients and society that the security and confidentiality of personal histories can be protected while preserving the legitimate needs for aggregate analyses to deliver the promise implicit in the information-intensive age of health care.


References

  1. Graunt J. Natural and political observations made upon the Bills of Mortality, 1662. Baltimore: Johns Hopkins Press; 1939.
  2. Greenwood M. Medical statistics from Graunt to Farr. Biometrika 1941;32:101-27; 1942;32:203-25; 1943;33(I):1-24.
  3. World Health Organization. Manual of the International Statistical Classification of Diseases, Injuries, and Causes of Death (9th Revision). Geneva; 1977.
  4. International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM), Vols. 1-3. Ann Arbor, MI: Commission on Professional and Hospital Activities; 1993.
  5. Standard Nomenclature of Diseases and Operations. Chicago: American Medical Association; 1933.
  6. Systematized Nomenclature of Pathology. Chicago: College of American Pathologists; 1965.
  7. Côté RA, Rothwell DJ, Palotay JL, et al. SNOMED International. Northfield, IL: College of American Pathologists; 1994.
  8. NHS Centre for Coding and Classification. Read Codes File Structure Version 3: Overview and technical description. Woodgate, Leicestershire, UK; 1993.
  9. Chute CG, Cohn SP, Campbell KE, et al. The content coverage of clinical classifications. Journal of the American Medical Informatics Association 1996;3:224-33.
  10. Medical speak. Wall Street Journal 1995 March 9; p. 1.
  11. Mays E, Weida R, Dionne R, et al. Scalable and expressive medical terminologies. Journal of the American Medical Informatics Association; in press.
  12. Lamberts H, Wood M, Editors. International Classification of Primary Care. New York: Oxford University Press; 1987.
  13. General Accounting Office. Automated medical records: leadership needed to expedite standards development. Washington; 1993.
Page last reviewed November 1995
Internet Citation: Moving Toward International Standards in Primary Care Informatics: Highlights of Moving Toward International Standards in Primary Care Informatics: Clinical Vocabulary. November 1995. Agency for Healthcare Research and Quality, Rockville, MD.