U.S. Department of Health and Human Services
Agency for Healthcare Research and Quality

Systematic Review of the Literature Regarding the Diagnosis of Sleep Apnea


Evidence Report/Technology Assessment: Number 1

This information is for reference purposes only. It was current when produced and may now be outdated. Archive material is no longer maintained, and some links may not work.


Under its Evidence-based Practice Program, the Agency for Health Care Policy and Research (AHCPR) is developing scientific information for other agencies and organizations on which to base clinical guidelines, performance measures, and other quality improvement tools. Contractor institutions (such as the MetaWorks®, Inc., Evidence-based Practice Center) review all relevant scientific literature on assigned clinical care topics and produce evidence reports and technology assessments, conduct research on methodologies and the effectiveness of their implementation, and participate in technical assistance activities.

Overview / Reporting the Evidence / Methodology / Findings / Future Research / Availability of Full Report


Overview

In this study, MetaWorks investigators have developed an evidence base via a systematic review of the literature pertinent to diagnostic testing and screening in sleep apnea in adult patients. Sleep apnea (SA) is a recently recognized disorder of sleep characterized by recurrent apneic and hypopneic episodes.

Apnea was typically defined as complete cessation of airflow, although some studies used a reduction in airflow of more than 80 percent. To define hypopnea, most papers used a 50 percent or greater reduction in airflow, with or without a coincident O2 desaturation of anywhere from 2 percent to 4 percent from some average SaO2 over a preceding interval of time.

In view of its high prevalence and serious associated morbidity, SA has recently been described as a major public health concern. A major problem in the field in 1998 is diagnosis: who to test, how to test, and what are the implications of test results regarding the risk of serious clinical sequelae?

Sleep apnea is a condition for which the gold standard diagnostic method, overnight full-channel polysomnography (PSG) in a sleep lab, is intrusive and costly, and the interpretation can be difficult. A standard PSG typically consists of electroencephalogram (EEG), submental (± tibialis) electromyogram (EMG), electrooculogram (EOG), respiratory airflow (usually by oronasal flow monitors), respiratory effort (usually by plethysmography), and oxygen saturation (oximetry). Electrocardiography (ECG) and body position are also frequently monitored in formal sleep studies and are stated to be standard requirements of PSG by some groups.

If, however, the estimated prevalence of sleep apnea at 2 percent to 4 percent of middle-aged adults is correct, the costs of full PSGs to diagnose all suspected cases would be prohibitive. The development of simpler and less costly alternatives for diagnostic testing would be highly desirable, as would simpler prescreening tests prior to full PSG.

Diagnostic approaches which might be viewed either as alternatives to PSGs or as screening tests to better select patients for PSG include:

  • Partial channel PSGs.
  • Partial night or daytime PSGs.
  • Portable sleep monitoring devices for use at home.
  • Radiologic imaging of the head and neck for anatomic abnormalities predictive of sleep apnea, including cephalometry, magnetic resonance imaging (MRI), and computed tomography (CT) scans.
  • Anthropometric measurements, such as neck circumference.
  • Nasopharyngeal and laryngeal endoscopic measurements of both structure and function.
  • Focused questionnaires.

All such interventions were within the scope of this review, provided they compared results against the gold standard diagnostic test, the standard PSG.

Because the type of sleep evaluation study preferred (and reimbursed) varies widely among physicians, sleep centers, and managed care organizations, MetaWorks investigators have avoided making specific recommendations in this review. MetaWorks investigators also did not review technical considerations related to data acquisition, storage, retrieval, and analysis of various devices, which were beyond the scope of this project. Rather, it is intended that this synthesis of the best available evidence will serve as an information resource for local decisionmakers and developers of guidelines/recommendations. It should also serve to highlight gaps in the literature and areas ripe for future research.

Reporting the Evidence

The key questions that guided this review were:

  1. What diagnostic and screening tests are presently available?
  2. What is the strength of the evidence in support of each?
  3. What is the predictive value of these tests in different populations (which requires estimating the prevalence of SA in different populations)?
  4. What are the implications of certain PSG results in terms of serious clinical events occurring as comorbidities in association with a diagnosis of SA?
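Key question 3 depends on a standard relationship: for a fixed sensitivity and specificity, the predictive value of a test changes with the prevalence of SA in the population tested. A minimal sketch of that relationship follows. The function name and the input values are illustrative assumptions, not figures from the report.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence.

    All three inputs are proportions between 0 and 1. This is Bayes' rule
    applied to a binary test; it is not a method described in the report.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)

    ppv = true_pos / (true_pos + false_pos) if (true_pos + false_pos) > 0 else float("nan")
    npv = true_neg / (true_neg + false_neg) if (true_neg + false_neg) > 0 else float("nan")
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical test characteristics, evaluated at two assumed prevalences.
    for prevalence in (0.04, 0.50):
        ppv, npv = predictive_values(0.90, 0.80, prevalence)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With these illustrative inputs, the positive predictive value is about 0.16 at 4 percent prevalence and about 0.82 at 50 percent prevalence, which is why the same test can perform very differently as a general screen and in a referred sleep clinic population.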


Methodology

In general, MetaWorks investigators used systematic review methods derived from the evolving science of review research. The review followed a prospective protocol that was developed a priori and shared with the nominating partners on the project (Blue Cross/Blue Shield [BC/BS] of Massachusetts and the Sleep Disorders Centre of Metropolitan Toronto), a panel of technical experts (with representation from consumer groups and professional specialties: neurology, pulmonology, dentistry, otolaryngology, epidemiology, and nursing), and the Task Order Officers at AHCPR. The protocol outlined the methods to be used for the literature search, study eligibility criteria, data elements for extraction, and methodological strategies to minimize bias and maximize precision during the process of data collection, extraction, and synthesis.

The published literature was searched from 1980 onward. The search cutoff date was November 1, 1997, and the retrieval cutoff date was January 30, 1998. The search started with a broad Medline search using the terms "sleep apnea syndrome" and "monitoring, physiologic," "sleep apnea syndrome" and "airway resistance," and "human." Also, MetaWorks investigators searched "sleep apnea syndromes," "sleep apnea syndrome," and "index." In addition, the 1997 Current Contents CD-ROM was searched ("sleep apnea") to the same cutoff date. All citations and abstracts were printed and screened at MetaWorks for any mention of diagnostic tests in adults with SA; full papers were obtained for those that qualified. The electronic searches noted above were supplemented by a thorough search of the reference lists of all eligible studies and relevant review articles.

To be included in the review, studies had to report results of any diagnostic test or intervention to establish or support a diagnosis of SA in adults, with at least 10 patients as total sample size. Studies reported in the following Western European languages—English, German, French, Spanish, or Italian—were accepted.

All eligible papers were scored on features pertinent to diagnostic test study design, execution, and reporting, with a range of possible scores from 0 to 44. Those falling in the lowest 20 percent of the distribution of actual scores were dropped from data extraction and analysis. Each accepted diagnostic study was extracted in duplicate by investigators with one extractor using a blinded copy of each study report, masked as to source of financial support, authors, and journal. The agreement between extractors was approximately 78 percent and differences were resolved by consensus.

Key data elements sought for extraction from each study included study level, patient level, and test characteristics. Only clearly reported aggregate results were extracted from studies. Results that were only given for individual patients, and results that would require extrapolations from graphs or derivations from figures or tables were not captured. For all tests, sensitivity, specificity, positive predictive value, negative predictive value, and correlation coefficients of each test relative to PSG AI or AHI (RDI) results were sought.
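The accuracy measures sought for each test can all be derived from a 2 x 2 table that cross-classifies the index test against the PSG result. The sketch below shows the arithmetic only. The class name, function name, and counts are hypothetical and are not taken from any study in the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of index-test results cross-classified against PSG."""
    tp: int  # test positive, PSG positive
    fp: int  # test positive, PSG negative
    fn: int  # test negative, PSG positive
    tn: int  # test negative, PSG negative


def _ratio(numerator, denominator):
    # Return None rather than dividing by zero when a margin is empty.
    return numerator / denominator if denominator else None


def accuracy_measures(table):
    """Return sensitivity, specificity, PPV, and NPV as proportions."""
    return {
        "sensitivity": _ratio(table.tp, table.tp + table.fn),
        "specificity": _ratio(table.tn, table.tn + table.fp),
        "ppv": _ratio(table.tp, table.tp + table.fp),
        "npv": _ratio(table.tn, table.tn + table.fn),
    }


if __name__ == "__main__":
    # Hypothetical counts for a single study of 100 patients.
    example = TwoByTwo(tp=45, fp=20, fn=5, tn=30)
    for name, value in accuracy_measures(example).items():
        print(f"{name}: {value:.3f}")
```

Returning None for an empty margin keeps an undefined measure distinct from a measure of zero, which matters when a study enrolls no PSG-negative patients.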

Apnea index (AI) is defined as the number of apneic episodes per hour of sleep, and apnea-hypopnea index (AHI) is the total number of apneas plus hypopneas during total time asleep, divided by the number of hours asleep. The respiratory disturbance index (RDI) is the same as AHI.
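The index definitions above reduce to simple arithmetic, sketched here. The function names and the example night are illustrative assumptions. The thresholds of 5 and 10 are the AI/AHI thresholds reported in the findings of this summary, and applying them with "greater than or equal to" is an assumption, since the summary does not state how boundary values were handled.

```python
def apnea_index(n_apneas, hours_asleep):
    """Apneic episodes per hour of sleep."""
    if hours_asleep <= 0:
        raise ValueError("hours_asleep must be positive")
    return n_apneas / hours_asleep


def apnea_hypopnea_index(n_apneas, n_hypopneas, hours_asleep):
    """Apneas plus hypopneas per hour of sleep (also reported as RDI)."""
    if hours_asleep <= 0:
        raise ValueError("hours_asleep must be positive")
    return (n_apneas + n_hypopneas) / hours_asleep


def meets_threshold(index_value, threshold):
    """Assumed convention: an index at or above the threshold counts as positive."""
    return index_value >= threshold


if __name__ == "__main__":
    # Hypothetical night: 30 apneas and 45 hypopneas over 7.5 hours asleep.
    ai = apnea_index(30, 7.5)
    ahi = apnea_hypopnea_index(30, 45, 7.5)
    print(ai)   # 4.0
    print(ahi)  # 10.0
    print(meets_threshold(ahi, 5), meets_threshold(ahi, 10))  # True True
```

The same night is negative at an AI threshold of 5 but positive at an AHI threshold of 10, which illustrates why studies that report different indices or thresholds are hard to compare directly.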

The main objective of the analysis was to evaluate the diagnostic accuracy of alternatives to full PSG for the diagnosis of SA as compared to a full PSG (gold standard). Initially, weighted averages using Mantel-Haenszel fixed effects models combining the comparative summary statistics were calculated and summarized for groups based on diagnostic test category. Study and patient-level covariates and study evidence scores were also summarized for each diagnostic test category. A summary receiver operating characteristic (ROC) curve was calculated for each diagnostic group where data were available. While differences among studies may be an argument against estimating one common sensitivity and specificity using fixed or random effects models, these factors can be described using the summary ROCs, which both display and summarize the heterogeneity.
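One widely used way to construct a summary ROC curve is the Moses-Littenberg regression, sketched below. This summary does not say which method the investigators used, so the choice of technique is an assumption, as are the function names, the 0.5 continuity correction, and the study counts.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def fit_sroc(studies):
    """Fit D = a + b*S by ordinary least squares.

    studies: list of (tp, fp, fn, tn) counts, one tuple per study.
    D = logit(TPR) - logit(FPR) is the log diagnostic odds ratio;
    S = logit(TPR) + logit(FPR) reflects the positivity threshold.
    """
    if len(studies) < 2:
        raise ValueError("at least two studies are needed")
    d_vals, s_vals = [], []
    for tp, fp, fn, tn in studies:
        # A 0.5 continuity correction keeps the logits finite for zero cells.
        tpr = (tp + 0.5) / (tp + fn + 1.0)
        fpr = (fp + 0.5) / (fp + tn + 1.0)
        d_vals.append(logit(tpr) - logit(fpr))
        s_vals.append(logit(tpr) + logit(fpr))
    n = len(studies)
    s_mean = sum(s_vals) / n
    d_mean = sum(d_vals) / n
    sxx = sum((s - s_mean) ** 2 for s in s_vals)
    if sxx == 0:
        raise ValueError("all studies have the same S; slope is undefined")
    sxy = sum((s - s_mean) * (d - d_mean) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    a = d_mean - b * s_mean
    return a, b


def sroc_tpr(a, b, fpr):
    """Expected true positive rate on the fitted curve at a given FPR."""
    if b == 1:
        raise ValueError("curve is undefined when b equals 1")
    # Rearranging D = a + b*S gives logit(TPR) in terms of logit(FPR).
    z = (a + (1.0 + b) * logit(fpr)) / (1.0 - b)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical (tp, fp, fn, tn) counts for four studies.
    studies = [(45, 20, 5, 30), (30, 5, 20, 45), (40, 10, 10, 40), (48, 30, 2, 20)]
    a, b = fit_sroc(studies)
    for fpr in (0.1, 0.2, 0.5):
        print(f"FPR={fpr:.1f}  summary TPR={sroc_tpr(a, b, fpr):.3f}")
```

Because each study contributes one point rather than being forced toward a single pooled sensitivity and specificity, the fitted curve shows how accuracy trades off as the positivity threshold varies across studies.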

A group of 22 peer reviewers drawn from consumer groups and professional organizations, along with our technical experts and partners, was assembled to review and provide suggestions to the draft final report describing this project. Their feedback, as well as that from AHCPR, was incorporated wherever possible within the original scope of the project.


Findings

All Studies: PSG

  • 71 studies (7,572 patients), mean evidence score = 20.6 (range, 16 to 34). Level III to IV evidence overall (that is, primarily derived from case series and observational studies).
  • Variability in PSG definitions of apnea and hypopnea, and AI or AHI thresholds for diagnosis, with or without presence of clinical signs and/or symptoms.
  • Variability in components of "standard" PSG is evident, and requirement for all "standard" PSG channels not established in SA diagnosis. Night to night PSG reproducibility is not well documented and may differ by SA diagnostic thresholds.

Partial Channel PSGs

  • 3 studies of partial channel PSGs (213 patients), mean evidence score = 17.7 (range, 17 to 19).
  • Sensitivity ranged from 82 percent to 94 percent and specificity from 82 percent to 100 percent.
  • Sensitivity and specificity of partial channel PSGs appear promising as possible prescreening tests or replacements for full PSG.

Portable Devices

  • 25 studies of portable monitoring devices (1,631 patients), mean evidence score = 22.1 (range, 16 to 34).
  • Portable device results were mostly from supervised sleep labs, not at home.
  • Reliability in unattended home use, equipment failure rates, night to night reproducibility, price, compliance, and safety are rarely reported.
  • Sensitivity ranged from 32 percent to 100 percent and specificity from 33 percent to 100 percent.
  • Results from studies of portable devices varied widely because of heterogeneity among studies and devices.


Oximetry

  • 12 studies of oximetry alone (1,784 patients), mean evidence score = 20 (range, 16 to 32).
  • Mean sensitivity and mean specificity are 87.4 percent (range, 36 percent to 100 percent) and 64.9 percent (range, 23 percent to 99 percent), respectively.
  • Oximetry studies provided moderate sensitivity and specificity.

Partial Time PSGs

  • 7 studies of partial time PSGs (505 patients), mean evidence score = 18.6 (range, 17 to 20).
  • Mean sensitivity at AI/AHI threshold of 5 was 69.7 percent (range, 66 percent to 93 percent), and at threshold of 10, 79.5 percent (range, 42 percent to 89 percent). Specificity at AI/AHI threshold of 5 was 87.4 percent (range, 50 percent to 100 percent) and at threshold of 10, 86.7 percent (range, 57 percent to 100 percent).
  • Sensitivity and specificity of partial time PSGs appear promising as possible prescreening tests or replacements for full PSG.


Radiologic Studies

  • 5 radiologic studies (1 MRI, 3 cephalometry, and 1 CT + cephalometry), not meta-analyzable.
  • Radiology studies could not be analyzed due to insufficient data.


Clinical Studies

  • 17 clinical studies (too few studies each for anthropometric signs or ears/nose/throat [ENT] exams). Also, 1 chemical assay and 3 questionnaire studies not meta-analyzable.
  • 4 studies of flow volume loops (595 patients), mean evidence score = 18.3 (range, 17 to 20). When both FEF50/FIF50 (a measure of extrathoracic airway obstruction) and the sawtooth sign (indicative of pharyngeal fluttering during respirations) were analyzed together, the mean sensitivity was 39.1 percent (range, 41 percent to 59 percent) and mean specificity was 60.5 percent (range, 54 percent to 85 percent).
  • 4 studies of global impressions of clinicians (1,139 patients), mean evidence score = 23 (range, 19 to 28). Mean sensitivity = 58.9 percent (range, 52 percent to 79 percent), specificity = 65.6 percent (range, 50 percent to 100 percent).
  • Several miscellaneous studies of questionnaires, anthropometric signs, and ENT exams could not be analyzed due to insufficient data.
  • Global clinical impressions provided moderate sensitivity and specificity; least accurate were flow volume loops.

Prediction Equations

  • 8 models (1,908 patients), mean evidence score = 21.5 (range, 17 to 30). Mean sensitivity = 66.5 percent (range, 61 percent to 98 percent) and mean specificity = 88.7 percent (range, 21 percent to 100 percent).
  • Prediction models showed moderate mean sensitivity and high mean specificity, with wide ranges across models.

Prevalence Studies

  • General populations: 11 prevalence studies (2,410 patients), mean prevalence of SA = 9.2 percent (range, 0 to 33 percent).
  • Healthy elderly: 7 prevalence studies (469 patients), mean prevalence of SA = 34.6 percent (range, 2 percent to 43 percent).
  • Coronary artery disease: 8 studies (461 patients), mean prevalence of SA = 54.9 percent (range, 50 percent to 100 percent).
  • Hypertension: 4 studies (166 patients), mean prevalence SA = 26.9 percent (range, 22 percent to 30 percent).
  • Erectile dysfunction/impotence: 3 studies (1,138 men), mean prevalence of SA = 42.2 percent (range, 11 percent to 44 percent).
  • Other special populations (stroke, end stage renal disease, congestive heart failure, Alzheimer's disease, depression, and healthy offspring of SA patients): too few studies to summarize.
  • Caveat: Prevalence studies may be underrepresented in this set due to search strategy of identifying primarily diagnostic studies.
  • Caveat: Few prevalence studies here utilized gold standard PSG to diagnose SA, so diagnosis was based upon unvalidated tests. Such prevalence estimates are suspect.

Comorbidity Studies

Conditions associated with SA:

  • Hypertension: 24 studies (3,497 SA patients), mean proportion with hypertension = 42.0 percent (range, 9 percent to 77 percent).
  • Coronary artery disease: 9 studies (1,086 SA patients), mean proportion with coronary artery disease (manifest as angina or myocardial infarction [MI]) = 20.3 percent (range, 2 percent to 33 percent).
  • Ventricular arrhythmias: 5 studies (205 SA patients), mean proportion with ventricular arrhythmias (usually complex arrhythmias, during nocturnal monitoring) = 13.1 percent (range, 3 percent to 47 percent).
  • Mortality: 5 studies (2,281 SA patients) with prolonged follow-up (5 to 13 years) reported deaths (all causes) in 6 percent to 11 percent of patients, mean = 7.0 percent.
  • Caveat: Studies with actual clinical consequences of certain AIs, with or without signs and symptoms, and with or without treatment, are not well represented in this set. Inclusion of treatment studies might be useful. Clinical implications of SA diagnosis are unclear.

Future Research

Future studies of diagnostic test strategies should address the many limitations of the literature noted above. The field could benefit from adoption of a common terminology and definitions for fundamental concepts such as apnea and hypopnea, and the relation between AI and AHI should be established in order to allow conversions and comparisons across studies. Researchers should seek to clarify the frequency of sleep apnea/hypopnea in general populations by gender and age. More naturalistic sleep studies (in the home) are still of interest, as MetaWorks investigators suspect that much of the uncertainty about the nature of SA, its pathophysiology, the risk factors, and the clinical consequences derives from the possibility that the phenomenon itself is altered when it is observed via standard PSG. Long-term follow-up studies are recommended to better document the outcomes of treated vs. untreated SA. Lastly, all sleep monitoring systems that are proposed as prequalifiers or replacements for PSG must be validated in the setting in which they are intended to be used.

Availability of the Full Report

The full evidence report from which this summary was taken was prepared for AHCPR by MetaWorks, Inc., Boston, MA, under contract No. 290-97-0016. The Evidence Report is archived online on the National Library of Medicine Bookshelf. Print copies are no longer available.

AHCPR Publication Number 99-E001
Current as of January 2005

