Evidence Report/Technology Assessment: Number 5
Under its Evidence-based Practice Program, the Agency for Health Care Policy and Research (AHCPR) is developing scientific information for other agencies and organizations on which to base clinical guidelines, performance measures, and other quality improvement tools. Contractor institutions review all relevant scientific literature on assigned clinical care topics and produce evidence reports and technology assessments, conduct research on methodologies and the effectiveness of their implementation, and participate in technical assistance activities.
Overview / Reporting the Evidence / New Technologies Assessed / Patient Population and Settings / Methodology / Supplemental Analyses / Findings / Future Research / Availability of Full Report
Overview
Worldwide, carcinoma of the cervix is one of the most common malignancies in women. It was estimated that
approximately 13,700 new cases of the disease would occur in the United States in 1998. A woman's lifetime risk of being
diagnosed with cervical cancer in the United States is currently 0.83 percent, and the risk of dying from the disease is 0.27 percent.
The incidence of cervical cancer and associated mortality have each decreased over 40 percent since 1973; the decreases
are largely attributable to the success of mass screening using the Papanicolaou (Pap) test to diagnose premalignant or
early-stage cases. The decreases in invasive cervical cancer incidence and mortality since the introduction of the Pap smear
have been so dramatic that it is one of the few interventions to receive an "A" recommendation from the U.S. Preventive
Services Task Force even though there are no randomized trials demonstrating its effectiveness.
Despite the indisputably dramatic impact of Pap screening, there is still uncertainty about the details of Pap smear
performance, and much could be done to improve the performance of the test and followup of patients after screening.
Controversy about the details of Pap smear performance is manifest in differing recommendations about the frequency of
screening and the age (if any) at which screening may safely be stopped. A significant proportion of patients and providers
fail to comply with even the least demanding recommendations for Pap screening frequency. Numerous barriers to
screening have been identified that reduce access to Pap smears and other preventive services.
Recently, efforts to improve Pap smear performance have focused on reducing the number of false negative smears, that is,
cases in which premalignant or malignant cells have been misdiagnosed as normal. Measures adopted to improve
laboratory performance on this point include manual rescreening of a portion of slides initially evaluated as negative, an
approach mandated by Federal law (Clinical Laboratory Improvement Amendments [CLIA]). Recently, several
technologies have been developed to optimize Pap test screening by reducing the false negative rate. These technologies are
a major focus of this report.
Reporting the Evidence
The report addresses three main questions:
- What is the accuracy of cervical cytology using conventional Pap smears and new technologies (thin-layer cytology,
computer rescreening, algorithm-based decisionmaking technology) for detecting cervical cancer and its precursors?
- What are the direct medical costs associated with cervical cancer screening, evaluation, treatment, and followup of
cervical cytological abnormalities and treatment and followup of cervical cancer?
- What are the effects on total health care cost, morbidity, and mortality of regular cervical cytological screening using
thin-layer cytology and computer rescreening using neural network or algorithm-based decisionmaking technology compared with the conventional Pap smear in women participating in a screening program?
On the first point, the report will review published studies comparing cervical cytological diagnosis with clinical diagnosis
based on colposcopy or biopsy. The results of this review will form the basis for a meta-analysis.
On the second point, the report will identify and examine current claims data and other datasets to estimate empirically
costs associated with cervical cytological screening.
On the third point, the report will review the literature on the effectiveness and cost-effectiveness of cervical cytology
screening and use these data to develop a comprehensive cost-effectiveness model to examine the impact of the newer
screening technologies. In the absence of definitive clinical trials on key questions of cervical cancer screening,
policymakers have relied on decision-modeling studies to integrate epidemiological data on the natural history of cervical
cancer precursors, data on the performance of diagnostic tests for early cervical cancer or cervical cancer precursors, and
data on cost. These models estimate the efficacy of various screening programs, balance estimated efficacy against
estimated cost, and lead to decisions about appropriate screening intervals and age cutoffs.
New Technologies Assessed
Recent developments in specimen processing and interpretation may substantially improve the Pap smear as a diagnostic
test for cervical cancer and cancer precursors. Three new devices recently approved by the Food and Drug Administration
(FDA) are considered in this report: ThinPrep®, Papnet®, and AutoPap®. The three devices employ three different types of
technology: thin-layer cytology (ThinPrep®) and computerized rescreening utilizing neural-network technology (Papnet®)
or algorithmic classification (AutoPap®).
Each of these technologies was developed to reduce the false negative rate associated with cervical cytological screening.
The two major components of this false negative rate are false negatives related to sampling error and false negatives
related to detection error. About two-thirds of false negatives are a result of sampling error and the remaining one-third a
result of detection error. Each of the new technologies is directed at one of these components of false negatives. Thin-layer
cytology aims primarily to fix sampling error, whereas computerized rescreening targets detection error. This implies that
neither technology will be able to reduce false negatives beyond a certain threshold.
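The ceiling implied by this split can be made concrete with a little arithmetic. The sketch below is illustrative only: it uses the two-thirds and one-third shares quoted above and assumes, hypothetically, a best case in which each technology eliminates its entire component. The function name is an assumption introduced for this example.

```python
# Shares of false negatives by cause, as quoted in the text above.
SAMPLING_SHARE = 2 / 3   # false negatives caused by sampling error
DETECTION_SHARE = 1 / 3  # false negatives caused by detection error


def remaining_false_negative_fraction(share_addressed: float, fraction_fixed: float) -> float:
    """Fraction of the original false negatives left after a technology
    eliminates `fraction_fixed` of the one component it addresses."""
    if not 0.0 <= fraction_fixed <= 1.0:
        raise ValueError("fraction_fixed must be between 0 and 1")
    return 1.0 - share_addressed * fraction_fixed


# HYPOTHETICAL best case: each technology removes its whole component.
thin_layer_floor = remaining_false_negative_fraction(SAMPLING_SHARE, 1.0)
rescreening_floor = remaining_false_negative_fraction(DETECTION_SHARE, 1.0)

print(f"Thin-layer cytology alone leaves at least {thin_layer_floor:.0%} of false negatives")
print(f"Computerized rescreening alone leaves at least {rescreening_floor:.0%} of false negatives")
```

Under these assumptions, even a perfect fix for sampling error leaves about one-third of false negatives, and a perfect fix for detection error leaves about two-thirds.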
Thin-layer cytology is a new technology for processing cytological samples. The sample is collected as in the conventional
Pap test using a broom-type device or plastic spatula and endocervical brush combination, but rather than smearing the
cytological sample directly onto a microscope slide, this method suspends the sample cells in a fixative solution, disperses
them, and then selectively collects cells on a filter. The cells are then transferred to a microscope slide for cytological
interpretation. Because cytological samples are fixed immediately after collection, there are fewer artifacts in cellular
morphology. Fewer cells on the slide are obscured, both because the process reduces artifactual material such as blood and
mucus and because cells are deposited on the slide in a monolayer. Clinical studies of the ThinPrep® 2000 (Cytyc
Corporation, Boxborough, MA) have shown that test sensitivity is improved compared with conventional Pap smears. The
improvement in sensitivity appears to be greater in populations with a low incidence of cytological abnormalities.
One newly approved device, Papnet®, uses neural-network computerized rescreening of Pap smears initially read as
negative by a cytotechnologist. The system works by using automated computerized imaging of Pap smear slides and
interpretation of images using a computerized algorithm to identify slides that are likely to contain abnormal cells. The
Papnet® system (Neuromedical Systems, Inc.) identifies cells or clusters of cells that require review and can display up to
128 images of the slide likely to contain abnormalities. These images can be reviewed by a cytotechnologist who can
decide whether or not to review the slide using light microscopy.
AutoPap® 300 QC system (Neopath, Inc.), an algorithm-based decisionmaking technology, identifies slides exceeding a
certain threshold for the likelihood of abnormal cells. The laboratory can select different thresholds corresponding to 10,
15, and 20 percent review rates. In contrast to random rescreening, the population of slides selected by the AutoPap® 300
QC system is enriched with abnormalities and, at the 10-15 percent sort rate, this population of slides should contain 70-80
percent of the slides containing abnormalities missed by manual screening.
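The enrichment claim can be expressed as a ratio against random rescreening. The sketch below assumes that a random sample covering a given fraction of negative slides is expected to contain the same fraction of the missed abnormal slides; the pairing of the quoted range endpoints (15 percent with 70 percent, 10 percent with 80 percent) is an assumption made only to bracket the result.

```python
def enrichment_over_random(review_rate: float, capture_fraction: float) -> float:
    """Ratio of missed abnormal slides found by a targeted review to those
    expected from a random review of the same size.

    A random review covering `review_rate` of the slides is expected to
    include the same fraction of the missed abnormal slides.
    """
    if not 0.0 < review_rate <= 1.0:
        raise ValueError("review_rate must be in (0, 1]")
    if not 0.0 <= capture_fraction <= 1.0:
        raise ValueError("capture_fraction must be in [0, 1]")
    return capture_fraction / review_rate


# Figures quoted in the text: 10-15 percent review rate, 70-80 percent capture.
# The endpoint pairings below are ASSUMED, to give a low and a high bound.
low = enrichment_over_random(review_rate=0.15, capture_fraction=0.70)
high = enrichment_over_random(review_rate=0.10, capture_fraction=0.80)

print(f"Targeted review finds roughly {low:.1f} to {high:.1f} times as many "
      "missed abnormal slides as random review of the same size")
```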
A variety of other technologies or clinical strategies have been proposed to improve Pap testing including various devices
for collecting a cytological sample from the cervix. Still other technologies have been proposed to augment or replace
cervical cytological screening, including colposcopic photographs for review by experts (cervicography) and DNA testing
for specific human papillomavirus (HPV). These technologies are not considered in the present report.
Patient Population and Settings
The primary target population for this evidence report is women of average cervical cancer risk in the United States who
are candidates for Pap smear screening. For the purposes of our analysis, candidates for Pap smear screening include
women between the age of onset of sexual activity and the age of 85.
Although a large proportion of cervical cancer occurs in women with very limited or no screening, we did not examine
programs or policies designed to improve screening compliance. Some previous studies have focused on special
populations such as elderly women and elderly women who have not previously been screened.
The principal practice setting considered is the primary care practice in the United States (general internal medicine, family
practice, adolescent medicine, and obstetrics/gynecology) and government and nongovernment family planning clinics
(e.g., Planned Parenthood, public health clinics).
Methodology
The comprehensive review of the literature, from identification of databases through abstraction of individual articles into
the evidence tables, was a multistep, sequential process. This process is detailed below.
Literature Sources Used
MEDLINE, CancerLit, HealthSTAR, CINAHL, EMBASE, and EconLit computerized database searches, supplemented by
manual journal searches and querying experts and device manufacturers, were the sources used to identify English language
reports on the accuracy of cervical cytological screening, costs associated with screening and treatment, and the health outcomes of screening.
Citations for the review of accuracy of cervical cytological testing were retrieved with a search strategy that combined
various text word and index terms for cervical cytological tests with cervical cancer or dysplasia and sensitivity and
specificity. The strategy to retrieve articles on the costs and health outcomes associated with cervical cancer screening
combined cervical cytological test terms with terms describing cost analysis and mathematical modeling. Experienced
librarians assisted with the design and translation of these search strategies for each database searched.
Screening of Articles
Separate sets of criteria for including articles in the evidence report were developed for the two topics that were the subject
of literature reviews (diagnostic testing and cost and health outcomes). In each case, final screening criteria were developed
through an iterative process. Each iteration of criteria was pilot-tested by each reviewer/abstractor on a subset of randomly selected articles.
Articles on diagnostic testing were first screened based on information available through the online databases (primarily
title, authors, and abstract when available). Citations were eliminated in Step 1 of the screening process if cervical cytology
was not evaluated as a screening test or if the screening test results were not compared with a reference standard. In Step 2
of the screening process, full texts of articles were reviewed to select articles in which a reference standard of colposcopy
or histology was used, the screening test and reference standard were reasonably concurrent (i.e., within 3 months), and
sufficient data to calculate both sensitivity and specificity were provided (i.e., all cells of a two-by-two table). Of the 939
bibliographic references reviewed, 561, or approximately 60 percent, were excluded during the first screening, and another
293, or 31 percent, during the second screening. Eighty-six articles were included according to these criteria: 84 studies of
conventional Pap screening and one study each of ThinPrep® and Papnet®. Because so few studies of the new technologies
met the original criteria, we modified the criteria to include studies of the new technologies that used a cytology reference
standard and allowed estimation of either sensitivity or specificity. We considered a total of 59 studies (12 on AutoPap®, 27
on Papnet®, and 20 on ThinPrep®) during this final stage of the screening process (Step 3). The net result was the inclusion
of 6 studies of AutoPap®, 11 of Papnet®, and 8 of ThinPrep®.
Articles on cost and health outcomes of cervical cytological screening were selected if they assessed the effect of screening
on life expectancy or quality, number of cases of cervical cancer, or total health care costs for any of the following
cytological screening technologies: conventional Pap smears, thin-layer cytology, or Pap smears with computerized
rescreening. Of the 672 articles identified, 638, or 95 percent, were eliminated during the screening process. Thirty-four
articles were included in the review.
Data Abstraction Process
Key information was abstracted onto specially designed forms and verified by either duplicate abstraction (two-by-two
tables) or overreading by paired clinician-abstractors. Differences were resolved by consensus.
For the diagnostic testing articles, both members of each abstractor team also independently completed two-by-two tables
for each study, extracting the key data to calculate sensitivity, specificity, and prevalence and other data to be used in the
meta-analysis. The main outcome measures considered were the sensitivity and specificity of cytological abnormality by
Pap test for detecting cases, where cytological abnormality was defined by one of three thresholds ranging from atypical
squamous cells of uncertain significance (ASCUS) (threshold 1) to low-grade squamous intraepithelial lesion (LSIL)
(threshold 2) to high-grade squamous intraepithelial lesion (HSIL) (threshold 3), and where a case was defined as a
histological diagnosis of dysplasia or carcinoma. Equivalent categories in other classification schemes were also used.
Two-by-two tables were constructed for four different combinations of cytological versus histological thresholds:
ASCUS/cervical intraepithelial neoplasia (CIN1), LSIL/CIN1, LSIL/CIN2-3, and HSIL/CIN2-3.
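The quantities abstracted from each two-by-two table follow directly from its four cells. The following is a minimal sketch; the class name and the example counts are hypothetical and are not taken from any study in the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cytology result (test) against histology result (reference standard)."""
    true_pos: int   # cytology abnormal, histology abnormal
    false_pos: int  # cytology abnormal, histology normal
    false_neg: int  # cytology normal, histology abnormal
    true_neg: int   # cytology normal, histology normal

    def sensitivity(self) -> float:
        # Share of histology-confirmed cases that cytology flagged.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Share of histology-normal women that cytology called normal.
        return self.true_neg / (self.true_neg + self.false_pos)

    def prevalence(self) -> float:
        # Share of all women in the table with histology-confirmed disease.
        total = self.true_pos + self.false_pos + self.false_neg + self.true_neg
        return (self.true_pos + self.false_neg) / total


# HYPOTHETICAL counts for illustration only.
table = TwoByTwo(true_pos=40, false_pos=30, false_neg=10, true_neg=920)
print(f"sensitivity={table.sensitivity():.3f}, "
      f"specificity={table.specificity():.3f}, "
      f"prevalence={table.prevalence():.3f}")
```

All four cells are needed to compute both sensitivity and specificity, which is why studies that did not verify test-negative patients could not meet the original inclusion criteria.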
Criteria for Evaluating the Quality of Articles
Quality scores for articles on diagnostic testing were assigned according to predetermined methodological criteria based on
blind interpretation of screening test results, use of a reference standard of histology, selection of test-negative patients for
verification, avoidance of bias in sample collection, description of the spectrum of disease in the sample, publication as a
full report (as opposed to abstract), and source of support.
The quality of articles on costs and health outcomes was described according to recently published criteria by an expert
panel on cost and effectiveness in medicine.
Supplemental Analyses
Meta-analysis of Pap Test Accuracy
We used the effectiveness score to combine data from multiple studies describing the performance of the conventional Pap
test in discriminating between patients with and without cervical lesions. The effectiveness score takes account of both
sensitivity and specificity by fitting a receiver operating characteristic (ROC) curve through a logistic odds transformation
of the two and thus accounts for their interdependence. The effectiveness score is more normally distributed than either
sensitivity or specificity and can be thought of as a gauge of the overall discriminatory ability of the test. Standardized
effectiveness scores can be interpreted across different diagnostic tests. In general, a score of 3 reflects a test with good
discrimination, whereas a score of 1 reflects a test that does not discriminate between disease positives and disease negatives.
We used maximum likelihood estimation techniques and a random effects model to calculate summary measures of
effectiveness at each of the four explicit diagnostic thresholds (ASCUS/CIN1, LSIL/CIN1, LSIL/CIN2-3, HSIL/CIN2-3).
We further evaluated the effect of variations in disease prevalence and in quality of study design and reporting on test accuracy.
Several available datasets were analyzed to estimate direct medical costs of screening, diagnosing, and treating cervical
cancer, calculating separate estimates for women 20-64 years of age and those 65 years and older (eligible for Medicare).
For women 20-64, the unit cost of screening, diagnosis, and treatment of cervical cancer was estimated from MEDSTAT
data from 1992, 1993, and 1994, inflated to reflect 1994 charges and converted to costs using 1994 cost-to-charge ratios
published by the American Hospital Association.
For women over 65, Medicare's resource-based relative value scale (RBRVS) fee schedule for physician services,
Medicare's clinical laboratory fee schedule for laboratory services, and national average diagnosis-related group (DRG)
payments for hospital admissions were used to identify the payments associated with services received for cervical cancer
screening, diagnosis, and treatment. Charges and payment information obtained from all sources were then converted to
reflect costs associated with the services provided and all costs were inflated to 1997 dollars.
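The two adjustments described above, converting charges to costs and inflating to a common year, can be sketched as a single calculation. The ratio and price-index values below are hypothetical placeholders, not the American Hospital Association ratios or the inflation figures used in the report, and the function name is an assumption.

```python
def charge_to_cost(charge: float,
                   cost_to_charge_ratio: float,
                   price_index_source_year: float,
                   price_index_target_year: float) -> float:
    """Convert a billed charge to an estimated cost, then express it in
    target-year dollars using the ratio of two price-index values."""
    if cost_to_charge_ratio <= 0 or price_index_source_year <= 0:
        raise ValueError("ratio and source-year index must be positive")
    cost_in_source_year = charge * cost_to_charge_ratio
    return cost_in_source_year * (price_index_target_year / price_index_source_year)


# HYPOTHETICAL inputs: a $100 charge, a cost-to-charge ratio of 0.6,
# and a 10 percent rise in prices between the source and target years.
example_cost = charge_to_cost(charge=100.0,
                              cost_to_charge_ratio=0.6,
                              price_index_source_year=100.0,
                              price_index_target_year=110.0)
print(f"Estimated cost in target-year dollars: {example_cost:.2f}")
```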
We constructed a 20-state Markov model that follows a cohort of women from age 15 to 85 and assumes that there are no
prevalent cases of HPV infection or squamous intraepithelial lesion (SIL) at age 15. Each cycle is 1 year long. No Pap
smear screening is compared with the following screening strategies: conventional Pap smears at 1-, 2- and 3-year
intervals, thin-layer cytology smears at 1-, 2- and 3-year intervals, and 100 percent computerized rescreening at 1-, 2- and
3-year intervals.
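The mechanics of a cohort Markov model with 1-year cycles can be illustrated with a toy example. This sketch is not the report's model: it uses three hypothetical states instead of 20, and every transition probability is invented for illustration. It keeps only the structural features described above, namely a cohort followed from age 15 to 85 with no prevalent disease at the start.

```python
# Toy three-state cohort model. The report's model has 20 states; the state
# names and all probabilities here are HYPOTHETICAL.
STATES = ("well", "precursor", "dead")

# Annual transition probabilities; each row sums to 1.
TRANSITIONS = {
    "well":      {"well": 0.97, "precursor": 0.02, "dead": 0.01},
    "precursor": {"well": 0.10, "precursor": 0.88, "dead": 0.02},
    "dead":      {"well": 0.00, "precursor": 0.00, "dead": 1.00},
}


def run_cohort(start_age: int = 15, end_age: int = 85):
    """Advance the cohort one year per cycle and return a list of
    (age, state distribution) pairs, including the starting age."""
    # No prevalent disease at the starting age: everyone begins "well".
    dist = {"well": 1.0, "precursor": 0.0, "dead": 0.0}
    history = [(start_age, dist)]
    for age in range(start_age, end_age):
        nxt = {state: 0.0 for state in STATES}
        for src, share in dist.items():
            for dst, prob in TRANSITIONS[src].items():
                nxt[dst] += share * prob
        dist = nxt
        history.append((age + 1, dist))
    return history


history = run_cohort()
final_age, final_dist = history[-1]
print(final_age, {state: round(share, 3) for state, share in final_dist.items()})
```

A screening strategy would enter such a model by changing the transition probabilities in the years when a test is performed, which is how different screening intervals can be compared on the same cohort.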
We used a U.S. health system perspective and evaluated the direct and health care-specific costs associated with screening,
diagnosis, and treatment of cervical cancer and its precursors. We did not consider other societal costs such as work loss.
The model considers the following outcomes: cost per year of life saved, cost per cervical cancer death prevented and per
cervical cancer case prevented, and the number of morbid therapies avoided.
We discounted costs and years of life at 3 percent annually in the base case and varied the discount rate from 0 to 5 percent
in a sensitivity analysis.
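Discounting at a constant annual rate can be sketched as follows. The convention that values in the first year are not discounted, and the example stream of one life-year per year over 70 years, are assumptions made for illustration; the report does not state which convention it used.

```python
def discounted_total(values, annual_rate: float) -> float:
    """Present value of a yearly stream of costs or life-years.

    ASSUMPTION: the value at index 0 occurs now and is not discounted;
    the value at index t is divided by (1 + rate) ** t.
    """
    if annual_rate < 0:
        raise ValueError("annual_rate must be non-negative")
    return sum(value / (1.0 + annual_rate) ** year
               for year, value in enumerate(values))


# HYPOTHETICAL stream: one life-year in each of 70 annual cycles.
life_years = [1.0] * 70

# Base case of 3 percent, plus the ends of the 0 to 5 percent range.
results = {rate: discounted_total(life_years, rate) for rate in (0.0, 0.03, 0.05)}
for rate, total in results.items():
    print(f"discount rate {rate:.0%}: {total:.2f} discounted life-years")
```

Higher discount rates shrink benefits that accrue far in the future, which matters for screening because costs are incurred early and most deaths prevented occur decades later.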
Specific parameter estimates were derived from a preliminary literature assessment conducted for this report and prior
published models of cervical cancer screening.
Findings
Important findings regarding the accuracy of cervical cytological screening include the following:
- Despite the demonstrated ability of cervical cytological screening in reducing cervical cancer mortality, the conventional
Pap test is less sensitive than it is generally believed to be.
- Few studies of primary screening were unaffected by workup bias, but the few that were provided estimates of the
specificity of Pap smear screening of 0.98 (95 percent confidence interval, 0.97-0.99) and sensitivity of 0.51 (95 percent
confidence interval, 0.37-0.66).
- The Pap test is more accurate when a higher cytological threshold (HSIL) is used with the goal of detecting a high-grade
lesion. Lower test thresholds or use of the Pap test for detecting low-grade dysplasia results in poorer discrimination.
- The accuracy of the Pap test is strongly affected by disease prevalence. Higher disease prevalence is associated with higher
estimates of sensitivity and lower estimates of specificity (with a greater effect on specificity). These findings are consistent
with prevalence as a marker for workup bias and perhaps also reflect an imperfect reference standard that is more specific than sensitive.
- Quality of the studies reviewed, based on previously described criteria, varied widely; however, quality score did not
explain a statistically significant amount of the between-study variation in discrimination when the variation in the
prevalence of disease was controlled.
- Existing information fails to provide accurate estimates for specificity of thin-layer cytology or computerized rescreening
technologies. Our initial requirement for verification of test negatives with colposcopy or histology led to the exclusion of
all but one study each of ThinPrep® and Papnet® and all studies of AutoPap®. The values reported for sensitivity and
specificity in the few studies that use histological or colposcopic reference standards are well within the range of sensitivity
and specificity reported for the conventional Pap test. However, including studies that directly compare these new
technologies with conventional Pap smear testing (screening or rescreening) using a cytological reference standard results
in significant improvements in sensitivity.
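To see what the unbiased estimates reported above (sensitivity 0.51, specificity 0.98) would mean for a screened population, predictive values can be derived with Bayes' rule. The prevalence of 5 percent used below is a hypothetical value chosen for illustration, not a figure from the report.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (positive predictive value, negative predictive value)
    for a test applied to a population with the given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


SENSITIVITY = 0.51  # point estimate quoted in the findings
SPECIFICITY = 0.98  # point estimate quoted in the findings
PREVALENCE = 0.05   # HYPOTHETICAL prevalence, for illustration only

ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, PREVALENCE)
false_negative_rate = 1.0 - SENSITIVITY

print(f"PPV={ppv:.3f}, NPV={npv:.3f}, false negative rate={false_negative_rate:.2f}")
```

With a sensitivity near 0.51, roughly half of women with disease would have a negative result on a single test, which is consistent with the finding that the test is less sensitive than generally believed.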
Important findings regarding the costs of cervical cytological screening and cervical cancer diagnosis and treatment include the following:
- Pap smear screening cost is somewhat higher in older women than younger women chiefly because physician and total
time spent in obtaining Pap smears during office visits is longer for older women.
- Estimated costs of cervical cancer treatment calculated from episodes of care are substantially higher than estimates
based on average procedure-specific costs because of both the provision of related services and the effect of complicated
cases with unusually high costs. Estimates based on procedure-related costs alone will underestimate the true direct medical costs.
Important findings from a review of previously published models of the cost and effectiveness of cervical cytological
screening include the following:
- Published models examining the cost and effectiveness of Pap smear screening have consistently found Pap screening to
have a significant impact on the incidence and mortality of cervical cancer and to have an acceptable range of
cost-effectiveness ratios when compared with no screening.
- Estimates of Pap test accuracy used in these models generally overestimated Pap test performance, as determined by
recent unbiased studies, the findings of this report, and previously published meta-analyses. Best estimates of Pap test
performance fall outside the range used in sensitivity analyses of some models.
Important findings from a new model of cost and effectiveness of cervical cytological screening include the following:
- The cost-effectiveness of either a technology that improves primary screening sensitivity (e.g., thin-layer cytology) or
one that improves rescreening sensitivity (e.g., computerized rescreening) is directly related to the frequency of
screening: longer intervals result in lower estimates of cost per life year saved.
- Our findings were relatively insensitive to assumptions about cervical cancer incidence, the cost of technologies,
diagnostic strategies for abnormal screening results, age at onset of screening, or most other variables tested.
- There is substantial uncertainty about the estimates of sensitivity and specificity of thin-layer cytology and computerized
rescreening technologies compared with each other and with conventional Pap testing. The uncertainty is not reflected in
the point estimates for effectiveness or cost-effectiveness. Although it is clear that both thin-layer cytology and
computerized rescreening technologies provide an improvement in effectiveness at higher cost, the imprecision in estimates
of effectiveness makes drawing conclusions about the relative cost-effectiveness of thin-layer cytology and computerized
rescreening technologies problematic.
Future Research
Our research suggests several areas for possible future study.
- Future decision models, cost-effectiveness studies, and health policy decisions should treat the sensitivity of Pap
smear screening as close to 50 percent.
- Thin-layer cytology technology (ThinPrep®), the neural-network computerized rescreening device (Papnet®), and the algorithm-based
decisionmaking technology (AutoPap®) have received regulatory approval from the FDA based on their demonstration of
improved sensitivity compared with conventional Pap smear techniques. However, the evidence currently available does
not fully describe the impact of these technologies on the specificity of the screening process. It is possible that a new
technology might simultaneously raise both sensitivity and specificity; however, this has not been conclusively
demonstrated for the devices reviewed in this report. Future studies of these technologies should include verification of
test-negative subjects to allow estimation of specificity.
- Comparisons with cytological reference standards attest to the validity of the new technologies compared with optimal
Pap screening, but comparison with a histological reference standard provides a more relevant outcome for clinical
decisionmakers, since histological diagnosis forms the basis of most clinical management decisions. Further research is
needed to validate negative cytological diagnoses made with the new technologies with colposcopy, in both low-prevalence
and high-prevalence populations. This could be accomplished by subjecting a random sample of cytology-negative women
to colposcopy, which would permit statistical correction for workup bias and estimation of test specificity.
- Further research is needed to quantify the effect of cervical cancer and premalignant cervical lesions and various
treatments for cervical cancer or dysplasia on quality of life. These data will allow a more comprehensive assessment of the
impact of technologies for cervical cytological screening.
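The suggestion above, verifying a random sample of cytology-negative women with colposcopy, can be illustrated with a simple weighting correction. This is one common approach and is an assumption here; the report does not specify a correction method. The sketch assumes all cytology-positive women are verified and that verified negatives are a simple random sample of all negatives. All counts are hypothetical.

```python
def naive_accuracy(tp: int, fp: int, fn_verified: int, tn_verified: int):
    """Sensitivity and specificity computed from verified women only
    (subject to workup bias when few test-negatives are verified)."""
    return tp / (tp + fn_verified), tn_verified / (tn_verified + fp)


def corrected_accuracy(tp: int, fp: int, fn_verified: int, tn_verified: int,
                       negatives_total: int, negatives_verified: int):
    """Scale the verified test-negatives up to all test-negatives before
    computing sensitivity and specificity.

    ASSUMES every test-positive was verified and the verified negatives
    are a simple random sample of all test-negatives.
    """
    if not 0 < negatives_verified <= negatives_total:
        raise ValueError("negatives_verified must be in (0, negatives_total]")
    if fn_verified + tn_verified != negatives_verified:
        raise ValueError("verified negatives must equal fn_verified + tn_verified")
    weight = negatives_total / negatives_verified
    fn_est = fn_verified * weight
    tn_est = tn_verified * weight
    return tp / (tp + fn_est), tn_est / (tn_est + fp)


# HYPOTHETICAL study: 200 cytology-positive women, all verified;
# 9,800 cytology-negative women, of whom a random 980 are verified.
counts = dict(tp=90, fp=110, fn_verified=9, tn_verified=971)
naive_sens, naive_spec = naive_accuracy(**counts)
corr_sens, corr_spec = corrected_accuracy(**counts,
                                          negatives_total=9800,
                                          negatives_verified=980)

print(f"naive:     sensitivity={naive_sens:.3f}, specificity={naive_spec:.3f}")
print(f"corrected: sensitivity={corr_sens:.3f}, specificity={corr_spec:.3f}")
```

In this hypothetical example the uncorrected figures overstate sensitivity and understate specificity, the same direction of distortion the findings attribute to workup bias.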
Availability of the Full Report
The full evidence report from which this summary was taken was prepared by Duke University, an AHCPR
Evidence-based Practice Center, Durham, NC, under Contract No. 290-97-0014. The Evidence Report is available online on the National Library of Medicine Bookshelf. Print copies are no longer available.
AHCPR Publication Number 99-E009
Current as of January 1999