Evidence Report/Technology Assessment: Number 5

Evaluation of Cervical Cytology

Summary

Under its Evidence-based Practice Program, the Agency for Health Care Policy and Research (AHCPR) is developing scientific information for other agencies and organizations on which to base clinical guidelines, performance measures, and other quality improvement tools. Contractor institutions review all relevant scientific literature on assigned clinical care topics and produce evidence reports and technology assessments, conduct research on methodologies and the effectiveness of their implementation, and participate in technical assistance activities.

Overview

Worldwide, carcinoma of the cervix is one of the most common malignancies in women. It was estimated that approximately 13,700 new cases of the disease would occur in the United States in 1998. A woman's lifetime risk of being diagnosed with cervical cancer in the United States is currently 0.83 percent, and the risk of dying from the disease is 0.27 percent.

The incidence of cervical cancer and associated mortality have each decreased over 40 percent since 1973; the decreases are largely attributable to the success of mass screening using the Papanicolaou (Pap) test to diagnose premalignant or early-stage cases. The decreases in invasive cervical cancer incidence and mortality since the introduction of the Pap smear have been so dramatic that it is one of the few interventions to receive an "A" recommendation from the U.S. Preventive Services Task Force even though there are no randomized trials demonstrating its effectiveness.

Despite the indisputably dramatic impact of Pap screening, there is still uncertainty about the details of Pap smear performance, and much could be done to improve the performance of the test and followup of patients after screening. Controversy about the details of Pap smear performance is manifest in differing recommendations about the frequency of screening and the age (if any) at which screening may safely be stopped. A significant proportion of patients and providers fail to comply with even the least demanding recommendations for Pap screening frequency. Numerous barriers to screening have been identified that reduce access to Pap smears and other preventive services.

Recently, efforts to improve Pap smear performance have focused on reducing the number of false negative smears, that is, cases in which premalignant or malignant cells have been misdiagnosed as normal. Measures adopted to improve laboratory performance on this point include manual rescreening of a portion of slides initially evaluated as negative, an approach mandated by Federal law (Clinical Laboratory Improvement Amendments [CLIA]). Recently, several technologies have been developed to optimize Pap test screening by reducing the false negative rate. These technologies are a major focus of this report.

Reporting the Evidence

The report addresses three main questions:

What is the accuracy of cervical cytology using conventional Pap smears and new technologies (thin-layer cytology, computer rescreening, algorithm-based decisionmaking technology) for detecting cervical cancer and its precursors?
What are the direct medical costs associated with cervical cancer screening, evaluation, treatment, and followup of cervical cytological abnormalities and treatment and followup of cervical cancer?
What are the effects on total health care cost, morbidity, and mortality of regular cervical cytological screening using thin-layer cytology and computer rescreening using neural network or algorithm-based decisionmaking technology compared with the conventional Pap smear in women participating in a screening program?

On the first point, the report will review published studies comparing cervical cytological diagnosis with clinical diagnosis based on colposcopy or biopsy. The results of this review will form the basis for a meta-analysis.

On the second point, the report will identify and examine current claims data and other datasets to estimate empirically costs associated with cervical cytological screening.

On the third point, the report will review the literature on the effectiveness and cost-effectiveness of cervical cytology screening and use these data to develop a comprehensive cost-effectiveness model to examine the impact of the newer screening technologies. In the absence of definitive clinical trials on key questions of cervical cancer screening, policymakers have relied on decision-modeling studies to integrate epidemiological data on the natural history of cervical cancer precursors, data on the performance of diagnostic tests for early cervical cancer or cervical cancer precursors, and data on cost. These models estimate the efficacy of various screening programs, balance estimated efficacy against estimated cost, and lead to decisions about appropriate screening intervals and age cutoffs.

New Technologies Assessed

Recent developments in specimen processing and interpretation may substantially improve the Pap smear as a diagnostic test for cervical cancer and cancer precursors. Three new devices recently approved by the Food and Drug Administration (FDA) are considered in this report: ThinPrep®, Papnet®, and AutoPap®. The three devices employ three different types of technology: thin-layer cytology (ThinPrep®) and computerized rescreening utilizing neural-network technology (Papnet®) or algorithmic classification (AutoPap®).

Each of these technologies was developed to reduce the false negative rate associated with cervical cytological screening. The two major components to this false negative rate are false negatives related to sampling error and false negatives related to detection error. About two-thirds of false negatives are a result of sampling error and the remaining one-third a result of detection error. Each of the new technologies is directed at one of these components of false negatives. Thin-layer cytology aims primarily to fix sampling error, whereas computerized rescreening targets detection error. This implies that neither technology will be able to reduce false negatives beyond a certain threshold.
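
The ceiling implied by this split can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name and the 30 percent baseline false negative rate are assumptions chosen for the example, not figures from the report; only the two-thirds/one-third split comes from the text above.

```python
def residual_false_negative_rate(baseline_fn_rate, component_share, reduction):
    """False negative rate left after a technology removes a fraction
    `reduction` (0 to 1) of the single error component it targets.

    component_share is the fraction of all false negatives caused by that
    component (about 2/3 for sampling error, 1/3 for detection error).
    """
    if not (0.0 <= component_share <= 1.0 and 0.0 <= reduction <= 1.0):
        raise ValueError("component_share and reduction must lie in [0, 1]")
    return baseline_fn_rate * (1.0 - component_share * reduction)


# Hypothetical baseline: 30 percent of true abnormalities are missed.
baseline = 0.30
# Even a perfect fix for sampling error leaves the detection-error third.
floor_thin_layer = residual_false_negative_rate(baseline, 2 / 3, 1.0)
# Even a perfect fix for detection error leaves the sampling-error two-thirds.
floor_rescreening = residual_false_negative_rate(baseline, 1 / 3, 1.0)
```

Under these assumptions, neither technology alone can push the false negative rate below the share contributed by the component it does not address.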

Thin-layer cytology is a new technology for processing cytological samples. The sample is collected as in the conventional Pap test using a broom-type device or plastic spatula and endocervical brush combination, but rather than smearing the cytological sample directly onto a microscope slide, this method suspends the sample cells in a fixative solution, disperses them, and then selectively collects cells on a filter. The cells are then transferred to a microscope slide for cytological interpretation. Because cytological samples are fixed immediately after collection, there are fewer artifacts in cellular morphology. Fewer cells on the slide are obscured, both because the process reduces artifactual material such as blood and mucus and because cells are deposited on the slide in a monolayer. Clinical studies of the ThinPrep® 2000 (Cytyc Corporation, Boxborough, MA) have shown that test sensitivity is improved compared with conventional Pap smears. The improvement in sensitivity appears to be greater in populations with a low incidence of cytological abnormalities.

One newly approved device, Papnet®, uses neural-network computerized rescreening of Pap smears initially read as negative by a cytotechnologist. The system works by using automated computerized imaging of Pap smear slides and interpretation of images using a computerized algorithm to identify slides that are likely to contain abnormal cells. The Papnet® system (Neuromedical Systems, Inc.) identifies cells or clusters of cells that require review and can display up to 128 images of the slide likely to contain abnormalities. These images can be reviewed by a cytotechnologist who can decide whether or not to review the slide using light microscopy.

AutoPap® 300 QC system (Neopath, Inc.), an algorithm-based decisionmaking technology, identifies slides exceeding a certain threshold for the likelihood of abnormal cells. The laboratory can select different thresholds corresponding to 10, 15, and 20 percent review rates. In contrast to random rescreening, the population of slides selected by the AutoPap® 300 QC system is enriched with abnormalities and, at the 10-15 percent sort rate, this population of slides should contain 70-80 percent of the slides containing abnormalities missed by manual screening.
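
The contrast with random rescreening can be expressed as a simple ratio. A random review of a fraction r of negative slides is expected to include about the same fraction r of the missed abnormal slides, so the benefit of a targeted sort is the captured share divided by the review rate. The sketch below is a hedged illustration: the function name is hypothetical, and pairing the 10 percent rate with 70 percent capture and the 15 percent rate with 80 percent capture is an assumption made for the example, since the text gives only ranges.

```python
def enrichment_ratio(capture_fraction, review_fraction):
    """Ratio of missed abnormal slides captured by a targeted review to the
    share a random review of the same size would be expected to capture."""
    if not 0.0 < review_fraction <= 1.0:
        raise ValueError("review_fraction must lie in (0, 1]")
    if not 0.0 <= capture_fraction <= 1.0:
        raise ValueError("capture_fraction must lie in [0, 1]")
    return capture_fraction / review_fraction


# Assumed pairings of review rate and capture share (see lead-in).
ratio_at_10_percent = enrichment_ratio(0.70, 0.10)
ratio_at_15_percent = enrichment_ratio(0.80, 0.15)
```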

A variety of other technologies or clinical strategies have been proposed to improve Pap testing including various devices for collecting a cytological sample from the cervix. Still other technologies have been proposed to augment or replace cervical cytological screening, including colposcopic photographs for review by experts (cervicography) and DNA testing for specific human papillomavirus (HPV). These technologies are not considered in the present report.

Patient Population and Settings

The primary target population for this evidence report is women of average cervical cancer risk in the United States who are candidates for Pap smear screening. For the purposes of our analysis, candidates for Pap smear screening include women between the age of onset of sexual activity and the age of 85.

Although a large proportion of cervical cancer occurs in women with very limited or no screening, we did not examine programs or policies designed to improve screening compliance. Some previous studies have focused on special populations such as elderly women and elderly women who have not previously been screened.

The principal practice setting considered is the primary care practice in the United States (general internal medicine, family practice, adolescent medicine, and obstetrics/gynecology) and government and nongovernment family planning clinics (e.g., Planned Parenthood, public health clinics).

Methodology

The comprehensive review of the literature, from identification of databases through abstraction of individual articles into the evidence tables, was a multistep, sequential process. This process is detailed below.

Literature Sources Used

MEDLINE, CancerLit, HealthSTAR, CINAHL, EMBASE, and EconLit computerized database searches, supplemented by manual journal searches and querying experts and device manufacturers, were the sources used to identify English language reports on the accuracy of cervical cytological screening, costs associated with screening and treatment, and cost-effectiveness.

Citations for the review of accuracy of cervical cytological testing were retrieved with a search strategy that combined various text word and index terms for cervical cytological tests with cervical cancer or dysplasia and sensitivity and specificity. The strategy to retrieve articles on the costs and health outcomes associated with cervical cancer screening combined cervical cytological test terms with terms describing cost analysis and mathematical modeling. Experienced librarians assisted with the design and translation of these search strategies for each database searched.

Screening of Articles

Separate sets of criteria for including articles in the evidence report were developed for the two topics that were the subject of literature reviews (diagnostic testing and cost and health outcomes). In each case, final screening criteria were developed through an iterative process. Each iteration of criteria was pilot-tested by each reviewer/abstractor on a subset of randomly chosen articles.

Articles on diagnostic testing were first screened based on information available through the online databases (primarily title, authors, and abstract when available). Citations were eliminated in Step 1 of the screening process if cervical cytology was not evaluated as a screening test or if the screening test results were not compared with a reference standard. In Step 2 of the screening process, full texts of articles were reviewed to select articles in which a reference standard of colposcopy or histology was used, the screening test and reference standard were reasonably concurrent (i.e., within 3 months), and sufficient data to calculate both sensitivity and specificity were provided (i.e., all cells of a two-by-two table). Of the 939 bibliographic references reviewed, 561, or approximately 60 percent, were excluded during the first screening, and another 293, or 31 percent, during the second screening. Eighty-six articles were included according to these criteria: 84 studies of conventional Pap screening and one study each of ThinPrep® and Papnet®. Because so few studies of the new technologies met the original criteria, we modified the criteria to include studies of the new technologies that used a cytology reference standard and allowed estimation of either sensitivity or specificity. We considered a total of 59 studies (12 on AutoPap®, 27 on Papnet®, and 20 on ThinPrep®) during this final stage of the screening process (Step 3). The net result was the inclusion of 6 studies of AutoPap®, 11 of Papnet®, and 8 of ThinPrep®.
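
The Step 2 requirement that all cells of a two-by-two table be available follows from how the two measures are computed. A minimal sketch is shown below; the class name is hypothetical and the counts are invented for illustration (they are not data from any reviewed study).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Screening test result cross-classified against the reference standard."""
    tp: int  # test positive, reference positive
    fp: int  # test positive, reference negative
    fn: int  # test negative, reference positive
    tn: int  # test negative, reference negative

    def sensitivity(self):
        # Needs both reference-positive cells.
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        # Needs both reference-negative cells.
        return self.tn / (self.tn + self.fp)

    def prevalence(self):
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.fn) / total


# Hypothetical counts for a screening population of 1,000 women.
example = TwoByTwo(tp=51, fp=18, fn=49, tn=882)
```

A study that verifies only test-positive patients supplies the tp and fp cells alone, so neither measure can be calculated from it.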

Articles on cost and health outcomes of cervical cytological screening were selected if they assessed the effect of screening on life expectancy or quality, number of cases of cervical cancer, or total health care costs for any of the following cytological screening technologies: conventional Pap smears, thin-layer cytology, or Pap smears with computerized rescreening. Of the 672 articles identified, 638, or 95 percent, were eliminated during the screening process. Thirty-four articles were included in the review.

Data Abstraction Process

Key information was abstracted onto specially designed forms and verified by either duplicate abstraction (two-by-two tables) or overreading by paired clinician-abstractors. Differences were resolved by consensus.

For the diagnostic testing articles, both members of each abstractor team also independently completed two-by-two tables for each study, extracting the key data to calculate sensitivity, specificity, and prevalence and other data to be used in the meta-analysis. The main outcome measures considered were the sensitivity and specificity of cytological abnormality by Pap test for detecting cases, where cytological abnormality was defined by one of three thresholds ranging from atypical squamous cells of uncertain significance (ASCUS) (threshold 1) to low-grade squamous intraepithelial lesion (LSIL) (threshold 2) to high-grade squamous intraepithelial lesion (HSIL) (threshold 3), and where a case was defined as a histological diagnosis of dysplasia or carcinoma. Equivalent categories in other classification schemes were also used. Two-by-two tables were constructed for four different combinations of cytological versus histological thresholds: ASCUS/cervical intraepithelial neoplasia (CIN1), LSIL/CIN1, LSIL/CIN2-3, and HSIL/CIN2-3.
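
The way a single set of paired results yields a different two-by-two table at each threshold combination can be sketched as follows. This is an illustration under stated assumptions: the category orderings, the function name, and the sample pairs are hypothetical, and the handling of equivalent categories from other classification schemes is not shown.

```python
# Assumed ordering from least to most severe.
CYTOLOGY_ORDER = ["normal", "ASCUS", "LSIL", "HSIL"]
HISTOLOGY_ORDER = ["normal", "CIN1", "CIN2-3", "carcinoma"]


def two_by_two(pairs, cytology_threshold, histology_threshold):
    """Count tp/fp/fn/tn for (cytology, histology) pairs.

    A smear is test positive at or above cytology_threshold; a woman is a
    case at or above histology_threshold.
    """
    cyto_cut = CYTOLOGY_ORDER.index(cytology_threshold)
    histo_cut = HISTOLOGY_ORDER.index(histology_threshold)
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for cytology, histology in pairs:
        test_positive = CYTOLOGY_ORDER.index(cytology) >= cyto_cut
        is_case = HISTOLOGY_ORDER.index(histology) >= histo_cut
        if test_positive:
            counts["tp" if is_case else "fp"] += 1
        else:
            counts["fn" if is_case else "tn"] += 1
    return counts


# Hypothetical paired results for six women.
pairs = [
    ("HSIL", "CIN2-3"),
    ("LSIL", "CIN1"),
    ("ASCUS", "normal"),
    ("normal", "CIN1"),
    ("normal", "normal"),
    ("LSIL", "CIN2-3"),
]

# The four combinations used in the report.
COMBINATIONS = [("ASCUS", "CIN1"), ("LSIL", "CIN1"), ("LSIL", "CIN2-3"), ("HSIL", "CIN2-3")]
tables = {combo: two_by_two(pairs, *combo) for combo in COMBINATIONS}
```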

Criteria for Evaluating the Quality of Articles

Quality scores for articles on diagnostic testing were assigned according to predetermined methodological criteria based on blind interpretation of screening test results, use of a reference standard of histology, selection of test-negative patients for verification, avoidance of bias in sample collection, description of the spectrum of disease in the sample, publication as a full report (as opposed to abstract), and source of support.

The quality of articles on costs and health outcomes was described according to recently published criteria by an expert panel on cost and effectiveness in medicine.

Supplemental Analyses

Meta-analysis of Pap Test Accuracy

We used the effectiveness score to combine data from multiple studies describing the performance of the conventional Pap test in discriminating between patients with and without cervical lesions. The effectiveness score takes account of both sensitivity and specificity by fitting a receiver operating characteristic (ROC) curve through a logistic odds transformation of the two and thus accounts for their interdependence. The effectiveness score is more normally distributed than either sensitivity or specificity and can be thought of as a gauge of the overall discriminatory ability of the test. Standardized effectiveness scores can be interpreted across different diagnostic tests. In general, a score of 3 reflects a test with good discrimination, whereas a score of 1 reflects a test that does not discriminate between disease positives and disease negatives.
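
The idea of combining sensitivity and specificity through a logistic (log odds) transformation can be sketched as below. This is not the report's implementation: the summary does not give the exact standardization used, so the scaling by sqrt(3)/pi (which expresses a log odds ratio in standard-deviation units of a logistic distribution) is an assumption, and absolute values from this sketch may not line up with the interpretation scale described above. Function names are hypothetical.

```python
import math


def logit(p):
    """Log odds of a proportion strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def log_diagnostic_odds_ratio(sensitivity, specificity):
    # logit(sens) + logit(spec) equals ln[(TP * TN) / (FN * FP)].
    return logit(sensitivity) + logit(specificity)


def effectiveness_score(sensitivity, specificity):
    """Single discrimination measure under the assumed sqrt(3)/pi scaling."""
    return math.sqrt(3.0) / math.pi * log_diagnostic_odds_ratio(sensitivity, specificity)


# A test can trade sensitivity for specificity yet keep a similar score,
# which is why one measure is used instead of pooling each separately.
score_a = effectiveness_score(0.80, 0.90)
score_b = effectiveness_score(0.90, 0.80)
```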

We used maximum likelihood estimation techniques and a random effects model to calculate summary measures of effectiveness at each of the four explicit diagnostic thresholds (ASCUS/CIN1, LSIL/CIN1, LSIL/CIN2-3, HSIL/CIN2-3). We further evaluated the effect of variations in disease prevalence and in quality of study design and reporting on test discrimination.
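
For readers unfamiliar with random effects pooling, the sketch below shows the general idea using the DerSimonian-Laird moment estimator. This is a deliberate substitution: it is simpler than the maximum likelihood approach used in the report and will not reproduce the report's estimates. The function name and the inputs are hypothetical.

```python
def dersimonian_laird(estimates, variances):
    """Random effects pooled estimate from per-study estimates and variances.

    Returns (pooled_estimate, between_study_variance, standard_error).
    """
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    # Fixed effect (inverse-variance) weights and pooled mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Heterogeneity statistic Q and moment estimate of between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random effects weights add tau2 to each study's own variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum_w_star
    return pooled, tau2, (1.0 / sum_w_star) ** 0.5


# Two hypothetical study-level scores with equal within-study variance.
pooled, tau2, se = dersimonian_laird([0.0, 4.0], [1.0, 1.0])
```

In this design, studies that disagree more than their sampling error explains inflate the between-study variance, which widens the uncertainty around the summary estimate.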

Cost Analysis

Several available datasets were analyzed to estimate direct medical costs of screening, diagnosing, and treating cervical cancer, calculating separate estimates for women 20-64 years of age and those 65 years and older (eligible for Medicare). For women 20-64, the unit cost of screening, diagnosis, and treatment of cervical cancer was estimated from MEDSTAT data from 1992, 1993, and 1994, inflated to reflect 1994 charges and converted to costs using 1994 cost-to-charge ratios published by the American Hospital Association.

For women over 65, Medicare's resource-based relative value scale (RBRVS) fee schedule for physician services, Medicare's clinical laboratory fee schedule for laboratory services, and national average diagnosis-related group (DRG) payments for hospital admissions were used to identify the payments associated with services received for cervical cancer screening, diagnosis, and treatment. Charges and payment information obtained from all sources were then converted to reflect costs associated with the services provided and all costs were inflated to 1997 dollars.
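
The two adjustments described in this section (charges to costs, then inflation to a common year) amount to two multiplications. The sketch below is illustrative: the function names are hypothetical, and the charge, cost-to-charge ratio, and price index values are invented placeholders, not figures from the datasets analyzed.

```python
def charge_to_cost(charge, cost_to_charge_ratio):
    """Convert a billed charge to an estimated cost."""
    if cost_to_charge_ratio <= 0:
        raise ValueError("cost_to_charge_ratio must be positive")
    return charge * cost_to_charge_ratio


def inflate(amount, index_from_year, index_to_year):
    """Restate an amount in the target year's dollars using a price index."""
    if index_from_year <= 0:
        raise ValueError("index_from_year must be positive")
    return amount * index_to_year / index_from_year


# Placeholder inputs: a 1,000 dollar charge, a ratio of 0.6, and index
# values of 100 (source year) and 110 (target year).
cost_in_source_year = charge_to_cost(1000.0, 0.6)
cost_in_target_year = inflate(cost_in_source_year, 100.0, 110.0)
```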

Cost-Effectiveness Model

We constructed a 20-state Markov model that follows a cohort of women from age 15 to 85 and assumes that there are no prevalent cases of HPV infection or squamous intraepithelial lesion (SIL) at age 15. Cycle lengths are 1 year long. No Pap smear screening is compared with the following screening strategies: conventional Pap smears at 1-, 2- and 3-year intervals, thin-layer cytology smears at 1-, 2- and 3-year intervals, and 100 percent computerized rescreening at 1-, 2- and 3-year intervals.
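
The mechanics of a Markov cohort model with 1-year cycles can be sketched with a drastically reduced example. The three states, the state names, and every transition probability below are hypothetical placeholders; they do not represent the report's 20-state structure or its parameter estimates, and screening strategies are not modeled.

```python
STATES = ("well", "precursor_lesion", "cancer")

# Hypothetical annual transition probabilities; each row sums to 1.
TRANSITIONS = {
    "well": {"well": 0.98, "precursor_lesion": 0.02, "cancer": 0.00},
    "precursor_lesion": {"well": 0.30, "precursor_lesion": 0.68, "cancer": 0.02},
    "cancer": {"well": 0.00, "precursor_lesion": 0.00, "cancer": 1.00},
}


def step(distribution):
    """Advance the cohort distribution by one 1-year cycle."""
    following = {state: 0.0 for state in STATES}
    for origin, share in distribution.items():
        for destination, probability in TRANSITIONS[origin].items():
            following[destination] += share * probability
    return following


def run_cohort(start_age=15, end_age=85):
    """Return the state distribution at each age from start_age to end_age."""
    # No prevalent lesions at the starting age, as in the model described above.
    distribution = {"well": 1.0, "precursor_lesion": 0.0, "cancer": 0.0}
    history = [distribution]
    for _ in range(start_age, end_age):
        distribution = step(distribution)
        history.append(distribution)
    return history


history = run_cohort()
```

A screening strategy would enter such a model by changing the probability that a precursor lesion is detected and treated in a given cycle, which is where test sensitivity and screening interval take effect.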

We used a U.S. health system perspective and evaluated the direct and health care-specific costs associated with screening, diagnosis, and treatment of cervical cancer and its precursors. We did not consider other societal costs such as work loss. The model considers the following outcomes: cost per year of life saved, cost per cervical cancer death prevented and per cervical cancer case prevented, and the number of morbid therapies avoided.

We discounted costs and years of life at 3 percent annually in the base case and varied the discount rate from 0 to 5 percent in a sensitivity analysis.
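
Discounting and the resulting cost per year of life saved can be sketched as follows. The function names and all numeric inputs are hypothetical; treating the first cycle as undiscounted (year 0) is an assumed convention, since the summary does not state which one the model uses.

```python
def present_value(amount, years_from_now, rate=0.03):
    """Value today of an amount accrued years_from_now years in the future."""
    return amount / (1.0 + rate) ** years_from_now


def discounted_total(annual_amounts, rate=0.03):
    """Sum a stream of annual amounts, discounting year t by (1 + rate) ** t."""
    return sum(present_value(a, t, rate) for t, a in enumerate(annual_amounts))


def cost_per_life_year_saved(cost_new, cost_old, life_years_new, life_years_old):
    """Incremental cost divided by incremental (discounted) life years."""
    gained = life_years_new - life_years_old
    if gained <= 0:
        raise ValueError("the new strategy must add life years")
    return (cost_new - cost_old) / gained


# Sensitivity analysis over the discount rate range described above,
# applied to a hypothetical stream of 100 dollars per year for 10 years.
totals_by_rate = {rate: discounted_total([100.0] * 10, rate) for rate in (0.0, 0.03, 0.05)}
```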

Specific parameter estimates were derived from a preliminary literature assessment conducted for this report and prior published models of cervical cancer screening.

Findings

Important findings regarding the accuracy of cervical cytological screening include the following:

Despite the demonstrated ability of cervical cytological screening in reducing cervical cancer mortality, the conventional Pap test is less sensitive than it is generally believed to be.
Few studies of primary screening were free of workup bias; those that were yielded estimates of 0.98 for the specificity of Pap smear screening (95 percent confidence interval, 0.97-0.99) and 0.51 for its sensitivity (95 percent confidence interval, 0.37-0.66).
The Pap test is more accurate when a higher cytological threshold (HSIL) is used with the goal of detecting a high-grade lesion. Lower test thresholds or use of the Pap test for detecting low-grade dysplasia results in poorer discrimination.

The accuracy of the Pap test is strongly affected by disease prevalence. Higher disease prevalence is associated with higher estimates of sensitivity and lower estimates of specificity (with a greater effect on specificity). These findings are consistent with prevalence as a marker for workup bias and perhaps also reflect an imperfect reference standard that is more specific than sensitive.

Quality of the studies reviewed, based on previously described criteria, varied widely; however, quality score did not explain a statistically significant amount of the between-study variation in discrimination when the variation in the prevalence of disease was controlled.
Existing information fails to provide accurate estimates for specificity of thin-layer cytology or computerized rescreening technologies. Our initial requirement for verification of test negatives with colposcopy or histology led to the exclusion of all but one study each of ThinPrep® and Papnet® and all studies of AutoPap®. The values reported for sensitivity and specificity in the few studies that use histological or colposcopic reference standards are well within the range of sensitivity and specificity reported for the conventional Pap test. However, including studies that directly compare these new technologies with conventional Pap smear testing (screening or rescreening) using a cytological reference standard results in significant improvements in sensitivity.
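
The direction of workup bias noted in these findings (inflated sensitivity, deflated specificity) can be shown with a short calculation. The sketch below assumes that every test-positive woman but only a fraction of test-negative women receive the reference standard; the function name, the counts, and the 10 percent verification fraction are hypothetical.

```python
def apparent_accuracy(tp, fp, fn, tn, verify_positive=1.0, verify_negative=0.1):
    """Expected sensitivity and specificity when computed only from verified
    patients, given the fraction of test positives and negatives verified."""
    verified_tp = tp * verify_positive
    verified_fp = fp * verify_positive
    verified_fn = fn * verify_negative
    verified_tn = tn * verify_negative
    sensitivity = verified_tp / (verified_tp + verified_fn)
    specificity = verified_tn / (verified_tn + verified_fp)
    return sensitivity, specificity


# Hypothetical full population with true sensitivity 0.51 and specificity 0.98.
true_sens, true_spec = apparent_accuracy(51, 18, 49, 882, verify_negative=1.0)
biased_sens, biased_spec = apparent_accuracy(51, 18, 49, 882, verify_negative=0.1)
```

Because false negatives and true negatives are both undercounted, the verified sample looks more sensitive and less specific than the test really is.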

Important findings regarding the costs of cervical cytological screening and cervical cancer diagnosis and treatment include the following:

Pap smear screening cost is somewhat higher in older women than younger women chiefly because physician and total time spent in obtaining Pap smears during office visits is longer for older women.
Estimated costs of cervical cancer treatment calculated from episodes of care are substantially higher than estimates based on average procedure-specific costs because of both the provision of related services and the effect of complicated cases with unusually high costs. Estimates based on procedure-related costs alone will underestimate the true direct medical costs.

Important findings from a review of previously published models of the cost and effectiveness of cervical cytological screening include the following:

Published models examining the cost and effectiveness of Pap smear screening have consistently found Pap screening to have a significant impact on the incidence and mortality of cervical cancer and to have an acceptable range of cost-effectiveness ratios when compared with no screening.
Estimates of Pap test accuracy used in these models generally overestimated Pap test performance, as determined by recent unbiased studies, previously published meta-analyses, and the findings of this report. Best estimates of Pap test performance fall outside the range used in sensitivity analyses of some models.

Important findings from a new model of cost and effectiveness of cervical cytological screening include the following:

The cost-effectiveness of either a technology that improves primary screening sensitivity (e.g., thin-layer cytology), or one that improves rescreening sensitivity (e.g., computerized rescreening), is directly related to the frequency of screening—longer intervals result in lower estimates of cost per life year saved.
Our findings were relatively insensitive to assumptions about cervical cancer incidence, the cost of technologies, diagnostic strategies for abnormal screening results, age at onset of screening, or most other variables tested.
There is substantial uncertainty about the estimates of sensitivity and specificity of thin-layer cytology and computerized rescreening technologies compared with each other and with conventional Pap testing. The uncertainty is not reflected in the point estimates for effectiveness or cost-effectiveness. Although it is clear that both thin-layer cytology and computerized rescreening technologies provide an improvement in effectiveness at higher cost, the imprecision in estimates of effectiveness makes drawing conclusions about the relative cost-effectiveness of thin-layer cytology and computerized rescreening technologies problematic.

Future Research

Our research suggests several areas for possible future study.

Future decision models, cost-effectiveness studies, and health policy decisions should assume a sensitivity for Pap smear screening of close to 50 percent.
Thin-layer cytology technology (ThinPrep®), the neural-network computerized rescreening device (Papnet®), and the algorithm-based decisionmaking technology (AutoPap®) have received regulatory approval from the FDA based on their demonstration of improved sensitivity compared with conventional Pap smear techniques. However, the evidence currently available does not fully describe the impact of these technologies on the specificity of the screening process. It is possible that a new technology might simultaneously raise both sensitivity and specificity; however, this has not been conclusively demonstrated for the devices reviewed in this report. Future studies of these technologies should include verification of test-negative subjects to allow estimation of specificity.
Comparisons with cytological reference standards attest to the validity of the new technologies compared with optimal Pap screening, but comparison with a histological reference standard provides a more relevant outcome for clinical decisionmakers, since histological diagnosis forms the basis of most clinical management decisions. Further research is needed to validate negative cytological diagnoses made with the new technologies with colposcopy, in both low-prevalence and high-prevalence populations. This could be accomplished by subjecting a random sample of cytology-negative women to colposcopy, which would permit statistical correction for workup bias and estimation of test specificity.
Further research is needed to quantify the effect of cervical cancer and premalignant cervical lesions and various treatments for cervical cancer or dysplasia on quality of life. These data will allow a more comprehensive assessment of the impact of technologies for cervical cytological screening.
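
The statistical correction mentioned in connection with a random sample of cytology-negative women can be sketched with inverse-probability weighting: each verified woman stands in for all women with the same test result. This is one simple way to make the correction, offered as an assumption and not as the method any particular study should use; the function name and counts are hypothetical.

```python
def corrected_accuracy(verified_tp, verified_fp, verified_fn, verified_tn,
                       negative_sampling_fraction, positive_sampling_fraction=1.0):
    """Estimate sensitivity and specificity for the full screened population
    by weighting verified counts by the inverse of their sampling fraction."""
    if not (0.0 < negative_sampling_fraction <= 1.0
            and 0.0 < positive_sampling_fraction <= 1.0):
        raise ValueError("sampling fractions must lie in (0, 1]")
    tp = verified_tp / positive_sampling_fraction
    fp = verified_fp / positive_sampling_fraction
    fn = verified_fn / negative_sampling_fraction
    tn = verified_tn / negative_sampling_fraction
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical study: all 69 cytology-positive women had colposcopy, and a
# random 10 percent sample of cytology-negative women (93 women) did too.
sens, spec = corrected_accuracy(51, 18, 5, 88, negative_sampling_fraction=0.10)
```

The correction is valid only if the verified negatives are a genuinely random sample, which is why the recommendation above specifies random selection.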

Availability of the Full Report

The full evidence report from which this summary was taken was prepared by Duke University, an AHCPR Evidence-based Practice Center, Durham, NC, under Contract No. 290-97-0014. Print copies may be obtained free of charge from the Publications Clearinghouse by calling 1-800-358-9295. Requestors should ask for Evidence Report/Technology Assessment No. 5, Evaluation of Cervical Cytology (AHCPR Publication No. 99-E010). The Evidence Report is available online on the National Library of Medicine Bookshelf.

AHCPR Publication Number 99-E009
Current as of January 1999

Internet Citation:

Evaluation of Cervical Cytology. Summary, Evidence Report/Technology Assessment: Number 5, January 1999. Agency for Health Care Policy and Research, Rockville, MD. http://www.ahrq.gov/clinic/epcsums/cervsumm.htm
