Analysis and Data Improvement Tools

NAACCR Committees and members have worked collaboratively to develop tools and resources for use by central cancer registry analysts and researchers. Select one of the options below to learn more.

This tool describes and provides macro-driven formulae, in a Microsoft Excel workbook, to calculate completeness of case ascertainment based on observed cancer incidence, death rates, and a comparison of standard rates of incidence and mortality in the United States.

This program is used with a NAACCR standard data exchange file format with confidential information, including a census tract identifier. The program links the census tract identifier with the percentage of residents in the census tract who live below the poverty level. This information is based on data from the 2000 U.S. Census and the American Community Survey; the census data most closely aligned with the diagnosis year are used. The program outputs two variables that are attached to every registry record input: the xx.x% poverty for the census tract, and a second variable that groups the exact percentages into four categories: less than 5% poverty, 5%-9.9% poverty, 10%-19.9% poverty, and 20% or higher poverty.
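The four-category grouping described above can be sketched in Python. This is only an illustration, not the program's actual code: the function name `poverty_category`, the treatment of a missing tract as `None`, and the half-open boundaries (for example, exactly 10.0% falls in the 10%-19.9% group) are assumptions.

```python
def poverty_category(percent_below_poverty):
    """Group a tract-level poverty percentage into the four categories.

    Hypothetical helper: the name, the None handling, and the exact
    boundary treatment are assumptions, not taken from the program.
    """
    if percent_below_poverty is None:
        return None  # assumption: no census tract match, so no category
    if not 0.0 <= percent_below_poverty <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    if percent_below_poverty < 5.0:
        return "less than 5% poverty"
    if percent_below_poverty < 10.0:
        return "5%-9.9% poverty"
    if percent_below_poverty < 20.0:
        return "10%-19.9% poverty"
    return "20% or higher poverty"
```

For example, a tract with 12.3% of residents below the poverty level would be placed in the "10%-19.9% poverty" group.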

The Record Uniqueness Program was developed by Howe, Lake, and Shen to assess electronic data files for risk of confidentiality breach based on unique combinations of key variables.
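The general idea of checking for unique combinations of key variables can be illustrated with a short Python sketch. This is not the Record Uniqueness Program's algorithm; it simply counts how many records carry a combination of values that no other record shares. The function name `unique_record_share` and the example field names are hypothetical.

```python
from collections import Counter


def unique_record_share(records, key_variables):
    """Return the fraction of records whose key-variable combination is unique.

    Illustrative only: a simple frequency count, not the published method.
    """
    combos = [tuple(rec[k] for k in key_variables) for rec in records]
    if not combos:
        return 0.0
    counts = Counter(combos)
    unique = sum(1 for combo in combos if counts[combo] == 1)
    return unique / len(combos)


# Hypothetical records with made-up field names.
sample = [
    {"age_group": "60-64", "sex": "1", "county": "001"},
    {"age_group": "60-64", "sex": "1", "county": "001"},
    {"age_group": "85+", "sex": "2", "county": "003"},
]
share = unique_record_share(sample, ["age_group", "sex", "county"])
```

Here one of the three sample records has a combination that appears only once, so a file like this would have one third of its records at elevated risk under this simple measure.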

This is a software utility developed in MS Access to identify miscoded sex codes based on first name. Taking as input a data file in NAACCR v16 format, a query runs against a list of known sex/name pairs, and it produces a list of cases for manual review that have potential errors in sex. The utility is based on an algorithm initially created by the New York Cancer Registry in August 2011.
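The lookup step described above can be sketched as follows. The utility itself is an MS Access query; this Python version is only an approximation of the idea. The name list, the field names `first_name` and `sex`, and the function `flag_for_review` are hypothetical, and the sex codes are assumed to be "1" for male and "2" for female.

```python
# Hypothetical reference list of known name/sex pairs (assumed codes:
# "1" = male, "2" = female). The real utility ships its own list.
KNOWN_NAME_SEX = {"JOHN": "1", "ROBERT": "1", "MARY": "2", "LINDA": "2"}


def flag_for_review(cases, name_sex=KNOWN_NAME_SEX):
    """Return cases whose recorded sex disagrees with the known pairing.

    Names missing from the reference list are never flagged.
    """
    flagged = []
    for case in cases:
        expected = name_sex.get(case["first_name"].strip().upper())
        if expected is not None and case["sex"] != expected:
            flagged.append(case)
    return flagged


cases = [
    {"id": 1, "first_name": "Mary", "sex": "2"},    # consistent
    {"id": 2, "first_name": "john ", "sex": "2"},   # flagged for review
    {"id": 3, "first_name": "Avery", "sex": "1"},   # not in the list
]
review_list = flag_for_review(cases)
```

Flagged cases go to manual review rather than being corrected automatically, since a mismatch may reflect a misspelled name instead of a miscoded sex.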

In real-world registry settings, the share of potential errors flagged by the tool is extremely low, in the neighborhood of 0.25%. After careful review, users have reported that about 20-50% of the cases the tool identifies for review are indeed in error. Higher percentages have been found when the tool is run on incoming registry data. For cases where the edit flagged a sex that was correct, a misspelling of the name was often identified. For male breast cancer, nearly all flagged cases were errors, a consequence of the highly skewed sex distribution of this cancer site. A published study on this tool is available here.

A list of tools which can import and export data in NAACCR Volume II format.


v16 SAS Translation Tool

The code template below can be used by proficient SAS programmers to efficiently and accurately access data in the new V16 format. Code to both read and write ASCII V16 format is provided. Various sections and options are included – users simply comment out sections which are not applicable for their specific needs. The code supports the three most often used record types (Incidence, Confidential and Text). Beginning with V14, code is included to handle data elements which are part of the CDC’s Comparative Effectiveness Research (CER) and Patient-Centered Outcomes Research (PCOR) projects. As you use the tool, we appreciate any feedback or comments you have. Contact with your thoughts.



This MS Access database contains an import/export file specification for NAACCR v15 and v16 record layouts. It allows the user to import these types of files, perform operations on them, and then export them back out as a text file in the same format. Contact if you have any feedback on this tool.

Rural-Urban Data Items

Studies have shown that residents of rural areas have lower screening rates, lower rates of follow-up of abnormal screening tests, higher late-stage diagnosis rates, and differences in cancer treatment patterns. Including tract-level indicators of rural-urban residence in the NAACCR data files will facilitate research in rural-urban disparities and allow researchers to control for rural-urban differences in model-based analysis of cancer risks and outcomes.

This SAS code creates two different measures of the rural-urban environment. The URIC is a measure of the rural nature of the place of residence and can be an indicator of access to recreation, access to food stores, exposures to pollutants, crime levels, social cohesion, etc. The USDA RUCA-based indicator is a measure of the proximity to large urban centers and can be an indicator of access to oncology specialists and cancer treatment facilities. Both indicators have been tested for uniqueness, and they do not allow the identification of individual census tracts as long as the county is not known.

Description of items:

Two indicators of the rural-urban environment based on the census tract of the diagnosis address:

  • Urban Rural Indicator Codes (URIC) is based on the Census Bureau’s identification of urban and rural areas
  • Rural Urban Commuting Areas Codes (RUCA) is based on the USDA’s Rural Urban Commuting Area (RUCA) codes

Cases diagnosed between 1995 and 2004 are assigned a code based on the 2000 U.S. Census. Cases diagnosed since 2005 are assigned a code based on the 2010 U.S. Census. 

Allowable values:

  • URIC:
    • 1: all urban – the percent of the population in an urban area = 100%
    • 2: mostly urban – the percent of the population in an urban area < 100% and ≥ 50%
    • 3: mostly rural – the percent of the population in a rural area < 100% and > 50%
    • 4: all rural – the percent of the population in a rural area = 100%
    • 9: unknown or not applicable – census tract not available or tract population was zero at the last decadal census
  • RUCA:
    • 1: urban commuting area – RUCA codes 1.0, 1.1, 2.0, 2.1, 3.0, 4.1, 5.1, 7.1, 8.1, and 10.1
    • 2: not an urban commuting area – all other RUCA codes except 99
    • 9: unknown or not applicable – census tract not available or RUCA code = 99
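The coding rules above can be expressed as a short Python sketch. The distributed tool is SAS code, so this is only an illustration of the logic. The function names are hypothetical, and the sketch assumes that the percent rural equals 100 minus the percent urban, that a missing tract or a zero-population tract is passed as `None`, and that cases diagnosed before 1995 receive no census vintage.

```python
# RUCA codes treated as an urban commuting area, as listed above.
URBAN_COMMUTING_RUCA = {
    "1.0", "1.1", "2.0", "2.1", "3.0", "4.1", "5.1", "7.1", "8.1", "10.1",
}


def census_vintage(diagnosis_year):
    """Return the census year whose tract data apply to a diagnosis year."""
    if 1995 <= diagnosis_year <= 2004:
        return 2000
    if diagnosis_year >= 2005:
        return 2010
    return None  # assumption: earlier years are not covered


def uric_code(percent_urban):
    """Map the tract's percent urban population to a URIC value."""
    if percent_urban is None:
        return 9  # tract unavailable or zero population
    if percent_urban == 100:
        return 1  # all urban
    if percent_urban >= 50:
        return 2  # mostly urban
    if percent_urban > 0:
        return 3  # mostly rural (rural share > 50% and < 100%)
    return 4      # all rural


def ruca_indicator(ruca_code):
    """Collapse a detailed USDA RUCA code into the 1/2/9 indicator."""
    if ruca_code is None:
        return 9
    key = f"{float(ruca_code):.1f}"  # normalize 1, "1", and 1.0 to "1.0"
    if key == "99.0":
        return 9
    return 1 if key in URBAN_COMMUTING_RUCA else 2
```

Under these assumptions, a tract that is exactly 50% urban is coded as mostly urban, which matches the ≥ 50% boundary in the URIC definition.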

Along with incidence and mortality data, information on population-based cancer survival is necessary to understand the full burden of cancer in our society. This SAS code is used to create the variables needed to conduct relative survival analysis for CiNA Volume 4: CiNA Survival. It is made available here for use by researchers on their own data and is currently updated for a study cutoff date of 2015.

Copyright © 2016 NAACCR, Inc. All Rights Reserved