Data Analysis Tools

Great Circle Distance Calculator

This SAS code calculates the great circle distance between the locations of cases at the time of diagnosis and the locations of treatment facilities. Case locations are taken from NAACCR items 2352 (latitude) and 2354 (longitude) in a NAACCR v10 or v11 record layout file. The program can use either source (unconsolidated) or consolidated case records as input. A second input file contains facility IDs, latitude, and longitude.
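The great circle distance between two latitude/longitude points is commonly computed with the haversine formula. The Python sketch below shows that calculation. It is not the SAS program's code. The Earth radius, the kilometre unit, and the helper names `great_circle_km` and `distances_to_facilities` are assumptions made for illustration. Coordinates are taken as signed decimal degrees.

```python
import math

# Mean Earth radius in kilometres (assumption: the SAS program may use another value or unit).
EARTH_RADIUS_KM = 6371.0088


def great_circle_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Haversine great circle distance between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing a slightly above 1 for antipodal points.
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))


def distances_to_facilities(case_lat, case_lon, facilities):
    """Hypothetical helper: distance from one case to every facility.

    facilities is an iterable of (facility_id, lat, lon) tuples, mirroring the
    second input file (facility ID, latitude, longitude).
    """
    return {
        fid: great_circle_km(case_lat, case_lon, flat, flon)
        for fid, flat, flon in facilities
    }
```

As a sanity check, one degree of latitude along a meridian comes out to the radius times pi/180, roughly 111.2 km.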

NAACCR Asian/Pacific Islander Identification Algorithm (NAPIIA)

This algorithm starts from cases reported to the cancer registry as Asian race Not Otherwise Specified (NOS), based on information from the medical record. It then uses gender, birthplace, first name, and surname (including maiden name, when available) to reassign each Asian NOS case to a more specific Asian race group.

  • NAPIIA v1.2.1, Reviewed September 13, 2011 (PDF)
  • Note: NAPIIA is now run as part of the NHAPIIA algorithm.
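The reassignment idea can be pictured as a cascade that only touches Asian NOS cases and falls back from one source of evidence to the next. The Python sketch below is a toy illustration only. The lookup tables, the ISO-style birthplace codes, the evidence order, and the function name `refine_asian_nos` are all hypothetical. It also leaves out gender and first name. The published NAPIIA uses its own name and birthplace lists and its own decision rules.

```python
# Hypothetical lookup tables for illustration only; NAPIIA's real lists
# and rules are far larger and more nuanced.
BIRTHPLACE_TO_RACE = {"VNM": "Vietnamese", "KOR": "Korean"}
SURNAME_TO_RACE = {"NGUYEN": "Vietnamese", "KIM": "Korean"}


def refine_asian_nos(race, birthplace=None, surname=None, maiden_name=None):
    """Toy cascade: reassign only Asian NOS cases, trying birthplace before names."""
    if race != "Asian NOS":
        return race  # already a specific group: left unchanged
    if birthplace in BIRTHPLACE_TO_RACE:
        return BIRTHPLACE_TO_RACE[birthplace]
    for name in (surname, maiden_name):
        if name and name.upper() in SURNAME_TO_RACE:
            return SURNAME_TO_RACE[name.upper()]
    return race  # no usable evidence: case stays Asian NOS
```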

NAACCR Hispanic and Asian/Pacific Islander Identification Algorithm (NHAPIIA)

This algorithm combines NHIA and NAPIIA into a single SAS program.

NAACCR Method to Enhance Hispanic/Latino Identification (NHIA)

This algorithm uses information on ethnicity from the medical record, information reported to the cancer registry, and an evaluation of how strongly birthplace, race, and surname (including maiden name, when available) are associated with Hispanic ethnicity.

  • NHIA v2.2.1, Updated September 12, 2011 (PDF)
  • Note: NHIA is now run as part of the NHAPIIA algorithm.

NAACCR Method to Estimate Completeness

This tool describes the method and provides macro-driven formulae in a Microsoft Excel workbook to calculate the completeness of case ascertainment. The calculation is based on observed cancer incidence, death rates, and a comparison with standard incidence and mortality rates for the United States.
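In outline, a mortality-based completeness estimate compares the cases a registry observed with the cases it would be expected to find. The expected count is the registry's deaths scaled by a standard incidence-to-mortality ratio. The Python sketch below shows that arithmetic under stated assumptions. The stratum tuple layout, the use of counts rather than age-adjusted rates, and the function names are hypothetical. The Excel workbook's exact formulae and stratification may differ.

```python
def expected_cases(registry_deaths, std_incidence_rate, std_mortality_rate):
    """Expected cases = observed deaths x (standard incidence / standard mortality)."""
    return registry_deaths * (std_incidence_rate / std_mortality_rate)


def completeness(strata):
    """Observed cases divided by expected cases, summed over strata.

    strata: iterable of (observed_cases, registry_deaths,
    std_incidence_rate, std_mortality_rate) tuples, e.g. one per
    site/sex group (the stratification is an assumption).
    """
    observed = expected = 0.0
    for obs, deaths, inc, mort in strata:
        observed += obs
        expected += expected_cases(deaths, inc, mort)
    return observed / expected
```

For example, a stratum with 50 deaths and a standard incidence/mortality ratio of 2 yields 100 expected cases. If 90 cases were observed, completeness is 0.9 (90%).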

Poverty and Census Tract Linkage Program

This program is used with a NAACCR standard data exchange file that contains confidential information, including a census tract identifier. The program links each census tract identifier to the percentage of residents in that census tract living below the poverty level. This information is based on data from the 2000 US Census and the American Community Survey. The program uses the census data most closely aligned with the year of diagnosis. It attaches two variables to every input registry record. The first is the percentage of residents in the census tract below the poverty level (xx.x%). The second groups these percentages into four categories: less than 5%, 5%-9.9%, 10%-19.9%, and 20% or higher.
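The linkage and grouping steps can be sketched in Python as follows. Poverty tables are assumed to be keyed by data year, then by census tract. The record field names (`dx_year`, `tract`, `poverty_pct`, `poverty_cat`), the function names, and the tie-breaking rule (earlier data year wins) are hypothetical. They do not describe the actual program's layout.

```python
def poverty_category(pct):
    """Group an exact poverty percentage into the four categories described above."""
    if pct < 5:
        return "less than 5%"
    if pct < 10:
        return "5%-9.9%"
    if pct < 20:
        return "10%-19.9%"
    return "20% or higher"


def closest_source_year(dx_year, source_years):
    """Pick the data year nearest the diagnosis year (ties go to the earlier year)."""
    return min(source_years, key=lambda y: (abs(y - dx_year), y))


def link_record(record, poverty_tables):
    """Return a copy of record with the two poverty variables attached.

    poverty_tables: {data_year: {tract_id: percent_below_poverty}} (hypothetical layout).
    """
    year = closest_source_year(record["dx_year"], poverty_tables)
    pct = poverty_tables[year].get(record["tract"])
    linked = dict(record)
    linked["poverty_pct"] = None if pct is None else round(pct, 1)
    linked["poverty_cat"] = None if pct is None else poverty_category(pct)
    return linked
```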

Record Uniqueness

The Record Uniqueness Program was developed by Howe, Lake, and Shen to assess electronic data files for risk of confidentiality breach based on unique combinations of key variables.
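The core idea of flagging risky records can be sketched as counting how often each combination of key variables occurs. Records whose combination appears only once are unique. The Python sketch below is a hypothetical illustration of that counting step: the function name, field names, and returned measures are assumptions. It does not reproduce the measures or thresholds of the Howe, Lake, and Shen program.

```python
from collections import Counter


def unique_records(records, key_vars):
    """Return (records whose key-variable combination occurs once, share of all records)."""
    def combo(r):
        return tuple(r[v] for v in key_vars)

    counts = Counter(combo(r) for r in records)
    unique = [r for r in records if counts[combo(r)] == 1]
    share = len(unique) / len(records) if records else 0.0
    return unique, share
```

Adding key variables (for example, age, sex, and county rather than county alone) usually increases the share of unique records, and with it the disclosure risk.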