Howe H. et al. Method to assess identifiability in electronic data files. Am J Epidemiol. 2007 Mar 1;165(5):597-601

This article describes the evaluation of record uniqueness in the Cancer in North America (CINA) research files after the addition of a county-based socioeconomic variable. The authors developed a software tool, the Record Uniqueness (RU) program, to count the unique records or unique record sets in a data set for a chosen set of key variables. The percentage of unique records in a given data file can be taken as an estimate of the risk of identifying a known cancer patient in the file. The paper analyzes how the number of unique records increases when different forms of the socioeconomic variable are added. For files released for research purposes, the authors suggest that no more than 20 percent of the variable combinations should identify unique record sets, in order to reduce the risk of a confidentiality breach. For public use data files, the suggested threshold is 5 percent. While these thresholds may not be optimal for every data set, the software tool is a valuable addition to the methods used to protect data confidentiality: it provides quantitative input for judging the tradeoff between identifiability risk and data utility when deciding whether to add potentially identifying information.
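The uniqueness measure summarized above can be illustrated with a minimal Python sketch. This is not the authors' RU program. The function name `percent_unique`, the variable names, and the four toy records are hypothetical, and the sketch assumes the measure is the share of records whose combination of key variables occurs exactly once in the file.

```python
from collections import Counter


def percent_unique(records, key_vars):
    """Percentage of records whose key-variable combination occurs exactly once.

    Hypothetical helper for illustration; not the RU program from the paper.
    """
    if not records:
        return 0.0
    # Reduce each record to the tuple of its key-variable values.
    combos = [tuple(rec[v] for v in key_vars) for rec in records]
    counts = Counter(combos)
    # A record is "unique" when no other record shares its combination.
    unique = sum(1 for combo in combos if counts[combo] == 1)
    return 100.0 * unique / len(records)


# Toy data (invented for this example, not taken from the CINA files).
records = [
    {"age_group": "60-64", "sex": "F", "county_ses": "low"},
    {"age_group": "60-64", "sex": "F", "county_ses": "low"},
    {"age_group": "60-64", "sex": "F", "county_ses": "high"},
    {"age_group": "70-74", "sex": "M", "county_ses": "low"},
]

# Compare uniqueness before and after adding the socioeconomic variable.
before = percent_unique(records, ["age_group", "sex"])
after = percent_unique(records, ["age_group", "sex", "county_ses"])
print(f"before: {before:.1f}%  after: {after:.1f}%")
```

In this toy file, adding the county-level variable raises the share of unique records, which is the kind of change that would be weighed against the 20 percent (research) and 5 percent (public use) thresholds.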