Computational Aspects of Statistical Confidentiality (CASC) Project

ApplicationLevelRegistry UsersCustomers Users
Link: http://neon.vb.cbs.nl/CASC/

The CENEX-SDC (CENters for EXcellence Statistical Disclosure Control) Project is sponsored by the European Community, and the CASC project is a CENEX project to research, develop, and implement new techniques for statistical disclosure control. The project, based in the Netherlands with participants from Spain, the U.K., and Germany, has developed two software tools, μ-Argus and τ-Argus, known collectively as the Argus Twins, which protect microdata and tabular data respectively. The latest versions and manuals of both software products are freely available from the CASC website.

The software uses many techniques to prevent data intruders from re-identifying individuals, and I think cancer registries would do well to be aware of this group (CENEX-SDC).

1. μ-Argus

In the case of microdata, where each record represents an individual, the danger facing statistical data agencies is that the individuals from whom the data were collected may be re-identified by data intruders.

For example, a data file in the Netherlands might contain this information: residence=Urk, gender=female, and occupation=statistician. Urk happens to be a small village and there might be only one female statistician living there, so that person may easily be re-identified through use of just these three key fields.
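The Urk example can be checked mechanically by counting how many records share each combination of key fields. The following is a minimal sketch of that idea, not the procedure used by μ-Argus; the sample records, the `KEY_FIELDS` tuple, and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical records; only the Urk case comes from the text above.
records = [
    {"residence": "Urk", "gender": "female", "occupation": "statistician"},
    {"residence": "Urk", "gender": "female", "occupation": "teacher"},
    {"residence": "Urk", "gender": "female", "occupation": "teacher"},
    {"residence": "Amsterdam", "gender": "female", "occupation": "statistician"},
    {"residence": "Amsterdam", "gender": "female", "occupation": "statistician"},
]

# Assumed key fields (the fields an intruder could plausibly know).
KEY_FIELDS = ("residence", "gender", "occupation")


def key_of(record, key_fields=KEY_FIELDS):
    """Return the combination of key-field values for one record."""
    return tuple(record[f] for f in key_fields)


def risky_records(records, key_fields=KEY_FIELDS, threshold=1):
    """Return records whose key combination occurs at most `threshold` times."""
    counts = Counter(key_of(r, key_fields) for r in records)
    return [r for r in records if counts[key_of(r, key_fields)] <= threshold]


risky = risky_records(records)
for r in risky:
    print("Potentially re-identifiable:", key_of(r))
```

Raising `threshold` gives a stricter check, flagging any combination shared by only a handful of records rather than just one.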

To create a safe public microdata file, the data file must be modified. The techniques applied by the μ-Argus software include global recoding (regrouping of categorical data); local suppression (in the example just given, the occupation of the woman might be suppressed); the Post Randomization Method, or PRAM, which deliberately misclassifies some categorical data based on a probabilistic model; and the introduction of noise into the data.
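To make three of these techniques concrete, here is a rough sketch of global recoding, local suppression, and PRAM on a toy data set. It illustrates the general ideas only and is not how μ-Argus implements them; the sample records, the recoding map, and the transition probabilities are all assumptions made for the example.

```python
import random

# Hypothetical sample data for illustration only.
sample = [
    {"residence": "Urk", "gender": "female", "occupation": "statistician"},
    {"residence": "Amsterdam", "gender": "male", "occupation": "teacher"},
]


def global_recode(records, field, mapping):
    """Global recoding: replace detailed categories with broader ones in every record."""
    return [{**r, field: mapping.get(r[field], r[field])} for r in records]


def local_suppress(records, field, is_risky):
    """Local suppression: blank out one field, but only in records flagged as risky."""
    return [{**r, field: None} if is_risky(r) else dict(r) for r in records]


def pram(records, field, transition, rng):
    """PRAM: redraw each value using that value's row of a transition matrix."""
    out = []
    for r in records:
        row = transition[r[field]]          # probabilities for the current value
        categories = list(row)
        weights = [row[c] for c in categories]
        new_value = rng.choices(categories, weights=weights, k=1)[0]
        out.append({**r, field: new_value})
    return out


# Assumed broader occupation group.
recoded = global_recode(sample, "occupation", {"statistician": "professional"})

# Suppress occupation only for residents of the small village.
suppressed = local_suppress(sample, "occupation", lambda r: r["residence"] == "Urk")

# Assumed transition matrix: each value is kept with probability 0.9.
transition = {
    "female": {"female": 0.9, "male": 0.1},
    "male": {"male": 0.9, "female": 0.1},
}
perturbed = pram(sample, "gender", transition, random.Random(0))
```

Because the transition probabilities are known to the data producer, analysts can in principle correct aggregate statistics for the misclassification, which is the usual argument for PRAM over ad hoc swapping.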

The μ-Argus manual includes a tutorial with a sample data file to guide the new user through the application. The input is a fixed-format ASCII file accompanied by a metadata file that contains the record layout for Argus to use. The manual also includes extensive discussion of the theory of SDC.
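For readers unfamiliar with fixed-format input, the sketch below shows how a record layout drives the reading of such a file. The `LAYOUT` structure here is a hypothetical stand-in and does not reproduce the actual metadata syntax that μ-Argus expects; consult the manual for that.

```python
# Hypothetical record layout: (field name, zero-based start column, width).
# This is NOT the Argus metadata format, only an illustration of the idea.
LAYOUT = [
    ("residence", 0, 10),
    ("gender", 10, 1),
    ("occupation", 11, 12),
]


def parse_fixed_line(line, layout=LAYOUT):
    """Slice one fixed-width line into named fields, trimming padding."""
    return {name: line[start:start + width].strip() for name, start, width in layout}


def parse_fixed_text(text, layout=LAYOUT):
    """Parse every non-empty line of a fixed-width text block."""
    return [parse_fixed_line(line, layout) for line in text.splitlines() if line.strip()]


# Build two sample lines with explicit padding so the columns line up.
sample_text = "\n".join([
    "Urk".ljust(10) + "F" + "statistician".ljust(12),
    "Amsterdam".ljust(10) + "M" + "teacher".ljust(12),
])

rows = parse_fixed_text(sample_text)
for row in rows:
    print(row)
```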

[See also the NAACCR Record Uniqueness Program entry]

2. τ-Argus

For tabular data, the τ-Argus software helps identify cells that might reveal information on individuals and produces safe tables that mask an individual's identity from third-party viewers. The program uses a dominance rule to find sensitive cells within a table. The dominance rule states that a cell of a table is unsafe for public use if a few (n) major contributors to the cell are responsible for a large percentage (k) of that cell's total. The table creator may supply their own levels, but n=3 and k=70% are commonly used. Cells that are risky or unsafe are suppressed.
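The dominance rule is simple enough to state in a few lines of code. The sketch below is one reading of the rule as described above, with the defaults n=3 and k=70%; whether the comparison is strict (more than k percent) or inclusive is an assumption here, and the cell names and contribution values are invented for the example.

```python
def is_unsafe(contributions, n=3, k=70.0):
    """Dominance rule: a cell is unsafe if its n largest contributions
    account for more than k percent of the cell total.

    The strict '>' comparison is an assumption made for this sketch.
    """
    total = sum(contributions)
    if total <= 0:
        return False  # empty or zero cell: nothing to dominate
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return 100.0 * top_n / total > k


# Hypothetical table: each cell lists the individual contributions behind it.
table = {
    "cell_a": [50, 30, 10, 5, 5],   # top three give 90 of 100
    "cell_b": [10] * 10,            # top three give 30 of 100
}

primary_suppressions = [name for name, values in table.items() if is_unsafe(values)]
print(primary_suppressions)
```

Under these assumptions only `cell_a` is flagged, since its three largest contributors supply 90% of the total, while the evenly spread `cell_b` stays publishable.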

However, because a table includes marginals, it is often easy to recalculate these suppressed cells. Additional cells must therefore be suppressed to prevent recalculation of the primary unsafe cells. It is not enough merely to prevent exact recalculation; a safety range must also be guaranteed for the primary unsafe cells. The optimal selection of these secondary cells, so as to avoid unnecessarily high losses in the information content of the protected tables, is a very complex numerical optimization problem. The optimal approach often cannot be applied when tables have a hierarchical structure, because a hierarchy implies many more (sub-)marginals, each of which can be used to recalculate the primary suppressed cells.
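A small worked case shows why secondary suppression and a safety range are needed. The sketch below considers a single table row with a published total and computes the bounds an intruder could derive for the suppressed cells, assuming all cell values are non-negative. It is a deliberately simplified illustration: a real table also has column totals and possibly sub-totals, which tighten these bounds, and this is not the optimization that τ-Argus performs. The row values are invented.

```python
def feasible_range(row, total, suppressed):
    """Bounds an intruder can derive for each suppressed cell in one row.

    Assumes non-negative cell values and a published row total.
    """
    published_sum = sum(v for name, v in row.items() if name not in suppressed)
    hidden_sum = total - published_sum
    if len(suppressed) == 1:
        # One hidden cell: simple subtraction recovers it exactly.
        return {name: (hidden_sum, hidden_sum) for name in suppressed}
    # Several hidden cells: each lies between 0 and their combined sum.
    return {name: (0, hidden_sum) for name in suppressed}


# Hypothetical row with a published total of 100.
row = {"A": 20, "B": 35, "C": 45}
row_total = sum(row.values())

# Primary suppression only: cell A is recovered exactly from the marginal.
only_primary = feasible_range(row, row_total, {"A"})

# Adding a secondary suppression of B widens the range for A.
with_secondary = feasible_range(row, row_total, {"A", "B"})

print(only_primary)
print(with_secondary)
```

With only A suppressed the intruder obtains the exact value 20; once B is also suppressed, A is known only to lie between 0 and 55. Whether that interval is wide enough is the safety-range question raised above.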

The τ-Argus manual contains chapters describing the theoretical background, a reference chapter, and a step-by-step set of procedures for users who want to process a table or set of tables. The program can also be used to process a microdata file from which tables can then be constructed.