Note: I argued with all my strength in the NAACCR XML Work Group about the choice to eliminate the data item number (naaccrNum) from NAACCR XML data files, and lost. Now the news comes in over my transom that a campaign is afoot to eliminate the item number from the entire XML specification and from Volume II. I am going to make one last appeal here to the community at large not to let that happen.
The EDITS tools validate data by referring to NAACCR Volume II and custom data items through their unique numeric identifiers. There are two leading reasons why this design was implemented and remains in practice:
* Volume II has historically made changes to data item names. The numbers are immutable.
* Computers prefer numbers to strings.
The argument was made to me that the naaccrId is now the “immutable” identifier. By design, it repackages the data item name so that a casual reader of a data file knows all it contains about a patient’s cancer. The naaccrId is already out of sync with the (modified) data item name in a number of cases, though it mostly still suggests what the item is. (Does countyAtDxGeocode1990 mean the same thing as countyAtDxGeocode19708090? The latter represents how the NAACCR XML specification’s algorithm would translate the current name of data item 94.)
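To show how an identifier derived from a name drifts when the name changes, here is a rough Python sketch of a name-to-identifier translation. I am not quoting the NAACCR XML specification's algorithm here: the camel-casing rule, the function name name_to_id and the spelling of item 94's current name ("County at DX Geocode 1970/80/90") are all my assumptions for illustration.

```python
import re


def name_to_id(name: str) -> str:
    """Hypothetical camel-case translation of a data item name.

    Assumed rule (not taken from the specification): split on anything
    that is not a letter or digit, lower-case the first word, capitalize
    the remaining words, and join them.
    """
    words = re.findall(r"[A-Za-z0-9]+", name)
    if not words:
        return ""
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])


if __name__ == "__main__":
    # Assumed current name of data item 94.
    print(name_to_id("County at DX Geocode 1970/80/90"))
```

Under this assumed rule, the derived string changes whenever the name does, while the item number stays 94.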
But more important to any programmer who has to listen to somebody complain that his software doesn’t run fast enough is that second bullet. When you use a string as an identifier, your processing includes running your language’s equivalent of string.compare() to determine whether “phase1RadiationExternalBeamPlanningTech” is present in the XML record (to confirm a match, it has to evaluate every byte of that string). And it has to run this process every time you want that item.
Or you could simply ask it to find data item 1502. Pop quiz: Which look-up happens faster?
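For anyone who wants to try the pop quiz, here is a minimal Python sketch of the two look-ups. The record layout, the names Item, find_by_id and find_by_num, and the sample values are all hypothetical; this is not how EDITS or any NAACCR XML library actually stores items. It only sets the string comparison and the integer comparison side by side.

```python
import timeit
from typing import List, NamedTuple, Optional


class Item(NamedTuple):
    naaccr_num: int   # numeric identifier, e.g. 1502
    naaccr_id: str    # string identifier from the XML
    value: str        # made-up sample value


# Hypothetical parsed record; the values are invented for illustration.
RECORD: List[Item] = [
    Item(94, "countyAtDxGeocode1990", "001"),
    Item(400, "primarySite", "C509"),
    Item(1502, "phase1RadiationExternalBeamPlanningTech", "05"),
]


def find_by_id(record: List[Item], naaccr_id: str) -> Optional[str]:
    """Scan the record, comparing string identifiers."""
    for item in record:
        if item.naaccr_id == naaccr_id:
            return item.value
    return None


def find_by_num(record: List[Item], naaccr_num: int) -> Optional[str]:
    """Scan the record, comparing integer identifiers."""
    for item in record:
        if item.naaccr_num == naaccr_num:
            return item.value
    return None


if __name__ == "__main__":
    wanted = "phase1RadiationExternalBeamPlanningTech"
    by_id = timeit.timeit(lambda: find_by_id(RECORD, wanted), number=100_000)
    by_num = timeit.timeit(lambda: find_by_num(RECORD, 1502), number=100_000)
    print(f"by naaccrId:  {by_id:.4f} s")
    print(f"by naaccrNum: {by_num:.4f} s")
```

Any timings this prints depend on the language, the runtime and the data structure, so treat them as a starting point for your own measurements rather than a verdict.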
I am unpersuaded of the value of naaccrId at all, but can accept it as a property of the specification since it seems to be fundamental to at least one vendor’s implementation. But let’s not rush to throw out the naaccrNum. All that accomplishes is diminished performance for high-volume batch processing such as EDITS.
Kathleen