Using SAS with NAACCR XML

Tagged: SAS

This topic has 35 replies, 5 voices, and was last updated 5 years, 8 months ago by Bruce Riddle.



Author

Posts
July 16, 2018 at 12:20 pm #7441

Valerie Yoder
Spectator

Sorry to get your hopes up: reading looks good, but I forgot to check writing! It seems broken in 4.9 and 4.10. The temporary output CSV file contains all of the patients and tumors, but the XML contains only the first patient, their first tumor, and a tumor from another patient, written down to rxTextSurgery. I don’t see anything obvious about why it stopped writing or why the tumor was placed under the wrong patient; it is correct in the temp CSV.

ex.
<Patient>
<Item>Patient 1 info</Item>
<Tumor>
<Item>Patient 1’s tumor info</Item>
</Tumor>
<Tumor>
<Item>Patient 2’s tumor info</Item>

Successfully wrote:
<Item naaccrId="textDxProcPath">9-9-16 HOSPITAL PATH-16-99999 BRAIN, TEST, TEXT: TEXT, WHO GRD 999. TEXT, BRAIN TMR: X/X TUMOR. XXX-9: TEXT. TEXT: TEXT, TEXT: TEXT</Item>
<Item naaccrId="textStaging">N/A</Item>
<Item naaccrId="rxTextSurgery">9-9-16 HOSPITAL: TEXT W/TEXT TEXT BY DR DOCTOR</Item>
(end of writing)

The next item to be written, rxTextRadiation, is present in the temp CSV:
9-9/9-9-16 HOSPITAL, DR DOCTOR: XXX BRAIN (9999 CGY), 99 FX’S, XXXX & 9MV

It successfully wrote some variables that were read with CDATA, so I don’t think that was the problem.
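
For anyone trying to confirm the same symptom, a quick check outside SAS may help. The following is only a minimal Python sketch, not part of the NAACCR XML SAS tools: it assumes the usual NaaccrData > Patient > Tumor nesting, and the function names `local_name` and `summarize_naaccr_xml` are made up for illustration. It reports whether the XML is well-formed (a file that stops mid-tumor will not be) and, if it is, how many tumors sit under each patient, which can then be compared by hand against the temp CSV.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def summarize_naaccr_xml(xml_text):
    """Return (patient_count, tumors_per_patient, error).

    error is None when the XML parses; otherwise it holds the parser
    message, which is what a file that stopped mid-write produces.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return 0, [], str(exc)
    patients = [el for el in root if local_name(el.tag) == "Patient"]
    tumors = [
        sum(1 for child in p if local_name(child.tag) == "Tumor")
        for p in patients
    ]
    return len(patients), tumors, None
```

Calling `summarize_naaccr_xml(open("out.xml", encoding="utf-8").read())` (the file name is hypothetical) on an output like the one above should return an error message rather than counts, because the second Tumor element is never closed.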

July 16, 2018 at 12:24 pm #7442

Fabian Depry
Moderator

Darn 🙂

I will take another look at some point.

July 22, 2018 at 1:51 pm #7473

Fabian Depry
Moderator

Hi Valerie,

I looked more into the issue you described, but I can’t reproduce it.

I used the following file:
https://github.com/imsweb/naaccr-xml/blob/master/src/test/resources/data/sas/test2.xml

I tried to create a file that represents the data you described.

Could you please try that file yourself when you have some time and confirm that it also works for you? If it does, can you please compare it with your own file and try to figure out the difference?

Thank you!!!

August 23, 2018 at 1:28 pm #9752

Bruce Riddle
Spectator

I want to make the case in writing that a different approach is needed to get data out of a central registry database into analytical tools like SAS, GenEdits, InterRecordEdits, SEER*PATH, Match*PRO, etc.
NAACCR Volume 2 has the title ‘Data Standards and Data Dictionary.’ The next piece is called the ‘XML Data Exchange Standard.’ The primary goal of the data exchange standard is to ensure seamless transmission between registries, be it a hospital registry or a central registry. Nowhere is it written that the XML Data Exchange Record has to be read by any of the analytical tools. Almost none of our analytical tools read XML documents well, if they can read them at all.

Plan B: A secondary standard is needed that allows for a pipe-delimited formatted ASCII file to be exported from a central registry database to be input into an analytical tool. The two models for this are SEER*STAT and MATCH*PRO.

The primary assumption I am making is that right now a pipe (‘|’) is not contained in any names, addresses, or coding schemes collected by a registry or imported into a registry software system. If that assumption is violated, then we need to find another delimiter.

I would like to see analytical file formats developed that consist of the selected data items needed for normal work. For instance, prior to calls for data, a list of data items and their order would be developed so that the file could be brought into GenEdits and InterRecordEdits. That file format would be installed in the registry vendor software to output the necessary subsets of cases.

In New Hampshire, because we are so small, we would seek all cases 1995-2017 that meet the required criteria. The output file would contain approximately 136 reportable data items along with a few confidential data items to facilitate editing of cases. The resulting file would be smaller than an XML file, faster to output and faster to read in the analytical software.

I would strongly prefer that the header use NAACCR Item numbers as the variable names (N18_20, N18_390, N18_400, etc.) to make manipulation easier.
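
To make the proposal concrete, here is a rough Python sketch of such an export, under several assumptions: the three-item `HEADERS` mapping, the sample record, and the function names are illustrative only, and a real layout would carry the full agreed list of items. It writes one pipe-delimited row per tumor, repeating the patient-level items on each row, with a header line built from the item-number style names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative mapping from XML naaccrId to item-number column names.
# A real analytical layout would list every agreed item, in order.
HEADERS = {
    "patientIdNumber": "N18_20",
    "dateOfDiagnosis": "N18_390",
    "primarySite": "N18_400",
}

# Made-up sample record, used only to exercise the sketch.
SAMPLE = """<NaaccrData xmlns="http://naaccr.org/naaccrxml">
  <Patient>
    <Item naaccrId="patientIdNumber">00000001</Item>
    <Tumor>
      <Item naaccrId="dateOfDiagnosis">20160909</Item>
      <Item naaccrId="primarySite">C710</Item>
    </Tumor>
  </Patient>
</NaaccrData>"""


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def items(element):
    """Collect the direct Item children of an element as {naaccrId: value}."""
    return {
        child.get("naaccrId"): (child.text or "")
        for child in element
        if local_name(child.tag) == "Item"
    }


def flatten(xml_text, out):
    """Write one pipe-delimited row per tumor to the file-like object out."""
    root = ET.fromstring(xml_text)
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    writer.writerow(list(HEADERS.values()))
    for patient in root:
        if local_name(patient.tag) != "Patient":
            continue
        patient_items = items(patient)
        for tumor in patient:
            if local_name(tumor.tag) != "Tumor":
                continue
            row = {**patient_items, **items(tumor)}
            writer.writerow([row.get(naaccr_id, "") for naaccr_id in HEADERS])


buffer = io.StringIO()
flatten(SAMPLE, buffer)
print(buffer.getvalue())
```

Items missing from a record come out as empty fields, so every row keeps the same number of columns as the header.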

Rarely does a central registry need to output the entire NAACCR record. It would be necessary for inter-state data exchange, for archive purposes, and for transmission to some authorities.

Other pipe-delimited file formats could be used for submission to the NAACCR Geocoder, Match*PRO, etc. SAS PROC IMPORT can easily read a pipe-delimited file with a header.

The use of analytical pipe-delimited files does not diminish the value of the XML Standard for Data Exchange. At a certain size, a delimited file becomes unwieldy and cumbersome.

August 24, 2018 at 8:48 am #9759

Isaac Hands
Moderator

Bruce, thank you for this request and your description of the problems you may face with XML. In the NAACCR XML Workgroup we have been discussing the utility of a delimited file format for certain use cases, specifically related to compatibility with SAS and other statistical software. As these discussions are ongoing, we are all forming and re-forming our opinions on the matter, so I am not sure I can present a coherent picture of where things currently stand until we have discussed it further. (You are welcome to join our Workgroup at any time.)

On the subject of choosing a delimiter, we can define escaping rules for whatever delimiter we choose, so I am not worried about trying to guess whether a certain character will show up in the output or not.
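
As one illustration of what escaping rules could look like (the Workgroup has not settled on any, so this is only an assumption), the common CSV convention wraps a field in double quotes when it contains the delimiter. The Python sketch below uses the standard `csv` module with a pipe delimiter; the column names and values are made up.

```python
import csv
import io

# Made-up rows; the second address deliberately contains a pipe.
rows = [
    ["ID", "ADDRESS"],
    ["00000001", "12 MAIN ST | APT 4"],
]

# Write with '|' as the delimiter; QUOTE_MINIMAL quotes only the fields
# that contain the delimiter, the quote character, or a line break.
buffer = io.StringIO()
writer = csv.writer(
    buffer,
    delimiter="|",
    quotechar='"',
    quoting=csv.QUOTE_MINIMAL,
    lineterminator="\n",
)
writer.writerows(rows)
text = buffer.getvalue()

# Reading with the same settings recovers the original values.
read_back = list(csv.reader(io.StringIO(text), delimiter="|", quotechar='"'))
print(text)
```

With a rule like this, a pipe inside a name or address survives the round trip, so the choice of delimiter does not depend on what characters appear in the data.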

On the subject of header names, I am not a fan of defining another name for data items. We are currently working with UDS to harmonize the NAACCR “Short Name” list with the current XML naaccrIds so that we can reduce duplicate naming efforts in the NAACCR community. This will probably involve shortening the XML naaccrIds and agreeing on a standard way to generate stable names across versions.

August 24, 2018 at 9:05 am #9760

Bruce Riddle
Spectator

Isaac,
I was on the last call and I listened to the discussions.
I work with about 150 variables from the NAACCR dataset in SAS. It is
much easier to type the NAACCR Item number than some random short name. The
lookup is much faster. I am not asking for another name. The NAACCR numbers
are in place.
In a related topic, I am trying to work with Windows PowerShell to manipulate
HL7 ePath records. PowerShell provides a very useful way to do that with only
a few commands. In my work, I discovered that PowerShell can also manipulate
XML files. That should be very helpful.

February 2, 2021 at 1:31 pm #13902

Caosang Auto
Spectator

thanks