Using SAS with NAACCR XML

Home Forums NAACCR XML Standard Using SAS with NAACCR XML

Tagged: 

Viewing 6 posts - 31 through 36 (of 36 total)
    #7441
    Valerie Otto
    Spectator

    Sorry to get your hopes up: reading looks good, but I forgot to check writing! Writing seems broken in 4.9 and 4.10. The temporary output CSV file contains all of the patients and tumors, but the XML only contains the first patient, their first tumor, and a tumor from another patient, written only as far as rxTextSurgery. I don’t see anything obvious that explains why it stopped writing or why the tumor was placed under the wrong patient; it is correct in the temp CSV.

    For example, the XML output looks like this:
    <Patient>
    <Item>Patient 1 info</Item>
    <Tumor>
    <Item>Patient 1’s tumor info</Item>
    </Tumor>
    <Tumor>
    <Item>Patient 2’s tumor info</Item>

    Successfully wrote:
    <Item naaccrId="textDxProcPath">9-9-16 HOSPITAL PATH-16-99999 BRAIN, TEST, TEXT: TEXT, WHO GRD 999. TEXT, BRAIN TMR: X/X TUMOR. XXX-9: TEXT. TEXT: TEXT, TEXT: TEXT</Item>
    <Item naaccrId="textStaging">N/A</Item>
    <Item naaccrId="rxTextSurgery">9-9-16 HOSPITAL: TEXT W/TEXT TEXT BY DR DOCTOR</Item> (end of writing)

    The next item to be written, rxTextRadiation, is present in the temp CSV:
    9-9/9-9-16 HOSPITAL, DR DOCTOR: XXX BRAIN (9999 CGY), 99 FX’S, XXXX & 9MV

    It successfully wrote some variables that were read with CDATA, so I don’t think that was the problem.
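    You can narrow down a problem like this by comparing the structure of the written XML against the temp CSV. The sketch below is in Python rather than SAS and makes the following assumptions:
    - The file uses the usual NaaccrData > Patient > Tumor nesting.
    - Namespaces are ignored.
    - The sample document is hypothetical and trimmed to a couple of items.
    A truncated file like the output above will not parse at all, and that failure is itself a useful signal.

```python
import xml.etree.ElementTree as ET


def local(tag):
    # Drop any "{namespace}" prefix so the check works with or without a namespace.
    return tag.rsplit("}", 1)[-1]


def summarize(xml_text):
    """Return (patient count, tumor count per patient) for a NAACCR XML document.

    ET.fromstring raises ET.ParseError if the document is truncated or malformed.
    """
    root = ET.fromstring(xml_text)
    patients = [el for el in root if local(el.tag) == "Patient"]
    tumors = [sum(1 for child in p if local(child.tag) == "Tumor") for p in patients]
    return len(patients), tumors


# Hypothetical two-patient file; the namespace URI is assumed, summarize() does not need it.
sample = """<NaaccrData xmlns="http://naaccr.org/naaccrxml">
  <Patient>
    <Item naaccrId="patientIdNumber">00000001</Item>
    <Tumor><Item naaccrId="primarySite">C710</Item></Tumor>
  </Patient>
  <Patient>
    <Item naaccrId="patientIdNumber">00000002</Item>
    <Tumor><Item naaccrId="primarySite">C711</Item></Tumor>
  </Patient>
</NaaccrData>"""

print(summarize(sample))  # (2, [1, 1])
```

    To check a written file, compare the patient count and the summed tumor counts with the number of distinct patients and rows in the temp CSV.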

    #7442
    Fabian Depry
    Moderator

    Darn 🙂

    I will take another look at some point.

    #7473
    Fabian Depry
    Moderator

    Hi Valerie,

    I looked more into the issue you described, but I can’t reproduce it.

    I used the following file:
    https://github.com/imsweb/naaccr-xml/blob/master/src/test/resources/data/sas/test2.xml

    I created that file to represent the data you described.

    Could you please try that file yourself when you have some time and confirm that it also works for you? If it does, could you compare it with your own file and try to figure out the difference?

    Thank you!!!

    #9752
    Bruce Riddle
    Spectator

    I want to make the case in writing that a different approach is needed to get data out of a central registry database into analytical tools like SAS, GenEdits, InterRecordEdits, SEER*PATH, Match*PRO, etc.
    NAACCR Volume 2 has the title ‘Data Standards and Data Dictionary.’ The next piece is called the ‘XML Data Exchange Standard.’ The primary goal of the data exchange standard is to ensure seamless transmission between registries, whether hospital or central. Nowhere is it written that the XML Data Exchange Record has to be read by any of the analytical tools. Almost all of our analytical tools either do not or cannot read XML documents very well.

    Plan B: a secondary standard is needed that allows a pipe-delimited ASCII file to be exported from a central registry database and read into an analytical tool. The two models for this are SEER*STAT and MATCH*PRO.

    The primary assumption I am making is that right now a pipe (‘|’) is not contained in any names, addresses, or coding schemes collected by a registry or imported into a registry software system. If that assumption is violated, then we need to find another delimiter.
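    You can check this assumption against the actual data before committing to a delimiter. Here is a minimal Python sketch. It assumes each record has already been loaded as a dictionary mapping naaccrId to value, and the records shown are made up:

```python
def fields_containing(records, delimiter="|"):
    """Yield (record index, naaccrId) for every value that contains the delimiter."""
    for index, record in enumerate(records):
        for naaccr_id, value in record.items():
            if delimiter in value:
                yield index, naaccr_id


# Made-up records; the second one would break a naive pipe-delimited export.
records = [
    {"nameLast": "SMITH", "addrAtDxCity": "CONCORD"},
    {"nameLast": "SMITH|JONES", "addrAtDxCity": "NASHUA"},
]
print(list(fields_containing(records)))  # [(1, 'nameLast')]
```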

    I would like to see analytical file formats developed that consist of the selected data items needed for routine work. For instance, prior to calls for data, a list of data items, along with their order, would be developed so that the file could be brought into GenEdits and InterRecordEdits. That file format would be installed in the registry vendor software to output the necessary subsets of cases.
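    A subset export of this kind might look roughly like the following Python sketch. It rests on these assumptions:
    - The input is NAACCR XML with Patient and Tumor elements.
    - Each tumor gets one row, with patient-level items repeated on every row.
    - The caller supplies the item list and its order.
    - NaaccrData-level items are ignored.
    The function names and sample data are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local(tag):
    # Ignore the XML namespace, if any.
    return tag.rsplit("}", 1)[-1]


def items(element):
    """Map naaccrId -> value for the <Item> children directly under one element."""
    return {el.get("naaccrId"): el.text or "" for el in element if local(el.tag) == "Item"}


def export_subset(xml_text, item_ids, out):
    """Write a header plus one pipe-delimited row per tumor, columns in item_ids order."""
    root = ET.fromstring(xml_text)
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    writer.writerow(item_ids)
    for patient in (el for el in root if local(el.tag) == "Patient"):
        patient_items = items(patient)
        for tumor in (el for el in patient if local(el.tag) == "Tumor"):
            # Tumor-level values win over patient-level ones; absent items become "".
            row = {**patient_items, **items(tumor)}
            writer.writerow([row.get(item_id, "") for item_id in item_ids])


# Hypothetical one-patient, two-tumor file.
sample = """<NaaccrData>
  <Patient>
    <Item naaccrId="patientIdNumber">00000001</Item>
    <Item naaccrId="sex">2</Item>
    <Tumor><Item naaccrId="primarySite">C710</Item></Tumor>
    <Tumor><Item naaccrId="primarySite">C718</Item></Tumor>
  </Patient>
</NaaccrData>"""

buf = io.StringIO()
export_subset(sample, ["patientIdNumber", "primarySite", "sex"], buf)
print(buf.getvalue(), end="")
# patientIdNumber|primarySite|sex
# 00000001|C710|2
# 00000001|C718|2
```

    Because the column order comes from the item list rather than from the XML, you can agree on the same list before a call for data and reuse it in the vendor software and the edit tools.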

    In New Hampshire, because we are so small, we would seek all cases 1995-2017 that meet the required criteria. The output file would contain approximately 136 reportable data items along with a few confidential data items to facilitate editing of cases. The resulting file would be smaller than an XML file, faster to output and faster to read in the analytical software.

    I would strongly prefer that the header use NAACCR item numbers as the variable names (N18_20, N18_390, N18_400, etc.) to make manipulation easier.
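    If the export works from naaccrIds, renaming the header is a simple lookup. In this sketch the lookup table is a hand-typed, hypothetical excerpt; a real table would come from the NAACCR data dictionary. The "N18" prefix is assumed to mean NAACCR version 18.

```python
# Hypothetical, hand-typed excerpt of a naaccrId -> NAACCR item number table.
# In practice this would be generated from the NAACCR data dictionary.
ITEM_NUMBERS = {"patientIdNumber": 20, "dateOfDiagnosis": 390, "primarySite": 400}


def item_number_header(naaccr_ids, version="18"):
    """Turn naaccrIds into N<version>_<item number> names; unknown ids raise KeyError."""
    return [f"N{version}_{ITEM_NUMBERS[naaccr_id]}" for naaccr_id in naaccr_ids]


print(item_number_header(["patientIdNumber", "dateOfDiagnosis", "primarySite"]))
# ['N18_20', 'N18_390', 'N18_400']
```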

    Rarely does a central registry need to output the entire NAACCR record. It would be necessary for inter-state data exchange, for archive purposes, and for transmission to some authorities.

    Other pipe-delimited file formats could be used for submission to the NAACCR Geocoder, Match*PRO, etc. SAS PROC IMPORT can easily read a pipe-delimited file with a header.

    The use of analytical pipe-delimited files does not diminish the value of the XML Standard for Data Exchange. At a certain size, a delimited file becomes unwieldy and cumbersome.

    #9759
    Isaac Hands
    Moderator

    Bruce, thank you for this request and for your description of the problems you may face with XML. In the NAACCR XML Workgroup we have been discussing the utility of a delimited file format for certain use cases, specifically compatibility with SAS and other statistical software. Because these discussions are ongoing, we are all forming and re-forming our opinions on the matter. I am not sure I can present a coherent picture of where they currently land until we have had more discussion. (You are welcome to join our Workgroup at any time.)

    On the subject of choosing a delimiter, we can define escaping rules for whatever delimiter we choose, so I am not worried about trying to guess whether a certain character will show up in the output or not.
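    Python's csv module illustrates one such set of escaping rules. With any single-character delimiter, it uses the same quoting convention as RFC 4180 CSV:
    - A field containing the delimiter or a quote is wrapped in double quotes.
    - Embedded quotes are doubled.
    This is just one possible convention, not anything the Workgroup has adopted, and the values below are made up.

```python
import csv
import io


def round_trip(rows, delimiter="|"):
    """Write rows with CSV-style quoting on the given delimiter, then parse them back."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerows(rows)
    text = buf.getvalue()
    parsed = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    return text, parsed


rows = [["nameLast", "textStaging"], ["SMITH|JONES", 'T2 "per path"']]
text, parsed = round_trip(rows)
print(text, end="")
# nameLast|textStaging
# "SMITH|JONES"|"T2 ""per path"""
print(parsed == rows)  # True
```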

    On the subject of header names, I am not a fan of defining another name for data items. We are currently working with UDS to harmonize the NAACCR “Short Name” list with the current XML naaccrIds so that we can reduce duplicate naming efforts in the NAACCR community. This will probably involve shortening the XML naaccrIds and agreeing on a standard way to generate stable names across versions.

    #9760
    Bruce Riddle
    Spectator

    Isaac,
    I was on the last call and listened to the discussions. I work with about 150 variables from the NAACCR dataset in SAS. It is much easier to type the NAACCR item number than some random short name, and the lookup is much faster. I am not asking for another name; the NAACCR numbers are already in place.

    On a related topic, I am trying to use Windows PowerShell to manipulate HL7 ePath records. PowerShell provides a very useful way to do that with only a few commands. In my work, I discovered that PowerShell can also manipulate XML files, which should be very helpful.


Copyright © 2018 NAACCR, Inc. All Rights Reserved