Bruce Riddle

Forum Replies Created

Viewing 15 posts - 1 through 15 (of 16 total)
    in reply to: using SAS with NAACCR XML #13649
    Bruce Riddle
    Spectator

    Using Northcon 210 (1.0.0.17), NAACCR XML Utility 7.5, and the NAACCR-XML Java Library on GitHub, I have successfully rewritten our SAS programs to read and write NAACCR 21 XML files for case processing. GenEdits 5.1.064 and the NAACCR 21A metafile were used to examine the input and output XML files. All the software appeared to work well. I customized the read and write macros from GitHub to meet our needs for NAACCR 21 and the NAACCR A record. There are two minor problems. The first is finding a way to suppress some of the output produced by the macros: a sequence of six or more reads or writes with the respective macros makes the SAS log less useful for debugging. The second is that if an error occurs that involves the Java library, the only recovery is to stop and restart SAS.

    Many thanks to everyone who developed these tools. Getting all of this to work was a major relief.

    in reply to: using SAS with NAACCR XML #13494
    Bruce Riddle
    Spectator

    I am trying to work with NAACCR 210. How do I start to debug this?

    * test of reading XML file using SAS ;
    filename txml "J:\XML\SAS_XML\naaccr-xml-utility-7.5\naaccr-xml-utility-7.5\sas" ;
    %include txml (read_naaccr_xml_macro.sas) ;
    %readNaaccrXml(
        libpath="J:\XML\SAS_XML\naaccr-xml-utility-7.5\naaccr-xml-utility-7.5\sas",
        sourcefile="J:\XML\Oct2020_test\120170_16Sep2020_V21.xml",
        naaccrversion="210",
        recordtype="A",
        dataset=stg1 ) ;

    ERROR: Could not find class com/imsweb/naaccrxml/sas/SasXmlToCsv at line 1 column 111. Please ensure that the
    CLASSPATH is correct.
    ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.
    java.lang.ClassNotFoundException: com.imsweb.naaccrxml.sas.SasXmlToCsv
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    NOTE: The SAS System stopped processing this step because of errors.
    NOTE: DATA statement used (Total process time):
    real time 0.45 seconds
    cpu time 0.04 seconds
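    A ClassNotFoundException usually means the JVM that SAS starts cannot see the class on its classpath. One possible first step, sketched here in Python rather than SAS, is to confirm whether any JAR in the folder passed as libpath actually contains the class named in the error. The function name is hypothetical; the only fact it relies on is that a JAR is a ZIP archive whose entries are paths such as com/imsweb/naaccrxml/sas/SasXmlToCsv.class.

```python
import zipfile
from pathlib import Path

# Class named in the ClassNotFoundException, as an entry path inside a JAR.
CLASS_ENTRY = "com/imsweb/naaccrxml/sas/SasXmlToCsv.class"


def jars_containing_class(libpath, class_entry=CLASS_ENTRY):
    """Return the JAR files at libpath whose entries include class_entry.

    libpath may be a single .jar file or a folder; a folder is searched
    (not recursively) for *.jar files. Hypothetical diagnostic helper.
    """
    path = Path(libpath)
    candidates = [path] if path.is_file() else sorted(path.glob("*.jar"))
    hits = []
    for jar in candidates:
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_entry in zf.namelist():
                    hits.append(jar)
        except zipfile.BadZipFile:
            pass  # not a readable ZIP/JAR archive; skip it
    return hits
```

    For example, jars_containing_class(r"J:\XML\SAS_XML\naaccr-xml-utility-7.5\naaccr-xml-utility-7.5\sas") lists the matching JARs. If one turns up, pointing libpath at that JAR file instead of the folder may be worth trying; that is an assumption to check against the macro's own documentation.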

    in reply to: question about patient/turmor model #10649
    Bruce Riddle
    Spectator

    I have no issues with SAS macros; I use them. It is still early in the game, and I have a limited amount of test data. Fabian pointed out today that a NAACCR 18 flat file I received from a national vendor does not exactly meet the “specifications” for use in SEER DataViewer. I am powerless to get the vendor to “fix” the files, and I assume the XML files I will receive will also be imperfect. I am trying to test case-processing scenarios that address the issues I outlined above and that let me manage and track the inflow and characteristics of the data received. I am only one voice, but the current structure of the SAS macro is awkward and limited for processing up to 40 files in a batch with imperfections in the data. Some of this is my own idiosyncratic way of working with data: I never use long, complex variable names in large data sets. Who wants to learn 791 variable names? I strongly prefer the NAACCR item numbers. Right now the tools being developed seem to focus on one file at a time. I do not want to touch an individual file unless it is so bad that it needs work. I want to work on files in batches, with tools that let me add supplemental information to track the who and when. Before anyone writes yet another program, more thought is required, and the tools in development need to be released for testing and evaluation.
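    Tracking the who and when across a batch of up to 40 files could start from a per-file manifest. The Python sketch below assumes the files sit in one folder as uncompressed NAACCR XML in the http://naaccr.org/naaccrxml namespace; receivedBy, processedAt, and the function name are hypothetical tracking fields, not part of any existing tool.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from pathlib import Path

NS = {"n": "http://naaccr.org/naaccrxml"}  # NAACCR XML namespace


def batch_manifest(folder, received_by):
    """Summarize every *.xml file in folder as one tracking row per file.

    received_by is a hypothetical "who" value supplied by the caller; the
    "when" comes from each file's modification time and the run time.
    """
    run_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    manifest = []
    for path in sorted(Path(folder).glob("*.xml")):
        root = ET.parse(path).getroot()
        registry = root.find("n:Item[@naaccrId='registryId']", NS)
        modified = datetime.fromtimestamp(path.stat().st_mtime, timezone.utc)
        manifest.append({
            "file": path.name,
            "registryId": (registry.text or "") if registry is not None else "",
            "patients": len(root.findall("n:Patient", NS)),
            "tumors": len(root.findall("n:Patient/n:Tumor", NS)),
            "fileModified": modified.isoformat(timespec="seconds"),
            "receivedBy": received_by,   # hypothetical tracking field
            "processedAt": run_at,       # hypothetical tracking field
        })
    return manifest
```

    The manifest rows could then be appended to a running log so each monthly batch leaves a record of what arrived and when it was processed.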

    in reply to: question about patient/turmor model #10647
    Bruce Riddle
    Spectator

    Issac,
    I tried the XML macro, and it does not scale well for monthly
    production. It is very awkward in a multi-file production run.
    Bruce

    in reply to: Editor Tool for XML files #10136
    Bruce Riddle
    Spectator

    Joe,
    I think it is a wonderful idea. I have used XML Exchange Plus
    in my experiments and I like the tool.

    Bruce

    in reply to: Editor Tool for XML files #10067
    Bruce Riddle
    Spectator

    We expect we will start to receive XML files in January or February. Although the flat file option will exist, I expect some IT people will make the choice for the registrars. Almost immediately I will need tools to go into the file to make changes, such as filling in missing hospital numbers or missing dates. I will also need to figure out a way to separate rapid reports (within 45 days of diagnosis) from definitive reports (within 180 days of diagnosis). Can anyone suggest an XML text editor? Given the size of the files, I assume they will arrive zipped, so it would be nice if the editor could read and save ZIP files.
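    A sketch of the rapid/definitive split, assuming a report can be classified purely by the days between Date of Diagnosis (CCYYMMDD) and the date the report arrived. The "received" field, the function names, and the use of the 45- and 180-day windows as a classification rule are assumptions; real registry rules may also depend on report content.

```python
from datetime import date


def classify_report(dx_yyyymmdd, received):
    """Classify a report by days from diagnosis to receipt.

    Hypothetical rule: <= 45 days -> "rapid", <= 180 -> "definitive",
    later -> "late". Blank, partial, or invalid diagnosis dates go to
    "needs_review" so they can be corrected by hand.
    """
    dx = (dx_yyyymmdd or "").strip()
    if len(dx) != 8 or not dx.isdigit():
        return "needs_review"
    try:
        dx_date = date(int(dx[:4]), int(dx[4:6]), int(dx[6:]))
    except ValueError:
        return "needs_review"  # e.g. 20200231
    days = (received - dx_date).days
    if days < 0:
        return "needs_review"  # received before diagnosis: a data error
    if days <= 45:
        return "rapid"
    if days <= 180:
        return "definitive"
    return "late"


def split_reports(reports):
    """Group report dicts by class; assumes 'dateOfDiagnosis' and 'received' keys."""
    groups = {}
    for report in reports:
        key = classify_report(report["dateOfDiagnosis"], report["received"])
        groups.setdefault(key, []).append(report)
    return groups
```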

    Thanks.

    in reply to: Using SAS with NAACCR XML #9760
    Bruce Riddle
    Spectator

    Issac,
    I was on the last call and I listened to the discussions.
    I work with about 150 variables from the NAACCR dataset in SAS. It is
    much easier to type the NAACCR Item number than some random short name. The
    lookup is much faster. I am not asking for another name. The NAACCR numbers
    are in place.
    On a related topic, I am trying to work with Windows PowerShell to manipulate
    HL7 ePath records. PowerShell provides a very useful way to do that with only
    a few commands. In my work, I discovered that PowerShell can also manipulate
    XML files. That should be very helpful.

    in reply to: Using SAS with NAACCR XML #9752
    Bruce Riddle
    Spectator

    I want to make the case in writing that a different approach is needed to get data out of a central registry database into analytical tools like SAS, GenEdits, InterRecordEdits, SEER*PATH, Match*PRO, etc.
    The NAACCR Volume 2 has the title ‘Data Standards and Data Dictionary.’ The next piece is called the ‘XML Data Exchange Standard.’ The primary goal of the data exchange standard is to ensure seamless transmission between registries, be it a hospital registry or a central registry. Nowhere is it written that the XML Data Exchange Record has to be readable by any of the analytical tools, and most of our analytical tools either cannot read XML documents or do not read them very well.

    Plan B: A secondary standard is needed that allows for a pipe-delimited formatted ASCII file to be exported from a central registry database to be input into an analytical tool. The two models for this are SEER*STAT and MATCH*PRO.

    The primary assumption I am making is that right now a pipe (‘|’) is not contained in any names, addresses, or coding schemes collected by a registry or imported into a registry software system. If that assumption is violated, then we need to find another delimiter.
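    The no-pipes assumption can be checked before each export instead of trusted. A minimal Python sketch, assuming rows are dicts of strings; the fallback delimiters and function names are an arbitrary illustrative choice.

```python
# Candidate delimiters in order of preference (hypothetical list).
CANDIDATES = ("|", "^", "~", "\t")


def fields_containing(rows, delim):
    """Return (row_index, column) pairs whose value contains delim."""
    return [(i, col) for i, row in enumerate(rows)
            for col, value in row.items() if delim in str(value)]


def pick_delimiter(rows, candidates=CANDIDATES):
    """Return the first candidate found in no value and no column name."""
    for delim in candidates:
        in_names = any(delim in str(col) for row in rows for col in row)
        if not in_names and not fields_containing(rows, delim):
            return delim
    raise ValueError("every candidate delimiter occurs in the data")
```

    Reporting the offending (row, column) pairs, rather than silently switching delimiters, makes it easy to send the bad names or addresses back for correction.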

    I would like to see analytical file formats developed that consist of the selected data items needed for normal work. For instance, prior to calls for data, a list of data items and their order would be developed so that the file could be brought into GenEdits and InterRecordEdits. That file format would be installed in the registry vendor software to output the necessary subset of cases.

    In New Hampshire, because we are so small, we would seek all cases 1995-2017 that meet the required criteria. The output file would contain approximately 136 reportable data items along with a few confidential data items to facilitate editing of cases. The resulting file would be smaller than an XML file, faster to output and faster to read in the analytical software.

    I would strongly prefer that the header use NAACCR item numbers as the variable names (N18_20, N18_390, N18_400, etc.) to make manipulation easier.
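    Renaming XML naaccrId keys to item-number names is a simple lookup. This Python sketch hard-codes three item numbers (20 Patient ID Number, 390 Date of Diagnosis, 400 Primary Site) as an illustrative subset; in practice the table would come from the NAACCR data dictionary for the version in use. The function name is hypothetical.

```python
# Illustrative subset of the NAACCR data dictionary: naaccrId -> item number.
ITEM_NUMBERS = {
    "patientIdNumber": 20,
    "dateOfDiagnosis": 390,
    "primarySite": 400,
}


def to_item_number_names(row, prefix="N18_", numbers=ITEM_NUMBERS):
    """Rename naaccrId keys to prefix + item number; unknown keys are kept."""
    renamed = {}
    for naaccr_id, value in row.items():
        number = numbers.get(naaccr_id)
        new_name = f"{prefix}{number}" if number is not None else naaccr_id
        if new_name in renamed:
            raise ValueError(f"duplicate column name {new_name!r}")
        renamed[new_name] = value
    return renamed
```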

    Rarely does a central registry need to output the entire NAACCR record. It would be necessary for inter-state data exchange, for archive purposes, and for transmission to some authorities.

    Other pipe-delimited file formats could be used for submission to the NAACCR Geocoder, Match*PRO, etc. SAS PROC IMPORT can easily read a pipe-delimited file with a header.
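    Writing such a file from Python takes a few lines with the standard csv module. This sketch assumes rows are dicts already keyed by item-number names, and the function name is hypothetical; setting quoting=csv.QUOTE_NONE makes the writer raise csv.Error if a value contains a pipe, so a broken assumption stops the export instead of producing a misaligned file.

```python
import csv
import io


def write_pipe_file(rows, columns, out):
    """Write rows (dicts) to the text stream out as pipe-delimited data.

    The first line is a header of column names. Keys not in columns are
    ignored; missing keys become empty fields. With QUOTE_NONE and no
    escapechar, csv raises csv.Error if any value contains '|'.
    """
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="|",
                            quoting=csv.QUOTE_NONE, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Usage: an in-memory buffer (a file opened with newline="" also works).
buf = io.StringIO()
write_pipe_file([{"N18_20": "00000001", "N18_400": "C619", "srcFile": "a.xml"}],
                ["N18_20", "N18_390", "N18_400"], buf)
print(buf.getvalue(), end="")
```

    On the SAS side, a file like this could be read with PROC IMPORT using DBMS=DLM and DELIMITER='|'.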

    The use of analytical pipe-delimited files does not diminish the value of the XML Standard for Data Exchange. At a certain size, a delimited file becomes unwieldy and cumbersome.

    in reply to: Discussion: The case for preserving naaccrNum #9751
    Bruce Riddle
    Spectator

    I agree that the NAACCR number is very important to operations. Almost all my
    code uses the NAACCR number to refer to variables. The number gives me an
    exact name. I cannot imagine writing code day to day with the longer names
    in the XML specification. It is just a great deal of typing and many chances
    to make errors.

    Bruce

    in reply to: question about patient/turmor model #7334
    Bruce Riddle
    Spectator

    A very insightful comment. The conversion will not be simple.

    in reply to: question about patient/turmor model #7331
    Bruce Riddle
    Spectator

    After more research, I figured out Part 1 of my question. Part 2 is harder. As with the SAS conversion issue,
    the challenge remains how to create an accurate analytic record that contains the correct patient
    and correct tumor information.
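    One way to build that analytic record from NAACCR XML is to emit one row per Tumor and repeat the enclosing Patient items (and any NaaccrData-level items such as registryId) on each row. A Python sketch using the standard library parser; the function names are hypothetical, and a Patient with no Tumor produces no row here, which may or may not suit a given analysis.

```python
import xml.etree.ElementTree as ET

NS = {"n": "http://naaccr.org/naaccrxml"}  # NAACCR XML namespace


def items(elem):
    """Direct Item children of elem as {naaccrId: text}."""
    return {i.get("naaccrId"): (i.text or "") for i in elem.findall("n:Item", NS)}


def analytic_rows(xml_text):
    """Flatten NAACCR XML text into one dict per Tumor."""
    root = ET.fromstring(xml_text)
    record_items = items(root)  # NaaccrData-level items, e.g. registryId
    rows = []
    for patient in root.findall("n:Patient", NS):
        patient_items = items(patient)
        for tumor in patient.findall("n:Tumor", NS):
            # Repeat record and patient items on every tumor row.
            rows.append({**record_items, **patient_items, **items(tumor)})
    return rows
```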

    in reply to: Editor Tool for XML files #7011
    Bruce Riddle
    Spectator

    We use RMCDS as the registry database. NH Rules and Regs require reporters to send us a rapid report within
    45 days of diagnosis. Almost all reporter transmissions contain a mix of rapid and definitive or complete reports. I use SAS to separate out rapids from definitives. In that step, I can also correct for missing or incomplete data.

    RMCDS only lets us load NAACCR records. I use SAS to take reports from non-hospital reporters –pathology cases, death clearance only records, clinic records–to create a NAACCR record to load into RMCDS.

    Bruce

    in reply to: Using SAS with NAACCR XML #6952
    Bruce Riddle
    Spectator

    My experiments with SAS and XML have not been very successful. The loss of SAS eliminates a very powerful tool, both for basic file processing prior to loading data into the registry database and for working with the data on export from the registry database. I have little hope that SAS will invest in a more advanced XML tool.
    Here is one idea for a solution to at least create analytical files. SAS Proc Import will read delimited files with a header. This provides an option for two applications. One application is to be able to export from the main database selected variables in a pipe delimited format with a header. To make this more user friendly, the application needs a configuration page where you can just check the variables you need and be able to keep that list as a file for future use. Some users will only need to set the configuration once. Then SAS Proc Import can read in the delimited file and create the SAS data set.
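    The saved variable list could be as simple as a text file with one variable name per line. The Python sketch below assumes that format, with '#' comments allowed; the file layout and the function names are hypothetical.

```python
from pathlib import Path


def save_selection(names, config_path):
    """Save the chosen variables, one per line, in output order."""
    Path(config_path).write_text("\n".join(names) + "\n")


def load_selection(config_path):
    """Read a saved selection; blank lines and text after '#' are ignored."""
    names = []
    for line in Path(config_path).read_text().splitlines():
        name = line.split("#", 1)[0].strip()
        if name and name not in names:  # keep first occurrence only
            names.append(name)
    return names


def select_columns(rows, names):
    """Keep only the selected variables, in saved order; missing ones become ''."""
    return [{name: row.get(name, "") for name in names} for row in rows]
```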
    The second application would read an XML file and perform the same task as above.
    In both instances, there would be one line per patient/tumor. Very few exercises require the entire set of NAACCR variables, so these analytic data sets should be fairly small.
    The major advantage of this method is that you do not need any INPUT or FORMAT statement. The significant disadvantage is that PROC IMPORT selects the input format, so sometimes you get numeric when you want character, etc.
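    The type-guessing problem can be sidestepped by generating an explicit DATA step instead of calling PROC IMPORT. This Python sketch writes SAS code text from a hypothetical (name, width, is_character) layout, using the standard INFILE options DLM=, DSD, FIRSTOBS=, and TRUNCOVER with list input and informats, so codes such as Patient ID Number keep their leading zeros. The function name and layout format are assumptions.

```python
def sas_input_step(dataset, path, layout, delim="|"):
    """Return SAS DATA step text that reads a delimited file with a header row.

    layout is a hypothetical list of (name, width, is_character) tuples.
    Character columns get a $w. informat so values like '00000001' keep
    their leading zeros; numeric columns get a plain w. informat.
    """
    specs = " ".join(
        f"{name} :{'$' if is_char else ''}{width}." for name, width, is_char in layout
    )
    return (
        f"data {dataset};\n"
        f"  infile \"{path}\" dlm='{delim}' dsd firstobs=2 truncover;\n"
        f"  input {specs};\n"
        "run;\n"
    )


print(sas_input_step("stg1", "cases.txt",
                     [("N18_20", 8, True), ("N18_390", 8, True), ("N18_400", 4, True)]))
```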
    Another version of the above is to write out two separate files: one file of pipe-delimited data and a second file with the input format. The input format could easily be dragged into a SAS program. The configuration page could allow for selection of formats. For example, I read in all dates as character since NAACCR allows dates with blanks. In SAS, I can fill in the blanks before creating a SAS date that can be manipulated.
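    Filling in blank date parts before building a real date might look like this in Python. NAACCR dates are CCYYMMDD strings that may have trailing blanks for an unknown month or day; the fill-in values (month 7, day 15) and the function name are hypothetical defaults, not a NAACCR or SEER imputation rule.

```python
from datetime import date


def naaccr_date(value, month_if_blank=7, day_if_blank=15):
    """Convert a NAACCR CCYYMMDD string (trailing blanks allowed) to a date.

    A blank month or day is replaced by the hypothetical defaults; returns
    None when the year is blank or the result is not a valid calendar date.
    """
    text = (value or "").rstrip()
    year, month, day = text[:4], text[4:6], text[6:8]
    if len(year) != 4 or not year.isdigit():
        return None
    m = int(month) if len(month) == 2 and month.isdigit() else month_if_blank
    d = int(day) if len(day) == 2 and day.isdigit() else day_if_blank
    try:
        return date(int(year), m, d)
    except ValueError:
        return None  # e.g. 20170231
```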
    The XML file for a standard time period, 1995 to 2018, will be very large. Few registries will have the storage capacity to keep a reasonable number of these files around, so the ability to easily create analytic files is very important. Finding a very convenient way to unzip, run a tool or GenEdits, and re-zip will also be important.
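    For steps that can run in a script, the unzip/re-zip cycle can be done without writing the uncompressed XML to disk by reading it straight from the archive and writing a new archive. A Python sketch using the standard zipfile module; transform is a hypothetical caller-supplied function on the raw bytes, and tools such as GenEdits that need a file on disk are outside its scope. Each member is held in memory, so very large files would need a streaming approach instead.

```python
import zipfile


def rezip_with(src_zip, dst_zip, transform):
    """Copy every member of src_zip into dst_zip, passing *.xml members through transform.

    transform takes and returns bytes (hypothetical hook). Members are read
    into memory, so the uncompressed XML is never written to disk.
    """
    with zipfile.ZipFile(src_zip) as zin, \
         zipfile.ZipFile(dst_zip, "w", compression=zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename.lower().endswith(".xml"):
                data = transform(data)
            zout.writestr(info.filename, data)
```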

    in reply to: Editor Tool for XML files #6950
    Bruce Riddle
    Spectator

    As I said in the beginning, we need “a number of powerful and robust tools.” Many smaller registries do not
    have an IT staff, so they have not really thought about XML and the impact it will have on registry operations. Without many such tools, the move to XML will be very difficult.

    in reply to: Using SAS with NAACCR XML #6720
    Bruce Riddle
    Spectator

    I tried out the sample code Fabian posted on XML files I created using various tools and our data for one year. The good news is that I got identical results from the XML files exported by the different tools, although the files differed in size: for 8,000 cases, one file was 188,725 KB and another was 148,833 KB. The bad news is that it is very slow. SAS is provided under license to NPCR registries, and many take advantage of the opportunity. Few registries I know have any staff who know Java, Python, or C++. If I knew C++ or Java well enough to write code to manipulate XML, I would get a much better paying job.
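    On the speed and size point, a streaming parser that handles one Patient at a time avoids building the whole document in memory. This Python sketch counts tumors by Primary Site with xml.etree.ElementTree.iterparse and clears each Patient after its end tag; it illustrates the technique only, and the function name is hypothetical. The cleared (empty) Patient elements stay attached to the root, so memory use is reduced rather than strictly constant.

```python
import xml.etree.ElementTree as ET

NS = {"n": "http://naaccr.org/naaccrxml"}  # NAACCR XML namespace
TUMOR = "{http://naaccr.org/naaccrxml}Tumor"
PATIENT = "{http://naaccr.org/naaccrxml}Patient"


def count_tumors_by_site(source):
    """Count tumors per primarySite value while streaming source.

    source is a filename or binary file object. Each Patient is cleared
    after its end tag so its items and tumors are released.
    """
    counts = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == TUMOR:
            site = elem.find("n:Item[@naaccrId='primarySite']", NS)
            key = site.text if site is not None and site.text else "unknown"
            counts[key] = counts.get(key, 0) + 1
        elif elem.tag == PATIENT:
            elem.clear()
    return counts
```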

    One suggestion here was to use the XML tools built into MS SQL. We will explore that idea.

    B

Copyright © 2018 NAACCR, Inc. All Rights Reserved