Fabian Depry

Forum Replies Created

Viewing 9 posts - 31 through 39 (of 39 total)
  • in reply to: Using SAS with NAACCR XML #7195
    Fabian Depry
    Moderator

    I put together one solution for reading (using the SAS XML Mapper) and one for writing (using a tagset template). Both seem to work fine for small data files, but they don’t scale well and are not really usable for big files.

    I am currently working on a solution that involves calling a Java Archive (JAR) from SAS; the Java code creates a temporary CSV file based on the XML, and SAS can then easily read that CSV. The logic for calling Java is embedded in a SAS macro that can easily be distributed. This solution is still slower than dealing with flat files, but it’s much more reasonable for big files than the XML Mapper and/or tagsets.
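
    To illustrate the general idea of flattening the XML into a CSV, here is a minimal sketch. It is not the code from the JAR (which is Java); it is a Python illustration that assumes the usual NaaccrData / Patient / Tumor / Item layout with a naaccrId attribute on each Item. The function name, the column list and the sample data are all made up for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# NAACCR XML namespace (assumed to be declared on the root element).
NS = "{http://naaccr.org/naaccrxml}"


def naaccr_xml_to_csv(xml_source, csv_target, columns):
    """Stream a NAACCR XML source and write one CSV row per Tumor.

    Hypothetical helper: root-level and patient-level items are repeated
    on every tumor row, like in a flat file. Patients without any Tumor
    produce no row in this sketch.
    """
    writer = csv.DictWriter(csv_target, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    root_items, patient_items, tumor_items = {}, {}, {}
    stack = []  # names of the elements that are currently open
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        tag = elem.tag.replace(NS, "")
        if event == "start":
            stack.append(tag)
            if tag == "Patient":
                patient_items = {}
            elif tag == "Tumor":
                tumor_items = {}
            continue
        stack.pop()
        parent = stack[-1] if stack else None
        if tag == "Item":
            # Store the value at the level of the enclosing element.
            levels = {"NaaccrData": root_items, "Patient": patient_items, "Tumor": tumor_items}
            target = levels.get(parent)
            if target is not None:
                target[elem.get("naaccrId")] = (elem.text or "").strip()
        elif tag == "Tumor":
            writer.writerow({**root_items, **patient_items, **tumor_items})
        elif tag == "Patient":
            elem.clear()  # drop the patient's children to limit memory use


# Tiny in-memory demo with made-up values.
SAMPLE = b"""<NaaccrData xmlns="http://naaccr.org/naaccrxml" recordType="I">
  <Item naaccrId="registryId">0000000001</Item>
  <Patient>
    <Item naaccrId="patientIdNumber">00000001</Item>
    <Tumor><Item naaccrId="primarySite">C509</Item></Tumor>
    <Tumor><Item naaccrId="primarySite">C341</Item></Tumor>
  </Patient>
</NaaccrData>"""

out = io.StringIO()
naaccr_xml_to_csv(io.BytesIO(SAMPLE), out, ["registryId", "patientIdNumber", "primarySite"])
rows = list(csv.DictReader(io.StringIO(out.getvalue())))
print(rows)
```

    The XML is read as a stream, so the whole file never has to fit in memory; that is the main reason this kind of approach copes with big files better than the XML Mapper.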

    I posted all my code and experiments in the Java NAACCR XML GitHub project: https://github.com/imsweb/naaccr-xml/wiki (there is a NAACCR XML and SAS section at the bottom).

    Please feel free to download those examples and try them yourself and provide feedback in this forum!

    in reply to: Using SAS with NAACCR XML #7026
    Fabian Depry
    Moderator

    I see. But this is a global setting you change on your local machine. The SAS instance I use is a company-wide instance running on a remote Linux server.

    I guess I could ask our IT to change the SAS JRE globally for the company, but I am not sure they will accept that…

    I still think it’s an interesting solution, but I was hoping to be able to set the JRE when calling SAS (or that the default JRE would support Java 8, which has been out for 5 or 6 years now). That’s a bit disappointing.

    Thanks for the info though.

    in reply to: Using SAS with NAACCR XML #7024
    Fabian Depry
    Moderator

    Hi Isaac,

    I wanted to try your Java solution, but I ran into an issue: SAS uses a private JRE that they maintain themselves, and they are way behind: their latest version (SAS 9.4) requires Java 7 (which has been end-of-life for 3 years!). The NAACCR XML Java library is compiled for Java 8, and so it’s not compatible with SAS 9.4.
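
    For anyone who wants to check this on their side: the target Java version is recorded in the header of every compiled class (major version 51 for Java 7, 52 for Java 8), and a Java 7 runtime refuses anything above 51. Here is a minimal sketch of reading that header, in Python; the function name is made up, and the demo uses a hand-written 8-byte header rather than a real class from the library.

```python
import struct

# Class-file major versions for the Java releases relevant here.
JAVA_RELEASE_BY_MAJOR = {50: "Java 6", 51: "Java 7", 52: "Java 8"}


def class_file_major_version(data: bytes) -> int:
    """Return the major version stored in the first 8 bytes of a .class file."""
    # Header layout: 4-byte magic, 2-byte minor version, 2-byte major version (big-endian).
    magic, _minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major


# Hand-written header of a class compiled for Java 8 (0x34 == 52).
header = bytes.fromhex("cafebabe00000034")
major = class_file_major_version(header)
print(major, JAVA_RELEASE_BY_MAJOR.get(major, "unknown"))
```

    To apply it to a real library, the bytes of any .class entry extracted from the JAR can be passed to the function.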

    I got that information from this link:
    https://support.sas.com/en/documentation/third-party-software-reference/9-4/support-for-java.html

    How did you make your example run with the Java 8 NAACCR XML library?

    in reply to: Using SAS with NAACCR XML #6986
    Fabian Depry
    Moderator

    I think this is a good idea.

    In the end, this is similar to what the NAACCR XML Utility tool does, except it translates XML into NAACCR fixed-column instead of CSV.

    Did you use specialized code to read the XML, or did you use the existing Java library to read the data “patient by patient”?

    in reply to: Editor Tool for XML files #6947
    Fabian Depry
    Moderator

    Hello Bruce,

    Those are very specific requirements, and you might not be able to resolve all of them with a single tool.

    I know the SEER Data Viewer (https://seer.cancer.gov/tools/dataviewer/) can be used to filter data, recode variables and re-create data files. The current version only supports the fixed-column format, but it will support XML in the near future and you would then be able to apply the same processing to XML data. I am not sure that tool will be able to handle all those requirements, but it might be worth investigating it now to see how well it fits your processes.

    It is possible that somebody will come up with a way to read and write large XML data files with SAS, but so far those attempts have not been very successful and so looking into other available tools might be a good idea at this point.

    in reply to: Using SAS with NAACCR XML #6646
    Fabian Depry
    Moderator

    I think we are done looking at SAS for now.

    For reading XML with SAS, it looks like an acceptable solution will involve an XMLMap file that tells SAS how to construct the data sets based on the different levels of data in the XML (one data set for NaaccrData, one for Patient and one for Tumor, although all of them can be defined in a single XMLMap file). The XMLMap will define SAS variables based on their NAACCR ID attribute (using XPath); it will also define a few “ORDINAL” variables, which will be used as identifiers for every row of the data sets (they are called ORDINAL because they are counters incremented when a specific tag is found in the data file). A SAS program will then be able to “merge” the different data sets back together using the ORDINAL variables as a pivot (or linkage variable); the end result will be a single data set where the NaaccrData data is repeated for every Patient and Tumor, and the Patient data is repeated for every Tumor (which is the same behavior as reading flat files).

    There is one caveat to this solution: SAS will read and process every variable defined in the XMLMap, so using a mapping file that defines all variables won’t be practical for large data files (the processing will be too slow). Instead, a smaller mapping file should be used with just the variables that are needed for the program. Hopefully it will be possible to create those specialized XMLMap files using open-source software.

    I am attaching an example of a mapping file including only a few variables:
    – naaccr-xml-v16-data-sample.xml: a very simple NAACCR XML sample file
    – naaccr-xml-v16-sas-def-minimal.map: an XMLMap file containing the definition of one variable at each XML level (plus the ordinal variables)
    – readin.level2.sas: a simple SAS program that merges Patient and Tumor data from the sample file and prints frequencies of the defined variables
    – readin.level2.output: the results of running the SAS program (I only copied the relevant frequencies)
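
    For readers who don’t use SAS, the merge on the ordinal variables can be illustrated in a few lines of Python. This is only a sketch of the logic, not a translation of readin.level2.sas; the ordinal column names and the sample values are made up and are not the ones from the actual XMLMap.

```python
def merge_levels(naaccr_data, patients, tumors):
    """Return one flat row per tumor, repeating root and patient values.

    Each patient row carries the ordinal of its NaaccrData parent, and each
    tumor row carries the ordinal of its Patient parent (hypothetical names).
    """
    root_by_ordinal = {r["NaaccrDataOrdinal"]: r for r in naaccr_data}
    patient_by_ordinal = {p["PatientOrdinal"]: p for p in patients}
    merged = []
    for tumor in tumors:
        patient = patient_by_ordinal[tumor["PatientOrdinal"]]
        root = root_by_ordinal[patient["NaaccrDataOrdinal"]]
        merged.append({**root, **patient, **tumor})
    return merged


# Made-up data sets, as they could come out of the three XML levels.
naaccr_data = [{"NaaccrDataOrdinal": 1, "registryId": "0000000001"}]
patients = [
    {"NaaccrDataOrdinal": 1, "PatientOrdinal": 1, "patientIdNumber": "00000001"},
    {"NaaccrDataOrdinal": 1, "PatientOrdinal": 2, "patientIdNumber": "00000002"},
]
tumors = [
    {"PatientOrdinal": 1, "TumorOrdinal": 1, "primarySite": "C509"},
    {"PatientOrdinal": 1, "TumorOrdinal": 2, "primarySite": "C341"},
    {"PatientOrdinal": 2, "TumorOrdinal": 3, "primarySite": "C619"},
]

flat = merge_levels(naaccr_data, patients, tumors)
for row in flat:
    print(row)
```

    The result has one row per tumor, with the registry and patient values repeated, which is the same shape as a flat file.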

    For writing XML with SAS, the conclusion would be “don’t do it”. We found no satisfactory way of using an XMLMap to re-create a valid NAACCR XML file. There are other solutions that don’t use an XMLMap, but they are very involved and require a type of coding that most people wouldn’t be willing to do. There are other tools and software that can recode variables and that will probably be updated to support NAACCR XML; the best approach for recoding XML files would be to switch to those tools.

    *** Update: it looks like I can’t upload the files in this post; all the files have been uploaded to the Java NAACCR XML project on GitHub:
    https://github.com/imsweb/naaccr-xml/tree/master/docs/sas

    in reply to: Using SAS with NAACCR XML #6544
    Fabian Depry
    Moderator

    Isaac,

    I will let Linda comment on your proposed solution.

    But just so you know, we (at IMS) have a few SAS experts and we are currently investigating the SAS XML Mapper (it seems to be the standard solution for that type of problem).

    Our first proof of concept went very well, but we are not ready to post any results yet. Once we are, we will report them to the NAACCR XML work group and update this forum.

    in reply to: NAACCR Fixed-Width data exchange format going away in 2020 #6446
    Fabian Depry
    Moderator

    That all makes sense, thanks for the reply!

    in reply to: NAACCR Fixed-Width data exchange format going away in 2020 #6444
    Fabian Depry
    Moderator

    Will the XML standard stay “compatible” with the fixed-column format until that date? Or are there plans for adding features to the XML that would break that compatibility before the fixed-column retirement date?

Copyright © 2018 NAACCR, Inc. All Rights Reserved