Kathleen Beaumont

Forum Replies Created

Viewing 13 posts - 16 through 28 (of 28 total)
    in reply to: Need to build an interface for XML to SQL Datadata #7458

    Hi Jeff,

    I’m glad you were able to use the CDC’s XMLPlus.dll to parse a sample data file.

    Regarding the second bullet in your “Next Queued tasks” list: An easy way to create a NAACCR XML data file is to use the GUI app in the CDC download, XML Exchange Plus. Open the Convert form to select a flat-format data file and an EDITS50-compatible NAACCR metafile (or, if your registry has a custom metafile, use that). Press F1 to bring up the Help pages for this process, and generate the data file(s) you need for your continuing development.

    The point I’m trying to make is that you can use a flat data file of Record Type ‘160’ and generate a NAACCR XML data file that meets today’s requirements. Be sure to read the help for drop-down control “Record Type”, which allows you to specify the Record Type of the resulting data file.

    Kathleen

    in reply to: Need to build an interface for XML to SQL Datadata #7287

    Jeff,

    Going back to basics: You are presently bulk-importing from flat-format files, and said you hoped to minimize re-coding existing systems, so let’s loop back to that concept.

    Keep in mind that a big motivation for using XML is to allow transmitting complete text data (e.g., physician’s notes) instead of truncating it to a couple of thousand characters, so you don’t want to use the current NAACCR undelimited format. (These data items are identifiable in the NAACCR XML dictionary with allowUnlimitedText="true".) Also note that the plan is to eliminate the “starting position” attribute of NAACCR data items in the not-too-distant future, so positioning in a flat file will be something you’ll have to maintain with your own “record layout”. Boy, this is getting ugly really fast, isn’t it?

    When importing from flat ASCII, do delimit the fields so that you can know where the text fields begin and end. But I recommend not delimiting with the pipe character (‘|’). In fact, don’t use any keyboard character because somebody is bound to embed it in a text field. I like the guillemet for this purpose (‘»’, typed from the keyboard with Alt+0187) because it is really unlikely to be typed during data entry. And you’ll need to flatten the CRLFs in text data; I think SQL Server understands “~” to be a linefeed, but you’d have to look into what Oracle uses.
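
A minimal Python sketch of the delimiting idea above. The function names and the sample values are made up for illustration, and the "~" line-break marker simply follows the guess in the post; check what your database's bulk loader actually expects before relying on it.

```python
# Sketch only: join fields with a guillemet and flatten embedded line breaks.
DELIM = "\u00bb"     # the guillemet character, as suggested in the post
NEWLINE_MARK = "~"   # assumption: stand-in for a line break; verify for your loader


def flatten(value):
    """Replace CRLF, LF and CR inside a text field with the marker."""
    return (
        value.replace("\r\n", NEWLINE_MARK)
        .replace("\n", NEWLINE_MARK)
        .replace("\r", NEWLINE_MARK)
    )


def to_delimited(fields):
    """Build one delimited record, refusing fields that contain the delimiter."""
    cleaned = []
    for value in fields:
        if DELIM in value:
            raise ValueError("field already contains the delimiter: %r" % value)
        cleaned.append(flatten(value))
    return DELIM.join(cleaned)


record = to_delimited(["00000001", "C509", "Line one\r\nLine two"])
```

When writing records like this to disk, pick the file encoding deliberately (for example UTF-8), because the guillemet is not an ASCII character. Note also that a text field that already contains "~" would become ambiguous after flattening.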

    You’ll have to write the converter yourself; the free IMS and NPCR tools only convert between XML and the traditional undelimited NAACCR flat record.

    BTW, when you import data now are you loading into temporary tables, or are you loading straight into your production tables? And how long does it take you to load 100 thousand data files of Incident records each day? Again, I’m asking simply because I am finding this whole topic really interesting… so if you get bored with amusing me, feel free to ignore me!

    Kathleen

    in reply to: Need to build an interface for XML to SQL Datadata #7285

    Hi Jeff,

    And no butting out, you’re in too deep now …

    You may change your mind after this… 😉

    I admit that I prefer, to a fault, to write my own utilities, especially when existing tools annoy me. So if I were doing this, I would

    • read in a Patient record from the XML data file
    • iterate through a list of all possible Patient-level data items *
    • dynamically build an INSERT statement for just the items provided in the XML record
    • and execute it

    * You know what items matter to you, so if the XML contains additional/unexpected data items (i.e., ItemDefs) you are not interested in them anyway.

    Then do the same for all of the Tumor data associated with this Patient. I don’t think it would take long to write a quick-and-dirty test app to see how the performance compares to using the XML importer built into Oracle. If performance is acceptable, you could make the tool easily configurable to modify the “list of items we care about”, so that future changes to the NAACCR dictionary can be accommodated.
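
The steps above could be prototyped roughly as follows. This is a quick-and-dirty sketch under several assumptions: SQLite stands in for Oracle or SQL Server, the `patient` table and its column names are hypothetical, the sample XML is trimmed (a real NAACCR XML file carries more root attributes and items), and `PATIENT_ITEMS` is an invented "list of items we care about".

```python
import sqlite3
import xml.etree.ElementTree as ET

NS = {"n": "http://naaccr.org/naaccrxml"}

# Hypothetical configuration: NAACCR XML item id -> column in our table.
PATIENT_ITEMS = {
    "patientIdNumber": "patient_id_number",
    "nameLast": "name_last",
    "dateOfBirth": "date_of_birth",
}

SAMPLE = """<NaaccrData xmlns="http://naaccr.org/naaccrxml">
  <Patient>
    <Item naaccrId="patientIdNumber">00000001</Item>
    <Item naaccrId="nameLast">DOE</Item>
    <Item naaccrId="someUnexpectedItem">ignored</Item>
    <Tumor>
      <Item naaccrId="primarySite">C509</Item>
    </Tumor>
  </Patient>
</NaaccrData>"""


def build_insert(table, wanted, element):
    """Build an INSERT for only the wanted items present directly under element."""
    present = {}
    for item in element.findall("n:Item", NS):  # direct children only
        column = wanted.get(item.get("naaccrId"))
        if column is not None:
            present[column] = item.text or ""
    if not present:
        return None, []
    columns = ", ".join(present)
    marks = ", ".join("?" for _ in present)
    # Table and column names come from our own configuration;
    # the data values are always passed as bound parameters.
    return f"INSERT INTO {table} ({columns}) VALUES ({marks})", list(present.values())


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient (patient_id_number TEXT, name_last TEXT, date_of_birth TEXT)"
)

root = ET.fromstring(SAMPLE)
for patient in root.findall("n:Patient", NS):
    sql, params = build_insert("patient", PATIENT_ITEMS, patient)
    if sql is not None:
        conn.execute(sql, params)

rows = conn.execute(
    "SELECT patient_id_number, name_last, date_of_birth FROM patient"
).fetchall()
```

Items missing from the XML are simply left out of the statement, and items not in the configured list are skipped. The Tumor elements could be handled the same way with a second mapping and a second table.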

    Now remember, if you respond to me you’re just encouraging me to keep making suggestions!

    Good luck, and I hope you’ll continue to post about what you’re planning. The subject is very interesting to me.

    Kathleen

    in reply to: Need to build an interface for XML to SQL Datadata #7282

    converting XML to flat files for “bulk loading”

    This is just a thought, but what if you instead created “workspace” tables in your database that followed the structure of the XML file, i.e., a table for Patient data and another for Tumor data. Each would have primary key columns, and the Tumor tuples would have foreign key relationships to the Patient tuple.

    This would make it easier to perform case-specific and batch-update “fixes” to data on the way into the production tables. (This may not be something you do at your registry, but over the years I’ve heard lots of programmers say they need to fix misspellings and other such tweaks.) It would also be easy to write a “reviewer” app to look up cases, filter an incoming data file on any criteria, generate reports, whatever your analysts are clamoring for.

    Then you could run a handful of SQL insert/update queries to pull the data from the workspace tables into the production tables, and zap the workspace tables for next time.
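
A rough sketch of that workspace-table flow, again using SQLite from Python purely as a stand-in for your real database. All table and column names here (`ws_patient`, `ws_tumor`, `prod_case`) are hypothetical, and the production table is flattened to one row per tumor only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE ws_patient (
    patient_key INTEGER PRIMARY KEY,
    patient_id_number TEXT,
    name_last TEXT
);
CREATE TABLE ws_tumor (
    tumor_key INTEGER PRIMARY KEY,
    patient_key INTEGER NOT NULL REFERENCES ws_patient(patient_key),
    primary_site TEXT
);
CREATE TABLE prod_case (
    patient_id_number TEXT,
    name_last TEXT,
    primary_site TEXT
);
""")

# 1. Load the incoming file into the workspace tables (made-up sample rows).
conn.execute("INSERT INTO ws_patient VALUES (1, '00000001', 'DOE ')")
conn.executemany(
    "INSERT INTO ws_tumor VALUES (?, ?, ?)",
    [(1, 1, "C509"), (2, 1, "C341")],
)

# 2. Apply a batch "fix" on the way in, here trimming stray whitespace.
conn.execute("UPDATE ws_patient SET name_last = TRIM(name_last)")

# 3. Pull the cleaned data into the production table.
conn.execute("""
INSERT INTO prod_case (patient_id_number, name_last, primary_site)
SELECT p.patient_id_number, p.name_last, t.primary_site
FROM ws_patient AS p
JOIN ws_tumor AS t ON t.patient_key = p.patient_key
""")

# 4. Zap the workspace tables for next time (child rows first because of the FK).
conn.execute("DELETE FROM ws_tumor")
conn.execute("DELETE FROM ws_patient")
conn.commit()

prod_rows = conn.execute(
    "SELECT patient_id_number, name_last, primary_site "
    "FROM prod_case ORDER BY primary_site"
).fetchall()
workspace_left = conn.execute("SELECT COUNT(*) FROM ws_patient").fetchone()[0]
```

Because the workspace tables are ordinary tables, a "reviewer" app or ad hoc report could query them between steps 1 and 3 with plain SQL.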

    I just sort of hate to see you convert XML to flat ASCII files.

    I’ll butt out now.

    Kathleen

    in reply to: Need to build an interface for XML to SQL Datadata #7271

    Hi Jeff,

    NPCR produced XML Exchange Plus as a working example of an implementation of the NAACCR XML v1.3 specification. Whether you choose to use the XMLPlus.dll or simply use its API as a launch-point for your own implementation, you might find the information in the XML Exchange Plus help file a good place to start.

    The installer (includes XMLPlus.dll, a Windows application demonstrating the library’s features, and a comprehensive help file) can be found at the NPCR web site:
    https://www.cdc.gov/cancer/npcr/tools/registryplus/xml-exchange-plus.htm

    Kathleen Beaumont
    retired programmer for EDITS50 and XML Exchange Plus

    in reply to: Regex in filters #5259

    There are other ways you could isolate these fields with the existing filter, you know. Since you have stated you are working with custom fields, you could use date last changed, or place a keyword in the Field comments and filter on that.

    There are many easy ways you can achieve what you are trying to do without adding regular expressions (a difficult concept, even for programmers) to the EditWriter filters.

    Kathleen

    in reply to: Regex in filters #5094

    Hi Steve,

    I’m a little confused. If you want to find fields in the list that start with a particular prefix, just sort the Fields list in the Navigator by Field Name, and then use incremental search to position the grid on the first Field that matches. (Refer to the Navigator chapter in the help file, topic Incremental Searching for an explanation of how EditWriter implements this feature.)

    The Filter Fields dialog allows you to filter for a search term appearing anywhere within the Field Name (or in the Comments area of the Field object), including at the beginning of the Field Name. What didn’t work for you?

    Kathleen

    in reply to: Recent Files list #5093

    Hi Steve,

    EditWriter uses the Recently Used List that is built into the Open File dialog on Windows. When you click the Open button on the tool bar (or select from File | Open SMF), that dialog is presented with the cursor blinking in the File Name combo box. Click the down arrow on that control to see the full list of all of the metafiles you have ever opened, presented with the most recent ones at the top.

    Thanks for using this forum to ask this question.

    Kathleen

    in reply to: Confused by GEPv5 vs GEP NCDB 27 #4616

    Joanne,

    Have you “subscribed” to these forums? Scroll up this page (assuming you are reading this within the forum). The “header” to your first message displays this:

    Author Posts Favorites | Subscribe

    If you have already subscribed, the link will say Unsubscribe.

    Please let me know if you were already subscribed, so that I can follow up with NAACCR tech support. Otherwise, by subscribing you are authorizing the forum software to send email.

    Thanks for using this forum!

    Kathleen

    in reply to: Confused by GEPv5 vs GEP NCDB 27 #4614

    Hi Joanne,

    Please refer to the section of the announcement labeled: Schedule for Adoption of EDITS50 Tools. 2017 is a transition year, during which EDITS40 and EDITS50 will both be supported. You should go ahead with your plans to use the NCDB-provided software for your current call for Data.

    Kathleen

    in reply to: TNM Path N, SSF3, 4, 5 Breast (CoC) #4489

    Jim,

    Could you ask the person who sent you this email to provide the values for all of the data items referenced by this edit? A screen shot from the writer’s software (if it displays all of the referenced fields and values) would do. Otherwise, copy/paste from a GenEDITS Plus report.

    This edit has 94 lines of code, and it will likely be faster to find the cause of the problem if we can run the edit with the writer’s specific values in the test bench debugger.

    Thanks,
    Kathleen

    in reply to: Specifications #4483

    Jim,

    This response is to test whether you receive notification that I have written to this forum.

    Kathleen

    in reply to: Check-boxes for edits #4480

    Hi Charle,

    That feature is already implemented in EditWriter v5.

    Kathleen

Copyright © 2018 NAACCR, Inc. All Rights Reserved