Regular Expressions Application in the Messaging Workbench

 

The Messaging Workbench supports message element value assessment using Perl Compatible Regular Expressions (PCRE). The particular PCRE engine incorporated in the MWB has been supplied under license by Ralf Junker / The Delphi Inspiration (© Copyright 2000-2003 Ralf Junker / The Delphi Inspiration, all rights reserved). The Delphi Inspiration Web Site is at http://www.zeitungsjunge.de/delphi/ .

 

PCRE is a set of library functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5. To learn more about these functions and the PCRE effort, visit http://www.pcre.org/ . For documentation of the expression language, visit http://www.perldoc.com/perl5.8.0/pod/perlre.html . Another helpful resource for understanding and implementing PCRE is the book Mastering Regular Expressions by Jeffrey E.F. Friedl, published by O’Reilly, ISBN 0-596-00289-0.

 

 

How PCREs are used in the MWB

 

PCREs (a.k.a. regex) can be used at various levels within the MWB, and for two different purposes. At the most basic level, a new field has been added to the profile that permits entry of a PCRE, which can be used to ensure that any example value entered for the element is valid. A more important aspect of PCRE usage in the MWB is their application in message instance validation against a message profile. PCREs and message instance validation are being introduced concurrently in release 6.2 of the MWB. The new message instance validation feature is discussed elsewhere. The sections below explain how to set up and use PCREs in the MWB.

 

Simple Application:

 

To get a sense of how PCREs can be used in the MWB, start a new profile and compile just the single MSH segment. Select MSH.1 – field separator in the Message Tree. If the segment library that the MSH was compiled against already has PCREs assigned to its elements, a regex is displayed in the Regex field (bottom right of the Message Definition tab) that looks something like this: ^[\x20-\x7e]{1,199}$. This expression represents an HL7 ST (string) data type, which is appropriate for MSH.1. If, on the other hand, your segment library has not been prepared with integrated regex, the field will be empty.

 

In either event, delete any existing entry from the Regex field and enter the following string instead: ^[|]$. Note that immediately after you enter the ‘[’ character, the field background turns red (or whatever color you have the highlight set to in the MWB options). As you enter the ‘]’ character, the field background turns white again. The color change illustrates the fact that the MWB evaluates the validity of the regex as you enter it. A background that stays red indicates that the regex entered is illegal and needs to be corrected before it can be applied.
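
The as-you-type check described above amounts to asking whether the text entered so far compiles as a pattern. The following is a minimal sketch of that idea, assuming Python's standard re module as a stand-in for the PCRE engine used by the MWB (the two agree on the simple patterns used here). The helper name is_legal_regex is hypothetical and is not an MWB function.

```python
import re


def is_legal_regex(text: str) -> bool:
    """Return True if the text compiles as a regular expression."""
    try:
        re.compile(text)
        return True
    except re.error:
        return False


# Simulate typing ^[|]$ one character at a time.
for typed in ["^", "^[", "^[|", "^[|]", "^[|]$"]:
    state = "ok" if is_legal_regex(typed) else "highlighted"
    print(f"{typed!r:10} {state}")
```

The two intermediate strings that contain an unclosed ‘[’ fail to compile, which corresponds to the red background; the field returns to normal once the ‘]’ is typed.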

 

The ^[|]$ expression that we just entered is a constraint on the generic HL7 ST data type regex, ^[\x20-\x7e]{1,199}$. This paper will not delve into an explanation of PCRE syntax, but suffice it to say that the generic expression indicates that a valid ST value starts at the beginning of a line, consists of between 1 and 199 characters in the range of space to ‘~’ (ASCII 32-126), and ends at the end of the line. The second expression constrains that statement, indicating that a valid MSH.1 value can consist only of a single ‘|’ character between the start and the end of the line.
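
To make the relationship between the two expressions concrete, the sketch below runs a few values through both. It assumes Python's re module in place of the MWB's PCRE engine; the sample values and the helper name accepts are made up for illustration.

```python
import re

ST_GENERIC = re.compile(r"^[\x20-\x7e]{1,199}$")  # generic HL7 ST pattern from the text
MSH_1 = re.compile(r"^[|]$")                      # constrained pattern for MSH.1


def accepts(pattern: re.Pattern, value: str) -> bool:
    """Return True if the value satisfies the pattern."""
    return pattern.search(value) is not None


# Columns: value (truncated), accepted as ST, accepted as MSH.1
for value in ["|", "-", "ADT^A01", "", "x" * 200]:
    print(repr(value[:10]), accepts(ST_GENERIC, value), accepts(MSH_1, value))
```

Only ‘|’ satisfies both patterns. The other printable values satisfy the generic ST pattern alone, while the empty string and the 200-character string satisfy neither because of the {1,199} length limit.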

 

It’s time now to test our regex. In the Example value field, enter a ‘-’ character. Notice that the Example value field background turns red. The highlight indicates that the example value is invalid. Erase the ‘-’ and replace it with ‘|’. Note that the highlight disappears, indicating that the example value is consistent with the regex that validates it. As a test, go to the Regex field and change the ‘|’ to a ‘-’. Note that the Example value field becomes highlighted again. These simple exercises illustrate that the Example value and Regex fields are “wired together”. This feature ensures that an MWB user can be confident of using legal regex and valid example values in a subject profile.
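
The three steps of this exercise can be replayed with a small function that recomputes both highlight states whenever either field changes. This is only a sketch of the behavior described, not the MWB's implementation: it assumes Python's re module in place of PCRE, the name field_highlights is hypothetical, and the case of an illegal regex is left undecided because the text does not say how the Example value field behaves then.

```python
import re
from typing import Optional, Tuple


def field_highlights(regex_text: str, example: str) -> Tuple[bool, Optional[bool]]:
    """Return (regex_highlighted, example_highlighted).

    example_highlighted is None when the regex itself is illegal, since the
    behavior of the Example value field in that case is not described.
    """
    try:
        pattern = re.compile(regex_text)
    except re.error:
        return True, None
    return False, pattern.search(example) is None


print(field_highlights("^[|]$", "-"))  # example does not satisfy the regex
print(field_highlights("^[|]$", "|"))  # example satisfies the regex
print(field_highlights("^[-]$", "|"))  # regex edited, same example re-checked
```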

 

Propagation of PCREs within the MWB:

 

The exercises above also allude to two other important features of the PCRE implementation within the MWB: first, that libraries can be imbued with regex attributes; second, that element regex may be constrained at different levels. As indicated above, segment libraries can have regex attributes assigned to constituent elements, which are automatically inherited by profile message elements that employ such a regex-prepared segment library. Regex assignment, by the way, is directly applicable only to elements with primitive data types.

 

In addition, data type libraries can have regex assigned to primitive elements. Segment libraries in turn can inherit the regex from their associated data type libraries. Also, within data type libraries, compound data types can further constrain (or even change) the regex of their associated primitive data types. For example, if we consider the TS time stamp data type in HL7 version 2.4, we see that it is composed of an NM component (Date/Time) and an ST component (degree of precision).

 

The standard specifies that the Date/Time component is to be implemented as a string that looks like this:

YYYY[MM[DD[HHMM[SS[.S[S[S[S]]]]]]]][+/-ZZZZ]

According to the standard however the NM data type is to be implemented as:

 [+/-]n[nnn---][.n[nnn---]]

which is obviously inconsistent with the date/time string (for example, it has no place for a trailing time zone offset). After initially defining regex for the data type library’s primitive elements, the two components of the TS inherit the default regex for NM and ST, ^[+-]?\d*\.?\d*$ and ^[\x20-\x7e]{1,199}$ respectively.
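
The mismatch can be seen by running a few date/time strings through the default NM pattern. The sketch below assumes Python's re module in place of the MWB's PCRE engine, and the sample values are invented for illustration.

```python
import re

NM_DEFAULT = re.compile(r"^[+-]?\d*\.?\d*$")  # default NM pattern quoted in the text

samples = [
    "200301151030",       # plain date/time: all digits, so NM accepts it
    "200301151030+0500",  # time zone offset: the sign is not at the start
    "20031345",           # month 13, day 45: not a date, but still a number
]
for value in samples:
    print(value, NM_DEFAULT.search(value) is not None)
```

The default pattern is wrong in both directions for this component: it rejects a legal time stamp that carries a time zone offset, and it accepts a digit string that is not a date at all.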

 

The MWB, however, permits constraining and/or changing the components’ regex to better suit the parent data type’s implementation. In this case, the first component’s regex is changed to:

 ^\d{4}(((0[1-9])|(1[0-2]))?(((0[1-9])|([12]\d)|(3[0-1]))?((([01]\d|2[0-3])([0-5]\d))?((([0-5]\d)?((\.\d{1,4})?))))))([+-](([0]\d|1[0-3])([0-5]\d)))?$

which matches the HL7-specified implementation. The second component is constrained from the generic string regex, as per the specified HL7 implementation, to: ^[YLDHMS]$ .

After these changes, any more complex data type that includes a TS component will inherit the TS-specific component regex.
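
As an illustration of constraining the two TS components, the sketch below builds a date/time pattern from named parts and checks a few values. Note the assumptions: it uses Python's re module rather than the MWB's PCRE engine, and the pattern is a strictly nested variant written for this sketch (each finer-grained part may appear only when the coarser part before it is present), not the pattern supplied with the MWB. The sample values are invented.

```python
import re

# Each part wraps the next finer-grained part, mirroring the nested brackets
# of the date/time format. These names are specific to this sketch.
FRACTION = r"(\.\d{1,4})?"
SECONDS = r"([0-5]\d" + FRACTION + r")?"
TIME = r"(([01]\d|2[0-3])[0-5]\d" + SECONDS + r")?"
DAY = r"((0[1-9]|[12]\d|3[01])" + TIME + r")?"
MONTH = r"((0[1-9]|1[0-2])" + DAY + r")?"
ZONE = r"([+-](0\d|1[0-3])[0-5]\d)?"

TS_DATETIME = re.compile(r"^\d{4}" + MONTH + ZONE + r"$")
TS_PRECISION = re.compile(r"^[YLDHMS]$")  # second TS component, from the text

for value in ["2003", "20030120235959", "200301151030+0500",
              "20031301", "2003011525", "20030115103000.12345"]:
    print(value, TS_DATETIME.search(value) is not None)

for value in ["D", "X", "DD"]:
    print(value, TS_PRECISION.search(value) is not None)
```

The first three date/time values are accepted; the last three are rejected for an invalid month, an invalid hour, and a fifth fractional digit respectively. Only the single letter "D" satisfies the precision pattern.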

 

To summarize, the MWB supports hierarchical inheritance and constraint of regex at the following levels:

 

Primitive data type – Data type library

   Compound data type – Data Type library

       Segment fields – Segment library

           Profile fields – Message Profile
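
One way to picture this hierarchy is as a lookup in which the most specific level that defines a regex wins. The sketch below models it that way purely for illustration; the MWB itself propagates regex by compiling libraries, and the function and level names here are hypothetical.

```python
from typing import Dict, Optional

# Levels from most general to most specific, as listed above.
LEVELS = ["primitive data type", "compound data type", "segment field", "profile field"]


def effective_regex(assigned: Dict[str, str]) -> Optional[str]:
    """Return the regex set at the most specific level, or None if none is set."""
    for level in reversed(LEVELS):
        value = assigned.get(level)
        if value:
            return value
    return None


# MSH.1 before and after the profile-level constraint from the earlier exercise.
msh_1 = {"primitive data type": r"^[\x20-\x7e]{1,199}$"}
print(effective_regex(msh_1))

msh_1["profile field"] = r"^[|]$"
print(effective_regex(msh_1))
```

Before the profile-level entry is added, the element falls back to the generic ST regex; afterwards the profile-level constraint takes precedence.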

 

 

Regex Initialization in the MWB

 

The MWB has been engineered to facilitate the propagation of regex within its libraries. Start with Data Type maintenance (from the menu Maint\Datatypes\Add/Edit Data type File). After selecting a data type file, select the last menu item under the Data Type option, Init Regex. This feature allows you to select a text document that lists primitive data types together with their regex. For this release the text document is named Primitive DT Regex.txt and is supplied as the default. After the file is selected, the regex of the primitive data types are propagated throughout the data type library to both primitive and compound data types. It is recommended that the complex data types then be reviewed and their regex constrained or changed where appropriate, as described above. Save the data type file to make the regex propagation permanent.

 

This method of propagation was devised to permit users to provide their own regex and to facilitate easy extension of this feature to later HL7 versions and to existing customized data type libraries. Run this option whenever you want to change one or more regex globally throughout the data type library.

 

To inherit the regex into a segment library from an associated data type library, use the Maint\Libraries\Edit Library option. Select the segment library of interest. Be sure that it shows the appropriate Attached Data Type file (or establish the attachment). Decide whether the segment’s primitive fields should be changed (initially they should be) and check or uncheck the Regex/Examples for Primitive fields box accordingly, then click on the DT Compile Only button.

 

The check box determines whether or not a segment’s primitive fields will take on the regex of their data type. If you have already constrained the regex of the segment’s primitive fields, you may not want to inherit the general data type regex.

 

 

At the time of this writing, it is intended that regex will be propagated throughout the standard libraries supplied with the MWB. Check to be sure, though. It is also possible to remove regex from a given data type library and segment library. To do so, start with a particular data type library and, using the Init Regex option, select the supplied file named “Primitive DT Regex-nulls.txt”. This will remove the regex from the data type library. Follow up by compiling this data type library into each associated segment library to scrub the segment libraries of regex.