Regular Expressions Application in the Messaging Workbench
The Messaging Workbench (MWB) supports message element value assessment using Perl Compatible Regular Expressions (PCRE). The particular PCRE engine incorporated in the MWB has been supplied under license by Ralf Junker / The Delphi Inspiration (© Copyright 2000-2003 Ralf Junker / The Delphi Inspiration, all rights reserved). The Delphi Inspiration Web Site is at http://www.zeitungsjunge.de/delphi/ .
PCRE is a set of library functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5. To learn more about these functions and the PCRE effort, visit http://www.pcre.org/ . For documentation of the expression language, visit http://www.perldoc.com/perl5.8.0/pod/perlre.html . Another helpful resource in understanding and implementing PCRE is the book Mastering Regular Expressions by Jeffrey E.F. Friedl, published by O’Reilly, ISBN 0-596-00289-0.
PCREs (a.k.a. regex) can be used at various levels within the MWB, and for a couple of different purposes. At the most basic level, a new field has been added to the profile that permits entry of a PCRE, which can be used to ensure that any example value entered for the element is valid. A more important aspect of PCRE usage in the MWB is its application in message instance validation against a message profile. PCREs and message instance validation are being introduced concurrently in release 6.2 of the MWB. The new message instance validation feature is discussed elsewhere. The sections below explain how to set up and use PCREs in the MWB.
Simple Application:
To get a sense of how PCREs can be used in the MWB, start a new profile and compile just the single MSH segment. Select MSH.1 – field separator in the Message Tree. If the segment library that the MSH was compiled against already has PCREs assigned to its elements, a regex is displayed in the Regex field (bottom right of the Message Definition tab) that looks something like this: ^[\x20-\x7e]{1,199}$. This expression represents an HL7 ST (string) data type, which is appropriate for MSH.1. If, on the other hand, your segment library has not been prepared with integrated regex, the field will be empty.
In either event, select any existing regex entry and delete it from the field. In the Regex field, enter the following string instead: ^[|]$. Note that immediately after entering the ‘[’ character, the field background turns red (or whatever color you have the highlight set to in the MWB options). As you enter the ‘]’ character, the field background turns white again. The color change illustrates that the MWB evaluates the validity of the regex as you enter it. Persistence of red in the field’s background color indicates that the regex entered is illegal and needs to be corrected before it can be applied.
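The as-you-type check described above can be approximated outside the MWB. The following is a minimal sketch, assuming Python's `re` module as a stand-in for the PCRE engine embedded in the MWB (the two dialects are not identical). The helper name `is_legal_regex` and the color labels are illustrative only and are not part of the MWB.

```python
import re

def is_legal_regex(pattern: str) -> bool:
    """Return True if the pattern compiles, False if the engine rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Simulate typing ^[|]$ one character at a time and report the field color.
typed = "^[|]$"
for i in range(1, len(typed) + 1):
    partial = typed[:i]
    state = "white" if is_legal_regex(partial) else "red"
    print(f"{partial!r:10} -> {state}")
```

In this sketch the partial entries that end inside the unclosed bracket are rejected, and the entry becomes legal again once the closing ‘]’ is typed.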
The ^[|]$ expression that we just entered is a constraint on the generic HL7 ST data type regex. This paper will not delve into an explanation of PCRE syntax, but suffice it to say that the first expression, ^[\x20-\x7e]{1,199}$, indicates that a valid ST value starts at the beginning of a line, consists of 1 to 199 characters in the range of space to ‘~’ (32-126), and runs to the end of the line. The second expression constrains that statement, indicating that a valid MSH.1 value can only consist of a single ‘|’ character bracketed by the start-of-line and end-of-line anchors.
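To make the difference between the two expressions concrete, here is a small sketch that applies both to a few sample values. It assumes Python's `re` module in place of the MWB's PCRE engine; the helper `is_valid` and the sample values are illustrative.

```python
import re

ST_REGEX = r"^[\x20-\x7e]{1,199}$"   # generic HL7 ST pattern quoted in the text
MSH1_REGEX = r"^[|]$"                # constrained pattern entered for MSH.1

def is_valid(pattern: str, value: str) -> bool:
    """True if the value satisfies the pattern."""
    return re.search(pattern, value) is not None

# Every value accepted by the MSH.1 pattern is also accepted by the ST pattern,
# but not the other way round: the second pattern is a constraint on the first.
for value in ["|", "-", "||", "any printable text", ""]:
    print(f"{value!r:22} ST={is_valid(ST_REGEX, value)}  MSH.1={is_valid(MSH1_REGEX, value)}")
```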
It’s time now to test our regex. In the Example value field, enter a ‘-’ character. Notice that the Example value field background turns red. The highlight indicates that the example value is invalid. Erase the ‘-’ and replace it with ‘|’. Note that the highlight disappears, indicating that the example value is consistent with the regex that validates it. As a test, go to the Regex field and change the ‘|’ to a ‘-’. Note that the Example value field becomes highlighted again. These simple exercises illustrate that the Example value and Regex fields are “wired together”. This feature ensures that an MWB user can be confident of using legal regex and valid example values in a subject profile.
Propagation of PCREs within the MWB:
The exercises above also allude to two other important features of the PCRE implementation within the MWB. First, libraries can be imbued with regex attributes. Second, element regex may be constrained at different levels. As indicated above, segment libraries can have regex attributes assigned to constituent elements, which are automatically inherited by profile message elements that employ such a regex-prepared segment library. Regex assignment, by the way, is directly applicable only to elements with primitive data types.
In addition, data type libraries can have regex assigned to primitive elements. Segment libraries in turn can inherit the regex from their associated data type libraries. Also, within data type libraries, compound data types can further constrain (or even change) the regex of their associated primitive data types. For example, if we consider the TS – time stamp data type in HL7 version 2.4, we see that it is composed of an NM component (Date/Time) and an ST component (degree of precision).
The standard specifies that the Date/Time component is to be implemented as a string that looks like this:
YYYY[MM[DD[HHMM[SS[.S[S[S[S]]]]]]]][+/-ZZZZ]
According to the standard, however, the NM data type is to be implemented as:
[+/-]n[nnn---][.n[nnn---]]
which is obviously inconsistent with the date/time string. After initially defining regex for the data type library’s primitive elements, the 2 components of the TS inherit the default regex for NM and ST, ^[+-]?\d*\.?\d*$ and ^[\x20-\x7e]{1,199}$ respectively.
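The mismatch between the inherited NM default and a time stamp can be seen by running a few values through the NM pattern. This is a rough sketch using Python's `re` module rather than the MWB's PCRE engine, and the sample values are invented for illustration.

```python
import re

NM_REGEX = r"^[+-]?\d*\.?\d*$"   # default NM pattern quoted in the text

samples = [
    "-12.5",                      # an ordinary signed number
    "20031215",                   # a date without an offset happens to look numeric
    "20031399",                   # not a real date, yet still a legal number
    "20031215143005.1234-0500",   # full time stamp with a time zone offset
]
for value in samples:
    ok = re.search(NM_REGEX, value) is not None
    print(f"{value:28} {'accepted' if ok else 'rejected'}")
```

The generic numeric pattern rejects a time stamp that carries an offset and accepts digit strings that are not dates, which is why the component needs its own regex.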
The MWB, however, permits constraining and/or changing a component’s regex to better suit the parent data type’s implementation. In this case, the first component’s regex is changed to:
^\d{4}(((0[1-9])|(1[0-2]))?((([0-2][1-9])|(3[0-1]))?((([01]\d|2[0-3])([0-5]\d))?((([0-5]\d)?((\.\d{1,4})?))))))([+-](([0]\d|1[0-3])([0-5]\d)))?$
which actually matches the HL7 specified implementation. The second component is constrained from the generic string regex, as per the specified HL7 implementation, to: ^[YLDHMS]$ .
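The two TS component patterns can be tried against sample values as follows. This is only a sketch: it assumes Python's `re` module instead of the MWB's PCRE engine, the long pattern is copied verbatim from the text and split across lines purely for readability, and the helper `matches` and the sample values are illustrative. It checks a handful of inputs and does not establish that the pattern covers every case in the standard.

```python
import re

# Date/Time component pattern from the text, split into three pieces.
TS_TIME_REGEX = (
    r"^\d{4}(((0[1-9])|(1[0-2]))?((([0-2][1-9])|(3[0-1]))?"
    r"((([01]\d|2[0-3])([0-5]\d))?((([0-5]\d)?((\.\d{1,4})?))))))"
    r"([+-](([0]\d|1[0-3])([0-5]\d)))?$"
)
# Degree-of-precision component pattern from the text.
TS_PRECISION_REGEX = r"^[YLDHMS]$"

def matches(pattern: str, value: str) -> bool:
    """True if the value satisfies the pattern."""
    return re.search(pattern, value) is not None

for value in ["2003", "200312", "20031215", "200312151430",
              "20031215143005.1234-0500", "2003-12-15", "20031215T1430", "200"]:
    print(f"{value:28} {matches(TS_TIME_REGEX, value)}")

for value in ["D", "X", "DH"]:
    print(f"{value:28} {matches(TS_PRECISION_REGEX, value)}")
```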
After these changes, any more complex data type that includes a TS component will inherit the specific TS components’ regex.
To summarize, the MWB supports hierarchical inheritance and constraint of regex at the following levels:
Primitive data type – Data type library
Compound data type – Data type library
Segment fields – Segment library
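The three levels listed above can be pictured as a most-specific-wins lookup. The sketch below is a conceptual model only and does not describe how the MWB stores or copies regex internally. The dictionary layout, the key shapes, and the function `resolve_regex` are assumptions made for illustration, and only the short TS precision constraint is included at the compound level.

```python
from typing import Optional, Tuple

# Level 1: defaults for primitive data types (data type library).
primitive_regex = {
    "NM": r"^[+-]?\d*\.?\d*$",
    "ST": r"^[\x20-\x7e]{1,199}$",
}

# Level 2: constraints on components of compound data types (data type library),
# keyed here by (compound type, component position).
component_regex = {
    ("TS", 2): r"^[YLDHMS]$",
}

# Level 3: constraints on individual segment fields (segment library).
field_regex = {
    "MSH.1": r"^[|]$",
}

def resolve_regex(primitive: str,
                  component: Optional[Tuple[str, int]] = None,
                  field: Optional[str] = None) -> Optional[str]:
    """Return the most specific regex available, falling back level by level."""
    if field is not None and field in field_regex:
        return field_regex[field]
    if component is not None and component in component_regex:
        return component_regex[component]
    return primitive_regex.get(primitive)

print(resolve_regex("ST"))                       # primitive default
print(resolve_regex("ST", component=("TS", 2)))  # compound-level constraint
print(resolve_regex("ST", field="MSH.1"))        # segment-field constraint
```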
The MWB has been engineered to facilitate the propagation of regex within its libraries. Start with Data Type maintenance (from the menu Maint\Datatypes\Add/Edit Data type File). After selecting a data type file, select the last menu item under the Data Type option, Init Regex. This feature allows you to select a text document that lists primitive data types together with their regex. For this release, the text document named Primitive DT Regex.txt is supplied as the default. After the file is selected, the regex of the primitive data types are propagated throughout the data type library to both primitive and compound data types. It is recommended that the compound data types then be reviewed to see whether their regex need to be constrained or changed as described above. Save the data type file to make the regex propagation permanent.
This method of propagation was devised to permit users to provide their own regex and to facilitate easy extension of this feature to later HL7 versions and to existing customized data type libraries. Run this option whenever you want to change one or more regex globally throughout the data type library.
To inherit the regex into a segment library from an associated data type library, use the Maint\Libraries\Edit Library option. Select the segment library of interest. Be sure that it shows the appropriate Attached Data Type file (or establish the attachment). Decide whether the segment’s primitive fields should be changed (initially they should be) and check or uncheck the Regex/Examples for Primitive fields box as desired, then click the DT Compile Only button.
The check box determines whether or not a segment’s primitive fields will take on the regex of their data type. If you have already constrained the regex of the segment’s primitive fields, you may not want to inherit the general regex of the data type.
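The effect of the check box can be modeled as a flag on the compile step. The sketch below is one reading of the behavior described above, not the MWB's actual implementation. The `Field` record, the `dt_compile` function, and the choice of MSH.10 as a second ST field are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, List

@dataclass(frozen=True)
class Field:
    name: str
    datatype: str
    regex: str = ""

def dt_compile(fields: List[Field], datatype_regex: Dict[str, str],
               inherit_for_primitives: bool) -> List[Field]:
    """Model of a data-type-only compile: when the flag is set, each field takes
    the general regex of its data type; otherwise existing regex are kept."""
    if not inherit_for_primitives:
        return list(fields)
    return [replace(f, regex=datatype_regex.get(f.datatype, f.regex)) for f in fields]

datatype_regex = {"ST": r"^[\x20-\x7e]{1,199}$"}
segment = [
    Field("MSH.1", "ST", r"^[|]$"),   # already constrained at the field level
    Field("MSH.10", "ST"),            # no regex assigned yet
]

checked = dt_compile(segment, datatype_regex, inherit_for_primitives=True)
unchecked = dt_compile(segment, datatype_regex, inherit_for_primitives=False)
print([f.regex for f in checked])     # field-level constraint replaced by the ST default
print([f.regex for f in unchecked])   # field-level constraint preserved
```

Under this reading, leaving the box unchecked is the safer choice once field-level constraints such as the one on MSH.1 are in place.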
At the time of this writing, it is intended that regex will be propagated throughout the standard libraries supplied with the MWB. Check to be sure, though. It is also possible to remove regex from a given data type library and segment library. To do so, start with a particular data type library and select the supplied file named “Primitive DT Regex-nulls.txt”. This removes the regex from the data type library. Follow up by compiling this data type library into the associated segment library or libraries to scrub them of regex.