Schematron validation

Thanks to the European eInvoicing standard (EN16931) and OpenPEPPOL, Schematron validation has become a well-known topic, especially in eProcurement.

Schema validation is a poor quality metric

Traditionally, senders have been able to check their own XML documents using schema validation only. If an XML document passed schema validation, it was considered good enough to be sent.

Schema validation covers message structure and possibly some content constraints, but it cannot express conditional and integrity requirements. In addition, message schemas usually aim to cover multiple use cases and are composed of reusable components. These are good design practices, but they make schemas very generic: only a handful of mandatory elements, no length constraints, no code list constraints, no pattern constraints.

Schematron enables rule-based validation

Schematron is a rule-based validation language for transforming business and use case specific requirements into technical validation rules. Schematron helps to tighten and narrow down the message structure by stating which structures are mandatory and which should not be used. It also makes it possible to state integrity requirements such as sum checks, date comparisons, dependent elements (either-or, if-then, one-of, all-or-none, etc.) and conditional value requirements. Schematron is used as an additional validation layer on top of schema validation.
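To make these rule types concrete, here is a minimal sketch in plain Python of three such checks: an either-or rule, a date comparison and an if-then rule. It is not Schematron itself; in a real setup each check would be written as a Schematron assertion with an XPath test. The element names (BuyerReference, OrderReference, IssueDate, DueDate, PaymentMeans, Code, AccountId) and the meaning of code 30 are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical invoice; all element names are illustrative assumptions.
SAMPLE = """
<Invoice>
  <IssueDate>2024-03-10</IssueDate>
  <DueDate>2024-03-01</DueDate>
  <PaymentMeans>
    <Code>30</Code>
  </PaymentMeans>
</Invoice>
"""


def check_invoice(xml_text):
    """Return a list of human-readable rule violations."""
    root = ET.fromstring(xml_text)
    errors = []

    # Either-or: at least one of the two references must be present.
    if root.find("BuyerReference") is None and root.find("OrderReference") is None:
        errors.append("Either BuyerReference or OrderReference must be given.")

    # Date comparison: the due date must not precede the issue date.
    issue = root.findtext("IssueDate")
    due = root.findtext("DueDate")
    if issue and due and date.fromisoformat(due) < date.fromisoformat(issue):
        errors.append("DueDate must not be earlier than IssueDate.")

    # If-then: code 30 (assumed here to mean credit transfer) requires an account.
    for means in root.findall("PaymentMeans"):
        if means.findtext("Code") == "30" and means.find("AccountId") is None:
            errors.append("PaymentMeans with Code 30 requires AccountId.")

    return errors


for message in check_invoice(SAMPLE):
    print(message)
```

None of these three requirements can be expressed by a typical generic message schema, which is why they are usually left to a rule layer on top of it.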

Why should Schematron validation be used?

Country, industry and company specific requirements are usually published as an exhaustive PDF document called a message implementation guideline (MIG). It is certainly needed, but it leaves a lot of room for interpretation. Every implementer assumes that they have built a standard-compliant implementation, yet in the end the implementations do not interoperate. Schematron can be used to convert documented requirements into technical validation rules, which effectively set the minimum requirements for all standard-compliant implementations.

Integrity and content requirements are often checked only when data is read into a receiving system. When issues are found, the XML document is moved to an exception handling process where either the missing or invalid data is completed manually or an error notice is returned to the sender. This process obviously wastes time and money. In addition, most error notices generated by receiving systems are not clear enough to identify and locate the issue in the XML document. If the sender is not sure how to fix the issue, even more time and money is wasted when the help desk gets involved.

Benefits of using Schematron

Schematron can be used to publish integrity and content requirements in a technical format, which makes it possible for a sender to apply detailed XML validation automatically. This ensures that most invalid XML documents never reach the receiver's system, which minimizes the need for mutual exception handling.

Why has Schematron not been popular before? Mostly because of a lack of awareness, but you may also hear bad excuses like "Schematron cannot cover all requirements because some checks need to be run against a back-end system, and that's why we are not using it". So what? Wouldn't it be great if 80% of the requirements could already be checked by the sender's system? Most issues would be solved by the sender alone, who would get instant and detailed feedback and could easily iterate to make sure that the issue has been fixed.

What's lacking from Schematron

Schematron validation generates technical output, just like schema validation. A Schematron report is a long XML document listing which checks were applied and which errors were found. Each error notice contains the requirement as clear text, the XPath of the invalid element and the validation rule's technical implementation as a piece of XPath "code". Unless calculation rules such as sum checks are implemented with special care, one cannot easily tell whether an issue is related to rounding or to something else. A Schematron error does not contain a line number for the invalid element either. In a larger XML document, it's like looking for a needle in a haystack.
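As a rough illustration of what such a report looks like to a program, the sketch below pulls the failed assertions out of a report. It assumes the report is in SVRL (Schematron Validation Report Language), the report format described in the ISO Schematron standard; the sample report and its messages are invented for this example.

```python
import xml.etree.ElementTree as ET

SVRL_NS = {"svrl": "http://purl.oclc.org/dsdl/svrl"}

# Invented sample report; a real report is usually much longer.
REPORT = """
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:active-pattern/>
  <svrl:fired-rule context="Invoice"/>
  <svrl:failed-assert test="DueDate &gt;= IssueDate" location="/Invoice[1]">
    <svrl:text>DueDate must not be earlier than IssueDate.</svrl:text>
  </svrl:failed-assert>
  <svrl:fired-rule context="PaymentMeans"/>
  <svrl:failed-assert test="not(Code = '30') or AccountId"
                      location="/Invoice[1]/PaymentMeans[1]">
    <svrl:text>PaymentMeans with Code 30 requires AccountId.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>
"""


def failed_assertions(svrl_text):
    """Collect location, XPath test and message of every failed assertion."""
    root = ET.fromstring(svrl_text)
    findings = []
    for failed in root.findall(".//svrl:failed-assert", SVRL_NS):
        text = failed.findtext("svrl:text", default="", namespaces=SVRL_NS)
        findings.append({
            "location": failed.get("location"),   # XPath of the invalid element
            "test": failed.get("test"),           # the rule's XPath implementation
            "message": " ".join(text.split()),    # requirement as clear text
        })
    return findings


for finding in failed_assertions(REPORT):
    print(finding["location"], "-", finding["message"])
```

Note that the location is an XPath expression, not a line number, so mapping a finding back to a position in the source file takes extra work.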

Schematron validation is a perfect fit for automated content validation, when one needs to check that an XML document complies with the requirements. For locating and fixing issues, however, most people will find its raw output too harsh and technical.

Truugo + Schematron - perfect together

Schematron makes it possible to define technical and business rules as a standalone file instead of embedding them in system code. Thus exactly the same checks can be applied in a production environment and in Truugo with minimal effort.

A Schematron file can be imported to Truugo to set up an easy self-service validator. Truugo makes it efficient to locate and fix issues instantly: each error notice is visualized and includes a link to the invalid element. Complicated checks can be split into parts, for example, to inform users about the gap in a sum check.
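The idea of splitting a sum check into parts can be sketched as follows. This is only an illustration of the principle, not a description of how Truugo implements it; the function name and the tolerance of 0.01 are assumptions chosen for the example.

```python
from decimal import Decimal


def check_total(line_amounts, stated_total, tolerance=Decimal("0.01")):
    """Compare the stated total with the sum of lines and describe any gap.

    Returns None when the amounts match, otherwise a message that
    separates small rounding-sized gaps from larger mismatches.
    """
    computed = sum((Decimal(a) for a in line_amounts), Decimal("0"))
    gap = Decimal(stated_total) - computed

    if gap == 0:
        return None
    if abs(gap) <= tolerance:
        # Part 1: the gap is small enough to be a rounding difference.
        return f"Total differs from the sum of lines by {gap}; likely a rounding issue."
    # Part 2: the gap is too large to be explained by rounding.
    return f"Total {stated_total} does not match the sum of lines {computed}; gap is {gap}."


print(check_total(["10.00", "5.50"], "15.50"))
print(check_total(["10.005", "5.50"], "15.51"))
print(check_total(["10.00", "5.50"], "20.00"))
```

Amounts are handled as Decimal values rather than floats so that the check itself does not introduce rounding errors.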

The Truugo self-service model helps to speed up deployments by providing an easy user interface and test output that supports efficient error detection. For production issues, the Truugo validation API makes sure that a detailed test report is always seconds away.

Need to create your own Schematron file and wondering about the best way to get started? Please contact us!