Data shape conventions

Data shape language choice

Title: Data shape language choice

Identifier DSC-R1

Statement:

The data shapes shall be expressed in SHACL.

Checkable: Yes, partially

Validating that a Shapes file is indeed a SHACL Shapes file may perhaps be the subject of a syntax checker or another validation profile. It raises the question of whether to accommodate shape declarations and class definitions in the same file/graph, and also declaration styles where a class is also a shape (class extended shapes, which is in turn a subject of DSC-R4). Furthermore, a SHACL file is by extension also an OWL file. We can employ basic checks to ensure resource types are from either vocabulary, and that there exists at least one SHACL resource.

Only SHACL and OWL resource types

Shape Description(s):

  • A node shape targeting all subjects of rdf:type with a property membership restriction to both the set of primary OWL and SHACL types

    • Non-Compliance Severity: Violation

    • Non-Compliance Message(s):

      • Violation of SEMIC rule DSC-R1: Only SHACL (and OWL) class and property declarations shall be used

  • A node shape targeting the class sh:NodeShape with a restriction on the inverse of the propery rdf:type and minCount=1 (effectively saying there must be at least one instance of it)

    • Non-Compliance Severity: Violation

    • Non-Compliance Message(s):

      • Violation of SEMIC rule DSC-R1: SHACL shape declarations shall exist

Loose versus rigid constraints

Title: Loose versus rigid constraints

Identifier DSC-R2

Statement:

In a Core Vocabulary, then the data shape constraints shall be as loose as possible, i.e., permissive, while in an Application Profile, the data shape constraints shall be as rigid as necessary, i.e. restrictive.

Checkable: No

It is not possible to verify whether a vocabulary is a component of an Application Profile (AP). To implement such a check, options need to be provided, such as specifying an AP for validation. Nevertheless, identifying violations in shapes that adhere to or deviate from the AP will pose a challenge.

Open and Closed world assumptions

Title: Open and Closed world assumptions

Identifier DSC-R3

Statement:

The data instantiating a core model may be fragmented across information systems. For representation purposes, open-world assumptions shall be adopted, and for validation purposes, a closed-world assumption shall be adopted.

Checkable: No

The open vs. closed world assumption is not easily verified in an automated manner. The conventions do not clearly articulate how this affects large ontologies, such as ePO.

Shape definitions

Title: Shape definitions

Identifier DSC-R4

Statement:

Each ontology class shall be mirrored by a sh:NodeShape. The property constraints may be embedded (contextualised) within node shapes or defined as freestanding.

Checkable: Yes, partially

The gist of the rule is two-fold: (i) every class must have a corresponding shape and (ii) property constraints can be "freestanding" or not. SEMIC does not make clear if the latter is to be recommended or avoided. The former would normally not be implementable, as we validate SHACL and OWL independently, but an idea for the future is to allow (provide an option for) a combined validation of both OWL and SHACL. There are other subrules, and one clearly implementable is that of avoiding class extended shapes (i.e. a shape that is also a class).

Avoid shapes that are extended from classes

Shape Description(s):

  • A node shape targeting the class sh:NodeShape with a property value type membership restriction to sh:NodeShape (effectively saying this is by no means an extended shape of any kind - not just OWL class)

    • Non-Compliance Severity: Warning

    • Non-Compliance Message(s):

      • Non-observance of SEMIC rule DSC-R4: Avoid classes extended as node shapes.

Data shape severity

Title: Data shape severity

Identifier DSC-R5

Statement:

It is recommended to use constraint severity for distinguishing critical violations from non-critical recommendation warnings and from nice-to-have information.

Checkable: Yes

This is a rule which has syntax implications and therefore may be covered more extensively by a validator using a syntax profile.

Explicit declaration of severity

Shape Description(s):

  • A node shape targeting the class sh:NodeShape with a restriction of XONE (exactly or "exclusively one of") on two sets of property shapes: one on the property sh:severity with exact cardinality 1, and another on the sequence of properties (sh:property sh:severity) with exact cardinality 1, referring to any related property shape (effectively saying check that either one of these situations must exist).

    • Non-Compliance Severity: Warning

    • Non-Compliance Message(s):

      • Non-observation of SEMIC rule DSC-R5: Data shapes should be assigned a level of severity