Versioning and canonical urls

 

At last week’s FHIR Developer Days in Amsterdam, we had a highly enjoyable break-out session on the use of canonical urls when taking versioning into consideration.

The issue had been popping up more often recently, and we, as the core team, had been pushing off trying to solve it until we had better understanding of the problem. But here we were: we had a room full of knowledgeable people and enough experience to take a stab at a solution.

For those not present, I’d like to summarize what we discussed, seamlessly turning into what I think needs to happen next.

The problem

Let me start off by giving you an overview of the problem.
The two key players in the game are the canonical url and the canonical reference. The canonical url is a special identity present on most of our conformance resources (StructureDefinition, ValueSet and the like). This identity is used by a canonical reference, present in StructureDefinition (e.g. StructureDefinition.base) and FHIR instances (Resource.meta.profile), which is used to refer to other StructureDefinitions (or any other conformance resource). For example, the StructureDefinition for the localized version of Address in Germany starts like this:

<StructureDefinition>    
    <url value="http://fhir.de/StructureDefinition/address-de-basis" />
    <name value="address-de-basis" />
    <title value="Adresse, deutsches Basisprofil" />

Here, the <url> contains the canonical url of the address-de-basis StructureDefinition.

This canonical url can be referenced by other parts of your specification, as for example here at Patient.address in the StructureDefinition of the German version of Patient:

<element>
    <path value="Patient.address" />
    <short value="Adresse nach deutschem Profil" />
    <type>
        <code value="Address" />
        <profile value="http://fhir.de/StructureDefinition/address-de-basis" />
    </type>

This is quite comparable to what happens in your favorite programming language, and then referring to it when you are declaring an instance variable of class member:

public class GermanAddress
{
    ....
}

public class GermanPatient
{
    ....
    public GermanAddress Address { get; }
}

As a good member of the FHIR community, you have published all your profiles to a registry (these examples come from the German Basisprofil on Simplifier), and people are happily validating their Patient instances against it. Some of the instances may even have gotten tagged by a profile claim:

    <Patient>
        <meta>
            <profile value="http://fhir.de/StructureDefinition/address-de-basis">
        </meta>
        <!-- rest of the Patient's data -->
    </Patient>

Before long, this canonical reference gets baked into both instances in databases and software using your profile.

And then, the day comes that you are required to make changes to your profile. Breaking changes in fact. You are circling your brand new version of the profile amongst your colleagues, everyone is updating their test-instances and all works fine. Until you publish the new version on the registry. Once a copy of your StructureDefinition starts tickling down to servers around the country, previously correct instances of the German Patient will now start to fail validation. You realize you have broken the contract you had with your users: stuff that was valid before has now -without the end-users doing anything- become broken.

More subtly, if your breaking change was to the address-de-basis profile, this turns out to be a breaking change to all profiles depending on address-de-basis, which then proliferate up all the way to the instances.

A simple solution

It is easy to see what you should have done: you (and all your users) should have versioned both the canonical url and the canonical reference! So,

<StructureDefinition>    
    <url value="http://fhir.de/StructureDefinition/address-de-basis-v1" />

and

<type>
    <code value="Address" />
    <profile value="http://fhir.de/StructureDefinition/address-de-basis-v1" />
</type>

and finally

<meta>
    <profile value="http://fhir.de/StructureDefinition/address-de-basis-v1">
</meta>

This way, when we publish a new version we may change the canonical url and update it to end in v2, and all existing materials will remain untouched and valid.
Of course, minor changes are OK, so if we all agree to stick to semver, and only change our canonical url when a major version number changes, we’re doing fine. We formalized this approach in the STU3 version of the FHIR specification, by adding a <version> element to StructureDefinition:

<StructureDefinition>    
    <url value="http://fhir.de/StructureDefinition/address-de-basis-v1" />
    <version value="0.1" />

and then allowing you to tag canonical references with a | and a version number like so:

<type>
    <code value="Address" />
    <profile value="http://fhir.de/StructureDefinition/address-de-basis|0.1" />
</type>

Ok. Done. We published STU3 and hoped the problem was solved.

But it wasn’t. Well, technically, it was – but there are a few practical complications:

  • We had not followed this practice ourselves in the specification (leaving http://hl7.org/StructureDefinition/Patient untouched for years across STUs), and neither did the most prominent early Implementation Guides (like those from Argonaut). We set bad examples that turned out to be the path of least resistance as well. Guess what happens.
  • It just feels wrong to hard-wire version numbers in references within a set of StructureDefinitions and Implementation Guides you author. If you publish them as a version-managed, coherent set of StructureDefinitions, it is obvious that you’d like to reference the version of the StructureDefinition published at the same time as the referring StructureDefinition in that same set.
  • If you need to bump the version in a canonical url in the set that you publish, you need to be really sure you update all references to it in that set. And then (as we saw above) update the canonical url of all referring StructureDefinitions, and so on and so on. If you fail to do this, you end up in a situation where part of your definitions are using one version and another part another version. Granted, we could find someone who thinks that’s a feature, but I am sure most would disagree.

Better tooling support for authors could ease this job and help sticking to versioned references, but I kept having the nagging feeling something was not right. This was strengthened by the fact that this is not how we commonly version in other development environments:
continuing our parrallel with programming concepts, versioning the canonical url would be comparable to versioning our class names:

public class GermanAddress2 { ... }
public class GermanName4 { ... }

public class GermanPatient2
{
    ....
    public GermanAddress2 Address { get; }
    public GermanName4 Name { get; }
}

It is not like we’ve never seen this before, but that’s really only done if you want to keep incompatible versions of the same class within the same codebase, because you still need access to both (e.g. to do mapping).

At the same time, we were looking at how users of Simplifier and authors of implementation guides organized their projects and how they wanted versioning to work. It turns out that StructureDefinitions simply do not live on their own, much like Java classes are not shared and distributed in isolation. They are authored and shipped in sets (like implementation guides), and are versioned in those sets. Of course, they may still use canonical references to materials outside the set, and you’d need to tightly version those references, but inside their set, they simply mean to point to each other.

You don’t need to look around long at how other parts of the industry have solved this to realize that we need the concept of a “package”, much like you would package up your classes and Javascript files into zips, jars or whatever and ship them as npm, maven or NuGet packages.

Packages to the rescue

If you are not familiar with these packaging mechanism, I’ll call out a few properties of packages, to see how they solve our problems:

  • Packages contain artifacts that are managed by a small group of authors, who prepare them as a consistent set and publish them as an indivisible unit under the same version moniker.
  • A package is published on a registry and has a name and a version, the combination of which is enforced to be unique within that registry. You cannot overwrite a package once it has become published.
  • Packages explicitly state on which version of which other packages they depend, and contain configuration information on how to handle version upgrades and mismatches. Additionally, the user of a packages may override how version dependencies are resolved, even enforcing the use of an older or newer version of the dependency when desired.

Packages are usually just normal zip archives with all artifacts you like to publish, with an additional “configuration file” added to the zip:

{
  "name": "http://fhir.org/Package/package-de-basis",
  "version": 1.0.4,
  "dependencies": 
  {
    "http://fhir.org/Package/some-external-package": ">3.1.4"
  }
}

This has considerable advantages for the users:

  • When a user downloads a package, it contains “everything you need” to start working with StructureDefinitions within it, instead of having to download individual artifacts one by one from a registry.
  • The user also has the confidence that these artifacts are meant to be used together, which is not obvious if you are looking at a huge list of StructureDefinition on the registry.
  • A package manager will ensure that if you download one package for use in your authoring environment or machine, all dependencies will also be retrieved, and you would not encounter errors due to missing references.

As an author of a package of conformance resources this means that you no longer need to version your canonical references: they are interpreted to mean to refer to the resources inside your package. This is even true for canonical references to StructureDefinitions and ValueSets outside your package, since you as an author explicitly declare these dependencies and version them at the package level and no longer at each individual reference. Upgrading to a new version of a dependency is now completely trivial.

It might even solve another long standing wish expressed by authors: you could configure “type aliases” within your package, stating that every reference to a certain core type (or profile) within the package is translated to a reference to another type. If you wonder why that is useful, consider the poor German authors who defined a German Address (address-de-basis) and then needed to go over all of their StructureDefinitions to replace references to the “standard” core Address with their localized version. It’s pretty comparable to redirecting conformance references to a specific version of a StructureDefinition, so we can solve this in one go.

Concrete steps

Looking at the situation today, I suggest we do the following:

  • Remove <version> from StructureDefinition, and obsolete the use of | in references (though hard version references might still have a use).
  • Decide what we need as a syntax to define packages and declare dependencies. We could leverage existing mechanisms (package.json being a prime candidate) or integrate it into the existing ImplementationGuide resource (which would enable existing registries to simply use a FHIR endpoint to list the packages).
  • Enhance the authoring workflow in the current tools to allow the authors to create packages when they want to publish their IGs, fixing the external dependencies to specific versions.
  • Create (or re-use) package managers to enable our users to download packages and their dependencies (and turn make our registry compatible with those tools).

Are we done? No. There are two more design decisions (at least) to make. Let’s start with the easiest one:

It is apparent that there is a relationship to our concept of an “Implementation Guide” and a package. And we have to figure out exactly what that relationship is. I feel an Implementation Guide is a kind of package, containing the conformance resources, html files, images, etcetera that form the IG. But this also means there will be packages that are not ImplementationGuides. If we decide that the ImplementationGuide resource is the home of the “configuration part” of a package, we will need to rename ImplementationGuide to reflect its new scope.

But I saved the most impactful consequence for last:

You can no longer interpret canonical references present in conformance resources or instances outside of the context of a package.

Let me reiterate that: any authoring tool, validator, renderer or whatever system that uses StructureDefinitions to do its work will need to know this context. Those who have carefully studied the current ImplementationGuide resource (especially ImplementationGuide.global) realize this is already the case now, but most are blissfully unaware of this hidden feature.

For systems working with conformance resources (like instance validators), it’s likely they have this context: if you’re dealing with a StructureDefinition, you probably got it from a package in the first place (it becomes a different matter entirely if a resource can be distributed in different packages. Well, let’s not diverge.)

For servers exchanging instances however – we’d need to assume they know the context out of band. But this won’t do for partners exchanging outside of such a controlled environment. For this let me suggest to summon an old friend, hidden in a dark corner of the FHIR specification: Resource.implicitRules

The specification states:

Resource.implicitRules is a reference to a set of rules that were followed when the resource was constructed, and which must be understood when processing the content

that sounds about right, however, it continues:

Wherever possible, implementers and/or specification writers should avoid using this element.

And sparingly lists the reasons why we shouldn’t. I suggest we take a fresh look at this element and see whether the element we thought up so many years ago may find use for it after all.

Finally

We’re not done yet, and I admit I am not sure I’ve dealt with all the consequences of this versioning solution here, but it has one thing going for it: we’re profiting from solutions thought up by the rest of the industry who have dealt with this before. But is an exchange standard and it’s artifacts completely comparable to a software package? Will we be able to fix the problem of communicating package context?

I don’t know yet, but the more I think about the uses of packages and how it eases the life of authors and users of implementation guides, the more I think we should pursue this a bit in online discussions. Hope they are as engaging as the one at FHIR DevDays!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s