Overview

The Artifact Protocol describes a means to define, store, and transmit data, and preserve its meaning over extended periods of time or extensive processing.

Implementing the Artifact Protocol in a system enables full version control for data, its structure, and its meaning.

Rationale

A dataset is ultimately a record of the understanding of a portion of reality.

This understanding is inherently temporal. It is an understanding, at the point in time of being recorded, of a portion of reality as observed at a given point in time.

This holds true even if the data is recorded concurrently or near-concurrently with the portion of reality it reflects.

As the data within a database or dataset grows, so does our understanding of the portion of reality it reflects. As that understanding grows, the definitions of our data - themselves a record of our understanding of that subset of reality - are likely to change.

Developer Libraries

The Artifact Protocol is being developed alongside two implementations of the protocol itself: a reference implementation for working with Artifacts, and a content-addressable storage system for storing them.

The meaning of things lies not in the things themselves, but in our attitude towards them in particular caused by what we compare it to: something worse and we feel grateful for what we have; something better and we feel somehow let down.

Antoine de Saint-Exupery

Python Artifact is the data manipulation layer for working with Artifacts as a developer-user. It is built according to the object-capability pattern.

It allows you to create Artifacts of the following primitive types:

  • Entity Schema Artifacts (Artifacts describing 'things')
  • Entity Artifacts (Artifacts recording 'things')
  • Relationship Schema Artifacts (Artifacts describing the relationship between 'things')
  • Relationship Artifacts (Artifacts recording the relationship between 'things')
  • Event Schema Artifacts (Artifacts describing the events which take place on or between 'things')
  • Event Artifacts (Artifacts recording the events which take place on or between 'things')

Artifact Vault is a content-addressable storage system for Artifacts. It is the simplest possible way of storing and retrieving Artifacts.

Artifacts are stored in a single object called a 'vault'. They are retrieved by their Artifact key, which is a unique hash of the Artifact data itself.

The vault can be called either directly through code, or through an API, locally or remotely.
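
The idea of content-addressable storage can be sketched in a few lines. This is a minimal illustration only, not the Artifact Vault API: the names `artifact_key`, `Vault`, `put` and `get` are hypothetical, and the choice of SHA-256 over canonical JSON is an assumption, since the text does not say which hash or serialization is used.

```python
import hashlib
import json


def artifact_key(data: dict) -> str:
    """Derive a key from the Artifact data itself.

    Assumption: SHA-256 over a canonical (sorted-key) JSON encoding.
    """
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Vault:
    """Hypothetical in-memory vault: a single object holding Artifacts by key."""

    def __init__(self):
        self._store = {}

    def put(self, data: dict) -> str:
        key = artifact_key(data)
        # Keep a serialized copy so that later changes to `data`
        # cannot alter what is stored under this key.
        self._store.setdefault(key, json.dumps(data, sort_keys=True))
        return key

    def get(self, key: str) -> dict:
        return json.loads(self._store[key])


vault = Vault()
key = vault.put({"name": "Ada", "born": 1815})
same_key = vault.put({"born": 1815, "name": "Ada"})   # same data, same key
other_key = vault.put({"name": "Ada", "born": 1816})  # different data, different key
```

Because the key is derived from the content, storing identical data twice yields the same key, and any change to the data yields a different one.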

There is nothing new except what has been forgotten.

French proverb

Rationale

Data, the structure of the data, and the meaning of the data, are treated in fundamentally the same way - as a self-describing unit of knowledge. These units are called 'Artifacts', and are immutable.

These Artifacts are themselves divided into six types, which are used as the basic elements from which data is defined or recorded:

  • Entity Schema Artifact
  • Entity Artifact
  • Relationship Schema Artifact
  • Relationship Artifact
  • Event Schema Artifact
  • Event Artifact

These are called 'primitive types'.

Each of the Artifact primitive types exists as one part of a pair: one to describe data, and one to record it. Therefore, when recording data about an entity as an Entity Artifact, data within an Entity Schema Artifact is used to describe that record.
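
The pairing of the six primitive types can be written down as a small lookup. This is an illustrative sketch only: `PrimitiveType`, `DESCRIBED_BY` and `is_schema` are hypothetical names, not part of Python Artifact.

```python
from enum import Enum


class PrimitiveType(Enum):
    ENTITY_SCHEMA = "Entity Schema Artifact"
    ENTITY = "Entity Artifact"
    RELATIONSHIP_SCHEMA = "Relationship Schema Artifact"
    RELATIONSHIP = "Relationship Artifact"
    EVENT_SCHEMA = "Event Schema Artifact"
    EVENT = "Event Artifact"


# Each record-type primitive is paired with the schema-type
# primitive that describes it.
DESCRIBED_BY = {
    PrimitiveType.ENTITY: PrimitiveType.ENTITY_SCHEMA,
    PrimitiveType.RELATIONSHIP: PrimitiveType.RELATIONSHIP_SCHEMA,
    PrimitiveType.EVENT: PrimitiveType.EVENT_SCHEMA,
}


def is_schema(primitive: PrimitiveType) -> bool:
    """True for the describing half of a pair, False for the recording half."""
    return primitive not in DESCRIBED_BY
```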

Artifacts can only be created, never modified. If the data stored within an Artifact is found to be untrue, then a new Artifact must be created to represent the new understanding of the truth. This applies both to the record of data within a record-type primitive such as an Entity Artifact, and to the definition of data within a definition-type primitive such as an Entity Schema Artifact. Where an Artifact represents a newer or better understanding of the truth, it stores a reference to the Artifact it supersedes. Recording provenance in this way enables version control.
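
A minimal sketch of this create-only model follows. It assumes that the newer Artifact holds the reference to the older one (the older one, being immutable, cannot be updated to point forward). The names `Artifact`, `supersedes`, `create` and `history` are hypothetical and are not taken from Python Artifact.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class Artifact:
    payload: str                      # canonical JSON of the recorded data
    supersedes: Optional[str] = None  # key of the Artifact this one improves on

    @property
    def key(self) -> str:
        raw = json.dumps(
            {"payload": self.payload, "supersedes": self.supersedes},
            sort_keys=True,
        )
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def create(store: Dict[str, Artifact], data: dict,
           supersedes: Optional[str] = None) -> str:
    """Add a new Artifact to the store; existing Artifacts are never touched."""
    artifact = Artifact(json.dumps(data, sort_keys=True), supersedes)
    store[artifact.key] = artifact
    return artifact.key


def history(store: Dict[str, Artifact], key: str) -> List[dict]:
    """Walk the provenance chain from the given Artifact back to the oldest."""
    versions = []
    current = key
    while current is not None:
        artifact = store[current]
        versions.append(json.loads(artifact.payload))
        current = artifact.supersedes
    return versions


store: Dict[str, Artifact] = {}
v1 = create(store, {"name": "Pluto", "kind": "planet"})
v2 = create(store, {"name": "Pluto", "kind": "dwarf planet"}, supersedes=v1)
versions = history(store, v2)  # newest understanding first, oldest last
```

The older Artifact stays in the store unchanged, so both the current and the earlier understanding remain retrievable.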

As all data is stored in Artifacts, all data, schemata, and semantics of the domain are version controlled.

Therefore, changes to the understanding of the domain data itself are tracked over time. Again, this version control spans data, structure, and semantics.

When recording data, a record-type Artifact also stores a reference to the version of the understanding of the domain structure.
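
As a rough sketch of what such a reference could look like, the record below stores the key of the schema version it was recorded under. The dictionary layout, the field names (`type`, `schema`, `fields`, `supersedes`, `data`) and the helpers `key_of` and `conforms` are all assumptions made for illustration, not the protocol's actual format.

```python
import hashlib
import json


def key_of(artifact: dict) -> str:
    """Content-derived key (assumption: SHA-256 over canonical JSON)."""
    canonical = json.dumps(artifact, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two versions of the same schema; the second references the first.
schema_v1 = {
    "type": "entity_schema",
    "name": "Person",
    "fields": {"name": "string"},
    "supersedes": None,
}
schema_v2 = {
    "type": "entity_schema",
    "name": "Person",
    "fields": {"name": "string", "born": "integer"},
    "supersedes": key_of(schema_v1),
}

# A record made while schema_v1 was the current understanding.
record = {
    "type": "entity",
    "schema": key_of(schema_v1),  # reference to the schema version used
    "data": {"name": "Ada"},
}


def conforms(record: dict, schema: dict) -> bool:
    """True if the record references exactly this schema version
    and only uses fields that the schema defines."""
    return (
        record["schema"] == key_of(schema)
        and set(record["data"]) <= set(schema["fields"])
    )
```

Under these assumptions the record keeps pointing at the schema version it was written against, even after a newer schema version exists.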