Constructing a Category From IPLD

IPLD is IPFS’ answer to abstract data representation in the semantic web. Its value space is a slight superset of JSON’s – it adds ‘links’ as an additional type. This lends itself to a nice structurally typed value space, ripe for the polytypic manipulation it was intended for. In this post we explore the additional structure present in this system.

Defining the IPLD Category #

IPLD types are algebraic data types, sans coproducts. There are a few ‘primitives’, such as numeric types and characters. Links can be seen as isomorphic to character lists – and are therefore not primitive.
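To keep later sections concrete, here is a toy model of that data model as a single Haskell sum type. The names (`Value`, `CID`, the `V*` constructors) are purely illustrative – a sketch, not any official IPLD library’s API:

```haskell
import           Data.ByteString (ByteString)
import qualified Data.Map.Strict as M

-- A toy model of an IPLD node: JSON's kinds, plus bytes and links.
-- All names here are illustrative, not an official IPLD API.
newtype CID = CID String
  deriving (Eq, Ord, Show)

data Value
  = VNull
  | VBool   Bool
  | VInt    Integer
  | VFloat  Double
  | VString String
  | VBytes  ByteString
  | VList   [Value]
  | VMap    (M.Map String Value)
  | VLink   CID                  -- the one kind JSON does not have
  deriving (Eq, Show)
```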

While IPLD itself has no notion of a function (serialising arbitrary functions has always been a security nightmare – even JavaScript engines suffer massive exploits, nearly 50% of which are due to JIT-compiled code – and is rightly outside the scope of IPLD, meaning $\bm{IPLD}$ definitely isn’t closed), it is still within the model for applications to receive, produce, and – in between – manipulate IPLD values. Constructing $id$ after this is a breeze.

In fact, proving $\bm{IPLD}$ to be a subcategory of $\bm{Set}$ is so agitatingly trivial, I’ll skip the process entirely (and leave it as an exercise to the reader, of course). You can learn more about this notion of types as sets, and the category of types (sets), in my post on sorts.

It is worth pointing out that $\bm{IPLD}$ on its own has no initial object, as there is no empty type. However, there is a null type, which makes for a terminal object.

But does the structure end there?

Advanced Data Layouts #

Advanced Data Layouts – ADLs for short – are perhaps the most confusingly named part of the IPLD model. IPFS in general is rife with them, but this portion seems particularly poorly documented: take a look. They technically do refer to non-trivial data layouts (though, the line between simple and advanced is fuzzy).

ADLs are (invertible) functions on IPLD data. They serve to encapsulate particular structure in the function’s domain. For instance, IPFS has a soft cap of 1MB on the unit storage block. To get around this, you’d have to split data into multiple blocks. This is factored out by an ADL which transparently does this for you.

Obviously, ADLs are only useful if they can go both ways – you do have to produce the data in the first place somehow. This requirement makes them isomorphisms in our category.
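Sticking with the toy `Value` type from earlier, an ADL can be sketched as nothing more than a pair of functions that undo one another. The chunking example might look roughly like this – `ADL`, `store`, `view`, and `chunkedBytes` are made-up names, and a real implementation would be far more careful:

```haskell
import qualified Data.ByteString as BS

-- An ADL as an invertible pair of functions on IPLD data: 'store'
-- rewrites a value into its raw on-disk layout, 'view' recovers the
-- convenient shape.  They should be mutually inverse on the values
-- the ADL is meant for.
data ADL = ADL
  { store :: Value -> Value
  , view  :: Value -> Value
  }

-- A made-up chunking ADL: splits a large byte string into fixed-size
-- chunks (standing in for IPFS' ~1MB block cap) and glues them back
-- together on the way out.  Assumes a positive chunk size.
chunkedBytes :: Int -> ADL
chunkedBytes n = ADL
  { store = \v -> case v of
      VBytes bs -> VList (map VBytes (chunksOf bs))
      other     -> other
  , view = \v -> case v of
      VList xs  -> VBytes (BS.concat [bs | VBytes bs <- xs])
      other     -> other
  }
  where
    chunksOf bs
      | BS.null bs = []
      | otherwise  = let (h, t) = BS.splitAt n bs in h : chunksOf t
```

For byte values, `view (store v) == v`; the other direction holds on canonically chunked lists, which is the image the ADL is meant to operate on.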

So, we know ADLs are invertible and can compose (they are functions, after all). This sounds suspiciously like there is a group we can form in here somewhere. For those first encountering this concept, what follows is one of those elegant realisations category theory is known to offer.

A reminder: a group is a set alongside a binary operation on it, such that:

- the operation is closed (combining two elements of the set lands back in the set),
- the operation is associative,
- there is a neutral element,
- and every element has an inverse.

We obviously pick the set of ADLs, as that is the only collection we have. For our operation, we pick function composition, as that is the only operation on functions we have.

We know composition is associative by definition, so that’s an easy condition to fulfill. Similarly, the neutral element is the identity function, which can be trivially constructed (it is also its own inverse).

Is function composition closed? Let’s check. A function is an ADL if it is an isomorphism in $\bm{IPLD}$ – in other words, composition is closed iff the composite $g \circ f$ of two isomorphisms is also an isomorphism.

$$
\begin{CD}
@. A @<f^{-1}<< B \\
@. @AAf^{-1}A @AAg^{-1}A \\
A @>f>> B @>g>> C
\end{CD}
$$

Although a bit of a strange diagram, it does commute. In fact, the condition $f^{-1} \circ g^{-1} \circ g \circ f = id_A$ is practically the definition of $g \circ f$ being an isomorphism, with $f^{-1} \circ g^{-1}$ as its inverse. So, composition is closed.
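Spelling that out, using only associativity and the isomorphism laws for $f$ and $g$:

$$
(f^{-1} \circ g^{-1}) \circ (g \circ f) = f^{-1} \circ (g^{-1} \circ g) \circ f = f^{-1} \circ id_B \circ f = f^{-1} \circ f = id_A,
$$

and symmetrically $(g \circ f) \circ (f^{-1} \circ g^{-1}) = id_C$.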

Lastly, we get our inverse elements by the requirement that all elements in our set of ADLs are isomorphisms, which – by definition – fulfill the condition that, given an isomorphism $f: A \to B$, there exists a function $f^{-1}: B \to A$ such that $f \circ f^{-1} = id_B$ and $f^{-1} \circ f = id_A$. So, ADLs form a group under composition.
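In code, the group(oid) operations on the ADL sketch from before are one-liners (again, purely illustrative):

```haskell
-- Identity ADL: the neutral element, and its own inverse.
idADL :: ADL
idADL = ADL id id

-- Composition: store with f first and then g; view in the opposite
-- order, mirroring (g . f)^-1 = f^-1 . g^-1 from the diagram.
composeADL :: ADL -> ADL -> ADL
composeADL g f = ADL
  { store = store g . store f
  , view  = view  f . view  g
  }

-- Inversion: swap the two directions.
invertADL :: ADL -> ADL
invertADL a = ADL { store = view a, view = store a }
```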

This is an example of a more general rule: a category $C$ whose arrows are all isomorphisms is called a groupoid, because it forms a group $\left(\mathrm{hom}(C), \circ\right)$. Can we prove this? Well, the definition of a category gives us closedness, associativity, and identity. The requirement that all arrows be isomorphisms pushes us over the edge and gives us an inverse for every element. This is what I hinted at in the previous footnote. It might not be too surprising a revelation if you consider that categories on their own form a monoid over hom-sets and composition — a fact detailed in the first few pages of Categories for the Working Mathematician.

Since functions are monadic in their codomain, we can construct a Kleisli category with the objects being elements of the hom-set of $\bm{IPLD}$, and the Kleisli arrow being composition.

One should note that while schemas are ADLs, neither lenses nor selectors are, because they are not isomorphisms. This is due to them being polytypic catamorphisms – many-to-one in set-theoretic terms.
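For contrast, a selector sketched in the same style is just a projection – many inputs collapse onto the same output, so there is nothing to invert (`selectField` is a made-up stand-in, not the actual IPLD selector machinery):

```haskell
-- A field projection: many different maps share the same value at a
-- given key, so no inverse exists -- this is why selectors and lenses
-- are not ADLs in the sense above.
selectField :: String -> Value -> Maybe Value
selectField k (VMap kvs) = M.lookup k kvs
selectField _ _          = Nothing
```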

Codecs #

Codecs are isomorphisms between IPLD values and a particular wire format. However, not all values in the wire format are mapped to IPLD values. This could be formalised with equalisers and pullbacks, but I’m not convinced it would provide any meaningful insight.
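Continuing the sketch, a codec can be modelled as an encode/decode pair where decoding is partial, since not every byte string is a valid encoding. The toy codec below handles only the bytes kind and invents its own framing; real codecs such as dag-cbor or dag-json cover the whole data model:

```haskell
-- A codec: total in one direction, partial in the other.
data Codec = Codec
  { encode :: Value -> ByteString
  , decode :: ByteString -> Maybe Value
  }

-- Toy codec for the VBytes kind only: tag byte 0x42 followed by the
-- payload.  decode rejects anything that does not start with the tag.
bytesOnly :: Codec
bytesOnly = Codec
  { encode = \v -> case v of
      VBytes bs -> BS.cons 0x42 bs
      _         -> BS.empty        -- out of scope for this toy codec
  , decode = \bs -> case BS.uncons bs of
      Just (0x42, rest) -> Just (VBytes rest)
      _                 -> Nothing
  }
```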

If we extend our category to contain objects representing wire formats, we find that for each format, we have an isomorphism between the format and every other IPLD type. It is via these isomorphisms that we prove wire formats to be isomorphic objects.

And from the uniqueness of arrows coming to and from them (there are many ways to (de)serialise to a wire format, but we only pick the canonical isomorphism to be in our category, as only that arrow is of any use to us), we can see that adding codecs into the category gives us a zero object (both initial and terminal), or – in other words – makes $\bm{IPLD}$ a pointed category.

This also means there is a zero morphism for any two objects $A, B$, namely $0_{AB}: A \to 0 \to B$ — it factors through the zero object. Another way of saying this is that there exists a unique morphism $f$ for any two objects $A, B$ such that, for any other morphisms $g, h: X \to A$ and $u, v: B \to Y$, the following is true: $f \circ g = f \circ h$ and $u \circ f = v \circ f$.

This also means we can find the kernel and cokernel for any morphism in the category.

Open (and Practical) Problems #

The idea of IPLD is to write code once that fits different data: polymorphic both over the wire format (late binding of codecs) and over the exact shape of the data (structural typing).

This brings alongside it some issues: say you’re writing code that interacts with some data input – what is the minimum set of requirements for that data input such that the function is not partial when applied to so-typed data?

Secondly, say a certain IPFS block really is well-typed for what you intend to use it for – how do you know this before downloading and validating the entire tree structure?

Lastly, if we want to apply an ADL $f: X \to Y$, but we are given an object $Z$ – and there exists an isomorphism $\phi: Z \to X$ – how can we discover it and apply it?

While for the first two I provide some allaying solutions, the last issue is one that would require a very strict type system; and even then, it could only tell us whether such a function could exist. Workarounds seem better suited here than any solution to this very general problem.

Structural Typing and Inference #

Contrasting with nominal type systems, which identify types through their names (hence the name – from the Latin nōmen), structural typing looks at values much more practically.

In essence, it equates types by comparing their structure. The exact details of this are usually left to the particular type system – for example, if field names or order matters, how functions are equated, etc.

One could introduce a subtyping relation into the system, which would, roughly speaking, identify ‘subsets’ of a larger type via a ‘natural’ injective map. Of course, the naturality of such a mapping depends on the exact definition of the type system.

A strongly typed system with a set of well-defined inference rules would make every function signature carry an equally well-defined structural requirement on its inputs. Keen readers may recognise that an additional form of polymorphism — row polymorphism — is needed to preserve type information when dealing with subtypes in function signatures. Row polymorphism formalises the notion of “…and the rest of the object”.
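As a rough illustration of ‘the signature is the structural requirement’, here is a sketch using Haskell’s HasField constraints as a stand-in – genuine row polymorphism (as in PureScript) would additionally let the signature name the leftover fields:

```haskell
{-# LANGUAGE DataKinds        #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeApplications #-}

import GHC.Records (HasField (..))

-- The constraint set *is* the structural requirement: any record type r
-- carrying a 'name' field of type String and an 'age' field of type Int
-- is acceptable, no matter what else it contains.
greet :: (HasField "name" r String, HasField "age" r Int) => r -> String
greet r = getField @"name" r ++ " (" ++ show (getField @"age" r) ++ ")"

-- Any structurally suitable record works:
data Person = Person { name :: String, age :: Int }

-- greet (Person "Ada" 36) == "Ada (36)"
```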

Type checkers could generate a description of each function’s requirements on its inputs, while the type system provides a definition of subtyping to check whether a given input would be suitable for that function.

It is worth noting that structural typing, and especially inference in such systems, is a relatively new and hot topic in type theory research, as there aren’t as many clear, correct answers as there are in more traditional nominal type systems.

There are many useful extensions to the type system which would greatly benefit writing IPLD-compatible applications: gradual typing, which leaves some type checks to runtime; substructural typing; or algebraic effects, to allow more reasoning to be typechecked in what is already a volatile input environment.

A candidate language to start from would perhaps be PureScript, which is more or less at feature parity with Haskell, with the major improvement of already supporting row polymorphism; it also compiles to, and is interoperable with, JavaScript — which all lends itself to adoption on multiple platforms, though a compiled language would be nice too.

Structural Descriptions #

Regarding solutions to my second open problem – extracting structural information from a content address – a few invariants should be noted (at least I consider them invariant; if there is wiggle room, I’d be more than happy to stand corrected and, hopefully, see it used in solving the problem):

Knowing a block’s structure before ‘manually’ parsing it in its entirety can be achieved in one of two ways: adding information to the block’s address, or adding data into a block header which can be quickly parsed out and downloaded first.

Secondly, as the point of having a structure descriptor for blocks is to translate arbitrary structure into machine-readable semantics, the language used for describing the structure (at least the one atop the possible description-language hierarchy) must be unique, stable, and well-defined.

Lastly, just like with content hashes, there is no way to prevent tampering (or lying about a block’s structure) — not without public key cryptography. This means that, ultimately, the structure a block says it has is merely a hint, not a guarantee.

A Possible Solution #

I’ve come up with a possible solution to this problem. Firstly, leaving a structure description in the actual block is 1) wasteful – many blocks may share the same structure, but each must carry identical header overhead – and 2) difficult to program into a block exchange protocol that should support both streaming and sudden transfer abortion.

I propose a uniform structure-defining language be used, such that it describes IPLD nodes at only a minimal depth, while delegating deeper structure descriptions to other such documents using links.
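Nothing like this exists in the IPLD specs today, but a shallow descriptor document in such a language could be shaped along these lines (reusing the toy CID type from earlier; all names are hypothetical):

```haskell
-- A shallow structure descriptor: it describes only the top level of a
-- node and delegates deeper structure to further descriptor documents
-- via links.
data Shape
  = SNull
  | SBool
  | SInt
  | SString
  | SBytes
  | SLink
  | SList CID               -- element structure described elsewhere
  | SMap  [(String, CID)]   -- per-field structure described elsewhere
  deriving (Eq, Show)
```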

CIDs would be expanded to include a hash of such a description document, which the end user would download first and inspect, coming to a conclusion on whether they can use that block meaningfully or not. Combined with a function signature generated by a structurally typed typechecker, this would prove a powerful combination.
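Continuing with made-up types, that combination might bottom out in a check like the following, run against the typechecker-generated requirement before fetching the block itself:

```haskell
-- A (very) simplified requirement extracted from a function signature.
data Requirement
  = NeedsAny
  | NeedsBytes
  | NeedsField String Requirement
  deriving (Eq, Show)

-- Shallow usability check against a downloaded descriptor.  A nested
-- requirement would trigger fetching the linked sub-descriptor; here
-- we only check that the field exists at all.
usable :: Requirement -> Shape -> Bool
usable NeedsAny         _          = True
usable NeedsBytes       SBytes     = True
usable (NeedsField k _) (SMap kvs) = k `elem` map fst kvs
usable _                _          = False
```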

This approach, however, has several issues. Firstly, there is no guarantee of integrity for CIDs, and malicious actors could purposefully point CIDs to bogus structure description documents.

For many, especially small, linked blocks, downloading the multitude of description documents may prove detrimental due to the number of network round trips needed for the ordeal of fetching a supplementary document — downloading and manually inspecting the block itself might have been faster.

While not an issue in and of itself, since the structure descriptors are IPLD nodes with hash-links themselves, they benefit from all the hash tree consequences too — but a malleable hash that would reflect the content more accurately would be extremely beneficial (something like being able to tell if a given subtree satisfies some property).