Intertwingler — An Engine for Dense Hypermedia
Intertwingler is an engine, very much like WordPress is an engine: you use it to make websites. You can think of Intertwingler, at least this implementation of it, as a demonstrator for the kind of infrastructure necessary to make the Web do genuine dense hypermedia.
The way to understand dense hypermedia is to contrast it with what the Web is off the shelf, which is sparse hypermedia: big clunky pages with not a lot of links, and what links do exist are sequestered into regions like navigations and other UI. What we want instead are smaller, more composable units, the mechanism of composition being ordinary links, at a much greater density than is typical. The effect we are after with Intertwingler is to pulverize Web content, dramatically increasing its addressability. Not only does this afford practical benefits like content reuse, it also opens up new affordances for software tools and creative expression.
Strategy
The main problem Intertwingler has to solve, then, is that links on the Web are extremely brittle. They are brittle because it is very cheap to change the URL of a Web resource, and very expensive to change all the places where that URL is referenced. Intertwingler solves this problem the following way:
- It stores links (i.e., referent-reference pairs✱) as first-class objects,
- It assigns every resource a canonical identifier that doesn't budge,
- It overlays human-friendly address components (slugs) on top,
- It remembers prior values for these address components if you change them,
- It uses a custom resolver to do everything in its power to match a requested URL to exactly one resource,
- It controls what URLs are exposed to the outside world, through an aggressive redirection policy,
- It also has a mechanism for the principled handling of parametrized and derived resources, maintaining a registry of parameter names, syntaxes, semantics, and other metadata.
Intertwingler accomplishes all this by bringing your organization's entire address space (i.e., every Web URL under every domain you control) under its management.
✱ Actually, Intertwingler stores links as triples where the third element is the kind of link it is. More on this later.
Also packaged with the Intertwingler demonstrator are the means for creating websites with dense hypermedia characteristics:
- A file system handler, for transition from legacy configurations
- A content-addressable store handler, for bulk storage and caching of opaque resources
- A pluggable markup generation handler, for rendering transparent resources
- Mutation handlers (e.g. `PUT` and `POST`) for both opaque and transparent resources,
- A set of transforms for manipulating resource representations, specifically HTML/XML markup and images.
Architecture
Concepts
This is a brief glossary of terms that are significant to Intertwingler. It is not exhaustive, as it is assumed that the reader is familiar with Web development terminology. Note that I will typically prefer the more generic term "URI" (identifier) rather than "URL" (locator), and use "HTTP" to refer to both HTTP and HTTPS unless the situation demands more precision.
(Information) Resource
An information resource is a relation between one or more identifiers (in this case URIs) and one or more representations. A familiar type of information resource is a file, which has exactly one representation and usually one, but possibly more than one identifier (file name/path). Web resources have several additional dimensions, including content type, natural language, character sets, other encoding and compression schemes, and even the request method or verb by which the resource was requested.
Opaque Resource
An opaque resource is named such because the enclosing information system does not need to "see into" it. An opaque resource may have more than one representation, but one representation will always be designated as canonical.
Transparent Resource
A transparent resource is the complement of an opaque resource: the enclosing information system can, and often must, "see into" its structure and semantics. Since the canonical representation of a transparent resource resides only in live working memory, all serializations (that neither discard information nor make it up) are considered equivalent.
Representation
A representation (of an information resource on the Web) is a literal sequence of bytes (octets) that represents the given information resource. Representations can vary by media type, natural language, character set, compression, and potentially many other dimensions.
HTTP Transaction
An HTTP(S) transaction refers to the process of a client issuing a single request to a server, and that server responding in kind. In other words, a single request-response pair.
Handler
An Intertwingler handler is a microservice with certain characteristics. All handlers are designed to be run as stand-alone units for bench testing and system segmentation. A handler responds to at least one URI via at least one request method. Handlers have a manifest that lists the URIs, request methods, parameters, content types, etc. under their control. This enables the Intertwingler engine to perform preemptive input sanitization and to route requests efficiently to the correct handler.
Engine
An Intertwingler engine is a special-purpose handler that marshals all other handlers and transforms, resolves URIs, and routes requests to handlers for a particular site. This is the part, albeit wrapped in a multiplexing harness, that faces the external network.
"Site" here is a logical entity relating a pool of content to a set of domain names. An organization may have multiple sites that don't share content—or indeed, Intertwingler is flexible enough that they can share content, but each engine has its own resolver, independent of the others.
Harness
The term "harness" is used to refer to two separate entities within Intertwingler, and both of them do some kind of multiplexing. The undecorated term "harness" refers to a specialized handler that bootstraps the whole system's configuration and affords one or more engines to manage different sites with different configurations. The other, the transform harness, yokes together all the transformation queues for a given engine, prepares the transformation subrequests and collects their results.
Transform
A transform is a special-purpose handler that encapsulates one or more operations (each identified by URI) over a request body. As such, transforms only respond to `POST` requests. Like all handlers, transforms have lists of content types for each URI in their manifest that they will both accept and emit. Transforms are configured to run in a queue, with the output of one being fed into the input of the next. Through its interaction with an HTTP message, a transform may also trigger a subsequent transform to be added to its own queue, or to another.
Request Transform
A request transform operates over HTTP requests. It can modify the request's method, URI, headers, body (if present), or any combination thereof.
Response Transform
A response transform operates over HTTP responses. Analogous to request transforms, response transforms can manipulate the response status, headers, body, or any combination thereof. Unlike a request transform, there are multiple queues for response transforms: an early-run queue and a late-run queue, with an addressable queue sandwiched between them.
Handlers
Everything in Intertwingler is a handler, including the engine itself. At root, a handler is a microservice created in compliance with the host language's lightweight Web server interface (in our case with Ruby, that would be Rack).
A handler is intended to be interfaced with only over HTTP (or, again, the Web server interface's approximation of it). That is, a handler instance is a callable object that accepts a request object and returns a response object. A handler is expected to contain at least one URI that will respond to at least one request method.
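To make this concrete, here is a minimal sketch of a handler at the bare Rack level. The class name, URI, and toy manifest are invented for illustration; they are not part of the Intertwingler API.

```ruby
# A hypothetical handler: a callable that takes a Rack environment and returns
# a [status, headers, body] triple, responding to one URI via two methods.
require 'rack'

class HelloHandler
  # toy stand-in for a handler manifest: URI => permitted request methods
  RESPONDS_TO = { '/hello' => %w[GET HEAD] }.freeze

  def call(env)
    req     = Rack::Request.new(env)
    allowed = RESPONDS_TO[req.path] or return [404, { 'content-type' => 'text/plain' }, ["not found\n"]]
    return [405, { 'allow' => allowed.join(', ') }, []] unless allowed.include?(req.request_method)

    body = "hello, dense hypermedia\n"
    [200, { 'content-type' => 'text/plain' }, req.head? ? [] : [body]]
  end
end

# Bench-test it stand-alone, e.g. with Rack::MockRequest:
#   Rack::MockRequest.new(HelloHandler.new).get('/hello').body
```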
The Intertwingler Harness and Engine(s)
The Intertwingler harness imagines itself one day turned into a high-performance, stand-alone reverse proxy, with hot-pluggable handlers (and by extension, transforms) that can be written in any language and interface internally over HTTP. That is the lens through which to view the design. The harness is meant to be put at the edge of an organization's Web infrastructure, or at least very close to it, and manage the Web address space, with an engine assigned to each distinct site under the organization's control (each of which can coalesce multiple DNS domains).
Interactions Between Engine Components
This diagram attempts to show how the parts of the Intertwingler engine's anatomy interact with each other. Solid lines represent that one instance (the base) has the other instance (the arrowhead) as a member; functionality encapsulated within a particular node can therefore be accessed by any node that can trace a path to it. Bidirectional arrows signify a backreference. Dotted arrows are ephemeral links, e.g. URIs. 3D boxes represent (potentially) multiple instances. Green boxes are handlers or subclasses thereof, blue is the parameter subsystem, and red is the transformation infrastructure.
The Transform entity in the diagram is simply metadata about a transform (such as its parametrization), and ultimately points to a resource, implemented as part of a transform handler. Transform handlers are nevertheless undifferentiated from ordinary handlers in this view.
What follows is a set of definitions for the parts of the system anatomy not discussed elsewhere.
Dispatcher
The dispatcher is the part of the engine that takes an HTTP request, selects an appropriate handler for it, feeds the request to the handler, and passes the response back to the caller (usually the engine itself).
Parameter Registry
The Intertwingler engine has a site-wide registry for URL query (and path) parameters, their types (and type coercions), cardinalities, ordering, conflicts, formatting, names (and aliases), and other behaviour. This is mainly a thin wrapper around `Params::Registry`, with the additional capability of fetching its configuration data out of the graph.
The parameter registry could probably be rolled into the resolver, since it primarily concerns managing URI query parameter names, values, constraints, type coercions, canonicalization, etc.
Partial Invocation
Transformation functions may take additional scalar parameters. These are represented by partial invocations that effectively curry the function so that it only needs to take a single outstanding parameter: the HTTP message body. Partial invocations are either statically configured in transform queues, or inserted on the fly into addressable queues.
I agonized over the computer-sciencey term invocation rather than the more precise mathy term application, but I decided this is a computer and people would probably confuse the latter with an app or something.
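By way of illustration, the mechanism is just ordinary currying; the operation and its parameters here are invented, and this is not the actual Intertwingler::Transform::Partial interface:

```ruby
# A pretend transform operation over an HTTP message body, taking four scalar
# parameters plus the body itself.
crop = ->(x, y, width, height, body) {
  "cropped #{body.bytesize} bytes to #{width}x#{height} at #{x},#{y}"
}

# The "partial invocation": every scalar parameter bound up front…
partial = crop.curry[200, 100, 1200, 900]

# …leaving the message body as the single outstanding parameter, supplied at
# request time.
puts partial.call('…image bytes…')
```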
Queue Chain
The most recent addition to the transform infrastructure is a disposable queue-of-queues that manages the state of the transformation process over the lifetime of an HTTP request loop, in a manner that insulates longer-lived objects from destructive changes.
Resolver
The Intertwingler URI resolver is one of the first substantive accomplishments of this project, created back in 2019. This is the part of the system responsible for translating between human-friendly yet mutable addresses and the more durable, yet less pronounceable, identifiers (UUIDs and RFC6920 hash URIs).
Transform Harness
The transform harness organizes chains of transform queues, i.e., request queues and response queues. Typically, there is a single request queue associated with the engine, but handlers may specify their own response queues besides the engine default (e.g. the content-addressable store has a null queue so nothing interferes with the integrity of the message bodies it emits). The transform harness is also responsible for translating path parameters into partial invocations that get added to a per-request addressable response queue.
The transform harness was originally a member of the engine, but this turned out to be awkward in a number of ways:
- The contents of addressable queues are not known until the time of the request.
- The invocation of a transform in an earlier queue can cause changes to the contents of subsequent queues (see `tfo:Insertion`).
- The initial response queue itself is not known until the dispatcher selects a handler.
The transform harness was initially conceived as the part that marshals all things transformation, but its role of hanging onto a bunch of long-lived data structures conflicted with the chaos of what happens during a request. Also, it only interacts with the dispatcher (indeed, it needs to know what handler the dispatcher chose to resolve a response queue) and does not need to be in the engine's main execution context. As such, I introduced a new entity, a queue chain, that is designed to be used exclusively by the dispatcher. The chain shallow-copies the prototypical transform queues so it doesn't matter what happens to them. The transform harness need only be concerned, then, with loading and hanging onto the relevant data structures. Indeed, it is starting to look a little redundant, and I may eventually just dissolve it into the dispatcher.
Intertwingler Handler Manifests (In Progress)
Still in progress at the time of this writing is a finalized design for handler manifests, though some details are certain. A manifest is intended to advertise the set of URIs that a given handler will respond to, along with:
- what request methods are recognized,
- what content types are available as a response,
- what URI query parameters are recognized, their data types, cardinality, etc.,
- what content types are accepted in requests (at least the ones that send body content),
- in the case of `POST`ed HTML forms (`application/x-www-form-urlencoded` and `multipart/form-data` types), parameter lists analogous to query parameters,
- etc…
The exact format of the manifest payload is still to be determined. What is known is that handler manifests will be retrieved by the special `OPTIONS *` request, intended to address the server (in this case the microservice) directly rather than any one particular resource it manages. Since the HTTP specification does not explicitly define semantics for any content in response to `OPTIONS *`, we future-proof by only sending the manifest if a `Prefer: return=representation` header is present in the request, in addition to the ordinary content negotiation headers, `Accept` and so on.
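Concretely, a manifest request to a handler would look something like the following; the host name and `Accept` types are placeholders, since the payload format is not settled:

```http
OPTIONS * HTTP/1.1
Host: handler.internal.example
Prefer: return=representation
Accept: text/turtle;q=0.9, */*;q=0.1
```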
State
Intertwingler maintains its state — at least the transparent resources — in an RDF graph database. The current implementation uses a very simple, locally-attached quad store. Opaque resources, or rather their literal representations, are held primarily in a content-addressable store. Intertwingler also includes a file system handler to help transition from legacy configurations.
Both the graph database and the content-addressable store are candidates for stand-alone systems that could be scaled up and out.
Addressing
Intertwingler maintains URI continuity by ascribing durable canonical identifiers to every resource, and then overlaying human-friendly yet potentially perishable identifiers on top. The goal of the Intertwingler resolver is to eliminate the possibility of a user receiving a `404` error, at least in practice. (In principle it will always be possible to request URIs that Intertwingler has never had under its management.)
While it is possible, for aesthetic reasons, to ascribe an explicit path as an overlay URI, Intertwingler only needs as much path information as is necessary to match exactly one canonical identifier. That is, if the database only contains one resource with a slug of `my-summer-vacation`, then the full URI `https://my.website/my-summer-vacation` is enough to positively identify it. (If a longer path was explicitly specified, then Intertwingler will redirect.) If a second resource shows up in the graph with the same slug, Intertwingler will return `300 Multiple Choices` with the shortest URIs that will unambiguously identify both options.
URI path segments prior to the terminating one correspond to arbitrary entities in the graph that happen to have been appropriately tagged. Again, the only purpose they serve is to unambiguously identify the terminating path segment. For the path `/A/B/C/d` to resolve, `A` has to exist and be connected (again, arbitrarily) to `B`, `B` to `C`, and `C` to `d`. If only part of the path resolves, that is one of the few situations in which you will encounter a `404`: the path is overspecified, something you can only do if you enter the path manually, as Intertwingler will only ever expose (explicit overrides notwithstanding) the shortest uniquely-identifying overlay path for any resource. As such, if `d` can be uniquely identified using a shorter path, the default behaviour of Intertwingler is to redirect.
In practice, this behaviour subsumes what we ordinarily think of as "folders" or "containers". It will be possible to configure which resource and relation types get considered for "container-ness", but in general Intertwingler does not recognize the concept of a container as a category of entity that is meaningfully distinct from a non-container.
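For example (the longer path is invented, and the particular 3XX status shown is only illustrative), a request for an overspecified path that nevertheless resolves gets answered with a redirect to the shortest overlay path:

```http
GET /diary/2023/my-summer-vacation HTTP/1.1
Host: my.website
```

```http
HTTP/1.1 307 Temporary Redirect
Location: https://my.website/my-summer-vacation
```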
Canonical Identifiers
Intertwingler uses UUIDs for the bulk of its canonical identifiers, with the exception of those that correspond 1:1 to byte segments (that is to say, the opaquest of the opaque), which use URIs derived from cryptographic digests. The former can always be reached by accessing, e.g.:
https://my.website/d3e20207-1ab0-4e65-a03c-2580baab25bc
and the latter, e.g.:
https://my.website/.well-known/ni/sha-256/jq-0Y8RhxWwGp_G_jZqQ0NE5Zlz6MxK3Qcx02fTSfgk
…per RFC6920. If the resolver finds a suitable overlay address for the UUID form, it will redirect, but the hash URI form remains as-is. Direct requests to these hash URIs (at least from the outside) will also bypass any response transforms, in order to preserve the cryptographic relationship between the URI and the representation.
Intertwingler Transform Protocol
The transform protocol is inspired by the FastCGI specification, and its use in server modules like Apache's `mod_authnz_fcgi`. In this configuration, the main server issues a subrequest to a FastCGI daemon, and then uses the response, in this case, to determine whether the outermost request is authorized. The reasoning goes that this behaviour can be generalized to ordinary HTTP (in our era of liberal use of reverse proxies, FastCGI is an extra step), as well as handle other concerns in addition to authorization. (Indeed, FastCGI itself also specifies a filter role, but I have not seen a server module that can take advantage of it.)
A direct request to a transform looks like a `POST` to the transform's URI where the request body is the object to be transformed. Additional parameters can be fed into the transform using the URI's query component, it being on a separate band from the request body. `POST`s to transforms must include a `Content-Type` header and should include an `Accept` header to tell the transform what the requester prefers as a response. The `Content-Length`, `Content-Type`, `Content-Language`, and `Content-Encoding` headers of the transform's response will be automatically merged into the original HTTP message. Queues of transforms can therefore be composed to perform complex operations.
When an HTTP transaction occurs completely within an engine's process space (i.e., it does not try to access handlers running in other processes/engines), the engine has strategies to mitigate the amount of extraneous parsing and serialization that would otherwise occur.
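As a concrete sketch (the headers and byte counts are invented), a direct request to the scaling transform mentioned below might look like this:

```http
POST /transform/scale?width=640&height=480 HTTP/1.1
Host: my.website
Content-Type: image/jpeg
Accept: image/jpeg
Content-Length: 48213

…original JPEG bytes…
```

…and the transform's response, whose `Content-Type` and `Content-Length` would be merged back into the message being transformed:

```http
HTTP/1.1 200 OK
Content-Type: image/jpeg
Content-Length: 12887

…scaled JPEG bytes…
```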
Entire-Message Transforms
Transforms can modify the entire HTTP message by serializing the message (or the part desired to be modified) into the body of the request to the transform, using the content type `message/http`. Transforms that accept serialized HTTP messages as request bodies should respond in kind.
That is, if you were writing an entire-request-manipulating request transform, it would expect the `POST`ed content to be a serialized request, and would likewise return a serialized request. An analogous response transform would expect a serialized response in the request body, and likewise respond with a serialized response, all `message/http`.
For entire-message-manipulating transforms, it is only necessary to pass in the part of the HTTP message that one wishes to have transformed, plus any additional information needed for the transformation to be successful. (It is, however, necessary to include the request line or status line, for request transforms and response transforms, respectively.) Results will be merged into the original HTTP message. Responding with an identical value as the request (request line, status line, or header) will leave it unchanged, or in the case of headers, it is safe to omit them. To signal that a header ought to be deleted, include it in the outgoing header set with the empty string for a value.
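To make that concrete, here is a sketch, with an invented transform URI and headers, of a response transform that only manipulates headers. The engine `POST`s the status line plus the headers of interest as `message/http`:

```http
POST /transform/fix-headers HTTP/1.1
Host: my.website
Content-Type: message/http
Accept: message/http

HTTP/1.1 200 OK
Content-Type: text/html;charset=utf-8
X-Legacy-Header: please-remove-me
```

…and the transform responds in kind, leaving `Content-Type` unchanged by repeating it, adding a `Cache-Control` header, and deleting `X-Legacy-Header` by sending it back empty:

```http
HTTP/1.1 200 OK
Content-Type: message/http

HTTP/1.1 200 OK
Content-Type: text/html;charset=utf-8
Cache-Control: max-age=3600
X-Legacy-Header:
```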
URI Rewriting and No-Ops
The response codes `303 See Other` and `304 Not Modified` have special meaning with respect to transforms. If a request transform returns a `303`, its `Location` header should be interpreted as a simple internal rewrite of the request-URI. A `304` indicates that the transform (request or response) has made no changes at all. All other `3XX` responses are forwarded to the client.
Redirect responses from addressable response transforms that return their own URI path with different query parameter values are translated backwards into the outermost request with different path parameter values.
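For example, a request transform that maps a vanity URI onto its canonical UUID (the UUID here is the one from the earlier example) might respond to the engine like this; the engine would then carry on processing against the rewritten URI rather than sending a redirect to the client:

```http
HTTP/1.1 303 See Other
Location: /d3e20207-1ab0-4e65-a03c-2580baab25bc
Content-Length: 0
```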
Addressable Transforms
Most transforms are configured statically, but some response transforms are addressable through the use of path parameters, a lesser-known feature of URIs. The advantage of using path parameters to identify response transforms is that they stack lexically, so the example:
https://my.website/some/image;crop=200,100,1200,900;scale=640,480
…would fetch `/some/image` from a content handler, then in a subrequest `POST` the resulting response body to, say, `/transform/crop?x=200&y=100&width=1200&height=900`, receive that response body, and then `POST` it to `/transform/scale?width=640&height=480`, the response to which would be reattached to the outgoing response to the client. The mapping that relates the comma-separated positional arguments in the path parameters to key-value query parameters is expressed using the Transformation Functions Ontology.
Handler Inventory
Everything in Intertwingler is a handler, but the undecorated term "handler" refers to content handlers. These are the stripped-down microservices that actually respond to outside requests.
File System Handler
This is a rudimentary handler that provides content-negotiated `GET` support to one or more document roots on the local file system.
Markup Generation Handler
This handler generates (X)HTML+RDFa (or other markup) documents from subjects in the graph. Pluggable markup generators can be attached to different URIs or RDF classes.
Generic Markup Generator
This creates a simple (X)HTML document with embedded RDFa intended for subsequent manipulation downstream. This generator and others will eventually be supplanted by a hot-configurable Loupe handler.
Atom Feed Generator
This will map resources typed with certain RDF classes to Atom feeds when the request's content preference is for `application/atom+xml`.
Google Site Map Generator
This will generate a Google site map at the designated address.
`skos:ConceptScheme` Generator
This is a special alphabetized list handler for SKOS concept schemes.
`sioct:ReadingList` Generator
This is a special alphabetized list handler for bibliographies.
Person/Organization List Generator
This is a special alphabetized list handler for people, groups, and organizations.
All Classes Generator
This handler will generate a list of all RDF/OWL classes known to Intertwingler. Useful for composing into interactive interfaces.
Adjacent Property Generator
This handler will generate a resource containing a list of RDF properties that are in the domain of the subject's RDF type(s). Useful for composing into interactive interfaces.
Adjacent Class Generator
This handler will generate a resource containing a list of subjects with `?s rdf:type ?Class .` statements, where `?Class` is in the range of a given property. Useful for composing into interactive interfaces.
Content-Addressable Store Handler
The content-addressable store handler wraps `Store::Digest::HTTP` (which itself wraps `Store::Digest`). This handler maps everything under `/.well-known/ni/`. You can add a new object to the store by `POST`ing it to that address. `Store::Digest::HTTP` also generates rudimentary index pages.
Reverse Proxy Handler (TODO)
While the plan is to include a reverse proxy handler, and while they are relatively easy to write, I am leaving it out until I can determine a sensible policy for not letting the entire internet access the entire rest of the internet through the reverse proxy.
Linked Data Patch Handler
The LD-Patch handler processes `PATCH` requests with `text/ldpatch` content and applies them to the graph. This can be used in conjunction with the RDF-KV Transform.
Transform Inventory
Much of the labour of Web development is considerably simplified if you realize that many of the operations that bulk up Web applications can be construed as transformations over HTTP message bodies. Most transforms don't need much, if any, information outside of the segment of bytes they get as input. Most transforms, moreover, are tiny pieces of code.
Request Transforms
Again, request transforms are run in advance of the content handlers, generally making small adjustments to headers and sometimes manipulating request bodies.
Markdown Hook Transform
This simple transform adds `text/markdown` to the request's `Accept` header, so downstream content negotiation selects Markdown variants when it wouldn't otherwise. It also hooks the Markdown to HTML response transform.
Note that if the `Accept` header already contains `text/markdown` with a higher score than `text/html` or `application/xhtml+xml`, the Markdown passes through to the client untouched.
Sass Hook Transform
In a virtually identical move, this transform adds the `text/x-vnd.sass` and `text/x-vnd.sass.scss` content types to the request's `Accept` header, and hooks the Sass Transform.
Pseudo-File `PUT` Transform
This transform will take a `PUT` request to a particular URI and generate the graph statements needed to fake up a file path, while transforming the request into a `POST` to `/.well-known/ni/` to store the content.
This is a basic mechanism for getting content onto the site in lieu of a fully-fledged WebDAV infrastructure, which will come later. I have implemented a WebDAV server before and it was an entire project unto itself.
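For illustration (the path and headers are invented, and the graph statements involved are elided), a request like this:

```http
PUT /css/homepage.css HTTP/1.1
Host: my.website
Content-Type: text/css
Content-Length: 1242

…CSS bytes…
```

…would be rewritten, roughly, into a `POST` of the same body to the content-addressable store, with graph statements making it reachable at the original path:

```http
POST /.well-known/ni/ HTTP/1.1
Host: my.website
Content-Type: text/css
Content-Length: 1242

…CSS bytes…
```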
RDF-KV Transform
This transform will take `POST` requests with `application/x-www-form-urlencoded` or `multipart/form-data` bodies that conform to the RDF-KV protocol and transform the request into a `PATCH` with a `text/ldpatch` body, suitable for the LD-Patch handler.
Response Transforms
Response transforms are run after the selected content handler, in three phases: early-run, addressable, and late-run. Theoretically any response transform can be run in any phase, but some transforms will only make sense to run in certain phases, and/or before or after other transforms in the same phase.
Markdown to HTML Transform
This transform will take Markdown and turn it into (X)HTML.
Sass Transform
This transform will take Sass content and turn it into CSS.
Since libsass development has been discontinued in favour of a Dart implementation, the Sass transform is a candidate for the first cross-language handler.
Tidy Transform
This transform will run HTML Tidy over (X)HTML content.
RDF Transform
Turn an RDFa document into Turtle, N-Triples, RDF/XML, or JSON-LD.
Also turn any of those types into each other.
Strip Comments Transform
Removes the comments from HTML/XML markup.
(X)HTML Conversion Transform
Transforms HTML to XHTML and vice versa.
Rewrite `<head>` Transform
Ensures the correct `<title>` and `<base href="…">` elements are present in an (X)HTML document, as well as `<link>`, `<meta>`, `<script>`, and `<style>`.
Rehydrate Transform
Transforms certain inline elements in an (X)HTML document (`<dfn>`, `<abbr>`…) into links to term definitions, people, places, companies…
Add Social Media Metadata Transform
Adds Google, Facebook, Twitter, etc. metadata to the `<head>` of an (X)HTML document.
Add Backlinks Transform
Adds a chunk of markup containing backlinks to every block or section element in an (X)HTML document that is an identifiable RDF subject.
Rewrite Links Transform
Rewrites all links embedded in a markup document to the most up-to-date URIs.
Mangle `mailto:` Transform
Obfuscates e-mail addresses/links in a manner serviceable to recovery by client-side scripting.
Add Amazon Tag Transform
Adds an affiliate tag to Amazon links.
Normalize RDFa Prefixes Transform
Moves RDFa prefix declarations to the root (or otherwise outermost available) node where possible; overwrites alternate prefix identifiers with those configured in the resolver; prunes out unused prefix declarations.
Add `xml-stylesheet` PI Transform
This transform will add an `<?xml-stylesheet …?>` processing instruction to the top of an XML document, for use with XSLT or CSS.
Apply XSLT Transform
Applies an XSLT stylesheet to an XML document.
Reindent Transform
Normalizes the indentation of an HTML/XML document.
Image Conversion Transform
Converts a raster image from one format to another.
Crop Transform
Crops an image.
Scale Transform
Scales an image down.
Desaturate Transform
Makes an image black and white.
Posterize Transform
Posterizes an image.
Knockout Transform
Generates a transparency mask based on a colour value.
Brightness Transform
Manipulates an image's brightness.
Contrast Transform
Manipulates an image's contrast.
Gamma Transform
Manipulates an image's gamma value.
Implementation Notes
Parts of Intertwingler, notably the URI resolver and markup generation handlers, depend on a reasoner to make inferences about assertions in the database. In 2018, when I began working on Intertwingler's predecessor, `RDF::SAK`, the only workable implementations of reasoners were in Java and Ruby (which still appears to more or less be the case). I chose Ruby because it was easier for prototyping. My vision for Intertwingler, though, is that it eventually has implementations in as many languages as it can.
Graph-Based Configuration Policy
Most of Intertwingler's configuration, which is much more detailed than could be adequately served by an ordinary configuration file, comes from the RDF graph. On principle, I would prefer to make graph-based configuration optional to the extent that it can be. This translates to writing object constructors that are for the most part ignorant of (or at least agnostic to) RDF, such that Intertwingler's innards can be nominally constructed without having to procure a bunch of graph data.
Storing the configuration in the graph also makes Intertwingler hot-configurable.
This situation entails that a common vocabulary and interface be established for initializing objects from the graph. Many, but importantly not all instances of classes in the running code have counterparts in the graph.
In particular, the dispatcher and the transform harness are parts of the engine that do not have their own identities, but due to an earlier design, the resolver does (which may no longer be necessary). The parameter registry, likewise, currently does not, but may in a future revision.
Every class that gets its configuration from the graph should exhibit the following characteristics:
- It needs a path (i.e., by tracing instance members) to the graph database—not (necessarily) passing it in directly to the constructor.
- It needs a subject (if it doesn't have its own subject, it can borrow its "parent's").
- It needs a static method that has a standardized name which downstream users can associate with constructing an instance out of the graph (this used to be `resolve`, which I then changed to `configure`, which I think I'll change back to `resolve` again).
- The actual `initialize` method should have a minimal footprint of required members in positional arguments (usually this is the "parent" thing that has a connection to the graph, and the subject URI), and everything optional should be in keyword parameters (some exceptions apply).
- The entire initial state of the object should be able to be passed into the constructor; i.e., if the object is initialized with its state in the constructor, there should be nothing "extra" that has to be fetched from the graph.
- There may also be some value in punning the initialization parameters (as well as those of other methods) such that a subject URI passed in will autovivify into the object (as in instance) it stands for. I'm not sure if I want this behaviour everywhere so I'm still on the fence.
- There should be a `refresh` instance method that will, potentially recursively, sync the entity with the current state of the graph.
  - For nested objects it is entirely possible that some instances may disappear while others are initialized anew, as well as connections between objects changing.
- Many classes represent container-like entities, and the entities they contain are also subjects in the graph.
  - It is important that these objects can have members added and removed at runtime, and not have to do a full refresh to update their state.
  - Therefore, create an instance method `resolve` (or some disambiguating derivation if there is more than one kind of membership) that expects the subject URI.
I have furthermore found it useful to mix in some graph-manipulating shorthand methods to these classes (from `Intertwingler::GraphOps::Addressable`) to facilitate loading their configuration. These methods will work as long as there is a `repo` member which points to the graph (I usually keep this private to control where the graph is accessed from), and a `subject`.
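By way of illustration (the class, predicate, and member names below are invented, not the actual Intertwingler API), a class that follows the conventions above tends to take roughly this shape:

```ruby
require 'rdf' # assuming the RDF.rb toolkit

# purely illustrative predicate; the real configuration vocabulary is
# Intertwingler's own
MEMBER = RDF::URI('urn:x-example:member')

class ExampleQueue
  # standardized constructor-from-the-graph (the actual method name is in
  # flux, as noted in the list above)
  def self.resolve(parent, subject)
    members = parent.repo.query([subject, MEMBER, nil]).map(&:object)
    new(parent, subject, members: members)
  end

  attr_reader :subject

  # required positional arguments are the "parent" (which traces a path to
  # the graph) and the subject URI; everything else is keyword parameters, so
  # the entire initial state can be handed in without touching the graph again
  def initialize(parent, subject, members: [])
    @parent  = parent
    @subject = subject
    @members = members
  end

  # re-sync the instance with the current state of the graph
  def refresh
    @members = repo.query([@subject, MEMBER, nil]).map(&:object)
    self
  end

  private

  # keep graph access funneled through one place
  def repo
    @parent.repo
  end
end
```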
Here are the classes that could currently (as of 2024-02-06) use this treatment:
- `Intertwingler::Engine`
- `Intertwingler::Resolver` (may be coalesced into the engine)
- `Intertwingler::Engine::Dispatcher` (as a proxy)
- `Intertwingler::Transform::Harness` (as a proxy)
- `Intertwingler::Transform::Queue`
- `Intertwingler::Transform`
- `Intertwingler::Transform::Partial`
- `Intertwingler::Transform::Invocation` (eventually, when we do caching)
- `Intertwingler::Params::Group`
- `Intertwingler::Params::Template`
Note: the handlers themselves (`Intertwingler::Handler` and `Intertwingler::Transform::Handler`) currently get their configuration from `urn:x-ruby:` URNs, in lieu of having to come up with an entire regime for configuring those too.
The `urn:x-ruby:` Scheme
It was determined that it would be useful to have a way to represent a Ruby module (or, indeed, a module in any programming language that has them) as a URI. The handlers, in particular, are configured and loaded this way. The representation looks like this:
urn:x-ruby:path/to/module;Constant::Name?=arbitrary=param
Using the URN scheme was not my first choice (an `x-ruby:` scheme on its own is more appropriate), but Ruby's inbuilt `URI` module is written such that you can't register a URI scheme that contains a hyphen (despite that being legal, and the `x-` prefix being the standing recommendation for non-standard URI schemes). The third-party URN module, however, does not have this constraint.
The class, provisionally titled `Intertwingler::RubyURN`, does not impose any semantic constraints, although it does provide a mechanism to `require` the file and (if successful) return the identified constant. Query parameters (actually the URN Q-Component) have their keys normalized to symbols; no other manipulation is done besides. The application is free to interpret the components of the `x-ruby` URN however it likes, e.g., by treating the constant as a class and the query as initialization parameters.
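To illustrate the mechanics (this is not the Intertwingler::RubyURN API, just the general idea), resolving such a URN amounts to something like the following:

```ruby
require 'cgi'

# Hypothetical resolver: split the URN into its path, constant name, and
# q-component, require the path, look up the constant, and hand back the
# query parameters with their keys as symbols (values are left as CGI.parse
# gives them, i.e. arrays; the real class may differ).
def resolve_x_ruby(urn)
  _, _, rest     = urn.split(':', 3)   # strip "urn:x-ruby:"
  body, query    = rest.split('?=', 2) # "?=" introduces the URN q-component
  path, constant = body.split(';', 2)  # "path/to/module;Constant::Name"

  require path                         # load the file…
  const  = Object.const_get(constant)  # …and fetch the named constant
  params = query ? CGI.parse(query).transform_keys(&:to_sym) : {}

  [const, params]
end

# Hypothetical usage: treat the constant as a handler class and the query as
# initialization parameters (this path and class name are invented).
# klass, params = resolve_x_ruby(
#   'urn:x-ruby:intertwingler/handler;Intertwingler::Handler?=root=/var/www')
# handler = klass.new(**params)
```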
If, however, you are minting URNs which represent code you intend to subsequently load and run, you should probably take some steps to mitigate all the myriad bad things that can happen as a result.
Installation
For now I recommend just running the library out of its source tree:
```
~$ git clone git@github.com:doriantaylor/rb-intertwingler.git intertwingler
~$ cd intertwingler
~/intertwingler$ bundle install
```
Configuration
Intertwingler is a form of middleware, meaning it's effectively useless without mounds of content. Until further notice, my recommendation is to monitor the Getting Started guide.
Sponsorship
The bulk of the overhaul that transformed `RDF::SAK` into Intertwingler was funded through a generous research fellowship from the Ethereum Foundation, via their inaugural Summer of Protocols program. This was a unique opportunity for which I am sincerely grateful. I would also like to thank Polyneme LLC for their financial support and ongoing interest in the project.
Contributing
Bug reports and pull requests are welcome at the GitHub repository.
Copyright & License
©2018-2024 Dorian Taylor
This software is provided under the Apache License, 2.0.