Now that some of scaffolding and ground-leveling is done around a project I’ve been working on, it felt like a good time to step back and try to philosophize and narratize (that should be a word). I imagine waxing more on the topic in the future, perhaps on specific aspects, but this might serve as a general introduction and reckoning.

In short, the project is a python client for Fedora Commons 4 (FC4), boringly called “pyfc4”. In my defense, I’ve worked on projects with some pretty fun names: “Ouroboros”, a middleware for wrangling the chaos of a digital collections universe; “ichabod”, a headless browser that rendered pages and used fuzzy matching to determine if the content had changed much; to name a couple. But this felt like a project that deserved a sane, calm, simple name.

Why’s that? Because, at best, pyfc4 should get out of your way.

Background

Over the past few years, we’ve built a digital collections platform/infrastructure I’m quite proud of around Fedora Commons 3.x, Solr, and the other usual suspects. And we did this using primarily python. The decision to use python was not calculated, other than we had more collective experience with the language at the time, and were blissfully unaware how much work rolling our own middleware would be, as opposed to using other ecosystems like Hydra (now Samvera) or Islandora, both really excellent and vibrant codebases and communities.

We knew Fedora was going to be in the mix, but how to communicate with it? After some initial work making raw HTTP calls to Fedora’s REST API, we stumbled on “[eulfedora]”(https://github.com/emory-libraries/eulfedora), the phenomenal, intuitive, and joy-to-use python client for Fedora Commons 3.x. Eulfedora describes itself as:

eulfedora is a Python module that provides utilities, API wrappers, and classes for interacting with the Fedora-Commons Repository in a pythonic, object-oriented way, with optional Django integration.

eulfedora.api provides complete access to the Fedora API, primarily making use of Fedora’s REST API. This low-level interface is wrapped by eulfedora.server.Repository and eulfedora.models.DigitalObject, which provide a more abstract, object-oriented, and Pythonic way of interacting with a Fedora Repository or with individual objects and datastreams.

This description accurately and succinctly describe what eulfedora is, and does, but does not attempt to justify its existence or utility in the face of alternatives. It leaves out things like handling encoding, large file streaming, alerting user’s to identifiers (PIDs) that break Fedora’s PID rules, and the list goes on. Like other language specific API wrappers out there, Eulfedora wrangles the specification of its target, and that of the target’s API, into a patterns and information reporting that is helpful to the user.

Eulfedora is integral to our digital collections platform, and has functioned almost without reproach for years now. Sure, we’ve encountered some strange edge cases, and even submitted an update or two, but 99.99% of the time, eulfedora does precisely what you expect, and how you expect it should happen.

And so, it’s unsurprising that eulfedora would be an inspiration for pyfc4; a python client to serve a very similar, but worth noting, not identical, role for Fedora Commons 4.x and its complete overhaul of philosophy and architecture.

One area that pyfc4 distinctly deviates from eulfedora is no current integrations or configurations specific to Django. That time may come, but not yet.

Overview

pyfc4 is envisioned as a lightweight python client for interacting with an instance of Fedora Commons 4.x. The idea was to map the majority, if not all, of Fedora’s API endpoints and functions to methods and patterns within pyfc4.

Similar to eulfedora, it imagines interacting with Fedora via a Repository instance, and then working on and managing Reource’s. It can be broken down into the following high-level models:

  • Repository
    • Transaction
  • API
  • Resource
    • ResourceVersion
    • RDFSource
      • Container
        • BasicContainer
        • DirectContainer
        • IndirectContainer
    • NonRDFSource

Very generally, if repo were an instance of Repository, it is passed around to Resource instances, which itself has a hierarchy of classes that align with the Linked Data Platform (LDP) that Fedora Commons 4.x is, itself, an implementation of.

pyfc4 supports transactions, versioning of resources, and the more mundane but important CRUD operations for resources. One example of the utility of pyfc4, to pick on the “U” from CRUD, would be updating resources.

Updates

Updates in FC4, if not overwriting the totality of a resource’s metadata (which is not preferred for a number of reasons), are performed by issuing PATCH requests containing a SPARQL update query. Now, like any convenience pattern or method from a client library, this is something quite doable, step-by-step, determining what triples have been added, removed, modified, building the SPARQL query with the correct DELETE {}, INSERT {}, WHERE {} syntax. But you’re going to be doing this often. pyfc4 approaches this common task in the following way:

  1. when retrieving a resource, storing the current state of the RDF graph as self.rdf. _orig_graph
  2. all local changes to the resource’s graph – adding triples, removing them, etc. – is made to a copy of that graph at self.rdf.graph
  3. when it comes time to update the server with local changes, a diff is calculated between the original graph and the modified one, and converted into a SPARQL query that is sent to the server via a PATCH request

It’s much more convenient to type foo.update() than navigating those distinct phases each time, and a library like pyfc4 provides that convenience. It also provides a single point of interaction for updates that can be optimized and improved over time, instead of simliar, but not identical, methods for updating resources. Just one example, but an exmample nonetheless of why a client is helpful as opposed to making requests to a raw API endpoint.

RDF / graph interactions

Another area that pyfc4 might make one’s life easier, if not speedier, is interacting with the RDF payloads that FC4 provides for a resource. pyfc4 parses this graph – handling a variety of different RDF serialization formats for input or output – and provides some convenient ways to access those triples.

One particularly handy way is an object-like approach. pyfc4 parses the graph from a resource, and based on the URIs of the predicates, and stored prefixes, stores them by prefix with the objectcs accessible through dot notation.

For example, if foo is a retrieved BasicContainer resource:

# get ldp:contains 
In [6]: foo.rdf.triples.ldp.contains
Out[6]:
[rdflib.term.URIRef('http://localhost:8080/rest/foo/bar'),
 rdflib.term.URIRef('http://localhost:8080/rest/foo/baz')]

# In [8]: goober.rdf.triples.foaf.knows
Out[8]: [rdflib.term.URIRef('http://localhost:8080/rest/tronic/tronic2')]

# a more complete example of how this trickles through
# add namespace
In [9]: foo.add_namespace('licorice','http://licorice.org#')

# add triple
In [10]: foo.add_triple(foo.rdf.prefixes.licorice.best_color, 'black')

# update
In [11]: foo.update()
Out[11]: True

# now, object-like access to this triple parsed from graph
In [12]: foo.rdf.triples.licorice.best_color
Out[12]: [rdflib.term.Literal('black', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string'))]

Though pyfc4 is working nicely in the lovely cleanspace of testing and spec fullfillment, I’m hopeful that putting it to work for actual repository tasks will reveal other areas that convenience patterns and methods can be put in place. Perhaps even distilling common patterns into processes or approaches that would be faster than raw, and numerous, API calls.

Okay, enough for now. While I could go on, a dentist appointment looms, and that is perhaps the best excuse to keep this short and sweet for now. In the meantime, more documentation, examples, and information can be found on the github page: https://github.com/ghukill/pyfc4.