Sigmoid Curves

S-Curves:

The beautiful S-Curve. A curve we encounter daily, revealed itself tonight as an optional output from machine learning algorithms.

“The function was named in 1844–1845 by Pierre François Verhulst, who studied it in relation to population growth. The initial stage of growth is approximately exponential; then, as saturation begins, the growth slows, and at maturity, growth stops.

The logistic function finds applications in a range of fields, including artificial neural networks, biology (especially ecology), biomathematics, chemistry, demography, economics, geoscience, mathematical psychology, probability, sociology, political science, linguistics, and statistics.”

What more proof does one need; look at that lineup!

Photogrammetry

I’m utterly speechless. And it’s due to photogrammetry.

In a land far, far away, I was a land surveyor out West. In those heady times of shooting lasers, I become familiar with the idea of point clouds as representations of geometric space in coordinate form. Per our execution with lasers and reflective prisms, the resolution of our models were decided by how many measurements we took. For a big job, this might be in upwards of 100-300 points. Imagine hitting the corner of a building with a laser – all corners – then a few shots along the sides to confirm that walls weren’t billowing, or beams sagging. 14 hours later, you had 100, 200, 300 points in 3D space that would form a point cloud.

This was amazing.

Well, technology marches on. Sometime in the last 5-7 years, I stumbled on a project called Photosynth, that offered the ability to upload a series of overlapping photos to create a point cloud, where points were determined by parts of the images it determined were overlapping and the same point in space. Incredibly, it would deduce a 3D space from 2D images. This was absoluteley jaw-dropping at the time, and still is. It was also shortlived. The project was through Microsoft and the University of Washington, but with the proliferation of quadcopter drones and GPUs and more efficient and widely available processing of images, I’m assuming this technology was suddenly highly profitable.

I will keep this short for now, as the hours loom long, but truly consumer grade software has started to emerge that allow photogrammetry rendition of 3D models from images; much like Photosynth, but exportable to formats well understood in the 3D modeling space. Some is proprietary software like Photoscan, but other promising open source options like OpenDroneMap are hot on their heels.

My first photogrammetric project, a pair of 2+ year old Nikes with more miles than I can count. A particular fondness I have for old shoes; effortlessly they are by your side (or under your feet) in life moments of small to large; what else propel and protect like they do.

It is clearly quite rough, but, this was 12 images, taken quickly, from my phone. 12 images. From a phone. In poor lighting.

Undoubtedly, more to come. Much, much more. In fact, we are soon to meet with someone at Wayne State whom I understand to be doing this very thing for cultural artifacts. Now, how do we preserve, and provide discovery and access for these beauties…

Pyfc4 Intro

Now that some of scaffolding and ground-leveling is done around a project I’ve been working on, it felt like a good time to step back and try to philosophize and narratize (that should be a word). I imagine waxing more on the topic in the future, perhaps on specific aspects, but this might serve as a general introduction and reckoning.

In short, the project is a python client for Fedora Commons 4 (FC4), boringly called “pyfc4”. In my defense, I’ve worked on projects with some pretty fun names: “Ouroboros”, a middleware for wrangling the chaos of a digital collections universe; “ichabod”, a headless browser that rendered pages and used fuzzy matching to determine if the content had changed much; to name a couple. But this felt like a project that deserved a sane, calm, simple name.

Why’s that? Because, at best, pyfc4 should get out of your way.

Background

Over the past few years, we’ve built a digital collections platform/infrastructure I’m quite proud of around Fedora Commons 3.x, Solr, and the other usual suspects. And we did this using primarily python. The decision to use python was not calculated, other than we had more collective experience with the language at the time, and were blissfully unaware how much work rolling our own middleware would be, as opposed to using other ecosystems like Hydra (now Samvera) or Islandora, both really excellent and vibrant codebases and communities.

We knew Fedora was going to be in the mix, but how to communicate with it? After some initial work making raw HTTP calls to Fedora’s REST API, we stumbled on “[eulfedora]”(https://github.com/emory-libraries/eulfedora), the phenomenal, intuitive, and joy-to-use python client for Fedora Commons 3.x. Eulfedora describes itself as:

eulfedora is a Python module that provides utilities, API wrappers, and classes for interacting with the Fedora-Commons Repository in a pythonic, object-oriented way, with optional Django integration.

eulfedora.api provides complete access to the Fedora API, primarily making use of Fedora’s REST API. This low-level interface is wrapped by eulfedora.server.Repository and eulfedora.models.DigitalObject, which provide a more abstract, object-oriented, and Pythonic way of interacting with a Fedora Repository or with individual objects and datastreams.

This description accurately and succinctly describe what eulfedora is, and does, but does not attempt to justify its existence or utility in the face of alternatives. It leaves out things like handling encoding, large file streaming, alerting user’s to identifiers (PIDs) that break Fedora’s PID rules, and the list goes on. Like other language specific API wrappers out there, Eulfedora wrangles the specification of its target, and that of the target’s API, into a patterns and information reporting that is helpful to the user.

Eulfedora is integral to our digital collections platform, and has functioned almost without reproach for years now. Sure, we’ve encountered some strange edge cases, and even submitted an update or two, but 99.99% of the time, eulfedora does precisely what you expect, and how you expect it should happen.

And so, it’s unsurprising that eulfedora would be an inspiration for pyfc4; a python client to serve a very similar, but worth noting, not identical, role for Fedora Commons 4.x and its complete overhaul of philosophy and architecture.

One area that pyfc4 distinctly deviates from eulfedora is no current integrations or configurations specific to Django. That time may come, but not yet.

Overview

pyfc4 is envisioned as a lightweight python client for interacting with an instance of Fedora Commons 4.x. The idea was to map the majority, if not all, of Fedora’s API endpoints and functions to methods and patterns within pyfc4.

Similar to eulfedora, it imagines interacting with Fedora via a Repository instance, and then working on and managing Reource’s. It can be broken down into the following high-level models:

  • Repository
    • Transaction
  • API
  • Resource
    • ResourceVersion
    • RDFSource
      • Container
        • BasicContainer
        • DirectContainer
        • IndirectContainer
    • NonRDFSource

Very generally, if repo were an instance of Repository, it is passed around to Resource instances, which itself has a hierarchy of classes that align with the Linked Data Platform (LDP) that Fedora Commons 4.x is, itself, an implementation of.

pyfc4 supports transactions, versioning of resources, and the more mundane but important CRUD operations for resources. One example of the utility of pyfc4, to pick on the “U” from CRUD, would be updating resources.

Updates

Updates in FC4, if not overwriting the totality of a resource’s metadata (which is not preferred for a number of reasons), are performed by issuing PATCH requests containing a SPARQL update query. Now, like any convenience pattern or method from a client library, this is something quite doable, step-by-step, determining what triples have been added, removed, modified, building the SPARQL query with the correct DELETE {}, INSERT {}, WHERE {} syntax. But you’re going to be doing this often. pyfc4 approaches this common task in the following way:

  1. when retrieving a resource, storing the current state of the RDF graph as self.rdf. _orig_graph
  2. all local changes to the resource’s graph – adding triples, removing them, etc. – is made to a copy of that graph at self.rdf.graph
  3. when it comes time to update the server with local changes, a diff is calculated between the original graph and the modified one, and converted into a SPARQL query that is sent to the server via a PATCH request

It’s much more convenient to type foo.update() than navigating those distinct phases each time, and a library like pyfc4 provides that convenience. It also provides a single point of interaction for updates that can be optimized and improved over time, instead of simliar, but not identical, methods for updating resources. Just one example, but an exmample nonetheless of why a client is helpful as opposed to making requests to a raw API endpoint.

RDF / graph interactions

Another area that pyfc4 might make one’s life easier, if not speedier, is interacting with the RDF payloads that FC4 provides for a resource. pyfc4 parses this graph – handling a variety of different RDF serialization formats for input or output – and provides some convenient ways to access those triples.

One particularly handy way is an object-like approach. pyfc4 parses the graph from a resource, and based on the URIs of the predicates, and stored prefixes, stores them by prefix with the objectcs accessible through dot notation.

For example, if foo is a retrieved BasicContainer resource:

# get ldp:contains 
In [6]: foo.rdf.triples.ldp.contains
Out[6]:
[rdflib.term.URIRef('http://localhost:8080/rest/foo/bar'),
 rdflib.term.URIRef('http://localhost:8080/rest/foo/baz')]

# In [8]: goober.rdf.triples.foaf.knows
Out[8]: [rdflib.term.URIRef('http://localhost:8080/rest/tronic/tronic2')]

# a more complete example of how this trickles through
# add namespace
In [9]: foo.add_namespace('licorice','http://licorice.org#')

# add triple
In [10]: foo.add_triple(foo.rdf.prefixes.licorice.best_color, 'black')

# update
In [11]: foo.update()
Out[11]: True

# now, object-like access to this triple parsed from graph
In [12]: foo.rdf.triples.licorice.best_color
Out[12]: [rdflib.term.Literal('black', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string'))]

Though pyfc4 is working nicely in the lovely cleanspace of testing and spec fullfillment, I’m hopeful that putting it to work for actual repository tasks will reveal other areas that convenience patterns and methods can be put in place. Perhaps even distilling common patterns into processes or approaches that would be faster than raw, and numerous, API calls.

Okay, enough for now. While I could go on, a dentist appointment looms, and that is perhaps the best excuse to keep this short and sweet for now. In the meantime, more documentation, examples, and information can be found on the github page: https://github.com/ghukill/pyfc4.