Choosing a framework for our v2 Digital Collections front-end
We are currently in the midsts of refreshing our Digital Collections front-end. It has been a workhorse for us, still functions, and still looks respectable, but the foundation is beginning to crack as we push the original design and corners of spaghetti code beyond their original visions.
- user login and authentication
- iteratively improved search
- full-text vs. structured metadata refinement
- improvement of facets
- collection browsing
- serials browsing and interfaces
- inter-linked learning objects
- introduction of hierarchical content types such as archival materials and serials
- and the list goes on…
I’m proud of what we’ve built, something that is remarkably usable and cohesive given the breakneck pace of change and learning that ran parallel. It has survived entire re-imaginings of how digital objects are structured, a full-fledged API on the back-end that powers it, migration of servers, introduction of vagrant provisioning, you name it.
But its time has come. As we push into more digital objects, we’ve started to notice some performance hits that are a result of inefficient JS tangles and approaches. Our initial approach was a “lightweight” JS front-end that relied heavily on AJAX calls to retrieve information from our back-end API, that was drawn on the page with jQuery-Mustache templating. We’ve made a handful of improvements that keep it humming along, but any substantial changes would require reworking a lot of ugly JS code. And I can say that, because we wrote it all.
The visual style is also feeling a bit dated, or perhaps if but only stale. It needs a refresh there too.
And there was the important issue of sustainability. We know the ins and outs of the JS code, but thar be dragons thar, and feels near impossible to document in a lucid fashion.
So, the time is right. We have at our disposal someone who is going to put together front-end wireframes that we can use to wire and implement. The next big decision: what kind of organization and/or framework for the front-end?
We spent a bit of time going round and round, discussing the pros and cons of emerging JS, Python, and other frameworks. It is worth noting, all while simulataneously congnizant that we may migrate to a more turn-key solution down the road, if projects like Hydra-in-a-Box provide a truly, and palatable kit-and-kaboodle option. Another goal, briefly alluded to above, is sustainability in a front-end; something that can be worked on, improved, fixed, and loved for some time.
I can’t believe I’m typing this, but we are starting to hone in on using a PHP framework. Considering Slim and Lumen at this point. Why a PHP framework? Why not Flask to augment the other python components in the stack?
For a combination of reasons.
First, PHP is a language commonly used here in the libraries. More people know it now, and though you could debate this a bit, it’s probable that anyone coming in later will at least be familiar with PHP. Perhaps you could say the same about Python, but as long as the website is PHP based, we’ll have people “in shop” who know PHP. Perhaps the same can’t be said for Python. And that’s important.
Second, we would like to keep front-end and back-end cleanly separated. At our initial wireframing meeting, the individual creating a working wireframe leveraged our quirky and undocumented API and created a working demo. That was amazing. It reinforced the idea of treating our collections as data, maybe even first and foremost. Front-ends will come and go, amaze and disgust, but our underlying, structured digital objects will remain. An organized API for access to those materials means a multitude of front-end interfaces are possible, and migration down the road is easier.
We also know ourselves well enough that if we created a python-based, Flask front-end, we would inevitably start importing libraries and models directly from the back-end Ouroboros ecosystem. While this may be programatically efficient in some ways, maybe faster in others, it would muddle clean lines between our back and front ends that we would like to maintain.
And so, PHP is looking good. We still get the following from a PHP framework:
- URL routing: as easy to return machine-readable data as rendered pages
- built-in templating: likely with syntax nearly identical to Jinja in Flask
- models: make a nice connection between front-end models and API
- ORM: room to grow into user accounts, etc.
- conventions to organize our code
- and many more these tired fingers haven’t yet gotten to
Undoubtedly there will be updates and twists in this adventure to a new front-end – no little one, being a complete rewrite of our digital collections API – but it’s exciting to have a path to explore at this point.
Archival Material Ingest Workflow
It has taken quite some time, but we’ve honed in on a workflow for ingesting archival materials from the Reuther library – aka, University Archives here on campus – into our Fedora Commons based repository. Turns out, the following is all there is to it!
Rest assured, there are more to those arrows than immediately meets the eye. Let’s dive in.
Our goal was to take digitized archival materials from the Reuther Library and provide preservation and access via our Digital Collections infrastructure. At a very high level, we were imagining one digital object in Fedora for each digital file coming from the archives.
But we were realistic from the get-go, it was going to be a much larger enterprise than file-by-file. How would we manage ingest and description at scale? To answer that, we need to look at the systems in play: Archivematica and ArchivesSpace.
Archivematica is a series of tubes. To be more precise, micro-services. Archivematica takes groups of files, runs them through a host of preservation micro-services – virus scanning, file format normalization, checksumming, etc. – and ties them up together in a tidy bow with an over-arching METS file. Archivematica is inspired by the venerable OAIS model (Open Archival Information System), and as such, speaks in terms of AIPs, SIPs, and DIPs.
We are using Archivematica as means to get actual, discrete digital files from the Reuther into a format that we can batch process them for ingest. Additionally, we get all the preservation friendly treatment from the micro-services, and begin a paper trail of metadata about the file’s journey. It’s quite possible we’ll dig deeper into affordances and functionality of AM (it’s time to shorten), but for now, it’s primarily a virus checking, checksumming, METS writing, server/building spanning networked pipeline for us. And it’s going to be great!
The next dish in the cupboard is ArchivesSpace. ArchivesSpace, save a passionate and exploration here of just what ASpace is and represents, it’s safe to think of it as the next generation of archival software used to handle description, management of information around materials, discovery, and much more. Our partner in crime, the Reuther Library, is slowly switching to ASpace to handle their description and information management. It is a database driven application, that still also exports finding aids for archival collections in EAD. We’ll be using those, with plans to leverage the API once deployment has settled down a bit.
Our involvement with ArchivesSpace is limited primarily to our metadata librarian who takes a manifest of the files as processed by Archivematica, an EAD of descriptive and intellectual organization metadata about the collection as exported from ArchivesSpace, and creates a new METS file meant to enrich / augment the original Archivematica METS file.
I’ve perhaps nestled myself too deeply in the weeds here, so lets zoom out. We…
- take files from the archives via Archivematica
- these come with an AM generated METS file that represents the “physical” hierarchy and organzization of the digital files on disk
- we then take an EAD from ArchivesSpace that contains “intellectual” hierarchy and description about the materials, and synthesize a new METS file that represents the intellectual organization of the files - something we refer to as “AEM METS”, for “Archival Enrichment Metadata (AEM)”
- with the original digital files, AM METS, and AEM METS, we create bags on disk
- finally, ingest!
Where, and how, does this happen?
This occurs in an increasingly substantial corner of our adminitrative middleware, Ouroboros, called the “Ingest Workspace”. This Ingest Workspace, the intent of this blog post and which I’ve managed to bury pretty far down here, is where we take these collaborative bits of information and assimilate them into sensible digital objects we can ingest. It’s the green box in the diagram above.
This process has taken a considerable amount of time, as it’s a complex process! So much so, that wonderful folks at the Bentley Historial Library received a grant and to fund research into wiring these platforms together to aid in these kind of ingest workflows (as you dig down, the details are different, but the end goals share many similarities - moreover, their blog and presentations at conferences have been a huge help for thinking through these processes).
The difficult and complexity come, in large part, to reconciling physical and intellectual arangement of archival materials, or any materials for that matter. A quick and dirty example: a postcard from a friend is in a shoebox, in a drawer, in my desk. That is the stellar physical arrangement I have chosen. However, I may have it intellectually organized under meaningful materials –> postcards –> international. And that’s an easy example where the hierarchical levels align. How, then, might we digitize this item and provide access to a user, while also trying to contextualize the item within its intellectual and physical place?
We decided to drop one: physical. I should point out here that even within the “physical” hierarchy, we are actually often referring to digital and analog versions of the resource. The Reuther library has made the very wise choice of organizing their digital files where possible to mimick their physical hierarchy, which makes this considerably easier. But suffice it to say that we retain both, in form or another, such that we can work backwards and figure out where the original digital or analog version lives.
To wrap up, we are choosing to organize and contextualize the files primarily based on their intellectual arrangement as suggested by ArchivesSpace. Which, in fine fashion to finish, explains the need to interwine information from Archivematica and ArchivesSpace! This post comes on the heels of an early successful ingest of this workflow, expecting all kinds of interesting twists and turns as we proceed – newly digitized materials, updates to metadata, pointing back to ArchivesSpace (which we have in our back pocket with ASPace identifiers), etc.
Wonders I, what the Sunday morning holds. I know that I holds a cup of coffee in one hand (well, it’s on the table as I type this).
What is that draw of self-reflective writing? While others are generating useful content for the world, I’m endlessly intrigued with the workflow. In fact, it was workflow thinking that occupied my Friday afternoon.
Had the distinct pleasure of attending my 5th or 6th Mid-Michigan Digital Practioners meeting. One of the conversations was about workflows, and I realized I was chomping at the bit to share workflows we’ve worked on for ingesting objects into our instance of Fedora Commons, our pipelines for running materials through Archivematica and fitting with descriptive metadata from ArchivesSpace, or even just decisions trees for deriving JP2’s from TIFFs (with particular thanks to Jon Stoop at Princeton for sharing some of their Kakadu “recipes” that we’ve repurposed).
When we first set out to replace an aging Digital Collection system with a then unknown platform, it was workflow models that eventually opened our eyes and understanding about Fedora Commons. I remember looking at countless diagrams of modern digital collections infrastructures, and noticing reocurring components like “Fedora Commons”, “Solr”, “Blacklight”, etc. This was my preferred way of learning about what’s out, what’s hot, what’s not, what’s great, what’s old, what’s neat, what there is. What a wonderful way to learn about the world of things by observing their place in a grand workflow diagram.
And so, thinking of beginning a repository of sorts for workflows. I know this is happening in other areas like the Portland Common Data Model (PCDM), and/or around project Hydra. I do believe we hone and refine our intuitions about these complex infrastructures by seeing artists’s renditions – and that’s what these works of art are.