A Sunday Morning

Wonders I, what the Sunday morning holds. I know that I hold a cup of coffee in one hand (well, it’s on the table as I type this).

What is the draw of self-reflective writing? While others are generating useful content for the world, I’m endlessly intrigued by the workflow. In fact, it was workflow thinking that occupied my Friday afternoon.

Had the distinct pleasure of attending my 5th or 6th Mid-Michigan Digital Practitioners meeting. One of the conversations was about workflows, and I realized I was chomping at the bit to share workflows we’ve worked on for ingesting objects into our instance of Fedora Commons, our pipelines for running materials through Archivematica and fitting them with descriptive metadata from ArchivesSpace, or even just decision trees for deriving JP2s from TIFFs (with particular thanks to Jon Stroop at Princeton for sharing some of their Kakadu “recipes” that we’ve repurposed).

When we first set out to replace an aging Digital Collection system with a then unknown platform, it was workflow models that eventually opened our eyes to Fedora Commons. I remember looking at countless diagrams of modern digital collections infrastructures, and noticing recurring components like “Fedora Commons”, “Solr”, “Blacklight”, etc. This was my preferred way of learning about what’s out, what’s hot, what’s not, what’s great, what’s old, what’s neat, what there is. What a wonderful way to learn about the world of things: observing their place in a grand workflow diagram.

And so, thinking of beginning a repository of sorts for workflows. I know this is happening in other areas like the Portland Common Data Model (PCDM), and/or around Project Hydra. I do believe we hone and refine our intuitions about these complex infrastructures by seeing artists’ renditions – and that’s what these works of art are.

To workflows!


JP2 Conversions

Wanted to share a couple of hilarious and haunting images from creating and converting JP2s with the Kakadu JPEG2000 library. Full disclaimer: it is not the fault of Kakadu. The fault most likely lies with the free-wheeling, high-octane JP2 conversion approach we had in the pipeline for a while.

Kakadu allows for tasks to get run over multiple processes, and this is good! We also run these JP2 conversions with Celery, a background task infrastructure for Python. It is also not Celery’s fault. Finally, we’re queuing up multiple images for this pixel gauntlet. That is most assuredly our fault.

The result: some pretty wild images. They are usually a combination of tiles and pixels from pictures nearby in the processing pipeline. At least that’s my working theory for now.
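If the working theory holds, one way to keep neighbors in the queue from bleeding into each other would be to give every conversion job its own scratch file and only move it into place once Kakadu finishes. Here is a minimal sketch of that idea, not what our pipeline actually does. The function names are made up for illustration, and the bare `kdu_compress` call (with `-i`, `-o`, and `-num_threads`) leaves out whatever recipe flags a real conversion would carry. In our setup something like `convert` would be the body of a Celery task.

```python
import os
import shutil
import subprocess
import tempfile
import uuid


def scratch_path(final_jp2_path, scratch_dir=None):
    """Return a unique scratch file path for a single conversion job."""
    scratch_dir = scratch_dir or tempfile.gettempdir()
    stem = os.path.splitext(os.path.basename(final_jp2_path))[0]
    # The random suffix keeps two jobs from ever sharing an output file.
    return os.path.join(scratch_dir, f"{stem}-{uuid.uuid4().hex}.jp2")


def build_kdu_command(tiff_path, jp2_path, threads=1):
    """Return the argument list for one kdu_compress run (recipe flags omitted)."""
    return [
        "kdu_compress",
        "-i", tiff_path,
        "-o", jp2_path,
        "-num_threads", str(threads),
    ]


def plan_conversion(tiff_path, final_jp2_path, threads=1):
    """Describe one job: the command, its private scratch file, and the final path."""
    scratch = scratch_path(final_jp2_path)
    return {
        "command": build_kdu_command(tiff_path, scratch, threads),
        "scratch": scratch,
        "final": final_jp2_path,
    }


def convert(tiff_path, final_jp2_path, threads=1):
    """Run one conversion and move the result into place only on success."""
    plan = plan_conversion(tiff_path, final_jp2_path, threads)
    # check=True raises if kdu_compress exits with an error, so a failed run
    # never reaches the move below.
    subprocess.run(plan["command"], check=True)
    shutil.move(plan["scratch"], plan["final"])
    return plan["final"]
```

Keeping `plan_conversion` separate from `convert` means the command and paths can be inspected or logged without Kakadu installed.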


The Spreadsheet View

So I’ve been watching the EXCELLENT Collections as Data 2016 conference live stream all morning, and it’s really got the wheels going.

And the wheels were already going. A few weeks ago – when I finish corralling my thoughts, perhaps I can link to them here – I attended a workshop in Maryland about Image Processing and Reunification.

These events, and the natural and mysterious evolution of ideas, have conspired to really hit home the idea of Collections as Data.

Doesn’t stop there. Thomas Padilla, a former nearby colleague at MSU and now in California I believe, also shared just yesterday an IMLS grant they had gotten funded, “Collections as Data: Conditions of Possibility”.

I’m also serving on a committee about academic, R1 library collections.

And there’s no end in sight.

So, collections as data? What does that mean?

We do our best here in Digital Publishing and with our Digital Collections to push the envelope of preservation and access, challenging ourselves to align digital objects in ways that will send them flying into the masses’ outstretched arms like mailbags on passing trains.


If I’m going to bury the lead, might as well throw one more blanket on the pile. I’ve also been working on a connector between the Python ORM Peewee and DataTables for another project (which I hope to share at some point). As such, the scary efficient and well-understood mechanics of a searchable, server-side processing spreadsheet have been on the brain.

So here’s the lead:

What about a spreadsheet-like view for digital collections?

You get it all. Thumbnails. Titles. Descriptions. Metadata. Filtering. Sorting. Speed. Search results already as structured data. Finesse. Fireworks.

Onwards and upwards! Putting the feelers out for a Python-based Solr-DataTables connector, and we’re hoping to wire up just such an interface soon for our front-end.
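To make the idea a little more concrete, here is a rough sketch of what the heart of such a connector might look like: translate the parameters DataTables sends in server-side processing mode (`draw`, `start`, `length`, `search[value]`, `order[0][column]`, `order[0][dir]`) into Solr query parameters (`q`, `start`, `rows`, `sort`), then reshape Solr’s JSON response into what DataTables expects back. This is not an existing library. The function names, the field list, and the `max_rows` cap are all assumptions for illustration, and it passes the search box text straight through as `q`, which a real connector would want to escape or route through a proper query parser.

```python
def datatables_to_solr(dt, sortable_fields, max_rows=100):
    """Translate a flattened DataTables server-side request into Solr params."""
    term = (dt.get("search[value]") or "").strip()
    rows = int(dt.get("length", 10))
    # DataTables sends length=-1 for "show all"; cap it rather than ask Solr
    # for everything.
    if rows < 0 or rows > max_rows:
        rows = max_rows
    params = {
        "q": term if term else "*:*",
        "start": int(dt.get("start", 0)),
        "rows": rows,
        "wt": "json",
    }
    col = dt.get("order[0][column]")
    if col is not None:
        idx = int(col)
        # Only sort on columns we know map to a Solr field.
        if 0 <= idx < len(sortable_fields):
            direction = "desc" if dt.get("order[0][dir]") == "desc" else "asc"
            params["sort"] = f"{sortable_fields[idx]} {direction}"
    return params


def solr_to_datatables(dt, solr_json, fields, records_total):
    """Reshape a Solr JSON response into a DataTables server-side response."""
    response = solr_json["response"]
    rows = [[doc.get(field, "") for field in fields] for doc in response["docs"]]
    return {
        # Echo draw back as an int so DataTables can match request to response.
        "draw": int(dt.get("draw", 0)),
        # Size of the whole collection, supplied by the caller
        # (for example from a separate *:* query).
        "recordsTotal": records_total,
        "recordsFiltered": response["numFound"],
        "data": rows,
    }
```

With `fields = ["id", "title"]`, a DataTables request for the third page of ten results sorted by the second column descending becomes a Solr query with `start=20`, `rows=10`, and `sort="title desc"`, and the search results come back already shaped as rows for the table.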