Morning Long Paper Session
An Entity-based Approach to Interoperability in the Canadian Writing Research Collaboratory
Susan Brown – U of Guelph, U of Alberta; Jeffery Antoniuk, Michael Brundin, John Simpson, Mihaela Ilovan – U of Alberta; Robert Warren – Carleton University
Proposing how applying linked open data to CWRC projects would facilitate “rudimentary interoperability.” Susan briskly described their entities and usage of authorities. Trying to balance two kinds of projects, those sophisticated DH projects that use well regularized metadata and those that use highly irregular data and metadata. The latter cannot be remediated, as she noted, but the goal is to make it interoperate with other sets.
These notes were taken using the online text editor Draft, which creates Markdown files that can be either exported or published directly to platforms such as WordPress. As always with my conference notes, I have attempted to put my editorial comments in italics.
Morning Long Paper Session
Organizational Practices in Digital Humanities Centers
Smiljana Antonijevic Ubois – Penn State University
She’s a research anthropologist, so this should be interesting …
She posed her talk as a problem: why digital humanities centres (DHC)? Gave a brief overview of origins in the 1980s, noting that the visions behind these unfold in practice. This influence is the goal of her research. She studied 23 institutions in Europe and the US with 258 participants. Methods included case studies, surveys, in-depth interviews, observation, etc. She visited 11 centres, some established and others just starting out. In common: physical space with 5-15 staff.
HASTAC was a great event. Well run, with worthwhile sessions from a range of perspectives. As always, my editorial comments are in italics where I remembered to do this. Hopefully it’s obvious as well where I forgot to add it.
Panel: Tales from the Library Basement: Doing Digital Humanities as CLIR Fellows
Digital Humanities at UC Santa Cruz
Started by noting how well hidden some of this can be. Used the walk to her office, down stairwells and through locked doors, as a metaphor for that. Her role is to do outreach, but she literally has to go out since people cannot get to her.
Showed a useful definition of DH: “Using digital resources, methods, and tools to do good transformative humanities research” (Lorna Hughes, at http://whatisdigitalhumanities.com/). That site, incidentally, shows a different definition each time it loads.
HASTAC 2015 was held at Michigan State University in East Lansing, Michigan. As always, I’ve tried to put my editorial comments in italics.
Connecting the Dots (opening plenary)
Scott Weingart, Carnegie Mellon
Used a visual model (a circle) to describe the extent of human knowledge. Noted that when doing a PhD, the idea is to nudge out a little bit from that circle and expand the scope. As he noted, while it may seem like a small contribution, it’s an “uplifting narrative.” Another way to think of it is that the knowledge is already known, just not to scholars, so what the scholar does is give it shape and form that adhere to the rules of research. Example: an anthropologist may ‘discover’ things about a given population, but the members of that community live those practices and traditions. I think of my own PhD research in this light. Unearthing, sorting, and creating a narrative are in their own ways about creating new knowledge.
This spring’s CNI moved west to lovely Seattle. As always, I’ve offset my metacommentary on the talks with italics. If you find these notes useful, or have further thoughts, please leave a comment.
What price Open Access?
Stuart Shieber, Harvard; Ivy Anderson, California Digital Library; Ralf Schimmer, Max Planck Digital Library
Shieber started with a thought experiment: what would it cost Harvard to pay APCs for all of the articles its faculty publish? Not surprisingly, this would be a staggering sum at ~$3,000 per article (8,000 × $3,000 = $24,000,000). Their spend on subscriptions was around $5,000,000, so clearly the APC route is not feasible.
Stepping out of the university context, he noted that in 2011 the publishing industry had $9.4 billion in revenue and produced 1.8 million articles, which works out to about $5,222/article. In that light, even the high cost of the APC route seems reasonable at $3,000/article.
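For anyone who wants to check the arithmetic, here is a quick sketch in Python. The figures are simply the ones reported in the talk as I noted them (the 8,000 articles per year is taken from the multiplication above); the variable and function names are my own and purely illustrative.

```python
# Figures as recorded in these notes; all are approximate.
ARTICLES_PER_YEAR = 8_000          # Harvard faculty articles (from the 8,000 x $3,000 figure)
APC_PER_ARTICLE = 3_000            # assumed average APC in dollars
SUBSCRIPTION_SPEND = 5_000_000     # approximate annual subscription spend in dollars
INDUSTRY_REVENUE = 9_400_000_000   # 2011 publishing industry revenue in dollars
INDUSTRY_ARTICLES = 1_800_000      # articles produced industry-wide in 2011


def total_apc_cost(articles: int, apc: int) -> int:
    """Total cost of paying an APC for every article."""
    return articles * apc


def revenue_per_article(revenue: int, articles: int) -> float:
    """Average industry revenue attributable to a single article."""
    return revenue / articles


harvard_apc_total = total_apc_cost(ARTICLES_PER_YEAR, APC_PER_ARTICLE)
apc_to_subscription_ratio = harvard_apc_total / SUBSCRIPTION_SPEND
industry_cost = revenue_per_article(INDUSTRY_REVENUE, INDUSTRY_ARTICLES)

print(f"APC total for Harvard: ${harvard_apc_total:,}")
print(f"APC total vs. subscriptions: {apc_to_subscription_ratio:.1f}x")
print(f"Industry revenue per article: ${industry_cost:,.0f}")
```

So the APC route would cost one institution nearly five times its subscription spend, even though $3,000 is well under the industry-wide revenue per article.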
My editorial comments are in italics.
Improving the Odds of Preservation
David S. H. Rosenthal, LOCKSS
Various studies have shown that large portions of the digital world are not archived: fewer than 50% of the journals we hold are preserved, most content linked from e-theses is no longer available, etc. He refers to this as the ‘half-empty archive’ and notes that the bad news is that even this is overly optimistic; it’s actually worse. We tend to prefer archiving information that’s easy to access and presents no technical hurdles, e.g., archiving Elsevier’s output isn’t doing anything terribly useful since it’s well-situated content. In other words, we do not skew our activities toward risk. Put simply, large, obvious, and well-linked collections of information are more likely to be preserved, while all of the smaller yet critical portions go unpreserved.
More issues: we look backwards, not forwards. In other words, we prefer books and journals as preservation objects to more modern forms of information such as social media output and Web content in general. Dynamic and ephemeral content has little chance of being preserved.
Under the Hood with OpenStack
Steve Marks, Amaz Taufique, ScholarsPortal
Showed the specs for the hardware being used for the Ontario Library Research Cloud. McMaster is part of this project, so I’ve seen these specs before and wasn’t taking close notes during this stretch. It’s a ton of hardware, suffice it to say, with the goal being a 1.25 PB array.
The software layer is based on OpenStack. Showed a graphic that explained how this works. Key to the design is that there is no central database; the whole setup is also hardware agnostic. When an object is written, it must reach two nodes for the write to succeed, but in testing it was common for all three nodes to write immediately, even with large and complex transfers. If only one node writes, an error is returned. Once written, objects are replicated across the other nodes. Amaz also showed what happens when a node becomes unavailable: objects are written to handoff nodes until it recovers.
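To make the write rule concrete, here is a toy sketch in Python of a two-of-three quorum write with a handoff fallback. This is not OpenStack code or its API; `Node`, `write_object`, and `QuorumWriteError` are hypothetical names of my own, and I am assuming that a copy landing on a handoff node counts toward the quorum. Background replication and the hand-back when a node recovers are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical storage node; not an OpenStack class."""
    name: str
    available: bool = True
    objects: dict = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> bool:
        """Store the object and report success; fail if the node is down."""
        if not self.available:
            return False
        self.objects[key] = data
        return True


class QuorumWriteError(Exception):
    """Raised when fewer than `quorum` copies could be written."""


def write_object(key, data, primaries, handoffs, quorum=2):
    """Write to each primary node, falling back to a handoff node when a
    primary is unavailable. Returns the names of nodes holding a copy.

    Assumption: handoff copies count toward the quorum. The sketch does
    not roll back copies already written when the quorum is missed.
    """
    written = []
    spare = list(handoffs)
    for node in primaries:
        if node.put(key, data):
            written.append(node.name)
            continue
        # Primary is unavailable: try the remaining handoff nodes in order.
        while spare:
            handoff = spare.pop(0)
            if handoff.put(key, data):
                written.append(handoff.name)
                break
    if len(written) < quorum:
        raise QuorumWriteError(
            f"only {len(written)} of {quorum} required copies written"
        )
    return written


# Three primaries and one handoff node, with one primary offline.
node_a, node_b, node_c = Node("node-a"), Node("node-b"), Node("node-c")
spare_node = Node("handoff-1")
node_b.available = False
print(write_object("thesis.pdf", b"...", [node_a, node_b, node_c], [spare_node]))
```

With one primary offline the write still succeeds because the handoff node takes the third copy; with two nodes down and no usable handoff, the function raises instead of reporting success.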
The initial pilot was done using GTAnet, which encompasses the three universities that participated in testing: UofT, Ryerson, and York. This testing was necessary to see what kind of traffic is generated during both routine and stress scenarios. Ultimately, there were four nodes, one each at Ryerson and York and two at UofT. The fourth was necessary to observe the aforementioned handoff (i.e., what occurs when a primary node is unavailable).