CNI Spring 2011 notes
After a several-year hiatus, I made it back to a CNI meeting and attended a number of interesting sessions. Below are some notes, with links to the posted presentation materials. My commentary on the talks is set in square brackets to distinguish it from the summary notes.
Information, Infrastructure, and the Internet: Reflections on Three Decades in Internet Time [slides]
Christine Borgman – UCLA
Borgman delivered the opening plenary lecture as this year’s recipient of the Paul Evan Peters Award. She opened with a mildly depressing observation: we may be reaching the end of the generative Internet. One example she gave was the app culture in which we now find ourselves, which allows creators to set restrictions and stifle innovation. She also noted that search is optimized for commercial interests.
Libraries used to be able to “mark it and park it” (i.e., buy collections and store them), but collections these days are far more dynamic. Alas, the dynamic world of information was not built around the idea of scholarship, where linear progress from point to point is helpful.
After giving a broad historical overview of how we have reached this point, she set out four “grand challenges” for data and its infrastructure:
- take back information retrieval
- engage the entire information lifecycle
- distribute infrastructure
- match policy to incentives
[These should seem familiar to anyone working in libraries, at least on the digital side, these days, and they are indeed truly vexing issues that consume much of our time and thought.]
With regard to information retrieval, she noted that we need to return to a sense of “aboutness” [other than from my own mouth, I hadn't heard that term since a cataloging class in library school] and “draw semantic connections” between objects. As a colleague also in attendance noted, this does not scale terribly well.
At one point, Borgman said that we need to “partner with research teams,” but whenever I hear that, I ask what it means. She said this with regard to engaging the entire information lifecycle, and in many ways I understand what she is saying, but in practice no one ever seems to have a concrete plan for how this actually happens. [The structure of academic institutions and the mechanisms of grant funding do not create as many openings for libraries to engage research as would perhaps be desirable.]
One problem with engagement is, of course, the lack of policies and incentives, which was Borgman’s fourth point. As she noted, data management/curation is expensive work, and without incentives, it tends to go off the rails. She noted that no one makes the case to researchers for why this work should happen, most importantly that it is a means, not an end. [Anyone who has watched an IR fail to take flight (and who hasn't?) knows this pain. Few institutions have taken any concrete steps to create policies, let alone incentives, so the few policies that do exist seem mildly punitive or onerous.]
Digital Humanities: A Natural Future for Academic Libraries
Thomas Wilson – University of Alabama
Wilson noted that they wanted to provide a faculty-centered service, and that they dreamt big (large video walls, for example). As he put it, entering this area of work leverages the natural affinity between libraries and the humanities. Collections have always mattered to the humanities, but now tools are necessary as well. One needs to engage faculty in conversation when starting such an initiative. They did this, asking faculty what digital humanities meant to them and how it affects their scholarship.
Ultimately, they seek a mix of personalities: people both new to and experienced in scholarship and the digital humanities. The impulse must come from the scholar (“we’re not manufacturing something for them”), but you do need certain staff available: designers, programmers, metadata experts, archivists, database admins, a project manager, a curator. [This is indeed a critical point. How does one avoid becoming a service center, where one performs work on demand?] The project remains the scholar’s project; the library consults and facilitates.
When talking about the space, he mentioned instructional and meeting space, which to me sounds less like a project space than a learning and event space. The photos he showed reinforced this impression: it did not look like a work/project space. So did a quote from a faculty user (a historian) that mentioned fostering teaching but no concrete project. Another user did note the technical expertise of library staff and the critical software provided by the center. [This last bit strikes me as a value-added component of libraries; we are good (or can be good) at managing IT infrastructure in a way that central IT is not. Our strength lies in taking care of niche needs rather than putting everything at the mercy of major systems and the strict rules needed to manage enterprise services, which we typically do not provide.]
Realizing Scalar Capacities to Transform Media Archive Scholarship [handout]
Craig Dietrich/Eric Loyer – USC
This was one of those talks that makes attending CNI worthwhile. They’re developing a tool called Scalar that, once publicly released, will offer a novel publishing platform; put simply, it facilitates the inclusion of multimedia materials in online scholarship.
While introducing the need for such a tool, Dietrich drew a humorous comparison between chain stores/McDonald’s and WordPress: both overgeneralize and are ill-suited to specific purposes. WordPress in particular is limiting and leads to “shoehorning,” i.e., shoving content into rigid frameworks. They showed how code customization builds a house of cards that other developers must then undo to suit their own purposes, and asked how one can fix this. Their somewhat pointed answer was to build on the semantic Web, not by way of inference but, as they put it, by adding “the useful stuff,” e.g., RDF, creating links via relationships.
The anatomy of a Scalar book consists of four elements: pages, media, paths, and tags. One feature is that a piece of media can be part of multiple paths, and one can jump back and forth between paths and randomly remix the book. There is no limit on how many paths an object can belong to. Paths intersect, so the possibilities for recombining content and overlapping paths are nearly unlimited. [When shown visually, this just looked cool, and one could really see the advantage.]
There are different views, where one can emphasize the text or the media, which changes how one views and processes the content. In their example, it allowed the professor to highlight student work and give it prominence.
They showed the annotation tool, which is quite flexible and impressive. With videos, for example, one can set an annotation at a specific time point, sparing readers the hunt and keeping them on track.
They refrained from offering a timeline for a beta release. [One can only hope it's sooner rather than later and that this isn't one of those projects that shows great promise but withers before it hits the street or gets co-opted by a commercial venture.]
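To make the “one object, many paths” idea concrete, here is a minimal Python sketch of a book as pages and media linked by ordered paths and by tags. All class and method names (`Node`, `Book`, `paths_through`, `next_step`, `tagged`) are hypothetical; this is not Scalar's actual data model or API, only an illustration of the many-to-many relationship described in the talk.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical model for illustration only; not Scalar's data model or API.


@dataclass
class Node:
    """A page or a media item."""
    slug: str
    kind: str  # "page" or "media"
    tags: set[str] = field(default_factory=set)


@dataclass
class Book:
    nodes: dict[str, Node] = field(default_factory=dict)
    # Each path is an ordered list of node slugs; one slug may sit on many paths.
    paths: dict[str, list[str]] = field(default_factory=dict)

    def add(self, node: Node) -> None:
        self.nodes[node.slug] = node

    def add_path(self, name: str, slugs: list[str]) -> None:
        missing = [s for s in slugs if s not in self.nodes]
        if missing:
            raise KeyError(f"unknown nodes in path {name!r}: {missing}")
        self.paths[name] = list(slugs)

    def paths_through(self, slug: str) -> list[str]:
        """Every path passing through this node: the reader's jump-off points."""
        return sorted(name for name, steps in self.paths.items() if slug in steps)

    def next_step(self, path: str, slug: str) -> str | None:
        """The node after `slug` on `path` (first occurrence), or None at the end."""
        steps = self.paths[path]
        i = steps.index(slug)
        return steps[i + 1] if i + 1 < len(steps) else None

    def tagged(self, tag: str) -> list[str]:
        """All nodes carrying `tag`, independent of any path."""
        return sorted(s for s, n in self.nodes.items() if tag in n.tags)


# Hypothetical example: one interview clip shared by two paths.
book = Book()
for node in [
    Node("intro", "page"),
    Node("interview-clip", "media", {"oral-history"}),
    Node("student-essay", "page", {"student-work"}),
    Node("archive-photo", "media", {"oral-history"}),
]:
    book.add(node)
book.add_path("oral-history", ["intro", "interview-clip", "archive-photo"])
book.add_path("student-work", ["student-essay", "interview-clip"])
```

Here `book.paths_through("interview-clip")` returns both path names, which is the point where a reader could switch from the oral-history path to the student-work path. Keeping paths as ordered lists of references (rather than copying content into each path) is what lets one object appear on any number of paths without duplication.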
Migrating to the Cloud: Pepperdine Libraries at Web Scale [slides]
Michael Dula/Gan Ye – Pepperdine University
Dula began his introduction of Pepperdine’s move from a traditional ILS to OCLC’s WMS (Web-scale Management Services) with a predictably negative overview of legacy OPACs. [They're an easy target, and there was nothing new here.] He then showed their WorldCat Local search results. [Nice interface, but does one need to pay money for this functionality? In other words, is the ILS worth the literal investment?]
He wanted to get out of the server management business: “we want to manage information, not hardware.” [I get the impulse here, but I think we are being overly hasty about outsourcing our server-based apps without doing due diligence regarding extended and long-term costs. Too often we burn the money saved by dropping a server on engineering ways around the host's policies and restrictions.]
Dula noted that a high degree of integration between library software components was desirable. [Isn't there a downside here if those components don't keep up over time? Dependency can be an albatross.]
[WMS is cheaper now; what happens when the price goes up, which it will? As an early academic adopter, isn't one getting an overly favorable price? This is a pattern we have seen before, and the marriage always goes awry or at best is reduced to a marriage of convenience. Besides, don't monolithic solutions seem a bit frightening? Think PeopleSoft: everyone uses it, no one likes it.]
They seem to have had many of the typical migration problems, e.g., exposure of internal notes. Also, in their legacy system (Voyager), a patron could be a member of multiple patron groups; WMS allows only one. [At this stage of the talk it occurred to me that this looks like a SaaS ILS, something others offer as well, and I wasn't seeing much of the Webscale promise.]
A lot of their implementation issues sound typical of migrating to a new ILS. Nothing new here; the vendor promises fixes.
Cataloging: here one sees some of the cloud benefits, e.g., there is no need to update holdings in OCLC, since one is working in that environment by default.
One choice they made was not to migrate historical circulation data to WMS. That seems like a loss of sorts if one wants to leverage the cloud; starting over from zero in a statistical sense is a real cost.
Dula referenced the planned integration between WMS and university accounting systems. [How likely is true integration? OCLC has plans to work with PeopleSoft, but for now it's a "promise." That seems like a hill on which many valiant troopers will sacrifice themselves.]
Dula: “It is a good thing to be in a partnership with a vendor.” [Isn't this partnership an illusion? For one, we are customers. Were we partners, we'd get to see the code. For another, one is a partner until a bigger fish comes along who then captures the vendor's attention. How is this not going to happen?]
Dula offered a good answer of sorts to my question about how development avoids being bogged down in the “we have to please all customers” quagmire, namely, that the open API allows customization in many areas. Perhaps overly optimistic, but there’s something there.
DiSC: Developing a Digital Scholarship Commons [handout]
Joan Smith/Rick Luce – Emory University
Luce opened by noting that the emergence of digital humanities/scholarship centers is something of a response to e-science, but for other disciplines. Great framing comment.
Emory received a Mellon grant to plan their process. Part of the reason for the grant was to create a “sense of there’s something here” to get potential collaborators interested and engaged. They also wanted to see what was out there and take the best of what was available, and, as part of the grant-funded work, to define the role of the library (actual and potential) in digital scholarship.
Key point: Wanted the commons to be “of the library,” not just “in the library.”
They discovered that many existing offerings take a vertical marketing strategy, i.e., they are limited to one or a small set of disciplines. Emory wants to build in sustainability, which is not an easy task. For one, how does one staff for the long term? One point of agreement among the various project consultants: librarians must be an integral part of the center.
Luce and Smith want to make libraries the hub of digital scholarship, not an incidental component. [Question: I agree, but then why do we need centers? Shouldn't this just be the work of the library as a whole? In other words, when do we disband the centers and simply integrate the services into the library?]
Emory has a curriculum in digital scholarship, as does Alabama. [This made me wonder what McMaster has, and I need to find out.]
Emory plans to fund seed grants for graduate students in the form of competitive, sponsored internships (I gathered the money will come from library funds, but may have that wrong). Alabama mentioned this, too, and it seems to be an important method for setting things in motion and establishing the library as a key partner.
As part of the grant, an Emory team visited five sites:
- CHNM at GMU – grant-funded, orientation is external, very focused on marketing and outreach
- UVa Scholar’s Lab – two-tiered: service and project-based work (something Emory wants to pursue); also a successful fellowship program; librarians’ role is more service-oriented
- CDRH at Nebraska – lots of staff, PhD students, more a research lab than a partnership between librarians and faculty; experimental in nature; build scholarly tools to support research; focused on thematic projects; funding from many university sources
- VSI (Visual Studies Initiative) at Duke – in the library, not of the library; siloed; a hub of activity, but not the hub of digital scholarship at Duke; good space ideas, but not a useful model for Emory in terms of implementation
- MITH at Maryland – research center, lots of grant funding, mostly about R&D rather than supporting digital scholarship at Maryland; externally focused; tool development, but not as partners on faculty projects; also in, not of, the library; houses a center for digital collections in its space, though that center is not part of the MITH organization
Challenges and open questions the team identified:
- The infrastructure challenge: “tension between leading-edge creativity & long-term sustainability,” compounded by multiple technologies and the “lack of common ‘tech stack’.”
- The need to establish transparency around projects through project management (e.g., training faculty to use project management).
- What to do with source code? Do we teach people to use version control? Is that our role?
- Transparency: one literally needs a Website that tracks what’s going on, not least to keep the campus informed.
Major goal: build a collaborative environment with flexible spaces and state-of-the-art computing, display, and interactive equipment.
During the Q&A, the following points emerged:
- They involved 20-30 people in space planning!!! Luce noted that while he initially found this excessive, it paid dividends.
- If one creates a space like this for graduate students, they will bring their faculty in – interesting way to view how one builds an active clientele.
- One key aspect: bringing projects to closure, both to avoid overly long commitments and to make room for new projects (“boot them out of the space” when a project wraps up).
Search Engine Optimization for Digital Repositories [handout]
Kenning Arlitsch – University of Utah; Patrick O’Brien – RevX Corporation
Goals: increase reach, increase visibility.
Arlitsch cited the well-known statistics showing how few users start their research with library sites. [It occurred to me in this context that if, for example, 7% use Wikipedia to start their research, perhaps we should do a better job of optimizing how library objects appear in Wikipedia (COinS, perhaps?). There are likely other ways of doing such things on interfaces more popular than ours, and not just in search results.]
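Whether COinS is the right vehicle is an open question, but the mechanism itself is simple: a COinS is an empty HTML `<span class="Z3988">` whose `title` attribute carries an OpenURL ContextObject in key/encoded-value form, which citation tools such as Zotero can pick up. The minimal sketch below assumes a book-like digitized object; the function name `coins_span` and all metadata values are made up for illustration.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(title: str, author: str, year: str, url: str) -> str:
    """Build a COinS <span> for a book-like object (hypothetical helper)."""
    context = {
        "ctx_ver": "Z39.88-2004",                 # OpenURL 1.0 ContextObject version
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",
        "rft.genre": "book",
        "rft.btitle": title,
        "rft.au": author,
        "rft.date": year,
        "rft_id": url,                            # points back at the repository item
    }
    # urlencode() builds the key/encoded-value string; escape() turns "&" into
    # "&amp;" so the string is safe inside an HTML attribute.
    return f'<span class="Z3988" title="{escape(urlencode(context))}"></span>'


print(coins_span("Maps of Utah", "Doe, Jane", "1901", "https://repo.example.edu/items/12"))
```

The span renders as nothing visible; its value lies entirely in being machine-readable wherever the object is cited.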
Part of SEO for digital repositories is satisfying natural donor curiosity and vanity; it’s not a bad goal at all to make their stuff findable.
They demonstrated that simple robots.txt errors lead to mass exclusion of our digitized content. [That this happens is nearly outright embarrassing. This gets back to basic Web management, but as someone who has seen Web management up close in libraries, it does not surprise me. I pointed out during the Q&A that at a previous institution we had a Google Search Appliance; besides giving us the ability to provide state-of-the-art search to the campus, it was a fabulous diagnostic tool for seeing firsthand where a Google crawler is thwarted by one's Web setup. Based on what we found, we made countless small tweaks and customized many paths.]
Chinese and Romanian harvesters came up during the talk and the Q&A as examples of crawlers that use up your resources as they harvest the metadata. [But why pick on them and not harvesters from Canada or France? Some could be legit, perhaps increasingly so. I am not naive and know that a lot of illicit traffic comes from certain nations, but I'm uncomfortable when it becomes blanket ethnic/national statements.]
O’Brien pointed out that descriptive metadata can be “basic Dr. Seuss stuff.” No need to be fancy or verbose. [Only someone who doesn't have a history of working with librarians could make it sound so simple, which, of course, it is.]
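One common way a robots.txt file blocks more than intended is that `Disallow` rules match by path prefix. The sketch below uses Python's standard `urllib.robotparser` to audit a list of URLs; the host, paths, and both robots.txt variants are hypothetical, assuming a site that only meant to hide a temporary staging directory.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt files for an imaginary repository host.
# Intent: hide only the staging directory /images-tmp/ from crawlers.
# Because Disallow matches by prefix, "/images" also blocks "/images/...".
CARELESS = """\
User-agent: *
Disallow: /images
"""

FIXED = """\
User-agent: *
Disallow: /images-tmp/
"""

# Hypothetical URLs: a digitized map image, a staging scan, an item page.
SAMPLE_URLS = [
    "https://repo.example.edu/images/maps/0012.jpg",
    "https://repo.example.edu/images-tmp/scan.tif",
    "https://repo.example.edu/items/maps/0012",
]


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that `agent` may not fetch under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    print("careless:", blocked_urls(CARELESS, SAMPLE_URLS))
    print("fixed:   ", blocked_urls(FIXED, SAMPLE_URLS))
```

Under the careless file, the digitized map is blocked along with the staging scan; the fixed file blocks only the staging scan. The standard-library parser does plain prefix matching, whereas Google's crawler also honors `*` and `$` wildcards, so a real audit should be checked against the behavior of the crawler one actually cares about.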