CNI Spring 2014 notes

May 9, 2014
Photo: “arch” by Jason Mrachina (flickr)

A bit late, but below are my session notes from the most recent CNI meeting. As always, editorial comments are generally in italics.

Fostering A Graduate Research Community With Digital Scholarship Programs And Services
Andrew Bonamici, Karen Estlund, University of Oregon

They’ve established a certificate program: New Media and Culture Certificate. 24 students enrolled from 10 different departments. The goal is to provide both theoretical and practical components. It has a program director, a fairly aggressive marketing approach (blog, Twitter, etc.), and is trying to position itself as a recruiting tool for potential graduate students.

How does the library support it? For one, it aligns with their strategic directions, but they can also provide facilities, content, tools and systems, and people who are willing to help.

One thing the NMCC doesn’t do is extend the time to degree. It does have some challenges. On his slide about these, the first two began with “decentralized” (both human resources and infrastructure). Strategic planning is hard when you cross units that have their own plans. There’s also a lack of experts and they are stretched thin (“great starters, shallow bench” to apply a sports metaphor).

The libraries have done a lot of assessment to determine what students need. She noted that LibQUAL doesn’t really give an organization enough specific data or feedback to drive the development of new offerings, so you have to ask and engage students in this analysis. They have a Digital Scholarship Center that has the following main goals and activities:

  • developing a DS (digital scholarship) curriculum – credit courses (intro to DS – offered twice and it has filled up; have also offered data management/TEI/coding and seen strong demand)
  • supporting engagement – they have a Sandbox and Lab created by opening up a space into which they put seating, monitors, equipment, stuff people can try out (somewhat successful; haven’t advertised since they lack staff support)
  • consulting – they work with graduate students and help them with fairly basic questions: how do I host a site long-term, what’s cPanel, what’s GitHub, etc.
  • community – four coffee machines and a tardis; space for groups to meet; hosting the NMCC open houses
  • community – creating “graduate affiliates” that have office space and meet weekly; they assign mentors to these students that help them learn about tools and technologies (the mentors are not necessarily U of O people)
  • community – publishing (people need blogs and outlets)
  • alignment – bringing it all back to the NMCC and supporting that program (and bringing it into the library)

One of the first questions was how the courses were listed (as library courses or from other departments). Their response is that they are able to list courses under their own name (but get no income for enrolment).

Visualizing Temporal Narrative
Nora Dimmock, University of Rochester

To paraphrase one of her points rather abruptly, she connected the practice of close-reading, as articulated by Prof. Joel Burges, to ‘reading’ television. It’s a way to encourage the reader to pay attention to shot changes and temporal elements, and add annotations and other metadata. It takes what one knows to be true of television–that technology has changed the visual style and presentation–and makes it objectively demonstrable. She showed various visualizations: network visualizations, graphs, charts.

The course that Burges taught around television narratives was successful, so they created a course that applied some of the same ideas about narrative temporality to a book (Down in the Chapel: Religious Life in an American Prison). Students then had some choices about how they could visualize the narrative; the results are divergent. She noted the various tools students used, ranging from R to Photoshop.

Use Altmetrics To Uncover The Hidden Scholarly Dialogue
Andrea Michalek, Plum Analytics

She surveyed the problems with traditional metrics, noting that citations lag well behind actual impact. Everyone is active online, which she demonstrated with the sheer volume of data that arises on the Web every day. She called the impact of scholarship on the Web “scholarly data exhaust,” as in leaving an exhaust trace (it seems to have no value, but it does as an indicator). They define five categories of altmetrics: usage, captures, mentions, social media, and citations. Captures go beyond viewing and show engagement, as in putting an article in Mendeley. Social media includes tweets, retweets, likes, +1s, etc. Citations means more than just journal citations (e.g.- patents). She noted that only captures, mentions, and social media are truly new; usage and citation have been with us for a long time.
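The five-category scheme is easy to sketch as a simple classification. The category names come from the talk; the example signals under each category are my own illustrations, not Plum’s actual source list:

```python
# The five metric categories described in the talk; the example signals
# under each category are illustrative assumptions, not Plum's own list.
CATEGORIES = {
    "usage": ["downloads", "views"],
    "captures": ["Mendeley saves", "bookmarks"],
    "mentions": ["blog posts", "news stories"],
    "social media": ["tweets", "retweets", "likes", "+1s"],
    "citations": ["journal citations", "patent citations"],
}

# Per the talk, only three of the five are genuinely new metrics;
# usage and citations are the traditional ones.
NEW = {"captures", "mentions", "social media"}
TRADITIONAL = set(CATEGORIES) - NEW

def category_of(signal):
    """Find which category a given signal falls under, if any."""
    for category, signals in CATEGORIES.items():
        if signal in signals:
            return category
    return None
```

The point of keeping the categories separate, as she argues later, is precisely that they should not be collapsed into a single score.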

She then ventured into turf she labelled “how does this help me” and broke it down into three categories: performing research, funding research, and publishing research. In the first category she noted that it’s a new way to set benchmarks and demonstrate impact. The question is, do the right people at your institution care and/or know how to assess this data?

As a funder, it’s a way to establish ROI and to make decisions about future grants. Again, will lots of eyeballs outweigh the article in Nature? It’s hard to imagine that, for many evaluators, we have reached that point.

She noted that in addition to her product (PlumX), there are other tools and products emerging: ImpactStory, Altmetric, PLOS. Altmetric belongs to Macmillan, while Plum was acquired by EBSCO. This ownership pattern should be something of a warning sign for us: these are services we’re going to be asked to sign up for.

She gave a demonstration, and it’s pretty neat, but it seems pretty hard to contextualize for an evaluator, i.e.- someone who needs to draw a conclusion about the effectiveness or impact of a researcher. She did note that it is a more effective way to Google oneself. It does stand to reason that if an institution were to sign on, then its particular visualizations would begin to make sense to deans, chairs, and tenure committees.

The work being done around ORCID, VIVO, etc. all makes it easier to use Plum, since a lot of the work has been done in the existing profiles. Plum pulls that all together, along with any relevant identifiers, e.g.- DOI, PubMed ID, arXiv ID, handle, etc.
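That aggregation step can be pictured with a toy sketch: records harvested from different profile sources are merged whenever they share an identifier. The records and source names below are hypothetical, and Plum’s actual matching logic was not described in the talk (one real detail: DOIs compare case-insensitively):

```python
# Toy sketch: pull one researcher's outputs together across profile
# sources by shared identifier. Records and sources are hypothetical.
RECORDS = [
    {"source": "ORCID profile", "doi": "10.1000/Example.123"},
    {"source": "VIVO profile", "doi": "10.1000/example.123", "pmid": "999999"},
    {"source": "institutional repository", "arxiv": "1234.5678"},
]

def merge_records(records):
    """Group record sources under a canonical identifier key.

    DOIs are matched case-insensitively (the DOI system treats them
    that way); otherwise fall back to arXiv ID, then PubMed ID.
    """
    merged = {}
    for rec in records:
        key = (
            rec.get("doi", "").lower()
            or ("arxiv:" + rec["arxiv"] if "arxiv" in rec else "")
            or ("pmid:" + rec.get("pmid", ""))
        )
        merged.setdefault(key, []).append(rec["source"])
    return merged

merged = merge_records(RECORDS)
```

Here the two DOI variants collapse into one work with two sources, while the arXiv-only record stays separate.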

They are often asked about avoiding gaming and bias. Their answer is essentially not to boil it down to one count or metric (e.g.- Klout). They don’t deliver a score, as does Altmetric, since they believe it shows a bias. With their tool, one can choose what data to compare and which articles or objects should be considered. They also link through to the original source, where and when possible.

1914-1918 online: International Encyclopedia of the First World War
Oliver Janz, Nicolas Apostolopoulos, Freie Universität Berlin

Always a pleasure to see the “DFG session” at CNI. Janz pointed out that his talk was more or less at their urging and direction, and I think it’s a good thing that DFG has this perspective on putting their supported projects out into the world.

Canada has some involvement on the editorial board, which is nice to see. There are a couple dozen countries involved overall, so it’s a truly collaborative effort. The more than 1,000 authors, in fact, represent 54 countries in sum. This makes it the largest international effort in the field of World War I studies. He noted that some authors (historians) struggle with English, so the project provides translation and editing support.

The scope includes collections, such as those at the Imperial War Museum, various institutes that create research on the topic, such as the Orient-Institut in Istanbul or various DHI (Deutsches Historisches Institut – German Historical Institute) sites. It also includes existing projects, such as one in Switzerland at the University of Zürich that treats the involvement of Switzerland in the war.

Janz offered a quick summary of why WWI can be considered a global war, and how the project wants to cover that entire scope (including Africa, related conflicts such as the Turkish war of liberation, etc.). They even want to include the neutral countries, since given the economic impact of the war the concept of neutrality is somewhat moot. For example, they’re including Latin America, which is often ignored entirely in standard histories and texts (as he put it, it’s not even a footnote).

The structure includes regional survey articles, thematic survey articles, regionalized thematic articles, and the encyclopedic entries (on discrete topics, e.g.- a type of machine gun). These are all interrelated and connected, of course. In terms of bulk, the last category (encyclopedic entries) dominates, but they do plan to have 1,500 articles, so the scope of the project is vast, ‘crazy’ as he called it.

Apostolopoulos covered the more technical aspects of the project, such as how the project is managed and the tools/techniques in use. The project includes a variety of media types, as one would expect, and the article is a composite of various types that are assembled on the fly. They use a common metadata scheme that tracks such things as author data but also data from the editorial office; the Bavarian State Library adds a great deal of metadata to support harvesting and integration (LCSH, BSB-DDC, Rameau, etc.). They have a project-specific taxonomy, from which he showed an excerpt. It includes 321 total terms that can be assigned over three levels (only eight at the top level). He then showed a detailed example, which graphically demonstrated how interlinking within the encyclopedia happens along thematic lines.

To process their articles, they use Confluence, JIRA, SMW (Semantic MediaWiki), and Zotero. They conceptualise articles and run author discussions in Confluence, use JIRA to manage the editorial workflow, Semantic MediaWiki to do editing and indexing, and Zotero to support publication (entering bibliographic entries). The process follows four basic steps: plan, write, insert, and enrich. His slides gave more detailed descriptions that I won’t try to capture in notes. He noted that they are pretty much working with Word as the main authoring tool. They don’t support TeX or other more exotic tools, since these are not widely used by their authoring pool.

This is the first time I’ve seen a description of a project that is using Semantic MediaWiki to assemble articles. As he put it, SMW allows the creation of additional classes for their content. Seems like a powerful tool, but difficult to capture from a five-minute gloss. The main point he was making is that it’s flexible. SMW has an extension that allows direct importation of bibliographic data from Zotero, which saves time and helps with standardizing the format.

Articles, in a last step, are enriched with multimedia, and authors are then asked to approve the final form of the article before it appears online. As he put it, all of this takes time and money, so it would be interesting for them to know if there are better ways to do some of this.

He provided a tour of the user interface, noting that in addition to articles one can see spatio-temporal visualizations. In other words, there’s taxonomy-based navigation and also “ontology-based” navigation. He did note, however, that they don’t have an ontology of their own.

Practical Work in Linked Data Using Digital Collections: Unleashing the Expressivity of Data
Silvia Southwick, Cory Lampert, University of Nevada, Las Vegas

She began with a definition of Linked Data and a description of the current environment, where linking between collections is poor and based on visible URLs. She also ran through the benefits of Linked Data, pointing out that there is a clear case for adopting it, but not much guidance in terms of pathways to implementation.

UNLV sees the merits of getting involved in this work and created a project to get started. In phase one, they cleaned their data and exported it from CONTENTdm. The export goes to a spreadsheet and is pulled into OpenRefine in the second phase, where it’s prepared, reconciled, supplemented with triples, and exported as RDF. In the third phase, they use Mulgara or Virtuoso to publish the data.
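The spreadsheet-to-RDF step can be sketched in a few lines. The talk showed this happening inside OpenRefine rather than in code; the field names, base namespace, and subject URI below are my own placeholders, with Dublin Core terms standing in for whatever predicates UNLV actually chose:

```python
import csv
import io

# Hypothetical sample of cleaned metadata exported from CONTENTdm;
# field names and URIs are illustrative placeholders.
SAMPLE = """id,title,subject_uri
photo001,Las Vegas Strip at night,http://example.org/subjects/las-vegas
"""

BASE = "http://example.org/collection/"   # placeholder item namespace
DCT = "http://purl.org/dc/terms/"         # Dublin Core terms

def row_to_triples(row):
    """Turn one spreadsheet row into N-Triples statements."""
    s = f"<{BASE}{row['id']}>"
    return [
        f'{s} <{DCT}title> "{row["title"]}" .',
        f"{s} <{DCT}subject> <{row['subject_uri']}> .",
    ]

triples = []
for row in csv.DictReader(io.StringIO(SAMPLE)):
    triples.extend(row_to_triples(row))

print("\n".join(triples))
```

The resulting N-Triples file is exactly the sort of artifact one would then load into a triple store such as Mulgara or Virtuoso.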

Reconciliation involves comparing one’s data to a controlled vocabulary. It can bring better and more data back into your base records. They showed a quick screenshot example of using OpenRefine to achieve this. As she noted, OpenRefine runs as a server, so it can communicate with external datasets to do this.
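In miniature, the reconciliation step looks like the following. The vocabulary here is hypothetical; real reconciliation services (such as the ones OpenRefine talks to) return scored fuzzy matches rather than the exact lookups shown:

```python
# Toy reconciliation: match local subject strings against a controlled
# vocabulary to pull back canonical labels and URIs. The vocabulary and
# URIs are hypothetical; real services do scored fuzzy matching.
VOCABULARY = {
    "hoover dam": ("Hoover Dam (Ariz. and Nev.)", "http://example.org/auth/001"),
    "las vegas": ("Las Vegas (Nev.)", "http://example.org/auth/002"),
}

def reconcile(local_term):
    """Return (canonical label, URI) for a local term, or None."""
    return VOCABULARY.get(local_term.strip().lower())

subjects = ["Hoover Dam", "las vegas ", "unlisted topic"]
matches = {term: reconcile(term) for term in subjects}
```

The payoff is the second element of each match: a URI that can be written back into the base record, turning a local string into a link out to the wider graph.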

Of course, once one has created RDF files with OpenRefine, they need to be published somewhere, i.e.- in a triple store. The data is imported into Mulgara for this purpose. So what next? That is what they are now exploring. One area is visualization, where they’re using OpenLink Virtuoso Pivot Viewer, Gephi, and RelFinder.

OpenLink Pivot Viewer is useful for displaying images, which are selected via SPARQL queries. The results can be refined with facets. A query can become a de facto dynamic collection. RelFinder is effective for visualizing relationships, as the name might imply. Gephi makes very attractive network visualizations, as we all have seen in recent years.
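The “query as dynamic collection” idea can be shown in miniature. A SPARQL query like the one below selects image objects on a theme, and its result set behaves like a collection that updates as the data does; the predicates and URIs are illustrative, not taken from the talk, and the Python function simply mimics the same selection over in-memory triples:

```python
# A SPARQL query that defines a de facto dynamic collection of images
# on one subject. Predicates and URIs here are illustrative.
SPARQL = """
SELECT ?item ?title WHERE {
  ?item a <http://purl.org/dc/dcmitype/Image> ;
        <http://purl.org/dc/terms/title> ?title ;
        <http://purl.org/dc/terms/subject> <http://example.org/auth/002> .
}
"""

# In lieu of a live triple store, simulate the same selection over
# in-memory (subject, predicate, object) triples.
TRIPLES = [
    ("item1", "type", "Image"), ("item1", "subject", "auth/002"),
    ("item2", "type", "Text"),  ("item2", "subject", "auth/002"),
    ("item3", "type", "Image"), ("item3", "subject", "auth/001"),
]

def dynamic_collection(triples, wanted_type, wanted_subject):
    """Items matching both constraints, analogous to the query above."""
    typed = {s for s, p, o in triples if p == "type" and o == wanted_type}
    tagged = {s for s, p, o in triples if p == "subject" and o == wanted_subject}
    return sorted(typed & tagged)
```

Change the subject URI in the query and a different “collection” materializes, with no curation step in between; that is what makes the facet-driven Pivot Viewer display possible.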
