Digital Humanities 2016 Kraków

July 18, 2016

tags: digital humanities, libraries, linked data, open source, software

Wawel Cathedral

One interesting quirk of the DH conference is that so many talks are presented with a long list of collaborators in the program. It’s great to see collaborative work becoming more the norm. This isn’t new, but I seem to notice an increase year over year. I’ve chosen just to list the person(s) who presented to keep my notes shorter and easier to read.

Tuesday, July 12

Workshop – CWRC & Voyant Tools: Text Repository Meets Text Analysis
Susan Brown, U of Guelph; Stéfan Sinclair, McGill U; Geoffrey Rockwell, U of Alberta

Susan sketched a history of CWRC, which has its origins in the Orlando project (which dates from pre-XML days). I had no idea that CWRC used Islandora, but was happy to hear it. They’ve done custom module development for Islandora. Noted that a point of the workshop was to demonstrate how tools can be used in tandem, and are not always silos that don’t interoperate.

Unfortunately, the main server that powers CWRC went pear-shaped shortly before the conference, so it proved to be difficult to navigate and edit. Still, got a good overview of CWRC’s editing tool and its capabilities. Very keen to try it further when it’s released this fall in a hopefully stable version.

For Voyant, we worked from this tutorial. Generally speaking, I was too busy tinkering and learning to take notes. Interesting to note was that, aside from the CWRC server’s crankiness, Voyant also bogged down at times when we were all hammering on it (~35 people in the room). Given Voyant’s success and wide use, hopefully there will be some scaling so that it performs well under heavy loads.

One interesting choice made by CWRC was to use Islandora. Susan noted a variety of frustrations with Islandora, which is natural given how much they are demanding from the interface of a tool that has been primarily developed to serve a different and less demanding purpose, namely, that of a digital asset management tool for libraries and archives. I can understand the nature of this choice rather than opting to code up CWRC from the ground up, but it does seem that it may set limits on future development.

It was also a bit surprising that, during the Voyant portions of the workshop, their server slowed when being hit by a couple of dozen simultaneous queries. An internationally known and heavily used tool such as Voyant could benefit from more robust hosting. This got me thinking about what the connections are between humanities researchers and IT bodies on their campus and whether their needs are being properly addressed. When one compares the situation of Voyant, for example, which can be stressed by a single workshop, with the computing resources available to the sciences, the divide comes into clear focus.

Wednesday, July 13

Short Paper Session: Semantic Technologies 1

Publishing Second World War History as Linked Data Events on the Semantic Web
Eero Hyvönen, Aalto U

Started with the simple point that there is copious human consumable information on the Web related to WW2, but their project–WarSampo–is intended to publish it as linked data for machines. Their motivation is to prove Hegel wrong (cheekily), i.e.- that one can, in fact, learn something from history.

Another motivation is to draw materials out of public collections, the drawers in which people have items lying about related to their family’s experiences. Their project is to create infrastructure that enables this and saves downstream work by doing it right in the first instance. They have put together a network, gotten some funding, and created a prototype, but now they are working toward creating a sustainable ecosystem. They’re trying to extend Berners-Lee’s notion of five-star linked data to seven stars, so adding more utility.

There are about a dozen datasets from various sources in their prototype, ranging from diaries from the National Archives to OCR’d events from ~1000 history books. The geographic scope is limited to the Karelian theatre, so he points out that this is a very narrow scope compared to what it could become. They extended the CIDOC CRM to capture further event types. Their knowledge graph has >7 million triples. Showed the prototype portal, which offers various interfaces that leverage the linked data, e.g.- an event browser, biographical portraits, etc.
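To make the idea of events as triples a bit more concrete for myself, here is a minimal sketch in plain Python. It is not WarSampo's data or code: every ex: identifier is a made-up placeholder, and only the crm: class and property names are taken from CIDOC CRM. It shows how a portal view such as a biographical portrait could be assembled by following links from a person to events to places.

```python
# Hypothetical identifiers throughout; only the crm: names come from CIDOC CRM.
TRIPLES = [
    ("ex:event1", "rdf:type", "crm:E5_Event"),
    ("ex:event1", "crm:P7_took_place_at", "ex:place1"),
    ("ex:event1", "crm:P11_had_participant", "ex:person1"),
    ("ex:event2", "rdf:type", "crm:E5_Event"),
    ("ex:event2", "crm:P7_took_place_at", "ex:place2"),
    ("ex:event2", "crm:P11_had_participant", "ex:person1"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj, triples=TRIPLES):
    """Return every subject linked to `obj` by `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def places_for_person(person, triples=TRIPLES):
    """Follow person -> events -> places, as a biographical view might."""
    events = subjects("crm:P11_had_participant", person, triples)
    places = {
        place
        for event in events
        for place in objects(event, "crm:P7_took_place_at", triples)
    }
    return sorted(places)


print(places_for_person("ex:person1"))
```

A real knowledge graph of this size would sit in a triple store and be queried with SPARQL; the list of tuples is only there to show the shape of the data.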

Converting the Liddell Scott Greek-English Lexicon into Linked Open Data using lemon
Fahad Khan – U of Venice and Istituto di Linguistica Computazionale “A. Zampolli”

lemon = LExicon Model for ONtologies. Showed the details on some slides that one couldn’t really read well due to font size. Liddell Scott Lexicon was published in the mid-19th century and is still published in its ninth edition. Various versions, a full and two abridged versions; they chose to take the ‘middle’ version, although they plan now to work on the full Liddell.

Showed how complex entries can be. There’s a visual hierarchy, Roman numeral and Arabic numeral labels, etc. Their work was made easier because the Perseus project had already published the work in a standard format, TEI-DICT. Their two core challenges were deciding what TEI-DICT data to keep as well as how to include some of the data they had that didn’t fit the lemon model. To solve the latter, they created an extension to lemon that they called polyLemon that represents the nesting in the iLSJ.

Etymology Meets Linked Data. A Case Study In Turkic
Christian Chiarcos – U Frankfurt

Their main goal is to identify cognates, those inherited from ancestral languages or loan words. To find them, one considers two dimensions, semantic and phonological similarity. For the former, they use dictionary lookup; for the latter, they apply similarity filtering and ranking to the candidates.
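The filter-then-rank step can be sketched roughly as follows. This is my own illustration, not the method from the talk: plain string edit distance stands in for a proper phonological similarity measure, the threshold is arbitrary, and the word forms are placeholders rather than real Turkic data.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalise the distance to a score between 0.0 and 1.0."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def rank_candidates(word, candidates, threshold=0.5):
    """Drop candidates below the (arbitrary) threshold, then rank the rest."""
    scored = [(candidate, similarity(word, candidate)) for candidate in candidates]
    kept = [(candidate, score) for candidate, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


print(rank_candidates("abcd", ["abcf", "wxyz", "abcd"]))
```

In practice the candidates would first be narrowed by the semantic dimension, i.e.- only words whose dictionary glosses overlap would be compared at all.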

Showed the network graph of the Linguistic Linked Open Data, noting that most of the dictionaries represented use lemon. Walked a bit through lemon showing how they attempted to define their matches. They had to create some of their own sources, i.e.- created their own linked data editions; in other cases they used existing ones from LLOD.

Walked through examples of using SPARQL to tap multiple resources in series, which was a bit detailed to capture in notes, but made sense as he walked us through it. The point I got was that one hops between resources to follow lexical threads. Made multiple references to DBnary.

A Comparative Analysis of Bibliographic Ontologies: Implications for Digital Humanities
J. Stephen Downie, U of Illinois – Urbana-Champaign

Downie works with the HathiTrust, which is trying to move past the basic metadata model that has prevailed until now, i.e.- the 15 million works in the Trust are represented mainly by MARC records.

Their project EIEPHaT (missed the meaning of the acronym) is intended to assist with linking between large corpora, in this case HT and EEBO. Looked at four different ontologies: MODS, BIBFRAME, Schema.org, and FRBRoo. In many cases, it’s possible to map fields one to one, but in other instances properties and classes get mixed up, e.g.- birthdates. In Schema.org it’s just a date, in FRBRoo it is classed differently (he was moving fast so I missed the details).

Results?

MODS/MADS most descriptive
FRBRoo event-based model
BIBFRAME bridges those two
Schema.org – crossroads of the other three, focus on marketplace transactions

He went into further detail, describing their suitability and idiosyncrasies. Concluded by noting that all four do something well, but that none of them meets scholarly needs on its own, which is what they expected to find. Their work identified the need to:

formalize mappings
establish best practice and transformations
create clear, precise documentation (existing is “shockingly bad”)

Long Paper Session: Analyzing and using new media 2

Player-Driven Content: Analysing Textual Communications in Online Roleplay
Michelle Shade, Ben Rowles – Pennsylvania State U

Shade and Rowles are both undergraduate students, mentored by James O’Sullivan at the University of Sheffield. Gave a brief intro to MMORPG and Warcraft in particular. It’s not only the object of their research, they are also players.

Do players tend to roleplay in the chat session? No, it’s uncommon. They openly acknowledge that they are playing a video game, e.g.- referring to game play aspects or by making references to external topics. Some of the players, however, do choose to roleplay, sometimes even speaking in Elvish, which is an option in the game.

Why study roleplay? The language had not been quantitatively analyzed. Also, typically, it has been identified based on player input, i.e.- when players declare they are roleplaying. So they asked if the language differs from objective-oriented play and if so what its characteristics are.

To do this, they needed to acquire log files that the game can generate and that some players choose to generate and analyze. They made requests to guilds (player groups) to acquire log files. To clean the data, they used regular expressions to remove channel changes, timestamps, nicknames, etc. resulting in a cleaner transcript.
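For the cleaning step, a rough sketch of the kind of regular expressions involved might look like this. The log line format is my assumption (the presenters did not show theirs), so the patterns and the sample lines are hypothetical.

```python
import re

# Assumed line shape: "[HH:MM:SS] [Channel] Nickname: message" (channel optional).
CHAT_LINE = re.compile(
    r"^\[\d{2}:\d{2}:\d{2}\]\s*(?:\[[^\]]*\]\s*)?[^:\s]+:\s*(?P<text>.*)$"
)
# Assumed system notice for channel changes.
CHANNEL_CHANGE = re.compile(r"^\[\d{2}:\d{2}:\d{2}\]\s*Changed Channel:", re.IGNORECASE)


def clean_log(lines):
    """Keep only what players typed: drop notices, timestamps, channels, nicknames."""
    cleaned = []
    for line in lines:
        line = line.strip()
        if not line or CHANNEL_CHANGE.match(line):
            continue
        match = CHAT_LINE.match(line)
        if match and match.group("text"):
            cleaned.append(match.group("text"))
    return cleaned


sample = [
    "[20:01:02] [Guild] Player1: The tavern is quiet tonight.",
    "[20:01:09] Player2: lfg 2 more",
    "[20:01:15] Changed Channel: [2. Trade]",
]
print(clean_log(sample))
```

The cleaned transcript is then what gets fed to the word frequency, zeta, delta, and topic modeling steps.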

They applied:

most frequent words
zeta (exclusive words)
delta
topic modeling
sentiment analysis

RP has more nouns, adjectives, adverbs, prepositions, but non-RP has more verbs. Only RP has reflexive pronouns, and only non-RP has numbers in digit form, acronyms, and interjections (mostly swearing).

They can show that players who run two channels, one RP and one non-RP, use entirely different language. She ran through their sentiment analysis so quickly it was hard to capture details, but in essence it suggests that the players are performing as in a theatre group, i.e.- emotions flow in sync to create melodrama.

In sum, they found that RP and non-RP are both qualitatively and quantitatively distinct. There had been an assumption that this was the case, but their research shows that this is actually the case. It also demonstrates that this is actually a theatrical element added to the game, not just a narrative.

What Do Boy Bands Tell Us About Disasters? The Social Media Response to the Nepal Earthquake
David Lawrence Shepard – UCLA

Shepard is the lead developer at the Center for Digital Humanities at UCLA; he started by acknowledging his collaborators at several Japanese universities.

What do boy bands tell us? They tell us how to donate (One Direction did this). He asked: is this a music company making its brands look good, or the actual band members typing out tweets?

Framed the issue: people tweeting about disasters who are not involved are seen as just onlookers, and their tweeting as a form of vanity. Is there anything to this tweeting? Does it aid our understanding?

He noted that there’s no question that people who are directly impacted by disasters are helped by social media. There are algorithms that help first responders parse such tweets to useful ends. But opinions diverge on whether “onlooking” tweeting is of any use. Many are pessimistic, e.g.- Gladwell and Morozov, but a Pew study shows that tweeting does effect change by encouraging donations. But the overall research picture is split, as he showed by citing various studies. All agree: social media creates a lot of discourse about events.

Their work is to separate onlookers who are seeking engagement from those who are just building their brand, so to speak. Their methodology:

gather tweets
detect relevant tweets
detect onlookers’ interests
group users by general type (my gloss of his characterization)

To gather, they used the Twitter API to pull tweets that used the word Nepal, then pulled six months of tweets from onlookers. Used topic modelling to determine their interests. Clustered users by topic or their Twitter habits. These were interesting slides; nutshell: comparatively few users were really engaged with the topic (14%). Many in this group typically come to the topic with an agenda in tow: nationalist, sectarian, etc.
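A much simplified sketch of the detect-and-group steps follows. To be clear about the swap: the project used topic modelling and clustering, whereas this uses a bare keyword share per user, and the threshold, group names, and sample tweets are all invented for illustration.

```python
from collections import defaultdict


def topic_share_by_user(tweets, keyword="nepal"):
    """tweets: iterable of (user, text). Returns {user: share of tweets with keyword}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for user, text in tweets:
        totals[user] += 1
        if keyword in text.lower():
            hits[user] += 1
    return {user: hits[user] / totals[user] for user in totals}


def group_users(shares, engaged_threshold=0.5):
    """Split users into two hypothetical groups by an arbitrary threshold."""
    groups = {"engaged": [], "passing": []}
    for user, share in sorted(shares.items()):
        key = "engaged" if share >= engaged_threshold else "passing"
        groups[key].append(user)
    return groups


sample = [
    ("a", "Pray for Nepal"),
    ("a", "Donate to Nepal relief"),
    ("b", "Thinking of Nepal"),
    ("b", "lunch"),
    ("b", "new single out"),
    ("b", "tour dates"),
]
print(group_users(topic_share_by_user(sample)))
```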

Contextualizing Receptions of World Literature by Mining Multilingual Wikipedias
Sayan Bhattacharyya – U of Illinois – Urbana-Champaign

Gave some opening context about world literature in translation. What is at stake, why should we care? as he put it. It’s necessary to teach in translation because many students lack the facility to read in other languages, but doing so accepts that context and “cultural pragmatics” are lost in translation, among other drawbacks.

Wikipedia represents an interesting addition to this environment. There are often pages about translated works of literature. It extends what he described as established pedagogical practices. Its multilingual nature helps “counter epistemic privileging of English as the universal interlingua.”

Another contextual piece is that there is clear concern in the academy about the encroachment of global English into the domain of literature. There is a gap between the global and the local.

One goal of the project is to mine the discussions around texts that are discussed in multiple languages on Wikipedia, machine translate them, and then do textual analysis on this corpus. They also seek to create a portal that enables parallel viewing of the contents of various language Wikipedias. Ideally, they want to use eye-tracking to see how students actually use the content of the Wikipedia pages.

Short Paper Session: Crowd sourcing / engaging the public

Notes from the Transcription Desk: Modes of engagement between the community and the resource of the Letters of 1916
Neale Rooney – Maynooth U

Broke down the contributors to the crowdsourcing project in the title. Found that 70% of the contributors are men, but that 70% of the transcription is done by women (believe I heard this correctly). Blew through a great deal of statistics and survey results. Interesting result was that the most popular letter writer for transcription was Mary Martin, not one of the more traditional heroes of the uprising, e.g.- Michael Collins. Noted that, in general, the survey and the analytics supported each other.

36% of their transcribers come from higher education. Fairly heavily used in classrooms, too. Noted that many seem to work from a sense of civic duty, broadly defined.

In answer to a question about getting the public to do TEI, one of his collaborators (Susan Schreibman) noted that they chose not to tell people that it’s ostensibly hard, and they found that people tended to do a pretty good job. Some mistakes, but generally fairly accurate markup. She also noted that the work should be easy enough to do without training or extensive help.

Adding Value to a Research Infrastructure Through User-contributed ePublications
Catherine Emma Jones – CVCE, Luxembourg

They want their users to be able to take the resources on their site and curate them individually, adding their own research notes, etc. They acknowledge that their own internal scope isn’t broad enough, so users can add that breadth. Not all of the publications are in the public domain, however, so they created a toolbox that allows one to add their own notes and share it privately with a small group (say, a class) or with the world.

One user–male academic–has published 47 documents and 57% of the objects used have been used by him. He would like collections within his set of 47 documents; this feature does not currently exist.

Indigenous digital humanities. Participatory geo-referenced-mapping and visualization for digital data management platforms in digital anthropology
Urte Undine Froemming – Freie U Berlin

Showed some examples of anthropologists who provided film cameras to indigenous populations to allow them to film their own culture. She noted that this was an origin of what they are doing now in terms of enabling participation in anthropology.

Described some of the methods they used–sketch mapping (hand-drawn perceptual maps), scale mapping (basically, manual georeferencing)–with alpine populations to compare local knowledge with actual hazard maps. They then superimpose this on a digital map as layers, also including audio components involving personal narratives. Helps apply local knowledge to hazard and risk management.

Adding Semantics To Comics Using A Crowdsourcing Approach
Mihnea Tufis – Pierre and Marie Curie U, Paris 6

Begged us to allow him to say comics whether he was talking about sequential art, comics, or graphic novels for the duration of his talk to keep things simple.

Described the traditional way of reading comics: go to store, buy comic, take it home and read it. Now we have mobile comics. Collecting comics has two aspects. One is the passionate personal collector, but there are also now institutional collectors, such as Michigan State’s collection. There is now a fair amount of academic research attention, too.

He described many non-textual aspects of comics–motion lines, smell lines, shapes and edging of word bubbles, etc.–that readers can describe and could be translated, for example, to a haptic experience in the digital environment. There is a TEI extension known as CBML – Comic Books Markup Language, that moves in this direction. They’ve created a crowdsourcing platform to mark up comic books using CBML, as well as to transcribe text. They gather and aggregate all annotations and use an algorithm to assess inter-transcriber agreement. CBML now needs some extensions, e.g.- for splash pages and meta-panels. They want more comics to include; copyright is a major issue. A beta site with a few comics is available. They are working with Scribe to build it.
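He did not say which algorithm they use to assess agreement, so the following is only a guess at the simplest possible version: take the majority label per annotated item and report what share of contributors chose it. The label names are placeholders, not CBML vocabulary, and the review threshold is arbitrary.

```python
from collections import Counter


def aggregate(annotations):
    """annotations: labels given to one item. Returns (majority label, agreement share)."""
    if not annotations:
        return None, 0.0
    label, votes = Counter(annotations).most_common(1)[0]
    return label, votes / len(annotations)


def needs_review(items, minimum_agreement=0.6):
    """Return ids of items whose majority label falls below the agreement threshold."""
    flagged = []
    for item_id, annotations in sorted(items.items()):
        _, agreement = aggregate(annotations)
        if agreement < minimum_agreement:
            flagged.append(item_id)
    return flagged


items = {
    "panel1-balloon1": ["speech", "speech", "speech"],
    "panel1-balloon2": ["speech", "thought", "caption", "thought"],
}
print(aggregate(items["panel1-balloon1"]))
print(needs_review(items))
```

A chance-corrected measure such as Fleiss' kappa would be the more rigorous choice once there are enough annotators per item.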

Interesting desert

Thursday, July 14

Long Paper Session: Analyzing and using new media 4

SpotiBot—Turing testing Spotify
Pelle Snickars – Umeå U

While introducing Spotify, he noted that it requires no captcha to enter the site, further pointing out the original meaning of captcha, which is to be a Turing test (Completely Automated Public Turing test to tell Computers and Humans Apart). They have set up a number of bots using the ‘freemium’ access. He referred to them as “research informants” that “exhibit human-like behaviour” while “listening to music.” He noted that the bots are also scripted to document their own activity.

Pointed out the economic problems of streaming, i.e.- how do artists get paid. Showed some examples of people gaming Spotify. One band put up an album of ten tracks of complete silence and encouraged listeners to play it on repeat all night. Another example, a bot called “stream it like you mean it,” allowed one to loop a favourite infinitely every 31 seconds, an attempt to channel revenue to a desired artist.

Their bots did similar things, playing a specific track (yes, they used “Dancing Queen,” but also some ‘music’ one of their graduate students made available on Spotify) on repeat. Play it for 35 seconds, then play it again. He noted that their bots violated some of Spotify’s user agreement terms, and that they didn’t ask his university’s permission to do this and used public proxies to mask the source.

They found it hard to launch 100 bots to play “Dancing Queen” at the same time, so that part of their experiment didn’t work. They got maybe 20 at a time to work (hardware limitation). He noted that with proper hardware, it would seem possible to do this.

In general they have found that their Spotibots do pass the Turing test, i.e.- they are capable of playing tracks repeatedly without being interrupted or detected by Spotify. There was something appealing about the way he noted they were doing this without worrying about fine legal details; they make a few cents off of the music their students have uploaded, but rather than complicate things they are just moving forward with their work.

Exploring and Discovering Archive-It Collections with Warcbase
Ian Milligan, U Waterloo

Started by noting that the core source for history since the 19th century has been archival materials, which is an economy of scarcity. WARCs (Web archives) have the potential to change that.

Gave a brief intro to a WARC file, noting that it preserves all of the components, making it possible to reconstruct Websites as they existed. In a nutshell, we have now moved from scarcity to abundance (citing Rosenzweig). Asked the question: could one even study the 1990s and beyond without Web archives? It’s a major change in how history will be written, and the 1990s are already history per typical practice where first historical treatments tend to appear 20 years after an event.

Pointed out that people can use the Internet Archive’s Wayback Machine, but that it is inadequate as a research tool. It repeats many of the methods of analogue archives, e.g.- moving one document at a time. We can do more with the technology we have now.

Using an Archive-It (IA service) collection, showed how archives can be used for history. The project explored the changes in Canadian politics from 2005-2016. One could see the changes, but he asked the question about how one measures this. U of Toronto used Archive-It to track 50 Websites related to Canadian politics (the CPP). The CPP had a simple interface, just a basic search, few advanced options, and no facets. Great collection, but no one used it.

To improve, their team has been developing Warcbase. It’s your own Wayback Machine, connecting WARC files with the modern technology stack (Hadoop, etc.). It scales, and it’s very powerful. Also documented. Working to integrate it into Voyant for analysis. Possible to do name extraction.

This type of analysis is changing the emphasis from content to structure and structured metadata, e.g.- links. Link graphs reveal dynamics; one can learn more from a picture than from reading 10,000 Web pages.
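To illustrate the shift from content to structure, here is a small sketch of collapsing page-level links into a domain-level link graph. This is not Warcbase's API (which runs on the Hadoop stack he mentioned); it is plain Python over an assumed list of (source URL, target URL) pairs, with example domains I made up.

```python
from collections import Counter
from urllib.parse import urlparse


def domain(url):
    """Reduce a URL to its host, dropping a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def link_graph(links):
    """links: iterable of (source_url, target_url). Counts links between distinct domains."""
    edges = Counter()
    for source, target in links:
        src, dst = domain(source), domain(target)
        if src and dst and src != dst:
            edges[(src, dst)] += 1
    return edges


links = [
    ("http://www.a.example/p1", "http://b.example/x"),
    ("http://a.example/p2", "http://b.example/y"),
    ("http://a.example/p2", "http://a.example/p3"),  # internal link, ignored
]
for (src, dst), weight in link_graph(links).items():
    print(src, "->", dst, weight)
```

The weighted edge list is the sort of thing one would hand to Gephi or a D3 visualization.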

How to do this:

ingest data
do basic shell analysis (also a Web Spark GUI for those afraid of shell clients)
filter a corpus
visualize to select sub-corpora – helps narrow the focus of research

For those who don’t want to use Gephi, they offer a D3 visualization tool on their site. There are numerous ways to filter. Also possible to create Solr indexes.

Short Paper Session: Sustainability and preservation / Digital libraries and museums

Evaluating GitHub as a Platform of Knowledge for the Humanities
Lisa Spiro, Sean Smith – Rice U

Conducted a survey of GitHub usage, also interviews. In the interviews, they investigated usability, community, user needs, data portability, and openness. Also curious why people started using GH, with versioning being the top reason, as well as being a free public repo.

The interviews underscored the importance of GH for collaboration. One interviewee pointed out how easy it is to find software, change it, and put it back. Not all is perfect, of course; just because it is public doesn’t mean people will find your repo or use your code. The majority of repos only have one committer.

What are the limitations?

learning curve for Git
can’t handle large datasets
concern about preservation (although they recognize that it’s not designed for that)
anxiety that GH could change its business model
concerns about inclusivity

Need to link to the slides for this one; lots of results on slides that I couldn’t type fast enough to capture.

Preserving Ireland’s Digital Cultural Identity towards 2116
Sharon Webb – U of Sussex; Rebecca Grant – Digital Repository of Ireland

The Digital Repository of Ireland is a TDR (trusted digital repository) for humanities and social sciences data in Ireland and was launched in 2015. Spoke extensively about the need to engage their community to raise awareness about the fragility of digital collections. Put out a call for relevant collections, prepared for ingest (including digitization and XML encoding), and launched their platform.

Issues:

resourcing issues in archives: funding, staff, training, technical infrastructure
content bias driven by resource issues
awareness of digital preservation, but most orgs were looking for digitization services
impossible to crosswalk ISAD(G) to EAD
validated DRI workflows and documentation (i.e.- this occurred and is a positive)

Next up is loaning digitization equipment to various organizations, along with a computer to do XML work. They need to provide more training, though, not least around digital preservation and ISAD(G)/EAD issues.

Remediations of Polish Literary Bibliography: Towards a Lossless and Sustainable Retro-Conversion Model for Bibliographical Data
Maciej Maryl, Piotr Wciślik – Institute of Literary Research of the Polish Academy of Sciences

The Polish Literary Bibliography (PLB) is an annotated bibliography of all articles, notes, and other materials related to literature, started in 1954. It was first remediated, i.e.- moved from print to online, in the early 2000s, but it was just “crammed” on to the Web. Eventually it became a “Medina”-like database, with many layers and complex paths. In the new remediation, they want to create a better research resource.

One task they need to address is digitizing the earlier years of bibliography. They also need to move the data into a standards-based environment; for this they will use a data standard rather than a bibliographic format, choosing schema.org rather than the more bibliocentric ontologies such as BIBFRAME or RDA, which he referred to as remediations of MARC. Using schema.org presents some challenges, among them dealing with works, expressions, and manifestations, which were used inconsistently in the source data. They also identified 86 different contributor roles in their data, which they needed to simplify. Also had to hack through some subject classification issues.
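Here is a rough sketch of what simplifying contributor roles while mapping to schema.org could look like. The role labels and the sample record are hypothetical (the 86 roles were not listed), and the mapping is my own; only the schema.org type and property names (Article, author, editor, translator, contributor, datePublished) are real.

```python
import json

# Hypothetical source role labels mapped onto a handful of schema.org properties.
ROLE_MAP = {
    "author": "author",
    "co-author": "author",
    "editor": "editor",
    "translator": "translator",
    "illustrator": "contributor",
}


def to_schema_org(title, year, people):
    """people: list of (name, source role). Unknown roles fall back to 'contributor'."""
    record = {
        "@context": "https://schema.org",
        "@type": "Article",
        "name": title,
        "datePublished": str(year),
    }
    for name, role in people:
        prop = ROLE_MAP.get(role.lower(), "contributor")
        record.setdefault(prop, []).append({"@type": "Person", "name": name})
    return record


record = to_schema_org(
    "Sample title",
    1954,
    [("Person A", "Co-author"), ("Person B", "Preface writer")],
)
print(json.dumps(record, indent=2))
```

The catch-all fallback is exactly where information gets lost, which is presumably why a lossless model was in the title of the talk.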

They hope that their work can be a model for other such literary bibliographies.

File Formats for Archiving: Stability and Persistence Issues
Andrea Bianco – U of Basel

Noted there are three critical problems in digital archives: hardware, software formats, data carriers. The data carriers are less of an issue in his opinion. More crucial is the hardware you need to read them. File formats are also a fairly severe issue. In sum, he noted that solving the hardware and data carrier problems is mainly achieved by copying the bitstream, but this doesn’t solve the file format problem. Preservation of content requires migration of file formats, i.e.- transcoding the data. For archives, file formats must be well documented, high quality, widely used, and standardized. Example: tif, which is used by 88% of memory institutions.
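The point about copying the bitstream can be sketched as a copy followed by a checksum comparison. This is a generic fixity check of my own, not anything from DPF Manager; the function names are hypothetical. It addresses only the carrier and hardware side, not the file format problem he stressed.

```python
import hashlib
import shutil


def sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_fixity(source, destination):
    """Copy the bitstream to a new carrier and verify it arrived unchanged."""
    before = sha256(source)
    shutil.copyfile(source, destination)
    after = sha256(destination)
    if before != after:
        raise IOError(f"checksum mismatch copying {source} to {destination}")
    return after
```

Calling copy_with_fixity("master.tif", "/mnt/new_carrier/master.tif") (both paths invented) would return the digest to record alongside the copy.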

They have developed DPF Manager with a partner university in Spain. But having a tool isn’t everything. They are now working with a broader community, TI/A – tagged image for archival. They are trying to articulate standards for tif, which would be a subset of the broader tif standard for the archival community, then propose this as an ISO standard.

Long Paper Session: Teaching DH, teaching with DH 1

Read, Play, Build: Teaching Sherlock Holmes through Digital Humanities
Joanna Elizabeth Swafford – SUNY New Paltz

Describes her model as read – play – build.

read articles or blog posts on methodology
play with projects that use that technology
build a small project using that methodology

Uses Omeka with her classes.

Read her paper, hence Spartan notes. If curious why, see my comment two talks below.

From Index Cards to a Digital Information System: Teaching Data Modeling to Master’s Students in History
Francesco Beretta, U de Lyon

Teaches computer science for historians, 20-40 students per year. They need to create systems to store the data they use, but most of them are unfamiliar with digital tools, mostly using paper notes. The range of subjects is broad, across all history subdisciplines.

To move them in this direction, he tries to get students to think data, produce data, and to visualize or analyze data. They use an internal system at Lyon (symogih.org – showed a quick slide with its architecture) that uses a specific data model. They want to pull data from texts and map it into their data model. Starts with index cards, and gets to data.

Has them tag texts with colours so that they begin to identify objects and entities. Needs a technology to move them toward data; he prefers PostgreSQL and he’s built a stack around it that the students use. Ultimately, they move toward R for data analysis and visualization.
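The move from index cards to data can be sketched with a tiny relational model. Two loud caveats: this is not the symogih.org data model, just a two-table illustration of my own, and I use Python's built-in sqlite3 instead of the PostgreSQL stack he described so that the example runs without a server. Table names, column names, and sample cards are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE entity (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,      -- e.g. person, place, organisation
    label TEXT NOT NULL
);
CREATE TABLE information (
    id INTEGER PRIMARY KEY,
    entity_id INTEGER NOT NULL REFERENCES entity(id),
    statement TEXT NOT NULL, -- what the index card says
    source TEXT NOT NULL     -- where it was found
);
"""


def build_database():
    connection = sqlite3.connect(":memory:")
    connection.executescript(SCHEMA)
    return connection


def add_card(connection, kind, label, statement, source):
    """Store one index card, reusing the entity if it is already known."""
    row = connection.execute(
        "SELECT id FROM entity WHERE kind = ? AND label = ?", (kind, label)
    ).fetchone()
    if row is None:
        entity_id = connection.execute(
            "INSERT INTO entity (kind, label) VALUES (?, ?)", (kind, label)
        ).lastrowid
    else:
        entity_id = row[0]
    connection.execute(
        "INSERT INTO information (entity_id, statement, source) VALUES (?, ?, ?)",
        (entity_id, statement, source),
    )
    return entity_id


db = build_database()
add_card(db, "person", "Person A", "placeholder statement 1", "Source X, p. 1")
add_card(db, "person", "Person A", "placeholder statement 2", "Source Y, p. 2")
print(db.execute("SELECT COUNT(*) FROM information").fetchone()[0])
```

The design point is the one the colour tagging prepares students for: the entity is recorded once, and each card becomes a sourced statement about it.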

One observation his students have made is that it is interesting work, but that they should have had the course earlier in their studies, so they are moving it earlier. Considering moving it down to the bachelor level.

Teaching Digital Humanities Through a Community-Engaged, Team-Based Pedagogy
Andrew Jewell – U of Nebraska-Lincoln

Started with an anecdote about how a student launched an app based on her work and was quite pleased to have done something “real.”

As the title suggested, he emphasized that the projects were team-based; most of the time it works well, but has also led to confusion, anger, and annoyance. We learned this at McMaster as well with our first DH course. Another insight was that problem solving is more important than skill building. Also: expertise is decentralized and shared. By that he meant that no instructor could cover all of the material, and that even as a group they did not know everything the course would need in terms of skills and tools.

Read a paper, rapidly, that was better written for reading, not delivering orally. It’s incredibly difficult to connect with someone reading what is essentially an article at you, making no or few breaks in their rhythm, looking only at their paper, etc. One can write a text that is suited for reading (Swafford did fairly well in this regard), but this was not one of them. It’s a shame, really, because it is a topic I wanted to hear about, but less would most definitely have been more.

Kraków door

Friday, July 15

Long Paper Session: Visualizing 1

CORE – A Contextual Reader based on Linked Data
Eetu Mäkelä – Aalto U

[slides]

Their main goal is to support close reading in an unfamiliar domain. A reading aid or gloss, in other words. Showed an example of a WWI primary source that was full of references to people, places, and bodies that are likely not familiar to a typical scholarly reader. Creates highlighted segments in texts (help in context), which when hovered over brings up a popup that has bits of information pulled from various linked data resources. In addition to that data, they also point to related primary sources, both textual and visual. Quite extensive and wide ranging. They are presented as visual thumbnails. The tool makes use of linked data to find variant names and concepts to pull in more resources.

This eliminates the need, of course, for formal annotation, and can be done on the fly by parsing content in HTML or text PDF files. They use a number of vocabularies (showed a long list of WWI vocabs). Showed other examples of applying the tool to Latin histories and Finnish legal texts, noting that LOD is nicely suited to multilinguality.
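A bare-bones version of on-the-fly glossing might look like the sketch below. It is not CORE's implementation: the vocabulary is a hand-written dictionary with placeholder names and notes standing in for real linked data lookups, and the HTML attributes are my own choice.

```python
import html
import re

# Hypothetical vocabulary: surface forms (including a variant) -> (entity id, note).
VOCABULARY = {
    "Example Corps": ("ex:unit1", "placeholder note about a military unit"),
    "Ex. Corps": ("ex:unit1", "placeholder note about a military unit"),
    "Sampleville": ("ex:place1", "placeholder note about a place"),
}


def annotate(text, vocabulary=VOCABULARY):
    """Wrap known names in spans carrying the entity id and a hover note."""
    if not vocabulary:
        return html.escape(text)
    # Longest forms first so a longer name wins over a shorter one it contains.
    forms = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(form) for form in forms))
    pieces, last = [], 0
    for match in pattern.finditer(text):
        entity, note = vocabulary[match.group(0)]
        pieces.append(html.escape(text[last:match.start()]))
        pieces.append(
            f'<span class="gloss" data-entity="{html.escape(entity)}" '
            f'title="{html.escape(note)}">{html.escape(match.group(0))}</span>'
        )
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)


print(annotate("The Ex. Corps reached Sampleville."))
```

Both variant spellings resolve to the same entity id, which is the part linked data is doing in the real tool.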

RICardo Project : Exploring 19th Century International Trade
Paul Girard – Sciences Po médialab

Spoke about two things: RICardo database on 19th century trade, and the RICardo Web application – code here. Noted that trade flows between entities, not just countries. They have information from the point of view of both parties. Also have trade data on world trade, i.e.- data on one entity’s trade with the entire world. Showed a long list of primary sources from various nations in Europe and beyond (20 total). They also have many secondary sources, such as British Empire statistical abstracts, but there are many of these from various countries. Also have estimation sources, which provide only total trade data, and tend to be more recent. Many currencies in play, 120 in all, so they had to build an exchange rate database by converting all to pounds sterling by year.
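The exchange rate table can be pictured as a lookup keyed by currency and year. This sketch is mine, not the RICardo code; the currency name and rates are invented placeholders, expressed as units of the currency per one pound sterling.

```python
# Hypothetical rates: units of the currency per one pound sterling in a given year.
RATES = {
    ("currency_a", 1850): 25.0,
    ("currency_a", 1851): 20.0,
}


def to_sterling(amount, currency, year, rates=RATES):
    """Convert a reported flow to pounds sterling, or None if no rate is recorded."""
    if currency == "pound_sterling":
        return amount
    rate = rates.get((currency, year))
    if rate is None:
        return None  # leave a gap rather than guess a rate
    return amount / rate


print(to_sterling(100.0, "currency_a", 1850))
print(to_sterling(100.0, "currency_a", 1852))
```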

They also had many types of entities, as he noted, not just countries: cities, colonial areas, groups, all of which stem from the nature of the statistical source and nature of the trade entity. Started in MS Access, now CSV files and Python scripts. Headed to public release in 2017 via GitHub. Volume: nearly 300K flows, 1500 entities, etc. Complex, in a word. They are now looking to visual means to make the data more useful and easier to analyze.

They call their idea a datascape, and held data sprints with historians, economists, developers, and designers to work through the issues in the same time and place. Four parts/views: world > country > bilateral foci as well as visual documentation of the metadata. When you visualize, how do you let users know about data discrepancies, deal with incomplete datasets, deal with data alternatives, and deal with data heterogeneity? By exposing the metadata, they can visualize how complete or spotty their data is by putting a point in a grid by country and year if they have data.
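The completeness grid is easy to sketch. The Python below is only an illustration under assumptions of my own: the record layout (entity, year, value), the sample records, and the function name `coverage_grid` are hypothetical and not taken from the RICardo code.

```python
from collections import defaultdict

# Hypothetical flow records: (reporting entity, year, value or None).
flows = [
    ("France", 1850, 120.0),
    ("France", 1851, None),
    ("Belgium", 1851, 75.0),
]


def coverage_grid(records, years):
    """Map each entity to a row of booleans, one per year.

    A cell is True if at least one record for that entity and year
    carries an actual value (not None).
    """
    have = defaultdict(set)
    for entity, year, value in records:
        if value is not None:
            have[entity].add(year)
    entities = sorted({entity for entity, _, _ in records})
    return {e: [y in have[e] for y in years] for e in entities}


grid = coverage_grid(flows, [1850, 1851])
for entity, row in grid.items():
    # "x" marks a year with data, "." marks a gap.
    print(f"{entity:10s}", "".join("x" if cell else "." for cell in row))
```

Each row then reads as a strip of points and gaps, which is the idea behind plotting a point per country and year.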

For incomplete datasets, he noted that null entries do not mean zero, so you have to ferret out such false values. Also, a value plus null equals null. You can’t show a partial value without really messing up a visualization; better to have a hole. If they must do this at points, they show a second visualization that makes clear that the change in trend is due to missing values.
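The rule that a value plus null is null can be written down in a few lines. This is a minimal sketch in Python, with None standing in for a null entry; the function name `strict_total` is my own and not from the project.

```python
from typing import Iterable, Optional


def strict_total(values: Iterable[Optional[float]]) -> Optional[float]:
    """Sum the values, but return None if any value is missing.

    None means "no data", which is different from 0.0. An empty input
    carries no data at all, so it also yields None.
    """
    items = list(values)
    if not items or any(v is None for v in items):
        return None
    return sum(items)


print(strict_total([10.0, 5.0]))   # complete data: a real total
print(strict_total([10.0, None]))  # one gap: the total is a gap too
print(strict_total([0.0, 5.0]))    # a true zero is still a value
```

A total that comes back as None leaves a hole in a chart instead of a misleading dip.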

When there are multiple data alternatives, they use a sequence of decisions to determine the best source and simply show them all with context. They can also visualize these discrepancies: for example, two entities are likely to count trade differently, and so will report different values (which is often discussed by historians).

Prototypes as Thinking through Making. Decision Points and Evaluation in Prototyping a Visualisation Framework for Historical Documents
Florentina Armaselu – CVCE; Roberto Rosselli Del Turco – U Torino

Goal was to create a framework to enable the transformation and visualization of TEI-encoded documents. One of their theses: researchers always want to compare originals with transcriptions. They have a prototype, but no beta release as yet.

She dove into a very detailed description of their development work, from which I did extract one interesting and unusual twist, which was reinforced by the second speaker: they looked at existing tools and codebases and didn’t just go off on their own and create a parallel tool, but engaged with the team at Torino that had created EVT. As Roberto put it, they exchanged code, which is a fruitful interaction between projects.

Roberto also spoke about the issues of going open source, noting that not all DH projects release code. Some say they are open, but never release code, others are so heavily customized that they can’t be abstracted to other uses. He offered some advice in this regard:

use a permissive license
release early and often
be visible, make contact with other projects

Given that I’ve written and spoken about libraries releasing code, I was cheered to hear this call for openness.

Short Paper Session: Visualizing 2

Repairing William Playfair: Digital Fabrication, Design Theory, and the Long History of Data Visualization
Lauren Klein – Georgia Institute of Technology

Klein noted that she was mainly reporting on her coauthors’ work (two graduate students). Playfair was one of the founders of modern visualization; invented various types of modern representations, such as bar graphs and pie charts (late 18th century).

When trying to recreate images such as those Playfair created in digital form, many questions arise and complicate things. Playfair himself was criticized for creating pretty pictures minus data, so he published data tables next to his graphs in some editions of his work (but not all). In modern visualization tools, you have to have the data before you can create a picture, but with D3, the goal is transformation rather than representation, per its tagline.

When Traditional Ontologies are not Enough: Modelling and Visualizing Dynamic Ontologies in Semantic-Based Access to Texts
Silvia Piccini – ILC-CNR

Referred to visualization as a cognitive support for using textual collections. They wanted visualizations that would have the quality of hand-sketched diagrams. Their Clavius tool exists here.

Visualisation Strategies for Comparing Political Ideas with the ORATIO platform
Tommaso Elli – Politecnico di Milano

ORATIO is a database of political speeches. They want to create connections between the visualizations they generate and the source documents from which they stem. There are generous and ungenerous approaches, he noted. The former is what visualizations do: overview first, then move to the specific. A concordance works in the opposite direction: search for the specific and then move to the more general.

Showed a variety of visualizations using the speeches of Kennedy and Nixon.

Can you tell that after 2.5 days of heady talks I am running out of listening and processing steam? I can.

Invisible Cities In Literature And History: Interfaces To Scalable Readings Of Textual And Visual Representations Of The Imaginary
Charles van den Heuvel – Huygens Institute for the History of the Netherlands; Florentina Armaselu – CVCE

Based around Calvino’s Invisible Cities. Showed an example of what he called imaginary depictions of existing cities, which are drawn images of Groningen, each of which has a specific type of historical evidence. Their tool gives one a zooming capability, allowing multidimensional investigation, bridging between distant and close reading.

New maps for the lettered city: a data visualization exploration of 19th century salons in Mexico
Silvia Eunice Gutiérrez De la Torre – Würzburg U

Talked about various work done on literary associations, but noted that in Latin America, such gatherings were often subversive, not desired by those in power. As she put it, she was fortunate because another scholar/librarian had published a work on all of the associations in Mexico City, making it possible for her to map and know about these groups. Able to show overlap, using D3, in membership between various groups. Could identify a “mafia” of sorts, i.e.- a core of the community.
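The overlap analysis comes down to set arithmetic on membership lists. Here is a minimal sketch in Python (she used D3 for the visualization; this is not her code). The association names, the people, and the two function names are invented placeholders.

```python
from itertools import combinations

# Hypothetical membership lists; names are placeholders, not real data.
members = {
    "Association A": {"Person 1", "Person 2", "Person 3"},
    "Association B": {"Person 2", "Person 3", "Person 4"},
    "Association C": {"Person 3", "Person 5"},
}


def pairwise_overlap(groups):
    """Return the shared members for every pair of groups."""
    return {
        (a, b): groups[a] & groups[b]
        for a, b in combinations(sorted(groups), 2)
    }


def core_members(groups, min_groups=2):
    """Return people who belong to at least min_groups groups."""
    counts = {}
    for people in groups.values():
        for person in people:
            counts[person] = counts.get(person, 0) + 1
    return {p for p, n in counts.items() if n >= min_groups}


for pair, shared in pairwise_overlap(members).items():
    print(pair, sorted(shared))
print(sorted(core_members(members)))
```

People who turn up in several groups at once would be candidates for the core of the community.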

In a closing aside, noted that without a scholarship to study in Würzburg and the exceptional support of her library in Mexico, she would not have been able to do this work nor attend the DH conference. Pointed out that it is hard for people from certain nations to attend.

Short Paper Session: Teaching DH, teaching with DH 3

Live/Life Stories. The Uses Of Digital War Testimonies In Educational Contexts
Susan Hogervorst – Open U Nederland, Erasmus U Rotterdam

Noted that there is concern around the passing of so many eyewitnesses to the Second World War. Now we are seeing a shift, however, away from collecting and preserving these interviews toward disseminating them to a wider audience. Called this a revolution in oral history.

Uses a variety of methods to study the use of these histories across generations, including Web analytics and focus groups. Spoke in more detail about a focus group with student teachers in a Holocaust education program. Had them select two interview fragments (and explain how and why) from oral history collections and formulate one or two learning objectives. Could learn both how they use portals as well as their selection criteria.

They tended to choose fragments about children (kids can relate), those with a high density of topics and places already known to students, and stories kids would remember, that have an ‘impact.’ Why use them?

to add a human dimension to facts
make abstract phenomena understandable
illustrate textbook history and confirm known facts, ethically (evidence) and epistemologically (reliable)

Discovered that students browsed to fragments relying on options presented on the site, while keyword searching was all but unused, although those who do use keyword searching stay on the site longer, per log analysis.

The Digital Scholarship Training Programme at British Library: Concluding Report & Future Developments
Aquiles Alencar-Brayner – British Library

This is an internal staff training program they run to help staff address their rapidly growing digital collections and the demands of researchers. Total of 88 courses to over 400 staff, all face to face. Showed some sample course titles, e.g.-

Cleaning up Data
Programming in Libraries
Communicating Our Collections Online
Managing Digital Research Information
Crowdsourcing in Cultural Institutions

Not trying to create scholars, but help staff work with researchers and to innovate. Want staff across all areas to be familiar with basics of digital scholarship. What did participants value most?

hands-on, practical experiences
time to explore innovative digital projects
trying new tools, especially on their own BL or similar collections
expertise and enthusiasm of instructors
meeting their colleagues and learning about BL projects

Some best practices and tips:

case studies, real world examples help
articulate learning outcomes
clear, printable instructions
max 15 in hands-on courses
allow time to complete exercises
have optional activities on hand for advanced staff

Really admirable program. Lots worth emulating, even if we would struggle to do it on this scale. They even have a monthly DS reading group for staff, so that those who aren’t technically inclined can still take part in theoretical discussions. Slides here.

Writing Composition in the Close Reading Cycle: Developing The Annotation Studio Idea Space
Kurt Fendt – Massachusetts Institute of Technology

Annotation Studio was conceived as a teaching tool to enable collaborative reading. Have about 8,000 worldwide users, across a wide range of humanities disciplines, but beyond the humanities as well, e.g.- anthropological field notes. People use it as they wish; the platform doesn’t limit applications of collaborative reading.

Wired!: Collaborative Teaching & Critical Digital Making In An Art History Classroom
Hannah L. Jacobs – Duke U

Wired! Lab includes faculty, grad students, and undergrads from a range of programs and disciplines. Based in Art History and Visual Studies, so integrated into some of their courses (showed a list). Cited Ratto’s definition of critical making as their scope.

Showed a detailed example from an Art History class using Neatline.

Digitale Tools und Methoden für die geisteswissenschaftliche Forschung praxisnah erklärt: Ein neues Format im Test (Digital Tools and Methods for Humanities Research Practically Explained: Testing a New Format)
Tanja Wissik, Claudia Resch – Austrian Academy of Sciences, Austria

[Spoke in English despite German title.]

Created a training program primarily for researchers at the Austrian Academy of Sciences, although their tool galleries are open to all. These range from various specific tool-based training to topics such as data management planning. Participants are asked to register, and can get a confirmation or certificate of completion if they need such proof of participation.

Presentations are held face to face. Just finished their first year and provided an overview of the feedback they have received from participants. While a majority were Academy people, it was a slight majority, which indicates to them that training is needed in Austria. Slight majority of women, and many took more than one course. Their age statistics show more mid-career people than student-aged people. Mostly humanists, but also librarians, archivists, information scientists, linguists, etc. Generally positive feedback: when asked if they would attend again, almost all said yes or strongly yes. Most found the combination of morning lecture and afternoon practical, hands-on work to be successful, although they learned to cap size to make sure the hands-on portion is optimal for participants.

Postscript

Random words/phrases I heard: multiresolutional, hairballs (network visualizations that are a tangled, ugly mess: a hairball), puking rainbow meme

