Digital Humanities 2014 day one notes
My notes from Digital Humanities will likely be copious, so it seems to make sense to segment them by day. As always, any mistakes stem from my inability to listen and type at the same time, and I’m more than happy to correct any botched or twisted points below.
Session 1, Wednesday
Digital Cultural Heritage and the Healing of a Nation: Digital Sudan
Marilyn Deegan, King’s College London
Sudan’s cultural heritage is not well known, although it reaches back >4000 years. Commonly heard: did you know Sudan has more pyramids than Egypt? Its cultural history is as rich as Greece, Rome, Egypt, but suffers from lack of knowledge outside the country. Kush kingdom, Napata kingdom, Meroitic kingdom, Nubian kingdoms: visually and textually related (similar script) to Egypt’s culture. The Nubian kingdom originally resisted Islamic invaders, but eventually fell and an Islamic era ensued, lasting into the 14th century. The cultural influences are varied and profound. More recent history has been less than ideal, with the 20th century full of war and revolution. One of the goals of Digital Sudan is to present this culture to the world, in part to counter the contemporary view of Sudan as a nation torn by war and conflict.
Documents (manuscripts and printed books, from medieval period to present), maps, AV materials, photographs are all targets for digitization. The National Library is a new venture, fairly small and modest. Larger collections are found, for example, at the University of Khartoum. The National Archives have 76 million negatives; photographs in general are a major asset as pictures are taken of everything and everyone. The University of Bergen is working with Sudan Radio and TV to digitize TV and radio tapes, >160,000 hours.
One planned project is to create a Tuti Island Cultural Life Project, documenting an island in the middle of Khartoum at the confluence of the White Nile and the Blue Nile. This will demonstrate the richness of the collections.
She showed an image of documentary films stored in an aircraft hangar. She said the smell of vinegar from the decay was overwhelming. No air conditioning, and overwhelming heat.
Asked what the digital humanities community can do to drive such projects, not just in a humanitarian sense, but to recognize these materials as critical to cultural history and to understanding other cultures.
Digital Humanities Empowering Through Arts and Music. Tunisian Representations of Europe Through Music and Video Clips
Monika Salzbrunn, Simon Mastrangelo, University of Lausanne
Main questions are how DH can contribute to empowering processes through arts and music, and how mental representations of Europe evolve and circulate. They outlined their theoretical approach, which relied on exit, voice, and loyalty as described by Hirschman, translocal migration per Pries and Salzbrunn, multiple belonging processes from Yuval-Davis, and on linking street anthropology and ethnography of virtual networks. This work required ethnographic fieldwork in Switzerland and Tunisia with “Harragas” (people who want to burn their papers), as well as netnographic fieldwork on representations in social media (YouTube, Facebook, etc.).
Mastrangelo showed the actual keywords they used on YouTube, which were an interesting mix of the descriptive (migration, exile) and the very specific, such as mousi9a and har9a. Results with the general descriptors were pretty poor, with videos on bird migration, etc. Their final search query was “mousi9a har9a tunis.” Specifically, they were looking for music, hence the use of mousi9a, which is a Latinized keyword for music in Arabic. They pulled a set of metadata for relevant videos, and used Gephi to map the network ties within the group of videos. This showed that most were closely related and only one video was completely isolated.
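To make the network step a little more concrete, here is a minimal sketch of how "related video" metadata could be turned into an edge list and checked for isolated videos. It is not the speakers' code: the video IDs and links are invented, and the function names are my own. The CSV layout with Source and Target columns is a plain edge-list format that Gephi can import.

```python
import csv
import io


def build_graph(nodes, edges):
    """Build an undirected adjacency map from (video, related_video) pairs."""
    graph = {node: set() for node in nodes}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def isolated_nodes(graph):
    """Return the videos that have no ties to any other video."""
    return sorted(node for node, neighbours in graph.items() if not neighbours)


def edge_list_csv(edges):
    """Serialize the edges as a Source,Target CSV edge list."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return buffer.getvalue()


# Hypothetical video IDs and "related video" links, for illustration only.
videos = ["v1", "v2", "v3", "v4", "v5"]
related = [("v1", "v2"), ("v2", "v3"), ("v1", "v3"), ("v3", "v4")]

graph = build_graph(videos, related)
print(isolated_nodes(graph))   # ['v5']
print(edge_list_csv(related))
```

In this toy data four videos are tied together and one stands alone, which mirrors the shape of the result described in the talk.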
Building on their YouTube exploration, they mined Facebook, selecting four public Facebook pages based on criteria they set (context, number of subscribers, presence of video/music content) and then mapped the results of this, integrating the YouTube data as well since the Facebook pages point to YouTube content.
They explained their approach to creating the network maps and how they selected their media objects. They looked for images that had a musical background, i.e.- still images with music playing behind them, where the images are thematically related to migration and diaspora. They also sought musical video clips that offered some sort of commentary on conditions as well. The last category was Harragas crossing the Mediterranean, where the actual crossing is shown, with or without music.
They see this work as an initial exploration and closed by suggesting other avenues for research, such as questioning the link between representations in social media and those on the ground (e.g.- harraga meeting sites). They also suggested cross analysis of their material with ethnographic materials from Tunisia and Switzerland.
Revisionism as Outreach: The Letters of 1916 Project
Susan Schreibman, National University of Ireland Maynooth
This is a crowdsourced project around the Easter Rising of 1916. The uprising lasted around a week and was quickly suppressed by the British. She sketched the story for those of us who aren’t up on our Irish history.
She showed a list of institutions that have provided images of letters and photographs, a total of 10 in all. Interesting example of a digital collection spanning more than one location or physical collection.
She noted that the sequence around the project was different than with a typical project, where you build something and then put it out for people to use. In this case, since it was crowdsourced, the idea was to get it out as soon as possible and get people involved. The responses it received steered the course of the project, for example by leading to a series of launches in various locations, including Chicago (which she joked could be seen as a county of Ireland).
She showed Carletti, et al.’s definition of crowdsourcing, which breaks it down into various activities: correction/transcription, contextualization, complementing collection, classification gathering, co-curation, and crowdfunding.
Their project has a WordPress frontend, but the content layer is a modified Omeka tool known as DIY. They wanted something intuitive and easy to use. She asked if this is a “golden age of tools,” as we have a lot of tools at our disposal for this work. They do not even require authentication; users can contribute anonymously, yet they find that transcription accuracy is very high. She noted that the Bentham toolbar they employed is great, and makes TEI encoding work, with little error. They did add a couple of little tools to handle dates and salutations. The toolbar has made her think differently about TEI; rather than hitting people new to it with the full complexity, start simple, and work towards the full show.
Session 2, Wednesday
Representation and Absence in Digital Resources: The Case of Europeana Newspapers
Alastair Dunning, Clemens Neudecker, Europeana
The Europeana newspaper project is intended to aggregate the digitized newspapers from across Europe, improve OCR, etc. Currently, it’s about two million pages, and by 2015 they hope to have 10 million. They currently have 1.12 million metadata records, and aim for four million by 2015. The list of libraries is fairly long; another set of libraries just provides metadata records. At this point, Dunning noted that this is pretty similar to other projects.
Neudecker spoke about the features that users want that aren’t yet available, e.g.- ability to download texts. He also noted that while they have two million pages, there are 130 million that have been digitized, and there are 1.5 billion to do. Displayed as a bar graph, this is a pretty stark picture and shows that they are just now scratching the surface.
Question: how should we represent the absence (of certain content) to users? One option would be to show placeholders, but of course results would be flooded with placeholders. Alternative would be to have one record for multiple records, but even that has problems. It would help to have standardized information for every digital resource, but as he noted, that’s a pipe dream for now (his words: are you kidding?).
What they are doing to show this is using graphs to illustrate what’s available and to add context, but libraries have to provide such charts, and they don’t always do that.
Other issues: variable OCR quality, licensing statements of varying types, copyright boundaries from nation to nation, some pages have article segmentation, some library content has named entity extractions. OCR errors, of course, impact the ability to text mine.
They offer an API, and feel that this is a way to create more opportunities for use and exploration. Credited Tim Sherratt at the National Library of Australia for his work with Trove newspapers. Showed other uses of APIs to visualize and present newspaper data. He closed that part of the talk by asking how many people, though, are ready to create or exploit an API. The answer is that it’s not very many people.
Exploring Usage of Digital Newspaper Archives Through Web Log Analysis: A Case Study of Welsh Newspapers Online
Paul Matthew Gooding, University College London
He noted at the outset that his scope was determined by the Web logs he could get, hence only Welsh newspapers. Also noted the difficulties of digitizing newspapers, not least article level segmentation. What he wants to know is how these collections are actually used, as he noted from his own work that people cite these collections as research tools or collections, but not much more than that.
Noted the utility of Google Analytics and was generally positive about it, but also pointed out some of the issues it has. One of these is the notion of sampling for heavily visited sites. It’s also impossible to pull the data as a dataset and share it with others.
By getting data from the Welsh newspaper collection, he was able to study search, browse, and content query data. His analysis showed that entry to the collection is via search, but as a user goes deeper, search diminishes while browse and content queries increase. He explained that his work shows that the research process people follow when they use the digital archive doesn’t differ that greatly from physical archival research.
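As a rough illustration of this kind of analysis, the sketch below tallies request types by how deep into a session they occur. It is not Gooding's method: the URL prefixes, the sample sessions, and the function names are all assumptions made up for the example, and real log data would first need parsing and grouping into sessions.

```python
from collections import Counter, defaultdict

# Hypothetical URL prefixes; a real site's paths would differ.
PATTERNS = {"/search": "search", "/browse": "browse", "/view": "content"}


def classify(path):
    """Label a request path as search, browse, content, or other."""
    for prefix, label in PATTERNS.items():
        if path.startswith(prefix):
            return label
    return "other"


def actions_by_depth(sessions):
    """Count request types at each step (1-based) across all sessions."""
    counts = defaultdict(Counter)
    for paths in sessions:
        for depth, path in enumerate(paths, start=1):
            counts[depth][classify(path)] += 1
    return counts


# Invented sessions, each an ordered list of requested paths.
sessions = [
    ["/search?q=coal", "/view/123", "/view/124"],
    ["/search?q=strike", "/browse/1890", "/view/77"],
    ["/browse/1901", "/view/5"],
]

depth_counts = actions_by_depth(sessions)
for depth in sorted(depth_counts):
    print(depth, dict(depth_counts[depth]))
```

With this toy input, search appears only at the first step and content requests dominate later steps, which is the general pattern the talk reported.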
Exploratory Thematic Analysis for Historical Newspaper Archives
Lauren Klein, Georgia Institute of Technology
Showed data visualizations from the Statistical Atlas of the United States published by Gannett as an introduction to her talk. Interesting how contemporary they seem despite their age (19th century).
Their data source was eleven abolitionist newspapers. She showed some topic modeling examples that they used to mine the source. Essentially these are sets of query terms that group together under headings that she assigned, e.g.- “women’s rights.” They employed EDA, exploratory data analysis, which stems from John Tukey’s 1977 book of the same title. EDA implicitly asserts that visualization is useful as it “amplifies cognition.”
Showed TOME: Interactive TOpic Model and MEtadata Visualization, a tool they’ve developed that shows ‘dust trails’ of topics. She explained the issues that arose from the first prototype, including its limitations. Showed a second prototype that permitted more investigation and exploration in a single view. It’s still very much in development; she admitted (kindly) that some of the features hinted at in the second prototype were just creative Photoshopping.
In response to a question, she noted that when you publish a tool, it might be wise to include not only a “how to” document, but also some “what to do with it” examples, since this isn’t always easy for others to intuit.
Session 3, Wednesday
Building a Multi-Dimensional Space for the Analysis of European Integration Treaties. An XML-TEI Scenario
Serving a contemporary need, namely making European treaties less complex to find and understand. Emerges from the insight that general knowledge of the nature of European integration is low in the population. Specifically, they are looking to assist scholars in European Integration Studies with their analysis of these legal documents.
Problems in Encoding Documents of the Early Modern Japanese
Described encoding issues related to the Sharebon, an early modern Japanese text. Beneath the paragraph level, there are three elements: warigaki (interlinear notes), speaker, and descriptive texts. There are text formatting issues at the #PCDATA level, as he described it: voice marking, phonetic correction, and iteration symbols. These are relevant parts of the Japanese language (voiced v. unvoiced phonemes, etc.). TEI provides paired elements to show original and corrected versions, but somewhat generically.
Another issue is how to handle the Ruby annotations, i.e.- furigana glosses that indicate pronunciation; these are more challenging to address. For one, they can appear both left and right of the text, one showing furigana pronunciation, the other showing other syllabaries (e.g.- katakana). Hard to solve with TEI.
Uncertain About Uncertainty: Different Ways of Processing Fuzziness in Digital Humanities Data
F. Binder, B. Entrup, I. Schiller, H. Lobin (unsure who spoke)
Stems from the GeoBib project, a bibliographical project around early Holocaust literature (to oversimplify).
Where does fuzziness arise? In part, it stems from vague descriptions and localizations, or when fictionalized spaces and names are used. There is also fuzziness around biographies, as well as from geography and maps (contradictory maps).
Fuzziness manifests itself in various places: TEI/XML, the wiki, the database, the Web interface, and the maps. TEI has the certainty attribute, which has set values (high, medium, low, unknown). In the Wiki they use infoboxes to add variant data, which they can also do in the database. This can be surfaced on the Website by putting additional data in italics and using mouseover tooltips to explain these bits. He also showed an example of a map that visualizes ambiguity, e.g.- what the extent of “Germany” during the years 1933-1945 was (depends on one’s point of view).
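For readers who have not seen this in markup, here is a small sketch of attaching a certainty value to an encoded place name. I am assuming the attribute meant is TEI's cert, which takes the values listed above; the helper name and the sample place are my own and do not come from the GeoBib project.

```python
import xml.etree.ElementTree as ET

# The values the notes list for TEI's certainty attribute.
ALLOWED_CERT = {"high", "medium", "low", "unknown"}


def place_name(text, cert):
    """Build a TEI-style placeName element carrying a cert attribute."""
    if cert not in ALLOWED_CERT:
        raise ValueError(f"cert must be one of {sorted(ALLOWED_CERT)}")
    element = ET.Element("placeName", {"cert": cert})
    element.text = text
    return element


# Hypothetical example: a location the encoder is only moderately sure of.
encoded = ET.tostring(place_name("Berlin", "medium"), encoding="unicode")
print(encoded)  # <placeName cert="medium">Berlin</placeName>
```

Restricting the value to a fixed set is what lets the other layers (wiki, database, Web interface) treat uncertainty consistently, though how GeoBib actually wires these together was not shown in detail.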
Macro-Etymological Textual Analysis
Started with a little quiz, asking us to comment on the nature of near synonyms, e.g.- ask/question/interrogate. He has created the Macro-Etymological Analyzer, which views a text through the lens of etymology to parse the words in a text to determine their origin. Showed this running against Pride and Prejudice. The origins are grouped into families (Germanic, Latinate, et al.).
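The basic idea can be sketched in a few lines: look up each word's origin and report the share of each language family. This is only an illustration and not the Macro-Etymological Analyzer's code; the tiny lookup table, the sample sentence, and the function name are assumptions, and a real tool would need a full etymological dictionary.

```python
import re
from collections import Counter

# Tiny hypothetical lookup table mapping words to language families.
ETYMOLOGY = {
    "ask": "Germanic",
    "question": "Latinate",
    "interrogate": "Latinate",
    "house": "Germanic",
    "mansion": "Latinate",
}


def family_proportions(text, lookup=ETYMOLOGY):
    """Return each family's share of the words found in the lookup table."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(lookup[word] for word in words if word in lookup)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {family: n / total for family, n in counts.items()}


sample = "Ask a question, then interrogate; ask again."
proportions = family_proportions(sample)
print(proportions)  # {'Germanic': 0.5, 'Latinate': 0.5}
```

Words missing from the table are simply skipped here, so the proportions describe only the recognized vocabulary.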
He showed multiple graphs of what happens when this tool is applied to various texts. Sometimes the results are predictable, but certain texts, among them Joyce’s Ulysses, defy expectations (perhaps also predictably). Finished by describing trends that the tool reveals in literary texts.
The tool is available on GitHub under a GPL license.
Zampolli Lecture, Wednesday
Communities of Practice, the Methodological Commons, and Digital Self-Determination in the Humanities
Ray Siemens, University of Victoria
Takes slight issue with anyone who depicts the humanities as a static enterprise. Points out that it has always been a fairly dynamic enterprise, with or without the entrance of computing. One manifestation of this is when administrators say, great, DH is a thing and you can have three positions and they’ll be in computer science.
Gave a super brief summary of points he has made before about ‘big tent’ digital humanities, showing how active and vibrant it is. Ended that tour by pointing out how hard it is to define all of this fully, accurately, comprehensively, in a way on which we can all agree, and in a way that is actionable. Reviewed some of the attempts to do so, including his 2004 book with Susan Schreibman and John Unsworth and again with Schreibman in a subsequent book.
Beyond the definition, we need to think about how and where we do what we do, which is how he arrives at his points about communities of practice. As he put it, this helps us understand not only what we do, but who we are, where we do what we do, and why we do it. There is a set of practices that is shared among those who identify as digital humanists (his emphasis). This community has a formal structure, e.g.- ADHO and its constituent organizations, the whole idea being to unify structures and create opportunities for exchange and collaboration. ADHO thus assumes a role for supporting and propagating what makes DH a practice/field.
He pointed out that this leads to the notion of the methodological commons, a notion that has been around for a couple of decades and is often captured graphically. What are the elements? Positive movement toward a problem-based focus (driven by research questions). He offered several other points beyond this, such as increasing data, both in terms of creating and applying data. What this can lead to is asking better questions and being prepared to ask and address those questions.
He asked how our community of practice responds to these trends. One is to communicate openly and clearly about what we do; more broadly, this is an opportunity for the community to lead the way in some areas. Training and a curriculum is a response. This starts in informal ways, and becomes increasingly formal over time. Starts with collegial discussion, networking, brown-bag sessions, etc. and leads to the development of workshops at the local, then regional, national, and international levels. These are all informal. Beyond that, there are the formal types, e.g.- PhD programs, but these come with a range of considerations one must address: legitimacy, enrolment numbers, costs, agility, etc. He described what he called a sweet spot between the very formal and the very informal, which is occupied by occasional/inflected accredited curriculum and national/international workshops, where there is enough legitimacy, for example, but still a fair degree of agility. What then occurs is that one needs to “work the ‘sweet spot’ up” the scale.
What is emerging is an ADHO-related training network, of which DHSI is one node, but there are many others (e.g.- Leipzig). One outcome of this is the graduate professional certificate in DH. Is it time for a formal ADHO network? Sounds like the signal has been given for that.