DH2014 day two notes
Today featured the panels I’ve noted below, as well as two excellent poster sessions. Saw many things that gave me ideas and/or inspiration, and took numerous pictures that I immediately emailed to colleagues who aren’t here.
Session 4, Thursday
MicroPasts: Co-creation and Participatory Public Archaeology
Daniel Pett, British Museum
All open: open data, open source, and built on a variety of open-source platforms: WordPress, Pybossa, neighbor.ly, and Discourse. About 15 GitHub repos host the various parts of MicroPasts.
They’ve been live with it since October 2013. Some minor issues, not least with legacy browsers. The crowd-sourcing site went live in April 2014 (“slightly too early”). Mainly based around the BM’s Bronze Age archives; they need help cataloguing 30,000 index cards of all Bronze Age artefacts. Transcription comes first, which isn’t entirely simple. Beyond that, they need help creating 3D models. They can then generate 3D prints that can be carried around, dropped, etc. Made a joke about not being stopped at customs with these objects.
They apply CC0 to task data and models, and the images get CC-BY. His argument is that it’s publicly owned and funded, so belongs to the public. Apparently that’s not entirely OK with everyone in his workplace. They find that their 3D models are now being printed worldwide.
Now they’ve moved into crowdfunding, which launches later this month. They’ve got several seed projects that need some funding (maximum is £5000) to push their work forward. Also trying to establish a social media presence, but at present the impact is modest. Usual platforms: Twitter, Facebook, YouTube. The idea is community interaction.
Their evaluation processes rely on online surveys, talks, Google Analytics, Pybossa statistics, a diary study, social media data, etc. They also face challenges, such as getting their finance department to understand PayPal accounts and crowdfunding. Sounds familiar.
Socially-Derived Linking and Data Sharing in a Virtual Laboratory for the Humanities
Toby Burrows, Univ. of Western Australia; Deb Verhoeven, Deakin University
Creating the Humanities Networked Infrastructure (HuNI). Currently 695,000 entities built from 30 data sources. The idea is to aggregate a ton of data to allow researchers to explore the data and connect with others doing similar research. Also has the capacity to allow participants to create and upload their own data. Funded by NeCTAR.
The virtual laboratory is currently in final user testing, which should conclude soon. At that point, it will launch as a public service. HuNI has its own data model, which he briefly introduced without going into detail, other than to say it resulted from extensive philosophical discussions. It is based on Person, Organization, Work, Place, Event, and Concept. The idea is to keep things simple; as their slide put it, to keep the categorization of data entities to a minimum.
During their demo, it became clearer what they meant by social linking. In HuNI, it’s possible to create a collection that associates various datasets with each other, i.e. applies human intelligence to what are otherwise disparate Web resources. The links themselves have, by design, the character of RDF triples. They showed the example of Hugh Jackman having attended the Western Australia Academy of Performing Arts. Following those connections leads to other discoveries, such as that Heath Ledger is also an alumnus of that Academy. The collection page shows a network diagram of this that can be navigated and followed.
In response to a question, they said that the connections are not literal RDF triples; there is no triplestore behind it. Verhoeven also pointed out that connections can be contested. People can disagree with each other. In response to another question about the possibility of introducing “junk” or “noise,” they pointed out that it’s not possible, via HuNI, to modify the source entities, only to link them. Verhoeven also noted that this is experimental, and that troll behaviour could occur and could be confronted.
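To make the linking model concrete: a sketch of how typed links between entities (person attended organisation, etc.) could be represented and traversed. This is purely illustrative — HuNI’s actual implementation wasn’t shown, only that links behave like triples without living in a triplestore. The function name and data layout here are my own.

```python
# Illustrative sketch (not HuNI's actual code): typed links between
# entities stored as plain records rather than in a triplestore.
from collections import namedtuple

Link = namedtuple("Link", ["subject", "predicate", "obj"])

links = [
    Link("Hugh Jackman", "attended",
         "Western Australia Academy of Performing Arts"),
    Link("Heath Ledger", "attended",
         "Western Australia Academy of Performing Arts"),
]

def related_via(entity, predicate, links):
    """Find other entities connected to the same objects via the same predicate."""
    objects = {l.obj for l in links
               if l.subject == entity and l.predicate == predicate}
    return sorted({l.subject for l in links
                   if l.obj in objects
                   and l.predicate == predicate
                   and l.subject != entity})
```

Following Hugh Jackman’s `attended` link would then surface fellow alumni such as Heath Ledger, which is exactly the kind of navigation their network diagram supports.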
Mining a “Trove”: Modeling a Transnational Literary Culture
Katherine Bode, Australian National University
Trove is, of course, the database of Australian newspapers created by the National Library of Australia, but she pointed out that her work could apply to other resources. 13.5 million pages from 680 newspapers, which makes it the world’s largest newspaper archive.
In the 19th century, a great deal of literature was published in newspapers, but before digitization it wasn’t a simple matter to discover and study it. Trove does article segmentation, which is useful, and because fiction was published serially, its text recurs, which makes it fairly easy to extract the literature. They use the Trove API to extract their dataset, and then manually add fields to enhance the data that comes out of the API.
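For readers who want to try this themselves, a minimal sketch of querying Trove’s newspaper zone. The endpoint and parameter names reflect the public Trove v2 API (an API key is free from the NLA); the search term is illustrative, and this is not the authors’ actual extraction pipeline.

```python
# Minimal sketch of building a Trove newspaper-zone search request.
# Parameters follow the Trove v2 API; fetch the URL with any HTTP client.
from urllib.parse import urlencode

TROVE_ENDPOINT = "https://api.trove.nla.gov.au/v2/result"

def build_query(api_key, search_term, page_start="*", page_size=100):
    """Construct a Trove newspaper search URL with JSON output."""
    params = {
        "key": api_key,        # free API key from the National Library of Australia
        "zone": "newspaper",   # search digitised newspapers only
        "q": search_term,
        "encoding": "json",
        "s": page_start,       # cursor for paging through results
        "n": page_size,        # results per page
    }
    return f"{TROVE_ENDPOINT}?{urlencode(params)}"

url = build_query("MY_KEY", "serial fiction")
```

Each JSON response includes a cursor for the next page, so a full harvest is just a loop over `build_query` calls; manually added fields would then be layered onto the harvested records.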
They intend to make the full dataset publicly available so that others can work with it and improve upon it.
In the 19th century, newspaper printing was a messy world. Much was anonymous or pseudonymous, and plagiarism or rogue reprinting was rampant, with works appearing multiple times under various bylines or in various guises. “The works are processes rather than stable objects.” She noted that literary criticism and literary theory ignore that notion, treating works as stable and singular.
The second epistemological point she raised is the provisional nature of a digitally generated archive such as this. New proxies are introduced into research that go beyond what existed in an analogue environment. Trove creates new proxies, based on how it’s created, how it’s presented, and the models it uses to display and sort the archive. The word she used was mediation. The more proxies you have, the greater the risk that errors or biases manifest and compound. OCR errors can be identified and mitigated, for example, but search interfaces are harder to interpret, so it becomes harder to identify and remedy the issues that arise with digital archives. The rhetoric that goes with this kind of work revolves around objectivity and comprehensiveness, but those are neither given nor guaranteed.
In conclusion, she noted that we need to counter uncertainty with rigor and method rather than despair.
Session 5, Thursday
Developing for Distant Listening: Developing Computational Tools for Sound Analysis By Framing User Requirements within Critical Theories for Sound Studies
Tanya Clement, University of Texas
Spoke about the development of HiPSTAS, High Performance Sound Technologies for Access and Scholarship. Mainly provided a theoretical basis for the work from a read paper.
Kinomatics: Big Cultural Data and the Study of Cinema
Deb Verhoeven, Alwyn Davidson, Deakin University
Investigation of global showtime data. Idea is to get at cinema as a social practice, rather than a textual practice. Verhoeven situated this in the ‘new cinema history,’ which sees cinema as not isolated but rather socially connected and embedded. That’s a poor gloss of what she said, incidentally. She pointed to a number of sites working in this vein, but noted that all of them were locally specific, i.e. constrained in geographic scope (e.g. London).
Kinomatics is their play on kinematics. What is it? Poorly summarized, the study of the motion of film as an industrial product. Specifically, they are compiling showtime data globally, and have good coverage save for some specific areas (Russia, bits of South America, etc.). Showed a visualization of the data at the Melbourne level.
Their goals are to document the flow (or bleed) of culture and to drive the computational turn in cinema studies. They purchase the data from a commercial provider, which deletes it after one month; since they get a weekly delivery, their archive is the only lasting record of this data.
Now that the data is being compiled, they are beginning to pose questions that the data can answer (she showed two slides of examples, quickly). Some are related to cinema, others are more about data and its uses (e.g. confidentiality and socialization of data). As she put it, though, moving beyond the obvious takes work. In other words, the things you think you will find, you do, so you have to go past that.
Davidson introduced some use cases for the data, e.g. the spread of high frame rate technologies around the globe. The first visualization she showed was the “obvious” evidence that US films are shown globally and dominate globally. Going beyond that, they explored dyadic relationships between two markets, in other words, not just where US films went, but what went in the other direction. To do this, they use their own database, but also pull data from IMDb to get country of origin and production year.
Another project was to rank cities by “cinemability,” their ability to support cinema. They used a variety of criteria that they can pull from the data (diversity, number of screens, etc.). Defining cities is something of a challenge, so they set criteria which resulted in 302 cities in 42 countries. Results can be found at cinemacities.com.
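The talk only named a couple of the criteria (diversity, number of screens) and no weights, but the ranking idea can be sketched as a weighted sum over normalised per-city indicators. Everything below — the indicator names, weights, and sample values — is hypothetical.

```python
# Hypothetical sketch of a "cinemability" ranking: each city gets a
# weighted sum of normalised indicators. Criteria and weights here are
# illustrative; the talk mentioned diversity and number of screens.
def cinemability(city_stats, weights):
    """Score one city as a weighted sum of its indicators (all in [0, 1])."""
    return sum(weights[k] * city_stats[k] for k in weights)

weights = {"screens": 0.5, "diversity": 0.5}   # illustrative weights
cities = {
    "Melbourne": {"screens": 0.9, "diversity": 0.8},
    "Perth":     {"screens": 0.4, "diversity": 0.6},
}

ranking = sorted(cities, key=lambda c: cinemability(cities[c], weights),
                 reverse=True)
```

Their real criteria for what counts as a city are what whittled the list down to 302 cities in 42 countries; the scoring itself is presumably richer than this two-indicator toy.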
Integrating Score and Sound: “Augmented Notes” and the Advent of Interdisciplinary Publishing Frameworks
Joanna Swafford, SUNY-New Paltz
Demonstrated and discussed a tool she has created, Augmented Notes (for score following), and a specific resource she built with it, Songs of the Victorians. Swafford developed this with support from the Scholars’ Lab at the University of Virginia. Didn’t take a lot of notes, but the fact that she built this from the ground up is impressive. She made a point in the Q&A about hosting it on Google App Engine so that it would remain portable after she finished her degree. It’s a shame that universities can’t support this kind of hosting without laying claim to the work or making it hard to move.