ARL Forum 2015 notes
Designing Digital Scholarship: Research in a Networked World
Julia C. Blixrud Memorial Lecture
Tara McPherson, U of Southern California
Started by pointing out the issues of scale with digital scholarship: hundreds of hours of video uploaded to YouTube every minute, millions of Instagram images, etc. Beyond that, there’s also an explosion of data. She showed various examples of municipal use of data and pointed out that there are very valuable and rich scholarly datasets as well. As she put it, we hear about data from our provosts all the time.
Noted that humanists don’t typically use the term data, but that it’s data being used regardless of what one calls it. Archives such as the Shoah Foundation archive are simply a different kind of data when viewed in this light.
Interesting notion: “preservation is no longer separate from access.” Taken literally, that’s a very daunting challenge.
Noted that the nature of the work we are doing now no longer finds its best or only expression in print, e.g.- monographs or scholarly articles. She described essentially merged products where the data and the analysis coexist side by side. Activities such as data visualization are part of scholarly production and need to be assessed and considered as such. As with all such talks along these lines, it comes back to the notion that faculty need to get “credit” for this work. Along with questioning what gets credit, perhaps it’s time to question the underlying system, i.e.- one that privileges research output far above any other roles that a faculty member fills, e.g.- that of teacher and mentor.
Made a call for closer alignment between university presses and the institution that spawned them, in particular the libraries. Capped her comments on this by saying that we need to stop giving our money to Elsevier. Of course, we only give them money because scholars give them their work.
We should “value experimental practice,” which she defined as “playing” with information and doing interesting and artistic things with data and information. In this light, game designers came up, as well as interface design and design in general. Talked a bit about Vectors, a journal that was founded to publish work that couldn’t be published in traditional formats. As she put it, the work wasn’t always practical, but it was generative. Interestingly, she spoke of it in the past tense, even though it continues. It’s amazing to see what can be done with significant grant support, but most scholars and institutions have zero access to those funds. That’s a problem we need to help them solve.
The Vectors team created Scalar, a publishing platform that has been talked about on the conference circuit for a number of years now. It’s a CMS that seeks to go beyond the textual form and allow multiple streams–data, multimedia, text–to interact and interoperate. As she noted, Scalar is now one of many such tools that are emerging, most funded by Mellon through various presses. Scalar can both pull in diverse data and content, as well as provide multiple views to it on the inside. One aspect that is well known from earlier talks on Scalar is the lack of strict linearity to its narrative form. It’s possible for the reader to take many paths through the content, not just the one the author intended. She noted the “vampiric” relationship of most work to archives, where they take material as source but push nothing back to the archive. Scalar creates a mechanism to reverse this trend.
She rolled back to the notion of the relationship between teaching and research, noting that they are closely interconnected in the heads of scholars, but that we have created various structures within the academy that cleave them apart again. Digital scholarship offers the possibility of bringing them back into connection and alignment. Students can become active participants in research more readily within the spaces that DS creates.
Emerging Models in Humanities Publishing: Institutional Implications
Panel: Gary Dunham, Carolyn Walters, and Jason Jackson, Indiana U; Meredith Kahn and Charles Watkinson, U of Michigan; Lisa Macklin, Emory; Elliott Shore, ARL
Carolyn Walters introduced the panel by speaking about the studies at Michigan, Indiana, and Emory that formed the basis of the panel into how to develop models to create a new scholarly communications practice.
Emory’s study occurred in 2014-2015 and included 11 humanities faculty, the co-director of their digital scholarship centre, and the director of their scholarly communications office. The last role is Lisa’s, meaning she was the only librarian on the group. They used two outside consultants, directors of external university presses (UNC and Yale). Internally, they worked with their academic technology service.
One question they had was around eligible work, arriving at what they characterize as a continuum: print monographs, long form scholarship published digitally that resembles print, long form scholarship that has been enhanced by the digital format (e.g.- links, embedded media, etc.), and digitally published long form scholarship that is not suitable for print publication. Beyond that, they began to discuss the quality of a digital monograph. Their answer included these elements: robust peer review, marketing, design, licensing for reuse, sustainability and preservation, printability, annotation, searchability, and the potential for networking (i.e.- linking). There are a few things on that list–e.g.- printable–where my eyebrows go up. Also not sure about marketing’s place when it comes to digital work. I suppose it depends on how one defines and practices marketing.
Their conclusions were that the humanities still places emphasis on long form scholarship. They feel that open access is important, but they also want universities to subsidize the publications, with funding available to scholars at all ranks. One of the final conclusions she put on her slide was “any program will require socialization.” Their recommendation to Emory was to have a university funded model. Mellon has invited them to apply for funding, but that’s one institution, and there are hundreds. Such funding does not lead to broad transformation.
Indiana and Michigan have also embraced the subsidy model. Jason spoke about faculty eligibility. Michigan and Indiana have similar models. He noted that ideally the scope of subsidy would extend to PhD graduates working at institutions that lack subsidies, but continued by saying that practice would likely fall short of this goal.
For the publisher perspective, Meredith noted that AAUP recommends that the subventions be directed primarily toward their membership. Other publishers may qualify at the local level, but such publishers should be committed to rigorous peer review and be required to reveal their expenditures for the monographs receiving subsidies. Her discussion of Oxford UP and Cambridge UP shows how challenging it can be to decide who falls in scope and who does not.
Gary posed the question: how much does a digital monograph cost? The direct costs are copyediting, proofing, typesetting, design, and file conversion. He then listed the indirect staff costs: acquisitions, project managers, designers, sales, publicity, etc. Other costs, still: conference travel and exhibits, advertising, sales commissions/fees, and facilities.
They attempted to derive the precise costs for a digital monograph, arriving at a $1500 labour cost per monograph. The net total cost came out to around $26-27K at Indiana and Michigan, while the Ithaka study pegged it closer to $22K.
Charles took up the implications for publishers. His first point was that not all monographs incur the same costs. Depending on complexity, the cost of an individual title can be much higher. Noted that much of this complexity actually depends on the author (good laugh line).
He asked the hard question about what to do with monographs with multiple authors, not least when the authors are at multiple institutions, perhaps in multiple countries. How could/should one apply subsidies?
Elliott closed their remarks by noting that what’s being discussed are current costs, and that we shouldn’t take these as fixed and immutable. Asked the piercing question: why are the digital humanities, libraries, and university presses funded separately? The first lives on grants, libraries are funded directly, and presses must cover their costs with revenue. Is this good?
Digital Scholarship in the Social Sciences
Harriette Hemmasi and Rachel S. Franklin, Brown; Eric Kansa, U of California-Berkeley; Ethan Watrall, Michigan State U
Rachel started by noting something that we hear often in the Sherman Centre, namely that everything she does is digital scholarship. She did note that she hasn’t previously stepped back and evaluated what she is doing. To get others involved, however, you have to break it down and define and shape what it is so that others can enter.
She spoke about the need to introduce people to techniques, to data (numbers), and stress the point that one has to get the work out to an audience (dissemination). Where do we preserve our data? How do we record and preserve our methodology? These are questions we didn’t ask before but must now.
Made a joke about how geographic data is special (laughter). She pointed out, though, that it’s data everyone wants but that it’s unusual to work with for many people who lack training (e.g.- most humanities scholars). She did give examples of the special nature of spatial data. For example, the way data is coded determines how it can be used. Described the first and only law of geography: everything is related, but near things are related more (badly paraphrased). Regarding dissemination, she noted that we once had paper maps, but we’ve moved very far past that. It’s one thing to digitize sources, e.g.- census data, but once it’s digital it’s imperative to describe and support it in such a way that people can actually use it successfully.
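The law she paraphrased (Tobler’s first law) can be shown with a toy sketch of my own devising, not anything from the talk: generate points along a line with smoothly varying values, then compare how similar nearby pairs are versus distant pairs.

```python
# Toy illustration (mine, not from the talk) of the "first law of
# geography": everything is related, but near things are more related.
# Values here vary smoothly with location, so pairs of nearby points
# have smaller value differences than distant pairs.
points = [(x, 10 + 2 * x) for x in range(10)]  # (location, value)

def pair_stats(points):
    """Return (distance, absolute value difference) for every pair of points."""
    stats = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (x1, v1), (x2, v2) = points[i], points[j]
            stats.append((abs(x1 - x2), abs(v1 - v2)))
    return stats

stats = pair_stats(points)
near = [dv for d, dv in stats if d <= 2]  # nearby pairs
far = [dv for d, dv in stats if d > 2]    # distant pairs
assert sum(near) / len(near) < sum(far) / len(far)  # near pairs are more similar
```

Real spatial analysis would use a formal statistic such as Moran’s I, but the principle is the same: similarity decays with distance.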
With regard to dissemination, she noted that it’s imperative for researchers to describe what they have done in lay language, as that opens their research up to others. She feels that much of this–the impulse to share and distribute data–is inherent in new scholars, but as one goes up the age scale it becomes an increasingly radical notion.
Brown has S4: Spatial Structures in the Social Sciences. The initial goal was to “make Brown a highly visible international center for research that investigates human behavior in its social and spatial context.” From her description what it sounds like is a sophisticated network of varied expertise that supports a wide array of scholars using spatial data. They emphasize training, for example, including an annual two-week GIS training course. Part of the goal is to bring people with disparate backgrounds together and get them talking. Much of this description resonated with me since we are trying to do many of these things at the Sherman Centre. We don’t “do” the research nor produce the product; we facilitate, train, connect, listen. They specifically train people to do this so that when they go out into the world, they take this way of doing things with them and build it wherever they land.
Her best line: “I’m too old for Python.” This is the wisdom of an advanced scholar who recognizes that we cannot expect individuals to learn every skill their work might require. It takes a network.
Eric spoke from the context of archaeology, but as it relates to research data management. He feels that RDM is an area that needs more intellectual investment. It’s not just “save the data” and do the same thing we’ve always done, but using data to enhance research and open up new opportunities. In other words, libraries and librarians should see themselves as intellectual partners in this area. Specifically, he spoke about the Open Context project.
So what are the challenges? Status quo, for one, where the article is the be-all and end-all of research, and data are considered unimportant. We hold on to our data with tenacity and it dies with us, so to speak. When we talk about data management, we tend to use language of compliance and metrics (he noted that this is a Taylorist perspective). After all, we call it management. It’s checkboxes in a plan.
Open Context takes a different road, where data becomes its own form of publication. They organize peer review, exercise an editorial role, etc. His point: data needs the same kind of attention that other forms of scholarship merit. Given the focus on archaeology, there is a lot of “less structured data” such as images and field notes. Showed some examples of specific information in their system, which enables what he called microcitation, or citing very small elements of a large dataset. This is critical for future understanding and for following the thought process of a scholar.
He highlighted the difference between OC and a digital repository. The latter takes the whole file as the object of citation, while OC allows actual objects, a bone or a weapon, to be directly citable. This does lead to the criticism that OC is an expensive boutique publisher that cannot scale up or out. He had extensive answers to this question, noting, for example, the value inherent in creating linked data and publishing it to the Web, which yields benefits in terms of letting more people connect with the work.
It takes work, but as he asked, “what’s the point of all of this?” He introduced the idea of slow data in contrast to big data. How this work comes together is important and deserves support; for example, he noted that much of it happens in the alt-ac realm, clearly outside the academic mainstream. We need experimentation and community capacity building (he sees libraries as natural collaborators and intellectual partners).
Ethan spoke about international and cross cultural collaboration on the basis of two projects. For their work with a Malian photo archive, they (Matrix at MSU) are working with NEH funds and the British Library. The other example is a project exploring the archaeology of Gorée Island off the coast of Senegal, a major site in the history of the West African slave trade. With this project they had serious preservation and access issues to address. The latter, for example, required a trip to Dakar. They worked with a university in Senegal and the Smithsonian on digitizing portions of this collection, using stereophotogrammetry for 3D objects. The project involves training graduate students in Senegal to do the work, which gets to the issue of building capacity.
So his punchline, as he put it, is that this is only possible if you:
- seek partners and collaborators, not service or content providers
- seek points of fruitful compromise that balance your goals with those of your partners
- recognize that issues of openness and cultural patrimony aren’t always binary
As he noted, many projects being done in African nations do not seek collaboration, but simply see these countries as a resource. With regard to his second point, our Western impulse to slap a CC license on everything is often at odds with the desires of the community that feels ownership of the materials. On his last point, he rejects the notion that openness and cultural patrimony are binary, suggesting that it’s more of a spectrum and one needs to negotiate that with partners.
Funding Partnerships—Trans-Atlantic Platform
Geneva Henry, George Washington U; Brett Bobley, NEH; Brent Herbert-Copley, SSHRC
Brent from SSHRC spoke first. The Trans-Atlantic Platform dates to 2013 and includes key funders in South America, North America, and Europe. The idea behind it is that international research collaboration is increasing, but a lack of coordination in national funding creates barriers to collaboration.
Initial activities include analysis of member organizations and the creation of an overview of cooperation across the EU and the Americas. One area where collaboration has been successful is Digging into Data. Building on that, there is now a plan to launch a new round in 2016. There is also work going on to define a thematic scope, i.e.- to identify areas where funding could play a critical role in addressing diversity and inequality, furthering transformative environmental research, and building resilient and innovative societies.
Why is this important to SSHRC? For one, SSHRC has a history of funding the digital humanities and has increasingly opened programs to international collaborators and co-applicants. It also aligns with the new policy frameworks around open access and research data management, as well as with some new strategic directions.
Brett noted that Digging into Data, which started in 2009, assumed from the start that libraries would play a role. It’s a very successful program, attracting a fair amount of media attention. It has also spanned a wide range of disciplines.
He pointed out that one reason to engage in TAP is that it allows the NEH to fund projects larger than it could fund on its own (as he put it, they don’t have much money as we all know). That’s a major advantage of scale.
Global Partnerships in Digital Scholarship
Elliott Shore, ARL; Geoffrey Boulton, U of Edinburgh
Boulton’s unenviable task was to summarize the day and draw some connections. He said many interesting things, including the directly challenging notion that “research has become the enemy of teaching.” From my vantage point in libraries, I could not agree more.
He then turned to the notion of global partnerships in digital scholarship. He noted that he was out of his comfort zone speaking about digital scholarship in the humanities and social sciences. As he pointed out, his comfort zone is in Antarctica and Tibet doing science.
What binds the sciences to the humanities? One notion: systematic organization of knowledge that can be rationally explained and reliably applied. There are four strategies we deploy to achieve this: empirical claims (how the world is), normative claims, formal analysis, and interpretation.
- all disciplines use all these approaches, in different proportions
- empirical claims are based on “data”
- data issues are the same for all of us in essence, but vary in the details
Used the example of Henry Oldenburg and the Royal Society’s decision to publish accounts of science from various corners of the world. Oldenburg insisted upon the vernacular as opposed to Latin, a key moment in the history of open knowledge.
He emphasized the scale problem with data. With so much data, it’s become a basic challenge to reproduce results. Studies are showing that the vast majority of articles in various disciplines are not reproducible. The notion that science is not infallible is, as he showed with some choice quotes from mass media such as *The Economist*, becoming widely understood. The issue: the probability/possibility that most research conclusions are wrong. Why is this so? Inadequate data or metadata, invalid reasoning, fraud. If we fix those, is it all better? Probably not, and perhaps we shouldn’t. We don’t want to inhibit bold ideas and imagination. Science is perhaps the best way to gain new knowledge, but it will remain uncertain and provisional. Does this matter, he asked? Not really. It’s just a state of affairs.
He sees great promise in open knowledge, quoting Shaw’s famous line about exchanging apples versus exchanging ideas. Showed a concrete example from the natural sciences (EMBL-EBI Elixir).
He made one of the few negative observations I’ve heard about data visualization: if we merge a wide array of data–a large number of datasets–we actually make the data more difficult to visualize, even though the value of the massive combination is great.