Category Archives: Spring 2014

DH Praxis Course Archive

This website was used for the DH Praxis course at the Graduate Center, CUNY, during the 2013-2014 academic year. Taught by Profs. Matthew K. Gold and Stephen Brier, the course aimed to introduce graduate students to the landscape of digital humanities work during the fall semester and to engage students in hands-on collaborative work during the spring.

Here are some things you can do on this site:

  • Check out the archived lectures of our Fall speakers on subjects such as topic modeling, data visualization, project management, and networked scholarly communication.
  • Visit the three projects produced by student teams during the Spring 2014 semester:
      • Beyond Citation: Critical Thinking About Academic Databases
      • DH Box: A Digital Humanities Laboratory In the Cloud
      • Travelogue: Mapping Literary History
  • Check out blog posts by individual students and teams below.

Announcing the launch of Beyond Citation

The Beyond Citation project team is thrilled to announce the public launch of our website. Even though scholars use academic databases every day, it is difficult to find information about how the databases work and what is in them. Beyond Citation gathers information about academic databases in one place so that traditional humanities scholars and digital humanists can get a better sense of the content and search mechanisms of databases. The goal of Beyond Citation is to make academic databases more transparent to users and to encourage critical thinking about academic databases.

The audience for the site is scholars, librarians, research enthusiasts or anyone who:

  • Uses academic databases and wants to learn more about what is in them
  • Is frustrated with academic databases and wants tips about how to more effectively search them
  • Wants to share their knowledge or experiences of academic databases with others

We invite you to participate in Beyond Citation by:

  • Starting or adding to a thread in the Community Forum
  • Proposing an article or blog post that you would like to write
  • Offering to write an entire entry for a new database

Please visit our website and follow us on Twitter @beyondcitation.


DH Box Takes Off

Cross-posted from the DH Box Blog:

This is it: DH Box is officially launching. The Digital GC is presenting an evening of short talks from various CUNY Graduate Center digital initiatives today, May 12 — starting off with DH Box.

I wanted to take a moment to reflect on where DH Box started and how far we’ve come. We introduced our project in early February:

What is DH Box?

Not much, so far. But we intend it to be a portable, customized Linux environment for Digital Humanities learners that can rely on incredibly inexpensive technology. All you really need is a computer that runs Linux (and a monitor and keyboard, of course!), but the platform that excites us most is the Raspberry Pi, a tiny computer that sells for just $35. Imagine a collection of DH tools, pre-installed and configured, and a set of texts for users to interrogate, all on a portable and inexpensive device.

That’s a quote from our first blog post, and it illustrates the most drastic change to our project. DH Box’s founder, Stephen Zweibel, originally envisioned DH Box as a set of scripts that, when run, would install common DH applications (think Omeka, MALLET, NLTK) onto the user’s system; additionally, DH Box could be shipped with its suite of tools pre-installed on the light and portable Raspberry Pi computer.

As DH Box developed, its platform shifted: rather than dealing with the idiosyncrasies of each individual’s system, it now hosts instances of a virtual computer that any user can launch.

This was a major and visible shift. Many other project elements, though they changed less drastically, also developed on the journey from DH Box’s inception to its official launch.


DH Box: Countdown to Launch

DH Box is nearly ready to launch! Our user experience testing is complete, and we are putting the finishing touches on our front-end interface. We have expanded the documentation on our wiki, adding new pages for “What is DH Box” and “Launching DH Box,” as well as pages for each of the tools in DH Box: IPython, MALLET, NLTK, Omeka, and RStudio.

Our team gave a presentation on DH Box for the Academic Center for Excellence in Research and Teaching at Hunter College. It was exciting to show our project to faculty who may want to use it in their teaching and research. The audience included math, English, and sociology professors, as well as librarians, IT specialists, and adaptive technology specialists, and we got a great range of questions. Some were very specific: “Can I upload a csv file with the information for all 25 students in my class or do I need to add a new user for each one?” Great idea! We’ll see what we can do. Other questions were more general, such as “What tools do you plan to add in the future?” We will be adding more tools as the project develops, but we need to limit our selection to open-source tools that are web-based or run on the command line.

Everyone was very interested in how using a virtual server can improve access to technology for students. Our team is excited about a new project at the University of Mary Washington called A Domain of One’s Own. The project will provide all incoming freshmen with their own domain names and web space. Students will have the freedom to create subdomains, install any LAMP-compatible software, set up databases and email addresses, and carve out their own space on the web that they own and control.

DH Box brings a powerful virtual computer to anyone with a web connection. Students do not need to own the most recent laptop or attend a school with a big-budget computer lab. We are very excited about how our project may grow in the future to offer even more!


DH Box Development and Testing

We’ve made big strides developing the front-end interface to launch a new DH Box, and the Welcome page/menu that acts as the DH Box ‘home base’. We received extremely helpful feedback from some generous volunteer user experience testers at City Tech, and valuable advice from Chris Stein, Director of User Experience for the CUNY Academic Commons.

The results of our first round of user experience testing gave our team some great insights, and a fresh perspective on the project. We learned that perhaps one of our biggest challenges is effectively conveying the concept of the project in a readily digestible way.

We discovered that users can easily get the impression that DH Box is essentially a website, when in fact it’s much more than that (it’s a computer!). It’s understandable that this virtual computer could be confused with a website, since DH Box’s primary navigation happens through your web browser. A distinct IP address is assigned to each DH Box instance at launch, and users reach individual applications (MALLET, Omeka, etc.) through a specific port designated for each tool. The “port” is just a number appended, after a colon, to the end of your DH Box’s IP address. The same addressing scheme is the basis of the internet: there’s an IP address behind every website.
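To make the IP-plus-port idea concrete, here is a minimal Python sketch of how a browser address for each tool could be assembled. The port numbers in `TOOL_PORTS` and the helper name `tool_url` are hypothetical, chosen only for illustration; this post does not say which ports DH Box actually assigns, and the example IP comes from a block reserved for documentation.

```python
import ipaddress

# Hypothetical tool-to-port table; DH Box's real port assignments are not given in this post.
TOOL_PORTS = {
    "ipython": 8888,
    "omeka": 8080,
    "rstudio": 8787,
}


def tool_url(instance_ip: str, tool: str) -> str:
    """Return the address a browser would use to reach one tool on one DH Box instance."""
    # Validate the instance's IPv4 address; raises ValueError if it is malformed.
    ip = ipaddress.IPv4Address(instance_ip)
    # Each tool has its own port, written after a colon at the end of the address.
    port = TOOL_PORTS[tool.lower()]
    return f"http://{ip}:{port}/"


if __name__ == "__main__":
    # 203.0.113.0/24 is reserved for documentation, so this is not a real DH Box.
    for name in ("IPython", "Omeka", "RStudio"):
        print(name, "->", tool_url("203.0.113.10", name))
```

Same IP, different port, different tool: that is the whole navigation scheme, and it is exactly what makes a DH Box feel like a website even though it is a full machine.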

We are now reexamining how to explain this system of navigation, along with all of the fantastic things a virtual computer can offer, so that users will be ready to push DH Box to its limits.

[Cross-Posting] On Successful #DH Project Management

Project management is difficult. As one of my teammates said to me point-blank: “I would not want your job.”

As our team began to work on Travelogue, I assumed that my brief stint organizing the development of two separate websites in various professional settings would help me. But while a background in marketing has allowed me to think more critically about things like publicity, nothing really prepared me for managing people my own age in a setting where we do not receive salaries for our work.

And while I have been extremely lucky to work with a group of brilliant people who are invested in helping me complete the project, it has been tricky figuring out how to tell people what (and how much) to do; everyone has full lives outside of school.

In a work setting, orders would come down from my boss, who had little idea of the actual tasks we needed to complete in order to build a website. The details of these orders were laid out for me by advanced IT and design departments, each of which had its own ideas about how the website should look and behave. In this project, where I am the “boss,” things have been more difficult, especially because, while all of us have great ideas, the actual means of executing them can be unclear. But having only a basic understanding of web design does not mean that you can’t build something (mostly) from scratch. You just need a good plan.

Websites and website redesigns can (and do) take years to complete, but for this project, we only have about four months. In the course of this semester thus far, I’ve found that a few things are essential to completing a project successfully. Some seem obvious, but when you are trying to keep a bunch of different wheels spinning, simple things can be easy to forget.

(Of course, this is not a complete list.)

Know Your Deliverables

What are the major tasks that need to be completed in order to produce a final project? In the course of a semester, what needs to be completed from week-to-week in order to get things done? Setting some key deadlines, and being able to adjust them, will help the project move forward. I made a simple project plan in an Excel document that was arranged by week, with a new goal for each Monday. From there, I doubled back and talked to my group members about what needed to be completed for each goal. I am indebted to Micki Kaufman for major assistance here, as well as to Tom Scheinfeldt’s lecture last semester.

Use Your Support Network

There are experts at your school who can help you. As it goes with everything, being afraid to ask for help can (and will) diminish your success.

Know Your Team’s Strengths (and Weaknesses)

Project management involves a good deal of emotional intelligence. Knowing where your group members are coming from, and being aware of and sensitive to what they can and can’t accomplish in a given time frame, will provide for a better outcome. It kind of goes without saying that actively listening to your group members’ concerns and ideas will make them more invested in your goals.

Be Flexible

This goes for allowing extra time in your project plan, as well as being open to adjusting your vision and/or timeline. It can be hard to let go of original ideas, but if they aren’t working, it’s important that you are able to recognize that and just let go. In the case of Travelogue, our project scope changed slightly from what I originally proposed once we learned more about our platform. As for the schedule, pad it with enough extra time to absorb roadblocks or an unexpected learning curve.

Relax (a Little Bit)

In working on a major project with a tight deadline, it is important not only to manage your expectations but also not to put too much pressure on your group. My personality defaults to a surface-level relaxation that can be misread as lackadaisical, when usually (like anyone else) I’m managing a huge amount of internal stress. I try not to let my internal freakouts turn into micromanaging my team, which would leave anyone stressed out and disengaged. At the same time, being too lax about deadlines says: “I don’t really care.” If you don’t care, neither will they.

We are currently buzzing around our computers to get this thing done, with constant revision of the plan to keep things in motion.


And here is a link to the project plan for anyone who’s interested:

Academic Databases: Beyond Digital Literacy

Basic digital literacy for scholarly research includes knowing how to access digital archives, search them, and interpret their results.

Another component of digital literacy is familiarity with the semiotics of the interface: knowing how to “read” the instructions and symbols that give the user an idea of what invisible material lives in a database. These portals make the contents accessible, and also convey, before a search is even conducted, a range of search possibilities. The interface suggests something about the most useful metadata that the archive contains and the way the data can be accessed.

A user, then, can glean some understanding of the database’s mechanics from the interface alone. This additional level of digital literacy is helpful, but it still represents a limited understanding of databases. Many of the archives commonly used by humanities scholars, librarians, and historians are proprietary, and even with some information and educated guesses about these archives’ metadata structures, it’s difficult or impossible to go a step deeper and discern exactly how the search algorithms work and how the database is designed.

This is an issue of emerging importance for digital scholars, and is prompting historians and others to think about what appears in search results and what doesn’t. But even if researchers knew how every database and its search algorithms worked, that wouldn’t resolve all the issues and theoretical implications of digital research and scholarship. As Ben Schmidt has pointed out, “database design constrains the ways historians can use digital sources.”

The limits of database design are an important window into the computational disciplines that make information science possible in the first place. Programming machines to search a heterogeneous mix of digitized source materials is, of course, a broad problem, involving a myriad of methods that are constantly evolving and becoming more powerful. So it’s worth asking: when are the issues associated with digital research contingent on computational science, and when are they contingent on the way proprietary archives and databases choose to implement the latest algorithms?

An interesting consideration in addressing this question might start with a distinction that William J. Turkel makes between scholars who use subscription archives and those who write code to mine massive data sets themselves. The literary scholar Ted Underwood has also discussed searching academic databases and data mining in parallel, commenting, “I suspect that many humanists who think they don’t need “big data” approaches are actually using those approaches every day when they run Google searches . . . Search is already a form of data mining. It’s just not a very rigorous form: it’s guaranteed only to produce confirmation of the theses you bring to it.”

Thinking about the distinction between proprietary database engineers and dataset hackers might foster the assumption that the two groups have radically different agendas or methods for searching born-digital and digitized archival material. But while independent programmers represent a new frontier of sorts (scholars willing to learn the methods needed to do their own research and retrieve information from their own source material), they aren’t necessarily confronted with fewer database design limitations than the engineers who work at Gale. This gets at the heart of what’s at stake for researchers in a digital age, and why this is an apt time to explore the way digital archives work on a computational level.

Many automated, machine-driven search techniques are sets of instructions that don’t always produce predictable results, and they can be difficult to reverse engineer even when bugs are discovered. Corporate engineers don’t have full control over the results they get, and neither do hackers or the authors of open-source software.

Why is that important? One goal of Beyond Citation is to explore and provide information on how databases work, so that scholars can better understand their research results. One could argue that scholars require so-called “neutral” technology: systems that don’t favor any one type or set of results over another. And it’s easier to understand and confirm search neutrality if algorithms and source code are publicly available. But exactly what is such neutrality, and would we know it if we saw it? Any algorithm, secret or otherwise, is a product of disciplinary constraints and intersections, and reveals the boundaries of what’s computationally possible. In short, the “correctness” of any algorithm is hard to nail down.

When we look more closely at the concept of neutrality, we see that both the user and the engineer are implicated in algorithmic design choices. The legal scholar James Grimmelmann has made a compelling argument that “Search is inherently subjective: it always involves guessing the diverse and unknown intentions of users.” Code that’s written as a service to users is written with an interaction already in mind. Evaluating the nuances of search algorithms and determining their impact on the integrity of one’s research involves acknowledging these kinds of imagined dialogues.

These are just some exploratory thoughts, as none of these questions about database design and search can be taken in isolation. Beyond Citation, then, is a starting point for going beyond digital literacy in multiple directions. We are gathering and presenting the kinds of knowledge that might allow scholars to distinguish between computational limitations, the limits of metadata and the ways it’s structured, and the agendas of a proprietary company. As the project evolves, we ourselves hope to deepen the kinds of skills and knowledge that allow us to present such information in the most meaningful and usable ways.

Communicating Technical Process

With alpha work on DH Box wrapping up, it’s a good moment to reflect on some technical lessons learned, as well as some lessons about being on the technical side of a team. Up to this point, I have kept my team generally apprised of DH Box’s technical progress, but I have kept most of the implementation details to myself: the specific tools I’ve used, why I chose them, their pros and cons, and possible alternatives.

This is, in part, due to the fact that I did not begin with a particular plan. Though we had a well-defined goal for DH Box, I knew that there were myriad ways to reach it. So I experimented with different methods of cloud deployment and server provisioning, that is, different ways of creating each new instance of DH Box and automatically installing all of the necessary software on it.

I started with a BASH script designed to run on the first boot of each new DH Box instance. This worked well enough, but didn’t offer much in the way of sophisticated automation or transparency for debugging. I then tried some of the more well-known server deployment/provisioning tools, like Puppet and Salt. Puppet I found less straightforward than I’d hoped, partially because it requires modules to be written in a homespun variety of Ruby, which I’m not super comfortable with. Salt did more of what I wanted, but I was still reading its documentation when I became distracted by yet another tool, Ansible.

Ansible turned out to be just what I needed: It is written in Python, a language I have more familiarity with, and it allows me to monitor each deployment of a new DH Box in real time. Using Ansible, I’ve been able to create a whole automation workflow in one language, and, even better, I can easily see if and at exactly which point a deployment fails. This is crucial to efficient problem solving and future updates for DH Box, as its installation process necessarily involves many separate moving parts.
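Why it matters to see “exactly which point a deployment fails” can be illustrated with a small Python sketch. This is not Ansible or its API; it is a hypothetical stand-in (`run_steps` and the step names are invented for illustration) that runs named installation steps in order, stops at the first error, and reports which step broke, roughly the kind of feedback described above.

```python
from typing import Callable, List, Optional, Tuple

# A provisioning step: a human-readable name plus a function that performs it.
Step = Tuple[str, Callable[[], None]]


def run_steps(steps: List[Step]) -> Optional[str]:
    """Run steps in order; return the name of the first failing step, or None if all succeed."""
    for name, action in steps:
        try:
            action()
        except Exception as exc:  # any error halts the deployment at this step
            print(f"FAILED at '{name}': {exc}")
            return name
        print(f"ok: {name}")
    return None


if __name__ == "__main__":
    def install_omeka() -> None:
        pass  # placeholder for a real installation task

    def install_mallet() -> None:
        raise RuntimeError("download timed out")  # simulated failure

    def install_rstudio() -> None:
        pass  # never reached, because the previous step fails

    failed = run_steps([
        ("install Omeka", install_omeka),
        ("install MALLET", install_mallet),
        ("install RStudio", install_rstudio),
    ])
    print("deployment stopped at:", failed)
```

Stopping at the first failure, rather than pressing on, keeps later steps from running against a half-installed system and points straight at the part that needs fixing.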

With these details of DH Box’s technical framework determined, it’s possible to create a more concrete “blueprint”, and I’m now working with our project planner, Gioia, to incorporate much more specific technical milestones into our overall plan. Going forward, I hope to keep everyone up-to-date and communicate some of what I learn along the way, without getting us too bogged-down in technical minutiae.

Collaborative Opportunities

The Travelogue team has been exploring how other sites use maps as digital pedagogical tools. We are also connecting with possible collaborators, including other mapping projects, educational institutions, and libraries.

To participate in the conversations happening on social network platforms, Travelogue has been monitoring how similar projects use Twitter. We have explored hashtags related to maps, literature, teaching, English, history, social studies, high school teachers, lesson plans, and more. We have also been following the conversations and posts on the Humanities, Arts, Science and Technology Alliance and Collaboratory (HASTAC) site.

On the development front, we are testing several WordPress child themes to see which will work best for the Travelogue site and the Esri Story Map we will be using. Research-wise, we have completed a workable draft of the Ernest Hemingway content spreadsheet, which we will use to construct Travelogue’s Ernest Hemingway Story Map.

The Travelogue Commons site has a Research section featuring categorized, helpful resources compiled over the course of the Travelogue project, for example, Esri Storymaps for Education.

Thank you for following our journey. We look forward to sharing our connections with others in the GIS world.

If you want to contact us, please do! Visit our project blog, email us at dhtravelogue [at] gmail [dot] com, or follow us on Twitter @DhTravelogue.

It’s the Content, Stupid.

I teach workshops on library databases to a range of users throughout the year at the New York Public Library. Some of the walk-in students are academics, others are unaffiliated scholars, and many more are undergraduate or graduate students from nearby schools. The degree to which they’re familiar with platforms, searching, Boolean logic, peer review, and formats varies. But one thing all the students share is general confusion as to which database they should use for the kind of research they’re conducting.

The database vendors don’t help: Readex? Never heard of it. ProQuest? Sounds vaguely familiar. And the database names (Academic Search Premier, Ulrich’s, Project Muse) are opaque. Yes, some exact titles, like The New York Times or Chicago Defender, can steer the user in a general direction, but without a greater understanding of the kind of content that can be found in each resource, users are left to fend for themselves. And that usually means Google. While Google is not an inherently bad choice, especially for initial research queries, relying on it leaves many valuable subscription resources unexplored.

Take online reference databases—in the past, a question asked at the information desk often resulted in a librarian directing the user towards the section of the physical reference shelf where one might find sources to help. Today, much of that reference shelf has moved online to platforms like Credo or Gale Virtual Reference Library. The online sources may provide 24/7 access to information, but finding relevant titles is often more difficult.

Theoretically, that’s where discovery platforms like Summon and EBSCO Discovery Service come in. Discovery platforms search the metadata of nearly all of a library’s subscription resources simultaneously, so users don’t need to visit each database individually. But they are only helpful if the service your library subscribes to indexes the databases you need. EBSCO Discovery Service, for example, doesn’t index ProQuest products, and vice versa. Therefore, if you’re using EBSCO Discovery Service to search historical newspapers or periodicals, many of which live in ProQuest databases, your results will be greatly limited.
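A toy Python model shows why coverage, not search skill, determines what a discovery platform can return. The records, vendor labels, and the `discovery_search` function are all hypothetical; real services like Summon or EBSCO Discovery Service are far more complex, but they share the basic constraint sketched here: a record that was never indexed cannot appear in the results.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Record:
    title: str
    vendor: str  # which company's database the record comes from (illustrative labels)


def discovery_search(records: Iterable[Record], indexed_vendors: Set[str], term: str) -> List[Record]:
    """Match a term against titles, but only for records from vendors this library's service indexes."""
    term = term.lower()
    return [
        r for r in records
        if r.vendor in indexed_vendors and term in r.title.lower()
    ]


# Hypothetical catalog: one invented record per vendor.
CATALOG = [
    Record("Sample journal article on newspaper history", "EBSCO"),
    Record("Sample historical newspaper page, 1890", "ProQuest"),
]

if __name__ == "__main__":
    # A discovery layer that indexes only EBSCO content never sees the ProQuest record.
    print(discovery_search(CATALOG, {"EBSCO"}, "historical newspaper"))
    # Indexing both vendors surfaces it.
    print(discovery_search(CATALOG, {"EBSCO", "ProQuest"}, "historical newspaper"))
```

The query is identical in both calls; only the set of indexed sources changes, and with it what the user is able to find.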

Perhaps it’s no surprise, then, that 97% of academic library directors surveyed in the recent Ithaka S+R survey cite teaching information literacy to undergraduates as an important function of the library. With such limited transparency in online sources, undergraduates clearly need all the help they can get when starting their research.

The Beyond Citation team hopes that researchers, both seasoned and amateur, will shine a light on the databases they use regularly by examining each database’s strengths, weaknesses, and overall range of material. In other words, the content. Because without a better understanding of the troves of rich information discoverable in each database, they’re all just links on a page.

Visit our website, email us at BeyondCitation [at] gmail [dot] com, or follow us on Twitter @beyondcitation as we get ready for the launch in May.