
DH Praxis Course Archive

This website was used for the DH Praxis course at the Graduate Center, CUNY, during the 2013-2014 academic year. Taught by Profs. Matthew K. Gold and Stephen Brier, the course aimed to introduce graduate students to the landscape of digital humanities work during the fall semester and to engage students in hands-on collaborative work during the spring.

Here are some things you can do on this site:

  • Check out the archived lectures of our Fall speakers on subjects such as topic modeling, data visualization, project management, and networked scholarly communication.
  • Visit the three projects produced by student teams during the Spring 2014 semester:

Beyond Citation: Critical Thinking About Academic Databases

DH Box: A Digital Humanities Laboratory In the Cloud

Travelogue: Mapping Literary History

  • Check out blog posts by individual students and teams below.

Announcing the launch of Beyond Citation

The Beyond Citation project team is thrilled to announce the public launch of our website at BeyondCitation.org. Even though scholars use academic databases every day, it is difficult to find information about how the databases work and what is in them. Beyond Citation gathers information about academic databases in one place to enable traditional humanities scholars and digital humanists to get a better sense of the content and searching mechanisms in databases. The goal of Beyond Citation is to make academic databases more transparent to users and to encourage critical thinking about academic databases.

The audience for the site is scholars, librarians, research enthusiasts or anyone who:

  • Uses academic databases and wants to learn more about what is in them
  • Is frustrated with academic databases and wants tips about how to more effectively search them
  • Wants to share their knowledge or experiences of academic databases with others

We invite you to participate in Beyond Citation by:

  • Starting or adding to a thread in the Community Forum
  • Proposing an article or blog post that you would like to write
  • Offering to write an entire entry for a new database

Please visit BeyondCitation.org. Follow us on Twitter @beyondcitation.


DH Box Takes Off

Cross-posted from the DH Box Blog: http://dhbox.commons.gc.cuny.edu/blog/2014/dh-box-takes-off


This is it: DH Box is officially launching. The Digital GC is presenting an evening of short talks from various CUNY Graduate Center digital initiatives today, May 12 — starting off with DH Box.

I wanted to take a moment to reflect on where DH Box started and how far we’ve come. We introduced our project in early February:

What is DH Box?

Not much, so far. But we intend it to be a portable, customized Linux environment for Digital Humanities learners that can rely on incredibly inexpensive technology. All you really need is a computer that runs Linux (and a monitor and keyboard, of course!), but the platform that excites us most is the Raspberry Pi, a tiny computer that sells for just $35. Imagine a collection of DH tools, pre-installed and configured, and a set of texts for users to interrogate, all on a portable and inexpensive device.

That’s a quote from our first blog post, and it illustrates the most drastic change to our project. DH Box’s founder, Stephen Zweibel, had originally envisioned DH Box as a set of scripts that, when run, would install common DH applications (think Omeka, MALLET, NLTK) on the user’s system; additionally, DH Box could be shipped with its suite of tools pre-installed on the light and portable Raspberry Pi computer.

As DH Box developed, its platform shifted: rather than dealing with the idiosyncrasies of each individual’s system, we moved to hosting instances of a virtual computer that any user could launch.

This was a vast and visible shift. But many other project elements, though they changed less drastically, also developed on the journey from DH Box’s inception to its official launch.


DH Box: Countdown to Launch

DH Box is nearly ready to launch! Our user experience testing is complete and we are putting the finishing touches on our front-end interface. We have added more information to the documentation on our wiki, including new pages for “What is DH Box” and “Launching DH Box” as well as pages for each of the tools in DH Box: IPython, MALLET, NLTK, Omeka, and RStudio.

Our team did a presentation on DH Box for the Academic Center for Excellence in Research and Teaching at Hunter College. It was exciting to show our project to faculty who may want to use it in classroom teaching and research. The audience included math professors, English professors, sociology professors, librarians, IT specialists, and adaptive technology specialists, and we got a great range of questions. Some were very specific: “Can I upload a CSV file with the information for all 25 students in my class, or do I need to add a new user for each one?” Great idea – we’ll see what we can do! Other questions were more general, such as “What tools do you plan to add in the future?” We will be adding more tools as the project develops, but we need to limit our selection to open-source tools that are web-based or run from the command line.

Everyone was very interested in how using a virtual server can improve access to technology for students. Our team is excited about a new project at the University of Mary Washington called A Domain of One’s Own. The project will provide all incoming freshmen with their own domain names and Web space. Students will have the freedom to create subdomains, install any LAMP-compatible software, set up databases and email addresses, and carve out their own space on the web that they own and control.

DH Box brings a powerful virtual computer to anyone with a web connection. Students do not need to own the most recent laptop computer or to attend a school with a big-budget computer lab. We are very excited about how our project may grow in the future to offer even more!

 

[Cross-posted] The conundrum of public creation

In “Welcome to Travelogue,” the first blog post for our Travelogue: Mapping Literary History project, our great Project Manager Sarah talked about the excitement the group felt at embarking on this project and our eagerness to learn new things and create a great digital project. She was speaking the truth; we are all excited about working on this project.

For me, as the web site developer, the first thing I had the opportunity to learn was WordPress. The idea was that I would create a meta-blog site, and the whole group would use it to blog about the process we were all going through to create our project, “Travelogue – Mapping Literary History”. Building this meta-blog would give me a place to learn and experiment with WordPress, so that when I had to create the official web site for our actual public project, I’d be comfortable and familiar with the CMS.

In her post, Sarah also referenced a post I had written for our Fall 2013 Digital Praxis seminar, where I talked about not being afraid to fail. I wrote that the process of learning and trying new things was itself a success, whether or not the project failed. That may sound good, but I must admit that in reality it is hard to live by that philosophy. I was afraid to fail and afraid to create a site that would fall short, and doing that in public is not easy. It is not easy working and creating “in public” (a phrase our professor Matt Gold likes to use). It is not easy to talk about your worries and concerns in public. In my working life, I’ve been in places where you don’t show the process to the public, just the results. You know, you don’t want to see the sausage being made; you just want to eat the sausage. I had to keep reminding myself that part of this class and project was actually doing a good portion of our work in public and letting the public see what we were doing and the difficulties we were having, along with our successes. Stay tuned for my next post, where I will write about some of my failures and successes so far in creating these two sites and what I’ve learned working on this group project.

 

DH Box Development and Testing

We’ve made big strides developing the front-end interface to launch a new DH Box, and the Welcome page/menu that acts as the DH Box ‘home base’. We received extremely helpful feedback from some generous volunteer user experience testers at City Tech, and valuable advice from Chris Stein, Director of User Experience for the CUNY Academic Commons.

The results of our first round of user experience testing gave our team some great insights, and a fresh perspective on the project. We learned that perhaps one of our biggest challenges is effectively conveying the concept of the project in a readily digestible way.

We discovered that users can easily get the impression that DH Box is essentially a website, when in fact it’s much more than that (it’s a computer!). It’s understandable that this virtual computer could be mistaken for a website, since DH Box’s primary navigation happens through your web browser. A distinct IP address is assigned to each DH Box instance when it is launched, and users reach individual applications (MALLET, Omeka, etc.) through specific ports designated for each tool. A port is a number appended to the end of your DH Box IP address, after a colon, that tells the machine which of its services you want to reach. The same addressing scheme underlies the internet as a whole: there’s an IP address behind every website.
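As a minimal sketch of this addressing scheme, the Python below builds a browser address for each tool from an instance’s IP address and a per-tool port. The port assignments in `TOOL_PORTS` and the helper name `tool_url` are hypothetical, chosen only for illustration (8787 and 8888 happen to be the usual defaults for RStudio Server and the IPython notebook, but DH Box’s actual configuration may differ), and 203.0.113.10 is a reserved documentation address, not a real DH Box.

```python
# Hypothetical port assignments for illustration; DH Box's real ports may differ.
TOOL_PORTS = {
    "omeka": 8080,
    "rstudio": 8787,
    "ipython": 8888,
}


def tool_url(ip_address: str, tool: str) -> str:
    """Build the address a browser would use: http://<IP address>:<port>/"""
    port = TOOL_PORTS[tool]
    return f"http://{ip_address}:{port}/"


if __name__ == "__main__":
    # 203.0.113.10 comes from a range reserved for documentation examples.
    for name in TOOL_PORTS:
        print(name, tool_url("203.0.113.10", name))
```

Because only the port changes from tool to tool, every application on a given DH Box shares one IP address; that is the sense in which a DH Box is a single computer offering several services rather than a collection of separate websites.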

We as a team are now reexamining how to explain the system of navigation, along with all of the fantastic stuff a virtual computer can offer so that users will be ready to push DH Box to the limit.

Outreach: collaborative promotion

Promoting a project still in development is not an easy task, but it isn’t impossible. Like other projects in the DH class, the outreach approach of Beyond Citation has been conceived as a collaborative effort among all team members. We all feel confident that the project has much to offer. The real question is how to make it known to our potential users. However, it is hard to measure what counts as a strong audience. Right now we have a monthly average of 285 unique blog visitors. Is this number enough? What is considered a success in the online world?

Also, it is important to keep in mind that outreach isn’t a popularity contest; it is a combination of individual and collective actions working toward a common goal of engagement with the project. And that entails more than just counting the number of visitors.

Therefore, it was imperative to understand who our core audience is. Based on the type of content found on Beyond Citation, we believe that scholars and users of academic databases will be our core audience. But what is the best way to reach them? Once the website is fully operational, a minimum of 28 monthly users could be considered an outreach achievement. That would tell us that roughly 10% of our blog audience understood what Beyond Citation is.
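The arithmetic behind the 28-user target can be checked in a few lines of Python. Ten percent of 285 monthly visitors is 28.5; rounding down gives 28, which is about 9.8% of the audience, while rounding up to 29 would clear the 10% mark. The variable names are illustrative; the only inputs taken from the post are the 285-visitor average and the 10% share.

```python
monthly_blog_visitors = 285  # monthly average of unique blog visitors, from the post
target_percent = 10          # desired share of the blog audience

# Integer arithmetic avoids floating-point surprises in the threshold itself.
exact = monthly_blog_visitors * target_percent / 100               # 28.5
rounded_down = monthly_blog_visitors * target_percent // 100       # 28, the post's figure
rounded_up = -(-monthly_blog_visitors * target_percent // 100)     # 29, ceiling division

print(exact, rounded_down, rounded_up)
# Share of the audience that the rounded-down target actually represents.
print(f"{rounded_down / monthly_blog_visitors:.1%}")
```

Either threshold works as a rough benchmark; the ceiling is the one to use if the goal is literally "at least 10%".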

We use WordPress for our blog, with different members of the group contributing. Posts have covered topics ranging from understanding what an academic database is to questioning the importance of digital tools in the academic world. We are getting a good response from the online community; many of the posts have received feedback through comments, tweets, and retweets. The blog has become the main voice of the project while the platform is still under construction.

Since Beyond Citation is a digital project, it seemed logical to use digital tools such as WordPress, Twitter, and LinkedIn for online promotion.

Twitter has provided us with valuable ways to interact with scholars and members of the academy. This powerful tool is the main social network we use to promote Beyond Citation. According to our Google Analytics report, 95% of social referrals come from links shared on Twitter.

On the other hand, LinkedIn is a relatively new part of our outreach strategy. The principal reason to have a LinkedIn account is to create a deeper online presence. This social network allows us to find specific users (based on their professional profiles) and to establish different paths for promoting our project on the web.

Nevertheless, there are some concerns about what to do next. Press kits, tutorials, and even podcasts are possible future outreach actions. However, we still need a final product, something more tangible to promote. Then each member of the team will have another task: keeping the interest of the users.

[Cross-Posting] On Successful #DH Project Management

Project management is difficult. As one of my teammates said to me point-blank: “I would not want your job.”

As our team began to work on Travelogue, I assumed that my brief stint organizing the development of two separate websites in various professional settings would help me. But while a background in marketing has allowed me to think more critically about things like publicity, nothing really prepared me for managing people my own age in a setting where we do not receive salaries for our work.

And while I have been extremely lucky to work with a group of brilliant people who are invested in helping me complete the project, it has been tricky figuring out how to tell people what (and how much) to do; everyone has full lives outside of school.

In a work setting, orders would come down from my boss, who had little idea of the actual tasks we needed to take on to complete a website. The details of these orders were laid out for me by advanced IT and design departments, each of which had its own ideas about how the website should look and behave. In this project, where I am the “boss,” things were more difficult, especially because while all of us have great ideas, the actual means of execution can be unclear. But just because you only have a basic understanding of web design, it does not mean that you can’t build something (mostly) from scratch. You just need a good plan.

Websites and website redesigns can (and do) take years to complete, but for this project, we only have about four months. In the course of this semester thus far, I’ve found that a few things are essential to completing a project successfully. Some seem obvious, but when you are trying to keep a bunch of different wheels spinning, simple things can be easy to forget.

(Of course, this is not a complete list.)

Know Your Deliverables

What are the major tasks that need to be completed in order to produce a final project? In the course of a semester, what needs to be completed from week-to-week in order to get things done? Setting some key deadlines, and being able to adjust them, will help the project move forward. I made a simple project plan in an Excel document that was arranged by week, with a new goal for each Monday. From there, I doubled back and talked to my group members about what needed to be completed for each goal. I am indebted to Micki Kaufman for major assistance here, as well as to Tom Scheinfeldt’s lecture last semester.

Use Your Support Network

There are experts at your school who can help you. As it goes with everything, being afraid to ask for help can (and will) diminish your success.

Know Your Team’s Strengths (and Weaknesses)

Project management involves a good deal of emotional intelligence. Knowing where your group members are coming from, and being aware of and sensitive to what they can and can’t accomplish in a given time frame, will provide for a better outcome. It kind of goes without saying that actively listening to your group members’ concerns and ideas will make them more invested in your goals.

Be Flexible

This goes for allowing extra time in your project plan, as well as being open to adjusting your vision and/or timeline. It can be hard to let go of original ideas, but if they aren’t working, it’s important that you are able to recognize that and just let go. In the case of Travelogue, our project scope changed slightly from what I originally proposed once we learned more about our platform. Padding the schedule gives you room to absorb roadblocks or an unexpected learning curve.

Relax (a Little Bit)

In working on a major project with a tight deadline, not only is it important to manage your expectations, but it is also important not to put too much pressure on your group. My personality defaults to surface-level relaxation that can be misinterpreted as lackadaisical, when usually (like anyone else) I’m managing a huge amount of internal stress. I try not to micromanage my team as a result of my internal freakouts, which would make anyone stressed-out and disengaged. At the same time, being too lax about deadlines says: “I don’t really care.” If you don’t care, neither will they.

We are currently buzzing around our computers to get this thing done, with constant revision of the plan to keep things in motion.

Visit: http://travelogue.commons.gc.cuny.edu

And here is a link to the project plan for anyone who’s interested: https://docs.google.com/spreadsheet/ccc?key=0As13_khVZTLXdHBMV2NlNWwtTndiRTZsUk1QQTVWYnc&usp=sharing

Academic Databases: Beyond Digital Literacy

Basic digital literacy for scholarly research includes knowing how to access digital archives, search them, and interpret their results.

Another component of digital literacy is familiarity with the semiotics of the interface: knowing how to “read” the instructions and symbols that give the user an idea of what invisible material lives in a database. These portals make the contents accessible, and also convey, before a search is even conducted, a range of search possibilities. The interface suggests something about the most useful metadata that the archive contains and the way the data can be accessed.

A user, then, can glean an understanding of the mechanics of the database through the interface alone. This additional level of digital literacy is helpful, but still represents a limited understanding of databases. Many of the archives that humanities scholars, librarians, and historians commonly use are proprietary, and even with some information and educated guesses about these archives’ metadata structures, it’s difficult or impossible to go a step deeper and discern exactly how the search algorithms work and how the database is designed.

This is an issue of emerging importance for digital scholars, and is prompting historians and others to think about what appears in search results and what doesn’t. But even if researchers knew how every database and its search algorithms worked, that wouldn’t resolve all the issues and theoretical implications of digital research and scholarship. As Ben Schmidt has pointed out, “database design constrains the ways historians can use digital sources.”

The limits of database design are an important window into the computational disciplines that enable information science in the first place. Programming machines to search a hybrid mix of digitized source materials is, of course, a broad problem, involving a myriad of methods that are constantly evolving and becoming more powerful. It’s therefore worth asking: When are the issues associated with digital research contingent on computational science, and when are they contingent on the way that proprietary archives and databases choose to implement the latest algorithms?

An interesting consideration in addressing this question might start with a distinction that William J. Turkel makes between scholars who use subscription archives and those who write code to mine massive data sets themselves. The literary scholar Ted Underwood has also discussed searching academic databases and data mining in parallel, commenting, “I suspect that many humanists who think they don’t need “big data” approaches are actually using those approaches every day when they run Google searches . . . Search is already a form of data mining. It’s just not a very rigorous form: it’s guaranteed only to produce confirmation of the theses you bring to it.”

Thinking about the distinction between proprietary database engineers and dataset hackers might foster the assumption that the two parties have radically different agendas or methods for searching born-digital and digitized archive material. But while independent programmers represent a new frontier of sorts (scholars willing to learn the methods needed to do their own research and retrieve information from their own source material), they aren’t necessarily confronted by any fewer database design limitations than the engineers who work at Gale. This gets at the heart of what’s at stake for researchers in a digital age, and why this is an apt time to explore how digital archives work on a computational level.

Many automated, machine-driven search techniques are sets of instructions that don’t always produce predictable results and can be difficult to reverse engineer even when bugs are discovered. Corporate engineers don’t have full control over the results they get, and neither do hackers or the authors of open-source software.

Why is that important? One goal of Beyond Citation is to explore and provide information on how databases work, so that scholars can better understand their research results. One could argue that scholars require so-called “neutral” technology: systems that don’t favor any one type or set of results over another. And it’s easier to understand and confirm search neutrality if algorithms and source code are publicly available. But exactly what is such neutrality, and would we know it if we saw it? Any algorithm, secret or otherwise, is a product of disciplinary constraints and intersections, and reveals the boundaries of what’s computationally possible. In short, the “correctness” of any algorithm is hard to nail down.
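A toy example may help show why the “correctness” of a ranking is hard to pin down. The sketch below is our own illustration, not a description of how Gale or any other vendor ranks results: it scores two hypothetical documents for the term “archive” with two equally defensible rules, raw term count and term count divided by document length, and the two rules put the documents in opposite orders. The corpus and function names are made up for the example.

```python
from collections import Counter

# Hypothetical mini-corpus; not drawn from any real database.
DOCS = {
    "long_report": "archive " * 3 + "filler " * 97,  # 3 hits in 100 words
    "short_note": "archive " * 2 + "filler " * 8,    # 2 hits in 10 words
}


def raw_count_score(text, term):
    """Score = how many times the term appears."""
    return Counter(text.split())[term]


def normalized_score(text, term):
    """Score = share of the document's words that are the term."""
    words = text.split()
    return Counter(words)[term] / len(words)


def rank(docs, term, score):
    """Return document names ordered from highest to lowest score."""
    return sorted(docs, key=lambda name: score(docs[name], term), reverse=True)


if __name__ == "__main__":
    print(rank(DOCS, "archive", raw_count_score))   # long_report ranks first
    print(rank(DOCS, "archive", normalized_score))  # short_note ranks first
```

Neither ordering is wrong; each encodes a different guess about what a user searching for “archive” probably wants.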

When we look more closely at the concept of neutrality, we see that both the user and the engineer are implicated in algorithmic design choices. James Grimmelmann, a lawyer, has made a compelling argument that, “Search is inherently subjective: it always involves guessing the diverse and unknown intentions of users.” Code that’s written as a service to users is written with an interaction already in mind. Evaluating the nuances of search algorithms and determining the impact they make on the integrity of one’s research involves acknowledging these kinds of imagined dialogues.

These are just some exploratory thoughts, as none of these questions about database design and search can be taken in isolation. Beyond Citation, then, is a starting point for going beyond digital literacy in multiple directions. We are gathering and presenting the kinds of knowledge that might allow scholars to distinguish between computational limitations, the limits of metadata and the ways it’s structured, and the agendas of a proprietary company. As the project evolves, we ourselves hope to deepen the kinds of skills and knowledge that allow us to present such information in the most meaningful and usable ways.

Communicating Technical Process

With alpha work on DH Box wrapping up, it’s a good moment to reflect on some technical lessons learned, as well as some lessons about being on the technical side of a team. Up to this point, while I have kept my team generally apprised of DH Box’s technical progress, I have kept most of the details of its implementation to myself, including the specific tools I’ve used, the reasons for choosing them, their pros and cons, and possible alternatives.

This is, in part, due to the fact that I did not begin with a particular plan. Though we had a well-defined goal for DH Box, I knew that there were myriad ways to reach it. So I experimented with different methods of cloud deployment and server provisioning, that is, different ways of creating each new instance of DH Box and automatically installing all of the necessary software on it.

I started with a BASH script designed to run on the first boot of each new DH Box instance. This worked well enough, but didn’t offer much in the way of sophisticated automation or transparency for debugging. I then tried some of the more well-known server deployment/provisioning tools, like Puppet and Salt. Puppet I found less straightforward than I’d hoped, partially because it requires modules to be written in a homespun variety of Ruby, which I’m not super comfortable with. Salt did more of what I wanted, but I was still reading its documentation when I became distracted by yet another tool, Ansible.
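For readers unfamiliar with the first-boot approach, here is a minimal sketch of the idea, written in Python rather than Bash to match the other examples here. It assumes a hypothetical marker file that records that provisioning has already happened, so the steps run only on the first boot; the function name `first_boot` and the demo commands are illustrative, not DH Box’s actual script. It also hints at the debugging limitation: `check=True` stops at a failing command, but the resulting error carries little context beyond the command itself.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def first_boot(steps, marker):
    """Run each provisioning command once per machine.

    A marker file records that provisioning already happened, so later
    boots skip the steps. check=True raises CalledProcessError at the
    first failing command, and the marker is written only after every
    step has succeeded.
    """
    marker = Path(marker)
    if marker.exists():
        return False  # provisioned on an earlier boot; nothing to do
    for cmd in steps:
        subprocess.run(cmd, check=True)
    marker.touch()
    return True


if __name__ == "__main__":
    # Harmless stand-in step; a real image would install software here,
    # e.g. ["apt-get", "install", "-y", "r-base"] (hypothetical choice).
    demo_steps = [[sys.executable, "-c", "print('step ran')"]]
    marker_path = Path(tempfile.mkdtemp()) / "provisioned"
    print(first_boot(demo_steps, marker_path))  # steps run
    print(first_boot(demo_steps, marker_path))  # skipped on the second call
```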

Ansible turned out to be just what I needed: It is written in Python, a language I have more familiarity with, and it allows me to monitor each deployment of a new DH Box in real time. Using Ansible, I’ve been able to create a whole automation workflow in one language, and, even better, I can easily see if and at exactly which point a deployment fails. This is crucial to efficient problem solving and future updates for DH Box, as its installation process necessarily involves many separate moving parts.
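The property that mattered most here, seeing exactly which step of a deployment failed, can be illustrated without Ansible itself. The Python below is a hypothetical toy analogue, not Ansible’s API (Ansible describes tasks in YAML playbooks): it runs named steps in order, prints a status line for each (loosely imitating Ansible’s `TASK [...]` output), and returns the name of the first step that fails. The step names and stand-in commands are made up.

```python
import subprocess
import sys


def deploy(steps):
    """Run named (name, command) steps in order.

    Prints a status line per step and returns the name of the first step
    that exits non-zero, or None if every step succeeds.
    """
    for name, cmd in steps:
        result = subprocess.run(cmd, capture_output=True, text=True)
        ok = result.returncode == 0
        print(f"TASK [{name}] {'ok' if ok else 'FAILED'}")
        if not ok:
            print(result.stderr.strip())  # surface the failing step's error output
            return name
    return None


if __name__ == "__main__":
    # Stand-in commands; the step names are hypothetical.
    demo = [
        ("install base packages", [sys.executable, "-c", "pass"]),
        ("configure omeka", [sys.executable, "-c", "import sys; sys.exit('missing config')"]),
        ("start services", [sys.executable, "-c", "pass"]),
    ]
    print("failed at:", deploy(demo))
```

Naming each step is what turns a bare non-zero exit code into a pointer to the part of the installation that needs attention.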

With these details of DH Box’s technical framework determined, it’s possible to create a more concrete “blueprint”, and I’m now working with our project planner, Gioia, to incorporate much more specific technical milestones into our overall plan. Going forward, I hope to keep everyone up-to-date and communicate some of what I learn along the way, without getting us too bogged-down in technical minutiae.