Author Archives: Gioia Stevens

DH Box: Countdown to Launch

DH Box is nearly ready to launch! Our user experience testing is complete and we are putting the finishing touches on our front-end interface. We have added more information to the documentation on our wiki, including new pages for “What is DH Box” and “Launching DH Box” as well as pages for each of the tools in DH Box: IPython, MALLET, NLTK, Omeka, and RStudio.

Our team gave a presentation on DH Box for the Academic Center for Excellence in Research and Teaching at Hunter College. It was exciting to show our project to faculty who may want to use it in classroom teaching and research. The audience included math professors, English professors, sociology professors, librarians, IT specialists, and adaptive technology specialists. We got a great range of questions. Some were very specific: “Can I upload a CSV file with the information for all 25 students in my class or do I need to add a new user for each one?” Great idea! We’ll see what we can do. Other questions were more general, such as “What tools do you plan to add in the future?” We will be adding more tools as the project develops, but we need to limit our selection to open source tools that are web-based or run from the command line.

Everyone was very interested in how using a virtual server can improve access to technology for students. Our team is excited about a new project at the University of Mary Washington called A Domain of One’s Own. The project will provide all incoming freshmen with their own domain names and web space. Students will have the freedom to create subdomains, install any LAMP-compatible software, set up databases and email addresses, and carve out their own space on the web that they own and control.

DH Box brings a powerful virtual computer to anyone with a web connection. Students do not need to own the most recent laptop computer or to attend a school with a big-budget computer lab. We are very excited about how our project may grow in the future to offer even more!

 

User experience testing and documentation

DH Box is really taking shape! We have a bare-bones version of our server image up and running thanks to all of Steve’s hard work over the last week. We have revised our project plan with new milestone dates and a clear-cut set of tasks we need to accomplish. We are working hard on everything we need to do now and also looking forward to the next phase.

User experience testing and documentation will be very important over the next few weeks. We need to be sure that people who are not already familiar with the command line, cloud computing, and DH tool installation will find DH Box easy and convenient to use. Documentation (aka the “user manual”) will be the key to helping users make the most of DH Box. We have decided to use Read the Docs to host our documentation. Read the Docs allows us to host documentation files on our website and update our documentation whenever we push to the GitHub repository that hosts our website. This means updating our online documentation is as simple as updating text on our website! The great benefit of using a utility like Read the Docs is that our documentation will be easy to maintain, forkable by contributors, available online, and searchable.

Refining our focus and finding connections

The DH Box team has been working hard on defining the scope for DH Box and setting up our project plan. We’ve started using Asana as our project management tool. As the project manager, I’m really enjoying Asana. It’s flexible, easy, and it allows our team to collaborate on building the plan as we go. It’s also very nice that it tracks everything and sends out plenty of reminders!

Our scope has been narrowing as we refine our concept of DH Box. We are thinking more about who will use DH Box and about the best way to make it a valuable toolkit for introductory students in digital humanities classes.

Pedagogy is a key part of the digital humanities at the CUNY Graduate Center and the Praxis Network. Our focus for the first phase of development will be text analysis and topic modeling, including key tools such as MALLET, the Natural Language Toolkit (NLTK), and the Stanford Named Entity Recognizer. We are going to build an interactive textbook using IPython Notebook. The textbook will be bundled with the DH Box install scripts and will help orient students to the tools through interactive code execution. We have also thought more about our platform and what would be most useful for our users. We are going to make DH Box available for download not only for Raspberry Pi but also for Linux, Mac, and hopefully Windows.

As we have narrowed down our scope, we are also discovering a much wider range of connections to the DH community. Our professor, Matt Gold, has put us in touch with his colleague Dennis Tenen. GC Digital Fellow Micki Kaufman suggested we check out Ian Milligan’s work, and we’ve found amazing stuff in Big Digital History: Exploring Big Data through a Historian’s Macroscope, a co-written manuscript by Shawn Graham, Ian Milligan, and Scott Weingart. My library colleague Roxanne Shirazi, who edits the dh+lib blog, suggested we check out an idea for a project called DH creator stick, which George Williams proposed at THATCamp Piedmont 2012 (see also a blog post by Mark Sample).

We’re amazed by the range of rich ideas we are beginning to discover. We hope to reach out to the DH community and ask for advice and feedback as DH Box takes shape.

Collaboration and creative constraints

The One Week | One Tool project shows that time and resource constraints really can be made to work in a project’s favor. Twelve strangers who committed to seven days of all hack, no yack, and very little sleep made some great tools (Serendip-o-matic, Anthologize). Creating severe constraints can foster both collaboration and creativity. Tom Scheinfeldt pointed out three key lessons for successful collaborations: 1) embrace serendipity, 2) let go, and 3) collaboration is shared doing.

DH barn-raising projects are inspiring, but they are artificial laboratories of collaboration. How many of the lessons will be useful outside of an intense, boot-camp-style workshop? I agree that collaboration is shared doing, but I started to wonder when Scheinfeldt pointed out that time constraints can mean sacrificing shared decision making. My workplace culture focuses on consensus building, so I found it both slightly shocking and secretly delightful (those staff meetings are long!) when Scheinfeldt said, “Moving on will mend hurt feelings more than talking about it.”

What about collaboration for the rest of us? Very few people have the luxury of attending a week-long workshop. The real world has plenty of constraints (time, money, jobs, families, multiple competing projects, laundry, etc.). What are the best ways to use these constraints to promote collaboration and creativity? Maybe it’s less about crash programs and more about intelligent adaptation to existing conditions? Starting a business (designing the right product for the right market) or gardening (choosing the best plant to thrive in a particular location) could be useful examples for thinking about the best ways to use what we have to build something together.

David Mimno and fatty tuna

David Mimno made an important distinction about theory vs. practice when he pointed out that MALLET (or any DH tool) is a method, not a methodology.  MALLET can uncover thematic patterns in massive digital collections, but it is up to the researcher using the tool to evaluate the results, pose new questions, and think of possible new uses for the tool.  In our class discussion, Mimno compared different roles in topic modeling to Iron Chef:  he makes the knives (MALLET), librarians dump a lot of fatty tuna (the corpus of text) on the table, and the humanists are the chefs who need to make the meal (interpreting and drawing new conclusions from the results).

As a librarian, I have never thought of myself as a provider of fatty tuna, but I get the general point. What role do librarians and other alt-academics play in DH? Can a librarian be a tool maker, a chef, a sous-chef, a waitress, or something else entirely?  What does it mean to curate content and devise valuable ways to access that content?  Is it scholarship? I am not sure if I can answer that question, but I do see many new ways to apply MALLET as a search and discovery tool which would be very useful for scholarship.

Can we do better than keyword search to find relevant information in huge collections of digital text? Would search terms created from the body of the text itself be more accurate than hand-coding using the very dated and narrow Library of Congress subject headings? The DH literature on topic modeling doesn’t have much on libraries, but I did find the following information. Yale, U. Michigan, and UC Irvine received an Institute of Museum and Library Services grant to study Improving Search and Discovery of Digital Resources Using Topic Modeling. See also an interesting D-Lib Magazine article on using topic modeling in HathiTrust, A New Way to Find: Testing the Use of Clustering Topics in Digital Libraries.
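
To make the idea of search terms drawn from the text itself a little more concrete, here is a minimal sketch in Python. It is not topic modeling and it is not how MALLET works. It uses TF-IDF, a much simpler word-weighting scheme, only to illustrate how candidate terms can come out of the documents rather than from a hand-assigned vocabulary. The three-document corpus and the function names `tokenize` and `top_terms` are invented for illustration.

```python
import math
import re
from collections import Counter

# Tiny hypothetical corpus; document names and texts are invented for illustration.
DOCS = {
    "whaling": "the whale ship sailed and the crew hunted the whale",
    "farming": "the farm crew planted wheat and the wheat grew",
    "railway": "the train left the station and the train was late",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(docs, k=2):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokens = {name: tokenize(text) for name, text in docs.items()}

    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for words in tokens.values():
        df.update(set(words))

    n_docs = len(docs)
    result = {}
    for name, words in tokens.items():
        tf = Counter(words)
        # Term frequency times inverse document frequency.
        scores = {
            word: (count / len(words)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda word: (-scores[word], word))
        result[name] = ranked[:k]
    return result


if __name__ == "__main__":
    for name, terms in top_terms(DOCS, k=1).items():
        print(name, terms)
```

On this toy corpus, `top_terms(DOCS, k=1)` picks out “whale”, “wheat”, and “train”, while a word that appears in every document, such as “the”, scores zero. A topic model goes further by grouping words that tend to occur together across a collection, but the sketch shows the basic appeal: the terms come from the corpus, not from a fixed list of headings.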

DH definition

At the beginning of class: DH is research, study and teaching at the intersection of digital technology and the humanities.

At the end of class: Is DH a transitory “big tent”? To a humanist, any digital technology is potentially a tool, a text or a metaphor.

I have many questions and I am not sure of the answers (or even if there are any real answers). What are the parameters of DH? Is it one field or many? Is it the eversion of humanities computing? Is it a new set of tools to change academia as a whole? Can an academic program with established departments, journals, and grants be considered a transitory phase? I like the hands-on praxis and the quantitative side.