[Cross-Posting] On Successful #DH Project Management

Project management is difficult. As one of my teammates said to me point-blank: “I would not want your job.”

As our team began to work on Travelogue, I assumed that my brief stint organizing the development of two separate websites in various professional settings would help me. But while a background in marketing has allowed me to think more critically about things like publicity, nothing really prepared me for managing people my own age in a setting where we do not receive salaries for our work.

And while I have been extremely lucky to work with a group of brilliant people who are invested in helping me complete the project, it has been tricky figuring out how to tell people what (and how much) to do; everyone has full lives outside of school.

In a work setting, orders would come down from my boss, who had little idea of the actual steps we needed to take in order to complete a website. The details of these orders were laid out for me by specialized IT and design departments, each of which had its own ideas about how the website should look and behave. In this project, where I am the “boss,” things have been more difficult, especially because while all of us have great ideas, the actual means of execution can be unclear. But having only a basic understanding of web design does not mean you can’t build something (mostly) from scratch. You just need a good plan.

Websites and website redesigns can (and do) take years to complete, but for this project, we only have about four months. So far this semester, I’ve found that a few things are essential to completing a project successfully. Some seem obvious, but when you are trying to keep a bunch of different wheels spinning, simple things can be easy to forget.

(Of course, this is not a complete list.)

Know Your Deliverables

What are the major tasks that need to be completed in order to produce a final project? In the course of a semester, what needs to be completed from week-to-week in order to get things done? Setting some key deadlines, and being able to adjust them, will help the project move forward. I made a simple project plan in an Excel document that was arranged by week, with a new goal for each Monday. From there, I doubled back and talked to my group members about what needed to be completed for each goal. I am indebted to Micki Kaufman for major assistance here, as well as to Tom Scheinfeldt’s lecture last semester.
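For anyone curious what that plan looked like in miniature, here is a rough sketch of the same week-by-week structure in Python; the dates and goals below are invented placeholders, not our actual milestones:

```python
from datetime import date, timedelta

# Invented placeholders; the real plan lived in an Excel sheet.
SEMESTER_START = date(2014, 2, 3)  # an assumed Monday start
WEEKLY_GOALS = [
    "Finalize project scope",
    "Draft content spreadsheet",
    "Choose platform and format",
    "Build first prototype",
]

def monday_milestones(start, goals):
    """Pair each weekly goal with the Monday it is due."""
    monday = start
    for goal in goals:
        yield monday.isoformat(), goal
        monday += timedelta(weeks=1)

for due, goal in monday_milestones(SEMESTER_START, WEEKLY_GOALS):
    print(f"{due}: {goal}")
```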

Use Your Support Network

There are experts at your school who can help you. As with everything, being afraid to ask for help can (and will) diminish your success.

Know Your Team’s Strengths (and Weaknesses)

Project management involves a good deal of emotional intelligence. Knowing where your group members are coming from, and being aware of and sensitive to what they can and can’t accomplish in a given time frame, will make for a better outcome. It kind of goes without saying that actively listening to your group members’ concerns and ideas will make them more invested in your goals.

Be Flexible

This goes for building extra time into your project plan, as well as being open to adjusting your vision and/or timeline. It can be hard to let go of original ideas, but if they aren’t working, it’s important to recognize that and just let go. In the case of Travelogue, our project scope changed slightly from what I originally proposed once we learned more about our platform. Padding your plan also gives you room to absorb roadblocks or an unexpected learning curve.

Relax (a Little Bit)

In working on a major project with a tight deadline, it is important not only to manage your expectations but also not to put too much pressure on your group. My personality defaults to a surface-level calm that can be misread as lackadaisical, when usually (like anyone else) I’m managing a huge amount of internal stress. I try not to micromanage my team as a result of my internal freakouts, which would make anyone stressed out and disengaged. At the same time, being too lax about deadlines says: “I don’t really care.” If you don’t care, neither will they.

We are currently buzzing around our computers to get this thing done, with constant revision of the plan to keep things in motion.

Visit: http://travelogue.commons.gc.cuny.edu

And here is a link to the project plan for anyone who’s interested: https://docs.google.com/spreadsheet/ccc?key=0As13_khVZTLXdHBMV2NlNWwtTndiRTZsUk1QQTVWYnc&usp=sharing

Academic Databases: Beyond Digital Literacy

Basic digital literacy for scholarly research includes knowing how to access digital archives, search them, and interpret their results.

Another component of digital literacy is familiarity with the semiotics of the interface: knowing how to “read” the instructions and symbols that give the user an idea of what invisible material lives in a database. These portals make the contents accessible and also convey, before a search is even conducted, a range of search possibilities. The interface suggests something about the most useful metadata that the archive contains and the way the data can be accessed.

A user, then, can glean an understanding of the mechanics of the database through the interface alone. This additional level of digital literacy is helpful, but it still represents a limited understanding of databases. Many of the archives that humanities scholars, librarians, and historians commonly use are proprietary, and even with some information and educated guesses about these archives’ metadata structures, it’s difficult or impossible to go a step deeper and discern exactly how the search algorithms work and how the database is designed.

This is an issue of emerging importance for digital scholars, and is prompting historians and others to think about what appears in search results and what doesn’t. But even if researchers knew how every database and its search algorithms worked, that wouldn’t resolve all the issues and theoretical implications of digital research and scholarship. As Ben Schmidt has pointed out, “database design constrains the ways historians can use digital sources.”

The limits of database design are an important window into the computational disciplines that enable information science in the first place. Programming machines to search a hybrid of digitized source materials is of course a broad problem, involving myriad methods that are constantly evolving and becoming more powerful. It’s therefore interesting to ask: When are the issues associated with digital research contingent on computational science, and when are they contingent on the way that proprietary archives and databases choose to implement the latest algorithms?

An interesting consideration in addressing this question might start with a distinction that William J. Turkel makes between scholars who use subscription archives and those who write code to mine massive data sets themselves. The literary scholar Ted Underwood has also discussed searching academic databases and data mining in parallel, commenting, “I suspect that many humanists who think they don’t need ‘big data’ approaches are actually using those approaches every day when they run Google searches . . . Search is already a form of data mining. It’s just not a very rigorous form: it’s guaranteed only to produce confirmation of the theses you bring to it.”

Thinking about the distinction between proprietary database engineers and dataset hackers might foster the assumption that those two parties have radically different agendas or methods for searching born-digital and digitized archive material. But while independent programmers represent a new frontier of sorts—scholars willing to learn the methods needed to do their own research and retrieve information from their own source material—they aren’t necessarily confronted by any fewer database design limitations than the engineers who work at Gale. This gets at the heart of what’s at stake for researchers in a digital age, and why this is an apt time to explore the way digital archives work on a computational level.

Many automated, machine-driven search techniques amount to sets of instructions that don’t always produce predictable results and can be difficult to reverse engineer even when bugs are discovered. Corporate engineers don’t have full control over the results they get, and neither do hackers or the authors of open-source software.

Why is that important? One goal of Beyond Citation is to explore and provide information on how databases work, so that scholars can better understand their research results. One could argue that scholars require so-called “neutral” technology: systems that don’t favor any one type or set of results over another. And it’s easier to understand and confirm search neutrality if algorithms and source code are publicly available. But exactly what is such neutrality, and would we know it if we saw it? Any algorithm, secret or otherwise, is a product of disciplinary constraints and intersections, and reveals the boundaries of what’s computationally possible. In short, the “correctness” of any algorithm is hard to nail down.

When we look more closely at the concept of neutrality, we see that both the user and the engineer are implicated in algorithmic design choices. James Grimmelmann, a legal scholar, has made a compelling argument that “Search is inherently subjective: it always involves guessing the diverse and unknown intentions of users.” Code that’s written as a service to users is written with an interaction already in mind. Evaluating the nuances of search algorithms and determining the impact they make on the integrity of one’s research involves acknowledging these kinds of imagined dialogues.

These are just some exploratory thoughts, as none of these questions about database design and search can be taken in isolation. Beyond Citation, then, is a starting point for going beyond digital literacy in multiple directions. We are gathering and presenting the kinds of knowledge that might allow scholars to distinguish between computational limitations, the limits of metadata and the ways it’s structured, and the agendas of a proprietary company. As the project evolves, we ourselves hope to deepen the kinds of skills and knowledge that allow us to present such information in the most meaningful and usable ways.

Communicating Technical Process

With alpha work on DH Box wrapping up, it’s a good moment to reflect on some technical lessons learned, as well as some lessons about being on the technical side of a team. Up to this point, while I have kept my team generally apprised of DH Box’s technical situation as it progressed, I have kept most of the details of its implementation to myself, along with the specific tools I’ve used, their justifications, pros and cons, and possible alternatives.

This is partly because I did not begin with a particular plan. Though we had a well-defined goal for DH Box, I knew there were myriad ways to reach it. So I experimented with different methods of cloud deployment and server provisioning, that is, different ways of creating each new instance of DH Box and automatically installing all of the necessary software on it.

I started with a Bash script designed to run on the first boot of each new DH Box instance. This worked well enough, but it didn’t offer much in the way of sophisticated automation or transparency for debugging. I then tried some of the better-known server deployment/provisioning tools, like Puppet and Salt. Puppet I found less straightforward than I’d hoped, partly because it requires modules to be written in a homespun variety of Ruby, which I’m not super comfortable with. Salt did more of what I wanted, but I was still reading its documentation when I became distracted by yet another tool, Ansible.
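To give a flavor of that first approach: the real script was written in Bash, but the idea, sketched here in Python with hypothetical package names, was simply to install everything on first boot and print what happened:

```python
import subprocess
import sys

# Hypothetical package names standing in for the DH toolchain.
PACKAGES = ["git", "python-nltk", "mallet"]

def install(package):
    """Install one package noninteractively; return True on success."""
    result = subprocess.run(["apt-get", "install", "-y", package])
    return result.returncode == 0

for pkg in PACKAGES:
    ok = install(pkg)
    # This print statement is roughly all the debugging transparency
    # a plain first-boot script gives you.
    print(f"{pkg}: {'installed' if ok else 'FAILED'}", file=sys.stderr)
```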

Ansible turned out to be just what I needed: It is written in Python, a language I have more familiarity with, and it allows me to monitor each deployment of a new DH Box in real time. Using Ansible, I’ve been able to create a whole automation workflow in one language, and, even better, I can easily see if and at exactly which point a deployment fails. This is crucial to efficient problem solving and future updates for DH Box, as its installation process necessarily involves many separate moving parts.
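As a minimal illustration of that workflow, a deployment wrapper might do little more than shell out to the ansible-playbook command and check the result; the playbook and inventory file names below are hypothetical:

```python
import subprocess

# Hypothetical file names, for illustration only.
PLAYBOOK = "dhbox.yml"
INVENTORY = "hosts"

def deploy():
    """Run the provisioning playbook. Ansible streams its own
    task-by-task output, so a failure shows exactly where it happened."""
    proc = subprocess.run(["ansible-playbook", "-i", INVENTORY, PLAYBOOK])
    if proc.returncode != 0:
        raise RuntimeError("Deployment failed; see the last task above.")

if __name__ == "__main__":
    deploy()
```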

With these details of DH Box’s technical framework determined, it’s possible to create a more concrete “blueprint,” and I’m now working with our project planner, Gioia, to incorporate much more specific technical milestones into our overall plan. Going forward, I hope to keep everyone up to date and communicate some of what I learn along the way, without getting us too bogged down in technical minutiae.

Collaborative Opportunities

The Travelogue team has been exploring how other sites are using maps as digital pedagogical tools.  We are also connecting with possible collaborators, including other mapping projects, educational institutions, and libraries.

In an effort to participate in the conversations happening on social network platforms, Travelogue has been monitoring how Twitter is being used by similar projects.  We have explored hashtags used in reference to maps and to literature, teaching, English, history, social studies, high school teachers, lesson plans, etc.  We have also been following the conversations and posts on the Humanities, Arts, Science and Technology Alliance and Collaboratory (HASTAC) site.

On the development front, we are playing with several WordPress child themes to see which will work best for the Travelogue site and the ESRI StoryMap we will be using.  Research-wise, we have completed a workable draft of the Ernest Hemingway content spreadsheet, which we will use to construct Travelogue’s Ernest Hemingway StoryMap.

The Travelogue Commons site has a Research section that is categorized and features helpful resources compiled during the progression of the Travelogue project.  For example, Esri Storymaps for Education.

Thank you for following our journey.  We look forward to sharing our connections with others in the GIS world.

If you want to contact us, please do. Our project blog is at travelogue.commons.gc.cuny.edu. Email us at dhtravelogue [at] gmail [dot] com or follow us on Twitter @DhTravelogue

It’s the Content, Stupid.

I teach workshops on library databases to a range of users throughout the year at the New York Public Library. Some of the walk-in students are academics, others are unaffiliated scholars, and many more are undergraduate or graduate students from nearby schools. The degree to which they’re familiar with platforms, searching, Boolean logic, peer review, and formats varies. But one thing all the students share is general confusion as to which database they should use for the kind of research they’re conducting.

The database vendors don’t help: Readex? Never heard of it. ProQuest? Sounds vaguely familiar. And the database names—Academic Search Premier, Ulrich’s, Project MUSE—are opaque. Yes, some exact titles, like The New York Times or Chicago Defender, can steer the user in a general direction, but without a greater understanding of the kind of content that can be found in each resource, the user is left to fend for themselves. And that usually means Google. While Google is not an inherently bad choice, especially for initial research queries, many beneficial subscription resources are left unexplored.

Take online reference databases—in the past, a question asked at the information desk often resulted in a librarian directing the user towards the section of the physical reference shelf where one might find sources to help. Today, much of that reference shelf has moved online to platforms like Credo or Gale Virtual Reference Library. The online sources may provide 24/7 access to information, but finding relevant titles is often more difficult.

Theoretically, that’s where discovery platforms like Summon and EBSCO Discovery Service come in. Discovery platforms search the metadata of nearly all the library’s subscription resources simultaneously so users don’t need to visit each database individually. But they are only helpful if the service your library subscribes to indexes the databases that you need. EBSCO Discovery Service, for example, doesn’t index ProQuest products, and vice versa. Therefore, if you’re using EBSCO for a search on historical newspapers or periodicals, your results will be greatly limited.

Perhaps it’s no surprise, then, that 97% of academic library directors in the recent Ithaka S+R survey cite teaching information literacy to undergraduates as an important function of the library. With such limited transparency of online sources, undergraduates clearly need all the help they can get when starting their research.

The Beyond Citation team hopes that researchers—both seasoned and amateur—will shine a light on the databases they use regularly by examining each database’s strengths, weaknesses, and overall range of material. In other words, the content. Because without a better understanding of the troves of rich information discoverable in each database, they’re all just links on a page.

We are at blog.beyondcitation.org. Email us at BeyondCitation [at] gmail [dot] com or follow us on Twitter @beyondcitation as we get ready for the launch in May.

DH Box considers deployment options

Cross-posted from the DH Box Blog: http://dhbox.commons.gc.cuny.edu/blog/2014/deployment-options-dh-box

Once DH Box knew the platform it would adopt, it was simply a matter of figuring out the best way to use that platform. But was it so simple?

What the DH Box team has been tackling this week is striking a balance: providing a robust tool that is useful to its intended audience while keeping maintenance manageable for its administrators.

To recap — the platform chosen for delivering the DH Box environment, ready with DH tools installed, is a server image provided through Amazon’s AMI (Amazon Machine Image) service. This delivers, in essence, an identical copy of a tool-laden operating system to any user.
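For a sense of what launching such an image involves, here is a minimal sketch using the boto3 library; the AMI ID, region, and instance type below are placeholders rather than DH Box’s actual values:

```python
import boto3

# Placeholder values, not DH Box's real configuration.
AMI_ID = "ami-00000000"
INSTANCE_TYPE = "t2.micro"

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a single instance from the pre-built, tool-laden image.
response = ec2.run_instances(
    ImageId=AMI_ID,
    InstanceType=INSTANCE_TYPE,
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print(f"Launched DH Box instance {instance_id}")
```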

Choosing this platform offered important benefits — for example, freedom from having to address issues caused by tools being installed on users’ personal systems. However, it also introduced tension: to deploy images hosted by Amazon, one needs an Amazon account. Would we have users create their own Amazon Web Services (AWS) accounts, which require credit card information (though launching the image itself is free), or would we maintain an account from which instances would be launched and figure out how the DH Box team would handle the potential charges?

Many questions entered into this equation: Would our intended users be open to providing credit card information? Who might this alienate? Or, if we managed the AWS account with many instances running, would we incur charges we’re not prepared to deal with? What would be the time period allotted to users for running the instances?

DH Box has had to think through how different deployment options (e.g., requiring users to have their own AWS accounts) might affect how DH Box will be adopted by its intended users. And this — the tension between providing a service that is maintainable and sustainable and at once useful to the intended audience — is something any project like DH Box might face.

User experience testing and documentation

DH Box is really taking shape! We have a bare-bones version of our server image up and running thanks to all of Steve’s hard work over the last week. We have revised our project plan with new milestone dates and a clear-cut set of tasks we need to accomplish. We are working hard on everything we need to do now and also looking ahead to the next phase.

User experience testing and documentation will be very important over the next few weeks. We need to be sure that people who are not already familiar with the command line, cloud computing, and DH tool installation will find DH Box easy and convenient to use. Documentation (aka the “user manual”) will be the key to helping users make the most of DH Box. We have decided to use Read the Docs to host our documentation. Read the Docs builds our documentation from files in the GitHub repository that hosts our website and rebuilds it on every push – this means updating our online documentation is as simple as updating text in the repository! One great benefit of using a utility like Read the Docs is that our documentation will be easily maintainable, forkable by contributors, available online, and searchable.
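Read the Docs most commonly builds Sphinx projects; assuming that setup, the heart of the configuration is a small conf.py file in the repository. The values below are illustrative assumptions, not our actual settings:

```python
# conf.py -- a minimal Sphinx configuration, sketched for illustration.
# Read the Docs rebuilds the documentation from this file on every push.

project = "DH Box"
author = "The DH Box Team"  # placeholder author string
version = "0.1"             # placeholder version number

# The document (without file extension) that serves as the docs' root.
master_doc = "index"

# A basic HTML theme; Read the Docs applies its own theme by default.
html_theme = "default"
```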

Thinking About Authority and Academic Databases

Beyond Citation hopes to encourage critical thinking by scholars about academic databases. But what do we mean by critical thinking? Media culture scholar Wendy Hui Kyong Chun has defined critique as “not attacking what you think is false, but thinking through the limitations and possibilities of what you think is true.”

One question that the Beyond Citation team is considering is the scholarly authority of a database. Yale University Library addresses the question of scholarly authority in a handout entitled “Web vs. Library Databases,” a guide for undergraduates. The online PDF states that information on the web is “seldom regulated, which means the authority is often in doubt.” By contrast, “authority and trustworthiness are virtually guaranteed” to the user of library databases.

Let’s leave aside for the moment the question of whether scholars should always prefer the “regulated” information of databases to the unruly data found on the Internet. While Yale Library may simply be using shorthand to explain academic databases to undergraduates, to the extent that they are equating databases and trustworthiness, I think they may be ceding authority to databases too readily and missing some of the complexity of the current digital information landscape.

Yale Library cites Academic Search and Lexis-Nexis as examples of databases. Lexis-Nexis is a compendium of news articles, broadcast transcripts, press releases, law cases, and Internet miscellany. Lexis-Nexis is probably authoritative in the sense that one can be comfortable that the items accessed are the actual articles obtained directly from publishers and thus contain the complete texts of articles (with images removed). In that limited sense, items in Lexis-Nexis are certainly more reliable than results obtained from a web search. (This isn’t true, though, for media historians who want to see the entire page with pictures and advertisements included; for that, try the web or another newspaper database.) Despite its relatively long pedigree for an electronic database, careful scrutiny of results is just as crucial when doing a search in Lexis-Nexis as it is for an Internet search.

In some instances, especially when seeking information about non-mainstream topics, searching the Internet may be a better option. Composition and rhetoric scholar Janine Solberg has written about her experience of research in digital environments, in particular how full-text searches on Amazon, Google Books, the Internet Archive and HathiTrust enabled her to locate information that she was unable to find in conventional library catalogs. She says, “Web-based searching allowed me not only to thicken my rhetorical scene more quickly but also to rapidly test and refine questions and hypotheses.” In the same article, Solberg calls for “more explicit reflection and discipline-specific conversation around the uses and shaping effects of these [digital] technologies” and recommends as a method “sharing and circulating research narratives that make the processes of historical research visible to a wider audience . . . with particular attention to the mediating role of technologies.”

Adding to the challenge of thinking critically about academic databases is their dynamic nature. The terrain of library databases is changing as more libraries adopt proprietary “discovery” systems that search across the entire set of databases to which libraries subscribe. For example, the number of JSTOR users has dropped “as much as 50%” with installations of discovery systems and changes in Google’s algorithms. Shifts in discovery have led to pointed discussions between associations of librarians and database publishers about the lack of transparency of search mechanisms. In 2012, Tim Collins, the president of EBSCO, a major database and discovery system vendor, found it necessary to address the question of whether vendors of discovery systems favor their own content in searches, denying that they do. There is, however, no way for anyone outside the companies to verify his statement because the vendors will not reveal their search algorithms.

While how search results are ranked in academic databases remains an open question, a recent study by Asher et al. comparing research in databases, Google Scholar, and library discovery systems found that “students imbued the search tools themselves with a great deal of authority,” often relying on the brand name of the database. More than 90% of students in the study never went past the first page of search results. As the study notes, “students are de facto outsourcing much of the evaluation process to the search algorithm itself.”

In addition, lest one imagine that scholars are immune to an uncritical perspective on digital sources, in his study of the citation of newspaper databases in Canadian dissertations, historian Ian Milligan finds that scholars have adopted these databases without developing a concomitant perspective on their shortcomings. As in the Asher et al. study of undergraduates, Milligan says, “Researchers cite what they find online.”

If critique is, as Chun says, thinking through the limitations and possibilities of what we think is true, then perhaps by encouraging reflective conversations among scholars about how these ubiquitous digital tools shape research and the production of knowledge, Beyond Citation’s efforts will be another step toward that critique.

We are at blog.beyondcitation.org. Email us at BeyondCitation [at] gmail [dot] com or follow us on Twitter @beyondcitation as we get ready for the launch in May.

Travelogue: Format Selection and Other Updates

The team chose the ESRI ArcGIS Storymaps platform for the Travelogue project.  Last week the team voted on which ESRI ArcGIS Storymaps format to go with; the options were:

Map Tour (sequential, place-based narratives): http://storymaps.arcgis.com/en/app-list/map-tour/

Short List (a curated list of points of interest): http://storymaps.arcgis.com/en/app-list/shortlist/

Tabbed Viewer (comparing two or more maps): http://storymaps.arcgis.com/en/app-list/tabbed-viewer/

Side Accordion (comparing two or more maps): http://storymaps.arcgis.com/en/app-list/side-accordion

Playlist (a curated list of points of interest): http://storymaps.arcgis.com/en/app-list/playlist

The winner was… Map Tour: http://storymaps.arcgis.com/en/app-list/map-tour/

Each team member has an Esri ArcGIS organizational account that can be used to practice and publish.  With the format selected and a large volume of research content completed, we can now start building.  The American authors we have chosen to feature initially are Zora Neale Hurston and Ernest Hemingway.  We have shared Google Drive folders for each author, featuring spreadsheets with the research collected so far.  The spreadsheet entries are organized by a unified chronological date so that the journeys can be mapped chronologically.  All of the locations on both spreadsheets also have coordinates.
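Because every row carries a unified date and coordinates, putting a journey in map order is just a sort. Here is a sketch assuming a hypothetical CSV export with date, place, lat, and lon columns; the file name and headers are invented for illustration, not the actual spreadsheet’s:

```python
import csv
from datetime import datetime

# Hypothetical export of one author's content spreadsheet.
with open("hemingway.csv", newline="", encoding="utf-8") as f:
    stops = list(csv.DictReader(f))

# Order the stops chronologically by the unified date column.
stops.sort(key=lambda row: datetime.strptime(row["date"], "%Y-%m-%d"))

for stop in stops:
    print(f'{stop["date"]}: {stop["place"]} ({stop["lat"]}, {stop["lon"]})')
```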

Informational text about each author is being written, and audiovisual material to be featured on the Travelogue site is being collected, notably direct links to Hemingway images from the JFK Library’s Media Gallery: http://www.jfklibrary.org/JFK/Media-Gallery.aspx. For the content sources, we have chosen to use the MLA citation format.

The Travelogue Twitter account has received a few new followers.  Also, a Travelogue tweet was favorited by a San Francisco Chronicle book editor (all acknowledgements count).  The Twitter logo has been redesigned, and the look of the Twitter page has been updated to reflect the bibliographic and cartographic aspects of the project.  Check it out @dhtravelogue

The team is looking forward to providing a status update presentation to the DH Praxis class on Monday, March 24th.

If you want to contact us, please do. Our project blog is at travelogue.commons.gc.cuny.edu. Email us at dhtravelogue [at] gmail [dot] com or follow us on Twitter @DhTravelogue

Finding a Home: Travelogue Picks a URL

The Travelogue team has been navigating the URL waters (travel puns abound, but URL names do not).  By Monday, March 17th, the URL had been decided upon and purchased.  Details soon to follow (we will let you know when to begin the drum roll).

Other updates: On Travelogue’s Commons page, the Twitter feed has been updated, removing the icons and making it more text-based.  The team is also choosing among paper-texture images to be used for the Commons site background, consulting guides on 2014 web design trends.  We have been actively working on the Zotero citations for the content that will be featured on the Travelogue site.  Meet-ups outside of normal class hours have been scheduled.  We have been outlining the research that has been done so far and what still needs work.  Zora Neale Hurston and Ernest Hemingway are the two American authors that the Travelogue project will initially focus on.  Research-wise, we are currently working on historical context, researching what was going on in the locations they traveled to during their time there.

If you want to contact us, please do. Our project blog is at travelogue.commons.gc.cuny.edu. Email us at dhtravelogue [at] gmail [dot] com or follow us on Twitter @DhTravelogue