Category Archives: Spring 2014

DH Box considers deployment options

Cross-posted from the DH Box Blog: https://dhbox.commons.gc.cuny.edu/blog/2014/deployment-options-dh-box


Once DH Box knew the platform it would adopt, it was simply a matter of figuring out the best way to utilize that platform. But was it so simple?

This week the DH Box Team has been working to strike a balance: providing a robust tool that is useful for the intended audience while keeping its maintenance manageable for its administrators.

To recap — the platform chosen for delivering the DH Box environment, ready with DH tools installed, is a web server image provided through Amazon’s AMI (Amazon Machine Image) appliance. This will deliver, in essence, an identical copy of a tool-laden operating system to any user’s system.

Choosing this platform offered important benefits — for example, freedom from having to address issues caused by tools being installed to users’ personal systems. However, it also introduced tension: to deploy images hosted by Amazon, one needs to use an Amazon account. Would we have users create their own Amazon Web Services (AWS) accounts that require credit card information (though launching the Image is a free service) or would we maintain an account that instances would be launched from and figure out how the DH Box team would handle potential related charges?

Many questions entered into this equation: Would our intended users be open to providing credit card information? Who might this alienate? Or, if we managed the AWS account with many instances running, would we incur charges we’re not prepared to deal with? What would be the time-period allotted to users for running the instances?
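A back-of-the-envelope estimate is one way to reason about the charges question for a team-managed account. The sketch below is illustrative only: the function name, the hourly rate, the instance count, and the hours are all hypothetical placeholders, not real AWS prices or DH Box usage figures.

```python
from decimal import Decimal


def estimate_cost(instances: int, hours_per_instance: int, hourly_rate: Decimal) -> Decimal:
    """Rough charge for running `instances` servers for `hours_per_instance` hours each."""
    if instances < 0 or hours_per_instance < 0:
        raise ValueError("instances and hours must be non-negative")
    # Total instance-hours multiplied by the price of one instance-hour.
    return instances * hours_per_instance * hourly_rate


# Hypothetical inputs (assumptions for illustration, not real prices or usage).
HYPOTHETICAL_RATE = Decimal("0.02")  # dollars per instance-hour
cost = estimate_cost(instances=25, hours_per_instance=40, hourly_rate=HYPOTHETICAL_RATE)
print(f"Estimated charge: ${cost:.2f}")
```

Plugging in different time limits per user shows quickly how the allotted running time drives the total, which is why that question matters so much for the shared-account option.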

DH Box has had to think through how different deployment options (e.g. requiring users to have their own AWS accounts) might affect how DH Box will be adopted by intended users. And this tension, between providing a service that is maintainable, sustainable, and at once useful to the intended audience, is something any project like DH Box might face.

User experience testing and documentation

DH Box is really taking shape! We have a bare-bones version of our server image up and running thanks to all of Steve’s hard work over the last week. We have revised our project plan with new milestone dates and a clear-cut set of tasks we need to accomplish. We are working hard on everything we need to do now and also looking forward to the next phase.

User experience testing and documentation will be very important over the next few weeks. We need to be sure that people who are not already familiar with the command line, cloud computing, and DH tool installation will find DH Box easy and convenient to use. Documentation (aka the “user manual”) will be the key to helping users make the most of DH Box. We have decided to use Read the Docs to host our documentation. Read the Docs allows us to host documentation files on our website and update our documentation when pushing to the GitHub repository that hosts our website – this means updating our online documentation is as simple as updating text on our website! One great benefit of using a utility like Read the Docs is that our documentation will be easily maintainable, forkable by contributors, available online, and searchable.

Travelogue: Format Selection and Other Updates

The team chose the Esri ArcGIS Storymaps platform for the Travelogue project. Last week the team voted on which Storymaps format to use. The options were:

  • Map Tour (sequential, place-based narratives): http://storymaps.arcgis.com/en/app-list/map-tour/
  • Short List (a curated list of points of interest): http://storymaps.arcgis.com/en/app-list/shortlist/
  • Tabbed Viewer (comparing two or more maps): http://storymaps.arcgis.com/en/app-list/tabbed-viewer/
  • Side Accordion (comparing two or more maps): http://storymaps.arcgis.com/en/app-list/side-accordion
  • Playlist (a curated list of points of interest): http://storymaps.arcgis.com/en/app-list/playlist

The winner was…Map Tour http://storymaps.arcgis.com/en/app-list/map-tour/

Each team member has an Esri ArcGIS organizational account that can be used to practice and publish.  With the format selected and a large volume of research content done we can now start building.  The American authors that we have chosen to initially feature are Zora Neale Hurston and Ernest Hemingway.  We have shared Google Drive folders for each that feature spreadsheets with the research collected so far.  The spreadsheet entries are organized with a unified chronological date so that the journeys can be mapped chronologically.  All of the locations on both spreadsheets also have coordinates.
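As a rough illustration of how a spreadsheet organized this way can drive a sequential map, the sketch below reads rows with a date and coordinates and orders them chronologically. The column names, the `load_stops` helper, and the sample rows are hypothetical placeholders, not the team's actual spreadsheet layout or research data.

```python
import csv
import io
from datetime import date

# Hypothetical spreadsheet export: column names and rows are placeholders only.
SAMPLE_CSV = """date,place,latitude,longitude
1930-05-01,Place B,10.0,-20.0
1925-03-15,Place A,30.5,40.25
1934-11-20,Place C,-5.0,60.0
"""


def load_stops(csv_text):
    """Parse CSV rows into stops and return them sorted by date."""
    reader = csv.DictReader(io.StringIO(csv_text))
    stops = []
    for row in reader:
        stops.append({
            "date": date.fromisoformat(row["date"]),  # one unified ISO date format
            "place": row["place"],
            "lat": float(row["latitude"]),
            "lon": float(row["longitude"]),
        })
    # Chronological order gives the sequence of stops along the journey.
    return sorted(stops, key=lambda stop: stop["date"])


stops = load_stops(SAMPLE_CSV)
for number, stop in enumerate(stops, start=1):
    print(number, stop["date"].isoformat(), stop["place"], stop["lat"], stop["lon"])
```

Keeping every date in a single format is what makes the sort reliable; mixed date styles in the spreadsheet would need to be normalized first.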

Informational text about each author is being written, and audiovisual material to be featured on the Travelogue site is being collected. Notably, we are collecting direct links to Hemingway images from the JFK Library’s Media Gallery (http://www.jfklibrary.org/JFK/Media-Gallery.aspx). For the content sources we have chosen to use the MLA citation format.

The Travelogue’s Twitter account has received a few new followers. Also, a Travelogue tweet was favorited by a book editor at the San Francisco Chronicle (all acknowledgements count). The Twitter logo has been redesigned, and the look of the Twitter page has been updated to reflect the bibliographic and cartographic aspects of the project. Check it out @dhtravelogue

The team is looking forward to providing a status update presentation to the DH Praxis class on Monday, March 24th.

If you want to contact us please do. Our project blog is at  travelogue.commons.gc.cuny.edu. Email us at dhtravelogue [at] gmail [dot] com or follow us on Twitter @DhTravelogue

Finding a Home: Travelogue Picks a URL

The Travelogue team has been navigating the URL waters (travel puns abound but URL names do not).  By Monday, March 17th the URL had been decided upon and purchased.  Details soon to follow (we will let you know when to begin the drum roll).

Other updates: On the Travelogue’s Commons page, the Twitter feed has been updated, removing the icons and making it more text-based. The team is also choosing between paper texture images to be used for the Travelogue’s Commons site background, consulting guides on 2014 web design trends. We have been actively working on the Zotero citations for the content that will be featured on the Travelogue site. Meet-ups outside of normal class hours have been scheduled. We have been outlining the research that has been done so far and what still needs to be worked on. Zora Neale Hurston and Ernest Hemingway are the two American authors that the Travelogue project will initially focus on. Research-wise, we are currently working on historical context, looking into what was going on in the locations that they traveled to during their time there.


[Cross-Post] Maintaining Documentation is Important – Examples of Online Platforms | Harlan Kellaway

Cross-posted from: http://harlankellaway.com/2014/03/17/online-documentation-platforms/ (March 17, 2014)


Documentation for technical projects, especially ones you envision having non-technical end users, is essential. This statement comes from various experiences, both as a user and as a developer who has produced documentation for such users and has taken over poorly documented projects from other developers. Having good documentation cuts down on the endless issues that come with figuring out how to use a technical product, whether from the frontend or the backend.

The reason documentation is on my mind is that one of the current projects I’m doing development for, DH Box, is one targeted towards end users of the variety described above. Not only that, but one of the systems integral to the project is a notoriously opaque one: Amazon Web Services. Moreover, it’s a mission of mine in my personal work to explore how tech can be made more accessible to those who don’t deem themselves technically savvy (I consider the technical/non-technical social binary and contributing factors to be problematic — but I’ll save that exploration for a different post!).

This week, I explored some tools that could help maintain the online documentation for our DH Box project, with a few preferences in mind: documentation that is easily updatable, documentation that is browsable and searchable, documentation that is configurable (e.g. in how it looks).


[Cross-Post] Subtle Elements of Online Branding That Can Help the Impression You Make in the Web World | Harlan Kellaway

Cross-posted from: http://harlankellaway.com/2014/03/02/online-branding-subtle-elements-to-help-web-impression/ (March 2, 2014)


I found myself in the midst of an interesting exchange a couple of weeks ago: it was over whether Twitter handles should have underscores in them or not. The person questioning the underscores had been told not to use them, period, but was unclear on why. It occurred to me that there are many subtle forms of online branding that folks who work on/with the Internet quite a bit eventually pick up, things that are obscure to more general users.

Here are a few examples of such online branding to think about when building an online presence.


[Cross-Post] Website Development for Beginners – Trade-offs Between Jekyll and WordPress | Harlan Kellaway

Cross-posted from: http://harlankellaway.com/2014/02/20/web-development-beginners-tradeoffs-jekyll-wordpress/ (February 20, 2014)


I’ve used WordPress for years as the base for just about every site I’ve created. It had occurred to me occasionally that WordPress may not be the best choice in every website development case, whether big or small, simple or complex. But, honestly, the idea of creating a website that didn’t have a database behind it hadn’t even occurred to me. A website that is almost purely HTML? I had subconsciously equated this with broken links and scrolling banners. I had equated database-backing with words like easy, maintainable, modern, and extensible.

It’s a clear case of every problem looking like a nail when the only tool you have is a hammer. But what if your problem is a thumbtack? Or a staple? A hammer might work, but it’s most definitely overkill and can even limit your creativity in solving the problem of fastening paper to things.

In terms of website development, WordPress has been my hammer and it’s gotten the job done. But, I had the opportunity the past weeks to use a new tool — Jekyll.


DH Box: Making a place in the Cloud

The DH Box team has made exciting strides over the past week!

As some may know, DH Box will be available on a pre-installed, pre-configured Debian cloud server. To achieve this, we are using Amazon Web Services. For the uninitiated, AWS is a vast cloud computing infrastructure (with internet servers throughout the world) that offers services very similar to what a physical computer would. But AWS brings unrivaled scale, flexibility, and economy (pay-as-you-go pricing).

DH Box’s Intro to Cloud Computing

Dennis Tenen led the DH Box team through its first group workshop on setting up a virtual web server image, a.k.a. an “EC2 Instance.” The virtual web server contains an Amazon Machine Image (think of it as an identical copy) of an operating system. DH Box will be freely available, so users can launch their own instance from our image. This solution saves users the trouble of downloading and installing tools to their own computers.

What do users need to access DH Box in the Cloud?

It will be pretty simple: users must sign up for a free AWS account. We’re making use of AWS CloudFormation, a utility whose templates deploy services rapidly, to automate many of the steps required to launch a new instance from our AMI. We also have custom scripts to automate the launch of DH Box files and software once users copy our server image. We’re really excited about being introduced to this powerful service, and even more encouraged that our configuration templates will allow DH Box users to dive swiftly into DH inquiry.
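To give a sense of what such a template looks like, here is a minimal sketch that builds a CloudFormation template as a Python dictionary and prints it as JSON. It is not the DH Box team's actual template: the `build_template` helper, the `DHBoxServer` resource name, and the parameter names are assumptions made for illustration, and the image ID is left as a parameter rather than filled in.

```python
import json


def build_template(description: str) -> dict:
    """Return a minimal CloudFormation template that launches one EC2 instance."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": description,
        "Parameters": {
            # The AMI to launch from; supplied by whoever creates the stack.
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
            # The instance size; no default is assumed here.
            "InstanceType": {"Type": "String"},
        },
        "Resources": {
            # "DHBoxServer" is a hypothetical logical name.
            "DHBoxServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": {"Ref": "ImageId"},
                    "InstanceType": {"Ref": "InstanceType"},
                },
            }
        },
        "Outputs": {
            # Address users would visit once the instance is running.
            "PublicDnsName": {
                "Value": {"Fn::GetAtt": ["DHBoxServer", "PublicDnsName"]}
            }
        },
    }


template = build_template("Sketch of a single-instance stack (illustrative only)")
print(json.dumps(template, indent=2))
```

The printed JSON could be saved to a file and supplied when creating a stack. A real template would likely also need security group and key pair settings, which are omitted here.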

This is just the beginning. We’re focusing heavily on providing thorough documentation so that DH Box users will have everything they need to get up and running. Stay tuned!

Special thanks to Prof. Dennis Tenen for his amazing Intro to Cloud Computing Workshop.

Beyond Citation: Critical thinking about academic databases

During the Fall 2013 semester, I started reading, thinking and writing about the impact of academic databases such as JSTOR and Gale: Artemis Primary Sources on research and scholarship. I learned that databases shape the questions that can be asked and the arguments that can be made by scholars through search interfaces, algorithms, and the items that are contained in or absent from their collections. Although algorithms in databases have been found to have an “epistemological power” through their ranking of search results, understanding why certain search results appear is very difficult even for the team that engineered the algorithms. Yet knowledge of how databases work is extremely limited because information about database structures is scanty or unavailable and constantly changing.

Despite the ubiquity of databases, academics are often unaware of the constraints that databases place on their research. Lack of information about the impact of database structures and content on research is an obstacle to scholarly inquiry because it means that scholars may not be aware of and cannot account for how databases affect their interpretations of search results or text analysis.

Digital humanists have examined both the benefits and perils of research in academic databases. The introduction of digital tools for text analysis to identify patterns common to large amounts of documents has added to the complexity of scholars’ tasks. Historian Jo Guldi writes that, “Keyword searching [in databases] . . . allows the historian to propose longer questions, bigger questions;” yet she also remarks on the challenges posed by search in an earlier article saying that, “Each digital database has constraints that render historiographical interventions based upon scholars’ queries initially suspect.” Scholars such as Caleb McDaniel, Miriam Posner, James Mussell, Bob Nicholson and Ian Milligan have written about the skewed search results of databases of historical newspapers, the impossibility of finding provenance information to contextualize what database users are seeing, and the lack of information about OCR accuracy. Besides these issues, scholars should also have an understanding of errors in digital collections. For example, scholars using Google Books would probably want to know that thirty-six percent of Google Books have errors in either author, title, publisher, or year of publication metadata.

Historian Tim Hitchcock talks about the importance of understanding the types of items in digital collections, saying, “Until we get around to including the non-canonical, the non-Western, the non-textual and the non-elite, we are unlikely to be very surprised.” Because they can contain what seems to be an almost infinite number of documents, archival databases offer an appearance of exhaustiveness that does not yield easily to a scholar’s probing. But while a gestalt understanding of a primary source database is crucial to determining the representation of items in the collection, the limited bibliographic information that is available about academic databases is scattered or unknown to most scholars.

As one step toward overcoming scholars’ lack of knowledge about the biases inherent in databases, I am working with a team of other students in the DH Praxis Seminar at the CUNY Graduate Center to create Beyond Citation, a website to aggregate bibliographic information about major humanities databases so that scholars can understand the significance of the material they have gleaned. Beyond Citation will help humanities scholars to practice critical thinking about research in databases.

The benefit of encouraging critical thinking about databases is more than merely facilitating research. Critical thinking about databases counters scholars’ “tendency to consider the archive as a hermetically-sealed space in which historical material can be preserved untouched,” and “[forces] a recognition of the constructed nature of evidence and its relation to the absent past.”

The Beyond Citation team has selected a set of humanities databases for the initial site launch and is working out the nitty-gritty of platform and server-side database functionality as well as completing research about the databases that we have chosen to cover on the site.

By providing structured information about databases and articles about research strategies, Beyond Citation will frame the common problems that scholars face when evaluating the results of their work in databases. Scholars will be able to enrich the data on the site with their own contributions, participate in reflective conversations and share highly situated stories about their experiences of working in databases. While an early version of the website to be launched in May 2014 will have a limited scope, the idea is that the site will eventually become a research workshop.

As information scientist Ryan Shaw observes, “In an era of vast digital archives and powerful search algorithms, the key challenge of organizing information is to construct systems that aid understanding, contextualizing, and orienting oneself within a mass of resources.” By making essential bibliographic information about the structures and content of academic databases accessible to scholars, Beyond Citation will take an important step to updating the scholarly apparatus to encourage critical thinking about databases and their effect on research and scholarship.

Reach us at BeyondCitation [at] gmail [dot] com or follow us on Twitter as we get ready for the launch in May: @beyondcitation

Acknowledgments

The idea for Beyond Citation originated from my encounter with a blog post by Caleb McDaniel about historians’ research practices suggesting the creation of an “online repository” of information about proprietary databases.

Presenting… DH Box

In the interest of spreading the mission of DH Box far and wide, I’ve been working on a brief presentation that might also serve as an online introduction to the project. It’s available here. Take a look!

I’ll be using these slides to give a short talk about DH Box to faculty this Tuesday at Hunter College. It looks like we’ll be making quite a few presentations like this one, because as it turns out, building a community is one of the key factors determining success for DH Box. We will need the help of an invested community to:

  • Determine which tools should be included
  • Identify new platforms to target
  • Contribute to documentation
  • Spread awareness about DH Box

and it seems clear that in-person meetings and discussions are the best way for us to create interest in our work. That’s not to discount social media approaches at all; they allow for broad outreach we couldn’t manage otherwise. But in-person conversation allows us to demonstrate and discuss DH Box in greater depth, thus solidifying each potential user’s understanding and their relationship with us and our project.