Category Archives: DH Student Projects 2014

Thinking About Authority and Academic Databases

Beyond Citation hopes to encourage critical thinking by scholars about academic databases. But what do we mean by critical thinking? Media culture scholar Wendy Hui Kyong Chun has defined critique as “not attacking what you think is false, but thinking through the limitations and possibilities of what you think is true.”

One question that the Beyond Citation team is considering is the scholarly authority of a database. Yale University Library addresses the question of scholarly authority in a handout entitled the “Web vs. Library Databases,” a guide for undergraduates. The online PDF states that information on the web is “seldom regulated, which means the authority is often in doubt.” By contrast, “authority and trustworthiness are virtually guaranteed” to the user of library databases.

Let’s leave aside for the moment the question of whether scholars should always prefer the “regulated” information of databases to the unruly data found on the Internet. While Yale Library may simply be using shorthand to explain academic databases to undergraduates, to the extent that they are equating databases and trustworthiness, I think they may be ceding authority to databases too readily and missing some of the complexity of the current digital information landscape.

Yale Library cites Academic Search and Lexis-Nexis as examples of databases. Lexis-Nexis is a compendium of news articles, broadcast transcripts, press releases, and law cases, as well as Internet miscellany. Lexis-Nexis is probably authoritative in the sense that one can be comfortable that the items accessed are the actual articles obtained directly from publishers and thus contain the complete texts of articles (with images removed). In that limited sense, items in Lexis-Nexis are certainly more reliable than results obtained from a web search. (This isn’t true, though, for media historians who want to see the entire page with pictures and advertisements included. For that, try the web or another newspaper database.) Even though Lexis-Nexis has a relatively long pedigree for an electronic database, careful scrutiny of results is just as crucial when searching it as it is for an Internet search.

In some instances, especially when seeking information about non-mainstream topics, searching the Internet may be a better option. Composition and rhetoric scholar Janine Solberg has written about her experience of research in digital environments, in particular how full-text searches on Amazon, Google Books, the Internet Archive and HathiTrust enabled her to locate information that she was unable to find in conventional library catalogs. She says, “Web-based searching allowed me not only to thicken my rhetorical scene more quickly but also to rapidly test and refine questions and hypotheses.” In the same article, Solberg calls for “more explicit reflection and discipline-specific conversation around the uses and shaping effects of these [digital] technologies” and recommends as a method “sharing and circulating research narratives that make the processes of historical research visible to a wider audience . . . with particular attention to the mediating role of technologies.”

Adding to the challenge of thinking critically about academic databases is their dynamic nature. The terrain of library databases is changing as more libraries adopt proprietary “discovery” systems that search across the entire set of databases to which libraries subscribe. For example, the number of JSTOR users has dropped “as much as 50%” with installations of discovery systems and changes in Google’s algorithms. Shifts in discovery have led to pointed discussions between associations of librarians and database publishers about the lack of transparency of search mechanisms. In 2012, Tim Collins, the president of EBSCO, a major database and discovery system vendor, found it necessary to address the question of whether vendors of discovery systems favor their own content in searches, denying that they do. There is, however, no way for anyone outside the companies to verify his statement because the vendors will not reveal their search algorithms.

While understanding the ranking of search results in academic databases is an open question, a recent study comparing research in databases, Google Scholar and library discovery systems by Asher et al. found that “students imbued the search tools themselves with a great deal of authority,” often by relying on the brand name of the database. More than 90% of students in the study never went past the first page of search results. As the study notes, “students are de facto outsourcing much of the evaluation process to the search algorithm itself.”

In addition, lest one imagine that scholars are immune to an uncritical perspective on digital sources, in his study of the citation of newspaper databases in Canadian dissertations, historian Ian Milligan says that scholars have adopted the use of these databases without achieving a concomitant perspective on their shortcomings. Similarly to the Asher et al. study of undergraduate students, Milligan says, “Researchers cite what they find online.”

If critique is, as Chun says, thinking through the limitations and possibilities of what we think is true, then perhaps by encouraging reflective conversations among scholars about how these ubiquitous digital tools shape research and the production of knowledge, Beyond Citation’s efforts will be another step toward that critique.

We are at blog.beyondcitation.org. Email us at BeyondCitation [at] gmail [dot] com or follow us on Twitter @beyondcitation as we get ready for the launch in May.

Travelogue: Format Selection and Other Updates

The team chose the ESRI ArcGIS Storymaps platform for the Travelogue project. Last week the team voted on which ESRI ArcGIS Storymaps format to go with. The options were:

Sequential, Place-based Narratives: Map Tour http://storymaps.arcgis.com/en/app-list/map-tour/

A Curated List of Points of Interest: Short List http://storymaps.arcgis.com/en/app-list/shortlist/

Comparing Two or More Maps: Tabbed Viewer http://storymaps.arcgis.com/en/app-list/tabbed-viewer/

Comparing Two or More Maps: Side Accordion http://storymaps.arcgis.com/en/app-list/side-accordion

A Curated List of Points of Interest: Playlist http://storymaps.arcgis.com/en/app-list/playlist

The winner was…Map Tour http://storymaps.arcgis.com/en/app-list/map-tour/

Each team member has an Esri ArcGIS organizational account that can be used to practice and publish.  With the format selected and a large volume of research content done we can now start building.  The American authors that we have chosen to initially feature are Zora Neale Hurston and Ernest Hemingway.  We have shared Google Drive folders for each that feature spreadsheets with the research collected so far.  The spreadsheet entries are organized with a unified chronological date so that the journeys can be mapped chronologically.  All of the locations on both spreadsheets also have coordinates.
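The chronological ordering described above can be sketched in a few lines of Python. This is only an illustration: the column names (`date`, `place`, `lat`, `lon`) and the sample rows are hypothetical placeholders, not the team's actual spreadsheet layout or research data.

```python
import csv
import io
from datetime import date

# Hypothetical sample standing in for one author's research spreadsheet.
# Column names and rows are placeholders, not the project's real data.
SAMPLE_CSV = """date,place,lat,lon
1930-06-15,Place B,25.0,-80.0
1925-01-02,Place A,40.0,-74.0
1935-11-30,Place C,18.5,-72.3
"""

def load_stops(csv_text):
    """Parse rows into dicts with a real date and float coordinates."""
    stops = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        stops.append({
            "date": date.fromisoformat(row["date"]),
            "place": row["place"],
            "lat": float(row["lat"]),
            "lon": float(row["lon"]),
        })
    return stops

def chronological(stops):
    """Return the stops ordered by date, ready to be mapped as a journey."""
    return sorted(stops, key=lambda s: s["date"])

journey = chronological(load_stops(SAMPLE_CSV))
for stop in journey:
    print(stop["date"].isoformat(), stop["place"], stop["lat"], stop["lon"])
```

Storing each date in a single sortable form (here, ISO `YYYY-MM-DD`) is what makes the ordering a one-line sort; mixed date formats would have to be normalized first.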

Informational text about each author is being written, and audiovisual material to be featured on the Travelogue site is being collected. Notably, this includes direct links to Hemingway images from the JFK Library’s Media Gallery (http://www.jfklibrary.org/JFK/Media-Gallery.aspx). For the content sources we have chosen to use the MLA citation format.

The Travelogue’s Twitter account has received a few new followers. Also, a Travelogue tweet was favorited by a San Francisco Chronicle book editor (all acknowledgements count). The Twitter logo has been redesigned. The look of the Twitter page has been updated to reflect the bibliographic and cartographic aspects of the project. Check it out @dhtravelogue

The team is looking forward to providing a status update presentation to the DH Praxis class on Monday, March 24th.

If you want to contact us please do. Our project blog is at  travelogue.commons.gc.cuny.edu. Email us at dhtravelogue [at] gmail [dot] com or follow us on Twitter @DhTravelogue

Finding a Home: Travelogue Picks a URL

The Travelogue team has been navigating the URL waters (travel puns abound but URL names do not).  By Monday, March 17th the URL had been decided upon and purchased.  Details soon to follow (we will let you know when to begin the drum roll).

Other updates: On the Travelogue’s Commons page the Twitter feed has been updated, removing the icons and making it more text-based. The team is also choosing between paper texture images to be used for the Travelogue’s Commons site background, consulting guides on 2014 web design trends. We have been actively working on the Zotero citations for the content that will be featured on the Travelogue site. Meet-ups outside of normal class hours have been scheduled. We have been outlining the research that has been done so far and what still needs to be worked on. Zora Neale Hurston and Ernest Hemingway are the two American authors that the Travelogue project will initially focus on. Research-wise, we are currently working on historical context, researching what was going on in the locations that they traveled to during their time there.

DH Box: Making a place in the Cloud

The DH Box team has made exciting strides over the past week!

As some may know, DH Box will be available on a pre-installed, pre-configured Debian cloud server. To achieve this, we are using Amazon Web Services. For those who aren’t initiated, AWS is a vast cloud computing infrastructure (with internet servers throughout the world) that offers services very similar to what a physical computer would. But AWS brings unrivaled scale, flexibility, and economy (pay as you go pricing).

DH Box’s Intro to Cloud Computing

Dennis Tenen led the DH Box team through its first group workshop on setting up a virtual web server, a.k.a. an “EC2 Instance.” The virtual web server is launched from an Amazon Machine Image, or AMI (think of it as an identical copy of an operating system). DH Box will be freely available as an image from which users can launch their own instance. This solution saves users the trouble of downloading and installing tools to their own computers.

What do users need to access DH Box in the Cloud?

It will be pretty simple: users must sign up for a free AWS account. We’re making use of AWS’s CloudFormation utility (templates that deploy services rapidly) to automate many of the steps required to launch a new instance from our AMI. We also have custom scripts to automate the launch of DH Box files and software once users copy our server image. We’re really excited about being introduced to this powerful service, and even more encouraged that our configuration templates will allow DH Box users to dive swiftly into DH inquiry.
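To give a sense of what such a template contains, here is a minimal sketch that builds a CloudFormation-style document in Python and prints it as JSON. It is not DH Box's actual template: the resource name `DHBoxInstance`, the instance type, and the placeholder image ID are all assumptions made for illustration.

```python
import json

def build_template(image_id, instance_type="t2.micro"):
    """Build a minimal CloudFormation-style template describing one EC2 instance.

    image_id is the ID of the machine image (AMI) to launch from; the value
    passed below is a placeholder, not a real DH Box image.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative sketch: one server launched from a shared image",
        "Resources": {
            "DHBoxInstance": {  # hypothetical logical name
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": image_id,
                    "InstanceType": instance_type,  # assumed size, for illustration
                },
            }
        },
    }

template = build_template("ami-PLACEHOLDER")
print(json.dumps(template, indent=2))
```

The point of a template like this is that the server is described as data, so a new user can launch the same setup without repeating the manual steps.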

This is just the beginning: we’re focusing heavily on providing thorough documentation so that DH Box users will have everything they need to get up and running. Stay tuned!

Special thanks to Prof. Dennis Tenen for his amazing Intro to Cloud Computing Workshop.

Beyond Citation: Critical thinking about academic databases

During the Fall 2013 semester, I started reading, thinking and writing about the impact of academic databases such as JSTOR and Gale: Artemis Primary Sources on research and scholarship. I learned that databases shape the questions that can be asked and the arguments that can be made by scholars through search interfaces, algorithms, and the items that are contained in or absent from their collections. Although algorithms in databases have been found to have an “epistemological power” through their ranking of search results, understanding why certain search results appear is very difficult even for the team that engineered the algorithms. Yet knowledge of how databases work is extremely limited because information about database structures is scanty or unavailable and constantly changing.

Despite the ubiquity of databases, academics are often unaware of the constraints that databases place on their research. Lack of information about the impact of database structures and content on research is an obstacle to scholarly inquiry because it means that scholars may not be aware of and cannot account for how databases affect their interpretations of search results or text analysis.

Digital humanists have examined both the benefits and perils of research in academic databases. The introduction of digital tools for text analysis to identify patterns common to large numbers of documents has added to the complexity of scholars’ tasks. Historian Jo Guldi writes that “Keyword searching [in databases] . . . allows the historian to propose longer questions, bigger questions;” yet she also remarks on the challenges posed by search in an earlier article, saying that “Each digital database has constraints that render historiographical interventions based upon scholars’ queries initially suspect.” Scholars such as Caleb McDaniel, Miriam Posner, James Mussell, Bob Nicholson and Ian Milligan have written about the skewed search results of databases of historical newspapers, the impossibility of finding provenance information to contextualize what database users are seeing, and the lack of information about OCR accuracy. Besides these issues, scholars should also have an understanding of errors in digital collections. For example, scholars using Google Books would probably want to know that thirty-six percent of Google Books have errors in either author, title, publisher, or year of publication metadata.

Historian Tim Hitchcock talks about the importance of understanding the types of items in digital collections, saying, “Until we get around to including the non-canonical, the non-Western, the non-textual and the non-elite, we are unlikely to be very surprised.” Because they can contain what seems to be an almost infinite number of documents, archival databases offer an appearance of exhaustiveness that does not yield easily to a scholar’s probing. But while a gestalt understanding of a primary source database is crucial to determining the representation of items in the collection, the limited bibliographic information that is available about academic databases is scattered or unknown to most scholars.

As one step toward overcoming scholars’ lack of knowledge about the biases inherent in databases, I am working with a team of other students in the DH Praxis Seminar at the CUNY Graduate Center to create Beyond Citation, a website to aggregate bibliographic information about major humanities databases so that scholars can understand the significance of the material they have gleaned. Beyond Citation will help humanities scholars to practice critical thinking about research in databases.

The benefit of encouraging critical thinking about databases is more than merely facilitating research. Critical thinking about databases counters scholars’ “tendency to consider the archive as a hermetically-sealed space in which historical material can be preserved untouched,” and “[forces] a recognition of the constructed nature of evidence and its relation to the absent past.”

The Beyond Citation team has selected a set of humanities databases for the initial site launch and is working out the nitty-gritty of platform and server-side database functionality as well as completing research about the databases that we have chosen to cover on the site.

By providing structured information about databases and articles about research strategies, Beyond Citation will frame the common problems that scholars face when evaluating the results of their work in databases. Scholars will be able to enrich the data on the site with their own contributions, participate in reflective conversations and share highly situated stories about their experiences of working in databases. While an early version of the website to be launched in May 2014 will have a limited scope, the idea is that the site will eventually become a research workshop.

As information scientist Ryan Shaw observes, “In an era of vast digital archives and powerful search algorithms, the key challenge of organizing information is to construct systems that aid understanding, contextualizing, and orienting oneself within a mass of resources.” By making essential bibliographic information about the structures and content of academic databases accessible to scholars, Beyond Citation will take an important step to updating the scholarly apparatus to encourage critical thinking about databases and their effect on research and scholarship.

Reach us at BeyondCitation [at] gmail [dot] com or follow us on Twitter as we get ready for the launch in May: @beyondcitation

Acknowledgments

The idea for Beyond Citation originated from my encounter with a blog post by Caleb McDaniel about historians’ research practices suggesting the creation of an “online repository” of information about proprietary databases.

Presenting… DH Box

In the interest of spreading the mission of DH Box far and wide, I’ve been working on a brief presentation that might also serve as an online introduction to the project. It’s available here. Take a look!

I’ll be using these slides to give a short talk about DH Box to faculty this Tuesday at Hunter College. It looks like we’ll be making quite a few presentations like this one, because as it turns out, building a community is one of the key factors determining success for DH Box. We will need the help of an invested community to:

  • Determine which tools should be included
  • Identify new platforms to target
  • Contribute to documentation
  • Spread awareness about DH Box

and it seems clear that in-person meetings and discussions are the best way for us to create interest in our work. That’s not to discount social media approaches at all; they allow for broad outreach we couldn’t manage otherwise. But in-person conversation allows us to demonstrate and discuss DH Box in greater depth, thus solidifying each potential user’s understanding and their relationship with us and our project.

Beyond Citation: Building digital tools to explain digital tools

Over the last couple weeks, the Beyond Citation team has transformed into a web production team of sorts, focused on making key decisions about platform, site architecture, user interaction, design, and communication.

Beyond Citation—a project to build a website that aggregates accessible, structured information about scholarly databases—has the potential to enhance how scholars approach, use, and interpret resources from some of today’s most widely used digital collections. While it would be straightforward for our team to simply gather and publish information about those resources, our challenge is to build a digital tool that supports meaningful interaction with that information, one that can also scale in the future and cater to a community of contributors.

In the project’s nascent stages, the tactical concerns before us are familiar—we’re taking on the common challenge of building and launching a website or web app. Thrust into the very practical realm of software, decisions, and constraints, discussions of critical theory get put off to discuss the merits of WordPress and Drupal. These powerful tools place the project in a digital ecosystem much wider than academia. The platform we have chosen—WordPress—pushes us deeper still into the wide worlds of relational databases, server-side scripting, and content management—the digital tools that will allow us to explain other digital tools.

As we construct the basic building blocks for the site, we find that the best way to focus our approach is by seeking the advice of experts, reading blogs about WordPress customization, and learning more about MySQL and WordPress taxonomies. The robust open source community behind WordPress has enabled us to confirm that the technical requirements for the Beyond Citation website can be met many times over through combinations of WordPress plugins.

Something to consider while building this tool with WordPress is that we are seeking to publish data about proprietary tools by using open source technology. Perhaps this isn’t really so unusual; we see it in a similar vein to the increasingly popular APIs that allow for easier data aggregation or configuration from multiple sources. And toolsets that are hybrids of proprietary and open source systems are extremely common.

But there’s an important depth to explore when thinking about Beyond Citation as a bridge between proprietary and open source systems. The idea of “exposed” information, built on “hidden” information, represents a direction that the project can try to push technically. For instance, if in a future iteration the team can uncover information about scholarly databases that’s not just hard to find, but not openly available (such as how search algorithms work, or the criteria behind publisher contracts), then I think the value of Beyond Citation increases in a direction most closely aligned with its original ambition. This would also allow the project to explore the similarities and differences in how scholarly databases work in more meaningful ways.

Before we can do that, everyone on the team is doing their part to fill in knowledge gaps, and discovering “how technology works” on multiple levels. Just as we are researching the types of information about scholarly databases that we want the project to highlight, we are also researching the types of data-driven web frameworks that could easily support such information. Like many Digital Humanities projects, Beyond Citation is about knowledge acquisition and aggregation for both developers and researchers. We are challenging ourselves to learn as much as we can about one set of digital tools before we can communicate new information about other sets of digital tools—both of which are moving targets, evolving in their own realms of authorship.

As we work towards a May launch date for an early version of the site, we realize that the authors of digital projects need a constant appetite for more knowledge—technical knowledge and subject-matter knowledge—in order to create and maintain an authoritative tool.

Follow us on Twitter as we get ready for May: @beyondcitation

It’s a Two-Fer!

Travelogue group members
Sarah – Project Manager
Amy – Technology and Design
Melanie – Outreach and Communication
Evonne – Research
Adam – Technology and Design

Last week, due to illness, the Travelogue’s outreach and communication person was ironically silenced.  However, that means this week there is twice as much Travelogue team blog fun to catch up on!

Travelogue’s Twitter page has a great new logo courtesy of Adam. Initially, we had found that the first Travelogue logo did not look great when sized down for Twitter. Adam also created the Travelogue logo that appears on the Travelogue’s Commons page. Throughout the design process, Adam shared drafts for input from the group. Amy has been hard at work on the design and content of the Travelogue’s Commons page.

Last Monday, March 3rd, the team, sans one under-the-weather outreach and communication member, presented an update on the project status to the DH Praxis class. In preparation, Sarah created an action plan outlining how each team member could explain the progress the team has made so far.

Sarah met with our DH Praxis professor Matt Gold to go over the scope of the project and get his input on the team’s current ideas. Sarah is working on the Travelogue website’s wireframe and has created a mock-up of the layout. She is also continuously working on the project plan. The team has been actively communicating; to organize that communication and each team member’s responsibilities, Sarah established an Asana page for the team.

Evonne has been compiling research resources, organizing the research conducted, identifying what needs further research, and maintaining citations in a Travelogue Zotero page. Using Evonne’s extensive research as a guide, along with the Gale database Directory of Special Libraries and Information Centers, Melanie has been reaching out to multiple academic institutions. The preliminary goal is to introduce the Travelogue project, request information on the usage of content (for example, from the Library of Congress), and build relationships from there. Through the Travelogue Twitter account, Melanie has followed organizations working on mapping projects and will be actively working on creating engaging content in the pursuit of followers.

The team has been exploring ArcGIS Story Maps as the mapping tool for the project. A schedule of meetings outside of class is being established so that we can brainstorm collaboratively face to face. The team is looking into whether Travelogue will parallel the travel narratives of the chosen authors (Ernest Hemingway and Zora Neale Hurston), literally displaying the travel trajectories of both on the same map, or whether each author’s journey will be depicted on a separate map. The website’s URL is also currently being decided upon.

New Friend, New Platform for DH Box

Cross-posted from: https://dhbox.commons.gc.cuny.edu/blog/2014/dh-box-new-friend-new-platform


This week the DH Box team reconsidered their choice of platform, with the help of Dennis Tenen, a professor at Columbia University in the Digital Humanities and New Media Studies program (and former developer with Microsoft).

A couple weeks ago we were surprised and delighted to find that another team had come up with the idea for a portable tool that could help users quickly get going with DH applications. And this week we found that Professor Tenen and colleagues had also discussed how to tackle such a project and had come up with yet another solution! In discussing that solution, we found that it matched our aim of making it easy for new users to set up an environment quickly, and it made us change our focus for both implementation and outreach.

Opening DH Box

This is it! The inaugural post of the DH Box blog (the DH stands for Digital Humanities). Here we intend to make the process of planning, creating, and publicizing the DH Box transparent for our readers. Hopefully this provides some inspiration, and even a blueprint, for future collaborative DH projects.

But let’s not get ahead of ourselves! First, some questions and answers:

What is DH Box?

Not much, so far. But we intend it to be a portable, customized environment for Digital Humanities learners that can rely on incredibly inexpensive technology. All you really need is a computer (and a monitor and keyboard, of course!) — but the platform that excites us most is the Raspberry Pi, a tiny computer that sells for just $35. Imagine a collection of DH tools, pre-installed and configured, and a set of texts for users to interrogate — all on a portable and inexpensive device.

What inspired the idea of DH Box?

Several ongoing humanities projects have begun to take advantage of the continuing miniaturization of computing technology. One in particular excited my imagination: Library Box, which repurposes a wireless router into a “portable digital file distribution tool…that enables delivery of educational, healthcare, and other vital information to individuals off the grid.” The possibilities for ‘embedded’, specialized miniature computers are massive.

What is needed to run DH Box?

Our first major goal is to get DH Box running on the Raspberry Pi. Once that’s done, DH Box will also be runnable on nearly any Linux computer! We are also targeting OS X.

Who do you think will use DH Box?

Anyone and everyone who is interested in learning Digital Humanities inquiry techniques, but especially those who may not have any prior programming experience. We hope that instructors will use our tools to set up almost instant DH labs, and that students will use DH Box to get an edge in their research.

We see DH Box as an example of what is likely to be a robust and interesting future field, ‘humanities hardware’.

Who are we?

We are an interdisciplinary team of learners and do-ers, librarians and developers and digital humanists and more — with an interest in making DH work more accessible. Find us:

dhbox.org
@DH_Box
hello@dhbox.org

More to come as we continue to develop DH Box!