Category Archives: Fall 2013

Posts done in Fall 2013

Easy Access to Data for Text Mining

Prospect Workflow

Will 2014 be the year that you take a huge volume of texts and run them through an algorithm to detect their themes? Because significant hurdles to humanists’ ability to analyze large volumes of text have been or are being overcome, this might very well be the year that text mining takes off in the digital humanities. The ruling in the Google Books federal lawsuit that text mining is fair use has removed many concerns about copyright that had been an almost insurmountable barrier to obtaining data. Another sticking point has been the question of where to get the data. Until recently, unless researchers digitized the documents themselves, the options for humanities scholars were mostly JSTOR’s Data for Research, Wikipedia and pre-1923 texts from Google Books and HathiTrust. If you had other ideas, you were out of luck. But within the next few months there will be a broader array of full-text data available from subscription and open access databases.

CrossRef, the organization that manages Digital Object Identifiers (DOIs) for database publishers, has a pilot text mining program, Prospect, that has been in beta since July 2013 and will launch early this year. There is no fee for researchers who already have subscription access to the databases. To use the system, researchers with ORCID identifiers log in to Prospect and receive an API token (alphanumeric string). For access to subscription databases, Prospect displays publishers’ licenses that researchers can sign with a click. After agreeing to the terms, they receive a full-text link. The publisher’s API verifies the token, license, and subscription access and returns full-text data subject to rate limiting (e.g. 1500 requests per hour).
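In practice, the token handshake described above might look something like the sketch below. This is a hypothetical illustration only: the endpoint URL and header name are placeholders I have invented, not CrossRef's or any publisher's actual values, which would come from Prospect's documentation.

```python
# Hypothetical sketch of the Prospect token workflow described above.
# The endpoint URL and header name are placeholders for illustration;
# real values come from Prospect and each publisher's documentation.

def build_fulltext_request(doi, api_token):
    """Build the URL and headers for a full-text request authorized
    by a Prospect-issued API token."""
    url = "https://api.example-publisher.org/fulltext/" + doi
    headers = {
        # Token received after click-signing the publisher's license
        "CR-TDM-Client-Token": api_token,
        "Accept": "application/xml",
    }
    return url, headers


def seconds_until_retry(requests_this_hour, limit=1500, window=3600):
    """Respect the publisher's rate limit (e.g. 1500 requests per hour):
    return 0 if another request is allowed now, else how long to wait."""
    return 0 if requests_this_hour < limit else window
```

The publisher's API would verify the token against the signed license and the researcher's subscription before returning the full text.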

Herbert Van de Sompel and Martin Klein, information scientists who participated in the Prospect pilot, say “The API is really straightforward and based on common technical approaches; it can be easily integrated in a broader workflow. In our case, we have a work bench that monitors newly published papers, obtains their XML version via the API, extracts all HTTP URIs, and then crawls and archives the referenced content.”
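The URI-extraction step of a workbench like the one Van de Sompel and Klein describe can be sketched in a few lines of Python. This is my own illustration of the idea, not their actual code:

```python
import re
import xml.etree.ElementTree as ET

URI_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_http_uris(xml_text):
    """Collect every HTTP(S) URI appearing in a paper's XML full text,
    whether in element text, tail text, or attribute values."""
    uris = set()
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        for chunk in (elem.text, elem.tail):
            if chunk:
                uris.update(URI_PATTERN.findall(chunk))
        for value in elem.attrib.values():
            uris.update(URI_PATTERN.findall(value))
    return sorted(uris)  # deduplicated, in stable order for crawling
```

For example, `extract_http_uris('<ref href="http://example.org/data">see http://example.org/page</ref>')` returns both URIs; each would then be handed to a crawler for archiving.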

The advantage for publishers is that providing an API may stop people from scraping the same URLs that others use to access individual documents. And publishers won’t have to negotiate permissions with many individual researchers. Although a 2011 study found that publishers approached by scholars with requests for large amounts of data to mine are inclined to agree, it remains to be seen how many publishers will sign up for the optional service and what the license terms will be. Interestingly, the oft-maligned Elsevier is leading the pack, having made its API accessible to researchers during the pilot phase. Springer, Wiley, HighWire and the American Physical Society are also involved.

Details about accessing the API are on the pilot support site and in this video. CrossRef contacts are Kirsty Meddings, product manager, and Geoffrey Bilder, Director of Strategic Initiatives.


Redefining DH

The first semester of the Digital Praxis Seminar was an inspiring invitation into the new age of scholarship. The lecture series set a compelling foundation for engaging the Digital Humanities, and opened a portal to possibility. The seminar led me to imagine how I could elevate my own scholarship in the midst of today’s Information Revolution. It challenged me to consider ways to overcome traditional text-based modes of humanities scholarship, and conceive of new mediums to give scholarship greater relevance and influence in mainstream society.
At the close of the first semester, I find myself reflecting heavily on the Digital Humanities. At the beginning of the semester when we were asked to define Digital Humanities, I had trouble coming up with a definition. As I attempt to redefine the field now, one word comes to mind: Possibility. The Digital Humanities is all about possibility. It’s about the possibility that comes from collaboration, creativity, problem solving, technology, scholarship, and innovation.
I am very excited to begin working on projects after the break! I look forward to seeing you all. All the best for a happy and healthy new year!

Resources for Film Studies Projects

As I know there are at least a couple of other film studies people here, and hopefully others are interested as well, below is a non-exhaustive list of possible tools and resources for film analysis. One note I would like to add up front: I think these tools are productive for stimulating both analytical and creative abilities, the latter of which is often lacking in traditional humanities scholarship and pedagogy.

  • Digital Storytelling & Animated GIFs – digital storytelling seems to be growing in undergraduate and K-12 curricula. This could be a great tool for humanities-based coursework, as it allows students to think differently about how stories and films are constructed. Recording and editing tools are now inexpensive and fairly ubiquitous, and platforms like YouTube can easily publicize a student’s work. Animated GIFs may perform a similar function. Matt pointed me to Jim Groom’s blog, which is very interesting.
  • ClipNotes for iPad – this is a very cool app for doing film studies, though at the moment, it is extremely difficult to share one’s work, and use is obviously limited to iPad owners.
  • Visualization – earlier in the semester we looked at Brendan Dawes’ “Cinema Redux” project, which is perhaps the best example, though varying approaches to visualizing films are possible.
  • Cinemetrics – this is a great tool for doing film measurement analysis. The website contains detailed information, a database, and some written scholarship on the topic.
  • Max 6 – we used Max with Phidgets during Bill Turkel’s workshop earlier in the semester. Max contains several free tutorials on working with video clips in the program. There are some very cool possibilities.

I hope everyone has a nice break!

Race, Surveillance and Technology

The lecture on race, surveillance and technology captured my attention in a way that no other lecture this semester had. I very much appreciated Ms. Simone Browne’s candid approach to this very difficult subject and the compelling discussions that followed, as well as the conversation with Zach Blas about protecting privacy during the informative workshop afterward.

The abhorrent history of the branding of slaves, on both eastern and western shores, provides a reference point for understanding corporal punishment and the mass categorization of human beings as “other” as a societal norm. Given this background, and so long as this notion of “otherness” persists, it is concerning that biometric technology used as a surveillance tool could become a great detriment to society, especially since anyone can access this technology. On the upside, the government uses biometric technology as a protective measure against terrorists at the country’s ports of entry. However, as was noted during the lecture, the private sector can also collect data by capturing fingerprints, so long as the public acquiesces to finger scans. What can this mean for those who are able to exploit these private treasure troves? Will technologies such as these affect future generations in adverse ways? Should we assume such data collections will always be used to aid the human condition and not harm it? As the public becomes better informed, will concerns around privacy once again explode onto the national scene? Shouldn’t they?

Researcher Beware!

Genevieve asked a great question Monday night: How does one verify the data reflected in mapping platforms?

In his answer, Steve Romalewski stressed that when examining any kind of data, critical thinking is essential. First, look at the metadata to see when the content was updated, look to see where the information came from, who gathered it, what it includes and, by extension, excludes.

Both Genevieve’s question and Steve’s answer underscore the importance of critical thinking and content transparency in the myriad digital tools we use every day for research. There is often a false sense of security when searching online platforms that the content will be there and that it will be true. And if it’s not there, then it must not exist at all.

To some extent, it might not exist, online anyway. Take, for example, an online full-text historical newspaper archive. While the platform may advertise a specific title as being available in full text, it doesn’t necessarily tell you that only select issues are included. The hapless researcher, plugging in keywords and getting nowhere, might not be aware of that gap in coverage, and so gets…nothing. Had she known the inclusion dates of that digital archive, she would have realized that while her online search might yield very little, a spin through a physical microfilm reel might prove enormously fruitful, albeit a lot more time consuming.

As we increasingly rely on digital tools for research, sometimes to the exclusion of other resources, we must always be aware of the ways the resources are structured and the content they provide. With that knowledge, we’ll have much more manageable expectations of what can be found, how best to approach it for research, or whether someone is better off consulting another full text portal: the physical book.

DH Thesis

This is a message from my friend Anderson who was handing out a paper about his thesis during our last class:

Well this is embarrassing…

Everyone who attempted to access my app, I really appreciate it, but Amy Wolfe was kind enough to let me know that I had incorrectly transcribed the URL.

The correct URL for the site is: or

I hope you all will check it out.

Thanks again, Anderson Evans e-mail: twitter: @Anderson_Evans

The Gentle Introduction Resource

The G.I.R. is a Rails-based web app that aims to gather a curated collection of crowdsourced academic resources. The app is maintained by Anderson Evans as the core of his thesis for the MALS degree in Digital Humanities at the CUNY Graduate Center.

Message from Steve Romalewski

Steve and Matt,

I hope yesterday’s presentation was helpful.  The students had some good questions.  As I mentioned, I’d be glad to follow up with them individually if they have more specific questions or want to discuss options.

Btw, I was reminded today that cartoDB has started to offer online tutorials for beginners.  More info here:  The first session already took place, but they’ll have others and the material will be archived at that link.  Please pass along to your students if you think it’d be helpful.



Steven Romalewski, Director,
CUNY Mapping Service
Center for Urban Research at The Graduate Center / CUNY

Mapping Movies

Steve Romalewski offered us a broad overview of the many tools one can utilize for mapping projects. It is astounding to consider the sophistication of programs like ArcGIS and QGIS when, as Steve noted, the majority of their functionality is never even used, and wonderfully complex, insightful maps are created nonetheless. Equally astounding, however, are more recent, smaller-scale tools such as ESRI’s Storymaps. While ArcGIS and QGIS are powerful, a humanist may find one of these smaller-scale mapping tools more appropriate for his or her work. Intuitive and easily navigable, such tools can be remarkably effective for geo-plotting humanistic data. Since my background is in film studies, I am particularly interested in thinking of ways to map movie data.

Despite an abundance of work and theory developed around literary mapping (particularly the work of Franco Moretti), there seem to be relatively few attempts to synthesize cinema and maps. Of note, however, is Stephen Mamber’s digital work, as well as his 2003 essay, “Narrative Mapping,” which outlines potential approaches for mapping narrative films. Also notable is Jeffrey Klenotic’s current project, “Mapping Movies.” Narrative mappings of a film may be interesting, particularly when multiple settings occur and the geography itself has contextual meaning, but Klenotic’s project shows that other forms of mapping cinema are possible. Though unfinished at the moment, the project intends to map film exhibitions from a historical perspective in order to gain social and cultural knowledge about the movie-going population in certain locations at certain moments in time. Another conceivable approach could involve mapping production locations, whether for historical research on the business itself or simply to investigate how production locations contrast with their fictional counterparts. Likewise, mapping a particular film author’s work (either by production location or fictional setting) might offer insight attainable only through geographical visualization. Suffice it to say, the potential is vast.

ESRI’s Storymaps, though seemingly unsophisticated and geared toward a consumer base, may in fact offer the greatest potential for mapping movies. If people haven’t tried the quick, easy, and fun tutorial, I would highly recommend it. The “map tour” template (and probably other templates as well) allows one to import web images and video (via Flickr, YouTube, etc.). This is great for geo-tagging photos from a road trip, but it could be equally valuable for a scholarly, narrative mapping project. Historical documents, manuscripts, etc. can be compiled, converted to image files, posted to a site like Flickr, and then very easily mapped in Storymaps. For film study, one could rip a DVD using a simple, free tool like Handbrake, break down scenes according to setting (using QuickTime Player or simple editing software like iMovie), post each scene as a separate video to YouTube, then embed the URL in a pin in Storymaps (based, of course, on the geographic location in which the scene is set). Each video clip is then viewable in a sidebar, similar to National Geographic’s “Geostories.” One can, therefore, watch an entire film while simultaneously tracing the narrative geographically.

This process may seem a bit convoluted, but it is actually quite simple, and it offers a new way of looking at a particular film, or any story.

Consensus isn’t what collaboration is about

Consensus isn’t what collaboration is about. This take-home point from Tom Scheinfeldt stuck with me, and I found myself repeating it out loud to a group of people at work. It is an important point, and the lack of it bogged our web project down in the planning stages. The struggles of those early stages (lack of consensus and leadership on what the website should be, its goals, its tone, its audience, its measure of success, etc.) are still visible to a visitor who spends a little time on the site. The project lacked a visionary who knew and believed in what the project should be.

Instead of seeking consensus from the whole group, Scheinfeldt argued, if a positive outcome is accomplished, even one led by just a few members, that is what the collective will remember: their achievement as a group. But for this achievement to happen, someone must assess the best possible outcome and make a decision. My question at that moment was: how do you, as a leader, decide on something? How can you be confident that it will work? Maybe you don’t until you take the chance? Is the definition of a leader someone who confidently takes a chance on something?

Another probably obvious point that didn’t always seem clear to me is that the measure of success should be based on whether people use the project. Despite the institutional agendas, ideologies, and high standards that motivate projects, if there isn’t a practical utility that serves people, a project is not successful. What number of users counts as success, though, requires some thought. A project that caters to a specialized audience (a particular topic in art history, say) needs to determine the size of the limited audience it should strive toward. There is pressure to serve a general audience and rack up many visits per day, but the content of the site cannot appeal to a general audience. What is needed is a specialized and devoted following that values the content of the site; the size of that audience is yet to be determined.

There was much golden advice that Scheinfeldt offered during his talk and discussion. Here are a few more points that I will keep with me in the future, wherever I end up working as a professional project manager:

  • value constraints (think about the seven-day turnaround)
  • make time for social interaction (metacognition)
  • assess people’s skills: determine the types of skills the group has and the types that need to be acquired
  • think of a set of critical questions; always ask what’s missing, and keep the overall picture in mind
  • list the criteria for the measure of success
  • divide the group to execute different tasks
  • watch out for thoughtless moves; there is a risk of losing members’ urgency, respect, trust, etc.
  • create process documentation; this could be part of the outreach (tweeting out progress or reporting on a blog)

It was an incredibly valuable session, and the text by Sharon M. Leon also provided practical tips on project management. I would be curious to hear about the workshop that evening, which I couldn’t make. Classmates: if any of you made it to the workshop, please share your take-home points!