Two-Minute Preview:
http://www.youtube.com/watch?v=Jpv5-Xw6ois&feature=youtu.be
Full Video:
Matthew Kirschenbaum spoke about his forthcoming book project, which was recently profiled in The New York Times.
Kirschenbaum’s research asks questions such as: When did writers begin using word processors? Who were the early adopters? How did the technology change their relationship to their craft? Was the computer just a better typewriter—faster, easier to use—or was it something more? And what will be the fate of today’s “manuscripts,” which take the form of electronic files in folders on hard drives, instead of papers in hard copy? This talk, drawn from the speaker’s forthcoming book on the subject, will provide some answers, and also address questions related to the challenges of conducting research at the intersection of literary and technological history.
Matthew G. Kirschenbaum is Associate Professor in the Department of English at the University of Maryland and Associate Director of the Maryland Institute for Technology in the Humanities (MITH, an applied thinktank for the digital humanities).
Film still from “Know Your Typewriter” courtesy of Prelinger Archives and the Internet Archive
In “Gibson’s Typewriter,” Scott Bukatman writes about the irony that William Gibson’s cyberpunk novel Neuromancer was composed on a manual typewriter. Distinguishing himself from the postmodernists who have declared the end of history, Bukatman argues that “The discourse surrounding (and containing) electronic technology is somewhat surprisingly prefigured by the earlier technodiscourse of the machine age.” To explore the “tropes that tie cyberculture to its historical forebears,” Bukatman says that he wants to reinstate the history of the typewriter “in order to type history back into Neuromancer.”
Typing history back into Neuromancer turns out to be quite a challenge because as Bukatman says, “The repression of the typewriter’s historical significance in the Neuromancer anecdote has its analogue in the annals of technological history. No serious academic investigation of the typewriter has been published, to my knowledge, and almost all curious writers seem to rely upon the same two texts: The Typewriter and the Men Who Made It (hmmm . . .) and, even better, The Wonderful Writing Machine (wow!), both highly positivist texts from the 1950s.”
I started out interested in and sympathetic to Bukatman’s aims. He is a gifted writer who skillfully pulls apart and teases out the meaning of the 1950s texts. I had the impression that I was reading the “truest” version of the history of the typewriter that was available at the time Bukatman was writing. I was curious, though, about other histories of the typewriter that might have been published after this piece was written.
After some research I was surprised to learn that there are quite a few histories of the typewriter, almost all of which were published well before Bukatman’s essay. See the Smithsonian’s bibliography of the typewriter and Google Books (related books links). A number of these were written by collectors or have illustrations targeted to collectors; but several are more serious, with Michael H. Adler’s The Writing Machine widely regarded as the most accurate. Despite Bukatman’s claim at the time of his writing that there weren’t academic books about the history of the typewriter, one of the two histories he cites, The Typewriter and the Men Who Made It, was written by a professor at Urbana, published by the University of Illinois Press, and reviewed in a journal of the Organization of American Historians. Another example from academia is George Nichols Engler’s dissertation, The Typewriter Industry: The Impact of a Significant Technological Revolution (1969).
Am I simply being pedantic by pointing this out? I don’t think so. Bukatman, Professor of Art and Art History at Stanford, declares that his task is “reinstating history,” and the recitation of that history comprises about a third of the essay. Calling the lack of authoritative histories a “repression” and claiming an analogy to the “repression” of the anecdote about Gibson’s manual typewriter in cyberculture is central to the structure of his argument. And, through his fluent analysis of the texts he has chosen, he seems to present himself as an authority who has culled the best that is available.
The feeling that I am left with as a reader is of being misled by the writer (however inadvertently), and that, to borrow Bukatman’s phrase, the “disappearance [of history] was little more than a trope of a postmodern text.”
King’s “Word Processor of the Gods” was an amusing read. It’s a bit dated (1983) but still captures what we all probably thought when our fingers first touched keyboards… that the word processor could be construed as a magic machine with the ability to delete or change our personal histories.
I suppose King, writing from the perspective of an author, was hypothesizing that with the arrival of the word processor, text became extremely pliable (a new movable type), and that this transformed the way people write and think (similar to Derrida’s pen vs. machine). I think that with each communication method we adopt, we adapt to the technology (be it a chisel, plume, chalk, pen, or tablet), and these tools in turn change the ways in which we interact.
On October 8, CUNY DHI and the Graduate Center Composition and Rhetoric Community (GCCRC) hosted a conversation about the intersection of writing studies and digital humanities with Doug Eyman and Collin Brooke. These two innovative scholars engaged in an important discussion concerning the future of digital rhetoric. Doug Eyman is a professor of digital rhetoric, technical and scientific communication, and professional writing at George Mason University and the senior editor of Kairos: A Journal of Rhetoric, Technology, and Pedagogy; Collin Brooke is a professor of Rhetoric and Writing at Syracuse University and the author of Lingua Fracta: Toward a Rhetoric of New Media.
When it comes to hacking and coding, one rolls up one’s sleeves to build models and prototypes that engage visually, open debate, and uncover new meanings. Theory as applied in methodologies leads us away from the mundane and toward bold ways of assessing existing humanist issues that are embedded in abundance in the big data of literature, history, and sociology. The work of the digital humanist asserts that what is regarded as traditional narrative might gain new meaning or insight through further research and closer inspection. The question “How does theory support the digital humanities?” is critical because theory compels consideration.
Drucker raises the notion of “creating computational protocols that are grounded in humanistic theory and methods” and “suggest[s] it is essential if we are to assert the cultural authority of the humanities in a world whose fundamental medium is digital” (3). The term “cultural authority” suggests epistemological knowledge that is central to creating new digital approaches to engage critical thinking. These new digital approaches would assist in revisiting unresolved concerns as well as in observing thought processes to determine outcomes around current critical issues, and in creating models with the digital humanist’s toolbox to reflect these findings. For instance, the digital humanist can explore myriad issues across the political and social landscapes worldwide and derive appropriate, useful outcomes. Prototypes then aid in assessing which digital tools best assist and inform this work.
Ramsay and Rockwell put forth the idea that “prototypes are theories” (4). These prototypes aid in the ability to create, to do, and to build, yet the “guidelines for evaluation of digital work” (3) may restrict the recognition of prototypes as scholarship. The argument can be made that such restriction could ultimately work against the investment of skill and time over the course of the digital humanist’s workflow. As Drucker noted, “more is at stake than just the technical problems of projection” (7). It is the potential of the prototype to assist the workflow and to support thoughtful response to humanist issues that matters. The efficient use of mechanisms to devise tools in the digital realm assists the user in multitasking and aids in the completion of data-rich and/or quantitative digital tasks. Theory, then, is a tool that aids the digital humanist’s work of building and creating.
When David Mimno came to class to discuss topic modeling and MALLET, he first showed an image of the Perseus Digital Library, referring to it as ‘19th century scholarship’. Now, Professor Mimno had a hand in the creation of that website, so I wouldn’t think he meant that as an insult. But he did go on to say that technology offers ‘more’ for the humanities than what the Perseus Project has done.
This made me wonder about the implicit criticism of ’19th century scholarship’ versus new computational humanities research. My understanding of the value of the humanities has everything to do with enrichment — that is, personal growth engendered by reading, understanding, and discussing the thoughts of other people exploring what it is to be human. Put another way: increasing wisdom through study. I accept that not everyone holds this view.
If we use MALLET to determine the difference in word use by male and female authors, we have certainly learned something about humanity. But it seems like a different project from the one I understand to be that of the humanities. Does the new, computational approach ‘engender personal growth’? I am ready to believe that it can, but not nearly as obviously as, say, studying Shakespeare’s Sonnets would. So far, the current approach seems to be more concerned with studying humans and human texts in a ‘scientific’, fact-oriented manner.
So that may be ‘21st century humanities scholarship’, as opposed to that of the 19th century. But it needn’t be ‘either/or’. We can use Digital Humanities tools and methods to enrich the experience of students who are reading humanistic texts, much in the way done by the Perseus Digital Library, for instance. We can, as my colleague Gioia Stevens points out, use topic modeling to improve discovery of digital texts, which would unquestionably help in the individual pursuit of self-improvement.
In Part One of this blog post, I wrote about scholars’ reliance on proprietary databases for research and the importance of understanding the constraints which database structures place on the outcomes of their efforts. Unfortunately, generally speaking, information about the structures of proprietary databases is not easily accessible. To remedy this, Caleb McDaniel has talked about the need to create an online resource to collate information about the construction of proprietary databases.
As an exploration of the structure of a proprietary database, I will look at one commercial database’s search and text analysis tools and touch on their handling of content. My goal is to demonstrate some of the complexity of these systems and to parse out the types of information that scholars would want to know and should consider sharing when writing up their research findings.
Artemis – Text mining lite
I recently attended a presentation about a commercial database company’s venture into what I call “text mining lite.” The company, Gale, has just started to offer text analysis and other tools that are squarely aimed at the field (and set of methods) of digital humanities. The tools are available through Artemis, an interface that allows searches across multiple collections of primary sources from the eighteenth century (ECCO) and the nineteenth century (NCCO). There is a separate Artemis platform for literary material with the same analytic features. By 2015, Gale humanities collections ranging from 19th Century U.S. Newspapers to the Declassified Documents Reference System, among many others, will migrate into Artemis. Artemis is available CUNY-wide.
Parameters of search
To access Artemis’s textual analysis capabilities, the user first sets the parameters for selecting materials. The options are extensive: date ranges, content type (e.g., manuscript, map, photograph), document type (e.g., manifesto, telegram, back matter), title, and source library. For example, one could search only letters from the Smith College archives, or manuscripts from the Library of Congress in particular years.
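To make the shape of such a parameterized search concrete, here is a minimal sketch in Python; the record fields, values, and the search function are invented for illustration and do not reflect Artemis’s internal data model or API.

```python
# Hypothetical records standing in for items in a digital collection (illustrative only).
records = [
    {"year": 1848, "content_type": "manuscript", "document_type": "letter",
     "source_library": "Smith College"},
    {"year": 1859, "content_type": "photograph", "document_type": "portrait",
     "source_library": "Library of Congress"},
]

def search(items, start_year=None, end_year=None, **filters):
    """Filter items by an optional date range and any exact-match fields."""
    hits = []
    for item in items:
        if start_year is not None and item["year"] < start_year:
            continue
        if end_year is not None and item["year"] > end_year:
            continue
        if all(item.get(field) == value for field, value in filters.items()):
            hits.append(item)
    return hits

# "Only letters from the Smith College archives" within a particular date range:
print(search(records, start_year=1840, end_year=1850,
             document_type="letter", source_library="Smith College"))
```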
Context
Discussing the use of Google’s Ngram to find themes in large bodies of texts, Matt Jockers advises caution, “When it comes to drawing semantic meaning from a word, we require more than a count of that word’s occurrence in the corpus. A word’s meaning is derived through context” (120). In his CUNY DHI and Digital Praxis Seminar lecture, David Mimno addressed the necessity of understanding the context of words in large corpora saying, “We simply cannot trust that those words that we are counting mean what we think they mean. That’s the fundamental problem.”
One way that Artemis deals with this is by offering a view into the context of the documents in search results. For each result, clicking on “Keywords in Context” brings up a window showing the words surrounding the keyword in the actual (digital facsimile) document. This makes it relatively simple to identify whether a document is actually relevant to your research, as long as the number of documents being examined is not too large.
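The general idea behind a keyword-in-context display can be sketched in a few lines of Python; the tokenization and window size below are my own illustrative assumptions, not a description of how Artemis actually builds its view.

```python
import re

def keyword_in_context(text, keyword, window=5):
    """Return each occurrence of keyword with `window` words of context on either side."""
    tokens = re.findall(r"\w+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append(f"{left} [{token}] {right}")
    return hits

# Quickly check how a keyword is used before deciding whether a document is relevant.
print(keyword_in_context("The price of liberty is eternal vigilance.", "liberty", window=3))
```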
Refining results
In addition to the flexible search categories that Artemis allows, it is also possible to enter proximity operators to find co-located words. This means that, in many situations, it will be possible to further refine results through iterative searching and locate smaller batches of relevant documents on which to run the text analysis tools.
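Proximity searching of this kind can be approximated outside the platform as well. The sketch below checks whether two words occur within a given number of tokens of each other; it only roughly mimics what a database’s proximity operators do.

```python
import re

def within_n_words(text, word1, word2, n=5):
    """True if word1 and word2 occur within n tokens of each other, in either order."""
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    positions1 = [i for i, t in enumerate(tokens) if t == word1.lower()]
    positions2 = [i for i, t in enumerate(tokens) if t == word2.lower()]
    return any(abs(i - j) <= n for i in positions1 for j in positions2)

# Keep only the documents in which "suffrage" appears near "petition".
docs = ["A petition for women's suffrage was filed.", "Suffrage was debated at length."]
relevant = [d for d in docs if within_n_words(d, "suffrage", "petition", n=4)]
print(relevant)
```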
Ngram viewer
Artemis features a visualization tool that offers some improvements over Google’s Ngram Viewer for showing the frequency of terms over time. The term frequency ngram is created from the search results. Clicking and dragging on the term frequency graph modifies the date range, and the graph can zoom down to the one-year level. It is also possible to retrieve a particular document by clicking on a point in the graph. In addition to raw frequency, the visualization displays term popularity, the percentage of each year’s total documents that contain the term; this normalizes the document counts against the amount of content in the collection for that year.
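The difference between raw term frequency and term popularity can be illustrated with a toy calculation; the numbers and record layout below are invented, and the normalization shown (matching documents divided by the total documents for that year) is my reading of how such a measure is typically computed.

```python
from collections import Counter

# Each tuple stands in for one search result: (publication year, full text).
results = [
    (1851, "the telegraph reached the coast"),
    (1851, "a new railway opened"),
    (1852, "the telegraph office burned"),
]
# Total documents available per year in the collection (illustrative numbers).
totals_per_year = {1851: 200, 1852: 150}

term = "telegraph"
matching = Counter(year for year, text in results if term in text.lower())

for year in sorted(totals_per_year):
    frequency = matching[year]                               # raw count of matching documents
    popularity = 100.0 * frequency / totals_per_year[year]   # percent of that year's documents
    print(year, frequency, round(popularity, 2))
```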
Term clusters visualization
For larger sets of documents, or to look at entire collections, researchers might want to use term clusters. Term clusters use algorithms to group words and phrases that occur a statistically relevant number of times within the search results.
The visualization of term clusters is based on the first 100 words of the first 100 search results per content type. This means that the algorithm runs only within, for example, the first one hundred words of the first one hundred monographs, the first one hundred words of the first one hundred manuscripts, and the first one hundred words of the first one hundred newspaper articles. The size limitations exist because the text analysis tools are bandwidth intensive: searches of larger numbers of documents take longer to return results and also slow down the system for other users. By clicking on the clusters, it is possible to drill down into the search results to the level of individual documents and their metadata.
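As a rough illustration of this kind of grouping (not Gale’s actual algorithm, which is not documented), the sketch below truncates each of the first 100 documents to its first 100 words, keeps terms that appear in a minimum number of documents, and then groups terms that tend to show up in the same documents.

```python
import re
from collections import defaultdict

def term_clusters(documents, max_docs=100, max_words=100, min_docs=3, min_overlap=2):
    """Group frequent terms by document co-occurrence (illustrative only)."""
    docs_for_term = defaultdict(set)
    for doc_id, text in enumerate(documents[:max_docs]):
        words = re.findall(r"\w+", text.lower())[:max_words]
        for word in set(words):
            docs_for_term[word].add(doc_id)

    frequent = [t for t, d in docs_for_term.items() if len(d) >= min_docs]

    clusters = []
    for term in sorted(frequent):
        for cluster in clusters:
            # Attach the term to an existing cluster if it co-occurs often enough with a member.
            if any(len(docs_for_term[term] & docs_for_term[other]) >= min_overlap
                   for other in cluster):
                cluster.add(term)
                break
        else:
            clusters.append({term})
    return clusters
```

Each resulting cluster could then serve, much as Artemis’s clusters do, as an entry point for drilling back down to the underlying documents.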
Legibility of documents
Scholars should have an understanding of the process by which database publishers have transformed documents into digital objects, because it affects the accuracy of searches and text analysis. In Gale’s collections, printed materials are OCR’d. For nonprint materials, such as manuscripts, ephemera, and photograph captions, metadata such as names, places, and dates is entered by hand. By providing improved metadata for nonprint materials, Gale has increased the discoverability of these types of documents. This is particularly important for those studying women and marginalized groups, whose records are more likely to be found in ephemeral materials.
Collection descriptions
Understanding the types of materials contained within a proprietary database can be difficult. The Eighteenth Century Collections Online (ECCO) is based on the English Short Title Catalogue from the British Library and is familiar to many scholars of the eighteenth century. The Nineteenth Century Collections Online (NCCO) is a newer grouping of collections that is being continually updated. To see a detailed description of the collections in NCCO, go to the NCCO standalone database, not the Artemis platform, and click Explore Collections.
Data for research
Generally, scholars can download PDFs of documents from Artemis only one document at a time (up to 50 pages per download). When I asked about access to large amounts of data for use by digital humanists, the Gale representative said that while their databases are not built to be looked at on a machine level (because of the aforementioned bandwidth issues), Gale is beginning to provide data separately to scholars. They have a pilot program to provide datasets to Davidson College and the British Library, among others. Gale is also looking into setting up a new capability to share data that would be based outside their current system. The impression that I got was that they would be receptive to scholars who are interested in obtaining large amounts of data for research.
Bonus tip: direct (public) link to documents
Even though it doesn’t have anything to do with standards for presenting scholarship, I thought people might want to know about this handy feature. Artemis users can bookmark search results and save the URL for future reference. The link to the document(s) can then be shared with anyone, even those without logins to the database. To be clear, anyone who clicks on the link is taken directly to the document(s), although they won’t be able to extend the search. This makes it easy to share documents with students and through social media.
In this post, I have sought to shed some light on the usually opaque construction of proprietary databases. If people start “playing” with Artemis’ text mining lite capabilities, I would be interested in hearing about their perceptions of its usefulness for research.
Works cited
Jockers, Matthew L. “Theme.” Macroanalysis: Digital Methods and Literary History. Urbana: University of Illinois Press, 2013. Print.
It is striking that discussions of theory in DH seem primarily focused on how DH projects themselves provide theory rather than actually theorizing about the nature of DH as an academic discipline. The latter ostensibly belongs more appropriately to a discussion on defining DH (as we have discussed in week 2), but I find it productive and relevant to discuss here. Looking first at what Ramsay and Rockwell refer to as “thing theory” then noting the importance of interactivity in digital scholarship, I will attempt to broadly approach these two issues—i.e., locating theory in DH and literally defining a theory of DH—to substantiate DH as a theoretical undertaking but more importantly to illustrate how DH is unique from traditional humanities.
Regarding the hack vs. yack debate, it seems clear that even the strongest proponents of methodology over theory would agree that there is no strict dichotomy between the two. As Natalia Cecire notes, “the two are not antithetical” (56). In fact, hack and yack share essential qualities, namely the overall goal of humanistic inquiry. The only apparent differences involve the tools and media utilized. But throughout history humans have used a variety of tools and media to externalize thought (from Paleolithic cave paintings to film and new media). Simply put, humanities scholarship has long suffered from the tyranny of oral and written discourse as its primary media. DH utilizes digital tools as its media to externalize thought and humanistic inquiry. The digital product itself possesses (or should possess) the essential qualities of a written piece of scholarship, i.e., theory (notably, theory with a lower case “t”).
Ramsay and Rockwell refer to this as thing theory: “Prototypes are theories, which is to say they already contain or somehow embody that type of discourse that is most valued—namely, the theoretical” (3), and later more pointedly claim, “To ask whether coding is a scholarly act is like asking whether writing is a scholarly act” (8). I would perhaps add that coding itself is a form of writing, just as, for instance, filmmaking or other media creation are forms of writing, insofar as a communicable textual entity is created. As Drucker notes, such forms of scholarship involve “an analysis of ways representational systems produce a spoken subject” (8).
Can a film not act as a form of scholarship? Interestingly, tenure-track faculty in film production departments (though not necessarily a humanities discipline) are assessed purely on their body of film work. And it seems equally valid for a traditional humanist to produce a provocative film in lieu of a formal essay. Additionally, an inherent rule in filmmaking (though often broken) involves concealing the process (the tools). Regarding digital scholarship, Patrick Murray-John notes, “A good user interface is designed specifically so that you don’t have to deal with the inner workings of the application” (76). This is becoming a bit of a digression, but I would at least like to pose the question: is this an important rule for DH?
Equally striking, however, is Gary Hall’s take in “There are No Digital Humanities.” Hall questions the computational turn in humanities as a movement, stressing the notion that it appears to be a reverse of Lyotard’s Postmodern Condition, allowing science and quantitative information to dominate the humanities. This is an important point that deserves deeper investigation, particularly as DH continually evolves.
Ben Schmidt’s thesis is particularly useful here: “The answer, I am convinced, is that we should have prior beliefs about the ways the world is structured, and only ever use digital methods to try to create works which let us watch those structures in operation” (61). The individual subject, the human, is key in interpreting even the most empirical humanistic inquiry. Furthermore, DH fundamentally advocates open-access and, more importantly, interactivity. The ability of the user (scholar or non-scholar) to experience a DH work and interpolate his/her experiences and thoughts seems to allow DH to evade a reversal of postmodernism. Whether via data visualization, topic models, or simply blogs and open-access texts, which allow peer review/critique and interactivity with the text, the foundation of DH as a discipline appears firmly rooted in subjective humanistic inquiry in a manner that is unique and potentially more effective than traditional scholarship.
In this sense, DH can and should innately contain both theory (generally speaking) as well as a theory of itself, i.e., promoting subjective interactivity with relatively objective knowledge.
David Mimno made an important distinction about theory vs. practice when he pointed out that MALLET (or any DH tool) is a method, not a methodology. MALLET can uncover thematic patterns in massive digital collections, but it is up to the researcher using the tool to evaluate the results, pose new questions, and think of possible new uses for the tool. In our class discussion, Mimno compared different roles in topic modeling to Iron Chef: he makes the knives (MALLET), librarians dump a lot of fatty tuna (the corpus of text) on the table, and the humanists are the chefs who need to make the meal (interpreting and drawing new conclusions from the results).
As a librarian, I have never thought of myself as a provider of fatty tuna, but I get the general point. What role do librarians and other alt-academics play in DH? Can a librarian be a tool maker, a chef, a sous-chef, a waitress, or something else entirely? What does it mean to curate content and devise valuable ways to access that content? Is it scholarship? I am not sure if I can answer that question, but I do see many new ways to apply MALLET as a search and discovery tool which would be very useful for scholarship.
Can we do better than keyword search to find relevant information in huge collections of digital text? Would search terms created from the body of the text itself be more accurate than hand-coding with the very dated and narrow Library of Congress subject headings? The DH literature on topic modeling doesn’t have much on libraries, but I did find the following information. Yale, the University of Michigan, and UC Irvine received an Institute of Museum and Library Services grant to study “Improving Search and Discovery of Digital Resources Using Topic Modeling.” See also an interesting D-Lib Magazine article on using topic modeling in HathiTrust, “A New Way to Find: Testing the Use of Clustering Topics in Digital Libraries.”
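As a minimal sketch of what topic-model-based discovery might look like, the example below uses gensim’s LDA implementation rather than MALLET itself; the toy corpus and topic count are placeholders, not a proposed library workflow.

```python
from gensim import corpora, models

# Toy "library" of documents; in practice these would be full catalog records or texts.
documents = [
    "women suffrage petition congress vote",
    "railroad telegraph expansion western territory",
    "suffrage movement convention rights women",
    "locomotive railway freight telegraph line",
]
texts = [doc.split() for doc in documents]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# Each learned topic is a weighted list of words drawn from the corpus itself, which could
# serve as machine-generated "subject headings" for search and discovery.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10, random_state=1)
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)

# A new query can be mapped onto the same topics to retrieve thematically similar documents.
query_bow = dictionary.doc2bow("women vote rights".split())
print(lda.get_document_topics(query_bow))
```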