Tag Archives: MALLET

19th Century Scholarship For the 21st Century

When David Mimno came to class to discuss topic modeling and MALLET, he first showed an image of the Perseus Digital Library, referring to it as ‘19th-century scholarship’. Now, Professor Mimno had a hand in the creation of that website, so I wouldn’t think he meant that as an insult. But he did go on to say that technology offers ‘more’ for the humanities than what the Perseus Project has done.

This made me wonder about the implicit criticism of ‘19th-century scholarship’ versus new computational humanities research. My understanding of the value of the humanities has everything to do with enrichment: that is, personal growth engendered by reading, understanding, and discussing the thoughts of other people exploring what it is to be human. Put another way: increasing wisdom through study. I accept that not everyone holds this view.

If we use MALLET to determine the difference in word use by male and female authors, we have certainly learned something about humanity. But it seems like a different project from the one I understand to be that of the humanities. Does the new, computational approach ‘engender personal growth’? I am ready to believe that it can, but not nearly as obviously as, say, studying Shakespeare’s Sonnets would. So far, the current approach seems to be more concerned with studying humans and human texts in a ‘scientific’, fact-oriented manner.

So that may be ‘21st-century humanities scholarship’, as opposed to that of the 19th century. But it needn’t be ‘either/or’. We can use Digital Humanities tools and methods to enrich the experience of students who are reading humanistic texts, much as the Perseus Digital Library does, for instance. We can, as my colleague Gioia Stevens points out, use topic modeling to improve discovery of digital texts, which would unquestionably help in the individual pursuit of self-improvement.

David Mimno and fatty tuna

David Mimno made an important distinction about theory vs. practice when he pointed out that MALLET (or any DH tool) is a method, not a methodology. MALLET can uncover thematic patterns in massive digital collections, but it is up to the researcher using the tool to evaluate the results, pose new questions, and think of possible new uses for the tool. In our class discussion, Mimno compared different roles in topic modeling to Iron Chef: he makes the knives (MALLET), librarians dump a lot of fatty tuna (the corpus of text) on the table, and the humanists are the chefs who need to make the meal (interpreting and drawing new conclusions from the results).

As a librarian, I have never thought of myself as a provider of fatty tuna, but I get the general point. What role do librarians and other alt-academics play in DH? Can a librarian be a tool maker, a chef, a sous-chef, a waitress, or something else entirely? What does it mean to curate content and devise valuable ways to access that content? Is it scholarship? I am not sure whether I can answer that question, but I do see many new ways to apply MALLET as a search and discovery tool which would be very useful for scholarship.

Can we do better than keyword search to find relevant information in huge collections of digital text? Would search terms created from the body of the text itself be more accurate than hand-coding using the very dated and narrow Library of Congress subject headings? The DH literature on topic modeling doesn’t have much on libraries, but I did find the following. Yale, U. Michigan, and UC Irvine received an Institute of Museum and Library Services grant to study ‘Improving Search and Discovery of Digital Resources Using Topic Modeling’. See also an interesting D-Lib Magazine article on using topic modeling in HathiTrust, ‘A New Way to Find: Testing the Use of Clustering Topics in Digital Libraries’.