David Mimno and fatty tuna

David Mimno made an important distinction between theory and practice when he pointed out that MALLET (or any DH tool) is a method, not a methodology. MALLET can uncover thematic patterns in massive digital collections, but it is up to the researcher using the tool to evaluate the results, pose new questions, and think of possible new uses for the tool. In our class discussion, Mimno compared the different roles in topic modeling to those on Iron Chef: he makes the knives (MALLET), librarians dump a lot of fatty tuna (the corpus of text) on the table, and the humanists are the chefs who need to make the meal (interpreting the results and drawing new conclusions from them).

As a librarian, I have never thought of myself as a provider of fatty tuna, but I get the general point. What role do librarians and other alt-academics play in DH? Can a librarian be a tool maker, a chef, a sous-chef, a waitress, or something else entirely? What does it mean to curate content and devise valuable ways to access that content? Is it scholarship? I am not sure I can answer those questions, but I do see many new ways to apply MALLET as a search and discovery tool that would be very useful for scholarship.

Can we do better than keyword search to find relevant information in huge collections of digital text? Would search terms created from the body of the text itself be more accurate than hand-coding with the very dated and narrow Library of Congress subject headings? The DH literature on topic modeling doesn’t have much on libraries, but I did find the following. Yale, U. Michigan, and UC Irvine received an Institute of Museum and Library Services grant to study "Improving Search and Discovery of Digital Resources Using Topic Modeling." See also an interesting D-Lib Magazine article on using topic modeling in HathiTrust, "A New Way to Find: Testing the Use of Clustering Topics in Digital Libraries."