Letter to a Future Post-Bac

Dear Future Post-Bac,

I’m going to start by addressing the elephant in the room: this is a weird job. How in the hell are you going to explain what this job is to your uncle when you’re trying to make small talk at Christmas? What are you going to tell your friends who have all moved to NYC and DC and LA to be lawyers and med students and organizers? More importantly, how are you going to talk about this position on a resume? In a cover letter? In an interview? 

Let me give you a few answers that I hope will orient you in this new DH space.

The answer to the first question (what will you tell your uncle?) changed over time for me. As I became more immersed in DH theory, I got better at describing it. Here’s the easiest answer: “I work in the library, and I research the digital technology faculty members use in their classrooms.” This is sure to bore any relative to death, and they’ll quickly change the subject.

Is that really all you’re going to be doing, though? Absolutely not. Here are some of the things that I did that weren’t researching classroom technology over the 18 months that I worked here:

Continue reading “Letter to a Future Post-Bac”

Analysis of a Failed Project

Throughout the last year, I have undertaken a series of small, independent projects in order to learn more about digital humanities tools and techniques.

  • Designing a WordPress site
  • Designing a Scalar site
  • Designing an Omeka site
  • Designing a Neatline map
  • Designing a StoryMap JS
  • Designing a Timeline JS for the Five Colleges of Ohio
  • Testing out Voyant with Jude the Obscure and some other random texts
  • Testing out my ability to use R to create a Structured Topic Model of Jude the Obscure (whatever that means)
  • Testing out the Topic Modeling GUI with Jude the Obscure
  • Setting up an OHMS account and learning how to use it
  • Creating an alternative version of OHMS on WordPress
  • Creating an Illustrated Guide to Doing Oral History
  • Designing a Scalar infographic
  • Creating a list of DH undergraduate majors and minors
  • Maintaining a list of everything I’ve read relating to digital scholarship
  • Writing and presenting an opinion piece at the Bucknell University Digital Scholarship Conference
  • Learning the very basics of HTML, CSS, and jQuery
  • Researching Blockchain and Bitcoin to understand their role in digital society
  • Recording and editing videos about some of the projects undertaken on the Five College’s campuses
  • Creating an interactive version of a text using jQuery on Scalar
  • Designing and executing a research project about free speech on the five college campuses

Naturally, some of these independent projects failed. They can’t all be winners. If you can’t tell, they’re the bold, italic ones. I’m not going to talk about the Scalar one today. That one has a pretty simple reason for failing — if you put your own JavaScript or jQuery in Scalar, it’s going to break your whole site. I know because I destroyed mine twice.

Instead, I’m going to talk about that free speech research project. What ever happened to that? Why did that go out the window?

A short reminder for those of you who don’t know: I designed a research project in which I would interview students around different college campuses about their opinions on free speech, and then I would transcribe those interviews, analyze them using a topic model, and create some data visualizations from these results. The goal was to determine how students from different political backgrounds talked about free speech on college campuses. What went awry?

The short answer is that I simply didn’t have enough time to work on it once I started helping faculty members with their projects. I put this project on the back burner, and by the time I got back around to it, I didn’t have all of the skills I needed to do what I wanted with the data.

The long answer is as follows:

Continue reading “Analysis of a Failed Project”

Redefining Difficult: A Post-Bac’s Perspective on the Accessibility of Digital Scholarship Practices for Recent Grads

[The following is the text (and some fun gifs) from a presentation I gave last week at the Bucknell University Digital Scholarship Conference. Thank you to Bucknell for hosting such a fun, student-centered conference, the Mellon Foundation for providing me with funding to attend the conference (and my entire job), and Ben Daigle for helping me hone my message and feel confident in my presentation!]

I want to start by telling a story from when I was an undergrad. I was in a Literary Theory class and we were reading Lacan, not secondary explanations of Lacan. Actual Lacan. For those of you who haven’t read it, here are a couple of stereotypical sentences from one of his works, The Agency of the Letter in the Unconscious.

“And we will fail to pursue the question further as long as we cling to the illusion that the signifier answers to the function of representing the signified, or better, that the signifier has to answer for its existence in the name of any signification whatever.”

“It is not only with the idea of silencing the nominalist debate with a low blow that I use this example, but rather to show how in fact the signifier enters the signified, namely, in a form which, not being immaterial, raises the question of its place in reality. For the blinking gaze of a short-sighted person might be justified in wondering whether this was indeed the signifier as he peered closely at the little enamel signs that bore it, a signifier whose signified would in this case receive its final honours from the double and solemn procession from the upper nave.”

So my homework was to read 22 pages of that. I realize that I took both of these out of context, but believe me, it’s hard to understand regardless. I showed up to class early and asked my friends: did you understand this? Did you get what he was saying? None of them had, so I felt a little better. Still, we frantically Googled anything about the chapter that we could find so that it didn’t seem like we hadn’t read it at all. We went into class and had an amazing discussion about the chapter and what it meant and how it was intentionally written to be kind of confusing, because that’s the point he’s making. With the help of the professor, I came away with a much deeper understanding of Lacan than I had gotten by simply Googling.

Two years later, I was in a children’s literature course and we read Frindle, a children’s book about a boy who calls his pen a frindle and essentially changes the language in his school and then in his town. It’s about who has the power to decide which words count as real words. Who gets to make them up? How do they become legitimate? What does the power to determine the validity of a word mean, and how is it enforced? It was about Lacan. It illustrated the concept of signifier and signified and the literal and figurative power struggle between them in a little chapter book that I read in like 45 minutes.

Alright, let’s all keep that story in mind while I start talking about digital humanities.

I’m going to talk about my position and trying to learn about digital scholarship, and then I’m going to talk a little bit about the issues I see within digital scholarship from an accessibility standpoint.

First of all, what is a post-bac?

A post-bac is basically a post-doc, only it takes place after getting a bachelor’s degree. This specific position was created because the members of the Digital Collaborations Group at the Five Colleges of Ohio found that when a faculty member undertook a research project, there was usually a student researcher who took on a lot of the technical work and helped the faculty member complete what were often very interdisciplinary projects. The DCG decided it would be helpful to have a person fully dedicated to learning about digital scholarship in order to assist faculty members with their research. In short, it’s my job to research different options and opportunities within the practice that I could help students and faculty incorporate into their pedagogy.

I’ve also been given the opportunity to develop my own research project that incorporates some of the digital scholarship techniques that I’ve been teaching myself and that project is currently underway.

So now your next question is probably going to be: why did they hire me?

Well, I graduated from Denison University in 2016 with a degree in creative writing. Prior to taking this position, I had had a lot of jobs.

  1. Taekwondo Instructor
  2. Babysitter/Chauffeur
  3. Math Tutor
  4. Traveling Mattress Salesman/Cashier
  5. Diner Waitress
  6. English Department Assistant
  7. Oral History Research Assistant
  8. Off-Campus Study Administrative Assistant
  9. Marketing Consultant

If you’ll notice, only one of those is directly related to digital scholarship. So the point I’m making is that I don’t have any background in coding, computer science, web design, math or statistics, or really anything having to do with the digital. I am a millennial; I do remember a time when I had to wait for my mom to get off the phone in order to get on the internet.

But despite my ability to set my grandma’s phone to easy mode or share a YouTube video on Facebook, I don’t actually know a whole lot about computers. Surprising, right?

I wouldn’t say this came as a huge surprise to me. I knew I didn’t know what I was doing when it came to coding or anything related to computers that doesn’t have a GUI. But the point of this post-bac wasn’t to hire someone who already knew how to do everything related to digital scholarship; it was to hire someone who wanted to keep learning, and to leverage the post-bac’s research to assist with student and faculty projects. Which is awesome. I’m basically being paid to teach myself. And that’s the background I’m bringing to this talk about accessibility in digital humanities.

Continue reading “Redefining Difficult: A Post-Bac’s Perspective on the Accessibility of Digital Scholarship Practices for Recent Grads”

Week 28: Summer Reading Roundup

The internet is not a utopia. I know, this isn’t exactly a revolutionary new idea. A lot of people have made this critique. There are a lot of problems with the internet. It’s overrun with racist, sexist, homophobic language. I pretty strongly agree with Sarah Jeong’s theory that the internet is mostly garbage. Amazon and Google and Facebook are all collecting our data and selling it to companies and political campaigns. And no matter where you go on the web, infinite ads flash in the corners and the middle of articles and interrupt the videos you watch.

But the internet is not a total dystopia either, as many critics have claimed (see reading list). The internet and the digital in general seem to be forms that provide infinite production and zero consumption, and while this isn’t exactly true, the abundance and availability of information and art has dramatically increased with the internet. The internet has created the opportunity for remixes and play and new forms of community. It has created new forms of communication and provides an unprecedented amount of instantly available information.

Some bloggers have likened early YouTube to David Graeber’s concept of baseline communism, that is, “the raw material of sociality, a recognition of our ultimate interdependence that is the ultimate substance of social peace.” On YouTube individuals create content not to make money but instead to help and entertain one another with tutorials or silly songs. Early YouTube–that is, YouTube prior to the ad revenue–created a platform on which almost anyone could upload and share their knowledge, and millions of people did this, not for their own gain, but just to help other people, because we are all ultimately dependent on one another, and it brings us joy to contribute to the greater good, however small the contribution.

For all of the racism and sexism on the internet (which Sarah Jeong points out is mostly run by bots created by a small portion of the population), there are examples where humans help one another and give to one another without the expectation of receiving anything in return. If the internet has done any good, it has shown that humans are not innately selfish, calculating machines, but are instead constantly striving to build communities and help one another, flawed though their attempts might be.

This article is my attempt to provide an overview of what I believe are the most pressing issues created by the internet and the digital based on the critiques I have read throughout the summer. My goal is to show how each of these problems is related to and feeds into one another, and show how we might begin to think of a world that actually changes and responds to these problems.

Continue reading “Week 28: Summer Reading Roundup”

Week 21: DHSI

I spent last week attending a class on Ethical Data Visualization at the Digital Humanities Summer Institute (DHSI) in Victoria, Canada. It was a fantastic experience, not just because I enjoyed my class but also because I met some amazing friends with similar academic interests to my own.

In class we learned how visualizations can be constructed to support a misleading narrative. Here are a few examples:

Problems with this visual (to name a few): It shows an arbitrary window of time to disguise the rise in the trend line that would appear if the first big spike were taken away. (What does that “since 1750” even mean?) It’s showing change over time, so a straight line actually shows a steady increase. The arrow is an intentional optical illusion to trick you into seeing more of a horizontal pattern than there actually is.

Problems with this visual: There are two y-axes that aren’t labeled, which makes the two lines seem much more comparable when in reality they aren’t even close. The colors are highly suggestive and are intended to play off of the public’s bias to think of women as feminine and delicate (pink) and abortion as murder (blood red). The chart doesn’t even begin to take into account any of the other services provided by Planned Parenthood. Cancer screenings are only going down because there are more effective screening methods that require fewer tests. And there are no points between the two endpoints on each line, leaving out a significant part of the data.

Problems with this visual: For one thing, the slices don’t add up to 100% even though it’s a pie chart. The colors are misleading (the blue used in the chart is the same blue as the background, making Huckabee less prominent). It’s hard to read the source. It’s tilted, which distorts our sense of how much space each slice takes up and makes Palin look like she has the smallest chunk of this magical 193% pie when in reality she has the most support. The divisions between the pieces of pie are absolutely arbitrary. And the word “Back” doesn’t need to appear on each label.

I would like to point out that the first two visuals were used by actual congressmen during congressional testimonies and the third was used on live TV. The bar is literally as low as it can possibly go. It’s not too hard to make visuals that are more ethical than these. 

At the same time, we debated the possibility of creating a value-neutral/non-political visualization. Is such a thing even possible? Is it what we should strive for? Or should we attempt to construct a narrative using our own values and politics with the understanding that one of our many values should be the truth? Most of the class seemed to agree the second was a more realistic and fruitful option.

I also learned a bit about R, including how to import data from a CSV and create charts using this data. However, I also learned that R isn’t the ideal program for the type of visualizations I would like to create. JavaScript would be better, but since I can’t learn it fast enough, I’m going to use Power BI for most of the visuals I intend to create for my free speech on campus project.

Finally, and I think most importantly, the idea that computers are human-made machines, not magical black boxes of science, was repeatedly reinforced throughout the course. It seems obvious, but given the amount of propaganda we consume on a daily basis, this is an important message to reinforce. Just because a visualization looks official, uses numbers, and cites a source doesn’t mean it’s an accurate reflection of reality. Every choice we make with our visualizations impacts the narrative they convey to the audience. Computers aren’t inherently value-neutral, because the data we feed into them isn’t value-neutral and they’ve been programmed to parse it in a particular way–a systematic way, but not necessarily a way that reflects the messiness of the world. It’s just like Miriam Posner’s quote about data-based work that I quoted in my first ever blog post:

I would like us to start understanding markers like gender and race not as givens but as constructions that are actively created from time to time and place to place. In other words, I want us to stop acting as though the data models for identity are containers to be filled in order to produce meaning and recognize instead that these structures themselves constitute data. That is where the work of DH should begin. What I am getting at here is a comment on our ambitions for digital humanities going forward. I want us to be more ambitious, to hold ourselves to much higher standards when we are claiming to develop data-based work that depicts people’s lives.

The data is what you make it. The visualizations are what you make them. On the one hand this is empowering. For so long, we’ve been led to believe that there are some things that are just set in stone when it comes to computers and data, but it doesn’t have to be this way. On the other hand, this is a little terrifying. It’s a lot of responsibility to make an ethical visualization. We can’t use computers as scapegoats for the inaccurate narratives our research may display. It’s up to you to be intentional with your work.

Outside of class, I talked with dozens of wonderful people and made some great friends. It was really helpful for me, personally, to talk with academics close to my age who are interested in areas of research similar to my own. I got a lot of advice and I’m more than a little excited to go on to grad school and maybe (probably) to get a PhD.

Overall, it was a great week (even with the cold weather)!

Week 18: Youtube Tutorials and Coding with R

I know absolutely nothing about R, so in order to get started, I decided to follow along with a tutorial video by Julia Silge on topic modeling in R using something called “tidytext” principles. You can check out Silge’s blog post and video here.

However, I wasn’t able to load my TEI copies of Capital into R (and actually crashed R to the point of having to reinstall and edit the folders to run R using command line). So I decided to follow the tutorial more closely by using the Gutenberg online copy of Jude the Obscure.

I found Silge’s video and the accompanying blog post incredibly useful. When I got stuck or a strange error appeared, I could easily Google the issue or rewind the video to make sure I was using the correct symbols for each step, and in the process I learned a lot about the coding nuances of R. For example, when working with dplyr, every line in a pipeline except the last must end with the pipe operator %>%, which passes the result of one function along as the input to the next, linking each subsequent function together like a chain.

The first chart I created was a tf-idf chart for each part of the book, and I will admit that I was not entirely sure what tf-idf meant. Julia Silge briefly explained the tf-idf concept, but I found this page useful because it provided a formula in plain English.

  • TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document)
  • IDF(t) = log_e((Total number of documents) / (Number of documents with term t in it))
  • Value (i.e. length of the bar in the chart) = TF * IDF

In other words, tf-idf represents the importance of a term inside a document relative to the rest of the corpus: it highlights the words that appear frequently in one part but relatively rarely in the other parts. For example, the word “pig” is very important in the first part of the book, but doesn’t come up often in the remainder of the book.
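To make the formula concrete, here is a minimal Python sketch of the same calculation (the toy corpus below is made up for illustration; the tutorial itself computes this in R with tidytext’s bind_tf_idf):

```python
import math

def tf_idf(term, doc, corpus):
    # TF: occurrences of the term in this document, normalized by document length
    tf = doc.count(term) / len(doc)
    # IDF: log of (total documents / documents containing the term);
    # assumes the term occurs in at least one document
    n_containing = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / n_containing)
    return tf * idf

# A toy stand-in for the parts of the novel, as lists of tokens
parts = [
    ["pig", "farm", "school", "pig"],
    ["school", "city", "stone"],
    ["city", "stone", "church"],
]

print(tf_idf("pig", parts[0], parts))     # high: frequent here, absent elsewhere
print(tf_idf("school", parts[0], parts))  # lower: also appears in another part
print(tf_idf("city", parts[0], parts))    # 0.0: the term never appears in this part
```

Note that a word appearing in every part gets an IDF of log(1) = 0, and a word absent from a part gets a TF of 0, which is exactly why so many of the results come out as 0.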

As soon as I ran the code in R, I saw a bunch of results that came out as 0, which didn’t seem correct since the video didn’t show a bunch of terms with 0s. So I used the formula above to test a few of the results by hand and found that they were correct. From here, I used ggplot2 to create this chart:

The next step was to create a topic model, which was fairly easy to do after making the tf-idf chart. As Silge explained each step, I felt I understood what I was doing and why the chart turned out the way that it did.
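Silge builds the topic model in R (the tidytext approach with the topicmodels package); as a rough sketch of the same idea in Python, here is scikit-learn’s LatentDirichletAllocation run on a made-up toy corpus standing in for the six parts:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Six tiny made-up "documents" standing in for the six parts of the novel
docs = [
    "pig farm pig school boy",
    "school teacher city stone",
    "stone church city mason",
    "church choir hymn stone",
    "city wife letter hymn",
    "letter grave boy wife",
]

# Build a document-term matrix of raw word counts
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(docs)

# Fit an LDA topic model; n_components is the number of topics you decide on
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(dtm)

# Each row is a document's distribution over topics (rows sum to 1)
print(doc_topics.shape)  # (6, 2)
```

The choice of n_components is itself an interpretive decision: the model will find as many topics as you ask for, whether or not they are meaningful.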

Finally, I created a gamma graph, which Silge explained shows that there is roughly one document (part of the story) per topic. I was unsure at the time why this should matter, but I managed to create the graph and it looked like the one in the video so I considered this a success.

Once I finished the tutorial and began writing this blog post, I realized that running all of these lines of code by following a tutorial really isn’t that different from using a GUI, at least in terms of what I learned. I generated all of these charts and really didn’t need to look at what any of the packages and formulas I was using meant or how they worked (though I did look into tf-idf because I initially thought I had done something incorrect). This isn’t to say I didn’t learn anything from this exercise. Certainly, it was helpful for me to better understand the syntax and minutiae of coding with R, and the resulting charts I generated were technically clear and accurate. But this isn’t good enough from a distant reading standpoint.

Going back to one of the two quotes I provided in my last post: “In distant reading and cultural analytics the fundamental issues of digital humanities are present: the basic decisions about what can be measured (parameterized), counted, sorted, and displayed are interpretative acts that shape the outcomes of the research projects. The research results should be read in relation to those decisions, not as statements of self-evident fact about the corpus under investigation.” I don’t really understand any of the decisions I made aside from the fact that I wanted 6 topics and that I divided the book into 6 parts.

Certainly this isn’t a terribly impactful experiment. If I screwed up the statistics and misrepresented the data, thereby misrepresenting the book as a whole, there won’t be any negative consequences. It just means readers of this blog might think the word “pig” or “gillingham” is more important to the story than they actually are. But let’s say I wasn’t working with a single nineteenth century novel (and admitting my gaps in knowledge as I proceeded) and was instead working with data about race or gender or a corpus of nineteenth century novels. Let’s say I wanted to derive meaning from this data and failed to do so accurately. Let’s say I didn’t understand the malleable nature of my data points, that race or gender or even vocabulary are not set in stone but are instead social constructs that can be interpreted in many ways. Let’s say I created some charts following Silge’s tutorial and (incorrectly) determined white male writers have a “better” or “larger” vocabulary than women writers of color, and that I used these charts to determine which novels I should read in the future to better understand the nineteenth century canon. That would definitely be a problem. It would be worse if anyone who read this blog post was convinced by my incorrect results and acted in accordance with them.

So let’s go back to the beginning to figure out what each of these visualizations mean, not just to do my due diligence to Thomas Hardy, but also distant reading and text analysis as a whole. I’m already fairly clear on what the tf-idf chart means and how it was created, so I’ve decided not to delve into that any further. Let’s start with the topic modeling chart. Continue reading “Week 18: Youtube Tutorials and Coding with R”

Weeks 15, 16, and 17: TEI Trials and Errors

About two and a half weeks ago, I submitted my Free Speech Research Project for IRB approval. While I wait for a response, I have decided to delve into more analytical work. I will be attending DHSI in early June to learn about ethical data visualization. Although the description for the program doesn’t list any prior experience with TEI or R as a prerequisite, I decided to go ahead and practice with both of them.

This post will be dedicated to my experience with TEI, and the following post will deal with R. I had been interested in learning TEI for a while, as it seems to be one of the major cornerstones of digital humanities work, particularly when working with text. Although people often associate TEI with close reading techniques, I intended to automate the TEI conversion process, making this project more closely associated with distant reading. Distant reading was something I read about when I first started this position, and I found the concept fascinating, mainly because I feel like students and people my age instinctively employ distant reading techniques when faced with a large text or corpus. When I studied abroad, I didn’t have enough time to read Bleak House in three days, so I found secondary sources, word clouds, timelines, videos, any kind of visualization I could get my hands on so I could more quickly absorb the text. Essentially, I was trying to do this:

Artist: Jean-Marc Cote Source: http://canyouactually.com/100-years-ago-artists-were-asked-to-imagine-what-life-would-be-like-in-the-year-2000/

And while this is not exactly what digital humanists mean when they say distant reading, it’s a great illustration of the way the internet has drastically changed the way that we understand art and literature. Take This is America, for example. So many news outlets and social media participants immediately attempted to dissect the video; none succeeded in capturing every nuance, but as a collective whole they created a fairly solid interpretation of the work within hours of its release, utilizing blog posts, Tweets, articles, videos, essays, and more. Because explanations and interpretations are so readily provided, artists like Donald Glover and Hiro Murai are pushed to create increasingly nuanced and layered works that resist simple interpretations and challenge their audience to dwell on their meaning. We’ve all witnessed times when every single news media outlet comes out with the exact same article on a pop culture phenomenon (just look up articles about Taylor Swift’s Reputation, all of which list each easter egg in the video with ease). The reactions to This is America were diverse and, cobbled together, still seem to fall short of encapsulating the sense of mesmerization the video provokes. The internet has so drastically changed the way we consume and interpret art that artists who want to challenge their audience have to create works that elude the media’s customary act of reacting and labelling. At the same time, scholars have to create new ways of understanding pieces of art. The distant reading techniques employed by digital scholars are an attempt to answer this call.

Before I started anything with my project, I decided to read more about distant reading and its current relationship to the Humanities. Two quotes from these readings really stood out to me.

  • “Computer scientists tend toward problem solving, humanities scholars towards knowledge acquisition and dissemination” (Jänicke). What happens when we collapse that binary? What if the goal of the humanities becomes solving problems and the goal of the computer scientists is to acquire and disseminate knowledge? How does our ability to understand art and data morph into something new when we treat art like data and data like art?
  • “In distant reading and cultural analytics the fundamental issues of digital humanities are present: the basic decisions about what can be measured (parameterized), counted, sorted, and displayed are interpretative acts that shape the outcomes of the research projects. The research results should be read in relation to those decisions, not as statements of self-evident fact about the corpus under investigation” (UCLA). In the same way that art is created to answer a call for increasingly difficult interpretive puzzles, a product of the analytical environment into which it was born, so is the analysis itself. You cannot create data or answers or knowledge or solutions in a vacuum. The decisions I make during this project will impact the results. The fact that I am using digital techniques to perform my analysis doesn’t absolve me of responsibility for what I produce. 

I designed a project that I felt could incorporate both aspects of these quotes. I want to solve a problem and acquire knowledge, and at the same time, understand that whatever I produce would be inextricably tied to my own research decisions. This project also had to include aspects of TEI and R, as well as some of the other tools and skills I’ve learned about thus far. My plan was this:

  1. Find copies of Marx’s first three volumes of Capital and systematically scrape them from the web. (I chose to exclude the fourth volume as it was not available online in the same format as the other three volumes.)
  2. Encode the scraped texts (saved as a plain text file from the CSV file generated by the scraper) as TEI texts.
  3. Use those TEI documents to do some topic modeling or sentiment analysis in R, and put them into Voyant (because why not?).
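For a sense of what step 1 involves, here is a minimal sketch using Python’s standard-library html.parser on a hardcoded snippet (the actual URLs, pagination, and the CSV-saving step are omitted, and the sample text is invented):

```python
from html.parser import HTMLParser

class ParagraphScraper(HTMLParser):
    """Collects the text of every <p> element on a page."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data

# Stand-in for one fetched chapter page
html = "<html><body><h1>Chapter 1</h1><p>A commodity is...</p><p>The wealth of societies...</p></body></html>"
scraper = ParagraphScraper()
scraper.feed(html)
print(scraper.paragraphs)  # ['A commodity is...', 'The wealth of societies...']
```

In the real workflow, each fetched page would be run through something like this and the collected paragraphs written out to a plain text file for the encoding step.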

Capital seemed like a good choice because I have an interest in Marxist theory, so I felt I would intuitively understand some of the results from my experiments, and it would be fun to learn about Marx’s most important work using new techniques that–as far as I know–haven’t been applied to this work before. Capital is also highly structured with parts, chapters, sections, and a variety of textual elements including tables, quotes, foreign phrases, and footnotes, which would give me plenty of opportunities to learn the options available in the TEI schema. Also, May 5 was Marx’s 200th birthday, and it just seemed fitting.

I coded the first chapter by hand so that I understood the structure: why some elements can be used together while others cannot, how best to divide the text using the <div> tag, how to label tags, and how to convert characters into TEI-approved entity references (e.g. &amp; into &#38;).

Using an initial HTML version of Chapter 1, I gradually worked my way through finding and replacing HTML elements with TEI elements.
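In code terms, that find-and-replace pass can be sketched like this (a deliberately simplified Python illustration with only a few hypothetical mappings; real TEI conversion should use a proper XML parser and the full P5 schema):

```python
import re

# A few illustrative HTML-to-TEI substitutions (nowhere near a complete mapping)
TEI_MAP = [
    (re.compile(r"<i>(.*?)</i>"), r'<hi rend="italic">\1</hi>'),
    (re.compile(r"<blockquote>(.*?)</blockquote>", re.S), r"<quote>\1</quote>"),
    (re.compile(r"&amp;"), "&#38;"),  # convert the entity to its numeric form
]

def html_to_tei(fragment):
    """Apply each substitution in order to an HTML fragment."""
    for pattern, replacement in TEI_MAP:
        fragment = pattern.sub(replacement, fragment)
    return fragment

print(html_to_tei("<i>Capital</i> &amp; other works"))
# <hi rend="italic">Capital</hi> &#38; other works
```

Doing the first chapter by hand is what makes a table like TEI_MAP possible: you have to know which TEI element corresponds to each HTML element before you can automate the swap.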

Final structure of TEI encoded chapter

Once I completed the first chapter, I felt I had a sufficient understanding of the elements to know what to look for in a TEI document created using an online converter tool. For this step, I used OxGarage to convert the Word document versions of the volumes into P5 TEI. I then cleaned up the documents in order to make them valid. This step required very little effort. Most of my edits pertained to fixing mistakes in the Word documents, like a broken link to a footnote or an incorrect, invisible numbered list before some paragraphs.

This whole process took me about a week to complete. I summarized above but the actual steps looked like this: Continue reading “Weeks 15, 16, and 17: TEI Trials and Errors”

Weeks 13 and 14: There’s a GUI for That

For the last couple of weeks, I have been working on IRB documents for my research project involving student interviews. The good thing about the IRB process is that it really made me plan out my project so that I now know every detail of how I will do each step.

Part of my project will almost definitely involve topic modeling and sentiment analysis. However, when I wrote my first draft of the IRB Approval Form Responses, I realized I didn’t actually know very much about topic modeling or sentiment analysis, and what little knowledge I did have wasn’t going to cut it for this review process. So I sat down to try to read about the process of topic modeling and how it can be used.

I don’t know if you’ve read my About page or any of my other blog posts, but let me reiterate that I do not have a computer science or a statistics background. I consider myself fairly capable when it comes to math. I’ve always had a good sense for math and logic, in the same way that, although I’m not a cartographer, I am usually pretty confident in my ability to point out the cardinal directions. However, the articles and most of the blog posts I was reading might as well have been written in a foreign language.

No matter how many articles I read or reread, I just wasn’t getting how the whole thing worked. Where do you do topic modeling or sentiment analysis on a computer? Is it code? Is it a program? Do I need to download something? How should I format the files I want to analyze? What kind of code is used in topic modeling?

Learning how to do this stuff on your own is like trying to bake a cake for the first time in your life, and your only directions are in Russian, and you’ve also never eaten cake before. I have a general idea of what I hope to end up with, but I don’t even know where to flip in this cookbook to find a cake recipe.

I found myself reading the same sentence over and over again, so I did what I always do when I don’t understand how to do something and reading isn’t helping: I got on YouTube and watched some videos. (You can find everything I’ve watched and read (that I comprehended) on my reading list page.) They weren’t all helpful, but just watching someone type out the code and show a result was. So I watched a bunch of these videos, and after finding one that was particularly clear and useful (it was actually on sentiment analysis, but that’s beside the point), I downloaded everything the YouTuber had in the description section of his video. I quickly realized that many of the programs had been updated since the video was made and didn’t look the same or didn’t have the same features anymore, so I couldn’t follow along with his tutorial as I planned. I was kind of back to square one–or square two, since I at least had a better idea of what kind of information I could get out of doing data analysis like this.

Next step? Complain about how hard this is! Feeling like I’d hit only dead ends, I explained my predicament to Ben, who sent me an article I’d previously given up on. The thing about a lot of these tutorial articles is that they start by telling you to go back a step if you don’t already know how to use the command line or Bash or R or whatever tool they’re going to use throughout the tutorial. And that makes a lot of sense to me. I wouldn’t suggest that the best way to learn how to bake is by using a recipe in a Russian cookbook, and if I did, it would be cruel of me not to tell you to brush up on your Russian first. But if you keep going backward, further and further away from the thing you want to do, you end up watching videos about how computers work (like this) instead of writing an IRB document. I’m certainly not arguing that learning how computers work is a bad thing, or that I shouldn’t spend my time learning the basics of computer science, but I’m also on a schedule. There are only so many skills I can learn in a week or a month or a year. Still, I had a lead. The article Ben suggested was much clearer to me once I had watched all of those YouTube videos, and out of all the tutorial blogs out there, if Ben said this was a good way to start, I could trust it would get me somewhere. That article linked to another that I could understand, and another (see reading list). Eventually, I found the GUI Topic Modeling Tool, which is a GUI for MALLET, and then I really got on a roll.

Let me say briefly, I understand the trepidation around GUIs. If you don’t know how something works (e.g. what the program does and how it processes data), you might take all of the results at face value, as if they were concrete and official rather than the social constructs they actually are. It’s like how most of us know vaccines prevent diseases and we get them, but few of us know how to make them. Generally, this works out. We don’t all need to know how to make a vaccine, but very few of us are mixing up our own vaccines at home and claiming they cure anything. Digital scholars who don’t know how the statistical model behind their data works aren’t going to accidentally give people mercury poisoning with a homemade polio vaccine, but they could confuse their audience with claims that might not be substantiated in the data. So if you don’t love the GUI Topic Modeling Tool, I get it.

The thing about GUIs to me is that they can be like the Google Translate for your Russian cookbook. Sometimes (maybe often) you’re going to end up with some total nonsense, but if it’s all you’ve got to get started, that’s what you’re going to use. The MALLET GUI, available here, is really, really useful. It allowed me to work backward, so that I could tweak variables in a format I understand and compare the results. I know what the number of topics should do, but what does the number of iterations mean? How does it change the results from a data set I know super well? How does changing the number of topic words printed impact my understanding of the results? I don’t need to know how to tell the computer to change that variable yet. I need to understand what the variable is in the first place.
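If it helps to see those knobs in code, here's a toy Python sketch. To be clear, this is not MALLET's actual algorithm (MALLET uses a statistical sampling model); it's a crude stand-in whose only job is to make num_topics, num_iterations, and num_top_words concrete:

```python
import random
from collections import Counter

def toy_topics(docs, num_topics=2, num_iterations=20, num_top_words=3, seed=0):
    """A crude stand-in for a topic model (NOT what MALLET really does):
    randomly assign documents to topics, then repeatedly move each document
    to the topic whose accumulated word counts it overlaps most."""
    rng = random.Random(seed)
    tokenized = [doc.lower().split() for doc in docs]
    assignment = [rng.randrange(num_topics) for _ in docs]
    for _ in range(num_iterations):
        # Tally word counts per topic under the current assignment...
        counts = [Counter() for _ in range(num_topics)]
        for words, topic in zip(tokenized, assignment):
            counts[topic].update(words)
        # ...then reassign each document to its best-overlapping topic.
        assignment = [
            max(range(num_topics), key=lambda t: sum(counts[t][w] for w in words))
            for words in tokenized
        ]
    # Report the top words per topic from the final assignment.
    counts = [Counter() for _ in range(num_topics)]
    for words, topic in zip(tokenized, assignment):
        counts[topic].update(words)
    return [[w for w, _ in counts[t].most_common(num_top_words)]
            for t in range(num_topics)]

docs = [
    "free speech policy campus",
    "speech policy students campus",
    "cake recipe oven mixer",
    "oven mixer cake baking",
]
print(toy_topics(docs, num_topics=2, num_top_words=3, seed=0))
```

Playing with the three parameters here mirrors what the GUI let me do: change one knob, rerun, and compare the topic-word lists that come out.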

I ended up with some data that I could play around with and visualize in Power BI. Miriam Posner’s blog post about interpreting this data was super helpful at this step. Now I feel more prepared to read all of those blogs about the statistical model behind topic modeling. I have a better grasp of the variables and results. I’m also going to try running MALLET from the command line. To return to that damn cake metaphor for the 100th time in this post: I know what all of the ingredients are. I’ve tasted some cake. Next, I have to learn how to set the oven and turn on the mixer. (Do you hate me yet?)

Best of all, using the MALLET GUI helped me grasp enough of the conceptual ideas behind topic modeling and the data it generates to complete my IRB documents. Now I just need to submit them!

Weeks 11 and 12: My Favorite Tools

For the last two weeks, I have been trying out online tools that I may use for my research project and in this post, I’m going to discuss my two favorites.


When I interviewed for this post-bac position, I talked a lot about my frustration with the lack of transcription tools available online while I was working on the Digital Life Stories Archive for Regina Martin. My struggles with voice-to-text software were a learning experience: I pirated old versions of Premiere in order to use its now-abandoned transcription tool, only to find that it no longer functioned, even on older versions; I transcribed an hour-long interview by hand, which took me nearly six hours; and I spent multiple hours online searching for any tool, free or not, that would cut down on transcription time. We laughed during my interview about the fact that my expectations were too high: because I knew Apple and Google had created voice-to-text programs for Siri and Google Assistant, I expected there to be a “magical” solution to the transcription problem on Regina’s project.

Well, laugh no longer, because there is a magical solution out there called Trint. This is an online program that you have to pay for (the rate is $12/hour right now). I conducted a practice interview with my brother Spencer and fed it into Trint. Honestly, I held off trying it because I was so sure I would be disappointed in the results. I’m writing this now to tell you I most certainly was not. Trint transcribed a half-hour interview in about 2 minutes. The audio was decent but by no means high-quality, because we conducted the interview over Zoom and Spencer’s internet was a little spotty. The results were incredible. I didn’t think it would function half as well as it did.

I don’t mean to suggest that Trint was perfect; certainly, there were some words that Trint mixed up. For example, Spencer and I have midwestern accents and we tend to slur words together, so depending on the context, we might say “a”, “I”, and “uh” the exact same way, and a sentence like “And uh, I was a little embarrassed” might come out like “And I I was uh little embarrassed.” This isn’t a hard fix. As you listen to the interview, Trint darkens the words being played, and you can easily edit the text to match the intended words by clicking and editing the transcript like a Word document.

Another aspect that takes longer to edit is punctuation, as Trint only puts in periods at hard stops in the conversation. In order to better represent textually what’s being said, sometimes it’s necessary to put in additional punctuation. For example, Trint might write “You know I was at work and the dude well he so. He’s not nice.” Without punctuation, we might still understand the speaker’s intended meaning, but punctuation would better capture the speaker’s actual phrasing, so that sentence becomes “You know, I was at work and the dude, well, he–so… he’s not nice.”

I will say that there were a number of difficult or unlikely phrases that Trint did accurately capture. Trint caught Spencer’s use of “sorta” and my use of “gotcha”, automatically capitalized “Black Lives Matter”, and cut out filler noises like “uh” and “um.” And aside from a few situations with homophones like “by and buy”, it was mostly correct.

I don’t think I can say enough positive things about Trint, so I’ll just say if you have an interview project or any audio or video that you want transcribed, try Trint out.


The second application I’m going to talk about is webscraper.io. This is an incredibly useful, simple tool. Although it appears really complex at first (for whatever reason, the actual web scraper is the very last tab in the tool), the videos provided on the webscraper.io website (though quiet and oddly monotone) are very easy to follow and tell you everything you need to know in order to scrape websites.

I tend to resist watching video tutorials because I find them boring – as I’m sure many people do – and I enjoy figuring things out by playing with tools on my own. However, web scraping, and webscraper.io in particular, uses its own vocabulary, and without any context, I totally failed at my first scrape (I wouldn’t even call it a scrape, I’d call it random clicking and nothing happening). However, once I gave in and watched the short video, I completed a web scrape of The Denisonian website in just a few minutes.

As I moved on to more complicated, slightly newer-looking websites, I found that having a little bit of knowledge of HTML was helpful because some of the newer websites disabled the point and click feature on the program and I had to edit the JSON code for the web scrape by hand. This was both frustrating and rewarding, but I’m very glad I was presented with that challenge because I feel I have a much more in-depth understanding of what web scraping is because of it.
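For a sense of what a scraper is doing under the hood, here's roughly the same point-and-select idea in plain Python instead of webscraper.io's GUI. The tag and class names are hypothetical; a real site's markup will differ:

```python
from html.parser import HTMLParser

class HeadlineScraper(HTMLParser):
    """Collect the text of every <h2 class="title"> element — the same
    select-and-extract idea webscraper.io wraps in a GUI. The h2/title
    selector here is made up for illustration."""
    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        # Start capturing when we enter a matching element.
        if tag == "h2" and ("class", "title") in attrs:
            self.in_headline = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_headline = False

    def handle_data(self, data):
        if self.in_headline and data.strip():
            self.headlines.append(data.strip())

# A stand-in for a fetched page; a real scrape would download the HTML first.
page = ('<h2 class="title">Campus Votes on Budget</h2><p>Story...</p>'
        '<h2 class="title">New Dorms Planned</h2>')
scraper = HeadlineScraper()
scraper.feed(page)
print(scraper.headlines)  # → ['Campus Votes on Budget', 'New Dorms Planned']
```

Seeing it spelled out this way is also why a little HTML knowledge goes a long way: the whole job is knowing which tags and attributes identify the content you want.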

The Takeaways

Both of these applications took time – one in terms of waiting around for the application to be developed, the other in terms of my ability to learn a new program. But what I got out of both of these experiences is an appreciation for the fluid learning skills that I’ve gained through a liberal arts education.

Right now, to me, digital scholarship is another branch or level of understanding and interpreting the world, and it’s one that often frustrates the hell out of me. I spent weeks searching for a program that didn’t exist, only to find the best possible version of it years later (thanks for the recommendation, Ben). I spent hours – probably longer than I would’ve spent had I just copied and pasted the data I wanted – teaching myself how to scrape different websites. But it’s also incredibly rewarding and gratifying. I stood up in triumph in the computer lab at OWU when my web scraping code worked on the 30th try. I called my mom to tell her about Trint because I was so excited it existed just as I was planning to take on another interview project.

I can’t say the digital liberal arts alone have instilled perseverance in me. I’ve always been stubborn. But they have transformed that stubbornness into something productive: the ability to try new methods, to learn new ways, and, maybe best of all, to know when to quit (e.g. don’t illegally download an old version of Premiere that will crash your 13-year-old computer).

Weeks 8, 9, and 10: How do You Start a Research Project?

It’s been a little while since I last posted, but the delay comes with a payoff: I finally decided on a topic for my independent research project. This is going to be kind of a long post. So, buckle up!

As part of this position, I get to dedicate some of my time to an independent project of my choosing (so long as it involves digital scholarship). I debated taking on a dozen different projects with topics like:

  • Memes
  • Capitalism
  • Karl Marx/Das Kapital
  • #Activism
  • Twitter Bots
  • Thomas Hardy’s Jude the Obscure
  • Thomas Pynchon’s The Crying of Lot 49
  • The myth of the American dream
  • Utopias
  • Free Speech
  • Portrayals of panopticons in video games

And, I considered making all kinds of weird digital projects involving these subjects, like:

  • flash games
  • Twitter bots
  • visuals not unlike The Knotted Line
  • an essay consisting entirely of memes
  • any other jarring combination of maps and graphs that would cause you to pause and think before interpreting the data

The one topic that really grabbed me was free speech on campus. Free speech has been an interest of mine since I was a student at Denison, where I protested speakers and groups invited to table on campus (details I won’t go into here). When I found myself talking about free speech every evening and weekend with my friends and family, I knew I had found a topic that could sustain a year’s worth of inquiry. Below, I’m going to outline how I turned a vague idea into a research project.

The idea for this project came up during the Ohio Five board meeting, while provosts and board members discussed their concerns about students’ interpretation of free speech policy on campus and the seemingly escalating events that challenge free speech policy on the campuses. It felt like the conversation was missing something: student voices, but also an acknowledgement of the post-structuralist point of view that many students–especially those advocating for stricter policies–have embraced and taken as truth (which certainly isn’t a criticism coming from me, the “how many times can she reference Foucault in an hour” student).

So for me, this conversation generated a pile of questions that I couldn’t immediately, internally resolve with Marxist theory as my friends and family would tell you I am wont to do. I ordered the book suggested during the meeting, Free Speech on Campus by Erwin Chemerinsky and Howard Gillman, and after some discussion with Ben, I decided to sketch out a rough research project involving student interviews.

I started with initial questions that I had for students, like:

  • Do you feel the administration does a good job making you feel safe to express yourself on campus?
  • Do you feel like there’s a group of students who don’t agree with your political opinions on campus? What is your perception of the divide of political opinions on campus?
  • Are there any policies that you would change on your campus in order to better reflect your views on free speech?

From there, I started reading everything about free speech and interviewing that I could get my hands on (see reading list). However, this isn’t exactly a digital project yet. It’s a sociology/psychology/communications project, but it doesn’t incorporate digital scholarship beyond the fact that I would be using a digital recording to analyze the students’ views. So that’s when I brought in the idea of text encoding. There are a lot of words that students (and everyone else) use to talk about free speech that are coded with different meanings depending on who uses them. For example, the phrases “political correctness” or “PC culture” are rarely defined by the people who use them. Heidi Kitrosser explains,

“When [journalists] reference “political correctness,” it often is unclear whether they mean to reference formal restrictions or informal pressures, let alone the subset of either type that they have in mind. Even when reports single out particular practices, important details frequently are excluded. We saw, for instance, several commentators refer to “trigger warnings” without specifying whether they mean voluntary warnings by faculty, warnings suggested or encouraged by a school’s administration, or administratively mandated warnings. There is even less clarity as to the meaning of “safe spaces.””*

So I wondered whether there was a way I could encode the transcriptions from the interviews to reveal the disconnect between students and the words that they use to talk about free speech. That’s when I came across this very helpful paper by two Berkeley students who were using interviews to visualize information like word frequency and word count. After talking with Jacob Heil, the Digital Scholarship Librarian and Director of CoRE at the College of Wooster, I found this was actually a pretty doable project.
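As a tiny sketch of what that word-frequency analysis could look like, here's some Python. The sample transcript and phrase list are made up for illustration:

```python
import re

def keyword_counts(transcript, phrases):
    """Count how often each loaded phrase appears in a transcript —
    the same word-frequency idea, reduced to a tally."""
    text = transcript.lower()
    return {p: len(re.findall(re.escape(p.lower()), text)) for p in phrases}

# Hypothetical sample text and phrases, not real interview data.
sample = ("People say political correctness has gone too far, but political "
          "correctness, PC culture, and safe spaces mean different things "
          "to different students.")
phrases = ["political correctness", "PC culture", "safe spaces", "trigger warnings"]
print(keyword_counts(sample, phrases))
# → {'political correctness': 2, 'PC culture': 1, 'safe spaces': 1, 'trigger warnings': 0}
```

Run across many transcripts, counts like these are what would let me compare how different students reach for (or avoid) the same coded vocabulary.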

However, as I mentioned earlier, I want to get weird with the data and my analysis or visualization of it. I want to do something unexpected. So, I thought about creating a Twitter bot, and it’s still a consideration. I’m drawn to the uncanny valley aspect of a Twitter bot, and though I don’t know exactly how a bot might take shape as a result of this project, it seems like an interesting way to reveal new insights into the thoughts of students. I’m also on the fence about making something similar to the Knotted Line, though I am certainly not an artist of any caliber, so I will have to find a new way of creating this type of visual. Regardless of the form, I think making these challenging visuals actually results in a better conversation with the audience, because the audience is forced to spend time interpreting the visual, rather than being handed statistics or graphs that reveal the answer they want to find.

The questions about this type of visualization become: can I interweave the student narratives to reveal something new about them? Can I uncover assumptions students hold? Is one of them right and the other wrong? What is the relationship between the post-structuralist points of view these students hold and their arguments for changing campus policy? This isn’t an argument between red and blue; it’s much more nuanced than that (as are many arguments). Where do these students overlap? Where do they separate? How can I visualize this without imposing my own interpretation on students’ beliefs, or is that even possible?

I don’t know the answers to these questions and of course, there are a lot more logistical and pragmatic issues to take into account for this project as well. For example: What if no one volunteers to be interviewed? What if I only get students with very similar points of view? What if I don’t have enough time? I have to go through five IRB processes if I’m going to do interviews at all of the campuses like I hoped. Do I even have time for that?

For now, I am simply maintaining a list of all of these questions so that as I move forward, I don’t lose sight of the constraining factors.

Finally, I had to actually formulate a research question that was both narrow enough to fit within the scope of this project and my 10 month(ish) deadline, and broad enough that it could be tweaked to better fit the project should it begin to take a new turn as I’m interviewing students. To formulate this question, I first read papers on qualitative research techniques and interview projects (see reading list again). Then, I compiled a list of all of the questions that arose as I was reading about free speech. I used these notes to write research questions that could allow for the possibility to dig into the theoretical questions as I interviewed students. Below are the two research questions I landed on and a series of 5 sub-questions related to the main two.

(1) How do students think (talk/perceive/conceptualize/interact with) about free speech on campus? (2) When students disagree on aspects of free speech, are there differences in the ways they talk about free speech and the language that they use?

  1. What words do students use when talking about free speech? How often do they use them?
  2. What do students think the “other side(s)” don’t understand about their POV?
  3. What influenced students to think about free speech in the way that they do?
  4. Are students’ views grounded in different schools of thought/theories/ways of understanding the world?
  5. How do students’ perceptions of campus climate impact their views, or vice versa?

This isn’t a final list, and it certainly isn’t a list of interview questions, but it’s definitely a start. From here, I am going to continue to meet with campus leaders to discuss my project plans and continue to read about free speech and interview practices. Soon, I hope to begin the IRB process and teach myself how to encode text.

*Citation on Reading List under Kitrosser