
Cleaning Up Zoom Transcriptions for Qualitative Research

October 5, 2022

While sometimes I lament the Current State of Things in academic libraries, I am very glad to be doing qualitative research at a moment when voice recognition technology lets us create interview transcriptions from auto-generated captions. Zoom has not only allowed my research team and our subjects, scattered throughout Western Pennsylvania, to come together and share our experiences and attitudes about journal venue choices; it has also allowed us to collect and analyze our data affordably.

A previous research project I worked on made use of a professional transcription service. For this project, we chose to use our funding entirely for subject compensation. This meant we would be doing the transcription ourselves. While Zoom’s auto-transcription feature is incredibly useful, it still requires some human intervention, especially when faced with less-than-optimal audio quality, non-American accents, verbal tics, or academic jargon. And while in future iterations of this research, we’ll probably request funding for a transcription service, I think Zoom’s auto-transcription is an incredibly helpful tool when that’s not an option.

To help other new library researchers, I'd like to share my workflow for working with Zoom-generated transcriptions.

Collecting the data:

After receiving verbal consent in the interview, I record to the cloud. While at Pitt, our videos are stored securely on Panopto, but the videos and captions/transcripts are also retrievable from Zoom’s website for up to 120 days.

The Zoom transcript is divided up by timestamps every ten seconds or so.

2:29 SUBJECT: Lorem ipsum dolor sit amet, consectetur

2:32 INTERVIEWER: adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

2:36 SUBJECT: Ut enim ad minim veniam,

2:41 SUBJECT: quis nostrud exercitation ullamco laboris

2:43 SUBJECT: nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in

Initial clean-up of the data:

First, I de-identify the transcript: I replace the subject’s name in the speaker labels with a codename, then skim the transcript for any mentions of their name within the conversation itself and replace those with [redacted]. Next, I condense all consecutive lines from the same speaker into one “entry” for readability’s sake, and run a “find and replace” for what I’ve found are common capitalization errors (lowercase-i contractions, for example). This makes the “listening through” round easier, since I can focus more on the words and less on the format.

2:29 [CODE]: Lorem ipsum dolor sit amet, consectetur

2:32 INTERVIEWER: adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

2:36 [CODE]: Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in
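This condensing-and-de-identifying pass is mechanical enough to script. Below is a rough Python sketch of the idea, assuming a `M:SS SPEAKER: text` line format like the excerpt above; the `clean_transcript` helper and its regex are my own illustration, not anything built into Zoom, and a real Zoom export would need the pattern adjusted.

```python
import re

# Assumed line format: "2:29 SPEAKER: spoken text"
# (timestamp, speaker label, text) -- adjust for your actual export.
LINE_RE = re.compile(r"^(\d+:\d{2}(?::\d{2})?)\s+([^:]+):\s*(.*)$")

def clean_transcript(lines, subject_name, codename):
    """De-identify and condense a raw caption-style transcript."""
    entries = []  # each entry is [timestamp, speaker, text]
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank or malformed lines
        ts, speaker, text = m.groups()
        # De-identify: codename in the speaker label,
        # [redacted] for mentions inside the speech itself
        if speaker.upper() == subject_name.upper():
            speaker = codename
        text = re.sub(re.escape(subject_name), "[redacted]",
                      text, flags=re.IGNORECASE)
        # Fix lowercase-i contractions ("i'm" -> "I'm", bare "i" -> "I")
        text = re.sub(r"\bi\b", "I", text)
        # Condense consecutive lines from the same speaker into one
        # entry, keeping the timestamp of the first line
        if entries and entries[-1][1] == speaker:
            entries[-1][2] += " " + text
        else:
            entries.append([ts, speaker, text])
    return [f"{ts} {sp}: {tx}" for ts, sp, tx in entries]
```

Fed the raw excerpt above, a script like this would emit the condensed, codenamed entries shown. Note that Zoom’s actual downloadable transcript files use a somewhat different layout, so the regex is the part you’d tailor first.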

Listening through:

This is where the bulk of the work is, and so the three of us on my team divide the work after agreeing on a style guide.

As I said earlier, Zoom is fairly accurate, but it doesn’t get everything right. This round is also where I start adding [tags] for laughter or for significant gestures that contribute to the meaning of the subject’s words (such as pointing to something in the background or throwing their hands up in resignation). I’ll also italicize any strongly emphasized word if the emphasis changes the meaning of the sentence.

But mostly, I just listen and correct inaccurate transcriptions as I read along. There are some amusing instances of Zoom “mishearing” things; in one example, “The guy was no buffoon” was transcribed as “The guy was notebook phone.” …Indeed! If there are phrases I have to listen to more than twice to make sense of, I highlight them and move on.

If the transcription is fairly accurate, I’ll also use this round to start adding some basic commas and periods to break up the run-on sentences of human speech. Every few minutes, I pause to look back over what I just covered to ensure the paragraph makes sense. I may also use that time to remove some of the subject’s verbal tics so the ideas don’t get lost.

My listening-through method is pretty iterative: I do it a few times, focusing on something different each time, such as tough sections I just couldn’t parse. At this point, I might pull in one of my teammates to get their input. We try to get multiple ears on each transcription anyway, for this exact reason.

Reading through:

After the listen-throughs come the read-throughs, where I focus on punctuation and readability. While these read-throughs don’t strictly require listening, keeping the audio on helps ensure you’re capturing the cadence and meaning of what the subject actually said. This is also where I check formatting, since this data will be deposited in our institutional repository and needs to be usable by others.

And that’s as far as we’ve gotten in this research project! I have only ever done interview research over Zoom, but I would love to hear from librarians and researchers who had to do manual transcription (quelle horreur!) and invite you to share your experiences in the comments.
