Apollo in Real Time Forum

Message Boards => Apollo 13 Moments of Interest => Topic started by: bfeist on October 18, 2024, 02:11:01 pm

Title: Improved Apollo 13 MOCR audio transcripts
Post by: bfeist on October 18, 2024, 02:11:01 pm
I've taken the 7200 hours of MOCR audio recordings and re-transcribed them using the latest Whisper "large-v3" model and a wrapper system called WhisperX. This has produced fewer hallucinations in the transcripts and much better utterance timestamps.

These new transcripts replace the existing "large-v2" transcripts and are now live on apolloinrealtime.org/13

New stats:
3,936,510 utterances
37,503,659 words
199MB of text
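
To put those totals in perspective, here is a rough back-of-the-envelope sketch in Python that derives a few per-utterance and per-hour averages from the figures above. The variable names are only illustrative, and it assumes "MB" means 10**6 bytes, which the post does not specify.

```python
# Figures quoted in the post above.
audio_hours = 7200
utterances = 3_936_510
words = 37_503_659
text_bytes = 199 * 1_000_000  # assumption: "MB" means 10**6 bytes

# Simple averages derived from those totals.
words_per_utterance = words / utterances
utterances_per_hour = utterances / audio_hours
words_per_hour = words / audio_hours
bytes_per_word = text_bytes / words

print(f"{words_per_utterance:.2f} words per utterance")
print(f"{utterances_per_hour:.1f} utterances per hour of audio")
print(f"{words_per_hour:.1f} words per hour of audio")
print(f"{bytes_per_word:.2f} bytes of text per word")
```

These are averages over all recordings, so quiet stretches and busy stretches will differ a lot from them.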
Title: Re: Improved Apollo 13 MOCR audio transcripts
Post by: MadDogBV on October 21, 2024, 01:47:20 pm
Looks good, thanks Ben. It does seem a bit more accurate. There were some quirks I noticed, such as duplicate text or inconsistent words ("P2" and "P-tube" got mixed up a couple of times in the transcript of a recent Moment of Interest I posted), but on the whole it seems pretty strong and, if nothing else, gives a good baseline to work with. It does pretty well at transcribing quiet parts in particular.

For example, parts like the attached image seem to occur a fair amount. (Having listened to these clips, I can say fairly confidently that "the, uh" is not repeated 15 times like the transcript implies here.)  ;D

Edit - 10/22/2024: Per my updated post below, the transcript errors were due to my computer storing an outdated transcript in the browser cache. This was fixed by clearing cookies and cache for Apolloinrealtime.org.
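
For anyone who wants to screen a transcript for the kind of back-to-back repetition described above (a short phrase like "the, uh" repeated many times), here is a minimal Python sketch. The function `max_phrase_repeats` is a hypothetical helper written for illustration; it is not part of Apollo in Real Time, Whisper, or WhisperX, and it would not have caught the stale-cache cause noted in the edit above.

```python
import re

def max_phrase_repeats(text, max_ngram=3):
    """Return (phrase, count) for the most-repeated back-to-back phrase.

    Hypothetical helper: checks phrases of 1..max_ngram words and counts
    how many times each one repeats consecutively. Returns ("", 1) when
    nothing repeats.
    """
    # Lowercase and drop punctuation so "the, uh," matches "the uh".
    tokens = re.findall(r"[a-z0-9'-]+", text.lower())
    best_phrase, best_count = "", 1
    for n in range(1, max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            phrase = tokens[i:i + n]
            count = 1
            j = i + n
            # Step forward n words at a time while the phrase keeps repeating.
            while tokens[j:j + n] == phrase:
                count += 1
                j += n
            if count > best_count:
                best_phrase, best_count = " ".join(phrase), count
    return best_phrase, best_count

# Example: flag utterances where some phrase repeats 5 or more times in a row.
sample = "the, uh, " * 15 + "flight"
phrase, count = max_phrase_repeats(sample)
if count >= 5:
    print(f"suspicious repetition: {phrase!r} x{count}")
```

The threshold of 5 is arbitrary; real speech does contain some genuine repeats, so anything flagged would still need a listen.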
Title: Re: Improved Apollo 13 MOCR audio transcripts
Post by: bfeist on October 21, 2024, 03:55:31 pm
Is that example from Apollo 13? Just asking because I'm still in the process of retranscribing Apollo 11 and expect these kinds of errors there. 13 shouldn't have them.
Title: Re: Improved Apollo 13 MOCR audio transcripts
Post by: MadDogBV on October 22, 2024, 10:35:46 am
Apollo 13. However, it has been a while since I cleared my cookies and cache for Apollo 13 In Real Time. I might try that next.

Edit: Sure enough, that fixed the problem. The transcripts now look wonderful. You might make it a recommendation to previous users of AIRT to clear their cookies and cache to get the updated transcripts.