Improved Apollo 13 MOCR audio transcripts
bfeist:
I've taken the 7200 hours of MOCR audio recordings and re-transcribed them using the latest Whisper "large-v3" model and a wrapper system called WhisperX. This has produced fewer hallucinations in the transcripts and much better utterance timestamps.
These new transcripts replace the existing "large-v2" transcripts and are now live on apolloinrealtime.org/13.
New stats:
3,936,510 utterances
37,503,659 words
199MB of text
MadDogBV:
Looks good, thanks Ben. It does seem a bit more accurate. There were some quirks I noticed, such as duplicate text or inconsistent words ("P2" and "P-tube" got mixed up a couple of times in the transcript of a recent Moment of Interest I posted), but on the whole it seems pretty strong and, if nothing else, gives a good baseline to work with. It does pretty well at transcribing quiet parts in particular.
For example, parts like the attached image seem to occur a fair amount. (Having listened to these clips, I can say fairly confidently that "the, uh" is not repeated 15 times like the transcript implies here.) ;D
Edit - 10/22/2024: Per my updated post below, the transcript errors were due to my browser serving an outdated transcript from its cache. This was fixed by clearing cookies and cache for apolloinrealtime.org.
bfeist:
Is that example from Apollo 13? Just asking because I'm still in the process of retranscribing Apollo 11 and expect these kinds of errors there. 13 shouldn't have them.
MadDogBV:
Apollo 13. However, it has been a while since I cleared my cookies and cache for Apollo 13 In Real Time. I might try that next.
Edit: Sure enough, that fixed the problem. The transcripts now look wonderful. You might make it a recommendation to previous users of AIRT to clear their cookies and cache to get the updated transcripts.