George McKinney Adventures in Software Development

February 26, 2024

AWS Transcribe CLI Workflow

Filed under: AWS,ffmpeg — georgemck @ 6:28 pm

Recently I needed to create transcriptions for a number of videos. I decided to use Amazon Transcribe, since that is much faster than typing them out by hand. I used ffmpeg to pull the audio out of each video and S3 to hold the audio and the transcription output.

 

— 1. separate audio from the video file

ffmpeg -i input.mp4 -vn -acodec pcm_s16le -ar 44100 -ac 2 output.wav
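
With a number of videos to process, step 1 could be wrapped in a loop. The following is only a sketch: the function name extract_audio_dir is made up for illustration, and it assumes every video is an .mp4 sitting in one directory. The ffmpeg settings are the same as above, with -nostdin and -y added so the loop runs unattended and overwrites earlier output.

```shell
# Extract a WAV audio track from every .mp4 in a directory.
# Usage: extract_audio_dir [directory]   (defaults to the current directory)
# NOTE: extract_audio_dir is a hypothetical helper, not part of ffmpeg.
extract_audio_dir() {
    dir="${1:-.}"
    for video in "$dir"/*.mp4; do
        # Skip the unexpanded pattern when the directory holds no .mp4 files
        [ -e "$video" ] || continue
        audio="${video%.mp4}.wav"
        # -vn drops the video stream; the audio becomes 16-bit PCM, 44.1 kHz, stereo
        ffmpeg -nostdin -y -i "$video" -vn -acodec pcm_s16le -ar 44100 -ac 2 "$audio"
    done
}
```

Calling extract_audio_dir ~/videos would leave a .wav next to each .mp4 in that directory.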

— 2. upload audio to S3 bucket
aws s3 cp output.wav s3://transcribe-for-canvas

 

— 3. extract the text from the audio through transcription
aws transcribe start-transcription-job --transcription-job-name canvascaptions --media MediaFileUri=s3://transcribe-for-canvas/output.wav --output-bucket-name transcribe-for-canvas --subtitles Formats=srt --language-code en-US --region us-east-1
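
Transcription job names have to be unique within an AWS account, so with several audio files each one needs its own job name. A small wrapper along these lines could help. It is a sketch only: start_caption_job is a hypothetical name, and the bucket, language, and region defaults are simply copied from the command above.

```shell
# Start one transcription job with SRT subtitles for a file already in S3.
# Usage: start_caption_job <job-name> <audio-file-in-bucket> [bucket]
# NOTE: start_caption_job is a hypothetical helper, not an AWS CLI command.
start_caption_job() {
    job="$1"
    file="$2"
    bucket="${3:-transcribe-for-canvas}"
    aws transcribe start-transcription-job \
        --transcription-job-name "$job" \
        --media "MediaFileUri=s3://$bucket/$file" \
        --output-bucket-name "$bucket" \
        --subtitles Formats=srt \
        --language-code en-US \
        --region us-east-1
}
```

For example, start_caption_job lecture01 lecture01.wav would start a job named lecture01 for that file.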

— 4. check on the transcription progress
aws transcribe get-transcription-job --transcription-job-name canvascaptions
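
Rather than re-running step 4 by hand, the status check could be polled until the job finishes. This is a rough sketch: wait_for_job is a made-up name, the 15-second default interval is arbitrary, and the region is assumed to match the one used in step 3. It relies on the CLI's --query and --output text options to pull out just the TranscriptionJobStatus field.

```shell
# Poll a transcription job until it completes or fails.
# Usage: wait_for_job <job-name> [seconds-between-checks]
# NOTE: wait_for_job is a hypothetical helper, not an AWS CLI command.
wait_for_job() {
    job="$1"
    interval="${2:-15}"
    while :; do
        # Ask only for the status string (QUEUED, IN_PROGRESS, COMPLETED or FAILED)
        status=$(aws transcribe get-transcription-job \
            --transcription-job-name "$job" \
            --region us-east-1 \
            --query 'TranscriptionJob.TranscriptionJobStatus' \
            --output text) || return 1
        echo "$job: $status"
        case "$status" in
            COMPLETED) return 0 ;;
            FAILED)    return 1 ;;
        esac
        sleep "$interval"
    done
}
```

Running wait_for_job canvascaptions prints the status on each check and returns once the job is done.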

 

— 5. download the transcription files
aws s3 cp s3://transcribe-for-canvas . --recursive
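
A recursive copy of the whole bucket also brings back the uploaded audio and the JSON transcript. If only the subtitles are wanted, the copy could be filtered, as in this sketch. The name download_subtitles is hypothetical, and it assumes the subtitle files end in .srt, as requested in step 3.

```shell
# Download only the .srt subtitle files from the output bucket.
# Usage: download_subtitles <bucket> [destination-directory]
# NOTE: download_subtitles is a hypothetical helper, not an AWS CLI command.
download_subtitles() {
    bucket="$1"
    dest="${2:-.}"
    # --exclude "*" filters everything out, then --include "*.srt" adds the subtitles back
    aws s3 cp "s3://$bucket" "$dest" --recursive --exclude "*" --include "*.srt"
}
```

For example, download_subtitles transcribe-for-canvas captions would place the .srt files in a local captions directory.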

 
