Problem:
While working with Google ExoPlayer for Android, I ran into an issue with the display timing of in-band CEA608 closed captions: some captions stayed on screen for too long. These captions are embedded in the ts chunks fetched from the server when consuming HLS streams, which makes their expected behavior (start time, end time, etc.) difficult to debug because the caption metadata isn’t human-readable. Here’s the approach I took to extract the CEA608 captions and their metadata in order to find the root cause of my issue.
Prerequisites:
- Bash/Unix
- Git
- ffmpeg
- curl
For this guide I’ll be using Apple’s bipbop 16×9 HLS stream as an example. It contains a single CEA608 captions track from an old cpcweb/demo which appears to be no longer available. For every chunk, this stream requests the same ts URL (https://devimages.apple.com.edgekey.net/streaming/examples/bipbop_16x9/gear4/main.ts) with a “Range” header; the server reads the range and returns the corresponding ts chunk. Modify the following guide as needed for your stream.
By examining the gear 4 m3u8 playlist referenced by the master m3u8 playlist, I determined the byte ranges of the first two ts chunks:
#EXTM3U
#EXT-X-TARGETDURATION:11
#EXT-X-VERSION:4
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:9.9766,
#EXT-X-BYTERANGE:1210156@0
main.ts
#EXTINF:9.9433,
#EXT-X-BYTERANGE:1190040@1210156
main.ts
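Each EXT-X-BYTERANGE value has the form length@offset, while an HTTP Range header wants an inclusive start and end byte. The conversion can be sketched as below. The helper name byterange_to_range is made up for this guide, and the sketch assumes the offset is always present (the HLS format also allows it to be omitted).

```shell
# byterange_to_range: hypothetical helper, not part of any tool.
# Converts "<length>@<offset>" into "bytes=<start>-<end>" (end is inclusive).
byterange_to_range() {
  br_length="${1%@*}"   # text before the '@'
  br_offset="${1#*@}"   # text after the '@'
  echo "bytes=${br_offset}-$((br_offset + br_length - 1))"
}

byterange_to_range "1210156@0"         # prints bytes=0-1210155
byterange_to_range "1190040@1210156"   # prints bytes=1210156-2400195
```

The end value is offset + length - 1 because both ends of an HTTP byte range are inclusive.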
Switch to a working directory where new files can safely be created.
Store the byte ranges in a file, then use curl to fetch each ts chunk from the server:
# Store a reference to each desired chunk in a file (one range per line)
printf "bytes=0-1210155\nbytes=1210156-2400195\n" > ranges
# Fetch each range and save it as <range>.ts
for f in $(cat ranges); do curl -H "Range: $f" 'https://devimages.apple.com.edgekey.net/streaming/examples/bipbop_16x9/gear4/main.ts' > "$f.ts"; done
Now we have the first two chunks in our working directory saved as:
bytes=0-1210155.ts
bytes=1210156-2400195.ts
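If a server ignores the Range header it may return the whole file instead of one chunk, so it can be worth checking that each file has the size its name implies. The following is a rough sketch; expected_size and check_chunk are hypothetical helper names, and it assumes the files are named bytes=START-END.ts as above.

```shell
# expected_size: hypothetical helper; byte count implied by "bytes=START-END".
expected_size() {
  es_range="${1#bytes=}"
  es_start="${es_range%-*}"
  es_end="${es_range#*-}"
  echo $((es_end - es_start + 1))
}

# check_chunk: hypothetical helper; compares a chunk's real size with the expected one.
check_chunk() {
  cc_file="$1"
  cc_range=$(basename "$cc_file" .ts)
  cc_want=$(expected_size "$cc_range")
  cc_have=$(wc -c < "$cc_file" | tr -d ' ')
  if [ "$cc_have" -eq "$cc_want" ]; then
    echo "ok $cc_file"
  else
    echo "MISMATCH $cc_file: expected $cc_want bytes, got $cc_have"
  fi
}

# Check every downloaded chunk in the current directory, if any exist.
for f in bytes=*.ts; do
  if [ -e "$f" ]; then check_chunk "$f"; fi
done
```

For the second chunk the expected size is 2400195 - 1210156 + 1 = 1190040 bytes, which matches the length given in the playlist.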
Next we’ll combine the ts chunks into a single video file. Create a file list containing all the ts chunks which we just fetched:
for f in *.ts; do echo "file '$f'" >> tslist.txt; done
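One caveat: *.ts expands in lexicographic order, which happens to be correct for these two chunks but can misorder a longer list (a name starting with bytes=10000000 sorts before one starting with bytes=2400196). A possible workaround, assuming the bytes=START-END.ts naming used here, is to sort by the numeric start offset. list_chunks_in_order is a hypothetical helper name.

```shell
# list_chunks_in_order: hypothetical helper.
# Reads chunk file names on stdin and prints ffmpeg concat entries
# ordered by the numeric start offset that follows "bytes=".
list_chunks_in_order() {
  sort -t= -k2,2n | while IFS= read -r f; do
    printf "file '%s'\n" "$f"
  done
}

# Example with three names given out of order:
printf '%s\n' 'bytes=10000000-10999999.ts' 'bytes=0-1210155.ts' 'bytes=1210156-2400195.ts' \
  | list_chunks_in_order
```

With real files, something like ls bytes=*.ts | list_chunks_in_order > tslist.txt would produce the list in playback order.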
Concatenate the files into a single ts video file using ffmpeg (see the ffmpeg website for installation details):
ffmpeg -f concat -safe 0 -i tslist.txt -c copy output.ts
Take a moment to verify that our new video output.ts plays as expected using a video player such as VLC.
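The CEA608 closed captions can now be extracted from output.ts. Check out the CCExtractor tool from http://www.ccextractor.org/ and change into its linux directory:
git clone https://github.com/CCExtractor/ccextractor.git
cd ccextractor/linux
# If there is no ccextractor binary here yet, build it first (the linux directory provides a build script: ./build)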
Extract the captions with ccextractor using its default configuration (adjust the path to output.ts if it is not in the current directory):
./ccextractor output.ts
With the default configuration, ccextractor generates an output.srt file in the working directory containing the CEA608 captions and their timing metadata:
1
00:00:01,935 --> 00:00:07,906
♪MUSIC♪
2
00:00:07,941 --> 00:00:09,508
I'M AT THE LEFT
OF THE SCREEN.
3
00:00:09,543 --> 00:00:10,968
SO CAPTIONS
OF WHAT I SAY
4
00:00:11,003 --> 00:00:13,004
APPEAR AT THE LEFT
OF THE SCREEN, TOO.
5
00:00:14,473 --> 00:00:16,307
NOW I'M AT THE RIGHT
OF THE SCREEN,
6
00:00:16,342 --> 00:00:18,276
SO MY CAPTIONS APPEAR
AT THE RIGHT.
7
00:00:19,545 --> 00:00:19,977
NOW I AM OFF SCREEN.
(END)
CCExtractor has many configuration options; see the CCExtractor documentation for the full list.
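Since the original problem was captions staying on screen too long, it can help to turn the .srt timestamps into durations. Here is a small sketch using awk; cue_durations is a hypothetical helper name, and it assumes timestamps in the HH:MM:SS,mmm --> HH:MM:SS,mmm form shown above.

```shell
# cue_durations: hypothetical helper.
# Reads SRT text on stdin and prints "start end duration_ms" for each cue.
cue_durations() {
  awk '
    function to_ms(t,  p) {            # "HH:MM:SS,mmm" -> milliseconds
      split(t, p, "[:,]")
      return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
    }
    { sub(/\r$/, "") }                 # tolerate CRLF line endings
    $2 == "-->" { print $1, $3, to_ms($3) - to_ms($1) }
  '
}

# Run against the extracted captions if the file is present.
if [ -f output.srt ]; then
  cue_durations < output.srt
fi
```

For the first cue above this gives 7906 - 1935 = 5971 ms. Piping the result through sort -k3,3nr lists the longest-lived captions first, which is a quick way to spot cues whose end time looks suspicious.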