Problem:
While working with Google ExoPlayer for Android I ran into an issue with the display timing of in-band CEA608 closed captions: some of the captions were being displayed for too long. These captions are embedded in the ts chunks fetched from servers when consuming HLS streams, which makes their expected behavior (start time, end time, etc.) difficult to debug because the caption metadata isn’t human readable. Here’s the approach I took to extract the CEA608 captions and their metadata in order to debug the root cause of my issue.
Prerequisites:
- Bash/Unix
- Git
- ffmpeg
- curl
For this guide I’ll be using Apple’s “bip bop” 16×9 HLS example stream. It contains a single CEA608 captions track from an old cpcweb demo which appears to be no longer available. Every chunk of a given variant in this stream is requested from the same ts URL (https://devimages.apple.com.edgekey.net/streaming/examples/bipbop_16x9/gear4/main.ts) using a “Range” header; the server reads the range and returns the corresponding ts chunk. Modify the following guide as needed for your stream.
By examining the gear 4 m3u8 playlist referenced from the master m3u8 playlist, I determined the byte ranges of the first two ts chunks:
    #EXTM3U
    #EXT-X-TARGETDURATION:11
    #EXT-X-VERSION:4
    #EXT-X-MEDIA-SEQUENCE:0
    #EXT-X-PLAYLIST-TYPE:VOD
    #EXTINF:9.9766,
    #EXT-X-BYTERANGE:1210156@0
    main.ts
    #EXTINF:9.9433,
    #EXT-X-BYTERANGE:1190040@1210156
    main.ts
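Each EXT-X-BYTERANGE value has the form length@offset, while an HTTP Range header wants an inclusive start and end, so the end byte is offset + length - 1. A minimal sketch of that arithmetic follows; byterange_to_range is a hypothetical helper name, and it assumes the offset is always written out explicitly, as it is in this playlist.

```shell
# Convert an EXT-X-BYTERANGE value ("<length>@<offset>") into an HTTP Range value.
# byterange_to_range is a hypothetical helper, not part of curl or ffmpeg.
# Assumption: the "@<offset>" part is always present.
byterange_to_range() {
  spec=$1
  length=${spec%@*}   # text before the "@"
  offset=${spec#*@}   # text after the "@"
  # HTTP ranges are inclusive, so the last byte is offset + length - 1
  echo "bytes=${offset}-$((offset + length - 1))"
}

byterange_to_range 1210156@0        # prints bytes=0-1210155
byterange_to_range 1190040@1210156  # prints bytes=1210156-2400195
```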
Switch to a working directory where new files can safely be created.
Store the ranges and use them to curl the ts chunks from the server:
    # Store the Range header value for each desired chunk in a file, one per line
    printf 'bytes=0-1210155\nbytes=1210156-2400195\n' > ranges
    # Fetch each chunk and save it under the name of its range
    while read -r f; do
      curl -H "Range: $f" 'https://devimages.apple.com.edgekey.net/streaming/examples/bipbop_16x9/gear4/main.ts' > "$f.ts"
    done < ranges
Now we have the first two chunks in our working directory saved as:
    bytes=0-1210155.ts
    bytes=1210156-2400195.ts
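Before going further it can be worth confirming that each saved file really has the size its byte range implies, since a server that ignores the Range header may return the whole file instead. The following is a rough sketch; expected_size is a hypothetical helper, and it assumes the files are named bytes=<start>-<end>.ts as produced by the curl loop.

```shell
# Compute the number of bytes a file named "bytes=<start>-<end>.ts" should contain.
# expected_size is a hypothetical helper name used only for this check.
expected_size() {
  name=${1##*/}          # drop any directory part
  name=${name#bytes=}    # drop the "bytes=" prefix
  name=${name%.ts}       # drop the ".ts" suffix, leaving "<start>-<end>"
  start=${name%-*}
  end=${name#*-}
  echo $((end - start + 1))   # ranges are inclusive
}

for f in bytes=*.ts; do
  [ -e "$f" ] || continue     # nothing downloaded yet: skip
  actual=$(wc -c < "$f" | tr -d ' ')
  if [ "$actual" -eq "$(expected_size "$f")" ]; then
    echo "OK   $f ($actual bytes)"
  else
    echo "FAIL $f: got $actual bytes, expected $(expected_size "$f")"
  fi
done
```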
Next we’ll combine the ts chunks into a single video file. Create a file list containing all the ts chunks which we just fetched:
    for f in *.ts; do echo "file '$f'" >> tslist.txt; done
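The shell expands *.ts in lexical order, which happens to match playback order for these two chunks but would not once start offsets differ in digit count (for example, bytes=10000000-... sorts before bytes=1210156-...). If you fetch more chunks, one possible approach is to sort by the numeric start offset. This is only a sketch: make_tslist is a hypothetical helper, and it assumes the bytes=<start>-<end>.ts naming used above.

```shell
# Print ffmpeg concat-list lines for all "bytes=<start>-<end>.ts" files in the
# current directory, ordered by numeric start offset rather than lexically.
# make_tslist is a hypothetical helper name.
make_tslist() {
  for f in bytes=*.ts; do
    [ -e "$f" ] || continue          # no chunks present: emit nothing
    start=${f#bytes=}                # strip the prefix
    start=${start%%-*}               # keep only "<start>"
    printf '%s\t%s\n' "$start" "$f"  # "<start><TAB><file name>"
  done | sort -n | cut -f2- | while IFS= read -r f; do
    printf "file '%s'\n" "$f"
  done
}

# Overwrites tslist.txt instead of appending, so reruns do not duplicate entries
make_tslist > tslist.txt
```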
Concatenate the files into a single ts video file using ffmpeg (available from the ffmpeg website):
    ffmpeg -f concat -safe 0 -i tslist.txt -c copy output.ts
Take a moment to verify that our new video, output.ts, plays as expected using a video player such as VLC.
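If no video player is at hand, a cheap structural check is also possible: an MPEG transport stream is a sequence of 188-byte packets that each begin with the sync byte 0x47. The sketch below only tests the file size and the first byte, so it is a rough sanity check rather than a real validation; looks_like_ts is a hypothetical helper name.

```shell
# Rough check that a file looks like an MPEG transport stream:
# non-empty, a whole number of 188-byte packets, and first byte 0x47.
# looks_like_ts is a hypothetical helper; it does not inspect later packets.
looks_like_ts() {
  size=$(wc -c < "$1" | tr -d ' ')
  first=$(head -c 1 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$size" -gt 0 ] && [ $((size % 188)) -eq 0 ] && [ "$first" = "47" ]
}

if [ -f output.ts ]; then
  if looks_like_ts output.ts; then
    echo "output.ts looks like a transport stream"
  else
    echo "output.ts does not look like a transport stream"
  fi
fi
```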
The CEA608 closed captions can now be extracted from output.ts. Check out the CCExtractor tool from http://www.ccextractor.org/:
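    git clone https://github.com/CCExtractor/ccextractor.git
    cd ccextractor/linux
    # Build the binary (assumes the repository's linux/build script and its build dependencies are available)
    ./build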
Using the default configuration, extract the captions with ccextractor (adjust the path if output.ts is not in the current directory):
    ./ccextractor output.ts
With the default configuration, ccextractor generates an output.srt file containing the desired CEA608 captions metadata in the working directory:
    1
    00:00:01,935 --> 00:00:07,906
    ♪MUSIC♪

    2
    00:00:07,941 --> 00:00:09,508
    I'M AT THE LEFT OF THE SCREEN.

    3
    00:00:09,543 --> 00:00:10,968
    SO CAPTIONS OF WHAT I SAY

    4
    00:00:11,003 --> 00:00:13,004
    APPEAR AT THE LEFT OF THE SCREEN, TOO.

    5
    00:00:14,473 --> 00:00:16,307
    NOW I'M AT THE RIGHT OF THE SCREEN,

    6
    00:00:16,342 --> 00:00:18,276
    SO MY CAPTIONS APPEAR AT THE RIGHT.

    7
    00:00:19,545 --> 00:00:19,977
    NOW I AM OFF SCREEN.
    (END)
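Since the original problem was captions staying on screen for too long, it helps to turn each start and end timestamp into a duration that can be compared against what the player shows. The following is a small sketch using awk; srt_durations is a hypothetical helper name, and it assumes standard SRT timing lines of the form HH:MM:SS,mmm --> HH:MM:SS,mmm.

```shell
# Print the start time, end time, and duration in milliseconds of every cue
# in an SRT file. srt_durations is a hypothetical helper name.
srt_durations() {
  awk '
    # Convert "HH:MM:SS,mmm" to milliseconds
    function to_ms(t,   p) {
      split(t, p, /[:,]/)
      return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
    }
    { sub(/\r$/, "") }   # tolerate Windows-style line endings
    / --> / {
      printf "%s -> %s  (%d ms)\n", $1, $3, to_ms($3) - to_ms($1)
    }
  ' "$1"
}

if [ -f output.srt ]; then
  srt_durations output.srt
fi
```

For the first cue above, 00:00:01,935 to 00:00:07,906, this reports a duration of 5971 ms.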
CCExtractor has many configuration options; see the documentation on the CCExtractor website for the full list.