How to extract CEA608 Inband closed captions from HLS video ts chunks
by Riley MacDonald, June 20, 2017

Problem:
While working with Google ExoPlayer for Android I ran into an issue with the display timing of in-band CEA608 closed captions: some captions stayed on screen for too long. These captions are embedded in the ts chunks fetched from the server when consuming an HLS stream, so their metadata (start time, end time, etc.) isn’t human readable, which makes the expected behavior hard to debug. Here’s the approach I took to extract the CEA608 captions and their metadata in order to find the root cause of my issue.

Prerequisites:

  • Bash/Unix
  • Git
  • ffmpeg

For this guide I’ll use Apple’s bipbop 16×9 HLS example stream. It contains a single CEA608 captions track from an old cpcweb/demo which appears to be no longer available. Every chunk in this stream is requested from the same ts URL (https://devimages.apple.com.edgekey.net/streaming/examples/bipbop_16x9/gear4/main.ts) using a “Range” header; the server reads the range and returns the corresponding ts chunk. Modify the following guide as needed for your stream.

By examining the gear4 m3u8 playlist referenced from the master m3u8 playlist, I determined the byte ranges of the first two ts chunks:

#EXTM3U
#EXT-X-TARGETDURATION:11
#EXT-X-VERSION:4
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:9.9766,
#EXT-X-BYTERANGE:1210156@0
main.ts
#EXTINF:9.9433,
#EXT-X-BYTERANGE:1190040@1210156
main.ts
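
Each EXT-X-BYTERANGE value has the form length@offset, while an HTTP Range header takes an inclusive first and last byte. The conversion can be sketched as below. The function name byterange_to_range is hypothetical, and the sketch assumes the offset is always present, as it is in this playlist:

```shell
# Convert an EXT-X-BYTERANGE value ("<length>@<offset>") into an HTTP Range
# header value ("bytes=<first>-<last>"). HTTP ranges are inclusive, so the
# last byte is offset + length - 1.
# Assumption: the "@<offset>" part is present (the tag allows omitting it).
byterange_to_range() {
  local spec="$1"
  local length="${spec%@*}"   # text before the "@"
  local offset="${spec#*@}"   # text after the "@"
  printf 'bytes=%d-%d\n' "$offset" "$((offset + length - 1))"
}

byterange_to_range '1210156@0'        # bytes=0-1210155
byterange_to_range '1190040@1210156'  # bytes=1210156-2400195
```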

Switch to a working directory where new files can safely be created.
Store the byte ranges in a file, then use curl to fetch each ts chunk from the server:

# Store the byte range of each desired chunk in a file, one per line
printf 'bytes=0-1210155\nbytes=1210156-2400195\n' > ranges
# Fetch each chunk with a Range header and save it as <range>.ts
while read -r f; do curl -H "Range: $f" 'https://devimages.apple.com.edgekey.net/streaming/examples/bipbop_16x9/gear4/main.ts' > "$f.ts"; done < ranges
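
If a server ignores the Range header it returns the whole file instead of a single chunk, so it is worth checking the size of each download. This is a rough sketch. The helper names expected_size and check_chunk are made up for illustration, and it assumes each file is named after its range, as above:

```shell
# Number of bytes an inclusive range such as "bytes=0-1210155" should contain.
expected_size() {
  local range="${1#bytes=}"   # drop the "bytes=" prefix
  local first="${range%-*}"   # number before the "-"
  local last="${range#*-}"    # number after the "-"
  echo "$((last - first + 1))"
}

# Compare a downloaded file's size with the size implied by its range.
# Usage: check_chunk <range> [file]   (file defaults to "<range>.ts")
check_chunk() {
  local range="$1"
  local file="${2:-$1.ts}"
  local actual
  actual=$(wc -c < "$file" | tr -d ' ')
  if [ "$actual" -eq "$(expected_size "$range")" ]; then
    echo "OK $file"
  else
    echo "MISMATCH $file: got $actual bytes" >&2
    return 1
  fi
}
```

For example, `while read -r r; do check_chunk "$r"; done < ranges` checks every chunk listed in the ranges file.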

Now we have the first two chunks in our working directory saved as:

bytes=0-1210155.ts
bytes=1210156-2400195.ts

Next we’ll combine the ts chunks into a single video file. Create a file list containing all the ts chunks which we just fetched:

for f in bytes=*.ts; do echo "file '$f'"; done > tslist.txt
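
A shell glob sorts names lexicographically, which is fine for these two chunks but would misorder longer lists (a file starting at byte 10000000 sorts before one starting at byte 2400196). One possible variant orders the list by numeric start offset. The function name write_ts_list is hypothetical and it assumes the bytes=<first>-<last>.ts naming used above:

```shell
# Print ffmpeg concat-list lines for chunk files named
# "bytes=<first>-<last>.ts", ordered by the numeric value of <first>.
# Usage: write_ts_list [dir] > tslist.txt
write_ts_list() {
  local dir="${1:-.}"
  local f name first
  for f in "$dir"/bytes=*.ts; do
    [ -e "$f" ] || continue        # skip the literal pattern if nothing matched
    name="${f##*/}"                # strip the directory part
    first="${name#bytes=}"         # drop the "bytes=" prefix
    first="${first%%-*}"           # keep only the start offset
    printf '%s\t%s\n' "$first" "$name"
  done | sort -n | cut -f2- | while IFS= read -r name; do
    printf "file '%s'\n" "$name"
  done
}
```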

Concatenate the files into a single ts video file using ffmpeg (available from the ffmpeg website):

ffmpeg -f concat -safe 0 -i tslist.txt -c copy output.ts

Take a moment to verify that our new video output.ts plays as expected using a video player such as VLC.
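
Alongside playing the file, a cheap structural check can catch a truncated or misaligned download. The sketch below assumes standard 188-byte MPEG-TS packets, each starting with the sync byte 0x47 (some containers use other packet sizes). The function name check_ts is made up for illustration, and this is not a substitute for actually watching the video:

```shell
# Sanity-check an MPEG-TS file: it should start with the sync byte 0x47
# and its size should be a whole number of 188-byte packets.
# Assumption: plain 188-byte packets (not 192- or 204-byte variants).
check_ts() {
  local file="$1"
  local size first
  size=$(wc -c < "$file" | tr -d ' ')
  first=$(od -An -tx1 -N1 "$file" | tr -d ' \n')   # first byte as hex
  if [ "$first" != "47" ]; then
    echo "$file: first byte is 0x$first, expected sync byte 0x47" >&2
    return 1
  fi
  if [ $((size % 188)) -ne 0 ]; then
    echo "$file: size $size is not a multiple of 188" >&2
    return 1
  fi
  echo "$file: $((size / 188)) TS packets"
}
```

Running `check_ts output.ts` prints the packet count when both checks pass. The same check applies to the individual chunks, since both byte ranges in the playlist are exact multiples of 188.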

The CEA608 closed captions can now be extracted from output.ts. Check out the CCExtractor tool from http://www.ccextractor.org/:

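git clone https://github.com/CCExtractor/ccextractor.git
cd ccextractor/linux
# Build the ccextractor binary using the build script in this directory
./build

Extract the captions using ccextractor with its default configuration, pointing it at the output.ts created earlier in the working directory (two levels up from ccextractor/linux, assuming the repository was cloned there):

./ccextractor ../../output.ts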

With the default configuration, ccextractor generates an output.srt file in the working directory containing the desired CEA608 captions and their timing metadata:

1
00:00:01,935 --> 00:00:07,906
             ♪MUSIC♪            

2
00:00:07,941 --> 00:00:09,508
I'M AT THE LEFT                 
OF THE SCREEN.                  

3
00:00:09,543 --> 00:00:10,968
SO CAPTIONS                     
OF WHAT I SAY                   

4
00:00:11,003 --> 00:00:13,004
APPEAR AT THE LEFT              
OF THE SCREEN, TOO.             

5
00:00:14,473 --> 00:00:16,307
           NOW I'M AT THE RIGHT 
                 OF THE SCREEN, 

6
00:00:16,342 --> 00:00:18,276
          SO MY CAPTIONS APPEAR 
                  AT THE RIGHT. 

7
00:00:19,545 --> 00:00:19,977
       NOW I AM OFF SCREEN.     

(END)
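
Since the original problem was captions staying on screen too long, it helps to turn each timing line into a duration. A minimal sketch using awk follows. The function name srt_durations is hypothetical and it assumes the HH:MM:SS,mmm --> HH:MM:SS,mmm timing format shown above:

```shell
# Print each caption's start, end and on-screen duration in milliseconds.
# Usage: srt_durations output.srt
srt_durations() {
  awk '
    # Convert "HH:MM:SS,mmm" to milliseconds.
    function to_ms(t,   p) {
      split(t, p, /[:,]/)
      return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
    }
    # Defensive: drop a trailing carriage return if the file uses CRLF.
    { sub(/\r$/, "") }
    # Timing lines have "-->" as their second field.
    $2 == "-->" {
      printf "%s --> %s  %d ms\n", $1, $3, to_ms($3) - to_ms($1)
    }
  ' "$1"
}
```

For the output above, the first caption (00:00:01,935 to 00:00:07,906) comes out at 5971 ms. An unusually large value points to the caption whose end time deserves a closer look.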

CCExtractor has many configuration options; see the documentation at http://www.ccextractor.org/ for the full list.
