Streaming Technology
FFmpeg Series #5

Container Formats and Streaming Protocols: MP4, MKV, HLS, and DASH

Master the art of multimedia encapsulation and delivery. From the ISO Base Media File Format to adaptive bitrate streaming, understand how modern platforms package and distribute content at scale.

JewelMusic Engineering Team
February 8, 2025
23 min read

Understanding Container Formats

Once audio and video are compressed into elementary streams, they must be packaged for storage or transmission. Container formats define how to store multiple streams of data—video, audio, subtitles, metadata—in a single file, handling their interleaving and synchronization. Streaming protocols then define how to deliver this data over networks for real-time playback.

Container vs Codec

Container Format

  • Defines file structure and metadata
  • Multiplexes multiple streams
  • Handles synchronization
  • Examples: MP4, MKV, WebM, AVI

Codec

  • Compresses/decompresses data
  • Defines encoding algorithm
  • Stream-specific (audio/video)
  • Examples: H.264, AAC, VP9, Opus

The ISO Base Media File Format (MP4)

The ISO Base Media File Format (ISOBMFF), formally standardized as ISO/IEC 14496-12, is a foundational specification for modern container formats, most notably MP4. Derived from Apple's QuickTime format, it provides a flexible and extensible structure for storing time-based multimedia data.

MP4 Box Structure
MP4 File Structure (Hierarchical Boxes):

┌─────────────────────────────────────────┐
│ ftyp (File Type Box)                    │
│  - Brand: mp42, isom                    │
│  - Compatible brands                    │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│ moov (Movie Box)                        │
│ ├─ mvhd (Movie Header)                  │
│ │   - Duration, timescale, creation     │
│ ├─ trak (Track Box) - Video             │
│ │ ├─ tkhd (Track Header)                │
│ │ ├─ mdia (Media Box)                   │
│ │ │ ├─ mdhd (Media Header)              │
│ │ │ ├─ hdlr (Handler Reference)         │
│ │ │ └─ minf (Media Information)         │
│ │ │   ├─ vmhd (Video Media Header)      │
│ │ │   ├─ dinf (Data Information)        │
│ │ │   └─ stbl (Sample Table)            │
│ │ │     ├─ stsd (Sample Description)    │
│ │ │     ├─ stts (Time to Sample)        │
│ │ │     ├─ stco (Chunk Offset)          │
│ │ │     └─ stsz (Sample Size)           │
│ ├─ trak (Track Box) - Audio             │
│ │ └─ [Similar structure]                │
│ └─ udta (User Data)                     │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│ mdat (Media Data Box)                   │
│  - Interleaved audio/video samples      │
│  - Referenced by moov index tables      │
└─────────────────────────────────────────┘
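The box layout above can be walked with a few lines of code: every box starts with a 32-bit big-endian size and a four-character type, and a size of 1 signals that a 64-bit size follows. A minimal sketch (an illustration of the framing only, not a full ISOBMFF parser):

```python
import struct

def walk_boxes(data, offset=0, end=None):
    """Yield (box_type, size, payload_offset) for each box at one level.

    Size 0 (box extends to end of file) is not handled in this sketch.
    """
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, = struct.unpack(">I", data[offset:offset + 4])
        box_type = data[offset + 4:offset + 8].decode("ascii")
        if size == 1:  # 64-bit "largesize" follows the type field
            size, = struct.unpack(">Q", data[offset + 8:offset + 16])
            payload = offset + 16
        else:
            payload = offset + 8
        yield box_type, size, payload
        offset += size

# Build a tiny schematic file: an ftyp box plus an empty mdat box
ftyp = struct.pack(">I4s", 24, b"ftyp") + b"isom" + struct.pack(">I", 0) + b"isommp42"
mdat = struct.pack(">I4s", 8, b"mdat")
sample = ftyp + mdat

print([(t, s) for t, s, _ in walk_boxes(sample)])
# → [('ftyp', 24), ('mdat', 8)]
```

The same loop, applied recursively to each payload, is how tools like MP4Box print the nested moov hierarchy.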

Key Features

  • Object-oriented design
  • Extensible metadata
  • Fast random access
  • Industry standard

Limitations

  • Patent encumbered
  • Limited codec support
  • No native chapters
  • Complex structure

Common Uses

  • Streaming platforms
  • Mobile devices
  • DSLR cameras
  • Web video

Matroska (MKV): The Open Alternative

The Matroska (MKV) format is a free, open-standard container designed with flexibility, extensibility, and long-term viability in mind. Unlike MP4, which is governed by ISO standards, Matroska is a royalty-free project that can hold an unlimited number of video, audio, picture, and subtitle tracks in a single file.

Matroska EBML Structure
Matroska/EBML Structure:

┌─────────────────────────────────────────┐
│ EBML Header                             │
│  - EBML Version                         │
│  - DocType: "matroska"                  │
│  - DocTypeVersion                       │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│ Segment                                 │
│ ├─ SeekHead (Index)                     │
│ │   └─ Seek entries to other elements   │
│ ├─ Info                                 │
│ │   ├─ TimecodeScale                    │
│ │   ├─ Duration                         │
│ │   └─ MuxingApp/WritingApp             │
│ ├─ Tracks                               │
│ │   └─ TrackEntry (per stream)          │
│ │       ├─ TrackNumber                  │
│ │       ├─ TrackType (video/audio/sub)  │
│ │       ├─ CodecID                      │
│ │       └─ Video/Audio settings         │
│ ├─ Chapters                             │
│ │   └─ EditionEntry                     │
│ │       └─ ChapterAtom                  │
│ ├─ Attachments                          │
│ │   └─ AttachedFile (fonts, covers)     │
│ ├─ Tags                                 │
│ │   └─ Tag (metadata)                   │
│ └─ Cluster (multiple)                   │
│     ├─ Timecode                         │
│     └─ SimpleBlock/BlockGroup           │
│         └─ Block (actual media data)    │
└─────────────────────────────────────────┘

EBML (Extensible Binary Meta Language):
• XML-like structure in binary
• Self-describing format
• Forward/backward compatible
• Unknown elements safely skipped
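EBML's self-describing property rests on variable-length integers (VINTs): the count of leading zero bits in the first byte tells the reader how many bytes a value occupies, so a parser can measure and skip elements it does not understand. A minimal decoder sketch:

```python
def read_vint(data, offset=0, keep_marker=False):
    """Decode one EBML variable-length integer starting at offset.

    The leading zero bits of the first byte give the total length.
    For element sizes the length-marker bit is stripped; for element
    IDs it is conventionally kept. Returns (value, total_length).
    """
    first = data[offset]
    length = 1
    mask = 0x80
    while length <= 8 and not (first & mask):
        length += 1
        mask >>= 1
    value = first if keep_marker else first & (mask - 1)
    for b in data[offset + 1:offset + length]:
        value = (value << 8) | b
    return value, length

# The EBML header element ID 0x1A45DFA3 (4-byte ID, marker kept)
ebml_id, n = read_vint(bytes([0x1A, 0x45, 0xDF, 0xA3]), keep_marker=True)
print(hex(ebml_id), n)  # → 0x1a45dfa3 4

# A 1-byte size field: 0x81 encodes the value 1
size, n = read_vint(bytes([0x81]))
print(size, n)          # → 1 1
```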

MKV Advantages

  • Universal codec support
  • Multiple audio/subtitle tracks
  • Chapter markers and menus
  • Attachments (fonts, images)
  • Error recovery capability
  • Streaming support
  • No licensing fees
  • Extensible metadata

Other Important Container Formats

WebM

Subset of Matroska for the web:

  • Video: VP8, VP9, AV1 only
  • Audio: Vorbis, Opus only
  • Optimized for web streaming
  • Royalty-free and open
  • Native browser support

MPEG-TS (Transport Stream)

Designed for transmission over unreliable networks:

  • 188-byte fixed packets
  • Error correction support
  • Used in broadcast TV, HLS
  • Self-synchronizing
  • No file-level index needed

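Those fixed 188-byte packets are what make TS self-synchronizing: every packet begins with the sync byte 0x47, and the 13-bit PID identifying the stream sits in the next two bytes. A schematic header parser (a sketch that ignores adaptation fields and scrambling control):

```python
TS_PACKET_SIZE = 188

def parse_ts_header(packet):
    """Extract key fields from one 188-byte MPEG-TS packet header."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != 0x47:
        raise ValueError("not a valid TS packet")
    return {
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],   # 13-bit packet ID
        "payload_unit_start": bool(packet[1] & 0x40),   # new PES/section begins here
        "continuity_counter": packet[3] & 0x0F,         # 4-bit per-PID counter
    }

# Schematic packet: sync byte, PID 0x0100, unit-start flag set, counter 7
pkt = bytes([0x47, 0x41, 0x00, 0x17]) + bytes(184)
print(parse_ts_header(pkt))
# → {'pid': 256, 'payload_unit_start': True, 'continuity_counter': 7}
```

A receiver that loses its place in the byte stream simply scans forward for the next 0x47 at a 188-byte stride, which is why no file-level index is needed.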
FLV (Flash Video)

Legacy format still used for live streaming:

  • Simple tag-based structure
  • Low latency streaming
  • RTMP protocol support
  • Limited codec support
  • Being phased out

Fragmented MP4 for Streaming

From Regular MP4 to fMP4
Regular MP4 vs Fragmented MP4:

Regular MP4:
[ftyp][moov][mdat.........................]
         ↑
    Must download entire moov to start playback

Fragmented MP4 (fMP4):
[ftyp][moov]  [moof][mdat]  [moof][mdat]  [moof][mdat] ...
   Init        Fragment 1    Fragment 2    Fragment 3

Structure:
┌─────────────────────────────────────────┐
│ Initialization Segment                  │
│ ├─ ftyp (File Type)                     │
│ └─ moov (Movie Box - no sample data)    │
│     └─ mvex (Movie Extends)             │
│         └─ trex (Track Extends)         │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│ Media Segment (Fragment)                │
│ ├─ styp (Segment Type - optional)       │
│ ├─ moof (Movie Fragment)                │
│ │   ├─ mfhd (Fragment Header)           │
│ │   └─ traf (Track Fragment)            │
│ │       ├─ tfhd (Track Fragment Header) │
│ │       └─ trun (Track Run)             │
│ └─ mdat (Media Data)                    │
└─────────────────────────────────────────┘

Benefits:
• Progressive download/streaming
• Live streaming support
• Adaptive bitrate switching
• Reduced startup latency
• Error resilience

Creating fMP4 with FFmpeg:

# Generate fMP4 for DASH/HLS
ffmpeg -i input.mp4 -c:v h264 -c:a aac \
  -movflags frag_keyframe+empty_moov+default_base_moof \
  -f mp4 output_fragmented.mp4

# Live streaming with fragments
# (faststart is omitted: empty_moov already writes the moov up front,
#  so faststart has no effect on fragmented output)
ffmpeg -i rtmp://source -c copy \
  -f mp4 -movflags frag_keyframe+empty_moov \
  -frag_duration 2000000 output.mp4

HTTP Live Streaming (HLS)

HTTP Live Streaming (HLS), developed by Apple and standardized as IETF RFC 8216, is the dominant protocol for delivering adaptive bitrate video to end-users. Its architecture is elegantly simple and leverages standard web servers and CDNs.

HLS Architecture
HLS Workflow:

1. Source Processing:
   Video → Multiple Bitrates → Segmentation
   
2. File Structure:
   master.m3u8
   ├── 1080p/playlist.m3u8
   │   ├── segment000.ts
   │   ├── segment001.ts
   │   └── segment002.ts
   ├── 720p/playlist.m3u8
   │   └── [segments]
   └── 480p/playlist.m3u8
       └── [segments]

3. Master Playlist (master.m3u8):
#EXTM3U
#EXT-X-VERSION:6
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1400000,RESOLUTION=854x480
480p/playlist.m3u8

4. Media Playlist (1080p/playlist.m3u8):
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
segment000.ts
#EXTINF:10.0,
segment001.ts
#EXTINF:10.0,
segment002.ts
#EXT-X-ENDLIST

HLS Features:
• Adaptive Bitrate (ABR)
• Live and VOD support
• Encryption (AES-128)
• Closed captions
• Multiple audio tracks
• Low Latency HLS (LL-HLS)
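Because playlists are plain text, client logic stays simple; the total duration of a VOD playlist, for instance, is just the sum of its EXTINF values. A sketch using the example media playlist above:

```python
def playlist_duration(m3u8_text):
    """Sum the #EXTINF durations in an HLS media playlist (seconds)."""
    total = 0.0
    for line in m3u8_text.splitlines():
        if line.startswith("#EXTINF:"):
            # "#EXTINF:10.0," → the duration precedes the first comma
            total += float(line[len("#EXTINF:"):].split(",")[0])
    return total

playlist = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
segment000.ts
#EXTINF:10.0,
segment001.ts
#EXTINF:10.0,
segment002.ts
#EXT-X-ENDLIST"""

print(playlist_duration(playlist))  # → 30.0
```

Real players do more (attribute lists, byte ranges, discontinuities), but the text-based format is why HLS clients are easy to build and debug.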

Advantages

  • Uses standard HTTP/HTTPS
  • CDN-friendly
  • Firewall traversal
  • Apple ecosystem support

Limitations

  • Higher latency (10-30s)
  • Apple-centric
  • TS overhead
  • Complex for live

Dynamic Adaptive Streaming over HTTP (DASH)

MPEG-DASH (Dynamic Adaptive Streaming over HTTP), standardized as ISO/IEC 23009-1, is a vendor-neutral international standard for adaptive bitrate streaming. Unlike HLS, which originated at Apple, DASH is codec-agnostic and container-agnostic, offering greater flexibility.

DASH Implementation
DASH Manifest (MPD - Media Presentation Description):

<?xml version="1.0" encoding="UTF-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" 
     type="static"
     mediaPresentationDuration="PT10M"
     minBufferTime="PT2S">
  
  <Period duration="PT10M">
    <!-- Video Adaptation Sets -->
    <AdaptationSet mimeType="video/mp4" 
                   codecs="avc1.42c01e">
      
      <Representation id="1" 
                      bandwidth="5000000" 
                      width="1920" height="1080">
        <SegmentTemplate media="video/1080p/$Number$.m4s"
                        initialization="video/1080p/init.mp4"
                        duration="4000" timescale="1000"/>
      </Representation>
      
      <Representation id="2" 
                      bandwidth="2800000" 
                      width="1280" height="720">
        <SegmentTemplate media="video/720p/$Number$.m4s"
                        initialization="video/720p/init.mp4"
                        duration="4000" timescale="1000"/>
      </Representation>
    </AdaptationSet>
    
    <!-- Audio Adaptation Sets -->
    <AdaptationSet mimeType="audio/mp4" 
                   codecs="mp4a.40.2" lang="en">
      <Representation id="3" bandwidth="128000">
        <SegmentTemplate media="audio/128k/$Number$.m4s"
                        initialization="audio/128k/init.mp4"
                        duration="4000" timescale="1000"/>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

DASH Features:
• ISO standard (no vendor lock-in)
• Codec/container agnostic
• Supports fMP4 and WebM
• Multi-DRM support
• Server push capability
• Common Media Application Format (CMAF)
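With $Number$-based SegmentTemplate addressing, the client computes segment URLs arithmetically instead of reading a list: each segment spans duration/timescale seconds, and numbering begins at startNumber (1 by default). A sketch using the values from the MPD above:

```python
def segment_for_time(t, duration, timescale, start_number=1):
    """Map a presentation time (seconds) to a $Number$ value under
    SegmentTemplate addressing.

    Each segment covers duration/timescale seconds; numbering starts
    at startNumber, which defaults to 1 in the DASH spec.
    """
    seg_seconds = duration / timescale
    return start_number + int(t // seg_seconds)

# From the MPD above: duration=4000, timescale=1000 → 4-second segments
print(segment_for_time(0, 4000, 1000))    # → 1  (e.g. video/1080p/1.m4s)
print(segment_for_time(9.5, 4000, 1000))  # → 3  (segment covering 8-12 s)
```

This is what makes template-based DASH so CDN-friendly: a seek to any time maps directly to a URL without fetching an index first.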

HLS vs DASH Comparison

HLS
  • Text-based playlists (M3U8)
  • TS or fMP4 segments
  • Apple ecosystem
  • Simpler implementation

DASH
  • XML manifest (MPD)
  • Any container format
  • Industry standard
  • More flexible

RTMP: The Legacy Live Protocol

Real-Time Messaging Protocol
RTMP Architecture:

┌──────────────┐    RTMP     ┌──────────────┐
│   Encoder    │─────────────▶│ Media Server │
│  (OBS/FMLE)  │   Port 1935  │   (Nginx)    │
└──────────────┘              └──────┬───────┘
                                     │
                              Transcode/Package
                                     │
                    ┌────────────────┼────────────────┐
                    ▼                ▼                ▼
                   HLS            DASH           WebRTC
                 (HTTP)          (HTTP)          (UDP)
                    │                │                │
                    ▼                ▼                ▼
                 Players          Players        Browsers

RTMP Features:
• Persistent TCP connection
• Low latency (1-3 seconds)
• Chunked message format
• Multiple streams per connection
• Action Message Format (AMF)

RTMP URL Structure:
rtmp://server:port/app/stream_key
rtmp://live.example.com/live/user123

Handshake Process:
1. C0/S0: Protocol version
2. C1/S1: Timestamp + random data
3. C2/S2: Echo verification
4. Ready for streaming
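The first leg of that handshake is simple enough to sketch: C0 is a single protocol-version byte, and C1 is a fixed 1536-byte block of timestamp, four zero bytes, and random fill. A minimal illustration of the packet layout (not a full RTMP client):

```python
import os
import struct
import time

def build_c0_c1():
    """Construct the client's opening handshake messages C0 and C1."""
    c0 = bytes([3])                          # RTMP protocol version 3
    timestamp = struct.pack(">I", int(time.time()) & 0xFFFFFFFF)
    zero = bytes(4)                          # four zero bytes
    random_fill = os.urandom(1528)           # opaque random payload
    c1 = timestamp + zero + random_fill      # 4 + 4 + 1528 = 1536 bytes
    return c0, c1

c0, c1 = build_c0_c1()
print(len(c0), len(c1))  # → 1 1536
```

The server answers with S0/S1 in the same shape, and each side then echoes the other's 1536-byte block (C2/S2) to verify the exchange before streaming begins.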

Current Role:
• First-mile protocol (ingest)
• Encoder → Server transport
• Not for end-user delivery
• Being replaced by SRT/WebRTC

RTMP with FFmpeg:

# Stream to RTMP server
ffmpeg -re -i input.mp4 -c:v libx264 -c:a aac \
  -f flv rtmp://live.example.com/live/stream_key

# Receive RTMP and convert to HLS
ffmpeg -i rtmp://localhost/live/stream \
  -c:v copy -c:a copy -f hls \
  -hls_time 10 -hls_list_size 3 \
  -hls_flags delete_segments output.m3u8

Modern Streaming Technologies

WebRTC

Real-time communication for browsers:

  • Sub-second latency (< 500ms)
  • P2P and server-based modes
  • Built-in adaptive bitrate
  • NAT traversal (STUN/TURN)
  • Encrypted by default (DTLS/SRTP)

// WebRTC in browser (top level of a module, or inside an async function)
const pc = new RTCPeerConnection();
const stream = await navigator.mediaDevices.getUserMedia({
  video: true, audio: true
});
stream.getTracks().forEach(track =>
  pc.addTrack(track, stream));

SRT (Secure Reliable Transport)

Modern replacement for RTMP:

  • Low latency over unreliable networks
  • AES encryption built-in
  • Packet loss recovery
  • Firewall traversal
  • Open source (no licensing)

# Stream with SRT (quote the URL so the shell doesn't split it at "&")
ffmpeg -i input.mp4 -c copy -f mpegts \
  "srt://hostname:port?mode=caller&transtype=live"

CMAF (Common Media Application Format)

Unified format for HLS and DASH:

  • Single set of files for both protocols
  • fMP4-based segments
  • Chunked encoding for low latency
  • Roughly halves CDN storage (one segment set instead of two)
  • Industry-wide adoption

Practical FFmpeg Examples

Container and Streaming Operations

# Generate HLS with multiple bitrates

ffmpeg -i input.mp4 \
  -filter_complex "[0:v]split=3[v1][v2][v3]; \
    [v1]scale=w=1920:h=1080[v1out]; \
    [v2]scale=w=1280:h=720[v2out]; \
    [v3]scale=w=854:h=480[v3out]" \
  -map "[v1out]" -c:v:0 libx264 -b:v:0 5M \
  -map "[v2out]" -c:v:1 libx264 -b:v:1 3M \
  -map "[v3out]" -c:v:2 libx264 -b:v:2 1M \
  -map a:0 -c:a:0 aac -b:a:0 192k \
  -map a:0 -c:a:1 aac -b:a:1 128k \
  -map a:0 -c:a:2 aac -b:a:2 96k \
  -f hls -hls_time 10 -hls_playlist_type vod \
  -master_pl_name master.m3u8 \
  -var_stream_map "v:0,a:0 v:1,a:1 v:2,a:2" output_%v.m3u8

# Generate DASH content

ffmpeg -i input.mp4 \
  -c:v libx264 -b:v 2M -g 48 -keyint_min 48 -sc_threshold 0 \
  -c:a aac -b:a 128k \
  -use_timeline 1 -use_template 1 \
  -adaptation_sets "id=0,streams=v id=1,streams=a" \
  -f dash output.mpd

# Convert between containers (remux)

# MP4 to MKV (no re-encoding)
ffmpeg -i input.mp4 -c copy output.mkv

# MKV to MP4 with compatible codecs
ffmpeg -i input.mkv -c:v copy -c:a aac output.mp4

# Extract all streams to separate files
ffmpeg -i input.mkv \
  -map 0:v -c copy video.mp4 \
  -map 0:a:0 -c copy audio_en.aac \
  -map 0:a:1 -c copy audio_jp.aac \
  -map 0:s:0 -c copy subtitles.srt   # copy works when the track is already SubRip

# Live streaming pipeline

# Capture → Transcode → Stream
ffmpeg -f v4l2 -i /dev/video0 \
  -f alsa -i hw:0 \
  -c:v libx264 -preset veryfast -b:v 3000k \
  -c:a aac -b:a 128k \
  -f flv rtmp://live.twitch.tv/live/YOUR_STREAM_KEY

# Analyze container structure

# Detailed container info
ffprobe -v quiet -print_format json -show_format -show_streams input.mp4

# MP4 box structure (MP4Box, from the GPAC suite)
MP4Box -info input.mp4

# MKV element info
mkvinfo input.mkv

Choosing the Right Format

Use Case          | Container | Protocol    | Reason
Web Streaming     | fMP4      | HLS/DASH    | Wide compatibility, ABR
Live Broadcasting | TS/fMP4   | RTMP→HLS    | Low latency ingest, scalable delivery
Video Calls       | RTP       | WebRTC      | Ultra-low latency, P2P
Archival          | MKV       | File        | Flexible, preserves everything
Mobile Apps       | MP4       | Progressive | Hardware support, compatibility

Performance Optimization

Best Practices

  • Segment Size: 2-10 seconds for VOD, 1-2 seconds for low latency. Balance between latency and efficiency.
  • Keyframe Alignment: Ensure keyframes align across all bitrates for smooth switching in ABR.
  • CDN Optimization: Use byte-range requests for large files, implement proper cache headers.
  • CMAF Adoption: Single set of files for both HLS and DASH reduces storage and improves cache efficiency.
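Keyframe alignment in practice means choosing one GOP size for every rendition: the keyframe interval in frames must equal the frame rate times the segment duration (for example, -g 48 as in the DASH command earlier corresponds to 2-second segments at 24 fps). A small sanity-check helper (a hypothetical utility, for illustration):

```python
from fractions import Fraction

def gop_size(fps, segment_seconds):
    """Keyframe interval (in frames) so every segment starts on a keyframe.

    All ABR renditions must use the same interval so segment boundaries
    line up across bitrates; non-integer results mean the chosen segment
    duration cannot align with this frame rate.
    """
    frames = Fraction(fps) * Fraction(segment_seconds)
    if frames.denominator != 1:
        raise ValueError("segment duration is not a whole number of frames")
    return int(frames)

print(gop_size(24, 2))  # → 48  (pass as -g 48 -keyint_min 48)
print(gop_size(30, 2))  # → 60
```

Pairing the result with -sc_threshold 0, as the DASH example does, prevents the encoder from inserting extra scene-cut keyframes that would break the alignment.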


FFmpeg Series Conclusion

What We've Covered

Through this comprehensive series, we've explored the complete FFmpeg ecosystem—from its modular architecture and core libraries, through the mathematical foundations of compression, to modern streaming protocols and WebAssembly implementation.

Technical Deep Dives

  • ✓ libav* library architecture
  • ✓ Video compression mathematics
  • ✓ Audio psychoacoustics
  • ✓ WebAssembly implementation
  • ✓ Container formats & protocols

Practical Applications

  • ✓ Transcoding pipelines
  • ✓ Streaming architectures
  • ✓ Browser-based processing
  • ✓ Adaptive bitrate delivery
  • ✓ Real-world optimizations

FFmpeg remains the cornerstone of modern multimedia processing, powering everything from streaming platforms to browser-based editors. Its continued evolution ensures it will remain relevant as new formats and delivery methods emerge.