Container Formats and Streaming Protocols: MP4, MKV, HLS, and DASH
Master the art of multimedia encapsulation and delivery. From the ISO Base Media File Format to adaptive bitrate streaming, understand how modern platforms package and distribute content at scale.

Understanding Container Formats
Once audio and video are compressed into elementary streams, they must be packaged for storage or transmission. Container formats define how to store multiple streams of data—video, audio, subtitles, metadata—in a single file, handling their interleaving and synchronization. Streaming protocols then define how to deliver this data over networks for real-time playback.
Container vs Codec
Container Format
- • Defines file structure and metadata
- • Multiplexes multiple streams
- • Handles synchronization
- • Examples: MP4, MKV, WebM, AVI
Codec
- • Compresses/decompresses data
- • Defines encoding algorithm
- • Stream-specific (audio/video)
- • Examples: H.264, AAC, VP9, Opus
The ISO Base Media File Format (MP4)
The ISO Base Media File Format (ISOBMFF), formally standardized as ISO/IEC 14496-12, is a foundational specification for modern container formats, most notably MP4. Derived from Apple's QuickTime format, it provides a flexible and extensible structure for storing time-based multimedia data.
MP4 File Structure (Hierarchical Boxes):

ftyp (File Type Box)
 ├─ Brand: mp42, isom
 └─ Compatible brands
moov (Movie Box)
 ├─ mvhd (Movie Header): duration, timescale, creation time
 ├─ trak (Track Box): Video
 │   ├─ tkhd (Track Header)
 │   └─ mdia (Media Box)
 │       ├─ mdhd (Media Header)
 │       ├─ hdlr (Handler Reference)
 │       └─ minf (Media Information)
 │           ├─ vmhd (Video Media Header)
 │           ├─ dinf (Data Information)
 │           └─ stbl (Sample Table)
 │               ├─ stsd (Sample Description)
 │               ├─ stts (Time to Sample)
 │               ├─ stco (Chunk Offset)
 │               └─ stsz (Sample Size)
 ├─ trak (Track Box): Audio
 │   └─ [similar structure]
 └─ udta (User Data)
mdat (Media Data Box)
 ├─ Interleaved audio/video samples
 └─ Referenced by moov index tables
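The box layout above can be explored programmatically. Every ISOBMFF box begins with a 4-byte big-endian size and a 4-byte type; a minimal sketch that walks the top-level boxes of a file (handling the 64-bit and to-end-of-file size conventions only minimally):

```python
import io
import struct

def iter_boxes(f):
    """Yield (type, size) for each top-level ISOBMFF box in a file-like object."""
    while True:
        header = f.read(8)
        if len(header) < 8:
            break
        size, box_type = struct.unpack(">I4s", header)
        if size == 1:  # size of 1 means a 64-bit extended size follows
            size = struct.unpack(">Q", f.read(8))[0]
            f.seek(size - 16, 1)  # skip payload (header + largesize = 16 bytes)
        elif size == 0:  # size of 0 means the box extends to end of file
            f.seek(0, 2)
        else:
            f.seek(size - 8, 1)  # skip payload
        yield box_type.decode("ascii", "replace"), size

# Example with a hand-built buffer: a 16-byte 'ftyp' box, then an empty 'free' box
data = struct.pack(">I4s", 16, b"ftyp") + b"isom" + b"\x00\x00\x02\x00"
data += struct.pack(">I4s", 8, b"free")
print(list(iter_boxes(io.BytesIO(data))))  # [('ftyp', 16), ('free', 8)]
```

Running the same loop over a real MP4 typically prints `ftyp`, `moov`, `mdat` in order, which is exactly the layout in the diagram.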
Key Features
- • Object-oriented design
- • Extensible metadata
- • Fast random access
- • Industry standard
Limitations
- • Patent encumbered
- • Limited codec support
- • No native chapters
- • Complex structure
Common Uses
- • Streaming platforms
- • Mobile devices
- • DSLR cameras
- • Web video
Matroska (MKV): The Open Alternative
The Matroska (MKV) format is a free, open-standard container designed with flexibility, extensibility, and long-term viability in mind. Unlike MP4, which is governed by ISO standards, Matroska is a royalty-free project that can hold an unlimited number of video, audio, picture, and subtitle tracks in a single file.
Matroska/EBML Structure:

EBML Header
 ├─ EBML Version
 ├─ DocType: "matroska"
 └─ DocTypeVersion
Segment
 ├─ SeekHead (index): seek entries to other elements
 ├─ Info
 │   ├─ TimecodeScale
 │   ├─ Duration
 │   └─ MuxingApp / WritingApp
 ├─ Tracks
 │   └─ TrackEntry (per stream)
 │       ├─ TrackNumber
 │       ├─ TrackType (video/audio/subtitle)
 │       ├─ CodecID
 │       └─ Video/Audio settings
 ├─ Chapters
 │   └─ EditionEntry
 │       └─ ChapterAtom
 ├─ Attachments
 │   └─ AttachedFile (fonts, covers)
 ├─ Tags
 │   └─ Tag (metadata)
 └─ Cluster (multiple)
     ├─ Timecode
     └─ SimpleBlock/BlockGroup
         └─ Block (actual media data)

EBML (Extensible Binary Meta Language):
• XML-like structure in binary
• Self-describing format
• Forward/backward compatible
• Unknown elements safely skipped
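The building block of EBML is the variable-length integer: the number of leading zero bits in the first byte tells the parser how many bytes the value occupies, which is what lets unknown elements be skipped safely. A minimal decoder for size values (the length-marker bit is stripped):

```python
def read_vint(data, offset=0):
    """Decode an EBML variable-length integer as used for element sizes.
    Leading zero bits in the first byte give the total byte length;
    the marker bit itself is not part of the value."""
    first = data[offset]
    length = 1
    mask = 0x80
    while length <= 8 and not (first & mask):
        mask >>= 1
        length += 1
    value = first & (mask - 1)  # drop the length-marker bit
    for i in range(1, length):
        value = (value << 8) | data[offset + i]
    return value, length

print(read_vint(b"\x81"))      # one-byte encoding of 1 → (1, 1)
print(read_vint(b"\x40\x02"))  # two-byte encoding of 2 → (2, 2)
```

Note that the same number has several valid encodings of different widths, which is part of what makes the format forward-compatible.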
MKV Advantages
- • Universal codec support
- • Multiple audio/subtitle tracks
- • Chapter markers and menus
- • Attachments (fonts, images)
- • Error recovery capability
- • Streaming support
- • No licensing fees
- • Extensible metadata
Other Important Container Formats
WebM
Subset of Matroska for the web:
- • Video: VP8, VP9, AV1 only
- • Audio: Vorbis, Opus only
- • Optimized for web streaming
- • Royalty-free and open
- • Native browser support
MPEG-TS (Transport Stream)
Designed for transmission over unreliable networks:
- • 188-byte fixed packets
- • Error correction support
- • Used in broadcast TV, HLS
- • Self-synchronizing
- • No file-level index needed
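The self-synchronizing property comes from the fixed 188-byte packet size and the 0x47 sync byte that starts every packet. A sketch of parsing the 4-byte TS packet header, using a synthetic packet:

```python
def parse_ts_header(packet):
    """Parse the 4-byte MPEG-TS packet header (ISO/IEC 13818-1)."""
    if len(packet) != 188 or packet[0] != 0x47:
        raise ValueError("not a valid 188-byte TS packet")
    # The 13-bit PID spans the low 5 bits of byte 1 and all of byte 2
    pid = ((packet[1] & 0x1F) << 8) | packet[2]
    return {
        "payload_unit_start": bool(packet[1] & 0x40),  # PUSI flag
        "pid": pid,
        "continuity_counter": packet[3] & 0x0F,
    }

# Synthetic packet: sync byte, PUSI set, PID 0x0100, continuity counter 5
pkt = bytes([0x47, 0x41, 0x00, 0x15]) + bytes(184)
print(parse_ts_header(pkt))
```

A demuxer recovering from corruption simply scans forward until it finds 0x47 at 188-byte intervals again, which is why no file-level index is needed.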
FLV (Flash Video)
Legacy format still used for live streaming:
- • Simple tag-based structure
- • Low latency streaming
- • RTMP protocol support
- • Limited codec support
- • Being phased out
Fragmented MP4 for Streaming
Regular MP4 vs Fragmented MP4:

Regular MP4:
[ftyp][moov][mdat.........................]
        ↑ Must download entire moov to start playback

Fragmented MP4 (fMP4):
[ftyp][moov][moof][mdat][moof][mdat][moof][mdat]...
  Init       Fragment 1  Fragment 2  Fragment 3

Structure:

Initialization Segment
 ├─ ftyp (File Type)
 └─ moov (Movie Box, no sample data)
     └─ mvex (Movie Extends)
         └─ trex (Track Extends)
Media Segment (Fragment)
 ├─ styp (Segment Type, optional)
 ├─ moof (Movie Fragment)
 │   ├─ mfhd (Fragment Header)
 │   └─ traf (Track Fragment)
 │       ├─ tfhd (Track Fragment Header)
 │       └─ trun (Track Run)
 └─ mdat (Media Data)

Benefits:
• Progressive download/streaming
• Live streaming support
• Adaptive bitrate switching
• Reduced startup latency
• Error resilience
Creating fMP4 with FFmpeg:
# Generate fMP4 for DASH/HLS
ffmpeg -i input.mp4 -c:v libx264 -c:a aac \
  -movflags frag_keyframe+empty_moov+default_base_moof \
  -f mp4 output_fragmented.mp4

# Live streaming with fragments (2-second fragments; -frag_duration is in microseconds)
ffmpeg -i rtmp://source -c copy \
  -f mp4 -movflags frag_keyframe+empty_moov \
  -frag_duration 2000000 output.mp4
HTTP Live Streaming (HLS)
HTTP Live Streaming (HLS), developed by Apple and documented in IETF RFC 8216, is the dominant protocol for delivering adaptive bitrate video to end users. Its architecture is elegantly simple and leverages standard web servers and CDNs.
HLS Workflow:

1. Source Processing:
   Video → Multiple Bitrates → Segmentation

2. File Structure:
   master.m3u8
   ├── 1080p/playlist.m3u8
   │   ├── segment000.ts
   │   ├── segment001.ts
   │   └── segment002.ts
   ├── 720p/playlist.m3u8
   │   └── [segments]
   └── 480p/playlist.m3u8
       └── [segments]

3. Master Playlist (master.m3u8):
   #EXTM3U
   #EXT-X-VERSION:6
   #EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
   1080p/playlist.m3u8
   #EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
   720p/playlist.m3u8
   #EXT-X-STREAM-INF:BANDWIDTH=1400000,RESOLUTION=854x480
   480p/playlist.m3u8

4. Media Playlist (1080p/playlist.m3u8):
   #EXTM3U
   #EXT-X-VERSION:3
   #EXT-X-TARGETDURATION:10
   #EXT-X-MEDIA-SEQUENCE:0
   #EXTINF:10.0,
   segment000.ts
   #EXTINF:10.0,
   segment001.ts
   #EXTINF:10.0,
   segment002.ts
   #EXT-X-ENDLIST

HLS Features:
• Adaptive Bitrate (ABR)
• Live and VOD support
• Encryption (AES-128)
• Closed captions
• Multiple audio tracks
• Low Latency HLS (LL-HLS)
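The adaptive part of HLS lives entirely in the client: it parses the master playlist and picks the highest-bandwidth variant its measured throughput can sustain. A simplified sketch of that selection (note it splits attributes naively on commas, so quoted values such as CODECS="avc1...,mp4a..." would need real attribute-list parsing):

```python
def parse_master_playlist(text):
    """Extract (bandwidth, uri) variants from an HLS master playlist."""
    variants, pending = [], None
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            for attr in line.split(":", 1)[1].split(","):
                key, _, value = attr.partition("=")
                if key == "BANDWIDTH":
                    pending = int(value)
        elif line and not line.startswith("#") and pending is not None:
            variants.append((pending, line))  # URI line follows its STREAM-INF tag
            pending = None
    return variants

master = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1400000,RESOLUTION=854x480
480p/playlist.m3u8"""

# Pick the best variant that fits a measured throughput of 2 Mbit/s
fitting = [v for v in parse_master_playlist(master) if v[0] <= 2_000_000]
print(max(fitting))  # (1400000, '480p/playlist.m3u8')
```

Real players apply a safety margin and smooth the throughput estimate over several segments before switching.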
Advantages
- • Uses standard HTTP/HTTPS
- • CDN-friendly
- • Firewall traversal
- • Apple ecosystem support
Limitations
- • Higher latency (10-30s)
- • Apple-centric
- • TS overhead
- • Complex for live
Dynamic Adaptive Streaming over HTTP (DASH)
MPEG-DASH (Dynamic Adaptive Streaming over HTTP, ISO/IEC 23009-1) is the international standard for adaptive bitrate streaming. Unlike HLS, which is controlled by Apple, DASH is codec- and container-agnostic, offering greater flexibility.
DASH Manifest (MPD, Media Presentation Description):

<?xml version="1.0" encoding="UTF-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
     type="static"
     mediaPresentationDuration="PT10M"
     minBufferTime="PT2S">
  <Period duration="PT10M">
    <!-- Video Adaptation Set -->
    <AdaptationSet mimeType="video/mp4" codecs="avc1.42c01e">
      <Representation id="1" bandwidth="5000000" width="1920" height="1080">
        <SegmentTemplate media="video/1080p/$Number$.m4s"
                         initialization="video/1080p/init.mp4"
                         duration="4000" timescale="1000"/>
      </Representation>
      <Representation id="2" bandwidth="2800000" width="1280" height="720">
        <SegmentTemplate media="video/720p/$Number$.m4s"
                         initialization="video/720p/init.mp4"
                         duration="4000" timescale="1000"/>
      </Representation>
    </AdaptationSet>
    <!-- Audio Adaptation Set -->
    <AdaptationSet mimeType="audio/mp4" codecs="mp4a.40.2" lang="en">
      <Representation id="3" bandwidth="128000">
        <SegmentTemplate media="audio/128k/$Number$.m4s"
                         initialization="audio/128k/init.mp4"
                         duration="4000" timescale="1000"/>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

DASH Features:
• ISO standard (no vendor lock-in)
• Codec/container agnostic
• Supports fMP4 and WebM
• Multi-DRM support
• Server push capability
• Common Media Application Format (CMAF)
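A DASH client never sees segment URLs directly; it expands the `$Number$` template itself. A simplified sketch of that expansion (assuming the default `startNumber` of 1 and ignoring timelines, widths, and timescales):

```python
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

def segment_urls(mpd_xml, representation_id, count):
    """Expand a $Number$-based SegmentTemplate into concrete segment URLs."""
    root = ET.fromstring(mpd_xml)
    for rep in root.iterfind(".//mpd:Representation", NS):
        if rep.get("id") == representation_id:
            tmpl = rep.find("mpd:SegmentTemplate", NS).get("media")
            # $Number$ starts at 1 unless startNumber says otherwise
            return [tmpl.replace("$Number$", str(n)) for n in range(1, count + 1)]
    raise KeyError(representation_id)

mpd = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period><AdaptationSet>
    <Representation id="1" bandwidth="5000000">
      <SegmentTemplate media="video/1080p/$Number$.m4s"
                       initialization="video/1080p/init.mp4"/>
    </Representation>
  </AdaptationSet></Period>
</MPD>"""

print(segment_urls(mpd, "1", 3))
# ['video/1080p/1.m4s', 'video/1080p/2.m4s', 'video/1080p/3.m4s']
```

Because the template is per-Representation, switching bitrates is just a matter of expanding a different template from the next segment number onward.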
HLS vs DASH Comparison
HLS
- • Text-based playlists (M3U8)
- • TS or fMP4 segments
- • Apple ecosystem
- • Simpler implementation
DASH
- • XML manifest (MPD)
- • Any container format
- • Industry standard
- • More flexible
RTMP: The Legacy Live Protocol
RTMP Architecture:

┌──────────────┐     RTMP     ┌──────────────┐
│   Encoder    │─────────────▶│ Media Server │
│  (OBS/FMLE)  │  Port 1935   │   (Nginx)    │
└──────────────┘              └──────┬───────┘
                                     │ Transcode/Package
                     ┌───────────────┼───────────────┐
                     ▼               ▼               ▼
                    HLS            DASH           WebRTC
                   (HTTP)         (HTTP)           (UDP)
                     │               │               │
                     ▼               ▼               ▼
                  Players         Players        Browsers

RTMP Features:
• Persistent TCP connection
• Low latency (1-3 seconds)
• Chunked message format
• Multiple streams per connection
• Action Message Format (AMF)

RTMP URL Structure:
rtmp://server:port/app/stream_key
rtmp://live.example.com/live/user123

Handshake Process:
1. C0/S0: Protocol version
2. C1/S1: Timestamp + random data
3. C2/S2: Echo verification
4. Ready for streaming

Current Role:
• First-mile protocol (ingest)
• Encoder → Server transport
• Not for end-user delivery
• Being replaced by SRT/WebRTC
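The first two client messages of the handshake are simple enough to construct by hand. A sketch of building C0 and C1 per the RTMP specification (C0 is one version byte; C1 is a 4-byte timestamp, 4 zero bytes, and 1528 bytes of filler):

```python
import os
import struct
import time

RTMP_VERSION = 3      # the only version in common use
HANDSHAKE_SIZE = 1536  # fixed size of C1/S1 and C2/S2

def build_c0_c1():
    """Build the client's opening handshake messages C0 and C1."""
    c0 = bytes([RTMP_VERSION])
    c1 = struct.pack(">II", int(time.time()) & 0xFFFFFFFF, 0) + os.urandom(1528)
    return c0, c1

c0, c1 = build_c0_c1()
print(len(c0), len(c1))  # 1 1536
```

The server replies with S0/S1 in the same shape, and each side then echoes the other's 1536 bytes as C2/S2 before any AMF commands flow.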
RTMP with FFmpeg:
# Stream to RTMP server
ffmpeg -re -i input.mp4 -c:v libx264 -c:a aac \
  -f flv rtmp://live.example.com/live/stream_key

# Receive RTMP and convert to HLS
ffmpeg -i rtmp://localhost/live/stream \
  -c:v copy -c:a copy -f hls \
  -hls_time 10 -hls_list_size 3 \
  -hls_flags delete_segments output.m3u8
Modern Streaming Technologies
WebRTC
Real-time communication for browsers:
- • Sub-second latency (< 500ms)
- • P2P and server-based modes
- • Built-in adaptive bitrate
- • NAT traversal (STUN/TURN)
- • Encrypted by default (DTLS/SRTP)
// WebRTC in browser
const pc = new RTCPeerConnection();
const stream = await navigator.mediaDevices.getUserMedia({
  video: true,
  audio: true
});
stream.getTracks().forEach(track => pc.addTrack(track, stream));
SRT (Secure Reliable Transport)
Modern replacement for RTMP:
- • Low latency over unreliable networks
- • AES encryption built-in
- • Packet loss recovery
- • Firewall traversal
- • Open source (no licensing)
# Stream with SRT (quote the URL so the shell doesn't interpret '&')
ffmpeg -i input.mp4 -c copy -f mpegts \
  "srt://hostname:port?mode=caller&transtype=live"
CMAF (Common Media Application Format)
Unified format for HLS and DASH:
- • Single set of files for both protocols
- • fMP4-based segments
- • Chunked encoding for low latency
- • Roughly halves CDN storage (one segment set instead of separate TS and fMP4 copies)
- • Industry-wide adoption
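Because CMAF segments are plain fMP4, a packager can emit one set of `.m4s` files and describe it twice: in a DASH SegmentTemplate and in an HLS media playlist, where `#EXT-X-MAP` points at the shared initialization segment. A minimal sketch of the HLS side, with hypothetical file names:

```python
def hls_media_playlist(segments, target_duration):
    """Build an HLS media playlist over CMAF (fMP4) segments.
    #EXT-X-MAP references the same init.mp4 a DASH MPD would use."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:7",  # fMP4 segments require playlist version >= 7
        f"#EXT-X-TARGETDURATION:{target_duration}",
        '#EXT-X-MAP:URI="init.mp4"',
    ]
    for seg in segments:
        lines.append(f"#EXTINF:{target_duration:.1f},")
        lines.append(seg)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines)

# The very same seg0.m4s..seg2.m4s would appear in a DASH SegmentTemplate
segments = [f"seg{n}.m4s" for n in range(3)]
playlist = hls_media_playlist(segments, 4)
print(playlist)
```

The CDN then caches each `.m4s` once, and both HLS and DASH clients pull identical bytes.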
Practical FFmpeg Examples
# Generate HLS with multiple bitrates
ffmpeg -i input.mp4 \
  -filter_complex "[0:v]split=3[v1][v2][v3]; \
    [v1]scale=w=1920:h=1080[v1out]; \
    [v2]scale=w=1280:h=720[v2out]; \
    [v3]scale=w=854:h=480[v3out]" \
  -map "[v1out]" -c:v:0 libx264 -b:v:0 5M \
  -map "[v2out]" -c:v:1 libx264 -b:v:1 3M \
  -map "[v3out]" -c:v:2 libx264 -b:v:2 1M \
  -map a:0 -c:a:0 aac -b:a:0 192k \
  -map a:0 -c:a:1 aac -b:a:1 128k \
  -map a:0 -c:a:2 aac -b:a:2 96k \
  -f hls -hls_time 10 -hls_playlist_type vod \
  -master_pl_name master.m3u8 \
  -var_stream_map "v:0,a:0 v:1,a:1 v:2,a:2" output_%v.m3u8
# Generate DASH content
ffmpeg -i input.mp4 \
  -c:v libx264 -b:v 2M -g 48 -keyint_min 48 -sc_threshold 0 \
  -c:a aac -b:a 128k \
  -use_timeline 1 -use_template 1 \
  -adaptation_sets "id=0,streams=v id=1,streams=a" \
  -f dash output.mpd
# Convert between containers (remux)
# MP4 to MKV (no re-encoding)
ffmpeg -i input.mp4 -c copy output.mkv

# MKV to MP4 with compatible codecs
ffmpeg -i input.mkv -c:v copy -c:a aac output.mp4

# Extract all streams to separate files
ffmpeg -i input.mkv \
  -map 0:v -c copy video.mp4 \
  -map 0:a:0 -c copy audio_en.aac \
  -map 0:a:1 -c copy audio_jp.aac \
  -map 0:s -c copy subtitles.srt
# Live streaming pipeline
# Capture → Transcode → Stream
ffmpeg -f v4l2 -i /dev/video0 \
  -f alsa -i hw:0 \
  -c:v libx264 -preset veryfast -b:v 3000k \
  -c:a aac -b:a 128k \
  -f flv rtmp://live.twitch.tv/live/YOUR_STREAM_KEY
# Analyze container structure
# Detailed container info
ffprobe -v quiet -print_format json -show_format -show_streams input.mp4

# MP4 box structure (GPAC)
MP4Box -info input.mp4

# MKV element info
mkvinfo input.mkv
Choosing the Right Format
| Use Case          | Container | Protocol    | Reason                                |
|-------------------|-----------|-------------|---------------------------------------|
| Web Streaming     | fMP4      | HLS/DASH    | Wide compatibility, ABR               |
| Live Broadcasting | TS/fMP4   | RTMP→HLS    | Low-latency ingest, scalable delivery |
| Video Calls       | RTP       | WebRTC      | Ultra-low latency, P2P                |
| Archival          | MKV       | File        | Flexible, preserves everything        |
| Mobile Apps       | MP4       | Progressive | Hardware support, compatibility       |
Performance Optimization
Best Practices
- Segment Size: 2-10 seconds for VOD, 1-2 seconds for low latency. Balance between latency and efficiency.
- Keyframe Alignment: Ensure keyframes align across all bitrates for smooth switching in ABR.
- CDN Optimization: Use byte-range requests for large files, implement proper cache headers.
- CMAF Adoption: Single set of files for both HLS and DASH reduces storage and improves cache efficiency.
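Keyframe alignment follows from simple arithmetic: the keyframe interval (the `-g` value in the x264 examples above) must equal frame rate times segment duration, and every rendition must use the same value so segments cut at identical instants. A small sketch of that calculation:

```python
def gop_size(fps, segment_seconds):
    """Keyframe interval that lands a keyframe exactly on every
    segment boundary. Must be identical across all ABR renditions."""
    g = fps * segment_seconds
    if g != int(g):
        raise ValueError("segment duration must cover a whole number of frames")
    return int(g)

# 30 fps with 4-second segments → a keyframe every 120 frames
print(gop_size(30, 4))  # 120
```

This is also why the DASH example above pairs `-g 48 -keyint_min 48` with `-sc_threshold 0`: disabling scene-cut keyframes keeps the GOP length exact.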
FFmpeg Series Conclusion
Through this comprehensive series, we've explored the complete FFmpeg ecosystem—from its modular architecture and core libraries, through the mathematical foundations of compression, to modern streaming protocols and WebAssembly implementation.
Technical Deep Dives
- ✓ libav* library architecture
- ✓ Video compression mathematics
- ✓ Audio psychoacoustics
- ✓ WebAssembly implementation
- ✓ Container formats & protocols
Practical Applications
- ✓ Transcoding pipelines
- ✓ Streaming architectures
- ✓ Browser-based processing
- ✓ Adaptive bitrate delivery
- ✓ Real-world optimizations
FFmpeg remains the cornerstone of modern multimedia processing, powering everything from streaming platforms to browser-based editors. Its continued evolution ensures it will remain relevant as new formats and delivery methods emerge.