The Problem: Video Processing Was the Bottleneck
A research platform allows mental health researchers to upload video interviews with study participants. Each video is typically 45-60 minutes, recorded in 1080p on smartphones.
The workflow researchers needed:
- Upload original video (often 2-3 GB per file)
- System creates web-friendly versions (480p, 720p for different bandwidth scenarios)
- Extract audio as a separate file (for transcription services)
- All files available within minutes, not hours
The challenge: They were uploading 50+ videos per day. Peak times saw 10-15 simultaneous uploads.
Their initial attempt: A simple sequential processing system.
- Upload video → wait for processing to finish → upload next video
- Each video took 15-20 minutes to process
- Processing 10 videos took 3+ hours
- Researchers sat idle waiting for videos to finish
The requirement: Process multiple videos at the same time, without overwhelming the server or running out of disk space.
Why Sequential Processing Fails
Most developers start with the simple approach: process one video at a time. This creates a massive bottleneck.
The sequential problem:
- Video 1: Finishes at 2:18 PM
- Video 2: Finishes at 2:36 PM
- Video 10: Finishes at 5:00 PM
The researcher who uploaded Video 10 at 2:00 PM waits 3 hours to see their processed file.
The server's perspective: CPU usage hovers at 25-30%. The server is bored, mostly idle, waiting for the current video to finish before starting the next one.
This is like having a 4-lane highway but forcing all cars into a single lane.
Our Solution: Parallel Processing with Smart Queue Management
We built a system that processes multiple videos simultaneously—but with controls to prevent server overload.
How it works:
- Researcher uploads 10 videos through web interface
- Videos go directly to cloud storage (AWS S3)
- All 10 videos enter a processing queue immediately
- We run 3 dedicated video processing workers (configurable based on server capacity)
- Each worker automatically picks jobs from the queue
- For each video, 3 operations run simultaneously: Transcode to 480p, transcode to 720p, extract audio track
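The three per-video operations map naturally onto a small thread pool driving ffmpeg. A minimal sketch in Python, assuming `ffmpeg` is on the PATH; the flags shown (scale filter, libx264 preset, AAC audio) are illustrative choices, not the platform's actual encoding settings:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def transcode_commands(src: Path) -> list[list[str]]:
    """Build the three ffmpeg invocations that run in parallel per video.
    (Codec and bitrate flags are illustrative; tune them for your content.)"""
    stem = src.with_suffix("")
    return [
        # 480p web-friendly version
        ["ffmpeg", "-y", "-i", str(src), "-vf", "scale=-2:480",
         "-c:v", "libx264", "-preset", "fast", f"{stem}_480p.mp4"],
        # 720p web-friendly version
        ["ffmpeg", "-y", "-i", str(src), "-vf", "scale=-2:720",
         "-c:v", "libx264", "-preset", "fast", f"{stem}_720p.mp4"],
        # audio-only track for transcription services
        ["ffmpeg", "-y", "-i", str(src), "-vn", "-c:a", "aac", f"{stem}.m4a"],
    ]


def process_video(src: Path) -> None:
    # Run all three operations concurrently; each ffmpeg process does its
    # own encoding, so one video keeps roughly two cores busy at peak.
    with ThreadPoolExecutor(max_workers=3) as pool:
        for result in pool.map(subprocess.run, transcode_commands(src)):
            result.check_returncode()  # surface any encoder failure
```

Audio extraction finishes long before the video transcodes, which is why a worker's effective load is closer to two cores than three.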
The result:
- 10 videos uploaded at 2:00 PM
- Workers 1, 2, 3 process Videos 1, 2, 3 immediately (done by 2:18 PM)
- Workers pick up Videos 4, 5, 6 (done by 2:36 PM)
- Workers pick up Videos 7, 8, 9 (done by 2:54 PM)
- One worker picks up Video 10 (done by 3:12 PM)
- All 10 videos fully processed in 72 minutes
Time savings: 3 hours → 72 minutes (2.5x faster end to end, and 3x the throughput whenever all three workers are busy)
The Technical Challenge: Smart Worker Configuration
Instead of one giant queue for everything, we created separate queues for different types of work:
Video Processing Queue:
- Heavy CPU usage (video encoding)
- 3 dedicated workers, each handling 1 video at a time
- Each worker is allotted 2 CPU cores, shared by the 3 parallel operations per video (audio extraction is light compared to the two transcodes)
- Total: 6 CPU cores actively encoding
Why this configuration works:
- 8-core server handles 3 videos + other tasks simultaneously
- Leaves 2 cores for the web app and database
- Workers automatically pick jobs from queue—no manual intervention needed
- If a worker crashes, job manager re-queues the incomplete video and another worker picks it up
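The worker loop itself is simple: pull a job, process it, and put it back on the queue if processing dies mid-job. A minimal in-process sketch using Python's `queue.Queue` and threads (the production system ran dedicated worker processes; `handle` is a placeholder, and a real version would cap retries rather than re-queue forever):

```python
import queue
import threading

# One queue per job type, so a video backlog never blocks quick jobs.
video_queue: queue.Queue = queue.Queue()


def worker(jobs: queue.Queue, handle) -> None:
    """Dedicated worker loop: pull a job, process it, re-queue on failure."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel: shut this worker down
            jobs.task_done()
            return
        try:
            handle(job)
        except Exception:
            jobs.put(job)        # re-queue so another worker retries it
        finally:
            jobs.task_done()


def start_workers(jobs: queue.Queue, handle, count: int = 3) -> list:
    threads = [threading.Thread(target=worker, args=(jobs, handle), daemon=True)
               for _ in range(count)]
    for t in threads:
        t.start()
    return threads
```

Because `task_done()` is called once per `get()`, `video_queue.join()` blocks until every job, including re-queued ones, has actually finished.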
Handling failures gracefully:
- Corrupted video? Worker marks the job as "failed" with an error message, then moves on to the next video
- Server out of disk space? Workers check available disk before downloading, job stays in queue until space frees up
- Automatic cleanup of processed temporary files prevents disk buildup
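Those three failure paths fit into one small job skeleton. A hedged sketch: `download`, `process`, `mark_failed`, and `requeue` are hypothetical hooks standing in for the real S3/database calls, and the 10 GB headroom figure is an assumption, not the platform's actual threshold:

```python
import shutil

REQUIRED_FREE_BYTES = 10 * 1024**3   # assumed headroom per video job


def has_disk_space(path: str = "/tmp", required: int = REQUIRED_FREE_BYTES) -> bool:
    # Check free space before downloading the multi-GB source video.
    return shutil.disk_usage(path).free >= required


def run_job(job, download, process, mark_failed, requeue,
            required: int = REQUIRED_FREE_BYTES) -> None:
    """Failure-handling skeleton; the hook functions are illustrative."""
    if not has_disk_space(required=required):
        requeue(job)                  # job stays queued until space frees up
        return
    workdir = download(job)
    try:
        process(workdir)
    except Exception as exc:          # e.g. a corrupted video
        mark_failed(job, str(exc))    # record the error, move on
    finally:
        shutil.rmtree(workdir, ignore_errors=True)  # always clean temp files
```

The `finally` block is what prevents disk buildup: temp files are removed whether the job succeeded, failed, or raised.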
Real-World Performance
Before parallel processing:
- ⏱️ Processing time: 15-20 minutes per video (sequential)
- 📊 Throughput: ~3 videos per hour
- 💻 CPU usage: 25-30% (server mostly idle)
- 😓 User experience: "Why is this taking so long?"
After parallel processing:
- ⏱️ Processing time: Still 15-20 minutes per video, but 3 videos process at once
- 📊 Throughput: ~9 videos per hour (3x improvement)
- 💻 CPU usage: 70-80% (server actually working)
- ✨ User experience: "My videos are ready in 20 minutes even during peak hours"
Peak load handling:
- Before: 50 videos uploaded → 16+ hours of processing time
- After: 50 videos uploaded → 6 hours of processing time
- Real-world: Most videos finish within 30 minutes of upload during normal hours
What We Learned
1. Parallelism is limited by resources, not code
You can't just keep adding workers indefinitely. We tested three configurations:
- 2 workers: underutilized, ~50% CPU, throughput suffered
- 3 workers: the sweet spot, 70-80% CPU usage
- 6 workers: server struggled, 95%+ CPU, system became unresponsive
Test with realistic load to find the optimal worker count for your server.
2. Separate queues prevent one bottleneck from blocking everything
Early version had one queue for all jobs (videos, audio, documents, thumbnails). When videos backed up during peak hours, simple document jobs waited unnecessarily. Separate queues meant quick jobs finish fast regardless of video backlog. Users get some feedback quickly even if video is still processing.
3. Visibility into queue status changes user behavior
We added a simple progress indicator: "Your video is #3 in the processing queue. Estimated time: 25 minutes." Impact: Support tickets about "slow processing" dropped by 80%. Users just needed to know the system was working.
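The estimate shown to users doesn't need to be precise, only honest. A rough batch calculation is enough, sketched here with illustrative numbers (3 workers, ~18 minutes per video) rather than the platform's real scheduler state:

```python
def queue_eta_minutes(position: int, workers: int = 3,
                      avg_minutes: int = 18) -> int:
    """Rough wait estimate for a video at 1-based queue `position`:
    jobs ahead clear in batches of `workers`, then the user's own
    video takes one more cycle. Deliberately coarse."""
    batches_before_start = (position - 1) // workers
    return (batches_before_start + 1) * avg_minutes
```

With 3 workers, positions 1-3 all start immediately (~18 minutes), while position 10 waits three full batches first (~72 minutes), which matches the timeline above.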
Building a platform that handles media uploads, document processing, or any batch processing workload? Let's talk →
We've built parallel processing systems for video transcoding, image optimization, AI inference, and document generation. The pattern is always the same: identify independent work, distribute it across workers, and monitor to prevent overload.