
How It Works

Alright, let's pop the hood and see what's actually running when an instructor clicks "Start Class" and video appears on screens across the internet.

This page gets technical. If you just want to integrate and ship, the Quick Start will get you there faster. But if you want to understand why things work the way they do—or you need to explain this to your team—keep reading.

The Core Challenge

Here's the fundamental problem with live video: it's absurdly bandwidth-intensive.

A decent-quality video stream runs about 2-3 megabits per second. In a peer-to-peer model, the instructor's device would need to upload that stream separately to every student. Ten students? That's 20-30 Mbps upload. A hundred students? 200-300 Mbps.

Most home internet connections top out at 10-20 Mbps upload. Even if bandwidth weren't an issue, the CPU load of maintaining a hundred separate WebRTC peer connections would crush most laptops.

Peer-to-peer works for 1:1 video calls. For live classrooms, it falls apart completely.
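
A quick back-of-the-envelope in TypeScript makes the asymmetry concrete (illustrative numbers only):

```typescript
// Back-of-the-envelope upload math for one 2.5 Mbps video stream.
const streamMbps = 2.5;

// Peer-to-peer: the instructor uploads one copy per student.
const p2pUploadMbps = (students: number) => streamMbps * students;

// SFU: the instructor uploads one copy total, regardless of class size.
const sfuUploadMbps = (_students: number) => streamMbps;

console.log(p2pUploadMbps(100)); // 250 Mbps -- far beyond home upload speeds
console.log(sfuUploadMbps(100)); // 2.5 Mbps -- constant
```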

The SFU Model

SFU stands for Selective Forwarding Unit. It's the architectural pattern that makes one-to-many streaming actually work.

[Diagram: SFU routing]

The concept is straightforward: upload once, distribute many.

The instructor uploads their video stream exactly once—to our servers. We then forward that stream to every connected student. The instructor's bandwidth requirement stays constant whether they're teaching 5 students or 5,000.

The word "selective" matters. We don't blindly copy packets to everyone. We make intelligent decisions about what to send to whom, based on each student's connection quality. More on that in a moment.

Simulcast Encoding: The Three-Layer Trick

Not all students have the same internet connection. Some are on university ethernet. Some are on home WiFi with family members streaming Netflix. Some are on cellular from a bus.

Sending everyone identical 1080p video doesn't work. The students on poor connections will buffer constantly. But sending everyone 240p to be safe means the students on fast connections are watching a pixelated mess.

The solution is simulcast—encoding the video at multiple quality levels simultaneously.

When the instructor publishes their camera, their browser encodes three parallel streams:

Layer     Resolution     Bitrate      Scenario
High      1080p/720p     ~2.5 Mbps    Fiber, solid broadband
Medium    540p/480p      ~700 Kbps    Average home WiFi
Low       360p/240p      ~250 Kbps    Cellular, congested networks
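
Verriflo's SDK sets this up automatically when the instructor publishes. For the curious, raw browser WebRTC exposes simulcast through sendEncodings; a minimal sketch, with illustrative rid names and bitrates:

```typescript
// Sketch: publishing one camera track as three simulcast layers with raw WebRTC.
async function publishSimulcast(): Promise<RTCPeerConnection> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const [track] = stream.getVideoTracks();

  const pc = new RTCPeerConnection();
  pc.addTransceiver(track, {
    direction: "sendonly",
    sendEncodings: [
      { rid: "high", maxBitrate: 2_500_000 },                           // full resolution
      { rid: "medium", maxBitrate: 700_000, scaleResolutionDownBy: 2 }, // half resolution
      { rid: "low", maxBitrate: 250_000, scaleResolutionDownBy: 4 },    // quarter resolution
    ],
  });
  return pc;
}
```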

All three layers upload to our SFUs. Then comes the magic.

Real-Time Layer Switching

Our servers monitor every student's connection continuously. Every couple of seconds, we evaluate:

  • Packet loss: Are we losing more than a few percent of packets?
  • Round-trip latency: Is congestion building up?
  • Bandwidth estimation: The browser reports what it can actually receive

Based on this data, we pick the appropriate layer for each student. And we switch dynamically.

Student walks from their desk to the kitchen, farther from the WiFi router? We detect the degraded signal and switch them to a lower layer immediately—before buffering happens. They walk back? We bump them up to HD again.

This is running right now in every Verriflo classroom. You don't configure it. It just works.
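
The exact selection logic lives in our SFUs, but a hypothetical sketch shows the shape of the decision (thresholds invented for illustration):

```typescript
type Layer = "high" | "medium" | "low";

interface ConnectionStats {
  packetLossPct: number; // from RTCP receiver reports
  estimatedKbps: number; // receiver-side bandwidth estimate
}

// Hypothetical: pick the best layer a student's connection can sustain.
function pickLayer(stats: ConnectionStats): Layer {
  if (stats.packetLossPct > 5) return "low";       // shed load first
  if (stats.estimatedKbps >= 2500) return "high";  // ~2.5 Mbps layer
  if (stats.estimatedKbps >= 700) return "medium"; // ~700 Kbps layer
  return "low";                                    // ~250 Kbps layer
}
```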

Network Traversal: Getting Through Firewalls

Here's an uncomfortable truth about the modern internet: most devices don't have public IP addresses.

Your laptop sits behind a home router doing NAT. Your phone is behind your cellular carrier's NAT. Corporate laptops are behind enterprise firewalls specifically designed to block unexpected connections.

Getting two devices behind different NATs to exchange video is one of the harder problems in real-time communication. This is where ICE (Interactive Connectivity Establishment) comes in.

[Diagram: NAT traversal]

The Connection Negotiation

When a participant joins, our SDK tries multiple connection paths in parallel:

1. Direct connection
Try punching through the NAT directly. Sometimes works with permissive home routers. Often blocked.

2. STUN-assisted connection
STUN (Session Traversal Utilities for NAT) servers help discover public-facing IP addresses. The SDK asks our STUN servers "what's my public IP and port?" and uses that information to establish a connection. Works for many scenarios where direct fails.

3. TURN relay
When direct and STUN both fail—common with strict corporate firewalls or symmetric NATs—we fall back to TURN (Traversal Using Relays around NAT). Media actually routes through our relay servers instead of going peer-to-peer. This always works, at the cost of maybe 50-100ms extra latency.

We operate STUN and TURN infrastructure globally. Our SDK tries all paths in parallel and uses the best available option. About 85% of connections succeed without TURN. The other 15% (mostly corporate networks and strict mobile NATs) go through relay.
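
In raw WebRTC terms, those paths come from the list of ICE servers handed to the peer connection; the browser gathers and races candidates from all of them. A sketch (the URLs below are placeholders, not our real endpoints):

```typescript
// Sketch: a peer connection that can use direct, STUN-assisted, or TURN-relayed paths.
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: "stun:stun.verriflo.example:3478" }, // public IP discovery
    {
      urls: [
        "turn:turn.verriflo.example:3478?transport=udp", // relay fallback
        "turns:turn.verriflo.example:443?transport=tcp", // strictest firewalls
      ],
      username: "ephemeral-username", // short-lived credentials
      credential: "ephemeral-password",
    },
  ],
});
```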

From your perspective—and your users' perspective—this is invisible. Video just works, regardless of network environment.

The Signaling Layer

Before any video flows, devices need to coordinate. This happens through signaling over WebSocket connections.

The sequence goes like this:

  1. Connection: Client opens a secure WebSocket to our signaling servers
  2. SDP exchange: Both sides describe their capabilities (codecs, resolutions, supported features)
  3. ICE negotiation: Exchange possible network paths for the actual media connection
  4. DTLS handshake: Establish encryption keys for media streams
  5. Media flows: Video starts

This all happens in 2-3 seconds. From the user's perspective, they clicked "Join" and now they see video.

The signaling connection stays open throughout the session, carrying control messages: participant joins/leaves, mute/unmute notifications, quality change signals, and so on.
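
The exact message format is internal, but the sequence maps onto standard WebRTC APIs. A simplified sketch, with a hypothetical URL and made-up message shapes:

```typescript
// Simplified sketch of the signaling dance; real message shapes are internal.
const ws = new WebSocket("wss://signal.verriflo.example"); // hypothetical URL
const pc = new RTCPeerConnection();

ws.onopen = async () => {
  const offer = await pc.createOffer(); // step 2: describe capabilities (SDP)
  await pc.setLocalDescription(offer);
  ws.send(JSON.stringify({ type: "offer", sdp: offer.sdp }));
};

pc.onicecandidate = ({ candidate }) => { // step 3: trickle ICE candidates
  if (candidate) ws.send(JSON.stringify({ type: "candidate", candidate }));
};

ws.onmessage = async (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "answer") {
    await pc.setRemoteDescription({ type: "answer", sdp: msg.sdp });
  } else if (msg.type === "candidate") {
    await pc.addIceCandidate(msg.candidate); // steps 4-5 (DTLS, media) follow automatically
  }
};
```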

Reconnection: Designed for Failure

Networks fail. WiFi drops. Cell towers hand off. Routers reboot. Students walk through dead zones.

Most video apps weren't built with this assumption. Connection drops, screen freezes, user refreshes the page, rejoins the room, apologizes for missing content.

We built the entire SDK around the expectation that connections will fail. The question isn't "if" but "when" and "how fast can we recover."

[Diagram: Reconnection flow]

The Recovery Strategy

Detection (1-3 seconds)
We notice packets stopped flowing. No waiting around hoping things improve.

Progressive retry
The first reconnection attempt happens immediately. If it fails, we wait 500ms and try again. Then 1 second. Then 2 seconds. This progressive backoff, doubling the wait each time, prevents hammering an overloaded network.

Multi-path attempts
We don't just retry the same path. ICE kicks in again, trying different candidates in parallel. Maybe the original path is now broken but a TURN relay will work.

Stream resumption
Once reconnected, we pick up the stream roughly where we left off. No manual intervention required.
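
In code, the retry schedule from step two reduces to something like this sketch (the real implementation also re-runs ICE, as described above):

```typescript
// Sketch of the progressive backoff schedule described above.
async function reconnectWithBackoff(tryReconnect: () => Promise<boolean>): Promise<boolean> {
  const delaysMs = [0, 500, 1000, 2000]; // immediate first attempt, then back off
  for (const delay of delaysMs) {
    await new Promise((resolve) => setTimeout(resolve, delay));
    if (await tryReconnect()) return true; // reconnected; resume streams
  }
  return false; // give up and surface the error to the app
}
```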

For most network hiccups, recovery takes under 10 seconds. Often under 5. The student sees a brief "Reconnecting..." overlay, then they're back in class. No page refresh. No rejoin.

This matters more than almost any other feature. A lecture where the video freezes for 30 seconds while students fumble with refresh buttons isn't a good learning experience. Seamless recovery is table stakes for us.

Encryption: Everything, Always

Every bit flowing through Verriflo is encrypted. No exceptions.

Media streams: DTLS-SRTP
The DTLS (Datagram Transport Layer Security) handshake happens during connection setup and establishes encryption keys. SRTP (Secure Real-time Transport Protocol) then encrypts every audio and video packet with those keys.

Even if someone intercepts the network traffic, the packets are useless without the keys. And the keys are negotiated fresh for each session during the DTLS handshake; they never cross the network in usable form.

Signaling: WSS over TLS 1.3
All control messages travel through encrypted WebSocket connections using TLS 1.3—the current standard.

API: HTTPS with TLS 1.3
Server-to-server communication between your backend and our API uses modern encryption.

Tokens: Cryptographically signed
The join tokens we issue are signed. Each one specifies exactly which room and which participant identity it's valid for. Tokens expire quickly. Even if intercepted, they're useless after expiration and can't be forged to access other rooms.
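
Our exact token format isn't documented on this page. As an illustration of the general idea, a signed, short-lived, room-scoped token could be minted with the Node jsonwebtoken library like so (claim names are hypothetical):

```typescript
import jwt from "jsonwebtoken";

// Illustrative only: a token valid for exactly one room and one identity,
// expiring quickly so an intercepted token is soon worthless.
function mintJoinToken(apiSecret: string, room: string, identity: string): string {
  return jwt.sign({ room, identity }, apiSecret, { expiresIn: "10m" });
}
```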

The Recording Pipeline

When recording is enabled, here's what happens behind the scenes:

[Diagram: Recording pipeline]

Composition: We capture the composite view—main camera plus screen share if active—the same view students are seeing.

Real-time encoding: As frames arrive, we encode to H.264 format. This isn't "record now, transcode later." Encoding happens live, in parallel with the stream itself.

Cloud storage: Encoded chunks stream directly to cloud storage as they're created.

Availability: Within minutes of class ending, the recording is available for download via API.

No post-processing queue. No "your recording will be ready in 2 hours." Minutes after class ends, it's ready.

Retention policy: Recordings are automatically deleted 6 hours after class ends. Download them to your own infrastructure if you need them longer. This isn't arbitrary—educational recordings often contain faces, names, sensitive discussions. Short retention by default minimizes exposure. You control what happens after download.
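
In practice, that means your backend should fetch recordings promptly after class. A hypothetical sketch (the endpoint below is invented; the API reference has the real route):

```typescript
// Hypothetical endpoint; the real route lives in the API reference.
async function archiveRecording(roomId: string, apiKey: string): Promise<ArrayBuffer> {
  const res = await fetch(`https://api.verriflo.example/v1/rooms/${roomId}/recording`, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`Recording unavailable (status ${res.status})`);
  return res.arrayBuffer(); // persist to your own storage before the 6-hour deletion
}
```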

Geographic Distribution

We're not running from a single data center somewhere.

Our media infrastructure is distributed globally. When a room is created, we allocate resources in the region closest to the instructor. Students connect to edge nodes near their locations. Internal routing between nodes is optimized for low latency.

Instructor in Singapore with students across Southeast Asia? Traffic stays regional. Instructor in London with European students? Same principle.

You don't configure regions. It's automatic based on where connections originate.

What This All Means for Integration

All of this complexity—SFUs, simulcast layers, ICE negotiation, TURN relays, progressive reconnection, real-time encoding—exists so your integration can be simple.

Your backend: One API call to generate a join token.
Your frontend: Pass that token to our SDK or open our URL in an iframe.
Everything else: Our problem.
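
Concretely, the iframe path can be as small as this sketch (the join URL shape here is illustrative, not the documented embed API):

```typescript
// Sketch: embedding a classroom given a join token from your backend.
function embedClassroom(container: HTMLElement, joinToken: string): void {
  const frame = document.createElement("iframe");
  frame.src = `https://rooms.verriflo.example/join?token=${encodeURIComponent(joinToken)}`; // hypothetical URL
  frame.allow = "camera; microphone; fullscreen; display-capture"; // device permissions
  frame.style.cssText = "width: 100%; height: 100%; border: 0;";
  container.appendChild(frame);
}
```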

The engineering work described on this page took years. With Verriflo, you skip straight to the result.


Ready to build?