WebRTC: P2P, SFU and MCU Approaches to Communicate Effectively
Today, a growing number of applications use WebRTC to transmit audio and video streams: Nextcloud Talk of course, but also Jitsi, BigBlueButton and many others. There are essentially three approaches to carrying these streams and scaling from a few users to several thousand simultaneous ones.
WebRTC is essentially a peer-to-peer technology, but depending on the purpose of the service, for example a large-scale multimedia broadcast or a service that has to process content, a separate central server is needed (or several, behind one or more load balancers). Depending on the use case, the following architectures can be considered.
P2P (Peer to Peer) or Mesh
This is the mode adopted by default by the version of Nextcloud Talk delivered with Nextcloud (and Cloudeezy). Peers connect directly to each other without a central server, which is advantageous in terms of cost, but as the number of peers grows (mesh structure), each client needs more CPU and network capacity. Roughly speaking, an ADSL connection (A = asymmetric) is suitable for up to 4 simultaneous users because its uplink is limited. This is why, in this mode, a user with a good machine (a capable CPU and a symmetrical fibre connection, for example) will encounter few problems, whereas a user on ADSL will run into trouble as more participants join the conference.
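The uplink limit can be made concrete with a rough calculation. The sketch below is only an illustration: it assumes each outgoing video stream costs about 300 kbit/s and that an ADSL line offers about 1000 kbit/s of uplink. Both figures and all function names are hypothetical, chosen for the example rather than taken from any measurement.

```python
def mesh_uplink_kbps(participants: int, stream_kbps: int) -> int:
    """Uplink one client needs in a full mesh: one copy of its stream per other peer."""
    return max(participants - 1, 0) * stream_kbps


def mesh_connection_count(participants: int) -> int:
    """Total peer connections in a full mesh of n participants: n * (n - 1) / 2."""
    return participants * (participants - 1) // 2


def max_mesh_participants(uplink_kbps: int, stream_kbps: int) -> int:
    """Largest mesh size whose per-client uplink still fits in uplink_kbps."""
    return uplink_kbps // stream_kbps + 1


# Assumed figures, for illustration only.
STREAM_KBPS = 300        # hypothetical bitrate of one outgoing video stream
ADSL_UPLINK_KBPS = 1000  # hypothetical ADSL uplink capacity

if __name__ == "__main__":
    for n in range(2, 7):
        print(
            f"{n} participants: {mesh_connection_count(n)} connections, "
            f"{mesh_uplink_kbps(n, STREAM_KBPS)} kbit/s uplink per client"
        )
    print("Largest mesh on this uplink:",
          max_mesh_participants(ADSL_UPLINK_KBPS, STREAM_KBPS))
```

Under these assumptions a fifth participant pushes the required uplink to 1200 kbit/s, beyond what the assumed ADSL line can send, which matches the rough limit of 4 users given above.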
Perhaps in a few years it will be possible to do without the other modes (ever more powerful CPUs, fast fibre or 5G connections widespread globally), but for the moment a central server is needed (in a datacenter, or on-premises if a symmetrical link is available) to overcome this problem and move the load from the clients to much more powerful servers.
SFU (Selective Forwarding Unit)
This is a central server (or servers) that relays multimedia traffic: each peer sends its stream once to the server, which decrypts it, re-encrypts it and forwards it to the other peers without mixing or transcoding. It is well suited to streaming-style services such as video broadcasting. This mode is used by Jitsi, among others.
MCU (Multipoint Control Unit)
Here a central server mixes or transcodes the incoming media streams and delivers the result to each receiver. The load on the clients and the network is considerably reduced, but the central server needs a lot of computing power. This is the mode adopted by Nextcloud Talk in its HPB version (optional on our Nextcloud hosting offers).
Comparison of requirements for each approach
| | P2P (Mesh) | MCU | SFU |
|---|---|---|---|
| Upload bandwidth | High | Low | Low |
| Download bandwidth | High | Low | High |
| Client CPU usage | High | Low | Medium |
| Server CPU usage | - | High | Low |
| Possible latency | Depends on network bandwidth | Depends on processor power | Depends on network bandwidth |
| Simulcast / SVC capability | - | - | Yes |
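The bandwidth rows of the table follow from how many streams each client sends and receives. The sketch below counts them for a conference of n participants, assuming one stream per participant and no simulcast (with simulcast a client would send several encodings of the same stream). The `ClientLoad` type and `client_load` function are hypothetical names introduced for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientLoad:
    upload_streams: int    # streams the client sends
    download_streams: int  # streams the client receives


def client_load(topology: str, participants: int) -> ClientLoad:
    """Per-client stream counts, assuming one stream per participant."""
    if participants < 2:
        raise ValueError("a conference needs at least 2 participants")
    others = participants - 1
    if topology == "mesh":
        # One copy sent to, and one stream received from, every other peer.
        return ClientLoad(upload_streams=others, download_streams=others)
    if topology == "sfu":
        # One copy sent to the server, which forwards every other stream back.
        return ClientLoad(upload_streams=1, download_streams=others)
    if topology == "mcu":
        # One copy sent to the server, one mixed stream received.
        return ClientLoad(upload_streams=1, download_streams=1)
    raise ValueError(f"unknown topology: {topology}")


if __name__ == "__main__":
    for topology in ("mesh", "mcu", "sfu"):
        print(topology, client_load(topology, 10))
```

For 10 participants this gives 9 up and 9 down for mesh, 1 up and 1 down for MCU, and 1 up and 9 down for SFU, which is the High/Low pattern shown in the upload and download rows.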
The design must fit the use case, in particular the target service and the budget. As a general rule:
- small-scale (1:1) voice/video calls use P2P;
- one-way broadcast services of medium scale or larger (e.g. e-learning, broadcasting) and services such as videoconferencing use SFU;
- large-scale voice conversations (e.g. voice mixing) and real-time control systems (e.g. video transcoding to a grid) are designed primarily with MCU.
The full mesh option (P2P) is feasible as long as the number of participants is small. As mentioned above, the main problem is bandwidth. MCU looks like an easy solution, but it has scaling and cost issues.
"Solutions" exist to limit these problems, for example lowering the resolution for each participant (lower-quality video means lower CPU requirements for decoding and lower bandwidth requirements for sending and receiving), but today nobody wants a "pixelated" conference on high-resolution screens.
A videoconferencing service architecture may combine all three options or only some of them, using each according to the logic of the service and the type and number of participants, for example a combination of full mesh (P2P) and SFU.
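A combination of full mesh and SFU could, for instance, switch on the number of participants. The sketch below is one possible rule, not a description of how any particular product behaves: the `pick_topology` function and its `mesh_limit` parameter are hypothetical, and the default of 4 simply reuses the rough ADSL figure quoted earlier in the article.

```python
def pick_topology(participants: int, mesh_limit: int = 4) -> str:
    """Use full mesh for small calls and fall back to an SFU above mesh_limit.

    mesh_limit is an assumed threshold; a real service would tune it
    to its users' uplink capacity and devices.
    """
    if participants < 2:
        raise ValueError("a call needs at least 2 participants")
    return "mesh" if participants <= mesh_limit else "sfu"


if __name__ == "__main__":
    # Simulate participants joining one by one and report the chosen mode.
    previous = None
    for count in range(2, 8):
        current = pick_topology(count)
        note = " (switch)" if previous and current != previous else ""
        print(f"{count} participants -> {current}{note}")
        previous = current
```

A real service would also have to renegotiate the media connections when the mode changes, which this sketch leaves out.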