For those types of conference calls, the server only needs to know the sizes of the various streams and which packet is for what stream. It does not need to see the decrypted media, so one can implement e2e encryption for such types of group calls. This is less common in the industry, but is possible. For example: https://support.google.com/duo/answer/9280240?hl=en
(I used to work at Google on WebRTC, Duo, and Hangouts, but now work on video calling at Signal).
Many conference calls are implemented using what's called a Selective Forwarding Unit (SFU) and the sending clients send multiple resolutions (either independent, called "Simulcast" or dependent, called "SVC"). In that case, the adaptation is done by the server in selecting which resolution to forward at any given time. This is fairly common practice in the industry. For example: https://github.com/jitsi/jitsi-videobridge and https://tools.ietf.org/html/draft-aboba-avtcore-sfu-rtp-00 and https://www.w3.org/TR/webrtc-svc/.
For those types of conference calls, the server only needs to know the sizes of the various streams and which packet is for what stream. It does not need to see the decrypted media, so one can implement e2e encryption for such types of group calls. This is less common in the industry, but is possible. For example: https://support.google.com/duo/answer/9280240?hl=en
(I used to work at Google on WebRTC, Duo, and Hangouts, but now work on video calling at Signal).