Establishing a WebRTC Media Session

WebRTC does not specify a signaling standard for establishing a media session; it is up to the implementer to decide how to signal the call. SIP was designed for exactly this purpose, so many WebRTC implementers choose it for their applications. The implementer is responsible for sending, receiving, and parsing SIP messages, while the browser only accepts the session description and handles the media channels. JSEP (JavaScript Session Establishment Protocol) is the standard that lets an application running in the browser supply session descriptions and exercise high-level media control.

WebRTC Media Session

Fig 1: WebRTC implementers can use SIP to establish a media session between peers

Browsers expose WebRTC to the application layer through JSEP. WebRTC applications that run in a browser are therefore written in JavaScript against this common interface, which keeps them interoperable across browsers. Signaling is typically carried over WebSockets, which give the application a means of connecting to a server of its choosing without giving it access to the network layer of the user’s computer.
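As a minimal sketch of what the application might send over such a connection, the block below defines a JSON envelope for session descriptions and ICE candidates, with one function to encode a message and one to decode and validate it. The envelope shape (`SignalMessage` and its `kind` field) and the function names are assumptions made for illustration. WebRTC defines no wire format, and a SIP-based application would carry the same session description in SIP message bodies instead.

```typescript
// Hypothetical signaling envelope; WebRTC itself does not define one.
type SignalMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string; sdpMid: string | null };

// Serialize a message so it can be handed to a transport such as a WebSocket.
function encodeSignal(msg: SignalMessage): string {
  return JSON.stringify(msg);
}

// Parse and validate an incoming message; returns null when it is malformed.
function decodeSignal(raw: string): SignalMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;

  const fields = data as Record<string, unknown>;
  const kind = fields["kind"];
  const sdp = fields["sdp"];
  const candidate = fields["candidate"];
  const sdpMid = fields["sdpMid"];

  if (kind === "offer" && typeof sdp === "string") {
    return { kind: "offer", sdp };
  }
  if (kind === "answer" && typeof sdp === "string") {
    return { kind: "answer", sdp };
  }
  if (kind === "candidate" && typeof candidate === "string") {
    return {
      kind: "candidate",
      candidate,
      sdpMid: typeof sdpMid === "string" ? sdpMid : null,
    };
  }
  return null; // unknown kind or missing fields
}
```

In a browser, the string returned by `encodeSignal` could be passed to `WebSocket.send()`, and each incoming message's `data` could be passed to `decodeSignal` before the session description is handed to the browser.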

WebRTC places no requirements or recommendations on the signaling protocol, so it can be chosen on a per-application basis. Typical VoIP applications use SIP for signaling, so many WebRTC voice and video applications use it as well to stay compatible with existing technologies.

WebRTC specifies the use of ICE (Interactive Connectivity Establishment) for network address translation (NAT) traversal. ICE works by querying STUN and TURN servers to build a list of candidate IP addresses and ports at which the peer can reach the user. The purpose of NAT traversal is to cross networks with unknown NAT and firewall configurations and establish the user-to-user media connection. In a web browser, the browser handles the ICE protocol itself, but the implementer may need to send ICE candidates to the peer via the signaling protocol.
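To make the candidate list more concrete, the sketch below parses the candidate attribute string that describes a single ICE candidate and extracts its address, port, and type: `host` for a local interface, `srflx` for an address discovered through a STUN server, `prflx` for one learned from the peer during connectivity checks, and `relay` for one allocated on a TURN server. The `IceCandidateInfo` type and the `parseCandidate` function are illustrative names, not part of any WebRTC API, and the parser reads only the leading fields of the attribute and ignores extensions such as `raddr` and `rport`.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

// Illustrative structure; the field names are chosen for this sketch.
interface IceCandidateInfo {
  foundation: string;
  component: number;
  transport: string;
  priority: number;
  address: string;
  port: number;
  type: CandidateType;
}

// Parse "candidate:..." (or the SDP form "a=candidate:...").
// Returns null when the line does not match the expected layout.
function parseCandidate(line: string): IceCandidateInfo | null {
  const body = line.trim().replace(/^a=/, "");
  if (!body.startsWith("candidate:")) return null;

  // Layout: foundation component transport priority address port "typ" type ...
  const f = body.slice("candidate:".length).split(/\s+/);
  if (f.length < 8 || f[6] !== "typ") return null;

  const type = f[7];
  if (type !== "host" && type !== "srflx" && type !== "prflx" && type !== "relay") {
    return null;
  }

  const component = Number(f[1]);
  const priority = Number(f[3]);
  const port = Number(f[5]);
  if (![component, priority, port].every(Number.isInteger)) return null;

  return {
    foundation: f[0],
    component,
    transport: f[2].toLowerCase(),
    priority,
    address: f[4],
    port,
    type,
  };
}
```

An application that forwards candidates through its signaling channel does not normally need to parse them, but a parser like this can help when logging which kinds of candidates were gathered.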

More Information