VoIP, or Voice over IP, enables mobile and fixed telephones, fax machines, and other communications devices to initiate and receive calls over an IP-based packet network. At a minimum, it can be seen as an IP-based successor to the previous generation of analog PSTN services. VoIP works with Internet and mobile services to send and receive voice calls as digital signals over the Internet. VoIP is often thought of as voice over wired networks, such as Ethernet-based networks, but modern mobile networks also use VoIP for all voice calls: voice is IP data in a cell network. VoIP is referred to as Voice over LTE (VoLTE) in 4G networks and Voice over New Radio (VoNR) in 5G. These terms do not stand in contrast to VoIP; rather, they indicate rules and requirements for how standard VoIP protocols are handled in a 3GPP network. VoIP can also be used on these (or other) networks in what is referred to as Over The Top (OTT) mode, i.e., it is given no more special handling by the network than any other data. This is often the case for non-carrier VoIP offered by third-party commercial service providers.
Other frameworks can fall under the VoIP umbrella as well. For example, WebRTC is based on the same VoIP media protocols – but sessions are generally managed using web protocols, rather than SIP.
There is a variety of software/hardware solutions and applications that can adapt mobile or landline phones and other devices to access VoIP services. The figure shows an example of a hybrid network depicting different ways of accessing an IP network for the purpose of using voice services.
Although a number of protocol ecosystems could be, and have been, called VoIP, such as those in the past based on H.323, Megaco, and MGCP, today the primary protocol that VoIP systems are built around is the Session Initiation Protocol (SIP). SIP provides a flexible standard for initiating multimedia sessions between endpoints, including video, chat, interactive games, and virtual reality. A SIP-based VoIP system is built on a variety of IETF protocols beyond RFC 3261 itself, some of the major pieces being:
- RTP – the Real-time Transport Protocol
- SDP – the Session Description Protocol
- TLS – Transport Layer Security
RTP is the protocol used to take the media stream, most often audio and possibly video in a VoIP system, and packetize it, adding sequencing and timing information as well as information on the type of the payload. The receiving end uses this information to put the pieces of the stream back in order and recreate a constant real-time stream.
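The packetization described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the 12-byte fixed RTP header with no CSRC entries and no header extension; the function names, the SSRC value, and the fake 160-byte frames are hypothetical and chosen only for the example.

```python
import struct

RTP_HEADER = struct.Struct("!BBHII")  # 12-byte fixed header, network byte order


def build_rtp_packet(payload_type, sequence, timestamp, ssrc, payload, marker=False):
    """Prefix one chunk of media with a fixed RTP header (no CSRCs, no extension)."""
    byte0 = 2 << 6  # version 2, padding = 0, extension = 0, CSRC count = 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = RTP_HEADER.pack(byte0, byte1, sequence & 0xFFFF,
                             timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def parse_rtp_packet(packet):
    """Split a packet built above back into header fields and payload."""
    byte0, byte1, sequence, timestamp, ssrc = RTP_HEADER.unpack(packet[:RTP_HEADER.size])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[RTP_HEADER.size:],
    }


def reassemble(packets):
    """Order packets by sequence number and join their payloads.

    Simplification: ignores sequence wraparound, packet loss, and jitter buffering.
    """
    parsed = sorted((parse_rtp_packet(p) for p in packets), key=lambda f: f["sequence"])
    return b"".join(f["payload"] for f in parsed)


if __name__ == "__main__":
    frames = [bytes([i]) * 160 for i in range(3)]  # three fake 20 ms frames at 8 kHz
    packets = [build_rtp_packet(0, seq, seq * 160, 0x1234ABCD, frame)
               for seq, frame in enumerate(frames)]
    packets.reverse()  # simulate out-of-order arrival
    assert reassemble(packets) == b"".join(frames)
```

The sequence number lets the receiver restore packet order, while the timestamp (advancing by 160 per 20 ms frame at an 8 kHz clock in this sketch) tells it when each chunk should be played out. A real receiver would also need a jitter buffer and loss handling, which are left out here.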
SDP is carried in the bodies of the SIP messages themselves, conveying the parameters used to negotiate a media session. The negotiation follows a process called Offer/Answer, defined in RFC 3264.
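As a rough illustration of Offer/Answer, the sketch below builds a small SDP offer and picks the audio payload types an answerer would accept. The addresses, port, session identifiers, and helper names are hypothetical, and matching on payload type numbers alone is a simplification that only holds for static payload types such as 0 (PCMU), 8 (PCMA), and 9 (G722).

```python
# A hypothetical SDP offer listing three audio codecs in order of preference.
OFFER = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8 9",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:9 G722/8000",
]) + "\r\n"


def audio_payload_types(sdp):
    """Return the payload type numbers from the first m=audio line, in order."""
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            # Fields are: m=audio <port> <proto> <fmt> <fmt> ...
            return [int(pt) for pt in line.split()[3:]]
    return []


def answer_payload_types(offer_sdp, supported):
    """Keep the offered payload types the answerer supports, in the offerer's order."""
    return [pt for pt in audio_payload_types(offer_sdp) if pt in supported]


if __name__ == "__main__":
    # An answerer that only supports PCMA (8) and PCMU (0).
    print(answer_payload_types(OFFER, supported={8, 0}))
```

The answerer would then send back its own SDP whose `m=audio` line lists only the selected payload types, along with its own address and port for receiving media.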
TLS is the successor to SSL for transport layer security. SIP uses TLS to encrypt the SIP messages between endpoints and servers on a hop-by-hop basis. This is often indicated by using the sips: URI scheme in place of sip:, for example sips:alice@example.com.
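A small sketch of how software might tell the secure form of a SIP address from the plain one is shown below. Note that SIP URIs as defined in RFC 3261 are written `sip:user@host` and `sips:user@host`, without slashes. The function name and the example addresses are hypothetical, and the check looks only at the scheme, not at the rest of the URI.

```python
def requires_tls(uri):
    """Return True for a sips: URI, False for a sip: URI.

    Only the scheme is examined; the remainder of the URI is not validated.
    """
    scheme, separator, _rest = uri.partition(":")
    scheme = scheme.lower()  # URI schemes are case-insensitive
    if not separator or scheme not in ("sip", "sips"):
        raise ValueError(f"not a SIP URI: {uri!r}")
    return scheme == "sips"


if __name__ == "__main__":
    for address in ("sip:alice@example.com", "sips:alice@example.com"):
        print(address, "-> TLS required:", requires_tls(address))
```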