IP Telephony - The Session Initiation Protocol (SIP), SIP’s Protocol Architecture, SIP Transport, SIP Network Components

Abdulmotaleb El Saddik
School of Information Technology and Engineering
University of Ottawa, Ontario, Canada

Definition: Internet telephony is the process of making telephone calls over the Internet.

Internet telephony, also referred to as IP telephony (IPT), is the process of making telephone calls over the Internet, regardless of whether traditional telephones (POTS, GSM, ISDN, etc.), single-use appliances, or audio-equipped personal computers are used in the calls. IPT is appealing for many reasons, the most important of which is the ease of implementing its services. Internet Telephony Service Providers (ITSPs) can use a single IP-based infrastructure to provide both traditional Internet access and Internet telephony.

The Session Initiation Protocol (SIP)

The Session Initiation Protocol (SIP) is a signaling protocol for Internet telephony. It is documented in (RFC3261, 2002) by the Internet Engineering Task Force (IETF) and is well suited to signaling for real-time multimedia communication. It is an end-to-end application-layer signaling protocol that is used to set up, modify, and tear down multimedia sessions such as audio and video conferencing, interactive gaming, virtual reality, and call forwarding over IP networks. By providing those services, SIP enables service providers to integrate basic IP telephony services with Web, e-mail, presence notification, and instant messaging over the Internet. SIP is changing the way people make telephone calls and is therefore becoming a real threat to the traditional public switched telephone network (PSTN).

SIP works alongside many other protocols that were designed to carry the various forms of real-time multimedia application data: it enables endpoints to discover one another and to agree on a characterization of the session that they would like to share. Although SIP was developed by the IETF as part of the Internet Multimedia Conferencing Architecture and was designed to work with Internet transport protocols such as UDP and TCP, it is very much a general-purpose signaling protocol that works independently of the underlying transport protocol and regardless of the type of session being established.

SIP is a text-based client-server protocol that incorporates elements of two widely used Internet protocols: HTTP and the Simple Mail Transfer Protocol (SMTP), used for web browsing and e-mail respectively. HTTP inspired the client-server design of SIP, as well as the use of URLs and URIs; in SIP, however, a host may well act as both client and server. From SMTP, SIP borrowed a text-encoding scheme and header style. For example, SIP reuses SMTP headers such as To, From, Date, and Subject.
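The text-based, header-oriented message format described above can be illustrated with a small parsing sketch. The message below is a made-up example (the addresses, tag, and branch values are hypothetical), and the function name parse_sip_message is an assumption of this sketch rather than part of any SIP library. It ignores header folding, compact header names, and repeated headers, all of which a real parser would have to handle.

```python
# Hypothetical example request; all addresses and identifiers are invented.
RAW_INVITE = (
    "INVITE sip:bob@example.org SIP/2.0\r\n"
    "Via: SIP/2.0/UDP client.example.com:5060;branch=z9hG4bK74bf9\r\n"
    "To: Bob <sip:bob@example.org>\r\n"
    "From: Alice <sip:alice@example.com>;tag=9fxced76sl\r\n"
    "Call-ID: 3848276298220188511@client.example.com\r\n"
    "CSeq: 1 INVITE\r\n"
    "Subject: Project call\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_sip_message(raw):
    """Split a SIP message into its start line, a header dict, and the body."""
    # An empty line separates the headers from the (possibly empty) body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        # Simplification: a repeated header overwrites the earlier one.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


start_line, headers, body = parse_sip_message(RAW_INVITE)
method, request_uri, version = start_line.split(" ")
print(method, request_uri, version)
print(headers["to"], "|", headers["from"], "|", headers["subject"])
```

The To, From, and Subject headers are read exactly as they would be from an e-mail message, which is the SMTP heritage the text refers to.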

SIP extensions support mobility and presence detection, allowing users to communicate using different devices, modes, and services anywhere that they are connected to the Internet. The Third-Generation Partnership Project (3GPP) group accepted SIP as the signaling protocol for multimedia applications in 3G mobile networks.

SIP’s Protocol Architecture

As can be seen in Figure 1, SIP does not rely on any particular transport protocol; it can run equally well over TCP (Transmission Control Protocol), UDP (User Datagram Protocol), TLS (Transport Layer Security), SCTP (Stream Control Transmission Protocol), and conceptually any other protocol stack, such as ATM (Asynchronous Transfer Mode) or Frame Relay. SIP does not dictate the data flow between peers. That is the role of the Session Description Protocol (SDP), which is used to negotiate and determine the format of the data exchanged between them. SDP, defined in RFC2327, is intended for describing multimedia sessions for the purposes of session announcement, invitation, and other forms of multimedia session initiation.
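To make the division of labor between SIP and SDP concrete, the following sketch parses a small SDP description of the kind carried in a SIP message body. The SDP text is a made-up example (host name, address, and port are hypothetical), and parse_sdp is an illustrative helper, not a standard API. It keeps only session-level fields and the attributes of each media section.

```python
# Hypothetical SDP body; the host, address, and port are invented.
SDP_BODY = (
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 client.example.com\r\n"
    "s=Call\r\n"
    "c=IN IP4 192.0.2.10\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0 8\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
    "a=rtpmap:8 PCMA/8000\r\n"
)


def parse_sdp(body):
    """Return (session_fields, media_sections) from '<type>=<value>' lines."""
    session = {}
    media = []
    for line in body.splitlines():
        if not line:
            continue
        kind, _, value = line.partition("=")
        if kind == "m":
            # m=<media> <port> <proto> <format list>
            name, port, proto, *formats = value.split()
            media.append({"media": name, "port": int(port), "proto": proto,
                          "formats": formats, "attributes": []})
        elif media:
            # Lines after an m= line belong to that media section;
            # this sketch keeps only its a= attributes.
            if kind == "a":
                media[-1]["attributes"].append(value)
        else:
            session[kind] = value
    return session, media


session, media = parse_sdp(SDP_BODY)
print(session["c"])
print(media[0]["media"], media[0]["port"], media[0]["formats"])
```

Here the offer says that audio is expected on port 49170 at the connection address, in either of two payload formats; the answering side would reply with its own description, and the media then flows outside of SIP.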

The benefits of the SIP protocol can be summarized by the following:

  • Because it utilizes existing IP architecture, services based on SIP are scalable.
  • Since SIP was built as an IP protocol, it integrates seamlessly with other IP protocols and services.
  • Global connectivity: Any SIP user can be reached by another SIP user over the Internet, regardless of their location, service provider, and whether they have registered with central services or not.
  • Simplicity: SIP messages are text coded and highly readable, and its transaction models are simple, except for a few cases that have special conditions.
  • Statelessness: SIP servers need to store only minimal information about the state or existence of a media session in a network.
  • Flexibility: The protocol can be used in any host application and is not restricted to telephony.

SIP Transport

User Datagram Protocol (UDP): A single UDP datagram carries one SIP request or response. This implies that a SIP message must be smaller than the Maximum Transmission Unit (MTU) of the IP network. Since SIP does not support fragmentation at the SIP layer, TCP is used for larger messages. The UDP source port is chosen from a pool of available port numbers (typically the dynamic range, 49152 and above), or sometimes the default SIP port, 5060, is used. Because UDP provides no reliable delivery, SIP messages may be lost. To handle this, SIP has its own reliability mechanism, which retransmits a message when it is lost.
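The retransmission mechanism mentioned above is not detailed in this article. As an illustration, the sketch below computes a retransmission schedule using the default timer values given in RFC 3261 (T1 = 0.5 s, T2 = 4 s, with the transaction abandoned after 64 × T1); those values, and the function name retransmit_times, come from outside the article and should be read as assumptions of the sketch.

```python
T1 = 0.5   # seconds; default round-trip time estimate (RFC 3261)
T2 = 4.0   # seconds; cap on the retransmit interval for non-INVITE requests


def retransmit_times(is_invite, t1=T1, t2=T2):
    """Return offsets (seconds after the first send) of each UDP retransmission.

    INVITE requests double the interval each time; other requests double it
    until it reaches t2. In both cases the client gives up at 64 * t1.
    """
    deadline = 64 * t1
    times = []
    interval = t1
    elapsed = interval
    while elapsed < deadline:
        times.append(elapsed)
        interval = interval * 2 if is_invite else min(interval * 2, t2)
        elapsed += interval
    return times


print(retransmit_times(is_invite=True))
print(retransmit_times(is_invite=False))
```

Retransmission stops as soon as a response arrives; the schedule shows only the worst case in which every copy of the request is lost.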

Transmission Control Protocol (TCP): In addition to reliable transport, TCP offers congestion control and can carry SIP messages of arbitrary size. As with UDP, the default destination port for SIP over TCP is 5060, and the source port is chosen from a pool of available port numbers. The main disadvantages of TCP are the setup delay incurred when establishing a connection and the need for the server to maintain the connection at the transport layer.

Transport Layer Security Protocol (TLS): SIP uses TLS over TCP for encrypted transport with the additional capability of authentication. The default SIP port number for TLS is 5061. However, this encryption and authentication apply only to a single hop. If a SIP request traverses multiple hops, TLS alone does not provide end-to-end authentication.
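The choice among the three transports can be summarized in a short sketch. The default ports are those given above; the 1000-octet threshold is the figure this article uses for user agents and is treated here as a configurable assumption (RFC 3261 itself switches to a congestion-controlled transport for requests larger than 1300 bytes or within 200 bytes of the path MTU). The function names are hypothetical.

```python
DEFAULT_PORTS = {"udp": 5060, "tcp": 5060, "tls": 5061}


def choose_transport(message, secure=False, udp_limit=1000):
    """Pick a transport for one SIP message.

    udp_limit is an assumed size threshold in octets above which the
    message is sent over TCP instead of UDP.
    """
    if secure:
        return "tls"
    size = len(message.encode("utf-8"))
    return "udp" if size <= udp_limit else "tcp"


def destination(host, transport, port=None):
    """Return (host, port), falling back to the default port of the transport."""
    return (host, port if port is not None else DEFAULT_PORTS[transport])


small = "OPTIONS sip:bob@example.org SIP/2.0\r\n\r\n"
large = small + "x" * 2000
print(choose_transport(small), choose_transport(large))
print(destination("proxy.example.com", choose_transport(small, secure=True)))
```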

SIP Network Components

SIP network components include User Agents, Servers and Gateways. The following section discusses these components in detail.

User Agent: A SIP-enabled end device is called a SIP user agent (UA). A user agent takes directions from a user and acts as an agent on that user’s behalf to make, accept, and tear down calls with other user agents. The UA terminates the SIP call signaling and acts as the interface to the SIP network. It also maintains the state of calls that it initiates or participates in. A UA must support UDP transport, and also TCP if it sends messages that are greater than 1000 octets in size. In addition, a SIP user agent must support SDP for media description. A SIP user agent contains both a client application and a server application. The two parts are designated the User Agent Client (UAC) and the User Agent Server (UAS). The UAC initiates requests on behalf of the user, and the UAS processes incoming requests and generates appropriate responses. During a session, a user agent will usually operate as both a UAC and a UAS.
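The UAS role can be sketched as a function that turns the headers of an incoming request into a response. Following RFC 3261, a response echoes the Via, From, To, Call-ID, and CSeq headers of the request, and the UAS adds its own tag to the To header; the function name build_response and the sample values are hypothetical, and the sketch sends no body.

```python
def build_response(request_headers, status_code, reason, to_tag=None):
    """Build a SIP response as text from the headers of the request it answers."""
    lines = [f"SIP/2.0 {status_code} {reason}"]
    for name in ("Via", "From", "To", "Call-ID", "CSeq"):
        value = request_headers[name]
        if name == "To" and to_tag is not None:
            # The UAS identifies its side of the dialog with a tag.
            value = f"{value};tag={to_tag}"
        lines.append(f"{name}: {value}")
    lines.append("Content-Length: 0")
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical request headers as a UAS might have parsed them.
invite_headers = {
    "Via": "SIP/2.0/UDP client.example.com:5060;branch=z9hG4bK74bf9",
    "From": "Alice <sip:alice@example.com>;tag=9fxced76sl",
    "To": "Bob <sip:bob@example.org>",
    "Call-ID": "3848276298220188511@client.example.com",
    "CSeq": "1 INVITE",
}

ringing = build_response(invite_headers, 180, "Ringing", to_tag="8321234356")
print(ringing)
```

The same device would use its UAC side when its user later hangs up, since it then originates the BYE request itself.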

Server: SIP servers assist in call establishment, call teardown, and mobility management. Some SIP servers (proxy and redirect) can be stateless. Logical SIP servers are often co-located within the same physical device. SIP servers must support TCP, TLS, and UDP for transport. There is no protocol distinction between these servers, and a client or proxy server has no way of knowing which kind it is communicating with. The distinction lies only in function: a proxy or redirect server cannot accept or reject a request, whereas a user agent server can. The different types of SIP servers are as follows:

  • Registration Server: The SIP registration server, also known as the registrar, allows SIP agents to register their current location, retrieve a list of current registrations, and clear all registrations. The registrar accepts a user’s SIP registration request (REGISTER) and responds with an acknowledgement (200 OK) if registration succeeds; otherwise, it responds with an error message. In a registration request, the “To” header field contains the name of the resource being registered, and the “Contact” header field contains the alternative addresses or aliases. The registration server creates a temporary binding between the Address of Record (AOR) URI in the “To” field and the device URI in the “Contact” field. Registration servers usually require the registering user agent to be authenticated for security reasons. The registered information is then made available to other SIP servers within the same administrative domain, such as proxies and redirect servers. The registrar is responsible for keeping the information in the location service up to date by sending it updates.
  • Proxy Server: The proxy server forwards SIP requests on behalf of SIP user agents to the next-hop server, which may be another proxy server or the final user agent. A proxy does not need to understand a SIP request in order to forward it. After receiving a SIP request, the proxy contacts the location service to determine the next hop to which the request should be forwarded. The proxy may rewrite the SIP message before forwarding it to its correct location. The proxy may use SIP registration, SIP presence, or any other type of information to determine a user’s location. The proxy can also be configured to provide authentication control and to act as a point of control between an internal private network and an outside public network. A proxy server may be stateful or stateless. A stateless proxy server processes each SIP request or response based solely on the message contents. Once the message has been parsed, processed, and forwarded or responded to, no information about it is stored. A stateless proxy server never retransmits a message and does not use any SIP timers. A stateful proxy server keeps track of requests and responses that were received in the past and uses that information in processing future requests and responses. A stateful proxy starts a timer when a request is forwarded. If no response to the request is received within the timer period, the proxy retransmits the request, relieving the user agent of this task.
  • Redirect Server: The redirect server responds to a UA request with a redirection response indicating the current location of the called party, so that the UA can contact it directly. In this case, the UA must establish a new call to the indicated location. A redirect server does not forward a request received from the UA. It uses a database or location service to look up a user, and the user’s location information is then sent back to the caller in a redirection response.
  • Location Server: A redirect or proxy server uses a location server to obtain information about a user’s whereabouts. The service can be co-located with other SIP servers. The interface between the location service and other servers is not defined by SIP.
  • Conference Server: A conferencing server is used to help establish multiparty conference calls. It mixes the media received and sends it out to all the participants using either one multicast address or all of the participants’ unicast addresses, depending on the conference mode that was set up.
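The interplay of the registrar, location service, proxy, and redirect server described in the list above can be sketched with an in-memory table. Because SIP does not define the interface to the location service, the LocationService class and every function name below are assumptions made for illustration. The status codes 200, 302, and 404 are standard SIP response codes.

```python
import time


class LocationService:
    """In-memory stand-in for a location service (interface is hypothetical)."""

    def __init__(self):
        self._bindings = {}  # AOR -> {contact URI: expiry timestamp}

    def bind(self, aor, contact, expires, now=None):
        now = time.time() if now is None else now
        self._bindings.setdefault(aor, {})[contact] = now + expires

    def clear(self, aor):
        self._bindings.pop(aor, None)

    def lookup(self, aor, now=None):
        """Return the contacts for an AOR whose bindings have not expired."""
        now = time.time() if now is None else now
        contacts = self._bindings.get(aor, {})
        return [c for c, expiry in contacts.items() if expiry > now]


def handle_register(location, to_aor, contact, expires=3600, now=None):
    """Registrar: bind the AOR in "To" to the device URI in "Contact"."""
    if contact == "*" and expires == 0:
        location.clear(to_aor)          # clear all registrations
    else:
        location.bind(to_aor, contact, expires, now)
    return 200, "OK"


def proxy_next_hop(location, request_uri, now=None):
    """Proxy: choose where to forward the request, or None if unknown."""
    contacts = location.lookup(request_uri, now)
    return contacts[0] if contacts else None


def redirect_response(location, request_uri, now=None):
    """Redirect server: tell the caller where to go instead of forwarding."""
    contacts = location.lookup(request_uri, now)
    if contacts:
        return 302, "Moved Temporarily", contacts
    return 404, "Not Found", []


loc = LocationService()
aor = "sip:bob@example.org"               # hypothetical address of record
device = "sip:bob@192.0.2.4:5060"         # hypothetical device URI
print(handle_register(loc, aor, device, expires=60, now=1000.0))
print(proxy_next_hop(loc, aor, now=1030.0))
print(redirect_response(loc, aor, now=1030.0))
print(proxy_next_hop(loc, aor, now=2000.0))   # binding has expired
```

The binding is temporary, as the text notes: once the expiry time has passed, the proxy finds no next hop and the redirect server answers with an error unless the user agent has registered again.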

SIP Gateways: A SIP gateway is an application that provides an interface between a SIP network and a network that uses another signaling protocol. SIP supports interworking with the PSTN and H.323 via a SIP-PSTN gateway and a SIP-H.323 gateway respectively. In terms of the SIP protocol, a gateway is just a special type of user agent that acts on behalf of another protocol rather than a human user. A SIP gateway terminates the SIP signaling path and may sometimes also terminate the media path. A SIP gateway can support hundreds or thousands of users and does not register every user it supports.

  • SIP-PSTN gateway: A SIP-PSTN gateway terminates both the signaling and media paths. SIP can be translated into, or made to interwork with, common Public Switched Telephone Network (PSTN) protocols such as Integrated Services Digital Network (ISDN), ISDN User Part (ISUP), and Channel Associated Signaling (CAS) protocols. A PSTN gateway also converts the RTP media stream in the IP network into a standard telephony trunk or line. The conversion of signaling and media paths allows calling to and from the PSTN using SIP.
  • SIP-H.323 gateway: A SIP-H.323 gateway terminates the SIP signaling path and converts the signaling to H.323, but the SIP user agent and H.323 terminal can exchange RTP media directly with each other without going through the gateway.

SIP’s Role in Multimedia Services

SIP is heavily involved in today’s multimedia services, especially in the following categories:

User Presence Notification and Personalization: SIP signaling functions request, detect, and deliver presence information. SIP presence functionality makes it possible to know who in a given contact list is online before a session is established. In an instant messaging application, the SUBSCRIBE and NOTIFY messages are used for presence detection and notification. A user agent sends a SUBSCRIBE message to another UA, listing the events about which the sender wishes to be notified. The other UA then uses NOTIFY messages to tell the subscribing UA that such an event has occurred.
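The SUBSCRIBE and NOTIFY exchange follows a publish-subscribe pattern, which the toy model below illustrates with in-process method calls in place of network messages. The class PresenceAgent and its methods are hypothetical. Sending an initial NOTIFY with the current state right after a subscription is accepted reflects the behavior of SIP event notification, not something stated in this article.

```python
class PresenceAgent:
    """Toy user agent that records watchers and notifies them of changes."""

    def __init__(self, uri):
        self.uri = uri
        self.status = "offline"
        self.subscribers = []   # agents that sent us SUBSCRIBE
        self.inbox = []         # (uri, status) pairs received via NOTIFY

    def subscribe(self, watcher):
        """Handle SUBSCRIBE from watcher and send it the current state."""
        if watcher not in self.subscribers:
            self.subscribers.append(watcher)
        watcher.notify(self.uri, self.status)

    def notify(self, presentity_uri, status):
        """Handle an incoming NOTIFY."""
        self.inbox.append((presentity_uri, status))

    def set_status(self, status):
        """Change our own presence and NOTIFY every subscriber."""
        self.status = status
        for watcher in self.subscribers:
            watcher.notify(self.uri, status)


alice = PresenceAgent("sip:alice@example.com")   # hypothetical URIs
bob = PresenceAgent("sip:bob@example.org")
bob.subscribe(alice)        # Alice asks to be told about Bob's presence
bob.set_status("online")
print(alice.inbox)
```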

Instant Messaging and Collaborative Environment: Instant messaging enables a user agent to send short messages to another user agent. It is very useful for short requests and responses and has better real-time characteristics than e-mail. The MESSAGE method is used to support instant messaging: short messages are sent from UA to UA without establishing a session between them. The message body is MIME-formatted (as in e-mail) and can also carry multimedia attachments in multi-part form.
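A MESSAGE request carries the text directly in its body, so no session setup is needed. The sketch below assembles such a request with a plain-text body. The function name and sample values are hypothetical, and headers that a complete request also needs (such as Via, Max-Forwards, and the From tag) are omitted to keep the example short.

```python
def build_message_request(from_uri, to_uri, text, call_id, cseq=1):
    """Assemble a simplified SIP MESSAGE request as bytes."""
    body = text.encode("utf-8")
    head = [
        f"MESSAGE {to_uri} SIP/2.0",
        f"To: <{to_uri}>",
        f"From: <{from_uri}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} MESSAGE",
        "Content-Type: text/plain",
        # Content-Length counts the octets of the body, not characters.
        f"Content-Length: {len(body)}",
    ]
    return ("\r\n".join(head) + "\r\n\r\n").encode("utf-8") + body


request = build_message_request(
    "sip:alice@example.com",      # hypothetical sender
    "sip:bob@example.org",        # hypothetical recipient
    "Lunch at noon?",
    call_id="a84b4c76e66710@client.example.com",
)
print(request.decode("utf-8"))
```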

Multimedia conference call setup and management: This can be divided into end to end call setup and conference setup:

SIP – End to End Call Setup

  • (Proxy): After receiving the SIP request from the user agent, the proxy contacts the location server to determine the next hop. Once it receives the next-hop information from the location server, the proxy adds its own host address to the INVITE request and forwards the message to that hop.
  • (Redirect): The SIP redirect server responds to a UA request with a redirection response indicating the current location of the called party.
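In the proxy case above, the step of updating the INVITE with the proxy's host address corresponds to adding a Via header for the proxy on top of the existing ones, so that the response can retrace the same path. The sketch below shows only that step; the function name, host names, and branch values are hypothetical, and a real proxy would also handle other headers such as Max-Forwards.

```python
def proxy_forward(request_lines, proxy_host, branch):
    """Return the request lines with the proxy's own Via inserted first."""
    start_line, *headers = request_lines
    via = f"Via: SIP/2.0/UDP {proxy_host};branch={branch}"
    # The newest Via goes above the older ones, directly after the start line.
    return [start_line, via, *headers]


invite = [
    "INVITE sip:bob@example.org SIP/2.0",
    "Via: SIP/2.0/UDP client.example.com:5060;branch=z9hG4bK74bf9",
    "To: Bob <sip:bob@example.org>",
    "From: Alice <sip:alice@example.com>;tag=9fxced76sl",
]

forwarded = proxy_forward(invite, "proxy.example.com:5060", "z9hG4bK2d4790")
for line in forwarded:
    print(line)
```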

SIP – Conference Setup: Conferencing where many parties can participate in the same call is now a common feature of multimedia communication systems. SIP supports three different multiparty conferencing modes:

  • Ad hoc/Full Mesh: In this mode, every participant establishes a session with every other participant through a series of INVITE messages and sends an individual copy of the media to each of the others. This mechanism scales only to small groups.
  • Meet me/Mixer: In this mode, each participant establishes a point-to-point session to the conferencing bridge (or mixer). The mixer or bridge takes each participant’s media stream and replicates it to all other participants as unicast. This mechanism is ideal if all participants are interactive; however, it does not scale to a large number of participants.
  • Interactive Broadcast/Network layer multicast: In this mode, each participant establishes a point-to-point session to the conferencing bridge (or mixer). A conferencing bridge is used, but the mixed media is sent to a multicast address instead of being unicast to each participant. This mechanism can involve active and passive participants. SIP signaling is required for interactive participants only. This mode works well for large-scale conferences.
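The scaling behavior of the three modes above can be made concrete by counting media streams. The model below is a simplification assumed for illustration: one stream per sender and receiver pair in a full mesh, one inbound and one outbound unicast stream per participant with a mixer, and one inbound stream per active participant plus a single multicast stream in the broadcast mode. The function name and mode labels are hypothetical.

```python
def media_streams(mode, active):
    """Count media streams for a conference with `active` sending participants."""
    n = active
    if mode == "full_mesh":
        # Every participant sends a separate copy to each of the others.
        return n * (n - 1)
    if mode == "mixer":
        # One stream up to the mixer and one mixed stream back, per participant.
        return 2 * n
    if mode == "multicast":
        # One stream up per active participant, one shared multicast stream down.
        return n + 1
    raise ValueError(f"unknown mode: {mode}")


for size in (4, 20, 100):
    print(size, [media_streams(m, size) for m in ("full_mesh", "mixer", "multicast")])
```

The quadratic growth of the full mesh is why it suits only small groups, while in the multicast mode passive listeners add no streams at all.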

User Mobility: One of the powerful features of SIP is its ability to offer a SIP user terminal mobility, personal mobility, and service mobility.

  • Terminal Mobility (Mobile IP-SIP): A SIP user agent is able to maintain its connection to the Internet as its user moves from network to network and possibly changes the point of connection. The user’s generic, location-independent address enables access to services from both stationary and mobile end devices.
  • Personal Mobility (SIP REGISTER): SIP personal mobility allows the user to access Internet services from any location using any end device. Since a SIP URI (similar to an e-mail address) is device-independent, a user can use any end device to make and receive calls.
  • Service Mobility: SIP service mobility allows a SIP user to keep the same services when mobile, as long as the services or tools residing in the user agent can be accessed over the Internet (for example, call forwarding). A participant can interrupt a session and later continue it at a different location.


SIP is a powerful and flexible application-layer signaling protocol for multimedia applications. Its applications are not limited to Internet telephony, although telephony applications are the main driving force behind SIP development. Another popular application of SIP is Instant Messaging and Presence (IMP). The IETF SIMPLE working group is developing standards for IM and presence. SIP has been adopted by the 3rd Generation Partnership Project (3GPP) for establishing, controlling, and maintaining real-time wireless multimedia sessions using the Internet Protocol.

SIP is an ASCII text-based protocol, and SIP messages are long: up to and exceeding 800 bytes. This is not a problem for fixed networks with high bandwidth, but it is for wireless cellular networks, where bandwidth is very limited. For this reason, SIP messages should be compressed in wireless networks. A number of proposals for SIP message compression have been submitted to the Robust Header Compression (ROHC) working group of the Internet Engineering Task Force (IETF). TCCB (Text Based Compression using Cache and Blank Approach) is a compression technique suited to long ASCII text-based messages, such as SIP message bodies. SIP message compression using TCCB therefore has the potential to reduce request/response delays.
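The effect of compressing a text-based SIP message can be tried out with a general-purpose compressor. The sketch below uses DEFLATE from Python's zlib module purely as a stand-in: it is not TCCB, whose details are not given in this article, and the sample message is a made-up example. The helper names are hypothetical.

```python
import zlib

# Hypothetical request with an SDP body; all values are invented.
SAMPLE = (
    "INVITE sip:bob@example.org SIP/2.0\r\n"
    "Via: SIP/2.0/UDP client.example.com:5060;branch=z9hG4bK74bf9\r\n"
    "Max-Forwards: 70\r\n"
    "To: Bob <sip:bob@example.org>\r\n"
    "From: Alice <sip:alice@example.com>;tag=9fxced76sl\r\n"
    "Call-ID: 3848276298220188511@client.example.com\r\n"
    "CSeq: 1 INVITE\r\n"
    "Contact: <sip:alice@client.example.com>\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 client.example.com\r\n"
    "s=Call\r\n"
    "c=IN IP4 192.0.2.10\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0 8\r\n"
)


def compress_sip(message):
    """Compress a SIP message with DEFLATE (stand-in for a SIP-specific scheme)."""
    return zlib.compress(message.encode("ascii"), 9)


def decompress_sip(data):
    """Recover the original message text."""
    return zlib.decompress(data).decode("ascii")


packed = compress_sip(SAMPLE)
print(len(SAMPLE.encode("ascii")), "octets before,", len(packed), "octets after")
```

Schemes designed for SIP can do better than a generic compressor on a single message, for instance by sharing a dictionary of common SIP strings between sender and receiver.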