Girish Mahajan (Editor)

Session Initiation Protocol


The Session Initiation Protocol (SIP) is a communications protocol for signaling, for the purpose of controlling multimedia communication sessions. Internet telephony, business IP telephone systems, service providers and carriers use SIP. SIP can be used to set up and control voice and video calls, as well as instant messaging. The most common application of SIP is the setup and termination of Voice over IP (VoIP) telephone calls.

Its essential purpose in call setup is to inform the calling party of the Internet Protocol (IP) address of the called party's telephone, so that data units containing segments of digitized speech may then be transmitted to the called party's telephone, implementing Voice over IP (VoIP) speech communication.

SIP implements the functions of the session layer in the OSI 7-Layer Reference Model. In the 4-layer DoD Internet Model, SIP is an application layer protocol. SIP was designed to be independent of the underlying transport layer. The protocol defines the messages that are sent between endpoints, which govern establishment, termination and other essential elements of a call. SIP can be used for creating, modifying and terminating sessions consisting of one or several media streams. It is a text-based protocol, incorporating many elements of the Hypertext Transfer Protocol (HTTP) and the Simple Mail Transfer Protocol (SMTP).

SIP works in conjunction with other protocols that specify the media format and protocol to be used to subsequently communicate the media. SIP is typically used to carry a Session Description Protocol (SDP) message specifying the codec and the use of either the Real-time Transport Protocol (RTP) or Secure Real-time Transport Protocol (SRTP) for media communication. RTP protocol data units may be encrypted by Transport Layer Security (TLS) for secure transmission.

History

SIP was originally designed by Mark Handley, Henning Schulzrinne, Eve Schooler and Jonathan Rosenberg in 1996. The protocol was standardized as RFC 2543 in 1999. In November 2000, SIP was accepted as a 3GPP signaling protocol and permanent element of the IP Multimedia Subsystem (IMS) architecture for IP-based streaming multimedia services in cellular networks. In June 2002 the specification was revised in RFC 3261 and various extensions and clarifications have been published since.

The protocol was designed with the vision to support new multimedia applications. It has been extended for video conferencing, streaming multimedia distribution, instant messaging, presence information, file transfer, fax over IP and online games.

SIP is distinguished by its proponents for having roots in the Internet community rather than in the telecommunications industry. SIP has been standardized primarily by the IETF, while other protocols, such as H.323, have traditionally been associated with the International Telecommunication Union (ITU).

A motivating goal for SIP was to provide a signaling and call setup protocol for IP-based communications that can support a superset of the call processing functions and features present in the public switched telephone network (PSTN). SIP by itself does not define these features; rather, its focus is call setup and signaling. The features that permit familiar telephone-like operations (e.g., dialing a number, causing a phone to ring, hearing ringback tones or a busy signal) are performed by proxy servers and user agents. Implementation and terminology are different in the SIP world compared to the PSTN but, to the end-user, the behavior is similar.

Protocol operation

SIP is only involved in the signaling portion of a media communication session, primarily used to set up and terminate voice or video calls. SIP can be used to establish two-party (unicast) or multiparty (multicast) sessions. It also allows modification of existing calls. The modification can involve changing addresses or ports, inviting more participants, and adding or deleting media streams. SIP is also used in messaging applications, such as instant messaging, and in event subscription and notification.

SIP works in concert with several other protocols to specify the media format and coding, and the protocol for communicating the media once the call is set up. For call setup, the body of a SIP message contains a Session Description Protocol (SDP) data unit, which specifies the media format, codec and media communication protocol. Voice and video media are typically specified to be communicated between the terminals using the Real-time Transport Protocol (RTP) or Secure Real-time Transport Protocol (SRTP).
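
As a rough illustration of what such a body might contain, the sketch below builds a minimal SDP description for a single audio stream. The helper name build_sdp_offer, the address, the port and the choice of the PCMU codec are illustrative assumptions, not details taken from the text above; a real SRTP offer would also need keying material.

```python
def build_sdp_offer(ip: str, port: int, secure: bool = False) -> str:
    """Return a minimal SDP body offering one PCMU (G.711 mu-law) audio stream.

    Hypothetical helper for illustration only.
    """
    # "RTP/AVP" selects plain RTP; "RTP/SAVP" signals SRTP.
    profile = "RTP/SAVP" if secure else "RTP/AVP"
    lines = [
        "v=0",                          # SDP version
        f"o=- 0 0 IN IP4 {ip}",         # origin (session id and version set to 0 here)
        "s=-",                          # session name (unused)
        f"c=IN IP4 {ip}",               # address where media should be sent
        "t=0 0",                        # unbounded session time
        f"m=audio {port} {profile} 0",  # one audio stream, static payload type 0
        "a=rtpmap:0 PCMU/8000",         # payload type 0 is PCMU at 8 kHz
    ]
    return "\r\n".join(lines) + "\r\n"


sdp = build_sdp_offer("192.0.2.10", 49170)
print(sdp)
```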

Each resource of a SIP network, such as a user agent or a voicemail box, is identified by a Uniform Resource Identifier (URI), which follows the general standard syntax also used in Web services and e-mail. The URI scheme used for SIP is sip, and a typical SIP URI has the form sip:username@domainname or sip:username@hostport, where domainname requires DNS SRV records to locate the servers for the SIP domain, while hostport can be an IP address or a fully qualified domain name of the host, together with a port. If secure transmission is required, the scheme sips is used.
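
The URI forms above can be taken apart with a few string operations. The following is a simplified sketch: the names SipUri and parse_sip_uri are hypothetical, and the parser deliberately ignores IPv6 literals, passwords and escaping, which a full implementation would have to handle.

```python
from typing import NamedTuple, Optional


class SipUri(NamedTuple):
    scheme: str           # "sip" or "sips"
    user: Optional[str]   # None when the URI has no user part
    host: str
    port: Optional[int]   # None when no explicit port is given


def parse_sip_uri(uri: str) -> SipUri:
    """Split a sip: or sips: URI into scheme, user, host and port (simplified)."""
    scheme, sep, rest = uri.partition(":")
    scheme = scheme.lower()
    if not sep or scheme not in ("sip", "sips"):
        raise ValueError(f"not a SIP URI: {uri!r}")
    user, at, hostport = rest.rpartition("@")
    # Drop URI parameters (";transport=tcp") and headers ("?subject=x").
    hostport = hostport.split(";", 1)[0].split("?", 1)[0]
    host, colon, port = hostport.partition(":")
    if not host:
        raise ValueError(f"missing host in {uri!r}")
    return SipUri(scheme, user if at else None, host, int(port) if colon else None)


print(parse_sip_uri("sip:alice@example.com"))
print(parse_sip_uri("sips:bob@192.0.2.4:5061;transport=tcp"))
```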

SIP employs design elements similar to the HTTP request/response transaction model. Each transaction consists of a client request that invokes a particular method or function on the server and at least one response. SIP reuses most of the header fields, encoding rules and status codes of HTTP, providing a readable text-based format.

SIP clients typically use TCP or UDP on port numbers 5060 or 5061 to communicate signaling information to SIP servers and other SIP endpoints. Port 5060 is commonly used for non-encrypted signaling traffic whereas port 5061 is typically used for traffic encrypted with Transport Layer Security (TLS).

SIP can be carried by several transport layer protocols including the Transmission Control Protocol (TCP), the User Datagram Protocol (UDP) or the Stream Control Transmission Protocol (SCTP).

SIP-enabled telephony networks often implement many of the call processing features of Signaling System 7 (SS7), although the two protocols themselves are very different. SS7 is a centralized protocol, characterized by a complex central network architecture and dumb endpoints (traditional telephone handsets). SIP is a client-server protocol; however, most SIP-enabled devices may perform both the client and the server role. In general, the session initiator is a client, and the call recipient is the server. SIP features are implemented in the communicating endpoints, contrary to traditional SS7 architecture, in which features are implemented in the network core.

Because SIP devices must perform both client and server roles, network communication can be difficult with modern network topologies. When using a connection-oriented protocol like TCP, SIP nominally expects that separate connections will be opened for requests from A to B and requests from B to A. The use of firewalls and network address translation (NAT) interferes with this, as it may not be possible for B to initiate a connection to A, if A is behind a firewall or NAT. SIP allows the original connection from A to B to be used for requests from B to A, but the requests must correctly distinguish between A's private and public addresses and ports; this is also true of requests on connectionless protocols like UDP. To accomplish this, SIP uses extensions like received and rport, and can be paired with other protocols for discovering network topology such as TURN, STUN, and ICE.
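
To make the received and rport mechanism more concrete, here is a minimal sketch of how a server might annotate the topmost Via header value of an incoming request with the address and port it actually saw on the wire. The function name annotate_via is hypothetical, and the logic is simplified: it always adds received, whereas the specifications (RFC 3261 and RFC 3581) attach conditions to this.

```python
def annotate_via(via: str, src_ip: str, src_port: int) -> str:
    """Return the Via value with received/rport filled in from the packet source.

    Simplified illustration; assumes a single Via value with ';'-separated parameters.
    """
    parts = [p.strip() for p in via.split(";")]
    sent_by, params = parts[0], parts[1:]
    out = []
    for param in params:
        name = param.split("=", 1)[0].lower()
        if name == "rport":
            # The client asked for the source port to be reported back.
            out.append(f"rport={src_port}")
        elif name == "received":
            continue  # any stale value is replaced below
        else:
            out.append(param)
    out.append(f"received={src_ip}")
    return ";".join([sent_by] + out)


via = "SIP/2.0/UDP 10.0.0.5:5060;branch=z9hG4bK74bf9;rport"
print(annotate_via(via, "203.0.113.7", 40312))
```

With these values the server can send its response to the public address and port of the NAT rather than to the private address the client wrote into the header.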

Network elements

The network elements that use the Session Initiation Protocol for communication are called SIP user agents. Each user agent (UA) performs the function of a user agent client (UAC) when it is requesting a service function, and that of a user agent server (UAS) when responding to a request. Thus, any two SIP endpoints may in principle operate without any intervening SIP infrastructure. However, for network operational reasons, and for provisioning public services to users, and directory services, SIP defines several specific types of network server elements. Each of these service elements also communicates within the client-server model implemented in user agent clients and servers.

User agent

A user agent is a logical network end-point used to create or receive SIP messages. The user agent manages SIP sessions. As a client (UAC), it sends SIP requests, and as a server (UAS) it receives requests and returns a SIP response. Unlike other network protocols that fix the roles of client and server (for example HTTP, in which a web browser acts only as a client and never as a server), the Session Initiation Protocol requires both peers to implement both roles. The roles of UAC and UAS only last for the duration of a SIP transaction.

A SIP phone is an IP phone that implements client and server functions of a SIP user agent and provides the traditional call functions of a telephone, such as dial, answer, reject, call hold, and call transfer. SIP phones may be implemented as a hardware device or as a softphone. As vendors increasingly implement SIP as a standard telephony platform, the distinction between hardware-based and software-based SIP phones is blurred and SIP elements are implemented in the basic firmware functions of many IP-capable devices.

In SIP, as in HTTP, the user agent may identify itself using a message header field (User-Agent), containing a text description of the software, hardware, or the product name. The user agent field is sent in request messages, which means that the receiving SIP server can evaluate this information to perform device-specific configuration or feature activation. Operators of SIP network elements sometimes store this information in customer account portals, where it can be useful in diagnosing SIP compatibility problems or display of service status.
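
A server that wants to evaluate this field first has to pull it out of the request. The sketch below shows one simple way to do that; parse_headers and the sample message are illustrative assumptions, and the parser ignores repeated headers, folded lines and compact header names.

```python
def parse_headers(message: str) -> dict:
    """Return the header fields of a SIP message as a dict with lower-cased names."""
    head = message.split("\r\n\r\n", 1)[0]   # everything before the body
    headers = {}
    for line in head.split("\r\n")[1:]:      # skip the request or status line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


# Hypothetical request, shortened for illustration.
msg = (
    "REGISTER sip:example.com SIP/2.0\r\n"
    "To: <sip:alice@example.com>\r\n"
    "User-Agent: ExamplePhone/1.0\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
headers = parse_headers(msg)
print(headers.get("user-agent"))
```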

Proxy server

A proxy server is a network server with UAC and UAS components that functions as an intermediary entity for the purpose of performing requests on behalf of other network elements. A proxy server primarily plays the role of routing, meaning that its job is to ensure that a request is sent to another entity closer to the targeted user. Proxies are also useful for enforcing policy, such as for determining whether a user is allowed to make a call. A proxy interprets, and, if necessary, rewrites specific parts of a request message before forwarding it.
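
Two of the rewrites a proxy typically performs when forwarding, as described in RFC 3261, are decrementing Max-Forwards and adding its own Via entry on top. The sketch below shows only those two steps; forward_request, the header representation and the example values are assumptions made for illustration, and routing decisions are left out entirely.

```python
def forward_request(headers: list, proxy_host: str, branch: str) -> list:
    """Return a new header list as a proxy might emit it.

    headers is an ordered list of (name, value) tuples. Hypothetical helper.
    """
    out = []
    for name, value in headers:
        if name.lower() == "max-forwards":
            hops = int(value)
            if hops <= 0:
                # A real proxy would answer with 483 (Too Many Hops) instead.
                raise ValueError("483 Too Many Hops")
            value = str(hops - 1)
        out.append((name, value))
    # The proxy's own Via goes on top so that responses travel back through it.
    return [("Via", f"SIP/2.0/UDP {proxy_host};branch={branch}")] + out


incoming = [
    ("Via", "SIP/2.0/UDP client.example.com;branch=z9hG4bK776"),
    ("Max-Forwards", "70"),
    ("To", "<sip:bob@example.com>"),
]
outgoing = forward_request(incoming, "proxy.example.com", "z9hG4bKabc123")
for name, value in outgoing:
    print(f"{name}: {value}")
```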

Registrar

A registrar is a SIP endpoint that provides a location service. It accepts REGISTER requests, recording the address and other parameters from the user agent. For subsequent requests it provides an essential means to locate possible communication peers on the network. The location service links one or more IP addresses to the SIP URI of the registering agent. Multiple user agents may register for the same URI, with the result that all registered user agents receive the calls to the URI.
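
The location service can be pictured as a table from a SIP URI to its currently registered contact addresses. The class below is a minimal in-memory sketch under that assumption; the names LocationService, register and lookup, and the default expiry of 3600 seconds, are illustrative choices rather than anything prescribed by the text.

```python
import time


class LocationService:
    """Toy in-memory binding table: SIP URI -> {contact address: expiry time}."""

    def __init__(self):
        self._bindings = {}

    def register(self, uri, contact, expires=3600, now=None):
        """Record (or refresh) a binding; expires <= 0 removes it."""
        now = time.time() if now is None else now
        contacts = self._bindings.setdefault(uri, {})
        if expires <= 0:
            contacts.pop(contact, None)
        else:
            contacts[contact] = now + expires

    def lookup(self, uri, now=None):
        """Return all contacts for the URI whose registration has not expired."""
        now = time.time() if now is None else now
        return [c for c, expiry in self._bindings.get(uri, {}).items() if expiry > now]


location = LocationService()
# Two devices register for the same URI, so both would receive calls to it.
location.register("sip:alice@example.com", "sip:alice@192.0.2.10:5060", now=0)
location.register("sip:alice@example.com", "sip:alice@198.51.100.7:5060", now=0)
print(location.lookup("sip:alice@example.com", now=10))
```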

SIP registrars are logical elements, and are often co-located with SIP proxies. To improve network scalability, location services may instead be located with a redirect server.

Redirect server

A redirect server is a user agent server that generates 3xx (redirection) responses to requests it receives, directing the client to contact an alternate set of URIs. A redirect server allows proxy servers to direct SIP session invitations to external domains.
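
A redirection response might be assembled roughly as follows. This is a sketch only: build_redirect and the sample header values are hypothetical, the alternate URIs are carried in a Contact header, and details such as adding a tag to the To header are omitted.

```python
def build_redirect(request_headers: dict, contacts: list) -> str:
    """Build a 302 response that points the client at alternate URIs."""
    lines = ["SIP/2.0 302 Moved Temporarily"]
    # These fields are copied from the request so the client can match the response.
    for name in ("Via", "From", "To", "Call-ID", "CSeq"):
        lines.append(f"{name}: {request_headers[name]}")
    lines.append("Contact: " + ", ".join(f"<{c}>" for c in contacts))
    lines.append("Content-Length: 0")
    return "\r\n".join(lines) + "\r\n\r\n"


request_headers = {
    "Via": "SIP/2.0/UDP client.example.com;branch=z9hG4bK776",
    "From": "<sip:alice@example.com>;tag=1928",
    "To": "<sip:bob@example.com>",
    "Call-ID": "a84b4c76e66710",
    "CSeq": "1 INVITE",
}
response = build_redirect(request_headers, ["sip:bob@backup.example.org"])
print(response)
```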

Session border controller

Session border controllers serve as middle boxes between user agents and SIP servers for various types of functions, including network topology hiding and assistance in NAT traversal.

Gateway

Gateways can be used to interconnect a SIP network to other networks, such as the public switched telephone network, which use different protocols or technologies.

SIP messages

SIP is a text-based protocol with syntax similar to that of HTTP. There are two different types of SIP messages: requests and responses. The first line of a request has a method, defining the nature of the request, and a Request-URI, indicating where the request should be sent. The first line of a response has a response code.
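
The difference between the two message types is visible in the first line alone, as this small sketch shows. The function name parse_start_line and the tuple layout it returns are assumptions for illustration; the sketch expects a well-formed line with three space-separated parts.

```python
def parse_start_line(line: str):
    """Classify the first line of a SIP message as a request or a response."""
    first, second, third = line.split(" ", 2)
    if first.startswith("SIP/"):
        # Status line: SIP-Version, response code, reason phrase
        return ("response", int(second), third)
    # Request line: method, Request-URI, SIP-Version
    return ("request", first, second)


print(parse_start_line("INVITE sip:bob@example.com SIP/2.0"))
print(parse_start_line("SIP/2.0 180 Ringing"))
```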

Requests

Requests initiate a SIP transaction between two SIP entities for establishing, controlling, and terminating sessions. Critical methods include the following.

  • INVITE: Used to establish a dialog with media exchange between user agents.
  • BYE: Terminates an existing session.
  • REGISTER: The method implements a location service for user agents, which indicate their address information to the server.
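
As a hedged sketch of what such a request looks like on the wire, the helper below assembles a bodiless request with the header fields a SIP request normally carries. The name build_request, its parameters and all addresses are hypothetical; a real INVITE would usually also carry an SDP body and a Content-Type header.

```python
import uuid


def build_request(method, target_uri, from_uri, to_uri, local_host, cseq=1):
    """Assemble a minimal SIP request without a body (illustration only)."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]   # branch values start with this prefix
    lines = [
        f"{method} {target_uri} SIP/2.0",        # method and Request-URI
        f"Via: SIP/2.0/UDP {local_host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{from_uri}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <{to_uri}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_host}",
        f"CSeq: {cseq} {method}",
        f"Contact: <sip:{local_host}>",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


invite = build_request(
    "INVITE",
    "sip:bob@example.com",
    "sip:alice@example.com",
    "sip:bob@example.com",
    "client.example.com",
)
print(invite)
```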

Responses

Responses are sent by the user agent server indicating the result of a received request. Several classes of responses are recognized, determined by the numerical range of result codes:

  • 1xx: Provisional responses to requests indicate the request was valid and is being processed.
  • 2xx: 200-level responses indicate a successful completion of the request. As a response to an INVITE, it indicates a call is established.
  • 3xx: This group indicates a redirection is needed for completion of the request. The request has to be completed with a new destination.
  • 4xx: The request contained bad syntax or cannot be fulfilled at the server.
  • 5xx: The server failed to fulfill an apparently valid request.
  • 6xx: This is a global failure, as the request cannot be fulfilled at any server.
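
The class of a response follows directly from the first digit of its code, which the following sketch makes explicit. The function name response_class and the short labels are illustrative choices.

```python
def response_class(code: int) -> str:
    """Map a SIP response code to its class, based on the hundreds digit."""
    classes = {
        1: "provisional",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
        6: "global failure",
    }
    if not 100 <= code <= 699:
        raise ValueError(f"not a SIP response code: {code}")
    return classes[code // 100]


for code in (180, 200, 302, 404, 503, 603):
    print(code, response_class(code))
```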

Transactions

SIP defines a transaction mechanism to control the exchanges between participants and deliver messages reliably. A transaction is a state of a session, which is controlled by various timers. Client transactions send requests and server transactions respond to those requests with one or more responses. The responses may include provisional responses with a response code in the form 1xx, and one or more final responses (2xx to 6xx).

Transactions are further categorized as either type Invite or type Non-Invite. Invite transactions differ in that they can establish a long-running conversation, referred to as a dialog in SIP, and so include an acknowledgment (ACK) of any non-failing final response, e.g., 200 OK.

Because of these transactional mechanisms, unreliable transport protocols, such as the User Datagram Protocol (UDP), are sufficient for SIP operation.
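
One way to see how timers make an unreliable transport workable is to look at the retransmission schedule of an INVITE client transaction over UDP. The sketch below follows the scheme in RFC 3261, where the retransmission interval starts at T1 (500 ms by default), doubles after each retransmission, and the transaction gives up after 64 × T1; the function name and return shape are assumptions made for illustration.

```python
def invite_retransmit_times(t1: float = 0.5):
    """Return (retransmission times, timeout), in seconds after the first send.

    Applies only while no response has been received.
    """
    times = []
    interval = t1
    elapsed = 0.0
    timeout = 64 * t1            # the transaction is abandoned at this point
    while elapsed + interval < timeout:
        elapsed += interval
        times.append(elapsed)    # the request is sent again at this moment
        interval *= 2            # back off exponentially
    return times, timeout


times, timeout = invite_retransmit_times()
print(times)    # [0.5, 1.5, 3.5, 7.5, 15.5, 31.5]
print(timeout)  # 32.0
```

Over a reliable transport such as TCP these retransmissions are not needed, since the transport itself recovers lost segments.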

Instant messaging and presence

The Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions (SIMPLE) is the SIP-based suite of standards for instant messaging and presence information. MSRP (Message Session Relay Protocol) allows instant message sessions and file transfer.

Conformance testing

The SIP developer community meets regularly at conferences organized by the SIP Forum to test interoperability of SIP implementations. The TTCN-3 test specification language, developed by a task force at ETSI (STF 196), is used for specifying conformance tests for SIP implementations.

Performance testing

When developing SIP software or deploying a new SIP infrastructure, it is important to test the capability of servers and IP networks to handle a given call load, expressed as the number of concurrent calls and the number of calls per second. SIP performance tester software is used to simulate SIP and RTP traffic to see if the server and IP network are stable under the call load. The software measures performance indicators such as answer delay, answer/seizure ratio, RTP jitter, packet loss and round-trip delay time.
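
Two of these indicators are simple ratios and can be computed from call counts alone. The sketch below assumes the usual definition of the answer/seizure ratio (answered calls as a percentage of call attempts); the function names and the sample figures are made up for illustration and are not measurements.

```python
def answer_seizure_ratio(answered_calls: int, call_attempts: int) -> float:
    """Answer/seizure ratio as a percentage of call attempts that were answered."""
    if call_attempts <= 0:
        raise ValueError("call_attempts must be positive")
    return 100.0 * answered_calls / call_attempts


def calls_per_second(call_attempts: int, duration_s: float) -> float:
    """Average call attempt rate over a test run."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return call_attempts / duration_s


# Hypothetical test run: 600 attempts in 60 seconds, 540 of them answered.
print(answer_seizure_ratio(540, 600))
print(calls_per_second(600, 60.0))
```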

Applications

SIP trunking is a marketing term for native Voice over Internet Protocol (VoIP) communication services offered by carriers. A SIP trunking service carries VoIP phone calls between an organization's locations over an IP network with Quality of Service mechanisms to guarantee transmission characteristics (packet loss, delay and jitter) suitable for voice. A SIP trunking service also includes gateway service to connect VoIP calls to the legacy public switched telephone network (PSTN). This simplifies an organization's telecom infrastructure and saves money by sharing the carrier access circuit for voice, data and Internet traffic, and by removing the need for Primary Rate Interface (PRI) connections.

Many VoIP phone companies allow customers to use their own SIP devices, such as SIP-capable telephone sets or softphones.

SIP-enabled video surveillance cameras can make calls to alert the owner or operator that an event has occurred; for example, to notify that motion has been detected out-of-hours in a protected area.

SIP is used in audio over IP for broadcasting applications where it provides an interoperable means for audio interfaces from different manufacturers to make connections with one another.

Implementations

The U.S. National Institute of Standards and Technology (NIST), Advanced Networking Technologies Division provides a public-domain Java implementation that serves as a reference implementation for the standard. The implementation can work in proxy server or user agent scenarios and has been used in numerous commercial and research projects. It supports RFC 3261 in full and a number of extension RFCs including RFC 6665 (event notification) and RFC 3262 (reliable provisional responses).

Numerous other commercial and open-source SIP implementations exist. See List of SIP software.

SIP-ISUP interworking

SIP-I, or the Session Initiation Protocol with encapsulated ISUP, is a protocol used to create, modify, and terminate communication sessions based on ISUP using SIP and IP networks. Services using SIP-I include voice, video telephony, fax and data. SIP-I and SIP-T are two protocols with similar features, notably the ability to transport ISUP messages over SIP networks. This preserves all of the detail available in the ISUP header, which is important because many country-specific variants of ISUP have been implemented over the last 30 years, and it is not always possible to express all of the same detail using a native SIP message. SIP-I was defined by the ITU-T, whereas SIP-T was defined via the IETF RFC route.

Encryption

Increasing concerns about the security of calls that run over the public Internet have made SIP encryption more popular and more desired.

If secure transmission is required, the sips URI scheme is used and mandates that each hop over which the request is forwarded up to the target domain must be secured with Transport Layer Security (TLS). The last hop from the proxy of the target domain to the user agent has to be secured according to local policies. TLS protects against attackers who try to listen on the signaling link, but it does not provide real end-to-end security to prevent espionage and law enforcement interception, as the encryption is only hop-by-hop and every single intermediate proxy has to be trusted.

Because a VPN is not an option for most service providers, those that offer secure SIP (SIPS) connections generally use TLS to secure signaling. The relationship between SIP (port 5060) and SIPS (port 5061) is similar to that between HTTP and HTTPS, and SIPS uses URIs in the form "sips:username@domainname". The media streams, which travel on connections separate from the signaling stream, can be encrypted with SRTP. The key exchange for SRTP is performed with SDES (RFC 4568), or the newer and often more user-friendly ZRTP (RFC 6189), which can automatically upgrade RTP to SRTP using dynamic key exchange (and a verification phrase). One can also add a MIKEY (RFC 3830) exchange to SIP and in that way determine session keys for use with SRTP.
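
With SDES, the SRTP keying material travels inside the SDP body as a crypto attribute, which is why the signaling itself should be protected by TLS. The following sketch generates such an attribute for the AES_CM_128_HMAC_SHA1_80 crypto suite, which uses a 16-byte master key and a 14-byte master salt; the function name is hypothetical, and optional parameters such as key lifetime are omitted.

```python
import base64
import secrets


def sdes_crypto_attribute(tag: int = 1) -> str:
    """Return an SDP a=crypto line carrying fresh SRTP keying material (sketch)."""
    # 16-byte master key followed by a 14-byte master salt, 30 bytes in total.
    key_and_salt = secrets.token_bytes(30)
    inline = base64.b64encode(key_and_salt).decode("ascii")
    return f"a=crypto:{tag} AES_CM_128_HMAC_SHA1_80 inline:{inline}"


print(sdes_crypto_attribute())
```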

References

Session Initiation Protocol, Wikipedia