Streaming Video over the Internet: Approaches and Directions

Dapeng Wu, Yiwei Thomas Hou, et al.

Summary and Class Discussion (24th January 2004)

CPSC 538A - Topics in Computer Systems

Presented By: Abhishek Gupta {agupta@cs.ubc.ca}

 

A) Paper Summary

 

This paper introduces an architecture for streaming video over the Internet. Prior to streaming, video was usually downloaded in its entirety before playback. Since downloading large video files took a long time, streaming was introduced to avoid download delays and enhance the user experience. In streaming, video content is played as it arrives over the network; there is no wait for a complete download. The main challenges in Internet streaming are meeting real-time processing constraints and providing QoS guarantees over a best-effort network, where bandwidth fluctuations are frequent.

 

Figure 1 shows an architecture for video streaming.

Figure 1. An architecture for streaming video

 

Raw video and audio data are pre-compressed by video and audio compression algorithms and then saved in storage devices. Upon a client's request, the streaming server retrieves the compressed video and audio from storage, and the application-level QoS control module adapts the bit-streams according to the network status and QoS requirements. After this adaptation, the transport protocols packetize the compressed bit-streams and send the video/audio packets over the Internet. Packets may be dropped or excessively delayed by congestion within the network, so continuous media distribution services are deployed within the network to improve the quality of media delivery. At the client side, the received packets of the various bit-streams need to be synchronized with respect to each other. Design and implementation details for each of these key areas are discussed at length in the paper.

 

Video Compression

Since raw video consumes a lot of bandwidth, compression is usually employed to achieve transmission efficiency. Video compression can be classified into two categories: scalable and nonscalable video coding. A nonscalable video encoder compresses the raw video into a single bit-stream, leaving little scope for adaptation. A scalable video encoder, on the other hand, compresses the raw video into multiple bit-streams of varying quality. One of these, called the base stream, yields a coarse-quality video presentation when decoded on its own, whereas the others, called enhancement streams, improve the video quality when decoded in conjunction with the base stream. A notable scheme in this area is Fine Granularity Scalability (FGS), which uses bitplane coding to represent the enhancement stream. A variation of FGS is Progressive FGS (PFGS), which, unlike the two-layer approach of FGS, codes across multiple layers. The advantage of doing so is that errors in motion prediction are reduced because incremental reference layers are available.
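
To make the bitplane idea concrete, here is a minimal sketch (illustrative only, not the actual MPEG-4 FGS codec): integer samples are split into bitplanes, and the decoder's reconstruction sharpens as more of the most significant planes arrive.

```python
def split_bitplanes(values, num_planes=8):
    """Split non-negative integer samples into bitplanes, MSB first."""
    return [[(v >> p) & 1 for v in values] for p in reversed(range(num_planes))]

def reconstruct(bitplanes, planes_received, num_planes=8):
    """Rebuild samples from however many MSB planes have arrived."""
    out = [0] * len(bitplanes[0])
    for i, plane in enumerate(bitplanes[:planes_received]):
        shift = num_planes - 1 - i
        for j, bit in enumerate(plane):
            out[j] |= bit << shift
    return out

samples = [200, 37, 142, 9]              # e.g. transform-residual magnitudes
planes = split_bitplanes(samples)
print(reconstruct(planes, 3))            # coarse quality: [192, 32, 128, 0]
print(reconstruct(planes, 8))            # full quality:   [200, 37, 142, 9]
```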

 

Application Layer QoS Control

Application-layer QoS control techniques are so named because they are employed at the application layer: they cope with packet loss and transmission delays caused by network congestion without any support from the network infrastructure. They are broadly classified into congestion control mechanisms and error control mechanisms. Congestion control mechanisms can be further classified into rate control and rate shaping methods, whereas error control mechanisms comprise forward error correction (FEC) coding, retransmission, error-resilient coding, and error concealment.

 

Rate control can be performed by the source, by the receiver, or by both cooperating. Source-based rate control techniques are either probe-based or model-based. Probe-based approaches are experimental in nature and rely on feedback from the receiver to adapt the sending rate to the available network bandwidth, whereas model-based approaches derive the sending rate from a throughput model of TCP. Receiver-based rate control requires the source to transmit data on separate channels of different quality. If the receiver detects no congestion, it adds a channel to improve the visual quality of the video; if congestion is detected, it drops a channel, degrading the visual quality gracefully. Apart from these individual techniques, hybrid approaches in which source and receiver cooperate to achieve rate control are also prevalent. Rate shaping is another congestion control technique; the basic idea is to transcode with filters so that the transmission rate can be adapted between links of different available bandwidth.
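
As an illustration of the model-based approach, a commonly cited simplified form of the TCP throughput model sets the sending rate to λ = 1.22 × MTU / (RTT × √p), where p is the packet loss ratio. The sketch below simply evaluates this formula; the numbers are arbitrary examples.

```python
import math

def tcp_friendly_rate(mtu_bytes, rtt_seconds, loss_ratio):
    """Simplified TCP throughput model: 1.22 * MTU / (RTT * sqrt(p)).
    Returns a target sending rate in bytes per second."""
    return 1.22 * mtu_bytes / (rtt_seconds * math.sqrt(loss_ratio))

# Example: 1500-byte MTU, 100 ms round-trip time, 1% packet loss
rate = tcp_friendly_rate(1500, 0.100, 0.01)
print(f"{rate / 1000:.1f} kB/s")         # ~183.0 kB/s
```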

 

Error control techniques include FEC, in which redundant information is added to the bit-stream so that the original can be reconstructed in case of packet loss. Retransmission schemes are applicable only where a lost packet can be retransmitted without violating its presentation deadline. Error-resilient techniques such as multiple description coding compensate for packet loss at encoding time, and error concealment methods use spatial and temporal interpolation to reconstruct the lost information within or between frames.
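
A minimal sketch of the FEC idea, assuming the simplest possible code: one XOR parity packet per group of equal-length media packets, which lets the receiver repair any single loss within the group.

```python
def xor_parity(packets):
    """Byte-wise XOR of k equal-length packets, yielding one parity packet."""
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, byte in enumerate(pkt):
            parity[i] ^= byte
    return bytes(parity)

def recover_single_loss(received, parity):
    """XOR the k-1 surviving packets with the parity to rebuild the lost one."""
    return xor_parity(list(received) + [parity])

group = [b"pkt0", b"pkt1", b"pkt2"]      # one FEC group of media packets
parity = xor_parity(group)
assert recover_single_loss([group[0], group[2]], parity) == group[1]
```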

 

Continuous Media Distribution Services

Built on top of the Internet (the IP protocol), the mechanisms under this heading provide network infrastructure support for maintaining QoS and efficiency in multimedia content delivery. They include network filtering, application-level multicast, and content replication.

 

Network filters aim to maximize video quality during network congestion. Placing filters at the source is costly because servers are usually heavily loaded with real-time processing, so service providers often place filters at routers instead. Network filters serve a dual purpose: a) they distribute media through the network, and b) based on control information passed between the communicating participants, they shape the traffic by transcoding to lower bit rates. Because network filters know the format of the media stream, they can degrade quality gracefully instead of corrupting the flow outright. Further, network filters can achieve bandwidth efficiency by discarding packets that will arrive later than their deadlines.
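
The sketch below illustrates such a filter's forwarding decision (the packet fields are invented for illustration): packets that have missed their deadlines are discarded, and enhancement-layer packets are shed before base-layer ones during congestion.

```python
import time

def filter_packets(packets, congested, now=None):
    """Forward only packets that are still useful; shed enhancement layers
    first when the outgoing link is congested."""
    now = time.time() if now is None else now
    forwarded = []
    for pkt in packets:
        if pkt["deadline"] <= now:
            continue                     # already missed its playback deadline
        if congested and pkt["layer"] == "enhancement":
            continue                     # degrade gracefully, keep the base layer
        forwarded.append(pkt)
    return forwarded

pkts = [{"layer": "base", "deadline": 5.0},
        {"layer": "enhancement", "deadline": 5.0},
        {"layer": "base", "deadline": 0.5}]
print(filter_packets(pkts, congested=True, now=1.0))  # only the first survives
```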

 

Application-level multicast aims to build a multicast service on top of the Internet. It enables individual service providers and enterprises to construct their own Internet multicast networks and to interconnect them into larger, worldwide content distribution networks through application-level peering relationships.

 

Content replication is another widely used technique; it reduces bandwidth consumption in the network, reduces the load on streaming servers, lowers latency for clients, and increases the availability of media content. Content replication is achieved mainly through caching and mirroring. Mirroring has its advantages but is a costly and ad hoc process, whereas caching appears quite promising. Caching is mostly employed where proxy servers act as a gateway for local users; caching even a portion of the media content at the proxy often significantly reduces the wait time for media delivery.
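
One concrete form this can take is prefix caching, sketched below with assumed names and parameters: the proxy keeps the first few seconds of each title so that playback can start without waiting on the origin server.

```python
class PrefixCache:
    """A hypothetical proxy-side cache holding the opening prefix of each title."""

    def __init__(self, prefix_seconds=10):
        self.prefix_seconds = prefix_seconds
        self._store = {}                  # title -> cached prefix bytes

    def put(self, title, media_bytes, bytes_per_second):
        keep = self.prefix_seconds * bytes_per_second
        self._store[title] = media_bytes[:keep]

    def get_prefix(self, title):
        """Return the cached prefix (served immediately), or None on a miss."""
        return self._store.get(title)

cache = PrefixCache(prefix_seconds=2)
cache.put("movie", b"x" * 10_000, bytes_per_second=1_000)
print(len(cache.get_prefix("movie")))    # 2000 bytes served from the proxy
```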

 

Streaming Server

Streaming servers play a key role in providing streaming services. To offer quality streaming, a streaming server must process multimedia data under real-time constraints, support VCR-like interactive functions, and retrieve media components synchronously. A streaming server mainly has three components: the communicator, the operating system, and the storage system.

 

An operating system supporting multimedia streaming must provide real-time process scheduling; two common scheduling methods are Earliest Deadline First (EDF) and rate-monotonic scheduling. Another function the OS performs is resource management. Since servers need to guarantee QoS for already established sessions, an admission control test is usually performed before a new client connection is accepted. Admission control algorithms are either deterministic or statistical: deterministic mechanisms provide hard guarantees to clients, whereas statistical methods achieve better resource utilization at the cost of small QoS violations during temporary overload. The OS also needs to provide real-time file management. This is usually done either by storing a file as contiguous blocks and using real-time disk scheduling algorithms such as SCAN-EDF, DC-SCAN, or grouped sweeping, or by striping the data of a file across multiple disks to allow parallel access by multiple clients.
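
As a minimal illustration of EDF dispatching (a sketch, not a server implementation), pending media tasks sit in a priority queue keyed by their deadlines, and the task with the earliest deadline is always served first.

```python
import heapq

class EDFScheduler:
    """Earliest Deadline First dispatching over a deadline-keyed heap."""

    def __init__(self):
        self._heap = []    # (deadline, sequence, task) tuples
        self._seq = 0      # tie-breaker so tasks are never compared directly

    def submit(self, deadline, task):
        heapq.heappush(self._heap, (deadline, self._seq, task))
        self._seq += 1

    def next_task(self):
        """Pop and return the task with the earliest deadline, or None."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

sched = EDFScheduler()
sched.submit(0.120, "send video frame 42")
sched.submit(0.040, "send audio block 7")
print(sched.next_task())   # "send audio block 7": its deadline is sooner
```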

 

Storage systems for multimedia distribution increase data throughput through data striping. An obsolete method of increasing capacity is to use tertiary and hierarchical storage systems, which provide data-archiving properties. A newer development is the use of Storage Area Networks (SAN) and Network Attached Storage (NAS). The difference between the two is that a SAN provides high-speed block-device access based on an encapsulated SCSI protocol, whereas NAS provides a more conventional file-system view based on the TCP, UDP, and IP protocols.
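
A minimal sketch of round-robin data striping (the mapping function is invented for illustration): logical block b of a file is placed on disk b mod N, so N clients can read consecutive blocks in parallel.

```python
def stripe_location(block_index, num_disks):
    """Map a logical file block to (disk number, block offset on that disk)."""
    return block_index % num_disks, block_index // num_disks

for b in range(6):
    disk, offset = stripe_location(b, num_disks=3)
    print(f"block {b} -> disk {disk}, offset {offset}")
# blocks 0,1,2 land on disks 0,1,2 at offset 0; blocks 3,4,5 wrap to offset 1
```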

 

Media Synchronization

Media synchronization refers to maintaining the temporal relationships within a media stream and between different media streams. It is classified into three categories:

a)      Intra-stream synchronization – This refers to maintaining the temporal relationship between the lowest layer logical data units such as the audio/video frames.

b)      Inter-stream synchronization – This refers to the synchronization requirement between media streams such as synchronization between audio and video during the streaming of a movie.

c)      Inter-object synchronization – This refers to synchronization between time-independent objects and time-dependent objects within media streams. A suitable example is streaming a slide show with audio objects attached to the slides: care has to be taken that the audio object of one slide does not overlap with that of the next.

The essential part of any media synchronization scheme is the specification of the temporal relations within and between the media; this may be done automatically or manually. The most common specification method is axes-based: a stream is time-stamped at the source to record temporal information both within the stream and relative to other streams. Various preventive and corrective methods exist to maintain media synchronization. A notable corrective method is the Stream Synchronization Protocol, in which units at the receiver monitor the difference between the predicted and actual arrival times of media packets and report this difference to the scheduler, which delays the presentation of the video unit so as to accommodate the packet arrival delay.
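
A minimal sketch of that corrective idea, with invented names and numbers: the receiver reports how late packets arrive relative to prediction, and the scheduler stretches the playout delay to absorb the observed lag.

```python
class PlayoutScheduler:
    """Delay presentation just enough to cover the worst observed arrival lag."""

    def __init__(self, base_delay):
        self.playout_delay = base_delay   # seconds between arrival and display

    def report(self, predicted_arrival, actual_arrival):
        lag = actual_arrival - predicted_arrival
        if lag > self.playout_delay:
            self.playout_delay = lag      # push playback back to absorb jitter

    def presentation_time(self, media_timestamp):
        return media_timestamp + self.playout_delay

sched = PlayoutScheduler(base_delay=0.050)
sched.report(predicted_arrival=1.000, actual_arrival=1.080)  # 80 ms late
print(sched.presentation_time(1.000))    # 1.08: presentation delayed
```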

 

Protocols for Streaming Video

Quite a few protocols have been designed and standardized for communication between clients and streaming servers. According to their functionality, they can be classified into the following three categories:

1)      The network-layer protocol provides basic network services, such as network addressing. IP serves as the network-layer protocol for multimedia streaming.

2)      Transport protocols provide end-to-end network transport for streaming applications; they include UDP, TCP, RTP, and RTCP.

3)      Session control protocols define the messages and procedures that control the delivery of the multimedia data during an established session; RTSP and SIP are such session control protocols (a sketch of an RTSP exchange follows this list).
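
As an illustration of session control, the sketch below builds the canonical RTSP/1.0 request sequence (DESCRIBE, SETUP, PLAY, TEARDOWN) against a hypothetical server URL; only the message format is standard, and all values here are made up.

```python
def rtsp_request(method, url, cseq, extra_headers=()):
    """Build one RTSP/1.0 request; headers end with an empty line (CRLF CRLF)."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}", *extra_headers, "", ""]
    return "\r\n".join(lines)

URL = "rtsp://media.example.com/movie"   # hypothetical server
print(rtsp_request("DESCRIBE", URL, 1))
print(rtsp_request("SETUP", URL + "/track1", 2,
                   ["Transport: RTP/AVP;unicast;client_port=4588-4589"]))
print(rtsp_request("PLAY", URL, 3, ["Session: 12345678", "Range: npt=0-"]))
print(rtsp_request("TEARDOWN", URL, 4, ["Session: 12345678"]))
```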

Towards the end, the authors recap the whole paper and offer additional insights into the future directions and scope of multimedia streaming.

B) Class Discussions

  • The Scope of Multimedia Streaming: It is quite obvious that multimedia streaming technology provides a wider platform for networking applications. Some of the examples cited were live telecasting of conferences, surveillance operations, and sports telecasts. Reference was made to VoIP, and it was Buck's view that VoIP may never actually lead to video telephony over IP, because video can fill up just about any available bandwidth. Thus, whether it is worthwhile to stream media instead of downloading it is still an open question.

  • Some of the ideas presented in this paper are obsolete, such as hierarchical storage systems and the use of SAN and NAS to improve capacity and access, respectively. Today, disks are much cheaper and carry all sorts of embedded intelligence, which has improved their performance but has also made it difficult to take advantage of their internal structure.

  • A brief discussion took place on real-time operating systems. It was mentioned that "Real-Time OS" is more of a buzzword, and that it is hard to implement an OS that can meet fine-grained millisecond deadlines because the hardware infrastructure we have today was not developed with real-time operation in mind.

  • In Buck's view, contrary to what the paper projects, RTSP and SIP are not overlapping protocols; rather, they were developed with different ideologies in mind.