Broadcasting a high volume of information requires a very wide bandwidth, which for analogue television is five to six megahertz. For digital video broadcasting (DVB), the uncompressed signal needs ten times the analogue bandwidth or more. For this reason, data compression techniques are used to reduce the bandwidth so that more than one channel can be transmitted in the space allocated to a single analogue TV channel.
There are many advantages of digital TV over analogue, such as:
1. Good picture quality
2. Increased number of channels within the same bandwidth
3. Lower transmission power
4. Reduced adjacent-channel interference
Broadcasting of digital TV signals involves digitisation, compression and channel encoding. Let’s go through digitisation and compression mechanisms in detail.
Digitisation is the process of converting the analogue audio and video signals into a series of bits using an analogue-to-digital converter (ADC). Digitising a TV picture means sampling the contents of the picture frame-by-frame and scan-line-by-scan-line. In order to maintain the quality of the picture, there must be at least as many samples per line as there are pixels, with each sample representing one pixel.
Sampling of TV signal
Sampling refers to the process of converting a continuous analogue signal into discrete digital numbers. Typically, an ADC is used to convert voltages into digital numbers. The process may be reversed through a digital-to-analogue converter (DAC).
Sampling rate. In order to convert analogue signals (which are continuous in time) into a digital representation, they must be sampled at discrete intervals in time. The number of samples captured per second is known as the sampling rate of the converter; the time between successive samples is the sampling interval.
If the sampling rate is high enough, the stored samples may be used to reconstruct the original signal by interpolating between them. Exact reconstruction is possible only if the sampling rate is higher than twice the highest frequency in the signal; this is the basis of the ‘Shannon-Nyquist Sampling Theorem.’ In practice, the accuracy of the reconstructed signal is ultimately limited by the quantisation error. If the signal is not sampled at baseband, it must be sampled at greater than twice the bandwidth.
A 13.5 MHz sampling rate was selected to satisfy this as well as other criteria. One is that the sampling rate must be a whole multiple of the line frequency:
13.5 MHz = 864 × 15.625 kHz for PAL
With an active line duration of 52 µs, the number of active pixels per line for PAL is therefore 13.5 MHz × 52 µs = 702
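The arithmetic above can be checked with a short Python sketch. The constant names are illustrative, and the 52 µs active line duration is the standard PAL figure assumed here.

```python
from fractions import Fraction

SAMPLING_RATE_HZ = 13_500_000   # luminance sampling rate from the text
PAL_LINE_FREQ_HZ = 15_625       # PAL line frequency
ACTIVE_LINE_US = 52             # active (visible) line duration, assumed

# Samples per complete line. A whole number means the sampling grid
# lines up identically on every scan line.
samples_per_line = Fraction(SAMPLING_RATE_HZ, PAL_LINE_FREQ_HZ)

# Samples that fall inside the active part of the line.
active_samples = SAMPLING_RATE_HZ * ACTIVE_LINE_US // 1_000_000

print(samples_per_line)   # 864
print(active_samples)     # 702
```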
Colour TV broadcasting involves transmission of luminance (Y) and two colour-difference signals (R-Y and B-Y). The two colour-difference signals R-Y and B-Y are referred to as CR and CB, respectively.
In DVB, these three components are independently sampled and converted into three digital datastreams before compression, modulation and subsequent transmission. For the luminance signal, which contains the highest video frequencies, the full sampling rate of 13.5 MHz is used. For the chrominance components CR and CB, which contain lower video frequencies, a sampling rate of 6.75 MHz is used. The three streams then pass to a multiplexer, where they are combined into a single stream with a total sampling rate of 13.5 + 6.75 + 6.75 = 27 MHz.
Sampling structure. During the digitising process, the three parameters of the component video signal are assigned a numeric sampling value. Groups of four video pixels within each of the three components are looked at and samples taken for recording. With a 4:2:2 sampled video signal, all four of the luminance pixels, two R-Y pixels and two B-Y pixels are sampled. This gives a 4:2:2 sampling structure.
With a 4:1:1 signal, all four of the luminance pixels are sampled but only one pixel is sampled from each of the R-Y and B-Y components. This lower sampling rate of the colour components results in less colour information being recorded, affecting the accuracy and intensity of the colour in the video signal.
4:1:1 is poorly suited to chroma-keying, graphics and other compositing functions, as all of these require strong colour information in the video signal. The advantage of 4:1:1 sampling is that it records only half as much colour data as 4:2:2 (six samples instead of eight for every four pixels), so about a third more recording/playback time fits within a given tape length. The circuitry within the equipment is also less expensive for a manufacturer to produce.
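As a rough illustration, the sampling structures can be compared by counting the samples taken for each group of four pixels on a line. The table and function names below are hypothetical.

```python
# Samples taken per group of four pixels on one line: (Y, CR, CB).
STRUCTURES = {
    "4:2:2": (4, 2, 2),
    "4:1:1": (4, 1, 1),
}

def samples_per_group(structure):
    """Total samples recorded for four pixels on one line."""
    return sum(STRUCTURES[structure])

def chroma_samples(structure):
    """Colour-difference samples (CR + CB) for four pixels on one line."""
    _, cr, cb = STRUCTURES[structure]
    return cr + cb

for name in STRUCTURES:
    print(name, samples_per_group(name), chroma_samples(name))
```

The count shows where the saving comes from: the luminance samples are untouched, and only the colour-difference samples are reduced.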
The next step after sampling is quantisation, in which sample values are rounded to the nearest quantum level. The precise number of quanta, or levels, is determined by the bit depth (the number of bits in the code). For example, if eight bits are used, there will be 2^8 = 256 quantum levels. The bit rate can be calculated as:
Bit rate = Number of samples per second × Number of bits per sample
Number of samples per second = Number of samples per picture x Number of pictures per second
Number of samples per PAL picture = 720×576 = 414,720
Given a picture rate of 25,
Number of samples per second = 720×576×25 = 10,368,000
Therefore the bit rate generated by the luminance component using an 8-bit code is:
720×576×25×8 = 82,944,000 = 82.944 Mbps
The bit rate for the chrominance components depends on the sampling structure used. For a 4:2:2 sampling structure with only horizontal sub-sampling:
Number of samples = 360×576 = 207,360 per picture
Bit rate for each chrominance component is:
360×576×Picture rate×Number of bits = 360×576×25×8
= 41.472 Mbps, which is half the luminance bit rate.
Total chrominance bit rate is therefore:
41.472×2 = 82.944 Mbps
Giving a total bit rate of:
82.944 + 82.944 = 165.888 Mbps, or approximately 166 Mbps
For a 4:2:0 structure, it comes down to 124.416 Mbps. For a 4:1:1 structure, it will again be 124.416 Mbps.
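The bit-rate figures for the three sampling structures can be reproduced with a small Python sketch. The divisor table is an assumption about how each structure sub-samples the chrominance horizontally and vertically; the picture size, picture rate and bit depth are those used in the text.

```python
WIDTH, HEIGHT = 720, 576   # luminance samples per PAL picture
PICTURE_RATE = 25          # pictures per second
BITS_PER_SAMPLE = 8

# (horizontal, vertical) chrominance sub-sampling divisors.
CHROMA_DIVISORS = {
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
    "4:1:1": (4, 1),
}

def bit_rate_bps(structure):
    """Uncompressed bit rate in bits per second for Y + CR + CB."""
    h_div, v_div = CHROMA_DIVISORS[structure]
    luma = WIDTH * HEIGHT
    chroma = (WIDTH // h_div) * (HEIGHT // v_div)   # per component
    return (luma + 2 * chroma) * PICTURE_RATE * BITS_PER_SAMPLE

for name in CHROMA_DIVISORS:
    print(name, bit_rate_bps(name) / 1e6, "Mbps")
```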
The actual bandwidth requirement depends on the type of modulation used. For pulse-code modulation, it is half the bit rate: about 83 MHz for 4:2:2, or about 62 MHz for the 124.416 Mbps of 4:2:0 and 4:1:1. Even 62 MHz is far more than a broadcast channel provides, so compression techniques are required to reduce the bit rate and hence the bandwidth.
There are two main data compression standards—JPEG encoding and MPEG encoding. JPEG is associated with still image compression and MPEG with digital videos. MPEG-2 is used for SD and MPEG-4 for HDTV.
A digital television programme consists of three components: video, audio and service data. The original video and audio information is in analogue form and has to be sampled and quantised before being fed into the appropriate coders. The service data, which contains additional information such as teletext and network-specific information including the electronic programme guide (EPG), is generated in digital form and requires no encoding.
MPEG-2 encoding. Video MPEG encoding consists of data preparation, compression and quantisation. The purpose of video data preparation is to ensure that the raw coded samples of the picture frame are organised in a way that is suitable for data compression.
MPEG uses both temporal (time) and spatial (space) compression. Video is a sequence of still images, so the same compression technique as for JPEG can be applied to each frame of a video clip. This is called spatial, or intra-frame, compression. Also, successive images of a video clip differ only slightly, so it is possible to remove the redundant part and send only the difference between them. This technique is called temporal, or inter-frame, compression.
Video preparation involves regrouping the samples of CR, CB and Y into 8×8 blocks to be used in spatial redundancy removal. These blocks are then rearranged into 16×16 macro blocks for use in temporal redundancy removal. The macro blocks are then grouped into slices, which are the basic units for data compression.
The most commonly used method works by comparing each frame in the video with the previous one. If the frame contains areas where nothing has moved, the system simply issues a short command that copies that part of the previous frame, bit-for-bit, into the next one. If sections of the frame move in a simple manner, the compressor emits a slightly longer command that directs the decompressor to shift, rotate, lighten or darken the copy.
Inter-frame compression works well for programmes that are played back by the viewer but can cause problems if the video sequence needs to be edited.
As mentioned earlier, only the difference between consecutive frames is transmitted. The remaining data is redundant and does not get transmitted. For example, if a news reader is reading the news, only part of the frame that contains his lip movement gets transmitted as the difference-frame. Things like microphone, paperweight or the channel logo will be the same for all the frames and hence need not be transmitted.
If information is the same but its position changes in the next frame, it will not get encoded once again as it was already coded in the first frame. For example, if the same paperweight is now shifted towards the right, its position with respect to the macro blocks changes, but it carries the same information as before. So a motion vector is applied to these particular macro blocks. The paperweight appears in its changed position, but fewer bits are needed to code it, hence the reduction in bandwidth.
The predicted frame (the frame obtained by applying the motion vector to the first frame) is then subtracted from the second frame to produce the difference-frame. Both components (motion vector and frame difference) are combined to form a predicted frame (P-frame).
Temporal compression is carried out on a group of pictures normally composed of twelve non-interlaced frames. The first frame of the group acts as the anchor or reference frame, known as the intra-frame (I-frame). This is followed by a P-frame obtained by comparing the second frame with the I-frame. This is then repeated and the third frame is compared with the previous P-frame to produce a second P-frame and so on. This goes on until the end of the group of twelve frames. A new reference I-frame is then inserted for the next group of twelve frames and so on. This type of prediction is known as forward prediction.
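The frame ordering described here can be sketched in a few lines of Python. This is a simplified model with forward prediction only; the function names are hypothetical.

```python
def frame_types(num_frames, gop_size=12):
    """Label each frame 'I' or 'P' for a forward-prediction-only group."""
    return ["I" if i % gop_size == 0 else "P" for i in range(num_frames)]

def reference_index(i, gop_size=12):
    """Index of the frame that frame i is predicted from.

    Returns None for an I-frame, which is coded without reference
    to any other frame.
    """
    return None if i % gop_size == 0 else i - 1

print("".join(frame_types(24)))   # IPPPPPPPPPPPIPPPPPPPPPPP
```

Because every P-frame depends on the frame before it, a decoder joining the stream has to wait for the next I-frame before it can show a picture.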
Motion vector. The motion vector is obtained by the process of block matching. The Y component of the reference frame is divided into 16×16 macro blocks. The blocks are taken one by one, and a matching block is searched for within a given area of the other frame.
When a match is found, the displacement is used to obtain a motion-compensation vector that describes the movement of the macro block in terms of speed and direction. Only a relatively small amount of data is necessary to describe a motion-compensation vector. The actual pixel values of the macro block do not have to be retransmitted.
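A minimal sketch of block matching, assuming an exhaustive search and the sum of absolute differences (SAD) as the matching criterion. Neither choice is specified in the text, and the function names and test pattern are illustrative.

```python
def sad(ref, bx, by, cur, x, y, size):
    """Sum of absolute differences between a block of `ref` at (bx, by)
    and a block of `cur` at (x, y)."""
    return sum(
        abs(ref[by + j][bx + i] - cur[y + j][x + i])
        for j in range(size) for i in range(size)
    )

def find_motion_vector(ref, cur, bx, by, size=16, search=4):
    """Search `cur` for the best match to the reference block at (bx, by).

    Returns (dx, dy), the displacement of the block from its position
    in the reference frame to its best match in the current frame.
    """
    height, width = len(cur), len(cur[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + size > width or y + size > height:
                continue   # candidate block would fall outside the frame
            cost = sad(ref, bx, by, cur, x, y, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]

# Synthetic 32x32 luminance frame with a non-repeating pattern.
ref = [[(7 * x + 13 * y + x * y) % 256 for x in range(32)] for y in range(32)]
# Current frame: the same content moved 3 pixels right and 1 pixel down.
cur = [[ref[y - 1][x - 3] if x >= 3 and y >= 1 else 0 for x in range(32)]
       for y in range(32)]

print(find_motion_vector(ref, cur, bx=8, by=8))   # (3, 1)
```

Only the two numbers of the vector need to be sent for the block, rather than its 256 pixel values.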
Predicted and difference frames. As shown in Fig. 5, current frame F0 is fed into Buffer-1 and held there for a while. It is also fed into the movement vector generator, which uses the contents of the previous frame F-1 stored in the video memory to obtain motion vector MV0. The motion vector is then added to F-1 to produce predicted frame P0. P0 is compared with the contents of the current frame F0 in Buffer-1 to produce residual error or difference frame D0. Residual error D0 is fed into the spatial discrete cosine transform (DCT) encoder and sent out for transmission.
Encoded D0 is decoded to reproduce D0 as it would be required at the receiving end. D0 is then added to P0, which has been waiting in Buffer-2 to reconstruct current frame F0 for storage in the video memory for the next frame and so on.
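The loop described above can be sketched as follows. This is a simplification under stated assumptions: a single motion vector is applied to the whole frame rather than one per macro block, and the residual is passed on without the DCT and quantisation stages, so reconstruction here is exact. All function names are hypothetical.

```python
def shift_frame(frame, dx, dy, fill=0):
    """Apply one global motion vector: move every pixel by (dx, dy)."""
    h, w = len(frame), len(frame[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = frame[sy][sx]
    return out

def encode_difference(prev, cur, mv):
    """Return D0 = F0 - P0, where P0 is `prev` moved by the motion vector."""
    predicted = shift_frame(prev, *mv)
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(cur, predicted)]

def reconstruct(prev, mv, residual):
    """Rebuild F0 = P0 + D0, as both the encoder and the receiver do."""
    predicted = shift_frame(prev, *mv)
    return [[p + d for p, d in zip(prow, drow)]
            for prow, drow in zip(predicted, residual)]

prev = [[4 * r + c for c in range(4)] for r in range(4)]
cur = shift_frame(prev, 1, 0)   # content moved one pixel to the right
cur[2][2] += 5                  # plus one genuinely new detail

d0 = encode_difference(prev, cur, (1, 0))
print(d0)                                    # zero except for the new detail
print(reconstruct(prev, (1, 0), d0) == cur)  # True
```

The encoder rebuilds F0 from the decoded residual rather than reusing the original, so that its video memory holds exactly what the receiver will hold.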
The heart of spatial redundancy removal is the DCT processor. The DCT processor receives video slices in the form of a stream of 8×8 blocks. The blocks may be part of a luminance frame (Y) or a chrominance frame. Sample values representing the pixels of each block are then fed into the DCT processor, which translates them into an 8×8 matrix of DCT coefficients representing the spatial frequency content of the block. The coefficients are then scanned and quantised before transmission.
Discrete cosine transform. DCT is used in several coding standards (MPEG-1, MPEG-2, H.263, etc) to remove spatial redundancy that exists between neighbouring pixels in an image. It is a Fourier-related transform that takes information in the spatial domain (pixel values) and expresses it in the spatial-frequency domain. Normal pictures are two-dimensional and contain diagonal as well as horizontal and vertical spatial frequencies. MPEG-2 specifies DCT as the method of transforming spatial picture information into spatial frequency components. Each spatial frequency is given a value known as the DCT coefficient. For an 8×8 block of pixel samples, an 8×8 block of DCT coefficients is produced (refer Fig. 7).
A block that contains different picture details is represented by various coefficient values in the appropriate cells. Coarse picture details utilise a number of cells towards the top left-hand corner, and fine picture details utilise a number of cells towards the bottom right-hand corner.
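For illustration, a direct (unoptimised) two-dimensional DCT and its inverse can be written in plain Python. This sketch assumes the orthonormal DCT-II scaling commonly used for 8×8 blocks; real encoders use fast algorithms, and the function names are hypothetical.

```python
import math

N = 8   # block size

def _scale(k):
    """Normalisation factor: 1/sqrt(2) for the zero-frequency term."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def _basis(n, k):
    """Cosine basis function of frequency k evaluated at sample n."""
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))

def dct_2d(block):
    """8x8 pixel samples -> 8x8 DCT coefficients; coeffs[v][u] has
    horizontal frequency u and vertical frequency v."""
    coeffs = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            total = sum(block[y][x] * _basis(x, u) * _basis(y, v)
                        for y in range(N) for x in range(N))
            coeffs[v][u] = (2 / N) * _scale(u) * _scale(v) * total
    return coeffs

def idct_2d(coeffs):
    """Inverse transform: 8x8 DCT coefficients -> 8x8 pixel samples."""
    block = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            total = sum(_scale(u) * _scale(v) * coeffs[v][u]
                        * _basis(x, u) * _basis(y, v)
                        for v in range(N) for u in range(N))
            block[y][x] = (2 / N) * total
    return block

flat = [[100] * N for _ in range(N)]      # a plain grey block
print(round(dct_2d(flat)[0][0], 6))       # 800.0, all energy in one cell
```

A completely flat block ends up with a single non-zero coefficient in the top left-hand cell, which is the extreme case of the energy concentration the transform is used for.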
The grey-scale pattern is shown in Fig. 9(a). The corresponding sample values are shown in Fig. 9(b) and DCT coefficients in Fig. 9(c). DCT does not directly reduce the number of bits required to represent the 8×8-pixel block. Sixty-four pixel sample values are replaced with 64 DCT coefficient values.
The reduction in the number of bits follows from the fact that for a typical block of a natural image, the distribution of the DCT coefficients is not uniform. An average DCT matrix has most of its coefficients, and therefore its energy, concentrated at and around the top left-hand corner. The bottom right-hand quadrant has very few coefficients of any substantial value. Bit rate reduction may thus be achieved by not transmitting the zero and near-zero coefficients. Further bit reduction may be introduced by weighted quantising and special coding techniques of the remaining coefficients.
Quantising the DCT block. After a block has been transformed, the DCT coefficients are quantised (rounded up or down) to a smaller set of possible values to produce a simplified set of coefficients.
The DCT block (shown in Fig. 10(a)) may be reduced to very few coefficients (Fig. 10(b)) if a threshold of 1.0 is applied. After this, a non-linear or weighted quantisation is applied. The video samples are given a linear quantisation but the DCT coefficients receive a non-linear quantisation. A different quantisation level is applied to each coefficient depending on the spatial frequency it represents within the block.
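One way to picture the saving is to scan the coefficients from the top left-hand corner outwards and stop at the last significant one. The sketch below assumes a zig-zag scan order and a simple magnitude threshold; the text does not specify the scan, and the function names and sample values are illustrative.

```python
def zigzag_order(n=8):
    """(row, col) positions in zig-zag order from the top-left corner."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )

def scan_and_truncate(coeffs, threshold=1.0):
    """Zero every coefficient below the threshold, scan in zig-zag
    order, and drop the run of zeros at the end of the scan."""
    scanned = [coeffs[r][c] for r, c in zigzag_order(len(coeffs))]
    kept = [v if abs(v) >= threshold else 0 for v in scanned]
    while kept and kept[-1] == 0:
        kept.pop()
    return kept

# Illustrative block: energy near the top-left, a tiny value elsewhere.
block = [[0.0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[7][7] = 50.0, 3.0, -2.0, 0.4

print(scan_and_truncate(block))   # [50.0, 3.0, -2.0]
```

Three values are sent in place of 64 in this example, since everything after the last significant coefficient is implied to be zero.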
Fine quantisation (a large number of levels) is allocated to coefficients representing low spatial frequencies, because the human eye is most sensitive to low spatial frequencies. Coarser quantisation is applied to coefficients representing high spatial frequencies. This increases quantisation error at these high frequencies, introducing noise that cannot be reversed at the receiver. However, these errors are tolerable since high-frequency noise is less visible than low-frequency noise.
The DCT coefficient at the top left-hand corner (the DC coefficient) is treated as a special case and given the highest priority. A more effective weighted quantisation may be applied to the chrominance frames since quantisation error is less visible in the chrominance component than in the luminance component. Quantisation error is more visible in some blocks than in others. One place where it shows up is blocks that contain a high contrast edge between two plain areas. The quantisation parameters can be modified to limit the quantisation error, particularly in high-frequency cells.
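Weighted quantisation can be sketched by dividing each coefficient by a step size that grows with spatial frequency. The step-size rule below is hypothetical and chosen only for illustration; it is not the MPEG-2 quantiser matrix, and the special handling of the top left-hand coefficient is not modelled.

```python
def step_size(u, v, base=8, slope=4):
    """Hypothetical quantiser step: larger for higher frequencies (u + v)."""
    return base + slope * (u + v)

def quantise(coeffs):
    """Divide each coefficient by its step and round to an integer level."""
    return [[round(coeffs[v][u] / step_size(u, v)) for u in range(8)]
            for v in range(8)]

def dequantise(levels):
    """Receiver side: multiply each level back by its step size."""
    return [[levels[v][u] * step_size(u, v) for u in range(8)]
            for v in range(8)]

# The same coefficient value placed at a low and a high spatial frequency.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][1] = 20.0   # low frequency, step 12
coeffs[7][7] = 20.0   # high frequency, step 64

levels = quantise(coeffs)
restored = dequantise(levels)
print(levels[0][1], restored[0][1])   # 2 24
print(levels[7][7], restored[7][7])   # 0 0
```

The low-frequency coefficient survives with a small error, while the same value at the highest frequency is rounded away entirely, which is the irreversible loss described above.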
Daniel A. Figueiredo is scientist-D, and Sidhant Kulkarni and Umesh Shirale are pursuing M.Tech in electronics design and technology at DOEACC centre, Aurangabad