Video streaming and MPEG-DASH. In this module, we'll start with streaming networks and technologies. We'll begin with managed networks. In the managed networks category, cable TV and IPTV services are the most popular. One thing you can see in these types of services is that they are well planned and their schedules are pre-announced. In addition, in between the movies, dramas, TV shows, and news broadcasts, they insert just the right amount of advertisements. All of this is possible because the services are well managed and pre-scheduled, and the delivery mechanism to your house is well planned as well. They use multicast transport technology, and they also have controlled Quality of Service (QoS) functions, so the service provided to you is very reliable. On the other hand, we have unmanaged networks, in which content is delivered to the viewer over a unicast connection. Delivery from a server or a CDN is the common approach. A server here is a broadcast server, which could be something like a YouTube server. A CDN is a content delivery network: a network used for video streaming and content delivery over various wide-area networks, including the Internet, which may span intercontinental areas. Therefore, we will need to study CDN technology, and that is the topic of the next module in this course. Then, looking at streaming methods, there are proprietary streaming protocols which run on TCP or UDP. There is HTTP over TCP, which is popular in progressive download techniques, and we will see some of these. In addition, there is conventional streaming versus adaptive streaming technology, and this is something we need to look into, especially because adaptive streaming is what is in use right now, and future networks will rely on it as well. So how does this work technically? 
We will need to study that, and we will in this course. Conventional streaming services: these are the ones you'll recognize as Microsoft Windows Media, Apple QuickTime, and Adobe Flash. Very popular media service technologies that you've already used. Progressive download is HTTP over TCP, where video playout can start as soon as the necessary amount of data, about 2-10 seconds' worth of video, is downloaded and buffered in your memory; then you can immediately start playback. Because it is buffered, stored in your memory, you can decide when to start it, when to stop it, or pause it, and you can go backwards or forwards to view different parts of the video. Of course, you cannot go forward beyond what has been received, but you can fast-forward a little to see whatever has already been downloaded and received on your device. Progressive download has advantages and disadvantages, pros and cons. The pros: it easily supports freeze and rebuffering, and it easily supports trick mode, which enables fast-forward, seek/play, rewind, and other functions. Then there's the disadvantage, the con: the server must have multiple resolution versions of the same video, so it takes up more space on the server. What about trick mode? A little more detail: this is an example of trick mode in which, when you're watching a video, for example on YouTube, you view it to a certain point, then you want to jump further ahead or rewind back to a certain position to view a different point in the video. What you do is put your finger down on the red dot and move it around. The small windows that you see there show an image of where the video will resume if you release it at that position. 
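The startup behavior described above can be sketched in a few lines. This is a minimal illustration, not a real player: the 2 Mbps encoding rate, 5-second startup threshold, and chunk sizes are all assumptions chosen for the example.

```python
# Sketch: progressive download buffers incoming data and starts
# playback once a startup threshold is reached. The 2 Mbps encoding
# rate and 5 s threshold (within the typical 2-10 s range) are
# illustrative assumptions, not values from any particular player.

ENCODING_BPS = 2_000_000      # assumed video encoding rate
STARTUP_SECONDS = 5           # assumed startup buffer target

def playback_start_chunk(chunk_sizes_bytes):
    """Return the index of the chunk after which playback can begin,
    or None if the startup buffer is never filled."""
    buffered_bits = 0
    for i, size in enumerate(chunk_sizes_bytes):
        buffered_bits += size * 8
        if buffered_bits / ENCODING_BPS >= STARTUP_SECONDS:
            return i
    return None

# Each 250 kB chunk is 1 s of video at 2 Mbps, so 5 chunks fill 5 s.
print(playback_start_chunk([250_000] * 10))  # → 4
```

Note that playback can begin long before the file finishes downloading; the rest of the video keeps arriving into the buffer while the first seconds play, which is exactly what makes pause, rewind, and limited fast-forward possible.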
This is what we call trick mode, and it makes video viewing very convenient. Then there's adaptive streaming, which is a level up in technology: the viewer's device monitors its replay buffer and the network condition, which is related to the bandwidth, the error rate, the throughput, and various delay components, and it chooses the most appropriate version of the video, that is, the video resolution, while using progressive downloading. It decides what level to show, where level means the resolution of the video. For example, there could be an ordinary mode, but if the network condition is good and there is buffer space available, then you can go up to a high-definition mode, where the number of frames per second goes up and the resolution of each image goes up as well. That increases the data rate being received. You can only do this when your network condition allows it and your replay buffer has sufficient space. Some adaptive streaming service examples include MPEG-DASH, where DASH stands for Dynamic Adaptive Streaming over HTTP; it is the most popular such technology used throughout the world. There are also Microsoft Smooth Streaming, Apple HTTP Live Streaming, and Adobe HTTP Dynamic Streaming. These are all very popular, but MPEG-DASH is the most popular. The advantages of adaptive streaming with HTTP are based on the fact that HTTP has ubiquitous connectivity, meaning that almost all devices support HTTP connections. HTTP is a pull protocol, the opposite of a push protocol. This is an advantage of adaptive streaming using HTTP because pull streaming has the following benefits. First, it easily traverses middleboxes. Middleboxes are firewalls, gateways, Network Address Translation devices, and similar equipment. 
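The core decision just described, picking a resolution level from the measured network condition and buffer state, can be sketched as a small function. The bitrate ladder, the safety margin, and the minimum-buffer rule here are illustrative assumptions, not part of any standard.

```python
# Sketch of the adaptive-streaming decision: pick the highest rendition
# whose bitrate fits under the measured throughput (with a safety
# margin), and fall back to the lowest rendition when the playout
# buffer is nearly empty. The ladder and thresholds are assumptions.

RENDITIONS_KBPS = [200, 400, 800, 1500, 3000]  # hypothetical bitrate ladder

def choose_rendition(throughput_kbps, buffer_seconds,
                     min_buffer_s=5, safety=0.8):
    """Return the bitrate (kbps) the client should request next."""
    if buffer_seconds < min_buffer_s:
        return RENDITIONS_KBPS[0]          # protect against underflow
    usable = throughput_kbps * safety      # leave headroom for variation
    candidates = [r for r in RENDITIONS_KBPS if r <= usable]
    return candidates[-1] if candidates else RENDITIONS_KBPS[0]

print(choose_rendition(1000, buffer_seconds=12))  # → 800
print(choose_rendition(1000, buffer_seconds=2))   # → 200
```

Real clients use more elaborate logic, but the shape is the same: good network and a healthy buffer push the quality up; a draining buffer pushes it down regardless of throughput.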
This is based on the fact that, in a pull service, the device sends out requests for a certain amount of video information to be sent to it. These middleboxes can see how much was requested by the device and then see that the server is sending just that amount. Because of this, they can trust that the amount being sent into the network to the user is the right amount. The opposite case is the push protocol, in which the server is in control of the video streaming flow and sends packets down to the mobile device. These middleboxes do not know whether the video files being sent to the mobile user are based on the mobile user's request, or whether they are being sent down due to some virus attack or as part of a Denial of Service attack being conducted. The middleboxes are not sure, so the firewall may activate a blocking mechanism and the video flow will be blocked. Pull services enable the opposite operation: just the right amount that has been requested by the mobile device is sent to it from the server, and the middleboxes can rely on that and let it pass through easily. In addition, pull streaming servers keep only minimal device state information. This means that if we were doing a push-based service, the opposite of a pull service, then the server would need to keep track of how much it sent to each mobile device. It would also need to keep track of the network condition, and of whether there were any errors, and if so, whether it should retransmit or not. That is a lot of information to track if you are in control of many mobile users at the same time. It really complicates the control mechanism that the server needs to maintain. 
On the other hand, in pull services, your smartphone requests how much it needs based on the network condition that it sees, the amount of memory it has, and the current status of the video it is displaying. It knows exactly what's going on, and it's only controlling the services coming to itself. It's not much of a burden for the smartphone, because the smartphone knows what it's doing and is going to keep track of itself anyway. Therefore, the amount of state the server needs to keep track of is minimal when you use pull services. That's another reason why pull services are so popular, and it is why pull streaming servers are much more scalable than push streaming servers. In other words, if one server can support only a limited number of users when using push, then when using pull it can support many more; the scalability benefit comes from that. Push-based media streaming: let's take a closer look at what push is. The server streams packets to the client until the client stops or interrupts the session. So the server keeps sending and sending until the receiving device says, "Whoa! Stop, I can't take any more." The server maintains a session state with the client and listens for commands from the client regarding session state changes. Some of the popular push-based media streaming protocols are RTSP, which stands for Real-Time Streaming Protocol, and RTP, the Real-Time Transport Protocol. RTSP is an IETF standard defined in RFC 2326, and it is typically used as the session control protocol. RTP is an IETF standard defined in RFC 3550, in which UDP is commonly used as the transport. RTP over UDP lets the server push packets to the client. The server bitrate is determined by the application's quality of service requirements and the client and server characteristics. 
Here, when we say bitrate, we're talking about data traffic measured in units of bits per second, written either as bits/second or bps for short. Push-based media streaming, the disadvantages, the cons: the server has too much to do; it's too burdened. In addition, the playing device's status is hard to track, because the server and the mobile device are far apart. The server is trying to keep track of the status of your mobile device, PC, or laptop computer from far away. By the time a certain status is experienced and reported back to the server, there is a time gap, so the server is always trying to catch up with the current status of your PC, laptop computer, or smartphone. Many firewalls will block RTP packets, because they don't know whether the packets being sent to the mobile device were requested by it or are some type of security attack. Many Internet service networks that have been replaced with CDNs do not support RTP, because RTP is potentially dangerous and could be used in network attacks. We need to study CDNs, as you've seen that content delivery networks are one of the core technologies used in video streaming services, and we will in the next module. Pull-based adaptive streaming technology: the media client, which is your smartphone or personal computer, sends HTTP requests to the server to quickly retrieve content packets in burst mode; this is you pulling in your packets. Meaning that when you request something, you want it sent very quickly, so that you receive packets in sequence, save them in your memory, and can play them back on your device. After the minimum required buffer level is filled, the media client can play the content, displaying it in the media player. For example, once you download the minimum amount required, you can start viewing the video on your smartphone. 
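The pull loop just described can be sketched as follows. This is a simplified illustration: `get_fragment` stands in for one HTTP GET per fragment (in practice something like `urllib.request.urlopen` on a fragment URL), and the fragment duration and startup threshold are assumed values.

```python
# Minimal sketch of the pull model: the client issues one GET per
# fragment until the minimum startup buffer is filled, then playback
# can begin. get_fragment is a stand-in for a real HTTP GET; the
# 2 s fragment duration and 6 s startup level are assumptions.

FRAGMENT_SECONDS = 2      # assumed fragment duration
MIN_BUFFER_SECONDS = 6    # assumed minimum level before playback

def fetch_until_playable(get_fragment):
    """Pull fragments in sequence; return them once playback can start."""
    buffered = []
    while len(buffered) * FRAGMENT_SECONDS < MIN_BUFFER_SECONDS:
        buffered.append(get_fragment(len(buffered)))  # one GET request
    return buffered

# Simulated server: each "GET" returns the fragment's raw bytes.
fragments = fetch_until_playable(lambda i: b"\x00" * 100)
print(len(fragments))  # → 3 fragments (6 s of video) before playback
```

Because the client drives every request, the server's job reduces to answering standard GETs, which is exactly the minimal-state, middlebox-friendly behavior discussed above.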
The server transmits at the media encoding bitrate, which matches the media consumption rate. Meaning that there is a certain rate at which the video is displayed and video packets in the memory are consumed, and the media encoding bitrate is set to match that rate. In addition, if the client buffer level remains approximately stable at a certain fixed level, the network resources will be used efficiently. This is what we mean by network resource quality and how efficiently we use it, in addition to how your mobile device, PC, or laptop computer is playing the video. For instance, for the buffer level to remain stable at the client device, your PC, laptop, or smartphone, the speed at which video packets are consumed and displayed and the speed at which packets arrive into your buffer, your memory, must be approximately the same, such that the overall buffer level remains stable. Meaning data is coming in and going out at the same rate. In these cases, the network resources are used very efficiently, and that's good. Then, if network packets are lost or transmission delays are experienced, buffer underflow occurs. The buffer becomes empty because packets are not coming in properly. They may arrive, but if they have errors, they will be dropped, terminated, erased. Therefore, playback may be interrupted, and the video stops in the middle. The server will dynamically switch to a lower-bitrate stream to prevent buffer underflow. That means that when a lower-resolution video is supported, the amount of data sent and the packet sizes can be reduced. 
Therefore, the overall streaming speed can be maintained, because you're sending less, matching the reduced network throughput, and what is received can be displayed on your monitor comfortably. Of course, what you lose here is the resolution of the video. To avoid noticeable video quality degradation, gradual bitrate reductions are used. Meaning that if you see the network conditions degrading, getting poorer, then you gradually reduce the video quality so that fewer packets, less throughput, are needed. If network conditions improve, then the server switches to a higher bitrate until the media encoding bitrate is recovered. That is, if the network conditions improve, you want to go back to the high quality you were using before. This is possible if the network throughput and the other network conditions are able to support the higher resolution and higher throughput. Pull-based adaptive streaming of media works like this: as you see there, there are various movie bitrates and also fragments. The details of these will be explained in the following slides. But first, let me go over this example, which starts with a request for the manifest of movie A. The server replies, and the manifest comes back to the device that is going to play the video. With this manifest coming back, the server is saying, "I will support your request for movie A," and then the pull process starts. The client requests a data rate of 200 kilobits per second for the video stream, with a reference time of zero. Then, as the video packets come in, it sees that the network condition is pretty good, so it requests a higher resolution, it requests more, and the data rate goes up to 400 kilobits per second at two seconds. Then it asks for an even higher quality upgrade. 
Then 800 kilobits per second at four seconds is requested, and a higher data rate is received. But then, while receiving at this higher rate of 800 kilobits per second, some packet losses and some delay are experienced. So the client says, "Whoa, this is too fast, let's reduce," and at six seconds it requests a 400 kilobits per second rate. It is reducing. Then the network conditions improve, and it goes back to the higher rate again, requesting it at eight seconds into the video service. This is how the client GETs across this range, and this is how adaptive media streaming is done; as you can see, it is based on the pull mechanism. What is requested is what is received. Now let's look at the structure of the media streaming file format, which is based on MP4, MPEG-4 technology. At the beginning there is the file type box, ftyp, listed up there. Then we have the movie metadata, the moov part. Then we go into the fragment part. In the fragment part, you can see that the audio and video data are in the mdat boxes. mdat stands for media data, and you will see that the mdat boxes and the metadata form a fragment. Fragments are retrieved with HTTP GET request messages. Looking inside a fragment, we have the moof part, the movie fragment, and then we have the media data, the mdat part, which we explained down here. This is what is in a fragment. Then we have the movie fragment random access part, mfra, and this is what we are going to be using. Looking at an example HTTP request message, you can see that this is a GET message, because at the far end you see GET, G-E-T, listed there. Each fragment is downloaded through a unique HTTP request-response pair. The HTTP request message header contains two pieces of important information: one is the bitrate, and the other is the time offset of the requested fragment. 
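The box layout just described (ftyp, moov, then moof/mdat fragments) follows the ISO base media file format convention: each box starts with a 4-byte big-endian size and a 4-byte type code. A small sketch of walking the top-level boxes, with a toy file built in place of real media data:

```python
# Sketch: list the top-level MP4/ISO-BMFF boxes in a byte string.
# Each box header is a 4-byte big-endian size plus a 4-byte type.
# Special sizes 0 and 1 (box-to-end, 64-bit size) are not handled here.
import struct

def list_boxes(data):
    """Return (type, size) pairs for the top-level boxes in data."""
    boxes, offset = [], 0
    while offset + 8 <= len(data):
        size, = struct.unpack_from(">I", data, offset)
        box_type = data[offset + 4:offset + 8].decode("ascii")
        if size < 8:                     # malformed or unsupported size
            break
        boxes.append((box_type, size))
        offset += size                   # jump to the next box
    return boxes

# Build a toy fragmented-MP4 layout: ftyp, moov, then a moof+mdat pair.
def box(type_code, payload=b""):
    return struct.pack(">I", 8 + len(payload)) + type_code + payload

sample = box(b"ftyp", b"isom") + box(b"moov") + box(b"moof") + box(b"mdat", b"AV")
print([t for t, _ in list_boxes(sample)])  # → ['ftyp', 'moov', 'moof', 'mdat']
```

The payloads here are placeholders; in a real file the moov and moof boxes contain nested metadata boxes, and mdat carries the encoded audio and video samples that one HTTP GET retrieves per fragment.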
Looking up here at this HTTP GET request message, where is that information? You can see it in these parts right here. Now let's look at the details of the pull-based adaptive media streaming structure and its interaction with the server: at the top there is a GET request, and at the bottom there is the response. The client manages certain components, monitors and measures certain factors, and then performs a certain level of adaptation based on the network and its own buffer conditions. Looking at the details, what we're going to see now is a generic client-side pull-based adaptive streaming implementation example, and here we go. A client acquires media from fragments of a file over one or more connections, according to the playout buffer state and other conditions, including the network status. The GET request and response part that you see there is a key component. At minimum, the server provides standard responses to the HTTP GET requests; it can provide at least the standard responses, or it can provide more information. The client gets a manifest file that identifies the overall media presentation and the alternative bitrates. These alternative bitrates are for higher- or lower-resolution video and audio. Then come the client-managed components. The client needs a client-side manifest or playlist file to map fragment requests to specific files, or to map byte ranges or time offsets to files. In addition, in some adaptive streaming techniques, a similar file on the server translates client requests. Then we have the client monitoring and measuring component, in which the playout buffer management selects fragments from a file at a particular bitrate in response to the buffer state and other variables. Here you can see the components of what is actually measured and monitored. In addition, the client performs adaptation based on these monitored values. 
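One concrete example of the "monitors and measures" step is throughput estimation: the client can compute a throughput sample from each fragment download and smooth the samples before feeding them into the adaptation decision. The exponential smoothing shown here, and the 0.3 weight, are illustrative assumptions, not a prescribed algorithm.

```python
# Sketch: smooth per-fragment throughput samples with an exponentially
# weighted moving average (EWMA). The smoothed estimate, rather than
# any single noisy sample, drives the bitrate-adaptation decision.
# The alpha = 0.3 weight is an assumed tuning value.

def ewma_throughput(samples_kbps, alpha=0.3):
    """Return the smoothed throughput estimate after all samples."""
    estimate = None
    for s in samples_kbps:
        estimate = s if estimate is None else alpha * s + (1 - alpha) * estimate
    return estimate

# A run of fast downloads followed by one slow fragment: the estimate
# reacts, but does not collapse to the single bad sample.
print(round(ewma_throughput([800, 900, 850, 300]), 1))  # → 675.2
```

Smoothing matters because HTTP download rates fluctuate fragment to fragment; reacting to every sample would cause the client to flip between bitrates, which is exactly the visible quality churn that gradual adaptation tries to avoid.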
Then the adaptive streaming client keeps a playout buffer of several seconds saved, and it wants to maintain this level in a stable way, because if the level is kept stable, the network is used very efficiently. How much does it try to keep? About 5-30 seconds of video playing time.
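The buffer-keeping behavior above can be sketched as a simple policy: request the next fragment only while the buffer holds less than a target level, so arrivals and playback drain balance out. The 2 s fragment length and 10 s target (inside the typical 5-30 s band) are assumed values for the illustration.

```python
# Sketch: keep the playout buffer near a target level by fetching a
# fragment only when the buffer is below target. Fragment length (2 s)
# and target (10 s, within the 5-30 s band) are assumptions.

FRAGMENT_S = 2
TARGET_BUFFER_S = 10

def should_request_next(buffer_seconds):
    """Fetch another fragment only when below the target level."""
    return buffer_seconds < TARGET_BUFFER_S

def simulate(steps):
    """Alternate one optional fragment fetch with 1 s of playback."""
    buffer_s, trace = 0.0, []
    for _ in range(steps):
        if should_request_next(buffer_s):
            buffer_s += FRAGMENT_S           # fragment arrives in the buffer
        buffer_s = max(0.0, buffer_s - 1.0)  # 1 s of video is consumed
        trace.append(buffer_s)
    return trace

print(simulate(12)[-1])  # → 10.0, the buffer settles at the target
```

Once the target is reached, the client stops requesting until playback drains the buffer below it again, so the level oscillates just under the target instead of growing without bound; this is the stable, network-efficient steady state the lecture describes.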