Hi and welcome!

This page documents my research. Below is a chronological, dated log of completed activities.

Benchmarks conducted for the MMCN paper (submitted 1 August 2006; acceptance notification due 1 September 2006). Paper accepted at MMCN 2007!

Most of the work reported here was completed before 1 August 2006.

  • We instrumented both mplayer and VLC to measure jitter, FPS, and CPU-load parameters. Specifically, we modified the video display modules by placing printf()'s at appropriate places to print the current time and the frame jitter at that time. From this basic information, we later calculate FPS by running a script on the dump files. In all experiments, we allow mplayer and VLC to drop frames when needed.
    • If X instances of mplayer and Y instances of VLC just saturate the CPU while playing a specific XviD (MPEG-4) video (all of them play the same video at the same position), we measure their individual FPS and frame jitter. At the same time, we measure the overall CPU load on the system for the duration of playback. We run these experiments on a Dell Inspiron 1300 laptop.
    • Next, we perform the same experiment with 2X mplayer instances and 2Y VLC instances and measure the same parameters, again on the Dell Inspiron 1300 laptop.
    • From the above two experiments, we show that VLC performs the worse of the two under heavy CPU load.
    • Next, we patch the instrumented mplayer to play SPEG video. We perform the same experiment as above and note the new values for X and 2X. Let the new value of X be X'; we note that X' < X. We run these experiments on a P4 3 GHz desktop with 1 GB of RAM.
  • We perform the same experiments using QStream, this time with a single QStream process playing multiple videos.
    • We used Qmon to dump the performance data to text files and plot them. We note the threshold number of videos that just saturates the CPU, say Z. We run an increasing number of videos up to Z to check how QStream adapts to increasing load, and then continue up to 2Z. We run the QStream client on a P4 3 GHz desktop with 1 GB of RAM and 100 Mbps Ethernet; the server and monitor run on a different machine on the same subnet. We then plot the FPS, CPU-load, and jitter values for each of the videos.
    • We run another set of similar experiments, this time boosting the temporal quality of one video (meaning it drops fewer frames) relative to the rest. We show that when all videos have equal priority, all of them degrade gracefully; in the boosted case, one of them clearly performs better than the rest.
    • Lastly, we run a single video with QStream and plot the video bitrate against CPU load. We show that the bitrate of an MPEG-4 video varies greatly over time and that the CPU load varies accordingly, indicating a strong correlation between the two. This shows that the entropy of a video varies greatly over time, and so does the number of bits used to encode a single frame. Thus, when we play multiple videos on a single machine, the CPU requirement is the combined requirement of all the videos taken together, which is very difficult to predict, much less provision for beforehand. This supports our belief that graceful quality adaptation across multiple videos is a difficult problem, exacerbated by the fact that hardware capability varies greatly from device to device, from handhelds with low processing power (to save battery life) to high-end dual-core desktop machines.
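The dump-and-postprocess step described above can be sketched roughly as follows. This is a minimal illustrative sketch in Python, not the actual script: the assumed dump format (one display timestamp in seconds per frame) and the windowed-FPS and jitter definitions here are assumptions for illustration only.

```python
# Hypothetical post-processing of a player's timestamp dump.
# Assumed format: one display time (seconds) per displayed frame.
# This format is an assumption, not the actual mplayer/VLC output.

def fps_and_jitter(timestamps, window=1.0):
    """Compute FPS over fixed windows and per-gap jitter."""
    fps = []
    start = timestamps[0]
    count = 0
    for t in timestamps:
        if t - start < window:
            count += 1
        else:
            fps.append(count / window)
            start, count = t, 1
    # Jitter: deviation of each inter-frame gap from the mean gap.
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean_gap = sum(gaps) / len(gaps)
    jitter = [abs(g - mean_gap) for g in gaps]
    return fps, jitter

if __name__ == "__main__":
    # Synthetic 30 fps stream with one frame displayed 20 ms late.
    ts = [i / 30.0 for i in range(61)]
    ts[30] += 0.02
    fps, jitter = fps_and_jitter(ts)
    print(fps[0], max(jitter))  # 30.0 and roughly 0.02
```

In the real experiments the dump files came from the printf() instrumentation; the script above only illustrates how FPS and jitter can be derived from such timestamps after the fact.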

Experiments performed as a continuation of the ones in the MMCN paper but not reported in the submitted version (perhaps to be reported in the camera-ready copy once the paper is accepted).

The work reported here was completed before 15 August 2006.

  • QStream tests as before, but this time we run N instances of QStream, each playing one video.
    • We run the same set of tests as before, but now each QStream process plays one video, and we skip each video by a different amount to create a heterogeneous environment in which each video has different CPU requirements. We disable ALSA sound output because we find that contention among the QStream processes for the single sound device causes major stalls in the videos. We compare the results with those of the previous benchmarks and find that:
      • Boosting has no effect. This is understandable because the videos do not communicate their deadlines to each other, and the scheduler knows nothing about the boosted status of any video.
      • Unfairness among the videos is more pronounced because the scheduler knows nothing about the players' deadlines.
    • We intend to perform the same set of experiments with mplayer (sound disabled) for a fairer comparison in the final version of the paper.

We performed some minor bug fixes to QStream as we went along, and modified the decoder for finer-grained frame-level CPU adaptation (frame dropping, etc.).

Readings completed (as of 31/08/2006)

  • Major portions of Buck's thesis. The chapters on performance analysis and multicast were skipped.
  • Read the rejected NOSSDAV paper and the soft timers paper.
  • Read the QStream qsf source code and the shared-memory IPC code.
  • Completed preliminary coding of the cooperative scheduler; benchmarks done.

Work done as part of the EuroSys paper (deadline for initial submission: 20 September 2006, 10 pm)

The EDF scheduler
  • Our proposal is that each real-time application provides enough information to the Linux kernel (such as its internal deadlines, deadline priorities, etc.) so that the kernel can decide whom to run next. At appropriate times, the kernel can then preempt the currently executing task and run the one with the most favorable parameters, by which we mean the closest deadline, a high-priority ASAP event [1], and so on. With such a fine-grained approach, each application should be able to hit its deadlines at the right time; if a deadline is ever missed, the latency is still bounded.
  • However, there are caveats to this approach. First, we incur a large number of context switches, since the design is very fine-grained. The number and frequency of context switches depend on how close the events are on the timeline, the time to execute an event, and the context-switch time. High context-switch overhead translates into wasted CPU that could otherwise have been used for useful work. Second, what happens if one or more applications lie about their internal deadlines and other private parameters? In that case, one application can easily steal CPU time from the others, and a kernel-level implementation has no knowledge of higher-level, process-specific notions like deadlines and priority. To deal with this, we propose a hybrid scheduler, one that sits between a fine-grained cooperative scheduler and a simple periodic scheduler. In this scheme, we choose an appropriate period, say 20 ms, and divide it uniformly among all the competing real-time tasks, so each process starts the period with a uniform allocation. The kernel wakes up the process with the nearest deadline (and highest priority) and runs it until the deadline of some other process is about to expire, at which point it wakes up and runs that process. As an application runs, it uses up its allocation; when the allocation reaches zero, the kernel forces it to sleep until the next epoch, i.e., 20 ms later. In this scheme, each application can still hit its deadlines, but with a maximum tardiness equal to the period of the scheduler: if a process's allocation expires just as one of its own deadlines arrives, that deadline event cannot be served until the process wakes up again, in this case after 20 ms. On the other hand, the benefit of this approach is that no one gains by lying to the kernel: an application that lies uses up its allocation earlier than the others and is forced to sleep. Thus, by adjusting the period, we can tune the maximum tardiness we tolerate, obtain a policing effect, and achieve an upper bound on the number of context switches. Fewer context switches and policing come at the price of coarser-grained scheduling and therefore increased tardiness. The main theme of this mixed approach is to share the CPU reasonably uniformly across all the processes, even if one of them cheats the kernel, while still hitting deadlines within reasonable bounds.
  • In the above discussion, however, we assumed that no other CPU-intensive, non-real-time tasks are running in the system. In general, this assumption does not hold: when other CPU jobs are queued, the division of CPU across the real-time tasks may not be uniform. To account for non-real-time, throughput-oriented applications that also need a share of the CPU, we propose to reserve a fixed percentage of CPU time (say 20%) for them. Though this might look like an easy solution, it is not very elegant. The obvious questions are: how much CPU should we reserve beforehand? Is that reservation enough at all times? Does it waste CPU cycles when few non-real-time tasks are running? We are trying to find more elegant and feasible solutions to this problem; in general, we believe it is an extremely difficult one to address.
  • To get an idea of the issues we might encounter in our design and the kind of results we may expect, we have user-level implementations of the pure cooperative scheme, the hybrid scheme, and the pure periodic scheme using QStream. Each QStream application communicates its deadlines and deadline priorities to the others using shared-memory IPC. Each application yields the CPU (using a condition variable and wait) to its fellow applications based on its own deadlines and the nearest external deadlines.
  • We have benchmarked QStream processes under all three scheduling mechanisms discussed above and compared them with the single-process multi-video and independent multi-video cases.
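The hybrid scheme described above, EDF ordering within a fixed epoch plus per-task allocations that police lying tasks, can be sketched as a toy simulation. Everything below (the 1 ms tick granularity, the task model, and the task names) is an illustrative assumption, not the actual QStream or kernel code.

```python
# Toy simulation of the hybrid scheme: EDF within a fixed epoch,
# with uniform per-task allocations that police lying tasks.
# Tick granularity and task model are illustrative assumptions.

PERIOD_MS = 20  # scheduler epoch, as in the 20 ms example above

def schedule_epoch(tasks):
    """tasks: name -> {'deadline': ms, 'demand': ms}.
    Each task starts the epoch with a uniform allocation. At every
    1 ms tick, run the runnable task (allocation and demand left)
    with the nearest deadline. A task whose allocation is exhausted
    sleeps until the next epoch. Returns CPU ms received per task."""
    alloc = {name: PERIOD_MS // len(tasks) for name in tasks}
    got = {name: 0 for name in tasks}
    for _ in range(PERIOD_MS):
        runnable = [n for n in tasks
                    if alloc[n] > 0 and got[n] < tasks[n]['demand']]
        if not runnable:
            break  # everyone is done or asleep for this epoch
        nxt = min(runnable, key=lambda n: tasks[n]['deadline'])
        alloc[nxt] -= 1
        got[nxt] += 1
    return got

if __name__ == "__main__":
    # 'liar' claims the nearest deadline and demands far more than
    # its fair share; it is capped at its 5 ms allocation and forced
    # to sleep, so the honest tasks still meet their full demands.
    tasks = {
        'a':    {'deadline': 5,  'demand': 4},
        'b':    {'deadline': 8,  'demand': 5},
        'liar': {'deadline': 1,  'demand': 50},
        'd':    {'deadline': 12, 'demand': 3},
    }
    print(schedule_epoch(tasks))
```

With a 20 ms period and four tasks, each allocation is 5 ms; the lying task runs first (nearest deadline) but is cut off after 5 ms, which is exactly the policing effect the hybrid design aims for.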
The Results/Analysis/Future Proposals

  • The cooperative EDF scheduler produces results very close to the single-video case. However, we see an overall context-switch rate of more than 3000 per second.
  • The FPS rates of all the videos are very close to one another, something that is impossible to achieve in the independent-player case.

Topic revision: r10 - 2006-09-20 - AnirbanSinha