* [RFC] Scheduler issue 1, RT tasks ...
@ 2001-12-20 21:11 Davide Libenzi
2001-12-20 22:25 ` george anzinger
0 siblings, 1 reply; 30+ messages in thread
From: Davide Libenzi @ 2001-12-20 21:11 UTC (permalink / raw)
To: lkml
I'd like to have some comments about the RT task implementation in an SMP
system, because POSIX is not clear about how the priority rules apply to
multiprocessor systems.
The Balanced Multi Queue Scheduler ( BMQS, http://www.xmailserver.org/linux-patches/mss-2.html )
I'm working on tries to keep the CPU schedulers as independent as
possible, and currently implements two kinds of RT tasks, local ones and
global ones.
Local RT tasks apply the POSIX priority rules inside the local CPU, which means
that an RT task running on CPU0 cannot preempt another task ( be it
normal or RT ) on CPU1. This keeps scheduler interlocking very low
because of the very fast path in reschedule_idle() ( no multi-lock
acquisition, CPU queue loops, etc. ).
Global RT tasks, which live in a separate run queue, have the ability to
preempt a remote CPU, and this can lead ( in the unfortunate case that the
last CPU that ran the RT task is running another RT task ) to a higher
cost in reschedule_idle().
The check for a global RT task selection is done in a very fast way, before
checking the local queue:
if (!list_empty(&runqueue_head(RT_QID)))
goto rt_queue_select;
rt_queue_select_back:
and this does not affect the scheduler latency at all.
On the contrary, having a separate queue for global RT tasks can even
improve latency under high run queue load.
The local/global RT task selection is done with setscheduler() using a new
( OR'ed ) flag, SCHED_RTGLOBAL, which means that the default is a local RT
task.
I'd like to have comments on this before jumping to the next Scheduler
issue ( balancing mode ).
- Davide
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [RFC] Scheduler issue 1, RT tasks ... 2001-12-20 21:11 [RFC] Scheduler issue 1, RT tasks Davide Libenzi @ 2001-12-20 22:25 ` george anzinger 2001-12-20 22:21 ` Momchil Velikov 2001-12-20 22:36 ` Davide Libenzi 0 siblings, 2 replies; 30+ messages in thread From: george anzinger @ 2001-12-20 22:25 UTC (permalink / raw) To: Davide Libenzi; +Cc: lkml Davide Libenzi wrote: > > I'd like to have some comments about RT tasks implementation in a SMP > system because POSIX it's not clear about how the priority rules apply to > multiprocessor systems. > The Balanced Multi Queue Scheduler ( BMQS, http://www.xmailserver.org/linux-patches/mss-2.html ) > i'm working on tries to keep CPU schedulers the more independent as > possible and currently implements two kind of RT tasks, local one and > global ones. > Local RT tasks apply POSIX priority rules inside the local CPU, that means > that an RT task running on CPU0 cannot preempt another task ( being it > normal or RT ) on CPU1. This keeps schedulers interlocking very low > because of the very fast path in reschedule_idle() ( no multi lock > acquisition, CPU queue loops, etc...). > Global RT tasks, that live in a separate run queue, have the ability to > preempt remote CPU and this can lead ( in the unfortunate case that the > last CPU running the RT task is running another RT task ) to an higher > cost in reschedule_idle(). > The check for a global RT task selection is done in a very fast way before > checking the local queue : > > if (!list_empty(&runqueue_head(RT_QID))) > goto rt_queue_select; > rt_queue_select_back: > > and this does not affect the scheduler latency at all. > On the contrary, by having a separate queue for global RT tasks, can > improve it in high run queue load cases. > The local/global RT task selection is done with setscheduler() with a new > ( or'ed ) flag SCHED_RTGLOBAL, and this means that the default is RT task > local. 
> I'd like to have comments on this before jumping to the next Scheduler > issue ( balancing mode ). > My understanding of the POSIX standard is that the highest priority task(s) are to get the cpu(s) using the standard calls. If you want to deviate from this I think the standard allows extensions, but they IMHO should be requested, not the default, so I would turn your flag around to force LOCAL, not GLOBAL. -- George george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Real time sched: http://sourceforge.net/projects/rtsched/ ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ... 2001-12-20 22:25 ` george anzinger @ 2001-12-20 22:21 ` Momchil Velikov 2001-12-20 22:57 ` Davide Libenzi 2001-12-20 22:36 ` Davide Libenzi 1 sibling, 1 reply; 30+ messages in thread From: Momchil Velikov @ 2001-12-20 22:21 UTC (permalink / raw) To: george anzinger; +Cc: Davide Libenzi, lkml >>>>> "George" == george anzinger <george@mvista.com> writes: George> Davide Libenzi wrote: >> Local RT tasks apply POSIX priority rules inside the local CPU, that means >> that an RT task running on CPU0 cannot preempt another task ( being it >> normal or RT ) on CPU1. [...] >> Global RT tasks, that live in a separate run queue, have the ability to >> preempt remote CPU and this can lead. [...] >> The local/global RT task selection is done with setscheduler() with a new >> ( or'ed ) flag SCHED_RTGLOBAL, and this means that the default is RT task >> local. George> My understanding of the POSIX standard is the the highest priority George> task(s) are to get the cpu(s) using the standard calls. If you want to George> deviate from this I think the standard allows extensions, but they IMHO George> should be requested, not the default, so I would turn your flag around George> to force LOCAL, not GLOBAL. I'd like to second that, IMHO the RT task scheduling should trade throughput for latency, and if someone wants priority inversion, let him explicitly request it. Regards, -velco ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ... 2001-12-20 22:21 ` Momchil Velikov @ 2001-12-20 22:57 ` Davide Libenzi 2001-12-21 17:00 ` Mike Kravetz 0 siblings, 1 reply; 30+ messages in thread From: Davide Libenzi @ 2001-12-20 22:57 UTC (permalink / raw) To: Momchil Velikov; +Cc: george anzinger, lkml On 21 Dec 2001, Momchil Velikov wrote: > >>>>> "George" == george anzinger <george@mvista.com> writes: > > George> Davide Libenzi wrote: > >> Local RT tasks apply POSIX priority rules inside the local CPU, that means > >> that an RT task running on CPU0 cannot preempt another task ( being it > >> normal or RT ) on CPU1. > [...] > >> Global RT tasks, that live in a separate run queue, have the ability to > >> preempt remote CPU and this can lead. > [...] > >> The local/global RT task selection is done with setscheduler() with a new > >> ( or'ed ) flag SCHED_RTGLOBAL, and this means that the default is RT task > >> local. > > George> My understanding of the POSIX standard is the the highest priority > George> task(s) are to get the cpu(s) using the standard calls. If you want to > George> deviate from this I think the standard allows extensions, but they IMHO > George> should be requested, not the default, so I would turn your flag around > George> to force LOCAL, not GLOBAL. > > I'd like to second that, IMHO the RT task scheduling should trade > throughput for latency, and if someone wants priority inversion, let > him explicitly request it. Not a great performance loss anyway. It's zero performance loss if the CPU that last ran the woken-up RT task is not running another RT task ( very probable ). If the last CPU of the woken-up task is running another RT task, a CPU discovery loop ( like the one in the current scheduler ) must be triggered. Not a big deal anyway. - Davide ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ... 2001-12-20 22:57 ` Davide Libenzi @ 2001-12-21 17:00 ` Mike Kravetz 2001-12-21 17:19 ` Davide Libenzi 2001-12-24 0:18 ` Victor Yodaiken 0 siblings, 2 replies; 30+ messages in thread From: Mike Kravetz @ 2001-12-21 17:00 UTC (permalink / raw) To: Davide Libenzi; +Cc: Momchil Velikov, george anzinger, lkml On Thu, Dec 20, 2001 at 02:57:55PM -0800, Davide Libenzi wrote: > On 21 Dec 2001, Momchil Velikov wrote: > > > > I'd like to second that, IMHO the RT task scheduling should trade > > throughput for latency, and if someone wants priority inversion, let > > him explicitly request it. > > No a great performance loss anyway. It's zero performance loss if the CPU > that has ran the woke up RT task for the last time is not running another > RT task ( very probable ). If the last CPU of the woke up task is running > another RT task a CPU discovery loop ( like the current scheduler ) must > be triggered. Not a great deal anyway. Some time back, I asked if anyone had any RT benchmarks and got little response. Performance (latency) degradation for RT tasks while implementing new schedulers was my concern. Does anyone have ideas about how we should measure/benchmark this? My 'solution' at the time was to take a scheduler heavy benchmark like reflex, and simply make all the tasks RT. This wasn't very 'real world', but at least it did allow me to compare scheduler overhead in the RT paths of various scheduler implementations. -- Mike ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ... 2001-12-21 17:00 ` Mike Kravetz @ 2001-12-21 17:19 ` Davide Libenzi 2001-12-21 17:33 ` Mike Kravetz 2001-12-24 0:18 ` Victor Yodaiken 1 sibling, 1 reply; 30+ messages in thread From: Davide Libenzi @ 2001-12-21 17:19 UTC (permalink / raw) To: Mike Kravetz; +Cc: Momchil Velikov, george anzinger, lkml On Fri, 21 Dec 2001, Mike Kravetz wrote: > On Thu, Dec 20, 2001 at 02:57:55PM -0800, Davide Libenzi wrote: > > On 21 Dec 2001, Momchil Velikov wrote: > > > > > > I'd like to second that, IMHO the RT task scheduling should trade > > > throughput for latency, and if someone wants priority inversion, let > > > him explicitly request it. > > > > No a great performance loss anyway. It's zero performance loss if the CPU > > that has ran the woke up RT task for the last time is not running another > > RT task ( very probable ). If the last CPU of the woke up task is running > > another RT task a CPU discovery loop ( like the current scheduler ) must > > be triggered. Not a great deal anyway. > > Some time back, I asked if anyone had any RT benchmarks and got > little response. Performance (latency) degradation for RT tasks > while implementing new schedulers was my concern. Does anyone > have ideas about how we should measure/benchmark this? My > 'solution' at the time was to take a scheduler heavy benchmark > like reflex, and simply make all the tasks RT. This wasn't very > 'real world', but at least it did allow me to compare scheduler > overhead in the RT paths of various scheduler implementations. Mike, a better real world test would be to have a variable system runqueue load with the wakeup of an rt task and measuring the latency of the rt task under various loads. This can be easily implemented with cpuhog ( that load the runqueue ) plus the LatSched ( scheduler latency sampler ) that will measure the exact latency in CPU cycles. - Davide ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ... 2001-12-21 17:19 ` Davide Libenzi @ 2001-12-21 17:33 ` Mike Kravetz 2001-12-21 18:29 ` Davide Libenzi 0 siblings, 1 reply; 30+ messages in thread From: Mike Kravetz @ 2001-12-21 17:33 UTC (permalink / raw) To: Davide Libenzi; +Cc: Momchil Velikov, george anzinger, lkml On Fri, Dec 21, 2001 at 09:19:04AM -0800, Davide Libenzi wrote: > On Fri, 21 Dec 2001, Mike Kravetz wrote: > > > Some time back, I asked if anyone had any RT benchmarks and got > > little response. Performance (latency) degradation for RT tasks > > while implementing new schedulers was my concern. Does anyone > > have ideas about how we should measure/benchmark this? My > > 'solution' at the time was to take a scheduler heavy benchmark > > like reflex, and simply make all the tasks RT. This wasn't very > > 'real world', but at least it did allow me to compare scheduler > > overhead in the RT paths of various scheduler implementations. > > Mike, a better real world test would be to have a variable system runqueue > load with the wakeup of an rt task and measuring the latency of the rt > task under various loads. > This can be easily implemented with cpuhog ( that load the runqueue ) plus > the LatSched ( scheduler latency sampler ) that will measure the exact > latency in CPU cycles. Right! Any ideas on variable system runqueue load? Should those other tasks be RT or OTHER? a mix? I would suspect that we would want multiple RT tasks on the runqueue or at least in the system (otherwise why worry about global semantics?). However, I would feel better about this if someone had a real world load involving RT tasks on a SMP system. At least then we could try to simulate a load someone cares about. -- Mike ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ... 2001-12-21 17:33 ` Mike Kravetz @ 2001-12-21 18:29 ` Davide Libenzi 0 siblings, 0 replies; 30+ messages in thread From: Davide Libenzi @ 2001-12-21 18:29 UTC (permalink / raw) To: Mike Kravetz; +Cc: Momchil Velikov, george anzinger, lkml On Fri, 21 Dec 2001, Mike Kravetz wrote: > On Fri, Dec 21, 2001 at 09:19:04AM -0800, Davide Libenzi wrote: > > On Fri, 21 Dec 2001, Mike Kravetz wrote: > > > > > Some time back, I asked if anyone had any RT benchmarks and got > > > little response. Performance (latency) degradation for RT tasks > > > while implementing new schedulers was my concern. Does anyone > > > have ideas about how we should measure/benchmark this? My > > > 'solution' at the time was to take a scheduler heavy benchmark > > > like reflex, and simply make all the tasks RT. This wasn't very > > > 'real world', but at least it did allow me to compare scheduler > > > overhead in the RT paths of various scheduler implementations. > > > > Mike, a better real world test would be to have a variable system runqueue > > load with the wakeup of an rt task and measuring the latency of the rt > > task under various loads. > > This can be easily implemented with cpuhog ( that load the runqueue ) plus > > the LatSched ( scheduler latency sampler ) that will measure the exact > > latency in CPU cycles. > > Right! Any ideas on variable system runqueue load? Should those > other tasks be RT or OTHER? a mix? I would suspect that we would > want multiple RT tasks on the runqueue or at least in the system > (otherwise why worry about global semantics?). > > However, I would feel better about this if someone had a real world > load involving RT tasks on a SMP system. At least then we could try > to simulate a load someone cares about. In my tests i stop the run queue load to 8 ( per cpu ) now coz higher values are somehow unusual. A good plot should also have a third dimension that is the number of real time tasks running. 
I guess I'll have to take a better look at the gnuplot docs for 3d graphs :) - Davide ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ... 2001-12-21 17:00 ` Mike Kravetz 2001-12-21 17:19 ` Davide Libenzi @ 2001-12-24 0:18 ` Victor Yodaiken 2001-12-24 1:31 ` Davide Libenzi 1 sibling, 1 reply; 30+ messages in thread From: Victor Yodaiken @ 2001-12-24 0:18 UTC (permalink / raw) To: Mike Kravetz; +Cc: Davide Libenzi, Momchil Velikov, george anzinger, lkml Run a "RT" task that is scheduled every millisecond (or time of your choice)

	while(1){
		read cycle timer
		clock_nanosleep( time period using absolute time )
		read cycle timer - what was actual delay? track worst case
	}

Run this
	a) on an unstressed system
	b) under stress
	c) while a timed non-rt benchmark runs to figure out "RT" overhead.

On Fri, Dec 21, 2001 at 09:00:15AM -0800, Mike Kravetz wrote: > On Thu, Dec 20, 2001 at 02:57:55PM -0800, Davide Libenzi wrote: > > On 21 Dec 2001, Momchil Velikov wrote: > > > > > > I'd like to second that, IMHO the RT task scheduling should trade > > > throughput for latency, and if someone wants priority inversion, let > > > him explicitly request it. > > > > No a great performance loss anyway. It's zero performance loss if the CPU > > that has ran the woke up RT task for the last time is not running another > > RT task ( very probable ). If the last CPU of the woke up task is running > > another RT task a CPU discovery loop ( like the current scheduler ) must > > be triggered. Not a great deal anyway. > > Some time back, I asked if anyone had any RT benchmarks and got > little response. Performance (latency) degradation for RT tasks > while implementing new schedulers was my concern. Does anyone > have ideas about how we should measure/benchmark this? My > 'solution' at the time was to take a scheduler heavy benchmark > like reflex, and simply make all the tasks RT. This wasn't very > 'real world', but at least it did allow me to compare scheduler > overhead in the RT paths of various scheduler implementations. 
> > -- > Mike > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ... 2001-12-24 0:18 ` Victor Yodaiken @ 2001-12-24 1:31 ` Davide Libenzi 2001-12-24 5:33 ` Victor Yodaiken 0 siblings, 1 reply; 30+ messages in thread From: Davide Libenzi @ 2001-12-24 1:31 UTC (permalink / raw) To: Victor Yodaiken; +Cc: Mike Kravetz, Momchil Velikov, george anzinger, lkml On Sun, 23 Dec 2001, Victor Yodaiken wrote: > > > Run a "RT" task that is scheduled every millisecond (or time of your > choice) > while(1){ > read cycle timer > clock_nanosleep( time period using absolute time ) > read cycle timer - what was actual delay? track worst > case > } > > Run this > a) on an unstressed system > b) under stress > c) while a timed non-rt benchmark runs to figure out "RT" > overhead. I've coded a test app that uses the LatSched latency patch ( that uses rdtsc ). It basically does 1) set the current process priority to RT 2) an ioctl() to activate the scheduler latency sampler 3) sleep for 1-2 secs 4) ioctl() to stop the sampler 5) peek the sample with pid == getpid(). In this way I get the net RT task scheduler latency. Yes, it does not get the real one that includes accessory kernel paths, but my code does not affect those, and they add noise to the measure. - Davide ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ... 2001-12-24 1:31 ` Davide Libenzi @ 2001-12-24 5:33 ` Victor Yodaiken 2001-12-24 18:52 ` Davide Libenzi 0 siblings, 1 reply; 30+ messages in thread From: Victor Yodaiken @ 2001-12-24 5:33 UTC (permalink / raw) To: Davide Libenzi Cc: Victor Yodaiken, Mike Kravetz, Momchil Velikov, george anzinger, lkml On Sun, Dec 23, 2001 at 05:31:11PM -0800, Davide Libenzi wrote: > On Sun, 23 Dec 2001, Victor Yodaiken wrote: > > > > > > > Run a "RT" task that is scheduled every millisecond (or time of your > > choice) > > while(1`){ > > read cycle timer > > clock_nanosleep(time period using aabsolute time > > read cycle timer - what was actual delay? track worst > > case > > } > > > > Run this > > a) on aaaaaaaaan unstressed system > > b) under stress > > c) while a timed non-rt benchmark runs to figure out "RT" > > overhead. > > I've coded a test app that uses the LatSched latency patch ( that uses > rdtsc ). > It basically does 1) set the current process priority to RT 2) an ioctl() > to activate the scheduler latency sampler 3) sleep for 1-2 secs 4) ioctl() > to stop the sampler 5) peek the sample with pid == getpid(). > In this way i get the net RT task scheduler latency. Yes it does not get > the real one that includes accessories kernel paths but my code does not > affect these ones. And they add noise to the measure. Seems to me that you are not testing what apps see. Internal benchmarks are useful only for figuring out how to remove bottlenecks that effect actual user apps - in my humble opinion of course. The nice thing about my benchmark is that it actually tests something useful - how well you can do periodic tasks. BTW, on RTLinux we get under 100 microseconds on even 50Mhzx PPC860 - 17us on a 800Mhz K7. I'd be happy to see some decent numbers in Linux, but you gotta measure something more applied. > > > > > - Davide > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ... 2001-12-24 5:33 ` Victor Yodaiken @ 2001-12-24 18:52 ` Davide Libenzi 2001-12-27 3:01 ` Victor Yodaiken 0 siblings, 1 reply; 30+ messages in thread From: Davide Libenzi @ 2001-12-24 18:52 UTC (permalink / raw) To: Victor Yodaiken; +Cc: Mike Kravetz, Momchil Velikov, george anzinger, lkml On Sun, 23 Dec 2001, Victor Yodaiken wrote: > On Sun, Dec 23, 2001 at 05:31:11PM -0800, Davide Libenzi wrote: > > On Sun, 23 Dec 2001, Victor Yodaiken wrote: > > > > > > > > > > > Run a "RT" task that is scheduled every millisecond (or time of your > > > choice) > > > while(1`){ > > > read cycle timer > > > clock_nanosleep(time period using aabsolute time > > > read cycle timer - what was actual delay? track worst > > > case > > > } > > > > > > Run this > > > a) on aaaaaaaaan unstressed system > > > b) under stress > > > c) while a timed non-rt benchmark runs to figure out "RT" > > > overhead. > > > > I've coded a test app that uses the LatSched latency patch ( that uses > > rdtsc ). > > It basically does 1) set the current process priority to RT 2) an ioctl() > > to activate the scheduler latency sampler 3) sleep for 1-2 secs 4) ioctl() > > to stop the sampler 5) peek the sample with pid == getpid(). > > In this way i get the net RT task scheduler latency. Yes it does not get > > the real one that includes accessories kernel paths but my code does not > > affect these ones. And they add noise to the measure. > > > Seems to me that you are not testing what apps see. Internal benchmarks > are useful only for figuring out how to remove bottlenecks that > effect actual user apps - in my humble opinion of course. > The nice thing about my benchmark is that it actually tests something > useful - how well you can do periodic tasks. BTW, on RTLinux we get > under 100 microseconds on even 50Mhzx PPC860 - 17us on a 800Mhz K7. > I'd be happy to see some decent numbers in Linux, but you gotta > measure something more applied. 
I know what you're saying, but my goal now is to fix the scheduler, not the overall RT latency ( at least not the part that does not depend on the scheduler ). Just take for example your 17us on your 800MHz machine: on my dual PIII 733 MHz with an rqlen of 4 the scheduler latency ( with the std scheduler ) is about 0.9us ( the real one, not lat_ctx ). That means that the scheduler's responsibility in your 17us is about 5%, and the remaining 95% is due to "external" kernel paths. With an rqlen of 16 ( std scheduler ) the latency peaks at ~2.4us, going to ~14-15% of scheduler responsibility. I've coded this simple app : http://www.xmailserver.org/linux-patches/lnxsched.html#RtLats and I use it with cpuhog ( hi-tech software that is available at the same link ) to load the run queue. I'm going to plot the measured latency versus the runqueue length. Thanks to OSDLAB I'll have an 8-way machine to make some tests on these big SMPs. I'll also code the simple app you're proposing, but the real problem is how to load the system. The cpuhog load is a runqueue load and is "neutral", which means that it is the same on all systems. Loading the system with other kinds of loads can introduce a device-driver/hw dependency on the measure ( more or less run time with irq disabled, for example ). - Davide ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ... 2001-12-24 18:52 ` Davide Libenzi @ 2001-12-27 3:01 ` Victor Yodaiken 2001-12-27 17:41 ` Davide Libenzi 0 siblings, 1 reply; 30+ messages in thread From: Victor Yodaiken @ 2001-12-27 3:01 UTC (permalink / raw) To: Davide Libenzi Cc: Victor Yodaiken, Mike Kravetz, Momchil Velikov, george anzinger, lkml On Mon, Dec 24, 2001 at 10:52:46AM -0800, Davide Libenzi wrote: > I know what you're saying but my goal now is to fix the scheduler not the > overall RT latency ( at least not the one that does not depend on the my bias is to fix the cause of the problem, but go ahead. > scheduler ). Just take for example your 17us for your 800MHz machine, in > my dual PIII 733 MHz with an rqlen of 4 the scheduler latency ( with that > std scheduler ) is about 0.9us ( real one, not lat_ctx ). That means the > the scheduler responsibility in your 17us is about 5%, and the remaining > 95% is due "external" kernel paths. With an rqlen of 16 ( std scheduler ) No: we've measured. The time in our system, which does not follow any Linux kernel paths, is dominated by motherboard bus delays. > the latency peaks up to ~2.4us going to ~14-15% of scheduler responsibility. > I've coded this simple app : > > http://www.xmailserver.org/linux-patches/lnxsched.html#RtLats > > and i use it with the cpuhog ( hi-tech software that is available inside > the same link ) to load the run queue. I'm going to plot the measured > latency versus the runqueue length. Thanks to OSDLAB i'll have an 8 way > machine to make some test on these big SMPs. I'll code even the simple > app you're proposing but the real problem is how to load the system. The > cpuhog load is a runqueue load and is "neutral", that means that is the > same on all the systems. Loading the system with other kind of loads can > introduce a device-driver/hw dependency on the measure ( much or less run > time with irq disabled for example ). 
Try ping -f localhost& ping -f onsamelocalnet & dd if=/dev/hda1 of=/dev/null & make clean; make bzImage; as a simple start > > > > > > - Davide > > > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ... 2001-12-27 3:01 ` Victor Yodaiken @ 2001-12-27 17:41 ` Davide Libenzi 2001-12-28 0:05 ` Victor Yodaiken 0 siblings, 1 reply; 30+ messages in thread From: Davide Libenzi @ 2001-12-27 17:41 UTC (permalink / raw) To: Victor Yodaiken; +Cc: Mike Kravetz, Momchil Velikov, george anzinger, lkml On Wed, 26 Dec 2001, Victor Yodaiken wrote: > On Mon, Dec 24, 2001 at 10:52:46AM -0800, Davide Libenzi wrote: > > I know what you're saying but my goal now is to fix the scheduler not the > > overall RT latency ( at least not the one that does not depend on the > > my bias is to fix the cause of the problem, but go ahead. > > > > scheduler ). Just take for example your 17us for your 800MHz machine, in > > my dual PIII 733 MHz with an rqlen of 4 the scheduler latency ( with that > > std scheduler ) is about 0.9us ( real one, not lat_ctx ). That means the > > the scheduler responsibility in your 17us is about 5%, and the remaining > > 95% is due "external" kernel paths. With an rqlen of 16 ( std scheduler ) > > No: we've measured. The time in our system, which does not follow any > Linux kernel paths, is dominated by motherboard bus delays. 17us of bus delay ?! UP or SMP ? Under which kind of bus load ? > > the latency peaks up to ~2.4us going to ~14-15% of scheduler responsibility. > > I've coded this simple app : > > > > http://www.xmailserver.org/linux-patches/lnxsched.html#RtLats > > > > and i use it with the cpuhog ( hi-tech software that is available inside > > the same link ) to load the run queue. I'm going to plot the measured > > latency versus the runqueue length. Thanks to OSDLAB i'll have an 8 way > > machine to make some test on these big SMPs. I'll code even the simple > > app you're proposing but the real problem is how to load the system. The > > cpuhog load is a runqueue load and is "neutral", that means that is the > > same on all the systems. 
> > Loading the system with other kind of loads can
> > introduce a device-driver/hw dependency on the measure ( much or less run
> > time with irq disabled for example ).
>
> Try
> ping -f localhost&
> ping -f onsamelocalnet &
> dd if=/dev/hda1 of=/dev/null &
> make clean; make bzImage;
>
> as a simple start

Below is the skeleton of a test app, but I need a high-res timer patch to sleep 2-5 ms

- Davide

/*
 * rtttest by Davide Libenzi ( linux kernel scheduler rt latency sampler )
 * Version 0.16 - Copyright (C) 2001 Davide Libenzi
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 *
 * Davide Libenzi <davidel@xmailserver.org>
 *
 *
 * The purpose of this tool is to measure the scheduler latency for
 * real time tasks using the "latsched" kernel patch.
 * Build:
 *
 * gcc -o rtttest rtttest.c -lrt
 *
 * Use:
 *
 * rtttest [--test-stime s] [--sleep-mstime ms] [--pause-mstime ms] [--priority p]
 *	[--sched-fifo] [--sched-rr] [-- cmdpath [arg] ...]
 *
 * --test-stime   = Set the test time in seconds
 * --sleep-mstime = Set the sleep time in milliseconds
 * --pause-mstime = Set the pause time in milliseconds
 * --priority     = Set the real time task priority ( 1..99 )
 * --sched-fifo   = Set the real time task policy to FIFO
 * --sched-rr     = Set the real time task policy to RR
 * --             = Separate the optional command to be executed during the test time
 * cmdpath        = Command to be executed
 * arg            = Command arguments
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <time.h>
#include <sched.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/timex.h>

#define STD_SLEEP_TIME 4
#define PAUSE_SLEEP_TIME 200
#define STD_TEST_TIME 8

static volatile int stop_test = 0;

void sig_int(int sig)
{
	++stop_test;
	signal(sig, sig_int);
}

int main(int argc, char *argv[])
{
	int ii, icmd = 0, pausetime = PAUSE_SLEEP_TIME, testtime = STD_TEST_TIME,
		policy = SCHED_FIFO, priority = 1, sleeptime = STD_SLEEP_TIME, numsamples;
	pid_t expid = -1;
	cycles_t cys, cye, cylat = 0, mscycles;
	cycles_t *samples;
	struct sched_param sp;
	struct timespec ts1, ts2;

	for (ii = 1; ii < argc; ii++) {
		if (strcmp(argv[ii], "--test-stime") == 0) {
			if (++ii < argc)
				testtime = atoi(argv[ii]);
			continue;
		}
		if (strcmp(argv[ii], "--sleep-mstime") == 0) {
			if (++ii < argc)
				sleeptime = atoi(argv[ii]);
			continue;
		}
		if (strcmp(argv[ii], "--pause-mstime") == 0) {
			if (++ii < argc)
				pausetime = atoi(argv[ii]);
			continue;
		}
		if (strcmp(argv[ii], "--priority") == 0) {
			if (++ii < argc)
				priority = atoi(argv[ii]);
			continue;
		}
		if (strcmp(argv[ii], "--sched-fifo") == 0) {
			policy = SCHED_FIFO;
			continue;
		}
		if (strcmp(argv[ii], "--sched-rr") == 0) {
			policy = SCHED_RR;
			continue;
		}
		if (strcmp(argv[ii], "--") == 0) {
			icmd = ++ii;
			break;
		}
	}
	numsamples = (testtime * 1000) / pausetime + 1;
	if (!(samples = (cycles_t *) malloc(numsamples * sizeof(cycles_t)))) {
		perror("malloc");
		return 1;
	}
	if (icmd > 0 && icmd < argc) {
		expid = fork();
		if (expid == -1) {
			perror("fork");
			return 5;
		} else if (expid == 0) {
			setpgid(0, getpid());
			execv(argv[icmd], &argv[icmd]);
			exit(0);
		}
	}
	memset(&sp, 0, sizeof(sp));
	sp.sched_priority = priority;
	if (sched_setscheduler(0, policy, &sp)) {
		perror("sched_setscheduler");
		if (expid > 0 && kill(-expid, SIGKILL))
			perror("SIGKILL");
		return 4;
	}
	signal(SIGINT, sig_int);
	clock_getres(CLOCK_REALTIME, &ts1);
	fprintf(stderr, "timeres=%ld\n", ts1.tv_nsec / 1000);
	clock_gettime(CLOCK_REALTIME, &ts1);
	cys = get_cycles();
	sleep(1);
	clock_gettime(CLOCK_REALTIME, &ts2);
	cye = get_cycles();
	mscycles = (cye - cys) / ((ts2.tv_sec - ts1.tv_sec) * 1000 +
				  (ts2.tv_nsec - ts1.tv_nsec) / 1000000);
	for (ii = 0; ii < numsamples && !stop_test; ii++) {
		ts1.tv_sec = 0;
		ts1.tv_nsec = sleeptime * 1000000;
		cys = get_cycles();
		clock_nanosleep(CLOCK_REALTIME, 0, &ts1, &ts2);
		cye = get_cycles();
		samples[ii] = (cye - cys) / mscycles;
		if (samples[ii] > cylat)
			cylat = samples[ii];
		usleep(pausetime * 1000);
	}
	numsamples = ii;
	memset(&sp, 0, sizeof(sp));
	sp.sched_priority = 0;
	if (sched_setscheduler(0, SCHED_OTHER, &sp)) {
		perror("sched_setscheduler");
		if (expid > 0 && kill(-expid, SIGKILL))
			perror("SIGKILL");
		return 6;
	}
	if (expid > 0 && kill(-expid, SIGKILL))
		perror("SIGKILL");
	for (ii = 0; ii < numsamples; ii++) {
	}
	fprintf(stdout, "maxlat=%llu\n", cylat);
	return 0;
}

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ...
2001-12-27 17:41 ` Davide Libenzi
@ 2001-12-28 0:05 ` Victor Yodaiken
2001-12-28 0:48 ` Davide Libenzi
0 siblings, 1 reply; 30+ messages in thread
From: Victor Yodaiken @ 2001-12-28 0:05 UTC (permalink / raw)
To: Davide Libenzi
Cc: Victor Yodaiken, Mike Kravetz, Momchil Velikov, george anzinger, lkml

On Thu, Dec 27, 2001 at 09:41:33AM -0800, Davide Libenzi wrote:
> > No: we've measured. The time in our system, which does not follow any
> > Linux kernel paths, is dominated by motherboard bus delays.
>
> 17us of bus delay ?!
> UP or SMP ?
> Under which kind of bus load ?

Try
	cli
	read cycle timer
	inb from some isa port
	read cycle timer
	repeat for a while
	sti
	print worst case and weep

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ...
2001-12-28 0:05 ` Victor Yodaiken
@ 2001-12-28 0:48 ` Davide Libenzi
0 siblings, 0 replies; 30+ messages in thread
From: Davide Libenzi @ 2001-12-28 0:48 UTC (permalink / raw)
To: Victor Yodaiken; +Cc: Mike Kravetz, Momchil Velikov, george anzinger, lkml

On Thu, 27 Dec 2001, Victor Yodaiken wrote:
> On Thu, Dec 27, 2001 at 09:41:33AM -0800, Davide Libenzi wrote:
> > > No: we've measured. The time in our system, which does not follow any
> > > Linux kernel paths, is dominated by motherboard bus delays.
> >
> > 17us of bus delay ?!
> > UP or SMP ?
> > Under which kind of bus load ?
>
> Try
> 	cli
> 	read cycle timer
> 	inb from some isa port
> 	read cycle timer
> 	repeat for a while
> 	sti
> 	print worst case and weep

No need to test, i've a positive guess from ISA :)

- Davide

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ...
2001-12-20 22:25 ` george anzinger
2001-12-20 22:21 ` Momchil Velikov
@ 2001-12-20 22:36 ` Davide Libenzi
2001-12-24 0:19 ` Victor Yodaiken
1 sibling, 1 reply; 30+ messages in thread
From: Davide Libenzi @ 2001-12-20 22:36 UTC (permalink / raw)
To: george anzinger; +Cc: lkml

On Thu, 20 Dec 2001, george anzinger wrote:

> Davide Libenzi wrote:
> >
> > I'd like to have some comments about RT task implementation in an SMP
> > system, because POSIX is not clear about how the priority rules apply to
> > multiprocessor systems.
> > The Balanced Multi Queue Scheduler ( BMQS, http://www.xmailserver.org/linux-patches/mss-2.html )
> > i'm working on tries to keep the CPU schedulers as independent as
> > possible, and currently implements two kinds of RT tasks: local ones and
> > global ones.
> > Local RT tasks apply POSIX priority rules inside the local CPU, which means
> > that an RT task running on CPU0 cannot preempt another task ( be it
> > normal or RT ) on CPU1. This keeps scheduler interlocking very low
> > because of the very fast path in reschedule_idle() ( no multi lock
> > acquisition, CPU queue loops, etc...).
> > Global RT tasks, that live in a separate run queue, have the ability to
> > preempt remote CPUs, and this can lead ( in the unfortunate case that the
> > last CPU running the RT task is running another RT task ) to a higher
> > cost in reschedule_idle().
> > The check for a global RT task selection is done in a very fast way before
> > checking the local queue :
> >
> > if (!list_empty(&runqueue_head(RT_QID)))
> > goto rt_queue_select;
> > rt_queue_select_back:
> >
> > and this does not affect the scheduler latency at all.
> > On the contrary, having a separate queue for global RT tasks can
> > improve it in high run queue load cases.
> > The local/global RT task selection is done with setscheduler() with a new
> > ( or'ed ) flag SCHED_RTGLOBAL, and this means that the default is local
> > RT tasks.
> > I'd like to have comments on this before jumping to the next Scheduler
> > issue ( balancing mode ).
>
> My understanding of the POSIX standard is that the highest priority
> task(s) are to get the cpu(s) using the standard calls. If you want to
> deviate from this I think the standard allows extensions, but they IMHO
> should be requested, not the default, so I would turn your flag around
> to force LOCAL, not GLOBAL.

So, you're basically saying that for a better standard compliancy it's
better to have a global preemption policy by default, and have users
request rt task localization explicitly. It's fine for me.

- Davide

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ...
2001-12-20 22:36 ` Davide Libenzi
@ 2001-12-24 0:19 ` Victor Yodaiken
2001-12-24 1:20 ` Davide Libenzi
0 siblings, 1 reply; 30+ messages in thread
From: Victor Yodaiken @ 2001-12-24 0:19 UTC (permalink / raw)
To: Davide Libenzi; +Cc: george anzinger, lkml

On Thu, Dec 20, 2001 at 02:36:07PM -0800, Davide Libenzi wrote:
> > My understanding of the POSIX standard is that the highest priority
> > task(s) are to get the cpu(s) using the standard calls. If you want to
> > deviate from this I think the standard allows extensions, but they IMHO
> > should be requested, not the default, so I would turn your flag around
> > to force LOCAL, not GLOBAL.
>
> So, you're basically saying that for a better standard compliancy it's
> better to have a global preemption policy by default, and have users
> request rt task localization explicitly. It's fine for me.

Can you please cite the passages in the standard you have in mind?

>
> - Davide
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ...
2001-12-24 0:19 ` Victor Yodaiken
@ 2001-12-24 1:20 ` Davide Libenzi
2001-12-27 3:42 ` Victor Yodaiken
0 siblings, 1 reply; 30+ messages in thread
From: Davide Libenzi @ 2001-12-24 1:20 UTC (permalink / raw)
To: Victor Yodaiken; +Cc: george anzinger, lkml

On Sun, 23 Dec 2001, Victor Yodaiken wrote:

> On Thu, Dec 20, 2001 at 02:36:07PM -0800, Davide Libenzi wrote:
> > > My understanding of the POSIX standard is that the highest priority
> > > task(s) are to get the cpu(s) using the standard calls. If you want to
> > > deviate from this I think the standard allows extensions, but they IMHO
> > > should be requested, not the default, so I would turn your flag around
> > > to force LOCAL, not GLOBAL.
> >
> > So, you're basically saying that for a better standard compliancy it's
> > better to have a global preemption policy by default, and have users
> > request rt task localization explicitly. It's fine for me.
>
> Can you please cite the passages in the standard you have in mind?

POSIX 1003. The doubt was whether ( since the POSIX standard does not talk
about SMP ) the real time priorities apply to the CPU or to the entire
system. This is because the scheduler i'm working on has two kinds of RT
tasks, local and global ones. Local RT tasks cannot preempt remote CPUs
so if, for example, one RT task is woken up and its last CPU is running
another RT task with higher priority, the freshly woken task will wait
even if other CPUs are running tasks with lower priority. A global RT
task will force remote preemption in case the last CPU that ran the
woken-up RT task is running another higher priority RT task. Global RT
tasks have their own queue and lock, like CPUs. My old default was local
RT tasks, with global behavior forced by a setscheduler() flag
SCHED_RTGLOBAL, while George suggested that it's better to default to
global and to have local behavior forced by a SCHED_RTLOCAL flag. I
already changed the code to default to global.

- Davide

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ...
2001-12-24 1:20 ` Davide Libenzi
@ 2001-12-27 3:42 ` Victor Yodaiken
2001-12-27 17:48 ` Davide Libenzi
0 siblings, 1 reply; 30+ messages in thread
From: Victor Yodaiken @ 2001-12-27 3:42 UTC (permalink / raw)
To: Davide Libenzi; +Cc: Victor Yodaiken, george anzinger, lkml

On Sun, Dec 23, 2001 at 05:20:26PM -0800, Davide Libenzi wrote:
> On Sun, 23 Dec 2001, Victor Yodaiken wrote:
>
> > On Thu, Dec 20, 2001 at 02:36:07PM -0800, Davide Libenzi wrote:
> > > > My understanding of the POSIX standard is that the highest priority
> > > > task(s) are to get the cpu(s) using the standard calls. If you want to
> > > > deviate from this I think the standard allows extensions, but they IMHO
> > > > should be requested, not the default, so I would turn your flag around
> > > > to force LOCAL, not GLOBAL.
> > >
> > > So, you're basically saying that for a better standard compliancy it's
> > > better to have a global preemption policy by default, and have users
> > > request rt task localization explicitly. It's fine for me.
> >
> > Can you please cite the passages in the standard you have in mind?
>
> POSIX 1003. The doubt was whether ( since the POSIX standard does not talk
> about SMP ) the real time priorities apply to the CPU or to the entire system.

Right, that was my question. George says, in your words, "for better
standards compliancy ..." and I want to know why you guys think that.

> This is because the scheduler i'm working on has two kinds of RT tasks,
> local and global ones. Local RT tasks cannot preempt remote CPUs so if,
> for example, one RT task is woken up and its last CPU is running another
> RT task with higher priority, the freshly woken task will wait even if
> other CPUs are running tasks with lower priority. A global RT task will
> force remote preemption in case the last CPU that ran the woken-up RT
> task is running another higher priority RT task. Global RT tasks have
> their own queue and lock, like CPUs. My old default was local RT tasks,
> with global behavior forced by a setscheduler() flag SCHED_RTGLOBAL,
> while George suggested that it's better to default to global and to have
> local behavior forced by a SCHED_RTLOCAL flag. I already changed the
> code to default to global.
>
> - Davide
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ...
2001-12-27 3:42 ` Victor Yodaiken
@ 2001-12-27 17:48 ` Davide Libenzi
0 siblings, 0 replies; 30+ messages in thread
From: Davide Libenzi @ 2001-12-27 17:48 UTC (permalink / raw)
To: Victor Yodaiken; +Cc: george anzinger, lkml

On Wed, 26 Dec 2001, Victor Yodaiken wrote:

> On Sun, Dec 23, 2001 at 05:20:26PM -0800, Davide Libenzi wrote:
> > On Sun, 23 Dec 2001, Victor Yodaiken wrote:
> >
> > > On Thu, Dec 20, 2001 at 02:36:07PM -0800, Davide Libenzi wrote:
> > > > > My understanding of the POSIX standard is that the highest priority
> > > > > task(s) are to get the cpu(s) using the standard calls. If you want to
> > > > > deviate from this I think the standard allows extensions, but they IMHO
> > > > > should be requested, not the default, so I would turn your flag around
> > > > > to force LOCAL, not GLOBAL.
> > > >
> > > > So, you're basically saying that for a better standard compliancy it's
> > > > better to have a global preemption policy by default, and have users
> > > > request rt task localization explicitly. It's fine for me.
> > >
> > > Can you please cite the passages in the standard you have in mind?
> >
> > POSIX 1003. The doubt was whether ( since the POSIX standard does not talk
> > about SMP ) the real time priorities apply to the CPU or to the entire system.
>
> Right, that was my question. George says, in your words, "for better
> standards compliancy ..." and I want to know why you guys think that.

The thought was that if someone needs RT tasks he probably needs a very
low latency, and so the idea was that applying global preemption
decisions would lead to a better compliancy. But i'll be happy to hear
that this is false anyway ...

- Davide

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ...
@ 2001-12-28 9:45 Martin Knoblauch
2001-12-29 9:12 ` george anzinger
0 siblings, 1 reply; 30+ messages in thread
From: Martin Knoblauch @ 2001-12-28 9:45 UTC (permalink / raw)
To: linux-kernel
> Re: [RFC] Scheduler issue 1, RT tasks ...
>
> >
> > Right, that was my question. George says, in your words, "for better
>
> > standards compliancy ..." and I want to know why you guys think
> that.
>
> The thought was that if someone needs RT tasks he probably needs a very
> low
> latency and so the idea that by applying global preemption decisions
> would
> lead to a better compliancy. But i'll be happy to hear that this is
> false
> anyway ...
>
without wanting to start a RT flame-fest, what do people really want
when they talk about RT in this [Linux] context:
- very low latency
- deterministic latency ("never to exceed")
- both
- something completely different
Thanks
Martin
--
+-----------------------------------------------------+
|Martin Knoblauch |
|-----------------------------------------------------|
|http://www.knobisoft.de/cats |
|-----------------------------------------------------|
|e-mail: knobi@knobisoft.de |
+-----------------------------------------------------+
^ permalink raw reply [flat|nested] 30+ messages in thread

* Re: [RFC] Scheduler issue 1, RT tasks ...
2001-12-28 9:45 Martin Knoblauch
@ 2001-12-29 9:12 ` george anzinger
0 siblings, 0 replies; 30+ messages in thread
From: george anzinger @ 2001-12-29 9:12 UTC (permalink / raw)
To: knobi; +Cc: linux-kernel

Martin Knoblauch wrote:
>
> > Re: [RFC] Scheduler issue 1, RT tasks ...
> >
> > >
> > > Right, that was my question. George says, in your words, "for better
> >
> > > standards compliancy ..." and I want to know why you guys think
> > that.
> >
> > The thought was that if someone needs RT tasks he probably needs a very
> > low
> > latency and so the idea that by applying global preemption decisions
> > would
> > lead to a better compliancy. But i'll be happy to hear that this is
> > false
> > anyway ...
> >
> without wanting to start a RT flame-fest, what do people really want
> when they talk about RT in this [Linux] context:
>
> - very low latency
> - deterministic latency ("never to exceed")
> - both
> - something completely different
>

All of the above from time to time and user to user. That is, some
folks want one or more of the above, some folks want more, some less.
What is really up? Well, they have a job to do that requires certain
things. Different jobs require different capabilities. It is hard to
say that any given system will do a reasonably complex job without
testing. For example, we may have the required latency but find the
system fails because, to get the latency, we preempted another task that
was (and so still is) in the middle of updating something we need to
complete the job.

On the other hand, some things clearly are in the way of doing some
real time tasks in a timely fashion. Among these things are long
context switch latency, high kernel overhead, and low resolution
timekeeping / alarms. So we talk (argue? posture?) most about these.
At the same time all the other bullet items of *nix systems are, at
least sometimes, important.

Why Linux? The same reasons it is used anywhere else. Among these
reasons is the desire to have to know and support only one system.
Thus the drive to extend it to the more responsive end of the spectrum
without losing other capabilities.

And, of course, the standards issue is in here. Standards compliance
is important from an investment point of view. It allows the user to
move his costly (far more than the hardware) software investment from
one kernel / system to another with little or no rework.

--
George
george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ...
@ 2001-12-29 19:02 Dieter Nützel
2001-12-29 21:00 ` Andrew Morton
2001-12-29 22:24 ` Davide Libenzi
0 siblings, 2 replies; 30+ messages in thread
From: Dieter Nützel @ 2001-12-29 19:02 UTC (permalink / raw)
To: george anzinger
Cc: Martin Knoblauch, Davide Libenzi, Robert Love, Linux Kernel List
Martin Knoblauch wrote:
>
> > Re: [RFC] Scheduler issue 1, RT tasks ...
> >
> > >
> > > Right, that was my question. George says, in your words, "for better
> >
> > > standards compliancy ..." and I want to know why you guys think
> > that.
> >
> > The thought was that if someone needs RT tasks he probably needs a very
> > low latency and so the idea that by applying global preemption decisions
> > would lead to a better compliancy. But i'll be happy to hear that this is
> > false anyway ...
> >
>
> without wanting to start a RT flame-fest, what do people really want
> when they talk about RT in this [Linux] context:
>
> - very low latency
> - deterministic latency ("never to exceed")
> - both
> - something completely different
>
> All of the above from time to time and user to user. That is, some
> folks want one or more of the above, some folks want more, some less.
> What is really up? Well they have a job to do that requires certain
> things. Different jobs require different capabilities. It is hard to
> say that any given system will do a reasonably complex job without
> testing. For example we may have the required latency but find the
> system fails because, to get the latency, we preempted another task that
> was (and so still is) in the middle of updating something we need to
> complete the job.
So George what direction should I try for some tests?
2.4.17 plus your and Robert's preempt plus lock-break?
Add your high-res-timers, rtscheduler or both?
Do they apply against 2.4.17/2.4.18-pre1?
A combination of the above plus Davide's BMQS?
I ask because my MP3/Ogg-Vorbis hiccup during dbench isn't solved anyway.
Running 2.4.17 + preempt + lock-break + 10_vm-21 (AA).
Some wisdom?
Thank you for all your work and
Happy New Year
-Dieter
--
Dieter Nützel
Graduate Student, Computer Science
University of Hamburg
Department of Computer Science
@home: Dieter.Nuetzel@hamburg.de
^ permalink raw reply [flat|nested] 30+ messages in thread

* Re: [RFC] Scheduler issue 1, RT tasks ...
2001-12-29 19:02 Dieter Nützel
@ 2001-12-29 21:00 ` Andrew Morton
2001-12-29 22:24 ` Davide Libenzi
1 sibling, 0 replies; 30+ messages in thread
From: Andrew Morton @ 2001-12-29 21:00 UTC (permalink / raw)
To: Dieter Nützel; +Cc: Linux Kernel List

Dieter Nützel wrote:
>
> I ask because my MP3/Ogg-Vorbis hiccup during dbench isn't solved anyway.
> Running 2.4.17 + preempt + lock-break + 10_vm-21 (AA).
> Some wisdom?

Please test this elevator patch. I'll be putting it out more formally
in a day or two. Much more testing is needed yet, but for me, the
time to read a 16 megabyte file whilst running dbench 160 falls from
three minutes thirty seconds to seven seconds. (This is a VM thing,
not an elevator thing).

--- linux-2.4.18-pre1/drivers/block/elevator.c	Thu Jul 19 20:59:41 2001
+++ linux-akpm/drivers/block/elevator.c	Sat Dec 29 00:52:05 2001
@@ -82,6 +82,7 @@ int elevator_linus_merge(request_queue_t
 {
 	struct list_head *entry = &q->queue_head;
 	unsigned int count = bh->b_size >> 9, ret = ELEVATOR_NO_MERGE;
+	const int max_bomb_segments = q->elevator.max_bomb_segments;
 
 	while ((entry = entry->prev) != head) {
 		struct request *__rq = blkdev_entry_to_request(entry);
@@ -116,6 +117,56 @@ int elevator_linus_merge(request_queue_t
 		}
 	}
 
+	/*
+	 * If we failed to merge a read anywhere in the request
+	 * queue, we really don't want to place it at the end
+	 * of the list, behind lots of writes.  So place it near
+	 * the front.
+	 *
+	 * We don't want to place it in front of _all_ writes: that
+	 * would create lots of seeking, and isn't tunable.
+	 * We try to avoid promoting this read in front of existing
+	 * reads.
+	 *
+	 * max_bomb_sectors becomes the maximum number of write
+	 * requests which we allow to remain in place in front of
+	 * a newly introduced read.  We weight things a little bit,
+	 * so large writes are more expensive than small ones, but it's
+	 * requests which count, not sectors.
+	 */
+	if (max_bomb_segments && rw == READ && ret == ELEVATOR_NO_MERGE) {
+		int cur_latency = 0;
+		struct request * const cur_request = *req;
+
+		entry = head->next;
+		while (entry != &q->queue_head) {
+			struct request *__rq;
+
+			if (entry == &q->queue_head)
+				BUG();
+			if (entry == q->queue_head.next &&
+			    q->head_active && !q->plugged)
+				BUG();
+			__rq = blkdev_entry_to_request(entry);
+
+			if (__rq == cur_request) {
+				/*
+				 * This is where the old algorithm placed it.
+				 * There's no point pushing it further back,
+				 * so leave it here, in sorted order.
+				 */
+				break;
+			}
+			if (__rq->cmd == WRITE) {
+				cur_latency += 1 + __rq->nr_sectors / 64;
+				if (cur_latency >= max_bomb_segments) {
+					*req = __rq;
+					break;
+				}
+			}
+			entry = entry->next;
+		}
+	}
 	return ret;
 }
@@ -188,7 +239,7 @@ int blkelvget_ioctl(elevator_t * elevato
 	output.queue_ID			= elevator->queue_ID;
 	output.read_latency		= elevator->read_latency;
 	output.write_latency		= elevator->write_latency;
-	output.max_bomb_segments	= 0;
+	output.max_bomb_segments	= elevator->max_bomb_segments;
 
 	if (copy_to_user(arg, &output, sizeof(blkelv_ioctl_arg_t)))
 		return -EFAULT;
@@ -207,9 +258,12 @@ int blkelvset_ioctl(elevator_t * elevato
 		return -EINVAL;
 	if (input.write_latency < 0)
 		return -EINVAL;
+	if (input.max_bomb_segments < 0)
+		return -EINVAL;
 
 	elevator->read_latency		= input.read_latency;
 	elevator->write_latency		= input.write_latency;
+	elevator->max_bomb_segments	= input.max_bomb_segments;
 	return 0;
 }
--- linux-2.4.18-pre1/include/linux/elevator.h	Thu Feb 15 16:58:34 2001
+++ linux-akpm/include/linux/elevator.h	Sat Dec 29 12:57:33 2001
@@ -3,10 +3,11 @@
 
 typedef void (elevator_fn) (struct request *, elevator_t *,
 			    struct list_head *,
-			    struct list_head *, int);
+			    struct list_head *);
 
-typedef int (elevator_merge_fn) (request_queue_t *, struct request **, struct list_head *,
-				 struct buffer_head *, int, int);
+typedef int (elevator_merge_fn)(request_queue_t *, struct request **,
+			struct list_head *, struct buffer_head *bh,
+			int rw, int max_sectors);
 
 typedef void (elevator_merge_cleanup_fn) (request_queue_t *, struct request *, int);
@@ -16,6 +17,7 @@ struct elevator_s
 {
 	int read_latency;
 	int write_latency;
+	int max_bomb_segments;
 
 	elevator_merge_fn *elevator_merge_fn;
 	elevator_merge_cleanup_fn *elevator_merge_cleanup_fn;
@@ -24,13 +26,13 @@ struct elevator_s
 	unsigned int queue_ID;
 };
 
-int elevator_noop_merge(request_queue_t *, struct request **, struct list_head *, struct buffer_head *, int, int);
-void elevator_noop_merge_cleanup(request_queue_t *, struct request *, int);
-void elevator_noop_merge_req(struct request *, struct request *);
-
-int elevator_linus_merge(request_queue_t *, struct request **, struct list_head *, struct buffer_head *, int, int);
-void elevator_linus_merge_cleanup(request_queue_t *, struct request *, int);
-void elevator_linus_merge_req(struct request *, struct request *);
+elevator_merge_fn elevator_noop_merge;
+elevator_merge_cleanup_fn elevator_noop_merge_cleanup;
+elevator_merge_req_fn elevator_noop_merge_req;
+
+elevator_merge_fn elevator_linus_merge;
+elevator_merge_cleanup_fn elevator_linus_merge_cleanup;
+elevator_merge_req_fn elevator_linus_merge_req;
 
 typedef struct blkelv_ioctl_arg_s {
 	int queue_ID;
@@ -54,22 +56,6 @@ extern void elevator_init(elevator_t *,
 #define ELEVATOR_FRONT_MERGE	1
 #define ELEVATOR_BACK_MERGE	2
 
-/*
- * This is used in the elevator algorithm.  We don't prioritise reads
- * over writes any more --- although reads are more time-critical than
- * writes, by treating them equally we increase filesystem throughput.
- * This turns out to give better overall performance.  -- sct
- */
-#define IN_ORDER(s1,s2) \
-	((((s1)->rq_dev == (s2)->rq_dev && \
-	   (s1)->sector < (s2)->sector)) || \
-	 (s1)->rq_dev < (s2)->rq_dev)
-
-#define BHRQ_IN_ORDER(bh, rq) \
-	((((bh)->b_rdev == (rq)->rq_dev && \
-	   (bh)->b_rsector < (rq)->sector)) || \
-	 (bh)->b_rdev < (rq)->rq_dev)
-
 static inline int elevator_request_latency(elevator_t * elevator, int rw)
 {
 	int latency;
@@ -85,7 +71,7 @@ static inline int elevator_request_laten
 	((elevator_t) {				\
 		0,				/* read_latency */	\
 		0,				/* write_latency */	\
-						\
+		0,				/* max_bomb_segments */	\
 		elevator_noop_merge,		/* elevator_merge_fn */	\
 		elevator_noop_merge_cleanup,	/* elevator_merge_cleanup_fn */	\
 		elevator_noop_merge_req,	/* elevator_merge_req_fn */	\
@@ -95,7 +81,7 @@ static inline int elevator_request_laten
 	((elevator_t) {				\
 		8192,				/* read passovers */	\
 		16384,				/* write passovers */	\
-						\
+		6,				/* max_bomb_segments */	\
 		elevator_linus_merge,		/* elevator_merge_fn */	\
 		elevator_linus_merge_cleanup,	/* elevator_merge_cleanup_fn */	\
 		elevator_linus_merge_req,	/* elevator_merge_req_fn */	\

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ...
2001-12-29 19:02 Dieter Nützel
2001-12-29 21:00 ` Andrew Morton
@ 2001-12-29 22:24 ` Davide Libenzi
1 sibling, 0 replies; 30+ messages in thread
From: Davide Libenzi @ 2001-12-29 22:24 UTC (permalink / raw)
To: Dieter Nützel
Cc: george anzinger, Martin Knoblauch, Robert Love, Linux Kernel List

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: TEXT/PLAIN; charset=X-UNKNOWN, Size: 2212 bytes --]

On Sat, 29 Dec 2001, Dieter [iso-8859-15] Nützel wrote:

> Martin Knoblauch wrote:
> >
> > > Re: [RFC] Scheduler issue 1, RT tasks ...
> > >
> > > >
> > > > Right, that was my question. George says, in your words, "for better
> > >
> > > > standards compliancy ..." and I want to know why you guys think
> > > that.
> > >
> > > The thought was that if someone needs RT tasks he probably needs a very
> > > low latency and so the idea that by applying global preemption decisions
> > > would lead to a better compliancy. But i'll be happy to hear that this is
> > > false anyway ...
> > >
> >
> > without wanting to start a RT flame-fest, what do people really want
> > when they talk about RT in this [Linux] context:
> >
> > - very low latency
> > - deterministic latency ("never to exceed")
> > - both
> > - something completely different
> >
> > All of the above from time to time and user to user. That is, some
> > folks want one or more of the above, some folks want more, some less.
> > What is really up? Well they have a job to do that requires certain
> > things. Different jobs require different capabilities. It is hard to
> > say that any given system will do a reasonably complex job without
> > testing. For example we may have the required latency but find the
> > system fails because, to get the latency, we preempted another task that
> > was (and so still is) in the middle of updating something we need to
> > complete the job.
>
> So George what direction should I try for some tests?
> 2.4.17 plus your and Robert's preempt plus lock-break?
> Add your high-res-timers, rtscheduler or both?
> Do they apply against 2.4.17/2.4.18-pre1?
> A combination of the above plus Davide's BMQS?
>
> I ask because my MP3/Ogg-Vorbis hiccup during dbench isn't solved anyway.
> Running 2.4.17 + preempt + lock-break + 10_vm-21 (AA).
> Some wisdom?

A bad scheduler can make the latency increase, but in your case i don't
think that it could increase that much ( in percent ). By copying a huge
file around you can experience spots of 1-2 secs of machine freeze, and
this is definitely not the scheduler. The damage that a bad scheduler
can do is directly proportional to the cs anyway.

- Davide

^ permalink raw reply [flat|nested] 30+ messages in thread
[parent not found: <200112291907.LAA25639@messenger.mvista.com>]
* Re: [RFC] Scheduler issue 1, RT tasks ...
[not found] <200112291907.LAA25639@messenger.mvista.com>
@ 2001-12-30 10:01 ` george anzinger
2001-12-30 19:54 ` Dieter Nützel
0 siblings, 1 reply; 30+ messages in thread
From: george anzinger @ 2001-12-30 10:01 UTC (permalink / raw)
To: Dieter Nützel
Cc: Martin Knoblauch, Davide Libenzi, Robert Love, Linux Kernel List

Dieter Nützel wrote:
>
> Martin Knoblauch wrote:
> >
> > > Re: [RFC] Scheduler issue 1, RT tasks ...
> > >
> > > >
> > > > Right, that was my question. George says, in your words, "for better
> > >
> > > > standards compliancy ..." and I want to know why you guys think
> > > that.
> > >
> > > The thought was that if someone needs RT tasks he probably needs a very
> > > low latency and so the idea that by applying global preemption decisions
> > > would lead to a better compliancy. But i'll be happy to hear that this is
> > > false anyway ...
> > >
> >
> > without wanting to start a RT flame-fest, what do people really want
> > when they talk about RT in this [Linux] context:
> >
> > - very low latency
> > - deterministic latency ("never to exceed")
> > - both
> > - something completely different
> >
> > All of the above from time to time and user to user. That is, some
> > folks want one or more of the above, some folks want more, some less.
> > What is really up? Well they have a job to do that requires certain
> > things. Different jobs require different capabilities. It is hard to
> > say that any given system will do a reasonably complex job without
> > testing. For example we may have the required latency but find the
> > system fails because, to get the latency, we preempted another task that
> > was (and so still is) in the middle of updating something we need to
> > complete the job.
>
> So George what direction should I try for some tests?
> 2.4.17 plus your and Robert's preempt plus lock-break?
> Add your high-res-timers, rtscheduler or both?
> Do they apply against 2.4.17/2.4.18-pre1?
> A combination of the above plus Davide's BMQS?

I would guess you want preempt plus lock-break at least.

rtsched may give a small improvement if you run any real time (i.e. not
SCHED_OTHER) tasks (and the improvement should be in both real time and
non-real time preemption) but, in general, the scheduler is not anywhere
near the problem the long held locks are, so I really don't expect to
see much improvement here. If you have a lot of tasks on the system
(not active, just there) you may see the "recalculate" with the standard
scheduler, which is much improved with rtsched (it does not include
tasks not in the run list in the recalculate).

As for high-res-timers, I just put out a 2.4.13 version which should
work on 2.4.17 (there are rejects in the patch, but all in non-i386
code). I have one report, however, of asm errors which seem to depend
on the compiler (or asm) version. I will look into this and put up a
2.4.17 version early next week. Testing wise, I don't think this will
be visible because you most likely are not using POSIX timers. There
is a change in the timer list structure, but that should be in the
noise also. In short, the high-res-timers project provides new
capability, not improved performance with existing capability.

> I ask because my MP3/Ogg-Vorbis hiccup during dbench isn't solved anyway.
> Running 2.4.17 + preempt + lock-break + 10_vm-21 (AA).
> Some wisdom?

Try the preempt-stats patch and collect data during the hiccup. It
should point the finger at the problem. Let us know what you find.
Robert has been very good at fixing things like this with his
lock-break stuff, but we/he need to know who the bad guy is.

> Thank you for all your work and
> Happy New Year
>
> -Dieter
> --
> Dieter Nützel
> Graduate Student, Computer Science
>
> University of Hamburg
> Department of Computer Science
> @home: Dieter.Nuetzel@hamburg.de

--
George
george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ...
2001-12-30 10:01 ` george anzinger
@ 2001-12-30 19:54 ` Dieter Nützel
2001-12-31 13:56 ` george anzinger
0 siblings, 1 reply; 30+ messages in thread
From: Dieter Nützel @ 2001-12-30 19:54 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux Kernel List

On Sunday, 29. December 2001 21:00, you wrote:
> Dieter Nützel wrote:
> >
> > I ask because my MP3/Ogg-Vorbis hiccup during dbench isn't solved anyway.
> > Running 2.4.17 + preempt + lock-break + 10_vm-21 (AA).
> > Some wisdom?
>
> Please test this elevator patch. I'll be putting it out more formally
> in a day or two. Much more testing is needed yet, but for me, the
> time to read a 16 megabyte file whilst running dbench 160 falls from
> three minutes thirty seconds to seven seconds. (This is a VM thing,
> not an elevator thing).

Andrew or anybody else,

can you please send me a copy directly?
The version I've extracted from the list is somewhat broken.
I am not on LKML 'cause it is too much traffic for such a poor little boy
like me...;-)

Thanks,
	Dieter

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ...
  2001-12-30 19:54             ` Dieter Nützel
@ 2001-12-31 13:56               ` george anzinger
  2002-01-01 18:55                 ` Dieter Nützel
  0 siblings, 1 reply; 30+ messages in thread
From: george anzinger @ 2001-12-31 13:56 UTC (permalink / raw)
To: Dieter Nützel; +Cc: Andrew Morton, Linux Kernel List

Dieter Nützel wrote:
>
> On Sunday, 29. December 2001 21:00, you wrote:
> > Dieter Nützel wrote:
> > >
> > > I ask because my MP3/Ogg-Vorbis hiccup during dbench isn't solved anyway.
> > > Running 2.4.17 + preempt + lock-break + 10_vm-21 (AA).
> > > Some wisdom?
> >
> > Please test this elevator patch. I'll be putting it out more formally
> > in a day or two. Much more testing is needed yet, but for me, the
> > time to read a 16 megabyte file whilst running dbench 160 falls from
> > three minutes thirty seconds to seven seconds. (This is a VM thing,
> > not an elevator thing).
>
> Andrew or anybody else,
>
> can you please send me a copy directly?
> The version I've extracted from the list is somewhat broken.
> I am not on LKML 'cause it is too much traffic for such a poor little
> boy like me... ;-)

Andrew,

I think the problem is that the mailer(s) insert new lines. Is this
right, Dieter? It is certainly a problem for me. Best to mail as an
attachment.

-- 
George
george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Scheduler issue 1, RT tasks ...
  2001-12-31 13:56               ` george anzinger
@ 2002-01-01 18:55                 ` Dieter Nützel
  0 siblings, 0 replies; 30+ messages in thread
From: Dieter Nützel @ 2002-01-01 18:55 UTC (permalink / raw)
To: george anzinger
Cc: Andrew Morton, Andrea Arcangeli, Linux Kernel List, Robert Love,
    Oleg Drokin, ReiserFS List

On Monday, 31. December 2001 14:56, george anzinger wrote:
> Dieter Nützel wrote:
> > On Sunday, 29. December 2001 21:00, you wrote:
> > > Dieter Nützel wrote:
> > > > I ask because my MP3/Ogg-Vorbis hiccup during dbench isn't solved
> > > > anyway. Running 2.4.17 + preempt + lock-break + 10_vm-21 (AA).
> > > > Some wisdom?
> > >
> > > Please test this elevator patch. I'll be putting it out more formally
> > > in a day or two. Much more testing is needed yet, but for me, the
> > > time to read a 16 megabyte file whilst running dbench 160 falls from
> > > three minutes thirty seconds to seven seconds. (This is a VM thing,
> > > not an elevator thing).
> >
> > Andrew or anybody else,
> >
> > can you please send me a copy directly?
> > The version I've extracted from the list is somewhat broken.
> > I am not on LKML 'cause it is too much traffic for such a poor little
> > boy like me... ;-)
>
> Andrew,
>
> I think the problem is that the mailer(s) insert new lines. Is this
> right, Dieter? It is certainly a problem for me.

Yes.

> Best to mail as an attachment.

Yes. But I applied it by hand and got the best results I ever had!

GREAT work, Andrew!
This should go in, soon.

2.4.17
preempt-kernel-rml-2.4.17-1.patch
lock-break-rml-2.4.17-2.patch
00_nanosleep-5
10_vm-21 (Andrea)
bootmem-2.4.17-pre6
elevator-fix (Andrew)
O-inode-attrs.patch (ReiserFS)
linux-2.4.17rc2-KLMN+exp_trunc+3fixes.patch (ReiserFS)

Happy New Year and best wishes!

-Dieter

BTW Below are my first results. More to come (analysis of latency).
2.4.17-preempt + 10_vm-21 + elevator

dbench/dbench> time ./dbench 32
32 clients started
[... progress dots elided ...]
Throughput 49.7707 MB/sec (NB=62.2133 MB/sec 497.707 MBit/sec)
13.800u 51.810s 1:25.89 76.3%   0+0k 0+0io 939pf+0w

2.4.17-preempt + 10_vm-21 + elevator + MP3 playback

dbench/dbench> time ./dbench 32
32 clients started
[... progress dots elided ...]
Throughput 48.6323 MB/sec (NB=60.7904 MB/sec 486.323 MBit/sec)
14.690u 52.920s 1:27.87 76.9%   0+0k 0+0io 939pf+0w

^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2002-01-01 18:55 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-12-20 21:11 [RFC] Scheduler issue 1, RT tasks Davide Libenzi
2001-12-20 22:25 ` george anzinger
2001-12-20 22:21 ` Momchil Velikov
2001-12-20 22:57 ` Davide Libenzi
2001-12-21 17:00 ` Mike Kravetz
2001-12-21 17:19 ` Davide Libenzi
2001-12-21 17:33 ` Mike Kravetz
2001-12-21 18:29 ` Davide Libenzi
2001-12-24 0:18 ` Victor Yodaiken
2001-12-24 1:31 ` Davide Libenzi
2001-12-24 5:33 ` Victor Yodaiken
2001-12-24 18:52 ` Davide Libenzi
2001-12-27 3:01 ` Victor Yodaiken
2001-12-27 17:41 ` Davide Libenzi
2001-12-28 0:05 ` Victor Yodaiken
2001-12-28 0:48 ` Davide Libenzi
2001-12-20 22:36 ` Davide Libenzi
2001-12-24 0:19 ` Victor Yodaiken
2001-12-24 1:20 ` Davide Libenzi
2001-12-27 3:42 ` Victor Yodaiken
2001-12-27 17:48 ` Davide Libenzi
-- strict thread matches above, loose matches on Subject: below --
2001-12-28 9:45 Martin Knoblauch
2001-12-29 9:12 ` george anzinger
2001-12-29 19:02 Dieter Nützel
2001-12-29 21:00 ` Andrew Morton
2001-12-29 22:24 ` Davide Libenzi
[not found] <200112291907.LAA25639@messenger.mvista.com>
2001-12-30 10:01 ` george anzinger
2001-12-30 19:54 ` Dieter Nützel
2001-12-31 13:56 ` george anzinger
2002-01-01 18:55 ` Dieter Nützel
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox