* [RFC PATCH -rt] Priority preemption latency
@ 2006-06-10  0:01 Darren Hart
  2006-06-10  6:48 ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Darren Hart @ 2006-06-10  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Thomas Gleixner

We have run into a situation where a lower priority RT task will run after
a higher priority RT task is awoken.  We are running the test case
available here:

  http://linux.dvhart.com/tests/prio-preempt/

For all required files, including librt.h, download the tarball here:

  http://linux.dvhart.com/tests/tests.tar.bz2

The test case returns non-zero on failure, so just run it in a loop,
exiting on failure.  The following is a patch by Mike Kravetz that seems
to resolve the problem.  Thoughts, comments?

Thanks,

--
Darren Hart
IBM Linux Technology Center

I've been looking into the priority preemption issues with [the
prio-preempt test case].  My 'theory' is that when RT tasks are awakened
they are not always put on the 'best' runqueue.  As a result, the
rt_overload functionality has to be engaged to get the task to a CPU it
can run on.  This 'delay' in getting the task on a CPU is the 'bug'
exposed by the testcase, as a lower priority task is allowed to run during
this delay.

In the testcase, awakened worker tasks should all run on the same CPU, as
the others are busy running other higher priority tasks.  But, from the
scheduler's point of view, these other CPUs might be a better place for
the awakened task because they are 'less loaded'.

My quick and dirty patch (below) is to the try_to_wake_up path.  When
awakening an RT task, don't send it to a remote CPU (determined to be less
loaded) unless it can preempt the task running on the remote CPU.  In such
cases the task is added to the current CPU's runqueue.  I've been
successfully running the C testcase for a while with this patch applied.

I'm not sure if there is a 'complete' solution to this problem without a
redesign of 'global' RT scheduling on SMP.
The rt_overload mechanism does not guarantee strict priority ordering (as
evidenced by this bug).  Perhaps the best solution we can hope for with
the current mechanism is to make the scheduler be smarter with RT task
placement on SMP.  This would at least minimize the need for rt_overload.

--
Mike

diff -Naupr linux-rayrt12.1-r357/kernel/sched.c linux-rayrt12.1-r357.work/kernel/sched.c
--- linux-rayrt12.1-r357/kernel/sched.c	2006-05-27 00:43:35.000000000 +0000
+++ linux-rayrt12.1-r357.work/kernel/sched.c	2006-06-08 22:41:20.000000000 +0000
@@ -1543,6 +1543,17 @@ static int try_to_wake_up(task_t *p, uns
 		}
 	}
 
+	/*
+	 * XXX Don't send RT task elsewhere unless it can preempt current
+	 * XXX on other CPU.  Better yet would be for awakened RT tasks to
+	 * XXX examine this (and all other) CPU(s) to see what is the best
+	 * XXX fit.  For example there is no check here to see if the
+	 * XXX currently running task can be preempted (which would be the
+	 * XXX ideal case).
+	 */
+	if (rt_task(p) && !TASK_PREEMPTS_CURR(p, rq))
+		goto out_set_cpu;
+
 	new_cpu = cpu; /* Could not wake to this_cpu.  Wake to cpu instead */
 
 out_set_cpu:
 	new_cpu = wake_idle(new_cpu, p);

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [RFC PATCH -rt] Priority preemption latency
  2006-06-10  0:01 [RFC PATCH -rt] Priority preemption latency Darren Hart
@ 2006-06-10  6:48 ` Ingo Molnar
  2006-06-10  7:21   ` Ingo Molnar
  2006-06-11  5:49   ` Darren Hart
  0 siblings, 2 replies; 15+ messages in thread
From: Ingo Molnar @ 2006-06-10  6:48 UTC (permalink / raw)
  To: Darren Hart; +Cc: linux-kernel, Thomas Gleixner

* Darren Hart <dvhltc@us.ibm.com> wrote:

> @@ -1543,6 +1543,17 @@ static int try_to_wake_up(task_t *p, uns
> 		}
> 	}
>
> +	/*
> +	 * XXX Don't send RT task elsewhere unless it can preempt current
> +	 * XXX on other CPU.  Better yet would be for awakened RT tasks to
> +	 * XXX examine this (and all other) CPU(s) to see what is the best
> +	 * XXX fit.  For example there is no check here to see if the
> +	 * XXX currently running task can be preempted (which would be the
> +	 * XXX ideal case).
> +	 */
> +	if (rt_task(p) && !TASK_PREEMPTS_CURR(p, rq))
> +		goto out_set_cpu;
> +

Great testcase!  Note that we already do RT-overload wakeups further below:

	/*
	 * If a newly woken up RT task cannot preempt the
	 * current (RT) task then try to find another
	 * CPU it can preempt:
	 */
	if (rt_task(p) && !TASK_PREEMPTS_CURR(p, rq)) {
		smp_send_reschedule_allbutself();
		rt_overload_wakeup++;
	}

What I think happened is that the above logic misses the case you have
described in detail, and doesn't mark the current CPU for rescheduling.

I.e. it sends an IPI to all _other_ CPUs, including the 'wrong target' -
but it doesn't mark the current CPU for reschedule - hence if the current
CPU is the only right target we might fail to handle this task!

Could you try the (untested) patch below, does it solve your testcase too?

	Ingo

Index: linux-rt.q/kernel/sched.c
===================================================================
--- linux-rt.q.orig/kernel/sched.c
+++ linux-rt.q/kernel/sched.c
@@ -1587,11 +1587,15 @@ out_set_cpu:
 	} else {
 		/*
 		 * If a newly woken up RT task cannot preempt the
-		 * current (RT) task then try to find another
-		 * CPU it can preempt:
+		 * current (RT) task (on a target runqueue) then try
+		 * to find another CPU it can preempt:
 		 */
 		if (rt_task(p) && !TASK_PREEMPTS_CURR(p, rq)) {
 			smp_send_reschedule_allbutself();
+			/*
+			 * Trigger a reschedule check on the current CPU too:
+			 */
+			set_tsk_need_resched(current);
 			rt_overload_wakeup++;
 		}
 	}

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [RFC PATCH -rt] Priority preemption latency
  2006-06-10  6:48 ` Ingo Molnar
@ 2006-06-10  7:21   ` Ingo Molnar
  2006-06-11  5:49   ` Darren Hart
  1 sibling, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2006-06-10  7:21 UTC (permalink / raw)
  To: Darren Hart; +Cc: linux-kernel, Thomas Gleixner

* Ingo Molnar <mingo@elte.hu> wrote:

> could you try the (untested) patch below, does it solve your testcase
> too?

Find updated patch below:

 - fix small race: use this_rq->curr, not 'current'.

 - optimize the case where the current CPU could be preempted, and do
   not send IPIs.

 - integrate the RT-overload global statistics counters into
   schedstats.  This should make things more scalable.

	Ingo

Index: linux-rt.q/kernel/sched.c
===================================================================
--- linux-rt.q.orig/kernel/sched.c
+++ linux-rt.q/kernel/sched.c
@@ -292,6 +292,11 @@ struct runqueue {
 	/* try_to_wake_up() stats */
 	unsigned long ttwu_cnt;
 	unsigned long ttwu_local;
+
+	/* RT-overload stats: */
+	unsigned long rto_schedule;
+	unsigned long rto_wakeup;
+	unsigned long rto_pulled;
 #endif
 };
 
@@ -426,7 +431,7 @@ static inline void task_rq_unlock(runque
  * bump this up when changing the output format or the meaning of an existing
  * format, so that tools can adapt (or abort)
  */
-#define SCHEDSTAT_VERSION 12
+#define SCHEDSTAT_VERSION 13
 
 static int show_schedstat(struct seq_file *seq, void *v)
 {
@@ -443,13 +448,14 @@ static int show_schedstat(struct seq_fil
 
 		/* runqueue-specific stats */
 		seq_printf(seq,
-		    "cpu%d %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu",
+		    "cpu%d %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu",
 		    cpu, rq->yld_both_empty,
 		    rq->yld_act_empty, rq->yld_exp_empty, rq->yld_cnt,
 		    rq->sched_switch, rq->sched_cnt, rq->sched_goidle,
 		    rq->ttwu_cnt, rq->ttwu_local,
 		    rq->rq_sched_info.cpu_time,
-		    rq->rq_sched_info.run_delay, rq->rq_sched_info.pcnt);
+		    rq->rq_sched_info.run_delay, rq->rq_sched_info.pcnt,
+		    rq->rto_schedule, rq->rto_wakeup, rq->rto_pulled);
 
 		seq_printf(seq, "\n");
 
@@ -640,9 +646,7 @@ static inline void sched_info_switch(tas
 #define sched_info_switch(t, next)	do { } while (0)
 #endif /* CONFIG_SCHEDSTATS */
 
-int rt_overload_schedule, rt_overload_wakeup, rt_overload_pulled;
-
-__cacheline_aligned_in_smp atomic_t rt_overload;
+static __cacheline_aligned_in_smp atomic_t rt_overload;
 
 static inline void inc_rt_tasks(task_t *p, runqueue_t *rq)
 {
@@ -1312,7 +1316,7 @@ static void balance_rt_tasks(runqueue_t
 		if (p && (p->prio < next->prio)) {
 			WARN_ON(p == src_rq->curr);
 			WARN_ON(!p->array);
-			rt_overload_pulled++;
+			schedstat_inc(this_rq, rto_pulled);
 
 			set_task_cpu(p, this_cpu);
 
@@ -1469,9 +1473,9 @@ static inline int wake_idle(int cpu, tas
 static int try_to_wake_up(task_t *p, unsigned int state, int sync, int mutex)
 {
 	int cpu, this_cpu, success = 0;
+	runqueue_t *this_rq, *rq;
 	unsigned long flags;
 	long old_state;
-	runqueue_t *rq;
 #ifdef CONFIG_SMP
 	unsigned long load, this_load;
 	struct sched_domain *sd, *this_sd = NULL;
@@ -1587,12 +1591,34 @@ out_set_cpu:
 	} else {
 		/*
 		 * If a newly woken up RT task cannot preempt the
-		 * current (RT) task then try to find another
-		 * CPU it can preempt:
+		 * current (RT) task (on a target runqueue) then try
+		 * to find another CPU it can preempt:
 		 */
 		if (rt_task(p) && !TASK_PREEMPTS_CURR(p, rq)) {
-			smp_send_reschedule_allbutself();
-			rt_overload_wakeup++;
+			this_rq = cpu_rq(this_cpu);
+			/*
+			 * Special-case: the task on this CPU can be
+			 * preempted.  In that case there's no need to
+			 * trigger reschedules on other CPUs, we can
+			 * mark the current task for reschedule.
+			 *
+			 * (Note that it's safe to access this_rq without
+			 * extra locking in this particular case, because
+			 * we are on the current CPU.)
+			 */
+			if (TASK_PREEMPTS_CURR(p, this_rq))
+				set_tsk_need_resched(this_rq->curr);
+			else
+				/*
+				 * Neither the intended target runqueue
+				 * nor the current CPU can take this task.
+				 * Trigger a reschedule on all other CPUs
+				 * nevertheless, maybe one of them can take
+				 * this task:
+				 */
+				smp_send_reschedule_allbutself();
+
+			schedstat_inc(this_rq, rto_wakeup);
 		}
 	}
 
@@ -1951,7 +1977,7 @@ static inline void finish_task_switch(ru
 	 * then kick other CPUs, they might run it:
 	 */
 	if (unlikely(rt_task(current) && prev->array && rt_task(prev))) {
-		rt_overload_schedule++;
+		schedstat_inc(rq, rto_schedule);
 		smp_send_reschedule_allbutself();
 	}
 #endif

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [RFC PATCH -rt] Priority preemption latency
  2006-06-10  6:48 ` Ingo Molnar
  2006-06-10  7:21   ` Ingo Molnar
@ 2006-06-11  5:49   ` Darren Hart
  2006-06-11  5:58     ` Ingo Molnar
  2006-06-11  6:24     ` Darren Hart
  1 sibling, 2 replies; 15+ messages in thread
From: Darren Hart @ 2006-06-11  5:49 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 2092 bytes --]

On Friday 09 June 2006 23:48, Ingo Molnar wrote:
> * Darren Hart <dvhltc@us.ibm.com> wrote:
> > @@ -1543,6 +1543,17 @@ static int try_to_wake_up(task_t *p, uns
> > 		}
> > 	}
> >
> > +	/*
> > +	 * XXX Don't send RT task elsewhere unless it can preempt current
> > +	 * XXX on other CPU.  Better yet would be for awakened RT tasks to
> > +	 * XXX examine this (and all other) CPU(s) to see what is the best
> > +	 * XXX fit.  For example there is no check here to see if the
> > +	 * XXX currently running task can be preempted (which would be the
> > +	 * XXX ideal case).
> > +	 */
> > +	if (rt_task(p) && !TASK_PREEMPTS_CURR(p, rq))
> > +		goto out_set_cpu;
> > +
>
> Great testcase!  Note that we already do RT-overload wakeups further
> below:
>
> 	/*
> 	 * If a newly woken up RT task cannot preempt the
> 	 * current (RT) task then try to find another
> 	 * CPU it can preempt:
> 	 */
> 	if (rt_task(p) && !TASK_PREEMPTS_CURR(p, rq)) {
> 		smp_send_reschedule_allbutself();
> 		rt_overload_wakeup++;
> 	}
>
> What I think happened is that the above logic misses the case you have
> described in detail, and doesn't mark the current CPU for rescheduling.
>
> I.e. it sends an IPI to all _other_ CPUs, including the 'wrong target' -
> but it doesn't mark the current CPU for reschedule - hence if the current
> CPU is the only right target we might fail to handle this task!
>
> Could you try the (untested) patch below, does it solve your testcase
> too?

Thanks for the updated patch!  It wouldn't quite build (proc_misc.c still
referenced the old rt_overload_* variables; fixup patch attached removing
those print statements).  I have it running on a 4-way Opteron box running
prio-preempt in a timed while loop, exiting only on failure.  It's been
running fine for several minutes - usually fails in under a minute.  We'll
see how it's doing in the morning :-)

--
Darren Hart
IBM Linux Technology Center
Realtime Linux Team

[-- Attachment #2: rto-proc_misc-fixup.patch --]
[-- Type: text/x-diff, Size: 1129 bytes --]

diff -Nurp linux-2.6.16/fs/proc/proc_misc.c linux-2.6.16-fixup/fs/proc/proc_misc.c
--- linux-2.6.16/fs/proc/proc_misc.c	2006-06-10 16:48:23.000000000 -0700
+++ linux-2.6.16-fixup/fs/proc/proc_misc.c	2006-06-10 17:00:55.000000000 -0700
@@ -569,19 +569,11 @@ static int show_stat(struct seq_file *p,
 	{
 		unsigned long nr_uninterruptible_cpu(int cpu);
 		extern int pi_initialized;
-		extern int rt_overload_schedule,
-			rt_overload_wakeup, rt_overload_pulled;
 		unsigned long rt_nr_running_cpu(int cpu);
 		extern atomic_t rt_overload;
 		int i;
 
-		seq_printf(p, "rt_overload_schedule: %d\n",
-			rt_overload_schedule);
-		seq_printf(p, "rt_overload_wakeup: %d\n",
-			rt_overload_wakeup);
-		seq_printf(p, "rt_overload_pulled: %d\n",
-			rt_overload_pulled);
 		seq_printf(p, "pi_init: %d\n", pi_initialized);
 
 		seq_printf(p, "nr_running(): %ld\n",
 			nr_running());
@@ -593,8 +585,6 @@ static int show_stat(struct seq_file *p,
 		for_each_cpu(i)
 			seq_printf(p, "rt_nr_running(%d): %ld\n",
 				i, rt_nr_running_cpu(i));
-		seq_printf(p, "rt_overload: %d\n", atomic_read(&rt_overload));
-
 	}
 #endif

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [RFC PATCH -rt] Priority preemption latency
  2006-06-11  5:49   ` Darren Hart
@ 2006-06-11  5:58     ` Ingo Molnar
  2006-06-11  6:37       ` Ingo Molnar
  1 sibling, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2006-06-11  5:58 UTC (permalink / raw)
  To: Darren Hart; +Cc: linux-kernel, Thomas Gleixner

* Darren Hart <dvhltc@us.ibm.com> wrote:

> Thanks for the updated patch!  It wouldn't quite build (proc_misc.c
> still referenced the old rt_overload_* variables; fixup patch attached
> removing those print statements). [...]

Doh, forgot to send those changes ...

> [...]  I have it running on a 4-way Opteron box running prio-preempt
> in a timed while loop, exiting only on failure.  It's been running
> fine for several minutes - usually fails in under a minute.  We'll see
> how it's doing in the morning :-)

Meanwhile I've released -rt3 with the fix (and the procfs change)
included.

I slept on it meanwhile, and I think the safest would be to also do the
attached patch on top of -rt3.  This makes the 'kick other CPUs' logic
totally unconditional - so whatever happens, the wakeup code will notice
if an RT task is in trouble finding the right CPU.  Under -rt3 we'd only
run into this branch if we stayed on the same CPU - but there can be
cases where we hit your scenario even in precisely such a case.  It's
rare but possible.
	Ingo

Index: linux-rt.q/kernel/sched.c
===================================================================
--- linux-rt.q.orig/kernel/sched.c
+++ linux-rt.q/kernel/sched.c
@@ -1588,38 +1588,37 @@ out_set_cpu:
 
 		this_cpu = smp_processor_id();
 		cpu = task_cpu(p);
-	} else {
+	}
+	/*
+	 * If a newly woken up RT task cannot preempt the
+	 * current (RT) task (on a target runqueue) then try
+	 * to find another CPU it can preempt:
+	 */
+	if (rt_task(p) && !TASK_PREEMPTS_CURR(p, rq)) {
+		this_rq = cpu_rq(this_cpu);
 		/*
-		 * If a newly woken up RT task cannot preempt the
-		 * current (RT) task (on a target runqueue) then try
-		 * to find another CPU it can preempt:
+		 * Special-case: the task on this CPU can be
+		 * preempted.  In that case there's no need to
+		 * trigger reschedules on other CPUs, we can
+		 * mark the current task for reschedule.
+		 *
+		 * (Note that it's safe to access this_rq without
+		 * extra locking in this particular case, because
+		 * we are on the current CPU.)
		 */
-		if (rt_task(p) && !TASK_PREEMPTS_CURR(p, rq)) {
-			this_rq = cpu_rq(this_cpu);
+		if (TASK_PREEMPTS_CURR(p, this_rq))
+			set_tsk_need_resched(this_rq->curr);
+		else
			/*
-			 * Special-case: the task on this CPU can be
-			 * preempted.  In that case there's no need to
-			 * trigger reschedules on other CPUs, we can
-			 * mark the current task for reschedule.
-			 *
-			 * (Note that it's safe to access this_rq without
-			 * extra locking in this particular case, because
-			 * we are on the current CPU.)
+			 * Neither the intended target runqueue
+			 * nor the current CPU can take this task.
+			 * Trigger a reschedule on all other CPUs
+			 * nevertheless, maybe one of them can take
+			 * this task:
			 */
-			if (TASK_PREEMPTS_CURR(p, this_rq))
-				set_tsk_need_resched(this_rq->curr);
-			else
-				/*
-				 * Neither the intended target runqueue
-				 * nor the current CPU can take this task.
-				 * Trigger a reschedule on all other CPUs
-				 * nevertheless, maybe one of them can take
-				 * this task:
-				 */
-				smp_send_reschedule_allbutself();
+			smp_send_reschedule_allbutself();
 
-			schedstat_inc(this_rq, rto_wakeup);
-		}
+		schedstat_inc(this_rq, rto_wakeup);
 	}
 
 out_activate:

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [RFC PATCH -rt] Priority preemption latency
  2006-06-11  5:58     ` Ingo Molnar
@ 2006-06-11  6:37       ` Ingo Molnar
  0 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2006-06-11  6:37 UTC (permalink / raw)
  To: Darren Hart; +Cc: linux-kernel, Thomas Gleixner

* Ingo Molnar <mingo@elte.hu> wrote:

> I slept on it meanwhile, and I think the safest would be to also do
> the attached patch on top of -rt3.  This makes the 'kick other CPUs'
> logic totally unconditional - so whatever happens, the wakeup code will
> notice if an RT task is in trouble finding the right CPU.  Under -rt3
> we'd only run into this branch if we stayed on the same CPU - but
> there can be cases where we hit your scenario even in precisely such a
> case.  It's rare but possible.

Just to elaborate on that possibility a bit more, it's this portion of
the wakeup code that could cause problems:

	new_cpu = wake_idle(new_cpu, p);

	if (new_cpu != cpu) {
		set_task_cpu(p, new_cpu);
		task_rq_unlock(rq, &flags);

		/* might preempt at this point */

		rq = task_rq_lock(p, &flags);
		old_state = p->state;
		if (!(old_state & state))

At the 'might preempt' point the kernel can go to any other CPU.  The
stock kernel does not care, because it's really rare and wakeup
placement of tasks is a statistical thing to it.  But for RT it's
important to get it right all the time, so my patch moves the
RT-overload check from the else branch to an unconditional place.  (I
doubt you can trigger it normally, but it's a possibility nevertheless.)

(I'll do a patch for the upstream scheduler to not re-enable interrupts
at this point [it's a waste of cycles], but even if we couldn't go to
another CPU, the whole scheduling scenario might change while we are
trying to acquire the runqueue lock, so it's still beneficial to have
the RT-overload check unconditional.)

	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [RFC PATCH -rt] Priority preemption latency
  2006-06-11  5:49   ` Darren Hart
  2006-06-11  5:58     ` Ingo Molnar
@ 2006-06-11  6:24     ` Darren Hart
  2006-06-11  7:36       ` Ingo Molnar
  1 sibling, 1 reply; 15+ messages in thread
From: Darren Hart @ 2006-06-11  6:24 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Thomas Gleixner

> Thanks for the updated patch!  It wouldn't quite build (proc_misc.c still
> referenced the old rt_overload_* variables; fixup patch attached removing
> those print statements).  I have it running on a 4-way Opteron box running
> prio-preempt in a timed while loop, exiting only on failure.  It's been
> running fine for several minutes - usually fails in under a minute.  We'll
> see how it's doing in the morning :-)

Well, it failed in about 14 minutes.  The machine was under heavy load
running another benchmark.  I have removed the secondary benchmark and am
running prio-preempt alone on the same 4-way Opteron box.  Will post again
when I know more...

--
Darren Hart
IBM Linux Technology Center
Realtime Linux Team

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [RFC PATCH -rt] Priority preemption latency
  2006-06-11  6:24     ` Darren Hart
@ 2006-06-11  7:36       ` Ingo Molnar
  2006-06-12 15:38         ` Darren Hart
  2006-06-12 20:12         ` Mike Kravetz
  0 siblings, 2 replies; 15+ messages in thread
From: Ingo Molnar @ 2006-06-11  7:36 UTC (permalink / raw)
  To: Darren Hart; +Cc: linux-kernel, Thomas Gleixner

* Darren Hart <dvhltc@us.ibm.com> wrote:

> > Thanks for the updated patch!  It wouldn't quite build (proc_misc.c
> > still referenced the old rt_overload_* variables; fixup patch attached
> > removing those print statements).  I have it running on a 4-way
> > Opteron box running prio-preempt in a timed while loop, exiting only
> > on failure.  It's been running fine for several minutes - usually
> > fails in under a minute.  We'll see how it's doing in the morning :-)
>
> Well, it failed in about 14 minutes.  The machine was under heavy load
> running another benchmark.  I have removed the secondary benchmark and
> am running prio-preempt alone on the same 4-way Opteron box.  Will
> post again when I know more...

OK - could you try the patch from today (re-attached below)?  Maybe that
theoretical scenario I mentioned is only theoretical in theory ;-)

	Ingo

Index: linux-rt.q/kernel/sched.c
===================================================================
--- linux-rt.q.orig/kernel/sched.c
+++ linux-rt.q/kernel/sched.c
@@ -1588,38 +1588,37 @@ out_set_cpu:
 
 		this_cpu = smp_processor_id();
 		cpu = task_cpu(p);
-	} else {
+	}
+	/*
+	 * If a newly woken up RT task cannot preempt the
+	 * current (RT) task (on a target runqueue) then try
+	 * to find another CPU it can preempt:
+	 */
+	if (rt_task(p) && !TASK_PREEMPTS_CURR(p, rq)) {
+		this_rq = cpu_rq(this_cpu);
 		/*
-		 * If a newly woken up RT task cannot preempt the
-		 * current (RT) task (on a target runqueue) then try
-		 * to find another CPU it can preempt:
+		 * Special-case: the task on this CPU can be
+		 * preempted.  In that case there's no need to
+		 * trigger reschedules on other CPUs, we can
+		 * mark the current task for reschedule.
+		 *
+		 * (Note that it's safe to access this_rq without
+		 * extra locking in this particular case, because
+		 * we are on the current CPU.)
		 */
-		if (rt_task(p) && !TASK_PREEMPTS_CURR(p, rq)) {
-			this_rq = cpu_rq(this_cpu);
+		if (TASK_PREEMPTS_CURR(p, this_rq))
+			set_tsk_need_resched(this_rq->curr);
+		else
			/*
-			 * Special-case: the task on this CPU can be
-			 * preempted.  In that case there's no need to
-			 * trigger reschedules on other CPUs, we can
-			 * mark the current task for reschedule.
-			 *
-			 * (Note that it's safe to access this_rq without
-			 * extra locking in this particular case, because
-			 * we are on the current CPU.)
+			 * Neither the intended target runqueue
+			 * nor the current CPU can take this task.
+			 * Trigger a reschedule on all other CPUs
+			 * nevertheless, maybe one of them can take
+			 * this task:
			 */
-			if (TASK_PREEMPTS_CURR(p, this_rq))
-				set_tsk_need_resched(this_rq->curr);
-			else
-				/*
-				 * Neither the intended target runqueue
-				 * nor the current CPU can take this task.
-				 * Trigger a reschedule on all other CPUs
-				 * nevertheless, maybe one of them can take
-				 * this task:
-				 */
-				smp_send_reschedule_allbutself();
+			smp_send_reschedule_allbutself();
 
-			schedstat_inc(this_rq, rto_wakeup);
-		}
+		schedstat_inc(this_rq, rto_wakeup);
 	}
 
 out_activate:

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [RFC PATCH -rt] Priority preemption latency
  2006-06-11  7:36       ` Ingo Molnar
@ 2006-06-12 15:38         ` Darren Hart
  2006-06-15 21:06           ` Mike Kravetz
  1 sibling, 1 reply; 15+ messages in thread
From: Darren Hart @ 2006-06-12 15:38 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Thomas Gleixner

On Sunday 11 June 2006 00:36, Ingo Molnar wrote:
> OK - could you try the patch from today (re-attached below)?  Maybe that
> theoretical scenario I mentioned is only theoretical in theory ;-)
>
> 	Ingo

I started running this version of the patch with prio-preempt in a loop
over 10 hours ago, and it's still running.  This seems to be the right
fix.  Thanks!

--
Darren Hart
IBM Linux Technology Center
Realtime Linux Team

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [RFC PATCH -rt] Priority preemption latency
  2006-06-12 15:38         ` Darren Hart
@ 2006-06-15 21:06           ` Mike Kravetz
  2006-06-15 22:13             ` Darren Hart
  ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Mike Kravetz @ 2006-06-15 21:06 UTC (permalink / raw)
  To: Darren Hart; +Cc: Ingo Molnar, linux-kernel, Thomas Gleixner, John Stultz

[-- Attachment #1: Type: text/plain, Size: 1026 bytes --]

On Mon, Jun 12, 2006 at 08:38:24AM -0700, Darren Hart wrote:
> I started running this version of the patch with prio-preempt in a loop
> over 10 hours ago, and it's still running.  This seems to be the right fix.

Unfortunately, this test did eventually fail in our environment.  John
Stultz added the concept of 'interrupter threads' to the testcase.  These
high priority RT interrupter threads occasionally wake up and run for a
short period of time.  Since these new threads are higher priority than
any other, they theoretically should not impact the testcase.  This is
because the primary purpose of the testcase is to ensure that lower
priority tasks do not run while higher priority tasks are waiting for a
CPU.

After adding these interrupter threads, the testcase fails (on a system
here) about 50% of the time.  An updated version of prio-preempt.c is
attached.  It needs the same headers/makefile/etc. as originally provided
by Darren.

Any help figuring out what is happening here would be appreciated.

--
Mike

[-- Attachment #2: prio-preempt.c.new --]
[-- Type: text/plain, Size: 5860 bytes --]

/*   Filename: prio-preempt.c
 *     Author: Dinakar Guniguntala <dino@us.ibm.com>
 * Description: Test priority preemption
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 *
 * Copyright (C) IBM Corporation, 2006
 *
 * 2006-Jun-01: Initial version by Dinakar Guniguntala
 *              Changes from John Stultz and Vivek Pallantla
 */

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <time.h>
#include <pthread.h>
#include <sched.h>
#include <errno.h>
#include <sys/syscall.h>
#include "librt.h"

#define NUM_WORKERS	27
#define CHECK_LIMIT	1

volatile int busy_threads = 0;
volatile int test_over = 0;
volatile int threads_running = 0;
static int rt_threads = 0;
static pthread_mutex_t bmutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mutex[NUM_WORKERS+1];
static pthread_cond_t cond[NUM_WORKERS+1];
static int t_after_wait[NUM_WORKERS];

#define ADD_INTERRUPTER_THREADS
#ifdef ADD_INTERRUPTER_THREADS
void *int_thread(void *arg)
{
	int j, a = 0;

	while (!test_over) {
		/* do some busy work */
		if (!(a % 4))
			a = a * 3;
		else if (!(a % 6))
			a = a / 2;
		else
			a++;
		usleep(10);
	}
	return (void *)a;
}
#endif

void *busy_thread(void *arg)
{
	struct sched_param sched_param;
	int policy, mypri = 0, tid;

	tid = (int)(((struct thread *)arg)->arg);

	if (pthread_getschedparam(pthread_self(), &policy, &sched_param) != 0) {
		printf("ERR: Couldn't get pthread info \n");
	} else {
		mypri = sched_param.sched_priority;
	}

	pthread_mutex_lock(&bmutex);
	busy_threads++;
	printf("Busy Thread %d(%d): Running...\n", tid, mypri);
	pthread_mutex_unlock(&bmutex);

	/* TODO: Add sched set affinity here */

	/* Busy loop */
	while (!test_over);

	printf("Busy Thread %d(%d): Exiting\n", tid, mypri);
	return NULL;
}

void *worker_thread(void *arg)
{
	struct sched_param sched_param;
	int policy, rc, mypri = 0, tid, times = 0;
	tid = (int)(((struct thread *)arg)->arg);
	nsec_t pstart, pend;

	if (pthread_getschedparam(pthread_self(), &policy, &sched_param) != 0) {
		printf("ERR: Couldn't get pthread info \n");
	} else {
		mypri = sched_param.sched_priority;
	}

	/* check in */
	pthread_mutex_lock(&bmutex);
	threads_running++;
	pthread_mutex_unlock(&bmutex);

	/* block */
	rc = pthread_mutex_lock(&mutex[tid]);
	rc = pthread_cond_wait(&cond[tid], &mutex[tid]);
	rc = pthread_mutex_unlock(&mutex[tid]);

	printf("%llu: Thread %d(%d) wakes up from sleep \n",
		rt_gettime(), tid, mypri);

	/* check if we're the last thread */
	if (tid == NUM_WORKERS-1) {
		t_after_wait[tid] = 1;
		pthread_mutex_lock(&bmutex);
		threads_running--;
		pthread_mutex_unlock(&bmutex);
		return NULL;
	}

	/* Signal next thread */
	rc = pthread_mutex_lock(&mutex[tid+1]);
	rc = pthread_cond_signal(&cond[tid+1]);
	printf("%llu: Thread %d(%d): Sent signal (%d) to (%d)\n",
		rt_gettime(), tid, mypri, rc, tid+1);
	pstart = pend = rt_gettime();
	rc = pthread_mutex_unlock(&mutex[tid+1]);

	printf("%llu: Thread %d(%d) setting it's bit \n",
		rt_gettime(), tid, mypri);
	t_after_wait[tid] = 1;

	while (t_after_wait[tid+1] != 1) {
		pend = rt_gettime();
		times++;
	}

	if (times >= CHECK_LIMIT) {
		printf("Thread %d(%d): Non-Preempt limit reached. %llu ns max latency\n",
			tid, mypri, pend-pstart);
		exit(1);
	}

	/* check out */
	pthread_mutex_lock(&bmutex);
	threads_running--;
	pthread_mutex_unlock(&bmutex);

	return NULL;
}

void *master_thread(void *arg)
{
	int i, pri_boost;

#ifdef ADD_INTERRUPTER_THREADS
	/* start interrupter threads */
	pri_boost = 90;
	for (i = 0; i < rt_threads; i++) {
		create_fifo_thread(int_thread, (void *)0,
			sched_get_priority_min(SCHED_FIFO) + pri_boost);
	}
#endif

	/* start the (N-1) busy threads */
	pri_boost = 80;
	for (i = rt_threads; i > 1; i--) {
		create_fifo_thread(busy_thread, (void *)i,
			sched_get_priority_min(SCHED_FIFO) + pri_boost);
	}

	/* make sure children are started */
	while (busy_threads < (rt_threads-1))
		sleep(1);
	printf("Busy threads created!\n");

	/* start NUM_WORKERS worker threads */
	for (i = 0, pri_boost = 10; i < NUM_WORKERS; i++, pri_boost += 2) {
		pthread_mutex_init(&mutex[i], NULL);
		pthread_cond_init(&cond[i], NULL);
		create_fifo_thread(worker_thread, (void *)i,
			sched_get_priority_min(SCHED_FIFO) + pri_boost);
	}
	printf("Worker threads created\n");

	/* Let the worker threads wait on the cond vars */
	while (threads_running < NUM_WORKERS)
		sleep(2);

	printf("Signaling first thread\n");
	pthread_mutex_lock(&mutex[0]);
	pthread_cond_signal(&cond[0]);
	pthread_mutex_unlock(&mutex[0]);

	while (threads_running)
		sleep(20);
	test_over = 1;
	return NULL;
}

int parse_args(int c, char *v)
{
	int handled = 1;

	switch (c) {
	case 'n':
		rt_threads = atoi(v);
		break;
	default:
		handled = 0;
		break;
	}
	return handled;
}

int main(int argc, char *argv[])
{
	int pri_boost;

	rt_init("n:", parse_args, argc, argv);

	if (rt_threads == 0) {
		printf("Usage: %s -n num_rt_threads\n", argv[0]);
		exit(1);
	}

	pri_boost = 81;
	create_fifo_thread(master_thread, (void *)0,
		sched_get_priority_min(SCHED_FIFO) + pri_boost);

	/* wait for threads to complete */
	join_threads();
	return 0;
}

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [RFC PATCH -rt] Priority preemption latency
  2006-06-15 21:06           ` Mike Kravetz
@ 2006-06-15 22:13             ` Darren Hart
  2006-06-15 23:05             ` Esben Nielsen
  2006-07-11 16:48             ` Mike Kravetz
  2 siblings, 0 replies; 15+ messages in thread
From: Darren Hart @ 2006-06-15 22:13 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Ingo Molnar, linux-kernel, Thomas Gleixner, John Stultz

On Thursday 15 June 2006 14:06, Mike Kravetz wrote:
> On Mon, Jun 12, 2006 at 08:38:24AM -0700, Darren Hart wrote:
> > I started running this version of the patch with prio-preempt in a loop
> > over 10 hours ago, and it's still running.  This seems to be the right
> > fix.
>
> Unfortunately, this test did eventually fail in our environment.
> John Stultz added the concept of 'interrupter threads' to the testcase.
> These high priority RT interrupter threads occasionally wake up and
> run for a short period of time.  Since these new threads are higher
> priority than any other, they theoretically should not impact the
> testcase.  This is because the primary purpose of the testcase is to
> ensure that lower priority tasks do not run while higher priority tasks
> are waiting for a CPU.
>
> After adding these interrupter threads, the testcase fails (on a system
> here) about 50% of the time.  An updated version of prio-preempt.c is
> attached.  It needs the same headers/makefile/etc. as originally
> provided by Darren.

Thanks for the update, Mike!  I have incorporated the interrupter threads
using a command line argument, -i; they are disabled by default.  Grab the
latest here:

  http://linux.dvhart.com/tests/

To run the test on a 4-way machine with interrupter threads enabled, use:

  $ ./prio-preempt -n 4 -i

> Any help figuring out what is happening here would be appreciated.

--
Darren Hart
IBM Linux Technology Center
Realtime Linux Team
Phone: 503 578 3185
T/L:   775 3185

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [RFC PATCH -rt] Priority preemption latency
  2006-06-15 21:06 ` Mike Kravetz
  2006-06-15 22:13 ` Darren Hart
@ 2006-06-15 23:05 ` Esben Nielsen
  2006-06-15 22:48 ` Mike Kravetz
  2006-07-11 16:48 ` Mike Kravetz
  2 siblings, 1 reply; 15+ messages in thread
From: Esben Nielsen @ 2006-06-15 23:05 UTC (permalink / raw)
To: Mike Kravetz
Cc: Darren Hart, Ingo Molnar, linux-kernel, Thomas Gleixner, John Stultz

On Thu, 15 Jun 2006, Mike Kravetz wrote:
> On Mon, Jun 12, 2006 at 08:38:24AM -0700, Darren Hart wrote:
>> I started running this version of the patch with prio-preempt in a loop
>> over 10 hours ago, and it's still running. This seems to be the right fix.
>
> Unfortunately, this test did eventually fail over in our environment.
> John Stultz added the concept of 'interrupter threads' to the testcase.
> These high priority RT interrupter threads occasionally wake up and
> run for a short period of time. Since these new threads are higher
> priority than any other, they theoretically should not impact the
> testcase. This is because the primary purpose of the testcase is to
> ensure that lower priority tasks do not run while higher priority tasks
> are waiting for a CPU.
>
> After adding these interrupter threads, the testcase fails (on a system
> here) about 50% of the time. An updated version of prio-preempt.c is
> attached. It needs the same headers/makefile/etc as originally provided
> by Darren.
>
> Any help figuring out what is happening here would be appreciated.

Well, I can't say I understand what your test is supposed to test. But I
do know that having printf() inside something which is supposed to be RT
is a big no-no, as it can block. What if the first printf() after
pthread_cond_wait() is blocking? The interrupter threads probably use a
lot of CPU overall, and they have priority 90. They might slow down the
output device for printf() so much that some queue becomes full and
printf() must block.
Esben

> --
> Mike
* Re: [RFC PATCH -rt] Priority preemption latency
  2006-06-15 23:05 ` Esben Nielsen
@ 2006-06-15 22:48 ` Mike Kravetz
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Kravetz @ 2006-06-15 22:48 UTC (permalink / raw)
To: Esben Nielsen
Cc: Darren Hart, Ingo Molnar, linux-kernel, Thomas Gleixner, John Stultz

On Fri, Jun 16, 2006 at 12:05:21AM +0100, Esben Nielsen wrote:
> Well, I can't say I understand what your test is supposed to test. But I
> know having printf() inside something which is supposed to be RT is a big
> no-no, as it can block.

You are correct. John's modifications to the testcase actually removed
the printfs from the threads we are concerned with. I left them in
because they were part of the 'original' testcase, and I wanted to
introduce as few changes as possible. In hindsight, that was a mistake.
With the printfs commented out (in the worker threads) the testcase
still fails at about the same rate. Darren, can you comment those out
in the version on your site? An updated copy is attached.

--
Mike

/* Filename:	prio-preempt.c
 * Author:	Dinakar Guniguntala <dino@us.ibm.com>
 * Description:	Test priority preemption
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 *
 * Copyright (C) IBM Corporation, 2006
 *
 * 2006-Jun-01: Initial version by Dinakar Guniguntala
 *		Changes from John Stultz and Vivek Pallantla
 */

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <time.h>
#include <pthread.h>
#include <sched.h>
#include <errno.h>
#include <sys/syscall.h>
#include "librt.h"

#define NUM_WORKERS 27
#define CHECK_LIMIT 1

volatile int busy_threads = 0;
volatile int test_over = 0;
volatile int threads_running = 0;
static int rt_threads = 0;
static int int_threads = 0;
static pthread_mutex_t bmutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mutex[NUM_WORKERS+1];
static pthread_cond_t cond[NUM_WORKERS+1];
static int t_after_wait[NUM_WORKERS];

void *int_thread(void* arg)
{
	int a = 0;

	while (!test_over) {
		/* do some busy work */
		if (!(a%4))
			a = a * 3;
		else if (!(a%6))
			a = a / 2;
		else
			a++;
		usleep(10);
	}
	return (void*)a;
}

void *busy_thread(void* arg)
{
	struct sched_param sched_param;
	int policy, mypri = 0, tid;

	tid = (int) (((struct thread *) arg)->arg);
	if (pthread_getschedparam(pthread_self(), &policy, &sched_param) != 0) {
		printf("ERR: Couldn't get pthread info\n");
	} else {
		mypri = sched_param.sched_priority;
	}

	pthread_mutex_lock(&bmutex);
	busy_threads++;
	printf("Busy Thread %d(%d): Running...\n", tid, mypri);
	pthread_mutex_unlock(&bmutex);

	/* TODO: Add sched set affinity here */

	/* Busy loop */
	while (!test_over);

	printf("Busy Thread %d(%d): Exiting\n", tid, mypri);
	return NULL;
}

void *worker_thread(void* arg)
{
	struct sched_param sched_param;
	int policy, rc, mypri = 0, tid, times = 0;
	nsec_t pstart, pend;

	tid = (int) (((struct thread *) arg)->arg);
	if (pthread_getschedparam(pthread_self(), &policy, &sched_param) != 0) {
		printf("ERR: Couldn't get pthread info\n");
	} else {
		mypri = sched_param.sched_priority;
	}

	/* check in */
	pthread_mutex_lock(&bmutex);
	threads_running++;
	pthread_mutex_unlock(&bmutex);

	/* block */
	rc = pthread_mutex_lock(&mutex[tid]);
	rc = pthread_cond_wait(&cond[tid], &mutex[tid]);
	rc = pthread_mutex_unlock(&mutex[tid]);

	/* printf("%llu: Thread %d(%d) wakes up from sleep\n",
		  rt_gettime(), tid, mypri); */

	/* check if we're the last thread */
	if (tid == NUM_WORKERS-1) {
		t_after_wait[tid] = 1;
		pthread_mutex_lock(&bmutex);
		threads_running--;
		pthread_mutex_unlock(&bmutex);
		return NULL;
	}

	/* Signal next thread */
	rc = pthread_mutex_lock(&mutex[tid+1]);
	rc = pthread_cond_signal(&cond[tid+1]);
	/* printf("%llu: Thread %d(%d): Sent signal (%d) to (%d)\n",
		  rt_gettime(), tid, mypri, rc, tid+1); */
	pstart = pend = rt_gettime();
	rc = pthread_mutex_unlock(&mutex[tid+1]);

	/* printf("%llu: Thread %d(%d) setting its bit\n",
		  rt_gettime(), tid, mypri); */
	t_after_wait[tid] = 1;

	while (t_after_wait[tid+1] != 1) {
		pend = rt_gettime();
		times++;
	}

	if (times >= CHECK_LIMIT) {
		printf("Thread %d(%d): Non-Preempt limit reached. "
		       "%llu ns max latency\n", tid, mypri, pend-pstart);
		exit(1);
	}

	/* check out */
	pthread_mutex_lock(&bmutex);
	threads_running--;
	pthread_mutex_unlock(&bmutex);

	return NULL;
}

void *master_thread(void* arg)
{
	int i, pri_boost;

	/* start interrupter threads */
	if (int_threads) {
		pri_boost = 90;
		for (i = 0; i < rt_threads; i++) {
			create_fifo_thread(int_thread, (void*)0,
					   sched_get_priority_min(SCHED_FIFO) + pri_boost);
		}
	}

	/* start the (N-1) busy threads */
	pri_boost = 80;
	for (i = rt_threads; i > 1; i--) {
		create_fifo_thread(busy_thread, (void*)i,
				   sched_get_priority_min(SCHED_FIFO) + pri_boost);
	}

	/* make sure children are started */
	while (busy_threads < (rt_threads-1))
		sleep(1);
	printf("Busy threads created!\n");

	/* start NUM_WORKERS worker threads */
	for (i = 0, pri_boost = 10; i < NUM_WORKERS; i++, pri_boost += 2) {
		pthread_mutex_init(&mutex[i], NULL);
		pthread_cond_init(&cond[i], NULL);
		create_fifo_thread(worker_thread, (void*)i,
				   sched_get_priority_min(SCHED_FIFO) + pri_boost);
	}
	printf("Worker threads created\n");

	/* Let the worker threads wait on the cond vars */
	while (threads_running < NUM_WORKERS)
		sleep(2);

	printf("Signaling first thread\n");
	pthread_mutex_lock(&mutex[0]);
	pthread_cond_signal(&cond[0]);
	pthread_mutex_unlock(&mutex[0]);

	while (threads_running)
		sleep(20);
	test_over = 1;

	return NULL;
}

int parse_args(int c, char *v)
{
	int handled = 1;

	switch (c) {
	case 'n':
		rt_threads = atoi(v);
		break;
	case 'i':
		int_threads = 1;
		break;
	default:
		handled = 0;
		break;
	}
	return handled;
}

int main(int argc, char* argv[])
{
	int pri_boost;

	rt_init("n:i", parse_args, argc, argv);

	if (rt_threads == 0) {
		printf("Usage: %s -n num_rt_threads [-i]\n", argv[0]);
		exit(1);
	}

	pri_boost = 81;
	create_fifo_thread(master_thread, (void*)0,
			   sched_get_priority_min(SCHED_FIFO) + pri_boost);

	/* wait for threads to complete */
	join_threads();

	return 0;
}
* Re: [RFC PATCH -rt] Priority preemption latency
  2006-06-15 21:06 ` Mike Kravetz
  2006-06-15 22:13 ` Darren Hart
  2006-06-15 23:05 ` Esben Nielsen
@ 2006-07-11 16:48 ` Mike Kravetz
  2 siblings, 0 replies; 15+ messages in thread
From: Mike Kravetz @ 2006-07-11 16:48 UTC (permalink / raw)
To: Ingo Molnar, Darren Hart
Cc: linux-kernel, Darren Hart, Thomas Gleixner, John Stultz

Hi Ingo,

As you can see from the note below, we are still having issues with this
testcase. As a reminder, Darren's mail with a pointer to the new
testcase is at:

http://www.uwsg.indiana.edu/hypermail/linux/kernel/0606.1/0729.html

When run with the '-i' option it fails on a regular basis (approx 50% of
runs on a 4 way). Any insight into what may be happening would be
appreciated.

Thanks,
--
Mike

On Thu, Jun 15, 2006 at 02:06:58PM -0700, Mike Kravetz wrote:
> On Mon, Jun 12, 2006 at 08:38:24AM -0700, Darren Hart wrote:
> > I started running this version of the patch with prio-preempt in a loop
> > over 10 hours ago, and it's still running. This seems to be the right fix.
>
> Unfortunately, this test did eventually fail over in our environment.
> John Stultz added the concept of 'interrupter threads' to the testcase.
> These high priority RT interrupter threads occasionally wake up and
> run for a short period of time. Since these new threads are higher
> priority than any other, they theoretically should not impact the
> testcase. This is because the primary purpose of the testcase is to
> ensure that lower priority tasks do not run while higher priority tasks
> are waiting for a CPU.
>
> After adding these interrupter threads, the testcase fails (on a system
> here) about 50% of the time. An updated version of prio-preempt.c is
> attached. It needs the same headers/makefile/etc as originally provided
> by Darren.
* Re: [RFC PATCH -rt] Priority preemption latency
  2006-06-11  7:36 ` Ingo Molnar
  2006-06-12 15:38 ` Darren Hart
@ 2006-06-12 20:12 ` Mike Kravetz
  1 sibling, 0 replies; 15+ messages in thread
From: Mike Kravetz @ 2006-06-12 20:12 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Darren Hart, linux-kernel, Thomas Gleixner

On Sun, Jun 11, 2006 at 09:36:09AM +0200, Ingo Molnar wrote:
> ok - could you try the patch from today (re-attached below)? Maybe that
> theoretical scenario i mentioned is only theoretical in theory ;-)

Thanks for the patch! I just started looking at the RT scheduling code
and am trying to get a good understanding of what is happening.

In the case where the task on the current CPU cannot be preempted, we
send IPIs to all other CPUs and they end up running the
balance_rt_tasks() routine to try to schedule the task. Is that correct?
If the newly awakened task can preempt more than one currently running
RT task, is it possible for tasks to 'bounce around' a bit before we end
up running the top N priority RT tasks (where N is the number of CPUs)?

In the case where the newly awakened task can preempt the task on the
current CPU, shouldn't we also trigger reschedules on other CPUs?
Perhaps we do somewhere and I am missing it. I am thinking of the case
where the current CPU was running an RT task with lower priority than
the awakened task, but another CPU is running an even lower priority RT
task. In this case, we really want the newly awakened task to preempt
the lower priority task on the remote CPU.

Thanks,
--
Mike
end of thread, other threads: [~2006-07-11 16:47 UTC | newest]

Thread overview: 15+ messages
2006-06-10  0:01 [RFC PATCH -rt] Priority preemption latency Darren Hart
2006-06-10  6:48 ` Ingo Molnar
2006-06-10  7:21 ` Ingo Molnar
2006-06-11  5:49 ` Darren Hart
2006-06-11  5:58 ` Ingo Molnar
2006-06-11  6:37 ` Ingo Molnar
2006-06-11  6:24 ` Darren Hart
2006-06-11  7:36 ` Ingo Molnar
2006-06-12 15:38 ` Darren Hart
2006-06-15 21:06 ` Mike Kravetz
2006-06-15 22:13 ` Darren Hart
2006-06-15 23:05 ` Esben Nielsen
2006-06-15 22:48 ` Mike Kravetz
2006-07-11 16:48 ` Mike Kravetz
2006-06-12 20:12 ` Mike Kravetz