From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932414AbWFJHWN (ORCPT ); Sat, 10 Jun 2006 03:22:13 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S932429AbWFJHWN (ORCPT ); Sat, 10 Jun 2006 03:22:13 -0400 Received: from mx3.mail.elte.hu ([157.181.1.138]:59857 "EHLO mx3.mail.elte.hu") by vger.kernel.org with ESMTP id S932414AbWFJHWN (ORCPT ); Sat, 10 Jun 2006 03:22:13 -0400 Date: Sat, 10 Jun 2006 09:21:29 +0200 From: Ingo Molnar To: Darren Hart Cc: linux-kerneL@vger.kernel.org, Thomas Gleixner Subject: Re: [RFC PATCH -rt] Priority preemption latency Message-ID: <20060610072129.GA14567@elte.hu> References: <200606091701.55185.dvhltc@us.ibm.com> <20060610064850.GA11002@elte.hu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20060610064850.GA11002@elte.hu> User-Agent: Mutt/1.4.2.1i X-ELTE-SpamScore: 0.0 X-ELTE-SpamLevel: X-ELTE-SpamCheck: no X-ELTE-SpamVersion: ELTE 2.0 X-ELTE-SpamCheck-Details: score=0.0 required=5.9 tests=AWL,BAYES_50 autolearn=no SpamAssassin version=3.0.3 0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60% [score: 0.5995] 0.0 AWL AWL: From: address is in the auto white-list X-ELTE-VirusStatus: clean Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org * Ingo Molnar wrote: > could you try the (untested) patch below, does it solve your testcase > too? find updated patch below: - fix small race: use this_rq->curr not 'current'. - optimize the case where current CPU could be preempted and do not send IPIs. - integrate the RT-overload global statistics counters into schedstats. This should make things more scalable. Ingo Index: linux-rt.q/kernel/sched.c =================================================================== --- linux-rt.q.orig/kernel/sched.c +++ linux-rt.q/kernel/sched.c @@ -292,6 +292,11 @@ struct runqueue { /* try_to_wake_up() stats */ unsigned long ttwu_cnt; unsigned long ttwu_local; + + /* RT-overload stats: */ + unsigned long rto_schedule; + unsigned long rto_wakeup; + unsigned long rto_pulled; #endif }; @@ -426,7 +431,7 @@ static inline void task_rq_unlock(runque * bump this up when changing the output format or the meaning of an existing * format, so that tools can adapt (or abort) */ -#define SCHEDSTAT_VERSION 12 +#define SCHEDSTAT_VERSION 13 static int show_schedstat(struct seq_file *seq, void *v) { @@ -443,13 +448,14 @@ static int show_schedstat(struct seq_fil /* runqueue-specific stats */ seq_printf(seq, - "cpu%d %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu", + "cpu%d %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu", cpu, rq->yld_both_empty, rq->yld_act_empty, rq->yld_exp_empty, rq->yld_cnt, rq->sched_switch, rq->sched_cnt, rq->sched_goidle, rq->ttwu_cnt, rq->ttwu_local, rq->rq_sched_info.cpu_time, - rq->rq_sched_info.run_delay, rq->rq_sched_info.pcnt); + rq->rq_sched_info.run_delay, rq->rq_sched_info.pcnt, + rq->rto_schedule, rq->rto_wakeup, rq->rto_pulled); seq_printf(seq, "\n"); @@ -640,9 +646,7 @@ static inline void sched_info_switch(tas #define sched_info_switch(t, next) do { } while (0) #endif /* CONFIG_SCHEDSTATS */ -int rt_overload_schedule, rt_overload_wakeup, rt_overload_pulled; - -__cacheline_aligned_in_smp atomic_t rt_overload; +static __cacheline_aligned_in_smp atomic_t rt_overload; static inline void inc_rt_tasks(task_t *p, runqueue_t *rq) { @@ -1312,7 +1316,7 @@ static void balance_rt_tasks(runqueue_t if (p && (p->prio < next->prio)) { WARN_ON(p == src_rq->curr); WARN_ON(!p->array); - rt_overload_pulled++; + schedstat_inc(this_rq, rto_pulled); set_task_cpu(p, this_cpu); @@ -1469,9 +1473,9 @@ static inline int wake_idle(int cpu, tas static int try_to_wake_up(task_t *p, unsigned int state, int sync, int mutex) { int cpu, this_cpu, success = 0; + runqueue_t *this_rq, *rq; unsigned long flags; long old_state; - runqueue_t *rq; #ifdef CONFIG_SMP unsigned long load, this_load; struct sched_domain *sd, *this_sd = NULL; @@ -1587,12 +1591,34 @@ out_set_cpu: } else { /* * If a newly woken up RT task cannot preempt the - * current (RT) task then try to find another - * CPU it can preempt: + * current (RT) task (on a target runqueue) then try + * to find another CPU it can preempt: */ if (rt_task(p) && !TASK_PREEMPTS_CURR(p, rq)) { - smp_send_reschedule_allbutself(); - rt_overload_wakeup++; + this_rq = cpu_rq(this_cpu); + /* + * Special-case: the task on this CPU can be + * preempted. In that case there's no need to + * trigger reschedules on other CPUs, we can + * mark the current task for reschedule. + * + * (Note that it's safe to access this_rq without + * extra locking in this particular case, because + * we are on the current CPU.) + */ + if (TASK_PREEMPTS_CURR(p, this_rq)) + set_tsk_need_resched(this_rq->curr); + else + /* + * Neither the intended target runqueue + * nor the current CPU can take this task. + * Trigger a reschedule on all other CPUs + * nevertheless, maybe one of them can take + * this task: + */ + smp_send_reschedule_allbutself(); + + schedstat_inc(this_rq, rto_wakeup); } } @@ -1951,7 +1977,7 @@ static inline void finish_task_switch(ru * then kick other CPUs, they might run it: */ if (unlikely(rt_task(current) && prev->array && rt_task(prev))) { - rt_overload_schedule++; + schedstat_inc(rq, rto_schedule); smp_send_reschedule_allbutself(); } #endif