From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1762598AbXKQRk1 (ORCPT); Sat, 17 Nov 2007 12:40:27 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1755494AbXKQRkQ (ORCPT); Sat, 17 Nov 2007 12:40:16 -0500
Received: from mcclure-nat.wal.novell.com ([130.57.22.22]:51094 "EHLO
	mcclure.wal.novell.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org
	with ESMTP id S1755466AbXKQRkN (ORCPT); Sat, 17 Nov 2007 12:40:13 -0500
Message-Id: <473EE01F.BA47.005A.0@novell.com>
X-Mailer: Novell GroupWise Internet Agent 7.0.2 HP
Date: Sat, 17 Nov 2007 12:35:43 -0500
From: "Gregory Haskins"
To: "Steven Rostedt", "LKML"
Cc: "Peter Zijlstra", "Ingo Molnar", "Steven Rostedt", "Christoph Lameter"
Subject: Re: [PATCH v3 10/17] Remove some CFS specific code from the wakeup path of RT tasks
References: <20071117062104.177779113@goodmis.org> <20071117062404.672976284@goodmis.org>
In-Reply-To: <20071117062404.672976284@goodmis.org>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=__Part496F3C7F.5__="
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

--=__Part496F3C7F.5__=
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline

>>> On Sat, Nov 17, 2007 at 1:21 AM, in message
<20071117062404.672976284@goodmis.org>, Steven Rostedt wrote:
> -/*
> - * wake_idle() will wake a task on an idle cpu if task->cpu is
> - * not idle and an idle cpu is available. The span of cpus to
> - * search starts with cpus closest then further out as needed,
> - * so we always favor a closer, idle cpu.
> - *
> - * Returns the CPU we should wake onto.
> - */
> -#if defined(ARCH_HAS_SCHED_WAKE_IDLE)
> -static int wake_idle(int cpu, struct task_struct *p)
> -{
> -	cpumask_t tmp;
> -	struct sched_domain *sd;
> -	int i;
> -
> -	/*
> -	 * If it is idle, then it is the best cpu to run this task.
> -	 *
> -	 * This cpu is also the best, if it has more than one task already.
> -	 * Siblings must be also busy(in most cases) as they didn't already
> -	 * pickup the extra load from this cpu and hence we need not check
> -	 * sibling runqueue info. This will avoid the checks and cache miss
> -	 * penalities associated with that.
> -	 */
> -	if (idle_cpu(cpu) || cpu_rq(cpu)->nr_running > 1)
> -		return cpu;
> -
> -	for_each_domain(cpu, sd) {
> -		if (sd->flags & SD_WAKE_IDLE) {
> -			cpus_and(tmp, sd->span, p->cpus_allowed);
> -			for_each_cpu_mask(i, tmp) {
> -				if (idle_cpu(i)) {
> -					if (i != task_cpu(p)) {
> -						schedstat_inc(p,
> -							se.nr_wakeups_idle);
							^^^^^^^^^^^^^^^^^^

[...]

> --- linux-compile.git.orig/kernel/sched_fair.c	2007-11-16 11:16:38.000000000 -0500
> +++ linux-compile.git/kernel/sched_fair.c	2007-11-16 22:23:39.000000000 -0500
> @@ -564,6 +564,137 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
>  }
>  
>  /*
> + * wake_idle() will wake a task on an idle cpu if task->cpu is
> + * not idle and an idle cpu is available. The span of cpus to
> + * search starts with cpus closest then further out as needed,
> + * so we always favor a closer, idle cpu.
> + *
> + * Returns the CPU we should wake onto.
> + */
> +#if defined(ARCH_HAS_SCHED_WAKE_IDLE)
> +static int wake_idle(int cpu, struct task_struct *p)
> +{
> +	cpumask_t tmp;
> +	struct sched_domain *sd;
> +	int i;
> +
> +	/*
> +	 * If it is idle, then it is the best cpu to run this task.
> +	 *
> +	 * This cpu is also the best, if it has more than one task already.
> +	 * Siblings must be also busy(in most cases) as they didn't already
> +	 * pickup the extra load from this cpu and hence we need not check
> +	 * sibling runqueue info. This will avoid the checks and cache miss
> +	 * penalities associated with that.
> +	 */
> +	if (idle_cpu(cpu) || cpu_rq(cpu)->nr_running > 1)
> +		return cpu;
> +
> +	for_each_domain(cpu, sd) {
> +		if (sd->flags & SD_WAKE_IDLE) {
> +			cpus_and(tmp, sd->span, p->cpus_allowed);
> +			for_each_cpu_mask(i, tmp) {
> +				if (idle_cpu(i))
> +					return i;
					^^^^^^^^^

Looks like some stuff that was added in 24 was inadvertently lost in the
move when you merged the patches up from 23.1-rt11. The attached patch is
updated to move the new logic as well.

Regards,
-Greg

--=__Part496F3C7F.5__=
Content-Type: text/plain; name="sched-de-cfsize-rt-path.patch"
Content-Disposition: attachment; filename="sched-de-cfsize-rt-path.patch"

RT: Remove some CFS specific code from the wakeup path of RT tasks

From: Gregory Haskins

The current wake-up code path tries to determine if it can optimize the
wake-up to "this_cpu" by computing load calculations. The problem is that
these calculations are only relevant to CFS tasks, where load is king. For
RT tasks, priority is king, so the load calculation is completely wasted
bandwidth.

Therefore, we create a new sched_class interface to help with pre-wakeup
routing decisions, and move the load calculation into the CFS class.
Signed-off-by: Gregory Haskins
---

 include/linux/sched.h   |    1 
 kernel/sched.c          |  167 ++++++++--------------------------------------
 kernel/sched_fair.c     |  148 ++++++++++++++++++++++++++++++++++++++++++
 kernel/sched_idletask.c |    9 +++
 kernel/sched_rt.c       |   10 +++

 5 files changed, 195 insertions(+), 140 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e9e74de..253517b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -823,6 +823,7 @@ struct sched_class {
 	void (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
 	void (*dequeue_task) (struct rq *rq, struct task_struct *p, int sleep);
 	void (*yield_task) (struct rq *rq);
+	int (*select_task_rq)(struct task_struct *p, int sync);
 
 	void (*check_preempt_curr) (struct rq *rq, struct task_struct *p);
 
diff --git a/kernel/sched.c b/kernel/sched.c
index f089f41..bb92ec4 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -861,6 +861,13 @@ iter_move_one_task(struct rq *this_rq, int this_cpu, struct rq *busiest,
 		   struct rq_iterator *iterator);
 #endif
 
+#ifdef CONFIG_SMP
+static unsigned long source_load(int cpu, int type);
+static unsigned long target_load(int cpu, int type);
+static unsigned long cpu_avg_load_per_task(int cpu);
+static int task_hot(struct task_struct *p, u64 now, struct sched_domain *sd);
+#endif /* CONFIG_SMP */
+
 #include "sched_stats.h"
 #include "sched_idletask.c"
 #include "sched_fair.c"
@@ -1040,7 +1047,7 @@ static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
 /*
  * Is this task likely cache-hot:
  */
-static inline int
+static int
 task_hot(struct task_struct *p, u64 now, struct sched_domain *sd)
 {
 	s64 delta;
@@ -1265,7 +1272,7 @@ static unsigned long target_load(int cpu, int type)
 /*
  * Return the average load per task on the cpu's run queue
  */
-static inline unsigned long cpu_avg_load_per_task(int cpu)
+static unsigned long cpu_avg_load_per_task(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long total = weighted_cpuload(cpu);
@@ -1422,58 +1429,6 @@ static int sched_balance_self(int cpu, int flag)
 
 #endif /* CONFIG_SMP */
 
-/*
- * wake_idle() will wake a task on an idle cpu if task->cpu is
- * not idle and an idle cpu is available. The span of cpus to
- * search starts with cpus closest then further out as needed,
- * so we always favor a closer, idle cpu.
- *
- * Returns the CPU we should wake onto.
- */
-#if defined(ARCH_HAS_SCHED_WAKE_IDLE)
-static int wake_idle(int cpu, struct task_struct *p)
-{
-	cpumask_t tmp;
-	struct sched_domain *sd;
-	int i;
-
-	/*
-	 * If it is idle, then it is the best cpu to run this task.
-	 *
-	 * This cpu is also the best, if it has more than one task already.
-	 * Siblings must be also busy(in most cases) as they didn't already
-	 * pickup the extra load from this cpu and hence we need not check
-	 * sibling runqueue info. This will avoid the checks and cache miss
-	 * penalities associated with that.
-	 */
-	if (idle_cpu(cpu) || cpu_rq(cpu)->nr_running > 1)
-		return cpu;
-
-	for_each_domain(cpu, sd) {
-		if (sd->flags & SD_WAKE_IDLE) {
-			cpus_and(tmp, sd->span, p->cpus_allowed);
-			for_each_cpu_mask(i, tmp) {
-				if (idle_cpu(i)) {
-					if (i != task_cpu(p)) {
-						schedstat_inc(p,
-						       se.nr_wakeups_idle);
-					}
-					return i;
-				}
-			}
-		} else {
-			break;
-		}
-	}
-	return cpu;
-}
-#else
-static inline int wake_idle(int cpu, struct task_struct *p)
-{
-	return cpu;
-}
-#endif
-
 /***
  * try_to_wake_up - wake up a thread
  * @p: the to-be-woken-up thread
@@ -1495,8 +1450,6 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state, int sync)
 	long old_state;
 	struct rq *rq;
 #ifdef CONFIG_SMP
-	struct sched_domain *sd, *this_sd = NULL;
-	unsigned long load, this_load;
 	int new_cpu;
 #endif
 
@@ -1516,90 +1469,7 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state, int sync)
 	if (unlikely(task_running(rq, p)))
 		goto out_activate;
 
-	new_cpu = cpu;
-
-	schedstat_inc(rq, ttwu_count);
-	if (cpu == this_cpu) {
-		schedstat_inc(rq, ttwu_local);
-		goto out_set_cpu;
-	}
-
-	for_each_domain(this_cpu, sd) {
-		if (cpu_isset(cpu, sd->span)) {
-			schedstat_inc(sd, ttwu_wake_remote);
-			this_sd = sd;
-			break;
-		}
-	}
-
-	if (unlikely(!cpu_isset(this_cpu, p->cpus_allowed)))
-		goto out_set_cpu;
-
-	/*
-	 * Check for affine wakeup and passive balancing possibilities.
-	 */
-	if (this_sd) {
-		int idx = this_sd->wake_idx;
-		unsigned int imbalance;
-
-		imbalance = 100 + (this_sd->imbalance_pct - 100) / 2;
-
-		load = source_load(cpu, idx);
-		this_load = target_load(this_cpu, idx);
-
-		new_cpu = this_cpu; /* Wake to this CPU if we can */
-
-		if (this_sd->flags & SD_WAKE_AFFINE) {
-			unsigned long tl = this_load;
-			unsigned long tl_per_task;
-
-			/*
-			 * Attract cache-cold tasks on sync wakeups:
-			 */
-			if (sync && !task_hot(p, rq->clock, this_sd))
-				goto out_set_cpu;
-
-			schedstat_inc(p, se.nr_wakeups_affine_attempts);
-			tl_per_task = cpu_avg_load_per_task(this_cpu);
-
-			/*
-			 * If sync wakeup then subtract the (maximum possible)
-			 * effect of the currently running task from the load
-			 * of the current CPU:
-			 */
-			if (sync)
-				tl -= current->se.load.weight;
-
-			if ((tl <= load &&
-				tl + target_load(cpu, idx) <= tl_per_task) ||
-			       100*(tl + p->se.load.weight) <= imbalance*load) {
-				/*
-				 * This domain has SD_WAKE_AFFINE and
-				 * p is cache cold in this domain, and
-				 * there is no bad imbalance.
-				 */
-				schedstat_inc(this_sd, ttwu_move_affine);
-				schedstat_inc(p, se.nr_wakeups_affine);
-				goto out_set_cpu;
-			}
-		}
-
-		/*
-		 * Start passive balancing when half the imbalance_pct
-		 * limit is reached.
-		 */
-		if (this_sd->flags & SD_WAKE_BALANCE) {
-			if (imbalance*this_load <= 100*load) {
-				schedstat_inc(this_sd, ttwu_move_balance);
-				schedstat_inc(p, se.nr_wakeups_passive);
-				goto out_set_cpu;
-			}
-		}
-	}
-
-	new_cpu = cpu; /* Could not wake to this_cpu. Wake to cpu instead */
-out_set_cpu:
-	new_cpu = wake_idle(new_cpu, p);
+	new_cpu = p->sched_class->select_task_rq(p, sync);
 	if (new_cpu != cpu) {
 		set_task_cpu(p, new_cpu);
 		task_rq_unlock(rq, &flags);
@@ -1615,6 +1485,23 @@ out_set_cpu:
 		cpu = task_cpu(p);
 	}
 
+#ifdef CONFIG_SCHEDSTATS
+	schedstat_inc(rq, ttwu_count);
+	if (cpu == this_cpu)
+		schedstat_inc(rq, ttwu_local);
+	else {
+		struct sched_domain *sd;
+		for_each_domain(this_cpu, sd) {
+			if (cpu_isset(cpu, sd->span)) {
+				schedstat_inc(sd, ttwu_wake_remote);
+				break;
+			}
+		}
+	}
+
+#endif
+
 out_activate:
 #endif /* CONFIG_SMP */
 	schedstat_inc(p, se.nr_wakeups);
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index d3c0307..9e30c96 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -830,6 +830,151 @@ static void yield_task_fair(struct rq *rq)
 }
 
 /*
+ * wake_idle() will wake a task on an idle cpu if task->cpu is
+ * not idle and an idle cpu is available. The span of cpus to
+ * search starts with cpus closest then further out as needed,
+ * so we always favor a closer, idle cpu.
+ *
+ * Returns the CPU we should wake onto.
+ */
+#if defined(ARCH_HAS_SCHED_WAKE_IDLE)
+static int wake_idle(int cpu, struct task_struct *p)
+{
+	cpumask_t tmp;
+	struct sched_domain *sd;
+	int i;
+
+	/*
+	 * If it is idle, then it is the best cpu to run this task.
+	 *
+	 * This cpu is also the best, if it has more than one task already.
+	 * Siblings must be also busy(in most cases) as they didn't already
+	 * pickup the extra load from this cpu and hence we need not check
+	 * sibling runqueue info. This will avoid the checks and cache miss
+	 * penalities associated with that.
+	 */
+	if (idle_cpu(cpu) || cpu_rq(cpu)->nr_running > 1)
+		return cpu;
+
+	for_each_domain(cpu, sd) {
+		if (sd->flags & SD_WAKE_IDLE) {
+			cpus_and(tmp, sd->span, p->cpus_allowed);
+			for_each_cpu_mask(i, tmp) {
+				if (idle_cpu(i)) {
+					if (i != task_cpu(p)) {
+						schedstat_inc(p,
+						       se.nr_wakeups_idle);
+					}
+					return i;
+				}
+			}
+		} else {
+			break;
+		}
+	}
+	return cpu;
+}
+#else
+static inline int wake_idle(int cpu, struct task_struct *p)
+{
+	return cpu;
+}
+#endif
+
+#ifdef CONFIG_SMP
+static int select_task_rq_fair(struct task_struct *p, int sync)
+{
+	int cpu, this_cpu;
+	struct rq *rq;
+	struct sched_domain *sd, *this_sd = NULL;
+	int new_cpu;
+
+	cpu      = task_cpu(p);
+	rq       = task_rq(p);
+	this_cpu = smp_processor_id();
+	new_cpu  = cpu;
+
+	for_each_domain(this_cpu, sd) {
+		if (cpu_isset(cpu, sd->span)) {
+			this_sd = sd;
+			break;
+		}
+	}
+
+	if (unlikely(!cpu_isset(this_cpu, p->cpus_allowed)))
+		goto out_set_cpu;
+
+	/*
+	 * Check for affine wakeup and passive balancing possibilities.
+	 */
+	if (this_sd) {
+		int idx = this_sd->wake_idx;
+		unsigned int imbalance;
+		unsigned long load, this_load;
+
+		imbalance = 100 + (this_sd->imbalance_pct - 100) / 2;
+
+		load = source_load(cpu, idx);
+		this_load = target_load(this_cpu, idx);
+
+		new_cpu = this_cpu; /* Wake to this CPU if we can */
+
+		if (this_sd->flags & SD_WAKE_AFFINE) {
+			unsigned long tl = this_load;
+			unsigned long tl_per_task;
+
+			/*
+			 * Attract cache-cold tasks on sync wakeups:
+			 */
+			if (sync && !task_hot(p, rq->clock, this_sd))
+				goto out_set_cpu;
+
+			schedstat_inc(p, se.nr_wakeups_affine_attempts);
+			tl_per_task = cpu_avg_load_per_task(this_cpu);
+
+			/*
+			 * If sync wakeup then subtract the (maximum possible)
+			 * effect of the currently running task from the load
+			 * of the current CPU:
+			 */
+			if (sync)
+				tl -= current->se.load.weight;
+
+			if ((tl <= load &&
+				tl + target_load(cpu, idx) <= tl_per_task) ||
+			       100*(tl + p->se.load.weight) <= imbalance*load) {
+				/*
+				 * This domain has SD_WAKE_AFFINE and
+				 * p is cache cold in this domain, and
+				 * there is no bad imbalance.
+				 */
+				schedstat_inc(this_sd, ttwu_move_affine);
+				schedstat_inc(p, se.nr_wakeups_affine);
+				goto out_set_cpu;
+			}
+		}
+
+		/*
+		 * Start passive balancing when half the imbalance_pct
+		 * limit is reached.
+		 */
+		if (this_sd->flags & SD_WAKE_BALANCE) {
+			if (imbalance*this_load <= 100*load) {
+				schedstat_inc(this_sd, ttwu_move_balance);
+				schedstat_inc(p, se.nr_wakeups_passive);
+				goto out_set_cpu;
+			}
+		}
+	}
+
+	new_cpu = cpu; /* Could not wake to this_cpu. Wake to cpu instead */
+out_set_cpu:
+	return wake_idle(new_cpu, p);
+}
+#endif /* CONFIG_SMP */
+
+
+/*
  * Preempt the current task with a newly woken task if needed:
  */
 static void check_preempt_wakeup(struct rq *rq, struct task_struct *p)
@@ -1102,6 +1247,9 @@ static const struct sched_class fair_sched_class = {
 	.enqueue_task		= enqueue_task_fair,
 	.dequeue_task		= dequeue_task_fair,
 	.yield_task		= yield_task_fair,
+#ifdef CONFIG_SMP
+	.select_task_rq		= select_task_rq_fair,
+#endif /* CONFIG_SMP */
 
 	.check_preempt_curr	= check_preempt_wakeup,
 
diff --git a/kernel/sched_idletask.c b/kernel/sched_idletask.c
index bf9c25c..ca53748 100644
--- a/kernel/sched_idletask.c
+++ b/kernel/sched_idletask.c
@@ -5,6 +5,12 @@
  *  handled in sched_fair.c)
  */
 
+#ifdef CONFIG_SMP
+static int select_task_rq_idle(struct task_struct *p, int sync)
+{
+	return task_cpu(p); /* IDLE tasks are never migrated */
+}
+#endif /* CONFIG_SMP */
 /*
  * Idle tasks are unconditionally rescheduled:
  */
@@ -72,6 +78,9 @@ const struct sched_class idle_sched_class = {
 
 	/* dequeue is not valid, we print a debug message there: */
 	.dequeue_task		= dequeue_task_idle,
+#ifdef CONFIG_SMP
+	.select_task_rq		= select_task_rq_idle,
+#endif /* CONFIG_SMP */
 
 	.check_preempt_curr	= check_preempt_curr_idle,
 
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 4a469c5..0e408dc 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -147,6 +147,13 @@ yield_task_rt(struct rq *rq)
 	requeue_task_rt(rq, rq->curr);
 }
 
+#ifdef CONFIG_SMP
+static int select_task_rq_rt(struct task_struct *p, int sync)
+{
+	return task_cpu(p);
+}
+#endif /* CONFIG_SMP */
+
 /*
  * Preempt the current task with a newly woken task if needed:
  */
@@ -669,6 +676,9 @@ const struct sched_class rt_sched_class = {
 	.enqueue_task		= enqueue_task_rt,
 	.dequeue_task		= dequeue_task_rt,
 	.yield_task		= yield_task_rt,
+#ifdef CONFIG_SMP
+	.select_task_rq		= select_task_rq_rt,
+#endif /* CONFIG_SMP */
 
 	.check_preempt_curr	= check_preempt_curr_rt,
 

--=__Part496F3C7F.5__=--