From: Andrew Morton <akpm@linux-foundation.org>
To: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: mingo@elte.hu, linux-kernel@vger.kernel.org, npiggin@suse.de,
rostedt@goodmis.org
Subject: Re: [patch 2/2] sched: dynticks idle load balancing - v2
Date: Wed, 21 Feb 2007 12:23:44 -0800 [thread overview]
Message-ID: <20070221122344.564fb44f.akpm@linux-foundation.org> (raw)
In-Reply-To: <20070216180842.B8744@unix-os.sc.intel.com>
On Fri, 16 Feb 2007 18:08:42 -0800
"Siddha, Suresh B" <suresh.b.siddha@intel.com> wrote:
> Changes since v1:
>
> - Move the idle load balancer selection from schedule()
> to the first busy scheduler_tick() after restarting the tick.
> This will avoid the unnecessay ownership changes when
> softirq's(which are run in softirqd context in certain -rt
> configurations) like timer, sched are invoked for every idle tick
> that happens.
>
> - Misc cleanups.
>
> ---
> Fix the process idle load balancing in the presence of dynticks.
> cpus for which ticks are stopped will sleep till the next event wakes it up.
> Potentially these sleeps can be for large durations and during which today,
> there is no periodic idle load balancing being done.
>
> This patch nominates an owner among the idle cpus, which does the idle load
> balancing on behalf of the other idle cpus. And once all the cpus are
> completely idle, then we can stop this idle load balancing too. Checks added
> in fast path are minimized. Whenever there are busy cpus in the system, there
> will be an owner(idle cpu) doing the system wide idle load balancing.
>
> Open items:
> 1. Intelligent owner selection (like an idle core in a busy package).
> 2. Merge with rcu's nohz_cpu_mask?
>
I spose I'll hold my nose and merge this, but it creates too much of a mess
to be mergeable into the CPU scheduler, IMO.
Can we please do something to reduce the ifdef density? And if possible,
all the newly-added returns-from-the-middle-of-a-function?
> +#ifdef CONFIG_NO_HZ
> +static struct {
> + int load_balancer;
> + cpumask_t cpu_mask;
> +} nohz ____cacheline_aligned = {
> + .load_balancer = -1,
> + .cpu_mask = CPU_MASK_NONE,
> +};
> +
> +/*
> + * This routine will try to nominate the ilb (idle load balancing)
> + * owner among the cpus whose ticks are stopped. ilb owner will do the idle
> + * load balancing on behalf of all those cpus. If all the cpus in the system
> + * go into this tickless mode, then there will be no ilb owner (as there is
> + * no need for one) and all the cpus will sleep till the next wakeup event
> + * arrives...
> + *
> + * For the ilb owner, tick is not stopped. And this tick will be used
> + * for idle load balancing. ilb owner will still be part of
> + * nohz.cpu_mask..
> + *
> + * While stopping the tick, this cpu will become the ilb owner if there
> + * is no other owner. And will be the owner till that cpu becomes busy
> + * or if all cpus in the system stop their ticks at which point
> + * there is no need for ilb owner.
> + *
> + * When the ilb owner becomes busy, it nominates another owner, during the
> + * next busy scheduler_tick()
> + */
> +int select_nohz_load_balancer(int stop_tick)
> +{
> + int cpu = smp_processor_id();
> +
> + if (stop_tick) {
> + cpu_set(cpu, nohz.cpu_mask);
> + cpu_rq(cpu)->in_nohz_recently = 1;
> +
> + /*
> + * If we are going offline and still the leader, give up!
> + */
> + if (cpu_is_offline(cpu) && nohz.load_balancer == cpu) {
> + if (cmpxchg(&nohz.load_balancer, cpu, -1) != cpu)
So we require that architectures which implement CONFIG_NO_HZ also
implement cmpxchg.
> + BUG();
> + return 0;
> + }
> +
> + /* time for ilb owner also to sleep */
> + if (cpus_weight(nohz.cpu_mask) == num_online_cpus()) {
> + if (nohz.load_balancer == cpu)
> + nohz.load_balancer = -1;
> + return 0;
> + }
> +
> + if (nohz.load_balancer == -1) {
> + /* make me the ilb owner */
> + if (cmpxchg(&nohz.load_balancer, -1, cpu) == -1)
> + return 1;
> + } else if (nohz.load_balancer == cpu)
> + return 1;
> + } else {
> + if (!cpu_isset(cpu, nohz.cpu_mask))
> + return 0;
> +
> + cpu_clear(cpu, nohz.cpu_mask);
> +
> + if (nohz.load_balancer == cpu)
> + if (cmpxchg(&nohz.load_balancer, cpu, -1) != cpu)
> + BUG();
> + }
> + return 0;
> +}
> +#endif
> +
> /*
> * run_rebalance_domains is triggered when needed from the scheduler tick.
> *
> @@ -3347,15 +3437,46 @@ static DEFINE_SPINLOCK(balancing);
>
> static void run_rebalance_domains(struct softirq_action *h)
> {
> - int this_cpu = smp_processor_id(), balance = 1;
> - struct rq *this_rq = cpu_rq(this_cpu);
> - unsigned long interval;
> + int balance_cpu = smp_processor_id(), balance;
> + struct rq *balance_rq = cpu_rq(balance_cpu);
> + unsigned long interval, next_balance;
One definition per line is preferred.
> struct sched_domain *sd;
> - enum idle_type idle = this_rq->idle_at_tick ? SCHED_IDLE : NOT_IDLE;
> + enum idle_type idle;
> +
> +#ifdef CONFIG_NO_HZ
> + cpumask_t cpus = nohz.cpu_mask;
> + int local_cpu = balance_cpu;
> + struct rq *local_rq = balance_rq;
> +
> +restart:
> +
> + if (local_rq->idle_at_tick && nohz.load_balancer == local_cpu) {
> + if (need_resched())
> + return;
> +
> + balance_cpu = first_cpu(cpus);
> + if (unlikely(balance_cpu >= NR_CPUS))
> + return;
> + balance_rq = cpu_rq(balance_cpu);
> + cpu_clear(balance_cpu, cpus);
> + }
> +
> + /*
> + * We must be doing idle load balancing for some other idle cpu
> + */
> + if (local_cpu != balance_cpu)
> + idle = SCHED_IDLE;
> + else
> +#endif
> + idle = balance_rq->idle_at_tick ? SCHED_IDLE : NOT_IDLE;
> +
It's not just that this is an eyesore (it is) - it introduces some ongoing
maintenance problems. I expect this can be pulled out into a separate
function with no loss in quality of generated code.
And that function would be a nice site for a nice comment explaining the
design. Why do we return if need_resched()? What is the significance of
(balance_cpu >= NR_CPUS)?
> +
> /* Earliest time when we have to call run_rebalance_domains again */
> - unsigned long next_balance = jiffies + 60*HZ;
> + next_balance = jiffies + 60*HZ;
> +
> + for_each_domain(balance_cpu, sd) {
>
> - for_each_domain(this_cpu, sd) {
> if (!(sd->flags & SD_LOAD_BALANCE))
> continue;
>
> @@ -3374,7 +3495,7 @@ static void run_rebalance_domains(struct
> }
>
> if (time_after_eq(jiffies, sd->last_balance + interval)) {
> - if (load_balance(this_cpu, this_rq, sd, idle, &balance)) {
> + if (load_balance(balance_cpu, balance_rq, sd, idle, &balance)) {
> /*
> * We've pulled tasks over so either we're no
> * longer idle, or one of our SMT siblings is
> @@ -3398,7 +3519,16 @@ out:
> if (!balance)
> break;
> }
> - this_rq->next_balance = next_balance;
> + balance_rq->next_balance = next_balance;
> +
> +#ifdef CONFIG_NO_HZ
> + if (local_rq->idle_at_tick && nohz.load_balancer == local_cpu) {
> + if (time_after(next_balance, local_rq->next_balance))
> + local_rq->next_balance = next_balance;
> + if (!cpus_empty(cpus))
> + goto restart;
> + }
> +#endif
> }
A goto in an ifdef :(
Perhaps it can be removed by teaching the caller to call this function again?
next prev parent reply other threads:[~2007-02-21 20:23 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-12-11 23:53 [RFC] Patch: dynticks: idle load balancing Siddha, Suresh B
2006-12-13 22:43 ` Ingo Molnar
2006-12-13 23:13 ` Ingo Molnar
2006-12-13 23:03 ` Siddha, Suresh B
2006-12-13 23:31 ` Ingo Molnar
2006-12-13 23:19 ` Siddha, Suresh B
2006-12-14 0:34 ` Ingo Molnar
2006-12-19 20:12 ` Ingo Molnar
2006-12-19 21:12 ` Siddha, Suresh B
2007-01-16 11:35 ` Ingo Molnar
2007-01-30 21:57 ` Siddha, Suresh B
2007-02-07 22:19 ` Siddha, Suresh B
2007-02-17 2:03 ` [patch 1/2] sched: fix idle load balancing in softirqd context Siddha, Suresh B
2007-02-17 2:08 ` [patch 2/2] sched: dynticks idle load balancing - v2 Siddha, Suresh B
2007-02-21 20:23 ` Andrew Morton [this message]
2007-02-22 3:14 ` Nick Piggin
2007-02-24 2:01 ` [patch] sched: dynticks idle load balancing - v3 Siddha, Suresh B
2007-02-22 3:26 ` [patch 2/2] sched: dynticks idle load balancing - v2 Nick Piggin
2007-02-22 22:33 ` Siddha, Suresh B
2007-02-23 3:43 ` Nick Piggin
2007-02-17 14:42 ` [patch 1/2] sched: fix idle load balancing in softirqd context Steven Rostedt
2007-02-21 6:25 ` Siddha, Suresh B
2007-02-21 20:13 ` Andrew Morton
2006-12-13 23:48 ` [RFC] Patch: dynticks: idle load balancing Ingo Molnar
2006-12-20 0:49 ` Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070221122344.564fb44f.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=npiggin@suse.de \
--cc=rostedt@goodmis.org \
--cc=suresh.b.siddha@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.