From: Frederic Weisbecker <fweisbec@gmail.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: linux-kernel@vger.kernel.org, linaro-dev@lists.linaro.org,
peterz@infradead.org, mingo@kernel.org, rostedt@goodmis.org,
efault@gmx.de
Subject: Re: [PATCH v4] sched: fix init NOHZ_IDLE flag
Date: Fri, 22 Feb 2013 13:32:15 +0100
Message-ID: <20130222123213.GA17948@somewhere.redhat.com>
In-Reply-To: <1361435356-28466-1-git-send-email-vincent.guittot@linaro.org>
On Thu, Feb 21, 2013 at 09:29:16AM +0100, Vincent Guittot wrote:
> On my SMP platform, which is made of 5 cores in 2 clusters, the
> nr_busy_cpus field of the sched_group_power struct is not zero when the
> platform is fully idle. The root cause seems to be:
> During the boot sequence, some CPUs reach the idle loop and set their
> NOHZ_IDLE flag while waiting for other CPUs to boot. But the nr_busy_cpus
> field is initialized later with the assumption that all CPUs are in the busy
> state, whereas some CPUs have already set their NOHZ_IDLE flag.
>
> During the initialization of the sched_domain, we set the NOHZ_IDLE flag when
> nr_busy_cpus is initialized to 0 in order to have a coherent configuration.
> If a CPU enters idle and calls set_cpu_sd_state_idle during the build of the
> new sched_domain, it will not corrupt the initial state.
> set_cpu_sd_state_busy is modified and clears NOHZ_IDLE only if a non-NULL
> sched_domain is attached to the CPU (which is the case during the rebuild).
>
> Change since V3:
> - NOHZ flag is not cleared if a NULL domain is attached to the CPU
> - Remove patch 2/2 which becomes useless with latest modifications
>
> Change since V2:
> - change the initialization to idle state instead of busy state so a CPU that
> enters idle during the build of the sched_domain will not corrupt the
> initialization state
>
> Change since V1:
> - remove the patch for SCHED softirq on an idle core use case as it was
> a side effect of the other use cases.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
> kernel/sched/core.c | 4 +++-
> kernel/sched/fair.c | 9 +++++++--
> 2 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 26058d0..c730a4e 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5884,7 +5884,9 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)
>  		return;
>
>  	update_group_power(sd, cpu);
> -	atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight);
> +	atomic_set(&sg->sgp->nr_busy_cpus, 0);
> +	set_bit(NOHZ_IDLE, nohz_flags(cpu));
> +
>  }
>
> int __weak arch_sd_sibling_asym_packing(void)
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 81fa536..2701a92 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5403,15 +5403,20 @@ static inline void set_cpu_sd_state_busy(void)
>  {
>  	struct sched_domain *sd;
>  	int cpu = smp_processor_id();
> +	int clear = 0;
>
>  	if (!test_bit(NOHZ_IDLE, nohz_flags(cpu)))
>  		return;
> -	clear_bit(NOHZ_IDLE, nohz_flags(cpu));
>
>  	rcu_read_lock();
> -	for_each_domain(cpu, sd)
> +	for_each_domain(cpu, sd) {
>  		atomic_inc(&sd->groups->sgp->nr_busy_cpus);
> +		clear = 1;
> +	}
>  	rcu_read_unlock();
> +
> +	if (likely(clear))
> +		clear_bit(NOHZ_IDLE, nohz_flags(cpu));

I fear there is still a race window:

           = CPU 0 =                                = CPU 1 =
                                            // NOHZ_IDLE is set
                                            set_cpu_sd_state_busy() {
                                                 dom1 = rcu_dereference(dom1);
                                                 inc(dom1->nr_busy_cpus)

 rcu_assign_pointer(dom 1, NULL)
 // create new domain
 init_sched_group_power() {
     atomic_set(&tmp->nr_busy_cpus, 0);
     set_bit(NOHZ_IDLE, nohz_flags(cpu 1));
     rcu_assign_pointer(dom 1, tmp)

                                                 clear_bit(NOHZ_IDLE, nohz_flags(cpu));
                                            }

I don't know if there is any sane way to deal with this issue other than
having nr_busy_cpus and nohz_flags in the same object sharing the same
lifecycle.
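
A rough userspace sketch of what I mean by "same object", using C11
atomics. Everything here is hypothetical (struct busy_state, DEMO_NR_CPUS,
mark_busy(), mark_idle()); it only illustrates that when the flag and the
counter are reached through the same pointer, a stale update can no longer
split them across two objects. It says nothing about RCU lifetime or about
how a CPU that is really busy gets accounted in a freshly built object.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 8 /* hypothetical size, for the sketch only */

/* Hypothetical object: the counter and the per-CPU idle flags live and die together. */
struct busy_state {
	atomic_int nr_busy_cpus;
	atomic_bool idle[DEMO_NR_CPUS];
};

/* A new object starts out coherent: every CPU idle, nobody counted. */
static void busy_state_init(struct busy_state *bs)
{
	atomic_init(&bs->nr_busy_cpus, 0);
	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		atomic_init(&bs->idle[cpu], true);
}

/* Count the CPU as busy only if its flag in this same object was set. */
static void mark_busy(struct busy_state *bs, int cpu)
{
	if (atomic_exchange(&bs->idle[cpu], false))
		atomic_fetch_add(&bs->nr_busy_cpus, 1);
}

/* Uncount the CPU only if its flag in this same object was clear. */
static void mark_idle(struct busy_state *bs, int cpu)
{
	if (!atomic_exchange(&bs->idle[cpu], true))
		atomic_fetch_sub(&bs->nr_busy_cpus, 1);
}

/* Quiescent check: the counter matches the number of cleared idle flags. */
static bool busy_state_consistent(struct busy_state *bs)
{
	int busy = 0;

	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		if (!atomic_load(&bs->idle[cpu]))
			busy++;
	return busy == atomic_load(&bs->nr_busy_cpus);
}
```

With this layout, a mark_busy() that still holds the old object updates only
the old object, and a later mark_idle() on the new one is a no-op instead of
an underflow, since the new object's own flag still says idle.
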