From: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
To: Thomas Gleixner <tglx@linutronix.de>, Anton Blanchard <anton@samba.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
rt@linutronix.de, Michael Ellerman <mpe@ellerman.id.au>,
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
shreyas@linux.vnet.ibm.com
Subject: Re: [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING
Date: Wed, 13 Jul 2016 01:35:34 +0530 [thread overview]
Message-ID: <57854D8E.4000004@linux.vnet.ibm.com> (raw)
In-Reply-To: <alpine.DEB.2.11.1607121744350.4083@nanos>
Hi,
On 07/12/2016 10:03 PM, Thomas Gleixner wrote:
> Anton,
>
> On Tue, 12 Jul 2016, Anton Blanchard wrote:
>>> It really does not matter when we fold the load for the outgoing cpu.
>>> It's almost dead anyway, so there is no harm if we fail to fold the
>>> few microseconds which are required for going fully away.
>>
>> We are seeing the load average shoot up when hot unplugging CPUs (+1
>> for every CPU we offline) on ppc64. This reproduces on bare metal as
>> well as inside a KVM guest. A bisect points at this commit.
>>
>> As an example, a completely idle box with 128 CPUS and 112 hot
>> unplugged:
>>
>> # uptime
>> 04:35:30 up 1:23, 2 users, load average: 112.43, 122.94, 125.54
>
> Yes, it's an off by one as we now call that from the task which is tearing
> down the cpu. Does the patch below fix it?
I tested your patch on an idle box and no longer see the load average
increase when offlining CPUs.
# uptime
01:27:44 up 10 min, 1 user, load average: 0.00, 0.18, 0.18
# lscpu | grep -Ei "on-line|off-line"
On-line CPU(s) list: 0-127
# ppc64_cpu --cores-on=2
# lscpu | grep -Ei "on-line|off-line"
On-line CPU(s) list: 0-15
Off-line CPU(s) list: 16-127
# sleep 60
# uptime
01:28:52 up 11 min, 1 user, load average: 0.11, 0.19, 0.18
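For anyone wondering why the leak showed up as almost exactly +1 per offlined
CPU in the numbers above: the avenrun values decay toward the folded active
count. A rough stand-alone Python model of the kernel's fixed-point decay
(constants as in the kernel's loadavg code at the time; this is an
illustrative sketch, not kernel code):

```python
# Model of the 1-minute load-average decay ("Global load-average
# calculations"), in the kernel's 11-bit fixed-point arithmetic.
FSHIFT = 11
FIXED_1 = 1 << FSHIFT      # 1.0 in fixed point
EXP_1 = 1884               # 1/exp(5s/1min) in fixed point

def calc_load(load, active):
    # load = load*exp + active*(1 - exp), rounded up while rising
    newload = load * EXP_1 + active * (FIXED_1 - EXP_1)
    if active >= load:
        newload += FIXED_1 - 1
    return newload // FIXED_1

# 112 leaked "active" tasks (one per offlined CPU) on an otherwise
# idle box: the 1-minute average converges to ~112 within minutes.
load = 0
for _ in range(200):       # 200 ticks of 5s = ~17 minutes
    load = calc_load(load, 112 * FIXED_1)
print(load >> FSHIFT)      # -> 112
```

That matches the "load average: 112.43, ..." Anton reported with 112 CPUs
hot-unplugged.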
Thanks and Regards,
Shilpa
>
> Thanks,
>
> tglx
>
> 8<----------------------
>
> Subject: sched/migration: Correct off by one in load migration
> From: Thomas Gleixner <tglx@linutronix.de>
>
> The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
> account that the function is now called from a thread running on the outgoing
> CPU. As a result a CPU unplug leaks a load of 1 into the global load
> accounting mechanism.
>
> Fix it by adjusting for the currently running thread which calls
> calc_load_migrate().
>
> Fixes: e9cd8fa4fcfd ("sched/migration: Move calc_load_migrate() into CPU_DYING")
> Reported-by: Anton Blanchard <anton@samba.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 51d7105f529a..97ee9ac7e97c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5394,13 +5394,15 @@ void idle_task_exit(void)
> /*
> * Since this CPU is going 'away' for a while, fold any nr_active delta
> * we might have. Assumes we're called after migrate_tasks() so that the
> - * nr_active count is stable.
> + * nr_active count is stable. We need to take the teardown thread which
> + * is calling this into account, so we hand in adjust = 1 to the load
> + * calculation.
> *
> * Also see the comment "Global load-average calculations".
> */
> static void calc_load_migrate(struct rq *rq)
> {
> - long delta = calc_load_fold_active(rq);
> + long delta = calc_load_fold_active(rq, 1);
> if (delta)
> atomic_long_add(delta, &calc_load_tasks);
> }
> diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
> index b0b93fd33af9..a2d6eb71f06b 100644
> --- a/kernel/sched/loadavg.c
> +++ b/kernel/sched/loadavg.c
> @@ -78,11 +78,11 @@ void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
> loads[2] = (avenrun[2] + offset) << shift;
> }
>
> -long calc_load_fold_active(struct rq *this_rq)
> +long calc_load_fold_active(struct rq *this_rq, long adjust)
> {
> long nr_active, delta = 0;
>
> - nr_active = this_rq->nr_running;
> + nr_active = this_rq->nr_running - adjust;
> nr_active += (long)this_rq->nr_uninterruptible;
>
> if (nr_active != this_rq->calc_load_active) {
> @@ -188,7 +188,7 @@ void calc_load_enter_idle(void)
> * We're going into NOHZ mode, if there's any pending delta, fold it
> * into the pending idle delta.
> */
> - delta = calc_load_fold_active(this_rq);
> + delta = calc_load_fold_active(this_rq, 0);
> if (delta) {
> int idx = calc_load_write_idx();
>
> @@ -389,7 +389,7 @@ void calc_global_load_tick(struct rq *this_rq)
> if (time_before(jiffies, this_rq->calc_load_update))
> return;
>
> - delta = calc_load_fold_active(this_rq);
> + delta = calc_load_fold_active(this_rq, 0);
> if (delta)
> atomic_long_add(delta, &calc_load_tasks);
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 7cbeb92a1cb9..898c0d2f18fe 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -28,7 +28,7 @@ extern unsigned long calc_load_update;
> extern atomic_long_t calc_load_tasks;
>
> extern void calc_global_load_tick(struct rq *this_rq);
> -extern long calc_load_fold_active(struct rq *this_rq);
> +extern long calc_load_fold_active(struct rq *this_rq, long adjust);
>
> #ifdef CONFIG_SMP
> extern void cpu_load_update_active(struct rq *this_rq);
>
>
>
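To make the off-by-one concrete, the patched function can be modeled in a few
lines of Python (a hypothetical sketch; `Rq` stands in for struct rq with only
the three fields the function touches):

```python
# Model of calc_load_fold_active() after the fix. The hotplug teardown
# thread is runnable on the outgoing CPU while folding the load, so it
# passes adjust=1 to exclude itself from the active count.

class Rq:
    def __init__(self, nr_running, nr_uninterruptible, calc_load_active):
        self.nr_running = nr_running
        self.nr_uninterruptible = nr_uninterruptible
        self.calc_load_active = calc_load_active

def calc_load_fold_active(rq, adjust):
    nr_active = rq.nr_running - adjust
    nr_active += rq.nr_uninterruptible
    delta = 0
    if nr_active != rq.calc_load_active:
        delta = nr_active - rq.calc_load_active
        rq.calc_load_active = nr_active
    return delta

# Otherwise idle outgoing CPU: the only runnable task is the teardown
# thread itself, and the CPU previously contributed 0 to the load.
rq = Rq(nr_running=1, nr_uninterruptible=0, calc_load_active=0)
print(calc_load_fold_active(rq, adjust=0))  # old behaviour: leaks +1

rq = Rq(nr_running=1, nr_uninterruptible=0, calc_load_active=0)
print(calc_load_fold_active(rq, adjust=1))  # fixed: folds 0
```

The regular callers (the tick and NOHZ entry) run in interrupt context rather
than in a counted task, which is why they keep adjust=0.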