From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 13 Jul 2016 01:35:34 +0530
From: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
To: Thomas Gleixner, Anton Blanchard
CC: LKML <linux-kernel@vger.kernel.org>, Peter Zijlstra, Ingo Molnar,
 rt@linutronix.de, Michael Ellerman, Vaidyanathan Srinivasan,
 shreyas@linux.vnet.ibm.com
Subject: Re: [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING
References: <20160310115406.940706476@linutronix.de> <20160310120025.328739226@linutronix.de> <20160712143745.430733e7@kryten>
Message-Id: <57854D8E.4000004@linux.vnet.ibm.com>

Hi,

On 07/12/2016 10:03 PM, Thomas Gleixner wrote:
> Anton,
>
> On Tue, 12 Jul 2016, Anton Blanchard wrote:
>>> It
>>> really does not matter when we fold the load for the outgoing cpu.
>>> It's almost dead anyway, so there is no harm if we fail to fold the
>>> few microseconds which are required for going fully away.
>>
>> We are seeing the load average shoot up when hot unplugging CPUs (+1
>> for every CPU we offline) on ppc64. This reproduces on bare metal as
>> well as inside a KVM guest. A bisect points at this commit.
>>
>> As an example, a completely idle box with 128 CPUs and 112 hot
>> unplugged:
>>
>> # uptime
>>  04:35:30 up 1:23, 2 users, load average: 112.43, 122.94, 125.54
>
> Yes, it's an off by one as we now call that from the task which is tearing
> down the cpu. Does the patch below fix it?

I tested your patch: after offlining CPUs on an idle box, I no longer
see the load average increase.

# uptime
 01:27:44 up 10 min, 1 user, load average: 0.00, 0.18, 0.18
# lscpu | grep -Ei "on-line|off-line"
On-line CPU(s) list:   0-127
# ppc64_cpu --cores-on=2
# lscpu | grep -Ei "on-line|off-line"
On-line CPU(s) list:   0-15
Off-line CPU(s) list:  16-127
# sleep 60
# uptime
 01:28:52 up 11 min, 1 user, load average: 0.11, 0.19, 0.18

Thanks and Regards,
Shilpa

>
> Thanks,
>
> 	tglx
>
> 8<----------------------
>
> Subject: sched/migration: Correct off by one in load migration
> From: Thomas Gleixner
>
> The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
> account that the function is now called from a thread running on the outgoing
> CPU. As a result a cpu unplug leaks a load of 1 into the global load
> accounting mechanism.
>
> Fix it by adjusting for the currently running thread which calls
> calc_load_migrate().
>
> Fixes: e9cd8fa4fcfd: "sched/migration: Move calc_load_migrate() into CPU_DYING"
> Reported-by: Anton Blanchard
> Signed-off-by: Thomas Gleixner
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 51d7105f529a..97ee9ac7e97c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5394,13 +5394,15 @@ void idle_task_exit(void)
>  /*
>   * Since this CPU is going 'away' for a while, fold any nr_active delta
>   * we might have. Assumes we're called after migrate_tasks() so that the
> - * nr_active count is stable.
> + * nr_active count is stable. We need to take the teardown thread which
> + * is calling this into account, so we hand in adjust = 1 to the load
> + * calculation.
>   *
>   * Also see the comment "Global load-average calculations".
>   */
>  static void calc_load_migrate(struct rq *rq)
>  {
> -	long delta = calc_load_fold_active(rq);
> +	long delta = calc_load_fold_active(rq, 1);
> +
>  	if (delta)
>  		atomic_long_add(delta, &calc_load_tasks);
>  }
> diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
> index b0b93fd33af9..a2d6eb71f06b 100644
> --- a/kernel/sched/loadavg.c
> +++ b/kernel/sched/loadavg.c
> @@ -78,11 +78,11 @@ void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
>  	loads[2] = (avenrun[2] + offset) << shift;
>  }
> 
> -long calc_load_fold_active(struct rq *this_rq)
> +long calc_load_fold_active(struct rq *this_rq, long adjust)
>  {
>  	long nr_active, delta = 0;
> 
> -	nr_active = this_rq->nr_running;
> +	nr_active = this_rq->nr_running - adjust;
>  	nr_active += (long)this_rq->nr_uninterruptible;
> 
>  	if (nr_active != this_rq->calc_load_active) {
> @@ -188,7 +188,7 @@ void calc_load_enter_idle(void)
>  	 * We're going into NOHZ mode, if there's any pending delta, fold it
>  	 * into the pending idle delta.
>  	 */
> -	delta = calc_load_fold_active(this_rq);
> +	delta = calc_load_fold_active(this_rq, 0);
>  	if (delta) {
>  		int idx = calc_load_write_idx();
> 
> @@ -389,7 +389,7 @@ void calc_global_load_tick(struct rq *this_rq)
>  	if (time_before(jiffies, this_rq->calc_load_update))
>  		return;
> 
> -	delta = calc_load_fold_active(this_rq);
> +	delta = calc_load_fold_active(this_rq, 0);
>  	if (delta)
>  		atomic_long_add(delta, &calc_load_tasks);
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 7cbeb92a1cb9..898c0d2f18fe 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -28,7 +28,7 @@ extern unsigned long calc_load_update;
>  extern atomic_long_t calc_load_tasks;
> 
>  extern void calc_global_load_tick(struct rq *this_rq);
> -extern long calc_load_fold_active(struct rq *this_rq);
> +extern long calc_load_fold_active(struct rq *this_rq, long adjust);
> 
>  #ifdef CONFIG_SMP
>  extern void cpu_load_update_active(struct rq *this_rq);