From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 13 Jul 2016 01:35:34 +0530
From: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
To: Thomas Gleixner, Anton Blanchard
CC: LKML <linux-kernel@vger.kernel.org>, Peter Zijlstra, Ingo Molnar,
 rt@linutronix.de, Michael Ellerman, Vaidyanathan Srinivasan,
 shreyas@linux.vnet.ibm.com
Subject: Re: [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING
References: <20160310115406.940706476@linutronix.de> <20160310120025.328739226@linutronix.de> <20160712143745.430733e7@kryten>
Message-Id: <57854D8E.4000004@linux.vnet.ibm.com>

Hi,

On 07/12/2016 10:03 PM, Thomas Gleixner wrote:
> Anton,
>
> On Tue, 12 Jul 2016, Anton Blanchard wrote:
>>> It
>>> really does not matter when we fold the load for the outgoing cpu.
>>> It's almost dead anyway, so there is no harm if we fail to fold the
>>> few microseconds which are required for going fully away.
>>
>> We are seeing the load average shoot up when hot unplugging CPUs (+1
>> for every CPU we offline) on ppc64. This reproduces on bare metal as
>> well as inside a KVM guest. A bisect points at this commit.
>>
>> As an example, a completely idle box with 128 CPUs and 112 hot
>> unplugged:
>>
>> # uptime
>>  04:35:30 up 1:23, 2 users, load average: 112.43, 122.94, 125.54
>
> Yes, it's an off by one as we now call that from the task which is tearing
> down the cpu. Does the patch below fix it?

I tested your patch: after offlining CPUs on an idle box, I no longer
see the load average increase.

# uptime
 01:27:44 up 10 min, 1 user, load average: 0.00, 0.18, 0.18
# lscpu | grep -Ei "on-line|off-line"
On-line CPU(s) list:   0-127
# ppc64_cpu --cores-on=2
# lscpu | grep -Ei "on-line|off-line"
On-line CPU(s) list:   0-15
Off-line CPU(s) list:  16-127
# sleep 60
# uptime
 01:28:52 up 11 min, 1 user, load average: 0.11, 0.19, 0.18

Thanks and Regards,
Shilpa

>
> Thanks,
>
> 	tglx
>
> 8<----------------------
>
> Subject: sched/migration: Correct off by one in load migration
> From: Thomas Gleixner
>
> The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
> account that the function is now called from a thread running on the outgoing
> CPU. As a result a cpu unplug leaks a load of 1 into the global load
> accounting mechanism.
>
> Fix it by adjusting for the currently running thread which calls
> calc_load_migrate().
>
> Fixes: e9cd8fa4fcfd: "sched/migration: Move calc_load_migrate() into CPU_DYING"
> Reported-by: Anton Blanchard
> Signed-off-by: Thomas Gleixner
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 51d7105f529a..97ee9ac7e97c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5394,13 +5394,15 @@ void idle_task_exit(void)
>  /*
>   * Since this CPU is going 'away' for a while, fold any nr_active delta
>   * we might have. Assumes we're called after migrate_tasks() so that the
> - * nr_active count is stable.
> + * nr_active count is stable. We need to take the teardown thread which
> + * is calling this into account, so we hand in adjust = 1 to the load
> + * calculation.
>   *
>   * Also see the comment "Global load-average calculations".
>   */
>  static void calc_load_migrate(struct rq *rq)
>  {
> -	long delta = calc_load_fold_active(rq);
> +	long delta = calc_load_fold_active(rq, 1);
> +
>  	if (delta)
>  		atomic_long_add(delta, &calc_load_tasks);
>  }
> diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
> index b0b93fd33af9..a2d6eb71f06b 100644
> --- a/kernel/sched/loadavg.c
> +++ b/kernel/sched/loadavg.c
> @@ -78,11 +78,11 @@ void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
>  	loads[2] = (avenrun[2] + offset) << shift;
>  }
> 
> -long calc_load_fold_active(struct rq *this_rq)
> +long calc_load_fold_active(struct rq *this_rq, long adjust)
>  {
>  	long nr_active, delta = 0;
> 
> -	nr_active = this_rq->nr_running;
> +	nr_active = this_rq->nr_running - adjust;
>  	nr_active += (long)this_rq->nr_uninterruptible;
> 
>  	if (nr_active != this_rq->calc_load_active) {
> @@ -188,7 +188,7 @@ void calc_load_enter_idle(void)
>  	 * We're going into NOHZ mode, if there's any pending delta, fold it
>  	 * into the pending idle delta.
>  	 */
> -	delta = calc_load_fold_active(this_rq);
> +	delta = calc_load_fold_active(this_rq, 0);
>  	if (delta) {
>  		int idx = calc_load_write_idx();
> 
> @@ -389,7 +389,7 @@ void calc_global_load_tick(struct rq *this_rq)
>  	if (time_before(jiffies, this_rq->calc_load_update))
>  		return;
> 
> -	delta = calc_load_fold_active(this_rq);
> +	delta = calc_load_fold_active(this_rq, 0);
>  	if (delta)
>  		atomic_long_add(delta, &calc_load_tasks);
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 7cbeb92a1cb9..898c0d2f18fe 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -28,7 +28,7 @@ extern unsigned long calc_load_update;
>  extern atomic_long_t calc_load_tasks;
> 
>  extern void calc_global_load_tick(struct rq *this_rq);
> -extern long calc_load_fold_active(struct rq *this_rq);
> +extern long calc_load_fold_active(struct rq *this_rq, long adjust);
> 
>  #ifdef CONFIG_SMP
>  extern void cpu_load_update_active(struct rq *this_rq);