From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933503AbcA0REC (ORCPT ); Wed, 27 Jan 2016 12:04:02 -0500 Received: from mail-wm0-f52.google.com ([74.125.82.52]:38476 "EHLO mail-wm0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932303AbcA0RD6 (ORCPT ); Wed, 27 Jan 2016 12:03:58 -0500 Message-ID: <1453914234.3517.32.camel@gmail.com> Subject: Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle) From: Mike Galbraith To: Michal Hocko Cc: Christoph Lameter , Peter Zijlstra , LKML Date: Wed, 27 Jan 2016 18:03:54 +0100 In-Reply-To: <20160127164825.GF13951@dhcp22.suse.cz> References: <20160121082402.GA29520@dhcp22.suse.cz> <20160121165148.GF29520@dhcp22.suse.cz> <20160122140418.GB19465@dhcp22.suse.cz> <20160122161201.GC19465@dhcp22.suse.cz> <1453566115.3529.8.camel@gmail.com> <20160127164825.GF13951@dhcp22.suse.cz> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.16.5 Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 2016-01-27 at 17:48 +0100, Michal Hocko wrote: > On Sat 23-01-16 17:21:55, Mike Galbraith wrote: > > Hi Christoph, > > > > While you're fixing that commit up, can you perhaps find a better > > home > > for quiet_vmstat()? It not only munches cycles when switching > > cross > > -core mightily, for -rt it injects a sleeping lock into the idle > > task. > > > > 12.89% [kernel] [k] refresh_cpu_vm_stats.isra.12 > > 4.75% [kernel] [k] __schedule > > 4.70% [kernel] [k] mutex_unlock > > 3.14% [kernel] [k] __switch_to > > What about the following fix? I haven't done _extensive_ testing, but 4.5-rc1-rt (with NO_HZ_FULL enabling hacks restored, as it's otherwise a dead .config item) is gripe free whether nohz_full CPUs are in use or not, and overhead went away in it and desktop configured master kernel (PREEMPT_VOLUNTARY). > --- > From c74a04c4fdfe1fa67933bb1ac83d3de0532aaab2 Mon Sep 17 00:00:00 > 2001 > From: Michal Hocko > Date: Wed, 27 Jan 2016 17:24:22 +0100 > Subject: [PATCH] mm, vmstat: make quiet_vmstat lighter > > Mike has reported a considerable overhead of refresh_cpu_vm_stats > from > the idle entry during pipe test: > 12.89% [kernel] [k] refresh_cpu_vm_stats.isra.12 > 4.75% [kernel] [k] __schedule > 4.70% [kernel] [k] mutex_unlock > 3.14% [kernel] [k] __switch_to > > This is caused by 0eb77e988032 ("vmstat: make vmstat_updater > deferrable > again and shut down on idle") which has placed quiet_vmstat into > cpu_idle_loop. The main reason here seems to be that the idle entry > has > to get over all zones and perform atomic operations for each vmstat > entry even though there might be no per cpu diffs. This is a > pointless > overhead for _each_ idle entry. > > Make sure that quiet_vmstat is as light as possible. > > First of all it doesn't make any sense to do any local sync if the > current cpu is already set in oncpu_stat_off because vmstat_update > puts > itself there only if there is nothing to do. > > Then we can check need_update which should be a cheap way to check > for > potential per-cpu diffs and only then do refresh_cpu_vm_stats. > > The original patch also did cancel_delayed_work which we are not > doing > here. There are two reasons for that. Firstly cancel_delayed_work > from > idle context will blow up on RT kernels (reported by Mike): > [ 2.279582] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7 > [ 2.280444] Hardware name: MEDION MS-7848/MS-7848, BIOS > M7848W08.20C 09/23/2013 > [ 2.281316] ffff88040b00d640 ffff88040b01fe10 ffffffff812d20e2 > 0000000000000000 > [ 2.282202] ffff88040b01fe30 ffffffff81081095 ffff88041ec4cee0 > ffff88041ec501e0 > [ 2.283073] ffff88040b01fe48 ffffffff815ff910 ffff88041ec4cee0 > ffff88040b01fe88 > [ 2.283941] Call Trace: > [ 2.284797] [] dump_stack+0x49/0x67 > [ 2.285658] [] ___might_sleep+0xf5/0x180 > [ 2.286521] [] rt_spin_lock+0x20/0x50 > [ 2.287382] [] try_to_grab_pending+0x69/0x240 > [ 2.288239] [] cancel_delayed_work+0x26/0xe0 > [ 2.289094] [] quiet_vmstat+0x75/0xa0 > [ 2.289949] [] cpu_idle_loop+0x38/0x3e0 > [ 2.290800] [] cpu_startup_entry+0x13/0x20 > [ 2.291647] [] start_secondary+0x114/0x140 > > And secondly, even on !RT kernels it might add some non trivial > overhead > which is not necessary. Even if the vmstat worker wakes up and > preempts > idle then it will be most likely a single shot noop because the stats > were already synced and so it would end up on the oncpu_stat_off > anyway. > We just need to teach both vmstat_shepherd and vmstat_update to stop > scheduling the worker if there is nothing to do. > > Reported-by: Mike Galbraith > Signed-off-by: Michal Hocko > --- > mm/vmstat.c | 60 ++++++++++++++++++++++++++++++++++++++++----------- > --------- > 1 file changed, 40 insertions(+), 20 deletions(-) > > diff --git a/mm/vmstat.c b/mm/vmstat.c > index 40b2c74ddf16..7747f85537b6 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -1396,10 +1396,15 @@ static void vmstat_update(struct work_struct > *w) > * Counters were updated so we expect more updates > * to occur in the future. Keep on running the > * update worker thread. > + * If we were marked on cpu_stat_off clear the flag > + * so that vmstat_shepherd doesn't schedule us > again. > */ > - queue_delayed_work_on(smp_processor_id(), vmstat_wq, > - this_cpu_ptr(&vmstat_work), > - round_jiffies_relative(sysctl_stat_interval) > ); > + if (!cpumask_test_and_clear_cpu(smp_processor_id(), > + cpu_stat_off)) { > + queue_delayed_work_on(smp_processor_id(), > vmstat_wq, > + this_cpu_ptr(&vmstat_work), > + round_jiffies_relative(sysctl_stat_i > nterval)); > + } > } else { > /* > * We did not update any counters so the app may be > in > @@ -1417,18 +1422,6 @@ static void vmstat_update(struct work_struct > *w) > * until the diffs stay at zero. The function is used by NOHZ and > can only be > * invoked when tick processing is not active. > */ > -void quiet_vmstat(void) > -{ > - if (system_state != SYSTEM_RUNNING) > - return; > - > - do { > - if (!cpumask_test_and_set_cpu(smp_processor_id(), > cpu_stat_off)) > - cancel_delayed_work(this_cpu_ptr(&vmstat_wor > k)); > - > - } while (refresh_cpu_vm_stats(false)); > -} > - > /* > * Check if the diffs for a certain cpu indicate that > * an update is needed. > @@ -1452,6 +1445,30 @@ static bool need_update(int cpu) > return false; > } > > +void quiet_vmstat(void) > +{ > + if (system_state != SYSTEM_RUNNING) > + return; > + > + /* > + * If we are already in hands of the shepherd then there > + * is nothing for us to do here. > + */ > + if (cpumask_test_and_set_cpu(smp_processor_id(), > cpu_stat_off)) > + return; > + > + if (!need_update(smp_processor_id())) > + return; > + > + /* > + * Just refresh counters and do not care about the pending > delayed > + * vmstat_update. It doesn't fire that often to matter and > canceling > + * it would be too expensive from this path. > + * vmstat_shepherd will take care about that for us. > + */ > + refresh_cpu_vm_stats(false); > +} > + > > /* > * Shepherd worker thread that checks the > @@ -1470,11 +1487,14 @@ static void vmstat_shepherd(struct > work_struct *w) > get_online_cpus(); > /* Check processors whose vmstat worker threads have been > disabled */ > for_each_cpu(cpu, cpu_stat_off) > - if (need_update(cpu) && > - cpumask_test_and_clear_cpu(cpu, > cpu_stat_off)) > - > - queue_delayed_work_on(cpu, vmstat_wq, > - &per_cpu(vmstat_work, cpu), 0); > + if (need_update(cpu)) { > + if (cpumask_test_and_clear_cpu(cpu, > cpu_stat_off)) > + queue_delayed_work_on(cpu, > vmstat_wq, > + &per_cpu(vmstat_work, cpu), > 0); > + } else { > + cpumask_set_cpu(cpu, cpu_stat_off); > + cancel_delayed_work(this_cpu_ptr(&vmstat_wor > k)); > + } > > put_online_cpus(); > > -- > 2.7.0.rc3 >