Date: Mon, 25 Jan 2016 21:13:20 +0100
From: Michal Hocko
To: Christoph Lameter
Cc: Mike Galbraith, Peter Zijlstra, LKML
Subject: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
Message-ID: <20160125201319.GA19020@dhcp22.suse.cz>
References: <20160121165148.GF29520@dhcp22.suse.cz> <20160122140418.GB19465@dhcp22.suse.cz> <20160122161201.GC19465@dhcp22.suse.cz> <1453566115.3529.8.camel@gmail.com> <20160125174224.GH23934@dhcp22.suse.cz>
In-Reply-To: (unspecified)
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon 25-01-16 12:02:06, Christoph Lameter wrote:
> On Mon, 25 Jan 2016, Michal Hocko wrote:
>
> > On Sat 23-01-16 17:21:55, Mike Galbraith wrote:
> > > Hi Christoph,
> > >
> > > While you're fixing that commit up, can you perhaps find a better home
> > > for quiet_vmstat()?  It not only munches cycles when switching cross
> > > -core mightily, for -rt it injects a sleeping lock into the idle task.
> > >
> > >  12.89%  [kernel]  [k] refresh_cpu_vm_stats.isra.12
> > >   4.75%  [kernel]  [k] __schedule
> > >   4.70%  [kernel]  [k] mutex_unlock
> > >   3.14%  [kernel]  [k] __switch_to
> >
> > Hmm, I wouldn't have expected that refresh_cpu_vm_stats could have
> > such a large footprint. I guess this would be just an expensive noop
> > because we have to check all the zones*counters and do an expensive
> > this_cpu_xchg. Is the whole deferred thing worth this overhead?
> Why would the deferring cause this overhead?

I guess the profile speaks for itself, doesn't it?

> Also there is no cross core activity from quiet_vmstat(). It simply
> disables the local vmstat updates.

It doesn't go cross core, but it still does nr_zones * counters atomic
ops.

> > Unless there is a clear and huge win from doing the vmstat update
> > deferrable then I think a revert is more appropriate IMHO.
>
> It reduces the OS events that the application experiences by folding it
> into the tick events. If its not deferrable then a timer event will be
> generated in addition to the tick. We do not want that.

Yes, this is what I have read in the changelog. But the "how much" part
is really missing. Is this even quantifiable?

> Workqueues are used in many places. If RT can sleep within workqueue
> management functions then spinlocks cannot be taken anymore and there may
> be issues with preemption.

RT can sleep in _any_ spinlock except for raw spinlocks. Whether the
!RT kernel sleeps here doesn't really matter much, because
cancel_delayed_work is quite a heavy function which shouldn't be called
from the idle context AFAIU. Sure, most of the time it will boil down
to del_timer, but it can hit the slowpath as well if the timer got
migrated to a different CPU and we have to race with the WQ pool
management IIUC.

Maybe this overhead can be reduced by outsourcing the functionality to
vmstat_shepherd, which can check idle CPUs, cancel the timer for them,
update the differentials, and put them back into cpu_stat_off?

> The regression that I know of (independent of "RT") is due as far as I
> know due to the switch of the parameters of some vmstat functions to 64
> bit instead of 32 bit.

I am not sure I am following.
--
Michal Hocko
SUSE Labs