Date: Mon, 25 Jan 2016 21:13:20 +0100
From: Michal Hocko
To: Christoph Lameter
Cc: Mike Galbraith, Peter Zijlstra, LKML
Subject: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
Message-ID: <20160125201319.GA19020@dhcp22.suse.cz>
References: <20160121165148.GF29520@dhcp22.suse.cz> <20160122140418.GB19465@dhcp22.suse.cz> <20160122161201.GC19465@dhcp22.suse.cz> <1453566115.3529.8.camel@gmail.com> <20160125174224.GH23934@dhcp22.suse.cz>
In-Reply-To: (unspecified)
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon 25-01-16 12:02:06, Christoph Lameter wrote:
> On Mon, 25 Jan 2016, Michal Hocko wrote:
>
> > On Sat 23-01-16 17:21:55, Mike Galbraith wrote:
> > > Hi Christoph,
> > >
> > > While you're fixing that commit up, can you perhaps find a better home
> > > for quiet_vmstat()?  It not only munches cycles when switching cross
> > > -core mightily, for -rt it injects a sleeping lock into the idle task.
> > >
> > >  12.89%  [kernel]  [k] refresh_cpu_vm_stats.isra.12
> > >   4.75%  [kernel]  [k] __schedule
> > >   4.70%  [kernel]  [k] mutex_unlock
> > >   3.14%  [kernel]  [k] __switch_to
> >
> > Hmm, I wouldn't have expected that refresh_cpu_vm_stats could have
> > such a large footprint. I guess this would be just an expensive noop
> > because we have to check all the zones*counters and do an expensive
> > this_cpu_xchg. Is the whole deferred thing worth this overhead?
> Why would the deferring cause this overhead?

I guess the profile speaks for itself, doesn't it?

> Also there is no cross core activity from quiet_vmstat(). It simply
> disables the local vmstat updates.

It doesn't go cross core, but it still does nr_zones * counters atomic
ops.

> > Unless there is a clear and huge win from doing the vmstat update
> > deferrable then I think a revert is more appropriate IMHO.
>
> It reduces the OS events that the application experiences by folding it
> into the tick events. If its not deferrable then a timer event will be
> generated in addition to the tick. We do not want that.

Yes, this is what I have read in the changelog. But the "how much" part
is really missing. Is this even quantifiable?

> Workqueues are used in many places. If RT can sleep within workqueue
> management functions then spinlocks cannot be taken anymore and there may
> be issues with preemption.

RT can sleep in _any_ spinlock except for raw spinlocks. Whether the
!RT kernel sleeps here doesn't really matter much, because
cancel_delayed_work is quite a heavy function which shouldn't be called
from the idle context AFAIU. Sure, most of the time it will boil down
to del_timer, but it can hit the slowpath as well if the timer got
migrated to a different CPU and we have to race with the WQ pool
management IIUC.

Maybe this overhead can be reduced by outsourcing the functionality to
vmstat_shepherd, which can check idle CPUs, cancel the timer for them,
update the differentials, and put them back into cpu_stat_off?

> The regression that I know of (independent of "RT") is due as far as I
> know due to the switch of the parameters of some vmstat functions to 64
> bit instead of 32 bit.

I am not sure I am following.
--
Michal Hocko
SUSE Labs