From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Metcalf
Subject: Re: [PATCH v4 1/5] nohz_full: add support for "cpu_isolated" mode
Date: Fri, 24 Jul 2015 16:22:07 -0400
Message-ID: <55B29E6F.7020600@ezchip.com>
References: <1436817481-8732-1-git-send-email-cmetcalf@ezchip.com>
 <1436817481-8732-2-git-send-email-cmetcalf@ezchip.com>
 <55A4271B.9040506@ezchip.com>
 <55AE993E.6040501@ezchip.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Sender: linux-kernel-owner@vger.kernel.org
To: Andy Lutomirski, Paul McKenney
Cc: Gilad Ben Yossef, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
 Andrew Morton, Rik van Riel, Tejun Heo, Frederic Weisbecker,
 Thomas Gleixner, Christoph Lameter, Viresh Kumar,
 "linux-doc@vger.kernel.org", Linux API, "linux-kernel@vger.kernel.org"
List-Id: linux-api@vger.kernel.org

On 07/21/2015 03:26 PM, Andy Lutomirski wrote:
> On Tue, Jul 21, 2015 at 12:10 PM, Chris Metcalf wrote:
>> So just for the sake of precision, the thing I'm talking about
>> is the lru_add_drain() call on kernel exit. Are you proposing
>> that we call that for every nohz_full core on kernel exit?
>> I'm not opposed to this, but I don't know if other nohz
>> developers feel like this is the right tradeoff.
> I'm proposing either that we do that or that we arrange for other cpus
> to be able to steal our LRU list while we're in RCU user/idle.

That seems challenging; there is a lot that has to be done in
lru_add_drain() and we may not want to do it for the "soft
isolation" mode Frederic alludes to in a later email. And, we
would have to add a bunch of locking to allow another process
to steal the list from under us, so that's not obviously going to
be a performance win in terms of the per-cpu page cache for
normal operations.

Perhaps there could be a lock taken that nohz_full processes have
to take just to exit from userspace, and that other tasks could take
to do things on behalf of the nohz_full process that it thinks it can
do locklessly. It gets complicated, since you'd want to tie that to
whether the nohz_full process was currently in the kernel or not,
so some kind of atomic update on the context_tracking state or
some such, perhaps. Still not really clear if that overhead is worth
it (both from a maintenance point of view and the possible
performance hit).

Limiting it just to the hard isolation mode seems like a good
answer since there we really know that userspace does not care
about the performance implications of kernel/userspace transitions,
and it doesn't cause slowdowns to anyone else.

For now I will bundle it in with my respin as part of the
"hard isolation" mode Frederic proposed.

>> Well, in principle if we accepted my proposed patch series
>> and then over time came to decide that it was reasonable
>> for nohz_full to have these complete cpu isolation
>> semantics, the one proposed ABI simply becomes a no-op.
>> So it's not as problematic an ABI as some.
> What if we made it a debugfs thing instead of a prctl? Have a mode
> where the system tries really hard to quiesce itself even at the cost
> of performance.

No, since it's really a mode within an individual task that you'd
like to switch on and off depending on what the task is trying
to do - strict mode while it's running its main fast-path userspace
code, but certainly not strict mode during its setup, and possibly
leaving strict mode to run some kinds of slow-path, diagnostic,
or error-handling code.

-- 
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com