From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 31 Jul 2014 14:11:26 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: josh@joshtriplett.org
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org, laijs@cn.fujitsu.com,
	dipankar@in.ibm.com, akpm@linux-foundation.org,
	mathieu.desnoyers@efficios.com, tglx@linutronix.de,
	peterz@infradead.org, rostedt@goodmis.org, dhowells@redhat.com,
	edumazet@google.com, dvhart@linux.intel.com, fweisbec@gmail.com,
	oleg@redhat.com, bobby.prani@gmail.com
Subject: Re: [PATCH v2 tip/core/rcu 0/10] RCU-tasks implementation
Message-ID: <20140731211126.GE11241@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20140731003914.GA3872@linux.vnet.ibm.com>
	<20140731161902.GA12697@cloud>
	<20140731165843.GV11241@linux.vnet.ibm.com>
	<20140731172024.GF12697@cloud>
	<20140731183816.GA11241@linux.vnet.ibm.com>
	<20140731205817.GB13837@cloud>
In-Reply-To: <20140731205817.GB13837@cloud>
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 31, 2014 at 01:58:17PM -0700, josh@joshtriplett.org wrote:
> On Thu, Jul 31, 2014 at 11:38:16AM -0700, Paul E. McKenney wrote:
> > On Thu, Jul 31, 2014 at 10:20:24AM -0700, josh@joshtriplett.org wrote:
> > > On Thu, Jul 31, 2014 at 09:58:43AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Jul 31, 2014 at 09:19:02AM -0700, josh@joshtriplett.org wrote:
> > > > > On Wed, Jul 30, 2014 at 05:39:14PM -0700, Paul E. McKenney wrote:
> > > > > > This series provides a prototype of an RCU-tasks implementation,
> > > > > > which has been requested to assist with trampoline removal.  This
> > > > > > flavor of RCU is task-based rather than CPU-based, and has
> > > > > > voluntary context switch, usermode execution, and the idle loops
> > > > > > as its only quiescent states.  This selection of quiescent states
> > > > > > ensures that at the end of a grace period, there will no longer
> > > > > > be any tasks depending on a trampoline that was removed before
> > > > > > the beginning of that grace period.  This works because such
> > > > > > trampolines do not contain function calls, do not contain
> > > > > > voluntary context switches, do not switch to usermode, and do
> > > > > > not switch to idle.
> > > > >
> > > > > I'm concerned about the amount of system overhead this introduces.
> > > > > Polling for holdout tasks seems quite excessive.  If I understand
> > > > > the intended use case correctly, the users of this will want to
> > > > > free relatively small amounts of memory; thus, waiting a while to
> > > > > do so seems fine, especially if the system isn't under any
> > > > > particular memory pressure.
> > > > >
> > > > > Thus, rather than polling, could you simply flag the holdout
> > > > > tasks, telling the scheduler "hey, next time you don't have
> > > > > anything better to do..."?  Then don't bother with them again
> > > > > unless the system runs low on memory and asks you to free some.
> > > > > (And mandate that you can only use this to free memory rather
> > > > > than for any other purpose.)
> > > >
> > > > One of my many alternative suggestions that Steven rejected was
> > > > to simply leak the memory.
> > > > ;-)
> > > >
> > > > But from what I can see, if we simply flag the holdout tasks, we
> > > > either are also holding onto the task_struct structures,
> > > > re-introducing concurrency to the list of holdout tasks, or
> > > > requiring that the eventual scan for holdout tasks scan the entire
> > > > task list.  Neither of these seems particularly appetizing to me.
> > > >
> > > > The nice thing about Lai Jiangshan's suggestion is that it allows
> > > > the scan of the holdout list to be done completely unsynchronized,
> > > > which allows pauses during the scan, thus allowing the loop to
> > > > check for competing work on that CPU.  This should get almost all
> > > > the effect of indefinite delay without the indefinite delay (at
> > > > least in the common case).
> > > >
> > > > Or am I missing something here?
> > >
> > > If you only allow a single outstanding set of callbacks at a time,
> > > you could have a single flag stored in the task, combined with a
> > > count stored with the set of callbacks.  Each time one of the
> > > holdout tasks comes up, clear the flag and decrement the count.  If
> > > and only if you get asked to free up memory, start poking the
> > > scheduler to bring up those tasks.  When the count hits 0, free the
> > > memory.
> > >
> > > The set of trampolines won't change often, and presumably only
> > > changes in response to user-driven requests to trace or stop
> > > tracing things.  So, if you have to wait for the existing set of
> > > callbacks to go away before adding more, that seems fine.  And you
> > > could then ditch polling entirely.
> >
> > If I understand what you are suggesting, this requires hooks in the
> > scheduler.  I used to have hooks in the scheduler, but I dropped
> > them in favor of polling the voluntary context-switch count in
> > response to Peter Zijlstra's concerns about adding overhead to the
> > scheduler's fastpaths.
> >
> > Therefore, although the flags are sometimes cleared externally from
> > the scheduling-clock interrupt (for usermode execution), it is quite
> > possible that a given task might never have its flag cleared
> > asynchronously.
> >
> > Another approach might be to poll more slowly or to make the polling
> > evict itself if it detects that this CPU has something else to do.
> > Would either or both of these help?
>
> As discussed at lunch today, another option would be to drop the
> thread and handle cleanup synchronously from the caller in the
> tracing code, or fire off a kthread *on request* to handle it
> asynchronously.  That would avoid paying the startup cost and
> overhead on a system that has tracing in the kernel but never uses
> it, as will likely occur with distro kernels.

Good points!  As discussed, I am more inclined towards the second
approach than towards the first, but let me get v3 tested and posted
and then see how things look.

							Thanx, Paul