From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750717AbWGaOiS (ORCPT ); Mon, 31 Jul 2006 10:38:18 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1750770AbWGaOiS (ORCPT ); Mon, 31 Jul 2006 10:38:18 -0400 Received: from e5.ny.us.ibm.com ([32.97.182.145]:35747 "EHLO e5.ny.us.ibm.com") by vger.kernel.org with ESMTP id S1750717AbWGaOiQ (ORCPT ); Mon, 31 Jul 2006 10:38:16 -0400 Date: Mon, 31 Jul 2006 07:38:50 -0700 From: "Paul E. McKenney" To: Bill Huey Cc: Esben Nielsen , linux-kernel@vger.kernel.org, tglx@linutronix.de, rostedt@goodmis.org, dipankar@in.ibm.com, mingo@elte.hu, tytso@us.ibm.com, dvhltc@us.ibm.com Subject: Re: [RFC, PATCH, -rt] Early prototype RCU priority-boost patch Message-ID: <20060731143850.GA1293@us.ibm.com> Reply-To: paulmck@us.ibm.com References: <20060728001918.GA2634@us.ibm.com> <20060728155220.GC1289@us.ibm.com> <20060728222716.GA13794@gnuppy.monkey.org> <20060729021829.GN1289@us.ibm.com> <20060729025037.GB15392@gnuppy.monkey.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20060729025037.GB15392@gnuppy.monkey.org> User-Agent: Mutt/1.4.1i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jul 28, 2006 at 07:50:37PM -0700, Bill Huey wrote: > On Fri, Jul 28, 2006 at 07:18:29PM -0700, Paul E. McKenney wrote: > > On Fri, Jul 28, 2006 at 03:27:16PM -0700, Bill Huey wrote: > > > What is that ? like randomly boosting without tracking which thread is > > > inside an RCU critical section ? > > > > Perhaps a better way to put it would be that a thread preempted in > > an RCU read-side critical section boosts itself, and tracks the fact > > that it boosted itself in its tasks structure. > > > > The second boost would be from some other task, but if the task had > > already boosted itself, the de-boosting would already be taken care of > > at the next rcu_read_unlock() -- but as mentioned earlier in this > > thread, you only boost someone else if they are not currently running. > > The problem here is that I can't see how it's going to boost the thread > if the things doing the RCU sync can't track the list of readers. It > might be record in the trask struct, now what ? The first boost is performed by the task itself the first time there is a preemption attempt (or the first time it blocks on a mutex), so no need to track the list of readers in that case. The trick is that there is no benefit to boosting someone who is already running -- we only need to boost the first time they are considering blocking. If there is a need for a second "boost to the sky" in case of excessively delayed grace period (or to provide deterministic synchronize_rcu() latency), then we need a list only of those RCU readers who have attempted to block at least once thus far in their current RCU read-side critical section. But I was putting this off until I get the simple case right. Cowardly of me, I know! ;-) Finally found Steve Rostedt's PI document (in 2.6.18-rc2), very helpful (though I suppose I should reserve judgement until after I get this working...) > > > > 2. RCU reader boosting a lock holder. This ends up being a > > > > combination of #1 (because the act of blocking on a lock implies > > > > an "out of nowhere" priority boost) and normal lock boosting. > > > > > > Lock holder as in mutex held below and RCU critical section ? > > > > Lock holder as in task 0 holds the lock, perhaps in an RCU read-side > > critical section and perhaps not. Task 1 is in an RCU read-side > > critical section and attempts to acquire the lock. Task 1 must block, > > because task 0 still holds the lock. Task 1 must boost itself before > > blocking, and must donate its boosted priority to task 0. > > Ok (thinking...) > > > > > 3. A call_rcu() or synchronize_rcu() boosting all readers. I am > > > > not sure we really need this, but in case we do... One would > > > > need an additional prio_booster for each task to be boosted, > > > > right? This would seem to require an additional prio_booster > > > > struct in each task structure. > > > > > > This needs a notion of RCU read side ownership to boost those preempted > > > threads. > > > > I am getting the impression that #3 is something to leave aside for now. > > ... > > > The idea is that none of this stuff ever happens except in cases where > > the RCU read-side critical section blocks, in which case all this is > > ... > > > in the noise compared to the context switch. The sole exception to > > this is that rcu_read_unlock() must check to see if it has been boosted, > > and deboost itself if so. I don't particularly like the additional > > comparison, but it should not be too expensive. > > Oh, blocks as is gets shoved into a wait queue for a PI enabled lock. Yep. > > > Don't know what to think about it other than some kind of tracking or > > > boosting logic in the per CPU run queue or the task struct itself during > > > the boost operation. But you're still stuck with the problem of what > > > to boost and how to find that out during an RCU sync side. It's still > > > an ownership problem unless Esben can think of another way of getting > > > around that problem. > > > > One idea is to put tasks that block in RCU read-side critical sections > > on a list -- again, the hope is that the overhead is in the noise compared > > to the context switch. > > Only way to find out is to try it. Or to try without it and see what happens. > > > That's why I suggested a priority ceiling or per CPU priority threshold > > > tracking (+ CPU binding) the priority of the irq-threads and stuff. It's > > > a simple hack to restore the cheesy preempt count stuff without having > > > to revert to invasive ownership tracking for each reader. > > > > > > It's just an idea. Maybe it'll be useful to you. > > > > Let me make sure I understand what you are suggesting -- sounds to me > > like a check in preempt_schedule(). If the task to be preempted is > > higher priority than the ceiling, the preemption request is refused. > > > > Or am I missing part of your proposal? > > Something like at all preemption points, cond_resched() and friends (scheduler > tick) to an additional check against a value in a threads own CPU run queue > struct to see if it should permit the preemption or not. I'm thinking about > ways to avoid doing an expensive run queue lock during a task's priority > manipulation and instead have some other kind of logic orthogonal to that so > that it can bypass this overhead. A value in a run queue that can be checked > against in order to prevent a preemption from happen might be able to side > step the need for a doing a full run queue lock to reorder a tasks priority > ranking. > > If a ceiling or threshold was defined for RCU (another somewhat complicated > topic) it could prevent the RCU critical section from preempting other > than with other SCHED_FIFO tasks at and above that priority, if you choose > a threshold at that priority. That'll be apart of the runtime configuratio > of the system. You'd have to cpu_get/put to get that value so that you get > at it safely, read or write to it, and maybe save and restore that value on > entry and exit respectively. You'll also have to set a field in the task > struct to prevent it from migration to another CPU and make sure that's > modifying the right stuff on the right CPU. > > It's a possible solution to a rather difficult problem. What do you think ? > too much of a hack ? I am not sure -- seems to be a dual approach to boosting the RCU reader's priority in the preemption case. I suspect that a real priority boost would still be needed in the case where the RCU reader blocks on a mutex, since we need the priority inheritance to happen in that case, right? > (I'm into -rt development again after a good OLS and I'm trying to get my > kernel development up and going so that I can help out) Sounds very good!!! Thanx, Paul