Re: [RFC, PATCH, -rt] Early prototype RCU priority-boost patch

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: Bill Huey <billh@gnuppy.monkey.org>
Cc: Esben Nielsen <nielsen.esben@googlemail.com>,
	linux-kernel@vger.kernel.org, tglx@linutronix.de,
	rostedt@goodmis.org, dipankar@in.ibm.com, mingo@elte.hu,
	tytso@us.ibm.com, dvhltc@us.ibm.com
Subject: Re: [RFC, PATCH, -rt] Early prototype RCU priority-boost patch
Date: Mon, 31 Jul 2006 07:38:50 -0700	[thread overview]
Message-ID: <20060731143850.GA1293@us.ibm.com> (raw)
In-Reply-To: <20060729025037.GB15392@gnuppy.monkey.org>

On Fri, Jul 28, 2006 at 07:50:37PM -0700, Bill Huey wrote:
> On Fri, Jul 28, 2006 at 07:18:29PM -0700, Paul E. McKenney wrote:
> > On Fri, Jul 28, 2006 at 03:27:16PM -0700, Bill Huey wrote:
> > > What is that ? like randomly boosting without tracking which thread is
> > > inside an RCU critical section ?
> > 
> > Perhaps a better way to put it would be that a thread preempted in
> > an RCU read-side critical section boosts itself, and tracks the fact
> > that it boosted itself in its tasks structure.
> > 
> > The second boost would be from some other task, but if the task had 
> > already boosted itself, the de-boosting would already be taken care of
> > at the next rcu_read_unlock() -- but as mentioned earlier in this
> > thread, you only boost someone else if they are not currently running.
> 
> The problem here is that I can't see how it's going to boost the thread
> if the things doing the RCU sync can't track the list of readers. It
> might be record in the trask struct, now what ?

The first boost is performed by the task itself the first time there
is a preemption attempt (or the first time it blocks on a mutex), so
no need to track the list of readers in that case.  The trick is that
there is no benefit to boosting someone who is already running -- we
only need to boost the first time they are considering blocking.

If there is a need for a second "boost to the sky" in case of excessively
delayed grace period (or to provide deterministic synchronize_rcu()
latency), then we need a list only of those RCU readers who have attempted
to block at least once thus far in their current RCU read-side critical
section.  But I was putting this off until I get the simple case right.
Cowardly of me, I know!  ;-)

Finally found Steve Rostedt's PI document (in 2.6.18-rc2), very helpful
(though I suppose I should reserve judgement until after I get this
working...)

> > > > 2.	RCU reader boosting a lock holder.  This ends up being a
> > > > 	combination of #1 (because the act of blocking on a lock implies
> > > > 	an "out of nowhere" priority boost) and normal lock boosting.
> > > 
> > > Lock holder as in mutex held below and RCU critical section ?
> > 
> > Lock holder as in task 0 holds the lock, perhaps in an RCU read-side
> > critical section and perhaps not.  Task 1 is in an RCU read-side
> > critical section and attempts to acquire the lock.  Task 1 must block,
> > because task 0 still holds the lock.  Task 1 must boost itself before
> > blocking, and must donate its boosted priority to task 0.
> 
> Ok (thinking...)
> 
> > > > 3.	A call_rcu() or synchronize_rcu() boosting all readers.  I am
> > > > 	not sure we really need this, but in case we do...  One would
> > > > 	need an additional prio_booster for each task to be boosted,
> > > > 	right?  This would seem to require an additional prio_booster
> > > > 	struct in each task structure.
> > >  
> > > This needs a notion of RCU read side ownership to boost those preempted
> > > threads.
> > 
> > I am getting the impression that #3 is something to leave aside for now.
> 
> ... 
> 
> > The idea is that none of this stuff ever happens except in cases where
> > the RCU read-side critical section blocks, in which case all this is
> 
> ...
> 
> > in the noise compared to the context switch.  The sole exception to
> > this is that rcu_read_unlock() must check to see if it has been boosted,
> > and deboost itself if so.  I don't particularly like the additional
> > comparison, but it should not be too expensive.
> 
> Oh, blocks as is gets shoved into a wait queue for a PI enabled lock.

Yep.

> > > Don't know what to think about it other than some kind of tracking or
> > > boosting logic in the per CPU run queue or the task struct itself during
> > > the boost operation. But you're still stuck with the problem of what
> > > to boost and how to find that out during an RCU sync side. It's still
> > > an ownership problem unless Esben can think of another way of getting
> > > around that problem.
> > 
> > One idea is to put tasks that block in RCU read-side critical sections
> > on a list -- again, the hope is that the overhead is in the noise compared
> > to the context switch.
> 
> Only way to find out is to try it.

Or to try without it and see what happens.

> > > That's why I suggested a priority ceiling or per CPU priority threshold
> > > tracking (+ CPU binding) the priority of the irq-threads and stuff. It's
> > > a simple hack to restore the cheesy preempt count stuff without having
> > > to revert to invasive ownership tracking for each reader.
> > > 
> > > It's just an idea. Maybe it'll be useful to you.
> > 
> > Let me make sure I understand what you are suggesting -- sounds to me
> > like a check in preempt_schedule().  If the task to be preempted is
> > higher priority than the ceiling, the preemption request is refused.
> > 
> > Or am I missing part of your proposal?
> 
> Something like at all preemption points, cond_resched() and friends (scheduler
> tick) to an additional check against a value in a threads own CPU run queue
> struct to see if it should permit the preemption or not. I'm thinking about
> ways to avoid doing an expensive run queue lock during a task's priority
> manipulation and instead have some other kind of logic orthogonal to that so
> that it can bypass this overhead.  A value in a run queue that can be checked
> against in order to prevent a preemption from happen might be able to side
> step the need for a doing a full run queue lock to reorder a tasks priority
> ranking.
> 
> If a ceiling or threshold was defined for RCU (another somewhat complicated
> topic) it could prevent the RCU critical section from preempting other
> than with other SCHED_FIFO tasks at and above that priority, if you choose
> a threshold at that priority. That'll be apart of the runtime configuratio
> of the system. You'd have to cpu_get/put to get that value so that you get
> at it safely, read or write to it, and maybe save and restore that value on
> entry and exit respectively. You'll also have to set a field in the task
> struct to prevent it from migration to another CPU and make sure that's
> modifying the right stuff on the right CPU.
> 
> It's a possible solution to a rather difficult problem. What do you think ?
> too much of a hack ?

I am not sure -- seems to be a dual approach to boosting the RCU reader's
priority in the preemption case.  I suspect that a real priority boost
would still be needed in the case where the RCU reader blocks on a mutex,
since we need the priority inheritance to happen in that case, right?

> (I'm into -rt development again after a good OLS and I'm trying to get my
> kernel development up and going so that I can help out)

Sounds very good!!!

							Thanx, Paul

next prev parent reply	other threads:[~2006-07-31 14:38 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-07-28  0:19 [RFC, PATCH, -rt] Early prototype RCU priority-boost patch Paul E. McKenney
2006-07-28 11:38 ` Esben Nielsen
2006-07-28 15:52   ` Paul E. McKenney
2006-07-28 20:00     ` Esben Nielsen
2006-07-29  2:20       ` Paul E. McKenney
2006-07-28 22:27     ` Bill Huey
2006-07-29  2:18       ` Paul E. McKenney
2006-07-29  2:50         ` Bill Huey
2006-07-31 14:38           ` Paul E. McKenney [this message]
2006-07-31 22:22             ` Bill Huey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060731143850.GA1293@us.ibm.com \
    --to=paulmck@us.ibm.com \
    --cc=billh@gnuppy.monkey.org \
    --cc=dipankar@in.ibm.com \
    --cc=dvhltc@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=nielsen.esben@googlemail.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=tytso@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.