public inbox for linux-kernel@vger.kernel.org
* [RFC, PATCH, -rt] Early prototype RCU priority-boost patch
@ 2006-07-28  0:19 Paul E. McKenney
  2006-07-28 11:38 ` Esben Nielsen
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2006-07-28  0:19 UTC (permalink / raw)
  To: linux-kernel, tglx
  Cc: rostedt, dipankar, billh, nielsen.esben, mingo, tytso, dvhltc

Hello!

This is a very crude not-for-inclusion patch that boosts priority of
RCU read-side critical sections, but only when they are preempted, and
only to the highest non-RT priority.  The rcu_read_unlock() primitive
does the unboosting.  There are a large number of things that this patch
does -not- do, including:

o	Boost RCU read-side critical sections to the highest possible
	priority.  One might wish to do this in OOM situations.  Or
	if the grace period is extending too long.  I played with this
	a bit some months back, see:

		http://www.rdrop.com/users/paulmck/patches/RCUboost-20.patch

	to see what I was thinking.  Or similarly-numbered patches,
	see http://www.rdrop.com/users/paulmck/patches for the full
	list.  Lots of subtly broken approaches for those who are
	interested in subtle breakage.

	One must carefully resolve races between boosting and the
	to-be-boosted task slipping out of its RCU read-side critical
	section.  My thought has been to grab the to-be-boosted task
	by the throat, and only boost it if it is (1) still in an
	RCU read-side critical section and (2) not running.  If you
	try boosting a thread that is already running, the races between
	boosting and rcu_read_unlock() are insanely complex, particularly
	for implementations of rcu_read_unlock() that don't use atomic
	instructions or memory barriers.  ;-)

	Much better to either have the thread boost itself or to make
	sure the thread is not running if having someone else boost it.

o	Boost RCU read-side critical sections that must block waiting
	for a non-raw spinlock.  The URL noted above shows one approach
	I was messing with some time back.

o	Boost RCU read-side critical sections based on the priority of
	tasks doing synchronize_rcu() and call_rcu().  (This was something
	Steve Rostedt suggested at OLS.)  One thing at a time!  ;-)

o	Implementing preemption thresholding, as suggested by Bill Huey.
	I am taking the coward's way out on this for the moment in order
	to improve the odds of getting something useful done (as opposed
	to getting something potentially even more useful only half done).

Anyway, the following patch compiles and passes lightweight "smoke" tests.
It almost certainly has fatal flaws -- for example, I don't see how it
would handle yet another task doing a lock-based priority boost between
the time the task is RCU-boosted and the time it de-boosts itself in
rcu_read_unlock().

Again, not for inclusion in its present form, but any enlightenment would
be greatly appreciated.

(Thomas, you did ask for this!!!)

							Thanx, Paul

Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> (but not for inclusion)
---

 include/linux/init_task.h |    1 +
 include/linux/rcupdate.h  |    2 ++
 include/linux/sched.h     |    3 +++
 kernel/rcupreempt.c       |   11 +++++++++++
 kernel/sched.c            |    8 ++++++++
 5 files changed, 25 insertions(+)

diff -urpNa -X dontdiff linux-2.6.17-rt7/include/linux/init_task.h linux-2.6.17-rt7-rcubp/include/linux/init_task.h
--- linux-2.6.17-rt7/include/linux/init_task.h	2006-07-27 14:29:55.000000000 -0700
+++ linux-2.6.17-rt7-rcubp/include/linux/init_task.h	2006-07-27 14:34:20.000000000 -0700
@@ -89,6 +89,7 @@ extern struct group_info init_groups;
 	.prio		= MAX_PRIO-20,					\
 	.static_prio	= MAX_PRIO-20,					\
 	.normal_prio	= MAX_PRIO-20,					\
+	.rcu_prio	= MAX_PRIO,					\
 	.policy		= SCHED_NORMAL,					\
 	.cpus_allowed	= CPU_MASK_ALL,					\
 	.mm		= NULL,						\
diff -urpNa -X dontdiff linux-2.6.17-rt7/include/linux/rcupdate.h linux-2.6.17-rt7-rcubp/include/linux/rcupdate.h
--- linux-2.6.17-rt7/include/linux/rcupdate.h	2006-07-27 14:29:55.000000000 -0700
+++ linux-2.6.17-rt7-rcubp/include/linux/rcupdate.h	2006-07-27 14:34:20.000000000 -0700
@@ -175,6 +175,8 @@ extern int rcu_needs_cpu(int cpu);
 
 #else /* #ifndef CONFIG_PREEMPT_RCU */
 
+#define RCU_PREEMPT_BOOST_PRIO MAX_USER_RT_PRIO  /* Initial boost level. */
+
 #define rcu_qsctr_inc(cpu)
 #define rcu_bh_qsctr_inc(cpu)
 #define call_rcu_bh(head, rcu) call_rcu(head, rcu)
diff -urpNa -X dontdiff linux-2.6.17-rt7/include/linux/sched.h linux-2.6.17-rt7-rcubp/include/linux/sched.h
--- linux-2.6.17-rt7/include/linux/sched.h	2006-07-27 14:29:55.000000000 -0700
+++ linux-2.6.17-rt7-rcubp/include/linux/sched.h	2006-07-27 14:34:20.000000000 -0700
@@ -851,6 +851,9 @@ struct task_struct {
 	int oncpu;
 #endif
 	int prio, static_prio, normal_prio;
+#ifdef CONFIG_PREEMPT_RCU
+	int rcu_prio;
+#endif
 	struct list_head run_list;
 	prio_array_t *array;
 
diff -urpNa -X dontdiff linux-2.6.17-rt7/kernel/rcupreempt.c linux-2.6.17-rt7-rcubp/kernel/rcupreempt.c
--- linux-2.6.17-rt7/kernel/rcupreempt.c	2006-07-27 14:29:55.000000000 -0700
+++ linux-2.6.17-rt7-rcubp/kernel/rcupreempt.c	2006-07-27 14:34:20.000000000 -0700
@@ -147,6 +147,17 @@ rcu_read_lock(void)
 			atomic_inc(current->rcu_flipctr2);
 			smp_mb__after_atomic_inc();  /* might optimize out... */
 		}
+		if (unlikely(current->rcu_prio <= RCU_PREEMPT_BOOST_PRIO)) {
+			int new_prio = MAX_PRIO;
+
+			current->rcu_prio = MAX_PRIO;
+			if (new_prio > current->static_prio)
+				new_prio = current->static_prio;
+			if (new_prio > current->normal_prio)
+				new_prio = current->normal_prio;
+			/* How to account for lock-based prio boost? */
+			rt_mutex_setprio(current, new_prio);
+		}
 	}
 	trace_special((unsigned long) current->rcu_flipctr1,
 		      (unsigned long) current->rcu_flipctr2,
diff -urpNa -X dontdiff linux-2.6.17-rt7/kernel/sched.c linux-2.6.17-rt7-rcubp/kernel/sched.c
--- linux-2.6.17-rt7/kernel/sched.c	2006-07-27 14:29:55.000000000 -0700
+++ linux-2.6.17-rt7-rcubp/kernel/sched.c	2006-07-27 14:58:40.000000000 -0700
@@ -3685,6 +3685,14 @@ asmlinkage void __sched preempt_schedule
 		return;
 
 need_resched:
+#ifdef CONFIG_PREEMPT_RT
+	if (unlikely(current->rcu_read_lock_nesting > 0) &&
+	    (current->rcu_prio > RCU_PREEMPT_BOOST_PRIO)) {
+		current->rcu_prio = RCU_PREEMPT_BOOST_PRIO;
+		if (current->rcu_prio < current->prio)
+			rt_mutex_setprio(current, current->rcu_prio);
+	}
+#endif
 	local_irq_disable();
 	add_preempt_count(PREEMPT_ACTIVE);
 	/*


* Re: [RFC, PATCH, -rt] Early prototype RCU priority-boost patch
  2006-07-28  0:19 [RFC, PATCH, -rt] Early prototype RCU priority-boost patch Paul E. McKenney
@ 2006-07-28 11:38 ` Esben Nielsen
  2006-07-28 15:52   ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Esben Nielsen @ 2006-07-28 11:38 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, tglx, rostedt, dipankar, billh, nielsen.esben,
	mingo, tytso, dvhltc

Hi,
  I have considered an idea to make this work with the PI: Add the ability 
to add a waiter not referring to a lock to the PI list.  I think a few 
subsystems can use that if they temporarily want to boost a task in a 
consistent way (HR-timers is one).  After a little renaming to get the 
boosting part separated out of rt_mutex_waiter:

  struct prio_booster {
 	struct plist_node	booster_list_entry;
  };

  void add_prio_booster(struct task_struct *, struct prio_booster *booster);
  void remove_prio_booster(struct task_struct *, struct prio_booster *booster);
  void change_prio_booster(struct task_struct *, struct prio_booster *booster,
			   int new_prio);

(these functions take care of doing/triggering a lock chain traversal if 
needed) and change

  struct rt_mutex_waiter {
     ...
     struct prio_booster booster;
     ...
  };

There are issues with lock orderings between task->pi_lock (which should 
be renamed to task->prio_lock) and rq->lock.  The lock ordering probably 
has to be reversed, thus integrating the boosting system directly into 
the scheduler instead of into the rtmutex subsystem.
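In toy user-space C, the interface above might be modeled roughly as
follows.  This is a sketch only: demo_task and the hand-rolled singly
linked list are invented stand-ins for the real task_struct and plist,
and no PI chain walk is triggered.

```c
#include <stddef.h>

/* Toy model: each booster contributes a priority, and the task runs
 * at the highest (numerically lowest) of its normal priority and all
 * active boosters.  The real version would keep a plist and trigger
 * PI chain walks; this just recomputes the effective priority. */
struct prio_booster {
	int prio;
	struct prio_booster *next;     /* stand-in for the plist node */
};

struct demo_task {
	int normal_prio;
	int prio;                      /* effective priority */
	struct prio_booster *boosters;
};

static void recompute_prio(struct demo_task *t)
{
	int p = t->normal_prio;

	for (struct prio_booster *b = t->boosters; b; b = b->next)
		if (b->prio < p)
			p = b->prio;   /* lower number = higher priority */
	t->prio = p;
}

static void add_prio_booster(struct demo_task *t,
			     struct prio_booster *b, int prio)
{
	b->prio = prio;
	b->next = t->boosters;
	t->boosters = b;
	recompute_prio(t);
}

static void remove_prio_booster(struct demo_task *t,
				struct prio_booster *b)
{
	struct prio_booster **pp = &t->boosters;

	while (*pp && *pp != b)
		pp = &(*pp)->next;
	if (*pp)
		*pp = b->next;         /* unlink */
	recompute_prio(t);
}

static void change_prio_booster(struct demo_task *t,
				struct prio_booster *b, int new_prio)
{
	b->prio = new_prio;
	recompute_prio(t);
}
```

Removing the last booster drops the task back to its normal priority,
which is exactly the deboost path an RCU reader would take in
rcu_read_unlock().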

Esben

On Thu, 27 Jul 2006, Paul E. McKenney wrote:

> Hello!
>
> This is a very crude not-for-inclusion patch that boosts priority of
> RCU read-side critical sections, but only when they are preempted, and
> only to the highest non-RT priority.  The rcu_read_unlock() primitive
> does the unboosting.  There are a large number of things that this patch
> does -not- do, including:
>
> o	Boost RCU read-side critical sections to the highest possible
> 	priority.  One might wish to do this in OOM situations.  Or
> 	if the grace period is extending too long.  I played with this
> 	a bit some months back, see:
>
> 		http://www.rdrop.com/users/paulmck/patches/RCUboost-20.patch
>
> 	to see what I was thinking.  Or similarly-numbered patches,
> 	see http://www.rdrop.com/users/paulmck/patches for the full
> 	list.  Lots of subtly broken approaches for those who are
> 	interested in subtle breakage.
>
> 	One must carefully resolve races between boosting and the
> 	to-be-boosted task slipping out of its RCU read-side critical
> 	section.  My thought has been to grab the to-be-boosted task
> 	by the throat, and only boost it if it is (1) still in an
> 	RCU read-side critical section and (2) not running.  If you
> 	try boosting a thread that is already running, the races between
> 	boosting and rcu_read_unlock() are insanely complex, particularly
> 	for implementations of rcu_read_unlock() that don't use atomic
> 	instructions or memory barriers.  ;-)
>
> 	Much better to either have the thread boost itself or to make
> 	sure the thread is not running if having someone else boost it.
>
> o	Boost RCU read-side critical sections that must block waiting
> 	for a non-raw spinlock.  The URL noted above shows one approach
> 	I was messing with some time back.
>
> o	Boost RCU read-side critical sections based on the priority of
> 	tasks doing synchronize_rcu() and call_rcu().  (This was something
> 	Steve Rostedt suggested at OLS.)  One thing at a time!  ;-)
>
> o	Implementing preemption thresholding, as suggested by Bill Huey.
> 	I am taking the coward's way out on this for the moment in order
> 	to improve the odds of getting something useful done (as opposed
> 	to getting something potentially even more useful only half done).
>
> Anyway, the following patch compiles and passes lightweight "smoke" tests.
> It almost certainly has fatal flaws -- for example, I don't see how it
> would handle yet another task doing a lock-based priority boost between
> the time the task is RCU-boosted and the time it de-boosts itself in
> rcu_read_unlock().
>
> Again, not for inclusion in its present form, but any enlightenment would
> be greatly appreciated.
>
> (Thomas, you did ask for this!!!)
>
> 							Thanx, Paul
>
> Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> (but not for inclusion)
> ---
>
> include/linux/init_task.h |    1 +
> include/linux/rcupdate.h  |    2 ++
> include/linux/sched.h     |    3 +++
> kernel/rcupreempt.c       |   11 +++++++++++
> kernel/sched.c            |    8 ++++++++
> 5 files changed, 25 insertions(+)
>
> diff -urpNa -X dontdiff linux-2.6.17-rt7/include/linux/init_task.h linux-2.6.17-rt7-rcubp/include/linux/init_task.h
> --- linux-2.6.17-rt7/include/linux/init_task.h	2006-07-27 14:29:55.000000000 -0700
> +++ linux-2.6.17-rt7-rcubp/include/linux/init_task.h	2006-07-27 14:34:20.000000000 -0700
> @@ -89,6 +89,7 @@ extern struct group_info init_groups;
> 	.prio		= MAX_PRIO-20,					\
> 	.static_prio	= MAX_PRIO-20,					\
> 	.normal_prio	= MAX_PRIO-20,					\
> +	.rcu_prio	= MAX_PRIO,					\
> 	.policy		= SCHED_NORMAL,					\
> 	.cpus_allowed	= CPU_MASK_ALL,					\
> 	.mm		= NULL,						\
> diff -urpNa -X dontdiff linux-2.6.17-rt7/include/linux/rcupdate.h linux-2.6.17-rt7-rcubp/include/linux/rcupdate.h
> --- linux-2.6.17-rt7/include/linux/rcupdate.h	2006-07-27 14:29:55.000000000 -0700
> +++ linux-2.6.17-rt7-rcubp/include/linux/rcupdate.h	2006-07-27 14:34:20.000000000 -0700
> @@ -175,6 +175,8 @@ extern int rcu_needs_cpu(int cpu);
>
> #else /* #ifndef CONFIG_PREEMPT_RCU */
>
> +#define RCU_PREEMPT_BOOST_PRIO MAX_USER_RT_PRIO  /* Initial boost level. */
> +
> #define rcu_qsctr_inc(cpu)
> #define rcu_bh_qsctr_inc(cpu)
> #define call_rcu_bh(head, rcu) call_rcu(head, rcu)
> diff -urpNa -X dontdiff linux-2.6.17-rt7/include/linux/sched.h linux-2.6.17-rt7-rcubp/include/linux/sched.h
> --- linux-2.6.17-rt7/include/linux/sched.h	2006-07-27 14:29:55.000000000 -0700
> +++ linux-2.6.17-rt7-rcubp/include/linux/sched.h	2006-07-27 14:34:20.000000000 -0700
> @@ -851,6 +851,9 @@ struct task_struct {
> 	int oncpu;
> #endif
> 	int prio, static_prio, normal_prio;
> +#ifdef CONFIG_PREEMPT_RCU
> +	int rcu_prio;
> +#endif
> 	struct list_head run_list;
> 	prio_array_t *array;
>
> diff -urpNa -X dontdiff linux-2.6.17-rt7/kernel/rcupreempt.c linux-2.6.17-rt7-rcubp/kernel/rcupreempt.c
> --- linux-2.6.17-rt7/kernel/rcupreempt.c	2006-07-27 14:29:55.000000000 -0700
> +++ linux-2.6.17-rt7-rcubp/kernel/rcupreempt.c	2006-07-27 14:34:20.000000000 -0700
> @@ -147,6 +147,17 @@ rcu_read_lock(void)
> 			atomic_inc(current->rcu_flipctr2);
> 			smp_mb__after_atomic_inc();  /* might optimize out... */
> 		}
> +		if (unlikely(current->rcu_prio <= RCU_PREEMPT_BOOST_PRIO)) {
> +			int new_prio = MAX_PRIO;
> +
> +			current->rcu_prio = MAX_PRIO;
> +			if (new_prio > current->static_prio)
> +				new_prio = current->static_prio;
> +			if (new_prio > current->normal_prio)
> +				new_prio = current->normal_prio;
> +			/* How to account for lock-based prio boost? */
> +			rt_mutex_setprio(current, new_prio);
> +		}
> 	}
> 	trace_special((unsigned long) current->rcu_flipctr1,
> 		      (unsigned long) current->rcu_flipctr2,
> diff -urpNa -X dontdiff linux-2.6.17-rt7/kernel/sched.c linux-2.6.17-rt7-rcubp/kernel/sched.c
> --- linux-2.6.17-rt7/kernel/sched.c	2006-07-27 14:29:55.000000000 -0700
> +++ linux-2.6.17-rt7-rcubp/kernel/sched.c	2006-07-27 14:58:40.000000000 -0700
> @@ -3685,6 +3685,14 @@ asmlinkage void __sched preempt_schedule
> 		return;
>
> need_resched:
> +#ifdef CONFIG_PREEMPT_RT
> +	if (unlikely(current->rcu_read_lock_nesting > 0) &&
> +	    (current->rcu_prio > RCU_PREEMPT_BOOST_PRIO)) {
> +		current->rcu_prio = RCU_PREEMPT_BOOST_PRIO;
> +		if (current->rcu_prio < current->prio)
> +			rt_mutex_setprio(current, current->rcu_prio);
> +	}
> +#endif
> 	local_irq_disable();
> 	add_preempt_count(PREEMPT_ACTIVE);
> 	/*
>


* Re: [RFC, PATCH, -rt] Early prototype RCU priority-boost patch
  2006-07-28 11:38 ` Esben Nielsen
@ 2006-07-28 15:52   ` Paul E. McKenney
  2006-07-28 20:00     ` Esben Nielsen
  2006-07-28 22:27     ` Bill Huey
  0 siblings, 2 replies; 10+ messages in thread
From: Paul E. McKenney @ 2006-07-28 15:52 UTC (permalink / raw)
  To: Esben Nielsen
  Cc: linux-kernel, tglx, rostedt, dipankar, billh, mingo, tytso,
	dvhltc

On Fri, Jul 28, 2006 at 12:38:33PM +0100, Esben Nielsen wrote:
> Hi,
>  I have considered an idea to make this work with the PI: Add the ability 
> to add a waiter not referring to a lock to the PI list. I think a few 
> subsystems can use that if they temporarily want to boost a task in a 
> consistent way (HR-timers is one). After a little renaming to get the 
> boosting part separated out of rt_mutex_waiter:
> 
>  struct prio_booster {
> 	struct plist_node	booster_list_entry;
>  };
> 
>  void add_prio_booster(struct task_struct *, struct prio_booster *booster);
>  void remove_prio_booster(struct task_struct *, struct prio_booster 
> *booster);
>  void change_prio_booster(struct task_struct *, struct prio_booster 
> *booster, int new_prio);
> 
> (these functions take care of doing/triggering a lock chain traversal if 
> needed) and change
> 
>  struct rt_mutex_waiter {
>     ...
>     struct prio_booster booster;
>     ...
>  };

I must defer to Ingo, Thomas, and Steve Rostedt on what the right thing
to do is here, but I do much appreciate the pointers!

If I understand what you are getting at, this is what I would need to
do in order to have synchronize_rcu() priority-boost RCU readers?
Or is this what I need to legitimately priority-boost RCU readers in
any case (for example, to properly account for other boosting and
deboosting that might happen while the RCU reader is priority boosted)?

Here are the RCU priority-boost situations I see:

1.	"Out of nowhere" RCU-reader priority boost.  This is what
	the patch I submitted was intended to cover.  If I need your
	prio_booster struct in this case, then I would need to put
	one in the task structure, right?

	Would another be needed to handle a second boost?  My guess
	is that the first could be reused.

2.	RCU reader boosting a lock holder.  This ends up being a
	combination of #1 (because the act of blocking on a lock implies
	an "out of nowhere" priority boost) and normal lock boosting.

3.	A call_rcu() or synchronize_rcu() boosting all readers.  I am
	not sure we really need this, but in case we do...  One would
	need an additional prio_booster for each task to be boosted,
	right?  This would seem to require an additional prio_booster
	struct in each task structure.

Or am I off the mark here?

> There are issues with lock orderings between task->pi_lock (which should 
> be renamed to task->prio_lock) and rq->lock. The lock ordering probably 
> has to be reversed, thus integrating the boosting system directly into 
> the scheduler instead of into the rtmutex subsystem.

This does sound a bit scary.  What exactly am I adding that would motivate
inverting the lock ordering?

							Thanx, Paul


* Re: [RFC, PATCH, -rt] Early prototype RCU priority-boost patch
  2006-07-28 15:52   ` Paul E. McKenney
@ 2006-07-28 20:00     ` Esben Nielsen
  2006-07-29  2:20       ` Paul E. McKenney
  2006-07-28 22:27     ` Bill Huey
  1 sibling, 1 reply; 10+ messages in thread
From: Esben Nielsen @ 2006-07-28 20:00 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Esben Nielsen, linux-kernel, tglx, rostedt, dipankar, billh,
	mingo, tytso, dvhltc

On Fri, 28 Jul 2006, Paul E. McKenney wrote:

> On Fri, Jul 28, 2006 at 12:38:33PM +0100, Esben Nielsen wrote:
>> Hi,
>>  I have considered an idea to make this work with the PI: Add the ability
>> to add a waiter not referring to a lock to the PI list. I think a few
>> subsystems can use that if they temporarily want to boost a task in a
>> consistent way (HR-timers is one). After a little renaming to get the
>> boosting part separated out of rt_mutex_waiter:
>>
>>  struct prio_booster {
>> 	struct plist_node	booster_list_entry;
>>  };
>>
>>  void add_prio_booster(struct task_struct *, struct prio_booster *booster);
>>  void remove_prio_booster(struct task_struct *, struct prio_booster
>> *booster);
>>  void change_prio_booster(struct task_struct *, struct prio_booster
>> *booster, int new_prio);
>>
>> (these functions take care of doing/triggering a lock chain traversal if
>> needed) and change
>>
>>  struct rt_mutex_waiter {
>>     ...
>>     struct prio_booster booster;
>>     ...
>>  };
>
> I must defer to Ingo, Thomas, and Steve Rostedt on what the right thing
> to do is here, but I do much appreciate the pointers!
>
> If I understand what you are getting at, this is what I would need to
> do in order to have synchronize_rcu() priority-boost RCU readers?
> Or is this what I need to legitimately priority-boost RCU readers in
> any case (for example, to properly account for other boosting and
> deboosting that might happen while the RCU reader is priority boosted)?
>
> Here are the RCU priority-boost situations I see:
>
> 1.	"Out of nowhere" RCU-reader priority boost.  This is what
> 	the patch I submitted was intended to cover.  If I need your
> 	prio_booster struct in this case, then I would need to put
> 	one in the task structure, right?
>
> 	Would another be needed to handle a second boost?  My guess
> 	is that the first could be reused.

Yes, put one in the task structure and use change_prio_booster().
>
> 2.	RCU reader boosting a lock holder.  This ends up being a
> 	combination of #1 (because the act of blocking on a lock implies
> 	an "out of nowhere" priority boost) and normal lock boosting.
>

That is the normal chain walking of the PI code. It is basically already 
handled there.

> 3.	A call_rcu() or synchronize_rcu() boosting all readers.  I am
> 	not sure we really need this, but in case we do...  One would
> 	need an additional prio_booster for each task to be boosted,
> 	right?  This would seem to require an additional prio_booster
> 	struct in each task structure.
>
> Or am I off the mark here?

Hmm, yes.
You would need a list of all preempted rcu-readers per CPU.
Then you need to use change_prio_booster() on all of them.  However, you 
can do it on the first one now, and then update the next at the next 
schedule, etc.  Each CPU can only run one of these tasks until it calls 
schedule() anyway :-)
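A sketch of that one-reader-at-a-time scheme in plain C (demo_* names
and the singly linked list are invented stand-ins for the real per-CPU
data structures):

```c
#include <stddef.h>

#define DEMO_BOOST_PRIO 99   /* stand-in for RCU_PREEMPT_BOOST_PRIO */

struct demo_reader {
	int prio;                  /* lower number = higher priority */
	struct demo_reader *next;
};

/* Per-CPU list head of preempted RCU readers (one CPU shown). */
struct demo_cpu {
	struct demo_reader *preempted_readers;
};

/*
 * Boost only the head of the list; the scheduler would call this
 * again each time the previously boosted reader blocks or finishes,
 * since a CPU can only run one of these tasks at a time anyway.
 * Returns the reader that was boosted, or NULL if the list is empty.
 */
static struct demo_reader *boost_next_reader(struct demo_cpu *cpu)
{
	struct demo_reader *r = cpu->preempted_readers;

	if (r == NULL)
		return NULL;
	cpu->preempted_readers = r->next;   /* dequeue */
	if (DEMO_BOOST_PRIO < r->prio)
		r->prio = DEMO_BOOST_PRIO;  /* boost */
	return r;
}
```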

>
>> There are issues with lock orderings between task->pi_lock (which should
>> be renamed to task->prio_lock) and rq->lock. The lock ordering probably
>> has to be reversed, thus integrating the boosting system directly into
>> the scheduler instead of into the rtmutex subsystem.
>
> This does sound a bit scary.  What exactly am I adding that would motivate
> inverting the lock ordering?

I came to think about it, and it might not be so good an idea.  In the 
rtmutex code the lock order is task->pi_lock then rq->lock.  But the 
scheduler probably ought to take next->prio_lock, so it can avoid 
moving a boosted task down in priority below the boost.  When it does 
that, though, it already holds the rq->lock.  On the other hand, a 
trylock would probably work, and if that in rare circumstances fails, 
the scheduler can release the rq->lock, jump back, and try again.
So probably no reversal of lock ordering is needed.
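The trylock-and-back-off fallback is a standard pattern; modeled with
POSIX mutexes in user space (not the actual scheduler code, and with
invented names), it looks roughly like this:

```c
#include <pthread.h>

/*
 * Already committed to taking rq_lock first, try to also take
 * prio_lock without inverting the established ordering: if the
 * trylock fails, drop rq_lock and report failure so the caller
 * can jump back and retry from the top.
 */
static int demo_lock_both(pthread_mutex_t *rq_lock,
			  pthread_mutex_t *prio_lock)
{
	pthread_mutex_lock(rq_lock);
	if (pthread_mutex_trylock(prio_lock) != 0) {
		pthread_mutex_unlock(rq_lock);   /* back off */
		return 0;                        /* caller retries */
	}
	return 1;                                /* both locks held */
}
```

The rare-failure cost is just the extra loop iteration, which is why
the reversal of the lock ordering can be avoided.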

Esben

>
> 							Thanx, Paul
>


* Re: [RFC, PATCH, -rt] Early prototype RCU priority-boost patch
  2006-07-28 15:52   ` Paul E. McKenney
  2006-07-28 20:00     ` Esben Nielsen
@ 2006-07-28 22:27     ` Bill Huey
  2006-07-29  2:18       ` Paul E. McKenney
  1 sibling, 1 reply; 10+ messages in thread
From: Bill Huey @ 2006-07-28 22:27 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Esben Nielsen, linux-kernel, tglx, rostedt, dipankar, mingo,
	tytso, dvhltc, Bill Huey (hui)

On Fri, Jul 28, 2006 at 08:52:20AM -0700, Paul E. McKenney wrote:
> I must defer to Ingo, Thomas, and Steve Rostedt on what the right thing
> to do is here, but I do much appreciate the pointers!
> 
> If I understand what you are getting at, this is what I would need to
> do in order to have synchronize_rcu() priority-boost RCU readers?
> Or is this what I need to legitimately priority-boost RCU readers in
> any case (for example, to properly account for other boosting and
> deboosting that might happen while the RCU reader is priority boosted)?
> 
> Here are the RCU priority-boost situations I see:
> 
> 1.	"Out of nowhere" RCU-reader priority boost.  This is what
> 	the patch I submitted was intended to cover.  If I need your
> 	prio_booster struct in this case, then I would need to put
> 	one in the task structure, right?
> 
> 	Would another be needed to handle a second boost?  My guess
> 	is that the first could be reused.

What is that?  Like randomly boosting without tracking which thread is
inside an RCU critical section?

> 2.	RCU reader boosting a lock holder.  This ends up being a
> 	combination of #1 (because the act of blocking on a lock implies
> 	an "out of nowhere" priority boost) and normal lock boosting.

Lock holder as in a mutex held below an RCU critical section?

> 3.	A call_rcu() or synchronize_rcu() boosting all readers.  I am
> 	not sure we really need this, but in case we do...  One would
> 	need an additional prio_booster for each task to be boosted,
> 	right?  This would seem to require an additional prio_booster
> 	struct in each task structure.
 
This needs a notion of RCU read side ownership to boost those preempted
threads.

> Or am I off the mark here?

The scary thing about what you're doing is that all of the techniques
you've named (assuming that I understand you correctly) require a
notion of an owner and ownership tracking. That's just going to kill
RCU read side performance if you do that whether it be a list per reader
or something else like that. It's a tough problem.

Don't know what to think about it other than some kind of tracking or
boosting logic in the per CPU run queue or the task struct itself during
the boost operation. But you're still stuck with the problem of what
to boost and how to find that out during an RCU sync side. It's still
an ownership problem unless Esben can think of another way of getting
around that problem.

That's why I suggested a priority ceiling or per-CPU priority-threshold
tracking (plus CPU binding) of the priority of the irq-threads and such.
It's a simple hack to restore the cheesy preempt-count behavior without
having to revert to invasive ownership tracking for each reader.

It's just an idea. Maybe it'll be useful to you.

bill



* Re: [RFC, PATCH, -rt] Early prototype RCU priority-boost patch
  2006-07-28 22:27     ` Bill Huey
@ 2006-07-29  2:18       ` Paul E. McKenney
  2006-07-29  2:50         ` Bill Huey
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2006-07-29  2:18 UTC (permalink / raw)
  To: Bill Huey
  Cc: Esben Nielsen, linux-kernel, tglx, rostedt, dipankar, mingo,
	tytso, dvhltc

On Fri, Jul 28, 2006 at 03:27:16PM -0700, Bill Huey wrote:
> On Fri, Jul 28, 2006 at 08:52:20AM -0700, Paul E. McKenney wrote:
> > I must defer to Ingo, Thomas, and Steve Rostedt on what the right thing
> > to do is here, but I do much appreciate the pointers!
> > 
> > If I understand what you are getting at, this is what I would need to
> > do in order to have synchronize_rcu() priority-boost RCU readers?
> > Or is this what I need to legitimately priority-boost RCU readers in
> > any case (for example, to properly account for other boosting and
> > deboosting that might happen while the RCU reader is priority boosted)?
> > 
> > Here are the RCU priority-boost situations I see:
> > 
> > 1.	"Out of nowhere" RCU-reader priority boost.  This is what
> > 	the patch I submitted was intended to cover.  If I need your
> > 	prio_booster struct in this case, then I would need to put
> > 	one in the task structure, right?
> > 
> > 	Would another be needed to handle a second boost?  My guess
> > 	is that the first could be reused.
> 
> What is that?  Like randomly boosting without tracking which thread is
> inside an RCU critical section?

Perhaps a better way to put it would be that a thread preempted in
an RCU read-side critical section boosts itself, and tracks the fact
that it boosted itself in its task structure.

The second boost would be from some other task, but if the task had 
already boosted itself, the de-boosting would already be taken care of
at the next rcu_read_unlock() -- but as mentioned earlier in this
thread, you only boost someone else if they are not currently running.
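A rough user-space model of those two rules -- self-boost on
preemption, and boost from outside only when the target is not
running (demo_task and its fields are illustrative stand-ins, not the
actual -rt task_struct):

```c
#include <stdbool.h>

#define DEMO_MAX_PRIO    140   /* stand-in for MAX_PRIO: "not boosted" */
#define DEMO_BOOST_PRIO   99   /* stand-in for RCU_PREEMPT_BOOST_PRIO */

struct demo_task {
	int prio;                  /* effective priority, lower is higher */
	int normal_prio;
	int rcu_prio;              /* DEMO_MAX_PRIO when not RCU-boosted */
	int rcu_read_lock_nesting; /* >0 while in an RCU read-side c.s. */
	bool on_cpu;
};

/* A task preempted inside an RCU read-side critical section boosts
 * itself, recording the boost in rcu_prio for later deboosting. */
static void demo_self_boost(struct demo_task *t)
{
	if (t->rcu_read_lock_nesting > 0 && t->rcu_prio > DEMO_BOOST_PRIO) {
		t->rcu_prio = DEMO_BOOST_PRIO;
		if (t->rcu_prio < t->prio)
			t->prio = t->rcu_prio;
	}
}

/* Boost from some other task: legal only if the target is still in
 * its critical section and not currently running, which sidesteps
 * the races with a concurrently executing rcu_read_unlock().
 * Returns true if the boost was applied. */
static bool demo_boost_other(struct demo_task *t)
{
	if (t->rcu_read_lock_nesting <= 0)
		return false;          /* slipped out of the c.s. */
	if (t->on_cpu)
		return false;          /* might be racing with unlock */
	demo_self_boost(t);            /* same bookkeeping either way */
	return true;
}
```

Because both paths record the boost in the same rcu_prio field, a
single deboost in rcu_read_unlock() covers either case.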

> > 2.	RCU reader boosting a lock holder.  This ends up being a
> > 	combination of #1 (because the act of blocking on a lock implies
> > 	an "out of nowhere" priority boost) and normal lock boosting.
> 
> Lock holder as in a mutex held below an RCU critical section?

Lock holder as in task 0 holds the lock, perhaps in an RCU read-side
critical section and perhaps not.  Task 1 is in an RCU read-side
critical section and attempts to acquire the lock.  Task 1 must block,
because task 0 still holds the lock.  Task 1 must boost itself before
blocking, and must donate its boosted priority to task 0.

> > 3.	A call_rcu() or synchronize_rcu() boosting all readers.  I am
> > 	not sure we really need this, but in case we do...  One would
> > 	need an additional prio_booster for each task to be boosted,
> > 	right?  This would seem to require an additional prio_booster
> > 	struct in each task structure.
>  
> This needs a notion of RCU read side ownership to boost those preempted
> threads.

I am getting the impression that #3 is something to leave aside for now.

> > Or am I off the mark here?
> 
> The scary thing about what you're doing is that all of the techniques
> you've named (assuming that I understand you correctly) require a
> notion of an owner and ownership tracking. That's just going to kill
> RCU read side performance if you do that whether it be a list per reader
> or something else like that. It's a tough problem.

The idea is that none of this stuff ever happens except in cases where
the RCU read-side critical section blocks, in which case all this is
in the noise compared to the context switch.  The sole exception to
this is that rcu_read_unlock() must check to see if it has been boosted,
and deboost itself if so.  I don't particularly like the additional
comparison, but it should not be too expensive.
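In the same toy model, the unlock-side check might look roughly like
this (the real patch calls rt_mutex_setprio() and must cope with
concurrent lock-based boosts, which this sketch deliberately ignores):

```c
#define DEMO_MAX_PRIO  140   /* stand-in for MAX_PRIO: "not boosted" */

struct demo_task {
	int prio;                  /* effective priority, lower is higher */
	int normal_prio;
	int rcu_prio;              /* DEMO_MAX_PRIO when not RCU-boosted */
	int rcu_read_lock_nesting;
};

/* On the outermost rcu_read_unlock(), a single comparison detects
 * whether this task was boosted; only then is the (comparatively
 * expensive) deboost path taken. */
static void demo_rcu_read_unlock(struct demo_task *t)
{
	if (--t->rcu_read_lock_nesting == 0 &&
	    t->rcu_prio != DEMO_MAX_PRIO) {
		t->rcu_prio = DEMO_MAX_PRIO;
		t->prio = t->normal_prio;   /* deboost */
	}
}
```

The common, never-preempted case pays only the nesting decrement and
one comparison against DEMO_MAX_PRIO.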

> Don't know what to think about it other than some kind of tracking or
> boosting logic in the per CPU run queue or the task struct itself during
> the boost operation. But you're still stuck with the problem of what
> to boost and how to find that out during an RCU sync side. It's still
> an ownership problem unless Esben can think of another way of getting
> around that problem.

One idea is to put tasks that block in RCU read-side critical sections
on a list -- again, the hope is that the overhead is in the noise compared
to the context switch.

> That's why I suggested a priority ceiling or per-CPU priority-threshold
> tracking (plus CPU binding) of the priority of the irq-threads and such.
> It's a simple hack to restore the cheesy preempt-count behavior without
> having to revert to invasive ownership tracking for each reader.
> 
> It's just an idea. Maybe it'll be useful to you.

Let me make sure I understand what you are suggesting -- sounds to me
like a check in preempt_schedule().  If the task to be preempted is
higher priority than the ceiling, the preemption request is refused.

Or am I missing part of your proposal?
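If that reading is right, the check reduces to a single comparison,
as in this sketch (names invented; lower numeric value means higher
priority):

```c
#include <stdbool.h>

/* Refuse to preempt a task running above the priority ceiling; such
 * tasks run until they block or finish, much like the old
 * preempt-count behavior.  Lower numeric value = higher priority. */
static bool demo_may_preempt(int task_prio, int ceiling_prio)
{
	return task_prio >= ceiling_prio;  /* at/below ceiling: preemptible */
}
```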

							Thanx, Paul


* Re: [RFC, PATCH, -rt] Early prototype RCU priority-boost patch
  2006-07-28 20:00     ` Esben Nielsen
@ 2006-07-29  2:20       ` Paul E. McKenney
  0 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2006-07-29  2:20 UTC (permalink / raw)
  To: Esben Nielsen
  Cc: linux-kernel, tglx, rostedt, dipankar, billh, mingo, tytso,
	dvhltc

On Fri, Jul 28, 2006 at 09:00:31PM +0100, Esben Nielsen wrote:
> On Fri, 28 Jul 2006, Paul E. McKenney wrote:
> 
> >On Fri, Jul 28, 2006 at 12:38:33PM +0100, Esben Nielsen wrote:
> >>Hi,
> >> I have considered an idea to make this work with the PI: Add the ability
> >>to add a waiter not referring to a lock to the PI list. I think a few
> >>subsystems can use that if they temporarily want to boost a task in a
> >>consistent way (HR-timers is one). After a little renaming to get the
> >>boosting part separated out of rt_mutex_waiter:
> >>
> >> struct prio_booster {
> >>	struct plist_node	booster_list_entry;
> >> };
> >>
> >> void add_prio_booster(struct task_struct *, struct prio_booster 
> >> *booster);
> >> void remove_prio_booster(struct task_struct *, struct prio_booster
> >>*booster);
> >> void change_prio_booster(struct task_struct *, struct prio_booster
> >>*booster, int new_prio);
> >>
> >>(these functions take care of doing/triggering a lock chain traversal if
> >>needed) and change
> >>
> >> struct rt_mutex_waiter {
> >>    ...
> >>    struct prio_booster booster;
> >>    ...
> >> };
> >
> >I must defer to Ingo, Thomas, and Steve Rostedt on what the right thing
> >to do is here, but I do much appreciate the pointers!
> >
> >If I understand what you are getting at, this is what I would need to
> >do in order to have synchronize_rcu() priority-boost RCU readers?
> >Or is this what I need to legitimately priority-boost RCU readers in
> >any case (for example, to properly account for other boosting and
> >deboosting that might happen while the RCU reader is priority boosted)?
> >
> >Here are the RCU priority-boost situations I see:
> >
> >1.	"Out of nowhere" RCU-reader priority boost.  This is what
> >	the patch I submitted was intended to cover.  If I need your
> >	prio_booster struct in this case, then I would need to put
> >	one in the task structure, right?
> >
> >	Would another be needed to handle a second boost?  My guess
> >	is that the first could be reused.
> 
> Yes, put one in the task structure and use change_prio_booster().

OK.

> >2.	RCU reader boosting a lock holder.  This ends up being a
> >	combination of #1 (because the act of blocking on a lock implies
> >	an "out of nowhere" priority boost) and normal lock boosting.
> >
> 
> That is the normal chain walking of the PI code. It is basically already 
> handled there.

Yep.  The only change is that the RCU reader must boost itself before
doing the current PI stuff.

> >3.	A call_rcu() or synchronize_rcu() boosting all readers.  I am
> >	not sure we really need this, but in case we do...  One would
> >	need an additional prio_booster for each task to be boosted,
> >	right?  This would seem to require an additional prio_booster
> >	struct in each task structure.
> >
> >Or am I off the mark here?
> 
> Hmm, yes.
> You would need a list of all preempted rcu-readers per CPU.
> Then you need to use change_prio_booster() on all of them. However, you
> can do it on the first one now, and then update the next at the next
> schedule, etc. Each CPU can only run one of these tasks until it calls
> schedule() anyway :-)

Good point -- though the trick would be to work out where in the scheduler
one should boost the next one.

> >>There are issues with lock orderings between task->pi_lock (which should
> >>be renamed to task->prio_lock) and rq->lock. The lock ordering probably
> >>has to be reversed, thus integrating the boosting system directly into
> >>the scheduler instead of into the rtmutex subsystem.
> >
> >This does sound a bit scary.  What exactly am I adding that would motivate
> >inverting the lock ordering?
> 
> I came to think about it, and it might not be so good an idea. In the 
> rtmutex the lock order is task->pi_lock then rq->lock. But the scheduler 
> probably ought to take next->prio_lock, so it can avoid moving a boosted 
> task down in priority below the boost. But when it does that it already 
> has the rq->lock. On the other hand a trylock would probably work, and if 
> that in rare circumstances fails, it can release the rq->lock and jump 
> back and try again.
> So probably no reversal of lock ordering is needed.

Music to my ears!!!  ;-)

							Thanx, Paul


* Re: [RFC, PATCH, -rt] Early prototype RCU priority-boost patch
  2006-07-29  2:18       ` Paul E. McKenney
@ 2006-07-29  2:50         ` Bill Huey
  2006-07-31 14:38           ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Bill Huey @ 2006-07-29  2:50 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Esben Nielsen, linux-kernel, tglx, rostedt, dipankar, mingo,
	tytso, dvhltc, Bill Huey (hui)

On Fri, Jul 28, 2006 at 07:18:29PM -0700, Paul E. McKenney wrote:
> On Fri, Jul 28, 2006 at 03:27:16PM -0700, Bill Huey wrote:
> > What is that ? like randomly boosting without tracking which thread is
> > inside an RCU critical section ?
> 
> Perhaps a better way to put it would be that a thread preempted in
> an RCU read-side critical section boosts itself, and tracks the fact
> that it boosted itself in its task structure.
> 
> The second boost would be from some other task, but if the task had 
> already boosted itself, the de-boosting would already be taken care of
> at the next rcu_read_unlock() -- but as mentioned earlier in this
> thread, you only boost someone else if they are not currently running.

The problem here is that I can't see how it's going to boost the thread
if the things doing the RCU sync can't track the list of readers. It
might be recorded in the task struct, but now what ?

> > > 2.	RCU reader boosting a lock holder.  This ends up being a
> > > 	combination of #1 (because the act of blocking on a lock implies
> > > 	an "out of nowhere" priority boost) and normal lock boosting.
> > 
> > Lock holder as in a mutex held below an RCU critical section ?
> 
> Lock holder as in task 0 holds the lock, perhaps in an RCU read-side
> critical section and perhaps not.  Task 1 is in an RCU read-side
> critical section and attempts to acquire the lock.  Task 1 must block,
> because task 0 still holds the lock.  Task 1 must boost itself before
> blocking, and must donate its boosted priority to task 0.

Ok (thinking...)

> > > 3.	A call_rcu() or synchronize_rcu() boosting all readers.  I am
> > > 	not sure we really need this, but in case we do...  One would
> > > 	need an additional prio_booster for each task to be boosted,
> > > 	right?  This would seem to require an additional prio_booster
> > > 	struct in each task structure.
> >  
> > This needs a notion of RCU read side ownership to boost those preempted
> > threads.
> 
> I am getting the impression that #3 is something to leave aside for now.

... 

> The idea is that none of this stuff ever happens except in cases where
> the RCU read-side critical section blocks, in which case all this is

...

> in the noise compared to the context switch.  The sole exception to
> this is that rcu_read_unlock() must check to see if it has been boosted,
> and deboost itself if so.  I don't particularly like the additional
> comparison, but it should not be too expensive.

Oh, blocks as in gets shoved into a wait queue for a PI-enabled lock.
 
> > Don't know what to think about it other than some kind of tracking or
> > boosting logic in the per CPU run queue or the task struct itself during
> > the boost operation. But you're still stuck with the problem of what
> > to boost and how to find that out during an RCU sync side. It's still
> > an ownership problem unless Esben can think of another way of getting
> > around that problem.
> 
> One idea is to put tasks that block in RCU read-side critical sections
> on a list -- again, the hope is that the overhead is in the noise compared
> to the context switch.

Only way to find out is to try it.

> > That's why I suggested a priority ceiling or per CPU priority threshold
> > tracking (+ CPU binding) the priority of the irq-threads and stuff. It's
> > a simple hack to restore the cheesy preempt count stuff without having
> > to revert to invasive ownership tracking for each reader.
> > 
> > It's just an idea. Maybe it'll be useful to you.
> 
> Let me make sure I understand what you are suggesting -- sounds to me
> like a check in preempt_schedule().  If the task to be preempted is
> higher priority than the ceiling, the preemption request is refused.
> 
> Or am I missing part of your proposal?

Something like adding, at all preemption points, cond_resched() and friends
(scheduler tick), an additional check against a value in a thread's own CPU
run queue struct to see if it should permit the preemption or not. I'm
thinking about ways to avoid taking an expensive run queue lock during a
task's priority manipulation and instead have some other kind of logic
orthogonal to that so that it can bypass this overhead.  A value in the run
queue that can be checked against in order to prevent a preemption from
happening might be able to sidestep the need for taking a full run queue
lock to reorder a task's priority ranking.

If a ceiling or threshold was defined for RCU (another somewhat complicated
topic) it could prevent the RCU critical section from being preempted by
anything other than SCHED_FIFO tasks at and above that priority, if you
choose a threshold at that priority. That'll be a part of the runtime
configuration of the system. You'd have to cpu_get/put to get that value so
that you get at it safely, read or write to it, and maybe save and restore
that value on entry and exit respectively. You'll also have to set a field
in the task struct to prevent it from migrating to another CPU and make
sure that's modifying the right stuff on the right CPU.

It's a possible solution to a rather difficult problem. What do you think ?
too much of a hack ?

(I'm into -rt development again after a good OLS and I'm trying to get my
kernel development up and going so that I can help out)

bill



* Re: [RFC, PATCH, -rt] Early prototype RCU priority-boost patch
  2006-07-29  2:50         ` Bill Huey
@ 2006-07-31 14:38           ` Paul E. McKenney
  2006-07-31 22:22             ` Bill Huey
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2006-07-31 14:38 UTC (permalink / raw)
  To: Bill Huey
  Cc: Esben Nielsen, linux-kernel, tglx, rostedt, dipankar, mingo,
	tytso, dvhltc

On Fri, Jul 28, 2006 at 07:50:37PM -0700, Bill Huey wrote:
> On Fri, Jul 28, 2006 at 07:18:29PM -0700, Paul E. McKenney wrote:
> > On Fri, Jul 28, 2006 at 03:27:16PM -0700, Bill Huey wrote:
> > > What is that ? like randomly boosting without tracking which thread is
> > > inside an RCU critical section ?
> > 
> > Perhaps a better way to put it would be that a thread preempted in
> > an RCU read-side critical section boosts itself, and tracks the fact
> > that it boosted itself in its task structure.
> > 
> > The second boost would be from some other task, but if the task had 
> > already boosted itself, the de-boosting would already be taken care of
> > at the next rcu_read_unlock() -- but as mentioned earlier in this
> > thread, you only boost someone else if they are not currently running.
> 
> The problem here is that I can't see how it's going to boost the thread
> if the things doing the RCU sync can't track the list of readers. It
> might be recorded in the task struct, but now what ?

The first boost is performed by the task itself the first time there
is a preemption attempt (or the first time it blocks on a mutex), so
no need to track the list of readers in that case.  The trick is that
there is no benefit to boosting someone who is already running -- we
only need to boost the first time they are considering blocking.

If there is a need for a second "boost to the sky" in case of excessively
delayed grace period (or to provide deterministic synchronize_rcu()
latency), then we need a list only of those RCU readers who have attempted
to block at least once thus far in their current RCU read-side critical
section.  But I was putting this off until I get the simple case right.
Cowardly of me, I know!  ;-)

Finally found Steve Rostedt's PI document (in 2.6.18-rc2), very helpful
(though I suppose I should reserve judgement until after I get this
working...)

> > > > 2.	RCU reader boosting a lock holder.  This ends up being a
> > > > 	combination of #1 (because the act of blocking on a lock implies
> > > > 	an "out of nowhere" priority boost) and normal lock boosting.
> > > 
> > > Lock holder as in a mutex held below an RCU critical section ?
> > 
> > Lock holder as in task 0 holds the lock, perhaps in an RCU read-side
> > critical section and perhaps not.  Task 1 is in an RCU read-side
> > critical section and attempts to acquire the lock.  Task 1 must block,
> > because task 0 still holds the lock.  Task 1 must boost itself before
> > blocking, and must donate its boosted priority to task 0.
> 
> Ok (thinking...)
> 
> > > > 3.	A call_rcu() or synchronize_rcu() boosting all readers.  I am
> > > > 	not sure we really need this, but in case we do...  One would
> > > > 	need an additional prio_booster for each task to be boosted,
> > > > 	right?  This would seem to require an additional prio_booster
> > > > 	struct in each task structure.
> > >  
> > > This needs a notion of RCU read side ownership to boost those preempted
> > > threads.
> > 
> > I am getting the impression that #3 is something to leave aside for now.
> 
> ... 
> 
> > The idea is that none of this stuff ever happens except in cases where
> > the RCU read-side critical section blocks, in which case all this is
> 
> ...
> 
> > in the noise compared to the context switch.  The sole exception to
> > this is that rcu_read_unlock() must check to see if it has been boosted,
> > and deboost itself if so.  I don't particularly like the additional
> > comparison, but it should not be too expensive.
> 
> Oh, blocks as in gets shoved into a wait queue for a PI-enabled lock.

Yep.

> > > Don't know what to think about it other than some kind of tracking or
> > > boosting logic in the per CPU run queue or the task struct itself during
> > > the boost operation. But you're still stuck with the problem of what
> > > to boost and how to find that out during an RCU sync side. It's still
> > > an ownership problem unless Esben can think of another way of getting
> > > around that problem.
> > 
> > One idea is to put tasks that block in RCU read-side critical sections
> > on a list -- again, the hope is that the overhead is in the noise compared
> > to the context switch.
> 
> Only way to find out is to try it.

Or to try without it and see what happens.

> > > That's why I suggested a priority ceiling or per CPU priority threshold
> > > tracking (+ CPU binding) the priority of the irq-threads and stuff. It's
> > > a simple hack to restore the cheesy preempt count stuff without having
> > > to revert to invasive ownership tracking for each reader.
> > > 
> > > It's just an idea. Maybe it'll be useful to you.
> > 
> > Let me make sure I understand what you are suggesting -- sounds to me
> > like a check in preempt_schedule().  If the task to be preempted is
> > higher priority than the ceiling, the preemption request is refused.
> > 
> > Or am I missing part of your proposal?
> 
> Something like adding, at all preemption points, cond_resched() and friends
> (scheduler tick), an additional check against a value in a thread's own CPU
> run queue struct to see if it should permit the preemption or not. I'm
> thinking about ways to avoid taking an expensive run queue lock during a
> task's priority manipulation and instead have some other kind of logic
> orthogonal to that so that it can bypass this overhead.  A value in the run
> queue that can be checked against in order to prevent a preemption from
> happening might be able to sidestep the need for taking a full run queue
> lock to reorder a task's priority ranking.
> 
> If a ceiling or threshold was defined for RCU (another somewhat complicated
> topic) it could prevent the RCU critical section from being preempted by
> anything other than SCHED_FIFO tasks at and above that priority, if you
> choose a threshold at that priority. That'll be a part of the runtime
> configuration of the system. You'd have to cpu_get/put to get that value so
> that you get at it safely, read or write to it, and maybe save and restore
> that value on entry and exit respectively. You'll also have to set a field
> in the task struct to prevent it from migrating to another CPU and make
> sure that's modifying the right stuff on the right CPU.
> 
> It's a possible solution to a rather difficult problem. What do you think ?
> too much of a hack ?

I am not sure -- seems to be a dual approach to boosting the RCU reader's
priority in the preemption case.  I suspect that a real priority boost
would still be needed in the case where the RCU reader blocks on a mutex,
since we need the priority inheritance to happen in that case, right?

> (I'm into -rt development again after a good OLS and I'm trying to get my
> kernel development up and going so that I can help out)

Sounds very good!!!

							Thanx, Paul


* Re: [RFC, PATCH, -rt] Early prototype RCU priority-boost patch
  2006-07-31 14:38           ` Paul E. McKenney
@ 2006-07-31 22:22             ` Bill Huey
  0 siblings, 0 replies; 10+ messages in thread
From: Bill Huey @ 2006-07-31 22:22 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Esben Nielsen, linux-kernel, tglx, rostedt, dipankar, mingo,
	tytso, dvhltc, Bill Huey (hui)

On Mon, Jul 31, 2006 at 07:38:50AM -0700, Paul E. McKenney wrote:
> On Fri, Jul 28, 2006 at 07:50:37PM -0700, Bill Huey wrote:
> > The problem here is that I can't see how it's going to boost the thread
> > if the things doing the RCU sync can't track the list of readers. It
> > might be recorded in the task struct, but now what ?
> 
> The first boost is performed by the task itself the first time there
> is a preemption attempt (or the first time it blocks on a mutex), so
> no need to track the list of readers in that case.  The trick is that
> there is no benefit to boosting someone who is already running -- we
> only need to boost the first time they are considering blocking.
> 
> If there is a need for a second "boost to the sky" in case of excessively
> delayed grace period (or to provide deterministic synchronize_rcu()
> latency), then we need a list only of those RCU readers who have attempted
> to block at least once thus far in their current RCU read-side critical
> section.  But I was putting this off until I get the simple case right.
> Cowardly of me, I know!  ;-)
 
Ok, I see what you're talking about.

> Finally found Steve Rostedt's PI document (in 2.6.18-rc2), very helpful
> (though I suppose I should reserve judgement until after I get this
> working...)
> 
> > It's a possible solution to a rather difficult problem. What do you think ?
> > too much of a hack ?
> 
> I am not sure -- seems to be a dual approach to boosting the RCU reader's
> priority in the preemption case.  I suspect that a real priority boost
> would still be needed in the case where the RCU reader blocks on a mutex,
> since we need the priority inheritance to happen in that case, right?

This is unfortunately true. It may be that my suggestion isn't going to
work in this scenario. I'll have to think about this more. I think that
either way you still have to record somewhere that you're in a live RCU
reader section anyway, either through an RCU read-side counter or some kind
of mechanism like that (boosted priority or ceiling can denote some use of
RCU, as in some kind of count, but this is getting more complicated and
remotely less useful), and still take that into account when you do
priority inheritance.

I'll have to think about this more, but you are probably completely
correct and it might not be possible to priority boost an RCU read side
without some kind of explicit reader tracking in the task struct.

bill



end of thread

Thread overview: 10+ messages
2006-07-28  0:19 [RFC, PATCH, -rt] Early prototype RCU priority-boost patch Paul E. McKenney
2006-07-28 11:38 ` Esben Nielsen
2006-07-28 15:52   ` Paul E. McKenney
2006-07-28 20:00     ` Esben Nielsen
2006-07-29  2:20       ` Paul E. McKenney
2006-07-28 22:27     ` Bill Huey
2006-07-29  2:18       ` Paul E. McKenney
2006-07-29  2:50         ` Bill Huey
2006-07-31 14:38           ` Paul E. McKenney
2006-07-31 22:22             ` Bill Huey
