Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
       [not found]       ` <20060626181418.70aeffd3.akpm@osdl.org>
@ 2006-06-27  1:37         ` Paul E. McKenney
  0 siblings, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2006-06-27  1:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, matthltc, dipankar, stern, mingo, tytso, dvhltc

On Mon, Jun 26, 2006 at 06:14:18PM -0700, Andrew Morton wrote:
> On Mon, 26 Jun 2006 17:53:51 -0700
> "Paul E. McKenney" <paulmck@us.ibm.com> wrote:
> 
> > > > +struct srcu_struct_array {
> > > > +	int c[2];
> > > > +} ____cacheline_internode_aligned_in_smp;
> > > 
> > > ____cacheline_internode_aligned_in_smp isn't implemented..
> > 
> > It was not long ago...  :-/
> 
> I was trying to work out why on earth this compiled.
> 
> It gives you a global variable called
> ____cacheline_internode_aligned_in_smp.  That works nicely until you
> include this header file from two .c files, at which time you get two
> global variables called ____cacheline_internode_aligned_in_smp.  And the
> linker will happily swallow even that unless you're using -fno-common.

Hmmm...

Sounds like percpu_alloc() is strongly recommended.  Made the changes,
compiling, hope to test overnight.

> > > > +		if (sum == 0)
> > > > +			break;
> > > > +		schedule_timeout_interruptible(1);
> > > > +	}
> > > 
> > > Little sleeps like this are a worry.  It's usually an indication that we've
> > > been lazy and haven't put in the wakeups which are needed for a
> > > minimum-latency wait.
> > 
> > I have been even -more- lazy and have absolutely -no- wakeups.  ;-)
> > The alternative would be to have srcu_read_unlock() wake up the
> > task doing the synchronize_srcu(), but getting that right is painful.
> 
> Shouldn't be too hard...
> 
> A wakeup can be relatively expensive, but one can often do
> 
> 	if (something_which_is_inexpensive())
> 		wake_up(...);
> 
> although it takes care.

Exactly.  ;-)  Been there, done that, gotten it right, but have also
run up almost every blind alley that there is.

Besides, prior to this, there is a synchronize_sched().  In many cases,
the readers will have all completed during the synchronize_sched()
latency, so my bet is that the extra complexity will have no benefit in
the common case.  And if someone comes up with with a good reason to do
a blocking network receive or some such in the SRCU read-side critical
sections, I will be happy to add the wakeup machinations.

Fair enough?

(Besides, we will want to save some of the complexity budget for a
hierarchical implementation should Jesse Barnes prove correct about
future 1,000-CPU dies, right?)

> if you want to be _really_ sleazy you can do
> 
> 	if (something_which_is_inexpensive_and_isnt_quite_right())
> 		wake_up(...);
> 
> and, at the other end:
> 
> 	while (something) {
> 		schedule_timeout_interruptible(1);
> 	}
> 
> and rely upon the flakey-wakeup to work most of the time, so it usually
> interrupts the sleep.

Urg...

> Now erase this from your mind.

To erase it from my mind, I would have had to allow it to get that far
in the first place.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
  2006-06-27 21:13 [PATCH 1/2] srcu: RCU variant permitting read-side blocking Oleg Nesterov
@ 2006-06-27 18:59 ` Paul E. McKenney
  2006-06-27 19:19   ` Paul E. McKenney
  2006-06-28 19:41   ` Oleg Nesterov
  0 siblings, 2 replies; 6+ messages in thread
From: Paul E. McKenney @ 2006-06-27 18:59 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel

On Wed, Jun 28, 2006 at 01:13:58AM +0400, Oleg Nesterov wrote:
> Hello Paul,
> 
> "Paul E. McKenney" wrote:
> >
> > +void init_srcu_struct(struct srcu_struct *sp)
> > +{
> > +	int cpu;
> > +
> > +	sp->completed = 0;
> > +	sp->per_cpu_ref = (struct srcu_struct_array *)
> > +			  kmalloc(NR_CPUS * sizeof(*sp->per_cpu_ref),
> > +				  GFP_KERNEL);
> > +	for_each_cpu(cpu) {
> > +		sp->per_cpu_ref[cpu].c[0] = 0;
> > +		sp->per_cpu_ref[cpu].c[1] = 0;
> > +	}
> 
> Isn't it simpler to just do:
> 
> 	sp->per_cpu_ref = kzmalloc(NR_CPUS * sizeof(*sp->per_cpu_ref),
> 				GFP_KERNEL);
> 
> and drop 'for_each_cpu(cpu)' initialization ?

Yes, and even simpler to use the alloc_percpu(), as Andrew suggested.

> > +int srcu_read_lock(struct srcu_struct *sp)
> > +{
> > +	int idx;
> > +
> > +	preempt_disable();
> > +	idx = sp->completed & 0x1;
> > +	barrier();
> > +	sp->per_cpu_ref[smp_processor_id()].c[idx]++;
> > +	preempt_enable();
> > +	return idx;
> > +}
> 
> Could you explain this 'barrier()' ?

It ensures that the compiler picks up sp->completed but once.
It is hard to imagine a compiler generating code that fetched sp->completed
more than once, but I have been unpleasantly surprised before.

Thoughts?

> > +void synchronize_srcu(struct srcu_struct *sp)
> > +{
> > +	int cpu;
> > +	int idx;
> > +	int sum;
> > +
> > +	might_sleep();
> > +
> > +	mutex_lock(&sp->mutex);
> > +
> > +	smp_mb();  /* Prevent operations from leaking in. */
> 
> Why smp_wmb() is not enough? We are doing synchronize_sched() below
> before reading ->per_cpu_ref, and ->completed is protected by ->mutex.

Could well be that smp_wmb() is sufficient.  I frankly was not engaging
in that level of optimization on this round.  Seems likely, given that
I was not able to come up with a convincing counter-example.

That said, I am not going to change it until I can prove that it is
safe.  ;-)

> > +	idx = sp->completed & 0x1;
> > +	sp->completed++;
> 
> But srcu_read_lock()'s path and rcu_dereference() doesn't have rmb(),
> and the reader can block, so I can't understand how this all works.
> 
> Suppose ->completed == 0,
> 
> 	WRITER:						READER:
> 
> 	old = global_ptr;
> 	rcu_assign_pointer(global_ptr, new);
> 
> 	synchronize_srcu:
> 
> 		locks mutex, does mb,
> 		->completed++;
> 		
> 							srcu_read_lock();
> 								// reads ->completed == 1
> 								// does .c[1]++
> 							ptr = rcu_dereference(global_ptr)
> 								// reads the *OLD* value,
> 								// because we don't have rmb()

Hmmm...  I thought I was handling this case, but my rationale as to
how is looking a bit flimsy at the moment.  ;-)  I will look at this
more carefully.  If you are correct, one fix is to replace the prior mb
with synchronize_sched().  Do you agree that this would fix the problem?

> 							block_on_something();
> 
> 
> 		synchronize_sched();

The above synchronize_sched() guarantees that all srcu_read_lock() calls
that are still in flight will either (1) already be accounted for in .c[1]
or (2) do their accounting in .c[0].

> 							// ... still blocked ...
> 
> 		checks sum_of(.c[0]) == 0, yes
> 
> 		synchronize_sched();

This one handles the srcu_read_unlock() analog of the situation you
are worried about above.  The reader does not have memory barriers in
srcu_read_unlock(), so an access to the data structure might get
reordered to follow the decrement of .c[0] -- which would get messed
up by the following kfree().

The synchronize_sched() guarantees that all concurrent srcu_read_unlock()
calls complete cleanly before synchronize_sched() returns, inserting
a memory barrier on each CPU to enforce this.

> 							// ... still blocked ...
> 
> 	kfree(old);
> 
> 							// wake up
> 							do_something(ptr);
> 
> 
> Also, I can't understand the purpose of 2-nd synchronize_sched() in
> synchronize_srcu().

(See above.)

> Please help!

Thank you for the careful review!  I will look more carefully into the
scenario you called out above.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
  2006-06-27 18:59 ` Paul E. McKenney
@ 2006-06-27 19:19   ` Paul E. McKenney
  2006-06-28 19:41   ` Oleg Nesterov
  1 sibling, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2006-06-27 19:19 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel

On Tue, Jun 27, 2006 at 11:59:45AM -0700, Paul E. McKenney wrote:
> On Wed, Jun 28, 2006 at 01:13:58AM +0400, Oleg Nesterov wrote:
> > "Paul E. McKenney" wrote:
> > > +	idx = sp->completed & 0x1;
> > > +	sp->completed++;
> > 
> > But srcu_read_lock()'s path and rcu_dereference() doesn't have rmb(),
> > and the reader can block, so I can't understand how this all works.
> > 
> > Suppose ->completed == 0,
> > 
> > 	WRITER:						READER:
> > 
> > 	old = global_ptr;
> > 	rcu_assign_pointer(global_ptr, new);
> > 
> > 	synchronize_srcu:
> > 
> > 		locks mutex, does mb,
> > 		->completed++;
> > 		
> > 							srcu_read_lock();
> > 								// reads ->completed == 1
> > 								// does .c[1]++
> > 							ptr = rcu_dereference(global_ptr)
> > 								// reads the *OLD* value,
> > 								// because we don't have rmb()
> 
> Hmmm...  I thought I was handling this case, but my rationale as to
> how is looking a bit flimsy at the moment.  ;-)  I will look at this
> more carefully.  If you are correct, one fix is to replace the prior mb
> with synchronize_sched().  Do you agree that this would fix the problem?

Never mind -- this fix would have no effect.  Guess I should engage
my brain before sending email.  :-/

First to review my reasoning, and then to provide either the explanation
should my reasoning prove true or a fix otherwise...

							Thanx, Paul

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
@ 2006-06-27 21:13 Oleg Nesterov
  2006-06-27 18:59 ` Paul E. McKenney
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2006-06-27 21:13 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel

Hello Paul,

"Paul E. McKenney" wrote:
>
> +void init_srcu_struct(struct srcu_struct *sp)
> +{
> +	int cpu;
> +
> +	sp->completed = 0;
> +	sp->per_cpu_ref = (struct srcu_struct_array *)
> +			  kmalloc(NR_CPUS * sizeof(*sp->per_cpu_ref),
> +				  GFP_KERNEL);
> +	for_each_cpu(cpu) {
> +		sp->per_cpu_ref[cpu].c[0] = 0;
> +		sp->per_cpu_ref[cpu].c[1] = 0;
> +	}

Isn't it simpler to just do:

	sp->per_cpu_ref = kzmalloc(NR_CPUS * sizeof(*sp->per_cpu_ref),
				GFP_KERNEL);

and drop 'for_each_cpu(cpu)' initialization ?

> +int srcu_read_lock(struct srcu_struct *sp)
> +{
> +	int idx;
> +
> +	preempt_disable();
> +	idx = sp->completed & 0x1;
> +	barrier();
> +	sp->per_cpu_ref[smp_processor_id()].c[idx]++;
> +	preempt_enable();
> +	return idx;
> +}

Could you explain this 'barrier()' ?

> +void synchronize_srcu(struct srcu_struct *sp)
> +{
> +	int cpu;
> +	int idx;
> +	int sum;
> +
> +	might_sleep();
> +
> +	mutex_lock(&sp->mutex);
> +
> +	smp_mb();  /* Prevent operations from leaking in. */

Why smp_wmb() is not enough? We are doing synchronize_sched() below
before reading ->per_cpu_ref, and ->completed is protected by ->mutex.

> +	idx = sp->completed & 0x1;
> +	sp->completed++;

But srcu_read_lock()'s path and rcu_dereference() doesn't have rmb(),
and the reader can block, so I can't understand how this all works.

Suppose ->completed == 0,

	WRITER:						READER:

	old = global_ptr;
	rcu_assign_pointer(global_ptr, new);

	synchronize_srcu:

		locks mutex, does mb,
		->completed++;
		
							srcu_read_lock();
								// reads ->completed == 1
								// does .c[1]++
							ptr = rcu_dereference(global_ptr)
								// reads the *OLD* value,
								// because we don't have rmb()

							block_on_something();


		synchronize_sched();
							// ... still blocked ...

		checks sum_of(.c[0]) == 0, yes

		synchronize_sched();
							// ... still blocked ...

	kfree(old);

							// wake up
							do_something(ptr);


Also, I can't understand the purpose of 2-nd synchronize_sched() in
synchronize_srcu().

Please help!

Oleg.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
  2006-06-28 19:41   ` Oleg Nesterov
@ 2006-06-28 15:32     ` Paul E. McKenney
  0 siblings, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2006-06-28 15:32 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel

On Wed, Jun 28, 2006 at 11:41:21PM +0400, Oleg Nesterov wrote:
> On 06/27, Paul E. McKenney wrote:
> >
> > On Wed, Jun 28, 2006 at 01:13:58AM +0400, Oleg Nesterov wrote:
> > > 
> > > Also, I can't understand the purpose of 2-nd synchronize_sched() in
> > > synchronize_srcu().
> > 
> > This one handles the srcu_read_unlock() analog of the situation you
> > are worried about above.  The reader does not have memory barriers in
> > srcu_read_unlock(), so an access to the data structure might get
> > reordered to follow the decrement of .c[0] -- which would get messed
> > up by the following kfree().
> 
> Aha, I see.

Fortunately, we understood opposite sides of the problem, so, taken
together, we have it covered.  ;-)

Now we just need to figure out how to find the problems that both of
us missed!

> The last question. The 'srcu-2' you posted today does synchronize_srcu_flip()
> twice. You did it this way because srcu is optimized for readers, otherwise we
> could just add smp_rmb() into srcu_read_lock() - this should solve the problem
> as well.
> 
> Is my understanding correct?

Exactly correct!!!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
  2006-06-27 18:59 ` Paul E. McKenney
  2006-06-27 19:19   ` Paul E. McKenney
@ 2006-06-28 19:41   ` Oleg Nesterov
  2006-06-28 15:32     ` Paul E. McKenney
  1 sibling, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2006-06-28 19:41 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: linux-kernel

On 06/27, Paul E. McKenney wrote:
>
> On Wed, Jun 28, 2006 at 01:13:58AM +0400, Oleg Nesterov wrote:
> > 
> > Also, I can't understand the purpose of 2-nd synchronize_sched() in
> > synchronize_srcu().
> 
> This one handles the srcu_read_unlock() analog of the situation you
> are worried about above.  The reader does not have memory barriers in
> srcu_read_unlock(), so an access to the data structure might get
> reordered to follow the decrement of .c[0] -- which would get messed
> up by the following kfree().

Aha, I see.

The last question. The 'srcu-2' you posted today does synchronize_srcu_flip()
twice. You did it this way because srcu is optimized for readers, otherwise we
could just add smp_rmb() into srcu_read_lock() - this should solve the problem
as well.

Is my understanding correct?

Thanks!

Oleg.


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2006-06-28 15:31 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-06-27 21:13 [PATCH 1/2] srcu: RCU variant permitting read-side blocking Oleg Nesterov
2006-06-27 18:59 ` Paul E. McKenney
2006-06-27 19:19   ` Paul E. McKenney
2006-06-28 19:41   ` Oleg Nesterov
2006-06-28 15:32     ` Paul E. McKenney
     [not found] <20060626190328.GD2141@us.ibm.com>
     [not found] ` <20060626190743.GE2141@us.ibm.com>
     [not found]   ` <20060626134447.a75cb385.akpm@osdl.org>
     [not found]     ` <20060627005350.GG1295@us.ibm.com>
     [not found]       ` <20060626181418.70aeffd3.akpm@osdl.org>
2006-06-27  1:37         ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox