* Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
[not found] ` <20060626181418.70aeffd3.akpm@osdl.org>
@ 2006-06-27 1:37 ` Paul E. McKenney
0 siblings, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2006-06-27 1:37 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, matthltc, dipankar, stern, mingo, tytso, dvhltc
On Mon, Jun 26, 2006 at 06:14:18PM -0700, Andrew Morton wrote:
> On Mon, 26 Jun 2006 17:53:51 -0700
> "Paul E. McKenney" <paulmck@us.ibm.com> wrote:
>
> > > > +struct srcu_struct_array {
> > > > + int c[2];
> > > > +} ____cacheline_internode_aligned_in_smp;
> > >
> > > ____cacheline_internode_aligned_in_smp isn't implemented..
> >
> > It was not long ago... :-/
>
> I was trying to work out why on earth this compiled.
>
> It gives you a global variable called
> ____cacheline_internode_aligned_in_smp. That works nicely until you
> include this header file from two .c files, at which time you get two
> global variables called ____cacheline_internode_aligned_in_smp. And the
> linker will happily swallow even that unless you're using -fno-common.
Hmmm...
Sounds like percpu_alloc() is strongly recommended. Made the changes,
compiling, hope to test overnight.
> > > > + if (sum == 0)
> > > > + break;
> > > > + schedule_timeout_interruptible(1);
> > > > + }
> > >
> > > Little sleeps like this are a worry. It's usually an indication that we've
> > > been lazy and haven't put in the wakeups which are needed for a
> > > minimum-latency wait.
> >
> > I have been even -more- lazy and have absolutely -no- wakeups. ;-)
> > The alternative would be to have srcu_read_unlock() wake up the
> > task doing the synchronize_srcu(), but getting that right is painful.
>
> Shouldn't be too hard...
>
> A wakeup can be relatively expensive, but one can often do
>
> if (something_which_is_inexpensive())
> wake_up(...);
>
> although it takes care.
Exactly. ;-) Been there, done that, gotten it right, but have also
run up almost every blind alley that there is.
Besides, prior to this, there is a synchronize_sched(). In many cases,
the readers will have all completed during the synchronize_sched()
latency, so my bet is that the extra complexity will have no benefit in
the common case. And if someone comes up with with a good reason to do
a blocking network receive or some such in the SRCU read-side critical
sections, I will be happy to add the wakeup machinations.
Fair enough?
(Besides, we will want to save some of the complexity budget for a
hierarchical implementation should Jesse Barnes prove correct about
future 1,000-CPU dies, right?)
> if you want to be _really_ sleazy you can do
>
> if (something_which_is_inexpensive_and_isnt_quite_right())
> wake_up(...);
>
> and, at the other end:
>
> while (something) {
> schedule_timeout_interruptible(1);
> }
>
> and rely upon the flakey-wakeup to work most of the time, so it usually
> interrupts the sleep.
Urg...
> Now erase this from your mind.
To erase it from my mind, I would have had to allow it to get that far
in the first place. ;-)
Thanx, Paul
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
2006-06-27 21:13 [PATCH 1/2] srcu: RCU variant permitting read-side blocking Oleg Nesterov
@ 2006-06-27 18:59 ` Paul E. McKenney
2006-06-27 19:19 ` Paul E. McKenney
2006-06-28 19:41 ` Oleg Nesterov
0 siblings, 2 replies; 6+ messages in thread
From: Paul E. McKenney @ 2006-06-27 18:59 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: linux-kernel
On Wed, Jun 28, 2006 at 01:13:58AM +0400, Oleg Nesterov wrote:
> Hello Paul,
>
> "Paul E. McKenney" wrote:
> >
> > +void init_srcu_struct(struct srcu_struct *sp)
> > +{
> > + int cpu;
> > +
> > + sp->completed = 0;
> > + sp->per_cpu_ref = (struct srcu_struct_array *)
> > + kmalloc(NR_CPUS * sizeof(*sp->per_cpu_ref),
> > + GFP_KERNEL);
> > + for_each_cpu(cpu) {
> > + sp->per_cpu_ref[cpu].c[0] = 0;
> > + sp->per_cpu_ref[cpu].c[1] = 0;
> > + }
>
> Isn't it simpler to just do:
>
> sp->per_cpu_ref = kzmalloc(NR_CPUS * sizeof(*sp->per_cpu_ref),
> GFP_KERNEL);
>
> and drop 'for_each_cpu(cpu)' initialization ?
Yes, and even simpler to use the alloc_percpu(), as Andrew suggested.
> > +int srcu_read_lock(struct srcu_struct *sp)
> > +{
> > + int idx;
> > +
> > + preempt_disable();
> > + idx = sp->completed & 0x1;
> > + barrier();
> > + sp->per_cpu_ref[smp_processor_id()].c[idx]++;
> > + preempt_enable();
> > + return idx;
> > +}
>
> Could you explain this 'barrier()' ?
It ensures that the compiler picks up sp->completed but once.
It is hard to imagine a compiler generating code that fetched sp->completed
more than once, but I have been unpleasantly surprised before.
Thoughts?
> > +void synchronize_srcu(struct srcu_struct *sp)
> > +{
> > + int cpu;
> > + int idx;
> > + int sum;
> > +
> > + might_sleep();
> > +
> > + mutex_lock(&sp->mutex);
> > +
> > + smp_mb(); /* Prevent operations from leaking in. */
>
> Why smp_wmb() is not enough? We are doing synchronize_sched() below
> before reading ->per_cpu_ref, and ->completed is protected by ->mutex.
Could well be that smp_wmb() is sufficient. I frankly was not engaging
in that level of optimization on this round. Seems likely, given that
I was not able to come up with a convincing counter-example.
That said, I am not going to change it until I can prove that it is
safe. ;-)
> > + idx = sp->completed & 0x1;
> > + sp->completed++;
>
> But srcu_read_lock()'s path and rcu_dereference() doesn't have rmb(),
> and the reader can block, so I can't understand how this all works.
>
> Suppose ->completed == 0,
>
> WRITER: READER:
>
> old = global_ptr;
> rcu_assign_pointer(global_ptr, new);
>
> synchronize_srcu:
>
> locks mutex, does mb,
> ->completed++;
>
> srcu_read_lock();
> // reads ->completed == 1
> // does .c[1]++
> ptr = rcu_dereference(global_ptr)
> // reads the *OLD* value,
> // because we don't have rmb()
Hmmm... I thought I was handling this case, but my rationale as to
how is looking a bit flimsy at the moment. ;-) I will look at this
more carefully. If you are correct, one fix is to replace the prior mb
with synchronize_sched(). Do you agree that this would fix the problem?
> block_on_something();
>
>
> synchronize_sched();
The above synchronize_sched() guarantees that all srcu_read_lock() calls
that are still in flight will either (1) already be accounted for in .c[1]
or (2) do their accounting in .c[0].
> // ... still blocked ...
>
> checks sum_of(.c[0]) == 0, yes
>
> synchronize_sched();
This one handles the srcu_read_unlock() analog of the situation you
are worried about above. The reader does not have memory barriers in
srcu_read_unlock(), so an access to the data structure might get
reordered to follow the decrement of .c[0] -- which would get messed
up by the following kfree().
The synchronize_sched() guarantees that all concurrent srcu_read_unlock()
calls complete cleanly before synchronize_sched() returns, inserting
a memory barrier on each CPU to enforce this.
> // ... still blocked ...
>
> kfree(old);
>
> // wake up
> do_something(ptr);
>
>
> Also, I can't understand the purpose of 2-nd synchronize_sched() in
> synchronize_srcu().
(See above.)
> Please help!
Thank you for the careful review! I will look more carefully into the
scenario you called out above.
Thanx, Paul
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
2006-06-27 18:59 ` Paul E. McKenney
@ 2006-06-27 19:19 ` Paul E. McKenney
2006-06-28 19:41 ` Oleg Nesterov
1 sibling, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2006-06-27 19:19 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: linux-kernel
On Tue, Jun 27, 2006 at 11:59:45AM -0700, Paul E. McKenney wrote:
> On Wed, Jun 28, 2006 at 01:13:58AM +0400, Oleg Nesterov wrote:
> > "Paul E. McKenney" wrote:
> > > + idx = sp->completed & 0x1;
> > > + sp->completed++;
> >
> > But srcu_read_lock()'s path and rcu_dereference() doesn't have rmb(),
> > and the reader can block, so I can't understand how this all works.
> >
> > Suppose ->completed == 0,
> >
> > WRITER: READER:
> >
> > old = global_ptr;
> > rcu_assign_pointer(global_ptr, new);
> >
> > synchronize_srcu:
> >
> > locks mutex, does mb,
> > ->completed++;
> >
> > srcu_read_lock();
> > // reads ->completed == 1
> > // does .c[1]++
> > ptr = rcu_dereference(global_ptr)
> > // reads the *OLD* value,
> > // because we don't have rmb()
>
> Hmmm... I thought I was handling this case, but my rationale as to
> how is looking a bit flimsy at the moment. ;-) I will look at this
> more carefully. If you are correct, one fix is to replace the prior mb
> with synchronize_sched(). Do you agree that this would fix the problem?
Never mind -- this fix would have no effect. Guess I should engage
my brain before sending email. :-/
First to review my reasoning, and then to provide either the explanation
should my reasoning prove true or a fix otherwise...
Thanx, Paul
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
@ 2006-06-27 21:13 Oleg Nesterov
2006-06-27 18:59 ` Paul E. McKenney
0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2006-06-27 21:13 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: linux-kernel
Hello Paul,
"Paul E. McKenney" wrote:
>
> +void init_srcu_struct(struct srcu_struct *sp)
> +{
> + int cpu;
> +
> + sp->completed = 0;
> + sp->per_cpu_ref = (struct srcu_struct_array *)
> + kmalloc(NR_CPUS * sizeof(*sp->per_cpu_ref),
> + GFP_KERNEL);
> + for_each_cpu(cpu) {
> + sp->per_cpu_ref[cpu].c[0] = 0;
> + sp->per_cpu_ref[cpu].c[1] = 0;
> + }
Isn't it simpler to just do:
sp->per_cpu_ref = kzmalloc(NR_CPUS * sizeof(*sp->per_cpu_ref),
GFP_KERNEL);
and drop 'for_each_cpu(cpu)' initialization ?
> +int srcu_read_lock(struct srcu_struct *sp)
> +{
> + int idx;
> +
> + preempt_disable();
> + idx = sp->completed & 0x1;
> + barrier();
> + sp->per_cpu_ref[smp_processor_id()].c[idx]++;
> + preempt_enable();
> + return idx;
> +}
Could you explain this 'barrier()' ?
> +void synchronize_srcu(struct srcu_struct *sp)
> +{
> + int cpu;
> + int idx;
> + int sum;
> +
> + might_sleep();
> +
> + mutex_lock(&sp->mutex);
> +
> + smp_mb(); /* Prevent operations from leaking in. */
Why smp_wmb() is not enough? We are doing synchronize_sched() below
before reading ->per_cpu_ref, and ->completed is protected by ->mutex.
> + idx = sp->completed & 0x1;
> + sp->completed++;
But srcu_read_lock()'s path and rcu_dereference() doesn't have rmb(),
and the reader can block, so I can't understand how this all works.
Suppose ->completed == 0,
WRITER: READER:
old = global_ptr;
rcu_assign_pointer(global_ptr, new);
synchronize_srcu:
locks mutex, does mb,
->completed++;
srcu_read_lock();
// reads ->completed == 1
// does .c[1]++
ptr = rcu_dereference(global_ptr)
// reads the *OLD* value,
// because we don't have rmb()
block_on_something();
synchronize_sched();
// ... still blocked ...
checks sum_of(.c[0]) == 0, yes
synchronize_sched();
// ... still blocked ...
kfree(old);
// wake up
do_something(ptr);
Also, I can't understand the purpose of 2-nd synchronize_sched() in
synchronize_srcu().
Please help!
Oleg.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
2006-06-28 19:41 ` Oleg Nesterov
@ 2006-06-28 15:32 ` Paul E. McKenney
0 siblings, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2006-06-28 15:32 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: linux-kernel
On Wed, Jun 28, 2006 at 11:41:21PM +0400, Oleg Nesterov wrote:
> On 06/27, Paul E. McKenney wrote:
> >
> > On Wed, Jun 28, 2006 at 01:13:58AM +0400, Oleg Nesterov wrote:
> > >
> > > Also, I can't understand the purpose of 2-nd synchronize_sched() in
> > > synchronize_srcu().
> >
> > This one handles the srcu_read_unlock() analog of the situation you
> > are worried about above. The reader does not have memory barriers in
> > srcu_read_unlock(), so an access to the data structure might get
> > reordered to follow the decrement of .c[0] -- which would get messed
> > up by the following kfree().
>
> Aha, I see.
Fortunately, we understood opposite sides of the problem, so, taken
together, we have it covered. ;-)
Now we just need to figure out how to find the problems that both of
us missed!
> The last question. The 'srcu-2' you posted today does synchronize_srcu_flip()
> twice. You did it this way because srcu is optimized for readers, otherwise we
> could just add smp_rmb() into srcu_read_lock() - this should solve the problem
> as well.
>
> Is my understanding correct?
Exactly correct!!!
Thanx, Paul
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
2006-06-27 18:59 ` Paul E. McKenney
2006-06-27 19:19 ` Paul E. McKenney
@ 2006-06-28 19:41 ` Oleg Nesterov
2006-06-28 15:32 ` Paul E. McKenney
1 sibling, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2006-06-28 19:41 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: linux-kernel
On 06/27, Paul E. McKenney wrote:
>
> On Wed, Jun 28, 2006 at 01:13:58AM +0400, Oleg Nesterov wrote:
> >
> > Also, I can't understand the purpose of 2-nd synchronize_sched() in
> > synchronize_srcu().
>
> This one handles the srcu_read_unlock() analog of the situation you
> are worried about above. The reader does not have memory barriers in
> srcu_read_unlock(), so an access to the data structure might get
> reordered to follow the decrement of .c[0] -- which would get messed
> up by the following kfree().
Aha, I see.
The last question. The 'srcu-2' you posted today does synchronize_srcu_flip()
twice. You did it this way because srcu is optimized for readers, otherwise we
could just add smp_rmb() into srcu_read_lock() - this should solve the problem
as well.
Is my understanding correct?
Thanks!
Oleg.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2006-06-28 15:31 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-06-27 21:13 [PATCH 1/2] srcu: RCU variant permitting read-side blocking Oleg Nesterov
2006-06-27 18:59 ` Paul E. McKenney
2006-06-27 19:19 ` Paul E. McKenney
2006-06-28 19:41 ` Oleg Nesterov
2006-06-28 15:32 ` Paul E. McKenney
[not found] <20060626190328.GD2141@us.ibm.com>
[not found] ` <20060626190743.GE2141@us.ibm.com>
[not found] ` <20060626134447.a75cb385.akpm@osdl.org>
[not found] ` <20060627005350.GG1295@us.ibm.com>
[not found] ` <20060626181418.70aeffd3.akpm@osdl.org>
2006-06-27 1:37 ` Paul E. McKenney
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox