* Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
@ 2006-06-27 21:13 Oleg Nesterov
  2006-06-27 18:59 ` Paul E. McKenney
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2006-06-27 21:13 UTC
  To: Paul E. McKenney; +Cc: linux-kernel

Hello Paul,

"Paul E. McKenney" wrote:
>
> +void init_srcu_struct(struct srcu_struct *sp)
> +{
> +        int cpu;
> +
> +        sp->completed = 0;
> +        sp->per_cpu_ref = (struct srcu_struct_array *)
> +                          kmalloc(NR_CPUS * sizeof(*sp->per_cpu_ref),
> +                                  GFP_KERNEL);
> +        for_each_cpu(cpu) {
> +                sp->per_cpu_ref[cpu].c[0] = 0;
> +                sp->per_cpu_ref[cpu].c[1] = 0;
> +        }

Isn't it simpler to just do:

        sp->per_cpu_ref = kzalloc(NR_CPUS * sizeof(*sp->per_cpu_ref),
                                  GFP_KERNEL);

and drop 'for_each_cpu(cpu)' initialization ?

> +int srcu_read_lock(struct srcu_struct *sp)
> +{
> +        int idx;
> +
> +        preempt_disable();
> +        idx = sp->completed & 0x1;
> +        barrier();
> +        sp->per_cpu_ref[smp_processor_id()].c[idx]++;
> +        preempt_enable();
> +        return idx;
> +}

Could you explain this 'barrier()' ?

> +void synchronize_srcu(struct srcu_struct *sp)
> +{
> +        int cpu;
> +        int idx;
> +        int sum;
> +
> +        might_sleep();
> +
> +        mutex_lock(&sp->mutex);
> +
> +        smp_mb();  /* Prevent operations from leaking in. */

Why smp_wmb() is not enough? We are doing synchronize_sched() below
before reading ->per_cpu_ref, and ->completed is protected by ->mutex.

> +        idx = sp->completed & 0x1;
> +        sp->completed++;

But srcu_read_lock()'s path and rcu_dereference() doesn't have rmb(),
and the reader can block, so I can't understand how this all works.

Suppose ->completed == 0,

WRITER:                                 READER:

old = global_ptr;
rcu_assign_pointer(global_ptr, new);

synchronize_srcu:

        locks mutex, does mb,
        ->completed++;

                                        srcu_read_lock();
                                                // reads ->completed == 1
                                                // does .c[1]++
                                        ptr = rcu_dereference(global_ptr)
                                                // reads the *OLD* value,
                                                // because we don't have rmb()

                                        block_on_something();

        synchronize_sched();
                                        // ... still blocked ...

        checks sum_of(.c[0]) == 0, yes

        synchronize_sched();
                                        // ... still blocked ...

kfree(old);

                                        // wake up
                                        do_something(ptr);

Also, I can't understand the purpose of 2-nd synchronize_sched() in
synchronize_srcu().

Please help!

Oleg.
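The interleaving Oleg describes can be replayed step by step in ordinary
userspace C. The following is a minimal sketch, not kernel code: the
model_* names, NCPUS, and the use of calloc() in place of kzalloc() are
assumptions made for illustration, and the reader's stale load of
global_ptr is modeled by a plain assignment, because sequential code
cannot reproduce hardware reordering.

```c
#include <assert.h>
#include <stdlib.h>

#define NCPUS 4  /* hypothetical CPU count standing in for NR_CPUS */

struct srcu_array { int c[2]; };

struct srcu_model {
    int completed;
    struct srcu_array *per_cpu_ref;
};

/* Zeroing allocation, as suggested in the mail: calloc() plays kzalloc(). */
static int model_init(struct srcu_model *sp)
{
    sp->completed = 0;
    sp->per_cpu_ref = calloc(NCPUS, sizeof(*sp->per_cpu_ref));
    return sp->per_cpu_ref ? 0 : -1;
}

/* Reader entry: pick the index from ->completed and bump that counter. */
static int model_read_lock(struct srcu_model *sp, int cpu)
{
    int idx = sp->completed & 0x1;

    sp->per_cpu_ref[cpu].c[idx]++;
    return idx;
}

/* Sum the counters of one index over all modeled CPUs. */
static int model_sum(const struct srcu_model *sp, int idx)
{
    int sum = 0;

    for (int cpu = 0; cpu < NCPUS; cpu++)
        sum += sp->per_cpu_ref[cpu].c[idx];
    return sum;
}

/*
 * Replays the scenario. Returns 1 if the writer would reach kfree(old)
 * while a reader holding the old pointer is still in its critical
 * section, 0 if not, -1 on allocation failure.
 */
int replay_scenario(void)
{
    struct srcu_model s;
    int old_obj = 1, new_obj = 2;
    int *global_ptr = &old_obj;
    int *old, *reader_ptr;
    int widx, ridx, writer_would_free, reader_still_inside;

    if (model_init(&s))
        return -1;

    /* WRITER: publish the new object, then start the grace period. */
    old = global_ptr;
    global_ptr = &new_obj;
    widx = s.completed & 0x1;
    s.completed++;

    /* READER on CPU 1: sees the flipped index and counts in .c[1] ... */
    ridx = model_read_lock(&s, 1);
    assert(ridx != widx);
    /* ... but, by assumption, its pointer load returned the OLD value. */
    reader_ptr = old;

    /* WRITER: only the counters of the old index are summed. */
    writer_would_free = (model_sum(&s, widx) == 0);
    reader_still_inside = (model_sum(&s, ridx) != 0) && (reader_ptr == old);

    free(s.per_cpu_ref);
    return writer_would_free && reader_still_inside;
}
```

Calling replay_scenario() reports the hazard: the sum over .c[0] is zero,
so the writer proceeds, while the only active reader is counted in .c[1]
and still holds the old pointer.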
* Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
  2006-06-27 21:13 [PATCH 1/2] srcu: RCU variant permitting read-side blocking Oleg Nesterov
@ 2006-06-27 18:59 ` Paul E. McKenney
  2006-06-27 19:19   ` Paul E. McKenney
  2006-06-28 19:41   ` Oleg Nesterov
  0 siblings, 2 replies; 6+ messages in thread
From: Paul E. McKenney @ 2006-06-27 18:59 UTC
  To: Oleg Nesterov; +Cc: linux-kernel

On Wed, Jun 28, 2006 at 01:13:58AM +0400, Oleg Nesterov wrote:
> Hello Paul,
>
> "Paul E. McKenney" wrote:
> >
> > +void init_srcu_struct(struct srcu_struct *sp)
> > +{
> > +        int cpu;
> > +
> > +        sp->completed = 0;
> > +        sp->per_cpu_ref = (struct srcu_struct_array *)
> > +                          kmalloc(NR_CPUS * sizeof(*sp->per_cpu_ref),
> > +                                  GFP_KERNEL);
> > +        for_each_cpu(cpu) {
> > +                sp->per_cpu_ref[cpu].c[0] = 0;
> > +                sp->per_cpu_ref[cpu].c[1] = 0;
> > +        }
>
> Isn't it simpler to just do:
>
>         sp->per_cpu_ref = kzalloc(NR_CPUS * sizeof(*sp->per_cpu_ref),
>                                   GFP_KERNEL);
>
> and drop 'for_each_cpu(cpu)' initialization ?

Yes, and even simpler to use the alloc_percpu(), as Andrew suggested.

> > +int srcu_read_lock(struct srcu_struct *sp)
> > +{
> > +        int idx;
> > +
> > +        preempt_disable();
> > +        idx = sp->completed & 0x1;
> > +        barrier();
> > +        sp->per_cpu_ref[smp_processor_id()].c[idx]++;
> > +        preempt_enable();
> > +        return idx;
> > +}
>
> Could you explain this 'barrier()' ?

It ensures that the compiler picks up sp->completed but once.  It is
hard to imagine a compiler generating code that fetched sp->completed
more than once, but I have been unpleasantly surprised before.

Thoughts?

> > +void synchronize_srcu(struct srcu_struct *sp)
> > +{
> > +        int cpu;
> > +        int idx;
> > +        int sum;
> > +
> > +        might_sleep();
> > +
> > +        mutex_lock(&sp->mutex);
> > +
> > +        smp_mb();  /* Prevent operations from leaking in. */
>
> Why smp_wmb() is not enough? We are doing synchronize_sched() below
> before reading ->per_cpu_ref, and ->completed is protected by ->mutex.

Could well be that smp_wmb() is sufficient.  I frankly was not engaging
in that level of optimization on this round.  Seems likely, given that
I was not able to come up with a convincing counter-example.  That said,
I am not going to change it until I can prove that it is safe.  ;-)

> > +        idx = sp->completed & 0x1;
> > +        sp->completed++;
>
> But srcu_read_lock()'s path and rcu_dereference() doesn't have rmb(),
> and the reader can block, so I can't understand how this all works.
>
> Suppose ->completed == 0,
>
> WRITER:                                 READER:
>
> old = global_ptr;
> rcu_assign_pointer(global_ptr, new);
>
> synchronize_srcu:
>
>         locks mutex, does mb,
>         ->completed++;
>
>                                         srcu_read_lock();
>                                                 // reads ->completed == 1
>                                                 // does .c[1]++
>                                         ptr = rcu_dereference(global_ptr)
>                                                 // reads the *OLD* value,
>                                                 // because we don't have rmb()

Hmmm...  I thought I was handling this case, but my rationale as to
how is looking a bit flimsy at the moment.  ;-)  I will look at this
more carefully.  If you are correct, one fix is to replace the prior mb
with synchronize_sched().  Do you agree that this would fix the problem?

>                                         block_on_something();
>
>
>         synchronize_sched();

The above synchronize_sched() guarantees that all srcu_read_lock() calls
that are still in flight will either (1) already be accounted for in
.c[1] or (2) do their accounting in .c[0].

>                                         // ... still blocked ...
>
>         checks sum_of(.c[0]) == 0, yes
>
>         synchronize_sched();

This one handles the srcu_read_unlock() analog of the situation you
are worried about above.  The reader does not have memory barriers in
srcu_read_unlock(), so an access to the data structure might get
reordered to follow the decrement of .c[0] -- which would get messed
up by the following kfree().  The synchronize_sched() guarantees that
all concurrent srcu_read_unlock() calls complete cleanly before
synchronize_sched() returns, inserting a memory barrier on each CPU
to enforce this.

>                                         // ... still blocked ...
>
> kfree(old);
>
>                                         // wake up
>                                         do_something(ptr);
>
>
> Also, I can't understand the purpose of 2-nd synchronize_sched() in
> synchronize_srcu().

(See above.)

> Please help!

Thank you for the careful review!  I will look more carefully into the
scenario you called out above.

                                                Thanx, Paul
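Paul's point about barrier() is that srcu_read_lock() must sample
sp->completed exactly once, so that the index it increments is the same
index it returns for the matching unlock. A minimal userspace sketch of
that idea follows. It is an illustration under assumptions, not the
patch's code: LOAD_ONCE_INT() is a hypothetical helper that forces a
single load through a volatile access (similar in spirit to the kernel's
READ_ONCE()), and the model_* names and NCPUS are invented here.

```c
#include <assert.h>

#define NCPUS 2  /* hypothetical CPU count */

/* Hypothetical helper: exactly one load of an int through a volatile access. */
#define LOAD_ONCE_INT(x) (*(volatile int *)&(x))

struct srcu_array { int c[2]; };

struct srcu_model {
    int completed;
    struct srcu_array per_cpu_ref[NCPUS];
};

/* Snapshot the index once, count the reader under it, and return it. */
int model_read_lock(struct srcu_model *sp, int cpu)
{
    int idx = LOAD_ONCE_INT(sp->completed) & 0x1;

    sp->per_cpu_ref[cpu].c[idx]++;
    return idx;
}

/* The unlock uses the index handed back by the lock, never a fresh read. */
void model_read_unlock(struct srcu_model *sp, int cpu, int idx)
{
    sp->per_cpu_ref[cpu].c[idx]--;
}

/* Returns 1 if the counters balance even when the writer flips mid-section. */
int model_balance_after_flip(void)
{
    struct srcu_model s = { 0 };
    int idx = model_read_lock(&s, 0);   /* taken while ->completed == 0 */

    s.completed++;                      /* writer flips the index */
    model_read_unlock(&s, 0, idx);      /* still decrements .c[0] */

    return s.per_cpu_ref[0].c[0] == 0 && s.per_cpu_ref[0].c[1] == 0;
}
```

If the index were re-read after the flip, the decrement would land in
.c[1] and leave .c[0] stuck at 1, which is why the snapshot is returned
to the caller and passed back into the unlock.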
* Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
  2006-06-27 18:59 ` Paul E. McKenney
@ 2006-06-27 19:19   ` Paul E. McKenney
  2006-06-28 19:41   ` Oleg Nesterov
  1 sibling, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2006-06-27 19:19 UTC
  To: Oleg Nesterov; +Cc: linux-kernel

On Tue, Jun 27, 2006 at 11:59:45AM -0700, Paul E. McKenney wrote:
> On Wed, Jun 28, 2006 at 01:13:58AM +0400, Oleg Nesterov wrote:
> > "Paul E. McKenney" wrote:
> > > +        idx = sp->completed & 0x1;
> > > +        sp->completed++;
> >
> > But srcu_read_lock()'s path and rcu_dereference() doesn't have rmb(),
> > and the reader can block, so I can't understand how this all works.
> >
> > Suppose ->completed == 0,
> >
> > WRITER:                                 READER:
> >
> > old = global_ptr;
> > rcu_assign_pointer(global_ptr, new);
> >
> > synchronize_srcu:
> >
> >         locks mutex, does mb,
> >         ->completed++;
> >
> >                                         srcu_read_lock();
> >                                                 // reads ->completed == 1
> >                                                 // does .c[1]++
> >                                         ptr = rcu_dereference(global_ptr)
> >                                                 // reads the *OLD* value,
> >                                                 // because we don't have rmb()
>
> Hmmm...  I thought I was handling this case, but my rationale as to
> how is looking a bit flimsy at the moment.  ;-)  I will look at this
> more carefully.  If you are correct, one fix is to replace the prior mb
> with synchronize_sched().  Do you agree that this would fix the problem?

Never mind -- this fix would have no effect.  Guess I should engage my
brain before sending email.  :-/

First to review my reasoning, and then to provide either the explanation
should my reasoning prove true or a fix otherwise...

                                                Thanx, Paul
* Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
  2006-06-27 18:59 ` Paul E. McKenney
  2006-06-27 19:19   ` Paul E. McKenney
@ 2006-06-28 19:41   ` Oleg Nesterov
  2006-06-28 15:32     ` Paul E. McKenney
  1 sibling, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2006-06-28 19:41 UTC
  To: Paul E. McKenney; +Cc: linux-kernel

On 06/27, Paul E. McKenney wrote:
>
> On Wed, Jun 28, 2006 at 01:13:58AM +0400, Oleg Nesterov wrote:
> >
> > Also, I can't understand the purpose of 2-nd synchronize_sched() in
> > synchronize_srcu().
>
> This one handles the srcu_read_unlock() analog of the situation you
> are worried about above.  The reader does not have memory barriers in
> srcu_read_unlock(), so an access to the data structure might get
> reordered to follow the decrement of .c[0] -- which would get messed
> up by the following kfree().

Aha, I see.

The last question. The 'srcu-2' you posted today does synchronize_srcu_flip()
twice. You did it this way because srcu is optimized for readers, otherwise we
could just add smp_rmb() into srcu_read_lock() - this should solve the problem
as well.

Is my understanding correct?

Thanks!

Oleg.
* Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
  2006-06-28 19:41   ` Oleg Nesterov
@ 2006-06-28 15:32     ` Paul E. McKenney
  0 siblings, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2006-06-28 15:32 UTC
  To: Oleg Nesterov; +Cc: linux-kernel

On Wed, Jun 28, 2006 at 11:41:21PM +0400, Oleg Nesterov wrote:
> On 06/27, Paul E. McKenney wrote:
> >
> > On Wed, Jun 28, 2006 at 01:13:58AM +0400, Oleg Nesterov wrote:
> > >
> > > Also, I can't understand the purpose of 2-nd synchronize_sched() in
> > > synchronize_srcu().
> >
> > This one handles the srcu_read_unlock() analog of the situation you
> > are worried about above.  The reader does not have memory barriers in
> > srcu_read_unlock(), so an access to the data structure might get
> > reordered to follow the decrement of .c[0] -- which would get messed
> > up by the following kfree().
>
> Aha, I see.

Fortunately, we understood opposite sides of the problem, so, taken
together, we have it covered.  ;-)

Now we just need to figure out how to find the problems that both of
us missed!

> The last question. The 'srcu-2' you posted today does synchronize_srcu_flip()
> twice. You did it this way because srcu is optimized for readers, otherwise we
> could just add smp_rmb() into srcu_read_lock() - this should solve the problem
> as well.
>
> Is my understanding correct?

Exactly correct!!!

                                                Thanx, Paul
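The alternative Oleg raises, a read barrier inside srcu_read_lock(), can
be sketched in userspace with C11 atomics. This is only an analogy under
stated assumptions: atomic_thread_fence(memory_order_seq_cst) stands in
for the writer's smp_mb(), an acquire fence stands in for smp_rmb() (it
is somewhat stronger), a single counter pair replaces the per-CPU array,
and all model_* names are hypothetical. The demo function runs in one
thread, so it checks the bookkeeping only and cannot by itself
demonstrate the ordering guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct srcu_model {
    atomic_int completed;
    atomic_int c[2];            /* one modeled CPU, for brevity */
};

static _Atomic(int *) global_ptr;

/* Writer: publish the pointer, full fence, then flip the index. */
int model_flip(struct srcu_model *sp, int *new_obj)
{
    int idx;

    atomic_store_explicit(&global_ptr, new_obj, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* stands in for smp_mb() */
    idx = atomic_load_explicit(&sp->completed, memory_order_relaxed) & 0x1;
    atomic_fetch_add_explicit(&sp->completed, 1, memory_order_relaxed);
    return idx;
}

/* Reader: sample the index, count itself, then fence before later loads. */
int model_read_lock(struct srcu_model *sp)
{
    int idx = atomic_load_explicit(&sp->completed, memory_order_relaxed) & 0x1;

    atomic_fetch_add_explicit(&sp->c[idx], 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);  /* stands in for smp_rmb() */
    return idx;
}

/* Reader: fetch the protected pointer inside the critical section. */
int *model_dereference(void)
{
    return atomic_load_explicit(&global_ptr, memory_order_relaxed);
}

/* Returns 1 if a reader that sees the flipped index also sees the new pointer. */
int model_reader_sees_new(void)
{
    static int old_obj = 1, new_obj = 2;
    struct srcu_model s;
    int widx, ridx;

    atomic_init(&s.completed, 0);
    atomic_init(&s.c[0], 0);
    atomic_init(&s.c[1], 0);
    atomic_init(&global_ptr, &old_obj);

    widx = model_flip(&s, &new_obj);
    ridx = model_read_lock(&s);
    return widx == 0 && ridx == 1 && model_dereference() == &new_obj;
}
```

The cost of this variant is a fence on every reader entry, which is the
trade-off the thread attributes to SRCU being optimized for readers.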
* Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
       [not found]         ` <20060626181418.70aeffd3.akpm@osdl.org>
@ 2006-06-27  1:37           ` Paul E. McKenney
  0 siblings, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2006-06-27  1:37 UTC
  To: Andrew Morton
  Cc: linux-kernel, matthltc, dipankar, stern, mingo, tytso, dvhltc

On Mon, Jun 26, 2006 at 06:14:18PM -0700, Andrew Morton wrote:
> On Mon, 26 Jun 2006 17:53:51 -0700
> "Paul E. McKenney" <paulmck@us.ibm.com> wrote:
>
> > > > +struct srcu_struct_array {
> > > > +        int c[2];
> > > > +} ____cacheline_internode_aligned_in_smp;
> > >
> > > ____cacheline_internode_aligned_in_smp isn't implemented..
> >
> > It was not long ago...  :-/
>
> I was trying to work out why on earth this compiled.
>
> It gives you a global variable called
> ____cacheline_internode_aligned_in_smp.  That works nicely until you
> include this header file from two .c files, at which time you get two
> global variables called ____cacheline_internode_aligned_in_smp.  And the
> linker will happily swallow even that unless you're using -fno-common.

Hmmm...  Sounds like alloc_percpu() is strongly recommended.  Made the
changes, compiling, hope to test overnight.

> > > > +                if (sum == 0)
> > > > +                        break;
> > > > +                schedule_timeout_interruptible(1);
> > > > +        }
> > >
> > > Little sleeps like this are a worry.  It's usually an indication that we've
> > > been lazy and haven't put in the wakeups which are needed for a
> > > minimum-latency wait.
> >
> > I have been even -more- lazy and have absolutely -no- wakeups.  ;-)
> > The alternative would be to have srcu_read_unlock() wake up the
> > task doing the synchronize_srcu(), but getting that right is painful.
>
> Shouldn't be too hard...
>
> A wakeup can be relatively expensive, but one can often do
>
>         if (something_which_is_inexpensive())
>                 wake_up(...);
>
> although it takes care.

Exactly.  ;-)  Been there, done that, gotten it right, but have also
run up almost every blind alley that there is.

Besides, prior to this, there is a synchronize_sched().  In many cases,
the readers will have all completed during the synchronize_sched()
latency, so my bet is that the extra complexity will have no benefit
in the common case.

And if someone comes up with a good reason to do a blocking network
receive or some such in the SRCU read-side critical sections, I will
be happy to add the wakeup machinations.

Fair enough?

(Besides, we will want to save some of the complexity budget for a
hierarchical implementation should Jesse Barnes prove correct about
future 1,000-CPU dies, right?)

> if you want to be _really_ sleazy you can do
>
>         if (something_which_is_inexpensive_and_isnt_quite_right())
>                 wake_up(...);
>
> and, at the other end:
>
>         while (something) {
>                 schedule_timeout_interruptible(1);
>         }
>
> and rely upon the flakey-wakeup to work most of the time, so it usually
> interrupts the sleep.

Urg...

> Now erase this from your mind.

To erase it from my mind, I would have had to allow it to get that far
in the first place.  ;-)

                                                Thanx, Paul
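Andrew's "inexpensive test before wake_up()" pattern can be sketched
without any kernel machinery. In the sketch below every name is an
assumption made for illustration: writer_waiting is a hypothetical flag
the waiting writer would set, wake_writer is a function pointer standing
in for wake_up(), and the model is single-threaded, so it shows the
control flow only and says nothing about the ordering a concurrent
version would need.

```c
#include <assert.h>

#define NCPUS 2  /* hypothetical CPU count */

struct srcu_model {
    int c[NCPUS][2];
    int writer_waiting;              /* hypothetical: set by the waiter */
    void (*wake_writer)(void *arg);  /* hypothetical stand-in for wake_up() */
    void *wake_arg;
};

/* What the waiter rechecks after every wakeup: readers left on idx. */
int model_readers_active(const struct srcu_model *sp, int idx)
{
    int sum = 0;

    for (int cpu = 0; cpu < NCPUS; cpu++)
        sum += sp->c[cpu][idx];
    return sum;
}

/* Reader exit: pay for a wakeup only if the cheap flag test passes. */
void model_read_unlock(struct srcu_model *sp, int cpu, int idx)
{
    sp->c[cpu][idx]--;
    if (sp->writer_waiting)          /* the inexpensive test */
        sp->wake_writer(sp->wake_arg);
}

/* Test double for the wakeup: counts how often it was called. */
static void count_wake(void *arg)
{
    ++*(int *)arg;
}

/* Returns 1 if exactly one wakeup is issued and no readers remain. */
int model_wake_demo(void)
{
    int wakes = 0;
    struct srcu_model s = { .wake_writer = count_wake, .wake_arg = &wakes };

    s.c[0][0] = 2;                   /* two readers in flight on index 0 */

    model_read_unlock(&s, 0, 0);     /* no waiter yet: no wakeup */
    s.writer_waiting = 1;            /* writer begins to wait */
    model_read_unlock(&s, 0, 0);     /* waiter present: one wakeup */

    return wakes == 1 && model_readers_active(&s, 0) == 0;
}
```

Because the woken writer rechecks the sum itself, a spurious wakeup is
harmless. A missed wakeup is the harder case, since the flag can be set
just after a reader tested it, which is presumably part of the care the
mail alludes to; keeping a timeout in the wait loop is one way to bound
that.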