From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932528AbWF0S7K (ORCPT ); Tue, 27 Jun 2006 14:59:10 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S932530AbWF0S7J (ORCPT ); Tue, 27 Jun 2006 14:59:09 -0400 Received: from e33.co.us.ibm.com ([32.97.110.151]:10938 "EHLO e33.co.us.ibm.com") by vger.kernel.org with ESMTP id S932528AbWF0S7I (ORCPT ); Tue, 27 Jun 2006 14:59:08 -0400 Date: Tue, 27 Jun 2006 11:59:45 -0700 From: "Paul E. McKenney" To: Oleg Nesterov Cc: linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking Message-ID: <20060627185945.GD1286@us.ibm.com> Reply-To: paulmck@us.ibm.com References: <20060627211358.GA484@oleg> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20060627211358.GA484@oleg> User-Agent: Mutt/1.4.1i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 28, 2006 at 01:13:58AM +0400, Oleg Nesterov wrote: > Hello Paul, > > "Paul E. McKenney" wrote: > > > > +void init_srcu_struct(struct srcu_struct *sp) > > +{ > > + int cpu; > > + > > + sp->completed = 0; > > + sp->per_cpu_ref = (struct srcu_struct_array *) > > + kmalloc(NR_CPUS * sizeof(*sp->per_cpu_ref), > > + GFP_KERNEL); > > + for_each_cpu(cpu) { > > + sp->per_cpu_ref[cpu].c[0] = 0; > > + sp->per_cpu_ref[cpu].c[1] = 0; > > + } > > Isn't it simpler to just do: > > sp->per_cpu_ref = kzmalloc(NR_CPUS * sizeof(*sp->per_cpu_ref), > GFP_KERNEL); > > and drop 'for_each_cpu(cpu)' initialization ? Yes, and even simpler to use the alloc_percpu(), as Andrew suggested. > > +int srcu_read_lock(struct srcu_struct *sp) > > +{ > > + int idx; > > + > > + preempt_disable(); > > + idx = sp->completed & 0x1; > > + barrier(); > > + sp->per_cpu_ref[smp_processor_id()].c[idx]++; > > + preempt_enable(); > > + return idx; > > +} > > Could you explain this 'barrier()' ? It ensures that the compiler picks up sp->completed but once. It is hard to imagine a compiler generating code that fetched sp->completed more than once, but I have been unpleasantly surprised before. Thoughts? > > +void synchronize_srcu(struct srcu_struct *sp) > > +{ > > + int cpu; > > + int idx; > > + int sum; > > + > > + might_sleep(); > > + > > + mutex_lock(&sp->mutex); > > + > > + smp_mb(); /* Prevent operations from leaking in. */ > > Why smp_wmb() is not enough? We are doing synchronize_sched() below > before reading ->per_cpu_ref, and ->completed is protected by ->mutex. Could well be that smp_wmb() is sufficient. I frankly was not engaging in that level of optimization on this round. Seems likely, given that I was not able to come up with a convincing counter-example. That said, I am not going to change it until I can prove that it is safe. ;-) > > + idx = sp->completed & 0x1; > > + sp->completed++; > > But srcu_read_lock()'s path and rcu_dereference() doesn't have rmb(), > and the reader can block, so I can't understand how this all works. > > Suppose ->completed == 0, > > WRITER: READER: > > old = global_ptr; > rcu_assign_pointer(global_ptr, new); > > synchronize_srcu: > > locks mutex, does mb, > ->completed++; > > srcu_read_lock(); > // reads ->completed == 1 > // does .c[1]++ > ptr = rcu_dereference(global_ptr) > // reads the *OLD* value, > // because we don't have rmb() Hmmm... I thought I was handling this case, but my rationale as to how is looking a bit flimsy at the moment. ;-) I will look at this more carefully. If you are correct, one fix is to replace the prior mb with synchronize_sched(). Do you agree that this would fix the problem? > block_on_something(); > > > synchronize_sched(); The above synchronize_sched() guarantees that all srcu_read_lock() calls that are still in flight will either (1) already be accounted for in .c[1] or (2) do their accounting in .c[0]. > // ... still blocked ... > > checks sum_of(.c[0]) == 0, yes > > synchronize_sched(); This one handles the srcu_read_unlock() analog of the situation you are worried about above. The reader does not have memory barriers in srcu_read_unlock(), so an access to the data structure might get reordered to follow the decrement of .c[0] -- which would get messed up by the following kfree(). The synchronize_sched() guarantees that all concurrent srcu_read_unlock() calls complete cleanly before synchronize_sched() returns, inserting a memory barrier on each CPU to enforce this. > // ... still blocked ... > > kfree(old); > > // wake up > do_something(ptr); > > > Also, I can't understand the purpose of 2-nd synchronize_sched() in > synchronize_srcu(). (See above.) > Please help! Thank you for the careful review! I will look more carefully into the scenario you called out above. Thanx, Paul