Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: Oleg Nesterov <oleg@tv-sign.ru>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
Date: Tue, 27 Jun 2006 11:59:45 -0700	[thread overview]
Message-ID: <20060627185945.GD1286@us.ibm.com> (raw)
In-Reply-To: <20060627211358.GA484@oleg>

On Wed, Jun 28, 2006 at 01:13:58AM +0400, Oleg Nesterov wrote:
> Hello Paul,
> 
> "Paul E. McKenney" wrote:
> >
> > +void init_srcu_struct(struct srcu_struct *sp)
> > +{
> > +	int cpu;
> > +
> > +	sp->completed = 0;
> > +	sp->per_cpu_ref = (struct srcu_struct_array *)
> > +			  kmalloc(NR_CPUS * sizeof(*sp->per_cpu_ref),
> > +				  GFP_KERNEL);
> > +	for_each_cpu(cpu) {
> > +		sp->per_cpu_ref[cpu].c[0] = 0;
> > +		sp->per_cpu_ref[cpu].c[1] = 0;
> > +	}
> 
> Isn't it simpler to just do:
> 
> 	sp->per_cpu_ref = kzmalloc(NR_CPUS * sizeof(*sp->per_cpu_ref),
> 				GFP_KERNEL);
> 
> and drop 'for_each_cpu(cpu)' initialization ?

Yes, and even simpler to use the alloc_percpu(), as Andrew suggested.

> > +int srcu_read_lock(struct srcu_struct *sp)
> > +{
> > +	int idx;
> > +
> > +	preempt_disable();
> > +	idx = sp->completed & 0x1;
> > +	barrier();
> > +	sp->per_cpu_ref[smp_processor_id()].c[idx]++;
> > +	preempt_enable();
> > +	return idx;
> > +}
> 
> Could you explain this 'barrier()' ?

It ensures that the compiler picks up sp->completed but once.
It is hard to imagine a compiler generating code that fetched sp->completed
more than once, but I have been unpleasantly surprised before.

Thoughts?

> > +void synchronize_srcu(struct srcu_struct *sp)
> > +{
> > +	int cpu;
> > +	int idx;
> > +	int sum;
> > +
> > +	might_sleep();
> > +
> > +	mutex_lock(&sp->mutex);
> > +
> > +	smp_mb();  /* Prevent operations from leaking in. */
> 
> Why smp_wmb() is not enough? We are doing synchronize_sched() below
> before reading ->per_cpu_ref, and ->completed is protected by ->mutex.

Could well be that smp_wmb() is sufficient.  I frankly was not engaging
in that level of optimization on this round.  Seems likely, given that
I was not able to come up with a convincing counter-example.

That said, I am not going to change it until I can prove that it is
safe.  ;-)

> > +	idx = sp->completed & 0x1;
> > +	sp->completed++;
> 
> But srcu_read_lock()'s path and rcu_dereference() doesn't have rmb(),
> and the reader can block, so I can't understand how this all works.
> 
> Suppose ->completed == 0,
> 
> 	WRITER:						READER:
> 
> 	old = global_ptr;
> 	rcu_assign_pointer(global_ptr, new);
> 
> 	synchronize_srcu:
> 
> 		locks mutex, does mb,
> 		->completed++;
> 		
> 							srcu_read_lock();
> 								// reads ->completed == 1
> 								// does .c[1]++
> 							ptr = rcu_dereference(global_ptr)
> 								// reads the *OLD* value,
> 								// because we don't have rmb()

Hmmm...  I thought I was handling this case, but my rationale as to
how is looking a bit flimsy at the moment.  ;-)  I will look at this
more carefully.  If you are correct, one fix is to replace the prior mb
with synchronize_sched().  Do you agree that this would fix the problem?

> 							block_on_something();
> 
> 
> 		synchronize_sched();

The above synchronize_sched() guarantees that all srcu_read_lock() calls
that are still in flight will either (1) already be accounted for in .c[1]
or (2) do their accounting in .c[0].

> 							// ... still blocked ...
> 
> 		checks sum_of(.c[0]) == 0, yes
> 
> 		synchronize_sched();

This one handles the srcu_read_unlock() analog of the situation you
are worried about above.  The reader does not have memory barriers in
srcu_read_unlock(), so an access to the data structure might get
reordered to follow the decrement of .c[0] -- which would get messed
up by the following kfree().

The synchronize_sched() guarantees that all concurrent srcu_read_unlock()
calls complete cleanly before synchronize_sched() returns, inserting
a memory barrier on each CPU to enforce this.

> 							// ... still blocked ...
> 
> 	kfree(old);
> 
> 							// wake up
> 							do_something(ptr);
> 
> 
> Also, I can't understand the purpose of 2-nd synchronize_sched() in
> synchronize_srcu().

(See above.)

> Please help!

Thank you for the careful review!  I will look more carefully into the
scenario you called out above.

							Thanx, Paul

next prev parent reply	other threads:[~2006-06-27 18:59 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-06-27 21:13 [PATCH 1/2] srcu: RCU variant permitting read-side blocking Oleg Nesterov
2006-06-27 18:59 ` Paul E. McKenney [this message]
2006-06-27 19:19   ` Paul E. McKenney
2006-06-28 19:41   ` Oleg Nesterov
2006-06-28 15:32     ` Paul E. McKenney
     [not found] <20060626190328.GD2141@us.ibm.com>
     [not found] ` <20060626190743.GE2141@us.ibm.com>
     [not found]   ` <20060626134447.a75cb385.akpm@osdl.org>
     [not found]     ` <20060627005350.GG1295@us.ibm.com>
     [not found]       ` <20060626181418.70aeffd3.akpm@osdl.org>
2006-06-27  1:37         ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060627185945.GD1286@us.ibm.com \
    --to=paulmck@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@tv-sign.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.