From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: Oleg Nesterov <oleg@tv-sign.ru>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] srcu: RCU variant permitting read-side blocking
Date: Tue, 27 Jun 2006 11:59:45 -0700
Message-ID: <20060627185945.GD1286@us.ibm.com>
In-Reply-To: <20060627211358.GA484@oleg>

On Wed, Jun 28, 2006 at 01:13:58AM +0400, Oleg Nesterov wrote:
> Hello Paul,
>
> "Paul E. McKenney" wrote:
> >
> > +void init_srcu_struct(struct srcu_struct *sp)
> > +{
> > + int cpu;
> > +
> > + sp->completed = 0;
> > + sp->per_cpu_ref = (struct srcu_struct_array *)
> > + kmalloc(NR_CPUS * sizeof(*sp->per_cpu_ref),
> > + GFP_KERNEL);
> > + for_each_cpu(cpu) {
> > + sp->per_cpu_ref[cpu].c[0] = 0;
> > + sp->per_cpu_ref[cpu].c[1] = 0;
> > + }
>
> Isn't it simpler to just do:
>
> sp->per_cpu_ref = kzmalloc(NR_CPUS * sizeof(*sp->per_cpu_ref),
> GFP_KERNEL);
>
> and drop 'for_each_cpu(cpu)' initialization ?

Yes, and it is simpler still to use alloc_percpu(), as Andrew suggested.

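
To illustrate the point about zeroing allocators, here is a minimal
userspace sketch. It is not the patch itself: calloc() stands in for a
zeroing kernel allocator, MODEL_NR_CPUS and model_init_srcu_struct() are
hypothetical names, and the int element type of c[] is an assumption.
Unlike the quoted hunk, the sketch also reports allocation failure.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 4	/* hypothetical stand-in for NR_CPUS */

struct srcu_struct_array {
	int c[2];	/* element type is an assumption */
};

struct srcu_struct {
	int completed;
	struct srcu_struct_array *per_cpu_ref;
};

/* Returns 0 on success, -1 if the allocation fails. */
int model_init_srcu_struct(struct srcu_struct *sp)
{
	assert(sp != NULL);
	sp->completed = 0;
	/* A zeroing allocator makes the explicit per-CPU init loop unnecessary. */
	sp->per_cpu_ref = calloc(MODEL_NR_CPUS, sizeof(*sp->per_cpu_ref));
	return sp->per_cpu_ref ? 0 : -1;
}
```

The caller is expected to check the return value and to free()
sp->per_cpu_ref when done.
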
> > +int srcu_read_lock(struct srcu_struct *sp)
> > +{
> > + int idx;
> > +
> > + preempt_disable();
> > + idx = sp->completed & 0x1;
> > + barrier();
> > + sp->per_cpu_ref[smp_processor_id()].c[idx]++;
> > + preempt_enable();
> > + return idx;
> > +}
>
> Could you explain this 'barrier()' ?

It ensures that the compiler fetches sp->completed only once, so that the
index used for the increment is the same index that is returned.
It is hard to imagine a compiler generating code that fetched sp->completed
more than once, but I have been unpleasantly surprised before.

Thoughts?

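
For concreteness, a userspace sketch of the single-fetch idea, assuming
GCC or Clang. The barrier() definition is the usual compiler-only
barrier. LOAD_ONCE_INT() is a hypothetical helper, not part of the
patch, shown as another way to request exactly one load. The struct is
a single-CPU stand-in for the per-CPU counters.

```c
#include <assert.h>

/* Compiler-only barrier (GCC/Clang syntax); emits no CPU instruction. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical helper: one load through a volatile-qualified lvalue. */
#define LOAD_ONCE_INT(x) (*(volatile int *)&(x))

struct model_srcu {
	int completed;
	int c[2];	/* single-CPU stand-in for per_cpu_ref[cpu].c[] */
};

/* Snapshot the index once, then use that same value for both the
 * increment and the return value. */
int model_read_lock(struct model_srcu *sp)
{
	int idx;

	assert(sp != 0);
	idx = LOAD_ONCE_INT(sp->completed) & 0x1;
	barrier();	/* compiler may not move memory accesses across this point */
	sp->c[idx]++;
	return idx;
}

/* Drop the counter that was chosen at lock time. */
void model_read_unlock(struct model_srcu *sp, int idx)
{
	assert(idx == 0 || idx == 1);
	sp->c[idx]--;
}
```

Neither construct orders accesses as seen by other CPUs; both constrain
only what the compiler may generate.
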
> > +void synchronize_srcu(struct srcu_struct *sp)
> > +{
> > + int cpu;
> > + int idx;
> > + int sum;
> > +
> > + might_sleep();
> > +
> > + mutex_lock(&sp->mutex);
> > +
> > + smp_mb(); /* Prevent operations from leaking in. */
>
> Why smp_wmb() is not enough? We are doing synchronize_sched() below
> before reading ->per_cpu_ref, and ->completed is protected by ->mutex.

It could well be that smp_wmb() is sufficient. Frankly, I was not engaging
in that level of optimization in this round. It seems likely, given that
I have not been able to come up with a convincing counter-example.
That said, I am not going to change it until I can prove that it is
safe. ;-)

> > + idx = sp->completed & 0x1;
> > + sp->completed++;
>
> But srcu_read_lock()'s path and rcu_dereference() doesn't have rmb(),
> and the reader can block, so I can't understand how this all works.
>
> Suppose ->completed == 0,
>
> WRITER: READER:
>
> old = global_ptr;
> rcu_assign_pointer(global_ptr, new);
>
> synchronize_srcu:
>
> locks mutex, does mb,
> ->completed++;
>
> srcu_read_lock();
> // reads ->completed == 1
> // does .c[1]++
> ptr = rcu_dereference(global_ptr)
> // reads the *OLD* value,
> // because we don't have rmb()

Hmmm... I thought I was handling this case, but my rationale as to
how is looking a bit flimsy at the moment. ;-) I will look at this
more carefully. If you are correct, one fix is to replace the preceding
smp_mb() with synchronize_sched(). Do you agree that this would fix the
problem?

> block_on_something();
>
>
> synchronize_sched();
The above synchronize_sched() guarantees that all srcu_read_lock() calls
that are still in flight will either (1) already be accounted for in .c[1]
or (2) do their accounting in .c[0].
> // ... still blocked ...
>
> checks sum_of(.c[0]) == 0, yes
>
> synchronize_sched();

This one handles the srcu_read_unlock() analog of the situation you
are worried about above. The reader does not have memory barriers in
srcu_read_unlock(), so an access to the data structure might get
reordered to follow the decrement of .c[0] -- which would get messed
up by the following kfree().

The synchronize_sched() guarantees that all concurrent srcu_read_unlock()
calls complete cleanly before synchronize_sched() returns, inserting
a memory barrier on each CPU to enforce this.

> // ... still blocked ...
>
> kfree(old);
>
> // wake up
> do_something(ptr);
>
>
> Also, I can't understand the purpose of 2-nd synchronize_sched() in
> synchronize_srcu().
(See above.)
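
The counter bookkeeping in the scenario above can be walked through
with a small single-threaded userspace model. This is only a sketch: it
models the index flip and the per-CPU sums, not memory ordering,
blocking, or synchronize_sched(). All model_* names and MODEL_NR_CPUS
are hypothetical.

```c
#include <assert.h>

#define MODEL_NR_CPUS 2	/* hypothetical stand-in for NR_CPUS */

struct model_srcu {
	int completed;
	int c[MODEL_NR_CPUS][2];	/* models per_cpu_ref[cpu].c[idx] */
};

/* Reader entry: snapshot the index, bump that counter, return the index. */
int model_read_lock(struct model_srcu *sp, int cpu)
{
	int idx = sp->completed & 0x1;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	sp->c[cpu][idx]++;
	return idx;
}

/* Reader exit: drop the counter chosen at entry. */
void model_read_unlock(struct model_srcu *sp, int cpu, int idx)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	assert(idx == 0 || idx == 1);
	sp->c[cpu][idx]--;
}

/* Writer step: remember the old index, then flip ->completed. */
int model_flip(struct model_srcu *sp)
{
	int idx = sp->completed & 0x1;

	sp->completed++;
	return idx;
}

/* Writer check: sum of .c[idx] over all modeled CPUs. */
int model_sum(const struct model_srcu *sp, int idx)
{
	int cpu, sum = 0;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += sp->c[cpu][idx];
	return sum;
}
```

In this model, a reader that calls model_read_lock() after model_flip()
is counted in .c[1], so the sum over .c[0] is zero even while that
reader is still inside its critical section. Whether such a reader can
still observe the old pointer is the memory-ordering question raised
above, which the model does not address.
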
> Please help!

Thank you for the careful review! I will look more carefully into the
scenario you called out above.

						Thanx, Paul