From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Boqun Feng <boqun.feng@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>,
josh@joshtriplett.org, Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
jiangshanlai@gmail.com, LKML <linux-kernel@vger.kernel.org>,
syzkaller <syzkaller@googlegroups.com>
Subject: Re: rcu: WARNING in rcu_seq_end
Date: Tue, 7 Mar 2017 19:08:05 -0800
Message-ID: <20170308030805.GB30506@linux.vnet.ibm.com>
In-Reply-To: <20170308024417.sptzlehrtfl6bnc2@tardis>

On Wed, Mar 08, 2017 at 10:44:17AM +0800, Boqun Feng wrote:
> On Tue, Mar 07, 2017 at 06:26:03PM -0800, Paul E. McKenney wrote:
> > On Wed, Mar 08, 2017 at 09:39:13AM +0800, Boqun Feng wrote:
> > > On Tue, Mar 07, 2017 at 03:31:54PM -0800, Paul E. McKenney wrote:
> > > > On Wed, Mar 08, 2017 at 07:05:13AM +0800, Boqun Feng wrote:
> > > > > On Tue, Mar 07, 2017 at 07:27:15AM -0800, Paul E. McKenney wrote:
> > > > > > On Tue, Mar 07, 2017 at 03:43:42PM +0100, Dmitry Vyukov wrote:
> > > > > > > On Tue, Mar 7, 2017 at 3:27 PM, Boqun Feng <boqun.feng@gmail.com> wrote:
> > > > > > > > On Tue, Mar 07, 2017 at 08:05:19AM +0100, Dmitry Vyukov wrote:
> > > > > > > > [...]
> > > > > > > >> >>
> > > > > > > >> >> What is that mutex? And what locks/unlocks provide synchronization? I
> > > > > > > >> >> see that one uses exp_mutex and another -- exp_wake_mutex.
> > > > > > > >> >
> > > > > > > >> > Both of them.
> > > > > > > >> >
> > > > > > > >> > ->exp_mutex is acquired by the task requesting the grace period, and
> > > > > > > >> > the counter's first increment is done by that task under that mutex.
> > > > > > > >> > This task then schedules a workqueue, which drives forward the grace
> > > > > > > >> > period. Upon grace-period completion, the workqueue handler does the
> > > > > > > >> > second increment (the one that your patch addressed). The workqueue
> > > > > > > >> > handler then acquires ->exp_wake_mutex and wakes the task that holds
> > > > > > > >> > ->exp_mutex (along with all other tasks waiting for this grace period),
> > > > > > > >> > and that task releases ->exp_mutex, which allows the next grace period to
> > > > > > > >> > start (and the first increment for that next grace period to be carried
> > > > > > > >> > out under that lock). The workqueue handler releases ->exp_wake_mutex
> > > > > > > >> > after finishing its wakeups.
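
For readers following along, the two-increment protocol described above can be modelled in userspace C11. This is only a sketch under stated assumptions, not the kernel code: the exp_seq_* names and the exp_seq_model type are hypothetical, atomic_thread_fence(memory_order_seq_cst) merely stands in for smp_mb(), and the snapshot rule "(seq + 3) & ~1" is an assumption of this model. The mutexes and the workqueue are left out; only the counter arithmetic is shown.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the expedited sequence counter. */
typedef struct {
	atomic_ulong seq;	/* odd while a grace period is in progress */
} exp_seq_model;

/* First increment: in the thread, done by the requester under ->exp_mutex. */
static void exp_seq_start(exp_seq_model *m)
{
	unsigned long s = atomic_load_explicit(&m->seq, memory_order_relaxed);

	atomic_store_explicit(&m->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* stands in for smp_mb() */
	assert((s + 1) & 0x1);				/* must now be odd */
}

/* Second increment: in the thread, done by the workqueue handler. */
static void exp_seq_end(exp_seq_model *m)
{
	unsigned long s;

	atomic_thread_fence(memory_order_seq_cst);	/* stands in for smp_mb() */
	s = atomic_load_explicit(&m->seq, memory_order_relaxed);
	atomic_store_explicit(&m->seq, s + 1, memory_order_relaxed);
	assert(!((s + 1) & 0x1));			/* must now be even */
}

/* Smallest even value that implies a full grace period after this call. */
static unsigned long exp_seq_snap(exp_seq_model *m)
{
	unsigned long s;

	s = (atomic_load_explicit(&m->seq, memory_order_relaxed) + 3) & ~0x1UL;
	atomic_thread_fence(memory_order_seq_cst);
	return s;
}

/* Wrap-tolerant "has the counter reached s?" check. */
static bool exp_seq_done(exp_seq_model *m, unsigned long s)
{
	unsigned long cur = atomic_load_explicit(&m->seq, memory_order_relaxed);

	return ULONG_MAX / 2 >= cur - s;
}
```

A snapshot taken while the counter is odd has to wait for the following grace period as well, since the one in flight may have started before the caller's updates.
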
> > > > > > > >>
> > > > > > > >> Then we need the following for the case when task requesting the grace
> > > > > > > >> period does not block, right?
> > > > > > > >
> > > > > > > > Won't be necessary I think, as the smp_mb() in rcu_seq_end() and the
> > > > > > > > smp_mb__before_atomic() in sync_exp_work_done() already provide the
> > > > > > > > required ordering, no?
> > > > > > >
> > > > > > > smp_mb() is probably fine, but smp_mb__before_atomic() is release not
> > > > > > > acquire. If we want to play that game, then I guess we also need
> > > > >
> > > > > The point is that smp_mb__before_atomic() + atomic_long_inc() will
> > > > > guarantee a smp_mb() before or right along with the atomic operation,
> > > > > and that's enough because rcu_seq_done() followed by a smp_mb() will
> > > > > give it acquire-like behavior.
> > > >
> > > > Given current architectures, true enough, from what I can see.
> > > >
> > > > However, let's take a look at atomic_ops.rst:
> > > >
> > > >
> > > > If a caller requires memory barrier semantics around an atomic_t
> > > > operation which does not return a value, a set of interfaces are
> > > > defined which accomplish this::
> > > >
> > > > void smp_mb__before_atomic(void);
> > > > void smp_mb__after_atomic(void);
> > > >
> > > > For example, smp_mb__before_atomic() can be used like so::
> > > >
> > > > obj->dead = 1;
> > > > smp_mb__before_atomic();
> > > > atomic_dec(&obj->ref_count);
> > > >
> > > > It makes sure that all memory operations preceding the atomic_dec()
> > > > call are strongly ordered with respect to the atomic counter
> > > > operation. In the above example, it guarantees that the assignment of
> > > > "1" to obj->dead will be globally visible to other cpus before the
> > > > atomic counter decrement.
> > > >
> > > > Without the explicit smp_mb__before_atomic() call, the
> > > > implementation could legally allow the atomic counter update visible
> > > > to other cpus before the "obj->dead = 1;" assignment.
> > > >
> > > > So the ordering is guaranteed against the atomic operation, not
> > > > necessarily the stuff after it. But again, the implementations I know
> > > > of do make the guarantee, hence my calling it a theoretical bug in the
> > > > commit log.
> > >
> > > Fair enough ;-) It's me who misunderstood this part of document.
> > >
> > > However, the names of the barriers are smp_mb__{before,after}_atomic(),
> > > so if they, semantically, only provide ordering for the corresponding
> > > atomic ops rather than a full barrier, I would say their names are
> > > misleading ;-)
> >
> > Well, if you have both ordering before and after, then you have full
> > ordering.
>
> I mean the names of the barriers are *smp_mb*__before_atomic() and
> *smp_mb*__after_atomic(), so it's natural to think they provide a
> smp_mb() in some situations ;-)
>
> > > > > > > smp_mb__after_atomic() there. But it would be way easier to understand
> > > > >
> > > > > Adding smp_mb__after_atomic() would be pointless as it's the load of
> > > > > ->expedited_sequence that we want to ensure having acquire behavior
> > > > > rather than the atomic increment of @stat.
> > > >
> > > > Again, agreed given current code, but atomic_ops.rst doesn't guarantee
> > > > ordering past the actual atomic operation itself.
> > >
> > > Neither does atomic_ops.rst guarantee the ordering between a load before
> > > the atomic op and memory accesses after the atomic op, right? I.e.
> > > atomic_ops.rst doesn't say no for reordering like this:
> > >
> > > r1 = READ_ONCE(a); ---------+
> > > atomic_long_inc(b); |
> > > smp_mb__after_atomic(); |
> > > WRITE_ONCE(c); |
> > > {r1 = READ_ONCE(a)} <-------+
> > >
> > > So it's still not an acquire for READ_ONCE(a), in our case "a" is
> > > ->expedited_sequence.
> > >
> > > To me, we can either fix the atomic_ops.rst or, as I proposed, just
> > > change smp_mb__before_atomic() to smp_mb().
> >
> > Or have both an smp_mb__before_atomic() and an smp_mb__after_atomic(),
> > as is the usual approach when you need full ordering. ;-)
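
A rough userspace analogue of the "barrier on both sides" pattern, for illustration only. It assumes C11 atomics: the seq_cst fences play the roles of smp_mb__before_atomic() and smp_mb__after_atomic() but are not equivalent to them, and work_done_model() is a hypothetical name rather than the kernel's sync_exp_work_done().

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Model of the check-then-count pattern: if the sequence counter has
 * reached snapshot s, bump a statistics counter with a full fence on
 * each side of the increment and report success.
 */
static bool work_done_model(atomic_ulong *seq, atomic_long *stat,
			    unsigned long s)
{
	unsigned long cur;

	assert(seq && stat);	/* guard against misuse in this sketch */
	cur = atomic_load_explicit(seq, memory_order_relaxed);
	if (ULONG_MAX / 2 >= cur - s) {
		/* Role of smp_mb__before_atomic(): order the check first. */
		atomic_thread_fence(memory_order_seq_cst);
		atomic_fetch_add_explicit(stat, 1, memory_order_relaxed);
		/* Role of smp_mb__after_atomic(): order later accesses. */
		atomic_thread_fence(memory_order_seq_cst);
		return true;
	}
	return false;
}
```

Note that in C11 terms the first fence alone already orders the counter load against the caller's later accesses, which is close to the "just use smp_mb()" alternative raised in this thread; the second fence is kept here only to mirror the shape of the proposed patch.
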
>
> Yes ;-) It's just that "adding a barrier after one operation to provide
> acquire semantic for another operation" looks weird to me.

But they are memory barriers! They are -supposed- to look weird! ;-)

Thanx, Paul

> Regards,
> Boqun
>
> > Thanx, Paul
> >
> > > Thoughts?
> > >
> > > Regards,
> > > Boqun
> > >
> > > > Thanx, Paul
> > > >
> > > > > > > what happens there and prove that it's correct, if we use
> > > > > > > store_release/load_acquire.
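
One possible reading of the store_release/load_acquire suggestion, sketched in userspace C11 rather than with the kernel's smp_store_release()/smp_load_acquire(). The struct and function names are hypothetical, the payload field is just a stand-in for whatever the grace period protects, and the comparison ignores counter wraparound for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct gp_model {
	int payload;		/* plain data written before completion */
	atomic_ulong seq;	/* completion is published through this */
};

/* Writer side: finish the work, then publish with a release store. */
static void gp_publish_end(struct gp_model *g, int payload)
{
	unsigned long s;

	assert(g);		/* guard against misuse in this sketch */
	g->payload = payload;
	s = atomic_load_explicit(&g->seq, memory_order_relaxed);
	atomic_store_explicit(&g->seq, s + 1, memory_order_release);
}

/* Reader side: the acquire load orders the check before the payload read. */
static bool gp_try_consume(struct gp_model *g, unsigned long s, int *out)
{
	/* Simplification: no wraparound handling in this comparison. */
	if (atomic_load_explicit(&g->seq, memory_order_acquire) < s)
		return false;
	*out = g->payload;
	return true;
}
```

The appeal of this form is that the ordering is attached to the counter accesses themselves, so the pairing between the writer's store and the reader's load is visible at a glance.
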
> > > > > >
> > > > > > Fair point, how about the following?
> > > > > >
> > > > > > Thanx, Paul
> > > > > >
> > > > > > ------------------------------------------------------------------------
> > > > > >
> > > > > > commit 6fd8074f1976596898e39f5b7ea1755652533906
> > > > > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > > Date: Tue Mar 7 07:21:23 2017 -0800
> > > > > >
> > > > > > rcu: Add smp_mb__after_atomic() to sync_exp_work_done()
> > > > > >
> > > > > > The sync_exp_work_done() function needs to fully order the counter-check
> > > > > > operation against anything happening after the corresponding grace period.
> > > > > > This is a theoretical bug, as all current architectures either provide
> > > > > > full ordering for atomic operation on the one hand or implement,
> > > > > > however, a little future-proofing is a good thing. This commit
> > > > > > therefore adds smp_mb__after_atomic() after the atomic_long_inc()
> > > > > > in sync_exp_work_done().
> > > > > >
> > > > > > Reported-by: Dmitry Vyukov <dvyukov@google.com>
> > > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > >
> > > > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > > > > index 027e123d93c7..652071abd9b4 100644
> > > > > > --- a/kernel/rcu/tree_exp.h
> > > > > > +++ b/kernel/rcu/tree_exp.h
> > > > > > @@ -247,6 +247,7 @@ static bool sync_exp_work_done(struct rcu_state *rsp, atomic_long_t *stat,
> > > > > > /* Ensure test happens before caller kfree(). */
> > > > > > smp_mb__before_atomic(); /* ^^^ */
> > > > > > atomic_long_inc(stat);
> > > > > > + smp_mb__after_atomic(); /* ^^^ */
> > > > >
> > > > > If we really care about future-proofing, I think it's more safe to
> > > > > change smp_mb__before_atomic() to smp_mb() rather than adding
> > > > > __after_atomic() barrier. Though I think both would be unnecessary ;-)
> > > > >
> > > > > Regards,
> > > > > Boqun
> > > > >
> > > > > > return true;
> > > > > > }
> > > > > > return false;
> > > > > >
> > > >
> > > >
> >
> >
Thread overview: 21+ messages
2017-03-04 16:01 rcu: WARNING in rcu_seq_end Dmitry Vyukov
2017-03-04 20:40 ` Paul E. McKenney
2017-03-05 10:50 ` Dmitry Vyukov
2017-03-05 18:47 ` Paul E. McKenney
2017-03-06 9:24 ` Dmitry Vyukov
2017-03-06 10:07 ` Paul E. McKenney
2017-03-06 10:11 ` Dmitry Vyukov
2017-03-06 23:08 ` Paul E. McKenney
2017-03-07 7:05 ` Dmitry Vyukov
2017-03-07 14:27 ` Boqun Feng
2017-03-07 14:43 ` Dmitry Vyukov
2017-03-07 15:27 ` Paul E. McKenney
2017-03-07 18:37 ` Dmitry Vyukov
2017-03-07 19:09 ` Paul E. McKenney
2017-03-07 23:05 ` Boqun Feng
2017-03-07 23:31 ` Paul E. McKenney
2017-03-08 1:39 ` Boqun Feng
2017-03-08 2:26 ` Paul E. McKenney
2017-03-08 2:44 ` Boqun Feng
2017-03-08 3:08 ` Paul E. McKenney [this message]
2017-03-07 15:16 ` Paul E. McKenney