From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Peter Zijlstra <peterz@infradead.org>,
akpm@linux-foundation.org, Ingo Molnar <mingo@elte.hu>,
linux-kernel@vger.kernel.org,
Masami Hiramatsu <mhiramat@redhat.com>,
"Frank Ch. Eigler" <fche@redhat.com>,
Hideo AOKI <haoki@redhat.com>,
Takashi Nishiie <t-nishiie@np.css.fujitsu.com>,
Steven Rostedt <rostedt@goodmis.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Subject: Re: [patch 01/15] Kernel Tracepoints
Date: Fri, 1 Aug 2008 14:10:20 -0700 [thread overview]
Message-ID: <20080801211020.GQ14851@linux.vnet.ibm.com> (raw)
In-Reply-To: <20080715165122.GB30082@Krystal>
On Tue, Jul 15, 2008 at 12:51:23PM -0400, Mathieu Desnoyers wrote:
> * Peter Zijlstra (peterz@infradead.org) wrote:
> > On Tue, 2008-07-15 at 12:08 -0400, Mathieu Desnoyers wrote:
> > > * Peter Zijlstra (peterz@infradead.org) wrote:
> > > > On Tue, 2008-07-15 at 11:22 -0400, Mathieu Desnoyers wrote:
> > > > > * Peter Zijlstra (peterz@infradead.org) wrote:
> > > > > >
> > > > > > I'm confused by the barrier games here.
> > > > > >
> > > > > > Why not:
> > > > > >
> > > > > > void **it_func;
> > > > > >
> > > > > > preempt_disable();
> > > > > > it_func = rcu_dereference((tp)->funcs);
> > > > > > if (it_func) {
> > > > > > for (; *it_func; it_func++)
> > > > > > ((void(*)(proto))(*it_func))(args);
> > > > > > }
> > > > > > preempt_enable();
> > > > > >
> > > > > > That is, why can we skip the barrier when !it_func? is that because at
> > > > > > that time we don't actually dereference it_func and therefore cannot
> > > > > > observe stale data?
> > > > > >
> > > > >
> > > > > Exactly. I used the implementation of rcu_assign_pointer as a hint that
> > > > > we did not need barriers when setting the pointer to NULL, and thus we
> > > > > should not need the read barrier when reading the NULL pointer, because
> > > > > it references no data.
> > > > >
> > > > > #define rcu_assign_pointer(p, v) \
> > > > > ({ \
> > > > > if (!__builtin_constant_p(v) || \
> > > > > ((v) != NULL)) \
> > > > > smp_wmb(); \
> > > > > (p) = (v); \
> > > > > })
> > > >
> > > > Yeah, I saw that,.. made me wonder. It basically assumes that when we
> > > > write:
> > > >
> > > > rcu_assign_pointer(foo, NULL);
> > > >
> > > > foo will not be used as an index or offset.
> > > >
> > > > I guess Paul has thought it through and verified all in-kernel use
> > > > cases, but it still makes me feel unconfortable.
> > > >
> > > > > #define rcu_dereference(p) ({ \
> > > > > typeof(p) _________p1 = ACCESS_ONCE(p); \
> > > > > smp_read_barrier_depends(); \
> > > > > (_________p1); \
> > > > > })
> > > > >
> > > > > But I think you are right, since we are already in unlikely code, using
> > > > > rcu_dereference as you do is better than my use of read barrier depends.
> > > > > It should not change anything in the assembly result except on alpha,
> > > > > where the read_barrier_depends() is not a nop.
> > > > >
> > > > > I wonder if there would be a way to add this kind of NULL pointer case
> > > > > check without overhead in rcu_dereference() on alpha. I guess not, since
> > > > > the pointer is almost never known at compile-time. And I guess Paul must
> > > > > already have thought about it. The only case where we could add this
> > > > > test is when we know that we have a if (ptr != NULL) test following the
> > > > > rcu_dereference(); we could then assume the compiler will merge the two
> > > > > branches since they depend on the same condition.
> > > >
> > > > I remember seeing a thread about all this special casing NULL, but have
> > > > never been able to find it again - my google skillz always fail me.
> > > >
> > > > Basically it doesn't work if you use the variable as an index/offset,
> > > > because in that case 0 is a valid offset and you still generate a data
> > > > dependency.
> > > >
> > > > IIRC the conclusion was that the gains were too small to spend more time
> > > > on it, although I would like to hear about the special case in
> > > > rcu_assign_pointer.
> > > >
> > > > /me goes use git blame....
> > > >
> > >
> > > Actually, we could probably do the following, which also adds an extra
> > > coherency check about non-NULL pointer assumptions :
> > >
> > > #ifdef CONFIG_RCU_DEBUG /* this would be new */
> > > #define DEBUG_RCU_BUG_ON(x) BUG_ON(x)
> > > #else
> > > #define DEBUG_RCU_BUG_ON(x)
> > > #endif
> > >
> > > #define rcu_dereference(p) ({ \
> > > typeof(p) _________p1 = ACCESS_ONCE(p); \
> > > if (p != NULL) \
> > > smp_read_barrier_depends(); \
> > > (_________p1); \
> > > })
> > >
> > > #define rcu_dereference_non_null(p) ({ \
> > > typeof(p) _________p1 = ACCESS_ONCE(p); \
> > > DEBUG_RCU_BUG_ON(p == NULL); \
> > > smp_read_barrier_depends(); \
> > > (_________p1); \
> > > })
> > >
> > > The use-case where rcu_dereference() would be used is when it is
> > > followed by a null pointer check (grepping through the sources shows me
> > > this is a very very common case). In rare cases, it is assumed that the
> > > pointer is never NULL and it is used just after the rcu_dereference. It
> > > those cases, the extra test could be saved on alpha by using
> > > rcu_dereference_non_null(p), which would check the the pointer is indeed
> > > never NULL under some debug kernel configuration.
> > >
> > > Does it make sense ?
> >
> > This would break the case where the dereferenced variable is used as an
> > index/offset where 0 is a valid value and still generates data
> > dependencies.
> >
> > So if with your new version we do:
> >
> > i = rcu_dereference(foo);
> > j = table[i];
> >
> > which translates into:
> >
> > i = ACCESS_ONCE(foo);
> > if (i)
> > smp_read_barrier_depends();
> > j = table[i];
> >
> > which when i == 0, would fail to do the barrier and can thus cause j to
> > be a wrong value.
> >
> > Sadly I'll have to defer to Paul to explain exactly how that can happen
> > - I always get my head in a horrible twist with this case.
> >
>
> I completely agree with you. However, given the current
> rcu_assign_pointer() implementation, we already have this problem. My
> proposal assumes the current rcu_assign_pointer() behavior is correct
> and that those are never ever used for index/offsets.
>
> We could enforce this as a compile-time check with something along the
> lines of :
>
> #define BUILD_BUG_ON_NOT_OFFSETABLE(x) (void)(x)[0]
>
> And use it both in rcu_assign_pointer() and rcu_dereference(). It would
> check for any type passed to rcu_assign_pointer and rcu_dereference
> which is not either a pointer or an array.
>
> Then if someone really want to shoot himself in the foot by casting a
> pointer to a long after the rcu_deref, that's his problem.
>
> Hrm, looking at rcu_assign_pointer tells me that the ((v) != NULL) test
> should probably already complain if v is not a pointer. So my build test
> is probably unneeded.
Yeah, I was thinking in terms of rcu_dereference() working with both
rcu_assign_pointer() and an as-yet-mythical rcu_assign_index(). Perhaps
this would be a good time to get better names:
Current: rcu_assign_pointer() rcu_dereference()
New Pointers: rcu_publish_pointer() rcu_subscribe_pointer()
New Indexes: rcu_publish_index() rcu_subscribe_index()
And, while I am at it, work in a way of checking for either being in
the appropriate RCU read-side critical section and/or having the
needed lock/mutex/whatever held -- something I believe PeterZ was
prototyping some months back.
Though I still am having a hard time with the conditional in
rcu_dereference() vs. the smp_read_barrier_depends()...
Thanx, Paul
next prev parent reply other threads:[~2008-08-01 21:15 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-07-09 14:59 [patch 00/15] Tracepoints v3 for linux-next Mathieu Desnoyers
2008-07-09 14:59 ` [patch 01/15] Kernel Tracepoints Mathieu Desnoyers
2008-07-15 7:50 ` Peter Zijlstra
2008-07-15 13:25 ` Mathieu Desnoyers
2008-07-15 13:59 ` Peter Zijlstra
2008-07-15 14:27 ` Mathieu Desnoyers
2008-07-15 14:42 ` Peter Zijlstra
2008-07-15 15:22 ` Mathieu Desnoyers
2008-07-15 15:31 ` Peter Zijlstra
2008-07-15 15:50 ` Mathieu Desnoyers
2008-08-01 21:10 ` Paul E. McKenney
2008-07-15 16:08 ` Mathieu Desnoyers
2008-07-15 16:25 ` Peter Zijlstra
2008-07-15 16:51 ` Mathieu Desnoyers
2008-08-01 21:10 ` Paul E. McKenney [this message]
2008-08-02 0:03 ` Peter Zijlstra
2008-08-02 0:17 ` Paul E. McKenney
2008-08-01 21:10 ` Paul E. McKenney
2008-07-15 16:26 ` Mathieu Desnoyers
2008-08-01 21:10 ` Paul E. McKenney
2008-07-15 17:50 ` Mathieu Desnoyers
2008-07-15 14:03 ` Peter Zijlstra
2008-07-15 14:46 ` Mathieu Desnoyers
2008-07-15 15:13 ` Peter Zijlstra
2008-07-15 18:22 ` Mathieu Desnoyers
2008-07-15 18:33 ` Steven Rostedt
2008-07-15 18:52 ` Masami Hiramatsu
2008-07-15 19:08 ` Mathieu Desnoyers
2008-07-15 19:02 ` Mathieu Desnoyers
2008-07-15 19:52 ` Peter Zijlstra
2008-07-09 14:59 ` [patch 02/15] Tracepoints Documentation Mathieu Desnoyers
2008-07-09 14:59 ` [patch 03/15] Tracepoints Samples Mathieu Desnoyers
2008-07-09 14:59 ` [patch 04/15] LTTng instrumentation - irq Mathieu Desnoyers
2008-07-09 16:39 ` Masami Hiramatsu
2008-07-09 17:05 ` [patch 04/15] LTTng instrumentation - irq (update) Mathieu Desnoyers
2008-07-09 14:59 ` [patch 05/15] LTTng instrumentation - scheduler Mathieu Desnoyers
2008-07-09 15:34 ` [patch 05/15] LTTng instrumentation - scheduler (repost) Mathieu Desnoyers
2008-07-09 15:39 ` Ingo Molnar
2008-07-09 16:00 ` Mathieu Desnoyers
2008-07-09 16:21 ` [patch 05/15] LTTng instrumentation - scheduler (merge ftrace markers) Mathieu Desnoyers
2008-07-09 19:09 ` [PATCH] ftrace port to tracepoints (linux-next) Mathieu Desnoyers
2008-07-10 3:14 ` Takashi Nishiie
2008-07-10 3:57 ` [PATCH] ftrace port to tracepoints (linux-next) (nitpick update) Mathieu Desnoyers
[not found] ` <20080711143709.GB11500@Krystal>
[not found] ` <Pine.LNX.4.58.0807141112540.30484@gandalf.stny.rr.com>
[not found] ` <20080714153334.GA651@Krystal>
[not found] ` <Pine.LNX.4.58.0807141153250.29493@gandalf.stny.rr.com>
2008-07-14 16:25 ` [PATCH] ftrace memory barriers Mathieu Desnoyers
2008-07-14 16:35 ` Steven Rostedt
2008-07-09 14:59 ` [patch 06/15] LTTng instrumentation - timer Mathieu Desnoyers
2008-07-09 14:59 ` [patch 07/15] LTTng instrumentation - kernel Mathieu Desnoyers
2008-07-09 14:59 ` [patch 08/15] LTTng instrumentation - filemap Mathieu Desnoyers
2008-07-09 14:59 ` Mathieu Desnoyers
2008-07-09 14:59 ` [patch 09/15] LTTng instrumentation - swap Mathieu Desnoyers
2008-07-09 14:59 ` Mathieu Desnoyers
2008-07-09 14:59 ` [patch 10/15] LTTng instrumentation - memory page faults Mathieu Desnoyers
2008-07-09 14:59 ` Mathieu Desnoyers
2008-07-09 14:59 ` [patch 11/15] LTTng instrumentation - page Mathieu Desnoyers
2008-07-09 14:59 ` [patch 12/15] LTTng instrumentation - hugetlb Mathieu Desnoyers
2008-07-11 14:30 ` [patch 12/15] LTTng instrumentation - hugetlb (update) Mathieu Desnoyers
2008-07-09 14:59 ` [patch 13/15] LTTng instrumentation - net Mathieu Desnoyers
2008-07-09 14:59 ` [patch 14/15] LTTng instrumentation - ipv4 Mathieu Desnoyers
2008-07-09 14:59 ` Mathieu Desnoyers
2008-07-09 17:01 ` [patch 00/15] Tracepoints v3 for linux-next Masami Hiramatsu
2008-07-09 17:11 ` [patch 15/15] LTTng instrumentation - ipv6 Mathieu Desnoyers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080801211020.GQ14851@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=eduard.munteanu@linux360.ro \
--cc=fche@redhat.com \
--cc=haoki@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@polymtl.ca \
--cc=mhiramat@redhat.com \
--cc=mingo@elte.hu \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=t-nishiie@np.css.fujitsu.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.