From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Peter Zijlstra <peterz@infradead.org>,
Rusty Russell <rusty@rustcorp.com.au>,
Oleg Nesterov <oleg@redhat.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Andi Kleen <andi@firstfloor.org>,
Steven Rostedt <rostedt@goodmis.org>,
Thomas Gleixner <tglx@linutronix.de>,
Lai Jiangshan <laijs@cn.fujitsu.com>,
George Spelvin <linux@horizon.com>,
Andrea Arcangeli <aarcange@redhat.com>,
David Woodhouse <David.Woodhouse@intel.com>,
Rik van Riel <riel@redhat.com>,
Michel Lespinasse <walken@google.com>
Subject: Re: [PATCH v5 05/10] seqlock: Better document raw_write_seqcount_latch()
Date: Tue, 14 Apr 2015 06:04:27 -0700 [thread overview]
Message-ID: <20150414130427.GR23685@linux.vnet.ibm.com> (raw)
In-Reply-To: <20150414102505.GA13015@gmail.com>
On Tue, Apr 14, 2015 at 12:25:06PM +0200, Ingo Molnar wrote:
>
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
>
> > On Mon, Apr 13, 2015 at 11:21:46AM -0700, Linus Torvalds wrote:
> > > On Mon, Apr 13, 2015 at 10:43 AM, Paul E. McKenney
> > > <paulmck@linux.vnet.ibm.com> wrote:
> > > >
> > > > A shorthand for READ_ONCE + smp_read_barrier_depends() is the shiny
> > > > new lockless_dereference()
> > >
> > > Related side note - I think people should get used to seeing
> > > "smp_load_acquire()". It has well-defined memory ordering properties
> > > and should generally perform well on most architectures. It's (much)
> > > stronger than lockless_dereference(), and together with
> > > smp_store_release() you can make rather clear guarantees about passing
> > > data locklessly from one CPU to another.
> > >
> > > I'd like to see us use more of the pattern of
> > >
> > > - one thread does:
> > >
> > > .. allocate/create some data
> > > smp_store_release() to "expose it"
> > >
> > > - another thread does:
> > >
> > > smp_load_acquire() to read index/pointer/flag/whatever
> > > .. use the data any damn way you want ..
> > >
> > > and we should probably aim to prefer that pattern over a lot of our
> > > traditional memory barriers.
> >
> > I couldn't agree more!
>
> /me too!
>
> > RCU made a similar move from open-coding smp_read_barrier_depends()
> > to using rcu_dereference() many years ago, and that change made RCU
> > code -much- easier to read and understand. I believe that moving
> > from smp_mb(), smp_rmb(), and smp_wmb() to smp_store_release() and
> > smp_load_acquire() will provide similar maintainability benefits.
> > Furthermore, when the current code uses smp_mb(),
> > smp_store_release() and smp_load_acquire() generate faster code on
> > most architectures.
>
> A similar maintainability argument can be made for locking:
> spin_lock(x) was a big step forward compared to lock_kernel(),
> primarily not because it improves scalability (it often does), but
> because the '(x)' very clearly documents the data structure that is
> being accessed and makes locking and data access bugs a lot more
> visible in the review phase already.
>
> I wish rcu_read_lock() had a data argument, for similar reasons - even
> if it just pointed to a pre-existing lock or an rcu head it never
> touches ;-)
Heh! Jack Slingwine and I had that argument back in 1993. I advocated
placing the update-side lock into the rcu_read_lock() equivalent, and
he responded by showing me a use cases were (1) there were no update-side
locks and (2) there were many update-side locks, and it was impossible
to select just one on the read side. ;-)
However, DYNIX/ptx did not have anything like rcu_dereference() or
list_for_each_entry_rcu(), which perhaps can be used in your example
below. (Hey, that was 20 years ago, when 50MB was a lot of main
memory. So we relied on compilers being quite dumb.)
> As an example I picked a random file out of the kernel that uses RCU:
> kernel/cpuset.c::validate_change():
>
> static int validate_change(struct cpuset *cur, struct cpuset *trial)
> {
> struct cgroup_subsys_state *css;
> struct cpuset *c, *par;
> int ret;
>
> rcu_read_lock();
>
> /* Each of our child cpusets must be a subset of us */
> ret = -EBUSY;
> cpuset_for_each_child(c, css, cur)
> if (!is_cpuset_subset(c, trial))
> goto out;
>
> /* Remaining checks don't apply to root cpuset */
> ret = 0;
> if (cur == &top_cpuset)
> goto out;
>
> par = parent_cs(cur);
>
> /* On legacy hiearchy, we must be a subset of our parent cpuset. */
> ret = -EACCES;
> if (!cgroup_on_dfl(cur->css.cgroup) && !is_cpuset_subset(trial, par))
> goto out;
>
> /*
> * If either I or some sibling (!= me) is exclusive, we can't
> * overlap
> */
> ret = -EINVAL;
> cpuset_for_each_child(c, css, par) {
> if ((is_cpu_exclusive(trial) || is_cpu_exclusive(c)) &&
> c != cur &&
> cpumask_intersects(trial->cpus_allowed, c->cpus_allowed))
> goto out;
> if ((is_mem_exclusive(trial) || is_mem_exclusive(c)) &&
> c != cur &&
> nodes_intersects(trial->mems_allowed, c->mems_allowed))
> goto out;
> }
>
> /*
> * Cpusets with tasks - existing or newly being attached - can't
> * be changed to have empty cpus_allowed or mems_allowed.
> */
> ret = -ENOSPC;
> if ((cgroup_has_tasks(cur->css.cgroup) || cur->attach_in_progress)) {
> if (!cpumask_empty(cur->cpus_allowed) &&
> cpumask_empty(trial->cpus_allowed))
> goto out;
> if (!nodes_empty(cur->mems_allowed) &&
> nodes_empty(trial->mems_allowed))
> goto out;
> }
>
> /*
> * We can't shrink if we won't have enough room for SCHED_DEADLINE
> * tasks.
> */
> ret = -EBUSY;
> if (is_cpu_exclusive(cur) &&
> !cpuset_cpumask_can_shrink(cur->cpus_allowed,
> trial->cpus_allowed))
> goto out;
>
> ret = 0;
> out:
> rcu_read_unlock();
> return ret;
> }
>
> So just from taking a glance at that function can you tell me what is
> being RCU protected here? I cannot, I can only guess that it must
> either be cpuset_for_each_child() or maybe the cpumasks or other
> internals.
>
> And if I search the file for call_rcu() it shows me nothing. Only if I
> know that cpusets are integrated with cgroups and I search
> kernel/cgroup.c for call_rcu(), do I find:
>
> call_rcu(&css->rcu_head, css_free_rcu_fn);
>
> aha!
>
> ... or if I drill down 3 levels into cpuset_for_each_child() ->
> css_for_each_child() -> css_next_child() do I see the RCU iteration.
And I have felt that reviewing pain as well.
But shouldn't these API members be tagged with "_rcu" to make that
more clear? Sort of like the difference between list_for_each_entry
and list_for_each_entry_rcu()?
> It would have been a lot clearer from the onset, if I had a hint
> syntactically:
>
> rcu_read_lock(&css->rcu_head);
> ...
> rcu_read_unlock(&css->rcu_head);
I cannot resist asking what you put there if the update side uses
synchronize_rcu()... A NULL pointer? A pointer to synchronize_rcu()?
Something else? And what do you do in the not-uncommon case where
multiple RCU chains are being traversed in the same RCU read-side
critical section? One approach would be to use varargs, I suppose.
Though with a hash table, list, or tree, you could have a -lot- of
->rcu_head structures to reference, and concurrent additions and deletions
mean that you wouldn't necessarily know which at rcu_read_lock() time.
> beyond the reviewer bonus I bet this would allow some extra debugging
> as well (only enabled in debug kernels):
>
> - for example to make sure we only access a field if _that field_ is
> RCU locked (reducing the chance of having the right locking for
> the wrong reason)
One possibility would be to mark each traversal of an RCU-protected
pointer. Currently, if a multilinked structure is inserted in one
shot, only the initial pointer to that structure needs to have
rcu_dereference(). Otherwise, it is hard to tell exactly how far
the RCU protection is to extend. (Been having too much fun with
this sort of thing in the standards committees...)
> - we could possibly also build lockdep dependencies out of such
> annotated RCU locking patterns.
Tell me more?
> - RCU aware list walking primitives could auto-check that this
> particular list is properly RCU locked.
For example, that a lock in the proper update class was held during
the corresponding update?
> This could be introduced gradually by using a different API name:
>
> rcu_lock(&css->rcu_head);
> ...
> rcu_unlock(&css->rcu_head);
>
> (the 'read' is implied in RCU locking anyway.)
Agreed, a new API would be needed for something like this.
> ... and if you think this approach has any merit, I volunteer the perf
> and sched subsystems as guinea pigs! :-)
And rcutorture, not that it counts for much.
> What do you think?
I think that I don't yet fully understand your proposal. ;-)
Thanx, Paul
next prev parent reply other threads:[~2015-04-14 13:04 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-13 14:11 [PATCH v5 00/10] latched RB-trees and __module_address() Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 01/10] module: Sanitize RCU usage and locking Peter Zijlstra
2015-04-13 15:32 ` Ingo Molnar
2015-04-13 15:40 ` Peter Zijlstra
2015-04-13 16:32 ` Ingo Molnar
2015-04-13 14:11 ` [PATCH v5 02/10] module: Annotate module version magic Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 03/10] module, jump_label: Fix module locking Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 04/10] rbtree: Make lockless searches non-fatal Peter Zijlstra
2015-04-13 15:50 ` Ingo Molnar
2015-04-13 19:10 ` Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 05/10] seqlock: Better document raw_write_seqcount_latch() Peter Zijlstra
2015-04-13 16:32 ` Ingo Molnar
2015-04-13 17:08 ` Mathieu Desnoyers
2015-04-13 17:43 ` Paul E. McKenney
2015-04-13 18:21 ` Linus Torvalds
2015-04-13 18:42 ` Paul E. McKenney
2015-04-14 10:25 ` Ingo Molnar
2015-04-14 13:04 ` Paul E. McKenney [this message]
2015-04-14 14:31 ` Ingo Molnar
2015-04-14 15:11 ` Paul E. McKenney
2015-04-13 19:17 ` Peter Zijlstra
2015-04-13 19:47 ` Peter Zijlstra
2015-04-14 10:26 ` Ingo Molnar
2015-04-13 14:11 ` [PATCH v5 06/10] rbtree: Implement generic latch_tree Peter Zijlstra
2015-04-13 16:43 ` Ingo Molnar
2015-04-13 19:20 ` Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 07/10] module: Optimize __module_address() using a latched RB-tree Peter Zijlstra
2015-04-13 16:49 ` Ingo Molnar
2015-04-14 12:31 ` Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 08/10] module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING Peter Zijlstra
2015-04-13 16:53 ` Ingo Molnar
2015-04-13 14:11 ` [PATCH v5 09/10] module: Use __module_address() for module_address_lookup() Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 10/10] module: Rework module_addr_{min,max} Peter Zijlstra
2015-04-13 16:56 ` Ingo Molnar
2015-04-14 2:55 ` Rusty Russell
2015-04-14 6:42 ` Peter Zijlstra
2015-04-14 12:56 ` Peter Zijlstra
2015-04-14 13:00 ` Ingo Molnar
2015-04-13 17:02 ` [PATCH v5 00/10] latched RB-trees and __module_address() Ingo Molnar
2015-04-14 2:57 ` Rusty Russell
2015-04-14 6:41 ` Peter Zijlstra
2015-04-15 4:41 ` Rusty Russell
2015-04-15 9:41 ` Peter Zijlstra
2015-05-28 2:07 ` Rusty Russell
2015-05-28 11:20 ` Peter Zijlstra
2015-05-28 23:49 ` Rusty Russell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150414130427.GR23685@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=David.Woodhouse@intel.com \
--cc=aarcange@redhat.com \
--cc=andi@firstfloor.org \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux@horizon.com \
--cc=mathieu.desnoyers@efficios.com \
--cc=mingo@kernel.org \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=rostedt@goodmis.org \
--cc=rusty@rustcorp.com.au \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=walken@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.