From: Ingo Molnar <mingo@kernel.org>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Peter Zijlstra <peterz@infradead.org>,
Rusty Russell <rusty@rustcorp.com.au>,
Oleg Nesterov <oleg@redhat.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Andi Kleen <andi@firstfloor.org>,
Steven Rostedt <rostedt@goodmis.org>,
Thomas Gleixner <tglx@linutronix.de>,
Lai Jiangshan <laijs@cn.fujitsu.com>,
George Spelvin <linux@horizon.com>,
Andrea Arcangeli <aarcange@redhat.com>,
David Woodhouse <David.Woodhouse@intel.com>,
Rik van Riel <riel@redhat.com>,
Michel Lespinasse <walken@google.com>
Subject: Re: [PATCH v5 05/10] seqlock: Better document raw_write_seqcount_latch()
Date: Tue, 14 Apr 2015 12:25:06 +0200 [thread overview]
Message-ID: <20150414102505.GA13015@gmail.com> (raw)
In-Reply-To: <20150413184253.GZ23685@linux.vnet.ibm.com>
* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> On Mon, Apr 13, 2015 at 11:21:46AM -0700, Linus Torvalds wrote:
> > On Mon, Apr 13, 2015 at 10:43 AM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > >
> > > A shorthand for READ_ONCE + smp_read_barrier_depends() is the shiny
> > > new lockless_dereference()
> >
> > Related side note - I think people should get used to seeing
> > "smp_load_acquire()". It has well-defined memory ordering properties
> > and should generally perform well on most architectures. It's (much)
> > stronger than lockless_dereference(), and together with
> > smp_store_release() you can make rather clear guarantees about passing
> > data locklessly from one CPU to another.
> >
> > I'd like to see us use more of the pattern of
> >
> > - one thread does:
> >
> > .. allocate/create some data
> > smp_store_release() to "expose it"
> >
> > - another thread does:
> >
> > smp_load_acquire() to read index/pointer/flag/whatever
> > .. use the data any damn way you want ..
> >
> > and we should probably aim to prefer that pattern over a lot of our
> > traditional memory barriers.
>
> I couldn't agree more!
/me too!
> RCU made a similar move from open-coding smp_read_barrier_depends()
> to using rcu_dereference() many years ago, and that change made RCU
> code -much- easier to read and understand. I believe that moving
> from smp_mb(), smp_rmb(), and smp_wmb() to smp_store_release() and
> smp_load_acquire() will provide similar maintainability benefits.
> Furthermore, when the current code uses smp_mb(),
> smp_store_release() and smp_load_acquire() generate faster code on
> most architectures.
A similar maintainability argument can be made for locking:
spin_lock(x) was a big step forward compared to lock_kernel(),
primarily not because it improves scalability (it often does), but
because the '(x)' very clearly documents the data structure that is
being accessed and makes locking and data access bugs a lot more
visible in the review phase already.
I wish rcu_read_lock() had a data argument, for similar reasons - even
if it just pointed to a pre-existing lock or an rcu head it never
touches ;-)
As an example I picked a random file out of the kernel that uses RCU:
kernel/cpuset.c::validate_change():
static int validate_change(struct cpuset *cur, struct cpuset *trial)
{
struct cgroup_subsys_state *css;
struct cpuset *c, *par;
int ret;
rcu_read_lock();
/* Each of our child cpusets must be a subset of us */
ret = -EBUSY;
cpuset_for_each_child(c, css, cur)
if (!is_cpuset_subset(c, trial))
goto out;
/* Remaining checks don't apply to root cpuset */
ret = 0;
if (cur == &top_cpuset)
goto out;
par = parent_cs(cur);
/* On legacy hiearchy, we must be a subset of our parent cpuset. */
ret = -EACCES;
if (!cgroup_on_dfl(cur->css.cgroup) && !is_cpuset_subset(trial, par))
goto out;
/*
* If either I or some sibling (!= me) is exclusive, we can't
* overlap
*/
ret = -EINVAL;
cpuset_for_each_child(c, css, par) {
if ((is_cpu_exclusive(trial) || is_cpu_exclusive(c)) &&
c != cur &&
cpumask_intersects(trial->cpus_allowed, c->cpus_allowed))
goto out;
if ((is_mem_exclusive(trial) || is_mem_exclusive(c)) &&
c != cur &&
nodes_intersects(trial->mems_allowed, c->mems_allowed))
goto out;
}
/*
* Cpusets with tasks - existing or newly being attached - can't
* be changed to have empty cpus_allowed or mems_allowed.
*/
ret = -ENOSPC;
if ((cgroup_has_tasks(cur->css.cgroup) || cur->attach_in_progress)) {
if (!cpumask_empty(cur->cpus_allowed) &&
cpumask_empty(trial->cpus_allowed))
goto out;
if (!nodes_empty(cur->mems_allowed) &&
nodes_empty(trial->mems_allowed))
goto out;
}
/*
* We can't shrink if we won't have enough room for SCHED_DEADLINE
* tasks.
*/
ret = -EBUSY;
if (is_cpu_exclusive(cur) &&
!cpuset_cpumask_can_shrink(cur->cpus_allowed,
trial->cpus_allowed))
goto out;
ret = 0;
out:
rcu_read_unlock();
return ret;
}
So just from taking a glance at that function can you tell me what is
being RCU protected here? I cannot, I can only guess that it must
either be cpuset_for_each_child() or maybe the cpumasks or other
internals.
And if I search the file for call_rcu() it shows me nothing. Only if I
know that cpusets are integrated with cgroups and I search
kernel/cgroup.c for call_rcu(), do I find:
call_rcu(&css->rcu_head, css_free_rcu_fn);
aha!
... or if I drill down 3 levels into cpuset_for_each_child() ->
css_for_each_child() -> css_next_child() do I see the RCU iteration.
It would have been a lot clearer from the onset, if I had a hint
syntactically:
rcu_read_lock(&css->rcu_head);
...
rcu_read_unlock(&css->rcu_head);
beyond the reviewer bonus I bet this would allow some extra debugging
as well (only enabled in debug kernels):
- for example to make sure we only access a field if _that field_ is
RCU locked (reducing the chance of having the right locking for
the wrong reason)
- we could possibly also build lockdep dependencies out of such
annotated RCU locking patterns.
- RCU aware list walking primitives could auto-check that this
particular list is properly RCU locked.
This could be introduced gradually by using a different API name:
rcu_lock(&css->rcu_head);
...
rcu_unlock(&css->rcu_head);
(the 'read' is implied in RCU locking anyway.)
... and if you think this approach has any merit, I volunteer the perf
and sched subsystems as guinea pigs! :-)
What do you think?
Thanks,
Ingo
next prev parent reply other threads:[~2015-04-14 10:25 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-13 14:11 [PATCH v5 00/10] latched RB-trees and __module_address() Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 01/10] module: Sanitize RCU usage and locking Peter Zijlstra
2015-04-13 15:32 ` Ingo Molnar
2015-04-13 15:40 ` Peter Zijlstra
2015-04-13 16:32 ` Ingo Molnar
2015-04-13 14:11 ` [PATCH v5 02/10] module: Annotate module version magic Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 03/10] module, jump_label: Fix module locking Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 04/10] rbtree: Make lockless searches non-fatal Peter Zijlstra
2015-04-13 15:50 ` Ingo Molnar
2015-04-13 19:10 ` Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 05/10] seqlock: Better document raw_write_seqcount_latch() Peter Zijlstra
2015-04-13 16:32 ` Ingo Molnar
2015-04-13 17:08 ` Mathieu Desnoyers
2015-04-13 17:43 ` Paul E. McKenney
2015-04-13 18:21 ` Linus Torvalds
2015-04-13 18:42 ` Paul E. McKenney
2015-04-14 10:25 ` Ingo Molnar [this message]
2015-04-14 13:04 ` Paul E. McKenney
2015-04-14 14:31 ` Ingo Molnar
2015-04-14 15:11 ` Paul E. McKenney
2015-04-13 19:17 ` Peter Zijlstra
2015-04-13 19:47 ` Peter Zijlstra
2015-04-14 10:26 ` Ingo Molnar
2015-04-13 14:11 ` [PATCH v5 06/10] rbtree: Implement generic latch_tree Peter Zijlstra
2015-04-13 16:43 ` Ingo Molnar
2015-04-13 19:20 ` Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 07/10] module: Optimize __module_address() using a latched RB-tree Peter Zijlstra
2015-04-13 16:49 ` Ingo Molnar
2015-04-14 12:31 ` Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 08/10] module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING Peter Zijlstra
2015-04-13 16:53 ` Ingo Molnar
2015-04-13 14:11 ` [PATCH v5 09/10] module: Use __module_address() for module_address_lookup() Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 10/10] module: Rework module_addr_{min,max} Peter Zijlstra
2015-04-13 16:56 ` Ingo Molnar
2015-04-14 2:55 ` Rusty Russell
2015-04-14 6:42 ` Peter Zijlstra
2015-04-14 12:56 ` Peter Zijlstra
2015-04-14 13:00 ` Ingo Molnar
2015-04-13 17:02 ` [PATCH v5 00/10] latched RB-trees and __module_address() Ingo Molnar
2015-04-14 2:57 ` Rusty Russell
2015-04-14 6:41 ` Peter Zijlstra
2015-04-15 4:41 ` Rusty Russell
2015-04-15 9:41 ` Peter Zijlstra
2015-05-28 2:07 ` Rusty Russell
2015-05-28 11:20 ` Peter Zijlstra
2015-05-28 23:49 ` Rusty Russell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150414102505.GA13015@gmail.com \
--to=mingo@kernel.org \
--cc=David.Woodhouse@intel.com \
--cc=aarcange@redhat.com \
--cc=andi@firstfloor.org \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux@horizon.com \
--cc=mathieu.desnoyers@efficios.com \
--cc=oleg@redhat.com \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=rostedt@goodmis.org \
--cc=rusty@rustcorp.com.au \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=walken@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.