Re: [PATCH v5 05/10] seqlock: Better document raw_write_seqcount_latch()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Ingo Molnar <mingo@kernel.org>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Oleg Nesterov <oleg@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andi Kleen <andi@firstfloor.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	George Spelvin <linux@horizon.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	David Woodhouse <David.Woodhouse@intel.com>,
	Rik van Riel <riel@redhat.com>,
	Michel Lespinasse <walken@google.com>
Subject: Re: [PATCH v5 05/10] seqlock: Better document raw_write_seqcount_latch()
Date: Tue, 14 Apr 2015 12:25:06 +0200	[thread overview]
Message-ID: <20150414102505.GA13015@gmail.com> (raw)
In-Reply-To: <20150413184253.GZ23685@linux.vnet.ibm.com>


* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> On Mon, Apr 13, 2015 at 11:21:46AM -0700, Linus Torvalds wrote:
> > On Mon, Apr 13, 2015 at 10:43 AM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > >
> > > A shorthand for READ_ONCE + smp_read_barrier_depends() is the shiny
> > > new lockless_dereference()
> > 
> > Related side note - I think people should get used to seeing
> > "smp_load_acquire()". It has well-defined memory ordering properties
> > and should generally perform well on most architectures. It's (much)
> > stronger than lockless_dereference(), and together with
> > smp_store_release() you can make rather clear guarantees about passing
> > data locklessly from one CPU to another.
> > 
> > I'd like to see us use more of the pattern of
> > 
> >  - one thread does:
> > 
> >      .. allocate/create some data
> >       smp_store_release() to "expose it"
> > 
> >  - another thread does:
> > 
> >       smp_load_acquire() to read index/pointer/flag/whatever
> >       .. use the data any damn way you want ..
> > 
> > and we should probably aim to prefer that pattern over a lot of our
> > traditional memory barriers.
> 
> I couldn't agree more!

/me too!

> RCU made a similar move from open-coding smp_read_barrier_depends() 
> to using rcu_dereference() many years ago, and that change made RCU 
> code -much- easier to read and understand.  I believe that moving 
> from smp_mb(), smp_rmb(), and smp_wmb() to smp_store_release() and 
> smp_load_acquire() will provide similar maintainability benefits. 
> Furthermore, when the current code uses smp_mb(), 
> smp_store_release() and smp_load_acquire() generate faster code on 
> most architectures.

A similar maintainability argument can be made for locking: 
spin_lock(x) was a big step forward compared to lock_kernel(), 
primarily not because it improves scalability (it often does), but 
because the '(x)' very clearly documents the data structure that is 
being accessed and makes locking and data access bugs a lot more 
visible in the review phase already.

I wish rcu_read_lock() had a data argument, for similar reasons - even 
if it just pointed to a pre-existing lock or an rcu head it never 
touches ;-)

As an example I picked a random file out of the kernel that uses RCU: 
kernel/cpuset.c::validate_change():

static int validate_change(struct cpuset *cur, struct cpuset *trial)
{
	struct cgroup_subsys_state *css;
	struct cpuset *c, *par;
	int ret;

	rcu_read_lock();

	/* Each of our child cpusets must be a subset of us */
	ret = -EBUSY;
	cpuset_for_each_child(c, css, cur)
		if (!is_cpuset_subset(c, trial))
			goto out;

	/* Remaining checks don't apply to root cpuset */
	ret = 0;
	if (cur == &top_cpuset)
		goto out;

	par = parent_cs(cur);

	/* On legacy hiearchy, we must be a subset of our parent cpuset. */
	ret = -EACCES;
	if (!cgroup_on_dfl(cur->css.cgroup) && !is_cpuset_subset(trial, par))
		goto out;

	/*
	 * If either I or some sibling (!= me) is exclusive, we can't
	 * overlap
	 */
	ret = -EINVAL;
	cpuset_for_each_child(c, css, par) {
		if ((is_cpu_exclusive(trial) || is_cpu_exclusive(c)) &&
		    c != cur &&
		    cpumask_intersects(trial->cpus_allowed, c->cpus_allowed))
			goto out;
		if ((is_mem_exclusive(trial) || is_mem_exclusive(c)) &&
		    c != cur &&
		    nodes_intersects(trial->mems_allowed, c->mems_allowed))
			goto out;
	}

	/*
	 * Cpusets with tasks - existing or newly being attached - can't
	 * be changed to have empty cpus_allowed or mems_allowed.
	 */
	ret = -ENOSPC;
	if ((cgroup_has_tasks(cur->css.cgroup) || cur->attach_in_progress)) {
		if (!cpumask_empty(cur->cpus_allowed) &&
		    cpumask_empty(trial->cpus_allowed))
			goto out;
		if (!nodes_empty(cur->mems_allowed) &&
		    nodes_empty(trial->mems_allowed))
			goto out;
	}

	/*
	 * We can't shrink if we won't have enough room for SCHED_DEADLINE
	 * tasks.
	 */
	ret = -EBUSY;
	if (is_cpu_exclusive(cur) &&
	    !cpuset_cpumask_can_shrink(cur->cpus_allowed,
				       trial->cpus_allowed))
		goto out;

	ret = 0;
out:
	rcu_read_unlock();
	return ret;
}

So just from taking a glance at that function can you tell me what is 
being RCU protected here? I cannot, I can only guess that it must 
either be cpuset_for_each_child() or maybe the cpumasks or other 
internals.

And if I search the file for call_rcu() it shows me nothing. Only if I 
know that cpusets are integrated with cgroups and I search 
kernel/cgroup.c for call_rcu(), do I find:

        call_rcu(&css->rcu_head, css_free_rcu_fn);

aha!

... or if I drill down 3 levels into cpuset_for_each_child() -> 
css_for_each_child() -> css_next_child() do I see the RCU iteration.

It would have been a lot clearer from the onset, if I had a hint 
syntactically:

	rcu_read_lock(&css->rcu_head);
	...
	rcu_read_unlock(&css->rcu_head);

beyond the reviewer bonus I bet this would allow some extra debugging 
as well (only enabled in debug kernels):

  - for example to make sure we only access a field if _that field_ is 
    RCU locked (reducing the chance of having the right locking for 
    the wrong reason)

  - we could possibly also build lockdep dependencies out of such 
    annotated RCU locking patterns.

  - RCU aware list walking primitives could auto-check that this 
    particular list is properly RCU locked.

This could be introduced gradually by using a different API name:

	rcu_lock(&css->rcu_head);
	...
	rcu_unlock(&css->rcu_head);

(the 'read' is implied in RCU locking anyway.)

... and if you think this approach has any merit, I volunteer the perf 
and sched subsystems as guinea pigs! :-)

What do you think?

Thanks,

	Ingo

next prev parent reply	other threads:[~2015-04-14 10:25 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-13 14:11 [PATCH v5 00/10] latched RB-trees and __module_address() Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 01/10] module: Sanitize RCU usage and locking Peter Zijlstra
2015-04-13 15:32   ` Ingo Molnar
2015-04-13 15:40     ` Peter Zijlstra
2015-04-13 16:32       ` Ingo Molnar
2015-04-13 14:11 ` [PATCH v5 02/10] module: Annotate module version magic Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 03/10] module, jump_label: Fix module locking Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 04/10] rbtree: Make lockless searches non-fatal Peter Zijlstra
2015-04-13 15:50   ` Ingo Molnar
2015-04-13 19:10     ` Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 05/10] seqlock: Better document raw_write_seqcount_latch() Peter Zijlstra
2015-04-13 16:32   ` Ingo Molnar
2015-04-13 17:08     ` Mathieu Desnoyers
2015-04-13 17:43       ` Paul E. McKenney
2015-04-13 18:21         ` Linus Torvalds
2015-04-13 18:42           ` Paul E. McKenney
2015-04-14 10:25             ` Ingo Molnar [this message]
2015-04-14 13:04               ` Paul E. McKenney
2015-04-14 14:31                 ` Ingo Molnar
2015-04-14 15:11                   ` Paul E. McKenney
2015-04-13 19:17     ` Peter Zijlstra
2015-04-13 19:47       ` Peter Zijlstra
2015-04-14 10:26         ` Ingo Molnar
2015-04-13 14:11 ` [PATCH v5 06/10] rbtree: Implement generic latch_tree Peter Zijlstra
2015-04-13 16:43   ` Ingo Molnar
2015-04-13 19:20     ` Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 07/10] module: Optimize __module_address() using a latched RB-tree Peter Zijlstra
2015-04-13 16:49   ` Ingo Molnar
2015-04-14 12:31     ` Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 08/10] module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING Peter Zijlstra
2015-04-13 16:53   ` Ingo Molnar
2015-04-13 14:11 ` [PATCH v5 09/10] module: Use __module_address() for module_address_lookup() Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 10/10] module: Rework module_addr_{min,max} Peter Zijlstra
2015-04-13 16:56   ` Ingo Molnar
2015-04-14  2:55     ` Rusty Russell
2015-04-14  6:42       ` Peter Zijlstra
2015-04-14 12:56     ` Peter Zijlstra
2015-04-14 13:00       ` Ingo Molnar
2015-04-13 17:02 ` [PATCH v5 00/10] latched RB-trees and __module_address() Ingo Molnar
2015-04-14  2:57 ` Rusty Russell
2015-04-14  6:41   ` Peter Zijlstra
2015-04-15  4:41     ` Rusty Russell
2015-04-15  9:41       ` Peter Zijlstra
2015-05-28  2:07     ` Rusty Russell
2015-05-28 11:20       ` Peter Zijlstra
2015-05-28 23:49         ` Rusty Russell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150414102505.GA13015@gmail.com \
    --to=mingo@kernel.org \
    --cc=David.Woodhouse@intel.com \
    --cc=aarcange@redhat.com \
    --cc=andi@firstfloor.org \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@horizon.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=rusty@rustcorp.com.au \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.