Re: [PATCH v5 05/10] seqlock: Better document raw_write_seqcount_latch()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Ingo Molnar <mingo@kernel.org>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Oleg Nesterov <oleg@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andi Kleen <andi@firstfloor.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	George Spelvin <linux@horizon.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	David Woodhouse <David.Woodhouse@intel.com>,
	Rik van Riel <riel@redhat.com>,
	Michel Lespinasse <walken@google.com>
Subject: Re: [PATCH v5 05/10] seqlock: Better document raw_write_seqcount_latch()
Date: Tue, 14 Apr 2015 16:31:15 +0200	[thread overview]
Message-ID: <20150414143115.GA493@gmail.com> (raw)
In-Reply-To: <20150414130427.GR23685@linux.vnet.ibm.com>

* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> > I wish rcu_read_lock() had a data argument, for similar reasons - 
> > even if it just pointed to a pre-existing lock or an rcu head it 
> > never touches ;-)
> 
> Heh!  Jack Slingwine and I had that argument back in 1993.  I 
> advocated placing the update-side lock into the rcu_read_lock() 
> equivalent, and he responded by showing me a use cases were (1) 
> there were no update-side locks and (2) there were many update-side 
> locks, and it was impossible to select just one on the read side.  
> ;-)

So as a response I'd have attempted to hand-wave something about those 
scenarios being either not that common, or not that interesting?!! :-)

> However, DYNIX/ptx did not have anything like rcu_dereference() or 
> list_for_each_entry_rcu(), which perhaps can be used in your example 
> below.  (Hey, that was 20 years ago, when 50MB was a lot of main 
> memory.  So we relied on compilers being quite dumb.)

I guess compilers are still dumb in many ways ;-)

> > know that cpusets are integrated with cgroups and I search 
> > kernel/cgroup.c for call_rcu(), do I find:
> > 
> >         call_rcu(&css->rcu_head, css_free_rcu_fn);
> > 
> > aha!
> > 
> > ... or if I drill down 3 levels into cpuset_for_each_child() -> 
> > css_for_each_child() -> css_next_child() do I see the RCU 
> > iteration.
> 
> And I have felt that reviewing pain as well.
> 
> But shouldn't these API members be tagged with "_rcu" to make that 
> more clear?  Sort of like the difference between list_for_each_entry 
> and list_for_each_entry_rcu()?

Yes, agreed absolutely!

Having it as a syntactic element instead of a stylistic one forces 
such self-documentation though.

At the cost of being an extra nuisance.

> > It would have been a lot clearer from the onset, if I had a hint 
> > syntactically:
> > 
> > 	rcu_read_lock(&css->rcu_head);
> > 	...
> > 	rcu_read_unlock(&css->rcu_head);
> 
> I cannot resist asking what you put there if the update side uses 
> synchronize_rcu()...  A NULL pointer?  A pointer to 
> synchronize_rcu()? Something else? [...]

So I'd either put a suitable zero-size struct ('struct 
rcu_head_sync'?) into the protected data structure: i.e. still 
annotate it in an active fashion, but don't change any code.

This would be zero size in the non-debug case, but it would allow 
debugging data in the debug case.

Another solution would be to not require such linking in all cases, 
only when there's a single update side lock.

> [...]  And what do you do in the not-uncommon case where multiple 
> RCU chains are being traversed in the same RCU read-side critical 
> section?  One approach would be to use varargs, I suppose. Though 
> with a hash table, list, or tree, you could have a -lot- of 
> ->rcu_head structures to reference, and concurrent additions and 
> deletions mean that you wouldn't necessarily know which at 
> rcu_read_lock() time.

I'd definitely keep it simple - i.e. no varargs.

I'd just try to link with the data structure in general. Or I'd just 
forget about handling these cases altogether, at least initially - 
first see how it works out for the simpler cases.

> 
> > beyond the reviewer bonus I bet this would allow some extra debugging 
> > as well (only enabled in debug kernels):
> > 
> >   - for example to make sure we only access a field if _that field_ is 
> >     RCU locked (reducing the chance of having the right locking for 
> >     the wrong reason)
> 
> One possibility would be to mark each traversal of an RCU-protected 
> pointer.  Currently, if a multilinked structure is inserted in one 
> shot, only the initial pointer to that structure needs to have 
> rcu_dereference().  Otherwise, it is hard to tell exactly how far 
> the RCU protection is to extend.  (Been having too much fun with 
> this sort of thing in the standards committees...)
>
> >   - we could possibly also build lockdep dependencies out of such 
> >     annotated RCU locking patterns.
> 
> Tell me more?

So to backpedal a bit, I think that in practice the use 
rcu_dereference*() gives us a lot of protection and documentation 
already - so I might be over-doing it.

But we could essentially split up the current monolithic 
rcu_read_lock() class and turn it into classes mirroring the update 
side lock classes.

This would at minimum give us access to CONFIG_LOCK_STAT statistics 
about the frequency of use of the various locks. Right now I can only 
see this:

/proc/lockdep:ffffffff82d27887 OPS: 1412000 FD:    1 BD:    1 ......: rcu_read_lock
/proc/lock_stat:                         rcu_read_lock-R:             0              0           0.00           0.00           0.00           0.00              0        1135745           0.00         969.66     1355676.33           1.19
/proc/lock_stat:                   rcu_read_lock_sched-R:             0              0           0.00           0.00           0.00           0.00              0             16           0.25          20.71          30.40           1.90
/proc/lock_stat:                      rcu_read_lock_bh-R:             0              0           0.00           0.00           0.00           0.00              0           5602           0.33         126.70       40478.88           7.23

which merges them into essentially a single counter.

If we had a more finegrained structure we could tell more about usage 
patterns? Not sure how valuable that is though.

So for example on the SRCU side we could detect such deadlocks:

	mutex_lock(&mutex);
	synchronize_srcu(&srcu);

vs.

	srcu_read_lock(&srcu);
	mutex_lock(&mutex);

(I don't think we are detecting these right now.)

On the classical RCU side it's harder to construct deadlocks, because 
the read side is a non-sleeping and irq-safe primitive, while 
synchronize_rcu() is a sleeping method ;-) So most (all?) deadlock 
scenarios are avoided by just those properties.

Another use would be to potentially split up the RCU read side into 
multiple grace period domains, flushed independently, a bit like how 
SRCU does it? That would be a non-debugging use for it.

> >   - RCU aware list walking primitives could auto-check that this 
> >     particular list is properly RCU locked.
> 
> For example, that a lock in the proper update class was held during 
> the corresponding update?

Yes, and also on the lookup side: that a 'lock' of the proper type is 
read-held during lookup. (with a few special annotations for special 
cases where as a side effect of other things we may have proper RCU 
read side protection.)

Thanks,

	Ingo

next prev parent reply	other threads:[~2015-04-14 14:31 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-13 14:11 [PATCH v5 00/10] latched RB-trees and __module_address() Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 01/10] module: Sanitize RCU usage and locking Peter Zijlstra
2015-04-13 15:32   ` Ingo Molnar
2015-04-13 15:40     ` Peter Zijlstra
2015-04-13 16:32       ` Ingo Molnar
2015-04-13 14:11 ` [PATCH v5 02/10] module: Annotate module version magic Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 03/10] module, jump_label: Fix module locking Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 04/10] rbtree: Make lockless searches non-fatal Peter Zijlstra
2015-04-13 15:50   ` Ingo Molnar
2015-04-13 19:10     ` Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 05/10] seqlock: Better document raw_write_seqcount_latch() Peter Zijlstra
2015-04-13 16:32   ` Ingo Molnar
2015-04-13 17:08     ` Mathieu Desnoyers
2015-04-13 17:43       ` Paul E. McKenney
2015-04-13 18:21         ` Linus Torvalds
2015-04-13 18:42           ` Paul E. McKenney
2015-04-14 10:25             ` Ingo Molnar
2015-04-14 13:04               ` Paul E. McKenney
2015-04-14 14:31                 ` Ingo Molnar [this message]
2015-04-14 15:11                   ` Paul E. McKenney
2015-04-13 19:17     ` Peter Zijlstra
2015-04-13 19:47       ` Peter Zijlstra
2015-04-14 10:26         ` Ingo Molnar
2015-04-13 14:11 ` [PATCH v5 06/10] rbtree: Implement generic latch_tree Peter Zijlstra
2015-04-13 16:43   ` Ingo Molnar
2015-04-13 19:20     ` Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 07/10] module: Optimize __module_address() using a latched RB-tree Peter Zijlstra
2015-04-13 16:49   ` Ingo Molnar
2015-04-14 12:31     ` Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 08/10] module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING Peter Zijlstra
2015-04-13 16:53   ` Ingo Molnar
2015-04-13 14:11 ` [PATCH v5 09/10] module: Use __module_address() for module_address_lookup() Peter Zijlstra
2015-04-13 14:11 ` [PATCH v5 10/10] module: Rework module_addr_{min,max} Peter Zijlstra
2015-04-13 16:56   ` Ingo Molnar
2015-04-14  2:55     ` Rusty Russell
2015-04-14  6:42       ` Peter Zijlstra
2015-04-14 12:56     ` Peter Zijlstra
2015-04-14 13:00       ` Ingo Molnar
2015-04-13 17:02 ` [PATCH v5 00/10] latched RB-trees and __module_address() Ingo Molnar
2015-04-14  2:57 ` Rusty Russell
2015-04-14  6:41   ` Peter Zijlstra
2015-04-15  4:41     ` Rusty Russell
2015-04-15  9:41       ` Peter Zijlstra
2015-05-28  2:07     ` Rusty Russell
2015-05-28 11:20       ` Peter Zijlstra
2015-05-28 23:49         ` Rusty Russell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150414143115.GA493@gmail.com \
    --to=mingo@kernel.org \
    --cc=David.Woodhouse@intel.com \
    --cc=aarcange@redhat.com \
    --cc=andi@firstfloor.org \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@horizon.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=rusty@rustcorp.com.au \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.