From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932437AbbDNOba (ORCPT <rfc822;w@1wt.eu>);
	Tue, 14 Apr 2015 10:31:30 -0400
Received: from mail-wg0-f43.google.com ([74.125.82.43]:34577 "EHLO
	mail-wg0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753750AbbDNObV (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Tue, 14 Apr 2015 10:31:21 -0400
Date: Tue, 14 Apr 2015 16:31:15 +0200
From: Ingo Molnar <mingo@kernel.org>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
        Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
        Peter Zijlstra <peterz@infradead.org>,
        Rusty Russell <rusty@rustcorp.com.au>, Oleg Nesterov <oleg@redhat.com>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        Andi Kleen <andi@firstfloor.org>, Steven Rostedt <rostedt@goodmis.org>,
        Thomas Gleixner <tglx@linutronix.de>,
        Lai Jiangshan <laijs@cn.fujitsu.com>,
        George Spelvin <linux@horizon.com>,
        Andrea Arcangeli <aarcange@redhat.com>,
        David Woodhouse <David.Woodhouse@intel.com>,
        Rik van Riel <riel@redhat.com>, Michel Lespinasse <walken@google.com>
Subject: Re: [PATCH v5 05/10] seqlock: Better document
 raw_write_seqcount_latch()
Message-ID: <20150414143115.GA493@gmail.com>
References: <20150413141126.756350256@infradead.org>
 <20150413141213.492831596@infradead.org>
 <20150413163201.GC6040@gmail.com>
 <1979415164.29724.1428944899771.JavaMail.zimbra@efficios.com>
 <20150413174323.GY23685@linux.vnet.ibm.com>
 <CA+55aFwf5mNOF87=SJc2U5Q6VKgFG409+=UZzdCb38tYfCFveQ@mail.gmail.com>
 <20150413184253.GZ23685@linux.vnet.ibm.com>
 <20150414102505.GA13015@gmail.com>
 <20150414130427.GR23685@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150414130427.GR23685@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> > I wish rcu_read_lock() had a data argument, for similar reasons - 
> > even if it just pointed to a pre-existing lock or an rcu head it 
> > never touches ;-)
> 
> Heh!  Jack Slingwine and I had that argument back in 1993.  I 
> advocated placing the update-side lock into the rcu_read_lock() 
> equivalent, and he responded by showing me use cases where (1) 
> there were no update-side locks and (2) there were many update-side 
> locks, and it was impossible to select just one on the read side.  
> ;-)

So as a response I'd have attempted to hand-wave something about those 
scenarios being either not that common, or not that interesting?!! :-)

> However, DYNIX/ptx did not have anything like rcu_dereference() or 
> list_for_each_entry_rcu(), which perhaps can be used in your example 
> below.  (Hey, that was 20 years ago, when 50MB was a lot of main 
> memory.  So we relied on compilers being quite dumb.)

I guess compilers are still dumb in many ways ;-)

> > know that cpusets are integrated with cgroups and I search 
> > kernel/cgroup.c for call_rcu(), do I find:
> > 
> >         call_rcu(&css->rcu_head, css_free_rcu_fn);
> > 
> > aha!
> > 
> > ... or if I drill down 3 levels into cpuset_for_each_child() -> 
> > css_for_each_child() -> css_next_child() do I see the RCU 
> > iteration.
> 
> And I have felt that reviewing pain as well.
> 
> But shouldn't these API members be tagged with "_rcu" to make that 
> more clear?  Sort of like the difference between list_for_each_entry 
> and list_for_each_entry_rcu()?

Yes, agreed absolutely!

Having it as a syntactic element instead of a stylistic one forces 
such self-documentation though.

At the cost of being an extra nuisance.

> > It would have been a lot clearer from the outset, if I had a hint 
> > syntactically:
> > 
> > 	rcu_read_lock(&css->rcu_head);
> > 	...
> > 	rcu_read_unlock(&css->rcu_head);
> 
> I cannot resist asking what you put there if the update side uses 
> synchronize_rcu()...  A NULL pointer?  A pointer to 
> synchronize_rcu()? Something else? [...]

So I'd put a suitable zero-size struct ('struct rcu_head_sync'?) into 
the protected data structure: i.e. still annotate it in an active 
fashion, but don't change any code.

This would be zero size in the non-debug case, but it would allow 
debugging data in the debug case.
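
A minimal userspace sketch of that idea, assuming GNU C (which allows 
empty structs of size zero). The names DEBUG_RCU_ANNOTATE, 
rcu_read_lock_on(), rcu_read_unlock_on() and struct cpuset_like are 
hypothetical and exist only for illustration; only 'struct 
rcu_head_sync' comes from the suggestion above:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical annotation type: not an existing kernel structure. */
#ifdef DEBUG_RCU_ANNOTATE
struct rcu_head_sync {
	const char *name;	/* debug only: what this annotation protects */
	unsigned long readers;	/* debug only: read-side entries seen so far */
};
#else
struct rcu_head_sync { };	/* GNU C extension: sizeof() is 0 */
#endif

/* Stand-in for an RCU-protected structure carrying the annotation. */
struct cpuset_like {
	int value;
	struct rcu_head_sync rcu_sync;
};

/* Hypothetical read-side entry that names the data it protects. */
static inline void rcu_read_lock_on(struct rcu_head_sync *s)
{
#ifdef DEBUG_RCU_ANNOTATE
	s->readers++;
#else
	(void)s;		/* no code generated from the annotation */
#endif
}

static inline void rcu_read_unlock_on(struct rcu_head_sync *s)
{
	(void)s;
}
```

Without DEBUG_RCU_ANNOTATE the annotated structure is no larger than 
the unannotated one, so the argument is purely documentation.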

Another solution would be to not require such linking in all cases, 
only when there's a single update side lock.

> [...]  And what do you do in the not-uncommon case where multiple 
> RCU chains are being traversed in the same RCU read-side critical 
> section?  One approach would be to use varargs, I suppose. Though 
> with a hash table, list, or tree, you could have a -lot- of 
> ->rcu_head structures to reference, and concurrent additions and 
> deletions mean that you wouldn't necessarily know which at 
> rcu_read_lock() time.

I'd definitely keep it simple - i.e. no varargs.

I'd just try to link with the data structure in general. Or I'd just 
forget about handling these cases altogether, at least initially - 
first see how it works out for the simpler cases.

> 
> > beyond the reviewer bonus I bet this would allow some extra debugging 
> > as well (only enabled in debug kernels):
> > 
> >   - for example to make sure we only access a field if _that field_ is 
> >     RCU locked (reducing the chance of having the right locking for 
> >     the wrong reason)
> 
> One possibility would be to mark each traversal of an RCU-protected 
> pointer.  Currently, if a multilinked structure is inserted in one 
> shot, only the initial pointer to that structure needs to have 
> rcu_dereference().  Otherwise, it is hard to tell exactly how far 
> the RCU protection is to extend.  (Been having too much fun with 
> this sort of thing in the standards committees...)
>
> >   - we could possibly also build lockdep dependencies out of such 
> >     annotated RCU locking patterns.
> 
> Tell me more?

So to backpedal a bit, I think that in practice the use of 
rcu_dereference*() gives us a lot of protection and documentation 
already - so I might be over-doing it.

But we could essentially split up the current monolithic 
rcu_read_lock() class and turn it into classes mirroring the update 
side lock classes.

This would at minimum give us access to CONFIG_LOCK_STAT statistics 
about the frequency of use of the various locks. Right now I can only 
see this:

/proc/lockdep:ffffffff82d27887 OPS: 1412000 FD:    1 BD:    1 ......: rcu_read_lock
/proc/lock_stat:                         rcu_read_lock-R:             0              0           0.00           0.00           0.00           0.00              0        1135745           0.00         969.66     1355676.33           1.19
/proc/lock_stat:                   rcu_read_lock_sched-R:             0              0           0.00           0.00           0.00           0.00              0             16           0.25          20.71          30.40           1.90
/proc/lock_stat:                      rcu_read_lock_bh-R:             0              0           0.00           0.00           0.00           0.00              0           5602           0.33         126.70       40478.88           7.23

which merges them into essentially a single counter.

If we had a more fine-grained structure we could tell more about usage 
patterns? Not sure how valuable that is though.
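
A rough single-threaded sketch of what per-class read-side accounting 
could look like. Everything here (struct rcu_read_class, 
rcu_read_lock_class(), print_class_stat()) is hypothetical, and the 
output format is made up rather than that of /proc/lock_stat:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical read-side class, mirroring one update-side lock class. */
struct rcu_read_class {
	const char *name;
	unsigned long acquisitions;
};

static void rcu_read_lock_class(struct rcu_read_class *c)
{
	/* Sketch only: a real version would need per-CPU or atomic counters. */
	c->acquisitions++;
}

static void rcu_read_unlock_class(struct rcu_read_class *c)
{
	(void)c;
}

/* One line per class instead of one merged rcu_read_lock counter. */
static void print_class_stat(const struct rcu_read_class *c)
{
	printf("%24s-R: %12lu\n", c->name, c->acquisitions);
}
```

Each class then gets its own counter, so frequently used read sides 
stand out individually.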

So for example on the SRCU side we could detect such deadlocks:

	mutex_lock(&mutex);
	synchronize_srcu(&srcu);

vs.

	srcu_read_lock(&srcu);
	mutex_lock(&mutex);

(I don't think we are detecting these right now.)
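
To illustrate the kind of check meant here, a toy dependency tracker 
in userspace C. It assumes that synchronize_srcu() is modeled as 
"acquiring" the SRCU class, since it waits for all readers. The 
names dep[], reaches(), add_dependency() and the CLASS_* constants 
are made up for this sketch and are not lockdep's real data 
structures:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CLASSES 8

enum { CLASS_MUTEX, CLASS_SRCU };

/* dep[a][b]: class b was acquired (or waited on) while a was held. */
static bool dep[MAX_CLASSES][MAX_CLASSES];

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reaches(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (dep[from][next] && !seen[next] && reaches(next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired while held". Returns false, without recording the
 * edge, if it would close a cycle, i.e. a potential deadlock.
 */
static bool add_dependency(int held, int acquired)
{
	bool seen[MAX_CLASSES] = { false };

	if (reaches(acquired, held, seen))
		return false;
	dep[held][acquired] = true;
	return true;
}
```

The first pattern above records mutex -> srcu, so the second one 
(srcu -> mutex) is rejected as closing a cycle.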

On the classical RCU side it's harder to construct deadlocks, because 
the read side is a non-sleeping and irq-safe primitive, while 
synchronize_rcu() is a sleeping method ;-) So most (all?) deadlock 
scenarios are avoided by just those properties.

Another use would be to potentially split up the RCU read side into 
multiple grace period domains, flushed independently, a bit like how 
SRCU does it? That would be a non-debugging use for it.

> >   - RCU aware list walking primitives could auto-check that this 
> >     particular list is properly RCU locked.
> 
> For example, that a lock in the proper update class was held during 
> the corresponding update?

Yes, and also on the lookup side: that a 'lock' of the proper type is 
read-held during lookup. (With a few special annotations for the 
special cases where we already have proper RCU read-side protection 
as a side effect of other things.)
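
A small single-threaded sketch of the lookup-side check, assuming a 
per-domain read depth stands in for "read-held". The names struct 
rcu_domain, domain_read_lock() and list_sum_checked() are invented 
for illustration, and a real version would track the depth per task:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-data-structure read-side domain. */
struct rcu_domain {
	int read_depth;		/* sketch only: would be per task in reality */
};

struct node {
	int val;
	struct node *next;
};

static void domain_read_lock(struct rcu_domain *d)
{
	d->read_depth++;
}

static void domain_read_unlock(struct rcu_domain *d)
{
	d->read_depth--;
}

/*
 * Walk the list only if its own domain is read-held.
 * Returns the sum of the values, or -1 if the check fails.
 */
static int list_sum_checked(const struct node *head,
			    const struct rcu_domain *d)
{
	int sum = 0;

	if (d->read_depth <= 0)
		return -1;	/* a debug kernel would warn here instead */
	for (const struct node *n = head; n; n = n->next)
		sum += n->val;
	return sum;
}
```

Holding some other domain's read side does not satisfy the check, 
which is the "right locking for the wrong reason" case.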

Thanks,

	Ingo