Date: Tue, 14 Apr 2015 16:31:15 +0200
From: Ingo Molnar
To: "Paul E. McKenney"
Cc: Linus Torvalds, Mathieu Desnoyers, Peter Zijlstra, Rusty Russell,
    Oleg Nesterov, Linux Kernel Mailing List, Andi Kleen, Steven Rostedt,
    Thomas Gleixner, Lai Jiangshan, George Spelvin, Andrea Arcangeli,
    David Woodhouse, Rik van Riel, Michel Lespinasse
Subject: Re: [PATCH v5 05/10] seqlock: Better document raw_write_seqcount_latch()
Message-ID: <20150414143115.GA493@gmail.com>
References: <20150413141126.756350256@infradead.org>
    <20150413141213.492831596@infradead.org>
    <20150413163201.GC6040@gmail.com>
    <1979415164.29724.1428944899771.JavaMail.zimbra@efficios.com>
    <20150413174323.GY23685@linux.vnet.ibm.com>
    <20150413184253.GZ23685@linux.vnet.ibm.com>
    <20150414102505.GA13015@gmail.com>
    <20150414130427.GR23685@linux.vnet.ibm.com>
In-Reply-To: <20150414130427.GR23685@linux.vnet.ibm.com>

* Paul E. McKenney wrote:

> > I wish rcu_read_lock() had a data argument, for similar reasons -
> > even if it just pointed to a pre-existing lock or an rcu head it
> > never touches ;-)
>
> Heh!  Jack Slingwine and I had that argument back in 1993.
> I advocated placing the update-side lock into the rcu_read_lock()
> equivalent, and he responded by showing me use cases where (1)
> there were no update-side locks and (2) there were many update-side
> locks, and it was impossible to select just one on the read side.
> ;-)

So as a response I'd have attempted to hand-wave something about those
scenarios being either not that common, or not that interesting?!! :-)

> However, DYNIX/ptx did not have anything like rcu_dereference() or
> list_for_each_entry_rcu(), which perhaps can be used in your example
> below.  (Hey, that was 20 years ago, when 50MB was a lot of main
> memory.  So we relied on compilers being quite dumb.)

I guess compilers are still dumb in many ways ;-)

> > know that cpusets are integrated with cgroups and I search
> > kernel/cgroup.c for call_rcu(), do I find:
> >
> >     call_rcu(&css->rcu_head, css_free_rcu_fn);
> >
> > aha!
> >
> > ... or if I drill down 3 levels into cpuset_for_each_child() ->
> > css_for_each_child() -> css_next_child() do I see the RCU
> > iteration.
>
> And I have felt that reviewing pain as well.
>
> But shouldn't these API members be tagged with "_rcu" to make that
> more clear?  Sort of like the difference between list_for_each_entry
> and list_for_each_entry_rcu()?

Yes, agreed absolutely!

Having it as a syntactic element instead of a stylistic one forces
such self-documentation though. At the cost of being an extra
nuisance.

> > It would have been a lot clearer from the outset, if I had a hint
> > syntactically:
> >
> >     rcu_read_lock(&css->rcu_head);
> >     ...
> >     rcu_read_unlock(&css->rcu_head);
>
> I cannot resist asking what you put there if the update side uses
> synchronize_rcu()...  A NULL pointer?  A pointer to
> synchronize_rcu()?  Something else? [...]

So I'd either put a suitable zero-size struct ('struct rcu_head_sync'?)
into the protected data structure: i.e. still annotate it in an active
fashion, but don't change any code.
This would be zero size in the non-debug case, but it would allow
debugging data in the debug case.

Another solution would be to not require such linking in all cases,
only when there's a single update side lock.

> [...]  And what do you do in the not-uncommon case where multiple
> RCU chains are being traversed in the same RCU read-side critical
> section?  One approach would be to use varargs, I suppose.  Though
> with a hash table, list, or tree, you could have a -lot- of
> ->rcu_head structures to reference, and concurrent additions and
> deletions mean that you wouldn't necessarily know which at
> rcu_read_lock() time.

I'd definitely keep it simple - i.e. no varargs.

I'd just try to link with the data structure in general. Or I'd just
forget about handling these cases altogether, at least initially -
first see how it works out for the simpler cases.

> > beyond the reviewer bonus I bet this would allow some extra debugging
> > as well (only enabled in debug kernels):
> >
> >   - for example to make sure we only access a field if _that field_ is
> >     RCU locked (reducing the chance of having the right locking for
> >     the wrong reason)
>
> One possibility would be to mark each traversal of an RCU-protected
> pointer.  Currently, if a multilinked structure is inserted in one
> shot, only the initial pointer to that structure needs to have
> rcu_dereference().  Otherwise, it is hard to tell exactly how far
> the RCU protection is to extend.  (Been having too much fun with
> this sort of thing in the standards committees...)
>
> >   - we could possibly also build lockdep dependencies out of such
> >     annotated RCU locking patterns.
>
> Tell me more?

So to backpedal a bit, I think that in practice the use of
rcu_dereference*() gives us a lot of protection and documentation
already - so I might be over-doing it.

But we could essentially split up the current monolithic
rcu_read_lock() class and turn it into classes mirroring the update
side lock classes.
This would at minimum give us access to CONFIG_LOCK_STAT statistics
about the frequency of use of the various locks. Right now I can only
see this:

 /proc/lockdep:ffffffff82d27887 OPS: 1412000 FD: 1 BD: 1 ......: rcu_read_lock
 /proc/lock_stat: rcu_read_lock-R: 0 0 0.00 0.00 0.00 0.00 0 1135745 0.00 969.66 1355676.33 1.19
 /proc/lock_stat: rcu_read_lock_sched-R: 0 0 0.00 0.00 0.00 0.00 0 16 0.25 20.71 30.40 1.90
 /proc/lock_stat: rcu_read_lock_bh-R: 0 0 0.00 0.00 0.00 0.00 0 5602 0.33 126.70 40478.88 7.23

which merges them into essentially a single counter.

If we had a more fine-grained structure we could tell more about usage
patterns? Not sure how valuable that is though.

So for example on the SRCU side we could detect such deadlocks:

	mutex_lock(&mutex);
	synchronize_srcu(&srcu);

vs.

	srcu_read_lock(&srcu);
	mutex_lock(&mutex);

(I don't think we are detecting these right now.)

On the classical RCU side it's harder to construct deadlocks, because
the read side is a non-sleeping and irq-safe primitive, while
synchronize_rcu() is a sleeping method ;-) So most (all?) deadlock
scenarios are avoided by just those properties.

Another use would be to potentially split up the RCU read side into
multiple grace period domains, flushed independently, a bit like how
SRCU does it? That would be a non-debugging use for it.

> >   - RCU aware list walking primitives could auto-check that this
> >     particular list is properly RCU locked.
>
> For example, that a lock in the proper update class was held during
> the corresponding update?

Yes, and also on the lookup side: that a 'lock' of the proper type is
read-held during lookup. (with a few special annotations for special
cases where as a side effect of other things we may have proper RCU
read side protection.)

Thanks,

	Ingo
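
The SRCU inversion described above (mutex_lock() followed by
synchronize_srcu() on one path, srcu_read_lock() followed by mutex_lock()
on another) can be illustrated with a small user-space sketch of
class-based dependency tracking. This is not lockdep and not kernel code:
every toy_* name, the CLASS_* constants and the fixed-size tables are
hypothetical, and the sketch assumes that waiting in synchronize_srcu()
can be modelled as briefly taking the SRCU class.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CLASSES 8
#define MAX_HELD    8

/* Hypothetical lock classes, one per primitive in the example. */
enum { CLASS_MUTEX, CLASS_SRCU };

/* dep[a][b] is true once class b has been taken while class a was held. */
static bool dep[MAX_CLASSES][MAX_CLASSES];
static int held[MAX_HELD];
static int nheld;

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record that 'cls' is taken while everything in held[] is held.
 * Returns false if that would close a cycle in the dependency graph.
 * Simplification: re-taking a class that is already held is flagged too.
 */
static bool toy_acquire(int cls)
{
	bool ok = true;

	assert(nheld < MAX_HELD);
	for (int i = 0; i < nheld; i++) {
		bool seen[MAX_CLASSES] = { false };

		if (reachable(cls, held[i], seen))
			ok = false;	/* the inverse order was recorded earlier */
		else
			dep[held[i]][cls] = true;
	}
	held[nheld++] = cls;
	return ok;
}

/* Drop the most recently taken class. */
static void toy_release(void)
{
	assert(nheld > 0);
	nheld--;
}

/* Assumption: waiting for readers behaves like briefly taking the class. */
static bool toy_synchronize_srcu(void)
{
	bool ok = toy_acquire(CLASS_SRCU);

	toy_release();
	return ok;
}

/* Replays both paths; returns true if only the second one is flagged. */
static bool toy_detects_srcu_inversion(void)
{
	bool first, second;

	/* Path 1: mutex_lock(&mutex); synchronize_srcu(&srcu); */
	toy_acquire(CLASS_MUTEX);
	first = toy_synchronize_srcu();
	toy_release();

	/* Path 2: srcu_read_lock(&srcu); mutex_lock(&mutex); */
	toy_acquire(CLASS_SRCU);
	second = toy_acquire(CLASS_MUTEX);
	toy_release();
	toy_release();

	return first && !second;
}
```

Calling toy_detects_srcu_inversion() once from a test program replays
both paths. The first records the mutex -> srcu edge, and the second is
rejected because srcu -> mutex would close the cycle. The two paths never
have to run concurrently for the inversion to be noticed, which is the
property that makes order-based checking attractive here.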