From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753997AbcGTRP6 (ORCPT ); Wed, 20 Jul 2016 13:15:58 -0400 Received: from mx1.redhat.com ([209.132.183.28]:51358 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752583AbcGTRPv (ORCPT ); Wed, 20 Jul 2016 13:15:51 -0400 Date: Wed, 20 Jul 2016 19:16:03 +0200 From: Oleg Nesterov To: "Paul E. McKenney" Cc: Peter Zijlstra , mingo@kernel.org, linux-kernel@vger.kernel.org, tj@kernel.org, john.stultz@linaro.org, dimitrysh@google.com, romlem@google.com, ccross@google.com, tkjos@google.com Subject: Re: [PATCH] rcu_sync: simplify the state machine, introduce __rcu_sync_enter() Message-ID: <20160720171602.GA5059@redhat.com> References: <20160714184351.GA18388@redhat.com> <20160714192018.GM30154@twins.programming.kicks-ass.net> <20160715132709.GA27644@redhat.com> <20160715133928.GH7094@linux.vnet.ibm.com> <20160715134523.GB28589@redhat.com> <20160715153840.GI7094@linux.vnet.ibm.com> <20160715164938.GA4294@redhat.com> <20160715180140.GM7094@linux.vnet.ibm.com> <20160716171007.GA30000@redhat.com> <20160719205018.GB7094@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160719205018.GB7094@linux.vnet.ibm.com> User-Agent: Mutt/1.5.18 (2008-05-17) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Wed, 20 Jul 2016 17:15:50 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Paul, I had to switch to internal bugzillas after the first email, and now I feel I can't read... I'll try to answer one question right now, tomorrow I'll reread your email, probably I need to answer something else... On 07/19, Paul E. McKenney wrote: > > On Sat, Jul 16, 2016 at 07:10:07PM +0200, Oleg Nesterov wrote: > > > And, there is another possible transition, GP_ENTER -> GP_IDLE, because > > not it is possible to call __rcu_sync_enter() and rcu_sync_exit() in any > > state (except obviously they should be balanced), and they do not block. ... > If you feel strongly about allowing rcu_sync_exit() in GP_ENTER state, > could you please tell me your use case? Or am I confused? See below, > And I think __rcu_sync_enter() can have more users. Let's look at > freeze_super(). It calls percpu_down_write() 3 times, and waits for 3 GP's > sequentally. > > > Now we can add 3 __rcu_sync_enter's at the start and 3 rcu_sync_exit's at > > the end (actually we can do better, just to simplify). And again, note > > that rcu_sync_exit() will work correctly even if we (say) return -EBUSY, > > so rcu_sync_wait and/or percpu_down_write() was not called in between, > > and in this case we won't block waiting for GP. > > > > I am not going to claim to understand freeze_super(), but it does seem > to have a fair amount of waiting. > > But yes, you could put rcu_sync_enter() ^^^^^^^^^^^^^^ __rcu_sync_enter, see below, > and rcu_sync_exit() before and > after a series of write-side enter/exit pairs in order to force things > to stay in writer mode, if that is what you are suggesting. No, no, this is not what I am trying to suggest. The problem is that freeze_super() takes 3 semaphores for writing in row, this means that it needs to wait for 3 GP's sequentally, and it does this with sb->s_umount held. This is just ugly. OK, lets suppose it simply does freeze_super(sb) { down_write(&sb->s_umount); if (NEED_TO_FREEZE) { percpu_down_write(SEM1); percpu_down_write(SEM2); percpu_down_write(SEM3); } up_write(&sb->s_umount); } and every percpu_down_write() waits for GP. Now, suppose we add the additional enter/exit's: freeze_super(sb) { // this doesn't block __rcu_sync_enter(SEM3); __rcu_sync_enter(SEM2); __rcu_sync_enter(SEM1); down_write(&sb->s_umount); if (NEED_TO_FREEZE) { percpu_down_write(SEM1); percpu_down_write(SEM2); percpu_down_write(SEM3); } up_write(&sb->s_umount); rcu_sync_exit(SEM1); rcu_sync_exit(SEM2); rcu_sync_exit(SEM3); } Again, actually we can do better, just to simplify. Now. the fisrt percpu_down_write(SEM1) can block waiting for GP or not, this depends on how many time it spends in down_write(). But the 2nd and the 3rd percpu_down_write() most likely won't block, so in the likely case freeze_super() will need a single GP pass. And note that NEED_TO_FREEZE can be false, in this case rcu_sync_exit() will be called in GP_ENTER state. To some degree, this is like get_state_synchronize_rcu/cond_synchronize_rcu. But obviously percpu_down_write() can not use these helpers, and in this particular case __rcu_sync_enter() is better because it forces the start of GP pass. ----------------------------------------------------------------------- As for cgroups, we want to switch cgroup_threadgroup_rwsem into the slow mode, at least for now. We could add the additional hooks/hacks into rcu/sync.c but why? We can do this without any changes outside of cgroup.c right now, just add rcu_sync_enter() into cgroup_init(). But we do not want to add a pointless synchronize_sched() at boot time, __rcu_sync_enter() looks much better. ------------------------------------------------------------------------ And even __cgroup_procs_write() could use __rcu_sync_enter(). But lets ignore this for now. Oleg.