From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756426Ab0DWTrB (ORCPT ); Fri, 23 Apr 2010 15:47:01 -0400
Received: from e6.ny.us.ibm.com ([32.97.182.146]:36989 "EHLO e6.ny.us.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755963Ab0DWTq5 (ORCPT ); Fri, 23 Apr 2010 15:46:57 -0400
Date: Fri, 23 Apr 2010 12:46:49 -0700
From: "Paul E. McKenney"
To: Vivek Goyal
Cc: linux kernel mailing list, Jens Axboe, Li Zefan, Gui Jianfeng
Subject: Re: [PATCH] blk-cgroup: Fix RCU correctness warning in cfq_init_queue()
Message-ID: <20100423194649.GF2589@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20100422155452.GD3228@redhat.com>
	<20100422231556.GW2524@linux.vnet.ibm.com>
	<20100422235555.GA12004@redhat.com>
	<20100423001751.GX2524@linux.vnet.ibm.com>
	<20100423144138.GA5026@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20100423144138.GA5026@redhat.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 23, 2010 at 10:41:38AM -0400, Vivek Goyal wrote:
> On Thu, Apr 22, 2010 at 05:17:51PM -0700, Paul E. McKenney wrote:
> > On Thu, Apr 22, 2010 at 07:55:55PM -0400, Vivek Goyal wrote:
> > > On Thu, Apr 22, 2010 at 04:15:56PM -0700, Paul E. McKenney wrote:
> > > > On Thu, Apr 22, 2010 at 11:54:52AM -0400, Vivek Goyal wrote:
> > > > > With RCU correctness on, we see the following warning. This patch fixes it.
> > > >
> > > > This is in initialization code, so that there cannot be any concurrent
> > > > updates, correct? If so, looks good.
> > > >
> > >
> > > I think theoretically two instances of cfq_init_queue() can be running
> > > in parallel (for two different devices), and they both can call
> > > blkiocg_add_blkio_group(). But then we use a spin lock to protect
> > > blkio_cgroup.
> > >
> > > spin_lock_irqsave(&blkcg->lock, flags);
> > >
> > > So I guess two parallel updates should be fine.
> >
> > OK, in that case, would it be possible to add this spinlock to the
> > condition checked by css_id()'s rcu_dereference_check()?
>
> Hi Paul,
>
> I think adding this spinlock to the condition checked might become a
> little messy. The reason is that this lock is subsystem (controller)
> specific and maintained by the controller. Now if any controller
> implements a lock and we add that lock in css_id() rcu_dereference_check(),
> it will look ugly.
>
> So probably a better way is to make sure that css_id() is always called
> under rcu read lock so that we don't hit this warning?

As long as holding rcu_read_lock() protects css_id() from the usual
problems, such as accessing memory that was concurrently freed, yes.

> > At first glance, css_id()
> > needs to gain access to the blkio_cgroup structure that references
> > the cgroup_subsys_state structure passed to css_id().
> >
> > This means that there is only one blkio_cgroup structure referencing
> > a given cgroup_subsys_state structure, right? Otherwise, we could still
> > have concurrent access.
>
> Yes. In fact, the css object is embedded in the blkio_cgroup structure.
> So we take rcu_read_lock() so that data structures associated with the
> cgroup subsystem don't go away, and then take the controller-specific
> blkio_cgroup spin lock to make sure multiple writers don't end up
> modifying a list at the same time.
>
> Am I missing something?

This sounds very good! I did have to ask! ;-)

                                                        Thanx, Paul

> Thanks
> Vivek
>
> >
> > > > (Just wanting to make sure that we are not papering over a real error!)
> > > >
> > > >                                                         Thanx, Paul
> > > >
> > > > > [ 103.790505] ===================================================
> > > > > [ 103.790509] [ INFO: suspicious rcu_dereference_check() usage. ]
> > > > > [ 103.790511] ---------------------------------------------------
> > > > > [ 103.790514] kernel/cgroup.c:4432 invoked rcu_dereference_check() without protection!
> > > > > [ 103.790517]
> > > > > [ 103.790517] other info that might help us debug this:
> > > > > [ 103.790519]
> > > > > [ 103.790521]
> > > > > [ 103.790521] rcu_scheduler_active = 1, debug_locks = 1
> > > > > [ 103.790524] 4 locks held by bash/4422:
> > > > > [ 103.790526] #0: (&buffer->mutex){+.+.+.}, at: [] sysfs_write_file+0x3c/0x144
> > > > > [ 103.790537] #1: (s_active#102){.+.+.+}, at: [] sysfs_write_file+0xe7/0x144
> > > > > [ 103.790544] #2: (&q->sysfs_lock){+.+.+.}, at: [] queue_attr_store+0x49/0x8f
> > > > > [ 103.790552] #3: (&(&blkcg->lock)->rlock){......}, at: [] blkiocg_add_blkio_group+0x2b/0xad
> > > > > [ 103.790560]
> > > > > [ 103.790561] stack backtrace:
> > > > > [ 103.790564] Pid: 4422, comm: bash Not tainted 2.6.34-rc4-blkio-second-crash #81
> > > > > [ 103.790567] Call Trace:
> > > > > [ 103.790572] [] lockdep_rcu_dereference+0x9d/0xa5
> > > > > [ 103.790577] [] css_id+0x44/0x57
> > > > > [ 103.790581] [] blkiocg_add_blkio_group+0x53/0xad
> > > > > [ 103.790586] [] cfq_init_queue+0x139/0x32c
> > > > > [ 103.790591] [] elv_iosched_store+0xbf/0x1bf
> > > > > [ 103.790595] [] queue_attr_store+0x70/0x8f
> > > > > [ 103.790599] [] ? sysfs_write_file+0xe7/0x144
> > > > > [ 103.790603] [] sysfs_write_file+0x108/0x144
> > > > > [ 103.790609] [] vfs_write+0xae/0x10b
> > > > > [ 103.790612] [] ? trace_hardirqs_on_caller+0x10c/0x130
> > > > > [ 103.790616] [] sys_write+0x4a/0x6e
> > > > > [ 103.790622] [] system_call_fastpath+0x16/0x1b
> > > > > [ 103.790625]
> > > > >
> > > > > Signed-off-by: Vivek Goyal
> > > > > ---
> > > > >  block/cfq-iosched.c | 2 ++
> > > > >  1 files changed, 2 insertions(+), 0 deletions(-)
> > > > >
> > > > > diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> > > > > index 002a5b6..9386bf8 100644
> > > > > --- a/block/cfq-iosched.c
> > > > > +++ b/block/cfq-iosched.c
> > > > > @@ -3741,8 +3741,10 @@ static void *cfq_init_queue(struct request_queue *q)
> > > > >  	 * to make sure that cfq_put_cfqg() does not try to kfree root group
> > > > >  	 */
> > > > >  	atomic_set(&cfqg->ref, 1);
> > > > > +	rcu_read_lock();
> > > > >  	blkiocg_add_blkio_group(&blkio_root_cgroup, &cfqg->blkg, (void *)cfqd,
> > > > >  					0);
> > > > > +	rcu_read_unlock();
> > > > >  #endif
> > > > >  	/*
> > > > >  	 * Not strictly needed (since RB_ROOT just clears the node and we
> > > > > --
> > > > > 1.6.2.5
> > > > >
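
The locking pattern agreed on in this thread can be sketched in userspace C: call css_id() only inside an RCU read-side critical section, and let the controller's own spinlock serialize writers. This is only an illustration under stated assumptions. Every mock_* name below is hypothetical, the "RCU" here is a plain per-thread nesting counter, and the spinlock is a C11 atomic_flag. None of it is the kernel's implementation; it only mimics the shape of the check that produced the warning and of the fix in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins for illustration; not kernel APIs. */
static _Thread_local int mock_rcu_depth; /* read-side nesting depth of this thread */
static int mock_rcu_warnings;            /* number of unprotected lookups observed */

static void mock_rcu_read_lock(void)
{
	mock_rcu_depth++;
}

static void mock_rcu_read_unlock(void)
{
	assert(mock_rcu_depth > 0); /* an unbalanced unlock is a caller bug */
	mock_rcu_depth--;
}

struct mock_css {
	int id;
};

struct mock_blkcg {
	atomic_flag lock;    /* stands in for blkcg->lock */
	struct mock_css css; /* embedded, as the thread describes for blkio_cgroup */
	int nr_groups;       /* stands in for the list that writers modify */
};

/* Mimics the check in css_id(): complain when no read-side section is held. */
static int mock_css_id(const struct mock_css *css)
{
	if (mock_rcu_depth == 0) {
		mock_rcu_warnings++;
		fprintf(stderr, "mock: css id read without read-side protection\n");
	}
	return css->id;
}

/* Writer path: the controller-specific lock keeps parallel updates apart. */
static int mock_add_group(struct mock_blkcg *blkcg)
{
	int id;

	while (atomic_flag_test_and_set_explicit(&blkcg->lock, memory_order_acquire))
		; /* spin until the lock is free */
	id = mock_css_id(&blkcg->css);
	blkcg->nr_groups++;
	atomic_flag_clear_explicit(&blkcg->lock, memory_order_release);
	return id;
}

/* Caller shaped like the patched cfq_init_queue(): read lock around the add. */
static int mock_init_queue(struct mock_blkcg *blkcg)
{
	int id;

	mock_rcu_read_lock();
	id = mock_add_group(blkcg);
	mock_rcu_read_unlock();
	return id;
}
```

Calling mock_add_group() directly increments mock_rcu_warnings, which mirrors the lockdep report quoted above. Going through mock_init_queue() does not, because the lookup then runs inside the read-side section. The spinlock alone does not silence the check, which is why the thread settles on holding rcu_read_lock() in the caller rather than teaching css_id() about each controller's lock.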