Date: Wed, 28 Apr 2010 13:09:04 -0700
From: "Paul E. McKenney"
To: Eric Dumazet
Cc: Miles Lane, Vivek Goyal, Eric Paris, Lai Jiangshan, Ingo Molnar,
    Peter Zijlstra, LKML, nauman@google.com, netdev@vger.kernel.org,
    Jens Axboe, Gui Jianfeng, Li Zefan, Johannes Berg, shemminger@vyatta.com
Subject: Re: 2.6.34-rc5-git7 (plus all patches) -- another suspicious rcu_dereference_check() usage.
Message-ID: <20100428200904.GS2540@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20100428175426.GK2540@linux.vnet.ibm.com> <1272483491.2201.9.camel@edumazet-laptop>
In-Reply-To: <1272483491.2201.9.camel@edumazet-laptop>

On Wed, Apr 28, 2010 at 09:38:11PM +0200, Eric Dumazet wrote:
> On Wednesday, 28 April 2010 at 10:54 -0700, Paul E. McKenney wrote:
> > On Mon, Apr 26, 2010 at 08:51:06PM -0400, Miles Lane wrote:
> > > This one occurred during the wakeup from suspend to RAM.
> > >
> > > [ 984.724697] [ INFO: suspicious rcu_dereference_check() usage. ]
> > > [ 984.724700] ---------------------------------------------------
> > > [ 984.724703] include/linux/fdtable.h:88 invoked rcu_dereference_check() without protection!
> > > [ 984.724706]
> > > [ 984.724707] other info that might help us debug this:
> > > [ 984.724708]
> > > [ 984.724711]
> > > [ 984.724711] rcu_scheduler_active = 1, debug_locks = 1
> > > [ 984.724714] no locks held by dbus-daemon/4680.
> > > [ 984.724717]
> > > [ 984.724717] stack backtrace:
> > > [ 984.724721] Pid: 4680, comm: dbus-daemon Not tainted 2.6.34-rc5-git7 #33
> > > [ 984.724724] Call Trace:
> > > [ 984.724734] [] lockdep_rcu_dereference+0x9d/0xa6
> > > [ 984.724740] [] fcheck_files+0xb1/0xc9
> > > [ 984.724745] [] fget_light+0x35/0xab
> > > [ 984.724751] [] ? sock_poll_wait+0x13/0x18
> > > [ 984.724755] [] ? unix_poll+0x19/0x95
> > > [ 984.724762] [] do_sys_poll+0x1ff/0x3e5
> > > [ 984.724766] [] ? __pollwait+0x0/0xc7
> > > [ 984.724771] [] ? pollwake+0x0/0x4f
> > > [ 984.724776] [] ? pollwake+0x0/0x4f
> > > [ 984.724780] [] ? pollwake+0x0/0x4f
> > > [ 984.724784] [] ? pollwake+0x0/0x4f
> > > [ 984.724788] [] ? pollwake+0x0/0x4f
> > > [ 984.724793] [] ? pollwake+0x0/0x4f
> > > [ 984.724797] [] ? pollwake+0x0/0x4f
> > > [ 984.724802] [] ? pollwake+0x0/0x4f
> > > [ 984.724806] [] ? pollwake+0x0/0x4f
> > > [ 984.724812] [] sys_poll+0x50/0xbb
> > > [ 984.724818] [] system_call_fastpath+0x16/0x1b
> >
> > Hmmm... I am not convinced that this is a false positive. Couldn't
> > there be a multi-threaded process where one thread is invoking poll()
> > on a UNIX socket just as another thread is calling close() on it?
> >
> > The current fcheck_files() logic requires that the caller either (1) be in
> > an RCU read-side critical section, (2) hold ->file_lock, or (3) pass
> > in a files_struct with ->count equal to 1 (initialization or cleanup).
> >
> > So I don't feel comfortable just slapping an RCU read-side critical
> > section around this one, at least not unless someone who understands
> > the locking says that doing so is OK.
> >
>
> It's a single-threaded program.
>
> So fget_light() calls fcheck_files(files, fd); without rcu lock,
> but some /proc/pid/fd/... user temporarily raised files->count just
> before we perform the condition check.

So I should add a single-threaded check.

My first thought was to use current_is_single_threaded(), but the bit
about scanning the full list of processes does give me pause. However,
thread_group_empty() looks like a much lighter-weight alternative.

I believe that it is possible for a pair of single-threaded processes to
share a file descriptor, but that should not be a problem, as both of
them would need to close it for it to go away.

But what happens if someone does a clone() with CLONE_FILES, as some of
the AIO stuff seems to do? Won't that allow one of the resulting
processes to close the file for both of them, even though both are
otherwise single-threaded? And the ->count seems to be the only
distinction between these two cases.

And AIO does CLONE_VM as well as CLONE_FILES, but that seems to mean
that the check must scan the processes with current_is_single_threaded().
Besides which, a user could invoke clone() with only CLONE_FILES
specified, right?

Or am I just confused here?

							Thanx, Paul
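
To make the CLONE_FILES question concrete, here is a minimal userspace
sketch. It is not from this thread and it is not kernel code. It assumes
Linux and the raw clone system call with the flags as the first argument
and a NULL stack as the second, which holds on x86-64 and arm64 but not
on every architecture. The helper name child_close_reaches_parent() is
hypothetical. The sketch creates a child that shares only the file
descriptor table, lets the child close a descriptor, and then checks
whether the parent's copy of that descriptor is still valid.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for syscall() under strict -std= modes */
#endif
#include <errno.h>
#include <fcntl.h>
#include <linux/sched.h>	/* CLONE_FILES */
#include <signal.h>		/* SIGCHLD */
#include <sys/syscall.h>	/* SYS_clone */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 *
 * Returns 1 if a close() in the child also removed the descriptor from
 * the parent, 0 if the parent's descriptor survived, -1 on setup failure.
 */
static int child_close_reaches_parent(unsigned long clone_flags)
{
	int fds[2];
	int gone;
	long pid;

	if (pipe(fds) != 0)
		return -1;

	/*
	 * Raw clone with a NULL stack and no CLONE_VM: the child runs on a
	 * copy-on-write duplicate of our stack, much like fork().
	 * Assumption: (flags, stack, ...) argument order, as on x86-64.
	 */
	pid = syscall(SYS_clone, clone_flags | SIGCHLD, 0UL, 0UL, 0UL, 0UL);
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: close the read end, then exit without cleanup. */
		close(fds[0]);
		_exit(0);
	}

	/* Parent: wait so that the child's close() has definitely run. */
	if (waitpid((pid_t)pid, NULL, 0) < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	/* F_GETFD fails with EBADF if the descriptor is no longer open. */
	gone = (fcntl(fds[0], F_GETFD) == -1 && errno == EBADF);

	if (!gone)
		close(fds[0]);
	close(fds[1]);
	return gone;
}
```

Calling child_close_reaches_parent(CLONE_FILES) should report that the
parent's descriptor went away, while passing 0 (fork-like behavior, with
a private copy of the table) should not. Both tasks in the first case
are single-threaded in the thread-group sense, which appears to be the
situation that a thread_group_empty()-style check alone would miss.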