From: Mel Gorman <mgorman@suse.de>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Linux-MM <linux-mm@kvack.org>,
	Linux-FSDevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>,
	Russell King - ARM Linux <linux@arm.linux.org.uk>,
	Gilad Ben-Yossef <gilad@benyossef.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Miklos Szeredi <mszeredi@novell.com>, Greg KH <gregkh@suse.de>,
	Gong Chen <gong.chen@intel.com>
Subject: Re: [PATCH 1/2] fs: sysfs: Do dcache-related updates to sysfs dentries under sysfs_mutex
Date: Wed, 11 Jan 2012 18:07:23 +0000
Message-ID: <20120111180723.GF4118@suse.de>
In-Reply-To: <m1k44y5fls.fsf@fess.ebiederm.org>

On Wed, Jan 11, 2012 at 09:11:27AM -0800, Eric W. Biederman wrote:
> > In Miklos's case, the problem is with the bonding driver but during
> > CPU online or offline, a number of dentries are being created and
> > deleted and this deadlock is also being hit. Looking at sysfs, there
> > is a global sysfs_mutex that protects the sysfs directory tree from
> > concurrent reclaims. Almost all operations involving directory inodes
> > and dentries take place under the sysfs_mutex - linking, unlinking,
> > path searching lookup, renames and readdir. d_invalidate is slightly
> > different. It is mostly under the mutex but if the dentry has to be
> > removed from the dcache, the mutex is dropped.
> 
> The sysfs_mutex protects the sysfs data structures not the vfs.
> 

Ok.

> > Whereas Miklos's patch changes dcache, this patch changes sysfs to
> > consistently hold the mutex for dentry-related operations. Once
> > applied, this particular bug with CPU hotadd/hotremove no longer
> > occurs.
> 
> After taking a quick skim over the code to reacquaint myself with it,
> it appears that the usage in sysfs is idiomatic.  That is sysfs
> uses shrink_dcache_parent without a lock and in a context where
> the right race could trigger this deadlock.
> 

Yes.

> And in particular I expect you could trigger the same deadlock in
> proc, nfs, and gfs2 if you can get the timing right.
> 

Agreed. When the dcache-specific fix was being discussed on an external
bugzilla, this came up. It's probably easiest to race in sysfs because
it's possible to create/delete directories faster than is possible
for proc, nfs or gfs2.

> I don't think adding a work-around for the bug in shrink_dcache_parent
> is going to do anything except hide the bug in the VFS, and
> unnecessarily increase the sysfs_mutex hold times.
> 

Ok.

> I may be blind but I don't see a reason at this point to rush out an
> incomplete work-around for the bug in shrink_dcache_parent instead of
> actually fixing shrink_dcache_parent.
> 

Since I wrote this patch, the dcache-specific fix was finished and
merged, and I expect it will make it to stable. Assuming that happens,
this patch will no longer be required.

Thanks Eric.

-- 
Mel Gorman
SUSE Labs
