[Ocfs2-devel] [BUG] deadlock between configfs_rmdir() and sys_rename() (WAS Re: [RFC][PATCH 4/4] configfs: Make multiple default_group) destructions lockdep friendly

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Louis Rilling <Louis.Rilling@kerlabs.com>
To: Joel.Becker@oracle.com
Cc: ocfs2-devel@oss.oracle.com, linux-kernel@vger.kernel.org
Subject: [Ocfs2-devel] [BUG] deadlock between configfs_rmdir() and sys_rename() (WAS	Re: [RFC][PATCH 4/4] configfs: Make multiple default_group)	destructions lockdep friendly
Date: Wed, 11 Jun 2008 16:10:10 +0200	[thread overview]
Message-ID: <20080611141010.GP18153@localhost> (raw)
In-Reply-To: <20080610173654.GA23829@ca-server1.us.oracle.com>

On Tue, Jun 10, 2008 at 10:36:54AM -0700, Joel Becker wrote:
> On Tue, Jun 10, 2008 at 12:14:27PM +0200, Louis Rilling wrote:
> > On Mon, Jun 09, 2008 at 06:58:00PM -0700, Joel Becker wrote:
> > > 	I'm not against the latter AT ALL.  I just haven't come up with
> > > it yet - we can't remove parts of the tree, it must be all or none.
> > > Hence, we lock them all speculatively.
> > 
> > I'm slowly thinking about a solution, but I don't know the VFS enough
> > yet, especially regarding dentry invalidation and locking. Would it be possible
> > to start an rmdir() by moving the group to some unreachable place? We could
> > probably use the mutex of the configfs subsystem and enlarge its scope to
> > protect against concurrent mkdir(), check for user-created items under the
> > group (and descendent default groups) to remove, invalidate all dentries and
> > actually remove the group. Do you think that something is feasible in this way?
> 
> 	Nope, because you may have live objects below you - the rmdir
> should fail, and nothing should change.  Sure, you could put it back,
> but in the middle there is a period where another process tries to look
> at the tree and gets ENOENT.  That's not right.

Sure. I was more thinking about moving the to-be-removed group only once we ware
sure that nobody would use it anymore (thanks to the subsys mutex enlarging its
scope for instance). Anyway, see below.

> 	But blocking lookup another way might work.  If we keep that
> rename process out of looking up its targets (blocked on a lock we hold)
> it might work.
> 	Note, btw, that the create side (populate_groups) is safe,
> because we hold the creating parent's i_mutex throughout the entire
> process.

Yes, mkdir() is completely ok. In fact, I am able to formally prove its locking
correctness (from lockdep's point of view) provided that I_MUTEX_PARENT ->
I_MUTEX_CHILD is correct. I'll come back to lockdep annotations once the
rmdir() bug is fixed.

> 	Hey, can we use d_revalidate?  Here's the issue.  rename, when
> going to lookup the objects it wants to lock, is getting them out of
> cached_lookup - there dcache locking is all that protects it.  I was
> first thinking we could take the dentry locks to block this out.  But
> rather, why not fail d_revalidate and force a locked lookup?  So, when
> we go to lock one of these groups for detaching, we also set a flag on
> the configfs_dirent.  We add a configfs_d_revalidate function that
> returns based on that flag - if set, revalidation is needed.  Thus, when
> another process comes in to look at the object we've already locked, it
> blocks waiting to find it.
> 	See, in do_rename, it does do_path_lookup() before actually
> calling lock_rename().  It would block there waiting for our speculative
> removal.  We'd either fail rmdir, and those lookups would succeed, or
> we'd succeed rmdir, and the lookup fails.
> 	The only concern is, can the reverse happen?  We get past the
> lookups in do_rename(), and then another process comes into rmdir() -
> will they deadlock there?

No we cannot make d_revalidate() temporarily fail, because it will either make
do_lookup() fail (return value < 0), or will tell do_revalidate() to call
d_invalidate() (return value == 0), which will either definitely invalidate the
dentry stored in item->ci_dentry (unlikely because its use count should be > 1,
right?), or return success to do_lookup().

The good news is that we can make d_revalidate() block on whatever we use to
protect rmdir() against racing operations. I'll try to send a patch levaraging
the subsystem's mutexes later in the day.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

WARNING: multiple messages have this Message-ID (diff)

From: Louis Rilling <Louis.Rilling@kerlabs.com>
To: Joel.Becker@oracle.com
Cc: ocfs2-devel@oss.oracle.com, linux-kernel@vger.kernel.org
Subject: Re: [BUG] deadlock between configfs_rmdir() and sys_rename() (WAS Re: [RFC][PATCH 4/4] configfs: Make multiple default_group) destructions lockdep friendly
Date: Wed, 11 Jun 2008 16:10:10 +0200	[thread overview]
Message-ID: <20080611141010.GP18153@localhost> (raw)
In-Reply-To: <20080610173654.GA23829@ca-server1.us.oracle.com>

On Tue, Jun 10, 2008 at 10:36:54AM -0700, Joel Becker wrote:
> On Tue, Jun 10, 2008 at 12:14:27PM +0200, Louis Rilling wrote:
> > On Mon, Jun 09, 2008 at 06:58:00PM -0700, Joel Becker wrote:
> > > 	I'm not against the latter AT ALL.  I just haven't come up with
> > > it yet - we can't remove parts of the tree, it must be all or none.
> > > Hence, we lock them all speculatively.
> > 
> > I'm slowly thinking about a solution, but I don't know the VFS enough
> > yet, especially regarding dentry invalidation and locking. Would it be possible
> > to start an rmdir() by moving the group to some unreachable place? We could
> > probably use the mutex of the configfs subsystem and enlarge its scope to
> > protect against concurrent mkdir(), check for user-created items under the
> > group (and descendent default groups) to remove, invalidate all dentries and
> > actually remove the group. Do you think that something is feasible in this way?
> 
> 	Nope, because you may have live objects below you - the rmdir
> should fail, and nothing should change.  Sure, you could put it back,
> but in the middle there is a period where another process tries to look
> at the tree and gets ENOENT.  That's not right.

Sure. I was more thinking about moving the to-be-removed group only once we ware
sure that nobody would use it anymore (thanks to the subsys mutex enlarging its
scope for instance). Anyway, see below.

> 	But blocking lookup another way might work.  If we keep that
> rename process out of looking up its targets (blocked on a lock we hold)
> it might work.
> 	Note, btw, that the create side (populate_groups) is safe,
> because we hold the creating parent's i_mutex throughout the entire
> process.

Yes, mkdir() is completely ok. In fact, I am able to formally prove its locking
correctness (from lockdep's point of view) provided that I_MUTEX_PARENT ->
I_MUTEX_CHILD is correct. I'll come back to lockdep annotations once the
rmdir() bug is fixed.

> 	Hey, can we use d_revalidate?  Here's the issue.  rename, when
> going to lookup the objects it wants to lock, is getting them out of
> cached_lookup - there dcache locking is all that protects it.  I was
> first thinking we could take the dentry locks to block this out.  But
> rather, why not fail d_revalidate and force a locked lookup?  So, when
> we go to lock one of these groups for detaching, we also set a flag on
> the configfs_dirent.  We add a configfs_d_revalidate function that
> returns based on that flag - if set, revalidation is needed.  Thus, when
> another process comes in to look at the object we've already locked, it
> blocks waiting to find it.
> 	See, in do_rename, it does do_path_lookup() before actually
> calling lock_rename().  It would block there waiting for our speculative
> removal.  We'd either fail rmdir, and those lookups would succeed, or
> we'd succeed rmdir, and the lookup fails.
> 	The only concern is, can the reverse happen?  We get past the
> lookups in do_rename(), and then another process comes into rmdir() -
> will they deadlock there?

No we cannot make d_revalidate() temporarily fail, because it will either make
do_lookup() fail (return value < 0), or will tell do_revalidate() to call
d_invalidate() (return value == 0), which will either definitely invalidate the
dentry stored in item->ci_dentry (unlikely because its use count should be > 1,
right?), or return success to do_lookup().

The good news is that we can make d_revalidate() block on whatever we use to
protect rmdir() against racing operations. I'll try to send a patch levaraging
the subsystem's mutexes later in the day.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

next prev parent reply	other threads:[~2008-06-11 14:10 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-22 11:40 [Ocfs2-devel] [RFC][PATCH 0/4] configfs: Make nested default groups lockdep-friendly v2 Louis Rilling
2008-05-22 11:40 ` Louis Rilling
2008-05-22 11:40 ` [Ocfs2-devel] [RFC][PATCH 1/4] Prepare i_mutex lockdep subclasses for locking of variable path lengths Louis Rilling
2008-05-22 11:40   ` Louis Rilling
2008-05-22 11:40 ` [Ocfs2-devel] [RFC][PATCH 2/4] Prepare vfs_rmdir() for further nested i_mutex locking Louis Rilling
2008-05-22 11:40   ` Louis Rilling
2008-05-22 11:40 ` [Ocfs2-devel] [RFC][PATCH 3/4] configfs: Make nested default groups creations lockdep-friendly Louis Rilling
2008-05-22 11:40   ` Louis Rilling
2008-05-22 11:40 ` [Ocfs2-devel] [RFC][PATCH 4/4] configfs: Make multiple default_group destructions lockdep friendly Louis Rilling
2008-05-22 11:40   ` Louis Rilling
2008-05-23 16:44   ` [Ocfs2-devel] " Louis Rilling
2008-05-23 16:44     ` Louis Rilling
2008-06-02 23:07     ` [Ocfs2-devel] " Joel Becker
2008-06-02 23:07       ` Joel Becker
2008-06-03 16:00       ` [Ocfs2-devel] " Louis Rilling
2008-06-03 16:00         ` Louis Rilling
2008-06-06 23:01         ` [Ocfs2-devel] " Joel Becker
2008-06-06 23:01           ` Joel Becker
2008-06-09 11:03           ` [Ocfs2-devel] " Louis Rilling
2008-06-09 11:03             ` Louis Rilling
2008-06-09 12:54             ` [Ocfs2-devel] [BUG] deadlock between configfs_rmdir() and sys_rename() (WAS Re: [RFC][PATCH 4/4] configfs: Make multiple default_group) " Louis Rilling
2008-06-09 12:54               ` Louis Rilling
2008-06-10  1:58               ` [Ocfs2-devel] " Joel Becker
2008-06-10  1:58                 ` Joel Becker
2008-06-10 10:14                 ` [Ocfs2-devel] " Louis Rilling
2008-06-10 10:14                   ` Louis Rilling
2008-06-10 17:36                   ` [Ocfs2-devel] " Joel Becker
2008-06-10 17:36                     ` Joel Becker
2008-06-11 14:10                     ` Louis Rilling [this message]
2008-06-11 14:10                       ` Louis Rilling
2008-06-11 17:30                       ` [Ocfs2-devel] " Louis Rilling
2008-06-11 17:30                         ` Louis Rilling
2008-06-11 21:15                         ` [Ocfs2-devel] " Joel Becker
2008-06-11 21:15                           ` Joel Becker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080611141010.GP18153@localhost \
    --to=louis.rilling@kerlabs.com \
    --cc=Joel.Becker@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ocfs2-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.