[Ocfs2-devel] [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Joel Becker <Joel.Becker@oracle.com>
To: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: linux-kernel@vger.kernel.org, ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock
Date: Fri, 13 Jun 2008 13:17:46 -0700	[thread overview]
Message-ID: <20080613201746.GB20576@mail.oracle.com> (raw)
In-Reply-To: <20080613104513.GI30804@localhost>

Louis,
	Can I just say, you're the first person to do serious review
other than myself, and I really appreciate it :-)

On Fri, Jun 13, 2008 at 12:45:13PM +0200, Louis Rilling wrote:
> On Thu, Jun 12, 2008 at 07:41:31PM -0700, Joel Becker wrote:
> Unfortunately, thinking a bit more about it I found some issues with
> i_mutex lock free detach_prep(), but nothing that can't be fixed ;)
> 	Between detach_prep() in A and mkdir() in a default group A/B:
> detach_prep() can be called in the middle of attach_group(), for instance after
> having attached A/B/C, but attach_group() may then fail (because of memory
> pressure for instance) while attaching C's default group A/B/C/D. This would
> lead to both mkdir(A/B/C) and rmdir(A) failing, the reason for rmdir failure
> being at best obscure: the user would have expected to either see mkdir succeed
> and rmdir fail because of the new A/B/C group, or see mkdir fail and rmdir
> succeed because no user-created group lived under A. Solution: tag A/B with
> USET_IN_MKDIR on mkdir entrance, remove that tag on mkdir exit, and retry
> detach_prep() as long as USET_IN_MKDIR is found under A/*.

	I see what you are saying here.  I'm not sure if that is worth
the complexity - we can say "it was kind of there".  No one will ever
hit it :-)  But let me think about it more.

> 	Between rmdir() and readdir(): dir_open() might add a configfs_dirent
> to a default group A/B that detach_prep() already marked with USET_DROPPING.
> This could result in detach_groups() dropping the dirent and make readdir() in
> A/B crash afterwards. Solution: check USET_DROPPING in dir_open() and fail if
> it is set.

	I was trying to see why this could happen, given that we can
come to this from other places - the dir could have been open before we
set USET_DROPPING.  Oh!  We actually fail rmdir with ENOTEMPTY when the
dir is open?  That's wrong.  Ignore it though - we'll fix it later.
	But back to your concern.  configfs_readdir() can't crash for
two reasons.  First, detach_groups() won't remove this dirent.  A
readdir placeholder has s_element==NULL.  Note the check in
detach_groups():

	if (!sd->s_element ||
	    !(sd->s_type & CONFIGFS_USET_DEFAULT))
		continue;

It skips our readdir placeholder, allowing us to free it in dir_close().
	There's another reason this can't be a problem.  If we get into
detach_groups(), we take i_mutex, locking out readdir().  Then we delete
the directory, setting S_DEAD.  In vfs_readdir(), they check
IS_DEADDIR() after getting i_mutex.  So they will see S_DEAD and not
call our ->readdir().  S_DEAD is important.  Someone could actually have
our default_group as their cwd.  S_DEAD prevents them from doing
anything :-)

> 	Between rmdir() and lookup(): several lookup() called under A/* while
> rmdir(A) in the middle of detach_groups() could return inconsistent results (for
> instance some default groups being there and some other ones not). Solution:
> lock dirent_lock for the whole lookup() duration, check USET_DROPPING of current
> dir, and fail with ENOENT if it is set.

	Nah, we don't care about the spurious lookups.  This is a normal
race of i_mutex.  USET_DROPPING is not a way to prevent VFS views from
changing - it's only a way to prevent new children.
	Remember, ->lookup() comes with i_mutex locking.  We hold
i_mutex during the entire delete, so they can't call ->lookup() until
we're done with a directory.  Conversely, if they win i_mutex and ->lookup()
a default group, then try to use it after we've removed it, they'll just
ENOENT.  This is evident back in do_rename().  They call lookup, which
takes and drops locks, then call lock_rename() to get the locks back.
And they can handle ENOENT at that point.

> I was speaking as if we replaced i_mutex protection with dirent_lock
> protection for a whole mkdir(), that is taking the lock before attach_* and
> releasing it after.

	Ok.  I think that's not the way to go, what you currently have
is better.

> > 	I'm not even sure what you said here :-)
> 
> I was just saying that with i_mutex lock free detach_prep(), we have kind of
> optimistic mkdir(), with conflicts resolved as error cases of attach_*.

	Basically, the concerns you had above.

> The intermediate conditions that really matter are:
> 1/ the existence of partial default groups trees (I mean configfs_dirent trees)
>    in the middle of attach_group() and detach_group(),

	This is your first case, the mkdir ENOMEM vs rmdir ENOTEMPTY.

> 2/ the existence of default group trees that are tagged as USET_DROPPING and
>    should be treated as not existing anymore.

	This is not an issue.  USET_DROPPING does *not* mean it went
away.  It means we're safe to make it go away.  We protect the actual
going-away with i_mutex.  And that's normal VFS behavior.

Joel

-- 

"I don't know anything about music. In my line you don't have
 to."
        - Elvis Presley

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

WARNING: multiple messages have this Message-ID (diff)

From: Joel Becker <Joel.Becker@oracle.com>
To: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: linux-kernel@vger.kernel.org, ocfs2-devel@oss.oracle.com
Subject: Re: [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock
Date: Fri, 13 Jun 2008 13:17:46 -0700	[thread overview]
Message-ID: <20080613201746.GB20576@mail.oracle.com> (raw)
In-Reply-To: <20080613104513.GI30804@localhost>

Louis,
	Can I just say, you're the first person to do serious review
other than myself, and I really appreciate it :-)

On Fri, Jun 13, 2008 at 12:45:13PM +0200, Louis Rilling wrote:
> On Thu, Jun 12, 2008 at 07:41:31PM -0700, Joel Becker wrote:
> Unfortunately, thinking a bit more about it I found some issues with
> i_mutex lock free detach_prep(), but nothing that can't be fixed ;)
> 	Between detach_prep() in A and mkdir() in a default group A/B:
> detach_prep() can be called in the middle of attach_group(), for instance after
> having attached A/B/C, but attach_group() may then fail (because of memory
> pressure for instance) while attaching C's default group A/B/C/D. This would
> lead to both mkdir(A/B/C) and rmdir(A) failing, the reason for rmdir failure
> being at best obscure: the user would have expected to either see mkdir succeed
> and rmdir fail because of the new A/B/C group, or see mkdir fail and rmdir
> succeed because no user-created group lived under A. Solution: tag A/B with
> USET_IN_MKDIR on mkdir entrance, remove that tag on mkdir exit, and retry
> detach_prep() as long as USET_IN_MKDIR is found under A/*.

	I see what you are saying here.  I'm not sure if that is worth
the complexity - we can say "it was kind of there".  No one will ever
hit it :-)  But let me think about it more.

> 	Between rmdir() and readdir(): dir_open() might add a configfs_dirent
> to a default group A/B that detach_prep() already marked with USET_DROPPING.
> This could result in detach_groups() dropping the dirent and make readdir() in
> A/B crash afterwards. Solution: check USET_DROPPING in dir_open() and fail if
> it is set.

	I was trying to see why this could happen, given that we can
come to this from other places - the dir could have been open before we
set USET_DROPPING.  Oh!  We actually fail rmdir with ENOTEMPTY when the
dir is open?  That's wrong.  Ignore it though - we'll fix it later.
	But back to your concern.  configfs_readdir() can't crash for
two reasons.  First, detach_groups() won't remove this dirent.  A
readdir placeholder has s_element==NULL.  Note the check in
detach_groups():

	if (!sd->s_element ||
	    !(sd->s_type & CONFIGFS_USET_DEFAULT))
		continue;

It skips our readdir placeholder, allowing us to free it in dir_close().
	There's another reason this can't be a problem.  If we get into
detach_groups(), we take i_mutex, locking out readdir().  Then we delete
the directory, setting S_DEAD.  In vfs_readdir(), they check
IS_DEADDIR() after getting i_mutex.  So they will see S_DEAD and not
call our ->readdir().  S_DEAD is important.  Someone could actually have
our default_group as their cwd.  S_DEAD prevents them from doing
anything :-)

> 	Between rmdir() and lookup(): several lookup() called under A/* while
> rmdir(A) in the middle of detach_groups() could return inconsistent results (for
> instance some default groups being there and some other ones not). Solution:
> lock dirent_lock for the whole lookup() duration, check USET_DROPPING of current
> dir, and fail with ENOENT if it is set.

	Nah, we don't care about the spurious lookups.  This is a normal
race of i_mutex.  USET_DROPPING is not a way to prevent VFS views from
changing - it's only a way to prevent new children.
	Remember, ->lookup() comes with i_mutex locking.  We hold
i_mutex during the entire delete, so they can't call ->lookup() until
we're done with a directory.  Conversely, if they win i_mutex and ->lookup()
a default group, then try to use it after we've removed it, they'll just
ENOENT.  This is evident back in do_rename().  They call lookup, which
takes and drops locks, then call lock_rename() to get the locks back.
And they can handle ENOENT at that point.

> I was speaking as if we replaced i_mutex protection with dirent_lock
> protection for a whole mkdir(), that is taking the lock before attach_* and
> releasing it after.

	Ok.  I think that's not the way to go, what you currently have
is better.

> > 	I'm not even sure what you said here :-)
> 
> I was just saying that with i_mutex lock free detach_prep(), we have kind of
> optimistic mkdir(), with conflicts resolved as error cases of attach_*.

	Basically, the concerns you had above.

> The intermediate conditions that really matter are:
> 1/ the existence of partial default groups trees (I mean configfs_dirent trees)
>    in the middle of attach_group() and detach_group(),

	This is your first case, the mkdir ENOMEM vs rmdir ENOTEMPTY.

> 2/ the existence of default group trees that are tagged as USET_DROPPING and
>    should be treated as not existing anymore.

	This is not an issue.  USET_DROPPING does *not* mean it went
away.  It means we're safe to make it go away.  We protect the actual
going-away with i_mutex.  And that's normal VFS behavior.

Joel

-- 

"I don't know anything about music. In my line you don't have
 to."
        - Elvis Presley

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

next prev parent reply	other threads:[~2008-06-13 20:17 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-12 13:31 [Ocfs2-devel] [PATCH 0/3][BUGFIX] configfs: Fix deadlock rmdir() vs rename() Louis Rilling
2008-06-12 13:31 ` Louis Rilling
2008-06-12 13:31 ` [Ocfs2-devel] [PATCH 1/3][BUGFIX] configfs: Introduce configfs_dirent_lock Louis Rilling
2008-06-12 13:31   ` Louis Rilling
2008-06-12 19:13   ` [Ocfs2-devel] " Joel Becker
2008-06-12 19:13     ` Joel Becker
2008-06-12 22:25     ` [Ocfs2-devel] " Louis Rilling
2008-06-12 22:25       ` Louis Rilling
2008-06-13  2:41       ` [Ocfs2-devel] " Joel Becker
2008-06-13  2:41         ` Joel Becker
2008-06-13 10:45         ` [Ocfs2-devel] " Louis Rilling
2008-06-13 10:45           ` Louis Rilling
2008-06-13 12:09           ` [Ocfs2-devel] " Louis Rilling
2008-06-13 12:09             ` Louis Rilling
2008-06-13 20:19             ` [Ocfs2-devel] " Joel Becker
2008-06-13 20:19               ` Joel Becker
2008-06-13 20:17           ` Joel Becker [this message]
2008-06-13 20:17             ` Joel Becker
2008-06-13 21:54             ` [Ocfs2-devel] " Louis Rilling
2008-06-13 21:54               ` Louis Rilling
2008-06-13 22:34               ` [Ocfs2-devel] " Joel Becker
2008-06-13 22:34                 ` Joel Becker
2008-06-16 11:30                 ` [Ocfs2-devel] " Louis Rilling
2008-06-16 11:30                   ` Louis Rilling
2008-06-12 13:31 ` [Ocfs2-devel] [PATCH 2/3][BUGFIX] configfs: Make configfs_new_dirent() return error code instead of NULL Louis Rilling
2008-06-12 13:31   ` Louis Rilling
2008-06-12 13:31 ` [Ocfs2-devel] [PATCH 3/3][BUGFIX] configfs: Fix deadlock with racing rmdir() and rename() Louis Rilling
2008-06-12 13:31   ` Louis Rilling

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080613201746.GB20576@mail.oracle.com \
    --to=joel.becker@oracle.com \
    --cc=Louis.Rilling@kerlabs.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ocfs2-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.