All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gabriel Krisman Bertazi <krisman@suse.de>
To: Eric Biggers <ebiggers@kernel.org>
Cc: brauner@kernel.org, tytso@mit.edu,
	linux-f2fs-devel@lists.sourceforge.net, viro@zeniv.linux.org.uk,
	linux-fsdevel@vger.kernel.org, jaegeuk@kernel.org,
	linux-ext4@vger.kernel.org,
	Gabriel Krisman Bertazi <krisman@collabora.com>
Subject: Re: [PATCH v3 3/7] libfs: Validate negative dentries in case-insensitive directories
Date: Mon, 24 Jul 2023 17:33:43 -0400	[thread overview]
Message-ID: <87zg3l2dvs.fsf@suse.de> (raw)
In-Reply-To: <20230722042939.GC5660@sol.localdomain> (Eric Biggers's message of "Fri, 21 Jul 2023 21:29:39 -0700")

Eric Biggers <ebiggers@kernel.org> writes:

> On Fri, Jul 21, 2023 at 04:16:30PM -0400, Gabriel Krisman Bertazi wrote:
>> Eric Biggers <ebiggers@kernel.org> writes:
>> 
>> > On Wed, Jul 19, 2023 at 11:06:57PM -0700, Eric Biggers wrote:
>> >> 
>> >> I'm also having trouble understanding exactly when ->d_name is stable here.
>> >> AFAICS, unfortunately the VFS has an edge case where a dentry can be moved
>> >> without its parent's ->i_rwsem being held.  It happens when a subdirectory is
>> >> "found" under multiple names.  The VFS doesn't support directory hard links, so
>> >> if it finds a second link to a directory, it just moves the whole dentry tree to
>> >> the new location.  This can happen if a filesystem image is corrupted and
>> >> contains directory hard links.  Coincidentally, it can also happen in an
>> >> encrypted directory due to the no-key name => normal name transition...
>> >
>> > Sorry, I think I got this slightly wrong.  The move does happen with the
>> > parent's ->i_rwsem held, but it's for read, not for write.  First, before
>> > ->lookup is called, the ->i_rwsem of the parent directory is taken for read.
>> > ->lookup() calls d_splice_alias() which can call __d_unalias() which does the
>> > __d_move().  If the old alias is in a different directory (which cannot happen
>> > in that fscrypt case, but can happen in the general "directory hard links"
>> > case), __d_unalias() takes that directory's ->i_rwsem for read too.
>> >
>> > So it looks like the parent's ->i_rwsem does indeed exclude moves of child
>> > dentries, but only if it's taken for *write*.  So I guess you can rely on that;
>> > it's just a bit more subtle than it first appears.  Though, some of your
>> > explanation seems to assume that a read lock is sufficient ("In __lookup_slow,
>> > either the parent inode is locked by the caller (lookup_slow) ..."), so maybe
>> > there is still a problem.
>> 
>> I think I'm missing something on your clarification. I see your point
>> about __d_unalias, and I see in the case where alias->d_parent !=
>> dentry->d_parent we acquire the parent inode read lock:
>> 
>> static int __d_unalias(struct inode *inode,
>> 		struct dentry *dentry, struct dentry *alias)
>> {
>> ...
>> 	m1 = &dentry->d_sb->s_vfs_rename_mutex;
>> 	if (!inode_trylock_shared(alias->d_parent->d_inode))
>> 		goto out_err;
>> }

>> this __d_move Can do a dentry move and race with d_revalidate even
>> though it has the parent read lock.
>> 
>> > So it looks like the parent's ->i_rwsem does indeed exclude moves of child
>> > dentries, but only if it's taken for *write*.  So I guess you can rely on that;
>> 
>> We can get away of it with acquiring the d_lock as you suggested, I
>> think.  But can you clarify the above? I wanna make sure I didn't miss
>> anything. I am indeed relying only on the read lock here, as you can see.
>
> In my first email I thought that __d_move() can be called without the parent
> inode's i_rwsem held at all.  In my second email I realized that it is always
> called with at least a read (shared) lock.

I see. Thank you.  We are on the same page now.   I was confused by
this part of your second comment:

>> > I guess you can rely on that; it's just a bit more subtle than it
>> > first appears.  Though, some of your explanation seems to assume
>> > that a read lock is sufficient ("In __lookup_slow, either the
>> > parent inode is locked by the caller (lookup_slow) ..."),

...because I was then failing to see, after learning about the __d_move
case, how I could rely on the inode read lock.  But, as I now realize,
__d_move is not called for negative dentries, so lookup_slow is indeed
safe.

> The question is what kind of parent i_rwsem lock is guaranteed to be held by the
> caller of ->d_revalidate() when the name comparison is done.  Based on the
> above, it needs to be at least a write (exclusive) lock to exclude __d_move()
> without taking d_lock.  However, your analysis (in the commit message of "libfs:
> Validate negative dentries in case-insensitive directories") only talks about
> i_rwsem being "taken", without saying whether it's for read or write.  One thing
> you mentioned as taking i_rwsem is lookup_slow, but that only takes it for read.
> That seems like a problem, as it makes your analysis not correct.

My understanding and explanation was that a read lock should be enough
at all times, despite the __d_move case.  Any time d_revalidate is
called for a (LOOKUP_CREATE | LOOKUP_RENAME_TARGET), it holds at least
the read lock, preventing concurrent changes to d_name of negative
dentries.

I will review the places that touch ->d_name again and I will keep the
patch as-is and update my explanation to include this case.

-- 
Gabriel Krisman Bertazi

WARNING: multiple messages have this Message-ID (diff)
From: Gabriel Krisman Bertazi <krisman@suse.de>
To: Eric Biggers <ebiggers@kernel.org>
Cc: brauner@kernel.org, tytso@mit.edu,
	linux-f2fs-devel@lists.sourceforge.net, viro@zeniv.linux.org.uk,
	linux-fsdevel@vger.kernel.org, jaegeuk@kernel.org,
	linux-ext4@vger.kernel.org,
	Gabriel Krisman Bertazi <krisman@collabora.com>
Subject: Re: [f2fs-dev] [PATCH v3 3/7] libfs: Validate negative dentries in case-insensitive directories
Date: Mon, 24 Jul 2023 17:33:43 -0400	[thread overview]
Message-ID: <87zg3l2dvs.fsf@suse.de> (raw)
In-Reply-To: <20230722042939.GC5660@sol.localdomain> (Eric Biggers's message of "Fri, 21 Jul 2023 21:29:39 -0700")

Eric Biggers <ebiggers@kernel.org> writes:

> On Fri, Jul 21, 2023 at 04:16:30PM -0400, Gabriel Krisman Bertazi wrote:
>> Eric Biggers <ebiggers@kernel.org> writes:
>> 
>> > On Wed, Jul 19, 2023 at 11:06:57PM -0700, Eric Biggers wrote:
>> >> 
>> >> I'm also having trouble understanding exactly when ->d_name is stable here.
>> >> AFAICS, unfortunately the VFS has an edge case where a dentry can be moved
>> >> without its parent's ->i_rwsem being held.  It happens when a subdirectory is
>> >> "found" under multiple names.  The VFS doesn't support directory hard links, so
>> >> if it finds a second link to a directory, it just moves the whole dentry tree to
>> >> the new location.  This can happen if a filesystem image is corrupted and
>> >> contains directory hard links.  Coincidentally, it can also happen in an
>> >> encrypted directory due to the no-key name => normal name transition...
>> >
>> > Sorry, I think I got this slightly wrong.  The move does happen with the
>> > parent's ->i_rwsem held, but it's for read, not for write.  First, before
>> > ->lookup is called, the ->i_rwsem of the parent directory is taken for read.
>> > ->lookup() calls d_splice_alias() which can call __d_unalias() which does the
>> > __d_move().  If the old alias is in a different directory (which cannot happen
>> > in that fscrypt case, but can happen in the general "directory hard links"
>> > case), __d_unalias() takes that directory's ->i_rwsem for read too.
>> >
>> > So it looks like the parent's ->i_rwsem does indeed exclude moves of child
>> > dentries, but only if it's taken for *write*.  So I guess you can rely on that;
>> > it's just a bit more subtle than it first appears.  Though, some of your
>> > explanation seems to assume that a read lock is sufficient ("In __lookup_slow,
>> > either the parent inode is locked by the caller (lookup_slow) ..."), so maybe
>> > there is still a problem.
>> 
>> I think I'm missing something on your clarification. I see your point
>> about __d_unalias, and I see in the case where alias->d_parent !=
>> dentry->d_parent we acquire the parent inode read lock:
>> 
>> static int __d_unalias(struct inode *inode,
>> 		struct dentry *dentry, struct dentry *alias)
>> {
>> ...
>> 	m1 = &dentry->d_sb->s_vfs_rename_mutex;
>> 	if (!inode_trylock_shared(alias->d_parent->d_inode))
>> 		goto out_err;
>> }

>> this __d_move Can do a dentry move and race with d_revalidate even
>> though it has the parent read lock.
>> 
>> > So it looks like the parent's ->i_rwsem does indeed exclude moves of child
>> > dentries, but only if it's taken for *write*.  So I guess you can rely on that;
>> 
>> We can get away of it with acquiring the d_lock as you suggested, I
>> think.  But can you clarify the above? I wanna make sure I didn't miss
>> anything. I am indeed relying only on the read lock here, as you can see.
>
> In my first email I thought that __d_move() can be called without the parent
> inode's i_rwsem held at all.  In my second email I realized that it is always
> called with at least a read (shared) lock.

I see. Thank you.  We are on the same page now.   I was confused by
this part of your second comment:

>> > I guess you can rely on that; it's just a bit more subtle than it
>> > first appears.  Though, some of your explanation seems to assume
>> > that a read lock is sufficient ("In __lookup_slow, either the
>> > parent inode is locked by the caller (lookup_slow) ..."),

...because I was then failing to see, after learning about the __d_move
case, how I could rely on the inode read lock.  But, as I now realize,
__d_move is not called for negative dentries, so lookup_slow is indeed
safe.

> The question is what kind of parent i_rwsem lock is guaranteed to be held by the
> caller of ->d_revalidate() when the name comparison is done.  Based on the
> above, it needs to be at least a write (exclusive) lock to exclude __d_move()
> without taking d_lock.  However, your analysis (in the commit message of "libfs:
> Validate negative dentries in case-insensitive directories") only talks about
> i_rwsem being "taken", without saying whether it's for read or write.  One thing
> you mentioned as taking i_rwsem is lookup_slow, but that only takes it for read.
> That seems like a problem, as it makes your analysis not correct.

My understanding and explanation was that a read lock should be enough
at all times, despite the __d_move case.  Any time d_revalidate is
called for a (LOOKUP_CREATE | LOOKUP_RENAME_TARGET), it holds at least
the read lock, preventing concurrent changes to d_name of negative
dentries.

I will review the places that touch ->d_name again and I will keep the
patch as-is and update my explanation to include this case.

-- 
Gabriel Krisman Bertazi


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

  reply	other threads:[~2023-07-24 21:33 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-19 22:19 [PATCH v3 0/7] Support negative dentries on case-insensitive ext4 and f2fs Gabriel Krisman Bertazi
2023-07-19 22:19 ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-19 22:19 ` [PATCH v3 1/7] fs: Expose name under lookup to d_revalidate hook Gabriel Krisman Bertazi
2023-07-19 22:19   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-19 22:19 ` [PATCH v3 2/7] fs: Add DCACHE_CASEFOLDED_NAME flag Gabriel Krisman Bertazi
2023-07-19 22:19   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-19 22:19 ` [PATCH v3 3/7] libfs: Validate negative dentries in case-insensitive directories Gabriel Krisman Bertazi
2023-07-19 22:19   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-20  6:06   ` Eric Biggers
2023-07-20  6:06     ` [f2fs-dev] " Eric Biggers
2023-07-20  6:41     ` Eric Biggers
2023-07-20  6:41       ` [f2fs-dev] " Eric Biggers
2023-07-21 20:16       ` Gabriel Krisman Bertazi
2023-07-21 20:16         ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-22  4:29         ` Eric Biggers
2023-07-22  4:29           ` [f2fs-dev] " Eric Biggers
2023-07-24 21:33           ` Gabriel Krisman Bertazi [this message]
2023-07-24 21:33             ` Gabriel Krisman Bertazi
2023-07-19 22:19 ` [PATCH v3 4/7] libfs: Chain encryption checks after case-insensitive revalidation Gabriel Krisman Bertazi
2023-07-19 22:19   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-19 22:19 ` [PATCH v3 5/7] libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops Gabriel Krisman Bertazi
2023-07-19 22:19   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-19 22:19 ` [PATCH v3 6/7] ext4: Enable negative dentries on case-insensitive lookup Gabriel Krisman Bertazi
2023-07-19 22:19   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-19 22:19 ` [PATCH v3 7/7] f2fs: " Gabriel Krisman Bertazi
2023-07-19 22:19   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-20  7:43 ` [PATCH v3 0/7] Support negative dentries on case-insensitive ext4 and f2fs Eric Biggers
2023-07-20  7:43   ` [f2fs-dev] " Eric Biggers
2023-07-20 17:35   ` Gabriel Krisman Bertazi
2023-07-20 17:35     ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-07-21  3:12     ` Eric Biggers
2023-07-21  3:12       ` [f2fs-dev] " Eric Biggers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zg3l2dvs.fsf@suse.de \
    --to=krisman@suse.de \
    --cc=brauner@kernel.org \
    --cc=ebiggers@kernel.org \
    --cc=jaegeuk@kernel.org \
    --cc=krisman@collabora.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.