Re: [PATCH v16 3/9] libfs: Introduce case-insensitive string comparison helper

From: Gabriel Krisman Bertazi <krisman@suse.de>
To: Eugen Hristev <eugen.hristev@collabora.com>
Cc: Eric Biggers <ebiggers@kernel.org>,
	 tytso@mit.edu, adilger.kernel@dilger.ca,
	 linux-ext4@vger.kernel.org, jaegeuk@kernel.org,
	 chao@kernel.org, linux-f2fs-devel@lists.sourceforge.net,
	 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	 kernel@collabora.com, viro@zeniv.linux.org.uk,
	 brauner@kernel.org,  jack@suse.cz,
	 Gabriel Krisman Bertazi <krisman@collabora.com>
Subject: Re: [PATCH v16 3/9] libfs: Introduce case-insensitive string comparison helper
Date: Wed, 22 May 2024 19:05:48 -0400
Message-ID: <87ttipqwfn.fsf@mailhost.krisman.be>
In-Reply-To: <9afebadd-765f-42f3-a80b-366dd749bf48@collabora.com> (Eugen Hristev's message of "Wed, 22 May 2024 17:02:53 +0300")

Eugen Hristev <eugen.hristev@collabora.com> writes:

> On 5/13/24 00:27, Gabriel Krisman Bertazi wrote:
>> Eric Biggers <ebiggers@kernel.org> writes:
>> 
>>> On Fri, Apr 05, 2024 at 03:13:26PM +0300, Eugen Hristev wrote:
>> 
>>>> +		if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
>>>> +			return -EINVAL;
>>>> +
>>>> +		decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
>>>> +		if (!decrypted_name.name)
>>>> +			return -ENOMEM;
>>>> +		res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
>>>> +						&decrypted_name);
>>>> +		if (res < 0)
>>>> +			goto out;
>>>
>>> If fscrypt_fname_disk_to_usr() returns an error and !sb_has_strict_encoding(sb),
>>> then this function returns 0 (indicating no match) instead of the error code
>>> (indicating an error).  Is that the correct behavior?  I would think that
>>> strict_encoding should only have an effect on the actual name
>>> comparison.
>> 
>> No. we *want* this return code to be propagated back to f2fs.  In ext4 it
>> wouldn't matter since the error is not visible outside of ext4_match,
>> but f2fs does the right thing and stops the lookup.
>
> In the previous version which I sent, you told me that the error should be
> propagated only in strict_mode, and if !strict_mode, it should just return no match.
> Originally I did not understand that this should be done only for utf8_strncasecmp
> errors, and not for all the errors. I will change it here to fix that.

Yes, it depends on which error we are talking about. For ENOMEM and
whatever error fscrypt_fname_disk_to_usr returns, we surely want to send
that back, so that f2fs can handle it (i.e., abort the lookup).  Unicode
casefolding errors don't need to stop the lookup.


>> Thinking about it, there is a second problem with this series.
>> Currently, if we are on strict_mode, f2fs_match_ci_name does not
>> propagate unicode errors back to f2fs. So, once a utf8 invalid sequence
>> is found during lookup, it will be considered not-a-match but the lookup
>> will continue.  This allows some lookups to succeed even in a corrupted
>> directory.  With this patch, we will abort the lookup on the first
>> error, breaking existing semantics.  Note that these are different from
>> memory allocation failure and fscrypt_fname_disk_to_usr. For those, it
>> makes sense to abort.
>
> So , in the case of f2fs , we must not propagate utf8 errors ? It should just
> return no match even in strict mode ?
> If this helper is common for both f2fs and ext4, we have to do the same for ext4 ?
> Or we are no longer able to commonize the code altogether ?

We can have a common handler.  It doesn't matter for ext4 because it
ignores all errors. Perhaps ext4 can be improved too in a different
patchset.

>> My suggestion would be to keep the current behavior.  Make
>> generic_ci_match only propagate non-unicode related errors back to the
>> filesystem.  This means that we need to move the error messages in patch
>> 6 and 7 into this function, so they only trigger when utf8_strncasecmp*
>> itself fails.
>> 
>
> So basically unicode errors stop here, and print the error message here in that case.
> Am I understanding it correctly ?

Yes, that is it.  Print the error message, only in strict mode, and
return not-a-match.

Is there any problem with this approach that I'm missing?
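
To make the proposed policy concrete, here is a rough user-space sketch,
not the actual patch. The function name ci_match_policy and its
parameters are hypothetical. It assumes the helper returns a negative
errno for hard errors, 1 for a match and 0 for no match, and that the
casefolded comparison reports 0 for equal names and a negative value for
an invalid UTF-8 sequence.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model of the return policy discussed above.
 *
 * decrypt_err: 0, or a negative errno from allocation/decryption
 * casecmp_res: 0 if names are equal, >0 if they differ,
 *              <0 if an invalid UTF-8 sequence was found (assumed convention)
 * strict:      whether the filesystem uses strict encoding
 *
 * Returns <0 for errors the filesystem must see, 1 for a match,
 * 0 for not-a-match.
 */
static int ci_match_policy(int decrypt_err, int casecmp_res, bool strict)
{
	assert(decrypt_err <= 0);

	/* Allocation and decryption failures always propagate. */
	if (decrypt_err < 0)
		return decrypt_err;

	if (casecmp_res < 0) {
		/* Unicode error: warn only in strict mode, never abort. */
		if (strict)
			fprintf(stderr, "invalid UTF-8 sequence in directory entry\n");
		return 0;
	}

	return casecmp_res == 0;
}
```

With this shape, a caller such as f2fs would abort the lookup only on a
negative return and would keep scanning the directory after a Unicode
error.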

>>>> +	/*
>>>> +	 * Attempt a case-sensitive match first. It is cheaper and
>>>> +	 * should cover most lookups, including all the sane
>>>> +	 * applications that expect a case-sensitive filesystem.
>>>> +	 */
>>>> +	if (folded_name->name) {
>>>> +		if (dirent.len == folded_name->len &&
>>>> +		    !memcmp(folded_name->name, dirent.name, dirent.len))
>>>> +			goto out;
>>>> +		res = utf8_strncasecmp_folded(um, folded_name, &dirent);
>>>
>>> Shouldn't the memcmp be done with the original user-specified name, not the
>>> casefolded name?  I would think that the user-specified name is the one that's
>>> more likely to match the on-disk name, because of case preservation.  In most
>>> cases users will specify the same case on both file creation and later access.
>> 
>> Yes.
>> 
> so the utf8_strncasecmp_folded call here must use name->name instead of folded_name ?

No, utf8_strncasecmp_folded requires a casefolded name.  Eric's point is
that the *memcmp* should always compare against name->name since it's more
likely to match the name on disk than the folded version because the user
is probably doing a case-exact lookup.

This also means the memcmp can be moved outside the "if (folded_name->name)",
simplifying the patch!
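
The ordering described above could look roughly like the following
user-space sketch. It is an illustration only: struct name_model,
folded_equal and ci_name_match are hypothetical names, and the
ASCII-only folded_equal merely stands in for the real casefolded
comparison, which operates on a pre-folded name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a length-counted name. */
struct name_model {
	const char *name;
	size_t len;
};

/* ASCII-only placeholder for the casefolded comparison. */
static bool folded_equal(const struct name_model *a,
			 const struct name_model *b)
{
	if (a->len != b->len)
		return false;
	for (size_t i = 0; i < a->len; i++) {
		if (tolower((unsigned char)a->name[i]) !=
		    tolower((unsigned char)b->name[i]))
			return false;
	}
	return true;
}

static bool ci_name_match(const struct name_model *name,
			  const struct name_model *dirent)
{
	assert(name && dirent);

	/*
	 * Fast path: byte-exact compare against the name as the user
	 * typed it. It no longer depends on a folded name being present.
	 */
	if (dirent->len == name->len &&
	    !memcmp(name->name, dirent->name, dirent->len))
		return true;

	/* Slow path: case-insensitive comparison. */
	return folded_equal(name, dirent);
}
```

A lookup of "README" against an on-disk "README" returns from the memcmp
branch; a lookup against "readme" falls through to the slow path.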

-- 
Gabriel Krisman Bertazi
