All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gabriel Krisman Bertazi <krisman@suse.de>
To: Eugen Hristev <eugen.hristev@collabora.com>
Cc: tytso@mit.edu,  adilger.kernel@dilger.ca,  jaegeuk@kernel.org,
	chao@kernel.org,  viro@zeniv.linux.org.uk,  brauner@kernel.org,
	linux-ext4@vger.kernel.org,
	 linux-f2fs-devel@lists.sourceforge.net, jack@suse.cz,
	 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	 kernel@collabora.com,
	 Gabriel Krisman Bertazi <krisman@collabora.com>,
	 Eric Biggers <ebiggers@google.com>
Subject: Re: [RESEND PATCH v9 1/3] libfs: Introduce case-insensitive string comparison helper
Date: Thu, 08 Feb 2024 13:38:33 -0500	[thread overview]
Message-ID: <87ttmivm1i.fsf@mailhost.krisman.be> (raw)
In-Reply-To: <20240208064334.268216-2-eugen.hristev@collabora.com> (Eugen Hristev's message of "Thu, 8 Feb 2024 08:43:32 +0200")

Eugen Hristev <eugen.hristev@collabora.com> writes:

> From: Gabriel Krisman Bertazi <krisman@collabora.com>
>
> generic_ci_match can be used by case-insensitive filesystems to compare
> strings under lookup with dirents in a case-insensitive way.  This
> function is currently reimplemented by each filesystem supporting
> casefolding, so this reduces code duplication in filesystem-specific
> code.
>
> Reviewed-by: Eric Biggers <ebiggers@google.com>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com>

Hi Eugen,

Thanks for picking this up.  Please,  CC me in future versions.

> ---
>  fs/libfs.c         | 68 ++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h |  4 +++
>  2 files changed, 72 insertions(+)
>
> diff --git a/fs/libfs.c b/fs/libfs.c
> index bb18884ff20e..f80cb982ac89 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -1773,6 +1773,74 @@ static const struct dentry_operations generic_ci_dentry_ops = {
>  	.d_hash = generic_ci_d_hash,
>  	.d_compare = generic_ci_d_compare,
>  };
> +
> +/**
> + * generic_ci_match() - Match a name (case-insensitively) with a dirent.
> + * @parent: Inode of the parent of the dirent under comparison
> + * @name: name under lookup.
> + * @folded_name: Optional pre-folded name under lookup
> + * @de_name: Dirent name.
> + * @de_name_len: dirent name length.
> + *
> + *
> + * Test whether a case-insensitive directory entry matches the filename
> + * being searched.  If @folded_name is provided, it is used instead of
> + * recalculating the casefold of @name.

Can we add a note that this is a filesystem helper for comparison with
directory entries, and VFS' ->d_compare should use generic_ci_d_compare.

> + *
> + * Return: > 0 if the directory entry matches, 0 if it doesn't match, or
> + * < 0 on error.
> + */
> +int generic_ci_match(const struct inode *parent,
> +		     const struct qstr *name,
> +		     const struct qstr *folded_name,
> +		     const u8 *de_name, u32 de_name_len)
> +{
> +	const struct super_block *sb = parent->i_sb;
> +	const struct unicode_map *um = sb->s_encoding;
> +	struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
> +	struct qstr dirent = QSTR_INIT(de_name, de_name_len);
> +	int res, match = false;

I know I originally wrote it this way, but match is an integer, so
let's use integers instead of false/true.

> +
> +	if (IS_ENCRYPTED(parent)) {
> +		const struct fscrypt_str encrypted_name =
> +			FSTR_INIT((u8 *) de_name, de_name_len);
> +
> +		if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
> +			return -EINVAL;
> +
> +		decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
> +		if (!decrypted_name.name)
> +			return -ENOMEM;
> +		res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
> +						&decrypted_name);
> +		if (res < 0)
> +			goto out;
> +		dirent.name = decrypted_name.name;
> +		dirent.len = decrypted_name.len;
> +	}
> +
> +	if (folded_name->name)
> +		res = utf8_strncasecmp_folded(um, folded_name, &dirent);
> +	else
> +		res = utf8_strncasecmp(um, name, &dirent);

Similar to

  https://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode.git/commit/?h=for-next&id=367122c529f35b4655acbe33c0cc4d6d3b32ba71

We should be checking for an exact-match first to avoid the utf8
comparison cost unnecessarily.  The only problem is that we need to
ensure we fail for an invalid utf-8 de_name in "strict mode".

Fortunately, if folded_name->name exists, we know the name-under-lookup
was validated when initialized, so an exact-match of de_name must also
be valid.  If folded_name is NULL, though, we might either have an
invalid utf-8 dentry-under-lookup or the allocation itself failed, so we
need to utf8_validate it.

Honestly, I don't care much about this !folded_name->name case, since
utf8_strncasecmp will do the right thing and an invalid utf8 on
case-insensitive directories should be an exception, not the norm.  but
the code might get simpler if we do both:

(untested)

if (folded_name->name) {
	if (dirent.len == folded_name->len &&
	    !memcmp(folded_name->name, dirent.name, dirent.len)) {
            	res = 1;
		goto out;
        }
	res = utf8_strncasecmp_folded(um, folded_name, &dirent);
} else {
	if (dirent.len == name->len &&
	    !memcmp(name->name, dirent.name, dirent.len) &&
            (!sb_has_strict_encoding(sb) || !utf8_validate(um, name))) {
            	res = 1;
		goto out;
        }
	res = utf8_strncasecmp(um, name, &dirent);
}

> +
> +	if (!res)
> +		match = true;
> +	else if (res < 0 && !sb_has_strict_encoding(sb)) {
> +		/*
> +		 * In non-strict mode, fallback to a byte comparison if
> +		 * the names have invalid characters.
> +		 */
> +		res = 0;
> +		match = ((name->len == dirent.len) &&
> +			 !memcmp(name->name, dirent.name, dirent.len));
> +	}

This goes away entirely.

> +
> +out:
> +	kfree(decrypted_name.name);
> +	return (res >= 0) ? match : res;

and this becomes:

return res;

-- 
Gabriel Krisman Bertazi

WARNING: multiple messages have this Message-ID (diff)
From: Gabriel Krisman Bertazi <krisman@suse.de>
To: Eugen Hristev <eugen.hristev@collabora.com>
Cc: brauner@kernel.org, kernel@collabora.com, tytso@mit.edu,
	Eric Biggers <ebiggers@google.com>,
	jack@suse.cz, linux-kernel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net, adilger.kernel@dilger.ca,
	viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org,
	jaegeuk@kernel.org, linux-ext4@vger.kernel.org,
	Gabriel Krisman Bertazi <krisman@collabora.com>
Subject: Re: [f2fs-dev] [RESEND PATCH v9 1/3] libfs: Introduce case-insensitive string comparison helper
Date: Thu, 08 Feb 2024 13:38:33 -0500	[thread overview]
Message-ID: <87ttmivm1i.fsf@mailhost.krisman.be> (raw)
In-Reply-To: <20240208064334.268216-2-eugen.hristev@collabora.com> (Eugen Hristev's message of "Thu, 8 Feb 2024 08:43:32 +0200")

Eugen Hristev <eugen.hristev@collabora.com> writes:

> From: Gabriel Krisman Bertazi <krisman@collabora.com>
>
> generic_ci_match can be used by case-insensitive filesystems to compare
> strings under lookup with dirents in a case-insensitive way.  This
> function is currently reimplemented by each filesystem supporting
> casefolding, so this reduces code duplication in filesystem-specific
> code.
>
> Reviewed-by: Eric Biggers <ebiggers@google.com>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com>

Hi Eugen,

Thanks for picking this up.  Please,  CC me in future versions.

> ---
>  fs/libfs.c         | 68 ++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h |  4 +++
>  2 files changed, 72 insertions(+)
>
> diff --git a/fs/libfs.c b/fs/libfs.c
> index bb18884ff20e..f80cb982ac89 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -1773,6 +1773,74 @@ static const struct dentry_operations generic_ci_dentry_ops = {
>  	.d_hash = generic_ci_d_hash,
>  	.d_compare = generic_ci_d_compare,
>  };
> +
> +/**
> + * generic_ci_match() - Match a name (case-insensitively) with a dirent.
> + * @parent: Inode of the parent of the dirent under comparison
> + * @name: name under lookup.
> + * @folded_name: Optional pre-folded name under lookup
> + * @de_name: Dirent name.
> + * @de_name_len: dirent name length.
> + *
> + *
> + * Test whether a case-insensitive directory entry matches the filename
> + * being searched.  If @folded_name is provided, it is used instead of
> + * recalculating the casefold of @name.

Can we add a note that this is a filesystem helper for comparison with
directory entries, and VFS' ->d_compare should use generic_ci_d_compare.

> + *
> + * Return: > 0 if the directory entry matches, 0 if it doesn't match, or
> + * < 0 on error.
> + */
> +int generic_ci_match(const struct inode *parent,
> +		     const struct qstr *name,
> +		     const struct qstr *folded_name,
> +		     const u8 *de_name, u32 de_name_len)
> +{
> +	const struct super_block *sb = parent->i_sb;
> +	const struct unicode_map *um = sb->s_encoding;
> +	struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
> +	struct qstr dirent = QSTR_INIT(de_name, de_name_len);
> +	int res, match = false;

I know I originally wrote it this way, but match is an integer, so
let's use integers instead of false/true.

> +
> +	if (IS_ENCRYPTED(parent)) {
> +		const struct fscrypt_str encrypted_name =
> +			FSTR_INIT((u8 *) de_name, de_name_len);
> +
> +		if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
> +			return -EINVAL;
> +
> +		decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
> +		if (!decrypted_name.name)
> +			return -ENOMEM;
> +		res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
> +						&decrypted_name);
> +		if (res < 0)
> +			goto out;
> +		dirent.name = decrypted_name.name;
> +		dirent.len = decrypted_name.len;
> +	}
> +
> +	if (folded_name->name)
> +		res = utf8_strncasecmp_folded(um, folded_name, &dirent);
> +	else
> +		res = utf8_strncasecmp(um, name, &dirent);

Similar to

  https://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode.git/commit/?h=for-next&id=367122c529f35b4655acbe33c0cc4d6d3b32ba71

We should be checking for an exact-match first to avoid the utf8
comparison cost unnecessarily.  The only problem is that we need to
ensure we fail for an invalid utf-8 de_name in "strict mode".

Fortunately, if folded_name->name exists, we know the name-under-lookup
was validated when initialized, so an exact-match of de_name must also
be valid.  If folded_name is NULL, though, we might either have an
invalid utf-8 dentry-under-lookup or the allocation itself failed, so we
need to utf8_validate it.

Honestly, I don't care much about this !folded_name->name case, since
utf8_strncasecmp will do the right thing and an invalid utf8 on
case-insensitive directories should be an exception, not the norm.  but
the code might get simpler if we do both:

(untested)

if (folded_name->name) {
	if (dirent.len == folded_name->len &&
	    !memcmp(folded_name->name, dirent.name, dirent.len)) {
            	res = 1;
		goto out;
        }
	res = utf8_strncasecmp_folded(um, folded_name, &dirent);
} else {
	if (dirent.len == name->len &&
	    !memcmp(name->name, dirent.name, dirent.len) &&
            (!sb_has_strict_encoding(sb) || !utf8_validate(um, name))) {
            	res = 1;
		goto out;
        }
	res = utf8_strncasecmp(um, name, &dirent);
}

> +
> +	if (!res)
> +		match = true;
> +	else if (res < 0 && !sb_has_strict_encoding(sb)) {
> +		/*
> +		 * In non-strict mode, fallback to a byte comparison if
> +		 * the names have invalid characters.
> +		 */
> +		res = 0;
> +		match = ((name->len == dirent.len) &&
> +			 !memcmp(name->name, dirent.name, dirent.len));
> +	}

This goes away entirely.

> +
> +out:
> +	kfree(decrypted_name.name);
> +	return (res >= 0) ? match : res;

and this becomes:

return res;

-- 
Gabriel Krisman Bertazi


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

  reply	other threads:[~2024-02-08 18:38 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-08  6:43 [RESEND PATCH v9 0/3] Introduce case-insensitive string comparison helper Eugen Hristev
2024-02-08  6:43 ` [f2fs-dev] " Eugen Hristev via Linux-f2fs-devel
2024-02-08  6:43 ` [RESEND PATCH v9 1/3] libfs: " Eugen Hristev
2024-02-08  6:43   ` [f2fs-dev] " Eugen Hristev via Linux-f2fs-devel
2024-02-08 18:38   ` Gabriel Krisman Bertazi [this message]
2024-02-08 18:38     ` Gabriel Krisman Bertazi
2024-02-09 10:30     ` Eugen Hristev
2024-02-09 10:30       ` [f2fs-dev] " Eugen Hristev via Linux-f2fs-devel
2024-02-09 14:40       ` Gabriel Krisman Bertazi
2024-02-09 14:40         ` [f2fs-dev] " Gabriel Krisman Bertazi
2024-02-13  4:44         ` Eugen Hristev
2024-02-13  4:44           ` [f2fs-dev] " Eugen Hristev via Linux-f2fs-devel
2024-02-13 16:09           ` Gabriel Krisman Bertazi
2024-02-13 16:09             ` [f2fs-dev] " Gabriel Krisman Bertazi
2024-02-08  6:43 ` [RESEND PATCH v9 2/3] ext4: Reuse generic_ci_match for ci comparisons Eugen Hristev
2024-02-08  6:43   ` [f2fs-dev] " Eugen Hristev via Linux-f2fs-devel
2024-02-08 18:42   ` Gabriel Krisman Bertazi
2024-02-08 18:42     ` [f2fs-dev] " Gabriel Krisman Bertazi
2024-02-08  6:43 ` [RESEND PATCH v9 3/3] f2fs: " Eugen Hristev
2024-02-08  6:43   ` [f2fs-dev] " Eugen Hristev via Linux-f2fs-devel
2024-07-24  2:16 ` [f2fs-dev] [RESEND PATCH v9 0/3] Introduce case-insensitive string comparison helper patchwork-bot+f2fs
2024-07-24  2:16   ` patchwork-bot+f2fs

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ttmivm1i.fsf@mailhost.krisman.be \
    --to=krisman@suse.de \
    --cc=adilger.kernel@dilger.ca \
    --cc=brauner@kernel.org \
    --cc=chao@kernel.org \
    --cc=ebiggers@google.com \
    --cc=eugen.hristev@collabora.com \
    --cc=jack@suse.cz \
    --cc=jaegeuk@kernel.org \
    --cc=kernel@collabora.com \
    --cc=krisman@collabora.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.