* [PATCH v2] libfs: Attempt exact-match comparison first during casefold lookup
@ 2024-01-18 0:46 Gabriel Krisman Bertazi
2024-01-18 3:50 ` Eric Biggers
0 siblings, 1 reply; 6+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-01-18 0:46 UTC (permalink / raw)
To: ebiggers, tytso
Cc: linux-fsdevel, viro, jaegeuk, Gabriel Krisman Bertazi,
Linus Torvalds
Casefolded comparisons are (obviously) way more costly than a simple
memcmp. Try the case-sensitive comparison first, falling-back to the
case-insensitive lookup only when needed. This allows any exact-match
lookup to complete without having to walk the utf8 trie.
Note that, for strict mode, generic_ci_d_compare used to reject an
invalid UTF-8 string, which would now be considered valid if it
exact-matches the disk-name. But, if that is the case, the filesystem
is corrupt. More than that, it really doesn't matter in practice,
because the name-under-lookup will have already been rejected by
generic_ci_d_hash and we won't even get here.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
---
changes since v1:
- just return utf8_strncasemp directly (Al Viro)
---
fs/libfs.c | 38 +++++++++++++++++++++-----------------
1 file changed, 21 insertions(+), 17 deletions(-)
diff --git a/fs/libfs.c b/fs/libfs.c
index eec6031b0155..0b553215eda8 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1704,17 +1704,27 @@ bool is_empty_dir_inode(struct inode *inode)
static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
const char *str, const struct qstr *name)
{
- const struct dentry *parent = READ_ONCE(dentry->d_parent);
- const struct inode *dir = READ_ONCE(parent->d_inode);
- const struct super_block *sb = dentry->d_sb;
- const struct unicode_map *um = sb->s_encoding;
- struct qstr qstr = QSTR_INIT(str, len);
+ const struct dentry *parent;
+ const struct inode *dir;
char strbuf[DNAME_INLINE_LEN];
- int ret;
+ struct qstr qstr;
+
+ /*
+ * Attempt a case-sensitive match first. It is cheaper and
+ * should cover most lookups, including all the sane
+ * applications that expect a case-sensitive filesystem.
+ */
+ if (len == name->len && !memcmp(str, name->name, len))
+ return 0;
+ parent = READ_ONCE(dentry->d_parent);
+ dir = READ_ONCE(parent->d_inode);
if (!dir || !IS_CASEFOLDED(dir))
- goto fallback;
+ return 1;
+
/*
+ * Finally, try the case-insensitive match.
+ *
* If the dentry name is stored in-line, then it may be concurrently
* modified by a rename. If this happens, the VFS will eventually retry
* the lookup, so it doesn't matter what ->d_compare() returns.
@@ -1724,20 +1734,14 @@ static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
if (len <= DNAME_INLINE_LEN - 1) {
memcpy(strbuf, str, len);
strbuf[len] = 0;
- qstr.name = strbuf;
+ str = strbuf;
/* prevent compiler from optimizing out the temporary buffer */
barrier();
}
- ret = utf8_strncasecmp(um, name, &qstr);
- if (ret >= 0)
- return ret;
+ qstr.len = len;
+ qstr.name = str;
- if (sb_has_strict_encoding(sb))
- return -EINVAL;
-fallback:
- if (len != name->len)
- return 1;
- return !!memcmp(str, name->name, len);
+ return utf8_strncasecmp(dentry->d_sb->s_encoding, name, &qstr);
}
/**
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2] libfs: Attempt exact-match comparison first during casefold lookup
2024-01-18 0:46 [PATCH v2] libfs: Attempt exact-match comparison first during casefold lookup Gabriel Krisman Bertazi
@ 2024-01-18 3:50 ` Eric Biggers
2024-01-18 15:42 ` Gabriel Krisman Bertazi
0 siblings, 1 reply; 6+ messages in thread
From: Eric Biggers @ 2024-01-18 3:50 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: tytso, linux-fsdevel, viro, jaegeuk, Linus Torvalds
On Wed, Jan 17, 2024 at 09:46:18PM -0300, Gabriel Krisman Bertazi wrote:
> Note that, for strict mode, generic_ci_d_compare used to reject an
> invalid UTF-8 string, which would now be considered valid if it
> exact-matches the disk-name. But, if that is the case, the filesystem
> is corrupt. More than that, it really doesn't matter in practice,
> because the name-under-lookup will have already been rejected by
> generic_ci_d_hash and we won't even get here.
Can you make the code comment explain this, please?
> + /*
> + * Attempt a case-sensitive match first. It is cheaper and
> + * should cover most lookups, including all the sane
> + * applications that expect a case-sensitive filesystem.
> + */
> + if (len == name->len && !memcmp(str, name->name, len))
> + return 0;
Is the memcmp() safe, considering that the 'str' can be concurrently modified?
The default dentry name comparison uses dentry_string_cmp(). Would using
dentry_string_cmp() be a better choice here?
- Eric
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] libfs: Attempt exact-match comparison first during casefold lookup
2024-01-18 3:50 ` Eric Biggers
@ 2024-01-18 15:42 ` Gabriel Krisman Bertazi
2024-01-18 16:51 ` Gabriel Krisman Bertazi
2024-01-18 19:29 ` Linus Torvalds
0 siblings, 2 replies; 6+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-01-18 15:42 UTC (permalink / raw)
To: Eric Biggers; +Cc: tytso, linux-fsdevel, viro, jaegeuk, Linus Torvalds
Eric Biggers <ebiggers@kernel.org> writes:
> On Wed, Jan 17, 2024 at 09:46:18PM -0300, Gabriel Krisman Bertazi wrote:
>> Note that, for strict mode, generic_ci_d_compare used to reject an
>> invalid UTF-8 string, which would now be considered valid if it
>> exact-matches the disk-name. But, if that is the case, the filesystem
>> is corrupt. More than that, it really doesn't matter in practice,
>> because the name-under-lookup will have already been rejected by
>> generic_ci_d_hash and we won't even get here.
>
> Can you make the code comment explain this, please?
will do, once we settle the other discussion about this point.
>
>> + /*
>> + * Attempt a case-sensitive match first. It is cheaper and
>> + * should cover most lookups, including all the sane
>> + * applications that expect a case-sensitive filesystem.
>> + */
>> + if (len == name->len && !memcmp(str, name->name, len))
>> + return 0;
>
> Is the memcmp() safe, considering that the 'str' can be concurrently
> modified?
You wrote quite a bit of the data race documentation, so you definitely
know better.
But I don't see how this could be a problem. the str pointer itself
can't change since it's already a copy of the dentry->d_name pointer; if
the string is external, it is guaranteed to exist until the end of the
lookup; Finally, If it's inlined, the string can change concurrently
which, in the worst case scenario, gives us a false result. But then,
VFS will retry, like we do for the case-insensitive comparison right
below.
..Right? :)
--
Gabriel Krisman Bertazi
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] libfs: Attempt exact-match comparison first during casefold lookup
2024-01-18 15:42 ` Gabriel Krisman Bertazi
@ 2024-01-18 16:51 ` Gabriel Krisman Bertazi
2024-01-18 19:29 ` Linus Torvalds
1 sibling, 0 replies; 6+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-01-18 16:51 UTC (permalink / raw)
To: Eric Biggers; +Cc: tytso, linux-fsdevel, viro, jaegeuk, Linus Torvalds
Gabriel Krisman Bertazi <krisman@suse.de> writes:
> can't change since it's already a copy of the dentry->d_name pointer;
dentry->d_name.name
--
Gabriel Krisman Bertazi
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] libfs: Attempt exact-match comparison first during casefold lookup
2024-01-18 15:42 ` Gabriel Krisman Bertazi
2024-01-18 16:51 ` Gabriel Krisman Bertazi
@ 2024-01-18 19:29 ` Linus Torvalds
2024-01-19 14:13 ` Gabriel Krisman Bertazi
1 sibling, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2024-01-18 19:29 UTC (permalink / raw)
To: Gabriel Krisman Bertazi; +Cc: Eric Biggers, tytso, linux-fsdevel, viro, jaegeuk
On Thu, 18 Jan 2024 at 07:42, Gabriel Krisman Bertazi <krisman@suse.de> wrote:
>
> But I don't see how this could be a problem. the str pointer itself
> can't change since it's already a copy of the dentry->d_name pointer; if
> the string is external, it is guaranteed to exist until the end of the
> lookup; Finally, If it's inlined, the string can change concurrently
> which, in the worst case scenario, gives us a false result. But then,
> VFS will retry, like we do for the case-insensitive comparison right
> below.
>
> ..Right? :)
Wrong, for two subtle reasons.
The issue is not that the string can go away. The issue is that the
string and the length have been loaded independently - and may not
match.
So when you do that
if (len == name->len && !memcmp(str, name->name, len))
the 'name->len' you test for equality with 'len' may not match the
length of the string allocated in 'name->name', because they are two
different loads of two different values, and we do not hold the lock
that makes them consistent.
See the big comment (and the READ_ONCE()" in dentry_cmp(), and notice
how dentry_cmp() intentionally doesn't use 'name->len' at all.
Subtle, subtle - but this is *incredibly* performance-critical code.
Locks are completely out of the question - they would make the whole
RCU pathwalk completely pointless, and the RCU path-walk with no
stores to the dentry cache is what makes the dentry cache perform so
well.
So it's not even that locks have contention, it's literally that RCU
path lookup treats the dentries as read-only and you actually get
shared cachelines and true parallel lookups. No reference count
updates, no *nothing* like that.
This is why dentry_cmp() does that magical dentry_string_cmp(), which
is very subtle: it knows that regardless of *which* source of naem we
have, the word-aligned reads of d_name->name are safe, because
(a) the allocations are word-aligned
(b) the dentry name allocations all end with a NUL byte (unlike the
pathname ones that can have '/' at the end of the name)
(c) the *pathname* length is reliable, and the previous word didn't
have a NUL byte in it because that would have ended the compare
(d) so that "read one word from cs" is safe, *despite* the fact that
the length we have isn't something we can rely on (it comes from the
pathname side, of course).
And yes, this code is subtle. There are very few people who really
understand all the dentry rules. But it is *the* most critical piece
of code in the kernel for a lot of real-life loads. Looking up
pathnames is pretty much "Job #1" of an OS - a lot of the rest is
"details".
Linus
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] libfs: Attempt exact-match comparison first during casefold lookup
2024-01-18 19:29 ` Linus Torvalds
@ 2024-01-19 14:13 ` Gabriel Krisman Bertazi
0 siblings, 0 replies; 6+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-01-19 14:13 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Eric Biggers, tytso, linux-fsdevel, viro, jaegeuk
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Thu, 18 Jan 2024 at 07:42, Gabriel Krisman Bertazi <krisman@suse.de> wrote:
>>
>> But I don't see how this could be a problem. the str pointer itself
>> can't change since it's already a copy of the dentry->d_name pointer; if
>> the string is external, it is guaranteed to exist until the end of the
>> lookup; Finally, If it's inlined, the string can change concurrently
>> which, in the worst case scenario, gives us a false result. But then,
>> VFS will retry, like we do for the case-insensitive comparison right
>> below.
>>
>> ..Right? :)
>
> Wrong, for two subtle reasons.
>
> The issue is not that the string can go away. The issue is that the
> string and the length have been loaded independently - and may not
> match.
>
> So when you do that
>
> if (len == name->len && !memcmp(str, name->name, len))
>
> the 'name->len' you test for equality with 'len' may not match the
> length of the string allocated in 'name->name', because they are two
> different loads of two different values, and we do not hold the lock
> that makes them consistent.
>
> See the big comment (and the READ_ONCE()" in dentry_cmp(), and notice
> how dentry_cmp() intentionally doesn't use 'name->len' at all.
>
I see what you mean. I really appreciate the explanation, by the way.
Thanks for that.
I'll follow up with a v3.
--
Gabriel Krisman Bertazi
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-01-19 14:13 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-18 0:46 [PATCH v2] libfs: Attempt exact-match comparison first during casefold lookup Gabriel Krisman Bertazi
2024-01-18 3:50 ` Eric Biggers
2024-01-18 15:42 ` Gabriel Krisman Bertazi
2024-01-18 16:51 ` Gabriel Krisman Bertazi
2024-01-18 19:29 ` Linus Torvalds
2024-01-19 14:13 ` Gabriel Krisman Bertazi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).