From: Linus Torvalds <torvalds@linux-foundation.org>
To: Gabriel Krisman Bertazi <krisman@suse.de>
Cc: Eric Biggers <ebiggers@kernel.org>,
tytso@mit.edu, linux-fsdevel@vger.kernel.org,
viro@zeniv.linux.org.uk, jaegeuk@kernel.org
Subject: Re: [PATCH v2] libfs: Attempt exact-match comparison first during casefold lookup
Date: Thu, 18 Jan 2024 11:29:31 -0800 [thread overview]
Message-ID: <CAHk-=whgQXOouz7KVHKb_SYEo1qujH_1c9TjMLmaQmdbdRE_uw@mail.gmail.com> (raw)
In-Reply-To: <8734uufy1o.fsf@mailhost.krisman.be>
On Thu, 18 Jan 2024 at 07:42, Gabriel Krisman Bertazi <krisman@suse.de> wrote:
>
> But I don't see how this could be a problem. the str pointer itself
> can't change since it's already a copy of the dentry->d_name pointer; if
> the string is external, it is guaranteed to exist until the end of the
> lookup; Finally, If it's inlined, the string can change concurrently
> which, in the worst case scenario, gives us a false result. But then,
> VFS will retry, like we do for the case-insensitive comparison right
> below.
>
> ..Right? :)
Wrong, for two subtle reasons.
The issue is not that the string can go away. The issue is that the
string and the length have been loaded independently - and may not
match.
So when you do that
if (len == name->len && !memcmp(str, name->name, len))
the 'name->len' you test for equality with 'len' may not match the
length of the string allocated in 'name->name', because they are two
different loads of two different values, and we do not hold the lock
that makes them consistent.
See the big comment (and the READ_ONCE()" in dentry_cmp(), and notice
how dentry_cmp() intentionally doesn't use 'name->len' at all.
Subtle, subtle - but this is *incredibly* performance-critical code.
Locks are completely out of the question - they would make the whole
RCU pathwalk completely pointless, and the RCU path-walk with no
stores to the dentry cache is what makes the dentry cache perform so
well.
So it's not even that locks have contention, it's literally that RCU
path lookup treats the dentries as read-only and you actually get
shared cachelines and true parallel lookups. No reference count
updates, no *nothing* like that.
This is why dentry_cmp() does that magical dentry_string_cmp(), which
is very subtle: it knows that regardless of *which* source of naem we
have, the word-aligned reads of d_name->name are safe, because
(a) the allocations are word-aligned
(b) the dentry name allocations all end with a NUL byte (unlike the
pathname ones that can have '/' at the end of the name)
(c) the *pathname* length is reliable, and the previous word didn't
have a NUL byte in it because that would have ended the compare
(d) so that "read one word from cs" is safe, *despite* the fact that
the length we have isn't something we can rely on (it comes from the
pathname side, of course).
And yes, this code is subtle. There are very few people who really
understand all the dentry rules. But it is *the* most critical piece
of code in the kernel for a lot of real-life loads. Looking up
pathnames is pretty much "Job #1" of an OS - a lot of the rest is
"details".
Linus
next prev parent reply other threads:[~2024-01-18 19:29 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-18 0:46 [PATCH v2] libfs: Attempt exact-match comparison first during casefold lookup Gabriel Krisman Bertazi
2024-01-18 3:50 ` Eric Biggers
2024-01-18 15:42 ` Gabriel Krisman Bertazi
2024-01-18 16:51 ` Gabriel Krisman Bertazi
2024-01-18 19:29 ` Linus Torvalds [this message]
2024-01-19 14:13 ` Gabriel Krisman Bertazi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAHk-=whgQXOouz7KVHKb_SYEo1qujH_1c9TjMLmaQmdbdRE_uw@mail.gmail.com' \
--to=torvalds@linux-foundation.org \
--cc=ebiggers@kernel.org \
--cc=jaegeuk@kernel.org \
--cc=krisman@suse.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=tytso@mit.edu \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).