From: Christian Brauner <brauner@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>,
Al Viro <viro@zeniv.linux.org.uk>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Jan Kara <jack@suse.cz>,
Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: generic_permission() optimization
Date: Mon, 14 Apr 2025 12:21:28 +0200 [thread overview]
Message-ID: <20250414-anomalie-abpfiff-9f293dce366b@brauner> (raw)
In-Reply-To: <CAHk-=wh+pk72FM+a7PoW2s46aU9OQZrY-oApMZSUH0Urg9bsMA@mail.gmail.com>
On Sat, Apr 12, 2025 at 01:22:38PM -0700, Linus Torvalds wrote:
> On Sat, 12 Apr 2025 at 09:26, Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> > I plopped your snippet towards the end of __ext4_iget:
>
> That's literally where I did the same thing, except I put it right after the
>
> brelse(iloc.bh);
>
> line, rather than before as you did.
>
> And it made no difference for me, but I didn't try to figure out why.
> Maybe some environment differences? Or maybe I just screwed up my
> testing...
>
> As mentioned earlier in the thread, I had this bi-modal distribution
> of results, because if I had a load where the *non*-owner of the inode
> looked up the pathnames, then the ACL information would get filled in
> when the VFS layer would do the lookup, and then once the ACLs were
> cached, everything worked beautifully.
>
> But if the only lookups of a path were done by the owner of the inodes
> (which is typical for at least my normal kernel build tree - nothing
> but my build will look at the files, and they are obviously always
> owned by me) then the ACL caches will never be filled because there
> will never be any real ACL lookups.
>
> And then rather than doing the nice efficient "no ACLs anywhere, no
> need to even look", it ends up having to actually do the vfsuid
> comparison for the UID equality check.
>
> Which then does the extra accesses to look up the idmap etc, and is
> visible in the profiles due to that whole dance:
>
> /* Are we the owner? If so, ACL's don't matter */
> vfsuid = i_uid_into_vfsuid(idmap, inode);
> if (likely(vfsuid_eq_kuid(vfsuid, current_fsuid()))) {
>
> even when idmap is 'nop_mnt_idmap' and it is reasonably cheap. Just
> because it ends up calling out to different functions and does extra
> D$ accesses to the inode and the suberblock (ie i_user_ns() is this
>
> return inode->i_sb->s_user_ns;
I think we can improve this. Right now multiple mounts from different
superblocks can share the same struct mnt_idmap. But I can change the
code so that struct mnt_idmap can only be shared between mounts from the
same superblock. With that we could do:
diff --git a/fs/mnt_idmapping.c b/fs/mnt_idmapping.c
index a37991fdb194..a5ec15c8c754 100644
--- a/fs/mnt_idmapping.c
+++ b/fs/mnt_idmapping.c
@@ -20,6 +20,7 @@
struct mnt_idmap {
struct uid_gid_map uid_map;
struct uid_gid_map gid_map;
+ struct user_namespace *s_user_ns;
refcount_t count;
};
And then stuff like:
static inline vfsuid_t i_uid_into_vfsuid(struct mnt_idmap *idmap,
const struct inode *inode)
{
return make_vfsuid(idmap, i_user_ns(inode), inode->i_uid);
}
just becomes:
static inline vfsuid_t i_uid_into_vfsuid(struct mnt_idmap *idmap,
const struct inode *inode)
{
return make_vfsuid(idmap, inode->i_uid);
}
which means:
vfsuid_t make_vfsuid(struct mnt_idmap *idmap,
kuid_t kuid)
{
uid_t uid;
if (idmap == &nop_mnt_idmap)
return VFSUIDT_INIT(kuid);
<snip>
}
will only have to verify nop_mnt_idmap and we never have to access the
inode->i_sb->s_user_ns at all.
I'll wip up a patch for this.
>
> so just to *see* that it's nop_mnt_idmap takes effort.
>
> One improvement might be to cache that 'nop_mnt_idmap' thing in the
> inode as a flag.
>
> But it would be even better if the filesystem just initializes the
> inode at inode read time to say "I have no ACL's for this inode" and
> none of this code will even trigger.
Yes, let's please do this.
next prev parent reply other threads:[~2025-04-14 10:21 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-31 4:16 generic_permission() optimization Linus Torvalds
2024-10-31 6:05 ` Al Viro
2024-10-31 6:42 ` Linus Torvalds
2024-10-31 18:14 ` Linus Torvalds
2024-10-31 22:28 ` Al Viro
2024-10-31 22:34 ` Linus Torvalds
2024-11-01 1:17 ` Linus Torvalds
2024-11-01 1:27 ` Al Viro
2024-11-01 13:15 ` Christian Brauner
2024-10-31 13:02 ` Christian Brauner
2024-10-31 19:04 ` Linus Torvalds
2024-10-31 22:02 ` Linus Torvalds
2024-10-31 22:31 ` Linus Torvalds
2024-11-07 19:54 ` Linus Torvalds
2024-11-07 22:22 ` Mateusz Guzik
2024-11-07 22:49 ` Linus Torvalds
2025-04-12 16:26 ` Mateusz Guzik
2025-04-12 20:22 ` Linus Torvalds
2025-04-14 10:21 ` Christian Brauner [this message]
2025-04-16 13:17 ` [PATCH RFC 0/3] mnt_idmapping: avoid pointer chase & inline low-level helpers Christian Brauner
2025-04-16 13:17 ` [PATCH RFC 1/3] inode: add fastpath for filesystem user namespace retrieval Christian Brauner
2025-04-16 13:49 ` Mateusz Guzik
2025-04-16 14:14 ` Christian Brauner
2025-04-22 10:37 ` Jan Kara
2025-04-22 13:33 ` Mateusz Guzik
2025-04-22 14:05 ` Christian Brauner
2025-04-16 13:17 ` [PATCH RFC 2/3] mnt_idmapping: add struct mnt_idmap to header Christian Brauner
2025-04-16 13:17 ` [PATCH RFC 3/3] mnt_idmapping: inline all low-level helpers Christian Brauner
2025-04-16 15:04 ` Linus Torvalds
2025-04-22 9:28 ` Christian Brauner
2025-04-12 21:52 ` generic_permission() optimization Theodore Ts'o
2025-04-12 22:36 ` Linus Torvalds
2025-04-12 23:12 ` Linus Torvalds
2025-04-12 23:55 ` Theodore Ts'o
2025-04-13 9:41 ` Mateusz Guzik
2025-04-13 12:40 ` Theodore Ts'o
2025-04-13 12:52 ` Mateusz Guzik
2025-04-13 17:29 ` Theodore Ts'o
2025-11-05 11:50 ` Mateusz Guzik
2025-11-05 11:51 ` Mateusz Guzik
2025-11-05 13:37 ` Jan Kara
2025-11-17 11:42 ` Mateusz Guzik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250414-anomalie-abpfiff-9f293dce366b@brauner \
--to=brauner@kernel.org \
--cc=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=mjguzik@gmail.com \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).