linux-fsdevel.vger.kernel.org archive mirror
From: Christian Brauner <brauner@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	 linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Jan Kara <jack@suse.cz>,
	 Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: generic_permission() optimization
Date: Mon, 14 Apr 2025 12:21:28 +0200
Message-ID: <20250414-anomalie-abpfiff-9f293dce366b@brauner>
In-Reply-To: <CAHk-=wh+pk72FM+a7PoW2s46aU9OQZrY-oApMZSUH0Urg9bsMA@mail.gmail.com>

On Sat, Apr 12, 2025 at 01:22:38PM -0700, Linus Torvalds wrote:
> On Sat, 12 Apr 2025 at 09:26, Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> > I plopped your snippet towards the end of __ext4_iget:
> 
> That's literally where I did the same thing, except I put it right after the
> 
>           brelse(iloc.bh);
> 
> line, rather than before as you did.
> 
> And it made no difference for me, but I didn't try to figure out why.
> Maybe some environment differences? Or maybe I just screwed up my
> testing...
> 
> As mentioned earlier in the thread, I had this bi-modal distribution
> of results, because if I had a load where the *non*-owner of the inode
> looked up the pathnames, then the ACL information would get filled in
> when the VFS layer would do the lookup, and then once the ACLs were
> cached, everything worked beautifully.
> 
> But if the only lookups of a path were done by the owner of the inodes
> (which is typical for at least my normal kernel build tree - nothing
> but my build will look at the files, and they are obviously always
> owned by me) then the ACL caches will never be filled because there
> will never be any real ACL lookups.
> 
> And then rather than doing the nice efficient "no ACLs anywhere, no
> need to even look", it ends up having to actually do the vfsuid
> comparison for the UID equality check.
> 
> Which then does the extra accesses to look up the idmap etc, and is
> visible in the profiles due to that whole dance:
> 
>         /* Are we the owner? If so, ACL's don't matter */
>         vfsuid = i_uid_into_vfsuid(idmap, inode);
>         if (likely(vfsuid_eq_kuid(vfsuid, current_fsuid()))) {
> 
> even when idmap is 'nop_mnt_idmap' and it is reasonably cheap. Just
> because it ends up calling out to different functions and does extra
> D$ accesses to the inode and the superblock (ie i_user_ns() is this
> 
>         return inode->i_sb->s_user_ns;

I think we can improve this. Right now multiple mounts from different
superblocks can share the same struct mnt_idmap. But I can change the
code so that struct mnt_idmap can only be shared between mounts from the
same superblock. With that we could do:

diff --git a/fs/mnt_idmapping.c b/fs/mnt_idmapping.c
index a37991fdb194..a5ec15c8c754 100644
--- a/fs/mnt_idmapping.c
+++ b/fs/mnt_idmapping.c
@@ -20,6 +20,7 @@
 struct mnt_idmap {
        struct uid_gid_map uid_map;
        struct uid_gid_map gid_map;
+       struct user_namespace *s_user_ns;
        refcount_t count;
 };

And then stuff like:

static inline vfsuid_t i_uid_into_vfsuid(struct mnt_idmap *idmap,
                                         const struct inode *inode)
{
        return make_vfsuid(idmap, i_user_ns(inode), inode->i_uid);
}

just becomes:

static inline vfsuid_t i_uid_into_vfsuid(struct mnt_idmap *idmap,
                                         const struct inode *inode)
{
        return make_vfsuid(idmap, inode->i_uid);
}

which means:

vfsuid_t make_vfsuid(struct mnt_idmap *idmap,
                     kuid_t kuid)
{
        uid_t uid;

        if (idmap == &nop_mnt_idmap)
                return VFSUIDT_INIT(kuid);

<snip>
}

will only have to check for nop_mnt_idmap, and we never have to access
inode->i_sb->s_user_ns at all.
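
To make the shape of that fast path concrete, here is a rough userspace
model. It is only a sketch and not the kernel implementation: the types
are stand-ins, and first_kuid, uid_shift and shifted_idmap are
hypothetical simplifications of the real uid_gid_map handling. The point
is that the nop_mnt_idmap case returns before anything is dereferenced,
and the slow path gets the filesystem's user namespace from the idmap
rather than from inode->i_sb.

```c
/* Userspace model only: these types mimic, but are not, the kernel's. */
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t val; } kuid_t;
typedef struct { uint32_t val; } vfsuid_t;

/* Hypothetical stand-in: a namespace that starts at one base kuid. */
struct user_namespace {
        uint32_t first_kuid;
};

struct mnt_idmap {
        struct user_namespace *s_user_ns; /* the proposed new field */
        uint32_t uid_shift;               /* hypothetical: replaces uid_map */
};

/* Identity mapping; only ever compared by address. */
static struct mnt_idmap nop_mnt_idmap;

/* Example of a non-trivial idmap, for illustration only. */
static struct user_namespace init_ns_model = { .first_kuid = 0 };
static struct mnt_idmap shifted_idmap = {
        .s_user_ns = &init_ns_model,
        .uid_shift = 100000,
};

static inline vfsuid_t make_vfsuid(const struct mnt_idmap *idmap, kuid_t kuid)
{
        vfsuid_t vfsuid;
        uint32_t uid;

        /* Fast path: a pointer comparison, no inode or superblock access. */
        if (idmap == &nop_mnt_idmap) {
                vfsuid.val = kuid.val;
                return vfsuid;
        }

        /*
         * Slow path (simplified assumption): translate into the
         * filesystem's namespace via the idmap itself, then apply the
         * mount's mapping.
         */
        uid = kuid.val - idmap->s_user_ns->first_kuid;
        vfsuid.val = uid + idmap->uid_shift;
        return vfsuid;
}
```

In this model the caller passes only inode->i_uid, so the inode's
superblock is never touched on either path.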

I'll whip up a patch for this.

> 
> so just to *see* that it's nop_mnt_idmap takes effort.
> 
> One improvement might be to cache that 'nop_mnt_idmap' thing in the
> inode as a flag.
> 
> But it would be even better if the filesystem just initializes the
> inode at inode read time to say "I have no ACL's for this inode" and
> none of this code will even trigger.

Yes, let's please do this.
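
For illustration, the idea can be modelled in userspace roughly like
this. It is a sketch under stated assumptions, not the actual VFS or
ext4 code: every model_* name and MODEL_ACL_NOT_CACHED is hypothetical.
The inode starts out with an "ACL state unknown" marker, the filesystem
clears it at inode read time when there is no ACL on disk, and the
permission check can then succeed without an owner comparison whenever
owner, group and other all grant the requested bits.

```c
/* Userspace model only; all names here are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct posix_acl {
        int unused;
};

/* Sentinel meaning "nobody has looked up the ACL yet". */
static struct posix_acl model_acl_not_cached;
#define MODEL_ACL_NOT_CACHED (&model_acl_not_cached)

struct model_inode {
        unsigned int mode;     /* permission bits, e.g. 0644 */
        struct posix_acl *acl; /* sentinel, NULL (known: none), or an ACL */
};

static inline void model_inode_init(struct model_inode *inode,
                                    unsigned int mode)
{
        inode->mode = mode;
        inode->acl = MODEL_ACL_NOT_CACHED;
}

/* What the filesystem would do while reading the inode from disk. */
static inline void model_fs_read_inode(struct model_inode *inode,
                                       bool has_acl_on_disk)
{
        if (!has_acl_on_disk)
                inode->acl = NULL; /* "I have no ACLs for this inode" */
}

/* True if access can be granted without owner, group or ACL checks. */
static inline bool model_fast_permission(const struct model_inode *inode,
                                         unsigned int mask)
{
        /* Replicate the rwx request into the owner/group/other positions. */
        unsigned int want = (mask & 7) * 0111;

        if (want & ~inode->mode)
                return false; /* some class lacks a requested bit */
        return inode->acl == NULL;
}

/* Convenience wrapper for trying out the three steps together. */
static inline bool model_demo(unsigned int mode, bool has_acl_on_disk,
                              unsigned int mask)
{
        struct model_inode inode;

        model_inode_init(&inode, mode);
        model_fs_read_inode(&inode, has_acl_on_disk);
        return model_fast_permission(&inode, mask);
}
```

With mode 0644 and no ACL on disk, a read request (mask 4) takes the
fast path in this model, while a write request (mask 2) or an inode
whose ACL state is still unknown falls back to the full check.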

Thread overview: 42+ messages
2024-10-31  4:16 generic_permission() optimization Linus Torvalds
2024-10-31  6:05 ` Al Viro
2024-10-31  6:42   ` Linus Torvalds
2024-10-31 18:14     ` Linus Torvalds
2024-10-31 22:28       ` Al Viro
2024-10-31 22:34         ` Linus Torvalds
2024-11-01  1:17           ` Linus Torvalds
2024-11-01  1:27             ` Al Viro
2024-11-01 13:15             ` Christian Brauner
2024-10-31 13:02 ` Christian Brauner
2024-10-31 19:04   ` Linus Torvalds
2024-10-31 22:02     ` Linus Torvalds
2024-10-31 22:31       ` Linus Torvalds
2024-11-07 19:54         ` Linus Torvalds
2024-11-07 22:22           ` Mateusz Guzik
2024-11-07 22:49             ` Linus Torvalds
2025-04-12 16:26               ` Mateusz Guzik
2025-04-12 20:22                 ` Linus Torvalds
2025-04-14 10:21                   ` Christian Brauner [this message]
2025-04-16 13:17                     ` [PATCH RFC 0/3] mnt_idmapping: avoid pointer chase & inline low-level helpers Christian Brauner
2025-04-16 13:17                       ` [PATCH RFC 1/3] inode: add fastpath for filesystem user namespace retrieval Christian Brauner
2025-04-16 13:49                         ` Mateusz Guzik
2025-04-16 14:14                           ` Christian Brauner
2025-04-22 10:37                         ` Jan Kara
2025-04-22 13:33                           ` Mateusz Guzik
2025-04-22 14:05                             ` Christian Brauner
2025-04-16 13:17                       ` [PATCH RFC 2/3] mnt_idmapping: add struct mnt_idmap to header Christian Brauner
2025-04-16 13:17                       ` [PATCH RFC 3/3] mnt_idmapping: inline all low-level helpers Christian Brauner
2025-04-16 15:04                         ` Linus Torvalds
2025-04-22  9:28                           ` Christian Brauner
2025-04-12 21:52                 ` generic_permission() optimization Theodore Ts'o
2025-04-12 22:36                   ` Linus Torvalds
2025-04-12 23:12                     ` Linus Torvalds
2025-04-12 23:55                     ` Theodore Ts'o
2025-04-13  9:41                       ` Mateusz Guzik
2025-04-13 12:40                         ` Theodore Ts'o
2025-04-13 12:52                           ` Mateusz Guzik
2025-04-13 17:29                             ` Theodore Ts'o
2025-11-05 11:50                           ` Mateusz Guzik
2025-11-05 11:51                             ` Mateusz Guzik
2025-11-05 13:37                               ` Jan Kara
2025-11-17 11:42                                 ` Mateusz Guzik
