From: "Serge E. Hallyn" <serue@us.ibm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Al Viro <viro@zeniv.linux.org.uk>,
Linux Filesystem Mailing List <linux-fsdevel@vger.kernel.org>,
Eric Paris <eparis@redhat.com>, Mimi Zohar <zohar@us.ibm.com>,
James Morris <jmorris@namei.org>
Subject: Re: [PATCH 0/8] VFS name lookup permission checking cleanup
Date: Tue, 8 Sep 2009 09:01:40 -0500 [thread overview]
Message-ID: <20090908140140.GB873@us.ibm.com> (raw)
In-Reply-To: <alpine.LFD.2.01.0909071337510.3419@localhost.localdomain>
Quoting Linus Torvalds (torvalds@linux-foundation.org):
>
> This is a series of eight trivial patches that I'd like people to take a
> look at, because I am hoping to eventually do multiple path component
> lookups in one go without taking the per-dentry lock or incrementing (and
> then decrementing) the per-dentry atomic count for each component.
>
> The aim would be to try to avoid getting that annoying cacheline ping-pong
> on the common top-level dentries that everybody looks up (ie root and home
> directories, /usr, /usr/bin etc).
>
> Right now I have some simple (but real) loads that show the contention on
> dentry->d_lock to be a roughly 3% performance hit on a single-socket
> nehalem, and I assume it can be much worse on multi-socket machines.
>
> And the thing is, it should be entirely possible to do everything but the
> last component lookup with just a single read_seqbegin()/read_seqretry()
> around the whole lookup. Yes, the last component is special and absolutely
> needs locking and counting - but the last component also doesn't tend to
> be shared, so locking it is fine.
>
> Now, I may never actually get there, but when looking at it, the biggest
> problem is actually not so much the path lookup itself, as the security
> tests that are done for each path component. And it should be noted that
> in order for a lockless seq-lock only lookup make sense, any such
> operations would have to be totally lock-free too. They certainly can't
> take mutexes etc, but right now they do.
>
> Those security tests fall into two categories:
>
> - actual security layer callouts: ima_path_check().
>
> This one looks totally pointless. Path component lookup is a horribly
> timing-critical path, and we will only do a successful lookup on a
> directory (inode needs to have a ->lookup operation), yet in the middle
> of that is a call to "ima_path_check()".
>
> Now, it looks like ima_path_check() is very much designed to only check
> the _final_ path anyway, and was never meant to be used to check the
> directories we hit on the way. In fact, the whole function starts with
>
> if (!ima_initialized || !S_ISREG(inode->i_mode))
> return 0;
>
> so it's totally pointless to do that thing on a directory where
> that !S_ISREG() test will trigger.
>
> So just remove it. IMA should never have put that check in there to
> begin with, it's just way too performance-sensitive.
>
> - the real filesystem permission checks.
>
> We used to do the common case entirely in the VFS layer, but these days
> the common case includes POSIX ACL checking, and as a result, the
> trivial short-circuit code in the VFS layer almost never triggers in
> practice, and we call down to the low-level filesystem for each
> component.
>
> We can't fix that by just removing the call, but what we _can_ do is to
> at least avoid the silly calling back-and-forth: most filesystems will
> just call back to the VFS layer to do the "generic_permission()" with
> their own ACL-checking routines.
>
> That way we can flatten the call-chain out a bit, and avoid one
> unnecessary indirect call in that timing-critical region. And
> eventually, if we make the whole ACL caching thing be something that we
> do at a VFS layer (Al Viro already worked on _some_ of that), we'll be
> able to avoid the calls entirely when we can see the cached ACL
> pointers directly.
>
> So this series of 8 patches do all these preliminary things. As shown by
> the diffstat below, it actually reduces the lines of code (mainly by just
> removing the silly per-filesystem wrappers around "generic_permission()")
> and it also makes it a _lot_ clearer what actually gets called in that
> whole 'exec_permission_lite()' function that we use to check the
> permission of a pathname lookup.
>
> Comments? Especially from the IMA people (first patch) and from generic
> VFS, security and low-level FS people (the 'Simplify exec_permission_lite'
> series, and then the check_acl + per-filesystem changes).
>
> Al?
>
> I'm looking to merge these shortly after 2.6.31 is released, but comments
> welcome.
All of them seem good, and I don't see any thinkos, no resulting skipped
checks or anything.
Acked-by: Serge Hallyn <serue@us.ibm.com>
-serge
next prev parent reply other threads:[~2009-09-08 14:01 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-09-07 21:01 [PATCH 0/8] VFS name lookup permission checking cleanup Linus Torvalds
2009-09-07 21:02 ` [PATCH 1/8] Do not call 'ima_path_check()' for each path component Linus Torvalds
2009-09-07 21:02 ` [PATCH 2/8] Simplify exec_permission_lite() logic Linus Torvalds
2009-09-07 21:03 ` [PATCH 3/8] Simplify exec_permission_lite() further Linus Torvalds
2009-09-07 21:03 ` [PATCH 4/8] Simplify exec_permission_lite(), part 3 Linus Torvalds
2009-09-07 21:04 ` [PATCH 5/8] Make 'check_acl()' a first-class filesystem op Linus Torvalds
2009-09-07 21:04 ` [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission' Linus Torvalds
2009-09-07 21:05 ` [PATCH 7/8] ext[234]: move over to 'check_acl' permission model Linus Torvalds
2009-09-07 21:05 ` [PATCH 8/8] jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()' Linus Torvalds
2009-09-08 18:34 ` Christoph Hellwig
2009-09-09 17:09 ` Linus Torvalds
2009-09-07 22:23 ` [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission' James Morris
2009-09-08 0:03 ` James Morris
2009-09-08 18:05 ` Hugh Dickins
2009-09-07 22:20 ` [PATCH 5/8] Make 'check_acl()' a first-class filesystem op James Morris
2009-09-07 22:18 ` [PATCH 4/8] Simplify exec_permission_lite(), part 3 James Morris
2009-09-07 22:15 ` [PATCH 3/8] Simplify exec_permission_lite() further James Morris
2009-09-08 14:40 ` Jamie Lokier
2009-09-08 14:58 ` Matthew Wilcox
2009-09-08 15:02 ` Linus Torvalds
2009-09-07 22:12 ` [PATCH 2/8] Simplify exec_permission_lite() logic James Morris
2009-09-08 1:50 ` [PATCH 1/8] Do not call 'ima_path_check()' for each path component Christoph Hellwig
2009-09-08 14:01 ` Serge E. Hallyn [this message]
2009-09-08 17:52 ` [PATCH 0/8] VFS name lookup permission checking cleanup Mimi Zohar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090908140140.GB873@us.ibm.com \
--to=serue@us.ibm.com \
--cc=eparis@redhat.com \
--cc=jmorris@namei.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
--cc=zohar@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).