linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Serge E. Hallyn" <serue@us.ibm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Linux Filesystem Mailing List <linux-fsdevel@vger.kernel.org>,
	Eric Paris <eparis@redhat.com>, Mimi Zohar <zohar@us.ibm.com>,
	James Morris <jmorris@namei.org>
Subject: Re: [PATCH 0/8] VFS name lookup permission checking cleanup
Date: Tue, 8 Sep 2009 09:01:40 -0500	[thread overview]
Message-ID: <20090908140140.GB873@us.ibm.com> (raw)
In-Reply-To: <alpine.LFD.2.01.0909071337510.3419@localhost.localdomain>

Quoting Linus Torvalds (torvalds@linux-foundation.org):
> 
> This is a series of eight trivial patches that I'd like people to take a 
> look at, because I am hoping to eventually do multiple path component 
> lookups in one go without taking the per-dentry lock or incrementing (and 
> then decrementing) the per-dentry atomic count for each component.
> 
> The aim would be to try to avoid getting that annoying cacheline ping-pong 
> on the common top-level dentries that everybody looks up (ie root and home 
> directories, /usr, /usr/bin etc).
> 
> Right now I have some simple (but real) loads that show the contention on 
> dentry->d_lock to be a roughly 3% performance hit on a single-socket 
> nehalem, and I assume it can be much worse on multi-socket machines.
> 
> And the thing is, it should be entirely possible to do everything but the 
> last component lookup with just a single read_seqbegin()/read_seqretry() 
> around the whole lookup. Yes, the last component is special and absolutely 
> needs locking and counting - but the last component also doesn't tend to 
> be shared, so locking it is fine.
> 
> Now, I may never actually get there, but when looking at it, the biggest 
> problem is actually not so much the path lookup itself, as the security 
> tests that are done for each path component. And it should be noted that 
> in order for a lockless seq-lock only lookup make sense, any such 
> operations would have to be totally lock-free too. They certainly can't 
> take mutexes etc, but right now they do.
> 
> Those security tests fall into two categories:
> 
>  - actual security layer callouts: ima_path_check().
> 
>    This one looks totally pointless. Path component lookup is a horribly 
>    timing-critical path, and we will only do a successful lookup on a 
>    directory (inode needs to have a ->lookup operation), yet in the middle 
>    of that is a call to "ima_path_check()".
> 
>    Now, it looks like ima_path_check() is very much designed to only check 
>    the _final_ path anyway, and was never meant to be used to check the 
>    directories we hit on the way. In fact, the whole function starts with
> 
> 	if (!ima_initialized || !S_ISREG(inode->i_mode))
> 		return 0;
> 
>    so it's totally pointless to do that thing on a directory where 
>    that !S_ISREG() test will trigger.
> 
>    So just remove it. IMA should never have put that check in there to 
>    begin with, it's just way too performance-sensitive.
> 
>  - the real filesystem permission checks. 
> 
>    We used to do the common case entirely in the VFS layer, but these days 
>    the common case includes POSIX ACL checking, and as a result, the 
>    trivial short-circuit code in the VFS layer almost never triggers in
>    practice, and we call down to the low-level filesystem for each 
>    component. 
> 
>    We can't fix that by just removing the call, but what we _can_ do is to 
>    at least avoid the silly calling back-and-forth: most filesystems will 
>    just call back to the VFS layer to do the "generic_permission()" with 
>    their own ACL-checking routines.
> 
>    That way we can flatten the call-chain out a bit, and avoid one 
>    unnecessary indirect call in that timing-critical region. And 
>    eventually, if we make the whole ACL caching thing be something that we 
>    do at a VFS layer (Al Viro already worked on _some_ of that), we'll be 
>    able to avoid the calls entirely when we can see the cached ACL 
>    pointers directly.
> 
> So this series of 8 patches do all these preliminary things. As shown by 
> the diffstat below, it actually reduces the lines of code (mainly by just 
> removing the silly per-filesystem wrappers around "generic_permission()") 
> and it also makes it a _lot_ clearer what actually gets called in that 
> whole 'exec_permission_lite()' function that we use to check the 
> permission of a pathname lookup.
> 
> Comments?  Especially from the IMA people (first patch) and from generic 
> VFS, security and low-level FS people (the 'Simplify exec_permission_lite' 
> series, and then the check_acl + per-filesystem changes).
> 
> Al?
> 
> I'm looking to merge these shortly after 2.6.31 is released, but comments 
> welcome.

All of them seem good, and I don't see any thinkos, no resulting skipped
checks or anything.

Acked-by: Serge Hallyn <serue@us.ibm.com>

-serge

  parent reply	other threads:[~2009-09-08 14:01 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-07 21:01 [PATCH 0/8] VFS name lookup permission checking cleanup Linus Torvalds
2009-09-07 21:02 ` [PATCH 1/8] Do not call 'ima_path_check()' for each path component Linus Torvalds
2009-09-07 21:02   ` [PATCH 2/8] Simplify exec_permission_lite() logic Linus Torvalds
2009-09-07 21:03     ` [PATCH 3/8] Simplify exec_permission_lite() further Linus Torvalds
2009-09-07 21:03       ` [PATCH 4/8] Simplify exec_permission_lite(), part 3 Linus Torvalds
2009-09-07 21:04         ` [PATCH 5/8] Make 'check_acl()' a first-class filesystem op Linus Torvalds
2009-09-07 21:04           ` [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission' Linus Torvalds
2009-09-07 21:05             ` [PATCH 7/8] ext[234]: move over to 'check_acl' permission model Linus Torvalds
2009-09-07 21:05               ` [PATCH 8/8] jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()' Linus Torvalds
2009-09-08 18:34                 ` Christoph Hellwig
2009-09-09 17:09                   ` Linus Torvalds
2009-09-07 22:23             ` [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission' James Morris
2009-09-08  0:03               ` James Morris
2009-09-08 18:05             ` Hugh Dickins
2009-09-07 22:20           ` [PATCH 5/8] Make 'check_acl()' a first-class filesystem op James Morris
2009-09-07 22:18         ` [PATCH 4/8] Simplify exec_permission_lite(), part 3 James Morris
2009-09-07 22:15       ` [PATCH 3/8] Simplify exec_permission_lite() further James Morris
2009-09-08 14:40       ` Jamie Lokier
2009-09-08 14:58         ` Matthew Wilcox
2009-09-08 15:02         ` Linus Torvalds
2009-09-07 22:12     ` [PATCH 2/8] Simplify exec_permission_lite() logic James Morris
2009-09-08  1:50   ` [PATCH 1/8] Do not call 'ima_path_check()' for each path component Christoph Hellwig
2009-09-08 14:01 ` Serge E. Hallyn [this message]
2009-09-08 17:52 ` [PATCH 0/8] VFS name lookup permission checking cleanup Mimi Zohar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090908140140.GB873@us.ibm.com \
    --to=serue@us.ibm.com \
    --cc=eparis@redhat.com \
    --cc=jmorris@namei.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=zohar@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).