* security: lockless stat() issues
@ 2013-10-04 19:33 Linus Torvalds
2013-10-04 20:15 ` Al Viro
0 siblings, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2013-10-04 19:33 UTC (permalink / raw)
To: James Morris, Al Viro; +Cc: Linux Kernel Mailing List, LSM List, linux-fsdevel
[-- Attachment #1: Type: text/plain, Size: 2066 bytes --]
Guys,
we've been working on scaling path lookup again, and 3.12 will be
pretty kick-ass on many loads.
HOWEVER, there's one really annoying case: we do all the path lookup
in RCU mode, where we can avoid touching any dentry state etc at all,
but then to "finalize" the path lookup in order to use it long-term,
we have to increment reference counts etc. That can be somewhat
expensive for high-scalability cases, since it will force-dirty (and
thus make exclusive in the cache) the core dentry cache entry.
And that's all fine for a lot of the common cases: when you do things
like open a file, you really do have to increment the reference count,
and there's no question that we do the right thing. And most people
who really open files tend to work on them, so getting the dentry
exclusively is fine.
There's one very important exception, though: things like "stat()" and
"access()" do *not* open a file in order to hold on to it. And they
are quite common (stat() in particular), _and_ they are often done on
files that are shared and then passed by rather than worked on..
Now, interestingly, both stat() and access() actually _already_ do
some kind of retry for special circumstances (LOOKUP_REVAL), and that
really looks like it could be extended to just do the whole lookup in
RCU mode too, and thus avoid ever finalizing the pathname.
However, the LSM interface doesn't really allow for that.
So how do people feel about passing a "mode" value for
security_inode_getattr(), the same way we do for
security_inode_permission()? The only flag would be that MAY_NOT_BLOCK
flag that gets set for RCU lookup, and the semantics would be the same
(return -ECHILD if you need to sleep).
Attached is a patch that adds the interface, and then makes all
security layers just do that ECHILD thing (and nobody actually sets
MAY_NOT_BLOCK yet). So it's purely preparatory. It's also
insufficient, because we'll need the same kind o fflag for the
low-level filesystem "i_op->getattr()" call, but that's an independent
issue.
Al, any comments?
Linus
[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 6798 bytes --]
fs/stat.c | 2 +-
include/linux/security.h | 9 ++++++---
security/apparmor/lsm.c | 4 +++-
security/capability.c | 2 +-
security/security.c | 4 ++--
security/selinux/hooks.c | 5 ++++-
security/smack/smack_lsm.c | 7 ++++++-
security/tomoyo/tomoyo.c | 4 +++-
8 files changed, 26 insertions(+), 11 deletions(-)
diff --git a/fs/stat.c b/fs/stat.c
index d0ea7ef75e26..c3ed76c95b67 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -42,7 +42,7 @@ int vfs_getattr(struct path *path, struct kstat *stat)
struct inode *inode = path->dentry->d_inode;
int retval;
- retval = security_inode_getattr(path->mnt, path->dentry);
+ retval = security_inode_getattr(path->mnt, path->dentry, 0);
if (retval)
return retval;
diff --git a/include/linux/security.h b/include/linux/security.h
index 9d37e2b9d3ec..ba0480b86351 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -505,7 +505,9 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
* Check permission before obtaining file attributes.
* @mnt is the vfsmount where the dentry was looked up
* @dentry contains the dentry structure for the file.
+ * @mask contains MAY_NOT_BLOCK is set if this is a lockless lookup
* Return 0 if permission is granted.
+ * Return -ECHILD if you cannot do it locklessly and MAY_NOT_BLOCK is set
* @inode_setxattr:
* Check permission before setting the extended attributes
* @value identified by @name for @dentry.
@@ -1511,7 +1513,7 @@ struct security_operations {
int (*inode_follow_link) (struct dentry *dentry, struct nameidata *nd);
int (*inode_permission) (struct inode *inode, int mask);
int (*inode_setattr) (struct dentry *dentry, struct iattr *attr);
- int (*inode_getattr) (struct vfsmount *mnt, struct dentry *dentry);
+ int (*inode_getattr) (struct vfsmount *mnt, struct dentry *dentry, int mask);
int (*inode_setxattr) (struct dentry *dentry, const char *name,
const void *value, size_t size, int flags);
void (*inode_post_setxattr) (struct dentry *dentry, const char *name,
@@ -1787,7 +1789,7 @@ int security_inode_readlink(struct dentry *dentry);
int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd);
int security_inode_permission(struct inode *inode, int mask);
int security_inode_setattr(struct dentry *dentry, struct iattr *attr);
-int security_inode_getattr(struct vfsmount *mnt, struct dentry *dentry);
+int security_inode_getattr(struct vfsmount *mnt, struct dentry *dentry, int mask);
int security_inode_setxattr(struct dentry *dentry, const char *name,
const void *value, size_t size, int flags);
void security_inode_post_setxattr(struct dentry *dentry, const char *name,
@@ -2178,7 +2180,8 @@ static inline int security_inode_setattr(struct dentry *dentry,
}
static inline int security_inode_getattr(struct vfsmount *mnt,
- struct dentry *dentry)
+ struct dentry *dentry,
+ int mask)
{
return 0;
}
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index fb99e18123b4..af07024510f4 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -364,10 +364,12 @@ static int apparmor_path_chown(struct path *path, kuid_t uid, kgid_t gid)
return common_perm(OP_CHOWN, path, AA_MAY_CHOWN, &cond);
}
-static int apparmor_inode_getattr(struct vfsmount *mnt, struct dentry *dentry)
+static int apparmor_inode_getattr(struct vfsmount *mnt, struct dentry *dentry, int mask)
{
if (!mediated_filesystem(dentry->d_inode))
return 0;
+ if (mask & MAY_NOT_BLOCK)
+ return -ECHILD;
return common_perm_mnt_dentry(OP_GETATTR, mnt, dentry,
AA_MAY_META_READ);
diff --git a/security/capability.c b/security/capability.c
index dbeb9bc27b24..2fa80ae08a42 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -202,7 +202,7 @@ static int cap_inode_setattr(struct dentry *dentry, struct iattr *iattr)
return 0;
}
-static int cap_inode_getattr(struct vfsmount *mnt, struct dentry *dentry)
+static int cap_inode_getattr(struct vfsmount *mnt, struct dentry *dentry, int mask)
{
return 0;
}
diff --git a/security/security.c b/security/security.c
index 4dc31f4f2700..b2423fa64983 100644
--- a/security/security.c
+++ b/security/security.c
@@ -567,11 +567,11 @@ int security_inode_setattr(struct dentry *dentry, struct iattr *attr)
}
EXPORT_SYMBOL_GPL(security_inode_setattr);
-int security_inode_getattr(struct vfsmount *mnt, struct dentry *dentry)
+int security_inode_getattr(struct vfsmount *mnt, struct dentry *dentry, int mask)
{
if (unlikely(IS_PRIVATE(dentry->d_inode)))
return 0;
- return security_ops->inode_getattr(mnt, dentry);
+ return security_ops->inode_getattr(mnt, dentry, mask);
}
int security_inode_setxattr(struct dentry *dentry, const char *name,
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index a5091ec06aa6..0dafce8524fd 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2785,11 +2785,14 @@ static int selinux_inode_setattr(struct dentry *dentry, struct iattr *iattr)
return dentry_has_perm(cred, dentry, av);
}
-static int selinux_inode_getattr(struct vfsmount *mnt, struct dentry *dentry)
+static int selinux_inode_getattr(struct vfsmount *mnt, struct dentry *dentry, int mask)
{
const struct cred *cred = current_cred();
struct path path;
+ if (mask & MAY_NOT_BLOCK)
+ return -ECHILD;
+
path.dentry = dentry;
path.mnt = mnt;
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 8825375cc031..3bc34b02434e 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -808,10 +808,15 @@ static int smack_inode_setattr(struct dentry *dentry, struct iattr *iattr)
*
* Returns 0 if access is permitted, an error code otherwise
*/
-static int smack_inode_getattr(struct vfsmount *mnt, struct dentry *dentry)
+static int smack_inode_getattr(struct vfsmount *mnt, struct dentry *dentry, int mask)
{
struct smk_audit_info ad;
struct path path;
+ int no_block = mask & MAY_NOT_BLOCK;
+
+ /* May be droppable after audit */
+ if (no_block)
+ return -ECHILD;
path.dentry = dentry;
path.mnt = mnt;
diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
index f0b756e27fed..a0c1b4fff196 100644
--- a/security/tomoyo/tomoyo.c
+++ b/security/tomoyo/tomoyo.c
@@ -144,9 +144,11 @@ static int tomoyo_bprm_check_security(struct linux_binprm *bprm)
*
* Returns 0 on success, negative value otherwise.
*/
-static int tomoyo_inode_getattr(struct vfsmount *mnt, struct dentry *dentry)
+static int tomoyo_inode_getattr(struct vfsmount *mnt, struct dentry *dentry, int mask)
{
struct path path = { mnt, dentry };
+ if (mask & MAY_NOT_BLOCK)
+ return -ECHILD;
return tomoyo_path_perm(TOMOYO_TYPE_GETATTR, &path, NULL);
}
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: security: lockless stat() issues
2013-10-04 19:33 security: lockless stat() issues Linus Torvalds
@ 2013-10-04 20:15 ` Al Viro
2013-10-04 20:49 ` Linus Torvalds
0 siblings, 1 reply; 4+ messages in thread
From: Al Viro @ 2013-10-04 20:15 UTC (permalink / raw)
To: Linus Torvalds
Cc: James Morris, Linux Kernel Mailing List, LSM List, linux-fsdevel
On Fri, Oct 04, 2013 at 12:33:17PM -0700, Linus Torvalds wrote:
> So how do people feel about passing a "mode" value for
> security_inode_getattr(), the same way we do for
> security_inode_permission()? The only flag would be that MAY_NOT_BLOCK
> flag that gets set for RCU lookup, and the semantics would be the same
> (return -ECHILD if you need to sleep).
>
> Attached is a patch that adds the interface, and then makes all
> security layers just do that ECHILD thing (and nobody actually sets
> MAY_NOT_BLOCK yet). So it's purely preparatory. It's also
> insufficient, because we'll need the same kind o fflag for the
> low-level filesystem "i_op->getattr()" call, but that's an independent
> issue.
>
> Al, any comments?
Just a couple of days ago you'd been complaining about filesystems exposure
to rcuwalk details and now you propose to increase the contact surface
by one more method? OK...
For one thing, that will definitely require lazy freeing of struct
super_block itself. At the very least, you want ->i_sb->s_dev, even
on default pathway. Another thing is that API is wrong for that
kind of stuff - we pass vfsmount/dentry pair and we end up looking
at the inode guts. Fine, but in RCU case you must cope with the
possibility of dentry->d_inode going NULL under you. Which isn't
all that terrible, but we need to slap ACCESS_ONCE() into each method
instance (all 34 of them). IOW, it's worse than just "oh, we need to
add a flag"; we'll need to pass the sodding inode separately, with
dire warnings not to use dentry->d_inode.
It's not impossible to do, of course, but it won't be fun...
Oh, well... In any case, we need something in Documentation/filesystems
describing the RCU-related rules for fs writers - exposure surface, what
to avoid, etc.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: security: lockless stat() issues
2013-10-04 20:15 ` Al Viro
@ 2013-10-04 20:49 ` Linus Torvalds
2013-10-04 21:15 ` Al Viro
0 siblings, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2013-10-04 20:49 UTC (permalink / raw)
To: Al Viro; +Cc: James Morris, Linux Kernel Mailing List, LSM List, linux-fsdevel
On Fri, Oct 4, 2013 at 1:15 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Just a couple of days ago you'd been complaining about filesystems exposure
> to rcuwalk details and now you propose to increase the contact surface
> by one more method? OK...\
.. anything for a really critical path.
> For one thing, that will definitely require lazy freeing of struct
> super_block itself. At the very least, you want ->i_sb->s_dev, even
> on default pathway. Another thing is that API is wrong for that
> kind of stuff - we pass vfsmount/dentry pair and we end up looking
> at the inode guts. Fine, but in RCU case you must cope with the
> possibility of dentry->d_inode going NULL under you. Which isn't
> all that terrible, but we need to slap ACCESS_ONCE() into each method
> instance (all 34 of them). IOW, it's worse than just "oh, we need to
> add a flag"; we'll need to pass the sodding inode separately, with
> dire warnings not to use dentry->d_inode.
That would get ugly.
However, I don't think we actually really need to do that. We had a
similar situation with d_revalidate() passing inode pointers etc
totally unnecessarily. Yes, the filesystem needs to use ACCESS_ONCE()
and care about NULL, but it doesn't need anything more than that. And
we really do have that already.
And we already have dentry->d_sb - which is supposed to be valid.
Again, we already use it under RCU for d_revalidate() and for name
hashing. So the super-block had better already be ok with RCU.
Linus
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: security: lockless stat() issues
2013-10-04 20:49 ` Linus Torvalds
@ 2013-10-04 21:15 ` Al Viro
0 siblings, 0 replies; 4+ messages in thread
From: Al Viro @ 2013-10-04 21:15 UTC (permalink / raw)
To: Linus Torvalds
Cc: James Morris, Linux Kernel Mailing List, LSM List, linux-fsdevel
On Fri, Oct 04, 2013 at 01:49:13PM -0700, Linus Torvalds wrote:
> That would get ugly.
>
> However, I don't think we actually really need to do that. We had a
> similar situation with d_revalidate() passing inode pointers etc
> totally unnecessarily. Yes, the filesystem needs to use ACCESS_ONCE()
> and care about NULL, but it doesn't need anything more than that. And
> we really do have that already.
Not for ->getattr()... BTW, ->permission() for btrfs in that series
relies on not getting a MAY_WRITE | MAY_NOT_BLOCK combination, and
if you are serious about access(2), we'll need to lazy-delay freeing
struct btrfs_root. Besides, we'll need to audit all ->permission()
instances for places that do not check for MAY_NOT_BLOCK on such
branches...
> And we already have dentry->d_sb - which is supposed to be valid.
> Again, we already use it under RCU for d_revalidate() and for name
> hashing. So the super-block had better already be ok with RCU.
Umm... Right you are, so we really need to make freeing these suckers
delayed. Fixed and pushed. BTW, I wonder how much does removal of
s_files (next-to-last commit in #experimental) affect scalability on
open/close... Anybody with perf setup and reasonable amount of
previous data?
BTW^2: what's the FM2R for perf testing of that kind? E.g. how much
is testable under KVM, etc.?
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2013-10-04 21:15 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-04 19:33 security: lockless stat() issues Linus Torvalds
2013-10-04 20:15 ` Al Viro
2013-10-04 20:49 ` Linus Torvalds
2013-10-04 21:15 ` Al Viro
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).