* remove_suid bangs on xattrs
From: Chris Mason @ 2010-08-16 19:38 UTC
To: linux-fsdevel; +Cc: serue

Hi everyone,

I'm looking into a 2.6.35 btrfs performance regression, and perf tells
me that I'm spending a lot of time hammering on xattrs inside
remove_suid.  This is pretty surprising because I'm running as root, and
my files are not suid.  Looking back to this commit:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b53767719b6cd8789392ea3e7e2eb7b8906898f0

We've changed remove_suid's semantics from

if (file_is_suid)
	try to remove it

To something that always checks to see if we have removal permissions.

Was this intentional?  It didn't cause my 2.6.35 regression (that's all
my fault) but it does look wrong to me:

diff --git a/mm/filemap.c b/mm/filemap.c
index 4fb1546..79f24a9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1627,12 +1627,18 @@ int __remove_suid(struct dentry *dentry, int kill)

 int remove_suid(struct dentry *dentry)
 {
-	int kill = should_remove_suid(dentry);
+	int killsuid = should_remove_suid(dentry);
+	int killpriv = security_inode_need_killpriv(dentry);
+	int error = 0;

-	if (unlikely(kill))
-		return __remove_suid(dentry, kill);
+	if (killpriv < 0)
+		return killpriv;
+	if (killpriv)
+		error = security_inode_killpriv(dentry);
+	if (!error && killsuid)
+		error = __remove_suid(dentry, killsuid);

-	return 0;
+	return error;
 }
 EXPORT_SYMBOL(remove_suid);

-chris
* Re: remove_suid bangs on xattrs
From: Chris Mason @ 2010-08-16 19:44 UTC
To: linux-fsdevel, serge.hallyn

[ sorry, corrected cc list ]

On Mon, Aug 16, 2010 at 03:38:12PM -0400, Chris Mason wrote:
> Hi everyone,
>
> I'm looking into a 2.6.35 btrfs performance regression, and perf tells
> me that I'm spending a lot of time hammering on xattrs inside
> remove_suid.  This is pretty surprising because I'm running as root, and
> my files are not suid.  Looking back to this commit:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b53767719b6cd8789392ea3e7e2eb7b8906898f0
>
> We've changed remove_suid's semantics from
>
> if (file_is_suid)
> 	try to remove it
>
> To something that always checks to see if we have removal permissions.
>
> Was this intentional?  It didn't cause my 2.6.35 regression (that's all
> my fault) but it does look wrong to me:
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 4fb1546..79f24a9 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1627,12 +1627,18 @@ int __remove_suid(struct dentry *dentry, int kill)
>
>  int remove_suid(struct dentry *dentry)
>  {
> -	int kill = should_remove_suid(dentry);
> +	int killsuid = should_remove_suid(dentry);
> +	int killpriv = security_inode_need_killpriv(dentry);
> +	int error = 0;
>
> -	if (unlikely(kill))
> -		return __remove_suid(dentry, kill);
> +	if (killpriv < 0)
> +		return killpriv;
> +	if (killpriv)
> +		error = security_inode_killpriv(dentry);
> +	if (!error && killsuid)
> +		error = __remove_suid(dentry, killsuid);
>
> -	return 0;
> +	return error;
>  }
>  EXPORT_SYMBOL(remove_suid);
>
> -chris
* Re: remove_suid bangs on xattrs
From: Serge E. Hallyn @ 2010-08-18 2:41 UTC
To: Chris Mason, linux-fsdevel, serge.hallyn, Andrew Morgan

Quoting Chris Mason (chris.mason@oracle.com):
> [ sorry, corrected cc list ]

Thanks - sorry for the inconvenience.  I'm also cc:ing Andrew Morgan
for another opinion.

> On Mon, Aug 16, 2010 at 03:38:12PM -0400, Chris Mason wrote:
> > Hi everyone,
> >
> > I'm looking into a 2.6.35 btrfs performance regression, and perf tells
> > me that I'm spending a lot of time hammering on xattrs inside
> > remove_suid.  This is pretty surprising because I'm running as root, and
> > my files are not suid.  Looking back to this commit:
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b53767719b6cd8789392ea3e7e2eb7b8906898f0
> >
> > We've changed remove_suid's semantics from
> >
> > if (file_is_suid)
> > 	try to remove it

(but only if not capable(CAP_FSETID))

> > To something that always checks to see if we have removal permissions.

(not really - security_inode_need_killpriv() should return <0 only if
there was an actual error, and the write needs to be cancelled altogether.
It returns 0 if privs don't need to be removed, and >0 if they do.)

> > Was this intentional?  It didn't cause my 2.6.35 regression (that's all
> > my fault) but it does look wrong to me:

If I'm thinking right, I think the key change we should make is to have
CAP_FSETID be honored for maintaining file capabilities.

That would have two (good) results:

	1. we should be able to re-arrange the code to check for CAP_FSETID
	before bothering to check for file capabilities, so we can save the
	getxattrs which I assume were what you were finding?  Even if it
	wasn't the cause of your performance regression, it should be an
	improvement.

	2. I think it can be seen as a semantic fix.  We mostly try to
	respect suid behavior for file caps, so it will be more consistent
	to honor CAP_FSETID for file capabilities.

Andrew, what do you think?

> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 4fb1546..79f24a9 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -1627,12 +1627,18 @@ int __remove_suid(struct dentry *dentry, int kill)
> >
> >  int remove_suid(struct dentry *dentry)
> >  {
> > -	int kill = should_remove_suid(dentry);
> > +	int killsuid = should_remove_suid(dentry);
> > +	int killpriv = security_inode_need_killpriv(dentry);
> > +	int error = 0;
> >
> > -	if (unlikely(kill))
> > -		return __remove_suid(dentry, kill);
> > +	if (killpriv < 0)
> > +		return killpriv;
> > +	if (killpriv)
> > +		error = security_inode_killpriv(dentry);
> > +	if (!error && killsuid)
> > +		error = __remove_suid(dentry, killsuid);
> >
> > -	return 0;
> > +	return error;
> >  }
> >  EXPORT_SYMBOL(remove_suid);
> >
> > -chris

thanks,
-serge
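The reordering in Serge's point 1, testing the writer's capability before
touching the file's capability xattr, can be sketched in user space as
below.  This is only a model under stated assumptions: model_inode,
model_lookup_file_caps() and model_need_killpriv() are hypothetical names
and not kernel interfaces, and writer_is_exempt stands in for whichever
capability test is finally chosen (CAP_FSETID in Serge's mail).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for an inode; not a kernel structure. */
struct model_inode {
	bool has_file_caps;	/* in the kernel this would need a getxattr */
	int xattr_lookups;	/* counts simulated security.capability lookups */
};

/* Simulates the expensive lookup and records that it happened. */
static bool model_lookup_file_caps(struct model_inode *inode)
{
	assert(inode != NULL);
	inode->xattr_lookups++;
	return inode->has_file_caps;
}

/*
 * Returns 1 if file capabilities must be dropped on write, 0 otherwise.
 * The cheap capability test runs first, so an exempt writer never
 * triggers the simulated xattr lookup.
 */
static int model_need_killpriv(struct model_inode *inode, bool writer_is_exempt)
{
	if (writer_is_exempt)
		return 0;
	return model_lookup_file_caps(inode) ? 1 : 0;
}
```

With this ordering, a writer that passes the capability test costs zero
lookups per write, and only non-exempt writers pay for the xattr read.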
* Re: remove_suid bangs on xattrs
From: Andrew G. Morgan @ 2010-08-20 5:31 UTC
To: Serge E. Hallyn; +Cc: Chris Mason, linux-fsdevel

On Tue, Aug 17, 2010 at 7:41 PM, Serge E. Hallyn <serge.hallyn@canonical.com> wrote:
> Quoting Chris Mason (chris.mason@oracle.com):
>> [ sorry, corrected cc list ]
>
> Thanks - sorry for the inconvenience.  I'm also cc:ing Andrew Morgan
> for another opinion.
>
>> On Mon, Aug 16, 2010 at 03:38:12PM -0400, Chris Mason wrote:
>> > Hi everyone,
>> >
>> > I'm looking into a 2.6.35 btrfs performance regression, and perf tells
>> > me that I'm spending a lot of time hammering on xattrs inside
>> > remove_suid.  This is pretty surprising because I'm running as root, and
>> > my files are not suid.  Looking back to this commit:
>> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b53767719b6cd8789392ea3e7e2eb7b8906898f0
>> >
>> > We've changed remove_suid's semantics from
>> >
>> > if (file_is_suid)
>> > 	try to remove it
>
> (but only if not capable(CAP_FSETID))

I disagree. I think the relevant capability test should be with
respect to CAP_SETFCAP.

Since this is the capability that allows you to put a capability on a
file, it should be the one to retain it if the file is modified.

>> > To something that always checks to see if we have removal permissions.
>
> (not really - security_inode_need_killpriv() should return <0 only if
> there was an actual error, and the write needs to be cancelled altogether.
> It returns 0 if privs don't need to be removed, and >0 if they do.)
>
>> > Was this intentional?  It didn't cause my 2.6.35 regression (that's all
>> > my fault) but it does look wrong to me:
>
> If I'm thinking right, I think the key change we should make is to have
> CAP_FSETID be honored for maintaining file capabilities.
>
> That would have two (good) results:
>
>        1. we should be able to re-arrange the code to check for CAP_FSETID
>        before bothering to check for file capabilities, so we can save the
>        getxattrs which I assume were what you were finding?  Even if it
>        wasn't the cause of your performance regression, it should be an
>        improvement.
>
>        2. I think it can be seen as a semantic fix.  We mostly try to
>        respect suid behavior for file caps, so it will be more consistent
>        to honor CAP_FSETID for file capabilities.
>
> Andrew, what do you think?
>

I think the test should be with respect to CAP_SETFCAP, but I agree
with the rest of your comments.

Lots of small writes to 'any' file also tend to bang on this code.
I've been wondering if it might make sense to cache, in the inode,
that a file does *not* have any capabilities associated with it. That
way the kernel wouldn't need to look up the xattrs twice for the same
incapable file - which is, by far, the common case.

Cheers

Andrew

>> > diff --git a/mm/filemap.c b/mm/filemap.c
>> > index 4fb1546..79f24a9 100644
>> > --- a/mm/filemap.c
>> > +++ b/mm/filemap.c
>> > @@ -1627,12 +1627,18 @@ int __remove_suid(struct dentry *dentry, int kill)
>> >
>> >  int remove_suid(struct dentry *dentry)
>> >  {
>> > -	int kill = should_remove_suid(dentry);
>> > +	int killsuid = should_remove_suid(dentry);
>> > +	int killpriv = security_inode_need_killpriv(dentry);
>> > +	int error = 0;
>> >
>> > -	if (unlikely(kill))
>> > -		return __remove_suid(dentry, kill);
>> > +	if (killpriv < 0)
>> > +		return killpriv;
>> > +	if (killpriv)
>> > +		error = security_inode_killpriv(dentry);
>> > +	if (!error && killsuid)
>> > +		error = __remove_suid(dentry, killsuid);
>> >
>> > -	return 0;
>> > +	return error;
>> >  }
>> >  EXPORT_SYMBOL(remove_suid);
>> >
>> > -chris
>
> thanks,
> -serge
* Re: remove_suid bangs on xattrs
From: Serge E. Hallyn @ 2010-08-20 12:25 UTC
To: Andrew G. Morgan; +Cc: Serge E. Hallyn, Chris Mason, linux-fsdevel

Quoting Andrew G. Morgan (morgan@kernel.org):
> On Tue, Aug 17, 2010 at 7:41 PM, Serge E. Hallyn
> <serge.hallyn@canonical.com> wrote:
> > Quoting Chris Mason (chris.mason@oracle.com):
> >> [ sorry, corrected cc list ]
> >
> > Thanks - sorry for the inconvenience.  I'm also cc:ing Andrew Morgan
> > for another opinion.
> >
> >> On Mon, Aug 16, 2010 at 03:38:12PM -0400, Chris Mason wrote:
> >> > Hi everyone,
> >> >
> >> > I'm looking into a 2.6.35 btrfs performance regression, and perf tells
> >> > me that I'm spending a lot of time hammering on xattrs inside
> >> > remove_suid.  This is pretty surprising because I'm running as root, and
> >> > my files are not suid.  Looking back to this commit:
> >> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b53767719b6cd8789392ea3e7e2eb7b8906898f0
> >> >
> >> > We've changed remove_suid's semantics from
> >> >
> >> > if (file_is_suid)
> >> > 	try to remove it
> >
> > (but only if not capable(CAP_FSETID))
>
> I disagree. I think the relevant capability test should be with
> respect to CAP_SETFCAP.
>
> Since this is the capability that allows you to put a capability on a
> file, it should be the one to retain it if the file is modified.

I'm ok with that.

> >> > To something that always checks to see if we have removal permissions.
> >
> > (not really - security_inode_need_killpriv() should return <0 only if
> > there was an actual error, and the write needs to be cancelled altogether.
> > It returns 0 if privs don't need to be removed, and >0 if they do.)
> >
> >> > Was this intentional?  It didn't cause my 2.6.35 regression (that's all
> >> > my fault) but it does look wrong to me:
> >
> > If I'm thinking right, I think the key change we should make is to have
> > CAP_FSETID be honored for maintaining file capabilities.
> >
> > That would have two (good) results:
> >
> >        1. we should be able to re-arrange the code to check for CAP_FSETID
> >        before bothering to check for file capabilities, so we can save the
> >        getxattrs which I assume were what you were finding?  Even if it
> >        wasn't the cause of your performance regression, it should be an
> >        improvement.
> >
> >        2. I think it can be seen as a semantic fix.  We mostly try to
> >        respect suid behavior for file caps, so it will be more consistent
> >        to honor CAP_FSETID for file capabilities.
> >
> > Andrew, what do you think?
> >
>
> I think the test should be with respect to CAP_SETFCAP, but I agree
> with the rest of your comments.

I also point out, with some shame, that on first reading Chris' email,
I had to look at that code more than I should have needed to in order to
recall the details.  So I think the function could stand some cleaning
up/simplifying.  I'll try to take a look at this next week.

> Lots of small writes to 'any' file also tend to bang on this code.
> I've been wondering if it might make sense to cache, in the inode,
> that a file does *not* have any capabilities associated with it. That
> way the kernel wouldn't need to look up the xattrs twice for the same
> incapable file - which is, by far, the common case.

That could also be shared with a new (old) optional xattr-free
file-backed-filecaps mount option :)

thanks,
-serge
[parent not found: <5E83F6C3-2B1E-4FBF-960C-27364528813C@dilger.ca>]
* Re: remove_suid bangs on xattrs
From: Serge E. Hallyn @ 2010-09-02 16:02 UTC
To: Andreas Dilger
Cc: Andrew G. Morgan, Serge E. Hallyn, Chris Mason, linux-fsdevel

Quoting Andreas Dilger (adilger@dilger.ca):
> On 2010-08-19, at 23:31, Andrew G. Morgan wrote:
> > Lots of small writes to 'any' file also tend to bang on this code.
> > I've been wondering if it might make sense to cache, in the inode,
> > that a file does *not* have any capabilities associated with it. That
> > way the kernel wouldn't need to look up the xattrs twice for the same
> > incapable file - which is, by far, the common case.
>
> That would be a blessing.  I see a steady stream of
> getxattr("security.capability") requests, and being able to disable this

Do you think it would help at all to add an S_NO_POSIXCAPS
to i_flags, and set that the first time we find that
getxattr("security.capability") finds no capabilities?
I.e. are these requests frequently for the same inode, or
always for new ones?

> (possibly even in the superblock with a flag) would avoid expensive RPCs on a
> network filesystem.

Hmm, as it is, the get_vfs_caps_from_disk() does not get called
if MNT_NOSUID.  But the cap_inode_need_killpriv() does, so a
quick way to reduce that # for you would be to pass the inode
to security_inode_need_killpriv (so it can get to mnt), and
have that check for MNT_NOSUID, and then you can mount your
network fs's with MNT_NOSUID...  Would that help you?

-serge
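The per-inode negative cache discussed here (Andrew's idea, Serge's
S_NO_POSIXCAPS suggestion) could look roughly like the user-space model
below.  It is a sketch under assumptions, not kernel code: the flag value,
struct model_inode and both helpers are hypothetical, and the rule that
setting the xattr must clear the flag is an assumption of this sketch that
the thread does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit modelled on the S_NO_POSIXCAPS idea. */
#define MODEL_S_NO_POSIXCAPS 0x1u

/* Hypothetical user-space stand-in for an inode; not a kernel structure. */
struct model_inode {
	unsigned int flags;	/* models i_flags */
	bool has_caps_xattr;	/* what a real getxattr would discover */
	int xattr_lookups;	/* counts simulated getxattr calls */
};

/* Returns true if the file carries capabilities, caching a negative answer. */
static bool model_has_file_caps(struct model_inode *inode)
{
	assert(inode != NULL);
	if (inode->flags & MODEL_S_NO_POSIXCAPS)
		return false;		/* cached: no lookup needed */
	inode->xattr_lookups++;		/* simulated getxattr */
	if (!inode->has_caps_xattr) {
		inode->flags |= MODEL_S_NO_POSIXCAPS;
		return false;
	}
	return true;
}

/* Assumed invalidation rule: adding the xattr must drop the cached "no". */
static void model_set_caps_xattr(struct model_inode *inode)
{
	assert(inode != NULL);
	inode->has_caps_xattr = true;
	inode->flags &= ~MODEL_S_NO_POSIXCAPS;
}
```

Repeated writes to the same capability-free inode then cost a single
simulated lookup, which is the case the question about "the same inode, or
always new ones" is probing.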
* Re: remove_suid bangs on xattrs
From: Andreas Dilger @ 2010-09-02 21:01 UTC
To: Serge E. Hallyn; +Cc: Andrew G. Morgan, Chris Mason, linux-fsdevel

On 2010-09-02, at 09:02, Serge E. Hallyn wrote:
> Quoting Andreas Dilger (adilger@dilger.ca):
>> On 2010-08-19, at 23:31, Andrew G. Morgan wrote:
>>> Lots of small writes to 'any' file also tend to bang on this code.
>>> I've been wondering if it might make sense to cache, in the inode,
>>> that a file does *not* have any capabilities associated with it. That
>>> way the kernel wouldn't need to look up the xattrs twice for the same
>>> incapable file - which is, by far, the common case.
>>
>> That would be a blessing.  I see a steady stream of
>> getxattr("security.capability") requests, and being able to disable this
>
> Do you think it would help at all to add an S_NO_POSIXCAPS
> to i_flags, and set that the first time we find that
> getxattr("security.capability") finds no capabilities?
> I.e. are these requests frequently for the same inode, or
> always for new ones?

That would be useful, or as you suggest a MNT_* flag.

>> (possibly even in the superblock with a flag) would avoid expensive RPCs on a
>> network filesystem.
>
> Hmm, as it is, the get_vfs_caps_from_disk() does not get called
> if MNT_NOSUID.  But the cap_inode_need_killpriv() does, so a
> quick way to reduce that # for you would be to pass the inode
> to security_inode_need_killpriv (so it can get to mnt), and
> have that check for MNT_NOSUID, and then you can mount your
> network fs's with MNT_NOSUID...  Would that help you?

Except there are users that do use SUID binaries on network filesystems
(e.g. root or /usr filesystems).  Something more like MNT_NOCAPXATTR
would be better.

Cheers, Andreas
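The difference between the two mount-level options in this exchange can be
modelled as below.  This is a hedged sketch: MNT_NOCAPXATTR exists only as
a suggestion in this thread, the MODEL_* values and helper names are
hypothetical, and skipping the lookup on a nosuid mount reflects Serge's
proposal rather than confirmed kernel behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MNT_NOSUID     0x1u	/* models the nosuid mount option */
#define MODEL_MNT_NOCAPXATTR 0x2u	/* hypothetical; only proposed in the thread */

/* Hypothetical user-space stand-in for a file; not a kernel structure. */
struct model_file {
	bool suid_bit;
};

/* SUID keeps working unless the mount is nosuid. */
static bool model_suid_honored(unsigned int mnt_flags, const struct model_file *f)
{
	assert(f != NULL);
	return f->suid_bit && !(mnt_flags & MODEL_MNT_NOSUID);
}

/* Either flag lets the write path skip the capability xattr lookup. */
static bool model_caps_lookup_needed(unsigned int mnt_flags)
{
	return !(mnt_flags & (MODEL_MNT_NOSUID | MODEL_MNT_NOCAPXATTR));
}
```

Under these assumptions both flags avoid the lookup, but only the dedicated
flag leaves SUID binaries on a network root or /usr filesystem usable.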