linux-fsdevel.vger.kernel.org archive mirror
* remove_suid bangs on xattrs
@ 2010-08-16 19:38 Chris Mason
  2010-08-16 19:44 ` Chris Mason
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Mason @ 2010-08-16 19:38 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: serue

Hi everyone,

I'm looking into a 2.6.35 btrfs performance regression, and perf tells
me that I'm spending a lot of time hammering on xattrs inside
remove_suid.  This is pretty surprising because I'm running as root, and
my files are not suid.  Looking back to this commit:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b53767719b6cd8789392ea3e7e2eb7b8906898f0

We've changed remove_suid's semantics from

if (file_is_suid)
    try to remove it

To something that always checks to see if we have removal permissions.

Was this intentional?  It didn't cause my 2.6.35 regression (that's all
my fault) but it does look wrong to me:

diff --git a/mm/filemap.c b/mm/filemap.c
index 4fb1546..79f24a9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1627,12 +1627,18 @@ int __remove_suid(struct dentry *dentry, int kill)
 
 int remove_suid(struct dentry *dentry)
 {
-       int kill = should_remove_suid(dentry);
+       int killsuid = should_remove_suid(dentry);
+       int killpriv = security_inode_need_killpriv(dentry);
+       int error = 0;
 
-       if (unlikely(kill))
-               return __remove_suid(dentry, kill);
+       if (killpriv < 0)
+               return killpriv;
+       if (killpriv)
+               error = security_inode_killpriv(dentry);
+       if (!error && killsuid)
+               error = __remove_suid(dentry, killsuid);
 
-       return 0;
+       return error;
 }
 EXPORT_SYMBOL(remove_suid);
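
The cost difference can be shown with a small userspace model (not kernel code). It is only a sketch: `struct fake_inode`, the `xattr_lookups` counter and the `model_*` helpers are invented for illustration, and the real should_remove_suid() also considers setgid bits and CAP_FSETID, which are left out here. The point is that the old path never touches xattrs for a non-suid file, while the new path consults them on every call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode; only the fields the model needs. */
struct fake_inode {
	bool suid;          /* S_ISUID set in the mode bits */
	bool has_caps;      /* a security.capability xattr exists */
	int  xattr_lookups; /* how many times the model read xattrs */
};

/* Models should_remove_suid(): a mode-bit test, no xattr access. */
static int model_should_remove_suid(const struct fake_inode *ino)
{
	return ino->suid ? 1 : 0;
}

/* Models security_inode_need_killpriv(): it has to read the xattr. */
static int model_need_killpriv(struct fake_inode *ino)
{
	ino->xattr_lookups++;
	return ino->has_caps ? 1 : 0;
}

/* Before the commit: nothing happens unless the file is suid. */
static int model_remove_suid_old(struct fake_inode *ino)
{
	if (model_should_remove_suid(ino))
		ino->suid = false;
	return 0;
}

/* After the commit: the xattr is consulted on every call. */
static int model_remove_suid_new(struct fake_inode *ino)
{
	int killsuid = model_should_remove_suid(ino);
	int killpriv = model_need_killpriv(ino);

	if (killpriv < 0)
		return killpriv;
	if (killpriv)
		ino->has_caps = false;
	if (killsuid)
		ino->suid = false;
	return 0;
}
```

Calling `model_remove_suid_new()` repeatedly on a plain file increments `xattr_lookups` once per call, whereas `model_remove_suid_old()` leaves it at zero.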

-chris



* Re: remove_suid bangs on xattrs
  2010-08-16 19:38 remove_suid bangs on xattrs Chris Mason
@ 2010-08-16 19:44 ` Chris Mason
  2010-08-18  2:41   ` Serge E. Hallyn
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Mason @ 2010-08-16 19:44 UTC (permalink / raw)
  To: linux-fsdevel, serge.hallyn

[ sorry, corrected cc list ]

On Mon, Aug 16, 2010 at 03:38:12PM -0400, Chris Mason wrote:
> Hi everyone,
> 
> I'm looking into a 2.6.35 btrfs performance regression, and perf tells
> me that I'm spending a lot of time hammering on xattrs inside
> remove_suid.  This is pretty surprising because I'm running as root, and
> my files are not suid.  Looking back to this commit:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b53767719b6cd8789392ea3e7e2eb7b8906898f0
> 
> We've changed remove_suid's semantics from
> 
> if (file_is_suid)
>     try to remove it
> 
> To something that always checks to see if we have removal permissions.
> 
> Was this intentional?  It didn't cause my 2.6.35 regression (that's all
> my fault) but it does look wrong to me:
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 4fb1546..79f24a9 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1627,12 +1627,18 @@ int __remove_suid(struct dentry *dentry, int kill)
>  
>  int remove_suid(struct dentry *dentry)
>  {
> -       int kill = should_remove_suid(dentry);
> +       int killsuid = should_remove_suid(dentry);
> +       int killpriv = security_inode_need_killpriv(dentry);
> +       int error = 0;
>  
> -       if (unlikely(kill))
> -               return __remove_suid(dentry, kill);
> +       if (killpriv < 0)
> +               return killpriv;
> +       if (killpriv)
> +               error = security_inode_killpriv(dentry);
> +       if (!error && killsuid)
> +               error = __remove_suid(dentry, killsuid);
>  
> -       return 0;
> +       return error;
>  }
>  EXPORT_SYMBOL(remove_suid);
> 
> -chris
> 


* Re: remove_suid bangs on xattrs
  2010-08-16 19:44 ` Chris Mason
@ 2010-08-18  2:41   ` Serge E. Hallyn
  2010-08-20  5:31     ` Andrew G. Morgan
  0 siblings, 1 reply; 7+ messages in thread
From: Serge E. Hallyn @ 2010-08-18  2:41 UTC (permalink / raw)
  To: Chris Mason, linux-fsdevel, serge.hallyn, Andrew Morgan

Quoting Chris Mason (chris.mason@oracle.com):
> [ sorry, corrected cc list ]

Thanks - sorry for the inconvenience.  I'm also cc:ing Andrew Morgan
for another opinion.

> On Mon, Aug 16, 2010 at 03:38:12PM -0400, Chris Mason wrote:
> > Hi everyone,
> > 
> > I'm looking into a 2.6.35 btrfs performance regression, and perf tells
> > me that I'm spending a lot of time hammering on xattrs inside
> > remove_suid.  This is pretty surprising because I'm running as root, and
> > my files are not suid.  Looking back to this commit:
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b53767719b6cd8789392ea3e7e2eb7b8906898f0
> > 
> > We've changed remove_suid's semantics from
> > 
> > if (file_is_suid)
> >     try to remove it

(but only if not capable(CAP_FSETID))

> > To something that always checks to see if we have removal permissions.

(not really - security_inode_need_killpriv() should return <0 only if
there was an actual error, and the write needs to be cancelled altogether.
It returns 0 if privs don't need to be removed, and >0 if they do.)
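
As a rough illustration of that three-way return convention, here is a userspace sketch. `enum write_action` and `classify_killpriv()` are hypothetical names made up for this example and do not exist in the kernel; the function only maps a need_killpriv-style return value onto what the caller would do next.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical labels for the three cases described above. */
enum write_action {
	WRITE_ABORT,      /* negative: real error (e.g. -EIO), cancel the write */
	WRITE_PLAIN,      /* zero: nothing to strip, just write */
	WRITE_STRIP_PRIV  /* positive: strip privileges, then write */
};

/* Map a need_killpriv-style return value onto the caller's next step. */
static enum write_action classify_killpriv(int ret)
{
	if (ret < 0)
		return WRITE_ABORT;
	if (ret == 0)
		return WRITE_PLAIN;
	return WRITE_STRIP_PRIV;
}
```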

> > Was this intentional?  It didn't cause my 2.6.35 regression (that's all
> > my fault) but it does look wrong to me:

If I'm thinking right, I think the key change we should make is to have
CAP_FSETID be honored for maintaining file capabilities.

That would have two (good) results:

1. we should be able to re-arrange the code to check for CAP_FSETID
before bothering to check for file capabilities, so we can save the
getxattrs which I assume were what you were finding?  Even if it
wasn't the cause of your performance regression, it should be an
improvement.

2. I think it can be seen as a semantic fix.  We mostly try to
respect suid behavior for file caps, so it will be more consistent
to honor CAP_FSETID for file capabilities.
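
The reordering in point 1 can be sketched in userspace as follows. This is a model under stated assumptions, not the kernel implementation: `struct fake_task`, `struct fake_inode` and the `model_*` helpers are invented names, and `has_cap_fsetid` stands in for a capable(CAP_FSETID) test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct fake_task {
	bool has_cap_fsetid; /* models capable(CAP_FSETID) */
};

struct fake_inode {
	bool has_caps;       /* security.capability xattr present */
	int  xattr_lookups;  /* number of modeled getxattr calls */
};

/* Modeled xattr read: the expensive step we want to avoid. */
static bool model_read_caps_xattr(struct fake_inode *ino)
{
	ino->xattr_lookups++;
	return ino->has_caps;
}

/*
 * Returns 1 if file capabilities must be dropped, 0 otherwise.
 * The capability test comes first, so a privileged writer
 * never reaches the xattr lookup.
 */
static int model_need_killpriv(const struct fake_task *task,
			       struct fake_inode *ino)
{
	if (task->has_cap_fsetid)
		return 0;
	return model_read_caps_xattr(ino) ? 1 : 0;
}
```

With this ordering a writer holding the capability performs no modeled xattr reads at all, which is the saving described in point 1; it also means such a writer keeps the file capabilities in place, which is the semantic change described in point 2.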

Andrew, what do you think?

> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 4fb1546..79f24a9 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -1627,12 +1627,18 @@ int __remove_suid(struct dentry *dentry, int kill)
> >  
> >  int remove_suid(struct dentry *dentry)
> >  {
> > -       int kill = should_remove_suid(dentry);
> > +       int killsuid = should_remove_suid(dentry);
> > +       int killpriv = security_inode_need_killpriv(dentry);
> > +       int error = 0;
> >  
> > -       if (unlikely(kill))
> > -               return __remove_suid(dentry, kill);
> > +       if (killpriv < 0)
> > +               return killpriv;
> > +       if (killpriv)
> > +               error = security_inode_killpriv(dentry);
> > +       if (!error && killsuid)
> > +               error = __remove_suid(dentry, killsuid);
> >  
> > -       return 0;
> > +       return error;
> >  }
> >  EXPORT_SYMBOL(remove_suid);
> > 
> > -chris

thanks,
-serge


* Re: remove_suid bangs on xattrs
  2010-08-18  2:41   ` Serge E. Hallyn
@ 2010-08-20  5:31     ` Andrew G. Morgan
  2010-08-20 12:25       ` Serge E. Hallyn
       [not found]       ` <5E83F6C3-2B1E-4FBF-960C-27364528813C@dilger.ca>
  0 siblings, 2 replies; 7+ messages in thread
From: Andrew G. Morgan @ 2010-08-20  5:31 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Chris Mason, linux-fsdevel

On Tue, Aug 17, 2010 at 7:41 PM, Serge E. Hallyn
<serge.hallyn@canonical.com> wrote:
> Quoting Chris Mason (chris.mason@oracle.com):
>> [ sorry, corrected cc list ]
>
> Thanks - sorry for the inconvenience.  I'm also cc:ing Andrew Morgan
> for another opinion.
>
>> On Mon, Aug 16, 2010 at 03:38:12PM -0400, Chris Mason wrote:
>> > Hi everyone,
>> >
>> > I'm looking into a 2.6.35 btrfs performance regression, and perf tells
>> > me that I'm spending a lot of time hammering on xattrs inside
>> > remove_suid.  This is pretty surprising because I'm running as root, and
>> > my files are not suid.  Looking back to this commit:
>> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b53767719b6cd8789392ea3e7e2eb7b8906898f0
>> >
>> > We've changed remove_suid's semantics from
>> >
>> > if (file_is_suid)
>> >     try to remove it
>
> (but only if not capable(CAP_FSETID))

I disagree. I think the relevant capability test should be with
respect to CAP_SETFCAP.

Since this is the capability that allows you to put a capability on a
file, it should be the one to retain it if the file is modified.

>> > To something that always checks to see if we have removal permissions.
>
> (not really - security_inode_need_killpriv() should return <0 only if
> there was an actual error, and the write needs to be cancelled altogether.
> It returns 0 if privs don't need to be removed, and >0 if they do.)
>
>> > Was this intentional?  It didn't cause my 2.6.35 regression (that's all
>> > my fault) but it does look wrong to me:
>
> If I'm thinking right, I think the key change we should make is to have
> CAP_FSETID be honored for maintaining file capabilities.
>
> That would have two (good) results:
>
> 1. we should be able to re-arrange the code to check for CAP_FSETID
> before bothering to check for file capabilities, so we can save the
> getxattrs which I assume were what you were finding?  Even if it
> wasn't the cause of your performance regression, it should be an
> improvement.
>
> 2. I think it can be seen as a semantic fix.  We mostly try to
> respect suid behavior for file caps, so it will be more consistent
> to honor CAP_FSETID for file capabilities.
>
> Andrew, what do you think?
>

I think the test should be with respect to CAP_SETFCAP, but I agree
with the rest of your comments.

Lots of small writes to 'any' file also tend to bang on this code.
I've been wondering if it might make sense to cache, in the inode,
that a file does *not* have any capabilities associated with it. That
way the kernel wouldn't need to look up the xattrs twice for the same
incapable file - which is, by far, the common case.
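
A minimal userspace sketch of such a negative cache is below. Everything in it is an assumption made for illustration: `struct fake_inode`, the `known_no_caps` field and the `model_*` helpers are invented names, and a real implementation would also have to deal with locking and with filesystems where xattrs can change behind the kernel's back.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inode model; field names are made up for this sketch. */
struct fake_inode {
	bool has_caps;       /* what the on-disk xattr would say */
	bool known_no_caps;  /* cached negative result */
	int  xattr_lookups;  /* number of modeled getxattr calls */
};

/* Modeled xattr read: the expensive step the cache is meant to skip. */
static bool model_read_caps_xattr(struct fake_inode *ino)
{
	ino->xattr_lookups++;
	return ino->has_caps;
}

/* Returns 1 if capabilities must be dropped, 0 if there are none. */
static int model_need_killpriv(struct fake_inode *ino)
{
	if (ino->known_no_caps)
		return 0;               /* answered from the cache */
	if (model_read_caps_xattr(ino))
		return 1;
	ino->known_no_caps = true;      /* remember the negative answer */
	return 0;
}

/* Any path that sets the xattr must invalidate the cached answer. */
static void model_set_caps(struct fake_inode *ino)
{
	ino->has_caps = true;
	ino->known_no_caps = false;
}
```

Only the negative answer is cached, since that is the common case; the invalidation in `model_set_caps()` is the part that is easy to get wrong.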

Cheers

Andrew

>> > diff --git a/mm/filemap.c b/mm/filemap.c
>> > index 4fb1546..79f24a9 100644
>> > --- a/mm/filemap.c
>> > +++ b/mm/filemap.c
>> > @@ -1627,12 +1627,18 @@ int __remove_suid(struct dentry *dentry, int kill)
>> >
>> >  int remove_suid(struct dentry *dentry)
>> >  {
>> > -       int kill = should_remove_suid(dentry);
>> > +       int killsuid = should_remove_suid(dentry);
>> > +       int killpriv = security_inode_need_killpriv(dentry);
>> > +       int error = 0;
>> >
>> > -       if (unlikely(kill))
>> > -               return __remove_suid(dentry, kill);
>> > +       if (killpriv < 0)
>> > +               return killpriv;
>> > +       if (killpriv)
>> > +               error = security_inode_killpriv(dentry);
>> > +       if (!error && killsuid)
>> > +               error = __remove_suid(dentry, killsuid);
>> >
>> > -       return 0;
>> > +       return error;
>> >  }
>> >  EXPORT_SYMBOL(remove_suid);
>> >
>> > -chris
>
> thanks,
> -serge
>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: remove_suid bangs on xattrs
  2010-08-20  5:31     ` Andrew G. Morgan
@ 2010-08-20 12:25       ` Serge E. Hallyn
       [not found]       ` <5E83F6C3-2B1E-4FBF-960C-27364528813C@dilger.ca>
  1 sibling, 0 replies; 7+ messages in thread
From: Serge E. Hallyn @ 2010-08-20 12:25 UTC (permalink / raw)
  To: Andrew G. Morgan; +Cc: Serge E. Hallyn, Chris Mason, linux-fsdevel

Quoting Andrew G. Morgan (morgan@kernel.org):
> On Tue, Aug 17, 2010 at 7:41 PM, Serge E. Hallyn
> <serge.hallyn@canonical.com> wrote:
> > Quoting Chris Mason (chris.mason@oracle.com):
> >> [ sorry, corrected cc list ]
> >
> > Thanks - sorry for the inconvenience.  I'm also cc:ing Andrew Morgan
> > for another opinion.
> >
> >> On Mon, Aug 16, 2010 at 03:38:12PM -0400, Chris Mason wrote:
> >> > Hi everyone,
> >> >
> >> > I'm looking into a 2.6.35 btrfs performance regression, and perf tells
> >> > me that I'm spending a lot of time hammering on xattrs inside
> >> > remove_suid.  This is pretty surprising because I'm running as root, and
> >> > my files are not suid.  Looking back to this commit:
> >> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b53767719b6cd8789392ea3e7e2eb7b8906898f0
> >> >
> >> > We've changed remove_suid's semantics from
> >> >
> >> > if (file_is_suid)
> >> >     try to remove it
> >
> > (but only if not capable(CAP_FSETID))
> 
> I disagree. I think the relevant capability test should be with
> respect to CAP_SETFCAP.
> 
> Since this is the capability that allows you to put a capability on a
> file, it should be the one to retain it if the file is modified.

I'm ok with that.

> >> > To something that always checks to see if we have removal permissions.
> >
> > (not really - security_inode_need_killpriv() should return <0 only if
> > there was an actual error, and the write needs to be cancelled altogether.
> > It returns 0 if privs don't need to be removed, and >0 if they do.)
> >
> >> > Was this intentional?  It didn't cause my 2.6.35 regression (that's all
> >> > my fault) but it does look wrong to me:
> >
> > If I'm thinking right, I think the key change we should make is to have
> > CAP_FSETID be honored for maintaining file capabilities.
> >
> > That would have two (good) results:
> >
> > 1. we should be able to re-arrange the code to check for CAP_FSETID
> > before bothering to check for file capabilities, so we can save the
> > getxattrs which I assume were what you were finding?  Even if it
> > wasn't the cause of your performance regression, it should be an
> > improvement.
> >
> > 2. I think it can be seen as a semantic fix.  We mostly try to
> > respect suid behavior for file caps, so it will be more consistent
> > to honor CAP_FSETID for file capabilities.
> >
> > Andrew, what do you think?
> >
> 
> I think the test should be with respect to CAP_SETFCAP, but I agree
> with the rest of your comments.

I also point out, with some shame, that on first reading Chris' email,
I had to stare at that code longer than I should have needed to in order
to recall the details.  So I think the function could stand some cleaning
up/simplifying.

I'll try to take a look at this next week.

> Lots of small writes to 'any' file also tends to bang on this code.
> I've been wondering if it might make sense to cache, in the inode,
> that a file does *not* have any capabilities associated with it. That
> way the kernel wouldn't need to look up the xattrs twice for the same
> incapable file - which is, by far, the common case.

That could also be shared with a new (old) optional xattr-free
file-backed-filecaps mount option :)

thanks,
-serge


* Re: remove_suid bangs on xattrs
       [not found]       ` <5E83F6C3-2B1E-4FBF-960C-27364528813C@dilger.ca>
@ 2010-09-02 16:02         ` Serge E. Hallyn
  2010-09-02 21:01           ` Andreas Dilger
  0 siblings, 1 reply; 7+ messages in thread
From: Serge E. Hallyn @ 2010-09-02 16:02 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Andrew G. Morgan, Serge E. Hallyn, Chris Mason, linux-fsdevel

Quoting Andreas Dilger (adilger@dilger.ca):
> On 2010-08-19, at 23:31, Andrew G. Morgan wrote:
> > Lots of small writes to 'any' file also tends to bang on this code.
> > I've been wondering if it might make sense to cache, in the inode,
> > that a file does *not* have any capabilities associated with it. That
> > way the kernel wouldn't need to look up the xattrs twice for the same
> > incapable file - which is, by far, the common case.
> 
> That would be a blessing.  I see a steady stream of
> getxattr("security.capability") requests, and being able to disable this

Do you think it would help at all to add a S_NO_POSIXCAPS
to i_flags, and set that the first time we find that
getxattr("security.capability") finds no capabilities?
I.e. are these requests frequently for the same inode, or
always for new ones?

> (possibly even in the superblock with a flag) would avoid expensive RPCs on a
> network filesystem.

Hmm, as it is, the get_vfs_caps_from_disk() does not get called
if MNT_NOSUID.  But the cap_inode_need_killpriv() does, so a
quick way to reduce that # for you would be to pass the inode
to security_inode_need_killpriv (so it can get to mnt), and
have that check for MNT_NOSUID, and then you can mount your
network fs's with MNT_NOSUID...  Would that help you?


-serge


* Re: remove_suid bangs on xattrs
  2010-09-02 16:02         ` Serge E. Hallyn
@ 2010-09-02 21:01           ` Andreas Dilger
  0 siblings, 0 replies; 7+ messages in thread
From: Andreas Dilger @ 2010-09-02 21:01 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Andrew G. Morgan, Chris Mason, linux-fsdevel

On 2010-09-02, at 09:02, Serge E. Hallyn wrote:
> Quoting Andreas Dilger (adilger@dilger.ca):
>> On 2010-08-19, at 23:31, Andrew G. Morgan wrote:
>>> Lots of small writes to 'any' file also tends to bang on this code.
>>> I've been wondering if it might make sense to cache, in the inode,
>>> that a file does *not* have any capabilities associated with it. That
>>> way the kernel wouldn't need to look up the xattrs twice for the same
>>> incapable file - which is, by far, the common case.
>> 
>> That would be a blessing.  I see a steady stream of
>> getxattr("security.capability") requests, and being able to disable this
> 
> Do you think it would help at all to add a S_NO_POSIXCAPS
> to i_flags, and set that the first time we find that
> getxattr("security.capability") finds no capabilities?
> I.e. are these requests frequently for the same inode, or
> always for new ones?

That would be useful, or as you suggest a MNT_* flag.

>> (possibly even in the superblock with a flag) would avoid expensive RPCs on a
>> network filesystem.
> 
> Hmm, as it is, the get_vfs_caps_from_disk() does not get called
> if MNT_NOSUID.  But the cap_inode_need_killpriv() does, so a
> quick way to reduce that # for you would be to pass the inode
> to security_inode_need_killpriv (so it can get to mnt), and
> have that check for MNT_NOSUID, and then you can mount your
> network fs's with MNT_NOSUID...  Would that help you?

Except there are users that do use SUID binaries on network filesystems (e.g. root or /usr filesystems).  Something more like MNT_NOCAPXATTR would be better.
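
The difference between the two mount-flag ideas can be sketched in userspace like this. It is only a model: the `FAKE_MNT_*` bit values are illustrative and are not the kernel's definitions, MNT_NOCAPXATTR is the proposed name and does not exist as far as this thread shows, and `struct fake_mount` and the `model_*` helpers are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; not the kernel's definitions. */
#define FAKE_MNT_NOSUID     0x1u
#define FAKE_MNT_NOCAPXATTR 0x2u /* the proposed flag; hypothetical */

struct fake_mount {
	unsigned int flags;
};

/*
 * Should the modeled killpriv check read security.capability?
 * Either flag suppresses the lookup in this sketch.
 */
static bool model_should_read_caps_xattr(const struct fake_mount *mnt)
{
	return !(mnt->flags & (FAKE_MNT_NOSUID | FAKE_MNT_NOCAPXATTR));
}

/* Are suid binaries still honored on this mount? */
static bool model_suid_honored(const struct fake_mount *mnt)
{
	return !(mnt->flags & FAKE_MNT_NOSUID);
}
```

A mount with only `FAKE_MNT_NOCAPXATTR` set skips the xattr lookup while still honoring suid binaries, which is the combination a network root or /usr filesystem would need.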


Cheers, Andreas





