From: Vivek Goyal <vgoyal@redhat.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: linux-fsdevel@vger.kernel.org,
	virtio-fs-list <virtio-fs@redhat.com>,
	ganesh.mahalingam@intel.com
Subject: Re: [Virtio-fs] [PATCH] virtiofs: Enable SB_NOSEC flag to improve small write performance
Date: Tue, 21 Jul 2020 14:16:30 -0400
Message-ID: <20200721181630.GD551452@redhat.com>
In-Reply-To: <20200721155503.GC551452@redhat.com>

On Tue, Jul 21, 2020 at 11:55:03AM -0400, Vivek Goyal wrote:
> On Tue, Jul 21, 2020 at 05:44:14PM +0200, Miklos Szeredi wrote:
> > On Tue, Jul 21, 2020 at 5:17 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > On Tue, Jul 21, 2020 at 02:33:41PM +0200, Miklos Szeredi wrote:
> > > > On Mon, Jul 20, 2020 at 5:41 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > >
> > > > > On Fri, Jul 17, 2020 at 10:53:07AM +0200, Miklos Szeredi wrote:
> > > >
> > > > > > I see in VFS that chown() always kills suid/sgid, while truncate() and
> > > > > > write() will kill suid/sgid only if the caller does not have CAP_FSETID.
> > > > >
> > > > > How does this work with FUSE_HANDLE_KILLPRIV. IIUC, file server does not
> > > > > know if caller has CAP_FSETID or not. That means file server will be
> > > > > forced to kill suid/sgid on every write and truncate. And that will fail
> > > > > some of the tests.
> > > > >
> > > > > For WRITE requests now we do have the notion of setting
> > > > > FUSE_WRITE_KILL_PRIV flag to tell server explicitly to kill suid/sgid.
> > > > > Probably we could use that in cached write path as well to figure out
> > > > > whether to kill suid/sgid or not. But truncate() will still continue
> > > > > to be an issue.
> > > >
> > > > Yes, not doing the same for truncate seems to be an oversight.
> > > > Unfortunate, since we'll need another INIT flag to enable selective
> > > > clearing of suid/sgid on truncate.
> > > >
> > > > >
> > > > > >
> > > > > > Even writeback_cache could be handled by this addition, since we call
> > > > > > fuse_update_attributes() before generic_file_write_iter() :
> > > > > >
> > > > > > --- a/fs/fuse/dir.c
> > > > > > +++ b/fs/fuse/dir.c
> > > > > > @@ -985,6 +985,7 @@ static int fuse_update_get_attr(struct inode
> > > > > > *inode, struct file *file,
> > > > > >
> > > > > >         if (sync) {
> > > > > >                 forget_all_cached_acls(inode);
> > > > > > +               inode->i_flags &= ~S_NOSEC;
> > > > >
> > > > > Ok, So I was clearing S_NOSEC only if server reports that file has
> > > > > suid/sgid bit set. This change will clear S_NOSEC whenever we fetch
> > > > > attrs from host and will force getxattr() when we call file_remove_privs()
> > > > > > and will increase overhead when not in cache writeback mode. We probably
> > > > > > could keep both: for cache writeback mode, clear it unconditionally;
> > > > > > otherwise, do not.
> > > >
> > > > We clear S_NOSEC because the attribute timeout has expired.  This
> > > > means we need to refresh all metadata, including cached xattr (which
> > > > is what S_NOSEC effectively is).
> > > >
> > > > > > What I don't understand, though, is how this change will clear
> > > > > > suid/sgid on the host in cache=writeback mode. I see fuse_setattr()
> > > > > > will not set ATTR_MODE and clear S_ISUID and S_ISGID if
> > > > > > fc->handle_killpriv is set. So when the server receives a setattr request
> > > > > > (if it does), how will it know it is supposed to kill the suid/sgid
> > > > > > bits? (It's not chown or truncate, and it's not write.)
> > > >
> > > > Depends.  If the attribute timeout is infinity, then that means the
> > > > cache is always up to date.  In that case we only need to clear
> > > > suid/sgid if set in i_mode.  Similarly, the security.capability will
> > > > only be cleared if it was set in the first place (which would clear
> > > > S_NOSEC).
> > > >
> > > > If the timeout is finite, then that means we need to check if the
> > > > metadata changed after a timeout.  That's the purpose of the
> > > > fuse_update_attributes() call before generic_file_write_iter().
> > > >
> > > > Does that make it clear?
> > >
> > > > I understood it partly, but one thing is still bothering me. What
> > > > happens when cache writeback is set as well as fc->handle_killpriv=1?
> > > >
> > > > When handle_killpriv is set, how will suid/sgid be cleared by the
> > > > server? Given cache=writeback, the write probably got cached in the
> > > > guest, and the server probably will not see a WRITE immediately.
> > > > (I am assuming we are relying on a WRITE to clear setuid/setgid when
> > > >  handle_killpriv is set.) And that means the server will not clear
> > > >  setuid/setgid until the inode is written back at some later point
> > > >  in time.
> > > >
> > > > IOW, cache=writeback and fc->handle_killpriv don't seem to go
> > > > together (at least given the current code).
> > 
> > fuse_cache_write_iter()
> >   -> fuse_update_attributes()   * this will refresh i_mode
> >   -> generic_file_write_iter()
> >       ->__generic_file_write_iter()
> >           ->file_remove_privs()    * this will check i_mode
> >               ->__remove_privs()
> >                   -> notify_change()
> > >                      -> fuse_setattr()   * this will clear suid/sgid bits
> 
> > And fuse_setattr() has the following.
> 
>                 if (!fc->handle_killpriv) {
>                         /*
>                          * ia_mode calculation may have used stale i_mode.
>                          * Refresh and recalculate.
>                          */
>                         ret = fuse_do_getattr(inode, NULL, file);
>                         if (ret)
>                                 return ret;
> 
>                         attr->ia_mode = inode->i_mode;
>                         if (inode->i_mode & S_ISUID) {
>                                 attr->ia_valid |= ATTR_MODE;
>                                 attr->ia_mode &= ~S_ISUID;
>                         }
>                         if ((inode->i_mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
>                                 attr->ia_valid |= ATTR_MODE;
>                                 attr->ia_mode &= ~S_ISGID;
>                         }
>                 }
>         }
>         if (!attr->ia_valid)
>                 return 0;
> 
> So if fc->handle_killpriv is set, we might not even send setattr
> request if attr->ia_valid turns out to be zero.
> 
> I did a quick instrumentation and noticed that we are sending
> setattr with attr->ia_valid=0x200 (ATTR_FORCE) set. And file
> server is not required to kill suid/sgid in this case?
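
For readers following along, here is a userspace sketch of the branch quoted above. It is only a model under stated assumptions, not the kernel code: the `sk_` names are hypothetical, the ATTR_* values other than 0x200 (ATTR_FORCE, as reported above) are assumed, and the initial step that strips the kill bits before the handle_killpriv check is an assumption that does not appear in the quoted fragment.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Local stand-ins for the kernel ATTR_* bits (values assumed for this sketch). */
enum {
	SK_ATTR_MODE      = 1u << 0,
	SK_ATTR_FORCE     = 1u << 9,	/* 0x200, the value seen in the instrumentation */
	SK_ATTR_KILL_SUID = 1u << 11,
	SK_ATTR_KILL_SGID = 1u << 12,
};

/* Hypothetical, reduced version of struct iattr. */
struct sk_iattr {
	unsigned int ia_valid;
	mode_t ia_mode;
};

/*
 * Model of the kill-priv handling. i_mode is the cached inode mode.
 * Returns 1 if a SETATTR request would still be sent, 0 if ia_valid
 * ended up empty and the function would return early.
 */
static int sk_killpriv_setattr(struct sk_iattr *attr, mode_t i_mode,
			       int handle_killpriv)
{
	if (attr->ia_valid & (SK_ATTR_KILL_SUID | SK_ATTR_KILL_SGID)) {
		/* Assumed step: drop the kill bits and any precomputed mode. */
		attr->ia_valid &= ~(SK_ATTR_MODE | SK_ATTR_KILL_SUID |
				    SK_ATTR_KILL_SGID);

		if (!handle_killpriv) {
			/* Client clears the bits itself and sends ATTR_MODE. */
			attr->ia_mode = i_mode;
			if (i_mode & S_ISUID) {
				attr->ia_valid |= SK_ATTR_MODE;
				attr->ia_mode &= ~S_ISUID;
			}
			if ((i_mode & (S_ISGID | S_IXGRP)) ==
			    (S_ISGID | S_IXGRP)) {
				attr->ia_valid |= SK_ATTR_MODE;
				attr->ia_mode &= ~S_ISGID;
			}
		}
	}
	return attr->ia_valid != 0;
}
```

In this model, starting from ia_valid = ATTR_FORCE | ATTR_KILL_SUID with handle_killpriv set leaves only 0x200 behind, so a request is still sent but it carries no mode change. With handle_killpriv clear, ATTR_MODE is added and S_ISUID is dropped from ia_mode.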

I did a little more instrumentation of fuse and virtiofsd. I modified
virtiofsd to enable FUSE_HANDLE_KILLPRIV and ran virtiofsd with
-o writeback.

On the client, I created a file /mnt/virtiofs/foo.txt and set the setuid bit.
I wrote a program that writes a single character to the file, dropped
CAP_FSETID before executing the program, and watched the messages
arriving at virtiofsd.
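
A test program along these lines could look roughly like the sketch below. This is not the exact program used; the helper names write_one_char() and has_setuid() are made up for illustration, and dropping CAP_FSETID is assumed to happen outside the program, before it is executed.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a single character at the start of an
 * existing file. Returns 0 on success, -1 on any error.
 */
static int write_one_char(const char *path, char c)
{
	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	ssize_t n = write(fd, &c, 1);
	int rc = close(fd);	/* with cache=writeback, data may reach the server only here or later */

	return (n == 1 && rc == 0) ? 0 : -1;
}

/*
 * Hypothetical helper: report whether the setuid bit is currently set.
 * Returns 1 if set, 0 if not, -1 if stat() fails.
 */
static int has_setuid(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return (st.st_mode & S_ISUID) ? 1 : 0;
}
```

Calling has_setuid() before and after write_one_char() on /mnt/virtiofs/foo.txt shows what the client sees. Whether the bit is also cleared on the host has to be checked on the host side.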

I see that no WRITE came and lo_setattr() was called with valid=0x0.
That means it will not change any of the attrs; it simply gets the
current attrs and returns them to the client.

A WRITE comes later, either when the file is closed (fuse_flush()) or
when writeback is triggered. So if the file server always clears the
setuid/setgid bit on WRITE, then the bit will ultimately be cleared,
but much later, when the guest page is written back.

Hopefully I am not missing something very basic.

Thanks
Vivek