Re: [PATCH/RFC] VFS: LOOKUP_MOUNTPOINT should used cached info whenever possible.

Linux NFS development
 help / color / mirror / Atom feed

From: Jeff Layton <jlayton@kernel.org>
To: Christian Brauner <brauner@kernel.org>, NeilBrown <neilb@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	Dave Wysochanski <dwysocha@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-nfs <linux-nfs@vger.kernel.org>,
	David Howells <dhowells@redhat.com>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH/RFC] VFS: LOOKUP_MOUNTPOINT should used cached info whenever possible.
Date: Mon, 17 Apr 2023 08:25:23 -0400	[thread overview]
Message-ID: <6c08ad94ca949d0f3525f7e1fc24a72c50affd59.camel@kernel.org> (raw)
In-Reply-To: <20230417-beisein-investieren-360fa20fb68a@brauner>

On Mon, 2023-04-17 at 13:55 +0200, Christian Brauner wrote:
> On Mon, Apr 17, 2023 at 09:13:52AM +1000, NeilBrown wrote:
> > 
> > When performing a LOOKUP_MOUNTPOINT lookup we don't really want to
> > engage with underlying systems at all.  Any mount point MUST be in the
> > dcache with a complete direct path from the root to the mountpoint.
> > That should be sufficient to find the mountpoint given a path name.
> > 
> > This becomes an issue when the filesystem changes unexpected, such as
> > when a NFS server is changed to reject all access.  It then becomes
> > impossible to unmount anything mounted on the filesystem which has
> > changed.  We could simply lazy-unmount the changed filesystem and that
> > will often be sufficient.  However if the target filesystem needs
> > "umount -f" to complete the unmount properly, then the lazy unmount will
> > leave it incompletely unmounted.  When "-f" is needed, we really need to
> 
> I don't understand this yet. All I see is nfs_umount_begin() that's
> different for MNT_FORCE to kill remaining io. Why does that preclude
> MNT_DETACH? You might very well fail MNT_FORCE and the only way you can
> get rid is to use MNT_DETACH, no? So I don't see why that is an
> argument.
> 
> > be able to name the target filesystem.
> > 
> > We NEVER want to revalidate anything.  We already avoid the revalidation
> > of the mountpoint itself, be we won't need to revalidate anything on the
> > path either as thay might affect the cache, and the cache is what we are
> > really looking in.
> > 
> > Permission checks are a little less clear.  We currently allow any user
> 
> This is all very brittle.
> 
> First case that comes to mind is overlayfs where the permission checking
> is performed twice. Once on the overlayfs inode itself based on the
> caller's security context and a second time on the underlying inode with
> the security context of the mounter of the overlayfs instance.
> 
> A mounter could have dropped all privileges aside from CAP_SYS_ADMIN so
> they'd be able to mount the overlayfs instance but would be restricted
> from accessing certain directories or files. The task accessing the
> overlayfs instance however could have a completely different security
> context including CAP_DAC_READ_SEARCH and such. Both tasks could
> reasonably be in different user namespaces and so on.
> 
> The LSM hooks are also called twice and would now also be called once.
> 
> It also forgets that acl_permission() check may very well call into the
> filesystem via check_acl().
> 
> So umount could either be used to infer existence of files that the user
> wouldn't otherwise know they exist or in the worst case allow to umount
> something that they wouldn't have access to.
> 
> Aside from that this would break userspace assumptions and as Al and
> I've mentioned before in the other thread you'd need a new flag to
> umount2() for this. The permission model can't just change behind users
> back.
> 
> But I dislike it for the now even more special-cased umount path lookup
> alone tbh. I'd feel way more comfortable with a non-lookup related
> solution that doesn't have subtle implications for permission checking.
> 

These are good points.

One way around the issues you point out might be to pass down a new
MAY_LOOKUP_MOUNTPOINT mask flag to ->permission. That would allow the
filesystem driver to decide whether it wants to avoid potentially
problematic activity when checking permissions. nfs_permission could
check for that and take a more hands-off approach to the permissions
check. Between that and skipping d_revalidate on LOOKUP_MOUNTPOINT, I
think that might do what we need.

> > to make the umount syscall and perform the path lookup and only reject
> > unprivileged users once the target mount point has been found.  If we
> > completely relax permission checks then an unprivileged user could probe
> > inaccessible parts of the name space by examining the error returned
> > from umount().
> > 
> > So we only relax permission check when the user has CAP_SYS_ADMIN
> > (may_mount() succeeds).
> > 
> > Note that if the path given is not direct and for example uses symlinks
> > or "..", then dentries or symlink content may not be cached and a remote
> > server could cause problem.  We can only be certain of a safe unmount if
> > a direct path is used.
> > 
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > ---
> >  fs/namei.c | 46 ++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 40 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/namei.c b/fs/namei.c
> > index edfedfbccaef..f2df1adae7c5 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -498,8 +498,8 @@ static int sb_permission(struct super_block *sb, struct inode *inode, int mask)
> >   *
> >   * When checking for MAY_APPEND, MAY_WRITE must also be set in @mask.
> >   */
> > -int inode_permission(struct mnt_idmap *idmap,
> > -		     struct inode *inode, int mask)
> > +int inode_permission_mp(struct mnt_idmap *idmap,
> > +			struct inode *inode, int mask, bool mp)
> >  {
> >  	int retval;
> >  
> > @@ -523,7 +523,14 @@ int inode_permission(struct mnt_idmap *idmap,
> >  			return -EACCES;
> >  	}
> >  
> > -	retval = do_inode_permission(idmap, inode, mask);
> > +	if (mp)
> > +		/* We are looking for a mountpoint and so don't
> > +		 * want to interact with the filesystem at all, just
> > +		 * the dcache and icache.
> > +		 */
> > +		retval = generic_permission(idmap, inode, mask);
> > +	else
> > +		retval = do_inode_permission(idmap, inode, mask);
> >  	if (retval)
> >  		return retval;
> >  
> > @@ -533,6 +540,24 @@ int inode_permission(struct mnt_idmap *idmap,
> >  
> >  	return security_inode_permission(inode, mask);
> >  }
> > +
> > +/**
> > + * inode_permission - Check for access rights to a given inode
> > + * @idmap:	idmap of the mount the inode was found from
> > + * @inode:	Inode to check permission on
> > + * @mask:	Right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
> > + *
> > + * Check for read/write/execute permissions on an inode.  We use fs[ug]id for
> > + * this, letting us set arbitrary permissions for filesystem access without
> > + * changing the "normal" UIDs which are used for other things.
> > + *
> > + * When checking for MAY_APPEND, MAY_WRITE must also be set in @mask.
> > + */
> > +int inode_permission(struct mnt_idmap *idmap,
> > +		     struct inode *inode, int mask)
> > +{
> > +	return inode_permission_mp(idmap, inode, mask, false);
> > +}
> >  EXPORT_SYMBOL(inode_permission);
> >  
> >  /**
> > @@ -589,6 +614,7 @@ struct nameidata {
> >  #define ND_ROOT_PRESET 1
> >  #define ND_ROOT_GRABBED 2
> >  #define ND_JUMPED 4
> > +#define ND_SYS_ADMIN 8
> >  
> >  static void __set_nameidata(struct nameidata *p, int dfd, struct filename *name)
> >  {
> > @@ -853,7 +879,8 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
> >  
> >  static inline int d_revalidate(struct dentry *dentry, unsigned int flags)
> >  {
> > -	if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE))
> > +	if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE) &&
> > +	    likely(!(flags & LOOKUP_MOUNTPOINT)))
> >  		return dentry->d_op->d_revalidate(dentry, flags);
> >  	else
> >  		return 1;
> > @@ -1708,12 +1735,17 @@ static struct dentry *lookup_slow(const struct qstr *name,
> >  static inline int may_lookup(struct mnt_idmap *idmap,
> >  			     struct nameidata *nd)
> >  {
> > +	/* If we are looking for a mountpoint and we have the SYS_ADMIN
> > +	 * capability, then we will by-pass the filesys for permission checks
> > +	 * and just use generic_permission().
> > +	 */
> > +	bool mp = (nd->flags & LOOKUP_MOUNTPOINT) && (nd->state & ND_SYS_ADMIN);
> >  	if (nd->flags & LOOKUP_RCU) {
> > -		int err = inode_permission(idmap, nd->inode, MAY_EXEC|MAY_NOT_BLOCK);
> > +		int err = inode_permission_mp(idmap, nd->inode, MAY_EXEC|MAY_NOT_BLOCK, mp);
> >  		if (err != -ECHILD || !try_to_unlazy(nd))
> >  			return err;
> >  	}
> > -	return inode_permission(idmap, nd->inode, MAY_EXEC);
> > +	return inode_permission_mp(idmap, nd->inode, MAY_EXEC, mp);
> >  }
> >  
> >  static int reserve_stack(struct nameidata *nd, struct path *link)
> > @@ -2501,6 +2533,8 @@ int filename_lookup(int dfd, struct filename *name, unsigned flags,
> >  	if (IS_ERR(name))
> >  		return PTR_ERR(name);
> >  	set_nameidata(&nd, dfd, name, root);
> > +	if ((flags & LOOKUP_MOUNTPOINT) && may_mount())
> > +		nd.state = ND_SYS_ADMIN;
> >  	retval = path_lookupat(&nd, flags | LOOKUP_RCU, path);
> >  	if (unlikely(retval == -ECHILD))
> >  		retval = path_lookupat(&nd, flags, path);
> > -- 
> > 2.40.0
> > 

-- 
Jeff Layton <jlayton@kernel.org>

next prev parent reply	other threads:[~2023-04-17 12:26 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-13 22:00 allowing for a completely cached umount(2) pathwalk Jeff Layton
2023-04-13 22:25 ` Andreas Dilger
2023-04-13 22:41 ` NeilBrown
2023-04-14  2:43   ` Al Viro
2023-04-14  3:28     ` Trond Myklebust
2023-04-14  3:51       ` Al Viro
2023-04-14  4:06         ` Trond Myklebust
2023-04-14  4:21           ` Al Viro
2023-04-14  9:41         ` Christian Brauner
2023-04-14 10:09           ` Jeff Layton
2023-04-14 11:16             ` Christian Brauner
2023-04-14 12:33               ` Jeff Layton
2023-04-14 12:51                 ` Christian Brauner
2023-04-15  9:51             ` Amir Goldstein
2023-04-14 10:06     ` Jeff Layton
2023-04-14 13:41       ` Christian Brauner
2023-04-14 14:21         ` Trond Myklebust
2023-04-14 15:13           ` Christian Brauner
2023-04-14 15:30             ` Trond Myklebust
2023-04-14 15:57               ` Trond Myklebust
2023-04-14 16:22                 ` Al Viro
2023-04-14 16:41                   ` Trond Myklebust
2023-04-14 19:01                     ` Benjamin Coddington
2023-04-17  8:22                       ` Christian Brauner
2023-04-14 16:32               ` Christian Brauner
2023-04-14  2:32 ` Al Viro
2023-04-14 10:01   ` Jeff Layton
2023-04-14 12:18     ` Christian Brauner
2023-04-14 14:57     ` Al Viro
2023-04-14 13:16   ` David Wysochanski
2023-04-16 23:13 ` [PATCH/RFC] VFS: LOOKUP_MOUNTPOINT should used cached info whenever possible NeilBrown
2023-04-17 11:55   ` Christian Brauner
2023-04-17 12:25     ` Jeff Layton [this message]
2023-04-17 14:24       ` Christian Brauner
2023-04-17 15:21         ` Jeff Layton
2023-04-17 21:34           ` NeilBrown
2023-04-18  8:10             ` Christian Brauner
2023-04-18  3:25           ` Andreas Dilger
2023-04-18  8:04             ` Christian Brauner
2023-04-20 13:05               ` Jeff Layton
2023-04-20 15:41                 ` Christian Brauner
2023-04-17 21:26     ` NeilBrown
2023-04-20 21:35       ` Al Viro
2023-04-20 22:01         ` NeilBrown
2023-04-20 22:27           ` Al Viro
2023-04-17 12:09   ` Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6c08ad94ca949d0f3525f7e1fc24a72c50affd59.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=brauner@kernel.org \
    --cc=dhowells@redhat.com \
    --cc=dwysocha@redhat.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox