From mboxrd@z Thu Jan 1 00:00:00 1970 From: NeilBrown Subject: Re: [PATCH] vfs: Test for and handle paths that are unreachable from their mnt_root Date: Mon, 17 Aug 2015 13:56:49 +1000 Message-ID: <20150817135649.0ed0f4f0@noble> References: <871tncuaf6.fsf@x220.int.ebiederm.org> <87mw5xq7lt.fsf@x220.int.ebiederm.org> <87a8yqou41.fsf_-_@x220.int.ebiederm.org> <874moq9oyb.fsf_-_@x220.int.ebiederm.org> <871tfkawu9.fsf_-_@x220.int.ebiederm.org> <87egjk9i61.fsf_-_@x220.int.ebiederm.org> <20150810043637.GC14139@ZenIV.linux.org.uk> <877foymrwt.fsf@x220.int.ebiederm.org> <87wpwyjxwc.fsf_-_@x220.int.ebiederm.org> <87fv3mjxsc.fsf_-_@x220.int.ebiederm.org> <20150815061617.GG14139@ZenIV.linux.org.uk> <874mk08l3g.fsf@x220.int.ebiederm.org> <87a8ts763c.fsf_-_@x220.int.ebiederm.org> <87bne82glg.fsf@x220.int.ebiederm.org> <87tws010r2.fsf_-_@x220.int.ebiederm.org> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: Linus Torvalds , Linux Containers , linux-fsdevel , Al Viro , Andy Lutomirski , "Serge E. Hallyn" , Richard Weinberger , Andrey Vagin , Jann Horn , Willy Tarreau , Omar Sandoval , Miklos Szeredi , "J. Bruce Fields" To: ebiederm@xmission.com (Eric W. Biederman) Return-path: Received: from mx2.suse.de ([195.135.220.15]:55321 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750976AbbHQD5F (ORCPT ); Sun, 16 Aug 2015 23:57:05 -0400 In-Reply-To: <87tws010r2.fsf_-_@x220.int.ebiederm.org> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Sat, 15 Aug 2015 20:27:13 -0500 ebiederm@xmission.com (Eric W. Biederman) wrote: > > In rare cases a directory can be renamed out from under a bind mount. > In those cases without special handling it becomes possible to walk up > the directory tree to the root dentry of the filesystem and down > from the root dentry to every other file or directory on the filesystem. > > Like division by zero .. from an unconnected path can not be given > a useful semantic as there is no predicting at which path component > the code will realize it is unconnected. We certainly can not match > the current behavior as the current behavior is a security hole. > > Therefore when encounting .. when following an unconnected path > return -ENOENT. > > - Add a function path_connected to verify path->dentry is reachable > from path->mnt.mnt_root. AKA to validate that rename did not do > something nasty to the bind mount. > > To avoid races path_connected must be called after following a path > component to it's next path component. > > Signed-off-by: "Eric W. Biederman" > --- > > This is the simple version that needs no extra vfs support. > > My availability is likely to be a bit spotty for the next while > as I am travelling to and then attending Linux Plumbers Conference. > > fs/namei.c | 27 +++++++++++++++++++++++++-- > 1 file changed, 25 insertions(+), 2 deletions(-) > > diff --git a/fs/namei.c b/fs/namei.c > index ae4e4c18b2ac..5303e994f8d6 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -560,6 +560,24 @@ static int __nd_alloc_stack(struct nameidata *nd) > return 0; > } > > +/** > + * path_connected - Verify that a path->dentry is below path->mnt.mnt_root > + * @path: nameidate to verify > + * > + * Rename can sometimes move a file or directory outside of a bind > + * mount, path_connected allows those cases to be detected. While it is obviously true that a rename can move a file outside of a bind mount, it doesn't seem relevant and so could be confusing. This is only ever used for directories, and a file could already be outside a bind mount even while it is inside. I would stick with "Rename can sometimes move a directory outside..." > + */ > +static bool path_connected(const struct path *path) > +{ > + struct vfsmount *mnt = path->mnt; > + > + /* Only bind mounts can have disconnected paths */ > + if (mnt->mnt_root == mnt->mnt_sb->s_root) > + return true; > + > + return is_subdir(path->dentry, mnt->mnt_root); > +} > + > static inline int nd_alloc_stack(struct nameidata *nd) > { > if (likely(nd->depth != EMBEDDED_LEVELS)) > @@ -1296,6 +1314,8 @@ static int follow_dotdot_rcu(struct nameidata *nd) > return -ECHILD; > nd->path.dentry = parent; > nd->seq = seq; > + if (unlikely(!path_connected(&nd->path))) > + return -ENOENT; > break; > } else { > struct mount *mnt = real_mount(nd->path.mnt); > @@ -1396,7 +1416,7 @@ static void follow_mount(struct path *path) > } > } > > -static void follow_dotdot(struct nameidata *nd) > +static int follow_dotdot(struct nameidata *nd) > { > if (!nd->root.mnt) > set_root(nd); > @@ -1412,6 +1432,8 @@ static void follow_dotdot(struct nameidata *nd) > /* rare case of legitimate dget_parent()... */ > nd->path.dentry = dget_parent(nd->path.dentry); > dput(old); > + if (unlikely(!path_connected(&nd->path))) > + return -ENOENT; > break; > } > if (!follow_up(&nd->path)) > @@ -1419,6 +1441,7 @@ static void follow_dotdot(struct nameidata *nd) > } > follow_mount(&nd->path); > nd->inode = nd->path.dentry->d_inode; > + return 0; > } > > /* > @@ -1634,7 +1657,7 @@ static inline int handle_dots(struct nameidata *nd, int type) > if (nd->flags & LOOKUP_RCU) { > return follow_dotdot_rcu(nd); > } else > - follow_dotdot(nd); > + return follow_dotdot(nd); > } > return 0; > } I really like this patch, particularly from the standpoint of backporting to -stable and enterprise kernels. I suspect that all the tracking of which mounts might have been escaped from should be classified as premature optimisation, until measurements show otherwise. path_connected() adds no locks or even atomics. The path that it walks will be well-trodden and so very likely most of it will be in the CPU cache. And I particularly like that follow_dotdot() and follow_dotdot_rcu() now both that the same signature :-) What's not to like? Reviewed-by: NeilBrown in case it helps. NeilBrown