From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from mail-wm0-f65.google.com ([74.125.82.65]:35952 "EHLO
        mail-wm0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S932226AbcEKQnE (ORCPT );
        Wed, 11 May 2016 12:43:04 -0400
Date: Wed, 11 May 2016 17:42:47 +0100
From: Djalal Harouni
To: James Bottomley
Cc: Alexander Viro, Chris Mason, tytso@mit.edu, Serge Hallyn,
        Josh Triplett, "Eric W. Biederman", Andy Lutomirski, Seth Forshee,
        linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
        linux-security-module@vger.kernel.org, Dongsu Park, David Herrmann,
        Miklos Szeredi, Alban Crequy, Dave Chinner
Subject: Re: [RFC v2 PATCH 0/8] VFS:userns: support portable root filesystems
Message-ID: <20160511164247.GA9908@dztty.fritz.box>
References: <1462372014-3786-1-git-send-email-tixxdz@gmail.com>
        <1462395979.14310.133.camel@HansenPartnership.com>
        <20160505073636.GA3357@dztty>
        <1462449388.2419.27.camel@HansenPartnership.com>
        <20160505214957.GA3071@dztty>
        <1462486085.2289.23.camel@HansenPartnership.com>
        <1462923416.14896.10.camel@HansenPartnership.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1462923416.14896.10.camel@HansenPartnership.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Tue, May 10, 2016 at 04:36:56PM -0700, James Bottomley wrote:
> On Thu, 2016-05-05 at 18:08 -0400, James Bottomley wrote:
[...]
> >
> > OK, so the way attributes are populated on an inode is via getattr.
> > You intercept that, you change the inode owner and group that are
> > installed on the inode. That means that when you list the directory,
> > you see the shift and the shifted uid/gid are used to check
> > permissions for vfs_open().
>
> Just to illustrate how this could be done, here's a functional proof of
> concept for a uid/gid shifting bind mount equivalent. It's not
> actually a proper bind mount because it has to manufacture its own
> inodes.
> As you can see, it can only be used by root, it will shift all
> the uid/gid bits as well as the permission comparisons. It operates on
> subtrees, so it can shift the uids/gids on any filesystem or part of
> one and because the shifts are per superblock, it could actually shift
> the same subtree for multiple users on different shifts. Best of all,
> it requires no vfs changes at all, being entirely implemented inside
> its own filesystem type.

First, I guess this should be in a separate thread... this way this RFC
was just hijacked!

Obviously, as you say later in your response, it may require a VFS
change... You have just consumed all inodes... what about containers or
small apps that are spawned quickly... it can maybe even be used as a
DoS... maybe you end up reporting different inode numbers...?

> You use it just like bind mount:
>
> mount -t shiftfs
>
> except that it takes uidshift=x:y:z and gidshift=x:y:z multiple times
> as options. It's currently not recursive and it definitely needs
> polishing to show things like mount options and be properly Kconfig
> using.

Why is it not recursive? And what if you have circular bind mounts?

Hmm, anyway you are mounting this on behalf of filesystems, so if you
add the recursive thing, you will probably just make everything worse,
by making any /proc, /sys dentry that's under that path shiftable, and
unprivileged users can just create user namespaces and read /proc/* and
all the other stuff that doesn't have capable() related to the
init_user_ns host... What if you have paths like
/filesystem0/uidshiftedY/dir, /filesystem0/uidshiftedX/dir,
/filesystem0/notshifted/dir where some of them are also bind mounts that
point to the same dentry?

Also, you create a totally new user namespace interface here! By making
your own new interface we just lose the notion of init_user_ns and its
children and mapping? I'm not sure of the implication of all this...
your user namespace mapping is not related at all to init_user_ns!
It seems that it has its own init_user_ns? Does a capable() check now
on a shifted filesystem relate to that and hence to your mapping, or to
the real init_user_ns?

> There's a bit of an open question of whether it should have vfs
> changes: the way the struct file f_inode and f_ops are hijacked is a
> bit nasty and perhaps d_select_inode() could be made a bit cleverer to
> help us here instead.

I'm not sure if this PoC works... but are you sure you didn't introduce
a serious vulnerability here? You use a new mapping and you update
current_fsuid() creds up, which is global on any fs operation, so maybe:
let's operate on any inode, update our current_fsuid()... and access the
rest of *unshifted filesystems*...!? The worst thing is that
current_fsuid() does not now follow the /proc/self/uid_map interface!
This is a serious vulnerability and a mix of the current semantics...
it's updated but using other rules...?

For overlayfs I did write an experiment, but for me it's not an
overlayfs or another new filesystem problem... we are manipulating
UID/GID identities...

It would have been better if you had sent this as a separate thread. It
was a vfs:userns RFC fix which, if we continue, we turn into a
complicated thing! Implement another new light filesystem with
userns... (overlayfs...)

Will follow up if the appropriate thread is created, not here, I guess
it's ok?

> James
>

Thank you for your feedback!

-- 
Djalal Harouni
http://opendz.org