From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from bedivere.hansenpartnership.com ([66.63.167.143]:38942 "EHLO bedivere.hansenpartnership.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934674AbeFRO5C (ORCPT ); Mon, 18 Jun 2018 10:57:02 -0400
Message-ID: <1529333819.4021.4.camel@HansenPartnership.com>
Subject: Re: shiftfs status and future development
From: James Bottomley
To: Seth Forshee
Cc: "Serge E. Hallyn" , linux-fsdevel@vger.kernel.org, containers@lists.linux-foundation.org, Tyler Hicks , Christian Brauner
Date: Mon, 18 Jun 2018 07:56:59 -0700
In-Reply-To: <20180618134032.GP30028@ubuntu-xps13>
References: <20180614184448.GC30028@ubuntu-xps13> <20180615135638.GA29299@mail.hallyn.com> <20180615145917.GF30028@ubuntu-xps13> <1529118185.4048.46.camel@HansenPartnership.com> <20180618134032.GP30028@ubuntu-xps13>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Mon, 2018-06-18 at 08:40 -0500, Seth Forshee wrote:
> On Fri, Jun 15, 2018 at 08:03:05PM -0700, James Bottomley wrote:
> > On Fri, 2018-06-15 at 09:59 -0500, Seth Forshee wrote:
> > > On Fri, Jun 15, 2018 at 08:56:38AM -0500, Serge E. Hallyn wrote:
> > > > Quoting Seth Forshee (seth.forshee@canonical.com):
> > > > > I wanted to inquire about the current status of shiftfs and
> > > > > the plans for it moving forward. We'd like to have this
> > > > > functionality available for use in lxd, and I'm interested in
> > > > > helping with development (or picking up development if it's
> > > > > stalled).
> > > > >
> > > > > To start, is anyone still working on shiftfs or similar
> > > > > functionality? I haven't found it in any git tree on
> > > > > kernel.org, and as far as mailing list activity the last
> > > > > submission I can find is [1]. Is there anything newer than
> > > > > this?
> > > > >
> > > > > Based on past mailing list discussions, it seems like there
> > > > > was still debate as to whether this feature should be an
> > > > > overlay filesystem or something supported at the vfs level.
> > > > > Was this ever resolved?
> > > > >
> > > > > Thanks,
> > > > > Seth
> > > > >
> > > > > [1]
> > > > > http://lkml.kernel.org/r/1487638025.2337.49.camel@HansenPartnership.com
> > > >
> > > > Hey Seth,
> > > >
> > > > I haven't heard anything in a long time.  But if this is going
> > > > to pick back up, can we come up with a detailed set of goals
> > > > and requirements?
> >
> > That would actually help.
> >
> > > I was planning to follow up later with some discussion of
> > > requirements. Here are some of ours:
> > >
> > >  - Supports any id maps possible for a user namespace
> >
> > Could you clarify: right at the moment, it basically reverses the
> > namespace ID mapping when it goes on to the filesystem using the
> > superblock user namespace, so, in theory you can have an arbitrary
> > mapping simply by changing the s_user_ns.  The problem here is that
> > you don't have a lot of tools for manipulating the s_user_ns.
>
> For our purposes the way you're shifting with s_user_ns works fine. I
> know that Serge would prefer a more arbitrary shift so that an
> arbitrary, unprivileged range in the source fs could be used (e.g. use
> ids 100000 - 101000 in the source instead of 0 - 1000), and my
> thoughts on that are quoted below.

The original (v1) shiftfs did simply take a range of ids to shift as an
argument. However, that one could only be set up by root and Eric
expressed a desire that it use the s_user_ns.

> > >  - Does not break inotify
> >
> > I don't expect it does, but I haven't checked.
>
> I haven't checked either; I'm planning to do so soon. This is a
> concern that was expressed to me by others, I think because inotify
> doesn't work with overlayfs.

I think shiftfs does work simply because it doesn't really do overlays,
so lots of stuff that doesn't work with overlays does work with it.

> > >  - Passes accurate disk usage and source information from the
> > > "underlay"
> >
> > mounts of this type don't currently show up in df
> >
> > >  - Works with a variety of filesystems (ext4, xfs, btrfs, etc.)
> >
> > yes
> >
> > >  - Works with nested containers
> >
> > yes
>
> I'd say not so much:
>
>         /* to mark a mount point, must be real root */
>         if (ssi->mark && !capable(CAP_SYS_ADMIN))
>                 goto out;
>
> So within a container I cannot mark a range to be shiftfs-mountable
> within a container I create. I'd argue that as long as a user has
> CAP_SYS_ADMIN towards sb->s_user_ns for the source filesystem it
> should be safe to allow this as it implies privileges wrt all ids
> found in the source mount. This will likely lead to stacked shiftfs
> mounts, not sure yet whether or not this works in the current code.

Um, I think we have different definitions of "works with nested
containers". Recall that for a nested container the s_user_ns is also
nested, so we shift all the way back to the uid in the root. That means
if the check for marking is not capable(CAP_SYS_ADMIN) then an
unprivileged user would be able to gain root write access by setting up
a nested shift. If your definition of nested means we only shift back
one level of user_ns nesting then this could become ns_capable(), so I
think we need to add "what is the desired nesting behaviour?" to the
questions to be answered by the requirements.

James