From: Dave Chinner <david@fromorbit.com>
To: Ben Myers <bpm@sgi.com>
Cc: "Eric W. Biederman" <ebiederm@gmail.com>,
Brian Foster <bfoster@redhat.com>,
Serge Hallyn <serge.hallyn@ubuntu.com>,
Dwight Engen <dwight.engen@oracle.com>,
xfs@oss.sgi.com
Subject: Re: [PATCH v2 RFC] userns: Convert xfs to use kuid/kgid where appropriate
Date: Fri, 28 Jun 2013 11:46:58 +1000
Message-ID: <20130628014658.GF32195@dastard>
In-Reply-To: <20130627205758.GQ20932@sgi.com>
On Thu, Jun 27, 2013 at 03:57:58PM -0500, Ben Myers wrote:
> Hey,
>
> On Thu, Jun 27, 2013 at 08:44:10AM +1000, Dave Chinner wrote:
> > On Wed, Jun 26, 2013 at 05:30:17PM -0400, Dwight Engen wrote:
> > > On Wed, 26 Jun 2013 12:09:24 +1000
> > > Dave Chinner <david@fromorbit.com> wrote:
> > > > On Mon, Jun 24, 2013 at 09:10:35AM -0400, Dwight Engen wrote:
> > > > > Should we just require that callers of bulkstat
> > > > > be in init_user_ns? Thoughts?
> > > >
> > > > This is one of the reasons why I want Eric to give us some idea of
> > > > how this is supposed to work - exactly how is backup and restore
> > > > supposed to be managed on a shared filesystem that is segmented up
> > > > into multiple namespace containers? We can talk about the
> > > > implementation all we like, but none of us have a clue to the policy
> > > > decisions that users will make that we need to support. Until we
> > > > have a clear idea on what policies we are supposed to be supporting,
> > > > the implementation will be ambiguous and compromised.
> > > >
> > > > e.g. If users are responsible for it, then bulkstat needs to filter
> > > > based on the current namespace. If management is responsible (i.e.
> > > > init_user_ns does backup/restore of ns-specific subtrees), then
> > > > bulkstat cannot filter and needs to reject calls from outside the
> > > > init_user_ns().
> > >
> > > Maybe we can have bulkstat always filter based on if the caller
> > > kuid_has_mapping(current_user_ns(), inode->i_uid)? That way a caller
> > > from init_user_ns can see them all, but callers from inside a userns
> > > will get a subset of inodes returned?
> >
> > We could do that, though it means bulkstat is going to be a *lot
> > slower* when called from within a user namespace environment. A
> > namespace might only have a few thousand files for backup, yet the
> > underlying filesystem might have tens of millions of inodes in it.
> > The bulkstat call now has to walk all of the inodes just to find the
> > few thousand that match the filter. And multiply that by the number
> > of namespaces all doing backups at 3am in the morning and you start
> > to get an idea of the scope of the problem....
>
> Ugh. That really doesn't map well onto bulkstat. If we wanted bulkstat to
> work well with namespaces, we might have to teach the filesystem a bit more
> about them in order to create the required indices per namespace. While a
> filter might get the job done in a pinch, wouldn't you really rather have an
> inobt? ;)

Absolutely not. :/

Filesystems can be bind mounted into multiple namespaces, you can
hard link across namespace boundaries, you can do all sorts of
things that result in inodes being shared between namespaces. You
can't have a per-namespace inobt when you can do this sort of thing
that the underlying filesystem may not even be aware of. Hell, you
can have the init namespace manipulate files for the user namespace,
and those manipulations aren't even aware they are happening inside
a namespace.

That doesn't even begin to touch on the major problems it introduces
into the on-disk format. e.g. how do you find, manage and validate
arbitrarily rooted allocated inode btrees? What AG do you put them
in? What happens when you have inodes in multiple AGs in a single
namespace? One tree per AG per namespace? What happens when you have
10000 namespaces and 1000 AGs? How do we find the right inobt(s)
when we do an allocation - they aren't in the AGI anymore? How do we
walk them on mount after an unclean shutdown? How do we allocate and
remove trees? What the hell is repair supposed to do with
corrupt/lost inode btrees?
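
To put a number on the "one tree per AG per namespace" case, here is a
back-of-the-envelope sketch. The helper name is hypothetical and is not
part of XFS; it only multiplies the two figures mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an XFS function): worst-case number of inode
 * btree roots if every namespace needed its own tree in every allocation
 * group (AG).
 */
uint64_t worst_case_inobt_roots(uint64_t namespaces, uint64_t ags)
{
	/* Guard against overflow of the 64-bit product. */
	assert(ags == 0 || namespaces <= UINT64_MAX / ags);
	return namespaces * ags;
}
```

With 10000 namespaces and 1000 AGs that is 10,000,000 tree roots that
would have to be located, validated and repaired, none of them reachable
from the AGI.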

It's a rat's nest, and it doesn't solve the basic problem of how
utilities that use bulkstat are supposed to behave.
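
For comparison, the filtering approach quoted earlier can be modelled in
userspace roughly as follows. This is a toy sketch under stated
assumptions, not kernel code: struct toy_userns, toy_uid_has_mapping()
and toy_bulkstat_filtered() are made-up names, and the namespace is
reduced to a single contiguous uid range. The only point is that the
walk visits every inode in the filesystem no matter how few of them the
caller can see.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a user namespace: one contiguous range of mapped uids. */
struct toy_userns {
	uint32_t first_uid;	/* first uid that has a mapping */
	uint32_t count;		/* number of uids mapped from first_uid */
};

/* Toy analogue of kuid_has_mapping(): is this owner uid visible in ns? */
int toy_uid_has_mapping(const struct toy_userns *ns, uint32_t uid)
{
	return uid >= ns->first_uid && uid - ns->first_uid < ns->count;
}

/*
 * Filtered scan over every inode's owner uid. Returns how many inodes
 * the caller's namespace may see, and stores in *examined how many
 * inodes had to be looked at to find them.
 */
size_t toy_bulkstat_filtered(const struct toy_userns *ns,
			     const uint32_t *inode_uids, size_t ninodes,
			     size_t *examined)
{
	size_t matched = 0;

	assert(ns != NULL && examined != NULL);
	*examined = 0;
	for (size_t i = 0; i < ninodes; i++) {
		(*examined)++;		/* cost is paid for every inode */
		if (toy_uid_has_mapping(ns, inode_uids[i]))
			matched++;	/* only these would be returned */
	}
	return matched;
}
```

In this model the examined count always equals the total inode count, so
a namespace that owns a few thousand files on a filesystem with tens of
millions of inodes still pays for the full walk.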
> To build that inobt you'd have to know whether a given directory was the root
> of a new namespace. Maybe implementable as some kind of flag, 'everything
> below this dir is part of its own namespace, put it in this inobt'. And then
> you'd have to have a way for bulkstat to know to look there, e.g. if the caller
> is not in init_user_ns and if the initial inode had the flag, use the inobt on
> that initial inode for bulkstat instead of the regular inobts. Crazy. Could
> be done.

And I could fly to the moon, too. But like per-namespace inode
btrees, I don't see that ever happening either...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com