linux-api.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Li Xi <pkuelelixi@gmail.com>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-api@vger.kernel.org, tytso@mit.edu, adilger@dilger.ca,
	jack@suse.cz, viro@zeniv.linux.org.uk, hch@infradead.org,
	dmonakhov@openvz.org, "Eric W. Biederman" <ebiederm@xmission.com>
Subject: Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support
Date: Sat, 24 Jan 2015 10:30:26 +1100
Message-ID: <20150123233026.GP16552@dastard>
In-Reply-To: <54C23751.7000009@yandex-team.ru>

On Fri, Jan 23, 2015 at 02:58:09PM +0300, Konstantin Khlebnikov wrote:
> On 23.01.2015 04:53, Dave Chinner wrote:
> >On Thu, Jan 22, 2015 at 06:28:51PM +0300, Konstantin Khlebnikov wrote:
> >>>+	kprojid = make_kprojid(&init_user_ns, (projid_t)projid);
> >>
> >>Maybe current_user_ns()?
> >>This code should be user-namespace aware from the beginning.
> >
> >No, the code is correct. Project quotas have nothing to do with
> >UIDs and so should never have been included in the uid/gid
> >namespace mapping infrastructure in the first place.
> 
> Right, but user-namespace provides id mapping for project-id too.
> This infrastructure adds support for nested project quotas with
> virtualized ids in sub-containers. I couldn't say that this is a
> must-have feature, but the implementation is trivial because the whole
> infrastructure is already here.

This is an extremely common misunderstanding of project IDs. Project
IDs are completely separate from the UID/GID namespace.  Project
quotas were originally designed specifically for
accounting/enforcing quotas in situations where uid/gid
accounting/enforcing is not possible. This design intent goes back
25 years - it predates XFS...

IOWs, mapping prids via user namespaces defeats the purpose
for which prids were originally intended.
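
To make the difference concrete: the quoted patch interprets the on-disk value against init_user_ns, which is an identity map, whereas current_user_ns() would translate it through the container's own map. The following is a minimal userspace model, not kernel code, assuming a map made of extents in the same "first lower_first count" form as a line of a user namespace's projid_map; struct id_extent, init_map and map_id_down() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one extent of a user namespace id map, in the
 * same "first lower_first count" form as a line of projid_map. */
struct id_extent {
    uint32_t first;        /* first id as seen inside the namespace */
    uint32_t lower_first;  /* matching id in the parent namespace */
    uint32_t count;        /* number of consecutive ids covered */
};

/* Model of init_user_ns: an identity mapping over the id range. */
static const struct id_extent init_map[] = { { 0, 0, UINT32_MAX } };

/* Translate an id seen inside a namespace to the parent's id space.
 * Returns (uint32_t)-1 if no extent covers the id. */
static uint32_t map_id_down(const struct id_extent *map, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++) {
        if (id >= map[i].first && id - map[i].first < map[i].count)
            return map[i].lower_first + (id - map[i].first);
    }
    return (uint32_t)-1;
}
```

With a container map of `0 100000 65536`, a prid of 1 set inside the container would become project 100001, while the init_user_ns interpretation leaves it as project 1, the ID the host's space limit is attached to.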

> >Case in point: directory subtree quotas can be used as a resource
> >controller for limiting space usage within separate containers that
> >share the same underlying (large) filesystem via mount namespaces.
> 
> That's exactly my use-case: 'sub-volumes' for containers with
> quota for space usage/inodes count.

That doesn't require mapped project IDs. Hard container space limits
can only be controlled by the init namespace. Because an inode can
hold only one project ID, the current ns cannot be allowed to change
the project ID on an inode: doing so would let the container escape
the resource limits set on the project ID associated with the
sub-mount set up by the init namespace...

i.e.

/mnt			prid = 0, default for entire fs.
/mnt/container1/	prid = 1, inherit, 10GB space limit
/mnt/container2/	prid = 2, inherit, 50GB space limit
.....
/mnt/containerN/	prid = N, inherit, 20GB space limit

And you clone the mount namespace for each container so the root is
at the appropriate /mnt/containerX/.  Now the containers have a
fixed amount of space they can use in the parent filesystem they
know nothing about, and it is enforced by directory subquotas
controlled by the init namespace.  This "fixed amount of space" is
reflected in the container namespace when "df" is run, because df
reports the project quota space limits. Adding space to or removing
space from a container is as simple as changing the project quota
limits from the
init namespace. i.e. an admin operation controlled by the host, not
the container....

Allowing the container to modify the prid and/or the inherit bit of
inodes in its namespace then means the user can define their own
space usage limits, even turn them off. It's not a resource
container at that point because the user can define their own
limits.  Hence, only if the current_ns cannot change project quotas
will we have a hard fence on space usage that the container *cannot
exceed*.
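
The hard fence described above reduces to one check on any request that changes an inode's project state. The sketch below is a simplified model of that rule, not kernel code: struct model_inode and may_change_project_state() are hypothetical, capability and ownership checks are deliberately omitted, and only the flag value is taken from FS_XFLAG_PROJINHERIT in <linux/fs.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit value as FS_XFLAG_PROJINHERIT in <linux/fs.h>. */
#define MODEL_XFLAG_PROJINHERIT 0x00000200u

/* Hypothetical, simplified view of an inode's project state. */
struct model_inode {
    uint32_t projid;  /* an inode holds exactly one project ID */
    uint32_t xflags;
};

/* Decide whether an FS_IOC_FSSETXATTR-style request may proceed.
 * Callers in the init user namespace pass this check (real code would
 * still check privileges); any other caller may change unrelated
 * flags but not the project ID or the inherit bit. */
static bool may_change_project_state(const struct model_inode *ino,
                                     uint32_t new_projid, uint32_t new_xflags,
                                     bool caller_in_init_userns)
{
    if (caller_in_init_userns)
        return true;
    if (new_projid != ino->projid)
        return false;  /* retagging would escape the host's limit */
    if ((new_xflags ^ ino->xflags) & MODEL_XFLAG_PROJINHERIT)
        return false;  /* clearing inherit lets new files leave the project */
    return true;
}
```

Inside the container, a request that keeps projid and the inherit bit unchanged (for example, only setting an append-only flag) is still allowed; only the host can move an inode between projects.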

Yes, I know there are other use cases for project quotas *within* a
container as controlled by the user (same as existing project quota
usages), but we don't have the capability of storing multiple
project IDs on each inode, nor accounting/enforcement across
multiple project IDs on an inode. Nor, really, do we want to (it
would require on-disk format changes), and hence we can have one or
the other, but not both.
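
The "one or the other" constraint follows from accounting: with a single project ID per inode, every allocation is charged to exactly one project, so a host limit and a container-defined project cannot both see the same file. A minimal model, assuming a small fixed project table; struct project_usage, projects[] and charge_allocation() are hypothetical names, and a hard limit of 0 means "no limit".

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_NPROJECTS 16  /* hypothetical size of the project table */

/* Hypothetical per-project usage and limit (0 == unlimited). */
struct project_usage {
    uint64_t used_bytes;
    uint64_t hard_limit;
};

static struct project_usage projects[MODEL_NPROJECTS];

/* Charge an allocation to the one project the inode belongs to.
 * Returns 0, -EINVAL for an ID outside the model's table, or
 * -EDQUOT if the project's hard limit would be exceeded. */
static int charge_allocation(uint32_t inode_projid, uint64_t bytes)
{
    if (inode_projid >= MODEL_NPROJECTS)
        return -EINVAL;
    struct project_usage *p = &projects[inode_projid];
    if (p->hard_limit != 0 && p->used_bytes + bytes > p->hard_limit)
        return -EDQUOT;
    p->used_bytes += bytes;
    return 0;
}
```

If project 1 carries the host's limit and a file is retagged to some other, unlimited project, its further allocations are charged there and project 1's limit no longer constrains them.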

Further, in a containerised system, providing the admin with a
trivial and easy to manage mechanism to provide hard limits on
shared filesystem space usage of each container is far more
important than catering to the occasional user who might have a need
for project quotas inside a container.

These are the points I brought up when I initially reviewed the user
namespace patches - the userns developer ignored my concerns and the
code was merged without acknowledging them, let alone addressing
them.

As we (the XFS guys) have no way of knowing when such a distinction
should be made, and with the user ns developers being completely
unresponsive on the subject, we made the decision ourselves.  Our
only concern was to be consistent, safe and predictable, and that
meant we chose to allow project quotas to be used only as an
external container resource hardwall limit, and hence to *never* allow
access to project quotas inside container namespaces.

That's the long and the short of it. Project IDs are independent of
user IDs and they cannot be sanely used both inside and outside user
namespaces at the same time. Hence they should never have been
included in the user namespace mappings in the first place.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 35+ messages
2014-12-09  5:22 [v8 0/5] ext4: add project quota support Li Xi
2014-12-09  5:22 ` [v8 1/5] vfs: adds general codes to enforces project quota limits Li Xi
2014-12-09  5:22 ` [v8 2/5] ext4: adds project ID support Li Xi
2015-01-07 23:11   ` Andreas Dilger
2015-01-08  8:51     ` Jan Kara
2015-01-15  7:52     ` Li Xi
     [not found]   ` <1418102548-5469-3-git-send-email-lixi-LfVdkaOWEx8@public.gmane.org>
2015-01-08  8:26     ` Jan Kara
2015-01-08 22:20       ` Andreas Dilger
2015-01-09  9:47         ` Jan Kara
     [not found]           ` <20150109094758.GA2576-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2015-01-09 23:46             ` Dave Chinner
2015-01-12 17:01               ` Jan Kara
2014-12-09  5:22 ` [v8 3/5] ext4: adds project quota support Li Xi
2015-01-06 20:01   ` Andreas Dilger
2015-01-06 21:52     ` Jan Kara
2014-12-09  5:22 ` [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support Li Xi
2014-12-09 22:57   ` Dave Chinner
2015-01-22 15:20     ` Konstantin Khlebnikov
2015-01-22 15:59       ` Jan Kara
2015-01-22 18:35         ` Konstantin Khlebnikov
     [not found]         ` <20150122155900.GB3062-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2015-01-23  1:39           ` Dave Chinner
2015-01-22 15:28   ` Konstantin Khlebnikov
     [not found]     ` <54C11733.7080801-XoJtRXgx1JseBXzfvpsJ4g@public.gmane.org>
2015-01-23  1:53       ` Dave Chinner
2015-01-23 11:58         ` Konstantin Khlebnikov
2015-01-23 23:30           ` Dave Chinner [this message]
2015-01-23 23:59             ` Andy Lutomirski
     [not found]               ` <CALCETrXPCrOTrkoAMuW2os=z6anaEfv4F4D2yDxo6VtCuEtRZw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-01-27  8:02                 ` Dave Chinner
2015-01-27 10:45                   ` Konstantin Khlebnikov
2015-01-28  0:37                     ` Dave Chinner
2015-02-04 15:22                       ` Konstantin Khlebnikov
     [not found]                         ` <20150204225844.GA12722@dastard>
2015-02-05  9:32                           ` Konstantin Khlebnikov
2015-02-05 16:38                           ` Jan Kara
2015-02-05 21:05                             ` Dave Chinner
2015-01-28  0:45                   ` Andy Lutomirski
     [not found] ` <1418102548-5469-1-git-send-email-lixi-LfVdkaOWEx8@public.gmane.org>
2014-12-09  5:22   ` [v8 5/5] ext4: cleanup inode flag definitions Li Xi
     [not found]     ` <1418102548-5469-6-git-send-email-lixi-LfVdkaOWEx8@public.gmane.org>
2015-01-06 20:05       ` Andreas Dilger
