From: Jeff Liu <jeff.liu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
To: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
Cc: jack-AlSwsSmVLrQ@public.gmane.org,
tytso-3s7WtUTddSA@public.gmane.org,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
bpm-sJ/iWh9BUns@public.gmane.org,
christopher.jones-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
tm-d1IQDZat3X0@public.gmane.org,
linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
chris.mason-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
tinguely-sJ/iWh9BUns@public.gmane.org
Subject: Re: container disk quota
Date: Thu, 31 May 2012 20:31:42 +0800
Message-ID: <4FC764AE.4070404@oracle.com>
In-Reply-To: <4FC731C1.5000903-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
Hi Glauber,
Thanks for your comments!
On 05/31/2012 04:54 PM, Glauber Costa wrote:
> On 05/30/2012 06:58 PM, jeff.liu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org wrote:
>> Hello All,
>>
>> According to Glauber's comments regarding container disk quota, it
>> should be bound to the mount namespace rather than to a cgroup.
>>
>> In my testing, it works just fine in combination with the userland
>> quota utilities in this way.
> that's great.
>
> I'll take a look at the patches.
>
>
>>
>> * Modify quotactl(2) to check whether the caller is running inside a
>> container. This is implemented by checking whether the quota device
>> name is "rootfs" (for an lxc guest) or the current pid namespace is
>> not the initial one; if so, do the mount namespace quotactl,
>> otherwise go to the normal quotactl procedure.
>
> I dislike the use of "lxc" name. There is nothing lxc-specific in this,
> this is namespace-specific. lxc is just one of the container solutions
> out there, so let's keep it generic.
I think I should drop everything LXC-specific and just treat this as a
new quota feature tied to namespaces.
>>
>> * Also, I have not handled a couple of things yet.
>> . I think the container quota code should be confined to Jan's
>> fs/quota/ directory.
>> . There are dozens of helper routines in general quota, e.g.,
>> struct if_dqblk <-> struct fs_disk_quota conversions and
>> dquot space and inode billing.
>> They could be refactored into shared routines to some extent.
>> . quotastats(8) has not been taught to be container-aware yet.
>>
>> Changes in quota userland utility:
>> * Introduce a new quota format string "lxc" to all quota control
>> utilities, to let each utility know that the user wants to run
>> container quota control, e.g.:
>> quotacheck -cvugm -F "lxc" /
>> quotaon -u -F "lxc" /
>> ....
>>
>> * Currently, I manually created the underlying device(by editing cgroup
>> device access list and running mknod /dev/sdaX x x) for the rootfs
>> inside containers to let the cache mount points routine pass for
>> executing quotacheck against the "/" directory. Actually, it can be
>> omitted here.
>>
>> * Add a new quotaio_lxc.c[.h] for container quota IO. It is basically
>> the same as the VFS quotaio logic; I just hope to isolate the
>> container stuff here.
>>
>> Issues:
>> * How to detect that quotactl(2) is invoked from a container in a
>> reasonable way.
>
> It's a system call. It is always called by a process. The process
> belongs to a namespace. What else is needed?
nothing now. :)
>
>> * Do we need to let container quota work for cgroups combined with
>> unshare(1)?
>> Now the patchset mainly works for lxc guests. IMHO, it can be used
>> outside a guest if the user desires. In this case, the quota limits
>> can take effect across different underlying file systems if they
>> have exported quota billing routines.
>
> I still don't understand what is the business of cgroups here. If you
> are attaching it to mount namespace, you can always infer the context
> from the calling process. I still need to look at your patches, but I
> believe that dropping the "feature" of manipulating this from outside of
> the container will save you a lot of trouble.
Yup, I'll just treat it as namespace-specific; there is nothing to
consider on the cgroup interface side.
>
> Please note that a process can temporarily join a namespace with
> setns(). So you can have a *utility* that does it from the outer world,
> but the kernel has no business with that. As far as we're concerned, I
> believe that you should always get your context from the current
> namespace, and forbid any usage from outside.
I'll investigate that further.
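
As a rough illustration of the setns() approach described above, an
outside utility could join the container's mount namespace first and
only then issue quotactl(2), so the kernel always sees the request from
inside the namespace. This is a sketch under assumptions: the kernel
must expose /proc/<pid>/ns/mnt, and the helper names mnt_ns_path() and
join_mnt_ns() are hypothetical, not part of the patchset or the quota
tools:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for setns() and CLONE_NEWNS */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "/proc/<pid>/ns/mnt" into buf; returns snprintf()'s result. */
int mnt_ns_path(char *buf, size_t len, pid_t pid)
{
	assert(buf != NULL);
	return snprintf(buf, len, "/proc/%ld/ns/mnt", (long)pid);
}

/*
 * Hypothetical helper: join the mount namespace of `pid`.
 * Returns 0 on success, -1 on failure.
 * Needs sufficient privilege (typically CAP_SYS_ADMIN).
 */
int join_mnt_ns(pid_t pid)
{
	char path[64];
	int fd, ret;

	mnt_ns_path(path, sizeof(path), pid);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	ret = setns(fd, CLONE_NEWNS);	/* switch this process's mount namespace */
	close(fd);
	return ret;
}
```

After join_mnt_ns() succeeds for a process running in the container,
any quotactl(2) call made by the utility runs in that namespace's
context, and the kernel side needs no "from outside" code path.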
>
>> * The hash table definitions (hash table size) for dquot caching for
>> each type are taken from kernel/user.c; maybe it's better to define
>> an array separately for performance optimization. Of course, that
>> all depends on whether my current implementation is on the right
>> road. :)
>>
>> * Container quota statistics: should they be calculated and exposed in
>> /proc/fs/quota? If the underlying file system also has quotas
>> enabled, they will be mixed up, so how about adding a new proc file
>> like "ns_quota" there?
> No, this should be transferred to the process-specific proc and then
> symlinked. Take a look at "/proc/self".
>
>>
>> * Memory shrinking requested by kswapd.
>> As all dquots are cached in memory, if the user executes quotaoff,
>> maybe I need to handle the case where quota is disabled but still
>> kept in memory. Also, add another routine to disable quota and
>> remove all dquots from memory directly, to save memory.
>
> I didn't read your patches yet, so take it with a grain of salt here.
> But I don't understand why you make this distinction of keeping it in
> memory only.
>
> You could keep quota files outside of the container, and then bind mount
> them to the current location in the setup-phase.
I originally tried to keep the quota files outside, but I changed my
mind afterwards, for three reasons at that time:
1) The quota files could be overwritten if the container's rootfs is
located at the root directory of a storage partition and that partition
is mounted with quota limits enabled.
2) To deal with quota files, it looks like I would have to tweak
quota_read()/quota_write(), which on ext4 correspond to
ext4_quota_read()/ext4_quota_write().
3) As a mount namespace can be created and destroyed at any time, it
has no memory of which inodes are quota files. However, as I remember,
the quota tools need to restore a few things from those files, but I
cannot recall all of them right now. :( I'll check this to refresh my
memory.
Sure, considering that we can bind mount them at the setup phase, the
first concern can be ignored.
Thanks,
-Jeff