From: Jeff Liu <jeff.liu@oracle.com>
To: Jan Kara <jack@suse.cz>
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
lsf-pc@lists.linux-foundation.org,
Jim Meyering <jim@meyering.net>
Subject: Re: [LSF/MM TOPIC] [ATTEND] Container disk quota and lseek(2) upon shared extents
Date: Wed, 30 Jan 2013 00:37:08 +0800
Message-ID: <5107FAB4.1010204@oracle.com>
In-Reply-To: <20130129151427.GG32246@quack.suse.cz>
Hi Jan,
On 01/29/2013 11:14 PM, Jan Kara wrote:
> Hello,
>
> On Tue 29-01-13 22:44:24, Jeff Liu wrote:
>> I'd like to discuss the following problems at LSF:
>>
>> - Container UID/GID quota support
>> More than half a year ago, I posted a patch set to support UID/GID
>> quotas inside containers:
>> http://www.spinics.net/lists/linux-containers/msg25393.html
>>
>> However, I had to put it on ice at that time since this feature depends on the
>> user namespace. Now I think it's time to bring it up because user_ns was
>> basically done in 3.8-rcX.
>>
>> Combined with user_ns, there are a couple of issues that need to be solved first:
>> 1) UID/GID mapping between the global and container quota files.
>> In my previous implementation, the quotas were cached in memory, which is truly
>> unacceptable; I'll try to make it work as usual with journalled quota support.
>>
>> 2) To avoid modifying the quota tools, maybe we have to keep quotas enabled all the
>> time inside containers, so that the end user just decides whether or not to set up quota limits.
>>
>> 3) Embed the container quota accounting logic into the corresponding VFS quota
>> routines and make it transparent to the file systems.
> So now looking into your old submission, your main aim was to make
> quota-tools work properly when run from inside a container, right?
Right.
> Because quota enforcement works properly once user namespaces are in place. In fact
> quota calls such as Q_GETQUOTA or Q_SETQUOTA work correctly as well with
> user namespaces. UID/GID translation from namespace id space to the
> global space and back is already happening. So what functionality are you
> missing?
So it looks like there is no need to revisit it. :(
Previously I found that we cannot turn quota off inside containers without modifying
the quota tools. I am not sure whether this makes sense, or whether it is a fair user
requirement. Anyway, I'll play with user namespaces and the quota tools to investigate
further.
>
>
>> - Introduce a new whence to lseek(2) to fetch the reflinked/sharing extents
>>
>> We have some user requests to show the real disk footprint of OCFS2 reflinked
>> or Btrfs cloned files. I wrote a shared-du utility based on du(1) for OCFS2, as
>> it was the only file system with reflink support at that time:
>> https://oss.oracle.com/pipermail/ocfs2-devel/2010-September/007293.html
> But this is a tough problem, isn't it? You have to minimally cache some
> info about *every* file du(1) was called on so that you can check whether
> two files share some extents or not. I'm not saying it isn't a useful
> functionality, just I'd like to verify we are on the same page.
Yes, in user land I have to cache the shared extent info and iterate over
the cached items to check whether the next extent to be cached already exists.
If it exists, increase its count and check the next one; otherwise, cache it.
This step repeats until all the files residing on the target
partition/directories have been checked.
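As a rough illustration of the caching loop described above, here is a minimal sketch in C. The names (shared_extent, extent_cache, cache_shared_extent, cache_footprint) are hypothetical and are not taken from the shared-du patch. It assumes two extents are "the same" only when their physical start and length match exactly, and it ignores partially overlapping extents.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record for one shared extent seen while walking files. */
struct shared_extent {
    uint64_t physical;  /* physical start offset on disk, in bytes */
    uint64_t length;    /* extent length in bytes */
    unsigned refs;      /* how many times this extent has been seen */
};

struct extent_cache {
    struct shared_extent *items;
    size_t count;
    size_t capacity;
};

/*
 * Record one shared extent.
 * Returns 1 if it was newly cached, 0 if it was already known
 * (its count is bumped), or -1 on allocation failure.
 */
int cache_shared_extent(struct extent_cache *c, uint64_t physical,
                        uint64_t length)
{
    /* Linear scan over the cached items, as described in the mail. */
    for (size_t i = 0; i < c->count; i++) {
        if (c->items[i].physical == physical &&
            c->items[i].length == length) {
            c->items[i].refs++;
            return 0;
        }
    }

    if (c->count == c->capacity) {
        size_t cap = c->capacity ? c->capacity * 2 : 16;
        struct shared_extent *p = realloc(c->items, cap * sizeof(*p));
        if (!p)
            return -1;
        c->items = p;
        c->capacity = cap;
    }

    c->items[c->count].physical = physical;
    c->items[c->count].length = length;
    c->items[c->count].refs = 1;
    c->count++;
    return 1;
}

/* Bytes occupied on disk by the cached extents, each counted once. */
uint64_t cache_footprint(const struct extent_cache *c)
{
    uint64_t total = 0;
    for (size_t i = 0; i < c->count; i++)
        total += c->items[i].length;
    return total;
}
```

The linear scan costs O(n) per extent, which is part of why caching info about every file is expensive; a hash table keyed on the physical offset would be the obvious refinement.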
>
>> It is based on the FIEMAP ioctl(2) in user space; OCFS2 uses the FIEMAP_EXTENT_SHARED
>> flag to indicate that an extent is reflinked/COW when the internal OCFS2_EXT_REFCOUNTED
>> flag is detected.
>>
>> Recently, I have started to implement this feature on Btrfs with a similar approach.
>> Once it is completed, the next step is to teach upstream du(1) to work for both file
>> systems with a new command option.
>>
>> This still sounds like nothing new because we have FIEMAP... :( But considering its bad interface
>> and how error-prone it was when I improved cp(1) with it for sparse files, this would extend
>> the ugly tentacles of FIEMAP into du(1) again, which the maintainer of coreutils (Jim, CC-ed)
>> doesn't like at all, and which I also want to avoid if possible...
>>
>> How about adding a new whence type to lseek(2) for this function? lseek has a very clear
>> interface and works very well for SEEK_DATA/SEEK_HOLE; most likely it could work fine for
>> shared extents IMHO.
> Well, I can hardly imagine how such lseek(2) interface would look to be
> useful for identifying shared extents among different files. Do you have
> something particular in mind?
lseek(2) would not be used for identifying shared extents among files. It would be extended so
that, when called with a particular whence, it finds and returns a desired extent which is reflinked
or cloned; the underlying file system should be improved accordingly.
Take Btrfs: if we perform btrfs_ioctl_clone from source file A to target B and run du(1)
against both files, it shows double the space although only half of that space is really
used/reserved under COW.
If we can mark the cloned extents of a file with a special flag (say EXTENT_MAP_CLONED), then
a call to lseek(fd, offset, SEEK_CLONE or ?) would return the offset of a cloned extent which is
at or beyond the given offset, so we can find all the cloned extents of a file, which
can then be used for disk space accounting in user space tools.
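To make the calling pattern concrete, here is a minimal sketch of the loop that already works today with SEEK_DATA/SEEK_HOLE. SEEK_CLONE does not exist; the assumption is that the proposed whence would be used in place of SEEK_DATA in the same kind of loop, and how the end of a cloned extent would be found is left open here. The function name count_data_bytes is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc exposes SEEK_DATA/SEEK_HOLE with this */
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3             /* Linux values, used only if the headers hide them */
#define SEEK_HOLE 4
#endif

/*
 * Sum the bytes of fd that lie inside data regions, as reported by
 * SEEK_DATA/SEEK_HOLE. Returns -1 on error.
 */
off_t count_data_bytes(int fd)
{
    off_t total = 0;
    off_t pos = 0;

    for (;;) {
        /* Next offset at or beyond pos that contains data. */
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO)
                break;          /* no more data at or after pos */
            return -1;
        }

        /* End of this data region (EOF counts as a hole). */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0)
            return -1;

        total += hole - data;
        pos = hole;
    }
    return total;
}
```

On a file system without hole support, the whole file is reported as one data region, so the function simply returns the file size.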
As I mentioned above, this can be implemented through FIEMAP in user space; however,
lseek(2) can supply a nicer call interface IMHO. :)
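For comparison, a minimal sketch of the FIEMAP route: walk a file's extents with the FS_IOC_FIEMAP ioctl and add up those flagged FIEMAP_EXTENT_SHARED. The function name shared_bytes and the batch size are hypothetical. It assumes the file system reports the shared flag at all, and it makes no attempt to deduplicate across files.

```c
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

#define EXTENT_BATCH 32         /* extents fetched per ioctl call (arbitrary) */

/*
 * Return the number of bytes of fd covered by extents flagged
 * FIEMAP_EXTENT_SHARED, or -1 if the ioctl fails or is unsupported.
 */
int64_t shared_bytes(int fd)
{
    size_t sz = sizeof(struct fiemap) +
                EXTENT_BATCH * sizeof(struct fiemap_extent);
    struct fiemap *fm = malloc(sz);
    uint64_t start = 0;
    int64_t shared = 0;
    int last = 0;

    if (!fm)
        return -1;

    while (!last) {
        memset(fm, 0, sz);
        fm->fm_start = start;
        fm->fm_length = FIEMAP_MAX_OFFSET - start;
        fm->fm_flags = FIEMAP_FLAG_SYNC;    /* flush dirty data first */
        fm->fm_extent_count = EXTENT_BATCH;

        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
            free(fm);
            return -1;
        }
        if (fm->fm_mapped_extents == 0)
            break;              /* nothing mapped beyond start */

        for (uint32_t i = 0; i < fm->fm_mapped_extents; i++) {
            struct fiemap_extent *e = &fm->fm_extents[i];

            if (e->fe_flags & FIEMAP_EXTENT_SHARED)
                shared += (int64_t)e->fe_length;
            if (e->fe_flags & FIEMAP_EXTENT_LAST)
                last = 1;
            /* Resume the next batch right after this extent. */
            start = e->fe_logical + e->fe_length;
        }
    }

    free(fm);
    return shared;
}
```

The buffer sizing, batching and flag checks are all left to the caller, which is the kind of interface burden the lseek(2) proposal is trying to avoid.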
Thanks,
-Jeff