From: Jan Kara
Subject: Re: [LSF/MM TOPIC] [ATTEND] Container disk quota and lseek(2) upon shared extents
Date: Tue, 29 Jan 2013 16:14:27 +0100
Message-ID: <20130129151427.GG32246@quack.suse.cz>
References: <5107E048.6080902@oracle.com>
In-Reply-To: <5107E048.6080902@oracle.com>
To: Jeff Liu
Cc: "linux-fsdevel@vger.kernel.org", lsf-pc@lists.linux-foundation.org, Jan Kara, Jim Meyering

  Hello,

On Tue 29-01-13 22:44:24, Jeff Liu wrote:
> I'd like to discuss the following problems on LSF:
>
> - Container UID/GID quota support
> About more than half year ago, I have posted a patch set about support UID/GID
> quota inside containers:
> http://www.spinics.net/lists/linux-containers/msg25393.html
>
> However, I have to put it on ice at that time since this feature is depend on the
> user namespace. Now I think it's time to bring it up because the user_ns was
> basically done on 3.8-rcX.
>
> Combine with user_ns, there would have a couple of issues need to be solved at first:
> 1) UID/GID mapping between global and containers quota files.
> On my previous implementation, the quotas are cached in memory that is truely can not
> be accepted at all, I'll try to make it as usual with journalling quota support.
>
> 2) To avoid modifying the quota tools, maybe we have to make quotas enabled all the
> time inside containers so that the end user would just set up quota limits or won't.
>
> 3) Embed container quota accounting related logic into the corresponding VFS quota
> routines and make it transparent for the outside file systems.

  So now looking into your old submission, your main aim was to make
quota-tools work properly when run from inside a container, right? Because
quota enforcement works properly once user namespaces are in place. In fact,
quota calls such as Q_GETQUOTA or Q_SETQUOTA work correctly with user
namespaces as well. UID/GID translation from the namespace ID space to the
global space and back is already happening. So what functionality are you
missing?

> - Introduce a new whence to lseek(2) to fetch the reflinked/sharing extents
>
> We have some user requests about showing the real disk footprint with OCFS2 reflinked
> or Btrfs cloned files. I had written a shared-du utility based on du(1) for OCFS2 as
> this is the only file system with reflink supports at that time:
> https://oss.oracle.com/pipermail/ocfs2-devel/2010-September/007293.html

  But this is a tough problem, isn't it? At a minimum, you have to cache
some info about *every* file du(1) was called on so that you can check
whether two files share some extents or not. I'm not saying it isn't useful
functionality, I'd just like to verify we are on the same page.

> It based on FIEMAP ioctl(2) on the user space, and OCFS2 using FIEMAP_EXTENT_SHARED
> flag to indicate an extent is reflinked/cow when the internal OCFS2_EXT_REFCOUNTED
> flag is detected.
>
> Recently, I have started to implement this feature on Btrfs in a similar approach.
> Once it completed, the next thing is to teach upstream du(1) works for both file
> systems with a new command option.
>
> Still sounds nothing because we have FIEMAP...:( But consider the bad interface
> and error prone when I improving cp(1) through it for sparse files, it will extends
> the ugly tentacles of FIEMAP into du(1) again that the maintainer of coreutils(Jim, CC-ed)
> don't like it at all, and I also want to avoid if possible...
>
> How about if we add a new whence type to lseek(2) for this function? lseek has very clear
> interface and works very well for SEEK_DATA/SEEK_HOLE, most likely could works fine for
> shared extents IMHO.

  Well, I can hardly imagine how such an lseek(2) interface would have to
look to be useful for identifying shared extents among different files. Do
you have something particular in mind?

								Honza
-- 
Jan Kara
SUSE Labs, CR