* [LSF/MM TOPIC] [ATTEND] Container disk quota and lseek(2) upon shared extents
@ 2013-01-29 14:44 Jeff Liu
2013-01-29 15:14 ` Jan Kara
0 siblings, 1 reply; 7+ messages in thread
From: Jeff Liu @ 2013-01-29 14:44 UTC
To: linux-fsdevel@vger.kernel.org, lsf-pc; +Cc: Jan Kara, Jim Meyering
Hello,
I'd like to discuss the following problems at LSF:
- Container UID/GID quota support
More than half a year ago, I posted a patch set to support UID/GID
quota inside containers:
http://www.spinics.net/lists/linux-containers/msg25393.html
However, I had to put it on ice at that time since this feature depends on the
user namespace. Now I think it's time to bring it up again because user_ns was
basically done in 3.8-rcX.
Combined with user_ns, there are a couple of issues that need to be solved first:
1) UID/GID mapping between the global and container quota files.
In my previous implementation, the quotas were cached in memory, which is truly
unacceptable; I'll try to make it work as usual with journalled quota support.
2) To avoid modifying the quota tools, maybe we have to keep quotas enabled all the
time inside containers, so that the end user only decides whether to set up quota limits.
3) Embed the container quota accounting logic into the corresponding VFS quota
routines and make it transparent to the file systems themselves.
- Introduce a new whence to lseek(2) to fetch the reflinked/shared extents
We have some user requests for showing the real disk footprint of OCFS2 reflinked
or Btrfs cloned files. I wrote a shared-du utility based on du(1) for OCFS2, as
it was the only file system with reflink support at that time:
https://oss.oracle.com/pipermail/ocfs2-devel/2010-September/007293.html
It is based on the FIEMAP ioctl(2) in user space, and OCFS2 uses the FIEMAP_EXTENT_SHARED
flag to indicate that an extent is reflinked/COW when the internal OCFS2_EXT_REFCOUNTED
flag is detected.
Recently, I have started to implement this feature on Btrfs with a similar approach.
Once it is complete, the next step is to teach upstream du(1) to work for both file
systems with a new command option.
Still, this sounds like nothing new because we have FIEMAP... :( But considering how
awkward and error-prone that interface was when I improved cp(1) with it for sparse
files, this would extend the ugly tentacles of FIEMAP into du(1) again, which the
maintainer of coreutils (Jim, CC-ed) doesn't like at all, and which I also want to
avoid if possible...
How about adding a new whence type to lseek(2) for this? lseek has a very clear
interface and works very well for SEEK_DATA/SEEK_HOLE; most likely it could work fine
for shared extents too, IMHO.
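For reference, the existing SEEK_DATA/SEEK_HOLE pattern that the proposal wants to mirror can be sketched as follows. This is only an illustration in Python; the helper name data_ranges is made up for this sketch, and on file systems without hole support the kernel simply reports the whole file as data.

```python
import errno
import os


def data_ranges(fd):
    """Return [(start, end), ...] byte ranges that lseek(2) reports as data.

    data_ranges is a hypothetical helper name used only in this sketch.
    """
    ranges = []
    size = os.fstat(fd).st_size
    offset = 0
    while offset < size:
        try:
            # First byte at or after offset that belongs to a data region.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:  # only a hole remains up to EOF
                break
            raise
        # End of that data region; end-of-file counts as an implicit hole.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        ranges.append((start, end))
        offset = end
    return ranges
```

A new whence for shared extents would presumably be driven by the same kind of loop, with the caller advancing the offset after each hit.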
Thanks,
-Jeff
* Re: [LSF/MM TOPIC] [ATTEND] Container disk quota and lseek(2) upon shared extents
2013-01-29 14:44 [LSF/MM TOPIC] [ATTEND] Container disk quota and lseek(2) upon shared extents Jeff Liu
@ 2013-01-29 15:14 ` Jan Kara
2013-01-29 16:37 ` Jeff Liu
0 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2013-01-29 15:14 UTC
To: Jeff Liu; +Cc: linux-fsdevel@vger.kernel.org, lsf-pc, Jan Kara, Jim Meyering
Hello,
On Tue 29-01-13 22:44:24, Jeff Liu wrote:
> I'd like to discuss the following problems on LSF:
>
> - Container UID/GID quota support
> About more than half year ago, I have posted a patch set about support UID/GID
> quota inside containers:
> http://www.spinics.net/lists/linux-containers/msg25393.html
>
> However, I have to put it on ice at that time since this feature is depend on the
> user namespace. Now I think it's time to bring it up because the user_ns was
> basically done on 3.8-rcX.
>
> Combine with user_ns, there would have a couple of issues need to be solved at first:
> 1) UID/GID mapping between global and containers quota files.
> On my previous implementation, the quotas are cached in memory that is truely can not
> be accepted at all, I'll try to make it as usual with journalling quota support.
>
> 2) To avoid modifying the quota tools, maybe we have to make quotas enabled all the
> time inside containers so that the end user would just set up quota limits or won't.
>
> 3) Embed container quota accounting related logic into the corresponding VFS quota
> routines and make it transparent for the outside file systems.
So now looking into your old submission, your main aim was to make
quota-tools work properly when run from inside a container, right? Because
quota enforcement works properly once user namespaces are in place. In fact
quota calls such as Q_GETQUOTA or Q_SETQUOTA work correctly as well with
user namespaces. UID/GID translation from namespace id space to the
global space and back is already happening. So what functionality are you
missing?
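The ID translation mentioned above can be pictured with a small model. This is a sketch of the idea only, not the kernel code: the function names are invented here, and the mapping uses the three columns found in /proc/<pid>/uid_map (ID inside the namespace, ID outside, length).

```python
def to_global_id(ns_id, id_map):
    """Translate an ID seen inside a user namespace to the global ID.

    id_map is a list of (inside_start, outside_start, length) ranges.
    Returns None when the ID is not mapped (a simplification for this sketch).
    """
    for inside, outside, length in id_map:
        if inside <= ns_id < inside + length:
            return outside + (ns_id - inside)
    return None


def to_ns_id(global_id, id_map):
    """Translate a global ID back to the ID seen inside the namespace."""
    for inside, outside, length in id_map:
        if outside <= global_id < outside + length:
            return inside + (global_id - outside)
    return None
```

With a map such as [(0, 100000, 65536)], a quota call for UID 1000 issued inside the container would be accounted against global UID 101000, and results are translated back the same way.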
> - Introduce a new whence to lseek(2) to fetch the reflinked/sharing extents
>
> We have some user requests about showing the real disk footprint with OCFS2 reflinked
> or Btrfs cloned files. I had written a shared-du utility based on du(1) for OCFS2 as
> this is the only file system with reflink supports at that time:
> https://oss.oracle.com/pipermail/ocfs2-devel/2010-September/007293.html
But this is a tough problem, isn't it? You have to minimally cache some
info about *every* file du(1) was called on so that you can check whether
two files share some extents or not. I'm not saying it isn't useful
functionality, I'd just like to verify we are on the same page.
> It based on FIEMAP ioctl(2) on the user space, and OCFS2 using FIEMAP_EXTENT_SHARED
> flag to indicate an extent is reflinked/cow when the internal OCFS2_EXT_REFCOUNTED
> flag is detected.
>
> Recently, I have started to implement this feature on Btrfs in a similar approach.
> Once it completed, the next thing is to teach upstream du(1) works for both file
> systems with a new command option.
>
> Still sounds nothing because we have FIEMAP...:( But consider the bad interface
> and error prone when I improving cp(1) through it for sparse files, it will extends
> the ugly tentacles of FIEMAP into du(1) again that the maintainer of coreutils(Jim, CC-ed)
> don't like it at all, and I also want to avoid if possible...
>
> How about if we add a new whence type to lseek(2) for this function? lseek has very clear
> interface and works very well for SEEK_DATA/SEEK_HOLE, most likely could works fine for
> shared extents IMHO.
Well, I can hardly imagine what such an lseek(2) interface would have to look
like to be useful for identifying shared extents among different files. Do you
have something particular in mind?
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: [LSF/MM TOPIC] [ATTEND] Container disk quota and lseek(2) upon shared extents
2013-01-29 15:14 ` Jan Kara
@ 2013-01-29 16:37 ` Jeff Liu
2013-01-29 19:19 ` Jan Kara
2013-01-30 2:41 ` Dave Chinner
0 siblings, 2 replies; 7+ messages in thread
From: Jeff Liu @ 2013-01-29 16:37 UTC
To: Jan Kara; +Cc: linux-fsdevel@vger.kernel.org, lsf-pc, Jim Meyering
Hi Jan,
On 01/29/2013 11:14 PM, Jan Kara wrote:
> Hello,
>
> On Tue 29-01-13 22:44:24, Jeff Liu wrote:
>> I'd like to discuss the following problems on LSF:
>>
>> - Container UID/GID quota support
>> About more than half year ago, I have posted a patch set about support UID/GID
>> quota inside containers:
>> http://www.spinics.net/lists/linux-containers/msg25393.html
>>
>> However, I have to put it on ice at that time since this feature is depend on the
>> user namespace. Now I think it's time to bring it up because the user_ns was
>> basically done on 3.8-rcX.
>>
>> Combine with user_ns, there would have a couple of issues need to be solved at first:
>> 1) UID/GID mapping between global and containers quota files.
>> On my previous implementation, the quotas are cached in memory that is truely can not
>> be accepted at all, I'll try to make it as usual with journalling quota support.
>>
>> 2) To avoid modifying the quota tools, maybe we have to make quotas enabled all the
>> time inside containers so that the end user would just set up quota limits or won't.
>>
>> 3) Embed container quota accounting related logic into the corresponding VFS quota
>> routines and make it transparent for the outside file systems.
> So now looking into your old submission, your main aim was to make
> quota-tools work properly when run from inside a container, right?
Right.
> Because quota enforcement works properly once user namespaces are in place. In fact
> quota calls such as Q_GETQUOTA or Q_SETQUOTA work correctly as well with
> user namespaces. UID/GID translation from namespace id space to the
> global space and back is already happening. So what functionality are you
> missing?
So it looks like there is no need to revisit it. :(
Previously I found that we cannot turn quota off inside containers without modifying
the quota tools. I am not sure whether this makes sense, or whether it is a fair user
requirement. Anyway, I'll play with user namespaces and the quota tools to investigate
further.
>
>
>> - Introduce a new whence to lseek(2) to fetch the reflinked/sharing extents
>>
>> We have some user requests about showing the real disk footprint with OCFS2 reflinked
>> or Btrfs cloned files. I had written a shared-du utility based on du(1) for OCFS2 as
>> this is the only file system with reflink supports at that time:
>> https://oss.oracle.com/pipermail/ocfs2-devel/2010-September/007293.html
> But this is a though problem, isn't it? You have to minimally cache some
> info about *every* file du(1) was called on so that you can check whether
> two files share some extents or not. I'm not saying it isn't a useful
> functionality, just I'd like to verify we are on the same page.
Yes, from user land, I have to cache the shared extent info and iterate over
the cached items to check whether the next extent to be cached already exists.
If it exists, increase its reference count and check the next one; otherwise, cache it.
This step repeats until all the files residing on the target partition/directories
have been checked.
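The caching step described above could look roughly like this. It is a simplified sketch, not the shared-du code: the function name and the (physical, length, shared) tuple layout are assumptions, and it treats two shared extents as the same only when their physical start and length match exactly, whereas real extents may overlap partially.

```python
from collections import Counter


def account_extents(files):
    """Return (apparent_bytes, unique_bytes, refs) for a set of files.

    files maps a file name to a list of (physical, length, shared) extents,
    e.g. as reported by FIEMAP. Only shared extents are cached.
    """
    refs = Counter()   # (physical, length) -> number of files referencing it
    apparent = 0       # what plain du(1) would add up
    unique = 0         # bytes actually occupied on disk
    for extents in files.values():
        for physical, length, shared in extents:
            apparent += length
            if not shared:
                unique += length
                continue
            key = (physical, length)
            if refs[key] == 0:
                unique += length   # first sighting: count it once
            refs[key] += 1         # later sightings only bump the count
    return apparent, unique, refs
```

The difference between the two totals is the space that a du(1) unaware of sharing would over-report.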
>
>> It based on FIEMAP ioctl(2) on the user space, and OCFS2 using FIEMAP_EXTENT_SHARED
>> flag to indicate an extent is reflinked/cow when the internal OCFS2_EXT_REFCOUNTED
>> flag is detected.
>>
>> Recently, I have started to implement this feature on Btrfs in a similar approach.
>> Once it completed, the next thing is to teach upstream du(1) works for both file
>> systems with a new command option.
>>
>> Still sounds nothing because we have FIEMAP...:( But consider the bad interface
>> and error prone when I improving cp(1) through it for sparse files, it will extends
>> the ugly tentacles of FIEMAP into du(1) again that the maintainer of coreutils(Jim, CC-ed)
>> don't like it at all, and I also want to avoid if possible...
>>
>> How about if we add a new whence type to lseek(2) for this function? lseek has very clear
>> interface and works very well for SEEK_DATA/SEEK_HOLE, most likely could works fine for
>> shared extents IMHO.
> Well, I can hardly imagine how such lseek(2) interface would look to be
> useful for identifying shared extents among different files. Do you have
> something particular in mind?
lseek(2) would not be used for identifying shared extents among files. It would be
extended so that, when called with a particular whence, it finds and returns the desired
extent, i.e. one that is reflinked or cloned; the underlying file systems would need to
be extended accordingly.
Take Btrfs as an example: if we perform btrfs_ioctl_clone from source file A to target
file B and run du(1) against both files, it reports double the space, although only half
of it is really used/reserved thanks to COW.
If we can mark the cloned extents of a file with a special flag (say EXTENT_MAP_CLONED),
then a call to lseek(fd, offset, SEEK_CLONE or ?) would return the offset of the first
cloned extent at or beyond the given offset, so we can find all the cloned extents of a
file, which would be used for disk space accounting in user space tools.
As I mentioned above, this can be implemented through FIEMAP in user space; however,
lseek(2) can supply a nicer call interface IMHO. :)
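To make the proposed semantics concrete, here is a small user-space model. SEEK_CLONE does not exist in any kernel; the function below only imitates what such a whence might return, and it assumes the SEEK_DATA convention that an offset already inside a cloned extent is returned unchanged and that ENXIO signals "nothing further".

```python
import errno


def seek_clone(extents, offset):
    """Model of the hypothetical lseek(fd, offset, SEEK_CLONE).

    extents is a list of (logical, length, cloned) tuples sorted by logical
    offset. Returns the first offset >= offset that lies in a cloned extent.
    """
    for logical, length, cloned in extents:
        if cloned and logical + length > offset:
            return max(logical, offset)
    raise OSError(errno.ENXIO, "no cloned extent at or beyond offset")
```

Note that the return value alone says nothing about where the cloned extent ends or which physical blocks back it, so a caller would still need a second query for that information.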
Thanks,
-Jeff
* Re: [LSF/MM TOPIC] [ATTEND] Container disk quota and lseek(2) upon shared extents
2013-01-29 16:37 ` Jeff Liu
@ 2013-01-29 19:19 ` Jan Kara
2013-01-30 3:49 ` Jeff Liu
2013-01-30 2:41 ` Dave Chinner
1 sibling, 1 reply; 7+ messages in thread
From: Jan Kara @ 2013-01-29 19:19 UTC
To: Jeff Liu; +Cc: Jan Kara, linux-fsdevel@vger.kernel.org, lsf-pc, Jim Meyering
Hi Jeff,
On Wed 30-01-13 00:37:08, Jeff Liu wrote:
> On 01/29/2013 11:14 PM, Jan Kara wrote:
> > Hello,
> >
> > On Tue 29-01-13 22:44:24, Jeff Liu wrote:
> >> I'd like to discuss the following problems on LSF:
> >>
> >> - Container UID/GID quota support
> >> About more than half year ago, I have posted a patch set about support UID/GID
> >> quota inside containers:
> >> http://www.spinics.net/lists/linux-containers/msg25393.html
> >>
> >> However, I have to put it on ice at that time since this feature is depend on the
> >> user namespace. Now I think it's time to bring it up because the user_ns was
> >> basically done on 3.8-rcX.
> >>
> >> Combine with user_ns, there would have a couple of issues need to be solved at first:
> >> 1) UID/GID mapping between global and containers quota files.
> >> On my previous implementation, the quotas are cached in memory that is truely can not
> >> be accepted at all, I'll try to make it as usual with journalling quota support.
> >>
> >> 2) To avoid modifying the quota tools, maybe we have to make quotas enabled all the
> >> time inside containers so that the end user would just set up quota limits or won't.
> >>
> >> 3) Embed container quota accounting related logic into the corresponding VFS quota
> >> routines and make it transparent for the outside file systems.
> > So now looking into your old submission, your main aim was to make
> > quota-tools work properly when run from inside a container, right?
> Right.
> > Because quota enforcement works properly once user namespaces are in place. In fact
> > quota calls such as Q_GETQUOTA or Q_SETQUOTA work correctly as well with
> > user namespaces. UID/GID translation from namespace id space to the
> > global space and back is already happening. So what functionality are you
> > missing?
> So looks like there is no need to revisit it.:(
> Previously I found that we can not turn quota off insides containers without modifying
> the quota tools, I am not sure this sounds make sense or not, or is this a fair user
> requirements. Anyway, I'll play with the user namespace with quota tools for further
> investigations.
So turning quotas on/off is a filesystem global action. As such it's hard
to make it work from containers when you don't have fs-per-container
setup... Implementing something like per-namespace quota enforcement (i.e.
only processes from a particular namespace will not be allowed to exceed
quota) might be reasonably possible though - you would just need to tweak
the sb_has_quota_limits_enabled() function to also take the current namespace
into account.
> >> - Introduce a new whence to lseek(2) to fetch the reflinked/sharing extents
> >>
> >> We have some user requests about showing the real disk footprint with OCFS2 reflinked
> >> or Btrfs cloned files. I had written a shared-du utility based on du(1) for OCFS2 as
> >> this is the only file system with reflink supports at that time:
> >> https://oss.oracle.com/pipermail/ocfs2-devel/2010-September/007293.html
> > But this is a though problem, isn't it? You have to minimally cache some
> > info about *every* file du(1) was called on so that you can check whether
> > two files share some extents or not. I'm not saying it isn't a useful
> > functionality, just I'd like to verify we are on the same page.
> Yes, from the user land, I have to cache the shared extents info, and
> iterate the cached item to examine if the next one to be cached is
> already exists or not. If exits, increase the count number and check the
> next one...otherwise, cache it, and repeat this step again and again
> until all the files resides on the target partition/directories were
> checked.
Yes, that's what I'd imagine.
> >> It based on FIEMAP ioctl(2) on the user space, and OCFS2 using FIEMAP_EXTENT_SHARED
> >> flag to indicate an extent is reflinked/cow when the internal OCFS2_EXT_REFCOUNTED
> >> flag is detected.
> >>
> >> Recently, I have started to implement this feature on Btrfs in a similar approach.
> >> Once it completed, the next thing is to teach upstream du(1) works for both file
> >> systems with a new command option.
> >>
> >> Still sounds nothing because we have FIEMAP...:( But consider the bad interface
> >> and error prone when I improving cp(1) through it for sparse files, it will extends
> >> the ugly tentacles of FIEMAP into du(1) again that the maintainer of coreutils(Jim, CC-ed)
> >> don't like it at all, and I also want to avoid if possible...
> >>
> >> How about if we add a new whence type to lseek(2) for this function? lseek has very clear
> >> interface and works very well for SEEK_DATA/SEEK_HOLE, most likely could works fine for
> >> shared extents IMHO.
> > Well, I can hardly imagine how such lseek(2) interface would look to be
> > useful for identifying shared extents among different files. Do you have
> > something particular in mind?
> lseek(2) is not used for identifying shared extents among files. It
> would be improved and called to find out and return an desired extent
> which is reflinked or cloned with a particular whence, the underlying
> file system should be improved accordingly.
>
> To say Btrfs, if we performed btrfs_ioctl_clone from source file A to
> target B, run du(1) against both files, it would show double space
> although only 1/2 space is really used/reserved upon COW.
>
> If we can mark the cloned extents of file with a special flag(to say
> EXTENT_MAP_CLONED), then call lseek(fd, offset, SEEK_CLONE or ?), it
> would return the offset of a cloned extent which is equal or beyond the
> given offset, so we can find out all the cloned extents upon a file which
> would be used for the disk space accounting in user space tools.
OK, but then you have to call FIEMAP anyway to find which blocks are
underlying the extent so that you can match that with cloned extents from
different files. Ah, and the advantage would be that you don't have to
cache *all* the extents but only those that are reported as reflinked. OK,
now I see.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: [LSF/MM TOPIC] [ATTEND] Container disk quota and lseek(2) upon shared extents
2013-01-29 16:37 ` Jeff Liu
2013-01-29 19:19 ` Jan Kara
@ 2013-01-30 2:41 ` Dave Chinner
2013-01-30 4:24 ` Jeff Liu
1 sibling, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2013-01-30 2:41 UTC
To: Jeff Liu; +Cc: Jan Kara, linux-fsdevel@vger.kernel.org, lsf-pc, Jim Meyering
On Wed, Jan 30, 2013 at 12:37:08AM +0800, Jeff Liu wrote:
> On 01/29/2013 11:14 PM, Jan Kara wrote:
> > On Tue 29-01-13 22:44:24, Jeff Liu wrote:
> >> I'd like to discuss the following problems on LSF:
....
> >> Still sounds nothing because we have FIEMAP...:( But consider the bad interface
> >> and error prone when I improving cp(1) through it for sparse files, it will extends
> >> the ugly tentacles of FIEMAP into du(1) again that the maintainer of coreutils(Jim, CC-ed)
> >> don't like it at all, and I also want to avoid if possible...
> >>
> >> How about if we add a new whence type to lseek(2) for this function? lseek has very clear
> >> interface and works very well for SEEK_DATA/SEEK_HOLE, most likely could works fine for
> >> shared extents IMHO.
> > Well, I can hardly imagine how such lseek(2) interface would look to be
> > useful for identifying shared extents among different files. Do you have
> > something particular in mind?
> lseek(2) is not used for identifying shared extents among files. It would be improved and
> called to find out and return an desired extent which is reflinked or cloned with a particular
> whence, the underlying file system should be improved accordingly.
>
> To say Btrfs, if we performed btrfs_ioctl_clone from source file A to target B, run du(1)
> against both files, it would show double space although only 1/2 space is really used/reserved
> upon COW.
>
> If we can mark the cloned extents of file with a special flag(to say EXTENT_MAP_CLONED), then
> call lseek(fd, offset, SEEK_CLONE or ?), it would return the offset of a cloned extent which is
> equal or beyond the given offset, so we can find out all the cloned extents upon a file which
> would be used for the disk space accounting in user space tools.
Why do you need lseek to find this? Couldn't we just add a filter
option to fiemap to ensure that it only returns extents in the file
that match the filter? You could then implement what you want with a
single call rather than having to indirectly iterate the extent map
with lseek() and then calling FIEMAP on each of the
regions discovered by lseek.....
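A rough sketch of the single-call approach, written in Python for brevity. FIEMAP has no kernel-side filter option today, so the filtering here happens in user space after the ioctl returns; the helper names and the 256-extent buffer are choices made for this sketch, and a file with more extents would need repeated calls.

```python
import fcntl
import struct

FS_IOC_FIEMAP = 0xC020660B            # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x0001             # flush the file before mapping
FIEMAP_EXTENT_LAST = 0x0001
FIEMAP_EXTENT_SHARED = 0x2000
FIEMAP_MAX_OFFSET = 0xFFFFFFFFFFFFFFFF
HEADER = struct.Struct("=QQIIII")     # struct fiemap without the extent array
EXTENT = struct.Struct("=QQQ2QI3I")   # struct fiemap_extent


def parse_extents(buf, flag_filter=0):
    """Decode a FIEMAP reply, keeping extents that carry all filter flags."""
    mapped = HEADER.unpack_from(buf, 0)[3]      # fm_mapped_extents
    result = []
    for i in range(mapped):
        fields = EXTENT.unpack_from(buf, HEADER.size + i * EXTENT.size)
        logical, physical, length, flags = fields[0], fields[1], fields[2], fields[5]
        if flags & flag_filter == flag_filter:
            result.append((logical, physical, length, flags))
    return result


def shared_extents(fd, max_extents=256):
    """Map the whole file with one FIEMAP call and return its shared extents."""
    buf = bytearray(HEADER.size + max_extents * EXTENT.size)
    HEADER.pack_into(buf, 0, 0, FIEMAP_MAX_OFFSET, FIEMAP_FLAG_SYNC,
                     0, max_extents, 0)
    fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)         # kernel fills buf in place
    return parse_extents(buf, FIEMAP_EXTENT_SHARED)
```

Each returned tuple already contains the physical offset, which is what a du-like tool needs in order to match shared extents across files.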
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [LSF/MM TOPIC] [ATTEND] Container disk quota and lseek(2) upon shared extents
2013-01-29 19:19 ` Jan Kara
@ 2013-01-30 3:49 ` Jeff Liu
0 siblings, 0 replies; 7+ messages in thread
From: Jeff Liu @ 2013-01-30 3:49 UTC
To: Jan Kara; +Cc: linux-fsdevel@vger.kernel.org, lsf-pc, Jim Meyering
On 01/30/2013 03:19 AM, Jan Kara wrote:
> Hi Jeff,
>
> On Wed 30-01-13 00:37:08, Jeff Liu wrote:
>> On 01/29/2013 11:14 PM, Jan Kara wrote:
>>> Hello,
>>>
>>> On Tue 29-01-13 22:44:24, Jeff Liu wrote:
>>>> I'd like to discuss the following problems on LSF:
>>>>
>>>> - Container UID/GID quota support
>>>> About more than half year ago, I have posted a patch set about support UID/GID
>>>> quota inside containers:
>>>> http://www.spinics.net/lists/linux-containers/msg25393.html
>>>>
>>>> However, I have to put it on ice at that time since this feature is depend on the
>>>> user namespace. Now I think it's time to bring it up because the user_ns was
>>>> basically done on 3.8-rcX.
>>>>
>>>> Combine with user_ns, there would have a couple of issues need to be solved at first:
>>>> 1) UID/GID mapping between global and containers quota files.
>>>> On my previous implementation, the quotas are cached in memory that is truely can not
>>>> be accepted at all, I'll try to make it as usual with journalling quota support.
>>>>
>>>> 2) To avoid modifying the quota tools, maybe we have to make quotas enabled all the
>>>> time inside containers so that the end user would just set up quota limits or won't.
>>>>
>>>> 3) Embed container quota accounting related logic into the corresponding VFS quota
>>>> routines and make it transparent for the outside file systems.
>>> So now looking into your old submission, your main aim was to make
>>> quota-tools work properly when run from inside a container, right?
>> Right.
>>> Because quota enforcement works properly once user namespaces are in place. In fact
>>> quota calls such as Q_GETQUOTA or Q_SETQUOTA work correctly as well with
>>> user namespaces. UID/GID translation from namespace id space to the
>>> global space and back is already happening. So what functionality are you
>>> missing?
>> So looks like there is no need to revisit it.:(
>> Previously I found that we can not turn quota off insides containers without modifying
>> the quota tools, I am not sure this sounds make sense or not, or is this a fair user
>> requirements. Anyway, I'll play with the user namespace with quota tools for further
>> investigations.
> So turning quotas on/off is a filesystem global action. As such it's hard
> to make it work from containers when you don't have fs-per-container
> setup... Implementing something like per-namespace quota enforcement (i.e.
> only processes from a particular namespace will not be allowed to exceed
> quota) might be reasonably possible though - you would just need to tweak
> sb_has_quota_limits_enabled() function to take also current namespace into
> account.
Yep, let me give it a try.
>
>>>> - Introduce a new whence to lseek(2) to fetch the reflinked/sharing extents
>>>>
>>>> We have some user requests about showing the real disk footprint with OCFS2 reflinked
>>>> or Btrfs cloned files. I had written a shared-du utility based on du(1) for OCFS2 as
>>>> this is the only file system with reflink supports at that time:
>>>> https://oss.oracle.com/pipermail/ocfs2-devel/2010-September/007293.html
>>> But this is a though problem, isn't it? You have to minimally cache some
>>> info about *every* file du(1) was called on so that you can check whether
>>> two files share some extents or not. I'm not saying it isn't a useful
>>> functionality, just I'd like to verify we are on the same page.
>> Yes, from the user land, I have to cache the shared extents info, and
>> iterate the cached item to examine if the next one to be cached is
>> already exists or not. If exits, increase the count number and check the
>> next one...otherwise, cache it, and repeat this step again and again
>> until all the files resides on the target partition/directories were
>> checked.
> Yes, that's what I'd imagine.
>
>>>> It based on FIEMAP ioctl(2) on the user space, and OCFS2 using FIEMAP_EXTENT_SHARED
>>>> flag to indicate an extent is reflinked/cow when the internal OCFS2_EXT_REFCOUNTED
>>>> flag is detected.
>>>>
>>>> Recently, I have started to implement this feature on Btrfs in a similar approach.
>>>> Once it completed, the next thing is to teach upstream du(1) works for both file
>>>> systems with a new command option.
>>>>
>>>> Still sounds nothing because we have FIEMAP...:( But consider the bad interface
>>>> and error prone when I improving cp(1) through it for sparse files, it will extends
>>>> the ugly tentacles of FIEMAP into du(1) again that the maintainer of coreutils(Jim, CC-ed)
>>>> don't like it at all, and I also want to avoid if possible...
>>>>
>>>> How about if we add a new whence type to lseek(2) for this function? lseek has very clear
>>>> interface and works very well for SEEK_DATA/SEEK_HOLE, most likely could works fine for
>>>> shared extents IMHO.
>>> Well, I can hardly imagine how such lseek(2) interface would look to be
>>> useful for identifying shared extents among different files. Do you have
>>> something particular in mind?
>> lseek(2) is not used for identifying shared extents among files. It
>> would be improved and called to find out and return an desired extent
>> which is reflinked or cloned with a particular whence, the underlying
>> file system should be improved accordingly.
>>
>> To say Btrfs, if we performed btrfs_ioctl_clone from source file A to
>> target B, run du(1) against both files, it would show double space
>> although only 1/2 space is really used/reserved upon COW.
>>
>> If we can mark the cloned extents of file with a special flag(to say
>> EXTENT_MAP_CLONED), then call lseek(fd, offset, SEEK_CLONE or ?), it
>> would return the offset of a cloned extent which is equal or beyond the
>> given offset, so we can find out all the cloned extents upon a file which
>> would be used for the disk space accounting in user space tools.
> OK, but then you have to call FIEMAP anyway to find which blocks are
> underlying the extent so that you can match that with cloned extents from
> different files.
Yes, I have to call FIEMAP, as user space ends up checking the
physical offset of the start of each extent. :(
> Ah, and the advantage would be that you don't have to
> cache *all* the extents but only those that are reported as reflinked.
Yess!
Thanks,
-Jeff
> OK, now I see.
>
> Honza
>
* Re: [LSF/MM TOPIC] [ATTEND] Container disk quota and lseek(2) upon shared extents
2013-01-30 2:41 ` Dave Chinner
@ 2013-01-30 4:24 ` Jeff Liu
0 siblings, 0 replies; 7+ messages in thread
From: Jeff Liu @ 2013-01-30 4:24 UTC
To: Dave Chinner
Cc: Jan Kara, linux-fsdevel@vger.kernel.org, lsf-pc, Jim Meyering
On 01/30/2013 10:41 AM, Dave Chinner wrote:
> On Wed, Jan 30, 2013 at 12:37:08AM +0800, Jeff Liu wrote:
>> On 01/29/2013 11:14 PM, Jan Kara wrote:
>>> On Tue 29-01-13 22:44:24, Jeff Liu wrote:
>>>> I'd like to discuss the following problems on LSF:
> ....
>>>> Still sounds nothing because we have FIEMAP...:( But consider the bad interface
>>>> and error prone when I improving cp(1) through it for sparse files, it will extends
>>>> the ugly tentacles of FIEMAP into du(1) again that the maintainer of coreutils(Jim, CC-ed)
>>>> don't like it at all, and I also want to avoid if possible...
>>>>
>>>> How about if we add a new whence type to lseek(2) for this function? lseek has very clear
>>>> interface and works very well for SEEK_DATA/SEEK_HOLE, most likely could works fine for
>>>> shared extents IMHO.
>>> Well, I can hardly imagine how such lseek(2) interface would look to be
>>> useful for identifying shared extents among different files. Do you have
>>> something particular in mind?
>> lseek(2) is not used for identifying shared extents among files. It would be improved and
>> called to find out and return an desired extent which is reflinked or cloned with a particular
>> whence, the underlying file system should be improved accordingly.
>>
>> To say Btrfs, if we performed btrfs_ioctl_clone from source file A to target B, run du(1)
>> against both files, it would show double space although only 1/2 space is really used/reserved
>> upon COW.
>>
>> If we can mark the cloned extents of file with a special flag(to say EXTENT_MAP_CLONED), then
>> call lseek(fd, offset, SEEK_CLONE or ?), it would return the offset of a cloned extent which is
>> equal or beyond the given offset, so we can find out all the cloned extents upon a file which
>> would be used for the disk space accounting in user space tools.
>
> Why do you need lseek to find this?
I had originally thought to call lseek to get only the cloned extents,
rather than fetching all kinds of extents, and then call FIEMAP
to get the physical block offset for each logical offset.
> Couldn't we just add a filter
> option to fiemap to ensure that it only returns extents in the file
> that match the filter? You could then implement what you want with a
> single call rather than having to indirectly iterate the extent map
> with lseek() and then calling FIEMAP on each of the
> regions discovered by lseek.....
Really a nice point, thank you!
-Jeff
Thread overview: 7+ messages
2013-01-29 14:44 [LSF/MM TOPIC] [ATTEND] Container disk quota and lseek(2) upon shared extents Jeff Liu
2013-01-29 15:14 ` Jan Kara
2013-01-29 16:37 ` Jeff Liu
2013-01-29 19:19 ` Jan Kara
2013-01-30 3:49 ` Jeff Liu
2013-01-30 2:41 ` Dave Chinner
2013-01-30 4:24 ` Jeff Liu