linux-btrfs.vger.kernel.org archive mirror
* How to find (out if) files sharing content?
@ 2012-10-30 15:20 Gábor Nyers
  2012-10-30 15:39 ` Hugo Mills
  2012-10-31  0:40 ` Liu Bo
  0 siblings, 2 replies; 9+ messages in thread
From: Gábor Nyers @ 2012-10-30 15:20 UTC (permalink / raw)
  To: linux-btrfs

Hi,

How could one find out if 2 files share any extents on a btrfs file system?

A more generic variation of the above: How to list files on the same
file system/subvolume sharing content?

Thanks,
Gábor

* Re: How to find (out if) files sharing content?
  2012-10-30 15:20 How to find (out if) files sharing content? Gábor Nyers
@ 2012-10-30 15:39 ` Hugo Mills
  2012-10-30 15:58   ` Jan Schmidt
  2012-10-31  0:40 ` Liu Bo
  1 sibling, 1 reply; 9+ messages in thread
From: Hugo Mills @ 2012-10-30 15:39 UTC (permalink / raw)
  To: Gábor Nyers; +Cc: linux-btrfs

On Tue, Oct 30, 2012 at 04:20:05PM +0100, Gábor Nyers wrote:
> Hi,
> 
> How could one find out if 2 files share any extents on a btrfs file system?
> 
> A more generic variation of the above: How to list files on the same
> file system/subvolume sharing content?

   You have direct (read-only) access to the metadata trees through
the TREE_SEARCH ioctl. It should be possible to walk through the
extents of a given file, and (I think) follow back-refs from the
extent back to the other files that share it.

   There's no simple code to do that right now, though.
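
A minimal sketch of the first half of that idea: walking one file's extents with the TREE_SEARCH ioctl from Python. The ioctl number and the struct layouts are assumed to match the kernel's btrfs UAPI header and should be checked against it. The function names (`parse_items`, `file_extents`, `shared_extents`) are made up for this illustration. The sketch does not follow back-refs; it only compares the on-disk extent addresses of two files. The ioctl normally requires root.

```python
import fcntl
import os
import struct

# Assumed to match the kernel's btrfs UAPI header; verify before relying on them.
BTRFS_IOC_TREE_SEARCH = 0xD0009411   # _IOWR(0x94, 17, struct btrfs_ioctl_search_args)
BTRFS_EXTENT_DATA_KEY = 108
ARGS_SIZE = 4096                     # search key followed by the result buffer
U64_MAX = 2**64 - 1

KEY = struct.Struct("=7Q4I4Q")               # struct btrfs_ioctl_search_key
HEADER = struct.Struct("=QQQII")             # struct btrfs_ioctl_search_header
FILE_EXTENT = struct.Struct("<QQBBHBQQQQ")   # struct btrfs_file_extent_item (packed)


def parse_items(buf, nr_items):
    """Decode EXTENT_DATA items.

    Returns ([(file_offset, disk_bytenr, num_bytes), ...], last_key_offset).
    """
    pos, extents, last_offset = KEY.size, [], 0
    for _ in range(nr_items):
        _transid, _objectid, last_offset, item_type, length = HEADER.unpack_from(buf, pos)
        pos += HEADER.size
        if item_type == BTRFS_EXTENT_DATA_KEY and length >= FILE_EXTENT.size:
            item = FILE_EXTENT.unpack_from(buf, pos)
            extent_type, disk_bytenr, num_bytes = item[5], item[6], item[9]
            # Skip inline extents (type 0) and holes (disk_bytenr 0).
            if extent_type != 0 and disk_bytenr != 0:
                extents.append((last_offset, disk_bytenr, num_bytes))
        pos += length
    return extents, last_offset


def file_extents(path):
    """List the regular extents of `path` by searching its subvolume tree."""
    extents, min_offset = [], 0
    with open(path, "rb") as f:
        ino = os.fstat(f.fileno()).st_ino
        while True:
            buf = bytearray(ARGS_SIZE)
            # tree_id 0 means "the tree that contains the file behind this fd".
            KEY.pack_into(buf, 0, 0, ino, ino, min_offset, U64_MAX, 0, U64_MAX,
                          BTRFS_EXTENT_DATA_KEY, BTRFS_EXTENT_DATA_KEY,
                          4096, 0, 0, 0, 0, 0)
            fcntl.ioctl(f.fileno(), BTRFS_IOC_TREE_SEARCH, buf)
            nr_items = KEY.unpack_from(buf, 0)[9]   # the kernel writes back the count
            if nr_items == 0:
                return extents
            found, last_offset = parse_items(buf, nr_items)
            extents.extend(found)
            min_offset = last_offset + 1            # continue after the last key seen


def shared_extents(extents_a, extents_b):
    """Return the disk_bytenr values referenced by both extent lists."""
    return {e[1] for e in extents_a} & {e[1] for e in extents_b}
```

Under these assumptions, `shared_extents(file_extents("a"), file_extents("b"))` is non-empty when both files reference the same on-disk extent. The two files may still use different sub-ranges of that extent, so this is a coarse check.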

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- And what rough beast,  its hour come round at last / slouches ---  
                     towards Bethlehem,  to be born?                     

* Re: How to find (out if) files sharing content?
  2012-10-30 15:39 ` Hugo Mills
@ 2012-10-30 15:58   ` Jan Schmidt
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Schmidt @ 2012-10-30 15:58 UTC (permalink / raw)
  To: Hugo Mills, Gábor Nyers, linux-btrfs

On Tue, October 30, 2012 at 16:39 (+0100), Hugo Mills wrote:
> It should be possible to walk through the
> extents of a given file, and (I think) follow back-refs from the
> extent back to the other files that share it.

You wish :-) Backrefs are not designed to be walked while the file system is online.
However, "btrfs inspect-internal logical-resolve" ("btrfs inspect logical" for short)
manages quite well; at least I haven't heard otherwise so far. You still need to get
the logical block numbers, either through the TREE_SEARCH ioctl or from filefrag.
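
A rough sketch of that two-step route in Python: take the extent addresses from `filefrag -v`, then hand each one to `btrfs inspect-internal logical-resolve`. The regular expressions assume the table layout printed by recent e2fsprogs versions, which may differ on older systems. The helper names (`parse_filefrag`, `files_at`, `sharers`) are made up for this illustration. It also assumes that the "physical" offsets filefrag prints on btrfs are btrfs logical addresses, and that the commands run as root.

```python
import re
import subprocess

# Matches "... (2 blocks of 4096 bytes)" in the filefrag header.
BLOCK_RE = re.compile(r"of (\d+) bytes\)")
# Matches one extent row: "ext: logical..logical: physical..physical: length:".
EXTENT_RE = re.compile(
    r"^\s*\d+:\s*(\d+)\.\.\s*(\d+):\s*(\d+)\.\.\s*(\d+):\s*(\d+):")


def parse_filefrag(text):
    """Return the byte address of the start of every extent in `filefrag -v` output."""
    block_size = int(BLOCK_RE.search(text).group(1))
    addresses = []
    for line in text.splitlines():
        match = EXTENT_RE.match(line)
        if match:
            addresses.append(int(match.group(3)) * block_size)
    return addresses


def files_at(logical, mountpoint):
    """Ask btrfs which paths reference the given logical byte address."""
    result = subprocess.run(
        ["btrfs", "inspect-internal", "logical-resolve", str(logical), mountpoint],
        capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def sharers(path, mountpoint):
    """Map each extent address of `path` to the list of paths that reference it."""
    text = subprocess.run(["filefrag", "-v", path],
                          capture_output=True, text=True, check=True).stdout
    return {addr: files_at(addr, mountpoint) for addr in parse_filefrag(text)}
```

Any address whose list contains more than one path is an extent shared between files.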

-Jan

* Re: How to find (out if) files sharing content?
  2012-10-30 15:20 How to find (out if) files sharing content? Gábor Nyers
  2012-10-30 15:39 ` Hugo Mills
@ 2012-10-31  0:40 ` Liu Bo
  2012-10-31  2:30   ` Jeff Liu
  1 sibling, 1 reply; 9+ messages in thread
From: Liu Bo @ 2012-10-31  0:40 UTC (permalink / raw)
  To: Gábor Nyers; +Cc: linux-btrfs, Jie Liu

On 10/30/2012 11:20 PM, Gábor Nyers wrote:
> Hi,
> 
> How could one find out if 2 files share any extents on a btrfs file system?
> 
> A more generic variation of the above: How to list files on the same
> file system/subvolume sharing content?
> 

Indeed, ocfs2 already has a feature that lets you get the shared parts via 'du';
we're planning to support this in btrfs, too.

thanks,
liubo

> Thanks,
> Gábor
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


* Re: How to find (out if) files sharing content?
  2012-10-31  0:40 ` Liu Bo
@ 2012-10-31  2:30   ` Jeff Liu
  2012-10-31 11:31     ` David Sterba
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff Liu @ 2012-10-31  2:30 UTC (permalink / raw)
  To: Gábor Nyers; +Cc: Liu Bo, linux-btrfs

On 10/31/2012 08:40 AM, Liu Bo wrote:
> On 10/30/2012 11:20 PM, Gábor Nyers wrote:
>> Hi,
>>
>> How could one find out if 2 files share any extents on a btrfs file system?
>>
>> A more generic variation of the above: How to list files on the same
>> file system/subvolume sharing content?
One idea is to mark cloned extents as FIEMAP_EXTENT_SHARED so that
we can walk a file's extents through fiemap(2), count how many are
shared, and finally calculate the real storage (fs/subvolume) footprint.
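
A minimal sketch of what such a userspace walk could look like in Python, assuming the file system sets FIEMAP_EXTENT_SHARED on cloned extents. The constants and struct layouts are assumed to match linux/fs.h and linux/fiemap.h. The function names (`parse_extents`, `fiemap`, `shared_bytes`) are made up for this illustration.

```python
import fcntl
import struct

# Assumed to match linux/fs.h and linux/fiemap.h; verify before relying on them.
FS_IOC_FIEMAP = 0xC020660B        # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x0001         # flush dirty data before mapping
FIEMAP_EXTENT_LAST = 0x0001       # last extent of the file
FIEMAP_EXTENT_SHARED = 0x2000     # extent is referenced by more than one file
U64_MAX = 2**64 - 1

HDR = struct.Struct("=QQIIII")        # struct fiemap without the extent array
EXT = struct.Struct("=QQQQQIIII")     # struct fiemap_extent


def parse_extents(buf):
    """Decode a filled struct fiemap into (logical, physical, length, flags) tuples."""
    mapped = HDR.unpack_from(buf, 0)[3]          # fm_mapped_extents
    extents = []
    for i in range(mapped):
        fields = EXT.unpack_from(buf, HDR.size + i * EXT.size)
        extents.append((fields[0], fields[1], fields[2], fields[5]))
    return extents


def fiemap(path, batch=256):
    """Return all extents of `path`, fetching `batch` extents per ioctl call."""
    extents, start = [], 0
    with open(path, "rb") as f:
        while True:
            buf = bytearray(HDR.size + batch * EXT.size)
            HDR.pack_into(buf, 0, start, U64_MAX - start, FIEMAP_FLAG_SYNC, 0, batch, 0)
            fcntl.ioctl(f.fileno(), FS_IOC_FIEMAP, buf)
            found = parse_extents(buf)
            if not found:
                return extents
            extents.extend(found)
            logical, _physical, length, flags = found[-1]
            if flags & FIEMAP_EXTENT_LAST:
                return extents
            start = logical + length             # resume after the last extent seen


def shared_bytes(extents):
    """Sum the length of every extent flagged as shared."""
    return sum(length for _l, _p, length, flags in extents
               if flags & FIEMAP_EXTENT_SHARED)
```

Subtracting `shared_bytes(fiemap(path))` from the file's total mapped length would give the space that only this file occupies, which is the per-file part of the footprint calculation.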

Thanks,
-Jeff
>>
> 
> Indeed, ocfs2 already has a feature that lets you get the shared parts via 'du';
> we're planning to support this in btrfs, too.
> 
> thanks,
> liubo
> 
>> Thanks,
>> Gábor
> 


* Re: How to find (out if) files sharing content?
  2012-10-31  2:30   ` Jeff Liu
@ 2012-10-31 11:31     ` David Sterba
  2012-10-31 13:02       ` Jeff Liu
  0 siblings, 1 reply; 9+ messages in thread
From: David Sterba @ 2012-10-31 11:31 UTC (permalink / raw)
  To: Jeff Liu; +Cc: Gábor Nyers, Liu Bo, linux-btrfs

On Wed, Oct 31, 2012 at 10:30:22AM +0800, Jeff Liu wrote:
> One idea is to mark those cloned extents as FIEMAP_EXTENT_SHARED so that
> we can go through a file to figure out how many extents are shared
> through fiemap(2), and calculate the real storage(fs/subvolume) footprint
> in the end.

This will cost at least one more seek per extent to find out that the
extent is shared, which could be quite expensive. And without any
possibility to turn this off, I'm afraid this will render FIEMAP unusable
in practice.

david

* Re: How to find (out if) files sharing content?
  2012-10-31 11:31     ` David Sterba
@ 2012-10-31 13:02       ` Jeff Liu
  2012-11-05 22:45         ` David Sterba
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff Liu @ 2012-10-31 13:02 UTC (permalink / raw)
  To: dave; +Cc: Liu Bo, Gábor Nyers, linux-btrfs

On 10/31/2012 07:31 PM, David Sterba wrote:
> On Wed, Oct 31, 2012 at 10:30:22AM +0800, Jeff Liu wrote:
>> One idea is to mark those cloned extents as FIEMAP_EXTENT_SHARED so that
>> we can go through a file to figure out how many extents are shared
>> through fiemap(2), and calculate the real storage(fs/subvolume) footprint
>> in the end.
> 
> This will cost at least one more seek per extent to find out that the
> extent is shared, which could be quite expensive.
I proposed this because OCFS2 reports shared space this way, in combination with du(1).

An old patch set that teaches du(1) to be aware of reflinked files:
https://oss.oracle.com/pipermail/ocfs2-devel/2010-September/007293.html

Do you mean that checking the extent status of every file from userland is very expensive?
If so, I once tested a 50 GB OCFS2 partition filled with reflinked files on an old laptop;
it took around 4 minutes to show the total results, if I recall correctly, but this definitely
depends on the real-world scenario.

> And without any possibility to turn this off, I'm afraid this will render FIEMAP unusable in practice.
For OCFS2, the FIEMAP_EXTENT_SHARED flag is set by the fiemap ioctl(2) if an extent
is OCFS2_EXT_REFCOUNTED (i.e. reflinked or cloned), which means that FIEMAP_EXTENT_SHARED
is not a persistent flag, but I have no idea how Btrfs would handle this point. :(

Thanks,
-Jeff
> 
> david


* Re: How to find (out if) files sharing content?
  2012-10-31 13:02       ` Jeff Liu
@ 2012-11-05 22:45         ` David Sterba
  2012-11-06  3:53           ` Jeff Liu
  0 siblings, 1 reply; 9+ messages in thread
From: David Sterba @ 2012-11-05 22:45 UTC (permalink / raw)
  To: Jeff Liu; +Cc: dave, Liu Bo, Gábor Nyers, linux-btrfs

On Wed, Oct 31, 2012 at 09:02:15PM +0800, Jeff Liu wrote:
> I proposed this because OCFS2 reports shared space this way, in combination with du(1).
> 
> An old patch set that teaches du(1) to be aware of reflinked files:
> https://oss.oracle.com/pipermail/ocfs2-devel/2010-September/007293.html

Patch looks ok, the shared size is requested by an option.

> Do you mean that checking the extent status of every file from userland is very expensive?

The most expensive part is IMO not in userspace, it does in-memory lookups.

> > And without any possibility to turn this off, I'm afraid this will render FIEMAP unusable in practice.
> For OCFS2, the FIEMAP_EXTENT_SHARED flag is set by the fiemap ioctl(2) if an extent
> is OCFS2_EXT_REFCOUNTED (i.e. reflinked or cloned), which means that FIEMAP_EXTENT_SHARED
> is not a persistent flag, but I have no idea how Btrfs would handle this point. :(

After some research, I think this could work for btrfs without
unwanted performance penalties.

There's the fiemap::fm_flags field that can be extended to request the
shared extent info from fiemap, so the information is not computed
unconditionally (that was my concern before). The rest is only
implementation details how to speed up the file extent -> refcount info
lookups.
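
From userspace, such an opt-in could look roughly like the Python sketch below. `FIEMAP_FLAG_WANT_SHARED` and its value are hypothetical, invented here to stand in for whatever request bit would be added; no kernel defines it. The fallback relies on FIEMAP rejecting request flags it does not support with EBADR. The names `build_request` and `count_extents` are likewise made up for this illustration.

```python
import errno
import fcntl
import struct

FS_IOC_FIEMAP = 0xC020660B        # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x0001
# HYPOTHETICAL opt-in bit, invented for this sketch; no kernel defines it.
FIEMAP_FLAG_WANT_SHARED = 0x0100

# fm_start, fm_length, fm_flags, fm_mapped_extents, fm_extent_count, fm_reserved
HDR = struct.Struct("=QQIIII")


def build_request(want_shared):
    """Pack a struct fiemap header that only counts extents (fm_extent_count = 0)."""
    flags = FIEMAP_FLAG_SYNC
    if want_shared:
        flags |= FIEMAP_FLAG_WANT_SHARED      # ask for the costly lookup explicitly
    return bytearray(HDR.pack(0, 2**64 - 1, flags, 0, 0, 0))


def count_extents(path, want_shared=True):
    """Return (number_of_extents, shared_info_granted) for `path`."""
    with open(path, "rb") as f:
        buf = build_request(want_shared)
        try:
            fcntl.ioctl(f.fileno(), FS_IOC_FIEMAP, buf)
        except OSError as exc:
            if not (want_shared and exc.errno == errno.EBADR):
                raise
            # The flag is not supported: retry with a plain, cheap request.
            want_shared = False
            buf = build_request(False)
            fcntl.ioctl(f.fileno(), FS_IOC_FIEMAP, buf)
        return HDR.unpack_from(buf, 0)[3], want_shared
```

Callers that never set the bit pay nothing extra, which addresses the concern about computing the shared state unconditionally.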

david

* Re: How to find (out if) files sharing content?
  2012-11-05 22:45         ` David Sterba
@ 2012-11-06  3:53           ` Jeff Liu
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff Liu @ 2012-11-06  3:53 UTC (permalink / raw)
  To: dave, Liu Bo, Gábor Nyers, linux-btrfs

On 11/06/2012 06:45 AM, David Sterba wrote:
> On Wed, Oct 31, 2012 at 09:02:15PM +0800, Jeff Liu wrote:
>> I proposed this because OCFS2 reports shared space this way, in combination with du(1).
>>
>> An old patch set that teaches du(1) to be aware of reflinked files:
>> https://oss.oracle.com/pipermail/ocfs2-devel/2010-September/007293.html
> 
> Patch looks ok, the shared size is requested by an option.
> 
>> Do you mean that checking the extent status of every file from userland is very expensive?
> 
> The most expensive part is IMO not in userspace, it does in-memory lookups.
> 
>>> And without any possibility to turn this off, I'm afraid this will render FIEMAP unusable in practice.
>> For OCFS2, the FIEMAP_EXTENT_SHARED flag is set by the fiemap ioctl(2) if an extent
>> is OCFS2_EXT_REFCOUNTED (i.e. reflinked or cloned), which means that FIEMAP_EXTENT_SHARED
>> is not a persistent flag, but I have no idea how Btrfs would handle this point. :(
> 
> After some research, I think this could work for btrfs without
> unwanted performance penalties.
> 
> There's the fiemap::fm_flags field that can be extended to request the
> shared extent info from fiemap, so the information is not computed
> unconditionally (that was my concern before). The rest is only
> implementation details how to speed up the file extent -> refcount info
> lookups.
Thanks for your confirmation.

-Jeff
> 
> david