linux-btrfs.vger.kernel.org archive mirror
* Identifying reflink / CoW files
@ 2016-10-27 11:30 Saint Germain
  2016-11-03  5:17 ` Zygo Blaxell
  0 siblings, 1 reply; 8+ messages in thread
From: Saint Germain @ 2016-10-27 11:30 UTC
  To: linux-btrfs

Hello,

Following the previous discussion:
https://www.spinics.net/lists/linux-btrfs/msg19075.html

I would be interested in finding a way to reliably identify reflink /
CoW files in order to use deduplication programs (like fdupes, jdupes,
rmlint) efficiently.

Using FIEMAP doesn't seem to be reliable according to this discussion
on rmlint:
https://github.com/sahib/rmlint/issues/132#issuecomment-157665154
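
For readers unfamiliar with it, a FIEMAP query can be sketched from
Python roughly as follows. This is only an illustration, not what rmlint
or any other tool does: the constants are copied from <linux/fiemap.h>
and <linux/fs.h>, the names Extent, parse_fiemap and read_extents are
made up for this example, and the reliability concerns from the rmlint
discussion apply to whatever the kernel reports.

```python
import fcntl
import struct
from typing import List, NamedTuple

# Constants from <linux/fs.h> and <linux/fiemap.h>.
FS_IOC_FIEMAP = 0xC020660B        # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x0001         # flush dirty data before mapping
FIEMAP_EXTENT_LAST = 0x0001       # last extent of the file
FIEMAP_EXTENT_SHARED = 0x2000     # extent is referenced more than once

HEADER = struct.Struct("=QQIIII")    # struct fiemap without fm_extents[]
EXTENT = struct.Struct("=QQQ2QI3I")  # struct fiemap_extent


class Extent(NamedTuple):
    """Hypothetical container for the fields used in this sketch."""
    logical: int
    physical: int
    length: int
    flags: int


def parse_fiemap(buf) -> List[Extent]:
    """Decode the extents the kernel wrote into a FIEMAP buffer."""
    mapped = HEADER.unpack_from(buf, 0)[3]  # fm_mapped_extents
    extents = []
    for i in range(mapped):
        f = EXTENT.unpack_from(buf, HEADER.size + i * EXTENT.size)
        # f[3], f[4] and f[6:] are reserved fields; f[5] is fe_flags.
        extents.append(Extent(f[0], f[1], f[2], f[5]))
    return extents


def read_extents(path: str, batch: int = 256) -> List[Extent]:
    """Ask the kernel for a file's extent map, `batch` extents per call."""
    extents: List[Extent] = []
    start = 0
    with open(path, "rb") as fh:
        while True:
            buf = bytearray(HEADER.size + batch * EXTENT.size)
            # fm_start, fm_length, fm_flags, fm_mapped_extents,
            # fm_extent_count, fm_reserved
            HEADER.pack_into(buf, 0, start, 0xFFFFFFFFFFFFFFFF - start,
                             FIEMAP_FLAG_SYNC, 0, batch, 0)
            fcntl.ioctl(fh.fileno(), FS_IOC_FIEMAP, buf)
            got = parse_fiemap(buf)
            extents.extend(got)
            if not got or got[-1].flags & FIEMAP_EXTENT_LAST:
                return extents
            # Continue after the last extent returned so far.
            start = got[-1].logical + got[-1].length
```

An extent with FIEMAP_EXTENT_SHARED set in its flags is reported by the
kernel as referenced more than once; the flag alone does not say which
other file references it.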

Is there another way that deduplication programs can easily use?

Thanks

* Identifying reflink / CoW files
@ 2012-09-22  3:38 Jp Wise
  2012-09-22  7:49 ` Arne Jansen
  0 siblings, 1 reply; 8+ messages in thread
From: Jp Wise @ 2012-09-22  3:38 UTC
  To: linux-btrfs

Good morning, I'm working on an offline deduplication script intended to 
work around the copy-on-write functionality of BTRFS.

Simply put: is there any existing utility to compare two files (or
directories) and report whether the files share the same physical
extents / data blocks on disk, i.e. whether they are CoW copies?
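
The comparison being asked for can be sketched as follows, assuming each
file's extent map is already available as (logical, physical, length)
triples in bytes, for example from the FIEMAP ioctl. Whether such numbers
are trustworthy on BTRFS is a separate question; the function names
shared_bytes and is_cow_copy are hypothetical and exist only for this
example.

```python
from typing import Iterable, Tuple

# (logical offset in file, physical offset on disk, length), all in bytes.
ExtentTriple = Tuple[int, int, int]


def shared_bytes(a: Iterable[ExtentTriple], b: Iterable[ExtentTriple]) -> int:
    """Count bytes whose logical offset maps to the same physical
    offset in both extent lists."""
    a, b = sorted(a), sorted(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        a_log, a_phys, a_len = a[i]
        b_log, b_phys, b_len = b[j]
        lo = max(a_log, b_log)
        hi = min(a_log + a_len, b_log + b_len)
        # The logical ranges overlap and are backed by the same disk
        # bytes when both extents have the same logical-to-physical shift.
        if lo < hi and a_phys - a_log == b_phys - b_log:
            total += hi - lo
        # Advance whichever extent ends first.
        if a_log + a_len <= b_log + b_len:
            i += 1
        else:
            j += 1
    return total


def is_cow_copy(a: Iterable[ExtentTriple], b: Iterable[ExtentTriple]) -> bool:
    """True if every mapped byte of both files is shared with the other."""
    a, b = list(a), list(b)
    size_a = sum(length for _, _, length in a)
    size_b = sum(length for _, _, length in b)
    return size_a == size_b == shared_bytes(a, b)
```

With this formulation a partially modified clone shows up as sharing
only some of its bytes, rather than as a plain yes or no.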

I'm not actively working with BTRFS yet, but for the project I'm working
on it's looking to be the most suitable candidate, and the CoW
functionality avoids issues with file changes that hardlinks would create.
From reading other posts, I'm aware the information could be pulled out via
btrfs-debug-tree, but that would involve parsing the entire output to
locate the required files' inodes and their extents, which seems like
quite a roundabout way to retrieve the information.

Also my programming skills aren't up to the task of trying to pull the
tree data directly from the filesystem to do it, and I'd like to avoid 
doing byte-by-byte comparisons on all files as it's inefficient if the 
file can instead be identified as a CoW copy.
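
Independent of any extent lookup, the byte-by-byte comparison can at
least be kept as a last resort behind cheap metadata checks. A minimal
sketch, assuming only the Python standard library; same_content is a
made-up name, and the sketch does not detect CoW copies, it only avoids
reading files that cannot match or that are the same inode.

```python
import filecmp
import os


def same_content(path_a: str, path_b: str) -> bool:
    """Return True if both paths hold identical data, reading the
    files only when the metadata checks cannot decide."""
    st_a, st_b = os.stat(path_a), os.stat(path_b)
    # Same device and inode: hardlinks (or the same path), nothing to read.
    if (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino):
        return True
    # Different sizes can never have identical content.
    if st_a.st_size != st_b.st_size:
        return False
    # Fall back to a full comparison of the file contents.
    return filecmp.cmp(path_a, path_b, shallow=False)
```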

Open to suggestions of other tools that could be used to achieve the
desired result.

Thanks.
Jp.


end of thread, other threads:[~2016-11-25  3:56 UTC | newest]

Thread overview: 8+ messages
2016-10-27 11:30 Identifying reflink / CoW files Saint Germain
2016-11-03  5:17 ` Zygo Blaxell
2016-11-04 14:41   ` Saint Germain
2016-11-25  3:55     ` Zygo Blaxell
  -- strict thread matches above, loose matches on Subject: below --
2012-09-22  3:38 Jp Wise
2012-09-22  7:49 ` Arne Jansen
2012-09-22 21:56   ` Jp Wise
2012-09-24 13:53     ` David Sterba
