From: Jp Wise <jpwise@theflat.net.nz>
To: linux-btrfs@vger.kernel.org
Subject: Identifying reflink / CoW files
Date: Sat, 22 Sep 2012 15:38:42 +1200
Message-ID: <505D32C2.8070105@theflat.net.nz>
Good morning, I'm working on an offline deduplication script intended to
work around the copy-on-write functionality of BTRFS.
Simply put: is there an existing utility that compares two files (or
directories) and reports whether they share the same physical extents /
data blocks on disk, i.e. whether they are CoW copies?
I'm not actively working with BTRFS yet, but for the project I'm working
on it looks to be the most suitable candidate, and the CoW
functionality avoids the issues with file changes that hardlinks would
create.
From reading other posts, I'm aware the information could be pulled out
via btrfs-debug-tree, but that would involve parsing the entire output to
locate the required files' inodes and their extents, which seems like
quite a roundabout way to retrieve the information.
Also my programming skills aren't up to the task of trying to pull the
tree data directly from the filesystem to do it, and I'd like to avoid
doing byte-by-byte comparisons on all files as it's inefficient if the
file can instead be identified as a CoW copy.
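To make the kind of check being asked about concrete, here is a rough
sketch, not a tested tool. It assumes the generic Linux FIEMAP ioctl
reports comparable physical offsets for files on the same filesystem,
and it treats two files as CoW copies only when every extent matches.
The names file_extents and same_extents are made up for illustration,
and the constants are copied from linux/fiemap.h and linux/fs.h.

```python
import fcntl
import struct

FS_IOC_FIEMAP = 0xC020660B          # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x1              # flush dirty data before mapping
FIEMAP_EXTENT_LAST = 0x1            # final extent of the file
FIEMAP_EXTENT_UNKNOWN = 0x2         # location not known yet
FIEMAP_EXTENT_DELALLOC = 0x4        # not yet allocated on disk
FIEMAP_EXTENT_DATA_INLINE = 0x200   # data stored inside metadata
# Extents with these flags carry no usable physical offset.
UNRELIABLE = (FIEMAP_EXTENT_UNKNOWN | FIEMAP_EXTENT_DELALLOC
              | FIEMAP_EXTENT_DATA_INLINE)

HDR = struct.Struct("=QQIIII")      # struct fiemap header, 32 bytes
EXT = struct.Struct("=QQQQQIIII")   # struct fiemap_extent, 56 bytes
BATCH = 128                         # extents requested per ioctl call


def file_extents(path):
    """Return [(logical, physical, length, flags), ...] for path."""
    extents = []
    start = 0
    with open(path, "rb") as f:
        while True:
            buf = bytearray(HDR.size + EXT.size * BATCH)
            HDR.pack_into(buf, 0, start, 0xFFFFFFFFFFFFFFFF,
                          FIEMAP_FLAG_SYNC, 0, BATCH, 0)
            fcntl.ioctl(f.fileno(), FS_IOC_FIEMAP, buf)
            mapped = HDR.unpack_from(buf, 0)[3]   # fm_mapped_extents
            if mapped == 0:
                return extents
            for i in range(mapped):
                fields = EXT.unpack_from(buf, HDR.size + i * EXT.size)
                logical, physical, length, flags = (
                    fields[0], fields[1], fields[2], fields[5])
                extents.append((logical, physical, length, flags))
                if flags & FIEMAP_EXTENT_LAST:
                    return extents
            # Continue after the last extent returned in this batch.
            start = logical + length


def same_extents(a, b):
    """True if both extent lists map the same ranges to the same blocks."""
    if not a or len(a) != len(b):
        return False
    for (la, pa, na, fa), (lb, pb, nb, fb) in zip(a, b):
        if (fa | fb) & UNRELIABLE:
            return False    # cannot tell from offsets alone
        if (la, pa, na) != (lb, pb, nb):
            return False
    return True
```

Usage would be along the lines of
same_extents(file_extents("a"), file_extents("b")). Files the sketch
cannot decide on (empty files, inline data, partially shared extents)
come back as False and would still need a byte-by-byte comparison.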
I'm open to suggestions of other tools that could be used to achieve the
desired result.
Thanks.
Jp.