From: Lutz Vieweg <lvml@5t9.de>
To: linux-btrfs@vger.kernel.org
Subject: Re: Can I get a checksum for a file from btrfs (without reading the whole file)?
Date: Fri, 06 Feb 2015 14:00:53 +0100
Message-ID: <mb2du5$esf$1@ger.gmane.org>
In-Reply-To: <54D44F2A.8090108@cn.fujitsu.com>

On 02/06/2015 06:20 AM, Qu Wenruo wrote:
> From: Lutz Vieweg <lvml@5t9.de>
>> use case: You have two huge files on a btrfs, you assume they contain the same bytes,
>> but you do not know for sure.
>>
>> Is there a way to get a checksum of both files from btrfs with less effort than
>> reading the whole of both files and computing a hash sum?
> In short, NO.
>
> The long answer:
> In the current implementation, btrfs calculates a 4-byte (32-bit) crc32 for each 4K sector and stores it
> in the csum tree.
>
> So for large files, e.g. 1G (already quite small for modern storage), the checksums will be 1M in size.
> Which means that even using crc32 (same as the kernel, and crc32(a+b) = crc32(a) + crc32(b)), you still
> need to run crc32 over the whole 1M of crc32 values.
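The 1G to 1M figure follows from one 4-byte checksum per 4 KiB block: 2^30 / 2^12 = 262,144 blocks, times 4 bytes = 1 MiB, i.e. 1/1024 of the data. A tiny Python sketch of that arithmetic, assuming exactly that layout (counting a partial last block as a full one is my assumption):

```python
# Checksum overhead, assuming one 4-byte crc32 per 4 KiB block as described above.
BLOCK_SIZE = 4 * 1024  # bytes of file data covered by one checksum
CSUM_SIZE = 4          # bytes per crc32 value

def csum_bytes(file_size: int) -> int:
    """Bytes of checksum data for a file of file_size bytes."""
    # Ceiling division: a partial last block is counted as a full block (assumption).
    blocks = -(-file_size // BLOCK_SIZE)
    return blocks * CSUM_SIZE

one_gib = 1024 ** 3
print(csum_bytes(one_gib))             # 1048576, i.e. 1 MiB
print(one_gib // csum_bytes(one_gib))  # 1024: checksums are 1/1024 of the data
```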

And yet, having to read only 1 MB of checksums instead of 1 GB of data sounds
like a good deal. Is there some userspace interface that allows reading
(only) those per-4K checksums for a file?

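If such per-4K checksums could be read (no particular interface for that is assumed here), comparing two files would reduce to comparing their ordered checksum lists. The minimal Python sketch below computes the checksums from in-memory data purely as a stand-in, then folds each list into one SHA-256 fingerprint; `block_crcs` and `fingerprint` are hypothetical helper names. Matching fingerprints only mean the files are very likely identical, since two different 4K blocks can share a 32-bit CRC; differing fingerprints do show that the contents (or sizes) differ.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # one checksum per 4 KiB block, as described above

def block_crcs(data: bytes) -> list[int]:
    """Per-4 KiB checksums of `data` (hypothetical stand-in).

    Computed from the data itself here, whereas the point would be to read
    them from the csum tree. zlib.crc32 is the IEEE CRC-32; btrfs stores
    crc32c, so real on-disk values would differ.
    """
    return [zlib.crc32(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]

def fingerprint(crcs: list[int], file_size: int) -> str:
    """SHA-256 over the file size and the ordered per-block checksums.

    The size is mixed in in case the stored checksum of a partial last
    block covers a zero-padded sector (an assumption), so files differing
    only in trailing zeros would not compare equal.
    """
    h = hashlib.sha256()
    h.update(file_size.to_bytes(8, "little"))
    for crc in crcs:
        h.update(crc.to_bytes(4, "little"))
    return h.hexdigest()

if __name__ == "__main__":
    a = bytes(range(256)) * 64  # 16 KiB = 4 blocks
    b = bytearray(a)
    b[5000] ^= 1                # flip one bit in the second block
    fa = fingerprint(block_crcs(a), len(a))
    fb = fingerprint(block_crcs(bytes(b)), len(b))
    print(fa == fb)  # False: a CRC always detects a single-bit change
```

Hashing the checksum list, rather than comparing it element by element, keeps a per-file fingerprint that can be stored and compared later.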
> But there are still some cases where btrfs can help you determine more quickly whether the files
> are the same.
> Prerequisite:
> The two files were copied using clone (cp --reflink) or deduplicated.

In my case I know for sure that no cloning/deduplication happened when
the files were written.

Regards,
Lutz Vieweg