From: Lutz Vieweg <lvml@5t9.de>
To: linux-btrfs@vger.kernel.org
Subject: Re: Can I get a checksum for a file from btrfs (without reading the whole file)?
Date: Fri, 06 Feb 2015 14:00:53 +0100
Message-ID: <mb2du5$esf$1@ger.gmane.org>
In-Reply-To: <54D44F2A.8090108@cn.fujitsu.com>
On 02/06/2015 06:20 AM, Qu Wenruo wrote:
> From: Lutz Vieweg <lvml@5t9.de>
>> use case: You have two huge files on a btrfs, you assume they contain the same bytes,
>> but you do not know for sure.
>>
>> Is there a way to get a checksum of both files from btrfs with less effort than
>> reading the whole of both files and computing a hash sum?
> In short: NO.
>
> In detail:
> In the current implementation, btrfs computes a 4-byte (32-bit) crc32 for each 4K sector and stores
> it in the csum tree.
>
> So for a large file, e.g. 1G (already quite small for modern storage), the checksums will be 1M in size.
> Which means that even using crc32 (same as the kernel, and crc32(a+b) = crc32(a) + crc32(b)), you still
> need to run crc32 over all 1M of checksums.
And yet, having to read only 1 MB of checksums instead of 1 GB of data
sounds like a good deal. Is there some userspace interface that allows
reading (only) those per-4k checksums for a file?
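
To put numbers on that, here is a rough sketch in Python. It assumes 4 KiB sectors and 4-byte checksums as described above. The function names are made up for illustration, and zlib.crc32 is only a stand-in: I do not claim it yields the values btrfs keeps in its csum tree, and it is not the userspace interface I am asking about.

```python
import zlib

SECTOR_SIZE = 4096  # assumed sector size (the "4K" quoted above)
CSUM_SIZE = 4       # assumed bytes per checksum (32 bit)


def csum_bytes(file_size, sector_size=SECTOR_SIZE, csum_size=CSUM_SIZE):
    """Total size in bytes of the per-sector checksums for a file."""
    sectors = -(-file_size // sector_size)  # ceiling division
    return sectors * csum_size


def sector_checksums(data, sector_size=SECTOR_SIZE):
    """One checksum per sector of `data` (zlib.crc32 as a stand-in)."""
    return [
        zlib.crc32(data[offset:offset + sector_size])
        for offset in range(0, len(data), sector_size)
    ]


def differing_sectors(csums_a, csums_b):
    """Indices of sectors whose checksums differ (equal-length lists assumed)."""
    return [i for i, (a, b) in enumerate(zip(csums_a, csums_b)) if a != b]


if __name__ == "__main__":
    # A 1 GiB file needs 1 MiB of checksums under these assumptions.
    print(csum_bytes(1 << 30))
```

Differing checksums prove that the files differ; identical 32-bit checksums only make equality likely, so a full byte comparison would still be needed for certainty.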
> But there are still some cases where btrfs can help you determine whether the files are the same in a
> faster way.
> Prerequisite:
> The two files were copied using clone (the cp --reflink command) or were deduplicated
In my case I know for sure that no cloning/deduplication happened when
the files were written.
Regards,
Lutz Vieweg