On 2015-09-24 14:48, Matwey V. Kornilov wrote:
> 2015-09-24 21:35 GMT+03:00 Austin S Hemmelgarn :
>> On 2015-09-24 14:06, Matwey V. Kornilov wrote:
>>>
>>>
>>> Hello,
>>>
>>> I would like to read the list of checksums for a specific file
>>> stored on a btrfs filesystem. I think I could use the checksums the
>>> way rsync does, but save both CPU (because the csums are already
>>> calculated for the file) and I/O (because I don't need to reread the
>>> whole file from the hard drive).
>>
>> As of right now, there is no way to do this from userspace without
>> directly parsing the on-disk format (which isn't safe or reliable if the
>> filesystem is mounted). It has been discussed before, but the discussions
>> haven't really gotten anywhere.
>>
>> It's worth noting that btrfs doesn't checksum per-file, it checksums
>> per-block. This means that:
>> a. I think (I'm not 100% certain about this) that the checksum in btrfs
>> includes the padding up to the end of the block for blocks that aren't full.
>> b. Files that get stored inline in their metadata block won't have a
>> checksum just for the file data (because the checksum will cover the whole
>> metadata block).
>> c. While it is possible with some checksum algorithms (if I remember right,
>> CRC32c is one such algorithm, and that is what btrfs uses for its
>> checksums) to combine the checksums from a group of data blocks to get the
>> checksum for the data as a whole, doing so itself takes a significant
>> amount of CPU time for large amounts of data.
>>
>> All in all, this means that if you just want a checksum of the contents of
>> the file, it's almost certainly better to compute it in userspace.
>> If you're trying to figure out what changed, using send/receive and
>> snapshots is (usually) more efficient.
>
> I want the checksums of every block of the file to see which part
> has been changed.
> I cannot use send/receive because my other file replica is on a
> remote host, not on the same filesystem. Compare with how rsync
> works: it calculates checksums of the chunks of both versions of the
> file and then syncs only the differing chunks over the network. I just
> want to utilize the fact that btrfs already has the data I need.

On current versions of btrfs-progs, btrfs send has a mode (--no-data) that
emits only the metadata, which can then be parsed to figure out what has
changed. The parsing is of course non-trivial, but it should still be faster
than checksumming everything, and I'm relatively sure (although I may be
wrong) that the send stream format is well documented.
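As a rough illustration of the userspace route suggested above, combined with the rsync-like per-block comparison Matwey describes, here is a minimal Python sketch. It recomputes CRC-32C (the Castagnoli polynomial btrfs uses) over fixed-size blocks of a file; it does NOT read the checksums btrfs has stored, and its values are not guaranteed to match them (in particular, the final short block is not padded, unlike point (a) above). The 4096-byte block size is an assumption standing in for the filesystem's sectorsize, and the helper names `block_checksums` and `changed_blocks` are hypothetical.

```python
from typing import List

# CRC-32C (Castagnoli), reflected polynomial 0x82F63B78, built as a lookup table.
_POLY = 0x82F63B78
_TABLE = []
for i in range(256):
    c = i
    for _ in range(8):
        c = (c >> 1) ^ _POLY if c & 1 else c >> 1
    _TABLE.append(c)


def crc32c(data: bytes, crc: int = 0) -> int:
    """Table-driven CRC-32C; pass a previous result as `crc` to continue it."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ _TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


BLOCK_SIZE = 4096  # assumption: a typical btrfs sectorsize


def block_checksums(path: str, block_size: int = BLOCK_SIZE) -> List[int]:
    """Hypothetical helper: CRC-32C of each fixed-size block of a file."""
    sums = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            sums.append(crc32c(block))  # last block may be short (no padding)
    return sums


def changed_blocks(local: List[int], remote: List[int]) -> List[int]:
    """Hypothetical helper: indices of blocks that differ or exist on one side only."""
    n = max(len(local), len(remote))
    return [
        i
        for i in range(n)
        if i >= len(local) or i >= len(remote) or local[i] != remote[i]
    ]
```

Each side would run `block_checksums` on its copy, the remote side would send its list over, and `changed_blocks` would give the block indices to transfer. Note that this compares blocks at fixed offsets only; rsync additionally uses a rolling checksum so that data inserted mid-file does not shift every later block into the "changed" set. A pure-Python CRC is also slow on large files, so a C-backed implementation would be needed in practice.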