From: Demi Marie Obenour <demiobenour@gmail.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>, linux-btrfs@vger.kernel.org
Subject: Re: Can the output of FIEMAP on BTRFS be used to check if a file and its reflink copy might have diverged?
Date: Mon, 22 Sep 2025 14:24:45 -0400
Message-ID: <e4423bc0-38e8-4f61-9cd9-c3d9c00308ab@gmail.com>
In-Reply-To: <62a97fb7-75e3-4832-b97c-90763b287a5c@gmx.com>
On 9/21/25 20:50, Qu Wenruo wrote:
>
>
> On 2025/9/22 09:37, Demi Marie Obenour wrote:
>> Wyng Backup (https://codeberg.org/tasket/wyng-backup) relies on FIEMAP
>> to determine which parts of a file have not changed since it was last
>> backed up. Specifically, the output of filefrag -v is passed to sort and
>> then to uniq, and differences between the outputs for the file and
>> the previous version (a reflink copy) determine what gets backed up.
>>
>> Is this safe under BTRFS,
>
> No. There are several factors affecting this; some are minor, some are not:
>
> - Inlined extents
> The returned bytenr is unreliable in that case.
> Although the fiemap flags should indicate that, with 'inline' flag
> set.
>
> - Balance
> Btrfs can balance the data extents, which changes the output of
> fiemap.
>
> E.g.
> ## Before balance
> # md5sum /mnt/btrfs/foobar
> 27c9068d1b51da575a53ad34c57ca5cc /mnt/btrfs/foobar
> # filefrag -v /mnt/btrfs/foobar
> Filesystem type is: 9123683e
> File size of /mnt/btrfs/foobar is 65536 (8 blocks of 8192 bytes)
>  ext:     logical_offset:        physical_offset: length:   expected: flags:
>    0:        0..       7:       1664..      1671:      8:             last,eof
> /mnt/btrfs/foobar: 1 extent found
>
> ## Do data balance
> # btrfs balance start -d /mnt/btrfs/
> Done, had to relocate 1 out of 3 chunks
>
> ## After data balance
> # filefrag -v /mnt/btrfs/foobar
> Filesystem type is: 9123683e
> File size of /mnt/btrfs/foobar is 65536 (8 blocks of 8192 bytes)
>  ext:     logical_offset:        physical_offset: length:   expected: flags:
>    0:        0..       7:      36480..     36487:      8:             last,eof
> /mnt/btrfs/foobar: 1 extent found
>
>
> - NODATACOW cases.
> In that case new data is written into the same location, without any
> extra new data extents. This completely breaks the assumption.
>
> - Dirty data that is not yet written to disk
> In that case fiemap won't show that data, only the data that is
> already on disk.
>
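To illustrate the balance case above, here is a minimal Python sketch. It assumes the comparison boils down to matching (logical offset, physical offset, length) tuples taken from `filefrag -v`. The names `Extent`, `parse_filefrag`, and `naive_changed_ranges` are hypothetical and are not taken from Wyng; the sample rows are the ones shown before and after the balance.

```python
import re
from typing import List, NamedTuple, Tuple


class Extent(NamedTuple):
    logical_start: int
    logical_end: int
    physical_start: int
    physical_end: int
    length: int
    flags: frozenset


# One extent row of `filefrag -v`; the "expected:" column is optional.
EXTENT_RE = re.compile(
    r"^\s*\d+:\s*(\d+)\.\.\s*(\d+):\s*(\d+)\.\.\s*(\d+):"
    r"\s*(\d+):\s*(?:(\d+):)?\s*([\w,]*)\s*$"
)


def parse_filefrag(text: str) -> List[Extent]:
    """Collect extent rows; header and summary lines do not match and are skipped."""
    extents = []
    for line in text.splitlines():
        m = EXTENT_RE.match(line)
        if m:
            ls, le, ps, pe, length, _expected, flags = m.groups()
            extents.append(Extent(int(ls), int(le), int(ps), int(pe), int(length),
                                  frozenset(f for f in flags.split(",") if f)))
    return extents


def naive_changed_ranges(old: List[Extent], new: List[Extent]) -> List[Tuple[int, int]]:
    """Logical block ranges in `new` whose mapping does not appear in `old`.

    This is the assumption under discussion (same mapping means same data);
    it is shown here only to demonstrate how it misfires.
    """
    old_keys = {(e.logical_start, e.physical_start, e.length) for e in old}
    return [(e.logical_start, e.logical_end) for e in new
            if (e.logical_start, e.physical_start, e.length) not in old_keys]


BEFORE = """\
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       7:       1664..      1671:      8:             last,eof
"""
AFTER = """\
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       7:      36480..     36487:      8:             last,eof
"""

changed = naive_changed_ranges(parse_filefrag(BEFORE), parse_filefrag(AFTER))
print(changed)
```

Balance relocates extents without modifying file contents, yet the naive comparison reports blocks 0..7 as changed. The opposite error (NODATACOW, dirty data) cannot be detected from this output at all, because the mapping stays the same while the data differs.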
>> or can it result in data loss due to data
>> not being backed up that should be? In other words, can it result
> in data being considered unchanged when it has really changed?
>
> Dirty data and NODATACOW will result in data being considered unchanged
> when using fiemap only.
>
> And balance will make unchanged data be considered changed.
>
> So overall, a fiemap-based solution on btrfs is unreliable.
Can one implement this reliably with TREE_SEARCH_V2 or by parsing
the output of `btrfs send`? Using `btrfs receive` to apply deltas
might work, but the backups must be encrypted with a key the
destination doesn't have, and that means the backups must be
opaque to the remote. Therefore, I think periodic full backups
with incremental backups on top of them (each containing the hash
of the last) is the best that could be done. This means that old
backups cannot be garbage-collected without taking a full backup
first.
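
A minimal Python sketch of the chaining idea, under stated assumptions: each backup is an opaque, already-encrypted blob, SHA-256 is used as the hash, and the names `Backup` and `restorable_chain` are hypothetical. It is not an existing tool's format, only an illustration of why garbage collection needs a fresh full backup.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Backup:
    kind: str                    # "full" or "incremental"
    prev_hash: Optional[bytes]   # digest of the previous backup; None for a full one
    payload: bytes               # opaque ciphertext; the remote cannot read it

    def digest(self) -> bytes:
        """Hash over kind, previous digest, and payload (illustrative layout)."""
        h = hashlib.sha256()
        h.update(self.kind.encode())
        h.update(b"\0")
        h.update(self.prev_hash or b"")
        h.update(self.payload)
        return h.digest()


def restorable_chain(backups: List[Backup]) -> bool:
    """True if the list starts with a full backup and every later entry is an
    incremental whose prev_hash equals the digest of its predecessor."""
    if not backups or backups[0].kind != "full" or backups[0].prev_hash is not None:
        return False
    for prev, cur in zip(backups, backups[1:]):
        if cur.kind != "incremental" or cur.prev_hash != prev.digest():
            return False
    return True


full = Backup("full", None, b"ciphertext-0")
inc1 = Backup("incremental", full.digest(), b"ciphertext-1")
inc2 = Backup("incremental", inc1.digest(), b"ciphertext-2")

print(restorable_chain([full, inc1, inc2]))  # complete chain
print(restorable_chain([inc1, inc2]))        # full backup deleted
print(restorable_chain([full, inc2]))        # middle incremental deleted
```

Dropping `full` or `inc1` leaves the remaining incrementals without a valid base, which is why the old chain can only be discarded after a new full backup has been taken.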
--
Sincerely,
Demi Marie Obenour (she/her/hers)