From: Boris Burkov <boris@bur.io>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 3/3] btrfs-progs: receive: fix silent data loss after fall back from encoded write
Date: Tue, 15 Nov 2022 12:45:13 -0800 [thread overview]
Message-ID: <Y3P5vyh09FSzuos4@zen> (raw)
In-Reply-To: <59f1b932bfd1549f777d18b814feeaabef0860ee.1668529099.git.fdmanana@suse.com>
On Tue, Nov 15, 2022 at 04:25:26PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When attempting an encoded write, if it fails for some specific reason
> like -EINVAL (when an offset is not sector size aligned) or -ENOSPC, we
> then fallback into decompressing the data and writing it using regular
> buffered IO. This logic however is not correct, one of the reasons is
> that it assumes the encoded offset is smaller than the unencoded file
> length and that they can be compared, but one is an offset and the other
> is a length, not an end offset, so they can't be compared to get correct
> results. This bad logic will often result in not copying all data, or even
> no data at all, resulting in a silent data loss. This is easily seen in
> with the following reproducer:
>
> $ cat test.sh
> #!/bin/bash
>
> DEV=/dev/sdj
> MNT=/mnt/sdj
>
> umount $DEV &> /dev/null
> mkfs.btrfs -f $DEV > /dev/null
> mount -o compress $DEV $MNT
>
> # File foo has a size of 33K, not aligned to the sector size.
> xfs_io -f -c "pwrite -S 0xab 0 33K" $MNT/foo
>
> xfs_io -f -c "pwrite -S 0xcd 0 64K" $MNT/bar
>
> # Now clone the first 32K of file bar into foo at offset 0.
> xfs_io -c "reflink $MNT/bar 0 0 32K" $MNT/foo
>
> # Snapshot the default subvolume and create a full send stream (v2).
> btrfs subvolume snapshot -r $MNT $MNT/snap
>
> btrfs send --compressed-data -f /tmp/test.send $MNT/snap
>
> echo -e "\nFile bar in the original filesystem:"
> od -A d -t x1 $MNT/snap/bar
>
> umount $MNT
> mkfs.btrfs -f $DEV > /dev/null
> mount $DEV $MNT
>
> echo -e "\nReceiving stream in a new filesystem..."
> btrfs receive -f /tmp/test.send $MNT
>
> echo -e "\nFile bar in the new filesystem:"
> od -A d -t x1 $MNT/snap/bar
>
> umount $MNT
>
> Running the test without this patch:
>
> $ ./test.sh
> (...)
> File bar in the original filesystem:
> 0000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
> *
> 0065536
>
> Receiving stream in a new filesystem...
> At subvol snap
>
> File bar in the new filesystem:
> 0000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
> *
> 0033792
>
> We end up with file bar having less data, and a smaller size, than in the
> original filesystem.
>
> This happens because when processing file bar, send issues the following
> commands:
>
> clone bar - source=foo source offset=0 offset=0 length=32768
> write bar - offset=32768 length=1024
> encoded_write bar - offset=33792, len=4096, unencoded_offset=33792, unencoded_file_len=31744, unencoded_len=65536, compression=1, encryption=0
>
> The first 32K are cloned from file foo, as that file ranged is shared
> between the files.
>
> Then there's a regular write operation for the file range [32K, 33K[,
> since file foo has different data from bar for that file range.
>
> Finally for the remainder of file bar, the send side issues an encoded
> write since the extent is compressed in the source filesystem, for the
> file offset 33792 (33K), remaining 31K of data. The receiver will try the
> encoded write, but that fails with -EINVAL since the offset 33K is not
> sector size aligned, so it will fallback to decompressing the data and
> writing it using regular buffered writes. However that results in doing
> no writes at decompress_and_write() because 'pos' is initialized to the
> value of 33K (unencoded_offset) and unencoded_file_len is 31K, so the
> while loop has no iterations.
>
> Another case where we can fallback to decompression plus regular buffered
> writes is when the destination filesystem has a sector size larger then
> the sector size of the source filesystem (for example when the source
> filesystem is on x86_64 with a 4K sector size and the destination
> filesystem is PowerPC with a 64K sector size). In that scenario encoded
> write attempts wil fail with -EINVAL due to offsets not being aligned with
> the sector size of the destination filesystem, and the receive will
> attempt the fallback of decompressing the buffer and writing the
> decompressed using regular buffered IO.
>
> Fix this by tracking the number of written bytes instead, and increment
> it, and the unencoded offset, after each write.
Thank you for catching and fixing this.
>
> Fixes: d20e759fc917 ("btrfs-progs: receive: encoded_write fallback to explicit decode and write")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Boris Burkov <boris@bur.io>
> ---
> cmds/receive.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/cmds/receive.c b/cmds/receive.c
> index e6f1aeab..0db66ee5 100644
> --- a/cmds/receive.c
> +++ b/cmds/receive.c
> @@ -1155,10 +1155,9 @@ static int decompress_and_write(struct btrfs_receive *rctx,
> u32 compression)
> {
> int ret = 0;
> - size_t pos;
> - ssize_t w;
> char *unencoded_data;
> int sector_shift = 0;
> + u64 written = 0;
>
> unencoded_data = calloc(unencoded_len, 1);
> if (!unencoded_data) {
> @@ -1209,17 +1208,19 @@ static int decompress_and_write(struct btrfs_receive *rctx,
> goto out;
> }
>
> - pos = unencoded_offset;
> - while (pos < unencoded_file_len) {
> - w = pwrite(rctx->write_fd, unencoded_data + pos,
> - unencoded_file_len - pos, offset);
> + while (written < unencoded_file_len) {
> + ssize_t w;
> +
> + w = pwrite(rctx->write_fd, unencoded_data + unencoded_offset,
> + unencoded_file_len - written, offset);
> if (w < 0) {
> ret = -errno;
> error("writing unencoded data failed: %m");
> goto out;
> }
> - pos += w;
> + written += w;
> offset += w;
> + unencoded_offset += w;
> }
> out:
> free(unencoded_data);
> --
> 2.35.1
>
next prev parent reply other threads:[~2022-11-15 20:45 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-15 16:25 [PATCH 0/3] btrfs-progs: receive: fix a silent data loss bug with encoded writes fdmanana
2022-11-15 16:25 ` [PATCH 1/3] btrfs-progs: receive: add debug messages when processing " fdmanana
2022-11-15 20:45 ` Boris Burkov
2022-11-15 16:25 ` [PATCH 2/3] btrfs-progs: receive: add debug messages when processing fallocate fdmanana
2022-11-15 20:46 ` Boris Burkov
2022-11-15 16:25 ` [PATCH 3/3] btrfs-progs: receive: fix silent data loss after fall back from encoded write fdmanana
2022-11-15 20:45 ` Boris Burkov [this message]
2022-11-15 16:37 ` [PATCH 0/3] btrfs-progs: receive: fix a silent data loss bug with encoded writes Christoph Anton Mitterer
2022-11-15 16:47 ` Filipe Manana
2022-11-15 16:53 ` Christoph Anton Mitterer
2022-11-16 10:50 ` Filipe Manana
2022-11-16 17:03 ` Christoph Anton Mitterer
2022-11-15 16:50 ` there should be some better way to inform interested people about data corruption issues Christoph Anton Mitterer
2022-11-23 18:55 ` [PATCH 0/3] btrfs-progs: receive: fix a silent data loss bug with encoded writes David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y3P5vyh09FSzuos4@zen \
--to=boris@bur.io \
--cc=fdmanana@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox