From: Boris Burkov <boris@bur.io>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 3/3] btrfs-progs: receive: fix silent data loss after fall back from encoded write
Date: Tue, 15 Nov 2022 12:45:13 -0800 [thread overview]
Message-ID: <Y3P5vyh09FSzuos4@zen> (raw)
In-Reply-To: <59f1b932bfd1549f777d18b814feeaabef0860ee.1668529099.git.fdmanana@suse.com>
On Tue, Nov 15, 2022 at 04:25:26PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When attempting an encoded write, if it fails for some specific reason
> like -EINVAL (when an offset is not sector size aligned) or -ENOSPC, we
> then fallback into decompressing the data and writing it using regular
> buffered IO. This logic however is not correct, one of the reasons is
> that it assumes the encoded offset is smaller than the unencoded file
> length and that they can be compared, but one is an offset and the other
> is a length, not an end offset, so they can't be compared to get correct
> results. This bad logic will often result in not copying all data, or even
> no data at all, resulting in a silent data loss. This is easily seen in
> with the following reproducer:
>
> $ cat test.sh
> #!/bin/bash
>
> DEV=/dev/sdj
> MNT=/mnt/sdj
>
> umount $DEV &> /dev/null
> mkfs.btrfs -f $DEV > /dev/null
> mount -o compress $DEV $MNT
>
> # File foo has a size of 33K, not aligned to the sector size.
> xfs_io -f -c "pwrite -S 0xab 0 33K" $MNT/foo
>
> xfs_io -f -c "pwrite -S 0xcd 0 64K" $MNT/bar
>
> # Now clone the first 32K of file bar into foo at offset 0.
> xfs_io -c "reflink $MNT/bar 0 0 32K" $MNT/foo
>
> # Snapshot the default subvolume and create a full send stream (v2).
> btrfs subvolume snapshot -r $MNT $MNT/snap
>
> btrfs send --compressed-data -f /tmp/test.send $MNT/snap
>
> echo -e "\nFile bar in the original filesystem:"
> od -A d -t x1 $MNT/snap/bar
>
> umount $MNT
> mkfs.btrfs -f $DEV > /dev/null
> mount $DEV $MNT
>
> echo -e "\nReceiving stream in a new filesystem..."
> btrfs receive -f /tmp/test.send $MNT
>
> echo -e "\nFile bar in the new filesystem:"
> od -A d -t x1 $MNT/snap/bar
>
> umount $MNT
>
> Running the test without this patch:
>
> $ ./test.sh
> (...)
> File bar in the original filesystem:
> 0000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
> *
> 0065536
>
> Receiving stream in a new filesystem...
> At subvol snap
>
> File bar in the new filesystem:
> 0000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
> *
> 0033792
>
> We end up with file bar having less data, and a smaller size, than in the
> original filesystem.
>
> This happens because when processing file bar, send issues the following
> commands:
>
> clone bar - source=foo source offset=0 offset=0 length=32768
> write bar - offset=32768 length=1024
> encoded_write bar - offset=33792, len=4096, unencoded_offset=33792, unencoded_file_len=31744, unencoded_len=65536, compression=1, encryption=0
>
> The first 32K are cloned from file foo, as that file ranged is shared
> between the files.
>
> Then there's a regular write operation for the file range [32K, 33K[,
> since file foo has different data from bar for that file range.
>
> Finally for the remainder of file bar, the send side issues an encoded
> write since the extent is compressed in the source filesystem, for the
> file offset 33792 (33K), remaining 31K of data. The receiver will try the
> encoded write, but that fails with -EINVAL since the offset 33K is not
> sector size aligned, so it will fallback to decompressing the data and
> writing it using regular buffered writes. However that results in doing
> no writes at decompress_and_write() because 'pos' is initialized to the
> value of 33K (unencoded_offset) and unencoded_file_len is 31K, so the
> while loop has no iterations.
>
> Another case where we can fallback to decompression plus regular buffered
> writes is when the destination filesystem has a sector size larger then
> the sector size of the source filesystem (for example when the source
> filesystem is on x86_64 with a 4K sector size and the destination
> filesystem is PowerPC with a 64K sector size). In that scenario encoded
> write attempts wil fail with -EINVAL due to offsets not being aligned with
> the sector size of the destination filesystem, and the receive will
> attempt the fallback of decompressing the buffer and writing the
> decompressed using regular buffered IO.
>
> Fix this by tracking the number of written bytes instead, and increment
> it, and the unencoded offset, after each write.
Thank you for catching and fixing this.
>
> Fixes: d20e759fc917 ("btrfs-progs: receive: encoded_write fallback to explicit decode and write")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Boris Burkov <boris@bur.io>
> ---
> cmds/receive.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/cmds/receive.c b/cmds/receive.c
> index e6f1aeab..0db66ee5 100644
> --- a/cmds/receive.c
> +++ b/cmds/receive.c
> @@ -1155,10 +1155,9 @@ static int decompress_and_write(struct btrfs_receive *rctx,
> u32 compression)
> {
> int ret = 0;
> - size_t pos;
> - ssize_t w;
> char *unencoded_data;
> int sector_shift = 0;
> + u64 written = 0;
>
> unencoded_data = calloc(unencoded_len, 1);
> if (!unencoded_data) {
> @@ -1209,17 +1208,19 @@ static int decompress_and_write(struct btrfs_receive *rctx,
> goto out;
> }
>
> - pos = unencoded_offset;
> - while (pos < unencoded_file_len) {
> - w = pwrite(rctx->write_fd, unencoded_data + pos,
> - unencoded_file_len - pos, offset);
> + while (written < unencoded_file_len) {
> + ssize_t w;
> +
> + w = pwrite(rctx->write_fd, unencoded_data + unencoded_offset,
> + unencoded_file_len - written, offset);
> if (w < 0) {
> ret = -errno;
> error("writing unencoded data failed: %m");
> goto out;
> }
> - pos += w;
> + written += w;
> offset += w;
> + unencoded_offset += w;
> }
> out:
> free(unencoded_data);
> --
> 2.35.1
>
next prev parent reply other threads:[~2022-11-15 20:45 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-15 16:25 [PATCH 0/3] btrfs-progs: receive: fix a silent data loss bug with encoded writes fdmanana
2022-11-15 16:25 ` [PATCH 1/3] btrfs-progs: receive: add debug messages when processing " fdmanana
2022-11-15 20:45 ` Boris Burkov
2022-11-15 16:25 ` [PATCH 2/3] btrfs-progs: receive: add debug messages when processing fallocate fdmanana
2022-11-15 20:46 ` Boris Burkov
2022-11-15 16:25 ` [PATCH 3/3] btrfs-progs: receive: fix silent data loss after fall back from encoded write fdmanana
2022-11-15 20:45 ` Boris Burkov [this message]
2022-11-15 16:37 ` [PATCH 0/3] btrfs-progs: receive: fix a silent data loss bug with encoded writes Christoph Anton Mitterer
2022-11-15 16:47 ` Filipe Manana
2022-11-15 16:53 ` Christoph Anton Mitterer
2022-11-16 10:50 ` Filipe Manana
2022-11-16 17:03 ` Christoph Anton Mitterer
2022-11-15 16:50 ` there should be some better way to inform interested people about data corruption issues Christoph Anton Mitterer
2022-11-23 18:55 ` [PATCH 0/3] btrfs-progs: receive: fix a silent data loss bug with encoded writes David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y3P5vyh09FSzuos4@zen \
--to=boris@bur.io \
--cc=fdmanana@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.