From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Filipe Manana <fdmanana@gmail.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Reproducer for "compressed data + hole data corruption bug, 2018 edition" still works on 4.20.7
Date: Thu, 14 Feb 2019 00:00:54 -0500
Message-ID: <20190214050043.GE23918@hungrycats.org>
In-Reply-To: <CAL3q7H6WSe_C7_+D0x1_Z+KdK=k+iM-t2J4pQSO32ZsY4AEZ=Q@mail.gmail.com>
On Thu, Feb 14, 2019 at 01:22:49AM +0000, Filipe Manana wrote:
> On Wed, Feb 13, 2019 at 6:14 PM Filipe Manana <fdmanana@gmail.com> wrote:
> > On Wed, Feb 13, 2019 at 5:36 PM Filipe Manana <fdmanana@gmail.com> wrote:
[...]
> > > Tried it today and I got it reproduced (different vm, but still debian
> > > and kernel built from source).
> > > Not sure what was different last time. Yes, I had compression enabled.
> > >
> > > I'll look into it.
> >
> > So the problem is caused by hole punching. The script can be reduced
> > to the following:
> >
> > https://friendpaste.com/22t4OdktHQTl0aMGxckc86
> >
> > file size: 384K am
> > digests after file creation: 7c8349cc657fbe61af53fbc5cfacae6e9a402e83 am
> > digests after file creation 2: 7c8349cc657fbe61af53fbc5cfacae6e9a402e83 am
> > 262144 total bytes deduped in this operation
> > digests after dedupe: 7c8349cc657fbe61af53fbc5cfacae6e9a402e83 am
> > digests after dedupe 2: 7c8349cc657fbe61af53fbc5cfacae6e9a402e83 am
> > am: 24 KiB (24576 bytes) converted to sparse holes.
> > digests after hole punching: 7c8349cc657fbe61af53fbc5cfacae6e9a402e83 am
> > digests after hole punching 2: 5a357b64f4004ea38dbc7058c64a5678668420da am
> >
> > So hole punching is what breaks things, and we can see the bug only
> > after dropping the page cache.
> > I'll send a fix likely tomorrow.
>
> So it turns out the problem is in the code that reads compressed
> extents, a variant of a bug I found back in 2015:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=005efedf2c7d0a270ffbe28d8997b03844f3e3e7
>
> The following one liner fixes it:
> https://friendpaste.com/22t4OdktHQTl0aMGxcWLj3
>
> While you test it there (if you want/can), I'll write a change log and
> a proper test case for fstests and submit them later.
Works here (and produces the correct sha1sum, which turns out to be
dae78e303edfb8b8ad64ecae01dc1bf233770cfd).
Nice work!
> Thanks!
> >
> > >
> > > >
> > > > > > > >
> > > > > > > > The behavior is slightly different on current kernels (4.20.7, 4.14.96)
> > > > > > > > which makes the problem a bit more difficult to detect.
> > > > > > > >
> > > > > > > > # repro-hole-corruption-test
> > > > > > > > i: 91, status: 0, bytes_deduped: 131072
> > > > > > > > i: 92, status: 0, bytes_deduped: 131072
> > > > > > > > i: 93, status: 0, bytes_deduped: 131072
> > > > > > > > i: 94, status: 0, bytes_deduped: 131072
> > > > > > > > i: 95, status: 0, bytes_deduped: 131072
> > > > > > > > i: 96, status: 0, bytes_deduped: 131072
> > > > > > > > i: 97, status: 0, bytes_deduped: 131072
> > > > > > > > i: 98, status: 0, bytes_deduped: 131072
> > > > > > > > i: 99, status: 0, bytes_deduped: 131072
> > > > > > > > 13107200 total bytes deduped in this operation
> > > > > > > > am: 4.8 MiB (4964352 bytes) converted to sparse holes.
> > > > > > > > 94a8acd3e1f6e14272f3262a8aa73ab6b25c9ce8 am
> > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > >
> > > > > > > > The sha1sum seems stable after the first drop_caches--until a second
> > > > > > > > process tries to read the test file:
> > > > > > > >
> > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > # cat am > /dev/null (in another shell)
> > > > > > > > 19294e695272c42edb89ceee24bb08c13473140a am
> > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > >
> > > > > > > > On Wed, Aug 22, 2018 at 11:11:25PM -0400, Zygo Blaxell wrote:
> > > > > > > > > This is a repro script for a btrfs bug that causes corrupted data reads
> > > > > > > > > when reading a mix of compressed extents and holes. The bug is
> > > > > > > > > reproducible on at least kernels v4.1..v4.18.
> > > > > > > > >
> > > > > > > > > Some more observations and background follow, but first here is the
> > > > > > > > > script and some sample output:
> > > > > > > > >
> > > > > > > > > root@rescue:/test# cat repro-hole-corruption-test
> > > > > > > > > #!/bin/bash
> > > > > > > > >
> > > > > > > > > # Write a 4096 byte block of something
> > > > > > > > > block () { head -c 4096 /dev/zero | tr '\0' "\\$1"; }
> > > > > > > > >
> > > > > > > > > # Here is some test data with holes in it:
> > > > > > > > > > for y in $(seq 0 100); do
> > > > > > > > > >     for x in 0 1; do
> > > > > > > > > >         block 0;
> > > > > > > > > >         block 21;
> > > > > > > > > >         block 0;
> > > > > > > > > >         block 22;
> > > > > > > > > >         block 0;
> > > > > > > > > >         block 0;
> > > > > > > > > >         block 43;
> > > > > > > > > >         block 44;
> > > > > > > > > >         block 0;
> > > > > > > > > >         block 0;
> > > > > > > > > >         block 61;
> > > > > > > > > >         block 62;
> > > > > > > > > >         block 63;
> > > > > > > > > >         block 64;
> > > > > > > > > >         block 65;
> > > > > > > > > >         block 66;
> > > > > > > > > >     done
> > > > > > > > > > done > am
> > > > > > > > > sync
> > > > > > > > >
> > > > > > > > > # Now replace those 101 distinct extents with 101 references to the first extent
> > > > > > > > > btrfs-extent-same 131072 $(for x in $(seq 0 100); do echo am $((x * 131072)); done) 2>&1 | tail
> > > > > > > > >
> > > > > > > > > # Punch holes into the extent refs
> > > > > > > > > fallocate -v -d am
> > > > > > > > >
> > > > > > > > > # Do some other stuff on the machine while this runs, and watch the sha1sums change!
> > > > > > > > > while :; do echo $(sha1sum am); sysctl -q vm.drop_caches={1,2,3}; sleep 1; done
> > > > > > > > >
> > > > > > > > > root@rescue:/test# ./repro-hole-corruption-test
> > > > > > > > > i: 91, status: 0, bytes_deduped: 131072
> > > > > > > > > i: 92, status: 0, bytes_deduped: 131072
> > > > > > > > > i: 93, status: 0, bytes_deduped: 131072
> > > > > > > > > i: 94, status: 0, bytes_deduped: 131072
> > > > > > > > > i: 95, status: 0, bytes_deduped: 131072
> > > > > > > > > i: 96, status: 0, bytes_deduped: 131072
> > > > > > > > > i: 97, status: 0, bytes_deduped: 131072
> > > > > > > > > i: 98, status: 0, bytes_deduped: 131072
> > > > > > > > > i: 99, status: 0, bytes_deduped: 131072
> > > > > > > > > 13107200 total bytes deduped in this operation
> > > > > > > > > am: 4.8 MiB (4964352 bytes) converted to sparse holes.
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 072a152355788c767b97e4e4c0e4567720988b84 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > bf00d862c6ad436a1be2be606a8ab88d22166b89 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 0d44cdf030fb149e103cfdc164da3da2b7474c17 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 60831f0e7ffe4b49722612c18685c09f4583b1df am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > a19662b294a3ccdf35dbb18fdd72c62018526d7d am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > 6926a34e0ab3e0a023e8ea85a650f5b4217acab4 am
> > > > > > > > > ^C
> > > > > > > > >
> > > > > > > > > Corruption occurs most often when there is a sequence like this in a file:
> > > > > > > > >
> > > > > > > > > ref 1: hole
> > > > > > > > > ref 2: extent A, offset 0
> > > > > > > > > ref 3: hole
> > > > > > > > > ref 4: extent A, offset 8192
> > > > > > > > >
> > > > > > > > > This scenario typically arises due to hole-punching or deduplication.
> > > > > > > > > Hole-punching replaces one extent ref with two references to the same
> > > > > > > > > extent with a hole between them, so:
> > > > > > > > >
> > > > > > > > > ref 1: extent A, offset 0, length 16384
> > > > > > > > >
> > > > > > > > > becomes:
> > > > > > > > >
> > > > > > > > > ref 1: extent A, offset 0, length 4096
> > > > > > > > > ref 2: hole, length 8192
> > > > > > > > > ref 3: extent A, offset 12288, length 4096
> > > > > > > > >
> > > > > > > > > Deduplication replaces two distinct extent refs surrounding a hole with
> > > > > > > > > two references to one of the duplicate extents, turning this:
> > > > > > > > >
> > > > > > > > > ref 1: extent A, offset 0, length 4096
> > > > > > > > > ref 2: hole, length 8192
> > > > > > > > > ref 3: extent B, offset 0, length 4096
> > > > > > > > >
> > > > > > > > > into this:
> > > > > > > > >
> > > > > > > > > ref 1: extent A, offset 0, length 4096
> > > > > > > > > ref 2: hole, length 8192
> > > > > > > > > ref 3: extent A, offset 0, length 4096
> > > > > > > > >
> > > > > > > > > Compression is required (zlib, zstd, or lzo) for corruption to occur.
> > > > > > > > > I am not able to reproduce the issue with an uncompressed extent nor
> > > > > > > > > have I observed any such corruption in the wild.
> > > > > > > > >
> > > > > > > > > The presence or absence of the no-holes filesystem feature has no effect.
> > > > > > > > >
> > > > > > > > > Ordinary writes can lead to pairs of extent references to the same extent
> > > > > > > > > separated by a reference to a different extent; however, in this case
> > > > > > > > > there is data to be read from a real extent, instead of pages that have
> > > > > > > > > to be zero filled from a hole. If ordinary non-hole writes could trigger
> > > > > > > > > this bug, every page-oriented database engine would be crashing all the
> > > > > > > > > time on btrfs with compression enabled, and it's unlikely that would not
> > > > > > > > > have been noticed between 2015 and now. An ordinary write that splits
> > > > > > > > > an extent ref would look like this:
> > > > > > > > >
> > > > > > > > > ref 1: extent A, offset 0, length 4096
> > > > > > > > > ref 2: extent C, offset 0, length 8192
> > > > > > > > > ref 3: extent A, offset 12288, length 4096
> > > > > > > > >
> > > > > > > > > Sparse writes can lead to pairs of extent references surrounding a hole;
> > > > > > > > > however, in this case the extent references will point to different
> > > > > > > > > extents, avoiding the bug. If a sparse write could trigger the bug,
> > > > > > > > > the rsync -S option and qemu/kvm 'raw' disk image files (among many
> > > > > > > > > other tools that produce sparse files) would be unusable, and it's
> > > > > > > > > unlikely that would not have been noticed between 2015 and now either.
> > > > > > > > > Sparse writes look like this:
> > > > > > > > >
> > > > > > > > > ref 1: extent A, offset 0, length 4096
> > > > > > > > > ref 2: hole, length 8192
> > > > > > > > > ref 3: extent B, offset 0, length 4096
> > > > > > > > >
> > > > > > > > > The pattern or timing of read() calls seems to be relevant. It is very
> > > > > > > > > hard to see the corruption when reading files with 'hd', but 'cat | hd'
> > > > > > > > > will see the corruption just fine. Similar problems exist with 'cmp'
> > > > > > > > > but not 'sha1sum'. Two processes reading the same file at the same time
> > > > > > > > > seem to trigger the corruption very frequently.
> > > > > > > > >
> > > > > > > > > Some patterns of holes and data produce corruption faster than others.
> > > > > > > > > The pattern generated by the script above is based on instances of
> > > > > > > > > corruption I've found in the wild, and has a much better repro rate than
> > > > > > > > > random holes.
> > > > > > > > >
> > > > > > > > > The corruption occurs during reads, after csum verification and before
> > > > > > > > > decompression, so btrfs detects no csum failures. The data on disk
> > > > > > > > > seems to be OK and could be read correctly once the kernel bug is fixed.
> > > > > > > > > Repeated reads do eventually return correct data, but there is no way
> > > > > > > > > for userspace to distinguish between corrupt and correct data reliably.
> > > > > > > > >
> > > > > > > > > The corrupted data is usually data replaced by a hole or a copy of other
> > > > > > > > > blocks in the same extent.
> > > > > > > > >
> > > > > > > > > The behavior is similar to some earlier bugs related to holes and
> > > > > > > > > > compressed data in btrfs, but it's new and not fixed yet--hence,
> > > > > > > > > "2018 edition."
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Filipe David Manana,
> > > > > > >
> > > > > > > “Whether you think you can, or you think you can't — you're right.”
> > > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Filipe David Manana,
> > > > >
> > > > > “Whether you think you can, or you think you can't — you're right.”
> > > > >
> > >
> > >
> > >
> > > --
> > > Filipe David Manana,
> > >
> > > “Whether you think you can, or you think you can't — you're right.”
> >
> >
> >
> > --
> > Filipe David Manana,
> >
> > “Whether you think you can, or you think you can't — you're right.”
>
>
>
> --
> Filipe David Manana,
>
> “Whether you think you can, or you think you can't — you're right.”
>