linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Martin Dev <mrturtledev@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: csum failed during copy/compare
Date: Mon, 10 Oct 2016 13:02:45 +0100	[thread overview]
Message-ID: <CAG6KGPSYXsU4DQ1x62Wyi+xpEd-EE5hEHPbY-CJfokey94mzoQ@mail.gmail.com> (raw)
In-Reply-To: <CAG6KGPSeR1zAuU0Xo9DrEpd-0QC9agB7p7z+Lb69hxS233_Cvg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 9151 bytes --]

Additional BTRFS scrub logs

root@xxxx# btrfs scrub start -B /mnt/g/
scrub done for 554b0043-052f-48d1-986f-5a6154496d89
    scrub started at Mon Oct 10 12:52:40 2016 and finished after 00:00:39
    total bytes scrubbed: 20.03GiB with 46304 errors
    error details: csum=46304
    corrected errors: 0, uncorrectable errors: 46304, unverified errors: 0
ERROR: there are uncorrectable errors

I've attached the dmesg log incase gmail wordwrap completely destroys formatting

dmesg :
[10284.452766] BTRFS warning (device sdb2): checksum error at logical
143896739840 on dev /dev/sdb2, sector 42509568, root 5, inode 270,
offset 10548412416, length 4096, links 1 (path: shutdown-4.bin)
[10284.452775] BTRFS warning (device sdb2): checksum error at logical
143896608768 on dev /dev/sdb2, sector 42509312, root 5, inode 270,
offset 10548281344, length 4096, links 1 (path: shutdown-4.bin)
[10284.452784] BTRFS warning (device sdb2): checksum error at logical
143896346624 on dev /dev/sdb2, sector 42508800, root 5, inode 270,
offset 10548019200, length 4096, links 1 (path: shutdown-4.bin)
[10284.452792] BTRFS warning (device sdb2): checksum error at logical
143896215552 on dev /dev/sdb2, sector 42508544, root 5, inode 270,
offset 10547888128, length 4096, links 1 (path: shutdown-4.bin)
[10284.452800] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 1, gen 0
[10284.452804] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 2, gen 0
[10284.452810] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 3, gen 0
[10284.452813] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896608768 on dev /dev/sdb2
[10284.452815] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896346624 on dev /dev/sdb2
[10284.452820] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896215552 on dev /dev/sdb2
[10284.452840] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 4, gen 0
[10284.452845] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896739840 on dev /dev/sdb2
[10284.452906] BTRFS warning (device sdb2): checksum error at logical
143896084480 on dev /dev/sdb2, sector 42508288, root 5, inode 270,
offset 10547757056, length 4096, links 1 (path: shutdown-4.bin)
[10284.453082] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 5, gen 0
[10284.453087] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896084480 on dev /dev/sdb2
[10284.453107] BTRFS warning (device sdb2): checksum error at logical
143896743936 on dev /dev/sdb2, sector 42509576, root 5, inode 270,
offset 10548416512, length 4096, links 1 (path: shutdown-4.bin)
[10284.453113] BTRFS warning (device sdb2): checksum error at logical
143896612864 on dev /dev/sdb2, sector 42509320, root 5, inode 270,
offset 10548285440, length 4096, links 1 (path: shutdown-4.bin)
[10284.453122] BTRFS warning (device sdb2): checksum error at logical
143896350720 on dev /dev/sdb2, sector 42508808, root 5, inode 270,
offset 10548023296, length 4096, links 1 (path: shutdown-4.bin)
[10284.453126] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 6, gen 0
[10284.453130] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896612864 on dev /dev/sdb2
[10284.453134] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 7, gen 0
[10284.453137] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896350720 on dev /dev/sdb2
[10284.453153] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 8, gen 0
[10284.453157] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896743936 on dev /dev/sdb2
[10284.453190] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 9, gen 0
[10284.453194] BTRFS warning (device sdb2): checksum error at logical
143896219648 on dev /dev/sdb2, sector 42508552, root 5, inode 270,
offset 10547892224, length 4096, links 1 (path: shutdown-4.bin)
[10284.453198] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0,
rd 0, flush 0, corrupt 10, gen 0
[10284.453201] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896219648 on dev /dev/sdb2
[10284.453207] BTRFS error (device sdb2): unable to fixup (regular)
error at logical 143896616960 on dev /dev/sdb2
[10284.453238] BTRFS warning (device sdb2): checksum error at logical
143896477696 on dev /dev/sdb2, sector 42509056, root 5, inode 270,
offset 10548150272, length 4096, links 1 (path: shutdown-4.bin)

Important to note this happens across 2 machines, now 3 different
drives (all different brand SSDs), on a Z170 chipset and C612 chipset.

On Mon, Oct 10, 2016 at 10:44 AM, Martin Dev <mrturtledev@gmail.com> wrote:
> Hey everyone,
>
> I work for system verification of SSDs and we've recently come up
> against an issue with BTRFS on Ubuntu 16.04. We have a framework which
> follows the following steps:
>
> Generate verifiable 10GB file with FIO on internal drive
> Copy 10GB file to 2 target partitions on DUT (using "cp" command)
> Sync
> Verify copied files with FIO (using direct=1)
> Perform a power event (restart, shutdown, suspend, or hibernate)
> Verify both 10GB files
> Repeat.
>
> We keep the files from the first iteration on the target partitions,
> but subsequent iterations are deleted after verification. Every
> command is monitored for exit status and framework will fail with
> error if anything exits non zero.
>
> We've found that during this process (between 2-9 iterations of
> restarts or shutdowns) BTRFS will fail the pre-power verification of
> the file with 100% reproduction rate out of 7 attempts. So this is a
> "cp" command to copy from the internal to the btrfs partition, a sync,
> then a verify with fio. This seems to be a recent change as the same
> process has been used for the last 2 years including over the last
> month with no issues.
>
> Here are some more details:
>
> Linux ht_stress_b6_20 4.4.0-21-generic #37-Ubuntu SMP Mon Apr 18
> 18:33:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> btrfs-progs v4.4
>
> Label: none  uuid: 554b0043-052f-48d1-986f-5a6154496d89
>     Total devices 1 FS bytes used 20.03GiB
>     devid    1 size 29.81GiB used 22.26GiB path /dev/sdb2
>
> Data, single: total=22.00GiB, used=20.00GiB
> System, single: total=4.00MiB, used=16.00KiB
> Metadata, single: total=264.00MiB, used=20.64MiB
> GlobalReserve, single: total=16.00MiB, used=0.00B
>
>
> df -hT
> /dev/sdb1      ext4       30G   21G  7.7G  73% /mnt/f
> /dev/sdb2      btrfs      30G   21G  9.6G  68% /mnt/g
> /dev/sdb3      xfs        30G   33M   30G   1% /mnt/h
> /dev/sdb4      ext4       30G   44M   28G   1% /mnt/i
>
> /etc/fstab
> /dev/sdb2                                       /mnt/g/ btrfs
> defaults,discard       0       0
>
> root@xxxx: ls -l /mnt/g/
> total 20971520
> -rw-r--r-- 1 root root 10737418240 Oct 10 09:42 shutdown-1.bin
> -rw-r--r-- 1 root root 10737418240 Oct 10 10:03 shutdown-4.bin
>
>
> root@xxxx# cat fio-screenlog.log
> test: (g=0): rw=read, bs=128K-128K/128K-128K/128K-128K,
> ioengine=libaio, iodepth=32
> fio-2.13-88-g32bc8-dirty
> Starting 1 process
> verify: bad magic header 101, wanted acca at file
> /mnt/g/shutdown-4.bin offset 10547757056, length 131072
> verify: bad magic header 101, wanted acca at file
> /mnt/g/shutdown-4.bin offset 10547888128, length 131072
> fio: pid=2288, err=84/file:io_u.c:1978, func=io_u_queued_complete,
> error=Invalid or incomplete multibyte or wide character
>
> In this situation, FIO is reading "101" instead of the correct data
> from the shutdown-4.bin file. from the "ls -l" above, we can see the
> shutdown-1.bin and shutdown-4.bin are the same size and we know the cp
> command exited 0.
>
> shutdown-1.bin (OK):
> dd if=/mnt/g/shutdown-1.bin of=/tmp/header bs=512 count=1 skip=20601088
> 1+0 records in
> 1+0 records out
> 512 bytes copied, 0.00247518 s, 207 kB/s
>
> Data:
> 0000000 acca 0001 0000 0002 3d92 e664 178e 8421
> 0000010 0000 74b2 0002 0000 0016 0000 2bfa 0009
> 0000020 0001 3a59 c82b 1b3a a7b8 ee74 881a 0747
> 0000030 94f7 09f2 79c9 04e3 d29e d6c2 ea3f 04b8
> ...
>
> shutdown-4.bin (FAILS):
> dd if=/mnt/g/shutdown-4.bin of=/tmp/header bs=512 count=1 skip=20601088
> dd: error reading '/mnt/g/shutdown-4.bin': Input/output error
> 0+0 records in
> 0+0 records out
> 0 bytes copied, 0.00230287 s, 0.0 kB/s
>
> We can't even read the file at that offset correctly and have to go
> back to 20601087 (1 512 byte sector) before I can get valid data.
> What's interesting, is over 2 different drives, 2 different machines,
> and 7 reproductions, the data is always "101", this might be a quirk
> of FIO, as I would hope that btrfs would not return corrupt data to
> the user.
>
> Reproduction is straight forward, and takes around an hour. I have a
> large amount of duplicate machines at my disposal if debug /
> investigations need to be run. I can provide portions of our
> automation logs, full dmesg (attached), and any other information that
> needs to be gathered.

[-- Attachment #2: dmesg_btrfsscrub.log --]
[-- Type: text/x-log, Size: 4171 bytes --]

[10284.452766] BTRFS warning (device sdb2): checksum error at logical 143896739840 on dev /dev/sdb2, sector 42509568, root 5, inode 270, offset 10548412416, length 4096, links 1 (path: shutdown-4.bin)
[10284.452775] BTRFS warning (device sdb2): checksum error at logical 143896608768 on dev /dev/sdb2, sector 42509312, root 5, inode 270, offset 10548281344, length 4096, links 1 (path: shutdown-4.bin)
[10284.452784] BTRFS warning (device sdb2): checksum error at logical 143896346624 on dev /dev/sdb2, sector 42508800, root 5, inode 270, offset 10548019200, length 4096, links 1 (path: shutdown-4.bin)
[10284.452792] BTRFS warning (device sdb2): checksum error at logical 143896215552 on dev /dev/sdb2, sector 42508544, root 5, inode 270, offset 10547888128, length 4096, links 1 (path: shutdown-4.bin)
[10284.452800] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[10284.452804] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[10284.452810] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
[10284.452813] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896608768 on dev /dev/sdb2
[10284.452815] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896346624 on dev /dev/sdb2
[10284.452820] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896215552 on dev /dev/sdb2
[10284.452840] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
[10284.452845] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896739840 on dev /dev/sdb2
[10284.452906] BTRFS warning (device sdb2): checksum error at logical 143896084480 on dev /dev/sdb2, sector 42508288, root 5, inode 270, offset 10547757056, length 4096, links 1 (path: shutdown-4.bin)
[10284.453082] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
[10284.453087] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896084480 on dev /dev/sdb2
[10284.453107] BTRFS warning (device sdb2): checksum error at logical 143896743936 on dev /dev/sdb2, sector 42509576, root 5, inode 270, offset 10548416512, length 4096, links 1 (path: shutdown-4.bin)
[10284.453113] BTRFS warning (device sdb2): checksum error at logical 143896612864 on dev /dev/sdb2, sector 42509320, root 5, inode 270, offset 10548285440, length 4096, links 1 (path: shutdown-4.bin)
[10284.453122] BTRFS warning (device sdb2): checksum error at logical 143896350720 on dev /dev/sdb2, sector 42508808, root 5, inode 270, offset 10548023296, length 4096, links 1 (path: shutdown-4.bin)
[10284.453126] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
[10284.453130] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896612864 on dev /dev/sdb2
[10284.453134] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
[10284.453137] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896350720 on dev /dev/sdb2
[10284.453153] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
[10284.453157] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896743936 on dev /dev/sdb2
[10284.453190] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
[10284.453194] BTRFS warning (device sdb2): checksum error at logical 143896219648 on dev /dev/sdb2, sector 42508552, root 5, inode 270, offset 10547892224, length 4096, links 1 (path: shutdown-4.bin)
[10284.453198] BTRFS error (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
[10284.453201] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896219648 on dev /dev/sdb2
[10284.453207] BTRFS error (device sdb2): unable to fixup (regular) error at logical 143896616960 on dev /dev/sdb2
[10284.453238] BTRFS warning (device sdb2): checksum error at logical 143896477696 on dev /dev/sdb2, sector 42509056, root 5, inode 270, offset 10548150272, length 4096, links 1 (path: shutdown-4.bin)

  reply	other threads:[~2016-10-10 12:03 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAG6KGPS8Gvy2zGCe22mMmk4uXrqVyRP8FHayhe0ph-8vC+kA4A@mail.gmail.com>
2016-10-10  9:44 ` Fwd: csum failed during copy/compare Martin Dev
2016-10-10 12:02   ` Martin Dev [this message]
2016-10-10 13:56     ` Martin Dev
2016-10-10 18:42   ` Roman Mamedov
2016-10-10 21:05     ` Chris Murphy
2016-10-19 14:39       ` Martin Dev
2016-10-21 13:19         ` Martin Dev
2016-10-21 16:19           ` Chris Murphy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAG6KGPSYXsU4DQ1x62Wyi+xpEd-EE5hEHPbY-CJfokey94mzoQ@mail.gmail.com \
    --to=mrturtledev@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).