From: Sebastian Ochmann <ochmann@informatik.uni-bonn.de>
To: Shilong Wang <wangshilong1991@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: 2 errors when scrubbing - but I don't know what they mean
Date: Sun, 01 Dec 2013 21:45:56 +0100
Message-ID: <529BA004.2000202@informatik.uni-bonn.de>
In-Reply-To: <CAP9B-Q=Y+uY2kErYb1ZKMsvFrbYidmGpPnUbHm8iApj7v6wK+w@mail.gmail.com>
Hello,
> However, if you find such superblocks checksum mismatch very often
> during scrub, it maybe
> there are something wrong with disk!
I'm sorry, but I don't think there's a problem with my disks because I
was able to trigger the errors that increment the "gen" error counter
during scrub on a completely different machine and drive today. I
basically performed some I/O operations on a drive and scrubbed at the
same time over and over again until I actually saw "super" errors during
scrub. But the error is really hard to trigger. It seems to me like a
race condition somewhere.
So I went a step further and tried to create a repro for this. It seems
like I can trigger the errors now once every few minutes with the method
described below, but sometimes it really takes a long time until the
error pops up, so be patient when trying this...
For the repro:
I'm using a btrfs image in RAM for this for two reasons: I can scrub
quickly over and over again and I can rule out hard drive errors. My
machine has 32 GB of RAM, so that comes in handy here - if you try this
on a physical drive, make sure to adjust some parameters, if necessary.
Create a tmpfs and a testing image, format as btrfs:
$ mkdir btrfstest
$ cd btrfstest/
$ mkdir tmp
$ mount -t tmpfs -o size=20G none tmp
$ dd if=/dev/zero of=tmp/vol bs=1G count=19
$ mkfs.btrfs tmp/vol
$ mkdir mnt
$ mount -o commit=1 tmp/vol mnt
Note the "commit=1" mount option. It's not strictly necessary, but I
have the feeling it helps with triggering the problem...
So now we have a 19 GB btrfs filesystem in RAM, mounted in "mnt". What I
did for performing some artificial I/O operations is to rm and cp a
linux source tree over and over again. Suppose you have an unpacked
linux source tree available in the "/somewhere/linux" directory (and
you're using bash). We'll spawn some loops, each in its own shell, that
keep the filesystem busy:
$ while true; do rm -fr mnt/a; sleep 1.0; cp -R /somewhere/linux mnt/a; sleep 1.0; done
$ while true; do rm -fr mnt/b; sleep 1.1; cp -R /somewhere/linux mnt/b; sleep 1.1; done
$ while true; do rm -fr mnt/c; sleep 1.2; cp -R /somewhere/linux mnt/c; sleep 1.2; done
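If you would rather not keep three terminals open, the same loops can be started as background jobs from a single bash session. This is only a sketch: the function names churn_once and start_churn are made up for illustration, and the paths are the same placeholders as above.

```shell
#!/bin/bash
# One iteration of the rm/cp cycle used in the loops above.
# Arguments: source tree, destination directory, pause in seconds.
churn_once() {
    local src=$1 dest=$2 pause=$3
    rm -fr "$dest"
    sleep "$pause"
    cp -R "$src" "$dest"
    sleep "$pause"
}

# Hypothetical wrapper: run the three loops as background jobs.
# Arguments: source tree, mount point of the test filesystem.
start_churn() {
    local src=$1 mnt=$2
    ( while true; do churn_once "$src" "$mnt/a" 1.0; done ) &
    ( while true; do churn_once "$src" "$mnt/b" 1.1; done ) &
    ( while true; do churn_once "$src" "$mnt/c" 1.2; done ) &
}
```

Something like "start_churn /somewhere/linux mnt" would start the load, and "kill $(jobs -p)" in the same shell should stop the loops again (a cp that is already running may still finish its current copy).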
Now that the filesystem is busy, we'll also scrub it repeatedly (without
backgrounding, -B):
$ while true; do btrfs scrub start -B mnt; sleep 0.5; done
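Since it can take a long time until the error shows up, it may be more convenient to let the loop stop by itself as soon as a scrub reports errors. The following is a rough sketch, assuming the scrub output contains a line of the form "total bytes scrubbed: ... with N errors" as on my machine; other btrfs-progs versions may print something different. The names scrub_error_count and scrub_until_error are just made up for this example.

```shell
#!/bin/bash
# Read scrub output on stdin and print the error count from a line like
# "total bytes scrubbed: 1.20GB with 2 errors".
# Prints nothing if no such line is found (assumed output format).
scrub_error_count() {
    sed -n 's/.*total bytes scrubbed:.* with \([0-9][0-9]*\) errors.*/\1/p' | head -n 1
}

# Hypothetical helper: scrub the given mount point over and over
# (foreground, -B) until one run reports a non-zero error count.
scrub_until_error() {
    local mnt=$1 out count runs=0
    while true; do
        out=$(btrfs scrub start -B "$mnt" 2>&1)
        runs=$((runs + 1))
        count=$(printf '%s\n' "$out" | scrub_error_count)
        if [ -n "$count" ] && [ "$count" -gt 0 ]; then
            printf 'errors after %d scrubs:\n%s\n' "$runs" "$out"
            return 0
        fi
        sleep 0.5
    done
}
```

Run as "scrub_until_error mnt", this should behave like the plain loop but stop and print the full output of the first scrub that reports errors.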
On my machine and in RAM, each scrub takes 0-1 second and the "total
bytes scrubbed" should fluctuate (seems to be especially true with
commit=1, but not sure). Get a beverage of your choice and wait.
(about 10 minutes later)
When I was writing this repro it took about 10 minutes until scrub said:
total bytes scrubbed: 1.20GB with 2 errors
error details: super=2
corrected errors: 0, uncorrectable errors: 0, unverified errors: 0
and in dmesg:
[15282.155170] btrfs: bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 0, gen 1
[15282.155176] btrfs: bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 0, gen 2
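To keep an eye on the "gen" counter without reading through dmesg by hand, the device name and counter value can be filtered out of such kernel messages. This is a small sketch that assumes the message format shown here ("btrfs: bdev <device> errs: ... gen N", one message per line); other kernel versions may word the message differently. The function name gen_errs is made up.

```shell
#!/bin/bash
# Read kernel log lines on stdin and print "<device> <gen counter>" for
# every btrfs per-device error line (message format assumed as in this mail).
gen_errs() {
    sed -n 's/.*btrfs: bdev \([^ ]*\) errs:.* gen \([0-9][0-9]*\).*/\1 \2/p'
}
```

For example, "dmesg | gen_errs | tail -n 1" should print the most recently logged counter value.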
After that, scrub is happy again and will continue normally until the
same errors happen again after a few hundred scrubs or so.
So all in all, it seems the error can be triggered by normal I/O
operations combined with scrubbing at the right moments, even with a
btrfs image in RAM, where no hard drive error is possible.
I hope someone can reproduce this and maybe debug it.
Best regards
Sebastian