From: Sebastian Ochmann <ochmann@informatik.uni-bonn.de>
To: Shilong Wang <wangshilong1991@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: 2 errors when scrubbing - but I don't know what they mean
Date: Sun, 01 Dec 2013 21:45:56 +0100
Message-ID: <529BA004.2000202@informatik.uni-bonn.de>
In-Reply-To: <CAP9B-Q=Y+uY2kErYb1ZKMsvFrbYidmGpPnUbHm8iApj7v6wK+w@mail.gmail.com>
Hello,
> However, if you find such superblocks checksum mismatch very often
> during scrub, it maybe
> there are something wrong with disk!
I'm sorry, but I don't think there's a problem with my disks because I
was able to trigger the errors that increment the "gen" error counter
during scrub on a completely different machine and drive today. I
basically performed some I/O operations on a drive and scrubbed at the
same time over and over again until I actually saw "super" errors during
scrub. But the error is really hard to trigger. It seems to me like a
race condition somewhere.
So I went a step further and tried to create a repro for this. It seems
like I can now trigger the errors about once every few minutes with the
method described below, but sometimes it takes a long time until the
error pops up, so be patient when trying this...
For the repro:
I'm using a btrfs image in RAM for two reasons: I can scrub quickly over
and over again, and I can rule out hard drive errors. My machine has
32 GB of RAM, which comes in handy here; if you try this on a physical
drive, adjust the parameters below as necessary.
Create a tmpfs and a testing image, format as btrfs:
$ mkdir btrfstest
$ cd btrfstest/
$ mkdir tmp
$ mount -t tmpfs -o size=20G none tmp
$ dd if=/dev/zero of=tmp/vol bs=1G count=19
$ mkfs.btrfs tmp/vol
$ mkdir mnt
$ mount -o commit=1 tmp/vol mnt
Note the "commit=1" mount option. It's not strictly necessary, but I
have the feeling it helps with triggering the problem...
So now we have a 19 GB btrfs filesystem in RAM, mounted at "mnt". To
generate some artificial I/O, I repeatedly removed and copied a linux
source tree. Suppose you have an unpacked linux source tree in the
"/somewhere/linux" directory (and you're using bash). We'll spawn some
loops that keep the filesystem busy, each running concurrently (e.g. in
its own terminal):
$ while true; do rm -fr mnt/a; sleep 1.0; cp -R /somewhere/linux mnt/a; sleep 1.0; done
$ while true; do rm -fr mnt/b; sleep 1.1; cp -R /somewhere/linux mnt/b; sleep 1.1; done
$ while true; do rm -fr mnt/c; sleep 1.2; cp -R /somewhere/linux mnt/c; sleep 1.2; done
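The three loops differ only in their target directory and delay. Here is a minimal bash sketch that factors them into one helper and starts all three from a single shell. The name `churn` and its optional round count are hypothetical additions (the round count exists only so the helper can be tested); the original loops simply run until interrupted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: repeatedly delete DST, then re-copy SRC to DST,
# sleeping DELAY seconds after each step, like the loops above.
# ROUNDS is optional; 0 (the default) means loop forever.
churn() {
    local src=$1 dst=$2 delay=$3 rounds=${4:-0}
    local i=0
    while [ "$rounds" -eq 0 ] || [ "$i" -lt "$rounds" ]; do
        rm -fr "$dst"
        sleep "$delay"
        cp -R "$src" "$dst"
        sleep "$delay"
        i=$((i + 1))
    done
}

# Usage (assumes the paths from the text, run from btrfstest/):
#   trap 'kill $(jobs -p) 2>/dev/null; exit' INT TERM  # stop loops on Ctrl-C
#   churn /somewhere/linux mnt/a 1.0 &
#   churn /somewhere/linux mnt/b 1.1 &
#   churn /somewhere/linux mnt/c 1.2 &
#   wait
```

Keeping the three delays slightly different, as in the original loops, means the delete and copy phases drift relative to each other instead of staying in lockstep.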
Now that the filesystem is busy, we'll also scrub it repeatedly (without
backgrounding, -B):
$ while true; do btrfs scrub start -B mnt; sleep 0.5; done
On my machine, with the image in RAM, each scrub takes 0-1 seconds, and
the "total bytes scrubbed" value should fluctuate (this seems to be
especially true with commit=1, but I'm not sure). Get a beverage of your
choice and wait.
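If you'd rather not watch the terminal, the scrub loop can stop by itself once a scrub reports errors. This is a minimal bash sketch. It assumes the summary line has the form `total bytes scrubbed: ... with N errors`, as in the scrub output quoted in this message. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print N from a scrub summary line read on stdin, e.g.
#   "total bytes scrubbed: 1.20GB with 2 errors"
# Prints nothing if no such line is found. The pattern is an assumption
# based on the scrub output quoted in this message.
scrub_error_count() {
    sed -n 's/.*total bytes scrubbed:.* with \([0-9][0-9]*\) error.*/\1/p'
}

# Hypothetical wrapper: scrub MOUNTPOINT repeatedly (like the loop above)
# and return as soon as one scrub reports a nonzero error count.
scrub_until_error() {
    local mnt=$1 out n
    while true; do
        out=$(btrfs scrub start -B "$mnt")
        n=$(printf '%s\n' "$out" | scrub_error_count)
        if [ -n "$n" ] && [ "$n" -gt 0 ]; then
            printf '%s\n' "$out"   # show the failing scrub's full report
            return 0
        fi
        sleep 0.5
    done
}

# Usage: scrub_until_error mnt
```

The sketch parses the printed summary instead of relying on the exit status of `btrfs scrub start -B`, so it does not depend on how a given btrfs-progs version maps errors to exit codes.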
(about 10 minutes later)
When I was writing this repro it took about 10 minutes until scrub said:
total bytes scrubbed: 1.20GB with 2 errors
error details: super=2
corrected errors: 0, uncorrectable errors: 0, unverified errors: 0
and in dmesg:
[15282.155170] btrfs: bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 0, gen 1
[15282.155176] btrfs: bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 0, gen 2
After that, scrub is happy again and will continue normally until the
same errors happen again after a few hundred scrubs or so.
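The dmesg lines report cumulative per-device error counters, and only the `gen` field changes here, going from 1 to 2. This minimal sketch defines a hypothetical helper that extracts that field. It assumes the `bdev ... errs: wr N, rd N, flush N, corrupt N, gen N` layout shown above. Other kernel versions may word the line differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read kernel log lines on stdin and print the
# "gen" counter from each btrfs "bdev ... errs:" line, one per line.
btrfs_gen_errs() {
    sed -n 's/.*bdev .* errs: .*gen \([0-9][0-9]*\).*/\1/p'
}

# Usage, e.g. right after a scrub reported errors:
#   dmesg | btrfs_gen_errs | tail -n 1
```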
So, all in all, it seems the error can be triggered by normal I/O
operations combined with scrubbing at the right moment. This happens
even with a btrfs image in RAM, where a hard drive error is impossible.
I hope someone can reproduce this and maybe debug it.
Best regards
Sebastian