Re: 2 errors when scrubbing - but I don't know what they mean


From: Sebastian Ochmann <ochmann@informatik.uni-bonn.de>
To: Shilong Wang <wangshilong1991@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: 2 errors when scrubbing - but I don't know what they mean
Date: Sun, 01 Dec 2013 21:45:56 +0100
Message-ID: <529BA004.2000202@informatik.uni-bonn.de>
In-Reply-To: <CAP9B-Q=Y+uY2kErYb1ZKMsvFrbYidmGpPnUbHm8iApj7v6wK+w@mail.gmail.com>

Hello,

 > However, if you find such superblocks checksum mismatch very often
 > during scrub, it maybe
 > there are something wrong with disk!

I'm sorry, but I don't think there's a problem with my disks because I
was able to trigger the errors that increment the "gen" error counter
during scrub on a completely different machine and drive today. I
basically performed some I/O operations on a drive and scrubbed at the
same time, over and over again, until I actually saw "super" errors
during scrub. But the error is really hard to trigger. It looks to me
like a race condition somewhere.

So I went a step further and tried to create a repro for this. It seems 
like I can trigger the errors now once every few minutes with the method 
described below, but sometimes it really takes a long time until the 
error pops up, so be patient when trying this...

For the repro:

I'm using a btrfs image in RAM for this for two reasons: I can scrub
quickly over and over again, and I can rule out hard drive errors. My
machine has 32 GB of RAM, so that comes in handy here. If you try this
on a physical drive, make sure to adjust the parameters as necessary.

Create a tmpfs and a testing image, format as btrfs:

$ mkdir btrfstest
$ cd btrfstest/
$ mkdir tmp
$ mount -t tmpfs -o size=20G none tmp
$ dd if=/dev/zero of=tmp/vol bs=1G count=19
$ mkfs.btrfs tmp/vol
$ mkdir mnt
$ mount -o commit=1 tmp/vol mnt

Note the "commit=1" mount option. It's not strictly necessary, but I 
have the feeling it helps with triggering the problem...

So now we have a 19 GB btrfs filesystem in RAM, mounted in "mnt". What I 
did for performing some artificial I/O operations is to rm and cp a 
linux source tree over and over again. Suppose you have an unpacked 
linux source tree available in the "/somewhere/linux" directory (and 
you're using bash). We'll spawn some loops that keep the filesystem busy:

$ while true; do rm -fr mnt/a; sleep 1.0; cp -R /somewhere/linux mnt/a; sleep 1.0; done
$ while true; do rm -fr mnt/b; sleep 1.1; cp -R /somewhere/linux mnt/b; sleep 1.1; done
$ while true; do rm -fr mnt/c; sleep 1.2; cp -R /somewhere/linux mnt/c; sleep 1.2; done
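
The three loops above differ only in target directory and delay, so they
could be wrapped in one bash function and started in the background from
a single shell. This is only a sketch: the function name "churn" and the
optional COUNT argument are made up for illustration and are not part of
the original repro.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original repro): repeatedly delete DST
# and re-copy SRC into it, sleeping DELAY seconds after each step.
# COUNT is optional; when omitted (or 0) the loop runs until interrupted.
churn() {
    local src=$1 dst=$2 delay=$3 count=${4:-0} i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        rm -rf "$dst"
        sleep "$delay"
        cp -R "$src" "$dst"
        sleep "$delay"
        i=$((i + 1))
    done
}

# Usage, with the paths and delays from the loops above:
#   churn /somewhere/linux mnt/a 1.0 &
#   churn /somewhere/linux mnt/b 1.1 &
#   churn /somewhere/linux mnt/c 1.2 &
```

The slightly different delays are kept so that the three loops drift
against each other instead of running in lockstep.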

Now that the filesystem is busy, we'll also scrub it repeatedly (without 
backgrounding, -B):

$ while true; do btrfs scrub start -B mnt; sleep 0.5; done

On my machine and in RAM, each scrub takes 0-1 second and the "total 
bytes scrubbed" should fluctuate (seems to be especially true with 
commit=1, but not sure). Get a beverage of your choice and wait.

(about 10 minutes later)

When I was writing this repro it took about 10 minutes until scrub said:

   total bytes scrubbed: 1.20GB with 2 errors
   error details: super=2
   corrected errors: 0, uncorrectable errors: 0, unverified errors: 0

and in dmesg:

   [15282.155170] btrfs: bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 0, gen 1
   [15282.155176] btrfs: bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 0, gen 2
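
To pick the "gen" counter out of a long dmesg log, a small filter is
enough. This is a sketch that assumes each message is on one line in
exactly the format shown above (the wording may differ between kernel
versions); "gen_errs" is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read kernel log lines on stdin and print
# "<device> <gen count>" for every btrfs per-device error summary line.
# Assumes the "btrfs: bdev <dev> errs: ... gen <n>" format quoted above.
gen_errs() {
    sed -n 's/.*btrfs: bdev \([^ ]*\) errs:.* gen \([0-9][0-9]*\).*/\1 \2/p'
}

# Usage:
#   dmesg | gen_errs
```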

After that, scrub is happy again and will continue normally until the 
same errors happen again after a few hundred scrubs or so.
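
Since the error may only show up after a few hundred scrubs, the scrub
loop could also stop by itself at the first scrub that reports errors.
The following is a minimal sketch that assumes the "with N errors"
summary line shown above; other btrfs-progs versions may print the
summary differently, in which case the pattern needs adjusting. Both
function names are hypothetical.

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) if the scrub output on stdin reports a non-zero
# error count, based on the "... with N errors" summary line.
scrub_reports_errors() {
    grep -Eq 'with [1-9][0-9]* errors?'
}

# Scrub MNT in the foreground (-B) over and over, and stop at the first
# scrub whose output reports errors, printing that output.
scrub_until_error() {
    local mnt=$1 out n=0
    while true; do
        out=$(btrfs scrub start -B "$mnt" 2>&1)
        n=$((n + 1))
        if printf '%s\n' "$out" | scrub_reports_errors; then
            printf 'scrub #%d reported errors:\n%s\n' "$n" "$out"
            return 0
        fi
        sleep 0.5
    done
}

# Usage:
#   scrub_until_error mnt
```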

So all in all, it seems the error can be triggered using normal I/O
operations and scrubbing at the right moments, even with a btrfs image
in RAM, where no hard drive error is possible.

I hope someone can reproduce this and maybe debug it.

Best regards
Sebastian
