From: Austin S Hemmelgarn <ahferroin7@gmail.com>
To: Duncan <1i5t5.duncan@cox.net>, linux-btrfs@vger.kernel.org
Subject: Re: utils version and convert crash
Date: Wed, 2 Dec 2015 07:25:13 -0500
Message-ID: <565EE329.3050902@gmail.com>
In-Reply-To: <pan$c5c73$162474d7$f06e12fa$496fb5f@cox.net>
On 2015-12-02 05:01, Duncan wrote:
> Gareth Pye posted on Wed, 02 Dec 2015 18:07:48 +1100 as excerpted:
>
>> Output from scrub:
>> sudo btrfs scrub start -Bd /data
>
> [Omitted no-error device reports.]
>
>> scrub device /dev/sdh (id 6) done
>> scrub started at Wed Dec 2 07:04:08 2015 and finished after 06:47:22
>> total bytes scrubbed: 1.09TiB with 2 errors
>> error details: read=2
>> corrected errors: 2, uncorrectable errors: 0, unverified errors: 30
>
> Also note those unverified errors...
>
> I have quite a bit of experience with btrfs scrub as I ran with a failing
> ssd for a while, using btrfs scrub on the multiple btrfs raid1 filesystems
> on parallel partitions on the failing ssd and another good one to correct
> the errors and continue operations.
>
> Unverified errors are, I believe[1], errors where a metadata block
> holding checksums itself has an error, so the blocks its checksums in
> turn covered are not checksum-verified.
>
> What that means in practice is that once the first metadata block error
> has been corrected in a first scrub run, a second scrub run can now check
> the blocks that were recorded as unverified errors in the first run,
> potentially finding and hopefully fixing additional errors, tho unless
> the problem's extreme, most of the unverifieds should end up being
> correct once they can be verified, with only a few possible further
> errors found.
>
> Of course if some of these previously unverified blocks are themselves
> metadata blocks with further checksums, yet another run may be required.
>
> Fortunately, these trees are quite wide (121 items according to an old
> post from Hugo I found myself rereading a few hours ago) and thus don't
> tend to be very deep -- I think I ended up rerunning scrub four times at
> one point, before both read and unverified errors went to zero, tho
> that's on relatively small partitioned-up ssd filesystems of under 50 gig
> usable capacity (pair-raid1, 50 gig per device), so I could see terabyte-
> scale filesystems going to 6-7 levels.
>
> And, again on a btrfs raid1 with a known failing device -- several
> thousand redirected sectors by the time I gave up and btrfs replaced --
> generally each successive scrub run would return an order of magnitude or
> so fewer errors (corrected and unverified both) than the previous run,
> tho occasionally I'd hit a bad spot and the number would go up a bit in
> one run, before dropping an order of magnitude or so again on the next
> run.
>
> So with only two corrected read-errors and 30 unverified, I'd expect
> maybe another one or two corrected read-errors on a second run, and
> probably no unverifieds, in which case a third run shouldn't be necessary
> unless you just want the peace of mind of seeing that no errors found
> message. Tho of course if you're unlucky, one of those 30 will turn out
> to be a read error on a full 121-item metadata block, so your
> unverifieds will go up for that run, before going down again in
> subsequent runs.
>
> Of course with filesystems of under 50 gig capacity on fast ssds, a
> typical scrub ran in under a minute, so repeated scrubs to find and
> correct all errors wasn't a big deal, generally under 10 minutes
> including human response time. On terabyte-scale spinning rust with
> scrubs taking hours, multiple scrubs could easily take a full 24-hour day
> or more! =:^(
>
> So now that you did one scrub and did find errors, you do probably want
> to trace them down and correct the problem if possible, before running
> further scrubs to find and exterminate any errors still hiding behind
> unverified in the first run. But once you're reasonably confident you're
> running a reliable system again, you probably do want to run further
> scrubs until that unverified count goes to zero (assuming no
> uncorrectable errors in the meantime).
>
> ---
> [1] I'm not a dev and am not absolutely sure of the technical accuracy of
> this description, but from an admin's viewpoint it seems to be correct at
> least in practice, based on the fact that further scrubs as long as there
> were unverified errors often did find additional errors, while once the
> unverified count dropped to zero and the last read errors were corrected,
> further scrubs turned up no further errors.
>
AFAICT from reading the code, that is a correct assessment. It would be
kind of nice, though, if there were some way to tell scrub to recheck up to
X times if there are unverified errors...
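
Until scrub itself grows such an option, a small wrapper can approximate
it. What follows is only a minimal sketch, not anything from btrfs-progs.
It assumes the per-device summary line looks exactly like the one quoted
above ("corrected errors: N, uncorrectable errors: N, unverified errors:
N"); other btrfs-progs versions may print it differently. The names
ScrubCounts, parse_scrub_counts, run_btrfs_scrub and scrub_until_clean are
made up for illustration.

```python
import re
import subprocess
from typing import Callable, List, NamedTuple


class ScrubCounts(NamedTuple):
    corrected: int
    uncorrectable: int
    unverified: int


# Assumption: summary line format as quoted in this thread.
_SUMMARY = re.compile(
    r"corrected errors:\s*(\d+),\s*"
    r"uncorrectable errors:\s*(\d+),\s*"
    r"unverified errors:\s*(\d+)"
)


def parse_scrub_counts(output: str) -> ScrubCounts:
    """Sum the error counters over every per-device summary line found."""
    corrected = uncorrectable = unverified = 0
    for match in _SUMMARY.finditer(output):
        corrected += int(match.group(1))
        uncorrectable += int(match.group(2))
        unverified += int(match.group(3))
    # Caveat: if no line matches (different output format, or devices
    # that reported no errors), all counters stay at zero.
    return ScrubCounts(corrected, uncorrectable, unverified)


def run_btrfs_scrub(mountpoint: str) -> str:
    """Run one foreground scrub with per-device stats; return its stdout."""
    result = subprocess.run(
        ["btrfs", "scrub", "start", "-Bd", mountpoint],
        capture_output=True,
        text=True,
        check=False,  # do not raise if scrub exits non-zero
    )
    return result.stdout


def scrub_until_clean(
    run_scrub: Callable[[], str], max_runs: int
) -> List[ScrubCounts]:
    """Repeat scrub until no unverified errors remain, at most max_runs times.

    Stops early on uncorrectable errors, since rerunning will not fix those.
    """
    history: List[ScrubCounts] = []
    for _ in range(max_runs):
        counts = parse_scrub_counts(run_scrub())
        history.append(counts)
        if counts.uncorrectable > 0 or counts.unverified == 0:
            break
    return history
```

As root, something like scrub_until_clean(lambda: run_btrfs_scrub("/data"),
max_runs=4) would run it against the filesystem from this thread. Taking
the runner as a callable keeps the loop testable without touching a real
filesystem.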