From: Marc MERLIN <marc@merlins.org>
To: Duncan <1i5t5.duncan@cox.net>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Documentation for BTRFS error (device dev): bdev /dev/xx errs: wr 22, rd 0, flush 0, corrupt 0, gen 0
Date: Tue, 23 Feb 2016 16:19:44 -0800
Message-ID: <20160224001944.GX22487@merlins.org>
In-Reply-To: <pan$84c48$1f5c952c$5c23be23$5799c3a@cox.net> <pan$1ce2f$38765775$42544d39$1c9fd0a5@cox.net>
On Tue, Feb 23, 2016 at 11:22:47PM +0000, Duncan wrote:
> Forgot to mention, tho you're probably already considering it, if this is
> the same raid5-backed btrfs you were complaining about being slow in the
> other thread,
No, that's another one :)
This one was remade from scratch after the filesystem on it got
corrupted.
5 x 4TB swraid5 64GB SSD
bcache
dmcrypt
btrfs
SMART is 100% for all 5 drives, and they passed an extensive test before
I built the new raid and filesystem on them.
> and considering redoing with bcache to an ssd added, as
> seems very likely, if it /is/ actually storage device or bus errors, that
> could be one reason the previous one was getting so slow... Maybe it
> wasn't btrfs after all.
Good thinking, although in this case, it's a different filesystem.
This filesystem is, however, on a SATA port multiplier with a 2 meter
cable to an external disk array.
As a result, bandwidth to it is going to be slow-ish, and the long cable
could be adding I/O errors.
On Tue, Feb 23, 2016 at 11:17:06PM +0000, Duncan wrote:
> I believe all formal documentation of what the error counters actually
> mean is developer-level -- "Trust the Source, Luke."
Haha, I know that one :)
Although to be fair, my offer was more for someone to tell me what
they're supposed to mean, and I'd update the wiki to capture that info.
> Yet another point supporting the "btrfs is still stabilizing, not yet
> fully stable" position, I suppose, as it could definitely be argued that
> those counters and their visibility, including display in the kernel log
> at mount time, are definitely intended to be consumed at the admin-user
> level, and that it follows that they should be documented at the admin-
> user level before the filesystem can properly be defined as fully stable.
Yes :) and I'm happy to help make this a reality in the wiki at least.
> Write error counter increments should be accompanied by kernel log events
> telling you more -- what level of the device stack is returning the
> errors that propagate up to the filesystem level, for instance. Expected
> would be either bus level timeouts and resets, or storage device errors.
I agree, and I get 0 such errors here, which is why it's weird.
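For anyone who wants to track these counters from the kernel log, here is a minimal sketch that parses the summary line quoted in the subject. The regex is built only from that one line, so treat the exact layout as an assumption; other kernel versions may word it differently. The function name parse_btrfs_errs is made up for this example.

```python
import re

# Pattern modelled on the line in the subject of this thread.
# Assumption: the five counters always appear in this order and spelling.
ERRS_RE = re.compile(
    r"BTRFS error \(device (?P<fs>[^)]+)\): bdev (?P<bdev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

COUNTERS = ("wr", "rd", "flush", "corrupt", "gen")


def parse_btrfs_errs(line):
    """Return (fs_device, bdev, counters) or None if the line does not match."""
    m = ERRS_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    counters = {name: int(fields[name]) for name in COUNTERS}
    return fields["fs"], fields["bdev"], counters


sample = ("BTRFS error (device dev): bdev /dev/xx errs: "
          "wr 22, rd 0, flush 0, corrupt 0, gen 0")
fs, bdev, counters = parse_btrfs_errs(sample)
# Keep only the counters that have moved off zero.
nonzero = {name: n for name, n in counters.items() if n}
print(bdev, nonzero)
```

Feeding it kern.log line by line and comparing against the previous result would show when a counter increments, which can then be matched against nearby SATA or PMP messages by timestamp.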
> If it's storage device errors, SMART data should show increasing raw
> value relocated sectors or the like (smartctl -A). If it's bus errors,
Correct, and they are all at 0.
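As a rough sketch of that check, the following pulls the raw values of a few attributes out of smartctl -A style text. The sample text is made up for illustration and is not output from the drives discussed here; the column layout (raw value in the tenth column) and the attribute names watched are assumptions that hold for many ATA drives but vary by vendor.

```python
# Illustrative smartctl -A style text; the values are invented for this sketch.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3
"""

# Attributes commonly associated with media problems (5, 197) and
# link or cabling problems (199). Assumption: the drive reports these names.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "UDMA_CRC_Error_Count")


def raw_values(text, watched=WATCHED):
    """Map watched attribute names to their integer raw values."""
    found = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) >= 10 and parts[0].isdigit() and parts[1] in watched:
            try:
                found[parts[1]] = int(parts[9])
            except ValueError:
                pass  # some drives put non-numeric text in the raw column
    return found


def nonzero(text):
    """Return only the watched attributes whose raw value is not zero."""
    return {name: n for name, n in raw_values(text).items() if n != 0}


print(nonzero(SAMPLE))
```

An empty result for every drive matches the "all at 0" case; a rising CRC count with clean sector counts would point at the link rather than the media.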
> it could be bad cabling (bad connections or bad shielding, or using
> SATA-150 certified cables for SATA-600 or some such), or, as I saw on an
Cabling is indeed a likely culprit. I'm just surprised that, if that's the
case, the SATA layer is showing me nothing (I'm running tail -f
/var/log/kern.log, and usually I'd see SATA or PMP errors there).
> old and failing mobo (when I pulled it there were bulging and some
> exploded capacitors) a few years ago, failing filter-capacitors on the
> mobo signalling paths. Bad power, including the possibility of an
> overloaded UPS that hit one guy I know, is notorious for both this sort
> of issue and memory problems, as well.
All true, but wouldn't all of these show up as actual disk errors by the
underlying driver involved too?
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901