From: Phillip Susi <psusi@ubuntu.com>
To: Chris Murphy <lists@colorremedies.com>,
Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Cc: Zygo Blaxell <zblaxell@furryterror.org>
Subject: Re: Uncorrectable errors on RAID-1?
Date: Sat, 27 Dec 2014 22:12:57 -0500
Message-ID: <549F7539.4050801@ubuntu.com>
In-Reply-To: <CAJCQCtTK2OinQZakArtX4BD=JAz6UJr=dhc2xUbcqJJL1v6Kkw@mail.gmail.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
On 12/23/2014 05:09 PM, Chris Murphy wrote:
> The timer in /sys is a kernel command timer, it's not a device
> timer even though it's pointed at a block device. You need to
> change that from 30 to something higher to get the behavior you
> want. It doesn't really make sense to say, timeout in 30 seconds,
> but instead of reporting a timeout, report it as a read error.
> They're completely different things.
The idea is not to give the drive a ridiculous amount of time to
recover without timing out, but for the timeout to be handled properly.
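
For anyone who wants to look at the timer being discussed, here is a
minimal sketch in Python. It assumes the usual sysfs location,
/sys/block/<device>/device/timeout, with the value in seconds. The
function names and the sysfs_root parameter are mine, added only so the
code can be pointed at a scratch directory instead of the real /sys.

```python
from pathlib import Path


def command_timeout_path(device: str, sysfs_root: str = "/sys") -> Path:
    """Build the path of the kernel command timer for a block device.

    Assumes the conventional layout <sysfs_root>/block/<device>/device/timeout.
    """
    return Path(sysfs_root) / "block" / device / "device" / "timeout"


def read_command_timeout(device: str, sysfs_root: str = "/sys") -> int:
    """Return the kernel command timer for the device, in seconds."""
    return int(command_timeout_path(device, sysfs_root).read_text().strip())


def write_command_timeout(device: str, seconds: int, sysfs_root: str = "/sys") -> None:
    """Set the kernel command timer (needs root when used on the real /sys)."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    command_timeout_path(device, sysfs_root).write_text(f"{seconds}\n")
```

Raising the value only changes how long the kernel waits; it says
nothing about what happens once the timer does expire, which is the part
I am arguing about.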
> There are all sorts of errors listed in libata so for all of them
> to get dumped into a read error doesn't make sense. A lot of those
> errors don't report back a sector, and the key part of the read
> error is what sector(s) have the problem so that they can be fixed.
> Without that information, the ability to fix it is lost. And it's
> the drive that needs to report this.
It is not lost. The information is simply fuzzed from an exact
individual sector to the range of sectors covered by the timed-out
request. In an ideal world the drive would give up in a reasonable time
and report the failure, but if it doesn't, then we should deal with that
in a better way than hanging all I/O for an unacceptably long time.
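
To make the "range instead of a single sector" point concrete, here is a
small Python sketch. The function name and the (start sector, sector
count) representation of a request are assumptions for illustration, not
anything taken from libata or the block layer.

```python
def suspect_sectors(start_sector: int, sector_count: int) -> range:
    """Return every sector covered by a request that timed out.

    Without a sector reported by the drive, each sector in the request
    has to be treated as possibly bad.
    """
    if start_sector < 0:
        raise ValueError("start_sector must be non-negative")
    if sector_count <= 0:
        raise ValueError("sector_count must be positive")
    return range(start_sector, start_sector + sector_count)
```

A timed-out request for 8 sectors starting at sector 2048 gives the
candidates 2048 through 2055, which is still enough to know what needs
to be recovered and rewritten.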
> Oven doesn't work, so let's spray gasoline on it and light it and
> the kitchen on fire so that we can cook this damn pizza! That's
> what I just read. Sorry. It doesn't seem like a good idea to me to
> map all errors as read errors.
How do you conclude that? In the face of a timeout, your choices are
kicking the whole drive out of the array immediately, or attempting to
repair it by recovering the affected sector(s) and rewriting them.
Unless that recovery attempt could cause more harm than degrading the
array, where is the "throwing gasoline on it" part? This is simply a
case of the device not providing a specific error that says whether it
can be recovered or not, so let's attempt the recovery and see if it
works instead of assuming that it won't and possibly causing data loss
that could be avoided.
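
The policy I have in mind can be sketched as a toy model in Python. This
is not how md or btrfs is implemented; the Mirror class, the
handle_timeout function, and the "repaired"/"kicked" return values are
all hypothetical and exist only to show the order of the decisions.

```python
class Mirror:
    """Toy in-memory stand-in for one leg of a RAID-1 pair (hypothetical)."""

    def __init__(self, sectors, unwritable=()):
        self.sectors = dict(sectors)        # sector number -> data
        self.unwritable = set(unwritable)   # sectors whose rewrite fails

    def read(self, sector):
        return self.sectors[sector]

    def write(self, sector, data):
        if sector in self.unwritable:
            raise OSError(f"write failed at sector {sector}")
        self.sectors[sector] = data


def handle_timeout(timed_out: Mirror, good: Mirror, start: int, count: int) -> str:
    """Try to repair the timed-out range from the good leg first.

    Only if a rewrite fails is the drive kicked, which is the outcome
    the kick-immediately policy would have chosen anyway.
    """
    try:
        for sector in range(start, start + count):
            timed_out.write(sector, good.read(sector))
    except OSError:
        return "kicked"
    return "repaired"
```

If the rewrite fails, you end up with a degraded array, which is exactly
where kicking the drive immediately would have left you. If it succeeds,
the array stays redundant.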
> Any decent server SATA drive should support SCT ERC. The
> inexpensive WDC Red drives for NAS's all have it and by default are
> a reasonable 70 deciseconds last time I checked.
And yet it isn't supported on the cheaper but otherwise identical
Greens, or the higher-performing Blues. We should not be helping
vendors charge a premium for zero-cost firmware features that are
"required" for RAID use when they really aren't (even if they are
nice to have).
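
On the units in the figures quoted above: SCT ERC values are given in
deciseconds, while the kernel command timer is in seconds. A small
Python sketch of the comparison follows; the function names are mine and
purely illustrative.

```python
def erc_seconds(erc_deciseconds: int) -> float:
    """Convert an SCT ERC value from deciseconds to seconds."""
    return erc_deciseconds / 10


def drive_gives_up_first(erc_deciseconds: int, command_timeout_seconds: int) -> bool:
    """True if the drive should report an error before the kernel timer expires."""
    return erc_seconds(erc_deciseconds) < command_timeout_seconds
```

With the quoted 70 deciseconds (7 seconds) and the 30 second timer, the
drive reports the error first. A drive without the feature gives no such
bound, and that is the case that still needs sensible handling.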
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAEBCgAGBQJUn3U5AAoJENRVrw2cjl5RFIQIAJAr86Y5s8RWuL8/We/AlM5Q
JUuZGGaE1IGmMROdUAEzmj78L8lI2U3D95sERDKmd3aJosfpi1SVOExQZebSIqch
hhkLGC0FecxE5VC/67E2wwmfbropSk0mlA5Fbgx8mYf60iUHWcFUkc01kER3JGnd
xMI2jV0UpqVD/gY/a5O7Z7bPeHICQcIyXCN7MAbTMBrDWsYhDACQpij+aNXu5+ke
rCNV5c/VkYFQZ9aaMb6Mxmi9KOkCVv2+kBOsxwqPxlO5s9vKORDhxMp8XeJQEvhU
X2GAgS8r8gSGVdPutekXR1vB+TwhdMxftBWL9jcI1y05Y0z3GcOX+/90S9mrSaU=
=2tIU
-----END PGP SIGNATURE-----