From: Thomas Fjellstrom <thomas@fjellstrom.ca>
To: Phil Turmel <philip@turmel.org>
Cc: "linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Re: Recent drive errors
Date: Tue, 19 May 2015 23:38:16 -0600
Message-ID: <358052151.1yasTJDDbz@balsa>
In-Reply-To: <34123545.0U2U7Wo4pc@balsa>
On Tue 19 May 2015 10:07:49 AM Thomas Fjellstrom wrote:
> On Tue 19 May 2015 10:51:59 AM you wrote:
> > On 05/19/2015 10:32 AM, Thomas Fjellstrom wrote:
> > > On Tue 19 May 2015 09:23:20 AM Phil Turmel wrote:
> > >> Depends. In a properly functioning array that gets scrubbed
> > >> occasionally, or sufficiently heavy use to read the entire contents
> > >> occasionally, the UREs get rewritten by MD right away. Any UREs then
> > >> only show up once.
> > >
> > > I have made sure that it's doing regular scrubs, and regular SMART
> > > scans. This time...
> >
> > Yes, and this drive was kicked out. Because it wouldn't be listening
> > when MD tried to write over the error it found.
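
For reference, the scrub discussed above is requested on MD by writing "check" to the array's sync_action file in sysfs, and mismatch_cnt can be read afterwards. Below is a minimal Python sketch of that, assuming an array called md0 (a hypothetical name, the thread doesn't name the array) and root privileges. The sysfs_root parameter is my own addition so the functions can be pointed at a scratch directory instead of the real /sys.

```python
from pathlib import Path


def md_dir(array: str, sysfs_root: str = "/sys") -> Path:
    """Directory holding the md attributes for one array, e.g. /sys/block/md0/md."""
    return Path(sysfs_root) / "block" / array / "md"


def start_scrub(array: str, sysfs_root: str = "/sys") -> bool:
    """Request a 'check' pass if the array is idle; return True if it was requested."""
    action = md_dir(array, sysfs_root) / "sync_action"
    if action.read_text().strip() != "idle":
        return False  # a resync, recovery, or another check is already running
    action.write_text("check\n")
    return True


def mismatch_count(array: str, sysfs_root: str = "/sys") -> int:
    """Mismatch counter reported by the most recent check."""
    return int((md_dir(array, sysfs_root) / "mismatch_cnt").read_text())
```

Calling start_scrub("md0") from a cron job or timer would be one way to get the "scrubbed occasionally" behaviour; many distributions already ship a job that does the equivalent.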
>
[snip]
>
> > I posted this link earlier, but it is particularly relevant:
> > http://marc.info/?l=linux-raid&m=133665797115876&w=2
> >
> > >> Interesting. I suspect that if you wipe that disk with noise, read it
> > >> all back, and wipe it again, you'll have a handful of relocations.
> > >
> > > It looks like each one of the blocks in that display is 128KiB, which I
> > > think means those red blocks aren't very far apart. Maybe 80MiB apart?
> > > Would it reallocate all of those? That'd be a lot of reallocated
> > > sectors.
> >
> > Drives will only reallocate where a previous read failed (making the
> > sector pending), and then a write and its follow-up verification also
> > fail. In general, writes are unverified at the time of write (or your
> > write performance would be dramatically slower than read).
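
The rule described above can be sketched as a toy state model. This is only an illustration of the described behaviour, not drive firmware: the Sector class, its fields, and the State names are all hypothetical. It also shows why a wipe, read-back, wipe sequence is needed before a bad spot gets relocated.

```python
from enum import Enum


class State(Enum):
    OK = "ok"
    PENDING = "pending"
    REALLOCATED = "reallocated"


class Sector:
    """Toy model of one sector (hypothetical, for illustration only)."""

    def __init__(self, readable: bool = True, medium_good: bool = True):
        self.readable = readable        # does the stored data currently read back?
        self.medium_good = medium_good  # can this physical spot still hold data?
        self.state = State.OK

    def read(self) -> bool:
        """A failed read marks the sector pending; nothing is reallocated yet."""
        if self.readable:
            return True
        self.state = State.PENDING
        return False

    def write(self) -> None:
        if self.state is State.PENDING:
            # Only writes to pending sectors are verified by the drive.
            if self.medium_good:
                self.state = State.OK
            else:
                self.state = State.REALLOCATED
                self.medium_good = True  # now backed by a spare sector
            self.readable = True
        else:
            # Unverified write: the data only sticks if the medium is good.
            self.readable = self.medium_good
```

In this model a sector on bad medium stays "ok" after the first wipe, turns pending on the read-back, and is only reallocated by the second wipe.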
>
> Right. I was just thinking about how you mentioned that I'd get a handful of
> reallocations based on the latency shown in the image I posted. It's a lot
> of sectors that seem to be affected by the latency spikes, so I assumed
> (probably wrongly) that many of them may be reallocated afterwards.
>
> If this drive ends up not reallocating a single sector, or only a few, I may
> just keep it around as a hot spare, though I feel that's not the best idea:
> if it is degrading, then it has a higher chance of failing when the array
> actually goes to use it.
Well, here's something:
[78447.747221] sd 0:0:15:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[78447.749092] sd 0:0:15:0: [sdf] Sense Key : Medium Error [current]
[78447.751034] sd 0:0:15:0: [sdf] Add. Sense: Unrecovered read error
[78447.752925] sd 0:0:15:0: [sdf] CDB: Read(16) 88 00 00 00 00 00 ef 7a 0f b0 00 00 00 08 00 00
[78447.754746] blk_update_request: critical medium error, dev sdf, sector 4017754032
[78447.756700] Buffer I/O error on dev sdf, logical block 502219254, async page read
<many many more of the above>
  5 Reallocated_Sector_Ct   PO--CK   087   087   036    -    17232
187 Reported_Uncorrect      -O--CK   001   001   000    -    8236
197 Current_Pending_Sector  -O--C-   024   024   000    -    12584
198 Offline_Uncorrectable   ----C-   024   024   000    -    12584
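
If it helps to check these counters from a script, here is a rough Python sketch that parses attribute lines in the layout shown above. I'm assuming the eight columns are ID, name, flags, value, worst, threshold, fail, and raw value, and that the raw value is a plain integer, which holds for these four attributes but not for every SMART attribute. SmartAttr, parse_attributes, and nonzero_error_counters are names I made up.

```python
from typing import List, NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    flags: str
    value: int
    worst: int
    thresh: int
    fail: str
    raw: int


def parse_attributes(text: str) -> List[SmartAttr]:
    """Parse whitespace-separated attribute lines; skip anything that doesn't fit."""
    attrs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 8 or not parts[0].isdigit():
            continue
        attrs.append(SmartAttr(int(parts[0]), parts[1], parts[2], int(parts[3]),
                               int(parts[4]), int(parts[5]), parts[6], int(parts[7])))
    return attrs


# IDs of the four counters pasted above.
ERROR_COUNTER_IDS = {5, 187, 197, 198}


def nonzero_error_counters(attrs: List[SmartAttr]) -> List[str]:
    """Names of the sector-error counters whose raw value is above zero."""
    return [a.name for a in attrs if a.attr_id in ERROR_COUNTER_IDS and a.raw > 0]


SAMPLE = """\
  5 Reallocated_Sector_Ct   PO--CK   087   087   036    -    17232
187 Reported_Uncorrect      -O--CK   001   001   000    -    8236
197 Current_Pending_Sector  -O--C-   024   024   000    -    12584
198 Offline_Uncorrectable   ----C-   024   024   000    -    12584
"""
```

Running nonzero_error_counters(parse_attributes(SAMPLE)) lists all four attributes for this drive.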
Badblocks is showing a bunch of errors now, and the above is what's in dmesg and smartctl.
So I guess it was dead after all.
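
As a cross-check on the dmesg excerpt, the Read(16) CDB can be decoded by hand: bytes 2-9 are the starting LBA and bytes 10-13 the transfer length, both big-endian. A small Python sketch follows; decode_read16 is my own helper name, and the last step assumes the kernel's "sector" is a 512-byte unit and the buffer "logical block" is 4 KiB, which is what the numbers in the log are consistent with.

```python
from typing import Tuple


def decode_read16(cdb_hex: str) -> Tuple[int, int]:
    """Return (starting LBA, number of blocks) from a READ(16) CDB given as hex."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")     # 8-byte logical block address
    count = int.from_bytes(cdb[10:14], "big")  # 4-byte transfer length
    return lba, count


lba, count = decode_read16("88 00 00 00 00 00 ef 7a 0f b0 00 00 00 08 00 00")
print(lba, count)  # 4017754032 8
print(lba // 8)    # 502219254, assuming 512-byte sectors and 4 KiB blocks
```

So the failing command was an 8-sector read starting at the sector the kernel reports, and the "logical block" in the Buffer I/O line is the same location in 4 KiB units.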
> > >> You have it backwards. If you have WD Reds, they are correct out of
> > >> the box. It's when you *don't* have ERC support, or you only have
> > >> desktop ERC, that you need to take special action.
> > >
> > > I was under the impression you still had to enable ERC on boot. And I
> > > /thought/ I read that you still want to adjust the timeouts, though not
> > > the same as for consumer drives.
> >
> > Desktop / consumer drives that support ERC typically ship with it
> > disabled, so they behave just like drives that don't support it at all.
> >
> > So a boot script would enable ERC on drives where it can (and not
> > already OK), and set long driver timeouts on the rest.
> >
> > Any drive that claims "raid" compatibility will have ERC enabled by
> > default. Typically 7.0 seconds. WD Reds do. Enterprise drives do, and
> > have better URE specs, too.
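
The boot-script logic described above could look roughly like the Python sketch below. Assumptions on my part, none of them from this message: it shells out to smartctl with "-l scterc,70,70" (7.0 seconds for read and write), treats a zero exit status as "ERC is now set" (worth verifying on your own drives), and falls back to a 180 second driver timeout written to /sys/block/<dev>/device/timeout. The function names and the 180 second figure are illustrative only.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, Iterable


def smartctl_enable_erc(dev: str, deciseconds: int = 70) -> bool:
    """Ask the drive for ERC (70 = 7.0 s); True if smartctl exits with status 0."""
    result = subprocess.run(
        ["smartctl", "-l", f"scterc,{deciseconds},{deciseconds}", f"/dev/{dev}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def set_driver_timeout(dev: str, seconds: int, sysfs_root: str = "/sys") -> None:
    """Raise the kernel's command timeout for one disk (needs root)."""
    path = Path(sysfs_root) / "block" / dev / "device" / "timeout"
    path.write_text(f"{seconds}\n")


def fix_timeouts(
    devs: Iterable[str],
    enable_erc: Callable[[str], bool] = smartctl_enable_erc,
    set_timeout: Callable[[str, int], None] = set_driver_timeout,
    long_timeout: int = 180,  # assumed value, not taken from this thread
) -> Dict[str, str]:
    """Enable ERC where the drive accepts it, else lengthen the driver timeout."""
    outcome = {}
    for dev in devs:
        if enable_erc(dev):
            outcome[dev] = "erc"
        else:
            set_timeout(dev, long_timeout)
            outcome[dev] = f"timeout={long_timeout}"
    return outcome
```

Something like fix_timeouts(["sda", "sdb"]) would then be run as root at boot, from rc.local or a udev rule. The two callables are parameters only so the decision logic can be exercised without touching real disks.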
>
> Good to know.
>
> > >> If you have consumer grade drives in a raid array, and you don't have
> > >> boot scripts or udev rules to deal with timeout mismatch, your *ss is
> > >> hanging in the wind. The links in my last msg should help you out.
> > >
> > > There was some talk of ERC/TLER and md. I'll still have to find or write
> > > a script to properly set up timeouts and enable TLER on drives capable of
> > > it (that don't come with it enabled by default).
> >
> > Before I got everything onto proper drives, I just put what I needed
> > into rc.local.
>
[snip]
>
> > Chris Murphy posted some udev rules that will likely work for you. I
> > haven't tried them myself, though.
> >
> > https://www.marc.info/?l=linux-raid&m=142487508806844&w=3
>
> Thanks :)
>
> > Phil
--
Thomas Fjellstrom
thomas@fjellstrom.ca