From: Thomas Fjellstrom <thomas@fjellstrom.ca>
To: Phil Turmel <philip@turmel.org>
Cc: "linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Re: Recent drive errors
Date: Tue, 19 May 2015 08:32:21 -0600
Message-ID: <2278721.7NtVspC26F@balsa>
In-Reply-To: <555B3948.1030602@turmel.org>
On Tue 19 May 2015 09:23:20 AM Phil Turmel wrote:
> On 05/19/2015 08:50 AM, Thomas Fjellstrom wrote:
> > On Tue 19 May 2015 08:34:55 AM Phil Turmel wrote:
> >> Based on the smart report, this drive is perfectly healthy. A small
> >> number of uncorrectable read errors is normal in the life of any drive.
> >
> > Is it perfectly normal for the same sector to be reported uncorrectable 5
> > times in a row like it did?
>
> Yes, if you keep trying to read it. Unreadable sectors stay unreadable,
> generally, until they are re-written. That's the first opportunity the
> drive has to decide if a relocation is necessary.
>
> > How many UREs are considered "ok"? Tens, hundreds, thousands, tens of
> > thousands?
>
> Depends. In a properly functioning array that gets scrubbed
> occasionally, or sufficiently heavy use to read the entire contents
> occasionally, the UREs get rewritten by MD right away. Any UREs then
> only show up once.
I have made sure that it's doing regular scrubs, and regular SMART scans. This
time...
> In a desktop environment, or non-raid, or improperly configured raid,
> the UREs will build up, and get reported on every read attempt.
>
> Most consumer-grade drives claim a URE average below 1 per 1E14 bits
> read. So by the end of their warranty period, getting one every 12TB
> read wouldn't be unusual. This sort of thing follows a Poisson
> distribution:
>
> http://marc.info/?l=linux-raid&m=135863964624202&w=2
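
As a rough check of that figure: 1E14 bits is 12.5TB, and under a Poisson
model the chance of hitting at least one URE while reading N bytes is
1 - exp(-N*8/1E14). A minimal sketch, assuming the drive sits exactly at
the spec-sheet rate and that errors are independent (the function name is
mine, not from any tool):

```python
import math

BITS_PER_URE = 1e14  # spec-sheet figure quoted above: 1 URE per 1E14 bits read


def p_at_least_one_ure(bytes_read, bits_per_ure=BITS_PER_URE):
    """Poisson probability of one or more UREs after reading bytes_read bytes."""
    expected_ures = bytes_read * 8 / bits_per_ure  # mean of the Poisson distribution
    return 1.0 - math.exp(-expected_ures)


# Decimal terabytes: one 3TB drive, the 11TB nas, and the 12.5TB spec interval.
for tb in (3, 11, 12.5):
    p = p_at_least_one_ure(tb * 1e12)
    print(f"{tb:>5} TB read: P(at least one URE) = {p:.2f}")
```

So even a drive that exactly meets its spec has a noticeable chance of
returning a URE during a single full read of a 3TB disk.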
>
> > These drives have been barely used. Most of their life, they were either
> > off, or not actually being used. (it took a while to collect enough 3TB
> > drives, and then find time to build the array, and set it up as a regular
> > backup of my 11TB nas).
>
> While being off may lengthen their life somewhat, the magnetic domains
> on these things are so small that some degradation will happen just
> sitting there. Diffusion in the p- and n-doped regions of the
> semiconductors is also happening while sitting unused, degrading the
> electronics.
>
> >> It has no relocations, and no pending sectors. The latency spikes are
> >> likely due to slow degradation of some sectors that the drive is having
> >> to internally retry to read successfully. Again, normal.
> >
> > The latency spikes are /very/ regular and there's quite a lot of them.
> > See: http://i.imgur.com/QjTl6o3.png
>
> Interesting. I suspect that if you wipe that disk with noise, read it
> all back, and wipe it again, you'll have a handful of relocations.
It looks like each block in that display is 128KiB, which I think means those
red blocks aren't very far apart, maybe 80MiB. Would it reallocate all of
those? That'd be a lot of reallocated sectors.
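
To put a number on "a lot": a back-of-the-envelope sketch, assuming the
slow spots really are spaced about 80MiB apart across the whole of a 3TB
(decimal) drive, which is only my reading of the graph and not something I
have measured:

```python
KIB = 1024
MIB = 1024 * KIB

BLOCK_SIZE = 128 * KIB     # size of one block in the latency display
SPACING = 80 * MIB         # guessed distance between red blocks (assumption)
CAPACITY = 3 * 10**12      # 3TB drive, decimal bytes


def slow_spot_estimate(capacity_bytes, spacing_bytes):
    """Number of evenly spaced slow spots that fit on the drive."""
    return capacity_bytes // spacing_bytes


blocks_between = SPACING // BLOCK_SIZE
spots = slow_spot_estimate(CAPACITY, SPACING)
print(f"display blocks between slow spots: {blocks_between}")
print(f"estimated slow spots on the drive: {spots}")
```

That comes out to several hundred display blocks between spikes and a few
tens of thousands of slow spots in total, if the pattern holds for the
whole surface.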
> Your latency test will show different numbers then, as the head will
> have to seek to the spare sector and back whenever you read through one
> of those spots.
>
> Or the rewrites will fix them all, and you'll have no further problems.
> Hard to tell. Bottom line is that drives can't fix any problems they
> have unless they are *written* in previously identified problem areas.
>
> >> I own some "DM001" drives -- they are unsuited to raid duty as they
> >> don't support ERC. So, out of the box, they are time bombs for any
> >> array you put them in. That's almost certainly why they were ejected
> >> from your array.
> >>
> >> If you absolutely must use them, you *must* set the *driver* timeout to
> >> 120 seconds or more.
> >
> > I've been planning on looking into the ERC stuff. I now actually have some
> > drives that do support ERC, so it'll be interesting to make sure
> > everything is set up properly.
>
> You have it backwards. If you have WD Reds, they are correct out of the
> box. It's when you *don't* have ERC support, or you only have desktop
> ERC, that you need to take special action.
I was under the impression you still had to enable ERC on boot. And I
/thought/ I read that you still want to adjust the timeouts, though not the
same as for consumer drives.
> If you have consumer grade drives in a raid array, and you don't have
> boot scripts or udev rules to deal with timeout mismatch, your *ss is
> hanging in the wind. The links in my last msg should help you out.
There was some talk of ERC/TLER and md. I'll still have to find or write a
script to properly set up timeouts and enable TLER on drives capable of it
(that don't come with it enabled by default).
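
Something along these lines is what I have in mind. It is an untested
sketch, not a finished boot script: it assumes smartmontools is installed,
that it runs as root, that every /sys/block/sd* device is a raid member,
and that "smartctl -l scterc" prints a "Read:" line with a number when ERC
is active (the format I remember, which may differ between versions). The
7 second ERC value and the 180 second fallback are my own picks; the
advice above only says 120 seconds or more.

```python
import glob
import os
import subprocess

ERC_DECISECONDS = 70     # smartctl takes tenths of a second, so 70 = 7.0 s
FALLBACK_TIMEOUT = 180   # seconds, for drives without usable ERC (assumption)


def erc_is_active(report):
    """True if a `smartctl -l scterc` report shows a numeric read timeout."""
    for line in report.splitlines():
        key, _, value = line.strip().partition(":")
        fields = value.split()
        if key == "Read" and fields:
            return fields[0].isdigit()
    return False


def try_enable_erc(dev):
    """Request ERC on dev, then read the setting back to see if it took."""
    setting = f"scterc,{ERC_DECISECONDS},{ERC_DECISECONDS}"
    subprocess.run(["smartctl", "-l", setting, dev],
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    result = subprocess.run(["smartctl", "-l", "scterc", dev],
                            capture_output=True, text=True)
    return erc_is_active(result.stdout)


def main():
    """Enable ERC where possible, otherwise raise the kernel driver timeout."""
    for sysdir in sorted(glob.glob("/sys/block/sd*")):
        dev = "/dev/" + os.path.basename(sysdir)
        if try_enable_erc(dev):
            print(f"{dev}: ERC enabled")
        else:
            with open(os.path.join(sysdir, "device", "timeout"), "w") as f:
                f.write(str(FALLBACK_TIMEOUT))
            print(f"{dev}: no ERC, driver timeout set to {FALLBACK_TIMEOUT}s")
```

Calling main() from a boot script or a udev-triggered unit would cover
both cases. I left the call out on purpose so that nothing touches the
drives just by loading the file.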
> Also, I noticed that you used "smartctl -a" to post a complete report of
> your drive's status. It's not complete. You should get in the habit of
> using "smartctl -x" instead, so you see the ERC status, too.
Good to know. Thanks.
> Phil
--
Thomas Fjellstrom
thomas@fjellstrom.ca