From: Phil Turmel <philip@turmel.org>
To: Brad Campbell <lists2009@fnarfbargle.com>,
	Marc MERLIN <marc@merlins.org>
Cc: Sarah Newman <srn@prgmr.com>,
	"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Re: Raid check didn't fix Current_Pending_Sector, but badblocks -nsv did
Date: Wed, 8 Jun 2016 08:24:00 -0400
Message-ID: <57580E60.3040104@turmel.org>
In-Reply-To: <2c7e843f-58a9-a9f0-76bf-ff193935d13b@fnarfbargle.com>

On 06/07/2016 09:39 PM, Brad Campbell wrote:
> On 07/06/16 21:04, Phil Turmel wrote:
>> On 06/07/2016 12:51 AM, Marc MERLIN wrote:
> 
>>> Right, I understand now, good to know.
>>> So I'll use badblocks next time I have this issue.
>>
>> Or just ignore them.  You aren't using them, so they can't hurt you.
> 
> That's actually not necessarily true.
> 
> If you have a dud sector early on the disk (so before the start of the
> RAID data) you will terminate every SMART long test in the first couple
> of meg of the disk. So while a dud down there won't necessarily impact
> your usage from a RAID perspective, it'll knacker your ability to
> regularly check the disks in their entirety. SMART tests abort on the
> first bad read.

Don't bother doing long self-tests on drives participating in an array
-- check scrubs do everything a long self-test does on the area of
interest, plus actually fixing UREs that are found.  And check scrubs
don't abort on a read failure.
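
For readers who have not started a check scrub by hand, here is a minimal sketch, assuming the standard md sysfs layout (`/sys/block/<array>/md/sync_action` and `mismatch_cnt`). The function names and the `md0` array name are illustrative, not anything from this thread, and the write needs root on a real system.

```python
from pathlib import Path


def start_check_scrub(md_name: str, sysfs_root: str = "/sys/block") -> Path:
    """Request a 'check' scrub by writing to the array's sync_action file.

    md_name is a hypothetical array name such as "md0".
    """
    action = Path(sysfs_root) / md_name / "md" / "sync_action"
    action.write_text("check\n")
    return action


def read_mismatch_count(md_name: str, sysfs_root: str = "/sys/block") -> int:
    """Return the mismatch counter the kernel reports for the array."""
    count = Path(sysfs_root) / md_name / "md" / "mismatch_cnt"
    return int(count.read_text().strip())
```

Something like `start_check_scrub("md0")` from a periodic job, followed by a look at `read_mismatch_count("md0")` once the scrub finishes, would cover the routine case. Many distributions already ship a scheduled job that does the equivalent.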

My advice stands: ignore the UREs in unused areas of the disk.

> It's ugly, but in the single instance I had that happen, I removed the
> drive from the array, wrote zero to the entire disk and then added it
> back. That forced a reallocation in the affected area.

Completely pointless exercise that opened a window of higher risk of
failure for your array, unless you used --replace with another spare to
maintain redundancy on your array while that disk was out.
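
A rough sketch of the --replace route, assuming the `--add`, `--replace` and `--with` options described in mdadm(8). The helper name and the device paths are placeholders only, and the function just builds the command lines so they can be reviewed before anything is run.

```python
from typing import List


def replace_commands(array: str, old_dev: str, spare_dev: str) -> List[List[str]]:
    """Build mdadm command lines that swap old_dev for spare_dev.

    The array keeps its redundancy because data is copied to the spare
    while old_dev is still an active member. All arguments are
    hypothetical device paths chosen by the caller.
    """
    return [
        # Make the spare available to the array first.
        ["mdadm", array, "--add", spare_dev],
        # Ask md to rebuild onto the spare, then retire old_dev.
        ["mdadm", array, "--replace", old_dev, "--with", spare_dev],
    ]
```

For example, `replace_commands("/dev/md0", "/dev/sdb1", "/dev/sdc1")` returns two argument lists that could be passed to `subprocess.run` one after the other.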

> Usually if it is in the RAID zone, a check scrub will clear it up.
> Having said that I've had a very peculiar one here in the last couple of
> days.
> 
> A WD 2TB Green drive with TLER set to 7 seconds. The first read would
> error out in 7 seconds (as it should), but a second read succeeded.
> After returning the error, the drive must have kept trying to recover in
> the background and eventually succeeded and cached the result. So
> subsequent reads were ok. After reading and writing enough to other
> parts of the drive to flush the drive's cache, the process would repeat.

Pure speculation.  Unless you can show better evidence that those drives
will cache a read in that case, I would say it was just a mild enough
weak spot that it randomly succeeded more often than not.  And if you
follow my advice, it doesn't matter:  if the array is the only process
reading from the disk, the first appearance of the URE would be the
last, as the array would re-write it immediately, whether during a scrub
or due to normal access.

Regular long self-tests are highly recommended for stand-alone disks and
for array hot spares.
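
For the stand-alone and hot-spare case, a small sketch assuming smartmontools is installed: `smartctl -t long` starts the extended self-test and `smartctl -l selftest` shows the self-test log. The helper name and device path are illustrative only.

```python
from typing import List


def long_selftest_commands(device: str) -> List[List[str]]:
    """Build smartctl command lines for a long self-test on one disk.

    device is a hypothetical path such as "/dev/sdd".
    """
    return [
        # Start the extended (long) self-test; it runs inside the drive.
        ["smartctl", "-t", "long", device],
        # Later, read the self-test log to see whether it completed.
        ["smartctl", "-l", "selftest", device],
    ]
```

The second command is only meaningful after the drive has had time to finish the test, so a scheduler would normally run the two steps some hours apart.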

Phil
