Re: md failing mechanism

From: Phil Turmel <philip@turmel.org>
To: Dark Penguin <darkpenguin@yandex.ru>, linux-raid@vger.kernel.org
Subject: Re: md failing mechanism
Date: Fri, 22 Jan 2016 17:18:25 -0500
Message-ID: <56A2AAB1.6070305@turmel.org>
In-Reply-To: <56A2A2C3.9000801@yandex.ru>

On 01/22/2016 04:44 PM, Dark Penguin wrote:
> Oh! Thank you! I really wanted to see a reliable "what's supposed to
> happen" sequence!

You're welcome.

> As for my case, those were indeed, um, "cheap desktop drives" - to be
> precise, some 80-Gb IDE drives in a Pentium-4 machine; "it works well
> for a small file server", I thought, oblivious to the finer details
> about the process of failure handling... But, I also have "big" file
> servers, so that timeout mismatch issue is something worth paying
> attention to!
> 
> And also, now I understand why I probably "should have been scrubbing".
> =/ Do I understand correctly that "scrubbing" means those "monthly
> redundancy checks" that mdadm suggests? And I suppose what it does is
> just the same - read every sector and attempt to write it back upon
> failure, otherwise kicking the device?

A "check" scrub reads every sector of every member device's data area.  If
any read fails, the normal reconstruct and rewrite will fix it.  It also looks
for successful reads where the data is inconsistent between mirrors or
between data blocks and parity blocks.  Those are counted for you to review.

A "repair" scrub forcibly ensures consistent redundancy by copying
the first mirror to the others, and recomputing parity from data.  It will
also reconstruct if needed.

The "check" mode is your recommended regular scrub.  I do mine weekly,
but monthly is probably fine.  "Repair" is needed if "check" reports any
mismatches.
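
For illustration, a minimal Python sketch of driving a scrub through md's
sysfs interface.  It assumes an array named md0 and that the kernel exposes
sync_action and mismatch_cnt under /sys/block/md0/md.  The function names and
the sysfs_root parameter are hypothetical helpers for this example, not part
of mdadm.

```python
from pathlib import Path


def md_dir(array="md0", sysfs_root="/sys"):
    """Return the md control directory for an array, e.g. /sys/block/md0/md."""
    return Path(sysfs_root) / "block" / array / "md"


def start_scrub(array="md0", mode="check", sysfs_root="/sys"):
    """Request a 'check' or 'repair' scrub by writing to sync_action (needs root)."""
    if mode not in ("check", "repair"):
        raise ValueError("mode must be 'check' or 'repair'")
    (md_dir(array, sysfs_root) / "sync_action").write_text(mode + "\n")


def scrub_state(array="md0", sysfs_root="/sys"):
    """Return the current sync_action value, e.g. 'idle', 'check' or 'repair'."""
    return (md_dir(array, sysfs_root) / "sync_action").read_text().strip()


def mismatch_count(array="md0", sysfs_root="/sys"):
    """Return the mismatch counter that md reports after a scrub."""
    return int((md_dir(array, sysfs_root) / "mismatch_cnt").read_text().strip())
```

A typical run would call start_scrub("md0", "check"), wait until
scrub_state("md0") is back to "idle", and then look at mismatch_count("md0")
to decide whether a "repair" pass is warranted.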

> ..... is all that correct?

From one of your reading assignments: (
http://marc.info/?l=linux-raid&m=135811522817345&w=1 )

> Options are:
> 
> A) Buy Enterprise drives. They have appropriate error timeouts and work
> properly with MD right out of the box.
> 
> B) Buy Desktop drives with SCTERC support. They have inappropriate
> default timeouts, but can be set to an appropriate value. Udev or boot
> script assistance is needed to call smartctl to set it. They do *not*
> work properly with MD out of the box.
> 
> C) Suffer with desktop drives without SCTERC support. They cannot be
> set to appropriate error timeouts. Udev or boot script assistance is
> needed to set a 120 second driver timeout in sysfs. They do *not* work
> properly with MD out of the box.
> 
> D) Lose your data during spare rebuild after your first URE. (Odds in
> proportion to array size.)
> 
> One last point bears repeating: MD is *not* a backup system, although
> some people leverage its features for rotating off-site backup disks.
> Raid arrays are all about *uptime*. They will not save you from
> accidental deletion or other operator errors. They will not save you if
> your office burns down. You need a separate backup system for critical
> files.

Since that was written, 'A' would now include almost-enterprise drives
with RAID ratings like the Western Digital Red family.  And the
recommended timeout for 'C' has drifted upward to 180.
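
As a rough illustration of options 'B' and 'C', a boot-time Python sketch:
try to enable SCTERC with smartctl, and fall back to raising the driver
timeout in sysfs if the drive does not support it.  The 7.0 second ERC value
(smartctl takes it in tenths of a second) is an assumption, not something
stated in this thread, and so is the way failure is detected from smartctl's
exit status and output.  The function names and the run/sysfs_root parameters
are hypothetical.

```python
import subprocess
from pathlib import Path

ERC_DECISECONDS = 70      # assumed value: 7.0 s read/write error recovery limit
FALLBACK_TIMEOUT = 180    # seconds, the driver timeout suggested for option 'C'


def set_scterc(dev, run=subprocess.run):
    """Try to enable SCTERC on /dev/<dev>; return True if it appears to succeed."""
    cmd = ["smartctl", "-l",
           f"scterc,{ERC_DECISECONDS},{ERC_DECISECONDS}", f"/dev/{dev}"]
    result = run(cmd, capture_output=True, text=True)
    # Assumption: a non-zero exit status or a "not support(ed)" message in the
    # output means the drive lacks SCTERC.  Check against your smartctl version.
    return result.returncode == 0 and "not support" not in result.stdout.lower()


def set_driver_timeout(dev, seconds=FALLBACK_TIMEOUT, sysfs_root="/sys"):
    """Write the kernel command timeout for <dev> to sysfs (needs root)."""
    path = Path(sysfs_root) / "block" / dev / "device" / "timeout"
    path.write_text(f"{seconds}\n")


def configure_drive(dev, run=subprocess.run, sysfs_root="/sys"):
    """Prefer SCTERC (option 'B'); otherwise use the long driver timeout ('C')."""
    if set_scterc(dev, run=run):
        return "scterc"
    set_driver_timeout(dev, sysfs_root=sysfs_root)
    return "driver-timeout"
```

Calling configure_drive("sda") for each array member at boot, from a udev
rule or an init script, would cover both cases.  Neither setting persists
across a power cycle, which is why it has to be reapplied every boot.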

[trim /]

> Still, I don't think it has anything to do with what has happened to my
> "small file server"...

That's why I asked for the dmesg.  It could have been a bug.  No crisis
if it's lost, so long as you've accepted one of A through D above.

Phil

ps.  convention on kernel.org is reply-to-all and no top-posting.
