Re: proactive disk replacement

From: David Brown <david.brown@hesbynett.no>
To: Reindl Harald <h.reindl@thelounge.net>,
	Adam Goryachev <mailinglists@websitemanagers.com.au>,
	Jeff Allison <jeff.allison@allygray.2y.net>
Cc: linux-raid@vger.kernel.org
Subject: Re: proactive disk replacement
Date: Tue, 21 Mar 2017 15:15:52 +0100
Message-ID: <58D13598.50403@hesbynett.no>
In-Reply-To: <09f4c794-8b17-05f5-10b7-6a3fa515bfa9@thelounge.net>

On 21/03/17 14:24, Reindl Harald wrote:
> 
> 
> Am 21.03.2017 um 14:13 schrieb David Brown:
>> On 21/03/17 12:03, Reindl Harald wrote:
>>>
>>> Am 21.03.2017 um 11:54 schrieb Adam Goryachev:
>> <snip>
>>>
>>>> In addition, you claim that a drive larger than 2TB is almost certainly
>>>> going to suffer from a URE during recovery, yet this is exactly the
>>>> situation you will be in when trying to recover a RAID10 with member
>>>> devices 2TB or larger. A single URE on the surviving portion of the
>>>> RAID1 will cause you to lose the entire RAID10 array. On the other
>>>> hand,
>>>> 3 URE's on the three remaining members of the RAID6 will not cause more
>>>> than a hiccup (as long as no more than one URE on the same stripe,
>>>> which
>>>> I would argue is ... exceptionally unlikely).
>>>
>>> given that when your disks have the same age errors on another disk
>>> become more likely when one failed and the heavy disk IO due recovery of
>>> a RAID6 with takes *many hours* where you have heavy IO on *all disks*
>>> compared with a way faster restore of RAID1/10 guess in which case a URE
>>> is more likely
>>>
>>> additionally why should the whole array fail just because a single block
>>> get lost? the is no parity which needs to be calculated, you just lost a
>>> single block somewhere - RAID1/10 are way easier in their implementation
>>
>> If you have RAID1, and you have an URE, then the data can be recovered
>> from the other half of that RAID1 pair.  If you have had a disk failure
>> (manual for replacement, or a real failure), and you get an URE on the
>> other half of that pair, then you lose data.
>>
>> With RAID6, you need an additional failure (either another full disk
>> failure or an URE in the /same/ stripe) to lose data.  RAID6 has higher
>> redundancy than two-way RAID1 - of this there is /no/ doubt
> 
> yes, but with RAID5/RAID6 *all disks* are involved in the rebuild, with
> a 10 disk RAID10 only one disk needs to be read and the data written to
> the new one - all other disks are not involved in the resync at all

True...

> 
> for most arrays the disks have a similar age and usage pattern, so when
> the first one fails it becomes likely that it don't take too long for
> another one and so load and recovery time matters

False.  There is no reason to suspect that - certainly not to within the
hours or days it takes to rebuild your array.  Disk failure patterns show
a peak within the first month or so (failures due to manufacturing or
handling), then a very low failure rate for a few years, then a gradually
increasing rate after that.  There is no very significant correlation
between drive failures within the same system, nor between usage and
failures.  It might seem reasonable to suspect that a drive is more
likely to fail during a rebuild since the disk is being heavily used,
but that does not appear to be the case in practice.  You will /spot/
more errors at that point - simply because you don't see errors in parts
of the disk that are not read - but the rebuilding does not cause them.

And even if it /were/ true, the key point is whether an error causes
data loss.  A read error during a RAID1 rebuild means lost data.  A read
error during a RAID6 rebuild (after a single disk failure) means you
have to read an extra sector from another disk and correct the mistake.
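
To put rough numbers on that difference, here is a minimal sketch.  It
is an illustration under stated assumptions, not a measurement: the URE
rate of 1 per 1e14 bits is a commonly quoted datasheet figure rather
than anything from this thread, errors are treated as independent, and
the function names and the 4096-byte "row" granularity are hypothetical
choices made only for this example.

```python
import math

def p_at_least_one_ure(bytes_read, ure_per_bit=1e-14):
    """Probability of one or more UREs while reading bytes_read bytes.

    Assumes independent bit errors at ure_per_bit (an assumed figure).
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed stably for very small p.
    return -math.expm1(bits * math.log1p(-ure_per_bit))

def raid1_rebuild_loss(disk_bytes, ure_per_bit=1e-14):
    """Two-way RAID1 with one half gone: any URE on the survivor loses data."""
    return p_at_least_one_ure(disk_bytes, ure_per_bit)

def raid6_rebuild_loss(disk_bytes, n_disks, row_bytes=4096, ure_per_bit=1e-14):
    """RAID6 with one disk gone: data is lost only if two or more of the
    surviving disks have a URE in the same row (assumed row_bytes wide)."""
    survivors = n_disks - 1
    p = p_at_least_one_ure(row_bytes, ure_per_bit)
    # P(two or more survivors bad in one row), summed directly to avoid
    # the cancellation you would get from 1 - P(0 bad) - P(1 bad).
    p_row = sum(
        math.comb(survivors, k) * p**k * (1 - p) ** (survivors - k)
        for k in range(2, survivors + 1)
    )
    rows = disk_bytes // row_bytes
    return -math.expm1(rows * math.log1p(-p_row))

if __name__ == "__main__":
    two_tb = 2 * 10**12
    print("RAID1 rebuild, 2TB members:", raid1_rebuild_loss(two_tb))
    print("RAID6 rebuild, 4 x 2TB:    ", raid6_rebuild_loss(two_tb, n_disks=4))
```

Under these assumptions the RAID1 figure comes out at roughly 15%,
while the RAID6 figure is on the order of 1e-10.  The absolute numbers
depend entirely on the assumed URE rate; the gap between them is the
point.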
