thanks md raid

* thanks md raid
From: Daniel Pocock @ 2012-05-10 18:54 UTC
  To: linux-raid


I'm glad my RAID1 worked as expected... just hoping I don't encounter
any read timeouts on the non-TLER drive before my rebuild finishes:


# mdadm --manage --add -v /dev/md2 /dev/sdb2
mdadm: added /dev/sdb2

# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[3] sda2[2]
      976510840 blocks super 1.2 [2/1] [_U]
      [>....................]  recovery =  1.9% (19032256/976510840) finish=350.1min speed=45575K/sec

unused devices: <none>
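
The finish estimate in that output can be re-derived from the other numbers on the same line. The following is only a sketch: `rebuild_eta` is a hypothetical helper name, not part of mdadm, and it assumes the block counts are 1 KiB blocks and the speed is in K/sec, as /proc/mdstat reports them.

```shell
# Hypothetical helper (not an mdadm command): recompute the recovery
# percentage and the remaining time from /proc/mdstat's own numbers.
#   $1 = blocks already recovered (1 KiB each)
#   $2 = total blocks in the array (1 KiB each)
#   $3 = recovery speed in K/sec
rebuild_eta() {
    awk -v d="$1" -v t="$2" -v s="$3" 'BEGIN {
        printf "recovered: %.1f%%\n", 100 * d / t
        printf "remaining: %.1f min\n", (t - d) / s / 60
    }'
}

# Values copied from the mdstat output above.
rebuild_eta 19032256 976510840 45575
```

With the figures above this should land on the same 1.9% and roughly 350 minutes that the kernel printed, assuming the speed stays constant for the rest of the rebuild.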




# iostat -k 1 -x /dev/sd[ab]

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.63    0.00   20.65    0.00    0.00   77.72

Device:  rrqm/s  wrqm/s      r/s      w/s      rkB/s      wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00    0.00  1584.00     0.00  101376.00       0.00    128.00      2.84   1.80   0.39  61.60
sdb        0.00    3.00     0.00  1584.00       0.00  101568.00    128.24     27.72  17.70   0.63  99.60

* Re: thanks md raid
From: Stan Hoeppner @ 2012-05-10 21:49 UTC
  To: Daniel Pocock; +Cc: linux-raid

On 5/10/2012 1:54 PM, Daniel Pocock wrote:
> 
> I'm glad my RAID1 worked as expected... just hoping I don't encounter
> any read timeouts on the non-TLER drive before my rebuild finishes:

You have an inverse understanding of ERC.  Drives without ERC will retry
forever, or until an upper layer puts a stop to their efforts.  Drives
with a 7 second ERC will return a hard error after 7 seconds.

So the only way you'll get a timeout with your rebuild is if the healthy
drive spends 30 seconds retrying a sector read.
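
To make the comparison concrete, here is a rough sketch of the check being described. `erc_fits_timeout` is a hypothetical helper, not a smartmontools or kernel tool; it assumes the ERC limit is given in deciseconds (the unit `smartctl -l scterc` uses, so 70 means 7.0 seconds, 0 means disabled) and that the 30 seconds above is the kernel's per-device command timeout in seconds.

```shell
# Hypothetical helper: succeeds (exit 0) when the drive gives up on a bad
# sector before the kernel's command timeout would expire.
#   $1 = drive ERC limit in deciseconds (0 = ERC disabled or unsupported)
#   $2 = kernel command timeout in seconds
erc_fits_timeout() {
    erc_ds=$1
    timeout_s=$2
    # With ERC disabled the drive may keep retrying past any timeout.
    if [ "$erc_ds" -eq 0 ]; then
        return 1
    fi
    [ "$erc_ds" -lt $((timeout_s * 10)) ]
}

# The real values can be read like this (needs root and smartmontools;
# /dev/sda is only an example device):
#   smartctl -l scterc /dev/sda
#   cat /sys/block/sda/device/timeout

if erc_fits_timeout 70 30; then
    echo "7 s ERC: drive reports the error before the 30 s timeout"
fi
if ! erc_fits_timeout 0 30; then
    echo "no ERC: a long retry can run into the 30 s timeout"
fi
```

On a drive that supports it, `smartctl -l scterc,70,70 /dev/sda` sets a 7 second read and write limit; whether that setting survives a power cycle depends on the drive.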

-- 
Stan

* Re: thanks md raid
From: Daniel Pocock @ 2012-05-10 21:51 UTC
  To: stan; +Cc: linux-raid



On 10/05/12 21:49, Stan Hoeppner wrote:
> On 5/10/2012 1:54 PM, Daniel Pocock wrote:
>>
>> I'm glad my RAID1 worked as expected... just hoping I don't encounter
>> any read timeouts on the non-TLER drive before my rebuild finishes:
> 
> You have an inverse understanding of ERC.  Drives without ERC will retry
> forever, or until an upper layer puts a stop to their efforts.  Drives
> with a 7 second ERC will return a hard error after 7 seconds.
> 
> So the only way you'll get a timeout with your rebuild is if the healthy
> drive spends 30 seconds retrying a sector read.
> 

I was thinking about the more obscure case: that some other URE,
followed by an attempt at write access on the good drive, fails and it
becomes degraded.

* Re: thanks md raid
From: Stan Hoeppner @ 2012-05-11  2:09 UTC
  To: Daniel Pocock; +Cc: linux-raid

On 5/10/2012 4:51 PM, Daniel Pocock wrote:
> 
> 
> On 10/05/12 21:49, Stan Hoeppner wrote:
>> On 5/10/2012 1:54 PM, Daniel Pocock wrote:
>>>
>>> I'm glad my RAID1 worked as expected... just hoping I don't encounter
>>> any read timeouts on the non-TLER drive before my rebuild finishes:
>>
>> You have an inverse understanding of ERC.  Drives without ERC will retry
>> forever, or until an upper layer puts a stop to their efforts.  Drives
>> with a 7 second ERC will return a hard error after 7 seconds.
>>
>> So the only way you'll get a timeout with your rebuild is if the healthy
>> drive spends 30 seconds retrying a sector read.
>>
> 
> I was thinking about the more obscure case: that some other URE,
> followed by an attempt at write access on the good drive, fails and it
> becomes degraded.

If drives were that damn fragile modern computing wouldn't exist.  The
odds of your UPS taking a dump during a rebuild are greater than the
scenario you just described.  You need to put more thought into UPS
failure scenarios than ERC.

I mention this specifically because my "desktop" APC Back-UPS XS 900 did
the unthinkable the other day.  Apparently it decided the batteries were
bad at the very moment it ran its hard scheduled self test.  Class, what
happens with all APC UPSes when the scheduled self test runs and the
batteries have been flagged bad?  Answer:  it drops the load and causes
your system to reboot.

One would think APC would be smart enough to have the firmware skip the
self test until after the batteries have been replaced, specifically to
prevent an unplanned power event, the whole purpose of a UPS.  I guess
one of their actuaries figured they'd get more battery sales if they
keep downing your system until you replace them...

-- 
Stan
