Timeout question

Linux RAID subsystem development

* Timeout question
From: Hans Kraus @ 2013-11-04 20:07 UTC
  To: linux-raid

Hi,

I put all my replaced and otherwise retired HDs into one machine to
serve backup duties with backuppc.

I assembled four raid0 arrays, each consisting of either a 3 + 1 TB
pair or a 2 + 2 TB pair. Some of these drives support scterc, some do
not. I've put the following in rc.local (by the way, the system is
running Debian):
cd /dev
for x in sd[a-z]; do
     /bin/echo $x "---------------------------------------------------------------------------"
     /usr/sbin/smartctl -s on -o on -S on /dev/$x || echo "/usr/sbin/smartctl -s on -o on -S on /dev/$x failed."
     /usr/sbin/smartctl -l scterc,70,70 /dev/$x || echo 180 >/sys/block/$x/device/timeout || echo "/sys/block/$x/device/timeout not available"
     /usr/sbin/smartctl -t offline /dev/$x || echo "/usr/sbin/smartctl -t offline /dev/$x failed"
     /bin/echo "-------------------------------------------------------------------------------"
done

These four raid0 arrays are then the members of a raid5. The idea
behind this is to be able to replace each raid0 with a single 4 TB
drive later. Now comes my question: do I need to set timeouts for the
raid0 arrays as well, and if so, how do I do that? The following
doesn't work:
for x in md??; do
     /bin/echo $x "--------------------------------------------------------------------------"
     echo 180 >/sys/block/$x/device/timeout || echo "/sys/block/$x/device/timeout not available"
     /bin/echo "-------------------------------------------------------------------------------"
done

Kind regards,
Hans


* Re: Timeout question
From: Phil Turmel @ 2013-11-04 22:39 UTC
  To: Hans Kraus, linux-raid

Hi Hans,

On 11/04/2013 03:07 PM, Hans Kraus wrote:
> Hi,
> 
> I put all my replaced and otherwise retired HDs into one machine to
> serve backup duties with backuppc.
>
> I assembled four raid0 arrays, each consisting of either a 3 + 1 TB
> pair or a 2 + 2 TB pair. Some of these drives support scterc, some do
> not. I've put the following in rc.local (by the way, the system is
> running Debian):
> cd /dev
> for x in sd[a-z]; do
>     /bin/echo $x "---------------------------------------------------------------------------"
>     /usr/sbin/smartctl -s on -o on -S on /dev/$x || echo "/usr/sbin/smartctl -s on -o on -S on /dev/$x failed."
>     /usr/sbin/smartctl -l scterc,70,70 /dev/$x || echo 180 >/sys/block/$x/device/timeout || echo "/sys/block/$x/device/timeout not available"
>     /usr/sbin/smartctl -t offline /dev/$x || echo "/usr/sbin/smartctl -t offline /dev/$x failed"
>     /bin/echo "-------------------------------------------------------------------------------"

Good.
> 
> done
> 
> These four raid0 arrays are then the members of a raid5. The idea
> behind this is to be able to replace each raid0 with a single 4 TB
> drive later. Now comes my question: do I need to set timeouts for the
> raid0 arrays as well, and if so, how do I do that? The following
> doesn't work:
> for x in md??; do
>     /bin/echo $x "--------------------------------------------------------------------------"
>     echo 180 >/sys/block/$x/device/timeout || echo "/sys/block/$x/device/timeout not available"
>     /bin/echo "-------------------------------------------------------------------------------"
> done

No.  The timeouts only matter on the physical devices.  MD doesn't have
a timeout as it isn't a physical driver.  What you have appears to be
correct.
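
A minimal sketch of that point, not taken from the thread: a timeout is
written only where sysfs exposes a device/timeout attribute, and md
arrays are skipped because they have none. The function name
set_cmd_timeout and the SYSFS_ROOT variable are made up for this
example; SYSFS_ROOT exists only so the function can be tried against a
scratch directory instead of the real /sys.

```shell
#!/bin/sh
# Sketch only: write a command timeout for devices that have one.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# set_cmd_timeout DEV SECONDS
# Prints what happened; returns 1 if DEV has no writable timeout
# attribute (which is the case for md arrays).
set_cmd_timeout() {
    dev="$1"
    secs="$2"
    attr="$SYSFS_ROOT/block/$dev/device/timeout"
    if [ -w "$attr" ]; then
        echo "$secs" > "$attr" && echo "$dev: timeout set to $secs"
    else
        echo "$dev: no device/timeout attribute, skipped"
        return 1
    fi
}
```

Run as root, something like "set_cmd_timeout sda 180" would change the
timeout of sda, while "set_cmd_timeout md0 180" would only report that
md0 was skipped.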

Make sure you also have a "check" scrub in a cron job for everything
greater than raid0.  (Interval can vary--I use weekly.)  And follow up
on the cron job with a report of all mismatch_cnt values.
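
As a rough illustration of the reporting half, the sketch below prints
mismatch_cnt for every md array that exposes it under sysfs. It is not
the script of any distribution; the function name report_mismatches and
the SYSFS_ROOT override are assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: print mismatch_cnt for every md array that exposes it.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# report_mismatches
# Prints "mdX: N" per array; returns 1 if any count is nonzero.
report_mismatches() {
    status=0
    for f in "$SYSFS_ROOT"/block/md*/md/mismatch_cnt; do
        [ -r "$f" ] || continue          # glob did not match anything
        md="${f#"$SYSFS_ROOT"/block/}"   # strip the leading path
        md="${md%%/*}"                   # keep only the array name, e.g. md0
        cnt=$(cat "$f")
        echo "$md: $cnt"
        [ "$cnt" -eq 0 ] || status=1
    done
    return "$status"
}
```

The scrub itself is started by writing "check" to the array's
md/sync_action attribute, for example
"echo check > /sys/block/md0/md/sync_action" as root; the report is
only meaningful once that check has finished.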

For large capacities with consumer drives (~8TB or more, IMHO), you
should seriously consider raid6.  The probability of an unrecoverable
read error interrupting a raid5 rebuild after a drive failure is
shockingly high.
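
To put a rough number on that, here is a back-of-the-envelope sketch.
It assumes independent errors and a rating of one unrecoverable read
error per 1e14 bits read, a figure commonly quoted for consumer drives
but not stated in this thread; real drives may do better. The function
name ure_probability is made up for the example.

```shell
#!/bin/sh
# Sketch only: chance of hitting at least one unrecoverable read error
# (URE) while reading a given amount of data, assuming independent errors.

# ure_probability TERABYTES [BITS_PER_ERROR]
# BITS_PER_ERROR defaults to 1e14, an assumed consumer-drive rating.
ure_probability() {
    awk -v tb="$1" -v bpe="${2:-1e14}" 'BEGIN {
        bits = tb * 1e12 * 8          # decimal terabytes -> bits
        expected = bits / bpe         # expected number of UREs
        # Poisson approximation of 1 - (1 - 1/bpe)^bits
        printf "%.2f\n", 1 - exp(-expected)
    }'
}
```

For the array in this thread, rebuilding a raid5 of four 4 TB members
means reading the three surviving members, about 12 TB.
"ure_probability 12" then gives roughly 0.6 under the assumptions
above, and "ure_probability 12 1e15" shows how much a tenfold better
rating would help.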

HTH,

Phil


* Re: Timeout question
From: Keith Keller @ 2013-11-04 23:29 UTC
  To: linux-raid

On 2013-11-04, Phil Turmel <philip@turmel.org> wrote:
>
> Make sure you also have a "check" scrub in a cron job for everything
> greater than raid0.  (Interval can vary--I use weekly.)  And follow up
> on the cron job with a report of all mismatch_cnt values.

RHEL/CentOS already has a script which does this; IIRC it emails root if
mismatch_cnt is nonzero after the check is complete.

> For large capacities with consumer drives (~8TB or more, IMHO), you
> should seriously consider raid6.

I'm assuming you mean arrays larger than 8TB, not individual drives!  :)
I have had a second drive fail in a large array after the first one
failed and a rebuild started, so this isn't just a theoretical
discussion.

--keith

--  
kkeller@wombat.san-francisco.ca.us


* Re: Timeout question
From: Hans Kraus @ 2013-11-06  6:49 UTC
  To: linux-raid


Hi Phil,

Thanks. Debian already runs a scrub on the first Sunday of every month.
Upgrading to raid6 is planned once I have the money for the disk(s).

I have already had a second failure during a raid5 rebuild (that was
the trigger for this backup solution), so I'm very aware of that
possibility. My main storage is already a raid6 on NAS drives.

Kind regards, Hans

