Checking consistency of Linux software RAID

* Checking consistency of Linux software RAID
@ 2003-06-30 12:58 Martin Bene
  2003-06-30 13:13 ` Gordon Henderson
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Martin Bene @ 2003-06-30 12:58 UTC (permalink / raw)
  To: linux-raid

Hi,

Administering quite a few systems with HW raid controllers, I've come to
really like a feature that seems to be missing from current SW raid:

Scheduling a (weekly) complete media scan where all surfaces of all drives
get read; in case of read errors a repair is tried: the content for the
failed sector is reconstructed just as if the drive had completely failed and
rewritten to the failed sector; if reading works afterwards, regard the
repair as successful and continue using the drive.

Is there any way to do this with SW raid? I truly hate situations where some
sectors on a drive fail silently and you don't notice until a 2nd drive dies
and you find you can't reconstruct your raid data because of silent "bitrot".

Thanks for any hints,

Martin

* Re: Checking consistency of Linux software RAID
  2003-06-30 12:58 Checking consistency of Linux software RAID Martin Bene
@ 2003-06-30 13:13 ` Gordon Henderson
  2003-06-30 13:16   ` Lars Marowsky-Bree
  2003-06-30 13:16 ` Lars Marowsky-Bree
  2003-07-07 18:29 ` Bernd Schubert
  2 siblings, 1 reply; 14+ messages in thread
From: Gordon Henderson @ 2003-06-30 13:13 UTC (permalink / raw)
  To: linux-raid

On Mon, 30 Jun 2003, Martin Bene wrote:

> Is there any way to do this with SW raid? I truly hate situations where
> some sectors on a drive fail silently and you don't notice until a 2nd
> drive dies and you find you can't reconstruct your raid data because of
> silent "bitrot".

Try

	badblocks -v -s /dev/md0

Check the man page for various options, etc.

You might actually want to read the raw partitions rather than reading
through the md driver, e.g. badblocks /dev/hda1, etc. So run it once per
member: twice for a RAID1 pair, and N times for the N slices of a RAID5 ...
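
A minimal sketch of such a per-member scan, assuming the member partitions
can be read from /proc/mdstat. md_members is a hypothetical helper name, not
part of badblocks or the raid tools, and md0 is only an example array:

```shell
#!/bin/sh
# Sketch: list the member devices of one md array from /proc/mdstat-style
# text on stdin, so that each can be handed to a read-only badblocks run.
# md_members is a hypothetical helper, not an existing tool.
md_members() {
    # $1 = array name, e.g. md0.
    # A matching line looks like: "md0 : active raid1 hdc1[1] hda1[0]"
    awk -v md="$1" '$1 == md && $2 == ":" {
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\[[0-9]+\]/) {
                dev = $i
                sub(/\[.*/, "", dev)   # drop the "[N]" slot number and flags
                print "/dev/" dev
            }
        }
    }'
}

# Usage (badblocks only reads unless -n or -w is given):
#   md_members md0 < /proc/mdstat | while read -r dev; do
#       badblocks -v -s "$dev"
#   done
```

This only reports unreadable sectors on each member; it does not compare the
members against each other.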

Gordon



* Re: Checking consistency of Linux software RAID
  2003-06-30 13:13 ` Gordon Henderson
@ 2003-06-30 13:16   ` Lars Marowsky-Bree
  2003-06-30 13:28     ` Gordon Henderson
  0 siblings, 1 reply; 14+ messages in thread
From: Lars Marowsky-Bree @ 2003-06-30 13:16 UTC (permalink / raw)
  To: Gordon Henderson, linux-raid

On 2003-06-30T14:13:57,
   Gordon Henderson <gordon@drogon.net> said:

> Try
> 
> 	badblocks -v -s /dev/md0

This won't do what he asked.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
SuSE Labs - Research & Development, SuSE Linux AG
  
"If anything can go wrong, it will." "Chance favors the prepared (mind)."
  -- Capt. Edward A. Murphy            -- Louis Pasteur
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: Checking consistency of Linux software RAID
  2003-06-30 13:16   ` Lars Marowsky-Bree
@ 2003-06-30 13:28     ` Gordon Henderson
  2003-06-30 13:36       ` Lars Marowsky-Bree
  0 siblings, 1 reply; 14+ messages in thread
From: Gordon Henderson @ 2003-06-30 13:28 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: linux-raid

On Mon, 30 Jun 2003, Lars Marowsky-Bree wrote:

> On 2003-06-30T14:13:57,
>    Gordon Henderson <gordon@drogon.net> said:
>
> > Try
> >
> > 	badblocks -v -s /dev/md0
>
> This won't do what he asked.

You are pedantically correct, but in the absence of anything else, at
least it could form a partial solution and let him know that a problem
exists which could then be actioned in some way before it becomes
data-threatening.

Gordon

* Re: Checking consistency of Linux software RAID
  2003-06-30 13:28     ` Gordon Henderson
@ 2003-06-30 13:36       ` Lars Marowsky-Bree
  0 siblings, 0 replies; 14+ messages in thread
From: Lars Marowsky-Bree @ 2003-06-30 13:36 UTC (permalink / raw)
  To: Gordon Henderson; +Cc: linux-raid

On 2003-06-30T14:28:57,
   Gordon Henderson <gordon@drogon.net> said:

> You are pedantically correct, but in the absence of anything else, at
> least it could form a partial solution and let him know that a problem
> exists which could then be actioned in some way before it becomes data
> threatening.

No. The problem is that if badblocks returns anything on a md device,
the data _is_ already threatened beyond rescue.

A badblocks r/o test on the underlying devices may be more sensible and
help to diagnose it a little, but it also won't verify consistency
between the drives.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

* Re: Checking consistency of Linux software RAID
  2003-06-30 12:58 Checking consistency of Linux software RAID Martin Bene
  2003-06-30 13:13 ` Gordon Henderson
@ 2003-06-30 13:16 ` Lars Marowsky-Bree
  2003-07-07 18:29 ` Bernd Schubert
  2 siblings, 0 replies; 14+ messages in thread
From: Lars Marowsky-Bree @ 2003-06-30 13:16 UTC (permalink / raw)
  To: Martin Bene, linux-raid

On 2003-06-30T14:58:19,
   Martin Bene <martin.bene@icomedias.com> said:

> Scheduling a (weekly) complete media scan where all surfaces of all drives
> get read; in case of read errors a repair is tried: the content for the
> failed sector is reconstructed just as if the drive had completely failed and
> rewritten to the failed sector; if reading works afterwards, regard the
> repair as successful and continue using the drive.

This can't currently be done.

I'd suggest starting from the resync code and using it to check
instead; add an ioctl to trigger the consistency scan, then you can
schedule it via cron all you like.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

* Re: Checking consistency of Linux software RAID
  2003-06-30 12:58 Checking consistency of Linux software RAID Martin Bene
  2003-06-30 13:13 ` Gordon Henderson
  2003-06-30 13:16 ` Lars Marowsky-Bree
@ 2003-07-07 18:29 ` Bernd Schubert
  2003-07-07 18:42   ` Corey McGuire
  2 siblings, 1 reply; 14+ messages in thread
From: Bernd Schubert @ 2003-07-07 18:29 UTC (permalink / raw)
  To: Martin Bene, linux-raid

On Monday 30 June 2003 14:58, Martin Bene wrote:
> Hi,
>
> Administering quite a few systems with HW raid controllers, I've come to
> really like a feature that seems to be missing from current SW raid:
>
> Scheduling a (weekly) complete media scan where all surfaces of all drives
> get read; in case of read errors a repair is tried: the content for the
> failed sector is reconstructed just as if the drive had completely failed
> and rewritten to the failed sector; if reading works afterwards, regard the
> repair as successful and continue using the drive.
>
> Is there any way to do this with SW raid? I truly hate situations where
> some sectors on a drive fail silently and you don't notice until a 2nd
> drive dies and you find you can't reconstruct your raid data because of
> silent "bitrot".
>

Hi,

/proc/mdstat is there to monitor the status of your raid; when a drive fails,
it is dropped from the raid array. Using mdadm you can monitor /proc/mdstat,
and it can even send you a mail when one of your disks fails. So if you
really want to scan your disks once a week, why not run
'dd if=/dev/mdX of=/dev/zero'? Every block of every raid disk should then be
read, and the md driver should automatically drop a failing disk from the
raid.
I guess you could even try to repair a disk after it has been dropped from
the raid by running some scripts, but since I never trusted any disk that had
failed once, I never worried about it.

Bernd

* Re: Checking consistency of Linux software RAID
  2003-07-07 18:29 ` Bernd Schubert
@ 2003-07-07 18:42   ` Corey McGuire
  2003-07-08 16:51     ` Bernd Schubert
  0 siblings, 1 reply; 14+ messages in thread
From: Corey McGuire @ 2003-07-07 18:42 UTC (permalink / raw)
  To: linux-raid


This question has been on the tip of my tongue... Thanks for your answer...

Out of curiosity, why do you use /dev/zero? Would dd to /dev/null cause
problems or is /dev/zero required for proper results?

/\/\/\/\/\/\ Nothing is foolproof to a talented fool. /\/\/\/\/\/\

coreyfro@coreyfro.com
http://www.coreyfro.com/
http://stats.distributed.net/rc5-64/psummary.php3?id=196879
ICQ : 3168059

-----BEGIN GEEK CODE BLOCK-----
GCS d--(+) s: a-- C++++$ UBL++>++++ P+ L+ E W+++$ N+ o? K? w++++$>+++++$
O---- !M--- V- PS+++ PE++(--) Y+ PGP- t--- 5(+) !X- R(+) !tv b-(+)
Dl++(++++) D++ G+ e>+++ h++(---) r++>+$ y++*>$ H++++ n---(----) p? !au w+
v- 3+>++ j- G'''' B--- u+++*** f* Quake++++>+++++$
------END GEEK CODE BLOCK------

Home of Geek Code - http://www.geekcode.com/
The Geek Code Decoder Page - http://www.ebb.org/ungeek//


* Re: Checking consistency of Linux software RAID
  2003-07-07 18:42   ` Corey McGuire
@ 2003-07-08 16:51     ` Bernd Schubert
  2003-07-08 21:23       ` software raid hangs Donghui Wen
  2003-07-08 21:47       ` Checking consistency of Linux software RAID Corey McGuire
  0 siblings, 2 replies; 14+ messages in thread
From: Bernd Schubert @ 2003-07-08 16:51 UTC (permalink / raw)
  To: Corey McGuire, linux-raid

Hello Corey!

> This question has been on the tip of my tongue... Thanks for your answer...
>
> Out of curiosity, why do you use /dev/zero? Would dd to /dev/null cause
> problems or is /dev/zero required for proper results?
>

D'oh, it seems I was a bit sleepy yesterday. Of course you are right: it has
to be /dev/null!
And of course, /dev/zero is only meant to be read from.

Sorry for posting improper commands.
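
A minimal sketch of the corrected read scan. read_scan is a hypothetical
wrapper name, /dev/md0 stands in for /dev/mdX, and the 1 MiB block size is an
arbitrary choice:

```shell
#!/bin/sh
# Sketch: read a whole block device (or any file) and discard the data.
# read_scan is a hypothetical helper; it returns dd's exit status, which
# is non-zero if a read error stops the copy.
read_scan() {
    # $1 = device to read, e.g. /dev/md0
    dd if="$1" of=/dev/null bs=1024k 2>/dev/null
}

# Usage, e.g. from a weekly cron job (assumes a working "mail" command):
#   read_scan /dev/md0 || echo "read error on /dev/md0" | mail -s "md scan" root
```

This reads through the md layer only, so it exercises whatever blocks md
chooses to serve each read from.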


Best regards,	
	Bernd


-- 
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: bernd.schubert@pci.uni-heidelberg.de

* software raid hangs.
  2003-07-08 16:51     ` Bernd Schubert
@ 2003-07-08 21:23       ` Donghui Wen
  2003-07-08 21:38         ` Matt Simonsen
  2003-07-08 21:47       ` Checking consistency of Linux software RAID Corey McGuire
  1 sibling, 1 reply; 14+ messages in thread
From: Donghui Wen @ 2003-07-08 21:23 UTC (permalink / raw)
  To: linux-raid

Hi,
     I am testing software RAID with a 3ware-7500 controller in a
hot-swappable chassis. I found that software RAID 5 could not sustain even
one disk failure in this setup. Maybe it is a limitation of the Linux SCSI
driver or the 3ware driver. Do you guys have any idea what is going on?
Here is what I did:
    * set up the 3ware-7500 in JBOD mode with 7 IDE disks.
    * made a raid 5 (md0) from sdb1 ~ sdh1
    * mount /dev/md0 /u02
    * cd /u02 and ran bonnie++
    * while bonnie++ was running, I pulled out one disk (sda). bonnie++
      switched to state 'D' and hung there.
    * Any disk operation in /u02 then hangs, for example:
      echo "Hello, world" > aaa.txt

    If the whole operating system is installed on a software RAID 5
partition, the system hangs in this case, since any disk I/O will put the
process into state 'D'.
    I think this is a limitation of the Linux SCSI driver (SCSI itself does
not support hotplug), but if that is true, software RAID is not a real RAID
solution, because if one disk is pulled out, the whole system hangs.
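
To see which processes are stuck, one could list everything in state 'D'
(uninterruptible sleep). A sketch, assuming a procps-style ps;
d_state_procs is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: print PID and command of processes in uninterruptible sleep.
# d_state_procs is a hypothetical helper; it filters lines of the form
# "PID STAT COMMAND" read from stdin and keeps those whose STAT starts
# with "D".
d_state_procs() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# Usage (the empty "=" headers suppress the ps header line):
#   ps -e -o pid= -o stat= -o comm= | d_state_procs
```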

Thanks!

Donghui




* Re: software raid hangs.
  2003-07-08 21:23       ` software raid hangs Donghui Wen
@ 2003-07-08 21:38         ` Matt Simonsen
  2003-07-08 21:41           ` Donghui Wen
  0 siblings, 1 reply; 14+ messages in thread
From: Matt Simonsen @ 2003-07-08 21:38 UTC (permalink / raw)
  To: Donghui Wen; +Cc: linux-raid

On Tue, 2003-07-08 at 14:23, Donghui Wen wrote:
> Hi,
>      I am testing software-raid with 3ware-7500 controller on a
> hot-swappable chassis.
> I found out that software-raid 5 could not even sustain 1 disk failure in
> this case.


Did you allow the array to first sync up before pulling out a disk? It
may take quite a while if you have large disks (several hours to days).
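
One way to check this is to look for a resync or recovery progress line in
/proc/mdstat. A sketch, assuming the usual "resync = N%" / "recovery = N%"
wording; sync_in_progress is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if mdstat-style text on stdin shows a resync
# or recovery that is running, delayed or pending.
# sync_in_progress is a hypothetical helper, not an existing tool.
sync_in_progress() {
    grep -Eq '(resync|recovery) *='
}

# Usage:
#   if sync_in_progress < /proc/mdstat; then
#       echo "array still syncing; wait before pulling a disk"
#   fi
```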

Matt


* Re: software raid hangs.
  2003-07-08 21:38         ` Matt Simonsen
@ 2003-07-08 21:41           ` Donghui Wen
  0 siblings, 0 replies; 14+ messages in thread
From: Donghui Wen @ 2003-07-08 21:41 UTC (permalink / raw)
  To: Matt Simonsen; +Cc: linux-raid

Yes, I did. Actually, in order to verify this case,
I created some small partitions (1 GB) to build the array.

Donghui

* Re: Checking consistency of Linux software RAID
  2003-07-08 16:51     ` Bernd Schubert
  2003-07-08 21:23       ` software raid hangs Donghui Wen
@ 2003-07-08 21:47       ` Corey McGuire
  1 sibling, 0 replies; 14+ messages in thread
From: Corey McGuire @ 2003-07-08 21:47 UTC (permalink / raw)
  To: linux-raid

no big... I put "dd" to use, and it works like a charm... of course, I
won't really know if it works like a charm until something breaks...

That's the trouble with computers ;-)

* Re: Checking consistency of Linux software RAID
@ 2003-07-09  8:06 Martin Bene
  0 siblings, 0 replies; 14+ messages in thread
From: Martin Bene @ 2003-07-09  8:06 UTC (permalink / raw)
  To: Bernd Schubert, Corey McGuire, linux-raid

Hi Bernd,

> /proc/mdstat is to monitor the status of your raid, so when 
> one drive fails it becomes dropped out of the raid-array. 
> Using mdadm you can monitor /proc/mdstat and it even can 
> send you a mail when one of your disks fails.

Yes, but it can only notify you of errors that it actually detects; as written
before, my concern is silent "bitrot", i.e. unaccessed data on the disks going
bad.

> So if you really want to scan your disk once a week, why not 
> running 'dd if=/dev/mdX of=/dev/null'? 
> So every block of every raid-disk should become
> read and the md-driver should automatically drop a failing 
> disk  out of the raid.

Umm, no: this only reads each logical block but doesn't read the redundant
information on raid1 or raid5. Meaning: even if a read of the whole MD device
works, it doesn't guarantee that all sectors of all physical devices can
actually be read.

To check for errors, scanning the low-level devices (/dev/sd??) could work,
but it still won't help with errors like those described further down.

> I guess you could even try to repair a disk when it became 
> dropped out of the raid by running some scripts, but since 
> I never trusted any disk that had failed ones, I never worried 
> about it.

If a write is in progress during a power failure, chances are quite high that
you end up with at least one unreadable sector on the drive; repairing these
is quite OK and not a sign of the drive going bad. So having one sector bad
on drive 0 and another sector on drive 1 is not too farfetched - currently,
there's no good way to recover from such a situation: if you hit the bad
sector on drive 0, drive 0 will be kicked from the array; when you hit the bad
sector on drive 1 during resync, resync will fail.

With (some) hardware raid solutions, a media scan will find both errors and
rewrite the bad sectors with reconstructed data from the other drives. Quite a
useful feature but not yet possible with linux SW raid.
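
For reference, the reconstruction step itself is simple in the RAID 5 case:
parity is the XOR of the data blocks, so a single missing block is the XOR of
the survivors and the parity. A toy sketch, with single integers standing in
for whole sectors (the values and the xor_all helper are made up for
illustration):

```shell
#!/bin/sh
# Sketch: RAID5-style reconstruction of one missing value via XOR.
# xor_all is a hypothetical helper that XORs all of its arguments.
xor_all() {
    acc=0
    for v in "$@"; do
        acc=$(( acc ^ v ))
    done
    echo "$acc"
}

d0=23; d1=42; d2=7                        # "data sectors" on three drives
parity=$(xor_all "$d0" "$d1" "$d2")       # what the parity drive holds

# Pretend d1 is unreadable: rebuild it from the other data plus parity.
rebuilt=$(xor_all "$d0" "$d2" "$parity")
echo "rebuilt=$rebuilt"                   # prints "rebuilt=42"
```

A repairing media scan would then write the rebuilt value back to the sector
that failed to read.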

Bye, Martin
