* problem killing raid 5
From: Daniel Santos @ 2007-10-01 11:04 UTC
To: linux-raid
Hello,
I had a raid 5 array on three disks. Because of a hardware problem two
disks disappeared one after the other. I have since been trying to
create a new array with them.
Between the two disk failures I tried removing one of the
failed disks and re-adding it to the array. When the second disk failed
I noticed the drive numbers on the broken array, and mysteriously a
fourth drive appeared on it. Now I have numbers 0,1 and 3, but no number 2.
mdadm tells me that number 3 is a spare.
Now I want to start all over again, but even after zeroing the
superblocks on all three disks and creating a new array,
/proc/mdstat shows the same drive numbers, while reconstructing the
third drive.
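For reference, the steps were roughly the following (device names and
chunk size from memory, so treat them as approximate):

mdadm --stop /dev/md0
mdadm --zero-superblock /dev/sdb1 /dev/sdc1 /dev/sdd1
# recreate a 3-disk raid5 with the same layout as before
mdadm --create /dev/md0 --level=5 --raid-devices=3 --chunk=256 \
      /dev/sdb1 /dev/sdc1 /dev/sdd1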
What should I do ?
Daniel Santos
* Re: problem killing raid 5
From: Daniel Santos @ 2007-10-01 18:20 UTC
Cc: linux-raid
I retried rebuilding the array once again from scratch, and this time
checked the syslog messages. The reconstruction process is getting
stuck at a disk block that it can't read. I double checked the block
number by repeating the array creation, and did a bad block scan. No bad
blocks were found. How could the md driver be stuck if the block is fine ?
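The scan was a plain read-only badblocks run, something like this
(device name approximate):

# non-destructive surface scan of the whole member partition;
# takes a few hours on a 400 GB disk
badblocks -sv /dev/sdb1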
Supposing that the disk has bad blocks, can I have a raid device on
disks that have bad blocks ? Each one of the disks is 400 GB.
Probably not a good idea because if a drive has bad blocks it probably
will have more in the future. But anyway, can I ?
The bad blocks would have to be known to the md driver.
Daniel Santos wrote:
> Hello,
>
> I had a raid 5 array on three disks. Because of a hardware problem two
> disks disappeared one after the other. I have since been trying to
> create a new array with them.
>
> Between the two disk failures I tried removing one of the
> failed disks and re-adding it to the array. When the second disk
> failed I noticed the drive numbers on the broken array, and
> mysteriously a fourth drive appeared on it. Now I have numbers 0,1 and
> 3, but no number 2.
> mdadm tells me that number 3 is a spare.
>
> Now I want to start all over again, but even after zeroing the
> superblocks on all three disks and creating a new array,
> /proc/mdstat shows the same drive numbers, while reconstructing the
> third drive.
> What should I do ?
>
> Daniel Santos
* Re: problem killing raid 5
From: Michael Tokarev @ 2007-10-01 18:47 UTC
To: Daniel Santos; +Cc: linux-raid
Daniel Santos wrote:
> I retried rebuilding the array once again from scratch, and this time
> checked the syslog messages. The reconstruction process is getting
> stuck at a disk block that it can't read. I double checked the block
> number by repeating the array creation, and did a bad block scan. No bad
> blocks were found. How could the md driver be stuck if the block is fine ?
>
> Supposing that the disk has bad blocks, can I have a raid device on
> disks that have bad blocks ? Each one of the disks is 400 GB.
>
> Probably not a good idea because if a drive has bad blocks it probably
> will have more in the future. But anyway, can I ?
> The bad blocks would have to be known to the md driver.
Well, almost all modern drives can remap bad blocks (at least I know of
no drive that can't). Most of the time it happens on write - because if
such a bad block is found during a read operation and the drive really
can't read the content of that block, it can't remap it either without
losing data. From my experience (about 20 years, many hundreds of
drives, mostly (old) SCSI but (old) IDE too), it's pretty normal for a
drive to develop several bad blocks, especially during the first year of
usage. Sometimes, however, the number of bad blocks grows quite rapidly,
and such a drive definitely should be replaced - at least Seagate drives
are covered by warranty in this case.
SCSI drives have two so-called "defect lists", stored somewhere inside
the drive - the factory-preset list (bad blocks found during internal
testing when producing the drive) and the grown list (bad blocks found
by the drive during normal usage). The factory-preset list can contain
from 0 to about 1000 entries or even more (depending on the size too),
and the grown list can be as large as 500 blocks or more; whether that
is fatal or not depends on whether new bad blocks continue to be found.
We have several drives which developed that many bad blocks in the
first few months of usage, the list stopped growing, and they're still
working just fine after more than 5 years. Both defect lists can be
shown by the scsitools programs.
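For a SCSI disk, something along these lines should dump them (sginfo
is part of sg3_utils; the exact option is from memory, so check
sginfo(8) first):

# primary (factory) and grown defect lists - option from memory
sginfo -d /dev/sda
# smartctl at least reports the size of the grown list on SCSI disks
smartctl -a /dev/sda | grep -i defect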
I don't know how one can see the defect lists on an IDE or SATA drive.
Note that the md layer (raid1, 4, 5, 6, 10 - but obviously not raid0
and linear) is now able to repair bad blocks automatically, by forcing
a write to the same place on the drive where a read error occurred -
this usually forces the drive to reallocate that block and continue.
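The same mechanism can be triggered by hand through sysfs; roughly
(assuming the array is md0, as in this thread):

# read every block; unreadable sectors get rewritten from the
# remaining disks, which lets the drive remap them
echo check > /sys/block/md0/md/sync_action
cat /proc/mdstat                        # watch the scrub progress
cat /sys/block/md0/md/mismatch_cnt      # mismatches seen by the last run
# "repair" additionally rewrites any parity/data mismatches it finds
echo repair > /sys/block/md0/md/sync_action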
But in any case, md should not stall - be it during reconstruction
or not. On this I can't comment - to me it smells like a bug
somewhere (md layer? error handling in the driver? something else?)
which should be found and fixed. And for that, some more details
are needed, I guess - the kernel version is a start.
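E.g., something like this would be a useful start (just a sketch of
what to collect):

uname -r                   # kernel version
mdadm --detail /dev/md0    # array state as md sees it
dmesg | tail -n 100        # the I/O errors around the time it got stuck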
/mjt
* Re: problem killing raid 5
From: Patrik Jonsson @ 2007-10-01 18:58 UTC
To: Michael Tokarev; +Cc: Daniel Santos, linux-raid
Michael Tokarev wrote:
[]
> But in any case, md should not stall - be it during reconstruction
> or not. On this I can't comment - to me it smells like a bug
> somewhere (md layer? error handling in the driver? something else?)
> which should be found and fixed. And for that, some more details
> are needed, I guess - the kernel version is a start.
Really? It's my understanding that if md finds an unreadable block
during raid5 reconstruction, it has no option but to fail since the
information can't be reconstructed. When this happened to me, I had to
wipe the bad block, which should allow reconstruction to proceed at the
cost of losing the chunk that's on the unreadable block. The bad block
HOWTO and messages on this list ~2 years ago explain how to figure out
which file(s) are affected.
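Wiping here just means overwriting that one sector so the drive can
remap it; with dd it looks roughly like this (BAD_SECTOR is a
placeholder for the sector number from the kernel log, relative to the
partition the error was reported on):

# destroys the 512 bytes at that sector, but lets the drive remap it
dd if=/dev/zero of=/dev/sdb1 bs=512 count=1 seek=BAD_SECTOR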
This is why it's important to run a weekly check so md can repair blocks
*before* a drive fails.
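A minimal way to schedule that, assuming the array is md0 and a
cron.d-style entry (with a user field):

# /etc/cron.d/md-check - scrub the array every Sunday at 02:30
30 2 * * 0 root /bin/sh -c 'echo check > /sys/block/md0/md/sync_action'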
cheers,
/Patrik
* Re: problem killing raid 5
From: Michael Tokarev @ 2007-10-01 20:35 UTC
To: Patrik Jonsson; +Cc: Daniel Santos, linux-raid
Patrik Jonsson wrote:
> Michael Tokarev wrote:
[]
>> But in any case, md should not stall - be it during reconstruction
>> or not. On this I can't comment - to me it smells like a bug
>> somewhere (md layer? error handling in the driver? something else?)
>> which should be found and fixed. And for that, some more details
>> are needed, I guess - the kernel version is a start.
>
> Really? It's my understanding that if md finds an unreadable block
> during raid5 reconstruction, it has no option but to fail since the
> information can't be reconstructed. When this happened to me, I had to
Yes indeed, it should fail, but not get stuck as Daniel reported.
I.e., it should either complete the work or fail, but not sleep
somewhere in between.
[]
> This is why it's important to run a weekly check so md can repair blocks
> *before* a drive fails.
*nod*.
/mjt
* Re: problem killing raid 5
From: Daniel Santos @ 2007-10-01 21:11 UTC
Cc: linux-raid
It stopped the reconstruction process and the output of /proc/mdstat was :
oraculo:/home/dlsa# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1] [raid0] [linear]
md0 : active raid5 sdc1[3](S) sdb1[4](F) sdd1[0]
781417472 blocks level 5, 256k chunk, algorithm 2 [3/1] [U__]
I then stopped the array and tried to assemble it with a scan :
oraculo:/home/dlsa# mdadm --assemble --scan
mdadm: /dev/md0 assembled from 1 drive and 1 spare - not enough to start
the array.
oraculo:/home/dlsa# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1] [raid0] [linear]
md0 : inactive sdd1[0](S) sdc1[3](S) sdb1[1](S)
1172126208 blocks
The fourth drive I had to put in mdadm.conf as missing.
The result was that because of the read error, the reconstruction
process for the new array aborted, and the assemble came up with an
array that seems like the one that failed before I created the new one.
I am running Debian with a 2.6.22 kernel.
Michael Tokarev wrote:
> []
> Yes indeed, it should fail, but not get stuck as Daniel reported.
> I.e., it should either complete the work or fail, but not sleep
> somewhere in between.
* Re: problem killing raid 5
From: Justin Piszcz @ 2007-10-01 21:44 UTC
To: Daniel Santos; +Cc: linux-raid
On Mon, 1 Oct 2007, Daniel Santos wrote:
> It stopped the reconstruction process and the output of /proc/mdstat was :
>
> oraculo:/home/dlsa# cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4] [raid1] [raid0] [linear]
> md0 : active raid5 sdc1[3](S) sdb1[4](F) sdd1[0]
> 781417472 blocks level 5, 256k chunk, algorithm 2 [3/1] [U__]
>
> I then stopped the array and tried to assemble it with a scan :
>
> oraculo:/home/dlsa# mdadm --assemble --scan
> mdadm: /dev/md0 assembled from 1 drive and 1 spare - not enough to start the
> array.
> oraculo:/home/dlsa# cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4] [raid1] [raid0] [linear]
> md0 : inactive sdd1[0](S) sdc1[3](S) sdb1[1](S)
> 1172126208 blocks
>
> The fourth drive I had to put in mdadm.conf as missing.
>
> The result was that because of the read error, the reconstruction process for
> the new array aborted, and the assemble came up with an array that seems like
> the one that failed before I created the new one.
>
> I am running Debian with a 2.6.22 kernel.
Yikes. By the way, are all those drives on the same chipset? What type of
drives did you use?
Justin.
* Re: problem killing raid 5
From: Daniel Santos @ 2007-10-02 6:53 UTC
Cc: linux-raid
All the drives are identical, and they are on identical USB enclosures.
I am starting to suspect USB. It frequently resets the enclosures. I'll
have to look at that first. Anyway, I had it working before for some
time.
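The resets should show up in the kernel log; something like this ought
to find them (the pattern is just a rough filter):

dmesg | grep -iE 'usb.*(reset|disconnect)'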
Justin Piszcz wrote:
> Yikes. By the way, are all those drives on the same chipset? What type
> of drives did you use?
>
> Justin.
Thread overview:
2007-10-01 11:04 problem killing raid 5 Daniel Santos
2007-10-01 18:20 ` Daniel Santos
2007-10-01 18:47 ` Michael Tokarev
2007-10-01 18:58 ` Patrik Jonsson
2007-10-01 20:35 ` Michael Tokarev
2007-10-01 21:11 ` Daniel Santos
2007-10-01 21:44 ` Justin Piszcz
2007-10-02 6:53 ` Daniel Santos