* Suggestion needed for fixing RAID6
@ 2010-04-22 10:09 Janos Haar
2010-04-22 15:00 ` Mikael Abrahamsson
` (2 more replies)
0 siblings, 3 replies; 48+ messages in thread
From: Janos Haar @ 2010-04-22 10:09 UTC (permalink / raw)
To: linux-raid
Hello Neil, list,
I am trying to fix a RAID6 array which has 12x1.5TB (Samsung) drives.
Currently the array has 1 missing drive, plus 3 drives which have some bad sectors!
Generally, because it is RAID6, no data is lost, since the bad sectors
are not on the same address line, but I can't rebuild the missing drive, because
the kernel drops the bad-sector drives one by one during the rebuild
process.
My question is: is there any way to force the array to keep the members in
even if they have some read errors?
Or is there a way to re-add the bad-sector drives after the kernel has dropped
them, without stopping the rebuild process?
Normally, after an 18-hour sync, at 97.9% the 3rd drive is always dropped
and the rebuild stops.
Thanks,
Janos Haar
^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Suggestion needed for fixing RAID6
  2010-04-22 10:09 Suggestion needed for fixing RAID6 Janos Haar
@ 2010-04-22 15:00 ` Mikael Abrahamsson
  2010-04-22 15:12   ` Janos Haar
  [not found] ` <4BD0AF2D.90207@stud.tu-ilmenau.de>
  2010-04-23  6:51 ` Luca Berra
  2 siblings, 1 reply; 48+ messages in thread
From: Mikael Abrahamsson @ 2010-04-22 15:00 UTC (permalink / raw)
  To: Janos Haar; +Cc: linux-raid

On Thu, 22 Apr 2010, Janos Haar wrote:

> My question is: is there any way to force the array to keep the members in
> even if they have some read errors?

What version of the kernel are you running? If you're running any reasonably
recent kernel it shouldn't kick drives on read errors but instead recreate
the data from parity. You should probably send "repair" to the md device
(echo repair > /sys/block/mdX/md/sync_action) and see if that fixes the bad
blocks. I believe this came in 2.6.15 or something like that (google it if
you're in that neighbourhood; if you're on 2.6.26 or later you should be fine).

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

^ permalink raw reply	[flat|nested] 48+ messages in thread
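(A minimal sketch of that scrub, assuming the array is md3 as later in this thread; substitute your own mdX:)

echo repair > /sys/block/md3/md/sync_action   # read every stripe, rewrite anything recoverable from parity
cat /proc/mdstat                              # progress of the check/repair pass
cat /sys/block/md3/md/mismatch_cnt            # mismatches found by the last check/repair pass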
* Re: Suggestion needed for fixing RAID6
  2010-04-22 15:00 ` Mikael Abrahamsson
@ 2010-04-22 15:12   ` Janos Haar
  2010-04-22 15:18     ` Mikael Abrahamsson
  0 siblings, 1 reply; 48+ messages in thread
From: Janos Haar @ 2010-04-22 15:12 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: linux-raid

Hi,

----- Original Message ----- From: "Mikael Abrahamsson" <swmike@swm.pp.se>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Thursday, April 22, 2010 5:00 PM
Subject: Re: Suggestion needed for fixing RAID6

> On Thu, 22 Apr 2010, Janos Haar wrote:
>
>> My question is: is there any way to force the array to keep the members
>> in even if they have some read errors?
>
> What version of the kernel are you running? If you're running any reasonably
> recent kernel it shouldn't kick drives on read errors but instead recreate
> the data from parity. You should probably send "repair" to the md device
> (echo repair > /sys/block/mdX/md/sync_action) and see if that fixes the
> bad blocks. I believe this came in 2.6.15 or something like that (google it
> if you're in that neighbourhood; if you're on 2.6.26 or later you should
> be fine).

The kernel is 2.6.28.10.

I have just tested one of the bad-block HDDs, and the bad blocks come
periodically, like a little short scratch, and the drive can't correct them
by writing. Maybe this is why the kernel kicks it out...

But anyway, the problem is still there: I want to rebuild the missing disk
(prior to replacing the bad-blocked drives one by one), but the kernel kicks
out 2 more drives during the rebuild.

Thanks for the idea,
Janos

>
> --
> Mikael Abrahamsson email: swmike@swm.pp.se

^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-22 15:12 ` Janos Haar @ 2010-04-22 15:18 ` Mikael Abrahamsson 2010-04-22 16:25 ` Janos Haar 2010-04-22 16:32 ` Peter Rabbitson 0 siblings, 2 replies; 48+ messages in thread From: Mikael Abrahamsson @ 2010-04-22 15:18 UTC (permalink / raw) To: Janos Haar; +Cc: linux-raid On Thu, 22 Apr 2010, Janos Haar wrote: > I am just tested one of the badblock-hdds, and the bad blocks comes > periodicaly, like a little and short scratch, and the drive can't correct > these by write. Oh, if you get write errors on the drive then you're in bigger trouble. > Maybe this is why the kernel kicsk it out... Yes, a write error to the drive is a kick:able offence. What does smartctl say about the drives? > But anyway, the problem is still here, i want to rebuild the missing disk > (prior to replace the badblocked drives one by one), but the kernel kicks out > more 2 drive during the rebuild. I don't have a good idea that assures your data, unfortunately. One way would be to dd the defective drives to working ones, but that will most likely cause you to have data loss on the defective sectors (since md has no idea that these sectors should be re-created from parity). -- Mikael Abrahamsson email: swmike@swm.pp.se ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
  2010-04-22 15:18 ` Mikael Abrahamsson
@ 2010-04-22 16:25   ` Janos Haar
  2010-04-22 16:32   ` Peter Rabbitson
  1 sibling, 0 replies; 48+ messages in thread
From: Janos Haar @ 2010-04-22 16:25 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: linux-raid

----- Original Message ----- From: "Mikael Abrahamsson" <swmike@swm.pp.se>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Thursday, April 22, 2010 5:18 PM
Subject: Re: Suggestion needed for fixing RAID6

> On Thu, 22 Apr 2010, Janos Haar wrote:
>
>> I have just tested one of the bad-block HDDs, and the bad blocks come
>> periodically, like a little short scratch, and the drive can't correct
>> them by writing.
>
> Oh, if you get write errors on the drive then you're in bigger trouble.

I am planning to replace all the defective drives, but first I need to rebuild
the missing part. I don't care which problem it is; the first drive has 123
unreadable sectors, and I have tried to rewrite one of them, but it doesn't
work. That drive will go to RMA, but first I need to solve the problem.

>
>> Maybe this is why the kernel kicks it out...
>
> Yes, a write error to the drive is a kick:able offence. What does smartctl
> say about the drives?

The SMART health is good (no surprise...), but the drive has some offline
uncorrectable sectors and some pending ones.

>
>> But anyway, the problem is still there: I want to rebuild the missing disk
>> (prior to replacing the bad-blocked drives one by one), but the kernel
>> kicks out 2 more drives during the rebuild.
>
> I don't have a good idea that assures your data, unfortunately. One way
> would be to dd the defective drives to working ones, but that will most
> likely cause you to have data loss on the defective sectors (since md has
> no idea that these sectors should be re-created from parity).

Exactly. This is why I am asking here. :-)
Because I don't want to put a few KB of errors onto an array which still has
all the needed information.

Any good idea?

Thanks a lot,
Janos

>
> --
> Mikael Abrahamsson email: swmike@swm.pp.se
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-22 15:18 ` Mikael Abrahamsson 2010-04-22 16:25 ` Janos Haar @ 2010-04-22 16:32 ` Peter Rabbitson 1 sibling, 0 replies; 48+ messages in thread From: Peter Rabbitson @ 2010-04-22 16:32 UTC (permalink / raw) To: Mikael Abrahamsson; +Cc: Janos Haar, linux-raid Mikael Abrahamsson wrote: > On Thu, 22 Apr 2010, Janos Haar wrote: > > I don't have a good idea that assures your data, unfortunately. One way > would be to dd the defective drives to working ones, but that will most > likely cause you to have data loss on the defective sectors (since md > has no idea that these sectors should be re-created from parity). > There was a thread[1] some time ago, where HPA confirmed that the RAID6 data is sufficient to write an algorithm which will be able to determine which sector is in fact the offending one. There wasn't any interest to incorporate this into the sync_action/repair function though :( [1] http://www.mail-archive.com/linux-raid@vger.kernel.org/msg07327.html ^ permalink raw reply [flat|nested] 48+ messages in thread
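(Background sketch of the math behind that claim, in the notation of H. Peter Anvin's RAID-6 paper; this is only the reasoning from the linked thread, not something md's repair path implements:)

  P = \bigoplus_{i=0}^{n-1} D_i, \qquad
  Q = \bigoplus_{i=0}^{n-1} g^{i} D_i \qquad (\text{arithmetic in } \mathrm{GF}(2^8))

If exactly one data block D_z is silently corrupt, i.e. X_z is read instead of D_z, recompute P' and Q' from the blocks as read. Then

  P \oplus P' = D_z \oplus X_z, \qquad
  Q \oplus Q' = g^{z}\,(D_z \oplus X_z)
  \;\Rightarrow\; g^{z} = (Q \oplus Q') \cdot (P \oplus P')^{-1},

so z identifies the offending member, and D_z = X_z \oplus (P \oplus P') recovers its data.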
[parent not found: <4BD0AF2D.90207@stud.tu-ilmenau.de>]
* Re: Suggestion needed for fixing RAID6 [not found] ` <4BD0AF2D.90207@stud.tu-ilmenau.de> @ 2010-04-22 20:48 ` Janos Haar 0 siblings, 0 replies; 48+ messages in thread From: Janos Haar @ 2010-04-22 20:48 UTC (permalink / raw) To: st0ff; +Cc: linux-raid Hi, ----- Original Message ----- From: "Stefan /*St0fF*/ Hübner" <stefan.huebner@stud.tu-ilmenau.de> To: "Janos Haar" <janos.haar@netcenter.hu> Sent: Thursday, April 22, 2010 10:18 PM Subject: Re: Suggestion needed for fixing RAID6 > Hi Janos, > > I'd ddrescue the failing drives one by one to replacement drives. Set a > very high retry-count for this action. I know what am i doing, trust me. ;-) I have much more professional tools for this than the ddrescue, and i have the list of defective sectors as well. Now i am imaging the second of the failing drives, and this one have >1800 failing sectors. > > The logfile ddrescue creates shows the unreadable sectors afterwards. > The hard part would now be to incorporate the raid-algorithm into some > tool to just restore the missing sectors... I can do that, but it is not a good game for 15TB array or even some hundred of sectors to fix by hand.... The linux md knows how to recalculate these errors, i want to find this way....somehow... I am thinking of making RAID1 from the defective drives, and if the kernel will re-write the sectors, the copy will get it. But i don't know how to prevent the copy to read it. :-/ Thanks for your suggestions, Janos > > I hope this helps a bit. > Stefan > > Am 22.04.2010 12:09, schrieb Janos Haar: >> Hello Neil, list, >> >> I am trying to fix one RAID6 array wich have 12x1.5TB (samsung) drives. >> Actually the array have 1 missing drive, and 3 wich have some bad >> sectors! >> Genearlly because it is RAID6 there is no data lost, because the bad >> sectors are not in one address line, but i can't rebuild the missing >> drive, because the kernel drops out the bad sector-drives one by one >> during the rebuild process. >> >> My question is, there is any way, to force the array to keep the members >> in even if have some reading errors? >> Or is there a way to re-add the bad sector drives after the kernel >> dropped out without stopping the rebuild process? >> In normal way after 18 hour sync, @ 97.9% the 3rd drive is always >> dropped out and the rebuild stops. >> >> Thanks, >> Janos Haar >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-22 10:09 Suggestion needed for fixing RAID6 Janos Haar 2010-04-22 15:00 ` Mikael Abrahamsson [not found] ` <4BD0AF2D.90207@stud.tu-ilmenau.de> @ 2010-04-23 6:51 ` Luca Berra 2010-04-23 8:47 ` Janos Haar 2 siblings, 1 reply; 48+ messages in thread From: Luca Berra @ 2010-04-23 6:51 UTC (permalink / raw) To: linux-raid On Thu, Apr 22, 2010 at 12:09:08PM +0200, Janos Haar wrote: > Hello Neil, list, > > I am trying to fix one RAID6 array wich have 12x1.5TB (samsung) drives. > Actually the array have 1 missing drive, and 3 wich have some bad sectors! > Genearlly because it is RAID6 there is no data lost, because the bad > sectors are not in one address line, but i can't rebuild the missing drive, > because the kernel drops out the bad sector-drives one by one during the > rebuild process. I would seriously consider moving the data out of that array and dumping all drives from that batch, and this is gonna be painful because you must watch drives being dropped and add them back, and yes you need the resources to store the data. ddrescue won't obviously work, since it will mask read errors and turn those into data corruption the raid 1 trick wont work, as you noted another option could be using the device mapper snapshot-merge target (writable snapshot), which iirc is a 2.6.33+ feature look at http://smorgasbord.gavagai.nl/2010/03/online-merging-of-cow-volumes-with-dm-snapshot/ for hints. btw i have no clue how the scsi error will travel thru the dm layer. L. -- Luca Berra -- bluca@comedia.it Communication Media & Services S.r.l. /"\ \ / ASCII RIBBON CAMPAIGN X AGAINST HTML MAIL / \ ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-23 6:51 ` Luca Berra @ 2010-04-23 8:47 ` Janos Haar 2010-04-23 12:34 ` MRK 0 siblings, 1 reply; 48+ messages in thread From: Janos Haar @ 2010-04-23 8:47 UTC (permalink / raw) To: Luca Berra; +Cc: linux-raid ----- Original Message ----- From: "Luca Berra" <bluca@comedia.it> To: <linux-raid@vger.kernel.org> Sent: Friday, April 23, 2010 8:51 AM Subject: Re: Suggestion needed for fixing RAID6 > On Thu, Apr 22, 2010 at 12:09:08PM +0200, Janos Haar wrote: >> Hello Neil, list, >> >> I am trying to fix one RAID6 array wich have 12x1.5TB (samsung) drives. >> Actually the array have 1 missing drive, and 3 wich have some bad >> sectors! >> Genearlly because it is RAID6 there is no data lost, because the bad >> sectors are not in one address line, but i can't rebuild the missing >> drive, because the kernel drops out the bad sector-drives one by one >> during the rebuild process. > > I would seriously consider moving the data out of that array and dumping > all drives from that batch, and this is gonna be painful because you > must watch drives being dropped and add them back, and yes you need the > resources to store the data. > > ddrescue won't obviously work, since it will mask read errors and turn > those into data corruption > > the raid 1 trick wont work, as you noted > > another option could be using the device mapper snapshot-merge target > (writable snapshot), which iirc is a 2.6.33+ feature > look at > http://smorgasbord.gavagai.nl/2010/03/online-merging-of-cow-volumes-with-dm-snapshot/ > for hints. > btw i have no clue how the scsi error will travel thru the dm layer. > L. ...or cowloop! :-) This is a good idea! :-) Thank you. I have another one: re-create the array (--assume-clean) with external bitmap, than drop the missing drive. Than manually manipulate the bitmap file to re-sync only the last 10% wich is good enough for me... Thanks again, Janos > > -- > Luca Berra -- bluca@comedia.it > Communication Media & Services S.r.l. > /"\ > \ / ASCII RIBBON CAMPAIGN > X AGAINST HTML MAIL > / \ > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-23 8:47 ` Janos Haar @ 2010-04-23 12:34 ` MRK 2010-04-24 19:36 ` Janos Haar 0 siblings, 1 reply; 48+ messages in thread From: MRK @ 2010-04-23 12:34 UTC (permalink / raw) To: Janos Haar; +Cc: linux-raid On 04/23/2010 10:47 AM, Janos Haar wrote: > > ----- Original Message ----- From: "Luca Berra" <bluca@comedia.it> > To: <linux-raid@vger.kernel.org> > Sent: Friday, April 23, 2010 8:51 AM > Subject: Re: Suggestion needed for fixing RAID6 > > >> another option could be using the device mapper snapshot-merge target >> (writable snapshot), which iirc is a 2.6.33+ feature >> look at >> http://smorgasbord.gavagai.nl/2010/03/online-merging-of-cow-volumes-with-dm-snapshot/ >> >> for hints. >> btw i have no clue how the scsi error will travel thru the dm layer. >> L. > > ...or cowloop! :-) > This is a good idea! :-) > Thank you. > > I have another one: > re-create the array (--assume-clean) with external bitmap, than drop > the missing drive. > Than manually manipulate the bitmap file to re-sync only the last 10% > wich is good enough for me... Cowloop is kinda deprecated in favour of DM, says wikipedia, and messing with the bitmap looks complicated to me. I think Luca's is a great suggestion. You can use 3 files with loop-device so to store the COW devices for the 3 disks which are faulty. So that writes go there and you can complete the resync. Then you would fail the cow devices one by one from mdadm and replicate to spares. But this will work ONLY if read errors are still be reported across the DM-snapshot thingo. Otherwise (if it e.g. returns a block of zeroes without error) you are eventually going to get data corruption when replacing drives. You can check if read errors are reported, by looking at the dmesg during the resync. If you see many "read error corrected..." it works, while if it's silent it means it hasn't received read errors which means that it doesn't work. If it doesn't work DO NOT go ahead replacing drives, or you will get data corruption. So you need an initial test which just performs a resync but *without* replicating to a spare. So I suggest you first remove all the spares from the array, then create the COW snapshots, then assemble the array, perform a resync, look at the dmesg. If it works: add the spares back, fail one drive, etc. If this technique works this would be useful for everybody, so pls keep us informed!! Thank you ^ permalink raw reply [flat|nested] 48+ messages in thread
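(A rough sketch of the setup being proposed here, modelled on the commands Janos posts further down in the thread; the file name, size and /dev/sde4 are placeholders, and one snapshot is needed per failing member:)

# sparse backing file to catch the writes md makes during the resync
truncate -s 2000G /snapshot.bin
losetup /dev/loop3 /snapshot.bin

# writable snapshot: reads come from /dev/sde4, writes land in the COW file
echo 0 $(blockdev --getsize /dev/sde4) snapshot /dev/sde4 /dev/loop3 p 8 | \
    dmsetup create cow

# repeat with its own backing file and loop device for every other
# bad-sector member, then assemble the array from the /dev/mapper/*
# devices instead of the raw partitions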
* Re: Suggestion needed for fixing RAID6 2010-04-23 12:34 ` MRK @ 2010-04-24 19:36 ` Janos Haar 2010-04-24 22:47 ` MRK 0 siblings, 1 reply; 48+ messages in thread From: Janos Haar @ 2010-04-24 19:36 UTC (permalink / raw) To: MRK; +Cc: linux-raid ----- Original Message ----- From: "MRK" <mrk@shiftmail.org> To: "Janos Haar" <janos.haar@netcenter.hu> Cc: "linux-raid" <linux-raid@vger.kernel.org> Sent: Friday, April 23, 2010 2:34 PM Subject: Re: Suggestion needed for fixing RAID6 > On 04/23/2010 10:47 AM, Janos Haar wrote: >> >> ----- Original Message ----- From: "Luca Berra" <bluca@comedia.it> >> To: <linux-raid@vger.kernel.org> >> Sent: Friday, April 23, 2010 8:51 AM >> Subject: Re: Suggestion needed for fixing RAID6 >> >> >>> another option could be using the device mapper snapshot-merge target >>> (writable snapshot), which iirc is a 2.6.33+ feature >>> look at >>> http://smorgasbord.gavagai.nl/2010/03/online-merging-of-cow-volumes-with-dm-snapshot/ >>> for hints. >>> btw i have no clue how the scsi error will travel thru the dm layer. >>> L. >> >> ...or cowloop! :-) >> This is a good idea! :-) >> Thank you. >> >> I have another one: >> re-create the array (--assume-clean) with external bitmap, than drop the >> missing drive. >> Than manually manipulate the bitmap file to re-sync only the last 10% >> wich is good enough for me... > > > Cowloop is kinda deprecated in favour of DM, says wikipedia, and messing > with the bitmap looks complicated to me. Hi, I think i will like again this idea... :-D > I think Luca's is a great suggestion. You can use 3 files with loop-device > so to store the COW devices for the 3 disks which are faulty. So that > writes go there and you can complete the resync. > Then you would fail the cow devices one by one from mdadm and replicate to > spares. > > But this will work ONLY if read errors are still be reported across the > DM-snapshot thingo. Otherwise (if it e.g. returns a block of zeroes > without error) you are eventually going to get data corruption when > replacing drives. > > You can check if read errors are reported, by looking at the dmesg during > the resync. If you see many "read error corrected..." it works, while if > it's silent it means it hasn't received read errors which means that it > doesn't work. If it doesn't work DO NOT go ahead replacing drives, or you > will get data corruption. > > So you need an initial test which just performs a resync but *without* > replicating to a spare. So I suggest you first remove all the spares from > the array, then create the COW snapshots, then assemble the array, perform > a resync, look at the dmesg. If it works: add the spares back, fail one > drive, etc. > > If this technique works this would be useful for everybody, so pls keep us > informed!! Ok, i am doing it. I think i have found some interesting, what is unexpected: After 99.9% (and another 1800minute) the array is dropped the dm-snapshot structure! ata5.00: exception Emask 0x0 SAct 0x7fa1 SErr 0x0 action 0x0 ata5.00: irq_stat 0x40000008 ata5.00: cmd 60/d8:38:1d:e7:90/00:00:ae:00:00/40 tag 7 ncq 110592 in res 41/40:7a:7b:e7:90/6c:00:ae:00:00/40 Emask 0x409 (media error) <F> ata5.00: status: { DRDY ERR } ata5.00: error: { UNC } ata5.00: configured for UDMA/133 ata5: EH complete ... 
sd 4:0:0:0: [sde] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB) sd 4:0:0:0: [sde] Write Protect is off sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00 sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA ata5.00: exception Emask 0x0 SAct 0x3ff SErr 0x0 action 0x0 ata5.00: irq_stat 0x40000008 ata5.00: cmd 60/d8:38:1d:e7:90/00:00:ae:00:00/40 tag 7 ncq 110592 in res 41/40:7a:7b:e7:90/6c:00:ae:00:00/40 Emask 0x409 (media error) <F> ata5.00: status: { DRDY ERR } ata5.00: error: { UNC } ata5.00: configured for UDMA/133 sd 4:0:0:0: [sde] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK sd 4:0:0:0: [sde] Sense Key : Medium Error [current] [descriptor] Descriptor sense data with sense descriptors (in hex): 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 ae 90 e7 7b sd 4:0:0:0: [sde] Add. Sense: Unrecovered read error - auto reallocate failed end_request: I/O error, dev sde, sector 2928732027 __ratelimit: 16 callbacks suppressed raid5:md3: read error not correctable (sector 2923767936 on dm-0). raid5: Disk failure on dm-0, disabling device. raid5: Operation continuing on 9 devices. md: md3: recovery done. raid5:md3: read error not correctable (sector 2923767944 on dm-0). raid5:md3: read error not correctable (sector 2923767952 on dm-0). raid5:md3: read error not correctable (sector 2923767960 on dm-0). raid5:md3: read error not correctable (sector 2923767968 on dm-0). raid5:md3: read error not correctable (sector 2923767976 on dm-0). raid5:md3: read error not correctable (sector 2923767984 on dm-0). raid5:md3: read error not correctable (sector 2923767992 on dm-0). raid5:md3: read error not correctable (sector 2923768000 on dm-0). raid5:md3: read error not correctable (sector 2923768008 on dm-0). ata5: EH complete sd 4:0:0:0: [sde] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB) sd 4:0:0:0: [sde] Write Protect is off sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00 sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA ata5.00: exception Emask 0x0 SAct 0x1e1 SErr 0x0 action 0x0 ata5.00: irq_stat 0x40000008 ata5.00: cmd 60/00:28:f5:e8:90/01:00:ae:00:00/40 tag 5 ncq 131072 in res 41/40:27:ce:e9:90/6c:00:ae:00:00/40 Emask 0x409 (media error) <F> ata5.00: status: { DRDY ERR } ata5.00: error: { UNC } ata5.00: configured for UDMA/133 ata5: EH complete sd 4:0:0:0: [sde] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB) sd 4:0:0:0: [sde] Write Protect is off sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00 sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA RAID5 conf printout: --- rd:12 wd:9 disk 0, o:1, dev:sda4 disk 1, o:1, dev:sdb4 disk 2, o:1, dev:sdc4 disk 3, o:1, dev:sdd4 disk 4, o:0, dev:dm-0 disk 5, o:1, dev:sdf4 disk 6, o:1, dev:sdg4 disk 8, o:1, dev:sdi4 disk 9, o:1, dev:sdj4 disk 10, o:1, dev:sdk4 disk 11, o:1, dev:sdl4 RAID5 conf printout: --- rd:12 wd:9 disk 0, o:1, dev:sda4 disk 1, o:1, dev:sdb4 disk 2, o:1, dev:sdc4 disk 4, o:0, dev:dm-0 disk 5, o:1, dev:sdf4 disk 6, o:1, dev:sdg4 disk 8, o:1, dev:sdi4 disk 9, o:1, dev:sdj4 disk 10, o:1, dev:sdk4 disk 11, o:1, dev:sdl4 RAID5 conf printout: --- rd:12 wd:9 disk 0, o:1, dev:sda4 disk 1, o:1, dev:sdb4 disk 2, o:1, dev:sdc4 disk 4, o:0, dev:dm-0 disk 5, o:1, dev:sdf4 disk 6, o:1, dev:sdg4 disk 8, o:1, dev:sdi4 disk 9, o:1, dev:sdj4 disk 10, o:1, dev:sdk4 disk 11, o:1, dev:sdl4 RAID5 conf printout: --- rd:12 wd:9 disk 0, o:1, dev:sda4 disk 1, o:1, dev:sdb4 disk 2, o:1, dev:sdc4 disk 5, o:1, dev:sdf4 disk 6, o:1, dev:sdg4 disk 8, 
o:1, dev:sdi4 disk 9, o:1, dev:sdj4 disk 10, o:1, dev:sdk4 disk 11, o:1, dev:sdl4

So, dm-0 was dropped only for a _READ_ error!
Kernel 2.6.28.10.

Now I am trying to do a repair resync before rebuilding the missing drive...

Cheers,
Janos

> Thank you
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-24 19:36 ` Janos Haar @ 2010-04-24 22:47 ` MRK 2010-04-25 10:00 ` Janos Haar 0 siblings, 1 reply; 48+ messages in thread From: MRK @ 2010-04-24 22:47 UTC (permalink / raw) To: Janos Haar; +Cc: linux-raid On 04/24/2010 09:36 PM, Janos Haar wrote: > > Ok, i am doing it. > > I think i have found some interesting, what is unexpected: > After 99.9% (and another 1800minute) the array is dropped the > dm-snapshot structure! > > ...[CUT]... > > raid5:md3: read error not correctable (sector 2923767944 on dm-0). > raid5:md3: read error not correctable (sector 2923767952 on dm-0). > raid5:md3: read error not correctable (sector 2923767960 on dm-0). > raid5:md3: read error not correctable (sector 2923767968 on dm-0). > raid5:md3: read error not correctable (sector 2923767976 on dm-0). > raid5:md3: read error not correctable (sector 2923767984 on dm-0). > raid5:md3: read error not correctable (sector 2923767992 on dm-0). > raid5:md3: read error not correctable (sector 2923768000 on dm-0). > > ...[CUT]... > > So, the dm-0 is dropped only for _READ_ error! Actually no, it is being dropped for "uncorrectable read error" which means, AFAIK, that the read error was received, then the block was recomputed from the other disks, then a rewrite of the damaged block was attempted, and such *write* failed. So it is being dropped for a *write* error. People correct me if I'm wrong. This is strange because the write should have gone to the cow device. Are you sure you did everything correctly with DM? Could you post here how you created the dm-0 device? We might ask to the DM people why it's not working maybe. Anyway there is one good news, and it's that the read error apparently does travel through the DM stack. Thanks for your work ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-24 22:47 ` MRK @ 2010-04-25 10:00 ` Janos Haar 2010-04-26 10:24 ` MRK 0 siblings, 1 reply; 48+ messages in thread From: Janos Haar @ 2010-04-25 10:00 UTC (permalink / raw) To: MRK; +Cc: linux-raid ----- Original Message ----- From: "MRK" <mrk@shiftmail.org> To: "Janos Haar" <janos.haar@netcenter.hu> Cc: <linux-raid@vger.kernel.org> Sent: Sunday, April 25, 2010 12:47 AM Subject: Re: Suggestion needed for fixing RAID6 Just a little note: The repair-sync action failed similar way too. :-( > On 04/24/2010 09:36 PM, Janos Haar wrote: >> >> Ok, i am doing it. >> >> I think i have found some interesting, what is unexpected: >> After 99.9% (and another 1800minute) the array is dropped the dm-snapshot >> structure! >> >> ...[CUT]... >> >> raid5:md3: read error not correctable (sector 2923767944 on dm-0). >> raid5:md3: read error not correctable (sector 2923767952 on dm-0). >> raid5:md3: read error not correctable (sector 2923767960 on dm-0). >> raid5:md3: read error not correctable (sector 2923767968 on dm-0). >> raid5:md3: read error not correctable (sector 2923767976 on dm-0). >> raid5:md3: read error not correctable (sector 2923767984 on dm-0). >> raid5:md3: read error not correctable (sector 2923767992 on dm-0). >> raid5:md3: read error not correctable (sector 2923768000 on dm-0). >> >> ...[CUT]... >> >> So, the dm-0 is dropped only for _READ_ error! > > Actually no, it is being dropped for "uncorrectable read error" which > means, AFAIK, that the read error was received, then the block was > recomputed from the other disks, then a rewrite of the damaged block was > attempted, and such *write* failed. So it is being dropped for a *write* > error. People correct me if I'm wrong. I think i can try: # dd_rescue -v /dev/zero -S $((2923767944 / 2))k /dev/mapper/cow -m 4k dd_rescue: (info): about to transfer 4.0 kBytes from /dev/zero to /dev/mapper/cow dd_rescue: (info): blocksizes: soft 65536, hard 512 dd_rescue: (info): starting positions: in 0.0k, out 1461883972.0k dd_rescue: (info): Logfile: (none), Maxerr: 0 dd_rescue: (info): Reverse: no , Trunc: no , interactive: no dd_rescue: (info): abort on Write errs: no , spArse write: if err dd_rescue: (info): ipos: 0.0k, opos:1461883972.0k, xferd: 0.0k errs: 0, errxfer: 0.0k, succxfer: 0.0k +curr.rate: 0kB/s, avg.rate: 0kB/s, avg.load: 0.0% Summary for /dev/zero -> /dev/mapper/cow: dd_rescue: (info): ipos: 4.0k, opos:1461883976.0k, xferd: 4.0k errs: 0, errxfer: 0.0k, succxfer: 4.0k +curr.rate: 203kB/s, avg.rate: 203kB/s, avg.load: 0.0% > > This is strange because the write should have gone to the cow device. Are > you sure you did everything correctly with DM? Could you post here how you > created the dm-0 device? echo 0 $(blockdev --getsize /dev/sde4) \ snapshot /dev/sde4 /dev/loop3 p 8 | \ dmsetup create cow ]# losetup /dev/loop3 /dev/loop3: [0901]:55091517 (/snapshot.bin) /snapshot.bin is a sparse file with 2000G seeked size. I have 3.6GB free space in / so the out of space is not an option. :-) I think this is correct. :-) But anyway, i have pre-tested it with fdisk and works. > > We might ask to the DM people why it's not working maybe. Anyway there is > one good news, and it's that the read error apparently does travel through > the DM stack. For me, this looks like md's bug not dm's problem. The "uncorrectable read error" means exactly the drive can't correct the damaged sector with ECC, and this is an unreadable sector. 
(pending in the SMART table.)
The "auto reallocate failed" on a read does not mean the sector cannot be
reallocated by rewriting it! Most drives do not do read-reallocation, only
write-reallocation. The drives which do read-reallocation do it because the
sector was hard to recover (maybe it needed extra rotations, extra
repositioning, too much time) and was remapped automatically, BUT such sectors
are NOT reported to the PC as a read error (UNC), so they must NOT appear in
the log...

I am glad if I can help to fix this bug, but please keep in mind that this
RAID array is a production system, and my customer gets more and more nervous
day by day...
I need a good solution for fixing this array so that I can safely replace the
bad drives without any data loss!

Does somebody have a good idea which does not involve copying the entire
(15TB) array?

Thanks a lot,
Janos Haar

>
> Thanks for your work
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread
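(Side note on this setup, a sketch only: the snapshot becomes invalid if its COW store fills up, and a sparse /snapshot.bin can only grow as far as the filesystem under it has free space, so both are worth watching during the resync:)

dmsetup status cow       # snapshot target reports roughly "used/total" COW sectors
du -h /snapshot.bin      # blocks actually allocated to the sparse backing file so far
df -h /                  # free space left for it to grow into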
* Re: Suggestion needed for fixing RAID6 2010-04-25 10:00 ` Janos Haar @ 2010-04-26 10:24 ` MRK 2010-04-26 12:52 ` Janos Haar 0 siblings, 1 reply; 48+ messages in thread From: MRK @ 2010-04-26 10:24 UTC (permalink / raw) To: Janos Haar; +Cc: linux-raid On 04/25/2010 12:00 PM, Janos Haar wrote: > > ----- Original Message ----- From: "MRK" <mrk@shiftmail.org> > To: "Janos Haar" <janos.haar@netcenter.hu> > Cc: <linux-raid@vger.kernel.org> > Sent: Sunday, April 25, 2010 12:47 AM > Subject: Re: Suggestion needed for fixing RAID6 > > Just a little note: > > The repair-sync action failed similar way too. :-( > > >> On 04/24/2010 09:36 PM, Janos Haar wrote: >>> >>> Ok, i am doing it. >>> >>> I think i have found some interesting, what is unexpected: >>> After 99.9% (and another 1800minute) the array is dropped the >>> dm-snapshot structure! >>> >>> ...[CUT]... >>> >>> raid5:md3: read error not correctable (sector 2923767944 on dm-0). >>> raid5:md3: read error not correctable (sector 2923767952 on dm-0). >>> raid5:md3: read error not correctable (sector 2923767960 on dm-0). >>> raid5:md3: read error not correctable (sector 2923767968 on dm-0). >>> raid5:md3: read error not correctable (sector 2923767976 on dm-0). >>> raid5:md3: read error not correctable (sector 2923767984 on dm-0). >>> raid5:md3: read error not correctable (sector 2923767992 on dm-0). >>> raid5:md3: read error not correctable (sector 2923768000 on dm-0). >>> >>> ...[CUT]... >>> > Remember this exact error message: "read error not correctable" > >> >> This is strange because the write should have gone to the cow device. >> Are you sure you did everything correctly with DM? Could you post >> here how you created the dm-0 device? > > echo 0 $(blockdev --getsize /dev/sde4) \ > snapshot /dev/sde4 /dev/loop3 p 8 | \ > dmsetup create cow > Seems correct to me... > ]# losetup /dev/loop3 > /dev/loop3: [0901]:55091517 (/snapshot.bin) > This line comes BEFORE the other one, right? > /snapshot.bin is a sparse file with 2000G seeked size. > I have 3.6GB free space in / so the out of space is not an option. :-) > > [...] > >> >> We might ask to the DM people why it's not working maybe. Anyway >> there is one good news, and it's that the read error apparently does >> travel through the DM stack. > > For me, this looks like md's bug not dm's problem. > The "uncorrectable read error" means exactly the drive can't correct > the damaged sector with ECC, and this is an unreadable sector. > (pending in smart table) > The auto read reallocation failed not meas the sector is not > re-allocatable by rewriting it! > The most of the drives doesn't do read-reallocation only > write-reallocation. > > These drives wich does read reallocation, does it because the sector > was hard to re-calculate (maybe needed more rotation, more > repositioning, too much time) and moved automatically, BUT those > sectors ARE NOT reported to the pc as read-error (UNC), so must NOT > appear in the log... > No the error message really comes from MD. Can you read C code? Go into the kernel source and look this file: linux_source_dir/drivers/md/raid5.c (file raid5.c is also for raid6) search for "read error not correctable" What you see there is the reason for failure. You see the line "if (conf->mddev->degraded)" just above? I think your mistake was that you did the DM COW trick only on the last device, or anyway one device only, instead you should have done it on all 3 devices which were failing. 
It did not work for you because at the moment you got the read error on the last disk, two disks were already dropped from the array, the array was doubly degraded, and it's not possible to correct a read error if the array is degraded because you don't have enough parity information to recover the data for that sector. You should have prevented also the first two disks from dropping. Do the DM trick on all of them simultaneously, or at least on 2 of them (if you are sure only 3 disks have problems), start the array making sure it starts with all devices online i.e. nondegraded, then start the resync, and I think it will work. > I am glad if i can help to fix this but, but please keep this in mind, > this raid array is a productive system, and my customer gets more and > more nervous day by day... > I need a good solution for fixing this array to safely replace the bad > drives without any data lost! > > Somebody have any good idea wich is not copy the entire (15TB) array? I don't think there is another way. You need to make this work. Good luck ^ permalink raw reply [flat|nested] 48+ messages in thread
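(A sketch of that plan with placeholder device names, assuming the two remaining bad-sector members sit behind snapshot targets /dev/mapper/cow0 and /dev/mapper/cow1 and the healthy members are the raw partitions; the point is that the array comes up with 11 of 12 members, i.e. only singly degraded, before the rebuild of the missing disk starts:)

mdadm --assemble /dev/md3 /dev/sda4 /dev/sdb4 /dev/sdc4 /dev/sdf4 /dev/sdg4 \
    /dev/sdi4 /dev/sdj4 /dev/sdk4 /dev/sdl4 /dev/mapper/cow0 /dev/mapper/cow1
mdadm --detail /dev/md3          # expect 11 active devices out of 12
mdadm --add /dev/md3 /dev/sdX4   # replacement for the missing member; recovery starts
cat /proc/mdstat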
* Re: Suggestion needed for fixing RAID6 2010-04-26 10:24 ` MRK @ 2010-04-26 12:52 ` Janos Haar 2010-04-26 16:53 ` MRK 0 siblings, 1 reply; 48+ messages in thread From: Janos Haar @ 2010-04-26 12:52 UTC (permalink / raw) To: MRK; +Cc: linux-raid ----- Original Message ----- From: "MRK" <mrk@shiftmail.org> To: "Janos Haar" <janos.haar@netcenter.hu> Cc: <linux-raid@vger.kernel.org> Sent: Monday, April 26, 2010 12:24 PM Subject: Re: Suggestion needed for fixing RAID6 > On 04/25/2010 12:00 PM, Janos Haar wrote: >> >> ----- Original Message ----- From: "MRK" <mrk@shiftmail.org> >> To: "Janos Haar" <janos.haar@netcenter.hu> >> Cc: <linux-raid@vger.kernel.org> >> Sent: Sunday, April 25, 2010 12:47 AM >> Subject: Re: Suggestion needed for fixing RAID6 >> >> Just a little note: >> >> The repair-sync action failed similar way too. :-( >> >> >>> On 04/24/2010 09:36 PM, Janos Haar wrote: >>>> >>>> Ok, i am doing it. >>>> >>>> I think i have found some interesting, what is unexpected: >>>> After 99.9% (and another 1800minute) the array is dropped the >>>> dm-snapshot structure! >>>> >>>> ...[CUT]... >>>> >>>> raid5:md3: read error not correctable (sector 2923767944 on dm-0). >>>> raid5:md3: read error not correctable (sector 2923767952 on dm-0). >>>> raid5:md3: read error not correctable (sector 2923767960 on dm-0). >>>> raid5:md3: read error not correctable (sector 2923767968 on dm-0). >>>> raid5:md3: read error not correctable (sector 2923767976 on dm-0). >>>> raid5:md3: read error not correctable (sector 2923767984 on dm-0). >>>> raid5:md3: read error not correctable (sector 2923767992 on dm-0). >>>> raid5:md3: read error not correctable (sector 2923768000 on dm-0). >>>> >>>> ...[CUT]... >>>> >> > > Remember this exact error message: "read error not correctable" > >> >>> >>> This is strange because the write should have gone to the cow device. >>> Are you sure you did everything correctly with DM? Could you post >>> here how you created the dm-0 device? >> >> echo 0 $(blockdev --getsize /dev/sde4) \ >> snapshot /dev/sde4 /dev/loop3 p 8 | \ >> dmsetup create cow >> > > Seems correct to me... > >> ]# losetup /dev/loop3 >> /dev/loop3: [0901]:55091517 (/snapshot.bin) >> > This line comes BEFORE the other one, right? > >> /snapshot.bin is a sparse file with 2000G seeked size. >> I have 3.6GB free space in / so the out of space is not an option. :-) >> >> > [...] >> >>> >>> We might ask to the DM people why it's not working maybe. Anyway >>> there is one good news, and it's that the read error apparently does >>> travel through the DM stack. >> >> For me, this looks like md's bug not dm's problem. >> The "uncorrectable read error" means exactly the drive can't correct >> the damaged sector with ECC, and this is an unreadable sector. >> (pending in smart table) >> The auto read reallocation failed not meas the sector is not >> re-allocatable by rewriting it! >> The most of the drives doesn't do read-reallocation only >> write-reallocation. >> >> These drives wich does read reallocation, does it because the sector >> was hard to re-calculate (maybe needed more rotation, more >> repositioning, too much time) and moved automatically, BUT those >> sectors ARE NOT reported to the pc as read-error (UNC), so must NOT >> appear in the log... >> > > No the error message really comes from MD. Can you read C code? 
Go into > the kernel source and look this file: > > linux_source_dir/drivers/md/raid5.c > > (file raid5.c is also for raid6) search for "read error not correctable" > > What you see there is the reason for failure. You see the line "if > (conf->mddev->degraded)" just above? I think your mistake was that you > did the DM COW trick only on the last device, or anyway one device only, > instead you should have done it on all 3 devices which were failing. > > It did not work for you because at the moment you got the read error on > the last disk, two disks were already dropped from the array, the array > was doubly degraded, and it's not possible to correct a read error if > the array is degraded because you don't have enough parity information > to recover the data for that sector. Oops, you are right! It was my mistake. Sorry, i will try it again, to support 2 drives with dm-cow. I will try it. Thanks again. Janos ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-26 12:52 ` Janos Haar @ 2010-04-26 16:53 ` MRK 2010-04-26 22:39 ` Janos Haar 2010-04-27 15:50 ` Janos Haar 0 siblings, 2 replies; 48+ messages in thread From: MRK @ 2010-04-26 16:53 UTC (permalink / raw) To: Janos Haar; +Cc: linux-raid On 04/26/2010 02:52 PM, Janos Haar wrote: > > Oops, you are right! > It was my mistake. > Sorry, i will try it again, to support 2 drives with dm-cow. > I will try it. Great! post here the results... the dmesg in particular. The dmesg should contain multiple lines like this "raid5:md3: read error corrected ....." then you know it worked. ^ permalink raw reply [flat|nested] 48+ messages in thread
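(A quick way to do that check while the recovery runs, nothing beyond what the thread already describes:)

watch -n 60 cat /proc/mdstat
dmesg | grep -c 'read error corrected'        # should keep climbing as bad sectors get rewritten
dmesg | grep -c 'read error not correctable'  # should stay at 0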
* Re: Suggestion needed for fixing RAID6 2010-04-26 16:53 ` MRK @ 2010-04-26 22:39 ` Janos Haar 2010-04-26 23:06 ` Michael Evans 2010-04-27 15:50 ` Janos Haar 1 sibling, 1 reply; 48+ messages in thread From: Janos Haar @ 2010-04-26 22:39 UTC (permalink / raw) To: MRK; +Cc: linux-raid ----- Original Message ----- From: "MRK" <mrk@shiftmail.org> To: "Janos Haar" <janos.haar@netcenter.hu> Cc: <linux-raid@vger.kernel.org> Sent: Monday, April 26, 2010 6:53 PM Subject: Re: Suggestion needed for fixing RAID6 > On 04/26/2010 02:52 PM, Janos Haar wrote: >> >> Oops, you are right! >> It was my mistake. >> Sorry, i will try it again, to support 2 drives with dm-cow. >> I will try it. > > Great! post here the results... the dmesg in particular. > The dmesg should contain multiple lines like this "raid5:md3: read error > corrected ....." > then you know it worked. md3 : active raid6 sdd4[12] sdl4[11] sdk4[10] sdj4[9] sdi4[8] dm-1[13](F) sdg4[6] sdf4[5] dm-0[14](F) sdc4[2] sdb4[1] sda4[0] 14626538880 blocks level 6, 16k chunk, algorithm 2 [12/9] [UUU__UU_UUUU] [>....................] recovery = 1.5% (22903832/1462653888) finish=3188383.4min speed=7K/sec Khm.... :-D It is working on something or stopped with 3 missing drive? : ^ ) (I have found the cause of the 2 dm's failure. Now retry runs...) Cheers, Janos > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-26 22:39 ` Janos Haar @ 2010-04-26 23:06 ` Michael Evans [not found] ` <7cfd01cae598$419e8d20$0400a8c0@dcccs> 0 siblings, 1 reply; 48+ messages in thread From: Michael Evans @ 2010-04-26 23:06 UTC (permalink / raw) To: Janos Haar; +Cc: MRK, linux-raid On Mon, Apr 26, 2010 at 3:39 PM, Janos Haar <janos.haar@netcenter.hu> wrote: > > ----- Original Message ----- From: "MRK" <mrk@shiftmail.org> > To: "Janos Haar" <janos.haar@netcenter.hu> > Cc: <linux-raid@vger.kernel.org> > Sent: Monday, April 26, 2010 6:53 PM > Subject: Re: Suggestion needed for fixing RAID6 > > >> On 04/26/2010 02:52 PM, Janos Haar wrote: >>> >>> Oops, you are right! >>> It was my mistake. >>> Sorry, i will try it again, to support 2 drives with dm-cow. >>> I will try it. >> >> Great! post here the results... the dmesg in particular. >> The dmesg should contain multiple lines like this "raid5:md3: read error >> corrected ....." >> then you know it worked. > > md3 : active raid6 sdd4[12] sdl4[11] sdk4[10] sdj4[9] sdi4[8] dm-1[13](F) > sdg4[6] sdf4[5] dm-0[14](F) sdc4[2] sdb4[1] sda4[0] > 14626538880 blocks level 6, 16k chunk, algorithm 2 [12/9] [UUU__UU_UUUU] > [>....................] recovery = 1.5% (22903832/1462653888) > finish=3188383.4min speed=7K/sec > > Khm.... :-D > It is working on something or stopped with 3 missing drive? : ^ ) > > (I have found the cause of the 2 dm's failure. > Now retry runs...) > > Cheers, > Janos > > > > >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > What is displayed there seems like it can't be correct. Please run mdadm -Evvs mdadm -Dvvs and provide the results for us. -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 48+ messages in thread
[parent not found: <7cfd01cae598$419e8d20$0400a8c0@dcccs>]
* Re: Suggestion needed for fixing RAID6 [not found] ` <7cfd01cae598$419e8d20$0400a8c0@dcccs> @ 2010-04-27 0:04 ` Michael Evans 0 siblings, 0 replies; 48+ messages in thread From: Michael Evans @ 2010-04-27 0:04 UTC (permalink / raw) To: linux-raid On Mon, Apr 26, 2010 at 4:29 PM, Janos Haar <janos.haar@netcenter.hu> wrote: > > ----- Original Message ----- From: "Michael Evans" <mjevans1983@gmail.com> > To: "Janos Haar" <janos.haar@netcenter.hu> > Cc: "MRK" <mrk@shiftmail.org>; <linux-raid@vger.kernel.org> > Sent: Tuesday, April 27, 2010 1:06 AM > Subject: Re: Suggestion needed for fixing RAID6 > > >> On Mon, Apr 26, 2010 at 3:39 PM, Janos Haar <janos.haar@netcenter.hu> >> wrote: >>> >>> ----- Original Message ----- From: "MRK" <mrk@shiftmail.org> >>> To: "Janos Haar" <janos.haar@netcenter.hu> >>> Cc: <linux-raid@vger.kernel.org> >>> Sent: Monday, April 26, 2010 6:53 PM >>> Subject: Re: Suggestion needed for fixing RAID6 >>> >>> >>>> On 04/26/2010 02:52 PM, Janos Haar wrote: >>>>> >>>>> Oops, you are right! >>>>> It was my mistake. >>>>> Sorry, i will try it again, to support 2 drives with dm-cow. >>>>> I will try it. >>>> >>>> Great! post here the results... the dmesg in particular. >>>> The dmesg should contain multiple lines like this "raid5:md3: read error >>>> corrected ....." >>>> then you know it worked. >>> >>> md3 : active raid6 sdd4[12] sdl4[11] sdk4[10] sdj4[9] sdi4[8] dm-1[13](F) >>> sdg4[6] sdf4[5] dm-0[14](F) sdc4[2] sdb4[1] sda4[0] >>> 14626538880 blocks level 6, 16k chunk, algorithm 2 [12/9] [UUU__UU_UUUU] >>> [>....................] recovery = 1.5% (22903832/1462653888) >>> finish=3188383.4min speed=7K/sec >>> >>> Khm.... :-D >>> It is working on something or stopped with 3 missing drive? : ^ ) >>> >>> (I have found the cause of the 2 dm's failure. >>> Now retry runs...) >>> >>> Cheers, >>> Janos >>> >>> >>> >>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >> >> What is displayed there seems like it can't be correct. Please run >> >> mdadm -Evvs >> >> mdadm -Dvvs >> >> and provide the results for us. > > I have wrongly assigned the dm devices (cross-linked) and the sync process > is freezed. > The snapshot is grown to the maximum of space, than both failed with write > error at the same time with out of space. > The md_sync process is freezed. > (I have to push the reset.) > > I think this is correct what we can see, because the process is freezed > before exit and can't change the state to failed. > > Cheers, > Janos > >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please reply to all. It sounds like you need a LOT more space. Please carefully try again. -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-26 16:53 ` MRK 2010-04-26 22:39 ` Janos Haar @ 2010-04-27 15:50 ` Janos Haar 2010-04-27 23:02 ` MRK 1 sibling, 1 reply; 48+ messages in thread From: Janos Haar @ 2010-04-27 15:50 UTC (permalink / raw) To: MRK; +Cc: linux-raid ----- Original Message ----- From: "MRK" <mrk@shiftmail.org> To: "Janos Haar" <janos.haar@netcenter.hu> Cc: <linux-raid@vger.kernel.org> Sent: Monday, April 26, 2010 6:53 PM Subject: Re: Suggestion needed for fixing RAID6 > On 04/26/2010 02:52 PM, Janos Haar wrote: >> >> Oops, you are right! >> It was my mistake. >> Sorry, i will try it again, to support 2 drives with dm-cow. >> I will try it. > > Great! post here the results... the dmesg in particular. > The dmesg should contain multiple lines like this "raid5:md3: read error > corrected ....." > then you know it worked. I am affraid i am still right about that.... ... end_request: I/O error, dev sdh, sector 1667152256 raid5:md3: read error not correctable (sector 1662188168 on dm-1). raid5: Disk failure on dm-1, disabling device. raid5: Operation continuing on 10 devices. raid5:md3: read error not correctable (sector 1662188176 on dm-1). raid5:md3: read error not correctable (sector 1662188184 on dm-1). raid5:md3: read error not correctable (sector 1662188192 on dm-1). raid5:md3: read error not correctable (sector 1662188200 on dm-1). raid5:md3: read error not correctable (sector 1662188208 on dm-1). raid5:md3: read error not correctable (sector 1662188216 on dm-1). raid5:md3: read error not correctable (sector 1662188224 on dm-1). raid5:md3: read error not correctable (sector 1662188232 on dm-1). raid5:md3: read error not correctable (sector 1662188240 on dm-1). ata8: EH complete sd 7:0:0:0: [sdh] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB) ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 ata8.00: port_status 0x20200000 ata8.00: cmd 25/00:f8:f5:ba:5e/00:03:63:00:00/e0 tag 0 dma 520192 in res 51/40:00:ef:bb:5e/40:00:63:00:00/e0 Emask 0x9 (media error) ata8.00: status: { DRDY ERR } ata8.00: error: { UNC } ata8.00: configured for UDMA/133 ata8: EH complete .... .... sd 7:0:0:0: [sdh] Add. Sense: Unrecovered read error - auto reallocate failed end_request: I/O error, dev sdh, sector 1667152879 __ratelimit: 36 callbacks suppressed raid5:md3: read error not correctable (sector 1662188792 on dm-1). raid5:md3: read error not correctable (sector 1662188800 on dm-1). md: md3: recovery done. raid5:md3: read error not correctable (sector 1662188808 on dm-1). raid5:md3: read error not correctable (sector 1662188816 on dm-1). raid5:md3: read error not correctable (sector 1662188824 on dm-1). raid5:md3: read error not correctable (sector 1662188832 on dm-1). raid5:md3: read error not correctable (sector 1662188840 on dm-1). raid5:md3: read error not correctable (sector 1662188848 on dm-1). raid5:md3: read error not correctable (sector 1662188856 on dm-1). raid5:md3: read error not correctable (sector 1662188864 on dm-1). ata8: EH complete sd 7:0:0:0: [sdh] Write Protect is off sd 7:0:0:0: [sdh] Mode Sense: 00 3a 00 00 sd 7:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 ata8.00: port_status 0x20200000 .... .... 
res 51/40:00:27:c0:5e/40:00:63:00:00/e0 Emask 0x9 (media error) ata8.00: status: { DRDY ERR } ata8.00: error: { UNC } ata8.00: configured for UDMA/133 sd 7:0:0:0: [sdh] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK sd 7:0:0:0: [sdh] Sense Key : Medium Error [current] [descriptor] Descriptor sense data with sense descriptors (in hex): 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 63 5e c0 27 sd 7:0:0:0: [sdh] Add. Sense: Unrecovered read error - auto reallocate failed end_request: I/O error, dev sdh, sector 1667153959 __ratelimit: 86 callbacks suppressed raid5:md3: read error not correctable (sector 1662189872 on dm-1). raid5:md3: read error not correctable (sector 1662189880 on dm-1). raid5:md3: read error not correctable (sector 1662189888 on dm-1). raid5:md3: read error not correctable (sector 1662189896 on dm-1). raid5:md3: read error not correctable (sector 1662189904 on dm-1). raid5:md3: read error not correctable (sector 1662189912 on dm-1). raid5:md3: read error not correctable (sector 1662189920 on dm-1). raid5:md3: read error not correctable (sector 1662189928 on dm-1). raid5:md3: read error not correctable (sector 1662189936 on dm-1). raid5:md3: read error not correctable (sector 1662189944 on dm-1). ata8: EH complete sd 7:0:0:0: [sdh] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB) sd 7:0:0:0: [sdh] Write Protect is off sd 7:0:0:0: [sdh] Mode Sense: 00 3a 00 00 sd 7:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 7:0:0:0: [sdh] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB) sd 7:0:0:0: [sdh] Write Protect is off sd 7:0:0:0: [sdh] Mode Sense: 00 3a 00 00 sd 7:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA RAID5 conf printout: --- rd:12 wd:10 disk 0, o:1, dev:sda4 disk 1, o:1, dev:sdb4 disk 2, o:1, dev:sdc4 disk 3, o:1, dev:sdd4 disk 4, o:1, dev:dm-0 disk 5, o:1, dev:sdf4 disk 6, o:1, dev:sdg4 disk 7, o:0, dev:dm-1 disk 8, o:1, dev:sdi4 disk 9, o:1, dev:sdj4 disk 10, o:1, dev:sdk4 disk 11, o:1, dev:sdl4 RAID5 conf printout: --- rd:12 wd:10 disk 0, o:1, dev:sda4 disk 1, o:1, dev:sdb4 disk 2, o:1, dev:sdc4 disk 3, o:1, dev:sdd4 disk 4, o:1, dev:dm-0 disk 5, o:1, dev:sdf4 disk 6, o:1, dev:sdg4 disk 7, o:0, dev:dm-1 disk 8, o:1, dev:sdi4 disk 9, o:1, dev:sdj4 disk 10, o:1, dev:sdk4 disk 11, o:1, dev:sdl4 RAID5 conf printout: --- rd:12 wd:10 disk 0, o:1, dev:sda4 disk 1, o:1, dev:sdb4 disk 2, o:1, dev:sdc4 disk 3, o:1, dev:sdd4 disk 4, o:1, dev:dm-0 disk 5, o:1, dev:sdf4 disk 6, o:1, dev:sdg4 disk 7, o:0, dev:dm-1 disk 8, o:1, dev:sdi4 disk 9, o:1, dev:sdj4 disk 10, o:1, dev:sdk4 disk 11, o:1, dev:sdl4 RAID5 conf printout: --- rd:12 wd:10 disk 0, o:1, dev:sda4 disk 1, o:1, dev:sdb4 disk 2, o:1, dev:sdc4 disk 3, o:1, dev:sdd4 disk 4, o:1, dev:dm-0 disk 5, o:1, dev:sdf4 disk 6, o:1, dev:sdg4 disk 8, o:1, dev:sdi4 disk 9, o:1, dev:sdj4 disk 10, o:1, dev:sdk4 disk 11, o:1, dev:sdl4 md: recovery of RAID array md3 md: minimum _guaranteed_ speed: 1000 KB/sec/disk. md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery. md: using 128k window, over a total of 1462653888 blocks. md: resuming recovery of md3 from checkpoint. md3 : active raid6 sdd4[12] sdl4[11] sdk4[10] sdj4[9] sdi4[8] dm-1[13](F) sdg4[6] sdf4[5] dm-0[4] sdc4[2] sdb4[1] sda4[0] 14626538880 blocks level 6, 16k chunk, algorithm 2 [12/10] [UUU_UUU_UUUU] [===============>.....] 
recovery = 75.3% (1101853312/1462653888) finish=292.3min speed=20565K/sec

du -h /sna*
1.1M /snapshot2.bin
1.1M /snapshot.bin

df -h
Filesystem   Size  Used Avail Use% Mounted on
/dev/md1      19G   16G  3.5G  82% /
/dev/md0      99M   34M   60M  36% /boot
tmpfs        2.0G     0  2.0G   0% /dev/shm

This is the current state. :-(
This way, the sync will stop again at 97.9%.
Another idea? Or how do I solve this dm-snapshot thing?

I think I know how this can happen: if I am right, the sync uses the normal
block size, which on Linux is usually 4 KB, but the bad blocks are 512 bytes.
Let's take one 4K window as an example: [BGBGBBGG] (B: bad, G: good sector).
The sync reads the block; the reported state is UNC because the drive reported
UNC for some sector in this area. md recalculates the first 512-byte bad block,
because its address is the same as the 4K block's, and rewrites it. Then it
re-reads the 4K block, which is still UNC because the 3rd sector is bad.
Can this be the issue?

Thanks,
Janos

> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-27 15:50 ` Janos Haar @ 2010-04-27 23:02 ` MRK 2010-04-28 1:37 ` Neil Brown 0 siblings, 1 reply; 48+ messages in thread From: MRK @ 2010-04-27 23:02 UTC (permalink / raw) To: Janos Haar; +Cc: linux-raid, Neil Brown On 04/27/2010 05:50 PM, Janos Haar wrote: > > ----- Original Message ----- From: "MRK" <mrk@shiftmail.org> > To: "Janos Haar" <janos.haar@netcenter.hu> > Cc: <linux-raid@vger.kernel.org> > Sent: Monday, April 26, 2010 6:53 PM > Subject: Re: Suggestion needed for fixing RAID6 > > >> On 04/26/2010 02:52 PM, Janos Haar wrote: >>> >>> Oops, you are right! >>> It was my mistake. >>> Sorry, i will try it again, to support 2 drives with dm-cow. >>> I will try it. >> >> Great! post here the results... the dmesg in particular. >> The dmesg should contain multiple lines like this "raid5:md3: read >> error corrected ....." >> then you know it worked. > > I am affraid i am still right about that.... > > ... > end_request: I/O error, dev sdh, sector 1667152256 > raid5:md3: read error not correctable (sector 1662188168 on dm-1). > raid5: Disk failure on dm-1, disabling device. > raid5: Operation continuing on 10 devices. I think I can see a problem here: You had 11 active devices over 12 when you received the read error. At 11 devices over 12 your array is singly-degraded and this should be enough for raid6 to recompute the block from parity and perform the rewrite, correcting the read-error, but instead MD declared that it's impossible to correct the error, and dropped one more device (going to doubly-degraded). I think this is an MD bug, and I think I know where it is: --- linux-2.6.33-vanilla/drivers/md/raid5.c 2010-02-24 19:52:17.000000000 +0100 +++ linux-2.6.33/drivers/md/raid5.c 2010-04-27 23:58:31.000000000 +0200 @@ -1526,7 +1526,7 @@ static void raid5_end_read_request(struc clear_bit(R5_UPTODATE, &sh->dev[i].flags); atomic_inc(&rdev->read_errors); - if (conf->mddev->degraded) + if (conf->mddev->degraded == conf->max_degraded) printk_rl(KERN_WARNING "raid5:%s: read error not correctable " "(sector %llu on %s).\n", ------------------------------------------------------ (This is just compile-tested so try at your risk) I'd like to hear what Neil thinks of this... The problem here (apart from the erroneous error message) is that if execution goes inside that "if" clause, it will eventually reach the md_error() statement some 30 lines below there, which will have the effect of dropping one further device further worsening the situation instead of recovering it, and this is not the correct behaviour in this case as far as I understand. At the current state raid6 behaves like if it was a raid5, effectively supporting only one failed disk. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-27 23:02 ` MRK @ 2010-04-28 1:37 ` Neil Brown 2010-04-28 2:02 ` Mikael Abrahamsson 2010-04-28 12:57 ` MRK 0 siblings, 2 replies; 48+ messages in thread From: Neil Brown @ 2010-04-28 1:37 UTC (permalink / raw) To: MRK; +Cc: Janos Haar, linux-raid On Wed, 28 Apr 2010 01:02:14 +0200 MRK <mrk@shiftmail.org> wrote: > On 04/27/2010 05:50 PM, Janos Haar wrote: > > > > ----- Original Message ----- From: "MRK" <mrk@shiftmail.org> > > To: "Janos Haar" <janos.haar@netcenter.hu> > > Cc: <linux-raid@vger.kernel.org> > > Sent: Monday, April 26, 2010 6:53 PM > > Subject: Re: Suggestion needed for fixing RAID6 > > > > > >> On 04/26/2010 02:52 PM, Janos Haar wrote: > >>> > >>> Oops, you are right! > >>> It was my mistake. > >>> Sorry, i will try it again, to support 2 drives with dm-cow. > >>> I will try it. > >> > >> Great! post here the results... the dmesg in particular. > >> The dmesg should contain multiple lines like this "raid5:md3: read > >> error corrected ....." > >> then you know it worked. > > > > I am affraid i am still right about that.... > > > > ... > > end_request: I/O error, dev sdh, sector 1667152256 > > raid5:md3: read error not correctable (sector 1662188168 on dm-1). > > raid5: Disk failure on dm-1, disabling device. > > raid5: Operation continuing on 10 devices. > > I think I can see a problem here: > You had 11 active devices over 12 when you received the read error. > At 11 devices over 12 your array is singly-degraded and this should be > enough for raid6 to recompute the block from parity and perform the > rewrite, correcting the read-error, but instead MD declared that it's > impossible to correct the error, and dropped one more device (going to > doubly-degraded). > > I think this is an MD bug, and I think I know where it is: > > > --- linux-2.6.33-vanilla/drivers/md/raid5.c 2010-02-24 > 19:52:17.000000000 +0100 > +++ linux-2.6.33/drivers/md/raid5.c 2010-04-27 23:58:31.000000000 +0200 > @@ -1526,7 +1526,7 @@ static void raid5_end_read_request(struc > > clear_bit(R5_UPTODATE, &sh->dev[i].flags); > atomic_inc(&rdev->read_errors); > - if (conf->mddev->degraded) > + if (conf->mddev->degraded == conf->max_degraded) > printk_rl(KERN_WARNING > "raid5:%s: read error not correctable " > "(sector %llu on %s).\n", > > ------------------------------------------------------ > (This is just compile-tested so try at your risk) > > I'd like to hear what Neil thinks of this... I think you've found a real bug - thanks. It would make the test '>=' rather than '==' as that is safer, otherwise I agree. > - if (conf->mddev->degraded) > + if (conf->mddev->degraded >= conf->max_degraded) Thanks, NeilBrown > > The problem here (apart from the erroneous error message) is that if > execution goes inside that "if" clause, it will eventually reach the > md_error() statement some 30 lines below there, which will have the > effect of dropping one further device further worsening the situation > instead of recovering it, and this is not the correct behaviour in this > case as far as I understand. > At the current state raid6 behaves like if it was a raid5, effectively > supporting only one failed disk. > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-28 1:37 ` Neil Brown @ 2010-04-28 2:02 ` Mikael Abrahamsson 2010-04-28 2:12 ` Neil Brown 2010-04-28 12:57 ` MRK 1 sibling, 1 reply; 48+ messages in thread From: Mikael Abrahamsson @ 2010-04-28 2:02 UTC (permalink / raw) To: Neil Brown; +Cc: MRK, Janos Haar, linux-raid On Wed, 28 Apr 2010, Neil Brown wrote: >> I think I can see a problem here: >> You had 11 active devices over 12 when you received the read error. >> At 11 devices over 12 your array is singly-degraded and this should be >> enough for raid6 to recompute the block from parity and perform the >> rewrite, correcting the read-error, but instead MD declared that it's >> impossible to correct the error, and dropped one more device (going to >> doubly-degraded). >> >> I think this is an MD bug, and I think I know where it is: >> >> >> --- linux-2.6.33-vanilla/drivers/md/raid5.c 2010-02-24 >> 19:52:17.000000000 +0100 >> +++ linux-2.6.33/drivers/md/raid5.c 2010-04-27 23:58:31.000000000 +0200 >> @@ -1526,7 +1526,7 @@ static void raid5_end_read_request(struc >> >> clear_bit(R5_UPTODATE, &sh->dev[i].flags); >> atomic_inc(&rdev->read_errors); >> - if (conf->mddev->degraded) >> + if (conf->mddev->degraded == conf->max_degraded) >> printk_rl(KERN_WARNING >> "raid5:%s: read error not correctable " >> "(sector %llu on %s).\n", >> >> ------------------------------------------------------ >> (This is just compile-tested so try at your risk) >> >> I'd like to hear what Neil thinks of this... > > I think you've found a real bug - thanks. > > It would make the test '>=' rather than '==' as that is safer, otherwise I > agree. > >> - if (conf->mddev->degraded) >> + if (conf->mddev->degraded >= conf->max_degraded) If a raid6 device handling can reach this code path, could I also point out that the message says "raid5" and that this is confusing if it's referring to a degraded raid6? -- Mikael Abrahamsson email: swmike@swm.pp.se ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-28 2:02 ` Mikael Abrahamsson @ 2010-04-28 2:12 ` Neil Brown 2010-04-28 2:30 ` Mikael Abrahamsson 0 siblings, 1 reply; 48+ messages in thread From: Neil Brown @ 2010-04-28 2:12 UTC (permalink / raw) To: Mikael Abrahamsson; +Cc: MRK, Janos Haar, linux-raid On Wed, 28 Apr 2010 04:02:39 +0200 (CEST) Mikael Abrahamsson <swmike@swm.pp.se> wrote: > On Wed, 28 Apr 2010, Neil Brown wrote: > > >> I think I can see a problem here: > >> You had 11 active devices over 12 when you received the read error. > >> At 11 devices over 12 your array is singly-degraded and this should be > >> enough for raid6 to recompute the block from parity and perform the > >> rewrite, correcting the read-error, but instead MD declared that it's > >> impossible to correct the error, and dropped one more device (going to > >> doubly-degraded). > >> > >> I think this is an MD bug, and I think I know where it is: > >> > >> > >> --- linux-2.6.33-vanilla/drivers/md/raid5.c 2010-02-24 > >> 19:52:17.000000000 +0100 > >> +++ linux-2.6.33/drivers/md/raid5.c 2010-04-27 23:58:31.000000000 +0200 > >> @@ -1526,7 +1526,7 @@ static void raid5_end_read_request(struc > >> > >> clear_bit(R5_UPTODATE, &sh->dev[i].flags); > >> atomic_inc(&rdev->read_errors); > >> - if (conf->mddev->degraded) > >> + if (conf->mddev->degraded == conf->max_degraded) > >> printk_rl(KERN_WARNING > >> "raid5:%s: read error not correctable " > >> "(sector %llu on %s).\n", > >> > >> ------------------------------------------------------ > >> (This is just compile-tested so try at your risk) > >> > >> I'd like to hear what Neil thinks of this... > > > > I think you've found a real bug - thanks. > > > > It would make the test '>=' rather than '==' as that is safer, otherwise I > > agree. > > > >> - if (conf->mddev->degraded) > >> + if (conf->mddev->degraded >= conf->max_degraded) > > If a raid6 device handling can reach this code path, could I also point > out that the message says "raid5" and that this is confusing if it's > referring to a degraded raid6? > You could.... There are lots of places that say "raid5" where it could apply to raid4 or raid6 as well. Maybe I should change them all to 'raid456'... NeilBrown ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-28 2:12 ` Neil Brown @ 2010-04-28 2:30 ` Mikael Abrahamsson 2010-05-03 2:29 ` Neil Brown 0 siblings, 1 reply; 48+ messages in thread From: Mikael Abrahamsson @ 2010-04-28 2:30 UTC (permalink / raw) To: Neil Brown; +Cc: MRK, Janos Haar, linux-raid On Wed, 28 Apr 2010, Neil Brown wrote: > There are lots of places that say "raid5" where it could apply to raid4 > or raid6 as well. Maybe I should change them all to 'raid456'... That sounds like a good idea, or just call it "raid:" or "raid4/5/6". Don't know where we are in the stable kernel release cycle, but it would be super if this could make it in by next cycle, this code is handling the fault scenario that made me go from raid5 to raid6 :) -- Mikael Abrahamsson email: swmike@swm.pp.se ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-28 2:30 ` Mikael Abrahamsson @ 2010-05-03 2:29 ` Neil Brown 0 siblings, 0 replies; 48+ messages in thread From: Neil Brown @ 2010-05-03 2:29 UTC (permalink / raw) To: Mikael Abrahamsson; +Cc: MRK, Janos Haar, linux-raid On Wed, 28 Apr 2010 04:30:05 +0200 (CEST) Mikael Abrahamsson <swmike@swm.pp.se> wrote: > On Wed, 28 Apr 2010, Neil Brown wrote: > > > There are lots of places that say "raid5" where it could apply to raid4 > > or raid6 as well. Maybe I should change them all to 'raid456'... > > That sounds like a good idea, or just call it "raid:" or "raid4/5/6". > > Don't know where we are in the stable kernel release cycle, but it would > be super if this could make it in by next cycle, this code is handling the > fault scenario that made me go from raid5 to raid6 :) > We are very close to release of 2.6.34. I won't submit this before 2.6.34 is released as it is not a regression and not technically a data-corruption bug. However it will go into 2.6.35-rc1 and but submitted to -stable for 2.6.34.1 and probably other -stable kernels. NeilBrown ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-28 1:37 ` Neil Brown 2010-04-28 2:02 ` Mikael Abrahamsson @ 2010-04-28 12:57 ` MRK 2010-04-28 13:32 ` Janos Haar 1 sibling, 1 reply; 48+ messages in thread From: MRK @ 2010-04-28 12:57 UTC (permalink / raw) To: Neil Brown; +Cc: Janos Haar, linux-raid On 04/28/2010 03:37 AM, Neil Brown wrote: >> --- linux-2.6.33-vanilla/drivers/md/raid5.c 2010-02-24 >> 19:52:17.000000000 +0100 >> +++ linux-2.6.33/drivers/md/raid5.c 2010-04-27 23:58:31.000000000 +0200 >> @@ -1526,7 +1526,7 @@ static void raid5_end_read_request(struc >> >> clear_bit(R5_UPTODATE,&sh->dev[i].flags); >> atomic_inc(&rdev->read_errors); >> - if (conf->mddev->degraded) >> + if (conf->mddev->degraded == conf->max_degraded) >> printk_rl(KERN_WARNING >> "raid5:%s: read error not correctable " >> "(sector %llu on %s).\n", >> >> ------------------------------------------------------ >> (This is just compile-tested so try at your risk) >> >> I'd like to hear what Neil thinks of this... >> > I think you've found a real bug - thanks. > > It would make the test '>=' rather than '==' as that is safer, otherwise I > agree. > > >> - if (conf->mddev->degraded) >> + if (conf->mddev->degraded>= conf->max_degraded) >> Right, agreed... > Thanks, > NeilBrown > Ok then I'll post a more official patch in a separate email shortly, thanks ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-28 12:57 ` MRK @ 2010-04-28 13:32 ` Janos Haar 2010-04-28 14:19 ` MRK 0 siblings, 1 reply; 48+ messages in thread From: Janos Haar @ 2010-04-28 13:32 UTC (permalink / raw) To: MRK; +Cc: linux-raid, Neil Brown MRK, Neil, Please let me have one wish: Please write down my name to the kernel tree with a note i was who reported and helped to track down this. :-) Thanks. Janos Haar ----- Original Message ----- From: "MRK" <mrk@shiftmail.org> To: "Neil Brown" <neilb@suse.de> Cc: "Janos Haar" <janos.haar@netcenter.hu>; <linux-raid@vger.kernel.org> Sent: Wednesday, April 28, 2010 2:57 PM Subject: Re: Suggestion needed for fixing RAID6 > On 04/28/2010 03:37 AM, Neil Brown wrote: >>> --- linux-2.6.33-vanilla/drivers/md/raid5.c 2010-02-24 >>> 19:52:17.000000000 +0100 >>> +++ linux-2.6.33/drivers/md/raid5.c 2010-04-27 23:58:31.000000000 >>> +0200 >>> @@ -1526,7 +1526,7 @@ static void raid5_end_read_request(struc >>> >>> clear_bit(R5_UPTODATE,&sh->dev[i].flags); >>> atomic_inc(&rdev->read_errors); >>> - if (conf->mddev->degraded) >>> + if (conf->mddev->degraded == conf->max_degraded) >>> printk_rl(KERN_WARNING >>> "raid5:%s: read error not >>> correctable " >>> "(sector %llu on %s).\n", >>> >>> ------------------------------------------------------ >>> (This is just compile-tested so try at your risk) >>> >>> I'd like to hear what Neil thinks of this... >>> >> I think you've found a real bug - thanks. >> >> It would make the test '>=' rather than '==' as that is safer, otherwise >> I >> agree. >> >> >>> - if (conf->mddev->degraded) >>> + if (conf->mddev->degraded>= conf->max_degraded) >>> > > Right, agreed... > >> Thanks, >> NeilBrown >> > > Ok then I'll post a more official patch in a separate email shortly, > thanks ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-28 13:32 ` Janos Haar @ 2010-04-28 14:19 ` MRK 2010-04-28 14:51 ` Janos Haar 2010-04-29 7:55 ` Janos Haar 0 siblings, 2 replies; 48+ messages in thread From: MRK @ 2010-04-28 14:19 UTC (permalink / raw) To: Janos Haar; +Cc: linux-raid, Neil Brown On 04/28/2010 03:32 PM, Janos Haar wrote: > MRK, Neil, > > Please let me have one wish: > Please write down my name to the kernel tree with a note i was who > reported and helped to track down this. :-) > > Thanks. > Janos Haar Ok I did However it would be nice if you can actually test the patch and confirm that it solves your problem, starting with the raid6 array in singly-degraded mode like you did yesterday. Then I think we can add one further line on top: Tested-by: Janos Haar <janos.haar@netcenter.hu> before Neil (hopefully) acks it. Testing is needed anyway before pushing it to mainline, I think... ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-28 14:19 ` MRK @ 2010-04-28 14:51 ` Janos Haar 2010-04-29 7:55 ` Janos Haar 1 sibling, 0 replies; 48+ messages in thread From: Janos Haar @ 2010-04-28 14:51 UTC (permalink / raw) To: MRK; +Cc: linux-raid ----- Original Message ----- From: "MRK" <gabriele.trombetti@gmail.com> To: "Janos Haar" <janos.haar@netcenter.hu> Cc: <linux-raid@vger.kernel.org>; "Neil Brown" <neilb@suse.de> Sent: Wednesday, April 28, 2010 4:19 PM Subject: Re: Suggestion needed for fixing RAID6 > On 04/28/2010 03:32 PM, Janos Haar wrote: >> MRK, Neil, >> >> Please let me have one wish: >> Please write down my name to the kernel tree with a note i was who >> reported and helped to track down this. :-) >> >> Thanks. >> Janos Haar > > Ok I did > However it would be nice if you can actually test the patch and confirm > that it solves your problem, starting with the raid6 array in > singly-degraded mode like you did yesterday. Then I think we can add one > further line on top: > > Tested-by: Janos Haar <janos.haar@netcenter.hu> > > before Neil (hopefully) acks it. Testing is needed anyway before pushing > it to mainline, I think... I am allready working on...... Please give me some time.... Janos > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-28 14:19 ` MRK 2010-04-28 14:51 ` Janos Haar @ 2010-04-29 7:55 ` Janos Haar 2010-04-29 15:22 ` MRK 1 sibling, 1 reply; 48+ messages in thread From: Janos Haar @ 2010-04-29 7:55 UTC (permalink / raw) To: MRK; +Cc: linux-raid md3 : active raid6 sdd4[12] sdl4[11] sdk4[10] sdj4[9] sdi4[8] dm-1[13](F) sdg4[6 ] sdf4[5] dm-0[4] sdc4[2] sdb4[1] sda4[0] 14626538880 blocks level 6, 16k chunk, algorithm 2 [12/10] [UUU_UUU_UUUU] [===========>.........] recovery = 56.8% (831095108/1462653888) finish=50 19.8min speed=2096K/sec Drive dropped again with this patch! + the kernel freezed. (I will try to get more info...) Janos ----- Original Message ----- From: "MRK" <**************> To: "Janos Haar" <janos.haar@netcenter.hu> Cc: <linux-raid@vger.kernel.org>; "Neil Brown" <neilb@suse.de> Sent: Wednesday, April 28, 2010 4:19 PM Subject: Re: Suggestion needed for fixing RAID6 > On 04/28/2010 03:32 PM, Janos Haar wrote: >> MRK, Neil, >> >> Please let me have one wish: >> Please write down my name to the kernel tree with a note i was who >> reported and helped to track down this. :-) >> >> Thanks. >> Janos Haar > > Ok I did > However it would be nice if you can actually test the patch and confirm > that it solves your problem, starting with the raid6 array in > singly-degraded mode like you did yesterday. Then I think we can add one > further line on top: > > Tested-by: Janos Haar <janos.haar@netcenter.hu> > > before Neil (hopefully) acks it. Testing is needed anyway before pushing > it to mainline, I think... > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-29 7:55 ` Janos Haar @ 2010-04-29 15:22 ` MRK 2010-04-29 21:07 ` Janos Haar 0 siblings, 1 reply; 48+ messages in thread From: MRK @ 2010-04-29 15:22 UTC (permalink / raw) To: Janos Haar; +Cc: linux-raid On 04/29/2010 09:55 AM, Janos Haar wrote: > > md3 : active raid6 sdd4[12] sdl4[11] sdk4[10] sdj4[9] sdi4[8] > dm-1[13](F) sdg4[6 > ] sdf4[5] dm-0[4] sdc4[2] sdb4[1] sda4[0] > 14626538880 blocks level 6, 16k chunk, algorithm 2 [12/10] > [UUU_UUU_UUUU] > [===========>.........] recovery = 56.8% (831095108/1462653888) > finish=50 > 19.8min speed=2096K/sec > > Drive dropped again with this patch! > + the kernel freezed. > (I will try to get more info...) > > Janos Hmm too bad :-( it seems it still doesn't work, sorry for that I suppose the kernel didn't freeze immediately after disabling the drive or you wouldn't have had the chance to cat /proc/mdstat... Hence dmesg messages might have gone to /var/log/messages or something. Can you look there to see if there is any interesting message to post here? Did the COW device fill up at least a bit? Also: you know that if you disable graphics on the server ("/etc/init.d/gdm stop" or something like that) you usually can see the stack trace of the kernel panic on screen when it hangs (unless terminal was blank for powersaving, which you can disable too). You can take a photo of that one (or write it down but it will be long) to so maybe somebody can understand why it hanged. You might be even obtain the stack trace through a serial port but that will take more effort. ^ permalink raw reply [flat|nested] 48+ messages in thread
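If a serial cable is not at hand, netconsole is another hedged option for catching the oops: it streams printk output over UDP to a second machine. It assumes the kernel was built with netconsole support, and the interface, IP and MAC addresses below are placeholders, not values from this setup:

# on the failing server
echo 8 > /proc/sys/kernel/printk     # let all printk levels reach the console
modprobe netconsole netconsole=6665@192.168.0.10/eth0,6666@192.168.0.20/00:11:22:33:44:55
# on the receiving machine (netcat option syntax varies between variants)
nc -u -l -p 6666 | tee netconsole.log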
* Re: Suggestion needed for fixing RAID6 2010-04-29 15:22 ` MRK @ 2010-04-29 21:07 ` Janos Haar 2010-04-29 23:00 ` MRK 0 siblings, 1 reply; 48+ messages in thread From: Janos Haar @ 2010-04-29 21:07 UTC (permalink / raw) To: MRK; +Cc: linux-raid ----- Original Message ----- From: "MRK" <mrk@shiftmail.org> To: "Janos Haar" <janos.haar@netcenter.hu> Cc: <linux-raid@vger.kernel.org> Sent: Thursday, April 29, 2010 5:22 PM Subject: Re: Suggestion needed for fixing RAID6 > On 04/29/2010 09:55 AM, Janos Haar wrote: >> >> md3 : active raid6 sdd4[12] sdl4[11] sdk4[10] sdj4[9] sdi4[8] dm-1[13](F) >> sdg4[6 >> ] sdf4[5] dm-0[4] sdc4[2] sdb4[1] sda4[0] >> 14626538880 blocks level 6, 16k chunk, algorithm 2 [12/10] >> [UUU_UUU_UUUU] >> [===========>.........] recovery = 56.8% (831095108/1462653888) >> finish=50 >> 19.8min speed=2096K/sec >> >> Drive dropped again with this patch! >> + the kernel freezed. >> (I will try to get more info...) >> >> Janos > > Hmm too bad :-( it seems it still doesn't work, sorry for that > > I suppose the kernel didn't freeze immediately after disabling the drive > or you wouldn't have had the chance to cat /proc/mdstat... this was this command in putty.exe window: watch "cat /proc/mdstat ; du -h /snap*" I think it have crashed soon. I had no time to recognize what happened and exit from the watch. > > Hence dmesg messages might have gone to /var/log/messages or something. > Can you look there to see if there is any interesting message to post > here? Yes, i know that. The crash was not written up unfortunately. But there is some info: (some UNC reported from sdh) .... Apr 29 09:50:29 Clarus-gl2k10-2 kernel: res 51/40:00:27:c0:5e/40:00:63:00:00/e0 Emask 0x9 (media error) Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8.00: status: { DRDY ERR } Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8.00: error: { UNC } Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8.00: configured for UDMA/133 Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Sense Key : Medium Error [current] [descriptor] Apr 29 09:50:29 Clarus-gl2k10-2 kernel: Descriptor sense data with sense descriptors (in hex): Apr 29 09:50:29 Clarus-gl2k10-2 kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 Apr 29 09:50:29 Clarus-gl2k10-2 kernel: 63 5e c0 27 Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Add. Sense: Unrecovered read error - auto reallocate failed Apr 29 09:50:29 Clarus-gl2k10-2 kernel: end_request: I/O error, dev sdh, sector 1667153959 Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not correctable (sector 1662189872 on dm-1). Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not correctable (sector 1662189880 on dm-1). Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not correctable (sector 1662189888 on dm-1). Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not correctable (sector 1662189896 on dm-1). Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not correctable (sector 1662189904 on dm-1). Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not correctable (sector 1662189912 on dm-1). Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not correctable (sector 1662189920 on dm-1). Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not correctable (sector 1662189928 on dm-1). Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not correctable (sector 1662189936 on dm-1). 
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not correctable (sector 1662189944 on dm-1). Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write Protect is off Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB) Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write Protect is off Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Apr 29 13:07:39 Clarus-gl2k10-2 syslogd 1.4.1: restart. > Did the COW device fill up at least a bit? The initial size is 1.1MB, and what we wants to see is only some kbytes... I don't know exactly. Next time i will try to reduce the initial size to 16KByte. > > Also: you know that if you disable graphics on the server > ("/etc/init.d/gdm stop" or something like that) you usually can see the > stack trace of the kernel panic on screen when it hangs (unless terminal > was blank for powersaving, which you can disable too). You can take a > photo of that one (or write it down but it will be long) to so maybe > somebody can understand why it hanged. You might be even obtain the stack > trace through a serial port but that will take more effort. This pc based server have no graphic card at all. :-) (this is one of my freak ideas) And the terminal is redirected to the com1. If i really want, i can catch this with serial cable, but i think the log should be enough from the messages file. Thanks, Janos ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-29 21:07 ` Janos Haar @ 2010-04-29 23:00 ` MRK 2010-04-30 6:17 ` Janos Haar 0 siblings, 1 reply; 48+ messages in thread From: MRK @ 2010-04-29 23:00 UTC (permalink / raw) To: Janos Haar; +Cc: linux-raid On 04/29/2010 11:07 PM, Janos Haar wrote: > > ----- Original Message ----- From: "MRK" <mrk@shiftmail.org> > To: "Janos Haar" <janos.haar@netcenter.hu> > Cc: <linux-raid@vger.kernel.org> > Sent: Thursday, April 29, 2010 5:22 PM > Subject: Re: Suggestion needed for fixing RAID6 > > >> On 04/29/2010 09:55 AM, Janos Haar wrote: >>> >>> md3 : active raid6 sdd4[12] sdl4[11] sdk4[10] sdj4[9] sdi4[8] >>> dm-1[13](F) sdg4[6 >>> ] sdf4[5] dm-0[4] sdc4[2] sdb4[1] sda4[0] >>> 14626538880 blocks level 6, 16k chunk, algorithm 2 [12/10] >>> [UUU_UUU_UUUU] >>> [===========>.........] recovery = 56.8% >>> (831095108/1462653888) finish=50 >>> 19.8min speed=2096K/sec >>> >>> Drive dropped again with this patch! >>> + the kernel freezed. >>> (I will try to get more info...) >>> >>> Janos >> >> Hmm too bad :-( it seems it still doesn't work, sorry for that >> >> I suppose the kernel didn't freeze immediately after disabling the >> drive or you wouldn't have had the chance to cat /proc/mdstat... > > this was this command in putty.exe window: > watch "cat /proc/mdstat ; du -h /snap*" > good idea... > I think it have crashed soon. > I had no time to recognize what happened and exit from the watch. > >> >> Hence dmesg messages might have gone to /var/log/messages or >> something. Can you look there to see if there is any interesting >> message to post here? > > Yes, i know that. > The crash was not written up unfortunately. > But there is some info: > > (some UNC reported from sdh) > .... > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: res > 51/40:00:27:c0:5e/40:00:63:00:00/e0 Emask 0x9 (media error) > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8.00: status: { DRDY ERR } > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8.00: error: { UNC } > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8.00: configured for UDMA/133 > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Result: > hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Sense Key : > Medium Error [current] [descriptor] > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: Descriptor sense data with > sense descriptors (in hex): > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: 72 03 11 04 00 00 00 > 0c 00 0a 80 00 00 00 00 00 > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: 63 5e c0 27 > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Add. Sense: > Unrecovered read error - auto reallocate failed > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: end_request: I/O error, dev > sdh, sector 1667153959 > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not > correctable (sector 1662189872 on dm-1). > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not > correctable (sector 1662189880 on dm-1). > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not > correctable (sector 1662189888 on dm-1). > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not > correctable (sector 1662189896 on dm-1). > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not > correctable (sector 1662189904 on dm-1). > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not > correctable (sector 1662189912 on dm-1). > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not > correctable (sector 1662189920 on dm-1). 
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not > correctable (sector 1662189928 on dm-1). > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not > correctable (sector 1662189936 on dm-1). > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not > correctable (sector 1662189944 on dm-1). > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write > Protect is off > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write cache: > enabled, read cache: enabled, doesn't support DPO or FUA > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] 2930277168 > 512-byte hardware sectors: (1.50 TB/1.36 TiB) > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write > Protect is off > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write cache: > enabled, read cache: enabled, doesn't support DPO or FUA > Apr 29 13:07:39 Clarus-gl2k10-2 syslogd 1.4.1: restart. Hmm what strange... I don't see the message "Disk failure on %s, disabling device" \n "Operation continuing on %d devices" in your log. In MD raid456 the ONLY place where a disk is set faulty is this (file raid5.c): ---------------------- set_bit(Faulty, &rdev->flags); printk(KERN_ALERT "raid5: Disk failure on %s, disabling device.\n" "raid5: Operation continuing on %d devices.\n", bdevname(rdev->bdev,b), conf->raid_disks - mddev->degraded); ---------------------- ( which is called by md_error() ) As you can see, just after disabling the device it prints the dmesg message. I don't understand how you could catch a cat /proc/mdstat already reporting the disk as failed, and still not seeing the message in the /var/log/messages . But you do see messages that should come chronologically after that one. The errors like: "Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not correctable (sector 1662189872 on dm-1)." can now (after the patch) be generated only after raid-6 is in doubly-degraded state. I don't understand how those errors could become visible before the message telling that MD is disabling the device. To make the thing more strange, if raid-6 is in doubly-degraded state it means dm-1/sdh is disabled, but if dm-1/sdh is disabled MD should not have read anything from there. I mean there shouldn't have been any read error because there shouldn't have been any read. You are sure that a) this dmesg you reported really is from your last run of the resync b) above or below the messages you report there is no "Disk failure on ..., disabling device" string? Last thing, your system might have crashed because of the sd / SATA driver (instead of that being a direct bug of MD). You see, those are the last messages before the reboot, and the message about write cache is repeated. The driver might have tried to reset the drive, maybe quickly more than once. I'm not sure... but that could be a reason. Exactly what kernel version are you running now, after applying my patch? At the moment I don't have more ideas, sorry. I hope somebody else replies. In the meanwhile you might run it through the serial cable if you have some time. Maybe you can get more dmesg stuff that couldn't make it through /var/log/messages. And you would also get the kernel panic. Actually for the dmesg I think you can try with a "watch dmesg -c" via putty. Good luck ^ permalink raw reply [flat|nested] 48+ messages in thread
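If "watch dmesg -c" proves too fragile when the box locks up, a small variant is to drain the ring buffer into a file and sync it on every pass, so the lines written before the freeze have a chance to survive the reboot. The log path is arbitrary:

while true; do
    dmesg -c >> /root/md3-debug.log   # drain and clear the kernel ring buffer
    sync                              # push the file to disk before a possible freeze
    sleep 1
done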
* Re: Suggestion needed for fixing RAID6 2010-04-29 23:00 ` MRK @ 2010-04-30 6:17 ` Janos Haar 2010-04-30 23:54 ` MRK [not found] ` <4BDB6DB6.5020306@sh iftmail.org> 0 siblings, 2 replies; 48+ messages in thread From: Janos Haar @ 2010-04-30 6:17 UTC (permalink / raw) To: MRK; +Cc: Neil Brown, linux-raid Hello, OK, MRK you are right (again). There was some line in the messages wich avoids my attention. The entire log is here: http://download.netcenter.hu/bughunt/20100430/messages The dm founds invalid my cow devices, but i don't know why at this time. My setup script looks like this: "create-cow": rm -f /snapshot.bin rm -f /snapshot2.bin dd_rescue -v /dev/zero /snapshot.bin -m 4k -S 2000G dd_rescue -v /dev/zero /snapshot2.bin -m 4k -S 2000G losetup /dev/loop3 /snapshot.bin losetup /dev/loop4 /snapshot2.bin dd if=/dev/zero of=/dev/loop3 bs=1M count=1 dd if=/dev/zero of=/dev/loop4 bs=1M count=1 echo 0 $(blockdev --getsize /dev/sde4) \ snapshot /dev/sde4 /dev/loop3 p 8 | \ dmsetup create cow echo 0 $(blockdev --getsize /dev/sdh4) \ snapshot /dev/sdh4 /dev/loop4 p 8 | \ dmsetup create cow2 Now i have the last state, and there is more space left on the disk, and the snapshots are smalls: du -h /snapshot* 1.1M /snapshot2.bin 1.1M /snapshot.bin My new kernel is the same like the old one, only diff is the md-patch. Additionally i need to note, my kernel have only one additional patch wich differs from the normal tree, this patch is the pdflush-patch. (I can set the number of pdflushd's number in the proc.) I can try again, if there is any new idea, but it would be really good to do some trick with bitmaps or set the recovery's start point or something similar, because every time i need >16 hour to get the first poit where the raid do something interesting.... Neil, Can you say something useful about this? Thanks again, Janos ----- Original Message ----- From: "MRK" <mrk@shiftmail.org> To: "Janos Haar" <janos.haar@netcenter.hu> Cc: <linux-raid@vger.kernel.org> Sent: Friday, April 30, 2010 1:00 AM Subject: Re: Suggestion needed for fixing RAID6 > On 04/29/2010 11:07 PM, Janos Haar wrote: >> >> ----- Original Message ----- From: "MRK" <mrk@shiftmail.org> >> To: "Janos Haar" <janos.haar@netcenter.hu> >> Cc: <linux-raid@vger.kernel.org> >> Sent: Thursday, April 29, 2010 5:22 PM >> Subject: Re: Suggestion needed for fixing RAID6 >> >> >>> On 04/29/2010 09:55 AM, Janos Haar wrote: >>>> >>>> md3 : active raid6 sdd4[12] sdl4[11] sdk4[10] sdj4[9] sdi4[8] >>>> dm-1[13](F) sdg4[6 >>>> ] sdf4[5] dm-0[4] sdc4[2] sdb4[1] sda4[0] >>>> 14626538880 blocks level 6, 16k chunk, algorithm 2 [12/10] >>>> [UUU_UUU_UUUU] >>>> [===========>.........] recovery = 56.8% (831095108/1462653888) >>>> finish=50 >>>> 19.8min speed=2096K/sec >>>> >>>> Drive dropped again with this patch! >>>> + the kernel freezed. >>>> (I will try to get more info...) >>>> >>>> Janos >>> >>> Hmm too bad :-( it seems it still doesn't work, sorry for that >>> >>> I suppose the kernel didn't freeze immediately after disabling the drive >>> or you wouldn't have had the chance to cat /proc/mdstat... >> >> this was this command in putty.exe window: >> watch "cat /proc/mdstat ; du -h /snap*" >> > > good idea... > >> I think it have crashed soon. >> I had no time to recognize what happened and exit from the watch. >> >>> >>> Hence dmesg messages might have gone to /var/log/messages or something. >>> Can you look there to see if there is any interesting message to post >>> here? >> >> Yes, i know that. >> The crash was not written up unfortunately. 
>> But there is some info: >> >> (some UNC reported from sdh) >> .... >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: res >> 51/40:00:27:c0:5e/40:00:63:00:00/e0 Emask 0x9 (media error) >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8.00: status: { DRDY ERR } >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8.00: error: { UNC } >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8.00: configured for UDMA/133 >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Result: >> hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Sense Key : >> Medium Error [current] [descriptor] >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: Descriptor sense data with sense >> descriptors (in hex): >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: 72 03 11 04 00 00 00 0c >> 00 0a 80 00 00 00 00 00 >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: 63 5e c0 27 >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Add. Sense: >> Unrecovered read error - auto reallocate failed >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: end_request: I/O error, dev sdh, >> sector 1667153959 >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not >> correctable (sector 1662189872 on dm-1). >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not >> correctable (sector 1662189880 on dm-1). >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not >> correctable (sector 1662189888 on dm-1). >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not >> correctable (sector 1662189896 on dm-1). >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not >> correctable (sector 1662189904 on dm-1). >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not >> correctable (sector 1662189912 on dm-1). >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not >> correctable (sector 1662189920 on dm-1). >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not >> correctable (sector 1662189928 on dm-1). >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not >> correctable (sector 1662189936 on dm-1). >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not >> correctable (sector 1662189944 on dm-1). >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write Protect >> is off >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write cache: >> enabled, read cache: enabled, doesn't support DPO or FUA >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] 2930277168 >> 512-byte hardware sectors: (1.50 TB/1.36 TiB) >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write Protect >> is off >> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: sd 7:0:0:0: [sdh] Write cache: >> enabled, read cache: enabled, doesn't support DPO or FUA >> Apr 29 13:07:39 Clarus-gl2k10-2 syslogd 1.4.1: restart. > > Hmm what strange... > I don't see the message "Disk failure on %s, disabling device" \n > "Operation continuing on %d devices" in your log. > > In MD raid456 the ONLY place where a disk is set faulty is this (file > raid5.c): > > ---------------------- > set_bit(Faulty, &rdev->flags); > printk(KERN_ALERT > "raid5: Disk failure on %s, disabling device.\n" > "raid5: Operation continuing on %d devices.\n", > bdevname(rdev->bdev,b), conf->raid_disks - > mddev->degraded); > ---------------------- > ( which is called by md_error() ) > > As you can see, just after disabling the device it prints the dmesg > message. 
> I don't understand how you could catch a cat /proc/mdstat already > reporting the disk as failed, and still not seeing the message in the > /var/log/messages . > > But you do see messages that should come chronologically after that one. > The errors like: > "Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not > correctable (sector 1662189872 on dm-1)." > can now (after the patch) be generated only after raid-6 is in > doubly-degraded state. I don't understand how those errors could become > visible before the message telling that MD is disabling the device. > > To make the thing more strange, if raid-6 is in doubly-degraded state it > means dm-1/sdh is disabled, but if dm-1/sdh is disabled MD should not have > read anything from there. I mean there shouldn't have been any read error > because there shouldn't have been any read. > > You are sure that > a) this dmesg you reported really is from your last run of the resync > b) above or below the messages you report there is no "Disk failure on > ..., disabling device" string? > > Last thing, your system might have crashed because of the sd / SATA driver > (instead of that being a direct bug of MD). You see, those are the last > messages before the reboot, and the message about write cache is repeated. > The driver might have tried to reset the drive, maybe quickly more than > once. I'm not sure... but that could be a reason. > > Exactly what kernel version are you running now, after applying my patch? > > At the moment I don't have more ideas, sorry. I hope somebody else > replies. > In the meanwhile you might run it through the serial cable if you have > some time. Maybe you can get more dmesg stuff that couldn't make it > through /var/log/messages. And you would also get the kernel panic. > Actually for the dmesg I think you can try with a "watch dmesg -c" via > putty. > > Good luck ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-04-30 6:17 ` Janos Haar @ 2010-04-30 23:54 ` MRK [not found] ` <4BDB6DB6.5020306@sh iftmail.org> 1 sibling, 0 replies; 48+ messages in thread From: MRK @ 2010-04-30 23:54 UTC (permalink / raw) To: Janos Haar; +Cc: linux-raid On 04/30/2010 08:17 AM, Janos Haar wrote: > Hello, > > OK, MRK you are right (again). > There was some line in the messages wich avoids my attention. > The entire log is here: > http://download.netcenter.hu/bughunt/20100430/messages > Ah here we go: Apr 29 09:50:29 Clarus-gl2k10-2 kernel: device-mapper: snapshots: Invalidating snapshot: Error reading/writing. Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1, disabling device. Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Operation continuing on 10 devices. Apr 29 09:50:29 Clarus-gl2k10-2 kernel: md: md3: recovery done. Firstly I'm not totally sure of how DM passed the information of the device failing to MD. There is no error message about this on MD. If it was a read error, MD should have performed the rewrite but this apparently did not happen (the error message for a failed rewrite by MD I think is "read error NOT corrected!!"). But anyway... > The dm founds invalid my cow devices, but i don't know why at this time. > I have just had a brief look ad DM code. I understand like 1% of it right now, however I am thinking that in a not-perfectly-optimized way of doing things, if you specified 8 sectors (8x512b = 4k, which you did) granularity during the creation of your cow and cow2 devices, whenever you write to the COW device, DM might do the thing in 2 steps: 1- copy 8 (or multiple of 8) sectors from the HD to the cow device, enough to cover the area to which you are writing 2- overwrite such 8 sectors with the data coming from MD. Of course this is not optimal in case you are writing exactly 8 sectors with MD, and these are aligned to the ones that DM uses (both things I think are true in your case) because DM could have skipped #1 in this case. However supposing DM is not so smart and it indeed does not skip step #1, then I think I understand why it disables the device: it's because #1 fails with read error and DM does not know how to handle the situation in that case in general. If you had written a smaller amount with MD such as 512 bytes, if step #1 fails, what do you write in the other 7 sectors around it? The right semantics is not obvious so they disable the device. Firstly you could try with 1 sector granularity instead of 8, during the creation of dm cow devices. This MIGHT work around the issue if DM is at least a bit smart. Right now it's not obvious to me where in the is code the logic for the COW copying. Maybe tomorrow I will understand this. If this doesn't work, the best thing is probably if you can write to the DM mailing list asking why it behaves like this and if they can guess a workaround. You can keep me in cc, I'm interested. > [CUT] > > echo 0 $(blockdev --getsize /dev/sde4) \ > snapshot /dev/sde4 /dev/loop3 p 8 | \ > dmsetup create cow > > echo 0 $(blockdev --getsize /dev/sdh4) \ > snapshot /dev/sdh4 /dev/loop4 p 8 | \ > dmsetup create cow2 See, you are creating it with 8 sectors granularity... try with 1. > I can try again, if there is any new idea, but it would be really good > to do some trick with bitmaps or set the recovery's start point or > something similar, because every time i need >16 hour to get the first > poit where the raid do something interesting.... 
> > Neil, > Can you say something useful about this? > I just looked into this and it seems this feature is already there. See if you have these files: /sys/block/md3/md/sync_min and sync_max Those are the starting and ending sector. But keep in mind you have to enter them in multiples of the chunk size so if your chunk is e.g. 1024k then you need to enter multiples of 2048 (sectors). Enter the value before starting the sync. Or stop the sync by entering "idle" in sync_action, then change the sync_min value, then restart the sync entering "check" in sync_action. It should work, I just tried it on my comp. Good luck ^ permalink raw reply [flat|nested] 48+ messages in thread
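For the 1-sector-granularity retry suggested above, a minimal sketch reusing the device and loop names from Janos's own create-cow script, with only the chunk-size argument changed from 8 to 1; it assumes md3 is stopped and the old cow2 mapping is no longer in use:

dmsetup remove cow2                           # drop the old 8-sector mapping
dd if=/dev/zero of=/dev/loop4 bs=1M count=1   # re-zero the COW header as in the original script
echo 0 $(blockdev --getsize /dev/sdh4) \
  snapshot /dev/sdh4 /dev/loop4 p 1 | \
  dmsetup create cow2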
[parent not found: <4BDB6DB6.5020306@shiftmail.org>]
* Re: Suggestion needed for fixing RAID6
       [not found] ` <4BDB6DB6.5020306@shiftmail.org>
@ 2010-05-01  9:37   ` Janos Haar
  2010-05-01 17:17     ` MRK
  0 siblings, 1 reply; 48+ messages in thread
From: Janos Haar @ 2010-05-01  9:37 UTC (permalink / raw)
  To: MRK; +Cc: linux-raid

Hello,

Now I have tried with a 1-sector snapshot size, and the result was the same:
first the snapshot was invalidated, then the DM device was dropped from the
raid. The next thing was this:

md3 : active raid6 sdl4[11] sdk4[10] sdj4[9] sdi4[8] dm-1[12](F) sdg4[6]
sdf4[5] dm-0[4] sdc4[2] sdb4[1] sda4[0]
      14626538880 blocks level 6, 16k chunk, algorithm 2 [12/10] [UUU_UUU_UUUU]
      [===================>.]  resync = 99.9% (1462653628/1462653888)
finish=0.0min speed=2512K/sec

The sync progress bar jumped from 58.8% to 99.9%, the speed fell, and the
counter froze at 1462653628/1462653888.
I managed to run dmesg once by hand and save the output to a file, but the
system crashed after this.
The entire story took about 1 minute.

However, the sync_min option generally solves my problem, because I can build
up the missing disk from the 90%, which is good enough for me. :-)

If somebody is interested in playing more with this system, I still have some
days for it, but I am not interested anymore in tracing the md-dm behavior in
this situation....
Additionally, I don't want to put the data at risk if it is not really
needed....

Thanks a lot,
Janos Haar

----- Original Message ----- From: "MRK" <mrk@shiftmail.org>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Saturday, May 01, 2010 1:54 AM
Subject: Re: Suggestion needed for fixing RAID6


> On 04/30/2010 08:17 AM, Janos Haar wrote:
>> Hello,
>>
>> OK, MRK you are right (again).
>> There was some line in the messages wich avoids my attention.
>> The entire log is here:
>> http://download.netcenter.hu/bughunt/20100430/messages
>>
>
> Ah here we go:
>
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: device-mapper: snapshots:
> Invalidating snapshot: Error reading/writing.
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1,
> disabling device.
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Operation continuing on 10
> devices.
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: md: md3: recovery done.
>
> Firstly I'm not totally sure of how DM passed the information of the
> device failing to MD. There is no error message about this on MD. If it
> was a read error, MD should have performed the rewrite but this apparently
> did not happen (the error message for a failed rewrite by MD I think is
> "read error NOT corrected!!"). But anyway...
>
>> The dm founds invalid my cow devices, but i don't know why at this time.
>>
>
> I have just had a brief look ad DM code. I understand like 1% of it right
> now, however I am thinking that in a not-perfectly-optimized way of doing
> things, if you specified 8 sectors (8x512b = 4k, which you did)
> granularity during the creation of your cow and cow2 devices, whenever you
> write to the COW device, DM might do the thing in 2 steps:
>
> 1- copy 8 (or multiple of 8) sectors from the HD to the cow device, enough
> to cover the area to which you are writing
> 2- overwrite such 8 sectors with the data coming from MD.
>
> Of course this is not optimal in case you are writing exactly 8 sectors
> with MD, and these are aligned to the ones that DM uses (both things I
> think are true in your case) because DM could have skipped #1 in this
> case.
> However supposing DM is not so smart and it indeed does not skip step #1, > then I think I understand why it disables the device: it's because #1 > fails with read error and DM does not know how to handle the situation in > that case in general. If you had written a smaller amount with MD such as > 512 bytes, if step #1 fails, what do you write in the other 7 sectors > around it? The right semantics is not obvious so they disable the device. > > Firstly you could try with 1 sector granularity instead of 8, during the > creation of dm cow devices. This MIGHT work around the issue if DM is at > least a bit smart. Right now it's not obvious to me where in the is code > the logic for the COW copying. Maybe tomorrow I will understand this. > > If this doesn't work, the best thing is probably if you can write to the > DM mailing list asking why it behaves like this and if they can guess a > workaround. You can keep me in cc, I'm interested. > > >> [CUT] >> >> echo 0 $(blockdev --getsize /dev/sde4) \ >> snapshot /dev/sde4 /dev/loop3 p 8 | \ >> dmsetup create cow >> >> echo 0 $(blockdev --getsize /dev/sdh4) \ >> snapshot /dev/sdh4 /dev/loop4 p 8 | \ >> dmsetup create cow2 > > See, you are creating it with 8 sectors granularity... try with 1. > >> I can try again, if there is any new idea, but it would be really good to >> do some trick with bitmaps or set the recovery's start point or something >> similar, because every time i need >16 hour to get the first poit where >> the raid do something interesting.... >> >> Neil, >> Can you say something useful about this? >> > > I just looked into this and it seems this feature is already there. > See if you have these files: > /sys/block/md3/md/sync_min and sync_max > Those are the starting and ending sector. > But keep in mind you have to enter them in multiples of the chunk size so > if your chunk is e.g. 1024k then you need to enter multiples of 2048 > (sectors). > Enter the value before starting the sync. Or stop the sync by entering > "idle" in sync_action, then change the sync_min value, then restart the > sync entering "check" in sync_action. It should work, I just tried it on > my comp. > > Good luck > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 2010-05-01 9:37 ` Janos Haar @ 2010-05-01 17:17 ` MRK 2010-05-01 21:44 ` Janos Haar 0 siblings, 1 reply; 48+ messages in thread From: MRK @ 2010-05-01 17:17 UTC (permalink / raw) To: Janos Haar; +Cc: linux-raid On 05/01/2010 11:37 AM, Janos Haar wrote: > Whoever, the sync_min option generally solves my problem, becasue i > can build up the missing disk from the 90% wich is good enough for me. :-) Are you sure? How do you do that? Resyncing a specific part is easy, replicating to a spare a specific part is not. If the disk you want to replace was 100% made of parity data that would be easy, you do that with a resync after replacing the disk, maybe multiple resyncs region by region, but in your case it is not made of only parity data. Only raid3 and 4 separate parity data from actual data, raid6 instead finely interleaves them. If you are thinking about replacing a disk with a new one (full of zeroes) and then resyncing manually region by region, you will destroy your data. Because in those chunks where the new disk acts as "actual data" the parity will be recomputed based on your newly introduced zeroes, and it will overwrite the parity data you had on the good disks, making recovery impossible from that point on. You really need to do the replication to a spare as a single step, from the beginning to the end. You cannot use sync_min and sync_max for that purpose. I think... unless bitmaps really do some magic in this, flagging the newly introduced disk as more recent than parity data... but do they really do this? people correct me if I'm wrong. ^ permalink raw reply [flat|nested] 48+ messages in thread
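MRK's warning can be illustrated with a one-byte toy example: the RAID6 P block is a plain XOR of the data blocks, so recomputing it region by region over a zero-filled replacement overwrites the only information that could rebuild the old data (the Q syndrome is a different code, but the same argument applies). The byte values are arbitrary:

D1=0xA5; D2=0x3C; D3=0x5F              # data bytes on three member disks
P=$(( D1 ^ D2 ^ D3 ))                  # parity as it sits on the good disks
D3_NEW=0x00                            # D3's disk replaced by a blank drive
P_NEW=$(( D1 ^ D2 ^ D3_NEW ))          # a region-by-region resync recomputes parity from zeroes
printf 'old P=%#x new P=%#x -> old D3 (%#x) can no longer be rebuilt\n' $P $P_NEW $D3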
* Re: Suggestion needed for fixing RAID6 2010-05-01 17:17 ` MRK @ 2010-05-01 21:44 ` Janos Haar 2010-05-02 23:05 ` MRK 2010-05-03 2:17 ` Neil Brown 0 siblings, 2 replies; 48+ messages in thread From: Janos Haar @ 2010-05-01 21:44 UTC (permalink / raw) To: MRK; +Cc: linux-raid ----- Original Message ----- From: "MRK" <mrk@shiftmail.org> To: "Janos Haar" <janos.haar@netcenter.hu> Cc: <linux-raid@vger.kernel.org> Sent: Saturday, May 01, 2010 7:17 PM Subject: Re: Suggestion needed for fixing RAID6 > On 05/01/2010 11:37 AM, Janos Haar wrote: >> Whoever, the sync_min option generally solves my problem, becasue i can >> build up the missing disk from the 90% wich is good enough for me. :-) > > Are you sure? How do you do that? > Resyncing a specific part is easy, replicating to a spare a specific part > is not. If the disk you want to replace was 100% made of parity data that > would be easy, you do that with a resync after replacing the disk, maybe > multiple resyncs region by region, but in your case it is not made of only > parity data. Only raid3 and 4 separate parity data from actual data, raid6 > instead finely interleaves them. > If you are thinking about replacing a disk with a new one (full of zeroes) > and then resyncing manually region by region, you will destroy your data. > Because in those chunks where the new disk acts as "actual data" the > parity will be recomputed based on your newly introduced zeroes, and it > will overwrite the parity data you had on the good disks, making recovery > impossible from that point on. > You really need to do the replication to a spare as a single step, from > the beginning to the end. You cannot use sync_min and sync_max for that > purpose. You are right again, or at least close. :-) I have the missing sdd4 wich is 98% correctly rebuilt allready. But you are right, because the sync_min option not works for rebuilding disks, only for resyncing. (it is too smart to do the trick for me) > I think... unless bitmaps really do some magic in this, flagging the newly > introduced disk as more recent than parity data... but do they really do > this? people correct me if I'm wrong. Bitmap manipulation should work. I think i know how to do that, but the data is more important than try it on my own. I want to wait until somebody support this. ... or somebody have another good idea? The general problem is, i have one single-degraded RAID6 + 2 badblock disk inside wich have bads in different location. The big question is how to keep the integrity or how to do the rebuild by 2 step instead of one continous? Thanks again Janos > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
  2010-05-01 21:44 ` Janos Haar
@ 2010-05-02 23:05   ` MRK
  2010-05-03  2:17   ` Neil Brown
  1 sibling, 0 replies; 48+ messages in thread
From: MRK @ 2010-05-02 23:05 UTC (permalink / raw)
To: Janos Haar; +Cc: linux-raid

On 05/01/2010 11:44 PM, Janos Haar wrote:
>
> But you are right that the sync_min option does not work for rebuilding
> disks, only for resyncing. (It is too smart to do the trick for me.)
>
>> I think... unless bitmaps really do some magic in this, flagging the
>> newly introduced disk as more recent than parity data... but do they
>> really do this? People correct me if I'm wrong.
>
> Bitmap manipulation should work.
> I think I know how to do that, but the data is more important than
> trying it on my own.
> I want to wait until somebody supports this.
> ... or does somebody have another good idea?

Firstly: do you have any backup of your data? If not, before doing any
experiment I suggest that you back up the important stuff. This can be
done with rsync, reassembling the array every time it goes down. I suggest
you put the array in read-only mode (mdadm --readonly /dev/md3): this
should prevent resyncs from starting automatically, and AFAIR it even
prevents drives from being dropped because of read errors (but you can't
use it during resyncs or rebuilds). Resyncs are bad because they will
eventually bring down your array. Don't use DM when doing this.

Now, for the real thing: instead of experimenting with bitmaps, I suggest
you try and see whether the normal MD resync works now. If that works,
then you can do the normal rebuild.

*Please note: DM should not be needed!* I know that you have tried
resyncing with a DM COW under MD and that it doesn't work well in this
case, but in fact DM should not be needed. We pointed you to DM around
April 23rd because at that time we thought your drives were being dropped
for uncorrectable read errors, but we had guessed wrong.

The general MD philosophy is that if there is enough parity information,
drives are not dropped just for a read error. Upon a read error MD
recomputes the value of the sector from the parity information, and then
it attempts to rewrite the block in place. During this rewrite the drive
performs a reallocation, moving the block to a hidden spare region. If
this rewrite fails it means the drive is out of spare sectors; this is
considered a major failure by MD, and only at that point is the drive
dropped. We thought this was the reason in your case too, but we were
wrong: in your case it was an MD bug, the one for which I submitted the
patch. So it should work now (without DM), and I think this is the safest
thing you can try. Having a backup is always better, though.

So start the resync without DM and see if it goes through to the end
without dropping drives. You can use sync_min to cut the dead times. For
maximum safety you could first try resyncing only one chunk from the
region of the damaged sectors, so as to provoke only a minimum number of
rewrites. Set sync_min to the location of the errors, and sync_max to just
one chunk above. See what happens... If it rewrites correctly and the
drive is not dropped, then run "check" again on the same region and see if
"cat /sys/block/md3/md/mismatch_cnt" still returns zero (or the value it
was before the rewrite). If it is zero (or anyway has not changed value),
it means the block was really rewritten with the correct value: recovery
of one sector really works for RAID6 in singly-degraded state.

Then the procedure is safe, as far as I understand, and you can go ahead
on the other chunks. When all damaged sectors are reallocated, there are
no more read errors, and mismatch_cnt is still at zero, you can go ahead
and replace the defective drive.

There are a few reasons that could still make the resync fail if we are
really unlucky, but dmesg should point us in the right direction in that
case.

Also remember that the patch still needs testing... currently it is not
really tested, because DM drops the drive before MD does. We would need to
know whether RAID6 is behaving like a RAID6 now, or is still behaving like
a RAID5...

Thank you

^ permalink raw reply	[flat|nested] 48+ messages in thread
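Put as commands, the per-region test described above would look roughly
like the following. This is a sketch only: md3, sync_action and
mismatch_cnt come from the thread, while the sector window values are
placeholders you would take from the read-error addresses reported in
dmesg:

    # keep the array quiet while choosing the window
    mdadm --readonly /dev/md3

    # window around the damaged sectors (placeholder values, in sectors)
    echo 1662189312 > /sys/block/md3/md/sync_min
    echo 1662190336 > /sys/block/md3/md/sync_max

    mdadm --readwrite /dev/md3
    echo repair > /sys/block/md3/md/sync_action   # recompute from parity, rewrite in place
    cat /proc/mdstat                              # wait for the window to complete

    # verify: re-check the same window; the mismatch count should not grow
    echo check > /sys/block/md3/md/sync_action
    cat /sys/block/md3/md/mismatch_cnt
    dmesg | tail                                  # confirm no drive was dropped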
* Re: Suggestion needed for fixing RAID6
  2010-05-01 21:44 ` Janos Haar
  2010-05-02 23:05   ` MRK
@ 2010-05-03  2:17   ` Neil Brown
  2010-05-03 10:04     ` MRK
       [not found]     ` <4BDE9FB6.80309@shiftmail.org>
  1 sibling, 2 replies; 48+ messages in thread
From: Neil Brown @ 2010-05-03 2:17 UTC (permalink / raw)
To: Janos Haar; +Cc: MRK, linux-raid

On Sat, 1 May 2010 23:44:04 +0200 "Janos Haar" <janos.haar@netcenter.hu> wrote:

> The general problem is, I have one singly-degraded RAID6 + 2 bad-block
> disks inside, which have bad sectors in different locations.
> The big question is how to keep the integrity, or how to do the rebuild
> in 2 steps instead of one continuous pass?

Once you have the fix that has already been discussed in this thread, the
only other problem I can see with this situation is if attempts to write
good data over the read errors result in a write error which causes the
device to be evicted from the array.

And I think you have reported getting write errors.

The following patch should address this issue for you.
It is *not* a general-purpose fix, but a specific fix to address an issue
you are having. It might be appropriate to make this configurable via
sysfs, or possibly even to try to auto-detect the situation and not bother
writing.

Longer term I want to add support for storing a bad-block list per device
so that a write error just fails that block, not the whole device. I just
need to organise my time so that I make progress on that project.

NeilBrown

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c181438..fd73929 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3427,6 +3427,12 @@ static void handle_stripe6(struct stripe_head *sh)
 		    && !test_bit(R5_LOCKED, &dev->flags)
 		    && test_bit(R5_UPTODATE, &dev->flags)
 			) {
+#if 1
+			/* We have recovered the data, but don't
+			 * trust the device enough to write back
+			 */
+			clear_bit(R5_ReadError, &dev->flags);
+#else
 			if (!test_bit(R5_ReWrite, &dev->flags)) {
 				set_bit(R5_Wantwrite, &dev->flags);
 				set_bit(R5_ReWrite, &dev->flags);
@@ -3438,6 +3444,7 @@ static void handle_stripe6(struct stripe_head *sh)
 				set_bit(R5_LOCKED, &dev->flags);
 				s.locked++;
 			}
+#endif
 		}
 	}

^ permalink raw reply related	[flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
  2010-05-03  2:17 ` Neil Brown
@ 2010-05-03 10:04   ` MRK
  2010-05-03 10:21     ` MRK
  2010-05-03 21:02     ` Neil Brown
  1 sibling, 2 replies; 48+ messages in thread
From: MRK @ 2010-05-03 10:04 UTC (permalink / raw)
To: Neil Brown; +Cc: Janos Haar, linux-raid

On 05/03/2010 04:17 AM, Neil Brown wrote:
> On Sat, 1 May 2010 23:44:04 +0200 "Janos Haar" <janos.haar@netcenter.hu> wrote:
>
>> The general problem is, I have one singly-degraded RAID6 + 2 bad-block
>> disks inside, which have bad sectors in different locations.
>> The big question is how to keep the integrity, or how to do the rebuild
>> in 2 steps instead of one continuous pass?
>
> Once you have the fix that has already been discussed in this thread, the
> only other problem I can see with this situation is if attempts to write
> good data over the read errors result in a write error which causes the
> device to be evicted from the array.
>
> And I think you have reported getting write errors.

His dmesg AFAIR has never reported any error of the kind "raid5:%s: read
error NOT corrected!!" (the error message you get on a failed rewrite,
AFAIU).
Up to now (after my patch) he has only tried with MD above DM-COW, and DM
was dropping the drive on read error, so I think MD didn't get any
opportunity to rewrite.

It is not clear to me what kind of error MD got from DM:

Apr 29 09:50:29 Clarus-gl2k10-2 kernel: device-mapper: snapshots: Invalidating snapshot: Error reading/writing.
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1, disabling device.

I don't understand from what place md_error() is called... but also in
this case it doesn't look like a rewrite error...

I think without DM COW it should probably work in his case.

Your new patch skips the rewriting and keeps the unreadable sectors,
right? So that the drive isn't dropped on rewrite...

> The following patch should address this issue for you.
> It is *not* a general-purpose fix, but a specific fix
[CUT]

^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
  2010-05-03 10:04 ` MRK
@ 2010-05-03 10:21   ` MRK
  2010-05-03 21:04     ` Neil Brown
  0 siblings, 1 reply; 48+ messages in thread
From: MRK @ 2010-05-03 10:21 UTC (permalink / raw)
To: MRK, Neil Brown; +Cc: Janos Haar, linux-raid

On 05/03/2010 12:04 PM, MRK wrote:
> On 05/03/2010 04:17 AM, Neil Brown wrote:
>> On Sat, 1 May 2010 23:44:04 +0200 "Janos Haar" <janos.haar@netcenter.hu> wrote:
>>
>>> The general problem is, I have one singly-degraded RAID6 + 2 bad-block
>>> disks inside, which have bad sectors in different locations.
>>> The big question is how to keep the integrity, or how to do the
>>> rebuild in 2 steps instead of one continuous pass?
>>
>> Once you have the fix that has already been discussed in this thread,
>> the only other problem I can see with this situation is if attempts to
>> write good data over the read errors result in a write error which
>> causes the device to be evicted from the array.
>>
>> And I think you have reported getting write errors.
>
> His dmesg AFAIR has never reported any error of the kind "raid5:%s: read
> error NOT corrected!!" (the error message you get on a failed rewrite,
> AFAIU).
> Up to now (after my patch) he has only tried with MD above DM-COW, and
> DM was dropping the drive on read error, so I think MD didn't get any
> opportunity to rewrite.
>
> It is not clear to me what kind of error MD got from DM:
>
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: device-mapper: snapshots: Invalidating snapshot: Error reading/writing.
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1, disabling device.
>
> I don't understand from what place md_error() is called...
[CUT]

Oh, and there is another issue I wanted to raise.

His last dmesg:
http://download.netcenter.hu/bughunt/20100430/messages

Much after the line:
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1, disabling device.

there are many lines like this:
Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not correctable (sector 1662189872 on dm-1).

How come MD still wants to read from a device it has disabled?
It looks like a problem to me... Does MD also scrub failed devices during
a check?

^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
  2010-05-03 10:21 ` MRK
@ 2010-05-03 21:04   ` Neil Brown
  0 siblings, 0 replies; 48+ messages in thread
From: Neil Brown @ 2010-05-03 21:04 UTC (permalink / raw)
To: MRK; +Cc: Janos Haar, linux-raid

On Mon, 03 May 2010 12:21:08 +0200 MRK <mrk@shiftmail.org> wrote:

> On 05/03/2010 12:04 PM, MRK wrote:
>> On 05/03/2010 04:17 AM, Neil Brown wrote:
>>> On Sat, 1 May 2010 23:44:04 +0200 "Janos Haar" <janos.haar@netcenter.hu> wrote:
>>>
>>>> The general problem is, I have one singly-degraded RAID6 + 2
>>>> bad-block disks inside, which have bad sectors in different
>>>> locations.
>>>> The big question is how to keep the integrity, or how to do the
>>>> rebuild in 2 steps instead of one continuous pass?
>>>
>>> Once you have the fix that has already been discussed in this thread,
>>> the only other problem I can see with this situation is if attempts to
>>> write good data over the read errors result in a write error which
>>> causes the device to be evicted from the array.
>>>
>>> And I think you have reported getting write errors.
>>
>> His dmesg AFAIR has never reported any error of the kind "raid5:%s:
>> read error NOT corrected!!" (the error message you get on a failed
>> rewrite, AFAIU).
>> Up to now (after my patch) he has only tried with MD above DM-COW, and
>> DM was dropping the drive on read error, so I think MD didn't get any
>> opportunity to rewrite.
>>
>> It is not clear to me what kind of error MD got from DM:
>>
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: device-mapper: snapshots: Invalidating snapshot: Error reading/writing.
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
>> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1, disabling device.
>>
>> I don't understand from what place md_error() is called...
>> [CUT]
>
> Oh, and there is another issue I wanted to raise.
>
> His last dmesg:
> http://download.netcenter.hu/bughunt/20100430/messages
>
> Much after the line:
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1, disabling device.
>
> there are many lines like this:
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not correctable (sector 1662189872 on dm-1).
>
> How come MD still wants to read from a device it has disabled?
> It looks like a problem to me...

There are often many IO requests in flight at the same time. When one
returns with an error we might fail the device, but there are still lots
more that have not yet completed. As they complete we might write messages
about them - even after we have reported the device as 'failed'.

But we never initiate an IO after the device has been marked 'faulty'.

NeilBrown

> Does MD also scrub failed devices during a check?
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6
  2010-05-03 10:04 ` MRK
  2010-05-03 10:21   ` MRK
@ 2010-05-03 21:02   ` Neil Brown
  1 sibling, 0 replies; 48+ messages in thread
From: Neil Brown @ 2010-05-03 21:02 UTC (permalink / raw)
To: MRK; +Cc: Janos Haar, linux-raid

On Mon, 03 May 2010 12:04:38 +0200 MRK <mrk@shiftmail.org> wrote:

> On 05/03/2010 04:17 AM, Neil Brown wrote:
>> On Sat, 1 May 2010 23:44:04 +0200 "Janos Haar" <janos.haar@netcenter.hu> wrote:
>>
>>> The general problem is, I have one singly-degraded RAID6 + 2 bad-block
>>> disks inside, which have bad sectors in different locations.
>>> The big question is how to keep the integrity, or how to do the
>>> rebuild in 2 steps instead of one continuous pass?
>>
>> Once you have the fix that has already been discussed in this thread,
>> the only other problem I can see with this situation is if attempts to
>> write good data over the read errors result in a write error which
>> causes the device to be evicted from the array.
>>
>> And I think you have reported getting write errors.
>
> His dmesg AFAIR has never reported any error of the kind "raid5:%s: read
> error NOT corrected!!" (the error message you get on a failed rewrite,
> AFAIU).
> Up to now (after my patch) he has only tried with MD above DM-COW, and
> DM was dropping the drive on read error, so I think MD didn't get any
> opportunity to rewrite.

Hmmm... fair enough.

> It is not clear to me what kind of error MD got from DM:
>
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: device-mapper: snapshots: Invalidating snapshot: Error reading/writing.
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1, disabling device.
>
> I don't understand from what place md_error() is called...

I suspect it is from raid5_end_write_request.
It looks like we don't print any message when the re-write fails. Only if
the read after the rewrite fails.

> but also in this case it doesn't look like a rewrite error...

... so I suspect it is a rewrite error. Unless I missed something.
What message did you expect to see in the case of a re-write error?

> I think without DM COW it should probably work in his case.
>
> Your new patch skips the rewriting and keeps the unreadable sectors,
> right? So that the drive isn't dropped on rewrite...

Correct.

>> The following patch should address this issue for you.
>> It is *not* a general-purpose fix, but a specific fix
> [CUT]

NeilBrown

^ permalink raw reply	[flat|nested] 48+ messages in thread
[parent not found: <4BDE9FB6.80309@shiftmail.org>]
* Re: Suggestion needed for fixing RAID6
       [not found] ` <4BDE9FB6.80309@shiftmail.org>
@ 2010-05-03 10:20   ` Janos Haar
  2010-05-05 15:24   ` Suggestion needed for fixing RAID6 [SOLVED] Janos Haar
  1 sibling, 0 replies; 48+ messages in thread
From: Janos Haar @ 2010-05-03 10:20 UTC (permalink / raw)
To: MRK; +Cc: Neil Brown, linux-raid

----- Original Message -----
From: "MRK" <mrk@shiftmail.org>
To: "Neil Brown" <neilb@suse.de>
Cc: "Janos Haar" <janos.haar@netcenter.hu>; <linux-raid@vger.kernel.org>
Sent: Monday, May 03, 2010 12:04 PM
Subject: Re: Suggestion needed for fixing RAID6

> On 05/03/2010 04:17 AM, Neil Brown wrote:
>> On Sat, 1 May 2010 23:44:04 +0200 "Janos Haar" <janos.haar@netcenter.hu> wrote:
>>
>>> The general problem is, I have one singly-degraded RAID6 + 2 bad-block
>>> disks inside, which have bad sectors in different locations.
>>> The big question is how to keep the integrity, or how to do the
>>> rebuild in 2 steps instead of one continuous pass?
>>
>> Once you have the fix that has already been discussed in this thread,
>> the only other problem I can see with this situation is if attempts to
>> write good data over the read errors result in a write error which
>> causes the device to be evicted from the array.
>>
>> And I think you have reported getting write errors.
>
> His dmesg AFAIR has never reported any error of the kind "raid5:%s: read
> error NOT corrected!!" (the error message you get on a failed rewrite,
> AFAIU).
> Up to now (after my patch) he has only tried with MD above DM-COW, and
> DM was dropping the drive on read error, so I think MD didn't get any
> opportunity to rewrite.
>
> It is not clear to me what kind of error MD got from DM:
>
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: device-mapper: snapshots: Invalidating snapshot: Error reading/writing.
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1, disabling device.
>
> I don't understand from what place md_error() is called...
> but also in this case it doesn't look like a rewrite error...
>
> I think without DM COW it should probably work in his case.
>
> Your new patch skips the rewriting and keeps the unreadable sectors,
> right? So that the drive isn't dropped on rewrite...
>
>> The following patch should address this issue for you.
>> It is *not* a general-purpose fix, but a specific fix
> [CUT]

Just a little note:
I have 2 bad drives. One has bad sectors at 54% and more than 2500 UNC
sectors, which is too much to try to repair; this drive is really
failing...
The other has only 123 bad sectors at 99%, which is a very small scratch
on the platter, so I am now trying to fix that drive instead.

The repair-check sync process is running now; I will reply again soon...

Thanks,
Janos

^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: Suggestion needed for fixing RAID6 [SOLVED]
       [not found] ` <4BDE9FB6.80309@shiftmail.org>
  2010-05-03 10:20   ` Janos Haar
@ 2010-05-05 15:24   ` Janos Haar
  2010-05-05 19:27     ` MRK
  1 sibling, 1 reply; 48+ messages in thread
From: Janos Haar @ 2010-05-05 15:24 UTC (permalink / raw)
To: MRK; +Cc: linux-raid, Neil Brown

> It is not clear to me what kind of error MD got from DM:
>
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: device-mapper: snapshots: Invalidating snapshot: Error reading/writing.
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1, disabling device.
>
> I don't understand from what place md_error() is called...
> but also in this case it doesn't look like a rewrite error...
>
> I think without DM COW it should probably work in his case.

First, sorry for the delay.
Without DM, the original behaviour-fix patch worked very well.
Neil is generally right that the drive should reallocate the bad sectors
on rewrite, but that is the ideal scenario, which unfortunately is far
from the real world...
I needed to repeat the "repair" sync method 4 times on the better HDD
(which has only 123 bad sectors) to get it readable again.
The other HDD has more than 2500 bad sectors and looks like it has no
chance of being fixed this way.

> Your new patch skips the rewriting and keeps the unreadable sectors,
> right? So that the drive isn't dropped on rewrite...
>
>> The following patch should address this issue for you.
>> It is *not* a general-purpose fix, but a specific fix
> [CUT]

Neil, I think this patch should be controllable from sysfs or /proc and be
inactive by default; it would of course be good for recovering bad cases
like mine.
There are a lot of HDD problems which can produce really uncorrectable
sectors that cannot be made good again even by rewriting...

Thanks a lot to all who helped me solve this...

And MRK, please don't forget to write in my name. :-)

Cheers,
Janos

^ permalink raw reply	[flat|nested] 48+ messages in thread
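Whether the repeated "repair" passes actually pushed a drive to remap its
weak sectors can be followed from SMART. A sketch, assuming smartmontools
is installed; /dev/sdX stands in for the member drive being treated:

    # watch the remapping-related attributes before and after each pass
    smartctl -A /dev/sdX | egrep -i 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable'

A shrinking Current_Pending_Sector count together with a growing
Reallocated_Sector_Ct suggests the rewrites are being absorbed by the
drive's spare area; counts that do not move at all point to sectors that
are truly uncorrectable, as described above.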
* Re: Suggestion needed for fixing RAID6 [SOLVED]
  2010-05-05 15:24 ` Suggestion needed for fixing RAID6 [SOLVED] Janos Haar
@ 2010-05-05 19:27   ` MRK
  0 siblings, 0 replies; 48+ messages in thread
From: MRK @ 2010-05-05 19:27 UTC (permalink / raw)
To: Janos Haar; +Cc: linux-raid, Neil Brown

On 05/05/2010 05:24 PM, Janos Haar wrote:
>> I think without DM COW it should probably work in his case.
>
> First, sorry for the delay.
> Without DM, the original behaviour-fix patch worked very well.

Great! OK, I have just resubmitted the patch (v2), which includes a
"Tested-by: Janos Haar <janos.haar@netcenter.hu>" line and a few fixes to
the description.

> [CUT]
> Thanks a lot to all who helped me solve this...
>
> And MRK, please don't forget to write in my name. :-)

I did it. Now it's in Neil's hands; hopefully he acks it and pushes it to
mainline.

Thanks everybody,
GAT

^ permalink raw reply	[flat|nested] 48+ messages in thread
Thread overview: 48+ messages
2010-04-22 10:09 Suggestion needed for fixing RAID6 Janos Haar
2010-04-22 15:00 ` Mikael Abrahamsson
2010-04-22 15:12 ` Janos Haar
2010-04-22 15:18 ` Mikael Abrahamsson
2010-04-22 16:25 ` Janos Haar
2010-04-22 16:32 ` Peter Rabbitson
[not found] ` <4BD0AF2D.90207@stud.tu-ilmenau.de>
2010-04-22 20:48 ` Janos Haar
2010-04-23 6:51 ` Luca Berra
2010-04-23 8:47 ` Janos Haar
2010-04-23 12:34 ` MRK
2010-04-24 19:36 ` Janos Haar
2010-04-24 22:47 ` MRK
2010-04-25 10:00 ` Janos Haar
2010-04-26 10:24 ` MRK
2010-04-26 12:52 ` Janos Haar
2010-04-26 16:53 ` MRK
2010-04-26 22:39 ` Janos Haar
2010-04-26 23:06 ` Michael Evans
[not found] ` <7cfd01cae598$419e8d20$0400a8c0@dcccs>
2010-04-27 0:04 ` Michael Evans
2010-04-27 15:50 ` Janos Haar
2010-04-27 23:02 ` MRK
2010-04-28 1:37 ` Neil Brown
2010-04-28 2:02 ` Mikael Abrahamsson
2010-04-28 2:12 ` Neil Brown
2010-04-28 2:30 ` Mikael Abrahamsson
2010-05-03 2:29 ` Neil Brown
2010-04-28 12:57 ` MRK
2010-04-28 13:32 ` Janos Haar
2010-04-28 14:19 ` MRK
2010-04-28 14:51 ` Janos Haar
2010-04-29 7:55 ` Janos Haar
2010-04-29 15:22 ` MRK
2010-04-29 21:07 ` Janos Haar
2010-04-29 23:00 ` MRK
2010-04-30 6:17 ` Janos Haar
2010-04-30 23:54 ` MRK
[not found] ` <4BDB6DB6.5020306@shiftmail.org>
2010-05-01 9:37 ` Janos Haar
2010-05-01 17:17 ` MRK
2010-05-01 21:44 ` Janos Haar
2010-05-02 23:05 ` MRK
2010-05-03 2:17 ` Neil Brown
2010-05-03 10:04 ` MRK
2010-05-03 10:21 ` MRK
2010-05-03 21:04 ` Neil Brown
2010-05-03 21:02 ` Neil Brown
[not found] ` <4BDE9FB6.80309@shiftmail.org>
2010-05-03 10:20 ` Janos Haar
2010-05-05 15:24 ` Suggestion needed for fixing RAID6 [SOLVED] Janos Haar
2010-05-05 19:27 ` MRK