* Re: [PATCH md 2 of 4] Fix raid6 problem
  [not found] <200502031145.j13Bj1fl016074@terminus.zytor.com>
@ 2005-02-03 16:39 ` H. Peter Anvin
  2005-02-03 16:59   ` Lars Marowsky-Bree
  2005-02-03 17:43   ` Guy
  0 siblings, 2 replies; 28+ messages in thread
From: H. Peter Anvin @ 2005-02-03 16:39 UTC (permalink / raw)
To: Ruth Ivimey-Cook; +Cc: linux-raid

Ruth Ivimey-Cook wrote:
>
> Would you say that raid-6 is suitable for storing mission-critical data, then?
>

What I'd say is that I don't have any evidence it's not. Unfortunately,
that's not quite the same thing.

> I have a .5TB raid5 array on 5 IDE disks, and given what has been said recently
> about disk MTBFs and RAID failure recovery, I'm thinking it might be best to
> switch to raid-6.
>
> I guess such a switch is best implemented as {make backup, reformat, restore},
> if I went ahead?

Yes, right now there is no RAID5->RAID6 conversion tool that I know of.

	-hpa

^ permalink raw reply	[flat|nested] 28+ messages in thread
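For anyone facing that switch, the backup / re-create / restore route amounts to something like the sketch below. The device names, mount points and filesystem are illustrative assumptions only, not details from this thread, and note that a 5-disk RAID-6 has one disk less of usable capacity than the RAID-5 it replaces.

    # 1. Back up the existing RAID-5 contents somewhere safe.
    mount /dev/md0 /data
    rsync -a /data/ /mnt/backup/
    umount /data

    # 2. Stop the old array and re-create the same disks as RAID-6.
    mdadm --stop /dev/md0
    mdadm --create /dev/md0 --level=6 --raid-devices=5 \
          /dev/hda1 /dev/hdb1 /dev/hdc1 /dev/hdd1 /dev/hde1

    # 3. New filesystem, then restore.
    mkfs.ext3 /dev/md0
    mount /dev/md0 /data
    rsync -a /mnt/backup/ /data/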
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-03 16:39 ` [PATCH md 2 of 4] Fix raid6 problem H. Peter Anvin
@ 2005-02-03 16:59   ` Lars Marowsky-Bree
  2005-02-03 17:06     ` H. Peter Anvin
  2005-02-03 17:43   ` Guy
  1 sibling, 1 reply; 28+ messages in thread
From: Lars Marowsky-Bree @ 2005-02-03 16:59 UTC (permalink / raw)
To: H. Peter Anvin, Ruth Ivimey-Cook; +Cc: linux-raid

On 2005-02-03T08:39:41, "H. Peter Anvin" <hpa@zytor.com> wrote:

> Yes, right now there is no RAID5->RAID6 conversion tool that I know of.

Hm. One of the checksums is identical, as is the disk layout of the data, no?
So wouldn't mdadm with the right parameters, forcing the right superblock to
be written and then the missing disk to be hot-added, "convert" this?

But yes, I'd recommend a backup before doing so ;-)

With kind regards,
    Lars Marowsky-Brée <lmb@suse.de>

--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem 2005-02-03 16:59 ` Lars Marowsky-Bree @ 2005-02-03 17:06 ` H. Peter Anvin 0 siblings, 0 replies; 28+ messages in thread From: H. Peter Anvin @ 2005-02-03 17:06 UTC (permalink / raw) To: Lars Marowsky-Bree; +Cc: Ruth Ivimey-Cook, linux-raid Lars Marowsky-Bree wrote: > On 2005-02-03T08:39:41, "H. Peter Anvin" <hpa@zytor.com> wrote: > > >>Yes, right now there is no RAID5->RAID6 conversion tool that I know of. > > Hm. One of the checksums is identical, as is the disk layout of the > data, no? > No, the layout is different. -hpa ^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [PATCH md 2 of 4] Fix raid6 problem 2005-02-03 16:39 ` [PATCH md 2 of 4] Fix raid6 problem H. Peter Anvin 2005-02-03 16:59 ` Lars Marowsky-Bree @ 2005-02-03 17:43 ` Guy 2005-02-03 18:07 ` H. Peter Anvin 1 sibling, 1 reply; 28+ messages in thread From: Guy @ 2005-02-03 17:43 UTC (permalink / raw) To: 'H. Peter Anvin', 'Ruth Ivimey-Cook'; +Cc: linux-raid Would you say that the 2.6 Kernel is suitable for storing mission-critical data, then? I ask because I have read about a lot of problems with data corruption and oops on this list and the SCSI list. But in most or all cases the 2.4 Kernel does not have the same problem. Who out there has a RAID6 array that they believe is stable and safe? And please give some details about the array. Number of disks, sizes, LVM, FS, SCSI, ATA and anything else you can think of? Also, details about any disk failures and how well recovery went? Thanks, Guy -----Original Message----- From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of H. Peter Anvin Sent: Thursday, February 03, 2005 11:40 AM To: Ruth Ivimey-Cook Cc: linux-raid@vger.kernel.org Subject: Re: [PATCH md 2 of 4] Fix raid6 problem Ruth Ivimey-Cook wrote: > > Would you say that raid-6 is suitable for storing mission-critical data, then? > What I'd say is that I don't have any evidence it's not. Unfortunately, that's not quite the same thing. > I have a .5TB raid5 array on 5 IDE disks, and given what has been said recently > about disk MTBF's and RAID failure recovery, I'm thinking it might be best to > switch to raid-6. > > I guess such a switch is best implemented as {make backup, reformat, restore}, > if I went ahead? Yes, right now there is no RAID5->RAID6 conversion tool that I know of. -hpa - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem 2005-02-03 17:43 ` Guy @ 2005-02-03 18:07 ` H. Peter Anvin 2005-02-03 19:36 ` Gordon Henderson 0 siblings, 1 reply; 28+ messages in thread From: H. Peter Anvin @ 2005-02-03 18:07 UTC (permalink / raw) To: Guy; +Cc: 'Ruth Ivimey-Cook', linux-raid Guy wrote: > Would you say that the 2.6 Kernel is suitable for storing mission-critical > data, then? Sure. I'd trust 2.6 over 2.4 at this point. > I ask because I have read about a lot of problems with data corruption and > oops on this list and the SCSI list. But in most or all cases the 2.4 > Kernel does not have the same problem. I haven't seen any problems like that, including on kernel.org, which is definitely a high demand site. > Who out there has a RAID6 array that they believe is stable and safe? > And please give some details about the array. Number of disks, sizes, LVM, > FS, SCSI, ATA and anything else you can think of? Also, details about any > disk failures and how well recovery went? The one I have is a 6-disk ATA array (6x250 GB), ext3. Had one disk failure which hasn't been replaced yet; it's successfully running in 1-disk degraded mode. I'll let other people speak for themselves. -hpa ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-03 18:07 ` H. Peter Anvin
@ 2005-02-03 19:36   ` Gordon Henderson
  2005-02-04  9:04     ` Andrew Walrond
  2005-02-14  4:27     ` Tim Moore
  1 sibling, 2 replies; 28+ messages in thread
From: Gordon Henderson @ 2005-02-03 19:36 UTC (permalink / raw)
To: linux-raid

On Thu, 3 Feb 2005, H. Peter Anvin wrote:

> Guy wrote:
> > Would you say that the 2.6 Kernel is suitable for storing mission-critical
> > data, then?
>
> Sure. I'd trust 2.6 over 2.4 at this point.

This is interesting to hear.

> > I ask because I have read about a lot of problems with data corruption and
> > oops on this list and the SCSI list. But in most or all cases the 2.4
> > Kernel does not have the same problem.
>
> I haven't seen any problems like that, including on kernel.org, which is
> definitely a high demand site.
>
> > Who out there has a RAID6 array that they believe is stable and safe?
> > And please give some details about the array. Number of disks, sizes, LVM,
> > FS, SCSI, ATA and anything else you can think of? Also, details about any
> > disk failures and how well recovery went?
>
> The one I have is a 6-disk ATA array (6x250 GB), ext3. Had one disk
> failure which hasn't been replaced yet; it's successfully running in
> 1-disk degraded mode.
>
> I'll let other people speak for themselves.

I asked this question a couple of weeks ago... I didn't get many replies that
inspired confidence, however, I didn't get any "don't do it" replies either.
So I went off and built a test server with a bit of a mish-mash of drives,
controllers and whatnot, and made the effort to get Debian Woody to use the
2.6.10 kernel (stock, no patches) and played with mdadm and RAID-6.

My test server is an old Asus twin Xeon 500MHz board (XG-DLS, I think) with 2
old 4GB IDE drives, 2 older 18GB SCSI drives (on-board controller, one on a
nice 68-way LVD cable, the other on a 50-way flat ribbon) and 2 Maxtor (I
know) 80GB drives on a Highpoint controller.

I was unable to make it crash, or corrupt data (that I could tell) in about a
week's worth of testing. I only hard-pulled a drive once though, and it
stalled for a short while, then did what it was supposed to do and carried
on. I did lots of tests where I failed a drive (using mdadm), then failed a
2nd, then started a re-sync, then started a 2nd resync, and failed a drive
after the 1st resync finished, etc., etc., etc.... Nothing more than RAID-6
and ext3, no LVM, XFS, etc.

So at that point I was reasonably happy with RAID-6 and 2.6.10. My blood had
stopped dripping over the edge, I'm warming to 2.6.10 and thinking RAID-6
might just be the solution to all my problems...

However, I then got production hardware - Tyan Thunder K8W twin Opteron
board, 4-port SATA on-board, 2x2-port SATA in PCI slots (all SII chipset) and
it all went pear-shaped from there. The system locks solid whenever I try to
use the disks off the PCI SATA controllers doing anything much more than run
fdisk on them. (It just stops, no oops, cursor stops flashing on the display,
it needs a hard-reset to get it going again) I've tried PCI slot positions,
fiddling with mobo jumpers, BIOS options, and so on. I can make it work for
varying degrees of "work", however blood is currently flowing over the edge
and gathering in a pool at my feet. Even getting it to boot off the SATA
drives was a challenge in itself (which still isn't solved to my
satisfaction)

Anyone using Tyan Thunder K8W motherboards???

I now know there is a K8S (server?) version of that mobo, but at the time it
was all ordered, I wasn't aware of it - my thoughts are that there is some
sort of PCI/PCI-X problem with either the motherboard or the chipset, and in
all probability the K8S mobo will have the same chipset and same problems
anyway...

Right now, (to test the PCI SATA cards in PCI-X slots), I have 4 x dual-port
SATA cards in a Dell PCI-X mobo connected to the 8 drives in their box via
900mm SATA cables, and it's all running quite nicely. Read performance on a
RAID-0 array was 230MB/sec, write 300MB/sec (!?!); it falls to 110MB/sec
write and 140MB/sec read for RAID-6... (Processor here is a single Xeon
2.4GHz)

Gordon

^ permalink raw reply	[flat|nested] 28+ messages in thread
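The fail / resync soak testing Gordon describes can be driven entirely from mdadm's manage mode. A rough sketch follows; the array and partition names are made up for illustration, not taken from his setup:

    # Fail one drive, then a second, while I/O is running on the filesystem.
    mdadm /dev/md0 --fail /dev/sdc1
    mdadm /dev/md0 --fail /dev/sde1
    cat /proc/mdstat                  # array should now be doubly degraded

    # Remove and re-add the first drive to kick off a resync...
    mdadm /dev/md0 --remove /dev/sdc1
    mdadm /dev/md0 --add /dev/sdc1

    # ...then bring the second one back once the first resync has finished.
    mdadm /dev/md0 --remove /dev/sde1
    mdadm /dev/md0 --add /dev/sde1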
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-03 19:36 ` Gordon Henderson
@ 2005-02-04  9:04   ` Andrew Walrond
  2005-02-04 11:19     ` Gordon Henderson
  2005-02-14  4:27     ` Tim Moore
  1 sibling, 1 reply; 28+ messages in thread
From: Andrew Walrond @ 2005-02-04 9:04 UTC (permalink / raw)
To: linux-raid

Hi Gordon,

On Thursday 03 February 2005 19:36, Gordon Henderson wrote:
>
> However, I then got production hardware - Tyan Thunder K8W twin Opteron
> board, 4-port SATA on-board, 2x2-port SATA in PCI slots (all SII chipset)
> and it all went pear-shaped from there. The system locks solid whenever I
> try to use the disks off the PCI SATA controllers doing anything much more
> than run fdisk on them. (It just stops, no oops, cursor stops flashing on
> the display, it needs a hard-reset to get it going again) I've tried PCI
> slot positions, fiddling with mobo jumpers, BIOS options, and so on. I
> can make it work for varying degrees of "work", however blood is currently
> flowing over the edge and gathering in a pool at my feet. Even getting it
> to boot off the SATA drives was a challenge in itself (which still isn't
> solved to my satisfaction)
>
> Anyone using Tyan Thunder K8W motherboards???
>

I'm using K8W's here with a combo of raid0/1 on on-board SATA, and it's been
rock solid for months (2.6.10). Looks like your problems are all with the PCI
cards, but I can't help there. Since you are using vanilla 2.6.10, Jeff
Garzik (SATA maintainer) should be interested/helpful on LKML if you want to
pursue this.

What was the booting problem? I have no problems here in that regard.

> I now know there is a K8S (server?) version of that mobo, but at the time
> it was all ordered, I wasn't aware of it - my thoughts are that there is
> some sort of PCI/PCI-X problem with either the motherboard or the chipset,
> and in all probability the K8S mobo will have the same chipset and same
> problems anyway...

Right; very similar, no AGP but additional on-board scsi.

> Right now, (to test the PCI SATA cards in PCI-X slots), I have 4 x
> dual-port SATA cards in a Dell PCI-X mobo connected to the 8 drives in
> their box via 900mm SATA cables, and it's all running quite nicely. Read
> performance on a RAID-0 array was 230MB/sec, write 300MB/sec (!?!); it
> falls to 110MB/sec write and 140MB/sec read for RAID-6... (Processor here
> is a single Xeon 2.4GHz)

I wonder if it's the onboard/pci combination that causes the problem on
K8W...

BTW There is a new K8W just out; pci express replaces agp and I think it
supports dual core opterons (when they appear) so check it out before you
place any big orders for the old one :)

Andrew Walrond

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-04  9:04 ` Andrew Walrond
@ 2005-02-04 11:19   ` Gordon Henderson
  2005-02-04 18:31     ` Mike Hardy
                        ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Gordon Henderson @ 2005-02-04 11:19 UTC (permalink / raw)
To: linux-raid

On Fri, 4 Feb 2005, Andrew Walrond wrote:

> Hi Gordon,
>
> > Anyone using Tyan Thunder K8W motherboards???
>
> I'm using K8W's here with a combo of raid0/1 on on-board SATA, and it's been
> rock solid for months (2.6.10). Looks like your problems are all with the PCI
> cards, but I can't help there. Since you are using vanilla 2.6.10, Jeff
> Garzik (SATA maintainer) should be interested/helpful on LKML if you want to
> pursue this.

The on-board stuff was solid for me too. Great in JBOD mode with 4 disks
under Linux s/w RAID. I didn't enable its own RAID system. The problems came
when I added 2 x 2-port SATA PCI cards to get it to access all 8 SATA disks I
have in the box.

> What was the booting problem? I have no problems here in that regard.

When you add extra PCI cards, which disk the system really wants to boot from
depends on where you add the cards and on what order you have the PCI scan
set to in the BIOS. I could boot OK from the on-board controller; I had
problems when I added PCI SATA cards.

To build it, I had to add an IDE disk and do the install on that, then copy
it over to the SATA drives - this is something I've done in the past - Debian
Woody uses 2.4.18 to start with, and doesn't have any SATA drivers. I've done
this in the past without incident - I actually have an IDE drive which I use
for such installs, and although it's a bit of a fiddle, I get the system I
want reasonably quickly. Bizarrely it seems to work better when booting off
an IDE drive, so putting in 2 IDE drives (mirrored) to boot off is an option,
as is putting in an IDE/Flash card.... (Which I've used in the past)

What I wanted was an 8-way RAID-1 for the boot partition (all of /, in
reality) and I've done this many times in the past on other 2-5 way systems
without issue. So I do the stuff I've done in the past, and there's nothing
really new to me in that respect. (I'm using LILO) So when I try to get it to
boot off the md device, it boots and says LIL and then nothing more. (Lilo
diagnostics interpret this as a media failure, or geometry mismatch) If I
make it boot off /dev/sda1 then it would work. (ie. boot off /dev/sda1, root
on /dev/md1, an 8-way RAID-1) I tried many combinations of old (Debian woody)
& new Lilo (compiled from the latest source), I even tried GRUB at one point
with no luck either. It was more frustrating as the turn-around time is
several minutes by the time you go through the BIOS to change the boot
device, then reboot, change lilo.conf, then try again )-:

> > I now know there is a K8S (server?) version of that mobo, but at the time
> > it was all ordered, I wasn't aware of it - my thoughts are that there is
> > some sort of PCI/PCI-X problem with either the motherboard or the chipset,
> > and in all probability the K8S mobo will have the same chipset and same
> > problems anyway...
>
> Right; very similar, no AGP but additional on-board scsi.
>
> > Right now, (to test the PCI SATA cards in PCI-X slots), I have 4 x
> > dual-port SATA cards in a Dell PCI-X mobo connected to the 8 drives in
> > their box via 900mm SATA cables, and it's all running quite nicely. Read
> > performance on a RAID-0 array was 230MB/sec, write 300MB/sec (!?!); it
> > falls to 110MB/sec write and 140MB/sec read for RAID-6... (Processor here
> > is a single Xeon 2.4GHz)
>
> I wonder if it's the onboard/pci combination that causes the problem on
> K8W...

That's what I'm thinking - there are various jumpers to put the PCI-X slots
into PCI mode and lots of BIOS options to control speed, etc, none of which
made any difference. I did try a different brand of SATA PCI card and that
worked slightly better, but I could still force a total lock-up with lots of
access to the SATA drives on the PCI cards. (Mobo & PCI cards are all SII
3114/3112 chipsets)

I have had a private email from someone who has experienced similar lock-ups
with twin Opteron systems and PCI cards, so from that point of view it
doesn't bode well.

I have some quad Opterons running too, and they seem fine, although they are
pure compute servers with just a local IDE drive. (and they are running SuSE
64-bit which is what the application demands)

It seemed more stable with just one PCI card in, so I have a 4-port card on
order as a last ditch attempt to make it work - I did try re-flashing the
BIOS on one board, (I have 2) as it seemed to be about a year old and there
are several updates on the Tyan web-site, however that resulted in wiping out
the BIOS - it seemed to be going just fine, then it went beep and was silent
forever more )-: Anyone in the SW have a flash programmer/copier handy???

> BTW There is a new K8W just out; pci express replaces agp and I think it
> supports dual core opterons (when they appear) so check it out before you
> place any big orders for the old one :)

Maybe, or maybe we just move to an Intel system, although power dissipation
was a consideration and the Opterons are attractive in that aspect... The
case has a 600W PSU before anyone asks..

Cheers,

Gordon

^ permalink raw reply	[flat|nested] 28+ messages in thread
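For reference, the lilo.conf arrangement usually suggested for a root-on-RAID-1 setup like this looks roughly as follows. This is only a sketch: the kernel image name is invented, raid-extra-boot needs one of the more recent lilo 22.x releases, and it is not a claim that it cures the "LIL" hang Gordon is seeing.

    # /etc/lilo.conf (sketch)
    boot=/dev/md1              # install the boot record via the RAID-1 device
    raid-extra-boot=mbr-only   # ...and onto the MBR of every member disk
    root=/dev/md1
    read-only
    lba32

    image=/boot/vmlinuz-2.6.10
        label=linux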
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-04 11:19 ` Gordon Henderson
@ 2005-02-04 18:31   ` Mike Hardy
  2005-02-13 21:05     ` Mark Hahn
  2005-02-06  3:38   ` Tim Moore
  2005-02-14  4:49   ` Tim Moore
  2 siblings, 1 reply; 28+ messages in thread
From: Mike Hardy @ 2005-02-04 18:31 UTC (permalink / raw)
To: linux-raid

Gordon Henderson wrote:

> I have had a private email from someone who has experienced similar
> lock-ups with twin Opteron systems and PCI cards, so from that point of
> view it doesn't bode well.
>
> I have some quad Opterons running too, and they seem fine, although they
> are pure compute servers with just a local IDE drive. (and they are
> running SuSE 64-bit which is what the application demands)

Interesting - the private mail was from me, and I've got two dual Opterons in
service. The one with significantly more PCI activity has significantly more
problems than the one with less PCI activity.

There appears to be a trend here then, if 3 is a trend. We're moving to a
single PCI card (8-port 3ware, iirc) for all off-motherboard disk access, and
I'll report back if that changes anything.

-Mike

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-04 18:31 ` Mike Hardy
@ 2005-02-13 21:05   ` Mark Hahn
  2005-02-13 21:19     ` Gordon Henderson
  2005-02-13 22:58     ` Mike Hardy
  0 siblings, 2 replies; 28+ messages in thread
From: Mark Hahn @ 2005-02-13 21:05 UTC (permalink / raw)
To: Mike Hardy; +Cc: linux-raid

> Interesting - the private mail was from me, and I've got two dual
> Opterons in service. The one with significantly more PCI activity has
> significantly more problems than the one with less PCI activity.

that's pretty odd, since the most intense IO devices I know of are cluster
interconnect (quadrics, myrinet, infiniband), and those vendors *love*
opterons. I've never heard any of them say other than that Opteron IO
handling is noticeably better than Intel's.

otoh, I could easily believe that if you're running the Opteron systems in
acts-like-a-faster-xeon mode (ie, not x86_64), you might be exercising some
less-tested paths.

regards, mark hahn.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-13 21:05 ` Mark Hahn
@ 2005-02-13 21:19   ` Gordon Henderson
  2005-02-14  4:56     ` Tim Moore
  2005-02-13 22:58   ` Mike Hardy
  1 sibling, 1 reply; 28+ messages in thread
From: Gordon Henderson @ 2005-02-13 21:19 UTC (permalink / raw)
To: linux-raid

On Sun, 13 Feb 2005, Mark Hahn wrote:

> > Interesting - the private mail was from me, and I've got two dual
> > Opterons in service. The one with significantly more PCI activity has
> > significantly more problems than the one with less PCI activity.
>
> that's pretty odd, since the most intense IO devices I know of
> are cluster interconnect (quadrics, myrinet, infiniband),
> and those vendors *love* opterons. I've never heard any of them
> say other than that Opteron IO handling is noticeably better than
> Intel's.
>
> otoh, I could easily believe that if you're running the Opteron
> systems in acts-like-a-faster-xeon mode (ie, not x86_64),
> you might be exercising some less-tested paths.

I was about to post that I've solved my problems with that Tyan dual Opteron
motherboard, but it's still crap. I upgraded the BIOS to the 2.02 beta and it
seemed to work a lot better. Still couldn't boot off it with all 8 drives in,
but solved that with the use of a 32MB flash IDE unit holding /boot...
However, it dropped a drive during initial sync of the raid6 arrays with lots
of SCSI errors, and had given lots of "DMA interrupt missing" errors, etc.,
through the day when I've run soak tests on it, so I'm going to conclude that
that Tyan motherboard is utterly useless and deserves nothing more than being
driven over. Slowly. With a steam roller. Then jumped on. Just to make me
feel better.

Gordon

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-13 21:19 ` Gordon Henderson
@ 2005-02-14  4:56   ` Tim Moore
  2005-02-14  9:42     ` Andrew Walrond
  0 siblings, 1 reply; 28+ messages in thread
From: Tim Moore @ 2005-02-14 4:56 UTC (permalink / raw)
To: linux-raid

Gordon Henderson wrote:
> On Sun, 13 Feb 2005, Mark Hahn wrote:
> I was about to post that I've solved my problems with that Tyan dual
> Opteron motherboard, but it's still crap. I upgraded the BIOS to the 2.02
> beta and it seemed to work a lot better. Still couldn't boot off it with
> all 8 drives in, but solved that with the use of a 32MB flash IDE unit
> holding /boot... However, it dropped a drive during initial sync of the
> raid6 arrays with lots of SCSI errors, and had given lots of "DMA interrupt
> missing" errors, etc., through the day when I've run soak tests on it, so I'm going

2 of the 4 SATA cables packaged with the K8W were intermittently bad
(SCSI/DMA errors, BIOS didn't see them at boot). New cables, no problem. No,
I don't work for AMD or Tyan :)

I tried the K8W as a high throughput client driver platform because I
couldn't take the budget hit on an Iwill QK8S which we use in
production...they work like perfect screaming daemons.

Also considered the MSI board but no NUMA memory interconnect and no 64-bit
slots.

Good luck.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-14  4:56 ` Tim Moore
@ 2005-02-14  9:42   ` Andrew Walrond
  0 siblings, 0 replies; 28+ messages in thread
From: Andrew Walrond @ 2005-02-14 9:42 UTC (permalink / raw)
To: linux-raid

On Monday 14 February 2005 04:56, Tim Moore wrote:
>
> Also considered the MSI board but no NUMA memory interconnect and no 64-bit
> slots.
>

I have some MSI boards here. The K8D Master3 has both NUMA and 64-bit slots.
I've been running 2.6.10 flawlessly with more than a month's uptime so far.

Andrew Walrond

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-13 21:05 ` Mark Hahn
  2005-02-13 21:19   ` Gordon Henderson
@ 2005-02-13 22:58   ` Mike Hardy
  2005-02-13 23:14     ` Richard Scobie
  1 sibling, 1 reply; 28+ messages in thread
From: Mike Hardy @ 2005-02-13 22:58 UTC (permalink / raw)
Cc: linux-raid

Mark Hahn wrote:
>>Interesting - the private mail was from me, and I've got two dual
>>Opterons in service. The one with significantly more PCI activity has
>>significantly more problems than the one with less PCI activity.
>
> that's pretty odd, since the most intense IO devices I know of
> are cluster interconnect (quadrics, myrinet, infiniband),
> and those vendors *love* opterons. I've never heard any of them
> say other than that Opteron IO handling is noticeably better than
> Intel's.

Sure, but which variables are changed between the rigs the vendors loved, and
the rig we're having problems with?

> otoh, I could easily believe that if you're running the Opteron
> systems in acts-like-a-faster-xeon mode (ie, not x86_64),
> you might be exercising some less-tested paths.

It's running x86_64 (Fedora Core 3) and the problem is rooted in the chipset,
I believe. I don't think it's Opterons per se, I think it's just the Athlon
take two - which is to say that it's a wonderful chip, but some of the
chipsets it's saddled with are horrible, and careful selection (as well as
heavy testing prior to putting a machine in service) is essential.

-Mike

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem 2005-02-13 22:58 ` Mike Hardy @ 2005-02-13 23:14 ` Richard Scobie 0 siblings, 0 replies; 28+ messages in thread From: Richard Scobie @ 2005-02-13 23:14 UTC (permalink / raw) To: linux-raid Mike Hardy wrote: > Its running x86_64 (Fedora Core 3) and the problem is rooted in the > chipset I believe. I don't think its Opterons per se, I think its just > the Athlon take two - which is to say that its a wonderful chip, but > some of the chipsets its saddled with are horrible, and careful > selection (as well as heavy testing prior to putting a machine in > service) is essential. I'd be interested to hear the outcome, as I'll be looking at Opteron systems soon and having been badly bitten on dual Athlon (AMD768 Southbridge), would prefer to avoid a repeat of the experience. Regards, Richard ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-04 11:19 ` Gordon Henderson
  2005-02-04 18:31   ` Mike Hardy
@ 2005-02-06  3:38   ` Tim Moore
  2005-02-14  4:49   ` Tim Moore
  2 siblings, 0 replies; 28+ messages in thread
From: Tim Moore @ 2005-02-06 3:38 UTC (permalink / raw)
To: linux-raid; +Cc: Gordon Henderson

Gordon Henderson wrote:
> ...
> It seemed more stable with just one PCI card in, so I have a 4-port card
> on order as a last ditch attempt to make it work - I did try re-flashing
> the BIOS on one board, (I have 2) as it seemed to be about a year old and
> there are several updates on the Tyan web-site, however that resulted in
> wiping out the BIOS - it seemed to be going just fine, then it went beep
> and was silent forever more )-: Anyone in the SW have a flash
> programmer/copier handy???

I had this happen when flashing 2.02b. Reboot was an endless cycle of 15
short beeps. Turns out the 2.02 memory configuration is so different from the
1.?? that it couldn't boot. Swapped the 1GB PC2700 (ATP) memory with 512MB
modules, cleared CMOS, rebooted, reconfigured the BIOS, shut down, replaced
the 1GB production modules, rebooted. Perfect ever since.

The onboard SiI 3114 with software RAID5 has been flawless (RH 7.3 base,
2.4.29 kernel, SCSI/sil driver). The only other problem was that two of the 4
short black SATA cables that came with the board were intermittent. Once
replaced all is well.

I'd give the 2.02 beta BIOS a run and make sure the PCI-X slot configuration
is matched to the controller card. The 2.02 BIOS is very different from
earlier versions.

We run Iwill quad Opteron boards and are quite happy to never go back to
Intel ever again.

t.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-04 11:19 ` Gordon Henderson
  2005-02-04 18:31   ` Mike Hardy
  2005-02-06  3:38   ` Tim Moore
@ 2005-02-14  4:49   ` Tim Moore
  2005-02-14  8:09     ` Gordon Henderson
  2 siblings, 1 reply; 28+ messages in thread
From: Tim Moore @ 2005-02-14 4:49 UTC (permalink / raw)
To: linux-raid

Gordon Henderson wrote:
> What I wanted was an 8-way RAID-1 for the boot partition (all of /, in
> reality) and I've done this many times in the past on other 2-5 way
> systems without issue. So I do the stuff I've done in the past, and there's
> nothing really new to me in that respect. (I'm using LILO) So when I try
> to get it to boot off the md device, it boots and says LIL and then
> nothing more. (Lilo diagnostics interpret this as a media failure, or
> geometry mismatch) If I make it boot off /dev/sda1 then it would work.

We put /boot on a 100MB /dev/sda1 partition; the rest of the drive is md. The
lilo script section does

  dd if=/dev/sda of=/boot/boot446.sda bs=446 count=1 && \
  fdisk -l /dev/sda > /boot/fdisk.sda && \
  dd if=/dev/sda1 of=/dev/sdb1

every time a new kernel is built. Recovery is much easier without RAID
involved (lilo 22.6). I've considered manipulating the boot block/disk label
on copy so that it would boot off either of sda1 or sdb1 transparently.

> (ie. boot off /dev/sda1, root on /dev/md1, an 8-way RAID-1) I tried many
> combinations of old (Debian woody) & new Lilo (compiled from the latest
> source), I even tried GRUB at one point with no luck either. It was more
> frustrating as the turn-around time is several minutes by the time you go
> through the BIOS to change the boot device, then reboot, change lilo.conf,
> then try again )-:
>
> It seemed more stable with just one PCI card in, so I have a 4-port card
> on order as a last ditch attempt to make it work - I did try re-flashing
> the BIOS on one board, (I have 2) as it seemed to be about a year old and
> there are several updates on the Tyan web-site, however that resulted in
> wiping out the BIOS - it seemed to be going just fine, then it went beep
> and was silent forever more )-: Anyone in the SW have a flash
> programmer/copier handy???

Flashing from 1.x to 2.02b, same problem. Power off, pull plug, pull both
power connectors off the mobo, wait 15 seconds, clear CMOS for 15 seconds,
reboot, reset BIOS, no worries.

> Maybe, or maybe we just move to an Intel system, although power
> dissipation was a consideration and the Opterons are attractive in that
> aspect... The case has a 600W PSU before anyone asks..

Yeech. Get the 8131's working and you'll never go back. No Northbridge
bottlenecks, thank you.

^ permalink raw reply	[flat|nested] 28+ messages in thread
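Spelled out with comments, Tim's per-kernel-build hook does the following (his paths; in the original the three steps are simply chained with && so each one only runs if the previous succeeded):

  # Save the MBR boot code (first 446 bytes -- the partition table is not touched).
  dd if=/dev/sda of=/boot/boot446.sda bs=446 count=1
  # Keep a copy of the partition layout for disaster recovery.
  fdisk -l /dev/sda > /boot/fdisk.sda
  # Clone the non-RAID /boot partition onto the second drive by hand.
  dd if=/dev/sda1 of=/dev/sdb1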
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-14  4:49 ` Tim Moore
@ 2005-02-14  8:09   ` Gordon Henderson
  0 siblings, 0 replies; 28+ messages in thread
From: Gordon Henderson @ 2005-02-14 8:09 UTC (permalink / raw)
To: Tim Moore; +Cc: linux-raid

On Sun, 13 Feb 2005, Tim Moore wrote:

> Gordon Henderson wrote:
> > What I wanted was an 8-way RAID-1 for the boot partition (all of /, in
> > reality) and I've done this many times in the past on other 2-5 way
> > systems without issue. So I do the stuff I've done in the past, and there's
> > nothing really new to me in that respect. (I'm using LILO) So when I try
> > to get it to boot off the md device, it boots and says LIL and then
> > nothing more. (Lilo diagnostics interpret this as a media failure, or
> > geometry mismatch) If I make it boot off /dev/sda1 then it would work.
>
> We put /boot on a 100MB /dev/sda1 partition; the rest of the drive is md.
> The lilo script section does
>   dd if=/dev/sda of=/boot/boot446.sda bs=446 count=1 && \
>   fdisk -l /dev/sda > /boot/fdisk.sda && \
>   dd if=/dev/sda1 of=/dev/sdb1
> every time a new kernel is built. Recovery is much easier without RAID
> involved (lilo 22.6). I've considered manipulating the boot block/disk
> label on copy so that it would boot off either of sda1 or sdb1
> transparently.

It's now booting off a 32MB flash IDE drive thing which is mounted read-only
under /boot. I have lilo remount it r/w, then do its stuff, then remount it
r/o again.

I could boot it OK under raid-1 off the 4 drives on the on-board controller.
As soon as I plug in PCI disk controllers it all goes pear-shaped. This
motherboard & BIOS have serious issues with more than 4 SATA disks.

> Flashing from 1.x to 2.02b, same problem. Power off, pull plug, pull
> both power connectors off mobo, wait 15 seconds, clear CMOS for 15
> seconds, reboot, reset BIOS, no worries.

I re-flashed one board to 2.02b - it helps in that the system doesn't lock
hard with one make of 3112 card, but the other make can still cause a hard
lockup. I also seem to have lost the use of PCI slots 3 and 4 with this new
BIOS. One board died during re-flashing, so it's gone back.

Gordon

^ permalink raw reply	[flat|nested] 28+ messages in thread
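The remount dance Gordon describes only needs a couple of lines in whatever wrapper runs lilo; a sketch, assuming the read-only flash device is mounted on /boot:

  # Make the flash-based /boot writable just long enough to update the boot loader.
  mount -o remount,rw /boot
  lilo
  mount -o remount,ro /boot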
* Re: [PATCH md 2 of 4] Fix raid6 problem 2005-02-03 19:36 ` Gordon Henderson 2005-02-04 9:04 ` Andrew Walrond @ 2005-02-14 4:27 ` Tim Moore 2005-02-14 8:05 ` Gordon Henderson 1 sibling, 1 reply; 28+ messages in thread From: Tim Moore @ 2005-02-14 4:27 UTC (permalink / raw) To: linux-raid Gordon Henderson wrote: > > Anyone using Tyan Thunder K8W motherboards??? > > I now know, there is a K8S (server?) version of that mobo, but at the time > it was all orderd, I wasn't aware of it - my thoughts are there there is > some sort of PCI/PCI-X problem with either the motherboard or the chipset, > and in all probability the K8S mobo will have the same chipsset and same > problems anyway... I'm using a K8W at work as a driver client for NAS testing. Onboard Broadcom GigE, Linksys Marvell GigE, 2xWD1200JD + 2xMaxtor Maxline Plus II as RAID-0 and RAID-5 using the Sil_3114, 2.4.29, raidtools 1.0. 2+2x1GB PC-2700 in first and third slots for each CPU. All PCI-X/HT configs set to Auto in BIOS and Jumpers, 2.02b BIOS. No issues except for a bad SATA cable. Striping yields ~90MB/s, RAID-5 about 65r, 55w on 8GB Bonnie++ runs, 2GB dd reads on raw devices yields ~55MB/s Fedora Core 2 tests next week. -- ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-14  4:27 ` Tim Moore
@ 2005-02-14  8:05   ` Gordon Henderson
  0 siblings, 0 replies; 28+ messages in thread
From: Gordon Henderson @ 2005-02-14 8:05 UTC (permalink / raw)
To: Tim Moore; +Cc: linux-raid

On Sun, 13 Feb 2005, Tim Moore wrote:

> Gordon Henderson wrote:
> >
> > Anyone using Tyan Thunder K8W motherboards???
> >
> > I now know there is a K8S (server?) version of that mobo, but at the time
> > it was all ordered, I wasn't aware of it - my thoughts are that there is
> > some sort of PCI/PCI-X problem with either the motherboard or the chipset,
> > and in all probability the K8S mobo will have the same chipset and same
> > problems anyway...
>
> I'm using a K8W at work as a driver client for NAS testing. Onboard
> Broadcom GigE, Linksys Marvell GigE, 2xWD1200JD + 2xMaxtor Maxline Plus II
> as RAID-0 and RAID-5 using the Sil_3114, 2.4.29, raidtools 1.0. 2+2x1GB
> PC-2700 in first and third slots for each CPU. All PCI-X/HT configs set to
> Auto in BIOS and Jumpers, 2.02b BIOS. No issues except for a bad SATA cable.
>
> Striping yields ~90MB/s, RAID-5 about 65r, 55w on 8GB Bonnie++ runs, 2GB dd
> reads on raw devices yields ~55MB/s

I've not had any issues with the on-board 3114 controller. It's not
blindingly fast - it's at the end of 2 PCI bridges and on a 33MHz 32-bit bus
- but it's fine. I've had over 270MB/sec reads out of an 8-way RAID-0 array
(300MB/sec writes!), although as I want raid-6 on all partitions, that drops
down to ~130MB/sec read.

The problems happen when I plug in a PCI SATA card (3112 chipset based). I've
tried 2 different types of cards and while one is better than the other
(causes less or no lock-ups) it's still not perfect. I'm going to try a
4-port card this week. It's solid with only one PCI card in. Additionally, if
I plug a card into PCI slots 3 or 4, then the BIOS locks up at boot time
(Checking NVRAM ... )

Hm. According to the manual, your memory configuration isn't supported - you
should be using slots 1 & 2 for each processor to get 128-bit access... I
only have 2 x 512MB PC2700 modules in slots 1 & 2 of CPU0.

If this box was just going to be a fileserver (NAS sort of thing) then I'd
have gone with a single processor, but it's also going to be running some
huge CVS and MySQL application (home built version control system) and they
specified dual processors. (The existing setup runs on a dual Xeon PII/700
Dull box which takes up too much rack space and doesn't have enough disks)

Gordon

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
@ 2004-11-03 23:56 A. James Lewis
2004-12-09 0:21 ` H. Peter Anvin
[not found] ` <200412090021.iB90L4MK014200@terminus.zytor.com>
0 siblings, 2 replies; 28+ messages in thread
From: A. James Lewis @ 2004-11-03 23:56 UTC (permalink / raw)
To: linux-raid
Hi,
I'd like to put raid6 to use, and after reading through the process of
tracking down this bug, it seems that a rational explanation was found for
the data corruption, and the fix well tested... but being new to a lot of
the process here, what is the process for this to get into the standard
kernel... perhaps 2.6.10 will have this patch??
Obviously I could apply the patch to raid6main.c on my system, but it
would be good to use a standard kernel...
The problem is only when writing to a degraded array, but most of us are
impatient and want to write a filesystem and get it mounted before the
first sync is complete... and those, like me cursed with bad hardware will
have 2 drives fail at the same time (last week!) and hence raid6 is very
appealing :).
--
¯·.¸¸.·´¯·.¸¸.-> A. James Lewis (james@fsck.co.uk)
http://www.fsck.co.uk/personal/nopistons.jpg
MAZDA - World domination through rotary power.
^ permalink raw reply [flat|nested] 28+ messages in thread* Re: [PATCH md 2 of 4] Fix raid6 problem 2004-11-03 23:56 A. James Lewis @ 2004-12-09 0:21 ` H. Peter Anvin 2004-12-09 0:35 ` Jim Paris [not found] ` <200412090021.iB90L4MK014200@terminus.zytor.com> 1 sibling, 1 reply; 28+ messages in thread From: H. Peter Anvin @ 2004-12-09 0:21 UTC (permalink / raw) To: linux-raid Followup to: <38038.212.158.231.74.1099526180.squirrel@mail.fsck.co.uk> By author: "A. James Lewis" <james@fsck.co.uk> In newsgroup: linux.dev.raid > > I'd like to put raid6 to use, and after reading through the process of > tracking down this bug, it seems that a rational explanation was found for > the data corruption, and the fix well tested... but being new to a lot of > the process here, what is the process for this to get into the standard > kernel... perhaps 2.6.10 will have this patch?? > > Obviously I could apply the patch to raid6main.c on my system, but it > would be good to use a standard kernel... > > The problem is only when writing to a degraded array, but most of us are > impatient and want to write a filesystem and get it mounted before the > first sync is complete... and those, like me cursed with bad hardware will > have 2 drives fail at the same time (last week!) and hence raid6 is very > appealing :). > Hi James, This patch got integrated in, I believe, 2.6.10-rc2. Please let me know what your experience is. It would be good to get the EXPERIMENTAL tag taken off at some point. -hpa ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem 2004-12-09 0:21 ` H. Peter Anvin @ 2004-12-09 0:35 ` Jim Paris 0 siblings, 0 replies; 28+ messages in thread From: Jim Paris @ 2004-12-09 0:35 UTC (permalink / raw) To: H. Peter Anvin; +Cc: linux-raid > Please let me know what your experience is. It would be good to get > the EXPERIMENTAL tag taken off at some point. FYI: My raid-6 system (reiserfs) has seen only moderate use and had no disk failures occur, but I have had no problems since applying those patches. -jim ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <200412090021.iB90L4MK014200@terminus.zytor.com>]
* Re: [PATCH md 2 of 4] Fix raid6 problem
  [not found] ` <200412090021.iB90L4MK014200@terminus.zytor.com>
@ 2005-01-23 14:02   ` A. James Lewis
  2005-01-23 14:42     ` Kevin P. Fleming
  2005-02-03  2:12     ` H. Peter Anvin
  0 siblings, 2 replies; 28+ messages in thread
From: A. James Lewis @ 2005-01-23 14:02 UTC (permalink / raw)
To: linux-raid

Sorry for the delay in replying. I've been using RAID6 in a real life
situation with 2.6.9 + patch for 2 months now, with 1.15TB of storage, and I
have had more than 1 drive failure... as well as some rather embarrassing
hardware corruption which I traced to a faulty IDE controller.

Despite some random DMA corruption, and losing a total of 3 disks, I have not
had any problems with RAID6 itself, and really it has literally saved my data
from being lost.

I ran a diff against the 2.6.9 patch and what is in 2.6.10... and they are
not the same, presumably a more elegant fix has been implemented for the
production kernel??

As an aside, at the moment I am experimenting with RAID on top of USB Mass
Storage devices... it's interesting because the USB system takes a
significant time to identify and make each drive available, and I have to
determine if all the drives have become available before starting any
arrays.... Does anyone have any experience with this sort of thing?

I'm sure H. Peter Anvin said something about:

> Followup to: <38038.212.158.231.74.1099526180.squirrel@mail.fsck.co.uk>
> By author: "A. James Lewis" <james@fsck.co.uk>
> In newsgroup: linux.dev.raid
>>
>> I'd like to put raid6 to use, and after reading through the process of
>> tracking down this bug, it seems that a rational explanation was found
>> for the data corruption, and the fix well tested... but being new to a
>> lot of the process here, what is the process for this to get into the
>> standard kernel... perhaps 2.6.10 will have this patch??
>>
>> Obviously I could apply the patch to raid6main.c on my system, but it
>> would be good to use a standard kernel...
>>
>> The problem is only when writing to a degraded array, but most of us are
>> impatient and want to write a filesystem and get it mounted before the
>> first sync is complete... and those, like me cursed with bad hardware
>> will have 2 drives fail at the same time (last week!) and hence raid6 is
>> very appealing :).
>>
>
> Hi James,
>
> This patch got integrated in, I believe, 2.6.10-rc2.
>
> Please let me know what your experience is. It would be good to get
> the EXPERIMENTAL tag taken off at some point.
>
> -hpa

--
¯·.¸¸.·´¯·.¸¸.-> A. James Lewis (james@fsck.co.uk)
http://www.fsck.co.uk/personal/nopistons.jpg
MAZDA - World domination through rotary power.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem 2005-01-23 14:02 ` A. James Lewis @ 2005-01-23 14:42 ` Kevin P. Fleming 2005-02-03 2:12 ` H. Peter Anvin 1 sibling, 0 replies; 28+ messages in thread From: Kevin P. Fleming @ 2005-01-23 14:42 UTC (permalink / raw) Cc: linux-raid A. James Lewis wrote: > At the moment, I am experimenting with RAID on top of USB Mass Storage > devices... it's interesting because the USB system takes a significant > time to identify and make each drive available, and I have to determine if > all the drives have become available before starting any arrays.... You'd be best off to leverage the hotplug infrastructure (userspace scripts) and just not try to start the array until you know all the members are available. ^ permalink raw reply [flat|nested] 28+ messages in thread
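One way to do that from an init or hotplug script is simply to poll for the member devices and only assemble once they are all present. A rough sketch follows; the member list, timeout and array name are made-up examples, not taken from James's setup:

  #!/bin/sh
  # Wait up to 60 seconds for all the USB member disks to appear.
  MEMBERS="/dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1"
  tries=60
  while [ $tries -gt 0 ]; do
      missing=0
      for d in $MEMBERS; do
          [ -b "$d" ] || missing=1
      done
      [ $missing -eq 0 ] && break
      tries=$((tries - 1))
      sleep 1
  done

  # Only assemble the array if every member showed up; otherwise leave it
  # for manual attention rather than starting degraded by accident.
  if [ $missing -eq 0 ]; then
      mdadm --assemble /dev/md0 $MEMBERS
  fi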
* Re: [PATCH md 2 of 4] Fix raid6 problem 2005-01-23 14:02 ` A. James Lewis 2005-01-23 14:42 ` Kevin P. Fleming @ 2005-02-03 2:12 ` H. Peter Anvin 2005-02-03 17:13 ` Andy Smith 1 sibling, 1 reply; 28+ messages in thread From: H. Peter Anvin @ 2005-02-03 2:12 UTC (permalink / raw) To: linux-raid Followup to: <33023.212.158.231.74.1106488921.squirrel@mail.fsck.co.uk> By author: "A. James Lewis" <james@fsck.co.uk> In newsgroup: linux.dev.raid > > > Sorry for the delay in replying, I've been using RAID6 in a real life > situation with 2.6.9 + patch, for 2 months now, with 1.15Tb of storage, > and I have had more than 1 drive failure... as well as some rather > embarasing hardware corruption which I traced to a faulty IDE controller. > > Dispite some random DMA corrupion, and loosing a total of 3 disks, I have > not had any problems with it RAID6 itself, and really it has litereally > saved my data from being lost. > > I ran a diff against the 2.6.9 patch and what is in 2.6.10... and they are > not the same, presumably a more elegant fix has been implimented for the > production kernel?? > I think there are some other (generic) fixes in there too. Anyway... I'm thinking of sending in a patch to take out the "experimental" status of RAID-6. I have been running a 1 TB production server in 1-disk degraded mode for about a month now without incident. -hpa ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-03  2:12 ` H. Peter Anvin
@ 2005-02-03 17:13   ` Andy Smith
  0 siblings, 0 replies; 28+ messages in thread
From: Andy Smith @ 2005-02-03 17:13 UTC (permalink / raw)
To: linux-raid

On Thu, Feb 03, 2005 at 02:12:38AM +0000, H. Peter Anvin wrote:
> Anyway... I'm thinking of sending in a patch to take out the
> "experimental" status of RAID-6. I have been running a 1 TB
> production server in 1-disk degraded mode for about a month now
> without incident.

Out of interest, how many disks does this have and what capacities?

^ permalink raw reply	[flat|nested] 28+ messages in thread
* [PATCH md 0 of 4] Introduction
@ 2004-11-02 3:37 NeilBrown
2004-11-02 3:37 ` [PATCH md 2 of 4] Fix raid6 problem NeilBrown
0 siblings, 1 reply; 28+ messages in thread
From: NeilBrown @ 2004-11-02 3:37 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Following are 4 patches for md/raid against 2.6.10-rc1-mm2.
1/ Fix problem with linear arrays if component devices are > 2terabytes
2/ Fix data corruption in (experimental) RAID6 personality
3/ Fix possible oops with unplug_timer firing at the wrong time.
4/ Add new md personality "faulty".
"Faulty" can be used to inject faults and so test failure modes
of other raid levels and of filesystems.
NeilBrown
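As an aside on patch 4: once a new enough mdadm knows about the "faulty" personality, it can be exercised from user space roughly as sketched below. The layout keywords are taken from later mdadm documentation rather than from this patch series, so treat the exact syntax as an assumption.

  # Wrap a spare partition in a single-device "faulty" md array that
  # injects transient write errors, then build something on top of it.
  mdadm --create /dev/md9 --level=faulty --layout=write-transient \
        --raid-devices=1 /dev/sdb1
  mkfs.ext3 /dev/md9
  # Exercise the filesystem (or layer another md array on /dev/md9) and
  # watch how the upper layer reacts to the injected failures.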
^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH md 2 of 4] Fix raid6 problem
  2004-11-02  3:37 [PATCH md 0 of 4] Introduction NeilBrown
@ 2004-11-02  3:37 ` NeilBrown
  0 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2004-11-02 3:37 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid

Sometimes it didn't read all (working) drives before a parity calculation.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/raid6main.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~	2004-11-02 14:20:15.000000000 +1100
+++ ./drivers/md/raid6main.c	2004-11-02 14:20:15.000000000 +1100
@@ -734,7 +734,6 @@ static void compute_parity(struct stripe
 	case READ_MODIFY_WRITE:
 		BUG();		/* READ_MODIFY_WRITE N/A for RAID-6 */
 	case RECONSTRUCT_WRITE:
-	case UPDATE_PARITY: /* Is this right? */
 		for (i= disks; i-- ;)
 			if ( i != pd_idx && i != qd_idx && sh->dev[i].towrite ) {
 				chosen = sh->dev[i].towrite;
@@ -770,7 +769,8 @@ static void compute_parity(struct stripe
 	i = d0_idx;
 	do {
 		ptrs[count++] = page_address(sh->dev[i].page);
-
+		if (count <= disks-2 && !test_bit(R5_UPTODATE, &sh->dev[i].flags))
+			printk("block %d/%d not uptodate on parity calc\n", i,count);
 		i = raid6_next_disk(i, disks);
 	} while ( i != d0_idx );
 //	break;
@@ -818,7 +818,7 @@ static void compute_block_1(struct strip
 		if (test_bit(R5_UPTODATE, &sh->dev[i].flags))
 			ptr[count++] = p;
 		else
-			PRINTK("compute_block() %d, stripe %llu, %d"
+			printk("compute_block() %d, stripe %llu, %d"
 				" not present\n", dd_idx,
 				(unsigned long long)sh->sector, i);

@@ -875,6 +875,9 @@ static void compute_block_2(struct strip
 	do {
 		ptrs[count++] = page_address(sh->dev[i].page);
 		i = raid6_next_disk(i, disks);
+		if (i != dd_idx1 && i != dd_idx2 &&
+		    !test_bit(R5_UPTODATE, &sh->dev[i].flags))
+			printk("compute_2 with missing block %d/%d\n", count, i);
 	} while ( i != d0_idx );

 	if ( failb == disks-2 ) {
@@ -1157,17 +1160,15 @@ static void handle_stripe(struct stripe_
 	 * parity, or to satisfy requests
 	 * or to load a block that is being partially written.
 	 */
-	if (to_read || non_overwrite || (syncing && (uptodate < disks))) {
+	if (to_read || non_overwrite || (to_write && failed) || (syncing && (uptodate < disks))) {
 		for (i=disks; i--;) {
 			dev = &sh->dev[i];
 			if (!test_bit(R5_LOCKED, &dev->flags) && !test_bit(R5_UPTODATE, &dev->flags) &&
 			    (dev->toread ||
 			     (dev->towrite && !test_bit(R5_OVERWRITE, &dev->flags)) ||
 			     syncing ||
-			     (failed >= 1 && (sh->dev[failed_num[0]].toread ||
-			                      (sh->dev[failed_num[0]].towrite && !test_bit(R5_OVERWRITE, &sh->dev[failed_num[0]].flags)))) ||
-			     (failed >= 2 && (sh->dev[failed_num[1]].toread ||
-			                      (sh->dev[failed_num[1]].towrite && !test_bit(R5_OVERWRITE, &sh->dev[failed_num[1]].flags))))
+			     (failed >= 1 && (sh->dev[failed_num[0]].toread || to_write)) ||
+			     (failed >= 2 && (sh->dev[failed_num[1]].toread || to_write))
 			     )
 			    ) {
				/* we would like to get this block, possibly

^ permalink raw reply	[flat|nested] 28+ messages in thread
end of thread

Thread overview: 28+ messages
[not found] <200502031145.j13Bj1fl016074@terminus.zytor.com>
2005-02-03 16:39 ` [PATCH md 2 of 4] Fix raid6 problem H. Peter Anvin
2005-02-03 16:59 ` Lars Marowsky-Bree
2005-02-03 17:06 ` H. Peter Anvin
2005-02-03 17:43 ` Guy
2005-02-03 18:07 ` H. Peter Anvin
2005-02-03 19:36 ` Gordon Henderson
2005-02-04 9:04 ` Andrew Walrond
2005-02-04 11:19 ` Gordon Henderson
2005-02-04 18:31 ` Mike Hardy
2005-02-13 21:05 ` Mark Hahn
2005-02-13 21:19 ` Gordon Henderson
2005-02-14 4:56 ` Tim Moore
2005-02-14 9:42 ` Andrew Walrond
2005-02-13 22:58 ` Mike Hardy
2005-02-13 23:14 ` Richard Scobie
2005-02-06 3:38 ` Tim Moore
2005-02-14 4:49 ` Tim Moore
2005-02-14 8:09 ` Gordon Henderson
2005-02-14 4:27 ` Tim Moore
2005-02-14 8:05 ` Gordon Henderson
2004-11-03 23:56 A. James Lewis
2004-12-09 0:21 ` H. Peter Anvin
2004-12-09 0:35 ` Jim Paris
[not found] ` <200412090021.iB90L4MK014200@terminus.zytor.com>
2005-01-23 14:02 ` A. James Lewis
2005-01-23 14:42 ` Kevin P. Fleming
2005-02-03 2:12 ` H. Peter Anvin
2005-02-03 17:13 ` Andy Smith
-- strict thread matches above, loose matches on Subject: below --
2004-11-02 3:37 [PATCH md 0 of 4] Introduction NeilBrown
2004-11-02 3:37 ` [PATCH md 2 of 4] Fix raid6 problem NeilBrown