* Re: [PATCH md 2 of 4] Fix raid6 problem
  [not found] <200502031145.j13Bj1fl016074@terminus.zytor.com>
@ 2005-02-03 16:39 ` H. Peter Anvin
  2005-02-03 16:59   ` Lars Marowsky-Bree
  2005-02-03 17:43   ` Guy
  0 siblings, 2 replies; 28+ messages in thread
From: H. Peter Anvin @ 2005-02-03 16:39 UTC (permalink / raw)
To: Ruth Ivimey-Cook; +Cc: linux-raid

Ruth Ivimey-Cook wrote:
>
> Would you say that raid-6 is suitable for storing mission-critical data, then?
>

What I'd say is that I don't have any evidence it's not. Unfortunately,
that's not quite the same thing.

> I have a .5TB raid5 array on 5 IDE disks, and given what has been said recently
> about disk MTBFs and RAID failure recovery, I'm thinking it might be best to
> switch to raid-6.
>
> I guess such a switch is best implemented as {make backup, reformat, restore},
> if I went ahead?

Yes, right now there is no RAID5->RAID6 conversion tool that I know of.

	-hpa

^ permalink raw reply	[flat|nested] 28+ messages in thread
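For anyone facing that switch, the backup / re-create / restore route amounts to something like the sketch below. The device names, mount points and filesystem are illustrative assumptions only, not details from this thread, and note that a 5-disk RAID-6 has one disk less of usable capacity than the RAID-5 it replaces.

    # 1. Back up the existing RAID-5 contents somewhere safe.
    mount /dev/md0 /data
    rsync -a /data/ /mnt/backup/
    umount /data

    # 2. Stop the old array and re-create the same disks as RAID-6.
    mdadm --stop /dev/md0
    mdadm --create /dev/md0 --level=6 --raid-devices=5 \
          /dev/hda1 /dev/hdb1 /dev/hdc1 /dev/hdd1 /dev/hde1

    # 3. New filesystem, then restore.
    mkfs.ext3 /dev/md0
    mount /dev/md0 /data
    rsync -a /mnt/backup/ /data/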
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-03 16:39 ` [PATCH md 2 of 4] Fix raid6 problem H. Peter Anvin
@ 2005-02-03 16:59   ` Lars Marowsky-Bree
  2005-02-03 17:06     ` H. Peter Anvin
  2005-02-03 17:43   ` Guy
  1 sibling, 1 reply; 28+ messages in thread
From: Lars Marowsky-Bree @ 2005-02-03 16:59 UTC (permalink / raw)
To: H. Peter Anvin, Ruth Ivimey-Cook; +Cc: linux-raid

On 2005-02-03T08:39:41, "H. Peter Anvin" <hpa@zytor.com> wrote:

> Yes, right now there is no RAID5->RAID6 conversion tool that I know of.

Hm. One of the checksums is identical, as is the disk layout of the data, no?
So wouldn't mdadm with the right parameters, forcing the right superblock to
be written and then the missing disk to be hot-added, "convert" this?

But yes, I'd recommend a backup before doing so ;-)

With kind regards,
    Lars Marowsky-Brée <lmb@suse.de>

--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem 2005-02-03 16:59 ` Lars Marowsky-Bree @ 2005-02-03 17:06 ` H. Peter Anvin 0 siblings, 0 replies; 28+ messages in thread From: H. Peter Anvin @ 2005-02-03 17:06 UTC (permalink / raw) To: Lars Marowsky-Bree; +Cc: Ruth Ivimey-Cook, linux-raid Lars Marowsky-Bree wrote: > On 2005-02-03T08:39:41, "H. Peter Anvin" <hpa@zytor.com> wrote: > > >>Yes, right now there is no RAID5->RAID6 conversion tool that I know of. > > Hm. One of the checksums is identical, as is the disk layout of the > data, no? > No, the layout is different. -hpa ^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [PATCH md 2 of 4] Fix raid6 problem 2005-02-03 16:39 ` [PATCH md 2 of 4] Fix raid6 problem H. Peter Anvin 2005-02-03 16:59 ` Lars Marowsky-Bree @ 2005-02-03 17:43 ` Guy 2005-02-03 18:07 ` H. Peter Anvin 1 sibling, 1 reply; 28+ messages in thread From: Guy @ 2005-02-03 17:43 UTC (permalink / raw) To: 'H. Peter Anvin', 'Ruth Ivimey-Cook'; +Cc: linux-raid Would you say that the 2.6 Kernel is suitable for storing mission-critical data, then? I ask because I have read about a lot of problems with data corruption and oops on this list and the SCSI list. But in most or all cases the 2.4 Kernel does not have the same problem. Who out there has a RAID6 array that they believe is stable and safe? And please give some details about the array. Number of disks, sizes, LVM, FS, SCSI, ATA and anything else you can think of? Also, details about any disk failures and how well recovery went? Thanks, Guy -----Original Message----- From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of H. Peter Anvin Sent: Thursday, February 03, 2005 11:40 AM To: Ruth Ivimey-Cook Cc: linux-raid@vger.kernel.org Subject: Re: [PATCH md 2 of 4] Fix raid6 problem Ruth Ivimey-Cook wrote: > > Would you say that raid-6 is suitable for storing mission-critical data, then? > What I'd say is that I don't have any evidence it's not. Unfortunately, that's not quite the same thing. > I have a .5TB raid5 array on 5 IDE disks, and given what has been said recently > about disk MTBF's and RAID failure recovery, I'm thinking it might be best to > switch to raid-6. > > I guess such a switch is best implemented as {make backup, reformat, restore}, > if I went ahead? Yes, right now there is no RAID5->RAID6 conversion tool that I know of. -hpa - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem 2005-02-03 17:43 ` Guy @ 2005-02-03 18:07 ` H. Peter Anvin 2005-02-03 19:36 ` Gordon Henderson 0 siblings, 1 reply; 28+ messages in thread From: H. Peter Anvin @ 2005-02-03 18:07 UTC (permalink / raw) To: Guy; +Cc: 'Ruth Ivimey-Cook', linux-raid Guy wrote: > Would you say that the 2.6 Kernel is suitable for storing mission-critical > data, then? Sure. I'd trust 2.6 over 2.4 at this point. > I ask because I have read about a lot of problems with data corruption and > oops on this list and the SCSI list. But in most or all cases the 2.4 > Kernel does not have the same problem. I haven't seen any problems like that, including on kernel.org, which is definitely a high demand site. > Who out there has a RAID6 array that they believe is stable and safe? > And please give some details about the array. Number of disks, sizes, LVM, > FS, SCSI, ATA and anything else you can think of? Also, details about any > disk failures and how well recovery went? The one I have is a 6-disk ATA array (6x250 GB), ext3. Had one disk failure which hasn't been replaced yet; it's successfully running in 1-disk degraded mode. I'll let other people speak for themselves. -hpa ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-03 18:07 ` H. Peter Anvin
@ 2005-02-03 19:36   ` Gordon Henderson
  2005-02-04  9:04     ` Andrew Walrond
  2005-02-14  4:27     ` Tim Moore
  1 sibling, 2 replies; 28+ messages in thread
From: Gordon Henderson @ 2005-02-03 19:36 UTC (permalink / raw)
To: linux-raid

On Thu, 3 Feb 2005, H. Peter Anvin wrote:

> Guy wrote:
> > Would you say that the 2.6 Kernel is suitable for storing mission-critical
> > data, then?
>
> Sure. I'd trust 2.6 over 2.4 at this point.

This is interesting to hear.

> > I ask because I have read about a lot of problems with data corruption and
> > oops on this list and the SCSI list. But in most or all cases the 2.4
> > Kernel does not have the same problem.
>
> I haven't seen any problems like that, including on kernel.org, which is
> definitely a high demand site.
>
> > Who out there has a RAID6 array that they believe is stable and safe?
> > And please give some details about the array. Number of disks, sizes, LVM,
> > FS, SCSI, ATA and anything else you can think of? Also, details about any
> > disk failures and how well recovery went?
>
> The one I have is a 6-disk ATA array (6x250 GB), ext3. Had one disk
> failure which hasn't been replaced yet; it's successfully running in
> 1-disk degraded mode.
>
> I'll let other people speak for themselves.

I asked this question a couple of weeks ago... I didn't get many replies that
inspired confidence, however, I didn't get any "don't do it" replies either.
So I went off and built a test server with a bit of a mish-mash of drives,
controllers and whatnot, and made the effort to get Debian Woody to use the
2.6.10 kernel (stock, no patches) and played with mdadm and RAID-6.

My test server is an old Asus twin Xeon 500MHz board (XG-DLS, I think) with 2
old 4GB IDE drives, 2 older 18GB SCSI drives (on-board controller, one on a
nice 68-way LVD cable, the other on a 50-way flat ribbon) and 2 Maxtor (I
know) 80GB drives on a Highpoint controller.

I was unable to make it crash, or corrupt data (that I could tell) in about a
week's worth of testing. I only hard-pulled a drive once though, and it
stalled for a short while, then did what it was supposed to do and carried
on. I did lots of tests where I failed a drive (using mdadm), then failed a
2nd, then started a re-sync, then started a 2nd resync, and failed a drive
after the 1st resync finished, etc., etc., etc.... Nothing more than RAID-6
and ext3, no LVM, XFS, etc.

So at that point I was reasonably happy with RAID-6 and 2.6.10. My blood had
stopped dripping over the edge, I'm warming to 2.6.10 and thinking RAID-6
might just be the solution to all my problems...

However, I then got production hardware - Tyan Thunder K8W twin Opteron
board, 4-port SATA on-board, 2x2-port SATA in PCI slots (all SII chipset) and
it all went pear-shaped from there. The system locks solid whenever I try to
use the disks off the PCI SATA controllers doing anything much more than run
fdisk on them. (It just stops, no oops, cursor stops flashing on the display,
it needs a hard-reset to get it going again) I've tried PCI slot positions,
fiddling with mobo jumpers, BIOS options, and so on. I can make it work for
varying degrees of "work", however blood is currently flowing over the edge
and gathering in a pool at my feet. Even getting it to boot off the SATA
drives was a challenge in itself (which still isn't solved to my
satisfaction)

Anyone using Tyan Thunder K8W motherboards???

I now know there is a K8S (server?) version of that mobo, but at the time it
was all ordered, I wasn't aware of it - my thoughts are that there is some
sort of PCI/PCI-X problem with either the motherboard or the chipset, and in
all probability the K8S mobo will have the same chipset and same problems
anyway...

Right now, (to test the PCI SATA cards in PCI-X slots), I have 4 x dual-port
SATA cards in a Dell PCI-X mobo connected to the 8 drives in their box via
900mm SATA cables, and it's all running quite nicely. Read performance on a
RAID-0 array was 230MB/sec, write 300MB/sec (!?!); it falls to 110MB/sec
write and 140MB/sec read for RAID-6... (Processor here is a single Xeon
2.4GHz)

Gordon

^ permalink raw reply	[flat|nested] 28+ messages in thread
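The fail / resync soak testing Gordon describes can be driven entirely from mdadm's manage mode. A rough sketch follows; the array and partition names are made up for illustration, not taken from his setup:

    # Fail one drive, then a second, while I/O is running on the filesystem.
    mdadm /dev/md0 --fail /dev/sdc1
    mdadm /dev/md0 --fail /dev/sde1
    cat /proc/mdstat                  # array should now be doubly degraded

    # Remove and re-add the first drive to kick off a resync...
    mdadm /dev/md0 --remove /dev/sdc1
    mdadm /dev/md0 --add /dev/sdc1

    # ...then bring the second one back once the first resync has finished.
    mdadm /dev/md0 --remove /dev/sde1
    mdadm /dev/md0 --add /dev/sde1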
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-03 19:36 ` Gordon Henderson
@ 2005-02-04  9:04   ` Andrew Walrond
  2005-02-04 11:19     ` Gordon Henderson
  2005-02-14  4:27     ` Tim Moore
  1 sibling, 1 reply; 28+ messages in thread
From: Andrew Walrond @ 2005-02-04 9:04 UTC (permalink / raw)
To: linux-raid

Hi Gordon,

On Thursday 03 February 2005 19:36, Gordon Henderson wrote:
>
> However, I then got production hardware - Tyan Thunder K8W twin Opteron
> board, 4-port SATA on-board, 2x2-port SATA in PCI slots (all SII chipset)
> and it all went pear-shaped from there. The system locks solid whenever I
> try to use the disks off the PCI SATA controllers doing anything much more
> than run fdisk on them. (It just stops, no oops, cursor stops flashing on
> the display, it needs a hard-reset to get it going again) I've tried PCI
> slot positions, fiddling with mobo jumpers, BIOS options, and so on. I
> can make it work for varying degrees of "work", however blood is currently
> flowing over the edge and gathering in a pool at my feet. Even getting it
> to boot off the SATA drives was a challenge in itself (which still isn't
> solved to my satisfaction)
>
> Anyone using Tyan Thunder K8W motherboards???
>

I'm using K8W's here with a combo of raid0/1 on on-board SATA, and it's been
rock solid for months (2.6.10). Looks like your problems are all with the PCI
cards, but I can't help there. Since you are using vanilla 2.6.10, Jeff
Garzik (SATA maintainer) should be interested/helpful on LKML if you want to
pursue this.

What was the booting problem? I have no problems here in that regard.

> I now know there is a K8S (server?) version of that mobo, but at the time
> it was all ordered, I wasn't aware of it - my thoughts are that there is
> some sort of PCI/PCI-X problem with either the motherboard or the chipset,
> and in all probability the K8S mobo will have the same chipset and same
> problems anyway...

Right; very similar, no AGP but additional on-board scsi.

> Right now, (to test the PCI SATA cards in PCI-X slots), I have 4 x
> dual-port SATA cards in a Dell PCI-X mobo connected to the 8 drives in
> their box via 900mm SATA cables, and it's all running quite nicely. Read
> performance on a RAID-0 array was 230MB/sec, write 300MB/sec (!?!); it
> falls to 110MB/sec write and 140MB/sec read for RAID-6... (Processor here
> is a single Xeon 2.4GHz)

I wonder if it's the onboard/pci combination that causes the problem on
K8W...

BTW There is a new K8W just out; pci express replaces agp and I think it
supports dual core opterons (when they appear) so check it out before you
place any big orders for the old one :)

Andrew Walrond

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-04  9:04 ` Andrew Walrond
@ 2005-02-04 11:19   ` Gordon Henderson
  2005-02-04 18:31     ` Mike Hardy
                        ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Gordon Henderson @ 2005-02-04 11:19 UTC (permalink / raw)
To: linux-raid

On Fri, 4 Feb 2005, Andrew Walrond wrote:

> Hi Gordon,
>
> > Anyone using Tyan Thunder K8W motherboards???
>
> I'm using K8W's here with a combo of raid0/1 on on-board SATA, and it's been
> rock solid for months (2.6.10). Looks like your problems are all with the PCI
> cards, but I can't help there. Since you are using vanilla 2.6.10, Jeff
> Garzik (SATA maintainer) should be interested/helpful on LKML if you want to
> pursue this.

The on-board stuff was solid for me too. Great in JBOD mode with 4 disks
under Linux s/w RAID. I didn't enable its own RAID system. The problems came
when I added 2 x 2-port SATA PCI cards to get it to access all 8 SATA disks I
have in the box.

> What was the booting problem? I have no problems here in that regard.

When you add extra PCI cards, which disk the system really wants to boot from
depends on where you add the cards and on what order you have the PCI scan
set to in the BIOS. I could boot OK from the on-board controller; I had
problems when I added PCI SATA cards.

To build it, I had to add an IDE disk and do the install on that, then copy
it over to the SATA drives - this is something I've done in the past - Debian
Woody uses 2.4.18 to start with, and doesn't have any SATA drivers. I've done
this in the past without incident - I actually have an IDE drive which I use
for such installs, and although it's a bit of a fiddle, I get the system I
want reasonably quickly. Bizarrely it seems to work better when booting off
an IDE drive, so putting in 2 IDE drives (mirrored) to boot off is an option,
as is putting in an IDE/Flash card.... (Which I've used in the past)

What I wanted was an 8-way RAID-1 for the boot partition (all of /, in
reality) and I've done this many times in the past on other 2-5 way systems
without issue. So I do the stuff I've done in the past, and there's nothing
really new to me in that respect. (I'm using LILO) So when I try to get it to
boot off the md device, it boots and says LIL and then nothing more. (Lilo
diagnostics interpret this as a media failure, or geometry mismatch) If I
make it boot off /dev/sda1 then it would work. (ie. boot off /dev/sda1, root
on /dev/md1, an 8-way RAID-1) I tried many combinations of old (Debian woody)
& new Lilo (compiled from the latest source), I even tried GRUB at one point
with no luck either. It was more frustrating as the turn-around time is
several minutes by the time you go through the BIOS to change the boot
device, then reboot, change lilo.conf, then try again )-:

> > I now know there is a K8S (server?) version of that mobo, but at the time
> > it was all ordered, I wasn't aware of it - my thoughts are that there is
> > some sort of PCI/PCI-X problem with either the motherboard or the chipset,
> > and in all probability the K8S mobo will have the same chipset and same
> > problems anyway...
>
> Right; very similar, no AGP but additional on-board scsi.
>
> > Right now, (to test the PCI SATA cards in PCI-X slots), I have 4 x
> > dual-port SATA cards in a Dell PCI-X mobo connected to the 8 drives in
> > their box via 900mm SATA cables, and it's all running quite nicely. Read
> > performance on a RAID-0 array was 230MB/sec, write 300MB/sec (!?!); it
> > falls to 110MB/sec write and 140MB/sec read for RAID-6... (Processor here
> > is a single Xeon 2.4GHz)
>
> I wonder if it's the onboard/pci combination that causes the problem on
> K8W...

That's what I'm thinking - there are various jumpers to put the PCI-X slots
into PCI mode and lots of BIOS options to control speed, etc, none of which
made any difference. I did try a different brand of SATA PCI card and that
worked slightly better, but I could still force a total lock-up with lots of
access to the SATA drives on the PCI cards. (Mobo & PCI cards are all SII
3114/3112 chipsets)

I have had a private email from someone who has experienced similar lock-ups
with twin Opteron systems and PCI cards, so from that point of view it
doesn't bode well.

I have some quad Opterons running too, and they seem fine, although they are
pure compute servers with just a local IDE drive. (and they are running SuSE
64-bit which is what the application demands)

It seemed more stable with just one PCI card in, so I have a 4-port card on
order as a last ditch attempt to make it work - I did try re-flashing the
BIOS on one board, (I have 2) as it seemed to be about a year old and there
are several updates on the Tyan web-site, however that resulted in wiping out
the BIOS - it seemed to be going just fine, then it went beep and was silent
forever more )-: Anyone in the SW have a flash programmer/copier handy???

> BTW There is a new K8W just out; pci express replaces agp and I think it
> supports dual core opterons (when they appear) so check it out before you
> place any big orders for the old one :)

Maybe, or maybe we just move to an Intel system, although power dissipation
was a consideration and the Opterons are attractive in that aspect... The
case has a 600W PSU before anyone asks..

Cheers,

Gordon

^ permalink raw reply	[flat|nested] 28+ messages in thread
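For reference, the lilo.conf arrangement usually suggested for a root-on-RAID-1 setup like this looks roughly as follows. This is only a sketch: the kernel image name is invented, raid-extra-boot needs one of the more recent lilo 22.x releases, and it is not a claim that it cures the "LIL" hang Gordon is seeing.

    # /etc/lilo.conf (sketch)
    boot=/dev/md1              # install the boot record via the RAID-1 device
    raid-extra-boot=mbr-only   # ...and onto the MBR of every member disk
    root=/dev/md1
    read-only
    lba32

    image=/boot/vmlinuz-2.6.10
        label=linux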
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-04 11:19 ` Gordon Henderson
@ 2005-02-04 18:31   ` Mike Hardy
  2005-02-13 21:05     ` Mark Hahn
  2005-02-06  3:38   ` Tim Moore
  2005-02-14  4:49   ` Tim Moore
  2 siblings, 1 reply; 28+ messages in thread
From: Mike Hardy @ 2005-02-04 18:31 UTC (permalink / raw)
To: linux-raid

Gordon Henderson wrote:

> I have had a private email from someone who has experienced similar
> lock-ups with twin Opteron systems and PCI cards, so from that point of
> view it doesn't bode well.
>
> I have some quad Opterons running too, and they seem fine, although they
> are pure compute servers with just a local IDE drive. (and they are
> running SuSE 64-bit which is what the application demands)

Interesting - the private mail was from me, and I've got two dual Opterons in
service. The one with significantly more PCI activity has significantly more
problems than the one with less PCI activity.

There appears to be a trend here then, if 3 is a trend. We're moving to a
single PCI card (8-port 3ware, iirc) for all off-motherboard disk access, and
I'll report back if that changes anything.

-Mike

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-04 18:31 ` Mike Hardy
@ 2005-02-13 21:05   ` Mark Hahn
  2005-02-13 21:19     ` Gordon Henderson
  2005-02-13 22:58     ` Mike Hardy
  0 siblings, 2 replies; 28+ messages in thread
From: Mark Hahn @ 2005-02-13 21:05 UTC (permalink / raw)
To: Mike Hardy; +Cc: linux-raid

> Interesting - the private mail was from me, and I've got two dual
> Opterons in service. The one with significantly more PCI activity has
> significantly more problems than the one with less PCI activity.

that's pretty odd, since the most intense IO devices I know of are cluster
interconnect (quadrics, myrinet, infiniband), and those vendors *love*
opterons. I've never heard any of them say other than that Opteron IO
handling is noticeably better than Intel's.

otoh, I could easily believe that if you're running the Opteron systems in
acts-like-a-faster-xeon mode (ie, not x86_64), you might be exercising some
less-tested paths.

regards, mark hahn.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-13 21:05 ` Mark Hahn
@ 2005-02-13 21:19   ` Gordon Henderson
  2005-02-14  4:56     ` Tim Moore
  2005-02-13 22:58   ` Mike Hardy
  1 sibling, 1 reply; 28+ messages in thread
From: Gordon Henderson @ 2005-02-13 21:19 UTC (permalink / raw)
To: linux-raid

On Sun, 13 Feb 2005, Mark Hahn wrote:

> > Interesting - the private mail was from me, and I've got two dual
> > Opterons in service. The one with significantly more PCI activity has
> > significantly more problems than the one with less PCI activity.
>
> that's pretty odd, since the most intense IO devices I know of
> are cluster interconnect (quadrics, myrinet, infiniband),
> and those vendors *love* opterons. I've never heard any of them
> say other than that Opteron IO handling is noticeably better than
> Intel's.
>
> otoh, I could easily believe that if you're running the Opteron
> systems in acts-like-a-faster-xeon mode (ie, not x86_64),
> you might be exercising some less-tested paths.

I was about to post that I've solved my problems with that Tyan dual Opteron
motherboard, but it's still crap. I upgraded the BIOS to the 2.02 beta and it
seemed to work a lot better. Still couldn't boot off it with all 8 drives in,
but solved that with the use of a 32MB flash IDE unit holding /boot...
However, it dropped a drive during initial sync of the raid6 arrays with lots
of SCSI errors, and had given lots of "DMA interrupt missing" errors, etc.,
through the day when I've run soak tests on it, so I'm going to conclude that
that Tyan motherboard is utterly useless and deserves nothing more than being
driven over. Slowly. With a steam roller. Then jumped on. Just to make me
feel better.

Gordon

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-13 21:19 ` Gordon Henderson
@ 2005-02-14  4:56   ` Tim Moore
  2005-02-14  9:42     ` Andrew Walrond
  0 siblings, 1 reply; 28+ messages in thread
From: Tim Moore @ 2005-02-14 4:56 UTC (permalink / raw)
To: linux-raid

Gordon Henderson wrote:
> On Sun, 13 Feb 2005, Mark Hahn wrote:
> I was about to post that I've solved my problems with that Tyan dual
> Opteron motherboard, but it's still crap. I upgraded the BIOS to the 2.02
> beta and it seemed to work a lot better. Still couldn't boot off it with
> all 8 drives in, but solved that with the use of a 32MB flash IDE unit
> holding /boot... However, it dropped a drive during initial sync of the
> raid6 arrays with lots of SCSI errors, and had given lots of "DMA interrupt
> missing" errors, etc., through the day when I've run soak tests on it, so I'm going

2 of the 4 SATA cables packaged with the K8W were intermittently bad
(SCSI/DMA errors, BIOS didn't see them at boot). New cables, no problem. No,
I don't work for AMD or Tyan :)

I tried the K8W as a high throughput client driver platform because I
couldn't take the budget hit on an Iwill QK8S which we use in
production...they work like perfect screaming daemons.

Also considered the MSI board but no NUMA memory interconnect and no 64-bit
slots.

Good luck.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-14  4:56 ` Tim Moore
@ 2005-02-14  9:42   ` Andrew Walrond
  0 siblings, 0 replies; 28+ messages in thread
From: Andrew Walrond @ 2005-02-14 9:42 UTC (permalink / raw)
To: linux-raid

On Monday 14 February 2005 04:56, Tim Moore wrote:
>
> Also considered the MSI board but no NUMA memory interconnect and no 64-bit
> slots.
>

I have some MSI boards here. The K8D Master3 has both NUMA and 64-bit slots.
I've been running 2.6.10 flawlessly with more than a month's uptime so far.

Andrew Walrond

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-13 21:05 ` Mark Hahn
  2005-02-13 21:19   ` Gordon Henderson
@ 2005-02-13 22:58   ` Mike Hardy
  2005-02-13 23:14     ` Richard Scobie
  1 sibling, 1 reply; 28+ messages in thread
From: Mike Hardy @ 2005-02-13 22:58 UTC (permalink / raw)
Cc: linux-raid

Mark Hahn wrote:
>>Interesting - the private mail was from me, and I've got two dual
>>Opterons in service. The one with significantly more PCI activity has
>>significantly more problems than the one with less PCI activity.
>
> that's pretty odd, since the most intense IO devices I know of
> are cluster interconnect (quadrics, myrinet, infiniband),
> and those vendors *love* opterons. I've never heard any of them
> say other than that Opteron IO handling is noticeably better than
> Intel's.

Sure, but which variables are changed between the rigs the vendors loved, and
the rig we're having problems with?

> otoh, I could easily believe that if you're running the Opteron
> systems in acts-like-a-faster-xeon mode (ie, not x86_64),
> you might be exercising some less-tested paths.

It's running x86_64 (Fedora Core 3) and the problem is rooted in the chipset,
I believe. I don't think it's Opterons per se, I think it's just the Athlon
take two - which is to say that it's a wonderful chip, but some of the
chipsets it's saddled with are horrible, and careful selection (as well as
heavy testing prior to putting a machine in service) is essential.

-Mike

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem 2005-02-13 22:58 ` Mike Hardy @ 2005-02-13 23:14 ` Richard Scobie 0 siblings, 0 replies; 28+ messages in thread From: Richard Scobie @ 2005-02-13 23:14 UTC (permalink / raw) To: linux-raid Mike Hardy wrote: > Its running x86_64 (Fedora Core 3) and the problem is rooted in the > chipset I believe. I don't think its Opterons per se, I think its just > the Athlon take two - which is to say that its a wonderful chip, but > some of the chipsets its saddled with are horrible, and careful > selection (as well as heavy testing prior to putting a machine in > service) is essential. I'd be interested to hear the outcome, as I'll be looking at Opteron systems soon and having been badly bitten on dual Athlon (AMD768 Southbridge), would prefer to avoid a repeat of the experience. Regards, Richard ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-04 11:19 ` Gordon Henderson
  2005-02-04 18:31   ` Mike Hardy
@ 2005-02-06  3:38   ` Tim Moore
  2005-02-14  4:49   ` Tim Moore
  2 siblings, 0 replies; 28+ messages in thread
From: Tim Moore @ 2005-02-06 3:38 UTC (permalink / raw)
To: linux-raid; +Cc: Gordon Henderson

Gordon Henderson wrote:
> ...
> It seemed more stable with just one PCI card in, so I have a 4-port card
> on order as a last ditch attempt to make it work - I did try re-flashing
> the BIOS on one board, (I have 2) as it seemed to be about a year old and
> there are several updates on the Tyan web-site, however that resulted in
> wiping out the BIOS - it seemed to be going just fine, then it went beep
> and was silent forever more )-: Anyone in the SW have a flash
> programmer/copier handy???

I had this happen when flashing 2.02b. Reboot was an endless cycle of 15
short beeps. Turns out the 2.02 memory configuration is so different from the
1.?? that it couldn't boot. Swapped the 1GB PC2700 (ATP) memory with 512MB
modules, cleared CMOS, rebooted, reconfigured the BIOS, shut down, replaced
the 1GB production modules, rebooted. Perfect ever since.

The onboard SiI 3114 with software RAID5 has been flawless (RH 7.3 base,
2.4.29 kernel, SCSI/sil driver). The only other problem was that two of the 4
short black SATA cables that came with the board were intermittent. Once
replaced all is well.

I'd give the 2.02 beta BIOS a run and make sure the PCI-X slot configuration
is matched to the controller card. The 2.02 BIOS is very different from
earlier versions.

We run Iwill quad Opteron boards and are quite happy to never go back to
Intel ever again.

t.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-04 11:19 ` Gordon Henderson
  2005-02-04 18:31   ` Mike Hardy
  2005-02-06  3:38   ` Tim Moore
@ 2005-02-14  4:49   ` Tim Moore
  2005-02-14  8:09     ` Gordon Henderson
  2 siblings, 1 reply; 28+ messages in thread
From: Tim Moore @ 2005-02-14 4:49 UTC (permalink / raw)
To: linux-raid

Gordon Henderson wrote:
> What I wanted was an 8-way RAID-1 for the boot partition (all of /, in
> reality) and I've done this many times in the past on other 2-5 way
> systems without issue. So I do the stuff I've done in the past, and there's
> nothing really new to me in that respect. (I'm using LILO) So when I try
> to get it to boot off the md device, it boots and says LIL and then
> nothing more. (Lilo diagnostics interpret this as a media failure, or
> geometry mismatch) If I make it boot off /dev/sda1 then it would work.

We put /boot on a 100MB /dev/sda1 partition; the rest of the drive is md. The
lilo script section does

  dd if=/dev/sda of=/boot/boot446.sda bs=446 count=1 && \
  fdisk -l /dev/sda > /boot/fdisk.sda && \
  dd if=/dev/sda1 of=/dev/sdb1

every time a new kernel is built. Recovery is much easier without RAID
involved (lilo 22.6). I've considered manipulating the boot block/disk label
on copy so that it would boot off either of sda1 or sdb1 transparently.

> (ie. boot off /dev/sda1, root on /dev/md1, an 8-way RAID-1) I tried many
> combinations of old (Debian woody) & new Lilo (compiled from the latest
> source), I even tried GRUB at one point with no luck either. It was more
> frustrating as the turn-around time is several minutes by the time you go
> through the BIOS to change the boot device, then reboot, change lilo.conf,
> then try again )-:
>
> It seemed more stable with just one PCI card in, so I have a 4-port card
> on order as a last ditch attempt to make it work - I did try re-flashing
> the BIOS on one board, (I have 2) as it seemed to be about a year old and
> there are several updates on the Tyan web-site, however that resulted in
> wiping out the BIOS - it seemed to be going just fine, then it went beep
> and was silent forever more )-: Anyone in the SW have a flash
> programmer/copier handy???

Flashing from 1.x to 2.02b, same problem. Power off, pull plug, pull both
power connectors off the mobo, wait 15 seconds, clear CMOS for 15 seconds,
reboot, reset BIOS, no worries.

> Maybe, or maybe we just move to an Intel system, although power
> dissipation was a consideration and the Opterons are attractive in that
> aspect... The case has a 600W PSU before anyone asks..

Yeech. Get the 8131's working and you'll never go back. No Northbridge
bottlenecks, thank you.

^ permalink raw reply	[flat|nested] 28+ messages in thread
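Spelled out with comments, Tim's per-kernel-build hook does the following (his paths; in the original the three steps are simply chained with && so each one only runs if the previous succeeded):

  # Save the MBR boot code (first 446 bytes -- the partition table is not touched).
  dd if=/dev/sda of=/boot/boot446.sda bs=446 count=1
  # Keep a copy of the partition layout for disaster recovery.
  fdisk -l /dev/sda > /boot/fdisk.sda
  # Clone the non-RAID /boot partition onto the second drive by hand.
  dd if=/dev/sda1 of=/dev/sdb1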
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-14  4:49 ` Tim Moore
@ 2005-02-14  8:09   ` Gordon Henderson
  0 siblings, 0 replies; 28+ messages in thread
From: Gordon Henderson @ 2005-02-14 8:09 UTC (permalink / raw)
To: Tim Moore; +Cc: linux-raid

On Sun, 13 Feb 2005, Tim Moore wrote:

> Gordon Henderson wrote:
> > What I wanted was an 8-way RAID-1 for the boot partition (all of /, in
> > reality) and I've done this many times in the past on other 2-5 way
> > systems without issue. So I do the stuff I've done in the past, and there's
> > nothing really new to me in that respect. (I'm using LILO) So when I try
> > to get it to boot off the md device, it boots and says LIL and then
> > nothing more. (Lilo diagnostics interpret this as a media failure, or
> > geometry mismatch) If I make it boot off /dev/sda1 then it would work.
>
> We put /boot on a 100MB /dev/sda1 partition; the rest of the drive is md.
> The lilo script section does
>   dd if=/dev/sda of=/boot/boot446.sda bs=446 count=1 && \
>   fdisk -l /dev/sda > /boot/fdisk.sda && \
>   dd if=/dev/sda1 of=/dev/sdb1
> every time a new kernel is built. Recovery is much easier without RAID
> involved (lilo 22.6). I've considered manipulating the boot block/disk
> label on copy so that it would boot off either of sda1 or sdb1
> transparently.

It's now booting off a 32MB flash IDE drive thing which is mounted read-only
under /boot. I have lilo remount it r/w, then do its stuff, then remount it
r/o again.

I could boot it OK under raid-1 off the 4 drives on the on-board controller.
As soon as I plug in PCI disk controllers it all goes pear-shaped. This
motherboard & BIOS have serious issues with more than 4 SATA disks.

> Flashing from 1.x to 2.02b, same problem. Power off, pull plug, pull
> both power connectors off mobo, wait 15 seconds, clear CMOS for 15
> seconds, reboot, reset BIOS, no worries.

I re-flashed one board to 2.02b - it helps in that the system doesn't lock
hard with one make of 3112 card, but the other make can still cause a hard
lockup. I also seem to have lost the use of PCI slots 3 and 4 with this new
BIOS. One board died during re-flashing, so it's gone back.

Gordon

^ permalink raw reply	[flat|nested] 28+ messages in thread
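The remount dance Gordon describes only needs a couple of lines in whatever wrapper runs lilo; a sketch, assuming the read-only flash device is mounted on /boot:

  # Make the flash-based /boot writable just long enough to update the boot loader.
  mount -o remount,rw /boot
  lilo
  mount -o remount,ro /boot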
* Re: [PATCH md 2 of 4] Fix raid6 problem 2005-02-03 19:36 ` Gordon Henderson 2005-02-04 9:04 ` Andrew Walrond @ 2005-02-14 4:27 ` Tim Moore 2005-02-14 8:05 ` Gordon Henderson 1 sibling, 1 reply; 28+ messages in thread From: Tim Moore @ 2005-02-14 4:27 UTC (permalink / raw) To: linux-raid Gordon Henderson wrote: > > Anyone using Tyan Thunder K8W motherboards??? > > I now know, there is a K8S (server?) version of that mobo, but at the time > it was all orderd, I wasn't aware of it - my thoughts are there there is > some sort of PCI/PCI-X problem with either the motherboard or the chipset, > and in all probability the K8S mobo will have the same chipsset and same > problems anyway... I'm using a K8W at work as a driver client for NAS testing. Onboard Broadcom GigE, Linksys Marvell GigE, 2xWD1200JD + 2xMaxtor Maxline Plus II as RAID-0 and RAID-5 using the Sil_3114, 2.4.29, raidtools 1.0. 2+2x1GB PC-2700 in first and third slots for each CPU. All PCI-X/HT configs set to Auto in BIOS and Jumpers, 2.02b BIOS. No issues except for a bad SATA cable. Striping yields ~90MB/s, RAID-5 about 65r, 55w on 8GB Bonnie++ runs, 2GB dd reads on raw devices yields ~55MB/s Fedora Core 2 tests next week. -- ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-14  4:27 ` Tim Moore
@ 2005-02-14  8:05   ` Gordon Henderson
  0 siblings, 0 replies; 28+ messages in thread
From: Gordon Henderson @ 2005-02-14 8:05 UTC (permalink / raw)
To: Tim Moore; +Cc: linux-raid

On Sun, 13 Feb 2005, Tim Moore wrote:

> Gordon Henderson wrote:
> >
> > Anyone using Tyan Thunder K8W motherboards???
> >
> > I now know there is a K8S (server?) version of that mobo, but at the time
> > it was all ordered, I wasn't aware of it - my thoughts are that there is
> > some sort of PCI/PCI-X problem with either the motherboard or the chipset,
> > and in all probability the K8S mobo will have the same chipset and same
> > problems anyway...
>
> I'm using a K8W at work as a driver client for NAS testing. Onboard
> Broadcom GigE, Linksys Marvell GigE, 2xWD1200JD + 2xMaxtor Maxline Plus II
> as RAID-0 and RAID-5 using the Sil_3114, 2.4.29, raidtools 1.0. 2+2x1GB
> PC-2700 in first and third slots for each CPU. All PCI-X/HT configs set to
> Auto in BIOS and Jumpers, 2.02b BIOS. No issues except for a bad SATA cable.
>
> Striping yields ~90MB/s, RAID-5 about 65r, 55w on 8GB Bonnie++ runs, 2GB dd
> reads on raw devices yields ~55MB/s

I've not had any issues with the on-board 3114 controller. It's not
blindingly fast - it's at the end of 2 PCI bridges and on a 33MHz 32-bit bus
- but it's fine. I've had over 270MB/sec reads out of an 8-way RAID-0 array
(300MB/sec writes!), although as I want raid-6 on all partitions, that drops
down to ~130MB/sec read.

The problems happen when I plug in a PCI SATA card (3112 chipset based). I've
tried 2 different types of cards and while one is better than the other
(causes less or no lock-ups) it's still not perfect. I'm going to try a
4-port card this week. It's solid with only one PCI card in. Additionally, if
I plug a card into PCI slots 3 or 4, then the BIOS locks up at boot time
(Checking NVRAM ... )

Hm. According to the manual, your memory configuration isn't supported - you
should be using slots 1 & 2 for each processor to get 128-bit access... I
only have 2 x 512MB PC2700 modules in slots 1 & 2 of CPU0.

If this box was just going to be a fileserver (NAS sort of thing) then I'd
have gone with a single processor, but it's also going to be running some
huge CVS and MySQL application (home built version control system) and they
specified dual processors. (The existing setup runs on a dual Xeon PII/700
Dull box which takes up too much rack space and doesn't have enough disks)

Gordon

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
@ 2004-11-03 23:56 A. James Lewis
2004-12-09 0:21 ` H. Peter Anvin
[not found] ` <200412090021.iB90L4MK014200@terminus.zytor.com>
0 siblings, 2 replies; 28+ messages in thread
From: A. James Lewis @ 2004-11-03 23:56 UTC (permalink / raw)
To: linux-raid
Hi,
I'd like to put raid6 to use, and after reading through the process of
tracking down this bug, it seems that a rational explanation was found for
the data corruption, and the fix well tested... but being new to a lot of
the process here, what is the process for this to get into the standard
kernel... perhaps 2.6.10 will have this patch??
Obviously I could apply the patch to raid6main.c on my system, but it
would be good to use a standard kernel...
The problem is only when writing to a degraded array, but most of us are
impatient and want to write a filesystem and get it mounted before the
first sync is complete... and those, like me cursed with bad hardware will
have 2 drives fail at the same time (last week!) and hence raid6 is very
appealing :).
--
¯·.¸¸.·´¯·.¸¸.-> A. James Lewis (james@fsck.co.uk)
http://www.fsck.co.uk/personal/nopistons.jpg
MAZDA - World domination through rotary power.
^ permalink raw reply [flat|nested] 28+ messages in thread* Re: [PATCH md 2 of 4] Fix raid6 problem 2004-11-03 23:56 A. James Lewis @ 2004-12-09 0:21 ` H. Peter Anvin 2004-12-09 0:35 ` Jim Paris [not found] ` <200412090021.iB90L4MK014200@terminus.zytor.com> 1 sibling, 1 reply; 28+ messages in thread From: H. Peter Anvin @ 2004-12-09 0:21 UTC (permalink / raw) To: linux-raid Followup to: <38038.212.158.231.74.1099526180.squirrel@mail.fsck.co.uk> By author: "A. James Lewis" <james@fsck.co.uk> In newsgroup: linux.dev.raid > > I'd like to put raid6 to use, and after reading through the process of > tracking down this bug, it seems that a rational explanation was found for > the data corruption, and the fix well tested... but being new to a lot of > the process here, what is the process for this to get into the standard > kernel... perhaps 2.6.10 will have this patch?? > > Obviously I could apply the patch to raid6main.c on my system, but it > would be good to use a standard kernel... > > The problem is only when writing to a degraded array, but most of us are > impatient and want to write a filesystem and get it mounted before the > first sync is complete... and those, like me cursed with bad hardware will > have 2 drives fail at the same time (last week!) and hence raid6 is very > appealing :). > Hi James, This patch got integrated in, I believe, 2.6.10-rc2. Please let me know what your experience is. It would be good to get the EXPERIMENTAL tag taken off at some point. -hpa ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem 2004-12-09 0:21 ` H. Peter Anvin @ 2004-12-09 0:35 ` Jim Paris 0 siblings, 0 replies; 28+ messages in thread From: Jim Paris @ 2004-12-09 0:35 UTC (permalink / raw) To: H. Peter Anvin; +Cc: linux-raid > Please let me know what your experience is. It would be good to get > the EXPERIMENTAL tag taken off at some point. FYI: My raid-6 system (reiserfs) has seen only moderate use and had no disk failures occur, but I have had no problems since applying those patches. -jim ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <200412090021.iB90L4MK014200@terminus.zytor.com>]
* Re: [PATCH md 2 of 4] Fix raid6 problem
  [not found] ` <200412090021.iB90L4MK014200@terminus.zytor.com>
@ 2005-01-23 14:02   ` A. James Lewis
  2005-01-23 14:42     ` Kevin P. Fleming
  2005-02-03  2:12     ` H. Peter Anvin
  0 siblings, 2 replies; 28+ messages in thread
From: A. James Lewis @ 2005-01-23 14:02 UTC (permalink / raw)
To: linux-raid

Sorry for the delay in replying. I've been using RAID6 in a real life
situation with 2.6.9 + patch for 2 months now, with 1.15TB of storage, and I
have had more than 1 drive failure... as well as some rather embarrassing
hardware corruption which I traced to a faulty IDE controller.

Despite some random DMA corruption, and losing a total of 3 disks, I have not
had any problems with RAID6 itself, and really it has literally saved my data
from being lost.

I ran a diff against the 2.6.9 patch and what is in 2.6.10... and they are
not the same, presumably a more elegant fix has been implemented for the
production kernel??

As an aside, at the moment I am experimenting with RAID on top of USB Mass
Storage devices... it's interesting because the USB system takes a
significant time to identify and make each drive available, and I have to
determine if all the drives have become available before starting any
arrays.... Does anyone have any experience with this sort of thing?

I'm sure H. Peter Anvin said something about:

> Followup to: <38038.212.158.231.74.1099526180.squirrel@mail.fsck.co.uk>
> By author: "A. James Lewis" <james@fsck.co.uk>
> In newsgroup: linux.dev.raid
>>
>> I'd like to put raid6 to use, and after reading through the process of
>> tracking down this bug, it seems that a rational explanation was found
>> for the data corruption, and the fix well tested... but being new to a
>> lot of the process here, what is the process for this to get into the
>> standard kernel... perhaps 2.6.10 will have this patch??
>>
>> Obviously I could apply the patch to raid6main.c on my system, but it
>> would be good to use a standard kernel...
>>
>> The problem is only when writing to a degraded array, but most of us are
>> impatient and want to write a filesystem and get it mounted before the
>> first sync is complete... and those, like me cursed with bad hardware
>> will have 2 drives fail at the same time (last week!) and hence raid6 is
>> very appealing :).
>>
>
> Hi James,
>
> This patch got integrated in, I believe, 2.6.10-rc2.
>
> Please let me know what your experience is. It would be good to get
> the EXPERIMENTAL tag taken off at some point.
>
> -hpa

--
¯·.¸¸.·´¯·.¸¸.-> A. James Lewis (james@fsck.co.uk)
http://www.fsck.co.uk/personal/nopistons.jpg
MAZDA - World domination through rotary power.

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem 2005-01-23 14:02 ` A. James Lewis @ 2005-01-23 14:42 ` Kevin P. Fleming 2005-02-03 2:12 ` H. Peter Anvin 1 sibling, 0 replies; 28+ messages in thread From: Kevin P. Fleming @ 2005-01-23 14:42 UTC (permalink / raw) Cc: linux-raid A. James Lewis wrote: > At the moment, I am experimenting with RAID on top of USB Mass Storage > devices... it's interesting because the USB system takes a significant > time to identify and make each drive available, and I have to determine if > all the drives have become available before starting any arrays.... You'd be best off to leverage the hotplug infrastructure (userspace scripts) and just not try to start the array until you know all the members are available. ^ permalink raw reply [flat|nested] 28+ messages in thread
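One way to do that from an init or hotplug script is simply to poll for the member devices and only assemble once they are all present. A rough sketch follows; the member list, timeout and array name are made-up examples, not taken from James's setup:

  #!/bin/sh
  # Wait up to 60 seconds for all the USB member disks to appear.
  MEMBERS="/dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1"
  tries=60
  while [ $tries -gt 0 ]; do
      missing=0
      for d in $MEMBERS; do
          [ -b "$d" ] || missing=1
      done
      [ $missing -eq 0 ] && break
      tries=$((tries - 1))
      sleep 1
  done

  # Only assemble the array if every member showed up; otherwise leave it
  # for manual attention rather than starting degraded by accident.
  if [ $missing -eq 0 ]; then
      mdadm --assemble /dev/md0 $MEMBERS
  fi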
* Re: [PATCH md 2 of 4] Fix raid6 problem 2005-01-23 14:02 ` A. James Lewis 2005-01-23 14:42 ` Kevin P. Fleming @ 2005-02-03 2:12 ` H. Peter Anvin 2005-02-03 17:13 ` Andy Smith 1 sibling, 1 reply; 28+ messages in thread From: H. Peter Anvin @ 2005-02-03 2:12 UTC (permalink / raw) To: linux-raid Followup to: <33023.212.158.231.74.1106488921.squirrel@mail.fsck.co.uk> By author: "A. James Lewis" <james@fsck.co.uk> In newsgroup: linux.dev.raid > > > Sorry for the delay in replying, I've been using RAID6 in a real life > situation with 2.6.9 + patch, for 2 months now, with 1.15Tb of storage, > and I have had more than 1 drive failure... as well as some rather > embarasing hardware corruption which I traced to a faulty IDE controller. > > Dispite some random DMA corrupion, and loosing a total of 3 disks, I have > not had any problems with it RAID6 itself, and really it has litereally > saved my data from being lost. > > I ran a diff against the 2.6.9 patch and what is in 2.6.10... and they are > not the same, presumably a more elegant fix has been implimented for the > production kernel?? > I think there are some other (generic) fixes in there too. Anyway... I'm thinking of sending in a patch to take out the "experimental" status of RAID-6. I have been running a 1 TB production server in 1-disk degraded mode for about a month now without incident. -hpa ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH md 2 of 4] Fix raid6 problem
  2005-02-03  2:12 ` H. Peter Anvin
@ 2005-02-03 17:13   ` Andy Smith
  0 siblings, 0 replies; 28+ messages in thread
From: Andy Smith @ 2005-02-03 17:13 UTC (permalink / raw)
To: linux-raid

On Thu, Feb 03, 2005 at 02:12:38AM +0000, H. Peter Anvin wrote:
> Anyway... I'm thinking of sending in a patch to take out the
> "experimental" status of RAID-6. I have been running a 1 TB
> production server in 1-disk degraded mode for about a month now
> without incident.

Out of interest, how many disks does this have and what capacities?

^ permalink raw reply	[flat|nested] 28+ messages in thread
* [PATCH md 0 of 4] Introduction
@ 2004-11-02 3:37 NeilBrown
2004-11-02 3:37 ` [PATCH md 2 of 4] Fix raid6 problem NeilBrown
0 siblings, 1 reply; 28+ messages in thread
From: NeilBrown @ 2004-11-02 3:37 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Following are 4 patches for md/raid against 2.6.10-rc1-mm2.
1/ Fix problem with linear arrays if component devices are > 2terabytes
2/ Fix data corruption in (experimental) RAID6 personality
3/ Fix possible oops with unplug_timer firing at the wrong time.
4/ Add new md personality "faulty".
"Faulty" can be used to inject faults and so test failure modes
of other raid levels and of filesystems.
NeilBrown
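As an aside on patch 4: once a new enough mdadm knows about the "faulty" personality, it can be exercised from user space roughly as sketched below. The layout keywords are taken from later mdadm documentation rather than from this patch series, so treat the exact syntax as an assumption.

  # Wrap a spare partition in a single-device "faulty" md array that
  # injects transient write errors, then build something on top of it.
  mdadm --create /dev/md9 --level=faulty --layout=write-transient \
        --raid-devices=1 /dev/sdb1
  mkfs.ext3 /dev/md9
  # Exercise the filesystem (or layer another md array on /dev/md9) and
  # watch how the upper layer reacts to the injected failures.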
^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH md 2 of 4] Fix raid6 problem
  2004-11-02  3:37 [PATCH md 0 of 4] Introduction NeilBrown
@ 2004-11-02  3:37 ` NeilBrown
  0 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2004-11-02 3:37 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid

Sometimes it didn't read all (working) drives before a parity calculation.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/raid6main.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~	2004-11-02 14:20:15.000000000 +1100
+++ ./drivers/md/raid6main.c	2004-11-02 14:20:15.000000000 +1100
@@ -734,7 +734,6 @@ static void compute_parity(struct stripe
 	case READ_MODIFY_WRITE:
 		BUG();		/* READ_MODIFY_WRITE N/A for RAID-6 */
 	case RECONSTRUCT_WRITE:
-	case UPDATE_PARITY: /* Is this right? */
 		for (i= disks; i-- ;)
 			if ( i != pd_idx && i != qd_idx && sh->dev[i].towrite ) {
 				chosen = sh->dev[i].towrite;
@@ -770,7 +769,8 @@ static void compute_parity(struct stripe
 	i = d0_idx;
 	do {
 		ptrs[count++] = page_address(sh->dev[i].page);
-
+		if (count <= disks-2 && !test_bit(R5_UPTODATE, &sh->dev[i].flags))
+			printk("block %d/%d not uptodate on parity calc\n", i,count);
 		i = raid6_next_disk(i, disks);
 	} while ( i != d0_idx );
 //	break;
@@ -818,7 +818,7 @@ static void compute_block_1(struct strip
 		if (test_bit(R5_UPTODATE, &sh->dev[i].flags))
 			ptr[count++] = p;
 		else
-			PRINTK("compute_block() %d, stripe %llu, %d"
+			printk("compute_block() %d, stripe %llu, %d"
 				" not present\n", dd_idx,
 				(unsigned long long)sh->sector, i);

@@ -875,6 +875,9 @@ static void compute_block_2(struct strip
 	do {
 		ptrs[count++] = page_address(sh->dev[i].page);
 		i = raid6_next_disk(i, disks);
+		if (i != dd_idx1 && i != dd_idx2 &&
+		    !test_bit(R5_UPTODATE, &sh->dev[i].flags))
+			printk("compute_2 with missing block %d/%d\n", count, i);
 	} while ( i != d0_idx );

 	if ( failb == disks-2 ) {
@@ -1157,17 +1160,15 @@ static void handle_stripe(struct stripe_
 	 * parity, or to satisfy requests
 	 * or to load a block that is being partially written.
 	 */
-	if (to_read || non_overwrite || (syncing && (uptodate < disks))) {
+	if (to_read || non_overwrite || (to_write && failed) || (syncing && (uptodate < disks))) {
 		for (i=disks; i--;) {
 			dev = &sh->dev[i];
 			if (!test_bit(R5_LOCKED, &dev->flags) && !test_bit(R5_UPTODATE, &dev->flags) &&
 			    (dev->toread ||
 			     (dev->towrite && !test_bit(R5_OVERWRITE, &dev->flags)) ||
 			     syncing ||
-			     (failed >= 1 && (sh->dev[failed_num[0]].toread ||
-			                      (sh->dev[failed_num[0]].towrite && !test_bit(R5_OVERWRITE, &sh->dev[failed_num[0]].flags)))) ||
-			     (failed >= 2 && (sh->dev[failed_num[1]].toread ||
-			                      (sh->dev[failed_num[1]].towrite && !test_bit(R5_OVERWRITE, &sh->dev[failed_num[1]].flags))))
+			     (failed >= 1 && (sh->dev[failed_num[0]].toread || to_write)) ||
+			     (failed >= 2 && (sh->dev[failed_num[1]].toread || to_write))
 			     )
 			    ) {
				/* we would like to get this block, possibly

^ permalink raw reply	[flat|nested] 28+ messages in thread
end of thread

Thread overview: 28+ messages
[not found] <200502031145.j13Bj1fl016074@terminus.zytor.com>
2005-02-03 16:39 ` [PATCH md 2 of 4] Fix raid6 problem H. Peter Anvin
2005-02-03 16:59 ` Lars Marowsky-Bree
2005-02-03 17:06 ` H. Peter Anvin
2005-02-03 17:43 ` Guy
2005-02-03 18:07 ` H. Peter Anvin
2005-02-03 19:36 ` Gordon Henderson
2005-02-04 9:04 ` Andrew Walrond
2005-02-04 11:19 ` Gordon Henderson
2005-02-04 18:31 ` Mike Hardy
2005-02-13 21:05 ` Mark Hahn
2005-02-13 21:19 ` Gordon Henderson
2005-02-14 4:56 ` Tim Moore
2005-02-14 9:42 ` Andrew Walrond
2005-02-13 22:58 ` Mike Hardy
2005-02-13 23:14 ` Richard Scobie
2005-02-06 3:38 ` Tim Moore
2005-02-14 4:49 ` Tim Moore
2005-02-14 8:09 ` Gordon Henderson
2005-02-14 4:27 ` Tim Moore
2005-02-14 8:05 ` Gordon Henderson
2004-11-03 23:56 A. James Lewis
2004-12-09 0:21 ` H. Peter Anvin
2004-12-09 0:35 ` Jim Paris
[not found] ` <200412090021.iB90L4MK014200@terminus.zytor.com>
2005-01-23 14:02 ` A. James Lewis
2005-01-23 14:42 ` Kevin P. Fleming
2005-02-03 2:12 ` H. Peter Anvin
2005-02-03 17:13 ` Andy Smith
-- strict thread matches above, loose matches on Subject: below --
2004-11-02 3:37 [PATCH md 0 of 4] Introduction NeilBrown
2004-11-02 3:37 ` [PATCH md 2 of 4] Fix raid6 problem NeilBrown