* Need some information and help on mdadm in order to support it on IBM z Systems
@ 2008-04-11 12:28 Jean-Baptiste Joret
2008-04-11 14:39 ` Bill Davidsen
0 siblings, 1 reply; 8+ messages in thread
From: Jean-Baptiste Joret @ 2008-04-11 12:28 UTC (permalink / raw)
To: linux-raid
Hello,
I am trying to obtain information, such as a design document or anything
that describes the content of the metadata. I am evaluating the solution
to determine whether it is enterprise-ready for use as a mirroring
solution and whether we can support it at IBM.
Also, I currently have quite a show-stopper issue where help would be
appreciated. I have a RAID1 with two hard disks. When I remove one hard
disk (I put the CHPIDs offline, which is equivalent to telling the system
that the drive is currently not available), the missing disk is marked as
"faulty spare" when calling mdadm -D /dev/md0.
/dev/md0:
Version : 01.02.03
Creation Time : Fri Apr 11 11:11:59 2008
Raid Level : raid1
Array Size : 2403972 (2.29 GiB 2.46 GB)
Used Dev Size : 2403972 (2.29 GiB 2.46 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Fri Apr 11 11:23:04 2008
State : active, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
Name : 0
UUID : 9a0a6e30:4b8bbe7f:bc0cad81:9fd46804
Events : 8
Number Major Minor RaidDevice State
0 0 0 0 removed
1 94 21 1 active sync /dev/dasdf1
0 94 17 - faulty spare /dev/dasde1
When I put the disk back online it is not automatically reinserted into
the array. The only thing I have tried that worked was a hot remove
followed by a hot add (mdadm /dev/md0 -r /dev/dasde1 and then mdadm
/dev/md0 -a /dev/dasde1). Is that the correct way, or is there an option
to tell md that the disk is back and clean? I don't like my solution very
much, as sometimes I get an error saying the superblock cannot be written.
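
A minimal sketch of that manual recovery sequence, plus the --re-add form
mdadm offers for exactly this case; the device names are the ones from this
report, and how much gets resynced after --re-add depends on the internal
write-intent bitmap being usable:

    # Remove the stale member explicitly, then put it back:
    mdadm /dev/md0 --remove /dev/dasde1
    mdadm /dev/md0 --add /dev/dasde1

    # Alternative: --re-add asks md to slot the old member back into its
    # previous role; with an internal bitmap only the blocks written while
    # the disk was away should need resyncing.
    mdadm /dev/md0 --re-add /dev/dasde1

    # Watch the resync progress:
    cat /proc/mdstat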
Thank you very much for any help you can provide.
Best regards / Mit freundlichen Gruessen / Cordialement / Cordiali Saluti
Jean-Baptiste Joret - Linux on System Z
Phone: +49 7031 16-3278 / ITN: 39203278 - eMail: joret@de.ibm.com
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Herbert Kircher
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Need some information and help on mdadm in order to support it on IBM z Systems
  2008-04-11 12:28 Need some information and help on mdadm in order to support it on IBM z Systems Jean-Baptiste Joret
@ 2008-04-11 14:39 ` Bill Davidsen
  2008-04-14 11:05   ` Jean-Baptiste Joret
  0 siblings, 1 reply; 8+ messages in thread
From: Bill Davidsen @ 2008-04-11 14:39 UTC (permalink / raw)
To: Jean-Baptiste Joret; +Cc: linux-raid

Jean-Baptiste Joret wrote:
> [ ... ]
>
> When I put the disk back online it is not automatically reinserted into
> the array. The only thing I have tried that worked was a hot remove
> followed by a hot add (mdadm /dev/md0 -r /dev/dasde1 and then mdadm
> /dev/md0 -a /dev/dasde1). Is that the correct way, or is there an option
> to tell md that the disk is back and clean? I don't like my solution very
> much, as sometimes I get an error saying the superblock cannot be written.
>
Start by detailing the versions of the kernel, mdadm, which superblock
you use, and your bitmap configuration (or lack of it).

-- 
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck

^ permalink raw reply	[flat|nested] 8+ messages in thread
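
A sketch of how that information is usually collected, assuming the array
and member names from the original report:

    uname -r                             # running kernel version
    mdadm --version                      # mdadm release
    cat /proc/mdstat                     # array state as the kernel sees it
    mdadm --detail /dev/md0              # superblock format, bitmap, member states
    mdadm --examine /dev/dasde1          # per-device superblock (metadata version)
    mdadm --examine-bitmap /dev/dasde1   # internal write-intent bitmap, if any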
* Re: Need some information and help on mdadm in order to support it on IBM z Systems
  2008-04-11 14:39 ` Bill Davidsen
@ 2008-04-14 11:05   ` Jean-Baptiste Joret
  [not found]            ` <4804F9FD.4070606@tmr.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Jean-Baptiste Joret @ 2008-04-14 11:05 UTC (permalink / raw)
To: Bill Davidsen; +Cc: linux-raid

Hi Bill,

I have created the array with "mdadm --create /dev/md0 --level=1
--raid-devices=2 /dev/dasd[ef]1 --metadata=1.2 --bitmap=internal",
using, as you can see, version 1.2 of the metadata format. The kernel is
the SUSE standard kernel 2.6.16.60-0.9-default on s390x (SLES 10 SP2 RC1).
I have this issue with RC2 too.

What I would like is more documentation about the metadata and how it is
used, if you have it or know someone who can provide it.

Thank you in advance.

Best regards / Mit freundlichen Gruessen / Cordialement / Cordiali Saluti

Jean-Baptiste Joret - Linux on System Z
Phone: +49 7031 16-3278 / ITN: 39203278 - eMail: joret@de.ibm.com

IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Herbert Kircher
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

From: Bill Davidsen <davidsen@tmr.com>
To: Jean-Baptiste Joret/Germany/IBM@IBMDE
Cc: linux-raid@vger.kernel.org
Date: 11.04.2008 16:35
Subject: Re: Need some information and help on mdadm in order to support it on IBM z Systems

Jean-Baptiste Joret wrote:
> [ ... ]
>
Start by detailing the versions of the kernel, mdadm, which superblock
you use, and your bitmap configuration (or lack of it).

-- 
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck

^ permalink raw reply	[flat|nested] 8+ messages in thread
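
Since the on-disk layout is documented mainly in the md and mdadm sources,
the quickest practical way to see what a version-1.2 superblock and its
internal bitmap contain is to dump them from a member device; a sketch
using the member names above:

    # Dump the v1.2 superblock of one member (for metadata 1.2 it sits
    # 4K from the start of the device):
    mdadm --examine /dev/dasde1

    # Dump the internal write-intent bitmap that --bitmap=internal created:
    mdadm --examine-bitmap /dev/dasde1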
[parent not found: <4804F9FD.4070606@tmr.com>]
* Re: Need some information and help on mdadm in order to support it on IBM z Systems
  [not found]          ` <4804F9FD.4070606@tmr.com>
@ 2008-04-16 15:20     ` Jean-Baptiste Joret
  2008-04-18  9:46       ` Mario 'BitKoenig' Holbe
  0 siblings, 1 reply; 8+ messages in thread
From: Jean-Baptiste Joret @ 2008-04-16 15:20 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Linux RAID

Hello Bill,

the scenario actually involves simulating a hardware connection issue for
a few seconds and then bringing the disk back online. But once the hardware
comes back online it still does not come back into the array and remains
marked "faulty spare". Moreover, if you then reboot, the mirror comes up
and you can mount it, but it is degraded and my "faulty spare" is now
removed:

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       17        1      active sync   /dev/sdb1

Is there a way, maybe using a udev rule, to mark the device clean so it
can be re-added automatically into the array?

Best regards / Mit freundlichen Gruessen / Cordialement / Cordiali Saluti

Jean-Baptiste Joret - Linux on System Z
Phone: +49 7031 16-3278 / ITN: 39203278 - eMail: joret@de.ibm.com

IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Herbert Kircher
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

From: Bill Davidsen <davidsen@tmr.com>
To: Jean-Baptiste Joret/Germany/IBM@IBMDE, Linux RAID <linux-raid@vger.kernel.org>
Date: 15.04.2008 20:50
Subject: Re: Need some information and help on mdadm in order to support it on IBM z Systems

I have added the list back into the addresses, you can use "reply all" to
keep the discussion where folks can easily contribute.

Jean-Baptiste Joret wrote:
> [ ... ]

The best (only) description of the metadata is in the md portion of the
kernel or in the mdadm source code.

I am guessing that there is a fix for your problem in more recent kernels,
since a similar thing was mentioned on the mailing list recently. Older
versions of the kernel require some event to start the rebuild, at which
point the spare will be put back into the array. Unfortunately I didn't
find it quickly, although memory tells me that it has been fixed in the
latest kernel.

I think you need to look carefully at any hardware or connection issues
which cause the device to drop out of the array in the first place. The
fact that it comes in as a faulty spare indicates a problem, but I don't
quite see what that is. Your remove and reinsert will get it going again;
is it possible that the device is not ready at boot time for some reason?
There may be log messages from the time when the drive was kicked from
that array which will tell you more.

-- 
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck

^ permalink raw reply	[flat|nested] 8+ messages in thread
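
On the udev question above: nothing in this thread confirms a stock rule
for it, but a hand-written rule along these lines could trigger the re-add
when the device node reappears. This is only a sketch; the file name,
device match and hard-coded array name are assumptions, and it re-adds
with no health check at all, which is exactly the trade-off discussed in
the replies that follow:

    # /etc/udev/rules.d/65-md-readd.rules  (hypothetical example)
    # When the block device node for dasde1 comes back, try to re-add it.
    ACTION=="add", SUBSYSTEM=="block", KERNEL=="dasde1", RUN+="/sbin/mdadm /dev/md0 --re-add /dev/dasde1"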
* Re: Need some information and help on mdadm in order to support it on IBM z Systems
  2008-04-16 15:20 ` Jean-Baptiste Joret
@ 2008-04-18  9:46   ` Mario 'BitKoenig' Holbe
  2008-04-18 13:45     ` David Lethe
  0 siblings, 1 reply; 8+ messages in thread
From: Mario 'BitKoenig' Holbe @ 2008-04-18 9:46 UTC (permalink / raw)
To: linux-raid

Jean-Baptiste Joret <JORET@de.ibm.com> wrote:
> the scenario actually involves simulating a hardware connection issue for
> a few seconds and then bringing the disk back online. But once the hardware
> comes back online it still does not come back into the array and remains
> marked "faulty spare". Moreover, if you then reboot, the mirror comes up
> and you can mount it, but it is degraded and my "faulty spare" is now removed:

This is just the normal way md deals with faulty components. And even
more: I personally don't know of any (soft or hard) RAID solution that
would automatically try to re-add faulty components back to an array.

I personally would also consider such an automatic re-add a really bad
idea. There was a reason for the component to fail; you don't want to
touch it again without user intervention - it could make things far
worse (blocking busses, reading wrong data, etc.). A user who knows
better can of course trigger the RAID to touch it again - for md it's
just the way you described already: remove the faulty component from
the array and re-add it.

Being more "intelligent" about such an automatic re-add would require a
far deeper failure analysis to decide whether it would be safe to try
re-adding the component or better to leave it untouched. I don't know
of any software yet that is capable of doing so.

AFAIK, since a little while md contains one such automatism regarding
sector read errors, where it automatically tries to re-write the failed
sector to the failing disk to trigger the disk's sector reallocation.
I personally consider even this behaviour quite dangerous, since there
is no guarantee that the read error really occurred due to a (quite
harmless) single-sector failure, and thus IMHO even there is a chance
of making things worse by touching the failing disk again by default.

regards
   Mario
-- 
Computer Science is no more about computers
than astronomy is about telescopes. -- E. W. Dijkstra

^ permalink raw reply	[flat|nested] 8+ messages in thread
* RE: Re: Need some information and help on mdadm in order to support it on IBM z Systems
  2008-04-18  9:46 ` Mario 'BitKoenig' Holbe
@ 2008-04-18 13:45   ` David Lethe
  2008-04-20 13:41     ` Peter Grandi
  0 siblings, 1 reply; 8+ messages in thread
From: David Lethe @ 2008-04-18 13:45 UTC (permalink / raw)
To: Mario 'BitKoenig' Holbe, linux-raid

Well, I can name many RAID controllers that will automatically add a
"faulty" drive back into an array. This is a very good thing to have,
and is counter-intuitive to all but experienced RAID architects. Seeing
how the OP works for IBM, I'll use the IBM Profibre engine as an example
of an engine that will automatically insert a "known bad" disk.
Infortrend engines actually have a menu item to force an array with a
"known bad" disk online. Several LSI-family controllers have this
feature in their API, and as a backdoor diagnostic feature. Some of the
Xyratex engines give this to you. I can go on by getting really
specific and citing firmware revisions, model numbers, and so on ...
but what do I know ... I just write diagnostic software, RAID
configurators, failure/stress testing, failover drivers, etc.

To be fair, there are correct ways to reinsert these bad disks, and the
architect needs to do a few things to minimize data integrity risks and
repair them as part of the reinsertion process. As this is a public
forum I won't post them, but will instead speak in generalities to make
my point. There are hundreds of published patents concerning data
recovery and availability in various failure scenarios, so anybody who
wants to learn more can simply search the USPTO.GOV database and read
them for themselves.

A few real-world reasons you want this capability ...

* Your RAID system consists of 2 external units, with RAID
  controller(s) and disks in unit A, and unit B is an expansion
  chassis, with interconnecting cables. You have a LUN that is spread
  between 2 enclosures. Enclosure "B" goes offline, either because of a
  power failure; a sysadmin who doesn't know you should power the RAID
  head down first, then the expansion; or he/she powers it up backwards
  .. bottom line is that the drive "failed", but it only "failed"
  because it was disconnected due to power issues. Well-architected
  firmware needs to be able to recognize this scenario and put the disk
  back.

* When a disk drive really does fail, then, depending on the type of
  bus/loop structure in the enclosure and the quality of the backplane
  architecture, other disks may be affected and may not be able to
  respond to I/O for a second or so. If the RAID architecture
  aggressively fails disks in this scenario then you would have cascade
  effects that knock perfectly good arrays offline.

* You have a hot-swap array where disks are not physically "locked" in
  an enclosure, and the removal process starts by pushing the drive in.
  Igor the klutz leans against the enclosure the wrong way and the
  drive temporarily gets disconnected .. but he frantically pushes it
  back in. You get the idea, it happens.

The RAID engine (or md software) needs to be more intelligent and be
able to recognize the difference between a drive failure and a drive
getting disconnected.

Now to go back to the OP and solve his problem. Use a special connector
and extend a pair of wires outside the enclosure that break power. If
this is a fibre-channel backplane, then you should also have external
wires to short out loop A and/or loop B in order to inject other types
of errors.

My RAID testing software (not trying to plug it, just telling you some
of the things you can do so you can write it yourself) sends a CDB to
tell the disk to perform a mediainit command, or commands a disk to
simply spin down. Well-designed RAID software/firmware will handle all
of these problems differently. While on the subject, your RAID testing
software needs to be able to create ECC errors on any disk/block you
need to, so you can combine these "disk" failures with stripes that
have both good and bad parity. (Yes, kinda sorta plugging myself as a
hired gun again, but ...) if your testing scenario doesn't involve
creating ECC errors and running non-destructive data and parity testing
in combination with simulated hardware failures, then you're testing,
not certifying.

To go back to Mario's argument that you *could* make things far worse
.. absolutely. The RAID architect needs to incorporate hot-adding md
disks back into the array, as long as it is done properly. RAID
recovery logic is perhaps 75% of the source code for top-of-the-line
RAID controllers. Their firmware determines why a disk "failed", and
does what it can to bring it back online and fix the damage. A $50 SATA
RAID controller has perhaps 10% of the logic dedicated to
failover/failback. The md driver is somewhere in the middle.

I'll end this post by reminding the md architects to consider how many
days it takes to rebuild a RAID-5 set that uses 500GB or larger disk
drives, and how unnecessary this action can be under certain failure
scenarios.

David @ SANtools ^ com

-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Mario 'BitKoenig' Holbe
Sent: Friday, April 18, 2008 4:46 AM
To: linux-raid@vger.kernel.org
Subject: Re: Need some information and help on mdadm in order to support it on IBM z Systems

[ ... ]

^ permalink raw reply	[flat|nested] 8+ messages in thread
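
For md specifically, some of the failure-injection steps described above
can be approximated from the host without special cabling; a sketch that
assumes an sd-backed test mirror and the sg3_utils/hdparm tools being
available:

    # Let md itself simulate a member failure (no hardware involved):
    mdadm /dev/md0 --fail /dev/sdb1

    # Or make the drive genuinely unresponsive for a while:
    sg_start 0 /dev/sdb           # spin the disk down (sg3_utils; 0 = stop)
    hdparm -Y /dev/sdb            # or put an ATA disk to sleep

    # Afterwards, recover and watch the resync:
    mdadm /dev/md0 --remove /dev/sdb1
    mdadm /dev/md0 --add /dev/sdb1
    watch cat /proc/mdstat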
* Re: Need some information and help on mdadm in order to support it on IBM z Systems
  2008-04-18 13:45 ` David Lethe
@ 2008-04-20 13:41   ` Peter Grandi
  2008-04-20 16:24     ` David Lethe
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Grandi @ 2008-04-20 13:41 UTC (permalink / raw)
To: Linux RAID

[ ... ]

> Well, I can name many RAID controllers that will automatically
> add a "faulty" drive back into an array. This is a very good
> thing to have, and is counter-intuitive to all but experienced
> RAID architects.

It is not counter-intuitive, it is absolutely crazy in the general
case, and in particular cases it leads to loss of focus and mingling
of abstraction layers that should remain separate.

[ ... ]

> There are hundreds of published patents concerning data
> recovery and availability in various failure scenarios, so
> anybody who wants to learn more can simply search the
> USPTO.GOV database and read them for themselves.

Sure, those are heuristics that are implemented on top of a RAID
subsystem, and that are part of a ''computer assisted storage
administration'' logic. Just like IBM developed many years ago expert
systems to automate or assist with recovery from various faults (not
just storage faults) on 370/390 class mainframes.

Such recovery tools have nothing to do with RAID as such, even if they
are often packaged with RAID products. They belong in a totally
different abstraction layer, as this example makes starkly clear:

> A few real-world reasons you want this capability ...

> * Your RAID system consists of 2 external units, with RAID
>   controller(s) and disks in unit A, and unit B is an
>   expansion chassis, with interconnecting cables. You have a
>   LUN that is spread between 2 enclosures. Enclosure "B" goes
>   offline, either because of a power failure; a sysadmin who
>   doesn't know you should power the RAID head down first, then
>   expansion; or he/she powers it up backwards .. bottom line
>   is that the drive "failed", but it only "failed" because it
>   was disconnected due to power issues.
> [ ... ]

This relies on case-based, expert-system-like fault analysis and
recovery using knowledge of non-RAID aspects of the storage subsystem.
Fine, but it has nothing to do with RAID -- as it requires a kind of
''total system'' approach.

> Well architected firmware needs to be able to recognize this
> scenario and put the disk back.

Well architected *RAID* firmware should do nothing of the sort. RAID
has a (deceptively) simple operation model, and yet getting RAID
firmware right is hard enough. Well architected fault analysis and
recovery daemons might well recognize that scenario and put the disk
back, but that's a completely different story from RAID firmware
doing that.

> To go back to Mario's argument that you *could* make things
> far worse .. absolutely.

Sure, because fault analysis and recovery heuristics take chances that
can go spectacularly wrong, as well as being pretty hard to code too.
While I am not against optional fault analysis and recovery layers on
top of RAID, I really object to statements like this:

> The RAID architect needs to incorporate hot-adding md disks
> back into the array, as long as it is done properly.

Because the RAID architect should stay well clear of considering such
kinds of issues, and of polluting the base RAID firmware with
additional complications; even the base RAID logic is amazingly bug
infested in various products I have had the misfortune to suffer.

The role of the RAID architect is to focus on the performance and
correctness of the basic RAID logic, and to let the architects of fault
analysis and recovery daemons worry about other issues, and perhaps to
provide suitable hooks to them to make their life easier.

> RAID recovery logic is perhaps 75% of the source code for
> top-of-the-line RAID controllers. Their firmware determines
> why a disk "failed", and does what it can to bring it back
> online and fix the damage.

There is a rationale for bundling a storage fault analysis and recovery
daemon into a RAID host adapter, but I don't like that, because often
there are two downsides:

* Fault analysis and recovery are usually best done at the highest
  possible abstraction level, that is as software daemons running on
  the host, as they have more information than a daemon running inside
  the host adapter.

* Mingling fault analysis and recovery software with the base RAID
  logic (as the temptation then becomes hard to resist) tends to
  distract from the overridingly important task of getting the latter
  to perform reliably and to report errors clearly and usefully.

In previous discussions on this list there were crazy proposals to make
part of the Linux RAID logic some detection (not too bad) and recovery
(chancy horror) of (ambiguous) unreported errors, and the reason why I
am objecting strenuously here is to help quash calls for something
insane like that. Separation of concerns and of abstraction layers, and
keeping fundamental firmware logic simple, are rather important goals
in mission critical subsystems.

> A $50 SATA RAID controller

Except perhaps the IT8212 chips these don't exist :-).

> has perhaps 10% of the logic dedicated to failover/failback.

That 10% is 10% too many. It is already difficult to find simple,
reliable RAID host adapters, never mind get RAID host adapters that
try to be too clever.

^ permalink raw reply	[flat|nested] 8+ messages in thread
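
md does already expose one such hook for an out-of-band policy daemon:
mdadm's monitor mode can hand every array event to an external program,
which is where re-add heuristics like those debated above could live
without touching the core RAID logic. A sketch; the handler path is a
made-up name for illustration:

    # Run mdadm as a monitoring daemon; on events such as Fail, FailSpare
    # or DegradedArray it invokes the given program with the event name,
    # the md device and (when relevant) the component device as arguments.
    mdadm --monitor --scan --daemonise --program=/usr/local/sbin/md-event-handler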
* RE: Need some information and help on mdadm in order to support it on IBM z Systems
  2008-04-20 13:41 ` Peter Grandi
@ 2008-04-20 16:24   ` David Lethe
  0 siblings, 0 replies; 8+ messages in thread
From: David Lethe @ 2008-04-20 16:24 UTC (permalink / raw)
To: Peter Grandi, Linux RAID

Well, I can't formally speak for IBM, EMC, LSI, NetApp, and others when
I say you are wrong in just about everything you wrote. Their
architectures are "absolutely crazy", and their firmware doesn't meet
your personal criteria for being "well-architected". Qlogic, Emulex and
LSI are also wrong, since they have vanity firmware/drivers for
specific RAID subsystems to increase interoperability between all of
the RAID hardware/software layers.

Now contrast zfs with md+filesystemofyourchoice. The performance,
reliability, security, data integrity, and self-healing capability of
zfs are as profoundly superior to md and your design philosophy as the
current md architecture is to MS-DOS/FAT.

The empirical evidence speaks for itself. The RAID hardware vendors and
the architects of zfs spend billions of dollars annually on R&D, have
superior products, and do it my way, not yours.

If you want to respond with a flame, then take it to a zfs group. I see
no need to respond further.

-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Peter Grandi
Sent: Sunday, April 20, 2008 8:41 AM
To: Linux RAID
Subject: Re: Need some information and help on mdadm in order to support it on IBM z Systems

[ ... ]

^ permalink raw reply	[flat|nested] 8+ messages in thread
end of thread, other threads:[~2008-04-20 16:24 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-04-11 12:28 Need some information and help on mdadm in order to support it on IBM z Systems Jean-Baptiste Joret
2008-04-11 14:39 ` Bill Davidsen
2008-04-14 11:05 ` Jean-Baptiste Joret
[not found] ` <4804F9FD.4070606@tmr.com>
2008-04-16 15:20 ` Jean-Baptiste Joret
2008-04-18 9:46 ` Mario 'BitKoenig' Holbe
2008-04-18 13:45 ` David Lethe
2008-04-20 13:41 ` Peter Grandi
2008-04-20 16:24 ` David Lethe