From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mario 'BitKoenig' Holbe
Subject: Re: Need some information and help on mdadm in order to support it on IBM z Systems
Date: Fri, 18 Apr 2008 11:46:23 +0200
Message-ID:
References: <47FF7831.20707@tmr.com> <4804F9FD.4070606@tmr.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Return-path:
Sender: linux-raid-owner@vger.kernel.org
To: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

Jean-Baptiste Joret wrote:
> the scenario actually involves simulating a hardware connection issue for
> a few seconds and bringing it back online. But once the hardware comes back
> online it still does not come back into the array and remains marked
> "faulty spare". Moreover, if you then reboot, the mirror comes up and you
> can mount it but it is degraded and my "faulty spare" is now removed:

This is just the normal way md deals with faulty components. What is
more, I personally don't know of any (software or hardware) RAID
solution that would automatically try to re-add faulty components to an
array.

I personally would also consider such an automatic re-add a really bad
idea. There was a reason for the component to fail, and you don't want
to touch it again without user intervention - it could make things far
worse (blocking buses, reading wrong data, etc.). A user who knows
better can of course tell the RAID to touch it again - for md it's just
the way you described already: remove the faulty component from the
array and re-add it.

Being more "intelligent" about such an automatic re-add would require a
far deeper failure analysis to decide whether it would be safe to try
re-adding the component or better to leave it untouched. I don't know of
any software yet that would be capable of doing so.

AFAIK, md has for a little while contained one such automatism for
sector read errors: it automatically tries to re-write the affected
sector on the failing disk to trigger the disk's sector reallocation.
I personally consider even this behaviour quite dangerous, since there
is no guarantee that the read error really occurred due to a (quite
harmless) single-sector failure. Thus, IMHO, even there, touching the
failing disk again by default risks making things worse.


regards
   Mario
-- 
Computer Science is no more about computers than astronomy is about
telescopes. -- E. W. Dijkstra