From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sebastian Riemer
Subject: Re: mdadm --fail doesn't mark device as failed?
Date: Wed, 21 Nov 2012 18:10:25 +0100
Message-ID: <50AD0B01.7020300@profitbricks.com>
References: <1353514677.5795.14.camel@corn.betterworld.us> <50AD0726.9090509@profitbricks.com> <1353517421.5795.58.camel@corn.betterworld.us>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
In-Reply-To: <1353517421.5795.58.camel@corn.betterworld.us>
Sender: linux-raid-owner@vger.kernel.org
To: Ross Boylan
Cc: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

On 21.11.2012 18:03, Ross Boylan wrote:
> On Wed, 2012-11-21 at 17:53 +0100, Sebastian Riemer wrote:
>> On 21.11.2012 17:17, Ross Boylan wrote:
>>> After I failed and removed a partition, mdadm --examine seems to show
>>> that partition is fine.
>>>
>>> Perhaps related to this, I failed a partition and when I rebooted it
>>> came up as the sole member of its RAID array.
>>>
>>> Is this behavior expected? Is there a way to make the failures more
>>> convincing?
>> Yes, it is expected behavior. Without "mdadm --fail" you can't remove a
>> device from the array. If you stop the array with the failed device,
>> then the state is stored in the superblock.
> I'm confused. I did run mdadm --fail. Are you saying that, in addition
> to doing that, I also need to manipulate sysfs as you describe below?
> Or were you assuming I didn't mdadm --fail?

You only need to set the value in the "errors" sysfs file additionally
to ensure that this device isn't used for assembly anymore.

The kernel reports in "dmesg" then:
md: kicking non-fresh sdb1 from array!

>> There is a difference in the way mdadm does it and the sysfs method.
>> mdadm sends an ioctl to the kernel. With the sysfs command the faulty
>> state is stored immediately in the superblock.
>>
>> # echo faulty > /sys/block/md0/md/dev-sdb1/state
>>
>> If you reassemble that you'll get the message:
>> mdadm: device 0 in /dev/md0 has wrong state in superblock, but /dev/sdb1
>> seems ok
>>
>> There is a limit of how many errors are allowed on the device (usually 20).
>>
>> If you do the following additionally, your device won't be used for
>> assembly anymore.
>> # echo 20 > /sys/block/md0/md/dev-sdb1/errors
>>
>> I guess this is related to: /sys/block/md0/md/max_read_errors.
>>
>>> The drive sdb in the following excerpt does appear to be experiencing
>>> hardware problems. However, the failed partition that became the md on
>>> reboot was on a drive without any reported problems.
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html

-- 
Sebastian Riemer
Linux Kernel Developer - Storage

We are looking for (SENIOR) LINUX KERNEL DEVELOPERS!

ProfitBricks GmbH • Greifswalder Str. 207 • 10405 Berlin, Germany
www.profitbricks.com • sebastian.riemer@profitbricks.com
Tel.: +49 - 30 - 60 98 56 991 - 915

Registered office: Berlin
Court of registration: Amtsgericht Charlottenburg, HRB 125506 B
Managing directors: Andreas Gauger, Achim Weiss
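
For reference, the two steps discussed in this thread (mdadm --fail, then
raising the member's "errors" count in sysfs) could be combined roughly as
follows. This is only a sketch under assumptions: md0 and sdb1 are the
example names used above, 20 is the "usual" limit mentioned in the thread
rather than a value read from the system, and kick_member, run and the RUN
variable are hypothetical names introduced for illustration. By default the
script only prints the commands; it has not been verified against any
particular kernel version.

```shell
#!/bin/sh
# Sketch only: print (or, with RUN=1, execute as root) the steps described
# in the thread for keeping a failed member out of future assembly.

# Hypothetical helper: show the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        sh -c "$1"
    else
        printf '%s\n' "$1"
    fi
}

# Hypothetical helper: kick_member <array> <member> [error-limit]
kick_member() {
    md=$1              # array name, e.g. md0
    dev=$2             # member name, e.g. sdb1
    limit=${3:-20}     # assumption: the "usually 20" limit from the thread

    # Step 1: mark the member as failed via mdadm (sends an ioctl).
    run "mdadm /dev/$md --fail /dev/$dev"

    # Step 2: additionally raise the error count, as suggested in the
    # thread, so the member is not used for assembly anymore.
    run "echo $limit > /sys/block/$md/md/dev-$dev/errors"
}

# Dry run with the names from the thread.
kick_member md0 sdb1
```

Run without RUN=1, this prints the mdadm command and the sysfs write for
md0/sdb1 so they can be reviewed before anything touches the array.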