* Proactive Drive Replacement
From: Jon Nelson @ 2008-10-20 17:35 UTC
To: LinuxRaid

I was wondering about proactive drive replacement. Specifically, let's
assume we have a RAID5 (or 10, or whatever) comprising three drives: A, B,
and C. Let's assume we want to replace drive C with drive D, and that the
array is md0. We want to minimize our rebuild windows.

The naive approach would be to:

  --add drive D to md0
  --fail drive C on md0
  wait for the rebuild to finish
  (zero the superblock on drive C)
  remove drive C

Obviously, this places the array in mortal danger if another drive should
fail during that time. Could we not do something like this instead?

1. make sure md0 is using bitmaps
2. --fail drive C
3. create a new *single disk* raid1 (md99) from drive C
4. --add drive D to md99
5. --add md99 back into md0
6. wait for md99's rebuild to finish
7. --fail and --remove md99
8. break md99
9. --add drive D to md0

The problem I see with the above is the creation of the raid1, which
overwrites the superblock. Is there some way to avoid that (--build?)?

The advantage is that the amount of time the array spends degraded is,
theoretically, very small. The disadvantages include complexity,
difficulty resuming in the case of a more serious error (maybe), and *2*
windows during which the array is mortally vulnerable to a component
failure.

--
Jon
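For concreteness, the naive approach in mdadm terms might look like the
following sketch; the device names (/dev/sdc1 for drive C, /dev/sdd1 for
drive D) are hypothetical, and the array spends the entire rebuild degraded:

    mdadm /dev/md0 --add /dev/sdd1         # new drive D joins as a spare
    mdadm /dev/md0 --fail /dev/sdc1        # fail drive C; recovery onto D starts
    # ... watch /proc/mdstat until the recovery finishes ...
    mdadm /dev/md0 --remove /dev/sdc1      # detach the failed member
    mdadm --zero-superblock /dev/sdc1      # so C is not picked up again on assembly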
* Re: Proactive Drive Replacement
From: Mario 'BitKoenig' Holbe @ 2008-10-20 22:40 UTC
To: linux-raid

Jon Nelson <jnelson-linux-raid@jamponi.net> wrote:
> I was wondering about proactive drive replacement.
[bitmaps, raid1 of the drive to replace and the new drive, ...]

I believe I remember a HowTo going over this list somewhere in the past
(early bitmap times?) which recommended exactly this approach.

> The problem I see with the above is the creation of the raid1, which
> overwrites the superblock. Is there some way to avoid that (--build?)?

You can build a RAID1 without a superblock.

regards
   Mario
--
[mod_nessus for iauth] <delta> "scanning your system...found depreciated
OS...found hole...installing new OS...please reboot and reconnect now"
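A rough sketch of how the superblock-less RAID1 trick could be strung
together, assuming your mdadm version accepts 'missing' in --build mode and
that --add performs a bitmap-based re-add when it sees the existing
superblock; the device names are hypothetical and this is untested:

    mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1
    # --build creates a legacy array with no superblock, so the md0
    # superblock already on sdc1 is left untouched.
    mdadm --build /dev/md99 --level=1 --raid-devices=2 /dev/sdc1 missing
    mdadm /dev/md0 --add /dev/md99        # bitmap resync brings the data back in quickly
    mdadm /dev/md99 --add /dev/sdd1       # mirror the old drive onto the new one
    # ... wait for md99's internal rebuild, then unwind the stack:
    mdadm /dev/md0 --fail /dev/md99 --remove /dev/md99
    mdadm --stop /dev/md99
    mdadm /dev/md0 --add /dev/sdd1        # another short bitmap-based resync

As the follow-ups below point out, this only covers the happy path: a read
error on sdc1 during the md99 rebuild cannot be recovered from the rest of
md0.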
* Re: Proactive Drive Replacement
From: David Greaves @ 2008-10-21 8:38 UTC
To: Mario 'BitKoenig' Holbe; +Cc: linux-raid, Jon Nelson, neilb

Mario 'BitKoenig' Holbe wrote:
> Jon Nelson <jnelson-linux-raid@jamponi.net> wrote:
>> I was wondering about proactive drive replacement.
> [bitmaps, raid1 of the drive to replace and the new drive, ...]
>
> I believe I remember a HowTo going over this list somewhere in the past
> (early bitmap times?) which recommended exactly this approach.
>
>> The problem I see with the above is the creation of the raid1, which
>> overwrites the superblock. Is there some way to avoid that (--build?)?
>
> You can build a RAID1 without a superblock.

How nice, an independent request for a feature just a few days later...

See:
"non-degraded component replacement was Re: Distributed spares"
http://marc.info/?l=linux-raid&m=122398583728320&w=2

It references Dean Gaudet's work, which explains why the above scenario,
although it seems OK at first glance, isn't good enough.

The main issue is that the drive being replaced almost certainly has a bad
block. This block could be recovered from the raid5 set, but won't be.
Worse, the mirror operation may simply fail to mirror that block - leaving
it 'random' and thus corrupting the set when the drive is swapped in.
Of course this will work in the happy path ... but raid is about correct
behaviour in the unhappy path.

If you could force the mirroring to complete and note the non-mirrored
blocks, then you could fix things by identifying the bad/unwritten blocks
on the new device, manually setting the bitmap for the area around each
block to 'dirty', and forcing it to be rebuilt from the remaining disks.

Actually, this would be a nice thing to have as a subset of the feature to
force a re-write of SMART-identified bad blocks using parity-calculated
values.

David
--
"Don't worry, you'll be fine; I saw it work in a cartoon once..."
* Re: Proactive Drive Replacement
From: Jon Nelson @ 2008-10-21 13:05 UTC
To: David Greaves; +Cc: Mario 'BitKoenig' Holbe, LinuxRaid

On Tue, Oct 21, 2008 at 3:38 AM, David Greaves <david@dgreaves.com> wrote:
> Mario 'BitKoenig' Holbe wrote:
>> Jon Nelson <jnelson-linux-raid@jamponi.net> wrote:
>>> I was wondering about proactive drive replacement.
>> [bitmaps, raid1 of the drive to replace and the new drive, ...]
>>
>> I believe I remember a HowTo going over this list somewhere in the past
>> (early bitmap times?) which recommended exactly this approach.
>>
>>> The problem I see with the above is the creation of the raid1, which
>>> overwrites the superblock. Is there some way to avoid that (--build?)?
>>
>> You can build a RAID1 without a superblock.
>
> How nice, an independent request for a feature just a few days later...
>
> See:
> "non-degraded component replacement was Re: Distributed spares"
> http://marc.info/?l=linux-raid&m=122398583728320&w=2

D'oh! I had skipped that thread before. There are differences, however
minor.

> It references Dean Gaudet's work, which explains why the above scenario,
> although it seems OK at first glance, isn't good enough.
>
> The main issue is that the drive being replaced almost certainly has a
> bad block. This block could be recovered from the raid5 set, but won't
> be. Worse, the mirror operation may simply fail to mirror that block -
> leaving it 'random' and thus corrupting the set when the drive is
> swapped in.
> Of course this will work in the happy path ... but raid is about correct
> behaviour in the unhappy path.

In my case I was replacing a drive because I didn't like it.

--
Jon
* Re: Proactive Drive Replacement
From: David Greaves @ 2008-10-21 13:36 UTC
To: Jon Nelson; +Cc: Mario 'BitKoenig' Holbe, LinuxRaid

Jon Nelson wrote:
> In my case I was replacing a drive because I didn't like it.

Hmm, I suspect drive-ism will possibly not be the most common reason for
swapping drives ;)

--
"Don't worry, you'll be fine; I saw it work in a cartoon once..."
* RE: Proactive Drive Replacement
From: David Lethe @ 2008-10-21 13:50 UTC
To: Jon Nelson, David Greaves; +Cc: Mario 'BitKoenig' Holbe, LinuxRaid

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Jon Nelson
> Sent: Tuesday, October 21, 2008 8:06 AM
> To: David Greaves
> Cc: Mario 'BitKoenig' Holbe; LinuxRaid
> Subject: Re: Proactive Drive Replacement
>
> On Tue, Oct 21, 2008 at 3:38 AM, David Greaves <david@dgreaves.com>
> wrote:
> > Mario 'BitKoenig' Holbe wrote:
> >> Jon Nelson <jnelson-linux-raid@jamponi.net> wrote:
> >>> I was wondering about proactive drive replacement.
> >> [bitmaps, raid1 of the drive to replace and the new drive, ...]
> >>
> >> I believe I remember a HowTo going over this list somewhere in the
> >> past (early bitmap times?) which recommended exactly this approach.
> >>
> >>> The problem I see with the above is the creation of the raid1, which
> >>> overwrites the superblock. Is there some way to avoid that (--build?)?
> >>
> >> You can build a RAID1 without a superblock.
> >
> > How nice, an independent request for a feature just a few days later...
> >
> > See:
> > "non-degraded component replacement was Re: Distributed spares"
> > http://marc.info/?l=linux-raid&m=122398583728320&w=2
>
> D'oh! I had skipped that thread before. There are differences, however
> minor.
>
> > It references Dean Gaudet's work, which explains why the above
> > scenario, although it seems OK at first glance, isn't good enough.
> >
> > The main issue is that the drive being replaced almost certainly has
> > a bad block. This block could be recovered from the raid5 set, but
> > won't be. Worse, the mirror operation may simply fail to mirror that
> > block - leaving it 'random' and thus corrupting the set when the
> > drive is swapped in.
> > Of course this will work in the happy path ... but raid is about
> > correct behaviour in the unhappy path.
>
> In my case I was replacing a drive because I didn't like it.
>
> --
> Jon

S.M.A.R.T. does not, has not, will not, ever ... identify bad blocks. At
most, depending on the firmware, it will trigger a bit if the disk has a
bad block that was discovered as a result of a read already. It will NOT
trigger a bit if there is a bad block that hasn't been read yet by either
a self-test or an I/O request from the host.

For ATA/SATA class drives, the ANSI specification for S.M.A.R.T. provides
for reading some structures which indicate such things as cumulative
errors, temperature, and a Boolean that says whether the disk is in a
degrading mode and a S.M.A.R.T. alert is warranted. The ANSI spec is also
clear that everything but that single pass/fail bit is open to
interpretation by the manufacturer (other than the data format of these
various registers).

SCSI/SAS/FC/SSA class devices also have this bit, but the ANSI SCSI spec
also provides for log pages which are somewhat similar to the structures
defined for ATA/SATA class disks, the difference being that the ANSI spec
formalized such things as exactly where errors and warnings of various
types belong. It also provides for a rich set of vendor-specific pages.

Both families of disks provide some self-test commands, but these commands
do not scan the entire surface of the disk, so they are incapable of
reporting or indicating where you have a new bad block. They report a bad
block only if one is found in the extremely small sample of I/O they ran.
Now, some enterprise class drives support something called BGMS
(background media scan), like the Seagate 15K.5 SAS/FC/SCSI disks, but 99%
of the disks out there do not have such a mechanism.

Sorry about the rant ... but it finally got to me, where people keep
posting as if S.M.A.R.T. were this all-knowing mechanism that tells you
what is wrong with the disk and/or where the bad blocks might be. It
isn't.

The poster is 100% correct in that parity-protected RAID is all about
recovering when bad things happen. Distributing spares is about
performance. Their objectives are mutually exclusive. If you must have a
RAID mechanism that is fast, safe, and efficient on rebuilds and
expansions, then consider either high-end hardware-based RAID or run ZFS
on Solaris. The next best thing in the Linux world is RAID6.

David @ santools.com
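For reference, the health bit, attributes, and error log described above
are what smartmontools exposes; the device name is hypothetical:

    smartctl -H /dev/sda        # the single overall pass/fail verdict
    smartctl -A /dev/sda        # vendor-interpreted attributes (reallocated sectors, pending sectors, temperature, ...)
    smartctl -l error /dev/sda  # the device's error log, where supported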
* Re: Proactive Drive Replacement
From: Mario 'BitKoenig' Holbe @ 2008-10-21 14:11 UTC
To: linux-raid

David Lethe <david@santools.com> wrote:
> S.M.A.R.T. does not, has not, will not, ever ... identify bad blocks.

Well, as you state yourself later, S.M.A.R.T. defines self-tests which are
able to identify bad blocks. They do have to be triggered, though.

> Both families of disks provide some self-test commands, but these
> commands do not scan the entire surface of the disk

This is not true. The long self-test scans the entire surface of the disk,
at least for ATA devices; I don't know whether it does that for SCSI
devices too.
ATA also knows about selective self-tests, which can scan definable
surface areas - which is, first, quite handy for identifying more than one
bad sector and, second, quite nice on bigger devices as well... my
ST31500341AS takes about 4.5 hours for a long self-test.

> They report a bad block only if one is found in the extremely small
> sample of I/O they ran.

And, at least ATA devices report the LBA_of_first_error in the self-test
log, so you can identify the first bad sector.

regards
   Mario
--
Singing is the lowest form of communication.
                               -- Homer J. Simpson
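The self-tests Mario describes can be driven from smartmontools as well; the
device name and LBA range here are hypothetical:

    smartctl -t long /dev/sda                    # full-surface read scan; hours on a large drive
    smartctl -t select,1000000-2000000 /dev/sda  # selective self-test over a chosen LBA range
    smartctl -l selftest /dev/sda                # self-test log, including LBA_of_first_error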
* RE: Re: Proactive Drive Replacement
From: David Lethe @ 2008-10-21 15:13 UTC
To: Mario 'BitKoenig' Holbe, linux-raid

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Mario 'BitKoenig' Holbe
> Sent: Tuesday, October 21, 2008 9:12 AM
> To: linux-raid@vger.kernel.org
> Subject: Re: Proactive Drive Replacement
>
> David Lethe <david@santools.com> wrote:
> > S.M.A.R.T. does not, has not, will not, ever ... identify bad blocks.
>
> Well, as you state yourself later, S.M.A.R.T. defines self-tests which
> are able to identify bad blocks. They do have to be triggered, though.
>
> > Both families of disks provide some self-test commands, but these
> > commands do not scan the entire surface of the disk
>
> This is not true. The long self-test scans the entire surface of the
> disk, at least for ATA devices; I don't know whether it does that for
> SCSI devices too.
> ATA also knows about selective self-tests, which can scan definable
> surface areas - which is, first, quite handy for identifying more than
> one bad sector and, second, quite nice on bigger devices as well... my
> ST31500341AS takes about 4.5 hours for a long self-test.
>
> > They report a bad block only if one is found in the extremely small
> > sample of I/O they ran.
>
> And, at least ATA devices report the LBA_of_first_error in the
> self-test log, so you can identify the first bad sector.
>
> regards
>    Mario

The SCSI-family self-test commands terminate after the first media error.
This makes perfect sense: if the disk is failing, you ordinarily want to
know immediately rather than have the disk continue scanning. As such, a
self-test gives you the first bad block and that is it.

As for SATA/ATA self-tests, all the logs are limited to 512 bytes. If you
run the right self-test, you will get a PARTIAL list of bad blocks.
Specifically, you get a 24-byte log entry which tells you the starting bad
block. You do not even get a range of bad blocks. You just know that block
X is bad. It doesn't tell you whether block X+1 is bad. If block X+2 is
bad, it will tell you that, because it chews up another log entry. There
is room for 20 entries. Not all disks support this type of self-test
either; the ANSI spec says it is optional, and it is a relatively recent
introduction.

So, at best, if your disk supports it, you can run self-tests that will
take half a day and give you a partial list of bad blocks between ranges
of LBA numbers you want to scan. This is correctly called the "SMART
selective self-test routine". By the way, this is an OFF-LINE scan.

So, bottom line: Mario is correct in that there is a way to get a PARTIAL
list of bad blocks, if you have a disk that supports this command and
you're willing to run an off-line scan (not practical for a
parity-protected RAID environment).

As the original poster wanted to use SMART just to factor known bad blocks
into a rebuild, you can see that there is no viable option unless you
already have a full list of known bad blocks. You have to find bad blocks
as you read from them as part of the rebuild for these types of disks.

It is possible that some vendor has implemented a SATA ON-LINE bad block
scanning mechanism that reports results and doesn't kill I/O performance.
It would have to give a full list of bad blocks, or at least a starting
block + range.

That would be wonderful, as you could just read the list at regular
intervals and rebuild stripes as necessary. You'd have self-healing
parity. It still wouldn't protect against a drive failure, but it would
ensure that you wouldn't have any lost chunks due to an unreadable block
on one of the surviving disks in a RAID set.

David
* Re: Proactive Drive Replacement
From: Mario 'BitKoenig' Holbe @ 2008-10-21 15:30 UTC
To: linux-raid

David Lethe <david@santools.com> wrote:
> This is correctly called the "SMART selective self-test routine". By
> the way, this is an OFF-LINE scan.

Short, long, conveyance and selective tests are all offline.

> So, bottom line: Mario is correct in that there is a way to get a
> PARTIAL list of bad blocks, if you have a disk that supports this
> command and you're willing to run an off-line scan (not practical for
> a parity-protected RAID environment).

Most modern (ATA) disks support "Suspend Offline collection upon new
command". Well, the tests take notably longer on a loaded disk, and
low-frequency requests to that disk take notably longer as well
(high-frequency requests just keep the test suspended), but it works.

> It is possible that some vendor has implemented a SATA ON-LINE bad
> block scanning mechanism that reports results and doesn't kill I/O
> performance. It would have to give a full list of bad blocks, or at
> least a starting block + range.
>
> That would be wonderful, as you could just read the list at regular
> intervals and rebuild stripes as necessary. You'd have self-healing
> parity.

echo check > /sys/block/mdx/md/sync_action

That's indeed way more powerful than any attempt to rely on any
S.M.A.R.T. thingy.

regards
   Mario
--
I thought the only thing the internet was good for was porn.
                               -- Futurama
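Spelling that out for a hypothetical array md0: a 'check' pass reads every
stripe, and md rewrites any unreadable blocks from the remaining redundancy
as it goes; 'repair' additionally corrects parity/mirror mismatches rather
than just counting them.

    echo check > /sys/block/md0/md/sync_action    # scrub; read errors are fixed from redundancy
    cat /proc/mdstat                              # progress
    cat /sys/block/md0/md/mismatch_cnt            # non-zero means inconsistencies were found
    echo repair > /sys/block/md0/md/sync_action   # like check, but also corrects mismatches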
* Re: Proactive Drive Replacement
From: David Greaves @ 2008-10-21 19:39 UTC
To: David Lethe; +Cc: Jon Nelson, Mario 'BitKoenig' Holbe, LinuxRaid

It is also worth saying that this has wandered way off topic. The comment
about parity rebuilds yadda yadda was an aside to the real meat: a drive
replace facility that uses very efficient mirroring for >99.9% of the disk
rebuild and parity for the <0.1% where a read error occurred.

Hmm, it occurs to me that in the event of a highly dodgy failed drive it
could instead do >99.9% of the recovery from parity and, in the event of a
failure on one of the remaining drives, attempt a read from the dodgy
disk.

David Lethe wrote:
> Sorry about the rant ... but it finally got to me, where people keep
> posting as if S.M.A.R.T. were this all-knowing mechanism that tells you
> what is wrong with the disk and/or where the bad blocks might be. It
> isn't.

No, but I run long self-tests on a weekly basis, and when it tells me I
have a bad block I can examine further; attempt a re-write; run another
long test and see if it comes back clean.

David Lethe also wrote:
> As the original poster wanted to use SMART just to factor known bad
> blocks into a rebuild, you can see that there is no viable option
> unless you already have a full list of known bad blocks. You have to
> find bad blocks as you read from them as part of the rebuild for these
> types of disks.

I did say "force a re-write of SMART-identified bad blocks using
parity-calculated values", and that was inaccurate. I should have said
something like: when SMART identifies a bad block, then force a re-write
using parity-calculated values.

I appreciate that SMART isn't that smart - but it has a lot of value way
down here below the top-end enterprise systems.

David
--
"Don't worry, you'll be fine; I saw it work in a cartoon once..."
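A rough, untested sketch of how that re-write might be approximated with
existing knobs, assuming the kernel exposes sync_min/sync_max (see
Documentation/md.txt), and assuming BAD_SECTOR has already been translated
by hand from the whole-disk LBA in the self-test log into an offset within
the md component (subtracting the partition start and any superblock data
offset); values are 512-byte sectors and should be rounded to chunk
boundaries:

    smartctl -l selftest /dev/sdc                  # LBA_of_first_error, relative to the whole disk
    # BAD_SECTOR below is a placeholder for the hand-translated component offset
    echo $((BAD_SECTOR - 2048)) > /sys/block/md0/md/sync_min
    echo $((BAD_SECTOR + 2048)) > /sys/block/md0/md/sync_max
    echo repair > /sys/block/md0/md/sync_action    # md rewrites the unreadable block from redundancy
    echo 0   > /sys/block/md0/md/sync_min          # restore the defaults afterwards
    echo max > /sys/block/md0/md/sync_max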
* Re: Proactive Drive Replacement
From: Mario 'BitKoenig' Holbe @ 2008-10-21 13:57 UTC
To: linux-raid

David Greaves <david@dgreaves.com> wrote:
> The main issue is that the drive being replaced almost certainly has a
> bad block.

Then the replacement is not pro-active ;)

> This block could be recovered from the raid5 set, but won't be.

This is what the 'check' and 'repair' operations
(/sys/block/md*/md/sync_action) can be used for.

regards
   Mario
--
When Bruce Schneier uses double ROT13 encryption, the ciphertext is
totally unbreakable.
* Re: Proactive Drive Replacement
From: David Greaves @ 2008-10-21 17:29 UTC
To: Mario 'BitKoenig' Holbe; +Cc: linux-raid

Mario 'BitKoenig' Holbe wrote:
> David Greaves <david@dgreaves.com> wrote:
>> The main issue is that the drive being replaced almost certainly has a
>> bad block.
>
> Then the replacement is not pro-active ;)
>
>> This block could be recovered from the raid5 set, but won't be.
>
> This is what the 'check' and 'repair' operations
> (/sys/block/md*/md/sync_action) can be used for.

Well, yes and no.

If I have a bad block then I could use the remaining disks to calculate
data to overwrite it. So yes.

However, the overwrite may fail. So no.

If I have an md-managed mirror then the overwrite will write to the new
disk and the old one. I don't care if the old one fails.

David
--
"Don't worry, you'll be fine; I saw it work in a cartoon once..."
* Re: Proactive Drive Replacement
From: Luca Berra @ 2008-10-24 5:57 UTC
To: linux-raid

On Tue, Oct 21, 2008 at 09:38:17AM +0100, David Greaves wrote:
> The main issue is that the drive being replaced almost certainly has a
> bad block. This block could be recovered from the raid5 set, but won't
> be. Worse, the mirror operation may simply fail to mirror that block -
> leaving it 'random' and thus corrupting the set when the drive is
> swapped in.

False.
If SMART reports the drive is failing, it just means the number of
_correctable_ errors got too high; remember that hard drives (*) do use
ECC and autonomously remap bad blocks.
You replace a drive based on SMART to prevent it developing bad blocks.

Ignoring the above, your scenario is still impossible: if you tried to
mirror a source drive with a bad block, md would notice and fail the
mirroring process. You will never end up with one drive with a bad block
and the other with uninitialized data.

If what you are really worried about is not bad blocks but silent
corruption, you should run a check (see sync_action in
/usr/src/linux/Documentation/md.txt).

L.

(*) note that I don't write 'modern hard drives'.

--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \
* Re: Proactive Drive Replacement
From: David Greaves @ 2008-10-24 8:09 UTC
To: linux-raid

Luca Berra wrote:
> On Tue, Oct 21, 2008 at 09:38:17AM +0100, David Greaves wrote:
>> The main issue is that the drive being replaced almost certainly has a
>> bad block. This block could be recovered from the raid5 set, but won't
>> be. Worse, the mirror operation may simply fail to mirror that block -
>> leaving it 'random' and thus corrupting the set when the drive is
>> swapped in.
> False.
> If SMART reports the drive is failing, it just means the number of
> _correctable_ errors got too high; remember that hard drives (*) do use
> ECC and autonomously remap bad blocks.
> You replace a drive based on SMART to prevent it developing bad blocks.

I have just been through a batch of RMAing and re-RMAing 18+ dreadful
Samsung 1TB drives in a 3-drive and a 5-drive level-5 array. smartd did a
great job of alerting me to bad blocks found during nightly short and
weekly long self-tests.

Usually, by the time the RMA arrived, the drive was capable of being fully
read (once, with retries). I manually mirrored the drives using ddrescue,
since this stressed the remaining disks less and had a reliable retry*
facility.

About 3 times, the drive had unreadable blocks. In those cases I couldn't
use the mirrored drive, which had a tiny bad area (a few KB in 1TB) - I
had to do a rebuild. In one of these cases I developed a bad block on
another component and had to restore from a backup. That was entirely
avoidable.

> Ignoring the above, your scenario is still impossible: if you tried to
> mirror a source drive with a bad block, md would notice and fail the
> mirroring process. You will never end up with one drive with a bad
> block and the other with uninitialized data.

Well done. Great nit you found <sigh>. When I wrote that I was thinking
about the case above, which wasn't md mirroring, and re-reading it I
realise I was totally unclear and you're right; that can't happen.

However, you seem to ignore the part of the thread that demonstrates my
understanding of the issue, where I talk about mirroring from the failing
drive and the need to have md resort to the remaining components/parity
in the event of a failed block - precisely to avoid md failing the
mirroring process and leaving you stuck :)

> If what you are really worried about is not bad blocks but silent
> corruption, you should run a check (see sync_action in
> /usr/src/linux/Documentation/md.txt).

No, what I am worried about is having a raid5 develop a bad block on one
component and then, during recovery, develop a bad block (at a different
location) on another component. That results in unneeded data loss - the
parity is there but nothing reads it.

There was some noise on /. recently when they pointed back to a year-old
story about raid5 being redundant. Well, IMO this proposal would massively
improve raid5/6 reliability when, not if, drives are replaced.

David

* I was stuck on 2.6.18 due to Xen - though eventually I did the recovery
using a rescue disk and 2.6.27.
--
"Don't worry, you'll be fine; I saw it work in a cartoon once..."
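For reference, a manual mirroring pass of the kind described might look
like this with GNU ddrescue; the device names and map-file path are
hypothetical, and the exact option behaviour varies a little between
ddrescue versions:

    # First pass: copy everything that reads cleanly, skipping problem areas quickly.
    ddrescue -f -n /dev/sdc /dev/sdd /root/sdc-to-sdd.map
    # Second pass: go back and retry the remaining bad areas a few times.
    ddrescue -f -r3 /dev/sdc /dev/sdd /root/sdc-to-sdd.map
    # Any sectors that never read successfully stay flagged in the map file -
    # those are exactly the areas that would need rebuilding from parity.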
* Re: Proactive Drive Replacement
From: Luca Berra @ 2008-10-25 13:20 UTC
To: linux-raid

On Fri, Oct 24, 2008 at 09:09:33AM +0100, David Greaves wrote:
> However, you seem to ignore the part of the thread that demonstrates my
> understanding of the issue, where I talk about mirroring from the
> failing drive and the need to have md resort to the remaining
> components/parity in the event of a failed block - precisely to avoid
> md failing the mirroring process and leaving you stuck :)

It was not 'ignored' in the sense of my not having read or understood it:
I do agree that hot-sparing of a failing drive should be a native feature
of md. I was just pointing out what were, imho, errors in your reasoning.

L.

--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \
* Re: Proactive Drive Replacement
From: David Greaves @ 2008-10-25 16:33 UTC
To: linux-raid

Luca Berra wrote:
> I do agree that hot-sparing of a failing drive should be a native
> feature of md

OK - good to hear. I suppose I'm just trying to raise the profile of this
issue.

Hot-replacing a drive seems massively more valuable than squeezing a bit
of performance out of an idle spare.

David
--
"Don't worry, you'll be fine; I saw it work in a cartoon once..."