* Resize on dirty array? @ 2006-08-08  0:36
From: James Peverill
To: linux-raid

I have a software raid 5 setup with four drives. One drive failed. I got a
replacement, but unfortunately it turns out that my original disks were a few
gigs larger than the replacement. It seems that most manufacturers don't
actually advertise the REAL capacity of the disk, so getting one that is the
same size as the old ones could be tough (and they aren't available anymore,
of course...).

So my question... can I resize the array while it is missing a drive? The
raid is <50% full, and the few gigs is only a few percent. In retrospect I
shouldn't have sized them right to the limit...

Any ideas would be appreciated. Thanks!

James

* Re: Resize on dirty array? @ 2006-08-08  0:46
From: Neil Brown
To: James Peverill; Cc: linux-raid

On Monday August 7, jamespev@net1plus.com wrote:
> So my question... can I resize the array while it is missing a drive?
> The raid is <50% full, and the few gigs is only a few percent. In
> retrospect I shouldn't have sized them right to the limit...

Yes, that should work.

First resize the filesystem to make it smaller.
Then resize the array:

  mdadm --grow /dev/mdX --size=whatever

You have to calculate 'whatever' yourself. It is in kibibytes, must be at
least 128K smaller than the new drive, and obviously must leave room for the
filesystem.

A good suggestion is:
  shrink the filesystem a lot
  shrink the array an adequate amount
  add the new drive
  resize the array up to 'max' (mdadm -G /dev/mdX --size=max)
  resize the filesystem up to max

NeilBrown

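To make that sequence concrete, here is a sketch of the five steps, assuming
the array is /dev/md0 holding an ext3 filesystem, the replacement partition is
/dev/sde1, and the sizes and mount point are placeholders chosen for
illustration rather than values from the thread:

  # 1. Shrink the filesystem well below the eventual array size
  umount /dev/md0
  e2fsck -f /dev/md0
  resize2fs /dev/md0 350G

  # 2. Shrink the array; --size is per-device and given in kibibytes
  mdadm --grow /dev/md0 --size=380000000

  # 3. Add the smaller replacement drive and wait for the rebuild to finish
  mdadm /dev/md0 --add /dev/sde1

  # 4. Grow the array back to the largest size every member can hold
  mdadm --grow /dev/md0 --size=max

  # 5. Grow the filesystem to fill the array again
  resize2fs /dev/md0
  mount /dev/md0 /mnt/raid
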
* Re: Resize on dirty array? @ 2006-08-08 19:25
From: James Peverill
To: linux-raid

The resize went fine, but after re-adding the drive back into the array I got
another fail event (on another drive) about 23% through the rebuild :(

Did I have to "remove" the bad drive before re-adding it with mdadm? I think
my array might be toast...

Any tips on where I should go now?

Thanks for the help.

James

Neil Brown wrote:
> Yes, that should work.
>
> First resize the filesystem to make it smaller.
> Then resize the array
>   mdadm --grow /dev/mdX --size=whatever
> [...]

* Re: Resize on dirty array? @ 2006-08-09  4:09
From: Neil Brown
To: James Peverill; Cc: linux-raid

On Tuesday August 8, jamespev@net1plus.com wrote:
> The resize went fine, but after re-adding the drive back into the array
> I got another fail event (on another drive) about 23% through the
> rebuild :(
>
> Did I have to "remove" the bad drive before re-adding it with mdadm? I
> think my array might be toast...

You wouldn't be able to re-add the drive without removing it first. But why
did you re-add the failed drive? Why not add the new one? Or maybe you did...

2 drives failed - yes - that sounds a bit like toast.
You can possibly do a --force assemble without the new drive and try to back
up the data somewhere - if you have somewhere large enough.

NeilBrown

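For reference, a forced assembly of the degraded array might look roughly like
this, assuming the array is /dev/md0 and that /dev/sda1, /dev/sdb1 and
/dev/sdc1 are the three surviving original members (all device names and paths
here are placeholders, not taken from the thread):

  # Stop any partially assembled array first
  mdadm --stop /dev/md0

  # Force-assemble from the surviving original members, leaving out the
  # replacement drive
  mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1

  # Mount read-only and copy the data off before attempting any repair
  mount -o ro /dev/md0 /mnt/rescue
  rsync -a /mnt/rescue/ /path/to/other/storage/
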
* Re: Resize on dirty array? @ 2006-08-09 11:28
From: James Peverill
Cc: linux-raid

I'll try the force assemble, but it sounds like I'm screwed. It sounds like
what happened was that two of my drives developed bad sectors in different
places that weren't found until I accessed certain areas (in the case of the
first failure) and did the drive rebuild (for the second failure). In the
future, is there a way to help prevent this?

Given that the bad sectors were likely on different parts of their respective
drives, I should still have a complete copy of all the data, right? Is it
possible to recover from a partial two-disk failure using all the disks?

It looks like I might as well cut my losses and buy new disks (I suspect the
last two drives are near death given what's happened to their brethren). If I
go SATA, am I better off getting 2 dual-port cards or 1 four-port?

Thanks again.

James

* Re: Resize on dirty array? @ 2006-08-09 11:37
From: Martin Schröder
To: James Peverill; Cc: linux-raid

2006/8/9, James Peverill <jamespev@net1plus.com>:
> failure). In the future, is there a way to help prevent this?

RAID is no excuse for backups.

smartd may warn you in advance.

Best
   Martin

PS: http://en.wikipedia.org/wiki/Top-posting#Top-posting

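As a minimal illustration of the smartd suggestion, an /etc/smartd.conf line
along these lines would have smartd watch every drive it finds for failing
SMART health status, failed attributes and new error-log entries (the exact
directive set shown is an example, not something prescribed in the thread):

  # Scan all drives; check overall health, failed attributes, and the
  # SMART error and self-test logs
  DEVICESCAN -H -f -l error -l selftest
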
* Re: Resize on dirty array? @ 2006-08-09 13:05
From: Mark Hahn
To: linux-raid

>> failure). In the future, is there a way to help prevent this?

Sure; periodic scans (perhaps smartctl) of your disks would prevent it. I
suspect that throttling the rebuild rate is also often a good idea if there's
any question about disk reliability.

> RAID is no excuse for backups.

I wish people would quit saying this: not only is it not helpful, but it's
also wrong. A traditional backup is nothing more than a strangely async
raid1, with the same space inefficiency. Tape is not the answer, and is
becoming less so. The idea of a periodic snapshot to media which is located
apart and not under the same load as the primary copy is a good one, but not
cheap or easy. Backups are also often file-based, which is handy but
orthogonal to being raid (or incremental, for that matter). And backups don't
mean you can avoid the cold calculation of how much reliability you want to
buy. _That_ is how you should choose your storage architecture...

regards, mark hahn.

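The rebuild-rate throttling Mark mentions is controlled through the md
sysctls; a sketch of capping it, plus a one-off smartctl scan, might look like
this (the limits are illustrative KiB/s-per-device values and /dev/sda is a
placeholder, neither recommended by anyone in the thread):

  # Show the current rebuild speed limits
  cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max

  # Cap the resync/rebuild rate so a suspect drive is not hammered flat out
  echo 1000  > /proc/sys/dev/raid/speed_limit_min
  echo 10000 > /proc/sys/dev/raid/speed_limit_max

  # A one-off "periodic scan" of a single disk with smartctl
  smartctl -t long /dev/sda
  smartctl -l selftest /dev/sda    # view the result once the test completes
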
* Re: Resize on dirty array? @ 2006-08-09 13:33
From: James Peverill
To: linux-raid

In this case the raid WAS the backup... however it seems it turned out to be
less reliable than the single disks it was supporting. In the future I think
I'll make sure my disks have varying ages so they don't fail all at once.

James

>> RAID is no excuse for backups.

PS: <ctrl><pgup>

* Re: Resize on dirty array? @ 2006-08-09 21:17
From: David Greaves
To: James Peverill; Cc: linux-raid

James Peverill wrote:
> In this case the raid WAS the backup... however it seems it turned out
> to be less reliable than the single disks it was supporting. In the
> future I think I'll make sure my disks have varying ages so they don't
> fail all at once.

No, it wasn't *less* reliable than a single drive; you benefited as soon as a
drive failed. At that point you would have been just as toasted as you may
well be at the moment. With RAID you then stressed the remaining drives to
the point of a second failure (not that you had much choice - you *could*
have spent money on enough media to mirror your data whilst you played with
your only remaining copy - that's a cost/risk tradeoff you chose not to make.
I've made the same choice in the past - I've been lucky - you were not -
sorry.)

I can't see where you mention the kernel version you're running? md can
perform validation syncs on a periodic basis in later kernels - Debian's
mdadm enables this in cron.

David

PS Reorganise lines from distributed reply as you like :)

* Re: Resize on dirty array? @ 2006-08-10 17:44
From: dean gaudet
To: James Peverill; Cc: linux-raid

suggestions:

- set up smartd to run long self tests once a month. (stagger it every few
  days so that your disks aren't doing self-tests at the same time)

- run 2.6.15 or later so md supports repairing read errors from the other
  drives...

- run 2.6.16 or later so you get the check and repair sync_actions in
  /sys/block/mdX/md/sync_action (i think 2.6.16.x still has a bug where you
  have to echo a random word other than repair to sync_action to get a
  repair to start... wrong sense on a strcmp, fixed in 2.6.17).

- run nightly diffs of smartctl -a output on all your drives so you see when
  one of them reports problems in the smart self test or otherwise has a
  Current_Pending_Sectors or Realloc event... then launch a repair
  sync_action.

- proactively replace your disks every couple years (i prefer to replace
  busy disks before 3 years).

-dean

On Wed, 9 Aug 2006, James Peverill wrote:
> In this case the raid WAS the backup... however it seems it turned out to
> be less reliable than the single disks it was supporting.

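A sketch of the first and fourth suggestions, with the long self-tests
staggered a few days apart per drive and a small nightly cron script that
diffs smartctl output (device names, days, paths and the mail address are all
made-up examples, not configuration from the thread):

  # /etc/smartd.conf - long self-test on a different day of the month per disk
  /dev/sda -a -s L/../02/./01 -m root@localhost
  /dev/sdb -a -s L/../05/./01 -m root@localhost
  /dev/sdc -a -s L/../08/./01 -m root@localhost
  /dev/sdd -a -s L/../11/./01 -m root@localhost

  # /etc/cron.daily/smart-diff - cron mails any reported change to root
  #!/bin/sh
  for d in sda sdb sdc sdd; do
      smartctl -a /dev/$d > /var/tmp/smart.$d.new
      touch /var/tmp/smart.$d.old
      diff -u /var/tmp/smart.$d.old /var/tmp/smart.$d.new
      mv /var/tmp/smart.$d.new /var/tmp/smart.$d.old
  done
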
* Re: Resize on dirty array? @ 2006-08-12  1:11
From: David Rees
To: linux-raid; Cc: James Peverill

On 8/10/06, dean gaudet <dean@arctic.org> wrote:
> - set up smartd to run long self tests once a month. (stagger it every
>   few days so that your disks aren't doing self-tests at the same time)

I personally prefer to do a long self-test once a week; a month seems like a
lot of time for something to go wrong.

> - run nightly diffs of smartctl -a output on all your drives so you see
>   when one of them reports problems in the smart self test or otherwise
>   has a Current_Pending_Sectors or Realloc event... then launch a
>   repair sync_action.

You can (and probably should) set up smartd to automatically send out email
alerts as well.

-Dave

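Combining those two points, a single smartd.conf entry could run a weekly long
self-test and mail a warning when something looks wrong; -M test sends one
message at startup so the alert path can be verified (the device, schedule and
address are illustrative only):

  # Weekly long self-test (Sunday 02:00), mail on any problem, and send a
  # test mail when smartd starts so the notification path is known to work
  /dev/sda -a -s L/../../7/02 -m raid-admin@example.com -M test
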
* Re: Resize on dirty array? @ 2006-08-12  2:05
From: David Rees
To: dean gaudet, linux-raid

On 8/11/06, dean gaudet <dean@arctic.org> wrote:
> On Fri, 11 Aug 2006, David Rees wrote:
> > I personally prefer to do a long self-test once a week, a month seems
> > like a lot of time for something to go wrong.
>
> unfortunately i found some drives (seagate 400 pata) had a rather negative
> effect on performance while doing self-test.

Interesting that you noted negative performance, but I typically schedule the
tests for off-hours anyway, where performance isn't critical.

How much of a performance hit did you notice?

-Dave

* Re: Resize on dirty array? @ 2006-08-12  4:36
From: Brad Campbell
To: David Rees; Cc: dean gaudet, linux-raid

David Rees wrote:
>> > I personally prefer to do a long self-test once a week, a month seems
>> > like a lot of time for something to go wrong.
>>
>> unfortunately i found some drives (seagate 400 pata) had a rather
>> negative effect on performance while doing self-test.
>
> Interesting that you noted negative performance, but I typically
> schedule the tests for off-hours anyway where performance isn't
> critical.

Personally I have every disk do a short test at 6am Monday-Saturday, and then
they *all* (29 of them) do a long test every Sunday at 6am. I figure having
all disks do a long test at the same time rather than staggered is going to
show up any pending issues with my PSUs also.

(Been doing this for nearly 2 years now and it has shown up a couple of
drives that were slowly growing defects. Nothing a dd if=/dev/zero
of=/dev/sd(x) did not fix, though.)

Brad
-- 
"Human beings, who are almost unique in having the ability to learn from the
experience of others, are also remarkable for their apparent disinclination
to do so." -- Douglas Adams

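Brad's schedule maps onto a single smartd -s regular expression, short tests
Monday through Saturday at 06:00 plus a long test on Sunday at 06:00, applied
to each monitored drive (the device name is just an example):

  # Short self-test Mon-Sat 06:00, long self-test Sun 06:00
  /dev/sda -a -s (S/../../[1-6]/06|L/../../7/06)
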
* Re: Resize on dirty array? @ 2006-08-13 16:02
From: dean gaudet
To: David Rees; Cc: linux-raid

On Fri, 11 Aug 2006, David Rees wrote:
> > unfortunately i found some drives (seagate 400 pata) had a rather
> > negative effect on performance while doing self-test.
>
> Interesting that you noted negative performance, but I typically
> schedule the tests for off-hours anyway where performance isn't
> critical.
>
> How much of a performance hit did you notice?

i never benchmarked it explicitly. iirc the problem was generally metadata
performance... and became less of an issue when i moved the filesystem log
off the raid5 onto a raid1. unfortunately there aren't really any "off hours"
for this system.

-dean

* Re: Resize on dirty array? @ 2006-08-30  7:30
From: dean gaudet
To: David Rees; Cc: linux-raid

On Sun, 13 Aug 2006, dean gaudet wrote:
> > How much of a performance hit did you notice?
>
> i never benchmarked it explicitly. iirc the problem was generally
> metadata performance... and became less of an issue when i moved the
> filesystem log off the raid5 onto a raid1. unfortunately there aren't
> really any "off hours" for this system.

the problem reappeared... so i can provide some data. one of the 400GB
seagates has been stuck at 20% of a SMART long self test for over 2 days now,
and the self-test itself has been going for about 4.5 days total.

a typical "iostat -x /dev/sd[cdfgh] 30" sample looks like this:

Device:  rrqm/s  wrqm/s    r/s    w/s  rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sdc       90.94  137.52  14.70  25.76  841.32  1360.35    54.43     0.94  23.30  10.30  41.68
sdd       93.67  140.52  14.96  22.06  863.98  1354.75    59.93     0.91  24.50  12.17  45.05
sdf       92.84  136.85  15.36  26.39  857.85  1360.35    53.13     0.88  21.04  10.59  44.21
sdg       87.74  137.82  14.23  24.86  807.73  1355.55    55.35     0.85  21.86  11.25  43.99
sdh       87.20  134.56  14.96  28.29  810.13  1356.88    50.10     1.90  43.72  20.02  86.60

those 5 are in a raid5, so their io should be relatively even... notice the
await, svctm and %util of sdh compared to the other 4. sdh is the one with
the exceptionally slow-going SMART long self-test. i assume it's still making
progress because the effect is measurable in iostat.

-dean

* Re: Resize on dirty array? @ 2006-08-11 17:34
From: John Stoffel
To: Mark Hahn; Cc: linux-raid

>>>>> "Mark" == Mark Hahn <hahn@physics.mcmaster.ca> writes:

>> RAID is no excuse for backups.

Mark> I wish people would quit saying this: not only is it not helpful,
Mark> but it's also wrong.

You've got to be kidding, right? A backup is another aspect of data
protection. RAID is another form. Both have their uses, and both should be
used on any system with important data. You're just spouting the wrong thing
here and I really dislike seeing it, which has prompted this reply.

Mark> a traditional backup is nothing more than a strangely async
Mark> raid1, with the same space inefficiency. tape is not the
Mark> answer, and getting more not. the idea of a periodic snapshot
Mark> to media which is located apart and not under the same load as
Mark> the primary copy is a good one, but not cheap or easy. backups
Mark> are also often file-based, which is handy but orthogonal to
Mark> being raid (or incremental, for that matter). and backups don't
Mark> mean you can avoid the cold calculation of how much reliability
Mark> you want to buy. _that_ is how you should choose your storage
Mark> architecture...

You're mixing up your ideas here. This is the first time I've ever heard
someone imply that backups to tape are a form of RAID. You really have an
interesting point of view here. Now maybe you do have some good points, but
they're certainly not articulated clearly. Just to work through them:

First, backups to tape may not be cheap or easy, especially with the rise of
250gb disks for $100. Buying a tape drive that has the space and performance
to back up that amount of data can be a big investment.

Second, reliability is a different measure from data retention. I can have
the most reliable RAID system on a server, one which can handle multiple
devices failing (because they weren't reliable), or power supply failure or
connectivity failures, etc. But if a user deletes a file and it can't be
recovered from your RAID system, then how much help has that RAID system
been? Now you may argue that reliability includes backups, but that's just
wrong. Reliability is a measure of the media/sub-system. It's not a measure
of how good your backups are.

So you then claim that snapshots are a great way to get cheap and easy
backups, especially when you have reliable RAID. So what happens when your
building burns down? Or even just your house? (As an aside, while I do
backups at home, I don't take them offsite in case of fire. Shame on me, and
I'm a SysAdmin by profession!)

So how do you know that your snapshots are reliable? Are they filesystem
based? Are they volume based? If volume based, how do you get the filesystem
into a quiescent state to make sure there's no corruption when you make the
snapshot? It's not a trivial problem. And even traditional backups to tape
have this issue.

I'd write more, but I'm busy with other stuff and I wanted to hear your
justifications in more detail before I bothered spending the time to refute
them.

John

* Re: Resize on dirty array? @ 2006-08-09 14:56
From: Henrik Holst
To: James Peverill; Cc: linux-raid

James Peverill wrote:
> I'll try the force assemble but it sounds like I'm screwed. It
> sounds like what happened was that two of my drives developed bad
> sectors in different places that weren't found until I accessed
> certain areas (in the case of the first failure) and did the drive
> rebuild (for the second failure).

The file /sys/block/mdX/md/sync_action can be used to issue a recheck of the
data. Read Documentation/md.txt in the kernel source for details about the
exact procedure.

My advice (if you still want to continue using software raid) is that you run
such a check before any add/grow or other action in the future. Also, if the
raid has been unused for a long while, it might be a good idea to recheck the
data.

[snip]

I feel your pain. Massive data loss is the worst. I have had my share of
crashes. Once due to a bad disk and no redundancy, the other time due to good
old stupidity.

Henrik Holst

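A sketch of issuing and monitoring such a check on a recent kernel (2.6.17 or
later, per dean's note earlier in the thread), assuming the array is /dev/md0
(the device name is a placeholder):

  # Start a background consistency check of the whole array
  echo check > /sys/block/md0/md/sync_action

  # Watch progress and, afterwards, how many inconsistencies were found
  cat /proc/mdstat
  cat /sys/block/md0/md/mismatch_cnt

  # If the check reported problems, rewrite the bad blocks from parity
  echo repair > /sys/block/md0/md/sync_action
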
* Re: Resize on dirty array? @ 2006-08-12  7:22
From: Tuomas Leikola
To: James Peverill; Cc: linux-raid, Neil Brown

On 8/9/06, James Peverill <jamespev@net1plus.com> wrote:
> I'll try the force assemble but it sounds like I'm screwed. It sounds
> like what happened was that two of my drives developed bad sectors in
> different places that weren't found until I accessed certain areas (in
> the case of the first failure) and did the drive rebuild (for the second
> failure). In the future, is there a way to help prevent this?

This is a common scenario, and I feel it could be helped if md could be told
not to drop the disk on first failure, but rather keep it running in
"FAILING" status (as opposed to FAILED) until all data from it has been
evacuated (to a hot spare). This way, if another disk became "failing" during
the rebuild, due to another area of the disk, those blocks could be rebuilt
using the other "failing" disk. (Also, this allows the rebuild to mostly be a
ddrescue-style copy operation, rather than parity computation.)

Do you guys feel this is feasible? Neil?

* Re: Resize on dirty array? @ 2006-08-28  4:55
From: Neil Brown
To: Tuomas Leikola; Cc: James Peverill, linux-raid

On Saturday August 12, tuomas.leikola@gmail.com wrote:
> This is a common scenario, and I feel it could be helped if md could be
> told not to drop the disk on first failure, but rather keep it running
> in "FAILING" status (as opposed to FAILED) until all data from it has
> been evacuated (to a hot spare).
> [...]
> Do you guys feel this is feasible? Neil?

Maybe....

I would be a lot happier about it if the block layer told me whether the fail
was a Media error or some other sort of error. But something could probably
be arranged, and the general idea has been suggested a number of times now,
so maybe it really is a good idea :-)

I'll put it on my todo list :-)

NeilBrown

* Re: Resize on dirty array? @ 2006-08-28  6:36
From: Mario 'BitKoenig' Holbe
To: linux-raid

Neil Brown <neilb@suse.de> wrote:
> I would be a lot happier about it if the block layer told me whether
> the fail was a Media error or some other sort of error.

This wouldn't help you either. I've seen drives (mainly Samsung) that locked
up the whole IDE bus after some simple (subsequent) sector-read errors. And
with strange IDE drivers (like PDC202XX_NEW) this could also escalate to
whole-machine freezes - I've seen this too and had to play hard with
device-mapper's dm-error target to work around it :)

So IMHO at least the default behaviour should stay as it currently is: if a
drive fails, never ever touch it again. Perhaps you could make such a
"FAILING" feature somehow configurable (perhaps even on a per-mirror basis)
in order to allow users to enable it for drives they *do* know don't show
such bad behaviour.

regards
   Mario
-- 
Independence Day: Fortunately, the alien computer operating system works just
fine with the laptop. This proves an important point which Apple enthusiasts
have known for years. While the evil empire of Microsoft may dominate the
computers of Earth people, more advanced life forms clearly prefer Macs.

Thread overview:
2006-08-08 0:36 Resize on dirty array? James Peverill
2006-08-08 0:46 ` Neil Brown
2006-08-08 19:25 ` James Peverill
2006-08-09 4:09 ` Neil Brown
2006-08-09 11:28 ` James Peverill
2006-08-09 11:37 ` Martin Schröder
2006-08-09 13:05 ` Mark Hahn
2006-08-09 13:33 ` James Peverill
2006-08-09 21:17 ` David Greaves
2006-08-10 17:44 ` dean gaudet
2006-08-12 1:11 ` David Rees
[not found] ` <72dbd3150608111810m4e4a2e07r5ddcee2132dd6d9a@mail.gmail.com>
[not found] ` <Pine.LNX.4.64.0608111813250.29322@twinlark.arctic.org>
2006-08-12 2:05 ` David Rees
2006-08-12 4:36 ` Brad Campbell
2006-08-13 16:02 ` dean gaudet
2006-08-30 7:30 ` dean gaudet
2006-08-11 17:34 ` John Stoffel
2006-08-09 14:56 ` Henrik Holst
2006-08-12 7:22 ` Tuomas Leikola
2006-08-28 4:55 ` Neil Brown
2006-08-28 6:36 ` Mario 'BitKoenig' Holbe