* how to clone a disk
From: Ming Zhang @ 2006-03-11 0:56 UTC
To: linux-raid

Hi folks,

I have a raid5 array that contains 4 disks and 1 spare disk. One disk is
now showing signs of impending failure in its SMART log.

So I am planning to do this:

1) pull that disk (A) out;
2) use dd to copy that disk (A) to another identical one (B);
3) put disk (B) back into the raid5, assuming the bitmap can help me
   resync quickly.

My questions are:
1) Can this achieve my goal?
2) Is there any better/correct way?

Thanks!

Ming
* Re: how to clone a disk
From: Paul M. @ 2006-03-11 11:53 UTC
To: mingz; +Cc: linux-raid

Since it's raid5 you would be fine just pulling the disk out and
letting the raid driver rebuild the array. If you have a hot spare
then that disk will automatically take over when you remove the
failing disk.

It would also be reasonable to simply wait for the disk to fail. When
that happens the array will keep running, and the system will use the
hot spare to rebuild the array.

The short answer is that your way will work, but it is not necessary.

-Paul

On 3/10/06, Ming Zhang <mingz@ele.uri.edu> wrote:
> Hi folks
>
> I have a raid5 array that contain 4 disk and 1 spare disk. now i saw one
> disk have sign of going fail via smart log.
>
> so i am trying to do this
>
> 1) pull that disk (A) out
> 2) using dd to copy that disk (A) to another same one (B);
> 3) put disk (B) back to raid5 and assume that bitmap can help me resync
> quickly.
>
> now my question is
> 1) can this achieve my goal?
> 2) is there any better/correct way?
>
> thanks!
>
> Ming
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
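The rebuild Paul describes rests on RAID5 parity: the chunk held by a missing member is the XOR of the corresponding chunks on all surviving members. A minimal Python sketch of that idea follows. The 4-byte chunks and the fixed parity position are illustrative assumptions; real md rotates parity across members and works on much larger chunks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical 4-member stripe: three data chunks plus one parity chunk.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

# Simulate losing member 1: XOR the three survivors to recompute its chunk.
survivors = [chunk for i, chunk in enumerate(stripe) if i != 1]
rebuilt = xor_blocks(survivors)
```

Note that the reconstruction needs every surviving chunk of the stripe to be readable, which is why an unreadable sector on another member matters during a rebuild.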
* Re: how to clone a disk
From: PFC @ 2006-03-11 12:40 UTC
To: Paul M., mingz; +Cc: linux-raid

>> I have a raid5 array that contain 4 disk and 1 spare disk. now i saw one
>> disk have sign of going fail via smart log.

Better safe than sorry... replace the failing disk and resync, that's
all.

You might want to do "cat /dev/md# > /dev/null", or
"cat /dev/hd? >/dev/null" first. This is to be sure there isn't some
yet-unseen bad sector on some other drive which would screw your resync.
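The same read-everything check can be sketched in Python when you want the offsets of unreadable regions rather than just an error from cat. This is only an illustration under assumptions: the function name scan_readable and the 1 MiB chunk size are made up here, and the sketch has not been exercised against a genuinely failing drive.

```python
import os

def scan_readable(path, chunk_size=1 << 20):
    """Read `path` from start to end, discarding the data.

    Returns (bytes_read, bad_offsets), where bad_offsets lists the start
    offset of every chunk whose read raised an OS-level error.
    """
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end reports the size for files and block devices.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # Note the failing chunk and carry on with the next one.
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:
                break
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets

# Usage (device name is hypothetical; needs read permission on the device):
#   scan_readable("/dev/md0")
```

A non-empty bad_offsets list for any member other than the failing one is the situation PFC warns about.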
* Re: how to clone a disk
From: Mike Hardy @ 2006-03-11 16:55 UTC
To: PFC; +Cc: Paul M., mingz, linux-raid

I can think of two things I'd do slightly differently...

Do a smartctl -t long on each disk before you do anything, to verify
that you don't have single sector errors on other drives.

Use ddrescue for better results copying a failing drive.

-Mike

PFC wrote:
>>> I have a raid5 array that contain 4 disk and 1 spare disk. now i saw one
>>> disk have sign of going fail via smart log.
>
> Better safe than sorry... replace the failing disk and resync,
> that's all.
>
> You might want to do "cat /dev/md# > /dev/null", or "cat /dev/hd?
> >/dev/null" first. This is to be sure there isn't some yet-unseen bad
> sector on some other drive which would screw your resync.
* Re: how to clone a disk
From: Ming Zhang @ 2006-03-11 22:09 UTC
To: Mike Hardy; +Cc: PFC, Paul M., linux-raid

On Sat, 2006-03-11 at 08:55 -0800, Mike Hardy wrote:
> I can think of two things I'd do slightly differently...
>
> Do a smartctl -t long on each disk before you do anything, to verify
> that you don't have single sector errors on other drives

Will this test interfere with normal disk I/O activity?

> Use ddrescue for better results copying a failing drive

Yes, ddrescue will be better here than dd.

> -Mike
>
> PFC wrote:
> >>> I have a raid5 array that contain 4 disk and 1 spare disk. now i saw one
> >>> disk have sign of going fail via smart log.
> >
> > Better safe than sorry... replace the failing disk and resync,
> > that's all.
> >
> > You might want to do "cat /dev/md# > /dev/null", or "cat /dev/hd?
> > >/dev/null" first. This is to be sure there isn't some yet-unseen bad
> > sector on some other drive which would screw your resync.
* Re: how to clone a disk
From: Mike Hardy @ 2006-03-11 23:08 UTC
To: mingz; +Cc: PFC, Paul M., linux-raid

Ming Zhang wrote:
> On Sat, 2006-03-11 at 08:55 -0800, Mike Hardy wrote:
>
>>I can think of two things I'd do slightly differently...
>>
>>Do a smartctl -t long on each disk before you do anything, to verify
>>that you don't have single sector errors on other drives
>
> will this test interfere with normal disk io activity?

If I understand the documentation correctly, it is not supposed to
interfere with normal I/O. It's something the disk does when it can.

Also, I do it on running systems with busy disks all the time (it's
scheduled, actually, and runs automatically at the same time backups and
things like that happen), and it is never noticeable to me.

*except* when it finds something, and it tells me there's a problem
before any software notices. I love the advance warning...

-Mike
* Re: how to clone a disk
From: Ming Zhang @ 2006-03-11 23:10 UTC
To: Mike Hardy; +Cc: PFC, Paul M., linux-raid

Thanks a lot!

ming

On Sat, 2006-03-11 at 15:08 -0800, Mike Hardy wrote:
> Ming Zhang wrote:
> > On Sat, 2006-03-11 at 08:55 -0800, Mike Hardy wrote:
> >
> >>I can think of two things I'd do slightly differently...
> >>
> >>Do a smartctl -t long on each disk before you do anything, to verify
> >>that you don't have single sector errors on other drives
> >
> > will this test interfere with normal disk io activity?
>
> If I understand the documentation correctly, it is not supposed to
> interfere with normal I/O. It's something the disk does when it can.
>
> Also, I do it on running systems with busy disks all the time (it's
> scheduled, actually, and runs automatically at the same time backups and
> things like that happen), and it is never noticable to me.
>
> *except* when it finds something, and it tells me there's a problem
> before any software notices. I love the advance warning...
>
> -Mike
* Re: how to clone a disk
From: Ming Zhang @ 2006-03-11 22:08 UTC
To: Paul M.; +Cc: linux-raid

On Sat, 2006-03-11 at 06:53 -0500, Paul M. wrote:
> Since its raid5 you would be fine just pulling the disk out and
> letting the raid driver rebuild the array. If you have a hot spare

Yes, rebuilding is the simplest way. But a rebuild needs to read all the
other disks and write to the new disk, and while the array is also
serving I/O the rebuild is slow.

If I do a dd clone and plug the copy back in, the total traffic is just
copying one disk, which can be done very fast as a fully sequential
workload. With the bitmap feature, the resync work after plugging it
back in is minor.

So the single-disk-failure window is pretty small here, right?

ming

> then that disk will automatically take over when you remove the
> failing disk.
> It would also be reasonable to simply wait for the disk to fail. When
> that happens the array will be keep running. When the disk does fail
> the system will use the hot spare to rebuild the array.
> The short answer is your way will work but it not necessarily.
> -Paul
>
> On 3/10/06, Ming Zhang <mingz@ele.uri.edu> wrote:
> > Hi folks
> >
> > I have a raid5 array that contain 4 disk and 1 spare disk. now i saw one
> > disk have sign of going fail via smart log.
> >
> > so i am trying to do this
> >
> > 1) pull that disk (A) out
> > 2) using dd to copy that disk (A) to another same one (B);
> > 3) put disk (B) back to raid5 and assume that bitmap can help me resync
> > quickly.
> >
> > now my question is
> > 1) can this achieve my goal?
> > 2) is there any better/correct way?
> >
> > thanks!
> >
> > Ming
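Ming's traffic argument can be put into rough numbers. The Python sketch below only counts bytes moved; the 250 GB disk size is a hypothetical figure, and the counts say nothing about wall-clock time, since a rebuild reads the surviving members in parallel and both approaches compete with whatever I/O the array is serving.

```python
def rebuild_io(members, disk_bytes):
    """Bytes read and written to rebuild one member of a RAID5 array.

    Every surviving member is read in full; the replacement is written in full.
    """
    return {"read": (members - 1) * disk_bytes, "written": disk_bytes}

def clone_io(disk_bytes):
    """Bytes read and written to copy one disk onto another."""
    return {"read": disk_bytes, "written": disk_bytes}

DISK = 250 * 10**9  # hypothetical disk size in bytes
rebuild = rebuild_io(members=4, disk_bytes=DISK)
clone = clone_io(DISK)
read_ratio = rebuild["read"] // clone["read"]
```

For the 4-member array in this thread, the rebuild reads three disks' worth of data where the clone reads one, and both write one disk's worth.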
* Re: how to clone a disk
From: dean gaudet @ 2006-03-12 0:15 UTC
To: Ming Zhang; +Cc: Paul M., linux-raid

On Sat, 11 Mar 2006, Ming Zhang wrote:

> On Sat, 2006-03-11 at 06:53 -0500, Paul M. wrote:
> > Since its raid5 you would be fine just pulling the disk out and
> > letting the raid driver rebuild the array. If you have a hot spare
>
> yes, rebuilding is the simplest way. but rebuild will need to read all
> other disks and write to the new disk. when serving some io at same
> time, the rebuilding speed is not much,
>
> but if i do a dd clone and plug it back. the total traffic is copy one
> disk which can be done very fast as a fully sequential workload. with
> that bitmap feature, the rsync work after plugging back is minor.
>
> so the one disk fail window is pretty small here. right?

you're planning to do this while the array is online? that's not safe...
unless it's a read-only array...

if you've got a bitmap then one thing you *could* do is stop the array
temporarily, and copy the bitmap first, then restart the array... then
copy the rest of the disk minus the bitmap.

you basically need an atomic copy of the bitmap from before you start the
ddrescue... and you need to use that copy of the bitmap when you
reassemble the array with the new disk.

or you could stop the raid5, and make a raid1 (legacy style, without raid
superblock) of the dying disk and the new disk... then reassemble the
raid5 using the raid1 for the one component... then restart the raid5.

regardless of which method you use you're going to need to take the array
offline at least once to reassemble it with the duplicated disk in place
of the dying disk...

i think i'd be tempted to do the raid1 method ... because that one
requires you go offline at most once -- after the raid1 syncs you can fail
out the dying drive and leave the raid1 around "degraded" until some
future system maintenance event where you can reassemble without it. (a
reboot would automagically make it disappear too -- because it wouldn't
have a raid1 superblock anyhow).

-dean
* Re: how to clone a disk
From: Ming Zhang @ 2006-03-12 0:22 UTC
To: dean gaudet; +Cc: Paul M., linux-raid

On Sat, 2006-03-11 at 16:15 -0800, dean gaudet wrote:
> On Sat, 11 Mar 2006, Ming Zhang wrote:
>
> > On Sat, 2006-03-11 at 06:53 -0500, Paul M. wrote:
> > > Since its raid5 you would be fine just pulling the disk out and
> > > letting the raid driver rebuild the array. If you have a hot spare
> >
> > yes, rebuilding is the simplest way. but rebuild will need to read all
> > other disks and write to the new disk. when serving some io at same
> > time, the rebuilding speed is not much,
> >
> > but if i do a dd clone and plug it back. the total traffic is copy one
> > disk which can be done very fast as a fully sequential workload. with
> > that bitmap feature, the rsync work after plugging back is minor.
> >
> > so the one disk fail window is pretty small here. right?
>
> you're planning to do this while the array is online? that's not safe...
> unless it's a read-only array...

What I plan to do is pull out the disk (which is OK now but going to
die), so the raid5 will be degraded with one disk failed and no spare
disk here, then ddrescue it to a new disk, which will have the same UUID
and everything, then put it back. Then the bitmap will shine here, right?

So the raid5 is still online while that disk is not part of it, and
there is no disk I/O on that disk at all. So I do not think I need an
atomic operation here.

> if you've got a bitmap then one thing you *could* do is stop the array
> temporarily, and copy the bitmap first, then restart the array... then
> copy the rest of the disk minus the bitmap.
>
> you basically need an atomic copy of the bitmap from before you start the
> ddrescue... and you need to use that copy of the bitmap when you
> reassemble the array with the new disk.

This raid5-over-raid1 approach sounds interesting. Worth trying.

> or you could stop the raid5, and make a raid1 (legacy style, without raid
> superblock) of the dying disk and the new disk... then reassemble the
> raid5 using the raid1 for the one component... then restart the raid5.
>
> regardless of which method you use you're going to need to take the array
> offline at least once to reassemble it with the duplicated disk in place
> of the dying disk...
>
> i think i'd be tempted to do the raid1 method ... because that one
> requires you go offline at most once -- after the raid1 syncs you can fail
> out the dying drive and leave the raid1 around "degraded" until some
> future system maintenance event where you can reassemble without it. (a
> reboot would automagically make it disappear too -- because it wouldn't
> have a raid1 superblock anyhow).
>
> -dean
* Re: how to clone a disk
From: dean gaudet @ 2006-03-12 0:31 UTC
To: Ming Zhang; +Cc: Paul M., linux-raid

On Sat, 11 Mar 2006, Ming Zhang wrote:

> On Sat, 2006-03-11 at 16:15 -0800, dean gaudet wrote:
>
> > you're planning to do this while the array is online? that's not safe...
> > unless it's a read-only array...
>
> what i plan to do is to pull out the disk (which is ok now but going to
> die), so raid5 will degrade with 1 disk fail and no spare disk here,
> then do ddresue to a new disk which will have same uuid and everything,
> then put it back, then bitmap will shine here right?
>
> so raid5 is still online while that disk is not part of raid5 now. and
> no diskio on it at all. so do not think i need an atomic operation here.

if you fail the disk from the array, or boot without the failing disk,
then the event counter in the other superblocks will be updated... and the
removed/failed disk will no longer be considered an up to date
component... so after doing the ddrescue you'd need to reassemble the
raid5. i'm not sure you can convince md to use the bitmap in this case --
i'm just not familiar enough with it.

> this raid5 over raid1 way sounds interesting. worthy trying.

let us know how it goes :) i've considered doing this a few times
myself... but i've been too conservative and just taken the system down to
single user to do the ddrescue with the raid offline entirely.

-dean
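The event-counter point can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not md's actual logic: the Member class, the stale_members helper, the device names, and the counter values are all hypothetical, and md's real rules for deciding which components are current are more involved.

```python
from dataclasses import dataclass

@dataclass
class Member:
    name: str
    events: int  # event counter as recorded in this member's superblock

def stale_members(members):
    """Names of members whose event counter lags behind the newest one."""
    newest = max(m.events for m in members)
    return [m.name for m in members if m.events < newest]

# Hypothetical state: disk A was pulled when every counter read 100. The
# remaining members kept running and their counters moved on to 102.
# Clone B inherits A's superblock, so it also carries the old counter.
members = [
    Member("sda", 102),
    Member("sdb", 102),
    Member("sdc", 102),
    Member("clone-B", 100),
]
stale = stale_members(members)
```

In this model the clone comes back looking exactly as out of date as the disk it was copied from, which is the situation dean describes. Whether md can then bring it up to date using the bitmap is the question the thread leaves open.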
* Re: how to clone a disk
From: Ming Zhang @ 2006-03-12 0:36 UTC
To: dean gaudet; +Cc: Paul M., linux-raid

On Sat, 2006-03-11 at 16:31 -0800, dean gaudet wrote:
> On Sat, 11 Mar 2006, Ming Zhang wrote:
>
> > On Sat, 2006-03-11 at 16:15 -0800, dean gaudet wrote:
> >
> > > you're planning to do this while the array is online? that's not safe...
> > > unless it's a read-only array...
> >
> > what i plan to do is to pull out the disk (which is ok now but going to
> > die), so raid5 will degrade with 1 disk fail and no spare disk here,
> > then do ddresue to a new disk which will have same uuid and everything,
> > then put it back, then bitmap will shine here right?
> >
> > so raid5 is still online while that disk is not part of raid5 now. and
> > no diskio on it at all. so do not think i need an atomic operation here.
>
> if you fail the disk from the array, or boot without the failing disk,
> then the event counter in the other superblocks will be updated... and the
> removed/failed disk will no longer be considered an up to date
> component... so after doing the ddrescue you'd need to reassemble the
> raid5. i'm not sure you can convince md to use the bitmap in this case --
> i'm just not familiar enough with it.

I am a little confused here. Then what is the purpose of the bitmap?
Isn't the bitmap meant for a component that is temporarily out of place
and thus a bit out of sync?

> > this raid5 over raid1 way sounds interesting. worthy trying.
>
> let us know how it goes :) i've considered doing this a few times
> myself... but i've been too conservative and just taken the system down to
> single user to do the ddrescue with the raid offline entirely.

Sure, after we finish this discussion and sort out a stable plan. I do
not want to risk my data. :P

> -dean
* Re: how to clone a disk
From: dean gaudet @ 2006-03-12 0:47 UTC
To: Ming Zhang; +Cc: Paul M., linux-raid

On Sat, 11 Mar 2006, Ming Zhang wrote:

> On Sat, 2006-03-11 at 16:31 -0800, dean gaudet wrote:
> > if you fail the disk from the array, or boot without the failing disk,
> > then the event counter in the other superblocks will be updated... and the
> > removed/failed disk will no longer be considered an up to date
> > component... so after doing the ddrescue you'd need to reassemble the
> > raid5. i'm not sure you can convince md to use the bitmap in this case --
> > i'm just not familiar enough with it.
>
> i am little confused here. then what the purpose of that bitmap for? is
> not that bitmap is for a component temporarily out of place and thus out
> of sync a bit?

hmm... yeah i suppose that is the purpose of the bitmap... i haven't used
bitmaps yet though... so i don't know which types of events they protect
against. in theory what you want to do sounds like it should work though,
but i'd experiment somewhere safe first.

-dean
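The idea behind a write-intent bitmap, as Ming understands it, can be sketched as a toy model in Python: while a member is absent, remember which regions were written, and on its return resync only those regions. The class name, the 64 MiB region size, and the write pattern are assumptions for illustration; md's real bitmap format and bookkeeping differ.

```python
MIB = 1 << 20

class WriteIntentBitmap:
    """Toy model: tracks which fixed-size regions were written to."""

    def __init__(self, array_bytes, chunk_bytes):
        self.chunk_bytes = chunk_bytes
        self.n_chunks = -(-array_bytes // chunk_bytes)  # ceiling division
        self.dirty = set()

    def record_write(self, offset, length):
        """Mark every region touched by a write of `length` > 0 bytes."""
        first = offset // self.chunk_bytes
        last = (offset + length - 1) // self.chunk_bytes
        self.dirty.update(range(first, last + 1))

    def resync_bytes(self):
        """Bytes that must be resynced when the absent member returns."""
        return len(self.dirty) * self.chunk_bytes

# Hypothetical 1 GiB member tracked in 64 MiB regions.
bitmap = WriteIntentBitmap(array_bytes=1024 * MIB, chunk_bytes=64 * MIB)
bitmap.record_write(offset=0, length=4096)             # touches region 0
bitmap.record_write(offset=130 * MIB, length=10 * MIB)  # touches region 2
partial = bitmap.resync_bytes()
full = bitmap.n_chunks * bitmap.chunk_bytes
```

In this model two small writes cost a 128 MiB resync instead of the full 1 GiB, which is the saving Ming is counting on. It does not show whether md will accept a cloned disk and apply the bitmap to it, so dean's advice to experiment somewhere safe first still stands.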
* Re: how to clone a disk
From: Ming Zhang @ 2006-03-12 0:54 UTC
To: dean gaudet; +Cc: Paul M., linux-raid

On Sat, 2006-03-11 at 16:47 -0800, dean gaudet wrote:
> On Sat, 11 Mar 2006, Ming Zhang wrote:
>
> > On Sat, 2006-03-11 at 16:31 -0800, dean gaudet wrote:
> > > if you fail the disk from the array, or boot without the failing disk,
> > > then the event counter in the other superblocks will be updated... and the
> > > removed/failed disk will no longer be considered an up to date
> > > component... so after doing the ddrescue you'd need to reassemble the
> > > raid5. i'm not sure you can convince md to use the bitmap in this case --
> > > i'm just not familiar enough with it.
> >
> > i am little confused here. then what the purpose of that bitmap for? is
> > not that bitmap is for a component temporarily out of place and thus out
> > of sync a bit?
>
> hmm... yeah i suppose that is the purpose of the bitmap... i haven't used
> bitmaps yet though... so i don't know which types of events they protect
> against. in theory what you want to do sounds like it should work though,
> but i'd experiment somewhere safe first.

Yes, I need to see if I can find the bitmap's usage and purpose in the
archive here or on Google.

> -dean
end of thread, newest message 2006-03-12 0:54 UTC

Thread overview: 14+ messages
2006-03-11  0:56 how to clone a disk Ming Zhang
2006-03-11 11:53 ` Paul M.
2006-03-11 12:40   ` PFC
2006-03-11 16:55     ` Mike Hardy
2006-03-11 22:09       ` Ming Zhang
2006-03-11 23:08         ` Mike Hardy
2006-03-11 23:10           ` Ming Zhang
2006-03-11 22:08   ` Ming Zhang
2006-03-12  0:15     ` dean gaudet
2006-03-12  0:22       ` Ming Zhang
2006-03-12  0:31         ` dean gaudet
2006-03-12  0:36           ` Ming Zhang
2006-03-12  0:47             ` dean gaudet
2006-03-12  0:54               ` Ming Zhang