how to clone a disk

linux-raid.vger.kernel.org archive mirror

* how to clone a disk
@ 2006-03-11  0:56 Ming Zhang
  2006-03-11 11:53 ` Paul M.
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Zhang @ 2006-03-11  0:56 UTC (permalink / raw)
  To: linux-raid

Hi folks

I have a raid5 array that contains 4 disks and 1 spare disk. now i see
that one disk shows signs of failing in the smart log.

so i am trying to do this

1) pull that disk (A) out
2) use dd to copy that disk (A) to another identical one (B);
3) put disk (B) back into the raid5 and assume that the bitmap can help
me resync quickly.

now my question is 
1) can this achieve my goal?
2) is there any better/correct way?

thanks!

Ming


* Re: how to clone a disk
  2006-03-11  0:56 how to clone a disk Ming Zhang
@ 2006-03-11 11:53 ` Paul M.
  2006-03-11 12:40   ` PFC
  2006-03-11 22:08   ` Ming Zhang
  0 siblings, 2 replies; 14+ messages in thread
From: Paul M. @ 2006-03-11 11:53 UTC (permalink / raw)
  To: mingz; +Cc: linux-raid

Since it's raid5 you would be fine just pulling the disk out and
letting the raid driver rebuild the array. If you have a hot spare
then that disk will automatically take over when you remove the
failing disk.
It would also be reasonable to simply wait for the disk to fail. When
that happens the array will keep running, and the system will use the
hot spare to rebuild the array.
The short answer is your way will work but it is not necessary.
-Paul
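
A minimal sketch of the "pull it and let the spare rebuild" route described
above, assuming the array is /dev/md0 and the suspect member is /dev/sdc1
(both names are hypothetical, nothing in the thread gives them). It uses
mdadm's --fail and --remove options and only prints the commands unless
DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: MD and BAD are hypothetical names, not taken from the thread.
MD=/dev/md0      # the raid5 array (assumed name)
BAD=/dev/sdc1    # the member the SMART log is complaining about (assumed name)

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

replace_with_spare() {
    run mdadm "$MD" --fail "$BAD"      # mark the member faulty; a hot spare can then take over
    run mdadm "$MD" --remove "$BAD"    # detach it so the drive can be pulled
    run cat /proc/mdstat               # shows rebuild progress onto the spare
}

replace_with_spare
```

Marking the member failed before pulling it means md is told about the
change instead of discovering it through I/O errors.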


* Re: how to clone a disk
  2006-03-11 11:53 ` Paul M.
@ 2006-03-11 12:40   ` PFC
  2006-03-11 16:55     ` Mike Hardy
  2006-03-11 22:08   ` Ming Zhang
  1 sibling, 1 reply; 14+ messages in thread
From: PFC @ 2006-03-11 12:40 UTC (permalink / raw)
  To: Paul M., mingz; +Cc: linux-raid


>> I have a raid5 array that contain 4 disk and 1 spare disk. now i saw one
>> disk have sign of going fail via smart log.

	Better safe than sorry... replace the failing disk and resync, that's all.

	You might want to do "cat /dev/md# > /dev/null", or
"cat /dev/hd? > /dev/null" first. This is to be sure there isn't some
yet-unseen bad sector on some other drive which would screw your resync.
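
The read check above can be wrapped in a small helper, sketched here
under the assumption that a failed read makes dd exit non-zero. The
function name read_check is made up for illustration; it reads every path
it is given and reports which ones could not be read to the end.

```shell
#!/bin/sh
# Read every byte of each given device (or file) and throw the data away,
# reporting any path where the read fails. read_check is a hypothetical
# helper name; the caller supplies the device paths.
read_check() {
    status=0
    for dev in "$@"; do
        if dd if="$dev" of=/dev/null bs=1024k 2>/dev/null; then
            echo "ok: $dev"
        else
            echo "READ ERROR: $dev"
            status=1
        fi
    done
    return $status
}
```

As root this might be called as "read_check /dev/hda /dev/hdb /dev/hdd"
with whatever the healthy members are actually named. It only reads, so
it is safe to run against members of a live array.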


* Re: how to clone a disk
  2006-03-11 12:40   ` PFC
@ 2006-03-11 16:55     ` Mike Hardy
  2006-03-11 22:09       ` Ming Zhang
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Hardy @ 2006-03-11 16:55 UTC (permalink / raw)
  To: PFC; +Cc: Paul M., mingz, linux-raid


I can think of two things I'd do slightly differently...

Do a smartctl -t long on each disk before you do anything, to verify
that you don't have single sector errors on other drives

Use ddrescue for better results copying a failing drive

-Mike
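
A sketch of the two suggestions above, with hypothetical device and file
names throughout. It assumes smartmontools and GNU ddrescue are installed,
and prints the commands rather than running them unless DRY_RUN=0 is set.
The two-pass ddrescue pattern (quick pass, then retries) is a common way
to use it, not something spelled out in this thread.

```shell
#!/bin/sh
# Sketch only. All device and file names below are hypothetical.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start a long SMART self-test on every disk given. The test runs inside
# the drive; results appear later in the self-test log.
start_long_tests() {
    for d in "$@"; do
        run smartctl -t long "$d"
    done
}

# Show the self-test log for every disk given (run once the tests finish).
show_test_logs() {
    for d in "$@"; do
        run smartctl -l selftest "$d"
    done
}

# Copy a failing drive with GNU ddrescue: a fast first pass that skips
# scraping of bad areas, then a second pass that retries them up to 3 times.
# The map file lets ddrescue resume and remember which areas are bad.
clone_failing_disk() {
    src=$1 dst=$2 map=$3
    run ddrescue -f -n "$src" "$dst" "$map"
    run ddrescue -f -r3 "$src" "$dst" "$map"
}

start_long_tests /dev/sda /dev/sdb /dev/sdc /dev/sdd
clone_failing_disk /dev/sdc /dev/sdf sdc.map
```

Unlike plain dd, ddrescue keeps going past unreadable sectors and records
them in the map file, so one bad sector does not abort the whole copy.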


* Re: how to clone a disk
  2006-03-11 11:53 ` Paul M.
  2006-03-11 12:40   ` PFC
@ 2006-03-11 22:08   ` Ming Zhang
  2006-03-12  0:15     ` dean gaudet
  1 sibling, 1 reply; 14+ messages in thread
From: Ming Zhang @ 2006-03-11 22:08 UTC (permalink / raw)
  To: Paul M.; +Cc: linux-raid

On Sat, 2006-03-11 at 06:53 -0500, Paul M. wrote:
> Since it's raid5 you would be fine just pulling the disk out and
> letting the raid driver rebuild the array. If you have a hot spare

yes, rebuilding is the simplest way. but a rebuild needs to read all the
other disks and write to the new disk. while serving some io at the same
time, the rebuild is not very fast.

but if i do a dd clone and plug it back, the total traffic is copying
one disk, which can be done very fast as a fully sequential workload.
with the bitmap feature, the resync work after plugging it back is minor.

so the single-disk-failure window is pretty small here, right?

ming
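
The traffic argument above can be put into numbers. This is a rough
sketch only: the 4-disk count comes from the thread, but the 250 GB disk
size is an assumption for illustration, and the clone figure leaves out
the (hopefully small) bitmap-driven resync afterwards.

```shell
#!/bin/sh
# io_rebuild N SIZE: a rebuild reads the N-1 surviving members and writes one.
io_rebuild() {
    echo $(( ($1 - 1) * $2 + $2 ))
}

# io_clone SIZE: a clone reads one disk and writes one disk.
io_clone() {
    echo $(( 2 * $1 ))
}

DISKS=4        # active members, as in the thread
SIZE_GB=250    # assumed disk size, not stated in the thread

echo "rebuild: $(io_rebuild $DISKS $SIZE_GB) GB moved"   # rebuild: 1000 GB moved
echo "clone:   $(io_clone $SIZE_GB) GB moved"            # clone:   500 GB moved
```

The rebuild reads happen on different disks in parallel, so the raw
totals overstate the difference in wall-clock time. The bigger practical
difference is probably that the rebuild competes with normal array I/O
while the clone does not.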




* Re: how to clone a disk
  2006-03-11 16:55     ` Mike Hardy
@ 2006-03-11 22:09       ` Ming Zhang
  2006-03-11 23:08         ` Mike Hardy
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Zhang @ 2006-03-11 22:09 UTC (permalink / raw)
  To: Mike Hardy; +Cc: PFC, Paul M., linux-raid

On Sat, 2006-03-11 at 08:55 -0800, Mike Hardy wrote:
> I can think of two things I'd do slightly differently...
> 
> Do a smartctl -t long on each disk before you do anything, to verify
> that you don't have single sector errors on other drives

will this test interfere with normal disk io activity?

> 
> Use ddrescue for better results copying a failing drive

yes, ddrescue will be better here than dd.


* Re: how to clone a disk
  2006-03-11 22:09       ` Ming Zhang
@ 2006-03-11 23:08         ` Mike Hardy
  2006-03-11 23:10           ` Ming Zhang
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Hardy @ 2006-03-11 23:08 UTC (permalink / raw)
  To: mingz; +Cc: PFC, Paul M., linux-raid

Ming Zhang wrote:
> On Sat, 2006-03-11 at 08:55 -0800, Mike Hardy wrote:
> 
>>I can think of two things I'd do slightly differently...
>>
>>Do a smartctl -t long on each disk before you do anything, to verify
>>that you don't have single sector errors on other drives
> 
> 
> will this test interfere with normal disk io activity?

If I understand the documentation correctly, it is not supposed to
interfere with normal I/O. It's something the disk does when it can.

Also, I do it on running systems with busy disks all the time (it's
scheduled, actually, and runs automatically at the same time backups and
things like that happen), and it is never noticeable to me.

*except* when it finds something, and it tells me there's a problem
before any software notices. I love the advance warning...

-Mike


* Re: how to clone a disk
  2006-03-11 23:08         ` Mike Hardy
@ 2006-03-11 23:10           ` Ming Zhang
  0 siblings, 0 replies; 14+ messages in thread
From: Ming Zhang @ 2006-03-11 23:10 UTC (permalink / raw)
  To: Mike Hardy; +Cc: PFC, Paul M., linux-raid

thanks a lot!

ming


* Re: how to clone a disk
  2006-03-11 22:08   ` Ming Zhang
@ 2006-03-12  0:15     ` dean gaudet
  2006-03-12  0:22       ` Ming Zhang
  0 siblings, 1 reply; 14+ messages in thread
From: dean gaudet @ 2006-03-12  0:15 UTC (permalink / raw)
  To: Ming Zhang; +Cc: Paul M., linux-raid

On Sat, 11 Mar 2006, Ming Zhang wrote:

> On Sat, 2006-03-11 at 06:53 -0500, Paul M. wrote:
> > Since it's raid5 you would be fine just pulling the disk out and
> > letting the raid driver rebuild the array. If you have a hot spare
> 
> yes, rebuilding is the simplest way. but a rebuild needs to read all the
> other disks and write to the new disk. while serving some io at the same
> time, the rebuild is not very fast.
> 
> but if i do a dd clone and plug it back, the total traffic is copying
> one disk, which can be done very fast as a fully sequential workload.
> with the bitmap feature, the resync work after plugging it back is minor.
> 
> so the single-disk-failure window is pretty small here, right?

you're planning to do this while the array is online?  that's not safe... 
unless it's a read-only array...

if you've got a bitmap then one thing you *could* do is stop the array 
temporarily, and copy the bitmap first, then restart the array... then 
copy the rest of the disk minus the bitmap.

you basically need an atomic copy of the bitmap from before you start the 
ddrescue... and you need to use that copy of the bitmap when you 
reassemble the array with the new disk.

or you could stop the raid5, and make a raid1 (legacy style, without raid 
superblock) of the dying disk and the new disk... then reassemble the 
raid5 using the raid1 for the one component... then restart the raid5.

regardless of which method you use you're going to need to take the array 
offline at least once to reassemble it with the duplicated disk in place 
of the dying disk...

i think i'd be tempted to do the raid1 method ... because that one 
requires you go offline at most once -- after the raid1 syncs you can fail 
out the dying drive and leave the raid1 around "degraded" until some 
future system maintenance event where you can reassemble without it.  (a 
reboot would automagically make it disappear too -- because it wouldn't 
have a raid1 superblock anyhow).

-dean
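
A rough sketch of the raid1 method described above. Every device name is
hypothetical. It assumes mdadm's --build mode (an array without
superblocks) accepts raid1 with a "missing" slot; starting the mirror
with only the dying disk and adding the new disk afterwards is an
assumption made here so that the copy direction is unambiguous. It is
not a step spelled out in the thread. Commands are only printed unless
DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. Try it on scratch devices before trusting it with real data.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# raid1_swap MD MIRROR OLD NEW OTHER_MEMBERS...
raid1_swap() {
    md=$1 mirror=$2 old=$3 new=$4
    shift 4                        # what is left: the healthy raid5 members
    run mdadm --stop "$md"         # the one offline moment (unmount first)
    # superblock-less raid1 holding only the dying disk for now
    run mdadm --build "$mirror" --level=1 --raid-devices=2 "$old" missing
    # bring the raid5 back with the mirror standing in for the dying disk
    run mdadm --assemble "$md" "$mirror" "$@"
    # adding the new disk starts the old -> new copy while the raid5 runs
    run mdadm "$mirror" --add "$new"
}

# Once the mirror has finished syncing, drop the dying disk from it.
retire_old_disk() {
    mirror=$1 old=$2
    run mdadm "$mirror" --fail "$old"
    run mdadm "$mirror" --remove "$old"
}

raid1_swap /dev/md0 /dev/md1 /dev/sdc1 /dev/sdf1 /dev/sda1 /dev/sdb1 /dev/sdd1
```

retire_old_disk would be called later, for example as
"retire_old_disk /dev/md1 /dev/sdc1", after /proc/mdstat shows the mirror
in sync.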


* Re: how to clone a disk
  2006-03-12  0:15     ` dean gaudet
@ 2006-03-12  0:22       ` Ming Zhang
  2006-03-12  0:31         ` dean gaudet
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Zhang @ 2006-03-12  0:22 UTC (permalink / raw)
  To: dean gaudet; +Cc: Paul M., linux-raid

On Sat, 2006-03-11 at 16:15 -0800, dean gaudet wrote:
> On Sat, 11 Mar 2006, Ming Zhang wrote:
> 
> > On Sat, 2006-03-11 at 06:53 -0500, Paul M. wrote:
> > > Since it's raid5 you would be fine just pulling the disk out and
> > > letting the raid driver rebuild the array. If you have a hot spare
> > 
> > yes, rebuilding is the simplest way. but a rebuild needs to read all the
> > other disks and write to the new disk. while serving some io at the same
> > time, the rebuild is not very fast.
> > 
> > but if i do a dd clone and plug it back, the total traffic is copying
> > one disk, which can be done very fast as a fully sequential workload.
> > with the bitmap feature, the resync work after plugging it back is minor.
> > 
> > so the single-disk-failure window is pretty small here, right?
> 

> you're planning to do this while the array is online?  that's not safe... 
> unless it's a read-only array...

what i plan to do is to pull out the disk (which is ok now but going to
die), so the raid5 will run degraded with 1 disk failed and no spare
disk here, then ddrescue it to a new disk which will have the same uuid
and everything, then put that back. then the bitmap will shine here, right?

so the raid5 is still online while that disk is not part of it, and
there is no disk io on it at all. so i do not think i need an atomic
operation here.

> 
> if you've got a bitmap then one thing you *could* do is stop the array 
> temporarily, and copy the bitmap first, then restart the array... then 
> copy the rest of the disk minus the bitmap.
> 
> you basically need an atomic copy of the bitmap from before you start the 
> ddrescue... and you need to use that copy of the bitmap when you 
> reassemble the array with the new disk.
> 

this raid5 over raid1 way sounds interesting. worth trying.

> or you could stop the raid5, and make a raid1 (legacy style, without raid 
> superblock) of the dying disk and the new disk... then reassemble the 
> raid5 using the raid1 for the one component... then restart the raid5.
> 
> regardless of which method you use you're going to need to take the array 
> offline at least once to reassemble it with the duplicated disk in place 
> of the dying disk...
> 
> i think i'd be tempted to do the raid1 method ... because that one 
> requires you go offline at most once -- after the raid1 syncs you can fail 
> out the dying drive and leave the raid1 around "degraded" until some 
> future system maintenance event where you can reassemble without it.  (a 
> reboot would automagically make it disappear too -- because it wouldn't 
> have a raid1 superblock anyhow).
> 
> -dean



* Re: how to clone a disk
  2006-03-12  0:22       ` Ming Zhang
@ 2006-03-12  0:31         ` dean gaudet
  2006-03-12  0:36           ` Ming Zhang
  0 siblings, 1 reply; 14+ messages in thread
From: dean gaudet @ 2006-03-12  0:31 UTC (permalink / raw)
  To: Ming Zhang; +Cc: Paul M., linux-raid

On Sat, 11 Mar 2006, Ming Zhang wrote:

> On Sat, 2006-03-11 at 16:15 -0800, dean gaudet wrote:
> 
> > you're planning to do this while the array is online?  that's not safe... 
> > unless it's a read-only array...
> 
> what i plan to do is to pull out the disk (which is ok now but going to
> die), so the raid5 will run degraded with 1 disk failed and no spare
> disk here, then ddrescue it to a new disk which will have the same uuid
> and everything, then put that back. then the bitmap will shine here, right?
> 
> so the raid5 is still online while that disk is not part of it, and
> there is no disk io on it at all. so i do not think i need an atomic
> operation here.

if you fail the disk from the array, or boot without the failing disk, 
then the event counter in the other superblocks will be updated... and the 
removed/failed disk will no longer be considered an up to date 
component... so after doing the ddrescue you'd need to reassemble the 
raid5.  i'm not sure you can convince md to use the bitmap in this case -- 
i'm just not familiar enough with it.

> this raid5 over raid1 way sounds interesting. worthy trying.

let us know how it goes :)  i've considered doing this a few times 
myself... but i've been too conservative and just taken the system down to 
single user to do the ddrescue with the raid offline entirely.

-dean
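
The event counter mentioned above can be inspected per member. A small
sketch, assuming "mdadm --examine" prints a line of the form
"Events : <value>" for each member. The helper names events_of and
show_events are made up for illustration.

```shell
#!/bin/sh
# Pull the event counter out of `mdadm --examine` output read on stdin.
# Assumes the output contains a line of the form "Events : <value>".
events_of() {
    sed -n 's/^[[:space:]]*Events[[:space:]]*:[[:space:]]*//p'
}

# Print "<device> <events>" for each member given (needs root to run).
show_events() {
    for d in "$@"; do
        echo "$d $(mdadm --examine "$d" | events_of)"
    done
}
```

Calling show_events on the cloned disk and on the live members would show
how far the clone's counter has fallen behind the rest of the array.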


* Re: how to clone a disk
  2006-03-12  0:31         ` dean gaudet
@ 2006-03-12  0:36           ` Ming Zhang
  2006-03-12  0:47             ` dean gaudet
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Zhang @ 2006-03-12  0:36 UTC (permalink / raw)
  To: dean gaudet; +Cc: Paul M., linux-raid

On Sat, 2006-03-11 at 16:31 -0800, dean gaudet wrote:
> On Sat, 11 Mar 2006, Ming Zhang wrote:
> 
> > On Sat, 2006-03-11 at 16:15 -0800, dean gaudet wrote:
> > 
> > > you're planning to do this while the array is online?  that's not safe... 
> > > unless it's a read-only array...
> > 
> > what i plan to do is to pull out the disk (which is ok now but going to
> > die), so the raid5 will run degraded with 1 disk failed and no spare
> > disk here, then ddrescue it to a new disk which will have the same uuid
> > and everything, then put that back. then the bitmap will shine here, right?
> > 
> > so the raid5 is still online while that disk is not part of it, and
> > there is no disk io on it at all. so i do not think i need an atomic
> > operation here.
> 
> if you fail the disk from the array, or boot without the failing disk, 
> then the event counter in the other superblocks will be updated... and the 
> removed/failed disk will no longer be considered an up to date 
> component... so after doing the ddrescue you'd need to reassemble the 
> raid5.  i'm not sure you can convince md to use the bitmap in this case -- 
> i'm just not familiar enough with it.

i am a little confused here. then what is the purpose of that bitmap?
isn't the bitmap for a component that is temporarily out of the array
and thus a bit out of sync?

> 
> > this raid5 over raid1 way sounds interesting. worthy trying.
> 
> let us know how it goes :)  i've considered doing this a few times 
> myself... but i've been too conservative and just taken the system down to 
> single user to do the ddrescue with the raid offline entirely.

sure, after we finish this discussion and sort out a stable plan. i do
not want to risk my data. :P

> 
> -dean



* Re: how to clone a disk
  2006-03-12  0:36           ` Ming Zhang
@ 2006-03-12  0:47             ` dean gaudet
  2006-03-12  0:54               ` Ming Zhang
  0 siblings, 1 reply; 14+ messages in thread
From: dean gaudet @ 2006-03-12  0:47 UTC (permalink / raw)
  To: Ming Zhang; +Cc: Paul M., linux-raid

On Sat, 11 Mar 2006, Ming Zhang wrote:

> On Sat, 2006-03-11 at 16:31 -0800, dean gaudet wrote:
> > if you fail the disk from the array, or boot without the failing disk, 
> > then the event counter in the other superblocks will be updated... and the 
> > removed/failed disk will no longer be considered an up to date 
> > component... so after doing the ddrescue you'd need to reassemble the 
> > raid5.  i'm not sure you can convince md to use the bitmap in this case -- 
> > i'm just not familiar enough with it.
> 
> i am a little confused here. then what is the purpose of that bitmap?
> isn't the bitmap for a component that is temporarily out of the array
> and thus a bit out of sync?

hmm... yeah i suppose that is the purpose of the bitmap... i haven't used 
bitmaps yet though... so i don't know which types of events they protect 
against.  in theory what you want to do sounds like it should work though, 
but i'd experiment somewhere safe first.

-dean
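
One way to "experiment somewhere safe" is a throwaway array on loop
devices. The sketch below assumes four loop devices backed by scratch
files already exist and that /dev/md9 is unused (all names hypothetical).
It pulls and re-adds the same member, which is the simple case; whether a
ddrescue copy of that member is accepted the same way is exactly what
would still need testing. Commands are only printed unless DRY_RUN=0 is
set.

```shell
#!/bin/sh
# Sketch only: scratch devices, never real data.

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# bitmap_experiment MD MEMBER...
bitmap_experiment() {
    md=$1
    shift
    for last in "$@"; do :; done     # the last member is the one we pull
    run mdadm --create "$md" --level=5 "--raid-devices=$#" --bitmap=internal "$@"
    run mdadm --wait "$md"           # let the initial sync finish
    run mdadm "$md" --fail "$last"
    run mdadm "$md" --remove "$last"
    # ... write some data to the array here so the bitmap gets dirty bits ...
    run mdadm "$md" --re-add "$last"
    run mdadm --examine-bitmap "$last"
    run cat /proc/mdstat             # a bitmap resync should be short
}

bitmap_experiment /dev/md9 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
```

If the re-add triggers a full recovery instead of a short one, the bitmap
was not used, which is the case the event-counter concern above is about.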


* Re: how to clone a disk
  2006-03-12  0:47             ` dean gaudet
@ 2006-03-12  0:54               ` Ming Zhang
  0 siblings, 0 replies; 14+ messages in thread
From: Ming Zhang @ 2006-03-12  0:54 UTC (permalink / raw)
  To: dean gaudet; +Cc: Paul M., linux-raid

On Sat, 2006-03-11 at 16:47 -0800, dean gaudet wrote:
> On Sat, 11 Mar 2006, Ming Zhang wrote:
> 
> > On Sat, 2006-03-11 at 16:31 -0800, dean gaudet wrote:
> > > if you fail the disk from the array, or boot without the failing disk, 
> > > then the event counter in the other superblocks will be updated... and the 
> > > removed/failed disk will no longer be considered an up to date 
> > > component... so after doing the ddrescue you'd need to reassemble the 
> > > raid5.  i'm not sure you can convince md to use the bitmap in this case -- 
> > > i'm just not familiar enough with it.
> > 
> > i am a little confused here. then what is the purpose of that bitmap?
> > isn't the bitmap for a component that is temporarily out of the array
> > and thus a bit out of sync?
> 
> hmm... yeah i suppose that is the purpose of the bitmap... i haven't used 
> bitmaps yet though... so i don't know which types of events they protect 
> against.  in theory what you want to do sounds like it should work though, 
> but i'd experiment somewhere safe first.

yes, i need to see if i can find the bitmap's usage and purpose in the
archive here or on google.


> 
> -dean




Thread overview: 14+ messages
2006-03-11  0:56 how to clone a disk Ming Zhang
2006-03-11 11:53 ` Paul M.
2006-03-11 12:40   ` PFC
2006-03-11 16:55     ` Mike Hardy
2006-03-11 22:09       ` Ming Zhang
2006-03-11 23:08         ` Mike Hardy
2006-03-11 23:10           ` Ming Zhang
2006-03-11 22:08   ` Ming Zhang
2006-03-12  0:15     ` dean gaudet
2006-03-12  0:22       ` Ming Zhang
2006-03-12  0:31         ` dean gaudet
2006-03-12  0:36           ` Ming Zhang
2006-03-12  0:47             ` dean gaudet
2006-03-12  0:54               ` Ming Zhang
