* Replacing all disks in an array as a preventative measure before failing.
From: Red Wil @ 2022-02-07 20:26 UTC
To: linux-raid
Hello,

It started as the subject said:
- goal was to replace all 10 disks in an R6
- context and perceived constraints:
- soft raid (no imsm or ddf containers)
- multiple partitions per disk; partitions across the 10 disks form R6 arrays
- downtime is not an issue
- minimize the number of commands
- minimize disk stress
- reduce the time spent on this process
- difficult to add 10 spares at once in the rig
- after a reshape/grow from 6 to 10 disks, the data offset of the raid
  members was all over the place, from circa 10k to 200k sectors

Approaches/solutions and critique:
1- add a 'spare' and 'replace' a raid member, one at a time
critique:
- seems to me a long and tedious process
- cannot/will not run in parallel
2- add all the spares at once and perform 'replace' on the members
critique:
- just tedious - lots of cli commands, which can be prone to mistakes
  (a rough command sketch follows below)
The next ones assume I have all the 'spares' in the rig:
3- create new arrays on the spares, a fresh fs, and copy the data.
4- dd/ddrescue copy each drive to a new one. Advantage: can be done one
  by one or in parallel; fewer commands in the terminal.
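
For illustration only, options (1) and (2) boil down to 'add' +
'replace' pairs along these lines - device and array names here are
made up, not taken from the actual rig, and the same would be repeated
for each of the arrays:

  # put the new disks in as spares, then queue a replacement per member
  mdadm /dev/md0 --add /dev/sdk1 /dev/sdl1
  mdadm /dev/md0 --replace /dev/sdb1 --with /dev/sdk1
  mdadm /dev/md0 --replace /dev/sdc1 --with /dev/sdl1
  # ...and so on for the remaining members; md can rebuild several
  # replacements at once if the spares are present.
  # /dev/disk/by-id/... names are safer than sdX letters that move around.
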
In the end I decided I will use route (3).
- flexibility on creation
- copy only what I need
- old array is a sort of backup

Question:
Just out of curiosity regarding (4), assuming the array is offline:
besides it being not recommended in the case of imsm/ddf containers,
which (as far as I understood) keep some data on the hardware itself,
in the case of pure soft raid, is there anything technical or safety
related that prevents a 'dd' copy of a physical hard drive from acting
exactly as the original?
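
(Purely as a hypothetical way of checking this for oneself: clone a
member disk offline and compare the md superblocks on the old and new
devices - names below are placeholders.)

  ddrescue -f /dev/sdX /dev/sdY /root/sdX-clone.map   # offline, whole disk, byte for byte
  mdadm --examine /dev/sdX1
  mdadm --examine /dev/sdY1   # expect the same array UUID, device role and event count
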
Thanks
Red

* Re: Replacing all disks in an array as a preventative measure before failing.
From: Wol @ 2022-02-07 22:28 UTC
To: Red Wil, linux-raid

On 07/02/2022 20:26, Red Wil wrote:
> Hello,
>
> It started as the subject said:
> - goal was to replace all 10 disks in an R6
> - context and perceived constraints:
> - soft raid (no imsm or ddf containers)
> - multiple partitions per disk; partitions across the 10 disks form R6 arrays
> - downtime is not an issue
> - minimize the number of commands
> - minimize disk stress
> - reduce the time spent on this process
> - difficult to add 10 spares at once in the rig
> - after a reshape/grow from 6 to 10 disks, the data offset of the raid
>   members was all over the place, from circa 10k to 200k sectors
>
> Approaches/solutions and critique:
> 1- add a 'spare' and 'replace' a raid member, one at a time
> critique:
> - seems to me a long and tedious process
> - cannot/will not run in parallel

There's not a problem running in parallel as far as mdraid is
concerned. If you can get the spare drives into the chassis (or on
eSATA), you can --replace several drives at once.

And it pretty much just does a dd, just on the live system, keeping
you raid-safe.

> 2- add all the spares at once and perform 'replace' on the members
> critique:
> - just tedious - lots of cli commands, which can be prone to mistakes.

Pretty much the same as (1). Given that your sdX's are moving all over
the place, I would work with uuids - even though it's more typing,
it's safer.

> The next ones assume I have all the 'spares' in the rig:
> 3- create new arrays on the spares, a fresh fs, and copy the data.

Well, you could fail/replace all the old drives, but yes, just
building a new array from scratch (if you can afford the downtime) is
probably better.

> 4- dd/ddrescue copy each drive to a new one. Advantage: can be done one
> by one or in parallel; fewer commands in the terminal.

Fewer commands? Dunno about that. Much safer in many ways, though:
remove the drive you're replacing, copy it, put the new one back. Less
chance for a physical error.

>
> In the end I decided I will use route (3).
> - flexibility on creation
> - copy only what I need
> - old array is a sort of backup
>
> Question:
> Just out of curiosity regarding (4), assuming the array is offline:
> besides it being not recommended in the case of imsm/ddf containers,
> which (as far as I understood) keep some data on the hardware itself,
> in the case of pure soft raid, is there anything technical or safety
> related that prevents a 'dd' copy of a physical hard drive from
> acting exactly as the original?
>
Nope. You've copied the partition byte for byte, the raid won't know
any different.

One question, though. Why are you replacing the drives? Just a
precaution?

How big are the drives? What I'd do if you're not replacing dying
drives, is buy five or possibly six drives of twice the capacity. Do a
--replace on those five drives. Now take two of the drives you've
removed, raid-0 them, and now do a major re-org: adding your raid-0 as
device 6, reducing your raid to a 6-device array, and removing the
last four old drives from the array.
Assuming you've only got 10 bays and you've been faffing about
externally as you replace drives, you can now use the last three
drives in the chassis to create another two-drive raid-0, add that as
a spare into your raid-6, and add your last drive as a spare into both
your raid-0s.

So you end up with a 6-device-plus-spare raid-6, and device 6 & the
spare (your raid-0s) share a spare between them.

Cheers,
Wol
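
A very rough sketch of the re-org described above, with invented
device names and several steps elided - in particular, the filesystem
and the md array size would have to be shrunk to fit the 6-device
layout before the member count can be reduced:

  mdadm --create /dev/md10 --level=0 --raid-devices=2 /dev/sdOLD1 /dev/sdOLD2
  mdadm /dev/md0 --add /dev/md10          # the raid-0 becomes the sixth member
  # ...shrink the fs, then 'mdadm --grow /dev/md0 --array-size=...' to match...
  mdadm --grow /dev/md0 --raid-devices=6 --backup-file=/root/md0-reshape.backup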

* Re: Replacing all disks in an array as a preventative measure before failing.
From: Red Wil @ 2022-02-09 20:58 UTC
To: Wol; +Cc: linux-raid

On Mon, 7 Feb 2022 22:28:57 +0000
Wol <antlists@youngman.org.uk> wrote:

> On 07/02/2022 20:26, Red Wil wrote:
> > Hello,
> >
> > It started as the subject said:
> > - goal was to replace all 10 disks in an R6
> > - context and perceived constraints:
> > - soft raid (no imsm or ddf containers)
> > - multiple partitions per disk; partitions across the 10 disks form R6 arrays
> > - downtime is not an issue
> > - minimize the number of commands
> > - minimize disk stress
> > - reduce the time spent on this process
> > - difficult to add 10 spares at once in the rig
> > - after a reshape/grow from 6 to 10 disks, the data offset of the raid
> >   members was all over the place, from circa 10k to 200k sectors
> >
> > Approaches/solutions and critique:
> > 1- add a 'spare' and 'replace' a raid member, one at a time
> > critique:
> > - seems to me a long and tedious process
> > - cannot/will not run in parallel
>
> There's not a problem running in parallel as far as mdraid is
> concerned. If you can get the spare drives into the chassis (or on
> eSATA), you can --replace several drives at once.
>
> And it pretty much just does a dd, just on the live system, keeping
> you raid-safe.

If I remember correctly, if you have multiple partitions on a single
disk (in different arrays, obviously) and you start a sync/resync op,
for example, on all the arrays on that particular spindle/disk, it
will be done sequentially. If it were done in parallel -> head
movement stress.

> > 2- add all the spares at once and perform 'replace' on the members
> > critique:
> > - just tedious - lots of cli commands, which can be prone to
> >   mistakes.
>
> Pretty much the same as (1). Given that your sdX's are moving all
> over the place, I would work with uuids - even though it's more
> typing, it's safer.
>
> > The next ones assume I have all the 'spares' in the rig:
> > 3- create new arrays on the spares, a fresh fs, and copy the data.
>
> Well, you could fail/replace all the old drives, but yes, just
> building a new array from scratch (if you can afford the downtime) is
> probably better.

Another reason to go this route was to tune/tweak the stack
(RAID-LVM-FS).

> > 4- dd/ddrescue copy each drive to a new one. Advantage: can be
> >   done one by one or in parallel; fewer commands in the terminal.
>
> Fewer commands? Dunno about that. Much safer in many ways, though:
> remove the drive you're replacing, copy it, put the new one back.
> Less chance for a physical error.

Well... it's a matter of perception. For 10 disks I will have 10 dd
commands of the form "dd if=olddrive of=newdrive <some params>", or
even better "ddrescue olddrive newdrive logfile"; otherwise the mdadm
commands would be 50 in total for 10 disks, since I have 5 individual
arrays across the 10 disks.

> >
> > In the end I decided I will use route (3).
> > - flexibility on creation
> > - copy only what I need
> > - old array is a sort of backup
> >
> > Question:
> > Just out of curiosity regarding (4), assuming the array is offline:
> > besides it being not recommended in the case of imsm/ddf containers,
> > which (as far as I understood) keep some data on the hardware
> > itself, in the case of pure soft raid, is there anything technical
> > or safety related that prevents a 'dd' copy of a physical hard
> > drive from acting exactly as the original?
> >
> Nope.
> You've copied the partition byte for byte, the raid won't know
> any different.
>
> One question, though. Why are you replacing the drives? Just a
> precaution?
>
> How big are the drives? What I'd do if you're not replacing dying
> drives, is buy five or possibly six drives of twice the capacity. Do
> a --replace on those five drives. Now take two of the drives you've
> removed, raid-0 them, and now do a major re-org: adding your raid-0
> as device 6, reducing your raid to a 6-device array, and removing the
> last four old drives from the array. Assuming you've only got 10 bays
> and you've been faffing about externally as you replace drives, you
> can now use the last three drives in the chassis to create another
> two-drive raid-0, add that as a spare into your raid-6, and add your
> last drive as a spare into both your raid-0s.
>
> So you end up with a 6-device-plus-spare raid-6, and device 6 & the
> spare (your raid-0s) share a spare between them.
>
> Cheers,
> Wol

I was thinking of cutting the number of drives from 10 to 6 by using
double-size drives, but financial considerations at the time ended up
with 10 slightly larger drives.

Thanks for your comments
Red
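
For what it's worth, those ten clone commands script down to a short
loop - purely illustrative, with made-up by-id names:

  for i in 1 2 3 4 5 6 7 8 9 10; do
      ddrescue -f /dev/disk/by-id/old-disk-$i /dev/disk/by-id/new-disk-$i /root/clone-$i.log
  done
  # sequential as written; backgrounding each ddrescue (with a final
  # 'wait') would run the copies in parallel instead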

* Re: Replacing all disks in an array as a preventative measure before failing.
From: Roger Heflin @ 2022-02-09 13:02 UTC
To: Red Wil; +Cc: linux-raid

On Wed, Feb 9, 2022 at 3:12 AM Red Wil <redwil@gmail.com> wrote:
>
> Hello,
>
> It started as the subject said:
> - goal was to replace all 10 disks in an R6
> - context and perceived constraints:
> - soft raid (no imsm or ddf containers)
> - multiple partitions per disk; partitions across the 10 disks form R6 arrays
> - downtime is not an issue
> - minimize the number of commands
> - minimize disk stress
> - reduce the time spent on this process
> - difficult to add 10 spares at once in the rig
> - after a reshape/grow from 6 to 10 disks, the data offset of the raid
>   members was all over the place, from circa 10k to 200k sectors
>
> Approaches/solutions and critique:
> 1- add a 'spare' and 'replace' a raid member, one at a time
> critique:
> - seems to me a long and tedious process
> - cannot/will not run in parallel
> 2- add all the spares at once and perform 'replace' on the members
> critique:
> - just tedious - lots of cli commands, which can be prone to mistakes.
> The next ones assume I have all the 'spares' in the rig:
> 3- create new arrays on the spares, a fresh fs, and copy the data.
> 4- dd/ddrescue copy each drive to a new one. Advantage: can be done one
> by one or in parallel; fewer commands in the terminal.
>
> In the end I decided I will use route (3).
> - flexibility on creation
> - copy only what I need
> - old array is a sort of backup
>

When I did mine I did a combination of 3 and 2. I bought new disks
that were 2x the size of the devices in the original array, and
partitioned those new disks with a partition of the correct size for
the old array. I used 2 of the new disks to replace 2 disks that were
not behaving, and I used another new disk to replace a 3rd original
device that was behaving just fine. I then took that 3rd replaced
device, added it to the partitions on the 3 new disks, and created a
4-disk raid6 (3 new + 1 old/replaced device), and rearranged a subset
of files from the original array onto its own mount point on the new
array.
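
A hypothetical sketch of that layout (device names and filesystem
choice invented here): each new double-size disk carries a first
partition matching the old member size, and three of those plus the
displaced old drive form the second raid6:

  mdadm --create /dev/md1 --level=6 --raid-devices=4 \
        /dev/sdw1 /dev/sdx1 /dev/sdy1 /dev/sdz1   # 3 new partitions + 1 old device
  mkfs.ext4 /dev/md1
  mount /dev/md1 /mnt/new-array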

* Re: Replacing all disks in an array as a preventative measure before failing.
From: Red Wil @ 2022-02-09 21:07 UTC
To: Roger Heflin; +Cc: linux-raid

On Wed, 9 Feb 2022 07:02:45 -0600
Roger Heflin <rogerheflin@gmail.com> wrote:

> On Wed, Feb 9, 2022 at 3:12 AM Red Wil <redwil@gmail.com> wrote:
> >
> > Hello,
> >
> > It started as the subject said:
> > - goal was to replace all 10 disks in an R6
> > - context and perceived constraints:
> > - soft raid (no imsm or ddf containers)
> > - multiple partitions per disk; partitions across the 10 disks form R6 arrays
> > - downtime is not an issue
> > - minimize the number of commands
> > - minimize disk stress
> > - reduce the time spent on this process
> > - difficult to add 10 spares at once in the rig
> > - after a reshape/grow from 6 to 10 disks, the data offset of the raid
> >   members was all over the place, from circa 10k to 200k sectors
> >
> > Approaches/solutions and critique:
> > 1- add a 'spare' and 'replace' a raid member, one at a time
> > critique:
> > - seems to me a long and tedious process
> > - cannot/will not run in parallel
> > 2- add all the spares at once and perform 'replace' on the members
> > critique:
> > - just tedious - lots of cli commands, which can be prone to
> >   mistakes.
> > The next ones assume I have all the 'spares' in the rig:
> > 3- create new arrays on the spares, a fresh fs, and copy the data.
> > 4- dd/ddrescue copy each drive to a new one. Advantage: can be done
> >   one by one or in parallel; fewer commands in the terminal.
> >
> > In the end I decided I will use route (3).
> > - flexibility on creation
> > - copy only what I need
> > - old array is a sort of backup
> >
>
> When I did mine I did a combination of 3 and 2. I bought new disks
> that were 2x the size of the devices in the original array, and
> partitioned those new disks with a partition of the correct size for
> the old array. I used 2 of the new disks to replace 2 disks that were
> not behaving, and I used another new disk to replace a 3rd original
> device that was behaving just fine. I then took that 3rd replaced
> device, added it to the partitions on the 3 new disks, and created a
> 4-disk raid6 (3 new + 1 old/replaced device), and rearranged a subset
> of files from the original array onto its own mount point on the new
> array.

Obviously, as usual in the 'nix' world, there are multiple solutions
for the same problem, especially if you have a small number of drives.

My real question was regarding (4): whether an exact bit-wise replica
of an entire disk/spindle would have any technical or safety concerns.

Thanks
Red

* Re: Replacing all disks in an array as a preventative measure before failing.
From: Phil Turmel @ 2022-02-09 14:57 UTC
To: Red Wil, linux-raid

On 2/7/22 15:26, Red Wil wrote:
> Hello,

[trim/]

> Approaches/solutions and critique:
> 1- add a 'spare' and 'replace' a raid member, one at a time
> critique:
> - seems to me a long and tedious process
> - cannot/will not run in parallel
> 2- add all the spares at once and perform 'replace' on the members
> critique:
> - just tedious - lots of cli commands, which can be prone to mistakes.
> The next ones assume I have all the 'spares' in the rig:
> 3- create new arrays on the spares, a fresh fs, and copy the data.
> 4- dd/ddrescue copy each drive to a new one. Advantage: can be done one
> by one or in parallel; fewer commands in the terminal.

My last drive upgrades were done in a chassis that had two extra hot
swap bays, so I could do two at a time. I wanted to keep careful track
of roles, so I started a replace after each spare was added, to ensure
that spare would get the designated role. After it was running, I
would --add and --replace the next. After the first two were running
(staggered), it was just a matter of waiting for one to finish to pop
it out and start the next.

After completion, I used --grow to occupy the new space on each.

Took several days, but no downtime at all.

Phil
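
In command form, that staggered cycle looks roughly like this - a
sketch with placeholder names, one add/replace pair per free bay:

  mdadm /dev/md0 --add /dev/sdY1
  mdadm /dev/md0 --replace /dev/sdB1 --with /dev/sdY1
  mdadm /dev/md0 --add /dev/sdZ1
  mdadm /dev/md0 --replace /dev/sdC1 --with /dev/sdZ1
  # when a replacement finishes, pull the old drive, --add the next new
  # one in the freed bay, and --replace the next member
  mdadm --grow /dev/md0 --size=max   # once all members are the bigger drives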

* Re: Replacing all disks in an array as a preventative measure before failing.
From: Red Wil @ 2022-02-09 21:15 UTC
To: Phil Turmel; +Cc: linux-raid

On Wed, 9 Feb 2022 09:57:25 -0500
Phil Turmel <philip@turmel.org> wrote:

> On 2/7/22 15:26, Red Wil wrote:
> > Hello,
>
> [trim/]
>
> > Approaches/solutions and critique:
> > 1- add a 'spare' and 'replace' a raid member, one at a time
> > critique:
> > - seems to me a long and tedious process
> > - cannot/will not run in parallel
> > 2- add all the spares at once and perform 'replace' on the members
> > critique:
> > - just tedious - lots of cli commands, which can be prone to
> >   mistakes.
> > The next ones assume I have all the 'spares' in the rig:
> > 3- create new arrays on the spares, a fresh fs, and copy the data.
> > 4- dd/ddrescue copy each drive to a new one. Advantage: can be
> >   done one by one or in parallel; fewer commands in the terminal.
>
> My last drive upgrades were done in a chassis that had two extra hot
> swap bays, so I could do two at a time. I wanted to keep careful
> track of roles, so I started a replace after each spare was added, to
> ensure that spare would get the designated role. After it was
> running, I would --add and --replace the next. After the first two
> were running (staggered), it was just a matter of waiting for one to
> finish to pop it out and start the next.
>
> After completion, I used --grow to occupy the new space on each.
>
> Took several days, but no downtime at all.
>
> Phil

Hello Phil,

My current chassis is full (no space at all), but I found another
chassis I could use to temporarily extend mine for the duration of the
swap, using two SAS HBAs, and so use all 22 drives at once.

Thanks
Red

Thread overview: 7 messages
2022-02-07 20:26 Replacing all disks in an array as a preventative measure before failing - Red Wil
2022-02-07 22:28 ` Wol
2022-02-09 20:58   ` Red Wil
2022-02-09 13:02 ` Roger Heflin
2022-02-09 21:07   ` Red Wil
2022-02-09 14:57 ` Phil Turmel
2022-02-09 21:15   ` Red Wil