* Replacing all disks in an array as a preventative measure before failing.
From: Red Wil @ 2022-02-07 20:26 UTC
To: linux-raid
Hello,

It started as the subject said:
- goal was to replace all 10 disks in an R6
- context and perceived constraints:
- soft raid (no imsm or ddf containers)
- multiple partitions per disk; partitions across the 10 disks form R6 arrays
- downtime is not an issue
- minimize the number of commands
- minimize disk stress
- reduce the time spent on this process
- difficult to add 10 spares at once in the rig
- after a reshape/grow from 6 to 10 disks, the data offset of the raid
  members was all over the place, from circa 10k to 200k sectors

Approaches/solutions and critique:
1- add a 'spare' and 'replace' a raid member, one at a time
critique:
- seems to me a long and tedious process
- cannot/will not run in parallel
2- add all the spares at once and perform 'replace' on the members
critique:
- just tedious - lots of cli commands, which can be prone to mistakes
  (a rough command sketch follows below)
The next ones assume I have all the 'spares' in the rig:
3- create new arrays on the spares, a fresh fs, and copy the data.
4- dd/ddrescue copy each drive to a new one. Advantage: can be done one
  by one or in parallel; fewer commands in the terminal.
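
For illustration only, options (1) and (2) boil down to 'add' +
'replace' pairs along these lines - device and array names here are
made up, not taken from the actual rig, and the same would be repeated
for each of the arrays:

  # put the new disks in as spares, then queue a replacement per member
  mdadm /dev/md0 --add /dev/sdk1 /dev/sdl1
  mdadm /dev/md0 --replace /dev/sdb1 --with /dev/sdk1
  mdadm /dev/md0 --replace /dev/sdc1 --with /dev/sdl1
  # ...and so on for the remaining members; md can rebuild several
  # replacements at once if the spares are present.
  # /dev/disk/by-id/... names are safer than sdX letters that move around.
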
In the end I decided I will use route (3).
- flexibility on creation
- copy only what I need
- old array is a sort of backup

Question:
Just out of curiosity regarding (4), assuming the array is offline:
besides it being not recommended in the case of imsm/ddf containers,
which (as far as I understood) keep some data on the hardware itself,
in the case of pure soft raid, is there anything technical or safety
related that prevents a 'dd' copy of a physical hard drive from acting
exactly as the original?
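
(Purely as a hypothetical way of checking this for oneself: clone a
member disk offline and compare the md superblocks on the old and new
devices - names below are placeholders.)

  ddrescue -f /dev/sdX /dev/sdY /root/sdX-clone.map   # offline, whole disk, byte for byte
  mdadm --examine /dev/sdX1
  mdadm --examine /dev/sdY1   # expect the same array UUID, device role and event count
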
Thanks
Red

* Re: Replacing all disks in an array as a preventative measure before failing.
From: Wol @ 2022-02-07 22:28 UTC
To: Red Wil, linux-raid

On 07/02/2022 20:26, Red Wil wrote:
> Hello,
>
> It started as the subject said:
> - goal was to replace all 10 disks in an R6
> - context and perceived constraints:
> - soft raid (no imsm or ddf containers)
> - multiple partitions per disk; partitions across the 10 disks form R6 arrays
> - downtime is not an issue
> - minimize the number of commands
> - minimize disk stress
> - reduce the time spent on this process
> - difficult to add 10 spares at once in the rig
> - after a reshape/grow from 6 to 10 disks, the data offset of the raid
>   members was all over the place, from circa 10k to 200k sectors
>
> Approaches/solutions and critique:
> 1- add a 'spare' and 'replace' a raid member, one at a time
> critique:
> - seems to me a long and tedious process
> - cannot/will not run in parallel

There's not a problem running in parallel as far as mdraid is
concerned. If you can get the spare drives into the chassis (or on
eSATA), you can --replace several drives at once.

And it pretty much just does a dd, just on the live system, keeping
you raid-safe.

> 2- add all the spares at once and perform 'replace' on the members
> critique:
> - just tedious - lots of cli commands, which can be prone to mistakes.

Pretty much the same as (1). Given that your sdX's are moving all over
the place, I would work with uuids - even though it's more typing,
it's safer.

> The next ones assume I have all the 'spares' in the rig:
> 3- create new arrays on the spares, a fresh fs, and copy the data.

Well, you could fail/replace all the old drives, but yes, just
building a new array from scratch (if you can afford the downtime) is
probably better.

> 4- dd/ddrescue copy each drive to a new one. Advantage: can be done one
> by one or in parallel; fewer commands in the terminal.

Fewer commands? Dunno about that. Much safer in many ways, though:
remove the drive you're replacing, copy it, put the new one back. Less
chance for a physical error.

>
> In the end I decided I will use route (3).
> - flexibility on creation
> - copy only what I need
> - old array is a sort of backup
>
> Question:
> Just out of curiosity regarding (4), assuming the array is offline:
> besides it being not recommended in the case of imsm/ddf containers,
> which (as far as I understood) keep some data on the hardware itself,
> in the case of pure soft raid, is there anything technical or safety
> related that prevents a 'dd' copy of a physical hard drive from
> acting exactly as the original?
>
Nope. You've copied the partition byte for byte, the raid won't know
any different.

One question, though. Why are you replacing the drives? Just a
precaution?

How big are the drives? What I'd do if you're not replacing dying
drives, is buy five or possibly six drives of twice the capacity. Do a
--replace on those five drives. Now take two of the drives you've
removed, raid-0 them, and now do a major re-org: adding your raid-0 as
device 6, reducing your raid to a 6-device array, and removing the
last four old drives from the array.
Assuming you've only got 10 bays and you've been faffing about
externally as you replace drives, you can now use the last three
drives in the chassis to create another two-drive raid-0, add that as
a spare into your raid-6, and add your last drive as a spare into both
your raid-0s.

So you end up with a 6-device-plus-spare raid-6, and device 6 & the
spare (your raid-0s) share a spare between them.

Cheers,
Wol
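
A very rough sketch of the re-org described above, with invented
device names and several steps elided - in particular, the filesystem
and the md array size would have to be shrunk to fit the 6-device
layout before the member count can be reduced:

  mdadm --create /dev/md10 --level=0 --raid-devices=2 /dev/sdOLD1 /dev/sdOLD2
  mdadm /dev/md0 --add /dev/md10          # the raid-0 becomes the sixth member
  # ...shrink the fs, then 'mdadm --grow /dev/md0 --array-size=...' to match...
  mdadm --grow /dev/md0 --raid-devices=6 --backup-file=/root/md0-reshape.backup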

* Re: Replacing all disks in an array as a preventative measure before failing.
From: Red Wil @ 2022-02-09 20:58 UTC
To: Wol; +Cc: linux-raid

On Mon, 7 Feb 2022 22:28:57 +0000
Wol <antlists@youngman.org.uk> wrote:

> On 07/02/2022 20:26, Red Wil wrote:
> > Hello,
> >
> > It started as the subject said:
> > - goal was to replace all 10 disks in an R6
> > - context and perceived constraints:
> > - soft raid (no imsm or ddf containers)
> > - multiple partitions per disk; partitions across the 10 disks form R6 arrays
> > - downtime is not an issue
> > - minimize the number of commands
> > - minimize disk stress
> > - reduce the time spent on this process
> > - difficult to add 10 spares at once in the rig
> > - after a reshape/grow from 6 to 10 disks, the data offset of the raid
> >   members was all over the place, from circa 10k to 200k sectors
> >
> > Approaches/solutions and critique:
> > 1- add a 'spare' and 'replace' a raid member, one at a time
> > critique:
> > - seems to me a long and tedious process
> > - cannot/will not run in parallel
>
> There's not a problem running in parallel as far as mdraid is
> concerned. If you can get the spare drives into the chassis (or on
> eSATA), you can --replace several drives at once.
>
> And it pretty much just does a dd, just on the live system, keeping
> you raid-safe.

If I remember correctly, if you have multiple partitions on a single
disk (in different arrays, obviously) and you start a sync/resync op,
for example, on all the arrays on that particular spindle/disk, it
will be done sequentially. If it were done in parallel -> head
movement stress.

> > 2- add all the spares at once and perform 'replace' on the members
> > critique:
> > - just tedious - lots of cli commands, which can be prone to
> >   mistakes.
>
> Pretty much the same as (1). Given that your sdX's are moving all
> over the place, I would work with uuids - even though it's more
> typing, it's safer.
>
> > The next ones assume I have all the 'spares' in the rig:
> > 3- create new arrays on the spares, a fresh fs, and copy the data.
>
> Well, you could fail/replace all the old drives, but yes, just
> building a new array from scratch (if you can afford the downtime) is
> probably better.

Another reason to go this route was to tune/tweak the stack
(RAID-LVM-FS).

> > 4- dd/ddrescue copy each drive to a new one. Advantage: can be
> >   done one by one or in parallel; fewer commands in the terminal.
>
> Fewer commands? Dunno about that. Much safer in many ways, though:
> remove the drive you're replacing, copy it, put the new one back.
> Less chance for a physical error.

Well... it's a matter of perception. For 10 disks I will have 10 dd
commands of the form "dd if=olddrive of=newdrive <some params>", or
even better "ddrescue olddrive newdrive logfile"; otherwise the mdadm
commands would be 50 in total for 10 disks, since I have 5 individual
arrays across the 10 disks.

> >
> > In the end I decided I will use route (3).
> > - flexibility on creation
> > - copy only what I need
> > - old array is a sort of backup
> >
> > Question:
> > Just out of curiosity regarding (4), assuming the array is offline:
> > besides it being not recommended in the case of imsm/ddf containers,
> > which (as far as I understood) keep some data on the hardware
> > itself, in the case of pure soft raid, is there anything technical
> > or safety related that prevents a 'dd' copy of a physical hard
> > drive from acting exactly as the original?
> >
> Nope.
> You've copied the partition byte for byte, the raid won't know
> any different.
>
> One question, though. Why are you replacing the drives? Just a
> precaution?
>
> How big are the drives? What I'd do if you're not replacing dying
> drives, is buy five or possibly six drives of twice the capacity. Do
> a --replace on those five drives. Now take two of the drives you've
> removed, raid-0 them, and now do a major re-org: adding your raid-0
> as device 6, reducing your raid to a 6-device array, and removing the
> last four old drives from the array. Assuming you've only got 10 bays
> and you've been faffing about externally as you replace drives, you
> can now use the last three drives in the chassis to create another
> two-drive raid-0, add that as a spare into your raid-6, and add your
> last drive as a spare into both your raid-0s.
>
> So you end up with a 6-device-plus-spare raid-6, and device 6 & the
> spare (your raid-0s) share a spare between them.
>
> Cheers,
> Wol

I was thinking of cutting the number of drives from 10 to 6 by using
double-size drives, but financial considerations at the time ended up
with 10 slightly larger drives.

Thanks for your comments
Red
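
For what it's worth, those ten clone commands script down to a short
loop - purely illustrative, with made-up by-id names:

  for i in 1 2 3 4 5 6 7 8 9 10; do
      ddrescue -f /dev/disk/by-id/old-disk-$i /dev/disk/by-id/new-disk-$i /root/clone-$i.log
  done
  # sequential as written; backgrounding each ddrescue (with a final
  # 'wait') would run the copies in parallel instead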

* Re: Replacing all disks in an array as a preventative measure before failing.
From: Roger Heflin @ 2022-02-09 13:02 UTC
To: Red Wil; +Cc: linux-raid

On Wed, Feb 9, 2022 at 3:12 AM Red Wil <redwil@gmail.com> wrote:
>
> Hello,
>
> It started as the subject said:
> - goal was to replace all 10 disks in an R6
> - context and perceived constraints:
> - soft raid (no imsm or ddf containers)
> - multiple partitions per disk; partitions across the 10 disks form R6 arrays
> - downtime is not an issue
> - minimize the number of commands
> - minimize disk stress
> - reduce the time spent on this process
> - difficult to add 10 spares at once in the rig
> - after a reshape/grow from 6 to 10 disks, the data offset of the raid
>   members was all over the place, from circa 10k to 200k sectors
>
> Approaches/solutions and critique:
> 1- add a 'spare' and 'replace' a raid member, one at a time
> critique:
> - seems to me a long and tedious process
> - cannot/will not run in parallel
> 2- add all the spares at once and perform 'replace' on the members
> critique:
> - just tedious - lots of cli commands, which can be prone to mistakes.
> The next ones assume I have all the 'spares' in the rig:
> 3- create new arrays on the spares, a fresh fs, and copy the data.
> 4- dd/ddrescue copy each drive to a new one. Advantage: can be done one
> by one or in parallel; fewer commands in the terminal.
>
> In the end I decided I will use route (3).
> - flexibility on creation
> - copy only what I need
> - old array is a sort of backup
>

When I did mine I did a combination of 3 and 2. I bought new disks
that were 2x the size of the devices in the original array, and
partitioned those new disks with a partition of the correct size for
the old array. I used 2 of the new disks to replace 2 disks that were
not behaving, and I used another new disk to replace a 3rd original
device that was behaving just fine. I then took that 3rd replaced
device, added it to the partitions on the 3 new disks, and created a
4-disk raid6 (3 new + 1 old/replaced device), and rearranged a subset
of files from the original array onto its own mount point on the new
array.
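
A hypothetical sketch of that layout (device names and filesystem
choice invented here): each new double-size disk carries a first
partition matching the old member size, and three of those plus the
displaced old drive form the second raid6:

  mdadm --create /dev/md1 --level=6 --raid-devices=4 \
        /dev/sdw1 /dev/sdx1 /dev/sdy1 /dev/sdz1   # 3 new partitions + 1 old device
  mkfs.ext4 /dev/md1
  mount /dev/md1 /mnt/new-array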

* Re: Replacing all disks in an array as a preventative measure before failing.
From: Red Wil @ 2022-02-09 21:07 UTC
To: Roger Heflin; +Cc: linux-raid

On Wed, 9 Feb 2022 07:02:45 -0600
Roger Heflin <rogerheflin@gmail.com> wrote:

> On Wed, Feb 9, 2022 at 3:12 AM Red Wil <redwil@gmail.com> wrote:
> >
> > Hello,
> >
> > It started as the subject said:
> > - goal was to replace all 10 disks in an R6
> > - context and perceived constraints:
> > - soft raid (no imsm or ddf containers)
> > - multiple partitions per disk; partitions across the 10 disks form R6 arrays
> > - downtime is not an issue
> > - minimize the number of commands
> > - minimize disk stress
> > - reduce the time spent on this process
> > - difficult to add 10 spares at once in the rig
> > - after a reshape/grow from 6 to 10 disks, the data offset of the raid
> >   members was all over the place, from circa 10k to 200k sectors
> >
> > Approaches/solutions and critique:
> > 1- add a 'spare' and 'replace' a raid member, one at a time
> > critique:
> > - seems to me a long and tedious process
> > - cannot/will not run in parallel
> > 2- add all the spares at once and perform 'replace' on the members
> > critique:
> > - just tedious - lots of cli commands, which can be prone to
> >   mistakes.
> > The next ones assume I have all the 'spares' in the rig:
> > 3- create new arrays on the spares, a fresh fs, and copy the data.
> > 4- dd/ddrescue copy each drive to a new one. Advantage: can be done
> >   one by one or in parallel; fewer commands in the terminal.
> >
> > In the end I decided I will use route (3).
> > - flexibility on creation
> > - copy only what I need
> > - old array is a sort of backup
> >
>
> When I did mine I did a combination of 3 and 2. I bought new disks
> that were 2x the size of the devices in the original array, and
> partitioned those new disks with a partition of the correct size for
> the old array. I used 2 of the new disks to replace 2 disks that were
> not behaving, and I used another new disk to replace a 3rd original
> device that was behaving just fine. I then took that 3rd replaced
> device, added it to the partitions on the 3 new disks, and created a
> 4-disk raid6 (3 new + 1 old/replaced device), and rearranged a subset
> of files from the original array onto its own mount point on the new
> array.

Obviously, as usual in the 'nix' world, there are multiple solutions
for the same problem, especially if you have a small number of drives.

My real question was regarding (4): whether an exact bit-wise replica
of an entire disk/spindle would have any technical or safety concerns.

Thanks
Red

* Re: Replacing all disks in an array as a preventative measure before failing.
From: Phil Turmel @ 2022-02-09 14:57 UTC
To: Red Wil, linux-raid

On 2/7/22 15:26, Red Wil wrote:
> Hello,

[trim/]

> Approaches/solutions and critique:
> 1- add a 'spare' and 'replace' a raid member, one at a time
> critique:
> - seems to me a long and tedious process
> - cannot/will not run in parallel
> 2- add all the spares at once and perform 'replace' on the members
> critique:
> - just tedious - lots of cli commands, which can be prone to mistakes.
> The next ones assume I have all the 'spares' in the rig:
> 3- create new arrays on the spares, a fresh fs, and copy the data.
> 4- dd/ddrescue copy each drive to a new one. Advantage: can be done one
> by one or in parallel; fewer commands in the terminal.

My last drive upgrades were done in a chassis that had two extra hot
swap bays, so I could do two at a time. I wanted to keep careful track
of roles, so I started a replace after each spare was added, to ensure
that spare would get the designated role. After it was running, I
would --add and --replace the next. After the first two were running
(staggered), it was just a matter of waiting for one to finish to pop
it out and start the next.

After completion, I used --grow to occupy the new space on each.

Took several days, but no downtime at all.

Phil
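
In command form, that staggered cycle looks roughly like this - a
sketch with placeholder names, one add/replace pair per free bay:

  mdadm /dev/md0 --add /dev/sdY1
  mdadm /dev/md0 --replace /dev/sdB1 --with /dev/sdY1
  mdadm /dev/md0 --add /dev/sdZ1
  mdadm /dev/md0 --replace /dev/sdC1 --with /dev/sdZ1
  # when a replacement finishes, pull the old drive, --add the next new
  # one in the freed bay, and --replace the next member
  mdadm --grow /dev/md0 --size=max   # once all members are the bigger drives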

* Re: Replacing all disks in an array as a preventative measure before failing.
From: Red Wil @ 2022-02-09 21:15 UTC
To: Phil Turmel; +Cc: linux-raid

On Wed, 9 Feb 2022 09:57:25 -0500
Phil Turmel <philip@turmel.org> wrote:

> On 2/7/22 15:26, Red Wil wrote:
> > Hello,
>
> [trim/]
>
> > Approaches/solutions and critique:
> > 1- add a 'spare' and 'replace' a raid member, one at a time
> > critique:
> > - seems to me a long and tedious process
> > - cannot/will not run in parallel
> > 2- add all the spares at once and perform 'replace' on the members
> > critique:
> > - just tedious - lots of cli commands, which can be prone to
> >   mistakes.
> > The next ones assume I have all the 'spares' in the rig:
> > 3- create new arrays on the spares, a fresh fs, and copy the data.
> > 4- dd/ddrescue copy each drive to a new one. Advantage: can be
> >   done one by one or in parallel; fewer commands in the terminal.
>
> My last drive upgrades were done in a chassis that had two extra hot
> swap bays, so I could do two at a time. I wanted to keep careful
> track of roles, so I started a replace after each spare was added, to
> ensure that spare would get the designated role. After it was
> running, I would --add and --replace the next. After the first two
> were running (staggered), it was just a matter of waiting for one to
> finish to pop it out and start the next.
>
> After completion, I used --grow to occupy the new space on each.
>
> Took several days, but no downtime at all.
>
> Phil

Hello Phil,

My current chassis is full (no space at all), but I found another
chassis I could use to temporarily extend mine for the duration of the
swap, using two SAS HBAs, and so use all 22 drives at once.

Thanks
Red

Thread overview: 7 messages
2022-02-07 20:26 Replacing all disks in an array as a preventative measure before failing - Red Wil
2022-02-07 22:28 ` Wol
2022-02-09 20:58   ` Red Wil
2022-02-09 13:02 ` Roger Heflin
2022-02-09 21:07   ` Red Wil
2022-02-09 14:57 ` Phil Turmel
2022-02-09 21:15   ` Red Wil