* Rotating RAID 1
  @ 2011-08-15 19:56 Jérôme Poulin  2011-08-15 20:19 ` Phil Turmel  2011-08-15 20:21 ` Pavel Hofman  0 siblings, 2 replies; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-15 19:56 UTC
To: linux-raid

Good evening,

I'm currently working on a project in which I use md-raid RAID1 with a bitmap to "clone" my data from one disk to another, and I would like to know whether this could cause corruption.

The system has 2 SATA ports, both hotplug capable. I created 2 partitions on each disk: 1 system (2 GB), 1 data (1 TB+). I created two RAID1s using:

mdadm --create /dev/md0 --raid-devices=2 --bitmap=internal --bitmap-chunk=4096 --metadata=1.0 /dev/sd[ab]1
mdadm --create /dev/md1 --raid-devices=2 --bitmap=internal --bitmap-chunk=65536 --metadata=1.0 /dev/sd[ab]2

I forced sync_parallel on the system disk to be sure it rebuilds first, formatted the system partition ext3 and the data partition ext4, and mounted both with data=writeback.

This system doesn't contain critical data, but it holds backups on the data partition. Once the data was in sync, I removed a disk and let udev fail and remove it from the array. This is Arch Linux and udev assembles the array with the incremental option; I added --run to make sure it starts even when a disk is missing. So far, everything works as expected.

Here is what differs from a standard RAID1: I removed sdb and replaced it with a brand-new disk, copied the partition layout from the other disk, and added the new disk with mdadm -a on both arrays; it synced and works. Swapping the original disk back in only rebuilds according to the bitmap, although sometimes it appears to do a full rebuild, which is fine. Once, however, after a day of modifications (at least 100 GB) and weeks after setting up this RAID, the rebuild took only seconds, and days later the array turned out to be corrupted: the kernel complained about bad extents, and fsck found errors in one of the files I know had been modified that day.

So the question is: am I right to use md-raid for this kind of thing? rsync is too CPU-heavy for what I need, and I need to stay compatible with Windows, hence metadata 1.0.
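For reference, the swap procedure described above boils down to something like the following sketch; the device names match the setup described, and the sfdisk copy of the partition table is an assumption about how the "partition layout" was cloned:

  # copy the partition table from the remaining disk to the blank replacement
  sfdisk -d /dev/sda | sfdisk /dev/sdb
  # add the new partitions to both arrays; md rebuilds them
  mdadm /dev/md0 --add /dev/sdb1
  mdadm /dev/md1 --add /dev/sdb2
  # when a previously used disk is plugged back in, udev effectively runs:
  mdadm --incremental --run /dev/sdb1
  mdadm --incremental --run /dev/sdb2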
* Re: Rotating RAID 1
  2011-08-15 19:56 Rotating RAID 1 Jérôme Poulin  @ 2011-08-15 20:19 ` Phil Turmel  2011-08-15 20:23 ` Jérôme Poulin  1 sibling, 1 reply; 20+ messages in thread
From: Phil Turmel @ 2011-08-15 20:19 UTC
To: Jérôme Poulin; +Cc: linux-raid

Hi Jérôme,

On 08/15/2011 03:56 PM, Jérôme Poulin wrote:
> Here is what differs from a standard RAID1: I removed sdb and replaced it with a brand-new disk, copied the partition layout from the other disk, and added the new disk with mdadm -a on both arrays; it synced and works. Swapping the original disk back in only rebuilds according to the bitmap, although sometimes it appears to do a full rebuild, which is fine. Once, however, after a day of modifications (at least 100 GB) and weeks after setting up this RAID, the rebuild took only seconds, and days later the array turned out to be corrupted: the kernel complained about bad extents, and fsck found errors in one of the files I know had been modified that day.

This is a problem. MD only knows about two disks. You have three.

When two disks are in place and synced, the bitmaps will essentially stay clear. When you swap in the other disk, its bitmap is also clear, for the same reason. I'm sure mdadm notices the different event counts, but the clear bitmap leaves it little or nothing to do to resync, as far as it knows. Yet lots of writes have happened in the meantime, and they won't get copied to the freshly inserted drive.

MD will read from both disks in parallel when there are parallel workloads, so one workload would get current data and the other would get stale data. If you perform a "check" pass after swapping and resyncing, I bet it finds many mismatches.

It definitely can't work as described. I'm not sure, but this might work if you could temporarily set it up as a triple mirror, so each disk has a unique slot/role. It would also work if you didn't use a bitmap, as a re-inserted drive would simply be overwritten completely.

> So the question is: am I right to use md-raid for this kind of thing? rsync is too CPU-heavy for what I need, and I need to stay compatible with Windows, hence metadata 1.0.

How do you stay compatible with Windows? If you let Windows write to any of these disks, you've corrupted that disk with respect to its peers. Danger, Will Robinson!

HTH,

Phil
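For the record, a "check" pass can be triggered and inspected through sysfs roughly like this (a sketch; md0 stands for whichever array is being tested, and the pass must finish before mismatch_cnt is meaningful):

  echo check > /sys/block/md0/md/sync_action   # start a read-and-compare pass
  cat /proc/mdstat                             # watch progress
  cat /sys/block/md0/md/mismatch_cnt           # non-zero means the mirrors differ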
* Re: Rotating RAID 1
  2011-08-15 20:19 ` Phil Turmel  @ 2011-08-15 20:23 ` Jérôme Poulin  0 siblings, 0 replies; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-15 20:23 UTC
To: Phil Turmel; +Cc: linux-raid

On Mon, Aug 15, 2011 at 4:19 PM, Phil Turmel <philip@turmel.org> wrote:
> It would also work if you didn't use a bitmap, as a re-inserted drive would simply be overwritten completely.

After reading this, I'd rather wipe the bitmap than have a horror story trying to restore that backup. I'll give it a try.

> How do you stay compatible with Windows? If you let Windows write to any of these disks, you've corrupted that disk with respect to its peers. Danger, Will Robinson!

I am using an ext4 driver in read-only mode.
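Dropping the write-intent bitmap is a one-liner per array, e.g. (a sketch; array and partition names follow the original setup):

  mdadm --grow /dev/md0 --bitmap=none
  mdadm --grow /dev/md1 --bitmap=none
  # without a bitmap, re-adding a swapped disk forces a full rebuild:
  mdadm /dev/md0 --add /dev/sdb1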
* Re: Rotating RAID 1
  2011-08-15 19:56 Rotating RAID 1 Jérôme Poulin  2011-08-15 20:19 ` Phil Turmel  @ 2011-08-15 20:21 ` Pavel Hofman  2011-08-15 20:25 ` Jérôme Poulin  1 sibling, 1 reply; 20+ messages in thread
From: Pavel Hofman @ 2011-08-15 20:21 UTC
To: Jérôme Poulin; +Cc: linux-raid

On 15.8.2011 21:56, Jérôme Poulin wrote:
> Good evening,
>
> I'm currently working on a project in which I use md-raid RAID1 with a bitmap to "clone" my data from one disk to another, and I would like to know whether this could cause corruption.
>
> [...]
>
> Here is what differs from a standard RAID1: I removed sdb and replaced it with a brand-new disk, copied the partition layout from the other disk, and added the new disk with mdadm -a on both arrays; it synced and works. Swapping the original disk back in only rebuilds according to the bitmap, although sometimes it appears to do a full rebuild, which is fine. Once, however, after a day of modifications (at least 100 GB) and weeks after setting up this RAID, the rebuild took only seconds, and days later the array turned out to be corrupted: the kernel complained about bad extents, and fsck found errors in one of the files I know had been modified that day.

Does your scenario involve two "external" drives being swapped each time? I use such a setup, but in order to get the bitmap performance benefits I have to run two mirrored RAID1s, i.e. two bitmaps, each for its corresponding external disk. This setup has been working fine for a few years now.

Best regards,

Pavel.
* Re: Rotating RAID 1
  2011-08-15 20:21 ` Pavel Hofman  @ 2011-08-15 20:25 ` Jérôme Poulin  2011-08-15 20:42 ` Pavel Hofman  0 siblings, 1 reply; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-15 20:25 UTC
To: Pavel Hofman; +Cc: linux-raid

On Mon, Aug 15, 2011 at 4:21 PM, Pavel Hofman <pavel.hofman@ivitera.com> wrote:
> Does your scenario involve two "external" drives being swapped each time?

Yes, exactly: 3 or more drives, one stays in place and the others get rotated off-site.

> I use such a setup, but in order to get the bitmap performance benefits I have to run two mirrored RAID1s, i.e. two bitmaps, each for its corresponding external disk. This setup has been working fine for a few years now.

Did you script something that stops the RAID and re-assembles it? The RAID must stay mounted in my case, as there is live data (incremental backups, so even if the last file is incomplete it is not a problem).
* Re: Rotating RAID 1
  2011-08-15 20:25 ` Jérôme Poulin  @ 2011-08-15 20:42 ` Pavel Hofman  2011-08-15 22:42 ` NeilBrown  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Hofman @ 2011-08-15 20:42 UTC
To: Jérôme Poulin; +Cc: linux-raid

On 15.8.2011 22:25, Jérôme Poulin wrote:
> On Mon, Aug 15, 2011 at 4:21 PM, Pavel Hofman <pavel.hofman@ivitera.com> wrote:
>> Does your scenario involve two "external" drives being swapped each time?
>
> Yes, exactly: 3 or more drives, one stays in place and the others get rotated off-site.
>
>> I use such a setup, but in order to get the bitmap performance benefits I have to run two mirrored RAID1s, i.e. two bitmaps, each for its corresponding external disk. This setup has been working fine for a few years now.
>
> Did you script something that stops the RAID and re-assembles it? The RAID must stay mounted in my case, as there is live data (incremental backups, so even if the last file is incomplete it is not a problem).

I am working on a wiki description of our backup solution. The priorities got re-organized recently, so it looks like I should finish it soon :-)

Yes, I have a script that automatically re-assembles the array corresponding to the added drive and starts synchronization. Another script, run periodically from cron, checks synchronization status. When the arrays are synced, it waits until the currently running backup job finishes, shuts down the backup software (backuppc), unmounts the filesystem to flush it, removes the external drives from the array (we run several external drives in RAID0), does a few basic checks on the external copy (mounting read-only, reading a directory), and puts the external drives to sleep (hdparm -Y) for storage outside the company premises.

Give me a few days, I will finish the wiki page and send you a link.

Pavel.
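In outline, the cron-driven part of such a rotation could look roughly like the sketch below. The device names, mount points, and the backuppc init script path are assumptions, and the real scripts described above do more checking than this:

  #!/bin/sh
  # wait until the external mirror has caught up
  while grep -qE 'resync|recovery' /proc/mdstat; do sleep 60; done
  # quiesce the backup software and flush the filesystem
  /etc/init.d/backuppc stop
  umount /mnt/backup
  # detach the external member so it can go off-site
  mdadm /dev/md1 --fail /dev/sdc1 --remove /dev/sdc1
  # basic sanity check of the detached copy, then spin the drive down
  mount -o ro /dev/sdc1 /mnt/verify && ls /mnt/verify && umount /mnt/verify
  hdparm -Y /dev/sdc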
* Re: Rotating RAID 1
  2011-08-15 20:42 ` Pavel Hofman  @ 2011-08-15 22:42 ` NeilBrown  2011-08-15 23:32 ` Jérôme Poulin  2011-08-16 4:36 ` maurice  0 siblings, 2 replies; 20+ messages in thread
From: NeilBrown @ 2011-08-15 22:42 UTC
To: Pavel Hofman; +Cc: Jérôme Poulin, linux-raid

On Mon, 15 Aug 2011 22:42:06 +0200 Pavel Hofman <pavel.hofman@ivitera.com> wrote:
> Yes, I have a script that automatically re-assembles the array corresponding to the added drive and starts synchronization. Another script, run periodically from cron, checks synchronization status. When the arrays are synced, it waits until the currently running backup job finishes, shuts down the backup software (backuppc), unmounts the filesystem to flush it, removes the external drives from the array (we run several external drives in raid0), does a few basic checks on the external copy (mounting read-only, reading a directory), and puts the external drives to sleep (hdparm -Y) for storage outside the company premises.
>
> Give me a few days, I will finish the wiki page and send you a link.

I'm not sure from your description whether the following describes exactly what you are doing or not, but this is how I would do it.

As you say, you need two bitmaps. So if there are 3 drives A, X, Y, where A is permanent and X and Y are rotated off-site, I would create two RAID1s like this:

mdadm -C /dev/md0 -l1 -n2 --bitmap=internal /dev/A /dev/X
mdadm -C /dev/md1 -l1 -n2 --bitmap=internal /dev/md0 /dev/Y

mkfs /dev/md1; mount /dev/md1 ...

Then you can remove either or both of X and Y, and when each is re-added it will recover just the blocks that it needs: X from the bitmap of md0, Y from the bitmap of md1.

NeilBrown
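With that layout, rotation reduces to failing/removing the relevant member and re-adding it later; roughly (a sketch, device names as in the example above):

  # take X off-site
  mdadm /dev/md0 --fail /dev/X --remove /dev/X
  # when X returns, only the blocks flagged in md0's bitmap are copied
  mdadm /dev/md0 --re-add /dev/X
  # Y is handled the same way through md1
  mdadm /dev/md1 --fail /dev/Y --remove /dev/Y
  mdadm /dev/md1 --re-add /dev/Y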
* Re: Rotating RAID 1
  2011-08-15 22:42 ` NeilBrown  @ 2011-08-15 23:32 ` Jérôme Poulin  2011-08-15 23:55 ` NeilBrown  0 siblings, 1 reply; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-15 23:32 UTC
To: NeilBrown; +Cc: Pavel Hofman, linux-raid

On Mon, Aug 15, 2011 at 6:42 PM, NeilBrown <neilb@suse.de> wrote:
> So if there are 3 drives A, X, Y, where A is permanent and X and Y are rotated off-site, I would create two RAID1s like this:
>
> mdadm -C /dev/md0 -l1 -n2 --bitmap=internal /dev/A /dev/X
> mdadm -C /dev/md1 -l1 -n2 --bitmap=internal /dev/md0 /dev/Y

That seems nice for 2 disks, but adding another one later would be a mess. Is there any way to play with the slot numbers manually to make it appear as an always-degraded RAID? I can't plug all the disks in at once because of the maximum of 2 ports.
* Re: Rotating RAID 1
  2011-08-15 23:32 ` Jérôme Poulin  @ 2011-08-15 23:55 ` NeilBrown  2011-08-16 6:34 ` Pavel Hofman ` (2 more replies)  0 siblings, 3 replies; 20+ messages in thread
From: NeilBrown @ 2011-08-15 23:55 UTC
To: Jérôme Poulin; +Cc: Pavel Hofman, linux-raid

On Mon, 15 Aug 2011 19:32:04 -0400 Jérôme Poulin <jeromepoulin@gmail.com> wrote:
> That seems nice for 2 disks, but adding another one later would be a mess. Is there any way to play with the slot numbers manually to make it appear as an always-degraded RAID? I can't plug all the disks in at once because of the maximum of 2 ports.

Yes, adding another one later would be difficult. But if you know up-front that you will want three off-site devices it is easy. You could:

mdadm -C /dev/md0 -l1 -n2 -b internal /dev/A missing
mdadm -C /dev/md1 -l1 -n2 -b internal /dev/md0 missing
mdadm -C /dev/md2 -l1 -n2 -b internal /dev/md1 missing
mdadm -C /dev/md3 -l1 -n2 -b internal /dev/md2 missing

mkfs /dev/md3 ; mount ..

So you now have 4 "missing" devices. Each time you plug in a device that hasn't been in an array before, explicitly add it to the array that you want it to be part of and let it recover. When you plug in a device that was previously plugged in, just "mdadm -I /dev/XX" and it will automatically be added and will recover based on the bitmap.

You can have as many or as few of the transient drives plugged in at any time as you like.

There is a cost here, of course. Every write potentially needs to update every bitmap, so the more bitmaps, the more overhead in updating them. So don't create more than you need.

Also, it doesn't have to be a linear stack. It could be a binary tree, though that might take a little more care to construct. Then when an adjacent pair of leaves are both off-site, their bitmap would not need updating.

NeilBrown
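Concretely, that works out to something like (a sketch; /dev/X stands for whatever name the hot-plugged disk gets):

  # first time a disk is seen: bind it to the level of the stack you choose
  mdadm /dev/md2 --add /dev/X
  # any later time: let mdadm find the right array from the superblock
  mdadm -I /dev/X --run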
* Re: Rotating RAID 1
  2011-08-15 23:55 ` NeilBrown  @ 2011-08-16 6:34 ` Pavel Hofman  2011-09-09 22:28 ` Bill Davidsen  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Hofman @ 2011-08-16 6:34 UTC
To: NeilBrown; +Cc: Jérôme Poulin, linux-raid

On 16.8.2011 01:55, NeilBrown wrote:
> Also, it doesn't have to be a linear stack. It could be a binary tree, though that might take a little more care to construct.

Since our backup server, being a critical resource, needs redundancy itself, we run two degraded RAID1s in parallel on two internal drives. The two alternating external drives plug into the corresponding bitmap-enabled RAID1.

Pavel.
* Re: Rotating RAID 1
  2011-08-16 6:34 ` Pavel Hofman  @ 2011-09-09 22:28 ` Bill Davidsen  2011-09-11 19:21 ` Pavel Hofman  0 siblings, 1 reply; 20+ messages in thread
From: Bill Davidsen @ 2011-09-09 22:28 UTC
To: Pavel Hofman; +Cc: NeilBrown, Jérôme Poulin, linux-raid

Pavel Hofman wrote:
> On 16.8.2011 01:55, NeilBrown wrote:
>> Also, it doesn't have to be a linear stack. It could be a binary tree, though that might take a little more care to construct.
>
> Since our backup server, being a critical resource, needs redundancy itself, we run two degraded RAID1s in parallel on two internal drives. The two alternating external drives plug into the corresponding bitmap-enabled RAID1.

I wonder if you could use a four-device RAID1 here: two drives permanently installed and two added one at a time to the array. That gives you internal redundancy and recent backups as well.

I'm still a bit puzzled by the idea of rsync being too much CPU overhead, but I'll pass on that. The issue I have had with RAID1 for backup is that the data isn't always in a logically useful state when you do a physical backup. Do things with scripts and hope you always run the right one.

--
Bill Davidsen <davidsen@tmr.com>
We are not out of the woods yet, but we know the direction and have taken the first step. The steps are many, but finite in number, and if we persevere we will reach our destination. -me, 2010
* Re: Rotating RAID 1
  2011-09-09 22:28 ` Bill Davidsen  @ 2011-09-11 19:21 ` Pavel Hofman  2011-09-12 14:20 ` Bill Davidsen  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Hofman @ 2011-09-11 19:21 UTC
To: Bill Davidsen; +Cc: linux-raid

On 10.9.2011 00:28, Bill Davidsen wrote:
> I wonder if you could use a four-device RAID1 here: two drives permanently installed and two added one at a time to the array. That gives you internal redundancy and recent backups as well.

I am not sure you could employ the write-intent bitmap then, and the bitmap makes the backup considerably faster.

> I'm still a bit puzzled by the idea of rsync being too much CPU overhead, but I'll pass on that. The issue I have had with RAID1 for backup is that the data isn't always in a logically useful state when you do a physical backup. Do things with scripts and hope you always run the right one.

I am afraid I do not understand exactly what you mean :-) We have a few scripts, but only one is started manually; the rest is taken care of automatically.

Pavel.
* Re: Rotating RAID 1
  2011-09-11 19:21 ` Pavel Hofman  @ 2011-09-12 14:20 ` Bill Davidsen  0 siblings, 0 replies; 20+ messages in thread
From: Bill Davidsen @ 2011-09-12 14:20 UTC
To: Pavel Hofman; +Cc: linux-raid

Pavel Hofman wrote:
> On 10.9.2011 00:28, Bill Davidsen wrote:
>> I wonder if you could use a four-device RAID1 here: two drives permanently installed and two added one at a time to the array. That gives you internal redundancy and recent backups as well.
>
> I am not sure you could employ the write-intent bitmap then, and the bitmap makes the backup considerably faster.

With --bitmap=internal you should have all the information you need for fast recovery, but I may misunderstand the internal bitmap and possibly incremental build. What I proposed was creating the array as dev1 dev2 dev3 missing; then dev3 or dev4 could be added and brought up to current independently, because they would be separate devices. A sketch of that layout follows below.

--
Bill Davidsen <davidsen@tmr.com>
We are not out of the woods yet, but we know the direction and have taken the first step. The steps are many, but finite in number, and if we persevere we will reach our destination. -me, 2010
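A minimal sketch of the four-member layout being proposed: two permanent members plus two rotating slots. Device names are assumptions, and (as Pavel notes) whether the single shared bitmap still gives fast re-adds for both rotating disks is exactly the open question:

  mdadm -C /dev/md0 -l1 -n4 -b internal /dev/sda1 /dev/sdb1 missing missing
  # attach whichever off-site disk is on hand; the other slot stays "missing"
  mdadm /dev/md0 --add /dev/sdc1
  # ...and detach it again before it goes off-site
  mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1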
* Re: Rotating RAID 1
  2011-08-15 23:55 ` NeilBrown  2011-08-16 6:34 ` Pavel Hofman  @ 2011-08-23 3:45 ` Jérôme Poulin  2011-08-23 3:58 ` NeilBrown  2011-10-25 7:34 ` linbloke  2 siblings, 1 reply; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-23 3:45 UTC
To: NeilBrown; +Cc: linux-raid

On Mon, Aug 15, 2011 at 7:55 PM, NeilBrown <neilb@suse.de> wrote:
> Yes, adding another one later would be difficult. But if you know up-front that you will want three off-site devices it is easy. You could:
>
> mdadm -C /dev/md0 -l1 -n2 -b internal /dev/A missing
> mdadm -C /dev/md1 -l1 -n2 -b internal /dev/md0 missing
> mdadm -C /dev/md2 -l1 -n2 -b internal /dev/md1 missing
> mdadm -C /dev/md3 -l1 -n2 -b internal /dev/md2 missing
>
> mkfs /dev/md3 ; mount ..
>
> So you now have 4 "missing" devices.

Alright, so I tried that on my project. Being a low-end device, it resulted in about a 30-40% performance loss with 8 MDs (planning in advance). I tried disabling all the bitmaps to see if that helps, and got only a minimal performance gain. Is there anything I should tune in this case?
* Re: Rotating RAID 1
  2011-08-23 3:45 ` Jérôme Poulin  @ 2011-08-23 3:58 ` NeilBrown  2011-08-23 4:05 ` Jérôme Poulin  0 siblings, 1 reply; 20+ messages in thread
From: NeilBrown @ 2011-08-23 3:58 UTC
To: Jérôme Poulin; +Cc: linux-raid

On Mon, 22 Aug 2011 23:45:53 -0400 Jérôme Poulin <jeromepoulin@gmail.com> wrote:
> Alright, so I tried that on my project. Being a low-end device, it resulted in about a 30-40% performance loss with 8 MDs (planning in advance). I tried disabling all the bitmaps to see if that helps, and got only a minimal performance gain. Is there anything I should tune in this case?

More concrete details would help...

So you have 8 MD RAID1s, each with one missing device, and the other device is the next RAID1 down in the stack, except for the last RAID1, where the one device is a real device. And in some unspecified test the RAID1 at the top of the stack gives 2/3 the performance of the plain device? And this is the same when all bitmaps are removed?

That certainly seems strange. Can you give details of the test and numbers etc.?

NeilBrown
* Re: Rotating RAID 1
  2011-08-23 3:58 ` NeilBrown  @ 2011-08-23 4:05 ` Jérôme Poulin  2011-08-24 2:28 ` Jérôme Poulin  0 siblings, 1 reply; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-23 4:05 UTC
To: NeilBrown; +Cc: linux-raid

On Mon, Aug 22, 2011 at 11:58 PM, NeilBrown <neilb@suse.de> wrote:
> More concrete details would help...

Sorry, you're right; I thought it might be something quick. I have details from the first test I made, with 15 RAIDs.

> So you have 8 MD RAID1s, each with one missing device, and the other device is the next RAID1 down in the stack, except for the last RAID1, where the one device is a real device.

Exactly, only 1 real device at the moment.

> And in some unspecified test the RAID1 at the top of the stack gives 2/3 the performance of the plain device? And this is the same when all bitmaps are removed?
>
> That certainly seems strange. Can you give details of the test and numbers etc.?

The test is a backup (Veeam, exactly) over Samba 3.6.0 with the brand-new SMB2 protocol; bitmaps are removed. The backup took 45 minutes instead of 14 to 22 minutes. Here is a sample of iostat showing the average request size (avgrq-sz) increasing with each RAID device:

Device:  rrqm/s  wrqm/s   r/s    w/s   rkB/s    wkB/s  avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sdb        0.00   35.67  0.00  27.00    0.00  5579.00    413.26     2.01  74.69    0.00   74.69  34.32  92.67
md64       0.00    0.00  0.00  61.33    0.00  5577.00    181.86     0.00   0.00    0.00    0.00   0.00   0.00
md65       0.00    0.00  0.00  60.00    0.00  5574.67    185.82     0.00   0.00    0.00    0.00   0.00   0.00
md66       0.00    0.00  0.00  58.67    0.00  5572.33    189.97     0.00   0.00    0.00    0.00   0.00   0.00
md67       0.00    0.00  0.00  58.67    0.00  5572.33    189.97     0.00   0.00    0.00    0.00   0.00   0.00
md68       0.00    0.00  0.00  58.67    0.00  5572.33    189.97     0.00   0.00    0.00    0.00   0.00   0.00
md69       0.00    0.00  0.00  58.67    0.00  5572.33    189.97     0.00   0.00    0.00    0.00   0.00   0.00
md70       0.00    0.00  0.00  58.33    0.00  5572.00    191.04     0.00   0.00    0.00    0.00   0.00   0.00
md71       0.00    0.00  0.00  57.00    0.00  5569.67    195.43     0.00   0.00    0.00    0.00   0.00   0.00
md72       0.00    0.00  0.00  55.67    0.00  5567.33    200.02     0.00   0.00    0.00    0.00   0.00   0.00
md73       0.00    0.00  0.00  54.33    0.00  5565.00    204.85     0.00   0.00    0.00    0.00   0.00   0.00
md74       0.00    0.00  0.00  53.00    0.00  5562.67    209.91     0.00   0.00    0.00    0.00   0.00   0.00
md75       0.00    0.00  0.00  51.67    0.00  5560.33    215.24     0.00   0.00    0.00    0.00   0.00   0.00
md76       0.00    0.00  0.00  50.33    0.00  5558.00    220.85     0.00   0.00    0.00    0.00   0.00   0.00
md77       0.00    0.00  0.00  49.00    0.00  5555.67    226.76     0.00   0.00    0.00    0.00   0.00   0.00
md78       0.00    0.00  0.00  47.67    0.00  5553.33    233.01     0.00   0.00    0.00    0.00   0.00   0.00
* Re: Rotating RAID 1
  2011-08-23 4:05 ` Jérôme Poulin  @ 2011-08-24 2:28 ` Jérôme Poulin  0 siblings, 0 replies; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-24 2:28 UTC
To: NeilBrown; +Cc: linux-raid

On Tue, Aug 23, 2011 at 12:05 AM, Jérôme Poulin <jeromepoulin@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 11:58 PM, NeilBrown <neilb@suse.de> wrote:
>> So you have 8 MD RAID1s, each with one missing device, and the other device is the next RAID1 down in the stack, except for the last RAID1, where the one device is a real device.

More tests revealed nothing very consistent... however, there is a consistent performance degradation of our backups when using multiple RAID devices; the backup runs every 2 hours and it is noticeably slower. Here are the bonnie++ results, which only show degradation of the per-character figures, even though I know those are not really significant. Rewrite kept going down until it went back up for no reason: really weird, unexplainable results.

The first line is from the raw device (sdb2), the next from one md device, then from incrementally more md devices stacked in series.

                 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
GANAS0202  300M  5547  93  76052 54  26862 34  5948  99  80050 49  175.6   2
GANAS0202  300M  5455  92  72428 52  26787 35  5847  97  75833 49  166.3   2
GANAS0202  300M  5401  91  71860 52  27100 35  5820  97  79219 53  156.2   2
GANAS0202  300M  5315  90  71488 51  22472 30  5673  94  73707 51  162.5   2
GANAS0202  300M  5159  87  67984 50  22860 31  5642  94  78829 54  138.6   2
GANAS0202  300M  5033  85  67091 48  22189 30  5458  91  76586 55  149.3   2
GANAS0202  300M  4904  83  65626 47  24602 34  5425  91  72349 52  112.9   2
GANAS0202  300M  4854  82  66664 48  24937 35  5120  85  75008 56  149.1   2
GANAS0202  300M  4732  80  66429 48  25646 37  5296  88  75137 57  145.7   2
GANAS0202  300M  4246  71  69589 51  25112 36  5031  84  78260 61  136.2   2
GANAS0202  300M  4253  72  70190 52  27121 40  5194  87  77648 61  107.5   2
GANAS0202  300M  4112  69  76360 55  23852 35  4827  81  74005 59  118.9   2
GANAS0202  300M  3987  67  62689 47  22475 33  4971  83  74315 61   97.6   2
GANAS0202  300M  3912  66  69769 51  22221 33  4979  83  74631 62  114.9   2
GANAS0202  300M  3602  61  52773 38  25944 40  4953  83  77794 65  125.4   2
GANAS0202  300M  3580  60  58728 43  22855 35  4680  79  74244 64  155.2   3
* Re: Rotating RAID 1
  2011-08-15 23:55 ` NeilBrown  2011-08-16 6:34 ` Pavel Hofman  2011-08-23 3:45 ` Jérôme Poulin  @ 2011-10-25 7:34 ` linbloke  2011-10-25 21:47 ` NeilBrown  2 siblings, 1 reply; 20+ messages in thread
From: linbloke @ 2011-10-25 7:34 UTC
To: NeilBrown; +Cc: Jérôme Poulin, Pavel Hofman, linux-raid

On 16/08/11 9:55 AM, NeilBrown wrote:
> Yes, adding another one later would be difficult. But if you know up-front that you will want three off-site devices it is easy. You could:
>
> mdadm -C /dev/md0 -l1 -n2 -b internal /dev/A missing
> mdadm -C /dev/md1 -l1 -n2 -b internal /dev/md0 missing
> mdadm -C /dev/md2 -l1 -n2 -b internal /dev/md1 missing
> mdadm -C /dev/md3 -l1 -n2 -b internal /dev/md2 missing
>
> mkfs /dev/md3 ; mount ..
>
> So you now have 4 "missing" devices. Each time you plug in a device that hasn't been in an array before, explicitly add it to the array that you want it to be part of and let it recover. When you plug in a device that was previously plugged in, just "mdadm -I /dev/XX" and it will automatically be added and will recover based on the bitmap.

Hi Neil, Jérôme and Pavel,

I'm in the process of testing the solution described above and have been successful at those steps (I now have sync'd devices that I have failed and removed from their respective arrays - the "backups"). I can add new devices and also incrementally re-add the devices back to their respective arrays, and all my tests show this process works well. The point I'm now trying to resolve is how to create a new array from one of the off-site components, i.e. the restore-from-backup test. Below are the steps I've taken to implement and verify each step; you can skip to the bottom section "Restore from off-site device" to get to the point if you like. When the wiki is back up, I'll post this process there for others who are looking for mdadm-based offline backups. Any corrections gratefully appreciated.

Based on the example above, for a target setup of 7 off-site devices synced to a two-device RAID1, my test setup is:

RAID array   Online device   Off-site device
md100        sdc             sdd
md101        md100           sde
md102        md101           sdf
md103        md102           sdg
md104        md103           sdh
md105        md104           sdi
md106        md105           sdj

root@deb6dev:~# uname -a
Linux deb6dev 2.6.32-5-686 #1 SMP Tue Mar 8 21:36:00 UTC 2011 i686 GNU/Linux
root@deb6dev:~# mdadm -V
mdadm - v3.1.4 - 31st August 2010
root@deb6dev:~# cat /etc/debian_version
6.0.1

Create the nested arrays
---------------------
root@deb6dev:~# mdadm -C /dev/md100 -l1 -n2 -b internal -e 1.2 /dev/sdc missing
mdadm: array /dev/md100 started.
root@deb6dev:~# mdadm -C /dev/md101 -l1 -n2 -b internal -e 1.2 /dev/md100 missing
mdadm: array /dev/md101 started.
root@deb6dev:~# mdadm -C /dev/md102 -l1 -n2 -b internal -e 1.2 /dev/md101 missing
mdadm: array /dev/md102 started.
root@deb6dev:~# mdadm -C /dev/md103 -l1 -n2 -b internal -e 1.2 /dev/md102 missing
mdadm: array /dev/md103 started.
root@deb6dev:~# mdadm -C /dev/md104 -l1 -n2 -b internal -e 1.2 /dev/md103 missing
mdadm: array /dev/md104 started.
root@deb6dev:~# mdadm -C /dev/md105 -l1 -n2 -b internal -e 1.2 /dev/md104 missing
mdadm: array /dev/md105 started.
root@deb6dev:~# mdadm -C /dev/md106 -l1 -n2 -b internal -e 1.2 /dev/md105 missing
mdadm: array /dev/md106 started.

root@deb6dev:~# cat /proc/mdstat
Personalities : [raid1]
md106 : active (auto-read-only) raid1 md105[0]
      51116 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md105 : active raid1 md104[0]
      51128 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md104 : active raid1 md103[0]
      51140 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md103 : active raid1 md102[0]
      51152 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md102 : active raid1 md101[0]
      51164 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md101 : active raid1 md100[0]
      51176 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md100 : active raid1 sdc[0]
      51188 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

Create and mount a filesystem
--------------------------
root@deb6dev:~# mkfs.ext3 /dev/md106
<<successful mkfs output snipped>>
root@deb6dev:~# mount -t ext3 /dev/md106 /mnt/backup
root@deb6dev:~# df | grep backup
/dev/md106    49490   4923  42012  11% /mnt/backup

Plug in a device that hasn't been in an array before
-------------------------------------------
root@deb6dev:~# mdadm -vv /dev/md100 --add /dev/sdd
mdadm: added /dev/sdd
root@deb6dev:~# cat /proc/mdstat
Personalities : [raid1]
md106 : active raid1 md105[0]
      51116 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md105 : active raid1 md104[0]
      51128 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md104 : active raid1 md103[0]
      51140 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md103 : active raid1 md102[0]
      51152 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md102 : active raid1 md101[0]
      51164 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md101 : active raid1 md100[0]
      51176 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md100 : active raid1 sdd[2] sdc[0]
      51188 blocks super 1.2 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

Write to the array
---------------
root@deb6dev:~# dd if=/dev/urandom of=a.blob bs=1M count=20
20+0 records in
20+0 records out
20971520 bytes (21 MB) copied, 5.05528 s, 4.1 MB/s
root@deb6dev:~# dd if=/dev/urandom of=b.blob bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 2.59361 s, 4.0 MB/s
root@deb6dev:~# dd if=/dev/urandom of=c.blob bs=1M count=5
5+0 records in
5+0 records out
5242880 bytes (5.2 MB) copied, 1.35619 s, 3.9 MB/s
root@deb6dev:~# md5sum *blob > md5sums.txt
root@deb6dev:~# ls -l
total 35844
-rw-r--r-- 1 root root 20971520 Oct 25 15:57 a.blob
-rw-r--r-- 1 root root 10485760 Oct 25 15:57 b.blob
-rw-r--r-- 1 root root  5242880 Oct 25 15:57 c.blob
-rw-r--r-- 1 root root      123 Oct 25 15:57 md5sums.txt
root@deb6dev:~# cp *blob /mnt/backup
root@deb6dev:~# ls -l /mnt/backup
total 35995
-rw-r--r-- 1 root root 20971520 Oct 25 15:58 a.blob
-rw-r--r-- 1 root root 10485760 Oct 25 15:58 b.blob
-rw-r--r-- 1 root root  5242880 Oct 25 15:58 c.blob
drwx------ 2 root root    12288 Oct 25 15:27 lost+found
root@deb6dev:~# df | grep backup
/dev/md106    49490  40906   6029  88% /mnt/backup
root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
md100 : active raid1 sdd[2] sdc[0]
      51188 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

Data written and array devices in sync (bitmap 0/1).

Fail and remove device
-------------------
root@deb6dev:~# sync
root@deb6dev:~# mdadm -vv /dev/md100 --fail /dev/sdd --remove /dev/sdd
mdadm: set /dev/sdd faulty in /dev/md100
mdadm: hot removed /dev/sdd from /dev/md100
root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
md100 : active raid1 sdc[0]
      51188 blocks super 1.2 [2/1] [U_]
      bitmap: 0/1 pages [0KB], 65536KB chunk

Device may now be unplugged.

Write to the array again
--------------------
root@deb6dev:~# rm /mnt/backup/b.blob
root@deb6dev:~# ls -l /mnt/backup
total 25714
-rw-r--r-- 1 root root 20971520 Oct 25 15:58 a.blob
-rw-r--r-- 1 root root  5242880 Oct 25 15:58 c.blob
drwx------ 2 root root    12288 Oct 25 15:27 lost+found
root@deb6dev:~# df | grep backup
/dev/md106    49490  30625  16310  66% /mnt/backup
root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
md100 : active raid1 sdc[0]
      51188 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

bitmap 1/1 shows the array is not in sync (we know this is due to the writes pending for the device we previously failed).

Plug in a device that was previously plugged in
----------------------------------------
root@deb6dev:~# mdadm -vv -I /dev/sdd --run
mdadm: UUID differs from /dev/md/0.
mdadm: UUID differs from /dev/md/1.
mdadm: /dev/sdd attached to /dev/md100 which is already active.
root@deb6dev:~# sync
root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
md100 : active raid1 sdd[2] sdc[0]
      51188 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

Device reconnected [UU] and in sync (bitmap 0/1).

Restore from off-site device
------------------------
Remove the device from the array:
root@deb6dev:~# mdadm -vv /dev/md100 --fail /dev/sdd --remove /dev/sdd
mdadm: set /dev/sdd faulty in /dev/md100
mdadm: hot removed /dev/sdd from /dev/md100
root@deb6dev:~# mdadm -Ev /dev/sdd
/dev/sdd:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 4c957fac:d7dbc792:b642daf0:d22e313e
           Name : deb6dev:100  (local to host deb6dev)
  Creation Time : Tue Oct 25 15:22:19 2011
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 102376 (50.00 MiB 52.42 MB)
     Array Size : 102376 (50.00 MiB 52.42 MB)
    Data Offset : 24 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 381f453f:5a97f1f6:bb5098bb:8c071a95

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Oct 25 17:27:53 2011
       Checksum : acbcee5f - correct
         Events : 250

   Device Role : Active device 1
   Array State : AA ('A' == active, '.' == missing)

Assemble a new array from the off-site component:
root@deb6dev:~# mdadm -vv -A /dev/md200 --run /dev/sdd
mdadm: looking for devices for /dev/md200
mdadm: /dev/sdd is identified as a member of /dev/md200, slot 1.
mdadm: no uptodate device for slot 0 of /dev/md200
mdadm: added /dev/sdd to /dev/md200 as 1
mdadm: /dev/md200 has been started with 1 drive (out of 2).

Check the file system on the new array:
root@deb6dev:~# fsck.ext3 -f -n /dev/md200
e2fsck 1.41.12 (17-May-2010)
fsck.ext3: Superblock invalid, trying backup blocks...
fsck.ext3: Bad magic number in super-block while trying to open /dev/md200

The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 <device>

How do I use these devices in a new array?

Kind regards and thanks for your help,

Josh
* Re: Rotating RAID 1
  2011-10-25 7:34 ` linbloke  @ 2011-10-25 21:47 ` NeilBrown  0 siblings, 0 replies; 20+ messages in thread
From: NeilBrown @ 2011-10-25 21:47 UTC
To: linbloke; +Cc: Jérôme Poulin, Pavel Hofman, linux-raid

On Tue, 25 Oct 2011 18:34:57 +1100 linbloke <linbloke@fastmail.fm> wrote:
> [...]
>
> Restore from off-site device
> ------------------------
> Assemble a new array from the off-site component:
> root@deb6dev:~# mdadm -vv -A /dev/md200 --run /dev/sdd
> mdadm: looking for devices for /dev/md200
> mdadm: /dev/sdd is identified as a member of /dev/md200, slot 1.
> mdadm: no uptodate device for slot 0 of /dev/md200
> mdadm: added /dev/sdd to /dev/md200 as 1
> mdadm: /dev/md200 has been started with 1 drive (out of 2).
>
> Check the file system on the new array:
> root@deb6dev:~# fsck.ext3 -f -n /dev/md200
> e2fsck 1.41.12 (17-May-2010)
> fsck.ext3: Superblock invalid, trying backup blocks...
> fsck.ext3: Bad magic number in super-block while trying to open /dev/md200
>
> How do I use these devices in a new array?

You need to also assemble md201, md202, md203, md204, md205 and md206, and then fsck/mount md206. Each of these is made by assembling the single previous md20X array:

mdadm -A /dev/md201 --run /dev/md200
mdadm -A /dev/md202 --run /dev/md201
....
mdadm -A /dev/md206 --run /dev/md205

All the rest of your description looks good!

Thanks,

NeilBrown
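Put together, the restore side then looks roughly like this sketch; the md device numbers follow the test above, the loop is just shorthand for the six assemble commands, and /mnt/restore is a hypothetical mount point:

  mdadm -A /dev/md200 --run /dev/sdd
  for i in 1 2 3 4 5 6; do
      # each level is assembled from the single array below it
      mdadm -A /dev/md20$i --run /dev/md20$((i-1))
  done
  fsck.ext3 -f -n /dev/md206       # read-only check of the top-level filesystem
  mount -t ext3 /dev/md206 /mnt/restore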
* Re: Rotating RAID 1
  2011-08-15 22:42 ` NeilBrown  2011-08-15 23:32 ` Jérôme Poulin  @ 2011-08-16 4:36 ` maurice  1 sibling, 0 replies; 20+ messages in thread
From: maurice @ 2011-08-16 4:36 UTC
To: NeilBrown; +Cc: Pavel Hofman, Jérôme Poulin, linux-raid

On 8/15/2011 4:42 PM, NeilBrown wrote:
> I'm not sure from your description whether the following describes exactly what you are doing or not, but this is how I would do it. As you say, you need two bitmaps. So if there are 3 drives A, X, Y, where A is permanent and X and Y are rotated off-site, I would create two RAID1s like this:
>
> mdadm -C /dev/md0 -l1 -n2 --bitmap=internal /dev/A /dev/X
> mdadm -C /dev/md1 -l1 -n2 --bitmap=internal /dev/md0 /dev/Y
>
> mkfs /dev/md1; mount /dev/md1 ...
>
> Then you can remove either or both of X and Y, and when each is re-added it will recover just the blocks that it needs: X from the bitmap of md0, Y from the bitmap of md1.
>
> NeilBrown

How elegantly described. After so many instances of being told "You should not use RAID as a backup device like that!", it is pleasant to hear you detail the "right way" to do this. Thank you very much for that, Neil.

--
Cheers,
Maurice Hilarius
eMail: mhilarius@gmail.com