* Rotating RAID 1
@ 2011-08-15 19:56 Jérôme Poulin
2011-08-15 20:19 ` Phil Turmel
2011-08-15 20:21 ` Pavel Hofman
0 siblings, 2 replies; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-15 19:56 UTC (permalink / raw)
To: linux-raid
Good evening,
I'm currently working on a project in which I use md-raid RAID1 with
Bitmap to "clone" my data from one disk to another and I would like to
know if this could cause corruption:
The system has 2 SATA ports which are hotplug capable.
I created 2 partitions, 1 system (2GB), 1 data (1TB+).
I created two RAID1s using:
mdadm --create /dev/md0 --raid-devices=2 --bitmap=internal
--bitmap-chunk=4096 --metadata=1.0 /dev/sd[ab]1
mdadm --create /dev/md1 --raid-devices=2 --bitmap=internal
--bitmap-chunk=65536 --metadata=1.0 /dev/sd[ab]2
Forced sync_parallel on the system array to be sure it rebuilds first.
Formatted system ext3 and data ext4.
Both mounted using data=writeback.
This system doesn't contain critical data, but it does hold backups on
the data partition. Once the data was in sync, I removed a disk and let
udev fail and remove it from the array. This is Arch Linux, and udev is
set to assemble the array using mdadm's incremental option; I added
--run to make sure the array starts even when a disk is missing. As of
now, everything works as expected.
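(For reference, the udev rule effectively runs something like

mdadm --incremental --run /dev/sdb1

for each member partition as it appears; the exact rule shipped with
Arch may differ.)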
Here is what differs from a standard RAID1: I removed sdb and replaced
it with a brand-new disk, copied the partition layout from the other
disk, and added the new disk with mdadm -a on both arrays; it synced
and works. Swapping the other disk back in normally rebuilds only
according to the bitmap, though sometimes it appears to do a full
rebuild, which is fine. Once, however, after a day of modifications (at
least 100 GB) and weeks after setting up this RAID, the rebuild took
only seconds, and days later the array turned out to be corrupted: the
kernel complained about bad extents, and fsck found errors in one of
the files I know had been modified that day.
So the question is: am I right to use md-raid for this kind of setup?
rsync is too CPU-heavy for what I need, and I have to stay compatible
with Windows, hence metadata 1.0.
* Re: Rotating RAID 1
2011-08-15 19:56 Rotating RAID 1 Jérôme Poulin
@ 2011-08-15 20:19 ` Phil Turmel
2011-08-15 20:23 ` Jérôme Poulin
2011-08-15 20:21 ` Pavel Hofman
1 sibling, 1 reply; 20+ messages in thread
From: Phil Turmel @ 2011-08-15 20:19 UTC (permalink / raw)
To: Jérôme Poulin; +Cc: linux-raid
Hi Jérôme,
On 08/15/2011 03:56 PM, Jérôme Poulin wrote:
> Here is what differs from a standard RAID1: I removed sdb and replaced
> it with a brand-new disk, copied the partition layout from the other
> disk, and added the new disk with mdadm -a on both arrays; it synced
> and works. Swapping the other disk back in normally rebuilds only
> according to the bitmap, though sometimes it appears to do a full
> rebuild, which is fine. Once, however, after a day of modifications (at
> least 100 GB) and weeks after setting up this RAID, the rebuild took
> only seconds, and days later the array turned out to be corrupted: the
> kernel complained about bad extents, and fsck found errors in one of
> the files I know had been modified that day.
This is a problem. MD only knows about two disks. You have three. When two disks are in place and sync'ed, the bitmaps will essentially stay cleared.
When you swap to the other disk, its bitmap is also clear, for the same reason. I'm sure mdadm notices the different event counts, but the clear bitmap would leave mdadm little or nothing to do to resync, as far as it knows. But lots of writes have happened in the meantime, and they won't get copied to the freshly inserted drive. Mdadm will read from both disks in parallel when there are parallel workloads, so one workload would get current data and the other would get stale data.
If you perform a "check" pass after swapping and resyncing, I bet it finds many mismatches. It definitely can't work as described.
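For example, something like:

echo check > /sys/block/md0/md/sync_action
cat /sys/block/md0/md/mismatch_cnt

should show a non-zero mismatch count after one of those swaps.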
I'm not sure, but this might work if you could temporarily set it up as a triple mirror, so each disk has a unique slot/role.
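Roughly (a sketch only, using your partition names; I haven't verified the bitmap behaviour in this configuration):

mdadm --create /dev/md0 --level=1 --raid-devices=3 --bitmap=internal \
      --metadata=1.0 /dev/sda1 /dev/sdb1 missing

With three slots, each physical disk keeps its own role, and the array simply stays degraded while one of the rotating disks is off-site.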
It would also work if you didn't use a bitmap, as a re-inserted drive would simply be overwritten completely.
> So the question is: am I right to use md-raid for this kind of setup?
> rsync is too CPU-heavy for what I need, and I have to stay compatible
> with Windows, hence metadata 1.0.
How do you stay compatible with Windows? If you let Windows write to any of these disks, you've corrupted that disk with respect to its peers. Danger, Will Robinson!
HTH,
Phil
* Re: Rotating RAID 1
2011-08-15 19:56 Rotating RAID 1 Jérôme Poulin
2011-08-15 20:19 ` Phil Turmel
@ 2011-08-15 20:21 ` Pavel Hofman
2011-08-15 20:25 ` Jérôme Poulin
1 sibling, 1 reply; 20+ messages in thread
From: Pavel Hofman @ 2011-08-15 20:21 UTC (permalink / raw)
To: Jérôme Poulin; +Cc: linux-raid
On 15.8.2011 21:56, Jérôme Poulin wrote:
> Good evening,
>
> I'm currently working on a project in which I use md-raid RAID1 with
> Bitmap to "clone" my data from one disk to another and I would like to
> know if this could cause corruption:
>
> The system has 2 SATA ports which are hotplug capable.
> I created 2 partitions, 1 system (2GB), 1 data (1TB+).
> I created two RAID1s using:
> mdadm --create /dev/md0 --raid-devices=2 --bitmap=internal
> --bitmap-chunk=4096 --metadata=1.0 /dev/sd[ab]1
> mdadm --create /dev/md1 --raid-devices=2 --bitmap=internal
> --bitmap-chunk=65536 --metadata=1.0 /dev/sd[ab]2
> Forced sync_parallel on the system array to be sure it rebuilds first.
> Formatted system ext3 and data ext4.
> Both mounted using data=writeback.
>
> This system doesn't contain critical data, but it does hold backups on
> the data partition. Once the data was in sync, I removed a disk and let
> udev fail and remove it from the array. This is Arch Linux, and udev is
> set to assemble the array using mdadm's incremental option; I added
> --run to make sure the array starts even when a disk is missing. As of
> now, everything works as expected.
>
> Here is what differs from a standard RAID1: I removed sdb and replaced
> it with a brand-new disk, copied the partition layout from the other
> disk, and added the new disk with mdadm -a on both arrays; it synced
> and works. Swapping the other disk back in normally rebuilds only
> according to the bitmap, though sometimes it appears to do a full
> rebuild, which is fine. Once, however, after a day of modifications (at
> least 100 GB) and weeks after setting up this RAID, the rebuild took
> only seconds, and days later the array turned out to be corrupted: the
> kernel complained about bad extents, and fsck found errors in one of
> the files I know had been modified that day.
Does your scenario involve using two "external" drives, being swapped
each time? I am using such setup, but in order to gain the bitmap
performance effects, I have to run two mirrored RAID1s, i.e. two
bitmaps, each for its corresponding external disk. This setup has been
working OK for a few years now.
Best regards,
Pavel.
* Re: Rotating RAID 1
2011-08-15 20:19 ` Phil Turmel
@ 2011-08-15 20:23 ` Jérôme Poulin
0 siblings, 0 replies; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-15 20:23 UTC (permalink / raw)
To: Phil Turmel; +Cc: linux-raid
On Mon, Aug 15, 2011 at 4:19 PM, Phil Turmel <philip@turmel.org> wrote:
> It would also work if you didn't use a bitmap, as a re-inserted drive would simply be overwritten completely.
After reading this, I'd rather wipe the bitmap than end up with a
horror story when trying to restore that backup. I'll give it a try.
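If I read the man page correctly, that should just be something like:

mdadm --grow /dev/md0 --bitmap=none
mdadm --grow /dev/md1 --bitmap=none

so any re-inserted disk always gets a full rebuild.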
>
> How do you stay compatible with Windows? If you let Windows write to any of these disks, you've corrupted that disk with respect to its peers. Danger, Will Robinson!
I am using an ext4 driver in read-only mode (on the Windows side).
* Re: Rotating RAID 1
2011-08-15 20:21 ` Pavel Hofman
@ 2011-08-15 20:25 ` Jérôme Poulin
2011-08-15 20:42 ` Pavel Hofman
0 siblings, 1 reply; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-15 20:25 UTC (permalink / raw)
To: Pavel Hofman; +Cc: linux-raid
On Mon, Aug 15, 2011 at 4:21 PM, Pavel Hofman <pavel.hofman@ivitera.com> wrote:
> Does your scenario involve using two "external" drives, being swapped
> each time?
Yes, exactly: 3 or more drives; one stays in place and the others get
rotated off-site.
> I am using such setup, but in order to gain the bitmap
> performance effects, I have to run two mirrored RAID1s, i.e. two
> bitmaps, each for its corresponding external disk. This setup has been
> working OK for a few years now.
Did you script something that stops the RAID and re-assembles it? The
RAID must stay mounted in my case, as there is live data (incremental
backups, so even if the last file is incomplete it is not a problem).
* Re: Rotating RAID 1
2011-08-15 20:25 ` Jérôme Poulin
@ 2011-08-15 20:42 ` Pavel Hofman
2011-08-15 22:42 ` NeilBrown
0 siblings, 1 reply; 20+ messages in thread
From: Pavel Hofman @ 2011-08-15 20:42 UTC (permalink / raw)
To: Jérôme Poulin; +Cc: linux-raid
On 15.8.2011 22:25, Jérôme Poulin wrote:
> On Mon, Aug 15, 2011 at 4:21 PM, Pavel Hofman <pavel.hofman@ivitera.com> wrote:
>> Does your scenario involve using two "external" drives, being swapped
>> each time?
>
> Yes, exactly: 3 or more drives; one stays in place and the others get
> rotated off-site.
>
>> I am using such setup, but in order to gain the bitmap
>> performance effects, I have to run two mirrored RAID1s, i.e. two
>> bitmaps, each for its corresponding external disk. This setup has been
>> working OK for a few years now.
>
> Did you script something that stops the RAID and re-assembles it? The
> RAID must stay mounted in my case, as there is live data (incremental
> backups, so even if the last file is incomplete it is not a problem).
I am working on a wiki description of our backup solution. The
priorities got re-organized recently; it looks like I should finish it
soon :-)
Yes, I have a script automatically re-assembling the array corresponding
to the added drive and starting synchronization. There is another script
checking synchronization status, run periodically from cron. When the
arrays are synced, it waits until the currently running backup job
finishes, shuts down the backup software (backuppc), unmounts the
filesystem to flush, removes the external drives from the array (we run
several external drives in raid0), does a few basic checks on the
external copy (mounting read-only, reading a directory) and puts the
external drives to sleep (hdparm -Y) for storing them outside of company
premises.
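In rough outline (this is a sketch, not the actual script, and the
device names are only examples), the detach step looks like:

mdadm --wait /dev/md1                    # block until resync has finished
umount /backup                           # flush the filesystem
mdadm /dev/md1 --fail /dev/sdc2 --remove /dev/sdc2
hdparm -Y /dev/sdc2                      # spin the drive down for transport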
Give me a few days, I will finish the wiki page and send you a link.
Pavel.
* Re: Rotating RAID 1
2011-08-15 20:42 ` Pavel Hofman
@ 2011-08-15 22:42 ` NeilBrown
2011-08-15 23:32 ` Jérôme Poulin
2011-08-16 4:36 ` maurice
0 siblings, 2 replies; 20+ messages in thread
From: NeilBrown @ 2011-08-15 22:42 UTC (permalink / raw)
To: Pavel Hofman; +Cc: Jérôme Poulin, linux-raid
On Mon, 15 Aug 2011 22:42:06 +0200 Pavel Hofman <pavel.hofman@ivitera.com>
wrote:
>
> On 15.8.2011 22:25, Jérôme Poulin wrote:
> > On Mon, Aug 15, 2011 at 4:21 PM, Pavel Hofman <pavel.hofman@ivitera.com> wrote:
> >> Does your scenario involve using two "external" drives, being swapped
> >> each time?
> >
> > Yes, exactly: 3 or more drives; one stays in place and the others get
> > rotated off-site.
> >
> >> I am using such setup, but in order to gain the bitmap
> >> performance effects, I have to run two mirrored RAID1s, i.e. two
> >> bitmaps, each for its corresponding external disk. This setup has been
> >> working OK for a few years now.
> >
> > Did you script something that stops the RAID and re-assembles it? The
> > RAID must stay mounted in my case, as there is live data (incremental
> > backups, so even if the last file is incomplete it is not a problem).
>
> I am working on a wiki description of our backup solution. The
> priorities got re-organized recently; it looks like I should finish it
> soon :-)
>
> Yes, I have a script automatically re-assembling the array corresponding
> to the added drive and starting synchronization. There is another script
> checking synchronization status, run periodically from cron. When the
> arrays are synced, it waits until the currently running backup job
> finishes, shuts down the backup software (backuppc), unmounts the
> filesystem to flush, removes the external drives from the array (we run
> several external drives in raid0), does a few basic checks on the
> external copy (mounting read-only, reading a directory) and puts the
> external drives to sleep (hdparm -Y) for storing them outside of company
> premises.
>
> Give me a few days, I will finish the wiki page and send you a link.
>
I'm not sure from your description whether the following describes exactly
what you are doing or not, but this is how I would do it.
As you say, you need two bitmaps.
So if there are 3 drives A, X, Y where A is permanent and X and Y are rotated
off-site, then I create two RAID1s like this:
mdadm -C /dev/md0 -l1 -n2 --bitmap=internal /dev/A /dev/X
mdadm -C /dev/md1 -l1 -n2 --bitmap=internal /dev/md0 /dev/Y
mkfs /dev/md1; mount /dev/md1 ...
Then you can remove either or both of X and Y, and when each is re-added it
will recover just the blocks that it needs: X from the bitmap of md0, Y from
the bitmap of md1.
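The swap cycle for each transient disk is then simply (sketching with the
same device names as above):

mdadm /dev/md0 --fail /dev/X --remove /dev/X   # before taking X off-site
mdadm /dev/md0 --re-add /dev/X                 # when X returns: bitmap-based catch-up

and likewise md1 for Y.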
NeilBrown
* Re: Rotating RAID 1
2011-08-15 22:42 ` NeilBrown
@ 2011-08-15 23:32 ` Jérôme Poulin
2011-08-15 23:55 ` NeilBrown
2011-08-16 4:36 ` maurice
1 sibling, 1 reply; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-15 23:32 UTC (permalink / raw)
To: NeilBrown; +Cc: Pavel Hofman, linux-raid
On Mon, Aug 15, 2011 at 6:42 PM, NeilBrown <neilb@suse.de> wrote:
> So if there are 3 drives A, X, Y where A is permanent and X and Y are rotated
> off-site, then I create two RAID1s like this:
>
>
> mdadm -C /dev/md0 -l1 -n2 --bitmap=internal /dev/A /dev/X
> mdadm -C /dev/md1 -l1 -n2 --bitmap=internal /dev/md0 /dev/Y
That seems nice for 2 disks, but adding another one later would be a
mess. Is there any way to play with slot numbers manually to make it
appear as an always-degraded RAID? I can't plug all the disks in at
once because of the maximum of 2 ports.
* Re: Rotating RAID 1
2011-08-15 23:32 ` Jérôme Poulin
@ 2011-08-15 23:55 ` NeilBrown
2011-08-16 6:34 ` Pavel Hofman
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: NeilBrown @ 2011-08-15 23:55 UTC (permalink / raw)
To: Jérôme Poulin; +Cc: Pavel Hofman, linux-raid
On Mon, 15 Aug 2011 19:32:04 -0400 Jérôme Poulin <jeromepoulin@gmail.com>
wrote:
> On Mon, Aug 15, 2011 at 6:42 PM, NeilBrown <neilb@suse.de> wrote:
> > So if there are 3 drives A, X, Y where A is permanent and X and Y are rotated
> > off-site, then I create two RAID1s like this:
> >
> >
> > mdadm -C /dev/md0 -l1 -n2 --bitmap=internal /dev/A /dev/X
> > mdadm -C /dev/md1 -l1 -n2 --bitmap=internal /dev/md0 /dev/Y
>
> That seems nice for 2 disks, but adding another one later would be a
> mess. Is there any way to play with slot numbers manually to make it
> appear as an always-degraded RAID? I can't plug all the disks in at
> once because of the maximum of 2 ports.
Yes, adding another one later would be difficult. But if you know up-front that
you will want three off-site devices, it is easy.
You could
mdadm -C /dev/md0 -l1 -n2 -b internal /dev/A missing
mdadm -C /dev/md1 -l1 -n2 -b internal /dev/md0 missing
mdadm -C /dev/md2 -l1 -n2 -b internal /dev/md1 missing
mdadm -C /dev/md3 -l1 -n2 -b internal /dev/md2 missing
mkfs /dev/md3 ; mount ..
So you now have 4 "missing" devices. Each time you plug in a device that
hasn't been in an array before, explicitly add it to the array that you want
it to be a part of and let it recover.
When you plug in a device that was previously plugged in, just "mdadm
-I /dev/XX" and it will automatically be added and recover based on the
bitmap.
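For example (device names are illustrative):

mdadm /dev/md2 --add /dev/Z    # first time Z is used: full recovery into md2's empty slot
mdadm -I /dev/Z                # later insertions of Z: bitmap-based catch-up only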
You can have as many or as few of the transient drives plugged in at any
time as you like.
There is a cost here of course. Every write potentially needs to update
every bitmap, so the more bitmaps, the more overhead in updating them. So
don't create more than you need.
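If the bitmap updates do become a bottleneck, one option (a sketch, with an
arbitrary chunk size) is to recreate a bitmap with a coarser chunk, trading
resync granularity for fewer bitmap writes:

mdadm --grow /dev/md0 --bitmap=none
mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=131072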
Also, it doesn't have to be a linear stack. It could be a binary tree
though that might take a little more care to construct. Then when an
adjacent pair of leaves are both off-site, their bitmap would not need
updating.
NeilBrown
* Re: Rotating RAID 1
2011-08-15 22:42 ` NeilBrown
2011-08-15 23:32 ` Jérôme Poulin
@ 2011-08-16 4:36 ` maurice
1 sibling, 0 replies; 20+ messages in thread
From: maurice @ 2011-08-16 4:36 UTC (permalink / raw)
To: NeilBrown; +Cc: Pavel Hofman, Jérôme Poulin, linux-raid
On 8/15/2011 4:42 PM, NeilBrown wrote:
> ..I'm not sure from your description whether the following describes
> exactly
> what you are doing or not, but this is how I would do it.
> As you say, you need two bitmaps.
> So if there are 3 drives A, X, Y where A is permanent and X and Y are
> rotated off-site,
> then I create two RAID1s like this:
> mdadm -C /dev/md0 -l1 -n2 --bitmap=internal /dev/A /dev/X
> mdadm -C /dev/md1 -l1 -n2 --bitmap=internal /dev/md0 /dev/Y
>
> mkfs /dev/md1; mount /dev/md1 ...
>
> Then you can remove either or both of X and Y, and when each is
> re-added it will
> recover just the blocks that it needs:
> X from the bitmap of md0, Y from the bitmap of md1.
>
> NeilBrown
How elegantly described.
After so many instances of being told "You should not use RAID as a
backup device like that!"
it is pleasant to hear you detail the "right way" to do this.
Thank you very much for that Neil.
--
Cheers,
Maurice Hilarius
eMail: mhilarius@gmail.com
* Re: Rotating RAID 1
2011-08-15 23:55 ` NeilBrown
@ 2011-08-16 6:34 ` Pavel Hofman
2011-09-09 22:28 ` Bill Davidsen
2011-08-23 3:45 ` Jérôme Poulin
2011-10-25 7:34 ` linbloke
2 siblings, 1 reply; 20+ messages in thread
From: Pavel Hofman @ 2011-08-16 6:34 UTC (permalink / raw)
To: NeilBrown; +Cc: Jérôme Poulin, linux-raid
On 16.8.2011 01:55, NeilBrown wrote:
>
> Also, it doesn't have to be a linear stack. It could be a binary tree
> though that might take a little more care to construct.
Since our backup server being a critical resource needs redundancy
itself, we are running two degraded RAID1s in parallel, using two
internal drives. The two alternating external drives plug into the
corresponding bitmap-enabled RAID1.
Pavel.
* Re: Rotating RAID 1
2011-08-15 23:55 ` NeilBrown
2011-08-16 6:34 ` Pavel Hofman
@ 2011-08-23 3:45 ` Jérôme Poulin
2011-08-23 3:58 ` NeilBrown
2011-10-25 7:34 ` linbloke
2 siblings, 1 reply; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-23 3:45 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On Mon, Aug 15, 2011 at 7:55 PM, NeilBrown <neilb@suse.de> wrote:
> Yes, adding another one later would be difficult. But if you know up-front that
> you will want three off-site devices, it is easy.
>
> You could
>
> mdadm -C /dev/md0 -l1 -n2 -b internal /dev/A missing
> mdadm -C /dev/md1 -l1 -n2 -b internal /dev/md0 missing
> mdadm -C /dev/md2 -l1 -n2 -b internal /dev/md1 missing
> mdadm -C /dev/md3 -l1 -n2 -b internal /dev/md2 missing
>
> mkfs /dev/md3 ; mount ..
>
> So you now have 4 "missing" devices.
Alright, so I tried that on my project. Since this is a low-end
device, it resulted in a 30-40% performance loss with 8 MDs (planning
in advance). I tried disabling all the bitmaps to see if that helps,
but I got only a minimal performance gain. Is there anything I should
tune in this case?
* Re: Rotating RAID 1
2011-08-23 3:45 ` Jérôme Poulin
@ 2011-08-23 3:58 ` NeilBrown
2011-08-23 4:05 ` Jérôme Poulin
0 siblings, 1 reply; 20+ messages in thread
From: NeilBrown @ 2011-08-23 3:58 UTC (permalink / raw)
To: Jérôme Poulin; +Cc: linux-raid
On Mon, 22 Aug 2011 23:45:53 -0400 Jérôme Poulin <jeromepoulin@gmail.com>
wrote:
> On Mon, Aug 15, 2011 at 7:55 PM, NeilBrown <neilb@suse.de> wrote:
> > Yes, adding another one later would be difficult. But if you know up-front that
> > you will want three off-site devices, it is easy.
> >
> > You could
> >
> > mdadm -C /dev/md0 -l1 -n2 -b internal /dev/A missing
> > mdadm -C /dev/md1 -l1 -n2 -b internal /dev/md0 missing
> > mdadm -C /dev/md2 -l1 -n2 -b internal /dev/md1 missing
> > mdadm -C /dev/md3 -l1 -n2 -b internal /dev/md2 missing
> >
> > mkfs /dev/md3 ; mount ..
> >
> > So you now have 4 "missing" devices.
>
> Alright, so I tried that on my project. Since this is a low-end
> device, it resulted in a 30-40% performance loss with 8 MDs (planning
> in advance). I tried disabling all the bitmaps to see if that helps,
> but I got only a minimal performance gain. Is there anything I should
> tune in this case?
More concrete details would help...
So you have 8 MD RAID1s each with one missing device and the other device is
the next RAID1 down in the stack, except that last RAID1 where the one device
is a real device.
And in some unspecified test the RAID1 at the top of the stack gives 2/3 the
performance of the plain device? This is the same when all bitmaps are
removed.
Certainly seems strange.
Can you give details of the test and numbers etc.
NeilBrown
* Re: Rotating RAID 1
2011-08-23 3:58 ` NeilBrown
@ 2011-08-23 4:05 ` Jérôme Poulin
2011-08-24 2:28 ` Jérôme Poulin
0 siblings, 1 reply; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-23 4:05 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On Mon, Aug 22, 2011 at 11:58 PM, NeilBrown <neilb@suse.de> wrote:
> More concrete details would help...
Sorry, you're right, I thought it could have been something fast.
I have details for the first test I made with 15 RAIDs.
>
> So you have 8 MD RAID1s each with one missing device and the other device is
> the next RAID1 down in the stack, except that last RAID1 where the one device
> is a real device.
Exactly, only 1 real device at the moment.
>
> And in some unspecified test the RAID1 at the top of the stack gives 2/3 the
> performance of the plain device? This is the same when all bitmaps are
> removed.
>
> Certainly seems strange.
>
> Can you give details of the test and numbers etc.
So the test is a backup (Veeam, to be exact) over Samba 3.6.0 with the
brand-new SMB2 protocol; bitmaps are removed.
The backup took 45 minutes instead of 14 to 22 minutes.
Here is a sample of iostat output showing the average request size
(avgrq-sz) increasing from one RAID device to the next:
Device:  rrqm/s  wrqm/s    r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sdb        0.00   35.67   0.00  27.00     0.00  5579.00   413.26     2.01  74.69    0.00   74.69  34.32  92.67
md64       0.00    0.00   0.00  61.33     0.00  5577.00   181.86     0.00   0.00    0.00    0.00   0.00   0.00
md65       0.00    0.00   0.00  60.00     0.00  5574.67   185.82     0.00   0.00    0.00    0.00   0.00   0.00
md66       0.00    0.00   0.00  58.67     0.00  5572.33   189.97     0.00   0.00    0.00    0.00   0.00   0.00
md67       0.00    0.00   0.00  58.67     0.00  5572.33   189.97     0.00   0.00    0.00    0.00   0.00   0.00
md68       0.00    0.00   0.00  58.67     0.00  5572.33   189.97     0.00   0.00    0.00    0.00   0.00   0.00
md69       0.00    0.00   0.00  58.67     0.00  5572.33   189.97     0.00   0.00    0.00    0.00   0.00   0.00
md70       0.00    0.00   0.00  58.33     0.00  5572.00   191.04     0.00   0.00    0.00    0.00   0.00   0.00
md71       0.00    0.00   0.00  57.00     0.00  5569.67   195.43     0.00   0.00    0.00    0.00   0.00   0.00
md72       0.00    0.00   0.00  55.67     0.00  5567.33   200.02     0.00   0.00    0.00    0.00   0.00   0.00
md73       0.00    0.00   0.00  54.33     0.00  5565.00   204.85     0.00   0.00    0.00    0.00   0.00   0.00
md74       0.00    0.00   0.00  53.00     0.00  5562.67   209.91     0.00   0.00    0.00    0.00   0.00   0.00
md75       0.00    0.00   0.00  51.67     0.00  5560.33   215.24     0.00   0.00    0.00    0.00   0.00   0.00
md76       0.00    0.00   0.00  50.33     0.00  5558.00   220.85     0.00   0.00    0.00    0.00   0.00   0.00
md77       0.00    0.00   0.00  49.00     0.00  5555.67   226.76     0.00   0.00    0.00    0.00   0.00   0.00
md78       0.00    0.00   0.00  47.67     0.00  5553.33   233.01     0.00   0.00    0.00    0.00   0.00   0.00
* Re: Rotating RAID 1
2011-08-23 4:05 ` Jérôme Poulin
@ 2011-08-24 2:28 ` Jérôme Poulin
0 siblings, 0 replies; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-24 2:28 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On Tue, Aug 23, 2011 at 12:05 AM, Jérôme Poulin <jeromepoulin@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 11:58 PM, NeilBrown <neilb@suse.de> wrote:
>> So you have 8 MD RAID1s each with one missing device and the other device is
>> the next RAID1 down in the stack, except that last RAID1 where the one device
>> is a real device.
>
More tests revealed nothing very consistent... however, there is a
consistent performance degradation of our backups when using multiple
RAID devices; the backup runs every 2 hours and it is noticeably slower.
Here are the results of bonnie++, which mainly show degradation of the
per-character figures, even though I know those are not really
significant. Rewrite kept going down further and further until it went
back up for no reason; really weird, unexplainable results.
The first line is from the raw device (sdb2), then from one md device,
then from incrementally more md devices in series.
                    ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
GANAS0202 300M 5547 93 76052 54 26862 34 5948 99 80050 49 175.6 2
GANAS0202 300M 5455 92 72428 52 26787 35 5847 97 75833 49 166.3 2
GANAS0202 300M 5401 91 71860 52 27100 35 5820 97 79219 53 156.2 2
GANAS0202 300M 5315 90 71488 51 22472 30 5673 94 73707 51 162.5 2
GANAS0202 300M 5159 87 67984 50 22860 31 5642 94 78829 54 138.6 2
GANAS0202 300M 5033 85 67091 48 22189 30 5458 91 76586 55 149.3 2
GANAS0202 300M 4904 83 65626 47 24602 34 5425 91 72349 52 112.9 2
GANAS0202 300M 4854 82 66664 48 24937 35 5120 85 75008 56 149.1 2
GANAS0202 300M 4732 80 66429 48 25646 37 5296 88 75137 57 145.7 2
GANAS0202 300M 4246 71 69589 51 25112 36 5031 84 78260 61 136.2 2
GANAS0202 300M 4253 72 70190 52 27121 40 5194 87 77648 61 107.5 2
GANAS0202 300M 4112 69 76360 55 23852 35 4827 81 74005 59 118.9 2
GANAS0202 300M 3987 67 62689 47 22475 33 4971 83 74315 61 97.6 2
GANAS0202 300M 3912 66 69769 51 22221 33 4979 83 74631 62 114.9 2
GANAS0202 300M 3602 61 52773 38 25944 40 4953 83 77794 65 125.4 2
GANAS0202 300M 3580 60 58728 43 22855 35 4680 79 74244 64 155.2 3
* Re: Rotating RAID 1
2011-08-16 6:34 ` Pavel Hofman
@ 2011-09-09 22:28 ` Bill Davidsen
2011-09-11 19:21 ` Pavel Hofman
0 siblings, 1 reply; 20+ messages in thread
From: Bill Davidsen @ 2011-09-09 22:28 UTC (permalink / raw)
To: Pavel Hofman; +Cc: NeilBrown, Jérôme Poulin, linux-raid
Pavel Hofman wrote:
> On 16.8.2011 01:55, NeilBrown wrote:
>
>> Also, it doesn't have to be a linear stack. It could be a binary tree
>> though that might take a little more care to construct.
>>
> Since our backup server being a critical resource needs redundancy
> itself, we are running two degraded RAID1s in parallel, using two
> internal drives. The two alternating external drives plug into the
> corresponding bitmap-enabled RAID1.
>
I wonder if you could use a four device raid1 here, two drives
permanently installed and two being added one at a time to the array.
That gives you internal redundancy and recent backups as well.
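As a sketch (device names invented), that would be something like:

mdadm -C /dev/md0 -l1 -n4 -b internal /dev/sda1 /dev/sdb1 missing missing

with the two missing slots reserved for the rotating backup drives.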
I'm still a bit puzzled about the idea of rsync being too much CPU
overhead, but I'll pass on that. The issue I have had with raid1 for a
backup is that the data isn't always in a logically useful state when
you do a physical backup. You do things with scripts and hope you
always run the right one.
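At a minimum I would quiesce the filesystem around the detach, something
like (fsfreeze is from util-linux; the mount point and device names are
only examples, and this only buys crash-consistency, not
application-level consistency):

fsfreeze -f /backup
mdadm /dev/md0 --fail /dev/sdd --remove /dev/sdd
fsfreeze -u /backup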
--
Bill Davidsen<davidsen@tmr.com>
We are not out of the woods yet, but we know the direction and have
taken the first step. The steps are many, but finite in number, and if
we persevere we will reach our destination. -me, 2010
* Re: Rotating RAID 1
2011-09-09 22:28 ` Bill Davidsen
@ 2011-09-11 19:21 ` Pavel Hofman
2011-09-12 14:20 ` Bill Davidsen
0 siblings, 1 reply; 20+ messages in thread
From: Pavel Hofman @ 2011-09-11 19:21 UTC (permalink / raw)
To: Bill Davidsen; +Cc: linux-raid
On 10.9.2011 00:28, Bill Davidsen wrote:
> Pavel Hofman wrote:
> On 16.8.2011 01:55, NeilBrown wrote:
>>
>>> Also, it doesn't have to be a linear stack. It could be a binary tree
>>> though that might take a little more care to construct.
>>>
>> Since our backup server being a critical resource needs redundancy
>> itself, we are running two degraded RAID1s in parallel, using two
>> internal drives. The two alternating external drives plug into the
>> corresponding bitmap-enabled RAID1.
>>
>
> I wonder if you could use a four device raid1 here, two drives
> permanently installed and two being added one at a time to the array.
> That gives you internal redundancy and recent backups as well.
I am not sure you could employ the write-intent bitmap then. And the
bitmap makes the backup considerably faster.
>
> I'm still a bit puzzled about the idea of rsync being too much CPU
> overhead, but I'll pass on that. The issue I have had with raid1 for a
> backup is that the data isn't always in a logically useful state when
> you do a physical backup. You do things with scripts and hope you
> always run the right one.
I am afraid I do not understand exactly what you mean :-) We have a few
scripts, but only one is started manually, the rest is taken care of
automatically.
Pavel.
* Re: Rotating RAID 1
2011-09-11 19:21 ` Pavel Hofman
@ 2011-09-12 14:20 ` Bill Davidsen
0 siblings, 0 replies; 20+ messages in thread
From: Bill Davidsen @ 2011-09-12 14:20 UTC (permalink / raw)
To: Pavel Hofman; +Cc: linux-raid
Pavel Hofman wrote:
> On 10.9.2011 00:28, Bill Davidsen wrote:
>
>> Pavel Hofman wrote:
>>
>>> On 16.8.2011 01:55, NeilBrown wrote:
>>>
>>>
>>>> Also, it doesn't have to be a linear stack. It could be a binary tree
>>>> though that might take a little more care to construct.
>>>>
>>>>
>>> Since our backup server being a critical resource needs redundancy
>>> itself, we are running two degraded RAID1s in parallel, using two
>>> internal drives. The two alternating external drives plug into the
>>> corresponding bitmap-enabled RAID1.
>>>
>>>
>> I wonder if you could use a four device raid1 here, two drives
>> permanently installed and two being added one at a time to the array.
>> That gives you internal redundancy and recent backups as well.
>>
> I am not sure you could employ the write-intent bitmap then. And the
> bitmap makes the backup considerably faster.
>
With --bitmap=internal you should have all of the information you need
to do fast recovery, but I may misunderstand the internal bitmap and
possibly incremental builds. What I proposed was creating the array as
dev1 dev2 dev3 missing; then dev3 or dev4 could be added and brought up
to date independently, because they would be separate devices.
--
Bill Davidsen<davidsen@tmr.com>
We are not out of the woods yet, but we know the direction and have
taken the first step. The steps are many, but finite in number, and if
we persevere we will reach our destination. -me, 2010
* Re: Rotating RAID 1
2011-08-15 23:55 ` NeilBrown
2011-08-16 6:34 ` Pavel Hofman
2011-08-23 3:45 ` Jérôme Poulin
@ 2011-10-25 7:34 ` linbloke
2011-10-25 21:47 ` NeilBrown
2 siblings, 1 reply; 20+ messages in thread
From: linbloke @ 2011-10-25 7:34 UTC (permalink / raw)
To: NeilBrown; +Cc: Jérôme Poulin, Pavel Hofman, linux-raid
On 16/08/11 9:55 AM, NeilBrown wrote:
> On Mon, 15 Aug 2011 19:32:04 -0400 Jérôme Poulin<jeromepoulin@gmail.com>
> wrote:
>
>> On Mon, Aug 15, 2011 at 6:42 PM, NeilBrown<neilb@suse.de> wrote:
>>> So if there are 3 drives A, X, Y where A is permanent and X and Y are rotated
>>> off-site, then I create two RAID1s like this:
>>>
>>>
>>> mdadm -C /dev/md0 -l1 -n2 --bitmap=internal /dev/A /dev/X
>>> mdadm -C /dev/md1 -l1 -n2 --bitmap=internal /dev/md0 /dev/Y
>> That seems nice for 2 disks, but adding another one later would be a
>> mess. Is there any way to play with slot numbers manually to make it
>> appear as an always-degraded RAID? I can't plug all the disks in at
>> once because of the maximum of 2 ports.
> Yes, adding another one later would be difficult. But if you know up-front that
> you will want three off-site devices, it is easy.
>
> You could
>
> mdadm -C /dev/md0 -l1 -n2 -b internal /dev/A missing
> mdadm -C /dev/md1 -l1 -n2 -b internal /dev/md0 missing
> mdadm -C /dev/md2 -l1 -n2 -b internal /dev/md1 missing
> mdadm -C /dev/md3 -l1 -n2 -b internal /dev/md2 missing
>
> mkfs /dev/md3 ; mount ..
>
> So you now have 4 "missing" devices. Each time you plug in a device that
> hasn't been in an array before, explicitly add it to the array that you want
> it to be a part of and let it recover.
> When you plug in a device that was previously plugged in, just "mdadm
> -I /dev/XX" and it will automatically be added and recover based on the
> bitmap.
>
> You can have as many or as few of the transient drives plugged in at any
> time as you like.
>
> There is a cost here of course. Every write potentially needs to update
> every bitmap, so the more bitmaps, the more overhead in updating them. So
> don't create more than you need.
>
> Also, it doesn't have to be a linear stack. It could be a binary tree
> though that might take a little more care to construct. Then when an
> adjacent pair of leaves are both off-site, their bitmap would not need
> updating.
>
> NeilBrown
Hi Neil, Jérôme and Pavel,
I'm in the process of testing the solution described above and have been
successful at those steps (I now have sync'd devices that I have failed
and removed from their respective arrays - the "backups"). I can add new
devices and also incrementally re-add devices back to their respective
arrays, and all my tests show this process works well. The point I'm now
trying to resolve is how to create a new array from one of the off-site
components - i.e., the restore-from-backup test. Below are the steps
I've taken to implement and verify each stage; you can skip to the
bottom section "Restore from off-site device" to get to the point if you
like. When the wiki is back up, I'll post this process there for others
who are looking for mdadm-based offline backups. Any corrections
gratefully appreciated.
Based on the example above, for a target setup of 7 off-site devices
synced to a two-device RAID1, my test setup is:
RAID Array Online Device Off-site device
md100 sdc sdd
md101 md100 sde
md102 md101 sdf
md103 md102 sdg
md104 md103 sdh
md105 md104 sdi
md106 md105 sdj
root@deb6dev:~# uname -a
Linux deb6dev 2.6.32-5-686 #1 SMP Tue Mar 8 21:36:00 UTC 2011 i686 GNU/Linux
root@deb6dev:~# mdadm -V
mdadm - v3.1.4 - 31st August 2010
root@deb6dev:~# cat /etc/debian_version
6.0.1
Create the nested arrays
---------------------
root@deb6dev:~# mdadm -C /dev/md100 -l1 -n2 -b internal -e 1.2 /dev/sdc missing
mdadm: array /dev/md100 started.
root@deb6dev:~# mdadm -C /dev/md101 -l1 -n2 -b internal -e 1.2 /dev/md100 missing
mdadm: array /dev/md101 started.
root@deb6dev:~# mdadm -C /dev/md102 -l1 -n2 -b internal -e 1.2 /dev/md101 missing
mdadm: array /dev/md102 started.
root@deb6dev:~# mdadm -C /dev/md103 -l1 -n2 -b internal -e 1.2 /dev/md102 missing
mdadm: array /dev/md103 started.
root@deb6dev:~# mdadm -C /dev/md104 -l1 -n2 -b internal -e 1.2 /dev/md103 missing
mdadm: array /dev/md104 started.
root@deb6dev:~# mdadm -C /dev/md105 -l1 -n2 -b internal -e 1.2 /dev/md104 missing
mdadm: array /dev/md105 started.
root@deb6dev:~# mdadm -C /dev/md106 -l1 -n2 -b internal -e 1.2 /dev/md105 missing
mdadm: array /dev/md106 started.
root@deb6dev:~# cat /proc/mdstat
Personalities : [raid1]
md106 : active (auto-read-only) raid1 md105[0]
51116 blocks super 1.2 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk
md105 : active raid1 md104[0]
51128 blocks super 1.2 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk
md104 : active raid1 md103[0]
51140 blocks super 1.2 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk
md103 : active raid1 md102[0]
51152 blocks super 1.2 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk
md102 : active raid1 md101[0]
51164 blocks super 1.2 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk
md101 : active raid1 md100[0]
51176 blocks super 1.2 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk
md100 : active raid1 sdc[0]
51188 blocks super 1.2 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk
unused devices: <none>
Create and mount a filesystem
--------------------------
root@deb6dev:~# mkfs.ext3 /dev/md106
<<successful mkfs output snipped>>
root@deb6dev:~# mount -t ext3 /dev/md106 /mnt/backup
root@deb6dev:~# df | grep backup
/dev/md106 49490 4923 42012 11% /mnt/backup
Plug in a device that hasn't been in an array before
-------------------------------------------
root@deb6dev:~# mdadm -vv /dev/md100 --add /dev/sdd
mdadm: added /dev/sdd
root@deb6dev:~# cat /proc/mdstat
Personalities : [raid1]
md106 : active raid1 md105[0]
51116 blocks super 1.2 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk
md105 : active raid1 md104[0]
51128 blocks super 1.2 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk
md104 : active raid1 md103[0]
51140 blocks super 1.2 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk
md103 : active raid1 md102[0]
51152 blocks super 1.2 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk
md102 : active raid1 md101[0]
51164 blocks super 1.2 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk
md101 : active raid1 md100[0]
51176 blocks super 1.2 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk
md100 : active raid1 sdd[2] sdc[0]
51188 blocks super 1.2 [2/2] [UU]
bitmap: 1/1 pages [4KB], 65536KB chunk
Write to the array
---------------
root@deb6dev:~# dd if=/dev/urandom of=a.blob bs=1M count=20
20+0 records in
20+0 records out
20971520 bytes (21 MB) copied, 5.05528 s, 4.1 MB/s
root@deb6dev:~# dd if=/dev/urandom of=b.blob bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 2.59361 s, 4.0 MB/s
root@deb6dev:~# dd if=/dev/urandom of=c.blob bs=1M count=5
5+0 records in
5+0 records out
5242880 bytes (5.2 MB) copied, 1.35619 s, 3.9 MB/s
root@deb6dev:~# md5sum *blob > md5sums.txt
root@deb6dev:~# ls -l
total 35844
-rw-r--r-- 1 root root 20971520 Oct 25 15:57 a.blob
-rw-r--r-- 1 root root 10485760 Oct 25 15:57 b.blob
-rw-r--r-- 1 root root 5242880 Oct 25 15:57 c.blob
-rw-r--r-- 1 root root 123 Oct 25 15:57 md5sums.txt
root@deb6dev:~# cp *blob /mnt/backup
root@deb6dev:~# ls -l /mnt/backup
total 35995
-rw-r--r-- 1 root root 20971520 Oct 25 15:58 a.blob
-rw-r--r-- 1 root root 10485760 Oct 25 15:58 b.blob
-rw-r--r-- 1 root root 5242880 Oct 25 15:58 c.blob
drwx------ 2 root root 12288 Oct 25 15:27 lost+found
root@deb6dev:~# df | grep backup
/dev/md106 49490 40906 6029 88% /mnt/backup
root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
md100 : active raid1 sdd[2] sdc[0]
51188 blocks super 1.2 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
Data written and array devices in sync (bitmap 0/1)
Fail and remove device
-------------------
root@deb6dev:~# sync
root@deb6dev:~# mdadm -vv /dev/md100 --fail /dev/sdd --remove /dev/sdd
mdadm: set /dev/sdd faulty in /dev/md100
mdadm: hot removed /dev/sdd from /dev/md100
root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
md100 : active raid1 sdc[0]
51188 blocks super 1.2 [2/1] [U_]
bitmap: 0/1 pages [0KB], 65536KB chunk
Device may now be unplugged
Write to the array again
--------------------
root@deb6dev:~# rm /mnt/backup/b.blob
root@deb6dev:~# ls -l /mnt/backup
total 25714
-rw-r--r-- 1 root root 20971520 Oct 25 15:58 a.blob
-rw-r--r-- 1 root root 5242880 Oct 25 15:58 c.blob
drwx------ 2 root root 12288 Oct 25 15:27 lost+found
root@deb6dev:~# df | grep backup
/dev/md106 49490 30625 16310 66% /mnt/backup
root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
md100 : active raid1 sdc[0]
51188 blocks super 1.2 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk
bitmap 1/1 shows array is not in sync (we know it's due to the writes
pending for the device we previously failed)
Plug in a device that was previously plugged in
----------------------------------------
root@deb6dev:~# mdadm -vv -I /dev/sdd --run
mdadm: UUID differs from /dev/md/0.
mdadm: UUID differs from /dev/md/1.
mdadm: /dev/sdd attached to /dev/md100 which is already active.
root@deb6dev:~# sync
root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
md100 : active raid1 sdd[2] sdc[0]
51188 blocks super 1.2 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
Device reconnected [UU] and in sync (bitmap 0/1)
Restore from off-site device
------------------------
Remove device from array
root@deb6dev:~# mdadm -vv /dev/md100 --fail /dev/sdd --remove /dev/sdd
mdadm: set /dev/sdd faulty in /dev/md100
mdadm: hot removed /dev/sdd from /dev/md100
root@deb6dev:~# mdadm -Ev /dev/sdd
/dev/sdd:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 4c957fac:d7dbc792:b642daf0:d22e313e
Name : deb6dev:100 (local to host deb6dev)
Creation Time : Tue Oct 25 15:22:19 2011
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 102376 (50.00 MiB 52.42 MB)
Array Size : 102376 (50.00 MiB 52.42 MB)
Data Offset : 24 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 381f453f:5a97f1f6:bb5098bb:8c071a95
Internal Bitmap : 8 sectors from superblock
Update Time : Tue Oct 25 17:27:53 2011
Checksum : acbcee5f - correct
Events : 250
Device Role : Active device 1
Array State : AA ('A' == active, '.' == missing)
Assemble a new array from off-site component:
root@deb6dev:~# mdadm -vv -A /dev/md200 --run /dev/sdd
mdadm: looking for devices for /dev/md200
mdadm: /dev/sdd is identified as a member of /dev/md200, slot 1.
mdadm: no uptodate device for slot 0 of /dev/md200
mdadm: added /dev/sdd to /dev/md200 as 1
mdadm: /dev/md200 has been started with 1 drive (out of 2).
root@deb6dev:~#
Check file-system on new array
root@deb6dev:~# fsck.ext3 -f -n /dev/md200
e2fsck 1.41.12 (17-May-2010)
fsck.ext3: Superblock invalid, trying backup blocks...
fsck.ext3: Bad magic number in super-block while trying to open /dev/md200
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
How do I use these devices in a new array?
Kind regards and thanks for your help,
Josh
* Re: Rotating RAID 1
2011-10-25 7:34 ` linbloke
@ 2011-10-25 21:47 ` NeilBrown
0 siblings, 0 replies; 20+ messages in thread
From: NeilBrown @ 2011-10-25 21:47 UTC (permalink / raw)
To: linbloke; +Cc: Jérôme Poulin, Pavel Hofman, linux-raid
On Tue, 25 Oct 2011 18:34:57 +1100 linbloke <linbloke@fastmail.fm> wrote:
> [...]
> Assemble a new array from off-site component:
> root@deb6dev:~# mdadm -vv -A /dev/md200 --run /dev/sdd
> mdadm: looking for devices for /dev/md200
> mdadm: /dev/sdd is identified as a member of /dev/md200, slot 1.
> mdadm: no uptodate device for slot 0 of /dev/md200
> mdadm: added /dev/sdd to /dev/md200 as 1
> mdadm: /dev/md200 has been started with 1 drive (out of 2).
> root@deb6dev:~#
>
> Check file-system on new array
> root@deb6dev:~# fsck.ext3 -f -n /dev/md200
> e2fsck 1.41.12 (17-May-2010)
> fsck.ext3: Superblock invalid, trying backup blocks...
> fsck.ext3: Bad magic number in super-block while trying to open /dev/md200
>
> The superblock could not be read or does not describe a correct ext2
> filesystem. If the device is valid and it really contains an ext2
> filesystem (and not swap or ufs or something else), then the superblock
> is corrupt, and you might try running e2fsck with an alternate superblock:
> e2fsck -b 8193 <device>
>
>
> How do I use these devices in a new array?
>
You also need to assemble md201, md202, md203, md204, md205 and md206,
and then fsck/mount md206.
Each of these is made by assembling the single previous md20X array.
mdadm -A /dev/md201 --run /dev/md200
mdadm -A /dev/md202 --run /dev/md201
....
mdadm -A /dev/md206 --run /dev/md205
All the rest of your description looks good!
Thanks,
NeilBrown
Thread overview: 20+ messages
2011-08-15 19:56 Rotating RAID 1 Jérôme Poulin
2011-08-15 20:19 ` Phil Turmel
2011-08-15 20:23 ` Jérôme Poulin
2011-08-15 20:21 ` Pavel Hofman
2011-08-15 20:25 ` Jérôme Poulin
2011-08-15 20:42 ` Pavel Hofman
2011-08-15 22:42 ` NeilBrown
2011-08-15 23:32 ` Jérôme Poulin
2011-08-15 23:55 ` NeilBrown
2011-08-16 6:34 ` Pavel Hofman
2011-09-09 22:28 ` Bill Davidsen
2011-09-11 19:21 ` Pavel Hofman
2011-09-12 14:20 ` Bill Davidsen
2011-08-23 3:45 ` Jérôme Poulin
2011-08-23 3:58 ` NeilBrown
2011-08-23 4:05 ` Jérôme Poulin
2011-08-24 2:28 ` Jérôme Poulin
2011-10-25 7:34 ` linbloke
2011-10-25 21:47 ` NeilBrown
2011-08-16 4:36 ` maurice