* Rotating RAID 1
  @ 2011-08-15 19:56 Jérôme Poulin  2011-08-15 20:19 ` Phil Turmel  2011-08-15 20:21 ` Pavel Hofman  0 siblings, 2 replies; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-15 19:56 UTC
To: linux-raid

Good evening,

I'm currently working on a project in which I use md-raid RAID1 with a bitmap to "clone" my data from one disk to another, and I would like to know whether this could cause corruption.

The system has 2 SATA ports, both hotplug capable. I created 2 partitions on each disk: 1 system (2 GB), 1 data (1 TB+). I created two RAID1s using:

mdadm --create /dev/md0 --raid-devices=2 --bitmap=internal --bitmap-chunk=4096 --metadata=1.0 /dev/sd[ab]1
mdadm --create /dev/md1 --raid-devices=2 --bitmap=internal --bitmap-chunk=65536 --metadata=1.0 /dev/sd[ab]2

I forced sync_parallel on the system disk to be sure it rebuilds first, formatted the system partition ext3 and the data partition ext4, and mounted both with data=writeback.

This system doesn't contain critical data, but it holds backups on the data partition. Once the data was in sync, I removed a disk and let udev fail and remove it from the array. This is Arch Linux and udev assembles the array with the incremental option; I added --run to make sure it starts even when a disk is missing. So far, everything works as expected.

Here is what differs from a standard RAID1: I removed sdb and replaced it with a brand-new disk, copied the partition layout from the other disk, and added the new disk with mdadm -a on both arrays; it synced and works. Swapping the original disk back in only rebuilds according to the bitmap, although sometimes it appears to do a full rebuild, which is fine. Once, however, after a day of modifications (at least 100 GB) and weeks after setting up this RAID, the rebuild took only seconds, and days later the array turned out to be corrupted: the kernel complained about bad extents, and fsck found errors in one of the files I know had been modified that day.

So the question is: am I right to use md-raid for this kind of thing? rsync is too CPU-heavy for what I need, and I need to stay compatible with Windows, hence metadata 1.0.
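For reference, the swap procedure described above boils down to something like the following sketch; the device names match the setup described, and the sfdisk copy of the partition table is an assumption about how the "partition layout" was cloned:

  # copy the partition table from the remaining disk to the blank replacement
  sfdisk -d /dev/sda | sfdisk /dev/sdb
  # add the new partitions to both arrays; md rebuilds them
  mdadm /dev/md0 --add /dev/sdb1
  mdadm /dev/md1 --add /dev/sdb2
  # when a previously used disk is plugged back in, udev effectively runs:
  mdadm --incremental --run /dev/sdb1
  mdadm --incremental --run /dev/sdb2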
* Re: Rotating RAID 1
  2011-08-15 19:56 Rotating RAID 1 Jérôme Poulin  @ 2011-08-15 20:19 ` Phil Turmel  2011-08-15 20:23 ` Jérôme Poulin  1 sibling, 1 reply; 20+ messages in thread
From: Phil Turmel @ 2011-08-15 20:19 UTC
To: Jérôme Poulin; +Cc: linux-raid

Hi Jérôme,

On 08/15/2011 03:56 PM, Jérôme Poulin wrote:
> Here is what differs from a standard RAID1: I removed sdb and replaced it with a brand-new disk, copied the partition layout from the other disk, and added the new disk with mdadm -a on both arrays; it synced and works. Swapping the original disk back in only rebuilds according to the bitmap, although sometimes it appears to do a full rebuild, which is fine. Once, however, after a day of modifications (at least 100 GB) and weeks after setting up this RAID, the rebuild took only seconds, and days later the array turned out to be corrupted: the kernel complained about bad extents, and fsck found errors in one of the files I know had been modified that day.

This is a problem. MD only knows about two disks. You have three.

When two disks are in place and synced, the bitmaps will essentially stay clear. When you swap in the other disk, its bitmap is also clear, for the same reason. I'm sure mdadm notices the different event counts, but the clear bitmap leaves it little or nothing to do to resync, as far as it knows. Yet lots of writes have happened in the meantime, and they won't get copied to the freshly inserted drive.

MD will read from both disks in parallel when there are parallel workloads, so one workload would get current data and the other would get stale data. If you perform a "check" pass after swapping and resyncing, I bet it finds many mismatches.

It definitely can't work as described. I'm not sure, but this might work if you could temporarily set it up as a triple mirror, so each disk has a unique slot/role. It would also work if you didn't use a bitmap, as a re-inserted drive would simply be overwritten completely.

> So the question is: am I right to use md-raid for this kind of thing? rsync is too CPU-heavy for what I need, and I need to stay compatible with Windows, hence metadata 1.0.

How do you stay compatible with Windows? If you let Windows write to any of these disks, you've corrupted that disk with respect to its peers. Danger, Will Robinson!

HTH,

Phil
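For the record, a "check" pass can be triggered and inspected through sysfs roughly like this (a sketch; md0 stands for whichever array is being tested, and the pass must finish before mismatch_cnt is meaningful):

  echo check > /sys/block/md0/md/sync_action   # start a read-and-compare pass
  cat /proc/mdstat                             # watch progress
  cat /sys/block/md0/md/mismatch_cnt           # non-zero means the mirrors differ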
* Re: Rotating RAID 1
  2011-08-15 20:19 ` Phil Turmel  @ 2011-08-15 20:23 ` Jérôme Poulin  0 siblings, 0 replies; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-15 20:23 UTC
To: Phil Turmel; +Cc: linux-raid

On Mon, Aug 15, 2011 at 4:19 PM, Phil Turmel <philip@turmel.org> wrote:
> It would also work if you didn't use a bitmap, as a re-inserted drive would simply be overwritten completely.

After reading this, I'd rather wipe the bitmap than have a horror story trying to restore that backup. I'll give it a try.

> How do you stay compatible with Windows? If you let Windows write to any of these disks, you've corrupted that disk with respect to its peers. Danger, Will Robinson!

I am using an ext4 driver in read-only mode.
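Dropping the write-intent bitmap is a one-liner per array, e.g. (a sketch; array and partition names follow the original setup):

  mdadm --grow /dev/md0 --bitmap=none
  mdadm --grow /dev/md1 --bitmap=none
  # without a bitmap, re-adding a swapped disk forces a full rebuild:
  mdadm /dev/md0 --add /dev/sdb1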
* Re: Rotating RAID 1
  2011-08-15 19:56 Rotating RAID 1 Jérôme Poulin  2011-08-15 20:19 ` Phil Turmel  @ 2011-08-15 20:21 ` Pavel Hofman  2011-08-15 20:25 ` Jérôme Poulin  1 sibling, 1 reply; 20+ messages in thread
From: Pavel Hofman @ 2011-08-15 20:21 UTC
To: Jérôme Poulin; +Cc: linux-raid

On 15.8.2011 21:56, Jérôme Poulin wrote:
> Good evening,
>
> I'm currently working on a project in which I use md-raid RAID1 with a bitmap to "clone" my data from one disk to another, and I would like to know whether this could cause corruption.
>
> [...]
>
> Here is what differs from a standard RAID1: I removed sdb and replaced it with a brand-new disk, copied the partition layout from the other disk, and added the new disk with mdadm -a on both arrays; it synced and works. Swapping the original disk back in only rebuilds according to the bitmap, although sometimes it appears to do a full rebuild, which is fine. Once, however, after a day of modifications (at least 100 GB) and weeks after setting up this RAID, the rebuild took only seconds, and days later the array turned out to be corrupted: the kernel complained about bad extents, and fsck found errors in one of the files I know had been modified that day.

Does your scenario involve two "external" drives being swapped each time? I use such a setup, but in order to get the bitmap performance benefits I have to run two mirrored RAID1s, i.e. two bitmaps, each for its corresponding external disk. This setup has been working fine for a few years now.

Best regards,

Pavel.
* Re: Rotating RAID 1
  2011-08-15 20:21 ` Pavel Hofman  @ 2011-08-15 20:25 ` Jérôme Poulin  2011-08-15 20:42 ` Pavel Hofman  0 siblings, 1 reply; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-15 20:25 UTC
To: Pavel Hofman; +Cc: linux-raid

On Mon, Aug 15, 2011 at 4:21 PM, Pavel Hofman <pavel.hofman@ivitera.com> wrote:
> Does your scenario involve two "external" drives being swapped each time?

Yes, exactly: 3 or more drives, one stays in place and the others get rotated off-site.

> I use such a setup, but in order to get the bitmap performance benefits I have to run two mirrored RAID1s, i.e. two bitmaps, each for its corresponding external disk. This setup has been working fine for a few years now.

Did you script something that stops the RAID and re-assembles it? The RAID must stay mounted in my case, as there is live data (incremental backups, so even if the last file is incomplete it is not a problem).
* Re: Rotating RAID 1
  2011-08-15 20:25 ` Jérôme Poulin  @ 2011-08-15 20:42 ` Pavel Hofman  2011-08-15 22:42 ` NeilBrown  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Hofman @ 2011-08-15 20:42 UTC
To: Jérôme Poulin; +Cc: linux-raid

On 15.8.2011 22:25, Jérôme Poulin wrote:
> On Mon, Aug 15, 2011 at 4:21 PM, Pavel Hofman <pavel.hofman@ivitera.com> wrote:
>> Does your scenario involve two "external" drives being swapped each time?
>
> Yes, exactly: 3 or more drives, one stays in place and the others get rotated off-site.
>
>> I use such a setup, but in order to get the bitmap performance benefits I have to run two mirrored RAID1s, i.e. two bitmaps, each for its corresponding external disk. This setup has been working fine for a few years now.
>
> Did you script something that stops the RAID and re-assembles it? The RAID must stay mounted in my case, as there is live data (incremental backups, so even if the last file is incomplete it is not a problem).

I am working on a wiki description of our backup solution. The priorities got re-organized recently, so it looks like I should finish it soon :-)

Yes, I have a script that automatically re-assembles the array corresponding to the added drive and starts synchronization. Another script, run periodically from cron, checks synchronization status. When the arrays are synced, it waits until the currently running backup job finishes, shuts down the backup software (backuppc), unmounts the filesystem to flush it, removes the external drives from the array (we run several external drives in RAID0), does a few basic checks on the external copy (mounting read-only, reading a directory), and puts the external drives to sleep (hdparm -Y) for storage outside the company premises.

Give me a few days, I will finish the wiki page and send you a link.

Pavel.
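In outline, the cron-driven part of such a rotation could look roughly like the sketch below. The device names, mount points, and the backuppc init script path are assumptions, and the real scripts described above do more checking than this:

  #!/bin/sh
  # wait until the external mirror has caught up
  while grep -qE 'resync|recovery' /proc/mdstat; do sleep 60; done
  # quiesce the backup software and flush the filesystem
  /etc/init.d/backuppc stop
  umount /mnt/backup
  # detach the external member so it can go off-site
  mdadm /dev/md1 --fail /dev/sdc1 --remove /dev/sdc1
  # basic sanity check of the detached copy, then spin the drive down
  mount -o ro /dev/sdc1 /mnt/verify && ls /mnt/verify && umount /mnt/verify
  hdparm -Y /dev/sdc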
* Re: Rotating RAID 1
  2011-08-15 20:42 ` Pavel Hofman  @ 2011-08-15 22:42 ` NeilBrown  2011-08-15 23:32 ` Jérôme Poulin  2011-08-16 4:36 ` maurice  0 siblings, 2 replies; 20+ messages in thread
From: NeilBrown @ 2011-08-15 22:42 UTC
To: Pavel Hofman; +Cc: Jérôme Poulin, linux-raid

On Mon, 15 Aug 2011 22:42:06 +0200 Pavel Hofman <pavel.hofman@ivitera.com> wrote:
> Yes, I have a script that automatically re-assembles the array corresponding to the added drive and starts synchronization. Another script, run periodically from cron, checks synchronization status. When the arrays are synced, it waits until the currently running backup job finishes, shuts down the backup software (backuppc), unmounts the filesystem to flush it, removes the external drives from the array (we run several external drives in raid0), does a few basic checks on the external copy (mounting read-only, reading a directory), and puts the external drives to sleep (hdparm -Y) for storage outside the company premises.
>
> Give me a few days, I will finish the wiki page and send you a link.

I'm not sure from your description whether the following describes exactly what you are doing or not, but this is how I would do it.

As you say, you need two bitmaps. So if there are 3 drives A, X, Y, where A is permanent and X and Y are rotated off-site, I would create two RAID1s like this:

mdadm -C /dev/md0 -l1 -n2 --bitmap=internal /dev/A /dev/X
mdadm -C /dev/md1 -l1 -n2 --bitmap=internal /dev/md0 /dev/Y

mkfs /dev/md1; mount /dev/md1 ...

Then you can remove either or both of X and Y, and when each is re-added it will recover just the blocks that it needs: X from the bitmap of md0, Y from the bitmap of md1.

NeilBrown
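With that layout, rotation reduces to failing/removing the relevant member and re-adding it later; roughly (a sketch, device names as in the example above):

  # take X off-site
  mdadm /dev/md0 --fail /dev/X --remove /dev/X
  # when X returns, only the blocks flagged in md0's bitmap are copied
  mdadm /dev/md0 --re-add /dev/X
  # Y is handled the same way through md1
  mdadm /dev/md1 --fail /dev/Y --remove /dev/Y
  mdadm /dev/md1 --re-add /dev/Y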
* Re: Rotating RAID 1
  2011-08-15 22:42 ` NeilBrown  @ 2011-08-15 23:32 ` Jérôme Poulin  2011-08-15 23:55 ` NeilBrown  0 siblings, 1 reply; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-15 23:32 UTC
To: NeilBrown; +Cc: Pavel Hofman, linux-raid

On Mon, Aug 15, 2011 at 6:42 PM, NeilBrown <neilb@suse.de> wrote:
> So if there are 3 drives A, X, Y, where A is permanent and X and Y are rotated off-site, I would create two RAID1s like this:
>
> mdadm -C /dev/md0 -l1 -n2 --bitmap=internal /dev/A /dev/X
> mdadm -C /dev/md1 -l1 -n2 --bitmap=internal /dev/md0 /dev/Y

That seems nice for 2 disks, but adding another one later would be a mess. Is there any way to play with the slot numbers manually to make it appear as an always-degraded RAID? I can't plug all the disks in at once because of the maximum of 2 ports.
* Re: Rotating RAID 1
  2011-08-15 23:32 ` Jérôme Poulin  @ 2011-08-15 23:55 ` NeilBrown  2011-08-16 6:34 ` Pavel Hofman ` (2 more replies)  0 siblings, 3 replies; 20+ messages in thread
From: NeilBrown @ 2011-08-15 23:55 UTC
To: Jérôme Poulin; +Cc: Pavel Hofman, linux-raid

On Mon, 15 Aug 2011 19:32:04 -0400 Jérôme Poulin <jeromepoulin@gmail.com> wrote:
> That seems nice for 2 disks, but adding another one later would be a mess. Is there any way to play with the slot numbers manually to make it appear as an always-degraded RAID? I can't plug all the disks in at once because of the maximum of 2 ports.

Yes, adding another one later would be difficult. But if you know up-front that you will want three off-site devices it is easy. You could:

mdadm -C /dev/md0 -l1 -n2 -b internal /dev/A missing
mdadm -C /dev/md1 -l1 -n2 -b internal /dev/md0 missing
mdadm -C /dev/md2 -l1 -n2 -b internal /dev/md1 missing
mdadm -C /dev/md3 -l1 -n2 -b internal /dev/md2 missing

mkfs /dev/md3 ; mount ..

So you now have 4 "missing" devices. Each time you plug in a device that hasn't been in an array before, explicitly add it to the array that you want it to be part of and let it recover. When you plug in a device that was previously plugged in, just "mdadm -I /dev/XX" and it will automatically be added and will recover based on the bitmap.

You can have as many or as few of the transient drives plugged in at any time as you like.

There is a cost here, of course. Every write potentially needs to update every bitmap, so the more bitmaps, the more overhead in updating them. So don't create more than you need.

Also, it doesn't have to be a linear stack. It could be a binary tree, though that might take a little more care to construct. Then when an adjacent pair of leaves are both off-site, their bitmap would not need updating.

NeilBrown
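Concretely, that works out to something like (a sketch; /dev/X stands for whatever name the hot-plugged disk gets):

  # first time a disk is seen: bind it to the level of the stack you choose
  mdadm /dev/md2 --add /dev/X
  # any later time: let mdadm find the right array from the superblock
  mdadm -I /dev/X --run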
* Re: Rotating RAID 1
  2011-08-15 23:55 ` NeilBrown  @ 2011-08-16 6:34 ` Pavel Hofman  2011-09-09 22:28 ` Bill Davidsen  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Hofman @ 2011-08-16 6:34 UTC
To: NeilBrown; +Cc: Jérôme Poulin, linux-raid

On 16.8.2011 01:55, NeilBrown wrote:
> Also, it doesn't have to be a linear stack. It could be a binary tree, though that might take a little more care to construct.

Since our backup server, being a critical resource, needs redundancy itself, we run two degraded RAID1s in parallel on two internal drives. The two alternating external drives plug into the corresponding bitmap-enabled RAID1.

Pavel.
* Re: Rotating RAID 1
  2011-08-16 6:34 ` Pavel Hofman  @ 2011-09-09 22:28 ` Bill Davidsen  2011-09-11 19:21 ` Pavel Hofman  0 siblings, 1 reply; 20+ messages in thread
From: Bill Davidsen @ 2011-09-09 22:28 UTC
To: Pavel Hofman; +Cc: NeilBrown, Jérôme Poulin, linux-raid

Pavel Hofman wrote:
> On 16.8.2011 01:55, NeilBrown wrote:
>> Also, it doesn't have to be a linear stack. It could be a binary tree, though that might take a little more care to construct.
>
> Since our backup server, being a critical resource, needs redundancy itself, we run two degraded RAID1s in parallel on two internal drives. The two alternating external drives plug into the corresponding bitmap-enabled RAID1.

I wonder if you could use a four-device RAID1 here: two drives permanently installed and two added one at a time to the array. That gives you internal redundancy and recent backups as well.

I'm still a bit puzzled by the idea of rsync being too much CPU overhead, but I'll pass on that. The issue I have had with RAID1 for backup is that the data isn't always in a logically useful state when you do a physical backup. Do things with scripts and hope you always run the right one.

--
Bill Davidsen <davidsen@tmr.com>
We are not out of the woods yet, but we know the direction and have taken the first step. The steps are many, but finite in number, and if we persevere we will reach our destination. -me, 2010
* Re: Rotating RAID 1
  2011-09-09 22:28 ` Bill Davidsen  @ 2011-09-11 19:21 ` Pavel Hofman  2011-09-12 14:20 ` Bill Davidsen  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Hofman @ 2011-09-11 19:21 UTC
To: Bill Davidsen; +Cc: linux-raid

On 10.9.2011 00:28, Bill Davidsen wrote:
> I wonder if you could use a four-device RAID1 here: two drives permanently installed and two added one at a time to the array. That gives you internal redundancy and recent backups as well.

I am not sure you could employ the write-intent bitmap then, and the bitmap makes the backup considerably faster.

> I'm still a bit puzzled by the idea of rsync being too much CPU overhead, but I'll pass on that. The issue I have had with RAID1 for backup is that the data isn't always in a logically useful state when you do a physical backup. Do things with scripts and hope you always run the right one.

I am afraid I do not understand exactly what you mean :-) We have a few scripts, but only one is started manually; the rest is taken care of automatically.

Pavel.
* Re: Rotating RAID 1
  2011-09-11 19:21 ` Pavel Hofman  @ 2011-09-12 14:20 ` Bill Davidsen  0 siblings, 0 replies; 20+ messages in thread
From: Bill Davidsen @ 2011-09-12 14:20 UTC
To: Pavel Hofman; +Cc: linux-raid

Pavel Hofman wrote:
> On 10.9.2011 00:28, Bill Davidsen wrote:
>> I wonder if you could use a four-device RAID1 here: two drives permanently installed and two added one at a time to the array. That gives you internal redundancy and recent backups as well.
>
> I am not sure you could employ the write-intent bitmap then, and the bitmap makes the backup considerably faster.

With --bitmap=internal you should have all the information you need for fast recovery, but I may misunderstand the internal bitmap and possibly incremental build. What I proposed was creating the array as dev1 dev2 dev3 missing; then dev3 or dev4 could be added and brought up to current independently, because they would be separate devices. A sketch of that layout follows below.

--
Bill Davidsen <davidsen@tmr.com>
We are not out of the woods yet, but we know the direction and have taken the first step. The steps are many, but finite in number, and if we persevere we will reach our destination. -me, 2010
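A minimal sketch of the four-member layout being proposed: two permanent members plus two rotating slots. Device names are assumptions, and (as Pavel notes) whether the single shared bitmap still gives fast re-adds for both rotating disks is exactly the open question:

  mdadm -C /dev/md0 -l1 -n4 -b internal /dev/sda1 /dev/sdb1 missing missing
  # attach whichever off-site disk is on hand; the other slot stays "missing"
  mdadm /dev/md0 --add /dev/sdc1
  # ...and detach it again before it goes off-site
  mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1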
* Re: Rotating RAID 1
  2011-08-15 23:55 ` NeilBrown  2011-08-16 6:34 ` Pavel Hofman  @ 2011-08-23 3:45 ` Jérôme Poulin  2011-08-23 3:58 ` NeilBrown  2011-10-25 7:34 ` linbloke  2 siblings, 1 reply; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-23 3:45 UTC
To: NeilBrown; +Cc: linux-raid

On Mon, Aug 15, 2011 at 7:55 PM, NeilBrown <neilb@suse.de> wrote:
> Yes, adding another one later would be difficult. But if you know up-front that you will want three off-site devices it is easy. You could:
>
> mdadm -C /dev/md0 -l1 -n2 -b internal /dev/A missing
> mdadm -C /dev/md1 -l1 -n2 -b internal /dev/md0 missing
> mdadm -C /dev/md2 -l1 -n2 -b internal /dev/md1 missing
> mdadm -C /dev/md3 -l1 -n2 -b internal /dev/md2 missing
>
> mkfs /dev/md3 ; mount ..
>
> So you now have 4 "missing" devices.

Alright, so I tried that on my project. Being a low-end device, it resulted in about a 30-40% performance loss with 8 MDs (planning in advance). I tried disabling all the bitmaps to see if that helps, and got only a minimal performance gain. Is there anything I should tune in this case?
* Re: Rotating RAID 1
  2011-08-23 3:45 ` Jérôme Poulin  @ 2011-08-23 3:58 ` NeilBrown  2011-08-23 4:05 ` Jérôme Poulin  0 siblings, 1 reply; 20+ messages in thread
From: NeilBrown @ 2011-08-23 3:58 UTC
To: Jérôme Poulin; +Cc: linux-raid

On Mon, 22 Aug 2011 23:45:53 -0400 Jérôme Poulin <jeromepoulin@gmail.com> wrote:
> Alright, so I tried that on my project. Being a low-end device, it resulted in about a 30-40% performance loss with 8 MDs (planning in advance). I tried disabling all the bitmaps to see if that helps, and got only a minimal performance gain. Is there anything I should tune in this case?

More concrete details would help...

So you have 8 MD RAID1s, each with one missing device, and the other device is the next RAID1 down in the stack, except for the last RAID1, where the one device is a real device. And in some unspecified test the RAID1 at the top of the stack gives 2/3 the performance of the plain device? And this is the same when all bitmaps are removed?

That certainly seems strange. Can you give details of the test and numbers etc.?

NeilBrown
* Re: Rotating RAID 1
  2011-08-23 3:58 ` NeilBrown  @ 2011-08-23 4:05 ` Jérôme Poulin  2011-08-24 2:28 ` Jérôme Poulin  0 siblings, 1 reply; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-23 4:05 UTC
To: NeilBrown; +Cc: linux-raid

On Mon, Aug 22, 2011 at 11:58 PM, NeilBrown <neilb@suse.de> wrote:
> More concrete details would help...

Sorry, you're right; I thought it might be something quick. I have details from the first test I made, with 15 RAIDs.

> So you have 8 MD RAID1s, each with one missing device, and the other device is the next RAID1 down in the stack, except for the last RAID1, where the one device is a real device.

Exactly, only 1 real device at the moment.

> And in some unspecified test the RAID1 at the top of the stack gives 2/3 the performance of the plain device? And this is the same when all bitmaps are removed?
>
> That certainly seems strange. Can you give details of the test and numbers etc.?

The test is a backup (Veeam, exactly) over Samba 3.6.0 with the brand-new SMB2 protocol; bitmaps are removed. The backup took 45 minutes instead of 14 to 22 minutes. Here is a sample of iostat showing the average request size (avgrq-sz) increasing with each RAID device:

Device:  rrqm/s  wrqm/s   r/s    w/s   rkB/s    wkB/s  avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sdb        0.00   35.67  0.00  27.00    0.00  5579.00    413.26     2.01  74.69    0.00   74.69  34.32  92.67
md64       0.00    0.00  0.00  61.33    0.00  5577.00    181.86     0.00   0.00    0.00    0.00   0.00   0.00
md65       0.00    0.00  0.00  60.00    0.00  5574.67    185.82     0.00   0.00    0.00    0.00   0.00   0.00
md66       0.00    0.00  0.00  58.67    0.00  5572.33    189.97     0.00   0.00    0.00    0.00   0.00   0.00
md67       0.00    0.00  0.00  58.67    0.00  5572.33    189.97     0.00   0.00    0.00    0.00   0.00   0.00
md68       0.00    0.00  0.00  58.67    0.00  5572.33    189.97     0.00   0.00    0.00    0.00   0.00   0.00
md69       0.00    0.00  0.00  58.67    0.00  5572.33    189.97     0.00   0.00    0.00    0.00   0.00   0.00
md70       0.00    0.00  0.00  58.33    0.00  5572.00    191.04     0.00   0.00    0.00    0.00   0.00   0.00
md71       0.00    0.00  0.00  57.00    0.00  5569.67    195.43     0.00   0.00    0.00    0.00   0.00   0.00
md72       0.00    0.00  0.00  55.67    0.00  5567.33    200.02     0.00   0.00    0.00    0.00   0.00   0.00
md73       0.00    0.00  0.00  54.33    0.00  5565.00    204.85     0.00   0.00    0.00    0.00   0.00   0.00
md74       0.00    0.00  0.00  53.00    0.00  5562.67    209.91     0.00   0.00    0.00    0.00   0.00   0.00
md75       0.00    0.00  0.00  51.67    0.00  5560.33    215.24     0.00   0.00    0.00    0.00   0.00   0.00
md76       0.00    0.00  0.00  50.33    0.00  5558.00    220.85     0.00   0.00    0.00    0.00   0.00   0.00
md77       0.00    0.00  0.00  49.00    0.00  5555.67    226.76     0.00   0.00    0.00    0.00   0.00   0.00
md78       0.00    0.00  0.00  47.67    0.00  5553.33    233.01     0.00   0.00    0.00    0.00   0.00   0.00
* Re: Rotating RAID 1
  2011-08-23 4:05 ` Jérôme Poulin  @ 2011-08-24 2:28 ` Jérôme Poulin  0 siblings, 0 replies; 20+ messages in thread
From: Jérôme Poulin @ 2011-08-24 2:28 UTC
To: NeilBrown; +Cc: linux-raid

On Tue, Aug 23, 2011 at 12:05 AM, Jérôme Poulin <jeromepoulin@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 11:58 PM, NeilBrown <neilb@suse.de> wrote:
>> So you have 8 MD RAID1s, each with one missing device, and the other device is the next RAID1 down in the stack, except for the last RAID1, where the one device is a real device.

More tests revealed nothing very consistent... however, there is a consistent performance degradation of our backups when using multiple RAID devices; the backup runs every 2 hours and it is noticeably slower. Here are the bonnie++ results, which only show degradation of the per-character figures, even though I know those are not really significant. Rewrite kept going down until it went back up for no reason: really weird, unexplainable results.

The first line is from the raw device (sdb2), the next from one md device, then from incrementally more md devices stacked in series.

                 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
GANAS0202  300M  5547  93  76052 54  26862 34  5948  99  80050 49  175.6   2
GANAS0202  300M  5455  92  72428 52  26787 35  5847  97  75833 49  166.3   2
GANAS0202  300M  5401  91  71860 52  27100 35  5820  97  79219 53  156.2   2
GANAS0202  300M  5315  90  71488 51  22472 30  5673  94  73707 51  162.5   2
GANAS0202  300M  5159  87  67984 50  22860 31  5642  94  78829 54  138.6   2
GANAS0202  300M  5033  85  67091 48  22189 30  5458  91  76586 55  149.3   2
GANAS0202  300M  4904  83  65626 47  24602 34  5425  91  72349 52  112.9   2
GANAS0202  300M  4854  82  66664 48  24937 35  5120  85  75008 56  149.1   2
GANAS0202  300M  4732  80  66429 48  25646 37  5296  88  75137 57  145.7   2
GANAS0202  300M  4246  71  69589 51  25112 36  5031  84  78260 61  136.2   2
GANAS0202  300M  4253  72  70190 52  27121 40  5194  87  77648 61  107.5   2
GANAS0202  300M  4112  69  76360 55  23852 35  4827  81  74005 59  118.9   2
GANAS0202  300M  3987  67  62689 47  22475 33  4971  83  74315 61   97.6   2
GANAS0202  300M  3912  66  69769 51  22221 33  4979  83  74631 62  114.9   2
GANAS0202  300M  3602  61  52773 38  25944 40  4953  83  77794 65  125.4   2
GANAS0202  300M  3580  60  58728 43  22855 35  4680  79  74244 64  155.2   3
* Re: Rotating RAID 1
  2011-08-15 23:55 ` NeilBrown  2011-08-16 6:34 ` Pavel Hofman  2011-08-23 3:45 ` Jérôme Poulin  @ 2011-10-25 7:34 ` linbloke  2011-10-25 21:47 ` NeilBrown  2 siblings, 1 reply; 20+ messages in thread
From: linbloke @ 2011-10-25 7:34 UTC
To: NeilBrown; +Cc: Jérôme Poulin, Pavel Hofman, linux-raid

On 16/08/11 9:55 AM, NeilBrown wrote:
> Yes, adding another one later would be difficult. But if you know up-front that you will want three off-site devices it is easy. You could:
>
> mdadm -C /dev/md0 -l1 -n2 -b internal /dev/A missing
> mdadm -C /dev/md1 -l1 -n2 -b internal /dev/md0 missing
> mdadm -C /dev/md2 -l1 -n2 -b internal /dev/md1 missing
> mdadm -C /dev/md3 -l1 -n2 -b internal /dev/md2 missing
>
> mkfs /dev/md3 ; mount ..
>
> So you now have 4 "missing" devices. Each time you plug in a device that hasn't been in an array before, explicitly add it to the array that you want it to be part of and let it recover. When you plug in a device that was previously plugged in, just "mdadm -I /dev/XX" and it will automatically be added and will recover based on the bitmap.

Hi Neil, Jérôme and Pavel,

I'm in the process of testing the solution described above and have been successful at those steps (I now have sync'd devices that I have failed and removed from their respective arrays - the "backups"). I can add new devices and also incrementally re-add the devices back to their respective arrays, and all my tests show this process works well. The point I'm now trying to resolve is how to create a new array from one of the off-site components, i.e. the restore-from-backup test. Below are the steps I've taken to implement and verify each step; you can skip to the bottom section "Restore from off-site device" to get to the point if you like. When the wiki is back up, I'll post this process there for others who are looking for mdadm-based offline backups. Any corrections gratefully appreciated.

Based on the example above, for a target setup of 7 off-site devices synced to a two-device RAID1, my test setup is:

RAID array   Online device   Off-site device
md100        sdc             sdd
md101        md100           sde
md102        md101           sdf
md103        md102           sdg
md104        md103           sdh
md105        md104           sdi
md106        md105           sdj

root@deb6dev:~# uname -a
Linux deb6dev 2.6.32-5-686 #1 SMP Tue Mar 8 21:36:00 UTC 2011 i686 GNU/Linux
root@deb6dev:~# mdadm -V
mdadm - v3.1.4 - 31st August 2010
root@deb6dev:~# cat /etc/debian_version
6.0.1

Create the nested arrays
---------------------
root@deb6dev:~# mdadm -C /dev/md100 -l1 -n2 -b internal -e 1.2 /dev/sdc missing
mdadm: array /dev/md100 started.
root@deb6dev:~# mdadm -C /dev/md101 -l1 -n2 -b internal -e 1.2 /dev/md100 missing
mdadm: array /dev/md101 started.
root@deb6dev:~# mdadm -C /dev/md102 -l1 -n2 -b internal -e 1.2 /dev/md101 missing
mdadm: array /dev/md102 started.
root@deb6dev:~# mdadm -C /dev/md103 -l1 -n2 -b internal -e 1.2 /dev/md102 missing
mdadm: array /dev/md103 started.
root@deb6dev:~# mdadm -C /dev/md104 -l1 -n2 -b internal -e 1.2 /dev/md103 missing
mdadm: array /dev/md104 started.
root@deb6dev:~# mdadm -C /dev/md105 -l1 -n2 -b internal -e 1.2 /dev/md104 missing
mdadm: array /dev/md105 started.
root@deb6dev:~# mdadm -C /dev/md106 -l1 -n2 -b internal -e 1.2 /dev/md105 missing
mdadm: array /dev/md106 started.

root@deb6dev:~# cat /proc/mdstat
Personalities : [raid1]
md106 : active (auto-read-only) raid1 md105[0]
      51116 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md105 : active raid1 md104[0]
      51128 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md104 : active raid1 md103[0]
      51140 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md103 : active raid1 md102[0]
      51152 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md102 : active raid1 md101[0]
      51164 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md101 : active raid1 md100[0]
      51176 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md100 : active raid1 sdc[0]
      51188 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

Create and mount a filesystem
--------------------------
root@deb6dev:~# mkfs.ext3 /dev/md106
<<successful mkfs output snipped>>
root@deb6dev:~# mount -t ext3 /dev/md106 /mnt/backup
root@deb6dev:~# df | grep backup
/dev/md106    49490   4923  42012  11% /mnt/backup

Plug in a device that hasn't been in an array before
-------------------------------------------
root@deb6dev:~# mdadm -vv /dev/md100 --add /dev/sdd
mdadm: added /dev/sdd
root@deb6dev:~# cat /proc/mdstat
Personalities : [raid1]
md106 : active raid1 md105[0]
      51116 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md105 : active raid1 md104[0]
      51128 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md104 : active raid1 md103[0]
      51140 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md103 : active raid1 md102[0]
      51152 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md102 : active raid1 md101[0]
      51164 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md101 : active raid1 md100[0]
      51176 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md100 : active raid1 sdd[2] sdc[0]
      51188 blocks super 1.2 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

Write to the array
---------------
root@deb6dev:~# dd if=/dev/urandom of=a.blob bs=1M count=20
20+0 records in
20+0 records out
20971520 bytes (21 MB) copied, 5.05528 s, 4.1 MB/s
root@deb6dev:~# dd if=/dev/urandom of=b.blob bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 2.59361 s, 4.0 MB/s
root@deb6dev:~# dd if=/dev/urandom of=c.blob bs=1M count=5
5+0 records in
5+0 records out
5242880 bytes (5.2 MB) copied, 1.35619 s, 3.9 MB/s
root@deb6dev:~# md5sum *blob > md5sums.txt
root@deb6dev:~# ls -l
total 35844
-rw-r--r-- 1 root root 20971520 Oct 25 15:57 a.blob
-rw-r--r-- 1 root root 10485760 Oct 25 15:57 b.blob
-rw-r--r-- 1 root root  5242880 Oct 25 15:57 c.blob
-rw-r--r-- 1 root root      123 Oct 25 15:57 md5sums.txt
root@deb6dev:~# cp *blob /mnt/backup
root@deb6dev:~# ls -l /mnt/backup
total 35995
-rw-r--r-- 1 root root 20971520 Oct 25 15:58 a.blob
-rw-r--r-- 1 root root 10485760 Oct 25 15:58 b.blob
-rw-r--r-- 1 root root  5242880 Oct 25 15:58 c.blob
drwx------ 2 root root    12288 Oct 25 15:27 lost+found
root@deb6dev:~# df | grep backup
/dev/md106    49490  40906   6029  88% /mnt/backup
root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
md100 : active raid1 sdd[2] sdc[0]
      51188 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

Data written and array devices in sync (bitmap 0/1).

Fail and remove device
-------------------
root@deb6dev:~# sync
root@deb6dev:~# mdadm -vv /dev/md100 --fail /dev/sdd --remove /dev/sdd
mdadm: set /dev/sdd faulty in /dev/md100
mdadm: hot removed /dev/sdd from /dev/md100
root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
md100 : active raid1 sdc[0]
      51188 blocks super 1.2 [2/1] [U_]
      bitmap: 0/1 pages [0KB], 65536KB chunk

Device may now be unplugged.

Write to the array again
--------------------
root@deb6dev:~# rm /mnt/backup/b.blob
root@deb6dev:~# ls -l /mnt/backup
total 25714
-rw-r--r-- 1 root root 20971520 Oct 25 15:58 a.blob
-rw-r--r-- 1 root root  5242880 Oct 25 15:58 c.blob
drwx------ 2 root root    12288 Oct 25 15:27 lost+found
root@deb6dev:~# df | grep backup
/dev/md106    49490  30625  16310  66% /mnt/backup
root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
md100 : active raid1 sdc[0]
      51188 blocks super 1.2 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

bitmap 1/1 shows the array is not in sync (we know this is due to the writes pending for the device we previously failed).

Plug in a device that was previously plugged in
----------------------------------------
root@deb6dev:~# mdadm -vv -I /dev/sdd --run
mdadm: UUID differs from /dev/md/0.
mdadm: UUID differs from /dev/md/1.
mdadm: /dev/sdd attached to /dev/md100 which is already active.
root@deb6dev:~# sync
root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
md100 : active raid1 sdd[2] sdc[0]
      51188 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

Device reconnected [UU] and in sync (bitmap 0/1).

Restore from off-site device
------------------------
Remove the device from the array:
root@deb6dev:~# mdadm -vv /dev/md100 --fail /dev/sdd --remove /dev/sdd
mdadm: set /dev/sdd faulty in /dev/md100
mdadm: hot removed /dev/sdd from /dev/md100
root@deb6dev:~# mdadm -Ev /dev/sdd
/dev/sdd:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 4c957fac:d7dbc792:b642daf0:d22e313e
           Name : deb6dev:100  (local to host deb6dev)
  Creation Time : Tue Oct 25 15:22:19 2011
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 102376 (50.00 MiB 52.42 MB)
     Array Size : 102376 (50.00 MiB 52.42 MB)
    Data Offset : 24 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 381f453f:5a97f1f6:bb5098bb:8c071a95

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Oct 25 17:27:53 2011
       Checksum : acbcee5f - correct
         Events : 250

   Device Role : Active device 1
   Array State : AA ('A' == active, '.' == missing)

Assemble a new array from the off-site component:
root@deb6dev:~# mdadm -vv -A /dev/md200 --run /dev/sdd
mdadm: looking for devices for /dev/md200
mdadm: /dev/sdd is identified as a member of /dev/md200, slot 1.
mdadm: no uptodate device for slot 0 of /dev/md200
mdadm: added /dev/sdd to /dev/md200 as 1
mdadm: /dev/md200 has been started with 1 drive (out of 2).

Check the file system on the new array:
root@deb6dev:~# fsck.ext3 -f -n /dev/md200
e2fsck 1.41.12 (17-May-2010)
fsck.ext3: Superblock invalid, trying backup blocks...
fsck.ext3: Bad magic number in super-block while trying to open /dev/md200

The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 <device>

How do I use these devices in a new array?

Kind regards and thanks for your help,

Josh
* Re: Rotating RAID 1
  2011-10-25 7:34 ` linbloke  @ 2011-10-25 21:47 ` NeilBrown  0 siblings, 0 replies; 20+ messages in thread
From: NeilBrown @ 2011-10-25 21:47 UTC
To: linbloke; +Cc: Jérôme Poulin, Pavel Hofman, linux-raid

On Tue, 25 Oct 2011 18:34:57 +1100 linbloke <linbloke@fastmail.fm> wrote:
> [...]
>
> Restore from off-site device
> ------------------------
> Assemble a new array from the off-site component:
> root@deb6dev:~# mdadm -vv -A /dev/md200 --run /dev/sdd
> mdadm: looking for devices for /dev/md200
> mdadm: /dev/sdd is identified as a member of /dev/md200, slot 1.
> mdadm: no uptodate device for slot 0 of /dev/md200
> mdadm: added /dev/sdd to /dev/md200 as 1
> mdadm: /dev/md200 has been started with 1 drive (out of 2).
>
> Check the file system on the new array:
> root@deb6dev:~# fsck.ext3 -f -n /dev/md200
> e2fsck 1.41.12 (17-May-2010)
> fsck.ext3: Superblock invalid, trying backup blocks...
> fsck.ext3: Bad magic number in super-block while trying to open /dev/md200
>
> How do I use these devices in a new array?

You need to also assemble md201, md202, md203, md204, md205 and md206, and then fsck/mount md206. Each of these is made by assembling the single previous md20X array:

mdadm -A /dev/md201 --run /dev/md200
mdadm -A /dev/md202 --run /dev/md201
....
mdadm -A /dev/md206 --run /dev/md205

All the rest of your description looks good!

Thanks,

NeilBrown
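Put together, the restore side then looks roughly like this sketch; the md device numbers follow the test above, the loop is just shorthand for the six assemble commands, and /mnt/restore is a hypothetical mount point:

  mdadm -A /dev/md200 --run /dev/sdd
  for i in 1 2 3 4 5 6; do
      # each level is assembled from the single array below it
      mdadm -A /dev/md20$i --run /dev/md20$((i-1))
  done
  fsck.ext3 -f -n /dev/md206       # read-only check of the top-level filesystem
  mount -t ext3 /dev/md206 /mnt/restore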
* Re: Rotating RAID 1
  2011-08-15 22:42 ` NeilBrown  2011-08-15 23:32 ` Jérôme Poulin  @ 2011-08-16 4:36 ` maurice  1 sibling, 0 replies; 20+ messages in thread
From: maurice @ 2011-08-16 4:36 UTC
To: NeilBrown; +Cc: Pavel Hofman, Jérôme Poulin, linux-raid

On 8/15/2011 4:42 PM, NeilBrown wrote:
> I'm not sure from your description whether the following describes exactly what you are doing or not, but this is how I would do it. As you say, you need two bitmaps. So if there are 3 drives A, X, Y, where A is permanent and X and Y are rotated off-site, I would create two RAID1s like this:
>
> mdadm -C /dev/md0 -l1 -n2 --bitmap=internal /dev/A /dev/X
> mdadm -C /dev/md1 -l1 -n2 --bitmap=internal /dev/md0 /dev/Y
>
> mkfs /dev/md1; mount /dev/md1 ...
>
> Then you can remove either or both of X and Y, and when each is re-added it will recover just the blocks that it needs: X from the bitmap of md0, Y from the bitmap of md1.
>
> NeilBrown

How elegantly described. After so many instances of being told "You should not use RAID as a backup device like that!", it is pleasant to hear you detail the "right way" to do this. Thank you very much for that, Neil.

--
Cheers,
Maurice Hilarius
eMail: mhilarius@gmail.com