From: NeilBrown <neilb@suse.de>
To: linbloke <linbloke@fastmail.fm>
Cc: "Jérôme Poulin" <jeromepoulin@gmail.com>,
"Pavel Hofman" <pavel.hofman@ivitera.com>,
linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Rotating RAID 1
Date: Wed, 26 Oct 2011 08:47:53 +1100
Message-ID: <20111026084753.1db251b2@notabene.brown>
In-Reply-To: <4EA666A1.9000904@fastmail.fm>
On Tue, 25 Oct 2011 18:34:57 +1100 linbloke <linbloke@fastmail.fm> wrote:
> On 16/08/11 9:55 AM, NeilBrown wrote:
> > On Mon, 15 Aug 2011 19:32:04 -0400 Jérôme Poulin<jeromepoulin@gmail.com>
> > wrote:
> >
> >> On Mon, Aug 15, 2011 at 6:42 PM, NeilBrown<neilb@suse.de> wrote:
> >>> So if there are 3 drives A, X, Y where A is permanent and X and Y are rotated
> >>> off-site, then I create two RAID1s like this:
> >>>
> >>>
> >>> mdadm -C /dev/md0 -l1 -n2 --bitmap=internal /dev/A /dev/X
> >>> mdadm -C /dev/md1 -l1 -n2 --bitmap=internal /dev/md0 /dev/Y
> >> That seems nice for 2 disks, but adding another one later would be a
> >> mess. Is there any way to play with slots number manually to make it
> >> appear as an always degraded RAID ? I can't plug all the disks at once
> >> because of the maximum of 2 ports.
> > Yes, adding another one later would be difficult. But if you know up-front that
> > you will want three off-site devices it is easy.
> >
> > You could
> >
> > mdadm -C /dev/md0 -l1 -n2 -b internal /dev/A missing
> > mdadm -C /dev/md1 -l1 -n2 -b internal /dev/md0 missing
> > mdadm -C /dev/md2 -l1 -n2 -b internal /dev/md1 missing
> > mdadm -C /dev/md3 -l1 -n2 -b internal /dev/md2 missing
> >
> > mkfs /dev/md3 ; mount ..
> >
> > So you now have 4 "missing" devices. Each time you plug in a device that
> > hasn't been in an array before, explicitly add it to the array that you want
> > it to be a part of and let it recover.
> > When you plug in a device that was previously plugged in, just "mdadm
> > -I /dev/XX" and it will automatically be added and recover based on the
> > bitmap.
> >
> > You can have as many or as few of the transient drives plugged in at any
> > time as you like.
> >
> > There is a cost here of course. Every write potentially needs to update
> > every bitmap, so the more bitmaps, the more overhead in updating them. So
> > don't create more than you need.
> >
> > Also, it doesn't have to be a linear stack. It could be a binary tree,
> > though that might take a little more care to construct. Then when an
> > adjacent pair of leaves are both off-site, their bitmap would not need
> > updating.
> >
> > NeilBrown
>
> Hi Neil, Jérôme and Pavel,
>
> I'm in the process of testing the solution described above and have been
> successful at those steps (I now have sync'd devices that I have failed
> and removed from their respective arrays - the "backups"). I can add new
> devices and also incrementally re-add the devices back to their
> respective arrays and all my tests show this process works well. The
> point which I'm now trying to resolve is how to create a new array from
> one of the off-site components - ie, the restore from backup test.
> Below are the steps I've taken to implement and verify the process; you
> can skip to the bottom section "Restore from off-site device" to get to
> the point if you like. When the wiki is back up, I'll post this process
> there for others who are looking for mdadm based offline backups. Any
> corrections gratefully appreciated.
>
> Based on the example above, for a target setup of 7 off-site devices
> synced to a two-device RAID1, my test setup is:
>
> RAID Array   Online Device   Off-site Device
> md100        sdc             sdd
> md101        md100           sde
> md102        md101           sdf
> md103        md102           sdg
> md104        md103           sdh
> md105        md104           sdi
> md106        md105           sdj
>
> root@deb6dev:~# uname -a
> Linux deb6dev 2.6.32-5-686 #1 SMP Tue Mar 8 21:36:00 UTC 2011 i686 GNU/Linux
> root@deb6dev:~# mdadm -V
> mdadm - v3.1.4 - 31st August 2010
> root@deb6dev:~# cat /etc/debian_version
> 6.0.1
>
> Create the nested arrays
> ---------------------
> root@deb6dev:~# mdadm -C /dev/md100 -l1 -n2 -b internal -e 1.2 /dev/sdc missing
> mdadm: array /dev/md100 started.
> root@deb6dev:~# mdadm -C /dev/md101 -l1 -n2 -b internal -e 1.2 /dev/md100 missing
> mdadm: array /dev/md101 started.
> root@deb6dev:~# mdadm -C /dev/md102 -l1 -n2 -b internal -e 1.2 /dev/md101 missing
> mdadm: array /dev/md102 started.
> root@deb6dev:~# mdadm -C /dev/md103 -l1 -n2 -b internal -e 1.2 /dev/md102 missing
> mdadm: array /dev/md103 started.
> root@deb6dev:~# mdadm -C /dev/md104 -l1 -n2 -b internal -e 1.2 /dev/md103 missing
> mdadm: array /dev/md104 started.
> root@deb6dev:~# mdadm -C /dev/md105 -l1 -n2 -b internal -e 1.2 /dev/md104 missing
> mdadm: array /dev/md105 started.
> root@deb6dev:~# mdadm -C /dev/md106 -l1 -n2 -b internal -e 1.2 /dev/md105 missing
> mdadm: array /dev/md106 started.
>
> root@deb6dev:~# cat /proc/mdstat
> Personalities : [raid1]
> md106 : active (auto-read-only) raid1 md105[0]
> 51116 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md105 : active raid1 md104[0]
> 51128 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md104 : active raid1 md103[0]
> 51140 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md103 : active raid1 md102[0]
> 51152 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md102 : active raid1 md101[0]
> 51164 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md101 : active raid1 md100[0]
> 51176 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md100 : active raid1 sdc[0]
> 51188 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> unused devices: <none>
>
> Create and mount a filesystem
> --------------------------
> root@deb6dev:~# mkfs.ext3 /dev/md106
> <<successful mkfs output snipped>>
> root@deb6dev:~# mount -t ext3 /dev/md106 /mnt/backup
> root@deb6dev:~# df | grep backup
> /dev/md106 49490 4923 42012 11% /mnt/backup
>
> Plug in a device that hasn't been in an array before
> -------------------------------------------
> root@deb6dev:~# mdadm -vv /dev/md100 --add /dev/sdd
> mdadm: added /dev/sdd
> root@deb6dev:~# cat /proc/mdstat
> Personalities : [raid1]
> md106 : active raid1 md105[0]
> 51116 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md105 : active raid1 md104[0]
> 51128 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md104 : active raid1 md103[0]
> 51140 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md103 : active raid1 md102[0]
> 51152 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md102 : active raid1 md101[0]
> 51164 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md101 : active raid1 md100[0]
> 51176 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md100 : active raid1 sdd[2] sdc[0]
> 51188 blocks super 1.2 [2/2] [UU]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
>
> Write to the array
> ---------------
> root@deb6dev:~# dd if=/dev/urandom of=a.blob bs=1M count=20
> 20+0 records in
> 20+0 records out
> 20971520 bytes (21 MB) copied, 5.05528 s, 4.1 MB/s
> root@deb6dev:~# dd if=/dev/urandom of=b.blob bs=1M count=10
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 2.59361 s, 4.0 MB/s
> root@deb6dev:~# dd if=/dev/urandom of=c.blob bs=1M count=5
> 5+0 records in
> 5+0 records out
> 5242880 bytes (5.2 MB) copied, 1.35619 s, 3.9 MB/s
> root@deb6dev:~# md5sum *blob > md5sums.txt
> root@deb6dev:~# ls -l
> total 35844
> -rw-r--r-- 1 root root 20971520 Oct 25 15:57 a.blob
> -rw-r--r-- 1 root root 10485760 Oct 25 15:57 b.blob
> -rw-r--r-- 1 root root 5242880 Oct 25 15:57 c.blob
> -rw-r--r-- 1 root root 123 Oct 25 15:57 md5sums.txt
> root@deb6dev:~# cp *blob /mnt/backup
> root@deb6dev:~# ls -l /mnt/backup
> total 35995
> -rw-r--r-- 1 root root 20971520 Oct 25 15:58 a.blob
> -rw-r--r-- 1 root root 10485760 Oct 25 15:58 b.blob
> -rw-r--r-- 1 root root 5242880 Oct 25 15:58 c.blob
> drwx------ 2 root root 12288 Oct 25 15:27 lost+found
> root@deb6dev:~# df | grep backup
> /dev/md106 49490 40906 6029 88% /mnt/backup
> root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
> md100 : active raid1 sdd[2] sdc[0]
> 51188 blocks super 1.2 [2/2] [UU]
> bitmap: 0/1 pages [0KB], 65536KB chunk
>
> Data written and array devices in sync (bitmap 0/1)
>
> Fail and remove device
> -------------------
> root@deb6dev:~# sync
> root@deb6dev:~# mdadm -vv /dev/md100 --fail /dev/sdd --remove /dev/sdd
> mdadm: set /dev/sdd faulty in /dev/md100
> mdadm: hot removed /dev/sdd from /dev/md100
> root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
> md100 : active raid1 sdc[0]
> 51188 blocks super 1.2 [2/1] [U_]
> bitmap: 0/1 pages [0KB], 65536KB chunk
>
> Device may now be unplugged
>
>
> Write to the array again
> --------------------
> root@deb6dev:~# rm /mnt/backup/b.blob
> root@deb6dev:~# ls -l /mnt/backup
> total 25714
> -rw-r--r-- 1 root root 20971520 Oct 25 15:58 a.blob
> -rw-r--r-- 1 root root 5242880 Oct 25 15:58 c.blob
> drwx------ 2 root root 12288 Oct 25 15:27 lost+found
> root@deb6dev:~# df | grep backup
> /dev/md106 49490 30625 16310 66% /mnt/backup
> root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
> md100 : active raid1 sdc[0]
> 51188 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> bitmap 1/1 shows array is not in sync (we know it's due to the writes
> pending for the device we previously failed)
>
> Plug in a device that was previously plugged in
> ----------------------------------------
> root@deb6dev:~# mdadm -vv -I /dev/sdd --run
> mdadm: UUID differs from /dev/md/0.
> mdadm: UUID differs from /dev/md/1.
> mdadm: /dev/sdd attached to /dev/md100 which is already active.
> root@deb6dev:~# sync
> root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
> md100 : active raid1 sdd[2] sdc[0]
> 51188 blocks super 1.2 [2/2] [UU]
> bitmap: 0/1 pages [0KB], 65536KB chunk
>
> Device reconnected [UU] and in sync (bitmap 0/1)
>
> Restore from off-site device
> ------------------------
> Remove device from array
> root@deb6dev:~# mdadm -vv /dev/md100 --fail /dev/sdd --remove /dev/sdd
> mdadm: set /dev/sdd faulty in /dev/md100
> mdadm: hot removed /dev/sdd from /dev/md100
> root@deb6dev:~# mdadm -Ev /dev/sdd
> /dev/sdd:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x1
> Array UUID : 4c957fac:d7dbc792:b642daf0:d22e313e
> Name : deb6dev:100 (local to host deb6dev)
> Creation Time : Tue Oct 25 15:22:19 2011
> Raid Level : raid1
> Raid Devices : 2
>
> Avail Dev Size : 102376 (50.00 MiB 52.42 MB)
> Array Size : 102376 (50.00 MiB 52.42 MB)
> Data Offset : 24 sectors
> Super Offset : 8 sectors
> State : clean
> Device UUID : 381f453f:5a97f1f6:bb5098bb:8c071a95
>
> Internal Bitmap : 8 sectors from superblock
> Update Time : Tue Oct 25 17:27:53 2011
> Checksum : acbcee5f - correct
> Events : 250
>
>
> Device Role : Active device 1
> Array State : AA ('A' == active, '.' == missing)
>
> Assemble a new array from off-site component:
> root@deb6dev:~# mdadm -vv -A /dev/md200 --run /dev/sdd
> mdadm: looking for devices for /dev/md200
> mdadm: /dev/sdd is identified as a member of /dev/md200, slot 1.
> mdadm: no uptodate device for slot 0 of /dev/md200
> mdadm: added /dev/sdd to /dev/md200 as 1
> mdadm: /dev/md200 has been started with 1 drive (out of 2).
> root@deb6dev:~#
>
> Check file-system on new array
> root@deb6dev:~# fsck.ext3 -f -n /dev/md200
> e2fsck 1.41.12 (17-May-2010)
> fsck.ext3: Superblock invalid, trying backup blocks...
> fsck.ext3: Bad magic number in super-block while trying to open /dev/md200
>
> The superblock could not be read or does not describe a correct ext2
> filesystem. If the device is valid and it really contains an ext2
> filesystem (and not swap or ufs or something else), then the superblock
> is corrupt, and you might try running e2fsck with an alternate superblock:
> e2fsck -b 8193 <device>
>
>
> How do I use these devices in a new array?
>
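You also need to assemble md201, md202, md203, md204, md205 and md206,
and then fsck/mount md206. The filesystem was made on the top of the
stack, so md200 on its own only exposes the member data of the next array.
Each of these is made by assembling the single previous md20X array.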
mdadm -A /dev/md201 --run /dev/md200
mdadm -A /dev/md202 --run /dev/md201
....
mdadm -A /dev/md206 --run /dev/md205
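The chain of assemble commands above could be generated with a small loop.
This is only a sketch: the function name assemble_stack and the DRY_RUN
switch are made up for illustration, and the numbering 200..206 is just
the example from this thread. By default it only prints the commands, so
they can be checked before anything touches the devices.

```shell
#!/bin/sh
# Sketch: rebuild the nested RAID1 stack on top of one off-site component.
# assemble_stack and DRY_RUN are hypothetical names, not part of mdadm.

# Print (or, with DRY_RUN=0, run) the mdadm commands that assemble
# md$first from the component, then each higher array from the one below.
assemble_stack() {
    component=$1   # off-site device, e.g. /dev/sdd
    first=$2       # number of the lowest new array, e.g. 200
    last=$3        # number of the top array, e.g. 206
    prev=$component
    n=$first
    while [ "$n" -le "$last" ]; do
        cmd="mdadm -A /dev/md$n --run $prev"
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$cmd"            # dry run: show the command only
        else
            $cmd || return 1       # stop at the first level that fails
        fi
        prev=/dev/md$n             # next level sits on the array just built
        n=$((n + 1))
    done
}

# Dry run for the example in this thread (sdd was a member of md100).
assemble_stack /dev/sdd 200 206
```

A device taken from higher up the stack needs fewer levels. For example,
sde was a member of md101, so only six arrays would have to be assembled
on top of it before the filesystem becomes visible.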
All the rest of your description looks good!
Thanks,
NeilBrown