From: NeilBrown <neilb@suse.de>
To: linbloke <linbloke@fastmail.fm>
Cc: "Jérôme Poulin" <jeromepoulin@gmail.com>,
"Pavel Hofman" <pavel.hofman@ivitera.com>,
linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Rotating RAID 1
Date: Wed, 26 Oct 2011 08:47:53 +1100
Message-ID: <20111026084753.1db251b2@notabene.brown>
In-Reply-To: <4EA666A1.9000904@fastmail.fm>
On Tue, 25 Oct 2011 18:34:57 +1100 linbloke <linbloke@fastmail.fm> wrote:
> On 16/08/11 9:55 AM, NeilBrown wrote:
> > On Mon, 15 Aug 2011 19:32:04 -0400 Jérôme Poulin<jeromepoulin@gmail.com>
> > wrote:
> >
> >> On Mon, Aug 15, 2011 at 6:42 PM, NeilBrown<neilb@suse.de> wrote:
> >>> So if there are 3 drives A, X, Y where A is permanent and X and Y are rotated
> >>> off-site, then I create two RAID1s like this:
> >>>
> >>>
> >>> mdadm -C /dev/md0 -l1 -n2 --bitmap=internal /dev/A /dev/X
> >>> mdadm -C /dev/md1 -l1 -n2 --bitmap=internal /dev/md0 /dev/Y
> >> That seems nice for 2 disks, but adding another one later would be a
> >> mess. Is there any way to play with slot numbers manually to make it
> >> appear as an always-degraded RAID? I can't plug all the disks in at once
> >> because of the maximum of 2 ports.
> > Yes, adding another one later would be difficult. But if you know up-front
> > that you will want three off-site devices, it is easy.
> >
> > You could
> >
> > mdadm -C /dev/md0 -l1 -n2 -b internal /dev/A missing
> > mdadm -C /dev/md1 -l1 -n2 -b internal /dev/md0 missing
> > mdadm -C /dev/md2 -l1 -n2 -b internal /dev/md1 missing
> > mdadm -C /dev/md3 -l1 -n2 -b internal /dev/md2 missing
> >
> > mkfs /dev/md3 ; mount ..
> >
> > So you now have 4 "missing" devices. Each time you plug in a device that
> > hasn't been in an array before, explicitly add it to the array that you want
> > it to be a part of and let it recover.
> > When you plug in a device that was previously plugged in, just "mdadm
> > -I /dev/XX" and it will automatically be added and recover based on the
> > bitmap.
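> >
> > For instance (a sketch only, with /dev/Z standing in for a transient
> > drive and /dev/md1 for the array it should join):
> >
> >    mdadm /dev/md1 --add /dev/Z    # first time: explicit add, full recovery
> >    mdadm -I /dev/Z                # later visits: incremental add, bitmap-based resync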
> >
> > You can have as many or as few of the transient drives plugged in at any
> > time as you like.
> >
> > There is a cost here of course. Every write potentially needs to update
> > every bitmap, so the more bitmaps, the more overhead in updating them. So
> > don't create more than you need.
> >
> > Also, it doesn't have to be a linear stack. It could be a binary tree,
> > though that might take a little more care to construct. Then, when both
> > leaves of an adjacent pair are off-site, the bitmap of the array they form
> > would not need updating.
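> >
> > As a rough sketch of such a tree, one permanent drive A plus four rotated
> > drives might be laid out like this (X1..X4 are placeholder names, and the
> > two-port constraint is ignored here):
> >
> >    mdadm -C /dev/md10 -l1 -n2 -b internal /dev/X1 /dev/X2
> >    mdadm -C /dev/md11 -l1 -n2 -b internal /dev/X3 /dev/X4
> >    mdadm -C /dev/md12 -l1 -n2 -b internal /dev/md10 /dev/md11
> >    mdadm -C /dev/md13 -l1 -n2 -b internal /dev/A /dev/md12
> >
> >    mkfs /dev/md13 ; mount ...
> >
> > While X1 and X2 are both off-site, md10 is not written at all, so its
> > bitmap stays untouched and only the arrays above it track changes.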
> >
> > NeilBrown
>
> Hi Neil, Jérôme and Pavel,
>
> I'm in the process of testing the solution described above and have been
> successful with those steps (I now have synced devices that I have failed
> and removed from their respective arrays - the "backups"). I can add new
> devices and also incrementally re-add devices back to their respective
> arrays, and all my tests show this process works well. The point I'm now
> trying to resolve is how to create a new array from one of the off-site
> components - i.e., the restore-from-backup test. Below are the steps I've
> taken, with verification at each point; you can skip to the bottom section
> "Restore from off-site device" to get to the point if you like. When the
> wiki is back up, I'll post this process there for others who are looking
> for mdadm-based offline backups. Any corrections are gratefully
> appreciated.
>
> Based on the example above, for a target setup of 7 off-site devices
> synced to a two-device RAID1, my test setup is:
>
> RAID array    Online device    Off-site device
> md100         sdc              sdd
> md101         md100            sde
> md102         md101            sdf
> md103         md102            sdg
> md104         md103            sdh
> md105         md104            sdi
> md106         md105            sdj
> root@deb6dev:~# uname -a
> Linux deb6dev 2.6.32-5-686 #1 SMP Tue Mar 8 21:36:00 UTC 2011 i686 GNU/Linux
> root@deb6dev:~# mdadm -V
> mdadm - v3.1.4 - 31st August 2010
> root@deb6dev:~# cat /etc/debian_version
> 6.0.1
>
> Create the nested arrays
> ---------------------
> root@deb6dev:~# mdadm -C /dev/md100 -l1 -n2 -b internal -e 1.2 /dev/sdc missing
> mdadm: array /dev/md100 started.
> root@deb6dev:~# mdadm -C /dev/md101 -l1 -n2 -b internal -e 1.2 /dev/md100 missing
> mdadm: array /dev/md101 started.
> root@deb6dev:~# mdadm -C /dev/md102 -l1 -n2 -b internal -e 1.2 /dev/md101 missing
> mdadm: array /dev/md102 started.
> root@deb6dev:~# mdadm -C /dev/md103 -l1 -n2 -b internal -e 1.2 /dev/md102 missing
> mdadm: array /dev/md103 started.
> root@deb6dev:~# mdadm -C /dev/md104 -l1 -n2 -b internal -e 1.2 /dev/md103 missing
> mdadm: array /dev/md104 started.
> root@deb6dev:~# mdadm -C /dev/md105 -l1 -n2 -b internal -e 1.2 /dev/md104 missing
> mdadm: array /dev/md105 started.
> root@deb6dev:~# mdadm -C /dev/md106 -l1 -n2 -b internal -e 1.2 /dev/md105 missing
> mdadm: array /dev/md106 started.
>
> root@deb6dev:~# cat /proc/mdstat
> Personalities : [raid1]
> md106 : active (auto-read-only) raid1 md105[0]
> 51116 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md105 : active raid1 md104[0]
> 51128 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md104 : active raid1 md103[0]
> 51140 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md103 : active raid1 md102[0]
> 51152 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md102 : active raid1 md101[0]
> 51164 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md101 : active raid1 md100[0]
> 51176 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md100 : active raid1 sdc[0]
> 51188 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> unused devices: <none>
>
> Create and mount a filesystem
> --------------------------
> root@deb6dev:~# mkfs.ext3 /dev/md106
> <<successful mkfs output snipped>>
> root@deb6dev:~# mount -t ext3 /dev/md106 /mnt/backup
> root@deb6dev:~# df | grep backup
> /dev/md106 49490 4923 42012 11% /mnt/backup
>
> Plug in a device that hasn't been in an array before
> -------------------------------------------
> root@deb6dev:~# mdadm -vv /dev/md100 --add /dev/sdd
> mdadm: added /dev/sdd
> root@deb6dev:~# cat /proc/mdstat
> Personalities : [raid1]
> md106 : active raid1 md105[0]
> 51116 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md105 : active raid1 md104[0]
> 51128 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md104 : active raid1 md103[0]
> 51140 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md103 : active raid1 md102[0]
> 51152 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md102 : active raid1 md101[0]
> 51164 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md101 : active raid1 md100[0]
> 51176 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> md100 : active raid1 sdd[2] sdc[0]
> 51188 blocks super 1.2 [2/2] [UU]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
>
> Write to the array
> ---------------
> root@deb6dev:~# dd if=/dev/urandom of=a.blob bs=1M count=20
> 20+0 records in
> 20+0 records out
> 20971520 bytes (21 MB) copied, 5.05528 s, 4.1 MB/s
> root@deb6dev:~# dd if=/dev/urandom of=b.blob bs=1M count=10
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 2.59361 s, 4.0 MB/s
> root@deb6dev:~# dd if=/dev/urandom of=c.blob bs=1M count=5
> 5+0 records in
> 5+0 records out
> 5242880 bytes (5.2 MB) copied, 1.35619 s, 3.9 MB/s
> root@deb6dev:~# md5sum *blob > md5sums.txt
> root@deb6dev:~# ls -l
> total 35844
> -rw-r--r-- 1 root root 20971520 Oct 25 15:57 a.blob
> -rw-r--r-- 1 root root 10485760 Oct 25 15:57 b.blob
> -rw-r--r-- 1 root root 5242880 Oct 25 15:57 c.blob
> -rw-r--r-- 1 root root 123 Oct 25 15:57 md5sums.txt
> root@deb6dev:~# cp *blob /mnt/backup
> root@deb6dev:~# ls -l /mnt/backup
> total 35995
> -rw-r--r-- 1 root root 20971520 Oct 25 15:58 a.blob
> -rw-r--r-- 1 root root 10485760 Oct 25 15:58 b.blob
> -rw-r--r-- 1 root root 5242880 Oct 25 15:58 c.blob
> drwx------ 2 root root 12288 Oct 25 15:27 lost+found
> root@deb6dev:~# df | grep backup
> /dev/md106 49490 40906 6029 88% /mnt/backup
> root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
> md100 : active raid1 sdd[2] sdc[0]
> 51188 blocks super 1.2 [2/2] [UU]
> bitmap: 0/1 pages [0KB], 65536KB chunk
>
> Data written and array devices in sync (bitmap 0/1)
>
> Fail and remove device
> -------------------
> root@deb6dev:~# sync
> root@deb6dev:~# mdadm -vv /dev/md100 --fail /dev/sdd --remove /dev/sdd
> mdadm: set /dev/sdd faulty in /dev/md100
> mdadm: hot removed /dev/sdd from /dev/md100
> root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
> md100 : active raid1 sdc[0]
> 51188 blocks super 1.2 [2/1] [U_]
> bitmap: 0/1 pages [0KB], 65536KB chunk
>
> Device may now be unplugged
>
>
> Write to the array again
> --------------------
> root@deb6dev:~# rm /mnt/backup/b.blob
> root@deb6dev:~# ls -l /mnt/backup
> total 25714
> -rw-r--r-- 1 root root 20971520 Oct 25 15:58 a.blob
> -rw-r--r-- 1 root root 5242880 Oct 25 15:58 c.blob
> drwx------ 2 root root 12288 Oct 25 15:27 lost+found
> root@deb6dev:~# df | grep backup
> /dev/md106 49490 30625 16310 66% /mnt/backup
> root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
> md100 : active raid1 sdc[0]
> 51188 blocks super 1.2 [2/1] [U_]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> bitmap 1/1 shows the array is not in sync (we know this is due to the
> writes pending for the device we previously failed)
>
> Plug in a device that was previously plugged in
> ----------------------------------------
> root@deb6dev:~# mdadm -vv -I /dev/sdd --run
> mdadm: UUID differs from /dev/md/0.
> mdadm: UUID differs from /dev/md/1.
> mdadm: /dev/sdd attached to /dev/md100 which is already active.
> root@deb6dev:~# sync
> root@deb6dev:~# cat /proc/mdstat | grep -A 3 '^md100'
> md100 : active raid1 sdd[2] sdc[0]
> 51188 blocks super 1.2 [2/2] [UU]
> bitmap: 0/1 pages [0KB], 65536KB chunk
>
> Device reconnected [UU] and in sync (bitmap 0/1)
>
> Restore from off-site device
> ------------------------
> Remove device from array
> root@deb6dev:~# mdadm -vv /dev/md100 --fail /dev/sdd --remove /dev/sdd
> mdadm: set /dev/sdd faulty in /dev/md100
> mdadm: hot removed /dev/sdd from /dev/md100
> root@deb6dev:~# mdadm -Ev /dev/sdd
> /dev/sdd:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x1
> Array UUID : 4c957fac:d7dbc792:b642daf0:d22e313e
> Name : deb6dev:100 (local to host deb6dev)
> Creation Time : Tue Oct 25 15:22:19 2011
> Raid Level : raid1
> Raid Devices : 2
>
> Avail Dev Size : 102376 (50.00 MiB 52.42 MB)
> Array Size : 102376 (50.00 MiB 52.42 MB)
> Data Offset : 24 sectors
> Super Offset : 8 sectors
> State : clean
> Device UUID : 381f453f:5a97f1f6:bb5098bb:8c071a95
>
> Internal Bitmap : 8 sectors from superblock
> Update Time : Tue Oct 25 17:27:53 2011
> Checksum : acbcee5f - correct
> Events : 250
>
>
> Device Role : Active device 1
> Array State : AA ('A' == active, '.' == missing)
>
> Assemble a new array from off-site component:
> root@deb6dev:~# mdadm -vv -A /dev/md200 --run /dev/sdd
> mdadm: looking for devices for /dev/md200
> mdadm: /dev/sdd is identified as a member of /dev/md200, slot 1.
> mdadm: no uptodate device for slot 0 of /dev/md200
> mdadm: added /dev/sdd to /dev/md200 as 1
> mdadm: /dev/md200 has been started with 1 drive (out of 2).
> root@deb6dev:~#
>
> Check file-system on new array
> root@deb6dev:~# fsck.ext3 -f -n /dev/md200
> e2fsck 1.41.12 (17-May-2010)
> fsck.ext3: Superblock invalid, trying backup blocks...
> fsck.ext3: Bad magic number in super-block while trying to open /dev/md200
>
> The superblock could not be read or does not describe a correct ext2
> filesystem. If the device is valid and it really contains an ext2
> filesystem (and not swap or ufs or something else), then the superblock
> is corrupt, and you might try running e2fsck with an alternate superblock:
> e2fsck -b 8193 <device>
>
>
> How do I use these devices in a new array?
>
You need to also assemble md201, md202, md203, md204, md205, and md206,
and then fsck/mount md206.
Each of these is made by assembling the single previous md20X array:
mdadm -A /dev/md201 --run /dev/md200
mdadm -A /dev/md202 --run /dev/md201
....
mdadm -A /dev/md206 --run /dev/md205
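
Once md206 is running, the check and mount are the same as for md106,
e.g. (the mount point here is only an example):

  fsck.ext3 -f -n /dev/md206
  mount -t ext3 /dev/md206 /mnt/restore
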
All the rest of your description looks good!
Thanks,
NeilBrown