From: Neil Brown <neilb@suse.de>
To: Jon Nelson <jnelson-linux-raid@jamponi.net>
Cc: LinuxRaid <linux-raid@vger.kernel.org>
Subject: Re: weird issues with raid1
Date: Mon, 15 Dec 2008 17:00:49 +1100
Message-ID: <18757.62097.166706.244330@notabene.brown>
In-Reply-To: message from Jon Nelson on Friday December 5
On Friday December 5, jnelson-linux-raid@jamponi.net wrote:
> I set up a raid1 between some devices, and have been futzing with it.
> I've been encountering all kinds of weird problems, including one
> which required me to reboot my machine.
>
> This is long, sorry.
>
> First, this is how I built the raid:
>
> mdadm --create /dev/md10 --level=1 --raid-devices=2 --bitmap=internal
> /dev/sdd1 --write-mostly --write-behind missing
'write-behind' is a setting on the bitmap and applies to all
write-mostly devices, so it can be specified anywhere.
'write-mostly' is a setting that applies to a particular device, not
to a position in the array. So setting 'write-mostly' on a 'missing'
device has no useful effect. When you add a new device to the array
you will need to set 'write-mostly' on it if you want that feature.
i.e.
mdadm /dev/md10 --add --write-mostly /dev/nbd0
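Similarly, since 'write-behind' is a bitmap setting, it can be given
wherever the bitmap is set up. For example (just a sketch - 256 is
the default limit on outstanding write-behind writes, shown only for
illustration):

  mdadm --create /dev/md10 --level=1 --raid-devices=2 \
        --bitmap=internal --write-behind=256 /dev/sdd1 missing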
>
> then I added /dev/nbd0:
>
> mdadm /dev/md10 --add /dev/nbd0
>
> and it rebuilt just fine.
Good.
>
> Then I failed and removed /dev/sdd1, and added /dev/sda:
>
> mdadm /dev/md10 --fail /dev/sdd1 --remove /dev/sdd1
> mdadm /dev/md10 --add /dev/sda
>
> I let it rebuild.
>
> Then I failed, and removed it:
>
> The --fail worked, but the --remove did not.
>
> mdadm /dev/md10 --fail /dev/sda --remove /dev/sda
> mdadm: set /dev/sda faulty in /dev/md10
> mdadm: hot remove failed for /dev/sda: Device or resource busy
That is expected. Marking a device as 'failed' does not immediately
disconnect it from the array. You have to wait for any in-flight IO
requests to complete.
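So the usual pattern (a sketch - the one-second delay is arbitrary,
and simply retrying the --remove works just as well) is:

  mdadm /dev/md10 --fail /dev/sda
  sleep 1
  mdadm /dev/md10 --remove /dev/sda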
>
> Whaaa?
> So I tried again:
>
> mdadm /dev/md10 --remove /dev/sda
> mdadm: hot removed /dev/sda
By now all those in-flight requests had completed and the device could
be removed.
>
> OK. Better, but weird.
> Since I'm using bitmaps, I would expect --re-add to allow the rebuild
> to pick up where it left off. It was 78% done.
Nope.
With v0.90 metadata, a spare device is not marked as being part of the
array until it is fully recovered. So if you interrupt a recovery
there is no record of how far it got.
With v1.0 metadata we do record how far the recovery has progressed
and it can restart. However I don't think that helps if you fail a
device - only if you stop the array and later restart it.
The bitmap is really about 'resync', not 'recovery'.
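So if restartable recovery matters to you, the array needs to be
created with v1.0 metadata in the first place, e.g. (a sketch, not
tested here, and as noted it only helps across a stop/start):

  mdadm --create /dev/md10 --metadata=1.0 --level=1 --raid-devices=2 \
        --bitmap=internal /dev/sdd1 missing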
>
> ******
> Question 1:
> I'm using a bitmap. Why does the rebuild start completely over?
Because the bitmap isn't used to guide a rebuild, only a resync.
The effect of --re-add is to make md do a resync rather than a rebuild
if the device was previously a fully active member of the array.
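i.e. if the device had previously been fully active in the array,
something like

  mdadm /dev/md10 --re-add /dev/sdd1

gives you a bitmap-guided resync rather than a full rebuild.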
>
> 4% into the rebuild, this is what --examine-bitmap looks like for both
> components:
>
> Filename : /dev/sda
> Magic : 6d746962
> Version : 4
> UUID : 542a0986:dd465da6:b224af07:ed28e4e5
> Events : 500
> Events Cleared : 496
> State : OK
> Chunksize : 256 KB
> Daemon : 5s flush period
> Write Mode : Allow write behind, max 256
> Sync Size : 78123968 (74.50 GiB 80.00 GB)
> Bitmap : 305172 bits (chunks), 305172 dirty (100.0%)
>
> turnip:~ # mdadm --examine-bitmap /dev/nbd0
> Filename : /dev/nbd0
> Magic : 6d746962
> Version : 4
> UUID : 542a0986:dd465da6:b224af07:ed28e4e5
> Events : 524
> Events Cleared : 496
> State : OK
> Chunksize : 256 KB
> Daemon : 5s flush period
> Write Mode : Allow write behind, max 256
> Sync Size : 78123968 (74.50 GiB 80.00 GB)
> Bitmap : 305172 bits (chunks), 0 dirty (0.0%)
>
>
> No matter how long I wait, until it is rebuilt, the bitmap on /dev/sda
> is always 100% dirty.
> If I --fail, --remove (twice) /dev/sda, and I re-add /dev/sdd1, it
> clearly uses the bitmap and re-syncs in under 1 second.
Yes, there is a bug here.
When an array recovers on to a hot spare it doesn't copy the bitmap
across. That will only happen lazily as bits are updated.
I'm surprised I hadn't noticed that before, so there might be more to
this than I'm seeing at the moment. But I definitely cannot find
code to copy the bitmap across. I'll have to have a think about
that.
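You can watch that lazy clearing happen as bits are updated (a
sketch - the 5 second interval just matches the bitmap daemon's
flush period shown in your --examine-bitmap output):

  watch -n 5 'mdadm --examine-bitmap /dev/sda | grep -i dirty'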
>
>
> ***************
> Question 2: mdadm --detail and cat /proc/mdstat do not agree:
>
> NOTE: mdadm --detail says the rebuild status is 0% complete, but cat
> /proc/mdstat shows it as 7%.
> A bit later, I check again and they both agree - 14%.
> Below, from when the rebuild was 7% according to /proc/mdstat
I cannot explain this except to wonder if 7% of the recovery
completed between running "mdadm -D" and "cat /proc/mdstat".
The number reported by "mdadm -D" is obtained by reading /proc/mdstat
and applying "atoi()" to the string that ends with a '%'.
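If you want to compare the two with a smaller window between them,
run them back to back (a sketch):

  mdadm --detail /dev/md10 | grep -i rebuild
  grep -i recovery /proc/mdstat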
NeilBrown
>
> /dev/md10:
> Version : 00.90.03
> Creation Time : Fri Dec 5 07:44:41 2008
> Raid Level : raid1
> Array Size : 78123968 (74.50 GiB 80.00 GB)
> Used Dev Size : 78123968 (74.50 GiB 80.00 GB)
> Raid Devices : 2
> Total Devices : 2
> Preferred Minor : 10
> Persistence : Superblock is persistent
>
> Intent Bitmap : Internal
>
> Update Time : Fri Dec 5 20:04:30 2008
> State : active, degraded, recovering
> Active Devices : 1
> Working Devices : 2
> Failed Devices : 0
> Spare Devices : 1
>
> Rebuild Status : 0% complete
>
> UUID : 542a0986:dd465da6:b224af07:ed28e4e5
> Events : 0.544
>
> Number Major Minor RaidDevice State
> 2 8 0 0 spare rebuilding /dev/sda
> 1 43 0 1 active sync /dev/nbd0
>
>
> md10 : active raid1 sda[2] nbd0[1]
> 78123968 blocks [2/1] [_U]
> [==>..................] recovery = 13.1% (10283392/78123968)
> finish=27.3min speed=41367K/sec
> bitmap: 0/150 pages [0KB], 256KB chunk
>
>
>
> --
> Jon