linux-raid.vger.kernel.org archive mirror
From: Neil Brown <neilb@suse.de>
To: Jon Nelson <jnelson-linux-raid@jamponi.net>
Cc: LinuxRaid <linux-raid@vger.kernel.org>
Subject: Re: weird issues with raid1
Date: Mon, 15 Dec 2008 17:00:49 +1100	[thread overview]
Message-ID: <18757.62097.166706.244330@notabene.brown> (raw)
In-Reply-To: message from Jon Nelson on Friday December 5

On Friday December 5, jnelson-linux-raid@jamponi.net wrote:
> I set up a raid1 between some devices, and have been futzing with it.
> I've been encountering all kinds of weird problems, including one
> which required me to reboot my machine.
> 
> This is long, sorry.
> 
> First, this is how I built the raid:
> 
> mdadm --create /dev/md10 --level=1 --raid-devices=2 --bitmap=internal
> /dev/sdd1 --write-mostly --write-behind missing

'write-behind' is a setting on the bitmap and applies to all
write-mostly devices, so it can be specified anywhere.
'write-mostly' is a setting that applies to a particular device, not
to a position in the array.  So setting 'write-mostly' on a 'missing'
device has no useful effect.  When you add a new device to the array
you will need to set 'write-mostly' on that device if you want that
feature.
i.e.
   mdadm /dev/md10 --add --write-mostly /dev/nbd0
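If you want to confirm the flag actually took, the kernel exports
per-device flags under sysfs; this is a small sketch (not part of
mdadm itself) that checks for "write_mostly" in the member's state
file.  The directory is passed as a parameter so the helper can also
be pointed at a test fixture:

```shell
# Sketch: "write_mostly" appears in the per-device sysfs state file
# when the flag is set on that member.
is_write_mostly() {
    # $1 = member device directory, e.g. /sys/block/md10/md/dev-nbd0
    grep -q 'write_mostly' "$1/state"
}

# On a real array:
#   is_write_mostly /sys/block/md10/md/dev-nbd0 && echo "write-mostly is set"
```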


> 
> then I added /dev/nbd0:
> 
> mdadm /dev/md10 --add /dev/nbd0
> 
> and it rebuilt just fine.

Good.

> 
> Then I failed and removed /dev/sdd1, and added /dev/sda:
> 
> mdadm /dev/md10 --fail /dev/sdd1 --remove /dev/sdd1
> mdadm /dev/md10 --add /dev/sda
> 
> I let it rebuild.
> 
> Then I failed, and removed it:
> 
> The --fail worked, but the --remove did not.
> 
> mdadm /dev/md10 --fail /dev/sda --remove /dev/sda
> mdadm: set /dev/sda faulty in /dev/md10
> mdadm: hot remove failed for /dev/sda: Device or resource busy

That is expected.  Marking a device as 'failed' does not immediately
disconnect it from the array.  You have to wait for any in-flight IO
requests to complete.

> 
> Whaaa?
> So I tried again:
> 
> mdadm /dev/md10 --remove /dev/sda
> mdadm: hot removed /dev/sda

By now all those in-flight requests had completed and the device could
be removed.
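If you want to script this rather than typing --remove twice, a short
retry loop works; this is just a sketch, with the command passed as
arguments so the loop is generic:

```shell
# Sketch: retry a command until it succeeds, which for a hot-remove
# means until the in-flight IO has drained and md lets the device go.
retry() {
    # $1 = max attempts; remaining arguments = command to run
    attempts=$1; shift
    n=0
    while [ "$n" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        n=$((n + 1))
        sleep 1
    done
    return 1
}

# On a real array:
#   mdadm /dev/md10 --fail /dev/sda
#   retry 10 mdadm /dev/md10 --remove /dev/sda
```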

> 
> OK. Better, but weird.
> Since I'm using bitmaps, I would expect --re-add to allow the rebuild
> to pick up where it left off. It was 78% done.

Nope.
With v0.90 metadata, a spare device is not marked as being part of
the array until it is fully recovered.  So if you interrupt a
recovery there is no record of how far it got.
With v1.0 metadata we do record how far the recovery has progressed,
and it can restart.  However I don't think that helps if you fail a
device - only if you stop the array and later restart it.

The bitmap is really about 'resync', not 'recovery'.

> 
> ******
> Question 1:
> I'm using a bitmap. Why does the rebuild start completely over?

Because the bitmap isn't used to guide a rebuild, only a resync.

The effect of --re-add is to make md do a resync rather than a rebuild
if the device was previously a fully active member of the array.
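You can see which of the two md is doing via sysfs: the sync_action
attribute reports "recover" for a full rebuild onto a spare and
"resync" for the bitmap-guided pass.  A sketch, with the directory as
a parameter so it can be exercised against a fixture:

```shell
# Sketch: report the current synchronisation type for an md array.
# "recover" = rebuild onto a spare; "resync" = bitmap-guided pass;
# "idle" = nothing in progress.
sync_action() {
    # $1 = md sysfs directory, e.g. /sys/block/md10/md
    cat "$1/sync_action"
}

# On a real array:
#   sync_action /sys/block/md10/md
```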

> 
> 4% into the rebuild, this is what --examine-bitmap looks like for both
> components:
> 
>         Filename : /dev/sda
>            Magic : 6d746962
>          Version : 4
>             UUID : 542a0986:dd465da6:b224af07:ed28e4e5
>           Events : 500
>   Events Cleared : 496
>            State : OK
>        Chunksize : 256 KB
>           Daemon : 5s flush period
>       Write Mode : Allow write behind, max 256
>        Sync Size : 78123968 (74.50 GiB 80.00 GB)
>           Bitmap : 305172 bits (chunks), 305172 dirty (100.0%)
> 
> turnip:~ # mdadm --examine-bitmap /dev/nbd0
>         Filename : /dev/nbd0
>            Magic : 6d746962
>          Version : 4
>             UUID : 542a0986:dd465da6:b224af07:ed28e4e5
>           Events : 524
>   Events Cleared : 496
>            State : OK
>        Chunksize : 256 KB
>           Daemon : 5s flush period
>       Write Mode : Allow write behind, max 256
>        Sync Size : 78123968 (74.50 GiB 80.00 GB)
>           Bitmap : 305172 bits (chunks), 0 dirty (0.0%)
> 
> 
> No matter how long I wait, until it is rebuilt, the bitmap on /dev/sda
> is always 100% dirty.
> If I --fail, --remove (twice) /dev/sda, and I re-add /dev/sdd1, it
> clearly uses the bitmap and re-syncs in under 1 second.

Yes, there is a bug here.
When an array recovers on to a hot spare it doesn't copy the bitmap
across.  That will only happen lazily as bits are updated.
I'm surprised I hadn't noticed that before, so there might be more to
this than I'm seeing at the moment.  But I definitely cannot find
code to copy the bitmap across.  I'll have to have a think about
that.

> 
> 
> ***************
> Question 2: mdadm --detail and cat /proc/mdstat do not agree:
> 
> NOTE: mdadm --detail says the rebuild status is 0% complete, but cat
> /proc/mdstat shows it as 7%.
> A bit later, I check again and they both agree - 14%.
> Below, from when the rebuild was 7% according to /proc/mdstat

I cannot explain this except to wonder whether 7% of the recovery
completed between running "mdadm -D" and "cat /proc/mdstat".

The number reported by "mdadm -D" is obtained by reading /proc/mdstat
and applying "atoi()" to the string that ends with a '%'.
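As a rough illustration of that parsing (a sketch, not mdadm's actual
source), the token ending in '%' can be pulled out of the
/proc/mdstat progress line and truncated at the decimal point the way
atoi() would truncate "13.1":

```shell
# Sketch: extract the whole-number recovery percentage from one
# /proc/mdstat progress line, truncating like atoi().
recovery_percent() {
    # $1 = one /proc/mdstat progress line
    pct=$(printf '%s\n' "$1" | grep -o '[0-9][0-9.]*%' | head -n 1)
    pct=${pct%%.*}              # "13.1%" -> "13"; "7%" -> "7%"
    printf '%s\n' "${pct%\%}"   # strip a trailing '%' if still present
}

line='[==>..................]  recovery = 13.1% (10283392/78123968)'
recovery_percent "$line"    # prints "13"
```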

NeilBrown


> 
> /dev/md10:
>         Version : 00.90.03
>   Creation Time : Fri Dec  5 07:44:41 2008
>      Raid Level : raid1
>      Array Size : 78123968 (74.50 GiB 80.00 GB)
>   Used Dev Size : 78123968 (74.50 GiB 80.00 GB)
>    Raid Devices : 2
>   Total Devices : 2
> Preferred Minor : 10
>     Persistence : Superblock is persistent
> 
>   Intent Bitmap : Internal
> 
>     Update Time : Fri Dec  5 20:04:30 2008
>           State : active, degraded, recovering
>  Active Devices : 1
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 1
> 
>  Rebuild Status : 0% complete
> 
>            UUID : 542a0986:dd465da6:b224af07:ed28e4e5
>          Events : 0.544
> 
>     Number   Major   Minor   RaidDevice State
>        2       8        0        0      spare rebuilding   /dev/sda
>        1      43        0        1      active sync   /dev/nbd0
> 
> 
> md10 : active raid1 sda[2] nbd0[1]
>       78123968 blocks [2/1] [_U]
>       [==>..................]  recovery = 13.1% (10283392/78123968)
> finish=27.3min speed=41367K/sec
>       bitmap: 0/150 pages [0KB], 256KB chunk
> 
> 
> 
> -- 
> Jon
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thread overview: 24+ messages
2008-12-06  2:10 weird issues with raid1 Jon Nelson
2008-12-06  2:46 ` Jon Nelson
2008-12-06 12:16   ` Justin Piszcz
2008-12-15  2:17     ` Jon Nelson
2008-12-15  6:00 ` Neil Brown [this message]
2008-12-15 13:42   ` Jon Nelson
2008-12-15 21:33     ` Neil Brown
2008-12-15 21:47       ` Jon Nelson
2008-12-16  1:21         ` Neil Brown
2008-12-16  2:32           ` Jon Nelson
2008-12-18  4:42           ` Neil Brown
2008-12-18  4:50             ` Jon Nelson
2008-12-18  4:55               ` Jon Nelson
2008-12-18  5:17                 ` Neil Brown
2008-12-18  5:47                   ` Jon Nelson
2008-12-18  6:21                     ` Neil Brown
2008-12-19  2:15                       ` Jon Nelson
2008-12-19 16:51                         ` Jon Nelson
2008-12-19 20:40                           ` Jon Nelson
2008-12-19 21:18                             ` Jon Nelson
2008-12-22 14:40                               ` Jon Nelson
2008-12-22 21:07                                 ` NeilBrown
2008-12-18  5:43   ` Neil Brown
2008-12-18  5:54     ` Jon Nelson
