linux-raid.vger.kernel.org archive mirror
* weird issues with raid1
@ 2008-12-06  2:10 Jon Nelson
  2008-12-06  2:46 ` Jon Nelson
  2008-12-15  6:00 ` Neil Brown
  0 siblings, 2 replies; 24+ messages in thread
From: Jon Nelson @ 2008-12-06  2:10 UTC (permalink / raw)
  To: LinuxRaid

I set up a raid1 between some devices, and have been futzing with it.
I've been encountering all kinds of weird problems, including one
which required me to reboot my machine.

This is long, sorry.

First, this is how I built the raid:

mdadm --create /dev/md10 --level=1 --raid-devices=2 --bitmap=internal
/dev/sdd1 --write-mostly --write-behind missing

then I added /dev/nbd0:

mdadm /dev/md10 --add /dev/nbd0

and it rebuilt just fine.

Then I failed and removed /dev/sdd1, and added /dev/sda:

mdadm /dev/md10 --fail /dev/sdd1 --remove /dev/sdd1
mdadm /dev/md10 --add /dev/sda

I let it rebuild.

Then I tried to fail and remove it.

The --fail worked, but the --remove did not:

mdadm /dev/md10 --fail /dev/sda --remove /dev/sda
mdadm: set /dev/sda faulty in /dev/md10
mdadm: hot remove failed for /dev/sda: Device or resource busy

Whaaa?
So I tried again:

mdadm /dev/md10 --remove /dev/sda
mdadm: hot removed /dev/sda

OK. Better, but weird.
Since I'm using bitmaps, I would expect --re-add to allow the rebuild
to pick up where it left off. It was 78% done.

mdadm /dev/md10 --re-add /dev/sda

cat /proc/mdstat

md10 : active raid1 sda[2] nbd0[1]
      78123968 blocks [2/1] [_U]
      [>....................]  recovery =  1.2% (959168/78123968)
finish=30.8min speed=41702K/sec
      bitmap: 0/150 pages [0KB], 256KB chunk


******
Question 1:
I'm using a bitmap. Why does the rebuild start completely over?

4% into the rebuild, this is what --examine-bitmap looks like for both
components:

        Filename : /dev/sda
           Magic : 6d746962
         Version : 4
            UUID : 542a0986:dd465da6:b224af07:ed28e4e5
          Events : 500
  Events Cleared : 496
           State : OK
       Chunksize : 256 KB
          Daemon : 5s flush period
      Write Mode : Allow write behind, max 256
       Sync Size : 78123968 (74.50 GiB 80.00 GB)
          Bitmap : 305172 bits (chunks), 305172 dirty (100.0%)

turnip:~ # mdadm --examine-bitmap /dev/nbd0
        Filename : /dev/nbd0
           Magic : 6d746962
         Version : 4
            UUID : 542a0986:dd465da6:b224af07:ed28e4e5
          Events : 524
  Events Cleared : 496
           State : OK
       Chunksize : 256 KB
          Daemon : 5s flush period
      Write Mode : Allow write behind, max 256
       Sync Size : 78123968 (74.50 GiB 80.00 GB)
          Bitmap : 305172 bits (chunks), 0 dirty (0.0%)


No matter how long I wait, until it is rebuilt, the bitmap on /dev/sda
is always 100% dirty.
If I --fail and --remove (twice) /dev/sda, and then --re-add /dev/sdd1, it
clearly uses the bitmap and re-syncs in under 1 second.
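
In other words:

mdadm /dev/md10 --fail /dev/sda --remove /dev/sda   # the remove fails with "busy"
mdadm /dev/md10 --remove /dev/sda                   # the second attempt succeeds
mdadm /dev/md10 --re-add /dev/sdd1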


***************
Question 2: mdadm --detail and cat /proc/mdstat do not agree:

NOTE: mdadm --detail says the rebuild status is 0% complete, but cat
/proc/mdstat shows it as 7%.
A bit later, I check again and they both agree - 14%.
Below is the output from when the rebuild was at 7% according to /proc/mdstat:

/dev/md10:
        Version : 00.90.03
  Creation Time : Fri Dec  5 07:44:41 2008
     Raid Level : raid1
     Array Size : 78123968 (74.50 GiB 80.00 GB)
  Used Dev Size : 78123968 (74.50 GiB 80.00 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 10
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Fri Dec  5 20:04:30 2008
          State : active, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 0% complete

           UUID : 542a0986:dd465da6:b224af07:ed28e4e5
         Events : 0.544

    Number   Major   Minor   RaidDevice State
       2       8        0        0      spare rebuilding   /dev/sda
       1      43        0        1      active sync   /dev/nbd0


md10 : active raid1 sda[2] nbd0[1]
      78123968 blocks [2/1] [_U]
      [==>..................]  recovery = 13.1% (10283392/78123968)
finish=27.3min speed=41367K/sec
      bitmap: 0/150 pages [0KB], 256KB chunk



-- 
Jon


* Re: weird issues with raid1
  2008-12-06  2:10 weird issues with raid1 Jon Nelson
@ 2008-12-06  2:46 ` Jon Nelson
  2008-12-06 12:16   ` Justin Piszcz
  2008-12-15  6:00 ` Neil Brown
  1 sibling, 1 reply; 24+ messages in thread
From: Jon Nelson @ 2008-12-06  2:46 UTC (permalink / raw)
  To: LinuxRaid

More info:

according to /proc/mdstat (and /var/log/messages) the rebuild is complete:

md10 : active raid1 sda[0] nbd0[1]
      78123968 blocks [2/2] [UU]
      bitmap: 0/150 pages [0KB], 256KB chunk

and --detail:

/dev/md10:
        Version : 00.90.03
  Creation Time : Fri Dec  5 07:44:41 2008
     Raid Level : raid1
     Array Size : 78123968 (74.50 GiB 80.00 GB)
  Used Dev Size : 78123968 (74.50 GiB 80.00 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 10
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Fri Dec  5 20:40:32 2008
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 542a0986:dd465da6:b224af07:ed28e4e5
         Events : 0.554

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1      43        0        1      active sync   /dev/nbd0



however, --examine-bitmap disagrees:

        Filename : /dev/sda
           Magic : 6d746962
         Version : 4
            UUID : 542a0986:dd465da6:b224af07:ed28e4e5
          Events : 554
  Events Cleared : 554
           State : OK
       Chunksize : 256 KB
          Daemon : 5s flush period
      Write Mode : Allow write behind, max 256
       Sync Size : 78123968 (74.50 GiB 80.00 GB)
          Bitmap : 305172 bits (chunks), 274452 dirty (89.9%)

The bitmap numbers *DID NOT CHANGE* throughout the entire rebuild
process; when it was complete, they changed to what you see above. The
rebuild completed a few minutes prior to the --examine-bitmap.

Something is very funky.

If I --grow --bitmap=none and then --grow --bitmap=internal, things
look OK after maybe 10-15 seconds.
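
That is:

mdadm --grow /dev/md10 --bitmap=none
mdadm --grow /dev/md10 --bitmap=internal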

Of course, when this is complete the --fail --remove and --re-add work
as expected on /dev/sda.



-- 
Jon


* Re: weird issues with raid1
  2008-12-06  2:46 ` Jon Nelson
@ 2008-12-06 12:16   ` Justin Piszcz
  2008-12-15  2:17     ` Jon Nelson
  0 siblings, 1 reply; 24+ messages in thread
From: Justin Piszcz @ 2008-12-06 12:16 UTC (permalink / raw)
  To: Jon Nelson; +Cc: LinuxRaid



On Fri, 5 Dec 2008, Jon Nelson wrote:

> More info:
>
> according to /proc/mdstat (and /var/log/messages) the rebuild is complete:
>
> md10 : active raid1 sda[0] nbd0[1]
>      78123968 blocks [2/2] [UU]
>      bitmap: 0/150 pages [0KB], 256KB chunk

I have not tried using network block devices or the write-behind option;
however, Neil et al. will want to know:

- kernel version used
- mdadm version

in order to help track down the issues.
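
For example, something like:

uname -r
mdadm --version

should give both.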

Justin.



* Re: weird issues with raid1
  2008-12-06 12:16   ` Justin Piszcz
@ 2008-12-15  2:17     ` Jon Nelson
  0 siblings, 0 replies; 24+ messages in thread
From: Jon Nelson @ 2008-12-15  2:17 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: LinuxRaid

> I have not tried using network block devices or the write-behind option;
> however, Neil et al. will want to know:
>
> - kernel version used
> - mdadm version


Kernel: 2.6.25.18-0.2-default (stock openSUSE 11.0)
mdadm: tried both 2.6.4 (stock openSUSE 11.0) and 3.0-12.1 (from
opensuse factory)

-- 
Jon


* Re: weird issues with raid1
  2008-12-06  2:10 weird issues with raid1 Jon Nelson
  2008-12-06  2:46 ` Jon Nelson
@ 2008-12-15  6:00 ` Neil Brown
  2008-12-15 13:42   ` Jon Nelson
  2008-12-18  5:43   ` Neil Brown
  1 sibling, 2 replies; 24+ messages in thread
From: Neil Brown @ 2008-12-15  6:00 UTC (permalink / raw)
  To: Jon Nelson; +Cc: LinuxRaid

On Friday December 5, jnelson-linux-raid@jamponi.net wrote:
> I set up a raid1 between some devices, and have been futzing with it.
> I've been encountering all kinds of weird problems, including one
> which required me to reboot my machine.
> 
> This is long, sorry.
> 
> First, this is how I built the raid:
> 
> mdadm --create /dev/md10 --level=1 --raid-devices=2 --bitmap=internal
> /dev/sdd1 --write-mostly --write-behind missing

'write-behind' is a setting on the bitmap and applies to all
write-mostly devices, so it can be specified anywhere.
'write-mostly' is a setting that applies to a particular device, not
to a position in the array.  So setting 'write-mostly' on a 'missing'
device has no useful effect.  When you add a new device to the array
you will need to set 'write-mostly' on that if you want that feature.
i.e.
   mdadm /dev/md10 --add --write-mostly /dev/nbd0
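
For example (untested - the point being that the write-mostly flag goes
on the real device you add later, here the nbd device):

   mdadm --create /dev/md10 --level=1 --raid-devices=2 \
         --bitmap=internal --write-behind=256 /dev/sdd1 missing
   mdadm /dev/md10 --add --write-mostly /dev/nbd0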


> 
> then I added /dev/nbd0:
> 
> mdadm /dev/md10 --add /dev/nbd0
> 
> and it rebuilt just fine.

Good.

> 
> Then I failed and removed /dev/sdd1, and added /dev/sda:
> 
> mdadm /dev/md10 --fail /dev/sdd1 --remove /dev/sdd1
> mdadm /dev/md10 --add /dev/sda
> 
> I let it rebuild.
> 
> Then I failed, and removed it:
> 
> The --fail worked, but the --remove did not.
> 
> mdadm /dev/md10 --fail /dev/sda --remove /dev/sda
> mdadm: set /dev/sda faulty in /dev/md10
> mdadm: hot remove failed for /dev/sda: Device or resource busy

That is expected.  Marking a device as 'failed' does not immediately
disconnect it from the array.  You have to wait for any in-flight IO
requests to complete.
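
If you are scripting this, something like the following (untested)
would cope with that:

   mdadm /dev/md10 --fail /dev/sda
   until mdadm /dev/md10 --remove /dev/sda; do sleep 1; done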

> 
> Whaaa?
> So I tried again:
> 
> mdadm /dev/md10 --remove /dev/sda
> mdadm: hot removed /dev/sda

By now all those in-flight requests had completed and the device could
be removed.

> 
> OK. Better, but weird.
> Since I'm using bitmaps, I would expect --re-add to allow the rebuild
> to pick up where it left off. It was 78% done.

Nope.
With v0.90 metadata, a spare device is not marked as being part of the
array until it is fully recovered.  So if you interrupt a recovery
there is no record of how far it got.
With v1.0 metadata we do record how far the recovery has progressed
and it can restart.  However, I don't think that helps if you fail a
device - only if you stop the array and later restart it.

The bitmap is really about 'resync', not 'recovery'.
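
(For reference, a v1.0 array would be created with something like

   mdadm --create /dev/md10 --metadata=1.0 --level=1 --raid-devices=2 \
         --bitmap=internal /dev/sdd1 missing

rather than the 0.90 superblock you get by default.)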

> 
> ******
> Question 1:
> I'm using a bitmap. Why does the rebuild start completely over?

Because the bitmap isn't used to guide a rebuild, only a resync.

The effect of --re-add is to make md do a resync rather than a rebuild
if the device was previously a fully active member of the array.

> 
> 4% into the rebuild, this is what --examine-bitmap looks like for both
> components:
> 
>         Filename : /dev/sda
>            Magic : 6d746962
>          Version : 4
>             UUID : 542a0986:dd465da6:b224af07:ed28e4e5
>           Events : 500
>   Events Cleared : 496
>            State : OK
>        Chunksize : 256 KB
>           Daemon : 5s flush period
>       Write Mode : Allow write behind, max 256
>        Sync Size : 78123968 (74.50 GiB 80.00 GB)
>           Bitmap : 305172 bits (chunks), 305172 dirty (100.0%)
> 
> turnip:~ # mdadm --examine-bitmap /dev/nbd0
>         Filename : /dev/nbd0
>            Magic : 6d746962
>          Version : 4
>             UUID : 542a0986:dd465da6:b224af07:ed28e4e5
>           Events : 524
>   Events Cleared : 496
>            State : OK
>        Chunksize : 256 KB
>           Daemon : 5s flush period
>       Write Mode : Allow write behind, max 256
>        Sync Size : 78123968 (74.50 GiB 80.00 GB)
>           Bitmap : 305172 bits (chunks), 0 dirty (0.0%)
> 
> 
> No matter how long I wait, until it is rebuilt, the bitmap on /dev/sda
> is always 100% dirty.
> If I --fail, --remove (twice) /dev/sda, and I re-add /dev/sdd1, it
> clearly uses the bitmap and re-syncs in under 1 second.

Yes, there is a bug here.
When an array recovers on to a hot spare it doesn't copy the bitmap
across.  That will only happen lazily as bits are updated.
I'm surprised I hadn't noticed that before, so there might be more to
this than I'm seeing at the moment.  But I definitely cannot find
code to copy the bitmap across.  I'll have to have a think about
that.

> 
> 
> ***************
> Question 2: mdadm --detail and cat /proc/mdstat do not agree:
> 
> NOTE: mdadm --detail says the rebuild status is 0% complete, but cat
> /proc/mdstat shows it as 7%.
> A bit later, I check again and they both agree - 14%.
> Below, from when the rebuild was 7% according to /proc/mdstat

I cannot explain this except to wonder if 7% of the recovery
completed between running "mdadm -D" and "cat /proc/mdstat".

The number reported by "mdadm -D" is obtained by reading /proc/mdstat
and applying "atoi()" to the string that ends with a '%'.

NeilBrown


> 
> /dev/md10:
>         Version : 00.90.03
>   Creation Time : Fri Dec  5 07:44:41 2008
>      Raid Level : raid1
>      Array Size : 78123968 (74.50 GiB 80.00 GB)
>   Used Dev Size : 78123968 (74.50 GiB 80.00 GB)
>    Raid Devices : 2
>   Total Devices : 2
> Preferred Minor : 10
>     Persistence : Superblock is persistent
> 
>   Intent Bitmap : Internal
> 
>     Update Time : Fri Dec  5 20:04:30 2008
>           State : active, degraded, recovering
>  Active Devices : 1
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 1
> 
>  Rebuild Status : 0% complete
> 
>            UUID : 542a0986:dd465da6:b224af07:ed28e4e5
>          Events : 0.544
> 
>     Number   Major   Minor   RaidDevice State
>        2       8        0        0      spare rebuilding   /dev/sda
>        1      43        0        1      active sync   /dev/nbd0
> 
> 
> md10 : active raid1 sda[2] nbd0[1]
>       78123968 blocks [2/1] [_U]
>       [==>..................]  recovery = 13.1% (10283392/78123968)
> finish=27.3min speed=41367K/sec
>       bitmap: 0/150 pages [0KB], 256KB chunk
> 
> 
> 
> -- 
> Jon


* Re: weird issues with raid1
  2008-12-15  6:00 ` Neil Brown
@ 2008-12-15 13:42   ` Jon Nelson
  2008-12-15 21:33     ` Neil Brown
  2008-12-18  5:43   ` Neil Brown
  1 sibling, 1 reply; 24+ messages in thread
From: Jon Nelson @ 2008-12-15 13:42 UTC (permalink / raw)
  To: Neil Brown; +Cc: LinuxRaid

On Mon, Dec 15, 2008 at 12:00 AM, Neil Brown <neilb@suse.de> wrote:
> On Friday December 5, jnelson-linux-raid@jamponi.net wrote:
>> I set up a raid1 between some devices, and have been futzing with it.
>> I've been encountering all kinds of weird problems, including one
>> which required me to reboot my machine.
>>
>> This is long, sorry.
>>
>> First, this is how I built the raid:
>>
>> mdadm --create /dev/md10 --level=1 --raid-devices=2 --bitmap=internal
>> /dev/sdd1 --write-mostly --write-behind missing
>
> 'write-behind' is a setting on the bitmap and applies to all
> write-mostly devices, so it can be specified anywhere.
> 'write-mostly' is a setting that applies to a particular device, not
> to a position in the array.  So setting 'write-mostly' on a 'missing'
> device has no useful effect.  When you add a new device to the array
> you will need to set 'write-mostly' on that if you want that feature.

Aha! Good to know.

>   mdadm /dev/md10 --add --write-mostly /dev/nbd0

..

>> Then I failed and removed /dev/sdd1, and added /dev/sda:
>>
>> mdadm /dev/md10 --fail /dev/sdd1 --remove /dev/sdd1
>> mdadm /dev/md10 --add /dev/sda
>>
>> I let it rebuild.
>>
>> Then I failed, and removed it:
>>
>> The --fail worked, but the --remove did not.
>>
>> mdadm /dev/md10 --fail /dev/sda --remove /dev/sda
>> mdadm: set /dev/sda faulty in /dev/md10
>> mdadm: hot remove failed for /dev/sda: Device or resource busy
>
> That is expected.  Marking a device as 'failed' does not immediately
> disconnect it from the array.  You have to wait for any in-flight IO
> requests to complete.

Aha! Got it.

>> OK. Better, but weird.
>> Since I'm using bitmaps, I would expect --re-add to allow the rebuild
>> to pick up where it left off. It was 78% done.
>
> Nope.
> With v0.90 metadata, a spare device is not marked as being part of the
> array until it is fully recovered.  So if you interrupt a recovery
> there is no record of how far it got.
> With v1.0 metadata we do record how far the recovery has progressed
> and it can restart.  However, I don't think that helps if you fail a
> device - only if you stop the array and later restart it.
>
> The bitmap is really about 'resync', not 'recovery'.

OK, so task 1: switch to 1.0 (1.1, 1.2) metadata. That's going to
happen as soon as my raid10,f2 'check' is complete.

However, it raises a question: bitmaps are about 'resync' not
'recovery'?  How do they differ?

>> Question 1:
>> I'm using a bitmap. Why does the rebuild start completely over?
>
> Because the bitmap isn't used to guide a rebuild, only a resync.
>
> The effect of --re-add is to make md do a resync rather than a rebuild
> if the device was previously a fully active member of the array.

Aha!  This explains a question I raised in another email. What
happened there is a previously fully active member of the raid got
added, somehow, as a spare, via --incremental. That's when the entire
raid thought it needed to be rebuilt. How did that (the device being
treated as a spare instead of as a previously fully active member)
happen?

>> 4% into the rebuild, this is what --examine-bitmap looks like for both
>> components:
>>
>>         Filename : /dev/sda
>>            Magic : 6d746962
>>          Version : 4
>>             UUID : 542a0986:dd465da6:b224af07:ed28e4e5
>>           Events : 500
>>   Events Cleared : 496
>>            State : OK
>>        Chunksize : 256 KB
>>           Daemon : 5s flush period
>>       Write Mode : Allow write behind, max 256
>>        Sync Size : 78123968 (74.50 GiB 80.00 GB)
>>           Bitmap : 305172 bits (chunks), 305172 dirty (100.0%)
>>
>> turnip:~ # mdadm --examine-bitmap /dev/nbd0
>>         Filename : /dev/nbd0
>>            Magic : 6d746962
>>          Version : 4
>>             UUID : 542a0986:dd465da6:b224af07:ed28e4e5
>>           Events : 524
>>   Events Cleared : 496
>>            State : OK
>>        Chunksize : 256 KB
>>           Daemon : 5s flush period
>>       Write Mode : Allow write behind, max 256
>>        Sync Size : 78123968 (74.50 GiB 80.00 GB)
>>           Bitmap : 305172 bits (chunks), 0 dirty (0.0%)
>>
>>
>> No matter how long I wait, until it is rebuilt, the bitmap on /dev/sda
>> is always 100% dirty.
>> If I --fail, --remove (twice) /dev/sda, and I re-add /dev/sdd1, it
>> clearly uses the bitmap and re-syncs in under 1 second.
>
> Yes, there is a bug here.
> When an array recovers on to a hot spare it doesn't copy the bitmap
> across.  That will only happen lazily as bits are updated.
> I'm surprised I hadn't noticed that before, so there might be more to
> this than I'm seeing at the moment.  But I definitely cannot find
> code to copy the bitmap across.  I'll have to have a think about
> that.

ok.

>> Question 2: mdadm --detail and cat /proc/mdstat do not agree:
>>
>> NOTE: mdadm --detail says the rebuild status is 0% complete, but cat
>> /proc/mdstat shows it as 7%.
>> A bit later, I check again and they both agree - 14%.
>> Below, from when the rebuild was 7% according to /proc/mdstat
>
> I cannot explain this except to wonder if 7% of the recovery
> completed between running "mdadm -D" and "cat /proc/mdstat".
>
> The number reported by "mdadm -D" is obtained by reading /proc/mdstat
> and applying "atoi()" to the string that ends with a '%'.

OK. As I see it, there are three issues here:

1. somehow a previously fully-active member got re-added (via
--incremental) as a spare instead of simply being re-added, forcing a
full rebuild.

2. new raid member bitmap weirdness (the bitmap doesn't get copied
over on new members, causing all sorts of weirdness).

3. The unexplained difference between mdadm --detail and cat /proc/mdstat

I have a few more questions / observations I'd like to make but I'll
do those in another email.

Thanks for your response(s)!

-- 
Jon


* Re: weird issues with raid1
  2008-12-15 13:42   ` Jon Nelson
@ 2008-12-15 21:33     ` Neil Brown
  2008-12-15 21:47       ` Jon Nelson
  0 siblings, 1 reply; 24+ messages in thread
From: Neil Brown @ 2008-12-15 21:33 UTC (permalink / raw)
  To: Jon Nelson; +Cc: LinuxRaid

On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
> 
> However, it raises a question: bitmaps are about 'resync' not
> 'recovery'?  How do they differ?

With resync, the expectation is that most of the device is correct.
The bitmap tells us which sectors aren't, and we just resync those.

With recovery, the expectation is that the entire drive contains
garbage and it has to be recovered from beginning to end.

Each device has a flag to say whether the device is in sync with the
array.  The bitmap records which sectors of "in-sync" devices may not
actually be in sync at the moment.
'resync' synchronises the 'in-sync' devices.
'recovery' synchronises a 'not-in-sync' device.
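
You can see both with the tools you have already been using, e.g.:

   mdadm --detail /dev/md10          # per-device state: "active sync" vs "spare rebuilding"
   mdadm --examine-bitmap /dev/sda   # which chunks of an in-sync device are dirty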


> 
> >> Question 1:
> >> I'm using a bitmap. Why does the rebuild start completely over?
> >
> > Because the bitmap isn't used to guide a rebuild, only a resync.
> >
> > The effect of --re-add is to make md do a resync rather than a rebuild
> > if the device was previously a fully active member of the array.
> 
> Aha!  This explains a question I raised in another email. What
> happened there is a previously fully active member of the raid got
> added, somehow, as a spare, via --incremental. That's when the entire
> raid thought it needed to be rebuilt. How did that (the device being
> treated as a spare instead of as a previously fully active member)
> happen?

It is hard to guess without details, and they might be hard to collect
after the fact.
Maybe if you have the kernel logs of when the server rebooted and the
recovery started, that might contain some hints.
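
Something like

   grep -C3 -iE 'md[0-9]+|raid|nbd' /var/log/messages

(adjust the pattern to your setup) usually pulls out the interesting
lines together with a little context.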

Thanks,
NeilBrown


* Re: weird issues with raid1
  2008-12-15 21:33     ` Neil Brown
@ 2008-12-15 21:47       ` Jon Nelson
  2008-12-16  1:21         ` Neil Brown
  0 siblings, 1 reply; 24+ messages in thread
From: Jon Nelson @ 2008-12-15 21:47 UTC (permalink / raw)
  To: Neil Brown; +Cc: LinuxRaid

On Mon, Dec 15, 2008 at 3:33 PM, Neil Brown <neilb@suse.de> wrote:
> On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
>>
>> However, it raises a question: bitmaps are about 'resync' not
>> 'recovery'?  How do they differ?
>
> With resync, the expectation is that most of the device is correct.
> The bitmap tells us which sectors aren't, and we just resync those.
>
> With recovery, the expectation is that the entire drive contains
> garbage and it has to be recovered from beginning to end.
>
> Each device has a flag to say whether the device is in sync with the
> array.  The bitmap records which sectors of "in-sync" devices may not
> actually be in sync at the moment.
> 'resync' synchronises the 'in-sync' devices.
> 'recovery' synchronises a 'not-in-sync' device.
>
>
>>
>> >> Question 1:
>> >> I'm using a bitmap. Why does the rebuild start completely over?
>> >
>> > Because the bitmap isn't used to guide a rebuild, only a resync.
>> >
>> > The effect of --re-add is to make md do a resync rather than a rebuild
>> > if the device was previously a fully active member of the array.
>>
>> Aha!  This explains a question I raised in another email. What
>> happened there is a previously fully active member of the raid got
>> added, somehow, as a spare, via --incremental. That's when the entire
>> raid thought it needed to be rebuilt. How did that (the device being
>> treated as a spare instead of as a previously fully active member)
>> happen?
>
> It is hard to guess without details, and they might be hard to collect
> after the fact.
> Maybe if you have the kernel logs of when the server rebooted and the
> recovery started, that might contain some hints.

I hope this helps.

Prior to the reboot:

Dec 15 15:19:39 turnip kernel: md: md11: recovery done.
Dec 15 15:19:39 turnip kernel: RAID1 conf printout:
Dec 15 15:19:39 turnip kernel:  --- wd:2 rd:2
Dec 15 15:19:39 turnip kernel:  disk 0, wo:0, o:1, dev:nbd0
Dec 15 15:19:39 turnip kernel:  disk 1, wo:0, o:1, dev:sda

During booting:

<6>raid1: raid set md11 active with 1 out of 2 mirrors
<6>md11: bitmap initialized from disk: read 1/1 pages, set 1 bits
<6>created bitmap (10 pages) for device md11

After boot:

Dec 15 15:34:38 turnip kernel: md: bind<nbd0>
Dec 15 15:34:38 turnip kernel: RAID1 conf printout:
Dec 15 15:34:38 turnip kernel:  --- wd:1 rd:2
Dec 15 15:34:38 turnip kernel:  disk 0, wo:1, o:1, dev:nbd0
Dec 15 15:34:38 turnip kernel:  disk 1, wo:0, o:1, dev:sda
Dec 15 15:34:38 turnip kernel: md: recovery of RAID array md11
Dec 15 15:34:38 turnip kernel: md: minimum _guaranteed_  speed: 1000
KB/sec/disk.
Dec 15 15:34:38 turnip kernel: md: using maximum available idle IO
bandwidth (but not more than 200000 KB/sec) for recovery.
Dec 15 15:34:38 turnip kernel: md: using 128k window, over a total of
78123988 blocks.

/dev/nbd0 was added via --incremental (mdadm 3.0)


--detail:

/dev/md11:
        Version : 01.00.03
  Creation Time : Mon Dec 15 07:06:13 2008
     Raid Level : raid1
     Array Size : 78123988 (74.50 GiB 80.00 GB)
  Used Dev Size : 156247976 (149.01 GiB 160.00 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 11
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Dec 15 15:35:17 2008
          State : active, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 9% complete

           Name : turnip:11  (local to host turnip)
           UUID : cf24d099:9e174a79:2a2f6797:dcff1420
         Events : 3914

    Number   Major   Minor   RaidDevice State
       2      43        0        0      spare rebuilding   /dev/nbd0
       3       8        0        1      active sync   /dev/sda


turnip:~ # mdadm --examine /dev/sda
/dev/sda:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : cf24d099:9e174a79:2a2f6797:dcff1420
           Name : turnip:11  (local to host turnip)
  Creation Time : Mon Dec 15 07:06:13 2008
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 160086384 (76.34 GiB 81.96 GB)
     Array Size : 156247976 (74.50 GiB 80.00 GB)
  Used Dev Size : 156247976 (74.50 GiB 80.00 GB)
   Super Offset : 160086512 sectors
          State : clean
    Device UUID : 0059434c:ecef51a0:2974482d:ba38f944

Internal Bitmap : 2 sectors from superblock
    Update Time : Mon Dec 15 15:45:21 2008
       Checksum : 21396863 - correct
         Events : 3916


    Array Slot : 3 (failed, failed, empty, 1)
   Array State : _U 2 failed
turnip:~ #

turnip:~ # mdadm --examine /dev/nbd0
/dev/nbd0:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : cf24d099:9e174a79:2a2f6797:dcff1420
           Name : turnip:11  (local to host turnip)
  Creation Time : Mon Dec 15 07:06:13 2008
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 160086384 (76.34 GiB 81.96 GB)
     Array Size : 156247976 (74.50 GiB 80.00 GB)
  Used Dev Size : 156247976 (74.50 GiB 80.00 GB)
   Super Offset : 160086512 sectors
          State : clean
    Device UUID : 01524a75:c309869c:6da972c9:084115c6

Internal Bitmap : 2 sectors from superblock
      Flags : write-mostly
    Update Time : Mon Dec 15 15:45:21 2008
       Checksum : 63bab8ce - correct
         Events : 3916


    Array Slot : 2 (failed, failed, empty, 1)
   Array State : _u 2 failed
turnip:~ #



Thanks!!

-- 
Jon


* Re: weird issues with raid1
  2008-12-15 21:47       ` Jon Nelson
@ 2008-12-16  1:21         ` Neil Brown
  2008-12-16  2:32           ` Jon Nelson
  2008-12-18  4:42           ` Neil Brown
  0 siblings, 2 replies; 24+ messages in thread
From: Neil Brown @ 2008-12-16  1:21 UTC (permalink / raw)
  To: Jon Nelson; +Cc: LinuxRaid

On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
> On Mon, Dec 15, 2008 at 3:33 PM, Neil Brown <neilb@suse.de> wrote:
> > On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
> >>
> >> Aha!  This explains a question I raised in another email. What
> >> happened there is a previously fully active member of the raid got
> >> added, somehow, as a spare, via --incremental. That's when the entire
> >> raid thought it needed to be rebuilt. How did that (the device being
> >> treated as a spare instead of as a previously fully active member)
> >> happen?
> >
> > It is hard to guess without details, and they might be hard to collect
> > after the fact.
> > Maybe if you have the kernel logs of when the server rebooted and the
> > recovery started, that might contain some hints.
> 
> I hope this helps.

Yes it does, though I generally prefer to get more complete logs.  If
I get the surrounding log lines then I know what isn't there as well
as what is - and it isn't always clear at first which bits will be
important. 

The problem here is that --incremental doesn't provide the --re-add
functionality that you are depending on.  That was an oversight on my
part.  I'll see if I can get it fixed.
In the mean time, you'll need to use --re-add (or --add, it does the
same thing in your situation) to add nbd0 to the array.
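
i.e.

   mdadm /dev/md11 --re-add /dev/nbd0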

NeilBrown


> 
> Prior to the reboot:
> 
> Dec 15 15:19:39 turnip kernel: md: md11: recovery done.
> Dec 15 15:19:39 turnip kernel: RAID1 conf printout:
> Dec 15 15:19:39 turnip kernel:  --- wd:2 rd:2
> Dec 15 15:19:39 turnip kernel:  disk 0, wo:0, o:1, dev:nbd0
> Dec 15 15:19:39 turnip kernel:  disk 1, wo:0, o:1, dev:sda
> 
> During booting:
> 
> <6>raid1: raid set md11 active with 1 out of 2 mirrors
> <6>md11: bitmap initialized from disk: read 1/1 pages, set 1 bits
> <6>created bitmap (10 pages) for device md11
> 
> After boot:
> 
> Dec 15 15:34:38 turnip kernel: md: bind<nbd0>
> Dec 15 15:34:38 turnip kernel: RAID1 conf printout:
> Dec 15 15:34:38 turnip kernel:  --- wd:1 rd:2
> Dec 15 15:34:38 turnip kernel:  disk 0, wo:1, o:1, dev:nbd0
> Dec 15 15:34:38 turnip kernel:  disk 1, wo:0, o:1, dev:sda
> Dec 15 15:34:38 turnip kernel: md: recovery of RAID array md11
> Dec 15 15:34:38 turnip kernel: md: minimum _guaranteed_  speed: 1000
> KB/sec/disk.
> Dec 15 15:34:38 turnip kernel: md: using maximum available idle IO
> bandwidth (but not more than 200000 KB/sec) for recovery.
> Dec 15 15:34:38 turnip kernel: md: using 128k window, over a total of
> 78123988 blocks.
> 
> /dev/nbd0 was added via --incremental (mdadm 3.0)
> 


* Re: weird issues with raid1
  2008-12-16  1:21         ` Neil Brown
@ 2008-12-16  2:32           ` Jon Nelson
  2008-12-18  4:42           ` Neil Brown
  1 sibling, 0 replies; 24+ messages in thread
From: Jon Nelson @ 2008-12-16  2:32 UTC (permalink / raw)
  To: Neil Brown; +Cc: LinuxRaid

On Mon, Dec 15, 2008 at 7:21 PM, Neil Brown <neilb@suse.de> wrote:
> On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
>> I hope this helps.
>
> Yes it does, though I generally prefer to get more complete logs.  If
> I get the surrounding log lines then I know what isn't there as well
> as what is - and it isn't always clear at first which bits will be
> important.

Quite literally, the rest of /var/log/messages was unrelated stuff
(dhcp, etc.). However, I'll try to include more context next time.

> The problem here is that --incremental doesn't provide the --re-add
> functionality that you are depending on.  That was an oversight on my
> part.  I'll see if I can get it fixed.
> In the mean time, you'll need to use --re-add (or --add, it does the
> same thing in your situation) to add nbd0 to the array.

Why does it usually work as though I *had* used --re-add (and
specified the right array)?


-- 
Jon


* Re: weird issues with raid1
  2008-12-16  1:21         ` Neil Brown
  2008-12-16  2:32           ` Jon Nelson
@ 2008-12-18  4:42           ` Neil Brown
  2008-12-18  4:50             ` Jon Nelson
  1 sibling, 1 reply; 24+ messages in thread
From: Neil Brown @ 2008-12-18  4:42 UTC (permalink / raw)
  To: Jon Nelson, LinuxRaid

On Tuesday December 16, neilb@suse.de wrote:
> On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
> > On Mon, Dec 15, 2008 at 3:33 PM, Neil Brown <neilb@suse.de> wrote:
> > > On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
> > >>
> > >> Aha!  This explains a question I raised in another email. What
> > >> happened there is a previously fully active member of the raid got
> > >> added, somehow, as a spare, via --incremental. That's when the entire
> > >> raid thought it needed to be rebuilt. How did that (the device being
> > >> treated as a spare instead of as a previously fully active member)
> > >> happen?
> > >
> > > It is hard to guess without details, and they might be hard to collect
> > > after the fact.
> > > Maybe if you have the kernel logs of when the server rebooted and the
> > > recovery started, that might contain some hints.
> > 
> > I hope this helps.
> 
> Yes it does, though I generally prefer to get more complete logs.  If
> I get the surrounding log lines then I know what isn't there as well
> as what is - and it isn't always clear at first which bits will be
> important. 
> 
> The problem here is that --incremental doesn't provide the --re-add
> functionality that you are depending on.  That was an oversight on my
> part.  I'll see if I can get it fixed.
> In the mean time, you'll need to use --re-add (or --add, it does the
> same thing in your situation) to add nbd0 to the array.

Actually, I'm wrong.
--incremental does do the right thing w.r.t. --re-add.
I couldn't reproduce your symptoms.

It could be that you are hitting the bug fixed by 
  commit a0da84f35b25875870270d16b6eccda4884d61a7

You would need 2.6.26 or later to have that fixed.
Can you try with a newer kernel???

NeilBrown



* Re: weird issues with raid1
  2008-12-18  4:42           ` Neil Brown
@ 2008-12-18  4:50             ` Jon Nelson
  2008-12-18  4:55               ` Jon Nelson
  0 siblings, 1 reply; 24+ messages in thread
From: Jon Nelson @ 2008-12-18  4:50 UTC (permalink / raw)
  To: Neil Brown; +Cc: LinuxRaid

On Wed, Dec 17, 2008 at 10:42 PM, Neil Brown <neilb@suse.de> wrote:
> On Tuesday December 16, neilb@suse.de wrote:
>> On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
>> > On Mon, Dec 15, 2008 at 3:33 PM, Neil Brown <neilb@suse.de> wrote:
>> > > On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
>> > >>
>> > >> Aha!  This explains a question I raised in another email. What
>> > >> happened there is a previously fully active member of the raid got
>> > >> added, somehow, as a spare, via --incremental. That's when the entire
>> > >> raid thought it needed to be rebuilt. How did that (the device being
>> > >> treated as a spare instead of as a previously fully active member)
>> > >> happen?
>> > >
>> > > It is hard to guess without details, and they might be hard to collect
>> > > after the fact.
>> > > Maybe if you have the kernel logs of when the server rebooted and the
>> > > recovery started, that might contain some hints.
>> >
>> > I hope this helps.
>>
>> Yes it does, though I generally prefer to get more complete logs.  If
>> I get the surrounding log lines then I know what isn't there as well
>> as what is - and it isn't always clear at first which bits will be
>> important.
>>
>> The problem here is that --incremental doesn't provide the --re-add
>> functionality that you are depending on.  That was an oversight on my
>> part.  I'll see if I can get it fixed.
>> In the mean time, you'll need to use --re-add (or --add, it does the
>> same thing in your situation) to add nbd0 to the array.
>
> Actually, I'm wrong.
> --incremental does do the right thing w.r.t. --re-add.
> I couldn't reproduce your symptoms.

OK.

> It could be that you are hitting the bug fixed by
>  commit a0da84f35b25875870270d16b6eccda4884d61a7

That sure sounds like it. I'd have to log to see what happened,
exactly, but I've added substantial logging around the device
discovery and addition section which manages this particular raid.

> You would need 2.6.26 or later to have that fixed.
> Can you try with a newer kernel???

I hope to be giving opensuse 11.1 a try soon, which uses 2.6.27.X
afaik.  I suspect I can also backport that patch to 2.6.25 easily.
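
(Probably something along the lines of

  git show a0da84f35b25875870270d16b6eccda4884d61a7 > /tmp/bitmap-fix.patch
  cd /usr/src/linux-2.6.25.18-0.2
  patch -p1 --dry-run < /tmp/bitmap-fix.patch

using a mainline git checkout for the first step - untested.)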



-- 
Jon


* Re: weird issues with raid1
  2008-12-18  4:50             ` Jon Nelson
@ 2008-12-18  4:55               ` Jon Nelson
  2008-12-18  5:17                 ` Neil Brown
  0 siblings, 1 reply; 24+ messages in thread
From: Jon Nelson @ 2008-12-18  4:55 UTC (permalink / raw)
  To: Neil Brown; +Cc: LinuxRaid

On Wed, Dec 17, 2008 at 10:50 PM, Jon Nelson
<jnelson-linux-raid@jamponi.net> wrote:
> On Wed, Dec 17, 2008 at 10:42 PM, Neil Brown <neilb@suse.de> wrote:
>> On Tuesday December 16, neilb@suse.de wrote:
>>> On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
>>> > On Mon, Dec 15, 2008 at 3:33 PM, Neil Brown <neilb@suse.de> wrote:
>>> > > On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
>>> > >>
>>> > >> Aha!  This explains a question I raised in another email. What
>>> > >> happened there is a previously fully active member of the raid got
>>> > >> added, somehow, as a spare, via --incremental. That's when the entire
>>> > >> raid thought it needed to be rebuilt. How did that (the device being
>>> > >> treated as a spare instead of as a previously fully active member)
>>> > >> happen?
>>> > >
>>> > > It is hard to guess without details, and they might be hard to collect
>>> > > after the fact.
>>> > > Maybe if you have the kernel logs of when the server rebooted and the
>>> > > recovery started, that might contain some hints.
>>> >
>>> > I hope this helps.
>>>
>>> Yes it does, though I generally prefer to get more complete logs.  If
>>> I get the surrounding log lines then I know what isn't there as well
>>> as what is - and it isn't always clear at first which bits will be
>>> important.
>>>
>>> The problem here is that --incremental doesn't provide the --re-add
>>> functionality that you are depending on.  That was an oversight on my
>>> part.  I'll see if I can get it fixed.
>>> In the mean time, you'll need to use --re-add (or --add, it does the
>>> same thing in your situation) to add nbd0 to the array.
>>
>> Actually, I'm wrong.
>> --incremental does do the right thing w.r.t. --re-add.
>> I couldn't reproduce your symptoms.
>
> OK.
>
>> It could be that you are hitting the bug fixed by
>>  commit a0da84f35b25875870270d16b6eccda4884d61a7
>
> That sure sounds like it. I'd have to log to see what happened,
> exactly, but I've added substantial logging around the device
> discovery and addition section which manages this particular raid.
>
>> You would need 2.6.26 or later to have that fixed.
>> Can you try with a newer kernel???
>
> I hope to be giving opensuse 11.1 a try soon, which uses 2.6.27.X
> afaik.  I suspect I can also backport that patch to 2.6.25 easily.

The kernel source for 2.6.25.18-0.2 (from suse) has this patch
already, so I was already using it.

Perhaps this weekend or some night this week I'll find time to try to
break things again.

-- 
Jon


* Re: weird issues with raid1
  2008-12-18  4:55               ` Jon Nelson
@ 2008-12-18  5:17                 ` Neil Brown
  2008-12-18  5:47                   ` Jon Nelson
  0 siblings, 1 reply; 24+ messages in thread
From: Neil Brown @ 2008-12-18  5:17 UTC (permalink / raw)
  To: Jon Nelson; +Cc: LinuxRaid

On Wednesday December 17, jnelson-linux-raid@jamponi.net wrote:
> >
> >> It could be that you are hitting the bug fixed by
> >>  commit a0da84f35b25875870270d16b6eccda4884d61a7
> >
> > That sure sounds like it. I'd have to log to see what happened,
> > exactly, but I've added substantial logging around the device
> > discovery and addition section which manages this particular raid.
> >
> >> You would need 2.6.26 or later to have that fixed.
> >> Can you try with a newer kernel???
> >
> > I hope to be giving opensuse 11.1 a try soon, which uses 2.6.27.X
> > afaik.  I suspect I can also backport that patch to 2.6.25 easily.
> 
> The kernel source for 2.6.25.18-0.2 (from suse) has this patch
> already, so I was already using it.

Are you sure?  I just looked in the openSUSE-11.0 kernel tree and I
cannot see it there....

NeilBrown


> 
> Perhaps this weekend or some night this week I'll find time to try to
> break things again.
> 
> -- 
> Jon


* Re: weird issues with raid1
  2008-12-15  6:00 ` Neil Brown
  2008-12-15 13:42   ` Jon Nelson
@ 2008-12-18  5:43   ` Neil Brown
  2008-12-18  5:54     ` Jon Nelson
  1 sibling, 1 reply; 24+ messages in thread
From: Neil Brown @ 2008-12-18  5:43 UTC (permalink / raw)
  To: Jon Nelson, LinuxRaid

On Monday December 15, neilb@suse.de wrote:
> > 
> > No matter how long I wait, until it is rebuilt, the bitmap on /dev/sda
> > is always 100% dirty.
> > If I --fail, --remove (twice) /dev/sda, and I re-add /dev/sdd1, it
> > clearly uses the bitmap and re-syncs in under 1 second.
> 
> Yes, there is a bug here.
> When an array recovers on to a hot spare it doesn't copy the bitmap
> across.  That will only happen lazily as bits are updated.
> I'm surprised I hadn't noticed that before, so there might be more to
> this than I'm seeing at the moment.  But I definitely cannot find
> code to copy the bitmap across.  I'll have to have a think about
> that.

There isn't a bug here, I was wrong.

We don't update the bitmap on recovery until the recovery is
complete.  Once it is complete we do (as you notice) update it all at
once.
This is correct behaviour because until the recovery is complete, the
new device isn't really part of the array, so the bitmap on it doesn't
mean anything.  As soon as the device is flagged as 'InSync' we update
the bitmap on it.

NeilBrown


* Re: weird issues with raid1
  2008-12-18  5:17                 ` Neil Brown
@ 2008-12-18  5:47                   ` Jon Nelson
  2008-12-18  6:21                     ` Neil Brown
  0 siblings, 1 reply; 24+ messages in thread
From: Jon Nelson @ 2008-12-18  5:47 UTC (permalink / raw)
  To: Neil Brown; +Cc: LinuxRaid

On Wed, Dec 17, 2008 at 11:17 PM, Neil Brown <neilb@suse.de> wrote:
> On Wednesday December 17, jnelson-linux-raid@jamponi.net wrote:
>> >
>> >> It could be that you are hitting the bug fixed by
>> >>  commit a0da84f35b25875870270d16b6eccda4884d61a7
>> >
>> > That sure sounds like it. I'd have to log to see what happened,
>> > exactly, but I've added substantial logging around the device
>> > discovery and addition section which manages this particular raid.
>> >
>> >> You would need 2.6.26 or later to have that fixed.
>> >> Can you try with a newer kernel???
>> >
>> > I hope to be giving opensuse 11.1 a try soon, which uses 2.6.27.X
>> > afaik.  I suspect I can also backport that patch to 2.6.25 easily.
>>
>> The kernel source for 2.6.25.18-0.2 (from suse) has this patch
>> already, so I was already using it.
>
> Are you sure?  I just looked in the openSUSE-11.0 kernel tree and I
> cannot see it there....
>
> NeilBrown
>
>
>>
>> Perhaps this weekend or some night this week I'll find time to try to
>> break things again.
>>
>> --
>> Jon
>

jnelson@turnip:~/kernels> rpm -qf /usr/src/linux-2.6.25.18-0.2
kernel-source-2.6.25.18-0.2
jnelson@turnip:~/kernels> rpm -V kernel-source-2.6.25.18-0.2
jnelson@turnip:~/kernels> (cd linux-2.6 && git diff
a0da84f35b25875870270d16b6eccda4884d61a7
a0da84f35b25875870270d16b6eccda4884d61a7^ ) > d.diff
jnelson@turnip:~/kernels> head d.diff
diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index dedba16..b26927c 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -454,11 +454,8 @@ void bitmap_update_sb(struct bitmap *bitmap)
        spin_unlock_irqrestore(&bitmap->lock, flags);
        sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
        sb->events = cpu_to_le64(bitmap->mddev->events);
-       if (bitmap->mddev->events < bitmap->events_cleared) {
-               /* rocking back to read-only */
jnelson@turnip:~/kernels> cp -r /usr/src/linux-2.6.25.18-0.2 .
jnelson@turnip:~/kernels/linux-2.6.25.18-0.2> tail -n +454
drivers/md/bitmap.c | head -n 20
{
        bitmap_super_t *sb;
        unsigned long flags;

        if (!bitmap || !bitmap->mddev) /* no bitmap for this array */
                return;
        spin_lock_irqsave(&bitmap->lock, flags);
        if (!bitmap->sb_page) { /* no superblock */
                spin_unlock_irqrestore(&bitmap->lock, flags);
                return;
        }
        spin_unlock_irqrestore(&bitmap->lock, flags);
        sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
        sb->events = cpu_to_le64(bitmap->mddev->events);
        if (!bitmap->mddev->degraded)
                sb->events_cleared = cpu_to_le64(bitmap->mddev->events);
        kunmap_atomic(sb, KM_USER0);
        write_page(bitmap, bitmap->sb_page, 1);
}



When I view the diff and the source they appear to agree.

-- 
Jon


* Re: weird issues with raid1
  2008-12-18  5:43   ` Neil Brown
@ 2008-12-18  5:54     ` Jon Nelson
  0 siblings, 0 replies; 24+ messages in thread
From: Jon Nelson @ 2008-12-18  5:54 UTC (permalink / raw)
  To: Neil Brown; +Cc: LinuxRaid

On Wed, Dec 17, 2008 at 11:43 PM, Neil Brown <neilb@suse.de> wrote:
> On Monday December 15, neilb@suse.de wrote:
>> >
>> > No matter how long I wait, until it is rebuilt, the bitmap on /dev/sda
>> > is always 100% dirty.
>> > If I --fail, --remove (twice) /dev/sda, and I re-add /dev/sdd1, it
>> > clearly uses the bitmap and re-syncs in under 1 second.
>>
>> Yes, there is a bug here.
>> When an array recovers on to a hot spare it doesn't copy the bitmap
>> across.  That will only happen lazily as bits are updated.
>> I'm surprised I hadn't noticed that before, so there might be more to
>> this than I'm seeing at the moment.  But I definitely cannot find
>> code to copy the bitmap across.  I'll have to have a think about
>> that.
>
> There isn't a bug here, I was wrong.
>
> We don't update the bitmap on recovery until the recovery is
> complete.  Once it is complete we do (as you notice) update it all at
> once.
> This is correct behaviour because until the recovery is complete, the
> new device isn't really part of the array so the bitmap on it doesn't
> mean anything.  As soon as the array is flagged as 'InSync' we update
> the bitmap on it.

OK. Fair enough, except for some issues I've had with the bitmap /not/
getting updated at all, ever, on the replacement device. That's a
whole 'nother thread, though.

However, I would argue that it's *kinda* part of the array.

If I were rebuilding some huge array, and it was 99% done and some
issue developed (and was resolved), I would not want to start over.

How do you feel about copying over the bitmap right away and marking
all of the bits out-of-date, then letting the normal bitmappy stuff
work to our advantage?

-- 
Jon


* Re: weird issues with raid1
  2008-12-18  5:47                   ` Jon Nelson
@ 2008-12-18  6:21                     ` Neil Brown
  2008-12-19  2:15                       ` Jon Nelson
  0 siblings, 1 reply; 24+ messages in thread
From: Neil Brown @ 2008-12-18  6:21 UTC (permalink / raw)
  To: Jon Nelson; +Cc: LinuxRaid

On Wednesday December 17, jnelson-linux-raid@jamponi.net wrote:
> On Wed, Dec 17, 2008 at 11:17 PM, Neil Brown <neilb@suse.de> wrote:
> > On Wednesday December 17, jnelson-linux-raid@jamponi.net wrote:
> >> >
> >> >> It could be that you are hitting the bug fixed by
> >> >>  commit a0da84f35b25875870270d16b6eccda4884d61a7
> >> >
> >> > That sure sounds like it. I'd have to log to see what happened,
> >> > exactly, but I've added substantial logging around the device
> >> > discovery and addition section which manages this particular raid.
> >> >
> >> >> You would need 2.6.26 or later to have that fixed.
> >> >> Can you try with a newer kernel???
> >> >
> >> > I hope to be giving opensuse 11.1 a try soon, which uses 2.6.27.X
> >> > afaik.  I suspect I can also backport that patch to 2.6.25 easily.
> >>
> >> The kernel source for 2.6.25.18-0.2 (from suse) has this patch
> >> already, so I was already using it.
> >
> > Are you sure?  I just looked in the openSUSE-11.0 kernel tree and I
> > cannot see it there....
> >
> > NeilBrown
> >
> >
> >>
> >> Perhaps this weekend or some night this week I'll find time to try to
> >> break things again.
> >>
> >> --
> >> Jon
> >
> 
> jnelson@turnip:~/kernels> rpm -qf /usr/src/linux-2.6.25.18-0.2
> kernel-source-2.6.25.18-0.2
> jnelson@turnip:~/kernels> rpm -V kernel-source-2.6.25.18-0.2
> jnelson@turnip:~/kernels> (cd linux-2.6 && git diff
> a0da84f35b25875870270d16b6eccda4884d61a7
> a0da84f35b25875870270d16b6eccda4884d61a7^ ) > d.diff

This is requesting the diff between a given version, and the previous
version.  So it will be a reversed diff.
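
To get the forward diff you would list the parent first, e.g.

   git diff a0da84f35b25875870270d16b6eccda4884d61a7^ \
            a0da84f35b25875870270d16b6eccda4884d61a7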

> jnelson@turnip:~/kernels> head d.diff
> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
> index dedba16..b26927c 100644
> --- a/drivers/md/bitmap.c
> +++ b/drivers/md/bitmap.c
> @@ -454,11 +454,8 @@ void bitmap_update_sb(struct bitmap *bitmap)
>         spin_unlock_irqrestore(&bitmap->lock, flags);
>         sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
>         sb->events = cpu_to_le64(bitmap->mddev->events);
> -       if (bitmap->mddev->events < bitmap->events_cleared) {
> -               /* rocking back to read-only */

i.e. these two lines are *added* by the patch.
I usually use e.g.
   git log -p a0da84f35b25875870270d16b6eccda4884d61a7
to look at diffs.  Less room for confusion. (or gitk).

> jnelson@turnip:~/kernels> cp -r /usr/src/linux-2.6.25.18-0.2 .
> jnelson@turnip:~/kernels/linux-2.6.25.18-0.2> tail -n +454
> drivers/md/bitmap.c | head -n 20
> {
>         bitmap_super_t *sb;
>         unsigned long flags;
> 
>         if (!bitmap || !bitmap->mddev) /* no bitmap for this array */
>                 return;
>         spin_lock_irqsave(&bitmap->lock, flags);
>         if (!bitmap->sb_page) { /* no superblock */
>                 spin_unlock_irqrestore(&bitmap->lock, flags);
>                 return;
>         }
>         spin_unlock_irqrestore(&bitmap->lock, flags);
>         sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
>         sb->events = cpu_to_le64(bitmap->mddev->events);
>         if (!bitmap->mddev->degraded)
>                 sb->events_cleared = cpu_to_le64(bitmap->mddev->events);
>         kunmap_atomic(sb, KM_USER0);
>         write_page(bitmap, bitmap->sb_page, 1);
> }

and as those two lines are not present here, the patch has not been
applied.
:-)

NeilBrown


* Re: weird issues with raid1
  2008-12-18  6:21                     ` Neil Brown
@ 2008-12-19  2:15                       ` Jon Nelson
  2008-12-19 16:51                         ` Jon Nelson
  0 siblings, 1 reply; 24+ messages in thread
From: Jon Nelson @ 2008-12-19  2:15 UTC (permalink / raw)
  To: Neil Brown; +Cc: LinuxRaid

>> jnelson@turnip:~/kernels> (cd linux-2.6 && git diff
>> a0da84f35b25875870270d16b6eccda4884d61a7
>> a0da84f35b25875870270d16b6eccda4884d61a7^ ) > d.diff
>
> This is requesting the diff between a given version, and the previous
> version.  So it will be a reversed diff.

*sigh*

> i.e. these two lines are *added* by the patch.
> I usually use e.g.
>   git log -p a0da84f35b25875870270d16b6eccda4884d61a7
> to look at diffs.  Less room for confusion. (or gitk).

I will remember that one!

> and as those two lines are not present here, the patch has not been
> applied.

I'll apply and get back to you. My raid rebuilt 3 times today, quite
possibly because of this.

Obviously, I'm abusing the code in ways it was not intended to be
used. Sometimes that's good for finding corner-case-y kinds of issues,
though.

Thanks again for your patience.

-- 
Jon


* Re: weird issues with raid1
  2008-12-19  2:15                       ` Jon Nelson
@ 2008-12-19 16:51                         ` Jon Nelson
  2008-12-19 20:40                           ` Jon Nelson
  0 siblings, 1 reply; 24+ messages in thread
From: Jon Nelson @ 2008-12-19 16:51 UTC (permalink / raw)
  To: Neil Brown; +Cc: LinuxRaid

> I'll apply and get back to you. My raid rebuilt 3 times today, quite
> possibly because of this.

I'm now running the patch from
a0da84f35b25875870270d16b6eccda4884d61a7 and it still did a complete
rebuild. Was that expected, the first time the device was re-added?

I just rebooted into the new kernel...

(the logs prefixed with nbd0-frank are the result of the following commands:
mdadm --examine --test /dev/nbd0
mdadm --examine-bitmap /dev/nbd0
and eventually
mdadm /dev/md11 --re-add /dev/nbd0
)

Dec 19 10:30:41 turnip nbd0-frank: /dev/nbd0:
Dec 19 10:30:41 turnip nbd0-frank:           Magic : a92b4efc
Dec 19 10:30:41 turnip nbd0-frank:         Version : 1.0
Dec 19 10:30:41 turnip nbd0-frank:     Feature Map : 0x1
Dec 19 10:30:41 turnip nbd0-frank:      Array UUID :
cf24d099:9e174a79:2a2f6797:dcff1420
Dec 19 10:30:41 turnip nbd0-frank:            Name : turnip:11  (local
to host turnip)
Dec 19 10:30:41 turnip nbd0-frank:   Creation Time : Mon Dec 15 07:06:13 2008
Dec 19 10:30:41 turnip nbd0-frank:      Raid Level : raid1
Dec 19 10:30:41 turnip nbd0-frank:    Raid Devices : 2
Dec 19 10:30:41 turnip nbd0-frank:
Dec 19 10:30:41 turnip nbd0-frank:  Avail Dev Size : 160086384 (76.34
GiB 81.96 GB)
Dec 19 10:30:41 turnip nbd0-frank:      Array Size : 156247976 (74.50
GiB 80.00 GB)
Dec 19 10:30:41 turnip nbd0-frank:   Used Dev Size : 156247976 (74.50
GiB 80.00 GB)
Dec 19 10:30:41 turnip nbd0-frank:    Super Offset : 160086512 sectors
Dec 19 10:30:41 turnip nbd0-frank:           State : clean
Dec 19 10:30:41 turnip nbd0-frank:     Device UUID :
01524a75:c309869c:6da972c9:084115c6
Dec 19 10:30:41 turnip nbd0-frank:
Dec 19 10:30:41 turnip nbd0-frank: Internal Bitmap : 2 sectors from superblock
Dec 19 10:30:41 turnip nbd0-frank:       Flags : write-mostly
Dec 19 10:30:41 turnip nbd0-frank:     Update Time : Fri Dec 19 09:46:48 2008
Dec 19 10:30:41 turnip nbd0-frank:        Checksum : 63bfb069 - correct
Dec 19 10:30:41 turnip nbd0-frank:          Events : 5360
Dec 19 10:30:41 turnip nbd0-frank:
Dec 19 10:30:41 turnip nbd0-frank:
Dec 19 10:30:41 turnip nbd0-frank:     Array Slot : 2 (failed, failed, empty, 1)
Dec 19 10:30:41 turnip nbd0-frank:    Array State : _u 2 failed
Dec 19 10:30:41 turnip nbd0-frank:         Filename : /dev/nbd0
Dec 19 10:30:41 turnip nbd0-frank:            Magic : 6d746962
Dec 19 10:30:41 turnip nbd0-frank:          Version : 4
Dec 19 10:30:41 turnip nbd0-frank:             UUID :
cf24d099:9e174a79:2a2f6797:dcff1420
Dec 19 10:30:41 turnip nbd0-frank:           Events : 4462
Dec 19 10:30:41 turnip nbd0-frank:   Events Cleared : 4462
Dec 19 10:30:41 turnip nbd0-frank:            State : OK
Dec 19 10:30:41 turnip nbd0-frank:        Chunksize : 4 MB
Dec 19 10:30:41 turnip nbd0-frank:           Daemon : 5s flush period
Dec 19 10:30:41 turnip nbd0-frank:       Write Mode : Allow write
behind, max 256
Dec 19 10:30:41 turnip nbd0-frank:        Sync Size : 78123988 (74.50
GiB 80.00 GB)
Dec 19 10:30:41 turnip nbd0-frank:           Bitmap : 19074 bits
(chunks), 0 dirty (0.0%)
Dec 19 10:30:41 turnip nbd0-frank: Pre-setting the recovery speed to
5MB/s to avoid saturating network...
Dec 19 10:30:41 turnip nbd0-frank: Adding /dev/nbd0 to /dev/md11....
Dec 19 10:30:41 turnip kernel: md: bind<nbd0>
Dec 19 10:30:41 turnip nbd0-frank: mdadm: re-added /dev/nbd0
Dec 19 10:30:41 turnip kernel: RAID1 conf printout:
Dec 19 10:30:41 turnip kernel:  --- wd:1 rd:2
Dec 19 10:30:41 turnip kernel:  disk 0, wo:1, o:1, dev:nbd0
Dec 19 10:30:41 turnip kernel:  disk 1, wo:0, o:1, dev:sda
Dec 19 10:30:41 turnip kernel: md: recovery of RAID array md11
Dec 19 10:30:41 turnip kernel: md: minimum _guaranteed_  speed: 1000
KB/sec/disk.
Dec 19 10:30:41 turnip kernel: md: using maximum available idle IO
bandwidth (but not more than 5120 KB/sec) for recovery.
Dec 19 10:30:41 turnip kernel: md: using 128k window, over a total of
78123988 blocks.


-- 
Jon

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: weird issues with raid1
  2008-12-19 16:51                         ` Jon Nelson
@ 2008-12-19 20:40                           ` Jon Nelson
  2008-12-19 21:18                             ` Jon Nelson
  0 siblings, 1 reply; 24+ messages in thread
From: Jon Nelson @ 2008-12-19 20:40 UTC (permalink / raw)
  To: Neil Brown; +Cc: LinuxRaid

On Fri, Dec 19, 2008 at 10:51 AM, Jon Nelson
<jnelson-linux-raid@jamponi.net> wrote:
>> I'll apply and get back to you. My raid rebuilt 3 times today, quite
>> possibly because of this.
>
> I'm now running the patch from
> a0da84f35b25875870270d16b6eccda4884d61a7 and it still did a complete
> rebuild. Was that expected, the first time the device was re-added?

After the array reconstructed completely, I did the following (spelled
out as full commands after the list):

1. --fail then --remove /dev/nbd0
2. unmounted /dev/md11
3. mdadm --stop /dev/md11
4. mdadm --assemble --scan (this started /dev/md11):

Dec 19 14:21:17 turnip kernel: raid1: raid set md11 active with 1 out
of 2 mirrors
Dec 19 14:21:17 turnip kernel: md11: bitmap initialized from disk:
read 1/1 pages, set 0 bits
Dec 19 14:21:17 turnip kernel: created bitmap (10 pages) for device md11

5. fsck.ext3 -f -v -D -C0 /dev/md11 (this caused some writes to take
place, and I wanted to fsck the volume anyway)

6. --re-add /dev/nbd0
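
Written out as full commands, that sequence was roughly (a sketch; the
mount point is a placeholder since it isn't named anywhere in the thread):

mdadm /dev/md11 --fail /dev/nbd0
mdadm /dev/md11 --remove /dev/nbd0
umount /mnt/md11                      # placeholder mount point
mdadm --stop /dev/md11
mdadm --assemble --scan               # this started /dev/md11
fsck.ext3 -f -v -D -C0 /dev/md11
mdadm /dev/md11 --re-add /dev/nbd0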

At step 6, the array decided to go into recovery:

Dec 19 14:32:26 turnip kernel: md: bind<nbd0>
Dec 19 14:32:26 turnip kernel: RAID1 conf printout:
Dec 19 14:32:26 turnip kernel:  --- wd:1 rd:2
Dec 19 14:32:26 turnip kernel:  disk 0, wo:1, o:1, dev:nbd0
Dec 19 14:32:26 turnip kernel:  disk 1, wo:0, o:1, dev:sda
Dec 19 14:32:26 turnip kernel: md: recovery of RAID array md11

and has some time to go ...

      [=>...................]  recovery =  7.7% (6031360/78123988)
finish=234.6min speed=5120K/sec

At the time I --re-add'd /dev/nbd0, I also did an --examine and
--examine-bitmap of /dev/nbd0:

Dec 19 14:32:26 turnip nbd0-frank: /dev/nbd0:
Dec 19 14:32:26 turnip nbd0-frank:           Magic : a92b4efc
Dec 19 14:32:26 turnip nbd0-frank:         Version : 1.0
Dec 19 14:32:26 turnip nbd0-frank:     Feature Map : 0x1
Dec 19 14:32:26 turnip nbd0-frank:      Array UUID :
cf24d099:9e174a79:2a2f6797:dcff1420
Dec 19 14:32:26 turnip nbd0-frank:            Name : turnip:11  (local
to host turnip)
Dec 19 14:32:26 turnip nbd0-frank:   Creation Time : Mon Dec 15 07:06:13 2008
Dec 19 14:32:26 turnip nbd0-frank:      Raid Level : raid1
Dec 19 14:32:26 turnip nbd0-frank:    Raid Devices : 2
Dec 19 14:32:26 turnip nbd0-frank:
Dec 19 14:32:26 turnip nbd0-frank:  Avail Dev Size : 160086384 (76.34
GiB 81.96 GB)
Dec 19 14:32:26 turnip nbd0-frank:      Array Size : 156247976 (74.50
GiB 80.00 GB)
Dec 19 14:32:26 turnip nbd0-frank:   Used Dev Size : 156247976 (74.50
GiB 80.00 GB)
Dec 19 14:32:26 turnip nbd0-frank:    Super Offset : 160086512 sectors
Dec 19 14:32:26 turnip nbd0-frank:           State : clean
Dec 19 14:32:26 turnip nbd0-frank:     Device UUID :
01524a75:c309869c:6da972c9:084115c6
Dec 19 14:32:26 turnip nbd0-frank:
Dec 19 14:32:26 turnip nbd0-frank: Internal Bitmap : 2 sectors from superblock
Dec 19 14:32:26 turnip nbd0-frank:       Flags : write-mostly
Dec 19 14:32:26 turnip nbd0-frank:     Update Time : Fri Dec 19 14:20:52 2008
Dec 19 14:32:26 turnip nbd0-frank:        Checksum : 63bef0c2 - correct
Dec 19 14:32:26 turnip nbd0-frank:          Events : 5388
Dec 19 14:32:26 turnip nbd0-frank:
Dec 19 14:32:26 turnip nbd0-frank:
Dec 19 14:32:26 turnip nbd0-frank:     Array Slot : 2 (failed, failed, 0, 1)
Dec 19 14:32:26 turnip nbd0-frank:    Array State : Uu 2 failed
Dec 19 14:32:26 turnip nbd0-frank:         Filename : /dev/nbd0
Dec 19 14:32:26 turnip nbd0-frank:            Magic : 6d746962
Dec 19 14:32:26 turnip nbd0-frank:          Version : 4
Dec 19 14:32:26 turnip nbd0-frank:             UUID :
cf24d099:9e174a79:2a2f6797:dcff1420
Dec 19 14:32:26 turnip nbd0-frank:           Events : 5388
Dec 19 14:32:26 turnip nbd0-frank:   Events Cleared : 4462
Dec 19 14:32:26 turnip nbd0-frank:            State : OK
Dec 19 14:32:26 turnip nbd0-frank:        Chunksize : 4 MB
Dec 19 14:32:26 turnip nbd0-frank:           Daemon : 5s flush period
Dec 19 14:32:26 turnip nbd0-frank:       Write Mode : Allow write
behind, max 256
Dec 19 14:32:26 turnip nbd0-frank:        Sync Size : 78123988 (74.50
GiB 80.00 GB)
Dec 19 14:32:26 turnip nbd0-frank:           Bitmap : 19074 bits
(chunks), 0 dirty (0.0%)
Dec 19 14:32:26 turnip nbd0-frank: Pre-setting the recovery speed to
5MB/s to avoid saturating network...
Dec 19 14:32:26 turnip nbd0-frank: Adding /dev/nbd0 to /dev/md11....
Dec 19 14:32:26 turnip kernel: md: bind<nbd0>



So. What's going on here? I applied the patch which /starts out/
looking like this:


diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index b26927c..dedba16 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -454,8 +454,11 @@ void bitmap_update_sb(struct bitmap *bitmap)
        spin_unlock_irqrestore(&bitmap->lock, flags);
        sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
        sb->events = cpu_to_le64(bitmap->mddev->events);
-       if (!bitmap->mddev->degraded)
-               sb->events_cleared = cpu_to_le64(bitmap->mddev->events);
+       if (bitmap->mddev->events < bitmap->events_cleared) {
+               /* rocking back to read-only */
+               bitmap->events_cleared = bitmap->mddev->events;
+               sb->events_cleared = cpu_to_le64(bitmap->events_cleared);
+       }
        kunmap_atomic(sb, KM_USER0);
        write_page(bitmap, bitmap->sb_page, 1);
 }
@@ -1085,9 +1088,19 @@ void bitmap_daemon_work(struct bitmap *bitmap)


To the 2.6.25.18-0.2 source, rebuilt, installed, and rebooted.
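
Roughly, the rebuild went something like this (a sketch; the source path
and patch file name below are placeholders, not the exact commands used):

cd /usr/src/linux-2.6.25.18-0.2           # assumed source location
patch -p1 < bitmap-events-cleared.patch   # the hunks shown above
make && make modules_install install
reboot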

/me wipes brow

-- 
Jon

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: weird issues with raid1
  2008-12-19 20:40                           ` Jon Nelson
@ 2008-12-19 21:18                             ` Jon Nelson
  2008-12-22 14:40                               ` Jon Nelson
  0 siblings, 1 reply; 24+ messages in thread
From: Jon Nelson @ 2008-12-19 21:18 UTC (permalink / raw)
  To: Neil Brown; +Cc: LinuxRaid

A correction: I used:

mdadm --assemble /dev/md11 --scan

to assemble md11.

> 6. --re-add /dev/nbd0
>
> At step 6, the array decided to go into recovery:
>
> Dec 19 14:32:26 turnip kernel: md: bind<nbd0>
> Dec 19 14:32:26 turnip kernel: RAID1 conf printout:
> Dec 19 14:32:26 turnip kernel:  --- wd:1 rd:2
> Dec 19 14:32:26 turnip kernel:  disk 0, wo:1, o:1, dev:nbd0
> Dec 19 14:32:26 turnip kernel:  disk 1, wo:0, o:1, dev:sda
> Dec 19 14:32:26 turnip kernel: md: recovery of RAID array md11
>
> and has some time to go ...
>
>      [=>...................]  recovery =  7.7% (6031360/78123988)
> finish=234.6min speed=5120K/sec

(I bumped the recovery speed up to its maximum, FYI.)
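
A sketch of the usual knobs for raising that cap (the exact mechanism
used here isn't shown in the thread):

# system-wide ceiling, in KB/s (200000 is the usual default maximum):
echo 200000 > /proc/sys/dev/raid/speed_limit_max
# or per array:
echo 200000 > /sys/block/md11/md/sync_speed_max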

Going by the timestamps below, it took about 35 minutes to recover.
That's far faster than the 2+ hours a full sync needs, so I have to
assume the bitmap is at least partially working.

Dec 19 15:06:41 turnip kernel: md: md11: recovery done.
Dec 19 15:06:41 turnip kernel: RAID1 conf printout:
Dec 19 15:06:41 turnip kernel:  --- wd:2 rd:2
Dec 19 15:06:41 turnip kernel:  disk 0, wo:0, o:1, dev:nbd0
Dec 19 15:06:41 turnip kernel:  disk 1, wo:0, o:1, dev:sda

I'm going to re-do this experiment and grab an --examine-bitmap after
a minute or so into the rebuild to see what happens.

I am tentatively saying that the commit you suggested may be the root
cause of some of the "unnecessary full-sync" issues I've had.



-- 
Jon

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: weird issues with raid1
  2008-12-19 21:18                             ` Jon Nelson
@ 2008-12-22 14:40                               ` Jon Nelson
  2008-12-22 21:07                                 ` NeilBrown
  0 siblings, 1 reply; 24+ messages in thread
From: Jon Nelson @ 2008-12-22 14:40 UTC (permalink / raw)
  Cc: LinuxRaid

More updates:

1. I upgraded to openSUSE 11.1 over the weekend. The kernel is
2.6.27.7-9 as of this writing.

2. When I fired up the machine which hosts the network block device,
the machine hosting the raid properly noticed and --re-added /dev/nbd0
to /dev/md11.

3. /dev/md11 went into "recover" mode (not resync).

4. I'm using persistent metadata and a write-intent bitmap.


**Question**:

What am I doing wrong here? Why doesn't --re-add cause resync instead
of rebuild? If I'm reading the output from --examine-bitmap (below)
correctly, there are 2049 dirty bits at 4MB per bit or about 8196 MB
to resync.
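
That figure can be sanity-checked straight from the --examine-bitmap
output; a rough sketch, assuming the "Chunksize" and "... dirty ..."
lines look like the ones quoted in this thread:

mdadm --examine-bitmap /dev/sda | awk '
    /Chunksize/ { chunk = $3 }      # chunk size, MB
    /dirty/     { dirty = $6 }      # dirty chunk count
    END         { print dirty * chunk, "MB to resync" }'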


According to this (from the manpage)

       If an array is using a write-intent bitmap, then devices
       which have been removed can be re-added in  a  way  that
       avoids  a  full  reconstruction but instead just updates
       the blocks  that  have  changed  since  the  device  was
       removed.     For   arrays   with   persistent   metadata
       (superblocks) this is done  automatically.   For  arrays
       created  with  --build  mdadm needs to be told that this
       device we removed recently with --re-add.

As far as I can tell, I'm doing everything right.
Here is the --examine and --examine-bitmap output from /dev/nbd0
*before* it is added to the array:


          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : cf24d099:9e174a79:2a2f6797:dcff1420
           Name : turnip:11  (local to host turnip)
  Creation Time : Mon Dec 15 07:06:13 2008
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 160086384 (76.34 GiB 81.96 GB)
     Array Size : 156247976 (74.50 GiB 80.00 GB)
  Used Dev Size : 156247976 (74.50 GiB 80.00 GB)
   Super Offset : 160086512 sectors
          State : clean
    Device UUID : 01524a75:c309869c:6da972c9:084115c6

Internal Bitmap : 2 sectors from superblock
      Flags : write-mostly
    Update Time : Sat Dec 20 19:43:43 2008
       Checksum : 63c19462 - correct
         Events : 7042


    Array Slot : 2 (failed, failed, empty, 1)
   Array State : _u 2 failed
        Filename : /dev/nbd0
           Magic : 6d746962
         Version : 4
            UUID : cf24d099:9e174a79:2a2f6797:dcff1420
          Events : 5518
  Events Cleared : 5494
           State : OK
       Chunksize : 4 MB
          Daemon : 5s flush period
      Write Mode : Allow write behind, max 256
       Sync Size : 78123988 (74.50 GiB 80.00 GB)
          Bitmap : 19074 bits (chunks), 0 dirty (0.0%)

Then I --re-added /dev/nbd0 to the array:

Dec 22 08:15:53 turnip kernel: RAID1 conf printout:
Dec 22 08:15:53 turnip kernel:  --- wd:1 rd:2
Dec 22 08:15:53 turnip kernel:  disk 0, wo:1, o:1, dev:nbd0
Dec 22 08:15:53 turnip kernel:  disk 1, wo:0, o:1, dev:sda
Dec 22 08:15:53 turnip kernel: md: recovery of RAID array md11
Dec 22 08:15:53 turnip kernel: md: minimum _guaranteed_  speed: 1000
KB/sec/disk.
Dec 22 08:15:53 turnip kernel: md: using maximum available idle IO
bandwidth (but not more than 5120 KB/sec) for recovery.
Dec 22 08:15:53 turnip kernel: md: using 128k window, over a total of
78123988 blocks.

And this is what things look like 20 minutes into the reconstruction/rebuild:

turnip:~ # mdadm --examine-bitmap /dev/sda
        Filename : /dev/sda
           Magic : 6d746962
         Version : 4
            UUID : cf24d099:9e174a79:2a2f6797:dcff1420
          Events : 15928
  Events Cleared : 5494
           State : OK
       Chunksize : 4 MB
          Daemon : 5s flush period
      Write Mode : Allow write behind, max 256
       Sync Size : 78123988 (74.50 GiB 80.00 GB)
          Bitmap : 19074 bits (chunks), 2065 dirty (10.8%)
turnip:~ # mdadm --examine-bitmap /dev/nbd0
        Filename : /dev/nbd0
           Magic : 6d746962
         Version : 4
            UUID : cf24d099:9e174a79:2a2f6797:dcff1420
          Events : 5518
  Events Cleared : 5494
           State : OK
       Chunksize : 4 MB
          Daemon : 5s flush period
      Write Mode : Allow write behind, max 256
       Sync Size : 78123988 (74.50 GiB 80.00 GB)
          Bitmap : 19074 bits (chunks), 0 dirty (0.0%)
turnip:~ #

and finally some --detail:


/dev/md11:
        Version : 1.00
  Creation Time : Mon Dec 15 07:06:13 2008
     Raid Level : raid1
     Array Size : 78123988 (74.50 GiB 80.00 GB)
  Used Dev Size : 156247976 (149.01 GiB 160.00 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Dec 22 08:24:25 2008
          State : active, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 8% complete

           Name : turnip:11  (local to host turnip)
           UUID : cf24d099:9e174a79:2a2f6797:dcff1420
         Events : 15928

    Number   Major   Minor   RaidDevice State
       2      43        0        0      writemostly spare rebuilding   /dev/nbd0
       3       8        0        1      active sync   /dev/sda

--
Jon

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: weird issues with raid1
  2008-12-22 14:40                               ` Jon Nelson
@ 2008-12-22 21:07                                 ` NeilBrown
  0 siblings, 0 replies; 24+ messages in thread
From: NeilBrown @ 2008-12-22 21:07 UTC (permalink / raw)
  To: Jon Nelson; +Cc: LinuxRaid

On Tue, December 23, 2008 1:40 am, Jon Nelson wrote:
> More updates:
>
> 1. I upgraded to openSUSE 11.1 over the weekend. The kernel is
> 2.6.27.7-9 as of this writing.
>
> 2. When I fired up the machine which hosts the network block device,
> the machine hosting the raid properly noticed and --re-added /dev/nbd0
> to /dev/md11.
>
> 3. /dev/md11 went into "recover" mode (not resync).
>
> 4. I'm using persistent metadata and a write-intent bitmap.
>

It does seem like you are doing the right thing....

Can you show me the output of both --examine and --examine-bitmap
on both /dev/sda and /dev/nbd0 just before you --re-add nbd0 to
the array that already contains sda?
For recovery to use the bitmap, "Events Cleared" on sda must be no
more than "Events" (from --examine) of nbd0.
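For example, something like this run just before the --re-add would
capture all four outputs (a sketch):

for d in /dev/sda /dev/nbd0; do
    mdadm --examine $d
    mdadm --examine-bitmap $d
done
# bitmap-based recovery needs: sda's "Events Cleared" <= nbd0's "Events" (from --examine)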

What you sent doesn't quite have all this information, but it does
show that for nbd0 before it is added to the array:

--examine;
         Events : 7042

--examine-bitmap:
          Events : 5518
  Events Cleared : 5494

This shouldn't happen.  The 'events' from --examine and from
--examine-bitmap should always be the same.  That is how md knows
that the bitmap is still accurate.
This seems to suggest that nbd0 was, for a while, assembled into an
array which did not have an active bitmap.

Thanks,
NeilBrown



^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread

Thread overview: 24+ messages
2008-12-06  2:10 weird issues with raid1 Jon Nelson
2008-12-06  2:46 ` Jon Nelson
2008-12-06 12:16   ` Justin Piszcz
2008-12-15  2:17     ` Jon Nelson
2008-12-15  6:00 ` Neil Brown
2008-12-15 13:42   ` Jon Nelson
2008-12-15 21:33     ` Neil Brown
2008-12-15 21:47       ` Jon Nelson
2008-12-16  1:21         ` Neil Brown
2008-12-16  2:32           ` Jon Nelson
2008-12-18  4:42           ` Neil Brown
2008-12-18  4:50             ` Jon Nelson
2008-12-18  4:55               ` Jon Nelson
2008-12-18  5:17                 ` Neil Brown
2008-12-18  5:47                   ` Jon Nelson
2008-12-18  6:21                     ` Neil Brown
2008-12-19  2:15                       ` Jon Nelson
2008-12-19 16:51                         ` Jon Nelson
2008-12-19 20:40                           ` Jon Nelson
2008-12-19 21:18                             ` Jon Nelson
2008-12-22 14:40                               ` Jon Nelson
2008-12-22 21:07                                 ` NeilBrown
2008-12-18  5:43   ` Neil Brown
2008-12-18  5:54     ` Jon Nelson
