* weird issues with raid1
@ 2008-12-06 2:10 Jon Nelson
2008-12-06 2:46 ` Jon Nelson
2008-12-15 6:00 ` Neil Brown
0 siblings, 2 replies; 24+ messages in thread
From: Jon Nelson @ 2008-12-06 2:10 UTC (permalink / raw)
To: LinuxRaid
I set up a raid1 between some devices, and have been futzing with it.
I've been encountering all kinds of weird problems, including one
which required me to reboot my machine.
This is long, sorry.
First, this is how I built the raid:
mdadm --create /dev/md10 --level=1 --raid-devices=2 --bitmap=internal
/dev/sdd1 --write-mostly --write-behind missing
then I added /dev/nbd0:
mdadm /dev/md10 --add /dev/nbd0
and it rebuilt just fine.
Then I failed and removed /dev/sdd1, and added /dev/sda:
mdadm /dev/md10 --fail /dev/sdd1 --remove /dev/sdd1
mdadm /dev/md10 --add /dev/sda
I let it rebuild.
Then I failed, and removed it:
The --fail worked, but the --remove did not.
mdadm /dev/md10 --fail /dev/sda --remove /dev/sda
mdadm: set /dev/sda faulty in /dev/md10
mdadm: hot remove failed for /dev/sda: Device or resource busy
Whaaa?
So I tried again:
mdadm /dev/md10 --remove /dev/sda
mdadm: hot removed /dev/sda
OK. Better, but weird.
Since I'm using bitmaps, I would expect --re-add to allow the rebuild
to pick up where it left off. It was 78% done.
mdadm /dev/md10 --re-add /dev/sda
cat /proc/mdstat
md10 : active raid1 sda[2] nbd0[1]
78123968 blocks [2/1] [_U]
[>....................] recovery = 1.2% (959168/78123968)
finish=30.8min speed=41702K/sec
bitmap: 0/150 pages [0KB], 256KB chunk
******
Question 1:
I'm using a bitmap. Why does the rebuild start completely over?
4% into the rebuild, this is what --examine-bitmap looks like for both
components:
Filename : /dev/sda
Magic : 6d746962
Version : 4
UUID : 542a0986:dd465da6:b224af07:ed28e4e5
Events : 500
Events Cleared : 496
State : OK
Chunksize : 256 KB
Daemon : 5s flush period
Write Mode : Allow write behind, max 256
Sync Size : 78123968 (74.50 GiB 80.00 GB)
Bitmap : 305172 bits (chunks), 305172 dirty (100.0%)
turnip:~ # mdadm --examine-bitmap /dev/nbd0
Filename : /dev/nbd0
Magic : 6d746962
Version : 4
UUID : 542a0986:dd465da6:b224af07:ed28e4e5
Events : 524
Events Cleared : 496
State : OK
Chunksize : 256 KB
Daemon : 5s flush period
Write Mode : Allow write behind, max 256
Sync Size : 78123968 (74.50 GiB 80.00 GB)
Bitmap : 305172 bits (chunks), 0 dirty (0.0%)
No matter how long I wait, until it is rebuilt, the bitmap on /dev/sda
is always 100% dirty.
If I --fail, --remove (twice) /dev/sda, and I re-add /dev/sdd1, it
clearly uses the bitmap and re-syncs in under 1 second.
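Spelled out, that test sequence is roughly the following (same device
names as above; the second --remove is the retry mentioned earlier):

mdadm /dev/md10 --fail /dev/sda
mdadm /dev/md10 --remove /dev/sda    # repeated once if the first attempt reports "busy"
mdadm /dev/md10 --re-add /dev/sdd1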
***************
Question 2: mdadm --detail and cat /proc/mdstat do not agree:
NOTE: mdadm --detail says the rebuild status is 0% complete, but cat
/proc/mdstat shows it as 7%.
A bit later, I check again and they both agree - 14%.
Below, from when the rebuild was 7% according to /proc/mdstat
/dev/md10:
Version : 00.90.03
Creation Time : Fri Dec 5 07:44:41 2008
Raid Level : raid1
Array Size : 78123968 (74.50 GiB 80.00 GB)
Used Dev Size : 78123968 (74.50 GiB 80.00 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 10
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Fri Dec 5 20:04:30 2008
State : active, degraded, recovering
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Rebuild Status : 0% complete
UUID : 542a0986:dd465da6:b224af07:ed28e4e5
Events : 0.544
Number Major Minor RaidDevice State
2 8 0 0 spare rebuilding /dev/sda
1 43 0 1 active sync /dev/nbd0
md10 : active raid1 sda[2] nbd0[1]
78123968 blocks [2/1] [_U]
[==>..................] recovery = 13.1% (10283392/78123968)
finish=27.3min speed=41367K/sec
bitmap: 0/150 pages [0KB], 256KB chunk
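One way to rule out simple timing skew between the two readings is to
sample them back to back in a single command, for example:

cat /proc/mdstat; mdadm --detail /dev/md10 | grep -i 'rebuild status'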
--
Jon
^ permalink raw reply [flat|nested] 24+ messages in thread

* Re: weird issues with raid1
2008-12-06 2:10 weird issues with raid1 Jon Nelson
@ 2008-12-06 2:46 ` Jon Nelson
2008-12-06 12:16 ` Justin Piszcz
2008-12-15 6:00 ` Neil Brown
1 sibling, 1 reply; 24+ messages in thread
From: Jon Nelson @ 2008-12-06 2:46 UTC (permalink / raw)
To: LinuxRaid

More info:

according to /proc/mdstat (and /var/log/messages) the rebuild is complete:

md10 : active raid1 sda[0] nbd0[1]
78123968 blocks [2/2] [UU]
bitmap: 0/150 pages [0KB], 256KB chunk

and --detail:

/dev/md10:
Version : 00.90.03
Creation Time : Fri Dec 5 07:44:41 2008
Raid Level : raid1
Array Size : 78123968 (74.50 GiB 80.00 GB)
Used Dev Size : 78123968 (74.50 GiB 80.00 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 10
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Fri Dec 5 20:40:32 2008
State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

UUID : 542a0986:dd465da6:b224af07:ed28e4e5
Events : 0.554

Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/sda
1 43 0 1 active sync /dev/nbd0

however, --examine-bitmap disagrees:

Filename : /dev/sda
Magic : 6d746962
Version : 4
UUID : 542a0986:dd465da6:b224af07:ed28e4e5
Events : 554
Events Cleared : 554
State : OK
Chunksize : 256 KB
Daemon : 5s flush period
Write Mode : Allow write behind, max 256
Sync Size : 78123968 (74.50 GiB 80.00 GB)
Bitmap : 305172 bits (chunks), 274452 dirty (89.9%)

The bitmap numbers *DID NOT CHANGE* throughout the entire rebuild
process, and when it was complete, changed to what you see above. The
rebuild completed a few minutes prior to the --examine-bitmap.

Something is very funky.

If I --grow --bitmap=none, --grow --bitmap=internal then things look
OK after maybe 10-15 seconds. Of course, when this is complete the
--fail --remove and --re-add work as expected on /dev/sda.

--
Jon

^ permalink raw reply [flat|nested] 24+ messages in thread
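Spelled out, the --grow workaround described above removes and then
recreates the internal bitmap (same array as in this thread; note, as
an assumption not covered above, that bitmap options such as
--write-behind may need to be given again when the bitmap is
recreated):

mdadm --grow /dev/md10 --bitmap=none
mdadm --grow /dev/md10 --bitmap=internal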
* Re: weird issues with raid1
2008-12-06 2:46 ` Jon Nelson
@ 2008-12-06 12:16 ` Justin Piszcz
2008-12-15 2:17 ` Jon Nelson
0 siblings, 1 reply; 24+ messages in thread
From: Justin Piszcz @ 2008-12-06 12:16 UTC (permalink / raw)
To: Jon Nelson; +Cc: LinuxRaid

On Fri, 5 Dec 2008, Jon Nelson wrote:

> More info:
>
> according to /proc/mdstat (and /var/log/messages) the rebuild is complete:
>
> md10 : active raid1 sda[0] nbd0[1]
> 78123968 blocks [2/2] [UU]
> bitmap: 0/150 pages [0KB], 256KB chunk

I have not tried using network block devices or the write-behind
option; however, Neil et al. will want to know:

- kernel version used
- mdadm version

in order to help better track the issues.

Justin.

^ permalink raw reply [flat|nested] 24+ messages in thread
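For reference, both of those can be collected with two standard
commands:

uname -r
mdadm --version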
* Re: weird issues with raid1
2008-12-06 12:16 ` Justin Piszcz
@ 2008-12-15 2:17 ` Jon Nelson
0 siblings, 0 replies; 24+ messages in thread
From: Jon Nelson @ 2008-12-15 2:17 UTC (permalink / raw)
To: Justin Piszcz; +Cc: LinuxRaid

> I have not tried using network block devices or the write-behind
> option; however, Neil et al. will want to know:
>
> - kernel version used
> - mdadm version

Kernel: 2.6.25.18-0.2-default (stock openSUSE 11.0)
mdadm: tried both 2.6.4 (stock openSUSE 11.0) and 3.0-12.1 (from
opensuse factory)

--
Jon

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: weird issues with raid1
2008-12-06 2:10 weird issues with raid1 Jon Nelson
2008-12-06 2:46 ` Jon Nelson
@ 2008-12-15 6:00 ` Neil Brown
2008-12-15 13:42 ` Jon Nelson
2008-12-18 5:43 ` Neil Brown
1 sibling, 2 replies; 24+ messages in thread
From: Neil Brown @ 2008-12-15 6:00 UTC (permalink / raw)
To: Jon Nelson; +Cc: LinuxRaid

On Friday December 5, jnelson-linux-raid@jamponi.net wrote:
> I set up a raid1 between some devices, and have been futzing with it.
> I've been encountering all kinds of weird problems, including one
> which required me to reboot my machine.
>
> This is long, sorry.
>
> First, this is how I built the raid:
>
> mdadm --create /dev/md10 --level=1 --raid-devices=2 --bitmap=internal
> /dev/sdd1 --write-mostly --write-behind missing

'write-behind' is a setting on the bitmap and applies to all
write-mostly devices, so it can be specified anywhere.
'write-mostly' is a setting that applies to a particular device, not
to a position in the array. So setting 'write-mostly' on a 'missing'
device has no useful effect. When you add a new device to the array
you will need to set 'write-mostly' on that if you want that feature.
i.e.

mdadm /dev/md10 --add --write-mostly /dev/nbd0

>
> then I added /dev/nbd0:
>
> mdadm /dev/md10 --add /dev/nbd0
>
> and it rebuilt just fine.

Good.

>
> Then I failed and removed /dev/sdd1, and added /dev/sda:
>
> mdadm /dev/md10 --fail /dev/sdd1 --remove /dev/sdd1
> mdadm /dev/md10 --add /dev/sda
>
> I let it rebuild.
>
> Then I failed, and removed it:
>
> The --fail worked, but the --remove did not.
>
> mdadm /dev/md10 --fail /dev/sda --remove /dev/sda
> mdadm: set /dev/sda faulty in /dev/md10
> mdadm: hot remove failed for /dev/sda: Device or resource busy

That is expected. Marking a device as 'failed' does not immediately
disconnect it from the array. You have to wait for any in-flight IO
requests to complete.

>
> Whaaa?
> So I tried again:
>
> mdadm /dev/md10 --remove /dev/sda
> mdadm: hot removed /dev/sda

By now all those in-flight requests had completed and the device
could be removed.

>
> OK. Better, but weird.
> Since I'm using bitmaps, I would expect --re-add to allow the rebuild
> to pick up where it left off. It was 78% done.

Nope.
With v0.90 metadata, a spare device is not marked as being part of the
array until it is fully recovered. So if you interrupt a recovery
there is no record of how far it got.
With v1.0 metadata we do record how far the recovery has progressed
and it can restart. However I don't think that helps if you fail a
device - only if you stop the array and later restart it.

The bitmap is really about 'resync', not 'recovery'.

>
> ******
> Question 1:
> I'm using a bitmap. Why does the rebuild start completely over?

Because the bitmap isn't used to guide a rebuild, only a resync.

The effect of --re-add is to make md do a resync rather than a rebuild
if the device was previously a fully active member of the array.

>
> 4% into the rebuild, this is what --examine-bitmap looks like for both
> components:
>
> Filename : /dev/sda
> Magic : 6d746962
> Version : 4
> UUID : 542a0986:dd465da6:b224af07:ed28e4e5
> Events : 500
> Events Cleared : 496
> State : OK
> Chunksize : 256 KB
> Daemon : 5s flush period
> Write Mode : Allow write behind, max 256
> Sync Size : 78123968 (74.50 GiB 80.00 GB)
> Bitmap : 305172 bits (chunks), 305172 dirty (100.0%)
>
> turnip:~ # mdadm --examine-bitmap /dev/nbd0
> Filename : /dev/nbd0
> Magic : 6d746962
> Version : 4
> UUID : 542a0986:dd465da6:b224af07:ed28e4e5
> Events : 524
> Events Cleared : 496
> State : OK
> Chunksize : 256 KB
> Daemon : 5s flush period
> Write Mode : Allow write behind, max 256
> Sync Size : 78123968 (74.50 GiB 80.00 GB)
> Bitmap : 305172 bits (chunks), 0 dirty (0.0%)
>
>
> No matter how long I wait, until it is rebuilt, the bitmap on /dev/sda
> is always 100% dirty.
> If I --fail, --remove (twice) /dev/sda, and I re-add /dev/sdd1, it
> clearly uses the bitmap and re-syncs in under 1 second.

Yes, there is a bug here.
When an array recovers on to a hot spare it doesn't copy the bitmap
across. That will only happen lazily as bits are updated.
I'm surprised I hadn't noticed that before, so there might be more to
this than I'm seeing at the moment. But I definitely cannot find
code to copy the bitmap across. I'll have to have a think about
that.

>
>
> ***************
> Question 2: mdadm --detail and cat /proc/mdstat do not agree:
>
> NOTE: mdadm --detail says the rebuild status is 0% complete, but cat
> /proc/mdstat shows it as 7%.
> A bit later, I check again and they both agree - 14%.
> Below, from when the rebuild was 7% according to /proc/mdstat

I cannot explain this except to wonder if 7% of the recovery
completed between running "mdadm -D" and "cat /proc/mdstat".

The number reported by "mdadm -D" is obtained by reading /proc/mdstat
and applying "atoi()" to the string that ends with a '%'.

NeilBrown

>
> /dev/md10:
> Version : 00.90.03
> Creation Time : Fri Dec 5 07:44:41 2008
> Raid Level : raid1
> Array Size : 78123968 (74.50 GiB 80.00 GB)
> Used Dev Size : 78123968 (74.50 GiB 80.00 GB)
> Raid Devices : 2
> Total Devices : 2
> Preferred Minor : 10
> Persistence : Superblock is persistent
>
> Intent Bitmap : Internal
>
> Update Time : Fri Dec 5 20:04:30 2008
> State : active, degraded, recovering
> Active Devices : 1
> Working Devices : 2
> Failed Devices : 0
> Spare Devices : 1
>
> Rebuild Status : 0% complete
>
> UUID : 542a0986:dd465da6:b224af07:ed28e4e5
> Events : 0.544
>
> Number Major Minor RaidDevice State
> 2 8 0 0 spare rebuilding /dev/sda
> 1 43 0 1 active sync /dev/nbd0
>
>
> md10 : active raid1 sda[2] nbd0[1]
> 78123968 blocks [2/1] [_U]
> [==>..................] recovery = 13.1% (10283392/78123968)
> finish=27.3min speed=41367K/sec
> bitmap: 0/150 pages [0KB], 256KB chunk
>
>
>
> --
> Jon
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: weird issues with raid1 2008-12-15 6:00 ` Neil Brown @ 2008-12-15 13:42 ` Jon Nelson 2008-12-15 21:33 ` Neil Brown 2008-12-18 5:43 ` Neil Brown 1 sibling, 1 reply; 24+ messages in thread From: Jon Nelson @ 2008-12-15 13:42 UTC (permalink / raw) To: Neil Brown; +Cc: LinuxRaid On Mon, Dec 15, 2008 at 12:00 AM, Neil Brown <neilb@suse.de> wrote: > On Friday December 5, jnelson-linux-raid@jamponi.net wrote: >> I set up a raid1 between some devices, and have been futzing with it. >> I've been encountering all kinds of weird problems, including one >> which required me to reboot my machine. >> >> This is long, sorry. >> >> First, this is how I built the raid: >> >> mdadm --create /dev/md10 --level=1 --raid-devices=2 --bitmap=internal >> /dev/sdd1 --write-mostly --write-behind missing > > 'write-behind' is a setting on the bitmap and applies to all > write-mostly devices, so it can be specified anywhere. > 'write-mostly' is a setting that applies to a particular device, not > to a position in the array. So setting 'write-mostly' on a 'missing' > device has no useful effect. When you add a new device to the array > you will need to set 'write-mostly' on that if you want that feature. Aha! Good to know. > mdadm /dev/md10 --add --write-mostly /dev/nbd0 .. >> Then I failed and removed /dev/sdd1, and added /dev/sda: >> >> mdadm /dev/md10 --fail /dev/sdd1 --remove /dev/sdd1 >> mdadm /dev/md10 --add /dev/sda >> >> I let it rebuild. >> >> Then I failed, and removed it: >> >> The --fail worked, but the --remove did not. >> >> mdadm /dev/md10 --fail /dev/sda --remove /dev/sda >> mdadm: set /dev/sda faulty in /dev/md10 >> mdadm: hot remove failed for /dev/sda: Device or resource busy > > That is expected. Marking a device a 'failed' does not immediately > disconnect it from the array. You have to wait for any in-flight IO > requests to complete. Aha! Got it. >> OK. Better, but weird. >> Since I'm using bitmaps, I would expect --re-add to allow the rebuild >> to pick up where it left off. It was 78% done. > > Nope. > With v0.90 metadata, a spare device is not marked a being part of the > array until it is fully recovered. So if you interrupt a recovery > there is no record how far it got. > With v1.0 metadata we do record how far the recovery has progressed > and it can restart. However I don't think that helps if you fail a > device - only if you stop the array and later restart it. > > The bitmap is really about 'resync', not 'recovery'. OK, so task 1: switch to 1.0 (1.1, 1.2) metadata. That's going to happen as soon as my raid10,f2 'check' is complete. However, it raises a question: bitmaps are about 'resync' not 'recovery'? How do they differ? >> Question 1: >> I'm using a bitmap. Why does the rebuild start completely over? > > Because the bitmap isn't used to guide a rebuild, only a resync. > > The effect of --re-add is to make md do a resync rather than a rebuild > if the device was previously a fully active member of the array. Aha! This explains a question I raised in another email. What happened there is a previously fully active member of the raid got added, somehow, as a spare, via --incremental. That's when the entire raid thought it needed to be rebuilt. How did that (the device being treated as a spare instead of as a previously fully active member) happen? 
>> 4% into the rebuild, this is what --examine-bitmap looks like for both >> components: >> >> Filename : /dev/sda >> Magic : 6d746962 >> Version : 4 >> UUID : 542a0986:dd465da6:b224af07:ed28e4e5 >> Events : 500 >> Events Cleared : 496 >> State : OK >> Chunksize : 256 KB >> Daemon : 5s flush period >> Write Mode : Allow write behind, max 256 >> Sync Size : 78123968 (74.50 GiB 80.00 GB) >> Bitmap : 305172 bits (chunks), 305172 dirty (100.0%) >> >> turnip:~ # mdadm --examine-bitmap /dev/nbd0 >> Filename : /dev/nbd0 >> Magic : 6d746962 >> Version : 4 >> UUID : 542a0986:dd465da6:b224af07:ed28e4e5 >> Events : 524 >> Events Cleared : 496 >> State : OK >> Chunksize : 256 KB >> Daemon : 5s flush period >> Write Mode : Allow write behind, max 256 >> Sync Size : 78123968 (74.50 GiB 80.00 GB) >> Bitmap : 305172 bits (chunks), 0 dirty (0.0%) >> >> >> No matter how long I wait, until it is rebuilt, the bitmap on /dev/sda >> is always 100% dirty. >> If I --fail, --remove (twice) /dev/sda, and I re-add /dev/sdd1, it >> clearly uses the bitmap and re-syncs in under 1 second. > > Yes, there is a bug here. > When an array recovers on to a hot space it doesn't copy the bitmap > across. That will only happen lazily as bits are updated. > I'm surprised I hadn't noticed that before, so they might be more to > this than I'm seeing at the moment. But I definitely cannot find > code to copy the bitmap across. I'll have to have a think about > that. ok. >> Question 2: mdadm --detail and cat /proc/mdstat do not agree: >> >> NOTE: mdadm --detail says the rebuild status is 0% complete, but cat >> /proc/mdstat shows it as 7%. >> A bit later, I check again and they both agree - 14%. >> Below, from when the rebuild was 7% according to /proc/mdstat > > I cannot explain this except to wonder if 7% of the recovery > completed between running "mdadm -D" and "cat /proc/mdstat". > > The number report by "mdadm -D" is obtained by reading /proc/mdstat > and applying "atoi()" to the string that ends with a '%'. OK. As I see it, there are three issues here: 1. somehow a previously fully-active member got re-added (via --incremental) as a spare instead simply re-added, forcing a full rebuild. 2. new raid member bitmap weirdness (the bitmap doesn't get copied over on new members, causing all sorts of weirdness). 3. The unexplained difference between mdadm --detail and cat /proc/mdstat I have a few more questions / observations I'd like to make but I'll do those in another email. Thanks for your response(s)! -- Jon ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: weird issues with raid1
2008-12-15 13:42 ` Jon Nelson
@ 2008-12-15 21:33 ` Neil Brown
2008-12-15 21:47 ` Jon Nelson
0 siblings, 1 reply; 24+ messages in thread
From: Neil Brown @ 2008-12-15 21:33 UTC (permalink / raw)
To: Jon Nelson; +Cc: LinuxRaid

On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
>
> However, it raises a question: bitmaps are about 'resync' not
> 'recovery'? How do they differ?

With resync, the expectation is that most of the device is correct.
The bitmap tells us which sectors aren't, and we just resync those.

With recovery, the expectation is that the entire drive contains
garbage and it has to be recovered from beginning to end.

Each device has a flag to say whether the device is in sync with the
array. The bitmap records which sectors of "in-sync" devices may not
actually be in sync at the moment.
'resync' synchronises the 'in-sync' devices.
'recovery' synchronises a 'not-in-sync' device.

>
> >> Question 1:
> >> I'm using a bitmap. Why does the rebuild start completely over?
> >
> > Because the bitmap isn't used to guide a rebuild, only a resync.
> >
> > The effect of --re-add is to make md do a resync rather than a rebuild
> > if the device was previously a fully active member of the array.
>
> Aha! This explains a question I raised in another email. What
> happened there is a previously fully active member of the raid got
> added, somehow, as a spare, via --incremental. That's when the entire
> raid thought it needed to be rebuilt. How did that (the device being
> treated as a spare instead of as a previously fully active member)
> happen?

It is hard to guess without details, and they might be hard to collect
after the fact.
Maybe if you have the kernel logs of when the server rebooted and the
recovery started, that might contain some hints.

Thanks,
NeilBrown

^ permalink raw reply [flat|nested] 24+ messages in thread
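In terms of the commands used earlier in this thread, the two cases
look roughly like this (illustrative only; array and device names as
above):

# recovery: a device md considers not-in-sync (e.g. one that was just
# --add'ed) is reconstructed from beginning to end
mdadm /dev/md10 --add /dev/sdd1

# resync: a previously in-sync device that is --re-add'ed is only
# caught up on the chunks the write-intent bitmap marks dirty
mdadm /dev/md10 --re-add /dev/sdd1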
* Re: weird issues with raid1 2008-12-15 21:33 ` Neil Brown @ 2008-12-15 21:47 ` Jon Nelson 2008-12-16 1:21 ` Neil Brown 0 siblings, 1 reply; 24+ messages in thread From: Jon Nelson @ 2008-12-15 21:47 UTC (permalink / raw) To: Neil Brown; +Cc: LinuxRaid On Mon, Dec 15, 2008 at 3:33 PM, Neil Brown <neilb@suse.de> wrote: > On Monday December 15, jnelson-linux-raid@jamponi.net wrote: >> >> However, it raises a question: bitmaps are about 'resync' not >> 'recovery'? How do they differ? > > With resync, the expectation is that most of the device is correct. > The bitmap tells us which sectors aren't, and we just resync those. > > With recover, the expectation is that the entire drive contains > garbage and it has to be recovered from beginning to end. > > Each device has a flag to say where the device is in sync write the > array. The bit map records which sectors of "in-sync" devices may not > actually in in-sync at the moment. > 'resync' synchronises the 'in-sync' devices. > 'recovery' synchronises a 'not-in-sync' device.b > > >> >> >> Question 1: >> >> I'm using a bitmap. Why does the rebuild start completely over? >> > >> > Because the bitmap isn't used to guide a rebuild, only a resync. >> > >> > The effect of --re-add is to make md do a resync rather than a rebuild >> > if the device was previously a fully active member of the array. >> >> Aha! This explains a question I raised in another email. What >> happened there is a previously fully active member of the raid got >> added, somehow, as a spare, via --incremental. That's when the entire >> raid thought it needed to be rebuilt. How did that (the device being >> treated as a spare instead of as a previously fully active member) >> happen? > > It is hard to guess without details, and they might be hard to collect > after the fact. > Maybe if you have the kernel logs of when the server rebooted and the > recovery started, that might contain some hints. I hope this helps. Prior to the reboot: Dec 15 15:19:39 turnip kernel: md: md11: recovery done. Dec 15 15:19:39 turnip kernel: RAID1 conf printout: Dec 15 15:19:39 turnip kernel: --- wd:2 rd:2 Dec 15 15:19:39 turnip kernel: disk 0, wo:0, o:1, dev:nbd0 Dec 15 15:19:39 turnip kernel: disk 1, wo:0, o:1, dev:sda During booting: <6>raid1: raid set md11 active with 1 out of 2 mirrors <6>md11: bitmap initialized from disk: read 1/1 pages, set 1 bits <6>created bitmap (10 pages) for device md11 After boot: Dec 15 15:34:38 turnip kernel: md: bind<nbd0> Dec 15 15:34:38 turnip kernel: RAID1 conf printout: Dec 15 15:34:38 turnip kernel: --- wd:1 rd:2 Dec 15 15:34:38 turnip kernel: disk 0, wo:1, o:1, dev:nbd0 Dec 15 15:34:38 turnip kernel: disk 1, wo:0, o:1, dev:sda Dec 15 15:34:38 turnip kernel: md: recovery of RAID array md11 Dec 15 15:34:38 turnip kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk. Dec 15 15:34:38 turnip kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery. Dec 15 15:34:38 turnip kernel: md: using 128k window, over a total of 78123988 blocks. 
/dev/nbd0 was added via --incremental (mdadm 3.0) --detail: /dev/md11: Version : 01.00.03 Creation Time : Mon Dec 15 07:06:13 2008 Raid Level : raid1 Array Size : 78123988 (74.50 GiB 80.00 GB) Used Dev Size : 156247976 (149.01 GiB 160.00 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 11 Persistence : Superblock is persistent Intent Bitmap : Internal Update Time : Mon Dec 15 15:35:17 2008 State : active, degraded, recovering Active Devices : 1 Working Devices : 2 Failed Devices : 0 Spare Devices : 1 Rebuild Status : 9% complete Name : turnip:11 (local to host turnip) UUID : cf24d099:9e174a79:2a2f6797:dcff1420 Events : 3914 Number Major Minor RaidDevice State 2 43 0 0 spare rebuilding /dev/nbd0 3 8 0 1 active sync /dev/sda turnip:~ # mdadm --examine /dev/sda /dev/sda: Magic : a92b4efc Version : 1.0 Feature Map : 0x1 Array UUID : cf24d099:9e174a79:2a2f6797:dcff1420 Name : turnip:11 (local to host turnip) Creation Time : Mon Dec 15 07:06:13 2008 Raid Level : raid1 Raid Devices : 2 Avail Dev Size : 160086384 (76.34 GiB 81.96 GB) Array Size : 156247976 (74.50 GiB 80.00 GB) Used Dev Size : 156247976 (74.50 GiB 80.00 GB) Super Offset : 160086512 sectors State : clean Device UUID : 0059434c:ecef51a0:2974482d:ba38f944 Internal Bitmap : 2 sectors from superblock Update Time : Mon Dec 15 15:45:21 2008 Checksum : 21396863 - correct Events : 3916 Array Slot : 3 (failed, failed, empty, 1) Array State : _U 2 failed turnip:~ # turnip:~ # mdadm --examine /dev/nbd0 /dev/nbd0: Magic : a92b4efc Version : 1.0 Feature Map : 0x1 Array UUID : cf24d099:9e174a79:2a2f6797:dcff1420 Name : turnip:11 (local to host turnip) Creation Time : Mon Dec 15 07:06:13 2008 Raid Level : raid1 Raid Devices : 2 Avail Dev Size : 160086384 (76.34 GiB 81.96 GB) Array Size : 156247976 (74.50 GiB 80.00 GB) Used Dev Size : 156247976 (74.50 GiB 80.00 GB) Super Offset : 160086512 sectors State : clean Device UUID : 01524a75:c309869c:6da972c9:084115c6 Internal Bitmap : 2 sectors from superblock Flags : write-mostly Update Time : Mon Dec 15 15:45:21 2008 Checksum : 63bab8ce - correct Events : 3916 Array Slot : 2 (failed, failed, empty, 1) Array State : _u 2 failed turnip:~ # Thanks!! -- Jon ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: weird issues with raid1 2008-12-15 21:47 ` Jon Nelson @ 2008-12-16 1:21 ` Neil Brown 2008-12-16 2:32 ` Jon Nelson 2008-12-18 4:42 ` Neil Brown 0 siblings, 2 replies; 24+ messages in thread From: Neil Brown @ 2008-12-16 1:21 UTC (permalink / raw) To: Jon Nelson; +Cc: LinuxRaid On Monday December 15, jnelson-linux-raid@jamponi.net wrote: > On Mon, Dec 15, 2008 at 3:33 PM, Neil Brown <neilb@suse.de> wrote: > > On Monday December 15, jnelson-linux-raid@jamponi.net wrote: > >> > >> Aha! This explains a question I raised in another email. What > >> happened there is a previously fully active member of the raid got > >> added, somehow, as a spare, via --incremental. That's when the entire > >> raid thought it needed to be rebuilt. How did that (the device being > >> treated as a spare instead of as a previously fully active member) > >> happen? > > > > It is hard to guess without details, and they might be hard to collect > > after the fact. > > Maybe if you have the kernel logs of when the server rebooted and the > > recovery started, that might contain some hints. > > I hope this helps. Yes it does, though I generally prefer to get more complete logs. If I get the surrounding log lines then I know what isn't there as well as what is - and it isn't always clear at first which bits will be important. The problem here is that --incremental doesn't provide the --re-add functionality that you are depending on. That was an oversight on my part. I'll see if I can get it fixed. In the mean time, you'll need to use --re-add (or --add, it does the same thing in your situation) to add nbd0 to the array. NeilBrown > > Prior to the reboot: > > Dec 15 15:19:39 turnip kernel: md: md11: recovery done. > Dec 15 15:19:39 turnip kernel: RAID1 conf printout: > Dec 15 15:19:39 turnip kernel: --- wd:2 rd:2 > Dec 15 15:19:39 turnip kernel: disk 0, wo:0, o:1, dev:nbd0 > Dec 15 15:19:39 turnip kernel: disk 1, wo:0, o:1, dev:sda > > During booting: > > <6>raid1: raid set md11 active with 1 out of 2 mirrors > <6>md11: bitmap initialized from disk: read 1/1 pages, set 1 bits > <6>created bitmap (10 pages) for device md11 > > After boot: > > Dec 15 15:34:38 turnip kernel: md: bind<nbd0> > Dec 15 15:34:38 turnip kernel: RAID1 conf printout: > Dec 15 15:34:38 turnip kernel: --- wd:1 rd:2 > Dec 15 15:34:38 turnip kernel: disk 0, wo:1, o:1, dev:nbd0 > Dec 15 15:34:38 turnip kernel: disk 1, wo:0, o:1, dev:sda > Dec 15 15:34:38 turnip kernel: md: recovery of RAID array md11 > Dec 15 15:34:38 turnip kernel: md: minimum _guaranteed_ speed: 1000 > KB/sec/disk. > Dec 15 15:34:38 turnip kernel: md: using maximum available idle IO > bandwidth (but not more than 200000 KB/sec) for recovery. > Dec 15 15:34:38 turnip kernel: md: using 128k window, over a total of > 78123988 blocks. > > /dev/nbd0 was added via --incremental (mdadm 3.0) > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: weird issues with raid1 2008-12-16 1:21 ` Neil Brown @ 2008-12-16 2:32 ` Jon Nelson 2008-12-18 4:42 ` Neil Brown 1 sibling, 0 replies; 24+ messages in thread From: Jon Nelson @ 2008-12-16 2:32 UTC (permalink / raw) To: Neil Brown; +Cc: LinuxRaid On Mon, Dec 15, 2008 at 7:21 PM, Neil Brown <neilb@suse.de> wrote: > On Monday December 15, jnelson-linux-raid@jamponi.net wrote: >> I hope this helps. > > Yes it does, though I generally prefer to get more complete logs. If > I get the surrounding log lines then I know what isn't there as well > as what is - and it isn't always clear at first which bits will be > important. Quite literally the rest of /var/log/messages was stuff unrelated (dhcp, etc...). However, I'll try to include more context next time. > The problem here is that --incremental doesn't provide the --re-add > functionality that you are depending on. That was an oversight on my > part. I'll see if I can get it fixed. > In the mean time, you'll need to use --re-add (or --add, it does the > same thing in your situation) to add nbd0 to the array. Why does it usually work as though I *had* used --re-add (and specified the right array)? -- Jon ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: weird issues with raid1 2008-12-16 1:21 ` Neil Brown 2008-12-16 2:32 ` Jon Nelson @ 2008-12-18 4:42 ` Neil Brown 2008-12-18 4:50 ` Jon Nelson 1 sibling, 1 reply; 24+ messages in thread From: Neil Brown @ 2008-12-18 4:42 UTC (permalink / raw) To: Jon Nelson, LinuxRaid On Tuesday December 16, neilb@suse.de wrote: > On Monday December 15, jnelson-linux-raid@jamponi.net wrote: > > On Mon, Dec 15, 2008 at 3:33 PM, Neil Brown <neilb@suse.de> wrote: > > > On Monday December 15, jnelson-linux-raid@jamponi.net wrote: > > >> > > >> Aha! This explains a question I raised in another email. What > > >> happened there is a previously fully active member of the raid got > > >> added, somehow, as a spare, via --incremental. That's when the entire > > >> raid thought it needed to be rebuilt. How did that (the device being > > >> treated as a spare instead of as a previously fully active member) > > >> happen? > > > > > > It is hard to guess without details, and they might be hard to collect > > > after the fact. > > > Maybe if you have the kernel logs of when the server rebooted and the > > > recovery started, that might contain some hints. > > > > I hope this helps. > > Yes it does, though I generally prefer to get more complete logs. If > I get the surrounding log lines then I know what isn't there as well > as what is - and it isn't always clear at first which bits will be > important. > > The problem here is that --incremental doesn't provide the --re-add > functionality that you are depending on. That was an oversight on my > part. I'll see if I can get it fixed. > In the mean time, you'll need to use --re-add (or --add, it does the > same thing in your situation) to add nbd0 to the array. Actually, I'm wrong. --incremental does do the right thing w.r.t. --re-add. I couldn't reproduce your symptoms. It could be that you are hitting the bug fixed by commit a0da84f35b25875870270d16b6eccda4884d61a7 You would need 2.6.26 or later to have that fixed. Can you try with a newer kernel??? NeilBrown ^ permalink raw reply [flat|nested] 24+ messages in thread
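Two quick ways to check whether a given tree already carries that
commit (suggestions, not from the original thread: the first assumes
a mainline git clone, the second an unpacked kernel source tree; the
grep matches the comment added by the fix, so a hit means the patch
is present):

git describe --contains a0da84f35b25875870270d16b6eccda4884d61a7
grep -n 'rocking back to read-only' drivers/md/bitmap.c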
* Re: weird issues with raid1 2008-12-18 4:42 ` Neil Brown @ 2008-12-18 4:50 ` Jon Nelson 2008-12-18 4:55 ` Jon Nelson 0 siblings, 1 reply; 24+ messages in thread From: Jon Nelson @ 2008-12-18 4:50 UTC (permalink / raw) To: Neil Brown; +Cc: LinuxRaid On Wed, Dec 17, 2008 at 10:42 PM, Neil Brown <neilb@suse.de> wrote: > On Tuesday December 16, neilb@suse.de wrote: >> On Monday December 15, jnelson-linux-raid@jamponi.net wrote: >> > On Mon, Dec 15, 2008 at 3:33 PM, Neil Brown <neilb@suse.de> wrote: >> > > On Monday December 15, jnelson-linux-raid@jamponi.net wrote: >> > >> >> > >> Aha! This explains a question I raised in another email. What >> > >> happened there is a previously fully active member of the raid got >> > >> added, somehow, as a spare, via --incremental. That's when the entire >> > >> raid thought it needed to be rebuilt. How did that (the device being >> > >> treated as a spare instead of as a previously fully active member) >> > >> happen? >> > > >> > > It is hard to guess without details, and they might be hard to collect >> > > after the fact. >> > > Maybe if you have the kernel logs of when the server rebooted and the >> > > recovery started, that might contain some hints. >> > >> > I hope this helps. >> >> Yes it does, though I generally prefer to get more complete logs. If >> I get the surrounding log lines then I know what isn't there as well >> as what is - and it isn't always clear at first which bits will be >> important. >> >> The problem here is that --incremental doesn't provide the --re-add >> functionality that you are depending on. That was an oversight on my >> part. I'll see if I can get it fixed. >> In the mean time, you'll need to use --re-add (or --add, it does the >> same thing in your situation) to add nbd0 to the array. > > Actually, I'm wrong. > --incremental does do the right thing w.r.t. --re-add. > I couldn't reproduce your symptoms. OK. > It could be that you are hitting the bug fixed by > commit a0da84f35b25875870270d16b6eccda4884d61a7 That sure sounds like it. I'd have to log to see what happened, exactly, but I've added substantial logging around the device discovery and addition section which manages this particular raid. > You would need 2.6.26 or later to have that fixed. > Can you try with a newer kernel??? I hope to be giving opensuse 11.1 a try soon, which uses 2.6.27.X afaik. I suspect I can also backport that patch to 2.6.25 easily. -- Jon ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: weird issues with raid1 2008-12-18 4:50 ` Jon Nelson @ 2008-12-18 4:55 ` Jon Nelson 2008-12-18 5:17 ` Neil Brown 0 siblings, 1 reply; 24+ messages in thread From: Jon Nelson @ 2008-12-18 4:55 UTC (permalink / raw) To: Neil Brown; +Cc: LinuxRaid On Wed, Dec 17, 2008 at 10:50 PM, Jon Nelson <jnelson-linux-raid@jamponi.net> wrote: > On Wed, Dec 17, 2008 at 10:42 PM, Neil Brown <neilb@suse.de> wrote: >> On Tuesday December 16, neilb@suse.de wrote: >>> On Monday December 15, jnelson-linux-raid@jamponi.net wrote: >>> > On Mon, Dec 15, 2008 at 3:33 PM, Neil Brown <neilb@suse.de> wrote: >>> > > On Monday December 15, jnelson-linux-raid@jamponi.net wrote: >>> > >> >>> > >> Aha! This explains a question I raised in another email. What >>> > >> happened there is a previously fully active member of the raid got >>> > >> added, somehow, as a spare, via --incremental. That's when the entire >>> > >> raid thought it needed to be rebuilt. How did that (the device being >>> > >> treated as a spare instead of as a previously fully active member) >>> > >> happen? >>> > > >>> > > It is hard to guess without details, and they might be hard to collect >>> > > after the fact. >>> > > Maybe if you have the kernel logs of when the server rebooted and the >>> > > recovery started, that might contain some hints. >>> > >>> > I hope this helps. >>> >>> Yes it does, though I generally prefer to get more complete logs. If >>> I get the surrounding log lines then I know what isn't there as well >>> as what is - and it isn't always clear at first which bits will be >>> important. >>> >>> The problem here is that --incremental doesn't provide the --re-add >>> functionality that you are depending on. That was an oversight on my >>> part. I'll see if I can get it fixed. >>> In the mean time, you'll need to use --re-add (or --add, it does the >>> same thing in your situation) to add nbd0 to the array. >> >> Actually, I'm wrong. >> --incremental does do the right thing w.r.t. --re-add. >> I couldn't reproduce your symptoms. > > OK. > >> It could be that you are hitting the bug fixed by >> commit a0da84f35b25875870270d16b6eccda4884d61a7 > > That sure sounds like it. I'd have to log to see what happened, > exactly, but I've added substantial logging around the device > discovery and addition section which manages this particular raid. > >> You would need 2.6.26 or later to have that fixed. >> Can you try with a newer kernel??? > > I hope to be giving opensuse 11.1 a try soon, which uses 2.6.27.X > afaik. I suspect I can also backport that patch to 2.6.25 easily. The kernel source for 2.6.25.18-0.2 (from suse) has this patch already, so I was already using it. Perhaps this weekend or some night this week I'll find time to try to break things again. -- Jon ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: weird issues with raid1 2008-12-18 4:55 ` Jon Nelson @ 2008-12-18 5:17 ` Neil Brown 2008-12-18 5:47 ` Jon Nelson 0 siblings, 1 reply; 24+ messages in thread From: Neil Brown @ 2008-12-18 5:17 UTC (permalink / raw) To: Jon Nelson; +Cc: LinuxRaid On Wednesday December 17, jnelson-linux-raid@jamponi.net wrote: > > > >> It could be that you are hitting the bug fixed by > >> commit a0da84f35b25875870270d16b6eccda4884d61a7 > > > > That sure sounds like it. I'd have to log to see what happened, > > exactly, but I've added substantial logging around the device > > discovery and addition section which manages this particular raid. > > > >> You would need 2.6.26 or later to have that fixed. > >> Can you try with a newer kernel??? > > > > I hope to be giving opensuse 11.1 a try soon, which uses 2.6.27.X > > afaik. I suspect I can also backport that patch to 2.6.25 easily. > > The kernel source for 2.6.25.18-0.2 (from suse) has this patch > already, so I was already using it. Are you sure? I just looked in the openSUSE-11.0 kernel tree and I cannot see it there.... NeilBrown > > Perhaps this weekend or some night this week I'll find time to try to > break things again. > > -- > Jon ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: weird issues with raid1 2008-12-18 5:17 ` Neil Brown @ 2008-12-18 5:47 ` Jon Nelson 2008-12-18 6:21 ` Neil Brown 0 siblings, 1 reply; 24+ messages in thread From: Jon Nelson @ 2008-12-18 5:47 UTC (permalink / raw) To: Neil Brown; +Cc: LinuxRaid On Wed, Dec 17, 2008 at 11:17 PM, Neil Brown <neilb@suse.de> wrote: > On Wednesday December 17, jnelson-linux-raid@jamponi.net wrote: >> > >> >> It could be that you are hitting the bug fixed by >> >> commit a0da84f35b25875870270d16b6eccda4884d61a7 >> > >> > That sure sounds like it. I'd have to log to see what happened, >> > exactly, but I've added substantial logging around the device >> > discovery and addition section which manages this particular raid. >> > >> >> You would need 2.6.26 or later to have that fixed. >> >> Can you try with a newer kernel??? >> > >> > I hope to be giving opensuse 11.1 a try soon, which uses 2.6.27.X >> > afaik. I suspect I can also backport that patch to 2.6.25 easily. >> >> The kernel source for 2.6.25.18-0.2 (from suse) has this patch >> already, so I was already using it. > > Are you sure? I just looked in the openSUSE-11.0 kernel tree and I > cannot see it there.... > > NeilBrown > > >> >> Perhaps this weekend or some night this week I'll find time to try to >> break things again. >> >> -- >> Jon > jnelson@turnip:~/kernels> rpm -qf /usr/src/linux-2.6.25.18-0.2 kernel-source-2.6.25.18-0.2 jnelson@turnip:~/kernels> rpm -V kernel-source-2.6.25.18-0.2 jnelson@turnip:~/kernels> (cd linux-2.6 && git diff a0da84f35b25875870270d16b6eccda4884d61a7 a0da84f35b25875870270d16b6eccda4884d61a7^ ) > d.diff jnelson@turnip:~/kernels> head d.diff diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c index dedba16..b26927c 100644 --- a/drivers/md/bitmap.c +++ b/drivers/md/bitmap.c @@ -454,11 +454,8 @@ void bitmap_update_sb(struct bitmap *bitmap) spin_unlock_irqrestore(&bitmap->lock, flags); sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0); sb->events = cpu_to_le64(bitmap->mddev->events); - if (bitmap->mddev->events < bitmap->events_cleared) { - /* rocking back to read-only */ jnelson@turnip:~/kernels> cp -r /usr/src/linux-2.6.25.18-0.2 . jnelson@turnip:~/kernels/linux-2.6.25.18-0.2> tail -n +454 drivers/md/bitmap.c | head -n 20 { bitmap_super_t *sb; unsigned long flags; if (!bitmap || !bitmap->mddev) /* no bitmap for this array */ return; spin_lock_irqsave(&bitmap->lock, flags); if (!bitmap->sb_page) { /* no superblock */ spin_unlock_irqrestore(&bitmap->lock, flags); return; } spin_unlock_irqrestore(&bitmap->lock, flags); sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0); sb->events = cpu_to_le64(bitmap->mddev->events); if (!bitmap->mddev->degraded) sb->events_cleared = cpu_to_le64(bitmap->mddev->events); kunmap_atomic(sb, KM_USER0); write_page(bitmap, bitmap->sb_page, 1); } When I view the diff and the source they appear to agree. -- Jon ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: weird issues with raid1 2008-12-18 5:47 ` Jon Nelson @ 2008-12-18 6:21 ` Neil Brown 2008-12-19 2:15 ` Jon Nelson 0 siblings, 1 reply; 24+ messages in thread From: Neil Brown @ 2008-12-18 6:21 UTC (permalink / raw) To: Jon Nelson; +Cc: LinuxRaid On Wednesday December 17, jnelson-linux-raid@jamponi.net wrote: > On Wed, Dec 17, 2008 at 11:17 PM, Neil Brown <neilb@suse.de> wrote: > > On Wednesday December 17, jnelson-linux-raid@jamponi.net wrote: > >> > > >> >> It could be that you are hitting the bug fixed by > >> >> commit a0da84f35b25875870270d16b6eccda4884d61a7 > >> > > >> > That sure sounds like it. I'd have to log to see what happened, > >> > exactly, but I've added substantial logging around the device > >> > discovery and addition section which manages this particular raid. > >> > > >> >> You would need 2.6.26 or later to have that fixed. > >> >> Can you try with a newer kernel??? > >> > > >> > I hope to be giving opensuse 11.1 a try soon, which uses 2.6.27.X > >> > afaik. I suspect I can also backport that patch to 2.6.25 easily. > >> > >> The kernel source for 2.6.25.18-0.2 (from suse) has this patch > >> already, so I was already using it. > > > > Are you sure? I just looked in the openSUSE-11.0 kernel tree and I > > cannot see it there.... > > > > NeilBrown > > > > > >> > >> Perhaps this weekend or some night this week I'll find time to try to > >> break things again. > >> > >> -- > >> Jon > > > > jnelson@turnip:~/kernels> rpm -qf /usr/src/linux-2.6.25.18-0.2 > kernel-source-2.6.25.18-0.2 > jnelson@turnip:~/kernels> rpm -V kernel-source-2.6.25.18-0.2 > jnelson@turnip:~/kernels> (cd linux-2.6 && git diff > a0da84f35b25875870270d16b6eccda4884d61a7 > a0da84f35b25875870270d16b6eccda4884d61a7^ ) > d.diff This is requesting the diff between a given version, and the previous version. So it will be a reversed diff. > jnelson@turnip:~/kernels> head d.diff > diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c > index dedba16..b26927c 100644 > --- a/drivers/md/bitmap.c > +++ b/drivers/md/bitmap.c > @@ -454,11 +454,8 @@ void bitmap_update_sb(struct bitmap *bitmap) > spin_unlock_irqrestore(&bitmap->lock, flags); > sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0); > sb->events = cpu_to_le64(bitmap->mddev->events); > - if (bitmap->mddev->events < bitmap->events_cleared) { > - /* rocking back to read-only */ i.e. these two lines are *added* by the patch. I usually use e.g. git log -p a0da84f35b25875870270d16b6eccda4884d61a7 to look at diffs. Less room for confusion. (or gitk). > jnelson@turnip:~/kernels> cp -r /usr/src/linux-2.6.25.18-0.2 . > jnelson@turnip:~/kernels/linux-2.6.25.18-0.2> tail -n +454 > drivers/md/bitmap.c | head -n 20 > { > bitmap_super_t *sb; > unsigned long flags; > > if (!bitmap || !bitmap->mddev) /* no bitmap for this array */ > return; > spin_lock_irqsave(&bitmap->lock, flags); > if (!bitmap->sb_page) { /* no superblock */ > spin_unlock_irqrestore(&bitmap->lock, flags); > return; > } > spin_unlock_irqrestore(&bitmap->lock, flags); > sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0); > sb->events = cpu_to_le64(bitmap->mddev->events); > if (!bitmap->mddev->degraded) > sb->events_cleared = cpu_to_le64(bitmap->mddev->events); > kunmap_atomic(sb, KM_USER0); > write_page(bitmap, bitmap->sb_page, 1); > } and as those two lines are not present here, the patch as not been applied. :-) NeilBrown ^ permalink raw reply [flat|nested] 24+ messages in thread
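For reference, the forward (old-to-new) form of the same comparison
is either of the following, run in a mainline clone:

git diff a0da84f35b25875870270d16b6eccda4884d61a7^ a0da84f35b25875870270d16b6eccda4884d61a7
git show a0da84f35b25875870270d16b6eccda4884d61a7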
* Re: weird issues with raid1 2008-12-18 6:21 ` Neil Brown @ 2008-12-19 2:15 ` Jon Nelson 2008-12-19 16:51 ` Jon Nelson 0 siblings, 1 reply; 24+ messages in thread From: Jon Nelson @ 2008-12-19 2:15 UTC (permalink / raw) To: Neil Brown; +Cc: LinuxRaid >> jnelson@turnip:~/kernels> (cd linux-2.6 && git diff >> a0da84f35b25875870270d16b6eccda4884d61a7 >> a0da84f35b25875870270d16b6eccda4884d61a7^ ) > d.diff > > This is requesting the diff between a given version, and the previous > version. So it will be a reversed diff. *sigh* > i.e. these two lines are *added* by the patch. > I usually use e.g. > git log -p a0da84f35b25875870270d16b6eccda4884d61a7 > to look at diffs. Less room for confusion. (or gitk). I will remember that one! > and as those two lines are not present here, the patch as not been > applied. I'll apply and get back to you. My raid rebuilt 3 times today, quite possibly because of this. Obviously, I'm abusing the code in ways it was not intended to be used. Sometimes that's good for finding corner-case-y kinds of issues, though. Thanks again for your patience. -- Jon ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: weird issues with raid1 2008-12-19 2:15 ` Jon Nelson @ 2008-12-19 16:51 ` Jon Nelson 2008-12-19 20:40 ` Jon Nelson 0 siblings, 1 reply; 24+ messages in thread From: Jon Nelson @ 2008-12-19 16:51 UTC (permalink / raw) To: Neil Brown; +Cc: LinuxRaid > I'll apply and get back to you. My raid rebuilt 3 times today, quite > possibly because of this. I'm now running the patch from a0da84f35b25875870270d16b6eccda4884d61a7 and it still did a complete rebuild. Was that expected, the first time the device was re-added? I just rebooted into the new kernel... (the logs prefixed with nbd0-frank are the result of the follwoing commands: mdadm --examine --test /dev/nbd0 mdadm --examine-bitmap /dev/nbd0 and eventually mdadm /dev/md11 --re-add /dev/nbd0 ) Dec 19 10:30:41 turnip nbd0-frank: /dev/nbd0: Dec 19 10:30:41 turnip nbd0-frank: Magic : a92b4efc Dec 19 10:30:41 turnip nbd0-frank: Version : 1.0 Dec 19 10:30:41 turnip nbd0-frank: Feature Map : 0x1 Dec 19 10:30:41 turnip nbd0-frank: Array UUID : cf24d099:9e174a79:2a2f6797:dcff1420 Dec 19 10:30:41 turnip nbd0-frank: Name : turnip:11 (local to host turnip) Dec 19 10:30:41 turnip nbd0-frank: Creation Time : Mon Dec 15 07:06:13 2008 Dec 19 10:30:41 turnip nbd0-frank: Raid Level : raid1 Dec 19 10:30:41 turnip nbd0-frank: Raid Devices : 2 Dec 19 10:30:41 turnip nbd0-frank: Dec 19 10:30:41 turnip nbd0-frank: Avail Dev Size : 160086384 (76.34 GiB 81.96 GB) Dec 19 10:30:41 turnip nbd0-frank: Array Size : 156247976 (74.50 GiB 80.00 GB) Dec 19 10:30:41 turnip nbd0-frank: Used Dev Size : 156247976 (74.50 GiB 80.00 GB) Dec 19 10:30:41 turnip nbd0-frank: Super Offset : 160086512 sectors Dec 19 10:30:41 turnip nbd0-frank: State : clean Dec 19 10:30:41 turnip nbd0-frank: Device UUID : 01524a75:c309869c:6da972c9:084115c6 Dec 19 10:30:41 turnip nbd0-frank: Dec 19 10:30:41 turnip nbd0-frank: Internal Bitmap : 2 sectors from superblock Dec 19 10:30:41 turnip nbd0-frank: Flags : write-mostly Dec 19 10:30:41 turnip nbd0-frank: Update Time : Fri Dec 19 09:46:48 2008 Dec 19 10:30:41 turnip nbd0-frank: Checksum : 63bfb069 - correct Dec 19 10:30:41 turnip nbd0-frank: Events : 5360 Dec 19 10:30:41 turnip nbd0-frank: Dec 19 10:30:41 turnip nbd0-frank: Dec 19 10:30:41 turnip nbd0-frank: Array Slot : 2 (failed, failed, empty, 1) Dec 19 10:30:41 turnip nbd0-frank: Array State : _u 2 failed Dec 19 10:30:41 turnip nbd0-frank: Filename : /dev/nbd0 Dec 19 10:30:41 turnip nbd0-frank: Magic : 6d746962 Dec 19 10:30:41 turnip nbd0-frank: Version : 4 Dec 19 10:30:41 turnip nbd0-frank: UUID : cf24d099:9e174a79:2a2f6797:dcff1420 Dec 19 10:30:41 turnip nbd0-frank: Events : 4462 Dec 19 10:30:41 turnip nbd0-frank: Events Cleared : 4462 Dec 19 10:30:41 turnip nbd0-frank: State : OK Dec 19 10:30:41 turnip nbd0-frank: Chunksize : 4 MB Dec 19 10:30:41 turnip nbd0-frank: Daemon : 5s flush period Dec 19 10:30:41 turnip nbd0-frank: Write Mode : Allow write behind, max 256 Dec 19 10:30:41 turnip nbd0-frank: Sync Size : 78123988 (74.50 GiB 80.00 GB) Dec 19 10:30:41 turnip nbd0-frank: Bitmap : 19074 bits (chunks), 0 dirty (0.0%) Dec 19 10:30:41 turnip nbd0-frank: Pre-setting the recovery speed to 5MB/s to avoid saturating network... Dec 19 10:30:41 turnip nbd0-frank: Adding /dev/nbd0 to /dev/md11.... 
Dec 19 10:30:41 turnip kernel: md: bind<nbd0> Dec 19 10:30:41 turnip nbd0-frank: mdadm: re-added /dev/nbd0 Dec 19 10:30:41 turnip kernel: RAID1 conf printout: Dec 19 10:30:41 turnip kernel: --- wd:1 rd:2 Dec 19 10:30:41 turnip kernel: disk 0, wo:1, o:1, dev:nbd0 Dec 19 10:30:41 turnip kernel: disk 1, wo:0, o:1, dev:sda Dec 19 10:30:41 turnip kernel: md: recovery of RAID array md11 Dec 19 10:30:41 turnip kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk. Dec 19 10:30:41 turnip kernel: md: using maximum available idle IO bandwidth (but not more than 5120 KB/sec) for recovery. Dec 19 10:30:41 turnip kernel: md: using 128k window, over a total of 78123988 blocks. -- Jon ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: weird issues with raid1 2008-12-19 16:51 ` Jon Nelson @ 2008-12-19 20:40 ` Jon Nelson 2008-12-19 21:18 ` Jon Nelson 0 siblings, 1 reply; 24+ messages in thread From: Jon Nelson @ 2008-12-19 20:40 UTC (permalink / raw) To: Neil Brown; +Cc: LinuxRaid On Fri, Dec 19, 2008 at 10:51 AM, Jon Nelson <jnelson-linux-raid@jamponi.net> wrote: >> I'll apply and get back to you. My raid rebuilt 3 times today, quite >> possibly because of this. > > I'm now running the patch from > a0da84f35b25875870270d16b6eccda4884d61a7 and it still did a complete > rebuild. Was that expected, the first time the device was re-added? After the array reconstructed completely, I did the following: 1. --fail then --remove /dev/nbd0 2. unmounted /dev/md11 3. mdadm --stop /dev/md11 4. mdadm --assemble --scan (this started /dev/md11): Dec 19 14:21:17 turnip kernel: raid1: raid set md11 active with 1 out of 2 mirrors Dec 19 14:21:17 turnip kernel: md11: bitmap initialized from disk: read 1/1 pages, set 0 bits Dec 19 14:21:17 turnip kernel: created bitmap (10 pages) for device md11 5. fsck.ext3 -f -v -D -C0 /dev/md11 (this caused some writes to take place, and I wanted to fsck the volume anyway) 6. --re-add /dev/nbd0 At step 6, the array decided to go into recovery: Dec 19 14:32:26 turnip kernel: md: bind<nbd0> Dec 19 14:32:26 turnip kernel: RAID1 conf printout: Dec 19 14:32:26 turnip kernel: --- wd:1 rd:2 Dec 19 14:32:26 turnip kernel: disk 0, wo:1, o:1, dev:nbd0 Dec 19 14:32:26 turnip kernel: disk 1, wo:0, o:1, dev:sda Dec 19 14:32:26 turnip kernel: md: recovery of RAID array md11 and has some time to go ... [=>...................] recovery = 7.7% (6031360/78123988) finish=234.6min speed=5120K/sec At the time I --re-add'd /dev/nbd0, I also did an --examine and --examine-bitmap of /dev/nbd0: Dec 19 14:32:26 turnip nbd0-frank: /dev/nbd0: Dec 19 14:32:26 turnip nbd0-frank: Magic : a92b4efc Dec 19 14:32:26 turnip nbd0-frank: Version : 1.0 Dec 19 14:32:26 turnip nbd0-frank: Feature Map : 0x1 Dec 19 14:32:26 turnip nbd0-frank: Array UUID : cf24d099:9e174a79:2a2f6797:dcff1420 Dec 19 14:32:26 turnip nbd0-frank: Name : turnip:11 (local to host turnip) Dec 19 14:32:26 turnip nbd0-frank: Creation Time : Mon Dec 15 07:06:13 2008 Dec 19 14:32:26 turnip nbd0-frank: Raid Level : raid1 Dec 19 14:32:26 turnip nbd0-frank: Raid Devices : 2 Dec 19 14:32:26 turnip nbd0-frank: Dec 19 14:32:26 turnip nbd0-frank: Avail Dev Size : 160086384 (76.34 GiB 81.96 GB) Dec 19 14:32:26 turnip nbd0-frank: Array Size : 156247976 (74.50 GiB 80.00 GB) Dec 19 14:32:26 turnip nbd0-frank: Used Dev Size : 156247976 (74.50 GiB 80.00 GB) Dec 19 14:32:26 turnip nbd0-frank: Super Offset : 160086512 sectors Dec 19 14:32:26 turnip nbd0-frank: State : clean Dec 19 14:32:26 turnip nbd0-frank: Device UUID : 01524a75:c309869c:6da972c9:084115c6 Dec 19 14:32:26 turnip nbd0-frank: Dec 19 14:32:26 turnip nbd0-frank: Internal Bitmap : 2 sectors from superblock Dec 19 14:32:26 turnip nbd0-frank: Flags : write-mostly Dec 19 14:32:26 turnip nbd0-frank: Update Time : Fri Dec 19 14:20:52 2008 Dec 19 14:32:26 turnip nbd0-frank: Checksum : 63bef0c2 - correct Dec 19 14:32:26 turnip nbd0-frank: Events : 5388 Dec 19 14:32:26 turnip nbd0-frank: Dec 19 14:32:26 turnip nbd0-frank: Dec 19 14:32:26 turnip nbd0-frank: Array Slot : 2 (failed, failed, 0, 1) Dec 19 14:32:26 turnip nbd0-frank: Array State : Uu 2 failed Dec 19 14:32:26 turnip nbd0-frank: Filename : /dev/nbd0 Dec 19 14:32:26 turnip nbd0-frank: Magic : 6d746962 Dec 19 14:32:26 turnip nbd0-frank: Version : 4 Dec 19 14:32:26 turnip 
nbd0-frank: UUID : cf24d099:9e174a79:2a2f6797:dcff1420 Dec 19 14:32:26 turnip nbd0-frank: Events : 5388 Dec 19 14:32:26 turnip nbd0-frank: Events Cleared : 4462 Dec 19 14:32:26 turnip nbd0-frank: State : OK Dec 19 14:32:26 turnip nbd0-frank: Chunksize : 4 MB Dec 19 14:32:26 turnip nbd0-frank: Daemon : 5s flush period Dec 19 14:32:26 turnip nbd0-frank: Write Mode : Allow write behind, max 256 Dec 19 14:32:26 turnip nbd0-frank: Sync Size : 78123988 (74.50 GiB 80.00 GB) Dec 19 14:32:26 turnip nbd0-frank: Bitmap : 19074 bits (chunks), 0 dirty (0.0%) Dec 19 14:32:26 turnip nbd0-frank: Pre-setting the recovery speed to 5MB/s to avoid saturating netwo rk... Dec 19 14:32:26 turnip nbd0-frank: Adding /dev/nbd0 to /dev/md11.... Dec 19 14:32:26 turnip kernel: md: bind<nbd0> So. What's going on here? I applied the patch which /starts out/ looking like this: diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c index b26927c..dedba16 100644 --- a/drivers/md/bitmap.c +++ b/drivers/md/bitmap.c @@ -454,8 +454,11 @@ void bitmap_update_sb(struct bitmap *bitmap) spin_unlock_irqrestore(&bitmap->lock, flags); sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0); sb->events = cpu_to_le64(bitmap->mddev->events); - if (!bitmap->mddev->degraded) - sb->events_cleared = cpu_to_le64(bitmap->mddev->events); + if (bitmap->mddev->events < bitmap->events_cleared) { + /* rocking back to read-only */ + bitmap->events_cleared = bitmap->mddev->events; + sb->events_cleared = cpu_to_le64(bitmap->events_cleared); + } kunmap_atomic(sb, KM_USER0); write_page(bitmap, bitmap->sb_page, 1); } @@ -1085,9 +1088,19 @@ void bitmap_daemon_work(struct bitmap *bitmap) To the 2.6.25.18-0.2 source, rebuilt, installed, and rebooted. /me wipes brow -- Jon ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: weird issues with raid1 2008-12-19 20:40 ` Jon Nelson @ 2008-12-19 21:18 ` Jon Nelson 2008-12-22 14:40 ` Jon Nelson 0 siblings, 1 reply; 24+ messages in thread From: Jon Nelson @ 2008-12-19 21:18 UTC (permalink / raw) To: Neil Brown; +Cc: LinuxRaid A correction: I used: mdadm --assemble /dev/md11 --scan to assemble md11. > 6. --re-add /dev/nbd0 > > At step 6, the array decided to go into recovery: > > Dec 19 14:32:26 turnip kernel: md: bind<nbd0> > Dec 19 14:32:26 turnip kernel: RAID1 conf printout: > Dec 19 14:32:26 turnip kernel: --- wd:1 rd:2 > Dec 19 14:32:26 turnip kernel: disk 0, wo:1, o:1, dev:nbd0 > Dec 19 14:32:26 turnip kernel: disk 1, wo:0, o:1, dev:sda > Dec 19 14:32:26 turnip kernel: md: recovery of RAID array md11 > > and has some time to go ... > > [=>...................] recovery = 7.7% (6031360/78123988) > finish=234.6min speed=5120K/sec (I bumped the recovery speed up to it's maximum, FYI.) Going off of the timestamps below, it took about 35 minutes to recover. That's way faster than the more than 2 hours necessary for a full sync. Therefore, I must assume that the bitmap is at least partially working. Dec 19 15:06:41 turnip kernel: md: md11: recovery done. Dec 19 15:06:41 turnip kernel: RAID1 conf printout: Dec 19 15:06:41 turnip kernel: --- wd:2 rd:2 Dec 19 15:06:41 turnip kernel: disk 0, wo:0, o:1, dev:nbd0 Dec 19 15:06:41 turnip kernel: disk 1, wo:0, o:1, dev:sda I'm going to re-do this experiment and grab an --examine-bitmap after a minute or so into the rebuild to see what happens. I am tentatively saying that the commit you suggested may be the root cause of some of the "unnecessary full-sync" issues I've had. -- Jon ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: weird issues with raid1
2008-12-19 21:18 ` Jon Nelson
@ 2008-12-22 14:40 ` Jon Nelson
2008-12-22 21:07 ` NeilBrown
0 siblings, 1 reply; 24+ messages in thread
From: Jon Nelson @ 2008-12-22 14:40 UTC (permalink / raw)
Cc: LinuxRaid

More updates:

1. I upgraded to openSUSE 11.1 over the weekend. The kernel is
2.6.27.7-9 as of this writing.

2. When I fired up the machine which hosts the network block device,
the machine hosting the raid properly noticed and --re-added /dev/nbd0
to /dev/md11.

3. /dev/md11 went into "recover" mode (not resync).

4. I'm using persistent metadata and a write-intent bitmap.

**Question**: What am I doing wrong here? Why does --re-add cause a
full rebuild instead of a (partial) resync?

If I'm reading the output from --examine-bitmap (below) correctly,
there are 2049 dirty bits at 4MB per bit, or about 8196 MB to resync.

According to this (from the manpage):

If an array is using a write-intent bitmap, then devices which have
been removed can be re-added in a way that avoids a full
reconstruction but instead just updates the blocks that have changed
since the device was removed. For arrays with persistent metadata
(superblocks) this is done automatically. For arrays created with
--build mdadm needs to be told that this device we removed recently
with --re-add.

As far as I can tell, I'm doing everything correctly. Here are the
--examine and --examine-bitmap output from /dev/nbd0 *before* it is
added to the array:

Magic : a92b4efc
Version : 1.0
Feature Map : 0x1
Array UUID : cf24d099:9e174a79:2a2f6797:dcff1420
Name : turnip:11 (local to host turnip)
Creation Time : Mon Dec 15 07:06:13 2008
Raid Level : raid1
Raid Devices : 2

Avail Dev Size : 160086384 (76.34 GiB 81.96 GB)
Array Size : 156247976 (74.50 GiB 80.00 GB)
Used Dev Size : 156247976 (74.50 GiB 80.00 GB)
Super Offset : 160086512 sectors
State : clean
Device UUID : 01524a75:c309869c:6da972c9:084115c6

Internal Bitmap : 2 sectors from superblock
Flags : write-mostly
Update Time : Sat Dec 20 19:43:43 2008
Checksum : 63c19462 - correct
Events : 7042

Array Slot : 2 (failed, failed, empty, 1)
Array State : _u 2 failed

Filename : /dev/nbd0
Magic : 6d746962
Version : 4
UUID : cf24d099:9e174a79:2a2f6797:dcff1420
Events : 5518
Events Cleared : 5494
State : OK
Chunksize : 4 MB
Daemon : 5s flush period
Write Mode : Allow write behind, max 256
Sync Size : 78123988 (74.50 GiB 80.00 GB)
Bitmap : 19074 bits (chunks), 0 dirty (0.0%)

Then I --re-added /dev/nbd0 to the array:

Dec 22 08:15:53 turnip kernel: RAID1 conf printout:
Dec 22 08:15:53 turnip kernel: --- wd:1 rd:2
Dec 22 08:15:53 turnip kernel: disk 0, wo:1, o:1, dev:nbd0
Dec 22 08:15:53 turnip kernel: disk 1, wo:0, o:1, dev:sda
Dec 22 08:15:53 turnip kernel: md: recovery of RAID array md11
Dec 22 08:15:53 turnip kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Dec 22 08:15:53 turnip kernel: md: using maximum available idle IO bandwidth (but not more than 5120 KB/sec) for recovery.
Dec 22 08:15:53 turnip kernel: md: using 128k window, over a total of 78123988 blocks.

And this is what things look like 20 minutes into the
reconstruction/rebuild:

turnip:~ # mdadm --examine-bitmap /dev/sda
Filename : /dev/sda
Magic : 6d746962
Version : 4
UUID : cf24d099:9e174a79:2a2f6797:dcff1420
Events : 15928
Events Cleared : 5494
State : OK
Chunksize : 4 MB
Daemon : 5s flush period
Write Mode : Allow write behind, max 256
Sync Size : 78123988 (74.50 GiB 80.00 GB)
Bitmap : 19074 bits (chunks), 2065 dirty (10.8%)

turnip:~ # mdadm --examine-bitmap /dev/nbd0
Filename : /dev/nbd0
Magic : 6d746962
Version : 4
UUID : cf24d099:9e174a79:2a2f6797:dcff1420
Events : 5518
Events Cleared : 5494
State : OK
Chunksize : 4 MB
Daemon : 5s flush period
Write Mode : Allow write behind, max 256
Sync Size : 78123988 (74.50 GiB 80.00 GB)
Bitmap : 19074 bits (chunks), 0 dirty (0.0%)
turnip:~ #

and finally some --detail:

/dev/md11:
Version : 1.00
Creation Time : Mon Dec 15 07:06:13 2008
Raid Level : raid1
Array Size : 78123988 (74.50 GiB 80.00 GB)
Used Dev Size : 156247976 (149.01 GiB 160.00 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Mon Dec 22 08:24:25 2008
State : active, degraded, recovering
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1

Rebuild Status : 8% complete

Name : turnip:11 (local to host turnip)
UUID : cf24d099:9e174a79:2a2f6797:dcff1420
Events : 15928

Number Major Minor RaidDevice State
2      43    0     0          writemostly spare rebuilding /dev/nbd0
3      8     0     1          active sync /dev/sda

--
Jon

^ permalink raw reply	[flat|nested] 24+ messages in thread
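[The back-of-the-envelope estimate in the message above (dirty bits
times bitmap chunk size) can be pulled straight out of
--examine-bitmap. A rough sketch follows; it assumes the output format
quoted in this thread and a chunk size reported in MB, and the device
name is just an example.]

#!/bin/sh
# Estimate how much data a bitmap-based resync should touch:
# dirty chunks * chunk size.
DEV=${1:-/dev/sda}
mdadm --examine-bitmap "$DEV" | awk -F'[(,]' '
    /Chunksize/ { split($1, a, ":"); chunk = a[2] + 0 }  # "Chunksize : 4 MB" -> 4
    /Bitmap :/  { dirty = $3 + 0 }                       # "..., 2065 dirty (10.8%)" -> 2065
    END { printf "%d dirty chunks x %d MB = %d MB to resync\n",
                 dirty, chunk, dirty * chunk }'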
* Re: weird issues with raid1
2008-12-22 14:40 ` Jon Nelson
@ 2008-12-22 21:07 ` NeilBrown
0 siblings, 0 replies; 24+ messages in thread
From: NeilBrown @ 2008-12-22 21:07 UTC (permalink / raw)
To: Jon Nelson; +Cc: LinuxRaid

On Tue, December 23, 2008 1:40 am, Jon Nelson wrote:
> More updates:
>
> 1. I upgraded to openSUSE 11.1 over the weekend. The kernel is
> 2.6.27.7-9 as of this writing.
>
> 2. When I fired up the machine which hosts the network block device,
> the machine hosting the raid properly noticed and --re-added /dev/nbd0
> to /dev/md11.
>
> 3. /dev/md11 went into "recover" mode (not resync).
>
> 4. I'm using persistent metadata and a write-intent bitmap.
>

It does seem like you are doing the right thing....

Can you show me the output of both --examine and --examine-bitmap
on both /dev/sda and /dev/nbd0 just before you --re-add nbd0 to the
array that already contains sda??

For recovery to use the bitmap, "Events Cleared" on sda must be no
more than "Events" (from --examine) of nbd0.

What you sent doesn't quite have all this information, but it does
show that for nbd0 before it is added to the array:

--examine:        Events : 7042
--examine-bitmap: Events : 5518
                  Events Cleared : 5494

This shouldn't happen. The 'events' from --examine and from
--examine-bitmap should always be the same. That is how md knows that
the bitmap is still accurate.

This seems to suggest that nbd0 was, for a while, assembled into an
array which did not have an active bitmap.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 24+ messages in thread
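[The rule Neil states above lends itself to a quick pre-flight check
before a --re-add: the surviving member's "Events Cleared" must not
exceed the event count of the member being re-added, and on that member
the superblock and bitmap event counts should agree. The script below
is a sketch under those assumptions; the device names are the ones from
this thread and the awk field positions assume the output layout quoted
above.]

#!/bin/sh
# Check whether a --re-add can reasonably expect a bitmap-based
# (partial) recovery rather than a full rebuild.
SURVIVOR=/dev/sda
REJOINER=/dev/nbd0

cleared=$(mdadm --examine-bitmap "$SURVIVOR" | awk '/Events Cleared/ {print $4}')
events=$(mdadm --examine "$REJOINER" | awk '/Events :/ {print $3}')
bm_events=$(mdadm --examine-bitmap "$REJOINER" | awk '$1 == "Events" && $2 == ":" {print $3}')

echo "survivor Events Cleared:              $cleared"
echo "rejoiner Events (superblock/bitmap):  $events / $bm_events"
if [ "$cleared" -le "$events" ] && [ "$events" = "$bm_events" ]; then
    echo "bitmap-based (partial) recovery should be possible"
else
    echo "expect a full rebuild"
fi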
* Re: weird issues with raid1
2008-12-15 6:00 ` Neil Brown
2008-12-15 13:42 ` Jon Nelson
@ 2008-12-18 5:43 ` Neil Brown
2008-12-18 5:54 ` Jon Nelson
1 sibling, 1 reply; 24+ messages in thread
From: Neil Brown @ 2008-12-18 5:43 UTC (permalink / raw)
To: Jon Nelson, LinuxRaid

On Monday December 15, neilb@suse.de wrote:
> >
> > No matter how long I wait, until it is rebuilt, the bitmap on /dev/sda
> > is always 100% dirty.
> > If I --fail, --remove (twice) /dev/sda, and I re-add /dev/sdd1, it
> > clearly uses the bitmap and re-syncs in under 1 second.
>
> Yes, there is a bug here.
> When an array recovers on to a hot spare it doesn't copy the bitmap
> across. That will only happen lazily as bits are updated.
> I'm surprised I hadn't noticed that before, so there might be more to
> this than I'm seeing at the moment. But I definitely cannot find
> code to copy the bitmap across. I'll have to have a think about
> that.

There isn't a bug here; I was wrong.

We don't update the bitmap on the new device during recovery, only
once the recovery is complete. At that point we do (as you noticed)
update it all at once.

This is correct behaviour because until the recovery is complete, the
new device isn't really part of the array, so the bitmap on it doesn't
mean anything. As soon as the device is flagged 'InSync' we update
the bitmap on it.

NeilBrown

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: weird issues with raid1
2008-12-18 5:43 ` Neil Brown
@ 2008-12-18 5:54 ` Jon Nelson
0 siblings, 0 replies; 24+ messages in thread
From: Jon Nelson @ 2008-12-18 5:54 UTC (permalink / raw)
To: Neil Brown; +Cc: LinuxRaid

On Wed, Dec 17, 2008 at 11:43 PM, Neil Brown <neilb@suse.de> wrote:
> On Monday December 15, neilb@suse.de wrote:
>> >
>> > No matter how long I wait, until it is rebuilt, the bitmap on /dev/sda
>> > is always 100% dirty.
>> > If I --fail, --remove (twice) /dev/sda, and I re-add /dev/sdd1, it
>> > clearly uses the bitmap and re-syncs in under 1 second.
>>
>> Yes, there is a bug here.
>> When an array recovers on to a hot spare it doesn't copy the bitmap
>> across. That will only happen lazily as bits are updated.
>> I'm surprised I hadn't noticed that before, so there might be more to
>> this than I'm seeing at the moment. But I definitely cannot find
>> code to copy the bitmap across. I'll have to have a think about
>> that.
>
> There isn't a bug here; I was wrong.
>
> We don't update the bitmap on the new device during recovery, only
> once the recovery is complete. At that point we do (as you noticed)
> update it all at once.
>
> This is correct behaviour because until the recovery is complete, the
> new device isn't really part of the array, so the bitmap on it doesn't
> mean anything. As soon as the device is flagged 'InSync' we update
> the bitmap on it.

OK. Fair enough, except for some issues I've had with the bitmap /not/
getting updated at all, ever, on the replacement device. That's a
whole 'nother thread, though.

However, I would argue that the new device is *kinda* part of the
array. If I were rebuilding some huge array, and it was 99% done when
some issue developed (and was resolved), I would not want to start
over. How do you feel about copying the bitmap over right away and
marking all of its bits out-of-date, then letting the normal bitmap
machinery work to our advantage?

--
Jon

^ permalink raw reply	[flat|nested] 24+ messages in thread
end of thread, other threads:[~2008-12-22 21:07 UTC | newest]

Thread overview: 24+ messages
2008-12-06  2:10 weird issues with raid1 Jon Nelson
2008-12-06  2:46 ` Jon Nelson
2008-12-06 12:16 ` Justin Piszcz
2008-12-15  2:17 ` Jon Nelson
2008-12-15  6:00 ` Neil Brown
2008-12-15 13:42 ` Jon Nelson
2008-12-15 21:33 ` Neil Brown
2008-12-15 21:47 ` Jon Nelson
2008-12-16  1:21 ` Neil Brown
2008-12-16  2:32 ` Jon Nelson
2008-12-18  4:42 ` Neil Brown
2008-12-18  4:50 ` Jon Nelson
2008-12-18  4:55 ` Jon Nelson
2008-12-18  5:17 ` Neil Brown
2008-12-18  5:47 ` Jon Nelson
2008-12-18  6:21 ` Neil Brown
2008-12-19  2:15 ` Jon Nelson
2008-12-19 16:51 ` Jon Nelson
2008-12-19 20:40 ` Jon Nelson
2008-12-19 21:18 ` Jon Nelson
2008-12-22 14:40 ` Jon Nelson
2008-12-22 21:07 ` NeilBrown
2008-12-18  5:43 ` Neil Brown
2008-12-18  5:54 ` Jon Nelson