* question about bitmaps and dirty percentile
From: Jon Nelson @ 2009-07-30 18:25 UTC
To: LinuxRaid
I have a 3-disk raid1 configured with bitmaps.
Most of the time it only has 1 disk (disk A)
Periodically (weekly or less frequently) I re-add a second disk (disk
B), which then re-synchronizes, and when it's done I --fail and
--remove it.
Even less frequently (monthly or less frequently) I do the same thing
with a third disk (disk C).
Before adding the disks, I will issue an --examine.
When I added disk B today, it said this:
Events : 14580
Bitmap : 283645 bits (chunks), 11781 dirty (4.2%)
I'm curious why *any* of the bitmap chunks are dirty - when the disks
are removed the device has typically been quiescent for quite some
time. Is there a way to force a "flush" or whatever to get each disk
as up-to-date as possible, prior to a --fail and --remove?
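As a sanity check on the figure above, the 4.2% is just the dirty chunk
count over the total chunk count; a minimal sketch of that arithmetic,
using the numbers from the --examine output:

#include <stdio.h>

/* Recompute the dirty percentage that mdadm --examine reported above:
 * 11781 dirty chunks out of 283645 bitmap chunks. */
int main(void)
{
    double total = 283645;
    double dirty = 11781;

    printf("dirty: %.1f%%\n", 100.0 * dirty / total);  /* prints "dirty: 4.2%" */
    return 0;
}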
While /dev/nbd0 was syncing, I also --re-add'ed /dev/sdf1, which (as
expected) waited until /dev/nbd0 was done.
Then, due to a logic bug in a script, /dev/sdf1 was removed (the
script was waiting with mdadm --wait /dev/md12, which returned when
/dev/nbd0 was done, even though recovery onto /dev/sdf1 had not yet
started!!).
Then things got weird.
I saw this, which just *can't* be right:
md12 : active raid1 nbd0[2](W) sde[0]
72612988 blocks super 1.1 [3/1] [U__]
[======================================>] recovery =192.7%
(69979200/36306494) finish=13228593199978.6min speed=11620K/sec
bitmap: 139/139 pages [556KB], 256KB chunk
and of course the percentage kept growing, and the finish estimate is crazy.
I had to --fail and --remove /dev/nbd0, and re-add it, which
unfortunately started the recovery over.
I haven't even gotten to my questions about dirty percentages and so
on, which I will save for later.
In summary:
3-disk raid1, using bitmaps, with 2 missing disks.
re-add disk B. recovery begins.
re-add disk C. recovery continues on to disk B, will wait for disk C.
recovery completes on disk B, mdadm --wait returns (unexpectedly)
--fail, --remove disk C (which was never recovered on-to)
/proc/mdstat crazy, disk I/O still high (WTF is it *doing*, then?)
--fail --remove disk B, --re-add disk B, recovery starts over.
--
Jon
* Re: question about bitmaps and dirty percentile
From: Jon Nelson @ 2009-07-30 19:16 UTC
To: LinuxRaid
On Thu, Jul 30, 2009 at 1:25 PM, Jon
Nelson<jnelson-linux-raid@jamponi.net> wrote:
> Then things got weird.
>
> I saw this, which just *can't* be right:
>
> md12 : active raid1 nbd0[2](W) sde[0]
> 72612988 blocks super 1.1 [3/1] [U__]
> [======================================>] recovery =192.7%
> (69979200/36306494) finish=13228593199978.6min speed=11620K/sec
> bitmap: 139/139 pages [556KB], 256KB chunk
>
> and of course the percentage kept growing, and the finish estimate is crazy.
Weirdness: it reached 199% (or so) and then completed:
md12 : active raid1 nbd0[2](W) sde[0]
72612988 blocks super 1.1 [3/2] [UU_]
bitmap: 139/139 pages [556KB], 256KB chunk
I --fail, --remove the device, and then --re-add it.
The recovery *starts over*, as if nothing had happened over the last hour or so.
The event counters are very close between /dev/nbd0 (the device here)
and /dev/sde (the core device), within a dozen or so, but the "dirty
percentage" on /dev/nbd0 is big - 18.8%, and unchanging between runs.
It's like the bitmap isn't getting updated, or getting updated
incompletely, or something.
Does the bitmap only get updated when *all* devices have sync'd???
I'll let you know in about 2 hours.
--
Jon
* Re: question about bitmaps and dirty percentile
From: Paul Clements @ 2009-07-31 18:17 UTC
To: Jon Nelson; +Cc: LinuxRaid
Jon Nelson wrote:
> On Thu, Jul 30, 2009 at 1:25 PM, Jon
> Nelson<jnelson-linux-raid@jamponi.net> wrote:
>> Then things got weird.
>>
>> I saw this, which just *can't* be right:
>>
>> md12 : active raid1 nbd0[2](W) sde[0]
>> 72612988 blocks super 1.1 [3/1] [U__]
>> [======================================>] recovery =192.7%
>> (69979200/36306494) finish=13228593199978.6min speed=11620K/sec
>> bitmap: 139/139 pages [556KB], 256KB chunk
>>
>> and of course the percentage kept growing, and the finish estimate is crazy.
>
> Weirdness: it reached 199% (or so) and then completed:
>
> md12 : active raid1 nbd0[2](W) sde[0]
> 72612988 blocks super 1.1 [3/2] [UU_]
> bitmap: 139/139 pages [556KB], 256KB chunk
>
> I --fail, --remove the device, and then --re-add it.
>
> The recovery *starts over*, as if nothing had happened over the last hour or so.
> The event counters are very close between /dev/nbd0 (the device here)
> and /dev/sde (the core device), within a dozen or so, but the "dirty
> percentage" on /dev/nbd0 is big - 18.8%, and unchanging between runs.
> It's like the bitmap isn't getting updated, or getting updated
> incompletely, or something.
>
> Does the bitmap only get updated when *all* devices have sync'd???
> I'll let you know in about 2 hours.
The bitmap never gets cleared unless all disks in the array are in sync.
--
Paul
* Re: question about bitmaps and dirty percentile
From: Jon Nelson @ 2009-07-31 19:09 UTC
Cc: LinuxRaid
> The bitmap never gets cleared unless all disks in the array are in sync.
Well, that sucks. What is the reasoning behind that? It would seem
that having 2 out of 3 disks with an up-to-date bitmap would be
useful.
However, it doesn't explain the 200% problem.
--
Jon
* Re: question about bitmaps and dirty percentile
From: Matthias Urlichs @ 2009-08-03 16:44 UTC
To: linux-raid
On Fri, 31 Jul 2009 14:09:06 -0500, Jon Nelson wrote:
>> The bitmap never gets cleared unless all disks in the array are in
>> sync.
>
> Well, that sucks. What is the reasoning behind that?
There's only one bitmap per md device (the whole array). If the bits
got cleared after writing to disk #2, the system would forget that
they still need to be written to disk #3.
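A minimal sketch of that constraint (illustration only, not the actual
md code; the struct and can_clear_bit() helper below are made up for
the example):

#include <stdbool.h>

#define N_MIRRORS 3        /* the raid1 has three slots (rd:3)           */
#define N_CHUNKS  283645   /* bitmap chunks, as in the --examine output  */

/* Illustration only -- not the md implementation.  One write-intent
 * bitmap is shared by the whole array, so a dirty bit can only be
 * cleared once every member that is supposed to hold the data has it.
 * Clearing a bit after syncing disk #2 would lose the record that the
 * still-absent disk #3 also needs that chunk. */
struct member {
    bool in_array;                 /* currently an active mirror         */
    bool chunk_uptodate[N_CHUNKS]; /* member has a current copy of chunk */
};

static bool can_clear_bit(const struct member m[N_MIRRORS], long chunk)
{
    for (int i = 0; i < N_MIRRORS; i++) {
        /* An absent member keeps the bit dirty: that chunk must be
         * resynced onto it whenever it is re-added. */
        if (!m[i].in_array || !m[i].chunk_uptodate[chunk])
            return false;
    }
    return true;
}

With the array normally running with only 1 of its 3 members present,
no bit can ever be cleared, which is why a few percent stays dirty even
on a quiescent device.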
--
Matthias Urlichs
* Re: question about bitmaps and dirty percentile
From: Paul Clements @ 2009-08-03 20:30 UTC
To: linux-raid
Matthias Urlichs wrote:
> On Fri, 31 Jul 2009 14:09:06 -0500, Jon Nelson wrote:
>
>>> The bitmap never gets cleared unless all disks in the array are in
>>> sync.
>> Well, that sucks. What is the reasoning behind that?
>
> There's only one bitmap per md device (the whole array). If the bits
> got cleared after writing to disk #2, the system would forget that
> they still need to be written to disk #3.
Right, and that decision was made for efficiency and simplicity of
design. Having a bitmap per pair of component disks would be inefficient
and very complicated. You could stack raid1's if you absolutely had to
have that type of functionality.
--
Paul
* Re: question about bitmaps and dirty percentile
From: Neil Brown @ 2009-08-06 6:21 UTC
To: Jon Nelson; +Cc: LinuxRaid
On Thursday July 30, jnelson-linux-raid@jamponi.net wrote:
>
> I saw this, which just *can't* be right:
>
> md12 : active raid1 nbd0[2](W) sde[0]
> 72612988 blocks super 1.1 [3/1] [U__]
> [======================================>] recovery =192.7%
> (69979200/36306494) finish=13228593199978.6min speed=11620K/sec
> bitmap: 139/139 pages [556KB], 256KB chunk
Certainly very strange. I cannot explain it at all.
Please report exactly which kernel version you were running, and all
kernel log messages from before the first resync completed until after
the sync-to-200% completed.
Hopefully there will be a clue somewhere in there.
Thanks,
NeilBrown
* Re: question about bitmaps and dirty percentile
From: Jon Nelson @ 2009-08-06 13:02 UTC
Cc: LinuxRaid
On Thu, Aug 6, 2009 at 1:21 AM, Neil Brown<neilb@suse.de> wrote:
> On Thursday July 30, jnelson-linux-raid@jamponi.net wrote:
>>
>> I saw this, which just *can't* be right:
>>
>> md12 : active raid1 nbd0[2](W) sde[0]
>> 72612988 blocks super 1.1 [3/1] [U__]
>> [======================================>] recovery =192.7%
>> (69979200/36306494) finish=13228593199978.6min speed=11620K/sec
>> bitmap: 139/139 pages [556KB], 256KB chunk
>
> Certainly very strange. I cannot explain it at all.
>
> Please report exactly which kernel version you were running, and all
> kernel log messages from before the first resync completed until after
> the sync-to-200% completed.
>
> Hopefully there will be a clue somewhere in there.
Stock openSUSE 2.6.27.25-0.1-default on x86_64.
I'm pretty sure this is it:
Jul 30 13:51:01 turnip kernel: md: bind<nbd0>
Jul 30 13:51:01 turnip kernel: RAID1 conf printout:
Jul 30 13:51:01 turnip kernel: --- wd:1 rd:3
Jul 30 13:51:01 turnip kernel: disk 0, wo:0, o:1, dev:sde
Jul 30 13:51:01 turnip kernel: disk 1, wo:1, o:1, dev:nbd0
Jul 30 13:51:01 turnip kernel: md: recovery of RAID array md12
Jul 30 13:51:01 turnip kernel: md: minimum _guaranteed_ speed: 1000
KB/sec/disk.
Jul 30 13:51:01 turnip kernel: md: using maximum available idle IO
bandwidth (but not more than 200000 KB/sec) for recovery.
Jul 30 13:51:01 turnip kernel: md: using 128k window, over a total of
72612988 blocks.
Jul 30 14:10:48 turnip kernel: md: md12: recovery done.
Jul 30 14:10:49 turnip kernel: RAID1 conf printout:
Jul 30 14:10:49 turnip kernel: --- wd:2 rd:3
Jul 30 14:10:49 turnip kernel: disk 0, wo:0, o:1, dev:sde
Jul 30 14:10:49 turnip kernel: disk 1, wo:0, o:1, dev:nbd0
--
Jon
* Re: question about bitmaps and dirty percentile
From: NeilBrown @ 2009-08-07 1:47 UTC
To: Jon Nelson; +Cc: LinuxRaid
On Thu, August 6, 2009 11:02 pm, Jon Nelson wrote:
> On Thu, Aug 6, 2009 at 1:21 AM, Neil Brown<neilb@suse.de> wrote:
>> On Thursday July 30, jnelson-linux-raid@jamponi.net wrote:
>>>
>>> I saw this, which just *can't* be right:
>>>
>>> md12 : active raid1 nbd0[2](W) sde[0]
>>> 72612988 blocks super 1.1 [3/1] [U__]
>>> [======================================>] recovery =192.7%
>>> (69979200/36306494) finish=13228593199978.6min speed=11620K/sec
>>> bitmap: 139/139 pages [556KB], 256KB chunk
>>
>> Certainly very strange. I cannot explain it at all.
>>
>> Please report exactly which kernel version you were running, and all
>> kernel log messages from before the first resync completed until after
>> the sync-to-200% completed.
>>
>> Hopefully there will be a clue somewhere in there.
>
> Stock openSUSE 2.6.27.25-0.1-default on x86_64.
Ok, so it was probably broken by whoever maintains md for
SuSE.... oh wait, that's me :-)
>
> I'm pretty sure this is it:
>
> Jul 30 13:51:01 turnip kernel: md: bind<nbd0>
> Jul 30 13:51:01 turnip kernel: RAID1 conf printout:
> Jul 30 13:51:01 turnip kernel: --- wd:1 rd:3
> Jul 30 13:51:01 turnip kernel: disk 0, wo:0, o:1, dev:sde
> Jul 30 13:51:01 turnip kernel: disk 1, wo:1, o:1, dev:nbd0
> Jul 30 13:51:01 turnip kernel: md: recovery of RAID array md12
> Jul 30 13:51:01 turnip kernel: md: minimum _guaranteed_ speed: 1000
> KB/sec/disk.
> Jul 30 13:51:01 turnip kernel: md: using maximum available idle IO
> bandwidth (but not more than 200000 KB/sec) for recovery.
> Jul 30 13:51:01 turnip kernel: md: using 128k window, over a total of
> 72612988 blocks.
> Jul 30 14:10:48 turnip kernel: md: md12: recovery done.
> Jul 30 14:10:49 turnip kernel: RAID1 conf printout:
> Jul 30 14:10:49 turnip kernel: --- wd:2 rd:3
> Jul 30 14:10:49 turnip kernel: disk 0, wo:0, o:1, dev:sde
> Jul 30 14:10:49 turnip kernel: disk 1, wo:0, o:1, dev:nbd0
>
Thanks...
So:
- When the recovery started, mddev->size was equal to the value printed
for "over a total of ... blocks", i.e. 72612988 (I assume this is
expected to be a 72 Gig array). Twice this will have been stored in
'max_sectors', and the loop in md_do_sync will have taken 'j' up to
that value, periodically storing it in mddev->curr_resync.
- When you ran "cat /proc/mdstat", mddev->array_sectors will have been
twice the value printed at "... blocks", which is the same value:
145225976.
- When you ran "cat /proc/mdstat", it printed "recovery", not "resync",
so MD_RECOVERY_SYNC was not set, so max_sectors was set to
mddev->size... and that looks wrong (size is in KB, not sectors).
Half of max_sectors is printed as the second number in the (%d/%d)
bit, so ->size was twice 36306494, i.e. 72612988, which is consistent.
So the problem is that in status_resync, max_sectors is being set to
mddev->size rather than mddev->size*2. This is purely a cosmetic
problem; it does not affect data safety at all.
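In other words (a simplified, standalone sketch of the arithmetic, not
the actual status_resync() code), the unit mix-up plays out like this:

#include <stdio.h>

/* Sketch of the cosmetic bug described above -- not the kernel source.
 * mddev->size is in 1K blocks while the recovery position (curr_resync)
 * is in 512-byte sectors, so using ->size instead of ->size*2 as the
 * denominator makes the displayed progress run up to ~200%. */
int main(void)
{
    unsigned long long size_kb     = 72612988ULL;      /* mddev->size        */
    unsigned long long curr_resync = 2 * 69979200ULL;  /* sectors; 69979200
                                                          is the first number
                                                          in the (%d/%d) bit  */

    unsigned long long buggy_max   = size_kb;           /* botched backport  */
    unsigned long long correct_max = size_kb * 2;       /* sectors           */

    printf("buggy:   %.1f%%\n", 100.0 * curr_resync / buggy_max);   /* ~192.7% */
    printf("correct: %.1f%%\n", 100.0 * curr_resync / correct_max); /*  ~96.4% */
    return 0;
}

With the correct denominator, the same snapshot works out to about 96%,
which fits with the recovery finishing not long after the display
reached "199% or so".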
It looks like I botched a backport of
commit dd71cf6b2773310b01c6fe6c773064c80fd2476b
into the Suse kernel. I'll get that fixed for the next update.
Thanks for the report, and as I said, the only thing affected here
is the content of /proc/mdstat. The recovery is doing the right
thing.
Thanks,
NeilBrown
* Re: question about bitmaps and dirty percentile
From: Jon Nelson @ 2009-08-07 2:17 UTC
Cc: LinuxRaid
On Thu, Aug 6, 2009 at 8:47 PM, NeilBrown<neilb@suse.de> wrote:
...
> So the problem is that in status_resync, max_sectors is being set to
> mddev->size rather than mddev->size*2. This is purely a cosmetic
> problem; it does not affect data safety at all.
>
> It looks like I botched a backport of
> commit dd71cf6b2773310b01c6fe6c773064c80fd2476b
> into the Suse kernel. I'll get that fixed for the next update.
>
> Thanks for the report, and as I said, the only thing affected here
> is the content of /proc/mdstat. The recovery is doing the right
> thing.
Sweet! Open Source is AWESOME.
--
Jon
* Re: question about bitmaps and dirty percentile
From: John Robinson @ 2009-08-07 12:29 UTC
To: Jon Nelson; +Cc: LinuxRaid
On 07/08/2009 03:17, Jon Nelson wrote:
> Sweet! Open Source is AWESOME.
Amen to that, brother Jon!