* Raid5 bitmap - Bug in bitmap_startwrite()
@ 2006-08-04 9:05 Francois Barre
2006-08-04 14:20 ` Paul Clements
0 siblings, 1 reply; 3+ messages in thread
From: Francois Barre @ 2006-08-04 9:05 UTC (permalink / raw)
To: linux-raid
Hi all,
I lost connection to the testing machine, so I'll have to give the
details from memory. I may have access to it again tonight, so I may be
able to post the exact messages then.
The setup: ppc/g4 (mac mini) running a raid5 over 4 firewire disks.
Kernel 2.6.18-rc3 from linux/kernel/git/linville/wireless-dev.git
The action: on this 4-disk raid5, add an internal bitmap (mdadm -G
--bitmap internal) while the array is rebuilding.
The problem: at bitmap insertion, the kernel reports a BUG in
bitmap_startwrite() at line 1122 (as far as I recall). This is a bit
strange, because the only BUG_ON() defined in this function is at
line 1166, and is
BUG_ON((*bmc & COUNTER_MAX) == COUNTER_MAX);
The running rebuild goes on, but at the end of it the machine hangs completely.
The bug also triggers on a clean raid5 (i.e. not rebuilding), and
with 2.6.17-rc3.
The question: has bitmap creation been tested on such recent kernels,
including on big-endian CPUs?
I may investigate the value of (*bmc), which is the result of
bitmap_get_counter(). Is the value only modified in
bitmap_startwrite(), at line 1167, with (*bmc)++; ?
Final question: I do not fully understand the bitmap_get_counter()
function, especially when comparing the 'hijacked' version (lines
1126-1127):
return &((bitmap_counter_t *) &bitmap->bp[page].map)[hi];
and the 'normal' version (lines 1131-1132)
return (bitmap_counter_t *) &(bitmap->bp[page].map[pageoff]);
The hijacked version treats 'bitmap->bp[page].map' as a table of
16-bit bitmap_counter_t indexed by hi, whereas the normal version
treats 'bitmap->bp[page].map' as a table of 8-bit char indexed by
pageoff.
This may be the 'hijacked' logic, but I'm a little puzzled here.
Thanks for any clues, help, or... anything else.
Regards,
* Re: Raid5 bitmap - Bug in bitmap_startwrite()
2006-08-04 9:05 Raid5 bitmap - Bug in bitmap_startwrite() Francois Barre
@ 2006-08-04 14:20 ` Paul Clements
2006-08-04 16:46 ` Francois Barre
0 siblings, 1 reply; 3+ messages in thread
From: Paul Clements @ 2006-08-04 14:20 UTC (permalink / raw)
To: Francois Barre; +Cc: linux-raid
Francois Barre wrote:
> Final question, I do not fully understand the bitmap_get_counter()
> function, especially comparing the 'hijacked' version (lines
> 1126-1127) :
> return &((bitmap_counter_t *) &bitmap->bp[page].map)[hi];
>
> and the 'normal' version (lines 1131-1132)
> return (bitmap_counter_t *) &(bitmap->bp[page].map[pageoff]);
>
> The hijacked version uses a 16-bit bitmap_counter_t*
> 'bitmap->bp[page].map' table with the hi index, whereas the normal
> uses an 8-bit char* 'bitmap->bp[page].map' table with the pageoff
> index.
>
> This may be the 'hijacked' logic, but I'm a little puzzled here.
Yes. When we fail to allocate a page for the map (which should be rare),
instead of failing the whole operation we just use the page pointer
itself as storage, so we're basically using 4 bytes (the page pointer)
instead of 4K (the page) for that part of the bitmap. So each bit
represents more data (roughly 1000x more in the case of x86).
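
The arithmetic behind that factor can be sketched in a few lines of C. This is only an illustration, not the code in drivers/md/bitmap.c: it assumes 16-bit counters (as described earlier in the thread), a 4-byte pointer and a 4096-byte page, and the helper names counters_in() and hijack_coarsening() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: each counter is 16 bits wide, as described in the thread. */
typedef uint16_t counter_t;

/* Number of whole counters that fit in a storage area of `bytes` bytes. */
size_t counters_in(size_t bytes)
{
    return bytes / sizeof(counter_t);
}

/*
 * How many times more data each counter must cover when only the
 * pointer-sized slot is available instead of a full page.
 */
size_t hijack_coarsening(size_t page_bytes, size_t pointer_bytes)
{
    return counters_in(page_bytes) / counters_in(pointer_bytes);
}

/* Self-check for the 32-bit x86 figures quoted above (4K page, 4-byte pointer). */
void check_x86_32(void)
{
    assert(counters_in(4096) == 2048);
    assert(counters_in(4) == 2);
    assert(hijack_coarsening(4096, 4) == 1024);
}
```

Under those assumptions a page holds 2048 counters and the pointer slot holds 2, so each hijacked counter covers 1024 times as much data, which matches the "1000x" figure.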
--
Paul
* Re: Raid5 bitmap - Bug in bitmap_startwrite()
2006-08-04 14:20 ` Paul Clements
@ 2006-08-04 16:46 ` Francois Barre
0 siblings, 0 replies; 3+ messages in thread
From: Francois Barre @ 2006-08-04 16:46 UTC (permalink / raw)
To: linux-raid; +Cc: Paul Clements
[-- Attachment #1: Type: text/plain, Size: 918 bytes --]
As promised, here is the dmesg of the Oops caused by the BUG_ON() on line 1166.
So it seems that (*bmc & COUNTER_MAX) == COUNTER_MAX, which suggests the
system is issuing many more bitmap_startwrite() calls than
bitmap_endwrite() calls. I'll try compiling with more verbose options and
see what happens.
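
To make that hypothesis concrete, here is a simplified model of a counter that is incremented on each write start and decremented on each write end. It is a sketch, not the kernel code: the sketch_* names are hypothetical, and the mask value 0x3fff is an assumption (14 count bits below two flag bits), although it appears consistent with the 68003fff word in the instruction dump below. The real bitmap_startwrite() does more than this.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t counter_t;

/* Assumption: the low 14 bits hold the count, the top two bits are flags. */
#define SKETCH_COUNTER_MAX ((counter_t)0x3fff)

/*
 * Model of the increment path. Returns 1 if the counter was bumped,
 * 0 if it was already saturated (the condition the BUG_ON() tests).
 */
int sketch_startwrite(counter_t *c)
{
    if ((*c & SKETCH_COUNTER_MAX) == SKETCH_COUNTER_MAX)
        return 0;
    (*c)++;
    return 1;
}

/* Model of the decrement path: drop the count if it is non-zero. */
void sketch_endwrite(counter_t *c)
{
    if ((*c & SKETCH_COUNTER_MAX) != 0)
        (*c)--;
}

/* Count how many unmatched starts succeed before the counter saturates. */
unsigned sketch_unmatched_until_saturated(void)
{
    counter_t c = 0;
    unsigned n = 0;

    while (sketch_startwrite(&c))
        n++;
    return n;
}
```

In this model, balanced start/end pairs leave the counter at zero, while 16383 unmatched starts on the same chunk are enough to reach the saturated value.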
> > This may be the 'hijacked' logic, but I'm a little puzzled here.
>
> Yes. When we fail to allocate a page for the map (which should be rare),
> instead of failing the whole operation we just use the page pointer
> itself as storage, so we're basically using 4 bytes (the page pointer)
> instead of 4K (the page) for that part of the bitmap. So each bit
> represents more data (roughly 1000x more in the case of x86).
Thanks, I got it once I re-read the code and understood
bitmap_checkpage() as well. Now it seems pretty clear.
Anyway, it looks pretty unlikely that my system ran OOM and hijacked a
page in bitmap_checkpage()...
Regards,
[-- Attachment #2: dmesg_bitmap_trace_tail --]
[-- Type: application/octet-stream, Size: 2506 bytes --]
RAID5 conf printout:
--- rd:4 wd:4 fd:0
disk 0, o:1, dev:sda1
disk 1, o:1, dev:sdc1
disk 2, o:1, dev:sdb1
disk 3, o:1, dev:sdd1
md0: bitmap initialized from disk: read 8/8 pages, set 244959 bits, status: 0
created bitmap (120 pages) for device md0
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 979840 blocks.
kjournald starting. Commit interval 5 seconds
kernel BUG in bitmap_startwrite at drivers/md/bitmap.c:1166!
Oops: Exception in kernel mode, sig: 5 [#1]
Modules linked in: raid456 xor radeon drm agpgart md_mod snd_pcm snd_timer snd soundcore snd_page_alloc sbp2 arc4 rate_control bcm43xx_d80211 eth1394 firmware_class 80211 ohci1394 ieee1394 ehci_hcd
NIP: E256C140 LR: E256BFF0 CTR: C0010324
REGS: dc5d5a30 TRAP: 0700 Not tainted (2.6.18-rc3-g1507f098)
MSR: 00021032 <ME,IR,DR> CR: 84008484 XER: 00000000
TASK = dc6ccd50[6664] 'mount' THREAD: dc5d4000
GPR00: 00000001 DC5D5AE0 DC6CCD50 DBEFF000 00000000 FFFFFFFB 00000000 08000000
GPR08: 00000001 00000000 00000000 C0010324 00000000 1002C9F0 28004422 00000000
GPR16: 101111E8 100D0000 00000000 00000001 00000001 00000000 00000008 00000000
GPR24: C0915E00 DC67A580 00000001 00000008 DBEFF000 00000000 00000000 CFE7FC60
NIP [E256C140] bitmap_startwrite+0x1b8/0x214 [md_mod]
LR [E256BFF0] bitmap_startwrite+0x68/0x214 [md_mod]
Call Trace:
[DC5D5AE0] [E256BFF0] bitmap_startwrite+0x68/0x214 [md_mod] (unreliable)
[DC5D5B10] [E2748BC0] make_request+0x450/0x6fc [raid456]
[DC5D5B70] [C01024E0] generic_make_request+0x218/0x234
[DC5D5BB0] [C01038EC] submit_bio+0xe4/0xf8
[DC5D5BF0] [C006B574] submit_bh+0x160/0x198
[DC5D5C10] [C006B63C] sync_dirty_buffer+0x90/0x11c
[DC5D5C20] [C00B2B90] ext3_commit_super+0x98/0xac
[DC5D5C40] [C00B2E6C] ext3_setup_super+0x198/0x230
[DC5D5C80] [C00B4C38] ext3_fill_super+0x1234/0x14fc
[DC5D5CF0] [C0070774] get_sb_bdev+0x100/0x16c
[DC5D5D40] [C00B22BC] ext3_get_sb+0x1c/0x2c
[DC5D5D50] [C006FC84] vfs_kern_mount+0x5c/0xa4
[DC5D5D70] [C006FD18] do_kern_mount+0x3c/0x60
[DC5D5D90] [C0086E7C] do_mount+0x558/0x5c0
[DC5D5F10] [C0086F74] sys_mount+0x90/0xe4
[DC5D5F40] [C000F4DC] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff21738
LR = 0x100037c8
Instruction dump:
38e00001 4bffed01 813f000c 806901b0 48001b51 38000002 b01c0000 a01c0000
540004be 68003fff 21200000 7c090114 <0f000000> a13c0000 39290001 b13c0000