* [BISECT] Kernel panic, RIP bitmap_create
From: Karl Newman @ 2012-05-03  5:05 UTC
  To: linux-raid; +Cc: neilb

Hi,

I'm attempting to use kernel 3.4-rc? but keep running into a kernel panic on
boot, with RIP pointing to bitmap_create. I tried 3.4-rc1, 3.4-rc4 and
3.4-rc5 and they all have the kernel panic, while 3.3.4 boots fine. I have
my root on raid 5 with an internal bitmap, and the kernel panic occurs if I
use the built-in kernel autodetect or during the root array assembly via
mdadm inside a dracut-generated initramfs. I bisected it down to the
following commit:
61a0d80ce4ab5b4fb9ecb38f1fb19654778b71ed

    md/bitmap: discard CHUNK_BLOCK_SHIFT macro

    By redefining ->chunkshift as the shift from sectors to chunks rather than
    bytes to chunks, we can just use "bitmap->chunkshift" which is shorter than
    the macro call, and less indirect.

    Signed-off-by: NeilBrown <neilb@suse.de>

My bisect testing included a scary commit where 2 of 3 drives had their
UUIDs zeroed when I booted with it! Fortunately I found the mailing list
archives with the solution and I was able to recover everything and keep
bisecting (although I was tempted to quit and just give the range of
commits...).

I hope this fix can make it into the next 3.4-rc kernel.

Thanks,

Karl Newman

P.S. Sorry for the possible repeat, apparently Gmail's default HTML format
is unacceptable to this list.

* Re: [BISECT] Kernel panic, RIP bitmap_create
From: NeilBrown @ 2012-05-03  5:58 UTC
  To: Karl Newman; +Cc: linux-raid

On Wed, 2 May 2012 22:05:44 -0700 Karl Newman <siliconfiend@gmail.com> wrote:

> Hi,
>
> I'm attempting to use kernel 3.4-rc? but keep running into a kernel panic on
> boot, with RIP pointing to bitmap_create. I tried 3.4-rc1, 3.4-rc4 and
> 3.4-rc5 and they all have the kernel panic, while 3.3.4 boots fine. I have
> my root on raid 5 with an internal bitmap, and the kernel panic occurs if I
> use the built-in kernel autodetect or during the root array assembly via
> mdadm inside a dracut-generated initramfs. I bisected it down to the
> following commit:
> 61a0d80ce4ab5b4fb9ecb38f1fb19654778b71ed
>
>     md/bitmap: discard CHUNK_BLOCK_SHIFT macro
>
>     By redefining ->chunkshift as the shift from sectors to chunks rather than
>     bytes to chunks, we can just use "bitmap->chunkshift" which is shorter than
>     the macro call, and less indirect.
>
>     Signed-off-by: NeilBrown <neilb@suse.de>
>
> My bisect testing included a scary commit where 2 of 3 drives had their
> UUIDs zeroed when I booted with it! Fortunately I found the mailing list
> archives with the solution and I was able to recover everything and keep
> bisecting (although I was tempted to quit and just give the range of
> commits...).
>
> I hope this fix can make it into the next 3.4-rc kernel.

I do too, but first I would need to know what the fix is, and I cannot see
anything in that commit that would change the behaviour of md at all.

Do you have a copy of the full stack trace provided when Linux crashed? That
could be useful.
Also what bitmap chunk size are you using? Maybe the output of
   mdadm -X
and
   mdadm -E

of one of the devices in the array would help.

Thanks a lot for the report and going to the trouble of bisecting, it is
really appreciated.

NeilBrown

* Re: [BISECT] Kernel panic, RIP bitmap_create
From: Karl Newman @ 2012-05-03  6:14 UTC
  To: NeilBrown; +Cc: linux-raid

On Wed, May 2, 2012 at 10:58 PM, NeilBrown <neilb@suse.de> wrote:
> On Wed, 2 May 2012 22:05:44 -0700 Karl Newman <siliconfiend@gmail.com> wrote:
>
>> Hi,
>>
>> I'm attempting to use kernel 3.4-rc? but keep running into a kernel panic on
>> boot, with RIP pointing to bitmap_create. I tried 3.4-rc1, 3.4-rc4 and
>> 3.4-rc5 and they all have the kernel panic, while 3.3.4 boots fine. I have
>> my root on raid 5 with an internal bitmap, and the kernel panic occurs if I
>> use the built-in kernel autodetect or during the root array assembly via
>> mdadm inside a dracut-generated initramfs. I bisected it down to the
>> following commit:
>> 61a0d80ce4ab5b4fb9ecb38f1fb19654778b71ed
>>
>>     md/bitmap: discard CHUNK_BLOCK_SHIFT macro
>>
>>     By redefining ->chunkshift as the shift from sectors to chunks rather than
>>     bytes to chunks, we can just use "bitmap->chunkshift" which is shorter than
>>     the macro call, and less indirect.
>>
>>     Signed-off-by: NeilBrown <neilb@suse.de>
>>
>> My bisect testing included a scary commit where 2 of 3 drives had their
>> UUIDs zeroed when I booted with it! Fortunately I found the mailing list
>> archives with the solution and I was able to recover everything and keep
>> bisecting (although I was tempted to quit and just give the range of
>> commits...).
>>
>> I hope this fix can make it into the next 3.4-rc kernel.
>
> I do too, but first I would need to know what the fix is, and I cannot see
> anything in that commit that would change the behaviour of md at all.
>
> Do you have a copy of the full stack trace provided when Linux crashed? That
> could be useful.
> Also what bitmap chunk size are you using? Maybe the output of
>    mdadm -X
> and
>    mdadm -E
>
> of one of the devices in the array would help.
>
> Thanks a lot for the report and going to the trouble of bisecting, it is
> really appreciated.
>
> NeilBrown

I'm not sure how to go about getting the full stack trace. The
motherboard has no serial port, so that's not an option. Unless the
kernel supports USB to serial adapters for that purpose, in which case
I might be able to borrow a couple Keyspans. Or I could sit and try
and transcribe the whole thing...(!) I'm a little nervous about
tripping the all-zeros UUID bug again, although it only happened once
and it doesn't seem to be related to that commit. Anyway, here's some
data from the array:

# mdadm -E /dev/sda3
/dev/sda3:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 60bf4ee8:e6e3e14f:073e21cd:ed2abb54
  Creation Time : Wed May  2 20:22:47 2012
     Raid Level : raid5
  Used Dev Size : 29302400 (27.94 GiB 30.01 GB)
     Array Size : 58604800 (55.89 GiB 60.01 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 127

    Update Time : Wed May  2 23:01:45 2012
          State : active
Internal Bitmap : present
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 863c968f - correct
         Events : 2

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     0       8        3        0      active sync   /dev/sda3

   0     0       8        3        0      active sync   /dev/sda3
   1     1       8       19        1      active sync   /dev/sdb3
   2     2       8       35        2      active sync   /dev/sdc3

# mdadm -X /dev/sda3
        Filename : /dev/sda3
           Magic : 6d746962
         Version : 4
            UUID : 60bf4ee8:e6e3e14f:073e21cd:ed2abb54
          Events : 1
  Events Cleared : 1
           State : OK
       Chunksize : 64 MB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 29302400 (27.94 GiB 30.01 GB)
          Bitmap : 448 bits (chunks), 0 dirty (0.0%)
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [BISECT] Kernel panic, RIP bitmap_create
From: NeilBrown @ 2012-05-03  6:25 UTC
  To: Karl Newman; +Cc: linux-raid

On Wed, 2 May 2012 23:14:09 -0700 Karl Newman <siliconfiend@gmail.com> wrote:

> I'm not sure how to go about getting the full stack trace. The
> motherboard has no serial port, so that's not an option. Unless the
> kernel supports USB to serial adapters for that purpose, in which case
> I might be able to borrow a couple Keyspans. Or I could sit and try
> and transcribe the whole thing...(!)

A photo with a digital camera is usually easiest.
If you have wired ethernet you could possibly set up netconsole.
Add something like the following to the kernel command line:

   netconsole=@192.168.1.8/eth0,6666@192.168.1.3/00:14:85:fc:3b:de
               ^my address           ^other host IP / ethernet

Then on other-host run

   nc -u -l -p 6666 | tee -a /tmp/log

> I'm a little nervous about
> tripping the all-zeros UUID bug again, although it only happened once
> and it doesn't seem to be related to that commit.

If you apply

--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8140,7 +8140,8 @@ static int md_notify_reboot(struct notifier_block *this,
 	for_each_mddev(mddev, tmp) {
 		if (mddev_trylock(mddev)) {
-			__md_stop_writes(mddev);
+			if (mddev->pers)
+				__md_stop_writes(mddev);
 			mddev->safemode = 2;
 			mddev_unlock(mddev);
 		}

to the kernel before you build it, that bug should not happen again.

> Anyway, here's some
> data from the array:

Thanks. Nothing jumps out at me, but I'll ponder it some more.

Thanks,
NeilBrown

>
> # mdadm -E /dev/sda3
> /dev/sda3:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 60bf4ee8:e6e3e14f:073e21cd:ed2abb54
>   Creation Time : Wed May  2 20:22:47 2012
>      Raid Level : raid5
>   Used Dev Size : 29302400 (27.94 GiB 30.01 GB)
>      Array Size : 58604800 (55.89 GiB 60.01 GB)
>    Raid Devices : 3
>   Total Devices : 3
> Preferred Minor : 127
>
>     Update Time : Wed May  2 23:01:45 2012
>           State : active
> Internal Bitmap : present
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : 863c968f - correct
>          Events : 2
>
>          Layout : left-symmetric
>      Chunk Size : 128K
>
>       Number   Major   Minor   RaidDevice State
> this     0       8        3        0      active sync   /dev/sda3
>
>    0     0       8        3        0      active sync   /dev/sda3
>    1     1       8       19        1      active sync   /dev/sdb3
>    2     2       8       35        2      active sync   /dev/sdc3
>
> # mdadm -X /dev/sda3
>         Filename : /dev/sda3
>            Magic : 6d746962
>          Version : 4
>             UUID : 60bf4ee8:e6e3e14f:073e21cd:ed2abb54
>           Events : 1
>   Events Cleared : 1
>            State : OK
>        Chunksize : 64 MB
>           Daemon : 5s flush period
>       Write Mode : Normal
>        Sync Size : 29302400 (27.94 GiB 30.01 GB)
>           Bitmap : 448 bits (chunks), 0 dirty (0.0%)

* Re: [BISECT] Kernel panic, RIP bitmap_create
From: NeilBrown @ 2012-05-03  6:50 UTC
  To: Karl Newman; +Cc: linux-raid

I've managed to find a bug, but it is fairly minor and I cannot see how
it would cause a crash.

The calculation of bitmap->chunks is wrong and will usually be 1 too small.

Does it make a difference for you? I tend to doubt it.

Thanks,
NeilBrown

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 97e73e5..17e2b47 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -1727,8 +1727,7 @@ int bitmap_create(struct mddev *mddev)
 	bitmap->chunkshift = (ffz(~mddev->bitmap_info.chunksize)
 			      - BITMAP_BLOCK_SHIFT);
 
-	/* now that chunksize and chunkshift are set, we can use these macros */
-	chunks = (blocks + bitmap->chunkshift - 1) >>
+	chunks = (blocks + (1 << bitmap->chunkshift) - 1) >>
 			bitmap->chunkshift;
 	pages = (chunks + PAGE_COUNTER_RATIO - 1) / PAGE_COUNTER_RATIO;
 
diff --git a/drivers/md/bitmap.h b/drivers/md/bitmap.h
index 55ca5ae..b44b0aba 100644
--- a/drivers/md/bitmap.h
+++ b/drivers/md/bitmap.h
@@ -101,9 +101,6 @@ typedef __u16 bitmap_counter_t;
 
 #define BITMAP_BLOCK_SHIFT 9
 
-/* how many blocks per chunk? (this is variable) */
-#define CHUNK_BLOCK_RATIO(bitmap) ((bitmap)->mddev->bitmap_info.chunksize >> BITMAP_BLOCK_SHIFT)
-
 #endif
 
 /*
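
The off-by-one in the chunk count can be checked outside the kernel with a small user-space sketch. This is only an illustration, not the md code: the function names chunkshift_from_bytes, chunks_before_patch and chunks_after_patch are hypothetical, and it assumes that "blocks" is the sync size expressed in 512-byte sectors. It also uses 1ULL for the shift, where the patch itself uses a plain int.

```c
#include <assert.h>
#include <stdint.h>

#define BITMAP_BLOCK_SHIFT 9 /* 512-byte sectors, as in drivers/md/bitmap.h */

/*
 * Hypothetical helper: shift from sectors to bitmap chunks for a
 * power-of-two chunk size given in bytes.
 */
static unsigned int chunkshift_from_bytes(uint64_t chunksize_bytes)
{
	unsigned int shift = 0;

	/* The chunk size must be a power of two of at least one sector. */
	assert(chunksize_bytes >= (1ULL << BITMAP_BLOCK_SHIFT));
	assert((chunksize_bytes & (chunksize_bytes - 1)) == 0);

	/* Find log2 of the chunk size in bytes. */
	while ((chunksize_bytes >> shift) != 1)
		shift++;

	return shift - BITMAP_BLOCK_SHIFT;
}

/* Pre-patch expression: adds the shift count itself, so it barely rounds up. */
static uint64_t chunks_before_patch(uint64_t blocks, unsigned int chunkshift)
{
	return (blocks + chunkshift - 1) >> chunkshift;
}

/* Patched expression: adds (sectors per chunk - 1), a true round-up division. */
static uint64_t chunks_after_patch(uint64_t blocks, unsigned int chunkshift)
{
	return (blocks + (1ULL << chunkshift) - 1) >> chunkshift;
}
```

For the array reported in this thread, the Sync Size of 29302400 KiB is 58604800 sectors, and a 64 MB bitmap chunk gives a shift of 26 - 9 = 17. Under those assumptions chunks_after_patch(58604800, 17) yields 448, which matches the "448 bits (chunks)" reported by mdadm -X, while chunks_before_patch(58604800, 17) yields 447, one too small as described.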

* Re: [BISECT] Kernel panic, RIP bitmap_create
From: Karl Newman @ 2012-05-04  6:37 UTC
  To: NeilBrown; +Cc: linux-raid

On Wed, May 2, 2012 at 11:50 PM, NeilBrown <neilb@suse.de> wrote:
>
> I've managed to find a bug, but it is fairly minor and I cannot see how
> it would cause a crash.
>
> The calculation of bitmap->chunks is wrong and will usually be 1 too small.
>
> Does it make a difference for you? I tend to doubt it.
>
> Thanks,
> NeilBrown
>
>
> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
> index 97e73e5..17e2b47 100644
> --- a/drivers/md/bitmap.c
> +++ b/drivers/md/bitmap.c
> @@ -1727,8 +1727,7 @@ int bitmap_create(struct mddev *mddev)
>  	bitmap->chunkshift = (ffz(~mddev->bitmap_info.chunksize)
>  			      - BITMAP_BLOCK_SHIFT);
>
> -	/* now that chunksize and chunkshift are set, we can use these macros */
> -	chunks = (blocks + bitmap->chunkshift - 1) >>
> +	chunks = (blocks + (1 << bitmap->chunkshift) - 1) >>
>  			bitmap->chunkshift;
>  	pages = (chunks + PAGE_COUNTER_RATIO - 1) / PAGE_COUNTER_RATIO;
>
> diff --git a/drivers/md/bitmap.h b/drivers/md/bitmap.h
> index 55ca5ae..b44b0aba 100644
> --- a/drivers/md/bitmap.h
> +++ b/drivers/md/bitmap.h
> @@ -101,9 +101,6 @@ typedef __u16 bitmap_counter_t;
>
>  #define BITMAP_BLOCK_SHIFT 9
>
> -/* how many blocks per chunk? (this is variable) */
> -#define CHUNK_BLOCK_RATIO(bitmap) ((bitmap)->mddev->bitmap_info.chunksize >> BITMAP_BLOCK_SHIFT)
> -
>  #endif
>
>  /*

Somehow gmail marked this email as read, too, so I missed it. Anyway,
that did it! With this patch applied I can successfully boot! I tested
the offending commit by itself first with the all-zeros uuid patch
applied and confirmed the bug was still present, then applied this
patch and the bug was gone. I also applied this patch to 3.4-rc5 and
confirmed that it was still good.

Thank you for your help on this issue, and thank you for your work as
a kernel developer and supporting this crucial component.

Sincerely,

Karl Newman

* Re: [BISECT] Kernel panic, RIP bitmap_create
From: NeilBrown @ 2012-05-04  6:47 UTC
  To: Karl Newman; +Cc: linux-raid

On Thu, 3 May 2012 23:37:38 -0700 Karl Newman <siliconfiend@gmail.com> wrote:

> Somehow gmail marked this email as read, too, so I missed it. Anyway,
> that did it! With this patch applied I can successfully boot! I tested
> the offending commit by itself first with the all-zeros uuid patch
> applied and confirmed the bug was still present, then applied this
> patch and the bug was gone. I also applied this patch to 3.4-rc5 and
> confirmed that it was still good.

Thanks - and good news.

I'd still like to know how this bug manages to cause a crash (I created an
array that has identical "mdadm -E" and "mdadm -X" output on an x86_64
machine, and couldn't make it crash).

I'll add a Reported-by: and Tested-by: for you and submit to Linus shortly.

>
> Thank you for your help on this issue, and thank you for your work as
> a kernel developer and supporting this crucial component.

A pleasure - especially when I get to work with helpful and responsive
people :-)

NeilBrown

>
> Sincerely,
>
> Karl Newman

* Re: [BISECT] Kernel panic, RIP bitmap_create
From: Karl Newman @ 2012-05-04 13:54 UTC
  To: NeilBrown; +Cc: linux-raid

On Thu, May 3, 2012 at 11:47 PM, NeilBrown <neilb@suse.de> wrote:
> On Thu, 3 May 2012 23:37:38 -0700 Karl Newman <siliconfiend@gmail.com> wrote:
>
>
>> Somehow gmail marked this email as read, too, so I missed it. Anyway,
>> that did it! With this patch applied I can successfully boot! I tested
>> the offending commit by itself first with the all-zeros uuid patch
>> applied and confirmed the bug was still present, then applied this
>> patch and the bug was gone. I also applied this patch to 3.4-rc5 and
>> confirmed that it was still good.
>
> Thanks - and good news.
>
> I'd still like to know how this bug manages to cause a crash (I created an
> array that has identical "mdadm -E" and "mdadm -X" output on an x86_64
> machine, and couldn't make it crash).
>
> I'll add a Reported-by: and Tested-by: for you and submit to Linus shortly.
>
>
>>
>> Thank you for your help on this issue, and thank you for your work as
>> a kernel developer and supporting this crucial component.
>
> A pleasure - especially when I get to work with helpful and responsive
> people :-)
>
> NeilBrown
>

Well, if it helps any, here's some history: This array dates back to
early 2006 and was created with the Gentoo mdadm tools available at
that time. I had one hard drive fail about 2 years ago and replaced it
with an identical model. During this recent testing I noticed that one
of the array devices had a metadata version of 0.90.02 where the
others were 0.90.00, so possibly that was a side effect of the
replacement. A few weeks ago I had the motherboard or CPU or something
fail on the machine, so I bought replacement hardware and am trying to
bring it up on the old array (which is why I'm using rc kernels--I
need the driver support introduced in 3.4). It was during this rebuild
that I learned about bitmaps and thought it would be a good idea to
add one to the array, so I did.

So, the array has had its metadata written by at least 3 different
versions of mdadm scattered over 6-1/2 years. Thus, it may be
impossible (or at least extremely difficult) for you to exactly
re-create my situation unless you can scrounge the old versions and
simulate it. I suspect my condition is an oddball one, which is
probably why nobody else has experienced it (or at least google didn't
find anyone talking about it).

Sincerely,

Karl