* crash with 2.6.21 BUG:ll_rw_blk.c
From: walter harms @ 2007-07-18 8:46 UTC
To: LKML

hi list,
i managed to crash 2.6.21 at boottime with

IPI Shortcut mode
BUG: atblock/ll_rw_blc.c 1566 blk_remove_plug()


system: acer Notebook TM620

does anyone care or is 2.6.21 already done ?

re,
 wh
* Re: crash with 2.6.21 BUG:ll_rw_blk.c
From: Jens Axboe @ 2007-07-18 10:33 UTC
To: walter harms; +Cc: LKML

On Wed, Jul 18 2007, walter harms wrote:
> hi list,
> i managed to crash 2.6.21 at boottime with
>
> IPI Shortcut mode
> BUG: atblock/ll_rw_blc.c 1566 blk_remove_plug()
>
>
> system: acer Notebook TM620
>
> does anyone care or is 2.6.21 already done ?

We need a lot more than that, can you capture the full oops? Preferably
complete with boot messages prior to the oops.

--
Jens Axboe
[parent not found: <469DF233.5080902@bfs.de>]
[parent not found: <20070718110724.GN11657@kernel.dk>]
[parent not found: <469E072E.7080400@bfs.de>]
[parent not found: <20070718123142.GV11657@kernel.dk>]
* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
From: walter harms @ 2007-07-22 16:51 UTC
To: Jens Axboe; +Cc: LKML

[-- Attachment #1: Type: text/plain, Size: 2186 bytes --]

hello all,
on my asus notebook tm620 there is a crash with 2.6.22 and 2.6.21

....
Using IPI Shortcut mode
WARNING: at block/ll_rw_blk.c:1575 blk_remove_plug()
 [<c01ac87e>] blk_remove_plug+0x36/0x5a
 [<c01ac8b6>] __generic_unplug_device+0x14/0x1f
 [<c01ad587>] __make_request+0x39b/0x49c
 [<c01abc8c>] generic_make_request+0x228/0x255
 [<c01adb54>] submit_bio+0xa5/0xac
 [<c013e233>] mempool_alloc+0x37/0xae
 [<c01314dc>] submit+0xc2/0x11d
 [<c0131585>] bio_read_page+0x24/0x27
 [<c013188b>] swsusp_check+0x4f/0xaf
 [<c012f6c2>] software_resume+0x5f/0x108
 [<c037867e>] kernel_init+0xb0/0x212
 [<c0103a16>] ret_from_fork+0x6/0x1c
 [<c03785ce>] kernel_init+0x0/0x212
 [<c03785ce>] kernel_init+0x0/0x212
 [<c010465b>] kernel_thread_helper+0x7/0x10
 =======================
Freeing unused kernel memory: 272k freed
....

I attached two files with the kernel boot messages

1. bug.txt.gz
kernel 2.6.22.1, captured with netconsole; the last lines were missing, i added them by hand

2. out.txt.gz
simply a dmesg from kernel 2.6.18, which works fine

additional observations:
removing 'resume=/dev/hda1' from grub makes the crash disappear but the
drive still does not work because
ata_id[772]: main: HDIO_GET_IDENTITY failed for '/dev/.tmp-3-0'

there is a bug in the hd detection code:

2.6.22.1
hda: 8032MB, CHS=1024/255/63
hda: hda1 hda2 hda3
hda: p2 exceeds device capacity
hda: p3 exceeds device capacity

2.6.18
hda: 39070080 sectors (20003 MB), CHS=38760/16/63, UDMA(100)
hda: cache flushes supported
hda: hda1 hda2 hda3
Probing IDE interface ide1...


here is the output from fdisk:

fdisk -l

Disk /dev/hda: 20.0 GB, 20003880960 bytes
255 heads, 63 sectors/track, 2432 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1               1         131     1052226   82  Linux swap / Solaris
/dev/hda2   *         132        1437    10490445   83  Linux
/dev/hda3            1438        2432     7992337+  83  Linux



is it possible that this leads to a chain of events causing the crash in ll_rw_blk.c?

if someone is interested in the .config please mail me directly, i am not
subscribed to lkml.

re,
 wh

[-- Attachment #2: bug.txt.gz --]
[-- Type: application/x-gzip, Size: 4107 bytes --]

[-- Attachment #3: out.txt.gz --]
[-- Type: application/x-gzip, Size: 6109 bytes --]
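The geometry figures in the report can be cross-checked with a little arithmetic. Below is a minimal user-space C sketch; the helper names are illustrative only (not kernel or fdisk functions), and it assumes 512-byte sectors and fdisk's cylinder unit of 255 * 63 = 16065 sectors, as printed in the fdisk output.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sector size; the fdisk output above uses 512-byte units. */
#define SECTOR_SIZE 512ULL

/* Number of sectors addressed by a cylinder/head/sector geometry. */
uint64_t chs_sectors(uint64_t cylinders, uint64_t heads, uint64_t sectors_per_track)
{
    assert(heads > 0 && sectors_per_track > 0);
    return cylinders * heads * sectors_per_track;
}

/* Capacity in binary megabytes (MiB, 2^20 bytes), truncated. */
uint64_t sectors_to_mib(uint64_t sectors)
{
    return sectors * SECTOR_SIZE / (1024ULL * 1024ULL);
}

/* Capacity in decimal megabytes (10^6 bytes), truncated. */
uint64_t sectors_to_mb(uint64_t sectors)
{
    return sectors * SECTOR_SIZE / 1000000ULL;
}

/* Does a partition ending at (1-based, inclusive) fdisk cylinder end_cyl
 * reach past a disk holding capacity_sectors sectors? One fdisk cylinder
 * here is heads * sectors_per_track sectors. */
int partition_exceeds(uint64_t end_cyl, uint64_t heads,
                      uint64_t sectors_per_track, uint64_t capacity_sectors)
{
    uint64_t end_sector = end_cyl * heads * sectors_per_track; /* exclusive end */
    return end_sector > capacity_sectors;
}
```

chs_sectors(1024, 255, 63) gives 16450560 sectors, about 8032 MiB, which lines up with the "hda: 8032MB, CHS=1024/255/63" line from 2.6.22.1, while 38760/16/63 from 2.6.18 and fdisk's 2432 cylinders of 255 * 63 both give the full 39070080 sectors (20003 MB). Against the smaller capacity, hda1 (ending at cylinder 131) still fits, but hda2 (ending at 1437) and hda3 (ending at 2432) do not, consistent with the two "exceeds device capacity" messages. The arithmetic only shows that 2.6.22.1 saw a capacity capped at 1024 cylinders; it says nothing about why.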
* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
From: Satyam Sharma @ 2007-07-22 17:20 UTC
To: wharms; +Cc: Jens Axboe, LKML, Rafael J. Wysocki, pm list

[-- Attachment #1: Type: text/plain, Size: 3060 bytes --]

Hi Walter,

Thanks for reporting this.

On 7/22/07, walter harms <wharms@bfs.de> wrote:
> hello all,
> on my asus notebook tm620 there is a crash with 2.6.22 and 2.6.21

Did this happen when you were resuming from a suspend-to-ram/disk?
[ I ask because I see swsusp in the trace below, linux-pm added to Cc: ]

> ....
> Using IPI Shortcut mode
> WARNING: at block/ll_rw_blk.c:1575 blk_remove_plug()
>  [<c01ac87e>] blk_remove_plug+0x36/0x5a
>  [<c01ac8b6>] __generic_unplug_device+0x14/0x1f
>  [<c01ad587>] __make_request+0x39b/0x49c
>  [<c01abc8c>] generic_make_request+0x228/0x255
>  [<c01adb54>] submit_bio+0xa5/0xac
>  [<c013e233>] mempool_alloc+0x37/0xae
>  [<c01314dc>] submit+0xc2/0x11d
>  [<c0131585>] bio_read_page+0x24/0x27
>  [<c013188b>] swsusp_check+0x4f/0xaf
>  [<c012f6c2>] software_resume+0x5f/0x108
>  [<c037867e>] kernel_init+0xb0/0x212
>  [<c0103a16>] ret_from_fork+0x6/0x1c
>  [<c03785ce>] kernel_init+0x0/0x212
>  [<c03785ce>] kernel_init+0x0/0x212
>  [<c010465b>] kernel_thread_helper+0x7/0x10
>  =======================

Surprising, that's a WARN_ON(!irqs_disabled()) but IRQs are disabled
alright on that codepath. OTOH, __make_request() is heavily goto-driven,
uses the non-save/restore variants of spin_lock_irq, and does not even
balance locks / unlocks for some error paths ... gaah.

> Freeing unused kernel memory: 272k freed
> ....
>
> I attached two files with the kernel boot messages
>
> 1. bug.txt.gz
> kernel 2.6.22.1, captured with netconsole; the last lines were missing, i added them by hand
>
> 2. out.txt.gz
> simply a dmesg from kernel 2.6.18, which works fine

I've reattached them in this mail, for linux-pm to see.

> additional observations:
> removing 'resume=/dev/hda1' from grub makes the crash disappear but the
> drive still does not work because
> ata_id[772]: main: HDIO_GET_IDENTITY failed for '/dev/.tmp-3-0'

If you're resuming from a suspend I don't see how you can avoid giving
the resume= parameter (unless you somehow specify some default at
build-time ...)

> there is a bug in the hd detection code:
> [...]
>
> if someone is interested in the .config please mail me directly, i am not
> subscribed to lkml.

Yes, please post your .config also.

Thanks,
Satyam

[-- Attachment #2: bug.txt.gz --]
[-- Type: application/x-gzip, Size: 4107 bytes --]

[-- Attachment #3: out.txt.gz --]
[-- Type: application/x-gzip, Size: 6109 bytes --]
* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
From: Jens Axboe @ 2007-07-22 22:17 UTC
To: Satyam Sharma; +Cc: wharms, LKML, Rafael J. Wysocki, pm list

On Sun, Jul 22 2007, Satyam Sharma wrote:
> Hi Walter,
>
> Thanks for reporting this.
>
> On 7/22/07, walter harms <wharms@bfs.de> wrote:
>> hello all,
>> on my asus notebook tm620 there is a crash with 2.6.22 and 2.6.21
>
> Did this happen when you were resuming from a suspend-to-ram/disk?
> [ I ask because I see swsusp in the trace below, linux-pm added to Cc: ]
>
>> ....
>> Using IPI Shortcut mode
>> WARNING: at block/ll_rw_blk.c:1575 blk_remove_plug()
>> [...]
>> =======================
>
> Surprising, that's a WARN_ON(!irqs_disabled()) but IRQs are disabled
> alright on that codepath. OTOH, __make_request() is heavily goto-driven,
> uses the non-save/restore variants of spin_lock_irq, and does not even
> balance locks / unlocks for some error paths ... gaah.

__make_request() must be called from process context, hence
spin_lock_irq() is perfectly alright and the fastest way to go. And of
course the locking is balanced! So please save your 'gaah's for code
you actually took the time to try and understand.

But it does look like unbalanced irq disable/enable calls. I'd guess in
the suspend/resume path. Obviously something more esoteric, since this
is the first such report for 2.6.22, so likely some not-very-used driver
for instance.

--
Jens Axboe
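The difference between spin_lock_irq() and the save/restore variants, and how an unbalanced enable elsewhere can trip a WARN_ON(!irqs_disabled()) check, can be illustrated with a toy single-CPU model. This is not kernel code: irqs_off, toy_lock_irq() and the other names below are hypothetical user-space stand-ins that track only the interrupt-enable flag, not the lock itself. It shows one generic way such a check can fire inside a lock-held region (an inner helper using the unconditional _irq unlock while its caller already had interrupts off); it does not identify what actually misbehaved on this laptop.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-CPU model of the local interrupt flag. These are
 * hypothetical user-space stand-ins, NOT the kernel primitives. */
static bool irqs_off;

void toy_local_irq_disable(void) { irqs_off = true; }
void toy_local_irq_enable(void)  { irqs_off = false; }
bool toy_irqs_disabled(void)     { return irqs_off; }

/* spin_lock_irq()/spin_unlock_irq() style: unconditional disable/enable. */
void toy_lock_irq(void)   { toy_local_irq_disable(); }
void toy_unlock_irq(void) { toy_local_irq_enable(); }

/* spin_lock_irqsave()/spin_unlock_irqrestore() style: remember, then restore. */
bool toy_lock_irqsave(void)
{
    bool flags = irqs_off;
    toy_local_irq_disable();
    return flags;
}
void toy_unlock_irqrestore(bool flags) { irqs_off = flags; }

/* The check from the trace: the callee expects interrupts to be off.
 * Returns true if a WARN_ON(!irqs_disabled()) would fire. */
bool toy_check_would_warn(void) { return !toy_irqs_disabled(); }

/* Outer section entered from process context (irqs on) that calls a
 * helper using the unconditional variant: the inner unlock turns
 * interrupts back on while the outer section still expects them off. */
bool nested_irq_variant_warns(void)
{
    toy_local_irq_enable();   /* process context: irqs enabled   */
    toy_lock_irq();           /* outer section: irqs now off     */
    toy_lock_irq();           /* inner helper, unconditional ... */
    toy_unlock_irq();         /* ... re-enables irqs too early   */
    bool warn = toy_check_would_warn();
    toy_unlock_irq();
    return warn;
}

/* Same nesting, but the inner helper saves and restores the flag. */
bool nested_irqsave_variant_warns(void)
{
    toy_local_irq_enable();
    toy_lock_irq();
    bool flags = toy_lock_irqsave();
    toy_unlock_irqrestore(flags);  /* restores "off" for the outer section */
    bool warn = toy_check_would_warn();
    toy_unlock_irq();
    return warn;
}
```

nested_irq_variant_warns() returns true and nested_irqsave_variant_warns() returns false; both leave the toy flag enabled again afterwards, as a process-context caller would expect. The unconditional variant is only safe when, as Jens says of __make_request(), the caller is known to run with interrupts enabled.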
* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
From: Satyam Sharma @ 2007-07-25 0:22 UTC
To: Jens Axboe; +Cc: wharms, LKML, Rafael J. Wysocki, pm list

On 7/23/07, Jens Axboe <jens.axboe@oracle.com> wrote:
> On Sun, Jul 22 2007, Satyam Sharma wrote:
> [...]
> > Surprising, that's a WARN_ON(!irqs_disabled()) but IRQs are disabled
> > alright on that codepath. OTOH, __make_request() is heavily goto-driven,
> > uses the non-save/restore variants of spin_lock_irq, and does not even
> > balance locks / unlocks for some error paths ... gaah.
>
> __make_request() must be called from process context, hence
> spin_lock_irq() is perfectly alright and the fastest way to go. And of
> course the locking is balanced! So please save your 'gaah's for code
> you actually took the time to try and understand.

You're right, I didn't really look at that code for long (it even
explicitly comments about what's going on with the locking in there!),
sorry about that.

[ Off-topic: BTW does every call to __make_request() end up in
  blk_remove_plug()? Since you're explicitly making the assumption
  that it *must* be called from process context (and hence the use of
  the non-save/restore variants), you could consider putting a
  WARN_ON(irqs_disabled()) over there, and perhaps a
  WARN_ON(!spin_is_locked(queue_lock)) in blk_remove_plug() instead, and
  other such similar functions that currently have the !irqs_disabled
  check. This way you'd effectively cover _both_ the assertions,
  and in appropriate places -- just a suggestion. ]

> But it does look like unbalanced irq disable/enable calls. I'd guess in
> the suspend/resume path. Obviously something more esoteric, since this
> is the first such report for 2.6.22, so likely some not-very-used driver
> for instance.

Now that I do look at the codepath, it does seem surprising irqs were
not disabled there. There are a bunch of calls to _other_ functions
between the spin_lock_irq and the blk_remove_plug via
__generic_unplug_device that would also have complained about
!irqs_disabled.

Walter, does this happen reproducibly?

Satyam
* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
From: walter harms @ 2007-07-25 7:05 UTC
To: Satyam Sharma; +Cc: Jens Axboe, LKML, Rafael J. Wysocki, pm list

Satyam Sharma wrote:
[...]
> Walter, does this happen reproducibly?
>

yes, with 2.6.21 and 2.6.22.1

re,
 wh
* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
From: Jens Axboe @ 2007-07-25 11:18 UTC
To: Satyam Sharma; +Cc: wharms, LKML, Rafael J. Wysocki, pm list

On Wed, Jul 25 2007, Satyam Sharma wrote:
[...]
> [ Off-topic: BTW does every call to __make_request() end up in
>   blk_remove_plug()? Since you're explicitly making the assumption
>   that it *must* be called from process context (and hence the use of
>   the non-save/restore variants), you could consider putting a
>   WARN_ON(irqs_disabled()) over there, and perhaps a
>   WARN_ON(!spin_is_locked(queue_lock)) in blk_remove_plug() instead, and
>   other such similar functions that currently have the !irqs_disabled
>   check. This way you'd effectively cover _both_ the assertions,
>   and in appropriate places -- just a suggestion. ]

No, blk_remove_plug() will only be called for sync bios, or where we
have to wait for request allocation (which will unplug the device).

__generic_make_request() already does a might_sleep() check, so it
should catch this already.

[...]
> Now that I do look at the codepath, it does seem surprising irqs were
> not disabled there. There are a bunch of calls to _other_ functions
> between the spin_lock_irq and the blk_remove_plug via
> __generic_unplug_device that would also have complained about
> !irqs_disabled.
>
> Walter, does this happen reproducibly?

As I previously wrote, it's likely some of the device power up or resume
routines that botch the irq enable/disable stuff. It'd be interesting to
start stripping down the config until the warning goes away - or enable
CONFIG_PM_DEBUG which may help as well.

--
Jens Axboe
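Satyam's suggestion and Jens's answer are about where assertions belong: check once at the entry of a function that assumes it runs with interrupts enabled, and check "lock held, interrupts off" inside the helper that needs it. A toy user-space sketch of that placement follows. toy_make_request(), toy_remove_plug() and the warning counters are hypothetical stand-ins for __make_request(), blk_remove_plug() and WARN_ON(); the sync flag mirrors Jens's remark that only some paths reach blk_remove_plug(), and nothing here models might_sleep() itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy queue: 'locked' stands in for the queue lock. */
struct toy_queue {
    bool locked;
};

static bool toy_irqs_off;   /* toy local interrupt flag                 */
static int entry_warnings;  /* would-be WARN_ON(irqs_disabled())        */
static int plug_warnings;   /* would-be checks in the plug-removal code */

/* Counts instead of printing a backtrace, so the result can be checked. */
static void toy_warn_on(bool condition, int *counter)
{
    if (condition)
        (*counter)++;
}

/* Callee-side checks, in the spirit of blk_remove_plug(): it expects
 * interrupts off and the queue lock held. */
void toy_remove_plug(struct toy_queue *q)
{
    toy_warn_on(!toy_irqs_off, &plug_warnings);
    toy_warn_on(!q->locked, &plug_warnings);
}

/* Caller-side check as suggested in the thread: a function that takes the
 * lock with the unconditional *_irq variant assumes it was entered with
 * interrupts enabled, so check that once at the top. */
void toy_make_request(struct toy_queue *q, bool sync)
{
    toy_warn_on(toy_irqs_off, &entry_warnings);

    toy_irqs_off = true;   /* like spin_lock_irq(): irqs off ...     */
    q->locked = true;      /* ... and the lock taken                 */

    if (sync)              /* only some paths unplug the queue       */
        toy_remove_plug(q);

    q->locked = false;     /* like spin_unlock_irq(): lock dropped   */
    toy_irqs_off = false;  /* ... and irqs enabled unconditionally   */
}
```

Entering toy_make_request() with the toy flag already set bumps entry_warnings once, while a call that does not unplug never reaches the helper's checks. The unconditional unlock at the end leaves interrupts enabled however the caller entered, which is why an entry check is useful alongside the _irq variants.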
* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
From: walter harms @ 2007-07-25 12:19 UTC
To: Jens Axboe; +Cc: Satyam Sharma, LKML, Rafael J. Wysocki, pm list

Jens Axboe wrote:
[...]
> As I previously wrote, it's likely some of the device power up or resume
> routines that botch the irq enable/disable stuff. It'd be interesting to
> start stripping down the config until the warning goes away - or enable
> CONFIG_PM_DEBUG which may help as well.
>

i will give CONFIG_PM_DEBUG a try, do not expect results before the weekend.

re,
 wh
* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
From: walter harms @ 2007-07-26 7:05 UTC
To: Jens Axboe; +Cc: Satyam Sharma, LKML, Rafael J. Wysocki, pm list

Jens Axboe wrote:
[...]
> As I previously wrote, it's likely some of the device power up or resume
> routines that botch the irq enable/disable stuff. It'd be interesting to
> start stripping down the config until the warning goes away - or enable
> CONFIG_PM_DEBUG which may help as well.
>

hi all,
i have recompiled the kernel with CONFIG_PM_DEBUG but that resulted in
nothing more than a magic number after the backtrace, does anyone care?
I played with hda=c,h,s to overcome the detection problem but had no
success beyond some "TSC instability". i did not try this with a working
kernel, it is only to give you all the information i have.

re,
 wh
* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
  2007-07-25 11:18 ` Jens Axboe
  2007-07-25 12:19 ` walter harms
  2007-07-26 7:05 ` walter harms
@ 2007-07-29 15:48 ` walter harms
  2 siblings, 0 replies; 12+ messages in thread
From: walter harms @ 2007-07-29 15:48 UTC
To: Jens Axboe; +Cc: Satyam Sharma, LKML, Rafael J. Wysocki, pm list

[-- Attachment #1: Type: text/plain, Size: 3688 bytes --]

>>>> On 7/22/07, walter harms <wharms@bfs.de> wrote:
>>>>> hello all,
>>>>> on my asus notebook tm620 there is a crash with 2.6.22 and 2.6.21
>>>> Did this happen when you were resuming from a suspend-to-ram/disk?
>>>> [ I ask because I see swsusp in the trace below, linux-pm added to Cc: ]
>>>>
>>>>> ....
>>>>> Using IPI Shortcut mode
>>>>> WARNING: at block/ll_rw_blk.c:1575 blk_remove_plug()
>>>>> [<c01ac87e>] blk_remove_plug+0x36/0x5a
>>>>> [<c01ac8b6>] __generic_unplug_device+0x14/0x1f
>>>>> [<c01ad587>] __make_request+0x39b/0x49c
>>>>> [<c01abc8c>] generic_make_request+0x228/0x255
>>>>> [<c01adb54>] submit_bio+0xa5/0xac
>>>>> [<c013e233>] mempool_alloc+0x37/0xae
>>>>> [<c01314dc>] submit+0xc2/0x11d
>>>>> [<c0131585>] bio_read_page+0x24/0x27
>>>>> [<c013188b>] swsusp_check+0x4f/0xaf
>>>>> [<c012f6c2>] software_resume+0x5f/0x108
>>>>> [<c037867e>] kernel_init+0xb0/0x212
>>>>> [<c0103a16>] ret_from_fork+0x6/0x1c
>>>>> [<c03785ce>] kernel_init+0x0/0x212
>>>>> [<c03785ce>] kernel_init+0x0/0x212
>>>>> [<c010465b>] kernel_thread_helper+0x7/0x10
>>>>> =======================
>>>> Surprising, that's a WARN_ON(!irqs_disabled()) but IRQs are disabled
>>>> alright on that codepath. OTOH, __make_request() is heavily goto-driven,
>>>> uses the non-save/restore variants of spin_lock_irq, and does not even
>>>> balance locks / unlocks for some error paths ... gaah.
>>> __make_request() must be called from process context, hence
>>> spin_lock_irq() is perfectly alright and the fastest way to go. And of
>>> course the locking is balanced! So please save your 'gaah's for code
>>> you actually took the time to try and understand.
>> You're right, I didn't really look at that code for long (it even explicitly
>> comments about what's going on with the locking in there!) sorry about
>> that.
>>
>> [ Off-topic: BTW does every call to __make_request() end up in
>> blk_remove_plug()? Since you're explicitly making the assumption
>> that it *must* be called from process context (and hence the use of
>> the non-save/restore variants), you could consider putting a
>> WARN_ON(irqs_disabled()) over there, and perhaps a WARN_ON
>> (!spin_is_locked(queue_lock)) in blk_remove_plug() instead, and
>> other such similar functions that currently have the !irqs_disabled
>> check. This way you'd effectively cover _both_ the assertions,
>> and in appropriate places -- just a suggestion. ]
>
> No, blk_remove_plug() will only be called for sync bios, or where we
> have to wait for request allocation (which will unplug the device).
>
> __generic_make_request() already does a might_sleep() check, so it
> should catch this already.
>
>>> But it does look like unbalanced irq disable/enable calls. I'd guess in
>>> the suspend/resume path. Obviously something more esoteric, since this
>>> is the first such report for 2.6.22, so likely some not-very-used driver
>>> for instance.
>> Now that I do look at the codepath, it does seem surprising irqs were
>> not disabled there. There are a bunch of calls to _other_ functions
>> between the spin_lock_irq and the blk_remove_plug via
>> __generic_unplug_device that would also have complained about
>> !irqs_disabled.
>>
>> Walter, does this happen reproducibly?
>
> As I previously wrote, it's likely some of the device power-up or resume
> routines that botch the irq enable/disable stuff. It'd be interesting to
> start stripping down the config until the warning goes away - or enable
> CONFIG_PM_DEBUG which may help as well.
>
hi ppl,
here is the output of 2.6.21 with CONFIG_PM_DEBUG.
I have disabled edd, apm, acpi, smp, resume, whatever I could, to make things easier.
imho it shows nothing useful.

re,
 wh

[-- Attachment #2: 2.6.21.PM_DEBUG.gz --]
[-- Type: application/x-gzip, Size: 3807 bytes --]
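The locking argument above (spin_lock_irq() versus the save/restore variants, and why an unbalanced IRQ enable would trip the WARN_ON(!irqs_disabled()) in blk_remove_plug()) can be modelled in a few lines of user-space C. This is a hypothetical sketch, not kernel code: a global flag stands in for the CPU interrupt-enable bit, every sim_* name is invented for illustration, and sim_buggy_helper() is an assumed stand-in for a driver or resume routine that re-enables interrupts. Nothing in the thread identifies such a routine. The entry check and the lock-held check follow Satyam's suggestion.

```c
/* User-space model of the IRQ-state rules debated in this thread.
 * Hypothetical sketch: every sim_* name is invented; this is not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static bool irqs_on = true;       /* stands in for the CPU interrupt-enable flag */
static bool queue_locked = false; /* stands in for the queue lock being held */
static int warnings = 0;          /* how many simulated WARN_ONs have fired */

/* Like the kernel's WARN_ON(): report and continue, do not abort. */
#define SIM_WARN_ON(cond)                                                  \
    do {                                                                   \
        if (cond) {                                                        \
            warnings++;                                                    \
            printf("WARNING: at %s:%d (%s)\n", __FILE__, __LINE__, #cond); \
        }                                                                  \
    } while (0)

bool sim_irqs_disabled(void) { return !irqs_on; }
void sim_local_irq_disable(void) { irqs_on = false; }
void sim_local_irq_enable(void) { irqs_on = true; }

/* Non-saving pair: the unlock re-enables IRQs unconditionally, so it is
 * only correct when the caller is known to run with IRQs enabled. */
void sim_spin_lock_irq(void)
{
    sim_local_irq_disable();
    assert(!queue_locked);        /* a spinlock must not be taken twice */
    queue_locked = true;
}

void sim_spin_unlock_irq(void)
{
    queue_locked = false;
    sim_local_irq_enable();       /* the caller's previous IRQ state is lost */
}

/* Saving pair: hands back whatever IRQ state the caller had. */
bool sim_spin_lock_irqsave(void)
{
    bool flags = irqs_on;
    sim_local_irq_disable();
    assert(!queue_locked);
    queue_locked = true;
    return flags;
}

void sim_spin_unlock_irqrestore(bool flags)
{
    queue_locked = false;
    irqs_on = flags;
}

/* Stand-in for blk_remove_plug(): expects the lock held with IRQs off. */
void sim_remove_plug(void)
{
    SIM_WARN_ON(!sim_irqs_disabled());
    SIM_WARN_ON(!queue_locked);   /* the lock-held check Satyam proposed */
}

/* Hypothetical misbehaving callee that re-enables IRQs behind its caller's back. */
void sim_buggy_helper(void)
{
    sim_local_irq_enable();
}

/* Stand-in for the __make_request() locking pattern. */
void sim_make_request(bool call_buggy_helper)
{
    SIM_WARN_ON(sim_irqs_disabled()); /* entry check: process context only */
    sim_spin_lock_irq();
    if (call_buggy_helper)
        sim_buggy_helper();
    sim_remove_plug();
    sim_spin_unlock_irq();
}
```

In this model, sim_make_request(false) prints no warning, while sim_make_request(true) prints exactly one, from the IRQ check in sim_remove_plug(). That matches the shape of the report in this thread. Separately, taking and releasing the lock with the non-saving pair while IRQs are already off leaves them on afterwards, whereas the irqsave pair restores the disabled state. This is why the non-saving variants are reserved for code known to run in process context.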
* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
  2007-07-22 17:20 ` Satyam Sharma
  2007-07-22 22:17 ` Jens Axboe
@ 2007-07-23 7:57 ` walter harms
  1 sibling, 0 replies; 12+ messages in thread
From: walter harms @ 2007-07-23 7:57 UTC
To: Satyam Sharma; +Cc: Jens Axboe, LKML, Rafael J. Wysocki, pm list

[-- Attachment #1: Type: text/plain, Size: 558 bytes --]

Satyam Sharma wrote:
> Hi Walter,
>
> Thanks for reporting this.
>
> On 7/22/07, walter harms <wharms@bfs.de> wrote:
>> hello all,
>> on my asus notebook tm620 there is a crash with 2.6.22 and 2.6.21
>
> Did this happen when you were resuming from a suspend-to-ram/disk?
> [ I ask because I see swsusp in the trace below, linux-pm added to Cc: ]
>
hi all,
this is a 'simple' boot process: no resume from suspend-to-RAM/disk,
and no exotic devices beyond a 20GB IDE hard disk.
Attached is the current .config (renamed to _config so nobody accidentally
overwrites their own .config).

re,
 wh

[-- Attachment #2: _config.gz --]
[-- Type: application/x-gzip, Size: 18812 bytes --]
Thread overview: 12+ messages
2007-07-18 8:46 crash with 2.6.21 BUG:ll_rw_blk.c walter harms
2007-07-18 10:33 ` Jens Axboe
[not found] ` <469DF233.5080902@bfs.de>
[not found] ` <20070718110724.GN11657@kernel.dk>
[not found] ` <469E072E.7080400@bfs.de>
[not found] ` <20070718123142.GV11657@kernel.dk>
2007-07-22 16:51 ` crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug() walter harms
2007-07-22 17:20 ` Satyam Sharma
2007-07-22 22:17 ` Jens Axboe
2007-07-25 0:22 ` Satyam Sharma
2007-07-25 7:05 ` walter harms
2007-07-25 11:18 ` Jens Axboe
2007-07-25 12:19 ` walter harms
2007-07-26 7:05 ` walter harms
2007-07-29 15:48 ` walter harms
2007-07-23 7:57 ` walter harms