crash with 2.6.21 BUG:ll_rw

public inbox for linux-kernel@vger.kernel.org

* crash with 2.6.21 BUG:ll_rw_blk.c
@ 2007-07-18  8:46 walter harms
  2007-07-18 10:33 ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: walter harms @ 2007-07-18  8:46 UTC (permalink / raw)
  To: LKML

hi list,
I managed to crash 2.6.21 at boot time with:

IPI Shortcut mode
BUG: at block/ll_rw_blk.c:1566 blk_remove_plug()


System: Acer notebook TM620

Does anyone care, or is 2.6.21 already done?

re,
 wh


* Re: crash with 2.6.21 BUG:ll_rw_blk.c
  2007-07-18  8:46 crash with 2.6.21 BUG:ll_rw_blk.c walter harms
@ 2007-07-18 10:33 ` Jens Axboe
       [not found]   ` <469DF233.5080902@bfs.de>
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2007-07-18 10:33 UTC (permalink / raw)
  To: walter harms; +Cc: LKML

On Wed, Jul 18 2007, walter harms wrote:
> hi list,
> i managed to crash 2.6.21 at boottime with
> 
> IPI Shortcut mode
> BUG: atblock/ll_rw_blc.c 1566 blk_remove_plug()
> 
> 
> system: acer Notebook TM620
> 
> does anyone care or is 2.6.21 already done ?

We need a lot more than that, can you capture the full oops? Preferably
complete with boot messages prior to the oops.

-- 
Jens Axboe



* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
       [not found]         ` <20070718123142.GV11657@kernel.dk>
@ 2007-07-22 16:51           ` walter harms
  2007-07-22 17:20             ` Satyam Sharma
  0 siblings, 1 reply; 12+ messages in thread
From: walter harms @ 2007-07-22 16:51 UTC (permalink / raw)
  To: Jens Axboe; +Cc: LKML

[-- Attachment #1: Type: text/plain, Size: 2186 bytes --]

hello all,
on my Acer notebook TM620 there is a crash with both 2.6.22 and 2.6.21

....
Using IPI Shortcut mode
WARNING: at block/ll_rw_blk.c:1575 blk_remove_plug()
 [<c01ac87e>] blk_remove_plug+0x36/0x5a
 [<c01ac8b6>] __generic_unplug_device+0x14/0x1f
 [<c01ad587>] __make_request+0x39b/0x49c
 [<c01abc8c>] generic_make_request+0x228/0x255
 [<c01adb54>] submit_bio+0xa5/0xac
 [<c013e233>] mempool_alloc+0x37/0xae
 [<c01314dc>] submit+0xc2/0x11d
 [<c0131585>] bio_read_page+0x24/0x27
 [<c013188b>] swsusp_check+0x4f/0xaf
 [<c012f6c2>] software_resume+0x5f/0x108
 [<c037867e>] kernel_init+0xb0/0x212
 [<c0103a16>] ret_from_fork+0x6/0x1c
 [<c03785ce>] kernel_init+0x0/0x212
 [<c03785ce>] kernel_init+0x0/0x212
 [<c010465b>] kernel_thread_helper+0x7/0x10
 =======================
Freeing unused kernel memory: 272k freed
....

I have attached two files with the kernel boot messages:

1. bug.txt.gz
 kernel 2.6.22.1, captured with netconsole; the last lines were missing, so I added them by hand

2. out.txt.gz
  simply a dmesg from kernel 2.6.18, which works fine


Additional observations:
removing 'resume=/dev/hda1' from the GRUB command line makes the crash disappear, but the drive still does not work:
ata_id[772]: main: HDIO_GET_IDENTITY failed for '/dev/.tmp-3-0'

There is a bug in the hard disk detection code:
2.6.22.1
hda: 8032MB, CHS=1024/255/63
 hda: hda1 hda2 hda3
 hda: p2 exceeds device capacity
 hda: p3 exceeds device capacity

2.6.18
hda: 39070080 sectors (20003 MB), CHS=38760/16/63, UDMA(100)
hda: cache flushes supported
 hda: hda1 hda2 hda3
Probing IDE interface ide1...


Here is the output from fdisk:

 fdisk -l

Disk /dev/hda: 20.0 GB, 20003880960 bytes
255 heads, 63 sectors/track, 2432 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1               1         131     1052226   82  Linux swap / Solaris
/dev/hda2   *         132        1437    10490445   83  Linux
/dev/hda3            1438        2432     7992337+  83  Linux



Is it possible that this leads to a chain of events causing the crash in ll_rw_blk.c?

If someone is interested in the .config, please mail me directly; I am not subscribed to LKML.
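The "p2/p3 exceeds device capacity" messages can be cross-checked with plain arithmetic. This C sketch assumes (the log does not confirm it) that 2.6.22.1 took the capacity to be exactly 1024 * 255 * 63 sectors, and it reads the fdisk listing with its 255-head, 63-sector geometry and 1-based, inclusive end cylinders. Under those assumptions hda1 (ending at cylinder 131) fits, while hda2 and hda3 (ending at cylinders 1437 and 2432) run past the truncated capacity, which matches the two warnings; the 2.6.18 geometry 38760/16/63 gives 39070080 sectors, the same figure as fdisk's last cylinder boundary and its 20003880960-byte total. The function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of 512-byte sectors addressed by a cylinder/head/sector geometry. */
uint64_t chs_sectors(uint64_t cyls, uint64_t heads, uint64_t sects)
{
    return cyls * heads * sects;
}

/*
 * fdisk -l numbers cylinders from 1 and its "End" column is inclusive, so a
 * partition ending at end_cyl stops just before sector end_cyl * 255 * 63
 * (fdisk reported 255 heads and 63 sectors/track for this disk).
 */
uint64_t partition_end_sector(uint64_t end_cyl)
{
    assert(end_cyl > 0);
    return chs_sectors(end_cyl, 255, 63);
}

/* The partition's end (start + size) lies beyond the detected capacity. */
bool exceeds_capacity(uint64_t end_cyl, uint64_t capacity_sectors)
{
    return partition_end_sector(end_cyl) > capacity_sectors;
}

/* Capacity in MiB (2^20 bytes), truncated; 1024/255/63 gives 8032. */
uint64_t sectors_to_mib(uint64_t sectors)
{
    return sectors * 512 / (1024 * 1024);
}
```

For example, `exceeds_capacity(1437, chs_sectors(1024, 255, 63))` is true, while `exceeds_capacity(2432, chs_sectors(38760, 16, 63))` is false because hda3 ends exactly at the full capacity.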

re,
 wh



[-- Attachment #2: bug.txt.gz --]
[-- Type: application/x-gzip, Size: 4107 bytes --]

[-- Attachment #3: out.txt.gz --]
[-- Type: application/x-gzip, Size: 6109 bytes --]


* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
  2007-07-22 16:51           ` crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug() walter harms
@ 2007-07-22 17:20             ` Satyam Sharma
  2007-07-22 22:17               ` Jens Axboe
  2007-07-23  7:57               ` walter harms
  0 siblings, 2 replies; 12+ messages in thread
From: Satyam Sharma @ 2007-07-22 17:20 UTC (permalink / raw)
  To: wharms; +Cc: Jens Axboe, LKML, Rafael J. Wysocki, pm list

[-- Attachment #1: Type: text/plain, Size: 3060 bytes --]

Hi Walter,

Thanks for reporting this.

On 7/22/07, walter harms <wharms@bfs.de> wrote:
> hello all,
> on my asus notebook tm620 there is a crash with 2.6.22 and 2.6.21

Did this happen when you were resuming from a suspend-to-ram/disk?
[ I ask because I see swsusp in the trace below, linux-pm added to Cc: ]

> ....
> Using IPI Shortcut mode
> WARNING: at block/ll_rw_blk.c:1575 blk_remove_plug()
>  [<c01ac87e>] blk_remove_plug+0x36/0x5a
>  [<c01ac8b6>] __generic_unplug_device+0x14/0x1f
>  [<c01ad587>] __make_request+0x39b/0x49c
>  [<c01abc8c>] generic_make_request+0x228/0x255
>  [<c01adb54>] submit_bio+0xa5/0xac
>  [<c013e233>] mempool_alloc+0x37/0xae
>  [<c01314dc>] submit+0xc2/0x11d
>  [<c0131585>] bio_read_page+0x24/0x27
>  [<c013188b>] swsusp_check+0x4f/0xaf
>  [<c012f6c2>] software_resume+0x5f/0x108
>  [<c037867e>] kernel_init+0xb0/0x212
>  [<c0103a16>] ret_from_fork+0x6/0x1c
>  [<c03785ce>] kernel_init+0x0/0x212
>  [<c03785ce>] kernel_init+0x0/0x212
>  [<c010465b>] kernel_thread_helper+0x7/0x10
>  =======================

Surprising, that's a WARN_ON(!irqs_disabled()) but IRQs are disabled
alright on that codepath. OTOH, __make_request() is heavily goto-driven,
uses the non-save/restore variants of spin_lock_irq, and does not even
balance locks / unlocks for some error paths ... gaah.

> Freeing unused kernel memory: 272k freed
> ....
>
> I attached two files with the kernel bootmessage
>
> 1. bug.txt.gz
>  kernel 2.6.22.1 created with netconsole, last lines were missing i added them by hand
>
> 2. out.txt.gz
>   simple a dmesg from kernel 2.6.18, works fine

I've reattached them in this mail, for linux-pm to see.

> additional observations:
> removing 'resume=/dev/hda1' from grub lets the crash disappear but the
> drive still does not work because
> ata_id[772]: main: HDIO_GET_IDENTITY failed for '/dev/.tmp-3-0'

If you're resuming from a suspend I don't see how you can avoid
giving the resume= parameter (unless you somehow specify some
default at build-time ...)

> there is a bug in the hd detection code:
> 2.6.22.1
> hda: 8032MB, CHS=1024/255/63
>  hda: hda1 hda2 hda3
>  hda: p2 exceeds device capacity
>  hda: p3 exceeds device capacity
>
> 2.6.18
> hda: 39070080 sectors (20003 MB), CHS=38760/16/63, UDMA(100)
> hda: cache flushes supported
>  hda: hda1 hda2 hda3
> Probing IDE interface ide1...
>
>
> here the output from fdisk:
>
>  fdisk -l
>
> Disk /dev/hda: 20.0 GB, 20003880960 bytes
> 255 heads, 63 sectors/track, 2432 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/hda1               1         131     1052226   82  Linux swap / Solaris
> /dev/hda2   *         132        1437    10490445   83  Linux
> /dev/hda3            1438        2432     7992337+  83  Linux
>
>
>
> is it possible that this leads to a chain of events causing the crash in ll_rw_blk.c ?
>
> if someone is interested in the .config please mail me directly, i have not
> subscribed the lkml.

Yes, please post your .config also.

Thanks,
Satyam

[-- Attachment #2: bug.txt.gz --]
[-- Type: application/x-gzip, Size: 4107 bytes --]

[-- Attachment #3: out.txt.gz --]
[-- Type: application/x-gzip, Size: 6109 bytes --]


* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
  2007-07-22 17:20             ` Satyam Sharma
@ 2007-07-22 22:17               ` Jens Axboe
  2007-07-25  0:22                 ` Satyam Sharma
  2007-07-23  7:57               ` walter harms
  1 sibling, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2007-07-22 22:17 UTC (permalink / raw)
  To: Satyam Sharma; +Cc: wharms, LKML, Rafael J. Wysocki, pm list

On Sun, Jul 22 2007, Satyam Sharma wrote:
> Surprising, that's a WARN_ON(!irqs_disabled()) but IRQs are disabled
> alright on that codepath. OTOH, __make_request() is heavily goto-driven,
> uses the non-save/restore variants of spin_lock_irq, and does not even
> balance locks / unlocks for some error paths ... gaah.

__make_request() must be called from process context, hence
spin_lock_irq() is perfectly fine and the fastest way to go. And of
course the locking is balanced! So please save your 'gaah's for code
you actually took the time to try and understand.

But it does look like unbalanced irq disable/enable calls. I'd guess
somewhere in the suspend/resume path. Obviously something more esoteric,
since this is the first such report for 2.6.22, so likely some
not-very-used driver, for instance.
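The two points above (spin_lock_irq() is fine where interrupts are known to be enabled on entry, and the warning points at an unbalanced enable) can be illustrated with a single-threaded userspace model. This is a hypothetical sketch, not kernel code: the `model_*` functions only track one CPU's interrupt-enable flag, and `buggy_helper()` stands in for whatever unknown code might re-enable interrupts while the queue lock is held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model: only the interrupt-enable flag is tracked. */
static bool irqs_enabled = true;
static int warnings;   /* how many times the modelled WARN_ON fired */

static bool model_irqs_disabled(void)     { return !irqs_enabled; }
static void model_local_irq_disable(void) { irqs_enabled = false; }
static void model_local_irq_enable(void)  { irqs_enabled = true; }

/* _irq flavour: disable on lock, re-enable unconditionally on unlock. */
static void model_spin_lock_irq(void)     { model_local_irq_disable(); }
static void model_spin_unlock_irq(void)   { model_local_irq_enable(); }

/* _irqsave flavour: remember the caller's state and restore exactly that. */
static bool model_spin_lock_irqsave(void)
{
    bool was_enabled = irqs_enabled;
    model_local_irq_disable();
    return was_enabled;
}

static void model_spin_unlock_irqrestore(bool was_enabled)
{
    irqs_enabled = was_enabled;
}

/* Stand-in for the WARN_ON(!irqs_disabled()) check in blk_remove_plug(). */
static void model_blk_remove_plug(void)
{
    if (!model_irqs_disabled())
        warnings++;
}

/* Hypothetical culprit: re-enables IRQs while someone else holds the lock. */
static void buggy_helper(void)
{
    model_local_irq_enable();
}

/* Rough shape of the reported path: lock, (maybe) stray enable, unplug, unlock. */
static void model_make_request(bool call_buggy_helper)
{
    model_spin_lock_irq();
    if (call_buggy_helper)
        buggy_helper();
    model_blk_remove_plug();   /* reached via __generic_unplug_device() */
    model_spin_unlock_irq();
}
```

In this model the balanced path never warns, while a single stray enable between lock and unplug trips the check. The `_irqsave` pair is what a caller needs when it cannot know the interrupt state on entry; the `_irq` pair skips saving the flags but re-enables interrupts on unlock no matter what the caller had.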

-- 
Jens Axboe



* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
  2007-07-22 17:20             ` Satyam Sharma
  2007-07-22 22:17               ` Jens Axboe
@ 2007-07-23  7:57               ` walter harms
  1 sibling, 0 replies; 12+ messages in thread
From: walter harms @ 2007-07-23  7:57 UTC (permalink / raw)
  To: Satyam Sharma; +Cc: Jens Axboe, LKML, Rafael J. Wysocki, pm list

[-- Attachment #1: Type: text/plain, Size: 558 bytes --]



Satyam Sharma wrote:
> Hi Walter,
> 
> Thanks for reporting this.
> 
> On 7/22/07, walter harms <wharms@bfs.de> wrote:
>> hello all,
>> on my asus notebook tm620 there is a crash with 2.6.22 and 2.6.21
> 
> Did this happen when you were resuming from a suspend-to-ram/disk?
> [ I ask because I see swsusp in the trace below, linux-pm added to Cc: ]
> 

hi all,
This is a 'simple' boot process: no resume from a suspend to RAM/disk, and no exotic devices
beyond a 20GB IDE hard disk.

Attached is the current .config (renamed to _config so nobody will accidentally overwrite their own).

re,
 wh

[-- Attachment #2: _config.gz --]
[-- Type: application/x-gzip, Size: 18812 bytes --]


* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
  2007-07-22 22:17               ` Jens Axboe
@ 2007-07-25  0:22                 ` Satyam Sharma
  2007-07-25  7:05                   ` walter harms
  2007-07-25 11:18                   ` Jens Axboe
  0 siblings, 2 replies; 12+ messages in thread
From: Satyam Sharma @ 2007-07-25  0:22 UTC (permalink / raw)
  To: Jens Axboe; +Cc: wharms, LKML, Rafael J. Wysocki, pm list

On 7/23/07, Jens Axboe <jens.axboe@oracle.com> wrote:
> On Sun, Jul 22 2007, Satyam Sharma wrote:
> > Surprising, that's a WARN_ON(!irqs_disabled()) but IRQs are disabled
> > alright on that codepath. OTOH, __make_request() is heavily goto-driven,
> > uses the non-save/restore variants of spin_lock_irq, and does not even
> > balance locks / unlocks for some error paths ... gaah.
>
> __make_request() must be called from process context, hence
> spin_lock_irq() is perfectly already and the fastest way to go. And of
> course the locking is balanced! So please save your 'gaah's for code
> you actually took the time to try and understand.

You're right, I didn't really look at that code for long (it even explicitly
comments on what's going on with the locking in there!); sorry about
that.

[ Off-topic: BTW does every call to __make_request() end up in
blk_remove_plug()? Since you're explicitly making the assumption
that it *must* be called from process context (and hence the use of
the non-save/restore variants), you could consider putting a
WARN_ON(irqs_disabled()) over there, and perhaps a
WARN_ON(!spin_is_locked(queue_lock)) in blk_remove_plug() instead, and
other such similar functions that currently have the !irqs_disabled
check. This way you'd effectively cover _both_ the assertions,
and in appropriate places -- just a suggestion. ]
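The suggestion above can be tried against a toy model. In this hypothetical sketch (plain C flags, not kernel APIs), the proposed checks are an entry check that interrupts are enabled plus a "queue lock is held" check at the unplug point, counted separately from the existing `WARN_ON(!irqs_disabled())`. In this model a stray interrupt enable while the lock is held trips only the existing check, because the lock is still held at that point. One further caveat, as we understand it: on uniprocessor builds without spinlock debugging, `spin_is_locked()` may always report "unlocked", so a `WARN_ON(!spin_is_locked(...))` would need care there.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
static bool irqs_on = true;    /* this CPU's interrupt-enable flag        */
static bool queue_locked;      /* what spin_is_locked() would report      */
static int proposed_warnings;  /* the suggested entry + lock-held checks  */
static int existing_warnings;  /* the current WARN_ON(!irqs_disabled())   */

static void remove_plug_checks(void)
{
    if (!queue_locked)         /* proposed: WARN_ON(!spin_is_locked())    */
        proposed_warnings++;
    if (irqs_on)               /* existing: WARN_ON(!irqs_disabled())     */
        existing_warnings++;
}

/* Run the request path; optionally re-enable IRQs while the lock is held. */
static void run_request_path(bool stray_irq_enable)
{
    if (!irqs_on)              /* proposed: WARN_ON(irqs_disabled())      */
        proposed_warnings++;
    queue_locked = true;       /* spin_lock_irq(): take the lock ...      */
    irqs_on = false;           /* ... and disable IRQs                    */
    if (stray_irq_enable)
        irqs_on = true;        /* unbalanced local_irq_enable() somewhere */
    remove_plug_checks();
    queue_locked = false;      /* spin_unlock_irq(): drop the lock ...    */
    irqs_on = true;            /* ... and unconditionally enable IRQs     */
}
```

The entry check does catch a caller that arrives with interrupts already disabled, so the two sets of assertions cover different mistakes rather than replacing each other.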

> But it does look like unbalanced irq disable/enable calls. I'd guess in
> the suspend/resume path. Obviously something more esoteric, since this
> is the first such report for 2.6.22, so like some not-very-used driver
> for instance.

Now that I do look at the codepath, it does seem surprising irqs were
not disabled there. There are a bunch of calls to _other_ functions
between the spin_lock_irq and the blk_remove_plug via
__generic_unplug_device that would also have complained about
!irqs_disabled.

Walter, does this happen reproducibly?

Satyam


* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
  2007-07-25  0:22                 ` Satyam Sharma
@ 2007-07-25  7:05                   ` walter harms
  2007-07-25 11:18                   ` Jens Axboe
  1 sibling, 0 replies; 12+ messages in thread
From: walter harms @ 2007-07-25  7:05 UTC (permalink / raw)
  To: Satyam Sharma; +Cc: Jens Axboe, LKML, Rafael J. Wysocki, pm list



Satyam Sharma wrote:
> Walter, does this happen reproducibly?
> 

yes, with 2.6.21 and 2.6.22.1

re,
 wh








* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
  2007-07-25  0:22                 ` Satyam Sharma
  2007-07-25  7:05                   ` walter harms
@ 2007-07-25 11:18                   ` Jens Axboe
  2007-07-25 12:19                     ` walter harms
                                       ` (2 more replies)
  1 sibling, 3 replies; 12+ messages in thread
From: Jens Axboe @ 2007-07-25 11:18 UTC (permalink / raw)
  To: Satyam Sharma; +Cc: wharms, LKML, Rafael J. Wysocki, pm list

On Wed, Jul 25 2007, Satyam Sharma wrote:
> On 7/23/07, Jens Axboe <jens.axboe@oracle.com> wrote:
>> On Sun, Jul 22 2007, Satyam Sharma wrote:
>> > Surprising, that's a WARN_ON(!irqs_disabled()) but IRQs are disabled
>> > alright on that codepath. OTOH, __make_request() is heavily goto-driven,
>> > uses the non-save/restore variants of spin_lock_irq, and does not even
>> > balance locks / unlocks for some error paths ... gaah.
>>
>> __make_request() must be called from process context, hence
>> spin_lock_irq() is perfectly already and the fastest way to go. And of
>> course the locking is balanced! So please save your 'gaah's for code
>> you actually took the time to try and understand.
>
> You're right, I didn't really look at that code for long (it even explicitly
> comments about what's going with the locking in there!) sorry about
> that.
>
> [ Off-topic: BTW does every call to __make_request() end up in
> blk_remove_plug()? Since you're explicitly making the assumption
> that it *must* be called from process context (and hence the use of
> the non-save/restore variants), you could consider putting a
> WARN_ON(irqs_disabled()) over there, and perhaps a WARN_ON
> (!spin_is_locked(queue_lock)) in blk_remove_plug() instead, and
> other such similar functions that currently have the !irqs_disabled
> check. This way you'd effectively cover _both_ the assertions,
> and in appropriate places -- just a suggestion. ]

No, blk_remove_plug() will only be called for sync bios, or where we
have to wait for request allocation (which will unplug the device).

__generic_make_request() already does a might_sleep() check, so it
should catch this already.

>> But it does look like unbalanced irq disable/enable calls. I'd guess in
>> the suspend/resume path. Obviously something more esoteric, since this
>> is the first such report for 2.6.22, so like some not-very-used driver
>> for instance.
>
> Now that I do look at the codepath, it does seem surprising irqs were
> not disabled there. There are a bunch of calls to _other_ functions
> between the spin_lock_irq and the blk_remove_plug via
> __generic_unplug_device that would also have complained about
> !irqs_disabled.
>
> Walter, does this happen reproducibly?

As I previously wrote, it's likely some of the device power-up or resume
routines that botch the irq enable/disable stuff. It'd be interesting to
start stripping down the config until the warning goes away, or to enable
CONFIG_PM_DEBUG, which may help as well.
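Stripping down the config can be made systematic. The sketch below is one hypothetical way to do it, a binary search over a list of candidate options, and it rests on strong assumptions: exactly one option causes the warning, options do not interact, and any subset can be built and booted (real Kconfig dependencies often break the last two). `warns_fn` stands for "build and boot with only these options enabled and check for the warning"; `simulated_warns()` exists only so the search can be tried without a test machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test: does a kernel with only options [lo, hi) enabled warn? */
typedef bool (*warns_fn)(size_t lo, size_t hi, void *ctx);

/*
 * Binary search for a single culprit among candidate options [0, n).
 * Assumes exactly one option triggers the warning and options are
 * independent. Returns the culprit's index, or n if the full set is clean.
 */
size_t find_culprit(size_t n, warns_fn warns, void *ctx)
{
    if (n == 0 || !warns(0, n, ctx))
        return n;
    size_t lo = 0, hi = n;              /* invariant: culprit in [lo, hi) */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (warns(lo, mid, ctx))
            hi = mid;                   /* culprit is in the lower half */
        else
            lo = mid;                   /* otherwise in the upper half  */
    }
    return lo;
}

/* Simulation only: ctx points at the index of the pretend culprit option. */
bool simulated_warns(size_t lo, size_t hi, void *ctx)
{
    size_t culprit = *(const size_t *)ctx;
    return culprit >= lo && culprit < hi;
}
```

With n candidate options this needs about log2(n) + 1 test boots instead of n, which matters when every test is a full kernel build and reboot.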

-- 
Jens Axboe



* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
  2007-07-25 11:18                   ` Jens Axboe
@ 2007-07-25 12:19                     ` walter harms
  2007-07-26  7:05                     ` walter harms
  2007-07-29 15:48                     ` walter harms
  2 siblings, 0 replies; 12+ messages in thread
From: walter harms @ 2007-07-25 12:19 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Satyam Sharma, LKML, Rafael J. Wysocki, pm list



Jens Axboe wrote:
> As I previously wrote, it's like some of the device power up or resume
> routines that botch the irq enable/disable stuff. It'd be interesting to
> start stripping down the config until the warning goes away - or enable
> CONFIG_PM_DEBUG which may help as well.
> 

I will give CONFIG_PM_DEBUG a try,
but do not expect results before the weekend.

re,
 wh


* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
  2007-07-25 11:18                   ` Jens Axboe
  2007-07-25 12:19                     ` walter harms
@ 2007-07-26  7:05                     ` walter harms
  2007-07-29 15:48                     ` walter harms
  2 siblings, 0 replies; 12+ messages in thread
From: walter harms @ 2007-07-26  7:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Satyam Sharma, LKML, Rafael J. Wysocki, pm list



Jens Axboe wrote:
> As I previously wrote, it's like some of the device power up or resume
> routines that botch the irq enable/disable stuff. It'd be interesting to
> start stripping down the config until the warning goes away - or enable
> CONFIG_PM_DEBUG which may help as well.
> 

hi all,
I have recompiled the kernel with CONFIG_PM_DEBUG, but that resulted in nothing
more than a magic number after the backtrace. Does anyone care?

I played with hda=c,h,s to work around the detection problem, but had no success,
apart from some "TSC instability". I did not try this with a working kernel; it is only
to give you all the information I have.

re,
 wh





* Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
  2007-07-25 11:18                   ` Jens Axboe
  2007-07-25 12:19                     ` walter harms
  2007-07-26  7:05                     ` walter harms
@ 2007-07-29 15:48                     ` walter harms
  2 siblings, 0 replies; 12+ messages in thread
From: walter harms @ 2007-07-29 15:48 UTC
  To: Jens Axboe; +Cc: Satyam Sharma, LKML, Rafael J. Wysocki, pm list

[-- Attachment #1: Type: text/plain, Size: 3688 bytes --]

>>>> On 7/22/07, walter harms <wharms@bfs.de> wrote:
>>>>> hello all,
>>>>> on my asus notebook tm620 there is a crash with 2.6.22 and 2.6.21
>>>> Did this happen when you were resuming from a suspend-to-ram/disk?
>>>> [ I ask because I see swsusp in the trace below, linux-pm added to Cc: ]
>>>>
>>>>> ....
>>>>> Using IPI Shortcut mode
>>>>> WARNING: at block/ll_rw_blk.c:1575 blk_remove_plug()
>>>>>  [<c01ac87e>] blk_remove_plug+0x36/0x5a
>>>>>  [<c01ac8b6>] __generic_unplug_device+0x14/0x1f
>>>>>  [<c01ad587>] __make_request+0x39b/0x49c
>>>>>  [<c01abc8c>] generic_make_request+0x228/0x255
>>>>>  [<c01adb54>] submit_bio+0xa5/0xac
>>>>>  [<c013e233>] mempool_alloc+0x37/0xae
>>>>>  [<c01314dc>] submit+0xc2/0x11d
>>>>>  [<c0131585>] bio_read_page+0x24/0x27
>>>>>  [<c013188b>] swsusp_check+0x4f/0xaf
>>>>>  [<c012f6c2>] software_resume+0x5f/0x108
>>>>>  [<c037867e>] kernel_init+0xb0/0x212
>>>>>  [<c0103a16>] ret_from_fork+0x6/0x1c
>>>>>  [<c03785ce>] kernel_init+0x0/0x212
>>>>>  [<c03785ce>] kernel_init+0x0/0x212
>>>>>  [<c010465b>] kernel_thread_helper+0x7/0x10
>>>>>  =======================
>>>> Surprising, that's a WARN_ON(!irqs_disabled()) but IRQs are disabled
>>>> alright on that codepath. OTOH, __make_request() is heavily goto-driven,
>>>> uses the non-save/restore variants of spin_lock_irq, and does not even
>>>> balance locks / unlocks for some error paths ... gaah.
>>> __make_request() must be called from process context, hence
>>> spin_lock_irq() is perfectly alright and the fastest way to go. And of
>>> course the locking is balanced! So please save your 'gaah's for code
>>> you actually took the time to try and understand.
>> You're right, I didn't really look at that code for long (it even
>> explicitly comments about what's going on with the locking in there!).
>> Sorry about that.
>>
>> [ Off-topic: BTW does every call to __make_request() end up in
>> blk_remove_plug()? Since you're explicitly making the assumption
>> that it *must* be called from process context (and hence the use of
>> the non-save/restore variants), you could consider putting a
>> WARN_ON(irqs_disabled()) over there, and perhaps a WARN_ON
>> (!spin_is_locked(queue_lock)) in blk_remove_plug() instead, and
>> other such similar functions that currently have the !irqs_disabled
>> check. This way you'd effectively cover _both_ the assertions,
>> and in appropriate places -- just a suggestion. ]
> 
> No, blk_remove_plug() will only be called for sync bios, or where we
> have to wait for request allocation (which will unplug the device).
> 
> __generic_make_request() already does a might_sleep() check, so it
> should catch this already.
> 
>>> But it does look like unbalanced irq disable/enable calls. I'd guess in
>>> the suspend/resume path. Obviously something more esoteric, since this
>>> is the first such report for 2.6.22, so likely some not-very-used driver
>>> for instance.
>> Now that I do look at the codepath, it does seem surprising irqs were
>> not disabled there. There are a bunch of calls to _other_ functions
>> between the spin_lock_irq and the blk_remove_plug via
>> __generic_unplug_device that would also have complained about
>> !irqs_disabled.
>>
>> Walter, does this happen reproducibly?
> 
> As I previously wrote, it's likely some of the device power-up or resume
> routines that botch the irq enable/disable stuff. It'd be interesting to
> start stripping down the config until the warning goes away - or enable
> CONFIG_PM_DEBUG which may help as well.
> 

Hi people,
here is the output of 2.6.21 with CONFIG_PM_DEBUG. I have disabled EDD, APM, ACPI, SMP, resume
and whatever else I could to make things simpler. IMHO it shows nothing useful.


re,
 wh

[-- Attachment #2: 2.6.21.PM_DEBUG.gz --]
[-- Type: application/x-gzip, Size: 3807 bytes --]
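The locking argument in the quoted exchange rests on the difference between `spin_lock_irq()` and the save/restore variants: the former unconditionally disables interrupts and its unlock unconditionally re-enables them, so it is only correct where interrupts are known to be enabled on entry, such as process context. The user-space sketch below illustrates that difference under stated assumptions: a single boolean models the interrupt flag, all `model_*` and `nested_*` names are hypothetical stand-ins rather than kernel APIs, and the `entry_warnings` counter models the `WARN_ON(irqs_disabled())` entry check Satyam suggested.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only: one flag stands in for the local CPU irq state. */
static bool irqs_on = true;
static int entry_warnings; /* models WARN_ON(irqs_disabled()) on entry */

/* Hypothetical stand-in for the saved flags word. */
typedef bool model_flags_t;

/* Non-saving pair: assumes irqs are on at entry, re-enables at unlock. */
static void model_spin_lock_irq(void)
{
    if (!irqs_on)
        entry_warnings++; /* caller broke the "irqs enabled" assumption */
    irqs_on = false;
}

static void model_spin_unlock_irq(void)
{
    assert(!irqs_on); /* lock section must still have irqs off */
    irqs_on = true;   /* unconditional, whatever the caller had before */
}

/* Saving pair: remembers the previous state and restores exactly that. */
static model_flags_t model_spin_lock_irqsave(void)
{
    model_flags_t saved = irqs_on;
    irqs_on = false;
    return saved;
}

static void model_spin_unlock_irqrestore(model_flags_t saved)
{
    assert(!irqs_on);
    irqs_on = saved;
}

/* Caller already running with irqs off (e.g. from an interrupt handler)
 * uses the non-saving pair; returns the irq state it leaves behind. */
static bool nested_irq_variant(void)
{
    irqs_on = false;
    model_spin_lock_irq();
    model_spin_unlock_irq(); /* wrongly turns irqs back on */
    return irqs_on;
}

/* Same caller using the saving pair. */
static bool nested_irqsave_variant(void)
{
    irqs_on = false;
    model_flags_t flags = model_spin_lock_irqsave();
    model_spin_unlock_irqrestore(flags); /* irqs stay off */
    return irqs_on;
}
```

In this model `nested_irq_variant()` returns true (interrupts wrongly re-enabled) and bumps `entry_warnings`, whereas `nested_irqsave_variant()` returns false. From process context, where interrupts are on at entry, both pairs leave the flag in the same state, which is the situation Jens describes for `__make_request()`.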


Thread overview: 12+ messages
2007-07-18  8:46 crash with 2.6.21 BUG:ll_rw_blk.c walter harms
2007-07-18 10:33 ` Jens Axboe
     [not found]   ` <469DF233.5080902@bfs.de>
     [not found]     ` <20070718110724.GN11657@kernel.dk>
     [not found]       ` <469E072E.7080400@bfs.de>
     [not found]         ` <20070718123142.GV11657@kernel.dk>
2007-07-22 16:51           ` crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug() walter harms
2007-07-22 17:20             ` Satyam Sharma
2007-07-22 22:17               ` Jens Axboe
2007-07-25  0:22                 ` Satyam Sharma
2007-07-25  7:05                   ` walter harms
2007-07-25 11:18                   ` Jens Axboe
2007-07-25 12:19                     ` walter harms
2007-07-26  7:05                     ` walter harms
2007-07-29 15:48                     ` walter harms
2007-07-23  7:57               ` walter harms
