public inbox for linux-kernel@vger.kernel.org

* [Bug 18252] spinlock lockup in __make_request <- submit_bio <- ondemand_readahead
       [not found] <bug-18252-4803@https.bugzilla.kernel.org/>
@ 2010-09-11  9:50 ` Stefan Richter
  2010-09-13 21:41   ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Richter @ 2010-09-11  9:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: bugzilla-daemon, axboe

Full quote for lkml:

bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=18252
> 
>            Summary: spinlock lockup in __make_request <- submit_bio <-
>                     ondemand_readahead
>            Product: IO/Storage
>            Version: 2.5
>     Kernel Version: 2.6.36-rc3
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Block Layer
>         AssignedTo: axboe@kernel.dk
>         ReportedBy: stefanr@s5r6.in-berlin.de
>         Regression: No
> 
> 
> Created an attachment (id=29562)
>  --> (https://bugzilla.kernel.org/attachment.cgi?id=29562)
> BUG screenshot
> 
> After a week uptime of 2.6.36-rc3 (I ran 2.6.35 before that),

Almost two weeks uptime actually.

> I was greeted by a black screen of death today in the morning:
> 
> (see screenshot in attachment; partial transcript:)
> 
> sending NMI to all CPUs:
> BUG: spinlock lockup on CPU#0, ktorrent/4313, ffff8802...
> PID: 4313, comm: ktorrent Tainted: G  M D W   2.6.36-rc3 #3
> Call Trace:
>  [...] do_raw_spin_lock+0x118/0x147
>  [...] _raw_spin_lock_irq+0x44/0x49
>  [...] ? __make_request+0x5c/0x400
>  [...] __make_request+0x5c/0x400
>  [...] generic_make_request+0x23a/0x2a9
>  [...] submit_bio+0xad/0xb6
>  [...] mpage_bio_submit...
>  [...] do_mpage_readpage...
>  [...] ? get_parent_ip...
>  [...] ? sub_preempt_count...
>  [...] ? __lru_cache_add...
>  [...] mpage_readpages...
>  [...] ? ext4_get_block...
>  [...] ? __alloc_pages_nodemask...
>  [...] ? ext4_get_block...
>  [...] ext4_readpages...
>  [...] __do_page_cache_readahead...
>  [...] ? __do_page_cache_readahead...
>  [...] ra_submit...
>  [...] ondemand_readahead...
> 
> This is a system with Phenom II x4 and Radeon graphics.  Since kernel mode
> setting is fairly new for radeon, it is possible that the lockup happened with
> earlier kernels too but simply ended in a lockup without trace dump to the
> screen.  IOW, it is not clear to me whether this is a regression or not.
> 
> The bug happened while kaffeine wrote an MPEG 2 TS to the same filesystem from
> which ktorrent was reading.  Of course this kind of commonplace workload
> happened without problem two or three times before during the week in which I
> ran 2.6.36-rc3.
> 

(The screenshot is a bit large, hence I reported in bugzilla instead of the list.)

The kernel taint was due to a prior, apparently unrelated lockdep report, bug
17752 "2.6.36-rc3: inconsistent lock state (iprune_sem, shrink_icache_memory".
There were also three machine check events ten days ago due to corrected ECC
memory errors.
-- 
Stefan Richter
-=====-==-=- =--= -=-==
http://arcgraph.de/sr/

* Re: [Bug 18252] spinlock lockup in __make_request <- submit_bio <- ondemand_readahead
  2010-09-11  9:50 ` [Bug 18252] spinlock lockup in __make_request <- submit_bio <- ondemand_readahead Stefan Richter
@ 2010-09-13 21:41   ` Andrew Morton
  2010-09-14  6:56     ` Stefan Richter
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2010-09-13 21:41 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux-kernel, bugzilla-daemon, axboe, linux-scsi

On Sat, 11 Sep 2010 11:50:41 +0200
Stefan Richter <stefanr@s5r6.in-berlin.de> wrote:

> Full quote for lkml:
> 
> bugzilla-daemon@bugzilla.kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=18252
> > 
> >            Summary: spinlock lockup in __make_request <- submit_bio <-
> >                     ondemand_readahead
> >            Product: IO/Storage
> >            Version: 2.5
> >     Kernel Version: 2.6.36-rc3
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Block Layer
> >         AssignedTo: axboe@kernel.dk
> >         ReportedBy: stefanr@s5r6.in-berlin.de
> >         Regression: No
> > 
> > 
> > Created an attachment (id=29562)
> >  --> (https://bugzilla.kernel.org/attachment.cgi?id=29562)
> > BUG screenshot
> > 
> > After a week uptime of 2.6.36-rc3 (I ran 2.6.35 before that),
> 
> Almost two weeks uptime actually.
> 
> > I was greeted by a black screen of death today in the morning:
> > 
> > (see screenshot in attachment; partial transcript:)
> > 
> > sending NMI to all CPUs:
> > BUG: spinlock lockup on CPU#0, ktorrent/4313, ffff8802...
> > PID: 4313, comm: ktorrent Tainted: G  M D W   2.6.36-rc3 #3
> > Call Trace:
> >  [...] do_raw_spin_lock+0x118/0x147
> >  [...] _raw_spin_lock_irq+0x44/0x49
> >  [...] ? __make_request+0x5c/0x400
> >  [...] __make_request+0x5c/0x400
> >  [...] generic_make_request+0x23a/0x2a9
> >  [...] submit_bio+0xad/0xb6
> >  [...] mpage_bio_submit...
> >  [...] do_mpage_readpage...
> >  [...] ? get_parent_ip...
> >  [...] ? sub_preempt_count...
> >  [...] ? __lru_cache_add...
> >  [...] mpage_readpages...
> >  [...] ? ext4_get_block...
> >  [...] ? __alloc_pages_nodemask...
> >  [...] ? ext4_get_block...
> >  [...] ext4_readpages...
> >  [...] __do_page_cache_readahead...
> >  [...] ? __do_page_cache_readahead...
> >  [...] ra_submit...
> >  [...] ondemand_readahead...
> > 
> > This is a system with Phenom II x4 and Radeon graphics.  Since kernel mode
> > setting is fairly new for radeon, it is possible that the lockup happened with
> > earlier kernels too but simply ended in a lockup without trace dump to the
> > screen.  IOW, it is not clear to me whether this is a regression or not.
> > 
> > The bug happened while kaffeine wrote an MPEG 2 TS to the same filesystem from
> > which ktorrent was reading.  Of course this kind of commonplace workload
> > happened without problem two or three times before during the week in which I
> > ran 2.6.36-rc3.
> > 
> 
> (The screenshot is a bit large, hence I reported in bugzilla instead of the list.)
> 

What you've quoted above appears to be just the aftermath. 
https://bugzilla.kernel.org/attachment.cgi?id=29562 indicates that the
kernel earlier crashed in scsi code, perhaps under
scsi_setup_fs_cmnd().

The question is: was that actually the first crash, or did an even
earlier one scroll off?


* Re: [Bug 18252] spinlock lockup in __make_request <- submit_bio <- ondemand_readahead
  2010-09-13 21:41   ` Andrew Morton
@ 2010-09-14  6:56     ` Stefan Richter
  2010-09-14  6:58       ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Richter @ 2010-09-14  6:56 UTC (permalink / raw)
  To: Andrew Morton, Florian Mickler
  Cc: linux-kernel, bugzilla-daemon, axboe, linux-scsi

Andrew Morton wrote:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=18252
...
> What you've quoted above appears to be just the aftermath. 
> https://bugzilla.kernel.org/attachment.cgi?id=29562 indicates that the
> kernel earlier crashed in scsi code, perhaps under
> scsi_setup_fs_cmnd().
> 
> The question is: was that actually the first crash, or did an even
> earlier one scroll off?

It happened overnight.  The screenshot
https://bugzilla.kernel.org/attachment.cgi?id=29562 shows that there was a lot
more logged before it.  When I saw it in the morning I assumed that the tail
was a repetition of the leading bug trace, but it seems I am mistaken.

Florian Mickler wrote:
> There was an scsi-related use-after-free OOPS fixed recently and pulled 3 days
> ago.  
> 
> On Sat, 11 Sep 2010 19:07:44 +0000
> James Bottomley <James.Bottomley@suse.de> wrote:
> 
>> This includes the oops from use after free, a set of qla2xxx fixes, some
>> misc warning cleanups from the recently introduced printk issue, an hpsa
>> lockup fix and a medium removal bug in sd introduced by the BKL
>> pushdown.
>> 
>> The patch is available here:
>> 
>> master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6.git
> 
> Maybe you are seeing that?
> 
> (reacting to the general-protection-fault preceded by scsi_init in the
> attachment jpg)

Now that you point it out --- perhaps.  Though I haven't looked into the
mechanics of the now fixed scsi_init_io use after free.

I am going to update to 2.6.36-rc4 today (I had reverted to 2.6.35 since the
report), and if the issue does not return after two weeks or so I will close
it as fixed, I suggest.
-- 
Stefan Richter
-=====-==-=- =--= -===-
http://arcgraph.de/sr/

* Re: [Bug 18252] spinlock lockup in __make_request <- submit_bio <- ondemand_readahead
  2010-09-14  6:56     ` Stefan Richter
@ 2010-09-14  6:58       ` Jens Axboe
  2010-09-14 11:18         ` Stefan Richter
  0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2010-09-14  6:58 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Andrew Morton, Florian Mickler, linux-kernel, bugzilla-daemon,
	linux-scsi

On 2010-09-14 08:56, Stefan Richter wrote:
> Andrew Morton wrote:
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=18252
> ...
>> What you've quoted above appears to be just the aftermath. 
>> https://bugzilla.kernel.org/attachment.cgi?id=29562 indicates that the
>> kernel earlier crashed in scsi code, perhaps under
>> scsi_setup_fs_cmnd().
>>
>> The question is: was that actually the first crash, or did an even
>> earlier one scroll off?
> 
> It happened overnight.  The screenshot
> https://bugzilla.kernel.org/attachment.cgi?id=29562 shows that there was a lot
> more logged before it.  When I saw it in the morning I assumed that the tail
> was a repetition of the leading bug trace, but it seems I am mistaken.
> 
> Florian Mickler wrote:
>> There was an scsi-related use-after-free OOPS fixed recently and pulled 3 days
>> ago.  
>>
>> On Sat, 11 Sep 2010 19:07:44 +0000
>> James Bottomley <James.Bottomley@suse.de> wrote:
>>
>>> This includes the oops from use after free, a set of qla2xxx fixes, some
>>> misc warning cleanups from the recently introduced printk issue, an hpsa
>>> lockup fix and a medium removal bug in sd introduced by the BKL
>>> pushdown.
>>>
>>> The patch is available here:
>>>
>>> master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6.git
>>
>> Maybe you are seeing that?
>>
>> (reacting to the general-protection-fault preceded by scsi_init in the
>> attachment jpg)
> 
> Now that you point it out --- perhaps.  Though I haven't looked into the
> mechanics of the now fixed scsi_init_io use after free.

It seems the very likely explanation, since I can't see any other way that
you would deadlock on the queue lock from that call trace if you haven't
had someone else crash with the lock held already.
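
The failure mode described above can be sketched in user space. This is only
an illustration under stated assumptions, not kernel code: a plain
threading.Lock stands in for the queue lock, a thread that exits without
releasing it stands in for the task that crashed earlier, and an acquire
timeout stands in for the kernel's spinlock lockup detection. The names
crashing_holder, try_submit and run_demo are hypothetical.

```python
import threading


def crashing_holder(lock):
    """Take the lock and then end without releasing it.

    Models a task that oopsed while holding the queue lock: the
    lock stays held forever because nobody is left to drop it.
    """
    lock.acquire()
    # The thread ends here; the lock is never released.


def try_submit(lock, timeout_s=0.2):
    """Model a later I/O submitter that needs the same lock.

    A real spinlock would spin indefinitely; the timeout here plays
    the role of the debug check that reports a suspected lockup.
    """
    if lock.acquire(timeout=timeout_s):
        lock.release()
        return "acquired"
    return "lockup suspected"


def run_demo():
    """Run the crashing holder first, then the innocent submitter."""
    lock = threading.Lock()
    holder = threading.Thread(target=crashing_holder, args=(lock,))
    holder.start()
    holder.join()  # the holder is gone, but the lock is still held
    return try_submit(lock)


if __name__ == "__main__":
    print(run_demo())
```

The second caller does nothing wrong by itself, which matches the point made
above: the trace of the task that reports the lockup is only the aftermath,
and the interesting trace is that of whoever died with the lock held.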

-- 
Jens Axboe


* Re: [Bug 18252] spinlock lockup in __make_request <- submit_bio <- ondemand_readahead
  2010-09-14  6:58       ` Jens Axboe
@ 2010-09-14 11:18         ` Stefan Richter
  0 siblings, 0 replies; 5+ messages in thread
From: Stefan Richter @ 2010-09-14 11:18 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Andrew Morton, Florian Mickler, linux-kernel, bugzilla-daemon,
	linux-scsi

Jens Axboe wrote:
> On 2010-09-14 08:56, Stefan Richter wrote:
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=18252
>> Florian Mickler wrote:
>>> There was an scsi-related use-after-free OOPS fixed recently and pulled 3 days
>>> ago.  
...
>>> Maybe you are seeing that?
>>>
>>> (reacting to the general-protection-fault preceded by scsi_init in the
>>> attachment jpg)
>> Now that you point it out --- perhaps.  Though I haven't looked into the
>> mechanics of the now fixed scsi_init_io use after free.
> 
> It seems the very likely explanation, since I can't see any other way that
> you would deadlock on the queue lock from that call trace if you haven't
> had someone else crash with the lock held already.

Good, I will close the bugzilla item right away on the reasonable assumption
that it is fixed by commit 3a5c19c23db65a554f2e4f5df5f307c668277056.
-- 
Stefan Richter
-=====-==-=- =--= -===-
http://arcgraph.de/sr/
