* [Bug 18252] spinlock lockup in __make_request <- submit_bio <- ondemand_readahead
  [not found] <bug-18252-4803@https.bugzilla.kernel.org/>
From: Stefan Richter @ 2010-09-11 9:50 UTC
To: linux-kernel; +Cc: bugzilla-daemon, axboe

Full quote for lkml:

bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=18252
>
> Summary: spinlock lockup in __make_request <- submit_bio <-
>          ondemand_readahead
> Product: IO/Storage
> Version: 2.5
> Kernel Version: 2.6.36-rc3
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Block Layer
> AssignedTo: axboe@kernel.dk
> ReportedBy: stefanr@s5r6.in-berlin.de
> Regression: No
>
>
> Created an attachment (id=29562)
> --> (https://bugzilla.kernel.org/attachment.cgi?id=29562)
> BUG screenshot
>
> After a week uptime of 2.6.36-rc3 (I ran 2.6.35 before that),

Almost two weeks uptime actually.

> I was greeted by a black screen of death today in the morning:
>
> (see screenshot in attachment; partial transcript:)
>
> sending NMI to all CPUs:
> BUG: spinlock lockup on CPU#0, ktorrent/4313, ffff8802...
> PID: 4313, comm: ktorrent Tainted: G M D W 2.6.36-rc3 #3
> Call Trace:
> [...] do_raw_spin_lock+0x118/0x147
> [...] _raw_spin_lock_irq+0x44/0x49
> [...] ? __make_request+0x5c/0x400
> [...] __make_request+0x5c/0x400
> [...] generic_make_request+0x23a/0x2a9
> [...] submit_bio+0xad/0xb6
> [...] mpage_bio_submit...
> [...] do_mpage_readpage...
> [...] ? get_parent_ip...
> [...] ? sub_preempt_count...
> [...] ? __lru_cache_add...
> [...] mpage_readpages...
> [...] ? ext4_get_block...
> [...] ? __alloc_pages_nodemask...
> [...] ? ext4_get_block...
> [...] ext4_readpages...
> [...] __do_page_cache_readahead...
> [...] ? __do_page_cache_readahead...
> [...] ra_submit...
> [...] ondemand_readahead...
>
> This is a system with Phenom II x4 and Radeon graphics. Since kernel mode
> setting is fairly new for radeon, it is possible that the lockup happened with
> earlier kernels too but simply ended in a lockup without trace dump to the
> screen. IOW, it is not clear to me whether this is a regression or not.
>
> The bug happened while kaffeine wrote an MPEG 2 TS to the same filesystem from
> which ktorrent was reading. Of course this kind of commonplace workload
> happened without problem two or three times before during the week in which I
> ran 2.6.36-rc3.
> (The screenshot is a bit large, hence I reported in bugzilla instead of the list.)

The kernel taint was due to a prior, apparently unrelated lockdep report, bug
17752 "2.6.36-rc3: inconsistent lock state (iprune_sem,
shrink_icache_memory)". And there were three machine check events ten days ago
due to corrected ECC memory errors.
--
Stefan Richter
-=====-==-=- =--= -=-==
http://arcgraph.de/sr/

* Re: [Bug 18252] spinlock lockup in __make_request <- submit_bio <- ondemand_readahead
From: Andrew Morton @ 2010-09-13 21:41 UTC
To: Stefan Richter; +Cc: linux-kernel, bugzilla-daemon, axboe, linux-scsi

On Sat, 11 Sep 2010 11:50:41 +0200
Stefan Richter <stefanr@s5r6.in-berlin.de> wrote:

> Full quote for lkml:
>
> bugzilla-daemon@bugzilla.kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=18252
> >
> > Summary: spinlock lockup in __make_request <- submit_bio <-
> >          ondemand_readahead
> > Product: IO/Storage
> > Version: 2.5
> > Kernel Version: 2.6.36-rc3
> > Platform: All
> > OS/Version: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: normal
> > Priority: P1
> > Component: Block Layer
> > AssignedTo: axboe@kernel.dk
> > ReportedBy: stefanr@s5r6.in-berlin.de
> > Regression: No
> >
> >
> > Created an attachment (id=29562)
> > --> (https://bugzilla.kernel.org/attachment.cgi?id=29562)
> > BUG screenshot
> >
> > After a week uptime of 2.6.36-rc3 (I ran 2.6.35 before that),
>
> Almost two weeks uptime actually.
>
> > I was greeted by a black screen of death today in the morning:
> >
> > (see screenshot in attachment; partial transcript:)
> >
> > sending NMI to all CPUs:
> > BUG: spinlock lockup on CPU#0, ktorrent/4313, ffff8802...
> > PID: 4313, comm: ktorrent Tainted: G M D W 2.6.36-rc3 #3
> > Call Trace:
> > [...] do_raw_spin_lock+0x118/0x147
> > [...] _raw_spin_lock_irq+0x44/0x49
> > [...] ? __make_request+0x5c/0x400
> > [...] __make_request+0x5c/0x400
> > [...] generic_make_request+0x23a/0x2a9
> > [...] submit_bio+0xad/0xb6
> > [...] mpage_bio_submit...
> > [...] do_mpage_readpage...
> > [...] ? get_parent_ip...
> > [...] ? sub_preempt_count...
> > [...] ? __lru_cache_add...
> > [...] mpage_readpages...
> > [...] ? ext4_get_block...
> > [...] ? __alloc_pages_nodemask...
> > [...] ? ext4_get_block...
> > [...] ext4_readpages...
> > [...] __do_page_cache_readahead...
> > [...] ? __do_page_cache_readahead...
> > [...] ra_submit...
> > [...] ondemand_readahead...
> >
> > This is a system with Phenom II x4 and Radeon graphics. Since kernel mode
> > setting is fairly new for radeon, it is possible that the lockup happened with
> > earlier kernels too but simply ended in a lockup without trace dump to the
> > screen. IOW, it is not clear to me whether this is a regression or not.
> >
> > The bug happened while kaffeine wrote an MPEG 2 TS to the same filesystem from
> > which ktorrent was reading. Of course this kind of commonplace workload
> > happened without problem two or three times before during the week in which I
> > ran 2.6.36-rc3.
> >
> > (The screenshot is a bit large, hence I reported in bugzilla instead of the list.)
>

What you've quoted above appears to be just the aftermath.
https://bugzilla.kernel.org/attachment.cgi?id=29562 indicates that the
kernel earlier crashed in scsi code, perhaps under
scsi_setup_fs_cmnd().

The question is: was that actually the first crash, or did an even
earlier one scroll off?

* Re: [Bug 18252] spinlock lockup in __make_request <- submit_bio <- ondemand_readahead
From: Stefan Richter @ 2010-09-14 6:56 UTC
To: Andrew Morton, Florian Mickler
Cc: linux-kernel, bugzilla-daemon, axboe, linux-scsi

Andrew Morton wrote:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=18252
...
> What you've quoted above appears to be just the aftermath.
> https://bugzilla.kernel.org/attachment.cgi?id=29562 indicates that the
> kernel earlier crashed in scsi code, perhaps under
> scsi_setup_fs_cmnd().
>
> The question is: was that actually the first crash, or did an even
> earlier one scroll off?

It happened overnight. The screenshot
https://bugzilla.kernel.org/attachment.cgi?id=29562 shows that there was a lot
more logged before it. When I saw it in the morning I assumed that the tail
was a repetition of the leading bug trace, but it seems I am mistaken.

Florian Mickler wrote:
> There was an scsi-related use-after-free OOPS fixed recently and pulled 3 days
> ago.
>
> On Sat, 11 Sep 2010 19:07:44 +0000
> James Bottomley <James.Bottomley@suse.de> wrote:
>
>> This includes the oops from use after free, a set of qla2xxx fixes, some
>> misc warning cleanups from the recently introduced printk issue, an hpsa
>> lockup fix and a medium removal bug in sd introduced by the BKL
>> pushdown.
>>
>> The patch is available here:
>>
>> master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6.git
>
> Maybe you are seeing that?
>
> (reacting to the general-protection-fault preceded by scsi_init in the
> attachment jpg)

Now that you point it out --- perhaps. Though I haven't looked into the
mechanics of the now fixed scsi_init_io use after free.

I am going to update to 2.6.36-rc4 today (I had reverted to 2.6.35 since the
report). If the issue does not return after two weeks or so, I suggest that I
close it as fixed.
--
Stefan Richter
-=====-==-=- =--= -===-
http://arcgraph.de/sr/

* Re: [Bug 18252] spinlock lockup in __make_request <- submit_bio <- ondemand_readahead
From: Jens Axboe @ 2010-09-14 6:58 UTC
To: Stefan Richter
Cc: Andrew Morton, Florian Mickler, linux-kernel, bugzilla-daemon, linux-scsi

On 2010-09-14 08:56, Stefan Richter wrote:
> Andrew Morton wrote:
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=18252
> ...
>> What you've quoted above appears to be just the aftermath.
>> https://bugzilla.kernel.org/attachment.cgi?id=29562 indicates that the
>> kernel earlier crashed in scsi code, perhaps under
>> scsi_setup_fs_cmnd().
>>
>> The question is: was that actually the first crash, or did an even
>> earlier one scroll off?
>
> It happened overnight. The screenshot
> https://bugzilla.kernel.org/attachment.cgi?id=29562 shows that there was a lot
> more logged before it. When I saw it in the morning I assumed that the tail
> was a repetition of the leading bug trace, but it seems I am mistaken.
>
> Florian Mickler wrote:
>> There was an scsi-related use-after-free OOPS fixed recently and pulled 3 days
>> ago.
>>
>> On Sat, 11 Sep 2010 19:07:44 +0000
>> James Bottomley <James.Bottomley@suse.de> wrote:
>>
>>> This includes the oops from use after free, a set of qla2xxx fixes, some
>>> misc warning cleanups from the recently introduced printk issue, an hpsa
>>> lockup fix and a medium removal bug in sd introduced by the BKL
>>> pushdown.
>>>
>>> The patch is available here:
>>>
>>> master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6.git
>>
>> Maybe you are seeing that?
>>
>> (reacting to the general-protection-fault preceded by scsi_init in the
>> attachment jpg)
>
> Now that you point it out --- perhaps. Though I haven't looked into the
> mechanics of the now fixed scsi_init_io use after free.

It seems the very likely explanation, since I can't see any other way that
you would deadlock on the queue lock from that call trace if you haven't
had someone else crash with the lock held already.

--
Jens Axboe

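A minimal userspace sketch of the failure mode Jens describes above, not kernel code: if a task dies while holding a lock, every later caller that needs the same lock waits forever, and a watchdog can only report the symptom. All names here are hypothetical illustration: `queue_lock` stands in for the block queue lock, `crashing_holder` for the task that oopsed earlier, `submit_request` for a later caller on the `__make_request` path, and the timeout loosely plays the role of the kernel's spinlock lockup detection.

```python
import threading


def crashing_holder(queue_lock):
    """Model a task that oopses while holding the lock: it never unlocks."""
    queue_lock.acquire()
    # The task dies here. No release() follows, so the lock stays held.


def submit_request(queue_lock, timeout_s=0.2):
    """Model a later caller that needs the same lock to queue I/O.

    Instead of waiting forever, give up after timeout_s seconds and report
    a suspected lockup.
    """
    if not queue_lock.acquire(timeout=timeout_s):
        return "lockup suspected: lock holder never released queue_lock"
    try:
        return "request queued"
    finally:
        queue_lock.release()


def demo():
    """Return the outcome of one request before and one after the crash."""
    queue_lock = threading.Lock()  # hypothetical stand-in for the queue lock
    before = submit_request(queue_lock)

    holder = threading.Thread(target=crashing_holder, args=(queue_lock,))
    holder.start()
    holder.join()  # the holder is gone, but the lock is still taken

    after = submit_request(queue_lock)
    return before, after


if __name__ == "__main__":
    for outcome in demo():
        print(outcome)
```

In this sketch the second report is only the aftermath. The root cause is whatever killed the first holder, which is why the earlier crash visible in the screenshot matters more than the lockup trace itself.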
* Re: [Bug 18252] spinlock lockup in __make_request <- submit_bio <- ondemand_readahead
From: Stefan Richter @ 2010-09-14 11:18 UTC
To: Jens Axboe
Cc: Andrew Morton, Florian Mickler, linux-kernel, bugzilla-daemon, linux-scsi

Jens Axboe wrote:
> On 2010-09-14 08:56, Stefan Richter wrote:
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=18252
>> Florian Mickler wrote:
>>> There was an scsi-related use-after-free OOPS fixed recently and pulled 3 days
>>> ago.
...
>>> Maybe you are seeing that?
>>>
>>> (reacting to the general-protection-fault preceded by scsi_init in the
>>> attachment jpg)
>> Now that you point it out --- perhaps. Though I haven't looked into the
>> mechanics of the now fixed scsi_init_io use after free.
>
> It seems the very likely explanation, since I can't see any other way that
> you would deadlock on the queue lock from that call trace if you haven't
> had someone else crash with the lock held already.

Good, I will close the bugzilla item right away with the reasonable assumption
that it is fixed by commit 3a5c19c23db65a554f2e4f5df5f307c668277056.
--
Stefan Richter
-=====-==-=- =--= -===-
http://arcgraph.de/sr/