* RE: [SCSI BUG 2.6.15-rc3-mm1] scheduling while atomic on boot time
@ 2005-12-02 20:35 goggin, edward
2005-12-05 20:31 ` James Bottomley
0 siblings, 1 reply; 4+ messages in thread
From: goggin, edward @ 2005-12-02 20:35 UTC (permalink / raw)
To: 'Andrew Morton', Wu Fengguang; +Cc: linux-kernel, linux-scsi
I think this is caused by my patch to scsi_next_command()
(on or about 11/11) causing it to call put_device() and
invoke the kobject's release() function while in soft
interrupt. My patch should be removed ... although I
don't have an alternate solution in mind for the original
problem which was an "oops with USB Storage on 2.6.14".
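
A minimal userspace sketch of the failure mode described above, assuming a toy reference-counted object. None of the names here (toy_dev, toy_put, in_atomic_ctx) are kernel APIs; they only model the point that whichever caller drops the last reference runs release() synchronously in its own context, so a release path that may sleep (the trace below shows device_del() reaching __down() and schedule()) becomes a problem when that caller is a softirq.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for "running in softirq"; not a kernel API. */
static bool in_atomic_ctx;
/* Counts how often a sleeping release ran in atomic context. */
static int sleeps_while_atomic;

struct toy_dev {
	int refcount;
	bool released;
};

/* Models a release() that may sleep, e.g. by taking a semaphore. */
static void toy_release(struct toy_dev *d)
{
	if (in_atomic_ctx)
		sleeps_while_atomic++;	/* "scheduling while atomic" */
	d->released = true;
}

/* The final put runs release() synchronously in the caller's context. */
static void toy_put(struct toy_dev *d)
{
	if (--d->refcount == 0)
		toy_release(d);
}

/* Models a command completion in softirq dropping one reference.
 * Returns the number of atomic-context sleeps this call caused. */
static int toy_complete_in_softirq(struct toy_dev *d)
{
	sleeps_while_atomic = 0;
	in_atomic_ctx = true;
	toy_put(d);
	in_atomic_ctx = false;
	return sleeps_while_atomic;
}
```

In this model the warning appears only when the completion path happens to hold the last reference, which would be consistent with the problem showing up rarely.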
> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org
> [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Andrew Morton
> Sent: Friday, December 02, 2005 2:32 PM
> To: Wu Fengguang
> Cc: linux-kernel@vger.kernel.org; linux-scsi@vger.kernel.org
> Subject: Re: [SCSI BUG 2.6.15-rc3-mm1] scheduling while
> atomic on boot time
>
> Wu Fengguang <wfg@mail.ustc.edu.cn> wrote:
> >
> > My server occasionally crashes at boot time; this has been happening in many
> > recent kernel versions (at least from 2.6.14-rcx). It is rare enough that I set up
> > netconsole and rebooted numerous times, but still failed to catch it. Luckily
> > it happened again this time, and did not panic. Here are the logs.
> >
> > Thanks,
> > Wu
> >
> > Error messages:
> > [4294676.927000] scheduling while atomic: ksoftirqd/0/0x00000200/3
> > [4294676.927000] [dump_stack+21/32] dump_stack+0x15/0x20
> > [4294676.927000] [schedule+3563/3584] schedule+0xdeb/0xe00
> > [4294676.927000] [__down+138/272] __down+0x8a/0x110
> > [4294676.927000] [__sched_text_start+10/16] <6>scsi[0]: scanning scsi channel 1 [Phy 1] for non-raid devices
> > [4294676.927000] __down_failed+0xa/0x10
> > [4294676.927000] [.text.lock.main+43/71] .text.lock.main+0x2b/0x47
> > [4294676.928000] [device_del+62/112] device_del+0x3e/0x70
> > [4294676.928000] [scsi_target_reap+137/176] scsi_target_reap+0x89/0xb0
> > [4294676.928000] [scsi_device_dev_release+251/400] scsi_device_dev_release+0xfb/0x190
> > [4294676.928000] [device_release+23/80] device_release+0x17/0x50
> > [4294676.928000] [kobject_cleanup+116/128] kobject_cleanup+0x74/0x80
> > [4294676.928000] [kobject_release+11/16] kobject_release+0xb/0x10
> > [4294676.929000] [kref_put+52/160] kref_put+0x34/0xa0
> > [4294676.929000] [kobject_put+20/32] kobject_put+0x14/0x20
> > [4294676.929000] [put_device+17/32] put_device+0x11/0x20
> > [4294676.929000] [scsi_next_command+48/64] scsi_next_command+0x30/0x40
> > [4294676.929000] [scsi_end_request+165/192] scsi_end_request+0xa5/0xc0
> > [4294676.929000] [scsi_io_completion+540/1152] scsi_io_completion+0x21c/0x480
> > [4294676.929000] [scsi_generic_done+43/64] scsi_generic_done+0x2b/0x40
> > [4294676.930000] [scsi_finish_command+146/240] scsi_finish_command+0x92/0xf0
> > [4294676.930000] [scsi_softirq+215/320] scsi_softirq+0xd7/0x140
> > [4294676.930000] [__do_softirq+216/240] __do_softirq+0xd8/0xf0
> > [4294676.930000] [do_softirq+74/96] do_softirq+0x4a/0x60
> > [4294676.930000] =======================
>
> Which device driver are you using?
>
> This is just a warning - it won't necessarily cause a crash and in this
> case it didn't appear to do so.
>
> I seem to recall diagnosing this exact locking problem a month or so ago,
> and cc'ing linux-scsi on that analysis.
* RE: [SCSI BUG 2.6.15-rc3-mm1] scheduling while atomic on boot time
2005-12-02 20:35 [SCSI BUG 2.6.15-rc3-mm1] scheduling while atomic on boot time goggin, edward
@ 2005-12-05 20:31 ` James Bottomley
0 siblings, 0 replies; 4+ messages in thread
From: James Bottomley @ 2005-12-05 20:31 UTC (permalink / raw)
To: goggin, edward
Cc: 'Andrew Morton', Wu Fengguang, linux-kernel, linux-scsi
On Fri, 2005-12-02 at 15:35 -0500, goggin, edward wrote:
> I think this is caused by my patch to scsi_next_command()
> (on or about 11/11) causing it to call put_device() and
> invoke the kobject's release() function while in soft
> interrupt. My patch should be removed ... although I
> don't have an alternate solution in mind for the original
> problem which was an "oops with USB Storage on 2.6.14".
Yes and no.
Reverting your patch won't fix the problem because scsi_put_command()
will then relinquish the last reference to the device and trigger the
same warning. Additionally, blk_run_queue now stands a good chance of
running on a freed queue which could trigger a panic.
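
The ordering hazard in the paragraph above can be sketched in userspace as follows. This is an illustrative model only: sim_dev, sim_queue and both next_command variants are hypothetical names, and the real scsi_next_command() is not reproduced here. The sketch assumes the command holds a reference on the device and that the queue is torn down when the device's last reference goes away.

```c
#include <assert.h>
#include <stdbool.h>

struct sim_queue {
	bool freed;
	int runs;		/* runs on a live queue */
	int runs_after_free;	/* would be use-after-free in real code */
};

struct sim_dev {
	int refcount;
	struct sim_queue *q;
};

static void sim_get(struct sim_dev *d)
{
	d->refcount++;
}

static void sim_put(struct sim_dev *d)
{
	if (--d->refcount == 0)
		d->q->freed = true;	/* stands in for freeing the queue */
}

static void sim_run_queue(struct sim_queue *q)
{
	if (q->freed)
		q->runs_after_free++;
	else
		q->runs++;
}

/* No extra reference: dropping the command's reference may free the
 * queue before it is run. */
static void next_command_unpinned(struct sim_dev *d)
{
	struct sim_queue *q = d->q;

	sim_put(d);		/* the command's reference, possibly the last */
	sim_run_queue(q);
}

/* Extra reference held across the queue run: the queue stays valid,
 * but the final put (and so the release) now happens in this function. */
static void next_command_pinned(struct sim_dev *d)
{
	struct sim_queue *q = d->q;

	sim_get(d);
	sim_put(d);		/* the command's reference */
	sim_run_queue(q);
	sim_put(d);		/* last put: release runs in this context */
}
```

Either way some caller in the completion path ends up doing the last put; pinning only changes which one, and keeps the queue alive while it is run.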
The problem seems to be that device_del() apparently requires user
context. If that's true, this will bite us not only here, but all over
the place ... in fact the fix might have to be to do the target reap
through a workqueue.
Regardless, your patch isn't the culprit here, it's just the thing which
is doing the last put.
James
* RE: [SCSI BUG 2.6.15-rc3-mm1] scheduling while atomic on boot time
@ 2005-12-07 3:13 goggin, edward
0 siblings, 0 replies; 4+ messages in thread
From: goggin, edward @ 2005-12-07 3:13 UTC (permalink / raw)
To: James Bottomley, goggin, edward
Cc: 'Andrew Morton', Wu Fengguang, linux-kernel, linux-scsi
> -----Original Message-----
> From: James Bottomley [mailto:James.Bottomley@SteelEye.com]
> Sent: Monday, December 05, 2005 3:32 PM
> To: goggin, edward
> Cc: 'Andrew Morton'; Wu Fengguang;
> linux-kernel@vger.kernel.org; linux-scsi@vger.kernel.org
> Subject: RE: [SCSI BUG 2.6.15-rc3-mm1] scheduling while
> atomic on boot time
>
> On Fri, 2005-12-02 at 15:35 -0500, goggin, edward wrote:
> > I think this is caused by my patch to scsi_next_command()
> > (on or about 11/11) causing it to call put_device() and
> > invoke the kobject's release() function while in soft
> > interrupt. My patch should be removed ... although I
> > don't have an alternate solution in mind for the original
> > problem which was an "oops with USB Storage on 2.6.14".
>
> Yes and no.
>
> Reverting your patch won't fix the problem because scsi_put_command()
> will then relinquish the last reference to the device and trigger the
> same warning. Additionally, blk_run_queue now stands a good chance of
> running on a freed queue which could trigger a panic.
>
> The problem seems to be that device_del() is apparently requiring user
> context, if that's true, this will bite us not only here, but all over
> the place ... in fact the fix might have to be to do the target reap
> through a workqueue.
How about extending kobject_cleanup() to queue to a workqueue if
requested to do so? Doing so would provide an easy mechanism for
other cases to utilize and they could all share the same workqueue.
SCSI would certainly request this feature for calls to
scsi_device_dev_release() for its device kobjects.
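
A rough userspace sketch of this idea, under stated assumptions: the defer_release flag, demo_kobj, and the shared deferred list are all hypothetical and do not correspond to existing kobject fields or kernel functions. The final put either releases immediately or, if the object opted in, parks the object on one shared list that a worker in process context drains later.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_kobj {
	int refcount;
	bool defer_release;		/* hypothetical opt-in flag */
	bool released;
	bool released_in_atomic;	/* records the context of release */
	struct demo_kobj *next;		/* link on the shared deferred list */
};

/* Hypothetical stand-in for "running in softirq". */
static bool demo_in_atomic;
/* One deferred list shared by every object that opts in. */
static struct demo_kobj *demo_deferred;

static void demo_release(struct demo_kobj *k)
{
	k->released = true;
	k->released_in_atomic = demo_in_atomic;
}

/* Cleanup step: release now, or queue for the worker if requested. */
static void demo_cleanup(struct demo_kobj *k)
{
	if (k->defer_release) {
		k->next = demo_deferred;
		demo_deferred = k;
	} else {
		demo_release(k);
	}
}

static void demo_put(struct demo_kobj *k)
{
	if (--k->refcount == 0)
		demo_cleanup(k);
}

/* Models the shared worker running in process context.
 * Returns how many objects it released. */
static int demo_flush_deferred(void)
{
	int n = 0;

	demo_in_atomic = false;
	while (demo_deferred) {
		struct demo_kobj *k = demo_deferred;

		demo_deferred = k->next;
		demo_release(k);
		n++;
	}
	return n;
}
```

Objects that do not set the flag keep today's synchronous behaviour, so only callers that can drop their last reference from atomic context would pay for the deferral.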
>
> Regardless, your patch isn't the culprit here, it's just the thing which
> is doing the last put.
>
> James
* RE: [SCSI BUG 2.6.15-rc3-mm1] scheduling while atomic on boot time
@ 2005-12-07 3:27 goggin, edward
0 siblings, 0 replies; 4+ messages in thread
From: goggin, edward @ 2005-12-07 3:27 UTC (permalink / raw)
To: 'James Bottomley', goggin, edward
Cc: 'Andrew Morton', Wu Fengguang, linux-kernel, linux-scsi
> -----Original Message-----
> From: James Bottomley [mailto:James.Bottomley@SteelEye.com]
> Sent: Monday, December 05, 2005 3:32 PM
> To: goggin, edward
> Cc: 'Andrew Morton'; Wu Fengguang;
> linux-kernel@vger.kernel.org; linux-scsi@vger.kernel.org
> Subject: RE: [SCSI BUG 2.6.15-rc3-mm1] scheduling while
> atomic on boot time
>
> On Fri, 2005-12-02 at 15:35 -0500, goggin, edward wrote:
> > I think this is caused by my patch to scsi_next_command()
> > (on or about 11/11) causing it to call put_device() and
> > invoke the kobject's release() function while in soft
> > interrupt. My patch should be removed ... although I
> > don't have an alternate solution in mind for the original
> > problem which was an "oops with USB Storage on 2.6.14".
>
> Yes and no.
>
> Reverting your patch won't fix the problem because scsi_put_command()
> will then relinquish the last reference to the device and trigger the
> same warning. Additionally, blk_run_queue now stands a good chance of
> running on a freed queue which could trigger a panic.
>
> The problem seems to be that device_del() is apparently requiring user
> context, if that's true, this will bite us not only here, but all over
> the place
For example, as a result of the call to put_device() at the bottom of
scsi_request_fn() when it is called indirectly via scsi_next_command()'s
call to scsi_run_queue().
> ... in fact the fix might have to be to do the target reap
> through a workqueue.
>
> Regardless, your patch isn't the culprit here, it's just the thing which
> is doing the last put.
>
> James