* Question about scsi_device_online() usage in mptscsih
@ 2005-04-02 5:35 Tejun Heo
2005-04-14 18:54 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2005-04-02 5:35 UTC (permalink / raw)
To: Eric.Moore, James Bottomley; +Cc: linux-scsi
Hello, Eric.
Hello, James.
I've been working on new SCSI state model and was checking on
scsi_device_online() users. As the state model is going to change, I
need to audit device state usages in lldd's and I'm having difficult
time understanding why scsi_device_online() is used in mptscsih.
In mptscsih.c, mptscsih_flush_running_cmds() uses
scsi_device_online() to determine whether or not dma-unmap data area
of active commands. This was added in the changeset 1.1371.776.1 by
Eric Moore with the comment "MPT Fusion add back FC909 support". Can
you please explain me why and how scsi_device_online() condition is
used here?
Thanks a lot. :-)
--
tejun
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Question about scsi_device_online() usage in mptscsih
2005-04-02 5:35 Tejun Heo
@ 2005-04-14 18:54 ` Christoph Hellwig
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2005-04-14 18:54 UTC (permalink / raw)
To: Tejun Heo; +Cc: Eric.Moore, James Bottomley, linux-scsi
On Sat, Apr 02, 2005 at 02:35:57PM +0900, Tejun Heo wrote:
> Hello, Eric.
> Hello, James.
>
> I've been working on new SCSI state model and was checking on
> scsi_device_online() users. As the state model is going to change, I
> need to audit device state usages in lldd's and I'm having difficult
> time understanding why scsi_device_online() is used in mptscsih.
>
> In mptscsih.c, mptscsih_flush_running_cmds() uses
> scsi_device_online() to determine whether or not dma-unmap data area
> of active commands. This was added in the changeset 1.1371.776.1 by
> Eric Moore with the comment "MPT Fusion add back FC909 support". Can
> you please explain me why and how scsi_device_online() condition is
> used here?
I brought that issue up a while ago, but we didn't really get anywhere,
see the "fix dma mapping leak in fusion" thread on linux-scsi.
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: Question about scsi_device_online() usage in mptscsih
@ 2005-04-14 20:28 Moore, Eric Dean
2005-04-15 10:13 ` Masao Fukuchi
0 siblings, 1 reply; 7+ messages in thread
From: Moore, Eric Dean @ 2005-04-14 20:28 UTC (permalink / raw)
To: fukuchi.masao; +Cc: James Bottomley, linux-scsi, Tejun Heo, Christoph Hellwig
Masao Fukuchi reported an issue when the
outstanding mf queue was being flushed in
contents of slave_destroy, the scsi dev was
offlined already by the mid-layer.
They may not be an issue anymore since I removed
the error handling timers from mptscsih.
Masao can you validate this pls?
Christoph Hellwig, wrote:
>
> On Sat, Apr 02, 2005 at 02:35:57PM +0900, Tejun Heo wrote:
> > Hello, Eric.
> > Hello, James.
> >
> > I've been working on new SCSI state model and was checking on
> > scsi_device_online() users. As the state model is going to
> change, I
> > need to audit device state usages in lldd's and I'm having difficult
> > time understanding why scsi_device_online() is used in mptscsih.
> >
> > In mptscsih.c, mptscsih_flush_running_cmds() uses
> > scsi_device_online() to determine whether or not dma-unmap data area
> > of active commands. This was added in the changeset 1.1371.776.1 by
> > Eric Moore with the comment "MPT Fusion add back FC909
> support". Can
> > you please explain me why and how scsi_device_online() condition is
> > used here?
>
> I brought that issue up a while ago, but we didn't really get
> anywhere,
> see the "fix dma mapping leak in fusion" thread on linux-scsi.
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Question about scsi_device_online() usage in mptscsih
2005-04-14 20:28 Moore, Eric Dean
@ 2005-04-15 10:13 ` Masao Fukuchi
2005-04-15 14:30 ` James Bottomley
0 siblings, 1 reply; 7+ messages in thread
From: Masao Fukuchi @ 2005-04-15 10:13 UTC (permalink / raw)
To: Moore, Eric Dean
Cc: James Bottomley, linux-scsi, Tejun Heo, Christoph Hellwig
Hi Eric.
The problem was caused by calling pci_unmap_sg()
after device offline.
The sequence is:
1.Host issues SCSI command to fusion MPT driver.
2.Mid layer detects command timeout and then performs
error recovery sequence.
But the sequence fails and it causes device offline.
(fusion MPT driver still hold the command)
4.Fusion MPT driver activates host reset and calls
mptscsih_flush_running_cmds().
5.In mptscsih_flush_running_cmds(), pci_unmap_sg() is called
for all SCSI cmds.
But the buffer had already freed and this causes oops.
I don't know your removal of error handling timer resolves
this issue.
Masao Fukuchi
Moore, Eric Dean wrote:
>Masao Fukuchi reported an issue when the
>outstanding mf queue was being flushed in
>contents of slave_destroy, the scsi dev was
>offlined already by the mid-layer.
>
>They may not be an issue anymore since I removed
>the error handling timers from mptscsih.
>
>Masao can you validate this pls?
>
>Christoph Hellwig, wrote:
>>
>> On Sat, Apr 02, 2005 at 02:35:57PM +0900, Tejun Heo wrote:
>> > Hello, Eric.
>> > Hello, James.
>> >
>> > I've been working on new SCSI state model and was checking on
>> > scsi_device_online() users. As the state model is going to
>> change, I
>> > need to audit device state usages in lldd's and I'm having difficult
>> > time understanding why scsi_device_online() is used in mptscsih.
>> >
>> > In mptscsih.c, mptscsih_flush_running_cmds() uses
>> > scsi_device_online() to determine whether or not dma-unmap data area
>> > of active commands. This was added in the changeset 1.1371.776.1 by
>> > Eric Moore with the comment "MPT Fusion add back FC909
>> support". Can
>> > you please explain me why and how scsi_device_online() condition is
>> > used here?
>>
>> I brought that issue up a while ago, but we didn't really get
>> anywhere,
>> see the "fix dma mapping leak in fusion" thread on linux-scsi.
>>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Question about scsi_device_online() usage in mptscsih
2005-04-15 10:13 ` Masao Fukuchi
@ 2005-04-15 14:30 ` James Bottomley
0 siblings, 0 replies; 7+ messages in thread
From: James Bottomley @ 2005-04-15 14:30 UTC (permalink / raw)
To: Masao Fukuchi
Cc: Moore, Eric Dean, SCSI Mailing List, Tejun Heo, Christoph Hellwig
On Fri, 2005-04-15 at 19:13 +0900, Masao Fukuchi wrote:
> The sequence is:
> 1.Host issues SCSI command to fusion MPT driver.
> 2.Mid layer detects command timeout and then performs
> error recovery sequence.
> But the sequence fails and it causes device offline.
> (fusion MPT driver still hold the command)
> 4.Fusion MPT driver activates host reset and calls
> mptscsih_flush_running_cmds().
Well, this is the error: The Fusion host reset should be activated from
the eh_host_reset_handler.
> 5.In mptscsih_flush_running_cmds(), pci_unmap_sg() is called
> for all SCSI cmds.
> But the buffer had already freed and this causes oops.
If you do the above, the command will still be valid. If you wait until
after the device is offlined, of course the mid-layer throws away the
commands, since it's up to the driver to clean them up during the reset
sequences.
James
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: Question about scsi_device_online() usage in mptscsih
@ 2005-04-15 16:11 Moore, Eric Dean
2005-04-15 16:41 ` James Bottomley
0 siblings, 1 reply; 7+ messages in thread
From: Moore, Eric Dean @ 2005-04-15 16:11 UTC (permalink / raw)
To: James Bottomley, Masao Fukuchi
Cc: SCSI Mailing List, Tejun Heo, Christoph Hellwig
On Friday, April 15, 2005 8:30 AM, James Bottomley wrote:
>
> On Fri, 2005-04-15 at 19:13 +0900, Masao Fukuchi wrote:
> > The sequence is:
> > 1.Host issues SCSI command to fusion MPT driver.
> > 2.Mid layer detects command timeout and then performs
> > error recovery sequence.
> > But the sequence fails and it causes device offline.
> > (fusion MPT driver still hold the command)
> > 4.Fusion MPT driver activates host reset and calls
> > mptscsih_flush_running_cmds().
>
> Well, this is the error: The Fusion host reset should be
> activated from
> the eh_host_reset_handler.
>
Since I removed the eh timers, this is not going to happen.
The driver is always waiting for task management request
to complete before returning control back to the mid-layer.
Before when the driver used timers, the eh thread returned
immediately back to the mid-layer, and the actual
task management request to the firmware was delayed and
completed later. If the task management request didn't
complete, the timer would expire, and host reset would be called,
and it was possible to opps as Fukuchi has indicated.
Thus I think its safe to say, we can remove the
scsi_device_online from mptscsih_flush_running_cmds.
Right??
Eric Moore
LSI Logic
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: Question about scsi_device_online() usage in mptscsih
2005-04-15 16:11 Question about scsi_device_online() usage in mptscsih Moore, Eric Dean
@ 2005-04-15 16:41 ` James Bottomley
0 siblings, 0 replies; 7+ messages in thread
From: James Bottomley @ 2005-04-15 16:41 UTC (permalink / raw)
To: Moore, Eric Dean
Cc: Masao Fukuchi, SCSI Mailing List, Tejun Heo, Christoph Hellwig
On Fri, 2005-04-15 at 10:11 -0600, Moore, Eric Dean wrote:
> Thus I think its safe to say, we can remove the
> scsi_device_online from mptscsih_flush_running_cmds.
> Right??
Yes, as long as you now tear down all the outstanding commands while the
error handler is executing, you should be safe from this condition.
James
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2005-04-15 16:41 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-04-15 16:11 Question about scsi_device_online() usage in mptscsih Moore, Eric Dean
2005-04-15 16:41 ` James Bottomley
-- strict thread matches above, loose matches on Subject: below --
2005-04-14 20:28 Moore, Eric Dean
2005-04-15 10:13 ` Masao Fukuchi
2005-04-15 14:30 ` James Bottomley
2005-04-02 5:35 Tejun Heo
2005-04-14 18:54 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox