public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* Question about scsi_device_online() usage in mptscsih
@ 2005-04-02  5:35 Tejun Heo
  2005-04-14 18:54 ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2005-04-02  5:35 UTC (permalink / raw)
  To: Eric.Moore, James Bottomley; +Cc: linux-scsi

 Hello, Eric.
 Hello, James.

 I've been working on new SCSI state model and was checking on
scsi_device_online() users.  As the state model is going to change, I
need to audit device state usages in lldd's and I'm having difficult
time understanding why scsi_device_online() is used in mptscsih.

 In mptscsih.c, mptscsih_flush_running_cmds() uses
scsi_device_online() to determine whether or not dma-unmap data area
of active commands.  This was added in the changeset 1.1371.776.1 by
Eric Moore with the comment "MPT Fusion add back FC909 support".  Can
you please explain me why and how scsi_device_online() condition is
used here?

 Thanks a lot. :-)

-- 
tejun


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Question about scsi_device_online() usage in mptscsih
  2005-04-02  5:35 Tejun Heo
@ 2005-04-14 18:54 ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2005-04-14 18:54 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric.Moore, James Bottomley, linux-scsi

On Sat, Apr 02, 2005 at 02:35:57PM +0900, Tejun Heo wrote:
>  Hello, Eric.
>  Hello, James.
> 
>  I've been working on new SCSI state model and was checking on
> scsi_device_online() users.  As the state model is going to change, I
> need to audit device state usages in lldd's and I'm having difficult
> time understanding why scsi_device_online() is used in mptscsih.
> 
>  In mptscsih.c, mptscsih_flush_running_cmds() uses
> scsi_device_online() to determine whether or not dma-unmap data area
> of active commands.  This was added in the changeset 1.1371.776.1 by
> Eric Moore with the comment "MPT Fusion add back FC909 support".  Can
> you please explain me why and how scsi_device_online() condition is
> used here?

I brought that issue up a while ago, but we didn't really get anywhere,
see the "fix dma mapping leak in fusion" thread on linux-scsi.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: Question about scsi_device_online() usage in mptscsih
@ 2005-04-14 20:28 Moore, Eric Dean
  2005-04-15 10:13 ` Masao Fukuchi
  0 siblings, 1 reply; 7+ messages in thread
From: Moore, Eric Dean @ 2005-04-14 20:28 UTC (permalink / raw)
  To: fukuchi.masao; +Cc: James Bottomley, linux-scsi, Tejun Heo, Christoph Hellwig

Masao Fukuchi reported an issue when the
outstanding mf queue was being flushed in
contents of slave_destroy, the scsi dev was 
offlined already by the mid-layer.

They may not be an issue anymore since I removed
the error handling timers from mptscsih.

Masao can you validate this pls?

Christoph Hellwig, wrote:
> 
> On Sat, Apr 02, 2005 at 02:35:57PM +0900, Tejun Heo wrote:
> >  Hello, Eric.
> >  Hello, James.
> > 
> >  I've been working on new SCSI state model and was checking on
> > scsi_device_online() users.  As the state model is going to 
> change, I
> > need to audit device state usages in lldd's and I'm having difficult
> > time understanding why scsi_device_online() is used in mptscsih.
> > 
> >  In mptscsih.c, mptscsih_flush_running_cmds() uses
> > scsi_device_online() to determine whether or not dma-unmap data area
> > of active commands.  This was added in the changeset 1.1371.776.1 by
> > Eric Moore with the comment "MPT Fusion add back FC909 
> support".  Can
> > you please explain me why and how scsi_device_online() condition is
> > used here?
> 
> I brought that issue up a while ago, but we didn't really get 
> anywhere,
> see the "fix dma mapping leak in fusion" thread on linux-scsi.
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Question about scsi_device_online() usage in mptscsih
  2005-04-14 20:28 Question about scsi_device_online() usage in mptscsih Moore, Eric Dean
@ 2005-04-15 10:13 ` Masao Fukuchi
  2005-04-15 14:30   ` James Bottomley
  0 siblings, 1 reply; 7+ messages in thread
From: Masao Fukuchi @ 2005-04-15 10:13 UTC (permalink / raw)
  To: Moore, Eric Dean
  Cc: James Bottomley, linux-scsi, Tejun Heo, Christoph Hellwig

Hi Eric.

The problem was caused by calling pci_unmap_sg() 
after device offline.

The sequence is: 
1.Host issues SCSI command to fusion MPT driver.
2.Mid layer detects command timeout and then performs 
  error recovery sequence. 
  But the sequence fails and it causes device offline.
  (fusion MPT driver still hold the command)
4.Fusion MPT driver activates host reset and calls
  mptscsih_flush_running_cmds().
5.In mptscsih_flush_running_cmds(), pci_unmap_sg() is called
  for all SCSI cmds.
  But the buffer had already freed and this causes oops.

I don't know your removal of error handling timer resolves
this issue.

Masao Fukuchi

Moore, Eric Dean wrote:
>Masao Fukuchi reported an issue when the
>outstanding mf queue was being flushed in
>contents of slave_destroy, the scsi dev was 
>offlined already by the mid-layer.
>
>They may not be an issue anymore since I removed
>the error handling timers from mptscsih.
>
>Masao can you validate this pls?
>
>Christoph Hellwig, wrote:
>> 
>> On Sat, Apr 02, 2005 at 02:35:57PM +0900, Tejun Heo wrote:
>> >  Hello, Eric.
>> >  Hello, James.
>> > 
>> >  I've been working on new SCSI state model and was checking on
>> > scsi_device_online() users.  As the state model is going to 
>> change, I
>> > need to audit device state usages in lldd's and I'm having difficult
>> > time understanding why scsi_device_online() is used in mptscsih.
>> > 
>> >  In mptscsih.c, mptscsih_flush_running_cmds() uses
>> > scsi_device_online() to determine whether or not dma-unmap data area
>> > of active commands.  This was added in the changeset 1.1371.776.1 by
>> > Eric Moore with the comment "MPT Fusion add back FC909 
>> support".  Can
>> > you please explain me why and how scsi_device_online() condition is
>> > used here?
>> 
>> I brought that issue up a while ago, but we didn't really get 
>> anywhere,
>> see the "fix dma mapping leak in fusion" thread on linux-scsi.
>> 
>-
>To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Question about scsi_device_online() usage in mptscsih
  2005-04-15 10:13 ` Masao Fukuchi
@ 2005-04-15 14:30   ` James Bottomley
  0 siblings, 0 replies; 7+ messages in thread
From: James Bottomley @ 2005-04-15 14:30 UTC (permalink / raw)
  To: Masao Fukuchi
  Cc: Moore, Eric Dean, SCSI Mailing List, Tejun Heo, Christoph Hellwig

On Fri, 2005-04-15 at 19:13 +0900, Masao Fukuchi wrote:
> The sequence is: 
> 1.Host issues SCSI command to fusion MPT driver.
> 2.Mid layer detects command timeout and then performs 
>   error recovery sequence. 
>   But the sequence fails and it causes device offline.
>   (fusion MPT driver still hold the command)
> 4.Fusion MPT driver activates host reset and calls
>   mptscsih_flush_running_cmds().

Well, this is the error:  The Fusion host reset should be activated from
the eh_host_reset_handler.

> 5.In mptscsih_flush_running_cmds(), pci_unmap_sg() is called
>   for all SCSI cmds.
>   But the buffer had already freed and this causes oops.

If you do the above, the command will still be valid.  If you wait until
after the device is offlined, of course the mid-layer throws away the
commands, since it's up to the driver to clean them up during the reset
sequences.

James



^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: Question about scsi_device_online() usage in mptscsih
@ 2005-04-15 16:11 Moore, Eric Dean
  2005-04-15 16:41 ` James Bottomley
  0 siblings, 1 reply; 7+ messages in thread
From: Moore, Eric Dean @ 2005-04-15 16:11 UTC (permalink / raw)
  To: James Bottomley, Masao Fukuchi
  Cc: SCSI Mailing List, Tejun Heo, Christoph Hellwig

On Friday, April 15, 2005 8:30 AM, James Bottomley wrote:
> 
> On Fri, 2005-04-15 at 19:13 +0900, Masao Fukuchi wrote:
> > The sequence is: 
> > 1.Host issues SCSI command to fusion MPT driver.
> > 2.Mid layer detects command timeout and then performs 
> >   error recovery sequence. 
> >   But the sequence fails and it causes device offline.
> >   (fusion MPT driver still hold the command)
> > 4.Fusion MPT driver activates host reset and calls
> >   mptscsih_flush_running_cmds().
> 
> Well, this is the error:  The Fusion host reset should be 
> activated from
> the eh_host_reset_handler.
> 

Since I removed the eh timers, this is not going to happen.
The driver is always waiting for task management request 
to complete before returning control back to the mid-layer.
Before when the driver used timers, the eh thread returned
immediately back to the mid-layer, and the actual
task management request to the firmware was delayed and 
completed later. If the task management request didn't 
complete, the timer would expire, and host reset would be called,
and it was possible to opps as Fukuchi has indicated.

Thus I think its safe to say, we can remove the 
scsi_device_online from mptscsih_flush_running_cmds.
Right??

Eric Moore
LSI Logic


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: Question about scsi_device_online() usage in mptscsih
  2005-04-15 16:11 Moore, Eric Dean
@ 2005-04-15 16:41 ` James Bottomley
  0 siblings, 0 replies; 7+ messages in thread
From: James Bottomley @ 2005-04-15 16:41 UTC (permalink / raw)
  To: Moore, Eric Dean
  Cc: Masao Fukuchi, SCSI Mailing List, Tejun Heo, Christoph Hellwig

On Fri, 2005-04-15 at 10:11 -0600, Moore, Eric Dean wrote:
> Thus I think its safe to say, we can remove the 
> scsi_device_online from mptscsih_flush_running_cmds.
> Right??

Yes, as long as you now tear down all the outstanding commands while the
error handler is executing, you should be safe from this condition.

James



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2005-04-15 16:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-04-14 20:28 Question about scsi_device_online() usage in mptscsih Moore, Eric Dean
2005-04-15 10:13 ` Masao Fukuchi
2005-04-15 14:30   ` James Bottomley
  -- strict thread matches above, loose matches on Subject: below --
2005-04-15 16:11 Moore, Eric Dean
2005-04-15 16:41 ` James Bottomley
2005-04-02  5:35 Tejun Heo
2005-04-14 18:54 ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox