public inbox for linux-scsi@vger.kernel.org
* RE: Transport affected timeouts...
@ 2004-04-22 18:54 Smart, James
  2004-04-22 19:02 ` James Bottomley
  2004-04-22 19:09 ` Brian King
  0 siblings, 2 replies; 15+ messages in thread
From: Smart, James @ 2004-04-22 18:54 UTC
  To: 'Brian King'; +Cc: 'James Bottomley', Linux SCSI Reflector

Brian,

To be honest, it's probably both.  The folks who performed the
troubleshooting in the past blamed much of the problem on latency, and
used link timer values to resolve it. However, since the qual was
predominantly RAID arrays, I'd bet that it was heavily influenced by the
target, as you indicate. (Note: the resulting timeout based on the R_A_TOV
value is very close to just doubling the timeout.) Also, I was rather
surprised to see that the timeout value of sd is 30 seconds. I know when I
was in Tru64, we had 60 seconds as a minimum.

One question though: how does the LLD really know what the timeout should
be? It doesn't identify a target as a RAID device, does it? Or what RAID
level it's using?

-- James S
 

> -----Original Message-----
> From: Brian King [mailto:brking@us.ibm.com]
> Sent: Thursday, April 22, 2004 2:15 PM
> To: Smart, James
> Cc: 'James Bottomley'; Linux SCSI Reflector
> Subject: Re: Transport affected timeouts...
> 
> 
> We are really trying to solve two different problems here. The problem I am
> trying to solve with the patch I submitted is that the existing r/w timeouts
> are too short for RAID array devices. Since a single read or write may end
> up resulting in multiple ops in RAID 5 arrays this timeout becomes far too
> short. In this scenario the LLD has the best knowledge as to what this timeout
> value needs to be. A delta value here really does not make sense.
> 
> The problem you are trying to solve is more related to the latencies you may
> experience due to the transport. I'm not sure my patch is the best way to fix
> your problem. Updating the st driver to use this rw_timeout does not sound like
> a good solution as the LLD really has no idea what the total timeout for a
> read or write should be for a tape.
> 
> Option 2 works best for the ipr driver, option 3 works best for you. Since we
> are really solving two different problems, how about using both options?
> 
> -Brian
> 
> 
> > Potential options:
> > 1) Change the base driver timeout.  (base drivers defined to be sd, st, etc)
> > 
> > I dislike this mainly because it fails (a) and (c). Also concerned about
> > abilities to tune all base drivers.
> > 
> > 2) Allow the scsi-host to provide a timeout value that can override the base
> > driver.
> > The IBM proposed patch does this. I dislike the patch as : scsi host has no
> > input as to what the base driver timeout is; there are multiple base driver
> > timeouts (sd, st, etc); thus a priori knowledge is required to determine a
> > maximum. Also, application of timeout change is inconsistent.
> > 
> > 3) Allow the scsi-host to provide a transport-specific increment that can be
> > added to the base driver timeout.
> > Just a refinement of (2) to hopefully remove my dislikes. Still has faults
> > as the exact relationship of topology/configuration/devices to needed
> > timeout is not exact/determinable.
> 
> 
> -- 
> Brian King
> eServer Storage I/O
> IBM Linux Technology Center
> 

* RE: Transport affected timeouts...
@ 2004-04-22 21:36 Smart, James
  2004-04-22 21:45 ` Brian King
  0 siblings, 1 reply; 15+ messages in thread
From: Smart, James @ 2004-04-22 21:36 UTC
  To: 'James Bottomley'; +Cc: 'Brian King', Linux SCSI Reflector



> I know the way solaris does this is to have a global variable that
> allows you to raise the timeout.  If we simply exposed Brian's proposed
> parameter in sysfs, so you could change it from user space, would that
> be sufficient?


Yes.

-- James S

* RE: Transport affected timeouts...
@ 2004-04-22 16:28 Smart, James
  2004-04-22 18:14 ` Brian King
  0 siblings, 1 reply; 15+ messages in thread
From: Smart, James @ 2004-04-22 16:28 UTC
  To: 'James Bottomley'; +Cc: Linux SCSI Reflector

I described the current behavior because it is unacceptable, and I'd like a
solution so that we can get rid of it.

The problem we're trying to solve is this: there are topologies (long links,
small BB credits, large multi-LUN devices and thus lots of queued I/O) where
the default timeout values of the base drivers (sd, st, etc.) are too short.

The optimal solution:
a) adjusts the timeout only on the scsi devices affected
b) has the timeout value determined dynamically by the best entity (usually
the scsi host) at appropriate times (topology changes)
c) does not require administrator input, a priori knowledge, or
kernel/driver rebuilds to adjust the timeout
d) addresses all commands from any source

Potential options:
1) Change the base driver timeout.  (Base drivers are defined to be sd, st,
etc.)

I dislike this mainly because it fails (a) and (c). I'm also concerned about
the ability to tune all base drivers.

2) Allow the scsi-host to provide a timeout value that can override the base
driver.
The IBM proposed patch does this. I dislike the patch because: the scsi host
has no input as to what the base driver timeout is; there are multiple base
driver timeouts (sd, st, etc.); thus a priori knowledge is required to
determine a maximum. Also, application of the timeout change is inconsistent.

3) Allow the scsi-host to provide a transport-specific increment that can be
added to the base driver timeout.
This is just a refinement of (2) to hopefully remove my dislikes. It still has
faults, as the relationship of topology/configuration/devices to the needed
timeout is not exactly determinable.
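
The difference between options (2) and (3) can be sketched in a few lines of
C. This is only an illustrative userspace model: the struct, its fields, and
both function names are hypothetical and do not exist in the SCSI midlayer or
in the IBM patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-host hint; every name here is an assumption. */
struct host_timeout_hint {
    unsigned int override_secs;   /* option 2: absolute value, 0 = not set */
    unsigned int increment_secs;  /* option 3: added to the base timeout   */
};

/* Option 2: the host's value replaces whatever the base driver chose. */
unsigned int timeout_with_override(unsigned int base_secs,
                                   const struct host_timeout_hint *h)
{
    assert(h != NULL);
    return h->override_secs ? h->override_secs : base_secs;
}

/* Option 3: the host's increment is added to the base driver's value. */
unsigned int timeout_with_increment(unsigned int base_secs,
                                    const struct host_timeout_hint *h)
{
    assert(h != NULL);
    return base_secs + h->increment_secs;
}
```

In this model an override chosen with one base driver in mind is applied
unchanged to a base driver with a much longer timeout, while the increment
keeps each base driver's own value and only adds the transport allowance.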

If there's not enough consensus to do (3) - then I vote for moving ahead
with the IBM patch (2), and updating the st driver as well.

-- James S

note: we are using the scsi_host_self_blocked interface to bridge reconfig
events, but that's a different topic.


> -----Original Message-----
> From: James Bottomley [mailto:James.Bottomley@SteelEye.com]
> Sent: Wednesday, April 21, 2004 3:21 PM
> To: Smart, James
> Cc: Linux SCSI Reflector
> Subject: RE: Transport affected timeouts...
> 
> 
> On Wed, 2004-04-21 at 12:53, Smart, James wrote:
> > Where do we go from here ?   
> > 
> > What we are doing in our driver is the following:
> > - Cancel the mid-layer timeout
> > - Set timeout to (cmd->timeout_per_command/HZ) + hba_offset
> > - Start timer based on new timeout value
> 
> Well, this is unacceptable.  Only the mid layer should be mucking with
> mid-layer timers.
> 
> > Where hba_offset is: (2 * R_A_TOV) + administrative increment (default 0)
> > Where R_A_TOV is the fabric-reported timeout. R_A_TOV is at least a round
> > trip time, plus 2 times max delivery delay time within the fabric. (default
> > 10 seconds).  This value can change based on fabric reconfiguration or
> > plugging the adapter into a different fabric.
> 
> I'm still not clear on what you're trying to achieve.
> 
> the scsi_host_self_blocked interface was created with reconfig events in
> mind...it still won't stop in-progress timers, but I've been considering
> adding that feature for things like FC lip events.
> 
> James
> 
> 

* RE: Transport affected timeouts...
@ 2004-04-21 16:53 Smart, James
  2004-04-21 19:20 ` James Bottomley
  0 siblings, 1 reply; 15+ messages in thread
From: Smart, James @ 2004-04-21 16:53 UTC
  To: Smart, James, 'James Bottomley'; +Cc: Linux SCSI Reflector

James B and others...

Where do we go from here?

What we are doing in our driver is the following:
- Cancel the mid-layer timeout
- Set timeout to (cmd->timeout_per_command/HZ) + hba_offset
- Start timer based on new timeout value

Where hba_offset is: (2 * R_A_TOV) + administrative increment (default 0)
Where R_A_TOV is the fabric-reported timeout. R_A_TOV is at least a round
trip time, plus 2 times max delivery delay time within the fabric (default
10 seconds).  This value can change based on fabric reconfiguration or
plugging the adapter into a different fabric.
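
The calculation described above can be written out as a small C sketch. It is
a userspace model under stated assumptions, not code from the driver: the
HZ_ASSUMED value, the struct, and the function names are all hypothetical, and
only the formula itself comes from the description.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed timer ticks per second; the real HZ depends on the kernel build. */
#define HZ_ASSUMED 1000u

/* Hypothetical container for the two inputs to hba_offset. */
struct fc_timeout_params {
    unsigned int r_a_tov_secs;     /* fabric-reported R_A_TOV (default 10) */
    unsigned int admin_incr_secs;  /* administrative increment (default 0) */
};

/* hba_offset = (2 * R_A_TOV) + administrative increment, in seconds. */
unsigned int hba_offset_secs(const struct fc_timeout_params *p)
{
    assert(p != NULL);
    return 2u * p->r_a_tov_secs + p->admin_incr_secs;
}

/* New timeout in seconds = (timeout_per_command / HZ) + hba_offset. */
unsigned int adjusted_timeout_secs(unsigned int timeout_per_command_ticks,
                                   const struct fc_timeout_params *p)
{
    /* Integer division truncates any fraction of a second. */
    return timeout_per_command_ticks / HZ_ASSUMED + hba_offset_secs(p);
}
```

With the default R_A_TOV of 10 seconds and no administrative increment, the
offset in this model is 20 seconds, so a 30-second base timeout would become
50 seconds.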

-- James S
  

> -----Original Message-----
> From: Smart, James 
> Sent: Friday, April 16, 2004 4:13 PM
> To: 'James Bottomley'
> Cc: Linux SCSI Reflector
> Subject: RE: Transport affected timeouts...
> 
> 
> 
> > -----Original Message-----
> > From: James Bottomley [mailto:James.Bottomley@SteelEye.com]
> > Sent: Friday, April 16, 2004 3:47 PM
> > To: Smart, James
> > Cc: Linux SCSI Reflector
> > Subject: RE: Transport affected timeouts...
> > 
> > 
> > On Fri, 2004-04-16 at 14:39, Smart, James wrote:
> > > I had looked at the patch.  I don't think it works as well as the
> > > incrementer. But it would be a start. We would need the st driver, and scsi
> > > generic to use it as well. It doesn't address the timeout changing post
> > > slave_configure. The other thing that bothers me is that it uses an explicit
> > > value. As per the thread, it would have been better to know what the
> > > original default was and just increment/double it - and there is the issue
> > > of different device types needing different defaults.
> > 
> > I don't think it's a good idea to alter *every* timeout, merely the
> > usual ones (hence, really only read and write in the patch).
> 
> In general, we're reflecting larger/longer topologies that just induce more
> latency in performing an i/o and which can have an aggregate effect overall
> to i/o queued in/for the target. Didn't matter whether it was an usual i/o
> or not.
> 
> > 
> > Why do you need it to be variable post slave_configure?
> 
> Hmm... the gist of the argument is that the adapter could be replugged to
> another fabric that has larger or lower timeouts needed. But - I would
> assume a midlayer rescan would have to occur on such a change. New devices
> are covered as slave_configure would be called for them. But, is
> slave_configure called again on existing devices that change "personality"
> as they are now a different physical device?
> 
> -- james
>  
> 

* RE: Transport affected timeouts...
@ 2004-04-16 20:13 Smart, James
  0 siblings, 0 replies; 15+ messages in thread
From: Smart, James @ 2004-04-16 20:13 UTC
  To: 'James Bottomley'; +Cc: Linux SCSI Reflector


> -----Original Message-----
> From: James Bottomley [mailto:James.Bottomley@SteelEye.com]
> Sent: Friday, April 16, 2004 3:47 PM
> To: Smart, James
> Cc: Linux SCSI Reflector
> Subject: RE: Transport affected timeouts...
> 
> 
> On Fri, 2004-04-16 at 14:39, Smart, James wrote:
> > I had looked at the patch.  I don't think it works as well as the
> > incrementer. But it would be a start. We would need the st driver, and scsi
> > generic to use it as well. It doesn't address the timeout changing post
> > slave_configure. The other thing that bothers me is that it uses an explicit
> > value. As per the thread, it would have been better to know what the
> > original default was and just increment/double it - and there is the issue
> > of different device types needing different defaults.
> 
> I don't think it's a good idea to alter *every* timeout, merely the
> usual ones (hence, really only read and write in the patch).

In general, we're reflecting larger/longer topologies that just induce more
latency in performing an I/O and which can have an aggregate effect overall
on I/O queued in/for the target. It doesn't matter whether it is a usual I/O
or not.

> 
> Why do you need it to be variable post slave_configure?

Hmm... the gist of the argument is that the adapter could be replugged to
another fabric that needs larger or smaller timeouts. But I would
assume a midlayer rescan would have to occur on such a change. New devices
are covered, as slave_configure would be called for them. But is
slave_configure called again on existing devices that change "personality"
because they are now a different physical device?

-- james
 

* RE: Transport affected timeouts...
@ 2004-04-16 19:39 Smart, James
  2004-04-16 19:46 ` James Bottomley
  0 siblings, 1 reply; 15+ messages in thread
From: Smart, James @ 2004-04-16 19:39 UTC
  To: 'James Bottomley'; +Cc: Linux SCSI Reflector

I had looked at the patch.  I don't think it works as well as the
incrementer, but it would be a start. We would need the st driver and scsi
generic to use it as well. It doesn't address the timeout changing post
slave_configure. The other thing that bothers me is that it uses an explicit
value. As per the thread, it would have been better to know what the
original default was and just increment/double it, and there is the issue
of different device types needing different defaults.

-- james


> -----Original Message-----
> From: James Bottomley [mailto:James.Bottomley@SteelEye.com]
> Sent: Friday, April 16, 2004 3:24 PM
> To: Smart, James
> Cc: Linux SCSI Reflector
> Subject: Re: Transport affected timeouts...
> 
> 
> On Fri, 2004-04-16 at 10:40, Smart, James wrote:
> > One issue that we're wrestling with in our driver is timeout values. In the
> > past, we've encountered large configurations where the timeouts from the
> > midlayer are insufficient. In general - we'd like the scsi host to be able
> > to add a transport/topology increment time to the base timeout values.  The
> > methodology would have to be dynamic as it may change as link connectivity
> > changes.
> > 
> > Obviously, an hba driver mucking with the timeout values handed to it is
> > frowned upon. Is there a recommendation on how we should handle this ?
> 
> There's no currently agreed upon framework, but would
> 
> http://www-124.ibm.com/storageio/ipr/patch-2.6.5-sd_timeout_mod
> 
> Do for what you want?
> 
> James
> 
> 

* Transport affected timeouts...
@ 2004-04-16 15:40 Smart, James
  2004-04-16 19:24 ` James Bottomley
  0 siblings, 1 reply; 15+ messages in thread
From: Smart, James @ 2004-04-16 15:40 UTC
  To: Linux SCSI Reflector

All,

One issue that we're wrestling with in our driver is timeout values. In the
past, we've encountered large configurations where the timeouts from the
midlayer are insufficient. In general - we'd like the scsi host to be able
to add a transport/topology increment time to the base timeout values.  The
methodology would have to be dynamic as it may change as link connectivity
changes.

Obviously, an hba driver mucking with the timeout values handed to it is
frowned upon. Is there a recommendation on how we should handle this?

-- james s

