[RFC] Persistent naming of scsi devices

linux-scsi.vger.kernel.org archive mirror

* [RFC] Persistent naming of scsi devices
@ 2002-04-08 15:18 sullivan
  2002-04-08 15:04 ` Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 297+ messages in thread
From: sullivan @ 2002-04-08 15:18 UTC (permalink / raw)
  To: linux-scsi

I have been working on a prototype to allow the persistent naming of scsi devices across boots. The prototype attempts to address the namespace slippage that occurs when names are assigned based on discovery order or topology, and a hardware configuration change occurs. It does this by assigning names based on the characteristics of the device.

The prototype and more detailed info can be found at:
http://oss.software.ibm.com/devreg/

The prototype utilizes 3 components:
1. driverfs to collect and publish the characteristics of the device
2. devfs to generate insertion and removal events
3. devfsd library to handle the events, parse driverfs, and apply the naming logic.
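
As a rough illustration of the naming logic in step 3, the following Python sketch binds names to device characteristics rather than to discovery order. It is not taken from the prototype: the rule format, the characteristic keys (`vendor`, `serial`) and the function names are all hypothetical.

```python
# Hypothetical sketch: rule format and key names are assumptions,
# not the devreg prototype's actual data model.

RULES = [
    # (persistent name, characteristics that must all match)
    ("backup_tape", {"vendor": "ACME", "serial": "T123"}),
    ("data_disk", {"vendor": "ACME", "serial": "D456"}),
]


def persistent_name(device, rules=RULES):
    """Return the name of the first rule whose characteristics all match."""
    for name, wanted in rules:
        if all(device.get(key) == value for key, value in wanted.items()):
            return name
    return None


def name_devices(discovered, rules=RULES):
    """Map each persistent name to the kernel name the device got this boot.

    `discovered` is a list of (kernel_name, characteristics) pairs in
    discovery order, as an insertion-event handler might collect them.
    """
    names = {}
    for kernel_name, device in discovered:
        name = persistent_name(device, rules)
        if name is not None:
            names[name] = kernel_name
    return names
```

With rules like these, a disk that is discovered as sda on one boot and sdb on the next keeps the same persistent name, because the lookup never consults the discovery order.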

I am also currently working on a non devfs version that uses /sbin/hotplug as the event generation mechanism.

I would welcome any thoughts, comments, or suggestions.

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 15:18 [RFC] Persistent naming of scsi devices sullivan
@ 2002-04-08 15:04 ` Christoph Hellwig
  2002-04-08 15:59   ` Matthew Jacob
                     ` (3 more replies)
  2002-04-08 20:18 ` Eddie Williams
  2002-04-09  0:48 ` Kurt Garloff
  2 siblings, 4 replies; 297+ messages in thread
From: Christoph Hellwig @ 2002-04-08 15:04 UTC (permalink / raw)
  To: sullivan; +Cc: linux-scsi

On Mon, Apr 08, 2002 at 09:18:44AM -0600, sullivan wrote:
> I would welcome any thoughts, comments, or suggestions.

What is the rationale for doing this?  I think we should rather get people
to actually use the UUID= option to mount..


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 15:04 ` Christoph Hellwig
@ 2002-04-08 15:59   ` Matthew Jacob
  2002-04-08 16:34   ` James Bottomley
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 297+ messages in thread
From: Matthew Jacob @ 2002-04-08 15:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: sullivan, linux-scsi



On Mon, 8 Apr 2002, Christoph Hellwig wrote:

> On Mon, Apr 08, 2002 at 09:18:44AM -0600, sullivan wrote:
> > I would welcome any thoughts, comments, or suggestions.
> 
> What is the rationale for doing this?  I think we should rather get people
> to actually use the UUID= option to mount..
right on!



^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 15:04 ` Christoph Hellwig
  2002-04-08 15:59   ` Matthew Jacob
@ 2002-04-08 16:34   ` James Bottomley
  2002-04-08 18:27     ` Patrick Mansfield
  2002-04-08 17:51   ` Oliver Neukum
  2002-04-08 18:45   ` Tigran Aivazian
  3 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-04-08 16:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: sullivan, linux-scsi

On Mon, Apr 08, 2002 at 09:18:44AM -0600, sullivan wrote:
> I would welcome any thoughts, comments, or suggestions.

hch@infradead.org said:
> What is the rationale for doing this?  I think we should rather get
> people to actually use the UUID= option to mount.. 

I think, actually, that this feature is aimed more generally.  UUID really 
only works for filesystems that support it (and in Linux, I believe, this is 
currently only ext2/3 for 2.4 plus reiser in 2.5).  A persistent device 
binding would be of use to people adopting Linux in their SAN enterprise who 
were not (at least initially) planning to convert all their filesystems over 
to Linux ones.  It would also work for raw devices, which don't have an 
underlying filesystem to store the UUID.

I'm not sure we want to get into writing a UUID to a "safe" place on a device 
for such legacy filesystem types (especially as most administrators loathe 
this feature in NT).

The proposal for the device naming project says 

"[...] Some of the characteristics examined include information returned from 
the SCSI Inquiry command, and the labels found on the disks partitions."

which implies to me that you can bind by UUID if desired.

One of the things I'm not so keen on in the proposed patch is that there is a 
large amount of code placed in the kernel for determining the vital product 
data IDs.  For the scheme to be fully flexible, it would be much better to 
move this out to user land, using the /dev/sg pipe into the device, so that 
determining and using the device "characteristics" can be much more flexible.  
Hotplug is probably more useful to this type of user level based configuration 
than devfs et al.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 16:34   ` James Bottomley
@ 2002-04-08 18:27     ` Patrick Mansfield
  2002-04-08 19:17       ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Patrick Mansfield @ 2002-04-08 18:27 UTC (permalink / raw)
  To: James Bottomley; +Cc: Christoph Hellwig, sullivan, linux-scsi

On Mon, Apr 08, 2002 at 11:34:26AM -0500, James Bottomley wrote:
> On Mon, Apr 08, 2002 at 09:18:44AM -0600, sullivan wrote:
> > I would welcome any thoughts, comments, or suggestions.

> One of the things I'm not so keen on in the proposed patch is that there is a 
> large amount of code placed in the kernel for determining the vital product 
> data IDs.  For the scheme to be fully flexible, it would be much better to 
> move this out to user land, using the /dev/sg pipe into the device, so that 
> determining and using the device "characteristics" can be much more flexible.  
> Hotplug is probably more useful to this type of user level based configuration 
> than devfs et al.
> 
> James

James -

We could have a set of interfaces to get a SCSI UUID, and then have
hotplug tell us which interface (or module?) to use. That way we don't
require sg.

The device_list[] (black/white list) should also come from user land.
Maybe hotplug could get the device_list[] characteristics, including
the UUID method for the device.

I'd also like a uuid stored in Scsi_Device for multi-path support in the
mid-layer (independent of how it's set). This uuid could not be stored
in the partition or as part of the file system.

-- Patrick

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 18:27     ` Patrick Mansfield
@ 2002-04-08 19:17       ` James Bottomley
  2002-04-09  0:22         ` Douglas Gilbert
  2002-04-09 14:55         ` sullivan
  0 siblings, 2 replies; 297+ messages in thread
From: James Bottomley @ 2002-04-08 19:17 UTC (permalink / raw)
  To: Patrick Mansfield
  Cc: James Bottomley, Christoph Hellwig, sullivan, linux-scsi

patmans@us.ibm.com said:
> We could have a set of interfaces to get a SCSI UUID, and then have
> hotplug tell us which interface (or module?) to use. That way we don't
> require sg.

I'm in two minds about this one.  Deciding exactly what constitutes the UUID 
for a particular device can be non-trivial.  Usually you have to probe for the 
supported vital product pages and, if they include the WWN one, use that; 
otherwise fall back to the (less guaranteed to be unique) SCSI-2 serial number 
page, or finally make something up dependent on what unique numbers you can 
get from the device.

It makes sense to me that this type of complex rule (of thumb almost) based 
lookup is best done from user level by doing the explicit SCSI commands if 
necessary.
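
A minimal user-level sketch of that rule-of-thumb lookup might look like the Python below. It assumes the identifier bytes have already been fetched and extracted from the device (for example through sg) and are handed over as a dict keyed by VPD page code; the function name, the dict layout and the fallback format are assumptions made for illustration. Page 0x83 is the device identification page and 0x80 the unit serial number page.

```python
def choose_identifier(vpd_pages, inquiry):
    """Pick the best available identifier for a device.

    vpd_pages: dict mapping VPD page code -> identifier bytes already
               extracted from that page (hypothetical layout).
    inquiry:   dict of standard INQUIRY strings (hypothetical keys).
    Returns a (kind, identifier_string) tuple.
    """
    # Preferred: a WWN-style identifier from the device identification page.
    wwn = vpd_pages.get(0x83)
    if wwn:
        return ("wwn", wwn.hex())

    # Next: the unit serial number page, which is less certain to be unique.
    serial = vpd_pages.get(0x80, b"").strip(b" \x00")
    if serial:
        return ("serial", serial.decode("ascii", "replace"))

    # Last resort: make something up from whatever strings are available.
    parts = (inquiry.get(key, "").strip()
             for key in ("vendor", "product", "revision"))
    return ("fallback", "-".join(parts))
```

Because the ordering lives in ordinary user-space code, changing the preference or adding a device-specific quirk would not need a kernel change.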

However, I'm biased.  From a philosophical point of view, I like the hotplug 
approach because it allows us to eject a lot of the use once initialisation 
code into user space from the kernel.

> The device_list[] (black/white list) should also come from user land.
> Maybe hotplug could get the device_list[] characteristics, including
> the UUID method for the device.

> I'd also like a uuid stored in Scsi_Device for multi-path support in
> the mid-layer (independent of how it's set). This uuid could not be
> stored in the partition or as part of the file system. 

If we can come up with a nice, general mechanism, there's no reason why it 
cannot apply outside the SCSI system, so I wouldn't necessarily tie it to 
Scsi_Device.  More likely (and actually what the persistent binding begins) is 
to tie it to the concept of the Mochel internal device tree.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 19:17       ` James Bottomley
@ 2002-04-09  0:22         ` Douglas Gilbert
  2002-04-09 14:35           ` sullivan
  2002-04-09 14:55         ` sullivan
  1 sibling, 1 reply; 297+ messages in thread
From: Douglas Gilbert @ 2002-04-09  0:22 UTC (permalink / raw)
  To: James Bottomley
  Cc: Patrick Mansfield, Christoph Hellwig, sullivan, linux-scsi

James Bottomley wrote:
> 
> patmans@us.ibm.com said:
> > We could have a set of interfaces to get a SCSI UUID, and then have
> > hotplug tell us which interface (or module?) to use. That way we don't
> > require sg.
> 
> I'm in two minds about this one.  Deciding exactly what constitutes the UUID
> for a particular device can be non trivial.  Usually you have to probe for the
> supported vital product pages, and if they have the WWN one use that otherwise
> fall back to the (less guaranteed to be unique) SCSI-2 serial number page or
> finally make something up dependent on what unique numbers you can get from
> the device.
> 
> It makes sense to me that this type of complex rule (of thumb almost) based
> lookup is best done from user level by doing the explicit SCSI commands if
> necessary.
> 
> However, I'm biased.  From a philosophical point of view, I like the hotplug
> approach because it allows us to eject a lot of the use once initialisation
> code into user space from the kernel.

Interesting discussion.

To see what the hotplug subsystem approach might look like,
see http://www.torque.net/scsi/scsimon.html . This is a
proposed scsi subsystem upper level driver (gathering dust). 
What sets it apart from the others is that it is not device
based. It has a single (misc) device name and can be thought 
of as a window through to the scsi mid level. It supplies 
hotplug alerts whenever a scsi device is attached or detached.
[Unfortunately scsi hosts being registered or unregistered
does not cause hotplug alerts.] Attaches and detaches are
given event numbers (ascending sequence) and a time_since_boot.

That said, I like the devfs/devfsd approach as well.

> > The device_list[] (black/white list) should also come from user land.
> > Maybe hotplug could get the device_list[] characteristics, including
> > the UUID method for the device.
> 
> > I'd also like a uuid stored in Scsi_Device for multi-path support in
> > the mid-layer (independent of how it's set). This uuid could not be
> > stored in the partition or as part of the file system.
> 
> If we can come up with a nice, general mechanism, there's no reason why it
> cannot apply outside the SCSI system, so I wouldn't necessarily tie it to
> Scsi_Device.  More likely (and actually what the persistent binding begins) is
> to tie it to the concept of the Mochel internal device tree.

In Patrick Mansfield's and my report_lun/twin_inquiry patch
the full INQUIRY response (evpd=0 cmdtt=0) is placed in
Scsi_Device. It could be useful to make that available 
(in ascii-hex) via driverfs/hot_plug to the user space.

Eric's point is a good one about dirty INQUIRY data. This
is especially the case with what might be a large
group of "scsi" devices that linux needs to cope with:
usb2 and ieee1394 talking to an external ATA disk. For
the same Maxtor disk one INQUIRY (1394) tells me that it
is a scsi 6 compliant device while the other INQUIRY (usb2)
truncates the response at 28 bytes. I'm pretty sure they ignore
the evpd bit as well.
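
To make the ascii-hex idea concrete, here is a hedged Python sketch of how a user-space tool could decode such a string and cope with short responses. It assumes the standard INQUIRY layout (vendor in bytes 8-15, product in bytes 16-31, revision in bytes 32-35); the function name and the returned dict are hypothetical, and no particular driverfs file name is implied.

```python
def parse_inquiry_hex(text):
    """Decode an ascii-hex standard INQUIRY response into a few fields.

    Fields that fall beyond the end of a truncated response are
    returned as None rather than guessed.
    """
    # Tolerate spaces or newlines between hex digits.
    data = bytes.fromhex("".join(text.split()))

    def field(start, end):
        chunk = data[start:end]
        if len(chunk) < end - start:
            return None  # response was truncated before this field ended
        return chunk.decode("ascii", "replace").strip()

    return {
        # Peripheral device type is the low 5 bits of byte 0.
        "device_type": (data[0] & 0x1F) if data else None,
        "vendor": field(8, 16),
        "product": field(16, 32),
        "revision": field(32, 36),
    }
```

A response cut off at 28 bytes would still yield the vendor string here, while the product and revision come back as None.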

I'm attempting to get this patch working against 2.5.7-dj3
(as it has captured most of the good patches from the list
that haven't made it any further to date). There is a long
way to go but here are some notes:
  - driverfs is documented in:
     Documentation/driver-model.txt
     Documentation/filesystems/driverfs.txt
  - driverfs is built into 2.5 kernels but needs to be
    mounted. The documentation uses "/devices" as a
    mount point; "/driverfs" seems more comfortable to me:
      mkdir /driverfs ; mount -t driverfs none /driverfs

Doug Gilbert

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-09  0:22         ` Douglas Gilbert
@ 2002-04-09 14:35           ` sullivan
  0 siblings, 0 replies; 297+ messages in thread
From: sullivan @ 2002-04-09 14:35 UTC (permalink / raw)
  To: Douglas Gilbert; +Cc: linux-scsi

On Mon, Apr 08, 2002 at 08:22:57PM -0400, Douglas Gilbert wrote:
> James Bottomley wrote:
> > 
> > patmans@us.ibm.com said:
> > > We could have a set of interfaces to get a SCSI UUID, and then have
> > > hotplug tell us which interface (or module?) to use. That way we don't
> > > require sg.
> > 
> > I'm in two minds about this one.  Deciding exactly what constitutes the UUID
> > for a particular device can be non trivial.  Usually you have to probe for the
> > supported vital product pages, and if they have the WWN one use that otherwise
> > fall back to the (less guaranteed to be unique) SCSI-2 serial number page or
> > finally make something up dependent on what unique numbers you can get from
> > the device.
> > 
> > It makes sense to me that this type of complex rule (of thumb almost) based
> > lookup is best done from user level by doing the explicit SCSI commands if
> > necessary.
> > 
> > However, I'm biased.  From a philosophical point of view, I like the hotplug
> > approach because it allows us to eject a lot of the use once initialisation
> > code into user space from the kernel.
> 
> Interesting discussion.
> 
> To see what the hotplug subsystem approach might look like,
> see http://www.torque.net/scsi/scsimon.html . This is a
> proposed scsi subsystem upper level driver (gathering dust). 
> What sets it apart from the others is that it is not device
> based. It has a single (misc) device name and can be thought 
> of as a window through to the scsi mid level. It supplies 
> hotplug alerts whenever a scsi device is attached or detached.
> [Unfortunately scsi hosts being registered or unregistered
> does not cause hotplug alerts.] Attaches and detaches are
> given event numbers (ascending sequence) and a time_since_boot.
> 
> That said, I like the devfs/devfsd approach as well.
>  
Thanks Doug. I've taken a look at it, and it looks like it would work very well for providing the event generation support in the non-devfs case.


> > > The device_list[] (black/white list) should also come from user land.
> > > Maybe hotplug could get the device_list[] characteristics, including
> > > the UUID method for the device.
> > 
> > > I'd also like a uuid stored in Scsi_Device for multi-path support in
> > > the mid-layer (independent of how it's set). This uuid could not be
> > > stored in the partition or as part of the file system.
> > 
> > If we can come up with a nice, general mechanism, there's no reason why it
> > cannot apply outside the SCSI system, so I wouldn't necessarily tie it to
> > Scsi_Device.  More likely (and actually what the persistent binding begins) is
> > to tie it to the concept of the Mochel internal device tree.
> 
Patrick Mochel had mentioned before that he was looking at providing a unique identifier field as part of the common driverfs infrastructure. I'll check and see what his current thoughts are.

> In Patrick Mansfield's and my report_lun/twin_inquiry patch
> the full INQUIRY response (evpd=0 cmdtt=0) is placed in
> Scsi_Device. It could be useful to make that available 
> (in ascii-hex) via driverfs/hot_plug to the user space.
Sounds good. I'll take a look at the patch and see about adding the driverfs implementation needed to support it.

> 
> Eric's point is a good one about dirty INQUIRY data. This
> is especially the case with what might be a large
> group of "scsi" devices that linux needs to cope with:
> usb2 and ieee1394 talking to an external ATA disk. For
> the same Maxtor disk one INQUIRY (1394) tells me that it
> is a scsi 6 compliant device while the other INQUIRY (usb2)
> truncates the response at 28 bytes. I'm pretty sure they ignore
> the evpd bit as well.
> 
> 
> I'm attempting to get this patch working against 2.5.7-dj3
> (as it has captured most of the good patches from the list
> that haven't made it any further to date). There is a long
> way to go but here are some notes:
>   - driverfs is documented in:
>      Documentation/driver-model.txt
>      Documentation/filesystems/driverfs.txt
>   - driverfs is built into 2.5 kernels but needs to be
>     mounted. The documentation uses "/devices" as a
>     mount point; "/driverfs" seems more comfortable to me:
>       mkdir /driverfs ; mount -t driverfs none /driverfs
> 
> Doug Gilbert

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 19:17       ` James Bottomley
  2002-04-09  0:22         ` Douglas Gilbert
@ 2002-04-09 14:55         ` sullivan
  1 sibling, 0 replies; 297+ messages in thread
From: sullivan @ 2002-04-09 14:55 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

On Mon, Apr 08, 2002 at 02:17:42PM -0500, James Bottomley wrote:
> patmans@us.ibm.com said:
> > We could have a set of interfaces to get a SCSI UUID, and then have
> > hotplug tell us which interface (or module?) to use. That way we don't
> > require sg.
> 
> I'm in two minds about this one.  Deciding exactly what constitutes the UUID 
> for a particular device can be non trivial.  Usually you have to probe for the 
> supported vital product pages, and if they have the WWN one use that otherwise 
> fall back to the (less guaranteed to be unique) SCSI-2 serial number page or 
> finally make something up dependent on what unique numbers you can get from 
> the device.
> 
> It makes sense to me that this type of complex rule (of thumb almost) based 
> lookup is best done from user level by doing the explicit SCSI commands if 
> necessary.
> 
> However, I'm biased.  From a philosophical point of view, I like the hotplug 
> approach because it allows us to eject a lot of the use once initialisation 
> code into user space from the kernel.
> 
> > The device_list[] (black/white list) should also come from user land.
> > Maybe hotplug could get the device_list[] characteristics, including
> > the UUID method for the device.
> 
> > I'd also like a uuid stored in Scsi_Device for multi-path support in
> > the mid-layer (independent of how it's set). This uuid could not be
> > stored in the partition or as part of the file system. 
> 
> If we can come up with a nice, general mechanism, there's no reason why it 
> cannot apply outside the SCSI system, so I wouldn't necessarily tie it to 
> Scsi_Device.  More likely (and actually what the persistent binding begins) is 
> to tie it to the concept of the Mochel internal device tree.

I think this is the decision that is the most difficult. Setting up nodes on driverfs provides a consistent location and format (if we are careful) for surfacing information on devices. I find it much more difficult to keep up with where stuff is in /proc and what format it's in. Also, I must confess that keeping track of which userland utilities to use to collect various pieces of device information strains my limited memory cells. Of course this comes at the cost of extra code in the kernel.

> 
> James
> 
> 

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 15:04 ` Christoph Hellwig
  2002-04-08 15:59   ` Matthew Jacob
  2002-04-08 16:34   ` James Bottomley
@ 2002-04-08 17:51   ` Oliver Neukum
  2002-04-08 18:01     ` Christoph Hellwig
  2002-04-08 18:18     ` Matthew Jacob
  2002-04-08 18:45   ` Tigran Aivazian
  3 siblings, 2 replies; 297+ messages in thread
From: Oliver Neukum @ 2002-04-08 17:51 UTC (permalink / raw)
  To: Christoph Hellwig, sullivan; +Cc: linux-scsi

On Monday 08 April 2002 17:04, Christoph Hellwig wrote:
> On Mon, Apr 08, 2002 at 09:18:44AM -0600, sullivan wrote:
> > I would welcome any thoughts, comments, or suggestions.
>
> What is the rationale for doing this?  I think we should rather get people
> to actually use the UUID= option to mount..

UUID cannot solve the problem. It identifies filesystems.
There's a need to identify devices. Not everything
is a hard drive with a filesystem on it.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 17:51   ` Oliver Neukum
@ 2002-04-08 18:01     ` Christoph Hellwig
  2002-04-08 18:18     ` Matthew Jacob
  1 sibling, 0 replies; 297+ messages in thread
From: Christoph Hellwig @ 2002-04-08 18:01 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Christoph Hellwig, sullivan, linux-scsi

On Mon, Apr 08, 2002 at 07:51:59PM +0200, Oliver Neukum wrote:
> On Monday 08 April 2002 17:04, Christoph Hellwig wrote:
> > On Mon, Apr 08, 2002 at 09:18:44AM -0600, sullivan wrote:
> > > I would welcome any thoughts, comments, or suggestions.
> >
> > What is the rationale for doing this?  I think we should rather get people
> > to actually use the UUID= option to mount..
> 
> UUID cannot solve the problem. It identifies filesystems.

Or volumes..

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 17:51   ` Oliver Neukum
  2002-04-08 18:01     ` Christoph Hellwig
@ 2002-04-08 18:18     ` Matthew Jacob
  2002-04-08 18:28       ` James Bottomley
  1 sibling, 1 reply; 297+ messages in thread
From: Matthew Jacob @ 2002-04-08 18:18 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Christoph Hellwig, sullivan, linux-scsi

On Mon, 8 Apr 2002, Oliver Neukum wrote:

> On Monday 08 April 2002 17:04, Christoph Hellwig wrote:
> > On Mon, Apr 08, 2002 at 09:18:44AM -0600, sullivan wrote:
> > > I would welcome any thoughts, comments, or suggestions.
> >
> > What is the rationale for doing this?  I think we should rather get people
> > to actually use the UUID= option to mount..
> 
> UUID cannot solve the problem. It identifies filesystems.
> There's a need to identify devices. Not everything
> is a hard drive with a filesystem on it.
> 

Then let's use WWPN/WWNS and/or drive serial #'s to do UUIDs for devices.

-matt



^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 18:18     ` Matthew Jacob
@ 2002-04-08 18:28       ` James Bottomley
  2002-04-08 18:34         ` Matthew Jacob
  0 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-04-08 18:28 UTC (permalink / raw)
  To: mjacob; +Cc: Oliver Neukum, Christoph Hellwig, sullivan, linux-scsi

mjacob@feral.com said:
> Then let's use WWPN/WWNS and/or drive serial #'s to do UUIDs for
> devices. 

That's what the persistent binding proposal that started all this uses.  
However, these strings are incredibly long at best and (in spite of what the 
SCSI spec says) sometimes include non-printing characters, so you don't 
necessarily want to be passing them on the command line.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 18:28       ` James Bottomley
@ 2002-04-08 18:34         ` Matthew Jacob
  2002-04-08 19:07           ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Matthew Jacob @ 2002-04-08 18:34 UTC (permalink / raw)
  To: James Bottomley; +Cc: Oliver Neukum, Christoph Hellwig, sullivan, linux-scsi

A discussion about this kind of stuff went on at Sun in 1989. You want
unambiguous physical attribute names that are absolutely precise- this is what
you get with, say, OpenBoot pathnames. But you don't want people to have to
deal with them because they're unwieldy. Instead, you want 'friendly'
names. For the OBP discussions back at Sun, this ended up as the 'devalias'
command for the prom, so 'Fred' could be /iommu@foo/sbus@bar/disk@wank, etc.

For the current discussion, it seems to me that unlabelled/unimported
disks/volumes (i.e., ones w/o a UUID) still can be named unambiguously by the
unwieldy names because you're likely to put a UUID on them very very
quickly. Furthermore, a minor amount of effort in userland tools can make the
manipulation of said names not as much of a problem.

Sorry if I came into the discussion late.

-matt

On Mon, 8 Apr 2002, James Bottomley wrote:

> mjacob@feral.com said:
> > Then let's use WWPN/WWNS and/or drive serial #'s to do UUIDs for
> > devices. 
> 
> That's what the persistent binding proposal that started all this uses.  
> However, these strings are incredibly long at best and (in spite of what the 
> SCSI spec says) sometimes include non-printing characters, so you don't 
> necessarily want to be passing them on the command line.
> 
> James
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 18:34         ` Matthew Jacob
@ 2002-04-08 19:07           ` James Bottomley
  2002-04-08 20:41             ` Matthew Jacob
  0 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-04-08 19:07 UTC (permalink / raw)
  To: mjacob
  Cc: James Bottomley, Oliver Neukum, Christoph Hellwig, sullivan,
	linux-scsi

mjacob@feral.com said:
> For the current discussion, it seems to me that unlabelled/unimported
> disks/volumes (i.e., ones w/o a UUID) still can be named unambiguously
> by the unwieldy names because you're likely to put a UUID on them very
> very quickly. Furthermore, a minor amount of effort in userland tools
> can make the manipulation of said names not as much of a problem.

I agree, but there are the edge cases where you can't put a UUID on them.

I think, however, that what the discussion is showing us is that there is no 
one universal way to get a unique name for a volume, so what we want is some 
type of pluggable infrastructure which allows us to construct a unique name.

As far as persistent binding goes, I think having the ability to persistently 
bind the unique name to a device is a useful feature for those of us who don't 
want to know what the actual unique name is (or who want a nicer name).

However, nothing should prevent you from passing the name as a string to mount 
instead of using persistent binding (even if the string turns out to have to 
be a set of hex digits).  Perhaps we could even have a user level table of 
nice UUID to unwieldy unique name mapping, and you can use a user specified 
UUID without having to put anything on the device or partition.
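
One way to picture such a user-level table is the Python sketch below. The class and function names are hypothetical, and hex encoding is only one possible way of making identifiers that contain non-printing characters safe to pass on a command line.

```python
def to_safe_string(raw_id):
    """Hex-encode raw identifier bytes so the result is always printable."""
    return raw_id.hex()


class AliasTable:
    """User-level mapping from a friendly name to an unwieldy unique name."""

    def __init__(self):
        self._by_alias = {}

    def bind(self, alias, raw_id):
        """Associate a user-chosen alias with a device's raw identifier."""
        self._by_alias[alias] = to_safe_string(raw_id)

    def resolve(self, name):
        """Return the unique name for an alias.

        A name that is not a known alias is passed through unchanged,
        so the hex string itself can still be given directly.
        """
        return self._by_alias.get(name, name)


table = AliasTable()
table.bind("fred", b"\x50\x00\x01ab")
```

Nothing is written to the device or partition in this scheme; the binding lives entirely in the table.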

The summary so far, as I see it, is

1) persistent binding might be a user friendly way of uniquely identifying 
devices
2) UUID doesn't solve the problem because not everything has one
3) ditto for WWN or EVPD

Another wrinkle is that UUID is partition specific, but WWN etc. are device 
specific.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 19:07           ` James Bottomley
@ 2002-04-08 20:41             ` Matthew Jacob
  0 siblings, 0 replies; 297+ messages in thread
From: Matthew Jacob @ 2002-04-08 20:41 UTC (permalink / raw)
  To: James Bottomley; +Cc: Oliver Neukum, Christoph Hellwig, sullivan, linux-scsi

> mjacob@feral.com said:
> > For the current discussion, it seems to me that unlabelled/unimported
> > disks/volumes (i.e., ones w/o a UUID) still can be named unambiguously
> > by the unwieldy names because you're likely to put a UUID on them very
> > very quickly. Furthermore, a minor amount of effort in userland tools
> > can make the manipulation of said names not as much of a problem.
> 
> I agree, but there are the edge cases where you can't put a UUID on them.
> 
> I think, however, that what the discussion is showing us is that there is no 
> one universal way to get a unique name for a volume, so what we want is some 
> type of pluggable infrastructure which allows us to construct a unique name.

Fair enough!

I would resist constructing names with addresses that can change easily (e.g.,
PortIDs or SCSI Targets, or device instance numbers based on probe order).

-matt



^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 15:04 ` Christoph Hellwig
                     ` (2 preceding siblings ...)
  2002-04-08 17:51   ` Oliver Neukum
@ 2002-04-08 18:45   ` Tigran Aivazian
  3 siblings, 0 replies; 297+ messages in thread
From: Tigran Aivazian @ 2002-04-08 18:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: sullivan, linux-scsi

On Mon, 8 Apr 2002, Christoph Hellwig wrote:

> On Mon, Apr 08, 2002 at 09:18:44AM -0600, sullivan wrote:
> > I would welcome any thoughts, comments, or suggestions.
>
> What is the rationale for doing this?  I think we should rather get people
> to actually use the UUID= option to mount..

Imagine an appliance serving out data from filesystems but also directly
connected to a tape library (or, even better, many such libraries),
obviously for direct backups.  After reshuffling devices in the library
(e.g. by breaking some of them :) Linux will reorder the tape devices in
the way it finds them which will completely screw up the Windows client
software that is controlling the actual backup. So persistent naming of
such devices is a good idea but to implement it is far from trivial. I am
assuming the folks at IBM did this "non-trivial bit".

Regards,
Tigran

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 15:18 [RFC] Persistent naming of scsi devices sullivan
  2002-04-08 15:04 ` Christoph Hellwig
@ 2002-04-08 20:18 ` Eddie Williams
  2002-04-09  0:48 ` Kurt Garloff
  2 siblings, 0 replies; 297+ messages in thread
From: Eddie Williams @ 2002-04-08 20:18 UTC (permalink / raw)
  To: sullivan, linux-scsi

On Monday 08 April 2002 11:18 am, sullivan wrote:
> I have been working on a prototype to allow the persistent naming of scsi
> devices across boots. The prototype attempts to address the namespace
> slippage that occurs when names are assigned based on discovery order or
> topology, and a hardware configuration change occurs. It does this by
> assigning names based on the characteristics of the device.
>
> The prototype and more detailed info can be found at:
> http://oss.software.ibm.com/devreg/


Looks good.  As your README acknowledges, one problem is that not all devices 
support the pages 0x80 and 0x83.

A quick scan of the code looks like you will work well with the "good" 
devices.  :-)

Several cases you might want to consider for "bad" devices:
- devices that don't support 0x00 and return standard inquiry data instead!
- devices that don't support 0x00 but do support 0x80

For the first case you can snoop the return data from 0x00 to see if it looks 
"good" versus looks like standard inquiry.

For the second case if 0x00 fails go ahead and try 0x80 anyway.
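
Both suggestions can be sketched in a few lines of Python. This is only a heuristic under stated assumptions: it expects a well-formed page 0x00 response to carry page code 0x00 in byte 1, a length in byte 3, and an ascending page list that starts with 0x00. The function names are hypothetical and the checks are deliberately conservative.

```python
def looks_like_supported_pages(resp):
    """Guess whether a reply to an EVPD page 0x00 request is a real page list.

    Standard INQUIRY data returned by mistake normally fails the check,
    because the bytes after the header are not an ascending list of page
    codes beginning with 0x00.
    """
    if len(resp) < 4 or resp[1] != 0x00:
        return False
    length = resp[3]
    pages = resp[4:4 + length]
    if length == 0 or len(pages) < length:
        return False
    # Strictly ascending, no duplicates, and the list starts with page 0x00.
    return pages[0] == 0x00 and list(pages) == sorted(set(pages))


def pages_to_try(resp):
    """Decide which identifier pages to request next.

    resp is the reply to the page 0x00 request, or None if it failed.
    """
    if resp is not None and looks_like_supported_pages(resp):
        length = resp[3]
        return [page for page in resp[4:4 + length] if page in (0x80, 0x83)]
    # Page 0x00 failed or looked like standard INQUIRY data: try 0x80 anyway.
    return [0x80]
```

Requesting 0x80 unconditionally in the failure path costs one extra command on devices that support neither page.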

Eddie




^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 15:18 [RFC] Persistent naming of scsi devices sullivan
  2002-04-08 15:04 ` Christoph Hellwig
  2002-04-08 20:18 ` Eddie Williams
@ 2002-04-09  0:48 ` Kurt Garloff
  2 siblings, 0 replies; 297+ messages in thread
From: Kurt Garloff @ 2002-04-09  0:48 UTC (permalink / raw)
  To: sullivan; +Cc: linux-scsi

[-- Attachment #1: Type: text/plain, Size: 790 bytes --]

Hi,

just reading this discussion, I hope you all are aware of scsidev, which
provides some of the functionality needed.
Apart from allowing you to specify devices according to the bus topology
(just as devfs does), it also allows for aliases, based on various things,
such as names, device types, model or serial numbers (got from INQUIRY
with page code 0x80).
It may miss some mechanisms to deal with devices without this serial no,
e.g., but in general I do not really see why it would not be useable.
http://www.garloff.de/kurt/linux/scsidev/

Regards,
-- 
Kurt Garloff  <garloff@suse.de>                          Eindhoven, NL
GPG key: See mail header, key servers         Linux kernel development
SuSE Linux AG, Nuernberg, DE                            SCSI, Security

[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]

^ permalink raw reply	[flat|nested] 297+ messages in thread

* RE: [RFC] Persistent naming of scsi devices
@ 2002-04-08 16:11 Matt_Domsch
  0 siblings, 0 replies; 297+ messages in thread
From: Matt_Domsch @ 2002-04-08 16:11 UTC (permalink / raw)
  To: sullivan, linux-scsi

> I have been working on a prototype to allow the persistent 
> naming of scsi devices across boots. The prototype attempts 
> to address the namespace slippage that occurs when names are 
> assigned based on discovery order or topology, and a hardware 
> configuration change occurs. It does this by assigning names 
> based on the characteristics of the device.

> I think we should rather get people
> to actually use the UUID= option to mount..

Agreed.  Andreas' swap partition patches are needed here too.

The only other case where this is tricky is at clean-disk install time, before
any UUIDs are available, when you've got disks on multiple controllers.  Here,
the BIOS and Linux have no way to communicate which device they each think
is the boot device.  Some BIOSs (like on Adaptec 39160 add-in cards, and
7890/7899 embedded controllers on Dell servers) provide BIOS Enhanced Disk
Device Services 3.0 (EDD 3.0), which provides an extension to BIOS int13
AH=48h which specifies, for BIOS's idea of device 80, 81, ..., the PCI
bus/dev/fn and type (SCSI/IDE,...).  Not all BIOSs provide this yet. :-(  But,
having that, Linux could then compare its own similar mapping and determine
which disk is the boot disk without manual intervention.
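
The comparison described above can be sketched as follows (Python, for
illustration only). The two dictionaries, the function name and all values
are hypothetical: how the BIOS data reaches user space is not covered here.

```python
from typing import Dict, List, Tuple

PciAddr = Tuple[int, int, int]  # (bus, device, function)


def boot_candidates(bios_map: Dict[int, PciAddr],
                    linux_map: Dict[str, PciAddr],
                    bios_dev: int = 0x80) -> List[str]:
    """Return Linux disks behind the controller the BIOS booted from."""
    pci = bios_map.get(bios_dev)
    if pci is None:
        return []  # BIOS gave no EDD data for this device
    return sorted(name for name, addr in linux_map.items() if addr == pci)
```

With several disks on one controller the PCI address alone leaves more than
one candidate, so further per-device information would be needed to pick a
single disk.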

-Matt
--
Matt Domsch
Sr. Software Engineer
Dell Linux Solutions www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com
#1 US Linux Server provider for 2001!

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
@ 2002-04-08 19:18 Martin Peschke3
  2002-04-08 20:45 ` Matthew Jacob
  2002-04-10  1:16 ` Rick Stevens
  0 siblings, 2 replies; 297+ messages in thread
From: Martin Peschke3 @ 2002-04-08 19:18 UTC (permalink / raw)
  To: James Bottomley
  Cc: mjacob, Oliver Neukum, Christoph Hellwig, sullivan, linux-scsi



> mjacob@feral.com said:
> > Then let's use WWPN/WWNS and/or drive serial #'s to do UUIDs for
> > devices.
>
> That's what the persistent binding proposal that started all this uses.
> However, these strings are incredibly long at best and (in spite of what
> the SCSI spec says) sometimes include non-printing characters, so you don't
> necessarily want to be passing them on the command line.
>
> James

That's true of course for Fibre Channel WWNs (64 bit).
Such identifiers might be shorter (or longer)
depending on the considered interface.
What could be used for good old parallel attachments?

I would suggest to circumvent problems with non-printing
characters by means of hexadecimal notation
(e.g. 0x5007890000a11753 representing a WWN).
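
A small sketch of such a hexadecimal encoding, in Python. The helper names
are made up for illustration; only the 0x-prefixed notation comes from the
text above.

```python
def id_to_hex(raw: bytes) -> str:
    """Encode raw identifier bytes as a printable 0x-prefixed string."""
    return "0x" + raw.hex()


def hex_to_id(text: str) -> bytes:
    """Decode the string form back into the raw identifier bytes."""
    digits = text[2:] if text.lower().startswith("0x") else text
    return bytes.fromhex(digits)
```

The encoding doubles the length of the identifier, but the result is safe to
pass on a command line whatever bytes the device returned.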


Mit freundlichen Grüßen / with kind regards

Martin Peschke

IBM Deutschland Entwicklung GmbH
Linux for eServer Development
Phone: +49-(0)7031-16-2349



-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 19:18 Martin Peschke3
@ 2002-04-08 20:45 ` Matthew Jacob
  2002-04-10  1:16 ` Rick Stevens
  1 sibling, 0 replies; 297+ messages in thread
From: Matthew Jacob @ 2002-04-08 20:45 UTC (permalink / raw)
  To: Martin Peschke3
  Cc: James Bottomley, Oliver Neukum, Christoph Hellwig, sullivan,
	linux-scsi

[-- Attachment #1: Type: TEXT/PLAIN; charset=X-UNKNOWN, Size: 1435 bytes --]


It's more than just a WWN- it really has to be the pair of WWNN/WWPN (node &&
port name) as you can have, e.g., a dual ported disk with the same WWNN but
different WWPNs.

On Mon, 8 Apr 2002, Martin Peschke3 wrote:

> 
> 
> > mjacob@feral.com said:
> > > Then let's use WWPN/WWNS and/or drive serial #'s to do UUIDs for
> > > devices.
> >
> > That's what the persistent binding proposal that started all this uses.
> > However, these strings are incredibly long at best and (in spite of what
> > the SCSI spec says) sometimes include non-printing characters, so you
> > don't necessarily want to be passing them on the command line.
> >
> > James
> 
> That's true of course for Fibre Channel WWNs (64 bit).
> Such identifiers might be shorter (or longer)
> depending on the considered interface.
> What could be used for good old parallel attachments?
> 
> I would suggest to circumvent problems with non-printing
> characters by means of hexadecimal notation
> (e.g. 0x5007890000a11753 representing a WWN).
> 
> 
> Mit freundlichen Grüßen / with kind regards
> 
> Martin Peschke
> 
> IBM Deutschland Entwicklung GmbH
> Linux for eServer Development
> Phone: +49-(0)7031-16-2349
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 19:18 Martin Peschke3
  2002-04-08 20:45 ` Matthew Jacob
@ 2002-04-10  1:16 ` Rick Stevens
  2002-04-10  2:01   ` Matthew Jacob
                     ` (2 more replies)
  1 sibling, 3 replies; 297+ messages in thread
From: Rick Stevens @ 2002-04-10  1:16 UTC (permalink / raw)
  To: linux-scsi

Martin Peschke3 wrote:

<much stuff snipped>

I'm going to wade in here.  Based on my experiences on many other
Unixish systems, wouldn't it be simply better to number the things
based on controller position, SCSI ID, LUN and partition?  This
has been called "CTL" format in various documents, and many systems
use this method such as Sun, DG AvIIons, DEC, HP and a host of others.

E.g. "/dev/dsk/c0t1d2s3" is controller 0 (first SCSI controller seen on
the bus), target ID 1, LUN 2, partition (slice) 3.  Simple, unambiguous
and repeatable.
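
For readers unfamiliar with the format, here is a minimal Python sketch that
splits such a name into its fields. The regular expression and the names
CtlName and parse_ctl are assumptions for illustration, not an existing tool.

```python
import re
from typing import NamedTuple, Optional


class CtlName(NamedTuple):
    controller: int
    target: int
    lun: int
    partition: Optional[int]  # None when the name has no sN suffix


_CTL_RE = re.compile(r"^c(\d+)t(\d+)d(\d+)(?:s(\d+))?$")


def parse_ctl(path: str) -> CtlName:
    """Parse a name such as /dev/dsk/c0t1d2s3 into its numeric fields."""
    name = path.rsplit("/", 1)[-1]
    m = _CTL_RE.match(name)
    if m is None:
        raise ValueError("not a CTL-style name: " + path)
    c, t, d, s = m.groups()
    return CtlName(int(c), int(t), int(d), int(s) if s is not None else None)
```

Every field is a position (controller order, target ID, LUN), so the name is
only as stable as those positions are.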

I understand that other devices may be seen on the PCI if you add or
remove cards.  Under this scheme, controller 0 is the first (lowest
slot number) unit found.  The next one would be controller 1.  Even
if you were to stuff, say, a video card in there, these would _still_
be the first SCSI cards found.  The only time a change would occur
is if a SCSI card was installed in a lower slot number or between
other SCSI controllers (and only then if the original cards were
left in) or if the ORDER of the cards was changed in the bus.

As I said, other people smarter than I seem to think it makes sense.
Why not Linux?  It's silly to smush things together just to satisfy a
bizarre craving to have a list of devices with no "holes" in it.
Besides "fsck"ing or "mkfs"ing drives, how often do you refer to them
by their names in "/dev", anyway?

This would also work for tape drives.  However, they're rooted at
/dev/stape rather than /dev/dsk.

At boot, you could create more mnemonic names as symbolic links to the
CTL names if you wish (as is done with /dev/cdrom and such), but if
you absolutely want to talk to the SAME DEVICE, you use the CTL name.

Just adding my $0.02.  We now return you to your regularly scheduled
arguments.

----------------------------------------------------------------------
- Rick Stevens, SSE, VitalStream, Inc.      rstevens@vitalstream.com -
- 949-743-2010 (Voice)                    http://www.vitalstream.com -
-                                                                    -
-          su -; find / -name someone -exec touch \{\} \;            -
-                          - The UNIX way of touching someone        -
----------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-10  1:16 ` Rick Stevens
@ 2002-04-10  2:01   ` Matthew Jacob
  2002-04-10  2:17   ` Linus Torvalds
  2002-04-10  3:37   ` Martin K. Petersen
  2 siblings, 0 replies; 297+ messages in thread
From: Matthew Jacob @ 2002-04-10  2:01 UTC (permalink / raw)
  To: Rick Stevens; +Cc: linux-scsi



On Tue, 9 Apr 2002, Rick Stevens wrote:

> Martin Peschke3 wrote:
> 
> <much stuff snipped>
> 
> I'm going to wade in here.  Based on my experiences on many other
> Unixish systems, wouldn't it be simply better to number the things
> based on controller position, SCSI ID, LUN and partition? 

Address has become a very non-persistent thing.


> has been called "CTL" format in various documents, and many systems
> use this method such as Sun, DG AvIIons, DEC, HP and a host of others.
> 
> E.g. "/dev/dsk/c0t1d2s3" is controller 0 (first SCSI controller seen on
> the bus), target ID 1, LUN 2, partition (slice) 3.  Simple, unambiguous
> and repeatable.
> 
> I understand that other devices may be seen on the PCI if you add or
> remove cards.  Under this scheme, controller 0 is the first (lowest
> slot number) unit found.  The next one would be controller 1.  Even
> if you were to stuff, say, a video card in there, these would _still_
> be the first SCSI cards found.  The only time a change would occur
> is if a SCSI card was installed in a lower slot number or between
> other SCSI controllers (and only then if the original cards were
> left in) or if the ORDER of the cards was changed in the bus.
> 
> As I said, other people smarter than I seem to think it makes sense.
> Why not Linux?  It's silly to smush things together just to satisfy a
> bizarre craving to have a list of devices with no "holes" in it.
> Besides "fsck"ing or "mkfs"ing drives, how often do you refer to them
> by their names in "/dev", anyway?
> 
> This would also work for tape drives.  However, they're rooted at
> /dev/stape rather than /dev/dsk.
> 
> At boot, you could create more mnemonic names as symbolic links to the
> CTL names if you wish (as is done with /dev/cdrom and such), but if
> you absolutely want to talk to the SAME DEVICE, you use the CTL name.
> 
> Just adding my $0.02.  We now return you to your regularly scheduled
> arguments.
> 
> ----------------------------------------------------------------------
> - Rick Stevens, SSE, VitalStream, Inc.      rstevens@vitalstream.com -
> - 949-743-2010 (Voice)                    http://www.vitalstream.com -
> -                                                                    -
> -          su -; find / -name someone -exec touch \{\} \;            -
> -                          - The UNIX way of touching someone        -
> ----------------------------------------------------------------------
> 


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-10  1:16 ` Rick Stevens
  2002-04-10  2:01   ` Matthew Jacob
@ 2002-04-10  2:17   ` Linus Torvalds
  2002-04-10  3:37   ` Martin K. Petersen
  2 siblings, 0 replies; 297+ messages in thread
From: Linus Torvalds @ 2002-04-10  2:17 UTC (permalink / raw)
  To: linux-scsi

In article <3CB39285.8000609@vitalstream.com>,
Rick Stevens  <rstevens@vitalstream.com> wrote:
>
> [ Position-based naming ]
>
>As I said, other people smarter than I seem to think it makes sense.

No, those people were probably not smarter than you. In fact, they are
likely complete morons.

Position means nothing.  Anybody who bases naming on position is just
bending over and waiting for it when it comes to hotplugging etc.  It
solves none of the problems it is claimed to solve (ie the names are
_not_ constant), and it fundamentally locks you into a mindset that
simply isn't true. 

>Why not Linux?  It's silly to smush things together just to satisfy a
>bizarre craving to have a list of devices with no "holes" in it.

No, it's silly to think that you can enumerate the address space: you
can't. The only thing you get from trying is a horrible mess in /dev
that adds zero information anywhere.

You're much better off just enumerating your SCSI devices the trivial
way (ie 0, 1, 2 ....  completely independent on position - each device
gets one unique number that has no inherent meaning and is UNDERSTOOD to
have no inherent meaning, only an ID) and then having a way to query
their attributes.  That way you can use the attributes (which are _NOT_
identities) to create whatever "convenient mapping" you want. 

The position of a drive is not its name, it's just one random attribute
in a sea of other random attributes.  Depending on what you do, it might
be a useful one, but it's equally likely that it is completely
meaningless.  Trying to make it inherently meaningful is just *wrong*.
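
One way to picture the scheme described above, as a hedged Python sketch:
devices carry meaningless numbers, and a "convenient mapping" is built from
whichever attribute the administrator cares about. The function name and the
attribute keys are hypothetical.

```python
from typing import Dict, Hashable

Attributes = Dict[str, Hashable]


def build_aliases(devices: Dict[int, Attributes],
                  attribute: str) -> Dict[Hashable, int]:
    """Map each value of one attribute to the opaque device number."""
    return {attrs[attribute]: dev_id
            for dev_id, attrs in devices.items()
            if attribute in attrs}
```

If two devices swap numbers across a reboot, a mapping rebuilt by serial
number still leads to the same physical disks, while a mapping rebuilt by
position follows whatever now sits at that position.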

		Linus

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-10  1:16 ` Rick Stevens
  2002-04-10  2:01   ` Matthew Jacob
  2002-04-10  2:17   ` Linus Torvalds
@ 2002-04-10  3:37   ` Martin K. Petersen
  2002-04-10 13:19     ` Theodore Tso
  2 siblings, 1 reply; 297+ messages in thread
From: Martin K. Petersen @ 2002-04-10  3:37 UTC (permalink / raw)
  To: linux-scsi

Rick> E.g. "/dev/dsk/c0t1d2s3" is controller 0 (first SCSI controller
Rick> seen on the bus), target ID 1, LUN 2, partition (slice) 3.
Rick> Simple, unambiguous and repeatable.

You plug in another SCSI controller and then what happens?  Or one of
your existing controllers goes up in blue smoke?

What we want is (at least) three ways of addressing a device:

1. By content.  This is the persistent naming.  Think
   filesystem/MD/LVM UUID.  This is what you put in /etc/fstab and
   what metadisk systems use to assemble logical volumes.

   Content referencing is used for accessing data.

2. By physical path.  This naming is not persistent, not even at runtime,
   because hotplug, iSCSI and whatnot may mess things up.

   Path naming is for discovery and recovery.  When you add an
   unlabeled disk you want to reference it by path to give it a name.
   When you have a failed disk on your system you want to know which
   physical device to pull from the array.

3. By enumeration.  This is what the kernel happens to be using to
   reference the device.  diskN.  Certainly not persistent.

   Enumeration is for the kernel.

Now, some magic needs to be involved at discovery time because we
can't rely on either content addressing or path addressing alone
to do the right thing.  And why is that?

Well, imagine you have a set of mirrored disks.  If you only address
by content, you will find two devices that match (or three if MD
has done autodetection).  What you really want to mount is the MD
device.

Multipathing has a similar problem.  You'll see two copies of the same
filesystem/UUID.  And again you really want the metadevice to be
mounted.

So not only is your proposal a bad idea for persistency reasons.  The
scheme does not carry enough information to do what is needed.

-- 
Martin K. Petersen, Principal Linux Consultant, Linuxcare, Inc.
mkp@linuxcare.com, http://www.linuxcare.com/
SGI XFS for Linux Developer, http://oss.sgi.com/projects/xfs/

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-10  3:37   ` Martin K. Petersen
@ 2002-04-10 13:19     ` Theodore Tso
  2002-04-10 14:04       ` Eddie Williams
  0 siblings, 1 reply; 297+ messages in thread
From: Theodore Tso @ 2002-04-10 13:19 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi

On Tue, Apr 09, 2002 at 11:37:44PM -0400, Martin K. Petersen wrote:
> Well, imagine you have a set of mirrored disks.  If you only address
> by content, you will find two devices that match (or three if MD
> has done autodetection).  What you really want to mount is the MD
> device.
> 
> Multipathing has a similar problem.  You'll see two copies of the same
> filesystem/UUID.  And again you really want the metadevice to be
> mounted.

Note that fsck (and I think mount, but I definitely know about fsck
because I maintain it) solves the above problem by searching the MD
devices first.  Once the multipathing becomes stable, again the right
answer for fsck is to search the metadevice first.

Andreas Dilger has written a block-device-id library which is really
promising, and the goal is to abstract out the search
algorithms currently used by fsck and mount into a single library.
Said library will also cache the location of filesystems by UUID and
label name, and when a particular block device corresponding to
UUID=XXX or LABEL=YYY is requested, it can look up the location in the
cache, validate it to make sure that someone hasn't inserted or
removed a SCSI controller card, and then return the answer.  If the
validation fails, the library will then do a full sweep of all of the
known devices (again, doing the metadevices --- LVM, MD, et al. ---
first, to deal with the problem you listed).
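
The lookup order described above (cache, validation, then a full sweep with
metadevices first) can be sketched in Python. This is not the API of the
library mentioned; find_by_uuid, the probe callback and the device list
format are assumptions for illustration.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def find_by_uuid(uuid: str,
                 cache: Dict[str, str],
                 probe: Callable[[str], Optional[str]],
                 devices: Iterable[Tuple[str, bool]]) -> Optional[str]:
    """Return the device path holding uuid, or None.

    probe(path) returns the UUID found on that device (or None);
    devices yields (path, is_metadevice) pairs.
    """
    cached = cache.get(uuid)
    if cached is not None and probe(cached) == uuid:
        return cached  # cache entry validated
    # Full sweep: stable sort puts metadevices (LVM, MD) first.
    for path, _is_meta in sorted(devices, key=lambda d: not d[1]):
        if probe(path) == uuid:
            cache[uuid] = path
            return path
    cache.pop(uuid, None)  # drop a stale entry
    return None
```

Because the sweep visits metadevices first, a mirrored filesystem resolves
to the MD device rather than to one of its component disks.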

This basically deals with all of the problems except for swap (which
doesn't have a UUID --- but it shouldn't be too hard to add a UUID
into the swap signature, and then teach swapon how to use it), and the
problem of initializing new disks before they have UUIDs attached to
them.

That is indeed a hard problem, and ultimately, the thing that will
make this easier is if there's some way we can read out a
drive-specific serial number out of the ATAPI or SCSI interfaces.  I
don't know if disk serial numbers are commonly supported by disk
manufacturers, but hopefully the newer disks have something like that
we can use.  Without it, though, the problem is very, very hard if you
want to make it 100% foolproof --- and hence, something which
civilians (i.e., stupid users) can use.

						- Ted

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-10 13:19     ` Theodore Tso
@ 2002-04-10 14:04       ` Eddie Williams
  2002-04-10 17:45         ` Mike Anderson
  0 siblings, 1 reply; 297+ messages in thread
From: Eddie Williams @ 2002-04-10 14:04 UTC (permalink / raw)
  To: Theodore Tso, Martin K. Petersen; +Cc: linux-scsi

On Wednesday 10 April 2002 09:19 am, Theodore Tso wrote:

> That is indeed a hard problem, and ultimately, the thing that will
> make this easier is if there's some way we can read out a
> drive-specific serial number out of the ATAPI or SCSI interfaces.  I
> don't know if disk serial numbers are commonly supported by disk
> manufacturers, but hopefully the newer disks have something like that
> we can use.  Without it, though, the problem is very, very hard if you
> want to make it 100% foolproof --- and hence, something which
> civilians (i.e., stupid users) can use.

While I enjoy bashing stupid users I would like to point out that persistent
naming, while it makes life easier for stupid users, has tremendous
benefits for all of us.

We (the product I work on) have used serial numbers for a long time to 
uniquely identify devices and this improves the ease of use tremendously.  In 
a cluster this is really critical or one can easily try to bring an 
application in-service on a wrong device with catastrophic results.

Eddie

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-10 14:04       ` Eddie Williams
@ 2002-04-10 17:45         ` Mike Anderson
  0 siblings, 0 replies; 297+ messages in thread
From: Mike Anderson @ 2002-04-10 17:45 UTC (permalink / raw)
  To: Eddie Williams; +Cc: Theodore Tso, Martin K. Petersen, linux-scsi

Eddie Williams [Eddie.Williams@steeleye.com] wrote:
> On Wednesday 10 April 2002 09:19 am, Theodore Tso wrote:
> 
> > That is indeed a hard problem, and ultimately, the thing that will
> > make this easier is if there's some way we can read out a
> > drive-specific serial number out of the ATAPI or SCSI interfaces.  I
> > don't know if disk serial numbers are commonly supported by disk
> > manufacturers, but hopefully the newer disks have something like that
> > we can use.  Without it, though, the problem is very, very hard if you
> > want to make it 100% foolproof --- and hence, something which
> > civilians (i.e., stupid users) can use.
> 
> While I enjoy bashing stupid users I would like to point out that persistent
> naming, while it makes life easier for stupid users, has tremendous
> benefits for all of us.
> 
> We (the product I work on) have used serial numbers for a long time to 
> uniquely identify devices and this improves the ease of use tremendously.  In 
> a cluster this is really critical or one can easily try to bring an 
> application in-service on a wrong device with catastrophic results.

Agreed,

While the immutability / uniqueness of uuids across different name spaces
may vary depending on device capability, they are still useful. Many have
been obtaining IDs from devices for years by taking the most unique ID
the device can provide and then ensuring that name spaces do not collide
(i.e. go for page 0x83, then 0x80, vendor-product-serial,
vendor unique).
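
A minimal Python sketch of that fallback order. The dictionary keys and the
"p83:", "p80:" and "vps:" prefixes are invented here to show one way of
keeping the name spaces apart; the vendor-unique step is left out.

```python
from typing import Mapping, Optional


def best_device_id(info: Mapping[str, str]) -> Optional[str]:
    """Pick the most unique ID available, tagged with its name space."""
    if info.get("page83"):
        return "p83:" + info["page83"]
    if info.get("page80"):
        return "p80:" + info["page80"]
    parts = [info.get(k, "").strip() for k in ("vendor", "product", "serial")]
    if all(parts):
        return "vps:" + "-".join(parts)
    return None  # a vendor-unique method would be tried here
```

The prefix ensures that a serial number from page 0x80 can never be confused
with an identical string that came from page 0x83.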

We have also found in the past that when users make complete copies of
their volumes (for the purpose of hot-backup), a uuid is one of the
few methods that can be used to tell the clone from the original.


-Mike
-- 
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
@ 2002-04-08 22:05 Martin Peschke3
  2002-04-08 22:17 ` Matthew Jacob
  0 siblings, 1 reply; 297+ messages in thread
From: Martin Peschke3 @ 2002-04-08 22:05 UTC (permalink / raw)
  To: mjacob
  Cc: James Bottomley, Oliver Neukum, Christoph Hellwig, sullivan,
	linux-scsi

> It's more than just a WWN- it really has to be
> the pair of WWNN/WWPN (node && port name) as you
> can have, e.g., a dual ported disk with the same WWNN but
> different WWPNs.

Ok, I was thinking of the WWPN and assumed it would be sufficient to identify
the target side of a link. Isn't it?
Do you really need the WWNN/WWPN pair, i.e. might identical WWPNs
exist under different WWNNs?


Mit freundlichen Grüßen / with kind regards

Martin Peschke

IBM Deutschland Entwicklung GmbH
Linux for eServer Development
Phone: +49-(0)7031-16-2349


Matthew Jacob <mjacob@feral.com>@vger.kernel.org on 04/08/2002 10:45:33 PM

Please respond to mjacob@feral.com

Sent by:    linux-scsi-owner@vger.kernel.org


To:    Martin Peschke3/Germany/IBM@IBMDE
cc:    James Bottomley <James.Bottomley@steeleye.com>, Oliver Neukum
       <oliver@neukum.org>, Christoph Hellwig <hch@infradead.org>, sullivan
       <sullivan@austin.ibm.com>, linux-scsi@vger.kernel.org
Subject:    Re: [RFC] Persistent naming of scsi devices




It's more than just a WWN- it really has to be the pair of WWNN/WWPN (node &&
port name) as you can have, e.g., a dual ported disk with the same WWNN but
different WWPNs.

On Mon, 8 Apr 2002, Martin Peschke3 wrote:

>
>
> > mjacob@feral.com said:
> > > Then let's use WWPN/WWNS and/or drive serial #'s to do UUIDs for
> > > devices.
> >
> > That's what the persistent binding proposal that started all this uses.
> > However, these strings are incredibly long at best and (in spite of what
> > the SCSI spec says) sometimes include non-printing characters, so you
> > don't necessarily want to be passing them on the command line.
> >
> > James
>
> That's true of course for Fibre Channel WWNs (64 bit).
> Such identifiers might be shorter (or longer)
> depending on the considered interface.
> What could be used for good old parallel attachments?
>
> I would suggest to circumvent problems with non-printing
> characters by means of hexadecimal notation
> (e.g. 0x5007890000a11753 representing a WWN).
>
>
> Mit freundlichen Grüßen / with kind regards
>
> Martin Peschke
>
> IBM Deutschland Entwicklung GmbH
> Linux for eServer Development
> Phone: +49-(0)7031-16-2349
>
>
>
>


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-08 22:05 Martin Peschke3
@ 2002-04-08 22:17 ` Matthew Jacob
  0 siblings, 0 replies; 297+ messages in thread
From: Matthew Jacob @ 2002-04-08 22:17 UTC (permalink / raw)
  To: Martin Peschke3
  Cc: James Bottomley, Oliver Neukum, Christoph Hellwig, sullivan,
	linux-scsi

On Tue, 9 Apr 2002, Martin Peschke3 wrote:

> > It's more than just a WWN- it really has to be
> > the pair of WWNN/WWPN (node && port name) as you
> > can have, e.g., a dual ported disk with the same WWNN but
> > different WWPNs.
> 
> Ok, I was thinking of the WWPN and assumed it would be sufficient to identify
> the target side of a link. Isn't it? Do you really need the WWNN/WWPN pair,
> i.e. might identical WWPNs exist under different WWNNs?

If the numbering is NAA=2 (2 in the top nibble), you can always derive the
WWNN from the WWPN. But that's only true for NAA=2.

In *practice*, WWPN might be enough,  but if we're doing things that formally
and fully ID a device, and considering that iSCSI identifiers are 255 bytes or
better, two 64 bit ids for Fibre Channel devices is not unreasonable to allow
for.
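
A short Python sketch of the two points above: reading the NAA value from
the top nibble of a 64-bit name, and keeping WWNN and WWPN together as one
key. The type and function names are hypothetical, and the name values used
with them are made up.

```python
from typing import NamedTuple


class FcName(NamedTuple):
    """Full Fibre Channel identity: two 64-bit names."""
    wwnn: int  # node name
    wwpn: int  # port name


def naa(wwn: int) -> int:
    """Return the NAA value, the top 4 bits of a 64-bit WWN."""
    return (wwn >> 60) & 0xF
```

A dual ported disk then shows up as two FcName values that share the wwnn
field and differ in wwpn, which a single 64-bit key could not express.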

-matt

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
@ 2002-04-10  1:40 Bryan Henderson
  0 siblings, 0 replies; 297+ messages in thread
From: Bryan Henderson @ 2002-04-10  1:40 UTC (permalink / raw)
  To: Rick Stevens; +Cc: linux-scsi

>E.g. "/dev/dsk/c0t1d2s3" is controller 0 (first SCSI controller seen on
>the bus), target ID 1, LUN 2, partition (slice) 3.  Simple, unambiguous
>and repeatable.

For many purposes, you're right, and for that reason devfs already does 
this.  However, in some cases these are not repeatable.  A volume that is 
LUN 5 today could be LUN 7 tomorrow.  This is true in heavy-duty disk 
storage subsystems that let you create and destroy logical units on the 
fly, using the physical medium as a resource pool.  And what is LUN 5 for 
me could be LUN 7 for you at the same time, if we're both wired into the 
same storage subsystem.

^ permalink raw reply	[flat|nested] 297+ messages in thread

* RE: [RFC] Persistent naming of scsi devices
@ 2002-04-10 14:28 berthiaume_wayne
  0 siblings, 0 replies; 297+ messages in thread
From: berthiaume_wayne @ 2002-04-10 14:28 UTC (permalink / raw)
  To: rstevens; +Cc: linux-scsi

	Actually, it would be easier to control if it uses its actual PCI
slot number, for example c2 for the second slot, as opposed to c2 for the
second SCSI controller. This would be easier to implement and would be
"persistent" as long as the controller is not moved. Doing it in the manner
you suggest would not provide persistence but revert to the order of
discovery we already have with a different naming convention. Doesn't seem
like a wise alternative. The intention of the CTDL nomenclature is to
provide precise device targeting without the BS mapping to some arbitrarily
named device table. In DG/AViiON, more precisely DG/UX, it was CTDL based
but then mapped to a device mnemonic in the devtable. Devices were named in
the system config file by their long name (CTDL) and linked in the devtable
to its short name, for example /dev/pdsk/sd0 for the buffered SCSI disk and
/dev/rpdsk/sd0 for the raw SCSI disk. We used different SCSI controllers but
this didn't present a problem because they were placed in the kernel config
file. If the Ciprico was in slot 0 and in the kernel's spec file, its long
name was entered into the devtable and linked at boot to the short name
/dev/pdsk/sd0. Conversely, if the Adaptec was in slot 0 and configured into
the kernel spec file, its long name would be entered into the devtable and
linked to the /dev/pdsk/sd0 short name.
Wayne
EMC Corp
Centera Engineering
4400 Computer Drive
M/S F213
Westboro,  MA    01580

email:       Berthiaume_Wayne@emc.com

"One man can make a difference, and every man should try."  - JFK

-----Original Message-----
From: Rick Stevens [mailto:rstevens@vitalstream.com]
Sent: Tuesday, April 09, 2002 9:17 PM
To: linux-scsi@vger.kernel.org
Subject: Re: [RFC] Persistent naming of scsi devices

Martin Peschke3 wrote:

<much stuff snipped>

I'm going to wade in here.  Based on my experiences on many other
Unixish systems, wouldn't it be simply better to number the things
based on controller position, SCSI ID, LUN and partition?  This
has been called "CTL" format in various documents, and many systems
use this method such as Sun, DG AvIIons, DEC, HP and a host of others.

E.g. "/dev/dsk/c0t1d2s3" is controller 0 (first SCSI controller seen on
the bus), target ID 1, LUN 2, partition (slice) 3.  Simple, unambiguous
and repeatable.

I understand that other devices may be seen on the PCI if you add or
remove cards.  Under this scheme, controller 0 is the first (lowest
slot number) unit found.  The next one would be controller 1.  Even
if you were to stuff, say, a video card in there, these would _still_
be the first SCSI cards found.  The only time a change would occur
is if a SCSI card was installed in a lower slot number or between
other SCSI controllers (and only then if the original cards were
left in) or if the ORDER of the cards was changed in the bus.

As I said, other people smarter than I seem to think it makes sense.
Why not Linux?  It's silly to smush things together just to satisfy a
bizarre craving to have a list of devices with no "holes" in it.
Besides "fsck"ing or "mkfs"ing drives, how often do you refer to them
by their names in "/dev", anyway?

This would also work for tape drives.  However, they're rooted at
/dev/stape rather than /dev/dsk.

At boot, you could create more mnemonic names as symbolic links to the
CTL names if you wish (as is done with /dev/cdrom and such), but if
you absolutely want to talk to the SAME DEVICE, you use the CTL name.

Just adding my $0.02.  We now return you to your regularly scheduled
arguments.

----------------------------------------------------------------------
- Rick Stevens, SSE, VitalStream, Inc.      rstevens@vitalstream.com -
- 949-743-2010 (Voice)                    http://www.vitalstream.com -
-                                                                    -
-          su -; find / -name someone -exec touch \{\} \;            -
-                          - The UNIX way of touching someone        -
----------------------------------------------------------------------

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 297+ messages in thread

* RE: [RFC] Persistent naming of scsi devices
@ 2002-04-10 14:36 berthiaume_wayne
  2002-04-10 16:02 ` Matthew Jacob
  0 siblings, 1 reply; 297+ messages in thread
From: berthiaume_wayne @ 2002-04-10 14:36 UTC (permalink / raw)
  To: hbryan; +Cc: linux-scsi

	This is true with the design of today's virtual LUNs on the storage
arrays. However, I believe that if you are sharing a VLUN on the array the
same WWNN/WWPN is assigned to it. Though it may be represented to you as
VLUN 1 in your virtual storage system and VLUN 6 in mine I thought the same
WWNN/WWPN was assigned to it.

-----Original Message-----
From: Bryan Henderson [mailto:hbryan@us.ibm.com]
Sent: Tuesday, April 09, 2002 9:40 PM
To: Rick Stevens
Cc: linux-scsi@vger.kernel.org
Subject: Re: [RFC] Persistent naming of scsi devices


>E.g. "/dev/dsk/c0t1d2s3" is controller 0 (first SCSI controller seen on
>the bus), target ID 1, LUN 2, partition (slice) 3.  Simple, unambiguous
>and repeatable.

For many purposes, you're right, and for that reason devfs already does 
this.  However, in some cases these are not repeatable.  A volume that is 
LUN 5 today could be LUN 7 tomorrow.  This is true in heavy-duty disk 
storage subsystems that let you create and destroy logical units on the 
fly, using the physical medium as a resource pool.  And what is LUN 5 for 
me could be LUN 7 for you at the same time, if we're both wired into the 
same storage subsystem.

^ permalink raw reply	[flat|nested] 297+ messages in thread

* RE: [RFC] Persistent naming of scsi devices
  2002-04-10 14:36 berthiaume_wayne
@ 2002-04-10 16:02 ` Matthew Jacob
  0 siblings, 0 replies; 297+ messages in thread
From: Matthew Jacob @ 2002-04-10 16:02 UTC (permalink / raw)
  To: berthiaume_wayne; +Cc: hbryan, linux-scsi


That's why you always trail the LUN off of the WWNN/WWPN; see Solaris' ssd
device names.


On Wed, 10 Apr 2002 berthiaume_wayne@emc.com wrote:

> 	This is true with the design of today's virtual LUNs on the storage
> arrays. However, I believe that if you are sharing a VLUN on the array the
> same WWNN/WWPN is assigned to it. Though it may be represented to you as
> VLUN 1 in your virtual storage system and VLUN 6 in mine I thought the same
> WWNN/WWPN was assigned to it.
> 
> -----Original Message-----
> From: Bryan Henderson [mailto:hbryan@us.ibm.com]
> Sent: Tuesday, April 09, 2002 9:40 PM
> To: Rick Stevens
> Cc: linux-scsi@vger.kernel.org
> Subject: Re: [RFC] Persistent naming of scsi devices
> 
> 
> >E.g. "/dev/dsk/c0t1d2s3" is controller 0 (first SCSI controller seen on
> >the bus), target ID 1, LUN 2, partition (slice) 3.  Simple, unambiguous
> >and repeatable.
> 
> For many purposes, you're right, and for that reason devfs already does 
> this.  However, in some cases these are not repeatable.  A volume that is 
> LUN 5 today could be LUN 7 tomorrow.  This is true in heavy-duty disk 
> storage subsystems that let you create and destroy logical units on the 
> fly, using the physical medium as a resource pool.  And what is LUN 5 for 
> me could be LUN 7 for you at the same time, if we're both wired into the 
> same storage subsystem.
> 


^ permalink raw reply	[flat|nested] 297+ messages in thread

* RE: [RFC] Persistent naming of scsi devices
@ 2002-04-10 15:28 Bryan Henderson
  0 siblings, 0 replies; 297+ messages in thread
From: Bryan Henderson @ 2002-04-10 15:28 UTC (permalink / raw)
  To: berthiaume_wayne; +Cc: linux-scsi

>I believe that if you are sharing a VLUN on the array the
>same WWNN/WWPN is assigned to it. Though it may be represented to you as
>VLUN 1 in your virtual storage system and VLUN 6 in mine I thought the
>same WWNN/WWPN was assigned to it.

Yes, every system I've seen does that and it seems to be the intent (for
whatever that's worth) of the fibre channel architects.  But how is that
relevant?  We're talking about uniquely identifying a logical unit (which
is what would appear in Linux as a device).

I don't know what a VLUN is, so maybe I'm just lost.  Isn't that redundant?
Virtual and logical are the same thing.

Leave the WWPN (worldwide port name) out of it, by the way.  That doesn't
help identify devices, just ports that can be used to reach them.

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
@ 2002-04-10 15:52 Martin Peschke3
  2002-04-10 19:33 ` Matthew Jacob
  0 siblings, 1 reply; 297+ messages in thread
From: Martin Peschke3 @ 2002-04-10 15:52 UTC (permalink / raw)
  To: mjacob
  Cc: James Bottomley, Oliver Neukum, Christoph Hellwig, sullivan,
	linux-scsi

Matt,

> > > It's more than just a WWN- it really has to be
> > > the pair of WWNN/WWPN (node && port name) as you
> > > can have, e.g., a dual ported disk with the same WWNN but
> > > different WWPNs.
> >
> > Ok, I thought of WWPN and that it would be sufficient to identify the
> > target side of a link, isn't it? Do you really need a pair of
> > WWNN/WWPN, i.e. identical WWPNs might exist considering different
> > WWNNs?
>
> If the numbering is NAA=2 (2 in the top nibble), you can always derive
> the WWNN from the WWPN. But that's only true for NAA=2.
>
> In *practice*, WWPN might be enough,  but if we're doing things that
> formally and fully ID a device, and considering that iSCSI identifiers
> are 255 bytes or better, two 64 bit ids for Fibre Channel devices is not
> unreasonable to allow for.
>
> -matt

If there are cases which require WWNN+WWPN to identify an
FC port, then you would need this pair to query an FC port's
D_ID. I wonder why this is not reflected in the FC name server
specification, FC-GS-3 (see GID_PN for example).


Mit freundlichen Grüßen / with kind regards

Martin Peschke

IBM Deutschland Entwicklung GmbH
Linux for eServer Development
Phone: +49-(0)7031-16-2349


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-04-10 15:52 Martin Peschke3
@ 2002-04-10 19:33 ` Matthew Jacob
  0 siblings, 0 replies; 297+ messages in thread
From: Matthew Jacob @ 2002-04-10 19:33 UTC (permalink / raw)
  To: Martin Peschke3
  Cc: James Bottomley, Oliver Neukum, Christoph Hellwig, sullivan,
	linux-scsi

> If there are cases which require WWNN+WWPN to identify an FC port, then
> you would need this pair to query an FC port's D_ID. I wonder why this is
> not reflected in the FC name server specification, FC-GS-3 (see GID_PN
> for example).

Good question.

But I believe that in sending a CT frame to the name server with PortID XXXXXX
to ask it what the Port Name is or the Node Name, all you're doing is to ask
it to use that PortID as a search key and return the Port Name. This hasn't
got all that much to do with initiators needing to use WWNN+WWPN (currently
bound with D_ID such and such) to uniquely, and persistently, identify a
device.

I mean, to use FC-GS-2 or FC-GS-3, you do something like

GID_FT
(gimme a list of Port IDs on the fabric for the FC4 type I want to know about)

For each member of the returned list do:

	GPN_ID (get its port name)

	GNN_ID (get its node name)

	optionally: GFF_ID (get its FC4 features. For FC-SCSI this gets
	you Initiator/Target support roles)
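The query sequence above could be sketched roughly as follows in Python. Nothing here talks to a real fabric: `FakeNameServer` is an in-memory stand-in invented for this example, its method names merely mirror the CT request names, and the port IDs and 64-bit names in the usage lines are made-up values.

```python
class FakeNameServer:
    """In-memory stand-in for a fabric name server (hypothetical)."""

    def __init__(self, ports):
        # ports: {port_id: {"wwpn": int, "wwnn": int, "fc4": str}}
        self._ports = ports

    def gid_ft(self, fc4_type):
        """Port IDs registered for the given FC4 type."""
        return sorted(pid for pid, rec in self._ports.items()
                      if rec["fc4"] == fc4_type)

    def gpn_id(self, port_id):
        """Port name for one port ID."""
        return self._ports[port_id]["wwpn"]

    def gnn_id(self, port_id):
        """Node name for one port ID."""
        return self._ports[port_id]["wwnn"]


def discover(name_server, fc4_type):
    """GID_FT first, then GPN_ID and GNN_ID for each returned port ID."""
    found = []
    for port_id in name_server.gid_ft(fc4_type):
        wwpn = name_server.gpn_id(port_id)
        wwnn = name_server.gnn_id(port_id)
        found.append((port_id, wwnn, wwpn))
    return found


# Made-up fabric: a dual-ported disk (one node name, two port names)
# plus one port of a different FC4 type that the query should skip.
ns = FakeNameServer({
    0x010200: {"wwpn": 0x2100000000000001, "wwnn": 0x2000000000000001, "fc4": "scsi-fcp"},
    0x010300: {"wwpn": 0x2200000000000001, "wwnn": 0x2000000000000001, "fc4": "scsi-fcp"},
    0x010400: {"wwpn": 0x2100000000000002, "wwnn": 0x2000000000000002, "fc4": "ip"},
})
devices = discover(ns, "scsi-fcp")
```

Each port ID costs two extra round trips in this scheme, which is the overhead a combined query would avoid.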

You would think that there would be one CT command to return *both* Port and
Node names- and there is (it's part of the return data at least for GA_NXT).

It's my ill informed opinion that this design is a result of a difference of
opinion about the usages of WWPN/WWNN within the t10 committee. I believe I've
heard one of the members express regret over having separate numbers.

Be that as it may be, we *know* we have cases where you *must* use both the
WWNN and WWPN to *uniquely* ID the device. If I have just about any FC disk
drive, it is going to have a common WWNN but different WWPNs depending on
whether you talk to it via FC_AL Port A or Port B connections. If I have the
drive in a JMR JBOD with both Port A and Port B connectors, and connect both
to HBAs from a system (e.g., to provide warm failover), I really need *both*
WWPN and WWNN to keep track of which drive is which drive.

I *might* want to treat them as the same (e.g., for transparent failover), but
this should be under control, not happenstance depending on whether I decide
to use just one of WWNN or WWPN to ID devices.
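A small Python sketch of that choice, assuming each path to a drive is recorded as a (WWNN, WWPN, HBA) tuple. The function name `group_paths`, the `merge_ports` switch and the sample values are all invented for illustration.

```python
from collections import defaultdict


def group_paths(paths, merge_ports=False):
    """Group HBA paths by device identity.

    By default the key is the (WWNN, WWPN) pair, so Port A and Port B of
    a dual-ported drive stay distinct. With merge_ports=True the key is
    the WWNN alone, which deliberately folds both ports into one device
    (for example for transparent failover).
    """
    groups = defaultdict(list)
    for wwnn, wwpn, hba in paths:
        key = (wwnn,) if merge_ports else (wwnn, wwpn)
        groups[key].append(hba)
    return dict(groups)


# One drive in a JBOD, reachable through Port A on hba0 and Port B on hba1.
paths = [
    (0x2000000000000001, 0x2100000000000001, "hba0"),
    (0x2000000000000001, 0x2200000000000001, "hba1"),
]
distinct = group_paths(paths)
merged = group_paths(paths, merge_ports=True)
```

Making the merge an explicit argument keeps the decision under the caller's control rather than a side effect of which identifier happened to be used as the key.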

-matt

^ permalink raw reply	[flat|nested] 297+ messages in thread

* RE: [RFC] Persistent naming of scsi devices
@ 2002-04-10 16:44 berthiaume_wayne
  0 siblings, 0 replies; 297+ messages in thread
From: berthiaume_wayne @ 2002-04-10 16:44 UTC (permalink / raw)
  To: hbryan; +Cc: linux-scsi

	VLUN = virtual LUN   

	This would be the number assigned by the storage system. A storage
system can be broken up into multiple virtual storage systems each unique to
the owner. The real trick is when two virtual storage systems share the same
LUN. Depending on the order of entry into the virtual storage array on a
CLARiiON at the time it is being configured, it could be LUN 1 in your
virtual storage and LUN 6 in mine.

^ permalink raw reply	[flat|nested] 297+ messages in thread

* RE: [RFC] Persistent naming of scsi devices
@ 2002-04-10 19:02 Martin Peschke3
  0 siblings, 0 replies; 297+ messages in thread
From: Martin Peschke3 @ 2002-04-10 19:02 UTC (permalink / raw)
  To: berthiaume_wayne; +Cc: rstevens, linux-scsi

What about non-PCI SCSI adapters?

Another idea regarding a topology oriented naming scheme:

replace     /dev/scsi/host<index>/...
by          /dev/scsi/<hba driver name>/<unique id>/...

Both <hba driver name> and <unique id> are available for each
HBA in the mid-layer's data today.
This proposal assumes that a particular HBA is driven by
not more than one HBA driver at the same time. Besides,
<unique id> (provided by HBA driver) should be persistent
in the scope of a particular HBA driver. You might use PCI
slot or any other architecture dependent characteristic.
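To make the proposed layout concrete, here is a minimal Python sketch that builds such a path. The helper name `hba_path`, the driver name `examplehba` and the unique id `slot3` are placeholders for this example, not names taken from any real driver.

```python
def hba_path(driver_name, unique_id, *rest):
    """Build /dev/scsi/<hba driver name>/<unique id>/... from its parts.

    Each component must be non-empty and must not contain a slash, so
    that one component can never spill into the next directory level.
    """
    parts = [driver_name, unique_id, *rest]
    for part in parts:
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return "/".join(["/dev/scsi", *parts])


# Placeholder driver name and id; a driver might derive the id from a PCI slot.
example = hba_path("examplehba", "slot3", "bus0", "target1", "lun2")
```

Because the driver name scopes the id, two drivers are free to use the same `<unique id>` string without colliding.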

Mit freundlichen Grüßen / with kind regards

Martin Peschke

IBM Deutschland Entwicklung GmbH
Linux for eServer Development
Phone: +49-(0)7031-16-2349

berthiaume_wayne@emc.com@vger.kernel.org on 04/10/2002 04:28:33 PM

Please respond to berthiaume_wayne@emc.com

Sent by:    linux-scsi-owner@vger.kernel.org

To:    rstevens@vitalstream.com
cc:    linux-scsi@vger.kernel.org
Subject:    RE: [RFC] Persistent naming of scsi devices

 Actually, it would be easier to control if it uses its actual PCI
slot number, for example c2 for the second slot, as opposed to c2 for the
second SCSI controller. This would be easier to implement and would be
"persistent" as long as the controller is not moved. Doing it in the manner
you suggest would not provide persistence but revert to the order of
discovery we already have, with a different naming convention. That doesn't
seem like a wise alternative. The intention of the CTDL nomenclature is to
provide precise device targeting without the BS mapping to some arbitrarily
named device table. In DG/AViiON, more precisely DG/UX, it was CTDL based
but then mapped to a device mnemonic in the devtable. Devices were named in
the system config file by their long name (CTDL) and linked in the devtable
to their short name, for example /dev/pdsk/sd0 for the buffered SCSI disk
and /dev/rpdsk/sd0 for the raw SCSI disk. We used different SCSI
controllers, but this didn't present a problem because they were placed in
the kernel config file. If the Ciprico was in slot 0 and in the kernel's
spec file, its long name was entered into the devtable and linked at boot
to the short name /dev/pdsk/sd0. Conversely, if the Adaptec was in slot 0
and configured into the kernel spec file, its long name would be entered
into the devtable and linked to the /dev/pdsk/sd0 short name.
Wayne
EMC Corp
Centera Engineering
4400 Computer Drive
M/S F213
Westboro,  MA    01580

email:       Berthiaume_Wayne@emc.com

"One man can make a difference, and every man should try."  - JFK

^ permalink raw reply	[flat|nested] 297+ messages in thread

* RE: [RFC] Persistent naming of scsi devices
@ 2002-04-10 20:24 berthiaume_wayne
  0 siblings, 0 replies; 297+ messages in thread
From: berthiaume_wayne @ 2002-04-10 20:24 UTC (permalink / raw)
  To: MPESCHKE; +Cc: rstevens, linux-scsi

	True, there is a hole in the PCI slot logic - onboard SCSI chips for
instance. Using /dev/scsi/<hba driver name>/<unique id>/... does have its
merits as it would lend a quick physical identification, if one can remember
the driver's name for a particular HBA. =;^)

^ permalink raw reply	[flat|nested] 297+ messages in thread

* RE: [RFC] Persistent naming of scsi devices
@ 2002-04-11 16:01 Bryan Henderson
  0 siblings, 0 replies; 297+ messages in thread
From: Bryan Henderson @ 2002-04-11 16:01 UTC (permalink / raw)
  To: berthiaume_wayne; +Cc: linux-scsi

>                VLUN = virtual LUN 
>
>This would be the number assigned by the storage system. A storage
>system can be broken up into multiple virtual storage systems each unique
>to the owner. The real trick is when two virtual storage systems share the
>same LUN.

This VLUN sounds like what SCSI/FCP defines as a LUN.  What you're calling 
a LUN is apparently something that's not defined in SCSI.  I'm guessing a 
LUN here identifies a volume, and the same volume is connected to multiple 
SCSI logical units, one in each virtual storage subsystem.  I'm also 
assuming that a virtual storage subsystem is a target device/node in SCSI 
terms.

That raises the question:  What do you get if you INQUIRE as to the 
logical unit serial number of two SCSI logical units, in different virtual 
storage subsystems, associated with the same volume?  Is it the same 
serial number?

^ permalink raw reply	[flat|nested] 297+ messages in thread

* RE: [RFC] Persistent naming of scsi devices
@ 2002-04-12 13:15 berthiaume_wayne
  0 siblings, 0 replies; 297+ messages in thread
From: berthiaume_wayne @ 2002-04-12 13:15 UTC (permalink / raw)
  To: hbryan; +Cc: linux-scsi

	The CLARiiON returns pages 00h, 80h, and 83h. Page 80h will contain
the array's serial number. Bytes 4 thru 23 of page 83h contain the
Identifier Descriptor 0 data which carries the LUN's WWN and bytes 24 thru
43 contain the Identifier Descriptor 1 data which carries the LUN and VLU
numbers. The LUN/VLU come from the VLUT (Virtual LUN Table). This is how you
are able to map your virtual LUN in your storage group to the physical LUN
in the array. The physical LUN itself is not a physical disk in the physical
array but created at the time you bind disks or portions of disks into a LUN.
For example, I may bind five disks into a RAID 5 group and portion them up
into 10 LUNs. Then if I install and use the AccessLogix software on the
array I can break up the array into virtual arrays called Storage Groups to
be used by multiple hosts. Storage Groups can be shared or individually
owned and, for the clever array administrator, you can design individual
storage groups which contain private and shared LUNs. All this is only
possible because we utilize the WWN of the array. The LUNs themselves use a
concatenation of the array's WWN to form the WWN for each individual LUN.
The WWN for the attached HBA is used to identify the unique storage group.
Thus persistence is maintained in the array. Now the trick will be to get
the same for Linux. =;^)
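As a rough sketch of the page 83h layout described above, the Python below slices a raw page into the two descriptors at the stated byte offsets. It assumes exactly the fixed offsets given in this message (bytes 4 thru 23 and 24 thru 43) and is not a general page 83h parser; `split_page_83` is a name invented for the example, and the buffer in the usage lines is synthetic.

```python
PAGE_83H = 0x83

# Offsets as stated in the message, expressed as half-open Python slices.
DESCRIPTOR_0 = slice(4, 24)   # bytes 4 thru 23: carries the LUN's WWN
DESCRIPTOR_1 = slice(24, 44)  # bytes 24 thru 43: carries the LUN and VLU numbers


def split_page_83(page: bytes):
    """Return (descriptor 0, descriptor 1) as raw, uninterpreted bytes."""
    if len(page) < 44:
        raise ValueError("page too short for the two descriptors described")
    if page[1] != PAGE_83H:  # byte 1 of a VPD page holds the page code
        raise ValueError(f"not page 83h: page code {page[1]:#04x}")
    return page[DESCRIPTOR_0], page[DESCRIPTOR_1]


# Synthetic buffer: 4 header bytes followed by two 20-byte descriptors.
sample = bytes([0x00, 0x83, 0x00, 40]) + bytes(range(20)) + bytes(range(100, 120))
desc0, desc1 = split_page_83(sample)
```

A naming tool that wanted a name to follow the volume rather than the path would key on the descriptor 0 bytes, since that is where the per-LUN WWN is said to live.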

^ permalink raw reply	[flat|nested] 297+ messages in thread

* RE: [RFC] Persistent naming of scsi devices
@ 2002-04-12 17:18 Bryan Henderson
  0 siblings, 0 replies; 297+ messages in thread
From: Bryan Henderson @ 2002-04-12 17:18 UTC (permalink / raw)
  To: berthiaume_wayne; +Cc: linux-scsi

That clarifies things a bit.  Clariion uses an unfortunate terminology; we
have to keep that in mind in these naming discussions because, for example,
when we talk about making a device file name out of host/bus/target/lun,
we're talking about a Clariion VLUN, not a Clariion LUN.  The Clariion LUN
is what I believe most others would call a "volume."  Or maybe a virtual
disk.

>The physical LUN itself is not a physical disk in the physical
>array but created at the time you bind disks or portions of disks into
>LUN.

Don't you feel funny saying "physical LUN" (physical logical unit number)?
:-) Is it physical or logical?  And is it a number or a storage medium?

^ permalink raw reply	[flat|nested] 297+ messages in thread

* RE: [RFC] Persistent naming of scsi devices
@ 2002-04-12 18:03 berthiaume_wayne
  0 siblings, 0 replies; 297+ messages in thread
From: berthiaume_wayne @ 2002-04-12 18:03 UTC (permalink / raw)
  To: hbryan; +Cc: linux-scsi

	I guess I hail back from when a LUN, or at least as I had always
perceived it, was a whole device and not a portion of one or several (if
they were bound in a RAID group.) =%^)

^ permalink raw reply	[flat|nested] 297+ messages in thread

* [RFC] Persistent naming of scsi devices
@ 2002-06-05 20:13 sullivan
  2002-06-06  1:08 ` Douglas Gilbert
  0 siblings, 1 reply; 297+ messages in thread
From: sullivan @ 2002-06-05 20:13 UTC (permalink / raw)
  To: linux-scsi; +Cc: mochel

Thanks for your postings to the original RFC submitted under this subject. Based on your feedback two utilities are now available at http://oss.software.ibm.com/devreg/
a. scsiname utility
	- Makes sg calls to collect device info used to make naming decisions
	- Hooked into the hotplug interface using Doug Gilbert's scsimon patch
	- Targeted specifically to scsi devices
	- Implemented completely in userspace

b. devnaming utility
	- Utilizes Patrick Mochel's driverfs fs to collect device info
	- Can easily be extended to support other device types (non scsi)
	- Includes a kernel patch to create/publish device info in driverfs
	- Hooked into hotplug interface using scsimon patch

Why two utilities? I see scsiname providing an immediate fix for providing a persistent set of /dev scsi names across boots. The devnaming utility is more long term in that it anticipates that driverfs will evolve into providing the device information necessary, removing the need for interfacing through sg. The config and hotplug portions of the utilities are consistent and should (hopefully) provide a smooth migration path.

I'd appreciate feedback from anyone that has the interest and the time to take a further look.

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Persistent naming of scsi devices
  2002-06-05 20:13 sullivan
@ 2002-06-06  1:08 ` Douglas Gilbert
  0 siblings, 0 replies; 297+ messages in thread
From: Douglas Gilbert @ 2002-06-06  1:08 UTC (permalink / raw)
  To: sullivan; +Cc: linux-scsi, mochel

sullivan wrote:
> 
> Thanks for your postings to the original RFC submitted under this subject. Based on your feedback two utilities are now available at http://oss.software.ibm.com/devreg/
> a. scsiname utility
>         - Makes sg calls to collect device info used to make naming decisions
>         - Hooked into the hotplug interface using Doug Gilbert's scsimon patch
>         - Targeted specifically to scsi devices
>         - Implemented completely in userspace
> 
> b. devnaming utility
>         - Utilizes Patrick Mochel's driverfs fs to collect device info
>         - Can easily be extended to support other device types (non scsi)
>         - Includes a kernel patch to create/publish device info in driverfs
>         - Hooked into hotplug interface using scsimon patch
> 
> Why two utilities? I see scsiname providing an immediate fix for 
> providing a persistent set of /dev scsi names across boots. The 
> devnaming utility is more long term in that it anticipates that 
> driverfs will evolve into providing the device information 
> necessary, removing the need for interfacing through sg. The config
> and hotplug portions of the utilities are consistent and should 
> (hopefully) provide a smooth migration path.
> 
> I'd appreciate feedback from anyone that has the interest and the 
> time to take a further look.

Mike,
Looks good.

I have just updated http://www.torque.net/scsi/scsimon.html
with a version of scsimon for lk 2.5.20 . The patch is
only superficially different from the previous one (which still
applies with minor noise in drivers/scsi/Makefile ).

Your devnaming patch contains a kernel patch against lk 2.5.14 .
I noticed that driverfs changed in lk 2.5.20 and Patrick Mochel
is proposing lots more changes soon (see lkml). Tracking it should
be "interesting". The PCI, USB and IDE subsystems currently have
slots for driverfs, although the PCI one seems to be the only one
active. Getting your driverfs changes for the scsi subsystem into 
Linus's tree may coerce Patrick into keeping the interface up to
date ...

The hotplug facilities that you are using in scsimon are probably
good candidates to go into the scsi mid level in lk 2.5 .

Doug Gilbert

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Proposed changes to generic blk tag for use in SCSI (1/3)
@ 2002-06-11  2:46 James Bottomley
  2002-06-11  5:50 ` Jens Axboe
  2002-06-13 21:01 ` Doug Ledford
  0 siblings, 2 replies; 297+ messages in thread
From: James Bottomley @ 2002-06-11  2:46 UTC (permalink / raw)
  To: axboe, linux-scsi; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 845 bytes --]

The attached is what I needed in the generic block layer to get the SCSI 
subsystem using it for Tag Command Queueing.

The changes are basically

1) I need a function to find the tagged request given the queue and the tag, 
which I've added as a function in the block layer

2) The SCSI queue will stall if it gets an untagged request in the stream, so 
once tagged queueing is enabled, all commands (including SPECIALS) must be 
tagged.  I altered the check in blk_queue_start_tag to permit this.
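
As a rough illustration of change 1), here is a user-space sketch of the
tag lookup.  It is not the kernel code: struct tag_table and find_tag()
are names invented for the example, and it assumes the index array holds
exactly max_depth slots, so a tag equal to max_depth is treated as out of
range along with negative tags.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (hypothetical names). */
struct request {
	int tag;
};

struct tag_table {
	struct request **tag_index;	/* slot i holds the request using tag i */
	int max_depth;			/* assumed number of slots in tag_index */
};

/*
 * Return the request that owns @tag, or NULL if there is no table,
 * the tag is out of range, or the slot is empty.
 */
struct request *find_tag(const struct tag_table *t, int tag)
{
	if (t == NULL || tag < 0 || tag >= t->max_depth)
		return NULL;
	return t->tag_index[tag];
}
```

A driver-side caller would take the tag the device reports on reselection,
pass it to the lookup, and treat a NULL result as a missing request.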

This is part of a set of three patches which provide a sample implementation 
of a SCSI driver using the generic TCQ code.

There are several shortcomings of the prototype, most notably it doesn't have 
tag starvation detection and processing.  However, I think I can re-introduce 
this as part of the error handler functions.

James Bottomley


[-- Attachment #2: blk-tag-2.5.21.diff --]
[-- Type: text/plain , Size: 2554 bytes --]

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.585   -> 1.586  
#	drivers/block/ll_rw_blk.c	1.67    -> 1.68   
#	include/linux/blkdev.h	1.44    -> 1.45   
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/06/10	jejb@mulgrave.(none)	1.586
# [BLK LAYER]
# 
# add find tag function, adjust criteria for tagging commands.
# --------------------------------------------
#
diff -Nru a/drivers/block/ll_rw_blk.c b/drivers/block/ll_rw_blk.c
--- a/drivers/block/ll_rw_blk.c	Mon Jun 10 22:29:38 2002
+++ b/drivers/block/ll_rw_blk.c	Mon Jun 10 22:29:38 2002
@@ -304,6 +304,27 @@
 }
 
 /**
+ * blk_queue_find_tag - find a request by its tag and queue
+ *
+ * @q:	 The request queue for the device
+ * @tag: The tag of the request
+ *
+ * Notes:
+ *    Should be used when a device returns a tag and you want to match
+ *    it with a request.
+ *
+ *    no locks need be held.
+ **/
+struct request *blk_queue_find_tag(request_queue_t *q, int tag)
+{
+	struct blk_queue_tag *bqt = q->queue_tags;
+
+	if(unlikely(bqt == NULL || bqt->max_depth < tag))
+		return NULL;
+
+	return bqt->tag_index[tag];
+}
+/**
  * blk_queue_free_tags - release tag maintenance info
  * @q:  the request queue for the device
  *
@@ -448,7 +469,7 @@
 	unsigned long *map = bqt->tag_map;
 	int tag = 0;
 
-	if (unlikely(!(rq->flags & REQ_CMD)))
+	if (unlikely((rq->flags & REQ_QUEUED)))
 		return 1;
 
 	for (map = bqt->tag_map; *map == -1UL; map++) {
@@ -1945,6 +1966,7 @@
 EXPORT_SYMBOL(ll_10byte_cmd_build);
 EXPORT_SYMBOL(blk_queue_prep_rq);
 
+EXPORT_SYMBOL(blk_queue_find_tag);
 EXPORT_SYMBOL(blk_queue_init_tags);
 EXPORT_SYMBOL(blk_queue_free_tags);
 EXPORT_SYMBOL(blk_queue_start_tag);
diff -Nru a/include/linux/blkdev.h b/include/linux/blkdev.h
--- a/include/linux/blkdev.h	Mon Jun 10 22:29:38 2002
+++ b/include/linux/blkdev.h	Mon Jun 10 22:29:38 2002
@@ -339,6 +339,7 @@
 #define blk_queue_tag_queue(q)		((q)->queue_tags->busy < (q)->queue_tags->max_depth)
 #define blk_rq_tagged(rq)		((rq)->flags & REQ_QUEUED)
 extern int blk_queue_start_tag(request_queue_t *, struct request *);
+extern struct request *blk_queue_find_tag(request_queue_t *, int);
 extern void blk_queue_end_tag(request_queue_t *, struct request *);
 extern int blk_queue_init_tags(request_queue_t *, int);
 extern void blk_queue_free_tags(request_queue_t *);

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: Proposed changes to generic blk tag for use in SCSI (1/3)
  2002-06-11  2:46 Proposed changes to generic blk tag for use in SCSI (1/3) James Bottomley
@ 2002-06-11  5:50 ` Jens Axboe
  2002-06-11 14:29   ` James Bottomley
  2002-06-13 21:01 ` Doug Ledford
  1 sibling, 1 reply; 297+ messages in thread
From: Jens Axboe @ 2002-06-11  5:50 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, linux-kernel

On Mon, Jun 10 2002, James Bottomley wrote:
> The attached is what I needed in the generic block layer to get the SCSI 
> subsystem using it for Tag Command Queueing.
> 
> The changes are basically
> 
> 1) I need a function to find the tagged request given the queue and the tag, 
> which I've added as a function in the block layer

Ehm it's already there, one could argue that it's pretty core
functionality for this type of stuff :-). It's called
blk_queue_get_tag(q, tag), and it's in blkdev.h. However, I agree that
we should just move it into ll_rw_blk.c. That gets better documented as
well. Could you redo that part?

> 2) The SCSI queue will stall if it gets an untagged request in the stream, so 
> once tagged queueing is enabled, all commands (including SPECIALS) must be 
> tagged.  I altered the check in blk_queue_start_tag to permit this.

I completely agree with this, blk_queue_start_tag() should not need to
know about these things so just checking if the request is already
marked tagged is fine with me. But please make that a warning, like

	if (rq->flags & REQ_QUEUED) {
		printk("blk_queue_start_tag: rq already tagged\n");
		return 1;
	}

Also, you need to fix drivers/ide/tcq.c to make sure it doesn't call
blk_queue_start_tag() for non REQ_CMD requests. Ah wait, I'll just
change that. And also _please_ fix the comment about REQ_CMD and not
just the code, it doesn't stand anymore.

> This is part of a set of three patches which provide a sample implementation 
> of a SCSI driver using the generic TCQ code.

Cool! Looking forward to reviewing it later today.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: Proposed changes to generic blk tag for use in SCSI (1/3)
  2002-06-11  5:50 ` Jens Axboe
@ 2002-06-11 14:29   ` James Bottomley
  2002-06-11 14:45     ` Jens Axboe
  0 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-06-11 14:29 UTC (permalink / raw)
  To: Jens Axboe; +Cc: James Bottomley, linux-scsi, linux-kernel

axboe@suse.de said:
> Ehm it's already there, one could argue that it's pretty core
> functionality for this type of stuff :-). It's called
> blk_queue_get_tag(q, tag), and it's in blkdev.h. However, I agree that
> we should just move it into ll_rw_blk.c. That gets better documented
> as well. Could you redo that part? 

I guessed it must be.  I grepped the IDE tree looking for anything with `get' 
or `find' in it, but came up empty.  It's actually called 
blk_queue_tag_request(), which is why I didn't find it.

Do you want me to keep this name if I move it?

axboe@suse.de said:
> I completely agree with this, blk_queue_start_tag() should not need to
> know about these things so just checking if the request is already
> marked tagged is fine with me. But please make that a warning, like 

Actually, I think it should be a BUG().  By the time a tagged request comes in 
to blk_queue_start_tag, we must already have corrupted the lists since we use 
the same list element (req->queuelist) to queue on both the tag queue and the 
request queue.
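
To make the argument concrete, a minimal user-space model of tag
allocation follows.  It is a sketch under stated assumptions, not the
ll_rw_blk.c code: tag_map, start_tag() and end_tag(), the flag value and
the fixed table size are all invented here, and assert() stands in for
BUG().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define REQ_QUEUED	(1u << 0)	/* illustrative value, not the kernel's */
#define TAGS_PER_LONG	((int)(sizeof(unsigned long) * CHAR_BIT))
#define MAP_WORDS	2
#define MAX_TAGS	(MAP_WORDS * TAGS_PER_LONG)

struct request {
	unsigned int flags;
	int tag;
};

struct tag_map {
	unsigned long map[MAP_WORDS];	/* bit set = tag in use */
	struct request *index[MAX_TAGS];	/* tag -> owning request */
	int max_depth;			/* usable tags, at most MAX_TAGS */
	int busy;			/* tags currently handed out */
};

/*
 * Give @rq the lowest free tag.  Returns 0 on success, 1 if all tags are
 * busy.  An already tagged request is a caller bug, so it trips the
 * assert instead of being quietly refused.
 */
int start_tag(struct tag_map *tm, struct request *rq)
{
	int tag;

	assert(tm->max_depth <= MAX_TAGS);
	assert(!(rq->flags & REQ_QUEUED));

	for (tag = 0; tag < tm->max_depth; tag++) {
		unsigned long bit = 1UL << (tag % TAGS_PER_LONG);
		unsigned long *word = &tm->map[tag / TAGS_PER_LONG];

		if (*word & bit)
			continue;	/* tag already in use */
		*word |= bit;
		tm->index[tag] = rq;
		rq->tag = tag;
		rq->flags |= REQ_QUEUED;
		tm->busy++;
		return 0;
	}
	return 1;
}

/* Release the tag held by @rq so it can be reused. */
void end_tag(struct tag_map *tm, struct request *rq)
{
	assert(rq->flags & REQ_QUEUED);
	tm->map[rq->tag / TAGS_PER_LONG] &= ~(1UL << (rq->tag % TAGS_PER_LONG));
	tm->index[rq->tag] = NULL;
	rq->flags &= ~REQ_QUEUED;
	tm->busy--;
}
```

In this model a second start_tag() on the same request without an
intervening end_tag() aborts, which is the hard failure argued for above
rather than a warning and a return of 1.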

> And also _please_ fix the comment about REQ_CMD and not just the code,
> it doesn't stand anymore. 

Will do...I didn't see much point altering the comment in the prototype until 
there was agreement that it was OK to do it this way.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: Proposed changes to generic blk tag for use in SCSI (1/3)
  2002-06-11 14:29   ` James Bottomley
@ 2002-06-11 14:45     ` Jens Axboe
  2002-06-11 16:39       ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Jens Axboe @ 2002-06-11 14:45 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, linux-kernel

On Tue, Jun 11 2002, James Bottomley wrote:
> axboe@suse.de said:
> > Ehm it's already there, one could argue that it's pretty core
> > functionality for this type of stuff :-). It's called
> > blk_queue_get_tag(q, tag), and it's in blkdev.h. However, I agree that
> > we should just move it into ll_rw_blk.c. That gets better documented
> > as well. Could you redo that part? 
> 
> I guessed it must be.  I grepped the IDE tree looking for anything with `get' 
> or `find' in it, but came up empty.  It's actually called 
> blk_queue_tag_request(), which is why I didn't find it.
> 
> Do you want me to keep this name if I move it?

Nah the name sucks, you see I didn't even remember it myself either. Why
not just change the name to something sane, like blk_queue_get_tag() or
blk_queue_find_tag(). I think yours (the latter) describes it best. Just
change ide accordingly as well, that's the sole current user.

> axboe@suse.de said:
> > I completely agree with this, blk_queue_start_tag() should not need to
> > know about these things so just checking if the request is already
> > marked tagged is fine with me. But please make that a warning, like 
> 
> Actually, I think it should be a BUG().  By the time a tagged request
> comes in to blk_queue_start_tag, we must already have corrupted the
> lists since we use the same list element (req->queuelist) to queue on
> both the tag queue and the request queue.

Agree, make it a BUG_ON() or something.

> > And also _please_ fix the comment about REQ_CMD and not just the code,
> > it doesn't stand anymore. 
> 
> Will do...I didn't see much point altering the comment in the
> prototype until there was agreement that it was OK to do it this way.

Fine.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: Proposed changes to generic blk tag for use in SCSI (1/3)
  2002-06-11 14:45     ` Jens Axboe
@ 2002-06-11 16:39       ` James Bottomley
  0 siblings, 0 replies; 297+ messages in thread
From: James Bottomley @ 2002-06-11 16:39 UTC (permalink / raw)
  To: Jens Axboe; +Cc: James Bottomley, linux-scsi, linux-kernel

OK, this is the new patch with all the changes.  I didn't alter the IDE 
blk_queue_start_tag() to check for a REQ_CMD, since you said you would do that.

James

You can import this changeset into BK by piping this whole message to:
'| bk receive [path to repository]' or apply the patch as usual.

===================================================================


ChangeSet@1.490, 2002-06-11 12:33:05-04:00, jejb@mulgrave.(none)
  [BLK TCQ]
  
  remove blk_queue_tag_request() in favour of blk_queue_find_tag()
  Allow any request type into blk_queue_start_tag()

ChangeSet@1.489, 2002-06-11 10:34:17-04:00, jejb@mulgrave.(none)
  [BLK LAYER]
  
  add find tag function, adjust criteria for tagging commands.


 drivers/block/ll_rw_blk.c |   40 ++++++++++++++++++++++++++++++++++------
 drivers/ide/tcq.c         |    2 +-
 include/linux/blkdev.h    |    2 +-
 3 files changed, 36 insertions(+), 8 deletions(-)


diff -Nru a/drivers/block/ll_rw_blk.c b/drivers/block/ll_rw_blk.c
--- a/drivers/block/ll_rw_blk.c	Tue Jun 11 12:38:12 2002
+++ b/drivers/block/ll_rw_blk.c	Tue Jun 11 12:38:12 2002
@@ -297,6 +297,27 @@
 }
 
 /**
+ * blk_queue_find_tag - find a request by its tag and queue
+ *
+ * @q:	 The request queue for the device
+ * @tag: The tag of the request
+ *
+ * Notes:
+ *    Should be used when a device returns a tag and you want to match
+ *    it with a request.
+ *
+ *    no locks need be held.
+ **/
+struct request *blk_queue_find_tag(request_queue_t *q, int tag)
+{
+	struct blk_queue_tag *bqt = q->queue_tags;
+
+	if(unlikely(bqt == NULL || bqt->max_depth < tag))
+		return NULL;
+
+	return bqt->tag_index[tag];
+}
+/**
  * blk_queue_free_tags - release tag maintenance info
  * @q:  the request queue for the device
  *
@@ -429,10 +450,12 @@
  *  Description:
  *    This can either be used as a stand-alone helper, or possibly be
  *    assigned as the queue &prep_rq_fn (in which case &struct request
- *    automagically gets a tag assigned). Note that this function assumes
- *    that only REQ_CMD requests can be queued! The request will also be
- *    removed from the request queue, so it's the drivers responsibility to
- *    readd it if it should need to be restarted for some reason.
+ *    automagically gets a tag assigned). Note that this function
+ *    assumes that any type of request can be queued! if this is not
+ *    true for your device, you must check the request type before
+ *    calling this function.  The request will also be removed from
+ *    the request queue, so it's the drivers responsibility to readd
+ *    it if it should need to be restarted for some reason.
  *
  *  Notes:
  *   queue lock must be held.
@@ -443,8 +466,12 @@
 	unsigned long *map = bqt->tag_map;
 	int tag = 0;
 
-	if (unlikely(!(rq->flags & REQ_CMD)))
-		return 1;
+	if (unlikely((rq->flags & REQ_QUEUED))) {
+		printk(KERN_ERR 
+		       "request %p for device [%02x:%02x] already tagged %d",
+		       rq, major(rq->rq_dev), minor(rq->rq_dev), rq->tag);
+		BUG();
+	}
 
 	for (map = bqt->tag_map; *map == -1UL; map++) {
 		tag += BLK_TAGS_PER_LONG;
@@ -2027,6 +2054,7 @@
 
 EXPORT_SYMBOL(blk_queue_prep_rq);
 
+EXPORT_SYMBOL(blk_queue_find_tag);
 EXPORT_SYMBOL(blk_queue_init_tags);
 EXPORT_SYMBOL(blk_queue_free_tags);
 EXPORT_SYMBOL(blk_queue_start_tag);
diff -Nru a/drivers/ide/tcq.c b/drivers/ide/tcq.c
--- a/drivers/ide/tcq.c	Tue Jun 11 12:38:12 2002
+++ b/drivers/ide/tcq.c	Tue Jun 11 12:38:12 2002
@@ -282,7 +282,7 @@
 
 	TCQ_PRINTK("%s: stat %x, feat %x\n", __FUNCTION__, stat, feat);
 
-	rq = blk_queue_tag_request(&drive->queue, tag);
+	rq = blk_queue_find_tag(&drive->queue, tag);
 	if (!rq) {
 		printk(KERN_ERR"%s: missing request for tag %d\n", __FUNCTION__, tag);
 		return ide_stopped;
diff -Nru a/include/linux/blkdev.h b/include/linux/blkdev.h
--- a/include/linux/blkdev.h	Tue Jun 11 12:38:12 2002
+++ b/include/linux/blkdev.h	Tue Jun 11 12:38:12 2002
@@ -328,11 +328,11 @@
 /*
  * tag stuff
  */
-#define blk_queue_tag_request(q, tag)	((q)->queue_tags->tag_index[(tag)])
 #define blk_queue_tag_depth(q)		((q)->queue_tags->busy)
 #define blk_queue_tag_queue(q)		((q)->queue_tags->busy < (q)->queue_tags->max_depth)
 #define blk_rq_tagged(rq)		((rq)->flags & REQ_QUEUED)
 extern int blk_queue_start_tag(request_queue_t *, struct request *);
+extern struct request *blk_queue_find_tag(request_queue_t *, int);
 extern void blk_queue_end_tag(request_queue_t *, struct request *);
 extern int blk_queue_init_tags(request_queue_t *, int);
 extern void blk_queue_free_tags(request_queue_t *);

===================================================================


This BitKeeper patch contains the following changesets:
1.489..1.490
## Wrapped with gzip_uu ##


[uuencoded bkpatch9679 payload omitted]



^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: Proposed changes to generic blk tag for use in SCSI (1/3)
  2002-06-11  2:46 Proposed changes to generic blk tag for use in SCSI (1/3) James Bottomley
  2002-06-11  5:50 ` Jens Axboe
@ 2002-06-13 21:01 ` Doug Ledford
  2002-06-13 21:26   ` James Bottomley
  1 sibling, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-06-13 21:01 UTC (permalink / raw)
  To: James Bottomley; +Cc: axboe, linux-scsi, linux-kernel

On Mon, Jun 10, 2002 at 10:46:44PM -0400, James Bottomley wrote:
> 2) The SCSI queue will stall if it gets an untagged request in the stream, so 
> once tagged queueing is enabled, all commands (including SPECIALS) must be 
> tagged.  I altered the check in blk_queue_start_tag to permit this.

Hmmm...this seems broken to me.  Switching from tagged to untagged 
momentarily and then back is perfectly valid.  Can the bio layer 
handle this and not the scsi layer, or are both layers unable to handle 
this sort of tag manipulation? 

> There are several shortcomings of the prototype, most notably it doesn't have 
> tag starvation detection and processing.  However, I think I can re-introduce 
> this as part of the error handler functions.

If you are using the bio layer tag processing, then it should be 
doing this part I would think.  If it isn't, then it sounds like either 
its design is missing some key elements required to be fully functional 
or the integration between the scsi layer and the bio layer needs some 
additional work.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: Proposed changes to generic blk tag for use in SCSI (1/3)
  2002-06-13 21:01 ` Doug Ledford
@ 2002-06-13 21:26   ` James Bottomley
  0 siblings, 0 replies; 297+ messages in thread
From: James Bottomley @ 2002-06-13 21:26 UTC (permalink / raw)
  To: James Bottomley, axboe, linux-scsi, linux-kernel

On Mon, Jun 10, 2002 at 10:46:44PM -0400, James Bottomley wrote:
> 2) The SCSI queue will stall if it gets an untagged request in the stream, so 
> once tagged queueing is enabled, all commands (including SPECIALS) must be 
> tagged.  I altered the check in blk_queue_start_tag to permit this.

dledford@redhat.com said:
> Hmmm...this seems broken to me.  Switching from tagged to untagged
> momentarily and then back is perfectly valid.  Can the bio layer
> handle this and not the scsi layer, or are both layers unable to
> handle this sort of tag manipulation?

The layers can cope with the switch easily enough.  The problem is that to 
send an untagged command to a SCSI device you have to wait for the outstanding 
tags to clear which is what causes the stall.  The scsi mid-layer queue push 
back system pushes all commands back to the BIO layer marked as REQ_SPECIAL 
(because the upper layer drivers generate the commands and it has no idea what 
they are supposed to be doing) if the driver cannot handle them.  This means 
for those drivers (like the new adaptec) which load up the device until it 
returns a queue full (thus causing push back into the bio layer) we'd get 
stutter in the command pipeline.  The cleanest solution is to allow (but not 
require) tagging of every request type.

On Mon, Jun 10, 2002 at 10:46:44PM -0400, James Bottomley wrote:
> There are several shortcomings of the prototype, most notably it doesn't have 
> tag starvation detection and processing.  However, I think I can re-introduce 
> this as part of the error handler functions.

dledford@redhat.com said:
> If you are using the bio layer tag processing, then it should be
> doing this part I would think.  If it isn't, then it sounds like
> either its design is missing some key elements required to be fully
> functional or the integration between the scsi layer and the bio
> layer needs some additional work.

I thought about doing this.  The problem is that the blk layer doesn't have 
very good instrumentation for detecting the condition.  The SCSI layer is the 
one that has per command timers and all the other necessaries so it can detect 
when a command should have returned and take corrective action.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

[parent not found: <200206132126.g5DLQiQ24889@localhost.localdomain>]

* Re: Proposed changes to generic blk tag for use in SCSI (1/3)
       [not found] <200206132126.g5DLQiQ24889@localhost.localdomain>
@ 2002-06-13 21:50 ` Doug Ledford
  2002-06-13 22:09   ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-06-13 21:50 UTC (permalink / raw)
  To: James Bottomley; +Cc: axboe, linux-scsi, linux-kernel

On Thu, Jun 13, 2002 at 05:26:44PM -0400, James Bottomley wrote:
> On Mon, Jun 10, 2002 at 10:46:44PM -0400, James Bottomley wrote:
> > 2) The SCSI queue will stall if it gets an untagged request in the stream, so 
> > once tagged queueing is enabled, all commands (including SPECIALS) must be 
> > tagged.  I altered the check in blk_queue_start_tag to permit this.
> 
> dledford@redhat.com said:
> > Hmmm...this seems broken to me.  Switching from tagged to untagged
> > momentarily and then back is perfectly valid.  Can the bio layer
> > handle this and not the scsi layer, or are both layers unable to
> > handle this sort of tag manipulation?
> 
> The layers can cope with the switch easily enough.  The problem is that to 
> send an untagged command to a SCSI device you have to wait for the outstanding 
> tags to clear which is what causes the stall.

Well, intentional behaviour is hardly what I would call a stall.  I 
thought you were implying that it would stall the queue indefinitely.  I'm 
fully aware that it forces the queue to wait until all outstanding 
commands have completed before sending the untagged command, that's part 
of the desired behaviour in that case.

>  The scsi mid-layer queue push 
> back system pushes all commands back to the BIO layer marked as REQ_SPECIAL 
> (because the upper layer drivers generate the commands and it has no idea what 
> they are supposed to be doing) if the driver cannot handle them.  This means 
> for those drivers (like the new adaptec) which load up the device until it 
> returns a queue full (thus causing push back into the bio layer) we'd get 
> stutter in the command pipeline.  The cleanest solution is to allow (but not 
> require) tagging of every request type.

This I'm not following.  If you get a QUEUE_FULL from the adaptec driver, 
then the commands you are pushing back should still be tagged and no stall 
should be required beyond just waiting for any outstanding command on the 
drive to complete or for a timeout to pass.  It should not require any 
untagged type stall where you have to drain the entire pipeline...

> I thought about doing this.  The problem is that the blk layer doesn't have 
> very good instrumentation for detecting the condition.  The SCSI layer is the 
> one that has per command timers and all the other necessaries so it can detect 
> when a command should have returned and take corrective action.

I would think that, eventually, the bio layer will support I/O fencing via 
tagged commands (aka, ext3 needs an I/O fence and the bio layer does as 
needed to enforce that, which on scsi may mean an ordered queue tag is 
generated instead of a regular tag and on IDE it may mean something else).  
It will have to be able to tell that some of these conditions have been 
satisfied in those cases, so I see no reason why it shouldn't be made 
aware of them now.  Just my $.02

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: Proposed changes to generic blk tag for use in SCSI (1/3)
  2002-06-13 21:50 ` Doug Ledford
@ 2002-06-13 22:09   ` James Bottomley
  0 siblings, 0 replies; 297+ messages in thread
From: James Bottomley @ 2002-06-13 22:09 UTC (permalink / raw)
  To: James Bottomley, axboe, linux-scsi, linux-kernel

dledford@redhat.com said:
> This I'm not following.  If you get a QUEUE_FULL from the adaptec
> driver, then the commands you are pushing back should still be tagged
> and no stall should be required beyond just waiting for any
> outstanding command on the drive to complete or for a timeout to
> pass.  It should not require any untagged type stall where you have
> to drain the entire pipeline...

Ah, but that was the problem in the blk generic tagging.  Only requests of 
type REQ_CMD get tagged.  Say the SD driver gets a block read request as a 
REQ_CMD, translates it into a SCSI READ, and sends it through the mid-layer to 
the low layer driver, which requests a block layer tag but eventually responds 
QUEUE_FULL.  Now the command gets pushed back to the blk queue head as a 
REQ_SPECIAL (and as part of the push back, we have to finish the tag since the 
command moves from the tag queue to the blk queue), which to the scsi mid 
layer means "request with an already formulated SCSI command, so don't go back 
through the upper layer driver again".  The problem is that when this command 
comes back again into the scsi mid layer for execution it would now do so as 
an untagged command because the blk_queue_start_tag() code will only tag 
REQ_CMD requests, and hence we get a queue stall every time the low level 
driver responds QUEUE_FULL.

This was the behaviour (in the blk layer) I was objecting to: on the second 
go around, we request a tag using blk_queue_start_tag() but get denied because 
the request isn't of the correct type.  That is why I think 
blk_queue_start_tag() needs to allow REQ_SPECIAL requests to be tagged.
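
The sequence can be modelled in a few lines of user-space C.  This is only
a sketch: the flag values, may_tag_old(), may_tag_new() and push_back()
are invented for the example, and the assumption that the push back
replaces REQ_CMD with REQ_SPECIAL is a simplification of what the mid
layer does.

```c
#include <stdbool.h>

/* Illustrative flag bits; the values are not the kernel's. */
#define REQ_CMD		(1u << 0)	/* ordinary block read/write */
#define REQ_SPECIAL	(1u << 1)	/* carries an already built SCSI command */
#define REQ_QUEUED	(1u << 2)	/* currently holds a tag */

/* Old check: only REQ_CMD requests are eligible for a tag. */
bool may_tag_old(unsigned int flags)
{
	return (flags & REQ_CMD) != 0;
}

/* Proposed check: any request type is eligible unless already tagged. */
bool may_tag_new(unsigned int flags)
{
	return (flags & REQ_QUEUED) == 0;
}

/*
 * Model of the push back after QUEUE_FULL: the tag is finished and the
 * request returns to the queue head as REQ_SPECIAL (assumed to replace
 * REQ_CMD in this sketch).
 */
unsigned int push_back(unsigned int flags)
{
	flags &= ~(REQ_QUEUED | REQ_CMD);
	return flags | REQ_SPECIAL;
}
```

Under the old check the pushed-back request is refused a tag on its second
pass and has to go out untagged; under the proposed check it is tagged
again like any other request.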

> I would think that, eventually, the bio layer will support I/O fencing
> via tagged commands (aka, ext3 needs an I/O fence and the bio layer
> does as needed to enforce that, which on scsi may mean an ordered
> queue tag is generated instead of a regular tag and on IDE it may
> mean something else).  It will have to be able to tell that some of
> these conditions have been satisfied in those cases, so I see no
> reason why it shouldn't be made aware of them now.  Just my $.02

I've actually already put this code into the mid layer patch (the 
scsi_populate_tag_msg() function in scsi.h) to generate an ordered tag for the 
case where the request is marked REQ_BARRIER.
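
For readers without the mid layer patch to hand, the idea is roughly the
following.  This is a stand-alone sketch and not the function from the
patch: sketch_populate_tag_msg(), its signature and the flag values are
assumptions made for the example.  Only the two message codes are the
standard SCSI queue tag message values.

```c
/* SCSI queue tag message codes. */
#define SIMPLE_QUEUE_TAG	0x20
#define ORDERED_QUEUE_TAG	0x22

/* Illustrative flag bits; the values are not the kernel's. */
#define REQ_QUEUED	(1u << 0)	/* request holds a block layer tag */
#define REQ_BARRIER	(1u << 1)	/* request must act as an I/O fence */

/*
 * Build the two byte queue tag message for a request.
 * Returns the number of bytes written to @msg: 2 for a tagged request,
 * 0 for an untagged one (in which case @msg is left untouched).
 */
int sketch_populate_tag_msg(unsigned int flags, int tag, unsigned char *msg)
{
	if (!(flags & REQ_QUEUED))
		return 0;

	/* A barrier request gets an ordered tag, everything else a simple one. */
	msg[0] = (flags & REQ_BARRIER) ? ORDERED_QUEUE_TAG : SIMPLE_QUEUE_TAG;
	msg[1] = (unsigned char)tag;
	return 2;
}
```

A low level driver would append the returned bytes after the IDENTIFY
message when it builds the message-out phase for the command.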

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* When must the io_request_lock be held?
@ 2002-08-05 23:53 Jamie Wellnitz
  2002-08-06 17:58 ` Mukul Kotwani
  2002-08-07 14:48 ` Doug Ledford
  0 siblings, 2 replies; 297+ messages in thread
From: Jamie Wellnitz @ 2002-08-05 23:53 UTC (permalink / raw)
  To: linux-scsi

When does a low-level driver have to hold the io_request_lock?  I know
that a driver should take the lock when it calls a SCSI command's done
function.  Also, when the driver's queuecommand is entered, it already
holds this lock.

I've seen some drivers that drop the lock (via either spin_unlock or
spin_unlock_irq) inside their queuecommand.  What are the rules for
dropping (and reacquiring) the lock here?

Thanks,
Jamie Wellnitz

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: When must the io_request_lock be held?
  2002-08-05 23:53 When must the io_request_lock be held? Jamie Wellnitz
@ 2002-08-06 17:58 ` Mukul Kotwani
  2002-08-07 14:48 ` Doug Ledford
  1 sibling, 0 replies; 297+ messages in thread
From: Mukul Kotwani @ 2002-08-06 17:58 UTC (permalink / raw)
  To: Jamie Wellnitz, linux-scsi

--- Jamie Wellnitz <Jamie.Wellnitz@emulex.com> wrote:
> When does a low-level driver have to hold the
> io_request_lock?  I know
> that a driver should take the lock when it calls a
> SCSI command's done
> function.  Also, when the driver's queuecommand is
> entered, it already
> holds this lock.
> 
> I've seen some drivers that drop the lock (via
> either spin_unlock or
> spin_unlock_irq) inside their queuecommand.  What
> are the rules for
> dropping (and reacquiring) the lock here?
> 

If you drop the lock, you are responsible for the
synchronization of the driver. From what I know, the
io_request_lock is going to be removed from the code
in later versions, so it is always a good idea not
to depend on the lock for synchronization and to define
your own, so that the transition is easier in the future.

You can drop the lock at the start if you have your own
locks, and reacquire it at the end, because the mid
level expects it to be in the acquired state.

Mukul

> Thanks,
> Jamie Wellnitz


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: When must the io_request_lock be held?
  2002-08-05 23:53 When must the io_request_lock be held? Jamie Wellnitz
  2002-08-06 17:58 ` Mukul Kotwani
@ 2002-08-07 14:48 ` Doug Ledford
  2002-08-07 15:26   ` James Bottomley
  1 sibling, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-08-07 14:48 UTC (permalink / raw)
  To: Jamie Wellnitz; +Cc: linux-scsi

On Mon, Aug 05, 2002 at 07:53:25PM -0400, Jamie Wellnitz wrote:
> When does a low-level driver have to hold the io_request_lock?

If you are using the io_request_lock as your driver's only big lock, then
you should leave it held in the queuecommand(), abort(), reset() routines.  
I can't, for the life of me, remember right now if the new_eh strategy,
abort, device_reset, and bus_reset routines are called with a lock held,
you'll have to look that up by going into scsi_error.c and checking it
out. You should grab it prior to entering your interrupt handler body and
you should grab it from any timer based entry points in your driver.  
Releasing of the lock is OK as long as you reacquire it before returning
and as long as you know that not owning a lock at that time is safe.  You
can use the spin_unlock() variant if you wish, but that will leave
interrupts disabled and therefore buy you exactly zero benefit on UP
systems.  The spin_unlock_irq() is what I typically use since that
actually allows other interrupts to come in while you are working.  You
are, of course, responsible for doing sane locking in your driver if you
choose to unlock the locks the mid layer holds for you.  In 2.4.19, the 
io_request_lock would be held.  In 2.5.x, the io_request_lock is gone and 
instead the host->host_lock would be held.

>  I know
> that a driver should take the lock when it calls a SCSI command's done
> function.

In both 2.4.x and 2.5.x if your driver uses the new error handling code, 
then this is not necessary.  It may call the done() function without any 
locks held since all the done function does is grab the bottom half 
handler lock long enough to stuff the command onto a list and then 
releases the lock and returns.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
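
The drop-and-reacquire rule described above can be pictured with a small userspace model. This is only a sketch under stated assumptions: toy_lock, toy_hba, toy_queuecommand and toy_midlayer_dispatch are hypothetical names, and toy_lock merely records ownership so the entry and exit contracts can be asserted. It is not a real spinlock and does not model interrupts.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a spinlock: it only records ownership so the
 * locking contract can be asserted. It provides no real exclusion. */
struct toy_lock {
    bool held;
};

/* Hypothetical model of the global io_request_lock. */
static struct toy_lock toy_io_request_lock;

static void toy_lock_irq(struct toy_lock *l)
{
    assert(!l->held);   /* a real spinlock would spin or deadlock here */
    l->held = true;
}

static void toy_unlock_irq(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Hypothetical driver state protected by the driver's own lock. */
struct toy_hba {
    struct toy_lock lock;
    int queued;
};

/* Entered with toy_io_request_lock held; must return with it held. */
static int toy_queuecommand(struct toy_hba *hba)
{
    assert(toy_io_request_lock.held);       /* entry contract */
    toy_unlock_irq(&toy_io_request_lock);   /* drop the mid-layer lock */

    toy_lock_irq(&hba->lock);               /* driver's own locking */
    hba->queued++;
    toy_unlock_irq(&hba->lock);

    toy_lock_irq(&toy_io_request_lock);     /* reacquire before returning */
    return 0;
}

/* Models the mid layer calling into the driver. */
static int toy_midlayer_dispatch(struct toy_hba *hba)
{
    int rc;

    toy_lock_irq(&toy_io_request_lock);
    rc = toy_queuecommand(hba);
    assert(toy_io_request_lock.held);       /* exit contract */
    toy_unlock_irq(&toy_io_request_lock);
    return rc;
}
```

Calling toy_midlayer_dispatch() on a zero-initialised toy_hba increments its queued counter and leaves both toy locks released. In a real 2.4 driver the corresponding calls would be the kernel's spin_unlock_irq() and spin_lock_irq() on io_request_lock.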

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: When must the io_request_lock be held?
  2002-08-07 14:48 ` Doug Ledford
@ 2002-08-07 15:26   ` James Bottomley
  2002-08-07 16:18     ` Doug Ledford
  0 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-08-07 15:26 UTC (permalink / raw)
  To: Jamie Wellnitz, linux-scsi

dledford@redhat.com said:
> I can't, for the life of me, remember right now if the new_eh
> strategy, abort, device_reset, and bus_reset routines are called with
> a lock held, you'll have to look that up by going into scsi_error.c
> and checking it out.

They are, but you can drop it (as long as you reacquire it before exit).

dledford@redhat.com said:
> You can use the spin_unlock() variant if you wish, but that will leave
> interrupts disabled and therefore buy you exactly zero benefit on UP
> systems.

This isn't necessarily true.  Some SCSI cards have a single mailbox for 
outgoing and incoming transfer parameters and commands.  If you have one of 
these, you need to keep interrupts disabled as you set up a command so that 
your own interrupt routine doesn't overwrite the values if you get a reconnect 
interrupt while you're programming the card.

dledford@redhat.com said:
> You are, of course, responsible for doing sane locking in your driver
> if you choose to unlock the locks the mid layer holds for you.

This is a very important point to bear in mind.  Since the io_request_lock 
must be held when calling the done() routine, you are asking to create a 
deadlock if you use your own locks in the driver (it's not impossible to get 
this correct, just very difficult).

Basically, the io_request_lock is used to protect structures within the SCSI 
and block layer.  Once you get into the LLD, you may extend it if you wish to 
protect the individual data structures of your driver (which is why it's held 
on entry to most of the LLD API) but if you don't want to, it is safe to drop 
it (as long as your queuecommand is re-entrant and you re-acquire it before 
exiting the API routines).

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: When must the io_request_lock be held?
  2002-08-07 15:26   ` James Bottomley
@ 2002-08-07 16:18     ` Doug Ledford
  2002-08-07 16:48       ` James Bottomley
  2002-08-07 16:55       ` Patrick Mansfield
  0 siblings, 2 replies; 297+ messages in thread
From: Doug Ledford @ 2002-08-07 16:18 UTC (permalink / raw)
  To: James Bottomley; +Cc: Jamie Wellnitz, linux-scsi

On Wed, Aug 07, 2002 at 10:26:24AM -0500, James Bottomley wrote:
> dledford@redhat.com said:
> > You can use the spin_unlock() variant if you wish, but that will leave
> > interrupts disabled and therefore buy you exactly zero benefit on UP
> > systems.
> 
> This isn't necessarily true.  Some SCSI cards have a single mailbox for 
> outgoing and incoming transfer parameters and commands.  If you have one of 
> these, you need to keep interrupts disabled as you set up a command so that 
> your own interrupt routine doesn't overwrite the values if you get a reconnect 
> interrupt while you're programming the card.

Then you better not *ever* release the lock under those conditions.  
Leaving interrupts disabled may work on UP, but on SMP it only disables 
interrupts on that one CPU and if you aren't holding your lock then 
another CPU can run the interrupt routine and trash your data.  So, back 
to my point, if you aren't enabling interrupts, then there is no reason to 
release the lock because you obviously need some sort of lock type 
protection.

> dledford@redhat.com said:
> > You are, of course, responsible for doing sane locking in your driver
> > if you choose to unlock the locks the mid layer holds for you.
> 
> This is a very important point to bear in mind.  Since the io_request_lock 
> must be held when calling the done() routine,

Don't get too hung up on this point.  In 2.5 you don't need either the 
io_request_lock or host->host_lock when calling the done function.  In 
2.4.19 you don't need it if you are a new eh driver, only if you are old.  
And hopefully by 2.4.20 that won't be true either.  If Marcelo takes the 
patch I've got for 2.4 that we shipped in our Red Hat 7.3 product, then 
both new and old eh drivers will no longer have to hold a lock when they 
call the done function because both of them will be using the new eh 
code's bottom half mechanism.

> you are asking to create a 
> deadlock if you use your own locks in the driver (it's not impossible to get 
> this correct, just very difficult).

Again, if Marcelo takes my patch, this will become easier as well since 
it adds a host->lock lock that allows you to get your driver off the 
horrible io_request_lock once and for all.  (Note: I may have to change 
the name to host->host_lock to make it match 2.5, but that just annoys the 
hell out of me because I *detest* totally redundant names like that...)

> Basically, the io_request_lock is used to protect structures within the SCSI 
> and block layer.  Once you get into the LLD, you may extend it if you wish to 
> protect the individual data structures of your driver (which is why it's held 
> on entry to most of the LLD API) but if you don't want to, it is safe to drop 
> it (as long as your queuecommand is re-entrant and you re-acquire it before 
> exiting the API routines).
> 
> James

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
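
The reason done() can be called without the mid-layer lock, as described above, is that it only appends the command to a list under a short-lived lock and leaves the real work to a bottom half. A rough userspace sketch of that shape follows. Every name here (toy_cmd, toy_bh_queue, toy_done, toy_bh_run) is hypothetical, and the lock is modelled as a plain flag rather than a real spinlock.

```c
#include <assert.h>
#include <stddef.h>

struct toy_cmd {
    int tag;
    struct toy_cmd *next;
};

/* Hypothetical completion queue drained later by a bottom half. */
struct toy_bh_queue {
    int lock_held;          /* stands in for the bottom-half handler lock */
    struct toy_cmd *head;
    struct toy_cmd *tail;
};

/* What the driver calls: this model needs no mid-layer lock on entry. */
static void toy_done(struct toy_bh_queue *q, struct toy_cmd *cmd)
{
    q->lock_held = 1;                   /* take the short-lived lock */
    cmd->next = NULL;
    if (q->tail)
        q->tail->next = cmd;
    else
        q->head = cmd;
    q->tail = cmd;
    q->lock_held = 0;                   /* release and return */
}

/* Bottom half: detach the whole list under the lock, then walk it
 * with the lock dropped. Returns the number of commands seen. */
static int toy_bh_run(struct toy_bh_queue *q)
{
    struct toy_cmd *list;
    int n = 0;

    q->lock_held = 1;
    list = q->head;
    q->head = q->tail = NULL;
    q->lock_held = 0;

    for (; list; list = list->next)
        n++;                            /* real code would finish the command here */
    return n;
}
```

The point of the design is that the lock is held only for a few pointer updates, so the caller's own locking state does not matter to the completion path.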

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: When must the io_request_lock be held?
  2002-08-07 16:18     ` Doug Ledford
@ 2002-08-07 16:48       ` James Bottomley
  2002-08-07 18:06         ` Mike Anderson
  2002-08-08 19:28         ` Luben Tuikov
  2002-08-07 16:55       ` Patrick Mansfield
  1 sibling, 2 replies; 297+ messages in thread
From: James Bottomley @ 2002-08-07 16:48 UTC (permalink / raw)
  To: Jamie Wellnitz, linux-scsi

dledford@redhat.com said:
> Then you better not *ever* release the lock under those conditions.   

Exactly so.

dledford@redhat.com said:
> Again, if Marcelo takes my patch, this will become easier as well
> since  it adds a host->lock lock that allows you to get your driver
> off the  horrible io_request_lock once and for all.  (Note: I may have
> to change  the name to host->host_lock to make it match 2.5, but that
> just annoys the  hell out of me because I *detest* totally redundant
> names like that...)

Well, that's just semantics, I won't get into the argument...

Note also that 2.5 already has per queue locks (that means per individual SCSI 
devices) which are lock pointers.  In 2.5 current, all the devices that hang 
off an individual host have their queue_locks initialised to point to the 
single host->host_lock.  However, drivers which can do it (the type of 
multi-threaded multi-mailbox ones) will be permitted to use individual and 
separate queue_locks, rather than the single host_lock, to protect their 
mailbox registers.  If you are never going to do this, the difference between 
using host_lock and queue_lock is irrelevant to you.  If you are, you need to 
begin using the queue_locks.

James
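
The lock-pointer arrangement described above might look roughly like the following. This is a sketch with hypothetical types (toy_lock, toy_host, toy_device) rather than the 2.5 kernel structures: each device carries a pointer to the lock that guards its queue, which defaults to the single host lock and can be redirected to a private lock by a driver whose hardware allows it.

```c
#include <assert.h>

/* Placeholder for spinlock_t; only its address matters in this sketch. */
struct toy_lock {
    int unused;
};

struct toy_host {
    struct toy_lock host_lock;
};

struct toy_device {
    struct toy_host *host;
    struct toy_lock  own_lock;      /* used only if the driver opts in */
    struct toy_lock *queue_lock;    /* lock pointer, as described above */
};

/* Default: every device on the host shares the single host lock. */
static void toy_device_init(struct toy_device *d, struct toy_host *h)
{
    d->host = h;
    d->queue_lock = &h->host_lock;
}

/* Opt-in for multi-mailbox hardware that can queue per device. */
static void toy_device_use_private_lock(struct toy_device *d)
{
    d->queue_lock = &d->own_lock;
}

/* Two devices serialise against each other only if they share a lock. */
static int toy_devices_share_lock(const struct toy_device *a,
                                  const struct toy_device *b)
{
    return a->queue_lock == b->queue_lock;
}
```

After toy_device_init() on two devices of the same host, toy_devices_share_lock() reports that they serialise on the host lock; once one of them calls toy_device_use_private_lock(), they no longer do.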

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: When must the io_request_lock be held?
  2002-08-07 16:48       ` James Bottomley
@ 2002-08-07 18:06         ` Mike Anderson
  2002-08-07 23:17           ` James Bottomley
  2002-08-08 19:28         ` Luben Tuikov
  1 sibling, 1 reply; 297+ messages in thread
From: Mike Anderson @ 2002-08-07 18:06 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

James Bottomley [James.Bottomley@steeleye.com] wrote:
> 
> Well, that's just semantics, I won't get into the argument...
> 
> Note also that 2.5 already has per queue locks (that means per individual SCSI 
> devices) which are lock pointers.  In 2.5 current, all the devices that hang 
> off an individual host have their queue_locks initialised to point to the 
> single host->host_lock.  However, drivers which can do it (the type of 
> multi-threaded multi-mailbox ones) will be permitted to use individual and 
> separate queue_locks, rather than the single host_lock, to protect their 
> mailbox registers.  If you are never going to do this, the difference between 
> using host_lock and queue_lock is irrelevant to you.  If you are, you need to 
> begin using the queue_locks.

James,
	Maybe I am mis-reading something.

	Why would we want to go back to a model of having LL drivers
	using locks outside their domain (queue_lock)? Should we not be
	moving toward having drivers use their own locks?

	I would think a future model might be to have queue_locks for
	queue operations and possibly shared as a scsi device lock, a
	host_lock for host data, and LL driver locks used where the
	driver needs them.

-Mike
-- 
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: When must the io_request_lock be held?
  2002-08-07 18:06         ` Mike Anderson
@ 2002-08-07 23:17           ` James Bottomley
  0 siblings, 0 replies; 297+ messages in thread
From: James Bottomley @ 2002-08-07 23:17 UTC (permalink / raw)
  To: James Bottomley, linux-scsi

andmike@us.ibm.com said:
> 	Maybe I am mis-reading something.

> 	Why would we want to go back to a model of having LL drivers
> 	using locks outside their domain (queue_lock)? Should we not be
> 	moving toward having drivers use their own locks?

> 	I would think a future model might be to have queue_locks for
> 	queue operations and possibly shared as a scsi device lock, a
> 	host_lock for host data, and LL driver locks used where the
> 	driver needs them.

We're not really moving back:  Drivers that do their own locking today should 
continue to work under the new model (it just breaks the granularity of the 
io_request_lock, which was used for queueing previously).  Obviously, if those 
drivers move to individual queue locks instead of using the host lock for all 
queues, they get greater throughput but have nastier deadlock potential.

The drivers that don't have their own locks should now rely on either the 
host_lock or the queue_lock depending on what their hardware allows.  The 
re-entrancy of queuecommand is governed by the programming model of the 
device.  For simple register and single mailbox cards, it doesn't really make 
too much sense to introduce a separate card lock just so they can drop the 
queue locks: it complicates the programming model and has no corresponding 
performance gain.

The bottom line is that there's no reason to require all drivers to develop 
their own locking scheme, but the barriers to this should be decreasing for 
those drivers that can profitably do this.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: When must the io_request_lock be held?
  2002-08-07 16:48       ` James Bottomley
  2002-08-07 18:06         ` Mike Anderson
@ 2002-08-08 19:28         ` Luben Tuikov
  1 sibling, 0 replies; 297+ messages in thread
From: Luben Tuikov @ 2002-08-08 19:28 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

James Bottomley wrote:
> 
> dledford@redhat.com said:
> > Again, if Marcelo takes my patch, this will become easier as well
> > since  it adds a host->lock lock that allows you to get your driver
> > off the  horrible io_request_lock once and for all.  (Note: I may have
> > to change  the name to host->host_lock to make it match 2.5, but that
> > just annoys the  hell out of me because I *detest* totally redundant
> > names like that...)
> 
> Well, that's just semantics, I won't get into the argument...
> 

I've assumed that you also wanted to add that semantics are
_very_ important.

-- 
Luben

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: When must the io_request_lock be held?
  2002-08-07 16:18     ` Doug Ledford
  2002-08-07 16:48       ` James Bottomley
@ 2002-08-07 16:55       ` Patrick Mansfield
  1 sibling, 0 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-08-07 16:55 UTC (permalink / raw)
  To: James Bottomley, Jamie Wellnitz, linux-scsi

On Wed, Aug 07, 2002 at 12:18:46PM -0400, Doug Ledford wrote:

> Again, if Marcelo takes my patch, this will become easier as well since 
> it adds a host->lock lock that allows you to get your driver off the 
> horrible io_request_lock once and for all.  (Note: I may have to change 
> the name to host->host_lock to make it match 2.5, but that just annoys the 
> hell out of me because I *detest* totally redundant names like that...)

Such code looks a bit odd, but I would rather have it and be able to
more easily find or audit all the references; using cscope, there
are about 6700 references to lock in the 2.5 kernel (only looking at
i386 sources, so there are probably more), and about 300 references to
host_lock.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 297+ messages in thread

* [PATCH] 2.5.31 scsi_error.c cleanup
@ 2002-08-12 23:38 Mike Anderson
  2002-08-22 14:05 ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Mike Anderson @ 2002-08-12 23:38 UTC (permalink / raw)
  To: linux-scsi

I have created a scsi_error cleanup patch.

I would appreciate any feedback and testing.

The patch simplifies scsi_unjam_host and associated functions.

I did not change any of the current error policy. I would like to do
that in the future.

I have tested the patch on the following configurations with successful
recovery:

UML:
	Driver: scsi_debug Version: 1.59
	Error Hdlr Policy:
		- Request sense if no auto sense.
		- Abort on timed out command

2.5.31:
	Driver: aic7xxx 6.2.4
	Error Hdlr Policy:
		- Abort on timed out command
		- Attempt BDR
		- Send Bus Reset.

I also ran against qlogicisp, ips, and qla drivers but error injection
did not generate any meaningful results.

Complete patch against 2.5.31 is at:

http://www-124.ibm.com/storageio/gen-io/patch-scsi_error-2.5.31-1.gz

-Mike

-- 
Michael Anderson
andmike@us.ibm.com

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] 2.5.31 scsi_error.c cleanup
  2002-08-12 23:38 [PATCH] 2.5.31 scsi_error.c cleanup Mike Anderson
@ 2002-08-22 14:05 ` James Bottomley
  2002-08-22 16:34   ` Mike Anderson
  0 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-08-22 14:05 UTC (permalink / raw)
  To: Mike Anderson; +Cc: linux-scsi

andmike@us.ibm.com said:
> I did not change any of the current error policy. I would like to do
> that in the future. 

Looking through the patch, it seems you've changed the offline behaviour.  Now 
if all error recovery fails, the machine will panic instead of just offlining 
the failed device.  I know offlining has never worked correctly, because it 
always seems to leave the system hanging, but it is a useful feature for large 
machines with many SCSI attachments.  Could you look at trying to get it to 
work correctly?

Thanks,

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] 2.5.31 scsi_error.c cleanup
  2002-08-22 14:05 ` James Bottomley
@ 2002-08-22 16:34   ` Mike Anderson
  2002-08-22 17:11     ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Mike Anderson @ 2002-08-22 16:34 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

James Bottomley [James.Bottomley@steeleye.com] wrote:
> andmike@us.ibm.com said:
> > I did not change any of the current error policy. I would like to do
> > that in the future. 
> 
> Looking through the patch, it seems you've changed the offline behavior.  Now 
> if all error recovery fails, the machine will panic instead of just offlining 
> the failed device.  I know offlining has never worked correctly, because it 
> always seems to leave the system hanging, but it is a useful feature for large 
> machines with many SCSI attachments, could you look at trying to get it to 
> work correctly?
> 
> Thanks,
> 
> James

Thanks for the feedback. My intent was not to panic on failed recovery. I
will re-verify an offline case.

All sdev's that fail to recover in scsi_eh_bus_host_reset through a bus
and/or host reset should be offlined at the bottom of the channel for
loop before we go on to the next channel. The BUG_ON(shost->host_failed)
in scsi_unjam_host is a carryover from the old scsi_error panic that I
believe was trying to catch race conditions / code problems. I left the
check in for now.

-Mike

-- 
Michael Anderson
andmike@us.ibm.com
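
The escalation discussed in this thread (abort, then BDR, then bus reset, then host reset, and offlining only the device that still fails) can be pictured with the toy model below. It is not taken from scsi_error.c or from the patch; toy_sdev, toy_eh_try, toy_eh_recover and the recovers_at test knob are all hypothetical names used for illustration.

```c
#include <assert.h>

/* Escalation steps, mildest first. Names are illustrative. */
enum toy_eh_step {
    TOY_EH_ABORT,
    TOY_EH_BDR,
    TOY_EH_BUS_RESET,
    TOY_EH_HOST_RESET,
    TOY_EH_GAVE_UP
};

struct toy_sdev {
    int online;
    /* Test knob: first step that succeeds; TOY_EH_GAVE_UP means never. */
    enum toy_eh_step recovers_at;
};

/* Pretend to run one recovery step; nonzero means the device recovered. */
static int toy_eh_try(const struct toy_sdev *sdev, enum toy_eh_step step)
{
    return step >= sdev->recovers_at;
}

/* Walk the ladder; offline the device (not the machine) if all steps fail. */
static enum toy_eh_step toy_eh_recover(struct toy_sdev *sdev)
{
    int step;

    for (step = TOY_EH_ABORT; step < TOY_EH_GAVE_UP; step++)
        if (toy_eh_try(sdev, (enum toy_eh_step)step))
            return (enum toy_eh_step)step;

    sdev->online = 0;
    return TOY_EH_GAVE_UP;
}
```

A device whose recovers_at is TOY_EH_BDR comes back after the second step and stays online; one that never recovers ends up with online set to 0 while the rest of the model carries on.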

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] 2.5.31 scsi_error.c cleanup
  2002-08-22 16:34   ` Mike Anderson
@ 2002-08-22 17:11     ` James Bottomley
  2002-08-22 20:10       ` Mike Anderson
  0 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-08-22 17:11 UTC (permalink / raw)
  To: James Bottomley, linux-scsi

andmike@us.ibm.com said:
> Thanks for the feedback. My intent was not to panic on failed recovery.
> I will re-verify an offline case.

> All sdev's that fail to recover in scsi_eh_bus_host_reset through a
> bus and/or host reset should be offlined at the bottom of the channel
> for loop before we go on to the next channel. The
> BUG_ON(shost->host_failed) in scsi_unjam_host is a carryover from the
> old scsi_error panic that I believe was trying to catch race conditions
> / code problems. I left the check in for now.

Ah, OK.  I missed that in the logic flow.  Does it perhaps make sense to move 
the offline into its own nicely named function?  It might add to the initial 
readability of scsi_unjam_host.  I see it was problematic because the offline 
loop is bound up inside the host loop.

James



^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] 2.5.31 scsi_error.c cleanup
  2002-08-22 17:11     ` James Bottomley
@ 2002-08-22 20:10       ` Mike Anderson
  0 siblings, 0 replies; 297+ messages in thread
From: Mike Anderson @ 2002-08-22 20:10 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

James Bottomley [James.Bottomley@SteelEye.com] wrote:
> Ah, OK.  I missed that in the logic flow.  Does it perhaps make sense to move 
> the offline into its own nicely named function?  It might add to the initial 
> readability of scsi_unjam_host.  I see it was problematic because the offline 
> loop is bound up inside the host loop.
> 
> James

On early versions of the patch I had off-lining in its own function, but
moved it into scsi_eh_bus_host_reset for some forgotten reason. I will
re-roll the patch with an offline function as I agree it will improve the
readability of scsi_unjam_host.

-Mike

-- 
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC]: 64 bit LUN/Tags, dummy device in host_queue, host_lock <-> LLDD reentrancy
@ 2002-08-26 16:29 Aron Zeh
  2002-08-26 16:48 ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Aron Zeh @ 2002-08-26 16:29 UTC (permalink / raw)
  To: James Bottomley; +Cc: Luben Tuikov, linux-scsi


> ARZEH@de.ibm.com said:
> > To me all of the proposed changes look good. The one above I'd be
> > interested in particularly. It would help sorting out a lot of woes we
> > had with writing a fibre-channel HBA driver. For fibre-channel, LUNs
> > and port IDs (WWPN) are 64-bit and, depending on configuration, high
> > values can be (and in our case are) common.
>
> Actually, I'd like to see us moving towards adopting the capabilities of
> driverfs for this.
>
> PUN is to all intents and purposes now an abstraction in the FC realm.  LUNs
> too, as long as they retain the grouping abstraction with PUNs.  The mid layer
> really doesn't need to know what these are (there are a few pieces of code
> that populate the LUN field for SCSI-1 devices that would need rework).  All
> the mid layer really needs is the Scsi_Device structure that describes an
> individual LUN (and some knowledge of LUN and PUN grouping for reset action
> prediction).  It doesn't need to know or care about the current PUN and LUN
> numbers.
>
> If FC drivers move straight to using driverfs, you can populate the driverfs
> names with whatever is meaningful to you for PUN (WWN, portid etc.) and LUN
> and thus avoid this problem (and also the mapping code most have to take these
> to and from the PUN/LUN numbers).  Since driverfs is a tree, the PUN/LUN
> division is done in a directory (and could, theoretically be done on more than
> just a two level split).
>
> James

James, there are some things that I don't fully understand yet.

Does the use of driverfs-names mean that the HBA driver somehow comes into
the discovery-loop for devices (e.g. scan_scsis somehow interacts with the
HBA driver to determine which driverfs entry to create)?
Shouldn't there be some sort of rule for LUN (or unit) naming within at
least the same hardware class? That is for: parallel, iSCSI, FC, etc.
Will the SCSI stack automatically try to discover all devices behind the
HBA (e.g., on the FC SAN)? Or will there be an interface to restrict the
range, etc.?
Lastly, how will you keep track of LUNs in the SCSI stack? Will the current
LUN field disappear in favour of a generic character string?

Cheers,
Aron






^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC]: 64 bit LUN/Tags, dummy device in host_queue, host_lock <-> LLDD reentrancy
  2002-08-26 16:29 [RFC]: 64 bit LUN/Tags, dummy device in host_queue, host_lock <-> LLDD reentrancy Aron Zeh
@ 2002-08-26 16:48 ` James Bottomley
  2002-08-26 17:27   ` Mike Anderson
  0 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-08-26 16:48 UTC (permalink / raw)
  To: Aron Zeh; +Cc: James Bottomley, Luben Tuikov, linux-scsi

ARZEH@de.ibm.com said:
> Does the use of driverfs-names mean that the HBA driver somehow comes
> into the discovery-loop for devices (e.g. scan_scsis somehow interacts
> with the HBA driver to determine which driverfs entry to create.)?

Well, scanning is really only for legacy.  Way into the future, I'd like to 
see hotplug notification for device attach events, so no fibre controller ever 
need be troubled by the scan.  All they'd do is send a device attach 
notification into SCSI when they detect one.  (SCSI would probably then turn 
around and ask for a REPORT_LUNS).

> Shouldn't there be some sort of rule for LUN (or unit) naming within
> at least the same hardware class? That is for: parallel, iSCSI, FC,
> etc. Will the SCSI stack automatically try to discover all devices
> behind the HBA (e.g, on the FC SAN)? Or will there be an interface to
> restrict the range, etc?

Yes, there should.  But such a rule would only be for maintaining consistency 
within driverfs, and therefore could happily become Patrick Mochel's problem.  
If we go the whole hotplug route, discovery would be up to the scsi hotplug 
scripts.

> Lastly, how will you keep track of LUNs in the SCSI stack? Will the
> current LUN field disappear in favour of a generic character string? 

Internally to Scsi_Device, LUNs will be linked by pointers.  How we know 
whether such a linkage needs to occur, I'm not yet sure about.  In the worst 
case we could use driverfs matching to see if a newly discovered device's PUN 
matches an existing device, or there might be a way for the HBA driver to 
supply the information.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC]: 64 bit LUN/Tags, dummy device in host_queue, host_lock <-> LLDD reentrancy
  2002-08-26 16:48 ` James Bottomley
@ 2002-08-26 17:27   ` Mike Anderson
  2002-08-26 19:00     ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Mike Anderson @ 2002-08-26 17:27 UTC (permalink / raw)
  To: James Bottomley; +Cc: Aron Zeh, Luben Tuikov, linux-scsi

James Bottomley [James.Bottomley@SteelEye.com] wrote:
> ARZEH@de.ibm.com said:
> > Does the use of driverfs-names mean that the HBA driver somehow comes
> > into the discovery-loop for devices (e.g. scan_scsis somehow interacts
> > with the HBA driver to determine which driverfs entry to create.)?
> 
> Well, scanning is really only for legacy.  Way into the future, I'd like to 
> see hotplug notification for device attach events, so no fibre controller ever 
> need be troubled by the scan.  All they'd do is send a device attach 
> notification into SCSI when they detect one.  (SCSI would probably then turn 
> around and ask for a REPORT_LUNS).
> 

There will probably still need to be some form of scan through the
hotplug scripts based on an HA's capability (parallel SCSI, FC loop),
for example a CAM-3-like indication that the HA is capable of auto
discovery or needs a probe from min to max.

> > Shouldn't there be some sort of rule for LUN (or unit) naming within
> > at least the same hardware class? That is for: parallel, iSCSI, FC,
> > etc. Will the SCSI stack automatically try to discover all devices
> > behind the HBA (e.g, on the FC SAN)? Or will there be an interface to
> > restrict the range, etc?
> 
> Yes, there should.  But such a rule would only be for maintaining consistency 
> within driverfs, and therefore could happily become Patrick Mochel's problem.  
> If we go the whole hotplug route, discovery would be up to the scsi hotplug 
> scripts.
> 
> > Lastly, how will you keep track of LUNs in the SCSI stack? Will the
> > current LUN field disappear in favour of a generic character string? 
> 
> Internally to Scsi_Device, LUNs will be linked by pointers.  How we know 
> whether such a linkage needs to occur, I'm not yet sure about.  In the worst 
> case we could use driverfs matching to see if a newly discovered device's PUN 
> matches an existing device, or there might be a way for the HBA driver to 
> supply the information.
> 

James, I do not follow the usage of LUNs linked by pointers, can you
explain further?

-Mike
-- 
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC]: 64 bit LUN/Tags, dummy device in host_queue, host_lock <-> LLDD reentrancy
  2002-08-26 17:27   ` Mike Anderson
@ 2002-08-26 19:00     ` James Bottomley
  2002-08-26 20:57       ` Mike Anderson
  0 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-08-26 19:00 UTC (permalink / raw)
  To: James Bottomley, Aron Zeh, Luben Tuikov, linux-scsi

andmike@us.ibm.com said:
> James I do not follow the usage of LUNs linked by pointers, can you
> explain further? 

Yes:  the mid-layer could function only with Scsi_Devices and Scsi_Hosts.  
LUNs are currently only used explicitly for scanning.  However, I'd like to 
make the error handler handle the consequences of its action, thus it will 
need to know LUN relationships to handle a BDR correctly.  However, the mid 
layer doesn't need an arbitrary number associated with LUN or PUN, since the 
number, string or whatever is only really useful to the LLD.  So, simply put, 
we'd probably have an entry in the Scsi_Device for lun_list.  This would be 
the usual doubly linked list so starting with any Scsi_Device we could always 
find the associated LUNs.

James
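
As a sketch of the lun_list idea above: each device embeds a link in a headless circular doubly linked list, so that starting from any device the error handler can reach every LUN that a BDR would affect. The types and helpers here (toy_list, toy_scsi_device, toy_bdr_mark) are hypothetical userspace stand-ins, loosely modelled on the kernel's list_head style, and are not the actual Scsi_Device layout.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list link. */
struct toy_list {
    struct toy_list *next, *prev;
};

struct toy_scsi_device {
    int was_reset;              /* set when a BDR would have hit this LUN */
    struct toy_list lun_list;   /* ring of all LUNs behind the same target */
};

/* A freshly initialised device is a ring of one. */
static void toy_list_init(struct toy_list *l)
{
    l->next = l->prev = l;
}

/* Insert node into the ring immediately after pos. */
static void toy_list_add(struct toy_list *node, struct toy_list *pos)
{
    node->next = pos->next;
    node->prev = pos;
    pos->next->prev = node;
    pos->next = node;
}

/* Recover the containing device from its embedded link. */
#define toy_entry(ptr) \
    ((struct toy_scsi_device *)((char *)(ptr) - \
        offsetof(struct toy_scsi_device, lun_list)))

/* Mark every LUN sharing a target with sdev; returns how many were marked. */
static int toy_bdr_mark(struct toy_scsi_device *sdev)
{
    struct toy_list *p;
    int n = 1;

    sdev->was_reset = 1;
    for (p = sdev->lun_list.next; p != &sdev->lun_list; p = p->next) {
        toy_entry(p)->was_reset = 1;
        n++;
    }
    return n;
}
```

Note that no LUN or PUN number appears anywhere in the walk; the grouping is carried entirely by the links, which is the property the message above is after.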

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC]: 64 bit LUN/Tags, dummy device in host_queue, host_lock <-> LLDD reentrancy
  2002-08-26 19:00     ` James Bottomley
@ 2002-08-26 20:57       ` Mike Anderson
  2002-08-26 21:10         ` James Bottomley
  2002-08-26 21:15         ` Mike Anderson
  0 siblings, 2 replies; 297+ messages in thread
From: Mike Anderson @ 2002-08-26 20:57 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

James Bottomley [James.Bottomley@SteelEye.com] wrote:
> andmike@us.ibm.com said:
> > James I do not follow the usage of LUNs linked by pointers, can you
> > explain further? 
> 
> Yes:  the mid-layer could function only with Scsi_Devices and Scsi_Hosts.  
> LUNs are currently only used explicitly for scanning.  However, I'd like to 
> make the error handler handle the consequences of its action, thus it will 
> need to know LUN relationships to handle a BDR correctly.  However, the mid 
> layer doesn't need an arbitrary number associated with LUN or PUN.  Since the 
> number, string or whatever is only really useful to the lld.  So, simply put, 
> we'd probably have an entry in the Scsi_Device for lun_list.  This would be 
> the usual doubly linked list so starting with any Scsi_Device we could always 
> find the associated LUNs.
> 

Ok, so we are looking for a sibling check. We should alter our driverfs
registration so that we pick this up for free by comparing the
de->parent of our scsi_device. Example tree shown below. Our scsi_device
devfs_entry would be pointing to the lun leaf node. TBD on where all the
devfs_entrys get stored and who creates / destroys them.

Example tree
devices/root/pci0/00:09.0/name "PCI device 1014:002e"
devices/root/pci0/00:09.0/0/name "scsi0"
devices/root/pci0/00:09.0/0/0/name "chan0"
devices/root/pci0/00:09.0/0/0/1/name "tgt1"
devices/root/pci0/00:09.0/0/0/1/1/name "lun1"
devices/root/pci0/00:09.0/0/0/1/0/name "lun0"
devices/root/pci0/00:09.0/0/0/0/name "tgt0"
devices/root/pci0/00:09.0/0/0/0/1/name "lun1"
devices/root/pci0/00:09.0/0/0/0/0/name "lun0"
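
A rough sketch of the sibling check this tree would allow, assuming each node carries a pointer to its parent. The ex_node type is a hypothetical stand-in for whatever driverfs entry the scsi_device ends up pointing at.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node standing in for a driverfs device entry. */
struct ex_node {
    const char *name;
    struct ex_node *parent;
};

/* Two lun nodes are siblings when they are distinct and share a parent. */
static int ex_lun_is_sibling(const struct ex_node *a, const struct ex_node *b)
{
    assert(a != NULL && b != NULL);
    return a != b && a->parent != NULL && a->parent == b->parent;
}
```

With the tree above, tgt0/lun0 and tgt0/lun1 compare as siblings, while tgt0/lun0 and tgt1/lun0 do not.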


-Mike
-- 
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC]: 64 bit LUN/Tags, dummy device in host_queue, host_lock <-> LLDD reentrancy
  2002-08-26 20:57       ` Mike Anderson
@ 2002-08-26 21:10         ` James Bottomley
  2002-08-26 22:38           ` Mike Anderson
  2002-08-26 21:15         ` Mike Anderson
  1 sibling, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-08-26 21:10 UTC (permalink / raw)
  To: James Bottomley, linux-scsi

andmike@us.ibm.com said:
> Ok, so we are looking for a sibling check

Exactly.

> Example tree
> devices/root/pci0/00:09.0/name "PCI device 1014:002e"
> devices/root/pci0/00:09.0/0/name "scsi0"
> devices/root/pci0/00:09.0/0/0/name "chan0"
> devices/root/pci0/00:09.0/0/0/1/name "tgt1"
> devices/root/pci0/00:09.0/0/0/1/1/name "lun1"
> devices/root/pci0/00:09.0/0/0/1/0/name "lun0"
> devices/root/pci0/00:09.0/0/0/0/name "tgt0"
> devices/root/pci0/00:09.0/0/0/0/1/name "lun1"
> devices/root/pci0/00:09.0/0/0/0/0/name "lun0"

I'm not too happy with this tree.  The unique string 
devices/root/pci0/00:09.0/name "PCI device 1014:002e" is enough to enumerate 
the scsi card (as long as it's not multi-function).  If I follow my PUN/LUN 
position to its logical conclusion, we no longer need scsi host numbers, 
either.  tgt1 is really a fiction: For a device that has luns there's usually 
no such thing as the target address (traditionally, it's LUN 0).  However, 
fundamentally, the point is that moving all this entirely into driverfs means 
that scsi doesn't need to care what the tree looks like, it just has mappings 
into parts of it.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC]: 64 bit LUN/Tags, dummy device in host_queue, host_lock <-> LLDD reentrancy
  2002-08-26 21:10         ` James Bottomley
@ 2002-08-26 22:38           ` Mike Anderson
  2002-08-26 22:56             ` Patrick Mansfield
                               ` (2 more replies)
  0 siblings, 3 replies; 297+ messages in thread
From: Mike Anderson @ 2002-08-26 22:38 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

James Bottomley [James.Bottomley@SteelEye.com] wrote:
> > Example tree
> > devices/root/pci0/00:09.0/name "PCI device 1014:002e"
> > devices/root/pci0/00:09.0/0/name "scsi0"
> > devices/root/pci0/00:09.0/0/0/name "chan0"
> > devices/root/pci0/00:09.0/0/0/1/name "tgt1"
> > devices/root/pci0/00:09.0/0/0/1/1/name "lun1"
> > devices/root/pci0/00:09.0/0/0/1/0/name "lun0"
> > devices/root/pci0/00:09.0/0/0/0/name "tgt0"
> > devices/root/pci0/00:09.0/0/0/0/1/name "lun1"
> > devices/root/pci0/00:09.0/0/0/0/0/name "lun0"
> 
> I'm not too happy with this tree.  The unique string 
> devices/root/pci0/00:09.0/name "PCI device 1014:002e" is enough to enumerate 
> the scsi card (as long as it's not multi-function).  If I follow my PUN/LUN 
> position to its logical conclusion, we no longer need scsi host numbers, 
> either.  
Agreed, this most likely can be collapsed. In looking at hosts.[ch] list
addition/cleanup and talking with Greg k-h in the future (once driverfs
ref count is in) we should not be doing our own host lists and ref
counting. 

> tgt1 is really a fiction: For a device that has luns there's usually 
> no such thing as the target address (traditionally, it's LUN 0).

Is there no need to represent a I_T nexus and a I_T_L nexus? It would
seem to be a loss of information if we dump everything into too large of
a bucket and then need to make extra links to regain this information.

The tgts were just examples, these could be port names, etc. My point
was to say if luns (which can have any name you want) were contained
under a port node then it would be easy to determine siblings. The only
drawback would be an extra lun0 node.

> However, fundamentally, the point is that moving all this entirely
> into driverfs means that scsi doesn't need to care what the tree looks
> like, it just has mappings into parts of it.

I am confused as the scsi subsystem is creating parts of this tree by
deciding when and what to register with driverfs. Driverfs is only
providing the interface and some default nodes. While driverfs is doing
a lot, how we register can make future code easier or harder.

-Mike

-- 
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC]: 64 bit LUN/Tags, dummy device in host_queue, host_lock <-> LLDD reentrancy
  2002-08-26 22:38           ` Mike Anderson
@ 2002-08-26 22:56             ` Patrick Mansfield
  2002-08-26 23:10             ` Doug Ledford
  2002-08-28 14:38             ` James Bottomley
  2 siblings, 0 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-08-26 22:56 UTC (permalink / raw)
  To: James Bottomley, linux-scsi

On Mon, Aug 26, 2002 at 03:38:09PM -0700, Mike Anderson wrote:
> James Bottomley [James.Bottomley@SteelEye.com] wrote:
> > > Example tree
> > > devices/root/pci0/00:09.0/name "PCI device 1014:002e"
> > > devices/root/pci0/00:09.0/0/name "scsi0"
> > > devices/root/pci0/00:09.0/0/0/name "chan0"
> > > devices/root/pci0/00:09.0/0/0/1/name "tgt1"
> > > devices/root/pci0/00:09.0/0/0/1/1/name "lun1"
> > > devices/root/pci0/00:09.0/0/0/1/0/name "lun0"
> > > devices/root/pci0/00:09.0/0/0/0/name "tgt0"
> > > devices/root/pci0/00:09.0/0/0/0/1/name "lun1"
> > > devices/root/pci0/00:09.0/0/0/0/0/name "lun0"
> > 
> > I'm not too happy with this tree.  The unique string 
> > devices/root/pci0/00:09.0/name "PCI device 1014:002e" is enough to enumerate 
> > the scsi card (as long as it's not multi-function).  If I follow my PUN/LUN 
> > position to its logical conclusion, we no longer need scsi host numbers, 
> > either.  
> Agreed, this most likely can be collapsed. In looking at hosts.[ch] list
> addition/cleanup and talking with Greg k-h in the future (once driverfs
> ref count is in) we should not be doing our own host lists and ref
> counting. 

> > tgt1 is really a fiction: For a device that has luns there's usually 
> > no such thing as the target address (traditionally, it's LUN 0).
> 
> Is there no need to represent a I_T nexus and a I_T_L nexus? It would
> seem to be a loss of information if we dump everything into too large of
> a bucket and then need to make extra links to regain this information.

> The tgts were just examples, these could be port names, etc. My point
> was to say if luns (which can have any name you want) were contained
> under a port node then it would be easy to determine siblings. The only
> drawback would be an extra lun0 node.

James/Mike - 

For hotplug user level probe/scanning, you want a target identified: having
the device model (aka driverfs) expose a target would (should) eventually
lead to a hotplug callout that can then be used to trigger a probe (sequential
lun or report lun scans) of the target from user land.

A target representation is also nice for device (i.e. target) 
removal/replacement, where we want to shut down and perhaps remove
everything on one target with multiple LUNs. Even if we have to manually
remove each LUN, having a target in the tree lets a user figure out
which LUNs have to be shut down/removed.

The notion of a target could be useful for multi-path path selection, when
we want to round-robin across the target ports, rather than a LUN port, or
pick the least-busy target port. The notion of a bus (like a SCSI bus, FCP
fabric, or an FCP loop) could be useful for similar reasons.

Also - according to Pat Mochel the struct device "name" field is supposed
to be a description of the device, not an id or unique identifier. I don't
know if he has plans to add an id/uuid field.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC]: 64 bit LUN/Tags, dummy device in host_queue, host_lock <-> LLDD reentrancy
  2002-08-26 22:38           ` Mike Anderson
  2002-08-26 22:56             ` Patrick Mansfield
@ 2002-08-26 23:10             ` Doug Ledford
  2002-08-28 14:38             ` James Bottomley
  2 siblings, 0 replies; 297+ messages in thread
From: Doug Ledford @ 2002-08-26 23:10 UTC (permalink / raw)
  To: James Bottomley, linux-scsi

On Mon, Aug 26, 2002 at 03:38:09PM -0700, Mike Anderson wrote:
> James Bottomley [James.Bottomley@SteelEye.com] wrote:
> > > Example tree
> > > devices/root/pci0/00:09.0/name "PCI device 1014:002e"
> > > devices/root/pci0/00:09.0/0/name "scsi0"
> > > devices/root/pci0/00:09.0/0/0/name "chan0"
> > > devices/root/pci0/00:09.0/0/0/1/name "tgt1"
> > > devices/root/pci0/00:09.0/0/0/1/1/name "lun1"
> > > devices/root/pci0/00:09.0/0/0/1/0/name "lun0"
> > > devices/root/pci0/00:09.0/0/0/0/name "tgt0"
> > > devices/root/pci0/00:09.0/0/0/0/1/name "lun1"
> > > devices/root/pci0/00:09.0/0/0/0/0/name "lun0"
> > 
> > I'm not too happy with this tree.  The unique string 
> > devices/root/pci0/00:09.0/name "PCI device 1014:002e" is enough to enumerate 
> > the scsi card (as long as it's not multi-function).  If I follow my PUN/LUN 
> > position to its logical conclusion, we no longer need scsi host numbers, 
> > either.  
> Agreed, this most likely can be collapsed. In looking at hosts.[ch] list
> addition/cleanup and talking with Greg k-h in the future (once driverfs
> ref count is in) we should not be doing our own host lists and ref
> counting. 

The scsi host can be collapsed, the other stuff should really remain.  
After all, the scsi host number was nothing more than the previous way of 
expressing "PCI device 1014:002e", they both map to a controller 
somewhere.

> > tgt1 is really a fiction: For a device that has luns there's usually 
> > no such thing as the target address (traditionally, it's LUN 0).
> 
> Is there no need to represent a I_T nexus and a I_T_L nexus?

There is.  Not really for device needs, but for management needs.  It's 
connection specific and likely is just an implementation detail, but it 
would make life simpler for LLDDs if this was expressed.  What I'm 
thinking is that both the target level struct and the lun level struct 
need to have driver_private pointers for driver defined memory structs as 
well as a back pointer from the lun struct to the parent target struct.  
This is so drivers can easily do things like implement tagged queues 
properly, which are per lun, and also implement speed negotiations, which 
are per target.
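
A sketch of the two structs described above, under the assumption that the lun struct holds a back pointer to a thin target struct and both carry a driver_private pointer. The ex_target and ex_lun names and their fields are hypothetical; they only illustrate the per-target versus per-lun split.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-target state: things that belong to the I_T nexus. */
struct ex_target {
    unsigned int sync_period;   /* stands in for negotiated transfer speed */
    void *driver_private;       /* LLDD-defined per-target data */
};

/* Hypothetical per-lun state: things that belong to the I_T_L nexus. */
struct ex_lun {
    unsigned int queue_depth;   /* tagged queue depth is per lun */
    struct ex_target *target;   /* back pointer to the parent target */
    void *driver_private;       /* LLDD-defined per-lun data */
};

/* Reach target-wide settings from any lun without searching a list. */
static unsigned int ex_lun_sync_period(const struct ex_lun *lun)
{
    assert(lun != NULL && lun->target != NULL);
    return lun->target->sync_period;
}
```

A renegotiation updates one ex_target and every lun behind it sees the new value through the back pointer, while queue depths stay independent per lun.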

> It would
> seem to be a loss of information if we dump everything into too large of
> a bucket and then need to make extra links to regain this information.

Having links to get the information isn't bad (it's currently better than 
the crap we do now which is search the linear list until we've found all 
the needed devices, very ugly).  Having the right links is actually quite 
nice ;-)

> The tgts were just examples, these could be port names, etc. My point
> was to say if luns (which can have any name you want) were contained
> under a port node then it would be easy to determine siblings. The only
> drawback would be an extra lun0 node.

It doesn't have to be that way.  It's entirely possible to make the target 
struct not require this extra lun0 node you refer to.  However, the target 
struct should actually be a fairly thin struct, only containing enough 
info to represent those items that are in fact target wide vs. lun 
specific.  Essentially, anything that the SAM spec calls out as using an 
I_T nexus should be here, and anything that goes on an I_T_L nexus should 
be in the lun struct.  It seems to me that the specs are likely to keep 
this level of hierarchy going forward and so it should be safe to model 
after it, but I could be wrong (and I haven't read the latest draft specs so 
I could already be wrong).  Feel free to correct me here if I am wrong.

> I am confused as the scsi subsystem is creating parts of this tree by
> deciding when and what to register with driverfs. Driverfs is only
> providing the interface and some default nodes. While driverfs is doing
> a lot, how we register can make future code easier or harder.

Hmmm...I'm not going to comment on this yet because I want to go look a 
few things up first....

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC]: 64 bit LUN/Tags, dummy device in host_queue, host_lock <-> LLDD reentrancy
  2002-08-26 22:38           ` Mike Anderson
  2002-08-26 22:56             ` Patrick Mansfield
  2002-08-26 23:10             ` Doug Ledford
@ 2002-08-28 14:38             ` James Bottomley
  2 siblings, 0 replies; 297+ messages in thread
From: James Bottomley @ 2002-08-28 14:38 UTC (permalink / raw)
  To: James Bottomley, linux-scsi

andmike@us.ibm.com said:
> Is there no need to represent a I_T nexus and a I_T_L nexus? It would
> seem to be a loss of information if we dump everything into too large
> of a bucket and then need to make extra links to regain this
> information. 

As far as the mid-layer is concerned, no.  I_T vs I_T_L is really somewhat of 
a fiction even in SAM-2 since the M_IDENTIFY for I_T just looks like a 
M_IDENTIFY for I_T_L with LUN set to 0.

> The tgts were just examples, these could be port names, etc. My point
> was to say if luns (which can have any name you want) were contained
> under a port node then it would be easy to determine siblings. The
> only drawback would be an extra lun0 node. 

Oh, yes, I agree with that.   The mid-layer just uses a Scsi_Device with a 
driverfs pointer attached.  If the driverfs representation happens to be a 
simple tgt1 or tgt1/lun0, the mid-layer won't care.

> I am confused as the scsi subsystem is creating parts of this tree by
> deciding when and what to register with driverfs. Driverfs is only
> providing the interface and some default nodes. While driverfs is
> doing a lot, how we register can make future code easier or harder. 

OK, I see, conceptually what I'm aiming at is that the bulk of the mid-layer 
doesn't know or care about the name:  it thinks in terms of Scsi_Host or 
Scsi_Device.  When the mid-layer needs to print a name, it fishes it out of 
driverfs.

Currently, the code that generates these names lives in the mid-layer, but it 
is possible that the day may come when we need standardisation amongst all 
disc like devices (or tape like devices, etc.), and it may not live there any 
longer (or at least, parts of it may not).

The ideal now is that we separate these as cleanly as possible inside the 
mid-layer.  I'd like us to be at the point where if we don't like the current 
host/chan/pun/lun/... naming scheme, a few formatting characters get altered 
in one routine and the new naming scheme is implemented. In the long run, I 
think we probably turn over one piece (the host/chan/pun) to the Host driver 
so that it can be used for naming and mapping (with library routines in the mid 
layer for the usual pun numbering) and the LUN part probably always belongs in 
the mid layer since that's fairly universal.
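
To illustrate the "one routine" point, here is a small sketch in which a single function owns the naming scheme, so changing the scheme means editing one format string. The ex_addr struct, the ex_format_name function and the colon-separated format are assumptions made for the example, not existing mid-layer code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical address tuple; field names are illustrative only. */
struct ex_addr {
    unsigned int host, chan, pun, lun;
};

/* The single place that knows the naming scheme.  Returns what snprintf
 * returns: the length the full name needs, excluding the trailing NUL. */
static int ex_format_name(char *buf, size_t len, const struct ex_addr *a)
{
    assert(buf != NULL && a != NULL);
    return snprintf(buf, len, "%u:%u:%u:%u",
                    a->host, a->chan, a->pun, a->lun);
}
```

Callers only ever see the resulting string, so nothing else needs to know whether the scheme is host/chan/pun/lun or something a host driver supplies later.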

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC]: 64 bit LUN/Tags, dummy device in host_queue, host_lock <-> LLDD reentrancy
  2002-08-26 20:57       ` Mike Anderson
  2002-08-26 21:10         ` James Bottomley
@ 2002-08-26 21:15         ` Mike Anderson
  1 sibling, 0 replies; 297+ messages in thread
From: Mike Anderson @ 2002-08-26 21:15 UTC (permalink / raw)
  To: James Bottomley, linux-scsi

Sorry for replying to myself, but a mistake on the previous mail in that
I meant to reference the scsi_device sdev_driverfs_dev member and
associated sdev_driverfs_dev.parent pointer.

-Mike
Mike Anderson [andmike@us.ibm.com] wrote:
> James Bottomley [James.Bottomley@SteelEye.com] wrote:
> > andmike@us.ibm.com said:
> > > James I do not follow the usage of LUNs linked by pointers, can you
> > > explain further? 
> > 
> > Yes:  the mid-layer could function only with Scsi_Devices and Scsi_Hosts.  
> > LUNs are currently only used explicitly for scanning.  However, I'd like to 
> > make the error handler handle the consequences of its action, thus it will 
> > need to know LUN relationships to handle a BDR correctly.  However, the mid 
> > layer doesn't need an arbitrary number associated with LUN or PUN.  Since the 
> > number, string or whatever is only really useful to the lld.  So, simply put, 
> > we'd probably have an entry in the Scsi_Device for lun_list.  This would be 
> > the usual doubly linked list so starting with any Scsi_Device we could always 
> > find the associated LUNs.
> > 
> 
> Ok, so we are looking for a sibling check. We should alter our driverfs
> registration so that we pick this up for free by comparing the
> de->parent of our scsi_device. Example tree shown below. Our scsi_device
> devfs_entry would be pointing to the lun leaf node. TBD on where all the
> devfs_entrys get stored and who creates / destroys them.
> 
> Example tree
> devices/root/pci0/00:09.0/name "PCI device 1014:002e"
> devices/root/pci0/00:09.0/0/name "scsi0"
> devices/root/pci0/00:09.0/0/0/name "chan0"
> devices/root/pci0/00:09.0/0/0/1/name "tgt1"
> devices/root/pci0/00:09.0/0/0/1/1/name "lun1"
> devices/root/pci0/00:09.0/0/0/1/0/name "lun0"
> devices/root/pci0/00:09.0/0/0/0/name "tgt0"
> devices/root/pci0/00:09.0/0/0/0/1/name "lun1"
> devices/root/pci0/00:09.0/0/0/0/0/name "lun0"
> 
> 
> -Mike
> -- 
> Michael Anderson
> andmike@us.ibm.com
> 

-- 
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
@ 2002-09-03 14:35 James Bottomley
  2002-09-03 18:23 ` Doug Ledford
  0 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-09-03 14:35 UTC (permalink / raw)
  To: Justin T. Gibbs, Doug Ledford; +Cc: linux-kernel, linux-scsi

> Doug Ledford writes:
> 
>  > took the device off line.  So, in short, the mid layer isn't waiting long
>  > enough, or when it gets sense indicated not ready it needs to implement a
>  > waiting queue with a timeout to try rekicking things a few times and don't
>  > actually mark the device off line until a longer period of time has
>  > elapsed without the device coming back.
> 
> There is a kernel config CONFIG_AIC7XXX_RESET_DELAY_MS (default 15s).
> Would increasing it help?

Justin Gibbs writes:
> This currently only affects the initial bus reset delay.  If the
> driver holds off commands after subsequent bus resets, it can cause
> undeserved timeouts on the commands it has intentionally deferred.
> The mid-layer has a 5 second delay after bus resets, but I haven't
> verified that this is honored correctly during error recovery.

I'm planning a major re-write of this area in the error handler.  The way I 
think it should go is:

1) Quiesce host (set in_recovery flag)
2) Suspend active timers on this host
3) Proceed down the error correction track (eliminate abort and go down 
device, bus and host resets and finally set the device offline).
5) On each error recovery wait out a recovery timer for the device to become 
active before talking to it again.  Send all affected commands back to the 
block layer to await reissue (note: it would now be illegal for commands to 
lie to the mid layer and say they've done the reset when they haven't).
6) issue a TUR using a command allocated to the eh for that purpose.  Process 
the return code (in particular, if the device says NOT READY, wait some more). 
 Only if the TUR definitively fails proceed up the recovery chain all the way 
to taking the device offline.
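
The recovery chain above can be sketched roughly as follows. This is a user-space toy, not error handler code: the ex_* names, the callback that stands in for issuing a TUR, and the EX_MAX_POLLS bound on "wait some more" are all assumptions. It also assumes that a device which stays NOT READY past the bound is treated like a failure and escalated.

```c
#include <assert.h>
#include <stddef.h>

enum ex_level { EX_DEVICE_RESET, EX_BUS_RESET, EX_HOST_RESET, EX_OFFLINE };
enum ex_tur   { EX_TUR_OK, EX_TUR_NOT_READY, EX_TUR_FAILED };

/* Hypothetical hook: result of a TEST UNIT READY sent after the reset at
 * 'level'; 'poll' counts how often we have already asked at this level. */
typedef enum ex_tur (*ex_tur_fn)(enum ex_level level, int poll);

#define EX_MAX_POLLS 4   /* assumed bound on "wait some more" */

/* Returns the reset level at which the device came back, or EX_OFFLINE. */
static enum ex_level ex_recover(ex_tur_fn tur)
{
    int level, poll;

    assert(tur != NULL);
    for (level = EX_DEVICE_RESET; level < EX_OFFLINE; level++) {
        /* The reset itself and the recovery-timer wait would go here. */
        for (poll = 0; poll < EX_MAX_POLLS; poll++) {
            enum ex_tur r = tur((enum ex_level)level, poll);

            if (r == EX_TUR_OK)
                return (enum ex_level)level;   /* device is back */
            if (r == EX_TUR_FAILED)
                break;                         /* definitive: escalate */
            /* EX_TUR_NOT_READY: wait some more, then ask again */
        }
    }
    return EX_OFFLINE;
}

/* Toy devices used to exercise the ladder. */
static enum ex_tur ex_slow_device(enum ex_level level, int poll)
{
    (void)level;
    return poll < 2 ? EX_TUR_NOT_READY : EX_TUR_OK;
}

static enum ex_tur ex_needs_bus_reset(enum ex_level level, int poll)
{
    (void)poll;
    return level >= EX_BUS_RESET ? EX_TUR_OK : EX_TUR_FAILED;
}

static enum ex_tur ex_dead_device(enum ex_level level, int poll)
{
    (void)level;
    (void)poll;
    return EX_TUR_FAILED;
}
```

A slow CD burner that answers NOT READY twice is recovered at the device reset level instead of being taken offline, which is the behaviour the timer wait is meant to buy.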

I also plan to expose the suspend and resume timers API in some form for FC 
drivers to use.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 14:35 aic7xxx sets CDR offline, how to reset? James Bottomley
@ 2002-09-03 18:23 ` Doug Ledford
  2002-09-03 19:09   ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-09-03 18:23 UTC (permalink / raw)
  To: James Bottomley; +Cc: Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, Sep 03, 2002 at 09:35:02AM -0500, James Bottomley wrote:
> 
> 1) Quiesce host (set in_recovery flag)

Right.

> 2) Suspend active timers on this host

Right.

> 3) Proceed down the error correction track (eliminate abort and go down 
> device, bus and host resets and finally set the device offline).

Leave abort active.  It does actually work in certain scenarios.  The CD 
burner scenario that started this thread is an example of somewhere that 
an abort should actually do the job.

> 5) On each error recovery wait out a recovery timer for the device to become 
> active before talking to it again.  Send all affected commands back to the 
> block layer to await reissue (note: it would now be illegal for commands to 
> lie to the mid layer and say they've done the reset when they haven't).
> 6) issue a TUR using a command allocated to the eh for that purpose.  Process 
> the return code (in particular, if the device says NOT READY, wait some more). 
>  Only if the TUR definitively fails proceed up the recovery chain all the way 
> to taking the device offline.

Right.

> I also plan to expose the suspend and resume timers API in some form for FC 
> drivers to use.



-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 18:23 ` Doug Ledford
@ 2002-09-03 19:09   ` James Bottomley
  2002-09-03 20:59     ` Alan Cox
                       ` (2 more replies)
  0 siblings, 3 replies; 297+ messages in thread
From: James Bottomley @ 2002-09-03 19:09 UTC (permalink / raw)
  To: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

dledford@redhat.com said:
> Leave abort active.  It does actually work in certain scenarios.  The
> CD  burner scenario that started this thread is an example of
> somewhere that  an abort should actually do the job. 

Unfortunately, it would destroy the REQ_BARRIER approach in the block layer.  
At best, abort probably causes a command to overtake a barrier it shouldn't, 
at worst we abort the ordered tag that is the barrier and transactional 
integrity is lost.

When error correction is needed, we have to return all the commands for that 
device to the block layer so that ordering and barrier issues can be taken 
care of in the reissue.  This makes LUN RESET (for those that support it) the 
minimum level of error correction we can apply.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 19:09   ` James Bottomley
@ 2002-09-03 20:59     ` Alan Cox
  2002-09-03 21:32       ` James Bottomley
  2002-09-03 21:13     ` Doug Ledford
  2002-09-03 21:24     ` Patrick Mansfield
  2 siblings, 1 reply; 297+ messages in thread
From: Alan Cox @ 2002-09-03 20:59 UTC (permalink / raw)
  Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, 2002-09-03 at 20:09, James Bottomley wrote:
> dledford@redhat.com said:
> > Leave abort active.  It does actually work in certain scenarios.  The
> > CD  burner scenario that started this thread is an example of
> > somewhere that  an abort should actually do the job. 
> 
> Unfortunately, it would destroy the REQ_BARRIER approach in the block layer.  
> At best, abort probably causes a command to overtake a barrier it shouldn't, 
> at worst we abort the ordered tag that is the barrier and transactional 
> integrity is lost.

What do we plan to do for the cases where reset is disabled because we
have shared disk scsi so don't want to reset and hose the reservations ?

If your error correction always requires all commands return to the
block layer then the block layer is IMHO broken. It's messy enough doing
that before you hit the fun situations where insert scsi commands of
their own the block layer never initiated.

Next you only need to return stuff if commands have been issued between
the aborting command and a barrier. Since most sane systems will never
be causing REQ_BARRIER that should mean the general case for an abort is
going to be fine. The CD burner example is also true for this. If we
track barrier sequences then we will know the barrier count for the
command we are aborting and the top barrier count for commands issued to
the device. Finally you only need to go to the large hammer approach
when you are dealing with a media changing command (ie WRITE*) - if we
abort a read then knowing we don't queue overlapping read then write to
disk we already know that the read will not break down the tag ordering
as I understand it ?
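
One possible reading of the barrier-count idea, sketched as a predicate. Everything here is an assumption made for illustration: each command is stamped with the number of barriers issued to the device before it, the device remembers the latest barrier count, and the ex_cmd struct and ex_abort_is_safe function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-command bookkeeping. */
struct ex_cmd {
    unsigned int barrier_seq;   /* barriers issued to the device before this command */
    int is_write;               /* media changing command (WRITE*) */
    int is_barrier;             /* this command is itself the ordered barrier */
};

/* Returns 1 when a plain abort cannot disturb barrier ordering under the
 * assumptions above, 0 when the heavier recovery path is needed. */
static int ex_abort_is_safe(const struct ex_cmd *c, unsigned int top_barrier_seq)
{
    assert(c != NULL);
    if (c->is_barrier)
        return 0;               /* aborting the barrier loses the ordering */
    if (!c->is_write)
        return 1;               /* reads do not change the media */
    /* A write is safe only if no barrier was issued after it. */
    return c->barrier_seq == top_barrier_seq;
}
```

On a device that never sees a barrier, both counts stay equal and every abort of a non-barrier command passes the check.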

If we get to the point we need an abort we don't want to issue a reset.
Not every device comes back sane from a reset and in some cases we have
to issue a whole sequence of commands to get the state of the device
back (door locking, power management, ..)

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 20:59     ` Alan Cox
@ 2002-09-03 21:32       ` James Bottomley
  2002-09-03 21:54         ` Alan Cox
  2002-09-03 22:50         ` Doug Ledford
  0 siblings, 2 replies; 297+ messages in thread
From: James Bottomley @ 2002-09-03 21:32 UTC (permalink / raw)
  To: Alan Cox; +Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

alan@lxorguk.ukuu.org.uk said:
> What do we plan to do for the cases where reset is disabled because we
> have shared disk scsi so don't want to reset and hose the reservations

The reset gets issued and the reservation gets broken.  Good HA or other 
software knows the reservation may be lost and takes this into account in the 
cluster algorithm.

With SCSI-2 reservations, there's no way to preserve the reservation and have 
the reset be effective (I know, in theory, that this can be circumvented by 
the soft reset alternative, but I've never seen a device that implements it 
correctly).  I suppose we hope SCSI-3 Persistent Group Reservations come along 
quickly.

> If your error correction always requires all commands return to the
> block layer then the block layer is IMHO broken. It's messy enough
> doing that before you hit the fun situations where insert scsi
> commands of their own the block layer never initiated. 

This is part of the slim SCSI down approach.  The block layer already has 
handling for tag errors like this.  Inserted SCSI commands should now work 
correctly since we're deprecating the scsi_do_cmnd() in favour of scsi_do_req, 
which means the command is always associated with a request and goes into the 
block queue just like any other request.

I think the block layer, which already knows about the barrier ordering, is 
the appropriate place for this.  If you think the scsi error handler is a 
hairy wart now, just watch it grow into a stonking great carbuncle as I try to 
introduce it to the concept of command queue ordering and appropriate recovery.

> Next you only need to return stuff if commands have been issued
> between the aborting command and a barrier. Since most sane systems
> will never be causing REQ_BARRIER that should mean the general case
> for an abort is going to be fine. The CD burner example is also true
> for this. If we track barrier sequences then we will know the barrier
> count for the command we are aborting and the top barrier count for
> commands issued to the device. Finally you only need to go to the
> large hammer approach when you are dealing with a media changing
> command (ie WRITE*) - if we abort a read then knowing we don't queue
> overlapping read then write to disk we already know that the read will
> not break down the tag ordering as I understand it ? 

I agree with your reasoning.  However, errors occur infrequently enough (I 
hope) so that it's just not worth the extra code complexity to make the error 
handler look for that case.

However, in all honesty, I have to say that I just don't believe ABORTs are 
ever particularly effective.  As part of error recovery, if a device is 
tipping over into failure, adding another message isn't a good way to pull it 
back.  ABORT is really part of the I/O cancellation API, and, like all 
cancellation implementations, it's potentially full of holes.  The only uses 
it might have---like oops I didn't mean to fixate that CD, give it back to me 
now---aren't clearly defined in the SPEC to produce the desired effect (stop 
the fixation so the drive door can be opened).

> If we get to the point we need an abort we don't want to issue a
> reset. Not every device comes back sane from a reset and in some cases
> we have to issue a whole sequence of commands to get the state of the
> device back (door locking, power management, ..)

Well, this is SCSI---the first thing most controllers do for parallel SCSI at 
least is reset the BUS.  Some FC drivers do the FC equivalent as well (not 
that they should, but that's another issue).

The pain of coming back from a reset (and I grant, it isn't trivial) is well 
known and well implemented in SCSI.  It also, from error handling's point of 
view, sets the device back to a known point in the state model.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 21:32       ` James Bottomley
@ 2002-09-03 21:54         ` Alan Cox
  2002-09-03 22:50         ` Doug Ledford
  1 sibling, 0 replies; 297+ messages in thread
From: Alan Cox @ 2002-09-03 21:54 UTC (permalink / raw)
  To: James Bottomley; +Cc: Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, 2002-09-03 at 22:32, James Bottomley wrote:
> I think the block layer, which already knows about the barrier ordering, is 
> the appropriate place for this.  If you think the scsi error handler is a 
> hairy wart now, just watch it grow into a stonking great carbuncle as I try to 
> introduce it to the concept of command queue ordering and appropriate recovery.

Point taken

> I agree with your reasoning.  However, errors occur infrequently enough (I 
> hope) so that it's just not worth the extra code complexity to make the error 
> handler look for that case.

When you are handling CD-ROM problems then they can be quite a lot. Not
helped by the fact some ancient CD's seem to like to take a long walk in
the park when told to reset.

> cancellation implementations, it's potentially full of holes.  The only uses 
> it might have---like oops I didn't mean to fixate that CD, give it back to me 
> now---aren't clearly defined in the SPEC to produce the desired effect (stop 
> the fixation so the drive door can be opened).

Yes but does windows assume it. There are two specs 8)

> The pain of coming back from a reset (and I grant, it isn't trivial) is well 
> known and well implemented in SCSI.  It also, from error handling's point of 
> view, sets the device back to a known point in the state model.
> 

Resetting the bus is antisocial to say the least. We had lots of hangs
with the old eh handler that went away when the new one didn't keep
resetting the bus. If we do reset the bus then we have to handle "Sorry
can't reset the bus" (we have cards that can't or won't) as well as
being conservative about timings, making sure we don't reset the bus
during the seconds while a device rattles back into online state and so
on.

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 21:32       ` James Bottomley
  2002-09-03 21:54         ` Alan Cox
@ 2002-09-03 22:50         ` Doug Ledford
  2002-09-03 23:28           ` Alan Cox
                             ` (2 more replies)
  1 sibling, 3 replies; 297+ messages in thread
From: Doug Ledford @ 2002-09-03 22:50 UTC (permalink / raw)
  To: James Bottomley; +Cc: Alan Cox, Justin T. Gibbs, linux-kernel, linux-scsi

> alan@lxorguk.ukuu.org.uk said:
> > Next you only need to return stuff if commands have been issued
> > between the aborting command and a barrier. Since most sane systems
> > will never be causing REQ_BARRIER

Hmmm...I thought a big reason for adding REQ_BARRIER was to be able to 
support more robust journaling with order requirement verification.  If 
that's true, then REQ_BARRIER commands could become quite common on disks 
using ext3.

On Tue, Sep 03, 2002 at 04:32:38PM -0500, James Bottomley wrote:
> However, in all honesty, I have to say that I just don't believe ABORTs are 
> ever particularly effective.  As part of error recovery, If a device is 
> tipping over into failure, adding another message isn't a good way to pull it 
                             ^^^^^^^^^^^^^^^^^^^^^^
Then you might as well skip device resets since they are implemented using 
messages and go straight to bus resets.  Shot deflected, no score.

> back.  ABORT is really part of the I/O cancellation API, and, like all 
> cancellation implementations, it's potentially full of holes.  The only uses 
> it might have---like oops I didn't mean to fixate that CD, give it back to me 
> now---aren't clearly defined in the SPEC to produce the desired effect (stop 
> the fixation so the drive door can be opened).

In my experience, aborts have always actually worked fairly well in any 
scenario where a bus device reset will work.  Generally speaking, the 
problems I've always run into with SCSI busses have been either A) this 
command is screwing up but it isn't confusing the drive so we can abort it 
or BDR it because the drive still responds to us or B) the bus is hung 
hard and no transfers or messages of any kind can make it through.  In the 
B case, a full bus reset is the only thing that works.  In the A case, 
aborts work just as often as anything else.

> The pain of coming back from a reset (and I grant, it isn't trivial) is well 
> known and well implemented in SCSI.  It also, from error handling's point of 
> view, sets the device back to a known point in the state model.

So does a successful abort.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 22:50         ` Doug Ledford
@ 2002-09-03 23:28           ` Alan Cox
  2002-09-04  7:40           ` Jeremy Higdon
  2002-09-04 16:13           ` James Bottomley
  2 siblings, 0 replies; 297+ messages in thread
From: Alan Cox @ 2002-09-03 23:28 UTC (permalink / raw)
  To: Doug Ledford; +Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, 2002-09-03 at 23:50, Doug Ledford wrote:
> Hmmm...I thought a big reason for adding REQ_BARRIER was to be able to 
> support more robust journaling with order requirement verification.  If 
> that's true, then REQ_BARRIER commands could become quite common on disks 
> using ext3.

I doubt it very much because the only way to implement barriers on IDE
is fantastically expensive. I'm dubious it makes sense for ext3 to use
barriers in that way. All it needs as I understand the rules is that an
I/O isn't reported as completed by the device driver before it is on the
medium or in a non volatile cache.



^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 22:50         ` Doug Ledford
  2002-09-03 23:28           ` Alan Cox
@ 2002-09-04  7:40           ` Jeremy Higdon
  2002-09-04 16:24             ` James Bottomley
  2002-09-04 16:13           ` James Bottomley
  2 siblings, 1 reply; 297+ messages in thread
From: Jeremy Higdon @ 2002-09-04  7:40 UTC (permalink / raw)
  To: Doug Ledford, James Bottomley
  Cc: Alan Cox, Justin T. Gibbs, linux-kernel, linux-scsi

On Sep 3,  6:50pm, Doug Ledford wrote:
> 
> > alan@lxorguk.ukuu.org.uk said:
> > > Next you only need to return stuff if commands have been issued
> > > between the aborting command and a barrier. Since most sane systems
> > > will never be causing REQ_BARRIER
> 
> Hmmm...I thought a big reason for adding REQ_BARRIER was to be able to 
> support more robust journaling with order requirement verification.  If 
> that's true, then REQ_BARRIER commands could become quite common on disks 
> using ext3.

Hmm.  There do seem to be a lot of loopholes/race conditions where the
barrier just won't work right in the face of error recovery.  I wouldn't
want to use barriers on any system where data integrity was crucial.

For example, in Fibrechannel using class 3 (the usual)

	send command (command frame corrupted; device does not receive)
	send barrier (completes normally)
	... (lots of time goes by, many more commands are processed)
	timeout original command whose command frame was corrupted

The only safe way to run such a filesystem is to hold the barriers in
the driver until all previous commands are successfully completed.
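
The "hold the barrier in the driver" rule above can be modelled in a few
lines. This is only a toy sketch in Python, not kernel code: the class
BarrierHoldingQueue and its method names are made up for illustration, and
it assumes the driver sees a completion (or an error-recovery result) for
every command it has sent.

```python
from collections import deque


class BarrierHoldingQueue:
    """Toy driver queue that never lets a barrier overtake earlier commands."""

    def __init__(self):
        self.pending = deque()          # (tag, is_barrier), not yet sent
        self.in_flight = set()          # sent, completion not yet seen
        self.barrier_in_flight = None   # tag of a barrier currently on the device

    def submit(self, tag, is_barrier=False):
        """Queue a command; return the tags that were sent as a result."""
        self.pending.append((tag, is_barrier))
        return self._dispatch()

    def complete(self, tag):
        """Record a completion; return the tags that were sent as a result."""
        self.in_flight.discard(tag)
        if tag == self.barrier_in_flight:
            self.barrier_in_flight = None
        return self._dispatch()

    def _dispatch(self):
        sent = []
        # While a barrier is on the device, nothing else is sent.
        while self.pending and self.barrier_in_flight is None:
            tag, is_barrier = self.pending[0]
            if is_barrier and self.in_flight:
                break                   # hold the barrier until earlier work is done
            self.pending.popleft()
            self.in_flight.add(tag)
            if is_barrier:
                self.barrier_in_flight = tag
            sent.append(tag)
        return sent
```

In the lost-frame case the first command never completes, so the barrier
stays held until the timeout and recovery resolve it. The price is that
nothing is pipelined across a barrier.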

There was also the problem of the queue full to the barrier command,
etc.

Did I miss the answer to these?  I don't recall seeing an answer to
Patrick Mansfield's questions either (original message edited cutting
out a couple of paragraphs):

> On Tue, Sep 03, 2002 at 02:09:44PM -0500, James Bottomley wrote:
> > dledford@redhat.com said:
> > > Leave abort active.  It does actually work in certain scenarios.  The
> > > CD  burner scenario that started this thread is an example of
> > > somewhere that  an abort should actually do the job.
> >
> > Unfortunately, it would destroy the REQ_BARRIER approach in the block layer.
> > At best, abort probably causes a command to overtake a barrier it shouldn't,
> > at worst we abort the ordered tag that is the barrier and transactional
> > integrity is lost.
> >
> > When error correction is needed, we have to return all the commands for that
> > device to the block layer so that ordering and barrier issues can be taken
> > care of in the reissue.  This makes LUN RESET (for those that support it) the
> > minimum level of error correction we can apply.
> >
> > James
> 
> If we only send an abort or reset after a quiesce I don't see why one
> is better than the other.
> 
> Not specific to reset or abort - if a single command gets an error, we
> wait for outstanding commands to complete before starting up the error
> handler thread. If all the commands (error one and outstanding) have
> barriers, those that do not error out will complete out of order from
> the errored command.
> 
> How is this properly handled?
> 
> And what happens if one command gets some sort of check condition (like
> medium error, or aborted command) that causes a retry? Will IO's still
> be correctly ordered?


jeremy

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-04  7:40           ` Jeremy Higdon
@ 2002-09-04 16:24             ` James Bottomley
  2002-09-04 17:13               ` Mike Anderson
  2002-09-05  9:50               ` Jeremy Higdon
  0 siblings, 2 replies; 297+ messages in thread
From: James Bottomley @ 2002-09-04 16:24 UTC (permalink / raw)
  To: Jeremy Higdon
  Cc: Doug Ledford, Alan Cox, Justin T. Gibbs, linux-kernel, linux-scsi

jeremy@classic.engr.sgi.com said:
> For example, in Fibrechannel using class 3 (the usual)

> 	send command (command frame corrupted; device does not receive)
> 	send barrier (completes normally)
> 	... (lots of time goes by, many more commands are processed)
> 	timeout original command whose command frame was corrupted 

This doesn't look right to me from the SCSI angle.  I don't see how you can get 
a successful disconnect on a command the device doesn't receive (I take it 
this is some type of Fibre magic?).  Of course, if the device (or its proxy) 
does receive the command then the ordered queue tag implementation requires 
that the corrupted frame command be processed prior to the barrier,  this 
isn't optional if you obey the spec.  Thus, assuming the processor does no 
integrity checking of the command until it does processing (this should be a 
big if), then we still must get notification of the failed command before the 
barrier tag is begun.  Obviously, from that notification we do then race to 
eliminate the overtaking tags.

> There was also the problem of the queue full to the barrier command,
> etc. 

The queue full problem still exists.  I've used this argument against the 
filesystem people many times at the various fora where it has been discussed.  
The situation is that everyone agrees that it's a theoretical problem, but 
no-one is convinced that it will actually occur in practice (I think it falls 
into the "risk we're willing to take" category).

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-04 16:24             ` James Bottomley
@ 2002-09-04 17:13               ` Mike Anderson
  2002-09-05  9:50               ` Jeremy Higdon
  1 sibling, 0 replies; 297+ messages in thread
From: Mike Anderson @ 2002-09-04 17:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jeremy Higdon, Doug Ledford, Alan Cox, Justin T. Gibbs,
	linux-kernel, linux-scsi

James Bottomley [James.Bottomley@steeleye.com] wrote:
> jeremy@classic.engr.sgi.com said:
> > For example, in Fibrechannel using class 3 (the usual)
> 
> > 	send command (command frame corrupted; device does not receive)
> > 	send barrier (completes normally)
> > 	... (lots of time goes by, many more commands are processed)
> > 	timeout original command whose command frame was corrupted 
> 
> This doesn't look right to me from the SCSI angle.  I don't see how you can get 
> a successful disconnect on a command the device doesn't receive (I take it 
> this is some type of Fibre magic?).  Of course, if the device (or its proxy) 
> does receive the command then the ordered queue tag implementation requires 
> that the corrupted frame command be processed prior to the barrier,  this 
> isn't optional if you obey the spec.  Thus, assuming the processor does no 
> integrity checking of the command until it does processing (this should be a 
> big if), then we still must get notification of the failed command before the 
> barrier tag is begun.  Obviously, from that notification we do then race to 
> eliminate the overtaking tags.

In FC class 3 if you are logged into a port then notice of this loss
doesn't happen until an upper level timeout occurs (ULTP?). The loss can
happen prior to the command reaching the device (i.e. the switch can
drop the frame). If a corrupted frame makes it to the device it will be
discarded as there is not much it can do with a frame containing unreliable
data. In FC class 2 frames are ack'd so the recovery can be much more
responsive.


-Mike
-- 
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-04 16:24             ` James Bottomley
  2002-09-04 17:13               ` Mike Anderson
@ 2002-09-05  9:50               ` Jeremy Higdon
  1 sibling, 0 replies; 297+ messages in thread
From: Jeremy Higdon @ 2002-09-05  9:50 UTC (permalink / raw)
  To: James Bottomley
  Cc: Doug Ledford, Alan Cox, Justin T. Gibbs, linux-kernel, linux-scsi

On Sep 4, 11:24am, James Bottomley wrote:
> 
> jeremy@classic.engr.sgi.com said:
> > For example, in Fibrechannel using class 3 (the usual)
> 
> > 	send command (command frame corrupted; device does not receive)
> > 	send barrier (completes normally)
> > 	... (lots of time goes by, many more commands are processed)
> > 	timeout original command whose command frame was corrupted 
> 
> This doesn't look right to me from the SCSI angle.  I don't see how you can get 
> a successful disconnect on a command the device doesn't receive (I take it 
> this is some type of Fibre magic?).  Of course, if the device (or its proxy) 
> does receive the command then the ordered queue tag implementation requires 
> that the corrupted frame command be processed prior to the barrier,  this 
> isn't optional if you obey the spec.  Thus, assuming the processor does no 
> integrity checking of the command until it does processing (this should be a 
> big if), then we still must get notification of the failed command before the 
> barrier tag is begun.  Obviously, from that notification we do then race to 
> eliminate the overtaking tags.

You don't have disconnect per se with Fibre (or with other packet based
interfaces).  A command is sent as a frame.  If the frame is corrupted
or lost on the way to the target, then the target will never know about
the command having been sent.  There are obvious problems if that command
has an ordered tag, preceded and followed by simple tags dependent on
the ordered semantics being followed, and where the simple tagged commands
following don't wait for the ordered tag to finish.  There are other
failure modes too, of course.

> > There was also the problem of the queue full to the barrier command,
> > etc. 
> 
> The queue full problem still exists.  I've used this argument against the 
> filesystem people many times at the various fora where it has been discussed.  
> The situation is that everyone agrees that it's a theoretical problem, but 
> no-one is convinced that it will actually occur in practice (I think it falls 
> into the "risk we're willing to take" category).

I guess it all depends on how sensitive one is to getting it wrong.  In
any case, it looks as though you are doing what you can . . . .

> James

thanks

jeremy

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 22:50         ` Doug Ledford
  2002-09-03 23:28           ` Alan Cox
  2002-09-04  7:40           ` Jeremy Higdon
@ 2002-09-04 16:13           ` James Bottomley
  2002-09-04 16:50             ` Justin T. Gibbs
  2 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-09-04 16:13 UTC (permalink / raw)
  To: James Bottomley, Alan Cox, Justin T. Gibbs, linux-kernel,
	linux-scsi

dledford@redhat.com said:
> Now, granted, that is more complex than going straight to a BDR, but I
>  have to argue that it *isn't* that complex.  It certainly isn't the
> nightmare you make it sound like ;-) 

It's three times longer even in pseudocode...

However, assume we do this (because we must for barrier preservation).  The 
chances are that for a failing device we're aborting a significant number of 
the tags.  This is quite a big increase in the message load over what we do 
now---particularly for the AIC driver which can have hundreds of tags 
outstanding (Murphy's law says it's usually the earliest tag which times out).  
I'm not convinced that a BDR, which is a single message and has roughly the 
same effect, isn't preferable.

However, what about a compromise?  We can count outstanding commands, so what 
about doing abort *if* the number of outstanding commands is exactly one (the 
one we're trying to abort).  This means for devices that don't do TCQ (like 
old CD-ROMs) we'll always try abort first.  For large numbers of outstanding 
tags, we skip over abort and move straight to BDR.  The code to implement this 
will be clean and simple because abort no longer has to pay attention to the 
barrier.
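
As a rough illustration of the compromise above (a Python sketch, not
kernel code; the function name choose_recovery and the string return
values are invented for this example):

```python
def choose_recovery(outstanding_commands):
    """Pick the first recovery step for a timed-out command.

    outstanding_commands counts every command still outstanding on the
    device, including the one that timed out.
    """
    if outstanding_commands < 1:
        raise ValueError("the timed-out command itself must be outstanding")
    # Exactly one outstanding command: nothing can be reordered around a
    # barrier, so a plain abort is safe to try first.
    if outstanding_commands == 1:
        return "ABORT"
    # Otherwise skip abort and go straight to a bus device reset.
    return "BDR"
```

A device without TCQ never has more than one command outstanding, so it
would always get the abort path.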

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-04 16:13           ` James Bottomley
@ 2002-09-04 16:50             ` Justin T. Gibbs
  2002-09-05  9:39               ` Jeremy Higdon
  0 siblings, 1 reply; 297+ messages in thread
From: Justin T. Gibbs @ 2002-09-04 16:50 UTC (permalink / raw)
  To: James Bottomley, Alan Cox, linux-kernel, linux-scsi

> dledford@redhat.com said:
>> Now, granted, that is more complex than going straight to a BDR, but I
>>  have to argue that it *isn't* that complex.  It certainly isn't the
>> nightmare you make it sound like ;-) 
> 
> It's three times longer even in pseudocode...

To make this work, you really need to use the QErr bit in the
disconnect/reconnect page and/or ECA or ACA.  QErr I believe is
well supported in devices, but ECA (pre SCSI-3) and ACA most
likely receive very little testing.

I will also voice my opinion (again) that watchdog timer recovery
is in the wrong place in Linux.  It belongs in the controller drivers:

1) Only the controller driver knows when to start the timeout
2) Only the controller driver knows the current status of the bus/transport
3) Only the controller can close timeout/just completed races
4) Only the controller driver knows the true transport type
   (SPI/FC/ATA/USB/1394/IP) and what recovery actions are appropriate
   for that transport type given the capabilities of the controller.
5) The algorithm for recovery and maintaining order becomes quite simple:
	1) Freeze the input queue for the controller
	2) Return all transactions unqueued to a device to the mid-layer
	3) Perform the recovery actions required
	4) Unfreeze the controller's queue
	5) Device type driver (sd, cd, tape, etc) decides what errors
	   at what rates should cause the failure of a device.  The
	   controller driver just needs to have the error codes so
	   it can honestly and fully report to the type driver what
	   really happens so it can make good decisions

   This of course assumes that all transactions have a serial number and
   that requeuing transactions orders them by serial number.  With QErr
   set, the device closes the rest of the barrier race for you, but even
   without barriers, full transaction ordering is required if you don't
   want a read to inadvertently pass a write to the same location during
   recovery.
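
   The freeze / return / recover / unfreeze sequence and the serial-number
   requeue can be sketched as a small model. This is Python for
   illustration only, not FreeBSD or Linux code. The names Txn and
   ToyController are hypothetical, and the recovery action is assumed to
   be a reset that kills everything currently on the device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    serial: int   # assigned once, when the transaction is first submitted
    op: str       # "read" or "write"
    lba: int


class ToyController:
    def __init__(self):
        self.frozen = False
        self.unqueued = []    # accepted by the driver, not yet sent to the device
        self.on_device = []   # sent to the device, not yet completed

    def recover(self, midlayer_queue):
        """Run recovery and return the mid-layer queue to reissue, in order."""
        self.frozen = True                             # 1) freeze the input queue
        returned, self.unqueued = self.unqueued, []    # 2) hand back unsent work
        killed, self.on_device = self.on_device, []    # 3) recovery (modelled as a reset)
        self.frozen = False                            # 4) unfreeze the queue
        # Requeue strictly by serial number so a later read cannot pass an
        # earlier write to the same location.
        return sorted(midlayer_queue + returned + killed, key=lambda t: t.serial)
```

   Step 5 (the type driver deciding when a device has failed) is left out
   of the model.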

   For prior art, take a look at FreeBSD.  In the worst case, where
   escalation to a bus reset is required, recovery takes 5 seconds.

--
Justin

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-04 16:50             ` Justin T. Gibbs
@ 2002-09-05  9:39               ` Jeremy Higdon
  2002-09-05 13:35                 ` Justin T. Gibbs
  0 siblings, 1 reply; 297+ messages in thread
From: Jeremy Higdon @ 2002-09-05  9:39 UTC (permalink / raw)
  To: Justin T. Gibbs, James Bottomley, Alan Cox, linux-kernel,
	linux-scsi

On Sep 4, 10:50am, Justin T. Gibbs wrote:
> 
>    This of course assumes that all transactions have a serial number and
>    that requeuing transactions orders them by serial number.  With QErr
>    set, the device closes the rest of the barrier race for you, but even
>    without barriers, full transaction ordering is required if you don't
>    want a read to inadvertently pass a write to the same location during
>    recovery.


The original FCP (SCSI commands over Fibre) profile specified that QERR=1
was not available.  Unless that is changed, it would appear that you cannot
count on being able to set Qerr.

Qerr was one of those annoying little things in SCSI that forces host
adapter drivers to know a mode page setting.

jeremy

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-05  9:39               ` Jeremy Higdon
@ 2002-09-05 13:35                 ` Justin T. Gibbs
  2002-09-05 23:56                   ` Jeremy Higdon
  0 siblings, 1 reply; 297+ messages in thread
From: Justin T. Gibbs @ 2002-09-05 13:35 UTC (permalink / raw)
  To: Jeremy Higdon, James Bottomley, Alan Cox, linux-kernel,
	linux-scsi

> On Sep 4, 10:50am, Justin T. Gibbs wrote:
>> 
>>    This of course assumes that all transactions have a serial number and
>>    that requeuing transactions orders them by serial number.  With QErr
>>    set, the device closes the rest of the barrier race for you, but even
>>    without barriers, full transaction ordering is required if you don't
>>    want a read to inadvertently pass a write to the same location during
>>    recovery.
> 
> 
> The original FCP (SCSI commands over Fibre) profile specified that QERR=1
> was not available.  Unless that is changed, it would appear that you
> cannot count on being able to set Qerr.
> 
> Qerr was one of those annoying little things in SCSI that forces host
> adapter drivers to know a mode page setting.

It is not the controllers that know, but the type drivers.  The controller
drivers should be conduits for commands, nothing more.  With the proper
events and error codes, the type drivers can maintain mode parameters
and only the type drivers know what parameters are appropriate for their
type of service.

For FC, you need to use ECA/ACA anyway as that is the only way to deal with
inflight commands at the time of an error.

--
Justin

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-05 13:35                 ` Justin T. Gibbs
@ 2002-09-05 23:56                   ` Jeremy Higdon
  2002-09-06  0:13                     ` Justin T. Gibbs
  0 siblings, 1 reply; 297+ messages in thread
From: Jeremy Higdon @ 2002-09-05 23:56 UTC (permalink / raw)
  To: Justin T. Gibbs, James Bottomley, Alan Cox, linux-scsi

On Sep 5,  7:35am, Justin T. Gibbs wrote:
> 
> > On Sep 4, 10:50am, Justin T. Gibbs wrote:
> >> 
> >>    This of course assumes that all transactions have a serial number and
> >>    that requeuing transactions orders them by serial number.  With QErr
> >>    set, the device closes the rest of the barrier race for you, but even
> >>    without barriers, full transaction ordering is required if you don't
> >>    want a read to inadvertently pass a write to the same location during
> >>    recovery.
> > 
> > 
> > The original FCP (SCSI commands over Fibre) profile specified that QERR=1
> > was not available.  Unless that is changed, it would appear that you
> > cannot count on being able to set Qerr.
> > 
> > Qerr was one of those annoying little things in SCSI that forces host
> > adapter drivers to know a mode page setting.
> 
> It is not the controllers that know, but the type drivers.  The controller
> drivers should be conduits for commands, nothing more.  With the proper
> events and error codes, the type drivers can maintain mode parameters
> and only the type drivers know what parameters are appropriate for their
> type of service.

But don't the controllers have to know which commands have been silently
aborted by the target?  There are various resources allocated to a command,
so if the command will never be completed, as a result of some other command
getting "check condition", the controller would have to know.

Or are you suggesting that the type drivers would tell the controllers to
release those resources?

And would this change if the controller drivers were keeping track of
timeouts, etc.?

jeremy

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-05 23:56                   ` Jeremy Higdon
@ 2002-09-06  0:13                     ` Justin T. Gibbs
  2002-09-06  0:32                       ` Jeremy Higdon
  0 siblings, 1 reply; 297+ messages in thread
From: Justin T. Gibbs @ 2002-09-06  0:13 UTC (permalink / raw)
  To: Jeremy Higdon, James Bottomley, Alan Cox, linux-scsi

> But don't the controllers have to know which commands have been silently
> aborted by the target?

The controller drivers would need to know that the QErr policy is in
effect for a given device.  With that knowledge, the book keeping from
the controller's standpoint is really quite simple and no different than
if a lun reset, target reset, clear task set, bus reset or any other task
management function is executed.  In other words, the controller drivers
already need to understand the consequences of such events.

>  There are various resources allocated to a
> command, so if the command will never be completed, as a result of some
> other command getting "check condition", the controller would have to
> know.

The controller knows which commands are affected and just returns them
to the mid-layer with the appropriate error code.
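
A toy model of that book keeping, in Python rather than driver code. The
function name on_check_condition and the status strings are invented for
illustration. It assumes that with QErr in effect the target drops every
other queued command for the device when one command gets a check
condition.

```python
def on_check_condition(outstanding, failed_tag, qerr_enabled):
    """Return {tag: status} for the commands handed back to the mid-layer.

    outstanding  -- tags currently outstanding on the device
    failed_tag   -- the tag that returned check condition
    qerr_enabled -- True if the QErr policy is in effect for this device
    """
    results = {failed_tag: "CHECK_CONDITION"}
    if qerr_enabled:
        # The target has silently dropped everything else in its queue,
        # so return those commands for requeueing instead of waiting on
        # completions that will never arrive.
        for tag in outstanding:
            if tag != failed_tag:
                results[tag] = "REQUEUE"
    return results
```

Without QErr the other commands stay outstanding and complete normally,
which is why the driver has to know the policy per device.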

> Or are you suggesting that the type drivers would tell the controllers to
> release those resources?

Not necessary.

> And would this change if the controller drivers were keeping track of
> timeouts, etc.?

This all assumes that the controllers are doing watchdog recovery.

--
Justin

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-06  0:13                     ` Justin T. Gibbs
@ 2002-09-06  0:32                       ` Jeremy Higdon
  0 siblings, 0 replies; 297+ messages in thread
From: Jeremy Higdon @ 2002-09-06  0:32 UTC (permalink / raw)
  To: Justin T. Gibbs, James Bottomley, Alan Cox, linux-scsi

On Sep 5,  6:13pm, Justin T. Gibbs wrote:
> 
> > But don't the controllers have to know which commands have been silently
> > aborted by the target?
> 
> The controller drivers would need to know that the QErr policy is in
> effect for a given device.

Right.  We're on the same page now.  So you'd need some interface for
the sd driver (or midlayer) to tell the host driver what the QErr policy
is for a given lun.  I don't recall seeing that in the current layer,
but it would be easy to add, presumably.

jeremy

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 19:09   ` James Bottomley
  2002-09-03 20:59     ` Alan Cox
@ 2002-09-03 21:13     ` Doug Ledford
  2002-09-03 21:48       ` James Bottomley
  2002-09-03 21:24     ` Patrick Mansfield
  2 siblings, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-09-03 21:13 UTC (permalink / raw)
  To: James Bottomley; +Cc: Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, Sep 03, 2002 at 02:09:44PM -0500, James Bottomley wrote:
> dledford@redhat.com said:
> > Leave abort active.  It does actually work in certain scenarios.  The
> > CD  burner scenario that started this thread is an example of
> > somewhere that  an abort should actually do the job. 
> 
> Unfortunately, it would destroy the REQ_BARRIER approach in the block layer.  

REQ_BARRIER is used for filesystem devices (disks mainly) while the 
devices that might benefit most from working aborts would likely be other 
devices.  But, regardless, the REQ_BARRIER ordering *can* be preserved 
while using abort processing.  Since the command that needs aborting is, 
as you are hypothesizing, before the REQ_BARRIER command, and since it 
hasn't completed, then the REQ_BARRIER command can not be complete and 
neither can any of the commands behind the REQ_BARRIER.  On direct access 
devices you are only concerned about ordering around the barrier, not 
ordering of the actual tagged commands, so for abort you can actually call 
abort on all the commands past the REQ_BARRIER command first, then the 
REQ_BARRIER command, then the hung command.  That would do the job and 
preserve REQ_BARRIER ordering while still using aborts.
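
The abort order described above can be sketched as follows. This is an
illustrative Python model, not mid-layer code: abort_order is a made-up
name, and the order among the commands behind the barrier (issue order
here) is an assumption, since only their position relative to the barrier
matters.

```python
def abort_order(queue, hung_tag):
    """Return the tags to abort, in the order to abort them.

    queue is a list of (tag, is_barrier) tuples in issue order.
    """
    tags = [tag for tag, _ in queue]
    hung = tags.index(hung_tag)
    # First barrier issued after the hung command, if there is one.
    barrier = next(
        (i for i in range(hung + 1, len(queue)) if queue[i][1]), None)
    if barrier is None:
        return [hung_tag]   # no barrier behind it: abort the hung command alone
    # Everything past the barrier first, then the barrier, then the hung
    # command, so nothing can slip past the barrier when the device resumes.
    return tags[barrier + 1:] + [tags[barrier], hung_tag]
```

Commands between the hung command and the barrier are left alone, because
they are allowed to complete in any order ahead of the barrier.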

> At best, abort probably causes a command to overtake a barrier it shouldn't, 
> at worst we abort the ordered tag that is the barrier and transactional 
> integrity is lost.
> 
> When error correction is needed, we have to return all the commands for that 
> device to the block layer so that ordering and barrier issues can be taken 
> care of in the reissue.

Not really, this would be easily enough done in the ML_QUEUE area of the 
scsi layer, but it matters not to me.  However, if you throw a BDR, then 
you have cancelled all outstanding commands and (typically) initiated a 
hard reset of the device which then requires a device settle time.  All of 
this is more drastic and typically takes longer than the individual aborts 
which are completed in a single connect->disconnect cycle without ever 
hitting a data phase and without triggering a full device reset and 
requiring a settle time.

>  This makes LUN RESET (for those that support it) the 
> minimum level of error correction we can apply.

Not true.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 21:13     ` Doug Ledford
@ 2002-09-03 21:48       ` James Bottomley
  2002-09-03 22:42         ` Doug Ledford
  2002-09-04 10:37         ` Andries Brouwer
  0 siblings, 2 replies; 297+ messages in thread
From: James Bottomley @ 2002-09-03 21:48 UTC (permalink / raw)
  To: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

dledford@redhat.com said:
> But, regardless, the REQ_BARRIER ordering *can* be preserved  while
> using abort processing.  Since the command that needs aborting is,  as
> you are hypothesizing, before the REQ_BARRIER command, and since it
> hasn't completed, then the REQ_BARRIER command can not be complete and
>  neither can any of the commands behind the REQ_BARRIER.

You are correct.  However, as soon as you abort the problem command (assuming 
the device recovers from this), it will go on its merry way processing the 
remaining commands in the queue.  Assuming one of these is the barrier, you've 
no way now of re-queueing the aborted command so that it comes before the 
ordered tag barrier.  You can try using a head of queue tag, but it's still a 
nasty race.

> On direct access  devices you are only concerned about ordering around
> the barrier, not  ordering of the actual tagged commands, so for abort
> you can actually call  abort on all the commands past the REQ_BARRIER
> command first, then the  REQ_BARRIER command, then the hung command.
> That would do the job and  preserve REQ_BARRIER ordering while still
> using aborts.

I agree, but the most likely scenario is that now you're trying to abort 
almost every tag for that device in the system.  Isn't reset a simpler 
alternative to this?

> > At best, abort probably causes a command to overtake a barrier it shouldn't, 
> > at worst we abort the ordered tag that is the barrier and transactional 
> > integrity is lost.
> > 
> > When error correction is needed, we have to return all the commands for that 
> > device to the block layer so that ordering and barrier issues can be taken 
> > care of in the reissue.

> Not really, this would be easily enough done in the ML_QUEUE area of
> the  scsi layer, but it matters not to me.  However, if you throw a
> BDR, then  you have cancelled all outstanding commands and (typically)
> initiated a  hard reset of the device which then requires a device
> settle time.  All of  this is more drastic and typically takes longer
> than the individual aborts  which are completed in a single connect->
> disconnect cycle without ever  hitting a data phase and without
> triggering a full device reset and  requiring a settle time. 

I agree.  I certainly could do it.  I'm just a lazy so-and-so.  However, think 
what it does.  Apart from me having to do more work, the code becomes longer 
and the error recovery path more convoluted and difficult to follow.  The 
benefit?  well, error recovery might be faster in certain circumstances.  I 
just don't see that it's a cost effective change.  If you're hitting error 
recovery so often that whether it recovers in  half a second or several 
seconds makes a difference, I'd say there's something else wrong.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 21:48       ` James Bottomley
@ 2002-09-03 22:42         ` Doug Ledford
  2002-09-03 22:52           ` Doug Ledford
                             ` (2 more replies)
  2002-09-04 10:37         ` Andries Brouwer
  1 sibling, 3 replies; 297+ messages in thread
From: Doug Ledford @ 2002-09-03 22:42 UTC (permalink / raw)
  To: James Bottomley; +Cc: Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, Sep 03, 2002 at 04:48:39PM -0500, James Bottomley wrote:
> dledford@redhat.com said:
> You are correct.  However, as soon as you abort the problem command (assuming 
> the device recovers from this), it will go on its merry way processing the 
> remaining commands in the queue.  Assuming one of these is the barrier, you've 
> no way now of re-queueing the aborted command so that it comes before the 
> ordered tag barrier.  You can try using a head of queue tag, but it's still a 
> nasty race.

(Solution to this race was in my next paragraph as you found ;-)

> > On direct access  devices you are only concerned about ordering around
> > the barrier, not  ordering of the actual tagged commands, so for abort
> > you can actually call  abort on all the commands past the REQ_BARRIER
> > command first, then the  REQ_BARRIER command, then the hung command.
> > That would do the job and  preserve REQ_BARRIER ordering while still
> > using aborts.
> 
> I agree, but the most likely scenario is that now you're trying to abort 
> almost every tag for that device in the system.  Isn't reset a simpler 
> alternative to this?

Not really.  It hasn't been done yet, but one of my goals is to change the 
scsi commands over to reasonable list usage (finally) so that we can avoid 
all these horrible linear scans it does now looking for an available 
command (it also means things like SCSI_OWNER_MID_LAYER can go away 
because ownership is defined implicitly by list membership).  So, 
basically, you have a list item struct on each command.  When you build 
the commands, you add them to SDpnt->free_list.  When you need a command, 
instead of searching for a free one, you just grab the head of 
SDpnt->free_list and use it.  Once you've built the command and are ready 
to hand it off to the lldd, you put the command on the tail of the 
SDpnt->active_list.  When a command completes, you list_remove() it from 
the SDpnt->active_list and put it on the SDpnt->complete_list to be 
handled by the tasklet.  When the tasklet actually completes the command, 
it frees the scsi command struct by simply putting it back on the 
SDpnt->free_list.  Now, if you do things that way, your reset vs. abort 
code is actually pretty trivial.
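
The list scheme described above can be sketched in plain C. This is only an illustration under stated assumptions: the list helpers are a small userspace stand-in for the kernel's struct list_head, and struct cmd, struct dev, cmd_issue(), cmd_done() and cmd_release() are hypothetical names, not existing mid-layer interfaces. Locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_to_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_unlink(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

/* Hypothetical command and device; the list names follow the mail. */
struct cmd { struct list_head node; int tag; };
struct dev { struct list_head free_list, active_list, complete_list; };

#define node_to_cmd(p) ((struct cmd *)((char *)(p) - offsetof(struct cmd, node)))

/* Build time: every command starts out on free_list. */
void dev_init(struct dev *d, struct cmd *pool, int n)
{
	list_init(&d->free_list);
	list_init(&d->active_list);
	list_init(&d->complete_list);
	for (int i = 0; i < n; i++) {
		pool[i].tag = i;
		list_add_to_tail(&pool[i].node, &d->free_list);
	}
}

/* O(1) allocation: take the head of free_list, queue it on the tail of
 * active_list. Returns NULL when no command is free. */
struct cmd *cmd_issue(struct dev *d)
{
	struct list_head *n;

	if (list_is_empty(&d->free_list))
		return NULL;
	n = d->free_list.next;
	list_unlink(n);
	list_add_to_tail(n, &d->active_list);
	return node_to_cmd(n);
}

/* Interrupt-time completion: active_list -> complete_list. */
void cmd_done(struct dev *d, struct cmd *c)
{
	list_unlink(&c->node);
	list_add_to_tail(&c->node, &d->complete_list);
}

/* Tasklet-time completion: complete_list -> free_list. */
void cmd_release(struct dev *d, struct cmd *c)
{
	list_unlink(&c->node);
	list_add_to_tail(&c->node, &d->free_list);
}
```

Ownership is implied by which list a command sits on, so no separate owner field appears in the sketch.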

Case 1: you want to throw a BDR.  Sample code might end up looking like 
this,

	[ oops we timed out ]
	hostt->bus_device_reset(cmd);
	if(!list_empty(cmd->device->active_list)) {
		[ our commands haven't all been returned, spew chunks! ]
	}
	[ do post reset processing ]

Case 2: you want to do an abort, but you need to preserve ordering around 
any possible REQ_BARRIERs on the bus.  This requires that we keep a 
REQ_BARRIER count for the device, it is after all possible that we could 
have multiple barriers active at once, so as each command is put on the 
active_list, if it is a barrier, then we increment SDpnt->barrier_count 
and as we complete commands (at the interrupt context completion, not the 
final completion) if it is a barrier command we decrement the count.

	[ oops we timed out ]
	while(SDpnt->barrier_count && cmd) {
		// when the aborted command is returned via the done()
		// it will remove it from the active_list, so don't remove
		// it here
		abort_cmd = list_get_tail(SDpnt->active_list);
		if(hostt->abort(abort_cmd) != SUCCESS) {
			[ oops, go on to more drastic action ]
		} else {
			if(abort_cmd->type == BARRIER)
				SDpnt->barrier_count--;
			if(abort_cmd == cmd)
				cmd = NULL;
		}
	}
	if(cmd) {
		if(hostt->abort(cmd) != SUCCESS)
			[ oops, go on to more drastic action ]
	}

Now, granted, that is more complex than going straight to a BDR, but I 
have to argue that it *isn't* that complex.  It certainly isn't the 
nightmare you make it sound like ;-)
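
For readers who want something compilable, the Case 2 loop can be modelled in userspace roughly as follows. It is a sketch under assumptions: a plain array in issue order stands in for active_list, lld_abort() is a hypothetical driver hook that always succeeds, and the barrier count is decremented in the done() path (which is what the follow-up message in this thread settles on) rather than in the loop itself.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIVE 16

enum { ABORT_OK = 0, NEED_RESET = -1 };

struct cmd { int tag; int is_barrier; int aborted; };

struct dev {
	struct cmd *active[MAX_ACTIVE];	/* issue order; stand-in for active_list */
	int nr_active;
	int barrier_count;		/* barriers currently on active[] */
};

/* Queue a command on the tail and account for barriers. */
void dev_issue(struct dev *d, struct cmd *c)
{
	assert(d->nr_active < MAX_ACTIVE);
	d->active[d->nr_active++] = c;
	if (c->is_barrier)
		d->barrier_count++;
}

/* done(): drop the command from active[] and fix up the barrier count. */
void cmd_done(struct dev *d, struct cmd *c)
{
	for (int i = 0; i < d->nr_active; i++) {
		if (d->active[i] != c)
			continue;
		for (int j = i; j < d->nr_active - 1; j++)
			d->active[j] = d->active[j + 1];
		d->nr_active--;
		if (c->is_barrier)
			d->barrier_count--;
		return;
	}
}

/* Hypothetical driver abort: always succeeds, returns the command via done(). */
int lld_abort(struct dev *d, struct cmd *c)
{
	c->aborted = 1;
	cmd_done(d, c);
	return ABORT_OK;
}

/* Abort from the tail until no barrier is left or the hung command has
 * been reached, then abort the hung command itself if still pending. */
int abort_preserving_barriers(struct dev *d, struct cmd *hung)
{
	while (d->barrier_count && hung) {
		struct cmd *victim = d->active[d->nr_active - 1];

		if (lld_abort(d, victim) != ABORT_OK)
			return NEED_RESET;	/* go on to more drastic action */
		if (victim == hung)
			hung = NULL;
	}
	if (hung && lld_abort(d, hung) != ABORT_OK)
		return NEED_RESET;
	return ABORT_OK;
}
```

With commands 0..4 queued, command 2 a barrier and command 0 hung, this aborts 4, 3, 2 and then 0, and leaves command 1 alone because it was issued ahead of the barrier.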

> > > At best, abort probably causes a command to overtake a barrier it shouldn't, 
> > > at worst we abort the ordered tag that is the barrier and transactional 
> > > integrity is lost.
> > > 
> > > When error correction is needed, we have to return all the commands for that 
> > > device to the block layer so that ordering and barrier issues can be taken 
> > > care of in the reissue.
> 
> > Not really, this would be easily enough done in the ML_QUEUE area of
> > the  scsi layer, but it matters not to me.  However, if you throw a
> > BDR, then  you have cancelled all outstanding commands and (typically)
> > initiated a  hard reset of the device which then requires a device
> > settle time.  All of  this is more drastic and typically takes longer
> > than the individual aborts  which are completed in a single connect->
> > disconnect cycle without ever  hitting a data phase and without
> > triggering a full device reset and  requiring a settle time. 
> 
> I agree.  I certainly could do it.  I'm just a lazy so-and-so.  However, think 
> what it does.  Apart from me having to do more work, the code becomes longer 
> and the error recovery path more convoluted and difficult to follow.  The 
> benefit?  well, error recovery might be faster in certain circumstances.

Well, as I've laid it out above, I don't really think it's all that much 
to implement ;-)  At least not in the mid layer.  The low level device 
drivers are doing *far* more work to support aborts than the mid layer has 
to do.

>  I 
> just don't see that it's a cost effective change.

Matter of some question, I'm sure.  I don't see it as all that much work, 
so it seems reasonably cost effective to me ;-)

>  If you're hitting error 
> recovery so often that whether it recovers in  half a second or several 
> seconds makes a difference, I'd say there's something else wrong.

Hehehe, if you are hitting error recovery at all then something else is 
wrong by definition, the only difference is in how you handle it :-P

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606


* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 22:42         ` Doug Ledford
@ 2002-09-03 22:52           ` Doug Ledford
  2002-09-03 23:29           ` Alan Cox
  2002-09-04 21:16           ` Luben Tuikov
  2 siblings, 0 replies; 297+ messages in thread
From: Doug Ledford @ 2002-09-03 22:52 UTC (permalink / raw)
  To: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, Sep 03, 2002 at 06:42:16PM -0400, Doug Ledford wrote:
> 
> Case 2: you want to do an abort, but you need to preserve ordering around 
> any possible REQ_BARRIERs on the bus.  This requires that we keep a 
> REQ_BARRIER count for the device, it is after all possible that we could 
> have multiple barriers active at once, so as each command is put on the 
> active_list, if it is a barrier, then we increment SDpnt->barrier_count 
> and as we complete commands (at the interrupt context completion, not the 
> final completion) if it is a barrier command we decrement the count.
> 
> 	[ oops we timed out ]
> 	while(SDpnt->barrier_count && cmd) {
> 		// when the aborted command is returned via the done()
> 		// it will remove it from the active_list, so don't remove
> 		// it here
> 		abort_cmd = list_get_tail(SDpnt->active_list);
> 		if(hostt->abort(abort_cmd) != SUCCESS) {
> 			[ oops, go on to more drastic action ]
> 		} else {
> 			if(abort_cmd->type == BARRIER)
> 				SDpnt->barrier_count--;

Oops, delete those last two lines....the done() function decrements the 
barrier_count for us.

> 			if(abort_cmd == cmd)
> 				cmd = NULL;
> 		}
> 	}
> 	if(cmd) {
> 		if(hostt->abort(cmd) != SUCCESS)
> 			[ oops, go on to more drastic action ]
> 	}

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  


* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 22:42         ` Doug Ledford
  2002-09-03 22:52           ` Doug Ledford
@ 2002-09-03 23:29           ` Alan Cox
  2002-09-04 21:16           ` Luben Tuikov
  2 siblings, 0 replies; 297+ messages in thread
From: Alan Cox @ 2002-09-03 23:29 UTC (permalink / raw)
  To: Doug Ledford; +Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, 2002-09-03 at 23:42, Doug Ledford wrote:
> Not really.  It hasn't been done yet, but one of my goals is to change the 
> scsi commands over to reasonable list usage (finally) so that we can avoid 
> all these horrible linear scans it does now looking for an available 
> command (it also means things like SCSI_OWNER_MID_LAYER can go away 

At least partly done in 2.4-ac and waiting to be pushed to Marcelo. I believe
Hugh contributed the O(1) command finder.


* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 22:42         ` Doug Ledford
  2002-09-03 22:52           ` Doug Ledford
  2002-09-03 23:29           ` Alan Cox
@ 2002-09-04 21:16           ` Luben Tuikov
  2 siblings, 0 replies; 297+ messages in thread
From: Luben Tuikov @ 2002-09-04 21:16 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-kernel, linux-scsi

Doug Ledford wrote:
> 
> Not really.  It hasn't been done yet, but one of my goals is to change the
> scsi commands over to reasonable list usage (finally) so that we can avoid
> all these horrible linear scans it does now looking for an available
> command

Using the struct list_head for this will literally allow you to do _magic_.
Avoiding the linear scan is the last thing this will fix.

It would allow for a lot better/simpler/sound design of all of
the mid layer/SCSI core. Things will be/become easier as you
point out below. Currently the mid-layer queuing is hairy at best.

I'm all for it.

> So,
> basically, you have a list item struct on each command.  When you build
> the commands, you add them to SDpnt->free_list.  When you need a command,
> instead of searching for a free one, you just grab the head of
> SDpnt->free_list and use it.  Once you've built the command and are ready
> to hand it off to the lldd, you put the command on the tail of the
> SDpnt->active_list.  When a command completes, you list_remove() it from
> the SDpnt->active_list and put it on the SDpnt->complete_list to be
> handled by the tasklet.  When the tasklet actually completes the command,
> it frees the scsi command struct by simply putting it back on the
> SDpnt->free_list.

Great!

Once you're on that train, you may want to rethink the whole queuing
mechanism of the mid-layer (straight from sd/etc and internally down to LLDD)
for an improved design.

There'd be issues to settle, like making sure that moving a cmd between
lists is atomic, that only cmd movers can actually cancel a command, that
the move happens before calling queuecommand(), etc, but it's nothing
extraordinary.

> Now, granted, that is more complex than going straight to a BDR, but I
> have to argue that it *isn't* that complex.  It certainly isn't the
> nightmare you make it sound like ;-)

No, it certainly is NOT!

Granted, by looking at the code it will not be overly clear
who moves what and when, but a 2 page commentary on the design
would only leave one exclaiming ``Aaaaah... such simplicity, so great!''

> Well, as I've laid it out above, I don't really think it's all that much
> to implement ;-)  At least not in the mid layer.

Right, it's not. This type of queuing mechanism would only make things
more consistent and easy to manipulate.

There'd be logistical issues, but those are easy to figure out
with pen and paper.

-- 
Luben

``Perfection is achieved not when there is nothing more to add
  but when there is nothing left to take away.''
                              Antoine de Saint Exupery


* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 21:48       ` James Bottomley
  2002-09-03 22:42         ` Doug Ledford
@ 2002-09-04 10:37         ` Andries Brouwer
  2002-09-04 10:48           ` Doug Ledford
  2002-09-04 11:23           ` Alan Cox
  1 sibling, 2 replies; 297+ messages in thread
From: Andries Brouwer @ 2002-09-04 10:37 UTC (permalink / raw)
  To: James Bottomley; +Cc: Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, Sep 03, 2002 at 04:48:39PM -0500, James Bottomley wrote:

> If you're hitting error recovery so often that whether it recovers
> in  half a second or several seconds makes a difference, I'd say
> there's something else wrong.

Not that I want to contradict, but an example.
Without my sd.c patch from yesterday or so (fixing MODE SENSE calls)
an "insmod usb-storage.o" would take 14 minutes and 6 seconds for me.

[One USB device, with 3 subdevices, gets into a bad state when
presented with a MODE SENSE command that asks for more than the
56 bytes it has available. For each of the three subdevices we
get a long sequence of retries, abort, reset, host reset, bus reset
before it is taken off-line.]

The scsi error recovery has many bad properties, but one is its slowness.
Once it gets triggered on a machine with SCSI disks it is common to
have a dead system for several minutes. I have not yet met a situation
in which rebooting was not preferable above scsi error recovery,
especially since the attempt to recover often fails.

Andries


* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-04 10:37         ` Andries Brouwer
@ 2002-09-04 10:48           ` Doug Ledford
  2002-09-04 11:23           ` Alan Cox
  1 sibling, 0 replies; 297+ messages in thread
From: Doug Ledford @ 2002-09-04 10:48 UTC (permalink / raw)
  To: Andries Brouwer
  Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

On Wed, Sep 04, 2002 at 12:37:37PM +0200, Andries Brouwer wrote:
> 
> The scsi error recovery has many bad properties, but one is its slowness.

This does not have to be this way.  It is a solvable problem.

> Once it gets triggered on a machine with SCSI disks it is common to
> have a dead system for several minutes.

Yes, well, this too is solvable.  It, in fact, reminds me that one of the 
things I think needs to be added to the scsi host settings is a default timeout 
value for typical devices.  Something like adding a default_timeout value 
to each Scsi_Device struct and allowing the scsi driver to modify the 
value during the slave_attach() call.  Then we can put the default timeout 
on non-intelligent controllers to something sane while things like 
MegaRAID controllers can keep their sky high timeout values.
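
A rough sketch of that idea: the mid layer fills in a default and the driver's attach hook may override it. All names here (struct scsi_dev_sketch, struct host_template_sketch, attach_device(), raid_slave_attach()) and both timeout values are assumptions made for illustration, not existing kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed mid-layer default, in seconds. */
#define MIDLAYER_DEFAULT_TIMEOUT 30

/* Hypothetical per-device structure carrying the proposed field. */
struct scsi_dev_sketch {
	unsigned int default_timeout;	/* seconds */
};

/* Hypothetical host template with an optional per-device attach hook. */
struct host_template_sketch {
	void (*slave_attach)(struct scsi_dev_sketch *sdev);
};

/* Example hook for an intelligent RAID controller that wants a long timeout. */
void raid_slave_attach(struct scsi_dev_sketch *sdev)
{
	sdev->default_timeout = 300;	/* assumed value */
}

/* Mid layer: set the sane default first, then let the driver adjust it. */
void attach_device(const struct host_template_sketch *ht,
		   struct scsi_dev_sketch *sdev)
{
	sdev->default_timeout = MIDLAYER_DEFAULT_TIMEOUT;
	if (ht->slave_attach)
		ht->slave_attach(sdev);
}
```

A driver that leaves the hook NULL simply inherits the mid-layer default.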

> I have not yet met a situation
> in which rebooting was not preferable above scsi error recovery,
> especially since the attempt to recover often fails.

This, too, is solvable.  It just requires that the scsi subsystem start 
paying attention to *how* things fail and making the error handling code 
smart enough to know when to retry things and when to just fail.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606


* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-04 10:37         ` Andries Brouwer
  2002-09-04 10:48           ` Doug Ledford
@ 2002-09-04 11:23           ` Alan Cox
  2002-09-04 16:25             ` Rogier Wolff
  1 sibling, 1 reply; 297+ messages in thread
From: Alan Cox @ 2002-09-04 11:23 UTC (permalink / raw)
  To: Andries Brouwer
  Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

On Wed, 2002-09-04 at 11:37, Andries Brouwer wrote:
> The scsi error recovery has many bad properties, but one is its slowness.
> Once it gets triggered on a machine with SCSI disks it is common to
> have a dead system for several minutes. I have not yet met a situation
> in which rebooting was not preferable above scsi error recovery,
> especially since the attempt to recover often fails.

Well I for one prefer the scsi timeout/abort sequence on a CD getting
confused badly by a bad block (as at least some of my drives do) to a
reboot every time I get bad media.



* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-04 11:23           ` Alan Cox
@ 2002-09-04 16:25             ` Rogier Wolff
  2002-09-04 19:34               ` Thunder from the hill
  0 siblings, 1 reply; 297+ messages in thread
From: Rogier Wolff @ 2002-09-04 16:25 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andries Brouwer, James Bottomley, Justin T. Gibbs, linux-kernel,
	linux-scsi

On Wed, Sep 04, 2002 at 12:23:13PM +0100, Alan Cox wrote:
> On Wed, 2002-09-04 at 11:37, Andries Brouwer wrote:
> > The scsi error recovery has many bad properties, but one is its slowness.
> > Once it gets triggered on a machine with SCSI disks it is common to
> > have a dead system for several minutes. I have not yet met a situation
> > in which rebooting was not preferable above scsi error recovery,
> > especially since the attempt to recover often fails.
> 
> Well I for one prefer the scsi timeout/abort sequence on a CD getting
> confused badly by a bad block (as at least some of my drives do) to a
> reboot every time I get bad media.

Reboot is bad. Retries are bad. 

Errors should be returned to an upper layer, with an error code: "may
retry", or "will never work". (Like in SMTP)
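
One way to picture such a two-valued verdict is a small classifier over SCSI sense keys. The sense key numbers below are the standard ones; which bucket each key falls into is purely an assumed policy for illustration, and enum io_verdict and classify_sense() are hypothetical names.

```c
#include <assert.h>

/* Verdict handed to the upper layer, in the spirit of SMTP's 4xx/5xx split. */
enum io_verdict { IO_OK, IO_MAY_RETRY, IO_WILL_NEVER_WORK };

/* Standard SCSI sense key values. */
enum sense_key {
	SK_NO_SENSE        = 0x0,
	SK_RECOVERED_ERROR = 0x1,
	SK_NOT_READY       = 0x2,
	SK_MEDIUM_ERROR    = 0x3,
	SK_HARDWARE_ERROR  = 0x4,
	SK_ILLEGAL_REQUEST = 0x5,
	SK_UNIT_ATTENTION  = 0x6,
	SK_DATA_PROTECT    = 0x7,
	SK_ABORTED_COMMAND = 0xb
};

/* Assumed policy: transient-looking conditions may be retried by whoever
 * owns the request; everything else is reported as a permanent failure. */
enum io_verdict classify_sense(unsigned int key)
{
	switch (key) {
	case SK_NO_SENSE:
	case SK_RECOVERED_ERROR:
		return IO_OK;
	case SK_NOT_READY:
	case SK_MEDIUM_ERROR:
	case SK_UNIT_ATTENTION:
	case SK_ABORTED_COMMAND:
		return IO_MAY_RETRY;
	default:
		return IO_WILL_NEVER_WORK;
	}
}
```

The lower layer only classifies; whether and when to retry stays with the caller.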

I will most likely set the "retry count" to 0: Never retry. Almost
never works anyway. And the disk already retried many times, so
retrying in software is only "taking time".

We do data recovery around here. We get bad disks on a daily basis. We
are currently reading a drive that gets over 10Mb per second while
spitting out bad block reports!

Thing is: those blocks that didn't work first time, may work on our
second retry. However, we need userspace control over that retry. We
prefer to get the 18G worth of data off the disk first, and only then
retry the blocks that happened to be bad first time around.
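
The read-everything-first, retry-later strategy described here could look roughly like this from userspace. It is a sketch only: read_fn is an assumed callback that makes exactly one read attempt with no hidden retries, and flaky_read() is a simulated disk included purely so the example can be exercised.

```c
#include <assert.h>
#include <stddef.h>

/* One read attempt for one block; returns 0 on success, nonzero on error. */
typedef int (*read_fn)(void *ctx, unsigned long block);

/* First pass: a single attempt per block. Failed block numbers are stored
 * in bad[]; failures beyond bad_cap are not recorded. Returns the number
 * of entries stored. */
size_t first_pass(read_fn rd, void *ctx, unsigned long nblocks,
		  unsigned long *bad, size_t bad_cap)
{
	size_t nbad = 0;

	for (unsigned long b = 0; b < nblocks; b++)
		if (rd(ctx, b) != 0 && nbad < bad_cap)
			bad[nbad++] = b;
	return nbad;
}

/* Later pass: retry only the recorded blocks and compact the ones that
 * still fail to the front of bad[]. Returns how many remain bad. */
size_t retry_pass(read_fn rd, void *ctx, unsigned long *bad, size_t nbad)
{
	size_t still = 0;

	for (size_t i = 0; i < nbad; i++)
		if (rd(ctx, bad[i]) != 0)
			bad[still++] = bad[i];
	return still;
}

/* Simulated 8-block disk: block 2 fails once, block 5 always fails. */
struct flaky_disk { int attempts[8]; };

int flaky_read(void *ctx, unsigned long block)
{
	struct flaky_disk *d = ctx;
	int earlier_attempts = d->attempts[block]++;

	if (block == 5)
		return -1;
	if (block == 2 && earlier_attempts == 0)
		return -1;
	return 0;
}
```

The good blocks are read exactly once, and the time spent on bad blocks is pushed to the end, where the caller decides how many retry passes are worth it.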

			Roger.

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* The Worlds Ecosystem is a stable system. Stable systems may experience *
* excursions from the stable situation. We are currenly in such an       * 
* excursion: The stable situation does not include humans. ***************


* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-04 16:25             ` Rogier Wolff
@ 2002-09-04 19:34               ` Thunder from the hill
  0 siblings, 0 replies; 297+ messages in thread
From: Thunder from the hill @ 2002-09-04 19:34 UTC (permalink / raw)
  To: Rogier Wolff
  Cc: Alan Cox, Andries Brouwer, James Bottomley, Justin T. Gibbs,
	linux-kernel, linux-scsi

Hi,

On Wed, 4 Sep 2002, Rogier Wolff wrote:
> I will most likely set the "retry count" to 0: Never retry. Almost
> never works anyway. And the disk already retried many times, so
> retrying in software is only "taking time".

fd = open(pathname, O_NORETRY); ?

			Thunder
-- 
--./../...-/. -.--/---/..-/.-./..././.-../..-. .---/..-/.../- .-
--/../-./..-/-/./--..-- ../.----./.-../.-.. --./../...-/. -.--/---/..-
.- -/---/--/---/.-./.-./---/.--/.-.-.-
--./.-/-.../.-./.././.-../.-.-.-


* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 19:09   ` James Bottomley
  2002-09-03 20:59     ` Alan Cox
  2002-09-03 21:13     ` Doug Ledford
@ 2002-09-03 21:24     ` Patrick Mansfield
  2002-09-03 22:02       ` James Bottomley
  2 siblings, 1 reply; 297+ messages in thread
From: Patrick Mansfield @ 2002-09-03 21:24 UTC (permalink / raw)
  To: James Bottomley; +Cc: Justin T. Gibbs, linux-kernel, linux-scsi

James -

On Tue, Sep 03, 2002 at 02:09:44PM -0500, James Bottomley wrote:
> dledford@redhat.com said:
> > Leave abort active.  It does actually work in certain scenarios.  The
> > CD  burner scenario that started this thread is an example of
> > somewhere that  an abort should actually do the job. 
> 
> Unfortunately, it would destroy the REQ_BARRIER approach in the block layer.  
> At best, abort probably causes a command to overtake a barrier it shouldn't, 
> at worst we abort the ordered tag that is the barrier and transactional 
> integrity is lost.
> 
> When error correction is needed, we have to return all the commands for that 
> device to the block layer so that ordering and barrier issues can be taken 
> care of in the reissue.  This makes LUN RESET (for those that support it) the 
> minimum level of error correction we can apply.
> 
> James

If we only send an abort or reset after a quiesce I don't see why one
is better than the other.

Not specific to reset or abort - if a single command gets an error, we
wait for outstanding commands to complete before starting up the error
handler thread. If all the commands (error one and outstanding) have
barriers, those that do not error out will complete out of order from
the errored command.

How is this properly handled? 

And what happens if one command gets some sort of check condition (like
medium error, or aborted command) that causes a retry? Will IO's still
be correctly ordered?

The abort could also be useful in handling the locking/ownership of the
scsi_cmnd - the abort at the LLD layer can be used by the LLD to cancel
any software timeouts, as well as to flush the command from the hardware.
After the abort, the mid-layer could assume that it once again "owned"
the scsi_cmnd, especially if the LLD abort were a required function.

I would like to see error handling occur without quiescing the entire
adapter before taking any action. Stopping all adapter IO for a timeout
can be a bit expensive - imagine a tape drive and multiple disks on an
adapter, any IO disk timeout or failure will wait for the tape IO to
complete before allowing any other IO, if the tape operation is long or
is going to timeout this could be minutes or hours.

-- Patrick Mansfield


* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 21:24     ` Patrick Mansfield
@ 2002-09-03 22:02       ` James Bottomley
  2002-09-03 23:26         ` Alan Cox
  0 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-09-03 22:02 UTC (permalink / raw)
  To: Patrick Mansfield
  Cc: James Bottomley, Justin T. Gibbs, linux-kernel, linux-scsi

patmans@us.ibm.com said:
> If we only send an abort or reset after a quiesce I don't see why one
> is better than the other.

Quiesce means from the top (no more commands go from the block layer to the 
device) not from the bottom (commands can still be completed for the device).

> Not specific to reset or abort - if a single command gets an error, we
> wait for outstanding commands to complete before starting up the error
> handler thread. If all the commands (error one and outstanding) have
> barriers, those that do not error out will complete out of order from
> the errored command. 

We don't wait.  The commands may possibly be completing in parallel with 
recovery.  To address your point, though, that's why the device needs to be 
reset as fast as possible: to preserve what's left of the command order.  I 
accept that for a misbehaving device, this may be a race.

> And what happens if one command gets some sort of check condition
> (like medium error, or aborted command) that causes a retry? Will IO's
> still be correctly ordered? 

Retries get eliminated.  It should be up to the upper layers (sd or beyond) to 
say whether a retry is done.  Since, as you point out, retries automatically 
violate any barrier, it is probably up to the block layer to judge what should 
be done about the problem.

> I would like to see error handling occur without quiescing the entire
> adapter before taking any action. Stopping all adapter IO for a
> timeout can be a bit expensive - imagine a tape drive and multiple
> disks on an adapter, any IO disk timeout or failure will wait for the
> tape IO to complete before allowing any other IO, if the tape
> operation is long or is going to timeout this could be minutes or
> hours. 

Actually, I do think that quiescing has an important role to play.  A lot of 
drive errors can be self inflicted indigestion by accepting more tags into the 
queue than it can process.  Quiescing the queue lets us see if the drive can 
digest what it currently has or whether we need to apply a strong emetic.

I'd actually like to add tag starvation recovery to the error handler's list 
of things to do.  In that case, all you do on entry to the error handler is 
quiesce the upper queue and wait a while to see if the commands continue 
returning.  You only begin more drastic measures if nothing comes back in the 
designated time period.
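
A minimal model of that "quiesce, then watch" step, assuming the upper queue is already stopped: keep polling the number of commands the device still holds, restart the clock whenever one comes back, and escalate only after a run of polls with no progress. The names (wait_for_drain(), outstanding_fn, sim_poll()) are hypothetical, and a real handler would sleep between polls where this sketch just counts calls.

```c
#include <assert.h>

enum eh_verdict { EH_DRAINED, EH_ESCALATE };

/* Returns how many commands the device is still holding. */
typedef int (*outstanding_fn)(void *ctx);

/* Wait while the device keeps returning commands; give up once
 * max_idle_ticks consecutive polls show no progress. */
enum eh_verdict wait_for_drain(outstanding_fn poll, void *ctx,
			       int max_idle_ticks)
{
	int last = poll(ctx);
	int idle = 0;

	while (last > 0) {
		int now = poll(ctx);

		if (now < last)
			idle = 0;	/* progress: restart the clock */
		else if (++idle >= max_idle_ticks)
			return EH_ESCALATE;
		last = now;
	}
	return EH_DRAINED;
}

/* Simulated device: returns one command per poll until it reaches
 * stuck_at, then stops making progress. */
struct sim_dev { int outstanding; int stuck_at; };

int sim_poll(void *ctx)
{
	struct sim_dev *s = ctx;
	int n = s->outstanding;

	if (s->outstanding > s->stuck_at)
		s->outstanding--;
	return n;
}
```

A device that was merely starved of tags drains on its own and never sees an abort or reset; only a device that stops returning commands gets the stronger treatment.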

James


* Re: aic7xxx sets CDR offline, how to reset?
  2002-09-03 22:02       ` James Bottomley
@ 2002-09-03 23:26         ` Alan Cox
  0 siblings, 0 replies; 297+ messages in thread
From: Alan Cox @ 2002-09-03 23:26 UTC (permalink / raw)
  To: James Bottomley
  Cc: Patrick Mansfield, Justin T. Gibbs, linux-kernel, linux-scsi

On Tue, 2002-09-03 at 23:02, James Bottomley wrote:
> > And what happens if one command gets some sort of check condition
> > (like medium error, or aborted command) that causes a retry? Will IO's
> > still be correctly ordered? 
> 
> Retries get eliminated.  It should be up to the upper layers (sd or beyond) to 
> say whether a retry is done.  Since, as you point out, retries automatically 
> violate any barrier, it is probably up to the block layer to judge what should 
> be done about the problem.

Then we need to give the block layer a lot more information about what
kind of a problem occurred



[parent not found: <200209091458.g89Evv806056@localhost.localdomain>]

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
       [not found] <200209091458.g89Evv806056@localhost.localdomain>
@ 2002-09-09 16:56 ` Patrick Mansfield
  2002-09-09 17:34   ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Patrick Mansfield @ 2002-09-09 16:56 UTC (permalink / raw)
  To: James Bottomley, Lars Marowsky-Bree; +Cc: linux-kernel, linux-scsi

On Mon, Sep 09, 2002 at 09:57:56AM -0500, James Bottomley wrote:
> Lars Marowsky-Bree <lmb@suse.de> said:
> > So, what is the take on "multi-path IO" (in particular, storage) in
> > 2.5/2.6?
> 
> I've already made my views on this fairly clear (at least from the SCSI stack 
> standpoint):
> 
> - multi-path inside the low level drivers (like qla2x00) is wrong

Agreed.

> - multi-path inside the SCSI mid-layer is probably wrong

Disagree

> - from the generic block layer on up, I hold no specific preferences

Using md or volume manager is wrong for non-failover usage, and somewhat
bad for failover models; generic block layer is OK but it is wasted code
for any lower layers that do not or cannot have multi-path IO (such as
IDE).

> 
> James

I have a newer version of SCSI multi-path in the mid layer that I hope to
post this week, the last version patched against 2.5.14 is here:

http://www-124.ibm.com/storageio/multipath/scsi-multipath

Some documentation is located here:

http://www-124.ibm.com/storageio/multipath/scsi-multipath/docs/index.html

I have a current patch against 2.5.33 that includes NUMA support (it
requires discontigmem support that I believe is in the current linus
bk tree, plus NUMA topology patches).

A major problem with multi-path in md or other volume manager is that we
use multiple (block layer) queues for a single device, when we should be
using a single queue. If we want to use all paths to a device (i.e. round
robin across paths or such, not a failover model) this means the elevator
code becomes inefficient, maybe even counterproductive. For disk arrays,
this might not be bad, but for actual drives or even plugging single
ported drives into a switch or bus with multiple initiators, this could
lead to slower disk performance.
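
To make the "one queue, many paths" alternative concrete, here is a sketch of round-robin path selection for a single device. It assumes requests have already been sorted and merged in one shared queue and only the choice of path is left; struct mpath_dev and pick_path() are invented for this example and are not taken from the posted patch.

```c
#include <assert.h>

#define MAX_PATHS 8

/* Hypothetical per-device multi-path state: one device, several paths. */
struct mpath_dev {
	int nr_paths;
	int path_ok[MAX_PATHS];	/* nonzero while the path is usable */
	int next;		/* where the next round-robin search starts */
};

/* Pick the next usable path in round-robin order, skipping failed ones.
 * Returns the path index, or -1 when no path is usable. */
int pick_path(struct mpath_dev *d)
{
	for (int i = 0; i < d->nr_paths; i++) {
		int p = (d->next + i) % d->nr_paths;

		if (d->path_ok[p]) {
			d->next = (p + 1) % d->nr_paths;
			return p;
		}
	}
	return -1;
}
```

A pure failover model falls out of the same structure by never advancing next while the current path stays usable.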

If the volume manager implements only a failover model (use only a single
path until that path fails), besides performance issues in balancing IO
loads, we waste space allocating an extra Scsi_Device for each path.

In the current code, each path is allocated a Scsi_Device, including a
request_queue_t, and a set of Scsi_Cmnd structures. Not only do we
end up with a Scsi_Device for each path, we also have an upper level
(sd, sg, st, or sr) driver attached to each Scsi_Device.

For sd, this means if you have n paths to each SCSI device, you are
limited to whatever limit sd has divided by n, right now 128 / n.  Having
four paths to a device is very reasonable, limiting us to 32 devices, but
with the overhead of 128 devices.

Using a volume manager to implement multiple paths (again non-failover
model) means that the queue_depth might be too large if the queue_depth
(i.e.  number of outstanding commands sent to the drive) is set as a
per-device value - we can end up sending n * queue_depth commands to a device.
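
The arithmetic in the last two points is easy to check. The 128-disk sd limit and the four-path case come from the text above; the queue depth of 8 in the usage note is an arbitrary example value, and both function names are made up for this sketch.

```c
#include <assert.h>

#define SD_MAX_DISKS 128	/* sd limit quoted above */

/* With one Scsi_Device (and one sd slot) per path, n paths per physical
 * device leave only sd_limit / n usable physical devices. */
unsigned int usable_devices(unsigned int sd_limit, unsigned int paths)
{
	return paths ? sd_limit / paths : 0;
}

/* If queue_depth is applied per path-device, up to paths * queue_depth
 * commands can be outstanding at a single physical device. */
unsigned int worst_case_outstanding(unsigned int paths,
				    unsigned int queue_depth)
{
	return paths * queue_depth;
}
```

For four paths, usable_devices(SD_MAX_DISKS, 4) gives 32 devices, and a per-path queue depth of 8 allows up to 32 commands at one drive.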

Multi-path in the scsi layer enables multi-path use for all upper level scsi
drivers, not just disks.

We could implement multi-path IO in the block layer, but if the only user
is SCSI, this gains nothing compared to putting multi-path in the scsi
layers. Creating block level interfaces that will work for future devices
and/or future code is hard without already having the devices or code in
place. Any block level interface still requires support in the
underlying layers.

I'm not against a block level interface, but I don't have ideas or code
for such an implementation.

Generic device naming consistency is a problem if multiple devices show up
with the same id.

With the scsi layer multi-path, ide-scsi or usb-scsi could also do
multi-path IO.

-- Patrick Mansfield


* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-09 16:56 ` [RFC] Multi-path IO in 2.5/2.6 ? Patrick Mansfield
@ 2002-09-09 17:34   ` James Bottomley
  2002-09-09 18:40     ` Mike Anderson
  2002-09-10  0:08     ` Patrick Mansfield
  0 siblings, 2 replies; 297+ messages in thread
From: James Bottomley @ 2002-09-09 17:34 UTC (permalink / raw)
  To: Patrick Mansfield
  Cc: James Bottomley, Lars Marowsky-Bree, linux-kernel, linux-scsi

patmans@us.ibm.com said:
> Using md or volume manager is wrong for non-failover usage, and
> somewhat bad for failover models; generic block layer is OK but it is
> wasted code for any lower layers that do not or cannot have multi-path
> IO (such as IDE). 

What about block devices that could usefully use multi-path to achieve network 
redundancy, like nbd? If it's in the block layer or above, they can be made to 
work with minimal effort.

My basic point is that the utility of the feature transcends SCSI, so SCSI is 
too low a layer for it.

I wouldn't be too sure even of the IDE case: IDE has a habit of copying SCSI 
features when they become more main-stream (and thus cheaper).  It wouldn't 
surprise me to see multi-path as an adjunct to the IDE serial stuff.

> A major problem with multi-path in md or other volume manager is that
> we use multiple (block layer) queues for a single device, when we
> should be using a single queue. If we want to use all paths to a
> device (i.e. round robin across paths or such, not a failover model)
> this means the elevator code becomes inefficient, maybe even
> counterproductive. For disk arrays, this might not be bad, but for
> actual drives or even plugging single ported drives into a switch or
> bus with multiple initiators, this could lead to slower disk
> performance. 

That's true today, but may not be true in 2.6.  Suparna's bio splitting code 
is aimed precisely at this and other software RAID cases.

> In the current code, each path is allocated a Scsi_Device, including a
> request_queue_t, and a set of Scsi_Cmnd structures. Not only do we end
> up with a Scsi_Device for each path, we also have an upper level (sd,
> sg, st, or sr) driver attached to each Scsi_Device. 

You can't really get away from this.  Transfer parameters are negotiated at 
the Scsi_Device level (i.e. per device path from HBA to controller), and LLDs 
accept I/O's for Scsi_Devices.  Whatever you do, you still need an entity that 
performs most of the same functions as the Scsi_Device, so you might as well 
keep Scsi_Device itself, since it works.

> For sd, this means if you have n paths to each SCSI device, you are
> limited to whatever limit sd has divided by n, right now 128 / n.
> Having four paths to a device is very reasonable, limiting us to 32
> devices, but with the overhead of 128 devices. 

I really don't expect this to be true in 2.6.

> Using a volume manager to implement multiple paths (again non-failover
> model) means that the queue_depth might be too large if the
> queue_depth (i.e.  number of outstanding commands sent to the drive)
> is set as a per-device value - we can end sending n * queue_depth
> commands to a device.

The queues tend to be in the controllers, not in the RAID devices, thus for a 
dual path RAID device you usually have two caching controllers and thus twice 
the queue depth (I know this isn't always the case, but it certainly is enough 
of the time for me to argue that you should have the flexibility to queue per 
path).

> We could implement multi-path IO in the block layer, but if the only
> user is SCSI, this gains nothing compared to putting multi-path in the
> scsi layers. Creating block level interfaces that will work for future
> devices and/or future code is hard without already having the devices
> or code in place. Any block level interface still requires support in
> the underlying layers.

> I'm not against a block level interface, but I don't have ideas or
> code for such an implementation.

SCSI got into a lot of trouble by going down the "kernel doesn't have X 
feature I need, so I'll just code it into the SCSI mid-layer instead" route. I'm 
loth to accept something into SCSI that I don't think belongs there in the 
long term.

Answer me this question:

- In the foreseeable future does multi-path have uses other than SCSI?

I've got to say, I can't see a "no" to that one, so it fails the high level 
bar to getting into the scsi subsystem.  However, the kernel, as has been said 
before, isn't a theoretical exercise in design, so is there a good expediency 
argument (like "it will take one year to get all the features of the block 
layer to arrive and I have a customer now")?  Also, to go in under expediency, 
the code must be readily removable against the day it can be redone correctly.

> Generic device naming consistency is a problem if multiple devices
> show up with the same id.

Patrick Mochel has an open task to come up with a solution to this.

> With the scsi layer multi-path, ide-scsi or usb-scsi could also do
> multi-path IO. 

The "scsi is everything" approach got its wings shot off at the kernel summit, 
and subsequently confirmed its death in a protracted wrangle on lkml (I can't 
remember the reference off the top of my head, but I'm sure others can).

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-09 17:34   ` James Bottomley
@ 2002-09-09 18:40     ` Mike Anderson
  2002-09-10 13:02       ` Lars Marowsky-Bree
  2002-09-10  0:08     ` Patrick Mansfield
  1 sibling, 1 reply; 297+ messages in thread
From: Mike Anderson @ 2002-09-09 18:40 UTC (permalink / raw)
  To: James Bottomley
  Cc: Patrick Mansfield, Lars Marowsky-Bree, linux-kernel, linux-scsi

James Bottomley [James.Bottomley@SteelEye.com] wrote:
> patmans@us.ibm.com said:
> > Using md or volume manager is wrong for non-failover usage, and
> > somewhat bad for failover models; generic block layer is OK but it is
> > wasted code for any lower layers that do not or cannot have multi-path
> > IO (such as IDE). 
> 
> What about block devices that could usefully use multi-path to achieve network 
> redundancy, like nbd? If it's in the block layer or above, they can be made to 
> work with minimal effort.

When you get into networking I believe we may get into path failover
capability that is already implemented by the network stack. So the
paths may not be visible to the block layer.

> 
> My basic point is that the utility of the feature transcends SCSI, so SCSI is 
> too low a layer for it.
> 
> I wouldn't be too sure even of the IDE case: IDE has a habit of copying SCSI 
> features when they become more main-stream (and thus cheaper).  It wouldn't 
> surprise me to see multi-path as an adjunct to the IDE serial stuff.
> 

The utility does transcend SCSI, but transport / device specific
characteristics may make "true" generic implementations difficult.

To add functionality beyond failover multi-path you will need to get into
transport and device specific data gathering.

> > A major problem with multi-path in md or other volume manager is that
> > we use multiple (block layer) queues for a single device, when we
> > should be using a single queue. If we want to use all paths to a
> > device (i.e. round robin across paths or such, not a failover model)
> > this means the elevator code becomes inefficient, maybe even
> > counterproductive. For disk arrays, this might not be bad, but for
> > actual drives or even plugging single ported drives into a switch or
> > bus with multiple initiators, this could lead to slower disk
> > performance. 
> 
> That's true today, but may not be true in 2.6.  Suparna's bio splitting code 
> is aimed precisely at this and other software RAID cases.

I have not looked at Suparna's patch but it would seem that device
knowledge would be helpful for knowing when to split.

> > In the current code, each path is allocated a Scsi_Device, including a
> > request_queue_t, and a set of Scsi_Cmnd structures. Not only do we end
> > up with a Scsi_Device for each path, we also have an upper level (sd,
> > sg, st, or sr) driver attached to each Scsi_Device. 
> 
> You can't really get away from this.  Transfer parameters are negotiated at 
> the Scsi_Device level (i.e. per device path from HBA to controller), and LLDs 
> accept I/O's for Scsi_Devices.  Whatever you do, you still need an entity that 
> performs most of the same functions as the Scsi_Device, so you might as well 
> keep Scsi_Device itself, since it works.

James, have you looked at the documentation / patch previously pointed to
by Patrick? There is still a Scsi_Device.

> 
> > For sd, this means if you have n paths to each SCSI device, you are
> > limited to whatever limit sd has divided by n, right now 128 / n.
> > Having four paths to a device is very reasonable, limiting us to 32
> > devices, but with the overhead of 128 devices. 
> 
> I really don't expect this to be true in 2.6.
> 

While the device space may be increased in 2.6 you are still consuming
extra resources, but we do this in other places also.

> > We could implement multi-path IO in the block layer, but if the only
> > user is SCSI, this gains nothing compared to putting multi-path in the
> > scsi layers. Creating block level interfaces that will work for future
> > devices and/or future code is hard without already having the devices
> > or code in place. Any block level interface still requires support in
> > the underlying layers.
> 
> > I'm not against a block level interface, but I don't have ideas or
> > code for such an implementation.
> 
> SCSI got into a lot of trouble by going down the "kernel doesn't have X 
> feature I need, so I'll just code it into the SCSI mid-layer instead", I'm 
> loth to accept something into SCSI that I don't think belongs there in the 
> long term.
> 
> Answer me this question:
> 
> - In the foreseeable future does multi-path have uses other than SCSI?
> 

See top comment.

> The "scsi is everything" approach got its wings shot off at the kernel summit, 
> and subsequently confirmed its death in a protracted wrangle on lkml (I can't 
> remember the reference off the top of my head, but I'm sure others can).

Could you point this out so I can understand the context?

-Mike
-- 
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-09 18:40     ` Mike Anderson
@ 2002-09-10 13:02       ` Lars Marowsky-Bree
  2002-09-10 16:03         ` Patrick Mansfield
  2002-09-10 16:27         ` Mike Anderson
  0 siblings, 2 replies; 297+ messages in thread
From: Lars Marowsky-Bree @ 2002-09-10 13:02 UTC (permalink / raw)
  To: James Bottomley, Patrick Mansfield, linux-kernel, linux-scsi

On 2002-09-09T11:40:26,
   Mike Anderson <andmike@us.ibm.com> said:

> When you get into networking I believe we may get into path failover
> capability that is already implemented by the network stack. So the
> paths may not be visible to the block layer.

It depends. The block layer might need knowledge of the different paths for
load balancing.

> The utility does transcend SCSI, but transport / device specific
> characteristics may make "true" generic implementations difficult.

I disagree. What makes "generic" implementations difficult is the absolutely
mediocre error reporting and handling from the lower layers.

With multipathing, you want the lower level to hand you the error
_immediately_ if there is some way it could be related to a path failure and
no automatic retries should take place - so you can immediately mark the path
as faulty and go to another. 

However, on an "access beyond end of device" or a clear read error, failing a
path is a rather stupid idea; instead the error should go up immediately.

This will need to be sorted regardless of the layer it is implemented in.

How far has this been already addressed in 2.5 ?

> > > For sd, this means if you have n paths to each SCSI device, you are
> > > limited to whatever limit sd has divided by n, right now 128 / n.
> > > Having four paths to a device is very reasonable, limiting us to 32
> > > devices, but with the overhead of 128 devices. 
> > 
> > I really don't expect this to be true in 2.6.
> > 
> While the device space may be increased in 2.6 you are still consuming
> extra resources, but we do this in other places also.

For user-space reprobing of failed paths, it may be vital to expose the
physical paths too. Then "reprobing" could be as simple as

	dd if=/dev/physical/path of=/dev/null count=1 && enable_path_again

I dislike reprobing in kernel space, in particular using live requests as
the LVM1 patch by IBM does it.
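
The dd one-liner above can be modelled in a few lines of user-space code. This
is only a sketch: probe_path() and reprobe() are hypothetical names, the
enable_path callback stands in for whatever mechanism would actually re-enable
a path, and the 512-byte default merely mirrors dd's default block size with
count=1.

```python
import os

def probe_path(device_path: str, nbytes: int = 512) -> bool:
    """Return True if nbytes can be read from the start of device_path."""
    try:
        fd = os.open(device_path, os.O_RDONLY)
    except OSError:
        return False  # the node is missing or cannot be opened
    try:
        return len(os.read(fd, nbytes)) == nbytes
    except OSError:
        return False  # the read itself failed
    finally:
        os.close(fd)

def reprobe(device_path: str, enable_path, nbytes: int = 512) -> bool:
    """Call enable_path(device_path) only if the probe read succeeds."""
    if probe_path(device_path, nbytes):
        enable_path(device_path)
        return True
    return False
```

The probe size is a parameter so that a caller can choose a larger payload
than a single block.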

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
Immortality is an adequate definition of high availability for me.
	--- Gregory F. Pfister

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-10 13:02       ` Lars Marowsky-Bree
@ 2002-09-10 16:03         ` Patrick Mansfield
  2002-09-10 16:27         ` Mike Anderson
  1 sibling, 0 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-09-10 16:03 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: James Bottomley, linux-kernel, linux-scsi

On Tue, Sep 10, 2002 at 03:02:01PM +0200, Lars Marowsky-Bree wrote:
> On 2002-09-09T11:40:26,

> With multipathing, you want the lower level to hand you the error
> _immediately_ if there is some way it could be related to a path failure and
> no automatic retries should take place - so you can immediately mark the path
> as faulty and go to another. 
> 
> However, on an "access beyond end of device" or a clear read error, failing a
> path is a rather stupid idea, but instead the error should go up immediately.
> 
> This will need to be sorted regardless of the layer it is implemented in.
> 
> How far has this been already addressed in 2.5 ?
> 

The current scsi multi-path code handles the above cases. There is
a scsi_path_decide_disposition() that fails paths independently of the
result of the IO. It is similar to the current scsi_decide_disposition(),
except it also fails the path. So for a check condition of media error,
the IO might be marked as SUCCESS (meaning completed with an error),
but the path would not be modified (there are more details than this).
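
The separation described here (the IO result and the path state are decided
independently) can be illustrated with a toy model. This is not the kernel's
scsi_path_decide_disposition(); the error kinds, the choice of which ones count
as path errors, and the returned strings are all assumptions made for the
example.

```python
from enum import Enum

class Error(Enum):
    NONE = "none"
    MEDIA_ERROR = "media error"          # device-level (check condition)
    OUT_OF_RANGE = "access beyond end"   # device-level
    LINK_DOWN = "link down"              # transport-level
    TIMEOUT = "timeout"                  # assumed here to implicate the path

# Assumption: only these errors say anything about the health of the path.
PATH_ERRORS = {Error.LINK_DOWN, Error.TIMEOUT}

def decide(error: Error):
    """Return (io_action, fail_path) for one completed command."""
    if error is Error.NONE:
        return ("complete", False)
    if error in PATH_ERRORS:
        # Fail the path at once and let the IO go out on another one.
        return ("retry_on_other_path", True)
    # Device errors go straight up; the path itself is left alone.
    return ("complete_with_error", False)
```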

> For user-space reprobing of failed paths, it may be vital to expose the
> physical paths too. Then "reprobing" could be as simple as
> 
> 	dd if=/dev/physical/path of=/dev/null count=1 && enable_path_again
> 

Yes, that is a good idea. I was thinking that this should be done
via sg, with sg modified to allow path selection - so no matter what, sg
could be used to probe a path.  I have no plans to expose a user level
device for each path, but a device model "file" could be exposed for
the state of each path, and its state controlled via driverfs.

> I dislike reprobing in kernel space, in particular using live requests as
> the LVM1 patch by IBM does it.
> 
> 
> Sincerely,
>     Lars Marowsky-Brée <lmb@suse.de>

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-10 13:02       ` Lars Marowsky-Bree
  2002-09-10 16:03         ` Patrick Mansfield
@ 2002-09-10 16:27         ` Mike Anderson
  1 sibling, 0 replies; 297+ messages in thread
From: Mike Anderson @ 2002-09-10 16:27 UTC (permalink / raw)
  To: Lars Marowsky-Bree
  Cc: James Bottomley, Patrick Mansfield, linux-kernel, linux-scsi

Lars Marowsky-Bree [lmb@suse.de] wrote:
> On 2002-09-09T11:40:26,
>    Mike Anderson <andmike@us.ibm.com> said:
> 
> > When you get into networking I believe we may get into path failover
> > capability that is already implemented by the network stack. So the
> > paths may not be visible to the block layer.
> 
> It depends. The block layer might need knowledge of the different paths for
> load balancing.
> 
> > The utility does transcend SCSI, but transport / device specific
> > characteristics may make "true" generic implementations difficult.
> 
> I disagree. What makes "generic" implementations difficult is the absolutely
> mediocre error reporting and handling from the lower layers.
> 

We're working on it. I have done a mid scsi_error cleanup patch for 2.5
so that we can better view where we are at currently. Hopefully soon we
can do some actual useful improvements.

My main point on the previous comment though was that some transports may
decide not to expose their paths (i.e. they may manage them at the
transport layer) and the block layer would be unable to attach to
individual paths.

The second point I was trying to make is that if you look at most
multi-path solutions across different operating systems, once they have
failover capability and move to support more performant / advanced
multi-path solutions, they need specific attributes of the device or
path. These attributes have sometimes been called personalities. 

Examples of these personalities are path usage models (failover,
transparent failover, active load balancing), ports per controller
config,  platform memory latency (NUMA), cache characteristics, special
error decoding for device events, etc.
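
One way to picture such a personality is as a small record looked up per
device type. The sketch below is hypothetical throughout: the field names, the
vendor/model key, and the single table entry are placeholders, not data about
any real array, and unknown devices fall back to plain failover as the most
conservative choice.

```python
from dataclasses import dataclass
from enum import Enum

class UsageModel(Enum):
    FAILOVER = "failover"
    TRANSPARENT_FAILOVER = "transparent failover"
    ACTIVE_LOAD_BALANCING = "active load balancing"

@dataclass(frozen=True)
class Personality:
    usage_model: UsageModel
    ports_per_controller: int = 1
    numa_aware: bool = False

# Conservative default for devices that are not in the table.
DEFAULT = Personality(UsageModel.FAILOVER)

# Placeholder entry only; a real table would be filled from device data.
TABLE = {
    ("EXAMPLE", "ARRAY-A"): Personality(
        UsageModel.ACTIVE_LOAD_BALANCING, ports_per_controller=2
    ),
}

def lookup(vendor: str, model: str) -> Personality:
    """Return the personality for a device, or the failover default."""
    key = (vendor.strip().upper(), model.strip().upper())
    return TABLE.get(key, DEFAULT)
```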

I mention a few of these in this document:
http://www-124.ibm.com/storageio/multipath/scsi-multipath/docs/index.html

These personalities could be acquired at any level of the IO stack, but
involve some API if we want to try and get as close as possible to
"generic".

> With multipathing, you want the lower level to hand you the error
> _immediately_ if there is some way it could be related to a path failure and
> no automatic retries should take place - so you can immediately mark the path
> as faulty and go to another. 
> 

Agreed. In a past O/S we worked hard to structure our error policy
so that transports worried about transport errors and devices worried
about device errors. If a transport received a completion of an IO its
job was done (we did have some edge cases and heuristics to stop paths
from cycling from disabled to enabled).

> However, on an "access beyond end of device" or a clear read error, failing a
> path is a rather stupid idea, but instead the error should go up immediately.
> 

Agreed, each layer should only deal with errors in its domain.

> This will need to be sorted regardless of the layer it is implemented in.
> 
> How far has this been already addressed in 2.5 ?

See previous comment.

> 
> > > > For sd, this means if you have n paths to each SCSI device, you are
> > > > limited to whatever limit sd has divided by n, right now 128 / n.
> > > > Having four paths to a device is very reasonable, limiting us to 32
> > > > devices, but with the overhead of 128 devices. 
> > > 
> > > I really don't expect this to be true in 2.6.
> > > 
> > While the device space may be increased in 2.6 you are still consuming
> > extra resources, but we do this in other places also.
> 
> For user-space reprobing of failed paths, it may be vital to expose the
> physical paths too. Then "reprobing" could be as simple as
> 
> 	dd if=/dev/physical/path of=/dev/null count=1 && enable_path_again
> 
> I dislike reprobing in kernel space, in particular using live requests as
> the LVM1 patch by IBM does it.
> 

In the current mid-mp patch the paths are exposed through the proc
interface and can be activated / state changed through an echo command. 

While live requests are sometimes viewed as a bad way to activate a
path, very small IO sizes in optical networks are unreliable for
determining anything but completely dead paths. The size of the payload
is important.

I believe the difference in views here is what to expose and what
size/type of structure represents each piece of the nexus (b:c:t:l).

-Mike

-- 
Michael Anderson
andmike@us.ibm.com

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-09 17:34   ` James Bottomley
  2002-09-09 18:40     ` Mike Anderson
@ 2002-09-10  0:08     ` Patrick Mansfield
  2002-09-10  7:55       ` Jeremy Higdon
                         ` (2 more replies)
  1 sibling, 3 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-09-10  0:08 UTC (permalink / raw)
  To: James Bottomley; +Cc: Lars Marowsky-Bree, linux-kernel, linux-scsi

James -

On Mon, Sep 09, 2002 at 12:34:05PM -0500, James Bottomley wrote:

> What about block devices that could usefully use multi-path to achieve network 
> redundancy, like nbd? If it's in the block layer or above, they can be made to 
> work with minimal effort.
> 
> My basic point is that the utility of the feature transcends SCSI, so SCSI is 
> too low a layer for it.

I agree it has potential uses outside of SCSI, but this does not directly
imply that we need to create a generic implementation. I have found no
code to reference in other block drivers or in the block layer. I've
looked some at the dasd code but can't figure out if or where there is any
multi-path code.

Putting multi-path into the block layer means it would have to acquire and
maintain a handle (i.e.  path) for each device it knows about, and then
eventually pass this handle down to the lower level. I don't see this
happening in 2.5/2.6, unless someone is coding it right now.

It makes sense to at least expose the topology of the IO storage, whether
or not the block or other layers can figure out what to do with the
information.  That is, ideally for SCSI we should have a representation of
the target - like struct scsi_target - and then the target is multi-pathed,
not the devices (LUNs, block or character devices) attached to the target.
We should also have a bus or fabric representation, showing multi-path from
the adapters view (multiple initiators on the fabric or bus).

Whether or not the fabric or target information is used to route IO, they
are useful for hardware removal/replacement. Imagine replacing a fibre
switch, or replacing a failed controller on a raid array.
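
As a rough picture of what such a topology buys for hardware replacement, the
sketch below hangs the paths off the target (with the LUNs attached to the
target) and asks whether a switch can be pulled without cutting anything off.
The class and function names are hypothetical and unrelated to any existing
kernel structure.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Path:
    adapter: str   # initiator side
    switch: str    # fabric element the path runs through

@dataclass
class Target:
    name: str
    paths: list                               # the target is multi-pathed
    luns: list = field(default_factory=list)  # LUNs hang off the target

def surviving_paths(target: Target, removed_switch: str) -> list:
    """Paths to the target that do not run through removed_switch."""
    return [p for p in target.paths if p.switch != removed_switch]

def safe_to_remove(targets: list, switch: str) -> bool:
    """True if every target keeps at least one path without the switch."""
    return all(surviving_paths(t, switch) for t in targets)
```

With the paths kept on the target, a target with p paths and l LUNs needs
p + l records instead of p * l per-path devices.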

If all this information was in the device model (driver?), with some sort
of function or data pointers, perhaps (in 2.7.x timeframe) we could route
IO and call appropriate drivers based on that information.

> > A major problem with multi-path in md or other volume manager is that
> > we use multiple (block layer) queues for a single device, when we
> > should be using a single queue. If we want to use all paths to a
> > device (i.e. round robin across paths or such, not a failover model)
> > this means the elevator code becomes inefficient, maybe even
> > counterproductive. For disk arrays, this might not be bad, but for
> > actual drives or even plugging single ported drives into a switch or
> > bus with multiple initiators, this could lead to slower disk
> > performance. 
> 
> That's true today, but may not be true in 2.6.  Suparna's bio splitting code 
> is aimed precisely at this and other software RAID cases.

Yes, but then we need some sort of md/RAID/volume manager aware elevator
code + bio splitting, and perhaps avoid calling elevator code normally called
for a Scsi_Device. Though I can imagine splitting the bio in md and then
still merging and sorting requests for SCSI.
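
The elevator concern can be shown with a toy model: a "queue" that only sorts
sectors and merges adjacent ones, fed either directly or after a round-robin
split across per-path queues. This is an illustration of the argument, not a
model of the real elevator code, and both function names are made up.

```python
def merge_contiguous(sectors):
    """Sort sectors and merge adjacent ones into (start, length) requests."""
    merged = []
    for s in sorted(sectors):
        if merged and merged[-1][0] + merged[-1][1] == s:
            start, length = merged[-1]
            merged[-1] = (start, length + 1)   # extend the previous request
        else:
            merged.append((s, 1))              # start a new request
    return merged

def round_robin_split(sectors, n_queues):
    """Distribute sectors over n_queues queues in arrival order."""
    queues = [[] for _ in range(n_queues)]
    for i, s in enumerate(sectors):
        queues[i % n_queues].append(s)
    return queues
```

Eight sequential sectors merge into one request in a single queue. Split
round-robin over two queues, each queue sees only every other sector, nothing
merges, and the device receives eight separate requests.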

> > In the current code, each path is allocated a Scsi_Device, including a
> > request_queue_t, and a set of Scsi_Cmnd structures. Not only do we end
> > up with a Scsi_Device for each path, we also have an upper level (sd,
> > sg, st, or sr) driver attached to each Scsi_Device. 
> 
> You can't really get away from this.  Transfer parameters are negotiated at 
> the Scsi_Device level (i.e. per device path from HBA to controller), and LLDs 
> accept I/O's for Scsi_Devices.  Whatever you do, you still need an entity that 
> performs most of the same functions as the Scsi_Device, so you might as well 
> keep Scsi_Device itself, since it works.

Yes, negotiation is at the adapter level, but that does not have to be tied
to a Scsi_Device. I need to search for Scsi_Device::hostdata usage to
figure out details, and to figure out if anything is broken in the current
scsi multi-path code - right now it requires the same adapter drivers be
used and that certain Scsi_Host parameters are equal if multiple paths
to a Scsi_Device are found.

> > For sd, this means if you have n paths to each SCSI device, you are
> > limited to whatever limit sd has divided by n, right now 128 / n.
> > Having four paths to a device is very reasonable, limiting us to 32
> > devices, but with the overhead of 128 devices. 
> 
> I really don't expect this to be true in 2.6.

If we use a Scsi_Device for each path, we always have the overhead of the
number of paths times the number of devices - upping the limits of sd
certainly helps, but we are then increasing the possibly large amount
of memory that we can waste. And, other devices besides disks can be
multi-pathed.
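
The arithmetic behind the 128 / n limit and the per-path overhead, as a small
sketch (128 is the sd limit quoted above; the function names are made up for
illustration):

```python
SD_LIMIT = 128  # sd device limit quoted in this thread

def usable_devices(sd_limit: int, paths_per_device: int) -> int:
    """Distinct devices addressable when every path consumes one sd slot."""
    if paths_per_device < 1:
        raise ValueError("need at least one path per device")
    return sd_limit // paths_per_device

def scsi_device_structs(devices: int, paths_per_device: int) -> int:
    """Scsi_Device structures allocated when each path gets its own."""
    return devices * paths_per_device
```

With four paths per device this gives 32 usable devices while still paying for
128 Scsi_Device structures and their upper level driver attachments.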

> > Using a volume manager to implement multiple paths (again non-failover
> > model) means that the queue_depth might be too large if the
> > queue_depth (i.e.  number of outstanding commands sent to the drive)
> > is set as a per-device value - we can end sending n * queue_depth
> > commands to a device.
> 
> The queues tend to be in the controllers, not in the RAID devices, thus for a 
> dual path RAID device you usually have two caching controllers and thus twice 
> the queue depth (I know this isn't always the case, but it certainly is enough 
> of the time for me to argue that you should have the flexibility to queue per 
> path).

You can have multiple initiators on FCP or SPI, without dual controllers
involved at all. Most of my multi-path testing has been with dual
ported FCP disk drives, with multiple FCP adapters connected to a
switch, not with disk arrays (I don't have any non-failover multi-ported
disk arrays available, I'm using a fastt 200 disk array); I don't know the
details of the drive controllers for my disks, but putting multiple
controllers in a disk drive certainly would increase the cost.

Yes, per path queues and per device queues are reasonable; per path queues
requires knowledge of actual device ports not in the current scsi multi-path
patch. The code I have now uses the Scsi_Host::can_queue to limit the number
of commands sent to a host. I really need slave_attach() support in the host
adapter (like Doug L's patch a while back), plus maybe a slave_attach_path(),
and/or queue limit per path.

Per path queues are not required, as long as any queue limits do not
hinder the performance.
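
Both sides of the queue depth argument reduce to the same comparison, sketched
below under a simplifying assumption: every queue on the device side holds
queue_depth commands. The function names are hypothetical.

```python
def commands_in_flight(queue_depth: int, n_paths: int) -> int:
    """Worst case when queue_depth is enforced separately on each path."""
    return queue_depth * n_paths

def overcommitted(queue_depth: int, n_paths: int, n_device_queues: int) -> bool:
    """True if the paths together can send more than the device can hold.

    Assumes each device-side queue also holds queue_depth commands.
    """
    return commands_in_flight(queue_depth, n_paths) > queue_depth * n_device_queues
```

A dual ported drive with one internal queue is overcommitted by two paths,
while an array with one caching controller per path is not.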

> SCSI got into a lot of trouble by going down the "kernel doesn't have X 
> feature I need, so I'll just code it into the SCSI mid-layer instead", I'm 
> loth to accept something into SCSI that I don't think belongs there in the 
> long term.
> 
> Answer me this question:
> 
> - In the foreseeable future does multi-path have uses other than SCSI?
> 
> I've got to say, I can't see a "no" to that one, so it fails the high level 
> bar to getting into the scsi subsystem.  However, the kernel, as has been said 
> before, isn't a theoretical exercise in design, so is there a good expediency 
> argument (like "it will take one year to get all the features of the block 
> layer to arrive and I have a customer now").  Also, to go in under expediency, 
> the code must be readily removable against the day it can be redone correctly.

Yes, there could be future multi-path users, or maybe a current one in DASD. If we take
SCSI and DASD as existing usage, they could be a basis for a block layer
(or generic) set of multi-path interfaces.

There is code available for scsi multi-path, this is not a design in theory.
Anyone can take the code and fold it into a block layer implementation or
other approach. I would be willing to work on scsi usage or such for any
new block level or other such code for generic multi-path use. At this
time I wouldn't feel comfortable adding to or modifying block layer
interfaces and code, nor do I think it is possible to come up with the
best interface given only one block driver implementation, nor do I think
there is enough time to get this into 2.5.x.

IMO, there is demand for scsi multi-path support now, as users move to 
large databases requiring higher availability. md or volume manager
for failover is adequate in some of these cases.

I see other issues as being more important to scsi - like cleaning it up
or rewriting portions of the code, but we still need to add new features
as we move forward.

Even with generic block layer multi-path support, we still need block
driver (scsi, ide, etc.) code for multi-path.

> > Generic device naming consistency is a problem if multiple devices
> > show up with the same id.
> 
> Patrick Mochel has an open task to come up with a solution to this.

I don't think this can be solved if multiple devices show up with the same
id. If I have five disks that all say I'm disk X, how can there be one
name or handle for it from user level?

> > With the scsi layer multi-path, ide-scsi or usb-scsi could also do
> > multi-path IO. 
> 
> The "scsi is everything" approach got its wings shot off at the kernel summit, 
> and subsequently confirmed its death in a protracted wrangle on lkml (I can't 
> remember the reference off the top of my head, but I'm sure others can).

Agreed, but having the block layer be everything is also wrong.

My view is that md/volume manager multi-pathing is useful with 2.4.x, scsi
layer multi-path for 2.5.x, and this (perhaps with DASD) could then evolve
into generic block level (or perhaps integrated with the device model)
multi-pathing support for use in 2.7.x. Do you agree or disagree with this
approach?

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-10  0:08     ` Patrick Mansfield
@ 2002-09-10  7:55       ` Jeremy Higdon
  2002-09-10 13:04         ` Lars Marowsky-Bree
  2002-09-10 13:16       ` Lars Marowsky-Bree
  2002-09-10 17:21       ` Patrick Mochel
  2 siblings, 1 reply; 297+ messages in thread
From: Jeremy Higdon @ 2002-09-10  7:55 UTC (permalink / raw)
  To: Patrick Mansfield, James Bottomley
  Cc: Lars Marowsky-Bree, linux-kernel, linux-scsi

On Sep 9,  5:08pm, Patrick Mansfield wrote:
> 
> You can have multiple initiators on FCP or SPI, without dual controllers
> involved at all. Most of my multi-path testing has been with dual
> ported FCP disk drives, with multiple FCP adapters connected to a
> switch, not with disk arrays (I don't have any non-failover multi-ported
> disk arrays available, I'm using a fastt 200 disk array); I don't know the
> details of the drive controllers for my disks, but putting multiple
> controllers in a disk drive certainly would increase the cost.

Is there any plan to do something for hardware RAIDs in which two different
RAID controllers can get to the same logical unit, but you pay a performance
penalty when you access the lun via both controllers?  It seems to me that
all RAIDs that don't require a command to switch a lun from one to the
other controller (i.e. where both ctlrs can access a lun simultaneously)
pay a performance penalty when you access a lun from both.

Working around this in a generic way (i.e. without designation by the
system admin) seems difficult, so I'm wondering what may have been done
with this (my reading of this discussion is that it has not been tackled
yet).

thanks

jeremy

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-10  7:55       ` Jeremy Higdon
@ 2002-09-10 13:04         ` Lars Marowsky-Bree
  2002-09-10 16:20           ` Patrick Mansfield
  0 siblings, 1 reply; 297+ messages in thread
From: Lars Marowsky-Bree @ 2002-09-10 13:04 UTC (permalink / raw)
  To: linux-kernel, linux-scsi

On 2002-09-10T00:55:58,
   Jeremy Higdon <jeremy@classic.engr.sgi.com> said:

> Is there any plan to do something for hardware RAIDs in which two different
> RAID controllers can get to the same logical unit, but you pay a performance
> penalty when you access the lun via both controllers? 

This is implemented in the md multipath patch in 2.4; it distinguishes between
"active" and "spare" paths.

The LVM1 patch also does this by having priorities for each path and only
going to the next priority group if all paths in the current one have failed,
which IMHO is slightly over the top but there is always someone who might need
it ;-)

This functionality is a generic requirement IMHO.
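
The priority-group behaviour described above can be sketched in a few lines.
This is a toy model, not the md or LVM1 code; the tuple layout and the
convention that a lower number means a higher priority are assumptions.

```python
def select_paths(paths):
    """Pick the usable paths from the best priority group.

    paths is a list of (name, priority, healthy) tuples; a lower priority
    number is preferred. A lower-priority group is used only when every
    path in all better groups has failed.
    """
    healthy = [(prio, name) for name, prio, ok in paths if ok]
    if not healthy:
        return []
    best = min(prio for prio, _ in healthy)
    return [name for prio, name in healthy if prio == best]
```

With only two priority values this behaves like the active/spare distinction:
the spare paths are returned only once no active path is left.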


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
Immortality is an adequate definition of high availability for me.
	--- Gregory F. Pfister

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-10 13:04         ` Lars Marowsky-Bree
@ 2002-09-10 16:20           ` Patrick Mansfield
  0 siblings, 0 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-09-10 16:20 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: linux-kernel, linux-scsi

On Tue, Sep 10, 2002 at 03:04:27PM +0200, Lars Marowsky-Bree wrote:
> On 2002-09-10T00:55:58,
>    Jeremy Higdon <jeremy@classic.engr.sgi.com> said:
> 
> > Is there any plan to do something for hardware RAIDs in which two different
> > RAID controllers can get to the same logical unit, but you pay a performance
> > penalty when you access the lun via both controllers? 
> 
> This is implemented in the md multipath patch in 2.4; it distinguishes between
> "active" and "spare" paths.
> 
> The LVM1 patch also does this by having priorities for each path and only
> going to the next priority group if all paths in the current one have failed,
> which IMHO is slightly over the top but there is always someone who might need
> it ;-)
> 
> This functionality is a generic requirement IMHO.
> 
> 
> Sincerely,
>     Lars Marowsky-Brée <lmb@suse.de>

The current scsi multi-path has a default setting of "last path used", so
that it will always work with such controllers (I refer to such controllers
as fail-over devices). You have to modify the config, or boot with a
flag to get round-robin path selection. Right now, all paths will start
out on the same adapter (initiator); this is bad if you have more than
a few arrays attached to an adapter.

I was planning on implementing something similar to what you describe
for LVM1, with path weighting. Yes, it seems a bit heavy, but if there
are only two weights or priorities, it looks just like your active/spare
code, and is a bit more flexible.
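
As an illustration only (this is not code from the LVM1 patch or from the
scsi multi-path patch), the priority-group idea can be sketched as below.
The Path class, its fields and the example path names are all hypothetical,
and a lower weight is assumed to mean a more preferred group.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str             # hypothetical label, e.g. "hba0:ctrlA"
    weight: int           # assumption: lower value = more preferred group
    failed: bool = False


def usable_paths(paths):
    """Return the live paths of the best group that still has a live path."""
    live = [p for p in paths if not p.failed]
    if not live:
        return []
    best = min(p.weight for p in live)
    # Only fall through to a worse group once every better path has failed.
    return [p for p in live if p.weight == best]


paths = [Path("hba0:ctrlA", 0), Path("hba1:ctrlA", 0), Path("hba0:ctrlB", 1)]
```

With only two weight values this reduces to the active/spare behaviour:
weight 0 paths are the active ones and weight 1 paths are the spares.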

Figuring out what ports are active or spare is not easy, and varies
from array to array. This is another good candidate for user level
probing/configuration. I will probably hard-code device
personality information into the kernel for now, and hopefully in
2.7 we could move SCSI to user level probe/scan/configure.

I have heard that some arrays at one time had a small penalty for
switching controllers, and it was best to use the last path used,
but it was still OK to use both at the same time (cache warmth). I
was trying to think of a way to allow good load balancing in such
cases, but didn't come up with a good solution. For example, use last path
used selection, and once a path is too "busy" move to another path,
but then all devices might switch at the same time; some heuristics
or timing could probably be added to avoid such problems. The code
allows for path selection in a single function, so it should not
be difficult to add more complex path selection.
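
A rough sketch of what such a single selection function could look like is
below. It is not taken from the patch: the PathSelector class, the policy
names and the busy_limit parameter are assumptions made for illustration.
It shows "last path used" as the default, round-robin as an option, and a
naive "move when too busy" rule.

```python
class PathSelector:
    """Hypothetical per-device selector; all names here are illustrative."""

    def __init__(self, paths, policy="last_used", busy_limit=None):
        self.paths = list(paths)
        self.policy = policy
        self.busy_limit = busy_limit          # None disables the busy rule
        self.last = self.paths[0]
        self.inflight = {p: 0 for p in self.paths}
        self._rr = 0

    def select(self):
        if self.policy == "round_robin":
            path = self.paths[self._rr % len(self.paths)]
            self._rr += 1
        else:
            # Default: stay on the last path used (safe for fail-over arrays).
            path = self.last
            if (self.busy_limit is not None
                    and self.inflight[path] >= self.busy_limit):
                # Naive rule: move to the least loaded path.
                path = min(self.paths, key=lambda p: self.inflight[p])
        self.last = path
        return path
```

The busy rule here does nothing to stop every device from switching at the
same moment; any heuristic or timing for that would have to be added on top.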

I have also put NUMA path selection into the latest code. I've tested
it with 2.5.32, and a bunch of NUMA + NUMAQ patches on IBM/Sequent
NUMAQ systems.

-- Patrick Mansfield
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-10  0:08     ` Patrick Mansfield
  2002-09-10  7:55       ` Jeremy Higdon
@ 2002-09-10 13:16       ` Lars Marowsky-Bree
  2002-09-10 19:26         ` Patrick Mansfield
  2002-09-10 17:21       ` Patrick Mochel
  2 siblings, 1 reply; 297+ messages in thread
From: Lars Marowsky-Bree @ 2002-09-10 13:16 UTC (permalink / raw)
  To: linux-kernel, linux-scsi

On 2002-09-09T17:08:47,
   Patrick Mansfield <patmans@us.ibm.com> said:

Patrick, I am only replying to what I understand. Some of your comments on the
internals of the SCSI layer are beyond me ;-)

> Yes negotiation is at the adapter level, but that does not have to be tied
> to a Scsi_Device. I need to search for Scsi_Device::hostdata usage to
> figure out details, and to figure out if anything is broken in the current
> scsi multi-path code - right now it requires the same adapter drivers be
> used and that certain Scsi_Host parameters are equal if multiple paths
> to a Scsi_Device are found.

This seems to be a serious limitation. There are good reasons for wanting to
use different HBAs for the different paths.

And the Scsi_Device might be quite different. Imagine something like two
storage boxes which do internal replication among them; yes, you'd only want
to use one of them normally (because the cache-coherency traffic is going to
kill performance otherwise), but you can failover from one to the other even
if they have different SCSI serials etc.

> of memory that we can waste. And, other devices besides disks can be
> multi-pathed.

That is a good point.

But it would also be true for a generic block device implementation.

> Yes, there could be future multi-path users, or maybe with DASD. If we take
> SCSI and DASD as existing usage, they could be a basis for a block layer
> (or generic) set of multi-path interfaces.

SATA will also support multipathing if the birds were right, so it might make
sense to keep this in mind, at least for 2.7.

> There is code available for scsi multi-path, this is not a design in theory.

Well, there is code available for all the others too ;-)

> IMO, there is demand for scsi multi-path support now, as users move to 
> large databases requiring higher availability. md or volume manager
> for failover is adequate in some of these cases.

The volume manager multi-pathing, at least as done via the LVM1 patch, has a
major drawback. It can't easily be stacked with software RAID. It is very
awkward to do that right now.

And software RAID on top of multi-pathing is a typical example of a truly
fault tolerant configuration.

That's obviously easier with md, and I assume your SCSI code can also do that
nicely.

> Even with generic block layer multi-path support, we still need block
> driver (scsi, ide, etc.) code for multi-path.

Yes. Error handling in particular ;-)

The topology information you mention is also a good candidate for exposure.

> Agreed, but having the block layer be everything is also wrong.

Having the block device handling all block devices seems fairly reasonable to
me.

> My view is that md/volume manager multi-pathing is useful with 2.4.x, scsi
> layer multi-path for 2.5.x, and this (perhaps with DASD) could then evolve
> into generic block level (or perhaps integrated with the device model)
> multi-pathing support for use in 2.7.x. Do you agree or disagree with this
> approach?

Well, I guess 2.5/2.6 will have all the different multi-path implementations
mentioned so far (EVMS, LVM2, md, scsi, proprietary) - they all have code and
a userbase... All of them and future implementations can benefit from better
error handling and general cleanup, so that might be the best to do for now.

I think it is too soon to clean that up and consolidate the m-p approaches,
but I think it really ought to be consolidated in 2.7, and this seems like a
good time to start planning for that one.

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
Immortality is an adequate definition of high availability for me.
	--- Gregory F. Pfister

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-10 13:16       ` Lars Marowsky-Bree
@ 2002-09-10 19:26         ` Patrick Mansfield
  2002-09-11 14:20           ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Patrick Mansfield @ 2002-09-10 19:26 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: linux-kernel, linux-scsi

On Tue, Sep 10, 2002 at 03:16:06PM +0200, Lars Marowsky-Bree wrote:
> On 2002-09-09T17:08:47,
>    Patrick Mansfield <patmans@us.ibm.com> said:

> > Yes negotiation is at the adapter level, but that does not have to be tied
> > to a Scsi_Device. I need to search for Scsi_Device::hostdata usage to
> > figure out details, and to figure out if anything is broken in the current
> > scsi multi-path code - right now it requires the same adapter drivers be
> > used and that certain Scsi_Host parameters are equal if multiple paths
> > to a Scsi_Device are found.
> 
> This seems to be a serious limitation. There are good reasons for wanting to
> use different HBAs for the different paths.

What reasons? Adapter upgrades/replacement on a live system? I can imagine
someone using different HBAs so that they won't hit the same bug in both
HBAs, but that is a weak argument; I would think such systems would want
some type of cluster failover.

If the HBAs had the same memory and other limitations, it should function
OK, but it is hard to figure out exactly what might happen (if the HBAs had
different error handling characteristics, handled timeouts differently,
etc.). It would be easy to get rid of the checking for the same drivers,
(the code actually checks for the same drivers via Scsi_Host::hostt, not
the same hardware) - so it would allow multiple paths if the same driver
is used for different HBA's.

> And the Scsi_Device might be quite different. Imagine something like two
> storage boxes which do internal replication among them; yes, you'd only want
> to use one of them normally (because the cache-coherency traffic is going to
> kill performance otherwise), but you can failover from one to the other even
> if they have different SCSI serials etc.

> And software RAID on top of multi-pathing is a typical example of a truly
> fault tolerant configuration.
> 
> That's obviously easier with md, and I assume your SCSI code can also do that
> nicely.

I haven't tried it, but I see no reason why it would not work.

> > Agreed, but having the block layer be everything is also wrong.
> 
> Having the block device handling all block devices seems fairly reasonable to
> me.

Note that scsi uses the block device layer (the request_queue_t) for
character devices - look at st.c, sg.c, and sr*.c, calls to scsi_do_req()
or scsi_wait_req() queue to the request_queue_t. Weird but it works - you can
open a CD via sr and sg at the same time.

> > My view is that md/volume manager multi-pathing is useful with 2.4.x, scsi
> > layer multi-path for 2.5.x, and this (perhaps with DASD) could then evolve
> > into generic block level (or perhaps integrated with the device model)
> > multi-pathing support for use in 2.7.x. Do you agree or disagree with this
> > approach?
> 
> Well, I guess 2.5/2.6 will have all the different multi-path implementations
> mentioned so far (EVMS, LVM2, md, scsi, proprietary) - they all have code and
> a userbase... All of them and future implementations can benefit from better
> error handling and general cleanup, so that might be the best to do for now.
> 
> I think it is too soon to clean that up and consolidate the m-p approaches,
> but I think it really ought to be consolidated in 2.7, and this seems like a
> good time to start planning for that one.
> 
> 
> Sincerely,
>     Lars Marowsky-Brée <lmb@suse.de>

The scsi multi-path code is not in 2.5.x, and I doubt it will be accepted
without the support of James and others.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-10 19:26         ` Patrick Mansfield
@ 2002-09-11 14:20           ` James Bottomley
  2002-09-11 19:17             ` Lars Marowsky-Bree
  2002-09-11 20:30             ` Doug Ledford
  0 siblings, 2 replies; 297+ messages in thread
From: James Bottomley @ 2002-09-11 14:20 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: Lars Marowsky-Bree, linux-kernel, linux-scsi

patmans@us.ibm.com said:
> The scsi multi-path code is not in 2.5.x, and I doubt it will be
> accepted without the support of James and others. 

I haven't said "no" yet (and Doug and Jens haven't said anything).  I did say 
when the patches first surfaced that I didn't like the idea of replacing 
Scsi_Device with Scsi_Path at the bottom and the concomitant changes to all 
the Low Level Drivers which want to support multi-pathing.  If this is to go 
in the SCSI subsystem it has to be self contained, transparent and easily 
isolated.  That means the LLDs shouldn't have to be multipath aware.

I think we all agree:

1) that multi-path in SCSI isn't the way to go in the long term because other 
devices may have a use for the infrastructure.

2) that the scsi-error handler is the big problem

3) that errors (both medium and transport) may need to be propagated 
immediately up the block layer in order for multi-path to be handled 
efficiently.

Although I outlined my ideas for a rework of the error handler, they got lost 
in the noise of the abort vs reset debate.  These are some of the salient 
features that will help in this case

- no retries from the tasklet.

- Quiesce from above, not below (commands return while eh processes, so we 
begin with the first error and don't have to wait for all commands to return 
or error out)

- It's the object of the error handler to return all commands to the block 
layer for requeue and reorder as quickly as possible.  They have to be 
returned with an indication from the error handler that it would like them 
retried.  This indication can be propagated up (although I haven't given 
thought how to do that).  Any commands that are sent down to probe the device 
are generated from within the error handler thread (no device probing with 
live commands).

What other features do you need on the eh wishlist?

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-11 14:20           ` James Bottomley
@ 2002-09-11 19:17             ` Lars Marowsky-Bree
  2002-09-11 19:37               ` James Bottomley
  2002-09-11 20:30             ` Doug Ledford
  1 sibling, 1 reply; 297+ messages in thread
From: Lars Marowsky-Bree @ 2002-09-11 19:17 UTC (permalink / raw)
  To: linux-kernel, linux-scsi

On 2002-09-11T09:20:38,
   James Bottomley <James.Bottomley@steeleye.com> said:

> patmans@us.ibm.com said:
> > The scsi multi-path code is not in 2.5.x, and I doubt it will be
> > accepted without the support of James and others. 
> I haven't said "no" yet (and Doug and Jens haven't said anything). 

Except for dasds, all devices I care for with regard to multipathing _are_
SCSI, so that would solve at least 90% of my worries in the mid-term. And also
do multipathing for !block devices, in theory.

It does have the advantage of knowing the topology better than the block
layer; the chance that a path failure only affects one of the LUNs on a device
is pretty much nil, so it would speed up error recovery.

However, I could also live with this being handled in the volume manager /
device mapper. This would transcend all potential devices - if the character
devices were really mapped through the block layer (didn't someone have this
weird idea once? ;-), it would work for !block devs too.

Exposing better error reporting upwards is also definitely a good idea, and
if the device mapper could have the notion of "to what topology group does
this device belong to", or even "distance metric (without going into further
detail on what this is, as long as it is consistent to the physical layer) to
the current CPU" (so that the shortest path in NUMA could be selected), that
would be kinda cool ;-) And doesn't seem too intrusive.

Now, what I definitely dislike is the vast amount of duplicated code. I'm not
sure whether we can get rid of this in the 2.5 timeframe though.

<rant>

If EVMS was cleaned up some, maybe used the neat LVM2 device mapper
internally, and in fact is a superset of everything we've had before and can
support everything from our past as well as give us what we really want
(multi-pathing, journaled RAID etc), and we do the above, I vote for a legacy
free kernel. Unify the damn block device management and throw out the old
code. 

I hate cruft. Customers want to and will use it. Someone has to support it. It
breaks stuff. It cuts a 9 off my availability figure. ;-)

</rant>

> I think we all agree:

I agree here.

> 3) that errors (both medium and transport) may need to be propagated 
> immediately up the block layer in order for multi-path to be handled 
> efficiently.

Right. All the points you outline about the error handling are perfectly
valid.

But one of the issues with the md layer right now for example is the fact that
an error on a backup path will only be noticed as soon as it actually tries to
use it, even if the cpqfc (to name the culprit, worst code I've seen in a
while) notices a link-down event. It isn't communicated anywhere, and how
should it be?

This may be for later, but is something to keep in mind.

> thought how to do that).  Any commands that are sent down to probe the device 
> are generated from within the error handler thread (no device probing with 
> live commands).

If the path was somehow exposed to user-space (as it is with md right now),
the reprobing could even be done outside the kernel. This seems to make sense,
because a potential extensive device-specific diagnostic doesn't have to be
folded into it then.

> What other features do you need on the eh wishlist?

The other features on my list - prioritizing paths, a useful, consistent user
interface via driverfs/devfs/procfs  - are more or less policy I guess.

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
Immortality is an adequate definition of high availability for me.
	--- Gregory F. Pfister

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-11 19:17             ` Lars Marowsky-Bree
@ 2002-09-11 19:37               ` James Bottomley
  2002-09-11 19:52                 ` Lars Marowsky-Bree
  2002-09-11 21:38                 ` Oliver Xymoron
  0 siblings, 2 replies; 297+ messages in thread
From: James Bottomley @ 2002-09-11 19:37 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: linux-kernel, linux-scsi

lmb@suse.de said:
>  and if the device mapper could have the notion of "to what topology
> group does this device belong to", or even "distance metric (without
> going into further detail on what this is, as long as it is consistent
> to the physical layer) to the current CPU" (so that the shortest path
> in NUMA could be selected), that would be kinda cool ;-) And doesn't
> seem too intrusive.

I think I see driverfs as the solution here.  Topology is deduced by examining 
certain device and HBA parameters.  As long as these parameters can be exposed 
as nodes in the device directory for driverfs, a user level daemon can map the
topology and connect the paths at the top.  It should even be possible to 
weight the constructed multi-paths.

This solution appeals because the kernel doesn't have to dictate policy, all 
it needs to be told is what information it should be exposing and lets user 
level get on with policy determination (this is a mini version of why we 
shouldn't have network routing policy deduced and implemented by the kernel).
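
To make the division of labour concrete, here is a minimal sketch of the
user-level side. It assumes the kernel has already exposed per-path
attributes; the dictionary below stands in for whatever driverfs would
publish, and the attribute name "device_id" and the path names are
hypothetical, not an existing driverfs layout.

```python
from collections import defaultdict


def group_paths(attrs):
    """Group path names by the device identifier each path reports."""
    groups = defaultdict(list)
    for path, a in attrs.items():
        groups[a["device_id"]].append(path)
    # Sorted lists keep the result stable regardless of discovery order.
    return {dev: sorted(ps) for dev, ps in groups.items()}


# Stand-in for attributes read from driverfs (illustrative values only).
exposed = {
    "host1/path0": {"device_id": "disk-X"},
    "host0/path0": {"device_id": "disk-X"},
    "host0/path1": {"device_id": "disk-Y"},
}
```

Weighting would be a further policy step applied to each group by the same
daemon, so none of it needs to live in the kernel.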

> But one of the issues with the md layer right now for example is the fact that
> an error on a backup path will only be noticed as soon as it actually tries to
> use it, even if the cpqfc (to name the culprit, worst code I've seen in a
> while) notices a link-down event. It isn't communicated anywhere, and how
> should it be?

I've been thinking about this separately.  FC in particular needs some type of
event notification API (something like "I've just seen this disk" or "my loop 
just went down").  I'd like to leverage a mid-layer api into hot plug for some 
of this, but I don't have the details worked out.

> If the path was somehow exposed to user-space (as it is with md right now),
> the reprobing could even be done outside the kernel. This seems to make sense,
> because a potential extensive device-specific diagnostic doesn't have to be
> folded into it then.

The probing issue is an interesting one.  At least SCSI has the ability to 
probe with no IO (something like a TEST UNIT READY) and I assume other block 
devices have something similar.  Would it make sense to tie this to a single 
well known ioctl so that you can probe any device that supports it without 
having to send real I/O?

> The other features on my list - prioritizing paths, a useful, consistent user
> interface via driverfs/devfs/procfs  - are more or less policy I guess.

Mike Sullivan (of IBM) is working with Patrick Mochel on this.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-11 19:37               ` James Bottomley
@ 2002-09-11 19:52                 ` Lars Marowsky-Bree
  2002-09-11 21:38                 ` Oliver Xymoron
  1 sibling, 0 replies; 297+ messages in thread
From: Lars Marowsky-Bree @ 2002-09-11 19:52 UTC (permalink / raw)
  To: linux-kernel, linux-scsi

On 2002-09-11T14:37:40,
   James Bottomley <James.Bottomley@steeleye.com> said:

> I think I see driverfs as the solution here.  Topology is deduced by
> examining certain device and HBA parameters.  As long as these parameters
> can be exposed as nodes in the device directory for driverfs, a user level
> daemon can map the topology and connect the paths at the top.  It should even be
> possible to weight the constructed multi-paths.

Perfect, I agree, should've thought of it. As long as this is simple enough
that it can be done in initrd (if / is on a multipath device...).

The required weighting has already been implemented in the LVM1 patch by IBM.
While it appeared overkill to me for "simple" cases, I think it is suited to
expressing proximity.

> This solution appeals because the kernel doesn't have to dictate policy, 

Right.

> I've been thinking about this separately.  FC in particular needs some type of
> event notification API (something like "I've just seen this disk" or "my
> loop just went down").  I'd like to leverage a mid-layer api into hot plug
> for some of this, but I don't have the details worked out.

This isn't just FC, but also dasd on S/390. Potentially also network block
devices, which can notice a link down.

> The probing issue is an interesting one.  At least SCSI has the ability to 
> probe with no IO (something like a TEST UNIT READY) and I assume other block 
> devices have something similar.  Would it make sense to tie this to a single 
> well known ioctl so that you can probe any device that supports it without 
> having to send real I/O?

Not sufficient. The test is policy, so the above applies here too ;-) 

In the case of talking to a dual headed RAID box for example, TEST UNIT READY
might return OK, but the controller might refuse actual IO, or the path may be
somehow damaged in a way which is only detected by doing some "large" IO. Now,
this might be total overkill for other scenarios.

I vote for exposing the paths via driverfs (which, I think, is already
consensus, so the multipath group, topology etc. can be used) and allowing
user-space to reenable them after doing whatever probing is deemed necessary.
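
A small sketch of "the test is policy", with everything in it assumed for
illustration: the probe functions are placeholders (real ones would send a
TEST UNIT READY or do actual reads), and the path dictionaries only simulate
their outcome.

```python
def reprobe(path, probes):
    """Return True only if every probe in the chosen policy passes."""
    return all(probe(path) for probe in probes)


# Hypothetical probes; each just reads a simulated result from the path.
def probe_unit_ready(path):
    return path["responds"]


def probe_large_read(path):
    return path["io_ok"]


# A controller that answers the cheap probe but refuses actual I/O.
damaged = {"name": "ctrlB", "responds": True, "io_ok": False}
healthy = {"name": "ctrlA", "responds": True, "io_ok": True}
```

With only the cheap probe in the policy the damaged path would be
re-enabled; adding the read probe keeps it disabled. Which list of probes
to use is the site's decision, which is the argument for doing this in
user-space.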

What are your ideas on the potential timeframe?

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
Immortality is an adequate definition of high availability for me.
	--- Gregory F. Pfister

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-11 19:37               ` James Bottomley
  2002-09-11 19:52                 ` Lars Marowsky-Bree
@ 2002-09-11 21:38                 ` Oliver Xymoron
  1 sibling, 0 replies; 297+ messages in thread
From: Oliver Xymoron @ 2002-09-11 21:38 UTC (permalink / raw)
  To: James Bottomley; +Cc: Lars Marowsky-Bree, linux-kernel, linux-scsi

On Wed, Sep 11, 2002 at 02:37:40PM -0500, James Bottomley wrote:
> lmb@suse.de said:
> >  and if the device mapper could have the notion of "to what topology
> > group does this device belong to", or even "distance metric (without
> > going into further detail on what this is, as long as it is consistent
> > to the physical layer) to the current CPU" (so that the shortest path
> > in NUMA could be selected), that would be kinda cool ;-) And doesn't
> > seem too intrusive.
> 
> I think I see driverfs as the solution here.  Topology is deduced by examining 
> certain device and HBA parameters.  As long as these parameters can be exposed 
> as nodes in the device directory for driverfs, a user level daemon can map the
> topology and connect the paths at the top.  It should even be possible to 
> weight the constructed multi-paths.
> 
> This solution appeals because the kernel doesn't have to dictate policy, all 
> it needs to be told is what information it should be exposing and lets user 
> level get on with policy determination (this is a mini version of why we 
> shouldn't have network routing policy deduced and implemented by the kernel).

Not coincidentally, network routing policy _is_ multipath config in
the iSCSI or nbd case.

-- 
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.." 

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-11 14:20           ` James Bottomley
  2002-09-11 19:17             ` Lars Marowsky-Bree
@ 2002-09-11 20:30             ` Doug Ledford
  2002-09-11 21:17               ` Mike Anderson
  1 sibling, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-09-11 20:30 UTC (permalink / raw)
  To: James Bottomley
  Cc: Patrick Mansfield, Lars Marowsky-Bree, linux-kernel, linux-scsi

On Wed, Sep 11, 2002 at 09:20:38AM -0500, James Bottomley wrote:
> patmans@us.ibm.com said:
> > The scsi multi-path code is not in 2.5.x, and I doubt it will be
> > accepted without the support of James and others. 
> 
> I haven't said "no" yet (and Doug and Jens haven't said anything).

Well, I for one was gone on vacation, and I'm allowed to ignore linux-scsi
in such times, so, as Bill de Cat would say, thptptptppt! :-)

>  I did say 
> when the patches first surfaced that I didn't like the idea of replacing 
> Scsi_Device with Scsi_Path at the bottom and the concomitant changes to all 
> the Low Level Drivers which want to support multi-pathing.  If this is to go 
> in the SCSI subsystem it has to be self contained, transparent and easily 
> isolated.  That means the LLDs shouldn't have to be multipath aware.

I agree with this.

> I think we all agree:
> 
> 1) that multi-path in SCSI isn't the way to go in the long term because other 
> devices may have a use for the infrastructure.

I'm not so sure about this.  I think in the long run this is going to end 
up blurring the line between SCSI layer and block layer IMHO.

> 2) that the scsi-error handler is the big problem

Aye, it is, and for more than just this issue.

> 3) that errors (both medium and transport) may need to be propagated 
> immediately up the block layer in order for multi-path to be handled 
> efficiently.

This is why I'm not sure I agree with 1.  If we are doing this, then we
are sending up SCSI errors at which point the block layer now needs to
know SCSI specifics in order to properly decide what to do with the error.  
That, or we are building specific "is this error multipath relevant" logic
into the SCSI layer and then passing the result up to the block layer.  
My preference would be that either A) the SCSI
layer doesn't know jack about multipath and the block layer handles it all
or B) the block layer doesn't know about our multipath and the SCSI layer
handles it all.  I don't like the idea of mixing them at this current
point in time (there really isn't much of a reason to mix them yet, and
people can only speculate that there might be reason to do so later).
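
For illustration, the second variant (SCSI decides, the block layer only
sees the verdict) might look roughly like the sketch below. The ErrorKind
and Action names are invented here, and the rule that a medium error is not
worth retrying on another path is an assumption of the sketch, not
something any existing patch is known to implement.

```python
from enum import Enum


class ErrorKind(Enum):
    TRANSPORT = "transport"   # e.g. link down or a timeout on the wire
    MEDIUM = "medium"         # the device itself reports bad media


class Action(Enum):
    RETRY_OTHER_PATH = "retry_other_path"
    FAIL_REQUEST = "fail_request"


def classify(kind, other_paths_available):
    """Reduce a SCSI-specific error to a generic verdict for the upper layer."""
    if kind is ErrorKind.TRANSPORT and other_paths_available:
        return Action.RETRY_OTHER_PATH
    # Assumption: a medium error would recur on every path to the same LUN.
    return Action.FAIL_REQUEST
```

The upper layer then never has to interpret SCSI specifics; it only acts on
the Action value.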

> Although I outlined my ideas for a rework of the error handler, they got lost 
> in the noise of the abort vs reset debate.  These are some of the salient 
> features that will help in this case

[ snipped eh features ]

I'll have to respond to these items separately.  They cross over some with 
these issues, but really they aren't tied directly together and deserve
separate consideration.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-11 20:30             ` Doug Ledford
@ 2002-09-11 21:17               ` Mike Anderson
  0 siblings, 0 replies; 297+ messages in thread
From: Mike Anderson @ 2002-09-11 21:17 UTC (permalink / raw)
  To: James Bottomley, Patrick Mansfield, Lars Marowsky-Bree,
	linux-kernel, linux-scsi

Doug Ledford [dledford@redhat.com] wrote:
> On Wed, Sep 11, 2002 at 09:20:38AM -0500, James Bottomley wrote:
> > patmans@us.ibm.com said:
> >  I did say 
> > when the patches first surfaced that I didn't like the idea of replacing 
> > Scsi_Device with Scsi_Path at the bottom and the concomitant changes to all 
> > the Low Level Drivers which want to support multi-pathing.  If this is to go 
> > in the SCSI subsystem it has to be self contained, transparent and easily 
> > isolated.  That means the LLDs shouldn't have to be multipath aware.
> 
> I agree with this.
> 

In the mid-level mp patch the adapters are not aware of multi-path. The
changes to adapters carried in the patch have to do with a driver not
allowing aborts during link down cases or iterating over the host_queue
for ioctl, /proc reasons. The hiding of some of the lists behind APIs is
something I had to do in the host list cleanup. We might even do some of
this same list cleanup outside of mp.

Also add my "me too" to the "scsi error handling is lacking" statement. I
am currently trying to do something about not using the failed command
for error recovery (post abort).

-Mike
-- 
Michael Anderson
andmike@us.ibm.com

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-10  0:08     ` Patrick Mansfield
  2002-09-10  7:55       ` Jeremy Higdon
  2002-09-10 13:16       ` Lars Marowsky-Bree
@ 2002-09-10 17:21       ` Patrick Mochel
  2002-09-10 18:42         ` Patrick Mansfield
  2 siblings, 1 reply; 297+ messages in thread
From: Patrick Mochel @ 2002-09-10 17:21 UTC (permalink / raw)
  To: Patrick Mansfield
  Cc: James Bottomley, Lars Marowsky-Bree, linux-kernel, linux-scsi


> > > Generic device naming consistency is a problem if multiple devices
> > > show up with the same id.
> > 
> > Patrick Mochel has an open task to come up with a solution to this.
> 
> I don't think this can be solved if multiple devices show up with the same
> id. If I have five disks that all say I'm disk X, how can there be one
> name or handle for it from user level?

Easy: you map the unique identifier of the device to a name in userspace.  
In our utopian future, /sbin/hotplug is called with that unique ID as one
of its parameters. It searches for, and finds, names based on the ID. If
the name(s) already exist, then it doesn't continue.
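
A minimal sketch of that lookup, under stated assumptions: the handle_add
function, the ID strings and the name table are all hypothetical, and how
/sbin/hotplug would actually receive the ID is not specified here.

```python
def handle_add(unique_id, names, existing):
    """Return the name to create for this device, or None to do nothing."""
    name = names.get(unique_id)
    if name is None or name in existing:
        # Unknown ID, or the name already exists: don't continue.
        return None
    existing.add(name)
    return name


# Illustrative mapping kept in userspace; the values are made up.
names = {"id-0001": "disk-X"}
existing = set()
```

Note that a second path reporting the same ID gets no name of its own under
this scheme.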


	-pat

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-10 17:21       ` Patrick Mochel
@ 2002-09-10 18:42         ` Patrick Mansfield
  2002-09-10 19:00           ` Patrick Mochel
  0 siblings, 1 reply; 297+ messages in thread
From: Patrick Mansfield @ 2002-09-10 18:42 UTC (permalink / raw)
  To: Patrick Mochel
  Cc: James Bottomley, Lars Marowsky-Bree, linux-kernel, linux-scsi

On Tue, Sep 10, 2002 at 10:21:53AM -0700, Patrick Mochel wrote:
> 
> > > > Generic device naming consistency is a problem if multiple devices
> > > > show up with the same id.
> > > 
> > > Patrick Mochel has an open task to come up with a solution to this.
> > 
> > I don't think this can be solved if multiple devices show up with the same
> > id. If I have five disks that all say I'm disk X, how can there be one
> > name or handle for it from user level?
> 
> Easy: you map the unique identifier of the device to a name in userspace.  
> In our utopian future, /sbin/hotplug is called with that unique ID as one
> > of its parameters. It searches for, and finds, names based on the ID. If
> the name(s) already exist, then it doesn't continue.
> 
> 
> 	-pat

But then if the md or volume manager wants to do multi-path IO it
will not be able to find all of the names in userspace since the
extra ones (second path and on) have been dropped.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-10 18:42         ` Patrick Mansfield
@ 2002-09-10 19:00           ` Patrick Mochel
  2002-09-10 19:37             ` Patrick Mansfield
  0 siblings, 1 reply; 297+ messages in thread
From: Patrick Mochel @ 2002-09-10 19:00 UTC (permalink / raw)
  To: Patrick Mansfield
  Cc: James Bottomley, Lars Marowsky-Bree, linux-kernel, linux-scsi


On Tue, 10 Sep 2002, Patrick Mansfield wrote:

> On Tue, Sep 10, 2002 at 10:21:53AM -0700, Patrick Mochel wrote:
> > 
> > > > > Generic device naming consistency is a problem if multiple devices
> > > > > show up with the same id.
> > > > 
> > > > Patrick Mochel has an open task to come up with a solution to this.
> > > 
> > > I don't think this can be solved if multiple devices show up with the same
> > > id. If I have five disks that all say I'm disk X, how can there be one
> > > name or handle for it from user level?
> > 
> > Easy: you map the unique identifier of the device to a name in userspace.  
> > In our utopian future, /sbin/hotplug is called with that unique ID as one
> > of its parameters. It searches for, and finds names based on the ID is. If
> > the name(s) already exist, then it doesn't continue.
> > 
> > 
> > 	-pat
> 
> But then if the md or volume manager wants to do multi-path IO it
> will not be able to find all of the names in userspace since the
> extra ones (second path and on) have been dropped.

Which is it that you want? One canonical name or all the paths? I supplied
a solution for the former in my response. The latter is solved via the
exposure of the paths in driverfs, which has been discussed previously.


	-pat

* Re: [RFC] Multi-path IO in 2.5/2.6 ?
  2002-09-10 19:00           ` Patrick Mochel
@ 2002-09-10 19:37             ` Patrick Mansfield
  0 siblings, 0 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-09-10 19:37 UTC (permalink / raw)
  To: Patrick Mochel
  Cc: James Bottomley, Lars Marowsky-Bree, linux-kernel, linux-scsi

On Tue, Sep 10, 2002 at 12:00:47PM -0700, Patrick Mochel wrote:
> 
> On Tue, 10 Sep 2002, Patrick Mansfield wrote:
> 
> > On Tue, Sep 10, 2002 at 10:21:53AM -0700, Patrick Mochel wrote:
> > > Easy: you map the unique identifier of the device to a name in userspace.  
> > > In our utopian future, /sbin/hotplug is called with that unique ID as one
> > > of its parameters. It searches for, and finds names based on the ID is. If
> > > the name(s) already exist, then it doesn't continue.
> > > 
> > > 
> > > 	-pat
> > 
> > But then if the md or volume manager wants to do multi-path IO it
> > will not be able to find all of the names in userspace since the
> > extra ones (second path and on) have been dropped.
> 
> Which is it that you want? One canonical name or all the paths? I supplied
> a solution for the former in my repsonse. The latter is solved via the
> exposure of the paths in driverfs, which has been discussed previously.
> 
> 
> 	-pat

For scsi multi-path, one name; without scsi multi-path (or for individual
paths that are not exposed in driverfs) each path probably needs to show up
in user space with a different name so md or other volume managers can use
them.

-- Patrick Mansfield

* SCSI woes (followup)
@ 2002-09-24 11:35 Russell King
  2002-09-24 13:46 ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Russell King @ 2002-09-24 11:35 UTC (permalink / raw)
  To: linux-scsi, James Bottomley

In my previous mail, I said I thought scsi_allocate_device() was the cause
of the lockup while trying to attach devices to drivers.  This doesn't
seem to be the case.

Instead, we have something interesting going on:

1. We submit a test unit ready command for the disk device.  This command
   is at the head, and we call scsi_request_fn()
2. the scsi_request_fn() realises that the device was reset, and performs
   a scsi_ioctl to lock the door.
3. the ioctl queues up another request on the tail of the request queue
   to lock the door, and calls scsi_request_fn()
4. scsi_request_fn() processes the test unit ready command at the head of
   the queue, and hands this off to the driver.  The driver is now busy.
5. scsi_request_fn() returns, and waits for our door lock request to
   complete.
6. the test unit ready command completes.  We pass completion notification
   through scsi_done, the bottom half handler, scsi_finish_command,
   the scsi command's done function (which is the scsi request's sr_done
   function.)  At no point do we kick the queue to go and execute that
   door lock request.

So, we are left with one invocation of scsi_request_fn() spinning in the
scsi ioctl code waiting for a command that can never ever complete.
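
The six steps above can be reproduced with a toy model. This is a sketch under loud assumptions: `toy_queue`, `toy_request_fn`, `toy_doorlock_ioctl` and `toy_complete` are invented for illustration and only mimic the ordering described in this mail, not the real scsi_lib.c code. The real ioctl would sleep where the comment says so; the model just records that the door lock never left the queue.

```c
#include <assert.h>
#include <stdbool.h>

enum req { REQ_NONE, REQ_TUR, REQ_DOORLOCK };

#define QLEN 4

/* Hypothetical single-device model: one queue, one command in flight. */
struct toy_queue {
	enum req slot[QLEN];	/* slot[0] is the head */
	int len;
	bool host_busy;		/* the driver is executing a command */
	enum req in_flight;
	bool was_reset;
	bool doorlock_done;
};

static void enqueue_tail(struct toy_queue *q, enum req r)
{
	assert(q->len < QLEN);
	q->slot[q->len++] = r;
}

static enum req dequeue_head(struct toy_queue *q)
{
	assert(q->len > 0);
	enum req r = q->slot[0];
	for (int i = 1; i < q->len; i++)
		q->slot[i - 1] = q->slot[i];
	q->len--;
	return r;
}

static void toy_request_fn(struct toy_queue *q);

/* Step 3: the ioctl queues the door lock at the tail and runs the queue. */
static void toy_doorlock_ioctl(struct toy_queue *q)
{
	enqueue_tail(q, REQ_DOORLOCK);
	toy_request_fn(q);
	/* The real code would now wait for the door lock to complete. */
}

static void toy_request_fn(struct toy_queue *q)
{
	if (q->was_reset) {		/* step 2 */
		q->was_reset = false;
		toy_doorlock_ioctl(q);
		return;
	}
	if (q->len > 0 && !q->host_busy) {	/* step 4: head goes first */
		q->in_flight = dequeue_head(q);
		q->host_busy = true;
	}
}

/* Step 6: completion.  kick == false mimics the behaviour reported here. */
static void toy_complete(struct toy_queue *q, bool kick)
{
	if (q->in_flight == REQ_DOORLOCK)
		q->doorlock_done = true;
	q->in_flight = REQ_NONE;
	q->host_busy = false;
	if (kick)
		toy_request_fn(q);
}
```

With `kick` false, the test unit ready completes and the door lock stays on the queue with nothing left to start it. With `kick` true, the completion path restarts the queue and the door lock eventually finishes.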

I'm getting the impression that the door lock handling code is misplaced,
and is probably the cause of these problems.  I'm going to try disabling
that code, and reverting my previous change to scsi_restart_operations().

Help!

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html

* Re: SCSI woes (followup)
  2002-09-24 11:35 SCSI woes (followup) Russell King
@ 2002-09-24 13:46 ` James Bottomley
  2002-09-24 13:58   ` Russell King
  2002-09-24 17:57   ` Luben Tuikov
  0 siblings, 2 replies; 297+ messages in thread
From: James Bottomley @ 2002-09-24 13:46 UTC (permalink / raw)
  To: Russell King; +Cc: linux-scsi, James Bottomley

[-- Attachment #1: Type: text/plain, Size: 1136 bytes --]

rmk@arm.linux.org.uk said:
> I'm getting the impression that the door lock handling code is
> misplaced, and is probably the cause of these problems.  I'm going to
> try disabling that code, and reverting my previous change to
> scsi_restart_operations(). 

Unfortunately, if we want to ensure the medium is locked in place, the lock 
has to occur that low down (since error recovery can unlock the door and a 
retry doesn't currently trouble the upper layers).

I think its method of operation is misplaced.  What it should probably be 
doing is simply adding a doorlock to the head of the queue and then going back 
around.  Since we know the doorlock now precedes the command that instigated 
the issue the door should be locked before anything else happens.  It should 
simply loop around and process the door lock request so it doesn't have to 
wait for anything else to occur (any commands which come down in the interim 
should go on the queue tail).
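
The head-insertion idea in the paragraph above can be sketched as follows. It is a minimal model, assuming a plain array-backed queue; `cmd_queue`, `queue_head`, `queue_tail` and `next_cmd` are hypothetical names and do not correspond to the block layer API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QLEN 8

/* Hypothetical command queue; cmd[0] is the head. */
struct cmd_queue {
	const char *cmd[QLEN];
	int len;
};

/* Ordinary commands arriving in the interim go on the tail. */
static void queue_tail(struct cmd_queue *q, const char *c)
{
	assert(q->len < QLEN);
	q->cmd[q->len++] = c;
}

/* The door lock is pushed in front of whatever triggered it. */
static void queue_head(struct cmd_queue *q, const char *c)
{
	assert(q->len < QLEN);
	for (int i = q->len; i > 0; i--)
		q->cmd[i] = q->cmd[i - 1];
	q->cmd[0] = c;
	q->len++;
}

/* Take the next command to issue, or NULL when the queue is empty. */
static const char *next_cmd(struct cmd_queue *q)
{
	if (q->len == 0)
		return NULL;
	const char *c = q->cmd[0];
	for (int i = 1; i < q->len; i++)
		q->cmd[i - 1] = q->cmd[i];
	q->len--;
	return c;
}
```

If a test unit ready is already queued, `queue_head()` for the door lock followed by a loop over `next_cmd()` issues the door lock first, then the test unit ready, then anything that was added to the tail afterwards.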

However, for your case does simply moving the queue empty check to the top 
cause the problems to go away? (That would be hiding the problem not fixing 
it, but still...)

James

[-- Attachment #2: tmp.diffs --]
[-- Type: text/plain , Size: 775 bytes --]

===== scsi_lib.c 1.15 vs edited =====
--- 1.15/drivers/scsi/scsi_lib.c	Tue Aug 13 09:35:17 2002
+++ edited/scsi_lib.c	Tue Sep 24 09:45:16 2002
@@ -868,6 +868,15 @@
 		if (SDpnt->device_blocked) {
 			break;
 		}
+
+		/*
+		 * If we couldn't find a request that could be queued, then we
+		 * can also quit.
+		 */
+
+		if (list_empty(&q->queue_head))
+			break;
+
 		if ((SHpnt->can_queue > 0 && (SHpnt->host_busy >= SHpnt->can_queue))
 		    || (SHpnt->host_blocked) 
 		    || (SHpnt->host_self_blocked)) {
@@ -914,13 +923,6 @@
 				continue;
 			}
 		}
-
-		/*
-		 * If we couldn't find a request that could be queued, then we
-		 * can also quit.
-		 */
-		if (list_empty(&q->queue_head))
-			break;

 		/*
 		 * Loop through all of the requests in this queue, and find

* Re: SCSI woes (followup)
  2002-09-24 13:46 ` James Bottomley
@ 2002-09-24 13:58   ` Russell King
  2002-09-24 14:29     ` James Bottomley
  2002-09-24 18:18     ` Patrick Mansfield
  2002-09-24 17:57   ` Luben Tuikov
  1 sibling, 2 replies; 297+ messages in thread
From: Russell King @ 2002-09-24 13:58 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

On Tue, Sep 24, 2002 at 09:46:14AM -0400, James Bottomley wrote:
> I think it's method of operation is misplaced.

I think it is misplaced.  It locks the doors of devices that aren't even
in use, which is just plain stupid.

> However, for your case does simply moving the queue empty check to the top 
> cause the problems to go away? (That would be hiding the problem not fixing 
> it, but still...)

I #if 0'd it out, and it makes the problem go away.

One thing to note is the comment that Eric left in the code about it -
it seems to indicate that he believes it to be misplaced, but it's there
because we have host drivers using the old error handling code.

I'm wondering if this behaviour only happens with host drivers that use
the new error handling.  If so, one possibility would be to follow Eric's
suggestion there, and lock the door in the error handler "as Eric intended"
and leave that door locking behind for the old error handling.

Incidentally, I've found another weirdness in the code.  We decide that
we need to lock the door based on the "was_reset" flag, which we clear.
However, st.c uses "was_reset" to decide whether to suspend operations
(due to a bus reset).  The two uses of this flag seem to conflict with each
other.  How can we guarantee that st.c will see was_reset set if the
request function clears it?  st.c appears to use scsi_request_fn, so
I guess there's a conflict of use there.
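
The flag conflict can be shown with a two-function model. This is a sketch only: `toy_dev`, `request_fn_check_reset` and `tape_check_reset` are hypothetical stand-ins for the request function and for st.c, assuming one consumer tests and clears the flag while the other only reads it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device with a single shared reset flag. */
struct toy_dev {
	bool was_reset;
};

/* Mimics the request function: test the flag, then clear it. */
static bool request_fn_check_reset(struct toy_dev *d)
{
	assert(d != NULL);
	bool seen = d->was_reset;
	d->was_reset = false;
	return seen;
}

/* Mimics an upper-level driver that only reads the flag. */
static bool tape_check_reset(const struct toy_dev *d)
{
	assert(d != NULL);
	return d->was_reset;
}
```

Whichever consumer runs second decides the outcome: if `request_fn_check_reset()` runs first, `tape_check_reset()` returns false even though a reset did happen.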

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html

* Re: SCSI woes (followup)
  2002-09-24 13:58   ` Russell King
@ 2002-09-24 14:29     ` James Bottomley
  2002-09-24 18:16       ` Luben Tuikov
  2002-09-24 18:18     ` Patrick Mansfield
  1 sibling, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-09-24 14:29 UTC (permalink / raw)
  To: Russell King; +Cc: linux-scsi

rmk@arm.linux.org.uk said:
> I think it is misplaced.  It locks the doors of devices that aren't
> even in use, which is just plain stupid. 

A better fix is probably to implement a flag to say whether the door should be 
locked or not.

The whole lock door before a command goes down philosophy looks slightly wrong 
to me.  I can see we want to lock the door for a mounted device and other 
conditions, but locking the door just to send down a test unit ready or 
inquiry doesn't seem correct.

Door locking is very similar to reservation maintenance.  Perhaps adding 
kernel code to maintain certain persistent but fragile states of the device 
might be in order.

> Incidentally, I've found another weirdness in the code.  We decide
> that we need to lock the door based on the "was_reset" flag, which we
> clear. However, st.c uses "was_reset" to decide whether to suspend
> operations (due to a bus reset).  The two uses of this flag seem to
> conflict each other.  How can we guarantee that st.c will see
> was_reset set if the request function clears it?  st.c appears to use
> scsi_request_fn, so I guess there's a conflict of use there.

The code is riddled with these.  Reset handling was really just bolted on to 
the code as an afterthought.

James

* Re: SCSI woes (followup)
  2002-09-24 14:29     ` James Bottomley
@ 2002-09-24 18:16       ` Luben Tuikov
  0 siblings, 0 replies; 297+ messages in thread
From: Luben Tuikov @ 2002-09-24 18:16 UTC (permalink / raw)
  To: James Bottomley; +Cc: Russell King, linux-scsi

James Bottomley wrote:
> 
> A better fix is probably to implement a flag to say whether the door should be
> locked or not.
> 

This would really depend on the command to be executed.
I.e. does the command need access to the media, or not.

> The whole lock door before a command goes down philosophy looks slightly wrong
> to me.  I can see we want to lock the door for a mounted device and other
> conditions, but locking the door just to send down a test unit ready or
> inquiry doesn't seem correct.

Well, SAM-3 is the place to check this out. For TUR (7.27) it would seem that
one does indeed need to lock the door/media into place in order to
be ``able to accept an appropriate medium-access command without returning
CHECK CONDITION status''.

> Door locking is very similar to reservation maintenance.  Perhaps adding
> kernel code to maintain certain persistent but fragile states of the device
> might be in order.

This is the job of the task manager for the LU, and if not implemented
by the LU, it should be implemented by the LLDD, which would complicate
things considerably for the LLDD.

As to your suggestion, there may be generic code, which LLDD may want to use,
but nevertheless, has to be no higher than the LLDD, and ideally in the LU itself.

The idea is that SCSI core should be quite minimal as I've been suggesting
for some time now.

-- 
Luben

* Re: SCSI woes (followup)
  2002-09-24 13:58   ` Russell King
  2002-09-24 14:29     ` James Bottomley
@ 2002-09-24 18:18     ` Patrick Mansfield
  2002-09-24 19:01       ` Russell King
                         ` (2 more replies)
  1 sibling, 3 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-09-24 18:18 UTC (permalink / raw)
  To: Russell King; +Cc: James Bottomley, linux-scsi

On Tue, Sep 24, 2002 at 02:58:52PM +0100, Russell King wrote:
> On Tue, Sep 24, 2002 at 09:46:14AM -0400, James Bottomley wrote:
> > I think it's method of operation is misplaced.
> 
> I think it is misplaced.  It locks the doors of devices that aren't even
> in use, which is just plain stupid.
> 
> > However, for your case does simply moving the queue empty check to the top 
> > cause the problems to go away? (That would be hiding the problem not fixing 
> > it, but still...)
> 
> I #if 0'd it out, and it makes the problem go away.

The scan will only send INQUIRY commands, and after all scanning is
done, the upper level drivers might send a TUR.

After a new Scsi_Device is added in scsi_scan.c it calls
scsi_release_commandblocks() and sets queue_depth = 0.

Any call to scsi_request_fn() for the device at this point will just
return (break statements) after scsi_allocate_device() returns NULL,
and if scsi_ioctl() was called from scsi_request_fn() it will hang
forever.

The problem is that we try to send a command via scsi_request_fn() to
a device that has no command blocks allocated - its initialization
is incomplete.

Moving the empty check up sounds like good and simple fix for 2.4, or
check if queue_depth == 0. Anything else would be difficult to get right.

Moving the SCSI_IOCTL_DOORLOCK doesn't fix the problem if it is
still called on an incompletely initialized device.

And, perhaps do not allow the error handler to run during scanning, let
later IO (to any discovered device) kick off the error handler. It's
hard to say if this is good or not - for example, if this is your root
device, you want it online. But if it is some other device, and we try hard
to scan and use it, it can cause more problems (if it keeps getting errors,
and we keep running the error handler/reset cycle, blocking other IO).

The problem happens via:

1) device A is found that has removable media during scan

2) INQUIRY to another device B kicks off error handling before the
scan has completed, so device A has no command blocks.

3) Error handler completion calls scsi_request_fn() for A.

4) scsi_request_fn() for A sees the reset happened, and calls scsi_ioctl().

5) scsi_ioctl() calls scsi_request_fn(), it cannot get a Scsi_Cmnd, so
it just returns, incorrectly assuming that another request must be
outstanding.

6) The scsi_ioctl() never completes. The error handling thread should
be hung.
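
The scenario above, and the suggested `queue_depth == 0` check, can be modelled roughly like this. Everything here is an assumption made for illustration: `toy_dev`, `toy_alloc_cmd`, `toy_request_fn` and `toy_hung` are hypothetical, and the `guard` parameter merely switches the proposed check on or off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy device; every field and function here is hypothetical. */
struct toy_dev {
	int queue_depth;	/* 0 until scanning has finished */
	int in_use;		/* command blocks currently handed out */
	int pending;		/* requests sitting on the queue */
	bool was_reset;
	bool ioctl_waiting;	/* a door lock ioctl is waiting to complete */
};

/* Stand-in for command block allocation; fails when none are free. */
static bool toy_alloc_cmd(struct toy_dev *d)
{
	if (d->in_use >= d->queue_depth)
		return false;
	d->in_use++;
	return true;
}

static void toy_request_fn(struct toy_dev *d, bool guard)
{
	assert(d != NULL);
	if (guard && d->queue_depth == 0)
		return;		/* not ready yet; leave was_reset set */
	if (d->was_reset) {
		d->was_reset = false;
		d->pending++;			/* ioctl queues the door lock */
		d->ioctl_waiting = true;
		toy_request_fn(d, guard);	/* and runs the queue */
		return;
	}
	while (d->pending > 0) {
		if (!toy_alloc_cmd(d))
			break;	/* assumes an outstanding command restarts us */
		d->pending--;
	}
}

/* Waiting on a request that nothing outstanding will ever restart. */
static bool toy_hung(const struct toy_dev *d)
{
	return d->ioctl_waiting && d->pending > 0 && d->in_use == 0;
}
```

With `queue_depth` at 0 and no guard, the model ends in the hung state from step 6. With the guard, the request function returns early and the reset flag stays set for a later call, once command blocks exist.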

-- Patrick Mansfield

* Re: SCSI woes (followup)
  2002-09-24 18:18     ` Patrick Mansfield
@ 2002-09-24 19:01       ` Russell King
  2002-09-24 19:08       ` Mike Anderson
  2002-09-24 19:32       ` Patrick Mansfield
  2 siblings, 0 replies; 297+ messages in thread
From: Russell King @ 2002-09-24 19:01 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: James Bottomley, linux-scsi

On Tue, Sep 24, 2002 at 11:18:47AM -0700, Patrick Mansfield wrote:
> Moving the empty check up sounds like good and simple fix for 2.4, or
> check if queue_depth == 0. Anything else would be difficult to get right.

I disagree here.  Moving the empty check up means that we won't relock any
in-use but idle device.

As I said to James earlier today, Eric Young's comments in the code say
that it is in the wrong place.  Thinking about it for a while, I'd agree
with that statement for the reason above; any device that is in use by
user space (ie, mounted on a filesystem) could well be idle for many
hours before a request comes through for it.

This means that the door would be unlocked on an in-use device, and the
media can be (accidentally) unloaded.

> Moving the the SCSI_IOCTL_DOORLOCK doesn't fix the problem if it is
> still called on a incompletely initialized device.

There are _many_ problems here.  Let me sort out a patch later tonight
and put together the gory details of each problem.  It's basically layer
upon layer of crap and fixme's.

I have a set of fixes piling up, some trivial, that address many of these
problems.  Hell, I almost have a completely stable SCSI system here which
I almost can't bring down.  And I'm giving it all sorts of hellish
conditions to deal with.

> And, perhaps do not allow the error handler to run during scanning, let
> later IO (to any discovered device) kick off the error handler. It's
> hard to say if this is good or not - for example, if this is your root
> device, you want it online. But if it some other device, and we try hard
> to scan and use it, it can cause more problems (if it keeps getting errors,
> and we keeping running the error handler/reset cycle, blocking other IO).

No.  You need the error handler to produce bus resets.  Without bus resets,
if your SCSI bus hangs (eg, in my case due to a permanent parity error to
one device brought on by test code) you don't want to continue queueing
IDENTIFY messages to other targets.  They have no hope whatsoever to get
onto the bus.

You need the error handler to time out the connection to the bad device
and perform a bus reset.  A bus reset is the only way that an initiator
can clear down a stuck bus.

> The problem happens via:
> 
> 1) device A is found that has removable media during scan
> 
> 2) INQUIRY to another device B kicks off error handling before the
> scan has completed, so device A has no command blocks.
> 
> 3) Error handler completion calls scsi_request_fn() for A.
> 
> 4) scsi_request_fn() for A sees the reset happened, and calls scsi_ioctl().
> 
> 5) scsi_ioctl() calls scsi_request_fn(), it cannot get a Scsi_Cmnd, so
> it just returns, incorrectly assuming that another request must be
> outstanding.

Not quite.  It's even more disgusting.  That code is fundamentally wrong, as
proven by my later hangs when the devices have been initialised, and passed
to the device drivers.

1) We queue up a command for device A _or_ step 3 above.

2) scsi_request_fn() gets called to start this device

3) scsi_request_fn() for A sees that a reset happened, and calls scsi_ioctl()

4) scsi_ioctl() calls scsi_request_fn().

5) the head of the request queue is _not_ the door lock, but the original
   request.

6) we kick off the original request and return from scsi_request_fn()

7) scsi_ioctl() waits for its door lock command to complete.

Ahem, well, it has _never_ been submitted to the host driver.  The done paths
for the original request don't kick the request function.  We deadlock.

Trying to lock the door in scsi_request_fn() in the way we do is
_fundamentally_ flawed.  Even Eric Young's comments agree!

Now, as I said above, I do have a whole raft of fixes accumulating here, and
if I can solve the last few problems, I'll get some patches out for you guys
to look at.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html

* Re: SCSI woes (followup)
  2002-09-24 18:18     ` Patrick Mansfield
  2002-09-24 19:01       ` Russell King
@ 2002-09-24 19:08       ` Mike Anderson
  2002-09-24 19:21         ` Russell King
  2002-09-24 19:32       ` Patrick Mansfield
  2 siblings, 1 reply; 297+ messages in thread
From: Mike Anderson @ 2002-09-24 19:08 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: Russell King, James Bottomley, linux-scsi

Patrick Mansfield [patmans@us.ibm.com] wrote:
> The problem is that we try to send a command via scsi_request_fn() to
> a device that has no command blocks allocated - it's initializatin
> is incomplete.
> 
> Moving the empty check up sounds like good and simple fix for 2.4, or
> check if queue_depth == 0. Anything else would be difficult to get right.

Since error handling running during scan is kinda causing the problem, a
patch like the one below to scsi_restart_operations should also work.

-andmike
--
Michael Anderson
andmike@us.ibm.com

--- linux-2.4/drivers/scsi/scsi_error.c	Thu Sep 19 09:08:50 2002
+++ linux-2.4-test/drivers/scsi/scsi_error.c	Tue Sep 24 12:03:03 2002
@@ -1264,8 +1264,10 @@
 		    || (SDpnt->device_blocked)) {
 			break;
 		}
-		q = &SDpnt->request_queue;
-		q->request_fn(q);
+		if (SDpnt->has_cmdblocks) {
+			q = &SDpnt->request_queue;
+			q->request_fn(q);
+		}
 	}
 	spin_unlock_irqrestore(&io_request_lock, flags);
 }


* Re: SCSI woes (followup)
  2002-09-24 19:08       ` Mike Anderson
@ 2002-09-24 19:21         ` Russell King
  0 siblings, 0 replies; 297+ messages in thread
From: Russell King @ 2002-09-24 19:21 UTC (permalink / raw)
  To: Patrick Mansfield, James Bottomley, linux-scsi

On Tue, Sep 24, 2002 at 12:08:09PM -0700, Mike Anderson wrote:
> Patrick Mansfield [patmans@us.ibm.com] wrote:
> > The problem is that we try to send a command via scsi_request_fn() to
> > a device that has no command blocks allocated - it's initializatin
> > is incomplete.
> > 
> > Moving the empty check up sounds like good and simple fix for 2.4, or
> > check if queue_depth == 0. Anything else would be difficult to get right.
> 
> Since error handling running during scan is kinda casuing the problem a
> a patch like the one below to  scsi_restart_operations sould also work. 

No.  There are still deadlock cases.  This is the _wrong_ fix.

Been there.  Tried it.  It failed.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


* Re: SCSI woes (followup)
  2002-09-24 18:18     ` Patrick Mansfield
  2002-09-24 19:01       ` Russell King
  2002-09-24 19:08       ` Mike Anderson
@ 2002-09-24 19:32       ` Patrick Mansfield
  2002-09-24 20:00         ` Russell King
  2002-09-24 22:39         ` Russell King
  2 siblings, 2 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-09-24 19:32 UTC (permalink / raw)
  To: Russell King; +Cc: James Bottomley, linux-scsi

On Tue, Sep 24, 2002 at 08:01:17PM +0100, Russell King wrote:
> On Tue, Sep 24, 2002 at 11:18:47AM -0700, Patrick Mansfield wrote:
> > Moving the empty check up sounds like good and simple fix for 2.4, or
> > check if queue_depth == 0. Anything else would be difficult to get right.
> 
> I disagree here.  Moving the empty check up means that we won't relock any
> in-use but idle check.

But, if we check for queue_depth == 0, it will still relock in-use but
idle devices. No, it won't relock immediately if the error handler does
not run, but this does not involve finding another place to send the ioctl.

> As I said to James earlier today, Eric Young's comments in the code say
> that it is in the wrong place.  Thinking about it for a while, I'd agree
> with that statement for the reason above; any device that is in use by
> user space (ie, mounted on a filesystem) could well be idle for many
> hours before a request comes through for it.
> 
> This means that the door would be unlocked on an in-use device, and the
> media can be (accidentally) unloaded.

> > Moving the the SCSI_IOCTL_DOORLOCK doesn't fix the problem if it is
> > still called on a incompletely initialized device.
> 
> There are _many_ problems here.  Let me sort out a patch later tonight
> and put together the gory details of each problem.  Its basically layer
> upon layer of crap and fixme's.

Yeh :(

I totally agree the re-locking _should_ be moved, and should be conditional
(if it was locked, lock it again, but don't lock an unlocked device, like
your tape door example).
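
A conditional re-lock along these lines might look like the sketch below. It assumes a per-device "wanted" flag that remembers whether the door was locked before the reset; `toy_door`, `toy_set_lock`, `toy_bus_reset` and `toy_after_reset` are hypothetical names, and `lock_cmds_sent` only counts how many lock commands the model would issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device door state. */
struct toy_door {
	bool want_locked;	/* what the user of the device asked for */
	bool door_locked;	/* what the device currently has */
	int lock_cmds_sent;
};

/* Normal lock/unlock request, e.g. at mount or unmount time. */
static void toy_set_lock(struct toy_door *d, bool lock)
{
	assert(d != NULL);
	d->want_locked = lock;
	d->door_locked = lock;
	if (lock)
		d->lock_cmds_sent++;
}

/* A reset drops the lock on the device but not the recorded intent. */
static void toy_bus_reset(struct toy_door *d)
{
	assert(d != NULL);
	d->door_locked = false;
}

/* Re-lock only devices that were locked before the reset. */
static void toy_after_reset(struct toy_door *d)
{
	assert(d != NULL);
	if (d->want_locked && !d->door_locked) {
		d->door_locked = true;
		d->lock_cmds_sent++;
	}
}
```

A device that was never locked sees no lock command after a reset, while a locked one gets exactly one extra command to restore its state.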

> I have a set of fixes piling up, some trivial, that address many of these
> problems.  Hell, I almost have a completely stable SCSI system here which
> I almost can't bring down.  And I'm giving it all sorts of hellish
> conditions to deal with.

Maybe you can try 2.5 first? Not that it makes things easier for you, but it
is probably very easy to break something, and we should break 2.5 rather
than 2.4. Mike's cleaned-up scsi_error.c would probably help a lot; it
should be pushed into 2.5.

Backporting to 2.4 would be ummm fun :(

> > And, perhaps do not allow the error handler to run during scanning, let
> > later IO (to any discovered device) kick off the error handler. It's
> > hard to say if this is good or not - for example, if this is your root
> > device, you want it online. But if it some other device, and we try hard
> > to scan and use it, it can cause more problems (if it keeps getting errors,
> > and we keeping running the error handler/reset cycle, blocking other IO).
> 
> No.  You need the error handler to produce bus resets.  Without bus resets,
> if your SCSI bus hangs (eg, in my case due to a permanent parity error to
> one device brought on by test code) you don't want to continue queueing
> IDENTIFY messages to other targets.  They have no hope what so ever to get
> onto the bus.
> 
> You need the error handler to time out the connection to the bad device
> and perform a bus reset.  A bus reset is the only way that an initiator
> can clear down a stuck bus.

OK - but you also do not want eternal bus resets because of a marginal
device. Maybe the error handler can kick off, but if it is a scan INQUIRY,
do not retry the command, do the resets etc and complete the command as a
failure. The device being scanned is likely causing the problem (since
no one else should be using the adapter at this point, unless this is a
"hardcoded" scan via /proc), this would prevent a new Scsi_Device from 
being created, but would allow scanning to continue.
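
The policy suggested above (run the error handler, but never retry a scan INQUIRY) could be expressed as a small decision function. This is a sketch under assumptions: `toy_cmd`, `is_scan_inquiry` and `toy_eh_decide` are hypothetical, and the retry counter is a simplification of whatever the real error handler tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum eh_action { EH_RETRY, EH_FAIL };

/* Hypothetical view of a failed command as the error handler sees it. */
struct toy_cmd {
	bool is_scan_inquiry;	/* INQUIRY issued while scanning */
	int retries;		/* retries already attempted */
	int allowed;		/* retry limit for ordinary commands */
};

/*
 * Decide what to do after recovery (resets etc.) has been run.
 * Scan INQUIRYs are completed as failures so that scanning can move on;
 * everything else is retried up to its limit.
 */
static enum eh_action toy_eh_decide(struct toy_cmd *c)
{
	assert(c != NULL);
	if (c->is_scan_inquiry)
		return EH_FAIL;
	if (c->retries < c->allowed) {
		c->retries++;
		return EH_RETRY;
	}
	return EH_FAIL;
}
```

Failing the scan INQUIRY immediately means the marginal device is simply not added, instead of driving repeated reset cycles.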

> > The problem happens via:
> > 
> > 1) device A is found that has removable media during scan
> > 
> > 2) INQUIRY to another device B kicks off error handling before the
> > scan has completed, so device A has no command blocks.
> > 
> > 3) Error handler completion calls scsi_request_fn() for A.
> > 
> > 4) scsi_request_fn() for A sees the reset happened, and calls scsi_ioctl().
> > 
> > 5) scsi_ioctl() calls scsi_request_fn(), it cannot get a Scsi_Cmnd, so
> > it just returns, incorrectly assuming that another request must be
> > outstanding.

> Not quite.  Its even more disgusting.  That code is fundamentally wrong, as
> proven by my later hangs when the devices have been initialised, and passed
> to the device drivers.
> 
> 1) We queue up a command for device A _or_ step 3 above.
> 
> 2) scsi_request_fn() gets called to start this device
> 
> 3) scsi_request_fn() for A sees that a reset happened, and calls scsi_ioctl()
> 
> 4) scsi_ioctl() calls scsi_request_fn().
> 

Agree with all the above.

> 5) the head of the request queue is _not_ the door lock, but the original
>    request.

I still don't see how this can be the original request. The TUR is sent via
the error handler, the INQUIRY resent during error handling does not call
scsi_request_fn().

So, where is the request requeued?

Is this in modified error handler code or something?

> 6) we kick off the original request and return from scsi_request_fn()

And, I don't see how this can happen since I don't see how step 5 happens.

> 7) scsi_ioctl() waits for its door lock command to complete.
> 
> Ahem, well, it has _never_ been submitted to the host driver.  The done paths
> for the original request don't kick the request function.  We deadlock.
> 
> Trying to lock the door in scsi_request_fn() in the way we do is
> _fundamentally_ flawed.  Even Eric Young's comments agree!
> 
> Now, as I said above, I do have a whole raft of fixes accumulating here, and
> if I can solve the last few problems, I'll get some patches out for you guys
> to look at.
> 
> -- 
> Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
>              http://www.arm.linux.org.uk/personal/aboutme.html

* Re: SCSI woes (followup)
  2002-09-24 19:32       ` Patrick Mansfield
@ 2002-09-24 20:00         ` Russell King
  2002-09-24 22:23           ` Patrick Mansfield
  2002-09-24 22:39         ` Russell King
  1 sibling, 1 reply; 297+ messages in thread
From: Russell King @ 2002-09-24 20:00 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: James Bottomley, linux-scsi

On Tue, Sep 24, 2002 at 12:32:50PM -0700, Patrick Mansfield wrote:
> OK - but you also do not want eternal bus resets because of a marginal
> device.

I agree.  I'm actually running against the "rmk" error handler, which
has a fair number of the problems with the existing error handler fixed.
For instance, it won't leave the HBA driver with a currently executing
command and a stuck bus after attempting a retry.

It also knows about channels, and knows that if a command fails on
a bus that has needed a reset, then the right thing to do is to reset
it _once_ again and only once, because the bus is probably stuck.

Oh, yes, I've put a lot of thought into this. 8)

> > Not quite.  Its even more disgusting.  That code is fundamentally wrong, as
> > proven by my later hangs when the devices have been initialised, and passed
> > to the device drivers.
> > 
> > 1) We queue up a command for device A _or_ step 3 above.
> > 
> > 2) scsi_request_fn() gets called to start this device
> > 
> > 3) scsi_request_fn() for A sees that a reset happened, and calls scsi_ioctl()
> > 
> > 4) scsi_ioctl() calls scsi_request_fn().
> > 
> 
> Agree with all the above.
> 
> > 5) the head of the request queue is _not_ the door lock, but the original
> >    request.
> 
> I still don't see how this can be the original request. The TUR is sent via
> the error handler, the INQUIRY resent during error handling does not call
> scsi_request_fn().
> 
> So, where is the request requeued?

A request received via ioctl is queued at the tail of the request queue,
not the head.  I can give you a reference if you'd like. 8)

> Is this is in modified error handler code or something?

Yes, and it is definitely more correct than the previous.  It certainly
isn't the cause of the problems.  I can say this because I'm now back
in my driver trying to debug a problem there, and the rest of the SCSI
subsystem is admirably coping with the crap I'm throwing at it.

My error handler doesn't go around chunking stuff into request queues
btw.  Neither does the existing one.

Basically, all it's doing is knowing about channels properly, and knowing
that it needs to do a bus reset if it's going to take a device off line.

If you can stand the suspense a while longer, I'll generate the patches,
and then you can read all the gory details yourself.  Until that time, I
think it's rather academic trying to discuss stuff that I've had more than
12 hours to hammer away at and successfully solve.

Give me two hours, and I'll have patches.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


* Re: SCSI woes (followup)
  2002-09-24 20:00         ` Russell King
@ 2002-09-24 22:23           ` Patrick Mansfield
  2002-09-24 23:04             ` Russell King
  0 siblings, 1 reply; 297+ messages in thread
From: Patrick Mansfield @ 2002-09-24 22:23 UTC (permalink / raw)
  To: Russell King; +Cc: James Bottomley, linux-scsi

On Tue, Sep 24, 2002 at 09:00:42PM +0100, Russell King wrote:

> I agree.  I'm actually running against the "rmk" error handler, which
> has a fair number of the problems with the existing error handler fixed.
> For instance, it won't leave the HBA driver with a currently executing
> command and a stuck bus after attempting a retry.
> 
> It also knows about channels, and knows that if a command fails on
> a bus that has needed a reset, then the right thing to do is to reset
> it _once_ again and only once, because the bus is probably stuck.
> 
> Oh, yes, I've put a lot of thought into this. 8)

Have you seen Mike Anderson's 2.5 cleanup patch?

Patch:

http://www-124.ibm.com/storageio/gen-io/patch-scsi_error-2.5.34-1.gz

Posting:

http://marc.theaimsgroup.com/?l=linux-scsi&m=103187417110244&w=2

I've been focusing on 2.5.x scsi (i.e. scsi multi-path), and would prefer
work be done there first.

> > > 5) the head of the request queue is _not_ the door lock, but the original
> > >    request.
> > 
> > I still don't see how this can be the original request. The TUR is sent via
> > the error handler, the INQUIRY resent during error handling does not call
> > scsi_request_fn().
> > 
> > So, where is the request requeued?
> 
> A request received via ioctl is queued at the tail of the request queue,
> not the head.  I can give you a reference if you'd like. 8)

Yes, I understand that, but where is the original, non-door lock request
coming from? 

My point is that I think there is none, and there can't be any on
a partially initialized device, since there are no command blocks
for the device - put a printk in the scsi_ioctl or such to dump
Scsi_Device::queue_depth and/or Scsi_Device::has_cmdblocks.

-- Patrick Mansfield


* Re: SCSI woes (followup)
  2002-09-24 22:23           ` Patrick Mansfield
@ 2002-09-24 23:04             ` Russell King
  0 siblings, 0 replies; 297+ messages in thread
From: Russell King @ 2002-09-24 23:04 UTC (permalink / raw)
  To: linux-scsi

On Tue, Sep 24, 2002 at 03:23:19PM -0700, Patrick Mansfield wrote:
> Have you seen Mike Anderson's 2.5 cleanup patch?
> 
> Patch:
> 
> http://www-124.ibm.com/storageio/gen-io/patch-scsi_error-2.5.34-1.gz
> 
> Posting:
> 
> http://marc.theaimsgroup.com/?l=linux-scsi&m=103187417110244&w=2
> 
> I've been focusing on 2.5.x scsi (i.e. scsi multi-path), and would prefer
> work be done there first.

I'd rather not.  2.5.x is a little unstable for this box.  It holds
the master ARM kernel trees for 2.2, 2.4 and 2.5, and given the recent
"issues" with 2.5 IDE, I'd rather not have to reinstall the box from
scratch.

I will look at the patch though.

> > A request received via ioctl is queued at the tail of the request queue,
> > not the head.  I can give you a reference if you'd like. 8)
> 
> Yes, I understand that, but where is the original, non-door lock request
> coming from? 

You must have missed something I mentioned in a previous mail.  Let me
restate with hard evidence.

There are two, almost identical conditions where this goes pear shaped.

1. when restarting operations after error handling.

2. when trying to attach devices to drivers.

In the second case you end up with the precise case; you have a command
that is queued on the request list, and we try to lock the door.

Case 1:  yes, there is no pending request.  However, we do drop the
request on the floor because there are no Scsi_Cmnd structures available.
Here are the debugging messages:

!! scsi: device set offline - not ready or command retry failed after bus reset: host 0 channel 0 id 3 lun 0
!! unjam_host: returning success
!! host_busy = 1 host_blocked = 0
!! Adding timer for command c4489c00 at 100 (c00e1050)
!! Clearing timer for command c4489c00 1
!! host_busy = 1 host_blocked = 0
!! scsi_error.c: Waking up host to restart
!! scsi_error.c: device offline - report as SUCCESS
!! Command finished 1 0 0x6
!! Notifying upper driver of completion for device 3 6
!! restarting target 1
!! Request for device 1: queue empty
!! device 1 re-locking door
!! Open returning 1
!! Trying ioctl with scsi command 30
!! scsi_do_req (host = 0, channel = 0 target = 1, buffer =00000000, bufflen = 0, done = c00db780, timeout = 1000, retries = 5)
!! command : 1e  00  00  00  01  00
!! Request for device 1: req c0471ab4 cmd 4 special 1 completion c45d1e98 command=Prevent/Allow Medium Removal 00 00 00 01 00
!! 
!! Leaving scsi_do_req()

Here, we have dropped the command on the floor because there weren't any
Scsi_Cmnd structures available.  We are still waiting for it though, and
we are using the error handler's context, and thus blocking the error handler.

!! Deactivating command for device 3 (active=0, failed=0)
!! Request for device 3: queue empty
!! device 3 re-locking door
!! device 3 no request
!! scsi: INQUIRY failed with code 0x6
!! scsi: performing INQUIRY
!! scsi_do_req (host = 0, channel = 0 target = 4, buffer =c459fd44, bufflen = 256, done = c00db780, timeout = 600, retries = 3)
!! command : 12  00  00  00  ff  00
!! Request for device 4: req c0471db4 cmd 4 special 1 completion c459fc3c command=Inquiry 00 00 00 ff 00
!! 
!! Activating command for device 4 (1)
!! Leaving scsi_init_cmd_from_req()
!! Adding timer for command c4489e00 at 600 (c00e0de8)
!! scsi_dispatch_cmnd (host = 0, channel = 0, target = 4, command = c4489e58, buffer = c459fd44,
!! bufflen = 256, done = c00db780)
!! queuecommand : routine at cc81d784
!! scsi0.H: received command for id 4 (c4489e00) Inquiry 00 00 00 ff 00
!! scsi0.4: starting Inquiry 00 00 00 ff 00
!! scsi0.4: select: data pointers [c459fd44, 100]
!! scsi0.H: queue success
!! leaving scsi_dispatch_cmnd()
!! host blocked: host_busy=1 host_blocked=0 host_self_blocked=0
!! Leaving scsi_do_req()
!! Command timed out active=1 busy=1 failed=1
<< deadlock >>

Note the nice race condition here.  Above, you'll notice that the command
timed out, and requires error handler work.  Oh, but the error handler is
blocked waiting for a command to complete that never will.  Below, the
command disconnected with DID_NO_CONNECT status.

For some reason, this host controller chip seems to be slow to report
this.

!! scsi0.4: disconnect phase=01
!! scsi0.4: command complete result=0x00010000 CDB: Inquiry 00 00 00 ff 00
!! Clearing timer for command c4489e00 0
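
For anyone reading the result words in these logs, here is a minimal sketch
of how such a word splits into its four bytes.  It assumes the conventional
Linux SCSI mid-layer layout (status in bits 0-7, message in bits 8-15, host
in bits 16-23, driver in bits 24-31).  The ex_ names are illustrative only
and are not the kernel's own macros.

```c
#include <assert.h>
#include <stdint.h>

/* Host byte value meaning "could not connect" (DID_NO_CONNECT). */
#define EX_DID_NO_CONNECT   0x01
/* Raw SCSI status byte meaning CHECK CONDITION. */
#define EX_CHECK_CONDITION  0x02

/* The four bytes packed into a command result word. */
struct ex_result {
	uint8_t status;	/* bits 0-7:   SCSI status from the target */
	uint8_t msg;	/* bits 8-15:  message byte */
	uint8_t host;	/* bits 16-23: host adapter (DID_*) code */
	uint8_t driver;	/* bits 24-31: driver code */
};

/* Split a result word into its component bytes. */
struct ex_result ex_split_result(uint32_t result)
{
	struct ex_result r;

	r.status = (uint8_t)(result & 0xff);
	r.msg    = (uint8_t)((result >> 8) & 0xff);
	r.host   = (uint8_t)((result >> 16) & 0xff);
	r.driver = (uint8_t)((result >> 24) & 0xff);
	return r;
}

/* Self-check against the two result words that appear in the logs. */
void ex_check_logged_results(void)
{
	/* Case 1: result=0x00010000, host byte says "no connect". */
	assert(ex_split_result(0x00010000).host == EX_DID_NO_CONNECT);
	/* Case 2: result=0x00000002, target returned CHECK CONDITION. */
	assert(ex_split_result(0x00000002).status == EX_CHECK_CONDITION);
}
```

Read this way, result=0x00010000 carries no target status at all, only the
host adapter's report that it could not connect to the device.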

Case 2:  I'll let you read through this.

!! Host status for host c0c7e000:
!! Device 1 c0c2d400:
!! Device 2 c0c2d800:
!! sg_init
!! sg_attach: dev=0
!! sg_attach: dev=1
!! Attached scsi removable disk sda at scsi0, channel 0, id 2, lun 0
!! scsi_do_req (host = 0, channel = 0 target = 2, buffer =c0bea000, bufflen = 0, done = c00db780, timeout = 3000, retries = 5)
!! command : 00  00  00  00  00  00
!! Request for device 2: req cc3fe0d4 cmd 4 special 1 completion c0d47d74 command=Test Unit Ready 00 00 00 00 00

!! device 2 re-locking door
!! Open returning 1
!! Trying ioctl with scsi command 30
!! scsi_do_req (host = 0, channel = 0 target = 2, buffer =00000000, bufflen = 0, done = c00db780, timeout = 1000, retries = 5)
!! command : 1e  00  00  00  01  00
!! Request for device 2: req cc3fe0d4 cmd 4 special 1 completion c0d47d74 command=Test Unit Ready 00 00 00 00 00

!! Activating command for device 2 (1)
!! Leaving scsi_init_cmd_from_req()
!! Adding timer for command c0c0e200 at 3000 (c00e0de8)
!! scsi_dispatch_cmnd (host = 0, channel = 0, target = 2, command = c0c0e258, buffer = c0bea000,
!! bufflen = 0, done = c00db780)
!! queuecommand : routine at cc81d784
!! scsi0.H: received command for id 2 (c0c0e200) Test Unit Ready 00 00 00 00 00
!! scsi0.2: starting Test Unit Ready 00 00 00 00 00
!! scsi0.2: select: data pointers [00000000, 0]
!! scsi0.H: queue success
!! leaving scsi_dispatch_cmnd()
!! scsi0.2: disconnect phase=0d
!! scsi0.2: command complete result=0x00000002 CDB: Test Unit Ready 00 00 00 00 00
!! scsi0.2: starting Request Sense 00 00 00 40 00
!! scsi0.2: select: data pointers [c0c0e30c, 40]
!! host blocked: host_busy=1 host_blocked=0 host_self_blocked=0
!! Leaving scsi_do_req()
!! scsi0.2: disconnect phase=0d
!! scsi0.H: request sense complete, result=0x00000000
!! scsi0.2: sense buffer: 70 00 06 00 00 00 00 0e 00 00 00 00 29 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
!! Clearing timer for command c0c0e200 1
!! Command needs retry 1 0 0x2
!! Adding timer for command c0c0e200 at 3000 (c00e0de8)
!! scsi_dispatch_cmnd (host = 0, channel = 0, target = 2, command = c0c0e258, buffer = c0bea000,
!! bufflen = 0, done = c00db780)
!! queuecommand : routine at cc81d784
!! scsi0.H: received command for id 2 (c0c0e200) Test Unit Ready 00 00 00 00 00
!! scsi0.2: starting Test Unit Ready 00 00 00 00 00
!! scsi0.2: select: data pointers [00000000, 0]
!! scsi0.H: queue success
!! leaving scsi_dispatch_cmnd()
!! scsi0.2: disconnect phase=0d
!! scsi0.2: command complete result=0x00000002 CDB: Test Unit Ready 00 00 00 00 00
!! scsi0.2: starting Request Sense 00 00 00 40 00
!! scsi0.2: select: data pointers [c0c0e30c, 40]
!! scsi0.2: disconnect phase=0d
!! scsi0.H: request sense complete, result=0x00000000
!! scsi0.2: sense buffer: 70 00 02 00 00 00 00 0e 00 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
!! Clearing timer for command c0c0e200 1
!! Command finished 1 0 0x2
!! Notifying upper driver of completion for device 2 8000002
<< deadlock >>
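
The two sense buffers in the log above can be read by hand.  The sketch
below shows the decoding, assuming fixed-format sense data as the leading
0x70 suggests (sense key in the low nibble of byte 2, ASC in byte 12, ASCQ
in byte 13).  The ex_ names are hypothetical helpers, not kernel functions,
and only the sense keys seen in this log are named.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The fields of fixed-format sense data that matter for this log. */
struct ex_sense {
	uint8_t response_code;	/* byte 0, low 7 bits (0x70 or 0x71) */
	uint8_t sense_key;	/* byte 2, low 4 bits */
	uint8_t asc;		/* byte 12: additional sense code */
	uint8_t ascq;		/* byte 13: additional sense code qualifier */
};

/* Returns 0 on success, -1 if the buffer is too short or not fixed format. */
int ex_parse_fixed_sense(const uint8_t *buf, size_t len, struct ex_sense *out)
{
	uint8_t code;

	if (buf == NULL || out == NULL || len < 14)
		return -1;

	code = buf[0] & 0x7f;
	if (code != 0x70 && code != 0x71)
		return -1;

	out->response_code = code;
	out->sense_key = buf[2] & 0x0f;
	out->asc = buf[12];
	out->ascq = buf[13];
	return 0;
}

/* Name the sense keys that show up in the log; others are not listed. */
const char *ex_sense_key_name(uint8_t key)
{
	assert(key <= 0x0f);	/* a sense key is a 4-bit field */

	switch (key) {
	case 0x0: return "NO SENSE";
	case 0x2: return "NOT READY";
	case 0x6: return "UNIT ATTENTION";
	default:  return "(not listed in this sketch)";
	}
}
```

Applied to the log, the first buffer (70 00 06 ... 29 00) decodes to UNIT
ATTENTION with ASC 0x29, which is what a device reports after a reset, and
the second (70 00 02 ... 04 00) decodes to NOT READY with ASC 0x04.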

Both of the above are nicely reproducible, and are from my logs from earlier
today, where I've plenty of such examples.

So, let's review.

1. We deadlock the error handler when we issue a door lock request while
   trying to restart operations and there are no requests pending and no
   command structures available.

2. We deadlock if we issue a door lock request with a pending request.
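
Point 2 turns on queue ordering.  As a toy model only (no locking, no real
kernel structures, all names hypothetical), the sketch below shows that a
door lock request added at the tail sits behind a request that is already
pending, whereas one added at the head would be dispatched next.

```c
#include <assert.h>
#include <stddef.h>

#define EX_QUEUE_MAX 8

/* Toy request queue: slot 0 is the head, i.e. the next request dispatched. */
struct ex_queue {
	int    req[EX_QUEUE_MAX];
	size_t len;
};

/* Insert a request id at the head or the tail; returns 0, or -1 if full. */
int ex_queue_insert(struct ex_queue *q, int req, int at_head)
{
	size_t i;

	assert(q != NULL);
	if (q->len == EX_QUEUE_MAX)
		return -1;

	if (at_head) {
		/* Shift everything down one slot to make room at the front. */
		for (i = q->len; i > 0; i--)
			q->req[i] = q->req[i - 1];
		q->req[0] = req;
	} else {
		q->req[q->len] = req;
	}
	q->len++;
	return 0;
}

/* Peek at the request that would be dispatched next; -1 if the queue is empty. */
int ex_queue_next(const struct ex_queue *q)
{
	assert(q != NULL);
	return q->len ? q->req[0] : -1;
}
```

With an original request already queued, a tail insert leaves that request
at the head, so whoever is waiting synchronously on the door lock is waiting
behind it.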

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html



* Re: SCSI woes (followup)
  2002-09-24 19:32       ` Patrick Mansfield
  2002-09-24 20:00         ` Russell King
@ 2002-09-24 22:39         ` Russell King
  2002-09-24 23:14           ` James Bottomley
                             ` (2 more replies)
  1 sibling, 3 replies; 297+ messages in thread
From: Russell King @ 2002-09-24 22:39 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: James Bottomley, linux-scsi

Ok, as promised here's the first patch.

This completely solves the door locking issues for host drivers that
use the new error handling code, and preserves the old behaviour for
drivers that use the old error handler.  If I decide to resurrect my
old WD33C93 card, I'll look at the old error handler issues.

The patch is arranged so that you can read the notes in order, and
the patch in order, and the two tie up.  Please take the time to read
the whole thing.

I don't recommend incorporating this into mainline currently; there
are a few minor build issues with it that I'd like to get cleaned up
first (for example, a missing function prototype.)  At this point,
I think it is more important to get this looked over, and to provide
a basis for ideas.

Please note that this patch is against vanilla 2.4.19.

The ideas and method behind this came out of discussion with James
Bottomley.  James should therefore take some credit for this.

1. We introduce a new per-device flag - "locked".  When this flag is
   set, it means that user space wanted the door locked, and we have
   successfully locked it.  When clear, it means something user space
   wanted the door unlocked, and we have successfully unlocked it.

2. Introduce "scsi_set_medium_removal".  This handles all door locking
   and unlocking requests that essentially originate from user space.
   i.e., when a device is opened or closed, or an ioctl is received.
   This function takes care of issuing the ALLOW_MEDIUM_REMOVAL command
   via the ioctl layers.

   This function isn't really anything new; it's existing code moved
   into a function.  It also removes code duplication.

   The new bit is what it does after the command has completed.  If
   the command was successful, we update the device "locked" flag to
   indicate the new state.

3. Rather than indirecting via scsi_ioctl() from the various drivers,
   we now call scsi_set_medium_removal directly.
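
The state tracking in points 1 and 2 can be sketched in isolation.  This is
a user-space toy, not the kernel function: the ex_ types and the send
callback are assumptions standing in for the device structure and the
command submission path.  The one rule it illustrates is that the flag only
changes when the command succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Requested door state; the values mirror "allow" = 0 and "prevent" = 1. */
enum ex_removal { EX_REMOVAL_ALLOW = 0, EX_REMOVAL_PREVENT = 1 };

struct ex_device {
	bool removable;
	bool lockable;
	bool locked;	/* last door state the device actually accepted */
};

/* Hypothetical transport hook: returns 0 when the device accepted the command. */
typedef int (*ex_send_fn)(struct ex_device *dev, enum ex_removal state);

/* Issue the request and record the new state only if it succeeded. */
int ex_set_medium_removal(struct ex_device *dev, enum ex_removal state,
			  ex_send_fn send)
{
	int ret;

	assert(dev != NULL && send != NULL);

	/* Nothing to do for devices that cannot lock their media. */
	if (!dev->removable || !dev->lockable)
		return 0;

	ret = send(dev, state);
	if (ret == 0)
		dev->locked = (state == EX_REMOVAL_PREVENT);

	return ret;
}

/* Two stand-in transports for trying the function out. */
int ex_send_ok(struct ex_device *dev, enum ex_removal state)
{
	(void)dev;
	(void)state;
	return 0;
}

int ex_send_fail(struct ex_device *dev, enum ex_removal state)
{
	(void)dev;
	(void)state;
	return -1;
}
```

A failed lock attempt therefore leaves the flag at its previous value, so
later code does not try to restore a lock that was never obtained.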

So far so good; the above changes are all about tracking the current
door lock state.  Now for its use.  We use it in two places.  Ideally,
this should become one place in the long run.

4. The "new" error handling code.  Just before we attempt to restart
   operations on a host in scsi_restart_operations(), we loop through
   all devices on the host.  At this point, we don't have much idea
   which devices received a reset.

   Any device that is online (something the old code never checked for)
   and was locked, we try to re-lock the door.  Note how we handle this.
   We create a request structure, and fill in all the relevant values,
   and insert this request at the _head_ of the queue.

   The request's done function merely frees the command.  If the command
   fails, we can't really do much to recover from it.  We could retry,
   but the generic SCSI command handling will have done that for us
   already.

5. The "old" code.  Yes, suggestion: read the comment.  Since we have
   the "new" error handling code in place, we must not lock the
   door while processing requests.  If we do, we'll certainly deadlock.

   However, we do take note of the currently requested lock state, and
   only ask for the door to be locked if it was previously locked.

Now, as I say, this is only _half_ of my fixes.  This fixes the SCSI
door locking problem.  It doesn't fix the complete bus hangs when the
new error handling kicks in, fails, and takes devices off line.  We'll
deal with that can of worms when we've got this one out of the way.
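
As a reading aid for point 4, the device selection can be modelled on its
own.  The sketch below is a simplified stand-in with hypothetical ex_ names:
it walks a host's device list and picks out the devices that would receive a
door lock request, namely those that are both online and recorded as locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for a device on a host's device list. */
struct ex_dev {
	int id;
	bool online;
	bool locked;
	struct ex_dev *next;
};

/*
 * Store the ids of devices needing a re-lock into ids[] (at most max of
 * them) and return how many devices qualified in total.
 */
size_t ex_devices_to_relock(const struct ex_dev *head, int *ids, size_t max)
{
	const struct ex_dev *d;
	size_t n = 0;

	assert(ids != NULL || max == 0);

	for (d = head; d != NULL; d = d->next) {
		/* Offline devices are skipped, as are doors that were never locked. */
		if (!d->online || !d->locked)
			continue;
		if (n < max)
			ids[n] = d->id;
		n++;
	}
	return n;
}
```

A device that was taken offline during recovery is skipped even if its door
had been locked, matching the "no point trying to lock the door of an
off-line device" reasoning in the patch comment.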

diff -u orig/drivers/scsi/scsi.h linux-rpc/drivers/scsi/scsi.h
--- orig/drivers/scsi/scsi.h	Mon Aug  5 13:31:23 2002
+++ linux-rpc/drivers/scsi/scsi.h	Tue Sep 24 15:57:54 2002
@@ -597,6 +597,7 @@
 	unsigned changed:1;	/* Data invalid due to media change */
 	unsigned busy:1;	/* Used to prevent races */
 	unsigned lockable:1;	/* Able to prevent media removal */
+	unsigned locked:1;	/* Media removal disabled */
 	unsigned borken:1;	/* Tell the Seagate driver to be 
 				 * painfully slow on this device */
 	unsigned tagged_supported:1;	/* Supports SCSI-II tagged queuing */
diff -u orig/drivers/scsi/scsi_ioctl.c linux-rpc/drivers/scsi/scsi_ioctl.c
--- orig/drivers/scsi/scsi_ioctl.c	Mon Aug  5 13:31:24 2002
+++ linux-rpc/drivers/scsi/scsi_ioctl.c	Tue Sep 24 16:38:24 2002
@@ -153,6 +153,29 @@
 	return result;
 }

+int scsi_set_medium_removal(Scsi_Device *dev, char state)
+{
+	char scsi_cmd[MAX_COMMAND_SIZE];
+	int ret;
+
+	if (!dev->removable || !dev->lockable)
+		return 0;
+
+	scsi_cmd[0] = ALLOW_MEDIUM_REMOVAL;
+	scsi_cmd[1] = (dev->scsi_level <= SCSI_2) ? (dev->lun << 5) : 0;
+	scsi_cmd[2] = 0;
+	scsi_cmd[3] = 0;
+	scsi_cmd[4] = state;
+	scsi_cmd[5] = 0;
+
+	ret = ioctl_internal_command(dev, scsi_cmd, IOCTL_NORMAL_TIMEOUT, NORMAL_RETRIES);
+
+	if (ret == 0)
+		dev->locked = state == SCSI_REMOVAL_PREVENT;
+
+	return ret;
+}
+
 /*
  * This interface is depreciated - users should use the scsi generic (sg)
  * interface instead, as this is a more flexible approach to performing
@@ -449,24 +472,9 @@
 		return scsi_ioctl_send_command((Scsi_Device *) dev,
 					     (Scsi_Ioctl_Command *) arg);
 	case SCSI_IOCTL_DOORLOCK:
-		if (!dev->removable || !dev->lockable)
-			return 0;
-		scsi_cmd[0] = ALLOW_MEDIUM_REMOVAL;
-		scsi_cmd[1] = cmd_byte1;
-		scsi_cmd[2] = scsi_cmd[3] = scsi_cmd[5] = 0;
-		scsi_cmd[4] = SCSI_REMOVAL_PREVENT;
-		return ioctl_internal_command((Scsi_Device *) dev, scsi_cmd,
-				   IOCTL_NORMAL_TIMEOUT, NORMAL_RETRIES);
-		break;
+		return scsi_set_medium_removal(dev, SCSI_REMOVAL_PREVENT);
 	case SCSI_IOCTL_DOORUNLOCK:
-		if (!dev->removable || !dev->lockable)
-			return 0;
-		scsi_cmd[0] = ALLOW_MEDIUM_REMOVAL;
-		scsi_cmd[1] = cmd_byte1;
-		scsi_cmd[2] = scsi_cmd[3] = scsi_cmd[5] = 0;
-		scsi_cmd[4] = SCSI_REMOVAL_ALLOW;
-		return ioctl_internal_command((Scsi_Device *) dev, scsi_cmd,
-				   IOCTL_NORMAL_TIMEOUT, NORMAL_RETRIES);
+		return scsi_set_medium_removal(dev, SCSI_REMOVAL_ALLOW);
 	case SCSI_IOCTL_TEST_UNIT_READY:
 		scsi_cmd[0] = TEST_UNIT_READY;
 		scsi_cmd[1] = cmd_byte1;
diff -u orig/drivers/scsi/sd.c linux-rpc/drivers/scsi/sd.c
--- orig/drivers/scsi/sd.c	Mon Aug  5 13:31:25 2002
+++ linux-rpc/drivers/scsi/sd.c	Tue Sep 24 16:07:45 2002
@@ -524,7 +524,7 @@
 	if (SDev->removable)
 		if (SDev->access_count==1)
 			if (scsi_block_when_processing_errors(SDev))
-				scsi_ioctl(SDev, SCSI_IOCTL_DOORLOCK, NULL);
+				scsi_set_medium_removal(SDev, SCSI_REMOVAL_PREVENT);

 	return 0;
@@ -553,7 +553,7 @@
 	if (SDev->removable) {
 		if (!SDev->access_count)
 			if (scsi_block_when_processing_errors(SDev))
-				scsi_ioctl(SDev, SCSI_IOCTL_DOORUNLOCK, NULL);
+				scsi_set_medium_removal(SDev, SCSI_REMOVAL_ALLOW);
 	}
 	if (SDev->host->hostt->module)
 		__MOD_DEC_USE_COUNT(SDev->host->hostt->module);
diff -u orig/drivers/scsi/sr_ioctl.c linux-rpc/drivers/scsi/sr_ioctl.c
--- orig/drivers/scsi/sr_ioctl.c	Mon Aug  5 13:31:26 2002
+++ linux-rpc/drivers/scsi/sr_ioctl.c	Tue Sep 24 16:09:08 2002
@@ -216,9 +216,8 @@

 int sr_lock_door(struct cdrom_device_info *cdi, int lock)
 {
-	return scsi_ioctl(scsi_CDs[MINOR(cdi->dev)].device,
-		      lock ? SCSI_IOCTL_DOORLOCK : SCSI_IOCTL_DOORUNLOCK,
-			  0);
+	return scsi_set_medium_removal(scsi_CDs[MINOR(cdi->dev)].device,
+		      lock ? SCSI_REMOVAL_PREVENT : SCSI_REMOVAL_ALLOW);
 }

 int sr_drive_status(struct cdrom_device_info *cdi, int slot)
diff -u orig/drivers/scsi/scsi_error.c linux-rpc/drivers/scsi/scsi_error.c
--- orig/drivers/scsi/scsi_error.c	Mon Aug  5 13:31:24 2002
+++ linux-rpc/drivers/scsi/scsi_error.c	Tue Sep 24 17:03:58 2002
@@ -35,6 +35,8 @@
 #include "hosts.h"
 #include "constants.h"

+#include <scsi/scsi_ioctl.h> /* grr */
+
 /*
  * We must always allow SHUTDOWN_SIGS.  Even if we are not a module,
  * the host drivers that we are using may be loaded as modules, and
@@ -1219,6 +1221,43 @@
 }

+static void scsi_eh_lock_done(struct scsi_cmnd *SCpnt)
+{
+	struct scsi_request *SRpnt = SCpnt->sc_request;
+
+	SCpnt->sc_request = NULL;
+	SRpnt->sr_command = NULL;
+
+	scsi_release_command(SCpnt);
+	scsi_release_request(SRpnt);
+}
+
+STATIC void scsi_eh_lock_door(struct scsi_device *dev)
+{
+	struct scsi_request *SRpnt = scsi_allocate_request(dev);
+
+	if (SRpnt == NULL) {
+		/* what now? */
+		return;
+	}
+
+	SRpnt->sr_cmnd[0] = ALLOW_MEDIUM_REMOVAL;
+	SRpnt->sr_cmnd[1] = (dev->scsi_level <= SCSI_2) ? (dev->lun << 5) : 0;
+	SRpnt->sr_cmnd[2] = 0;
+	SRpnt->sr_cmnd[3] = 0;
+	SRpnt->sr_cmnd[4] = SCSI_REMOVAL_PREVENT;
+	SRpnt->sr_cmnd[5] = 0;
+	SRpnt->sr_data_direction = SCSI_DATA_NONE;
+	SRpnt->sr_bufflen = 0;
+	SRpnt->sr_buffer = NULL;
+	SRpnt->sr_allowed = 5;
+	SRpnt->sr_done = scsi_eh_lock_done;
+	SRpnt->sr_timeout_per_command = 10 * HZ;
+	SRpnt->sr_cmd_len = COMMAND_SIZE(SRpnt->sr_cmnd[0]);
+
+	scsi_insert_special_req(SRpnt, 1);
+}
+
 /*
  * Function:  scsi_restart_operations
  *
@@ -1241,6 +1280,18 @@
 	ASSERT_LOCK(&io_request_lock, 0);

 	/*
+	 * If the door was locked, we need to insert a door lock request
+	 * onto the head of the SCSI request queue for the device.  There
+	 * is no point trying to lock the door of an off-line device.
+	 */
+	for (SDpnt = host->host_queue; SDpnt; SDpnt = SDpnt->next) {
+		if (!SDpnt->online || !SDpnt->locked)
+			continue;
+
+		scsi_eh_lock_door(SDpnt);
+	}
+
+	/*
 	 * Next free up anything directly waiting upon the host.  This will be
 	 * requests for character device operations, and also for ioctls to queued
 	 * block devices.
diff -u orig/drivers/scsi/scsi_lib.c linux-rpc/drivers/scsi/scsi_lib.c
--- orig/drivers/scsi/scsi_lib.c	Mon Aug  5 13:31:24 2002
+++ linux-rpc/drivers/scsi/scsi_lib.c	Tue Sep 24 17:27:00 2002
@@ -901,8 +926,17 @@
 		 * space.   Technically the error handling thread should be
 		 * doing this crap, but the error handler isn't used by
 		 * most hosts.
+		 *
+		 * (rmk)
+		 * Trying to lock the door can cause deadlocks.  We therefore
+		 * only use this for old hosts; our door locking is now done
+		 * by the error handler in scsi_restart_operations for new
+		 * eh hosts.
+		 *
+		 * Note that we don't clear was_reset here; this is used by
+		 * st.c, and either one or other has to die.
 		 */
-		if (SDpnt->was_reset) {
+		if (SHpnt->hostt->use_new_eh_code == 0 && SDpnt->was_reset) {
 			/*
 			 * We need to relock the door, but we might
 			 * be in an interrupt handler.  Only do this
@@ -913,7 +947,7 @@
 			 * this work.
 			 */
 			SDpnt->was_reset = 0;
-			if (SDpnt->removable && !in_interrupt()) {
+			if (SDpnt->removable && SDpnt->locked && !in_interrupt()) {
 				spin_unlock_irq(&io_request_lock);
 				scsi_ioctl(SDpnt, SCSI_IOCTL_DOORLOCK, 0);
 				spin_lock_irq(&io_request_lock);

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


* Re: SCSI woes (followup)
  2002-09-24 22:39         ` Russell King
@ 2002-09-24 23:14           ` James Bottomley
  2002-09-24 23:26             ` Mike Anderson
  2002-09-24 23:33           ` Mike Anderson
  2002-09-25  0:08           ` Patrick Mansfield
  2 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-09-24 23:14 UTC (permalink / raw)
  To: Russell King; +Cc: Patrick Mansfield, James Bottomley, linux-scsi

rmk@arm.linux.org.uk said:
> Now, as I say, this is only _half_ of my fixes.  This fixes the SCSI
> door locking problem.  It doesn't fix the complete bus hangs when the
> new error handling kicks in, fails, and takes devices off line.  We'll
> deal with that can of worms when we've got this one out of the way.

This looks fine to me, except the bit where you free the command and request 
from inside the done function.  I can't find anywhere we might touch the 
request again after this, so I suppose it's safe for now.

send it off to Marcelo and I'll try to up-port to 2.5

James




* Re: SCSI woes (followup)
  2002-09-24 23:14           ` James Bottomley
@ 2002-09-24 23:26             ` Mike Anderson
  2002-09-24 23:31               ` James Bottomley
                                 ` (2 more replies)
  0 siblings, 3 replies; 297+ messages in thread
From: Mike Anderson @ 2002-09-24 23:26 UTC (permalink / raw)
  To: James Bottomley; +Cc: Russell King, Patrick Mansfield, linux-scsi

James Bottomley [James.Bottomley@SteelEye.com] wrote:
> send it off to Marcelo and I'll try to up-port to 2.5

If you want me to I can add it to a patch bundle I already have that
includes ports of Russell's previous changes that apply on top of my
2.5 scsi_error cleanup patch.

-andmike
--
Michael Anderson
andmike@us.ibm.com



* Re: SCSI woes (followup)
  2002-09-24 23:26             ` Mike Anderson
@ 2002-09-24 23:31               ` James Bottomley
  2002-09-24 23:56                 ` Mike Anderson
  2002-09-24 23:33               ` Russell King
  2002-09-25 14:41               ` Russell King
  2 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-09-24 23:31 UTC (permalink / raw)
  To: James Bottomley, Russell King, Patrick Mansfield, linux-scsi

andmike@us.ibm.com said:
> If you want me to I can add it to a patch bundle I already have that
> includes ports of Russell's previous changes that apply on top of my
> 2.5 scsi_error cleanup patch. 

Sounds good.  Do you have a copy of the error handler clean up that applies 
cleanly to 2.5.38 (after the removal of scsi_queue.c)?

James




* Re: SCSI woes (followup)
  2002-09-24 23:31               ` James Bottomley
@ 2002-09-24 23:56                 ` Mike Anderson
  0 siblings, 0 replies; 297+ messages in thread
From: Mike Anderson @ 2002-09-24 23:56 UTC (permalink / raw)
  To: James Bottomley; +Cc: Russell King, Patrick Mansfield, linux-scsi

James Bottomley [James.Bottomley@SteelEye.com] wrote:
> andmike@us.ibm.com said:
> > If you want me to I can add it to a patch bundle I already have that
> > includes ports of Russell's previous changes that apply on top of my
> > 2.5 scsi_error cleanup patch. 
> 
> Sounds good.  Do you have a copy of the error handler clean up that applies 
> cleanly to 2.5.38 (after the removal of scsi_queue.c)?
> 

Yes,
	I am sending out mail now on new thread.

-andmike
--
Michael Anderson
andmike@us.ibm.com



* Re: SCSI woes (followup)
  2002-09-24 23:26             ` Mike Anderson
  2002-09-24 23:31               ` James Bottomley
@ 2002-09-24 23:33               ` Russell King
  2002-09-25  0:47                 ` Mike Anderson
  2002-09-25  2:18                 ` Doug Ledford
  2002-09-25 14:41               ` Russell King
  2 siblings, 2 replies; 297+ messages in thread
From: Russell King @ 2002-09-24 23:33 UTC (permalink / raw)
  To: linux-scsi

On Tue, Sep 24, 2002 at 04:26:30PM -0700, Mike Anderson wrote:
> James Bottomley [James.Bottomley@SteelEye.com] wrote:
> > send it off to Marcelo and I'll try to up-port to 2.5
> 
> If you want me to I can add it to a patch bundle I already have that
> includes ports of Russell's previous changes that apply on top of my
> 2.5 scsi_error cleanup patch.

Thanks Mike; I should really be looking at 2.5 kernel stuff for the ARM
CPU at the moment (like getting a new release out.)

There is a trivial but required fix that I should backport from 2.5 to
2.4; that's the one that limits the number of command retries in
scsi_error.c.  I'm not sure if Doug Ledford picked that one up though.
Doug?

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html



* Re: SCSI woes (followup)
  2002-09-24 23:33               ` Russell King
@ 2002-09-25  0:47                 ` Mike Anderson
  2002-09-25  8:45                   ` Russell King
  2002-09-25  2:18                 ` Doug Ledford
  1 sibling, 1 reply; 297+ messages in thread
From: Mike Anderson @ 2002-09-25  0:47 UTC (permalink / raw)
  To: Russell King; +Cc: linux-scsi

Russell King [rmk@arm.linux.org.uk] wrote:
> There is a trivial but required fix that I should backport from 2.5 to
> 2.4; that's the one that limits the number of command retries in
> scsi_error.c.  I'm not sure if Doug Ledford picked that one up though.
> Doug?

After I forward ported your restore code prior to retry I noticed
that the retry code did not look correct.

The 2.5 code copies the scsi_decide_disposition maybe_retry code block
into scsi_eh_completed_normally. While this function is similar to
decide_disposition it is called by scsi_send_eh_cmnd which is used both
for error_recovery cmds (TURs, etc) and retrying the command. The down
side here is that you get inconsistent and unexpected behavior in your
error_recovery commands depending on the retry allowed value of the
failed cmd and you also alter the retry count for the failed command.
The failed commands are also retried even if the user has said not to
(not a very nice thing to do to a sequential device).

I moved this check up higher. I also altered the retry code to not
exceed retries allowed.

I will try to post my update tomorrow, but it will not mean much to 2.4
as it is on top of the 2.5 cleanup which is very different from 2.4.

-andmike
--
Michael Anderson
andmike@us.ibm.com


* Re: SCSI woes (followup)
  2002-09-25  0:47                 ` Mike Anderson
@ 2002-09-25  8:45                   ` Russell King
  0 siblings, 0 replies; 297+ messages in thread
From: Russell King @ 2002-09-25  8:45 UTC (permalink / raw)
  To: linux-scsi

On Tue, Sep 24, 2002 at 05:47:38PM -0700, Mike Anderson wrote:
> I will try to post my update tomorrow, but it will not mean much to 2.4
> as it is on top of the 2.5 cleanup which is very different from 2.4.

Nevertheless, bounding those retries in 2.4 is a must for stability.
Loops that could become infinite one day almost certainly will at some
point.  I think the SCSI subsystem has proven this point in the past. 8/
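
To make "bounded" concrete, here is a minimal sketch of a retry loop with a
hard cap.  It is not the scsi_error.c code; the ex_ names, the callback and
the convention that every attempt counts against the allowance are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-command retry bookkeeping. */
struct ex_cmd {
	int retries;	/* attempts made so far */
	int allowed;	/* hard upper bound on attempts */
};

/* Hypothetical attempt hook: returns true when the command succeeded. */
typedef bool (*ex_try_fn)(void *ctx);

/* Keep trying until success or until the allowance is used up. */
bool ex_retry_bounded(struct ex_cmd *cmd, ex_try_fn attempt, void *ctx)
{
	assert(cmd != NULL && attempt != NULL);
	assert(cmd->allowed >= 0);

	while (cmd->retries < cmd->allowed) {
		cmd->retries++;
		if (attempt(ctx))
			return true;
	}
	return false;	/* budget exhausted: give up rather than spin */
}

/* Stand-in attempt that always fails; ctx counts the calls. */
bool ex_fail_always(void *ctx)
{
	(*(int *)ctx)++;
	return false;
}

/* Stand-in attempt that succeeds on the third call; ctx counts the calls. */
bool ex_succeed_on_third(void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	return *calls >= 3;
}
```

Because the counter lives in the command and only ever increases, a later
caller that retries the same command cannot reset the budget by accident.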

I've also spotted a few other places that could do with my "restore
Scsi_Cmnd to pristine state after error handling commands have
completed" cleanup in scsi_error.c to ensure that we restore everything
necessary.

I'm putting that at low priority at the moment though.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


* Re: SCSI woes (followup)
  2002-09-24 23:33               ` Russell King
  2002-09-25  0:47                 ` Mike Anderson
@ 2002-09-25  2:18                 ` Doug Ledford
  1 sibling, 0 replies; 297+ messages in thread
From: Doug Ledford @ 2002-09-25  2:18 UTC (permalink / raw)
  To: Russell King; +Cc: linux-scsi

Russell King wrote:
> There is a trivial but required fix that I should backport from 2.5 to
> 2.4; that's the one that limits the number of command retries in
> scsi_error.c.  I'm not sure if Doug Ledford picked that one up though.
> Doug?

I haven't even caught up entirely yet from being gone so much the last 
two weeks, so I seriously doubt it (and I don't remember it).




-- 
   Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
          Red Hat, Inc.
          1801 Varsity Dr.
          Raleigh, NC 27606




* Re: SCSI woes (followup)
  2002-09-24 23:26             ` Mike Anderson
  2002-09-24 23:31               ` James Bottomley
  2002-09-24 23:33               ` Russell King
@ 2002-09-25 14:41               ` Russell King
  2 siblings, 0 replies; 297+ messages in thread
From: Russell King @ 2002-09-25 14:41 UTC (permalink / raw)
  To: linux-scsi

On Tue, Sep 24, 2002 at 04:26:30PM -0700, Mike Anderson wrote:
> James Bottomley [James.Bottomley@SteelEye.com] wrote:
> > send it off to Marcelo and I'll try to up-port to 2.5
> 
> If you want me to I can add it to a patch bundle I already have that
> includes ports of Russell's previous changes that apply on top of my
> 2.5 scsi_error cleanup patch.

It'll probably be better to wait for me to say "ok, I'm happy, and sending
the stuff" before doing that.

I've just re-diffed and cleaned up all my scsi patches to date, and
split them up into logical blocks:

01-scsi-cmd-retry-1.diff
01-scsi-cmd-retry-2.diff
01-scsi-cmd-retry-3.diff
02-scsi-cmd-report.diff
03-scsi-restart-ops.diff
04-scsi-door-lock-1.diff
04-scsi-door-lock-2.diff

I've updated the command retry patches to catch a couple of re-setup
places I'd missed in scsi_error.c.

I've also incorporated Patrick's concern about breaking out of the
restart loop when we hit a device with the "device_blocked" flag
set.  In this case, I think we should still call the request function;
the request function performs its own "device blocked" check, so putting
such a check here would just be a needless duplicate.

I'll put the patches up on ftp.linux.org.uk later today.

I'll also be asking Alan Cox at some point in the future if he'd mind
dropping them into his tree for further testing, before sending them
to Marcelo.

(Oh, and in case anyone hasn't realised yet, I'm not on linux-scsi.)

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: SCSI woes (followup)
  2002-09-24 22:39         ` Russell King
  2002-09-24 23:14           ` James Bottomley
@ 2002-09-24 23:33           ` Mike Anderson
  2002-09-24 23:45             ` Russell King
  2002-09-25  0:08           ` Patrick Mansfield
  2 siblings, 1 reply; 297+ messages in thread
From: Mike Anderson @ 2002-09-24 23:33 UTC (permalink / raw)
  To: Russell King; +Cc: Patrick Mansfield, James Bottomley, linux-scsi

Russell King [rmk@arm.linux.org.uk] wrote:

looks good.

One nit is that scsi_report_bus_reset, which I currently see called by
only two drivers, also sets was_reset. The new code does not catch this
case, but the trade-off is probably worth it.

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: SCSI woes (followup)
  2002-09-24 23:33           ` Mike Anderson
@ 2002-09-24 23:45             ` Russell King
  0 siblings, 0 replies; 297+ messages in thread
From: Russell King @ 2002-09-24 23:45 UTC (permalink / raw)
  To: linux-scsi

On Tue, Sep 24, 2002 at 04:33:46PM -0700, Mike Anderson wrote:
> Russell King [rmk@arm.linux.org.uk] wrote:
> 
> looks good.
> 
> One nit is that scsi_report_bus_reset, which I currently see called by
> only two drivers, also sets was_reset. The new code does not catch this
> case, but the trade-off is probably worth it.

I opted for the safe method here for two reasons (why am I making lists
of stuff all the time 8)):

a) There seems to exist a hole with the device reset handling, where we
   want to reset one device.  We make no attempt to mark this device as
   having been reset.

   The other (bus and host) reset functions in scsi_error.c set was_reset
   and the unit attention flag.

b) st.c certainly uses was_reset for its own purposes.  We could introduce
   a new flag "needs_locking" or something like that, but to be honest I
   didn't see the point.  If we entered the error handler, commands have
   gone horribly wrong, and the host is quiet, so we aren't going to
   eat bus bandwidth sending these commands.

(a) I'm going to think about tomorrow; IMHO it should be plugged anyway,
but I want to make sure we get the right approach there, and end up with
something sane.  I also want to review your work; I'm sure there are ideas
that will be mutually beneficial to both of our error handler
efforts.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: SCSI woes (followup)
  2002-09-24 22:39         ` Russell King
  2002-09-24 23:14           ` James Bottomley
  2002-09-24 23:33           ` Mike Anderson
@ 2002-09-25  0:08           ` Patrick Mansfield
  2002-09-25  8:41             ` Russell King
  2002-09-25 12:46             ` Russell King
  2 siblings, 2 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-09-25  0:08 UTC (permalink / raw)
  To: Russell King; +Cc: James Bottomley, linux-scsi

On Tue, Sep 24, 2002 at 11:39:42PM +0100, Russell King wrote:
> Ok, as promised here's the first patch.

Looks nice.  I have the same comment as before about no commands.

>    The new bit is what it does after the command has completed.  If
>    the command was successful, we update the device "locked" flag to
>    indicate the new state.

(The sd.c and sr code are a bit odd since lock failures are ignored;
sr won't unlock later in such cases, but sd unlocks even if the
lock failed on sd_open.)

> +static void scsi_eh_lock_done(struct scsi_cmnd *SCpnt)
> +{
> +	struct scsi_request *SRpnt = SCpnt->sc_request;
> +
> +	SCpnt->sc_request = NULL;
> +	SRpnt->sr_command = NULL;
> +
> +	scsi_release_command(SCpnt);
> +	scsi_release_request(SRpnt);
> +}

This is back to my original point (i.e. case 1 in your other response
about "restarting operations after error handling", I haven't figured
out case 2 yet) - the above will not be called if there are no commands.

Can you add some debug printks or what not above and below so we know
for certain the result?

It would be nice if we could call
scsi_set_medium_removal(SCSI_REMOVAL_PREVENT) and use a single code path
for all door lock requests, then we need a special force-to-queue-head
argument passed on down the chain. (We should really have some common
setup-a-scsi-command functions to remove all the duplicate cmd[]
setup code.)

> +STATIC void scsi_eh_lock_door(struct scsi_device *dev)
> +{
> +	struct scsi_request *SRpnt = scsi_allocate_request(dev);
> +
> +	if (SRpnt == NULL) {
> +		/* what now? */
> +		return;
> +	}
> +
> +	SRpnt->sr_cmnd[0] = ALLOW_MEDIUM_REMOVAL;
> +	SRpnt->sr_cmnd[1] = (dev->scsi_level <= SCSI_2) ? (dev->lun << 5) : 0;
> +	SRpnt->sr_cmnd[2] = 0;
> +	SRpnt->sr_cmnd[3] = 0;
> +	SRpnt->sr_cmnd[4] = SCSI_REMOVAL_PREVENT;
> +	SRpnt->sr_cmnd[5] = 0;
> +	SRpnt->sr_data_direction = SCSI_DATA_NONE;
> +	SRpnt->sr_bufflen = 0;
> +	SRpnt->sr_buffer = NULL;
> +	SRpnt->sr_allowed = 5;
> +	SRpnt->sr_done = scsi_eh_lock_done;
> +	SRpnt->sr_timeout_per_command = 10 * HZ;
> +	SRpnt->sr_cmd_len = COMMAND_SIZE(SRpnt->sr_cmnd[0]);
> +
> +	scsi_insert_special_req(SRpnt, 1);
> +}
> +
>  /*
>   * Function:  scsi_restart_operations
>   *
> @@ -1241,6 +1280,18 @@
>  	ASSERT_LOCK(&io_request_lock, 0);
>  
>  	/*
> +	 * If the door was locked, we need to insert a door lock request
> +	 * onto the head of the SCSI request queue for the device.  There
> +	 * is no point trying to lock the door of an off-line device.
> +	 */
> +	for (SDpnt = host->host_queue; SDpnt; SDpnt = SDpnt->next) {

> +		if (!SDpnt->online || !SDpnt->locked)
> +			continue;
> +
> +		scsi_eh_lock_door(SDpnt);
> +	}

Why is the above in a separate loop from the loop with the request_fn call?
To avoid holding the lock during the call or what?

Above is where we also need a "SDpnt->has_cmdblock" check, simplify
the above to be:

	for (SDpnt = host->host_queue; SDpnt; SDpnt = SDpnt->next)
		if (SDpnt->online && SDpnt->locked && SDpnt->has_cmdblock)
			scsi_eh_lock_door(SDpnt);
			
(And why does the SDpnt->device_blocked cause the later loop to
break rather than continue! I can understand the host->xxx checks
breaking out.)

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: SCSI woes (followup)
  2002-09-25  0:08           ` Patrick Mansfield
@ 2002-09-25  8:41             ` Russell King
  2002-09-25 17:22               ` Patrick Mansfield
  2002-09-25 12:46             ` Russell King
  1 sibling, 1 reply; 297+ messages in thread
From: Russell King @ 2002-09-25  8:41 UTC (permalink / raw)
  To: linux-scsi

On Tue, Sep 24, 2002 at 05:08:57PM -0700, Patrick Mansfield wrote:
> On Tue, Sep 24, 2002 at 11:39:42PM +0100, Russell King wrote:
> > Ok, as promised here's the first patch.
> 
> Looks nice.  I have the same comment as before about no commands.

I fail to see what the problem here is.  I've repeatedly explained this.
I'm not going to explain it again; maybe it would be wise to read the
2.4.19 code?  Please re-read all my mails since Sunday.

> > +static void scsi_eh_lock_done(struct scsi_cmnd *SCpnt)
> > +{
> > +	struct scsi_request *SRpnt = SCpnt->sc_request;
> > +
> > +	SCpnt->sc_request = NULL;
> > +	SRpnt->sr_command = NULL;
> > +
> > +	scsi_release_command(SCpnt);
> > +	scsi_release_request(SRpnt);
> > +}
> 
> This is back to my original point (i.e. case 1 in your other response
> about "restarting operations after error handling", I haven't figured
> out case 2 yet) - the above will not be called if there are no commands.

It seems that you've misunderstood what this code is doing.

We queue up a set of door lock requests onto the head of the queue for
each device.  Each of these requests needs a "done" function (read some
of the other subsystem code to find out why, I'm not going to explain
the basics of the subsystem here) and the above code that you quoted
is it.  Since we are the last user of the request and command structures
for this door lock request, we must free them to avoid leaking memory.
So we do exactly that.

The above code has _nothing_ to do with any other request whatsoever
nor the state of the queues.
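
The ownership rule described above (the "done" function is the last
user of both structures, so it frees both) can be sketched in user-space
C.  This is only an illustration under stated assumptions: cmd_model,
req_model, submit_lock() and complete() are made-up stand-ins, and
malloc/free take the place of the mid-layer's allocate/release calls.

```c
#include <assert.h>
#include <stdlib.h>

struct req_model;

/* Hypothetical stand-ins for the command and request structures. */
struct cmd_model {
	struct req_model *sc_request;
};

struct req_model {
	struct cmd_model *sr_command;
	void (*sr_done)(struct cmd_model *);
};

static int live_objects;	/* allocations not yet freed */

/* Completion callback: last user of both objects, so it frees both. */
static void lock_done(struct cmd_model *cmd)
{
	struct req_model *req = cmd->sc_request;

	/* Break the links first so neither side points at freed memory. */
	cmd->sc_request = NULL;
	req->sr_command = NULL;

	free(cmd);
	free(req);
	live_objects -= 2;
}

/* Build a linked request/command pair; returns NULL on failure. */
static struct cmd_model *submit_lock(void)
{
	struct req_model *req = malloc(sizeof(*req));
	struct cmd_model *cmd = malloc(sizeof(*cmd));

	if (!req || !cmd) {
		free(req);
		free(cmd);
		return NULL;
	}
	req->sr_command = cmd;
	req->sr_done = lock_done;
	cmd->sc_request = req;
	live_objects += 2;
	return cmd;
}

/* Simulates the mid-layer finishing the command. */
static void complete(struct cmd_model *cmd)
{
	cmd->sc_request->sr_done(cmd);
}
```

Nothing in lock_done() looks at any other request or at queue state,
which is the point being made: it only cleans up after the one door
lock request it was attached to.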

I fail to see what is so difficult to grasp here.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: SCSI woes (followup)
  2002-09-25  8:41             ` Russell King
@ 2002-09-25 17:22               ` Patrick Mansfield
  0 siblings, 0 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-09-25 17:22 UTC (permalink / raw)
  To: Russell King; +Cc: linux-scsi

Russell -

On Wed, Sep 25, 2002 at 09:41:37AM +0100, Russell King wrote:
> On Tue, Sep 24, 2002 at 05:08:57PM -0700, Patrick Mansfield wrote:
> > On Tue, Sep 24, 2002 at 11:39:42PM +0100, Russell King wrote:
> > 
> > Looks nice.  I have the same comment as before about no commands.
> 
> I fail to see what the problem here is.  I've repeatedly explained this.
> I'm not going to explain it again; maybe it would be wise to read the
> 2.4.19 code?  Please re-read all my mails since Sunday.

OK, I understand those and the IO completion code.

I see now in your patch that if Scsi_Device::locked is set, we must
have Scsi_Device::cmd_blocks != 0, since we only set locked in upper
layers or via an ioctl to an upper layer device. So, checking for
cmd_blocks is redundant.

> > > +static void scsi_eh_lock_done(struct scsi_cmnd *SCpnt)
> > > +{
> > > +	struct scsi_request *SRpnt = SCpnt->sc_request;
> > > +
> > > +	SCpnt->sc_request = NULL;
> > > +	SRpnt->sr_command = NULL;
> > > +
> > > +	scsi_release_command(SCpnt);
> > > +	scsi_release_request(SRpnt);
> > > +}
> > 
> > This is back to my original point (i.e. case 1 in your other response
> > about "restarting operations after error handling", I haven't figured
> > out case 2 yet) - the above will not be called if there are no commands.

So, the above can never happen, since locked can't be set until after
cmd_blocks == 1.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: SCSI woes (followup)
  2002-09-25  0:08           ` Patrick Mansfield
  2002-09-25  8:41             ` Russell King
@ 2002-09-25 12:46             ` Russell King
  1 sibling, 0 replies; 297+ messages in thread
From: Russell King @ 2002-09-25 12:46 UTC (permalink / raw)
  To: linux-scsi

Some more thoughts about this...

On Tue, Sep 24, 2002 at 05:08:57PM -0700, Patrick Mansfield wrote:
> It would be nice if we could call
> scsi_set_medium_removal(SCSI_REMOVAL_PREVENT) and use a single code path
> for all door lock requests, then we need a special force-to-queue-head
> argument passed on down the chain.

I don't think so.  In the general case, we need to wait for the command
to complete, pick up the result of that command, and report that back
to the caller.

In the error handling case, we must not wait for the command to complete.
Firstly, any result it reveals isn't useful.  The completion of the
command isn't interesting.

Secondly, blocking the error handler is one of the problems that the
original code had, which is why the rest of the error handler goes to
great efforts to submit retried commands to the host adapter in a
carefully controlled manner.  We must not block the error handler,
and then get into a situation that we need the error handler to
recover itself from.

So we have two situations.  First, the common case where we add requests
to the tail of the request queue.  This is the one you optimise for,
and make as fast as possible.

Second, we have the slow path, which is the error recovery.  Yes, it
is less important to optimise it, but it shouldn't be at the expense
of the fast, common path.

> (We should really have some common setup-a-scsi-command functions to
> remove all the duplicate cmd[] setup code.)

I have some thoughts on this.  However, I don't believe them to be
appropriate for 2.4 kernels.  For 2.4, I'm working to fix the problems
by introducing the minimum of new features.

> > +	for (SDpnt = host->host_queue; SDpnt; SDpnt = SDpnt->next) {
> 
> > +		if (!SDpnt->online || !SDpnt->locked)
> > +			continue;
> > +
> > +		scsi_eh_lock_door(SDpnt);
> > +	}
> 
> Why is the above in a separate loop from the loop with the request_fn call?
> To avoid holding the lock during the call or what?

Yes.  scsi_eh_lock_door() may sleep when trying to obtain a request.
I'm not convinced that this is the best thing to do yet.  However,
if we do have the possibility of sleeping, we must not hold any
spinlocks.
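
As a rough user-space illustration of why the two loops stay separate,
the sketch below does the may-sleep work in a first pass with no lock
held and only then takes the lock for the second pass.  Everything
here is an assumption made for the example: dev_model, lock_door() and
restart_operations() are invented names, and spinlock_held is a plain
int standing in for a real spinlock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device record; online/locked follow the quoted patch. */
struct dev_model {
	int online;
	int locked;		/* door was locked before error handling */
	int lock_queued;	/* set once a door-lock request is queued */
	struct dev_model *next;
};

static int spinlock_held;	/* models whether a spinlock is held */

/* Stand-in for the door-lock helper: it may sleep while allocating,
 * so it must never be entered with the lock held. */
static void lock_door(struct dev_model *d)
{
	assert(!spinlock_held);
	d->lock_queued = 1;
}

/* Returns the number of device queues visited in the second pass. */
static int restart_operations(struct dev_model *list)
{
	struct dev_model *d;
	int kicked = 0;

	/* Pass 1: no lock held, so sleeping in lock_door() is safe. */
	for (d = list; d; d = d->next)
		if (d->online && d->locked)
			lock_door(d);

	/* Pass 2: visit every queue under the lock; nothing sleeps. */
	spinlock_held = 1;
	for (d = list; d; d = d->next)
		kicked++;
	spinlock_held = 0;

	return kicked;
}
```

Merging the loops would put lock_door() inside the locked region and
trip the assertion, which is the user-space analogue of sleeping with
a spinlock held.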

> Above is where we also need a "SDpnt->has_cmdblock" check, simplify
> the above to be:

No.  The request allocation code should be doing that for us.

> 	for (SDpnt = host->host_queue; SDpnt; SDpnt = SDpnt->next)
> 		if (SDpnt->online && SDpnt->locked && SDpnt->has_cmdblock)
> 			scsi_eh_lock_door(SDpnt);
> 			
> (And why does the SDpnt->device_blocked cause the later loop to
> break rather than continue! I can understand the host->xxx checks
> breaking out.)

Probably another bug.

I'm trying to make things more correct than they currently are.  Each
change on its own may not fix every single bug in one go.  Neither
should it.

Locate one problem, address it, solve it, prove that it is solved,
create a patch, submit, and get it reviewed.  Move to the next problem.

This is the only way that we're going to get reasonable improvement.
SCSI error handling has sucked for many many years.  If it was trivial
to fix, it would've been fixed long ago.

It's going to take time, but I believe that this method will have a
better success rate than one huge patch that claims to address all
issues, changes lots of code and therefore can't be reviewed.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: SCSI woes (followup)
  2002-09-24 13:46 ` James Bottomley
  2002-09-24 13:58   ` Russell King
@ 2002-09-24 17:57   ` Luben Tuikov
  2002-09-24 18:39     ` Mike Anderson
  1 sibling, 1 reply; 297+ messages in thread
From: Luben Tuikov @ 2002-09-24 17:57 UTC (permalink / raw)
  To: James Bottomley; +Cc: Russell King, linux-scsi

James Bottomley wrote:
>
> I think its method of operation is misplaced.  What it should probably be
> doing is simply adding a doorlock to the head of the queue and then going back
> around.  Since we know the doorlock now precedes the command that instigated
> the issue the door should be locked before anything else happens.  It should
> simply loop around and process the door lock request so it doesn't have to
> wait for anything else to occur (any commands which come down in the interim
> should go on the queue tail).
> 

Similar mechanism exists in some current SCSI transport/interconnect
protocols.

The notion of _an_immediate_command_ does exactly that.

Add a flag, say unsigned I:1; to scsi_cmnd, which, if set to 1, will
add the command to the head of the queue, irrespective of its serial
number, else the command is added according to its serial number
in the pending commands queue.
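
A minimal user-space sketch of that insertion rule, assuming a singly
linked pending queue.  The names struct cmd, struct cmd_queue and
queue_cmd() are hypothetical and are not the kernel's scsi_cmnd or its
queueing code; only the "immediate" bit corresponds to the proposed
I flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued command. */
struct cmd {
	unsigned long serial;	/* submission order */
	unsigned immediate:1;	/* the proposed "I" flag */
	struct cmd *next;
};

struct cmd_queue {
	struct cmd *head;
};

/* Insert c at the head if it is immediate, else in serial order. */
static void queue_cmd(struct cmd_queue *q, struct cmd *c)
{
	struct cmd **pp = &q->head;

	if (!c->immediate) {
		/* Skip immediate entries and anything with an older
		 * (smaller or equal) serial number. */
		while (*pp && ((*pp)->immediate ||
			       (*pp)->serial <= c->serial))
			pp = &(*pp)->next;
	}
	c->next = *pp;
	*pp = c;
}
```

In this sketch two immediate commands end up in last-in, first-out
order at the head; whether that is acceptable is a design choice the
sketch does not settle.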

-- 
Luben

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: SCSI woes (followup)
  2002-09-24 17:57   ` Luben Tuikov
@ 2002-09-24 18:39     ` Mike Anderson
  2002-09-24 18:49       ` Luben Tuikov
  0 siblings, 1 reply; 297+ messages in thread
From: Mike Anderson @ 2002-09-24 18:39 UTC (permalink / raw)
  To: Luben Tuikov; +Cc: James Bottomley, Russell King, linux-scsi

Luben Tuikov [luben@splentec.com] wrote:
> Add a flag, say unsigned I:1; to scsi_cmnd, which, if set to 1, will
> add the command to the head of the queue, irrespective of its serial
> number, else the command is added according to its serial number
> in the pending commands queue.

The code path currently being discussed is a request path and we do not
have a scsi_cmnd until it is processed by the scsi_request_fn. There is
already an interface to insert the request at the head of the queue
scsi_insert_special_req, but we always set it to 0. We also do not
expose this interface high enough to let requests set this. We do
expose it through the older scsi_insert_special_cmd interface used by
scsi_mlqueue_insert.

-andmike
--
Michael Anderson
andmike@us.ibm.com

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: SCSI woes (followup)
  2002-09-24 18:39     ` Mike Anderson
@ 2002-09-24 18:49       ` Luben Tuikov
  0 siblings, 0 replies; 297+ messages in thread
From: Luben Tuikov @ 2002-09-24 18:49 UTC (permalink / raw)
  To: Mike Anderson; +Cc: James Bottomley, Russell King, linux-scsi

Mike Anderson wrote:
> 
> Luben Tuikov [luben@splentec.com] wrote:
> > Add a flag, say unsigned I:1; to scsi_cmnd, which, if set to 1, will
> > add the command to the head of the queue, irrespective of its serial
> > number, else the command is added according to its serial number
> > in the pending commands queue.
> 
> The code path currently being discussed is a request path and we do not

Sorry Mike,

I didn't really bother to look this up -- I just wanted to
send along an idea -- whether it be a request or cmnd,
the important thing is that you get the idea.

> have a scsi_cmnd until it is processed by the scsi_request_fn. There is
> already an interface to insert the request at the head of the queue
> scsi_insert_special_req, but we always set it to 0. We also do not

Should be a flag. ``Make'' it private to SCSI core, e.g.
by setting it to 0 when entering from ULP, etc.

Thus, the queue insertion code would be quite generic; also
using list_head as Doug has suggested, would do wonders (again
I'm _not_ looking this up).

> expose this interface high enough to let requests set this. We do
> expose it through the older scsi_insert_special_cmd interface used by
> scsi_mlqueue_insert.

-- 
Luben

^ permalink raw reply	[flat|nested] 297+ messages in thread

* [PATCH] first cut at fixing unable to requeue with no outstanding commands
@ 2002-09-30 21:06 James Bottomley
  2002-09-30 23:28 ` Mike Anderson
  0 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-09-30 21:06 UTC (permalink / raw)
  To: linux-scsi

[-- Attachment #1: Type: text/plain, Size: 1137 bytes --]

The attached represents an attempt to break the scsi mid-layer of the 
assumption that any device can queue at least one command.

What essentially happens is that if the host rejects a command with no other 
outstanding commands, the mid-layer does a very crude countdown (basically counting 
the number of cycles through the scsi request function) until the device gets 
enabled again when the count reaches zero.  I think the iteration in the 
request function is better than a fixed timer because it makes the system more 
responsive to I/O pressure (and also, it's easier to code).
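
The countdown can be modelled in a few lines of user-space C.  This is
a simplified sketch, not the patch itself: host_model, host_reject()
and request_fn_pass() are hypothetical names, and the sketch assumes
that a blocked host with commands still outstanding simply keeps
waiting for a completion to clear the block.

```c
#include <assert.h>

/* Mirrors SCSI_START_HOST_BLOCKED from the attached patch. */
#define START_HOST_BLOCKED 7

/* Hypothetical, pared-down host state. */
struct host_model {
	unsigned int host_busy;		/* commands outstanding */
	unsigned int host_blocked;	/* 0 = open, else passes left */
};

/* Called when the host rejects a command. */
static void host_reject(struct host_model *h)
{
	h->host_blocked = START_HOST_BLOCKED;
}

/* One pass through the request function.
 * Returns 1 if the host is open for dispatch, 0 if still blocked. */
static int request_fn_pass(struct host_model *h)
{
	/* Only an idle host counts down; there is no completion
	 * coming that could unblock it otherwise. */
	if (h->host_busy == 0 && h->host_blocked)
		--h->host_blocked;
	return h->host_blocked == 0;
}
```

With the start value of seven, the first six passes after a rejection
leave the host blocked and the seventh reopens it, so the wall-clock
delay depends on how often the request function is run.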

I've tested this by making a SCSI driver artificially reject commands with 
none outstanding (and run it on my root device).  A value of seven seems to 
cause a delay of between half and five seconds before the host starts up again 
(depending on the I/O load).

If this approach looks acceptable, I plan the following enhancements:

1. Make device_busy count down in the same fashion.
2. Give ->queuecommand() a two-value return (one for blocking the entire host 
and another for just blocking the device).
3. Make the countdown tuneable from the host template.

James



[-- Attachment #2: tmp.diff --]
[-- Type: text/plain , Size: 2512 bytes --]

===== hosts.c 1.9 vs edited =====
--- 1.9/drivers/scsi/hosts.c	Tue Jun 11 15:43:08 2002
+++ edited/hosts.c	Mon Sep 30 15:29:56 2002
@@ -210,7 +210,7 @@
     retval->eh_notify   = NULL;    /* Who we notify when we exit. */
 
 
-    retval->host_blocked = FALSE;
+    retval->host_blocked = 0;
     retval->host_self_blocked = FALSE;
 
 #ifdef DEBUG
===== hosts.h 1.12 vs edited =====
--- 1.12/drivers/scsi/hosts.h	Sun Jul 21 03:55:49 2002
+++ edited/hosts.h	Mon Sep 30 15:25:53 2002
@@ -395,11 +395,6 @@
     unsigned use_blk_tcq:1;
 
     /*
-     * Host has rejected a command because it was busy.
-     */
-    unsigned host_blocked:1;
-
-    /*
      * Host has requested that no further requests come through for the
      * time being.
      */
@@ -417,6 +412,19 @@
      */
     unsigned some_device_starved:1;
    
+    /*
+     * Host has rejected a command because it was busy.
+     */
+    unsigned int host_blocked;
+
+    /*
+     * Initial value for the blocking.  If the queue is empty, host_blocked
+     * counts down in the request_fn until it restarts host operations as
+     * zero is reached.  
+     *
+     * FIXME: This should probably be a value in the template */
+    #define SCSI_START_HOST_BLOCKED	7
+
     void (*select_queue_depths)(struct Scsi_Host *, Scsi_Device *);
 
     /*
===== scsi.c 1.36 vs edited =====
--- 1.36/drivers/scsi/scsi.c	Fri Sep 20 00:40:42 2002
+++ edited/scsi.c	Mon Sep 30 15:26:41 2002
@@ -643,7 +643,7 @@
 				return 0;
 			}
 		}
-		host->host_blocked = TRUE;
+		host->host_blocked = SCSI_START_HOST_BLOCKED;
 	} else {
 		if (cmd->device->device_busy == 0) {
 			if (scsi_retry_command(cmd) == 0) {
@@ -1443,7 +1443,7 @@
          * for both the queue full condition on a device, and for a
          * host full condition on the host.
          */
-        host->host_blocked = FALSE;
+        host->host_blocked = 0;
         device->device_blocked = FALSE;
 
 	/*
===== scsi_lib.c 1.30 vs edited =====
--- 1.30/drivers/scsi/scsi_lib.c	Wed Sep 18 11:36:10 2002
+++ edited/scsi_lib.c	Mon Sep 30 15:33:00 2002
@@ -754,6 +754,16 @@
 		if (SHpnt->in_recovery || blk_queue_plugged(q))
 			return;
 
+		if(SHpnt->host_busy == 0 && SHpnt->host_blocked) {
+			/* unblock after host_blocked iterates to zero */
+			if(--SHpnt->host_blocked == 0) {
+				printk("scsi%d unblocking host at zero depth\n", SHpnt->host_no);
+			} else {
+				blk_plug_device(q);
+				break;
+			}
+		}
+				
 		/*
 		 * If the device cannot accept another request, then quit.
 		 */

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] first cut at fixing unable to requeue with no outstanding commands
  2002-09-30 21:06 [PATCH] first cut at fixing unable to requeue with no outstanding commands James Bottomley
@ 2002-09-30 23:28 ` Mike Anderson
  2002-10-01  0:38   ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Mike Anderson @ 2002-09-30 23:28 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

James Bottomley [James.Bottomley@steeleye.com] wrote:
> +		if(SHpnt->host_busy == 0 && SHpnt->host_blocked) {
> +			/* unblock after host_blocked iterates to zero */
> +			if(--SHpnt->host_blocked == 0) {
> +				printk("scsi%d unblocking host at zero depth\n", SHpnt->host_no);
> +			} else {
> +				blk_plug_device(q);
> +				break;
> +			}
> +		}
> +				

Are we guaranteed that blk_run_queues will be called for all types of
I/O?

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] first cut at fixing unable to requeue with no outstanding commands
  2002-09-30 23:28 ` Mike Anderson
@ 2002-10-01  0:38   ` James Bottomley
  2002-10-01 15:01     ` Patrick Mansfield
  0 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-10-01  0:38 UTC (permalink / raw)
  To: linux-scsi

andmike@us.ibm.com said:
> Are we guaranteed that blk_run_queues will be called for all types of
> I/O? 

Pretty much, it looks like.  It's mainly triggered by I/O stuff in fs, 
including the buffer functions, so I think the assumption that we will be 
called eventually is good.

James




^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] first cut at fixing unable to requeue with no outstanding commands
  2002-10-01  0:38   ` James Bottomley
@ 2002-10-01 15:01     ` Patrick Mansfield
  2002-10-01 15:14       ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Patrick Mansfield @ 2002-10-01 15:01 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

On Mon, Sep 30, 2002 at 08:38:49PM -0400, James Bottomley wrote:
> andmike@us.ibm.com said:
> > Are we guaranteed that blk_run_queues will be called for all types of
> > I/O? 
> 
> Pretty much, it looks like.  It's mainly triggered by I/O stuff in fs, 
> including the buffer functions, so I think the assumption that we will be 
> called eventually is good.
> 
> James

What about applications and devices that only send one IO at a time?
They could still hang.

We have: tape (st or osst), sg usage, partitioning, scanning, direct
use of the block device (i.e. dd if=/dev/sda), and probably others.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] first cut at fixing unable to requeue with no outstanding commands
  2002-10-01 15:01     ` Patrick Mansfield
@ 2002-10-01 15:14       ` James Bottomley
  2002-10-01 16:23         ` Mike Anderson
  2002-10-01 20:18         ` Inhibit auto-attach of scsi disks ? Scott Merritt
  0 siblings, 2 replies; 297+ messages in thread
From: James Bottomley @ 2002-10-01 15:14 UTC (permalink / raw)
  To: linux-scsi

patmans@us.ibm.com said:
> What about applications and devices that only send one IO at a time?
> They could still hang.

> We have: tape (st or osst), sg usage, partitioning, scanning, direct
> use of the block device (i.e. dd if=/dev/sda), and probably others. 

I don't believe so.

Unplugging is a global thing.  The request function for a plugged queue will 
always be run, that's a guarantee, so using this approach, the queue will 
stall but never hang forever.

I think for a rejection of a command with none outstanding, a stall is what 
you want (give the host/device time to recover from whatever the problem is).  
The length of the stall will be dependent on I/O pressure in the system, which 
is also roughly what you want: we can afford to give a device quite a while to 
recover if we're not desperate to get I/O to it. We can also handle BUSY 
returns this way too...

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] first cut at fixing unable to requeue with no outstanding commands
  2002-10-01 15:14       ` James Bottomley
@ 2002-10-01 16:23         ` Mike Anderson
  2002-10-01 16:30           ` James Bottomley
  2002-10-01 20:18         ` Inhibit auto-attach of scsi disks ? Scott Merritt
  1 sibling, 1 reply; 297+ messages in thread
From: Mike Anderson @ 2002-10-01 16:23 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

James Bottomley [James.Bottomley@steeleye.com] wrote:
> patmans@us.ibm.com said:
> > What about applications and devices that only send one IO at a time?
> > They could still hang.
> 
> > We have: tape (st or osst), sg usage, partitioning, scanning, direct
> > use of the block device (i.e. dd if=/dev/sda), and probably others. 
> 
> I don't believe so.
> 
> Unplugging is a global thing.  The request function for a plugged queue will 
> always be run, that's a guarantee, so using this approach, the queue will 
> stall but never hang forever.
> 
> I think for a rejection of a command with none outstanding, a stall is what 
> you want (give the host/device time to recover from whatever the problem is).  
> The length of the stall will be dependent on I/O pressure in the system, which 
> is also roughly what you want: we can afford to give a device quite a while to 
> recover if we're not desperate to get I/O to it. We can also handle BUSY 
> returns this way too...
> 
> James

ok I see the call path now. I was unsure that blk_run_queues would be
called for non-fs IO.

I traced a dd (dd if=/dev/sda of=/dev/null count=2 bs=512) command under
uml with a modified scsi_debug to return 1 on a queuecommand call and
your patch. The trace showed blk_run_queues being called through
wb_kupdate.

#0  scsi_request_fn (...) at scsi_lib.c:738
#1  0xa00beda2 in blk_run_queues () at ll_rw_blk.c:1045
#2  0xa004fb0d in wb_kupdate (...) at page-writeback.c:270
#3  0xa004f6c9 in __pdflush (...) at pdflush.c:130
#4  0xa004f76e in pdflush (...) at pdflush.c:178
#5  0xa001aba4 in run_kernel_thread (...) at process.c:232


Since you are modifying scsi_mlqueue_insert what about a couple of these
cleanups.

	- Is the call to scsi_delete_timer really necessary? All callers
	  have already deleted the timer. The del_timer function takes a
	  lock which would be nice to avoid if we do not need to call
	  it.  If we want to protect the code we could do a quick check
	  on SCset->eh_timeout.function prior to calling.

	- Patrick pointed out a while ago that the "if (host->host_busy
	  == 0)" check and the similar one for the device will never be
	  called because to be in this function the values of these
	  variables need to be at least 1. I believe this direct call to
	  scsi_retry_command should be removed instead of adjusting the
	  check to "== 1" as this seems counter to how you are trying to
	  handle busy.
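
To illustrate the second point, here is a small user-space model.  It
assumes (as the argument above does) that the busy counter is bumped
before the driver sees the command and only dropped inside the requeue
path; host_model, dispatch() and requeue() are made-up names for this
sketch.

```c
#include <assert.h>

/* Hypothetical, pared-down host state. */
struct host_model {
	unsigned int host_busy;	/* commands handed to the driver */
	int idle_branch_taken;	/* set if the "== 0" branch ever runs */
};

/* Stand-in for the requeue path reached after a rejection. */
static void requeue(struct host_model *h)
{
	if (h->host_busy == 0)
		h->idle_branch_taken = 1;	/* the check in question */
	else
		h->host_busy--;	/* rejected command leaves the driver */
}

/* Stand-in for dispatch: count first, then offer to the driver. */
static void dispatch(struct host_model *h, int driver_accepts)
{
	h->host_busy++;
	if (!driver_accepts)
		requeue(h);
}
```

Because the rejected command is itself counted, requeue() always sees
host_busy >= 1 in this model and the "== 0" branch is dead code.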

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] first cut at fixing unable to requeue with no outstanding commands
  2002-10-01 16:23         ` Mike Anderson
@ 2002-10-01 16:30           ` James Bottomley
  0 siblings, 0 replies; 297+ messages in thread
From: James Bottomley @ 2002-10-01 16:30 UTC (permalink / raw)
  To: linux-scsi

andmike@us.ibm.com said:
> 	- Is the call to scsi_delete_timer really necessary? All callers
> 	  have already deleted the timer. The del_timer function takes a
> 	  lock which would be nice to avoid if we do not need to call
> 	  it.  If we want to protect the code we could do a quick check
> 	  on SCset->eh_timeout.function prior to calling. 

That's a belt and braces thing.  I suppose we could change it to BUG_ON timer 
active and see what pops out of the woodwork.

andmike@us.ibm.com said:
> 	- Patrick pointed out a while ago that the "if (host->host_busy
> 	  == 0)" check and the similar one for the device will never be
> 	  called because to be in this function the values of these
> 	  variables need to be at least 1. I believe this direct call to
> 	  scsi_retry_command should be removed instead of adjusting the
> 	  check to "== 1" as this seems counter to how you are trying to
> 	  handle busy. 

Yes, I plan on slowly removing all automatic reissues of commands. I've 
already removed the checks in my local tree (and the corresponding reissues).

James



^ permalink raw reply	[flat|nested] 297+ messages in thread

* Inhibit auto-attach of scsi disks ?
  2002-10-01 15:14       ` James Bottomley
  2002-10-01 16:23         ` Mike Anderson
@ 2002-10-01 20:18         ` Scott Merritt
  2002-10-02  0:46           ` Alan Cox
  1 sibling, 1 reply; 297+ messages in thread
From: Scott Merritt @ 2002-10-01 20:18 UTC (permalink / raw)
  To: linux-scsi

> patmans@us.ibm.com said:
> use of the block device (i.e. dd if=/dev/sda), and probably others. 

I've noticed (at least in LK 2.4.19) that the scsi system will automatically "attach" any scsi disk-type device that it finds while scanning the scsi bus for new devices.  This includes issuing read capacity commands and attempting to read and parse that device's partition table.

I was wondering if there has been any discussion of a method/option by which a user could inhibit the automatic attachment of disk units ?  This might be desired if the user's intent was to reformat, repartition, or perform other low-level activities.

Thanks, Scott.

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: Inhibit auto-attach of scsi disks ?
  2002-10-01 20:18         ` Inhibit auto-attach of scsi disks ? Scott Merritt
@ 2002-10-02  0:46           ` Alan Cox
  2002-10-02  1:49             ` Scott Merritt
  0 siblings, 1 reply; 297+ messages in thread
From: Alan Cox @ 2002-10-02  0:46 UTC (permalink / raw)
  To: Scott Merritt; +Cc: linux-scsi

You can do all those things while it is attached as a disk device.

Alan


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: Inhibit auto-attach of scsi disks ?
  2002-10-02  0:46           ` Alan Cox
@ 2002-10-02  1:49             ` Scott Merritt
  2002-10-02  1:58               ` Doug Ledford
  2002-10-02 13:40               ` Alan Cox
  0 siblings, 2 replies; 297+ messages in thread
From: Scott Merritt @ 2002-10-02  1:49 UTC (permalink / raw)
  To: Alan Cox; +Cc: Scsi, linux-scsi

Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> You can do all those things while it is attached as a disk device.

Yes - my concerns are perhaps more "artistic" (and possibly misguided).  If I am mounting a disk for low level maintenance, it may not have a valid partition table and I may not appreciate the Syslog warnings related to the partition table.  Furthermore, although I haven't investigated, it seems like it might be tricky to determine that I have altered the partition table and that /dev/sdb6 is no longer a valid partition.  To me, it just seemed that giving the user/administrator some way to control/inhibit the auto-attachment would be a "cleaner" solution to the problem - but that's just one man's opinion ... :)

Regards, Scott.

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: Inhibit auto-attach of scsi disks ?
  2002-10-02  1:49             ` Scott Merritt
@ 2002-10-02  1:58               ` Doug Ledford
  2002-10-02  2:45                 ` Scott Merritt
  2002-10-02 13:40               ` Alan Cox
  1 sibling, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-10-02  1:58 UTC (permalink / raw)
  To: Scott Merritt; +Cc: Alan Cox, linux-scsi

On Tue, Oct 01, 2002 at 09:49:37PM -0400, Scott Merritt wrote:
> 
> Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > You can do all those things while it is attached as a disk device.
> 
> Yes - my concerns are perhaps more "artistic" (and possibly misguided).  If I am mounting a disk for low level maintenance, it may not have a valid partition table and I may not appreciate the Syslog warnings related to the partition table.  Furthermore, although I haven't investigated, it seems like it might be tricky to determine that I have altered the partition table and that /dev/sdb6 is no longer a valid partition.  To me, it just seemed that giving the user/administrator some way to control/inhibit the auto-attachment would be a "cleaner" solution to the problem - but that's just one man's opinion ... :)

You are confusing auto attachment with auto mounting.  When you do the low 
level things you suggest, the disk should not be *mounted* but it *must* 
be attached (or else there won't be a scsi device for you to open and tell 
to format itself for instance).  Same goes for fdisk partition operations, 
the disk shouldn't be mounted, but it must be attached for fdisk to be 
able to write to it.  Then, when the partition tables are rewritten, fdisk 
tells the kernel to reread the tables from disk, which the kernel happily 
does as long as no one currently has any of the previous partitions still 
mounted.
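
As a rough illustration of the "reread the tables" step (this is not taken
from fdisk's source), a user-space program can ask the kernel for the same
thing with the BLKRRPART ioctl on the whole-disk node. The helper name
reread_partition_table is made up for this sketch, and it assumes a Linux
system with the kernel headers installed:

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>	/* BLKRRPART */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to re-read the partition table of
 * a whole-disk node such as /dev/sdb.
 * Returns 0 on success, or a positive errno value on failure.
 */
int reread_partition_table(const char *disk_path)
{
	int fd, err = 0;

	fd = open(disk_path, O_RDONLY);
	if (fd < 0)
		return errno;

	/* Typically fails with EBUSY while an old partition is still in use. */
	if (ioctl(fd, BLKRRPART) < 0)
		err = errno;

	close(fd);
	return err;
}
```

The disk has to be attached for the open() to succeed at all, which is the
point above: attachment gives you the device node, mounting is a separate
matter.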

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: Inhibit auto-attach of scsi disks ?
  2002-10-02  1:58               ` Doug Ledford
@ 2002-10-02  2:45                 ` Scott Merritt
  0 siblings, 0 replies; 297+ messages in thread
From: Scott Merritt @ 2002-10-02  2:45 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-scsi

On Tue, 1 Oct 2002 21:58:38 -0400
Doug Ledford <dledford@redhat.com> wrote:

> You are confusing auto attachment with auto mounting.  When you do the low 
> level things you suggest, the disk should not be *mounted* but it *must* 
> be attached (or else there won't be a scsi device for you to open and tell 
> to format itself for instance).

Yes, I would certainly agree that the scsi disk must first be "attached".  However, I'm not entirely clear on the need to *always* read/parse the partition table as an integral part of "attaching" the scsi disk.

Assuming that I understand you correctly, I think this is simply a slightly different "artistic" view.  Under the assumption that an "early" read of the partition table does no "damage", there is clearly no compelling reason to change.  About the best argument I could come up with is some type of hypothetical disk device that would appreciate a firmware/parameter download from the "main" filesystem before being forced to perform its first data operation.  An admittedly tenuous example ... :)

Best regards, Scott.

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: Inhibit auto-attach of scsi disks ?
  2002-10-02  1:49             ` Scott Merritt
  2002-10-02  1:58               ` Doug Ledford
@ 2002-10-02 13:40               ` Alan Cox
  1 sibling, 0 replies; 297+ messages in thread
From: Alan Cox @ 2002-10-02 13:40 UTC (permalink / raw)
  To: Scott Merritt; +Cc: linux-scsi

You need it attached to write the partition table.


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] scsi host cleanup 3/3 (driver changes)
@ 2002-10-10 15:01 Stephen Cameron
  2002-10-10 16:46 ` Mike Anderson
  0 siblings, 1 reply; 297+ messages in thread
From: Stephen Cameron @ 2002-10-10 15:01 UTC (permalink / raw)
  To: andmike; +Cc: linux-scsi

Mike Anderson wrote:

> If you read my previous post on this patch I indicated that a few of the
> driver changes I was only able to compile test ( block/cciss_scsi.c,
> scsi/53c700.c, scsi/pcmcia/*, scsi/wd33c93.c). The changes to the
> drivers are to remove the old interfaces and possibly extra NULL inits
> of struct members. These changes will need to be ok'd by their
> respective maintainers.

I tried out these patches with 2.5.40 and the cciss driver.
Looks ok to me.

When I did "echo engage scsi > /proc/driver/cciss/cciss0" to 
make the cciss driver register with the scsi subsystem, I got
what you see below.  However, I also got the same thing when 
trying 2.5.40 without the patches, so I don't think the patches
are the problem.

When I tried 2.5.41 without the patches, the problem below was
gone.

And, despite all this, it always appeared to work anyway. 
My tape drive appeared, and I could always do i/o to it.

-- steve

dmesg output follows:

cciss0: No device changes detected.
scsi0 : cciss0
  Vendor: COMPAQ    Model: SDT-10000         Rev: 1.14
  Type:   Sequential-Access                  ANSI SCSI revision: 02
Debug: sleeping function called from illegal context at slab.c:1374
ce275de0 c01195e6 c0320bc0 c0326282 0000055e c1287080 c013c15f c0326282 
       0000055e c1336324 c1330c00 00000000 ce275e4c 00000286 00000286 cfc32974 
       cf37e000 00000000 cfffb080 d0800000 00001000 ce274000 00001000 c013a199 
Call Trace:
 [<c01195e6>]__might_sleep+0x56/0x5d
 [<c013c15f>]kmalloc+0x4f/0x330
 [<c013a199>]get_vm_area+0x29/0x140
 [<c013a4fb>]__vmalloc+0x3b/0x120
 [<c013a5f6>]vmalloc+0x16/0x20
 [<c029837e>]sg_init+0xbe/0x160
 [<c0288280>]scsi_register_host+0x130/0x200
 [<c025a5b6>]cciss_engage_scsi+0x136/0x150
 [<c025a8da>]cciss_proc_write+0x8a/0xb0
 [<c015ca9f>]open_namei+0x37f/0x4e0
 [<c013bcf2>]free_block+0x192/0x2c0
 [<c017bf21>]proc_file_write+0x31/0x40
 [<c014d316>]vfs_write+0xb6/0x180
 [<c014c94f>]filp_close+0x10f/0x120
 [<c014c94f>]filp_close+0x10f/0x120
 [<c014d44a>]sys_write+0x2a/0x40
 [<c01078af>]syscall_call+0x7/0xb

bad: scheduling while atomic!
ce275ce8 c0116b9d c0320a20 c03c0590 ce275d0c c01170a1 c12b5800 00000000 
       ce274000 ce275dd4 ce275dd8 ce275d68 c011749a 00000000 cfcb1080 c0117080 
       00000000 00000000 c0117196 c03c0588 00000003 00000001 cfcb1080 c0117080 
Call Trace:
 [<c0116b9d>]schedule+0x3d/0x4c0
 [<c01170a1>]default_wake_function+0x21/0x40
 [<c011749a>]wait_for_completion+0x12a/0x1e0
 [<c0117080>]default_wake_function+0x0/0x40
 [<c0117196>]__wake_up+0x66/0xb0
 [<c0117080>]default_wake_function+0x0/0x40
 [<c012d5a1>]call_usermodehelper+0x101/0x110
 [<c012d470>]__call_usermodehelper+0x0/0x30
 [<c013bcf2>]free_block+0x192/0x2c0
 [<c012d470>]__call_usermodehelper+0x0/0x30
 [<c021ed84>]dev_hotplug+0x1a4/0x230
 [<c021edd7>]dev_hotplug+0x1f7/0x230
 [<c021babd>]device_attach+0x2d/0x40
 [<c021be84>]device_register+0x194/0x270
 [<c029874f>]sg_attach+0x2cf/0x380
 [<c02882ca>]scsi_register_host+0x17a/0x200
 [<c025a5b6>]cciss_engage_scsi+0x136/0x150
 [<c025a8da>]cciss_proc_write+0x8a/0xb0
 [<c015ca9f>]open_namei+0x37f/0x4e0
 [<c013bcf2>]free_block+0x192/0x2c0
 [<c017bf21>]proc_file_write+0x31/0x40
 [<c014d316>]vfs_write+0xb6/0x180
 [<c014c94f>]filp_close+0x10f/0x120
 [<c014c94f>]filp_close+0x10f/0x120
 [<c014d44a>]sys_write+0x2a/0x40
 [<c01078af>]syscall_call+0x7/0xb

Attached scsi tape st0 at scsi0, channel 0, id 0, lun 0
st0: try direct i/o: yes, max page reachable by HBA 65532

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] scsi host cleanup 3/3 (driver changes)
  2002-10-10 15:01 [PATCH] scsi host cleanup 3/3 (driver changes) Stephen Cameron
@ 2002-10-10 16:46 ` Mike Anderson
  2002-10-10 16:59   ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Mike Anderson @ 2002-10-10 16:46 UTC (permalink / raw)
  To: Stephen Cameron; +Cc: linux-scsi

Stephen,
	You were most likely hitting the sg might sleep problem. I
	posted a patch to this list and I believe that the workaround is
	in 2.5.41.

Stephen Cameron [steve.cameron@hp.com] wrote:
> Mike Anderson wrote:
> 
> > If you read my previous post on this patch I indicated that a few of the
> > driver changes I was only able to compile test ( block/cciss_scsi.c,
> > scsi/53c700.c, scsi/pcmcia/*, scsi/wd33c93.c). The changes to the
> > drivers are to remove the old interfaces and possibly extra NULL inits
> > of struct members. These changes will need to be ok'd by their
> > respective maintainers.
> 
> I tried out these patches with 2.5.40 and the cciss driver.
> Looks ok to me.
> 
> When I did "echo engage scsi > /proc/driver/cciss/cciss0" to 
> make the cciss driver register with the scsi subsystem, I got
> what you see below.  However, I also got the same thing when 
> trying 2.5.40 without the patches, so I don't think the patches
> are the problem.
> 
> When I tried 2.5.41 without the patches, the problem below was
> gone.
> 
> And, despite all this, it always appeared to work anyway. 
> My tape drive appeared, and I could always do i/o to it.
> 
> -- steve
> 
> dmesg output follows:
> 
> cciss0: No device changes detected.
> scsi0 : cciss0
>   Vendor: COMPAQ    Model: SDT-10000         Rev: 1.14
>   Type:   Sequential-Access                  ANSI SCSI revision: 02
> Debug: sleeping function called from illegal context at slab.c:1374
> ce275de0 c01195e6 c0320bc0 c0326282 0000055e c1287080 c013c15f c0326282 
>        0000055e c1336324 c1330c00 00000000 ce275e4c 00000286 00000286 cfc32974 
>        cf37e000 00000000 cfffb080 d0800000 00001000 ce274000 00001000 c013a199 
> Call Trace:
>  [<c01195e6>]__might_sleep+0x56/0x5d
>  [<c013c15f>]kmalloc+0x4f/0x330
>  [<c013a199>]get_vm_area+0x29/0x140
>  [<c013a4fb>]__vmalloc+0x3b/0x120
>  [<c013a5f6>]vmalloc+0x16/0x20
>  [<c029837e>]sg_init+0xbe/0x160
>  [<c0288280>]scsi_register_host+0x130/0x200
>  [<c025a5b6>]cciss_engage_scsi+0x136/0x150
>  [<c025a8da>]cciss_proc_write+0x8a/0xb0
>  [<c015ca9f>]open_namei+0x37f/0x4e0
>  [<c013bcf2>]free_block+0x192/0x2c0
>  [<c017bf21>]proc_file_write+0x31/0x40
>  [<c014d316>]vfs_write+0xb6/0x180
>  [<c014c94f>]filp_close+0x10f/0x120
>  [<c014c94f>]filp_close+0x10f/0x120
>  [<c014d44a>]sys_write+0x2a/0x40
>  [<c01078af>]syscall_call+0x7/0xb
.. snip ..

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] scsi host cleanup 3/3 (driver changes)
  2002-10-10 16:46 ` Mike Anderson
@ 2002-10-10 16:59   ` James Bottomley
  2002-10-10 20:05     ` Mike Anderson
  0 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-10-10 16:59 UTC (permalink / raw)
  To: linux-scsi

I (finally) got around to looking at the patches.  There were only a few minor 
quibbles, really boiling down to the fact that you export some functions 
beginning shost_ instead of scsi_ (shost_tp_for_each_host, 
shost_chk_and_release).  I can't see any reason why this might cause a naming 
clash, but I think it is safer to stick to exports beginning with scsi_.

Also, why do we now have some functions beginning scsi_host and some beginning 
scsi_shost?  Could we just use one or the other (I'd vote for scsi_host, since 
it's shorter).

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] scsi host cleanup 3/3 (driver changes)
  2002-10-10 16:59   ` James Bottomley
@ 2002-10-10 20:05     ` Mike Anderson
  0 siblings, 0 replies; 297+ messages in thread
From: Mike Anderson @ 2002-10-10 20:05 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

James Bottomley [James.Bottomley@steeleye.com] wrote:
> I (finally) got around to looking at the patches.  There were only a few minor 
> quibbles, really boiling down to the fact that you export some functions 
> beginning shost_ instead of scsi_ (shost_tp_for_each_host, 
> shost_chk_and_release).  I can't see any reason why this might cause a naming 
> clash, but I think it is safer to stick to exports beginning with scsi_.

Ok will change to scsi_.

> 
> Also, why do we now have some functions beginning scsi_host and some beginning 
> scsi_shost?  Could we just use one or the other (I'd vote for scsi_host, since 
> it's shorter).
> 

Ok. will change to scsi_host.

I am currently waiting on a re-roll because of an issue with the
patch relating to driverfs registration. Since I moved a lot of
functionality to scsi_register (which I wanted to do because of past
issues / comments) I am registering the host with driverfs prior to
scsi_set_pci_device being called by the host driver. This results in
scsi(n) showing up under root. I could move registration into the
scsi_set_pci_device function, but was wanting to wait until we get some
driverfs layout issues understood.

-andmike
--
Michael Anderson
andmike@us.ibm.com

^ permalink raw reply	[flat|nested] 297+ messages in thread

[parent not found: <patmans@us.ibm.com>]

* [RFC PATCH] consolidate SCSI-2 command lun setting
@ 2002-10-15 16:55 ` Patrick Mansfield
  2002-10-15 20:29   ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Patrick Mansfield @ 2002-10-15 16:55 UTC (permalink / raw)
  To: linux-scsi

This patch consolidates the setting of the LUN in byte 1 of the SCSI
command block for SCSI-2 and lower devices.
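
For readers unfamiliar with the encoding, here is a small stand-alone
sketch of the operation being consolidated. It is not the kernel code
itself: ex_set_cdb_lun and EX_SCSI_2 are names invented for this
illustration, and the numeric value of EX_SCSI_2 is an assumption.

```c
#include <stdint.h>

/* Stand-in for the kernel's SCSI-2 level constant; the value is an assumption. */
#define EX_SCSI_2 3

/*
 * For SCSI-2 and older devices, store the 3-bit LUN in bits 7..5 of CDB
 * byte 1 while keeping the low five bits that the upper-level driver
 * already set. Devices newer than SCSI-2 are left untouched.
 */
void ex_set_cdb_lun(uint8_t *cdb, int scsi_level, unsigned int lun)
{
	if (scsi_level <= EX_SCSI_2)
		cdb[1] = (uint8_t)((cdb[1] & 0x1f) | ((lun << 5) & 0xe0));
}
```

Because the old LUN bits are masked off first, the same command block can
be stamped again with a different LUN, which is what sending it down
another path would need.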

This is needed for multi-path IO (some devices can actually have different
LUN values for each path), but is also a clean up of the code.

The scsi_error.c retry code does not use scsi_request_fn(), so it
must still set the LUN value.

sg.c was able to inhibit setting the value; this patch removes that
capability. If it is really needed, the device can be black listed in the
device_list[] flags, or sg could set a similar flag.

This is patched against the latest bk (as of some time on Oct 14), and
patches clean against the snapshots/patch-2.5.42-bk2 (there are a bunch
of sr changes in the latest bk).

 drivers/scsi/osst.c       |    2 --
 drivers/scsi/scsi_error.c |    2 +-
 drivers/scsi/scsi_ioctl.c |   16 ++++------------
 drivers/scsi/scsi_lib.c   |    8 ++++++++
 drivers/scsi/sd.c         |   22 +++++-----------------
 drivers/scsi/sg.c         |    4 ----
 drivers/scsi/sr.c         |   17 ++---------------
 drivers/scsi/sr_ioctl.c   |   20 +-------------------
 drivers/scsi/sr_vendor.c  |   18 +++---------------
 drivers/scsi/st.c         |    2 --
 include/scsi/sg.h         |    2 +-
 11 files changed, 25 insertions(+), 88 deletions(-)

diff -Nru a/drivers/scsi/osst.c b/drivers/scsi/osst.c
--- a/drivers/scsi/osst.c	Tue Oct 15 09:46:06 2002
+++ b/drivers/scsi/osst.c	Tue Oct 15 09:46:06 2002
@@ -322,8 +322,6 @@
 		}
 	}
 
-        if (SRpnt->sr_device->scsi_level <= SCSI_2)
-                cmd[1] |= (SRpnt->sr_device->lun << 5) & 0xe0;
         init_completion(&STp->wait);
 	SRpnt->sr_use_sg = (bytes > (STp->buffer)->sg[0].length) ?
 				    (STp->buffer)->use_sg : 0;
diff -Nru a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
--- a/drivers/scsi/scsi_error.c	Tue Oct 15 09:46:06 2002
+++ b/drivers/scsi/scsi_error.c	Tue Oct 15 09:46:06 2002
@@ -1419,7 +1419,7 @@
 	}
 
 	sreq->sr_cmnd[0] = ALLOW_MEDIUM_REMOVAL;
-	sreq->sr_cmnd[1] = (sdev->scsi_level <= SCSI_2) ? (sdev->lun << 5) : 0;
+	sreq->sr_cmnd[1] = 0;
 	sreq->sr_cmnd[2] = 0;
 	sreq->sr_cmnd[3] = 0;
 	sreq->sr_cmnd[4] = SCSI_REMOVAL_PREVENT;
diff -Nru a/drivers/scsi/scsi_ioctl.c b/drivers/scsi/scsi_ioctl.c
--- a/drivers/scsi/scsi_ioctl.c	Tue Oct 15 09:46:07 2002
+++ b/drivers/scsi/scsi_ioctl.c	Tue Oct 15 09:46:07 2002
@@ -160,7 +160,7 @@
 	       return 0;
 
 	scsi_cmd[0] = ALLOW_MEDIUM_REMOVAL;
-	scsi_cmd[1] = (dev->scsi_level <= SCSI_2) ? (dev->lun << 5) : 0;
+	scsi_cmd[1] = 0;
 	scsi_cmd[2] = 0;
 	scsi_cmd[3] = 0;
 	scsi_cmd[4] = state;
@@ -297,12 +297,6 @@
 	if(copy_from_user(buf, cmd_in + cmdlen, inlen))
 		goto error;
 
-	/*
-	 * Set the lun field to the correct value.
-	 */
-	if (dev->scsi_level <= SCSI_2)
-		cmd[1] = (cmd[1] & 0x1f) | (dev->lun << 5);
-
 	switch (opcode) {
 	case FORMAT_UNIT:
 		timeout = FORMAT_UNIT_TIMEOUT;
@@ -416,7 +410,6 @@
 int scsi_ioctl(Scsi_Device * dev, int cmd, void *arg)
 {
 	char scsi_cmd[MAX_COMMAND_SIZE];
-	char cmd_byte1;
 
 	/* No idea how this happens.... */
 	if (!dev)
@@ -431,7 +424,6 @@
 	if (!scsi_block_when_processing_errors(dev)) {
 		return -ENODEV;
 	}
-	cmd_byte1 = (dev->scsi_level <= SCSI_2) ? (dev->lun << 5) : 0;
 
 	switch (cmd) {
 	case SCSI_IOCTL_GET_IDLUN:
@@ -484,7 +476,7 @@
 		return scsi_set_medium_removal(dev, SCSI_REMOVAL_ALLOW);
 	case SCSI_IOCTL_TEST_UNIT_READY:
 		scsi_cmd[0] = TEST_UNIT_READY;
-		scsi_cmd[1] = cmd_byte1;
+		scsi_cmd[1] = 0;
 		scsi_cmd[2] = scsi_cmd[3] = scsi_cmd[5] = 0;
 		scsi_cmd[4] = 0;
 		return ioctl_internal_command((Scsi_Device *) dev, scsi_cmd,
@@ -492,7 +484,7 @@
 		break;
 	case SCSI_IOCTL_START_UNIT:
 		scsi_cmd[0] = START_STOP;
-		scsi_cmd[1] = cmd_byte1;
+		scsi_cmd[1] = 0;
 		scsi_cmd[2] = scsi_cmd[3] = scsi_cmd[5] = 0;
 		scsi_cmd[4] = 1;
 		return ioctl_internal_command((Scsi_Device *) dev, scsi_cmd,
@@ -500,7 +492,7 @@
 		break;
 	case SCSI_IOCTL_STOP_UNIT:
 		scsi_cmd[0] = START_STOP;
-		scsi_cmd[1] = cmd_byte1;
+		scsi_cmd[1] = 0;
 		scsi_cmd[2] = scsi_cmd[3] = scsi_cmd[5] = 0;
 		scsi_cmd[4] = 0;
 		return ioctl_internal_command((Scsi_Device *) dev, scsi_cmd,
diff -Nru a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c	Tue Oct 15 09:46:06 2002
+++ b/drivers/scsi/scsi_lib.c	Tue Oct 15 09:46:06 2002
@@ -972,6 +972,14 @@
 				continue;
 			}
 		}
+		/* 
+		 * If SCSI-2 or lower, store the LUN value in cmnd.
+		 */
+		if (SDpnt->scsi_level <= SCSI_2)
+			SCpnt->cmnd[1] = (SCpnt->cmnd[1] & 0x1f) |
+				(SCpnt->lun << 5 & 0xe0);
+
+
 		/*
 		 * Finally, initialize any error handling parameters, and set up
 		 * the timers for timeouts.
diff -Nru a/drivers/scsi/sd.c b/drivers/scsi/sd.c
--- a/drivers/scsi/sd.c	Tue Oct 15 09:46:07 2002
+++ b/drivers/scsi/sd.c	Tue Oct 15 09:46:07 2002
@@ -402,8 +402,7 @@
 		nbuff, (rq_data_dir(SCpnt->request) == WRITE) ? 
 		"writing" : "reading", this_count, SCpnt->request->nr_sectors));
 
-	SCpnt->cmnd[1] = (SCpnt->device->scsi_level <= SCSI_2) ?
-			 ((SCpnt->lun << 5) & 0xe0) : 0;
+	SCpnt->cmnd[1] = 0;
 
 	if (((this_count > 0xff) || (block > 0x1fffff)) || SCpnt->device->ten) {
 		if (this_count > 0xffff)
@@ -815,9 +814,7 @@
 
 		while (retries < 3) {
 			cmd[0] = TEST_UNIT_READY;
-			cmd[1] = (sdp->scsi_level <= SCSI_2) ?
-				((sdp->lun << 5) & 0xe0) : 0;
-			memset((void *) &cmd[2], 0, 8);
+			memset((void *) &cmd[1], 0, 9);
 
 			SRpnt->sr_cmd_len = 0;
 			SRpnt->sr_sense_buffer[0] = 0;
@@ -851,9 +848,7 @@
 				printk(KERN_NOTICE "%s: Spinning up disk...",
 				       diskname);
 				cmd[0] = START_STOP;
-				cmd[1] = (sdp->scsi_level <= SCSI_2) ?
-					((sdp->lun << 5) & 0xe0) : 0;
-				cmd[1] |= 1;	/* Return immediately */
+				cmd[1] = 1;	/* Return immediately */
 				memset((void *) &cmd[2], 0, 8);
 				cmd[4] = 1;	/* Start spin cycle */
 				SRpnt->sr_cmd_len = 0;
@@ -894,7 +889,6 @@
 		   Scsi_Request *SRpnt, unsigned char *buffer) {
 
 	unsigned char cmd[10];
-	Scsi_Device *sdp = sdkp->device;
 	int the_result, retries;
 
 	retries = 3;
@@ -902,9 +896,7 @@
 
 		memset((void *) &cmd[0], 0, 10);
 		cmd[0] = MODE_SENSE;
-		cmd[1] = (sdp->scsi_level <= SCSI_2) ?
-			 ((sdp->lun << 5) & 0xe0) : 0;
-		cmd[1] |= 0x08;	/* DBD */
+		cmd[1] = 0x08;	/* DBD */
 		cmd[2] = 0x08;	/* current values, cache page */
 		cmd[4] = 128;	/* allocation length */
 
@@ -968,9 +960,7 @@
 	retries = 3;
 	do {
 		cmd[0] = READ_CAPACITY;
-		cmd[1] = (sdp->scsi_level <= SCSI_2) ?
-			((sdp->lun << 5) & 0xe0) : 0;
-		memset((void *) &cmd[2], 0, 8);
+		memset((void *) &cmd[1], 0, 9);
 		memset((void *) buffer, 0, 8);
 
 		SRpnt->sr_cmd_len = 0;
@@ -1090,7 +1080,6 @@
 
 	memset((void *) &cmd[0], 0, 8);
 	cmd[0] = MODE_SENSE;
-	cmd[1] = (sdp->scsi_level <= SCSI_2) ? ((sdp->lun << 5) & 0xe0) : 0;
 	cmd[2] = modepage;
 	cmd[4] = len;
 
@@ -1612,7 +1601,6 @@
 		unsigned char cmd[10] = { 0 };
 
 		cmd[0] = SYNCHRONIZE_CACHE;
-		cmd[1] = SDpnt->scsi_level <= SCSI_2 ? (SDpnt->lun << 5) & 0xe0 : 0;
 		/* leave the rest of the command zero to indicate 
 		 * flush everything */
 		scsi_wait_req(SRpnt, (void *)cmd, NULL, 0,
diff -Nru a/drivers/scsi/sg.c b/drivers/scsi/sg.c
--- a/drivers/scsi/sg.c	Tue Oct 15 09:46:06 2002
+++ b/drivers/scsi/sg.c	Tue Oct 15 09:46:06 2002
@@ -705,10 +705,6 @@
 	SRpnt->sr_request->rq_dev = sdp->i_rdev;
 	SRpnt->sr_sense_buffer[0] = 0;
 	SRpnt->sr_cmd_len = hp->cmd_len;
-	if (!(hp->flags & SG_FLAG_LUN_INHIBIT)) {
-		if (sdp->device->scsi_level <= SCSI_2)
-			cmnd[1] = (cmnd[1] & 0x1f) | (sdp->device->lun << 5);
-	}
 	SRpnt->sr_use_sg = srp->data.k_use_sg;
 	SRpnt->sr_sglist_len = srp->data.sglist_len;
 	SRpnt->sr_bufflen = srp->data.bufflen;
diff -Nru a/drivers/scsi/sr.c b/drivers/scsi/sr.c
--- a/drivers/scsi/sr.c	Tue Oct 15 09:46:06 2002
+++ b/drivers/scsi/sr.c	Tue Oct 15 09:46:06 2002
@@ -336,9 +336,7 @@
 		   (rq_data_dir(SCpnt->request) == WRITE) ? "writing" : "reading",
 				 this_count, SCpnt->request->nr_sectors));
 
-	SCpnt->cmnd[1] = (SCpnt->device->scsi_level <= SCSI_2) ?
-			 ((SCpnt->lun << 5) & 0xe0) : 0;
-
+	SCpnt->cmnd[1] = 0;
 	block = (unsigned int)SCpnt->request->sector / (s_size >> 9);
 
 	if (this_count > 0xffff)
@@ -486,9 +484,7 @@
 
 	do {
 		cmd[0] = READ_CAPACITY;
-		cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-			 ((cd->device->lun << 5) & 0xe0) : 0;
-		memset((void *) &cmd[2], 0, 8);
+		memset((void *) &cmd[1], 0, 9);
 		SRpnt->sr_request->rq_status = RQ_SCSI_BUSY;	/* Mark as really busy */
 		SRpnt->sr_cmd_len = 0;
 
@@ -599,8 +595,6 @@
 	}
 	memset(&cgc, 0, sizeof(struct cdrom_generic_command));
 	cgc.cmd[0] = MODE_SENSE;
-	cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-		     ((cd->device->lun << 5) & 0xe0) : 0;
 	cgc.cmd[2] = 0x2a;
 	cgc.cmd[4] = 128;
 	cgc.buffer = buffer;
@@ -678,13 +672,6 @@
  */
 static int sr_packet(struct cdrom_device_info *cdi, struct cdrom_generic_command *cgc)
 {
-	Scsi_CD *cd = cdi->handle;
-	Scsi_Device *device = cd->device;
-	
-	/* set the LUN */
-	if (device->scsi_level <= SCSI_2)
-		cgc->cmd[1] |= device->lun << 5;
-
 	if (cgc->timeout <= 0)
 		cgc->timeout = IOCTL_TIMEOUT;
 
diff -Nru a/drivers/scsi/sr_ioctl.c b/drivers/scsi/sr_ioctl.c
--- a/drivers/scsi/sr_ioctl.c	Tue Oct 15 09:46:06 2002
+++ b/drivers/scsi/sr_ioctl.c	Tue Oct 15 09:46:06 2002
@@ -200,8 +200,6 @@
 
 	memset(&cgc, 0, sizeof(struct cdrom_generic_command));
 	cgc.cmd[0] = GPCMD_TEST_UNIT_READY;
-	cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-		     ((cd->device->lun) << 5) : 0;
 	cgc.quiet = 1;
 	cgc.data_direction = SCSI_DATA_NONE;
 	cgc.timeout = IOCTL_TIMEOUT;
@@ -215,8 +213,6 @@
 
 	memset(&cgc, 0, sizeof(struct cdrom_generic_command));
 	cgc.cmd[0] = GPCMD_START_STOP_UNIT;
-	cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-		     ((cd->device->lun) << 5) : 0;
 	cgc.cmd[4] = (pos == 0) ? 0x03 /* close */ : 0x02 /* eject */ ;
 	cgc.data_direction = SCSI_DATA_NONE;
 	cgc.timeout = IOCTL_TIMEOUT;
@@ -293,8 +289,6 @@
 
 	memset(&cgc, 0, sizeof(struct cdrom_generic_command));
 	cgc.cmd[0] = GPCMD_READ_SUBCHANNEL;
-	cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-		     ((cd->device->lun) << 5) : 0;
 	cgc.cmd[2] = 0x40;	/* I do want the subchannel info */
 	cgc.cmd[3] = 0x02;	/* Give me medium catalog number info */
 	cgc.cmd[8] = 24;
@@ -327,8 +321,6 @@
 
 	memset(&cgc, 0, sizeof(struct cdrom_generic_command));
 	cgc.cmd[0] = GPCMD_SET_SPEED;	/* SET CD SPEED */
-	cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-		     ((cd->device->lun) << 5) : 0;
 	cgc.cmd[2] = (speed >> 8) & 0xff;	/* MSB for speed (in kbytes/sec) */
 	cgc.cmd[3] = speed & 0xff;	/* LSB */
 	cgc.data_direction = SCSI_DATA_NONE;
@@ -361,8 +353,6 @@
 			struct cdrom_tochdr *tochdr = (struct cdrom_tochdr *) arg;
 
 			cgc.cmd[0] = GPCMD_READ_TOC_PMA_ATIP;
-			cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-				     ((cd->device->lun) << 5) : 0;
 			cgc.cmd[8] = 12;		/* LSB of length */
 			cgc.buffer = buffer;
 			cgc.buflen = 12;
@@ -382,8 +372,6 @@
 			struct cdrom_tocentry *tocentry = (struct cdrom_tocentry *) arg;
 
 			cgc.cmd[0] = GPCMD_READ_TOC_PMA_ATIP;
-			cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-				     ((cd->device->lun) << 5) : 0;
 			cgc.cmd[1] |= (tocentry->cdte_format == CDROM_MSF) ? 0x02 : 0;
 			cgc.cmd[6] = tocentry->cdte_track;
 			cgc.cmd[8] = 12;		/* LSB of length */
@@ -411,8 +399,6 @@
 		struct cdrom_ti* ti = (struct cdrom_ti*)arg;
 
 		cgc.cmd[0] = GPCMD_PLAYAUDIO_TI;
-		cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-			     (cd->device->lun << 5) : 0;
 		cgc.cmd[4] = ti->cdti_trk0;
 		cgc.cmd[5] = ti->cdti_ind0;
 		cgc.cmd[7] = ti->cdti_trk1;
@@ -463,9 +449,7 @@
 
 	memset(&cgc, 0, sizeof(struct cdrom_generic_command));
 	cgc.cmd[0] = GPCMD_READ_CD;	/* READ_CD */
-	cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-		     (cd->device->lun << 5) : 0;
-	cgc.cmd[1] |= ((format & 7) << 2);
+	cgc.cmd[1] = ((format & 7) << 2);
 	cgc.cmd[2] = (unsigned char) (lba >> 24) & 0xff;
 	cgc.cmd[3] = (unsigned char) (lba >> 16) & 0xff;
 	cgc.cmd[4] = (unsigned char) (lba >> 8) & 0xff;
@@ -521,8 +505,6 @@
 
 	memset(&cgc, 0, sizeof(struct cdrom_generic_command));
 	cgc.cmd[0] = GPCMD_READ_10;
-	cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-		     (cd->device->lun << 5) : 0;
 	cgc.cmd[2] = (unsigned char) (lba >> 24) & 0xff;
 	cgc.cmd[3] = (unsigned char) (lba >> 16) & 0xff;
 	cgc.cmd[4] = (unsigned char) (lba >> 8) & 0xff;
diff -Nru a/drivers/scsi/sr_vendor.c b/drivers/scsi/sr_vendor.c
--- a/drivers/scsi/sr_vendor.c	Tue Oct 15 09:46:06 2002
+++ b/drivers/scsi/sr_vendor.c	Tue Oct 15 09:46:06 2002
@@ -124,9 +124,7 @@
 #endif
 	memset(&cgc, 0, sizeof(struct cdrom_generic_command));
 	cgc.cmd[0] = MODE_SELECT;
-	cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-		     (cd->device->lun << 5) : 0;
-	cgc.cmd[1] |= (1 << 4);
+	cgc.cmd[1] = (1 << 4);
 	cgc.cmd[4] = 12;
 	modesel = (struct ccs_modesel_head *) buffer;
 	memset(modesel, 0, sizeof(*modesel));
@@ -180,8 +178,6 @@
 
 	case VENDOR_SCSI3:
 		cgc.cmd[0] = READ_TOC;
-		cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-			     (cd->device->lun << 5) : 0;
 		cgc.cmd[8] = 12;
 		cgc.cmd[9] = 0x40;
 		cgc.buffer = buffer;
@@ -210,9 +206,7 @@
 	case VENDOR_NEC:{
 			unsigned long min, sec, frame;
 			cgc.cmd[0] = 0xde;
-			cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-				     (cd->device->lun << 5) : 0;
-			cgc.cmd[1] |= 0x03;
+			cgc.cmd[1] = 0x03;
 			cgc.cmd[2] = 0xb0;
 			cgc.buffer = buffer;
 			cgc.buflen = 0x16;
@@ -242,9 +236,7 @@
 			/* we request some disc information (is it a XA-CD ?,
 			 * where starts the last session ?) */
 			cgc.cmd[0] = 0xc7;
-			cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-				     (cd->device->lun << 5) : 0;
-			cgc.cmd[1] |= 0x03;
+			cgc.cmd[1] = 0x03;
 			cgc.buffer = buffer;
 			cgc.buflen = 4;
 			cgc.quiet = 1;
@@ -272,8 +264,6 @@
 
 	case VENDOR_WRITER:
 		cgc.cmd[0] = READ_TOC;
-		cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-			     (cd->device->lun << 5) : 0;
 		cgc.cmd[8] = 0x04;
 		cgc.cmd[9] = 0x40;
 		cgc.buffer = buffer;
@@ -291,8 +281,6 @@
 			break;
 		}
 		cgc.cmd[0] = READ_TOC;	/* Read TOC */
-		cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-			     (cd->device->lun << 5) : 0;
 		cgc.cmd[6] = rc & 0x7f;	/* number of last session */
 		cgc.cmd[8] = 0x0c;
 		cgc.cmd[9] = 0x40;
diff -Nru a/drivers/scsi/st.c b/drivers/scsi/st.c
--- a/drivers/scsi/st.c	Tue Oct 15 09:46:06 2002
+++ b/drivers/scsi/st.c	Tue Oct 15 09:46:06 2002
@@ -379,8 +379,6 @@
 		}
 	}
 
-	if (SRpnt->sr_device->scsi_level <= SCSI_2)
-		cmd[1] |= (SRpnt->sr_device->lun << 5) & 0xe0;
 	init_completion(&STp->wait);
 	SRpnt->sr_use_sg = STp->buffer->do_dio || (bytes > (STp->buffer)->frp[0].length);
 	if (SRpnt->sr_use_sg) {
diff -Nru a/include/scsi/sg.h b/include/scsi/sg.h
--- a/include/scsi/sg.h	Tue Oct 15 09:46:06 2002
+++ b/include/scsi/sg.h	Tue Oct 15 09:46:06 2002
@@ -130,7 +130,7 @@
 
 /* following flag values can be "or"-ed together */
 #define SG_FLAG_DIRECT_IO 1     /* default is indirect IO */
-#define SG_FLAG_LUN_INHIBIT 2   /* default is overwrite lun in SCSI */
+#define SG_FLAG_UNUSED_LUN_INHIBIT 2   /* default is overwrite lun in SCSI */
 				/* command block (when <= SCSI_2) */
 #define SG_FLAG_MMAP_IO 4       /* request memory mapped IO */
 #define SG_FLAG_NO_DXFER 0x10000 /* no transfer of kernel buffers to/from */

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC PATCH] consolidate SCSI-2 command lun setting
  2002-10-15 16:55 ` [RFC PATCH] consolidate SCSI-2 command lun setting Patrick Mansfield
@ 2002-10-15 20:29   ` James Bottomley
  2002-10-15 22:00     ` Patrick Mansfield
  0 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-10-15 20:29 UTC (permalink / raw)
  To: linux-scsi

This all looks OK, except here:

+++ b/drivers/scsi/scsi_lib.c	Tue Oct 15 09:46:06 2002
@@ -972,6 +972,14 @@
 				continue;
 			}
 		}
+		/* 
+		 * If SCSI-2 or lower, store the LUN value in cmnd.
+		 */
+		if (SDpnt->scsi_level <= SCSI_2)
+			SCpnt->cmnd[1] = (SCpnt->cmnd[1] & 0x1f) |
+				(SCpnt->lun << 5 & 0xe0);
+
+
 		/*
 		 * Finally, initialize any error handling parameters, and set up
 		 * the timers for timeouts.

Shouldn't this code go early in scsi_dispatch_io instead? (with something 
similar in scsi_send_eh_cmnd)

James



^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [RFC PATCH] consolidate SCSI-2 command lun setting
  2002-10-15 20:29   ` James Bottomley
@ 2002-10-15 22:00     ` Patrick Mansfield
  0 siblings, 0 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-10-15 22:00 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

On Tue, Oct 15, 2002 at 01:29:59PM -0700, James Bottomley wrote:
> This all looks OK, except here:
> 
> +++ b/drivers/scsi/scsi_lib.c	Tue Oct 15 09:46:06 2002
> @@ -972,6 +972,14 @@
>  				continue;
>  			}
>  		}
> +		/* 
> +		 * If SCSI-2 or lower, store the LUN value in cmnd.
> +		 */
> +		if (SDpnt->scsi_level <= SCSI_2)
> +			SCpnt->cmnd[1] = (SCpnt->cmnd[1] & 0x1f) |
> +				(SCpnt->lun << 5 & 0xe0);
> +
> +
>  		/*
>  		 * Finally, initialize any error handling parameters, and set up
>  		 * the timers for timeouts.
> 
> Shouldn't this code go early in scsi_dispatch_io instead? (with something 
> similar in scsi_send_eh_cmnd)
> 
> James

James -

Thanks. Yes (I assume you meant scsi_dispatch_cmnd), that makes more sense,
and I added common code for scsi_send_eh_cmnd.
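
For readers following the byte arithmetic, the masking that the patch
centralizes can be sketched as a standalone helper. The function name below
is hypothetical and exists only for illustration; the expression itself is
the one used in the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function: merge a 3-bit LUN into the
 * top three bits of CDB byte 1 while keeping the low five bits that the
 * caller already set (for example the DBD bit in MODE SENSE).
 */
static uint8_t cdb_byte1_with_lun(uint8_t byte1, unsigned int lun)
{
	return (uint8_t)((byte1 & 0x1f) | ((lun << 5) & 0xe0));
}
```

Because the low five bits are preserved, the upper level drivers only need to
set their command-specific bits in byte 1 and can leave the LUN field at zero.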

New patch:

 drivers/scsi/osst.c       |    2 --
 drivers/scsi/scsi.c       |    6 ++++++
 drivers/scsi/scsi_error.c |   12 +++++-------
 drivers/scsi/scsi_ioctl.c |   16 ++++------------
 drivers/scsi/sd.c         |   22 +++++-----------------
 drivers/scsi/sg.c         |    4 ----
 drivers/scsi/sr.c         |   17 ++---------------
 drivers/scsi/sr_ioctl.c   |   20 +-------------------
 drivers/scsi/sr_vendor.c  |   18 +++---------------
 drivers/scsi/st.c         |    2 --
 include/scsi/sg.h         |    2 +-
 11 files changed, 27 insertions(+), 94 deletions(-)

diff -Nru a/drivers/scsi/osst.c b/drivers/scsi/osst.c
--- a/drivers/scsi/osst.c	Tue Oct 15 14:43:30 2002
+++ b/drivers/scsi/osst.c	Tue Oct 15 14:43:30 2002
@@ -322,8 +322,6 @@
 		}
 	}
 
-        if (SRpnt->sr_device->scsi_level <= SCSI_2)
-                cmd[1] |= (SRpnt->sr_device->lun << 5) & 0xe0;
         init_completion(&STp->wait);
 	SRpnt->sr_use_sg = (bytes > (STp->buffer)->sg[0].length) ?
 				    (STp->buffer)->use_sg : 0;
diff -Nru a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
--- a/drivers/scsi/scsi.c	Tue Oct 15 14:43:30 2002
+++ b/drivers/scsi/scsi.c	Tue Oct 15 14:43:30 2002
@@ -798,6 +798,12 @@
 		serial_number = 1;
 	SCpnt->serial_number = serial_number;
 	SCpnt->pid = scsi_pid++;
+	/* 
+	 * If SCSI-2 or lower, store the LUN value in cmnd.
+	 */
+	if (SCpnt->device->scsi_level <= SCSI_2)
+		SCpnt->cmnd[1] = (SCpnt->cmnd[1] & 0x1f) |
+			(SCpnt->lun << 5 & 0xe0);
 
 	/*
 	 * We will wait MIN_RESET_DELAY clock ticks after the last reset so
diff -Nru a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
--- a/drivers/scsi/scsi_error.c	Tue Oct 15 14:43:30 2002
+++ b/drivers/scsi/scsi_error.c	Tue Oct 15 14:43:30 2002
@@ -502,6 +502,10 @@
 	 */
 	scmd->owner = SCSI_OWNER_LOWLEVEL;
 
+	if (scmd->device->scsi_level <= SCSI_2)
+		scmd->cmnd[1] = (scmd->cmnd[1] & 0x1f) |
+			(scmd->lun << 5 & 0xe0);
+
 	if (host->can_queue) {
 		DECLARE_MUTEX_LOCKED(sem);
 
@@ -610,9 +614,6 @@
 	memcpy((void *) scmd->cmnd, (void *) generic_sense,
 	       sizeof(generic_sense));
 
-	if (scmd->device->scsi_level <= SCSI_2)
-		scmd->cmnd[1] = scmd->lun << 5;
-
 	scsi_result = (!scmd->host->hostt->unchecked_isa_dma)
 	    ? &scsi_result0[0] : kmalloc(512, GFP_ATOMIC | GFP_DMA);
 
@@ -839,9 +840,6 @@
 	memcpy((void *) scmd->cmnd, (void *) tur_command,
 	       sizeof(tur_command));
 
-	if (scmd->device->scsi_level <= SCSI_2)
-		scmd->cmnd[1] = scmd->lun << 5;
-
 	/*
 	 * zero the sense buffer.  the scsi spec mandates that any
 	 * untransferred sense data should be interpreted as being zero.
@@ -1419,7 +1417,7 @@
 	}
 
 	sreq->sr_cmnd[0] = ALLOW_MEDIUM_REMOVAL;
-	sreq->sr_cmnd[1] = (sdev->scsi_level <= SCSI_2) ? (sdev->lun << 5) : 0;
+	sreq->sr_cmnd[1] = 0;
 	sreq->sr_cmnd[2] = 0;
 	sreq->sr_cmnd[3] = 0;
 	sreq->sr_cmnd[4] = SCSI_REMOVAL_PREVENT;
diff -Nru a/drivers/scsi/scsi_ioctl.c b/drivers/scsi/scsi_ioctl.c
--- a/drivers/scsi/scsi_ioctl.c	Tue Oct 15 14:43:30 2002
+++ b/drivers/scsi/scsi_ioctl.c	Tue Oct 15 14:43:30 2002
@@ -160,7 +160,7 @@
 	       return 0;
 
 	scsi_cmd[0] = ALLOW_MEDIUM_REMOVAL;
-	scsi_cmd[1] = (dev->scsi_level <= SCSI_2) ? (dev->lun << 5) : 0;
+	scsi_cmd[1] = 0;
 	scsi_cmd[2] = 0;
 	scsi_cmd[3] = 0;
 	scsi_cmd[4] = state;
@@ -297,12 +297,6 @@
 	if(copy_from_user(buf, cmd_in + cmdlen, inlen))
 		goto error;
 
-	/*
-	 * Set the lun field to the correct value.
-	 */
-	if (dev->scsi_level <= SCSI_2)
-		cmd[1] = (cmd[1] & 0x1f) | (dev->lun << 5);
-
 	switch (opcode) {
 	case FORMAT_UNIT:
 		timeout = FORMAT_UNIT_TIMEOUT;
@@ -416,7 +410,6 @@
 int scsi_ioctl(Scsi_Device * dev, int cmd, void *arg)
 {
 	char scsi_cmd[MAX_COMMAND_SIZE];
-	char cmd_byte1;
 
 	/* No idea how this happens.... */
 	if (!dev)
@@ -431,7 +424,6 @@
 	if (!scsi_block_when_processing_errors(dev)) {
 		return -ENODEV;
 	}
-	cmd_byte1 = (dev->scsi_level <= SCSI_2) ? (dev->lun << 5) : 0;
 
 	switch (cmd) {
 	case SCSI_IOCTL_GET_IDLUN:
@@ -484,7 +476,7 @@
 		return scsi_set_medium_removal(dev, SCSI_REMOVAL_ALLOW);
 	case SCSI_IOCTL_TEST_UNIT_READY:
 		scsi_cmd[0] = TEST_UNIT_READY;
-		scsi_cmd[1] = cmd_byte1;
+		scsi_cmd[1] = 0;
 		scsi_cmd[2] = scsi_cmd[3] = scsi_cmd[5] = 0;
 		scsi_cmd[4] = 0;
 		return ioctl_internal_command((Scsi_Device *) dev, scsi_cmd,
@@ -492,7 +484,7 @@
 		break;
 	case SCSI_IOCTL_START_UNIT:
 		scsi_cmd[0] = START_STOP;
-		scsi_cmd[1] = cmd_byte1;
+		scsi_cmd[1] = 0;
 		scsi_cmd[2] = scsi_cmd[3] = scsi_cmd[5] = 0;
 		scsi_cmd[4] = 1;
 		return ioctl_internal_command((Scsi_Device *) dev, scsi_cmd,
@@ -500,7 +492,7 @@
 		break;
 	case SCSI_IOCTL_STOP_UNIT:
 		scsi_cmd[0] = START_STOP;
-		scsi_cmd[1] = cmd_byte1;
+		scsi_cmd[1] = 0;
 		scsi_cmd[2] = scsi_cmd[3] = scsi_cmd[5] = 0;
 		scsi_cmd[4] = 0;
 		return ioctl_internal_command((Scsi_Device *) dev, scsi_cmd,
diff -Nru a/drivers/scsi/sd.c b/drivers/scsi/sd.c
--- a/drivers/scsi/sd.c	Tue Oct 15 14:43:30 2002
+++ b/drivers/scsi/sd.c	Tue Oct 15 14:43:30 2002
@@ -402,8 +402,7 @@
 		nbuff, (rq_data_dir(SCpnt->request) == WRITE) ? 
 		"writing" : "reading", this_count, SCpnt->request->nr_sectors));
 
-	SCpnt->cmnd[1] = (SCpnt->device->scsi_level <= SCSI_2) ?
-			 ((SCpnt->lun << 5) & 0xe0) : 0;
+	SCpnt->cmnd[1] = 0;
 
 	if (((this_count > 0xff) || (block > 0x1fffff)) || SCpnt->device->ten) {
 		if (this_count > 0xffff)
@@ -815,9 +814,7 @@
 
 		while (retries < 3) {
 			cmd[0] = TEST_UNIT_READY;
-			cmd[1] = (sdp->scsi_level <= SCSI_2) ?
-				((sdp->lun << 5) & 0xe0) : 0;
-			memset((void *) &cmd[2], 0, 8);
+			memset((void *) &cmd[1], 0, 9);
 
 			SRpnt->sr_cmd_len = 0;
 			SRpnt->sr_sense_buffer[0] = 0;
@@ -851,9 +848,7 @@
 				printk(KERN_NOTICE "%s: Spinning up disk...",
 				       diskname);
 				cmd[0] = START_STOP;
-				cmd[1] = (sdp->scsi_level <= SCSI_2) ?
-					((sdp->lun << 5) & 0xe0) : 0;
-				cmd[1] |= 1;	/* Return immediately */
+				cmd[1] = 1;	/* Return immediately */
 				memset((void *) &cmd[2], 0, 8);
 				cmd[4] = 1;	/* Start spin cycle */
 				SRpnt->sr_cmd_len = 0;
@@ -894,7 +889,6 @@
 		   Scsi_Request *SRpnt, unsigned char *buffer) {
 
 	unsigned char cmd[10];
-	Scsi_Device *sdp = sdkp->device;
 	int the_result, retries;
 
 	retries = 3;
@@ -902,9 +896,7 @@
 
 		memset((void *) &cmd[0], 0, 10);
 		cmd[0] = MODE_SENSE;
-		cmd[1] = (sdp->scsi_level <= SCSI_2) ?
-			 ((sdp->lun << 5) & 0xe0) : 0;
-		cmd[1] |= 0x08;	/* DBD */
+		cmd[1] = 0x08;	/* DBD */
 		cmd[2] = 0x08;	/* current values, cache page */
 		cmd[4] = 128;	/* allocation length */
 
@@ -968,9 +960,7 @@
 	retries = 3;
 	do {
 		cmd[0] = READ_CAPACITY;
-		cmd[1] = (sdp->scsi_level <= SCSI_2) ?
-			((sdp->lun << 5) & 0xe0) : 0;
-		memset((void *) &cmd[2], 0, 8);
+		memset((void *) &cmd[1], 0, 9);
 		memset((void *) buffer, 0, 8);
 
 		SRpnt->sr_cmd_len = 0;
@@ -1090,7 +1080,6 @@
 
 	memset((void *) &cmd[0], 0, 8);
 	cmd[0] = MODE_SENSE;
-	cmd[1] = (sdp->scsi_level <= SCSI_2) ? ((sdp->lun << 5) & 0xe0) : 0;
 	cmd[2] = modepage;
 	cmd[4] = len;
 
@@ -1612,7 +1601,6 @@
 		unsigned char cmd[10] = { 0 };
 
 		cmd[0] = SYNCHRONIZE_CACHE;
-		cmd[1] = SDpnt->scsi_level <= SCSI_2 ? (SDpnt->lun << 5) & 0xe0 : 0;
 		/* leave the rest of the command zero to indicate 
 		 * flush everything */
 		scsi_wait_req(SRpnt, (void *)cmd, NULL, 0,
diff -Nru a/drivers/scsi/sg.c b/drivers/scsi/sg.c
--- a/drivers/scsi/sg.c	Tue Oct 15 14:43:30 2002
+++ b/drivers/scsi/sg.c	Tue Oct 15 14:43:30 2002
@@ -705,10 +705,6 @@
 	SRpnt->sr_request->rq_dev = sdp->i_rdev;
 	SRpnt->sr_sense_buffer[0] = 0;
 	SRpnt->sr_cmd_len = hp->cmd_len;
-	if (!(hp->flags & SG_FLAG_LUN_INHIBIT)) {
-		if (sdp->device->scsi_level <= SCSI_2)
-			cmnd[1] = (cmnd[1] & 0x1f) | (sdp->device->lun << 5);
-	}
 	SRpnt->sr_use_sg = srp->data.k_use_sg;
 	SRpnt->sr_sglist_len = srp->data.sglist_len;
 	SRpnt->sr_bufflen = srp->data.bufflen;
diff -Nru a/drivers/scsi/sr.c b/drivers/scsi/sr.c
--- a/drivers/scsi/sr.c	Tue Oct 15 14:43:30 2002
+++ b/drivers/scsi/sr.c	Tue Oct 15 14:43:30 2002
@@ -336,9 +336,7 @@
 		   (rq_data_dir(SCpnt->request) == WRITE) ? "writing" : "reading",
 				 this_count, SCpnt->request->nr_sectors));
 
-	SCpnt->cmnd[1] = (SCpnt->device->scsi_level <= SCSI_2) ?
-			 ((SCpnt->lun << 5) & 0xe0) : 0;
-
+	SCpnt->cmnd[1] = 0;
 	block = (unsigned int)SCpnt->request->sector / (s_size >> 9);
 
 	if (this_count > 0xffff)
@@ -486,9 +484,7 @@
 
 	do {
 		cmd[0] = READ_CAPACITY;
-		cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-			 ((cd->device->lun << 5) & 0xe0) : 0;
-		memset((void *) &cmd[2], 0, 8);
+		memset((void *) &cmd[1], 0, 9);
 		SRpnt->sr_request->rq_status = RQ_SCSI_BUSY;	/* Mark as really busy */
 		SRpnt->sr_cmd_len = 0;
 
@@ -599,8 +595,6 @@
 	}
 	memset(&cgc, 0, sizeof(struct cdrom_generic_command));
 	cgc.cmd[0] = MODE_SENSE;
-	cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-		     ((cd->device->lun << 5) & 0xe0) : 0;
 	cgc.cmd[2] = 0x2a;
 	cgc.cmd[4] = 128;
 	cgc.buffer = buffer;
@@ -678,13 +672,6 @@
  */
 static int sr_packet(struct cdrom_device_info *cdi, struct cdrom_generic_command *cgc)
 {
-	Scsi_CD *cd = cdi->handle;
-	Scsi_Device *device = cd->device;
-	
-	/* set the LUN */
-	if (device->scsi_level <= SCSI_2)
-		cgc->cmd[1] |= device->lun << 5;
-
 	if (cgc->timeout <= 0)
 		cgc->timeout = IOCTL_TIMEOUT;
 
diff -Nru a/drivers/scsi/sr_ioctl.c b/drivers/scsi/sr_ioctl.c
--- a/drivers/scsi/sr_ioctl.c	Tue Oct 15 14:43:30 2002
+++ b/drivers/scsi/sr_ioctl.c	Tue Oct 15 14:43:30 2002
@@ -200,8 +200,6 @@
 
 	memset(&cgc, 0, sizeof(struct cdrom_generic_command));
 	cgc.cmd[0] = GPCMD_TEST_UNIT_READY;
-	cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-		     ((cd->device->lun) << 5) : 0;
 	cgc.quiet = 1;
 	cgc.data_direction = SCSI_DATA_NONE;
 	cgc.timeout = IOCTL_TIMEOUT;
@@ -215,8 +213,6 @@
 
 	memset(&cgc, 0, sizeof(struct cdrom_generic_command));
 	cgc.cmd[0] = GPCMD_START_STOP_UNIT;
-	cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-		     ((cd->device->lun) << 5) : 0;
 	cgc.cmd[4] = (pos == 0) ? 0x03 /* close */ : 0x02 /* eject */ ;
 	cgc.data_direction = SCSI_DATA_NONE;
 	cgc.timeout = IOCTL_TIMEOUT;
@@ -293,8 +289,6 @@
 
 	memset(&cgc, 0, sizeof(struct cdrom_generic_command));
 	cgc.cmd[0] = GPCMD_READ_SUBCHANNEL;
-	cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-		     ((cd->device->lun) << 5) : 0;
 	cgc.cmd[2] = 0x40;	/* I do want the subchannel info */
 	cgc.cmd[3] = 0x02;	/* Give me medium catalog number info */
 	cgc.cmd[8] = 24;
@@ -327,8 +321,6 @@
 
 	memset(&cgc, 0, sizeof(struct cdrom_generic_command));
 	cgc.cmd[0] = GPCMD_SET_SPEED;	/* SET CD SPEED */
-	cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-		     ((cd->device->lun) << 5) : 0;
 	cgc.cmd[2] = (speed >> 8) & 0xff;	/* MSB for speed (in kbytes/sec) */
 	cgc.cmd[3] = speed & 0xff;	/* LSB */
 	cgc.data_direction = SCSI_DATA_NONE;
@@ -361,8 +353,6 @@
 			struct cdrom_tochdr *tochdr = (struct cdrom_tochdr *) arg;
 
 			cgc.cmd[0] = GPCMD_READ_TOC_PMA_ATIP;
-			cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-				     ((cd->device->lun) << 5) : 0;
 			cgc.cmd[8] = 12;		/* LSB of length */
 			cgc.buffer = buffer;
 			cgc.buflen = 12;
@@ -382,8 +372,6 @@
 			struct cdrom_tocentry *tocentry = (struct cdrom_tocentry *) arg;
 
 			cgc.cmd[0] = GPCMD_READ_TOC_PMA_ATIP;
-			cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-				     ((cd->device->lun) << 5) : 0;
 			cgc.cmd[1] |= (tocentry->cdte_format == CDROM_MSF) ? 0x02 : 0;
 			cgc.cmd[6] = tocentry->cdte_track;
 			cgc.cmd[8] = 12;		/* LSB of length */
@@ -411,8 +399,6 @@
 		struct cdrom_ti* ti = (struct cdrom_ti*)arg;
 
 		cgc.cmd[0] = GPCMD_PLAYAUDIO_TI;
-		cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-			     (cd->device->lun << 5) : 0;
 		cgc.cmd[4] = ti->cdti_trk0;
 		cgc.cmd[5] = ti->cdti_ind0;
 		cgc.cmd[7] = ti->cdti_trk1;
@@ -463,9 +449,7 @@
 
 	memset(&cgc, 0, sizeof(struct cdrom_generic_command));
 	cgc.cmd[0] = GPCMD_READ_CD;	/* READ_CD */
-	cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-		     (cd->device->lun << 5) : 0;
-	cgc.cmd[1] |= ((format & 7) << 2);
+	cgc.cmd[1] = ((format & 7) << 2);
 	cgc.cmd[2] = (unsigned char) (lba >> 24) & 0xff;
 	cgc.cmd[3] = (unsigned char) (lba >> 16) & 0xff;
 	cgc.cmd[4] = (unsigned char) (lba >> 8) & 0xff;
@@ -521,8 +505,6 @@
 
 	memset(&cgc, 0, sizeof(struct cdrom_generic_command));
 	cgc.cmd[0] = GPCMD_READ_10;
-	cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-		     (cd->device->lun << 5) : 0;
 	cgc.cmd[2] = (unsigned char) (lba >> 24) & 0xff;
 	cgc.cmd[3] = (unsigned char) (lba >> 16) & 0xff;
 	cgc.cmd[4] = (unsigned char) (lba >> 8) & 0xff;
diff -Nru a/drivers/scsi/sr_vendor.c b/drivers/scsi/sr_vendor.c
--- a/drivers/scsi/sr_vendor.c	Tue Oct 15 14:43:30 2002
+++ b/drivers/scsi/sr_vendor.c	Tue Oct 15 14:43:30 2002
@@ -124,9 +124,7 @@
 #endif
 	memset(&cgc, 0, sizeof(struct cdrom_generic_command));
 	cgc.cmd[0] = MODE_SELECT;
-	cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-		     (cd->device->lun << 5) : 0;
-	cgc.cmd[1] |= (1 << 4);
+	cgc.cmd[1] = (1 << 4);
 	cgc.cmd[4] = 12;
 	modesel = (struct ccs_modesel_head *) buffer;
 	memset(modesel, 0, sizeof(*modesel));
@@ -180,8 +178,6 @@
 
 	case VENDOR_SCSI3:
 		cgc.cmd[0] = READ_TOC;
-		cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-			     (cd->device->lun << 5) : 0;
 		cgc.cmd[8] = 12;
 		cgc.cmd[9] = 0x40;
 		cgc.buffer = buffer;
@@ -210,9 +206,7 @@
 	case VENDOR_NEC:{
 			unsigned long min, sec, frame;
 			cgc.cmd[0] = 0xde;
-			cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-				     (cd->device->lun << 5) : 0;
-			cgc.cmd[1] |= 0x03;
+			cgc.cmd[1] = 0x03;
 			cgc.cmd[2] = 0xb0;
 			cgc.buffer = buffer;
 			cgc.buflen = 0x16;
@@ -242,9 +236,7 @@
 			/* we request some disc information (is it a XA-CD ?,
 			 * where starts the last session ?) */
 			cgc.cmd[0] = 0xc7;
-			cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-				     (cd->device->lun << 5) : 0;
-			cgc.cmd[1] |= 0x03;
+			cgc.cmd[1] = 0x03;
 			cgc.buffer = buffer;
 			cgc.buflen = 4;
 			cgc.quiet = 1;
@@ -272,8 +264,6 @@
 
 	case VENDOR_WRITER:
 		cgc.cmd[0] = READ_TOC;
-		cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-			     (cd->device->lun << 5) : 0;
 		cgc.cmd[8] = 0x04;
 		cgc.cmd[9] = 0x40;
 		cgc.buffer = buffer;
@@ -291,8 +281,6 @@
 			break;
 		}
 		cgc.cmd[0] = READ_TOC;	/* Read TOC */
-		cgc.cmd[1] = (cd->device->scsi_level <= SCSI_2) ?
-			     (cd->device->lun << 5) : 0;
 		cgc.cmd[6] = rc & 0x7f;	/* number of last session */
 		cgc.cmd[8] = 0x0c;
 		cgc.cmd[9] = 0x40;
diff -Nru a/drivers/scsi/st.c b/drivers/scsi/st.c
--- a/drivers/scsi/st.c	Tue Oct 15 14:43:30 2002
+++ b/drivers/scsi/st.c	Tue Oct 15 14:43:30 2002
@@ -379,8 +379,6 @@
 		}
 	}
 
-	if (SRpnt->sr_device->scsi_level <= SCSI_2)
-		cmd[1] |= (SRpnt->sr_device->lun << 5) & 0xe0;
 	init_completion(&STp->wait);
 	SRpnt->sr_use_sg = STp->buffer->do_dio || (bytes > (STp->buffer)->frp[0].length);
 	if (SRpnt->sr_use_sg) {
diff -Nru a/include/scsi/sg.h b/include/scsi/sg.h
--- a/include/scsi/sg.h	Tue Oct 15 14:43:30 2002
+++ b/include/scsi/sg.h	Tue Oct 15 14:43:30 2002
@@ -130,7 +130,7 @@
 
 /* following flag values can be "or"-ed together */
 #define SG_FLAG_DIRECT_IO 1     /* default is indirect IO */
-#define SG_FLAG_LUN_INHIBIT 2   /* default is overwrite lun in SCSI */
+#define SG_FLAG_UNUSED_LUN_INHIBIT 2   /* default is overwrite lun in SCSI */
 				/* command block (when <= SCSI_2) */
 #define SG_FLAG_MMAP_IO 4       /* request memory mapped IO */
 #define SG_FLAG_NO_DXFER 0x10000 /* no transfer of kernel buffers to/from */


* [PATCH] 2.5 current bk fix setting scsi queue depths
@ 2002-10-30 16:58 ` Patrick Mansfield
  2002-10-30 17:17   ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Patrick Mansfield @ 2002-10-30 16:58 UTC (permalink / raw)
  To: Linus Torvalds, linux-kernel, linux-scsi

Hi -

This patch fixes a problem with the current linus bk tree setting
scsi queue depths to 1. Please apply.

Without the patch:

[patman@elm3a50 patman]$ cat /proc/scsi/sg/device_hdr /proc/scsi/sg/devices
host    chan    id      lun     type    opens   qdepth  busy    online
0       0       0       0       0       2       1       0       1
0       0       1       0       0       1       1       0       1
0       0       15      0       3       0       1       0       1

With the patch:

[patman@elm3a50 patman]$ cat /proc/scsi/sg/device_hdr /proc/scsi/sg/devices
host    chan    id      lun     type    opens   qdepth  busy    online
0       0       0       0       0       2       253     0       1
0       0       1       0       0       1       253     0       1
0       0       15      0       3       0       2       0       1


--- 1.51/drivers/scsi/scsi.c	Tue Oct 29 01:03:27 2002
+++ edited/drivers/scsi/scsi.c	Wed Oct 30 08:36:23 2002
@@ -1511,7 +1511,6 @@
 		kfree((char *) SCpnt);
 	}
 	SDpnt->current_queue_depth = 0;
-	SDpnt->new_queue_depth = 0;
 	spin_unlock_irqrestore(&device_request_lock, flags);
 }
 
-- Patrick Mansfield


* Re: [PATCH] 2.5 current bk fix setting scsi queue depths
  2002-10-30 16:58 ` [PATCH] 2.5 current bk fix setting scsi queue depths Patrick Mansfield
@ 2002-10-30 17:17   ` James Bottomley
  2002-10-30 18:05     ` Patrick Mansfield
  0 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-10-30 17:17 UTC (permalink / raw)
  To: Linus Torvalds, linux-kernel, linux-scsi

patmans@us.ibm.com said:
> This patch fixes a problem with the current linus bk tree setting scsi
> queue depths to 1. Please apply. 

This patch causes the depth specification to be retained when we release 
commandblocks.  Since releasing command blocks is supposed only to be done 
when we give up the device (and therefore, is supposed to clear everything), 
your fix looks like it's merely masking a problem, not fixing it.

Is the real problem that this controller is getting a release command blocks 
and then a reallocate of them after slave_attach is called?  If so, that's 
probably what needs to be fixed.

James


* Re: [PATCH] 2.5 current bk fix setting scsi queue depths
  2002-10-30 17:17   ` James Bottomley
@ 2002-10-30 18:05     ` Patrick Mansfield
  2002-10-31  0:44       ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Patrick Mansfield @ 2002-10-30 18:05 UTC (permalink / raw)
  To: James Bottomley; +Cc: Linus Torvalds, linux-kernel, linux-scsi

On Wed, Oct 30, 2002 at 11:17:52AM -0600, James Bottomley wrote:
> patmans@us.ibm.com said:
> > This patch fixes a problem with the current linus bk tree setting scsi
> > queue depths to 1. Please apply. 
> 
> This patch causes the depth specification to be retained when we release 
> commandblocks.  Since releasing command blocks is supposed only to be done 
> when we give up the device (and therefore, is supposed to clear everything), 
> your fix looks like it's merely masking a problem, not fixing it.
> 
> Is the real problem that this controller is getting a release command blocks 
> and then a reallocate of them after slave_attach is called?  If so, that's 
> probably what needs to be fixed.
> 
> James

Yes, the problem is that in scsi_register_host() if there are no upper
level drivers - the standard case if building no modules - we call
scsi_release_commandblocks even though we are NOT getting rid of
the scsi_device. So, with current code, new_queue_depth and
current_queue_depth are zero.

When we register upper level drivers in scsi_register_device(), we
call scsi_build_commandblocks (again), and get a queue depth of 1,
since we've cleared new_queue_depth.

(In many cases, for one device we call build command blocks twice, call
release command blocks, and then build command blocks again. Yuck)

Removing the scsi_release_commandblocks() in scsi_register_host()
would also fix the problem, and in most cases, would not waste any
space. In the worst case AFAICT it would waste one scsi_cmnd (about 300
or so bytes?).

I see no good reason to zero new_queue_depth in scsi_release_commandblocks,
as new_queue_depth is the desired queue depth, and should remain so until
scsi_adjust_queue_depth is called. Setting new_queue_depth to zero means
we have to call slave_attach again to set it right, and depending on what
else an adapter slave_attach does could be very wrong.
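
The build/release/build sequence described above can be modelled in a few
lines of user-space C. This is only a toy model: the field names follow the
discussion, but the function bodies are assumptions and not the actual
scsi.c code.

```c
#include <assert.h>

/* Toy stand-in for the two queue depth fields discussed above. */
struct toy_device {
	int current_queue_depth;	/* command blocks currently allocated */
	int new_queue_depth;		/* depth requested by the adapter */
};

static void toy_build_commandblocks(struct toy_device *d)
{
	/* Assumption: with no requested depth we end up with a depth of 1. */
	d->current_queue_depth = d->new_queue_depth ? d->new_queue_depth : 1;
}

static void toy_release_commandblocks(struct toy_device *d, int clear_requested)
{
	d->current_queue_depth = 0;
	if (clear_requested)
		d->new_queue_depth = 0;	/* the assignment the patch removes */
}

/* Build, release, then build again, as happens with no upper level driver. */
static int toy_depth_after_reattach(int requested, int clear_requested)
{
	struct toy_device d = { 0, requested };

	assert(requested >= 0);
	toy_build_commandblocks(&d);
	toy_release_commandblocks(&d, clear_requested);
	toy_build_commandblocks(&d);
	return d.current_queue_depth;
}
```

In this model, clearing the requested depth on release gives a depth of 1
after the second build, while keeping it gives back the requested value.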

-- Patrick Mansfield


* Re: [PATCH] 2.5 current bk fix setting scsi queue depths
  2002-10-30 18:05     ` Patrick Mansfield
@ 2002-10-31  0:44       ` James Bottomley
  0 siblings, 0 replies; 297+ messages in thread
From: James Bottomley @ 2002-10-31  0:44 UTC (permalink / raw)
  To: Linus Torvalds, linux-kernel, linux-scsi

> Yes, the problem is that in scsi_register_host() if there are no upper
> level drivers - the standard case if building no modules - we call
> scsi_release_commandblocks even though we are NOT getting rid of the
> scsi_device. So, with current code, new_queue_depth and
> current_queue_depth are zero.

But slave_attach isn't called here (even though it should be for attached 
devices).  I assume it's getting added by the scan between register_host & 
register_device.

> When we register upper level drivers in scsi_register_device(), we
> call scsi_build_commandblocks (again), and get a queue depth of 1,
> since we've cleared new_queue_depth. 

OK, we have a slight mess up here.  Perhaps the rule for slave_attach should 
be that we only call it if we actually have an upper level device attached (if 
we haven't, there's little point asking the HBA to allocate space for queueing 
for a device we're not currently using).  Then, we should do slave_attach when 
something actually decides to attach to the device.

If we follow this approach, slave_attach wouldn't be called until 
register_device in your problem scenario, and then everything would work as 
expected.

> Removing the scsi_release_commandblocks() in scsi_register_host()
> would also fix the problem, and in most cases, would not waste any
> space. In the worst case AFAICT it would waste one scsi_cmnd (about
> 300 or so bytes?). 

Well, if there's no device attached, there's no need for a queue.  This would 
waste 1 SCSI command per unattached device (and SCSI commands are DMA'able 
memory which is precious on some systems).  Right now, that's OK for small 
systems.  When we move to a lazy attachment model because we have an array 
with 65535 LUNs and we're only interested in one of them, it won't be.

> I see no good reason to zero new_queue_depth in
> scsi_release_commandblocks, as new_queue_depth is the desired queue
> depth, and should remain so until scsi_adjust_queue_depth is called.
> Setting new_queue_depth to zero means we have to call slave_attach
> again to set it right, and depending on what else an adapter
> slave_attach does could be very wrong.

Well, to my way of thinking, build and release commandblocks are like 
constructor and destructor for the device queue.  On general design 
principles, I don't like the idea of queue specific information persisting 
past its destruction.

James


* [patch 2.5] ips queue depths
@ 2002-10-15 18:55 Jeffery, David
  2002-10-15 19:30 ` Dave Hansen
  2002-10-15 19:47 ` Doug Ledford
  0 siblings, 2 replies; 297+ messages in thread
From: Jeffery, David @ 2002-10-15 18:55 UTC (permalink / raw)
  To: 'Dave Hansen'; +Cc: 'linux-scsi@vger.kernel.org'

[-- Attachment #1: Type: text/plain, Size: 379 bytes --]

Dave,

Here's a patch that should restore the queue depths to what they were before
the queue depths change was merged in.  Hopefully this will restore your lost
performance.  If I don't hear or find anything bad about it, it will be going
to Linus shortly.

And thanks goes to Mike Anderson for his initial version.  That made writing
this patch all the easier.

David Jeffery
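
The depth selection in the attached patch can be read as the standalone
function below. This is a sketch only: plain ints stand in for ha->max_cmds
and ha->enq->ucLogDriveCount, the function name is made up, and any values
fed to it are illustrative rather than taken from real ServeRAID hardware.

```c
/*
 * Sketch of the per-device depth choice: disks share (max_cmds - 1)
 * across the logical drives, but never drop below max_cmds / 4;
 * anything else gets a depth of 2.
 */
static int ips_depth_sketch(int is_disk, int max_cmds, int log_drive_count)
{
	int min = max_cmds / 4;
	int depth;

	if (is_disk && log_drive_count) {
		depth = (max_cmds - 1) / log_drive_count;
		if (depth < min)
			depth = min;	/* same effect as max(min, depth) */
	} else {
		depth = 2;
	}
	return depth;
}
```

Note that the log_drive_count test also guards the division, so a controller
reporting zero logical drives falls through to the depth of 2.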


[-- Attachment #2: ips2.5.42.patch --]
[-- Type: application/octet-stream, Size: 673 bytes --]

--- linux-2.5.42/drivers/scsi/ips.c	Tue Oct 15 14:34:16 2002
+++ linux-2.5.42_new/drivers/scsi/ips.c	Tue Oct 15 14:24:13 2002
@@ -1877,12 +1877,16 @@
 {
    ips_ha_t    *ha;
    int          min;
+   int          depth;
 
    ha = IPS_HA(SDptr->host);
    min = ha->max_cmds / 4;
-   if (min < 8)
-      min = ha->max_cmds - 1;
-   scsi_adjust_queue_depth(SDptr, MSG_ORDERED_TAG, min);
+   if((SDptr->type == TYPE_DISK) && ha->enq->ucLogDriveCount){
+      depth = (ha->max_cmds - 1) / ha->enq->ucLogDriveCount;
+      depth = max(min, depth);
+   } else
+      depth = 2;
+   scsi_adjust_queue_depth(SDptr, MSG_SIMPLE_TAG, depth);
    return 0;
 }
 


* Re: [patch 2.5] ips queue depths
  2002-10-15 18:55 [patch 2.5] ips " Jeffery, David
@ 2002-10-15 19:30 ` Dave Hansen
  2002-10-15 19:47 ` Doug Ledford
  1 sibling, 0 replies; 297+ messages in thread
From: Dave Hansen @ 2002-10-15 19:30 UTC (permalink / raw)
  To: Jeffery, David; +Cc: 'linux-scsi@vger.kernel.org'

[-- Attachment #1: Type: text/plain, Size: 888 bytes --]

Jeffery, David wrote:
> Dave,
> 
> Here's a patch that should restore the queue depths to what they were before
> the queue depths change was merged in.  Hopefully this will restore your lost
> performance.  If I don't hear or find anything bad about it, it will be going
> to Linus shortly.
> 
> And thanks goes to Mike Anderson for his initial version.  That made writing
> this patch all the easier.

Thank you for taking care of that so quickly.  I've tested it and it 
shows that I'm within 0.5% of where I was before.  This is well within 
the margin of error for my test.

On another (completely cosmetic)  note, ips.c is spitting out warnings 
because of the IPS_*LOCK_RESTORE macros which don't use the flags 
argument in 2.5.  If you're sending an update to Linus, could you 
include something like what I've attached to get rid of them?
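
The idiom in question is the usual do { ... } while (0) wrapper with a (void)
cast on the unused argument. A user-space sketch follows; the toy_ lock type
and functions are stand-ins invented so that it builds outside the kernel,
where the driver would use spin_lock() and spin_unlock().

```c
#include <assert.h>

/* Stand-in lock so the sketch compiles in user space. */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = 0; }

/*
 * do { ... } while (0) makes each macro behave as one statement, and the
 * (void) cast counts as a use of the otherwise ignored flags argument.
 */
#define TOY_LOCK_SAVE(lock, flags) \
	do { toy_lock_acquire(lock); (void)(flags); } while (0)
#define TOY_UNLOCK_RESTORE(lock, flags) \
	do { toy_lock_release(lock); (void)(flags); } while (0)

/* Returns 1 if the lock was held inside the critical section. */
static int toy_critical_section(struct toy_lock *l, int take)
{
	unsigned long flags = 0;
	int held_inside;

	if (take)
		TOY_LOCK_SAVE(l, flags);	/* fine in an unbraced if/else */
	else
		return 0;

	held_inside = l->held;
	TOY_UNLOCK_RESTORE(l, flags);
	return held_inside;
}
```

Each statement inside the braces needs its own semicolon, and both macros
need the leading do, otherwise the expansion will not compile.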
-- 
Dave Hansen
haveblue@us.ibm.com

[-- Attachment #2: ips-warning-fix-2.5.42+bk-0.patch --]
[-- Type: text/plain, Size: 633 bytes --]

--- ips.c.orig	Tue Oct 15 12:24:25 2002
+++ ips.c	Tue Oct 15 12:25:24 2002
@@ -248,8 +248,8 @@
 #else
     #define IPS_SG_ADDRESS(sg)      (page_address((sg)->page) ? \
                                      page_address((sg)->page)+(sg)->offset : 0)
-    #define IPS_LOCK_SAVE(lock,flags) spin_lock(lock)
-    #define IPS_UNLOCK_RESTORE(lock,flags) spin_unlock(lock)
+    #define IPS_LOCK_SAVE(lock,flags) do { spin_lock(lock); (void)flags; } while (0)
+    #define IPS_UNLOCK_RESTORE(lock,flags) do { spin_unlock(lock); (void)flags; } while (0)
 #endif
 
 #define IPS_DMA_DIR(scb) ((!scb->scsi_cmd || ips_is_passthru(scb->scsi_cmd) || \


* Re: [patch 2.5] ips queue depths
  2002-10-15 18:55 [patch 2.5] ips " Jeffery, David
  2002-10-15 19:30 ` Dave Hansen
@ 2002-10-15 19:47 ` Doug Ledford
  2002-10-15 20:04   ` Patrick Mansfield
                     ` (4 more replies)
  1 sibling, 5 replies; 297+ messages in thread
From: Doug Ledford @ 2002-10-15 19:47 UTC (permalink / raw)
  To: Jeffery, David
  Cc: 'Dave Hansen', 'linux-scsi@vger.kernel.org'

[-- Attachment #1: Type: text/plain, Size: 2734 bytes --]

On Tue, Oct 15, 2002 at 02:55:46PM -0400, Jeffery, David wrote:
> Dave,
> 
> Here's a patch that should restore the queue depths to what they were before
> the queue depths change was merged in.  Hopefully this will restore your lost
> performance.  If I don't hear or find anything bad about it, it will be going
> to Linus shortly.
> 
> And thanks goes to Mike Anderson for his initial version.  That made writing
> this patch all the easier.


I actually sent a patch to linus already to do this but it hasn't come 
through yet.  However, my patch and yours differ in one key point that I 
don't understand.

The scsi mid layer will never send you more than host->can_queue commands 
at one time, so why do all the scsi driver authors feel it is necessary to 
split their queue depth up amongst devices?  Justin Gibbs is the only one 
that gets it right IMHO.  Set the depth on each device as deep as that 
device can take (if that happens to be host->can_queue - 1 or such, then 
so be it).  Then, let the mid layer and the request function worry about 
fairness across devices.  That's its job after all.  If everyone is going 
to moderate their queue depths like that, then we might as well yank 
can_queue out of the host struct entirely because *it serves no purpose*.

Now, simple scsi controllers like the aic7xxx and intelligent raid 
controllers are two different beasts.  On simple controllers, the max 
queue depth on a target is typically whatever that target can handle.  On 
raid controllers, it's all up to what the firmware can handle (for all I 
know, the ServeRAID firmware may have one queue depth limit for total 
commands and a separate queue depth limit on each logical device).  So, I 
can't say for sure what the ips driver can and can't do, but I'm 
relatively sure that you are artificially limiting your own performance by 
handling queue depths the way you are.

Oh, also, host->cmd_per_lun is supposed to be the automatic queue depth
on untagged devices.  16 is typically way excessive for an untagged device
(and I'm not even sure you support untagged pass-through devices on ips,
and if you do I have no clue if the firmware will properly queue up
untagged commands on a pass through device), so I changed that as well.  
So, attached are the 3 different patches I sent to linus on this driver.  
I would be interested to know what happens to performance on this driver 
if you follow my suggestion of letting the mid layer take care of watching 
the card's maximum queue depth and start letting the drives have queues as 
deep as they can handle.
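
To make the argument concrete, here is a toy model of the two limits
involved. The names echo the thread (can_queue, per-device queue depth), but
this is an assumption-laden sketch and not the mid layer's actual dispatch
code.

```c
#include <assert.h>

/* Toy counters for a host and one of its devices. */
struct toy_host { int can_queue; int host_busy; };
struct toy_dev  { int queue_depth; int device_busy; };

/* Returns 1 and accounts for the command if both limits allow it. */
static int toy_try_dispatch(struct toy_host *h, struct toy_dev *d)
{
	assert(h->can_queue >= 0 && d->queue_depth >= 0);
	if (h->host_busy >= h->can_queue)
		return 0;	/* host-wide limit reached */
	if (d->device_busy >= d->queue_depth)
		return 0;	/* per-device limit reached */
	h->host_busy++;
	d->device_busy++;
	return 1;
}

/* Keep dispatching to one device until refused; returns commands accepted. */
static int toy_fill(struct toy_host *h, struct toy_dev *d)
{
	int n = 0;

	while (toy_try_dispatch(h, d))
		n++;
	return n;
}
```

In this model, giving every device a depth of can_queue - 1 lets a single
busy device use nearly the whole host queue, and the host-wide check still
caps the total. Splitting can_queue evenly across devices leaves host slots
idle whenever only one device has work.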


-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

[-- Attachment #2: ips.patch --]
[-- Type: text/plain, Size: 4907 bytes --]

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.754   -> 1.755  
#	  drivers/scsi/ips.c	1.26    -> 1.27   
#	  drivers/scsi/ips.h	1.9     -> 1.10   
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/10/11	dledford@aladin.rdu.redhat.com	1.755
# Make the rest of the world happy with ips again
# --------------------------------------------
#
diff -Nru a/drivers/scsi/ips.c b/drivers/scsi/ips.c
--- a/drivers/scsi/ips.c	Fri Oct 11 14:28:18 2002
+++ b/drivers/scsi/ips.c	Fri Oct 11 14:28:18 2002
@@ -433,6 +433,7 @@
 int ips_eh_reset(Scsi_Cmnd *);
 int ips_queue(Scsi_Cmnd *, void (*) (Scsi_Cmnd *));
 int ips_biosparam(Disk *, struct block_device *, int *);
+int ips_slave_attach(Scsi_Device *);
 const char * ips_info(struct Scsi_Host *);
 void do_ipsintr(int, void *, struct pt_regs *);
 static int ips_hainit(ips_ha_t *);
@@ -481,7 +482,7 @@
 static void ips_free_flash_copperhead(ips_ha_t *ha);
 static void ips_get_bios_version(ips_ha_t *, int);
 static void ips_identify_controller(ips_ha_t *);
-static void ips_select_queue_depth(struct Scsi_Host *, Scsi_Device *);
+//static void ips_select_queue_depth(struct Scsi_Host *, Scsi_Device *);
 static void ips_chkstatus(ips_ha_t *, IPS_STATUS *);
 static void ips_enable_int_copperhead(ips_ha_t *);
 static void ips_enable_int_copperhead_memio(ips_ha_t *);
@@ -1087,7 +1088,7 @@
          sh->n_io_port = io_addr ? 255 : 0;
          sh->unique_id = (io_addr) ? io_addr : mem_addr;
          sh->irq = irq;
-         sh->select_queue_depths = ips_select_queue_depth;
+         //sh->select_queue_depths = ips_select_queue_depth;
          sh->sg_tablesize = sh->hostt->sg_tablesize;
          sh->can_queue = sh->hostt->can_queue;
          sh->cmd_per_lun = sh->hostt->cmd_per_lun;
@@ -1827,7 +1828,7 @@
 /*   Select queue depths for the devices on the contoller                   */
 /*                                                                          */
 /****************************************************************************/
-static void
+/*static void
 ips_select_queue_depth(struct Scsi_Host *host, Scsi_Device *scsi_devs) {
    Scsi_Device *device;
    ips_ha_t    *ha;
@@ -1860,6 +1861,30 @@
       }
    }
 }
+*/
+
+/****************************************************************************/
+/*                                                                          */
+/* Routine Name: ips_slave_attach                                           */
+/*                                                                          */
+/* Routine Description:                                                     */
+/*                                                                          */
+/*   Set queue depths on devices once scan is complete                      */
+/*                                                                          */
+/****************************************************************************/
+int
+ips_slave_attach(Scsi_Device *SDptr)
+{
+   ips_ha_t    *ha;
+   int          min;
+
+   ha = IPS_HA(SDptr->host);
+   min = ha->max_cmds / 4;
+   if (min < 8)
+      min = ha->max_cmds - 1;
+   scsi_adjust_queue_depth(SDptr, MSG_ORDERED_TAG, min);
+   return 0;
+}
 
 /****************************************************************************/
 /*                                                                          */
@@ -7407,7 +7432,7 @@
     sh->n_io_port = io_addr ? 255 : 0;
     sh->unique_id = (io_addr) ? io_addr : mem_addr;
     sh->irq = irq;
-    sh->select_queue_depths = ips_select_queue_depth;
+    //sh->select_queue_depths = ips_select_queue_depth;
     sh->sg_tablesize = sh->hostt->sg_tablesize;
     sh->can_queue = sh->hostt->can_queue;
     sh->cmd_per_lun = sh->hostt->cmd_per_lun;
diff -Nru a/drivers/scsi/ips.h b/drivers/scsi/ips.h
--- a/drivers/scsi/ips.h	Fri Oct 11 14:28:18 2002
+++ b/drivers/scsi/ips.h	Fri Oct 11 14:28:18 2002
@@ -60,6 +60,7 @@
    extern int ips_eh_reset(Scsi_Cmnd *);
    extern int ips_queue(Scsi_Cmnd *, void (*) (Scsi_Cmnd *));
    extern int ips_biosparam(Disk *, struct block_device *, int *);
+   extern int ips_slave_attach(Scsi_Device *);
    extern const char * ips_info(struct Scsi_Host *);
    extern void do_ips(int, void *, struct pt_regs *);
 
@@ -481,7 +482,8 @@
     eh_host_reset_handler : ips_eh_reset, \
     abort : NULL,                         \
     reset : NULL,                         \
-    slave_attach : NULL,                  \
+    slave_attach : ips_slave_attach,      \
+    slave_detach : NULL,                  \
     bios_param : ips_biosparam,           \
     can_queue : 0,                        \
     this_id: -1,                          \

[-- Attachment #3: ip2-2.patch --]
[-- Type: text/plain, Size: 572 bytes --]

diff -Nru a/drivers/scsi/ips.c b/drivers/scsi/ips.c
--- a/drivers/scsi/ips.c	Fri Oct 11 16:42:41 2002
+++ b/drivers/scsi/ips.c	Fri Oct 11 16:42:41 2002
@@ -1879,10 +1879,12 @@
    int          min;
 
    ha = IPS_HA(SDptr->host);
-   min = ha->max_cmds / 4;
-   if (min < 8)
-      min = ha->max_cmds - 1;
-   scsi_adjust_queue_depth(SDptr, MSG_ORDERED_TAG, min);
+   if (SDptr->tagged_supported) {
+      min = ha->max_cmds / 2;
+      if (min <= 16)
+         min = ha->max_cmds - 1;
+      scsi_adjust_queue_depth(SDptr, MSG_ORDERED_TAG, min);
+   }
    return 0;
 }
 

[-- Attachment #4: ips.h.patch --]
[-- Type: text/plain, Size: 1952 bytes --]

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.782.2.2 -> 1.782.2.3
#	  drivers/scsi/ips.h	1.11    -> 1.12   
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/10/12	dledford@aladin.rdu.redhat.com	1.782.2.3
# ips.h:
#   Since we now have proper tagged depth setting, make
#   the cmd_per_lun value reasonable for untagged devices
#   like it is supposed to be.
# --------------------------------------------
#
diff -Nru a/drivers/scsi/ips.h b/drivers/scsi/ips.h
--- a/drivers/scsi/ips.h	Mon Oct 14 15:06:26 2002
+++ b/drivers/scsi/ips.h	Mon Oct 14 15:06:26 2002
@@ -429,7 +429,7 @@
     can_queue : 0,                        \
     this_id: -1,                          \
     sg_tablesize : IPS_MAX_SG,            \
-    cmd_per_lun: 16,                      \
+    cmd_per_lun: 3,                       \
     present : 0,                          \
     unchecked_isa_dma : 0,                \
     use_clustering : ENABLE_CLUSTERING,   \
@@ -458,7 +458,7 @@
     can_queue : 0,                        \
     this_id: -1,                          \
     sg_tablesize : IPS_MAX_SG,            \
-    cmd_per_lun: 16,                      \
+    cmd_per_lun: 3,                       \
     present : 0,                          \
     unchecked_isa_dma : 0,                \
     use_clustering : ENABLE_CLUSTERING,   \
@@ -488,7 +488,7 @@
     can_queue : 0,                        \
     this_id: -1,                          \
     sg_tablesize : IPS_MAX_SG,            \
-    cmd_per_lun: 16,                      \
+    cmd_per_lun: 3,                       \
     present : 0,                          \
     unchecked_isa_dma : 0,                \
     use_clustering : ENABLE_CLUSTERING,   \

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-15 19:47 ` Doug Ledford
@ 2002-10-15 20:04   ` Patrick Mansfield
  2002-10-15 20:52     ` Doug Ledford
  2002-10-15 20:10   ` Mike Anderson
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 297+ messages in thread
From: Patrick Mansfield @ 2002-10-15 20:04 UTC (permalink / raw)
  To: Jeffery, David, 'Dave Hansen',
	'linux-scsi@vger.kernel.org'

On Tue, Oct 15, 2002 at 03:47:06PM -0400, Doug Ledford wrote:

> The scsi mid layer will never send you more than host->can_queue commands 
> at one time, so why do all the scsi driver authors feel it is necessary to 
> split their queue depth up amongst devices?  Justin Gibbs is the only one 
> that gets it right IMHO.  Set the depth on each device as deep as that 
> device can take (if that happens to be host->can_queue - 1 or such, then 
> so be it).  Then, let the mid layer and the request function worry about 
> fairness across devices.  That's its job after all.  If everyone is going 
> to moderate their queue depths like that, then we might as well yank 
> can_queue out of the host struct entirely because *it serves no purpose*.

> Now, simple scsi controllers like the aic7xxx and intelligent raid 
> controllers are two different beasts.  On simple controllers, the max 
> queue depth on a target is typically whatever that target can handle.  On 
> raid controllers, it's all up to what the firmware can handle (for all I 
> know, the ServeRAID firmware may have one queue depth limit for total 
> commands and a separate queue depth limit on each logical device).  So, I 
> can't say for sure what the ips driver can and can't do, but I'm 
> relatively sure that you are artificially limiting your own performance by 
> handling queue depths the way you are.

I totally agree the queue depth should not be limited based on the number of
devices, but there should be a maximum limit - as Justin noted with the
aic driver, using a large default queue depth (he had 253 or so) is not
good with linux.

The queue depth should be as small as possible without limiting the
performance.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-15 20:04   ` Patrick Mansfield
@ 2002-10-15 20:52     ` Doug Ledford
  2002-10-15 23:30       ` Patrick Mansfield
  0 siblings, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-10-15 20:52 UTC (permalink / raw)
  To: Jeffery, David, 'Dave Hansen',
	'linux-scsi@vger.kernel.org'

On Tue, Oct 15, 2002 at 01:04:45PM -0700, Patrick Mansfield wrote:
> 
> I totally agree the queue depth should not be limited based on the number of
> devices, but there should be a maximum limit - as Justin noted with the
> aic driver, using a large default queue depth (he had 253 or so) is not
> good with linux.

That's largely due to the scsi_request_fn() in its current form.

> The queue depth should be as small as possible without limiting the
> performance.

Which happens to be a magic number that no one really knows I think ;-)  
Current goals of the work I'm doing include making this adjustable so you
aren't wasting queue depth on devices that have a hard limit less than
your initial queue depth value and changing the scsi_request_fn() to be
more fair with command allocation on a controller in the presence of
starvation (and of course, starvation is only possible when the queue
depth of all devices is greater than the depth of the controller, so
obviously the starvation code in scsi_request_fn() is something
controllers have been avoiding by limiting queue depths on individual
devices).
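
A minimal sketch of the starvation condition described above, assuming
each device is allowed dev_depth[i] outstanding commands and the host
accepts at most can_queue in total.  The helper name host_can_starve()
is hypothetical and is not part of the mid layer:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not a kernel function.  Returns 1 when the
 * combined per-device queue depths exceed the host's can_queue, i.e.
 * when the host queue can fill up before every device has reached its
 * own limit, so some device may be starved.  Returns 0 otherwise.
 */
int host_can_starve(int can_queue, const int *dev_depth, size_t ndev)
{
    long total = 0;
    size_t i;

    assert(can_queue >= 0);
    for (i = 0; i < ndev; i++) {
        assert(dev_depth[i] >= 0);
        total += dev_depth[i];
    }
    return total > can_queue;
}
```

With a host limit of 253 split evenly over four devices (63 each) the
function returns 0; giving each of the four devices the full 253
returns 1, which is the case the request function then has to arbitrate.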

The adjustable queue depth work has been mostly done and sent to linus.  
I'm trying to finish up the last few controllers that haven't been 
modified to the new scheme yet.  The request_fn work is on hold until this 
is done (because the breakage made everyone moan so loudly that I dropped 
everything else until this is taken care of).

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-15 20:52     ` Doug Ledford
@ 2002-10-15 23:30       ` Patrick Mansfield
  2002-10-15 23:56         ` Luben Tuikov
  2002-10-16  2:32         ` Doug Ledford
  0 siblings, 2 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-10-15 23:30 UTC (permalink / raw)
  To: Jeffery, David, 'Dave Hansen',
	'linux-scsi@vger.kernel.org'

On Tue, Oct 15, 2002 at 04:52:18PM -0400, Doug Ledford wrote:
> On Tue, Oct 15, 2002 at 01:04:45PM -0700, Patrick Mansfield wrote:
> > 
> > I totally agree the queue depth should not be limited based on the number of
> > devices, but there should be a maximum limit - as Justin noted with the
> > aic driver, using a large default queue depth (he had 253 or so) is not
> > good with linux.
> 
> That's largely due to the scsi_request_fn() in its current form.

I don't understand that.

> 
> > The queue depth should be as small as possible without limiting the
> > performance.
> 
> Which happens to be a magic number that no one really knows I think ;-)  

Yes, but that implies that adapters should not be relied upon to set
the queue depth. If the adapter is setting queue depth based upon what
the adapter knows, that is completely wrong - a regular disk and disk
array can end up with the same queue depth (this is not true for raid
cards like ips, where the adapter is the device).

> Current goals of the work I'm doing include making this adjustable so you
> aren't wasting queue depth on devices that have a hard limit less than

But, this won't lower it to some optimal magic number.

> your initial queue depth value and changing the scsi_request_fn() to be
> more fair with command allocation on a controller in the presence of
> starvation (and of course, starvation is only possible when the queue
> depth of all devices is greater than the depth of the controller, so
> obviously the starvation code in scsi_request_fn() is something
> controllers have been avoiding by limiting queue depths on individual
> devices).

It would be nice to have more device model/kernfs device attributes.

With your new queueing code, if we expose and allow setting new_queue_depth,
user code can easily modify the queue depth.

I wanted to write some code in this area but don't have enough time. It would
not be hard to have a scsi_sdev_attrs.c file, that using macros could
automagically create device attr files and functions for any selection of
Scsi_Device fields. Locking might be an issue - since we don't have a
Scsi_Device lock for queue depth, just one big lock, this is true for
some of the other Scsi_Device fields.

A single function could be called in scsi_scan.c to setup all Scsi_Device
device attribute files (i.e. calls device_create_file for multiple
xxx_attr_types).
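
A rough user-space sketch of the macro idea, assuming a cut-down stand-in
for Scsi_Device.  The names sdev_sketch, SDEV_INT_ATTR, sdev_attrs and
sdev_show are all hypothetical; a kernel version would register each
generated function through device_create_file() instead of the lookup
table used here, and would need the locking mentioned above:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a few Scsi_Device fields; the real struct has many more. */
struct sdev_sketch {
    int queue_depth;
    int new_queue_depth;
    int tagged_supported;
};

typedef int (*sdev_show_fn)(const struct sdev_sketch *, char *, size_t);

struct sdev_attr {
    const char *name;
    sdev_show_fn show;
};

/* Generate one "show" function per integer field. */
#define SDEV_INT_ATTR(field)                                          \
    static int sdev_show_##field(const struct sdev_sketch *sdev,      \
                                 char *buf, size_t len)               \
    {                                                                 \
        return snprintf(buf, len, "%d\n", sdev->field);               \
    }

SDEV_INT_ATTR(queue_depth)
SDEV_INT_ATTR(new_queue_depth)
SDEV_INT_ATTR(tagged_supported)

/* Table entry: attribute name is the stringified field name. */
#define SDEV_ATTR_ENTRY(field) { #field, sdev_show_##field }

static const struct sdev_attr sdev_attrs[] = {
    SDEV_ATTR_ENTRY(queue_depth),
    SDEV_ATTR_ENTRY(new_queue_depth),
    SDEV_ATTR_ENTRY(tagged_supported),
};

/* Look an attribute up by name; returns bytes formatted, or -1 if unknown. */
int sdev_show(const struct sdev_sketch *sdev, const char *name,
              char *buf, size_t len)
{
    size_t i;

    assert(sdev != NULL && name != NULL && buf != NULL);
    for (i = 0; i < sizeof(sdev_attrs) / sizeof(sdev_attrs[0]); i++)
        if (strcmp(sdev_attrs[i].name, name) == 0)
            return sdev_attrs[i].show(sdev, buf, len);
    return -1;
}
```

Adding another integer field then costs one SDEV_INT_ATTR() line and one
table entry, which is the "automagic" part.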

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-15 23:30       ` Patrick Mansfield
@ 2002-10-15 23:56         ` Luben Tuikov
  2002-10-16  2:32         ` Doug Ledford
  1 sibling, 0 replies; 297+ messages in thread
From: Luben Tuikov @ 2002-10-15 23:56 UTC (permalink / raw)
  To: linux-scsi

> > > The queue depth should be as small as possible without limiting the
> > > performance.
> >
> > Which happens to be a magic number that no one really knows I think ;-)
> 
> Yes, but that implies that adapters should not be relied upon to set
> the queue depth. If the adapter is setting queue depth based upon what
> the adapter knows, that is completely wrong - a regular disk and disk
> array can end up with the same queue depth (this is not true for raid
> cards like ips, where the adapter is the device).

If I'm reading this correctly, does this bash the ability to set the
queue depth by the LLDD?

This ability is ESSENTIAL, especially in the light of newer devices
coming into play. E.g. just imagine what kind of storage magic you can
do with an iSCSI Initiator(*) connected to fiber using LUN masking... and
setting the queue depth to 50 would be just <your fav. metaphor>. 

* This of course uses 64 bit LUN transparently (as an interconnect)...

> > Current goals of the work I'm doing include making this adjustable so you
> > aren't wasting queue depth on devices that have a hard limit less than
> 
> But, this won't lower it to some optimal magic number.

No, it won't. But that is NOT the point.

All he is saying is that it will allow for the MECHANISM,
and leave the policy to some other entity/entities to decide...

As far as I remember THIS is the whole beauty of the UNIX/Linux paradigm.

> It would be nice to have more device model/kernfs device attributes.
> 
> With your new queueing code, if we expose and allow setting new_queue_depth,
> user code can easily modify the queue depth.

Not bloody likely. To use the language of the other LT, this would be ``braindead''.

Linux is not a research project for someone to write a shell script
and experiment with different device tag queue depths from user space.

This will KILL the whole point of the matter. This should be a kernel issue.

This is the whole point of having an Operating System, (pause) and having
Device Drivers.

User space should NOT be concerned about this at all.

This is a kernel issue and the more you give userspace to play with kernel
issues the more the OS will degrade. That is, the user/kernel space border
should be clearly marked, so that both sides are happy.

If you don't believe me, ask Linus.

> I wanted to write some code in this area but don't have enough time. It would
> not be hard to have a scsi_sdev_attrs.c file, that using macros could
> automagically create device attr files and functions for any selection of
> Scsi_Device fields. Locking might be an issue - since we don't have a
> Scsi_Device lock for queue depth, just one big lock, this is true for
> some of the other Scsi_Device fields.

Overengineering will not lead to good things, definitely not in the long run,
given that so many subsystems come into play.

Let's keep the SCSI subsystem lean and clean, else the other block subsystem's
path awaits us...

-- 
Luben

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-15 23:30       ` Patrick Mansfield
  2002-10-15 23:56         ` Luben Tuikov
@ 2002-10-16  2:32         ` Doug Ledford
  2002-10-16 19:04           ` Patrick Mansfield
  1 sibling, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-10-16  2:32 UTC (permalink / raw)
  To: Jeffery, David, 'Dave Hansen',
	'linux-scsi@vger.kernel.org'

On Tue, Oct 15, 2002 at 04:30:57PM -0700, Patrick Mansfield wrote:
> On Tue, Oct 15, 2002 at 04:52:18PM -0400, Doug Ledford wrote:
> > On Tue, Oct 15, 2002 at 01:04:45PM -0700, Patrick Mansfield wrote:
> > > 
> > > I totally agree the queue depth should not be limited based on the number of
> > > devices, but there should be a maximum limit - as Justin noted with the
> > > aic driver, using a large default queue depth (he had 253 or so) is not
> > > good with linux.
> > 
> > That's largely due to the scsi_request_fn() in its current form.
> 
> I don't understand that.

The current scsi_request_fn() sucks rocks when it comes to any sort of 
fair load balancing in a host queue starvation scenario.

> > > The queue depth should be as small as possible without limiting the
> > > performance.
> > 
> > Which happens to be a magic number that no one really knows I think ;-)  
> 
> Yes, but that implies that adapters should not be relied upon to set
> the queue depth.

I draw exactly the opposite conclusion (look at the email from Luben 
Tuikov for several valid reasons why).

> If the adapter is setting queue depth based upon what
> the adapter knows, that is completely wrong

I strongly disagree here.  Only the adapter driver knows if the card 
itself has a hard limit of X commands at a time (so that deeper queues are 
wasted).  Only the adapter driver knows if the particular interconnect has 
any inherent queue limitations or speed limitations.  There are a thousand 
things only the adapter driver knows that should be factored into a sane 
queue depth.

> - a regular disk and disk
> array can end up with the same queue depth (this is not true for raid
> cards like ips, where the adapter is the device).

I don't see a problem here.  I have yet to meet a reasonable SCSI disk 
that doesn't do well with deep queues.  Raid devices I'm not so sure of, 
but I'll give them the benefit of the doubt and let them have a deep queue 
as well.  I would say the maximum speed of the interconnect combined with 
the maximum size of each command is a better gauge upon which to determine 
queue depth than whether it's a real disk or logical disk.

> > Current goals of the work I'm doing include making this adjustable so you
> > aren't wasting queue depth on devices that have a hard limit less than
> 
> But, this won't lower it to some optimal magic number.

No, the controller driver in question should already have its "optimum" 
number in mind, set the drive to that, then if it's too high it can come 
down.  There is no magical "Hey, send me more commands" status code, so 
you have to start high and go low, not the other way around.  That, of 
course, is why an adjustable depth is important.  The controller should 
set this "optimum" number based upon its own capabilities and assume the 
drive is just as capable.
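
A small sketch of "start high and go low", assuming the driver learns
how many commands the device was holding when it answered QUEUE FULL.
The struct and function names are hypothetical and only model the
policy; in the patches above the depth itself is set through
scsi_adjust_queue_depth():

```c
#include <assert.h>

/* Hypothetical per-device state; field names are illustrative only. */
struct dev_queue {
    int depth;  /* commands we currently allow outstanding */
};

/* Start at the driver's preferred ("optimum") depth. */
void dev_queue_init(struct dev_queue *q, int driver_optimum)
{
    assert(driver_optimum >= 1);
    q->depth = driver_optimum;
}

/*
 * The device returned QUEUE FULL while it was already holding
 * 'accepted' commands.  Lower our depth to that figure, never below 1,
 * and never raise it: there is no status code that asks for more.
 */
void dev_queue_full(struct dev_queue *q, int accepted)
{
    if (accepted < 1)
        accepted = 1;
    if (accepted < q->depth)
        q->depth = accepted;
}
```

Starting at 64 and seeing QUEUE FULL with 20 commands accepted leaves
the depth at 20; a later QUEUE FULL reported at 30 does not raise it
again.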

> > your initial queue depth value and changing the scsi_request_fn() to be
> > more fair with command allocation on a controller in the presence of
> > starvation (and of course, starvation is only possible when the queue
> > depth of all devices is greater than the depth of the controller, so
> > obviously the starvation code in scsi_request_fn() is something
> > controllers have been avoiding by limiting queue depths on individual
> > devices).
> 
> It would be nice to have more device model/kernfs device attributes.
> 
> With your new queueing code, if we expose and allow setting new_queue_depth,
> user code can easily modify the queue depth.

No.  For the reasons that the driver knows what it's capable of, there 
really is *very* little tunability here.  But, should you really need it, 
the drivers should provide it (both mine and Justin's aic7xxx drivers do 
so run time via module options or boot command options).  Make the default 
a good common default, let users override if necessary.  But I wouldn't 
make it a fiddle knob that people can tweak without thinking.  Regardless 
of where you make it adjustable though, it has to be passed through the 
low level drivers so that they can adjust their internal data structs as 
needed (some won't have to do anything, some will have to do allocations 
or frees on every adjustment, so you *have* to pass all adjustments 
through them).

> I wanted to write some code in this area but don't have enough time. It would
> not be hard to have a scsi_sdev_attrs.c file, that using macros could
> automagically create device attr files and functions for any selection of
> Scsi_Device fields. Locking might be an issue - since we don't have a
> Scsi_Device lock for queue depth, just one big lock, this is true for
> some of the other Scsi_Device fields.
> 
> A single function could be called in scsi_scan.c to setup all Scsi_Device
> device attribute files (i.e. calls device_create_file for multiple
> xxx_attr_types).
> 
> -- Patrick Mansfield

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-16  2:32         ` Doug Ledford
@ 2002-10-16 19:04           ` Patrick Mansfield
  2002-10-16 20:15             ` Doug Ledford
  2002-10-17  0:39             ` Luben Tuikov
  0 siblings, 2 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-10-16 19:04 UTC (permalink / raw)
  To: 'linux-scsi@vger.kernel.org'

On Tue, Oct 15, 2002 at 10:32:31PM -0400, Doug Ledford wrote:
> On Tue, Oct 15, 2002 at 04:30:57PM -0700, Patrick Mansfield wrote:
> 
> > If the adapter is setting queue depth based upon what
> > the adapter knows, that is completely wrong
> 
> I strongly disagree here.  Only the adapter driver knows if the card 
> itself has a hard limit of X commands at a time (so that deeper queues are 
> wasted).  Only the adapter driver knows if the particular interconnect has 
> any inherent queue limitations or speed limitations.  There are a thousand 
> things only the adapter driver knows that should be factored into a sane 
> queue depth.

OK, the adapter does not get it completely wrong, but it does not
know about special scsi device limitations, block layer limits, usage
patterns, or total number of scsi devices on the system.

Your changes (set high, adjust lower as we hit queue fulls) should
work fine in most cases, and is much better than the previous state.

Some example cases where we might want to lower queue depth:

System with small amounts of memory compared to the number of devices.

With many disks on a system, some with a very light load, it could be
useful to give the lightly loaded disks a lower queue depth so they use
less memory, or so they can do less IO.

In a 2 node cluster with shared devices, the queue depth could be set to
half of some hard limit on each node of the cluster, and avoid hitting
any hard queue fulls.

(It would be really nice if we could modify the number of struct request's 
allocated for little used or unused devices, or on character devices like
tape that don't even use the requests. Current block code allocates 2*128
of these on systems with lots of memory, this could save way more space
than lowering the queue depth.)

> > It would be nice to have more device model/kernfs device attributes.
> > 
> > With your new queueing code, if we expose and allow setting new_queue_depth,
> > user code can easily modify the queue depth.
> 
> No.  For the reasons that the driver knows what it's capable of, there 
> really is *very* little tunability here.  But, should you really need it, 
> the drivers should provide it (both mine and Justin's aic7xxx drivers do 
> so run time via module options or boot command options).  Make the default 
> a good common default, let users override if necessary.  But I wouldn't 
> make it a fiddle knob that people can tweak without thinking.  Regardless 
> of where you make it adjustable though, it has to be passed through the 
> low level drivers so that they can adjust their internal data structs as 
> needed (some won't have to do anything, some will have to do allocations 
> or frees on every adjustment, so you *have* to pass all adjustments 
> through them).

I was suggesting a common interface via a Scsi_Device device attribute
so the default depth can be modified as needed, rather than a fixed
boot or module load option that is fixed (once the driver is loaded)
and might be a different option for every adapter driver.

It's too bad we can't modify new_queue_depth and have all the layers
(well mid and lower) adjust accordingly.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-16 19:04           ` Patrick Mansfield
@ 2002-10-16 20:15             ` Doug Ledford
  2002-10-17  0:39             ` Luben Tuikov
  1 sibling, 0 replies; 297+ messages in thread
From: Doug Ledford @ 2002-10-16 20:15 UTC (permalink / raw)
  To: 'linux-scsi@vger.kernel.org'

On Wed, Oct 16, 2002 at 12:04:36PM -0700, Patrick Mansfield wrote:
> On Tue, Oct 15, 2002 at 10:32:31PM -0400, Doug Ledford wrote:
> > On Tue, Oct 15, 2002 at 04:30:57PM -0700, Patrick Mansfield wrote:
> > 
> > > If the adapter is setting queue depth based upon what
> > > the adapter knows, that is completely wrong
> > 
> > I strongly disagree here.  Only the adapter driver knows if the card 
> > itself has a hard limit of X commands at a time (so that deeper queues are 
> > wasted).  Only the adapter driver knows if the particular interconnect has 
> > any inherent queue limitations or speed limitations.  There are a thousand 
> > things only the adapter driver knows that should be factored into a sane 
> > queue depth.
> 
> OK, the adapter does not get it completely wrong, but it does not
> know about special scsi device limitations, block layer limits, usage
> patterns, or total number of scsi devices on the system.

All of these are special cases.

SCSI device limitations it will learn (that's the go lower part ;-).

Block layer limits need to be addressed.  My preferred way is to change
the block layer to know about our queue depth requirements: I would like
to see the block layer adjust the request queue depth whenever we adjust
our queue depth, but I'm not sure what this would take at this point in
time.

Usage patterns are something only an admin could reasonably tell us.
("Hey, this disk is only used for temporary storage when streaming to
tape, so it only needs minuscule resources" is valid, but special, and
trying to write a default for this scenario isn't feasible.)

The total number of scsi devices on the system is learnable, but I don't
think we should change our defaults over it, especially since hot plug
devices make this a dubious distinction at best.  I think the better
default is to assume that any machine with some huge number of drives
attached via SCSI is likely a huge machine and that the relatively small
X kilobytes of data we allocate statically per drive is OK.  The
aic7xxx_old driver allocates roughly 1k for each command, with a maximum
of 255 commands per controller regardless of total device queue depth on
that controller, so a maximum of about 256K per controller right now.
So 100 drives on 10 controllers would be a maximum of about 2.5MB of
RAM, and any machine with 100 disks shouldn't balk at 2.5MB of
aic7xxx_old data structs.  How much data is allocated in the block layer
and mid layer is not a figure I have at hand, though.
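
The arithmetic above can be checked with a few lines of C.  This is only
a sketch: the function name is hypothetical, and the inputs (about 1k per
command, at most 255 commands per controller) are the figures quoted in
this message, not values read from the driver source:

```c
#include <assert.h>

/*
 * Upper bound, in kilobytes, on per-command driver memory.  The number
 * of drives does not appear: the limit is per controller, regardless of
 * how many devices hang off it.
 */
long driver_cmd_mem_kb(int controllers, int max_cmds_per_controller,
                       int kb_per_cmd)
{
    assert(controllers >= 0);
    assert(max_cmds_per_controller >= 0);
    assert(kb_per_cmd >= 0);
    return (long)controllers * max_cmds_per_controller * kb_per_cmd;
}
```

One controller gives 255 kilobytes (the "256K" above) and ten
controllers give 2550 kilobytes, i.e. roughly 2.5MB, whether they carry
100 drives or 10.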

> Your changes (set high, adjust lower as we hit queue fulls) should
> work fine in most cases, and is much better than the previous state.

Good, then we agree ;-)

> Some example cases where we might want to lower queue depth:
> 
> System with small amounts of memory compared to the number of devices.
> 
> With many disks on a system, some with a very light load, it could be
> useful to give the lightly loaded disks a lower queue depth so they use
> less memory, or so they can do less IO.

Both of these are strong candidates for proper admin setup IMHO.

> In a 2 node cluster with shared devices, the queue depth could be set to
> half of some hard limit on each node of the cluster, and avoid hitting
> any hard queue fulls.

Maybe, but what if you want each node to be able to reach maximum 
performance under peak conditions and then fall back during more relaxed 
times?  This is the same sort of thing that all of our controllers have 
been doing in the past by limiting the queue depth of all devices on a 
controller to the controller's maximum depth and I contend it's a false 
optimization.  It permanently lowers the peak performance of each machine 
for fear that you might have contention for the device's queue instead of 
simply acting reasonable in the presence of QUEUE_FULL return codes.

> (It would be really nice if we could modify the number of struct request's 
> allocated for little used or unused devices, or on character devices like
> tape that don't even use the requests. Current block code allocates 2*128
> of these on systems with lots of memory, this could save way more space
> than lowering the queue depth.)

This I agree with 1000% and is on my list of things to investigate.

> I was suggesting a common interface via a Scsi_Device device attribute
> so the default depth can be modified as needed, rather than a fixed
> boot or module load option that is fixed (once the driver is loaded)
> and might be a different option for every adapter driver.
> 
> It's too bad we can't modify new_queue_depth and have all the layers
> (well mid and lower) adjust accordingly.

Well, I don't have any objection to being able to modify queue depths.  
However, the current problem is that none of the device drivers have the 
ability to accept events that tell them to change their queue depth on a 
device.  Instead, they all want to tell the mid layer what queue depth to 
use.  Now, because we can't just blindly shove more commands into a driver 
than it's expecting, you can't up the queue depth without telling the 
driver (lowering it is probably safe, although it is wasteful of allocated 
structs in memory).  As it stands, the mid layer *would* honor changing 
new_queue_depth.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-16 19:04           ` Patrick Mansfield
  2002-10-16 20:15             ` Doug Ledford
@ 2002-10-17  0:39             ` Luben Tuikov
  2002-10-17 17:01               ` Mike Anderson
  1 sibling, 1 reply; 297+ messages in thread
From: Luben Tuikov @ 2002-10-17  0:39 UTC (permalink / raw)
  To: linux-scsi

Patrick Mansfield wrote:
> 
> OK, the adapter does not get it completely wrong, but it does not
> know about special scsi device limitations, block layer limits, usage
> patterns, or total number of scsi devices on the system.
>

This is the dependency graph:

   block layer <-- SCSI core <-- SCSI LLDD.

where ``A <-- B'', means ``A depends on B''.

That is, the block layer (as an UPPER LAYER) should change its
parameters as suggested by the lower layers, since they operate
closer to the real devices, and NOT the other way around, as
has been suggested so many times here. (The whole point of an OS.)

That is, you should NOT force the device and you should NOT restrain
the device.

Also involving the elevator algorithm, merging, etc, is not appropriate
when we talk about such things as TCQ depths, just because those notions
belong to a different layer (upper at that).

> Some example cases where we might want to lower queue depth:
> 
> System with small amounts of memory compared to the number of devices.
> 
> With many disks on a system, some with a very light load, it could be
> useful to give the lightly loaded disks a lower queue depth so they use
> less memory, or so they can do less IO.
> 
> In a 2 node cluster with shared devices, the queue depth could be set to
> half of some hard limit on each node of the cluster, and avoid hitting
> any hard queue fulls.
> 
> (It would be really nice if we could modify the number of struct request's
> allocated for little used or unused devices, or on character devices like
> tape that don't even use the requests. Current block code allocates 2*128
> of these on systems with lots of memory, this could save way more space
> than lowering the queue depth.)
> 

Valid point, but what kind of memory allocator are you thinking of?

I'm thinking more along the lines of what was once suggested by Doug:
a pool of the objects and if we need one, just unhook it from its
struct list_head (great solution), and use it... (search linux-scsi for
the exact message)

That is, the queue becomes just a NUMBER, an int if you like, and
the resource management is centralized, thus wasting LESS resources,
as the number of resource users increases (OS 101).

Now let's take this step further, and _delegate_. Let's give resource
management to the lookaside cache (kmem_cache_create() and friends) and let
that (the) resource manager worry about whether it uses struct list_head,
or what not and how many pages it has preallocated and what not, and if
we have a problem with how fast or what not, we can get in touch with its
maintainers. (Though I've used that solution in my drivers and I get
_excellent_ performance.)

So you see, the tagged device queue itself would be a number rather than
wasted resources.
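
A rough userspace sketch of that idea, under stated assumptions: the names (sketch_pool, sketch_dev, sketch_get_cmd and friends) and the fixed pool size are made up for illustration, and a static array stands in for whatever kmem_cache_create() and friends would manage in the kernel. The point is only that the per-device queue depth is a plain counter checked against a shared pool.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_POOL_SIZE 8	/* assumed size, for illustration only */

struct sketch_cmd {
	struct sketch_cmd *next;	/* free-list link */
};

struct sketch_pool {
	struct sketch_cmd slots[SKETCH_POOL_SIZE];
	struct sketch_cmd *free;	/* head of the shared free list */
};

struct sketch_dev {
	int depth;	/* the queue depth: just a number */
	int busy;	/* commands currently outstanding */
};

void sketch_pool_init(struct sketch_pool *p)
{
	int i;

	p->free = NULL;
	for (i = 0; i < SKETCH_POOL_SIZE; i++) {
		p->slots[i].next = p->free;
		p->free = &p->slots[i];
	}
}

/* Take a command from the shared pool if the device is under its depth. */
struct sketch_cmd *sketch_get_cmd(struct sketch_pool *p, struct sketch_dev *d)
{
	struct sketch_cmd *c;

	if (d->busy >= d->depth || p->free == NULL)
		return NULL;
	c = p->free;
	p->free = c->next;
	d->busy++;
	return c;
}

/* Return a completed command to the shared pool. */
void sketch_put_cmd(struct sketch_pool *p, struct sketch_dev *d,
		    struct sketch_cmd *c)
{
	assert(d->busy > 0);
	c->next = p->free;
	p->free = c;
	d->busy--;
}
```

Note that a device with a depth of 200 costs nothing extra here until it actually has commands outstanding.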

> I was suggesting a common interface via a Scsi_Device device attribute
> so the default depth can be modified as needed, rather than a fixed
> boot or module load option that is fixed (once the driver is loaded)
> and might be a different option for every adapter driver.

This discussion should be dropped already.

Just imagine what would happen if a SCSI LLDD suddenly finds out that
its tagged device queue depth has been changed, what is it supposed to do?

Furthermore, you say ``so the default depth can be modified as needed'',
which contradicts the meaning of ``default''.

In fact the default setting wouldn't play much of a role, and would be quickly forgotten
as soon as the driver is run and disk/devices are connected to it. So in this
respect it has little significance.

Even if you have little memory (thin client) it makes sense to have a TCQ depth
of 200 if you're connected to a monster storage system, since if you send
200 tagged commands to /dev/sda they may NOT necessarily go to one ``device''.
Imagine that!

-- 
Luben

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-17  0:39             ` Luben Tuikov
@ 2002-10-17 17:01               ` Mike Anderson
  2002-10-17 21:13                 ` Luben Tuikov
  0 siblings, 1 reply; 297+ messages in thread
From: Mike Anderson @ 2002-10-17 17:01 UTC (permalink / raw)
  To: Luben Tuikov; +Cc: linux-scsi

Luben Tuikov [luben@splentec.com] wrote:
> Valid point, but what kind of memory allocator are you thinking of?
> 
> I'm thinking more along the lines of what was once suggested by Doug:
> a pool of the objects and if we need one, just unhook it from its
> struct list_head (great solution), and use it... (search linux-scsi for
> the exact message)
> 
> That is, the queue becomes just a NUMBER, an int if you like, and
> the resource management is centralized, thus wasting LESS resources,
> as the number of resource users increases (OS 101).
> 
> Now let's take this step further, and _delegate_. Let's give resource
> management to the lookaside cache (kmem_cache_create() and friends) and let
> that (the) resource manager worry about whether it uses struct list_head,
> or what not and how many pages it has preallocated and what not, and if
> we have a problem with how fast or what not, we can get in touch with its
> maintainers. (Though I've used that solution in my drivers and I get
> _excellent_ performance.)
> 
> So you see, the tagged device queue itself would be a number rather than
> wasted resources.
> 

I assume all devices will be guaranteed a minimum share of the pool;
otherwise, under memory pressure, we could be unable to do IO on swap devices.
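
One way such a guarantee could be expressed is an admission check on the shared pool. This is only a sketch of one possible policy: the reserve field and both function names are hypothetical, and nothing like this exists in the code being discussed. A device may take another command only while the commands left in the pool still cover what the other devices are owed.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_rdev {
	int reserve;	/* commands guaranteed to this device (assumed policy) */
	int busy;	/* commands it currently holds */
};

/* Reserved commands a device is entitled to but is not yet holding. */
int sketch_unmet(const struct sketch_rdev *d)
{
	return d->busy < d->reserve ? d->reserve - d->busy : 0;
}

/*
 * May device 'who' take one more command when 'nfree' are left in the pool?
 * Allowed only if the other devices' unmet reserves stay covered afterwards.
 */
int sketch_may_alloc(const struct sketch_rdev *devs, size_t ndevs,
		     size_t who, int nfree)
{
	int others = 0;
	size_t i;

	assert(who < ndevs);
	for (i = 0; i < ndevs; i++)
		if (i != who)
			others += sketch_unmet(&devs[i]);
	return nfree > others;
}
```

With an idle swap device holding a reserve of one, a busy data disk is refused the last free command while the swap device can still get it.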

> This discussion should be dropped already.
> 
> Just imagine what would happen if a SCSI LLDD suddenly finds out that
> its tagged device queue depth has been changed, what is it supposed to do?
> 
> Furthermore, you say ``so the default depth can be modified as needed'',
> which contradicts the meaning of ``default''.
> 
> In fact the default setting wouldn't play much of a role, and would be quickly forgotten
> as soon as the driver is run and disk/devices are connected to it. So in this
> respect it has little significance.
> 
> Even if you have little memory (thin client) it makes sense to have a TCQ depth
> of 200 if you're connected to a monster storage system, since if you send
> 200 tagged commands to /dev/sda they may NOT necessarily go to one ``device''.
> Imagine that!

Just because a device can accept a command and not return busy does not
mean it is always a good thing to give it 200 commands. Most larger
arrays will have dynamic queue depths and accept a lot of IO without
really working on it. If the delta of time is increasing on your IO then
at some point you are inefficient. We have already seen on some adapters
that reducing the queue depths achieved the same IO rate with
reduced CPU overhead. Which is a good thing as users usually want to do
something else besides IO.
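
The relationship behind this can be sketched with Little's law (outstanding commands = throughput * latency). The helper below is illustrative only: the function name is made up and the numbers in the note after it are assumed, not measurements from any adapter.

```c
#include <assert.h>

/*
 * Little's law: outstanding = throughput * latency.
 * Returns the mean per-command latency in microseconds for a queue kept
 * full at 'depth' commands while the device completes 'iops' commands/s.
 */
long sketch_latency_us(long depth, long iops)
{
	assert(depth > 0 && iops > 0);
	return depth * 1000000L / iops;
}
```

For a device that completes an assumed 10000 commands per second regardless of depth, a depth of 50 gives 5000 us per command and a depth of 200 gives 20000 us: the same IO rate, with four times the latency.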

If you are using shared resources (i.e. sg_table pool mem, the above
suggested scsi command pool, timers, etc) to avoid resource starvation
you might need to set limits outside of what a single LLDD believes is
best. More efficient implementations would also reduce some of the
overhead.

An administrator may want to adjust the overall policy for a specific
work load. Maybe proper values can be set and feedback information can
allow self adjustment, but there are probably workloads for which the
values are incorrect. What that policy is would depend on the resources.
Something similar to vm adjustments.

-andmike
--
Michael Anderson
andmike@us.ibm.com

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-17 17:01               ` Mike Anderson
@ 2002-10-17 21:13                 ` Luben Tuikov
  0 siblings, 0 replies; 297+ messages in thread
From: Luben Tuikov @ 2002-10-17 21:13 UTC (permalink / raw)
  To: linux-scsi

Mike Anderson wrote:
> 
> Luben Tuikov [luben@splentec.com] wrote:
> > Even if you have little memory (thin client) it makes sense to have a TCQ depth
> > of 200 if you're connected to a monster storage system, since if you send
> > 200 tagged commands to /dev/sda they may NOT necessarily go to one ``device''.
> > Imagine that!
> 
> Just because a device can accept a command and not return busy does not
> mean it is always a good thing to give it 200 commands. Most larger
> arrays will have dynamic queue depths and accept a lot of IO without
> really working on it. If the delta of time is increasing on your IO then
> at some point you are inefficient. We have already seen on some adapters
> that reducing the queue depths achieved the same IO rate with
> reduced CPU overhead. Which is a good thing as users usually want to do
> something else besides IO.

I repeat again: those commands will NOT go to the same device, but
after tier one, they'll be sent out to different devices and
probably execute concurrently, and come back out of order,
at which point the interconnect will (may) order them, before
returning status.

> If you are using shared resources (i.e. sg_table pool mem, the above
> suggested scsi command pool, timers, etc) to avoid resource starvation
> you might need to set limits outside of what a single LLDD believes is
> best. More efficient implementations would also reduce some of the
> overhead.
> 
> An administrator may want to adjust the overall policy for a specific
> work load. Maybe proper values can be set and feedback information can
> allow self adjustment, but there are probably workloads for which the
> values are incorrect. What that policy is would depend on the resources.
> Something similar to vm adjustments.

Cannot comment on general statements like this.
A more concrete example would be needed.

-- 
Luben

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-15 19:47 ` Doug Ledford
  2002-10-15 20:04   ` Patrick Mansfield
@ 2002-10-15 20:10   ` Mike Anderson
  2002-10-15 20:24     ` Doug Ledford
  2002-10-15 20:38     ` James Bottomley
  2002-10-15 20:24   ` Mike Anderson
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 297+ messages in thread
From: Mike Anderson @ 2002-10-15 20:10 UTC (permalink / raw)
  To: Jeffery, David, 'Dave Hansen',
	'linux-scsi@vger.kernel.org'

Doug Ledford [dledford@redhat.com] wrote:
> I actually sent a patch to linus already to do this but it hasn't come 
> through yet.  However, my patch and yours differ in one key point that I 
> don't understand.
> 
> The scsi mid layer will never send you more than host->can_queue commands 
> at one time, so why do all the scsi driver authors feel it is necessary to 
> split their queue depth up amongst devices?  Justin Gibbs is the only one 
> that gets it right IMHO.  Set the depth on each device as deep as that 
> device can take (if that happens to be host->can_queue - 1 or such, then 
> so be it).  Then, let the mid layer and the request function worry about 
> fairness across devices.  That's its job after all.  If everyone is going 
> to moderate their queue depths like that, then we might as well yank 
> can_queue out of the host struct entirely because *it serves no purpose*.
> commands and a separate queue depth limit on each logical device).  So, I 
> can't say for sure what the ips driver can and can't do, but I'm 
> relatively sure that you are artificially limiting your own performance by 
> handling queue depths the way you are.

I have never seen the mid layer handle device starvation correctly. It might
be because in scsi_queue_next_request we do this:

/*
 * Just hit the requeue function for the queue.
 */
  q->request_fn(q);

before we check for starved.


-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-15 20:10   ` Mike Anderson
@ 2002-10-15 20:24     ` Doug Ledford
  2002-10-15 20:38     ` James Bottomley
  1 sibling, 0 replies; 297+ messages in thread
From: Doug Ledford @ 2002-10-15 20:24 UTC (permalink / raw)
  To: Jeffery, David, 'Dave Hansen',
	'linux-scsi@vger.kernel.org'

On Tue, Oct 15, 2002 at 01:10:22PM -0700, Mike Anderson wrote:
> I have never seen the mid layer handle device starvation correctly. It might
> be because in scsi_queue_next_request we do this:
> 
> /*
>  * Just hit the requeue function for the queue.
>  */
>   q->request_fn(q);
> 
> before we check for starved.

Don't pay too much attention to the junk that's there now...

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-15 20:10   ` Mike Anderson
  2002-10-15 20:24     ` Doug Ledford
@ 2002-10-15 20:38     ` James Bottomley
  2002-10-15 22:10       ` Mike Anderson
  1 sibling, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-10-15 20:38 UTC (permalink / raw)
  To: Jeffery, David, 'Dave Hansen',
	'linux-scsi@vger.kernel.org'

andmike@us.ibm.com said:
> I have never seen the mid layer handle device starvation correctly. It
> might be because in scsi_queue_next_request we do this:

> /*
>  * Just hit the requeue function for the queue.
>  */
>   q->request_fn(q);

> before we check for starved. 

This starvation code has nothing to do with fairness.  Its job is purely to
restart the device queue if we got a host blocked condition and we had to 
reject commands with zero depth.  When the host blocked abates then we loop 
over all the starved devices and call their request functions.  Without this, 
we could get a hang in the block layer queueing.

This could do with being converted over to the blk_stop/start_queue api.

The SCSI code seems to have been written with the idea that device queue_depth 
<< can_queue and thus doesn't concern itself with resource fairness issues 
when the two are closer in size to each other.
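
As a simplified model of the restart logic described above (not the actual mid layer code): the struct and function names are hypothetical, a counter stands in for calling the device's request function, and the sketch assumes each restarted device immediately queues one command.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_sdev {
	int starved;		/* a command was rejected at zero depth */
	int restarts;		/* stand-in for calling q->request_fn(q) */
	struct sketch_sdev *next;
};

struct sketch_host {
	int can_queue;
	int host_busy;
	int some_device_starved;
	struct sketch_sdev *devices;
};

/* Restart starved devices while the host still has free slots. */
int sketch_restart_starved(struct sketch_host *h)
{
	struct sketch_sdev *d;
	int restarted = 0;

	if (!h->some_device_starved)
		return 0;
	for (d = h->devices; d != NULL; d = d->next) {
		if (h->can_queue > 0 && h->host_busy >= h->can_queue)
			break;	/* host is full again, stop here */
		if (!d->starved)
			continue;
		d->starved = 0;
		d->restarts++;	/* real code would run the request function */
		h->host_busy++;	/* assume the device queues one command */
		restarted++;
	}
	/* Recompute the host-wide flag from what is still starved. */
	h->some_device_starved = 0;
	for (d = h->devices; d != NULL; d = d->next)
		if (d->starved)
			h->some_device_starved = 1;
	return restarted;
}
```

Nothing here tries to be fair between devices; it only makes sure a device that was turned away with nothing outstanding gets kicked again.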

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-15 20:38     ` James Bottomley
@ 2002-10-15 22:10       ` Mike Anderson
  2002-10-16  1:04         ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Mike Anderson @ 2002-10-15 22:10 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jeffery, David, 'Dave Hansen',
	'linux-scsi@vger.kernel.org'

James Bottomley [James.Bottomley@steeleye.com] wrote:
> andmike@us.ibm.com said:
> > I have never seen the mid layer handle device starvation correctly. It
> > might be because in scsi_queue_next_request we do this:
> 
> > /*
> >  * Just hit the requeue function for the queue.
> >  */
> >   q->request_fn(q);
> 
> > before we check for starved. 
> 
> This starvation code has nothing to do with fairness.  Its job is purely to
> restart the device queue if we got a host blocked condition and we had to 
> reject commands with zero depth.  When the host blocked abates then we loop 
> over all the starved devices and call their request functions.  Without this, 
> we could get a hang in the block layer queueing.

I may be misreading what you're saying.

Ok it will not handle fairness, but it will trigger a starved and
some_device_starved case when SHpnt->can_queue > 0 and SHpnt->host_busy >=
SHpnt->can_queue. Which is the case we were discussing and is not a
host blocked condition. The reverse ordering of the request_fn call
could cause a device to stay starved if the device queue depth is near
or equal to the host can_queue.

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-15 22:10       ` Mike Anderson
@ 2002-10-16  1:04         ` James Bottomley
  0 siblings, 0 replies; 297+ messages in thread
From: James Bottomley @ 2002-10-16  1:04 UTC (permalink / raw)
  To: Jeffery, David, 'Dave Hansen',
	'linux-scsi@vger.kernel.org'

andmike@us.ibm.com said:
> Ok it will not handle fairness, but it will trigger a starved and
> some_device_starved case when SHpnt->can_queue > 0 and SHpnt->
> host_busy >= SHpnt->can_queue. Which is the case we were discussing
> and is not a host blocked condition.

Yes, but this still only activates if the device trying to queue has no 
currently outstanding commands.  Usually under heavy load to multiple LUNs, 
you probably won't run into this too often.

The way the mid-layer expects to operate under load is that a returning 
command for a given device triggers the request queue for that device to queue 
up the next waiting command.

> The reverse ordering of the request_fn call could cause a device to
> stay starved if the device queue depth is near or equal to the host
> can_queue.

True, I suppose, but only in the limited case where there are no queued 
commands for a particular device.  The most likely case is that one device has 
stolen a huge number of slots and everyone else gets a tiny number, which 
won't activate this code at all.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-15 19:47 ` Doug Ledford
  2002-10-15 20:04   ` Patrick Mansfield
  2002-10-15 20:10   ` Mike Anderson
@ 2002-10-15 20:24   ` Mike Anderson
  2002-10-15 22:46     ` Doug Ledford
  2002-10-15 20:26   ` Luben Tuikov
  2002-10-21  7:28   ` Mike Anderson
  4 siblings, 1 reply; 297+ messages in thread
From: Mike Anderson @ 2002-10-15 20:24 UTC (permalink / raw)
  To: Jeffery, David, 'Dave Hansen',
	'linux-scsi@vger.kernel.org'

Style question below.

Doug Ledford [dledford@redhat.com] wrote:
> @@ -481,7 +482,8 @@
>      eh_host_reset_handler : ips_eh_reset, \
>      abort : NULL,                         \
>      reset : NULL,                         \
> -    slave_attach : NULL,                  \
> +    slave_attach : ips_slave_attach,      \
> +    slave_detach : NULL,                  \
>      bios_param : ips_biosparam,           \
>      can_queue : 0,                        \
>      this_id: -1,                          \

Can we stop setting struct members to NULL. If we later change the
interface it makes the patch just that much larger. 

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-15 20:24   ` Mike Anderson
@ 2002-10-15 22:46     ` Doug Ledford
  0 siblings, 0 replies; 297+ messages in thread
From: Doug Ledford @ 2002-10-15 22:46 UTC (permalink / raw)
  To: Jeffery, David, 'Dave Hansen',
	'linux-scsi@vger.kernel.org'

On Tue, Oct 15, 2002 at 01:24:50PM -0700, Mike Anderson wrote:
> Style question below.
> 
> Doug Ledford [dledford@redhat.com] wrote:
> > @@ -481,7 +482,8 @@
> >      eh_host_reset_handler : ips_eh_reset, \
> >      abort : NULL,                         \
> >      reset : NULL,                         \
> > -    slave_attach : NULL,                  \
> > +    slave_attach : ips_slave_attach,      \
> > +    slave_detach : NULL,                  \
> >      bios_param : ips_biosparam,           \
> >      can_queue : 0,                        \
> >      this_id: -1,                          \
> 
> Can we stop setting struct members to NULL. If we later change the
> interface it makes the patch just that much larger. 

Sure, I don't care.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-15 19:47 ` Doug Ledford
                     ` (2 preceding siblings ...)
  2002-10-15 20:24   ` Mike Anderson
@ 2002-10-15 20:26   ` Luben Tuikov
  2002-10-15 21:27     ` Patrick Mansfield
  2002-10-21  7:28   ` Mike Anderson
  4 siblings, 1 reply; 297+ messages in thread
From: Luben Tuikov @ 2002-10-15 20:26 UTC (permalink / raw)
  To: linux-scsi

Doug Ledford wrote:
> 
> The scsi mid layer will never send you more than host->can_queue commands
> at one time, so why do all the scsi driver authors feel it is necessary to
> split their queue depth up amongst devices?  Justin Gibbs is the only one
> that gets it right IMHO.  Set the depth on each device as deep as that
> device can take (if that happens to be host->can_queue - 1 or such, then
> so be it).  Then, let the mid layer and the request function worry about
> fairness across devices.  That's its job after all.  If everyone is going
> to moderate their queue depths like that, then we might as well yank
> can_queue out of the host struct entirely because *it serves no purpose*.
> [...]

I agree with Doug on the functionality splitting.
Regarding bashing SCSI core that it doesn't handle the device
q's (starvation) properly: nevertheless it is _its_ function to do
so and this should be observed. Soon enough it will get it right.

There should be a functionality split and LLDD should abide by it --
in the worst case, LLDD can play with the max tagged cmnds number.

Re. bashing the too large a number of tags allowed -- this actually
makes sense sometimes -- e.g. an iSCSI Initiator wants to have
as many tagged commands under its control rather than they
be waiting in SCSI core. If the mid layer has a problem with
a scsi tagged command, then it can notify the LLDD and it
will resolve this (*** please see my next Q. posting ***).

Statements like this: ``The queue depth should be as small as
possible without limiting the performance.'' cannot be generalized
to all devices, be it hardware or software emulations (YKWIM).

-- 
Luben

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-15 20:26   ` Luben Tuikov
@ 2002-10-15 21:27     ` Patrick Mansfield
  2002-10-16  0:43       ` Luben Tuikov
  0 siblings, 1 reply; 297+ messages in thread
From: Patrick Mansfield @ 2002-10-15 21:27 UTC (permalink / raw)
  To: Luben Tuikov; +Cc: linux-scsi

On Tue, Oct 15, 2002 at 04:26:45PM -0400, Luben Tuikov wrote:

> Re. bashing the too large a number of tags allowed -- this actually
> makes sense sometimes -- e.g. an iSCSI Initiator wants to have
> as many tagged commands under its control rather than they
> be waiting in SCSI core. If the mid layer has a problem with
> a scsi tagged command, then it can notify the LLDD and it
> will resolve this (*** please see my next Q. posting ***).
> 
> Statements like this: ``The queue depth should be as small as
> possible without limiting the performance.'' cannot be generalized
> to all devices, be it hardware or software emulations (YKWIM).
> 

I'm saying don't set the queue depth really high when it gives no or
very little performance gain. If an adapter driver finds that a large
queue depth helps more than it hurts for all IO loads (for sequential
as well as random IO), go ahead, but I would guess that queue depths over
100 give zero or very little performance gain compared to a queue depth
of say 50 for most devices. I was trying to run some tests on this
in the past but never had time to get it working well, plus it would have
been for only two different devices (disk and disk array), and the
drives I have are not really fast (20 MB/sec for disk, about 50 MB/sec for
the disk array).

What is really needed are IO performance numbers for varying queue depths.

With 2.5, the number of commands outstanding to the device is not
subtracted from the blk request queue size (we don't release a blk request
until the IO is completed, there is no call to blkdev_release_request in
scsi_request_fn) - this means large queue depths will cause the blk request
queue to fill up and even be full without any available blk request queue
commands to merge or sort with.

There are also issues like Andrew had with the read latency - although
his benchmark is artificial, and has more to do with too many dirty
pages, it still showed that higher queue depths can have an impact
on interactive performance (i.e. read latencies).

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-15 21:27     ` Patrick Mansfield
@ 2002-10-16  0:43       ` Luben Tuikov
  0 siblings, 0 replies; 297+ messages in thread
From: Luben Tuikov @ 2002-10-16  0:43 UTC (permalink / raw)
  To: linux-scsi

Patrick Mansfield wrote:
> 
> I'm saying don't set the queue depth really high when it gives no or
> very little performance gain. If an adapter driver finds that a large
> queue depth helps more than it hurts for all IO loads (for sequential
> as well as random IO), go ahead, but I would guess that queue depths over
> 100 give zero or very little performance gain compared to a queue depth
> of say 50 for most devices. I was trying to run some tests on this
> in the past but never had time to get it working well, plus it would have
> been for only two different devices (disk and disk array), and the
> drives I have are not really fast (20 MB/sec for disk, about 50 MB/sec for
> the disk array).

Ok, this may work, now and here, but then and there it doesn't have to.

Predicting on a number, say 100, is speculation at best. What if the
initiator is connected to fiber which is connected to another, etc.
And what if /dev/sda is an iSCSI initiator, connected to a bunch of
targets, which are arrays on another fiber...

You see, 100 means nothing anymore. That is sending 200 tagged commands
will NOT go to the same ``device''... (your imagination here)

The SCSI LLDD, being the gate to the interconnect/transport, knows best,
and has at its disposal features/abilities not easily exportable to ULP/userland.
Thus, it has the ability to at least hint at some number, being the
device queue depth.

> What is really needed are IO performance numbers for varying queue depths.

Yep, this is what you give your boss... (Essay topic for next Thursday :-))

But tomorrow, someone has decided to just change one little iota in the code
and those same numbers are out the window (just as has recently happened).
That is, this wouldn't work here.

Those numbers would of course depend on each subsystem getting it ``right'',
and the dependent variables become too many.

Thus, in my experience (and it is my opinion) it is best to approach matters
like this from an academic/research point of view -- that is, we are speaking
of a _general_ architecture, and not of a few empirical tests, hinting at 10 five line
patches.

> With 2.5, the number of commands outstanding to the device is not
> subtracted from the blk request queue size (we don't release a blk request
> until the IO is completed, there is no call to blkdev_release_request in
> scsi_request_fn) - this means large queue depths will cause the blk request
> queue to fill up and even be full without any available blk request queue
> commands to merge or sort with.

Yes, ok, so we are involving the block layer, which can/should/may change
tomorrow BUT the SCSI core should/may not have to -- this would mean
that it's doing a great job. (cont'd below)

> There are also issues like Andrew had with the read latency - although
> his benchmark is artificial, and has more to do with too many dirty
> pages, it still showed that higher queue depths can have an impact
> on interactive performance (i.e. read latencies).

Right! Meaning that the issue is/was elsewhere all along.

So if we involve the block layer too much, and tomorrow someone finds out
something was broken there, we SHOULD NOT HAVE TO change the SCSI core.
This would mean a fairly independent implementation (being a subsystem),
which implies general structure, which implies research.

While it is good to look at who's below us and above us (SCSI core),
depending too much on their particulars is not generally a good investment.

((All this of course implies that SCSI Core would be quite minimal.))

-- 
Luben

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-15 19:47 ` Doug Ledford
                     ` (3 preceding siblings ...)
  2002-10-15 20:26   ` Luben Tuikov
@ 2002-10-21  7:28   ` Mike Anderson
  2002-10-21 16:16     ` Doug Ledford
  4 siblings, 1 reply; 297+ messages in thread
From: Mike Anderson @ 2002-10-21  7:28 UTC (permalink / raw)
  To: Jeffery, David, 'Dave Hansen',
	'linux-scsi@vger.kernel.org'

Doug Ledford [dledford@redhat.com] wrote:
>     ha = IPS_HA(SDptr->host);
> -   min = ha->max_cmds / 4;
> -   if (min < 8)
> -      min = ha->max_cmds - 1;
> -   scsi_adjust_queue_depth(SDptr, MSG_ORDERED_TAG, min);
> +   if (SDptr->tagged_supported) {
> +      min = ha->max_cmds / 2;
> +      if (min <= 16)
> +         min = ha->max_cmds - 1;
> +      scsi_adjust_queue_depth(SDptr, MSG_ORDERED_TAG, min);
> +   }
>     return 0;
>  }
>  

Doug,
	Sorry I did not get on to checking your version of this patch
	sooner.

I did check 2.5.44 which includes your patched version of ips.c. I am
seeing only a queue depth of 1.

The reason is that the SERVERAID disks are virtual and the inquiry data
they send back is generated in the driver. The inq data does not
have the CmdQue bit set which results in tagged_supported not being set.

Either the slave_attach code will need to change or the driver will
need to start setting CmdQue.

Output for ServeRAID:
PQual=0, Device type=0, RMB=0, ANSI version=2, [full version=0x02]
AERC=0, TrmTsk=0, NormACA=0, HiSUP=0, Resp data format=2, SCCS=0
BQue=0, EncServ=0, MultiP=0, MChngr=0, ACKREQQ=0, Addr16=1
RelAdr=0, WBus16=1, Sync=1, Linked=0, TranDis=0, CmdQue=0
length=36 (0x24)
Vendor identification: IBM
Product identification: SERVERAID
Product revision level: 1.00
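
To spell out the chain from the inquiry data to the depth of 1, here is a small sketch. The function names are hypothetical; the CmdQue position (byte 7, bit 1 of standard INQUIRY data) is from the SCSI-2 layout, and the depth arithmetic mirrors the quoted ips hunk. Returning 1 for the untagged case is an assumption based on the depth observed above, since the hunk simply does not call scsi_adjust_queue_depth() then.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_INQ_CMDQUE 0x02	/* byte 7, bit 1 of standard INQUIRY data */

/* Non-zero if the INQUIRY response advertises tagged command queuing. */
int sketch_tagged_supported(const unsigned char *inq, size_t len)
{
	assert(inq != NULL);
	return len > 7 && (inq[7] & SKETCH_INQ_CMDQUE) != 0;
}

/* Depth selection mirroring the quoted hunk; untagged devices stay at 1. */
int sketch_ips_depth(int max_cmds, int tagged)
{
	int min;

	if (!tagged)
		return 1;	/* assumed default when no adjustment is made */
	min = max_cmds / 2;
	if (min <= 16)
		min = max_cmds - 1;
	return min;
}
```

With WBus16=1 and Sync=1 but CmdQue=0, byte 7 is 0x30, so the tagged check fails and the depth stays at 1 no matter what max_cmds is.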


-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-21  7:28   ` Mike Anderson
@ 2002-10-21 16:16     ` Doug Ledford
  2002-10-21 16:29       ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-10-21 16:16 UTC (permalink / raw)
  To: Mike Anderson; +Cc: Linux Scsi Mailing List

On Mon, Oct 21, 2002 at 12:28:01AM -0700, Mike Anderson wrote:
> Doug Ledford [dledford@redhat.com] wrote:
> >     ha = IPS_HA(SDptr->host);
> > -   min = ha->max_cmds / 4;
> > -   if (min < 8)
> > -      min = ha->max_cmds - 1;
> > -   scsi_adjust_queue_depth(SDptr, MSG_ORDERED_TAG, min);
> > +   if (SDptr->tagged_supported) {
> > +      min = ha->max_cmds / 2;
> > +      if (min <= 16)
> > +         min = ha->max_cmds - 1;
> > +      scsi_adjust_queue_depth(SDptr, MSG_ORDERED_TAG, min);
> > +   }
> >     return 0;
> >  }
> >  
> 
> Doug,
> 	Sorry I did not get on to checking your version of this patch
> 	sooner.
> 
> I did check 2.5.44 which includes your patched version of ips.c. I am
> seeing only a queue depth of 1.
> 
> The reason is that the SERVERAID disks are virtual and the inquiry data
> they send back is generated in the driver. The inq data does not
> have the CmdQue bit set which results in tagged_supported not being set.

Yeah, David Jeffery of Adaptec (the current maintainer of the ips driver) 
already sent a fix for this to Linus, so it should be in 2.5.44 I think.  
If not, let me know and I'll add it to my queue I'm preparing for when 
Linus comes back.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [patch 2.5] ips queue depths
  2002-10-21 16:16     ` Doug Ledford
@ 2002-10-21 16:29       ` James Bottomley
  0 siblings, 0 replies; 297+ messages in thread
From: James Bottomley @ 2002-10-21 16:29 UTC (permalink / raw)
  To: Linux Scsi Mailing List

dledford@redhat.com said:
> Yeah, David Jeffery of Adaptec (the current maintainer of the ips
> driver)  already sent a fix for this to Linus, so it should be in
> 2.5.44 I think.   If not, let me know and I'll add it to my queue I'm
> preparing for when  Linus comes back. 

The last change to ips was you on 16 October, so it's probably still AWOL.

I'm keeping the trivial miscellaneous changes in
linux-scsi.bkbits.net/scsi-misc-2.5; you can look at the change logs (even if
you don't run bitkeeper) to see what's in there.

James





^ permalink raw reply	[flat|nested] 297+ messages in thread

[parent not found: <dledford@redhat.com>]

* PATCH: scsi device queue depth adjustability patch
@ 2002-10-02  0:28 ` Doug Ledford
  2002-10-02  1:16   ` Alan Cox
                     ` (2 more replies)
  0 siblings, 3 replies; 297+ messages in thread
From: Doug Ledford @ 2002-10-02  0:28 UTC (permalink / raw)
  To: linux-scsi

[-- Attachment #1: Type: text/plain, Size: 1381 bytes --]

This patch makes it possible to adjust the queue depth of a scsi device 
after it has been in use some time and you have a better idea of what the 
optimal queue depth should be.  For the most part this should work, but my 
2.5.40 test machine is blowing chunks on the serverworks IDE support right 
now so it isn't tested :-(

What I need is for people to test this with the old aic7xxx driver (which 
implements the new code paths; no other drivers do) and for people to test 
with other drivers to make sure it doesn't break them.  If I don't hear 
complaints then I'll move on to my next change which is far more intrusive 
on drivers in general.

Side note: I left the control of queue depth setting solely in the hands 
of the low level drivers since they are the *only* ones that can get an 
accurate queue depth reading at the time of any given QUEUE_FULL message 
(see what the aic7xxx_old driver has to do in the QUEUE_FULL handler to 
find out how many commands the drive has seen at this exact point in time 
vs. how many we may have queued up to the card, the difference in numbers 
can be significant).  For that reason, the adjust_queue_depth call was 
made to defer the action until later so that it was interrupt context 
safe.
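
To make the deferral concrete, here is a small userspace model of the idea. It is only a sketch: the struct and function names are stand-ins rather than the kernel types, the locking is left out, and it keeps a simple list of idle blocks where the real patch walks the device_queue. The adjust call only records the requested depth. The completion path then frees or allocates one command block at a time until the real depth matches.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for Scsi_Cmnd / Scsi_Device; not the kernel types. */
struct cmd {
    struct cmd *next;
};

struct dev {
    struct cmd *free_list;   /* command blocks that are currently idle */
    int queue_depth;         /* blocks that exist right now */
    int new_queue_depth;     /* depth the low level driver asked for */
    int tagged;              /* non-zero if tagged queueing is in use */
};

/* Cheap enough for interrupt context: it only records the request. */
static void adjust_queue_depth(struct dev *d, int tagged, int tags)
{
    assert(d != NULL);
    if (tags == 0)           /* refuse an unworkable depth */
        return;
    d->new_queue_depth = tags;
    d->tagged = tagged;
}

/* Take an idle block, or NULL if every block is in flight. */
static struct cmd *get_command(struct dev *d)
{
    struct cmd *c = d->free_list;
    if (c != NULL)
        d->free_list = c->next;
    return c;
}

/* Completion path: move one step toward the requested depth. */
static void complete_command(struct dev *d, struct cmd *c)
{
    if (d->queue_depth > d->new_queue_depth) {
        free(c);             /* shrinking: drop the completed block */
        d->queue_depth--;
        return;
    }
    c->next = d->free_list;  /* normal case: the block becomes idle again */
    d->free_list = c;
    if (d->queue_depth < d->new_queue_depth) {
        struct cmd *extra = malloc(sizeof(*extra));
        if (extra != NULL) { /* grow by one; on failure, retry next time */
            extra->next = d->free_list;
            d->free_list = extra;
            d->queue_depth++;
        }
    }
}
```

Starting from a device with one block, a request for depth 3 changes nothing until commands start completing, and then the depth climbs by one per completion. An idle device never converges, which is the trade-off accepted in exchange for not needing a kernel thread.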

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

[-- Attachment #2: scsi-queue.patch --]
[-- Type: text/plain, Size: 38636 bytes --]

--- 2.5/drivers/scsi/aic7xxx_old/aic7xxx.h.queue	2002-08-02 16:24:32.000000000 -0400
+++ 2.5/drivers/scsi/aic7xxx_old/aic7xxx.h	2002-10-01 19:19:37.000000000 -0400
@@ -46,7 +46,9 @@
 	eh_host_reset_handler: NULL,				\
 	abort: aic7xxx_abort,					\
 	reset: aic7xxx_reset,					\
-	slave_attach: NULL,					\
+	select_queue_depths: NULL,				\
+	slave_attach: aic7xxx_slave_attach,			\
+	slave_detach: aic7xxx_slave_detach,			\
 	bios_param: aic7xxx_biosparam,				\
 	can_queue: 255,		/* max simultaneous cmds      */\
 	this_id: -1,		/* scsi id of host adapter    */\
@@ -64,6 +66,8 @@
 extern int aic7xxx_reset(Scsi_Cmnd *, unsigned int);
 extern int aic7xxx_abort(Scsi_Cmnd *);
 extern int aic7xxx_release(struct Scsi_Host *);
+extern int aic7xxx_slave_attach(Scsi_Device *);
+extern void aic7xxx_slave_detach(Scsi_Device *);
 
 extern const char *aic7xxx_info(struct Scsi_Host *);
 
--- 2.5/drivers/scsi/aic7xxx_old.c.queue	2002-08-27 19:50:42.000000000 -0400
+++ 2.5/drivers/scsi/aic7xxx_old.c	2002-10-01 19:19:37.000000000 -0400
@@ -977,7 +977,7 @@
 #define  DEVICE_DTR_SCANNED             0x40
   volatile unsigned char   dev_flags[MAX_TARGETS];
   volatile unsigned char   dev_active_cmds[MAX_TARGETS];
-  volatile unsigned char   dev_temp_queue_depth[MAX_TARGETS];
+  volatile unsigned short  dev_temp_queue_depth[MAX_TARGETS];
   unsigned char            dev_commands_sent[MAX_TARGETS];
 
   unsigned int             dev_timer_active; /* Which devs have a timer set */
@@ -989,7 +989,9 @@
 
   unsigned char            dev_last_queue_full[MAX_TARGETS];
   unsigned char            dev_last_queue_full_count[MAX_TARGETS];
-  unsigned char            dev_max_queue_depth[MAX_TARGETS];
+  unsigned char            dev_lun_queue_depth[MAX_TARGETS];
+  unsigned short           dev_scbs_needed[MAX_TARGETS];
+  unsigned short           dev_max_queue_depth[MAX_TARGETS];
 
   volatile scb_queue_type  delayed_scbs[MAX_TARGETS];
 
@@ -1036,6 +1038,7 @@
   ahc_chip                 chip;             /* chip type */
   ahc_bugs                 bugs;
   dma_addr_t		   fifo_dma;	     /* DMA handle for fifo arrays */
+  Scsi_Device		  *Scsi_Dev[MAX_TARGETS][MAX_LUNS];
 
   /*
    * Statistics Kept:
@@ -2821,94 +2824,6 @@
     cmd->result |= (DID_RESET << 16);
   }
 
-  if (!(p->dev_flags[tindex] & DEVICE_PRESENT))
-  {
-    if ( (cmd->cmnd[0] == INQUIRY) && (cmd->result == DID_OK) )
-    {
-    
-      p->dev_flags[tindex] |= DEVICE_PRESENT;
-#define WIDE_INQUIRY_BITS 0x60
-#define SYNC_INQUIRY_BITS 0x10
-#define SCSI_VERSION_BITS 0x07
-#define SCSI_DT_BIT       0x04
-      if(!(p->dev_flags[tindex] & DEVICE_DTR_SCANNED)) {
-        char *buffer;
-
-        if(cmd->use_sg)
-          BUG();
-
-        buffer = (char *)cmd->request_buffer;
-
-        if ( (buffer[7] & WIDE_INQUIRY_BITS) &&
-             (p->features & AHC_WIDE) )
-        {
-          p->needwdtr |= (1<<tindex);
-          p->needwdtr_copy |= (1<<tindex);
-          p->transinfo[tindex].goal_width = p->transinfo[tindex].user_width;
-        }
-        else
-        {
-          p->needwdtr &= ~(1<<tindex);
-          p->needwdtr_copy &= ~(1<<tindex);
-          pause_sequencer(p);
-          aic7xxx_set_width(p, cmd->target, cmd->channel, cmd->lun,
-                            MSG_EXT_WDTR_BUS_8_BIT, (AHC_TRANS_ACTIVE |
-                                                     AHC_TRANS_GOAL |
-                                                     AHC_TRANS_CUR) );
-          unpause_sequencer(p, FALSE);
-        }
-        if ( (buffer[7] & SYNC_INQUIRY_BITS) &&
-              p->transinfo[tindex].user_offset )
-        {
-          p->transinfo[tindex].goal_period = p->transinfo[tindex].user_period;
-          p->transinfo[tindex].goal_options = p->transinfo[tindex].user_options;
-          if (p->features & AHC_ULTRA2)
-            p->transinfo[tindex].goal_offset = MAX_OFFSET_ULTRA2;
-          else if (p->transinfo[tindex].goal_width == MSG_EXT_WDTR_BUS_16_BIT)
-            p->transinfo[tindex].goal_offset = MAX_OFFSET_16BIT;
-          else
-            p->transinfo[tindex].goal_offset = MAX_OFFSET_8BIT;
-          if ( (((buffer[2] & SCSI_VERSION_BITS) >= 3) ||
-                 (buffer[56] & SCSI_DT_BIT) ||
-                 (p->dev_flags[tindex] & DEVICE_SCSI_3) ) &&
-                 (p->transinfo[tindex].user_period <= 9) &&
-                 (p->transinfo[tindex].user_options) )
-          {
-            p->needppr |= (1<<tindex);
-            p->needppr_copy |= (1<<tindex);
-            p->needsdtr &= ~(1<<tindex);
-            p->needsdtr_copy &= ~(1<<tindex);
-            p->needwdtr &= ~(1<<tindex);
-            p->needwdtr_copy &= ~(1<<tindex);
-            p->dev_flags[tindex] |= DEVICE_SCSI_3;
-          }
-          else
-          {
-            p->needsdtr |= (1<<tindex);
-            p->needsdtr_copy |= (1<<tindex);
-            p->transinfo[tindex].goal_period = 
-              MAX(10, p->transinfo[tindex].goal_period);
-            p->transinfo[tindex].goal_options = 0;
-          }
-        }
-        else
-        {
-          p->needsdtr &= ~(1<<tindex);
-          p->needsdtr_copy &= ~(1<<tindex);
-          p->transinfo[tindex].goal_period = 255;
-          p->transinfo[tindex].goal_offset = 0;
-          p->transinfo[tindex].goal_options = 0;
-        }
-        p->dev_flags[tindex] |= DEVICE_DTR_SCANNED;
-        p->dev_flags[tindex] |= DEVICE_PRINT_DTR;
-      }
-#undef WIDE_INQUIRY_BITS
-#undef SYNC_INQUIRY_BITS
-#undef SCSI_VERSION_BITS
-#undef SCSI_DT_BIT
-    }
-  }
-
   if ((scb->flags & SCB_MSGOUT_BITS) != 0)
   {
     unsigned short mask;
@@ -4920,15 +4835,29 @@
                 if ( (p->dev_last_queue_full_count[tindex] > 14) &&
                      (p->dev_active_cmds[tindex] > 4) )
                 {
+		  int diff, lun;
+		  if (p->dev_active_cmds[tindex] > p->dev_lun_queue_depth[tindex])
+		    /* We don't know what to do here, so bail. */
+		    break;
                   if (aic7xxx_verbose & VERBOSE_NEGOTIATION2)
                     printk(INFO_LEAD "Queue depth reduced to %d\n", p->host_no,
                       CTL_OF_SCB(scb), p->dev_active_cmds[tindex]);
-                  p->dev_max_queue_depth[tindex] = 
-                      p->dev_active_cmds[tindex];
+		  diff = p->dev_lun_queue_depth[tindex] -
+			 p->dev_active_cmds[tindex];
+		  p->dev_lun_queue_depth[tindex] -= diff;
+		  for(lun = 0; lun < p->host->max_lun; lun++)
+		  {
+		    if(p->Scsi_Dev[tindex][lun] != NULL)
+		    {
+		      p->dev_max_queue_depth[tindex] -= diff;
+		      scsi_adjust_queue_depth(p->Scsi_Dev[tindex][lun], 1,
+				              p->dev_lun_queue_depth[tindex]);
+		      if(p->dev_temp_queue_depth[tindex] > p->dev_max_queue_depth[tindex])
+		        p->dev_temp_queue_depth[tindex] = p->dev_max_queue_depth[tindex];
+		    }
+		  }
                   p->dev_last_queue_full[tindex] = 0;
                   p->dev_last_queue_full_count[tindex] = 0;
-                  p->dev_temp_queue_depth[tindex] = 
-                    p->dev_active_cmds[tindex];
                 }
                 else if (p->dev_active_cmds[tindex] == 0)
                 {
@@ -7024,10 +6953,10 @@
  *   with queue depths for individual devices.  It also allows tagged
  *   queueing to be [en|dis]abled for a specific adapter.
  *-F*************************************************************************/
-static int
+static void
 aic7xxx_device_queue_depth(struct aic7xxx_host *p, Scsi_Device *device)
 {
-  int default_depth = 3;
+  int default_depth = p->host->hostt->cmd_per_lun;
   unsigned char tindex;
   unsigned short target_mask;
 
@@ -7037,12 +6966,69 @@
   if (p->dev_max_queue_depth[tindex] > 1)
   {
     /*
-     * We've already scanned this device, leave it alone
+     * We've already scanned some lun on this device and enabled tagged
+     * queueing on it.  So, as long as this lun also supports tagged
+     * queueing, enable it here with the same depth.  Call SCSI mid layer
+     * to adjust depth on this device, and add enough to the max_queue_depth
+     * to cover the commands for this lun.
+     *
+     * Note: there is a shortcoming here.  The aic7xxx driver really assumes
+     * that if any lun on a device supports tagged queueing, then they *all*
+     * do.  Our p->tagenable field is on a per target id basis and doesn't
+     * differentiate for different luns.  If we end up with one lun that
+     * doesn't support tagged queueing, it's going to disable tagged queueing
+     * on *all* the luns on that target ID :-(
      */
-    return(p->dev_max_queue_depth[tindex]);
+    if(device->tagged_supported) {
+      if (aic7xxx_verbose & VERBOSE_NEGOTIATION2)
+      {
+        printk(INFO_LEAD "Enabled tagged queuing, queue depth %d.\n",
+               p->host_no, device->channel, device->id,
+               device->lun, device->queue_depth);
+      }
+      p->dev_max_queue_depth[tindex] += p->dev_lun_queue_depth[tindex];
+      p->dev_temp_queue_depth[tindex] += p->dev_lun_queue_depth[tindex];
+      scsi_adjust_queue_depth(device, 1, p->dev_lun_queue_depth[tindex]);
+    }
+    else
+    {
+      int lun;
+      /*
+       * Uh ohh, this is what I was talking about.  All the other devices on
+       * this target ID that support tagged queueing are going to end up
+       * getting tagged queueing turned off because of this device.  Print
+       * out a message to this effect for the user, then disable tagged
+       * queueing on all the devices on this ID.
+       */
+      printk(WARN_LEAD "does not support tagged queuing while other luns on\n"
+             "          the same target ID do!!  Tagged queueing will be disabled for\n"
+             "          all luns on this target ID!!\n", p->host_no,
+	     device->channel, device->id, device->lun);
+      
+      p->dev_lun_queue_depth[tindex] = default_depth;
+      p->dev_scbs_needed[tindex] = 0;
+      p->dev_temp_queue_depth[tindex] = 1;
+      p->dev_max_queue_depth[tindex] = 1;
+      p->tagenable &= ~target_mask;
+
+      for(lun=0; lun < p->host->max_lun; lun++)
+      {
+        if(p->Scsi_Dev[tindex][lun] != NULL)
+	{
+          printk(WARN_LEAD "disabling tagged queuing.\n", p->host_no,
+                 p->Scsi_Dev[tindex][lun]->channel,
+		 p->Scsi_Dev[tindex][lun]->id,
+		 p->Scsi_Dev[tindex][lun]->lun);
+          scsi_adjust_queue_depth(p->Scsi_Dev[tindex][lun], 0, default_depth);
+	  p->dev_scbs_needed[tindex] += default_depth;
+	}
+      }
+    }
+    return;
   }
 
-  device->queue_depth = default_depth;
+  p->dev_lun_queue_depth[tindex] = default_depth;
+  p->dev_scbs_needed[tindex] = default_depth;
   p->dev_temp_queue_depth[tindex] = 1;
   p->dev_max_queue_depth[tindex] = 1;
   p->tagenable &= ~target_mask;
@@ -7052,7 +7038,7 @@
     int tag_enabled = TRUE;
 
     default_depth = AIC7XXX_CMDS_PER_DEVICE;
- 
+
     if (!(p->discenable & target_mask))
     {
       if (aic7xxx_verbose & VERBOSE_NEGOTIATION2)
@@ -7073,7 +7059,7 @@
                            " the aic7xxx.c source file.\n");
           print_warning = FALSE;
         }
-        device->queue_depth = default_depth;
+        p->dev_lun_queue_depth[tindex] = default_depth;
       }
       else
       {
@@ -7081,19 +7067,18 @@
         if (aic7xxx_tag_info[p->instance].tag_commands[tindex] == 255)
         {
           tag_enabled = FALSE;
-          device->queue_depth = 3;  /* Tagged queueing is disabled. */
         }
         else if (aic7xxx_tag_info[p->instance].tag_commands[tindex] == 0)
         {
-          device->queue_depth = default_depth;
+          p->dev_lun_queue_depth[tindex] = default_depth;
         }
         else
         {
-          device->queue_depth =
+          p->dev_lun_queue_depth[tindex] =
             aic7xxx_tag_info[p->instance].tag_commands[tindex];
         }
       }
-      if ((device->tagged_queue == 0) && tag_enabled)
+      if (tag_enabled)
       {
         if (aic7xxx_verbose & VERBOSE_NEGOTIATION2)
         {
@@ -7101,46 +7086,70 @@
                 p->host_no, device->channel, device->id,
                 device->lun, device->queue_depth);
         }
-        p->dev_max_queue_depth[tindex] = device->queue_depth;
-        p->dev_temp_queue_depth[tindex] = device->queue_depth;
+        p->dev_max_queue_depth[tindex] = p->dev_lun_queue_depth[tindex];
+        p->dev_temp_queue_depth[tindex] = p->dev_lun_queue_depth[tindex];
+        p->dev_scbs_needed[tindex] = p->dev_lun_queue_depth[tindex];
         p->tagenable |= target_mask;
         p->orderedtag |= target_mask;
-        device->tagged_queue = 1;
-        device->current_tag = SCB_LIST_NULL;
+	scsi_adjust_queue_depth(device, 1, p->dev_lun_queue_depth[tindex]);
       }
     }
   }
-  return(p->dev_max_queue_depth[tindex]);
+  return;
 }
 
 /*+F*************************************************************************
  * Function:
- *   aic7xxx_select_queue_depth
+ *   aic7xxx_slave_detach
  *
  * Description:
- *   Sets the queue depth for each SCSI device hanging off the input
- *   host adapter.  We use a queue depth of 2 for devices that do not
- *   support tagged queueing.  If AIC7XXX_CMDS_PER_LUN is defined, we
- *   use that for tagged queueing devices; otherwise we use our own
- *   algorithm for determining the queue depth based on the maximum
- *   SCBs for the controller.
+ *   prepare for this device to go away
  *-F*************************************************************************/
-static void
-aic7xxx_select_queue_depth(struct Scsi_Host *host,
-    Scsi_Device *scsi_devs)
+void
+aic7xxx_slave_detach(Scsi_Device *sdpnt)
 {
-  Scsi_Device *device;
-  struct aic7xxx_host *p = (struct aic7xxx_host *) host->hostdata;
-  int scbnum;
+  struct aic7xxx_host *p = (struct aic7xxx_host *) sdpnt->host->hostdata;
+  int lun, tindex;
+
+  tindex = sdpnt->id | (sdpnt->channel << 3);
+  lun = sdpnt->lun;
+  if(p->Scsi_Dev[tindex][lun] == NULL)
+    return;
 
-  scbnum = 0;
-  for (device = scsi_devs; device != NULL; device = device->next)
+  if(p->tagenable & (1 << tindex))
   {
-    if (device->host == host)
-    {
-      scbnum += aic7xxx_device_queue_depth(p, device);
-    }
+    p->dev_max_queue_depth[tindex] -= p->dev_lun_queue_depth[tindex];
+    if(p->dev_temp_queue_depth[tindex] > p->dev_max_queue_depth[tindex])
+      p->dev_temp_queue_depth[tindex] = p->dev_max_queue_depth[tindex];
   }
+  p->dev_scbs_needed[tindex] -= p->dev_lun_queue_depth[tindex];
+  p->Scsi_Dev[tindex][lun] = NULL;
+  return;
+}
+
+/*+F*************************************************************************
+ * Function:
+ *   aic7xxx_slave_attach
+ *
+ * Description:
+ *   Configure the device we are attaching to the controller.  This is
+ *   where we get to do things like scan the INQUIRY data, set queue
+ *   depths, allocate command structs, etc.
+ *-F*************************************************************************/
+int
+aic7xxx_slave_attach(Scsi_Device *sdpnt)
+{
+  struct aic7xxx_host *p = (struct aic7xxx_host *) sdpnt->host->hostdata;
+  int scbnum, tindex, i;
+
+  tindex = sdpnt->id | (sdpnt->channel << 3);
+  p->dev_flags[tindex] |= DEVICE_PRESENT;
+
+  p->Scsi_Dev[tindex][sdpnt->lun] = sdpnt;
+  aic7xxx_device_queue_depth(p, sdpnt);
+
+  for(i = 0, scbnum = 0; i < p->host->max_id; i++)
+    scbnum += p->dev_scbs_needed[i];
   while (scbnum > p->scb_data->numscbs)
   {
     /*
@@ -7149,8 +7158,77 @@
      * the SCB in order to perform a swap operation (possible deadlock)
      */
     if ( aic7xxx_allocate_scb(p) == 0 )
-      return;
+      break;
+  }
+
+  /*
+   * We only need to check INQUIRY data on one lun of multi lun devices
+   * since speed negotiations are not lun specific.  Once we've checked this
+   * particular target id once, the DEVICE_PRESENT flag will be set.
+   */
+  if (!(p->dev_flags[tindex] & DEVICE_DTR_SCANNED))
+  {
+    p->dev_flags[tindex] |= DEVICE_DTR_SCANNED;
+
+    if ( sdpnt->wdtr && (p->features & AHC_WIDE) )
+    {
+      p->needwdtr |= (1<<tindex);
+      p->needwdtr_copy |= (1<<tindex);
+      p->transinfo[tindex].goal_width = p->transinfo[tindex].user_width;
+    }
+    else
+    {
+      p->needwdtr &= ~(1<<tindex);
+      p->needwdtr_copy &= ~(1<<tindex);
+      pause_sequencer(p);
+      aic7xxx_set_width(p, sdpnt->id, sdpnt->channel, sdpnt->lun,
+                        MSG_EXT_WDTR_BUS_8_BIT, (AHC_TRANS_ACTIVE |
+                                                 AHC_TRANS_GOAL |
+                                                 AHC_TRANS_CUR) );
+      unpause_sequencer(p, FALSE);
+    }
+    if ( sdpnt->sdtr && p->transinfo[tindex].user_offset )
+    {
+      p->transinfo[tindex].goal_period = p->transinfo[tindex].user_period;
+      p->transinfo[tindex].goal_options = p->transinfo[tindex].user_options;
+      if (p->features & AHC_ULTRA2)
+        p->transinfo[tindex].goal_offset = MAX_OFFSET_ULTRA2;
+      else if (p->transinfo[tindex].goal_width == MSG_EXT_WDTR_BUS_16_BIT)
+        p->transinfo[tindex].goal_offset = MAX_OFFSET_16BIT;
+      else
+        p->transinfo[tindex].goal_offset = MAX_OFFSET_8BIT;
+      if ( sdpnt->ppr && p->transinfo[tindex].user_period <= 9 &&
+             p->transinfo[tindex].user_options )
+      {
+        p->needppr |= (1<<tindex);
+        p->needppr_copy |= (1<<tindex);
+        p->needsdtr &= ~(1<<tindex);
+        p->needsdtr_copy &= ~(1<<tindex);
+        p->needwdtr &= ~(1<<tindex);
+        p->needwdtr_copy &= ~(1<<tindex);
+        p->dev_flags[tindex] |= DEVICE_SCSI_3;
+      }
+      else
+      {
+        p->needsdtr |= (1<<tindex);
+        p->needsdtr_copy |= (1<<tindex);
+        p->transinfo[tindex].goal_period = 
+          MAX(10, p->transinfo[tindex].goal_period);
+        p->transinfo[tindex].goal_options = 0;
+      }
+    }
+    else
+    {
+      p->needsdtr &= ~(1<<tindex);
+      p->needsdtr_copy &= ~(1<<tindex);
+      p->transinfo[tindex].goal_period = 255;
+      p->transinfo[tindex].goal_offset = 0;
+      p->transinfo[tindex].goal_options = 0;
+    }
+    p->dev_flags[tindex] |= DEVICE_PRINT_DTR;
   }
+
+  return(0);
 }
 
 /*+F*************************************************************************
@@ -8247,7 +8325,6 @@
   host->can_queue = AIC7XXX_MAXSCB;
   host->cmd_per_lun = 3;
   host->sg_tablesize = AIC7XXX_MAX_SG;
-  host->select_queue_depths = aic7xxx_select_queue_depth;
   host->this_id = p->scsi_id;
   host->io_port = p->base;
   host->n_io_port = 0xFF;
--- 2.5/drivers/scsi/hosts.h.queue	2002-10-01 15:46:04.000000000 -0400
+++ 2.5/drivers/scsi/hosts.h	2002-10-01 19:19:37.000000000 -0400
@@ -97,6 +97,10 @@
      */
     int (* detect)(struct SHT *);
 
+    /*
+     * This function is only used by one driver and will be going away
+     * once it switches over to using the slave_detach() function instead.
+     */
     int (*revoke)(Scsi_Device *);
 
     /* Used with loadable modules to unload the host structures.  Note:
@@ -200,11 +204,59 @@
     int (* reset)(Scsi_Cmnd *, unsigned int);
 
     /*
-     * This function is used to select synchronous communications,
-     * which will result in a higher data throughput.  Not implemented
-     * yet.
+     * Once the device has responded to an INQUIRY and we know the device
+     * is online, call into the low level driver with the Scsi_Device *
+     * (so that the low level driver may save it off in a safe location
+     * for later use in calling scsi_adjust_queue_depth() or possibly
+     * other scsi_* functions) and char * to the INQUIRY return data buffer.
+     * This way, low level drivers will no longer have to snoop INQUIRY data
+     * to see if a drive supports PPR message protocol for Ultra160 speed
+     * negotiations or other similar items.  Instead it can simply wait until
+     * the scsi mid layer calls them with the data in hand and then it can
+     * do its checking of INQUIRY data.  This will happen once for each new
+     * device added on this controller (including once for each lun on
+     * multi-lun devices, so low level drivers should take care to make
+     * sure that if they do tagged queueing on a per physical unit basis
+     * instead of a per logical unit basis that they have the mid layer
+     * allocate tags accordingly).
+     *
+     * Things currently recommended to be handled at this time include:
+     *
+     * 1.  Checking for tagged queueing capability and if able then calling
+     *     scsi_adjust_queue_depth() with the device pointer and the
+     *     suggested new queue depth.
+     * 2.  Checking for things such as SCSI level or DT bit in order to
+     *     determine if PPR message protocols are appropriate on this
+     *     device (or any other scsi INQUIRY data specific things the
+     *     driver wants to know in order to properly handle this device).
+     * 3.  Allocating command structs that the device will need.
+     * 4.  Setting the default timeout on this device (if needed).
+     * 5.  Saving the Scsi_Device pointer so that the low level driver
+     *     will be able to easily call back into scsi_adjust_queue_depth
+     *     again should it be determined that the queue depth for this
+     *     device should be lower or higher than it is initially set to.
+     * 6.  Allocate device data structures as needed that can be attached
+     *     to the Scsi_Device * via SDpnt->host_device_ptr
+     * 7.  Anything else the low level driver might want to do on a device
+     *     specific setup basis...
+     * 8.  Return 0 on success, non-0 on error.  The device will be marked
+     *     as offline on error so that no access will occur.
+     */
+    int (* slave_attach)(Scsi_Device *);
+
+    /*
+     * If we are getting ready to remove a device from the scsi chain then
+     * we call into the low level driver to let them know.  Once a low
+     * level driver has been informed that a drive is going away, the low
+     * level driver *must* remove its pointer to the Scsi_Device because
+     * it is going to be kfree()'ed shortly.  It is no longer safe to call
+     * any mid layer functions with this Scsi_Device *.  Additionally, the
+     * mid layer will not make any more calls into the low level driver's
+     * queue routine with this device, so it is safe for the device driver
+     * to deallocate all structs/commands/etc that it has allocated
+     * specifically for this device at the time of this call.
      */
-    int (* slave_attach)(int, int);
+    void (* slave_detach)(Scsi_Device *);
 
     /*
      * This function determines the bios parameters for a given
@@ -217,6 +269,8 @@
 
     /*
      * Used to set the queue depth for a specific device.
+     *
+     * Once the slave_attach() function is in full use, this will go away.
      */
     void (*select_queue_depths)(struct Scsi_Host *, Scsi_Device *);
 
--- 2.5/drivers/scsi/scsi.c.queue	2002-10-01 15:46:06.000000000 -0400
+++ 2.5/drivers/scsi/scsi.c	2002-10-01 19:22:12.000000000 -0400
@@ -551,6 +551,7 @@
 {
 	unsigned long flags;
         Scsi_Device * SDpnt;
+	int alloc_cmd = 0;
 
 	spin_lock_irqsave(&device_request_lock, flags);
 
@@ -567,6 +568,25 @@
 				   atomic_read(&SCpnt->host->host_active),
 				   SCpnt->host->host_failed));
 
+	if(SDpnt->queue_depth > SDpnt->new_queue_depth) {
+		Scsi_Cmnd *prev, *next;
+		/*
+		 * Release the command block and decrement the queue
+		 * depth.
+		 */
+		for(prev = NULL, next = SDpnt->device_queue;
+				next != SCpnt;
+				prev = next, next = next->next) ;
+		if(prev == NULL)
+			SDpnt->device_queue = next->next;
+		else
+			prev->next = next->next;
+		kfree((char *)SCpnt);
+		SDpnt->queue_depth--;
+	} else if(SDpnt->queue_depth < SDpnt->new_queue_depth) {
+		alloc_cmd = 1;
+		SDpnt->queue_depth++;
+	}
 	spin_unlock_irqrestore(&device_request_lock, flags);
 
         /*
@@ -575,6 +595,48 @@
          * they wake up.  
          */
 	wake_up(&SDpnt->scpnt_wait);
+
+	/*
+	 * We are happy to release command blocks in the scope of the
+	 * device_request_lock since that's nice and quick, but allocation
+	 * can take more time so do it outside that scope instead.
+	 */
+	if(alloc_cmd) {
+		Scsi_Cmnd *newSCpnt;
+
+		newSCpnt = kmalloc(sizeof(Scsi_Cmnd), GFP_ATOMIC |
+				(SDpnt->host->unchecked_isa_dma ?
+				 GFP_DMA : 0));
+		if(newSCpnt) {
+			memset(newSCpnt, 0, sizeof(Scsi_Cmnd));
+			newSCpnt->host = SDpnt->host;
+			newSCpnt->device = SDpnt;
+			newSCpnt->target = SDpnt->id;
+			newSCpnt->lun = SDpnt->lun;
+			newSCpnt->channel = SDpnt->channel;
+			newSCpnt->request = NULL;
+			newSCpnt->use_sg = 0;
+			newSCpnt->old_use_sg = 0;
+			newSCpnt->old_cmd_len = 0;
+			newSCpnt->underflow = 0;
+			newSCpnt->old_underflow = 0;
+			newSCpnt->transfersize = 0;
+			newSCpnt->resid = 0;
+			newSCpnt->serial_number = 0;
+			newSCpnt->serial_number_at_timeout = 0;
+			newSCpnt->host_scribble = NULL;
+			newSCpnt->state = SCSI_STATE_UNUSED;
+			newSCpnt->owner = SCSI_OWNER_NOBODY;
+			spin_lock_irqsave(&device_request_lock, flags);
+			newSCpnt->next = SDpnt->device_queue;
+			SDpnt->device_queue = newSCpnt;
+			spin_unlock_irqrestore(&device_request_lock, flags);
+		} else {
+			spin_lock_irqsave(&device_request_lock, flags);
+			SDpnt->queue_depth--;
+			spin_unlock_irqrestore(&device_request_lock, flags);
+		}
+	}
 }
 
 /*
@@ -1455,8 +1517,8 @@
 		SDpnt->device_queue = SCnext = SCpnt->next;
 		kfree((char *) SCpnt);
 	}
-	SDpnt->has_cmdblocks = 0;
 	SDpnt->queue_depth = 0;
+	SDpnt->new_queue_depth = 0;
 	spin_unlock_irqrestore(&device_request_lock, flags);
 }
 
@@ -1471,63 +1533,115 @@
  *
  * Lock status: No locking assumed or required.
  *
- * Notes:
+ * Notes:	We really only allocate one command here.  We will allocate
+ *		more commands as needed once the device goes into real use.
  */
 void scsi_build_commandblocks(Scsi_Device * SDpnt)
 {
 	unsigned long flags;
-	struct Scsi_Host *host = SDpnt->host;
-	int j;
 	Scsi_Cmnd *SCpnt;
 
+	if (SDpnt->queue_depth != 0)
+		return;
+		
+	SCpnt = (Scsi_Cmnd *) kmalloc(sizeof(Scsi_Cmnd), GFP_ATOMIC |
+			(SDpnt->host->unchecked_isa_dma ? GFP_DMA : 0));
+	if (NULL == SCpnt) {
+		/*
+		 * Since we don't currently have *any* command blocks on this
+		 * device, go ahead and try an atomic allocation...
+		 */
+		SCpnt = (Scsi_Cmnd *) kmalloc(sizeof(Scsi_Cmnd), GFP_ATOMIC |
+			(SDpnt->host->unchecked_isa_dma ? GFP_DMA : 0));
+		if (NULL == SCpnt)
+			return;	/* Oops, we aren't going anywhere for now */
+	}
+
+	memset(SCpnt, 0, sizeof(Scsi_Cmnd));
+	SCpnt->host = SDpnt->host;
+	SCpnt->device = SDpnt;
+	SCpnt->target = SDpnt->id;
+	SCpnt->lun = SDpnt->lun;
+	SCpnt->channel = SDpnt->channel;
+	SCpnt->request = NULL;
+	SCpnt->use_sg = 0;
+	SCpnt->old_use_sg = 0;
+	SCpnt->old_cmd_len = 0;
+	SCpnt->underflow = 0;
+	SCpnt->old_underflow = 0;
+	SCpnt->transfersize = 0;
+	SCpnt->resid = 0;
+	SCpnt->serial_number = 0;
+	SCpnt->serial_number_at_timeout = 0;
+	SCpnt->host_scribble = NULL;
+	SCpnt->state = SCSI_STATE_UNUSED;
+	SCpnt->owner = SCSI_OWNER_NOBODY;
 	spin_lock_irqsave(&device_request_lock, flags);
+	if(SDpnt->new_queue_depth == 0)
+		SDpnt->new_queue_depth = 1;
+	SDpnt->queue_depth++;
+	SCpnt->next = SDpnt->device_queue;
+	SDpnt->device_queue = SCpnt;
+	spin_unlock_irqrestore(&device_request_lock, flags);
+}
 
-	if (SDpnt->queue_depth == 0)
+/*
+ * Function:	scsi_adjust_queue_depth()
+ *
+ * Purpose:	Allow low level drivers to tell us to change the queue depth
+ * 		on a specific SCSI device
+ *
+ * Arguments:	SDpnt	- SCSI Device in question
+ * 		tagged	- Do we use tagged queueing (non-0) or do we treat
+ * 			  this device as an untagged device (0)
+ * 		tags	- Number of tags allowed if tagged queueing enabled,
+ * 			  or number of commands the low level driver can
+ * 			  queue up in non-tagged mode (as per cmd_per_lun).
+ *
+ * Returns:	Nothing
+ *
+ * Lock Status:	None held on entry
+ *
+ * Notes:	Low level drivers may call this at any time and we will do
+ * 		the right thing depending on whether or not the device is
+ * 		currently active and whether or not it even has the
+ * 		command blocks built yet.
+ *
+ * 		If cmdblocks != 0 then we are a live device.  We just set the
+ * 		new_queue_depth variable and when the scsi completion handler
+ * 		notices that queue_depth != new_queue_depth it will work to
+ *		rectify the situation.  If new_queue_depth is less than current
+ *		queue_depth, then it will free the completed command instead of
+ *		putting it back on the free list and dec queue_depth.  Otherwise
+ *		it will try to allocate a new command block for the device and
+ *		put it on the free list along with the command that is being
+ *		completed.  Obviously, if the device isn't doing anything then
+ *		neither is this code, so it will bring the device's queue depth
+ *		back into line when the device is actually being used.  This
+ *		keeps us from needing to fire off a kernel thread or some such
+ *		nonsense (this routine can be called from interrupt code, so
+ *		handling allocations here would be tricky and risky, making
+ *		a kernel thread a much safer way to go if we wanted to handle
+ *		the work immediately instead of letting it get done a little
+ *		at a time in the completion handler).
+ */
+void scsi_adjust_queue_depth(Scsi_Device *SDpnt, int tagged, int tags)
+{
+	unsigned long flags;
+
+	/*
+	 * refuse to set tagged depth to an unworkable size
+	 */
+	if(tags == 0)
+		return;
+	spin_lock_irqsave(&device_request_lock, flags);
+	SDpnt->new_queue_depth = tags;
+	SDpnt->tagged_queue = tagged;
+	spin_unlock_irqrestore(&device_request_lock, flags);
+	if(SDpnt->queue_depth == 0)
 	{
-		SDpnt->queue_depth = host->cmd_per_lun;
-		if (SDpnt->queue_depth == 0)
-			SDpnt->queue_depth = 1; /* live to fight another day */
-	}
-	SDpnt->device_queue = NULL;
-
-	for (j = 0; j < SDpnt->queue_depth; j++) {
-		SCpnt = (Scsi_Cmnd *)
-		    kmalloc(sizeof(Scsi_Cmnd),
-				     GFP_ATOMIC |
-				(host->unchecked_isa_dma ? GFP_DMA : 0));
-		if (NULL == SCpnt)
-			break;	/* If not, the next line will oops ... */
-		memset(SCpnt, 0, sizeof(Scsi_Cmnd));
-		SCpnt->host = host;
-		SCpnt->device = SDpnt;
-		SCpnt->target = SDpnt->id;
-		SCpnt->lun = SDpnt->lun;
-		SCpnt->channel = SDpnt->channel;
-		SCpnt->request = NULL;
-		SCpnt->use_sg = 0;
-		SCpnt->old_use_sg = 0;
-		SCpnt->old_cmd_len = 0;
-		SCpnt->underflow = 0;
-		SCpnt->old_underflow = 0;
-		SCpnt->transfersize = 0;
-		SCpnt->resid = 0;
-		SCpnt->serial_number = 0;
-		SCpnt->serial_number_at_timeout = 0;
-		SCpnt->host_scribble = NULL;
-		SCpnt->next = SDpnt->device_queue;
-		SDpnt->device_queue = SCpnt;
-		SCpnt->state = SCSI_STATE_UNUSED;
-		SCpnt->owner = SCSI_OWNER_NOBODY;
-	}
-	if (j < SDpnt->queue_depth) {	/* low on space (D.Gilbert 990424) */
-		printk(KERN_WARNING "scsi_build_commandblocks: want=%d, space for=%d blocks\n",
-		       SDpnt->queue_depth, j);
-		SDpnt->queue_depth = j;
-		SDpnt->has_cmdblocks = (0 != j);
-	} else {
-		SDpnt->has_cmdblocks = 1;
+		scsi_build_commandblocks(SDpnt);
 	}
-	spin_unlock_irqrestore(&device_request_lock, flags);
 }
 
 void __init scsi_host_no_insert(char *str, int n)
@@ -1766,13 +1880,6 @@
 			goto out;	/* We do not yet support unplugging */
 
 		scan_scsis(HBA_ptr, 1, channel, id, lun);
-
-		/* FIXME (DB) This assumes that the queue_depth routines can be used
-		   in this context as well, while they were all designed to be
-		   called only once after the detect routine. (DB) */
-		/* queue_depth routine moved to inside scan_scsis(,1,,,) so
-		   it is called before build_commandblocks() */
-
 		err = length;
 		goto out;
 	}
@@ -1834,6 +1941,8 @@
 			 */
                         if (HBA_ptr->hostt->revoke)
                                 HBA_ptr->hostt->revoke(scd);
+			if (HBA_ptr->hostt->slave_detach)
+				(*HBA_ptr->hostt->slave_detach) (scd);
 			devfs_unregister (scd->de);
 			scsi_release_commandblocks(scd);
 
@@ -1993,7 +2099,7 @@
 							(*sdtpnt->attach) (SDpnt);
 					if (SDpnt->attached) {
 						scsi_build_commandblocks(SDpnt);
-						if (0 == SDpnt->has_cmdblocks)
+						if (SDpnt->queue_depth == 0)
 							out_of_space = 1;
 					}
 				}
@@ -2124,6 +2230,8 @@
 				printk(KERN_ERR "Attached usage count = %d\n", SDpnt->attached);
 				goto err_out;
 			}
+			if (shpnt->hostt->slave_detach)
+				(*shpnt->hostt->slave_detach) (SDpnt);
 			devfs_unregister (SDpnt->de);
 			put_device(&SDpnt->sdev_driverfs_dev);
 		}
@@ -2280,10 +2388,10 @@
 			 * If this driver attached to the device, and don't have any
 			 * command blocks for this device, allocate some.
 			 */
-			if (SDpnt->attached && SDpnt->has_cmdblocks == 0) {
+			if (SDpnt->attached && SDpnt->queue_depth == 0) {
 				SDpnt->online = TRUE;
 				scsi_build_commandblocks(SDpnt);
-				if (0 == SDpnt->has_cmdblocks)
+				if (SDpnt->queue_depth == 0)
 					out_of_space = 1;
 			}
 		}
@@ -2333,6 +2441,8 @@
 				 * Nobody is using this device any more.  Free all of the
 				 * command structures.
 				 */
+				if (shpnt->hostt->slave_detach)
+					(*shpnt->hostt->slave_detach) (SDpnt);
 				scsi_release_commandblocks(SDpnt);
 			}
 		}
@@ -2686,9 +2796,13 @@
         SDpnt->host = SHpnt;
         SDpnt->id = SHpnt->this_id;
         SDpnt->type = -1;
-        SDpnt->queue_depth = 1;
+	SDpnt->new_queue_depth = 1;
         
 	scsi_build_commandblocks(SDpnt);
+	if(SDpnt->queue_depth == 0) {
+		kfree(SDpnt);
+		return NULL;
+	}
 
 	scsi_initialize_queue(SDpnt, SHpnt);
 
--- 2.5/drivers/scsi/scsi.h.queue	2002-10-01 15:46:04.000000000 -0400
+++ 2.5/drivers/scsi/scsi.h	2002-10-01 19:19:37.000000000 -0400
@@ -481,6 +481,7 @@
 extern void scsi_bottom_half_handler(void);
 extern void scsi_release_commandblocks(Scsi_Device * SDpnt);
 extern void scsi_build_commandblocks(Scsi_Device * SDpnt);
+extern void scsi_adjust_queue_depth(Scsi_Device *, int, int);
 extern void scsi_done(Scsi_Cmnd * SCpnt);
 extern void scsi_finish_command(Scsi_Cmnd *);
 extern int scsi_retry_command(Scsi_Cmnd *);
@@ -563,6 +564,8 @@
 	volatile unsigned short device_busy;	/* commands actually active on low-level */
 	Scsi_Cmnd *device_queue;	/* queue of SCSI Command structures */
         Scsi_Cmnd *current_cmnd;	/* currently active command */
+	unsigned short queue_depth;	/* How deep of a queue we have */
+	unsigned short new_queue_depth; /* How deep of a queue we want */
 
 	unsigned int id, lun, channel;
 
@@ -586,24 +589,25 @@
 	unsigned char current_tag;	/* current tag */
 	unsigned char sync_min_period;	/* Not less than this period */
 	unsigned char sync_max_offset;	/* Not greater than this offset */
-	unsigned char queue_depth;	/* How deep a queue to use */
 
 	unsigned online:1;
 	unsigned writeable:1;
 	unsigned removable:1;
 	unsigned random:1;
-	unsigned has_cmdblocks:1;
 	unsigned changed:1;	/* Data invalid due to media change */
 	unsigned busy:1;	/* Used to prevent races */
 	unsigned lockable:1;	/* Able to prevent media removal */
 	unsigned borken:1;	/* Tell the Seagate driver to be 
 				 * painfully slow on this device */
-	unsigned tagged_supported:1;	/* Supports SCSI-II tagged queuing */
-	unsigned tagged_queue:1;	/* SCSI-II tagged queuing enabled */
 	unsigned disconnect:1;	/* can disconnect */
 	unsigned soft_reset:1;	/* Uses soft reset option */
-	unsigned sync:1;	/* Negotiate for sync transfers */
-	unsigned wide:1;	/* Negotiate for WIDE transfers */
+	unsigned sdtr:1;	/* Device supports SDTR messages */
+	unsigned wdtr:1;	/* Device supports WDTR messages */
+	unsigned ppr:1;		/* Device supports PPR messages */
+	unsigned tagged_supported:1;	/* Supports SCSI-II tagged queuing */
+	unsigned tagged_queue:1;	/* SCSI-II tagged queuing enabled */
+	unsigned simple_tags:1;	/* Device supports simple queue tag messages */
+	unsigned ordered_tags:1;/* Device supports ordered queue tag messages */
 	unsigned single_lun:1;	/* Indicates we should only allow I/O to
 				 * one of the luns for the device at a 
 				 * time. */
--- 2.5/drivers/scsi/scsi_scan.c.queue	2002-08-27 19:50:45.000000000 -0400
+++ 2.5/drivers/scsi/scsi_scan.c	2002-10-01 19:19:37.000000000 -0400
@@ -1409,6 +1409,14 @@
 	sdev->lockable = sdev->removable;
 	sdev->soft_reset = (inq_result[7] & 1) && ((inq_result[3] & 7) == 2);
 
+	if (sdev->scsi_level >= SCSI_3 || (sdev->inquiry_len > 56 &&
+		inq_result[56] & 0x04))
+		sdev->ppr = 1;
+	if (inq_result[7] & 0x60)
+		sdev->wdtr = 1;
+	if (inq_result[7] & 0x10)
+		sdev->sdtr = 1;
+
 	/*
 	 * XXX maybe move the identifier and driverfs/devfs setup to a new
 	 * function, and call them after this function is called.
@@ -1509,9 +1517,9 @@
 	 * XXX maybe change scsi_release_commandblocks to not reset
 	 * queue_depth to 0.
 	 */
-	sdevscan->queue_depth = 1;
+	sdevscan->new_queue_depth = 1;
 	scsi_build_commandblocks(sdevscan);
-	if (sdevscan->has_cmdblocks == 0)
+	if (sdevscan->queue_depth == 0)
 		goto alloc_failed;
 
 	sreq = scsi_allocate_request(sdevscan);
@@ -1585,7 +1593,7 @@
 		kfree(scsi_result);
 	if (sreq != NULL)
 		scsi_release_request(sreq);
-	if (sdevscan->has_cmdblocks != 0)
+	if (sdevscan->queue_depth != 0)
 		scsi_release_commandblocks(sdevscan);
 	return SCSI_SCAN_NO_RESPONSE;
 }
@@ -1739,9 +1747,9 @@
 	if (sdevscan->scsi_level < SCSI_3)
 		return 1;
 
-	sdevscan->queue_depth = 1;
+	sdevscan->new_queue_depth = 1;
 	scsi_build_commandblocks(sdevscan);
-	if (sdevscan->has_cmdblocks == 0) {
+	if (sdevscan->queue_depth == 0) {
 		printk(ALLOC_FAILURE_MSG, __FUNCTION__);
 		/*
 		 * We are out of memory, don't try scanning any further.
@@ -2014,6 +2022,17 @@
 		 */
 		if (shost->select_queue_depths != NULL)
 			(shost->select_queue_depths) (shost, shost->host_queue);
+		if (shost->hostt->slave_attach != NULL)
+			if ((shost->hostt->slave_attach) (sdev) != 0) {
+				/*
+				 * Low level driver failed to attach this
+				 * device, we've got to kick it back out
+				 * now as a result :-(
+				 */
+				printk("scsi_scan_selected_lun: slave_attach "
+					"failed, marking device OFFLINE.\n");
+				sdev->online = FALSE;
+			}
 
 		for (sdt = scsi_devicelist; sdt; sdt = sdt->next)
 			if (sdt->init && sdt->dev_noticed)
@@ -2024,7 +2043,7 @@
 				(*sdt->attach) (sdev);
 				if (sdev->attached) {
 					scsi_build_commandblocks(sdev);
-					if (sdev->has_cmdblocks == 0)
+					if (sdev->queue_depth == 0)
 						printk(ALLOC_FAILURE_MSG,
 						       __FUNCTION__);
 				}
--- 2.5/drivers/scsi/scsi_syms.c.queue	2002-10-01 19:26:44.000000000 -0400
+++ 2.5/drivers/scsi/scsi_syms.c	2002-10-01 19:27:24.000000000 -0400
@@ -66,6 +66,7 @@
 EXPORT_SYMBOL(scsi_report_bus_reset);
 EXPORT_SYMBOL(scsi_block_requests);
 EXPORT_SYMBOL(scsi_unblock_requests);
+EXPORT_SYMBOL(scsi_adjust_queue_depth);
 
 EXPORT_SYMBOL(scsi_get_host_dev);
 EXPORT_SYMBOL(scsi_free_host_dev);

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: PATCH: scsi device queue depth adjustability patch
  2002-10-02  0:28 ` PATCH: scsi device queue depth adjustability patch Doug Ledford
@ 2002-10-02  1:16   ` Alan Cox
  2002-10-02  1:41     ` Doug Ledford
  2002-10-02 21:41   ` James Bottomley
  2002-10-03 14:25   ` James Bottomley
  2 siblings, 1 reply; 297+ messages in thread
From: Alan Cox @ 2002-10-02  1:16 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-scsi

On Wed, 2002-10-02 at 01:28, Doug Ledford wrote:
> 2.5.40 test machine is blowing chunks on the serverworks IDE support right 
> now so it isn't tested :-(

Please provide more details


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: PATCH: scsi device queue depth adjustability patch
  2002-10-02  1:16   ` Alan Cox
@ 2002-10-02  1:41     ` Doug Ledford
  2002-10-02 13:44       ` Alan Cox
  0 siblings, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-10-02  1:41 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel, linux-scsi

On Wed, Oct 02, 2002 at 02:16:07AM +0100, Alan Cox wrote:
> On Wed, 2002-10-02 at 01:28, Doug Ledford wrote:
> > 2.5.40 test machine is blowing chunks on the serverworks IDE support right 
> > now so it isn't tested :-(
> 
> Please provide more details

[ linux-kernel added to the Cc: since that's where IDE stuff goes ]

OK, it repeatedly hits a BUG() during startup (after detecting the first 
two hard drives on the Primary IDE channel, it takes an interrupt and then 
during that context calls detect_ide_disk_speed or something like that, 
which then calls _kmem_cache_alloc during said interrupt context, 
resulting in the stack printout and traceback, then it continues).  Second 
problem is that as soon as I started up /dev/md0, a raid1 array that was 
dirty due to all the failed boots, it almost immediately hard locks the 
system.  That could be a raid1 hardlock, or it could be a serverworks or 
generic ide code lockup.  Important details would be that the two mirror 
disks for /dev/md0 are on ide0 and ide1 so a rebuild would generate lots 
of primary/secondary controller parallel traffic.

I didn't provide a lot of details before because this machine doesn't have
a serial console host to grab logs and I didn't write all the stuff down.  
Then there is the additional fubar stuff but it's mainly in the category
of "Red Hat Linux 8.0 made initrd images using nash as the shell blow
chunks in regards to actually working with the 2.5.40 kernel, failing
simple operations such as mounting /proc, which of course causes damn near
everything else it tries to do to fail since the normal /proc files aren't
available".

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: PATCH: scsi device queue depth adjustability patch
  2002-10-02  1:41     ` Doug Ledford
@ 2002-10-02 13:44       ` Alan Cox
  0 siblings, 0 replies; 297+ messages in thread
From: Alan Cox @ 2002-10-02 13:44 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-kernel, linux-scsi

On Wed, 2002-10-02 at 02:41, Doug Ledford wrote:
> I didn't provide a lot of details before because this machine doesn't have
> a serial console host to grab logs and I didn't write all the stuff down.  
> Then there is the additional fubar stuff but it's mainly in the category
> of "Red Hat Linux 8.0 made initrd images using nash as the shell blow
> chunks in regards to actually working with the 2.5.40 kernel, failing
> simple operations such as mounting /proc, which of course causes damn near
> everything else it tries to do to fail since the normal /proc files aren't
> available".

OK, first thing to try: does 2.4.20-pre8-ac3 also lock up?  That will
separate problems with the 2.5.40 kernel from IDE ones.  In particular, I
notice you mention using initrd, which doesn't seem to work in 2.5.40.

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: PATCH: scsi device queue depth adjustability patch
  2002-10-02  0:28 ` PATCH: scsi device queue depth adjustability patch Doug Ledford
  2002-10-02  1:16   ` Alan Cox
@ 2002-10-02 21:41   ` James Bottomley
  2002-10-02 22:18     ` Doug Ledford
  2002-10-03 14:25   ` James Bottomley
  2 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-10-02 21:41 UTC (permalink / raw)
  To: linux-scsi

dledford@redhat.com said:
> This patch makes it possible to adjust the queue depth of a scsi
> device  after it has been in use some time and you have a better idea
> of what the  optimal queue depth should be.  For the most part this
> should work, but my  2.5.40 test machine is blowing chunks on the
> serverworks IDE support right  now so it isn't tested :-( 

I note that there's a lot more than dynamic queue depth adjustment in this 
patch (PPR inquiry, slave attach etc.).

How do HBAs that don't support this now work?  You've taken any dependency on 
scsi_host.cmd_per_lun out of the code (thus rendering it useless) so every HBA 
driver now is forced to use an initial queue depth adjustment just to start 
tagged command queueing.  Can't we at least start with cmd_per_lun as the 
default depth?

I'm not entirely happy with the idea that we control the queue depth by 
adjusting the number of the device's allocated commands.  I know the patch 
goes to great lengths to move these kmallocs out of the critical path, but 
there are certain environments (multi-initiator) where the queue depth can be 
nastily and randomly variable.  If the allocations were more lazy (wait a 
while before freeing a struct Scsi_Cmnd to see if the queue depth goes up 
again for instance) this would address some of these concerns (perhaps just 
moving to a slab allocator for command blocks would do it?)

> Side note: I left the control of queue depth setting solely in the
> hands  of the low level drivers since they are the *only* ones that
> can get an  accurate queue depth reading at the time of any given
> QUEUE_FULL message  (see what the aic7xxx_old driver has to do in the
> QUEUE_FULL handler to  find out how many commands the drive has seen
> at this exact point in time  vs. how many we may have queued up to the
> card, the difference in numbers  can be significant).  For that
> reason, the adjust_queue_depth call was  made to defer the action
> until later so that it was interrupt context  safe.

I appreciate that the HBA driver is the most exact counter of the queue depth, 
but would it make a significant difference if the adjustments were done 
globally in the mid-layer?  The great advantage is that we would gain dynamic 
queue depth adjustments without having to add specific code to every driver, 
at the cost of not always being entirely accurate about the depth.  Is there a 
good argument that this really, really must be done at the LLD level given the 
cost in terms of LLD modifications?  You could still add hooks for those HBAs 
that really want to do it themselves.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: PATCH: scsi device queue depth adjustability patch
  2002-10-02 21:41   ` James Bottomley
@ 2002-10-02 22:18     ` Doug Ledford
  2002-10-02 23:19       ` James Bottomley
  2002-10-03 12:46       ` James Bottomley
  0 siblings, 2 replies; 297+ messages in thread
From: Doug Ledford @ 2002-10-02 22:18 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

On Wed, Oct 02, 2002 at 05:41:12PM -0400, James Bottomley wrote:
> dledford@redhat.com said:
> > This patch makes it possible to adjust the queue depth of a scsi
> > device  after it has been in use some time and you have a better idea
> > of what the  optimal queue depth should be.  For the most part this
> > should work, but my  2.5.40 test machine is blowing chunks on the
> > serverworks IDE support right  now so it isn't tested :-( 
> 
> I note that there's a lot more than dynamic queue depth adjustment in this 
> patch (PPR inquiry, slave attach etc.).

The slave attach is required for the queue depth adjustment stuff to work.  
The goal is to remove the select_queue_depths() function entirely and make 
slave_attach() -> adjust_queue_depth() the replacement.  The PPR message 
ability flag was just a simple bit that goes along with slave attach.  
There's more to be done there, but that's in there as a sample.
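
For reference, the INQUIRY checks the patch adds to scsi_scan.c can be 
sketched in isolation as below.  The bit masks are copied from the patch; 
the struct, the function name and the is_scsi3_or_later parameter are 
hypothetical stand-ins for the sdev fields, and the length check on byte 7 
is an added assumption, not something the patch does.

```c
#include <stddef.h>

/* Hypothetical stand-in for the sdtr/wdtr/ppr bits added to Scsi_Device. */
struct xfer_caps {
	unsigned sdtr:1;	/* device supports SDTR messages */
	unsigned wdtr:1;	/* device supports WDTR messages */
	unsigned ppr:1;		/* device supports PPR messages */
};

/*
 * Decode transfer negotiation capabilities from raw INQUIRY data.
 * Masks mirror the patch: byte 7 & 0x60 for wide, byte 7 & 0x10 for
 * sync, byte 56 & 0x04 (or a SCSI-3 device) for PPR.
 */
struct xfer_caps decode_xfer_caps(const unsigned char *inq, size_t inq_len,
				  int is_scsi3_or_later)
{
	struct xfer_caps caps = { 0, 0, 0 };

	if (is_scsi3_or_later || (inq_len > 56 && (inq[56] & 0x04)))
		caps.ppr = 1;
	/* Assumption: guard byte 7 with a length check as well. */
	if (inq_len > 7 && (inq[7] & 0x60))
		caps.wdtr = 1;
	if (inq_len > 7 && (inq[7] & 0x10))
		caps.sdtr = 1;
	return caps;
}
```

A slave_attach() implementation could then consult these bits before 
deciding which negotiation messages to try with the device.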

> How do HBAs that don't support this now work?

The mid layer calls into the HBA driver's select_queue_depths() function 
and the driver sets the queue depth on each device from there.

>  You've taken any dependency on 
> scsi_host.cmd_per_lun out of the code (thus rendering it useless) so every HBA 
> driver now is forced to use an initial queue depth adjustment just to start 
> tagged command queueing.

This is true both before and after my patch.  If the HBA doesn't implement 
the select_queue_depths function then it will never get tagged queueing.  
Besides, cmd_per_lun is *not* for tagged queueing, it's the queue depth 
the scsi mid layer is supposed to keep for *non* tagged devices.

>  Can't we at least start with cmd_per_lun as the 
> default depth?

The current mid layer has never enabled a default depth; all scsi drivers 
must enable tagged queueing for each device before it happens.  I'm just 
changing how that's done, from a specific select_queue_depths() call-in 
point that does one and only one thing to a generic slave_attach() call-in 
point that sets various drive-specific items based on post-INQUIRY data, 
queue depth and acceptable speed negotiation message protocols being the 
two examples in my patch.

> I'm not entirely happy with the idea that we control the queue depth by 
> adjusting the number of the device's allocated commands.

It's the only safe way to do it.  You can't count on being able to 
allocate commands because you may not have available mem, and you don't 
want extras lying around because that wastes mem.  It also allows my next 
changes to go in, which include switching to linked lists for commands.  
That way we can have a free list for each lun, and when there is a command 
on the free list you know you are under your queue depth count, because 
if all commands are in use then none are on the free list.
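
The deferred adjustment described in the patch's comment above 
scsi_adjust_queue_depth() can be sketched in user space roughly as 
follows.  This is only an illustration: struct cmd, struct dev and 
complete_cmd() are hypothetical simplifications of Scsi_Cmnd, Scsi_Device 
and the completion path, and malloc()/free() stand in for the kernel 
allocator.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for Scsi_Cmnd and Scsi_Device. */
struct cmd {
	struct cmd *next;
};

struct dev {
	struct cmd *free_list;		/* idle command blocks */
	unsigned short queue_depth;	/* blocks currently allocated */
	unsigned short new_queue_depth;	/* depth the driver asked for */
};

static void push_free(struct dev *d, struct cmd *c)
{
	c->next = d->free_list;
	d->free_list = c;
}

/*
 * Called when a command completes.  Each completion moves queue_depth
 * one step toward new_queue_depth, so an idle device is never touched.
 */
void complete_cmd(struct dev *d, struct cmd *done)
{
	if (d->new_queue_depth < d->queue_depth) {
		/* Too many blocks: drop this one instead of recycling it. */
		free(done);
		d->queue_depth--;
		return;
	}
	push_free(d, done);
	if (d->new_queue_depth > d->queue_depth) {
		/* Too few blocks: try to add one; failure is not fatal. */
		struct cmd *extra = malloc(sizeof(*extra));

		if (extra != NULL) {
			push_free(d, extra);
			d->queue_depth++;
		}
	}
}
```

Because the work is spread over completions, no kernel thread is needed 
and a failed allocation simply gets retried on the next completion.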

>  I know the patch 
> goes to great lengths to move these kmallocs out of the critical path, but 
> there are certain environments (multi-initiator) where the queue depth can be 
> nastily and randomly variable.

Go look at the QUEUE_FULL handler in the aic7xxx_old driver.  This is how 
most/all reasonably well written drivers handle queue depth adjustments.  
Trust me, they don't go around adjusting the depth all the time.  Most of 
the time there will be one initial adjustment, then maybe one more 
adjustment as we lock it down to the max upper limit when one exists, the 
rest of the time we just handle the occasional random queue depth 
QUEUE_FULL messages as exactly that and only temporarily freeze the queue 
to let the drive get some work done.

>  If the allocations were more lazy (wait a 
> while before freeing a struct Scsi_Cmnd to see if the queue depth goes up 
> again for instance) this would address some of these concerns (perhaps just 
> moving to a slab allocator for command blocks would do it?)

Nope, this is a non concern because we don't call adjust_queue_depth on 
every queue full (or at least we shouldn't, there isn't any documentation 
yet to tell people that, but that was the purpose of making the 
aic7xxx_old driver implement the new method since it serves as an example 
and it does exactly as I'm talking about).

> > Side note: I left the control of queue depth setting solely in the
> > hands  of the low level drivers since they are the *only* ones that
> > can get an  accurate queue depth reading at the time of any given
> > QUEUE_FULL message  (see what the aic7xxx_old driver has to do in the
> > QUEUE_FULL handler to  find out how many commands the drive has seen
> > at this exact point in time  vs. how many we may have queued up to the
> > card, the difference in numbers  can be significant).  For that
> > reason, the adjust_queue_depth call was  made to defer the action
> > until later so that it was interrupt context  safe.
> 
> I appreciate that the HBA driver is the most exact counter of the queue depth, 
> but would it make a significant difference if the adjustments were done 
> globally in the mid-layer?

The mid layer simply does not have access to the info needed to do what 
you are talking about.  Let me give you an example from the aic7xxx 
driver.  On this card we have a sequencer that handles the scsi protocol 
for us.  When we get a command from the mid layer it goes either to the 
card's QINFIFO (queue of commands that the sequencer hasn't started yet) 
or to the device's waiting queue (queue of commands that we can't send to 
the sequencer yet because of some reason).  Once it is on the card and the 
sequencer starts the process of selecting the target device, the command 
is moved to the waiting_q on the card (queue of commands already started 
but for which the target device has not yet responded).  Once the device 
is selected and we get to send the command, then we usually go directly 
into status phase and get our QUEUE_FULL.  At this point the sequencer 
pauses all operations on the scsi bus and interrupts the kernel to handle 
the QUEUE_FULL.  Prior to this command being selected, there may have been 
any number of commands placed in the card's QOUTFIFO (our queue of already 
completed SCSI commands which the kernel interrupt driver hasn't processed 
yet).  So when the kernel driver gets the QUEUE_FULL interrupt, it must 
first plug this device up.  It then must remove any commands it finds in 
the QINFIFO and in the waiting_q on the card and requeue them back into 
the device's waiting queue.  Each command it moves like this decrements 
the device's active command count since when we place a command on the 
QINFIFO that's when we increment the active count.  Then we have to pull 
any possible completed commands out of the QOUTFIFO and place them on the 
done queue (commands we have queued to go back to the mid layer), also 
decrementing the active count each time.  Finally, we decrement the active 
count for the device one more time to accommodate our QUEUE_FULL command.  
Only after we have done *all* of this crap do we have an accurate queue 
depth count for the command that returned a QUEUE_FULL.  As you can see, 
lots of this is very hardware specific.  Controller cards that don't have 
a sequencer-like control engine won't be starting commands off in a lazy 
fashion like this and so don't need all this stuff.  That's why the only 
place where we can get an accurate count of the queue depth is in the low 
level driver.  So, that code *must* stay there since getting an accurate 
count is how we avoid the random flip flops of queue depth that you were 
worried about earlier.
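
Stripped of the hardware details, the bookkeeping above boils down to the 
arithmetic in this sketch.  The struct and function names are made up for 
illustration and are not taken from the aic7xxx_old source; the real 
driver walks the queues rather than being handed counts.

```c
/* Hypothetical snapshot of one device's state at QUEUE_FULL time. */
struct qfull_snapshot {
	int active;		/* active count when the interrupt arrived */
	int in_qinfifo;		/* on the card, not yet started */
	int in_waiting_q;	/* selection started, target not yet responded */
	int in_qoutfifo;	/* completed, not yet processed by the driver */
};

/*
 * Number of commands the drive was really holding when it returned
 * QUEUE_FULL: everything counted as active, minus what never reached
 * the drive, minus what already finished, minus the rejected command.
 */
int depth_at_queue_full(const struct qfull_snapshot *s)
{
	int active = s->active;

	active -= s->in_qinfifo;	/* requeued to the device waiting queue */
	active -= s->in_waiting_q;	/* likewise requeued */
	active -= s->in_qoutfifo;	/* moved to the done queue */
	active -= 1;			/* the QUEUE_FULL command itself */
	return active;
}
```

With 32 commands counted as active, 5 in the QINFIFO, 2 in the waiting_q 
and 3 in the QOUTFIFO, the drive actually held 21 commands, which is well 
below what a naive count would suggest.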

Now, what can be moved to the mid layer, and is on my list to do, is a 
generic interface for coming up with the proper action after a QUEUE_FULL.  
Currently, each driver not only determines the real depth, but then also 
does its own magic to tell if it's a random event or a hard limit.  It 
would be easy to add something like scsi_notify_queue_full(sdev, count); 
where scsi_notify_queue_full() would keep track of the last queue full 
depth, etc.  Let the low level driver come up with the accurate count, 
then they can use this function to determine what to do about it.  On any 
change to queue depth, the function can return the amount of commands to 
add/subtract, then the low level driver can adjust its own internal 
structure counts and also call scsi_adjust_queue_depth() to have the mid 
layer do likewise.  BTW, I'll change the adjust_queue_depth code to make 
it immediately adjust the depth down when possible and do lazy increases 
so that hitting a hard limit will free up resources immediately, but that 
will go with making the commands linked list based so that it can simply 
do a while(sdev->queue_depth > sdev->new_queue_depth && 
list_not_empty(sdev->free_list)) { kfree(get_list_head(sdev->free_list)); 
sdev->queue_depth--; } 
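
Written out as a standalone function, that loop would look something like 
the sketch below.  It uses a plain singly linked free list and free() in 
place of the kernel list helpers and kfree(), and the struct layouts are 
hypothetical simplifications, since the linked-list conversion does not 
exist yet.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for Scsi_Cmnd and Scsi_Device. */
struct cmd {
	struct cmd *next;
};

struct dev {
	struct cmd *free_list;		/* idle command blocks */
	unsigned short queue_depth;	/* blocks currently allocated */
	unsigned short new_queue_depth;	/* depth the driver asked for */
};

/*
 * Shrink immediately as far as the free list allows.  Blocks still in
 * flight are left alone and get released when they complete.
 */
void shrink_queue_now(struct dev *d)
{
	while (d->queue_depth > d->new_queue_depth && d->free_list != NULL) {
		struct cmd *c = d->free_list;

		d->free_list = c->next;
		free(c);
		d->queue_depth--;
	}
}
```

Increases would stay lazy, so only the downward path needs this kind of 
immediate handling.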

>  The great advantage is that we would gain dynamic 
> queue depth adjustments without having to add specific code to every driver, 

You're never going to get around this and have a reasonably reliable 
method :-(

> at the cost of not always being entirely accurate about the depth.  Is there a 
> good argument that this really, really must be done at the LLD level given the 
> cost in terms of LLD modifications?  You could still add hooks for those HBAs 
> that really want to do it themselves.

Well, all the device drivers that implement queue depth adjustments at all 
already do it themselves, so my proposed method is better than what we 
currently have since it at least allows them to move the decide 
disposition stuff into a library call or they can do it themselves.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: PATCH: scsi device queue depth adjustability patch
  2002-10-02 22:18     ` Doug Ledford
@ 2002-10-02 23:19       ` James Bottomley
  2002-10-03 12:46       ` James Bottomley
  1 sibling, 0 replies; 297+ messages in thread
From: James Bottomley @ 2002-10-02 23:19 UTC (permalink / raw)
  To: linux-scsi

dledford@redhat.com said:
> > How do HBAs that don't support this now work?

> The mid layer calls into the HBA driver's select_queue_depths()
> function  and the driver sets the queue depth on each device from
> there. 

> >  You've taken any dependency on 
> > scsi_host.cmd_per_lun out of the code (thus rendering it useless) so every HBA 
> > driver now is forced to use an initial queue depth adjustment just to start 
> > tagged command queueing.

> This is true both before and after my patch.  If the HBA doesn't implement 
> the select_queue_depths function then it will never get tagged queueing.  

But that's not quite how it works today.

If a HBA driver doesn't implement select_queue_depths, it can still set the 
template cmd_per_lun and have the mid layer build it this many commands per 
device (that's the line in scsi_build_commandblocks which sets queue_depth to 
cmd_per_lun if it's still unset), and hence you have an effective default 
tagged queue depth of cmd_per_lun.

The patch will break this, and thus disable TCQ for all drivers that rely on 
it.

> Besides, cmd_per_lun is *not* for tagged queueing, it's the queue depth 
> the scsi mid layer is supposed to keep for *non* tagged devices.

The comment in the code above this parameter says: "...Set this to the maximum 
number of command blocks to be provided for each device.  Set this to 1 for 
one command block per lun, 2 for two, etc."  It's not unreasonable to assume 
this means the maximum number of outstanding tags.

Even if we assume it means what you say, how does the non TCQ HBA that sets 
this to 2 so every device always has one outstanding command and one ready to 
roll work? Now it only gets one command block because we never consult 
cmd_per_lun.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: PATCH: scsi device queue depth adjustability patch
  2002-10-02 22:18     ` Doug Ledford
  2002-10-02 23:19       ` James Bottomley
@ 2002-10-03 12:46       ` James Bottomley
  2002-10-03 16:35         ` Doug Ledford
  2002-10-04  1:40         ` Jeremy Higdon
  1 sibling, 2 replies; 297+ messages in thread
From: James Bottomley @ 2002-10-03 12:46 UTC (permalink / raw)
  To: linux-scsi

dledford@redhat.com said:
> Go look at the QUEUE_FULL handler in the aic7xxx_old driver.  This is
> how  most/all reasonably well written drivers handle queue depth
> adjustments.   Trust me, they don't go around adjusting the depth all
> the time.  Most of  the time there will be one initial adjustment,
> then maybe one more  adjustment as we lock it down to the max upper
> limit when one exists, the  rest of the time we just handle the
> occasional random queue depth  QUEUE_FULL messages as exactly that and
> only temporarily freeze the queue  to let the drive get some work
> done. 

OK, I spent a nice evening doing this (do I get brownie points?).  I see your 
algorithm is roughly lower the depth after 14 queue fulls and assume that luns 
of the same pun need to be treated equivalently.
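
For the record, my reading of that heuristic is roughly the sketch below.  
The names are invented for illustration and do not come from the 
aic7xxx_old source, the per-lun versus per-pun handling is ignored, and 
the reset-on-different-depth rule is an assumption about what "the same 
depth" means.

```c
#define QFULL_THRESHOLD 14	/* repeated QUEUE_FULLs before we lock in */

/* Hypothetical per-device tracking state. */
struct qfull_tracker {
	int last_depth;		/* depth seen at the previous QUEUE_FULL */
	int repeat_count;	/* consecutive QUEUE_FULLs at last_depth */
};

/*
 * Record one QUEUE_FULL seen at the given depth.  Returns the depth to
 * lock the device down to, or 0 if the event looks transient so far.
 */
int note_queue_full(struct qfull_tracker *t, int depth)
{
	if (depth != t->last_depth) {
		/* Different depth: start counting again. */
		t->last_depth = depth;
		t->repeat_count = 1;
		return 0;
	}
	if (++t->repeat_count < QFULL_THRESHOLD)
		return 0;
	/* Same depth often enough: treat it as a hard limit. */
	t->repeat_count = 0;
	return depth;
}
```

A nonzero return is the point at which a driver would lower its own 
limit and pass the new depth to scsi_adjust_queue_depth().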

I failed entirely to find how the queue depth is increased (no brownie points 
here).  How is that done?

> The mid layer simply does not have access to the info needed to do
> what  you are talking about.  Let me give you an example from the
> aic7xxx  driver.  On this card we have a sequencer that handles the
[...]
> how we avoid the random flip flops of queue depth that you were
> worried about earlier. 

Yes, many HBAs have internal "issue" queues where commands wait before being 
placed on the bus.  I was assuming, however, that when a HBA driver got QUEUE 
FULL, it would traverse the issue queue and respond QUEUE FULL also to all 
pending commands for that device.  The mid-layer should thus see a succession 
of QUEUE FULLs for the device (we even have a nice signature for this because 
the QUEUE FULL occurs while device_blocked is set).  However, as long as it 
can correctly recognise this, it knows that when the last QUEUE FULL is 
through it has the true device queue depth, doesn't it?

> Now, what can be moved to the mid layer, and is on my list to do, is a
>  generic interface for coming up with the proper action after a
> QUEUE_FULL.   Currently, each driver not only determines the real
> depth, but then also  does its own magic to tell if it's a random
> event or a hard limit.  It  would be easy to add something like
> scsi_notify_queue_full(sdev, count);  where scsi_notify_queue_full()
> would keep track of the last queue full  depth, etc.  Let the low
> level driver come up with the accurate count,  then they can use this
> function to determine what to do about it.  On any  change to queue
> depth, the function can return the amount of commands to  add/
> subtract, then the low level driver can adjust its own internal
> structure counts and also call scsi_adjust_queue_depth() to have the
> mid  layer do likewise.  BTW, I'll change the adjust_queue_depth code
> to make  it immediately adjust the depth down when possible and do
> lazy increases  so that hitting a hard limit will free up resources
> immediately, but that  will go with making the commands linked list
> based so that it can simply  do a while(sdev->queue_depth > sdev->
> new_queue_depth &&  list_not_empty(sdev->free_list)) {
> kfree(get_list_head(sdev->free_list));  sdev->queue_depth--; }  

I'll go for this.  That would address my main concern which is a proliferation 
of individual queue full handling algorithms in the LLDs. (and it's better 
than teaching the mid-layer about QUEUE FULL sequences).

> Well, all the device drivers that implement queue depth adjustments at
> all  already do it themselves, so my proposed method is better than
> what we  currently have since it at least allows them to move the
> decide  disposition stuff into a library call or they can do it
> themselves.

Fair enough.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: PATCH: scsi device queue depth adjustability patch
  2002-10-03 12:46       ` James Bottomley
@ 2002-10-03 16:35         ` Doug Ledford
  2002-10-04  1:40         ` Jeremy Higdon
  1 sibling, 0 replies; 297+ messages in thread
From: Doug Ledford @ 2002-10-03 16:35 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

On Thu, Oct 03, 2002 at 08:46:19AM -0400, James Bottomley wrote:
> dledford@redhat.com said:
> > Go look at the QUEUE_FULL handler in the aic7xxx_old driver.  This is
> > how  most/all reasonably well written drivers handle queue depth
> > adjustments.   Trust me, they don't go around adjusting the depth all
> > the time.  Most of  the time there will be one initial adjustment,
> > then maybe one more  adjustment as we lock it down to the max upper
> > limit when one exists, the  rest of the time we just handle the
> > occasional random queue depth  QUEUE_FULL messages as exactly that and
> > only temporarily freeze the queue  to let the drive get some work
> > done. 
> 
> OK, I spent a nice evening doing this (do I get brownie points?).  I see your 
> algorithm is roughly lower the depth after 14 queue fulls and assume that luns 
> of the same pun need to be treated equivalently.

(Note: the aic7xxx_old driver has flawed pun/lun tagged queue 
differentiation.  I know this.  It's not trivial to fix and I'm not going 
to fix it until I send in the patch that implements decision making at the 
library level.  I have no plan to ever fix it in the actual aic7xxx_old 
driver, only in the library)  Yes, you get brownie points ;-)

> I failed entirely to find how the queue depth is increased (no brownie points 
> here).  How is that done?

It isn't.  When we set the queue depth during init, we set it to the 
maximum allowed.  We only reduce the depth after getting 14 consecutive 
QUEUE_FULLs at the same depth.  By that point in time we can safely assume 
that the 14 QUEUE_FULLs represented a hard limit, not a transient limit, 
and therefore there is no need to ever raise the limit again.  Keep in 
mind that in the driver, every time we get a QUEUE_FULL, we freeze the 
queue until either A) a command completes or B) 10ms have passed.  That 
means we won't get 14 QUEUE_FULLs without the drive having a chance to 
change the current situation and remedy any transient conditions.
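
A minimal sketch of the counting rule described above, not the code from
aic7xxx_old.  The names (qfull_state, qfull_note, QFULL_HARD_LIMIT) are
hypothetical, and it assumes that "active" is the number of commands the
device had accepted when the QUEUE_FULL arrived, and that a QUEUE_FULL at a
different count restarts the run:

```c
/* Hypothetical names throughout; this is not code from aic7xxx_old. */
#define QFULL_HARD_LIMIT 14   /* consecutive QUEUE_FULLs before we believe it */

struct qfull_state {
	int depth;        /* queue depth currently allowed for the device */
	int last_active;  /* outstanding count at the previous QUEUE_FULL */
	int hits;         /* consecutive QUEUE_FULLs seen at that count */
};

/*
 * Record one QUEUE_FULL seen with `active` commands outstanding.
 * Returns how many commands the depth was lowered by, or 0 while the
 * event is still being treated as transient.
 */
int qfull_note(struct qfull_state *s, int active)
{
	int delta;

	if (active != s->last_active) {
		/* Different count than last time: start a new run. */
		s->last_active = active;
		s->hits = 1;
		return 0;
	}
	if (++s->hits < QFULL_HARD_LIMIT)
		return 0;

	/* Same count 14 times in a row: assume a hard limit. */
	delta = s->depth - active;
	if (delta < 0)
		delta = 0;
	s->depth -= delta;
	s->hits = 0;
	return delta;
}
```

The depth is only ever lowered here, matching the "never raise it again"
behaviour; the queue freeze between QUEUE_FULLs is left out of the sketch.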

> > The mid layer simply does not have access to the info needed to do
> > what  you are talking about.  Let me give you an example from the
> > aic7xxx  driver.  On this card we have a sequencer that handles the
> [...]
> > how we avoid the random flip flops of queue depth that you were
> > worried about earlier. 
> 
> Yes, many HBAs have internal "issue" queues where commands wait before being 
> placed on the bus.  I was assuming, however, that when a HBA driver got QUEUE 
> FULL, it would traverse the issue queue and respond QUEUE FULL also to all 
> pending commands for that device.  The mid-layer should thus see a succession 
> of QUEUE FULLs for the device (we even have a nice signature for this because 
> the QUEUE FULL occurs while device_blocked is set).  However, as long as it 
> can correctly recognise this, it knows that when the last QUEUE FULL is 
> through it has the true device queue depth, doesn't it?

Nope.  It *might* be possible to make this work, but there is no guarantee 
as it stands.  Consider that while processing the QUEUE_FULLs I may have 
other successfully completed commands on my done queue waiting to be sent 
to the mid layer.  Furthermore, they get queued for a tasklet to handle.  
What happens if a driver sends the QUEUE_FULLs first and then the 
completed commands next?  We get a wrong count.  In general, you are 
suggesting that we impose a strict synchronization between the mid layer 
and the low level driver, including ordering restrictions and more, when 
instead we could just call one function with our final count once we know 
it.  We still have to do all the work in the low level driver of finding 
the commands anyway, so my way of doing things only adds a few lines of 
code to each driver beyond what they must have regardless and it avoids 
the synchronization headache this other method brings up.

> > Now, what can be moved to the mid layer, and is on my list to do, is a
> >  generic interface for coming up with the proper action after a
> > QUEUE_FULL.   Currently, each driver not only determines the real
> > depth, but then also  does its own magic to tell if it's a random
> > event or a hard limit.  It  would be easy to add something like
> > scsi_notify_queue_full(sdev, count);  where scsi_notify_queue_full()
> > would keep track of the last queue full  depth, etc.  Let the low
> > level driver come up with the accurate count,  then they can use this
> > function to determine what to do about it.  On any  change to queue
> > depth, the function can return the amount of commands to  add/
> > subtract, then the low level driver can adjust its own internal
> > structure counts and also call scsi_adjust_queue_depth() to have the
> > mid  layer do likewise.  BTW, I'll change the adjust_queue_depth code
> > to make  it immediately adjust the depth down when possible and do
> > lazy increases  so that hitting a hard limit will free up resources
> > immediately, but that  will go with making the commands linked list
> > based so that it can simply  do a while(sdev->queue_depth > sdev->
> > new_queue_depth &&  list_not_empty(sdev->free_list)) {
> > kfree(get_list_head(sdev->free_list));  sdev->queue_depth--; }  
> 
> I'll go for this.  That would address my main concern which is a proliferation 
> of individual queue full handling algorithms in the LLDs. (and it's better 
> than teaching the mid-layer about QUEUE FULL sequences).

Good, then we agree 100% ;-)

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: PATCH: scsi device queue depth adjustability patch
  2002-10-03 12:46       ` James Bottomley
  2002-10-03 16:35         ` Doug Ledford
@ 2002-10-04  1:40         ` Jeremy Higdon
  1 sibling, 0 replies; 297+ messages in thread
From: Jeremy Higdon @ 2002-10-04  1:40 UTC (permalink / raw)
  To: James Bottomley, linux-scsi

On Oct 3,  8:46am, James Bottomley wrote:
> 
> > The mid layer simply does not have access to the info needed to do
> > what  you are talking about.  Let me give you an example from the
> > aic7xxx  driver.  On this card we have a sequencer that handles the
> [...]
> > how we avoid the random flip flops of queue depth that you were
> > worried about earlier. 
> 
> Yes, many HBAs have internal "issue" queues where commands wait before being 
> placed on the bus.  I was assuming, however, that when a HBA driver got QUEUE 
> FULL, it would traverse the issue queue and respond QUEUE FULL also to all 
> pending commands for that device.  The mid-layer should thus see a succession 
> of QUEUE FULLs for the device (we even have a nice signature for this because 
> the QUEUE FULL occurs while device_blocked is set).  However, as long as it 
> can correctly recognise this, it knows that when the last QUEUE FULL is 
> through it has the true device queue depth, doesn't it?

The Qlogic host adapters do not provide a way to find the true queue depth
to the device as described previously by Doug.  The midlayer depth is
probably about as accurate as you could get, with the exception being
if the Qlogic driver runs out of IOCBs.  I don't believe that they return
Queue Full for unsubmitted commands, but I do not know for sure.  I certainly
would not expect every host adapter to do that.

That was one of the holes in the Ordered Tag scheme.  You issue a bunch
of unordered commands, then an ordered one, followed by a bunch of unordered.

If the ordered tag gets the Queue Full, and you are depending on the
Ordered semantics, you're in Big Trouble.

jeremy

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: PATCH: scsi device queue depth adjustability patch
  2002-10-02  0:28 ` PATCH: scsi device queue depth adjustability patch Doug Ledford
  2002-10-02  1:16   ` Alan Cox
  2002-10-02 21:41   ` James Bottomley
@ 2002-10-03 14:25   ` James Bottomley
  2002-10-03 16:41     ` Doug Ledford
  2 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-10-03 14:25 UTC (permalink / raw)
  To: linux-scsi

[-- Attachment #1: Type: text/plain, Size: 573 bytes --]

dledford@redhat.com said:
> What I need, people to test this with the old aic7xxx driver (which
> implements the new code paths, no other drivers do) and people to test
>  with other drivers to make sure it doesn't break them.  If I don't
> hear  complaints then I'll move on to my next change which is far more
> intrusive  on drivers in general. 

After fixing up the 53c700 to use the new adjust queue depths stuff, I can 
confirm it's working fine for me here (well, actually I fixed the generic scsi 
tcq code to use it, but only the 53c700 driver uses this).

James


[-- Attachment #2: tmp.diff --]
[-- Type: text/plain , Size: 641 bytes --]

===== scsi.h 1.22 vs edited =====
--- 1.22/drivers/scsi/scsi.h	Thu Aug 15 16:01:28 2002
+++ edited/scsi.h	Thu Aug 15 16:18:53 2002
@@ -882,7 +882,7 @@
 
         if(SDpnt->tagged_supported && !blk_queue_tagged(q)) {
                 blk_queue_init_tags(q, depth);
-                SDpnt->tagged_queue = 1;
+		scsi_adjust_queue_depth(SDpnt, 1, depth);
         }
 }
 
@@ -892,7 +892,7 @@
  **/
 static inline void scsi_deactivate_tcq(Scsi_Device *SDpnt) {
         blk_queue_free_tags(&SDpnt->request_queue);
-        SDpnt->tagged_queue = 0;
+	scsi_adjust_queue_depth(SDpnt, 0, 2);
 }
 #define MSG_SIMPLE_TAG	0x20
 #define MSG_HEAD_TAG	0x21

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: PATCH: scsi device queue depth adjustability patch
  2002-10-03 14:25   ` James Bottomley
@ 2002-10-03 16:41     ` Doug Ledford
  2002-10-03 17:00       ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-10-03 16:41 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

On Thu, Oct 03, 2002 at 10:25:44AM -0400, James Bottomley wrote:
> dledford@redhat.com said:
> > What I need, people to test this with the old aic7xxx driver (which
> > implements the new code paths, no other drivers do) and people to test
> >  with other drivers to make sure it doesn't break them.  If I don't
> > hear  complaints then I'll move on to my next change which is far more
> > intrusive  on drivers in general. 
> 
> After fixing up the 53c700 to use the new adjust queue depths stuff, I can 
> confirm it's working fine for me here (well, actually I fixed the generic scsi 
> tcq code to use it, but only the 53c700 driver uses this).

Excellent ;-)  However, a couple questions.

1)  When/How is the lldd notified that tagged queueing has been enabled 
and the depth to which it is enabled?

2)  How does the lldd tell the midlayer (and hence the block layer) to 
reduce the queue depth?


Content-Description: tmp.diff
> ===== scsi.h 1.22 vs edited =====
> --- 1.22/drivers/scsi/scsi.h	Thu Aug 15 16:01:28 2002
> +++ edited/scsi.h	Thu Aug 15 16:18:53 2002
> @@ -882,7 +882,7 @@
>  
>          if(SDpnt->tagged_supported && !blk_queue_tagged(q)) {
>                  blk_queue_init_tags(q, depth);
> -                SDpnt->tagged_queue = 1;
> +		scsi_adjust_queue_depth(SDpnt, 1, depth);
>          }
>  }
>  
> @@ -892,7 +892,7 @@
>   **/
>  static inline void scsi_deactivate_tcq(Scsi_Device *SDpnt) {
>          blk_queue_free_tags(&SDpnt->request_queue);
> -        SDpnt->tagged_queue = 0;
> +	scsi_adjust_queue_depth(SDpnt, 0, 2);
>  }
>  #define MSG_SIMPLE_TAG	0x20
>  #define MSG_HEAD_TAG	0x21


-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: PATCH: scsi device queue depth adjustability patch
  2002-10-03 16:41     ` Doug Ledford
@ 2002-10-03 17:00       ` James Bottomley
  0 siblings, 0 replies; 297+ messages in thread
From: James Bottomley @ 2002-10-03 17:00 UTC (permalink / raw)
  To: linux-scsi

dledford@redhat.com said:
> 1)  When/How is the lldd notified that tagged queueing has been
> enabled  and the depth to which it is enabled?

A trick question. Obviously it should be in slave_attach.  However, the 53c700 
is used with a lot of quirky drives, so the way I do it is to shadow the 
commands.  The first command that comes down with tagged_supported set, I try 
a tagged command.  If that command succeeds, I then activate the TCQ and thus 
set tagged_enabled (the driver has several bits per target to support this 
state model). If the first tag fails, I clear tagged_supported.  This catches 
the annoying drives that set the  CmdQue bit in the inquiry page and then 
reject all tagged commands.  The depth for the 53c700 is just the default 
cmd_per_lun.

You can see this in the source code (tag_negotiated bitmap and 
NCR_700_DEV_BEGIN_TAG_QUEUEING flag).
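
The probing scheme described above could be sketched roughly as follows.
This is not the 53c700 source: the enum, struct and function names are
hypothetical, and it assumes commands issued while the probe is in flight
simply go out untagged.

```c
/* Hypothetical per-target state; the real driver keeps bitmaps instead. */
enum tag_state { TAG_UNTESTED, TAG_PROBING, TAG_ENABLED, TAG_REJECTED };

struct target {
	int tagged_supported;   /* device claimed CmdQue in its inquiry data */
	enum tag_state state;
};

/* Decide whether the next command for this target should carry a tag. */
int should_send_tagged(struct target *t)
{
	if (!t->tagged_supported)
		return 0;
	if (t->state == TAG_UNTESTED) {
		t->state = TAG_PROBING;   /* this command is the probe */
		return 1;
	}
	return t->state == TAG_ENABLED;
}

/* Called when the probe command finishes; ok is nonzero on success. */
void probe_done(struct target *t, int ok)
{
	if (t->state != TAG_PROBING)
		return;
	if (ok) {
		t->state = TAG_ENABLED;
	} else {
		/* Drive advertised CmdQue but rejected the tag. */
		t->state = TAG_REJECTED;
		t->tagged_supported = 0;
	}
}
```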

> 2)  How does the lldd tell the midlayer (and hence the block layer) to
>  reduce the queue depth? 

It can't, currently.  This could be done, but only by disabling and reenabling 
the block layer TCQ support.  The block layer currently uses a static array 
for this.

However, the resources the block layer uses are tiny (one entry in this array 
per tag, plus a bitlist) so it's probably just not worth freeing and 
reacquiring these.  To integrate with the blk layer, I'd propose just 
initialising the blk layer with the maximum possible number of tags (i.e. 
introduce an upper limit somewhere) and then just do the dynamic stuff in the 
mid-layer.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* scsi_scan.c question
@ 2002-10-16 21:35 ` Doug Ledford
  2002-10-16 21:41   ` James Bottomley
  2002-10-16 21:57   ` Patrick Mansfield
  0 siblings, 2 replies; 297+ messages in thread
From: Doug Ledford @ 2002-10-16 21:35 UTC (permalink / raw)
  To: linux-scsi

Is anyone still working on this code or have the changes other people 
wanted to make all been integrated?

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: scsi_scan.c question
  2002-10-16 21:35 ` scsi_scan.c question Doug Ledford
@ 2002-10-16 21:41   ` James Bottomley
  2002-10-17  0:18     ` Doug Ledford
  2002-10-16 21:57   ` Patrick Mansfield
  1 sibling, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-10-16 21:41 UTC (permalink / raw)
  To: linux-scsi

dledford@redhat.com said:
> Is anyone still working on this code or have the changes other people
> wanted to make all been integrated? 

I have some changes to throw most of it away and re-implement it in user space 
using hotplug instead.  I think this type of thing is a little radical for the 
eve of 2.6, so feel free to alter scsi_scan.c in any way you want.

James



^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: scsi_scan.c question
  2002-10-16 21:41   ` James Bottomley
@ 2002-10-17  0:18     ` Doug Ledford
  0 siblings, 0 replies; 297+ messages in thread
From: Doug Ledford @ 2002-10-17  0:18 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

On Wed, Oct 16, 2002 at 02:41:51PM -0700, James Bottomley wrote:
> dledford@redhat.com said:
> > Is anyone still working on this code or have the changes other people
> > wanted to make all been integrated? 
> 
> I have some changes to throw most of it away and re-implement it in user space 
> using hotplug instead.

(Not a bad plan, I rather like that idea myself)

>  I think this type of thing is a little radical for the 
> eve of 2.6, so feel free to alter scsi_scan.c in any way you want.

OK, then I'm working on it as soon as my 2.5 test box works right again 
(ServerWorks IDE seems broke on my box and I don't have the time to fix it 
when I've got this other stuff waiting).

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: scsi_scan.c question
  2002-10-16 21:35 ` scsi_scan.c question Doug Ledford
  2002-10-16 21:41   ` James Bottomley
@ 2002-10-16 21:57   ` Patrick Mansfield
  2002-10-18 15:57     ` Patrick Mansfield
  1 sibling, 1 reply; 297+ messages in thread
From: Patrick Mansfield @ 2002-10-16 21:57 UTC (permalink / raw)
  To: linux-scsi

On Wed, Oct 16, 2002 at 05:35:18PM -0400, Doug Ledford wrote:
> Is anyone still working on this code or have the changes other people 
> wanted to make all been integrated?

Not that I know of.

I never re-sent patches to change the GFP_ATOMIC to GFP_KERNEL. With
the sleep debugging, and per other discussions, no in_interrupt check
should be needed. I can create, test, and send them to you or
the list, but the change is simple.

More device attributes should be added, so we can move away from
/proc/scsi - like vendor, product, online, and maybe more.

The device model "name" should be replaced for now with our own
"uid" field, and name used as a generic description, probably the
same thing that is put into the type field right now, or something
for disk like "scsi direct-access" and for tape "scsi sequential-access".

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: scsi_scan.c question
  2002-10-16 21:57   ` Patrick Mansfield
@ 2002-10-18 15:57     ` Patrick Mansfield
  0 siblings, 0 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-10-18 15:57 UTC (permalink / raw)
  To: linux-scsi

On Wed, Oct 16, 2002 at 02:57:28PM -0700, Patrick Mansfield wrote:
> On Wed, Oct 16, 2002 at 05:35:18PM -0400, Doug Ledford wrote:
> > Is anyone still working on this code or have the changes other people 
> > wanted to make all been integrated?
> 
> Not that I know of.

Sorry, I did not include the SCSI-2 LUN setting changes for scsi_scan.c in
my recent patch consolidating SCSI-2 command LUN setting.  I have re-rolled
the patch with the scsi_scan.c changes, and I'm asking James to push it.

Also, I was excluding my multi-path changes - I'm assuming I'll have to
merge with whatever changes you make.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 297+ messages in thread

* aic7xxx_biosparam
@ 2002-11-18  0:27 ` Doug Ledford
  2002-11-18  0:36   ` aic7xxx_biosparam J.E.J. Bottomley
                     ` (2 more replies)
  0 siblings, 3 replies; 297+ messages in thread
From: Doug Ledford @ 2002-11-18  0:27 UTC (permalink / raw)
  To: Linux Scsi Mailing List

[-- Attachment #1: Type: text/plain, Size: 688 bytes --]

So, this is the change I made to update it so that it won't puke on large 
devices.  It seems to be working OK for me, but I didn't check the gcc 
assembly output to see if it's horrible.  Unless someone tells me I'm 
smoking crack for making this change, I'll submit it pretty soon (I don't 
consider myself a partition expert, I'm just trying to make sure that when 
geometry does get defined on my driver that it at least is something the 
Adaptec BIOS will accept, and the Adaptec BIOS *only* accepts the two 
listed head/sector combos).

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

[-- Attachment #2: aic.patch --]
[-- Type: text/plain, Size: 1980 bytes --]

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.878   -> 1.878.1.1
#	drivers/scsi/aic7xxx_old.c	1.32    -> 1.33   
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/11/17	dledford@aladin.rdu.redhat.com	1.878.1.1
# aic7xxx_old: fix up the biosparam function to do 64bit math safely
# --------------------------------------------
#
diff -Nru a/drivers/scsi/aic7xxx_old.c b/drivers/scsi/aic7xxx_old.c
--- a/drivers/scsi/aic7xxx_old.c	Sun Nov 17 19:19:48 2002
+++ b/drivers/scsi/aic7xxx_old.c	Sun Nov 17 19:19:49 2002
@@ -10974,7 +10974,8 @@
 aic7xxx_biosparam(struct scsi_device *sdev, struct block_device *bdev,
 		sector_t capacity, int geom[])
 {
-  int heads, sectors, cylinders, ret;
+  sector_t heads, sectors, cylinders;
+  int ret;
   struct aic7xxx_host *p;
   unsigned char *buf;
 
@@ -10991,18 +10992,26 @@
   
   heads = 64;
   sectors = 32;
-  cylinders = (unsigned long)capacity / (heads * sectors);
+  cylinders = capacity >> 11;
 
   if ((p->flags & AHC_EXTEND_TRANS_A) && (cylinders > 1024))
   {
     heads = 255;
     sectors = 63;
-    cylinders = (unsigned long)capacity / (heads * sectors);
+    /* pull this crap because 64bit math in the kernel is a no-no as far
+     * as division is concerned, but 64bit multiplication can be done */
+    /* This shift approximates capacity / (heads * sectors) */
+    cylinders = capacity >> 14;
+    /* Now we brute force upping cylinders until we go over by 1 */
+    while( capacity >= (cylinders * sectors * heads))
+      cylinders++;
+    /* Then back it back down by one */
+    cylinders--;
   }
 
-  geom[0] = heads;
-  geom[1] = sectors;
-  geom[2] = cylinders;
+  geom[0] = (int)heads;
+  geom[1] = (int)sectors;
+  geom[2] = (int)cylinders;
 
   return (0);
 }
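
For reference, a standalone sketch of the shift-then-correct trick used in
the 255/63 branch above.  The function name is hypothetical and uint64_t
stands in for sector_t.  Because 1 << 14 (16384) is larger than 255 * 63
(16065), the shift can only undershoot the true quotient, so counting
upward and then stepping back by one lands on capacity / 16065 rounded
down:

```c
#include <stdint.h>

/* Hypothetical helper: floor(capacity / (255 * 63)) without a 64-bit divide. */
uint64_t cylinders_by_shift(uint64_t capacity)
{
	const uint64_t heads = 255, sectors = 63;
	uint64_t cylinders = capacity >> 14;   /* never above the true value */

	/* Walk up until cylinders * 16065 first exceeds capacity. */
	while (capacity >= cylinders * sectors * heads)
		cylinders++;

	/* The loop overshoots by exactly one. */
	return cylinders - 1;
}
```

The loop runs roughly capacity / 16065 - capacity / 16384 times, so the
cost grows with the size of the device.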

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx_biosparam
  2002-11-18  0:27 ` aic7xxx_biosparam Doug Ledford
@ 2002-11-18  0:36   ` J.E.J. Bottomley
  2002-11-18  2:46     ` aic7xxx_biosparam Doug Ledford
  2002-11-18  0:43   ` aic7xxx_biosparam Andries Brouwer
  2002-11-18  0:57   ` aic7xxx_biosparam Alan Cox
  2 siblings, 1 reply; 297+ messages in thread
From: J.E.J. Bottomley @ 2002-11-18  0:36 UTC (permalink / raw)
  To: Linux Scsi Mailing List

dledford@redhat.com said:
> So, this is the change I made to update it so that it won't puke on
> large  devices.  It seems to be working OK for me, but I didn't check
> the gcc  assembly output to see if it's horrible.  Unless someone
> tells me I'm  smoking crack for making this change, I'll submit it
> pretty soon (I don't  consider myself a partition expert, I'm just
> trying to make sure that when  geometry does get defined on my driver
> that it at least is something the  Adaptec BIOS will accept, and the
> Adaptec BIOS *only* accepts the two  listed head/sector combos). 

Well, SCSI doesn't implement the BIG_GETGEO ioctl, so this isn't really much 
use.  You can use sector_div to do the divisions.

However, Andries Brouwer assures me that the geometry stuff is unused (at least 
from the cylinders point of view), so all of this is unnecessary anyway.  
There's a large thread about it on linux-scsi (don't have the reference).  The 
upshot was the code you can see in scsicam.c:scsicam_bios_param().

So the short answer is that this should be unnecessary.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx_biosparam
  2002-11-18  0:36   ` aic7xxx_biosparam J.E.J. Bottomley
@ 2002-11-18  2:46     ` Doug Ledford
  2002-11-18  3:20       ` aic7xxx_biosparam J.E.J. Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-11-18  2:46 UTC (permalink / raw)
  To: J.E.J. Bottomley; +Cc: Linux Scsi Mailing List

On Sun, Nov 17, 2002 at 06:36:25PM -0600, J.E.J. Bottomley wrote:
> 
> Well, SCSI doesn't implement the BIG_GETGEO ioctl, so this isn't really much 
> use.  You can use sector_div to do the divisions.

Hmmm..should that be changed though?

> However, Andries Brouwer assures me that the geometry stuff is unused (at least 
> from the cylinders point of view), so all of this is unnecessary anyway.

I beg to differ.  At least on the Adaptec stuff, it gets pissy if the 
partition tables don't look sane (at least older BIOSes do, and those 
cards are still in use, so that means I try and make them work).  Besides, 
when I make changes in this code, the effects are instantly visible in 
fdisk (well, after a module reload), so it gets used somewhere.

> There's a large thread about it on linux-scsi (don't have the reference).  The 
> upshot was the code you can see in scsicam.c:scsicam_bios_param().
> 
> So the short answer is that this should be unnecessary.

No, it's needed.  If it isn't there, then when a disk is currently empty 
of any partitions, the generic geometry stuff picks out whatever it can 
find that fits the total capacity best without regard to the BIOS issues 
on the older Adaptec cards, resulting in funky crap that doesn't work.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx_biosparam
  2002-11-18  2:46     ` aic7xxx_biosparam Doug Ledford
@ 2002-11-18  3:20       ` J.E.J. Bottomley
  2002-11-18  3:26         ` aic7xxx_biosparam Doug Ledford
  0 siblings, 1 reply; 297+ messages in thread
From: J.E.J. Bottomley @ 2002-11-18  3:20 UTC (permalink / raw)
  To: Linux Scsi Mailing List

dledford@redhat.com said:
> Hmmm..should that be changed though? 

I hope not.  If we implement BIG_GETGEO then I've made all the wrong decisions 
for biosparam: since it can pass back ints, the implementations should all be 
using sector_div and we should do the truncation in the GETGEO ioctl since we 
can use the same biosparam call for both ioctls.

> I beg to differ.  At least on the Adaptec stuff, it gets pissy if the
> partition tables don't look sane (at least older BIOSes do, and those
> cards are still in use, so that means I try and make them work).
> Besides,  when I make changes in this code, the effects are instantly
> visible in  fdisk (well, after a module reload), so it gets used
> somewhere.

The argument is that no user space tools rely on the cyls value, only the 
sectors and heads.  They all use the GETGEO ioctl, but also get the blocksize 
of the device and divide this out by the sectors*heads to get the true cyls 
value.
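
A rough sketch of what such a tool does, assuming it already has the total
device size in sectors from a separate query.  The struct is a hypothetical
mirror of what a GETGEO-style call returns, not the kernel definition, and
the function name is made up for illustration:

```c
#include <stdint.h>

/* Hypothetical mirror of the values a GETGEO-style call hands back. */
struct geo {
	unsigned char heads;
	unsigned char sectors;
	unsigned short cylinders;   /* 16 bits: large disks do not fit */
};

/* Recompute cylinders from the device size, ignoring g->cylinders. */
uint64_t true_cylinders(const struct geo *g, uint64_t total_sectors)
{
	uint64_t per_cyl = (uint64_t)g->heads * g->sectors;

	if (per_cyl == 0)
		return 0;   /* no usable geometry reported */
	return total_sectors / per_cyl;
}
```

With heads and sectors taken from the driver and the size taken from the
device, the reported cyls field never enters the calculation.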

I know this is the wrong short term decision: you should never implement an 
interface wrongly because it causes exactly this type of confusion.  I'm just 
betting on the C/H/S mess being sorted out soon....I hope.

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx_biosparam
  2002-11-18  3:20       ` aic7xxx_biosparam J.E.J. Bottomley
@ 2002-11-18  3:26         ` Doug Ledford
  0 siblings, 0 replies; 297+ messages in thread
From: Doug Ledford @ 2002-11-18  3:26 UTC (permalink / raw)
  To: J.E.J. Bottomley; +Cc: Linux Scsi Mailing List

On Sun, Nov 17, 2002 at 09:20:10PM -0600, J.E.J. Bottomley wrote:
> > visible in  fdisk (well, after a module reload), so it gets used
> > somewhere.
> 
> The argument is that no user space tools rely on the cyls value, only the 
> sectors and heads.  They all use the GETGEO ioctl, but also get the blocksize 

That may very well be true and I'm all for that, but I'm still anal about 
being right anyway, so I implemented what Andries suggested.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx_biosparam
  2002-11-18  0:27 ` aic7xxx_biosparam Doug Ledford
  2002-11-18  0:36   ` aic7xxx_biosparam J.E.J. Bottomley
@ 2002-11-18  0:43   ` Andries Brouwer
  2002-11-18  2:47     ` aic7xxx_biosparam Doug Ledford
  2002-11-18  0:57   ` aic7xxx_biosparam Alan Cox
  2 siblings, 1 reply; 297+ messages in thread
From: Andries Brouwer @ 2002-11-18  0:43 UTC (permalink / raw)
  To: Linux Scsi Mailing List

On Sun, Nov 17, 2002 at 07:27:42PM -0500, Doug Ledford wrote:

> So, this is the change I made to update it so that it won't puke on large 
> devices.  It seems to be working OK for me, but I didn't check the gcc 
> assembly output to see if it's horrible.  Unless someone tells me I'm 
> smoking crack for making this change, I'll submit it pretty soon (I don't 
> consider myself a partition expert, I'm just trying to make sure that when 
> geometry does get defined on my driver that it at least is something the 
> Adaptec BIOS will accept, and the Adaptec BIOS *only* accepts the two 
> listed head/sector combos).

Yes. But note: the number of cylinders (i) is totally immaterial,
(ii) is returned in a short, the only place where it is used.
Thus, computing values larger than 65535 is undesired, they will
be truncated later.

That is why you can replace this strange

> +    while( capacity >= (cylinders * sectors * heads))
> +      cylinders++;
> +    cylinders--;

by a test like:

	if (capacity > 65535 * sectors * heads)
		cylinders = 65535;
	else
		cylinders = ((unsigned int) capacity) / (sectors * heads);
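
Put together with the head/sector selection, this might look like the
following sketch.  It is not the driver function: bios_geometry and the
"extended" flag (standing in for the AHC_EXTEND_TRANS_A test) are
hypothetical, uint64_t stands in for sector_t, and applying the clamp to
the 64/32 case as well is an assumption made here.

```c
#include <stdint.h>

typedef uint64_t sector_t;   /* stand-in for the kernel type */

/* Hypothetical standalone version; fills geom[] as heads, sectors, cylinders. */
void bios_geometry(sector_t capacity, int extended, int geom[3])
{
	sector_t heads = 64, sectors = 32, cylinders;

	cylinders = capacity >> 11;   /* 64 * 32 == 2048 == 1 << 11 */
	if (extended && cylinders > 1024) {
		heads = 255;
		sectors = 63;
	}

	if (capacity > 65535 * sectors * heads)
		cylinders = 65535;    /* would not fit in a short anyway */
	else
		/* capacity now fits in 32 bits, so a plain divide is safe */
		cylinders = (unsigned int)capacity /
			    (unsigned int)(sectors * heads);

	geom[0] = (int)heads;
	geom[1] = (int)sectors;
	geom[2] = (int)cylinders;
}
```

The cast is safe because the largest capacity reaching the else branch is
65535 * 255 * 63, which is below 2^32.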


Andries

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx_biosparam
  2002-11-18  0:43   ` aic7xxx_biosparam Andries Brouwer
@ 2002-11-18  2:47     ` Doug Ledford
  0 siblings, 0 replies; 297+ messages in thread
From: Doug Ledford @ 2002-11-18  2:47 UTC (permalink / raw)
  To: Andries Brouwer; +Cc: Linux Scsi Mailing List

On Mon, Nov 18, 2002 at 01:43:41AM +0100, Andries Brouwer wrote:
> 
> Yes. But note: the number of cylinders (i) is totally immaterial,
> (ii) is returned in a short, the only place where it is used.
> Thus, computing values larger than 65535 is undesired, they will
> be truncated later.

That is a bit of information I was looking for.  I figured it would get 
truncated somewhere (or worse, wrap), and I wanted to know where that 
might be.

> That is why you can replace this strange
> 
> > +    while( capacity >= (cylinders * sectors * heads))
> > +      cylinders++;
> > +    cylinders--;
> 
> by a test like:
> 
> 	if (capacity > 65535 * sectors * heads)
> 		cylinders = 65535;
> 	else
> 		cylinders = ((unsigned int) capacity) / (sectors * heads);

Thanks, I'll do that.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx_biosparam
  2002-11-18  0:27 ` aic7xxx_biosparam Doug Ledford
  2002-11-18  0:36   ` aic7xxx_biosparam J.E.J. Bottomley
  2002-11-18  0:43   ` aic7xxx_biosparam Andries Brouwer
@ 2002-11-18  0:57   ` Alan Cox
  2002-11-18  2:34     ` aic7xxx_biosparam Doug Ledford
  2 siblings, 1 reply; 297+ messages in thread
From: Alan Cox @ 2002-11-18  0:57 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Linux Scsi Mailing List

On Mon, 2002-11-18 at 00:27, Doug Ledford wrote:
> So, this is the change I made to update it so that it won't puke on large 
> devices.  It seems to be working OK for me, but I didn't check the gcc 
> assembly output to see if it's horrible.  Unless someone tells me I'm 
> smoking crack for making this change, I'll submit it pretty soon (I don't 
> consider myself a partition expert, I'm just trying to make sure that when 
> geometry does get defined on my driver that it at least is something the 
> Adaptec BIOS will accept, and the Adaptec BIOS *only* accepts the two 
> listed head/sector combos).

What about using sector_div ?


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: aic7xxx_biosparam
  2002-11-18  0:57   ` aic7xxx_biosparam Alan Cox
@ 2002-11-18  2:34     ` Doug Ledford
  0 siblings, 0 replies; 297+ messages in thread
From: Doug Ledford @ 2002-11-18  2:34 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Scsi Mailing List

On Mon, Nov 18, 2002 at 12:57:20AM +0000, Alan Cox wrote:
> What about using sector_div ?

Didn't know it existed.  Pointer?

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 297+ messages in thread

* scsi_scan changes...
@ 2002-12-21  1:22 ` Doug Ledford
  2002-12-21  1:27   ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-12-21  1:22 UTC (permalink / raw)
  To: James Bottomley; +Cc: Linux Scsi Mailing List

I seem to have found my bug and I am getting ready to build a tree.  
James, where is your latest collection of stuff that will be going to 
Linus so I can build against that when I populate my tree?

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: scsi_scan changes...
  2002-12-21  1:22 ` scsi_scan changes Doug Ledford
@ 2002-12-21  1:27   ` James Bottomley
  0 siblings, 0 replies; 297+ messages in thread
From: James Bottomley @ 2002-12-21  1:27 UTC (permalink / raw)
  To: James Bottomley, Linux Scsi Mailing List

dledford@redhat.com said:
> I seem to have found my bug and I am getting ready to build a tree.
> James, where is your latest collection of stuff that will be going to
> Linus so I can build against that when I populate my tree? 

There's only a single outstanding change against the Linus tree currently.  
It's at http://linux-scsi.bkbits.net/scsi-misc-2.5

James



^ permalink raw reply	[flat|nested] 297+ messages in thread

* [PATCH] get rid of ->finish method for highlevel drivers
@ 2002-10-21 19:34 Christoph Hellwig
  2002-10-21 23:58 ` James Bottomley
  2002-10-22  7:30 ` Mike Anderson
  0 siblings, 2 replies; 297+ messages in thread
From: Christoph Hellwig @ 2002-10-21 19:34 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

The ->finish method is a relic from the old days when we never had
hotplugging and allowed the driver to do fixups after all busses
had been scanned.  Nowadays only sd and sr actually implement it,
and both only use it to defer actions that should actually happen in
->attach.  Change both drivers to move that code into ->attach,
clean up the Templates to use C99 initializers and get rid of the
methods.

This also cleans up some very crude race-avoidance code in those
drivers, btw.
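
A small standalone sketch of the initializer change, for readers who
have not seen the two syntaxes side by side. The struct and function
names here (demo_template, demo_init, demo_attach, demo_has_finish) are
hypothetical, cut-down stand-ins and not the real Scsi_Device_Template.
The point it illustrates: with C99 designated initializers, any member
that is not named is zero-initialized, so simply dropping a hook from
the initializer leaves it NULL.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct Scsi_Device_Template. */
struct demo_template {
	const char *name;
	const char *tag;
	int (*init)(void);
	void (*finish)(void);	/* stands in for the hook being removed */
	int (*attach)(void *);
};

static int demo_init(void)
{
	return 0;
}

static int demo_attach(void *dev)
{
	(void)dev;
	return 0;
}

/*
 * C99 form: ".member = value".  The old GNU form was "member:value".
 * .finish is not named here, so it is implicitly NULL.
 */
static struct demo_template demo = {
	.name	= "disk",
	.tag	= "sd",
	.init	= demo_init,
	.attach	= demo_attach,
};

/* Returns 1 if the template provides a finish hook, 0 otherwise. */
static int demo_has_finish(const struct demo_template *t)
{
	return t->finish != NULL;
}
```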


diff -uNr -Xdontdiff -p linux-2.5.44-uc0/drivers/scsi/hosts.c linux/drivers/scsi/hosts.c
--- linux-2.5.44-uc0/drivers/scsi/hosts.c	Mon Oct 21 17:18:03 2002
+++ linux/drivers/scsi/hosts.c	Mon Oct 21 20:46:14 2002
@@ -573,14 +573,6 @@ int scsi_register_host(Scsi_Host_Templat
 					}
 				}
 		}
-
-		/* This does any final handling that is required. */
-		for (sdev_tp = scsi_devicelist; sdev_tp;
-		     sdev_tp = sdev_tp->next) {
-			if (sdev_tp->finish && sdev_tp->nr_dev) {
-				(*sdev_tp->finish) ();
-			}
-		}
 	}
 
 	return 0;
diff -uNr -Xdontdiff -p linux-2.5.44-uc0/drivers/scsi/hosts.h linux/drivers/scsi/hosts.h
--- linux-2.5.44-uc0/drivers/scsi/hosts.h	Mon Oct 21 17:18:19 2002
+++ linux/drivers/scsi/hosts.h	Mon Oct 21 20:43:56 2002
@@ -583,7 +583,6 @@ struct Scsi_Device_Template
     int (*detect)(Scsi_Device *); /* Returns 1 if we can attach this device */
     int (*init)(void);		  /* Sizes arrays based upon number of devices
 		   *  detected */
-    void (*finish)(void);	  /* Perform initialization after attachment */
     int (*attach)(Scsi_Device *); /* Attach devices to arrays */
     void (*detach)(Scsi_Device *);
     int (*init_command)(Scsi_Cmnd *);     /* Used by new queueing code. 
diff -uNr -Xdontdiff -p linux-2.5.44-uc0/drivers/scsi/scsi.c linux/drivers/scsi/scsi.c
--- linux-2.5.44-uc0/drivers/scsi/scsi.c	Mon Oct 21 17:19:01 2002
+++ linux/drivers/scsi/scsi.c	Mon Oct 21 20:44:58 2002
@@ -2034,18 +2039,14 @@ int scsi_register_device(struct Scsi_Dev
 		}
 	}
 
-	/*
-	 * This does any final handling that is required.
-	 */
-	if (tpnt->finish && tpnt->nr_dev)
-		(*tpnt->finish) ();
 	MOD_INC_USE_COUNT;
 
 	if (out_of_space) {
 		scsi_unregister_device(tpnt);	/* easiest way to clean up?? */
 		return 1;
-	} else
-		return 0;
+	}
+
+	return 0;
 }
 
 int scsi_unregister_device(struct Scsi_Device_Template *tpnt)
diff -uNr -Xdontdiff -p linux-2.5.44-uc0/drivers/scsi/scsi_scan.c linux/drivers/scsi/scsi_scan.c
--- linux-2.5.44-uc0/drivers/scsi/scsi_scan.c	Mon Oct 21 17:20:04 2002
+++ linux/drivers/scsi/scsi_scan.c	Mon Oct 21 20:47:21 2002
@@ -2012,11 +2012,6 @@ static void scsi_scan_selected_lun(struc
 						       __FUNCTION__);
 				}
 			}
-
-		for (sdt = scsi_devicelist; sdt; sdt = sdt->next)
-			if (sdt->finish && sdt->nr_dev)
-				(*sdt->finish) ();
-
 	}
 }
 
diff -uNr -Xdontdiff -p linux-2.5.44-uc0/drivers/scsi/sd.c linux/drivers/scsi/sd.c
--- linux-2.5.44-uc0/drivers/scsi/sd.c	Mon Oct 21 17:20:02 2002
+++ linux/drivers/scsi/sd.c	Mon Oct 21 20:52:13 2002
@@ -92,7 +92,6 @@ static int sd_revalidate(struct gendisk 
 static void sd_init_onedisk(Scsi_Disk * sdkp, struct gendisk *disk);
 
 static int sd_init(void);
-static void sd_finish(void);
 static int sd_attach(Scsi_Device *);
 static int sd_detect(Scsi_Device *);
 static void sd_detach(Scsi_Device *);
@@ -103,23 +102,19 @@ static int sd_notifier(struct notifier_b
 static struct notifier_block sd_notifier_block = {sd_notifier, NULL, 0}; 
 
 static struct Scsi_Device_Template sd_template = {
-	module:THIS_MODULE,
-	name:"disk",
-	tag:"sd",
-	scsi_type:TYPE_DISK,
-	major:SCSI_DISK0_MAJOR,
-        /*
-         * Secondary range of majors that this driver handles.
-         */
-	min_major:SCSI_DISK1_MAJOR,
-	max_major:SCSI_DISK7_MAJOR,
-	blk:1,
-	detect:sd_detect,
-	init:sd_init,
-	finish:sd_finish,
-	attach:sd_attach,
-	detach:sd_detach,
-	init_command:sd_init_command,
+	.module		= THIS_MODULE,
+	.name		= "disk",
+	.tag		= "sd",
+	.scsi_type	= TYPE_DISK,
+	.major		= SCSI_DISK0_MAJOR,
+	.min_major	= SCSI_DISK1_MAJOR,
+	.max_major	= SCSI_DISK7_MAJOR,
+	.blk		= 1,
+	.detect		= sd_detect,
+	.init		= sd_init,
+	.attach		= sd_attach,
+	.detach		= sd_detach,
+	.init_command	= sd_init_command,
 };
 
 static void sd_rw_intr(Scsi_Cmnd * SCpnt);
@@ -1291,38 +1286,6 @@ cleanup_mem:
 }
 
 /**
- *	sd_finish - called during driver initialization, after all
- *	the sd_attach() calls are finished.
- *
- *	Note: this function is invoked from the scsi mid-level.
- *	This function is not called after driver initialization has completed.
- *	Specifically later device attachments invoke sd_attach() but not
- *	this function.
- **/
-static void sd_finish()
-{
-	int k;
-	Scsi_Disk * sdkp;
-
-	SCSI_LOG_HLQUEUE(3, printk("sd_finish: \n"));
-
-	for (k = 0; k < sd_template.dev_max; ++k) {
-		sdkp = sd_get_sdisk(k);
-		if (sdkp && (0 == sdkp->capacity) && sdkp->device) {
-			sd_init_onedisk(sdkp, sd_disks[k]);
-			if (sdkp->has_been_registered)
-				continue;
-			set_capacity(sd_disks[k], sdkp->capacity);
-			sd_disks[k]->private_data = sdkp;
-			sd_disks[k]->queue = &sdkp->device->request_queue;
-			add_disk(sd_disks[k]);
-			sdkp->has_been_registered = 1;
-		}
-	}
-	return;
-}
-
-/**
  *	sd_detect - called at the start of driver initialization, once 
  *	for each scsi device (not just disks) present.
  *
@@ -1358,13 +1321,12 @@ static int sd_detect(Scsi_Device * sdp)
  **/
 static int sd_attach(Scsi_Device * sdp)
 {
-	Scsi_Disk *sdkp;
+	Scsi_Disk *sdkp = NULL;	/* shut up lame gcc warning */
 	int dsk_nr;
 	unsigned long iflags;
 	struct gendisk *gd;
 
-	if ((NULL == sdp) ||
-	    ((sdp->type != TYPE_DISK) && (sdp->type != TYPE_MOD)))
+	if ((sdp->type != TYPE_DISK) && (sdp->type != TYPE_MOD))
 		return 0;
 
 	gd = alloc_disk(16);
@@ -1373,15 +1335,16 @@ static int sd_attach(Scsi_Device * sdp)
 
 	SCSI_LOG_HLQUEUE(3, printk("sd_attach: scsi device: <%d,%d,%d,%d>\n", 
 			 sdp->host->host_no, sdp->channel, sdp->id, sdp->lun));
+
 	if (sd_template.nr_dev >= sd_template.dev_max) {
-		sdp->attached--;
 		printk(KERN_ERR "sd_init: no more room for device\n");
-		put_disk(gd);
-		return 1;
+		goto out;
 	}
 
-/* Assume sd_attach is not re-entrant (for time being) */
-/* Also think about sd_attach() and sd_detach() running coincidentally. */
+	/*
+	 * Assume sd_attach is not re-entrant (for time being)
+	 * Also think about sd_attach() and sd_detach() running coincidentally.
+	 */
 	write_lock_irqsave(&sd_dsk_arr_lock, iflags);
 	for (dsk_nr = 0; dsk_nr < sd_template.dev_max; dsk_nr++) {
 		sdkp = sd_dsk_arr[dsk_nr];
@@ -1393,15 +1356,15 @@ static int sd_attach(Scsi_Device * sdp)
 	}
 	write_unlock_irqrestore(&sd_dsk_arr_lock, iflags);
 
-	if (dsk_nr >= sd_template.dev_max) {
-		/* panic("scsi_devices corrupt (sd)");  overkill */
+	if (!sdkp || dsk_nr >= sd_template.dev_max) {
 		printk(KERN_ERR "sd_init: sd_dsk_arr corrupted\n");
-		put_disk(gd);
-		return 1;
+		goto out;
 	}
 
+	sd_init_onedisk(sdkp, gd);
 	sd_template.nr_dev++;
-        gd->de = sdp->de;
+
+	gd->de = sdp->de;
 	gd->major = SD_MAJOR(dsk_nr>>4);
 	gd->first_minor = (dsk_nr & 15)<<4;
 	gd->fops = &sd_fops;
@@ -1409,14 +1372,26 @@ static int sd_attach(Scsi_Device * sdp)
 		sprintf(gd->disk_name, "sd%c%c",'a'+dsk_nr/26-1,'a'+dsk_nr%26);
 	else
 		sprintf(gd->disk_name, "sd%c",'a'+dsk_nr%26);
-        gd->flags = sdp->removable ? GENHD_FL_REMOVABLE : 0;
-        gd->driverfs_dev = &sdp->sdev_driverfs_dev;
-        gd->flags |= GENHD_FL_DRIVERFS | GENHD_FL_DEVFS;
+	gd->flags = sdp->removable ? GENHD_FL_REMOVABLE : 0;
+	gd->driverfs_dev = &sdp->sdev_driverfs_dev;
+	gd->flags |= GENHD_FL_DRIVERFS | GENHD_FL_DEVFS;
+	gd->private_data = sdkp;
+	gd->queue = &sdkp->device->request_queue;
+
+	set_capacity(gd, sdkp->capacity);
+	add_disk(gd);
+
 	sd_disks[dsk_nr] = gd;
+
 	printk(KERN_NOTICE "Attached scsi %sdisk %s at scsi%d, channel %d, "
 	       "id %d, lun %d\n", sdp->removable ? "removable " : "",
 	       gd->disk_name, sdp->host->host_no, sdp->channel, sdp->id, sdp->lun);
 	return 0;
+
+out:
+	sdp->attached--;
+	put_disk(gd);
+	return 1;
 }
 
 static int sd_revalidate(struct gendisk *disk)
@@ -1472,10 +1447,7 @@ static void sd_detach(Scsi_Device * sdp)
 	sdkp->capacity = 0;
 	/* sdkp->detaching = 1; */
 
-	if (sdkp->has_been_registered) {
-		sdkp->has_been_registered = 0;
-		del_gendisk(sd_disks[dsk_nr]);
-	}
+	del_gendisk(sd_disks[dsk_nr]);
 	sdp->attached--;
 	sd_template.dev_noticed--;
 	sd_template.nr_dev--;
diff -uNr -Xdontdiff -p linux-2.5.44-uc0/drivers/scsi/sd.h linux/drivers/scsi/sd.h
--- linux-2.5.44-uc0/drivers/scsi/sd.h	Mon Oct 21 17:19:51 2002
+++ linux/drivers/scsi/sd.h	Mon Oct 21 20:34:15 2002
@@ -25,7 +25,6 @@ typedef struct scsi_disk {
 	Scsi_Device *device;
 	unsigned char media_present;
 	unsigned char write_prot;
-	unsigned has_been_registered:1;
 	unsigned WCE:1;         /* state of disk WCE bit */
 	unsigned RCD:1;         /* state of disk RCD bit */
 } Scsi_Disk;
diff -uNr -Xdontdiff -p linux-2.5.44-uc0/drivers/scsi/sr.c linux/drivers/scsi/sr.c
--- linux-2.5.44-uc0/drivers/scsi/sr.c	Mon Oct 21 17:18:14 2002
+++ linux/drivers/scsi/sr.c	Mon Oct 21 20:57:29 2002
@@ -63,27 +63,24 @@ MODULE_PARM(xa_test, "i");	/* see sr_ioc
 #define SR_TIMEOUT	(30 * HZ)
 
 static int sr_init(void);
-static void sr_finish(void);
 static int sr_attach(Scsi_Device *);
 static int sr_detect(Scsi_Device *);
 static void sr_detach(Scsi_Device *);
 
 static int sr_init_command(Scsi_Cmnd *);
 
-static struct Scsi_Device_Template sr_template =
-{
-	module:THIS_MODULE,
-	name:"cdrom",
-	tag:"sr",
-	scsi_type:TYPE_ROM,
-	major:SCSI_CDROM_MAJOR,
-	blk:1,
-	detect:sr_detect,
-	init:sr_init,
-	finish:sr_finish,
-	attach:sr_attach,
-	detach:sr_detach,
-	init_command:sr_init_command
+static struct Scsi_Device_Template sr_template = {
+	.module		= THIS_MODULE,
+	.name		= "cdrom",
+	.tag		= "sr",
+	.scsi_type	= TYPE_ROM,
+	.major		= SCSI_CDROM_MAJOR,
+	.blk		= 1,
+	.detect		= sr_detect,
+	.init		= sr_init,
+	.attach		= sr_attach,
+	.detach		= sr_detach,
+	.init_command	= sr_init_command
 };
 
 static Scsi_CD *scsi_CDs;
@@ -91,6 +88,7 @@ static Scsi_CD *scsi_CDs;
 static int sr_open(struct cdrom_device_info *, int);
 static void get_sectorsize(Scsi_CD *);
 static void get_capabilities(Scsi_CD *);
+static int sr_init_one(Scsi_CD *, int);
 
 static int sr_media_change(struct cdrom_device_info *, int);
 static int sr_packet(struct cdrom_device_info *, struct cdrom_generic_command *);
@@ -473,10 +471,9 @@ static int sr_attach(Scsi_Device * SDp)
 	if (SDp->type != TYPE_ROM && SDp->type != TYPE_WORM)
 		return 1;
 
-	if (sr_template.nr_dev >= sr_template.dev_max) {
-		SDp->attached--;
-		return 1;
-	}
+	if (sr_template.nr_dev >= sr_template.dev_max)
+		goto fail;
+
 	for (cpnt = scsi_CDs, i = 0; i < sr_template.dev_max; i++, cpnt++)
 		if (!cpnt->device)
 			break;
@@ -484,6 +481,8 @@ static int sr_attach(Scsi_Device * SDp)
 	if (i >= sr_template.dev_max)
 		panic("scsi_devices corrupt (sr)");
 
+	if (sr_init_one(cpnt, i))
+		goto fail;
 
 	scsi_CDs[i].device = SDp;
 
@@ -494,6 +493,10 @@ static int sr_attach(Scsi_Device * SDp)
 	printk("Attached scsi CD-ROM %s at scsi%d, channel %d, id %d, lun %d\n",
 	       scsi_CDs[i].cdi.name, SDp->host->host_no, SDp->channel, SDp->id, SDp->lun);
 	return 0;
+
+fail:
+	SDp->attached--;
+	return 1;
 }
 
 
@@ -744,64 +747,56 @@ cleanup_dev:
 	return 1;
 }
 
-void sr_finish()
+static int sr_init_one(Scsi_CD *cd, int first_minor)
 {
-	int i;
+	struct gendisk *disk;
 
-	for (i = 0; i < sr_template.nr_dev; ++i) {
-		struct gendisk *disk;
-		Scsi_CD *cd = &scsi_CDs[i];
-		/* If we have already seen this, then skip it.  Comes up
-		 * with loadable modules. */
-		if (cd->disk)
-			continue;
-		disk = alloc_disk(1);
-		if (!disk)
-			continue;
-		if (cd->disk) {
-			put_disk(disk);
-			continue;
-		}
-		disk->major = MAJOR_NR;
-		disk->first_minor = i;
-		strcpy(disk->disk_name, cd->cdi.name);
-		disk->fops = &sr_bdops;
-		disk->flags = GENHD_FL_CD;
-		cd->disk = disk;
-		cd->capacity = 0x1fffff;
-		cd->device->sector_size = 2048;/* A guess, just in case */
-		cd->needs_sector_size = 1;
-		cd->device->changed = 1;	/* force recheck CD type */
+	disk = alloc_disk(1);
+	if (!disk)
+		return -ENOMEM;
+
+	disk->major = MAJOR_NR;
+	disk->first_minor = first_minor;
+	strcpy(disk->disk_name, cd->cdi.name);
+	disk->fops = &sr_bdops;
+	disk->flags = GENHD_FL_CD;
+	cd->disk = disk;
+	cd->capacity = 0x1fffff;
+	cd->device->sector_size = 2048;/* A guess, just in case */
+	cd->needs_sector_size = 1;
+	cd->device->changed = 1;	/* force recheck CD type */
 #if 0
-		/* seems better to leave this for later */
-		get_sectorsize(cd);
-		printk("Scd sectorsize = %d bytes.\n", cd->sector_size);
+	/* seems better to leave this for later */
+	get_sectorsize(cd);
+	printk("Scd sectorsize = %d bytes.\n", cd->sector_size);
 #endif
-		cd->use = 1;
+	cd->use = 1;
 
-		cd->device->ten = 1;
-		cd->device->remap = 1;
-		cd->readcd_known = 0;
-		cd->readcd_cdda = 0;
-
-		cd->cdi.ops = &sr_dops;
-		cd->cdi.handle = cd;
-		cd->cdi.mask = 0;
-		cd->cdi.capacity = 1;
-		/*
-		 *	FIXME: someone needs to handle a get_capabilities
-		 *	failure properly ??
-		 */
-		get_capabilities(cd);
-		sr_vendor_init(cd);
-		disk->de = cd->device->de;
-		disk->driverfs_dev = &cd->device->sdev_driverfs_dev;
-		register_cdrom(&cd->cdi);
-		set_capacity(disk, cd->capacity);
-		disk->private_data = cd;
-		disk->queue = &cd->device->request_queue;
-		add_disk(disk);
-	}
+	cd->device->ten = 1;
+	cd->device->remap = 1;
+	cd->readcd_known = 0;
+	cd->readcd_cdda = 0;
+
+	cd->cdi.ops = &sr_dops;
+	cd->cdi.handle = cd;
+	cd->cdi.mask = 0;
+	cd->cdi.capacity = 1;
+
+	/*
+	 *	FIXME: someone needs to handle a get_capabilities
+	 *	failure properly ??
+	 */
+	get_capabilities(cd);
+	sr_vendor_init(cd);
+	disk->de = cd->device->de;
+	disk->driverfs_dev = &cd->device->sdev_driverfs_dev;
+	register_cdrom(&cd->cdi);
+	set_capacity(disk, cd->capacity);
+	disk->private_data = cd;
+	disk->queue = &cd->device->request_queue;
+	add_disk(disk);
+
+	return 0;
 }
 
 static void sr_detach(Scsi_Device * SDp)

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] get rid of ->finish method for highlevel drivers
  2002-10-21 19:34 [PATCH] get rid of ->finish method for highlevel drivers Christoph Hellwig
@ 2002-10-21 23:58 ` James Bottomley
  2002-10-22 15:48   ` James Bottomley
  2002-10-22  7:30 ` Mike Anderson
  1 sibling, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-10-21 23:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: James Bottomley, linux-scsi

hch@lst.de said:
> The ->finish method is a relic from the old days when we never had
> hotplugging and allowed the driver to do fixups after all busses had
> been scanned.  Nowadays only sd and sr actually implement it, and both
> only use it to defer actions that should actually happen in ->attach.
> Change both drivers to move that code into ->attach, clean up the
> Templates to use C99 initializers and get rid of the methods. 

OK, this one causes a hang on boot for me.  I'll look at debugging the cause 
some more tomorrow.

James



^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] get rid of ->finish method for highlevel drivers
  2002-10-21 23:58 ` James Bottomley
@ 2002-10-22 15:48   ` James Bottomley
  2002-10-22 18:43     ` Patrick Mansfield
  0 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-10-22 15:48 UTC (permalink / raw)
  To: Christoph Hellwig, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 1023 bytes --]

hch@lst.de said:
> The ->finish method is a relic from the old days when we never had
> hotplugging and allowed the driver to do fixups after all busses had
> been scanned.  Nowadays only sd and sr actually implement it, and both
> only use it to defer actions that should actually happen in ->attach.
> Change both drivers to move that code into ->attach, clean up the
> Templates to use C99 initializers and get rid of the methods. 

James.Bottomley@SteelEye.com said:
> OK, this one causes a hang on boot for me.  I'll look at debugging the
> cause  some more tomorrow. 

I tracked down the essential problem: the CD rom finish method expects to have 
allocated command blocks to do a mode sense.  I also fixed a trivial 
initialisation problem (the device pointer in Scsi_CD was being used before it 
was initialised).

The attached patch fixes the problem for me.

I still can't detach and add a CD-ROM, but the panic is in sr_init, so doesn't 
look to be affected by these changes, but I'll continue debugging.

James


[-- Attachment #2: tmp.diff --]
[-- Type: text/plain , Size: 3495 bytes --]

===== drivers/scsi/hosts.c 1.17 vs edited =====
--- 1.17/drivers/scsi/hosts.c	Mon Oct 21 15:46:14 2002
+++ edited/drivers/scsi/hosts.c	Tue Oct 22 10:16:26 2002
@@ -561,15 +561,16 @@
 			shost = list_entry(lh, struct Scsi_Host, sh_list);
 			for (sdev = shost->host_queue; sdev; sdev = sdev->next)
 				if (sdev->host->hostt == shost_tp) {
+					scsi_build_commandblocks(sdev);
+					if (sdev->current_queue_depth == 0)
+						goto out_of_space;
 					for (sdev_tp = scsi_devicelist;
 					     sdev_tp;
 					     sdev_tp = sdev_tp->next)
 						if (sdev_tp->attach)
 							(*sdev_tp->attach) (sdev);
-					if (sdev->attached) {
-						scsi_build_commandblocks(sdev);
-						if (sdev->current_queue_depth == 0)
-							goto out_of_space;
+					if (!sdev->attached) {
+                                                scsi_release_commandblocks(sdev);
 					}
 				}
 		}
===== drivers/scsi/scsi.c 1.50 vs edited =====
--- 1.50/drivers/scsi/scsi.c	Mon Oct 21 17:58:31 2002
+++ edited/drivers/scsi/scsi.c	Tue Oct 22 09:56:52 2002
@@ -2024,18 +2024,22 @@
 	     shpnt = scsi_host_get_next(shpnt)) {
 		for (SDpnt = shpnt->host_queue; SDpnt;
 		     SDpnt = SDpnt->next) {
+			scsi_build_commandblocks(SDpnt);
+			if (SDpnt->current_queue_depth == 0) {
+				out_of_space = 1;
+				continue;
+			}
 			if (tpnt->attach)
 				(*tpnt->attach) (SDpnt);
+
 			/*
 			 * If this driver attached to the device, and don't have any
 			 * command blocks for this device, allocate some.
 			 */
-			if (SDpnt->attached && SDpnt->current_queue_depth == 0) {
+			if (SDpnt->attached)
 				SDpnt->online = TRUE;
-				scsi_build_commandblocks(SDpnt);
-				if (SDpnt->current_queue_depth == 0)
-					out_of_space = 1;
-			}
+			else
+				scsi_release_commandblocks(SDpnt);
 		}
 	}
 
===== drivers/scsi/scsi_scan.c 1.29 vs edited =====
--- 1.29/drivers/scsi/scsi_scan.c	Mon Oct 21 15:47:21 2002
+++ edited/drivers/scsi/scsi_scan.c	Tue Oct 22 10:38:19 2002
@@ -1995,24 +1995,28 @@
 	sdevscan->scsi_level = scsi_find_scsi_level(channel, id, shost);
 	res = scsi_probe_and_add_lun(sdevscan, &sdev, NULL);
 	scsi_free_sdev(sdevscan);
-	if (res == SCSI_SCAN_LUN_PRESENT) {
-		BUG_ON(sdev == NULL);
 
-		for (sdt = scsi_devicelist; sdt; sdt = sdt->next)
-			if (sdt->init && sdt->dev_noticed)
-				(*sdt->init) ();
-
-		for (sdt = scsi_devicelist; sdt; sdt = sdt->next)
-			if (sdt->attach) {
-				(*sdt->attach) (sdev);
-				if (sdev->attached) {
-					scsi_build_commandblocks(sdev);
-					if (sdev->current_queue_depth == 0)
-						printk(ALLOC_FAILURE_MSG,
-						       __FUNCTION__);
-				}
-			}
+	if (res != SCSI_SCAN_LUN_PRESENT) 
+		return;
+
+	BUG_ON(sdev == NULL);
+
+	scsi_build_commandblocks(sdev);
+	if (sdev->current_queue_depth == 0) {
+		printk(ALLOC_FAILURE_MSG, __FUNCTION__);
+		return;
 	}
+
+	for (sdt = scsi_devicelist; sdt; sdt = sdt->next)
+		if (sdt->init && sdt->dev_noticed)
+			(*sdt->init) ();
+
+	for (sdt = scsi_devicelist; sdt; sdt = sdt->next)
+		if (sdt->attach)
+			(*sdt->attach) (sdev);
+
+	if (!sdev->attached)
+		scsi_release_commandblocks(sdev);
 }
 
 /**
===== drivers/scsi/sr.c 1.58 vs edited =====
--- 1.58/drivers/scsi/sr.c	Mon Oct 21 17:58:38 2002
+++ edited/drivers/scsi/sr.c	Tue Oct 22 10:28:30 2002
@@ -481,10 +481,10 @@
 	if (i >= sr_template.dev_max)
 		panic("scsi_devices corrupt (sr)");
 
+	scsi_CDs[i].device = SDp;
+
 	if (sr_init_one(cpnt, i))
 		goto fail;
-
-	scsi_CDs[i].device = SDp;
 
 	sr_template.nr_dev++;
 	if (sr_template.nr_dev > sr_template.dev_max)

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] get rid of ->finish method for highlevel drivers
  2002-10-22 15:48   ` James Bottomley
@ 2002-10-22 18:43     ` Patrick Mansfield
  2002-10-22 23:17       ` Mike Anderson
  0 siblings, 1 reply; 297+ messages in thread
From: Patrick Mansfield @ 2002-10-22 18:43 UTC (permalink / raw)
  To: James Bottomley; +Cc: Christoph Hellwig, linux-scsi

On Tue, Oct 22, 2002 at 10:48:19AM -0500, James Bottomley wrote:
> 
> I tracked down the essential problem: the CD rom finish method expects to have 
> allocated command blocks to do a mode sense.  I also fixed a trivial 
> initialisation problem (the device pointer in Scsi_CD was being used before it 
> was initialised).
> 
> The attached patch fixes the problem for me.
> 
> I still can't detach and add a CD-ROM, but the panic is in sr_init, so doesn't 
> look to be affected by these changes, but I'll continue debugging.
> 
> James
> 

James -

How do we end up with no command blocks if the slave_attach is
calling scsi_build_commandblocks via scsi_adjust_queue_depth?

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] get rid of ->finish method for highlevel drivers
  2002-10-22 18:43     ` Patrick Mansfield
@ 2002-10-22 23:17       ` Mike Anderson
  2002-10-22 23:30         ` Doug Ledford
  0 siblings, 1 reply; 297+ messages in thread
From: Mike Anderson @ 2002-10-22 23:17 UTC (permalink / raw)
  To: James Bottomley, Christoph Hellwig, linux-scsi

Patrick Mansfield [patmans@us.ibm.com] wrote:
> On Tue, Oct 22, 2002 at 10:48:19AM -0500, James Bottomley wrote:
> > 
> > I tracked down the essential problem: the CD rom finish method expects to have 
> > allocated command blocks to do a mode sense.  I also fixed a trivial 
> > initialisation problem (the device pointer in Scsi_CD was being used before it 
> > was initialised).
> > 
> > The attached patch fixes the problem for me.
> > 
> > I still can't detach and add a CD-ROM, but the panic is in sr_init, so doesn't 
> > look to be affected by these changes, but I'll continue debugging.
> > 
> > James
> > 
> 
> James -
> 
> How do we end up with no command blocks if the slave_attach is
> calling scsi_build_commandblocks via scsi_adjust_queue_depth?

It appears after looking at this with Patrick that in drivers where
slave_attach is not calling scsi_adjust_queue_depth the
scsi_build_commandblocks will not be called. In my case ips.c and in
James's case 53c700.c.

If, after calling slave_attach, the new_queue_depth is 0 and the device is
not offline, we should ensure there is at least a new_queue_depth of
1, which is what your patch does by later on calling
scsi_build_commandblocks directly.

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] get rid of ->finish method for highlevel drivers
  2002-10-22 23:17       ` Mike Anderson
@ 2002-10-22 23:30         ` Doug Ledford
  2002-10-23 14:16           ` James Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-10-22 23:30 UTC (permalink / raw)
  To: James Bottomley, Christoph Hellwig, linux-scsi

On Tue, Oct 22, 2002 at 04:17:14PM -0700, Mike Anderson wrote:
> Patrick Mansfield [patmans@us.ibm.com] wrote:
> > On Tue, Oct 22, 2002 at 10:48:19AM -0500, James Bottomley wrote:
> > > 
> > > I tracked down the essential problem: the CD rom finish method expects to have 
> > > allocated command blocks to do a mode sense.  I also fixed a trivial 
> > > initialisation problem (the device pointer in Scsi_CD was being used before it 
> > > was initialised).
> > > 
> > > The attached patch fixes the problem for me.
> > > 
> > > I still can't detach and add a CD-ROM, but the panic is in sr_init, so doesn't 
> > > look to be affected by these changes, but I'll continue debugging.
> > > 
> > > James
> > > 
> > 
> > James -
> > 
> > How do we end up with no command blocks if the slave_attach is
> > calling scsi_build_commandblocks via scsi_adjust_queue_depth?
> 
> It appears after looking at this with Patrick that in drivers where
> slave_attach is not calling scsi_adjust_queue_depth the
> scsi_build_commandblocks will not be called. In my case ips.c and in
> James's case 53c700.c.
> 
> If, after calling slave_attach, the new_queue_depth is 0 and the device is
> not offline, we should ensure there is at least a new_queue_depth of
> 1, which is what your patch does by later on calling
> scsi_build_commandblocks directly.

Actually, prior to the scsi_scan changes, the code always made sure that 
it was at least 1 after the scan.  Maybe that got lost somehow...

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] get rid of ->finish method for highlevel drivers
  2002-10-22 23:30         ` Doug Ledford
@ 2002-10-23 14:16           ` James Bottomley
  2002-10-23 15:13             ` Christoph Hellwig
  0 siblings, 1 reply; 297+ messages in thread
From: James Bottomley @ 2002-10-23 14:16 UTC (permalink / raw)
  To: Christoph Hellwig, linux-scsi

andmike@us.ibm.com said:
> It appears after looking at this with Patrick that in drivers where
> slave_attach is not calling scsi_adjust_queue_depth the
> scsi_build_commandblocks will not be called. In my case ips.c and in
> James's case 53c700.c. 

> If, after calling slave_attach, the new_queue_depth is 0 and the device
> is not offline, we should ensure there is at least a new_queue_depth of
> 1, which is what your patch does by later on calling
> scsi_build_commandblocks directly. 

> Actually, prior to the scsi_scan changes, the code always made sure
> that  it was at least 1 after the scan.  Maybe that got lost
> somehow... 

This is really exposing our whole detect/init/attach/finish/slave_attach mess 
which Christoph was trying to clean up.

What happens now (after Christoph's removal of finish) is:

1) call detect to see if anyone will attach.  If yes, call slave attach (we 
adjust the queue depth here if wanted)
2) call init (if not inited---done once per upper level driver) --- init is 
supposed to be for first init of the SCSI upper level device driver to 
allocate initial resources, so we already have detect called before init.
3) call attach but release the commandblocks if sdev->attached comes back zero.

->attached, by the way, is a semaphore because we can have more than one upper 
level attachment.  This exposes nicely in the rather icky algorithm of 
scsi_get_request_dev() which is used to find which upper level driver should 
process the request in the request function.

It looks like what we want to have happen is:

1) init first
2) detect is currently superfluous (there will probably be a future time when 
we need a separate detect for lazy attachment---devices don't attach and 
consume resources until the operator so instructs)
3) call build commandblocks to get a single working command
4) attach
5) if anything attached do the slave attach or cmd_per_lun adjustment.  If 
not, release the command blocks.

The net effect should be that if a device has no upper level driver, it can 
appear in /proc/scsi/scsi, but it's consuming no other system resources.
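
The proposed ordering can be sketched as a small standalone model. This
is only an illustration under stated assumptions: every name below
(demo_dev, demo_driver, demo_add_device and friends) is hypothetical and
none of it is the real mid-layer code. The allocation is modelled as a
simple counter that cannot fail, and the detect step is left out because
it is described above as currently superfluous.

```c
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct demo_dev {
	int attached;		/* number of upper-level drivers attached */
	int queue_depth;	/* number of allocated command blocks */
};

struct demo_driver {
	int inited;
	int (*init)(void);
	void (*attach)(struct demo_dev *);
};

/* Make sure the device has one working command before anyone attaches. */
static void demo_build_commandblocks(struct demo_dev *dev)
{
	if (dev->queue_depth == 0)
		dev->queue_depth = 1;
}

static void demo_release_commandblocks(struct demo_dev *dev)
{
	dev->queue_depth = 0;
}

/* Two sample attach hooks: one claims the device, one declines. */
static void demo_attach_yes(struct demo_dev *dev) { dev->attached++; }
static void demo_attach_no(struct demo_dev *dev) { (void)dev; }

/* Returns the number of upper-level drivers that attached. */
static int demo_add_device(struct demo_dev *dev, struct demo_driver *drv,
			   size_t ndrv, int wanted_depth)
{
	size_t i;

	/* init first, once per upper-level driver */
	for (i = 0; i < ndrv; i++) {
		if (!drv[i].inited && drv[i].init)
			drv[i].init();
		drv[i].inited = 1;
	}

	/* build command blocks so attach can already issue a command */
	demo_build_commandblocks(dev);

	/* attach */
	for (i = 0; i < ndrv; i++)
		if (drv[i].attach)
			drv[i].attach(dev);

	/* adjust the depth if anything attached, otherwise give it back */
	if (dev->attached)
		dev->queue_depth = wanted_depth > 0 ? wanted_depth : 1;
	else
		demo_release_commandblocks(dev);

	return dev->attached;
}
```

In this model a device nobody attaches to ends up with a queue depth of
zero, which matches the stated goal that such a device consumes no
command blocks.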

James

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] get rid of ->finish method for highlevel drivers
  2002-10-23 14:16           ` James Bottomley
@ 2002-10-23 15:13             ` Christoph Hellwig
  2002-10-24  1:36               ` Patrick Mansfield
  2002-10-24 23:20               ` Willem Riede
  0 siblings, 2 replies; 297+ messages in thread
From: Christoph Hellwig @ 2002-10-23 15:13 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

On Wed, Oct 23, 2002 at 09:16:49AM -0500, James Bottomley wrote:
> This is really exposing our whole detect/init/attach/finish/slave_attach mess 
> which Christoph was trying to clean up.
> 
> What happens now (after Christoph's removal of finish) is:
> 
> 1) call detect to see if anyone will attach.  If yes, call slave attach (we 
> adjust the queue depth here if wanted)
> 2) call init (if not inited---done once per upper level driver) --- init is 
> supposed to be for first init of the SCSI upper level device driver to 
> allocate initial resources, so we already have detect called before init.

Well, I'm working on removing ->init.  With the patch posted yesterday
only sg.c and osst.c are left implementing it.

> 3) call attach but release the commandblocks if sdev->attached comes back zero.
> 
> ->attached, by the way, is a semaphore because we can have more than one upper 
> level attachment.  This exposes nicely in the rather icky algorithm of 
> scsi_get_request_dev() which is used to find which upper level driver should 
> process the request in the request function.
> 
> It looks like what we want to have happen is:
> 
> 1) init first
> 2) detect is currently superfluous (there will probably be a future time when 
> we need a separate detect for lazy attachment---devices don't attach and 
> consume resources until the operator so instructs)
> 3) call build commandblocks to get a single working command
> 4) attach
> 5) if anything attached do the slave attach or cmd_per_lun adjustment.  If 
> not, release the command blocks.

Completely agreed with that.


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] get rid of ->finish method for highlevel drivers
  2002-10-23 15:13             ` Christoph Hellwig
@ 2002-10-24  1:36               ` Patrick Mansfield
  2002-10-24 23:20               ` Willem Riede
  1 sibling, 0 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-10-24  1:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: James Bottomley, linux-scsi

On Wed, Oct 23, 2002 at 05:13:35PM +0200, Christoph Hellwig wrote:
> On Wed, Oct 23, 2002 at 09:16:49AM -0500, James Bottomley wrote:
> > It looks like what we want to have happen is:
> > 
> > 1) init first
> > 2) detect is currently superfluous (there will probably be a future time when 
> > we need a separate detect for lazy attachment---devices don't attach and 
> > consume resources until the operator so instructs)
> > 3) call build commandblocks to get a single working command
> > 4) attach
> > 5) if anything attached do the slave attach or cmd_per_lun adjustment.  If 
> > not, release the command blocks.
> 
> Completely agreed with that.

We should also call blk_cleanup_queue along with scsi_release_commandblocks,
and then call scsi_initialize_queue when we call scsi_build_commandblocks
in scsi_register_device (or wherever we have the corresponding
scsi_build_commandblocks call).

The request queue memory usage is probably a lot higher than what is used
for one Scsi_Cmnd.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] get rid of ->finish method for highlevel drivers
  2002-10-23 15:13             ` Christoph Hellwig
  2002-10-24  1:36               ` Patrick Mansfield
@ 2002-10-24 23:20               ` Willem Riede
  2002-10-24 23:36                 ` Christoph Hellwig
  1 sibling, 1 reply; 297+ messages in thread
From: Willem Riede @ 2002-10-24 23:20 UTC (permalink / raw)
  To: linux-scsi

On 2002.10.23 11:13 Christoph Hellwig wrote:
> 
> Well, I'm working on removing ->init.  With the patch posted yesterday
> only sg.c and osst.c are left implementing it.
> 
Ermm, osst (which I maintain) works the same as st in this respect.
I didn't notice any changes to st.c in that patch. What am I missing?

Thanks, Willem Riede.

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] get rid of ->finish method for highlevel drivers
  2002-10-24 23:20               ` Willem Riede
@ 2002-10-24 23:36                 ` Christoph Hellwig
  2002-10-25  0:02                   ` Willem Riede
  0 siblings, 1 reply; 297+ messages in thread
From: Christoph Hellwig @ 2002-10-24 23:36 UTC (permalink / raw)
  To: Willem Riede; +Cc: linux-scsi

On Thu, Oct 24, 2002 at 07:20:26PM -0400, Willem Riede wrote:
> > Well, I'm working on removing ->init.  With the patch posted yesterday
> > only sg.c and osst.c are left implementing it.
> > 
> Ermm, osst (which I maintain) works the same as st in this respect.
> I didn't notice any changes to st.c in that patch. What am I missing?

st in 2.5 doesn't have ->init and ->finish anymore.  Time for a resync? :)


^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] get rid of ->finish method for highlevel drivers
  2002-10-24 23:36                 ` Christoph Hellwig
@ 2002-10-25  0:02                   ` Willem Riede
  0 siblings, 0 replies; 297+ messages in thread
From: Willem Riede @ 2002-10-25  0:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi

On 2002.10.24 19:36 Christoph Hellwig wrote:
> On Thu, Oct 24, 2002 at 07:20:26PM -0400, Willem Riede wrote:
> > > Well, I'm working on removing ->init.  With the patch posted yesterday
> > > only sg.c and osst.c are left implementing it.
> > >
> > Ermm, osst (which I maintain) works the same as st in this respect.
> > I didn't notice any changes to st.c in that patch. What am I missing?
> 
> st in 2.5 doesn't have ->init and ->finish anymore.  Time for a resync? :)
> 
Obviously. I'll do that. I have to admit I've kind of ignored 2.5.x to
let it stabilize before spending effort on keeping osst up to date,
but it looks like it's time to make up for lost time...

Thanks, Willem Riede.

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] get rid of ->finish method for highlevel drivers
  2002-10-21 19:34 [PATCH] get rid of ->finish method for highlevel drivers Christoph Hellwig
  2002-10-21 23:58 ` James Bottomley
@ 2002-10-22  7:30 ` Mike Anderson
  2002-10-22 11:14   ` Christoph Hellwig
  1 sibling, 1 reply; 297+ messages in thread
From: Mike Anderson @ 2002-10-22  7:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: James Bottomley, linux-scsi

Christoph Hellwig [hch@lst.de] wrote:
> the ->finish method is a relic from the old days when we never had
> hotplugging, and allowed the driver to do fixups after all busses
> had been scanned.  Nowadays only sd and sr actually implement it,
> and both only defer actions to there that should actually happen in
> ->attach.  Change both drivers to move that code into ->attach,
> clean up the Templates to use C99 initializers and get rid of the
> methods.
> 
> This also cleans up some very crude race-avoidance code in those
> drivers, btw..

Looks good, Christoph. I applied this patch to James's scsi-misc-2.5 tree
and I ran it under UML and did not see any boot / shutdown problems. I
will run it on our FC 2x tomorrow.

It looks to me like we could also investigate removing sd_disks and
sd_dsk_arr by using the per driver storage of the device model. In the 2
sd_disks access cases we have the scsi_device and should be able to
locate the device object. I believe the sd_dsk_arr could also be covered
in a similar method.
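
As a rough illustration of the idea (a user-space model, not sd.c): the
disk pointer is kept in a per driver slot on the device object, so the
lookup needs no global array.  Every name below is hypothetical, and the
driver_data field is an assumed stand-in for whatever per driver storage
the device model provides.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real structures have many more fields. */
struct model_device {
	void *driver_data;              /* per driver storage slot */
};

struct model_scsi_device {
	struct model_device dev;        /* embedded device model object */
};

struct model_scsi_disk {
	struct model_scsi_device *device;
	int index;
};

/* Attach time: remember the disk on the device, not in an array. */
static void model_sd_attach(struct model_scsi_device *sdp,
			    struct model_scsi_disk *sdkp)
{
	sdkp->device = sdp;
	sdp->dev.driver_data = sdkp;
}

/* Lookup: go straight from the scsi device to its disk. */
static struct model_scsi_disk *model_sd_find(struct model_scsi_device *sdp)
{
	return sdp->dev.driver_data;
}

/* Detach time: clear the slot so later lookups return NULL. */
static void model_sd_detach(struct model_scsi_device *sdp)
{
	sdp->dev.driver_data = NULL;
}
```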

-andmike
--
Michael Anderson
andmike@us.ibm.com

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] get rid of ->finish method for highlevel drivers
  2002-10-22  7:30 ` Mike Anderson
@ 2002-10-22 11:14   ` Christoph Hellwig
  0 siblings, 0 replies; 297+ messages in thread
From: Christoph Hellwig @ 2002-10-22 11:14 UTC (permalink / raw)
  To: James Bottomley, linux-scsi

On Tue, Oct 22, 2002 at 12:30:40AM -0700, Mike Anderson wrote:
> It looks to me like we could also investigate removing sd_disks and
> sd_dsk_arr by using the per driver storage of the device model. In the 2
> sd_disks access cases we have the scsi_device and should be able to
> locate the device object. I believe the sd_dsk_arr could also be covered
> in a similar method.

Yes, that's on my TODO list for this week.

^ permalink raw reply	[flat|nested] 297+ messages in thread

* [PATCH] fix 2.5 scsi queue depth setting
@ 2002-11-06  4:24 Patrick Mansfield
  2002-11-06  4:35 ` Patrick Mansfield
                   ` (3 more replies)
  0 siblings, 4 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-11-06  4:24 UTC (permalink / raw)
  To: James Bottomley, Christoph Hellwig, linux-scsi

This patch fixes queue depth setting of scsi devices.

This is done by pairing shost->slave_attach() calls with
a scsi_build_commandblocks in the new scsi_slave_attach.
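
The pairing amounts to reference counting on sdev->attached: the first
attach does the low level attach and builds the command blocks, and the
last detach undoes both.  A stripped-down user-space model of just that
counting, with failure handling left out and all ref_* names
hypothetical, might look like:

```c
/* Hypothetical model of a scsi device; not the real struct. */
struct ref_dev {
	int attached;       /* count of upper level drivers attached */
	int lld_attached;   /* 1 while the low level hook is in effect */
	int blocks;         /* 1 while command blocks are allocated */
};

/* First attacher triggers the low level attach and the allocation. */
static void ref_attach(struct ref_dev *d)
{
	if (d->attached++ == 0) {
		d->lld_attached = 1;
		d->blocks = 1;
	}
}

/* Last detacher triggers the low level detach and the release. */
static void ref_detach(struct ref_dev *d)
{
	if (--d->attached == 0) {
		d->lld_attached = 0;
		d->blocks = 0;
	}
}
```

With two upper level drivers attached (say sd and sg), the first detach
leaves the command blocks in place and only the second frees them.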

This is a patch against linux-scsi.bkbits.net/scsi-for-linus-2.5 after 
applying the last posted hch version of the "Eliminate scsi_host_tmpl_list"
patch; it still applies with offset to the current scsi-for-linus-2.5.

It also:

Will properly call shost->slave_attach after a scsi_unregister_device()
followed by a scsi_register_device() - as could happen if you were able to
rmmod all upper level drivers and then insmod any of them back (only
possible when not booted on scsi).

Checks for scsi_build_commandblocks() allocation failures.

Sets queue depth even if shost->slave_attach() does not call
scsi_adjust_queue_depth.

Removes the use of revoke (no drivers are setting it, and it was only
called via the proc scsi remove-single-device interface).

There are at least two problems with sysfs and scsi (one in sysfs, one in
scsi, I'll try and post more soon ...) so I could not completely test rmmod
of an adapter or upper level driver without leading to an oops or shutdown
hang.

 hosts.c              |    5 --
 hosts.h              |    6 --
 osst.c               |    9 ++-
 scsi.c               |  118 +++++++++++++++++++++++++++++++--------------------
 scsi.h               |    2 
 scsi_mid_low_api.txt |   24 ----------
 scsi_scan.c          |    9 ---
 sd.c                 |   10 +++-
 sg.c                 |   10 ++--
 sr.c                 |    7 ++-
 st.c                 |   11 +++-
 11 files changed, 106 insertions(+), 105 deletions(-)

===== drivers/scsi/hosts.c 1.23 vs edited =====
--- 1.23/drivers/scsi/hosts.c	Tue Nov  5 17:13:42 2002
+++ edited/drivers/scsi/hosts.c	Tue Nov  5 17:18:22 2002
@@ -249,10 +249,6 @@
 			       sdev->attached);
 			return 1;
 		}
-
-		if (shost->hostt->slave_detach)
-			(*shost->hostt->slave_detach) (sdev);
-
 		devfs_unregister(sdev->de);
 		device_unregister(&sdev->sdev_driverfs_dev);
 	}
@@ -261,7 +257,6 @@
 
 	for (sdev = shost->host_queue; sdev;
 	     sdev = shost->host_queue) {
-		scsi_release_commandblocks(sdev);
 		blk_cleanup_queue(&sdev->request_queue);
 		/* Next free up the Scsi_Device structures for this host */
 		shost->host_queue = sdev->next;
===== drivers/scsi/hosts.h 1.32 vs edited =====
--- 1.32/drivers/scsi/hosts.h	Tue Nov  5 17:13:42 2002
+++ edited/drivers/scsi/hosts.h	Tue Nov  5 19:40:42 2002
@@ -93,12 +93,6 @@
      */
     int (* detect)(struct SHT *);
 
-    /*
-     * This function is only used by one driver and will be going away
-     * once it switches over to using the slave_detach() function instead.
-     */
-    int (*revoke)(Scsi_Device *);
-
     /* Used with loadable modules to unload the host structures.  Note:
      * there is a default action built into the modules code which may
      * be sufficient for most host adapters.  Thus you may not have to supply
===== drivers/scsi/osst.c 1.24 vs edited =====
--- 1.24/drivers/scsi/osst.c	Mon Nov  4 20:00:30 2002
+++ edited/drivers/scsi/osst.c	Tue Nov  5 17:26:53 2002
@@ -5421,10 +5421,12 @@
 		return 1;
 
 	if (osst_nr_dev >= osst_dev_max) {
-		 SDp->attached--;
 		 put_disk(disk);
 		 return 1;
 	}
+
+	if (scsi_slave_attach(SDp))
+		return 1;
 	
 	/* find a free minor number */
 	for (i=0; os_scsi_tapes[i] && i<osst_dev_max; i++);
@@ -5433,9 +5435,10 @@
 	/* allocate a OS_Scsi_Tape for this device */
 	tpnt = (OS_Scsi_Tape *)kmalloc(sizeof(OS_Scsi_Tape), GFP_ATOMIC);
 	if (tpnt == NULL) {
-		 SDp->attached--;
 		 printk(KERN_WARNING "osst :W: Can't allocate device descriptor.\n");
 		 put_disk(disk);
+
+		 scsi_slave_detach(SDp);
 		 return 1;
 	}
 	memset(tpnt, 0, sizeof(OS_Scsi_Tape));
@@ -5648,7 +5651,7 @@
 		put_disk(tpnt->disk);
 		kfree(tpnt);
 		os_scsi_tapes[i] = NULL;
-		SDp->attached--;
+		scsi_slave_detach(SDp);
 		osst_nr_dev--;
 		osst_dev_noticed--;
 		return;
===== drivers/scsi/scsi.c 1.55 vs edited =====
--- 1.55/drivers/scsi/scsi.c	Tue Nov  5 17:13:42 2002
+++ edited/drivers/scsi/scsi.c	Tue Nov  5 20:05:10 2002
@@ -1664,10 +1664,6 @@
 			break;
 	}
 	spin_unlock_irqrestore(&device_request_lock, flags);
-	if(SDpnt->current_queue_depth == 0)
-	{
-		scsi_build_commandblocks(SDpnt);
-	}
 }
 
 #ifdef CONFIG_PROC_FS
@@ -1932,16 +1928,7 @@
 		scsi_detach_device(scd);
 
 		if (scd->attached == 0) {
-			/*
-			 * Nobody is using this device any more.
-			 * Free all of the command structures.
-			 */
-                        if (HBA_ptr->hostt->revoke)
-                                HBA_ptr->hostt->revoke(scd);
-			if (HBA_ptr->hostt->slave_detach)
-				(*HBA_ptr->hostt->slave_detach) (scd);
 			devfs_unregister (scd->de);
-			scsi_release_commandblocks(scd);
 
 			/* Now we can remove the device structure */
 			if (scd->next != NULL)
@@ -1976,7 +1963,7 @@
 	down_read(&scsi_devicelist_mutex);
 	for (sdt = scsi_devicelist; sdt; sdt = sdt->next)
 		if (sdt->detect)
-			sdev->attached += (*sdt->detect)(sdev);
+			(*sdt->detect)(sdev);
 	up_read(&scsi_devicelist_mutex);
 }
 
@@ -1984,18 +1971,16 @@
 {
 	struct Scsi_Device_Template *sdt;
 
-	scsi_build_commandblocks(sdev);
-	if (sdev->current_queue_depth == 0)
-		goto fail;
-
 	down_read(&scsi_devicelist_mutex);
 	for (sdt = scsi_devicelist; sdt; sdt = sdt->next)
 		if (sdt->attach)
+			/*
+			 * XXX check result when the upper level attach
+			 * return values are fixed, and on failure goto
+			 * fail.
+			 */
 			(*sdt->attach) (sdev);
 	up_read(&scsi_devicelist_mutex);
-
-	if (!sdev->attached)
-		scsi_release_commandblocks(sdev);
 	return 0;
 
 fail:
@@ -2017,6 +2002,66 @@
 }
 
 /*
+ * Function:	scsi_slave_attach()
+ *
+ * Purpose:	Called from the upper level driver attach to handle common
+ * 		attach code.
+ *
+ * Arguments:	sdev - scsi_device to attach
+ *
+ * Returns:	1 on error, 0 on success
+ *
+ * Lock Status:	Protected via scsi_devicelist_mutex.
+ */
+int scsi_slave_attach(struct scsi_device *sdev)
+{
+	if (sdev->attached++ == 0) {
+		/*
+		 * No one was attached.
+		 */
+		if ((sdev->host->hostt->slave_attach != NULL) &&
+		    (sdev->host->hostt->slave_attach(sdev) != 0)) {
+			printk(KERN_INFO "scsi: failed low level driver"
+			       " attach, some SCSI device might not be"
+			       " configured\n");
+			return 1;
+		}
+		if ((sdev->new_queue_depth == 0) &&
+		    (sdev->host->cmd_per_lun != 0))
+			scsi_adjust_queue_depth(sdev, 0,
+						sdev->host->cmd_per_lun);
+		scsi_build_commandblocks(sdev);
+		if (sdev->current_queue_depth == 0) {
+			printk(KERN_ERR "scsi: Allocation failure during"
+			       " attach, some SCSI devices might not be"
+			       " configured\n");
+			if (sdev->host->hostt->slave_detach != NULL)
+				sdev->host->hostt->slave_detach(sdev);
+			return 1;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Function:	scsi_slave_detach()
+ *
+ * Purpose:	Called from the upper level driver attach to handle common
+ * 		detach code.
+ *
+ * Arguments:	sdev - struct scsi_device to detach
+ *
+ * Lock Status:	Protected via scsi_devicelist_mutex.
+ */
+void scsi_slave_detach(struct scsi_device *sdev)
+{
+	if (--sdev->attached == 0) {
+		if (sdev->host->hostt->slave_detach != NULL)
+			sdev->host->hostt->slave_detach(sdev);
+		scsi_release_commandblocks(sdev);
+	}
+}
+/*
  * This entry point should be called by a loadable module if it is trying
  * add a high level scsi driver to the system.
  */
@@ -2053,7 +2098,7 @@
 		for (SDpnt = shpnt->host_queue; SDpnt;
 		     SDpnt = SDpnt->next) {
 			if (tpnt->detect)
-				SDpnt->attached += (*tpnt->detect) (SDpnt);
+				(*tpnt->detect) (SDpnt);
 		}
 	}
 
@@ -2064,22 +2109,14 @@
 	     shpnt = scsi_host_get_next(shpnt)) {
 		for (SDpnt = shpnt->host_queue; SDpnt;
 		     SDpnt = SDpnt->next) {
-			scsi_build_commandblocks(SDpnt);
-			if (SDpnt->current_queue_depth == 0) {
-				out_of_space = 1;
-				continue;
-			}
 			if (tpnt->attach)
+				/*
+				 * XXX check result when the upper level
+				 * attach return values are fixed, and
+				 * stop attaching on failure.
+				 */
 				(*tpnt->attach) (SDpnt);
 
-			/*
-			 * If this driver attached to the device, and don't have any
-			 * command blocks for this device, allocate some.
-			 */
-			if (SDpnt->attached)
-				SDpnt->online = TRUE;
-			else
-				scsi_release_commandblocks(SDpnt);
 		}
 	}
 
@@ -2119,17 +2156,6 @@
 		     SDpnt = SDpnt->next) {
 			if (tpnt->detach)
 				(*tpnt->detach) (SDpnt);
-			if (SDpnt->attached == 0) {
-				SDpnt->online = FALSE;
-
-				/*
-				 * Nobody is using this device any more.  Free all of the
-				 * command structures.
-				 */
-				if (shpnt->hostt->slave_detach)
-					(*shpnt->hostt->slave_detach) (SDpnt);
-				scsi_release_commandblocks(SDpnt);
-			}
 		}
 	}
 	/*
===== drivers/scsi/scsi.h 1.33 vs edited =====
--- 1.33/drivers/scsi/scsi.h	Mon Nov  4 09:04:51 2002
+++ edited/drivers/scsi/scsi.h	Tue Nov  5 17:26:24 2002
@@ -466,6 +466,8 @@
 extern void scsi_release_commandblocks(Scsi_Device * SDpnt);
 extern void scsi_build_commandblocks(Scsi_Device * SDpnt);
 extern void scsi_adjust_queue_depth(Scsi_Device *, int, int);
+extern int scsi_slave_attach(struct scsi_device *sdev);
+extern void scsi_slave_detach(struct scsi_device *sdev);
 extern void scsi_done(Scsi_Cmnd * SCpnt);
 extern void scsi_finish_command(Scsi_Cmnd *);
 extern int scsi_retry_command(Scsi_Cmnd *);
===== drivers/scsi/scsi_mid_low_api.txt 1.4 vs edited =====
--- 1.4/drivers/scsi/scsi_mid_low_api.txt	Thu Oct 24 17:29:22 2002
+++ edited/drivers/scsi/scsi_mid_low_api.txt	Tue Nov  5 19:41:34 2002
@@ -352,33 +352,11 @@
  *      Locks: lock_kernel() active on entry and expected to be active
  *      on return.
  *
- *      Notes: Invoked from mid level's scsi_unregister_host(). When a
- *      host is being unregistered the mid level does not bother to
- *      call revoke() on the devices it controls.
+ *      Notes: Invoked from mid level's scsi_unregister_host().
  *      This function should call scsi_unregister(shp) [found in hosts.c]
  *      prior to returning.
  **/
     int release(struct Scsi_Host * shp);
-
-
-/**
- *      revoke - indicate disinterest in a scsi device
- *      @sdp: host template for this driver.
- *
- *      Return value ignored.
- *
- *      Required: no
- *
- *      Locks: none held
- *
- *      Notes: Called when "scsi remove-single-device <h> <b> <t> <l>"
- *      is written to /proc/scsi/scsi to indicate the device is no longer 
- *      required. It is called after the upper level drivers have detached 
- *      this device and before the device name  (e.g. /dev/sdc) is 
- *      unregistered and the resources associated with it are freed.
- **/
-    int revoke(Scsi_device * sdp);
-
 
 /**
  *      select_queue_depths - calculate allowable number of scsi commands
===== drivers/scsi/scsi_scan.c 1.32 vs edited =====
--- 1.32/drivers/scsi/scsi_scan.c	Sun Nov  3 10:48:33 2002
+++ edited/drivers/scsi/scsi_scan.c	Tue Nov  5 17:27:24 2002
@@ -1480,15 +1480,6 @@
 
 	scsi_detect_device(sdev);
 
-	if (sdev->host->hostt->slave_attach != NULL) {
-		if (sdev->host->hostt->slave_attach(sdev) != 0) {
-			printk(KERN_INFO "%s: scsi_add_lun: failed low level driver attach, setting device offline", devname);
-			sdev->online = FALSE;
-		}
-	} else if(sdev->host->cmd_per_lun) {
-		scsi_adjust_queue_depth(sdev, 0, sdev->host->cmd_per_lun);
-	}
-
 	if (sdevnew != NULL)
 		*sdevnew = sdev;
 
===== drivers/scsi/sd.c 1.88 vs edited =====
--- 1.88/drivers/scsi/sd.c	Mon Nov  4 20:08:01 2002
+++ edited/drivers/scsi/sd.c	Tue Nov  5 17:29:12 2002
@@ -1212,9 +1212,12 @@
 	SCSI_LOG_HLQUEUE(3, printk("sd_attach: scsi device: <%d,%d,%d,%d>\n", 
 			 sdp->host->host_no, sdp->channel, sdp->id, sdp->lun));
 
+	if (scsi_slave_attach(sdp))
+		goto out;
+
 	sdkp = kmalloc(sizeof(*sdkp), GFP_KERNEL);
 	if (!sdkp)
-		goto out;
+		goto out_detach;
 
 	gd = alloc_disk(16);
 	if (!gd)
@@ -1263,8 +1266,9 @@
 
 out_free:
 	kfree(sdkp);
+out_detach:
+	scsi_slave_detach(sdp);
 out:
-	sdp->attached--;
 	return 1;
 }
 
@@ -1312,7 +1316,7 @@
 
 	sd_devlist_remove(sdkp);
 	del_gendisk(sdkp->disk);
-	sdp->attached--;
+	scsi_slave_detach(sdp);
 	sd_nr_dev--;
 	put_disk(sdkp->disk);
 	kfree(sdkp);
===== drivers/scsi/sg.c 1.37 vs edited =====
--- 1.37/drivers/scsi/sg.c	Tue Nov  5 13:22:20 2002
+++ edited/drivers/scsi/sg.c	Tue Nov  5 17:30:41 2002
@@ -1396,6 +1396,8 @@
 
 	if (!disk)
 		return 1;
+	if (scsi_slave_attach(scsidp))
+		return 1;
 	write_lock_irqsave(&sg_dev_arr_lock, iflags);
 	if (sg_nr_dev >= sg_dev_max) {	/* try to resize */
 		Sg_device **tmp_da;
@@ -1405,10 +1407,10 @@
 		tmp_da = (Sg_device **)vmalloc(
 				tmp_dev_max * sizeof(Sg_device *));
 		if (NULL == tmp_da) {
-			scsidp->attached--;
 			printk(KERN_ERR
 			       "sg_attach: device array cannot be resized\n");
 			put_disk(disk);
+			scsi_slave_detach(scsidp);
 			return 1;
 		}
 		write_lock_irqsave(&sg_dev_arr_lock, iflags);
@@ -1425,7 +1427,6 @@
 		if (!sg_dev_arr[k])
 			break;
 	if (k > SG_MAX_DEVS_MASK) {
-		scsidp->attached--;
 		write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
 		printk(KERN_WARNING
 		       "Unable to attach sg device <%d, %d, %d, %d>"
@@ -1435,6 +1436,7 @@
 		if (NULL != sdp)
 			vfree((char *) sdp);
 		put_disk(disk);
+		scsi_slave_detach(scsidp);
 		return 1;
 	}
 	if (k < sg_dev_max) {
@@ -1448,10 +1450,10 @@
 	} else
 		sdp = NULL;
 	if (NULL == sdp) {
-		scsidp->attached--;
 		write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
 		printk(KERN_ERR "sg_attach: Sg_device cannot be allocated\n");
 		put_disk(disk);
+		scsi_slave_detach(scsidp);
 		return 1;
 	}
 
@@ -1559,7 +1561,7 @@
 			SCSI_LOG_TIMEOUT(3, printk("sg_detach: dev=%d\n", k));
 			sg_dev_arr[k] = NULL;
 		}
-		scsidp->attached--;
+		scsi_slave_detach(scsidp);
 		sg_nr_dev--;
 		sg_dev_noticed--;	/* from <dan@lectra.fr> */
 		break;
===== drivers/scsi/sr.c 1.63 vs edited =====
--- 1.63/drivers/scsi/sr.c	Mon Nov  4 20:08:08 2002
+++ edited/drivers/scsi/sr.c	Tue Nov  5 17:37:40 2002
@@ -506,6 +506,9 @@
 	if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM)
 		return 1;
 
+	if (scsi_slave_attach(sdev))
+		return 1;
+
 	cd = kmalloc(sizeof(*cd), GFP_KERNEL);
 	if (!cd)
 		goto fail;
@@ -574,7 +577,7 @@
 fail_free:
 	kfree(cd);
 fail:
-	sdev->attached--;
+	scsi_slave_detach(sdev);
 	return 1;
 }
 
@@ -807,7 +810,7 @@
 	put_disk(cd->disk);
 	unregister_cdrom(&cd->cdi);
 
-	SDp->attached--;
+	scsi_slave_detach(SDp);
 	sr_nr_dev--;
 
 	kfree(cd);
===== drivers/scsi/st.c 1.39 vs edited =====
--- 1.39/drivers/scsi/st.c	Mon Nov  4 20:04:52 2002
+++ edited/drivers/scsi/st.c	Tue Nov  5 17:36:00 2002
@@ -3667,6 +3667,9 @@
 		return 1;
 	}
 
+	if (scsi_slave_attach(SDp))
+		return 1;
+
 	i = SDp->host->sg_tablesize;
 	if (st_max_sg_segs < i)
 		i = st_max_sg_segs;
@@ -3692,11 +3695,11 @@
 		if (tmp_dev_max > ST_MAX_TAPES)
 			tmp_dev_max = ST_MAX_TAPES;
 		if (tmp_dev_max <= st_nr_dev) {
-			SDp->attached--;
 			write_unlock(&st_dev_arr_lock);
 			printk(KERN_ERR "st: Too many tape devices (max. %d).\n",
 			       ST_MAX_TAPES);
 			put_disk(disk);
+			scsi_slave_detach(SDp);
 			return 1;
 		}
 
@@ -3707,10 +3710,10 @@
 				kfree(tmp_da);
 			if (tmp_ba != NULL)
 				kfree(tmp_ba);
-			SDp->attached--;
 			write_unlock(&st_dev_arr_lock);
 			printk(KERN_ERR "st: Can't extend device array.\n");
 			put_disk(disk);
+			scsi_slave_detach(SDp);
 			return 1;
 		}
 
@@ -3733,10 +3736,10 @@
 
 	tpnt = kmalloc(sizeof(Scsi_Tape), GFP_ATOMIC);
 	if (tpnt == NULL) {
-		SDp->attached--;
 		write_unlock(&st_dev_arr_lock);
 		printk(KERN_ERR "st: Can't allocate device descriptor.\n");
 		put_disk(disk);
+		scsi_slave_detach(SDp);
 		return 1;
 	}
 	memset(tpnt, 0, sizeof(Scsi_Tape));
@@ -3906,7 +3909,7 @@
 				tpnt->de_n[mode] = NULL;
 			}
 			scsi_tapes[i] = 0;
-			SDp->attached--;
+			scsi_slave_detach(SDp);
 			st_nr_dev--;
 			write_unlock(&st_dev_arr_lock);
 

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] fix 2.5 scsi queue depth setting
  2002-11-06  4:24 [PATCH] fix 2.5 scsi queue depth setting Patrick Mansfield
@ 2002-11-06  4:35 ` Patrick Mansfield
  2002-11-06 17:15 ` J.E.J. Bottomley
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-11-06  4:35 UTC (permalink / raw)
  To: James Bottomley, Christoph Hellwig, linux-scsi

On Tue, Nov 05, 2002 at 08:24:17PM -0800, Patrick Mansfield wrote:
> This patch fixes queue depth setting of scsi devices.
> 
> This is done by pairing shost->slave_attach() calls with
> a scsi_build_commandblocks in the new scsi_slave_attach.
> 
> This is a patch against linux-scsi.bkbits.net/scsi-for-linus-2.5 after 
> applying the last posted hch version of the "Eliminate scsi_host_tmpl_list"
> patch; it still applies with offset to the current scsi-for-linus-2.5.

BTW, with all these changes, the SCSI messages come out in a slightly
different order, as in the following output for two hosts. I removed the
messages for all but the first two drives found on each adapter. You can
see that each host adapter is scanned and attaches before going to the next
host adapter, and that queue depth setting happens at attach time (queue
depth messages are interspersed with the attach messages).

scsi2 : QLogic QLA2200 PCI to Fibre Channel Host Adapter: bus 2 device 4 irq 22
        Firmware version:  2.02.03, Driver version 6.03.00b8
  Vendor: IBM       Model: 3542              Rev: 0400
  Type:   Direct-Access                      ANSI SCSI revision: 03
  Vendor: IBM       Model: 3542              Rev: 0400
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi(2:0:0:0): Enabled tagged queuing, queue depth 16.
SCSI device : drive cache: write back
SCSI device : 35466240 512-byte hdwr sectors (18159 MB)
 sdc: unknown partition table
Attached scsi disk sdc at scsi2, channel 0, id 0, lun 0
scsi(2:0:0:1): Enabled tagged queuing, queue depth 16.
SCSI device : drive cache: write back
SCSI device : 35466240 512-byte hdwr sectors (18159 MB)
 sdd: unknown partition table
Attached scsi disk sdd at scsi2, channel 0, id 0, lun 1
scsi3 : QLogic QLA2200 PCI to Fibre Channel Host Adapter: bus 2 device 5 irq 23
        Firmware version:  2.02.03, Driver version 6.03.00b8
  Vendor: IBM       Model: 3542              Rev: 0400
  Type:   Direct-Access                      ANSI SCSI revision: 03
  Vendor: IBM       Model: 3542              Rev: 0400
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi(3:0:0:0): Enabled tagged queuing, queue depth 16.
SCSI device : drive cache: write back
SCSI device : 35466240 512-byte hdwr sectors (18159 MB)
 sdm: unknown partition table
Attached scsi disk sdm at scsi3, channel 0, id 0, lun 0
scsi(3:0:0:1): Enabled tagged queuing, queue depth 16.
SCSI device : drive cache: write back
SCSI device : 35466240 512-byte hdwr sectors (18159 MB)
 sdn: unknown partition table
Attached scsi disk sdn at scsi3, channel 0, id 0, lun 1

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] fix 2.5 scsi queue depth setting
  2002-11-06  4:24 [PATCH] fix 2.5 scsi queue depth setting Patrick Mansfield
  2002-11-06  4:35 ` Patrick Mansfield
@ 2002-11-06 17:15 ` J.E.J. Bottomley
  2002-11-06 17:47 ` J.E.J. Bottomley
  2002-11-06 20:50 ` Doug Ledford
  3 siblings, 0 replies; 297+ messages in thread
From: J.E.J. Bottomley @ 2002-11-06 17:15 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: James Bottomley, Christoph Hellwig, linux-scsi

In this block:

+int scsi_slave_attach(struct scsi_device *sdev)
+{
+	if (sdev->attached++ == 0) {
+		/*
+		 * No one was attached.
+		 */
+		if ((sdev->host->hostt->slave_attach != NULL) &&
+		    (sdev->host->hostt->slave_attach(sdev) != 0)) {
+			printk(KERN_INFO "scsi: failed low level driver"
+			       " attach, some SCSI device might not be"
+			       " configured\n");
+			return 1;
+		}
+		if ((sdev->new_queue_depth == 0) &&
+		    (sdev->host->cmd_per_lun != 0))
+			scsi_adjust_queue_depth(sdev, 0,
+						sdev->host->cmd_per_lun);
+		scsi_build_commandblocks(sdev);

I've a marginal preference for calling scsi_build_commandblocks first, so we 
have a single command block going into the slave_attach (remember the adjust 
queue depth stuff is deferred building of command blocks).  That way, we can 
error out if the current_queue_depth is still zero without having to do a 
slave detach.

Other than that, this patch looks like an excellent solution.  Thanks!

James



^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] fix 2.5 scsi queue depth setting
  2002-11-06  4:24 [PATCH] fix 2.5 scsi queue depth setting Patrick Mansfield
  2002-11-06  4:35 ` Patrick Mansfield
  2002-11-06 17:15 ` J.E.J. Bottomley
@ 2002-11-06 17:47 ` J.E.J. Bottomley
  2002-11-06 18:24   ` Patrick Mansfield
  2002-11-06 20:50 ` Doug Ledford
  3 siblings, 1 reply; 297+ messages in thread
From: J.E.J. Bottomley @ 2002-11-06 17:47 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: James Bottomley, Christoph Hellwig, linux-scsi

This is what I think needs to be done to the patch, does this look OK?

James
===== drivers/scsi/scsi.c 1.55 vs edited =====
--- 1.55/drivers/scsi/scsi.c	Tue Nov  5 14:05:10 2002
+++ edited/drivers/scsi/scsi.c	Wed Nov  6 11:33:16 2002
@@ -2019,6 +2019,13 @@
 		/*
 		 * No one was attached.
 		 */
+		scsi_build_commandblocks(sdev);
+		if (sdev->current_queue_depth == 0) {
+			printk(KERN_ERR "scsi: Allocation failure during"
+			       " attach, some SCSI devices might not be"
+			       " configured\n");
+			return 1;
+		}
 		if ((sdev->host->hostt->slave_attach != NULL) &&
 		    (sdev->host->hostt->slave_attach(sdev) != 0)) {
 			printk(KERN_INFO "scsi: failed low level driver"
@@ -2030,15 +2037,6 @@
 		    (sdev->host->cmd_per_lun != 0))
 			scsi_adjust_queue_depth(sdev, 0,
 						sdev->host->cmd_per_lun);
-		scsi_build_commandblocks(sdev);
-		if (sdev->current_queue_depth == 0) {
-			printk(KERN_ERR "scsi: Allocation failure during"
-			       " attach, some SCSI devices might not be"
-			       " configured\n");
-			if (sdev->host->hostt->slave_detach != NULL)
-				sdev->host->hostt->slave_detach(sdev);
-			return 1;
-		}
 	}
 	return 0;
 }
===== drivers/scsi/scsi_syms.c 1.17 vs edited =====
--- 1.17/drivers/scsi/scsi_syms.c	Mon Nov  4 15:55:09 2002
+++ edited/drivers/scsi/scsi_syms.c	Wed Nov  6 11:33:58 2002
@@ -82,6 +82,8 @@
 
 EXPORT_SYMBOL(scsi_register_blocked_host);
 EXPORT_SYMBOL(scsi_deregister_blocked_host);
+EXPORT_SYMBOL(scsi_slave_attach);
+EXPORT_SYMBOL(scsi_slave_detach);
 
 /*
  * This symbol is for the highlevel drivers (e.g. sg) only.



^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] fix 2.5 scsi queue depth setting
  2002-11-06 17:47 ` J.E.J. Bottomley
@ 2002-11-06 18:24   ` Patrick Mansfield
  2002-11-06 18:32     ` J.E.J. Bottomley
  2002-11-06 20:45     ` Doug Ledford
  0 siblings, 2 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-11-06 18:24 UTC (permalink / raw)
  To: J.E.J. Bottomley; +Cc: Christoph Hellwig, linux-scsi

On Wed, Nov 06, 2002 at 12:47:28PM -0500, J.E.J. Bottomley wrote:
> This is what I think needs to be done to the patch, does this look OK?
> 
> James

The only problem is that scsi_build_commandblocks() sets sdev->new_queue_depth 
to 1, so if the slave_attach() does not set queue depth, we will leave
it at 1, and the post-slave_attach call to scsi_adjust_queue_depth
can never be hit.
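
To make the ordering problem concrete, here is a small user-space model.
It is a sketch under assumptions: the model_* names are hypothetical, and
model_build_commandblocks only imitates the one behaviour described above
(setting new_queue_depth to 1 when nothing was requested), not the real
allocation.

```c
/* Hypothetical model of the fields involved; not the real Scsi_Device. */
struct model_sdev {
	int new_queue_depth;      /* requested depth, 0 = nobody asked */
	int current_queue_depth;  /* command blocks actually allocated */
	int cmd_per_lun;          /* host default */
};

/* Builds one command block and, as described, forces the request to 1. */
static void model_build_commandblocks(struct model_sdev *s)
{
	if (s->new_queue_depth == 0)
		s->new_queue_depth = 1;
	s->current_queue_depth = 1;
}

/* Mid-layer fallback: only applies when no depth was requested. */
static void model_default_depth(struct model_sdev *s)
{
	if (s->new_queue_depth == 0 && s->cmd_per_lun != 0)
		s->new_queue_depth = s->cmd_per_lun;
}

/* Fallback before build: the host default is picked up. */
static int model_fallback_first(struct model_sdev *s)
{
	model_default_depth(s);
	model_build_commandblocks(s);
	return s->new_queue_depth;
}

/* Build before fallback: the fallback test sees 1 and never fires. */
static int model_build_first(struct model_sdev *s)
{
	model_build_commandblocks(s);
	model_default_depth(s);
	return s->new_queue_depth;
}
```

With cmd_per_lun at 16 and no slave_attach setting a depth, the first
ordering ends with a requested depth of 16 and the second is stuck at 1.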

It looks like the code could be removed from scsi_build_commandblocks,
since (in this code path) we will always call scsi_adjust_queue_depth
before sending IO, and other code paths always set new_queue_depth
prior to calling scsi_build_commandblocks.

It would be nice if scsi_request_fn would abort IO if current_queue_depth
== 0 or new_queue_depth == 0.

And yes, I missed scsi_syms.c.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] fix 2.5 scsi queue depth setting
  2002-11-06 18:24   ` Patrick Mansfield
@ 2002-11-06 18:32     ` J.E.J. Bottomley
  2002-11-06 18:39       ` Patrick Mansfield
  2002-11-06 20:45     ` Doug Ledford
  1 sibling, 1 reply; 297+ messages in thread
From: J.E.J. Bottomley @ 2002-11-06 18:32 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: J.E.J. Bottomley, Christoph Hellwig, linux-scsi

patmans@us.ibm.com said:
> The only problem is that scsi_build_commandblocks() sets sdev->
> new_queue_depth  to 1, so if the slave_attach() does not set queue
> depth, we will leave it at 1, and the post-slave_attach call to
> scsi_adjust_queue_depth can never be hit.

OK, missed that.  What about just making the call to 
scsi_adjust_queue_depth conditional on not having a slave_attach call?  The 
expectation would be that if you do have a slave_attach then the queue depth 
stays one until something is done about it by the LLD?

James




^ permalink raw reply	[flat|nested] 297+ messages in thread

* Re: [PATCH] fix 2.5 scsi queue depth setting
  2002-11-06 18:32     ` J.E.J. Bottomley
@ 2002-11-06 18:39       ` Patrick Mansfield
  2002-11-06 18:50         ` J.E.J. Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Patrick Mansfield @ 2002-11-06 18:39 UTC (permalink / raw)
  To: J.E.J. Bottomley; +Cc: Christoph Hellwig, linux-scsi

On Wed, Nov 06, 2002 at 01:32:42PM -0500, J.E.J. Bottomley wrote:
> patmans@us.ibm.com said:
> 
> OK, missed that.  What about just making the call to 
> scsi_adjust_queue_depth conditional on not having a slave_attach call?  The 
> expectation would be that if you do have a slave_attach then the queue depth 
> stays one until something is done about it by the LLD?
> 
> James
> 

That is what we had before, but then there is no warning that the
queue depth is at 1, and the only way to make sure it is right is to
audit the drivers.

-- Patrick Mansfield


* Re: [PATCH] fix 2.5 scsi queue depth setting
  2002-11-06 18:39       ` Patrick Mansfield
@ 2002-11-06 18:50         ` J.E.J. Bottomley
  2002-11-06 19:50           ` Patrick Mansfield
  0 siblings, 1 reply; 297+ messages in thread
From: J.E.J. Bottomley @ 2002-11-06 18:50 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: J.E.J. Bottomley, Christoph Hellwig, linux-scsi

patmans@us.ibm.com said:
> That is what we had before, but then there is no warning that the
> queue depth is at 1, and the only way to make sure it is right is to
> audit the drivers. 

I'm OK with that, since the drivers can be audited as they're moved over to 
slave attach.  It also works for drivers that use older hardware (like the 
53c700) which don't call adjust_queue_depth from slave attach, but slightly 
later on when they've really verified the device will accept tags.  In this 
case, I don't want the mid layer to call adjust_queue_depth for me even if I 
leave slave_attach with only one command allocated.

The bottom line is that if a driver supplies a slave_attach routine, tag setup 
is entirely under its control and the mid-layer won't try to second guess it 
if it doesn't set them up correctly.
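
A minimal sketch of that rule, using simplified stand-in types rather than the real kernel structures. Every sketch_ name is hypothetical, and undoing the attach count on failure is a choice made for the sketch, not something taken from the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for Scsi_Host / Scsi_Device. */
struct sketch_device;

struct sketch_host {
	int cmd_per_lun;                             /* host-wide default depth */
	int (*slave_attach)(struct sketch_device *); /* optional LLD hook */
};

struct sketch_device {
	struct sketch_host *host;
	int attached;    /* number of upper level drivers attached */
	int queue_depth; /* 1 once the first command block exists */
};

/* Models a driver (53c700 style) that sets up tags later, not here. */
int sketch_lazy_hook(struct sketch_device *sdev)
{
	(void)sdev;
	return 0;
}

/* Returns 0 on success, 1 on failure. */
int sketch_slave_attach(struct sketch_device *sdev)
{
	assert(sdev != NULL && sdev->host != NULL);

	if (sdev->attached++ != 0)
		return 0;        /* an earlier attach already did the setup */

	sdev->queue_depth = 1;   /* a single command block to start with */

	if (sdev->host->slave_attach != NULL) {
		/* The driver owns tag setup; no second guessing here. */
		if (sdev->host->slave_attach(sdev) != 0) {
			sdev->attached--;  /* sketch's choice: undo the count */
			return 1;
		}
	} else if (sdev->host->cmd_per_lun != 0) {
		/* No hook supplied: fall back to the host-wide default. */
		sdev->queue_depth = sdev->host->cmd_per_lun;
	}
	return 0;
}
```

With a hook present the depth stays at 1 until the driver changes it; without one, the device ends up at cmd_per_lun.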

James


* Re: [PATCH] fix 2.5 scsi queue depth setting
  2002-11-06 18:50         ` J.E.J. Bottomley
@ 2002-11-06 19:50           ` Patrick Mansfield
  0 siblings, 0 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-11-06 19:50 UTC (permalink / raw)
  To: J.E.J. Bottomley; +Cc: Christoph Hellwig, linux-scsi

On Wed, Nov 06, 2002 at 01:50:00PM -0500, J.E.J. Bottomley wrote:
> I'm OK with that, since the drivers can be audited as they're moved over to 
> slave attach.  It also works for drivers that use older hardware (like the 
> 53c700) which don't call adjust_queue_depth from slave attach, but slightly 
> later on when they've really verified the device will accept tags.  In this 
> case, I don't want the mid layer to call adjust_queue_depth for me even if I 
> leave slave_attach with only one command allocated.

OK, here it is again, as discussed, plus it calls scsi_release_commandblocks
on slave_attach failure.

 hosts.c              |    5 --
 hosts.h              |    6 --
 osst.c               |    9 ++-
 scsi.c               |  116 ++++++++++++++++++++++++++++++---------------------
 scsi.h               |    2 
 scsi_mid_low_api.txt |   24 ----------
 scsi_scan.c          |    9 ---
 scsi_syms.c          |    2 
 sd.c                 |   10 +++-
 sg.c                 |   10 ++--
 sr.c                 |    7 ++-
 st.c                 |   11 +++-
 12 files changed, 106 insertions(+), 105 deletions(-)

--- 1.23/drivers/scsi/hosts.c	Tue Nov  5 17:13:42 2002
+++ edited/drivers/scsi/hosts.c	Tue Nov  5 17:18:22 2002
@@ -249,10 +249,6 @@
 			       sdev->attached);
 			return 1;
 		}
-
-		if (shost->hostt->slave_detach)
-			(*shost->hostt->slave_detach) (sdev);
-
 		devfs_unregister(sdev->de);
 		device_unregister(&sdev->sdev_driverfs_dev);
 	}
@@ -261,7 +257,6 @@
 
 	for (sdev = shost->host_queue; sdev;
 	     sdev = shost->host_queue) {
-		scsi_release_commandblocks(sdev);
 		blk_cleanup_queue(&sdev->request_queue);
 		/* Next free up the Scsi_Device structures for this host */
 		shost->host_queue = sdev->next;
===== drivers/scsi/hosts.h 1.32 vs edited =====
--- 1.32/drivers/scsi/hosts.h	Tue Nov  5 17:13:42 2002
+++ edited/drivers/scsi/hosts.h	Tue Nov  5 19:40:42 2002
@@ -93,12 +93,6 @@
      */
     int (* detect)(struct SHT *);
 
-    /*
-     * This function is only used by one driver and will be going away
-     * once it switches over to using the slave_detach() function instead.
-     */
-    int (*revoke)(Scsi_Device *);
-
     /* Used with loadable modules to unload the host structures.  Note:
      * there is a default action built into the modules code which may
      * be sufficient for most host adapters.  Thus you may not have to supply
===== drivers/scsi/osst.c 1.24 vs edited =====
--- 1.24/drivers/scsi/osst.c	Mon Nov  4 20:00:30 2002
+++ edited/drivers/scsi/osst.c	Tue Nov  5 17:26:53 2002
@@ -5421,10 +5421,12 @@
 		return 1;
 
 	if (osst_nr_dev >= osst_dev_max) {
-		 SDp->attached--;
 		 put_disk(disk);
 		 return 1;
 	}
+
+	if (scsi_slave_attach(SDp))
+		return 1;
 	
 	/* find a free minor number */
 	for (i=0; os_scsi_tapes[i] && i<osst_dev_max; i++);
@@ -5433,9 +5435,10 @@
 	/* allocate a OS_Scsi_Tape for this device */
 	tpnt = (OS_Scsi_Tape *)kmalloc(sizeof(OS_Scsi_Tape), GFP_ATOMIC);
 	if (tpnt == NULL) {
-		 SDp->attached--;
 		 printk(KERN_WARNING "osst :W: Can't allocate device descriptor.\n");
 		 put_disk(disk);
+
+		 scsi_slave_detach(SDp);
 		 return 1;
 	}
 	memset(tpnt, 0, sizeof(OS_Scsi_Tape));
@@ -5648,7 +5651,7 @@
 		put_disk(tpnt->disk);
 		kfree(tpnt);
 		os_scsi_tapes[i] = NULL;
-		SDp->attached--;
+		scsi_slave_detach(SDp);
 		osst_nr_dev--;
 		osst_dev_noticed--;
 		return;
===== drivers/scsi/scsi.c 1.55 vs edited =====
--- 1.55/drivers/scsi/scsi.c	Tue Nov  5 17:13:42 2002
+++ edited/drivers/scsi/scsi.c	Wed Nov  6 11:28:38 2002
@@ -1664,10 +1664,6 @@
 			break;
 	}
 	spin_unlock_irqrestore(&device_request_lock, flags);
-	if(SDpnt->current_queue_depth == 0)
-	{
-		scsi_build_commandblocks(SDpnt);
-	}
 }
 
 #ifdef CONFIG_PROC_FS
@@ -1932,16 +1928,7 @@
 		scsi_detach_device(scd);
 
 		if (scd->attached == 0) {
-			/*
-			 * Nobody is using this device any more.
-			 * Free all of the command structures.
-			 */
-                        if (HBA_ptr->hostt->revoke)
-                                HBA_ptr->hostt->revoke(scd);
-			if (HBA_ptr->hostt->slave_detach)
-				(*HBA_ptr->hostt->slave_detach) (scd);
 			devfs_unregister (scd->de);
-			scsi_release_commandblocks(scd);
 
 			/* Now we can remove the device structure */
 			if (scd->next != NULL)
@@ -1976,7 +1963,7 @@
 	down_read(&scsi_devicelist_mutex);
 	for (sdt = scsi_devicelist; sdt; sdt = sdt->next)
 		if (sdt->detect)
-			sdev->attached += (*sdt->detect)(sdev);
+			(*sdt->detect)(sdev);
 	up_read(&scsi_devicelist_mutex);
 }
 
@@ -1984,18 +1971,16 @@
 {
 	struct Scsi_Device_Template *sdt;
 
-	scsi_build_commandblocks(sdev);
-	if (sdev->current_queue_depth == 0)
-		goto fail;
-
 	down_read(&scsi_devicelist_mutex);
 	for (sdt = scsi_devicelist; sdt; sdt = sdt->next)
 		if (sdt->attach)
+			/*
+			 * XXX check result when the upper level attach
+			 * return values are fixed, and on failure goto
+			 * fail.
+			 */
 			(*sdt->attach) (sdev);
 	up_read(&scsi_devicelist_mutex);
-
-	if (!sdev->attached)
-		scsi_release_commandblocks(sdev);
 	return 0;
 
 fail:
@@ -2017,6 +2002,64 @@
 }
 
 /*
+ * Function:	scsi_slave_attach()
+ *
+ * Purpose:	Called from the upper level driver attach to handle common
+ * 		attach code.
+ *
+ * Arguments:	sdev - scsi_device to attach
+ *
+ * Returns:	1 on error, 0 on success
+ *
+ * Lock Status:	Protected via scsi_devicelist_mutex.
+ */
+int scsi_slave_attach(struct scsi_device *sdev)
+{
+	if (sdev->attached++ == 0) {
+		/*
+		 * No one was attached.
+		 */
+		scsi_build_commandblocks(sdev);
+		if (sdev->current_queue_depth == 0) {
+			printk(KERN_ERR "scsi: Allocation failure during"
+			       " attach, some SCSI devices might not be"
+			       " configured\n");
+			return 1;
+		}
+		if (sdev->host->hostt->slave_attach != NULL) {
+			if (sdev->host->hostt->slave_attach(sdev) != 0) {
+				printk(KERN_INFO "scsi: failed low level driver"
+				       " attach, some SCSI device might not be"
+				       " configured\n");
+				scsi_release_commandblocks(sdev);
+				return 1;
+			}
+		} else if (sdev->host->cmd_per_lun != 0)
+			scsi_adjust_queue_depth(sdev, 0,
+						sdev->host->cmd_per_lun);
+	}
+	return 0;
+}
+
+/*
+ * Function:	scsi_slave_detach()
+ *
+ * Purpose:	Called from the upper level driver detach to handle common
+ * 		detach code.
+ *
+ * Arguments:	sdev - struct scsi_device to detach
+ *
+ * Lock Status:	Protected via scsi_devicelist_mutex.
+ */
+void scsi_slave_detach(struct scsi_device *sdev)
+{
+	if (--sdev->attached == 0) {
+		if (sdev->host->hostt->slave_detach != NULL)
+			sdev->host->hostt->slave_detach(sdev);
+		scsi_release_commandblocks(sdev);
+	}
+}
+/*
  * This entry point should be called by a loadable module if it is trying
  * add a high level scsi driver to the system.
  */
@@ -2053,7 +2096,7 @@
 		for (SDpnt = shpnt->host_queue; SDpnt;
 		     SDpnt = SDpnt->next) {
 			if (tpnt->detect)
-				SDpnt->attached += (*tpnt->detect) (SDpnt);
+				(*tpnt->detect) (SDpnt);
 		}
 	}
 
@@ -2064,22 +2107,14 @@
 	     shpnt = scsi_host_get_next(shpnt)) {
 		for (SDpnt = shpnt->host_queue; SDpnt;
 		     SDpnt = SDpnt->next) {
-			scsi_build_commandblocks(SDpnt);
-			if (SDpnt->current_queue_depth == 0) {
-				out_of_space = 1;
-				continue;
-			}
 			if (tpnt->attach)
+				/*
+				 * XXX check result when the upper level
+				 * attach return values are fixed, and
+				 * stop attaching on failure.
+				 */
 				(*tpnt->attach) (SDpnt);
 
-			/*
-			 * If this driver attached to the device, and don't have any
-			 * command blocks for this device, allocate some.
-			 */
-			if (SDpnt->attached)
-				SDpnt->online = TRUE;
-			else
-				scsi_release_commandblocks(SDpnt);
 		}
 	}
 
@@ -2119,17 +2154,6 @@
 		     SDpnt = SDpnt->next) {
 			if (tpnt->detach)
 				(*tpnt->detach) (SDpnt);
-			if (SDpnt->attached == 0) {
-				SDpnt->online = FALSE;
-
-				/*
-				 * Nobody is using this device any more.  Free all of the
-				 * command structures.
-				 */
-				if (shpnt->hostt->slave_detach)
-					(*shpnt->hostt->slave_detach) (SDpnt);
-				scsi_release_commandblocks(SDpnt);
-			}
 		}
 	}
 	/*
===== drivers/scsi/scsi.h 1.33 vs edited =====
--- 1.33/drivers/scsi/scsi.h	Mon Nov  4 09:04:51 2002
+++ edited/drivers/scsi/scsi.h	Tue Nov  5 17:26:24 2002
@@ -466,6 +466,8 @@
 extern void scsi_release_commandblocks(Scsi_Device * SDpnt);
 extern void scsi_build_commandblocks(Scsi_Device * SDpnt);
 extern void scsi_adjust_queue_depth(Scsi_Device *, int, int);
+extern int scsi_slave_attach(struct scsi_device *sdev);
+extern void scsi_slave_detach(struct scsi_device *sdev);
 extern void scsi_done(Scsi_Cmnd * SCpnt);
 extern void scsi_finish_command(Scsi_Cmnd *);
 extern int scsi_retry_command(Scsi_Cmnd *);
===== drivers/scsi/scsi_mid_low_api.txt 1.4 vs edited =====
--- 1.4/drivers/scsi/scsi_mid_low_api.txt	Thu Oct 24 17:29:22 2002
+++ edited/drivers/scsi/scsi_mid_low_api.txt	Tue Nov  5 19:41:34 2002
@@ -352,33 +352,11 @@
  *      Locks: lock_kernel() active on entry and expected to be active
  *      on return.
  *
- *      Notes: Invoked from mid level's scsi_unregister_host(). When a
- *      host is being unregistered the mid level does not bother to
- *      call revoke() on the devices it controls.
+ *      Notes: Invoked from mid level's scsi_unregister_host().
  *      This function should call scsi_unregister(shp) [found in hosts.c]
  *      prior to returning.
  **/
     int release(struct Scsi_Host * shp);
-
-
-/**
- *      revoke - indicate disinterest in a scsi device
- *      @sdp: host template for this driver.
- *
- *      Return value ignored.
- *
- *      Required: no
- *
- *      Locks: none held
- *
- *      Notes: Called when "scsi remove-single-device <h> <b> <t> <l>"
- *      is written to /proc/scsi/scsi to indicate the device is no longer 
- *      required. It is called after the upper level drivers have detached 
- *      this device and before the device name  (e.g. /dev/sdc) is 
- *      unregistered and the resources associated with it are freed.
- **/
-    int revoke(Scsi_device * sdp);
-
 
 /**
  *      select_queue_depths - calculate allowable number of scsi commands
===== drivers/scsi/scsi_scan.c 1.32 vs edited =====
--- 1.32/drivers/scsi/scsi_scan.c	Sun Nov  3 10:48:33 2002
+++ edited/drivers/scsi/scsi_scan.c	Tue Nov  5 17:27:24 2002
@@ -1480,15 +1480,6 @@
 
 	scsi_detect_device(sdev);
 
-	if (sdev->host->hostt->slave_attach != NULL) {
-		if (sdev->host->hostt->slave_attach(sdev) != 0) {
-			printk(KERN_INFO "%s: scsi_add_lun: failed low level driver attach, setting device offline", devname);
-			sdev->online = FALSE;
-		}
-	} else if(sdev->host->cmd_per_lun) {
-		scsi_adjust_queue_depth(sdev, 0, sdev->host->cmd_per_lun);
-	}
-
 	if (sdevnew != NULL)
 		*sdevnew = sdev;
 
===== drivers/scsi/scsi_syms.c 1.17 vs edited =====
--- 1.17/drivers/scsi/scsi_syms.c	Mon Nov  4 13:55:09 2002
+++ edited/drivers/scsi/scsi_syms.c	Wed Nov  6 11:25:04 2002
@@ -82,6 +82,8 @@
 
 EXPORT_SYMBOL(scsi_register_blocked_host);
 EXPORT_SYMBOL(scsi_deregister_blocked_host);
+EXPORT_SYMBOL(scsi_slave_attach);
+EXPORT_SYMBOL(scsi_slave_detach);
 
 /*
  * This symbol is for the highlevel drivers (e.g. sg) only.
===== drivers/scsi/sd.c 1.88 vs edited =====
--- 1.88/drivers/scsi/sd.c	Mon Nov  4 20:08:01 2002
+++ edited/drivers/scsi/sd.c	Tue Nov  5 17:29:12 2002
@@ -1212,9 +1212,12 @@
 	SCSI_LOG_HLQUEUE(3, printk("sd_attach: scsi device: <%d,%d,%d,%d>\n", 
 			 sdp->host->host_no, sdp->channel, sdp->id, sdp->lun));
 
+	if (scsi_slave_attach(sdp))
+		goto out;
+
 	sdkp = kmalloc(sizeof(*sdkp), GFP_KERNEL);
 	if (!sdkp)
-		goto out;
+		goto out_detach;
 
 	gd = alloc_disk(16);
 	if (!gd)
@@ -1263,8 +1266,9 @@
 
 out_free:
 	kfree(sdkp);
+out_detach:
+	scsi_slave_detach(sdp);
 out:
-	sdp->attached--;
 	return 1;
 }
 
@@ -1312,7 +1316,7 @@
 
 	sd_devlist_remove(sdkp);
 	del_gendisk(sdkp->disk);
-	sdp->attached--;
+	scsi_slave_detach(sdp);
 	sd_nr_dev--;
 	put_disk(sdkp->disk);
 	kfree(sdkp);
===== drivers/scsi/sg.c 1.37 vs edited =====
--- 1.37/drivers/scsi/sg.c	Tue Nov  5 13:22:20 2002
+++ edited/drivers/scsi/sg.c	Tue Nov  5 17:30:41 2002
@@ -1396,6 +1396,8 @@
 
 	if (!disk)
 		return 1;
+	if (scsi_slave_attach(scsidp))
+		return 1;
 	write_lock_irqsave(&sg_dev_arr_lock, iflags);
 	if (sg_nr_dev >= sg_dev_max) {	/* try to resize */
 		Sg_device **tmp_da;
@@ -1405,10 +1407,10 @@
 		tmp_da = (Sg_device **)vmalloc(
 				tmp_dev_max * sizeof(Sg_device *));
 		if (NULL == tmp_da) {
-			scsidp->attached--;
 			printk(KERN_ERR
 			       "sg_attach: device array cannot be resized\n");
 			put_disk(disk);
+			scsi_slave_detach(scsidp);
 			return 1;
 		}
 		write_lock_irqsave(&sg_dev_arr_lock, iflags);
@@ -1425,7 +1427,6 @@
 		if (!sg_dev_arr[k])
 			break;
 	if (k > SG_MAX_DEVS_MASK) {
-		scsidp->attached--;
 		write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
 		printk(KERN_WARNING
 		       "Unable to attach sg device <%d, %d, %d, %d>"
@@ -1435,6 +1436,7 @@
 		if (NULL != sdp)
 			vfree((char *) sdp);
 		put_disk(disk);
+		scsi_slave_detach(scsidp);
 		return 1;
 	}
 	if (k < sg_dev_max) {
@@ -1448,10 +1450,10 @@
 	} else
 		sdp = NULL;
 	if (NULL == sdp) {
-		scsidp->attached--;
 		write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
 		printk(KERN_ERR "sg_attach: Sg_device cannot be allocated\n");
 		put_disk(disk);
+		scsi_slave_detach(scsidp);
 		return 1;
 	}
 
@@ -1559,7 +1561,7 @@
 			SCSI_LOG_TIMEOUT(3, printk("sg_detach: dev=%d\n", k));
 			sg_dev_arr[k] = NULL;
 		}
-		scsidp->attached--;
+		scsi_slave_detach(scsidp);
 		sg_nr_dev--;
 		sg_dev_noticed--;	/* from <dan@lectra.fr> */
 		break;
===== drivers/scsi/sr.c 1.63 vs edited =====
--- 1.63/drivers/scsi/sr.c	Mon Nov  4 20:08:08 2002
+++ edited/drivers/scsi/sr.c	Tue Nov  5 17:37:40 2002
@@ -506,6 +506,9 @@
 	if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM)
 		return 1;
 
+	if (scsi_slave_attach(sdev))
+		return 1;
+
 	cd = kmalloc(sizeof(*cd), GFP_KERNEL);
 	if (!cd)
 		goto fail;
@@ -574,7 +577,7 @@
 fail_free:
 	kfree(cd);
 fail:
-	sdev->attached--;
+	scsi_slave_detach(sdev);
 	return 1;
 }
 
@@ -807,7 +810,7 @@
 	put_disk(cd->disk);
 	unregister_cdrom(&cd->cdi);
 
-	SDp->attached--;
+	scsi_slave_detach(SDp);
 	sr_nr_dev--;
 
 	kfree(cd);
===== drivers/scsi/st.c 1.39 vs edited =====
--- 1.39/drivers/scsi/st.c	Mon Nov  4 20:04:52 2002
+++ edited/drivers/scsi/st.c	Tue Nov  5 17:36:00 2002
@@ -3667,6 +3667,9 @@
 		return 1;
 	}
 
+	if (scsi_slave_attach(SDp))
+		return 1;
+
 	i = SDp->host->sg_tablesize;
 	if (st_max_sg_segs < i)
 		i = st_max_sg_segs;
@@ -3692,11 +3695,11 @@
 		if (tmp_dev_max > ST_MAX_TAPES)
 			tmp_dev_max = ST_MAX_TAPES;
 		if (tmp_dev_max <= st_nr_dev) {
-			SDp->attached--;
 			write_unlock(&st_dev_arr_lock);
 			printk(KERN_ERR "st: Too many tape devices (max. %d).\n",
 			       ST_MAX_TAPES);
 			put_disk(disk);
+			scsi_slave_detach(SDp);
 			return 1;
 		}
 
@@ -3707,10 +3710,10 @@
 				kfree(tmp_da);
 			if (tmp_ba != NULL)
 				kfree(tmp_ba);
-			SDp->attached--;
 			write_unlock(&st_dev_arr_lock);
 			printk(KERN_ERR "st: Can't extend device array.\n");
 			put_disk(disk);
+			scsi_slave_detach(SDp);
 			return 1;
 		}
 
@@ -3733,10 +3736,10 @@
 
 	tpnt = kmalloc(sizeof(Scsi_Tape), GFP_ATOMIC);
 	if (tpnt == NULL) {
-		SDp->attached--;
 		write_unlock(&st_dev_arr_lock);
 		printk(KERN_ERR "st: Can't allocate device descriptor.\n");
 		put_disk(disk);
+		scsi_slave_detach(SDp);
 		return 1;
 	}
 	memset(tpnt, 0, sizeof(Scsi_Tape));
@@ -3906,7 +3909,7 @@
 				tpnt->de_n[mode] = NULL;
 			}
 			scsi_tapes[i] = 0;
-			SDp->attached--;
+			scsi_slave_detach(SDp);
 			st_nr_dev--;
 			write_unlock(&st_dev_arr_lock);
 


* Re: [PATCH] fix 2.5 scsi queue depth setting
  2002-11-06 18:24   ` Patrick Mansfield
  2002-11-06 18:32     ` J.E.J. Bottomley
@ 2002-11-06 20:45     ` Doug Ledford
  2002-11-06 21:19       ` J.E.J. Bottomley
  1 sibling, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-11-06 20:45 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: J.E.J. Bottomley, Christoph Hellwig, linux-scsi

On Wed, Nov 06, 2002 at 10:24:44AM -0800, Patrick Mansfield wrote:
> On Wed, Nov 06, 2002 at 12:47:28PM -0500, J.E.J. Bottomley wrote:
> > This is what I think needs to be done to the patch, does this look OK?
> > 
> > James
> 
> The only problem is that scsi_build_commandblocks() sets sdev->new_queue_depth 
> to 1, so if the slave_attach() does not set queue depth, we will leave
> it at 1, and the post-slave_attach call to scsi_adjust_queue_depth
> can never be hit.
> 
> It looks like the code could be removed from scsi_build_commandblocks,
> since (in this code path) we will always call scsi_adjust_queue_depth
> before sending IO, and other code paths always set new_queue_depth
> prior to calling scsi_build_commandblocks.
> 
> It would be nice if scsi_request_fn would abort IO if current_queue_depth
> == 0 or new_queue_depth == 0.
> 
> And yes, I missed scsi_syms.c.

Actually, I've been planning a slight API change to the slave_attach() 
lldd stuff.  Basically, if a host implements slave_attach(), then it 
*must* call adjust_queue_depth() on all devices passed to it, whether they 
are going to use tagged queueing or not.  They should either call 
adjust_queue_depth() with a tag parameter of non-0 and the actual depth 
they want, or with a tag parameter of 0 and 1 if they don't support 
multiple untagged commands at a time (many drivers call this linked 
commands even though they don't use the scsi protocol version of linking, 
they just queue the commands up internally and send them one after the 
other) or whatever number they support at one time on untagged devices.  
The rationale for this involves me not liking wishy-washy if statements 
like what's in the scsi_slave_attach() right now.  Instead of:

if(host->hostt->slave_attach)
  host->hostt->slave_attach(device);
if ((sdev->new_queue_depth == 0) &&
    (sdev->host->cmd_per_lun != 0))
  scsi_adjust_queue_depth(sdev, 0, sdev->host->cmd_per_lun);
scsi_build_commandblocks(sdev);

(which can be tricked BTW, since some slave_attach() routines don't touch 
untagged devices and we set the new_queue_depth to 1 to enable scanning, 
we won't even set untagged devices to cmd_per_lun like our intent is), I 
would rather we did this:

if(host->hostt->slave_attach)
  blah;
else
  scsi_adjust_queue_depth(sdev, 0, cmd_per_lun);

with the requirement that slave attach always do the correct 
scsi_adjust_queue_depth.  Anyway, that's my preference.
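
Seen from the driver side, that requirement could look roughly like the sketch below. The types, the sketch_adjust_queue_depth() stand-in and both depth constants are assumptions made for illustration, not the real scsi_adjust_queue_depth() or any driver's actual values:

```c
#include <assert.h>

/* Hypothetical, simplified device state. */
struct sketch_sdev {
	int tagged_supported; /* nonzero if the device accepts tagged commands */
	int tagged;           /* what the driver selected */
	int depth;            /* queue depth the driver selected */
};

/* Assumed depths; a real driver would pick its own. */
#define SKETCH_TAGGED_DEPTH   32
#define SKETCH_UNTAGGED_DEPTH 1

/* Stand-in for scsi_adjust_queue_depth(sdev, tagged, depth). */
void sketch_adjust_queue_depth(struct sketch_sdev *sdev, int tagged, int depth)
{
	assert(depth >= 1);
	sdev->tagged = tagged;
	sdev->depth = depth;
}

/*
 * A slave_attach that follows the proposed rule: every device passed in
 * gets an explicit depth, whether or not it will use tagged queueing.
 */
int sketch_lld_slave_attach(struct sketch_sdev *sdev)
{
	if (sdev->tagged_supported)
		sketch_adjust_queue_depth(sdev, 1, SKETCH_TAGGED_DEPTH);
	else
		sketch_adjust_queue_depth(sdev, 0, SKETCH_UNTAGGED_DEPTH);
	return 0;
}
```

Because both branches set a depth, the mid-layer never has to guess whether a depth of 1 was chosen deliberately or simply left over from scanning.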

Now, the second reason I want this is in preparation for another change I 
have planned.  This one I would like some discussion on anyway, so I'm 
bringing it up now:

I want to make the queuecommand() interface optional, with the ability to 
replace it with a combo of prepare_command() and send_command() instead.  
A driver can only define queuecommand() or the prepare,send_command() 
pair.  The obvious benefit is that a prepare then send routine is needed 
to make sequential untagged commands fast without having to implement a 
driver internal queue, and this is necessary for streaming devices that 
can't tolerate long delays between commands without falling out of their 
current stream (tapes, early CD-Rs and current very cheap CD-Rs, etc).  
Obviously, this new interface would require that driver authors actually 
think about things like sendable commands vs. just prepped commands, etc.  
Making them always call adjust queue depth is one way of forcing them to 
think about these issues, hence why I like requiring it in all 
slave_attach calls.  Thoughts?
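
One way to picture the "either queuecommand() or the prepare/send pair" rule is the sketch below. Nothing here is an existing interface: the template layout, the validity check and the dispatch helper are all hypothetical, and only the three hook names come from the proposal above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command state, just enough to observe the two phases. */
struct sketch_cmd {
	int prepared;
	int sent;
};

/* Hypothetical host template carrying the three hooks under discussion. */
struct sketch_template {
	int (*queuecommand)(struct sketch_cmd *);
	int (*prepare_command)(struct sketch_cmd *);
	int (*send_command)(struct sketch_cmd *);
};

/* Example hooks; each returns 0 on success. */
int sketch_prepare(struct sketch_cmd *c) { c->prepared = 1; return 0; }
int sketch_send(struct sketch_cmd *c)
{
	if (!c->prepared)
		return 1;  /* sending an unprepared command is an error */
	c->sent = 1;
	return 0;
}
int sketch_queue(struct sketch_cmd *c) { c->prepared = 1; c->sent = 1; return 0; }

/* Valid means: queuecommand alone, or the complete prepare/send pair. */
int sketch_template_valid(const struct sketch_template *t)
{
	int has_queue = t->queuecommand != NULL;
	int has_prep = t->prepare_command != NULL;
	int has_send = t->send_command != NULL;

	if (has_prep != has_send)
		return 0;              /* half a pair is never acceptable */
	return has_queue != has_prep;  /* exactly one of the two styles */
}

/* Issue one command through whichever style the template provides. */
int sketch_dispatch(const struct sketch_template *t, struct sketch_cmd *cmd)
{
	assert(sketch_template_valid(t));
	if (t->queuecommand != NULL)
		return t->queuecommand(cmd);
	if (t->prepare_command(cmd) != 0)
		return 1;
	return t->send_command(cmd);
}
```

In a real mid-layer the prepare step would run ahead of time, while the previous command is still outstanding, so that only the send step remains when the device becomes free; the sketch runs them back to back only to keep it short.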

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606


* Re: [PATCH] fix 2.5 scsi queue depth setting
  2002-11-06 20:45     ` Doug Ledford
@ 2002-11-06 21:19       ` J.E.J. Bottomley
  0 siblings, 0 replies; 297+ messages in thread
From: J.E.J. Bottomley @ 2002-11-06 21:19 UTC (permalink / raw)
  To: Patrick Mansfield, J.E.J. Bottomley, Christoph Hellwig,
	linux-scsi

dledford@redhat.com said:
> Actually, I've been planning a slight API change to the slave_attach()
>  lldd stuff.  Basically, if a host implements slave_attach(), then it
> *must* call adjust_queue_depth() on all devices passed to it, whether
> they  are going to use tagged queueing or not.

I agree.

dledford@redhat.com said:
> I  would rather we did this:
> if(host->hostt->slave_attach)
>   blah; else
>   scsi_adjust_queue_depth(sdev, 0, cmd_per_lun); 

I agree.  (That's what patrick just put in the patch for me).

> I want to make the queuecommand() interface optional, with the ability
> to  replace it with a combo of prepare_command() and send_command()
> instead.   A driver can only define queuecommand() or the
> prepare,send_command()  pair.  The obvious benefit is that a prepare
> then send routine is needed  to make sequential untagged commands fast
> without having to implement a  driver internal queue, and this is
> necessary for streaming devices that  can't tolerate long delays
> between commands without falling out of their  current stream (tapes,
> early CD-Rs and current very cheap CD-Rs, etc).   Obviously, this new
> interface would require that driver authors actually  think about
> things like sendable commands vs. just prepped commands, etc.   Making
> them always call adjust queue depth is one way of forcing them to
> think about these issues, hence why I like requiring it in all
> slave_attach calls.  Thoughts? 

Quite like the idea.  It should be possible to make it complement the 
request_prep_fn I'm doing for SCSI (this will force the upper level drivers to 
do early prepares on commands, so we can have the whole thing just waiting to 
shoot off as the completed command returns).

James




* Re: [PATCH] fix 2.5 scsi queue depth setting
  2002-11-06  4:24 [PATCH] fix 2.5 scsi queue depth setting Patrick Mansfield
                   ` (2 preceding siblings ...)
  2002-11-06 17:47 ` J.E.J. Bottomley
@ 2002-11-06 20:50 ` Doug Ledford
  3 siblings, 0 replies; 297+ messages in thread
From: Doug Ledford @ 2002-11-06 20:50 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: James Bottomley, Christoph Hellwig, linux-scsi

On Tue, Nov 05, 2002 at 08:24:17PM -0800, Patrick Mansfield wrote:
> +void scsi_slave_detach(struct scsi_device *sdev)
> +{
> +	if (--sdev->attached == 0) {
> +		if (sdev->host->hostt->slave_detach != NULL)
> +			sdev->host->hostt->slave_detach(sdev);
> +		scsi_release_commandblocks(sdev);
> +	}
> +}
> +/*

Hmmm...either this needs to implement what I alluded to in my writeup a 
couple weeks ago and add the lines:

if(sdev->hostdata)
  kfree(sdev->hostdata);
sdev->hostdata = NULL;

after the slave_detach call.  The other option is to slightly refine the 
slave_attach/slave_detach API to specify that if your attach routine 
allocates and hangs memory off of sdev->hostdata, then you *must* 
implement a slave_detach() routine and you must free said memory yourself 
(and NULL out the pointer to be safe).
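
The two options can be compared in a small userspace sketch. The structure and function names are hypothetical, and malloc()/free() stand in for kmalloc()/kfree():

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified device with a driver-private pointer. */
struct sketch_dev {
	int attached;                              /* upper level attach count */
	void *hostdata;                            /* memory hung off by the LLD */
	void (*slave_detach)(struct sketch_dev *); /* optional LLD hook */
};

/*
 * Second option: the driver that allocated hostdata in its attach
 * routine frees it in its own detach routine and clears the pointer.
 */
void sketch_lld_slave_detach(struct sketch_dev *sdev)
{
	free(sdev->hostdata);
	sdev->hostdata = NULL;
}

/*
 * First option: the mid-layer frees whatever is still hanging off
 * hostdata after the hook has run.  free(NULL) is a no-op, so this is
 * harmless when the driver already cleaned up.
 */
void sketch_slave_detach(struct sketch_dev *sdev)
{
	assert(sdev->attached > 0);
	if (--sdev->attached != 0)
		return;              /* other upper level drivers remain */

	if (sdev->slave_detach != NULL)
		sdev->slave_detach(sdev);

	free(sdev->hostdata);
	sdev->hostdata = NULL;
}
```

The first option only works if hostdata is always a single allocation from the matching allocator; a driver that hangs a more complex structure there would still need its own detach routine.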

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  


* [PATCH] add request prep functions to SCSI
@ 2002-11-06 22:18 J.E.J. Bottomley
  2002-11-06 23:16 ` Doug Ledford
  2002-11-07 21:45 ` Mike Anderson
  0 siblings, 2 replies; 297+ messages in thread
From: J.E.J. Bottomley @ 2002-11-06 22:18 UTC (permalink / raw)
  To: linux-scsi

[-- Attachment #1: Type: text/plain, Size: 711 bytes --]

This patch adds request prep functions to the mid-layer.  At the moment, it's a single request prep function for all of SCSI.  I've altered the logic in scsi_request_fn so that we now do early preparation (this should improve throughput slightly in the untagged case with only a single command block).

The prep function also cannot drop the queue lock, so the calling assumptions for scsi_init_io and the upper layer driver init_commands have changed to be that the lock is now held and they cannot drop it.  I think this means that we have no callers of scsi_init_io that aren't atomic, so perhaps I can just take the if out.
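
The general shape of a prep function can be sketched in userspace as below. This is a simplified model under stated assumptions, not the code in the attached diff: the SKETCH_ names only mirror BLKPREP_OK, BLKPREP_KILL, BLKPREP_DEFER and REQ_DONTPREP, and the request structure is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the three block layer prep results; names are hypothetical. */
enum sketch_prep { SKETCH_PREP_OK, SKETCH_PREP_KILL, SKETCH_PREP_DEFER };

/* Mirrors REQ_DONTPREP: set once a request is fully prepared. */
#define SKETCH_REQ_DONTPREP 0x1u

struct sketch_req {
	unsigned int flags;
	int valid;      /* 0 models a request that init_command would reject */
	int prep_calls; /* counts how often real preparation work was done */
};

/*
 * Prepare one request.  Must not sleep or drop the queue lock, so a
 * missing command block means "defer and retry later", not "wait".
 */
enum sketch_prep sketch_prep_fn(struct sketch_req *req, int *free_cmds)
{
	assert(req != NULL && free_cmds != NULL);

	if (*free_cmds <= 0)
		return SKETCH_PREP_DEFER;  /* no command block available yet */
	if (!req->valid)
		return SKETCH_PREP_KILL;   /* request can never succeed */

	(*free_cmds)--;                    /* command block now owned by req */
	req->prep_calls++;
	req->flags |= SKETCH_REQ_DONTPREP; /* no need to come back here */
	return SKETCH_PREP_OK;
}

/* Models the caller: an already prepared request skips the prep step. */
enum sketch_prep sketch_next_request(struct sketch_req *req, int *free_cmds)
{
	if (req->flags & SKETCH_REQ_DONTPREP)
		return SKETCH_PREP_OK;
	return sketch_prep_fn(req, free_cmds);
}
```

Fetching the same request twice only prepares it once, which is the property early preparation relies on when the request cannot be issued immediately.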

I've hammered this in my usual set up, but other testers would be welcome.

James


[-- Attachment #2: tmp.diff --]
[-- Type: text/plain , Size: 11011 bytes --]

===== drivers/scsi/scsi.c 1.56 vs edited =====
--- 1.56/drivers/scsi/scsi.c	Wed Nov  6 15:12:50 2002
+++ edited/drivers/scsi/scsi.c	Wed Nov  6 15:30:27 2002
@@ -226,6 +226,8 @@
 
 	if (!SHpnt->use_clustering)
 		clear_bit(QUEUE_FLAG_CLUSTER, &q->queue_flags);
+
+        blk_queue_prep_rq(q, scsi_prep_fn);
 }
 
 #ifdef MODULE
===== drivers/scsi/scsi.h 1.34 vs edited =====
--- 1.34/drivers/scsi/scsi.h	Tue Nov  5 11:26:24 2002
+++ edited/drivers/scsi/scsi.h	Wed Nov  6 15:30:28 2002
@@ -455,6 +455,7 @@
 extern void scsi_io_completion(Scsi_Cmnd * SCpnt, int good_sectors,
 			       int block_sectors);
 extern void scsi_queue_next_request(request_queue_t * q, Scsi_Cmnd * SCpnt);
+extern int scsi_prep_fn(struct request_queue *q, struct request *req);
 extern void scsi_request_fn(request_queue_t * q);
 extern int scsi_starvation_completion(Scsi_Device * SDpnt);
 
===== drivers/scsi/scsi_lib.c 1.44 vs edited =====
--- 1.44/drivers/scsi/scsi_lib.c	Fri Nov  1 06:28:12 2002
+++ edited/drivers/scsi/scsi_lib.c	Wed Nov  6 16:11:13 2002
@@ -102,6 +102,13 @@
 {
 	request_queue_t *q = &SRpnt->sr_device->request_queue;
 
+	/* This is used to insert SRpnt specials.  Because users of
+	 * this function are apt to reuse requests with no modification,
+	 * we have to sanitise the request flags here
+	 */
+
+	SRpnt->sr_request->flags &= ~REQ_DONTPREP;
+
 	blk_insert_request(q, SRpnt->sr_request, at_head, SRpnt);
 	return 0;
 }
@@ -240,6 +247,12 @@
 		SCpnt->request->special = (void *) SCpnt;
 		if(blk_rq_tagged(SCpnt->request))
 			blk_queue_end_tag(q, SCpnt->request);
+		/* set REQ_SPECIAL - we have a command
+		 * clear REQ_DONTPREP - we assume the sg table has been 
+		 *	nuked so we need to set it up again.
+		 */
+		SCpnt->request->flags |= REQ_SPECIAL;
+		SCpnt->request->flags &= ~REQ_DONTPREP;
 		__elv_add_request(q, SCpnt->request, 0, 0);
 	}
 
@@ -741,7 +754,7 @@
 	SCpnt->use_sg = req->nr_phys_segments;
 
 	gfp_mask = GFP_NOIO;
-	if (in_interrupt()) {
+	if (likely(in_atomic())) {
 		gfp_mask &= ~__GFP_WAIT;
 		gfp_mask |= __GFP_HIGH;
 	}
@@ -788,6 +801,116 @@
 	return 0;
 }
 
+int scsi_prep_fn(struct request_queue *q, struct request *req)
+{
+	struct Scsi_Device_Template *STpnt;
+	Scsi_Cmnd *SCpnt;
+	Scsi_Device *SDpnt;
+
+	SDpnt = (Scsi_Device *) q->queuedata;
+	BUG_ON(!SDpnt);
+
+	/*
+	 * Find the actual device driver associated with this command.
+	 * The SPECIAL requests are things like character device or
+	 * ioctls, which did not originate from ll_rw_blk.  Note that
+	 * the special field is also used to indicate the SCpnt for
+	 * the remainder of a partially fulfilled request that can 
+	 * come up when there is a medium error.  We have to treat
+	 * these two cases differently.  We differentiate by looking
+	 * at request->cmd, as this tells us the real story.
+	 */
+	if (req->flags & REQ_SPECIAL) {
+		Scsi_Request *SRpnt;
+
+		STpnt = NULL;
+		SCpnt = (Scsi_Cmnd *) req->special;
+		SRpnt = (Scsi_Request *) req->special;
+		
+		if( SRpnt->sr_magic == SCSI_REQ_MAGIC ) {
+			SCpnt = scsi_allocate_device(SRpnt->sr_device, 
+						     FALSE, FALSE);
+			if (!SCpnt)
+				return BLKPREP_DEFER;
+			scsi_init_cmd_from_req(SCpnt, SRpnt);
+		}
+		
+	} else if (req->flags & (REQ_CMD | REQ_BLOCK_PC)) {
+		/*
+		 * Now try and find a command block that we can use.
+		 */
+		if (req->special) {
+				SCpnt = (Scsi_Cmnd *) req->special;
+		} else {
+			SCpnt = scsi_allocate_device(SDpnt, FALSE, FALSE);
+		}
+		/*
+		 * if command allocation failure, wait a bit
+		 */
+		if (unlikely(!SCpnt))
+			return BLKPREP_DEFER;
+		
+		/* pull a tag out of the request if we have one */
+		SCpnt->tag = req->tag;
+	} else {
+		blk_dump_rq_flags(req, "SCSI bad req");
+		return BLKPREP_KILL;
+	}
+	
+	/* note the overloading of req->special.  When the tag
+	 * is active it always means SCpnt.  If the tag goes
+	 * back for re-queueing, it may be reset */
+	req->special = SCpnt;
+	SCpnt->request = req;
+	
+	/*
+	 * FIXME: drop the lock here because the functions below
+	 * expect to be called without the queue lock held.  Also,
+	 * previously, we dequeued the request before dropping the
+	 * lock.  We hope REQ_STARTED prevents anything untoward from
+	 * happening now.
+	 */
+
+	if (req->flags & (REQ_CMD | REQ_BLOCK_PC)) {
+		/*
+		 * This will do a couple of things:
+		 *  1) Fill in the actual SCSI command.
+		 *  2) Fill in any other upper-level specific fields
+		 * (timeout).
+		 *
+		 * If this returns 0, it means that the request failed
+		 * (reading past end of disk, reading offline device,
+		 * etc).   This won't actually talk to the device, but
+		 * some kinds of consistency checking may cause the	
+		 * request to be rejected immediately.
+		 */
+		STpnt = scsi_get_request_dev(req);
+		BUG_ON(!STpnt);
+
+		/* 
+		 * This sets up the scatter-gather table (allocating if
+		 * required).
+		 */
+		if (!scsi_init_io(SCpnt)) {
+			/* Mark it as special --- We already have an
+			 * allocated command associated with it */
+			req->flags |= REQ_SPECIAL;
+			return BLKPREP_DEFER;
+		}
+		
+		/*
+		 * Initialize the actual SCSI command for this request.
+		 */
+		if (!STpnt->init_command(SCpnt)) {
+			scsi_release_buffers(SCpnt);
+			return BLKPREP_KILL;
+		}
+	}
+	/* The request is now prepped, no need to come back here */
+	req->flags |= REQ_DONTPREP;
+	return BLKPREP_OK;
+}
+
 /*
  * Function:    scsi_request_fn()
  *
@@ -811,10 +934,8 @@
 {
 	struct request *req;
 	Scsi_Cmnd *SCpnt;
-	Scsi_Request *SRpnt;
 	Scsi_Device *SDpnt;
 	struct Scsi_Host *SHpnt;
-	struct Scsi_Device_Template *STpnt;
 
 	ASSERT_LOCK(q->queue_lock, 1);
 
@@ -837,6 +958,14 @@
 		if (SHpnt->in_recovery || blk_queue_plugged(q))
 			return;
 
+		/*
+		 * get next queueable request.  We do this early to make sure
+		 * that the request is fully prepared even if we cannot 
+		 * accept it.  If there is no request, we'll detect this
+		 * lower down.
+		 */
+		req = elv_next_request(q);
+
 		if(SHpnt->host_busy == 0 && SHpnt->host_blocked) {
 			/* unblock after host_blocked iterates to zero */
 			if(--SHpnt->host_blocked == 0) {
@@ -888,141 +1017,40 @@
 		if (blk_queue_empty(q))
 			break;
 
-		/*
-		 * get next queueable request.
-		 */
-		req = elv_next_request(q);
-
-		/*
-		 * Find the actual device driver associated with this command.
-		 * The SPECIAL requests are things like character device or
-		 * ioctls, which did not originate from ll_rw_blk.  Note that
-		 * the special field is also used to indicate the SCpnt for
-		 * the remainder of a partially fulfilled request that can 
-		 * come up when there is a medium error.  We have to treat
-		 * these two cases differently.  We differentiate by looking
-		 * at request->cmd, as this tells us the real story.
-		 */
-		if (req->flags & REQ_SPECIAL) {
-			STpnt = NULL;
-			SCpnt = (Scsi_Cmnd *) req->special;
-			SRpnt = (Scsi_Request *) req->special;
-
-			if( SRpnt->sr_magic == SCSI_REQ_MAGIC ) {
-				SCpnt = scsi_allocate_device(SRpnt->sr_device, 
-							     FALSE, FALSE);
-				if (!SCpnt)
-					break;
-				scsi_init_cmd_from_req(SCpnt, SRpnt);
-			}
-
-		} else if (req->flags & (REQ_CMD | REQ_BLOCK_PC)) {
-			SRpnt = NULL;
-			STpnt = scsi_get_request_dev(req);
-			if (!STpnt) {
-				panic("Unable to find device associated with request");
-			}
-			/*
-			 * Now try and find a command block that we can use.
-			 */
-			if (req->special) {
-				SCpnt = (Scsi_Cmnd *) req->special;
-			} else {
-				SCpnt = scsi_allocate_device(SDpnt, FALSE, FALSE);
-			}
-			/*
-			 * If so, we are ready to do something.  Bump the count
-			 * while the queue is locked and then break out of the
-			 * loop. Otherwise loop around and try another request.
-			 */
-			if (!SCpnt)
-				break;
-
-			/* pull a tag out of the request if we have one */
-			SCpnt->tag = req->tag;
-		} else {
-			blk_dump_rq_flags(req, "SCSI bad req");
+		if(!req) {
+			/* can happen if the prep fails 
+			 * FIXME: elv_next_request() should be plugging the
+			 * queue */
+			blk_plug_device(q);
 			break;
 		}
 
-		/*
-		 * Now bump the usage count for both the host and the
-		 * device.
+		SCpnt = (struct scsi_cmnd *)req->special;
+
+		/* Should be impossible for a correctly prepared request
+		 * please mail the stack trace to linux-scsi@vger.kernel.org
 		 */
-		SHpnt->host_busy++;
-		SDpnt->device_busy++;
+		BUG_ON(!SCpnt);
 
 		/*
 		 * Finally, before we release the lock, we copy the
 		 * request to the command block, and remove the
-		 * request from the request list.   Note that we always
+		 * request from the request list.  Note that we always
 		 * operate on the queue head - there is absolutely no
-		 * reason to search the list, because all of the commands
-		 * in this queue are for the same device.
+		 * reason to search the list, because all of the
+		 * commands in this queue are for the same device.
 		 */
 		if(!(blk_queue_tagged(q) && (blk_queue_start_tag(q, req) == 0)))
 			blkdev_dequeue_request(req);
-
-		/* note the overloading of req->special.  When the tag
-		 * is active it always means SCpnt.  If the tag goes
-		 * back for re-queueing, it may be reset */
-		req->special = SCpnt;
-		SCpnt->request = req;
-
+	
 		/*
-		 * Now it is finally safe to release the lock.  We are
-		 * not going to noodle the request list until this
-		 * request has been queued and we loop back to queue
-		 * another.  
+		 * Now bump the usage count for both the host and the
+		 * device.
 		 */
-		req = NULL;
+		SHpnt->host_busy++;
+		SDpnt->device_busy++;
 		spin_unlock_irq(q->queue_lock);
 
-		if (!(SCpnt->request->flags & REQ_DONTPREP)
-		    && (SCpnt->request->flags & (REQ_CMD | REQ_BLOCK_PC))) {
-			/*
-			 * This will do a couple of things:
-			 *  1) Fill in the actual SCSI command.
-			 *  2) Fill in any other upper-level specific fields
-			 * (timeout).
-			 *
-			 * If this returns 0, it means that the request failed
-			 * (reading past end of disk, reading offline device,
-			 * etc).   This won't actually talk to the device, but
-			 * some kinds of consistency checking may cause the	
-			 * request to be rejected immediately.
-			 */
-			if (STpnt == NULL)
-				STpnt = scsi_get_request_dev(SCpnt->request);
-
-			/* 
-			 * This sets up the scatter-gather table (allocating if
-			 * required).
-			 */
-			if (!scsi_init_io(SCpnt)) {
-				scsi_mlqueue_insert(SCpnt, SCSI_MLQUEUE_DEVICE_BUSY);
-				spin_lock_irq(q->queue_lock);
-				break;
-			}
-
-			/*
-			 * Initialize the actual SCSI command for this request.
-			 */
-			if (!STpnt->init_command(SCpnt)) {
-				scsi_release_buffers(SCpnt);
-				SCpnt = __scsi_end_request(SCpnt, 0, 
-							   SCpnt->request->nr_sectors, 0, 0);
-				if( SCpnt != NULL )
-				{
-					panic("Should not have leftover blocks\n");
-				}
-				spin_lock_irq(q->queue_lock);
-				SHpnt->host_busy--;
-				SDpnt->device_busy--;
-				continue;
-			}
-		}
-		SCpnt->request->flags |= REQ_DONTPREP;
 		/*
 		 * Finally, initialize any error handling parameters, and set up
 		 * the timers for timeouts.
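
The control flow of the new prep function can be summarised in a small user-space model. This is only a sketch under stated assumptions: every toy_ and TOY_ name is hypothetical, the two boolean parameters stand in for the results of scsi_init_io() and the upper-level init_command(), and the flag values are illustrative rather than the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for BLKPREP_OK / BLKPREP_KILL / BLKPREP_DEFER. */
enum toy_prep_result { TOY_PREP_OK, TOY_PREP_KILL, TOY_PREP_DEFER };

/* Hypothetical flag bits; the real REQ_* values differ. */
#define TOY_REQ_CMD      (1u << 0)
#define TOY_REQ_SPECIAL  (1u << 1)
#define TOY_REQ_DONTPREP (1u << 2)

struct toy_request {
	unsigned int flags;
	void *special;		/* points at the command once prepared */
};

struct toy_cmd {
	struct toy_request *request;
};

/*
 * Model of the tail of the prep function shown in the patch.
 * sg_ok and init_ok are assumptions standing in for the results of
 * scsi_init_io() and init_command() respectively.
 */
static enum toy_prep_result toy_prep(struct toy_request *req,
				     struct toy_cmd *cmd,
				     bool sg_ok, bool init_ok)
{
	assert(req != NULL && cmd != NULL);

	/* Tie request and command together before anything can fail. */
	req->special = cmd;
	cmd->request = req;

	if (req->flags & TOY_REQ_CMD) {
		if (!sg_ok) {
			/* Command already allocated: mark it and retry later. */
			req->flags |= TOY_REQ_SPECIAL;
			return TOY_PREP_DEFER;
		}
		if (!init_ok)
			return TOY_PREP_KILL;	/* request is rejected outright */
	}

	/* Fully prepared: no need to come back here. */
	req->flags |= TOY_REQ_DONTPREP;
	return TOY_PREP_OK;
}
```

In this model a deferred request keeps its command through the special pointer, so a later pass does not need to allocate a second one.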


* Re: [PATCH] add request prep functions to SCSI
  2002-11-06 22:18 [PATCH] add request prep functions to SCSI J.E.J. Bottomley
@ 2002-11-06 23:16 ` Doug Ledford
  2002-11-06 23:43   ` J.E.J. Bottomley
  2002-11-07 21:45 ` Mike Anderson
  1 sibling, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-11-06 23:16 UTC (permalink / raw)
  To: J.E.J. Bottomley; +Cc: linux-scsi

On Wed, Nov 06, 2002 at 05:18:02PM -0500, J.E.J. Bottomley wrote:
> This patch adds request prep functions to the mid-layer.  At the moment, its a single request prep function for all of SCSI.  I've altered the logic in scsi_request_fn so that we now do early preparation (this should improve throughput slightly in the untagged case with only a single command block).
> 

Hmmm...the current pace of scsi patches is starting to get hard to keep up 
with.  Do we have a unified bk tree with all the patches we want Linus to 
apply in it as of yet?  I know a lot of stuff has come through just in the 
last day and I'm starting to see a lot of areas where there is going to be 
conflict between patches going forward.


-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  


* Re: [PATCH] add request prep functions to SCSI
  2002-11-06 23:16 ` Doug Ledford
@ 2002-11-06 23:43   ` J.E.J. Bottomley
  0 siblings, 0 replies; 297+ messages in thread
From: J.E.J. Bottomley @ 2002-11-06 23:43 UTC (permalink / raw)
  To: linux-scsi

dledford@redhat.com said:
> Hmmm...the current pace of scsi patches is starting to get hard to
> keep up  with.  Do we have a unified bk tree with all the patches we
> want Linus to  apply in it as of yet?  I know a lot of stuff has come
> through just in the  last day and I'm starting to see a lot of areas
> where there is going to be  conflict between patches going forward. 

I'm just getting ready to add Patrick's patch to scsi-for-linus-2.5 and bring 
it up to date with the BK current.  Should be about another hour (depending on 
how long it takes me to compile and test)

James




* Re: [PATCH] add request prep functions to SCSI
  2002-11-06 22:18 [PATCH] add request prep functions to SCSI J.E.J. Bottomley
  2002-11-06 23:16 ` Doug Ledford
@ 2002-11-07 21:45 ` Mike Anderson
  1 sibling, 0 replies; 297+ messages in thread
From: Mike Anderson @ 2002-11-07 21:45 UTC (permalink / raw)
  To: J.E.J. Bottomley; +Cc: linux-scsi

J.E.J. Bottomley [James.Bottomley@steeleye.com] wrote:
> This patch adds request prep functions to the mid-layer.  At the moment, its a single request prep function for all of SCSI.  I've altered the logic in scsi_request_fn so that we now do early preparation (this should improve throughput slightly in the untagged case with only a single command block).
> 
> The prep function also cannot drop the queue lock, so the calling assumptions for scsi_init_io and the upper layer driver init_commands have changed to be that the lock is now held and they cannot drop it.  I think this means that we have no callers of scsi_init_io that aren't atomic, so perhaps I can just take the if out.
> 
> I've hammered this in my usual set up, but other testers would be welcome.
> 
> James
> 

The patch looks good.

I ran the patch on scsi-for-linus-2.5 plus my sysfs patches. I have
booted and run some dd-based I/O.

I have the following drivers loaded:
	isp, aic7xxx, isp1020, scsi_debug, qla2300

Hopefully later today I can generate an error in scsi_debug and get a
requeue. I have generated a couple of port down/ups on the fibre, but this
is mainly being handled inside the LLD.

-andmike
--
Michael Anderson
andmike@us.ibm.com



* [RFC][PATCH] move dma_mask into struct device
@ 2002-11-15 20:34 J.E.J. Bottomley
  2002-11-16  0:19 ` Mike Anderson
  2002-11-16 20:33 ` Patrick Mansfield
  0 siblings, 2 replies; 297+ messages in thread
From: J.E.J. Bottomley @ 2002-11-15 20:34 UTC (permalink / raw)
  To: linux-kernel, linux-scsi; +Cc: grundler, willy

[-- Attachment #1: Type: text/plain, Size: 1216 bytes --]

Attached is a patch which moves dma_mask into struct device and cleans up the 
scsi mid-layer to use it (instead of using struct pci_dev).  The advantage to 
doing this is probably most apparent on non-pci bus architectures where 
currently you have to construct a fake pci_dev just so you can get the bounce 
buffers to work correctly.

The patch tries to perturb the minimum amount of code, so dma_mask in struct 
device is simply a pointer to the one in pci_dev.  However, it will make it 
easy for me now to add generic device to MCA without having to go the fake pci 
route.
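
As an illustration of the pointer arrangement described above, here is a minimal user-space sketch. It is not the kernel code: the toy_ structures and functions are hypothetical, and the fallback parameter is an assumption standing in for whatever limit the caller would otherwise use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the generic device: it only borrows a mask. */
struct toy_device {
	uint64_t *dma_mask;	/* NULL if the device cannot do DMA */
};

/* Toy stand-in for a bus-specific device that owns the mask. */
struct toy_pci_dev {
	uint64_t dma_mask;
	struct toy_device dev;
};

/* Point the generic device at the mask owned by the bus device. */
static void toy_pci_register(struct toy_pci_dev *pdev)
{
	assert(pdev != NULL);
	pdev->dev.dma_mask = &pdev->dma_mask;
}

/*
 * Bus-independent consumer: it never needs to know which bus the
 * device sits on, only whether a mask is present.
 */
static uint64_t toy_bounce_limit(const struct toy_device *dev,
				 uint64_t fallback)
{
	if (dev && dev->dma_mask)
		return *dev->dma_mask;
	return fallback;
}
```

Because the generic device holds a pointer rather than a copy, a later change to the bus device's mask is visible to the consumer without any extra bookkeeping.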

This patch completely removes knowledge of pci devices from the SCSI mid-layer.

I have compiled and tested this, but obviously, since I have an MCA machine, 
it's not of much value to the pci code changes, so if someone with a PCI 
machine could check those out, I'd be grateful.

The main problem SCSI has with this is scsi_ioctl_get_pci, which is used to
get the pci slot name, although this can be fixed up afterwards with Matthew
Wilcox's name change patch.

I'd like to see this as the beginning of a move away from bus specific code to 
using generic device code (where possible).  Comments and feedback welcome.

James


[-- Attachment #2: tmp.diff --]
[-- Type: text/plain , Size: 3432 bytes --]

===== drivers/pci/probe.c 1.17 vs edited =====
--- 1.17/drivers/pci/probe.c	Fri Nov  1 12:33:02 2002
+++ edited/drivers/pci/probe.c	Fri Nov 15 14:00:46 2002
@@ -449,6 +449,7 @@
 	/* now put in global tree */
 	strcpy(dev->dev.name,dev->name);
 	strcpy(dev->dev.bus_id,dev->slot_name);
+	dev->dev->dma_mask = &dev->dma_mask;
 
 	device_register(&dev->dev);
 	return dev;
===== drivers/scsi/hosts.h 1.36 vs edited =====
--- 1.36/drivers/scsi/hosts.h	Thu Nov 14 13:07:27 2002
+++ edited/drivers/scsi/hosts.h	Fri Nov 15 14:43:59 2002
@@ -468,10 +468,10 @@
     unsigned int max_host_blocked;
 
     /*
-     * For SCSI hosts which are PCI devices, set pci_dev so that
-     * we can do BIOS EDD 3.0 mappings
+     * This is a pointer to the generic device for this host (i.e. the
+     * device on the bus);
      */
-    struct pci_dev *pci_dev;
+    struct device *dev;
 
     /* 
      * Support for driverfs filesystem
@@ -521,11 +521,17 @@
 	shost->host_lock = lock;
 }
 
+static inline void scsi_set_device(struct Scsi_Host *shost,
+                                   struct device *dev)
+{
+        shost->dev = dev;
+        shost->host_driverfs_dev.parent = dev;
+}
+
 static inline void scsi_set_pci_device(struct Scsi_Host *shost,
                                        struct pci_dev *pdev)
 {
-	shost->pci_dev = pdev;
-	shost->host_driverfs_dev.parent=&pdev->dev;
+        scsi_set_device(shost, &pdev->dev);
 }
 
 
===== drivers/scsi/scsi_ioctl.c 1.12 vs edited =====
--- 1.12/drivers/scsi/scsi_ioctl.c	Thu Oct 17 13:52:39 2002
+++ edited/drivers/scsi/scsi_ioctl.c	Fri Nov 15 14:08:19 2002
@@ -396,9 +396,9 @@
 scsi_ioctl_get_pci(Scsi_Device * dev, void *arg)
 {
 
-        if (!dev->host->pci_dev) return -ENXIO;
-        return copy_to_user(arg, dev->host->pci_dev->slot_name,
-                            sizeof(dev->host->pci_dev->slot_name));
+        if (!dev->host->dev) return -ENXIO;
+        return copy_to_user(arg, dev->host->dev->name,
+                            sizeof(dev->host->dev->name));
 }
 
 
===== drivers/scsi/scsi_scan.c 1.35 vs edited =====
--- 1.35/drivers/scsi/scsi_scan.c	Thu Nov 14 12:34:35 2002
+++ edited/drivers/scsi/scsi_scan.c	Fri Nov 15 14:25:41 2002
@@ -436,8 +436,8 @@
 	u64 bounce_limit;
 
 	if (sh->highmem_io) {
-		if (sh->pci_dev && PCI_DMA_BUS_IS_PHYS) {
-			bounce_limit = sh->pci_dev->dma_mask;
+		if (sh->dev && sh->dev->dma_mask && PCI_DMA_BUS_IS_PHYS) {
+			bounce_limit = *sh->dev->dma_mask;
 		} else {
 			/*
 			 * Platforms with virtual-DMA translation
===== drivers/scsi/st.c 1.42 vs edited =====
--- 1.42/drivers/scsi/st.c	Mon Nov 11 03:32:34 2002
+++ edited/drivers/scsi/st.c	Fri Nov 15 14:25:58 2002
@@ -3786,8 +3786,8 @@
 			 * hardware have no practical limit.
 			 */
 			bounce_limit = BLK_BOUNCE_ANY;
-		else if (SDp->host->pci_dev)
-			bounce_limit = SDp->host->pci_dev->dma_mask;
+		else if (SDp->host->dev && SDp->host->dev->dma_mask)
+			bounce_limit = *SDp->host->dev->dma_mask;
 	} else if (SDp->host->unchecked_isa_dma)
 		bounce_limit = BLK_BOUNCE_ISA;
 	bounce_limit >>= PAGE_SHIFT;
===== include/linux/device.h 1.58 vs edited =====
--- 1.58/include/linux/device.h	Thu Oct 31 15:25:58 2002
+++ edited/include/linux/device.h	Fri Nov 15 13:52:55 2002
@@ -300,6 +300,7 @@
 					   being off. */
 
 	unsigned char *saved_state;	/* saved device state */
+	u64		*dma_mask;	/* dma mask (if dma'able device) */
 
 	void	(*release)(struct device * dev);
 };


* Re: [RFC][PATCH] move dma_mask into struct device
  2002-11-15 20:34 [RFC][PATCH] move dma_mask into struct device J.E.J. Bottomley
@ 2002-11-16  0:19 ` Mike Anderson
  2002-11-16 14:48   ` J.E.J. Bottomley
  2002-11-16 20:33 ` Patrick Mansfield
  1 sibling, 1 reply; 297+ messages in thread
From: Mike Anderson @ 2002-11-16  0:19 UTC (permalink / raw)
  To: J.E.J. Bottomley; +Cc: linux-kernel, linux-scsi, grundler, willy

J.E.J. Bottomley [James.Bottomley@steeleye.com] wrote:

> ===== drivers/pci/probe.c 1.17 vs edited =====
> --- 1.17/drivers/pci/probe.c	Fri Nov  1 12:33:02 2002
> +++ edited/drivers/pci/probe.c	Fri Nov 15 14:00:46 2002
> @@ -449,6 +449,7 @@
>  	/* now put in global tree */
>  	strcpy(dev->dev.name,dev->name);
>  	strcpy(dev->dev.bus_id,dev->slot_name);
> +	dev->dev->dma_mask = &dev->dma_mask;

I got a compile error here. This should be:
	dev->dev.dma_mask = &dev->dma_mask;

I did not have a current bk handy on the lab machine, but I ran it on
a 2.5.47 view with a few offset warnings.

The machine is a 2x pci system with the following drivers loaded:
	aic7xxx, ips, qlogicisp.

It just booted and ran a little IO.

-andmike
--
Michael Anderson
andmike@us.ibm.com


* Re: [RFC][PATCH] move dma_mask into struct device
  2002-11-16  0:19 ` Mike Anderson
@ 2002-11-16 14:48   ` J.E.J. Bottomley
  0 siblings, 0 replies; 297+ messages in thread
From: J.E.J. Bottomley @ 2002-11-16 14:48 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, grundler, willy

andmike@us.ibm.com said:
> I got a compile error here. This should be.
> 	dev->dev.dma_mask = &dev->dma_mask; 

Oops, yes, that's what comes of making changes in code you don't compile.

> The machine is a 2x pci systems with the following drivers loaded:
> 	aic7xxx, ips, qlogicisp. 

Thanks for testing it.

James




* Re: [RFC][PATCH] move dma_mask into struct device
  2002-11-15 20:34 [RFC][PATCH] move dma_mask into struct device J.E.J. Bottomley
  2002-11-16  0:19 ` Mike Anderson
@ 2002-11-16 20:33 ` Patrick Mansfield
  2002-11-17 15:07   ` J.E.J. Bottomley
  1 sibling, 1 reply; 297+ messages in thread
From: Patrick Mansfield @ 2002-11-16 20:33 UTC (permalink / raw)
  To: J.E.J. Bottomley; +Cc: linux-kernel, linux-scsi, grundler, willy


On Fri, Nov 15, 2002 at 03:34:12PM -0500, J.E.J. Bottomley wrote:
> --- 1.36/drivers/scsi/hosts.h	Thu Nov 14 13:07:27 2002
> +++ edited/drivers/scsi/hosts.h	Fri Nov 15 14:43:59 2002
> @@ -468,10 +468,10 @@
>      unsigned int max_host_blocked;
>  
>      /*
> -     * For SCSI hosts which are PCI devices, set pci_dev so that
> -     * we can do BIOS EDD 3.0 mappings
> +     * This is a pointer to the generic device for this host (i.e. the
> +     * device on the bus);
>       */
> -    struct pci_dev *pci_dev;
> +    struct device *dev;
>  
>      /* 
>       * Support for driverfs filesystem
> @@ -521,11 +521,17 @@
>  	shost->host_lock = lock;
>  }
>  
> +static inline void scsi_set_device(struct Scsi_Host *shost,
> +                                   struct device *dev)
> +{
> +        shost->dev = dev;
> +        shost->host_driverfs_dev.parent = dev;
> +}
> +
>  static inline void scsi_set_pci_device(struct Scsi_Host *shost,
>                                         struct pci_dev *pdev)
>  {
> -	shost->pci_dev = pdev;
> -	shost->host_driverfs_dev.parent=&pdev->dev;
> +        scsi_set_device(shost, &pdev->dev);
>  }

Can shost->host_driverfs_dev.parent be used instead of adding and
using a duplicate shost->dev?

-- Patrick Mansfield


* Re: [RFC][PATCH] move dma_mask into struct device
  2002-11-16 20:33 ` Patrick Mansfield
@ 2002-11-17 15:07   ` J.E.J. Bottomley
  0 siblings, 0 replies; 297+ messages in thread
From: J.E.J. Bottomley @ 2002-11-17 15:07 UTC (permalink / raw)
  To: Patrick Mansfield
  Cc: J.E.J. Bottomley, linux-kernel, linux-scsi, grundler, willy

patmans@us.ibm.com said:
> Can shost->host_driverfs_dev.parent be used instead of adding and
> using a duplicate shost->dev? 

I think so. I believe the parent is always the device we're looking for.  I'll 
make the fix.
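
One way the suggestion could look, purely as a sketch: the toy_ types and the accessor name toy_host_bus_device() are hypothetical and not part of any posted patch. The only field name taken from the thread is host_driverfs_dev.parent.

```c
#include <assert.h>
#include <stddef.h>

/* Toy generic device with only the field this sketch needs. */
struct toy_device {
	struct toy_device *parent;
};

/* Toy host: no separate dev pointer, only the embedded device. */
struct toy_host {
	struct toy_device host_driverfs_dev;
};

/* Record the bus device once, as the parent of the embedded device. */
static void toy_set_device(struct toy_host *shost, struct toy_device *dev)
{
	assert(shost != NULL);
	shost->host_driverfs_dev.parent = dev;
}

/* Hypothetical accessor replacing direct reads of a duplicate pointer. */
static struct toy_device *toy_host_bus_device(const struct toy_host *shost)
{
	assert(shost != NULL);
	return shost->host_driverfs_dev.parent;
}
```

Keeping a single pointer means there is only one place to set, and nothing that can drift out of sync.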

James




* [PATCH] removel useless mod use count manipulation
@ 2002-11-16 19:40 Christoph Hellwig
  2002-11-17  2:59 ` Doug Ledford
  2002-11-17 12:40 ` Douglas Gilbert
  0 siblings, 2 replies; 297+ messages in thread
From: Christoph Hellwig @ 2002-11-16 19:40 UTC (permalink / raw)
  To: James.Bottomley; +Cc: linux-scsi

There's a bunch of useless mod usecount handling in scsi:

o MOD_INC_USE_COUNT/MOD_DEC_USE_COUNT in exported functions that
  are only used outside this module - a module that has exported
  symbols used by other modules can't be unloaded at all
o GET_USE_COUNT checks in scsi_unregister_device/scsi_unregister_host -
  the scsi layer shouldn't care about the mod use count of the modules
  implementing drivers, and as they are usually called from the mod
  unload handler the check is totally pointless (if it ever triggers
  the scsi midlayer could try to scramble at freed memory).  Also
  GET_USE_COUNT is defunct with the new module loader
o lacking ->owner in megaraid's chardev leading to junk
o double-increment of the module's use count in upper layer drivers
  (chardev/bdev open already gets a reference on ->owner for us)
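
The last two points can be illustrated with a small user-space model. It is a sketch only: the toy_ names are hypothetical, and the model merely assumes, as stated above, that the generic open path takes a reference on the owner before calling into the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Toy module with a bare use counter. */
struct toy_module {
	int use_count;
};

/* Toy file_operations: owner plus an optional driver open hook. */
struct toy_fops {
	struct toy_module *owner;
	int (*open)(void);
};

/*
 * Model of the generic open path: the core pins the owning module
 * first, so the driver's own open hook has nothing left to count.
 */
static int toy_core_open(const struct toy_fops *fops)
{
	assert(fops != NULL);
	if (fops->owner)
		fops->owner->use_count++;
	if (fops->open) {
		int err = fops->open();
		if (err) {
			/* Undo the reference if the driver refuses the open. */
			if (fops->owner)
				fops->owner->use_count--;
			return err;
		}
	}
	return 0;
}

/* Model of the generic release path: drop the core's reference. */
static void toy_core_release(const struct toy_fops *fops)
{
	assert(fops != NULL);
	if (fops->owner)
		fops->owner->use_count--;
}
```

With the owner set, a driver whose open hook did nothing but bump the count can drop the hook entirely, which is what the megaraid hunk below does. A driver that sets the owner and also increments in its own open ends up counting each open twice.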

I also had a patch to get the increment on the host driver module
in upper layer ->open right, but as Rusty's new loader changed the
interface for that I'll submit that one once the new loader appears
in the scsi tree.

===== drivers/scsi/NCR53C9x.c 1.12 vs edited =====
--- 1.12/drivers/scsi/NCR53C9x.c	Wed Oct  9 12:20:19 2002
+++ edited/drivers/scsi/NCR53C9x.c	Sat Nov 16 19:16:14 2002
@@ -732,9 +732,6 @@
 	/* Reset the thing before we try anything... */
 	esp_bootup_reset(esp, eregs);
 
-#ifdef MODULE
-	MOD_INC_USE_COUNT;
-#endif
 	esps_in_use++;
 }
 
@@ -3638,7 +3635,6 @@
 void cleanup_module(void) {}
 void esp_release(void)
 {
-	MOD_DEC_USE_COUNT;
 	esps_in_use--;
 	esps_running = esps_in_use;
 }
===== drivers/scsi/hosts.c 1.27 vs edited =====
--- 1.27/drivers/scsi/hosts.c	Thu Nov 14 13:19:04 2002
+++ edited/drivers/scsi/hosts.c	Sat Nov 16 19:14:08 2002
@@ -220,12 +220,6 @@
 	struct scsi_cmnd *scmd;
 
 	/*
-	 * Current policy is all shosts go away on unregister.
-	 */
-	if (shost->hostt->module && GET_USE_COUNT(shost->hostt->module))
-		return 1;
-
-	/*
 	 * FIXME Do ref counting.  We force all of the devices offline to
 	 * help prevent race conditions where other hosts/processors could
 	 * try and get in and queue a command.
@@ -520,8 +514,6 @@
 
 	cur_cnt = scsi_hosts_registered;
 
-	MOD_INC_USE_COUNT;
-
 	/*
 	 * The detect routine must carefully spinunlock/spinlock if it
 	 * enables interrupts, since all interrupt handlers do spinlock as
@@ -604,8 +596,6 @@
 	if (pcount != scsi_hosts_registered)
 		printk(KERN_INFO "scsi : %d host%s left.\n", scsi_hosts_registered,
 		       (scsi_hosts_registered == 1) ? "" : "s");
-
-	MOD_DEC_USE_COUNT;
 
 	unlock_kernel();
 	return 0;
===== drivers/scsi/megaraid.c 1.27 vs edited =====
--- 1.27/drivers/scsi/megaraid.c	Thu Oct 31 23:59:07 2002
+++ edited/drivers/scsi/megaraid.c	Sat Nov 16 19:15:23 2002
@@ -760,9 +760,8 @@
 /* For controller re-ordering */ 
 
 static struct file_operations megadev_fops = {
-	ioctl:megadev_ioctl_entry,
-	open:megadev_open,
-	release:megadev_close,
+	.owner		= THIS_MODULE,
+	.ioctl		= megadev_ioctl_entry,
 };
 
 /*
@@ -4333,15 +4332,6 @@
 	}
 }
 
-/*
- * Routines for the character/ioctl interface to the driver
- */
-static int megadev_open (struct inode *inode, struct file *filep)
-{
-	MOD_INC_USE_COUNT;
-	return 0;		/* success */
-}
-
 static int megadev_ioctl_entry (struct inode *inode, struct file *filep,
 		     unsigned int cmd, unsigned long arg)
 {
@@ -4851,16 +4841,6 @@
 
 	return scb;
 }
-
-static int
-megadev_close (struct inode *inode, struct file *filep)
-{
-#ifdef MODULE
-	MOD_DEC_USE_COUNT;
-#endif
-	return 0;
-}
-
 
 static int
 mega_support_ext_cdb(mega_host_config *this_hba)
===== drivers/scsi/osst.c 1.27 vs edited =====
--- 1.27/drivers/scsi/osst.c	Mon Nov 11 03:32:34 2002
+++ edited/drivers/scsi/osst.c	Sat Nov 16 19:20:34 2002
@@ -4176,8 +4176,6 @@
 
 	if (STp->device->host->hostt->module)
 		 __MOD_INC_USE_COUNT(STp->device->host->hostt->module);
-	if (osst_template.module)
-		 __MOD_INC_USE_COUNT(osst_template.module);
 	STp->device->access_count++;
 
 	if (mode != STp->current_mode) {
@@ -4520,8 +4518,6 @@
 
 	if (STp->device->host->hostt->module)
 	    __MOD_DEC_USE_COUNT(STp->device->host->hostt->module);
-	if (osst_template.module)
-	    __MOD_DEC_USE_COUNT(osst_template.module);
 
 	return retval;
 }
@@ -4651,8 +4647,6 @@
 
 	if (STp->device->host->hostt->module)
 		__MOD_DEC_USE_COUNT(STp->device->host->hostt->module);
-	if(osst_template.module)
-		__MOD_DEC_USE_COUNT(osst_template.module);
 
 	return result;
 }
===== drivers/scsi/scsi.c 1.61 vs edited =====
--- 1.61/drivers/scsi/scsi.c	Thu Nov 14 13:19:04 2002
+++ edited/drivers/scsi/scsi.c	Sat Nov 16 19:08:34 2002
@@ -2404,8 +2404,6 @@
 		}
 	}
 
-	MOD_INC_USE_COUNT;
-
 	if (out_of_space) {
 		scsi_unregister_device(tpnt);	/* easiest way to clean up?? */
 		return 1;
@@ -2422,12 +2420,6 @@
 	struct Scsi_Device_Template *prev_spnt;
 	
 	lock_kernel();
-	/*
-	 * If we are busy, this is not going to fly.
-	 */
-	if (GET_USE_COUNT(tpnt->module) != 0)
-		goto error_out;
-
 	driver_unregister(&tpnt->scsi_driverfs_driver);
 
 	/*
@@ -2458,16 +2450,12 @@
 		prev_spnt->next = spnt->next;
 	up_write(&scsi_devicelist_mutex);
 
-	MOD_DEC_USE_COUNT;
 	unlock_kernel();
 	/*
 	 * Final cleanup for the driver is done in the driver sources in the
 	 * cleanup function.
 	 */
 	return 0;
-error_out:
-	unlock_kernel();
-	return -1;
 }
 
 #ifdef CONFIG_PROC_FS
===== drivers/scsi/sd.c 1.91 vs edited =====
--- 1.91/drivers/scsi/sd.c	Sat Nov  9 18:48:20 2002
+++ edited/drivers/scsi/sd.c	Sat Nov 16 19:19:05 2002
@@ -453,8 +453,6 @@
 	 */
 	if (sdp->host->hostt->module)
 		__MOD_INC_USE_COUNT(sdp->host->hostt->module);
-	if (sd_template.module)
-		__MOD_INC_USE_COUNT(sd_template.module);
 	sdp->access_count++;
 
 	if (sdp->removable) {
@@ -498,8 +496,6 @@
 	sdp->access_count--;
 	if (sdp->host->hostt->module)
 		__MOD_DEC_USE_COUNT(sdp->host->hostt->module);
-	if (sd_template.module)
-		__MOD_DEC_USE_COUNT(sd_template.module);
 	return retval;	
 }
 
@@ -536,8 +532,6 @@
 	}
 	if (sdp->host->hostt->module)
 		__MOD_DEC_USE_COUNT(sdp->host->hostt->module);
-	if (sd_template.module)
-		__MOD_DEC_USE_COUNT(sd_template.module);
 
 	return 0;
 }
===== drivers/scsi/sg.c 1.39 vs edited =====
--- 1.39/drivers/scsi/sg.c	Wed Nov  6 21:06:38 2002
+++ edited/drivers/scsi/sg.c	Sat Nov 16 19:20:02 2002
@@ -1304,8 +1304,6 @@
 				if (sdp->device->host->hostt->module)
 					__MOD_DEC_USE_COUNT(sdp->device->host->hostt->module);
 			}
-			if (sg_template.module)
-				__MOD_DEC_USE_COUNT(sg_template.module);
 			sfp = NULL;
 		}
 	} else if (srp && srp->orphan) {
@@ -1529,8 +1527,6 @@
 				}
 				if (sfp->closed) {
 					sdp->device->access_count--;
-					if (sg_template.module)
-						__MOD_DEC_USE_COUNT(sg_template.module);
 					if (sdp->device->host->hostt->module)
 						__MOD_DEC_USE_COUNT(
 							sdp->device->host->
@@ -2522,8 +2518,6 @@
 		sfp->closed = 1;	/* flag dirty state on this fd */
 		sdp->device->access_count++;
 		/* MOD_INC's to inhibit unloading sg and associated adapter driver */
-		if (sg_template.module)
-			__MOD_INC_USE_COUNT(sg_template.module);
 		if (sdp->device->host->hostt->module)
 			__MOD_INC_USE_COUNT(sdp->device->host->hostt->module);
 		SCSI_LOG_TIMEOUT(1, printk("sg_remove_sfp: worrisome, %d writes pending\n",
===== drivers/scsi/sr.c 1.65 vs edited =====
--- 1.65/drivers/scsi/sr.c	Wed Nov  6 21:04:59 2002
+++ edited/drivers/scsi/sr.c	Sat Nov 16 19:20:31 2002
@@ -130,8 +130,6 @@
 	cd->device->access_count--;
 	if (cd->device->host->hostt->module)
 		__MOD_DEC_USE_COUNT(cd->device->host->hostt->module);
-	if (sr_template.module)
-		__MOD_DEC_USE_COUNT(sr_template.module);
 }
 
 static struct cdrom_device_ops sr_dops = {
@@ -473,8 +471,6 @@
 	cd->device->access_count++;
 	if (cd->device->host->hostt->module)
 		__MOD_INC_USE_COUNT(cd->device->host->hostt->module);
-	if (sr_template.module)
-		__MOD_INC_USE_COUNT(sr_template.module);
 
 	/* If this device did not have media in the drive at boot time, then
 	 * we would have been unable to get the sector size.  Check to see if
===== drivers/scsi/wd33c93.c 1.5 vs edited =====
--- 1.5/drivers/scsi/wd33c93.c	Wed Oct 16 18:51:48 2002
+++ edited/drivers/scsi/wd33c93.c	Sat Nov 16 19:13:37 2002
@@ -1853,7 +1853,6 @@
    printk("\n");
    printk("           Version %s - %s, Compiled %s at %s\n",
                WD33C93_VERSION,WD33C93_DATE,__DATE__,__TIME__);
-   MOD_INC_USE_COUNT;
 }
 
 
@@ -2031,7 +2030,6 @@
 #endif
 void wd33c93_release(void)
 {
-   MOD_DEC_USE_COUNT;
 }
 
 MODULE_LICENSE("GPL");


* Re: [PATCH] removel useless mod use count manipulation
  2002-11-16 19:40 [PATCH] removel useless mod use count manipulation Christoph Hellwig
@ 2002-11-17  2:59 ` Doug Ledford
  2002-11-17 17:31   ` J.E.J. Bottomley
  2002-11-17 12:40 ` Douglas Gilbert
  1 sibling, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-11-17  2:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: James.Bottomley, linux-scsi

On Sat, Nov 16, 2002 at 08:40:09PM +0100, Christoph Hellwig wrote:
> There's a bunch of useless mod usecount handling in scsi:
> 
> o MOD_INC_USE_COUNT/MOD_DEC_USE_COUNT in exported functions that
>   are only used outside this module - a module that has exported
>   symbols use by other modules can't be unloaded at all
> o GET_USE_COUNT checks in scsi_unregister_device/scsi_unregister_host -
>   the scsi layer shouldn't care about the mod use count of the modules
>   implementing drivers, and as they are usually called from the mod
>   unload handler the check is totally pointless (if it ever triggers
>   the scsi midlayer could try to scramble at freed memory).  Also
>   GET_USE_COUNT is disfunct with the new module loader
> o lacking ->owner in megaraid's chardev leading to junk
> o double-increment of the modules use count in upper layer drivers
>   (chardev/bdev open already gets a reference on ->owner for us)
> 
> I also had a patch to get the increment on the host driver module
> in upper layer ->open right, but as Rusty's new loader changed the
> interface for that I'll submit that one once the new loader appears
> in the scsi tree.

[ snipped patch ]

This patch conflicts with one I had internally that I was still working 
on.  I had a bit more stuff in mine than you did in yours so I folded your 
changes into my stuff.  When I push my tree out it will be in there 
(otherwise it would have required hand merging by someone else anyway).  I 
just gotta wait till I go back up to the office and restart my test box in 
order to push the changes out :-/  (shutdown -nr segfaulted on me)


-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  


* Re: [PATCH] removel useless mod use count manipulation
  2002-11-17  2:59 ` Doug Ledford
@ 2002-11-17 17:31   ` J.E.J. Bottomley
  2002-11-17 18:14     ` Doug Ledford
  0 siblings, 1 reply; 297+ messages in thread
From: J.E.J. Bottomley @ 2002-11-17 17:31 UTC (permalink / raw)
  To: Christoph Hellwig, linux-scsi

> This patch conflicts with one I had internally that I was still
> working  on.  I had a bit more stuff in mine than you did in yours so
> I folded your  changes into my stuff.  When I push my tree out it will
> be in there  (otherwise it would have required hand merging by someone
> else anyway).  I  just gotta wait till I go back up to the office and
> restart my test box in  order to push the changes out :-/  (shutdown
> -nr segfaulted on me) 

OK, I'll back this one out of the scsi-misc-2.5 tree I have internally before 
putting it out on the bkbits site.

James




* Re: [PATCH] removel useless mod use count manipulation
  2002-11-17 17:31   ` J.E.J. Bottomley
@ 2002-11-17 18:14     ` Doug Ledford
  0 siblings, 0 replies; 297+ messages in thread
From: Doug Ledford @ 2002-11-17 18:14 UTC (permalink / raw)
  To: J.E.J. Bottomley; +Cc: Christoph Hellwig, linux-scsi

On Sun, Nov 17, 2002 at 11:31:17AM -0600, J.E.J. Bottomley wrote:
> > This patch conflicts with one I had internally that I was still
> > working  on.  I had a bit more stuff in mine than you did in yours so
> > I folded your  changes into my stuff.  When I push my tree out it will
> > be in there  (otherwise it would have required hand merging by someone
> > else anyway).  I  just gotta wait till I go back up to the office and
> > restart my test box in  order to push the changes out :-/  (shutdown
> > -nr segfaulted on me) 
> 
> OK, I'll back this one out of the scsi-misc-2.5 tree I have internally before 
> putting it out on the bkbits site.

Cool.  I'm up here working on it at the moment.  I'm also trying to track 
down the problem Doug Gilbert reported and that I was seeing last night as 
well, aka on /dev/sdc I can access /dev/sdc but not /dev/sdc1 which is a 
valid partition.  Once I've tracked that down I'll push my tree up.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  


* Re: [PATCH] removel useless mod use count manipulation
  2002-11-16 19:40 [PATCH] removel useless mod use count manipulation Christoph Hellwig
  2002-11-17  2:59 ` Doug Ledford
@ 2002-11-17 12:40 ` Douglas Gilbert
  2002-11-17 12:48   ` Christoph Hellwig
  1 sibling, 1 reply; 297+ messages in thread
From: Douglas Gilbert @ 2002-11-17 12:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi

Christoph Hellwig wrote:
> There's a bunch of useless mod usecount handling in scsi:

While playing with scsi_debug today I saw the opposite effect:
with a file system mounted on a scsi_debug fake disk **, lsmod
showed a zero "in use" count. I was able to "rmmod scsi_debug"
and crash the system when I tried to access that mounted fs.

** A further problem I am seeing is building partitions with
fdisk (on scsi_debug [haven't tried a real disk]) that don't
get reflected in /proc/partitions . I can add a partition
to /dev/sda but 'mke2fs /dev/sda1' fails with "no such device".
BTW I can forget partitions and successfully work with the
whole disk (i.e. "mke2fs -F /dev/sda ; mount /dev/sda /mnt/x").
Lk 2.5.47 on RH8.0 system with fdisk from util-linux-2.11r-10 .

Doug Gilbert


* Re: [PATCH] removel useless mod use count manipulation
  2002-11-17 12:40 ` Douglas Gilbert
@ 2002-11-17 12:48   ` Christoph Hellwig
  2002-11-17 13:38     ` Douglas Gilbert
  0 siblings, 1 reply; 297+ messages in thread
From: Christoph Hellwig @ 2002-11-17 12:48 UTC (permalink / raw)
  To: Douglas Gilbert; +Cc: linux-scsi

On Sun, Nov 17, 2002 at 11:40:43PM +1100, Douglas Gilbert wrote:
> Christoph Hellwig wrote:
> > There's a bunch of useless mod usecount handling in scsi:
> 
> While playing with scsi_debug today I saw the opposite effect:
> with a file system mounted on a scsi_debug fake disk **, lsmod
> showed a zero "in use" count. I was able to "rmmod scsi_debug"
> and crash the system when I tried to access that mounted fs.

This change didn't change the use-count handling for HBA drivers at all,
so this must be something different.  Do you use a scsi_debug using
scsi_add_host/scsi_remove_host?  Try adding

	.module		= THIS_MODULE,

to sdebug_driver_template, I forgot that change when sending you the
initial draft version of the change.
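
To illustrate why the missing owner pointer matters, here is a minimal
userspace sketch.  It is not kernel code: fake_module, fake_host_template,
fake_host_get() and fake_rmmod_allowed() are hypothetical stand-ins, and it
assumes the midlayer pins an HBA driver only through the module pointer in
its template, which is how the exchange above reads.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-module bookkeeping. */
struct fake_module {
	const char *name;
	int use_count;		/* what lsmod would report as "in use" */
};

/* Hypothetical stand-in for a host template with an owner field. */
struct fake_host_template {
	struct fake_module *module;	/* NULL if the driver forgot to set it */
	const char *name;
};

/*
 * Pin the owning module when a device on this host is taken into use.
 * Returns 1 if a module was pinned, 0 if the template names no owner.
 */
static int fake_host_get(struct fake_host_template *tmpl)
{
	if (!tmpl->module)
		return 0;	/* nothing to pin: use count stays at zero */
	tmpl->module->use_count++;
	return 1;
}

/* Unloading is only permitted while nobody holds a reference. */
static int fake_rmmod_allowed(const struct fake_module *mod)
{
	return mod->use_count == 0;
}
```

With module left NULL, fake_host_get() pins nothing, the count stays at zero
and fake_rmmod_allowed() keeps returning 1 even though the host is in use.
That matches the symptom reported above.  Once the owner is set, the first
fake_host_get() raises the count and unloading is refused.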



* Re: [PATCH] removel useless mod use count manipulation
  2002-11-17 12:48   ` Christoph Hellwig
@ 2002-11-17 13:38     ` Douglas Gilbert
  0 siblings, 0 replies; 297+ messages in thread
From: Douglas Gilbert @ 2002-11-17 13:38 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi

Christoph Hellwig wrote:
> On Sun, Nov 17, 2002 at 11:40:43PM +1100, Douglas Gilbert wrote:
> 
>>Christoph Hellwig wrote:
>>
>>>There's a bunch of useless mod usecount handling in scsi:
>>
>>While playing with scsi_debug today I saw the opposite effect:
>>with a file system mounted on a scsi_debug fake disk **, lsmod
>>showed a zero "in use" count. I was able to "rmmod scsi_debug"
>>and crash the system when I tried to access that mounted fs.
> 
> 
> This change didn't change the use-count handling for HBA drivers at all,
> so this must be something different.  Do you use a scsi_debug using
> scsi_add_host/scsi_remove_host?

Yes.

> Try adding
> 
> 	.module		= THIS_MODULE,
> 
> to sdebug_driver_template, I forgot that change when sending you the
> initial draft version of the change.

That fixed it. [Still have "missing" partition(s) problem.]

Thanks. New version of scsi_debug coming soon.

Doug Gilbert






* [PATCH] turn scsi_allocate_device into readable code
@ 2002-11-21 15:16 Christoph Hellwig
  2002-11-21 15:36 ` Doug Ledford
  0 siblings, 1 reply; 297+ messages in thread
From: Christoph Hellwig @ 2002-11-21 15:16 UTC (permalink / raw)
  To: James.Bottomley; +Cc: linux-scsi


--- 1.23/drivers/scsi/cpqfcTSinit.c	Fri Oct 25 03:13:39 2002
+++ edited/drivers/scsi/cpqfcTSinit.c	Thu Nov 21 15:51:32 2002
@@ -1604,7 +1604,7 @@
   scsi_cdb[0] = RELEASE;
 
   // allocate with wait = true, interruptible = false 
-  SCpnt = scsi_allocate_device(ScsiDev, 1, 0);
+  SCpnt = scsi_allocate_device(ScsiDev, 1);
   {
     CPQFC_DECLARE_COMPLETION(wait);
     
--- 1.15/drivers/scsi/gdth.c	Fri Nov  8 08:47:03 2002
+++ edited/drivers/scsi/gdth.c	Thu Nov 21 15:51:32 2002
@@ -4599,7 +4599,7 @@
 
 #if LINUX_VERSION_CODE >= 0x020322
     sdev = scsi_get_host_dev(gdth_ctr_tab[hanum]);
-    scp  = scsi_allocate_device(sdev, 1, FALSE);
+    scp  = scsi_allocate_device(sdev, 1);
     scp->cmd_len = 12;
     scp->use_sg = 0;
 #else
@@ -4673,7 +4673,7 @@
         memset(cmnd, 0xff, MAX_COMMAND_SIZE);
 #if LINUX_VERSION_CODE >= 0x020322
         sdev = scsi_get_host_dev(gdth_ctr_tab[hanum]);
-        scp  = scsi_allocate_device(sdev, 1, FALSE);
+        scp  = scsi_allocate_device(sdev, 1);
         scp->cmd_len = 12;
         scp->use_sg = 0;
 #else
--- 1.6/drivers/scsi/gdth_proc.c	Sat Apr 13 19:42:56 2002
+++ edited/drivers/scsi/gdth_proc.c	Thu Nov 21 15:51:33 2002
@@ -48,7 +48,7 @@
 
 #if LINUX_VERSION_CODE >= 0x020322
     sdev = scsi_get_host_dev(gdth_ctr_vtab[vh]);
-    scp  = scsi_allocate_device(sdev, 1, FALSE);
+    scp  = scsi_allocate_device(sdev, 1);
     if (!scp)
         return -ENOMEM;
     scp->cmd_len = 12;
@@ -712,7 +712,7 @@
 
 #if LINUX_VERSION_CODE >= 0x020322
     sdev = scsi_get_host_dev(gdth_ctr_vtab[vh]);
-    scp  = scsi_allocate_device(sdev, 1, FALSE);
+    scp  = scsi_allocate_device(sdev, 1);
     if (!scp)
         return -ENOMEM;
     scp->cmd_len = 12;
--- 1.69/drivers/scsi/scsi.c	Mon Nov 18 17:14:23 2002
+++ edited/drivers/scsi/scsi.c	Thu Nov 21 15:51:33 2002
@@ -347,6 +347,41 @@
 }
 
 /*
+ * FIXME(eric) - this is not at all optimal.  Given that
+ * single lun devices are rare and usually slow
+ * (i.e. CD changers), this is good enough for now, but
+ * we may want to come back and optimize this later.
+ *
+ * Scan through all of the devices attached to this
+ * host, and see if any are active or not.  If so,
+ * we need to defer this command.
+ *
+ * We really need a busy counter per device.  This would
+ * allow us to more easily figure out whether we should
+ * do anything here or not.
+ */
+static int check_all_luns(struct Scsi_Host *shost, struct scsi_device *myself)
+{
+	struct scsi_device *sdev;
+
+	for (sdev = shost->host_queue; sdev; sdev = sdev->next) {
+		/*
+		 * Only look for other devices on the same bus
+		 * with the same target ID.
+		 */
+		if (sdev->channel != myself->channel || sdev->id != myself->id)
+			continue;
+		if (sdev == myself)
+			continue;
+
+		if (atomic_read(&sdev->device_active))
+			return 1;
+	}
+
+	return 0;
+}
+
+/*
  * Function:    scsi_allocate_device
  *
  * Purpose:     Allocate a command descriptor.
@@ -372,172 +407,87 @@
  *              This function is deprecated, and drivers should be
  *              rewritten to use Scsi_Request instead of Scsi_Cmnd.
  */
-
-Scsi_Cmnd *scsi_allocate_device(Scsi_Device * device, int wait, 
-                                int interruptable)
+struct scsi_cmnd *scsi_allocate_device(struct scsi_device *sdev, int wait)
 {
- 	struct Scsi_Host *host;
-  	Scsi_Cmnd *SCpnt = NULL;
-	Scsi_Device *SDpnt;
+	DECLARE_WAITQUEUE(wq, current);
+	struct Scsi_Host *shost = sdev->host;
+	struct scsi_cmnd *scmnd;
 	unsigned long flags;
-  
-  	if (!device)
-  		panic("No device passed to scsi_allocate_device().\n");
-  
-  	host = device->host;
-  
+
 	spin_lock_irqsave(&device_request_lock, flags);
- 
-	while (1 == 1) {
-		SCpnt = NULL;
-		if (!device->device_blocked) {
-			if (device->single_lun) {
-				/*
-				 * FIXME(eric) - this is not at all optimal.  Given that
-				 * single lun devices are rare and usually slow
-				 * (i.e. CD changers), this is good enough for now, but
-				 * we may want to come back and optimize this later.
-				 *
-				 * Scan through all of the devices attached to this
-				 * host, and see if any are active or not.  If so,
-				 * we need to defer this command.
-				 *
-				 * We really need a busy counter per device.  This would
-				 * allow us to more easily figure out whether we should
-				 * do anything here or not.
-				 */
-				for (SDpnt = host->host_queue;
-				     SDpnt;
-				     SDpnt = SDpnt->next) {
-					/*
-					 * Only look for other devices on the same bus
-					 * with the same target ID.
-					 */
-					if (SDpnt->channel != device->channel
-					    || SDpnt->id != device->id
-					    || SDpnt == device) {
- 						continue;
-					}
-                                        if( atomic_read(&SDpnt->device_active) != 0)
-                                        {
-                                                break;
-                                        }
-				}
-				if (SDpnt) {
-					/*
-					 * Some other device in this cluster is busy.
-					 * If asked to wait, we need to wait, otherwise
-					 * return NULL.
-					 */
-					SCpnt = NULL;
-					goto busy;
-				}
-			}
-			/*
-			 * Now we can check for a free command block for this device.
-			 */
-			for (SCpnt = device->device_queue; SCpnt; SCpnt = SCpnt->next) {
-				if (SCpnt->request == NULL)
-					break;
-			}
-		}
+	while (1) {
+		if (sdev->device_blocked)
+			goto busy;
+		if (sdev->single_lun && check_all_luns(shost, sdev))
+			goto busy;
+
 		/*
-		 * If we couldn't find a free command block, and we have been
-		 * asked to wait, then do so.
+		 * Now we can check for a free command block for this device.
 		 */
-		if (SCpnt) {
-			break;
-		}
-      busy:
+		for (scmnd = sdev->device_queue; scmnd; scmnd = scmnd->next)
+			if (!scmnd->request)
+				goto found;
+
+busy:
+		if (!wait)
+			goto fail;
+
 		/*
-		 * If we have been asked to wait for a free block, then
-		 * wait here.
+		 * We need to wait for a free commandblock.  We need to
+		 * insert ourselves into the list before we release the
+		 * lock.  This way if a block were released the same
+		 * microsecond that we released the lock, the call
+		 * to schedule() wouldn't block (well, it might switch,
+		 * but the current task will still be schedulable.
 		 */
-		if (wait) {
-                        DECLARE_WAITQUEUE(wait, current);
-
-                        /*
-                         * We need to wait for a free commandblock.  We need to
-                         * insert ourselves into the list before we release the
-                         * lock.  This way if a block were released the same
-                         * microsecond that we released the lock, the call
-                         * to schedule() wouldn't block (well, it might switch,
-                         * but the current task will still be schedulable.
-                         */
-                        add_wait_queue(&device->scpnt_wait, &wait);
-                        if( interruptable ) {
-                                set_current_state(TASK_INTERRUPTIBLE);
-                        } else {
-                                set_current_state(TASK_UNINTERRUPTIBLE);
-                        }
-
-                        spin_unlock_irqrestore(&device_request_lock, flags);
-
-			/*
-			 * This should block until a device command block
-			 * becomes available.
-			 */
-                        schedule();
-
-			spin_lock_irqsave(&device_request_lock, flags);
-
-                        remove_wait_queue(&device->scpnt_wait, &wait);
-                        /*
-                         * FIXME - Isn't this redundant??  Someone
-                         * else will have forced the state back to running.
-                         */
-                        set_current_state(TASK_RUNNING);
-                        /*
-                         * In the event that a signal has arrived that we need
-                         * to consider, then simply return NULL.  Everyone
-                         * that calls us should be prepared for this
-                         * possibility, and pass the appropriate code back
-                         * to the user.
-                         */
-                        if( interruptable ) {
-                                if (signal_pending(current)) {
-                                        spin_unlock_irqrestore(&device_request_lock, flags);
-                                        return NULL;
-                                }
-                        }
-		} else {
-                        spin_unlock_irqrestore(&device_request_lock, flags);
-			return NULL;
-		}
-	}
-
-	SCpnt->request = NULL;
-	atomic_inc(&SCpnt->host->host_active);
-	atomic_inc(&SCpnt->device->device_active);
-
-	SCpnt->buffer  = NULL;
-	SCpnt->bufflen = 0;
-	SCpnt->request_buffer = NULL;
-	SCpnt->request_bufflen = 0;
-
-	SCpnt->use_sg = 0;	/* Reset the scatter-gather flag */
-	SCpnt->old_use_sg = 0;
-	SCpnt->transfersize = 0;	/* No default transfer size */
-	SCpnt->cmd_len = 0;
+		add_wait_queue(&sdev->scpnt_wait, &wq);
+		set_current_state(TASK_UNINTERRUPTIBLE);
 
-	SCpnt->sc_data_direction = SCSI_DATA_UNKNOWN;
-	SCpnt->sc_request = NULL;
-	SCpnt->sc_magic = SCSI_CMND_MAGIC;
-
-        SCpnt->result = 0;
-	SCpnt->underflow = 0;	/* Do not flag underflow conditions */
-	SCpnt->old_underflow = 0;
-	SCpnt->resid = 0;
-	SCpnt->state = SCSI_STATE_INITIALIZING;
-	SCpnt->owner = SCSI_OWNER_HIGHLEVEL;
+		spin_unlock_irqrestore(&device_request_lock, flags);
+		schedule();
+		spin_lock_irqsave(&device_request_lock, flags);
+
+		remove_wait_queue(&sdev->scpnt_wait, &wq);
+		set_current_state(TASK_RUNNING);
+	}
+
+found:
+	scmnd->request = NULL;
+	atomic_inc(&scmnd->host->host_active);
+	atomic_inc(&scmnd->device->device_active);
+
+	scmnd->buffer  = NULL;
+	scmnd->bufflen = 0;
+	scmnd->request_buffer = NULL;
+	scmnd->request_bufflen = 0;
+
+	scmnd->use_sg = 0;	/* Reset the scatter-gather flag */
+	scmnd->old_use_sg = 0;
+	scmnd->transfersize = 0;	/* No default transfer size */
+	scmnd->cmd_len = 0;
+
+	scmnd->sc_data_direction = SCSI_DATA_UNKNOWN;
+	scmnd->sc_request = NULL;
+	scmnd->sc_magic = SCSI_CMND_MAGIC;
+
+	scmnd->result = 0;
+	scmnd->underflow = 0;	/* Do not flag underflow conditions */
+	scmnd->old_underflow = 0;
+	scmnd->resid = 0;
+	scmnd->state = SCSI_STATE_INITIALIZING;
+	scmnd->owner = SCSI_OWNER_HIGHLEVEL;
 
 	spin_unlock_irqrestore(&device_request_lock, flags);
 
 	SCSI_LOG_MLQUEUE(5, printk("Activating command for device %d (%d)\n",
-				   SCpnt->target,
-				atomic_read(&SCpnt->host->host_active)));
+				scmnd->target,
+				atomic_read(&scmnd->host->host_active)));
+
+	return scmnd;
 
-	return SCpnt;
+fail:
+	spin_unlock_irqrestore(&device_request_lock, flags);
+	return NULL;
 }
 
 inline void __scsi_release_command(Scsi_Cmnd * SCpnt)
--- 1.44/drivers/scsi/scsi.h	Sun Nov 17 22:44:35 2002
+++ edited/drivers/scsi/scsi.h	Thu Nov 21 15:51:33 2002
@@ -471,7 +471,7 @@
 extern void scsi_done(Scsi_Cmnd * SCpnt);
 extern void scsi_finish_command(Scsi_Cmnd *);
 extern int scsi_retry_command(Scsi_Cmnd *);
-extern Scsi_Cmnd *scsi_allocate_device(Scsi_Device *, int, int);
+extern Scsi_Cmnd *scsi_allocate_device(Scsi_Device *, int);
 extern void __scsi_release_command(Scsi_Cmnd *);
 extern void scsi_release_command(Scsi_Cmnd *);
 extern void scsi_do_cmd(Scsi_Cmnd *, const void *cmnd,
--- 1.50/drivers/scsi/scsi_lib.c	Wed Nov 20 09:33:50 2002
+++ edited/drivers/scsi/scsi_lib.c	Thu Nov 21 15:52:29 2002
@@ -797,8 +797,7 @@
 		SRpnt = (Scsi_Request *) req->special;
 		
 		if( SRpnt->sr_magic == SCSI_REQ_MAGIC ) {
-			SCpnt = scsi_allocate_device(SRpnt->sr_device, 
-						     FALSE, FALSE);
+			SCpnt = scsi_allocate_device(SRpnt->sr_device, 0);
 			if (!SCpnt)
 				return BLKPREP_DEFER;
 			scsi_init_cmd_from_req(SCpnt, SRpnt);
@@ -809,9 +808,9 @@
 		 * Now try and find a command block that we can use.
 		 */
 		if (req->special) {
-				SCpnt = (Scsi_Cmnd *) req->special;
+			SCpnt = (Scsi_Cmnd *) req->special;
 		} else {
-			SCpnt = scsi_allocate_device(SDpnt, FALSE, FALSE);
+			SCpnt = scsi_allocate_device(SDpnt, 0);
 		}
 		/*
 		 * if command allocation failure, wait a bit
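
The control flow the patch introduces can be tried outside the kernel.  The
following is only a sketch under stated assumptions: the model_* types and
functions are hypothetical simplifications of struct scsi_device and
struct scsi_cmnd, device_active is a plain int rather than an atomic_t, no
locking is modelled, and only the non-waiting path (wait == 0) is covered.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a command block. */
struct model_cmnd {
	struct model_cmnd *next;
	void *request;			/* NULL means the block is free */
};

/* Hypothetical, simplified stand-in for a device on a host's list. */
struct model_device {
	struct model_device *next;	/* next device on the same host */
	int channel, id, lun;
	int device_blocked;
	int single_lun;
	int device_active;		/* atomic_t in the real code */
	struct model_cmnd *device_queue;
};

/* Return 1 if another LUN on the same channel and target is active. */
static int model_check_all_luns(struct model_device *host_queue,
				struct model_device *myself)
{
	struct model_device *sdev;

	for (sdev = host_queue; sdev; sdev = sdev->next) {
		if (sdev->channel != myself->channel || sdev->id != myself->id)
			continue;
		if (sdev == myself)
			continue;
		if (sdev->device_active)
			return 1;
	}
	return 0;
}

/* Non-waiting path only: return a free command block, or NULL if busy. */
static struct model_cmnd *model_allocate(struct model_device *host_queue,
					 struct model_device *sdev)
{
	struct model_cmnd *scmnd;

	if (sdev->device_blocked)
		return NULL;
	if (sdev->single_lun && model_check_all_luns(host_queue, sdev))
		return NULL;

	for (scmnd = sdev->device_queue; scmnd; scmnd = scmnd->next)
		if (!scmnd->request)
			break;
	if (!scmnd)
		return NULL;

	sdev->device_active++;	/* the model does not mark the block busy */
	return scmnd;
}
```

Each early return corresponds to a "goto busy" followed by "goto fail" in
the patch when wait is zero.  Splitting the LUN scan into its own helper is
what lets the main function read top to bottom instead of nesting three
levels deep.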


* Re: [PATCH] turn scsi_allocate_device into readable code
  2002-11-21 15:16 [PATCH] turn scsi_allocate_device into readable code Christoph Hellwig
@ 2002-11-21 15:36 ` Doug Ledford
  2002-11-21 15:39   ` J.E.J. Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-11-21 15:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: James.Bottomley, linux-scsi

On Thu, Nov 21, 2002 at 04:16:47PM +0100, Christoph Hellwig wrote:
> 
> --- 1.23/drivers/scsi/cpqfcTSinit.c	Fri Oct 25 03:13:39 2002
> +++ edited/drivers/scsi/cpqfcTSinit.c	Thu Nov 21 15:51:32 2002
> @@ -1604,7 +1604,7 @@
>    scsi_cdb[0] = RELEASE;
>  
>    // allocate with wait = true, interruptible = false 
> -  SCpnt = scsi_allocate_device(ScsiDev, 1, 0);
> +  SCpnt = scsi_allocate_device(ScsiDev, 1);
>    {

Dammit all to hell man!  If you keep making patches that keep conflicting 
with my changes we're going to be merging forever!  :-)

James, have you dropped Christoph's latest stuff into a tree yet?  If so, 
I'll pull it and do my merging there...if not then I'll start grabbing 
patches from an archive (I didn't expect so many of them to overlap so 
closely today).

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  


* Re: [PATCH] turn scsi_allocate_device into readable code
  2002-11-21 15:36 ` Doug Ledford
@ 2002-11-21 15:39   ` J.E.J. Bottomley
  2002-11-21 15:49     ` Doug Ledford
  0 siblings, 1 reply; 297+ messages in thread
From: J.E.J. Bottomley @ 2002-11-21 15:39 UTC (permalink / raw)
  To: Christoph Hellwig, James.Bottomley, linux-scsi

dledford@redhat.com said:
> Dammit all to hell man!  If you keep making patches that keep
> conflicting  with my changes we're going to be merging forever!  :-) 

Erm, well, smaller patches and earlier incorporation? :-)

> James, have you dropped Christoph's latest stuff into a tree yet?  If
> so,  I'll pull it and do my merging there...if not then I'll start
> grabbing  patches from an archive (I didn't expect so many of them to
> overlap so  closely today). 

I was just about to.  I have to resync scsi-misc-2.5 with BK current first.  
If you can give me an hour, I'll have it done.

James




* Re: [PATCH] turn scsi_allocate_device into readable code
  2002-11-21 15:39   ` J.E.J. Bottomley
@ 2002-11-21 15:49     ` Doug Ledford
  2002-11-21 16:12       ` J.E.J. Bottomley
  0 siblings, 1 reply; 297+ messages in thread
From: Doug Ledford @ 2002-11-21 15:49 UTC (permalink / raw)
  To: J.E.J. Bottomley; +Cc: Christoph Hellwig, linux-scsi

On Thu, Nov 21, 2002 at 09:39:29AM -0600, J.E.J. Bottomley wrote:
> dledford@redhat.com said:
> > Dammit all to hell man!  If you keep making patches that keep
> > conflicting  with my changes we're going to be merging forever!  :-) 
> 
> Erm, well, smaller patches and earlier incorporation? :-)

It's a race Charlie Brown!  :-P

> > James, have you dropped Christoph's latest stuff into a tree yet?  If
> > so,  I'll pull it and do my merging there...if not then I'll start
> > grabbing  patches from an archive (I didn't expect so many of them to
> > overlap so  closely today). 
> 
> I was just about to.  I have to resync scsi-misc-2.5 with BK current first.  
> If you can give me an hour, I'll have it done.

Thanks!

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  


* Re: [PATCH] turn scsi_allocate_device into readable code
  2002-11-21 15:49     ` Doug Ledford
@ 2002-11-21 16:12       ` J.E.J. Bottomley
  2002-11-21 17:08         ` [PATCH] current scsi-misc-2.5 include files Patrick Mansfield
  0 siblings, 1 reply; 297+ messages in thread
From: J.E.J. Bottomley @ 2002-11-21 16:12 UTC (permalink / raw)
  To: J.E.J. Bottomley, Christoph Hellwig, linux-scsi

The scsi-misc-2.5 resync should be done now.

James




* [PATCH] current scsi-misc-2.5 include files
  2002-11-21 16:12       ` J.E.J. Bottomley
@ 2002-11-21 17:08         ` Patrick Mansfield
  0 siblings, 0 replies; 297+ messages in thread
From: Patrick Mansfield @ 2002-11-21 17:08 UTC (permalink / raw)
  To: J.E.J. Bottomley; +Cc: Christoph Hellwig, linux-scsi

On Thu, Nov 21, 2002 at 10:12:23AM -0600, J.E.J. Bottomley wrote:
> The scsi-misc-2.5 resync should be done now.
> 
> James

Hi -

hardirq.h and init.h were removed from scsi.h. So I had to make the
following changes to compile. I also removed two extraneous includes. 

--- 1.52/drivers/scsi/scsi_lib.c	Thu Nov 21 07:52:29 2002
+++ edited/drivers/scsi/scsi_lib.c	Thu Nov 21 08:48:16 2002
@@ -12,6 +12,7 @@
 #include <linux/bio.h>
 #include <linux/kernel.h>
 #include <linux/blk.h>
+#include <asm/hardirq.h>
 #include <linux/smp_lock.h>
 #include <linux/completion.h>
 
===== drivers/scsi/scsi_proc.c 1.10 vs edited =====
--- 1.10/drivers/scsi/scsi_proc.c	Wed Nov 20 17:17:50 2002
+++ edited/drivers/scsi/scsi_proc.c	Thu Nov 21 08:47:56 2002
@@ -16,14 +16,13 @@
  * Michael A. Griffith <grif@acm.org>
  */
 
-#include <linux/config.h>
 #include <linux/module.h>
+#include <linux/init.h>
 #include <linux/string.h>
 #include <linux/mm.h>
 #include <linux/slab.h>
 #include <linux/proc_fs.h>
 #include <linux/errno.h>
-#include <linux/stat.h>
 #include <linux/blk.h>
 #include <asm/uaccess.h>
 
-- Patrick Mansfield


Thread overview: 297+ messages
2002-04-08 15:18 [RFC] Persistent naming of scsi devices sullivan
2002-04-08 15:04 ` Christoph Hellwig
2002-04-08 15:59   ` Matthew Jacob
2002-04-08 16:34   ` James Bottomley
2002-04-08 18:27     ` Patrick Mansfield
2002-04-08 19:17       ` James Bottomley
2002-04-09  0:22         ` Douglas Gilbert
2002-04-09 14:35           ` sullivan
2002-04-09 14:55         ` sullivan
2002-04-08 17:51   ` Oliver Neukum
2002-04-08 18:01     ` Christoph Hellwig
2002-04-08 18:18     ` Matthew Jacob
2002-04-08 18:28       ` James Bottomley
2002-04-08 18:34         ` Matthew Jacob
2002-04-08 19:07           ` James Bottomley
2002-04-08 20:41             ` Matthew Jacob
2002-04-08 18:45   ` Tigran Aivazian
2002-04-08 20:18 ` Eddie Williams
2002-04-09  0:48 ` Kurt Garloff
  -- strict thread matches above, loose matches on Subject: below --
2002-04-08 16:11 Matt_Domsch
2002-04-08 19:18 Martin Peschke3
2002-04-08 20:45 ` Matthew Jacob
2002-04-10  1:16 ` Rick Stevens
2002-04-10  2:01   ` Matthew Jacob
2002-04-10  2:17   ` Linus Torvalds
2002-04-10  3:37   ` Martin K. Petersen
2002-04-10 13:19     ` Theodore Tso
2002-04-10 14:04       ` Eddie Williams
2002-04-10 17:45         ` Mike Anderson
2002-04-08 22:05 Martin Peschke3
2002-04-08 22:17 ` Matthew Jacob
2002-04-10  1:40 Bryan Henderson
2002-04-10 14:28 berthiaume_wayne
2002-04-10 14:36 berthiaume_wayne
2002-04-10 16:02 ` Matthew Jacob
2002-04-10 15:28 Bryan Henderson
2002-04-10 15:52 Martin Peschke3
2002-04-10 19:33 ` Matthew Jacob
2002-04-10 16:44 berthiaume_wayne
2002-04-10 19:02 Martin Peschke3
2002-04-10 20:24 berthiaume_wayne
2002-04-11 16:01 Bryan Henderson
2002-04-12 13:15 berthiaume_wayne
2002-04-12 17:18 Bryan Henderson
2002-04-12 18:03 berthiaume_wayne
2002-06-05 20:13 sullivan
2002-06-06  1:08 ` Douglas Gilbert
2002-06-11  2:46 Proposed changes to generic blk tag for use in SCSI (1/3) James Bottomley
2002-06-11  5:50 ` Jens Axboe
2002-06-11 14:29   ` James Bottomley
2002-06-11 14:45     ` Jens Axboe
2002-06-11 16:39       ` James Bottomley
2002-06-13 21:01 ` Doug Ledford
2002-06-13 21:26   ` James Bottomley
     [not found] <200206132126.g5DLQiQ24889@localhost.localdomain>
2002-06-13 21:50 ` Doug Ledford
2002-06-13 22:09   ` James Bottomley
2002-08-05 23:53 When must the io_request_lock be held? Jamie Wellnitz
2002-08-06 17:58 ` Mukul Kotwani
2002-08-07 14:48 ` Doug Ledford
2002-08-07 15:26   ` James Bottomley
2002-08-07 16:18     ` Doug Ledford
2002-08-07 16:48       ` James Bottomley
2002-08-07 18:06         ` Mike Anderson
2002-08-07 23:17           ` James Bottomley
2002-08-08 19:28         ` Luben Tuikov
2002-08-07 16:55       ` Patrick Mansfield
2002-08-12 23:38 [PATCH] 2.5.31 scsi_error.c cleanup Mike Anderson
2002-08-22 14:05 ` James Bottomley
2002-08-22 16:34   ` Mike Anderson
2002-08-22 17:11     ` James Bottomley
2002-08-22 20:10       ` Mike Anderson
2002-08-26 16:29 [RFC]: 64 bit LUN/Tags, dummy device in host_queue, host_lock <-> LLDD reentrancy Aron Zeh
2002-08-26 16:48 ` James Bottomley
2002-08-26 17:27   ` Mike Anderson
2002-08-26 19:00     ` James Bottomley
2002-08-26 20:57       ` Mike Anderson
2002-08-26 21:10         ` James Bottomley
2002-08-26 22:38           ` Mike Anderson
2002-08-26 22:56             ` Patrick Mansfield
2002-08-26 23:10             ` Doug Ledford
2002-08-28 14:38             ` James Bottomley
2002-08-26 21:15         ` Mike Anderson
2002-09-03 14:35 aic7xxx sets CDR offline, how to reset? James Bottomley
2002-09-03 18:23 ` Doug Ledford
2002-09-03 19:09   ` James Bottomley
2002-09-03 20:59     ` Alan Cox
2002-09-03 21:32       ` James Bottomley
2002-09-03 21:54         ` Alan Cox
2002-09-03 22:50         ` Doug Ledford
2002-09-03 23:28           ` Alan Cox
2002-09-04  7:40           ` Jeremy Higdon
2002-09-04 16:24             ` James Bottomley
2002-09-04 17:13               ` Mike Anderson
2002-09-05  9:50               ` Jeremy Higdon
2002-09-04 16:13           ` James Bottomley
2002-09-04 16:50             ` Justin T. Gibbs
2002-09-05  9:39               ` Jeremy Higdon
2002-09-05 13:35                 ` Justin T. Gibbs
2002-09-05 23:56                   ` Jeremy Higdon
2002-09-06  0:13                     ` Justin T. Gibbs
2002-09-06  0:32                       ` Jeremy Higdon
2002-09-03 21:13     ` Doug Ledford
2002-09-03 21:48       ` James Bottomley
2002-09-03 22:42         ` Doug Ledford
2002-09-03 22:52           ` Doug Ledford
2002-09-03 23:29           ` Alan Cox
2002-09-04 21:16           ` Luben Tuikov
2002-09-04 10:37         ` Andries Brouwer
2002-09-04 10:48           ` Doug Ledford
2002-09-04 11:23           ` Alan Cox
2002-09-04 16:25             ` Rogier Wolff
2002-09-04 19:34               ` Thunder from the hill
2002-09-03 21:24     ` Patrick Mansfield
2002-09-03 22:02       ` James Bottomley
2002-09-03 23:26         ` Alan Cox
     [not found] <200209091458.g89Evv806056@localhost.localdomain>
2002-09-09 16:56 ` [RFC] Multi-path IO in 2.5/2.6 ? Patrick Mansfield
2002-09-09 17:34   ` James Bottomley
2002-09-09 18:40     ` Mike Anderson
2002-09-10 13:02       ` Lars Marowsky-Bree
2002-09-10 16:03         ` Patrick Mansfield
2002-09-10 16:27         ` Mike Anderson
2002-09-10  0:08     ` Patrick Mansfield
2002-09-10  7:55       ` Jeremy Higdon
2002-09-10 13:04         ` Lars Marowsky-Bree
2002-09-10 16:20           ` Patrick Mansfield
2002-09-10 13:16       ` Lars Marowsky-Bree
2002-09-10 19:26         ` Patrick Mansfield
2002-09-11 14:20           ` James Bottomley
2002-09-11 19:17             ` Lars Marowsky-Bree
2002-09-11 19:37               ` James Bottomley
2002-09-11 19:52                 ` Lars Marowsky-Bree
2002-09-11 21:38                 ` Oliver Xymoron
2002-09-11 20:30             ` Doug Ledford
2002-09-11 21:17               ` Mike Anderson
2002-09-10 17:21       ` Patrick Mochel
2002-09-10 18:42         ` Patrick Mansfield
2002-09-10 19:00           ` Patrick Mochel
2002-09-10 19:37             ` Patrick Mansfield
2002-09-24 11:35 SCSI woes (followup) Russell King
2002-09-24 13:46 ` James Bottomley
2002-09-24 13:58   ` Russell King
2002-09-24 14:29     ` James Bottomley
2002-09-24 18:16       ` Luben Tuikov
2002-09-24 18:18     ` Patrick Mansfield
2002-09-24 19:01       ` Russell King
2002-09-24 19:08       ` Mike Anderson
2002-09-24 19:21         ` Russell King
2002-09-24 19:32       ` Patrick Mansfield
2002-09-24 20:00         ` Russell King
2002-09-24 22:23           ` Patrick Mansfield
2002-09-24 23:04             ` Russell King
2002-09-24 22:39         ` Russell King
2002-09-24 23:14           ` James Bottomley
2002-09-24 23:26             ` Mike Anderson
2002-09-24 23:31               ` James Bottomley
2002-09-24 23:56                 ` Mike Anderson
2002-09-24 23:33               ` Russell King
2002-09-25  0:47                 ` Mike Anderson
2002-09-25  8:45                   ` Russell King
2002-09-25  2:18                 ` Doug Ledford
2002-09-25 14:41               ` Russell King
2002-09-24 23:33           ` Mike Anderson
2002-09-24 23:45             ` Russell King
2002-09-25  0:08           ` Patrick Mansfield
2002-09-25  8:41             ` Russell King
2002-09-25 17:22               ` Patrick Mansfield
2002-09-25 12:46             ` Russell King
2002-09-24 17:57   ` Luben Tuikov
2002-09-24 18:39     ` Mike Anderson
2002-09-24 18:49       ` Luben Tuikov
2002-09-30 21:06 [PATCH] first cut at fixing unable to requeue with no outstanding commands James Bottomley
2002-09-30 23:28 ` Mike Anderson
2002-10-01  0:38   ` James Bottomley
2002-10-01 15:01     ` Patrick Mansfield
2002-10-01 15:14       ` James Bottomley
2002-10-01 16:23         ` Mike Anderson
2002-10-01 16:30           ` James Bottomley
2002-10-01 20:18         ` Inhibit auto-attach of scsi disks ? Scott Merritt
2002-10-02  0:46           ` Alan Cox
2002-10-02  1:49             ` Scott Merritt
2002-10-02  1:58               ` Doug Ledford
2002-10-02  2:45                 ` Scott Merritt
2002-10-02 13:40               ` Alan Cox
2002-10-10 15:01 [PATCH] scsi host cleanup 3/3 (driver changes) Stephen Cameron
2002-10-10 16:46 ` Mike Anderson
2002-10-10 16:59   ` James Bottomley
2002-10-10 20:05     ` Mike Anderson
     [not found] <patmans@us.ibm.com>
2002-10-15 16:55 ` [RFC PATCH] consolidate SCSI-2 command lun setting Patrick Mansfield
2002-10-15 20:29   ` James Bottomley
2002-10-15 22:00     ` Patrick Mansfield
2002-10-30 16:58 ` [PATCH] 2.5 current bk fix setting scsi queue depths Patrick Mansfield
2002-10-30 17:17   ` James Bottomley
2002-10-30 18:05     ` Patrick Mansfield
2002-10-31  0:44       ` James Bottomley
2002-10-15 18:55 [patch 2.5] ips " Jeffery, David
2002-10-15 19:30 ` Dave Hansen
2002-10-15 19:47 ` Doug Ledford
2002-10-15 20:04   ` Patrick Mansfield
2002-10-15 20:52     ` Doug Ledford
2002-10-15 23:30       ` Patrick Mansfield
2002-10-15 23:56         ` Luben Tuikov
2002-10-16  2:32         ` Doug Ledford
2002-10-16 19:04           ` Patrick Mansfield
2002-10-16 20:15             ` Doug Ledford
2002-10-17  0:39             ` Luben Tuikov
2002-10-17 17:01               ` Mike Anderson
2002-10-17 21:13                 ` Luben Tuikov
2002-10-15 20:10   ` Mike Anderson
2002-10-15 20:24     ` Doug Ledford
2002-10-15 20:38     ` James Bottomley
2002-10-15 22:10       ` Mike Anderson
2002-10-16  1:04         ` James Bottomley
2002-10-15 20:24   ` Mike Anderson
2002-10-15 22:46     ` Doug Ledford
2002-10-15 20:26   ` Luben Tuikov
2002-10-15 21:27     ` Patrick Mansfield
2002-10-16  0:43       ` Luben Tuikov
2002-10-21  7:28   ` Mike Anderson
2002-10-21 16:16     ` Doug Ledford
2002-10-21 16:29       ` James Bottomley
     [not found] <dledford@redhat.com>
2002-10-02  0:28 ` PATCH: scsi device queue depth adjustability patch Doug Ledford
2002-10-02  1:16   ` Alan Cox
2002-10-02  1:41     ` Doug Ledford
2002-10-02 13:44       ` Alan Cox
2002-10-02 21:41   ` James Bottomley
2002-10-02 22:18     ` Doug Ledford
2002-10-02 23:19       ` James Bottomley
2002-10-03 12:46       ` James Bottomley
2002-10-03 16:35         ` Doug Ledford
2002-10-04  1:40         ` Jeremy Higdon
2002-10-03 14:25   ` James Bottomley
2002-10-03 16:41     ` Doug Ledford
2002-10-03 17:00       ` James Bottomley
2002-10-16 21:35 ` scsi_scan.c question Doug Ledford
2002-10-16 21:41   ` James Bottomley
2002-10-17  0:18     ` Doug Ledford
2002-10-16 21:57   ` Patrick Mansfield
2002-10-18 15:57     ` Patrick Mansfield
2002-11-18  0:27 ` aic7xxx_biosparam Doug Ledford
2002-11-18  0:36   ` aic7xxx_biosparam J.E.J. Bottomley
2002-11-18  2:46     ` aic7xxx_biosparam Doug Ledford
2002-11-18  3:20       ` aic7xxx_biosparam J.E.J. Bottomley
2002-11-18  3:26         ` aic7xxx_biosparam Doug Ledford
2002-11-18  0:43   ` aic7xxx_biosparam Andries Brouwer
2002-11-18  2:47     ` aic7xxx_biosparam Doug Ledford
2002-11-18  0:57   ` aic7xxx_biosparam Alan Cox
2002-11-18  2:34     ` aic7xxx_biosparam Doug Ledford
2002-12-21  1:22 ` scsi_scan changes Doug Ledford
2002-12-21  1:27   ` James Bottomley
2002-10-21 19:34 [PATCH] get rid of ->finish method for highlevel drivers Christoph Hellwig
2002-10-21 23:58 ` James Bottomley
2002-10-22 15:48   ` James Bottomley
2002-10-22 18:43     ` Patrick Mansfield
2002-10-22 23:17       ` Mike Anderson
2002-10-22 23:30         ` Doug Ledford
2002-10-23 14:16           ` James Bottomley
2002-10-23 15:13             ` Christoph Hellwig
2002-10-24  1:36               ` Patrick Mansfield
2002-10-24 23:20               ` Willem Riede
2002-10-24 23:36                 ` Christoph Hellwig
2002-10-25  0:02                   ` Willem Riede
2002-10-22  7:30 ` Mike Anderson
2002-10-22 11:14   ` Christoph Hellwig
2002-11-06  4:24 [PATCH] fix 2.5 scsi queue depth setting Patrick Mansfield
2002-11-06  4:35 ` Patrick Mansfield
2002-11-06 17:15 ` J.E.J. Bottomley
2002-11-06 17:47 ` J.E.J. Bottomley
2002-11-06 18:24   ` Patrick Mansfield
2002-11-06 18:32     ` J.E.J. Bottomley
2002-11-06 18:39       ` Patrick Mansfield
2002-11-06 18:50         ` J.E.J. Bottomley
2002-11-06 19:50           ` Patrick Mansfield
2002-11-06 20:45     ` Doug Ledford
2002-11-06 21:19       ` J.E.J. Bottomley
2002-11-06 20:50 ` Doug Ledford
2002-11-06 22:18 [PATCH] add request prep functions to SCSI J.E.J. Bottomley
2002-11-06 23:16 ` Doug Ledford
2002-11-06 23:43   ` J.E.J. Bottomley
2002-11-07 21:45 ` Mike Anderson
2002-11-15 20:34 [RFC][PATCH] move dma_mask into struct device J.E.J. Bottomley
2002-11-16  0:19 ` Mike Anderson
2002-11-16 14:48   ` J.E.J. Bottomley
2002-11-16 20:33 ` Patrick Mansfield
2002-11-17 15:07   ` J.E.J. Bottomley
2002-11-16 19:40 [PATCH] removel useless mod use count manipulation Christoph Hellwig
2002-11-17  2:59 ` Doug Ledford
2002-11-17 17:31   ` J.E.J. Bottomley
2002-11-17 18:14     ` Doug Ledford
2002-11-17 12:40 ` Douglas Gilbert
2002-11-17 12:48   ` Christoph Hellwig
2002-11-17 13:38     ` Douglas Gilbert
2002-11-21 15:16 [PATCH] turn scsi_allocate_device into readable code Christoph Hellwig
2002-11-21 15:36 ` Doug Ledford
2002-11-21 15:39   ` J.E.J. Bottomley
2002-11-21 15:49     ` Doug Ledford
2002-11-21 16:12       ` J.E.J. Bottomley
2002-11-21 17:08         ` [PATCH] current scsi-misc-2.5 include files Patrick Mansfield
