From: Doug Ledford <dledford@redhat.com>
To: Patrick Mansfield <patmans@us.ibm.com>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
Date: Thu, 20 Jun 2002 15:45:57 -0400 [thread overview]
Message-ID: <20020620154557.E9181@redhat.com> (raw)
In-Reply-To: <20020619141522.A11733@eng2.beaverton.ibm.com>; from patmans@us.ibm.com on Wed, Jun 19, 2002 at 02:15:22PM -0700
On Wed, Jun 19, 2002 at 02:15:22PM -0700, Patrick Mansfield wrote:
> On Tue, Jun 18, 2002 at 08:47:53PM -0400, Doug Ledford wrote:
>
> Doug -
>
> > As mentioned, this adds a new scsi_adjust_queue_depth routine that lets
> > the low level driver update the queue depth on a device at any point in
> > time. So, when Justin Gibbs' driver allocates a queue depth of 253
> > commands on drive X, then finds out that drive X has a hard limit of 64,
> > then we aren't continuing to waste 189 commands worth of allocated kernel
> > memory that can never be effectively used. Future plans here include
> > linking this into the bio queueing so that adjusting the mid layer queue
> > depth does the "right thing" in regards to the bio layer (for all I know
> > it does already, I haven't gotten into the bio->scsi interface yet).
>
> > Once this stuff is in place, then my next goal is to get the queueing code
> > in the mid layer up to useable standards (the goal is to be able to rip
> > the buts of the QUEUE_FULL and BUSY handling out of my driver and still
> > have the system do the "right thing" instead of my driver having it's own
> > complete internal queue mechanism including timers and such like it does
> > now).
>
> The Scsi_Cmnd initialization should have its own function (maybe
> including allocation), called from scsi_adjust_queue_depth and your
> new code.
I disagree, but see below for why.
> So do you think the current select_queue_depths should (eventually)
> be removed?
Yes, that's my goal. The current method is confusing to some driver
writers as they end up seeing the same device multiple times when someone
does any scsi-add-single-device magic on thier bus. It only lets you
adjust the queue depth when a device is added, not when you need to. It's
a bad design and needs to go away.
> Not a comment on your patch, but why isn't the queue depth setting
> handled completely in the mid-layer?
Because the mid layer doesn't have the information needed to do so. Queue
depth settings are limited by two things. 1) the drive's ability to
support tagged queue commands. 2) the driver/hardware's ability to send
those commands. The driver's often used shared tags amongst all drives
for efficiency reasons (my aic7xxx driver does for example). Details such
as this impact what can and can not be done in terms of tagged queueing.
Only the low level driver does/should have that information, so only the
low level driver should control the tagged queueing. The mid layer is,
for the most part, totally tag ignorant. It only needs to know that they
are supported and it can send X of them, that's all. There isn't any real
reason to try and make it control things. (Not yet anyway, but I am
planning on adding one specific type of control to the mid layer in
regards to tagged queueing, but that's still a few patches down the road)
> The cmd_per_lun and can_queue fields are in Scsi_Host, and the mid-layer
> already checks for tagged queueing. I don't see what adavantage there
> is to having the low level adapters each set the queue depth in their
> own way. Plus, a lot of common (and not so common) code would go away.
Each low level driver *must* set the queue depth in its own way. A few
examples should illustrate this for you quite nicely:
1) In the aic7xxx driver I have a 255 command limit per card. In host
memory I have a set of card/command specific data structures. For each
command that comes in, I fill in one of these structures and I then start
the command. The tag value used in talking to the device is actually a
direct index into this array of command structures. That way, when a
device reconnects and tells the card what tag it is reconnecting, the card
can download the saved command state immediately. This means that actual
tag values used between drives *must* always be unique. Therefore, the
mid layer is not in a position to assign the actual tag value for me, but
only in a position to say "send this as a tagged command".
2) Currently (and this is broken BTW, but oh well), the aic7xxx driver
does most of it's indexing into device specific structs based upon the
target ID, but not based upon the lun. That means that when connecting
the aic7xxx driver to an external SCSI RAID chassis that exports more than
one drive as the same target ID and different luns, we have overlap in the
data structures in the driver. Setting the queue depth for these
overlapping devices requires internal knowledge of how the overlap effects
things. Obviously, the mid layer can't know these details.
3) The aic7xxx driver is capable of setting up a waiting queue on the card
itself. In this queue we may have up to 16 commands waiting to be sent to
the device. Now, imagine that we already have 64 commands sent to a
Seagate drive, and we have just queued up 16 more. When the first of
those 16 commands connects to the Seagate drive, it might generate a
QUEUE_FULL response. If so, in my interrupt handler, I pull the other 15
commands out of the waiting queue so I don't hit the drive with a bunch of
commands when the drive can't take them. I also count exactly how many
commands are *actually* on the drive and active so I can track whether or
not the QUEUE_FULL messages always happen at the same depth. It is
impossible to do this count without knowing intimate hardware details of
the controller and its outgoing queue structure. This is also the only
reliable way of detecting when a device has a hard queue depth limit or if
it generates QUEUE_FULL messages at differing depths based upon current
resource allocations on the drive itself. This happens at interrupt time
because the command completion, where we get the QUEUE_FULL status, is
obviously an interrupt.
> And then scsi_adjust_queue_depth() could be called when not in interrupt
> context.
This is totally backwards to what you say earlier. Earlier you want the
mid layer to handle command queueing so that each driver doesn't have to
write specific code for setting things, and here you advocate each driver
writing its own method for calling adjust_queue_depth out of the interrupt
context when instead this lazy queue depth setting handler is one of the
few items that *could* be handled just in one place. I think your take on
tagged queueing is about the perfect opposite of most implementation
details.
--
Doug Ledford <dledford@redhat.com> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
next prev parent reply other threads:[~2002-06-20 19:45 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2002-06-19 0:47 [PATCH + RFC] Beginning of some updates to scsi mid layer Doug Ledford
2002-06-19 21:15 ` Patrick Mansfield
2002-06-20 19:45 ` Doug Ledford [this message]
[not found] <20020619014048.B8623@redhat.com>
2002-06-19 17:44 ` Pete Zaitcev
2002-06-19 17:55 ` Matthew Jacob
2002-06-19 18:25 ` Doug Ledford
2002-06-28 5:41 ` Jeremy Higdon
2002-06-28 7:37 ` Doug Ledford
-- strict thread matches above, loose matches on Subject: below --
2002-06-28 6:08 Martin Peschke3
2002-06-28 7:39 ` Doug Ledford
2002-06-29 1:19 ` Jeremy Higdon
2002-06-29 2:04 ` Matthew Jacob
2002-06-29 10:05 ` Doug Ledford
2002-06-29 10:37 ` Matthew Jacob
2002-07-01 21:02 ` Gérard Roudier
2002-07-01 19:08 ` Matthew Jacob
2002-07-01 19:15 ` Doug Ledford
2002-07-01 19:23 ` Matthew Jacob
2002-07-01 19:59 ` Doug Ledford
2002-07-01 20:17 ` Matthew Jacob
2002-07-02 11:27 ` Rogier Wolff
2002-06-29 10:10 ` Doug Ledford
2002-06-28 8:25 Martin Peschke3
2002-06-28 11:22 ` Doug Ledford
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20020620154557.E9181@redhat.com \
--to=dledford@redhat.com \
--cc=linux-scsi@vger.kernel.org \
--cc=patmans@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox