linux-scsi.vger.kernel.org archive mirror
* Setting queue depths in the scsi mid layer, an intro
@ 2002-10-11  2:46 Doug Ledford
  2002-10-11 10:34 ` Alan Cox
  2002-10-15  1:15 ` Patrick Mansfield
  0 siblings, 2 replies; 8+ messages in thread
From: Doug Ledford @ 2002-10-11  2:46 UTC (permalink / raw)
  To: linux-scsi

One of the recent changes to the scsi mid layer was my adjustable queue
depth patch.  This patch requires that all drivers that wish to implement
tagged queueing be updated.  (Yes James, I know the simple NCR700 driver
you've been using does tagged queueing without these updates, but that
won't last: the tagged queueing mechanism needs to be a full bottom-up
mechanism in order to work, so very shortly the business of setting the
block layer queue depth will be controlled via this same mechanism.)

So here's the long and short of it so that maybe some other people can 
help jump in and update drivers.

First, the select_queue_depths() call-in point in drivers is now
deprecated.  I left the call-in active in my patch, but on deeper
reflection that was a wasted effort.  I was going to let it continue to
work until most/all of the drivers had been modified, but because
scsi_build_commandblocks() now looks at new_queue_depth instead of
queue_depth, that is not possible; in fact it will likely confuse the hell
out of the mid layer, because queue_depth is now supposed to indicate the
actual number of commands allocated on the device in question, not how
many commands you want allocated, and that's an important distinction.
So the select_queue_depths() call from the mid layer (in scsi_scan.c) will
be going away immediately (Linus, expect a small patch shortly).

The revoke() call in point in scsi drivers will also be going away.  
However, most drivers, if they implement a revoke() call, can simply 
rename it to slave_detach() and be fine (the name is changing to be more 
descriptive, but it does the same basic thing, more on that later).

The replacement for select_queue_depths() is the slave_attach() call-in
point.  It is called once the drive has been identified and the INQUIRY
data has been sufficiently probed to determine things like device type,
scsi level, and available options (including the new sdtr/wdtr/ppr flags,
which tell the low level driver whether or not the device supports these
particular message protocols based upon the INQUIRY return values).  At
this point the driver is free to do device specific setup.  That can
include setting the queue depth on the device by calling the
scsi_adjust_queue_depth() entry point with the scsi device, the tag type
you are enabling (tagged options are coming shortly in an update patch;
basically, 0 means TCQ is disabled, MSG_SIMPLE_TAG means simple tags only,
and MSG_ORDERED_TAG means tagged queueing is enabled with both simple and
ordered tags allowed), and the queue depth.  Most disk devices will end up
looking something like this:

scsi_adjust_queue_depth(SDptr, MSG_ORDERED_TAG, queue_depth);

At some later point in time (like in an interrupt handler;
scsi_adjust_queue_depth() *is* interrupt context safe), if your driver
finds out that the device doesn't support ordered tags, it can call this
function again with MSG_SIMPLE_TAG instead of MSG_ORDERED_TAG in order to
get tagged queueing with just simple queue tags.  If the device doesn't
support either tag type, call it with (SDptr, 0, <cmd_per_lun>) as the
arguments to set it back to no tagged queueing and your driver's default
cmd_per_lun queue depth.
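
To make that concrete, here's a minimal sketch of such a slave_attach()
(mydrv_* is a made-up driver name and 64 a made-up depth; the exact
prototype and return convention are whatever your host template declares):

static int mydrv_slave_attach(Scsi_Device *SDptr)
{
	if (SDptr->tagged_supported)
		/* start out assuming ordered tags work; back off later
		 * if the device rejects them */
		scsi_adjust_queue_depth(SDptr, MSG_ORDERED_TAG, 64);
	else
		/* no TCQ: fall back to the host's untagged default */
		scsi_adjust_queue_depth(SDptr, 0, SDptr->host->cmd_per_lun);
	return 0;
}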

That, of course, is just one thing a driver can do at the slave_attach()
call-in point.  It may also do things like allocate driver specific device
data storage and attach it to SDptr->hostdata for future use (for example, 
you may keep your queue depth counter or busy flag or other driver 
specific crap like BIOS settings in this data storage area).  Some 
examples are present in the aic7xxx_old.c driver.
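
Just as a sketch (the struct and field names here are invented for
illustration), that part of slave_attach() might look like:

	struct mydrv_device_data *p;

	/* GFP_KERNEL on the assumption that slave_attach() runs in
	 * process context during the scan */
	p = kmalloc(sizeof(*p), GFP_KERNEL);
	if (p == NULL)
		return -ENOMEM;	/* assuming an errno-style int return */
	memset(p, 0, sizeof(*p));
	p->active_cmds = 0;		/* driver's own busy counter */
	SDptr->hostdata = p;		/* see the slave_detach() notes below */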

Why did we make this change?  Simple: the select_queue_depths() interface
was broken by design.  It did only one thing, when in reality there are a
number of things that need to be done at device attach time.  It was only
possible to change queue depths at this one point in time, never again,
resulting in things like Justin's aic7xxx driver setting the queue depth
on all disks to 253 in order to get the best possible performance; when
the driver later found out that some disk had a hard limit of, say, 64, it
was too late to tell the mid layer to adjust things down to a reasonable
value.  The new interface allows the low level driver to make changes to
the queue depth later and still have the mid layer honor them.

The revoke->slave_detach change is pretty much a name change only.  I
would have left it alone and just named slave_attach something that made
it obvious they are a matched pair, but invoke(SDptr) sounded more
like something I would read in a David Eddings novel than a kernel
function.  The big item is that I'm strongly encouraging drivers to make
use of the SDptr->hostdata pointer to store their data (it helps to avoid
things like having to keep sparse, directly indexed tables in your driver
if you can have device attached storage space).  The aic7xxx_old driver
will be getting updated to use this device attached storage area, thereby
eliminating quite a few huge arrays currently in each aic7xxx_host struct.
The slave_detach() call is a place for drivers to deconstruct anything
they might have attached to the device, disable the device internally in
their driver, etc.  The most important thing here, though, is this:

If the slave_detach() call deallocates the memory attached to
SDptr->hostdata, it *must* then NULL out the hostdata pointer.  The scsi
mid layer is nice and will kfree(SDptr->hostdata) for you on device
destruction if SDptr->hostdata is non-NULL.  We do this because, by far,
the most common case is a simple kfree() of the area and that's it.  If
your driver needs to do more, it can implement and register the
slave_detach() call.  If it doesn't, there's no need to bother with
slave_detach().  That catches the biggest common cases.  Failure to NULL
out SDptr->hostdata after kfree()'ing the memory will result in a double
kfree() and subsequent kernel complaints.
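
In other words, something along these lines (sketch only, reusing the
invented struct from above):

static void mydrv_slave_detach(Scsi_Device *SDptr)
{
	struct mydrv_device_data *p = SDptr->hostdata;

	if (p != NULL) {
		/* undo anything driver specific first, then... */
		kfree(p);
		/* keep the mid layer from kfree()'ing it a second time */
		SDptr->hostdata = NULL;
	}
}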



-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  



* Re: Setting queue depths in the scsi mid layer, an intro
  2002-10-11  2:46 Setting queue depths in the scsi mid layer, an intro Doug Ledford
@ 2002-10-11 10:34 ` Alan Cox
  2002-10-11 10:39   ` Doug Ledford
  2002-10-15  1:15 ` Patrick Mansfield
  1 sibling, 1 reply; 8+ messages in thread
From: Alan Cox @ 2002-10-11 10:34 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-scsi

> and ordered tags allowed), and the queue depth.  Most disk devices will
> end up looking something like this:
> 
> scsi_adjust_queue_depth(SDptr, MSG_ORDERED_TAG, queue_depth);
>
> on all disks to 253 in order to get the best possible performance, but 
> when the driver found out that some disk had a hard limit of say 64, it 
> was too late to tell the mid layer to adjust things down to a reasonable 
> value.  This allows the low level driver to make changes to the queue 
> depth later and still have the mid layer honor them.

This seems a somewhat broken interface to me. Not in the facilities it
provides - which are fine - but because in the general case surely this
is entirely the business of the mid layer, or mid layer provided library
code, to get right.

Why should each driver now be required to know about queue depths and
snoop inquiry commands? It seems analogous to each net driver knowing
about tcp windowing rather than the protocol doing it.

Or am I misunderstanding? If I blindly do

	scsi_adjust_queue_depth(SDptr, MSG_ORDERED_TAG, 64);

will the mid layer then figure out if it should trim the queue or turn
off ordered tags? The interface most authors need is probably more like


	scsi_default_queue_depth(....)

	eh_tag_error: scsi_default_tag_eh

Alan



* Re: Setting queue depths in the scsi mid layer, an intro
  2002-10-11 10:34 ` Alan Cox
@ 2002-10-11 10:39   ` Doug Ledford
  2002-10-11 12:47     ` Alan Cox
  0 siblings, 1 reply; 8+ messages in thread
From: Doug Ledford @ 2002-10-11 10:39 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-scsi

On Fri, Oct 11, 2002 at 11:34:48AM +0100, Alan Cox wrote:
> > and ordered tags allowed), and the queue depth.  Most disk devices will
> > end up looking something like this:
> > 
> > scsi_adjust_queue_depth(SDptr, MSG_ORDERED_TAG, queue_depth);
> >
> > on all disks to 253 in order to get the best possible performance, but 
> > when the driver found out that some disk had a hard limit of say 64, it 
> > was too late to tell the mid layer to adjust things down to a reasonable 
> > value.  This allows the low level driver to make changes to the queue 
> > depth later and still have the mid layer honor them.
> 
> This seems a somewhat broken interface to me. Not in the facilities it
> provides - which are fine, but because in the general case surely this
> is entirely the business of the mid layer or mid layer provided library
> code to get right.

<Arghhh!!!  Pulling hair out>

I wish people would decide on whether or not they want the drivers to be 
able to control things.  One week I hear everyone say "The driver should 
be able to set feature X because only it knows what to do" and the next 
week I get what you see above...

</frustration>

No, the mid layer is not the right place to handle this.  Largely because
we have both drivers like the 152x Adaptec and MegaRAID/ServeRAID in the
tree.  The mid layer can't accommodate all of them in terms of queue depth,
the same as it can't accommodate them all in terms of reasonable default
command timeouts.

> Why should each driver now be required to know about queue depths and
> snoop inquiry commands ?

Who said anything about snooping INQUIRY commands?  As for queue depths,
the driver is the *only* one that knows what it can support.  The mid
layer still snoops the INQUIRY data from devices and uses it to set a
bunch of flags in the scsi device struct (disconnect, sdtr, wdtr, ppr, and
tagged_supported to name the most important ones).  Then the mid layer
calls the slave_attach() routine after all of these items have already
been set.  The LLDD only needs to check these bits (or it can, at its
option, snoop the INQUIRY directly via SDptr->inquiry).
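
For example (just a sketch; mydrv_set_wide() and mydrv_set_sync() are
hypothetical driver helpers, and 64 a made-up depth), a slave_attach()
needn't do more than:

	if (SDptr->wdtr)
		mydrv_set_wide(SDptr);	/* negotiate wide transfers */
	if (SDptr->sdtr)
		mydrv_set_sync(SDptr);	/* negotiate sync transfers */
	if (SDptr->tagged_supported)
		scsi_adjust_queue_depth(SDptr, MSG_ORDERED_TAG, 64);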

> It seems analogous to each net driver knowing
> about tcp windowing rather than the protocol doing it.

No, it's more analogous to each net driver knowing its own transmit and
receive ring sizes.

> Or am I misunderstanding. If I blindly do
> 
> 	scsi_adjust_queue_depth(SDptr, MSG_ORDERED_TAG, 64);
> 
> will the mid layer then figure out if it should trim the queue or turn
> off ordered tags ?

Not reliably.  It usually can't because of internal queues on smart cards
and crap like that.  Also, it can't know to turn off ordered tags unless
the low level driver passes back the MSG_REJECT message it should have
gotten when it issued an ordered tag to a device that doesn't support
ordered tags.  Of course, since there's no facility for passing that info
back...

What my aic7xxx_old driver does is in the MSG_REJECT handler, if last 
message was MSG_ORDERED_TAG, then call:

scsi_adjust_queue_depth(SDptr, MSG_SIMPLE_TAG, SDptr->new_queue_depth);

which leaves the actual depth unchanged but turns off ordered tags.
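
Roughly like this (a sketch, not the actual aic7xxx_old code;
last_msg_sent is a stand-in for however your driver tracks the last
outgoing message):

	/* in the driver's MESSAGE REJECT handling; interrupt context is fine */
	if (last_msg_sent == MSG_ORDERED_TAG)
		/* device rejects ordered tags: keep the depth, drop to simple tags */
		scsi_adjust_queue_depth(SDptr, MSG_SIMPLE_TAG,
					SDptr->new_queue_depth);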

Now, this is all really pointless though unless the driver is *also*
updated to honor the tag type the block layer requests, which is another
subject altogether and hasn't been brought up yet ;-)

> The interface most authors need is probably more like
> 
> 
> 	scsi_default_queue_depth(....)
> 
> 	eh_tag_error: scsi_default_tag_eh
> 
> Alan

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  


* Re: Setting queue depths in the scsi mid layer, an intro
  2002-10-11 10:39   ` Doug Ledford
@ 2002-10-11 12:47     ` Alan Cox
  2002-10-11 12:49       ` Doug Ledford
  0 siblings, 1 reply; 8+ messages in thread
From: Alan Cox @ 2002-10-11 12:47 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-scsi

On Fri, 2002-10-11 at 11:39, Doug Ledford wrote:
> <Arghhh!!!  Pulling hair out>
> 
> I wish people would decide on whether or not they want the drivers to be 
> able to control things.  One week I hear everyone say "The driver should 
> be able to set feature X because only it knows what to do" and the next 
> week I get what you see above...
> 

It isn't an either/or question. I think that's the problem.

Do we want drivers to be able to control things - yes
Do we want drivers to be able to armwave generic settings - yes
 
> the driver is the *only* one that knows what it can support.  The mid 
> layer still snoops the INQUIRY data from devices and uses it to set a 
> bunch of flags in the scsi device struct (disconnect, sdtr, wdtr, ppr, 

Ok.

> tagged_supported to name the most important ones).  Then the mid layer 
> calls the slave_attach() routine after all of these items have already 
> been set.  The lldd only needs to check these bits (or it can, at it's 
> option, snoop the INQUIRY directly via SDptr->inquiry).

OK, that's not as bad.

> No, it's more analogous to each net driver knowing it's own transmit and 
> receive ring sizes.

Believe it or not, some of them don't, and the tcp layer actually
figures out the right buffering behaviour itself. But that's a bit
different.

> Now, this is all really pointless though unless the driver is *also* 
> updated to honor the tag type the block layer requests, which is another 
> subject all together and hasn't been brought up yet ;-)

Done that bit, and for things like i2o_scsi I'm happy to do the full job
where I can, but for a lot of the older dumber devices really all they
want is "generic_do_it_conservatively"

Alan



* Re: Setting queue depths in the scsi mid layer, an intro
  2002-10-11 12:47     ` Alan Cox
@ 2002-10-11 12:49       ` Doug Ledford
  2002-10-11 13:22         ` Alan Cox
  0 siblings, 1 reply; 8+ messages in thread
From: Doug Ledford @ 2002-10-11 12:49 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-scsi

On Fri, Oct 11, 2002 at 01:47:02PM +0100, Alan Cox wrote:
> On Fri, 2002-10-11 at 11:39, Doug Ledford wrote:
> > <Arghhh!!!  Pulling hair out>
> > 
> > I wish people would decide on whether or not they want the drivers to be 
> > able to control things.  One week I hear everyone say "The driver should 
> > be able to set feature X because only it knows what to do" and the next 
> > week I get what you see above...
> 
> It isnt an either/or question. I think thats the problem. 
> 
> Do we want drivers to be able to control things - yes
> Do we want drivers to be able to armwave generic settings - yes

And to a certain extent, both of those are true.  Overall performance in
the face of odd conditions and such is pretty much driver dependent,
though.  Armwaving a generic setting gets you an armwaved generic error
handler at best ;-)
 
> > No, it's more analogous to each net driver knowing it's own transmit and 
> > receive ring sizes.
> 
> Believe it or not, some of them don't,

Hmmm...I think we are talking in different terms.  It would seem a bit
difficult for a card driver to program the DMA engine on the card
(assuming it has one; if it doesn't, I don't really care how it works
anyway) without knowing about the ring of receive buffers it has (like the
ring on an e100 card, where the driver sets up 32 or so empty receive
buffs, tells the card where they are, and then waits for the card to
interrupt and say "buff X has been filled").  That's pretty analogous to
the hardware detail level required to fully handle QUEUE_FULL messages
properly.  Half-assed handling of QUEUE_FULL messages is easy and can be
done with generic armwaving settings ;-) (and in fact it will be; expect
those changes when I've had some sleep and get back to work on it.  The
aic7xxx_old patch I just sent to Linus disables the intelligent QF
handling in the driver so I can test mid layer handling instead).

> > Now, this is all really pointless though unless the driver is *also* 
> > updated to honor the tag type the block layer requests, which is another 
> > subject all together and hasn't been brought up yet ;-)
> 
> Done that bit, and for things like i2o_scsi I'm happy to do the full job
> where I can, but for a lot of the older dumber devices really all they
> want is "generic_do_it_conservatively"

That's not going to be a problem.  Doing things optimally and doing things
well enough are both going to be options, like they almost always have
been.  The goal, though, is that the optimal version will be a good bit
easier and can be done with smaller drivers in the future.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  


* Re: Setting queue depths in the scsi mid layer, an intro
  2002-10-11 12:49       ` Doug Ledford
@ 2002-10-11 13:22         ` Alan Cox
  0 siblings, 0 replies; 8+ messages in thread
From: Alan Cox @ 2002-10-11 13:22 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-scsi

On Fri, 2002-10-11 at 13:49, Doug Ledford wrote:
> And to a certain extent, both of those are true.  Overall performance in 
> the face of odd conditions and such is pretty much driver dependant 
> though.  Armwaving a generic setting gets you an armwaved generic error 
> handler at best ;-)

Which is fair enough. Having users able to write adequate, working
drivers is incredibly important to Linux. We have a wide variety of
"functional" ethernet drivers and a smaller set of precision-honed
drivers that cover the common high performance chips.

> Hmmm...I think we are talking different terms.  It would seem a bit 
> difficult for a card driver to program the DMA engine on the card 

There are cards whose interface is basically

	"give me the address of the next packet buffer"
	"please now send this address"

and

	"do you have a packet and if so where"
	"thank you, I have finished with it"

> them and say "buff X has been filled").  That's pretty analogous to the 
> hardware detail level required to fully handle QUEUE_FULL messages 
> properly.  Half-assed handling of QUEUE_FULL messages is easy and can be 
> done with generic armwaving settings ;-) (and in fact it will be, expect 
> those changes when I've had some sleep and get back to work on it, the 
> aic7xxx_old patch I just sent to Linus disables the intelligent QF 
> handling in the driver so I can test mid layer handling instead).

Cool. I just want to be able to make sure that all the simpler drivers
can say "yeah, I'm not perfect but I work pretty sanely and I don't curl
up and die".

> That's not going to be a problem.  Doing things optimally and doing things 
> well enough are both going to be options like they almost always have 
> been.  The goal is that the optimal version will be a good bit easier and 
> be able to be done with smaller drivers in the future though.

Great. Then we are actually on the same path.



* Re: Setting queue depths in the scsi mid layer, an intro
  2002-10-11  2:46 Setting queue depths in the scsi mid layer, an intro Doug Ledford
  2002-10-11 10:34 ` Alan Cox
@ 2002-10-15  1:15 ` Patrick Mansfield
  2002-10-15  1:28   ` Doug Ledford
  1 sibling, 1 reply; 8+ messages in thread
From: Patrick Mansfield @ 2002-10-15  1:15 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-scsi

Doug -

Finally looking at and using the new queue depth code in 2.5.42.

Why are there two kmalloc()'s in scsi.c scsi_build_commandblocks()? 

They both use the same flags, including GFP_ATOMIC:

void scsi_build_commandblocks(Scsi_Device * SDpnt)
{
	unsigned long flags;
	Scsi_Cmnd *SCpnt;

	if (SDpnt->queue_depth != 0)
		return;
		
	SCpnt = (Scsi_Cmnd *) kmalloc(sizeof(Scsi_Cmnd), GFP_ATOMIC |
			(SDpnt->host->unchecked_isa_dma ? GFP_DMA : 0));
	if (NULL == SCpnt) {
		/*
		 * Since we don't currently have *any* command blocks on this
		 * device, go ahead and try an atomic allocation...
		 */
		SCpnt = (Scsi_Cmnd *) kmalloc(sizeof(Scsi_Cmnd), GFP_ATOMIC |
			(SDpnt->host->unchecked_isa_dma ? GFP_DMA : 0));
		if (NULL == SCpnt)
			return;	/* Oops, we aren't going anywhere for now */
	}

	memset(SCpnt, 0, sizeof(Scsi_Cmnd));

[ ... ]

-- Patrick Mansfield


* Re: Setting queue depths in the scsi mid layer, an intro
  2002-10-15  1:15 ` Patrick Mansfield
@ 2002-10-15  1:28   ` Doug Ledford
  0 siblings, 0 replies; 8+ messages in thread
From: Doug Ledford @ 2002-10-15  1:28 UTC (permalink / raw)
  To: linux-scsi

On Mon, Oct 14, 2002 at 06:15:12PM -0700, Patrick Mansfield wrote:
> Doug -
> 
> Finally looking at and using the new queue depth code in 2.5.42.
> 
> Why are there two kmalloc()'s in scsi.c scsi_build_commandblocks()? 
> 
> They both use the same flags, including GFP_ATOMIC:
> 
> void scsi_build_commandblocks(Scsi_Device * SDpnt)
> {
> 	unsigned long flags;
> 	Scsi_Cmnd *SCpnt;
> 
> 	if (SDpnt->queue_depth != 0)
> 		return;
> 		
> 	SCpnt = (Scsi_Cmnd *) kmalloc(sizeof(Scsi_Cmnd), GFP_ATOMIC |
> 			(SDpnt->host->unchecked_isa_dma ? GFP_DMA : 0));
> 	if (NULL == SCpnt) {
> 		/*
> 		 * Since we don't currently have *any* command blocks on this
> 		 * device, go ahead and try an atomic allocation...
> 		 */
> 		SCpnt = (Scsi_Cmnd *) kmalloc(sizeof(Scsi_Cmnd), GFP_ATOMIC |
> 			(SDpnt->host->unchecked_isa_dma ? GFP_DMA : 0));
> 		if (NULL == SCpnt)
> 			return;	/* Oops, we aren't going anywhere for now */
> 	}

Oops.  Oversight.  It's left over from when I had the first allocation as
GFP_KERNEL and then an ATOMIC backup.  I changed that because this is
called from non-sleep-allowed contexts (tasklet/softirq) and therefore
must always use ATOMIC.  So it only needs one allocation attempt.
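
I.e., the start of scsi_build_commandblocks() can simply become something
like (untested sketch):

	SCpnt = (Scsi_Cmnd *) kmalloc(sizeof(Scsi_Cmnd), GFP_ATOMIC |
			(SDpnt->host->unchecked_isa_dma ? GFP_DMA : 0));
	if (NULL == SCpnt)
		return;	/* no memory; we aren't going anywhere for now */

	memset(SCpnt, 0, sizeof(Scsi_Cmnd));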


-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

