* PATCH [5/15] qla2xxx: SG tablesize update
@ 2004-03-14 8:24 Andrew Vasquez
2004-03-14 14:49 ` James Bottomley
0 siblings, 1 reply; 22+ messages in thread
From: Andrew Vasquez @ 2004-03-14 8:24 UTC (permalink / raw)
To: James Bottomley; +Cc: SCSI Mailing List
ChangeSet
1.1661 04/02/28 22:35:59 andrew.vasquez@qlogic.com +3 -0
Limit SG entry size to make more efficient use of the
size constraints of the request-queue.
This SG entry size is in line with the size specified in the 6.x
and 7.x series drivers.
drivers/scsi/qla2xxx/qla_os.c
1.9 04/02/28 22:35:52 andrew.vasquez@qlogic.com +1 -1
Limit SG entry size to make more efficient use of the
size constraints of the request-queue.
drivers/scsi/qla2xxx/qla_init.c
1.8 04/02/28 22:35:52 andrew.vasquez@qlogic.com +4 -1
Limit the size of the buffer used to load firmware.
drivers/scsi/qla2xxx/qla_def.h
1.8 04/02/28 22:35:52 andrew.vasquez@qlogic.com +1 -1
Increase an ISP's request queue size.
drivers/scsi/qla2xxx/qla_def.h | 2 +-
drivers/scsi/qla2xxx/qla_init.c | 5 ++++-
drivers/scsi/qla2xxx/qla_os.c | 2 +-
3 files changed, 6 insertions(+), 3 deletions(-)
ftp://ftp.qlogic.com/outgoing/linux/patches/8.x/8.00.00b11k/14_sg_resize.patch
diff -Nru a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
--- a/drivers/scsi/qla2xxx/qla_def.h Fri Mar 12 17:07:22 2004
+++ b/drivers/scsi/qla2xxx/qla_def.h Fri Mar 12 17:07:22 2004
@@ -209,7 +209,7 @@
#define MAX_OUTSTANDING_COMMANDS 1024
/* ISP request and response entry counts (37-65535) */
-#define REQUEST_ENTRY_CNT 1024 /* Number of request entries. */
+#define REQUEST_ENTRY_CNT 2048 /* Number of request entries. */
#define RESPONSE_ENTRY_CNT_2100 64 /* Number of response entries.*/
#define RESPONSE_ENTRY_CNT_2300 512 /* Number of response entries.*/
diff -Nru a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
--- a/drivers/scsi/qla2xxx/qla_init.c Fri Mar 12 17:07:22 2004
+++ b/drivers/scsi/qla2xxx/qla_init.c Fri Mar 12 17:07:22 2004
@@ -603,7 +603,10 @@
ha->product_id[3] = mb[4];
/* Adjust fw RISC transfer size */
- ha->fw_transfer_size = REQUEST_ENTRY_SIZE * REQUEST_ENTRY_CNT;
+ if (REQUEST_ENTRY_CNT > 1024)
+ ha->fw_transfer_size = REQUEST_ENTRY_SIZE * 1024;
+ else
+ ha->fw_transfer_size = REQUEST_ENTRY_SIZE * REQUEST_ENTRY_CNT;
if (IS_QLA2200(ha) &&
RD_MAILBOX_REG(ha, reg, 7) == QLA2200A_RISC_ROM_VER) {
diff -Nru a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
--- a/drivers/scsi/qla2xxx/qla_os.c Fri Mar 12 17:07:22 2004
+++ b/drivers/scsi/qla2xxx/qla_os.c Fri Mar 12 17:07:22 2004
@@ -166,7 +166,7 @@
.can_queue = REQUEST_ENTRY_CNT+128,
.cmd_per_lun = 3,
.use_clustering = ENABLE_CLUSTERING,
- .sg_tablesize = SG_ALL,
+ .sg_tablesize = 32,
/*
* The RISC allows for each command to transfer (2^32-1) bytes of data,
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: PATCH [5/15] qla2xxx: SG tablesize update
2004-03-14 8:24 PATCH [5/15] qla2xxx: SG tablesize update Andrew Vasquez
@ 2004-03-14 14:49 ` James Bottomley
2004-03-14 14:51 ` Jens Axboe
0 siblings, 1 reply; 22+ messages in thread
From: James Bottomley @ 2004-03-14 14:49 UTC (permalink / raw)
To: Andrew Vasquez; +Cc: SCSI Mailing List
On Sun, 2004-03-14 at 03:24, Andrew Vasquez wrote:
> - .sg_tablesize = SG_ALL,
> + .sg_tablesize = 32,
Are you sure you want to do this? It limits the maximum transfer size
per command to 128k (on x86), which is a bit small for FC.
James
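[Editorial note] The 128k figure follows from page-sized, unmerged SG entries. A minimal sketch of the arithmetic, with an illustrative helper (4 KB pages assumed, as on x86; SG_ALL taken as 255, its value in kernels of this era):

```c
#include <assert.h>
#include <stddef.h>

/* With no merging, each scatter/gather entry covers at most one page,
 * so the largest single command is roughly sg_tablesize * page_size.
 * This helper is purely illustrative, not a driver function. */
static size_t max_transfer_bytes(int sg_tablesize, size_t page_size)
{
	return (size_t)sg_tablesize * page_size;
}
```

With sg_tablesize = 32 and 4096-byte pages this gives 131072 bytes, i.e. the 128k James quotes; SG_ALL would allow roughly 1 MB per command.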
* Re: PATCH [5/15] qla2xxx: SG tablesize update
2004-03-14 14:49 ` James Bottomley
@ 2004-03-14 14:51 ` Jens Axboe
2004-03-14 14:59 ` James Bottomley
0 siblings, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2004-03-14 14:51 UTC (permalink / raw)
To: James Bottomley; +Cc: Andrew Vasquez, SCSI Mailing List
On Sun, Mar 14 2004, James Bottomley wrote:
> On Sun, 2004-03-14 at 03:24, Andrew Vasquez wrote:
> > - .sg_tablesize = SG_ALL,
> > + .sg_tablesize = 32,
>
> Are you sure you want to do this? It limits the maximum transfer size
> per command to 128k (on x86) which is a bit small for FC.
The explanation of why this was done (request queue limitations, huh?)
sounded pretty bogus as well.
--
Jens Axboe
* Re: PATCH [5/15] qla2xxx: SG tablesize update
2004-03-14 14:51 ` Jens Axboe
@ 2004-03-14 14:59 ` James Bottomley
2004-03-14 15:15 ` Jens Axboe
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: James Bottomley @ 2004-03-14 14:59 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Vasquez, SCSI Mailing List
On Sun, 2004-03-14 at 09:51, Jens Axboe wrote:
> The explanation of why this was done (request queue limitations, huh?)
> sounded pretty bogus as well.
Actually, to be fair, I think he means the qlogic request queue, not the
block one.
The Qla chips are rather weird in that they have a single issue queue
(whose size you can vary) but whose entry formats are fixed. If I
remember correctly, an initial command can have 4 SG elements, but a
follow on entry can have 7 (not sure of the figures). But anyway, large
SG commands end up having to find multiple entries in this queue (and
being a single issue queue for the entire card, it has to be mutexed
while you search). The more resources you need, the more difficult the
search and the more contention you generate on the resource mutex.
I talked to Arjan a bit about this type of card and how best to handle
it in our current infrastructure, but I haven't got any further than
thinking about it.
James
* Re: PATCH [5/15] qla2xxx: SG tablesize update
2004-03-14 14:59 ` James Bottomley
@ 2004-03-14 15:15 ` Jens Axboe
2004-03-14 15:18 ` Anton Blanchard
2004-03-14 20:36 ` Jeff Garzik
2 siblings, 0 replies; 22+ messages in thread
From: Jens Axboe @ 2004-03-14 15:15 UTC (permalink / raw)
To: James Bottomley; +Cc: Andrew Vasquez, SCSI Mailing List
On Sun, Mar 14 2004, James Bottomley wrote:
> On Sun, 2004-03-14 at 09:51, Jens Axboe wrote:
> > The explanation of why this was done (request queue limitations, huh?)
> > sounded pretty bogus as well.
>
> Actually, to be fair, I think he means the qlogic request queue, not the
> block one.
>
> The Qla chips are rather weird in that they have a single issue queue
> (whose size you can vary) but whose entry formats are fixed. If I
> remember correctly, an initial command can have 4 SG elements, but a
> follow on entry can have 7 (not sure of the figures). But anyway, large
> SG commands end up having to find multiple entries in this queue (and
> being a single issue queue for the entire card, it has to be mutexed
> while you search). The more resources you need, the more difficult the
> search and the more contention you generate on the resource mutex.
Ah, makes a lot more sense now, thanks :-)
--
Jens Axboe
* Re: PATCH [5/15] qla2xxx: SG tablesize update
2004-03-14 14:59 ` James Bottomley
2004-03-14 15:15 ` Jens Axboe
@ 2004-03-14 15:18 ` Anton Blanchard
2004-03-14 15:31 ` James Bottomley
2004-03-14 20:36 ` Jeff Garzik
2 siblings, 1 reply; 22+ messages in thread
From: Anton Blanchard @ 2004-03-14 15:18 UTC (permalink / raw)
To: James Bottomley; +Cc: Jens Axboe, Andrew Vasquez, SCSI Mailing List
> Actually, to be fair, I think he means the qlogic request queue, not the
> block one.
>
> The Qla chips are rather weird in that they have a single issue queue
> (whose size you can vary) but whose entry formats are fixed. If I
> remember correctly, an initial command can have 4 SG elements, but a
> follow on entry can have 7 (not sure of the figures). But anyway, large
> SG commands end up having to find multiple entries in this queue (and
> being a single issue queue for the entire card, it has to be mutexed
> while you search). The more resources you need, the more difficult the
> search and the more contention you generate on the resource mutex.
With 2.6.3 on a ppc64 box connected to a shark with 8 LUNs I could
easily consume all the request slots on the card. At this point we
spent >50% of the CPU bouncing into qla2x00_start_scsi, doing a
pci_map_sg, realising it won't fit, and doing a pci_unmap_sg.
As Andrew points out, the driver shouldn't be doing this constant
map/unmap stuff, but is there some way for a SCSI device driver to tell
the layers above it to temporarily leave it alone?
Anton
* Re: PATCH [5/15] qla2xxx: SG tablesize update
2004-03-14 15:18 ` Anton Blanchard
@ 2004-03-14 15:31 ` James Bottomley
2004-03-14 15:47 ` Anton Blanchard
0 siblings, 1 reply; 22+ messages in thread
From: James Bottomley @ 2004-03-14 15:31 UTC (permalink / raw)
To: Anton Blanchard; +Cc: Jens Axboe, Andrew Vasquez, SCSI Mailing List
On Sun, 2004-03-14 at 10:18, Anton Blanchard wrote:
> With 2.6.3 on a ppc64 box connected to a shark with 8 LUNs I could
> easily consume all the request slots on the card. At this point we
> spent >50% of the CPU bouncing into qla2x00_start_scsi, doing a
> pci_map_sg, realising it won't fit, and doing a pci_unmap_sg.
Well, since the pool is fixed size, can't the driver do some rule of
thumb check before it actually gets to the map_sg (perhaps even
pre-reserve the slots before mapping)?
> As Andrew points out, the driver shouldn't be doing this constant
> map/unmap stuff, but is there some way for a SCSI device driver to tell
> the layers above it to temporarily leave it alone?
Mappings can be very limited in certain machines. I really don't think
we want precious mapping resources sitting in an issue queue in the
mid-layer just because the qla driver couldn't manage them efficiently.
The layering rules say that the driver is responsible for getting the
IOMMU resources with dma_map_sg before it places the I/O in-flight and
should free them ASAP after the I/O returns.
All drivers have to manage their issue slots efficiently. It really
sounds like the root cause of this problem is mapping too early.
If you think you don't know how many resource slots you'll need, that's
in scsi_cmnd->request->nr_hw_segments (give or take, this is always
guaranteed to be at or over the actual number dma_map_sg eventually
returns). So: fix the qlogic mapping routines.
James
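[Editorial note] A sketch of the pre-reservation check James suggests, using nr_hw_segments as a safe upper bound. The IOCB capacities (2 SG elements in the command entry, 5 per continuation entry) are taken from figures quoted later in this thread; all function names here are illustrative, not the driver's:

```c
#include <assert.h>

/* Request entries needed for a command with sg_count mapped segments,
 * when the command IOCB holds `first` SG elements and each
 * continuation IOCB holds `cont` more (ceiling division). */
static int entries_for_sg(int sg_count, int first, int cont)
{
	if (sg_count <= first)
		return 1;
	return 1 + (sg_count - first + cont - 1) / cont;
}

/* nr_hw_segments never undercounts what dma_map_sg() later returns,
 * so a reservation sized from it, made before mapping, can never
 * come up short. */
static int can_reserve(int nr_hw_segments, int free_req_entries)
{
	return entries_for_sg(nr_hw_segments, 2, 5) <= free_req_entries;
}
```

The point is that the slot check can run before any IOMMU resources are touched, avoiding the map/unmap churn Anton measured.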
* Re: PATCH [5/15] qla2xxx: SG tablesize update
2004-03-14 15:31 ` James Bottomley
@ 2004-03-14 15:47 ` Anton Blanchard
2004-03-14 15:55 ` James Bottomley
2004-03-14 20:41 ` Jeff Garzik
0 siblings, 2 replies; 22+ messages in thread
From: Anton Blanchard @ 2004-03-14 15:47 UTC (permalink / raw)
To: James Bottomley; +Cc: Jens Axboe, Andrew Vasquez, SCSI Mailing List
> If you think you don't know how many resource slots you'll need, that's
> in scsi_cmnd->request->nr_hw_segments (give or take, this is always
> guaranteed to be at or over the actual number dma_map_sg eventually
> returns). So: fix the qlogic mapping routines.
Yes, I agree the driver should be fixed. My question was more a response
to Jens' query about why the SG limit was introduced.
Let's ignore the pci_map_sg problem and assume the qlogic driver did the
right thing. What's to stop the upper layers from continually trying to
queue things when we run out of request slots? If it's just sg_tablesize,
then we are stuck with the current compromise (i.e. set it to something
reasonably low).
Anton
* Re: PATCH [5/15] qla2xxx: SG tablesize update
2004-03-14 15:47 ` Anton Blanchard
@ 2004-03-14 15:55 ` James Bottomley
2004-03-14 16:01 ` Anton Blanchard
2004-03-14 20:41 ` Jeff Garzik
1 sibling, 1 reply; 22+ messages in thread
From: James Bottomley @ 2004-03-14 15:55 UTC (permalink / raw)
To: Anton Blanchard; +Cc: Jens Axboe, Andrew Vasquez, SCSI Mailing List
On Sun, 2004-03-14 at 10:47, Anton Blanchard wrote:
> Let's ignore the pci_map_sg problem and assume the qlogic driver did the
> right thing. What's to stop the upper layers from continually trying to
> queue things when we run out of request slots? If it's just sg_tablesize,
> then we are stuck with the current compromise (i.e. set it to something
> reasonably low).
Because on this type of failure, the driver will return
SCSI_MLQUEUE_HOST_BUSY from its queuecommand routine. This will tell
the mid layer not to try any more commands for the entire host until an
outstanding one returns (thus hopefully freeing up enough resources).
James
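[Editorial note] A toy model of the back-pressure cycle James describes. The struct, function names, and the busy return value are illustrative stand-ins, not the kernel API; only the control flow mirrors the SCSI_MLQUEUE_HOST_BUSY scheme:

```c
#include <assert.h>

enum { QUEUED = 0, TOY_HOST_BUSY = 1 };	/* stand-in for SCSI_MLQUEUE_HOST_BUSY */

struct toy_host {
	int free_req_entries;	/* ISP request-queue slots left */
};

/* queuecommand path: refuse rather than spin when slots run out;
 * the mid layer then holds off until a completion frees resources. */
static int toy_queuecommand(struct toy_host *h, int entries_needed)
{
	if (entries_needed > h->free_req_entries)
		return TOY_HOST_BUSY;
	h->free_req_entries -= entries_needed;
	return QUEUED;
}

/* completion path: slots come back, and the mid layer may retry */
static void toy_complete(struct toy_host *h, int entries_freed)
{
	h->free_req_entries += entries_freed;
}
```

A refused command stays on the mid-layer queue; nothing in the LLDD needs its own pending queue for this case.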
* Re: PATCH [5/15] qla2xxx: SG tablesize update
2004-03-14 15:55 ` James Bottomley
@ 2004-03-14 16:01 ` Anton Blanchard
0 siblings, 0 replies; 22+ messages in thread
From: Anton Blanchard @ 2004-03-14 16:01 UTC (permalink / raw)
To: James Bottomley; +Cc: Jens Axboe, Andrew Vasquez, SCSI Mailing List
> Because on this type of failure, the driver will return
> SCSI_MLQUEUE_HOST_BUSY from its queuecommand routine. This will tell
> the mid layer not to try any more commands for the entire host until an
> outstanding one returns (thus hopefully freeing up enough resources).
That makes sense; it sounds like the qlogic driver could do that and
remove the SG limit.
Anton
* Re: PATCH [5/15] qla2xxx: SG tablesize update
2004-03-14 14:59 ` James Bottomley
2004-03-14 15:15 ` Jens Axboe
2004-03-14 15:18 ` Anton Blanchard
@ 2004-03-14 20:36 ` Jeff Garzik
2004-03-14 22:31 ` James Bottomley
2 siblings, 1 reply; 22+ messages in thread
From: Jeff Garzik @ 2004-03-14 20:36 UTC (permalink / raw)
To: James Bottomley; +Cc: Jens Axboe, Andrew Vasquez, SCSI Mailing List
On Sun, Mar 14, 2004 at 09:59:55AM -0500, James Bottomley wrote:
> On Sun, 2004-03-14 at 09:51, Jens Axboe wrote:
> > The explanation of why this was done (request queue limitations, huh?)
> > sounded pretty bogus as well.
>
> Actually, to be fair, I think he means the qlogic request queue, not the
> block one.
>
> The Qla chips are rather weird in that they have a single issue queue
> (whose size you can vary) but whose entry formats are fixed. If I
> remember correctly, an initial command can have 4 SG elements, but a
> follow on entry can have 7 (not sure of the figures). But anyway, large
> SG commands end up having to find multiple entries in this queue (and
> being a single issue queue for the entire card, it has to be mutexed
> while you search). The more resources you need, the more difficult the
> search and the more contention you generate on the resource mutex.
The block layer can handle this type of hardware just fine ;-)
In my SATA hacking I am finding several devices that one must constrain
based on hardware-global resources, rather than just TCQ (or lack
thereof).
Jeff
* Re: PATCH [5/15] qla2xxx: SG tablesize update
2004-03-14 15:47 ` Anton Blanchard
2004-03-14 15:55 ` James Bottomley
@ 2004-03-14 20:41 ` Jeff Garzik
2004-03-14 22:27 ` James Bottomley
1 sibling, 1 reply; 22+ messages in thread
From: Jeff Garzik @ 2004-03-14 20:41 UTC (permalink / raw)
To: Anton Blanchard
Cc: James Bottomley, Jens Axboe, Andrew Vasquez, SCSI Mailing List
On Mon, Mar 15, 2004 at 02:47:13AM +1100, Anton Blanchard wrote:
> Let's ignore the pci_map_sg problem and assume the qlogic driver did the
> right thing. What's to stop the upper layers from continually trying to
> queue things when we run out of request slots? If it's just sg_tablesize,
> then we are stuck with the current compromise (i.e. set it to something
> reasonably low).
The block layer is quite simple, really.
You have an asynchronous event, "start queue drain."
And the block layer won't touch your queue again until you have removed
all requests from the queue. Once you have removed all requests from
the queue, the cycle repeats.
It is up to the block driver (and/or SCSI layer in this case) to stop
taking requests off the queue, when it can take no more. And it is up
to that same driver to start taking requests again, when hardware
resources are once again available.
So, the block layer supports this hardware situation just fine... it's a
matter of getting SCSI to understand that, I suppose :)
The issue of hardware-global resources constraining multiple device
queues is one that requires a bit of thought. Not impossible... I did
it in my Carmel driver. But so far, I have not seen any _real_
solutions except for mine. All the others have been hacks -- such as
"hardware queue size 1024 for all devices, so limit each device
to '1024 / n_devices' requests at a time."
Jeff
* Re: PATCH [5/15] qla2xxx: SG tablesize update
2004-03-14 20:41 ` Jeff Garzik
@ 2004-03-14 22:27 ` James Bottomley
2004-03-15 16:12 ` Jeff Garzik
0 siblings, 1 reply; 22+ messages in thread
From: James Bottomley @ 2004-03-14 22:27 UTC (permalink / raw)
To: Jeff Garzik
Cc: Anton Blanchard, Jens Axboe, Andrew Vasquez, SCSI Mailing List
On Sun, 2004-03-14 at 15:41, Jeff Garzik wrote:
> So, the block layer supports this hardware situation just fine... it's a
> matter of getting SCSI to understand that, I suppose :)
We already do all of that, if you look ...
The place where all of this becomes rather suboptimal for SCSI is on an
SMP machine, where we get multiple queues racing on multiple CPUs for
different devices and then having to be coalesced into a single queue
for the host adapter. It causes cache line bouncing like you wouldn't
believe...
> The issue of hardware-global resources constraining multiple device
> queues is one that requires a bit of thought. Not impossible... I did
> it in my Carmel driver. But so far, I have not seen any _real_
> solutions except for mine. All the others have been hacks -- such as
> "hardware queue size 1024 for all devices, so limit each device
> to '1024 / n_devices' requests at a time."
We do this too.
If you look in the scsi_host structure you see can_queue, which is the
maximum number of total requests we allow to the entire host. Then
there's queue_depth in scsi_device, which is the maximum number of
requests to the individual devices. We even have a very simplistic
fairness scheme to prevent one device hogging all of the host queue
elements.
For dynamic resource situations we have the queuecommand return codes:
SCSI_MLQUEUE_HOST_BUSY, which means the entire host is temporarily out
of resources and causes the mid layer to hold off all commands for that
host until we get one back from any device on the host; and
SCSI_MLQUEUE_DEVICE_BUSY, which means that the device queue is
temporarily out of resources, so all commands to that device are held
off until one returns from that device.
James
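[Editorial note] The two-level throttle James describes can be sketched as a toy model. The names here are illustrative, not the kernel's scsi_host/scsi_device fields; only the relationship between the host-wide and per-device ceilings is the point:

```c
#include <assert.h>

#define TOY_NDEV 4

struct toy_sched {
	int host_busy, can_queue;		/* whole-host outstanding limit */
	int dev_busy[TOY_NDEV], queue_depth;	/* per-device limit */
};

/* A command may be issued only if both the host-wide ceiling and the
 * target device's own ceiling still have room; either one exhausted
 * maps to the corresponding MLQUEUE busy code in the real mid layer. */
static int toy_may_queue(const struct toy_sched *s, int dev)
{
	return s->host_busy < s->can_queue &&
	       s->dev_busy[dev] < s->queue_depth;
}
```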
* Re: PATCH [5/15] qla2xxx: SG tablesize update
2004-03-14 20:36 ` Jeff Garzik
@ 2004-03-14 22:31 ` James Bottomley
2004-03-15 16:09 ` Jeff Garzik
0 siblings, 1 reply; 22+ messages in thread
From: James Bottomley @ 2004-03-14 22:31 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Jens Axboe, Andrew Vasquez, SCSI Mailing List
On Sun, 2004-03-14 at 15:36, Jeff Garzik wrote:
> On Sun, Mar 14, 2004 at 09:59:55AM -0500, James Bottomley wrote:
> > The Qla chips are rather weird in that they have a single issue queue
> > (whose size you can vary) but whose entry formats are fixed. If I
> > remember correctly, an initial command can have 4 SG elements, but a
> > follow on entry can have 7 (not sure of the figures). But anyway, large
> > SG commands end up having to find multiple entries in this queue (and
> > being a single issue queue for the entire card, it has to be mutexed
> > while you search). The more resources you need, the more difficult the
> > search and the more contention you generate on the resource mutex.
>
> The block layer can handle this type of hardware just fine ;-)
>
> In my SATA hacking I am finding several devices that one must constrain
> based on hardware-global resources, rather than just TCQ (or lack
> thereof).
I'm sorry, I must have missed the multiple queues down to a single HBA
queue API in the block layer, which one is it again?
James
* Re: PATCH [5/15] qla2xxx: SG tablesize update
2004-03-14 22:31 ` James Bottomley
@ 2004-03-15 16:09 ` Jeff Garzik
0 siblings, 0 replies; 22+ messages in thread
From: Jeff Garzik @ 2004-03-15 16:09 UTC (permalink / raw)
To: James Bottomley; +Cc: Jens Axboe, Andrew Vasquez, SCSI Mailing List
James Bottomley wrote:
> On Sun, 2004-03-14 at 15:36, Jeff Garzik wrote:
>
>>On Sun, Mar 14, 2004 at 09:59:55AM -0500, James Bottomley wrote:
>>
>>>The Qla chips are rather weird in that they have a single issue queue
>>>(whose size you can vary) but whose entry formats are fixed. If I
>>>remember correctly, an initial command can have 4 SG elements, but a
>>>follow on entry can have 7 (not sure of the figures). But anyway, large
>>>SG commands end up having to find multiple entries in this queue (and
>>>being a single issue queue for the entire card, it has to be mutexed
>>>while you search). The more resources you need, the more difficult the
>>>search and the more contention you generate on the resource mutex.
>>
>>The block layer can handle this type of hardware just fine ;-)
>>
>>In my SATA hacking I am finding several devices that one must constrain
>>based on hardware-global resources, rather than just TCQ (or lack
>>thereof).
>
>
> I'm sorry, I must have missed the multiple queues down to a single HBA
> queue API in the block layer, which one is it again?
Not saying the block layer provides this capability, simply that it is
possible in any block driver. It required < 20 LOC in carmel.c to do
this in a hardware-specific fashion, so it seemed unnecessary to invent
a new API... I would like to see something similar in cciss.c, in fact...
Jeff
* Re: PATCH [5/15] qla2xxx: SG tablesize update
2004-03-14 22:27 ` James Bottomley
@ 2004-03-15 16:12 ` Jeff Garzik
0 siblings, 0 replies; 22+ messages in thread
From: Jeff Garzik @ 2004-03-15 16:12 UTC (permalink / raw)
To: James Bottomley
Cc: Anton Blanchard, Jens Axboe, Andrew Vasquez, SCSI Mailing List
James Bottomley wrote:
> We already do all of that, if you look ...
[...]
> We do this too.
[...]
Oh, no argument. I use some of that stuff in libata, too.
These are features, however, that are not specific to SCSI. They are
just thin wrappers on top of the block layer, and would be more
generally useful one layer up... :)
Jeff
* RE: PATCH [5/15] qla2xxx: SG tablesize update
@ 2004-03-15 23:43 Andrew Vasquez
2004-03-16 3:37 ` James Bottomley
` (3 more replies)
0 siblings, 4 replies; 22+ messages in thread
From: Andrew Vasquez @ 2004-03-15 23:43 UTC (permalink / raw)
To: James Bottomley, Jeff Garzik
Cc: Anton Blanchard, Jens Axboe, SCSI Mailing List
On Sunday, March 14, 2004 6:49 AM, James Bottomley wrote:
> On Sun, 2004-03-14 at 03:24, Andrew Vasquez wrote:
> > - .sg_tablesize = SG_ALL,
> > + .sg_tablesize = 32,
>
> Are you sure you want to do this? It limits the maximum transfer size
> per command to 128k (on x86) which is a bit small for FC.
>
Ideally, no, we don't want to do this... how about sending down larger
individual SG entries? The ISP only limits the total transfer of a
command to 2^32-1 bytes ;)
On Sunday, March 14, 2004 7:00 AM, James Bottomley wrote:
> On Sun, 2004-03-14 at 09:51, Jens Axboe wrote:
> > The explanation of why this was done (request queue limitations,
> > huh?) sounded pretty bogus as well.
>
> Actually, to be fair, I think he means the qlogic request queue, not
> the block one.
>
Yes, the ISP's request queue is the rate-limiting factor.
> The Qla chips are rather weird in that they have a single issue
> queue (whose size you can vary) but whose entry formats are fixed.
> If I remember correctly, an initial command can have 4 SG elements,
> but a follow on entry can have 7 (not sure of the figures). But
> anyway, large SG commands end up having to find multiple entries in
> this queue (and being a single issue queue for the entire card, it
> has to be mutexed while you search). The more resources you need,
> the more difficult the search and the more contention you generate
> on the resource mutex.
>
Our intention with the SG_ALL -> 32 sg_tablesize patch was to
strike an (artificial and static) balance between request-queue
resource usage and the commands that consume them. Let me digress for
a moment.
Since the driver specifies SG_ALL for the sg_tablesize, from testing
(x86 and ppc64), we've seen at most 128 SG entries attached to a
command request of 512K bytes (4K page size * 128).
For each of those 128-SG-entry commands, the driver requires:
o 18 request entries with a DMA mask set to 0xffffffff.
(128 - 3) / 7 + 1
o 26 request entries with a DMA mask set to ~0ULL.
(128 - 2) / 5 + 1
Taking the 64-bit DMA mask, the following table shows a rough
approximation of the maximum number of outstanding commands, each
having 128 SG entries:
RequestQ Size Max. Outstanding Commands
1024 39
2048 78
.... ...
8192 315
.... ...
That's not very many (albeit large transfer) commands outstanding at
any given time across all targets and luns.
As I had stated, our initial thought was to reduce the sg_tablesize to
provide a 'better' balance of outstanding I/Os to a number of devices.
In the 6.x and 7.x series drivers, we've limited the sg_tablesize to
32, offering a max command transfer size of 128K (still assuming
page-size SG entries with no merging). Again, for each of these
32-SG-entry commands, the driver requires:
o 5 request entries with a DMA mask set to 0xffffffff.
(32 - 3) / 7 + 1
o 6 request entries with a DMA mask set to ~0ULL.
(32 - 2) / 5
Again taking the 64-bit DMA mask as an example:
RequestQ Size Max. Outstanding Commands
1024 170
2048 341
.... ...
8192 1365
.... ...
Superficially, this seems like a 'better' solution...but...
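[Editorial note] The figures in the two tables above can be checked mechanically; C integer division reproduces the mail's numbers exactly (the mail's own rounding, with and without the trailing +1, is kept as quoted):

```c
#include <assert.h>

/* Maximum commands outstanding at once, given the ISP request-queue
 * size and the request entries each command consumes. */
static int max_outstanding(int request_q_size, int entries_per_cmd)
{
	return request_q_size / entries_per_cmd;
}
```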
On Sunday, March 14, 2004 7:31 AM, James Bottomley wrote:
> On Sun, 2004-03-14 at 10:18, Anton Blanchard wrote:
> > With 2.6.3 on a ppc64 box connected to a shark with 8 LUNs I could
> > easily consume all the request slots on the card. At this point we
> > spent 50% of the cpu bouncing into qla2x00_start_scsi, doing a
> > pci_map_sg realising it wont fit and doing a pci_unmap_sg.
>
> Well, since the pool is fixed size, can't the driver do some rule of
> thumb check before it actually gets to the map_sg (perhaps even
> pre-reserve the slots before mapping)?
>
Well, 'PATCH [2/15] qla2xxx: Track DSDs used by an SRB' was meant to
fix this problem of needlessly (and repeatedly) calling pci_map_sg()
during a resource shortage.
> If you think you don't know how many resource slots you'll need,
> that's in scsi_cmnd->request->nr_hw_segments (give or take, this is
> always guaranteed to be at or over the actual number dma_map_sg
> eventually returns).
>
I'm curious then, how does this value (nr_hw_segments) differ from
scsi_cmnd->use_sg?
Basically what the above patch did was track (once) the total number
of SG entries required (the return value from pci_map_sg()) so that if
a resource shortage occurred, the command would be internally requeued
for later processing without the need to call pci_map_sg() again.
But from later emails... it's beginning to sound like a 'better' fix
would be to use the midlayer's own queueing mechanisms and strip out
the qla2xxx driver's legacy pending-queue infrastructure in favor of
returning:
On Sunday, March 14, 2004 2:27 PM, James Bottomley wrote:
> For dynamic resource situations we have the queuecommand return
> codes
>
> SCSI_MLQUEUE_HOST_BUSY which means the entire host is temporarily
> out of resources and causes the mid layer to hold off all commands
> for that host until we get one back from any device on the host and
>
from queuecommand(). The 8.x series driver inherited a lot of the
queuing baggage created during driver development of [567].x to
address some deficiencies of earlier midlayer implementations (all of
which have been addressed in recent kernels). I'll start to take a
look at tearing out the pending_q.
<If you made it this far> Thanks everyone for all your feedback.
Regards,
Andrew Vasquez
* RE: PATCH [5/15] qla2xxx: SG tablesize update
2004-03-15 23:43 Andrew Vasquez
@ 2004-03-16 3:37 ` James Bottomley
2004-03-16 6:40 ` Jeremy Higdon
` (2 subsequent siblings)
3 siblings, 0 replies; 22+ messages in thread
From: James Bottomley @ 2004-03-16 3:37 UTC (permalink / raw)
To: Andrew Vasquez
Cc: Jeff Garzik, Anton Blanchard, Jens Axboe, SCSI Mailing List
On Mon, 2004-03-15 at 18:43, Andrew Vasquez wrote:
> I'm curious then, how does this value (nr_hw_segments) differ from
> scsi_cmnd->use_sg?
OK, here's a potted version of what goes on:
the block layer, when merging, keeps two counters of request segments:
nr_phys_segments and nr_hw_segments. The counts are slightly inaccurate
for efficiency, but guaranteed not to go over max_phys_segments and
max_hw_segments respectively (the latter is what sg_tablesize becomes).
max_phys_segments bounds the size of the *input* sgtable (the one the
mid layer allocates).
space for a table of nr_phys_segments elements. Then we map the request
from the block layer (blk_rq_map_sg) which fills them in with the
physical pages. blk_rq_map_sg() does an exact count of physical
segments and may find additional cases where pages are actually adjacent
and so can be merged, so the value it returns (which is what is put in
cmd->use_sg) is <= nr_phys_segments. However, you don't care about this
part, the mid-layer does it all for you (well, except that
max_phys_segments is fixed in the mid-layer at 128, which means that
your sgtable can be in practice never more than 128 elements).
When you map the sgtable for use by the HBA, using dma_map_sg(), the
bus physical addresses are filled in. If the platform has no IOMMU,
this is usually simply the memory physical addresses and
nr_phys_segments == nr_hw_segments.
However, if there is an IOMMU in the system, it may be able to take
non-adjacent pages in physical memory and remap them to be adjacent in bus
physical address space. This is called virtual merging and when it
happens, the size of the sgtable you get out of dma_map_sg shrinks. The
size it shrinks down to is always <= nr_hw_segments (because the way the
iommu does the mapping has been parametrised for the block layer).
Thus, the number of elements the driver will have to allocate is always
<= nr_hw_segments.
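[Editorial note] The shrinking James describes can be modeled in a few lines. This is a toy illustration, not the DMA API: an entry whose bus address starts exactly where the previous one ended is "virtually merged" into the same segment, so the count coming out is always <= the count going in:

```c
#include <assert.h>

/* Count merged segments given each entry's bus address and length. */
static int toy_map_sg(const unsigned long *bus_addr,
		      const unsigned long *len, int nents)
{
	int i, segs = 1;

	for (i = 1; i < nents; i++)
		if (bus_addr[i] != bus_addr[i - 1] + len[i - 1])
			segs++;		/* not adjacent: new segment */
	return segs;			/* always <= nents */
}
```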
> But from later emails... it's beginning to sound like a 'better' fix
> would be to use the midlayer's own queueing mechanisms and strip out
> the qla2xxx driver's legacy pending-queue infrastructure in favor of
> returning:
Yes.
> On Sunday, March 14, 2004 2:27 PM, James Bottomley wrote:
> > For dynamic resource situations we have the queuecommand return
> > codes
> >
> > SCSI_MLQUEUE_HOST_BUSY which means the entire host is temporarily
> > out of resources and causes the mid layer to hold off all commands
> > for that host until we get one back from any device on the host and
> >
>
> from queuecommand(). The 8.x series driver inherited a lot of the
> queuing baggage created during driver development of [567].x to
> address some deficiencies of earlier midlayer implementations (all of
> which have been addressed in recent kernels). I'll start to take a
> look at tearing out the pending_q.
Thanks,
James
* Re: PATCH [5/15] qla2xxx: SG tablesize update
2004-03-15 23:43 Andrew Vasquez
2004-03-16 3:37 ` James Bottomley
@ 2004-03-16 6:40 ` Jeremy Higdon
2004-03-16 11:32 ` Anton Blanchard
2004-03-16 21:49 ` James Bottomley
3 siblings, 0 replies; 22+ messages in thread
From: Jeremy Higdon @ 2004-03-16 6:40 UTC (permalink / raw)
To: Andrew Vasquez
Cc: James Bottomley, Jeff Garzik, Anton Blanchard, Jens Axboe,
SCSI Mailing List
On Mon, Mar 15, 2004 at 03:43:18PM -0800, Andrew Vasquez wrote:
>
> Since the driver specifies SG_ALL for the sg_tablesize, from testing
> (x86 and ppc64), we've seen at most 128 SG entries attached to a
> command request of 512K bytes (4K page size * 128).
Andrew, with a slight mod to your driver, and using James's patch
to issue larger requests, I was able to issue 4MB requests to a
device (256 SG entries times 16KB page size) on our Itanium box.
Some H/W RAIDs have a strong preference for really large requests.
Your plan to switch to the midlayer throttling mechanism sounds good.
jeremy
* Re: PATCH [5/15] qla2xxx: SG tablesize update
2004-03-15 23:43 Andrew Vasquez
2004-03-16 3:37 ` James Bottomley
2004-03-16 6:40 ` Jeremy Higdon
@ 2004-03-16 11:32 ` Anton Blanchard
2004-03-16 21:49 ` James Bottomley
3 siblings, 0 replies; 22+ messages in thread
From: Anton Blanchard @ 2004-03-16 11:32 UTC (permalink / raw)
To: Andrew Vasquez
Cc: James Bottomley, Jeff Garzik, Jens Axboe, SCSI Mailing List
Hi,
> Ideally, no we don't want to do this...how about sending down larger
> individual SG entries, the ISP only limits the total transfer of a
> command to 2^32-1 bytes ;)
Many architectures with IOMMUs do virtual merging, so you will get
much larger individual SG entries on those architectures. On ppc64 it's
now an option, IOMMU_VMERGE.
Virtual merging is another reason to remove the 32 SG limit. On ppc64 we
don't use the BIO_VMERGE_BOUNDARY trick (it's there to allow early
estimation of how an sglist can be merged). We do this because in the
time between the estimation and the actual allocation the IOMMU space
may have become too fragmented for the allocation to succeed.
Instead we do all the merging at pci_map_sg time which means we can fall
back to not merging at all when the IOMMU space becomes fragmented.
Unfortunately with the 32 SG change we will be limited to requests of
PAGE_SIZE*32.
> Since the driver specifies SG_ALL for the sg_tablesize, from testing
> (x86 and ppc64), we've seen at most 128 SG entries attached to a
> command request of 512K bytes (4K page size * 128).
Yep, we are limited by the maximum size of the scsi SG list of 128
entries. James merged a patch recently that allows us to play with
larger sizes (eg 256 entries), check out SCSI_MAX_PHYS_SEGMENTS and
MAX_PHYS_SEGMENTS.
> from queuecommand(). The 8.x series driver inherited a lot of the
> queuing baggage created during driver development of [567].x to
> address some deficiencies of earlier midlayer implementations (all of
> which have been addressed in recent kernels). I'll start to take a
> look at tearing out the pending_q.
Yeah that solution sounds promising.
Anton
* RE: PATCH [5/15] qla2xxx: SG tablesize update
2004-03-15 23:43 Andrew Vasquez
` (2 preceding siblings ...)
2004-03-16 11:32 ` Anton Blanchard
@ 2004-03-16 21:49 ` James Bottomley
3 siblings, 0 replies; 22+ messages in thread
From: James Bottomley @ 2004-03-16 21:49 UTC (permalink / raw)
To: Andrew Vasquez
Cc: Jeff Garzik, Anton Blanchard, Jens Axboe, SCSI Mailing List
So, did you want me to apply the sg_tablesize lowering patch?
I really think performance may suffer from it but, hey, I'm not the one
who has to justify the figures.
James
* RE: PATCH [5/15] qla2xxx: SG tablesize update
@ 2004-03-16 22:09 Andrew Vasquez
0 siblings, 0 replies; 22+ messages in thread
From: Andrew Vasquez @ 2004-03-16 22:09 UTC (permalink / raw)
To: James Bottomley
Cc: Jeff Garzik, Anton Blanchard, Jens Axboe, SCSI Mailing List
On Tuesday, March 16, 2004 1:50 PM, James Bottomley wrote:
> So, did you want me to apply the sg_tablesize lowering patch?
>
> I really think performance may suffer from it but, hey, I'm not the
> one who has to justify the figures.
>
For now, I'd say keep everything in the original patch except the
hunk that changes SG_ALL -> 32, revised patch attached.
The larger pending_q tear-down patch will follow shortly.
Thanks,
Andrew
diff -Nru a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
--- a/drivers/scsi/qla2xxx/qla_def.h Fri Mar 12 17:07:22 2004
+++ b/drivers/scsi/qla2xxx/qla_def.h Fri Mar 12 17:07:22 2004
@@ -209,7 +209,7 @@
#define MAX_OUTSTANDING_COMMANDS 1024
/* ISP request and response entry counts (37-65535) */
-#define REQUEST_ENTRY_CNT 1024 /* Number of request entries. */
+#define REQUEST_ENTRY_CNT 2048 /* Number of request entries. */
#define RESPONSE_ENTRY_CNT_2100 64 /* Number of response entries.*/
#define RESPONSE_ENTRY_CNT_2300 512 /* Number of response entries.*/
diff -Nru a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
--- a/drivers/scsi/qla2xxx/qla_init.c Fri Mar 12 17:07:22 2004
+++ b/drivers/scsi/qla2xxx/qla_init.c Fri Mar 12 17:07:22 2004
@@ -603,7 +603,10 @@
ha->product_id[3] = mb[4];
/* Adjust fw RISC transfer size */
- ha->fw_transfer_size = REQUEST_ENTRY_SIZE * REQUEST_ENTRY_CNT;
+ if (REQUEST_ENTRY_CNT > 1024)
+ ha->fw_transfer_size = REQUEST_ENTRY_SIZE * 1024;
+ else
+ ha->fw_transfer_size = REQUEST_ENTRY_SIZE * REQUEST_ENTRY_CNT;
if (IS_QLA2200(ha) &&
RD_MAILBOX_REG(ha, reg, 7) == QLA2200A_RISC_ROM_VER) {
Thread overview: 22+ messages
-- links below jump to the message on this page --
2004-03-14 8:24 PATCH [5/15] qla2xxx: SG tablesize update Andrew Vasquez
2004-03-14 14:49 ` James Bottomley
2004-03-14 14:51 ` Jens Axboe
2004-03-14 14:59 ` James Bottomley
2004-03-14 15:15 ` Jens Axboe
2004-03-14 15:18 ` Anton Blanchard
2004-03-14 15:31 ` James Bottomley
2004-03-14 15:47 ` Anton Blanchard
2004-03-14 15:55 ` James Bottomley
2004-03-14 16:01 ` Anton Blanchard
2004-03-14 20:41 ` Jeff Garzik
2004-03-14 22:27 ` James Bottomley
2004-03-15 16:12 ` Jeff Garzik
2004-03-14 20:36 ` Jeff Garzik
2004-03-14 22:31 ` James Bottomley
2004-03-15 16:09 ` Jeff Garzik
-- strict thread matches above, loose matches on Subject: below --
2004-03-15 23:43 Andrew Vasquez
2004-03-16 3:37 ` James Bottomley
2004-03-16 6:40 ` Jeremy Higdon
2004-03-16 11:32 ` Anton Blanchard
2004-03-16 21:49 ` James Bottomley
2004-03-16 22:09 Andrew Vasquez