* [PATCH + RFC] Beginning of some updates to scsi mid layer
@ 2002-06-19 0:47 Doug Ledford
2002-06-19 21:15 ` Patrick Mansfield
0 siblings, 1 reply; 24+ messages in thread
From: Doug Ledford @ 2002-06-19 0:47 UTC (permalink / raw)
To: linux-scsi; +Cc: torvalds
Linus,
I've got some things queued up to start sending through to you as they
pass my testing here. However, I wanted to get some input from people to
make sure that these are heading down paths people can agree with...
So, this first patch extends the slave_attach() function that Eric put as
a stub in scsi.c long ago and then never used. The idea is sound; it was
just never implemented. Specifically, the scsi upper layer
knows when a device is ready to be used, but some of the limitations on
that device are imposed by the low level driver (and sometimes the low level
driver needs to know certain things about a device in order to act
appropriately). This patch makes it so that any time we attach a new
device into the scsi chain, we call the scsi low level driver's
slave_attach() routine. This is where the low level driver should do
things like look at the INQUIRY data to determine U160+ support, set queue
depths, things like that. When we are removing the device, we call the
(new) slave_detach() routine in the driver. slave_detach() will replace
the revoke() routine once I've removed the usage of revoke() from the one
and only driver that uses it.
Currently, the low level driver can:
1. Poke into the INQUIRY data for sync speed info
2. Set the queue depth on the device (using the new
scsi_adjust_queue_depth function)
3. Allocate any needed structs and such for this device
Future plans include allowing drivers to:
4. Set the default timeout for the device (so RAID controllers can have
their 60 second disk timeouts while non-RAID controllers can have their
much shorter 5 to 10 second timeouts, really helps responsiveness of
system in the face of any error conditions when using the non-RAID
controllers)
5. Configure other queueing parameters (queue type, ordering
capabilities, etc. as appropriate in light of other changes I have in
mind)
6. Anything else that makes sense.
As mentioned, this adds a new scsi_adjust_queue_depth routine that lets
the low level driver update the queue depth on a device at any point in
time. So, when Justin Gibbs' driver allocates a queue depth of 253
commands on drive X, then finds out that drive X has a hard limit of 64,
then we aren't continuing to waste 189 commands worth of allocated kernel
memory that can never be effectively used. Future plans here include
linking this into the bio queueing so that adjusting the mid layer queue
depth does the "right thing" in regards to the bio layer (for all I know
it does already, I haven't gotten into the bio->scsi interface yet).
Once this stuff is in place, then my next goal is to get the queueing code
in the mid layer up to usable standards (the goal is to be able to rip
the guts of the QUEUE_FULL and BUSY handling out of my driver and still
have the system do the "right thing" instead of my driver having its own
complete internal queue mechanism including timers and such like it does
now).
Side note: the work I'm doing on the aic7xxx_old driver here is simply
because that's the easiest sandbox for me to verify my changes; it doesn't
imply that I'll be updating the driver in any meaningful way.
Comments?
--- linux-2.5.22/drivers/scsi/aic7xxx_old/aic7xxx.h.slave Sun Jun 16 22:31:23 2002
+++ linux-2.5.22/drivers/scsi/aic7xxx_old/aic7xxx.h Tue Jun 18 18:04:50 2002
@@ -46,7 +46,9 @@
eh_host_reset_handler: NULL, \
abort: aic7xxx_abort, \
reset: aic7xxx_reset, \
- slave_attach: NULL, \
+ select_queue_depths: NULL, \
+ slave_attach: aic7xxx_slave_attach, \
+ slave_detach: aic7xxx_slave_detach, \
bios_param: aic7xxx_biosparam, \
can_queue: 255, /* max simultaneous cmds */\
this_id: -1, /* scsi id of host adapter */\
@@ -64,6 +66,8 @@
extern int aic7xxx_reset(Scsi_Cmnd *, unsigned int);
extern int aic7xxx_abort(Scsi_Cmnd *);
extern int aic7xxx_release(struct Scsi_Host *);
+extern int aic7xxx_slave_attach(Scsi_Device *);
+extern void aic7xxx_slave_detach(Scsi_Device *);
extern const char *aic7xxx_info(struct Scsi_Host *);
--- linux-2.5.22/drivers/scsi/aic7xxx_old.c.slave Sun Jun 16 22:31:31 2002
+++ linux-2.5.22/drivers/scsi/aic7xxx_old.c Tue Jun 18 19:34:13 2002
@@ -977,7 +977,7 @@
#define DEVICE_DTR_SCANNED 0x40
volatile unsigned char dev_flags[MAX_TARGETS];
volatile unsigned char dev_active_cmds[MAX_TARGETS];
- volatile unsigned char dev_temp_queue_depth[MAX_TARGETS];
+ volatile unsigned short dev_temp_queue_depth[MAX_TARGETS];
unsigned char dev_commands_sent[MAX_TARGETS];
unsigned int dev_timer_active; /* Which devs have a timer set */
@@ -989,7 +989,9 @@
unsigned char dev_last_queue_full[MAX_TARGETS];
unsigned char dev_last_queue_full_count[MAX_TARGETS];
- unsigned char dev_max_queue_depth[MAX_TARGETS];
+ unsigned char dev_lun_queue_depth[MAX_TARGETS];
+ unsigned short dev_scbs_needed[MAX_TARGETS];
+ unsigned short dev_max_queue_depth[MAX_TARGETS];
volatile scb_queue_type delayed_scbs[MAX_TARGETS];
@@ -1036,6 +1038,7 @@
ahc_chip chip; /* chip type */
ahc_bugs bugs;
dma_addr_t fifo_dma; /* DMA handle for fifo arrays */
+ Scsi_Device *Scsi_Dev[MAX_TARGETS][MAX_LUNS];
/*
* Statistics Kept:
@@ -2821,94 +2824,6 @@
cmd->result |= (DID_RESET << 16);
}
- if (!(p->dev_flags[tindex] & DEVICE_PRESENT))
- {
- if ( (cmd->cmnd[0] == INQUIRY) && (cmd->result == DID_OK) )
- {
-
- p->dev_flags[tindex] |= DEVICE_PRESENT;
-#define WIDE_INQUIRY_BITS 0x60
-#define SYNC_INQUIRY_BITS 0x10
-#define SCSI_VERSION_BITS 0x07
-#define SCSI_DT_BIT 0x04
- if(!(p->dev_flags[tindex] & DEVICE_DTR_SCANNED)) {
- char *buffer;
-
- if(cmd->use_sg)
- BUG();
-
- buffer = (char *)cmd->request_buffer;
-
- if ( (buffer[7] & WIDE_INQUIRY_BITS) &&
- (p->features & AHC_WIDE) )
- {
- p->needwdtr |= (1<<tindex);
- p->needwdtr_copy |= (1<<tindex);
- p->transinfo[tindex].goal_width = p->transinfo[tindex].user_width;
- }
- else
- {
- p->needwdtr &= ~(1<<tindex);
- p->needwdtr_copy &= ~(1<<tindex);
- pause_sequencer(p);
- aic7xxx_set_width(p, cmd->target, cmd->channel, cmd->lun,
- MSG_EXT_WDTR_BUS_8_BIT, (AHC_TRANS_ACTIVE |
- AHC_TRANS_GOAL |
- AHC_TRANS_CUR) );
- unpause_sequencer(p, FALSE);
- }
- if ( (buffer[7] & SYNC_INQUIRY_BITS) &&
- p->transinfo[tindex].user_offset )
- {
- p->transinfo[tindex].goal_period = p->transinfo[tindex].user_period;
- p->transinfo[tindex].goal_options = p->transinfo[tindex].user_options;
- if (p->features & AHC_ULTRA2)
- p->transinfo[tindex].goal_offset = MAX_OFFSET_ULTRA2;
- else if (p->transinfo[tindex].goal_width == MSG_EXT_WDTR_BUS_16_BIT)
- p->transinfo[tindex].goal_offset = MAX_OFFSET_16BIT;
- else
- p->transinfo[tindex].goal_offset = MAX_OFFSET_8BIT;
- if ( (((buffer[2] & SCSI_VERSION_BITS) >= 3) ||
- (buffer[56] & SCSI_DT_BIT) ||
- (p->dev_flags[tindex] & DEVICE_SCSI_3) ) &&
- (p->transinfo[tindex].user_period <= 9) &&
- (p->transinfo[tindex].user_options) )
- {
- p->needppr |= (1<<tindex);
- p->needppr_copy |= (1<<tindex);
- p->needsdtr &= ~(1<<tindex);
- p->needsdtr_copy &= ~(1<<tindex);
- p->needwdtr &= ~(1<<tindex);
- p->needwdtr_copy &= ~(1<<tindex);
- p->dev_flags[tindex] |= DEVICE_SCSI_3;
- }
- else
- {
- p->needsdtr |= (1<<tindex);
- p->needsdtr_copy |= (1<<tindex);
- p->transinfo[tindex].goal_period =
- MAX(10, p->transinfo[tindex].goal_period);
- p->transinfo[tindex].goal_options = 0;
- }
- }
- else
- {
- p->needsdtr &= ~(1<<tindex);
- p->needsdtr_copy &= ~(1<<tindex);
- p->transinfo[tindex].goal_period = 255;
- p->transinfo[tindex].goal_offset = 0;
- p->transinfo[tindex].goal_options = 0;
- }
- p->dev_flags[tindex] |= DEVICE_DTR_SCANNED;
- p->dev_flags[tindex] |= DEVICE_PRINT_DTR;
- }
-#undef WIDE_INQUIRY_BITS
-#undef SYNC_INQUIRY_BITS
-#undef SCSI_VERSION_BITS
-#undef SCSI_DT_BIT
- }
- }
-
if ((scb->flags & SCB_MSGOUT_BITS) != 0)
{
unsigned short mask;
@@ -4920,15 +4835,29 @@
if ( (p->dev_last_queue_full_count[tindex] > 14) &&
(p->dev_active_cmds[tindex] > 4) )
{
+ int diff, lun;
+ if (p->dev_active_cmds[tindex] > p->dev_lun_queue_depth[tindex])
+ /* We don't know what to do here, so bail. */
+ break;
if (aic7xxx_verbose & VERBOSE_NEGOTIATION2)
printk(INFO_LEAD "Queue depth reduced to %d\n", p->host_no,
CTL_OF_SCB(scb), p->dev_active_cmds[tindex]);
- p->dev_max_queue_depth[tindex] =
- p->dev_active_cmds[tindex];
+ diff = p->dev_lun_queue_depth[tindex] -
+ p->dev_active_cmds[tindex];
+ p->dev_lun_queue_depth[tindex] -= diff;
+ for(lun = 0; lun < p->host->max_lun; lun++)
+ {
+ if(p->Scsi_Dev[tindex][lun] != NULL)
+ {
+ p->dev_max_queue_depth[tindex] -= diff;
+ scsi_adjust_queue_depth(p->Scsi_Dev[tindex][lun], 1,
+ p->dev_lun_queue_depth[tindex]);
+ if(p->dev_temp_queue_depth[tindex] > p->dev_max_queue_depth[tindex])
+ p->dev_temp_queue_depth[tindex] = p->dev_max_queue_depth[tindex];
+ }
+ }
p->dev_last_queue_full[tindex] = 0;
p->dev_last_queue_full_count[tindex] = 0;
- p->dev_temp_queue_depth[tindex] =
- p->dev_active_cmds[tindex];
}
else if (p->dev_active_cmds[tindex] == 0)
{
@@ -7026,10 +6955,10 @@
* with queue depths for individual devices. It also allows tagged
* queueing to be [en|dis]abled for a specific adapter.
*-F*************************************************************************/
-static int
+static void
aic7xxx_device_queue_depth(struct aic7xxx_host *p, Scsi_Device *device)
{
- int default_depth = 3;
+ int default_depth = p->host->hostt->cmd_per_lun;
unsigned char tindex;
unsigned short target_mask;
@@ -7039,12 +6968,69 @@
if (p->dev_max_queue_depth[tindex] > 1)
{
/*
- * We've already scanned this device, leave it alone
+ * We've already scanned some lun on this device and enabled tagged
+ * queueing on it. So, as long as this lun also supports tagged
+ * queueing, enable it here with the same depth. Call SCSI mid layer
+ * to adjust depth on this device, and add enough to the max_queue_depth
+ * to cover the commands for this lun.
+ *
+ * Note: there is a shortcoming here. The aic7xxx driver really assumes
+ * that if any lun on a device supports tagged queueing, then they *all*
+ * do. Our p->tagenable field is on a per target id basis and doesn't
+ * differentiate for different luns. If we end up with one lun that
+ * doesn't support tagged queueing, it's going to disable tagged queueing
+ * on *all* the luns on that target ID :-(
*/
- return(p->dev_max_queue_depth[tindex]);
+ if(device->tagged_supported) {
+ if (aic7xxx_verbose & VERBOSE_NEGOTIATION2)
+ {
+ printk(INFO_LEAD "Enabled tagged queuing, queue depth %d.\n",
+ p->host_no, device->channel, device->id,
+ device->lun, device->queue_depth);
+ }
+ p->dev_max_queue_depth[tindex] += p->dev_lun_queue_depth[tindex];
+ p->dev_temp_queue_depth[tindex] += p->dev_lun_queue_depth[tindex];
+ scsi_adjust_queue_depth(device, 1, p->dev_lun_queue_depth[tindex]);
+ }
+ else
+ {
+ int lun;
+ /*
+ * Uh ohh, this is what I was talking about. All the other devices on
+ * this target ID that support tagged queueing are going to end up
+ * getting tagged queueing turned off because of this device. Print
+ * out a message to this effect for the user, then disable tagged
+ * queueing on all the devices on this ID.
+ */
+ printk(WARN_LEAD "does not support tagged queuing while other luns on\n"
+ " the same target ID do!! Tagged queueing will be disabled for\n"
+ " all luns on this target ID!!\n", p->host_no,
+ device->channel, device->id, device->lun);
+
+ p->dev_lun_queue_depth[tindex] = default_depth;
+ p->dev_scbs_needed[tindex] = 0;
+ p->dev_temp_queue_depth[tindex] = 1;
+ p->dev_max_queue_depth[tindex] = 1;
+ p->tagenable &= ~target_mask;
+
+ for(lun=0; lun < p->host->max_lun; lun++)
+ {
+ if(p->Scsi_Dev[tindex][lun] != NULL)
+ {
+ printk(WARN_LEAD "disabling tagged queuing.\n", p->host_no,
+ p->Scsi_Dev[tindex][lun]->channel,
+ p->Scsi_Dev[tindex][lun]->id,
+ p->Scsi_Dev[tindex][lun]->lun);
+ scsi_adjust_queue_depth(p->Scsi_Dev[tindex][lun], 0, default_depth);
+ p->dev_scbs_needed[tindex] += default_depth;
+ }
+ }
+ }
+ return;
}
- device->queue_depth = default_depth;
+ p->dev_lun_queue_depth[tindex] = default_depth;
+ p->dev_scbs_needed[tindex] = default_depth;
p->dev_temp_queue_depth[tindex] = 1;
p->dev_max_queue_depth[tindex] = 1;
p->tagenable &= ~target_mask;
@@ -7054,7 +7040,7 @@
int tag_enabled = TRUE;
default_depth = AIC7XXX_CMDS_PER_DEVICE;
-
+
if (!(p->discenable & target_mask))
{
if (aic7xxx_verbose & VERBOSE_NEGOTIATION2)
@@ -7075,7 +7061,7 @@
" the aic7xxx.c source file.\n");
print_warning = FALSE;
}
- device->queue_depth = default_depth;
+ p->dev_lun_queue_depth[tindex] = default_depth;
}
else
{
@@ -7083,19 +7069,18 @@
if (aic7xxx_tag_info[p->instance].tag_commands[tindex] == 255)
{
tag_enabled = FALSE;
- device->queue_depth = 3; /* Tagged queueing is disabled. */
}
else if (aic7xxx_tag_info[p->instance].tag_commands[tindex] == 0)
{
- device->queue_depth = default_depth;
+ p->dev_lun_queue_depth[tindex] = default_depth;
}
else
{
- device->queue_depth =
+ p->dev_lun_queue_depth[tindex] =
aic7xxx_tag_info[p->instance].tag_commands[tindex];
}
}
- if ((device->tagged_queue == 0) && tag_enabled)
+ if (tag_enabled)
{
if (aic7xxx_verbose & VERBOSE_NEGOTIATION2)
{
@@ -7103,46 +7088,70 @@
p->host_no, device->channel, device->id,
device->lun, device->queue_depth);
}
- p->dev_max_queue_depth[tindex] = device->queue_depth;
- p->dev_temp_queue_depth[tindex] = device->queue_depth;
+ p->dev_max_queue_depth[tindex] = p->dev_lun_queue_depth[tindex];
+ p->dev_temp_queue_depth[tindex] = p->dev_lun_queue_depth[tindex];
+ p->dev_scbs_needed[tindex] = p->dev_lun_queue_depth[tindex];
p->tagenable |= target_mask;
p->orderedtag |= target_mask;
- device->tagged_queue = 1;
- device->current_tag = SCB_LIST_NULL;
+ scsi_adjust_queue_depth(device, 1, p->dev_lun_queue_depth[tindex]);
}
}
}
- return(p->dev_max_queue_depth[tindex]);
+ return;
}
/*+F*************************************************************************
* Function:
- * aic7xxx_select_queue_depth
+ * aic7xxx_slave_detach
*
* Description:
- * Sets the queue depth for each SCSI device hanging off the input
- * host adapter. We use a queue depth of 2 for devices that do not
- * support tagged queueing. If AIC7XXX_CMDS_PER_LUN is defined, we
- * use that for tagged queueing devices; otherwise we use our own
- * algorithm for determining the queue depth based on the maximum
- * SCBs for the controller.
+ * prepare for this device to go away
*-F*************************************************************************/
-static void
-aic7xxx_select_queue_depth(struct Scsi_Host *host,
- Scsi_Device *scsi_devs)
+void
+aic7xxx_slave_detach(Scsi_Device *sdpnt)
{
- Scsi_Device *device;
- struct aic7xxx_host *p = (struct aic7xxx_host *) host->hostdata;
- int scbnum;
+ struct aic7xxx_host *p = (struct aic7xxx_host *) sdpnt->host->hostdata;
+ int lun, tindex;
+
+ tindex = sdpnt->id | (sdpnt->channel << 3);
+ lun = sdpnt->lun;
+ if(p->Scsi_Dev[tindex][lun] == NULL)
+ return;
- scbnum = 0;
- for (device = scsi_devs; device != NULL; device = device->next)
+ if(p->tagenable & (1 << tindex))
{
- if (device->host == host)
- {
- scbnum += aic7xxx_device_queue_depth(p, device);
- }
+ p->dev_max_queue_depth[tindex] -= p->dev_lun_queue_depth[tindex];
+ if(p->dev_temp_queue_depth[tindex] > p->dev_max_queue_depth[tindex])
+ p->dev_temp_queue_depth[tindex] = p->dev_max_queue_depth[tindex];
}
+ p->dev_scbs_needed[tindex] -= p->dev_lun_queue_depth[tindex];
+ p->Scsi_Dev[tindex][lun] = NULL;
+ return;
+}
+
+/*+F*************************************************************************
+ * Function:
+ * aic7xxx_slave_attach
+ *
+ * Description:
+ * Configure the device we are attaching to the controller. This is
+ * where we get to do things like scan the INQUIRY data, set queue
+ * depths, allocate command structs, etc.
+ *-F*************************************************************************/
+int
+aic7xxx_slave_attach(Scsi_Device *sdpnt)
+{
+ struct aic7xxx_host *p = (struct aic7xxx_host *) sdpnt->host->hostdata;
+ int scbnum, tindex, i;
+
+ tindex = sdpnt->id | (sdpnt->channel << 3);
+ p->dev_flags[tindex] |= DEVICE_PRESENT;
+
+ p->Scsi_Dev[tindex][sdpnt->lun] = sdpnt;
+ aic7xxx_device_queue_depth(p, sdpnt);
+
+ for(i = 0, scbnum = 0; i < p->host->max_id; i++)
+ scbnum += p->dev_scbs_needed[i];
while (scbnum > p->scb_data->numscbs)
{
/*
@@ -7151,8 +7160,91 @@
* the SCB in order to perform a swap operation (possible deadlock)
*/
if ( aic7xxx_allocate_scb(p) == 0 )
- return;
+ break;
+ }
+
+ /*
+ * We only need to check INQUIRY data on one lun of multi lun devices
+ * since speed negotiations are not lun specific. Once we've check this
+ * particular target id once, the DEVICE_PRESENT flag will be set.
+ */
+ if (!(p->dev_flags[tindex] & DEVICE_DTR_SCANNED))
+ {
+ p->dev_flags[tindex] |= DEVICE_DTR_SCANNED;
+#define WIDE_INQUIRY_BITS 0x60
+#define SYNC_INQUIRY_BITS 0x10
+#define SCSI_VERSION_BITS 0x07
+#define SCSI_DT_BIT 0x04
+
+ if ( (sdpnt->inquiry[7] & WIDE_INQUIRY_BITS) &&
+ (p->features & AHC_WIDE) )
+ {
+ p->needwdtr |= (1<<tindex);
+ p->needwdtr_copy |= (1<<tindex);
+ p->transinfo[tindex].goal_width = p->transinfo[tindex].user_width;
+ }
+ else
+ {
+ p->needwdtr &= ~(1<<tindex);
+ p->needwdtr_copy &= ~(1<<tindex);
+ pause_sequencer(p);
+ aic7xxx_set_width(p, sdpnt->id, sdpnt->channel, sdpnt->lun,
+ MSG_EXT_WDTR_BUS_8_BIT, (AHC_TRANS_ACTIVE |
+ AHC_TRANS_GOAL |
+ AHC_TRANS_CUR) );
+ unpause_sequencer(p, FALSE);
+ }
+ if ( (sdpnt->inquiry[7] & SYNC_INQUIRY_BITS) &&
+ p->transinfo[tindex].user_offset )
+ {
+ p->transinfo[tindex].goal_period = p->transinfo[tindex].user_period;
+ p->transinfo[tindex].goal_options = p->transinfo[tindex].user_options;
+ if (p->features & AHC_ULTRA2)
+ p->transinfo[tindex].goal_offset = MAX_OFFSET_ULTRA2;
+ else if (p->transinfo[tindex].goal_width == MSG_EXT_WDTR_BUS_16_BIT)
+ p->transinfo[tindex].goal_offset = MAX_OFFSET_16BIT;
+ else
+ p->transinfo[tindex].goal_offset = MAX_OFFSET_8BIT;
+ if ( (((sdpnt->inquiry[2] & SCSI_VERSION_BITS) >= 3) ||
+ ((sdpnt->inquiry_len >= 57) &&
+ (sdpnt->inquiry[56] & SCSI_DT_BIT)) ||
+ (p->dev_flags[tindex] & DEVICE_SCSI_3) ) &&
+ (p->transinfo[tindex].user_period <= 9) &&
+ (p->transinfo[tindex].user_options) )
+ {
+ p->needppr |= (1<<tindex);
+ p->needppr_copy |= (1<<tindex);
+ p->needsdtr &= ~(1<<tindex);
+ p->needsdtr_copy &= ~(1<<tindex);
+ p->needwdtr &= ~(1<<tindex);
+ p->needwdtr_copy &= ~(1<<tindex);
+ p->dev_flags[tindex] |= DEVICE_SCSI_3;
+ }
+ else
+ {
+ p->needsdtr |= (1<<tindex);
+ p->needsdtr_copy |= (1<<tindex);
+ p->transinfo[tindex].goal_period =
+ MAX(10, p->transinfo[tindex].goal_period);
+ p->transinfo[tindex].goal_options = 0;
+ }
+ }
+ else
+ {
+ p->needsdtr &= ~(1<<tindex);
+ p->needsdtr_copy &= ~(1<<tindex);
+ p->transinfo[tindex].goal_period = 255;
+ p->transinfo[tindex].goal_offset = 0;
+ p->transinfo[tindex].goal_options = 0;
+ }
+ p->dev_flags[tindex] |= DEVICE_PRINT_DTR;
+#undef WIDE_INQUIRY_BITS
+#undef SYNC_INQUIRY_BITS
+#undef SCSI_VERSION_BITS
+#undef SCSI_DT_BIT
}
+
+ return(0);
}
/*+F*************************************************************************
@@ -8249,7 +8341,6 @@
host->can_queue = AIC7XXX_MAXSCB;
host->cmd_per_lun = 3;
host->sg_tablesize = AIC7XXX_MAX_SG;
- host->select_queue_depths = aic7xxx_select_queue_depth;
host->this_id = p->scsi_id;
host->io_port = p->base;
host->n_io_port = 0xFF;
--- linux-2.5.22/drivers/scsi/hosts.h.slave Sun Jun 16 22:31:29 2002
+++ linux-2.5.22/drivers/scsi/hosts.h Tue Jun 18 18:04:50 2002
@@ -97,6 +97,10 @@
*/
int (* detect)(struct SHT *);
+ /*
+ * This function is only used by one driver and will be going away
+ * once it switches over to using the slave_detach() function instead.
+ */
int (*revoke)(Scsi_Device *);
/* Used with loadable modules to unload the host structures. Note:
@@ -200,11 +204,59 @@
int (* reset)(Scsi_Cmnd *, unsigned int);
/*
- * This function is used to select synchronous communications,
- * which will result in a higher data throughput. Not implemented
- * yet.
+ * Once the device has responded to an INQUIRY and we know the device
+ * is online, call into the low level driver with the Scsi_Device *
+ * (so that the low level driver may save it off in a safe location
+ * for later use in calling scsi_adjust_queue_depth() or possibly
+ * other scsi_* functions) and char * to the INQUIRY return data buffer.
+ * This way, low level drivers will no longer have to snoop INQUIRY data
+ * to see if a drive supports PPR message protocol for Ultra160 speed
+ * negotiations or other similar items. Instead it can simply wait until
+ * the scsi mid layer calls them with the data in hand and then it can
+ * do it's checking of INQUIRY data. This will happen once for each new
+ * device added on this controller (including once for each lun on
+ * multi-lun devices, so low level drivers should take care to make
+ * sure that if they do tagged queueing on a per physical unit basis
+ * instead of a per logical unit basis that they have the mid layer
+ * allocate tags accordingly).
+ *
+ * Things currently recommended to be handled at this time include:
+ *
+ * 1. Checking for tagged queueing capability and if able then calling
+ * scsi_adjust_queue_depth() with the device pointer and the
+ * suggested new queue depth.
+ * 2. Checking for things such as SCSI level or DT bit in order to
+ * determine if PPR message protocols are appropriate on this
+ * device (or any other scsi INQUIRY data specific things the
+ * driver wants to know in order to properly handle this device).
+ * 3. Allocating command structs that the device will need.
+ * 4. Setting the default timeout on this device (if needed).
+ * 5. Saving the Scsi_Device pointer so that the low level driver
+ * will be able to easily call back into scsi_adjust_queue_depth
+ * again should it be determined that the queue depth for this
+ * device should be lower or higher than it is initially set to.
+ * 6. Allocate device data structures as needed that can be attached
+ * to the Scsi_Device * via SDpnt->host_device_ptr
+ * 7. Anything else the low level driver might want to do on a device
+ * specific setup basis...
+ * 8. Return 0 on success, non-0 on error. The device will be marked
+ * as offline on error so that no access will occur.
+ */
+ int (* slave_attach)(Scsi_Device *);
+
+ /*
+ * If we are getting ready to remove a device from the scsi chain then
+ * we call into the low level driver to let them know. Once a low
+ * level driver has been informed that a drive is going away, the low
+ * level driver *must* remove it's pointer to the Scsi_Device because
+ * it is going to be kfree()'ed shortly. It is no longer safe to call
+ * any mid layer functions with this Scsi_Device *. Additionally, the
+ * mid layer will not make any more calls into the low level driver's
+ * queue routine with this device, so it is safe for the device driver
+ * to deallocate all structs/commands/etc that is has allocated
+ * specifically for this device at the time of this call.
*/
- int (* slave_attach)(int, int);
+ void (* slave_detach)(Scsi_Device *);
/*
* This function determines the bios parameters for a given
@@ -217,6 +269,8 @@
/*
* Used to set the queue depth for a specific device.
+ *
+ * Once the slave_attach() function is in full use, this will go away.
*/
void (*select_queue_depths)(struct Scsi_Host *, Scsi_Device *);
--- linux-2.5.22/drivers/scsi/scsi.c.slave Sun Jun 16 22:31:33 2002
+++ linux-2.5.22/drivers/scsi/scsi.c Tue Jun 18 20:29:43 2002
@@ -575,6 +575,54 @@
up(SCpnt->host->eh_wait);
}
+ if(SDpnt->queue_depth && SDpnt->queue_depth != SDpnt->new_queue_depth) {
+ if(SDpnt->new_queue_depth > SDpnt->queue_depth) {
+ Scsi_Cmnd *newSCpnt;
+
+ newSCpnt = kmalloc(sizeof(Scsi_Cmnd), GFP_ATOMIC |
+ (SDpnt->host->unchecked_isa_dma ?
+ GFP_DMA : 0));
+ if(newSCpnt) {
+ memset(newSCpnt, 0, sizeof(Scsi_Cmnd));
+ newSCpnt->host = SDpnt->host;
+ newSCpnt->device = SDpnt;
+ newSCpnt->target = SDpnt->id;
+ newSCpnt->lun = SDpnt->lun;
+ newSCpnt->channel = SDpnt->channel;
+ newSCpnt->request.rq_status = RQ_INACTIVE;
+ newSCpnt->use_sg = 0;
+ newSCpnt->old_use_sg = 0;
+ newSCpnt->old_cmd_len = 0;
+ newSCpnt->underflow = 0;
+ newSCpnt->old_underflow = 0;
+ newSCpnt->transfersize = 0;
+ newSCpnt->resid = 0;
+ newSCpnt->serial_number = 0;
+ newSCpnt->serial_number_at_timeout = 0;
+ newSCpnt->host_scribble = NULL;
+ newSCpnt->next = SDpnt->device_queue;
+ SDpnt->device_queue = newSCpnt;
+ newSCpnt->state = SCSI_STATE_UNUSED;
+ newSCpnt->owner = SCSI_OWNER_NOBODY;
+ SDpnt->queue_depth++;
+ }
+ } else {
+ Scsi_Cmnd *prev, *next;
+ /*
+ * Release the command block and decrement the queue
+ * depth.
+ */
+ SDpnt->queue_depth--;
+ for(prev = NULL, next = SDpnt->device_queue;
+ next != SCpnt;
+ prev = next, next = next->next) ;
+ if(prev == NULL)
+ SDpnt->device_queue = next->next;
+ else
+ prev->next = next->next;
+ kfree((char *)SCpnt);
+ }
+ }
spin_unlock_irqrestore(&device_request_lock, flags);
/*
@@ -1514,6 +1562,60 @@
spin_unlock_irqrestore(&device_request_lock, flags);
}
+/*
+ * Function: scsi_adjust_queue_depth()
+ *
+ * Purpose: Allow low level drivers to tell us to change the queue depth
+ * on a specific SCSI device
+ *
+ * Arguments: SDpnt - SCSI Device in question
+ * tagged - Do we use tagged queueing (non-0) or do we treat
+ * this device as an untagged device (0)
+ * tags - Number of tags allowed if tagged queueing enabled,
+ * or number of commands the low level driver can
+ * queue up in non-tagged mode (as per cmd_per_lun).
+ *
+ * Returns: Nothing
+ *
+ * Lock Status: None held on entry
+ *
+ * Notes: Low level drivers may call this at any time and we will do
+ * the right thing depending on whether or not the device is
+ * currently active and whether or not it even has the
+ * command blocks built yet.
+ *
+ * If cmdblocks != 0 then we are a live device. We just set the
+ * new_queue_depth variable and when the scsi completion handler
+ * notices that queue_depth != new_queue_depth it will work to
+ * rectify the situation. If new_queue_depth is less than current
+ * queue_depth, then it will free the completed command instead of
+ * putting it back on the free list and dec queue_depth. Otherwise
+ * it will try to allocate a new command block for the device and
+ * put it on the free list along with the command that is being
+ * completed. Obviously, if the device isn't doing anything then
+ * neither is this code, so it will bring the devices queue depth
+ * back into line when the device is actually being used. This
+ * keeps us from needing to fire off a kernel thread or some such
+ * nonsense (this routine can be called from interrupt code, so
+ * handling allocations here would be tricky and risky, making
+ * a kernel thread a much safer way to go if we wanted to handle
+ * the work immediately instead of letting it get done a little
+ * at a time in the completion handler).
+ */
+void scsi_adjust_queue_depth(Scsi_Device *SDpnt, int tagged, int tags)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&device_request_lock, flags);
+ SDpnt->new_queue_depth = tags;
+ SDpnt->tagged_queue = tagged;
+ if(SDpnt->has_cmdblocks == 0)
+ {
+ SDpnt->queue_depth = tags;
+ }
+ spin_unlock_irqrestore(&device_request_lock, flags);
+}
+
void __init scsi_host_no_insert(char *str, int n)
{
Scsi_Host_Name *shn, *shn2;
@@ -1818,6 +1920,8 @@
*/
if (HBA_ptr->hostt->revoke)
HBA_ptr->hostt->revoke(scd);
+ if (HBA_ptr->hostt->slave_detach)
+ (*HBA_ptr->hostt->slave_detach) (scd);
devfs_unregister (scd->de);
scsi_release_commandblocks(scd);
@@ -2100,6 +2204,8 @@
printk(KERN_ERR "Attached usage count = %d\n", SDpnt->attached);
goto err_out;
}
+ if (shpnt->hostt->slave_detach)
+ (*shpnt->hostt->slave_detach) (SDpnt);
devfs_unregister (SDpnt->de);
}
}
@@ -2307,6 +2413,8 @@
* Nobody is using this device any more. Free all of the
* command structures.
*/
+ if (shpnt->hostt->slave_detach)
+ (*shpnt->hostt->slave_detach) (SDpnt);
scsi_release_commandblocks(SDpnt);
}
}
--- linux-2.5.22/drivers/scsi/scsi.h.slave Sun Jun 16 22:31:27 2002
+++ linux-2.5.22/drivers/scsi/scsi.h Tue Jun 18 18:04:50 2002
@@ -485,6 +485,7 @@
extern void scsi_bottom_half_handler(void);
extern void scsi_release_commandblocks(Scsi_Device * SDpnt);
extern void scsi_build_commandblocks(Scsi_Device * SDpnt);
+extern void scsi_adjust_queue_depth(Scsi_Device *, int, int);
extern void scsi_done(Scsi_Cmnd * SCpnt);
extern void scsi_finish_command(Scsi_Cmnd *);
extern int scsi_retry_command(Scsi_Cmnd *);
@@ -562,10 +563,14 @@
wait_queue_head_t scpnt_wait; /* Used to wait if
device is busy */
struct Scsi_Host *host;
+ void *host_device_ptr; /* host private data ptr, can be alloc'ed on
+ attach and free'ed on detach */
request_queue_t request_queue;
atomic_t device_active; /* commands checked out for device */
volatile unsigned short device_busy; /* commands actually active on low-level */
Scsi_Cmnd *device_queue; /* queue of SCSI Command structures */
+ unsigned char queue_depth; /* How deep a queue to use */
+ unsigned char new_queue_depth; /* If we need to raise/lower depth */
unsigned int id, lun, channel;
@@ -589,7 +594,6 @@
unsigned char current_tag; /* current tag */
unsigned char sync_min_period; /* Not less than this period */
unsigned char sync_max_offset; /* Not greater than this offset */
- unsigned char queue_depth; /* How deep a queue to use */
unsigned online:1;
unsigned writeable:1;
--- linux-2.5.22/drivers/scsi/scsi_scan.c.slave Sun Jun 16 22:31:35 2002
+++ linux-2.5.22/drivers/scsi/scsi_scan.c Tue Jun 18 18:04:50 2002
@@ -860,6 +860,17 @@
scsi_release_commandblocks(SDpnt);
+ if(shpnt->hostt->slave_attach != NULL)
+ if((shpnt->hostt->slave_attach)(SDpnt) != 0) {
+ /*
+ * Low level driver failed to attach this device,
+ * we've got to kick it back out now :-(
+ */
+ printk("scsi: scan_scsis_single: slave_attach failed, "
+ "marking OFF-LINE.\n");
+ SDpnt->online = FALSE;
+ }
+
/*
* This device was already hooked up to the host in question,
* so at this point we just let go of it and it should be fine. We do need to
--- linux-2.5.22/drivers/scsi/scsi_syms.c.slave Tue Jun 18 18:33:15 2002
+++ linux-2.5.22/drivers/scsi/scsi_syms.c Tue Jun 18 18:34:44 2002
@@ -62,6 +62,7 @@
EXPORT_SYMBOL(scsi_release_request);
EXPORT_SYMBOL(scsi_wait_req);
EXPORT_SYMBOL(scsi_do_req);
+EXPORT_SYMBOL(scsi_adjust_queue_depth);
EXPORT_SYMBOL(scsi_report_bus_reset);
EXPORT_SYMBOL(scsi_block_requests);
--
Doug Ledford <dledford@redhat.com> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
[not found] <20020619014048.B8623@redhat.com>
@ 2002-06-19 17:44 ` Pete Zaitcev
2002-06-19 17:55 ` Matthew Jacob
2002-06-19 18:25 ` Doug Ledford
0 siblings, 2 replies; 24+ messages in thread
From: Pete Zaitcev @ 2002-06-19 17:44 UTC (permalink / raw)
To: dledford; +Cc: linux-scsi
> So, when Justin Gibbs' driver allocates a queue depth of 253
> commands on drive X, then finds out that drive X has a hard limit of 64,
> then we aren't continuing to waste 189 commands worth of allocated kernel
> memory that can never be effectively used.
> + diff = p->dev_lun_queue_depth[tindex] -
> + p->dev_active_cmds[tindex];
> + p->dev_lun_queue_depth[tindex] -= diff;
I think this is not enough, or backwards. We hit a situation with
EMC when attaching hundreds of disks depleted atomic memory with
queue request arrays before swapper had a chance to replenish it.
I would much appreciate if the initial allocation was conservative,
if low performing, and was adjusted upwards at slave_attach time.
For 2.5 I would appreciate if we made some SCSI allocations non-atomic.
The probe time things better be GFP_KERNEL.
-- Pete
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-06-19 17:44 ` [PATCH + RFC] Beginning of some updates to scsi mid layer Pete Zaitcev
@ 2002-06-19 17:55 ` Matthew Jacob
2002-06-19 18:25 ` Doug Ledford
1 sibling, 0 replies; 24+ messages in thread
From: Matthew Jacob @ 2002-06-19 17:55 UTC (permalink / raw)
To: Pete Zaitcev; +Cc: dledford, linux-scsi
>
> I think this is not enough, or backwards. We hit a situation with
> EMC when attaching hundreds of disks depleted atomic memory with
> queue request arrays before swapper had a chance to replenish it.
> I would much appreciate it if the initial allocation was conservative,
> if low performing, and was adjusted upwards at slave_attach time.
>
> For 2.5 I would appreciate it if we made some SCSI allocations non-atomic.
> The probe-time allocations had better be GFP_KERNEL.
I'd go further and say that you only need probe your boot device. After that,
GFP_KERNEL is more than sufficient for anything.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-06-19 17:44 ` [PATCH + RFC] Beginning of some updates to scsi mid layer Pete Zaitcev
2002-06-19 17:55 ` Matthew Jacob
@ 2002-06-19 18:25 ` Doug Ledford
2002-06-28 5:41 ` Jeremy Higdon
1 sibling, 1 reply; 24+ messages in thread
From: Doug Ledford @ 2002-06-19 18:25 UTC (permalink / raw)
To: Pete Zaitcev; +Cc: linux-scsi
On Wed, Jun 19, 2002 at 01:44:36PM -0400, Pete Zaitcev wrote:
> > So, when Justin Gibbs' driver allocates a queue depth of 253
> > commands on drive X, then finds out that drive X has a hard limit of 64,
> > then we aren't continuing to waste 189 commands worth of allocated kernel
> > memory that can never be effectively used.
>
> > + diff = p->dev_lun_queue_depth[tindex] -
> > + p->dev_active_cmds[tindex];
> > + p->dev_lun_queue_depth[tindex] -= diff;
>
> I think this is not enough, or backwards. We hit a situation with
> EMC when attaching hundreds of disks depleted atomic memory with
> queue request arrays before swapper had a chance to replenish it.
> I would much appreciate it if the initial allocation was conservative,
> if low performing, and was adjusted upwards at slave_attach time.
This is actually from the runtime code, part of my interrupt handler. It
notices that we might get a lot of QUEUE_FULLs from a device all at the
same queue depth, and in that case it reduces the queue depth to the new
depth. So, in short, it's not backwards, it's actually taking the depth
down based upon solid feedback from the drive.
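A userspace sketch of that feedback rule may help; struct and function names are illustrative, not the real aic7xxx structures (the quoted hunk indexes per-target arrays instead):

```c
/* Model of the QUEUE_FULL feedback described above: when the drive
 * reports QUEUE_FULL, drop the allowed depth to the number of
 * commands actually active, since that is what the drive accepted. */
struct lun_state {
	int queue_depth;	/* commands we may have outstanding */
	int active_cmds;	/* commands currently on the device */
};

void handle_queue_full(struct lun_state *p)
{
	int diff = p->queue_depth - p->active_cmds;

	if (diff > 0)
		p->queue_depth -= diff;	/* new depth == observed limit */
}
```

In Justin Gibbs' example from earlier in the thread, a depth of 253 with 64 active would come down to 64.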
> For 2.5 I would appreciate it if we made some SCSI allocations non-atomic.
> The probe-time allocations had better be GFP_KERNEL.
Noted, I'll change that soon. As long as we have at least 1 command
block, the other allocations can happen at our leisure, since the change
to scsi_release_command() will attempt to allocate 1 command block per
completed command until they catch up to the intended queue depth. And,
since that isn't done from an interrupt, we'll make those GFP_KERNEL as
well; if we fail, we simply wait and do the allocation later.
Allocations on the aic7xxx_old driver are a different thing. Don't pay
too much attention to them. This work I'm doing here has pointed out to
me that the tagged queue handling in the driver is totally busted in
regards to multi-lun disk devices. I would actually have to do a rather
large overhaul to get it right, and that's not on the plate until I've
got some of this other stuff out of the way.
--
Doug Ledford <dledford@redhat.com> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-06-19 0:47 Doug Ledford
@ 2002-06-19 21:15 ` Patrick Mansfield
2002-06-20 19:45 ` Doug Ledford
0 siblings, 1 reply; 24+ messages in thread
From: Patrick Mansfield @ 2002-06-19 21:15 UTC (permalink / raw)
To: linux-scsi
On Tue, Jun 18, 2002 at 08:47:53PM -0400, Doug Ledford wrote:
Doug -
> As mentioned, this adds a new scsi_adjust_queue_depth routine that lets
> the low level driver update the queue depth on a device at any point in
> time. So, when Justin Gibbs' driver allocates a queue depth of 253
> commands on drive X, then finds out that drive X has a hard limit of 64,
> then we aren't continuing to waste 189 commands worth of allocated kernel
> memory that can never be effectively used. Future plans here include
> linking this into the bio queueing so that adjusting the mid layer queue
> depth does the "right thing" in regards to the bio layer (for all I know
> it does already, I haven't gotten into the bio->scsi interface yet).
> Once this stuff is in place, then my next goal is to get the queueing code
> in the mid layer up to useable standards (the goal is to be able to rip
> the buts of the QUEUE_FULL and BUSY handling out of my driver and still
> have the system do the "right thing" instead of my driver having it's own
> complete internal queue mechanism including timers and such like it does
> now).
The Scsi_Cmnd initialization should have its own function (maybe
including allocation), called from scsi_adjust_queue_depth and your
new code.
So do you think the current select_queue_depths should (eventually)
be removed?
Not a comment on your patch, but why isn't the queue depth setting
handled completely in the mid-layer?
The cmd_per_lun and can_queue fields are in Scsi_Host, and the mid-layer
already checks for tagged queueing. I don't see what advantage there
is to having the low level adapters each set the queue depth in their
own way. Plus, a lot of common (and not so common) code would go away.
And then scsi_adjust_queue_depth() could be called when not in interrupt
context.
-- Patrick Mansfield
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-06-19 21:15 ` Patrick Mansfield
@ 2002-06-20 19:45 ` Doug Ledford
0 siblings, 0 replies; 24+ messages in thread
From: Doug Ledford @ 2002-06-20 19:45 UTC (permalink / raw)
To: Patrick Mansfield; +Cc: linux-scsi
On Wed, Jun 19, 2002 at 02:15:22PM -0700, Patrick Mansfield wrote:
> On Tue, Jun 18, 2002 at 08:47:53PM -0400, Doug Ledford wrote:
>
> Doug -
>
> > As mentioned, this adds a new scsi_adjust_queue_depth routine that lets
> > the low level driver update the queue depth on a device at any point in
> > time. So, when Justin Gibbs' driver allocates a queue depth of 253
> > commands on drive X, then finds out that drive X has a hard limit of 64,
> > then we aren't continuing to waste 189 commands worth of allocated kernel
> > memory that can never be effectively used. Future plans here include
> > linking this into the bio queueing so that adjusting the mid layer queue
> > depth does the "right thing" in regards to the bio layer (for all I know
> > it does already, I haven't gotten into the bio->scsi interface yet).
>
> > Once this stuff is in place, then my next goal is to get the queueing code
> > in the mid layer up to usable standards (the goal is to be able to rip
> > the guts of the QUEUE_FULL and BUSY handling out of my driver and still
> > have the system do the "right thing" instead of my driver having its own
> > complete internal queue mechanism including timers and such like it does
> > now).
>
> The Scsi_Cmnd initialization should have its own function (maybe
> including allocation), called from scsi_adjust_queue_depth and your
> new code.
I disagree, but see below for why.
> So do you think the current select_queue_depths should (eventually)
> be removed?
Yes, that's my goal. The current method is confusing to some driver
writers as they end up seeing the same device multiple times when someone
does any scsi-add-single-device magic on their bus. It only lets you
adjust the queue depth when a device is added, not when you need to. It's
a bad design and needs to go away.
> Not a comment on your patch, but why isn't the queue depth setting
> handled completely in the mid-layer?
Because the mid layer doesn't have the information needed to do so. Queue
depth settings are limited by two things. 1) the drive's ability to
support tagged queue commands. 2) the driver/hardware's ability to send
those commands. Drivers often use shared tags amongst all drives
for efficiency reasons (my aic7xxx driver does, for example). Details such
as this impact what can and cannot be done in terms of tagged queueing.
Only the low level driver does/should have that information, so only the
low level driver should control the tagged queueing. The mid layer is,
for the most part, totally tag ignorant. It only needs to know that they
are supported and it can send X of them, that's all. There isn't any real
reason to try and make it control things. (Not yet anyway, but I am
planning on adding one specific type of control to the mid layer in
regards to tagged queueing, but that's still a few patches down the road)
> The cmd_per_lun and can_queue fields are in Scsi_Host, and the mid-layer
> already checks for tagged queueing. I don't see what advantage there
> is to having the low level adapters each set the queue depth in their
> own way. Plus, a lot of common (and not so common) code would go away.
Each low level driver *must* set the queue depth in its own way. A few
examples should illustrate this for you quite nicely:
1) In the aic7xxx driver I have a 255 command limit per card. In host
memory I have a set of card/command specific data structures. For each
command that comes in, I fill in one of these structures and I then start
the command. The tag value used in talking to the device is actually a
direct index into this array of command structures. That way, when a
device reconnects and tells the card what tag it is reconnecting, the card
can download the saved command state immediately. This means that actual
tag values used between drives *must* always be unique. Therefore, the
mid layer is not in a position to assign the actual tag value for me, but
only in a position to say "send this as a tagged command".
2) Currently (and this is broken BTW, but oh well), the aic7xxx driver
does most of its indexing into device-specific structs based upon the
target ID, but not based upon the lun. That means that when connecting
the aic7xxx driver to an external SCSI RAID chassis that exports more than
one drive as the same target ID and different luns, we have overlap in the
data structures in the driver. Setting the queue depth for these
overlapping devices requires internal knowledge of how the overlap affects
things. Obviously, the mid layer can't know these details.
3) The aic7xxx driver is capable of setting up a waiting queue on the card
itself. In this queue we may have up to 16 commands waiting to be sent to
the device. Now, imagine that we already have 64 commands sent to a
Seagate drive, and we have just queued up 16 more. When the first of
those 16 commands connects to the Seagate drive, it might generate a
QUEUE_FULL response. If so, in my interrupt handler, I pull the other 15
commands out of the waiting queue so I don't hit the drive with a bunch of
commands when the drive can't take them. I also count exactly how many
commands are *actually* on the drive and active so I can track whether or
not the QUEUE_FULL messages always happen at the same depth. It is
impossible to do this count without knowing intimate hardware details of
the controller and its outgoing queue structure. This is also the only
reliable way of detecting when a device has a hard queue depth limit or if
it generates QUEUE_FULL messages at differing depths based upon current
resource allocations on the drive itself. This happens at interrupt time
because the command completion, where we get the QUEUE_FULL status, is
obviously an interrupt.
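The "always at the same depth" test Doug describes can be modelled in a few lines; the threshold and all names here are hypothetical (the real driver tracks this per target with its own structures):

```c
#include <stdbool.h>

/* Remember the active-command count at each QUEUE_FULL; only treat
 * it as a hard device limit once several QUEUE_FULLs arrive at that
 * exact count. Varying counts suggest a shared-resource shortage on
 * the drive instead, where lowering the depth won't help. */
#define QFULL_CONFIRM 3		/* arbitrary confirmation threshold */

struct qfull_tracker {
	int last_depth;		/* active count at previous QUEUE_FULL, -1 = none */
	int hits;		/* consecutive QUEUE_FULLs at last_depth */
};

bool note_queue_full(struct qfull_tracker *t, int active_now)
{
	if (t->last_depth == active_now) {
		t->hits++;
	} else {
		t->last_depth = active_now;
		t->hits = 1;
	}
	return t->hits >= QFULL_CONFIRM;	/* true: looks like a hard limit */
}
```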
> And then scsi_adjust_queue_depth() could be called when not in interrupt
> context.
This is totally backwards from what you said earlier. Earlier you want the
mid layer to handle command queueing so that each driver doesn't have to
write specific code for setting things, and here you advocate each driver
writing its own method for calling adjust_queue_depth out of the interrupt
context when instead this lazy queue depth setting handler is one of the
few items that *could* be handled just in one place. I think your take on
tagged queueing is about the perfect opposite of most implementation
details.
--
Doug Ledford <dledford@redhat.com> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-06-19 18:25 ` Doug Ledford
@ 2002-06-28 5:41 ` Jeremy Higdon
2002-06-28 7:37 ` Doug Ledford
0 siblings, 1 reply; 24+ messages in thread
From: Jeremy Higdon @ 2002-06-28 5:41 UTC (permalink / raw)
To: Doug Ledford, Pete Zaitcev; +Cc: linux-scsi
On Jun 19, 2:25pm, Doug Ledford wrote:
>
> This is actually from the runtime code, part of my interrupt handler. It
> notices that we might get a lot of QUEUE_FULLs from a device all at the
> same queue depth, and in that case it reduces the queue depth to the new
> depth. So, in short, it's not backwards, it's actually taking the depth
> down based upon solid feedback from the drive.
Is there any accounting for the following condition:
An array with multiple logical units can accept 256 commands (for
argument purposes) from all initiators for all luns. Therefore,
the number of commands that you can queue to sdX is dependent
on how many commands queued by this initiator to other sds, as well
as how many commands other initiators have issued to this sd and
others.
This may not be an issue so much with parallel SCSI RAIDs, but it does
come up with Fibrechannel.
It is theoretically possible to get QUEUE_FULL when the host has no other
commands in progress on a logical unit. Would the host driver then believe
that it could issue 0 or 1 command only, or is there some lower limit?
jeremy
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
@ 2002-06-28 6:08 Martin Peschke3
2002-06-28 7:39 ` Doug Ledford
0 siblings, 1 reply; 24+ messages in thread
From: Martin Peschke3 @ 2002-06-28 6:08 UTC (permalink / raw)
To: Jeremy Higdon; +Cc: Doug Ledford, Pete Zaitcev, linux-scsi
Hi,
as we learned with zfcp (the fibre channel HBA driver for zSeries),
QUEUE_FULL might occur even if the initiator has no SCSI
command of its own outstanding. Unfortunately the mid
layer assumes that it can queue at least one command.
This assumption is wrong considering the fact that
other initiators might have filled the queue of a target device
as you describe below. This condition results in a starvation
of I/O for that particular device. The command which was rejected
with QUEUE_FULL is moved into the ml queue and never retried
since the mid layer only triggers a retry when another command for
that device returns. But there is no other command which
could return.
We think this needs to be fixed in the ml code (e.g. retry periodically
triggered by timer?). Our current workaround is to map QUEUE_FULL
to BUSY which avoids moving commands into the ml queue.
Mit freundlichen Grüßen / with kind regards
Martin Peschke
IBM Deutschland Entwicklung GmbH
Linux for eServer Development
Phone: +49-(0)7031-16-2349
Jeremy Higdon <jeremy@classic.engr.sgi.com>@vger.kernel.org on 06/28/2002
07:41:31 AM
Please respond to Jeremy Higdon <jeremy@classic.engr.sgi.com>
Sent by: linux-scsi-owner@vger.kernel.org
To: Doug Ledford <dledford@redhat.com>, Pete Zaitcev
<zaitcev@redhat.com>
cc: linux-scsi@vger.kernel.org
Subject: Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
On Jun 19, 2:25pm, Doug Ledford wrote:
>
> This is actually from the runtime code, part of my interrupt handler. It
> notices that we might get a lot of QUEUE_FULLs from a device all at the
> same queue depth, and in that case it reduces the queue depth to the new
> depth. So, in short, it's not backwards, it's actually taking the depth
> down based upon solid feedback from the drive.
Is there any accounting for the following condition:
An array with multiple logical units can accept 256 commands (for
argument purposes) from all initiators for all luns. Therefore,
the number of commands that you can queue to sdX is dependent
on how many commands queued by this initiator to other sds, as well
as how many commands other initiators have issued to this sd and
others.
This may not be an issue so much with parallel SCSI RAIDs, but it does
come up with Fibrechannel.
It is theoretically possible to get QUEUE_FULL when the host has no other
commands in progress on a logical unit. Would the host driver then believe
that it could issue 0 or 1 command only, or is there some lower limit?
jeremy
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-06-28 5:41 ` Jeremy Higdon
@ 2002-06-28 7:37 ` Doug Ledford
0 siblings, 0 replies; 24+ messages in thread
From: Doug Ledford @ 2002-06-28 7:37 UTC (permalink / raw)
To: Jeremy Higdon; +Cc: Pete Zaitcev, linux-scsi
On Thu, Jun 27, 2002 at 10:41:31PM -0700, Jeremy Higdon wrote:
> Is there any accounting for the following condition:
>
> An array with multiple logical units can accept 256 commands (for
> argument purposes) from all initiators for all luns. Therefore,
> the number of commands that you can queue to sdX is dependent
> on how many commands queued by this initiator to other sds, as well
> as how many commands other initiators have issued to this sd and
> others.
That's (currently) on a per driver basis. My driver would account for
this situation just fine. Others might not. That is one of the things I
plan to change in the mid layer very soon now.
> This may not be an issue so much with parallel SCSI RAIDs, but it does
> come up with Fibrechannel.
>
> It is theoretically possible to get QUEUE_FULL when the host has no other
> commands in progress on a logical unit. Would the host driver then believe
> that it could issue 0 or 1 command only, or is there some lower limit?
>
> jeremy
--
Doug Ledford <dledford@redhat.com> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-06-28 6:08 Martin Peschke3
@ 2002-06-28 7:39 ` Doug Ledford
2002-06-29 1:19 ` Jeremy Higdon
0 siblings, 1 reply; 24+ messages in thread
From: Doug Ledford @ 2002-06-28 7:39 UTC (permalink / raw)
To: Martin Peschke3; +Cc: Jeremy Higdon, Pete Zaitcev, linux-scsi
On Fri, Jun 28, 2002 at 08:08:01AM +0200, Martin Peschke3 wrote:
> as you describe below. This condition results in a starvation
> of I/O for that particular device. The command which was rejected
> with QUEUE_FULL is moved into the ml queue and never retried
> since the mid layer only triggers a retry when another command for
> that device returns. But there is no other command which
> could return.
> We think this needs to be fixed in the ml code (e.g. retry periodically
> triggered by timer?). Our current workaround is to map QUEUE_FULL
Yes, the answer to this problem involves a timer. Basically what I
suggest (and do in my driver) is set it so that the driver waits until
either A) a command is completed or B) 10ms has passed before sending the
command back out, whichever comes first. Since most drives have had
plenty of time to complete one or more commands in 10ms, we assume that
even if we haven't got a completion yet that the drive likely has some
freed up resources and so we try again.
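The decision itself is tiny; a sketch of the rule in userspace C (the 10ms value comes from the text above, everything else is illustrative, and real driver code would arm a kernel timer rather than poll a clock):

```c
#include <stdbool.h>

/* Release a command held after QUEUE_FULL as soon as either some
 * other command completes or 10ms have elapsed, whichever comes
 * first. Times are in milliseconds. */
#define QFULL_RETRY_MS 10

bool ready_to_resend(long now_ms, long held_since_ms, bool completion_seen)
{
	return completion_seen || (now_ms - held_since_ms >= QFULL_RETRY_MS);
}
```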
--
Doug Ledford <dledford@redhat.com> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
@ 2002-06-28 8:25 Martin Peschke3
2002-06-28 11:22 ` Doug Ledford
0 siblings, 1 reply; 24+ messages in thread
From: Martin Peschke3 @ 2002-06-28 8:25 UTC (permalink / raw)
To: Doug Ledford; +Cc: Jeremy Higdon, Pete Zaitcev, linux-scsi
Hi Doug,
you wrote
> Yes, the answer to this problem involves a timer. Basically what I
> suggest (and do in my driver) is set it so that the driver waits until
> either A) a command is completed or B) 10ms has passed before sending the
> command back out, whichever comes first. Since most drives have had
> plenty of time to complete one or more commands in 10ms, we assume that
> even if we haven't got a completion yet that the drive likely has some
> freed up resources and so we try again.
I don't like the idea of implementing those retries in each HBA driver.
It would be the same for each HBA driver:
On queue full condition setup a timer which triggers the retry.
(If the retry has not already been triggered by a returning command.)
How many retries would be allowed? How long may a sequence of retries
take? Bear in mind that the mid layer has another timer which
watches for the completion of the command sitting in your retry loop,
and that this timer will force the command to be aborted on expiration.
I think implementing this correctly in an HBA driver gets quite
complex. There are a lot of them, and there is a good chance
that a few of them will get buggy. Besides, it would be much cleaner
to let the mid layer or upper layer care about QUEUE_FULL since
the mentioned ml timer per command would be canceled.
Mit freundlichen Grüßen / with kind regards
Martin Peschke
IBM Deutschland Entwicklung GmbH
Linux for eServer Development
Phone: +49-(0)7031-16-2349
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-06-28 8:25 Martin Peschke3
@ 2002-06-28 11:22 ` Doug Ledford
0 siblings, 0 replies; 24+ messages in thread
From: Doug Ledford @ 2002-06-28 11:22 UTC (permalink / raw)
To: Martin Peschke3; +Cc: Jeremy Higdon, Pete Zaitcev, linux-scsi
On Fri, Jun 28, 2002 at 10:25:41AM +0200, Martin Peschke3 wrote:
> Hi Doug,
>
> you wrote
> > Yes, the answer to this problem involves a timer. Basically what I
> > suggest (and do in my driver) is set it so that the driver waits until
> > either A) a command is completed or B) 10ms has passed before sending the
> > command back out, whichever comes first. Since most drives have had
> > plenty of time to complete one or more commands in 10ms, we assume that
> > even if we haven't got a completion yet that the drive likely has some
> > freed up resources and so we try again.
>
> I don't like the idea to implement those retries in each HBA driver.
> It would be the same for each HBA driver:
I agree 100%. I should have been more clear. Take my answer above as
"this is what my driver does to make things work properly now". Then,
take my previous posts about the things I'm working on and read that to be
"I'm working on making the mid layer do this right so your driver doesn't
have to".
--
Doug Ledford <dledford@redhat.com> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-06-28 7:39 ` Doug Ledford
@ 2002-06-29 1:19 ` Jeremy Higdon
2002-06-29 2:04 ` Matthew Jacob
2002-06-29 10:10 ` Doug Ledford
0 siblings, 2 replies; 24+ messages in thread
From: Jeremy Higdon @ 2002-06-29 1:19 UTC (permalink / raw)
To: Doug Ledford, Martin Peschke3; +Cc: Pete Zaitcev, linux-scsi
On Jun 28, 3:39am, Doug Ledford wrote:
>
> On Fri, Jun 28, 2002 at 08:08:01AM +0200, Martin Peschke3 wrote:
> > as you describe below. This condition results in a starvation
> > of I/O for that particular device. The command which was rejected
> > with QUEUE_FULL is moved into the ml queue and never retried
> > since the mid layer only triggers a retry when another command for
> > that device returns. But there is no other command which
> > could return.
> > We think this needs to be fixed in the ml code (e.g. retry periodically
> > triggered by timer?). Our current workaround is to map QUEUE_FULL
>
> Yes, the answer to this problem involves a timer. Basically what I
> suggest (and do in my driver) is set it so that the driver waits until
> either A) a command is completed or B) 10ms has passed before sending the
> command back out, whichever comes first. Since most drives have had
> plenty of time to complete one or more commands in 10ms, we assume that
> even if we haven't got a completion yet that the drive likely has some
> freed up resources and so we try again.
Do you change the queue depth that you use based on a QUEUE_FULL, then?
If so, then I presume you have some minimum, such as two or four
commands . . . ?
jeremy
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-06-29 1:19 ` Jeremy Higdon
@ 2002-06-29 2:04 ` Matthew Jacob
2002-06-29 10:05 ` Doug Ledford
2002-07-01 21:02 ` Gérard Roudier
2002-06-29 10:10 ` Doug Ledford
1 sibling, 2 replies; 24+ messages in thread
From: Matthew Jacob @ 2002-06-29 2:04 UTC (permalink / raw)
To: Jeremy Higdon; +Cc: Doug Ledford, Martin Peschke3, Pete Zaitcev, linux-scsi
FWIW, Bob Snively (on the T10 committee) has always claimed that the correct
response to QUEUE FULL is immediate command resubmission (i.e., *don't* wait
for a command to complete (the QFULL may have resulted from commands being
processed for *another* initiator), and *don't* wait a period of time for things to
'get better').
I'm not sure I agree with this 100%, but he might have a point.
On Fri, 28 Jun 2002, Jeremy Higdon wrote:
> On Jun 28, 3:39am, Doug Ledford wrote:
> >
> > On Fri, Jun 28, 2002 at 08:08:01AM +0200, Martin Peschke3 wrote:
> > > as you describe below. This condition results in a starvation
> > > of I/O for that particular device. The command which was rejected
> > > with QUEUE_FULL is moved into the ml queue and never retried
> > > since the mid layer only triggers a retry when another command for
> > > that device returns. But there is no other command which
> > > could return.
> > > We think this needs to be fixed in the ml code (e.g. retry periodically
> > > triggered by timer?). Our current workaround is to map QUEUE_FULL
> >
> > Yes, the answer to this problem involves a timer. Basically what I
> > suggest (and do in my driver) is set it so that the driver waits until
> > either A) a command is completed or B) 10ms has passed before sending the
> > command back out, whichever comes first. Since most drives have had
> > plenty of time to complete one or more commands in 10ms, we assume that
> > even if we haven't got a completion yet that the drive likely has some
> > freed up resources and so we try again.
>
>
> Do you change the queue depth that you use based on a QUEUE_FULL, then?
> If so, then I presume you have some minimum, such as two or four
> commands . . . ?
>
> jeremy
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-06-29 2:04 ` Matthew Jacob
@ 2002-06-29 10:05 ` Doug Ledford
2002-06-29 10:37 ` Matthew Jacob
2002-07-01 21:02 ` Gérard Roudier
1 sibling, 1 reply; 24+ messages in thread
From: Doug Ledford @ 2002-06-29 10:05 UTC (permalink / raw)
To: Matthew Jacob; +Cc: Jeremy Higdon, Martin Peschke3, Pete Zaitcev, linux-scsi
On Fri, Jun 28, 2002 at 07:04:28PM -0700, Matthew Jacob wrote:
>
> FWIW, Bob Snively (on the T10 committee) has always claimed that the correct
> response to QUEUE FULL is immediate command resubmission (i.e., *don't* wait
> for a command to complete (the QFULL may have resulted from commands being
> processed for *another* initiator), and *don't* wait a period of time for things to
> 'get better').
>
> I'm not sure I agree with this 100%, but he might have a point.
No, absolutely not! Faster drivers are capable of restarting commands on
the bus so fast that in this scenario they can flood the bus with
selection, command, queue full sequences and not allow *any* commands to
complete because the bus is always tied up (this was, in fact, one of the
reasons that I had to put the queue handling code in the aic7xxx driver
when I did).
--
Doug Ledford <dledford@redhat.com> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-06-29 1:19 ` Jeremy Higdon
2002-06-29 2:04 ` Matthew Jacob
@ 2002-06-29 10:10 ` Doug Ledford
1 sibling, 0 replies; 24+ messages in thread
From: Doug Ledford @ 2002-06-29 10:10 UTC (permalink / raw)
To: Jeremy Higdon; +Cc: Martin Peschke3, Pete Zaitcev, linux-scsi
On Fri, Jun 28, 2002 at 06:19:59PM -0700, Jeremy Higdon wrote:
> On Jun 28, 3:39am, Doug Ledford wrote:
> >
> > Yes, the answer to this problem involves a timer. Basically what I
> > suggest (and do in my driver) is set it so that the driver waits until
> > either A) a command is completed or B) 10ms has passed before sending the
> > command back out, whichever comes first. Since most drives have had
> > plenty of time to complete one or more commands in 10ms, we assume that
> > even if we haven't got a completion yet that the drive likely has some
> > freed up resources and so we try again.
>
>
> Do you change the queue depth that you use based on a QUEUE_FULL, then?
> If so, then I presume you have some minimum, such as two or four
> commands . . . ?
There is a minimum (I think it's 8). But, more importantly, I only adjust
the queue depth if the QUEUE_FULL messages always come at some exact
depth. If the drive returns QUEUE_FULLs at differing depths, then we
assume it's a resource shortage that causes it and that reducing the queue
depth won't necessarily help.
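Paired with the minimum, the adjustment might be clamped like this; the floor of 8 is the value Doug recalls, and the helper name is hypothetical:

```c
/* Clamp a newly observed QUEUE_FULL depth to a sane floor so a
 * pathological drive can't talk us down to one or two tags. */
#define MIN_TAG_DEPTH 8

int clamp_new_depth(int observed_active)
{
	return observed_active < MIN_TAG_DEPTH ? MIN_TAG_DEPTH : observed_active;
}
```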
--
Doug Ledford <dledford@redhat.com> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-06-29 10:05 ` Doug Ledford
@ 2002-06-29 10:37 ` Matthew Jacob
0 siblings, 0 replies; 24+ messages in thread
From: Matthew Jacob @ 2002-06-29 10:37 UTC (permalink / raw)
To: Doug Ledford; +Cc: Jeremy Higdon, Martin Peschke3, Pete Zaitcev, linux-scsi
On Sat, 29 Jun 2002, Doug Ledford wrote:
> On Fri, Jun 28, 2002 at 07:04:28PM -0700, Matthew Jacob wrote:
> >
> > FWIW, Bob Snively (on the T10 committee) has always claimed that the correct
> > response to QUEUE FULL is immediate command resubmission (i.e., *don't* wait
> > for a command to complete (the QFULL may have resulted from commands being
> > processed for *another* initiator), and *don't* wait a period of time for things to
> > 'get better').
> >
> > I'm not sure I agree with this 100%, but he might have a point.
>
> No, absolutely not! Faster drivers are capable of restarting commands on
> the bus so fast that in this scenario they can flood the bus with
> selection, command, queue full sequences and not allow *any* commands to
> complete because the bus is always tied up (this was, in fact, one of the
> reasons that I had to put the queue handling code in the aic7xxx driver
> when I did).
Yeah- I thought of that. But that problem actually goes away somewhat with FC
(&& QAS maybe).
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-07-01 21:02 ` Gérard Roudier
@ 2002-07-01 19:08 ` Matthew Jacob
2002-07-01 19:15 ` Doug Ledford
0 siblings, 1 reply; 24+ messages in thread
From: Matthew Jacob @ 2002-07-01 19:08 UTC (permalink / raw)
To: Gérard Roudier
Cc: Jeremy Higdon, Doug Ledford, Martin Peschke3, Pete Zaitcev,
linux-scsi
>
>
> On Fri, 28 Jun 2002, Matthew Jacob wrote:
>
> > FWIW, Bob Snively (on the T10 committee) has always claimed that the correct
> > response to QUEUE FULL is immediate command resubmission (i.e., *don't* wait
> > for a command to complete (the QFULL may have resulted from commands being
> > processed for *another* initiator), and *don't* wait a period of time for things to
> > 'get better').
> >
> > I'm not sure I agree with this 100%, but he might have a point.
>
> My favorite retry handling algorithm looks like this:
>
> 1) Retry immediately,
> 2) If still fails, retry after some minimal delay T,
> 3) If still fails, retry after delay 2T,
> 4) If still fails, retry after delay 4T,
> *) and so on ... until some maximum value for the delay.
>
> Even with T being very short, you will never flood. OTOH, a transient
> condition that lasts X will be recovered in not more than 2X.
>
> BTW, it isn't implemented in the sym drivers, but I usually use such an
> algorithm when some kind of blind retrying has to be implemented.
Yes, exponential backoff is good in a lot of cases. BTW- this is something I
want to put into cam_periph_error for FreeBSD- I want to make the retry after
selection timeout also have exponential delays up to retry count.
In private mail with Doug && Pete, I managed to put myself into an untenable
position. The problem here is that we don't have end-to-end flow control, so
we have to *guess* when it might be good to try again.
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-07-01 19:08 ` Matthew Jacob
@ 2002-07-01 19:15 ` Doug Ledford
2002-07-01 19:23 ` Matthew Jacob
2002-07-02 11:27 ` Rogier Wolff
0 siblings, 2 replies; 24+ messages in thread
From: Doug Ledford @ 2002-07-01 19:15 UTC (permalink / raw)
To: Matthew Jacob
Cc: Gérard Roudier, Jeremy Higdon, Martin Peschke3, Pete Zaitcev,
linux-scsi
On Mon, Jul 01, 2002 at 12:08:03PM -0700, Matthew Jacob wrote:
> Yes, exponential backoff is good in a lot of cases. BTW- this is something I
> want to put into cam_periph_error for FreeBSD- I want to make the retry after
> selection timeout also have exponential delays up to retry count.
Hmmm...what's the purpose of this, BTW? Selection timeouts on what, fibre
or SPI or something else?
> In private mail with Doug && Pete, I managed to put myself into an untenable
> position. The problem here is that we don't have end-to-end flow control, so
> we have to *guess* when it might be good to try again.
Yep. Experimentation can be your friend on that. That's what I did to
come up with the algorithm I was quoting. However, I didn't try
exponential backoff, so I make no claims that it wouldn't in fact be
better.
--
Doug Ledford <dledford@redhat.com> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-07-01 19:15 ` Doug Ledford
@ 2002-07-01 19:23 ` Matthew Jacob
2002-07-01 19:59 ` Doug Ledford
2002-07-02 11:27 ` Rogier Wolff
1 sibling, 1 reply; 24+ messages in thread
From: Matthew Jacob @ 2002-07-01 19:23 UTC (permalink / raw)
To: Doug Ledford
Cc: Gérard Roudier, Jeremy Higdon, Martin Peschke3, Pete Zaitcev,
linux-scsi
On Mon, 1 Jul 2002, Doug Ledford wrote:
> On Mon, Jul 01, 2002 at 12:08:03PM -0700, Matthew Jacob wrote:
> > Yes, exponential backoff is good in a lot of cases. BTW- this is something I
> > want to put into cam_periph_error for FreeBSD- I want to make the retry after
> > selection timeout also have exponential delays up to retry count.
>
> Hmmm...what's the purpose of this, BTW? Selection timeouts on what, fibre
> or SPI or something else?
Fibre. Devices 'go away' for shortish to longish periods of time on Fibre (or
iSCSI for that matter). Lacking a sane framework that is USB-like about
hotplugging and does *not* cause complete system spasm when this happens
(and it happens a lot), what one wants is a bit of hysteresis to give the
disks that went away a chance to log back into the fabric and thus appear
again (as if they had never gone).
>
> > In private mail with Doug && Pete, I managed to put myself into an untenable
> > position. The problem here is that we don't have end-to-end flow control, so
> > we have to *guess* when it might be good to try again.
>
> Yep. Experimentation can be your friend on that. That's what I did to
> come up with the algorithm I was quoting. However, I didn't try
> exponential backoff, so I make no claims that it wouldn't in fact be
> better.
But the trouble with experimentation can be like what got me into trouble in
our private email- I was relying on stale information (like, 10 years old) in
my thinking. Bad matt.....
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-07-01 19:23 ` Matthew Jacob
@ 2002-07-01 19:59 ` Doug Ledford
2002-07-01 20:17 ` Matthew Jacob
0 siblings, 1 reply; 24+ messages in thread
From: Doug Ledford @ 2002-07-01 19:59 UTC (permalink / raw)
To: Matthew Jacob
Cc: Gérard Roudier, Jeremy Higdon, Martin Peschke3, Pete Zaitcev,
linux-scsi
On Mon, Jul 01, 2002 at 12:23:56PM -0700, Matthew Jacob wrote:
> Fibre. Devices 'go away' for shortish to longish periods of time on Fibre (or
> iSCSI for that matter). Lacking a sane framework that is USB-like about
> hotplugging and does *not* cause complete system spasm when this happens
> (and it happens a lot), what one wants is a bit of hysteresis to give the
> disks that went away a chance to log back into the fabric and thus appear
> again (as if they had never gone).
(I knew this, but it had slipped my mind). This reminds me of something I
was thinking about for the mid layer code stuff. Does FreeBSD have its
drivers declare their connection method anywhere? It seems to me there
are a limited number of connection-specific details that the mid layer
would benefit from being aware of (for example > 8 LUNs on SCSI-2 fibre
devices, but not on SCSI-2 SPI devices, the selection timeout issue you
just raised with transient disk loss which should not happen on SPI,
things like that). Does the FreeBSD mid layer do anything with that sort
of information?
--
Doug Ledford <dledford@redhat.com> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-07-01 19:59 ` Doug Ledford
@ 2002-07-01 20:17 ` Matthew Jacob
0 siblings, 0 replies; 24+ messages in thread
From: Matthew Jacob @ 2002-07-01 20:17 UTC (permalink / raw)
To: Doug Ledford
Cc: Gérard Roudier, Jeremy Higdon, Martin Peschke3, Pete Zaitcev,
linux-scsi
> On Mon, Jul 01, 2002 at 12:23:56PM -0700, Matthew Jacob wrote:
> > Fibre. Devices 'go away' for shortish to longish periods of time on Fibre (or
> > iSCSI for that matter). Lacking a sane framework that is USB-like about
> > hotplugging and does *not* cause complete system spasm when this happens
> > (and it happens a lot), what one wants is a bit of hysteresis to give the
> > disks that went away a chance to log back into the fabric and thus appear
> > again (as if they had never gone).
>
> (I knew this, but it had slipped my mind). This reminds me of something I
> was thinking about for the mid layer code stuff. Does FreeBSD have its
> drivers declare their connection method anywhere? It seems to me there
> are a limited number of connection-specific details that the mid layer
> would benefit from being aware of (for example > 8 LUNs on SCSI-2 fibre
> devices, but not on SCSI-2 SPI devices,
Eh? You can have *way* > 8 LUNs on normal SCSI-2 SPI- you just have to have a
cooperating target that uses the target routine bits. The QLogic 12160 Ultra3
cards have firmware for 32 or 64 LUNs.
> the selection timeout issue you
> just raised with transient disk loss which should not happen on SPI,
> things like that). Does the FreeBSD mid layer do anything with that sort
> of information?
The -current version has a bunch of rewritten code (that isn't the default
yet) that specifically addresses this issue. It allows you to have plug in
transport methods. That allows you to do transport specific things. For
example, speed negotiation is meaningless for FC and ATAPI transports, but
crucial for SPI.
In terms of the 'selection timeout'- what I've wanted more recently is really
a hack. The problem is that you really *do* need to reflect upward that as far
as *you* (the HBA/SIM) can tell, the device has *left* the fabric. It may in
fact reappear instantaneously, but it has left. So, ultimately, you need a
framework that can accommodate this sensibly.
So, rather than the selection timeout hack, you need something which can
sensibly follow a settable policy when the async event of devices leaving
occurs. The policy might be fielded by the midlayer: stop queueing for a
policy-set period of time, and then start EIO'ing commands when that
expires (this is more than normal device command timeouts, which are really
meant to reflect how long a command should take on the device itself, not that
*plus* whatever transmission troubles keep you from actually starting the
command).
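A sketch of that settable-policy idea (all names are hypothetical; no such framework existed in the mid layer at the time):

```python
import time

EIO = 5  # stand-in errno value for this sketch

class DeviceGonePolicy:
    """Sketch of the policy above: when the transport reports that a
    device has left the fabric, hold I/O for a settable grace period;
    if the device has not come back when the period expires, fail
    commands with EIO instead of queueing them forever."""

    def __init__(self, grace_seconds, clock=time.monotonic):
        self.grace = grace_seconds
        self.clock = clock
        self.gone_since = None   # None means the device is present

    def device_left(self):
        self.gone_since = self.clock()

    def device_returned(self):
        self.gone_since = None

    def dispatch(self, cmd):
        if self.gone_since is None:
            return "issue"       # device present: send the command
        if self.clock() - self.gone_since < self.grace:
            return "queue"       # within the grace period: hold it
        return EIO               # grace expired: error it upward
```

Note this is deliberately separate from the per-command timeout, which (as said above) should only measure how long the command takes on the device itself.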
At any rate, the selection timeout hack is a good interim solution for those
of us who don't have the time or the moxie to rewrite as much as actually
*needs* to be rewritten.
-matt
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-06-29 2:04 ` Matthew Jacob
2002-06-29 10:05 ` Doug Ledford
@ 2002-07-01 21:02 ` Gérard Roudier
2002-07-01 19:08 ` Matthew Jacob
1 sibling, 1 reply; 24+ messages in thread
From: Gérard Roudier @ 2002-07-01 21:02 UTC (permalink / raw)
To: Matthew Jacob
Cc: Jeremy Higdon, Doug Ledford, Martin Peschke3, Pete Zaitcev,
linux-scsi
On Fri, 28 Jun 2002, Matthew Jacob wrote:
> FWIW, Bob Snively (on the T10 committee) has always claimed that the correct
> response to QUEUE FULL is immediate command resubmission (i.e., *don't* wait
> for a command to complete (the QFULL may have resulted from commands being
> processed for *another* initiator), and *don't* wait a period of time for things to
> 'get better').
>
> I'm not sure I agree with this 100%, but he might have a point.
My favorite retry handling algorithm looks like this:
1) Retry immediately,
2) If still fails, retry after some minimal delay T,
3) If still fails, retry after delay 2T,
4) If still fails, retry after delay 4T,
*) and so on ... until some maximum value for the delay.
Even with T being very short, you will never flood. OTOH, a transient
condition that lasts X will be recovered in not more than 2X.
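The schedule above in sketch form (T and the cap are whatever the driver picks; this just writes the steps out):

```python
def backoff_delays(t, max_delay, retries):
    """Delay before each retry attempt under the schedule above:
    one immediate retry, then T, 2T, 4T, ... capped at max_delay."""
    delays = [0.0]              # 1) retry immediately
    d = t
    while len(delays) < retries:
        delays.append(min(d, max_delay))
        d *= 2                  # double the delay each failure
    return delays
```

With T = 10 ms and a 1 s cap, six retries wait 0, 10, 20, 40, 80 and 160 ms; since each wait roughly equals the sum of all earlier ones, total elapsed time stays within about twice the last delay, which is where the 2X bound comes from.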
BTW, it isn't implemented in the sym drivers, but I usually use such an
algorithm when some kind of blind retrying has to be implemented.
Gérard.
> On Fri, 28 Jun 2002, Jeremy Higdon wrote:
>
> > On Jun 28, 3:39am, Doug Ledford wrote:
> > >
> > > On Fri, Jun 28, 2002 at 08:08:01AM +0200, Martin Peschke3 wrote:
> > > > as you describe below. This condition results in a starvation
> > > > of I/O for that particular device. The command which was rejected
> > > > with QUEUE_FULL is moved into the ml queue and never retried
> > > > since the mid layer only triggers a retry when another command for
> > > > that device returns. But there is no other command which
> > > > could return.
> > > > We think this needs to be fixed in the ml code (e.g. retry periodically
> > > > triggered by timer?). Our current workaround is to map QUEUE_FULL
> > >
> > > Yes, the answer to this problem involves a timer. Basically what I
> > > suggest (and do in my driver) is set it so that the driver waits until
> > > either A) a command is completed or B) 10ms has passed before sending the
> > > command back out, whichever comes first. Since most drives have had
> > > plenty of time to complete one or more commands in 10ms, we assume that
> > > even if we haven't got a completion yet that the drive likely has some
> > > freed up resources and so we try again.
> >
> >
> > Do you change the queue depth that you use based on a QUEUE_FULL, then?
> > If so, then I presume you have some minimum, such as two or four
> > commands . . . ?
> >
> > jeremy
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
* Re: [PATCH + RFC] Beginning of some updates to scsi mid layer
2002-07-01 19:15 ` Doug Ledford
2002-07-01 19:23 ` Matthew Jacob
@ 2002-07-02 11:27 ` Rogier Wolff
1 sibling, 0 replies; 24+ messages in thread
From: Rogier Wolff @ 2002-07-02 11:27 UTC (permalink / raw)
To: Doug Ledford
Cc: Matthew Jacob, Gérard Roudier, Jeremy Higdon, Martin Peschke3,
Pete Zaitcev, linux-scsi
Doug Ledford wrote:
> On Mon, Jul 01, 2002 at 12:08:03PM -0700, Matthew Jacob wrote:
> > Yes, exponential backoff is good in a lot of cases. BTW- this is something I
> > want to put into cam_periph_error for FreeBSD- I want to make the retry after
> > selection timeout also have exponential delays up to retry count.
>
> Hmmm...what's the purpose of this, BTW? Selection timeouts on what, fibre
> or SPI or something else?
As I'm in the data-recovery business, I've seen more than my share of
"bad disks".
I've seen (mostly IDE) cases where the retry "interrupted" the drive
while it was still recovering from the last problem, leading to a new
error condition, which triggered the same "lengthy recovery" in the
drive. Repeat ad infinitum.
Exponential backoff would've helped.
If "now" we think that a retry after 1 second is right, then I would
suggest something like a "backoff factor" of 1.5, and an initial value
of 0.7 seconds.
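With those numbers (factor 1.5, first delay 0.7 s — both just suggestions), the cumulative waiting time works out like this:

```python
def retry_times(initial=0.7, factor=1.5, n=6):
    """Cumulative seconds elapsed at each retry for a geometric
    backoff starting at `initial` and growing by `factor`."""
    times, elapsed, delay = [], 0.0, initial
    for _ in range(n):
        elapsed += delay
        times.append(elapsed)
        delay *= factor
    return times
```

The second delay is then 0.7 × 1.5 = 1.05 s, close to the 1-second figure, and later retries back off quickly enough to leave a struggling drive alone.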
Roger.
--
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots.
* There are also old, bald pilots.
end of thread, other threads:[~2002-07-02 11:27 UTC | newest]
Thread overview: 24+ messages
-- links below jump to the message on this page --
[not found] <20020619014048.B8623@redhat.com>
2002-06-19 17:44 ` [PATCH + RFC] Beginning of some updates to scsi mid layer Pete Zaitcev
2002-06-19 17:55 ` Matthew Jacob
2002-06-19 18:25 ` Doug Ledford
2002-06-28 5:41 ` Jeremy Higdon
2002-06-28 7:37 ` Doug Ledford
2002-06-28 8:25 Martin Peschke3
2002-06-28 11:22 ` Doug Ledford
-- strict thread matches above, loose matches on Subject: below --
2002-06-28 6:08 Martin Peschke3
2002-06-28 7:39 ` Doug Ledford
2002-06-29 1:19 ` Jeremy Higdon
2002-06-29 2:04 ` Matthew Jacob
2002-06-29 10:05 ` Doug Ledford
2002-06-29 10:37 ` Matthew Jacob
2002-07-01 21:02 ` Gérard Roudier
2002-07-01 19:08 ` Matthew Jacob
2002-07-01 19:15 ` Doug Ledford
2002-07-01 19:23 ` Matthew Jacob
2002-07-01 19:59 ` Doug Ledford
2002-07-01 20:17 ` Matthew Jacob
2002-07-02 11:27 ` Rogier Wolff
2002-06-29 10:10 ` Doug Ledford
2002-06-19 0:47 Doug Ledford
2002-06-19 21:15 ` Patrick Mansfield
2002-06-20 19:45 ` Doug Ledford