public inbox for linux-scsi@vger.kernel.org
* [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached
@ 2003-02-28 19:19 Patrick Mansfield
  2003-03-02  8:57 ` Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Patrick Mansfield @ 2003-02-28 19:19 UTC (permalink / raw)
  To: linux-scsi

Hi -

Any comments or suggestions on the following patch?

Patch is against scsi-misc-2.5.

I'm trying to split the queue_lock into a per-scsi_device lock, and the
starved code is in the way; it is also not fair and looks like it has a
couple of bugs. I started by trying to add a queue of request queues plus
throttling to the current "starved" algorithm, but it was not too clean,
so I instead implemented this software enqueueing when the host can_queue
limit is reached.

This can use more resources (scsi_cmnd's) if the sum of queue_depths on a
host is greater than can_queue, but it is fairer (when not all devices are
hitting queue_depth limits, or if we had throttling with a throttling
limit of 1) and simpler compared to using a queue of queues (or code
similar to the current starvation code).

This also includes moving the host_self_blocked checks (on host
self-unblock we run all queues, so there is never a "starvation" issue),
and changing some of the mlqueue log levels so we do not get flooded with
tons of messages for "scsi log mlqueue 4".

Tested using the feral qla driver, with 10 drives on a host, queue_depth
set to 16, and can_queue set to 100.
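For reference, the enqueue/kick cycle described above can be modeled in a
few lines of stand-alone C. This is only an illustrative sketch of the
idea, not the patch itself: the host/cmnd types and list helpers below are
simplified userspace stand-ins (no locking, no resend path, a plain `>=`
test against can_queue).

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive list, mimicking the kernel's <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev; n->next = h;
	h->prev->next = n; h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next; n->next->prev = n->prev;
	INIT_LIST_HEAD(n);
}

/* pending_queue must stay the first member so the cast in
 * complete_one() below is valid. */
struct cmnd { struct list_head pending_queue; };

struct host {
	int can_queue, host_busy, host_blocked;
	struct list_head pending_queue;	/* software enqueued commands */
};

/* Dispatch: send to the (imaginary) low-level driver if there is room,
 * otherwise defer on the pending queue.  Returns 1 if deferred. */
static int dispatch(struct host *h, struct cmnd *c)
{
	if (h->host_busy >= h->can_queue || h->host_blocked) {
		list_add_tail(&c->pending_queue, &h->pending_queue);
		return 1;
	}
	h->host_busy++;
	return 0;
}

/* Completion: one slot frees up, so send at most one pending command,
 * in FIFO order. */
static void complete_one(struct host *h)
{
	h->host_busy--;
	if (!h->host_blocked && !list_empty(&h->pending_queue)) {
		struct cmnd *c = (struct cmnd *)h->pending_queue.next;
		list_del_init(&c->pending_queue);
		dispatch(h, c);
	}
}
```

Unlike the per-device starved flags this patch removes, fairness here falls
out of FIFO order on the single pending list.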


 hosts.c      |   70 +++++++++++++++++++++++++++++++++++++++++++-----
 hosts.h      |   12 +-------
 scsi.c       |   69 +++++++++++++++++++++++++++++------------------
 scsi.h       |    9 +++---
 scsi_error.c |   51 +++++++++++++++++++++++++++++------
 scsi_lib.c   |   85 ++++++++++++++++++++---------------------------------------
 6 files changed, 182 insertions(+), 114 deletions(-)


===== drivers/scsi/hosts.c 1.54 vs edited =====
--- 1.54/drivers/scsi/hosts.c	Sun Feb 23 10:34:55 2003
+++ edited/drivers/scsi/hosts.c	Thu Feb 27 16:21:09 2003
@@ -383,6 +383,7 @@
 	scsi_assign_lock(shost, &shost->default_lock);
 	INIT_LIST_HEAD(&shost->my_devices);
 	INIT_LIST_HEAD(&shost->eh_cmd_q);
+	INIT_LIST_HEAD(&shost->pending_queue);
 
 	init_waitqueue_head(&shost->host_wait);
 	shost->dma_channel = 0xff;
@@ -613,19 +614,72 @@
 	spin_unlock_irqrestore(shost->host_lock, flags);
 }
 
+void scsi_host_send_pending (struct Scsi_Host *shost)
+{
+	struct scsi_cmnd *scmd;
+	unsigned long flags;
+
+	spin_lock_irqsave(shost->host_lock, flags);
+	while (!shost->host_self_blocked
+	       && !list_empty(&shost->pending_queue)) {
+		/*
+		 * list_for_each_safe is not safe here, since after the
+		 * unlock someone else can remove the scmd after the one
+		 * we are pulling off the list.
+		 */
+		scmd = list_entry(shost->pending_queue.next, struct scsi_cmnd,
+				  pending_queue);
+		list_del_init(&scmd->pending_queue);
+		SCSI_LOG_MLQUEUE(3,
+			printk("%s: removed from pending cmd: 0x%lx\n",
+			       __FUNCTION__, scmd->serial_number));
+		spin_unlock_irqrestore(shost->host_lock, flags);
+		if (scsi_dispatch_cmd(scmd, 0))
+			/*
+			 * Device hit host_blocked, or (race case)
+			 * can_queue limit was reached.
+			 */
+			goto unlocked;
+		spin_lock_irqsave(shost->host_lock, flags);
+	}
+	spin_unlock_irqrestore(shost->host_lock, flags);
+unlocked:
+}
+
 void scsi_host_busy_dec_and_test(struct Scsi_Host *shost, Scsi_Device *sdev)
 {
 	unsigned long flags;
+	struct scsi_cmnd *scmd;
 
 	spin_lock_irqsave(shost->host_lock, flags);
+	/*
+	 * If broken adapters are ever fixed or go away (at least qlogicfc
+	 * and qlogicisp):
+	 *
+	 * WARN_ON(shost->host_busy > shost->can_queue);
+	 */
 	shost->host_busy--;
 	sdev->device_busy--;
-	if (shost->in_recovery && shost->host_failed &&
-	    (shost->host_busy == shost->host_failed))
-	{
-		up(shost->eh_wait);
-		SCSI_LOG_ERROR_RECOVERY(5, printk("Waking error handler"
-					  " thread\n"));
-	}
-	spin_unlock_irqrestore(shost->host_lock, flags);
+	if (shost->in_recovery) {
+		if (shost->host_failed && (shost->host_busy == shost->host_failed)) {
+			up(shost->eh_wait);
+			SCSI_LOG_ERROR_RECOVERY(5, printk("Waking error handler"
+						  " thread\n"));
+		}
+		spin_unlock_irqrestore(shost->host_lock, flags);
+	} else if (!list_empty(&shost->pending_queue) &&
+		   !shost->host_blocked && !shost->host_self_blocked) {
+		/*
+		 * Send one pending command.
+		 */
+		scmd = list_entry(shost->pending_queue.next, struct scsi_cmnd,
+				  pending_queue);
+		list_del_init(&scmd->pending_queue);
+		SCSI_LOG_MLQUEUE(3,
+			printk("%s: removed from pending cmd: 0x%lx\n",
+			       __FUNCTION__, scmd->serial_number));
+		spin_unlock_irqrestore(shost->host_lock, flags);
+		scsi_dispatch_cmd(scmd, 0);
+	} else 
+		spin_unlock_irqrestore(shost->host_lock, flags);
 }
===== drivers/scsi/hosts.h 1.56 vs edited =====
--- 1.56/drivers/scsi/hosts.h	Fri Feb 21 16:41:13 2003
+++ edited/drivers/scsi/hosts.h	Fri Feb 28 09:31:46 2003
@@ -380,6 +380,7 @@
     struct scsi_host_cmd_pool *cmd_pool;
     spinlock_t            free_list_lock;
     struct list_head      free_list;   /* backup store of cmd structs */
+    struct list_head      pending_queue;   /* software enqueued commands */
 
     spinlock_t		  default_lock;
     spinlock_t		  *host_lock;
@@ -471,12 +472,6 @@
     unsigned reverse_ordering:1;
 
     /*
-     * Indicates that one or more devices on this host were starved, and
-     * when the device becomes less busy that we need to feed them.
-     */
-    unsigned some_device_starved:1;
-   
-    /*
      * Host has rejected a command because it was busy.
      */
     unsigned int host_blocked;
@@ -580,12 +575,9 @@
 extern struct Scsi_Host *scsi_host_hn_get(unsigned short);
 extern void scsi_host_put(struct Scsi_Host *);
 extern void scsi_host_init(void);
-
-/*
- * host_busy inc/dec/test functions
- */
 extern void scsi_host_busy_inc(struct Scsi_Host *, Scsi_Device *);
 extern void scsi_host_busy_dec_and_test(struct Scsi_Host *, Scsi_Device *);
+extern void scsi_host_send_pending (struct Scsi_Host *shost);
 
 /**
  * scsi_find_device - find a device given the host
===== drivers/scsi/scsi.c 1.97 vs edited =====
--- 1.97/drivers/scsi/scsi.c	Wed Feb 26 00:00:06 2003
+++ edited/drivers/scsi/scsi.c	Fri Feb 28 09:32:01 2003
@@ -303,6 +303,7 @@
 		cmd->owner = SCSI_OWNER_NOBODY;
 		init_timer(&cmd->eh_timeout);
 		INIT_LIST_HEAD(&cmd->list);
+		INIT_LIST_HEAD(&cmd->pending_queue);
 		spin_lock_irqsave(&dev->list_lock, flags);
 		list_add_tail(&cmd->list, &dev->cmd_list);
 		spin_unlock_irqrestore(&dev->list_lock, flags);
@@ -423,15 +424,17 @@
 }
 
 /*
- * Function:    scsi_dispatch_command
+ * Function:    scsi_dispatch_cmd
  *
  * Purpose:     Dispatch a command to the low-level driver.
  *
  * Arguments:   SCpnt - command block we are dispatching.
+ *              resend - command is being resent, host_busy was not
+ *              decremented (blah), and we have a slot available for SCpnt.
  *
  * Notes:
  */
-int scsi_dispatch_cmd(Scsi_Cmnd * SCpnt)
+int scsi_dispatch_cmd(Scsi_Cmnd * SCpnt, int resend)
 {
 #ifdef DEBUG_DELAY
 	unsigned long clock;
@@ -494,7 +497,7 @@
 	 * We will use a queued command if possible, otherwise we will emulate the
 	 * queuing and calling of completion function ourselves.
 	 */
-	SCSI_LOG_MLQUEUE(3, printk("scsi_dispatch_cmnd (host = %d, channel = %d, target = %d, "
+	SCSI_LOG_MLQUEUE(4, printk("scsi_dispatch_cmnd (host = %d, channel = %d, target = %d, "
 	       "command = %p, buffer = %p, \nbufflen = %d, done = %p)\n",
 	SCpnt->device->host->host_no, SCpnt->device->channel, SCpnt->device->id, SCpnt->cmnd,
 			    SCpnt->buffer, SCpnt->bufflen, SCpnt->done));
@@ -502,7 +505,7 @@
 	SCpnt->state = SCSI_STATE_QUEUED;
 	SCpnt->owner = SCSI_OWNER_LOWLEVEL;
 	if (host->can_queue) {
-		SCSI_LOG_MLQUEUE(3, printk("queuecommand : routine at %p\n",
+		SCSI_LOG_MLQUEUE(4, printk("queuecommand : routine at %p\n",
 					   host->hostt->queuecommand));
 		/*
 		 * Before we queue this command, check if the command
@@ -510,12 +513,33 @@
 		 */
 		if (CDB_SIZE(SCpnt) <= SCpnt->device->host->max_cmd_len) {
 			spin_lock_irqsave(host->host_lock, flags);
-			rtn = host->hostt->queuecommand(SCpnt, scsi_done);
-			spin_unlock_irqrestore(host->host_lock, flags);
-			if (rtn != 0) {
-				scsi_queue_insert(SCpnt, rtn == SCSI_MLQUEUE_DEVICE_BUSY ? rtn : SCSI_MLQUEUE_HOST_BUSY);
+			if (!resend && ((host->host_busy > host->can_queue) ||
+			    host->host_blocked)) {
+				/*
+				 * queue a pending command
+				 */
 				SCSI_LOG_MLQUEUE(3,
-				   printk("queuecommand : request rejected\n"));                                
+					printk("%s: add to pending cmd:"
+					       " 0x%lx\n", __FUNCTION__,
+					       SCpnt->serial_number));
+				scsi_delete_timer(SCpnt);
+				list_add_tail(&SCpnt->pending_queue,
+					      &host->pending_queue);
+				spin_unlock_irqrestore(host->host_lock, flags);
+				rtn = 1;
+			} else {
+				if (!resend)
+					host->host_busy++;
+				rtn = host->hostt->queuecommand(SCpnt,
+								scsi_done);
+				spin_unlock_irqrestore(host->host_lock, flags);
+				if (rtn != 0) {
+					scsi_queue_insert(SCpnt,
+						rtn == SCSI_MLQUEUE_DEVICE_BUSY
+						? rtn : SCSI_MLQUEUE_HOST_BUSY);
+					SCSI_LOG_MLQUEUE(3,
+					   printk("queuecommand : request rejected\n"));                                
+				}
 			}
 		} else {
 			SCSI_LOG_MLQUEUE(3,
@@ -531,6 +555,8 @@
 
 		SCSI_LOG_MLQUEUE(3, printk("command() :  routine at %p\n", host->hostt->command));
                 spin_lock_irqsave(host->host_lock, flags);
+		if (!resend)
+			host->host_busy++;
 		temp = host->hostt->command(SCpnt);
 		SCpnt->result = temp;
 #ifdef DEBUG_DELAY
@@ -547,7 +573,7 @@
 		scsi_done(SCpnt);
                 spin_unlock_irqrestore(host->host_lock, flags);
 	}
-	SCSI_LOG_MLQUEUE(3, printk("leaving scsi_dispatch_cmnd()\n"));
+	SCSI_LOG_MLQUEUE(4, printk("leaving scsi_dispatch_cmnd()\n"));
 	return rtn;
 }
 
@@ -824,7 +850,7 @@
          */
 	memset((void *) SCpnt->sense_buffer, 0, sizeof SCpnt->sense_buffer);
 
-	return scsi_dispatch_cmd(SCpnt);
+	return scsi_dispatch_cmd(SCpnt, 1);
 }
 
 /*
@@ -845,23 +871,12 @@
 
 	ASSERT_LOCK(host->host_lock, 0);
 
-        /*
-         * We need to protect the decrement, as otherwise a race condition
-         * would exist.  Fiddling with SCpnt isn't a problem as the
-         * design only allows a single SCpnt to be active in only
-         * one execution context, but the device and host structures are
-         * shared.
-         */
-	scsi_host_busy_dec_and_test(host, device);
-
-        /*
-         * Clear the flags which say that the device/host is no longer
-         * capable of accepting new commands.  These are set in scsi_queue.c
-         * for both the queue full condition on a device, and for a
-         * host full condition on the host.
-         */
-        host->host_blocked = 0;
         device->device_blocked = 0;
+	scsi_host_busy_dec_and_test(host, device);
+	if (host->host_blocked) {
+		host->host_blocked = 0;
+		scsi_host_send_pending(host);
+	}
 
 	/*
 	 * If we have valid sense information, then some kind of recovery
===== drivers/scsi/scsi.h 1.65 vs edited =====
--- 1.65/drivers/scsi/scsi.h	Sun Feb 23 10:34:55 2003
+++ edited/drivers/scsi/scsi.h	Thu Feb 27 16:21:10 2003
@@ -444,7 +444,7 @@
 /*
  * Prototypes for functions in scsi.c
  */
-extern int scsi_dispatch_cmd(Scsi_Cmnd * SCpnt);
+extern int scsi_dispatch_cmd(Scsi_Cmnd * SCpnt, int resend);
 extern int scsi_setup_command_freelist(struct Scsi_Host *shost);
 extern void scsi_destroy_command_freelist(struct Scsi_Host *shost);
 extern struct scsi_cmnd *scsi_get_command(struct scsi_device *dev, int flags);
@@ -634,8 +634,6 @@
 					 * because we did a bus reset. */
 	unsigned ten:1;		/* support ten byte read / write */
 	unsigned remap:1;	/* support remapping  */
-	unsigned starved:1;	/* unable to process commands because
-				   host busy */
 //	unsigned sync:1;	/* Sync transfer state, managed by host */
 //	unsigned wide:1;	/* WIDE transfer state, managed by host */
 
@@ -727,7 +725,9 @@
 	struct scsi_cmnd *reset_chain;
 
 	struct list_head list;  /* scsi_cmnd participates in queue lists */
-
+	struct list_head pending_queue;  /* for host->pending_queue, this could
+					  * be shared with eh_entry.
+					  */
 	struct list_head eh_entry; /* entry for the host eh_cmd_q */
 	int eh_state;		/* Used for state tracking in error handlr */
 	int eh_eflags;		/* Used by error handlr */
@@ -854,6 +854,7 @@
 #define SCSI_MLQUEUE_HOST_BUSY   0x1055
 #define SCSI_MLQUEUE_DEVICE_BUSY 0x1056
 #define SCSI_MLQUEUE_EH_RETRY    0x1057
+#define SCSI_MLQUEUE_PENDING     0x1058
 
 /*
  * old style reset request from external source
===== drivers/scsi/scsi_error.c 1.38 vs edited =====
--- 1.38/drivers/scsi/scsi_error.c	Sat Feb 22 08:17:01 2003
+++ edited/drivers/scsi/scsi_error.c	Thu Feb 27 16:21:10 2003
@@ -1467,16 +1467,8 @@
 	 * requests are started.
 	 */
 	spin_lock_irqsave(shost->host_lock, flags);
-	list_for_each_entry(sdev, &shost->my_devices, siblings) {
-		if ((shost->can_queue > 0 &&
-		     (shost->host_busy >= shost->can_queue))
-		    || (shost->host_blocked)
-		    || (shost->host_self_blocked)) {
-			break;
-		}
-
+	list_for_each_entry(sdev, &shost->my_devices, siblings)
 		__blk_run_queue(sdev->request_queue);
-	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 }
 
@@ -1497,6 +1489,34 @@
 }
 
 /**
+ * scsi_eh_flush_pending_q - finish pending commands or requeue them.
+ * @pending_q:	list_head of queued pending commands.
+ *
+ **/
+static void scsi_eh_flush_pending_q(struct list_head *pending_q)
+{
+	struct list_head *lh, *lh_sf;
+	struct scsi_cmnd *scmd;
+
+	list_for_each_safe(lh, lh_sf, pending_q) {
+		scmd = list_entry(lh, struct scsi_cmnd, pending_queue);
+		list_del_init(lh);
+		if (scmd->device->online) {
+			SCSI_LOG_ERROR_RECOVERY(3, 
+				printk("%s: flush pending cmd: %p\n",
+				       current->comm, scmd));
+			scsi_queue_insert(scmd, SCSI_MLQUEUE_PENDING);
+		} else {
+			scmd->result |= (DRIVER_TIMEOUT << 24);
+			SCSI_LOG_ERROR_RECOVERY(3,
+				printk("%s: finish pending cmd: %p\n",
+				       current->comm, scmd));
+			scsi_finish_command(scmd);
+		}
+	}
+}
+
+/**
  * scsi_eh_flush_done_q - finish processed commands or retry them.
  * @done_q:	list_head of processed commands.
  *
@@ -1557,6 +1577,7 @@
 	unsigned long flags;
 	LIST_HEAD(eh_work_q);
 	LIST_HEAD(eh_done_q);
+	LIST_HEAD(pending_q);
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	list_splice_init(&shost->eh_cmd_q, &eh_work_q);
@@ -1568,6 +1589,16 @@
 		if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q))
 			scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
 
+	/*
+	 * Finish or requeue all outstanding commands, first the pending
+	 * (so they are sent after any error recovered commands) and then
+	 * any error recovered commands.
+	 */
+	spin_lock_irqsave(shost->host_lock, flags);
+	list_splice_init(&shost->pending_queue, &pending_q);
+	spin_unlock_irqrestore(shost->host_lock, flags);
+	scsi_eh_flush_pending_q(&pending_q);
+
 	scsi_eh_flush_done_q(&eh_done_q);
 }
 
@@ -1659,7 +1690,9 @@
 		 * restart, we restart any I/O to any other devices on the bus
 		 * which are still online.
 		 */
+
 		scsi_restart_operations(shost);
+
 
 	}
 
===== drivers/scsi/scsi_lib.c 1.73 vs edited =====
--- 1.73/drivers/scsi/scsi_lib.c	Sat Feb 22 12:35:33 2003
+++ edited/drivers/scsi/scsi_lib.c	Thu Feb 27 16:21:10 2003
@@ -92,9 +92,11 @@
 {
 	struct Scsi_Host *host = cmd->device->host;
 	struct scsi_device *device = cmd->device;
+	unsigned long flags;
 
 	SCSI_LOG_MLQUEUE(1,
-		 printk("Inserting command %p into mlqueue\n", cmd));
+		 printk("Inserting command %p into mlqueue reason 0x%x\n",
+			cmd, reason));
 
 	/*
 	 * We are inserting the command into the ml queue.  First, we
@@ -127,11 +129,20 @@
 	cmd->owner = SCSI_OWNER_MIDLEVEL;
 	cmd->bh_next = NULL;
 
-	/*
-	 * Decrement the counters, since these commands are no longer
-	 * active on the host/device.
-	 */
-	scsi_host_busy_dec_and_test(host, device);
+	if (reason == SCSI_MLQUEUE_PENDING) {
+		/*
+		 * Pending commands have not actually made it to the host,
+		 * so do not decrement host_busy.
+		 */
+		spin_lock_irqsave(host->host_lock, flags);
+		device->device_busy--;
+		spin_unlock_irqrestore(host->host_lock, flags);
+	} else
+		/*
+		 * Decrement the counters, since these commands are no longer
+		 * active on the host/device.
+		 */
+		scsi_host_busy_dec_and_test(host, device);
 
 	/*
 	 * Insert this command at the head of the queue for it's device.
@@ -358,7 +369,6 @@
 	struct scsi_device *sdev, *sdev2;
 	struct Scsi_Host *shost;
 	unsigned long flags;
-	int all_clear;
 
 	ASSERT_LOCK(q->queue_lock, 0);
 
@@ -400,10 +410,7 @@
 	 * with special case code, then spin off separate versions and
 	 * use function pointers to pick the right one.
 	 */
-	if (sdev->single_lun && blk_queue_empty(q) && sdev->device_busy ==0 &&
-			!shost->host_blocked && !shost->host_self_blocked &&
-			!((shost->can_queue > 0) && (shost->host_busy >=
-				       		     shost->can_queue))) {
+	if (sdev->single_lun && blk_queue_empty(q) && sdev->device_busy == 0) {
 		list_for_each_entry(sdev2, &sdev->same_target_siblings,
 			       same_target_siblings) {
 			if (!sdev2->device_blocked &&
@@ -414,31 +421,6 @@
 		}
 	}
 
-	/*
-	 * Now see whether there are other devices on the bus which
-	 * might be starved.  If so, hit the request function.  If we
-	 * don't find any, then it is safe to reset the flag.  If we
-	 * find any device that it is starved, it isn't safe to reset the
-	 * flag as the queue function releases the lock and thus some
-	 * other device might have become starved along the way.
-	 */
-	all_clear = 1;
-	if (shost->some_device_starved) {
-		list_for_each_entry(sdev, &shost->my_devices, siblings) {
-			if (shost->can_queue > 0 &&
-			    shost->host_busy >= shost->can_queue)
-				break;
-			if (shost->host_blocked || shost->host_self_blocked)
-				break;
-			if (sdev->device_blocked || !sdev->starved)
-				continue;
-			__blk_run_queue(sdev->request_queue);
-			all_clear = 0;
-		}
-
-		if (sdev == NULL && all_clear)
-			shost->some_device_starved = 0;
-	}
 	spin_unlock_irqrestore(q->queue_lock, flags);
 }
 
@@ -1094,6 +1076,7 @@
 				SCSI_LOG_MLQUEUE(3,
 					printk("scsi%d unblocking host at zero depth\n",
 						shost->host_no));
+				WARN_ON(!list_empty(&shost->pending_queue));
 			} else {
 				blk_plug_device(q);
 				break;
@@ -1117,23 +1100,18 @@
 		 */
 		if (sdev->device_blocked)
 			break;
-		if ((shost->can_queue > 0 && shost->host_busy >= shost->can_queue) ||
-		    shost->host_blocked || shost->host_self_blocked) {
+
+		if (shost->host_self_blocked)
+			break;
+
+		if ((sdev->device_busy > 0) && shost->host_blocked)
 			/*
-			 * If we are unable to process any commands at all for
-			 * this device, then we consider it to be starved.
-			 * What this means is that there are no outstanding
-			 * commands for this device and hence we need a
-			 * little help getting it started again
-			 * once the host isn't quite so busy.
+			 * On unblock during IO completion, we send
+			 * pending commands, make sure we always have at
+			 * least one command pending to keep this queue
+			 * running again.
 			 */
-			if (sdev->device_busy == 0) {
-				sdev->starved = 1;
-				shost->some_device_starved = 1;
-			}
 			break;
-		} else
-			sdev->starved = 0;
 
 		/*
 		 * If we couldn't find a request that could be queued, then we
@@ -1170,11 +1148,6 @@
 		if (!(blk_queue_tagged(q) && (blk_queue_start_tag(q, req) == 0)))
 			blkdev_dequeue_request(req);
 	
-		/*
-		 * Now bump the usage count for both the host and the
-		 * device.
-		 */
-		shost->host_busy++;
 		sdev->device_busy++;
 		spin_unlock_irq(q->queue_lock);
 
@@ -1187,7 +1160,7 @@
 		/*
 		 * Dispatch the command to the low-level driver.
 		 */
-		scsi_dispatch_cmd(cmd);
+		scsi_dispatch_cmd(cmd, 0);
 
 		/*
 		 * Now we need to grab the lock again.  We are about to mess

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached
  2003-02-28 19:19 [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached Patrick Mansfield
@ 2003-03-02  8:57 ` Christoph Hellwig
  2003-03-02 18:15   ` Patrick Mansfield
                     ` (2 more replies)
  2003-03-02 20:57 ` Luben Tuikov
  2003-03-05  3:02 ` James Bottomley
  2 siblings, 3 replies; 18+ messages in thread
From: Christoph Hellwig @ 2003-03-02  8:57 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: linux-scsi

The patch looks fine to me in principle; a few nitpicks are below.
(btw, any chance you could post a current scsi midlayer multipathing
patch?  I see there is BK activity.. :))

> +++ edited/drivers/scsi/hosts.c	Thu Feb 27 16:21:09 2003
> @@ -383,6 +383,7 @@
>  	scsi_assign_lock(shost, &shost->default_lock);
>  	INIT_LIST_HEAD(&shost->my_devices);
>  	INIT_LIST_HEAD(&shost->eh_cmd_q);
> +	INIT_LIST_HEAD(&shost->pending_queue);

This isn't consistent; either we use _q in all new code or _queue.
(Personally I prefer the latter.)

> +void scsi_host_send_pending (struct Scsi_Host *shost)

codingstyle issue:  no space between the function name and the opening
parenthesis.

> +{
> +	struct scsi_cmnd *scmd;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(shost->host_lock, flags);
> +	while (!shost->host_self_blocked
> +	       && !list_empty(&shost->pending_queue)) {

Linux codingstyle says the && should be before the line break.



* Re: [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached
  2003-03-02  8:57 ` Christoph Hellwig
@ 2003-03-02 18:15   ` Patrick Mansfield
  2003-03-03 15:52   ` Randy.Dunlap
  2003-03-03 18:17   ` Luben Tuikov
  2 siblings, 0 replies; 18+ messages in thread
From: Patrick Mansfield @ 2003-03-02 18:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi

On Sun, Mar 02, 2003 at 08:57:28AM +0000, Christoph Hellwig wrote:
> The patch looks fine to me in principle, a few nitpicks are below.
> (btw, any chance you could post a current scsi midlayer multipathing
> patch?  I see there is BK activity.. :))

The last multi-path patch is available here:

http://www-124.ibm.com/storageio/multipath/scsi-multipath/releases/2.5.59-mpath-1.patch.gz

Did you want to see something newer?

Mike was talking about pulling it forward to 2.5.latest, maybe he has
started.

> > +++ edited/drivers/scsi/hosts.c	Thu Feb 27 16:21:09 2003
> > @@ -383,6 +383,7 @@
> >  	scsi_assign_lock(shost, &shost->default_lock);
> >  	INIT_LIST_HEAD(&shost->my_devices);
> >  	INIT_LIST_HEAD(&shost->eh_cmd_q);
> > +	INIT_LIST_HEAD(&shost->pending_queue);
> 
> This isn't consistent; either we use _q in all new code or _queue.
> (Personally I prefer the latter.)

Yes, I also prefer _queue; maybe I should make it pending_cmd, then the
name is not an issue, and the type is not embedded in the name (the _q
implies that it is a queue or list_head)?

> > +void scsi_host_send_pending (struct Scsi_Host *shost)
> 
> codingstyle issue:  no space between the function name and the opening
> parenthesis.
> 
> > +{
> > +	struct scsi_cmnd *scmd;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(shost->host_lock, flags);
> > +	while (!shost->host_self_blocked
> > +	       && !list_empty(&shost->pending_queue)) {
> 
> Linux codingstyle says the && should be before the line break.

OK thanks.

-- Patrick Mansfield


* Re: [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached
  2003-02-28 19:19 [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached Patrick Mansfield
  2003-03-02  8:57 ` Christoph Hellwig
@ 2003-03-02 20:57 ` Luben Tuikov
  2003-03-02 21:08   ` Luben Tuikov
  2003-03-03 20:52   ` Patrick Mansfield
  2003-03-05  3:02 ` James Bottomley
  2 siblings, 2 replies; 18+ messages in thread
From: Luben Tuikov @ 2003-03-02 20:57 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: linux-scsi

Patrick Mansfield wrote:
> Hi -
> 
> Any comments or suggestions on the following patch?

Patrick,

The idea is good, but scsi core is not there yet.

Here are a few thematic comments:

1. How does the comment and name relate to each other?

struct list_head      pending_queue;   /* software enqueued commands */

Looking at how this queue is used, a proper name would
be something like ``deferred_cmd_q'', since those are
indeed _deferred_ commands because the host was busy...

I also prefer the suffix ``_cmd_q'' since it describes
that it is a _command queue_, and phonetically ``q''
*is* ``queue'', so there's no reason to spell it out
in variable names.  But I don't have a problem with
a suffix of ``_q'' either, which is what I've used
myself.

In general, let's not use names like ``pending'' and
``siblings'' just because they sound cool or because Knuth
or some paper uses them; let us be more descriptive.

When I mentioned ``pending_q'' from my own mini-scsi-core,
I also mentioned its use; here you seem to be using the
name for a different purpose.

A subsystem, like SCSI Core, should name the variables
*from its own point of view*.  I.e. things like:
incoming_cmd_q, pending_cmd_q, done_cmd_q, for
incoming commands from ULP, pending commands for execution
status (i.e. for which queuecommand() has been called), and
done commands, i.e. for which a LLDD has called scsi_done().

You see, the commands in all these hypothetical queues (but
representative of the state of a scsi command from SCSI Core's
point of view), are all *pending*.  The first queue, pending to
be submitted to a LLDD, the second queue, pending execution
status, and the third, pending to be returned to ULP.

So, please, if this patch were to go in, select a more
descriptive name.

(Descriptive names are a limited resource -- let's use
them wisely.)


2. Just think about how much *easier* your patch would be
if SCSI Core had an ``incoming_cmd_q''.  You could just
enqueue at the front, while the request fn could enqueue
at the end.

As I said, the idea is good (long overdue), but SCSI Core
is not there yet, unless its whole internal queueing mechanism
is overhauled (my beef for the last 1.5 years now).

The thing is that we've added *one more* list_head entry
to the scsi command struct and this makes tracking of
scsi commands harder.  (Active command cancelling, vs.
passive command cancelling.)

If it were up to me, the scsi command struct would have
*only one* list_head entry, which would be used from
its instantiation to its being freed, in which case
the command gets back to the free list or the slab,
and thus you get a closed logical loop.
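That closed loop can be sketched concretely. The queue names below follow
the incoming/pending/done scheme proposed above, and everything here is an
illustrative stand-in (userspace list helpers, hypothetical struct names),
not code from the 2.5 tree:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive list, mimicking the kernel's <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev; n->next = h;
	h->prev->next = n; h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next; n->next->prev = n->prev;
	INIT_LIST_HEAD(n);
}

/* One list_head for the command's whole life: the queue it currently
 * sits on *is* its state/owner. */
struct cmd { struct list_head entry; };

struct core {
	struct list_head free_cmd_q;		/* unused commands	     */
	struct list_head incoming_cmd_q;	/* queued by the ULP	     */
	struct list_head pending_cmd_q;		/* queuecommand() called,
						 * awaiting execution status */
	struct list_head done_cmd_q;		/* LLDD called scsi_done()   */
};

/* Moving the single entry between queues is the whole state machine;
 * a command can never be on two queues, or in ``ether-space''. */
static void move_to(struct cmd *c, struct list_head *q)
{
	list_del_init(&c->entry);
	list_add_tail(&c->entry, q);
}
```

A command then travels free -> incoming -> pending -> done -> free, which
is the closed logical loop described above.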

Tiny comments of a logical nature are inlined, and a general
comment is at the bottom:
 
> Patch is against scsi-misc-2.5.
> 
> I'm trying to split the queue_lock into a per-scsi_device lock, and the
> starved code is in the way; it is also not fair and looks like it has a
> couple of bugs. I started by trying to add a queue of request queues plus
> throttling to the current "starved" algorithm, but it was not too clean,
> so I instead implemented this software enqueueing when the host can_queue
> limit is reached.
> 
> This can use more resources (scsi_cmnd's) if the sum of queue_depths on a
> host is greater than can_queue, but it is fairer (when not all devices are
> hitting queue_depth limits, or if we had throttling with a throttling
> limit of 1) and simpler compared to using a queue of queues (or code
> similar to the current starvation code).
> 
> This also includes moving the host_self_blocked checks (on host
> self-unblock we run all queues, so there is never a "starvation" issue),
> and changing some of the mlqueue log levels so we do not get flooded with
> tons of messages for "scsi log mlqueue 4".
> 
> Tested using the feral qla driver, with 10 drives on a host, queue_depth
> set to 16, and can_queue set to 100.
> 
> 
>  hosts.c      |   70 +++++++++++++++++++++++++++++++++++++++++++-----
>  hosts.h      |   12 +-------
>  scsi.c       |   69 +++++++++++++++++++++++++++++------------------
>  scsi.h       |    9 +++---
>  scsi_error.c |   51 +++++++++++++++++++++++++++++------
>  scsi_lib.c   |   85 ++++++++++++++++++++---------------------------------------
>  6 files changed, 182 insertions(+), 114 deletions(-)
> 
> 
> ===== drivers/scsi/hosts.c 1.54 vs edited =====
> --- 1.54/drivers/scsi/hosts.c	Sun Feb 23 10:34:55 2003
> +++ edited/drivers/scsi/hosts.c	Thu Feb 27 16:21:09 2003
> @@ -383,6 +383,7 @@
>  	scsi_assign_lock(shost, &shost->default_lock);
>  	INIT_LIST_HEAD(&shost->my_devices);
>  	INIT_LIST_HEAD(&shost->eh_cmd_q);
> +	INIT_LIST_HEAD(&shost->pending_queue);
>  
>  	init_waitqueue_head(&shost->host_wait);
>  	shost->dma_channel = 0xff;
> @@ -613,19 +614,72 @@
>  	spin_unlock_irqrestore(shost->host_lock, flags);
>  }
>  
> +void scsi_host_send_pending (struct Scsi_Host *shost)
> +{
> +	struct scsi_cmnd *scmd;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(shost->host_lock, flags);
> +	while (!shost->host_self_blocked
> +	       && !list_empty(&shost->pending_queue)) {
> +		/*
> +		 * list_for_each_safe is not safe here, since after the
> +		 * unlock someone else can remove the scmd after the one
> +		 * we are pulling off the list.
> +		 */
> +		scmd = list_entry(shost->pending_queue.next, struct scsi_cmnd,
> +				  pending_queue);
> +		list_del_init(&scmd->pending_queue);


Out of principle (99.9% of everything I write/post), I don't like
seeing an object yanked out of a queue list just to belong to
``ether-space''.

Logical changes such as the ones your patch makes dictate
that a command move from queue to queue as its state/owner
changes.  Actually its state *is* its owner and vice versa.


> +		SCSI_LOG_MLQUEUE(3,
> +			printk("%s: removed from pending cmd: 0x%lx\n",
> +			       __FUNCTION__, scmd->serial_number));
> +		spin_unlock_irqrestore(shost->host_lock, flags);
> +		if (scsi_dispatch_cmd(scmd, 0))
> +			/*
> +			 * Device hit host_blocked, or (race case)
> +			 * can_queue limit was reached.
> +			 */
> +			goto unlocked;
> +		spin_lock_irqsave(shost->host_lock, flags);
> +	}
> +	spin_unlock_irqrestore(shost->host_lock, flags);
> +unlocked:
> +}


That ``goto unlocked'' is one more reason that the host lock
should just go *and* queuecommand() should be void...


> +
>  void scsi_host_busy_dec_and_test(struct Scsi_Host *shost, Scsi_Device *sdev)


Anything named ``xxx_dec_and_test'' had better *RETURN* the result
of the dec_and_test.  So either rename this function or change
its prototype to return an integral type.
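For comparison, the established kernel convention (as in
atomic_dec_and_test()) is that the helper reports whether the count
reached zero.  A minimal sketch of that shape, as a hypothetical helper:

```c
#include <assert.h>

/* Hypothetical helper illustrating the naming convention: a
 * *_dec_and_test returns nonzero exactly when the decremented
 * count hits zero, so callers can act on the transition. */
static int busy_dec_and_test(int *busy)
{
	return --*busy == 0;
}
```

scsi_host_busy_dec_and_test() could then return such a value instead of
void, letting the caller decide whether to wake the eh thread or kick the
pending queue.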


>  {
>  	unsigned long flags;
> +	struct scsi_cmnd *scmd;
>  
>  	spin_lock_irqsave(shost->host_lock, flags);
> +	/*
> +	 * If broken adapters are ever fixed or go away (at least qlogicfc
> +	 * and qlogicisp):
> +	 *
> +	 * WARN_ON(shost->host_busy > shost->can_queue);
> +	 */
>  	shost->host_busy--;
>  	sdev->device_busy--;
> -	if (shost->in_recovery && shost->host_failed &&
> -	    (shost->host_busy == shost->host_failed))
> -	{
> -		up(shost->eh_wait);
> -		SCSI_LOG_ERROR_RECOVERY(5, printk("Waking error handler"
> -					  " thread\n"));
> -	}
> -	spin_unlock_irqrestore(shost->host_lock, flags);
> +	if (shost->in_recovery) {
> +		if (shost->host_failed && (shost->host_busy == shost->host_failed)) {
> +			up(shost->eh_wait);
> +			SCSI_LOG_ERROR_RECOVERY(5, printk("Waking error handler"
> +						  " thread\n"));
> +		}
> +		spin_unlock_irqrestore(shost->host_lock, flags);
> +	} else if (!list_empty(&shost->pending_queue) &&
> +		   !shost->host_blocked && !shost->host_self_blocked) {
> +		/*
> +		 * Send one pending command.
> +		 */
> +		scmd = list_entry(shost->pending_queue.next, struct scsi_cmnd,
> +				  pending_queue);
> +		list_del_init(&scmd->pending_queue);
> +		SCSI_LOG_MLQUEUE(3,
> +			printk("%s: removed from pending cmd: 0x%lx\n",
> +			       __FUNCTION__, scmd->serial_number));
> +		spin_unlock_irqrestore(shost->host_lock, flags);
> +		scsi_dispatch_cmd(scmd, 0);
> +	} else 
> +		spin_unlock_irqrestore(shost->host_lock, flags);
>  }
> ===== drivers/scsi/hosts.h 1.56 vs edited =====
> --- 1.56/drivers/scsi/hosts.h	Fri Feb 21 16:41:13 2003
> +++ edited/drivers/scsi/hosts.h	Fri Feb 28 09:31:46 2003
> @@ -380,6 +380,7 @@
>      struct scsi_host_cmd_pool *cmd_pool;
>      spinlock_t            free_list_lock;
>      struct list_head      free_list;   /* backup store of cmd structs */
> +    struct list_head      pending_queue;   /* software enqueued commands */


See top of this message.


>  
>      spinlock_t		  default_lock;
>      spinlock_t		  *host_lock;
> @@ -471,12 +472,6 @@
>      unsigned reverse_ordering:1;
>  
>      /*
> -     * Indicates that one or more devices on this host were starved, and
> -     * when the device becomes less busy that we need to feed them.
> -     */
> -    unsigned some_device_starved:1;
> -   
> -    /*
>       * Host has rejected a command because it was busy.
>       */
>      unsigned int host_blocked;
> @@ -580,12 +575,9 @@
>  extern struct Scsi_Host *scsi_host_hn_get(unsigned short);
>  extern void scsi_host_put(struct Scsi_Host *);
>  extern void scsi_host_init(void);
> -
> -/*
> - * host_busy inc/dec/test functions
> - */
>  extern void scsi_host_busy_inc(struct Scsi_Host *, Scsi_Device *);
>  extern void scsi_host_busy_dec_and_test(struct Scsi_Host *, Scsi_Device *);
> +extern void scsi_host_send_pending (struct Scsi_Host *shost);
>  
>  /**
>   * scsi_find_device - find a device given the host
> ===== drivers/scsi/scsi.c 1.97 vs edited =====
> --- 1.97/drivers/scsi/scsi.c	Wed Feb 26 00:00:06 2003
> +++ edited/drivers/scsi/scsi.c	Fri Feb 28 09:32:01 2003
> @@ -303,6 +303,7 @@
>  		cmd->owner = SCSI_OWNER_NOBODY;
>  		init_timer(&cmd->eh_timeout);
>  		INIT_LIST_HEAD(&cmd->list);
> +		INIT_LIST_HEAD(&cmd->pending_queue);
>  		spin_lock_irqsave(&dev->list_lock, flags);
>  		list_add_tail(&cmd->list, &dev->cmd_list);
>  		spin_unlock_irqrestore(&dev->list_lock, flags);
> @@ -423,15 +424,17 @@
>  }
>  
>  /*
> - * Function:    scsi_dispatch_command
> + * Function:    scsi_dispatch_cmd
>   *
>   * Purpose:     Dispatch a command to the low-level driver.
>   *
>   * Arguments:   SCpnt - command block we are dispatching.
> + *              resend - command is being resent, host_busy was not
> + *              decremented (blah), and we have a slot available for SCpnt.
>   *
>   * Notes:
>   */
> -int scsi_dispatch_cmd(Scsi_Cmnd * SCpnt)
> +int scsi_dispatch_cmd(Scsi_Cmnd * SCpnt, int resend)


Oh, boy!

More complication!  Why do you need to change the prototype
for the dispatch fn?  (Rhetorical.)

Because the rest of the infrastructure of SCSI Core is not
up to date to your patch-idea?

See bottom of message.


>  {
>  #ifdef DEBUG_DELAY
>  	unsigned long clock;
> @@ -494,7 +497,7 @@
>  	 * We will use a queued command if possible, otherwise we will emulate the
>  	 * queuing and calling of completion function ourselves.
>  	 */
> -	SCSI_LOG_MLQUEUE(3, printk("scsi_dispatch_cmnd (host = %d, channel = %d, target = %d, "
> +	SCSI_LOG_MLQUEUE(4, printk("scsi_dispatch_cmnd (host = %d, channel = %d, target = %d, "
>  	       "command = %p, buffer = %p, \nbufflen = %d, done = %p)\n",
>  	SCpnt->device->host->host_no, SCpnt->device->channel, SCpnt->device->id, SCpnt->cmnd,
>  			    SCpnt->buffer, SCpnt->bufflen, SCpnt->done));
> @@ -502,7 +505,7 @@
>  	SCpnt->state = SCSI_STATE_QUEUED;
>  	SCpnt->owner = SCSI_OWNER_LOWLEVEL;
>  	if (host->can_queue) {
> -		SCSI_LOG_MLQUEUE(3, printk("queuecommand : routine at %p\n",
> +		SCSI_LOG_MLQUEUE(4, printk("queuecommand : routine at %p\n",
>  					   host->hostt->queuecommand));
>  		/*
>  		 * Before we queue this command, check if the command
> @@ -510,12 +513,33 @@
>  		 */
>  		if (CDB_SIZE(SCpnt) <= SCpnt->device->host->max_cmd_len) {
>  			spin_lock_irqsave(host->host_lock, flags);
> -			rtn = host->hostt->queuecommand(SCpnt, scsi_done);
> -			spin_unlock_irqrestore(host->host_lock, flags);
> -			if (rtn != 0) {
> -				scsi_queue_insert(SCpnt, rtn == SCSI_MLQUEUE_DEVICE_BUSY ? rtn : SCSI_MLQUEUE_HOST_BUSY);
> +			if (!resend && ((host->host_busy > host->can_queue) ||
> +			    host->host_blocked)) {
> +				/*
> +				 * queue a pending command
> +				 */
>  				SCSI_LOG_MLQUEUE(3,
> -				   printk("queuecommand : request rejected\n"));                                
> +					printk("%s: add to pending cmd:"
> +					       " 0x%lx\n", __FUNCTION__,
> +					       SCpnt->serial_number));
> +				scsi_delete_timer(SCpnt);
> +				list_add_tail(&SCpnt->pending_queue,
> +					      &host->pending_queue);
> +				spin_unlock_irqrestore(host->host_lock, flags);
> +				rtn = 1;


Messy, messy, messy, but I sympathize.


> +			} else {
> +				if (!resend)
> +					host->host_busy++;
> +				rtn = host->hostt->queuecommand(SCpnt,
> +								scsi_done);
> +				spin_unlock_irqrestore(host->host_lock, flags);
> +				if (rtn != 0) {
> +					scsi_queue_insert(SCpnt,
> +						rtn == SCSI_MLQUEUE_DEVICE_BUSY
> +						? rtn : SCSI_MLQUEUE_HOST_BUSY);
> +					SCSI_LOG_MLQUEUE(3,
> +					   printk("queuecommand : request rejected\n"));                                
> +				}
>  			}
>  		} else {
>  			SCSI_LOG_MLQUEUE(3,
> @@ -531,6 +555,8 @@
>  
>  		SCSI_LOG_MLQUEUE(3, printk("command() :  routine at %p\n", host->hostt->command));
>                  spin_lock_irqsave(host->host_lock, flags);
> +		if (!resend)
> +			host->host_busy++;
>  		temp = host->hostt->command(SCpnt);
>  		SCpnt->result = temp;
>  #ifdef DEBUG_DELAY
> @@ -547,7 +573,7 @@
>  		scsi_done(SCpnt);
>                  spin_unlock_irqrestore(host->host_lock, flags);
>  	}
> -	SCSI_LOG_MLQUEUE(3, printk("leaving scsi_dispatch_cmnd()\n"));
> +	SCSI_LOG_MLQUEUE(4, printk("leaving scsi_dispatch_cmnd()\n"));
>  	return rtn;
>  }
>  
> @@ -824,7 +850,7 @@
>           */
>  	memset((void *) SCpnt->sense_buffer, 0, sizeof SCpnt->sense_buffer);
>  
> -	return scsi_dispatch_cmd(SCpnt);
> +	return scsi_dispatch_cmd(SCpnt, 1);
>  }
>  
>  /*
> @@ -845,23 +871,12 @@
>  
>  	ASSERT_LOCK(host->host_lock, 0);
>  
> -        /*
> -         * We need to protect the decrement, as otherwise a race condition
> -         * would exist.  Fiddling with SCpnt isn't a problem as the
> -         * design only allows a single SCpnt to be active in only
> -         * one execution context, but the device and host structures are
> -         * shared.
> -         */
> -	scsi_host_busy_dec_and_test(host, device);
> -
> -        /*
> -         * Clear the flags which say that the device/host is no longer
> -         * capable of accepting new commands.  These are set in scsi_queue.c
> -         * for both the queue full condition on a device, and for a
> -         * host full condition on the host.
> -         */
> -        host->host_blocked = 0;
>          device->device_blocked = 0;
> +	scsi_host_busy_dec_and_test(host, device);
> +	if (host->host_blocked) {
> +		host->host_blocked = 0;
> +		scsi_host_send_pending(host);
> +	}
>  
>  	/*
>  	 * If we have valid sense information, then some kind of recovery
> ===== drivers/scsi/scsi.h 1.65 vs edited =====
> --- 1.65/drivers/scsi/scsi.h	Sun Feb 23 10:34:55 2003
> +++ edited/drivers/scsi/scsi.h	Thu Feb 27 16:21:10 2003
> @@ -444,7 +444,7 @@
>  /*
>   * Prototypes for functions in scsi.c
>   */
> -extern int scsi_dispatch_cmd(Scsi_Cmnd * SCpnt);
> +extern int scsi_dispatch_cmd(Scsi_Cmnd * SCpnt, int resend);
>  extern int scsi_setup_command_freelist(struct Scsi_Host *shost);
>  extern void scsi_destroy_command_freelist(struct Scsi_Host *shost);
>  extern struct scsi_cmnd *scsi_get_command(struct scsi_device *dev, int flags);
> @@ -634,8 +634,6 @@
>  					 * because we did a bus reset. */
>  	unsigned ten:1;		/* support ten byte read / write */
>  	unsigned remap:1;	/* support remapping  */
> -	unsigned starved:1;	/* unable to process commands because
> -				   host busy */
>  //	unsigned sync:1;	/* Sync transfer state, managed by host */
>  //	unsigned wide:1;	/* WIDE transfer state, managed by host */
>  
> @@ -727,7 +725,9 @@
>  	struct scsi_cmnd *reset_chain;
>  
>  	struct list_head list;  /* scsi_cmnd participates in queue lists */
> -
> +	struct list_head pending_queue;  /* for host->pending_queue, this could
> +					  * be shared with eh_entry.
> +					  */


Oh, boy!  One more list_head...
As I mentioned above, this would make the logical structure
of what belongs where look like a spaghetti dish.


>  	struct list_head eh_entry; /* entry for the host eh_cmd_q */
>  	int eh_state;		/* Used for state tracking in error handlr */
>  	int eh_eflags;		/* Used by error handlr */
> @@ -854,6 +854,7 @@
>  #define SCSI_MLQUEUE_HOST_BUSY   0x1055
>  #define SCSI_MLQUEUE_DEVICE_BUSY 0x1056
>  #define SCSI_MLQUEUE_EH_RETRY    0x1057
> +#define SCSI_MLQUEUE_PENDING     0x1058
>  
>  /*
>   * old style reset request from external source
> ===== drivers/scsi/scsi_error.c 1.38 vs edited =====
> --- 1.38/drivers/scsi/scsi_error.c	Sat Feb 22 08:17:01 2003
> +++ edited/drivers/scsi/scsi_error.c	Thu Feb 27 16:21:10 2003
> @@ -1467,16 +1467,8 @@
>  	 * requests are started.
>  	 */
>  	spin_lock_irqsave(shost->host_lock, flags);
> -	list_for_each_entry(sdev, &shost->my_devices, siblings) {
> -		if ((shost->can_queue > 0 &&
> -		     (shost->host_busy >= shost->can_queue))
> -		    || (shost->host_blocked)
> -		    || (shost->host_self_blocked)) {
> -			break;
> -		}
> -
> +	list_for_each_entry(sdev, &shost->my_devices, siblings)
>  		__blk_run_queue(sdev->request_queue);
> -	}
>  	spin_unlock_irqrestore(shost->host_lock, flags);
>  }
>  
> @@ -1497,6 +1489,34 @@
>  }
>  
>  /**
> + * scsi_eh_flush_pending_q - finish pending commands or requeue them.
> + * @pending_q:	list_head of queued pending commands.
> + *
> + **/
> +static void scsi_eh_flush_pending_q(struct list_head *pending_q)
> +{
> +	struct list_head *lh, *lh_sf;
> +	struct scsi_cmnd *scmd;
> +
> +	list_for_each_safe(lh, lh_sf, pending_q) {
> +		scmd = list_entry(lh, struct scsi_cmnd, pending_queue);
> +		list_del_init(lh);
> +		if (scmd->device->online) {
> +			SCSI_LOG_ERROR_RECOVERY(3, 
> +				printk("%s: flush pending cmd: %p\n",
> +				       current->comm, scmd));
> +			scsi_queue_insert(scmd, SCSI_MLQUEUE_PENDING);
> +		} else {
> +			scmd->result |= (DRIVER_TIMEOUT << 24);
> +			SCSI_LOG_ERROR_RECOVERY(3,
> +				printk("%s: finish pending cmd: %p\n",
> +				       current->comm, scmd));
> +			scsi_finish_command(scmd);
> +		}
> +	}
> +}


No.  I don't like it (from logical point of view of course).
This will get too tricky as the device could be in many more
states, then you'll have the burden of deciding what to
do with the commands...

Decisions of what should be done with a command should
be centralized in a single function.
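One way to read that suggestion is a single disposition function that every flush path consults. A minimal userspace sketch follows, with invented names (`toy_cmd` and `device_online` are illustrative, not the real scsi_cmnd layout); only the DRIVER_TIMEOUT shift matches the quoted patch:

```c
#include <assert.h>

enum scmd_disp { DISP_REQUEUE, DISP_FINISH_TIMEOUT };

struct toy_cmd {
	int device_online;
	int result;
};

/*
 * Centralized decision: flush loops call this and act on the answer,
 * instead of growing per-state branches of their own.
 */
static enum scmd_disp toy_decide_disposition(struct toy_cmd *c)
{
	if (c->device_online)
		return DISP_REQUEUE;
	c->result |= (0x06 << 24);	/* DRIVER_TIMEOUT, as in the patch */
	return DISP_FINISH_TIMEOUT;
}

static int demo(void)
{
	struct toy_cmd dead = { .device_online = 0 };
	struct toy_cmd live = { .device_online = 1 };
	int d1 = toy_decide_disposition(&dead);	/* finish with timeout */
	int d2 = toy_decide_disposition(&live);	/* requeue             */

	return d1 * 100 + d2 * 10 + (dead.result >> 24);
}
```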


> +
> +/**
>   * scsi_eh_flush_done_q - finish processed commands or retry them.
>   * @done_q:	list_head of processed commands.
>   *
> @@ -1557,6 +1577,7 @@
>  	unsigned long flags;
>  	LIST_HEAD(eh_work_q);
>  	LIST_HEAD(eh_done_q);
> +	LIST_HEAD(pending_q);
>  
>  	spin_lock_irqsave(shost->host_lock, flags);
>  	list_splice_init(&shost->eh_cmd_q, &eh_work_q);
> @@ -1568,6 +1589,16 @@
>  		if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q))
>  			scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
>  
> +	/*
> +	 * Finish or requeue all outstanding commands, first the pending
> +	 * (so they are sent after any error recovered commands) and then
> +	 * any error recovered commands.
> +	 */
> +	spin_lock_irqsave(shost->host_lock, flags);
> +	list_splice_init(&shost->pending_queue, &pending_q);
> +	spin_unlock_irqrestore(shost->host_lock, flags);
> +	scsi_eh_flush_pending_q(&pending_q);
> +
>  	scsi_eh_flush_done_q(&eh_done_q);


If you're marrying a function name with an object of SCSI Core
shouldn't this be called ``scsi_flush_eh_done_q()''?


>  }
>  
> @@ -1659,7 +1690,9 @@
>  		 * restart, we restart any I/O to any other devices on the bus
>  		 * which are still online.
>  		 */
> +
>  		scsi_restart_operations(shost);
> +
>  
>  	}
>  
> ===== drivers/scsi/scsi_lib.c 1.73 vs edited =====
> --- 1.73/drivers/scsi/scsi_lib.c	Sat Feb 22 12:35:33 2003
> +++ edited/drivers/scsi/scsi_lib.c	Thu Feb 27 16:21:10 2003
> @@ -92,9 +92,11 @@
>  {
>  	struct Scsi_Host *host = cmd->device->host;
>  	struct scsi_device *device = cmd->device;
> +	unsigned long flags;
>  
>  	SCSI_LOG_MLQUEUE(1,
> -		 printk("Inserting command %p into mlqueue\n", cmd));
> +		 printk("Inserting command %p into mlqueue reason 0x%x\n",
> +			cmd, reason));
>  
>  	/*
>  	 * We are inserting the command into the ml queue.  First, we
> @@ -127,11 +129,20 @@
>  	cmd->owner = SCSI_OWNER_MIDLEVEL;
>  	cmd->bh_next = NULL;
>  
> -	/*
> -	 * Decrement the counters, since these commands are no longer
> -	 * active on the host/device.
> -	 */
> -	scsi_host_busy_dec_and_test(host, device);
> +	if (reason == SCSI_MLQUEUE_PENDING) {
> +		/*
> +		 * Pending commands have not actually made it to the host,
> +		 * so do not decrement host_busy.
> +		 */
> +		spin_lock_irqsave(host->host_lock, flags);
> +		device->device_busy--;
> +		spin_unlock_irqrestore(host->host_lock, flags);
> +	} else
> +		/*
> +		 * Decrement the counters, since these commands are no longer
> +		 * active on the host/device.
> +		 */
> +		scsi_host_busy_dec_and_test(host, device);
>  
>  	/*
>  	 * Insert this command at the head of the queue for it's device.
> @@ -358,7 +369,6 @@
>  	struct scsi_device *sdev, *sdev2;
>  	struct Scsi_Host *shost;
>  	unsigned long flags;
> -	int all_clear;
>  
>  	ASSERT_LOCK(q->queue_lock, 0);
>  
> @@ -400,10 +410,7 @@
>  	 * with special case code, then spin off separate versions and
>  	 * use function pointers to pick the right one.
>  	 */
> -	if (sdev->single_lun && blk_queue_empty(q) && sdev->device_busy ==0 &&
> -			!shost->host_blocked && !shost->host_self_blocked &&
> -			!((shost->can_queue > 0) && (shost->host_busy >=
> -				       		     shost->can_queue))) {
> +	if (sdev->single_lun && blk_queue_empty(q) && sdev->device_busy == 0) {
>  		list_for_each_entry(sdev2, &sdev->same_target_siblings,
>  			       same_target_siblings) {
>  			if (!sdev2->device_blocked &&
> @@ -414,31 +421,6 @@
>  		}
>  	}
>  
> -	/*
> -	 * Now see whether there are other devices on the bus which
> -	 * might be starved.  If so, hit the request function.  If we
> -	 * don't find any, then it is safe to reset the flag.  If we
> -	 * find any device that it is starved, it isn't safe to reset the
> -	 * flag as the queue function releases the lock and thus some
> -	 * other device might have become starved along the way.
> -	 */
> -	all_clear = 1;
> -	if (shost->some_device_starved) {
> -		list_for_each_entry(sdev, &shost->my_devices, siblings) {
> -			if (shost->can_queue > 0 &&
> -			    shost->host_busy >= shost->can_queue)
> -				break;
> -			if (shost->host_blocked || shost->host_self_blocked)
> -				break;
> -			if (sdev->device_blocked || !sdev->starved)
> -				continue;
> -			__blk_run_queue(sdev->request_queue);
> -			all_clear = 0;
> -		}
> -
> -		if (sdev == NULL && all_clear)
> -			shost->some_device_starved = 0;
> -	}
>  	spin_unlock_irqrestore(q->queue_lock, flags);
>  }
>  
> @@ -1094,6 +1076,7 @@
>  				SCSI_LOG_MLQUEUE(3,
>  					printk("scsi%d unblocking host at zero depth\n",
>  						shost->host_no));
> +				WARN_ON(!list_empty(&shost->pending_queue));
>  			} else {
>  				blk_plug_device(q);
>  				break;
> @@ -1117,23 +1100,18 @@
>  		 */
>  		if (sdev->device_blocked)
>  			break;
> -		if ((shost->can_queue > 0 && shost->host_busy >= shost->can_queue) ||
> -		    shost->host_blocked || shost->host_self_blocked) {
> +
> +		if (shost->host_self_blocked)
> +			break;
> +
> +		if ((sdev->device_busy > 0) && shost->host_blocked)
>  			/*
> -			 * If we are unable to process any commands at all for
> -			 * this device, then we consider it to be starved.
> -			 * What this means is that there are no outstanding
> -			 * commands for this device and hence we need a
> -			 * little help getting it started again
> -			 * once the host isn't quite so busy.
> +			 * On unblock during IO completion, we send
> +			 * pending commands, make sure we always have at
> +			 * least one command pending to keep this queue
> +			 * running again.
>  			 */
> -			if (sdev->device_busy == 0) {
> -				sdev->starved = 1;
> -				shost->some_device_starved = 1;
> -			}
>  			break;
> -		} else
> -			sdev->starved = 0;
>  
>  		/*
>  		 * If we couldn't find a request that could be queued, then we
> @@ -1170,11 +1148,6 @@
>  		if (!(blk_queue_tagged(q) && (blk_queue_start_tag(q, req) == 0)))
>  			blkdev_dequeue_request(req);
>  	
> -		/*
> -		 * Now bump the usage count for both the host and the
> -		 * device.
> -		 */
> -		shost->host_busy++;
>  		sdev->device_busy++;
>  		spin_unlock_irq(q->queue_lock);
>  
> @@ -1187,7 +1160,7 @@
>  		/*
>  		 * Dispatch the command to the low-level driver.
>  		 */
> -		scsi_dispatch_cmd(cmd);
> +		scsi_dispatch_cmd(cmd, 0);
>  
>  		/*
>  		 * Now we need to grab the lock again.  We are about to mess
> 

Overall, I think for the last 2 months we saw too many additions
of list_head queues and list_head entries (abuse of struct list_head?).

I think that this makes SCSI Core more complicated than its purpose
warrants, and harder to maintain and implement other needed features
like *active* command cancelling (cf. passive command cancelling).

I appreciate your effort to incorporate some of the [queuing]
ideas of my own mini-scsi-core, which I spoke of so many times,
but as I said, for them to be of any use, a general overhaul of
SCSI Core is needed.

-- 
Luben




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached
  2003-03-02 20:57 ` Luben Tuikov
@ 2003-03-02 21:08   ` Luben Tuikov
  2003-03-03 20:52   ` Patrick Mansfield
  1 sibling, 0 replies; 18+ messages in thread
From: Luben Tuikov @ 2003-03-02 21:08 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: linux-scsi

I wrote:
[cut]
> 
> I also prefer the prefix ``_cmd_q'' since it describes

That's ``postfix'', but you knew that...

> that it is a _command queue_, and phonetically ``q''
> *is* ``queue'', thus there's no reason to spell it out
> in variable names.  But I don't have problems with
> a prefix of ``_q'', as well, which is what I've used

Same here.

> myself.

[cut]

-- 
Luben





* Re: [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached
  2003-03-02  8:57 ` Christoph Hellwig
  2003-03-02 18:15   ` Patrick Mansfield
@ 2003-03-03 15:52   ` Randy.Dunlap
  2003-03-03 18:17   ` Luben Tuikov
  2 siblings, 0 replies; 18+ messages in thread
From: Randy.Dunlap @ 2003-03-03 15:52 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: patmans, linux-scsi

On Sun, 2 Mar 2003 08:57:28 +0000
Christoph Hellwig <hch@infradead.org> wrote:

| The patch looks fine to me in principle, a few nitpicks are below.
| (btw, any chance you could post a current scsi midlayer multipathing
| patch?  I see there is BK activity.. :))
| 
| > +++ edited/drivers/scsi/hosts.c	Thu Feb 27 16:21:09 2003
| > @@ -383,6 +383,7 @@
| >  	scsi_assign_lock(shost, &shost->default_lock);
| >  	INIT_LIST_HEAD(&shost->my_devices);
| >  	INIT_LIST_HEAD(&shost->eh_cmd_q);
| > +	INIT_LIST_HEAD(&shost->pending_queue);
| 
| This isn't consistent, either we use _q in all new code or _queue.
| (personally I prefer the latter)

I agree; looks/reads better, easier to remember.

| > +void scsi_host_send_pending (struct Scsi_Host *shost)
| 
| codingstyle issue:  no space between function name and opening brace.

It's much more readable with the space there IMO.

| > +{
| > +	struct scsi_cmnd *scmd;
| > +	unsigned long flags;
| > +
| > +	spin_lock_irqsave(shost->host_lock, flags);
| > +	while (!shost->host_self_blocked
| > +	       && !list_empty(&shost->pending_queue)) {
| 
| Linux codingstyle says the && should be before the line break.

While I also prefer the && at the end of the line, I don't see that as
a requirement anywhere... and not in Documentation/CodingStyle (and I'm
not saying that you said it's in there).

--
~Randy


* Re: [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached
  2003-03-02  8:57 ` Christoph Hellwig
  2003-03-02 18:15   ` Patrick Mansfield
  2003-03-03 15:52   ` Randy.Dunlap
@ 2003-03-03 18:17   ` Luben Tuikov
  2003-03-04  1:11     ` Andrew Morton
  2 siblings, 1 reply; 18+ messages in thread
From: Luben Tuikov @ 2003-03-03 18:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Patrick Mansfield, linux-scsi

Christoph Hellwig wrote:
[cut]
>>+{
>>+	struct scsi_cmnd *scmd;
>>+	unsigned long flags;
>>+
>>+	spin_lock_irqsave(shost->host_lock, flags);
>>+	while (!shost->host_self_blocked
>>+	       && !list_empty(&shost->pending_queue)) {
> 
> 
> Linux codingstyle says the && should be before the line break.

Actually there's a point to this.

Vision research has shown that the human eye doesn't scan/parse
more than the first 5% to 15% of a text line when
searching for an item.

When a binary logical operator is on the continued (second)
line of the if test, the brain automatically assumes that
this is a _continuation_ of the logical condition -- since
no valid C language line starts _with_ a _binary logical op_.

But if the binary logical op were on the previous line, then
the next line would look just like a normal line and one would
_have_ to look at the previous line to see if it ends with `}' or
`;'.

There's a book called "C Style: Standards and Guidelines" by
David Straker, (C) 1992, Prentice Hall, ISBN: 0-13-116898-3.
More information can be found therein.

-- 
Luben




* Re: [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached
  2003-03-02 20:57 ` Luben Tuikov
  2003-03-02 21:08   ` Luben Tuikov
@ 2003-03-03 20:52   ` Patrick Mansfield
  2003-03-03 22:40     ` Luben Tuikov
  1 sibling, 1 reply; 18+ messages in thread
From: Patrick Mansfield @ 2003-03-03 20:52 UTC (permalink / raw)
  To: Luben Tuikov; +Cc: linux-scsi

On Sun, Mar 02, 2003 at 03:57:27PM -0500, Luben Tuikov wrote:
> Patrick Mansfield wrote:

> Patrick,
> 
> The idea is good, but scsi core is not there yet.

The patch fixes the problem we currently have when the can_queue limit is
reached, and helps move towards a per scsi_device queue_lock.

I coded it in with the idea of moving towards further separation of
scsi_host versus scsi_device - for example, always decrementing host_busy
when a command completes, but I'm not trying to address other general
issues.

> Here are a few thematical comments:
> 
> 1. How does the comment and name relate to each other?
> 
> struct list_head      pending_queue;   /* software enqueued commands */

> So, please, if this patch were to go in, select a more
> descriptive name.

I'll rename the field to pending_cmd - it is a command that is pending
for this host adapter, and just avoid the q vs queue postfix issue.

> 2. Just think about how much *easier* your patch would be
> if SCSI Core had a ``incoming_cmd_q''?   You could just
> enqueue at the front, while the request fn could enqueue
> at the end.

If we do that, then with the current code, whenever can_queue is not
hit we would always add to the list and then immediately remove the
same command from the list for further processing.

It would be interesting to use a block request queue for sending commands
to the host adapter, but that is beyond the scope of this patch.

> The thing is that we've added *one more* list_head entry
> to the scsi command struct and this makes tracking of
> scsi commands harder.  (Active command cancelling, vs.
> passive command cancelling.)
> 
> If it were up to me, the scsi command struct would have
> *only one* list_head entry, which would be used from
> its instantiation to its being freed, in which case
> the command gets back to the free list or the slab,
> and thus you get a closed logical loop.

The list_head is there because of how we are implementing the lists or
queueing - we could create a list of scsi_cmnd's that does not require a
separate field within scsi_cmnd.

We can combine the scsi_cmnd list_head pending with the eh_entry, if there
is agreement that a smaller scsi_cmnd is worth the list_head overloading.

> > +		scmd = list_entry(shost->pending_queue.next, struct scsi_cmnd,
> > +				  pending_queue);
> > +		list_del_init(&scmd->pending_queue);
> 
> 
> Out of principle (99.9% of everything I write/post), I don't like
> seeing an object yanked out of a queue list just to belong to
> ``ether-space''.
> 
> Such logical changes, as your patch warrants, dictate
> that a command is to move from queue to queue, as its state/owner
> changes.  Actually its state *is* its owner and vice versa.

This is not a queue of commands for the host adapter to process; it is a
queue of commands the host adapter will be sent in the future, so we are
either on the queue, or not on the queue.

> > -int scsi_dispatch_cmd(Scsi_Cmnd * SCpnt)
> > +int scsi_dispatch_cmd(Scsi_Cmnd * SCpnt, int resend)
> 
> 
> Oh, boy!
> 
> More complication!  Why do you need to change the prototype
> for the dispatch fn?  (Rhetorical.)
> 
> Because the rest of the infrastructure of SCSI Core is not
> up to date to your patch-idea?

Slightly, but even so it is nice to have some sort of priority for the
scsi_cmnd, and a resend can imply higher priority.

> > +static void scsi_eh_flush_pending_q(struct list_head *pending_q)
> > +{
> > +	struct list_head *lh, *lh_sf;
> > +	struct scsi_cmnd *scmd;
> > +
> > +	list_for_each_safe(lh, lh_sf, pending_q) {
> > +		scmd = list_entry(lh, struct scsi_cmnd, pending_queue);
> > +		list_del_init(lh);
> > +		if (scmd->device->online) {
> > +			SCSI_LOG_ERROR_RECOVERY(3, 
> > +				printk("%s: flush pending cmd: %p\n",
> > +				       current->comm, scmd));
> > +			scsi_queue_insert(scmd, SCSI_MLQUEUE_PENDING);
> > +		} else {
> > +			scmd->result |= (DRIVER_TIMEOUT << 24);
> > +			SCSI_LOG_ERROR_RECOVERY(3,
> > +				printk("%s: finish pending cmd: %p\n",
> > +				       current->comm, scmd));
> > +			scsi_finish_command(scmd);
> > +		}
> > +	}
> > +}
> 
> 
> No.  I don't like it (from logical point of view of course).
> This will get too tricky as the device could be in many more
> states, then you'll have the burden of deciding what to
> do with the commands...
> 
> Decisions of what should be done with a command should
> be centralized in a single function.

I generally do not like having two lists that we must flush, but I don't
have a simpler idea that works within the context of the current code.

-- Patrick Mansfield


* Re: [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached
  2003-03-03 20:52   ` Patrick Mansfield
@ 2003-03-03 22:40     ` Luben Tuikov
  2003-03-03 23:41       ` Patrick Mansfield
  0 siblings, 1 reply; 18+ messages in thread
From: Luben Tuikov @ 2003-03-03 22:40 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: linux-scsi

Patrick Mansfield wrote:
> 
> The patch fixes the problem we currently have when the can_queue limit is
> reached, and helps move towards a per scsi_device queue_lock.

Yes, I saw that.  This is why I said ``deferred_cmd_q'', since those
are _deferred_ commands which _became_ *deferred* when the host couldn't
take them (can_queue).

> I coded it in with the idea of moving towards further separation of
> scsi_host versus scsi_device - for example, always decrementing host_busy
> when a command completes, but I'm not trying to address other general
> issues.

Well, you have to make up your mind -- addressing other issues is
inevitable when you want to plug a different approach-idea into
the old infrastructure.  (Kind of reminds me of those baby toys,
where the cube will never fit through the disk shape -- such a clever
toy. :-> )

> I'll rename the field to pending_cmd - it is a command that is pending
> for this host adapter, and just avoid the q vs queue postfix issue.

Ah, yes, sadly I see the [political] compromise -- I promise to
straighten this out with the source right after I finish this email.
In due time you're allowed free thinking. :-)

But isn't it a queue? Furthermore, isn't it a queue of _deferred_
*by SCSI Core* commands, because the host command limit were hit?

Then wouldn't it *make sense* to call it ``deferred_q'' or
``deferred_cmd_q''?

Doesn't this just *make sense*?

I really dislike your choice of ``pending_.*'' as I've secretly
reserved it for the queue of commands pending execution status by
the device/LUN, i.e. submitted to the LLDD via queuecommand() --
in a future SCSI Core.  And, btw, this is what I have in my own
mini-scsi-core (incoming_q, pending_q, done_q, plus a few more) --
so, yes, I am biased. :-)  But I hope this is a good bias.

>>2. Just think about how much *easier* your patch would be
>>if SCSI Core had a ``incoming_cmd_q''?   You could just
>>enqueue at the front, while the request fn could enqueue
>>at the end.
> 
> 
> If we do that, for the current code if we never hit can_queue, then we
> would always add to the list and then immediately remove the same command
> from the list for further processing.

This is absolutely correct!  We do not want to buffer scsi commands
in SCSI Core, we just want to redirect and intercept them!

But, we want the *ability* to buffer, in such situations as when
a command queue limit has been reached, which your patch addresses.

But this gets more complicated, as you'll have to check with the
portal, or target or LU, for *what* the reason is that a command
queue limit has been reached!

It may be the case that you cannot afford to buffer commands,
but you'll have to terminate them immediately with CHECK CONDITION
status and an appropriate ASC and ASCQ codes.  Or if the error
was in the portal, it may be the case that you'll have to
return SERVICE DELIVERY FAILURE as service response... (in a new-scsi-core).

> It would be interesting to use a block request queue for sending commands
> to the host adapter, but that is beyond the scope of this patch.

Well, please don't forget that block commands are a special case.
SCSI Core should only work with scsi command structs, as more exotic
sources will send scsi commands directly to SCSI Core, to more exotic
devices (e.g. processors), in the future.
SG is right now a special case of a general nature. :-)

So in the future, or a new SCSI Core, I see a general entry of
a sort ala-execute_scsi_command().

> The list_head is there because of how we are implementing the lists or
> queueing - we could create a list of scsi_cmnd's that does not require a
> separate field within scsi_cmnd.

Zzzzzzz..... What's new p... ? [Tom Jones :-) ]

(I know why list_head is used -- check linux-scsi archives for when I asked
that next, prev be eliminated for list_head...)

> We can combine the scsi_cmnd list_head pending with the eh_entry, if there
> is agreement that a smaller scsi_cmnd is worth the list_head overloading.

But why should ``a smaller scsi_cmnd'' struct be a driving force in
a queuing design?  This is *the* recipe for disaster!

What you want to say is that: We could combine ... into a single
list_head entry if the queueing design would be finalized in linux-scsi
for scsi core.

The thing is that I don't really see it [queueing design] changing at all,
now that we're so close to 2.6/3.0.

BTW, there is no such thing as ``list_head overloading''.  An entry
can be a member of only one list at a time.
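That invariant is easy to demonstrate with a small userspace clone of list_head (a simplified re-implementation for illustration, not the kernel header): moving a command between state queues is always del-then-add on the same embedded entry, so it is never on two lists at once.

```c
#include <assert.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	INIT_LIST_HEAD(n);
}

static int list_empty(const struct list_head *h) { return h->next == h; }

struct cmd { int serial; struct list_head entry; };

static int demo(void)
{
	struct list_head pending, eh;
	struct cmd c = { .serial = 42 };

	INIT_LIST_HEAD(&pending);
	INIT_LIST_HEAD(&eh);
	INIT_LIST_HEAD(&c.entry);

	list_add_tail(&c.entry, &pending);	/* state: pending    */
	list_del_init(&c.entry);		/* off pending...    */
	list_add_tail(&c.entry, &eh);		/* ...now only on eh */

	return list_empty(&pending) && !list_empty(&eh);
}
```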

> This is not a queue of commands for the host adapter to process, it is a
> queue of commands the host adapter will be sent in the future, so we are
> either on the queue, or not on the queue.

Uuuh, Logic 101.

But if the host will be sent those commands in the future, then
it will process those commands... :-)

IOW, I know this, Patrick.

What I dislike is that the command is ``either there or it's not''.
And I envision that a command is always a member of a list, depending on
its state.  The advantage of this is that a thread can *catch up* with the
command to...

So you could have the request fn adding to incoming_q at the tail
*and* SCSI Core adding deferred commands at the front of the incoming_q
*at the same time* via a spin lock (irq), and then when it's time
you get fairness *implicitly* since the enqueuing code will read from the
front! I.e. deferred commands first.

So this distinction which you write about can be consolidated with
just one queue -- incoming_q -- and then you don't need a second
argument to the dispatch fn.
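A userspace sketch of this single-incoming_q idea (the list helpers and struct cmd below are minimal stand-ins, not the kernel's real list_head or scsi_cmnd, and the spin lock is omitted since the sketch is single-threaded): the request fn adds at the tail, deferred commands go back on the front, and the dispatcher always pulls from the front, so deferred commands implicitly run first.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly-linked list, mimicking the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_add(struct list_head *n, struct list_head *h) /* at front */
{
	n->next = h->next; n->prev = h;
	h->next->prev = n; h->next = n;
}

static void list_add_tail(struct list_head *n, struct list_head *h) /* at tail */
{
	n->next = h; n->prev = h->prev;
	h->prev->next = n; h->prev = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next; n->next->prev = n->prev;
}

static int list_empty(const struct list_head *h) { return h->next == h; }

struct cmd { int id; struct list_head entry; };

static struct list_head incoming_q = { &incoming_q, &incoming_q };

/* request fn: new commands go on the tail */
static void enqueue_new(struct cmd *c) { list_add_tail(&c->entry, &incoming_q); }

/* SCSI Core: deferred commands go back on the front */
static void enqueue_deferred(struct cmd *c) { list_add(&c->entry, &incoming_q); }

/* dispatcher: always pull from the front, so deferred commands run first */
static struct cmd *dequeue(void)
{
	struct list_head *n;

	if (list_empty(&incoming_q))
		return NULL;
	n = incoming_q.next;
	list_del(n);
	return (struct cmd *)((char *)n - offsetof(struct cmd, entry));
}
```

With one queue and one dequeue point, the fairness between new and deferred commands falls out of the insertion side alone, and no second argument to the dispatch fn is needed.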

> 
>>>-int scsi_dispatch_cmd(Scsi_Cmnd * SCpnt)
>>>+int scsi_dispatch_cmd(Scsi_Cmnd * SCpnt, int resend)
>>
>>
>>Oh, boy!
>>
>>More complication!  Why do you need to change the prototype
>>for the dispatch fn?  (Rhetorical.)
>>
>>Because the rest of the infrastructure of SCSI Core is not
>>up to date to your patch-idea?
> 
> 
Slightly, but even so it is nice to have some sort of priority for the
scsi_cmnd, and a resend can imply higher priority.

Yes, it is nice to have priority, but not in such a ... way -- via
an added function parameter to tell from which queue we're to read...

> I generally do not like having two lists that we must flush, but I don't
> have a simpler idea that works within the context of the current code.

A-ha!  Now you're talking!

Oh, well, incoming_q comes to the rescue. :-)
Too late for this though.

-- 
Luben



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached
  2003-03-03 22:40     ` Luben Tuikov
@ 2003-03-03 23:41       ` Patrick Mansfield
  2003-03-04  5:48         ` Luben Tuikov
  0 siblings, 1 reply; 18+ messages in thread
From: Patrick Mansfield @ 2003-03-03 23:41 UTC (permalink / raw)
  To: Luben Tuikov; +Cc: linux-scsi

Luben -

On Mon, Mar 03, 2003 at 05:40:59PM -0500, Luben Tuikov wrote:

> Ah, yes, sadly I see the [political] compromise -- I promise to
> straighten this out with the source right after I finish this email.
> In due time you're allowed free thinking. :-)

Well they are just names, the important issue is that we understand how
the code uses them.

> But isn't it a queue? Furthermore, isn't it a queue of _deferred_
> *by SCSI Core* commands, because the host command limit was hit?
> 
> Then wouldn't it *make sense* to call it ``deferred_q'' or
> ``deferred_cmd_q''?
> 
> Doesn't this just *make sense*?

It is a struct list_head, you can call it a queue but it is coded as a
list_head. Adding _q or _queue does not change anything, or make the code
easier to understand.

There is not much difference between "pending" and "deferred" (damn I
actually had to go look those up in a dictionary), though SAM-3 references
pending, I don't see any SAM references to deferred.

The name can always be modified via future patches.

-- Patrick Mansfield


* Re: [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached
  2003-03-03 18:17   ` Luben Tuikov
@ 2003-03-04  1:11     ` Andrew Morton
  2003-03-04  4:49       ` Luben Tuikov
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2003-03-04  1:11 UTC (permalink / raw)
  To: Luben Tuikov; +Cc: hch, patmans, linux-scsi

Luben Tuikov <luben@splentec.com> wrote:
>
> > Linux codingstyle says the && should be before the line break.
> 
> Actually there's a point to this.
> 

Sorry, but tough luck.

The advantages of everyone using the same coding style outweigh the arguable
marginal benefits of introducing local inconsistencies.

I was a braces-on-a-new-line guy for nearly 20 years, but I knuckled under
and learnt to use Linus-style for this reason.  I'm sure others can manage
this.



* Re: [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached
  2003-03-04  1:11     ` Andrew Morton
@ 2003-03-04  4:49       ` Luben Tuikov
  0 siblings, 0 replies; 18+ messages in thread
From: Luben Tuikov @ 2003-03-04  4:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: hch, patmans, linux-scsi

Andrew Morton wrote:
> Luben Tuikov <luben@splentec.com> wrote:
> 
>>>Linux codingstyle says the && should be before the line break.
>>
>>Actually there's a point to this.
>>
> 
> 
> Sorry, but tough luck.

What do you mean?
 
> The advantages of everyone using the same coding style outweigh the arguable
> marginal benefits of introducing local inconsistencies.

I wasn't ``introducing local inconsistencies'', I was merely stating
a few facts.  And if you had quoted my whole letter you'd see that
I did not express personal opinion on the subject -- we already
have coding police. :-)
 
> I was a braces-on-a-new-line guy for nearly 20 years, but I knuckled under
> and learnt to use Linus-style for this reason.  I'm sure others can manage
> this.

After Pascal, I learned C from K&R1 over a decade ago.

In general I don't really care *that* much about the style
as long as it is K&R1 and the tabs are 8 spaces.  I care more
about the logistics in the code.

-- 
Luben




* Re: [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached
  2003-03-03 23:41       ` Patrick Mansfield
@ 2003-03-04  5:48         ` Luben Tuikov
  0 siblings, 0 replies; 18+ messages in thread
From: Luben Tuikov @ 2003-03-04  5:48 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: linux-scsi

Patrick Mansfield wrote:
> Luben -
> 
> Well they are just names, the important issue is that we understand how
> the code uses them.
> 
> It is a struct list_head, you can call it a queue but it is coded as a
> list_head. Adding _q or _queue does not change anything, or make the code
> easier to understand.

Ah, Patrick, mea culpa.  I see your point of view.

The reason I'm so finicky on names/labels/designators is because
I subscribe to the theory that symbols themselves carry meaning,
rather than the opposing camp of thought, where the claim is that
symbols just label ideas.

The saying ``it's just semantics'' stems from the second
school of thought, which is in recess, after the first one
(cf. Platonism) made its point in the latter 20th century, mainly
through math and philosophy.
Also, the second one gets into infinite regression when/if trying to
explain ``meaning'' itself.

I've mentioned this a few times and given examples of
how hard it is doing arithmetic in Roman numerals, as opposed
to Arabic numerals, which *are* the numbers.

So for this reason I try to name things as closely as possible
so that their names (which is they, themselves) represent what they
are used for or stand for -- which is again they themselves.

If something doesn't have a name, like this object:


then it doesn't really exist. :-)

Else we might just give names like 
	struct list_head c3po;   /* queue of done commands */
which would hardly be appropriate.

BTW, long time ago I used to be one of those ppl that didn't really
care for what the names were -- then I got converted.
 
> There is not much difference between "pending" and "deferred" (damn I
> actually had to go look those up in a dictionary),

(I love reading the dictionary.)

Anyway, something pending is always deferred, but something
deferred doesn't have to be pending.

So in your case, your pending commands may never get executed
if the LUN had died in due time, because the host was buggy
when its queue limit got hit.  So put them in a deferred_q.

But all commands submitted to the LLDDs, via queuecommand()
are pending for their execution status via scsi_done().

> though SAM-3 references
> pending, I don't see any SAM references to deferred.

Forget about SAM-3.  The queuing model there is for a
different purpose, and we're talking about a [SCSI kernel]
subsystem here. What I'm trying to say is, do what makes sense.
 
> The name can always be modified via future patches.

Theoretically, yes.  Practically, I don't think anyone has
the time and/or desire -- well, maybe Christoph. ;-)

But really, whatever goes in now, will stay there for a
long, long time, so I say, just as we think throughout
the logic, might as well do the same for the names.

-- 
Luben
P.S. Meet the man behind the keyboard, last week, Le Massif, Quebec:
http://www.splentec.com/~luben/dsc00091-re.jpg





* Re: [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached
  2003-02-28 19:19 [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached Patrick Mansfield
  2003-03-02  8:57 ` Christoph Hellwig
  2003-03-02 20:57 ` Luben Tuikov
@ 2003-03-05  3:02 ` James Bottomley
  2003-03-05 18:43   ` Patrick Mansfield
  2 siblings, 1 reply; 18+ messages in thread
From: James Bottomley @ 2003-03-05  3:02 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: SCSI Mailing List

On Fri, 2003-02-28 at 20:19, Patrick Mansfield wrote:
> Hi -
> 
> Any comments or suggestions on the following patch?

Sorry for getting to this late, just returned from a nice long weekend
holiday.

> Patch is against scsi-misc-2.5.
> 
> I'm trying to split queue_lock into a per-scsi_device lock, and the
> starved code is in the way, plus it is not fair and looks like it has a
> couple of bugs. I started by trying to add a queue of request queues plus
> throttling to the current "starved" algorithm, but it was not too clean,
> so instead implemented this software enqueueing when the host can_queue
> limit is reached.

Could you elaborate on why a pending_queue (which duplicates some of the
block layer queueing functionality that we use) is a good idea.

Under the current scheme, we prep one command beyond the can_queue limit
and leave it in the block queue, so the returning commands can restart
with a fully prepped command but we still leave all the others in the
block queue for potential elevator merging. 
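Sketched in userspace C (CAN_QUEUE and the counters below are illustrative stand-ins, not the real scsi_host fields), the current scheme amounts to: send while under the limit, fully prep one extra command and hold it, and leave everything else in the block queue for the elevator.

```c
#include <assert.h>

#define CAN_QUEUE 2	/* illustrative host limit */

static int host_busy;	/* commands outstanding on the adapter */
static int prepped;	/* the one fully prepped command held back (0 or 1) */
static int blockq_len;	/* requests still in the block queue (mergeable) */

/* request fn: send while under the limit, prep one extra and hold it */
static void run_queue(void)
{
	while (blockq_len || prepped) {
		if (!prepped) {		/* prep the next block request */
			blockq_len--;
			prepped = 1;
		}
		if (host_busy >= CAN_QUEUE)
			break;		/* hold the prepped command, leave
					 * the rest for the elevator */
		prepped = 0;		/* queuecommand() stand-in */
		host_busy++;
	}
}

/* completion: a slot opened, restart with the already-prepped command */
static void complete_one(void)
{
	host_busy--;
	run_queue();
}
```

The point of the held-back command is that a returning completion can restart immediately without a prep step, while the unprepped requests remain merge candidates.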

> This can use more resources (scsi_cmnd's) if the sum of queue_depths on a
> host is greater than can_queue, but it is fairer (when not all devices are
> hitting queue_depth limits, or if we had throttling with a throttling
> limit of 1) and simpler compared to using a queue of queues (or code
> similiar to the current starvation code).

It would certainly need some kind of throttling.  In the current code,
not taking a command off the block queue when we can't accept it gives
the elevator longer to work on merging; taking too many commands off
when we can't actually send them on might lead to the average command
size dropping.

James




* Re: [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached
  2003-03-05  3:02 ` James Bottomley
@ 2003-03-05 18:43   ` Patrick Mansfield
  2003-03-06 15:57     ` James Bottomley
  0 siblings, 1 reply; 18+ messages in thread
From: Patrick Mansfield @ 2003-03-05 18:43 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI Mailing List

On Wed, Mar 05, 2003 at 04:02:38AM +0100, James Bottomley wrote:
> 
> Could you elaborate on why a pending_queue (which duplicates some of the
> block layer queueing functionality that we use) is a good idea.

> Under the current scheme, we prep one command beyond the can_queue limit
> and leave it in the block queue, so the returning commands can restart
> with a fully prepped command but we still leave all the others in the
> block queue for potential elevator merging. 

Note that if we go over can_queue, performance can suffer no matter what we
do in scsi core. If the bandwidth or a limit of the adapter is reached, no
changes in scsi core can fix that, all we can do is make sure each
scsi_device can do some IO. So, we are trying to figure out a good way to
make sure all devices can do IO when can_queue is hit.

(Not sure if you implied the following change) The host pending_cmd queue
could be replaced in the future with a (block) request queue for each
LLDD, without much change in function - we would still have to pull
requests off of the scsi_device queue before putting them into any LLDD
request queue, so we still would not be able to leave requests in the
scsi_device queue.  We could try to "sort" the LLDD queue so we have a mix
of scsi_devices represented, but that could lead to other issues.

Going to a block request queue now might be hard - it would likely need a
further separation of scsi_device and scsi_host within scsi core (in the
request and prep functions, and in the IO completion path).

With multiple starved devices with IO requests pending for all of them,
the algorithm we have now (assuming it worked right) can unfairly allow
each scsi_device to have as many commands outstanding as it did when we
hit the starved state.

The current algorithm could be fixed and throttling added.

Today, many of the host adapter drivers avoid the can_queue issue by
limiting the queue_depth (such as ips.c), or by having their own software
queue (qlogic's qla driver).

Pros of a host pending_cmd queue versus throttling across (effectively) a
queue of queues:

1) Simpler (and perhaps faster) code.

2) We don't need any throttling, and can always keep can_queue IO's in
flight. With per-host throttling, we sometimes must keep the number of
IO's in flight below can_queue in order to be fair (in cases where one
device is busier than another; there could be a way around this, but I
haven't figured it out).

3) The queue_depth setting can be used to limit IO across scsi_devices on
a single adapter - if you are hitting can_queue, setting a lower
queue_depth for some devices will effectively lower the bandwidth
available to them. (Given we can modify these on a per scsi_device basis;
I would really like a writable sysfs sdev->queue_depth).

4) Allows better separation of scsi_device code from scsi_host, for
example, it simplifies going to a per-device queue lock.

Cons:

1) Does not leave requests on the block queue where they can be
merged/sorted with incoming requests.

2) Might lead to grouping of IO per scsi_device rather than spreading IO
across all devices. Hopefully this would balance out over the long term or
not degrade performance at all (for sequential IO it might speed things
up, but for random IO it might slow things down); with random enough IO
patterns (across scsi_device queues, not within a given queue), we would
end up spreading IO across all queues.

Given that we are talking about what to do when a hardware limit is
reached (can_queue) I would rather go with the simpler approach.

-- Patrick Mansfield


* Re: [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached
  2003-03-05 18:43   ` Patrick Mansfield
@ 2003-03-06 15:57     ` James Bottomley
  2003-03-06 17:41       ` Patrick Mansfield
  0 siblings, 1 reply; 18+ messages in thread
From: James Bottomley @ 2003-03-06 15:57 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: SCSI Mailing List

On Wed, 2003-03-05 at 12:43, Patrick Mansfield wrote: 
> On Wed, Mar 05, 2003 at 04:02:38AM +0100, James Bottomley wrote:
> > 
> > Could you elaborate on why a pending_queue (which duplicates some of the
> > block layer queueing functionality that we use) is a good idea.
> 
> > Under the current scheme, we prep one command beyond the can_queue limit
> > and leave it in the block queue, so the returning commands can restart
> > with a fully prepped command but we still leave all the others in the
> > block queue for potential elevator merging. 
> 
> Note that if we go over can_queue performance can suffer no matter what we
> do in scsi core. If the bandwidth or a limit of the adapter is reached, no
> changes in scsi core can fix that, all we can do is make sure each
> scsi_device can do some IO. So, we are trying to figure out a good way to
> make sure all devices can do IO when can_queue is hit.

So what you're basically trying to do is to ensure restart fairness for
the host starvation case. 

Since the driver can't service any requests in this case, I do think the
correct thing to do is to leave the requests in the block queue in the
hope that the elevator has longer to merge them, so we ultimately get
fewer, larger requests. 

> (Not sure if you implied the following change) The host pending_cmd queue
> could be replaced in the future with a (block) request queue for each
> LLDD, without much change in function - we would still have to pull
> requests off of the scsi_device queue before putting them into any LLDD
> request queue, so we still would not be able to leave requests in the
> scsi_device queue.  We could try to "sort" the LLDD queue so we have a mix
> of scsi_devices represented, but that could lead to other issues.
> 
> Going to a block request queue now might be hard - it would likely need a
> further separation of scsi_device and scsi_host within scsi core (in the
> request and prep functions, and in the IO completion path).

But we already have a per device block request queue, it's the block queue
associated with the current device which we already use.

> With multiple starved devices with IO requests pending for all of them,
> the algorithm we have now (assuming it worked right) can unfairly allow
> each scsi_device to have as many commands outstanding as it did when we
> hit the starved state.
> 
> The current algorithm could be fixed and throttling added.

How about a different fix: instead of a queue of pending commands, keep a
queue (or really an ordered list) of devices to restart.  Add to the tail
of this list when we're over the host can_queue limit, but restart from
the head.  Leave in place the current behaviour of prepping one extra
command per device; that way we just do a __blk_run_queue() in order from
this list.  It replaces the some_device_starved and device_starved flags,
keeps the current elevator behaviour, and should ensure reasonable fairness.
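A userspace sketch of this restart-list scheme (the names, the fixed-size FIFO array, and the int device ids are illustrative, not the real kernel structures): a device that hits the host limit goes on the tail, and each completion restarts the device at the head.

```c
#include <assert.h>

#define CAN_QUEUE 2	/* illustrative host limit */

static int host_busy;		/* commands outstanding on the host */
static int starved[16];		/* FIFO of starved device ids */
static int nstarved;

/* request fn: try to send one command from dev; if the host is full,
 * remember the device in FIFO order (tail of the restart list) */
static int dispatch(int dev)
{
	if (host_busy >= CAN_QUEUE) {
		starved[nstarved++] = dev;
		return 0;	/* deferred: requests stay in its block queue */
	}
	host_busy++;		/* queuecommand() stand-in */
	return 1;
}

/* completion: restart the longest-starved device first (the real code
 * would call __blk_run_queue() on that device's block queue) */
static int complete_one(void)
{
	int dev, i;

	host_busy--;
	if (!nstarved)
		return -1;
	dev = starved[0];	/* restart from the head */
	for (i = 1; i < nstarved; i++)
		starved[i - 1] = starved[i];
	nstarved--;
	dispatch(dev);
	return dev;
}
```

Because freed slots are handed out in strict FIFO order of starvation, devices under equal pressure converge toward an equal share of the can_queue slots.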

I believe this should ensure that, under equal I/O pressure, two devices
eventually get half the available slots each when we run into the
can_queue limit, which looks desirable.

James




* Re: [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached
  2003-03-06 15:57     ` James Bottomley
@ 2003-03-06 17:41       ` Patrick Mansfield
  2003-03-06 18:04         ` James Bottomley
  0 siblings, 1 reply; 18+ messages in thread
From: Patrick Mansfield @ 2003-03-06 17:41 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI Mailing List

On Thu, Mar 06, 2003 at 09:57:55AM -0600, James Bottomley wrote:
> On Wed, 2003-03-05 at 12:43, Patrick Mansfield wrote: 
> > On Wed, Mar 05, 2003 at 04:02:38AM +0100, James Bottomley wrote:

> > Note that if we go over can_queue performance can suffer no matter what we
> > do in scsi core. If the bandwidth or a limit of the adapter is reached, no
> > changes in scsi core can fix that, all we can do is make sure each
> > scsi_device can do some IO. So, we are trying to figure out a good way to
> > make sure all devices can do IO when can_queue is hit.
> 
> So what you're basically trying to do is to ensure restart fairness for
> the host starvation case. 

Well, more specifically, that each device can do an equal amount of IO - so
not just restart fairness, but IOs/second fairness.

> Since the driver can't service any requests in this case, I do think the
> correct thing to do is to leave the requests in the block queue in the
> hope that the elevator has longer to merge them, so we ultimately get
> fewer, larger requests. 

> > Going to a block request queue now might be hard - it would likely need a
> > further separation of scsi_device and scsi_host within scsi core (in the
> > request and prep functions, and in the IO completion path).
> 
> But we already have a per device block request queue, it's the block queue
> associated with the current device which we already use.

Yes - but the host could be represented as another request queue.

> > With multiple starved devices with IO requests pending for all of them,
> > the algorithm we have now (assuming it worked right) can unfairly allow
> > each scsi_device to have as many commands outstanding as it did when we
> > hit the starved state.
> > 
> > The current algorithm could be fixed and throttling added.
> 
> How about a different fix: instead of a queue of pending commands, a queue (or
> really an ordered list) of devices to restart.  Add to the tail of this list
> when we're over the host can_queue limit, but restart from the head.  Leave
> in place the current prep one command extra per device, that way we just
> do a __blk_run_queue() in order from this list.  It replaces the
> some_device_starved and device_starved flags, will keep the current
> elevator behaviour and should ensure reasonable fairness.
> 
> I believe this should ensure for equal I/O pressure to two devices that
> they eventually get half the available slots each when we run into
> the can_queue limit, which looks desirable.
> 
> James

That is close to what the current algorithm is trying to do.

But ... my logic was off in assuming that this approach is unfair
(thinking that the device that sent IO before hitting any starved case
would *always* have more requests outstanding than any other devices that
come in later).

Let me see what I can code up along these lines.

Such a solution complicates moving to a per-device queue lock - we need a
per-host lock while pulling off the restart list, and we need a
per-queue-lock prior to calling __blk_run_queue(); but when starving in
the scsi_request_fn(), we have a per-queue-lock and need the per-host lock
to add to the restart list. So we have to allow dropping a lock in both
cases without leading to a hung request queue.
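One conventional way out of this, sketched in userspace C with pthread mutexes standing in for the kernel spinlocks (all names illustrative): pick a single lock order, queue lock then host lock, and make the completion path drop the host lock before crossing over to any queue lock, so the reverse order is never taken.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t host_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

static int restart_list_len;	/* devices waiting on the restart list */
static int queue_ran;		/* times the __blk_run_queue() stand-in ran */

/* request fn path: already under the per-device queue lock, nests the
 * host lock inside it to append to the restart list (order: queue -> host) */
static void request_fn_starved(void)
{
	pthread_mutex_lock(&queue_lock);
	pthread_mutex_lock(&host_lock);
	restart_list_len++;
	pthread_mutex_unlock(&host_lock);
	pthread_mutex_unlock(&queue_lock);
}

/* completion path: holds the host lock to pick a starved device, but
 * drops it before taking the queue lock, so it never acquires the
 * reverse order (host -> queue), which could deadlock against the
 * request fn path */
static void completion_restart(void)
{
	pthread_mutex_lock(&host_lock);
	restart_list_len--;
	pthread_mutex_unlock(&host_lock);	/* drop before crossing locks */

	pthread_mutex_lock(&queue_lock);
	queue_ran++;			/* __blk_run_queue() stand-in */
	pthread_mutex_unlock(&queue_lock);
}
```

The cost of the drop is the usual one: the restart list can change between releasing the host lock and running the queue, so the completion path has to tolerate going around again.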

Thanks.

-- Patrick Mansfield


* Re: [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached
  2003-03-06 17:41       ` Patrick Mansfield
@ 2003-03-06 18:04         ` James Bottomley
  0 siblings, 0 replies; 18+ messages in thread
From: James Bottomley @ 2003-03-06 18:04 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: SCSI Mailing List

On Thu, 2003-03-06 at 11:41, Patrick Mansfield wrote:
> Such a solution complicates moving to a per-device queue lock - we need a
> per-host lock while pulling off the restart list, and we need a
> per-queue-lock prior to calling __blk_run_queue(); but when starving in
> the scsi_request_fn(), we have a per-queue-lock and need the per-host lock
> to add to the restart list. So we have to allow dropping a lock in both
> cases without leading to a hung requeset queue.

Actually, I don't think it does. Any driver that can truly have a per
device lock can't have a fixed size pool of resources that device
requests are competing for (otherwise it would need a host lock to
access these resources), thus can_queue should always be set to a number
much greater than the aggregate of the local device queue depths (we'd
need a comment in the code to this effect).

Do we actually have any LLDs in the tree today that can benefit from a
device based locking approach?  (i.e. they must manage their resource
pools on a device basis and also have a mailbox scheme that doesn't
require host based locking).

James




end of thread, other threads:[~2003-03-06 18:04 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
2003-02-28 19:19 [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached Patrick Mansfield
2003-03-02  8:57 ` Christoph Hellwig
2003-03-02 18:15   ` Patrick Mansfield
2003-03-03 15:52   ` Randy.Dunlap
2003-03-03 18:17   ` Luben Tuikov
2003-03-04  1:11     ` Andrew Morton
2003-03-04  4:49       ` Luben Tuikov
2003-03-02 20:57 ` Luben Tuikov
2003-03-02 21:08   ` Luben Tuikov
2003-03-03 20:52   ` Patrick Mansfield
2003-03-03 22:40     ` Luben Tuikov
2003-03-03 23:41       ` Patrick Mansfield
2003-03-04  5:48         ` Luben Tuikov
2003-03-05  3:02 ` James Bottomley
2003-03-05 18:43   ` Patrick Mansfield
2003-03-06 15:57     ` James Bottomley
2003-03-06 17:41       ` Patrick Mansfield
2003-03-06 18:04         ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox