linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous
@ 2009-09-30  2:08 Chandra Seetharaman
  2009-09-30  2:08 ` [PATCH 1/4] scsi_dh: Change the scsidh_activate interface to be asynchronous Chandra Seetharaman
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Chandra Seetharaman @ 2009-09-30  2:08 UTC (permalink / raw)
  To: linux-scsi
  Cc: dm-devel, babu.moger, michaelc, Benoit_Arthur, Eddie.Williams,
	berthiaume_wayne, Chandra Seetharaman

Hello All,

Currently, device handlers process path activation in series. This leads
to a lot of time delay when more than 100 luns are involved. For example,
with lsi rdac 100+ luns take about 12-15 minutes. This was found by Moger
Babu of LSI.

This time delay can be avoided if we can do the activations asynchronously.
By making scsi_dh_activate() async, we can give the device handlers an
oppurtunity to decide on how to send the device activation down the wire
to make the turn around time faster. They can send the commands
asynchronously or send them in batches.

I brought up this issue on the mailing list
(http://marc.info/?l=linux-scsi&m=123888063818755&w=2) and provided a
set of patch a loooong while back http://marc.info/?l=linux-scsi&m=124088712709821&w=2

This set of patches applies cleanly on 2.6.31 and I tested the rdac handler
on the same.

Please review and provide comments.

This set of patched adds asynchronous support for rdac, hp, and alua handlers.

Didn't add async support for EMC for the following reasons:
 - EMC handler is getting more hairy (when I try to move to _nowait())
 - I do not know if EMC has a special command to group them (like rdac).
 - I wasn't sure if EMC has this problem in the first place ( I did not get
   any response from EMC).
Can somebody from EMC comment, please.

Thanks & Regards,

chandra


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/4] scsi_dh: Change the scsidh_activate interface to be asynchronous
  2009-09-30  2:08 [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous Chandra Seetharaman
@ 2009-09-30  2:08 ` Chandra Seetharaman
  2009-10-02 22:04   ` Moger, Babu
  2009-09-30  2:08 ` [PATCH 2/4] scsi_dh: rdac handler: Make rdac hardware handler async Chandra Seetharaman
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Chandra Seetharaman @ 2009-09-30  2:08 UTC (permalink / raw)
  To: linux-scsi
  Cc: dm-devel, babu.moger, michaelc, Benoit_Arthur, Eddie.Williams,
	berthiaume_wayne, Chandra Seetharaman

Make scsi_dh_activate() function asynchronous, by taking in two additional 
parameters, one is the callback function and the other is the data to call
the callback function with.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>

--------------
 drivers/md/dm-mpath.c                       |    8 ++++----
 drivers/scsi/device_handler/scsi_dh.c       |   17 ++++++++++++-----
 drivers/scsi/device_handler/scsi_dh_alua.c  |    6 ++++--
 drivers/scsi/device_handler/scsi_dh_emc.c   |    6 ++++--
 drivers/scsi/device_handler/scsi_dh_hp_sw.c |    6 ++++--
 drivers/scsi/device_handler/scsi_dh_rdac.c  |    6 ++++--
 include/scsi/scsi_device.h                  |    3 ++-
 include/scsi/scsi_dh.h                      |    7 +++++--
 8 files changed, 39 insertions(+), 20 deletions(-)

Index: linux-2.6.31/include/scsi/scsi_dh.h
===================================================================
--- linux-2.6.31.orig/include/scsi/scsi_dh.h
+++ linux-2.6.31/include/scsi/scsi_dh.h
@@ -55,14 +55,17 @@ enum {
 	SCSI_DH_NOSYS,
 	SCSI_DH_DRIVER_MAX,
 };
+
 #if defined(CONFIG_SCSI_DH) || defined(CONFIG_SCSI_DH_MODULE)
-extern int scsi_dh_activate(struct request_queue *);
+extern int scsi_dh_activate(struct request_queue *, activate_complete, void *);
 extern int scsi_dh_handler_exist(const char *);
 extern int scsi_dh_attach(struct request_queue *, const char *);
 extern void scsi_dh_detach(struct request_queue *);
 #else
-static inline int scsi_dh_activate(struct request_queue *req)
+static inline int scsi_dh_activate(struct request_queue *req,
+					activate_complete fn, void *data)
 {
+	fn(data, 0);
 	return 0;
 }
 static inline int scsi_dh_handler_exist(const char *name)
Index: linux-2.6.31/drivers/scsi/device_handler/scsi_dh.c
===================================================================
--- linux-2.6.31.orig/drivers/scsi/device_handler/scsi_dh.c
+++ linux-2.6.31/drivers/scsi/device_handler/scsi_dh.c
@@ -214,7 +214,7 @@ store_dh_state(struct device *dev, struc
 			 * Activate a device handler
 			 */
 			if (scsi_dh->activate)
-				err = scsi_dh->activate(sdev);
+				err = scsi_dh->activate(sdev, NULL, NULL);
 			else
 				err = 0;
 		}
@@ -411,10 +411,17 @@ EXPORT_SYMBOL_GPL(scsi_unregister_device
 /*
  * scsi_dh_activate - activate the path associated with the scsi_device
  *      corresponding to the given request queue.
- * @q - Request queue that is associated with the scsi_device to be
- *      activated.
+ *     Returns immediately without waiting for activation to be completed.
+ * @q    - Request queue that is associated with the scsi_device to be
+ *         activated.
+ * @fn   - Function to be called upon completion of the activation.
+ *         Function fn is called with data (below) and the error code.
+ *         Function fn may be called from the same calling context. So,
+ *         do not hold the lock in the caller which may be needed in fn.
+ * @data - data passed to the function fn upon completion.
+ *
  */
-int scsi_dh_activate(struct request_queue *q)
+int scsi_dh_activate(struct request_queue *q, activate_complete fn, void *data)
 {
 	int err = 0;
 	unsigned long flags;
@@ -433,7 +440,7 @@ int scsi_dh_activate(struct request_queu
 		return err;
 
 	if (scsi_dh->activate)
-		err = scsi_dh->activate(sdev);
+		err = scsi_dh->activate(sdev, fn, data);
 	put_device(&sdev->sdev_gendev);
 	return err;
 }
Index: linux-2.6.31/drivers/scsi/device_handler/scsi_dh_rdac.c
===================================================================
--- linux-2.6.31.orig/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ linux-2.6.31/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -516,7 +516,7 @@ done:
 	return err;
 }
 
-static int rdac_activate(struct scsi_device *sdev)
+static int rdac_activate(struct scsi_device *sdev, activate_complete fn, void *data)
 {
 	struct rdac_dh_data *h = get_rdac_data(sdev);
 	int err = SCSI_DH_OK;
@@ -539,7 +539,9 @@ static int rdac_activate(struct scsi_dev
 	if (h->lun_state == RDAC_LUN_UNOWNED)
 		err = send_mode_select(sdev, h);
 done:
-	return err;
+	if (fn)
+		fn(data, err);
+	return 0;
 }
 
 static int rdac_prep_fn(struct scsi_device *sdev, struct request *req)
Index: linux-2.6.31/drivers/scsi/device_handler/scsi_dh_alua.c
===================================================================
--- linux-2.6.31.orig/drivers/scsi/device_handler/scsi_dh_alua.c
+++ linux-2.6.31/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -652,7 +652,7 @@ out:
  * based on a certain policy. But until we actually encounter them it
  * should be okay.
  */
-static int alua_activate(struct scsi_device *sdev)
+static int alua_activate(struct scsi_device *sdev, activate_complete fn, void *data)
 {
 	struct alua_dh_data *h = get_alua_data(sdev);
 	int err = SCSI_DH_OK;
@@ -667,7 +667,9 @@ static int alua_activate(struct scsi_dev
 		err = alua_stpg(sdev, TPGS_STATE_OPTIMIZED, h);
 
 out:
-	return err;
+	if (fn)
+		fn(data, err);
+	return 0;
 }
 
 /*
Index: linux-2.6.31/drivers/scsi/device_handler/scsi_dh_emc.c
===================================================================
--- linux-2.6.31.orig/drivers/scsi/device_handler/scsi_dh_emc.c
+++ linux-2.6.31/drivers/scsi/device_handler/scsi_dh_emc.c
@@ -528,7 +528,7 @@ retry:
 	return err;
 }
 
-static int clariion_activate(struct scsi_device *sdev)
+static int clariion_activate(struct scsi_device *sdev, activate_complete fn, void *data)
 {
 	struct clariion_dh_data *csdev = get_clariion_data(sdev);
 	int result;
@@ -559,7 +559,9 @@ done:
 		    csdev->port, lun_state[csdev->lun_state],
 		    csdev->default_sp + 'A');
 
-	return result;
+	if (fn)
+		fn(data, result);
+	return 0;
 }
 
 static const struct scsi_dh_devlist clariion_dev_list[] = {
Index: linux-2.6.31/drivers/scsi/device_handler/scsi_dh_hp_sw.c
===================================================================
--- linux-2.6.31.orig/drivers/scsi/device_handler/scsi_dh_hp_sw.c
+++ linux-2.6.31/drivers/scsi/device_handler/scsi_dh_hp_sw.c
@@ -268,7 +268,7 @@ static int hp_sw_prep_fn(struct scsi_dev
  * activate the passive path (and deactivate the
  * previously active one).
  */
-static int hp_sw_activate(struct scsi_device *sdev)
+static int hp_sw_activate(struct scsi_device *sdev, activate_complete fn, void *data)
 {
 	int ret = SCSI_DH_OK;
 	struct hp_sw_dh_data *h = get_hp_sw_data(sdev);
@@ -283,7 +283,9 @@ static int hp_sw_activate(struct scsi_de
 				    HP_SW_NAME);
 	}
 
-	return ret;
+	if (fn)
+		fn(data, ret);
+	return 0;
 }
 
 static const struct scsi_dh_devlist hp_sw_dh_data_list[] = {
Index: linux-2.6.31/include/scsi/scsi_device.h
===================================================================
--- linux-2.6.31.orig/include/scsi/scsi_device.h
+++ linux-2.6.31/include/scsi/scsi_device.h
@@ -174,6 +174,7 @@ struct scsi_dh_devlist {
 	char *model;
 };
 
+typedef void (*activate_complete)(void *, int);
 struct scsi_device_handler {
 	/* Used by the infrastructure */
 	struct list_head list; /* list of scsi_device_handlers */
@@ -185,7 +186,7 @@ struct scsi_device_handler {
 	int (*check_sense)(struct scsi_device *, struct scsi_sense_hdr *);
 	int (*attach)(struct scsi_device *);
 	void (*detach)(struct scsi_device *);
-	int (*activate)(struct scsi_device *);
+	int (*activate)(struct scsi_device *, activate_complete, void *);
 	int (*prep_fn)(struct scsi_device *, struct request *);
 };
 
Index: linux-2.6.31/drivers/md/dm-mpath.c
===================================================================
--- linux-2.6.31.orig/drivers/md/dm-mpath.c
+++ linux-2.6.31/drivers/md/dm-mpath.c
@@ -1086,8 +1086,9 @@ static int pg_init_limit_reached(struct 
 	return limit_reached;
 }
 
-static void pg_init_done(struct dm_path *path, int errors)
+static void pg_init_done(void *data, int errors)
 {
+	struct dm_path *path = data;
 	struct pgpath *pgpath = path_to_pgpath(path);
 	struct priority_group *pg = pgpath->pg;
 	struct multipath *m = pg->m;
@@ -1153,12 +1154,11 @@ static void pg_init_done(struct dm_path 
 
 static void activate_path(struct work_struct *work)
 {
-	int ret;
 	struct pgpath *pgpath =
 		container_of(work, struct pgpath, activate_path);
 
-	ret = scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev));
-	pg_init_done(&pgpath->path, ret);
+	scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
+				pg_init_done, &pgpath->path);
 }
 
 /*

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 2/4] scsi_dh: rdac handler: Make rdac hardware handler async
  2009-09-30  2:08 [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous Chandra Seetharaman
  2009-09-30  2:08 ` [PATCH 1/4] scsi_dh: Change the scsidh_activate interface to be asynchronous Chandra Seetharaman
@ 2009-09-30  2:08 ` Chandra Seetharaman
  2009-10-02  0:03   ` Moger, Babu
  2009-09-30  2:08 ` [PATCH 3/4] scsi_dh: rdac handler: Make hp " Chandra Seetharaman
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Chandra Seetharaman @ 2009-09-30  2:08 UTC (permalink / raw)
  To: linux-scsi
  Cc: dm-devel, babu.moger, michaelc, Benoit_Arthur, Eddie.Williams,
	berthiaume_wayne, Chandra Seetharaman

Batch up MODE_SELECT in rdac device handler.

LSI RDAC storage has the capability of handling mode selects for
multiple luns in a same command. Make use of that ability to send
as few MODE SELECTs as possible to the storage controller as possible.

This patch creates a work queue and queues up activate requests
when a MODE SELECT is sent down the wire. When that MODE SELECT 
completes, it compiles queued up activate requests for multiple
luns into a single MODE SELECT.

This reduces the time to do failover/failback of large number of LUNS.

Signed-off-by: Babu Moger <babu.moger@lsi.com>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>

---
 drivers/scsi/device_handler/scsi_dh_rdac.c |  117 ++++++++++++++++++++++++++---
 1 file changed, 104 insertions(+), 13 deletions(-)

Index: linux-2.6.31/drivers/scsi/device_handler/scsi_dh_rdac.c
===================================================================
--- linux-2.6.31.orig/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ linux-2.6.31/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -22,6 +22,7 @@
 #include <scsi/scsi.h>
 #include <scsi/scsi_eh.h>
 #include <scsi/scsi_dh.h>
+#include <linux/workqueue.h>
 
 #define RDAC_NAME "rdac"
 #define RDAC_RETRY_COUNT 5
@@ -135,6 +136,11 @@ struct rdac_controller {
 		struct rdac_pg_legacy legacy;
 		struct rdac_pg_expanded expanded;
 	} mode_select;
+	spinlock_t		ms_lock;
+	int			ms_queued;
+	struct work_struct	ms_work;
+	struct scsi_device	*ms_sdev;
+	struct list_head	ms_head;
 };
 struct c8_inquiry {
 	u8	peripheral_info;
@@ -195,8 +201,17 @@ static const char *lun_state[] =
 	"owned (AVT mode)",
 };
 
+struct rdac_queue_data {
+	struct list_head	entry;
+	struct rdac_dh_data	*h;
+	activate_complete	callback_fn;
+	void			*callback_data;
+};
+
 static LIST_HEAD(ctlr_list);
 static DEFINE_SPINLOCK(list_lock);
+static struct workqueue_struct *kmpath_rdacd;
+static void send_mode_select(struct work_struct *work);
 
 static inline struct rdac_dh_data *get_rdac_data(struct scsi_device *sdev)
 {
@@ -253,7 +268,6 @@ static struct request *rdac_failover_get
 		rdac_pg->subpage_code = 0x1;
 		rdac_pg->page_len[0] = 0x01;
 		rdac_pg->page_len[1] = 0x28;
-		rdac_pg->lun_table[h->lun] = 0x81;
 	} else {
 		struct rdac_pg_legacy *rdac_pg;
 
@@ -263,7 +277,6 @@ static struct request *rdac_failover_get
 		common = &rdac_pg->common;
 		rdac_pg->page_code = RDAC_PAGE_CODE_REDUNDANT_CONTROLLER;
 		rdac_pg->page_len = 0x68;
-		rdac_pg->lun_table[h->lun] = 0x81;
 	}
 	common->rdac_mode[1] = RDAC_MODE_TRANSFER_SPECIFIED_LUNS;
 	common->quiescence_timeout = RDAC_QUIESCENCE_TIME;
@@ -297,6 +310,7 @@ static void release_controller(struct kr
 	struct rdac_controller *ctlr;
 	ctlr = container_of(kref, struct rdac_controller, kref);
 
+	flush_workqueue(kmpath_rdacd);
 	spin_lock(&list_lock);
 	list_del(&ctlr->node);
 	spin_unlock(&list_lock);
@@ -326,6 +340,11 @@ static struct rdac_controller *get_contr
 	memcpy(ctlr->slot_id, slot_id, SLOT_ID_LEN);
 	kref_init(&ctlr->kref);
 	ctlr->use_ms10 = -1;
+	ctlr->ms_queued = 0;
+	ctlr->ms_sdev = NULL;
+	spin_lock_init(&ctlr->ms_lock);
+	INIT_WORK(&ctlr->ms_work, send_mode_select);
+	INIT_LIST_HEAD(&ctlr->ms_head);
 	list_add(&ctlr->node, &ctlr_list);
 done:
 	spin_unlock(&list_lock);
@@ -445,8 +464,7 @@ static int set_mode_select(struct scsi_d
 	return err;
 }
 
-static int mode_select_handle_sense(struct scsi_device *sdev,
-				    unsigned char *sensebuf)
+static int mode_select_handle_sense(unsigned char *sensebuf)
 {
 	struct scsi_sense_hdr sense_hdr;
 	int err = SCSI_DH_IO, ret;
@@ -478,7 +496,7 @@ static int mode_select_handle_sense(stru
 			err = SCSI_DH_RETRY;
 		break;
 	default:
-		sdev_printk(KERN_INFO, sdev,
+		printk(KERN_INFO
 			    "MODE_SELECT failed with sense %02x/%02x/%02x.\n",
 			    sense_hdr.sense_key, sense_hdr.asc, sense_hdr.ascq);
 	}
@@ -487,11 +505,29 @@ done:
 	return err;
 }
 
-static int send_mode_select(struct scsi_device *sdev, struct rdac_dh_data *h)
+static void send_mode_select(struct work_struct *work)
 {
+	struct rdac_controller *ctlr =
+		container_of(work, struct rdac_controller, ms_work);
 	struct request *rq;
+	struct scsi_device *sdev = ctlr->ms_sdev;
+	struct rdac_dh_data *h = get_rdac_data(sdev);
 	struct request_queue *q = sdev->request_queue;
 	int err, retry_cnt = RDAC_RETRY_COUNT;
+	struct rdac_queue_data *tmp, *qdata;
+	LIST_HEAD(list);
+	u8 *lun_table;
+
+	spin_lock(&ctlr->ms_lock);
+	list_splice_init(&ctlr->ms_head, &list);
+	ctlr->ms_queued = 0;
+	ctlr->ms_sdev = NULL;
+	spin_unlock(&ctlr->ms_lock);
+
+	if (ctlr->use_ms10)
+		lun_table = ctlr->mode_select.expanded.lun_table;
+	else
+		lun_table = ctlr->mode_select.legacy.lun_table;
 
 retry:
 	err = SCSI_DH_RES_TEMP_UNAVAIL;
@@ -499,13 +535,17 @@ retry:
 	if (!rq)
 		goto done;
 
-	sdev_printk(KERN_INFO, sdev, "%s MODE_SELECT command.\n",
+ 	list_for_each_entry(qdata, &list, entry) {
+		lun_table[qdata->h->lun] = 0x81;
+	}
+
+	printk(KERN_INFO "%s MODE_SELECT command.\n",
 		(retry_cnt == RDAC_RETRY_COUNT) ? "queueing" : "retrying");
 
 	err = blk_execute_rq(q, NULL, rq, 1);
 	blk_put_request(rq);
 	if (err != SCSI_DH_OK) {
-		err = mode_select_handle_sense(sdev, h->sense);
+		err = mode_select_handle_sense(h->sense);
 		if (err == SCSI_DH_RETRY && retry_cnt--)
 			goto retry;
 	}
@@ -513,10 +553,45 @@ retry:
 		h->state = RDAC_STATE_ACTIVE;
 
 done:
-	return err;
+	list_for_each_entry_safe(qdata, tmp, &list, entry) {
+		list_del(&qdata->entry);
+		if (err == SCSI_DH_OK)
+			qdata->h->state = RDAC_STATE_ACTIVE;
+		if (qdata->callback_fn)
+			qdata->callback_fn(qdata->callback_data, err);
+		kfree(qdata);
+	}
+	return;
 }
 
-static int rdac_activate(struct scsi_device *sdev, activate_complete fn, void *data)
+static int queue_mode_select(struct scsi_device *sdev,
+				activate_complete fn, void *data)
+{
+	struct rdac_queue_data *qdata;
+	struct rdac_controller *ctlr;
+
+	qdata = kzalloc(sizeof(*qdata), GFP_KERNEL);
+	if (!qdata)
+		return SCSI_DH_RETRY;
+
+ 	qdata->h = get_rdac_data(sdev);
+	qdata->callback_fn = fn;
+	qdata->callback_data = data;
+
+	ctlr = qdata->h->ctlr;
+	spin_lock(&ctlr->ms_lock);
+	list_add_tail(&qdata->entry, &ctlr->ms_head);
+	if (!ctlr->ms_queued) {
+		ctlr->ms_queued = 1;
+		ctlr->ms_sdev = sdev;
+		queue_work(kmpath_rdacd, &ctlr->ms_work);
+	}
+	spin_unlock(&ctlr->ms_lock);
+	return SCSI_DH_OK;
+}
+
+static int rdac_activate(struct scsi_device *sdev,
+				activate_complete fn, void *data)
 {
 	struct rdac_dh_data *h = get_rdac_data(sdev);
 	int err = SCSI_DH_OK;
@@ -536,8 +611,11 @@ static int rdac_activate(struct scsi_dev
 		if (err != SCSI_DH_OK)
 			goto done;
 	}
-	if (h->lun_state == RDAC_LUN_UNOWNED)
-		err = send_mode_select(sdev, h);
+	if (h->lun_state == RDAC_LUN_UNOWNED) {
+		err = queue_mode_select(sdev, fn, data);
+		if (err == SCSI_DH_OK)
+			return 0;
+	}
 done:
 	if (fn)
 		fn(data, err);
@@ -726,13 +804,26 @@ static int __init rdac_init(void)
 	int r;
 
 	r = scsi_register_device_handler(&rdac_dh);
-	if (r != 0)
+	if (r != 0) {
 		printk(KERN_ERR "Failed to register scsi device handler.");
+		goto done;
+	}
+
+	/*
+	 * Create workqueue to handle mode selects for rdac
+	 */
+	kmpath_rdacd = create_singlethread_workqueue("kmpath_rdacd");
+	if (!kmpath_rdacd) {
+		scsi_unregister_device_handler(&rdac_dh);
+		printk(KERN_ERR "kmpath_rdacd creation failed.\n");
+	}
+done:
 	return r;
 }
 
 static void __exit rdac_exit(void)
 {
+	destroy_workqueue(kmpath_rdacd);
 	scsi_unregister_device_handler(&rdac_dh);
 }
 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 3/4] scsi_dh: rdac handler: Make hp hardware handler async
  2009-09-30  2:08 [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous Chandra Seetharaman
  2009-09-30  2:08 ` [PATCH 1/4] scsi_dh: Change the scsidh_activate interface to be asynchronous Chandra Seetharaman
  2009-09-30  2:08 ` [PATCH 2/4] scsi_dh: rdac handler: Make rdac hardware handler async Chandra Seetharaman
@ 2009-09-30  2:08 ` Chandra Seetharaman
  2009-09-30  2:08 ` [PATCH 4/4] scsi_dh: rdac handler: Make alua " Chandra Seetharaman
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Chandra Seetharaman @ 2009-09-30  2:08 UTC (permalink / raw)
  To: linux-scsi
  Cc: dm-devel, babu.moger, michaelc, Benoit_Arthur, Eddie.Williams,
	berthiaume_wayne, Chandra Seetharaman

Make the activate function asynchronous by using blk_execute_rq_nowait()

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
---
 drivers/scsi/device_handler/scsi_dh_hp_sw.c |   87 +++++++++++++++++-----------
 1 file changed, 54 insertions(+), 33 deletions(-)

Index: linux-2.6.31/drivers/scsi/device_handler/scsi_dh_hp_sw.c
===================================================================
--- linux-2.6.31.orig/drivers/scsi/device_handler/scsi_dh_hp_sw.c
+++ linux-2.6.31/drivers/scsi/device_handler/scsi_dh_hp_sw.c
@@ -39,8 +39,14 @@ struct hp_sw_dh_data {
 	unsigned char sense[SCSI_SENSE_BUFFERSIZE];
 	int path_state;
 	int retries;
+	int retry_cnt;
+	struct scsi_device *sdev;
+	activate_complete	callback_fn;
+	void			*callback_data;
 };
 
+static int hp_sw_start_stop(struct hp_sw_dh_data *);
+
 static inline struct hp_sw_dh_data *get_hp_sw_data(struct scsi_device *sdev)
 {
 	struct scsi_dh_data *scsi_dh_data = sdev->scsi_dh_data;
@@ -191,19 +197,53 @@ static int start_done(struct scsi_device
 	return rc;
 }
 
+static void start_stop_endio(struct request *req, int error)
+{
+	struct hp_sw_dh_data *h = req->end_io_data;
+	unsigned err = SCSI_DH_OK;
+
+	if (error || host_byte(req->errors) != DID_OK ||
+			msg_byte(req->errors) != COMMAND_COMPLETE) {
+		sdev_printk(KERN_WARNING, h->sdev,
+			    "%s: sending start_stop_unit failed with %x\n",
+			    HP_SW_NAME, req->errors);
+		err = SCSI_DH_IO;
+		goto done;
+	}
+
+	if (req->sense_len > 0) {
+		err = start_done(h->sdev, h->sense);
+		if (err == SCSI_DH_RETRY) {
+			err = SCSI_DH_IO;
+			if (--h->retry_cnt) {
+				blk_put_request(req);
+				err = hp_sw_start_stop(h);
+				if (err == SCSI_DH_OK)
+					return;
+			}
+		}
+	}
+done:
+	blk_put_request(req);
+	if (h->callback_fn) {
+		h->callback_fn(h->callback_data, err);
+		h->callback_fn = h->callback_data = NULL;
+	}
+	return;
+
+}
+
 /*
  * hp_sw_start_stop - Send START STOP UNIT command
  * @sdev: sdev command should be sent to
  *
  * Sending START STOP UNIT activates the SP.
  */
-static int hp_sw_start_stop(struct scsi_device *sdev, struct hp_sw_dh_data *h)
+static int hp_sw_start_stop(struct hp_sw_dh_data *h)
 {
 	struct request *req;
-	int ret, retry;
 
-retry:
-	req = blk_get_request(sdev->request_queue, WRITE, GFP_NOIO);
+	req = blk_get_request(h->sdev->request_queue, WRITE, GFP_ATOMIC);
 	if (!req)
 		return SCSI_DH_RES_TEMP_UNAVAIL;
 
@@ -217,32 +257,10 @@ retry:
 	req->sense = h->sense;
 	memset(req->sense, 0, SCSI_SENSE_BUFFERSIZE);
 	req->sense_len = 0;
-	retry = h->retries;
+	req->end_io_data = h;
 
-	ret = blk_execute_rq(req->q, NULL, req, 1);
-	if (ret == -EIO) {
-		if (req->sense_len > 0) {
-			ret = start_done(sdev, h->sense);
-		} else {
-			sdev_printk(KERN_WARNING, sdev,
-				    "%s: sending start_stop_unit failed with %x\n",
-				    HP_SW_NAME, req->errors);
-			ret = SCSI_DH_IO;
-		}
-	} else
-		ret = SCSI_DH_OK;
-
-	if (ret == SCSI_DH_RETRY) {
-		if (--retry) {
-			blk_put_request(req);
-			goto retry;
-		}
-		ret = SCSI_DH_IO;
-	}
-
-	blk_put_request(req);
-
-	return ret;
+	blk_execute_rq_nowait(req->q, NULL, req, 1, start_stop_endio);
+	return SCSI_DH_OK;
 }
 
 static int hp_sw_prep_fn(struct scsi_device *sdev, struct request *req)
@@ -276,11 +294,13 @@ static int hp_sw_activate(struct scsi_de
 	ret = hp_sw_tur(sdev, h);
 
 	if (ret == SCSI_DH_OK && h->path_state == HP_SW_PATH_PASSIVE) {
-		ret = hp_sw_start_stop(sdev, h);
+		h->retry_cnt = h->retries;
+		h->callback_fn = fn;
+		h->callback_data = data;
+		ret = hp_sw_start_stop(h);
 		if (ret == SCSI_DH_OK)
-			sdev_printk(KERN_INFO, sdev,
-				    "%s: activated path\n",
-				    HP_SW_NAME);
+			return 0;
+		h->callback_fn = h->callback_data = NULL;
 	}
 
 	if (fn)
@@ -328,6 +348,7 @@ static int hp_sw_bus_attach(struct scsi_
 	h = (struct hp_sw_dh_data *) scsi_dh_data->buf;
 	h->path_state = HP_SW_PATH_UNINITIALIZED;
 	h->retries = HP_SW_RETRIES;
+	h->sdev = sdev;
 
 	ret = hp_sw_tur(sdev, h);
 	if (ret != SCSI_DH_OK || h->path_state == HP_SW_PATH_UNINITIALIZED)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 4/4] scsi_dh: rdac handler: Make alua hardware handler async
  2009-09-30  2:08 [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous Chandra Seetharaman
                   ` (2 preceding siblings ...)
  2009-09-30  2:08 ` [PATCH 3/4] scsi_dh: rdac handler: Make hp " Chandra Seetharaman
@ 2009-09-30  2:08 ` Chandra Seetharaman
  2009-10-01  4:19 ` [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous Moger, Babu
  2009-10-05 13:01 ` Hannes Reinecke
  5 siblings, 0 replies; 22+ messages in thread
From: Chandra Seetharaman @ 2009-09-30  2:08 UTC (permalink / raw)
  To: linux-scsi
  Cc: dm-devel, babu.moger, michaelc, Benoit_Arthur, Eddie.Williams,
	berthiaume_wayne, Chandra Seetharaman

Make the activate function asynchronous by using blk_execute_rq_nowait()

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
---
 drivers/scsi/device_handler/scsi_dh_alua.c |  132 ++++++++++++++++-------------
 1 file changed, 73 insertions(+), 59 deletions(-)

Index: linux-2.6.31/drivers/scsi/device_handler/scsi_dh_alua.c
===================================================================
--- linux-2.6.31.orig/drivers/scsi/device_handler/scsi_dh_alua.c
+++ linux-2.6.31/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -60,11 +60,17 @@ struct alua_dh_data {
 	int			bufflen;
 	unsigned char		sense[SCSI_SENSE_BUFFERSIZE];
 	int			senselen;
+	struct scsi_device	*sdev;
+	activate_complete	callback_fn;
+	void			*callback_data;
 };
 
 #define ALUA_POLICY_SWITCH_CURRENT	0
 #define ALUA_POLICY_SWITCH_ALL		1
 
+static char print_alua_state(int);
+static int alua_check_sense(struct scsi_device *, struct scsi_sense_hdr *);
+
 static inline struct alua_dh_data *get_alua_data(struct scsi_device *sdev)
 {
 	struct scsi_dh_data *scsi_dh_data = sdev->scsi_dh_data;
@@ -231,18 +237,71 @@ done:
 }
 
 /*
+ * alua_stpg - Evaluate SET TARGET GROUP STATES
+ * @sdev: the device to be evaluated
+ * @state: the new target group state
+ *
+ * Send a SET TARGET GROUP STATES command to the device.
+ * We only have to test here if we should resubmit the command;
+ * any other error is assumed as a failure.
+ */
+static void stpg_endio(struct request *req, int error)
+{
+	struct alua_dh_data *h = req->end_io_data;
+	struct scsi_sense_hdr sense_hdr;
+	unsigned err = SCSI_DH_IO;
+
+	if (error || host_byte(req->errors) != DID_OK ||
+			msg_byte(req->errors) != COMMAND_COMPLETE)
+		goto done;
+
+	if (err == SCSI_DH_IO && h->senselen > 0) {
+		err = scsi_normalize_sense(h->sense, SCSI_SENSE_BUFFERSIZE,
+					   &sense_hdr);
+		if (!err) {
+			err = SCSI_DH_IO;
+			goto done;
+		}
+		err = alua_check_sense(h->sdev, &sense_hdr);
+		if (err == ADD_TO_MLQUEUE) {
+			err = SCSI_DH_RETRY;
+			goto done;
+		}
+		sdev_printk(KERN_INFO, h->sdev,
+			    "%s: stpg sense code: %02x/%02x/%02x\n",
+			    ALUA_DH_NAME, sense_hdr.sense_key,
+			    sense_hdr.asc, sense_hdr.ascq);
+		err = SCSI_DH_IO;
+	}
+	if (err == SCSI_DH_OK) {
+		h->state = TPGS_STATE_OPTIMIZED;
+		sdev_printk(KERN_INFO, h->sdev,
+			    "%s: port group %02x switched to state %c\n",
+			    ALUA_DH_NAME, h->group_id,
+			    print_alua_state(h->state) );
+	}
+done:
+	blk_put_request(req);
+	if (h->callback_fn) {
+		h->callback_fn(h->callback_data, err);
+		h->callback_fn = h->callback_data = NULL;
+	}
+	return;
+}
+
+/*
  * submit_stpg - Issue a SET TARGET GROUP STATES command
- * @sdev: sdev the command should be sent to
  *
  * Currently we're only setting the current target port group state
  * to 'active/optimized' and let the array firmware figure out
  * the states of the remaining groups.
  */
-static unsigned submit_stpg(struct scsi_device *sdev, struct alua_dh_data *h)
+static unsigned submit_stpg(struct alua_dh_data *h)
 {
 	struct request *rq;
 	int err = SCSI_DH_RES_TEMP_UNAVAIL;
 	int stpg_len = 8;
+	struct scsi_device *sdev = h->sdev;
 
 	/* Prepare the data buffer */
 	memset(h->buff, 0, stpg_len);
@@ -252,7 +311,7 @@ static unsigned submit_stpg(struct scsi_
 
 	rq = get_alua_req(sdev, h->buff, stpg_len, WRITE);
 	if (!rq)
-		goto done;
+		return SCSI_DH_RES_TEMP_UNAVAIL;
 
 	/* Prepare the command. */
 	rq->cmd[0] = MAINTENANCE_OUT;
@@ -266,17 +325,9 @@ static unsigned submit_stpg(struct scsi_
 	rq->sense = h->sense;
 	memset(rq->sense, 0, SCSI_SENSE_BUFFERSIZE);
 	rq->sense_len = h->senselen = 0;
+	rq->end_io_data = h;
 
-	err = blk_execute_rq(rq->q, NULL, rq, 1);
-	if (err == -EIO) {
-		sdev_printk(KERN_INFO, sdev,
-			    "%s: stpg failed with %x\n",
-			    ALUA_DH_NAME, rq->errors);
-		h->senselen = rq->sense_len;
-		err = SCSI_DH_IO;
-	}
-	blk_put_request(rq);
-done:
+	blk_execute_rq_nowait(rq->q, NULL, rq, 1, stpg_endio);
 	return err;
 }
 
@@ -477,50 +528,6 @@ static int alua_check_sense(struct scsi_
 }
 
 /*
- * alua_stpg - Evaluate SET TARGET GROUP STATES
- * @sdev: the device to be evaluated
- * @state: the new target group state
- *
- * Send a SET TARGET GROUP STATES command to the device.
- * We only have to test here if we should resubmit the command;
- * any other error is assumed as a failure.
- */
-static int alua_stpg(struct scsi_device *sdev, int state,
-		     struct alua_dh_data *h)
-{
-	struct scsi_sense_hdr sense_hdr;
-	unsigned err;
-	int retry = ALUA_FAILOVER_RETRIES;
-
- retry:
-	err = submit_stpg(sdev, h);
-	if (err == SCSI_DH_IO && h->senselen > 0) {
-		err = scsi_normalize_sense(h->sense, SCSI_SENSE_BUFFERSIZE,
-					   &sense_hdr);
-		if (!err)
-			return SCSI_DH_IO;
-		err = alua_check_sense(sdev, &sense_hdr);
-		if (retry > 0 && err == ADD_TO_MLQUEUE) {
-			retry--;
-			goto retry;
-		}
-		sdev_printk(KERN_INFO, sdev,
-			    "%s: stpg sense code: %02x/%02x/%02x\n",
-			    ALUA_DH_NAME, sense_hdr.sense_key,
-			    sense_hdr.asc, sense_hdr.ascq);
-		err = SCSI_DH_IO;
-	}
-	if (err == SCSI_DH_OK) {
-		h->state = state;
-		sdev_printk(KERN_INFO, sdev,
-			    "%s: port group %02x switched to state %c\n",
-			    ALUA_DH_NAME, h->group_id,
-			    print_alua_state(h->state) );
-	}
-	return err;
-}
-
-/*
  * alua_rtpg - Evaluate REPORT TARGET GROUP STATES
  * @sdev: the device to be evaluated.
  *
@@ -663,8 +670,14 @@ static int alua_activate(struct scsi_dev
 			goto out;
 	}
 
-	if (h->tpgs == TPGS_MODE_EXPLICIT && h->state != TPGS_STATE_OPTIMIZED)
-		err = alua_stpg(sdev, TPGS_STATE_OPTIMIZED, h);
+	if (h->tpgs == TPGS_MODE_EXPLICIT && h->state != TPGS_STATE_OPTIMIZED) {
+		h->callback_fn = fn;
+		h->callback_data = data;
+		err = submit_stpg(h);
+		if (err == SCSI_DH_OK)
+			return 0;
+		h->callback_fn = h->callback_data = NULL;
+	}
 
 out:
 	if (fn)
@@ -747,6 +760,7 @@ static int alua_bus_attach(struct scsi_d
 	h->rel_port = -1;
 	h->buff = h->inq;
 	h->bufflen = ALUA_INQUIRY_SIZE;
+	h->sdev = sdev;
 
 	err = alua_initialize(sdev, h);
 	if (err != SCSI_DH_OK)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous
  2009-09-30  2:08 [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous Chandra Seetharaman
                   ` (3 preceding siblings ...)
  2009-09-30  2:08 ` [PATCH 4/4] scsi_dh: rdac handler: Make alua " Chandra Seetharaman
@ 2009-10-01  4:19 ` Moger, Babu
  2009-10-01 20:54   ` Chandra Seetharaman
  2009-10-05 13:01 ` Hannes Reinecke
  5 siblings, 1 reply; 22+ messages in thread
From: Moger, Babu @ 2009-10-01  4:19 UTC (permalink / raw)
  To: Chandra Seetharaman, linux-scsi@vger.kernel.org
  Cc: dm-devel@redhat.com, michaelc@cs.wisc.edu,
	Eddie.Williams@steeleye.com, Benoit_Arthur@emc.com

Hi Chandra,
   Thanks for the patches. Patches applied cleanly on 2.6.31. I am testing the patches right now. Will post the with test results tomorrow.
These are my initial comments. 
1. Patches do not apply on 2.6.32. You may have to regenerate it on 2.6.32.
2. Also, I noticed few errors and warnings while running checkpatch.pl script.  
3. Subject line for patch 3 and 4 may need to be modified. It still says "rdac handler". It should be hp and alua handler. 

Thanks..

> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
> owner@vger.kernel.org] On Behalf Of Chandra Seetharaman
> Sent: Tuesday, September 29, 2009 9:08 PM
> To: linux-scsi@vger.kernel.org
> Cc: dm-devel@redhat.com; Moger, Babu; michaelc@cs.wisc.edu;
> Benoit_Arthur@emc.com; Eddie.Williams@steeleye.com;
> berthiaume_wayne@emc.com; Chandra Seetharaman
> Subject: [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous
> 
> Hello All,
> 
> Currently, device handlers process path activation in series. This leads
> to a lot of time delay when more than 100 luns are involved. For example,
> with lsi rdac 100+ luns take about 12-15 minutes. This was found by Moger
> Babu of LSI.
> 
> This time delay can be avoided if we can do the activations
> asynchronously.
> By making scsi_dh_activate() async, we can give the device handlers an
> oppurtunity to decide on how to send the device activation down the wire
> to make the turn around time faster. They can send the commands
> asynchronously or send them in batches.
> 
> I brought up this issue on the mailing list
> (http://marc.info/?l=linux-scsi&m=123888063818755&w=2) and provided a
> set of patch a loooong while back http://marc.info/?l=linux-
> scsi&m=124088712709821&w=2
> 
> This set of patches applies cleanly on 2.6.31 and I tested the rdac
> handler
> on the same.
> 
> Please review and provide comments.
> 
> This set of patched adds asynchronous support for rdac, hp, and alua
> handlers.
> 
> Didn't add async support for EMC for the following reasons:
>  - EMC handler is getting more hairy (when I try to move to _nowait())
>  - I do not know if EMC has a special command to group them (like rdac).
>  - I wasn't sure if EMC has this problem in the first place ( I did not
> get
>    any response from EMC).
> Can somebody from EMC comment, please.
> 
> Thanks & Regards,
> 
> chandra
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous
  2009-10-01  4:19 ` [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous Moger, Babu
@ 2009-10-01 20:54   ` Chandra Seetharaman
  0 siblings, 0 replies; 22+ messages in thread
From: Chandra Seetharaman @ 2009-10-01 20:54 UTC (permalink / raw)
  To: Moger, Babu
  Cc: linux-scsi@vger.kernel.org, dm-devel@redhat.com,
	michaelc@cs.wisc.edu, Benoit_Arthur@emc.com,
	Eddie.Williams@steeleye.com, berthiaume_wayne@emc.com

Thanks Babu,

I will fix the issues you pointed.

Let me know how the testing goes.

It will be nice if you could also review the code :)

chandra
On Wed, 2009-09-30 at 22:19 -0600, Moger, Babu wrote:
> Hi Chandra,
>    Thanks for the patches. Patches applied cleanly on 2.6.31. I am testing the patches right now. Will post the with test results tomorrow.
> These are my initial comments. 
> 1. Patches do not apply on 2.6.32. You may have to regenerate it on 2.6.32.
> 2. Also, I noticed few errors and warnings while running checkpatch.pl script.  
> 3. Subject line for patch 3 and 4 may need to be modified. It still says "rdac handler". It should be hp and alua handler. 
> 
> Thanks..
> 
> > -----Original Message-----
> > From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
> > owner@vger.kernel.org] On Behalf Of Chandra Seetharaman
> > Sent: Tuesday, September 29, 2009 9:08 PM
> > To: linux-scsi@vger.kernel.org
> > Cc: dm-devel@redhat.com; Moger, Babu; michaelc@cs.wisc.edu;
> > Benoit_Arthur@emc.com; Eddie.Williams@steeleye.com;
> > berthiaume_wayne@emc.com; Chandra Seetharaman
> > Subject: [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous
> > 
> > Hello All,
> > 
> > Currently, device handlers process path activation in series. This leads
> > to a lot of time delay when more than 100 luns are involved. For example,
> > with lsi rdac 100+ luns take about 12-15 minutes. This was found by Moger
> > Babu of LSI.
> > 
> > This time delay can be avoided if we can do the activations
> > asynchronously.
> > By making scsi_dh_activate() async, we can give the device handlers an
> > oppurtunity to decide on how to send the device activation down the wire
> > to make the turn around time faster. They can send the commands
> > asynchronously or send them in batches.
> > 
> > I brought up this issue on the mailing list
> > (http://marc.info/?l=linux-scsi&m=123888063818755&w=2) and provided a
> > set of patch a loooong while back http://marc.info/?l=linux-
> > scsi&m=124088712709821&w=2
> > 
> > This set of patches applies cleanly on 2.6.31 and I tested the rdac
> > handler
> > on the same.
> > 
> > Please review and provide comments.
> > 
> > This set of patched adds asynchronous support for rdac, hp, and alua
> > handlers.
> > 
> > Didn't add async support for EMC for the following reasons:
> >  - EMC handler is getting more hairy (when I try to move to _nowait())
> >  - I do not know if EMC has a special command to group them (like rdac).
> >  - I wasn't sure if EMC has this problem in the first place ( I did not
> > get
> >    any response from EMC).
> > Can somebody from EMC comment, please.
> > 
> > Thanks & Regards,
> > 
> > chandra
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 2/4] scsi_dh: rdac handler: Make rdac hardware handler async
  2009-09-30  2:08 ` [PATCH 2/4] scsi_dh: rdac handler: Make rdac hardware handler async Chandra Seetharaman
@ 2009-10-02  0:03   ` Moger, Babu
  2009-10-02  0:29     ` Chandra Seetharaman
  0 siblings, 1 reply; 22+ messages in thread
From: Moger, Babu @ 2009-10-02  0:03 UTC (permalink / raw)
  To: Chandra Seetharaman, linux-scsi@vger.kernel.org
  Cc: dm-devel@redhat.com, michaelc@cs.wisc.edu, Benoit_Arthur@emc.com,
	Eddie.Williams@steeleye.com, berthiaume_wayne@emc.com

Hi Chandra,
I have tested these patches on 2.6.31 kernel with LSI storage.
My configuration had 234 Luns with two paths to each lun(one active, one  passive) and luns were evenly distributed. 
Here are the results.

Results with the patches
Failover takes about 1 minute to complete 117 mode selects. Failovers are completed in two sets(1 mode select first and then 116 mode selects together).
Failback takes about 2 and Half minutes.. 


Results without the patches on same configuration
failover takes about 12 minutes to complete 117 mode selects. 
Failback takes about 15 minutes.

Thanks
Babu Moger 

> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
> owner@vger.kernel.org] On Behalf Of Chandra Seetharaman
> Sent: Tuesday, September 29, 2009 9:08 PM
> To: linux-scsi@vger.kernel.org
> Cc: dm-devel@redhat.com; Moger, Babu; michaelc@cs.wisc.edu;
> Benoit_Arthur@emc.com; Eddie.Williams@steeleye.com;
> berthiaume_wayne@emc.com; Chandra Seetharaman
> Subject: [PATCH 2/4] scsi_dh: rdac handler: Make rdac hardware handler
> async
> 
> Batch up MODE_SELECT in rdac device handler.
> 
> LSI RDAC storage has the capability of handling mode selects for
> multiple luns in a same command. Make use of that ability to send
> as few MODE SELECTs as possible to the storage controller as possible.
> 
> This patch creates a work queue and queues up activate requests
> when a MODE SELECT is sent down the wire. When that MODE SELECT
> completes, it compiles queued up activate requests for multiple
> luns into a single MODE SELECT.
> 
> This reduces the time to do failover/failback of large number of LUNS.
> 
> Signed-off-by: Babu Moger <babu.moger@lsi.com>
> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
> 
> ---
>  drivers/scsi/device_handler/scsi_dh_rdac.c |  117
> ++++++++++++++++++++++++++---
>  1 file changed, 104 insertions(+), 13 deletions(-)
> 
> Index: linux-2.6.31/drivers/scsi/device_handler/scsi_dh_rdac.c
> ===================================================================
> --- linux-2.6.31.orig/drivers/scsi/device_handler/scsi_dh_rdac.c
> +++ linux-2.6.31/drivers/scsi/device_handler/scsi_dh_rdac.c
> @@ -22,6 +22,7 @@
>  #include <scsi/scsi.h>
>  #include <scsi/scsi_eh.h>
>  #include <scsi/scsi_dh.h>
> +#include <linux/workqueue.h>
> 
>  #define RDAC_NAME "rdac"
>  #define RDAC_RETRY_COUNT 5
> @@ -135,6 +136,11 @@ struct rdac_controller {
>  		struct rdac_pg_legacy legacy;
>  		struct rdac_pg_expanded expanded;
>  	} mode_select;
> +	spinlock_t		ms_lock;
> +	int			ms_queued;
> +	struct work_struct	ms_work;
> +	struct scsi_device	*ms_sdev;
> +	struct list_head	ms_head;
>  };
>  struct c8_inquiry {
>  	u8	peripheral_info;
> @@ -195,8 +201,17 @@ static const char *lun_state[] =
>  	"owned (AVT mode)",
>  };
> 
> +struct rdac_queue_data {
> +	struct list_head	entry;
> +	struct rdac_dh_data	*h;
> +	activate_complete	callback_fn;
> +	void			*callback_data;
> +};
> +
>  static LIST_HEAD(ctlr_list);
>  static DEFINE_SPINLOCK(list_lock);
> +static struct workqueue_struct *kmpath_rdacd;
> +static void send_mode_select(struct work_struct *work);
> 
>  static inline struct rdac_dh_data *get_rdac_data(struct scsi_device
> *sdev)
>  {
> @@ -253,7 +268,6 @@ static struct request *rdac_failover_get
>  		rdac_pg->subpage_code = 0x1;
>  		rdac_pg->page_len[0] = 0x01;
>  		rdac_pg->page_len[1] = 0x28;
> -		rdac_pg->lun_table[h->lun] = 0x81;
>  	} else {
>  		struct rdac_pg_legacy *rdac_pg;
> 
> @@ -263,7 +277,6 @@ static struct request *rdac_failover_get
>  		common = &rdac_pg->common;
>  		rdac_pg->page_code = RDAC_PAGE_CODE_REDUNDANT_CONTROLLER;
>  		rdac_pg->page_len = 0x68;
> -		rdac_pg->lun_table[h->lun] = 0x81;
>  	}
>  	common->rdac_mode[1] = RDAC_MODE_TRANSFER_SPECIFIED_LUNS;
>  	common->quiescence_timeout = RDAC_QUIESCENCE_TIME;
> @@ -297,6 +310,7 @@ static void release_controller(struct kr
>  	struct rdac_controller *ctlr;
>  	ctlr = container_of(kref, struct rdac_controller, kref);
> 
> +	flush_workqueue(kmpath_rdacd);
>  	spin_lock(&list_lock);
>  	list_del(&ctlr->node);
>  	spin_unlock(&list_lock);
> @@ -326,6 +340,11 @@ static struct rdac_controller *get_contr
>  	memcpy(ctlr->slot_id, slot_id, SLOT_ID_LEN);
>  	kref_init(&ctlr->kref);
>  	ctlr->use_ms10 = -1;
> +	ctlr->ms_queued = 0;
> +	ctlr->ms_sdev = NULL;
> +	spin_lock_init(&ctlr->ms_lock);
> +	INIT_WORK(&ctlr->ms_work, send_mode_select);
> +	INIT_LIST_HEAD(&ctlr->ms_head);
>  	list_add(&ctlr->node, &ctlr_list);
>  done:
>  	spin_unlock(&list_lock);
> @@ -445,8 +464,7 @@ static int set_mode_select(struct scsi_d
>  	return err;
>  }
> 
> -static int mode_select_handle_sense(struct scsi_device *sdev,
> -				    unsigned char *sensebuf)
> +static int mode_select_handle_sense(unsigned char *sensebuf)
>  {
>  	struct scsi_sense_hdr sense_hdr;
>  	int err = SCSI_DH_IO, ret;
> @@ -478,7 +496,7 @@ static int mode_select_handle_sense(stru
>  			err = SCSI_DH_RETRY;
>  		break;
>  	default:
> -		sdev_printk(KERN_INFO, sdev,
> +		printk(KERN_INFO
>  			    "MODE_SELECT failed with sense
> %02x/%02x/%02x.\n",
>  			    sense_hdr.sense_key, sense_hdr.asc,
> sense_hdr.ascq);
>  	}
> @@ -487,11 +505,29 @@ done:
>  	return err;
>  }
> 
> -static int send_mode_select(struct scsi_device *sdev, struct
> rdac_dh_data *h)
> +static void send_mode_select(struct work_struct *work)
>  {
> +	struct rdac_controller *ctlr =
> +		container_of(work, struct rdac_controller, ms_work);
>  	struct request *rq;
> +	struct scsi_device *sdev = ctlr->ms_sdev;
> +	struct rdac_dh_data *h = get_rdac_data(sdev);
>  	struct request_queue *q = sdev->request_queue;
>  	int err, retry_cnt = RDAC_RETRY_COUNT;
> +	struct rdac_queue_data *tmp, *qdata;
> +	LIST_HEAD(list);
> +	u8 *lun_table;
> +
> +	spin_lock(&ctlr->ms_lock);
> +	list_splice_init(&ctlr->ms_head, &list);
> +	ctlr->ms_queued = 0;
> +	ctlr->ms_sdev = NULL;
> +	spin_unlock(&ctlr->ms_lock);
> +
> +	if (ctlr->use_ms10)
> +		lun_table = ctlr->mode_select.expanded.lun_table;
> +	else
> +		lun_table = ctlr->mode_select.legacy.lun_table;
> 
>  retry:
>  	err = SCSI_DH_RES_TEMP_UNAVAIL;
> @@ -499,13 +535,17 @@ retry:
>  	if (!rq)
>  		goto done;
> 
> -	sdev_printk(KERN_INFO, sdev, "%s MODE_SELECT command.\n",
> + 	list_for_each_entry(qdata, &list, entry) {
> +		lun_table[qdata->h->lun] = 0x81;
> +	}
> +
> +	printk(KERN_INFO "%s MODE_SELECT command.\n",
>  		(retry_cnt == RDAC_RETRY_COUNT) ? "queueing" : "retrying");
> 
>  	err = blk_execute_rq(q, NULL, rq, 1);
>  	blk_put_request(rq);
>  	if (err != SCSI_DH_OK) {
> -		err = mode_select_handle_sense(sdev, h->sense);
> +		err = mode_select_handle_sense(h->sense);
>  		if (err == SCSI_DH_RETRY && retry_cnt--)
>  			goto retry;
>  	}
> @@ -513,10 +553,45 @@ retry:
>  		h->state = RDAC_STATE_ACTIVE;
> 
>  done:
> -	return err;
> +	list_for_each_entry_safe(qdata, tmp, &list, entry) {
> +		list_del(&qdata->entry);
> +		if (err == SCSI_DH_OK)
> +			qdata->h->state = RDAC_STATE_ACTIVE;
> +		if (qdata->callback_fn)
> +			qdata->callback_fn(qdata->callback_data, err);
> +		kfree(qdata);
> +	}
> +	return;
>  }
> 
> -static int rdac_activate(struct scsi_device *sdev, activate_complete
> fn, void *data)
> +static int queue_mode_select(struct scsi_device *sdev,
> +				activate_complete fn, void *data)
> +{
> +	struct rdac_queue_data *qdata;
> +	struct rdac_controller *ctlr;
> +
> +	qdata = kzalloc(sizeof(*qdata), GFP_KERNEL);
> +	if (!qdata)
> +		return SCSI_DH_RETRY;
> +
> + 	qdata->h = get_rdac_data(sdev);
> +	qdata->callback_fn = fn;
> +	qdata->callback_data = data;
> +
> +	ctlr = qdata->h->ctlr;
> +	spin_lock(&ctlr->ms_lock);
> +	list_add_tail(&qdata->entry, &ctlr->ms_head);
> +	if (!ctlr->ms_queued) {
> +		ctlr->ms_queued = 1;
> +		ctlr->ms_sdev = sdev;
> +		queue_work(kmpath_rdacd, &ctlr->ms_work);
> +	}
> +	spin_unlock(&ctlr->ms_lock);
> +	return SCSI_DH_OK;
> +}
> +
> +static int rdac_activate(struct scsi_device *sdev,
> +				activate_complete fn, void *data)
>  {
>  	struct rdac_dh_data *h = get_rdac_data(sdev);
>  	int err = SCSI_DH_OK;
> @@ -536,8 +611,11 @@ static int rdac_activate(struct scsi_dev
>  		if (err != SCSI_DH_OK)
>  			goto done;
>  	}
> -	if (h->lun_state == RDAC_LUN_UNOWNED)
> -		err = send_mode_select(sdev, h);
> +	if (h->lun_state == RDAC_LUN_UNOWNED) {
> +		err = queue_mode_select(sdev, fn, data);
> +		if (err == SCSI_DH_OK)
> +			return 0;
> +	}
>  done:
>  	if (fn)
>  		fn(data, err);
> @@ -726,13 +804,26 @@ static int __init rdac_init(void)
>  	int r;
> 
>  	r = scsi_register_device_handler(&rdac_dh);
> -	if (r != 0)
> +	if (r != 0) {
>  		printk(KERN_ERR "Failed to register scsi device handler.");
> +		goto done;
> +	}
> +
> +	/*
> +	 * Create workqueue to handle mode selects for rdac
> +	 */
> +	kmpath_rdacd = create_singlethread_workqueue("kmpath_rdacd");
> +	if (!kmpath_rdacd) {
> +		scsi_unregister_device_handler(&rdac_dh);
> +		printk(KERN_ERR "kmpath_rdacd creation failed.\n");
> +	}
> +done:
>  	return r;
>  }
> 
>  static void __exit rdac_exit(void)
>  {
> +	destroy_workqueue(kmpath_rdacd);
>  	scsi_unregister_device_handler(&rdac_dh);
>  }
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 2/4] scsi_dh: rdac handler: Make rdac hardware handler async
  2009-10-02  0:03   ` Moger, Babu
@ 2009-10-02  0:29     ` Chandra Seetharaman
  0 siblings, 0 replies; 22+ messages in thread
From: Chandra Seetharaman @ 2009-10-02  0:29 UTC (permalink / raw)
  To: Moger, Babu
  Cc: linux-scsi@vger.kernel.org, dm-devel@redhat.com,
	michaelc@cs.wisc.edu, Benoit_Arthur@emc.com,
	Eddie.Williams@steeleye.com, berthiaume_wayne@emc.com

Thanks Babu,

It matches with what I was seeing too.

On Thu, 2009-10-01 at 18:03 -0600, Moger, Babu wrote:
> Hi Chandra,
> I have tested these patches on 2.6.31 kernel with LSI storage.
> My configuration had 234 Luns with two paths to each lun(one active, one  passive) and luns were evenly distributed. 
> Here are the results.
> 
> Results with the patches
> Failover takes about 1 minute to complete 117 mode selects. Failovers are completed in two sets(1 mode select first and then 116 mode selects together).
> Failback takes about 2 and Half minutes.. 
> 
> 
> Results without the patches on same configuration
> failover takes about 12 minutes to complete 117 mode selects. 
> Failback takes about 15 minutes.
> 
> Thanks
> Babu Moger 
> 
> > -----Original Message-----
> > From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
> > owner@vger.kernel.org] On Behalf Of Chandra Seetharaman
> > Sent: Tuesday, September 29, 2009 9:08 PM
> > To: linux-scsi@vger.kernel.org
> > Cc: dm-devel@redhat.com; Moger, Babu; michaelc@cs.wisc.edu;
> > Benoit_Arthur@emc.com; Eddie.Williams@steeleye.com;
> > berthiaume_wayne@emc.com; Chandra Seetharaman
> > Subject: [PATCH 2/4] scsi_dh: rdac handler: Make rdac hardware handler
> > async
> > 
> > Batch up MODE_SELECT in rdac device handler.
> > 
> > LSI RDAC storage has the capability of handling mode selects for
> > multiple luns in a same command. Make use of that ability to send
> > as few MODE SELECTs as possible to the storage controller as possible.
> > 
> > This patch creates a work queue and queues up activate requests
> > when a MODE SELECT is sent down the wire. When that MODE SELECT
> > completes, it compiles queued up activate requests for multiple
> > luns into a single MODE SELECT.
> > 
> > This reduces the time to do failover/failback of large number of LUNS.
> > 
> > Signed-off-by: Babu Moger <babu.moger@lsi.com>
> > Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
> > 
> > ---
> >  drivers/scsi/device_handler/scsi_dh_rdac.c |  117
> > ++++++++++++++++++++++++++---
> >  1 file changed, 104 insertions(+), 13 deletions(-)
> > 
> > Index: linux-2.6.31/drivers/scsi/device_handler/scsi_dh_rdac.c
> > ===================================================================
> > --- linux-2.6.31.orig/drivers/scsi/device_handler/scsi_dh_rdac.c
> > +++ linux-2.6.31/drivers/scsi/device_handler/scsi_dh_rdac.c
> > @@ -22,6 +22,7 @@
> >  #include <scsi/scsi.h>
> >  #include <scsi/scsi_eh.h>
> >  #include <scsi/scsi_dh.h>
> > +#include <linux/workqueue.h>
> > 
> >  #define RDAC_NAME "rdac"
> >  #define RDAC_RETRY_COUNT 5
> > @@ -135,6 +136,11 @@ struct rdac_controller {
> >  		struct rdac_pg_legacy legacy;
> >  		struct rdac_pg_expanded expanded;
> >  	} mode_select;
> > +	spinlock_t		ms_lock;
> > +	int			ms_queued;
> > +	struct work_struct	ms_work;
> > +	struct scsi_device	*ms_sdev;
> > +	struct list_head	ms_head;
> >  };
> >  struct c8_inquiry {
> >  	u8	peripheral_info;
> > @@ -195,8 +201,17 @@ static const char *lun_state[] =
> >  	"owned (AVT mode)",
> >  };
> > 
> > +struct rdac_queue_data {
> > +	struct list_head	entry;
> > +	struct rdac_dh_data	*h;
> > +	activate_complete	callback_fn;
> > +	void			*callback_data;
> > +};
> > +
> >  static LIST_HEAD(ctlr_list);
> >  static DEFINE_SPINLOCK(list_lock);
> > +static struct workqueue_struct *kmpath_rdacd;
> > +static void send_mode_select(struct work_struct *work);
> > 
> >  static inline struct rdac_dh_data *get_rdac_data(struct scsi_device
> > *sdev)
> >  {
> > @@ -253,7 +268,6 @@ static struct request *rdac_failover_get
> >  		rdac_pg->subpage_code = 0x1;
> >  		rdac_pg->page_len[0] = 0x01;
> >  		rdac_pg->page_len[1] = 0x28;
> > -		rdac_pg->lun_table[h->lun] = 0x81;
> >  	} else {
> >  		struct rdac_pg_legacy *rdac_pg;
> > 
> > @@ -263,7 +277,6 @@ static struct request *rdac_failover_get
> >  		common = &rdac_pg->common;
> >  		rdac_pg->page_code = RDAC_PAGE_CODE_REDUNDANT_CONTROLLER;
> >  		rdac_pg->page_len = 0x68;
> > -		rdac_pg->lun_table[h->lun] = 0x81;
> >  	}
> >  	common->rdac_mode[1] = RDAC_MODE_TRANSFER_SPECIFIED_LUNS;
> >  	common->quiescence_timeout = RDAC_QUIESCENCE_TIME;
> > @@ -297,6 +310,7 @@ static void release_controller(struct kr
> >  	struct rdac_controller *ctlr;
> >  	ctlr = container_of(kref, struct rdac_controller, kref);
> > 
> > +	flush_workqueue(kmpath_rdacd);
> >  	spin_lock(&list_lock);
> >  	list_del(&ctlr->node);
> >  	spin_unlock(&list_lock);
> > @@ -326,6 +340,11 @@ static struct rdac_controller *get_contr
> >  	memcpy(ctlr->slot_id, slot_id, SLOT_ID_LEN);
> >  	kref_init(&ctlr->kref);
> >  	ctlr->use_ms10 = -1;
> > +	ctlr->ms_queued = 0;
> > +	ctlr->ms_sdev = NULL;
> > +	spin_lock_init(&ctlr->ms_lock);
> > +	INIT_WORK(&ctlr->ms_work, send_mode_select);
> > +	INIT_LIST_HEAD(&ctlr->ms_head);
> >  	list_add(&ctlr->node, &ctlr_list);
> >  done:
> >  	spin_unlock(&list_lock);
> > @@ -445,8 +464,7 @@ static int set_mode_select(struct scsi_d
> >  	return err;
> >  }
> > 
> > -static int mode_select_handle_sense(struct scsi_device *sdev,
> > -				    unsigned char *sensebuf)
> > +static int mode_select_handle_sense(unsigned char *sensebuf)
> >  {
> >  	struct scsi_sense_hdr sense_hdr;
> >  	int err = SCSI_DH_IO, ret;
> > @@ -478,7 +496,7 @@ static int mode_select_handle_sense(stru
> >  			err = SCSI_DH_RETRY;
> >  		break;
> >  	default:
> > -		sdev_printk(KERN_INFO, sdev,
> > +		printk(KERN_INFO
> >  			    "MODE_SELECT failed with sense
> > %02x/%02x/%02x.\n",
> >  			    sense_hdr.sense_key, sense_hdr.asc,
> > sense_hdr.ascq);
> >  	}
> > @@ -487,11 +505,29 @@ done:
> >  	return err;
> >  }
> > 
> > -static int send_mode_select(struct scsi_device *sdev, struct
> > rdac_dh_data *h)
> > +static void send_mode_select(struct work_struct *work)
> >  {
> > +	struct rdac_controller *ctlr =
> > +		container_of(work, struct rdac_controller, ms_work);
> >  	struct request *rq;
> > +	struct scsi_device *sdev = ctlr->ms_sdev;
> > +	struct rdac_dh_data *h = get_rdac_data(sdev);
> >  	struct request_queue *q = sdev->request_queue;
> >  	int err, retry_cnt = RDAC_RETRY_COUNT;
> > +	struct rdac_queue_data *tmp, *qdata;
> > +	LIST_HEAD(list);
> > +	u8 *lun_table;
> > +
> > +	spin_lock(&ctlr->ms_lock);
> > +	list_splice_init(&ctlr->ms_head, &list);
> > +	ctlr->ms_queued = 0;
> > +	ctlr->ms_sdev = NULL;
> > +	spin_unlock(&ctlr->ms_lock);
> > +
> > +	if (ctlr->use_ms10)
> > +		lun_table = ctlr->mode_select.expanded.lun_table;
> > +	else
> > +		lun_table = ctlr->mode_select.legacy.lun_table;
> > 
> >  retry:
> >  	err = SCSI_DH_RES_TEMP_UNAVAIL;
> > @@ -499,13 +535,17 @@ retry:
> >  	if (!rq)
> >  		goto done;
> > 
> > -	sdev_printk(KERN_INFO, sdev, "%s MODE_SELECT command.\n",
> > + 	list_for_each_entry(qdata, &list, entry) {
> > +		lun_table[qdata->h->lun] = 0x81;
> > +	}
> > +
> > +	printk(KERN_INFO "%s MODE_SELECT command.\n",
> >  		(retry_cnt == RDAC_RETRY_COUNT) ? "queueing" : "retrying");
> > 
> >  	err = blk_execute_rq(q, NULL, rq, 1);
> >  	blk_put_request(rq);
> >  	if (err != SCSI_DH_OK) {
> > -		err = mode_select_handle_sense(sdev, h->sense);
> > +		err = mode_select_handle_sense(h->sense);
> >  		if (err == SCSI_DH_RETRY && retry_cnt--)
> >  			goto retry;
> >  	}
> > @@ -513,10 +553,45 @@ retry:
> >  		h->state = RDAC_STATE_ACTIVE;
> > 
> >  done:
> > -	return err;
> > +	list_for_each_entry_safe(qdata, tmp, &list, entry) {
> > +		list_del(&qdata->entry);
> > +		if (err == SCSI_DH_OK)
> > +			qdata->h->state = RDAC_STATE_ACTIVE;
> > +		if (qdata->callback_fn)
> > +			qdata->callback_fn(qdata->callback_data, err);
> > +		kfree(qdata);
> > +	}
> > +	return;
> >  }
> > 
> > -static int rdac_activate(struct scsi_device *sdev, activate_complete
> > fn, void *data)
> > +static int queue_mode_select(struct scsi_device *sdev,
> > +				activate_complete fn, void *data)
> > +{
> > +	struct rdac_queue_data *qdata;
> > +	struct rdac_controller *ctlr;
> > +
> > +	qdata = kzalloc(sizeof(*qdata), GFP_KERNEL);
> > +	if (!qdata)
> > +		return SCSI_DH_RETRY;
> > +
> > + 	qdata->h = get_rdac_data(sdev);
> > +	qdata->callback_fn = fn;
> > +	qdata->callback_data = data;
> > +
> > +	ctlr = qdata->h->ctlr;
> > +	spin_lock(&ctlr->ms_lock);
> > +	list_add_tail(&qdata->entry, &ctlr->ms_head);
> > +	if (!ctlr->ms_queued) {
> > +		ctlr->ms_queued = 1;
> > +		ctlr->ms_sdev = sdev;
> > +		queue_work(kmpath_rdacd, &ctlr->ms_work);
> > +	}
> > +	spin_unlock(&ctlr->ms_lock);
> > +	return SCSI_DH_OK;
> > +}
> > +
> > +static int rdac_activate(struct scsi_device *sdev,
> > +				activate_complete fn, void *data)
> >  {
> >  	struct rdac_dh_data *h = get_rdac_data(sdev);
> >  	int err = SCSI_DH_OK;
> > @@ -536,8 +611,11 @@ static int rdac_activate(struct scsi_dev
> >  		if (err != SCSI_DH_OK)
> >  			goto done;
> >  	}
> > -	if (h->lun_state == RDAC_LUN_UNOWNED)
> > -		err = send_mode_select(sdev, h);
> > +	if (h->lun_state == RDAC_LUN_UNOWNED) {
> > +		err = queue_mode_select(sdev, fn, data);
> > +		if (err == SCSI_DH_OK)
> > +			return 0;
> > +	}
> >  done:
> >  	if (fn)
> >  		fn(data, err);
> > @@ -726,13 +804,26 @@ static int __init rdac_init(void)
> >  	int r;
> > 
> >  	r = scsi_register_device_handler(&rdac_dh);
> > -	if (r != 0)
> > +	if (r != 0) {
> >  		printk(KERN_ERR "Failed to register scsi device handler.");
> > +		goto done;
> > +	}
> > +
> > +	/*
> > +	 * Create workqueue to handle mode selects for rdac
> > +	 */
> > +	kmpath_rdacd = create_singlethread_workqueue("kmpath_rdacd");
> > +	if (!kmpath_rdacd) {
> > +		scsi_unregister_device_handler(&rdac_dh);
> > +		printk(KERN_ERR "kmpath_rdacd creation failed.\n");
> > +	}
> > +done:
> >  	return r;
> >  }
> > 
> >  static void __exit rdac_exit(void)
> >  {
> > +	destroy_workqueue(kmpath_rdacd);
> >  	scsi_unregister_device_handler(&rdac_dh);
> >  }
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-scsi"
> > in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 1/4] scsi_dh: Change the scsidh_activate interface to be asynchronous
  2009-09-30  2:08 ` [PATCH 1/4] scsi_dh: Change the scsidh_activate interface to be asynchronous Chandra Seetharaman
@ 2009-10-02 22:04   ` Moger, Babu
  2009-10-02 22:36     ` Chandra Seetharaman
  0 siblings, 1 reply; 22+ messages in thread
From: Moger, Babu @ 2009-10-02 22:04 UTC (permalink / raw)
  To: Chandra Seetharaman, linux-scsi@vger.kernel.org
  Cc: dm-devel@redhat.com, michaelc@cs.wisc.edu, Benoit_Arthur@emc.com,
	Eddie.Williams@steeleye.com, berthiaume_wayne@emc.com


Chandra, I have one comment on this patch.. 

>  static inline int scsi_dh_handler_exist(const char *name)
> Index: linux-2.6.31/drivers/scsi/device_handler/scsi_dh.c
> ===================================================================
> --- linux-2.6.31.orig/drivers/scsi/device_handler/scsi_dh.c
> +++ linux-2.6.31/drivers/scsi/device_handler/scsi_dh.c
> @@ -214,7 +214,7 @@ store_dh_state(struct device *dev, struc
>  			 * Activate a device handler
>  			 */
>  			if (scsi_dh->activate)
> -				err = scsi_dh->activate(sdev);
> +				err = scsi_dh->activate(sdev, NULL, NULL);


This might cause problems in handlers.  We don’t do NULL check in attach routine.


>  			else
>  				err = 0;
>  		}
> @@ -411,10 +411,17 @@ EXPORT_SYMBOL_GPL(scsi_unregister_device
>  /*
>   * scsi_dh_activate - activate the path associated with the
> scsi_device
>   *      corresponding to the given request queue.
> - * @q - Request queue that is associated with the scsi_device to be
> - *      activated.
> + *     Returns immediately without waiting for activation to be
> completed.
> + * @q    - Request queue that is associated with the scsi_device to be
> + *         activated.
> + * @fn   - Function to be called upon completion of the activation.
> + *         Function fn is called with data (below) and the error code.
> + *         Function fn may be called from the same calling context.
> So,
> + *         do not hold the lock in the caller which may be needed in
> fn.
> + * @data - data passed to the function fn upon completion.
> + *
>   */
> -int scsi_dh_activate(struct request_queue *q)
> +int scsi_dh_activate(struct request_queue *q, activate_complete fn,
> void *data)
>  {
>  	int err = 0;
>  	unsigned long flags;
> @@ -433,7 +440,7 @@ int scsi_dh_activate(struct request_queu
>  		return err;
> 
>  	if (scsi_dh->activate)
> -		err = scsi_dh->activate(sdev);
> +		err = scsi_dh->activate(sdev, fn, data);
>  	put_device(&sdev->sdev_gendev);
>  	return err;
>  }
> Index: linux-2.6.31/drivers/scsi/device_handler/scsi_dh_rdac.c
> ===================================================================
> --- linux-2.6.31.orig/drivers/scsi/device_handler/scsi_dh_rdac.c
> +++ linux-2.6.31/drivers/scsi/device_handler/scsi_dh_rdac.c
> @@ -516,7 +516,7 @@ done:
>  	return err;
>  }
> 
> -static int rdac_activate(struct scsi_device *sdev)
> +static int rdac_activate(struct scsi_device *sdev, activate_complete
> fn, void *data)
>  {
>  	struct rdac_dh_data *h = get_rdac_data(sdev);
>  	int err = SCSI_DH_OK;
> @@ -539,7 +539,9 @@ static int rdac_activate(struct scsi_dev
>  	if (h->lun_state == RDAC_LUN_UNOWNED)
>  		err = send_mode_select(sdev, h);
>  done:
> -	return err;
> +	if (fn)
> +		fn(data, err);
> +	return 0;
>  }
> 
>  static int rdac_prep_fn(struct scsi_device *sdev, struct request *req)
> Index: linux-2.6.31/drivers/scsi/device_handler/scsi_dh_alua.c
> ===================================================================
> --- linux-2.6.31.orig/drivers/scsi/device_handler/scsi_dh_alua.c
> +++ linux-2.6.31/drivers/scsi/device_handler/scsi_dh_alua.c
> @@ -652,7 +652,7 @@ out:
>   * based on a certain policy. But until we actually encounter them it
>   * should be okay.
>   */
> -static int alua_activate(struct scsi_device *sdev)
> +static int alua_activate(struct scsi_device *sdev, activate_complete
> fn, void *data)
>  {
>  	struct alua_dh_data *h = get_alua_data(sdev);
>  	int err = SCSI_DH_OK;
> @@ -667,7 +667,9 @@ static int alua_activate(struct scsi_dev
>  		err = alua_stpg(sdev, TPGS_STATE_OPTIMIZED, h);
> 
>  out:
> -	return err;
> +	if (fn)
> +		fn(data, err);
> +	return 0;
>  }
> 
>  /*
> Index: linux-2.6.31/drivers/scsi/device_handler/scsi_dh_emc.c
> ===================================================================
> --- linux-2.6.31.orig/drivers/scsi/device_handler/scsi_dh_emc.c
> +++ linux-2.6.31/drivers/scsi/device_handler/scsi_dh_emc.c
> @@ -528,7 +528,7 @@ retry:
>  	return err;
>  }
> 
> -static int clariion_activate(struct scsi_device *sdev)
> +static int clariion_activate(struct scsi_device *sdev,
> activate_complete fn, void *data)
>  {
>  	struct clariion_dh_data *csdev = get_clariion_data(sdev);
>  	int result;
> @@ -559,7 +559,9 @@ done:
>  		    csdev->port, lun_state[csdev->lun_state],
>  		    csdev->default_sp + 'A');
> 
> -	return result;
> +	if (fn)
> +		fn(data, result);
> +	return 0;
>  }
> 
>  static const struct scsi_dh_devlist clariion_dev_list[] = {
> Index: linux-2.6.31/drivers/scsi/device_handler/scsi_dh_hp_sw.c
> ===================================================================
> --- linux-2.6.31.orig/drivers/scsi/device_handler/scsi_dh_hp_sw.c
> +++ linux-2.6.31/drivers/scsi/device_handler/scsi_dh_hp_sw.c
> @@ -268,7 +268,7 @@ static int hp_sw_prep_fn(struct scsi_dev
>   * activate the passive path (and deactivate the
>   * previously active one).
>   */
> -static int hp_sw_activate(struct scsi_device *sdev)
> +static int hp_sw_activate(struct scsi_device *sdev, activate_complete
> fn, void *data)
>  {
>  	int ret = SCSI_DH_OK;
>  	struct hp_sw_dh_data *h = get_hp_sw_data(sdev);
> @@ -283,7 +283,9 @@ static int hp_sw_activate(struct scsi_de
>  				    HP_SW_NAME);
>  	}
> 
> -	return ret;
> +	if (fn)
> +		fn(data, ret);
> +	return 0;
>  }
> 
>  static const struct scsi_dh_devlist hp_sw_dh_data_list[] = {
> Index: linux-2.6.31/include/scsi/scsi_device.h
> ===================================================================
> --- linux-2.6.31.orig/include/scsi/scsi_device.h
> +++ linux-2.6.31/include/scsi/scsi_device.h
> @@ -174,6 +174,7 @@ struct scsi_dh_devlist {
>  	char *model;
>  };
> 
> +typedef void (*activate_complete)(void *, int);
>  struct scsi_device_handler {
>  	/* Used by the infrastructure */
>  	struct list_head list; /* list of scsi_device_handlers */
> @@ -185,7 +186,7 @@ struct scsi_device_handler {
>  	int (*check_sense)(struct scsi_device *, struct scsi_sense_hdr
> *);
>  	int (*attach)(struct scsi_device *);
>  	void (*detach)(struct scsi_device *);
> -	int (*activate)(struct scsi_device *);
> +	int (*activate)(struct scsi_device *, activate_complete, void *);
>  	int (*prep_fn)(struct scsi_device *, struct request *);
>  };
> 
> Index: linux-2.6.31/drivers/md/dm-mpath.c
> ===================================================================
> --- linux-2.6.31.orig/drivers/md/dm-mpath.c
> +++ linux-2.6.31/drivers/md/dm-mpath.c
> @@ -1086,8 +1086,9 @@ static int pg_init_limit_reached(struct
>  	return limit_reached;
>  }
> 
> -static void pg_init_done(struct dm_path *path, int errors)
> +static void pg_init_done(void *data, int errors)
>  {
> +	struct dm_path *path = data;
>  	struct pgpath *pgpath = path_to_pgpath(path);
>  	struct priority_group *pg = pgpath->pg;
>  	struct multipath *m = pg->m;
> @@ -1153,12 +1154,11 @@ static void pg_init_done(struct dm_path
> 
>  static void activate_path(struct work_struct *work)
>  {
> -	int ret;
>  	struct pgpath *pgpath =
>  		container_of(work, struct pgpath, activate_path);
> 
> -	ret = scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev));
> -	pg_init_done(&pgpath->path, ret);
> +	scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
> +				pg_init_done, &pgpath->path);
>  }
> 
>  /*
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 1/4] scsi_dh: Change the scsidh_activate interface to be asynchronous
  2009-10-02 22:04   ` Moger, Babu
@ 2009-10-02 22:36     ` Chandra Seetharaman
  2009-10-02 23:02       ` Moger, Babu
  0 siblings, 1 reply; 22+ messages in thread
From: Chandra Seetharaman @ 2009-10-02 22:36 UTC (permalink / raw)
  To: Moger, Babu
  Cc: linux-scsi@vger.kernel.org, dm-devel@redhat.com,
	michaelc@cs.wisc.edu, Benoit_Arthur@emc.com,
	Eddie.Williams@steeleye.com, berthiaume_wayne@emc.com


On Fri, 2009-10-02 at 16:04 -0600, Moger, Babu wrote:
> Chandra, I have one comment on this patch.. 
> 
> >  static inline int scsi_dh_handler_exist(const char *name)
> > Index: linux-2.6.31/drivers/scsi/device_handler/scsi_dh.c
> > ===================================================================
> > --- linux-2.6.31.orig/drivers/scsi/device_handler/scsi_dh.c
> > +++ linux-2.6.31/drivers/scsi/device_handler/scsi_dh.c
> > @@ -214,7 +214,7 @@ store_dh_state(struct device *dev, struc
> >  			 * Activate a device handler
> >  			 */
> >  			if (scsi_dh->activate)
> > -				err = scsi_dh->activate(sdev);
> > +				err = scsi_dh->activate(sdev, NULL, NULL);
> 
> 
> This might cause problems in handlers.  We don’t do NULL check in attach routine.

I assume you meant activate(), when you mention "attach" above.

At every place where the callback function is called, we first check
that the callback function is not NULL.

Can you point me if I missed some place.

Thanks

chandra

> 
> 
> >  			else
> >  				err = 0;
> >  		}


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 1/4] scsi_dh: Change the scsidh_activate interface to be asynchronous
  2009-10-02 22:36     ` Chandra Seetharaman
@ 2009-10-02 23:02       ` Moger, Babu
  0 siblings, 0 replies; 22+ messages in thread
From: Moger, Babu @ 2009-10-02 23:02 UTC (permalink / raw)
  To: sekharan@linux.vnet.ibm.com
  Cc: linux-scsi@vger.kernel.org, dm-devel@redhat.com,
	michaelc@cs.wisc.edu, Benoit_Arthur@emc.com,
	Eddie.Williams@steeleye.com, berthiaume_wayne@emc.com


> On Fri, 2009-10-02 at 16:04 -0600, Moger, Babu wrote:
> > Chandra, I have one comment on this patch..
> >
> > >  static inline int scsi_dh_handler_exist(const char *name)
> > > Index: linux-2.6.31/drivers/scsi/device_handler/scsi_dh.c
> > > ===================================================================
> > > --- linux-2.6.31.orig/drivers/scsi/device_handler/scsi_dh.c
> > > +++ linux-2.6.31/drivers/scsi/device_handler/scsi_dh.c
> > > @@ -214,7 +214,7 @@ store_dh_state(struct device *dev, struc
> > >  			 * Activate a device handler
> > >  			 */
> > >  			if (scsi_dh->activate)
> > > -				err = scsi_dh->activate(sdev);
> > > +				err = scsi_dh->activate(sdev, NULL, NULL);
> >
> >
> > This might cause problems in handlers.  We don’t do NULL check in
> attach routine.
> 
> I assume you meant activate(), when you mention "attach" above.
> 
> At every place where the callback function is called, we first check
> that the callback function is not NULL.
> 
> Can you point me if I missed some place.
> 

Sorry for the confusion.. Yes, I meant activate. 
You are right. All the handlers check for NULL before calling. We are good here..
I misread the code..

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous
  2009-09-30  2:08 [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous Chandra Seetharaman
                   ` (4 preceding siblings ...)
  2009-10-01  4:19 ` [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous Moger, Babu
@ 2009-10-05 13:01 ` Hannes Reinecke
  2009-10-05 14:35   ` Hannes Reinecke
  2009-10-05 23:25   ` Chandra Seetharaman
  5 siblings, 2 replies; 22+ messages in thread
From: Hannes Reinecke @ 2009-10-05 13:01 UTC (permalink / raw)
  To: Chandra Seetharaman
  Cc: michaelc, linux-scsi, dm-devel, Benoit_Arthur, Eddie.Williams

Chandra Seetharaman wrote:
> Hello All,
> 
> Currently, device handlers process path activation in series. This leads
> to a lot of time delay when more than 100 luns are involved. For example,
> with lsi rdac 100+ luns take about 12-15 minutes. This was found by Moger
> Babu of LSI.
> 
> This time delay can be avoided if we can do the activations asynchronously.
> By making scsi_dh_activate() async, we can give the device handlers an
> oppurtunity to decide on how to send the device activation down the wire
> to make the turn around time faster. They can send the commands
> asynchronously or send them in batches.
> 
> I brought up this issue on the mailing list
> (http://marc.info/?l=linux-scsi&m=123888063818755&w=2) and provided a
> set of patch a loooong while back http://marc.info/?l=linux-scsi&m=124088712709821&w=2
> 
> This set of patches applies cleanly on 2.6.31 and I tested the rdac handler
> on the same.
> 
> Please review and provide comments.
> 
> This set of patched adds asynchronous support for rdac, hp, and alua handlers.
> 
Hmm. IIRC we added the synchronous mode by request of LSI/IBM, as the RDAC
handler could only support on MODE SELECT command at a time.
If LSI checked this, okay, no objections.

However: The main reason why we're getting flooded with MODE SELECT commands
is that the RDAC handler switches _each LUN_, not the entire controller.
Seeing that the controller simply cannot cope with the resulting MODE SELECT
flood wouldn't it be more sensible to switch the entire controller here?

After all, we're trying to address a communication failure between the
HBA and the controller, not a failure between the controller and the LUN.
And by that reasoning switching individual LUNs is quite pointless as we
have to switch _all_ LUNs handled by this controller eventually.

So I would suggest to first issue a MODE SENSE command to check which LUNs
are currently handled by this controller and then switch those LUNs in
one go. This way we would be sending quite a few MODE SENSE commands,
but I was under the impression that those do not have any restriction.

I will see to draw up a patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous
  2009-10-05 13:01 ` Hannes Reinecke
@ 2009-10-05 14:35   ` Hannes Reinecke
  2009-10-05 23:25   ` Chandra Seetharaman
  1 sibling, 0 replies; 22+ messages in thread
From: Hannes Reinecke @ 2009-10-05 14:35 UTC (permalink / raw)
  To: Chandra Seetharaman
  Cc: michaelc, linux-scsi, dm-devel, Benoit_Arthur, Eddie.Williams

[-- Attachment #1: Type: text/plain, Size: 1298 bytes --]

Hannes Reinecke wrote:
[ .. ]
> However: The main reason why we're getting flooded with MODE SELECT 
> commands
> is that the RDAC handler switches _each LUN_, not the entire controller.
> Seeing that the controller simply cannot cope with the resulting MODE 
> SELECT
> flood wouldn't it be more sensible to switch the entire controller here?
> 
> After all, we're trying to address a communication failure between the
> HBA and the controller, not a failure between the controller and the LUN.
> And by that reasoning switching individual LUNs is quite pointless as we
> have to switch _all_ LUNs handled by this controller eventually.
> 
> So I would suggest to first issue a MODE SENSE command to check which LUNs
> are currently handled by this controller and then switch those LUNs in
> one go. This way we would be sending quite a few MODE SENSE commands,
> but I was under the impression that those do not have any restriction.
> 
> I will see to draw up a patch.
> 
Was easier than I thought. Patch is attached.
Note: Proof of concept only. No warranties.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

[-- Attachment #2: rdac-transfer-all-luns --]
[-- Type: text/plain, Size: 1922 bytes --]

diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c
index ec0ad84..6900115 100644
--- a/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -42,6 +42,7 @@
 /*
  * Controller modes definitions
  */
+#define RDAC_MODE_TRANSFER_ALL_VISIBLE_LUNS	0x01
 #define RDAC_MODE_TRANSFER_SPECIFIED_LUNS	0x02
 
 /*
@@ -129,6 +130,7 @@ struct rdac_controller {
 	u8			subsys_id[SUBSYS_ID_LEN];
 	u8			slot_id[SLOT_ID_LEN];
 	int			use_ms10;
+	int			transfer_all_luns;
 	struct kref		kref;
 	struct list_head	node; /* list of all controllers */
 	union			{
@@ -253,7 +255,8 @@ static struct request *rdac_failover_get(struct scsi_device *sdev,
 		rdac_pg->subpage_code = 0x1;
 		rdac_pg->page_len[0] = 0x01;
 		rdac_pg->page_len[1] = 0x28;
-		rdac_pg->lun_table[h->lun] = 0x81;
+		if (!h->ctlr->transfer_all_luns)
+			rdac_pg->lun_table[h->lun] = 0x81;
 	} else {
 		struct rdac_pg_legacy *rdac_pg;
 
@@ -263,9 +266,13 @@ static struct request *rdac_failover_get(struct scsi_device *sdev,
 		common = &rdac_pg->common;
 		rdac_pg->page_code = RDAC_PAGE_CODE_REDUNDANT_CONTROLLER;
 		rdac_pg->page_len = 0x68;
-		rdac_pg->lun_table[h->lun] = 0x81;
+		if (!h->ctlr->transfer_all_luns)
+			rdac_pg->lun_table[h->lun] = 0x81;
 	}
-	common->rdac_mode[1] = RDAC_MODE_TRANSFER_SPECIFIED_LUNS;
+	if (h->ctlr->transfer_all_luns)
+		common->rdac_mode[1] = RDAC_MODE_TRANSFER_ALL_VISIBLE_LUNS;
+	else
+		common->rdac_mode[1] = RDAC_MODE_TRANSFER_SPECIFIED_LUNS;
 	common->quiescence_timeout = RDAC_QUIESCENCE_TIME;
 	common->rdac_options = RDAC_FORCED_QUIESENCE;
 
@@ -326,6 +333,7 @@ static struct rdac_controller *get_controller(u8 *subsys_id, u8 *slot_id)
 	memcpy(ctlr->slot_id, slot_id, SLOT_ID_LEN);
 	kref_init(&ctlr->kref);
 	ctlr->use_ms10 = -1;
+	ctlr->transfer_all_luns = 1;
 	list_add(&ctlr->node, &ctlr_list);
 done:
 	spin_unlock(&list_lock);

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous
  2009-10-05 13:01 ` Hannes Reinecke
  2009-10-05 14:35   ` Hannes Reinecke
@ 2009-10-05 23:25   ` Chandra Seetharaman
  2009-10-06  8:08     ` Hannes Reinecke
  1 sibling, 1 reply; 22+ messages in thread
From: Chandra Seetharaman @ 2009-10-05 23:25 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: michaelc, linux-scsi, dm-devel, Benoit_Arthur, Eddie.Williams


On Mon, 2009-10-05 at 15:01 +0200, Hannes Reinecke wrote:

Thanks for the comment and the patch. I will try the patch.

> Hmm. IIRC we added the synchronous mode by request of LSI/IBM, as the RDAC
> handler could only support on MODE SELECT command at a time.
> If LSI checked this, okay, no objections.

The original patch (for rdac) came from LSI (Moger Babu), I made changes
to the infrastructure and to the code to fit them together.

> 
> However: The main reason why we're getting flooded with MODE SELECT commands
> is that the RDAC handler switches _each LUN_, not the entire controller.
> Seeing that the controller simply cannot cope with the resulting MODE SELECT
> flood wouldn't it be more sensible to switch the entire controller here?

I see your point of view.... but here is the rationale...

When we originally added the rdac support (as dm_rdac), we decided this
way consciously for the following reasons:
 1. we do not know which link is broken (hba-ctlr or ctlr-lun).
    The way it is currently implemented, both these cases are handled.
 2. multipath layer to decide what to do and this module to just do
    what it was told.
 3. since multipath sends pg_init only if there is any IO sent to the
    lun, (with the current implementation) we don't have to change
    ownership (back and forth) of all the luns if the user is using only
    a handful.
 4. to be consistent with LSI's original driver (which does one lun at a
    time).

Let me know what do you think. 

Babu, What is LSI's opinion in this issue ?

regards,

chandra

> 
> After all, we're trying to address a communication failure between the
> HBA and the controller, not a failure between the controller and the LUN.
> And by that reasoning switching individual LUNs is quite pointless as we
> have to switch _all_ LUNs handled by this controller eventually.
> 
> So I would suggest to first issue a MODE SENSE command to check which LUNs
> are currently handled by this controller and then switch those LUNs in
> one go. This way we would be sending quite a few MODE SENSE commands,
> but I was under the impression that those do not have any restriction.
> 
> I will see to draw up a patch.
> 
> Cheers,
> 
> Hannes

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous
  2009-10-05 23:25   ` Chandra Seetharaman
@ 2009-10-06  8:08     ` Hannes Reinecke
  2009-10-06 19:46       ` Moger, Babu
  0 siblings, 1 reply; 22+ messages in thread
From: Hannes Reinecke @ 2009-10-06  8:08 UTC (permalink / raw)
  To: sekharan; +Cc: michaelc, linux-scsi, dm-devel, Benoit_Arthur, Eddie.Williams

Chandra Seetharaman wrote:
> On Mon, 2009-10-05 at 15:01 +0200, Hannes Reinecke wrote:
> 
> Thanks for the comment and the patch. I will try the patch.
> 
>> Hmm. IIRC we added the synchronous mode by request of LSI/IBM, as the RDAC
>> handler could only support on MODE SELECT command at a time.
>> If LSI checked this, okay, no objections.
> 
> The original patch (for rdac) came from LSI (Moger Babu), I made changes
> to the infrastructure and to the code to fit them together.
> 
>> However: The main reason why we're getting flooded with MODE SELECT commands
>> is that the RDAC handler switches _each LUN_, not the entire controller.
>> Seeing that the controller simply cannot cope with the resulting MODE SELECT
>> flood wouldn't it be more sensible to switch the entire controller here?
> 
> I see your point of view.... but here is the rationale...
> 
> When we originally added the rdac support (as dm_rdac), we decided this
> way consciously for the following reasons:
>  1. we do not know which link is broken (hba-ctlr or ctlr-lun).
>     The way it is currently implemented, both these cases are handled.
Quite. But if the ctlr-lun link is broken, we really should receive
and appropriate error code, which then could be handled in dm-rdac
appropriately. After all, the controller is still accessible and
so we don't have to guess what happened (which is all what multipath
is about). So I doubt we need to worry about this.

>  2. multipath layer to decide what to do and this module to just do
>     what it was told.
Yep.

>  3. since multipath sends pg_init only if there is any IO sent to the
>     lun, (with the current implementation) we don't have to change
>     ownership (back and forth) of all the luns if the user is using only
>     a handful.
Well, yes. But if the implementation is such that changing ownership
for all LUNs is about as efficient as changing an individual LUN (or,
as the case might be, as _inefficient_ :-), surely it's better to
save us some coding here.

>  4. to be consistent with LSI's original driver (which does one lun at a
>     time).
:-/ That's unfair.
But it still has the drawback that it doesn't scale; given enough LUNs and
access patterns you _inevitably_ have to send MODE SELECT commands for
each LUN, ie you only delay this issue.
Only by switching all LUNs you can avoid this.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous
  2009-10-06  8:08     ` Hannes Reinecke
@ 2009-10-06 19:46       ` Moger, Babu
  2009-10-07 23:08         ` Moger, Babu
  0 siblings, 1 reply; 22+ messages in thread
From: Moger, Babu @ 2009-10-06 19:46 UTC (permalink / raw)
  To: Hannes Reinecke, sekharan@linux.vnet.ibm.com
  Cc: michaelc@cs.wisc.edu, Stankey, Robert, linux-scsi@vger.kernel.org,
	Dachepalli, Sudhir, dm-devel@redhat.com, Chauhan, Vijay,
	Benoit_Arthur@emc.com, Qi, Yanling, Eddie.Williams@steeleye.com

Thanks Hannes and Chandra for your patches and feedback.  Right now we are internally discussing the pros and cons of both these approaches. We will respond to all your question once we arrive at the conclusion.

Also adding our Team (Bob Stankey, Sudhir, Yanling, Vijay ) in the response.

Thanks
Babu Moger 

> -----Original Message-----
> From: Hannes Reinecke [mailto:hare@suse.de]
> Sent: Tuesday, October 06, 2009 3:08 AM
> To: sekharan@linux.vnet.ibm.com
> Cc: linux-scsi@vger.kernel.org; dm-devel@redhat.com; Moger, Babu;
> michaelc@cs.wisc.edu; Benoit_Arthur@emc.com;
> Eddie.Williams@steeleye.com; berthiaume_wayne@emc.com
> Subject: Re: [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous
> 
> Chandra Seetharaman wrote:
> > On Mon, 2009-10-05 at 15:01 +0200, Hannes Reinecke wrote:
> >
> > Thanks for the comment and the patch. I will try the patch.
> >
> >> Hmm. IIRC we added the synchronous mode by request of LSI/IBM, as
> the RDAC
> >> handler could only support on MODE SELECT command at a time.
> >> If LSI checked this, okay, no objections.
> >
> > The original patch (for rdac) came from LSI (Moger Babu), I made
> changes
> > to the infrastructure and to the code to fit them together.
> >
> >> However: The main reason why we're getting flooded with MODE SELECT
> commands
> >> is that the RDAC handler switches _each LUN_, not the entire
> controller.
> >> Seeing that the controller simply cannot cope with the resulting
> MODE SELECT
> >> flood wouldn't it be more sensible to switch the entire controller
> here?
> >
> > I see your point of view.... but here is the rationale...
> >
> > When we originally added the rdac support (as dm_rdac), we decided
> this
> > way consciously for the following reasons:
> >  1. we do not know which link is broken (hba-ctlr or ctlr-lun).
> >     The way it is currently implemented, both these cases are
> handled.
> Quite. But if the ctlr-lun link is broken, we really should receive
> and appropriate error code, which then could be handled in dm-rdac
> appropriately. After all, the controller is still accessible and
> so we don't have to guess what happened (which is all what multipath
> is about). So I doubt we need to worry about this.
> 
> >  2. multipath layer to decide what to do and this module to just do
> >     what it was told.
> Yep.
> 
> >  3. since multipath sends pg_init only if there is any IO sent to the
> >     lun, (with the current implementation) we don't have to change
> >     ownership (back and forth) of all the luns if the user is using
> only
> >     a handful.
> Well, yes. But if the implementation is such that changing ownership
> for all LUNs is about as efficient as changing an individual LUN (or,
> as the case might be, as _inefficient_ :-), surely it's better to
> save us some coding here.
> 
> >  4. to be consistent with LSI's original driver (which does one lun
> at a
> >     time).
> :-/ That's unfair.
> But it still has the drawback that it doesn't scale; given enough LUNs
> and
> access patterns you _inevitably_ have to send MODE SELECT commands for
> each LUN, ie you only delay this issue.
> Only by switching all LUNs you can avoid this.
> 
> Cheers,
> 
> Hannes
> --
> Dr. Hannes Reinecke		      zSeries & Storage
> hare@suse.de			      +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Markus Rex, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: RE: [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous
  2009-10-06 19:46       ` Moger, Babu
@ 2009-10-07 23:08         ` Moger, Babu
  2009-10-09  9:44           ` Hannes Reinecke
  0 siblings, 1 reply; 22+ messages in thread
From: Moger, Babu @ 2009-10-07 23:08 UTC (permalink / raw)
  To: device-mapper development, Hannes Reinecke,
	sekharan@linux.vnet.ibm.com
  Cc: michaelc@cs.wisc.edu, Stankey, Robert, linux-scsi@vger.kernel.org,
	Dachepalli, Sudhir, Chauhan, Vijay, Benoit_Arthur@emc.com,
	Qi, Yanling, Eddie.Williams@steeleye.com


Hi Hannes,
I have tested the patch you had sent. Failover works fine. 

But, we are seeing problems during the failback. It is causing continues mode-select thrashing(ping pong).

Reason for this is, handler does not know if the mode select is coming for failover or failback. Every mode select will cause movement of all the Luns. It does not matter if the LUNs are on preferred path or not. On next polling interval, multipathd will find some Luns are not on preferred path and will initiate another failback. This will result in continues ping pong. I have explained this with an EXAMPLE 1 below.

For failback to work properly, we have to have selective Lun level failover.

There is also one more Cluster scenario where we could get into thrashing with Controller Level failover. Please see the EXAMPLE 2 below.

We have been testing LUN level failover with device mapper for a while now. It is working well for us and only problem we have is slower failovers with big configurations(failover was taking about 12 minutes with 234 luns). LSI and IBM(Chandra) has been working on  asynchronous behavior for the past 3-4 months. I have tested all the patches Chandra has posted and we have seen very good results(Failover takes only 1 minute with 234 luns).

Also, these patches give the opportunity for other handlers to move to asynchronous behavior if they wish to. We need your(and Linux community) help to review the patches and move forward on this issue.

Thanks
Babu Moger
LSI Corporation


Following are the two example where we could see mode-select thrashing.. 

EXAMPLE 1 (mode select thrashing with 2 Luns in single host)
=======================================================
Let's take a very simple example.

I have 2 Luns on my host. Host is seeing both the controllers with one path to each controller.

Lun 0 is owned by controller A and preferred owner is A.
Lun 1 is owned by controller B and preferred owned is B

Here is multipath -ll output..

mpath237 (3600a0b80000f519c0000cc8a48fc7d0b) dm-4 LSI,INF-01-00
[size=2.0G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][active]   
 \_ 1:0:0:0 sde 8:64  [active][ready] (controller A)
\_ round-robin 0 [prio=0][enabled]
 \_ 2:0:0:0 sdi 8:128 [active][ghost] (controller B)

mpath180 (3600a0b80000f519c0000cc9048fc7d7b) dm-5 LSI,INF-01-00
[size=2.0G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][active]   
 \_ 2:0:0:1 sdj 8:144 [active][ready] (controller B)
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:1 sdf 8:80  [active][ghost] (controller A)


1. Run I/O on both these Luns
2. Pull the cable connected to controller A.
3. Failover will happen and Lun 0 will move to controller B. Now both the Luns are on controller B.
4. Connect the cable back on controller A.
5. multipath tool will detect the physical Luns on controller A and run the priority test.
6. It will find that Lun 0 is not on preferred path and will initiate a failback. 
   Because it is a controller level failover it will move the Lun 1 also to controller A.
   Now both the Luns are on controller A.
7. Multipath tool will come again and find Lun 1 not on preferred path and initiate failback.
   This will both the Luns to controller B.
   This will continue forever.   



EXAMPLE 2: (mode select thrashing in cluster setup)
============================================================================
Let's take two node cluster environment where luns are visible across multiple nodes, although any given lun would only be accessible via one node at a time.  If a cluster configuration were to get into a state where one node only has visibility to one controller while another node only has visibility to the alternate, a “thrashing” condition could happen.  Take this example:

•	32 luns have been mapped from the storage to all nodes.
•	Luns 0-15 are owned by the ‘A’ controller and being accessed by node #1; luns 16-31 are owned by ‘B’ and mapped to node #2.
•	Node #1 only has access to the ‘A’ controller; node #2 only has access to the ‘B’ controller.

Let’s say Node #1 decides to access lun 16.  Because it does not have visibility to the ‘B’ controller it must issue a volume transfer request.  With Controller failover solution the volume transfer request would also move luns 17-31.  If node #2 were accessing those luns they would receive ownership errors, causing a volume transfer request to move them back.  However, this also moves lun 16 from ‘A’ back to ‘B’, causing node #1 to do the volume transfer request again…..etc.  




> -----Original Message-----
> From: dm-devel-bounces@redhat.com [mailto:dm-devel-bounces@redhat.com]
> On Behalf Of Moger, Babu
> Sent: Tuesday, October 06, 2009 2:46 PM
> To: Hannes Reinecke; sekharan@linux.vnet.ibm.com
> Cc: michaelc@cs.wisc.edu; Stankey, Robert; linux-scsi@vger.kernel.org;
> Dachepalli, Sudhir; dm-devel@redhat.com; Chauhan, Vijay;
> Benoit_Arthur@emc.com; Qi, Yanling; Eddie.Williams@steeleye.com
> Subject: [dm-devel] RE: [PATCH 0/4] scsi_dh: Make scsi_dh_activate
> asynchronous
> 
> Thanks Hannes and Chandra for your patches and feedback.  Right now we
> are internally discussing the pros and cons of both these approaches.
> We will respond to all your question once we arrive at the conclusion.
> 
> Also adding our Team (Bob Stankey, Sudhir, Yanling, Vijay ) in the
> response.
> 
> Thanks
> Babu Moger
> 
> > -----Original Message-----
> > From: Hannes Reinecke [mailto:hare@suse.de]
> > Sent: Tuesday, October 06, 2009 3:08 AM
> > To: sekharan@linux.vnet.ibm.com
> > Cc: linux-scsi@vger.kernel.org; dm-devel@redhat.com; Moger, Babu;
> > michaelc@cs.wisc.edu; Benoit_Arthur@emc.com;
> > Eddie.Williams@steeleye.com; berthiaume_wayne@emc.com
> > Subject: Re: [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous
> >
> > Chandra Seetharaman wrote:
> > > On Mon, 2009-10-05 at 15:01 +0200, Hannes Reinecke wrote:
> > >
> > > Thanks for the comment and the patch. I will try the patch.
> > >
> > >> Hmm. IIRC we added the synchronous mode by request of LSI/IBM, as
> > the RDAC
> > >> handler could only support on MODE SELECT command at a time.
> > >> If LSI checked this, okay, no objections.
> > >
> > > The original patch (for rdac) came from LSI (Moger Babu), I made
> > changes
> > > to the infrastructure and to the code to fit them together.
> > >
> > >> However: The main reason why we're getting flooded with MODE
> SELECT
> > commands
> > >> is that the RDAC handler switches _each LUN_, not the entire
> > controller.
> > >> Seeing that the controller simply cannot cope with the resulting
> > MODE SELECT
> > >> flood wouldn't it be more sensible to switch the entire controller
> > here?
> > >
> > > I see your point of view.... but here is the rationale...
> > >
> > > When we originally added the rdac support (as dm_rdac), we decided
> > this
> > > way consciously for the following reasons:
> > >  1. we do not know which link is broken (hba-ctlr or ctlr-lun).
> > >     The way it is currently implemented, both these cases are
> > handled.
> > Quite. But if the ctlr-lun link is broken, we really should receive
> > and appropriate error code, which then could be handled in dm-rdac
> > appropriately. After all, the controller is still accessible and
> > so we don't have to guess what happened (which is all what multipath
> > is about). So I doubt we need to worry about this.
> >
> > >  2. multipath layer to decide what to do and this module to just do
> > >     what it was told.
> > Yep.
> >
> > >  3. since multipath sends pg_init only if there is any IO sent to
> the
> > >     lun, (with the current implementation) we don't have to change
> > >     ownership (back and forth) of all the luns if the user is using
> > only
> > >     a handful.
> > Well, yes. But if the implementation is such that changing ownership
> > for all LUNs is about as efficient as changing an individual LUN (or,
> > as the case might be, as _inefficient_ :-), surely it's better to
> > save us some coding here.
> >
> > >  4. to be consistent with LSI's original driver (which does one lun
> > at a
> > >     time).
> > :-/ That's unfair.
> > But it still has the drawback that it doesn't scale; given enough
> LUNs
> > and
> > access patterns you _inevitably_ have to send MODE SELECT commands
> for
> > each LUN, ie you only delay this issue.
> > Only by switching all LUNs you can avoid this.
> >
> > Cheers,
> >
> > Hannes
> > --
> > Dr. Hannes Reinecke		      zSeries & Storage
> > hare@suse.de			      +49 911 74053 688
> > SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> > GF: Markus Rex, HRB 16746 (AG Nürnberg)
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: RE: [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous
  2009-10-07 23:08         ` Moger, Babu
@ 2009-10-09  9:44           ` Hannes Reinecke
  2009-10-09 14:06             ` Moger, Babu
  0 siblings, 1 reply; 22+ messages in thread
From: Hannes Reinecke @ 2009-10-09  9:44 UTC (permalink / raw)
  To: Moger, Babu
  Cc: michaelc@cs.wisc.edu, Stankey, Robert, linux-scsi@vger.kernel.org,
	sekharan@linux.vnet.ibm.com, Dachepalli, Sudhir,
	device-mapper development, Chauhan, Vijay, Benoit_Arthur@emc.com,
	Qi, Yanling, Eddie.Williams@steeleye.com

Moger, Babu wrote:
> Hi Hannes,
> I have tested the patch you had sent. Failover works fine. 
> 
> But, we are seeing problems during the failback. It is causing continues mode-select thrashing(ping pong).
> 
> Reason for this is, handler does not know if the mode select is coming for failover or failback. Every mode select will cause movement of all the Luns. It does not matter if the LUNs are on preferred path or not. On next polling interval, multipathd will find some Luns are not on preferred path and will initiate another failback. This will result in continues ping pong. I have explained this with an EXAMPLE 1 below.
> 
> For failback to work properly, we have to have selective Lun level failover.
> 
> There is also one more Cluster scenario where we could get into thrashing with Controller Level failover. Please see the EXAMPLE 2 below.
> 
> We have been testing LUN level failover with device mapper for a while now. It is working well for us and only problem we have is slower failovers with big configurations(failover was taking about 12 minutes with 234 luns). LSI and IBM(Chandra) has been working on  asynchronous behavior for the past 3-4 months. I have tested all the patches Chandra has posted and we have seen very good results(Failover takes only 1 minute with 234 luns).
> 
> Also, these patches give the opportunity for other handlers to move to asynchronous behavior if they wish to. We need your(and Linux community) help to review the patches and move forward on this issue.
> 
> Thanks
> Babu Moger
> LSI Corporation
> 
> 
> Following are the two example where we could see mode-select thrashing.. 
> 
> EXAMPLE 1 (mode select thrashing with 2 Luns in single host)
> =======================================================
> Let's take a very simple example.
> 
> I have 2 Luns on my host. Host is seeing both the controllers with one path to each controller.
> 
> Lun 0 is owned by controller A and preferred owner is A.
> Lun 1 is owned by controller B and preferred owned is B
> 
> Here is multipath -ll output..
> 
> mpath237 (3600a0b80000f519c0000cc8a48fc7d0b) dm-4 LSI,INF-01-00
> [size=2.0G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
> \_ round-robin 0 [prio=100][active]   
>  \_ 1:0:0:0 sde 8:64  [active][ready] (controller A)
> \_ round-robin 0 [prio=0][enabled]
>  \_ 2:0:0:0 sdi 8:128 [active][ghost] (controller B)
> 
> mpath180 (3600a0b80000f519c0000cc9048fc7d7b) dm-5 LSI,INF-01-00
> [size=2.0G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
> \_ round-robin 0 [prio=100][active]   
>  \_ 2:0:0:1 sdj 8:144 [active][ready] (controller B)
> \_ round-robin 0 [prio=0][enabled]
>  \_ 1:0:0:1 sdf 8:80  [active][ghost] (controller A)
> 
> 
> 1. Run I/O on both these Luns
> 2. Pull the cable connected to controller A.
> 3. Failover will happen and Lun 0 will move to controller B. Now both the Luns are on controller B.
> 4. Connect the cable back on controller A.
> 5. multipath tool will detect the physical Luns on controller A and run the priority test.
> 6. It will find that Lun 0 is not on preferred path and will initiate a failback. 
>    Because it is a controller level failover it will move the Lun 1 also to controller A.
>    Now both the Luns are on controller A.
> 7. Multipath tool will come again and find Lun 1 not on preferred path and initiate failback.
>    This will both the Luns to controller B.
>    This will continue forever.   
> 
Hmm. Yes, correct.
After all, the patch I sent was meant to be a proof of concept, not a fully fledged
solution. (In fact, I'm quite surprised it worked so well :-)

What about modifying the LUN select code to switch all _visible_ LUNs (ie all LUNs which
are _not_ on the preferred path) in one go?
That way we wouldn't run into this issue.

> 
> 
> EXAMPLE 2: (mode select thrashing in cluster setup)
> ============================================================================
> Let's take two node cluster environment where luns are visible across multiple nodes, although any
> given lun would only be accessible via one node at a time.  If a cluster configuration were to get
> into a state where one node only has visibility to one controller while another node only has
> visibility to the alternate, a “thrashing” condition could happen.  Take this example:
> 
> •	32 luns have been mapped from the storage to all nodes.
> •	Luns 0-15 are owned by the ‘A’ controller and being accessed by node #1; luns 16-31 are owned by ‘B’ and mapped to node #2.
> •	Node #1 only has access to the ‘A’ controller; node #2 only has access to the ‘B’ controller.
> 
> Let’s say Node #1 decides to access lun 16.  Because it does not have visibility to the ‘B’ controller
> it must issue a volume transfer request.  With Controller failover solution the volume transfer request
> would also move luns 17-31.  If node #2 were accessing those luns they would receive ownership errors,
> causing a volume transfer request to move them back.  However, this also moves lun 16 from ‘A’ back to ‘B’,
> causing node #1 to do the volume transfer request again…..etc.  
> 
Again, I think this can be solved by just moving the LUNS _not_ on the preferred path.
The difference between the existing solution would be to move all LUNs not on the preferred path in
on go, instead of moving the LUNs one by one.

Will see to draw up a patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: RE: [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous
  2009-10-09  9:44           ` Hannes Reinecke
@ 2009-10-09 14:06             ` Moger, Babu
  2009-10-09 14:55               ` Hannes Reinecke
  0 siblings, 1 reply; 22+ messages in thread
From: Moger, Babu @ 2009-10-09 14:06 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: michaelc@cs.wisc.edu, Stankey, Robert, linux-scsi@vger.kernel.org,
	sekharan@linux.vnet.ibm.com, Dachepalli, Sudhir,
	device-mapper development, Chauhan, Vijay, Benoit_Arthur@emc.com,
	Qi, Yanling, Eddie.Williams@steeleye.com

Hannes, Thanks again for the response..  Please see the comments below..
 
> Moger, Babu wrote:
> > 1. Run I/O on both these Luns
> > 2. Pull the cable connected to controller A.
> > 3. Failover will happen and Lun 0 will move to controller B. Now both
> the Luns are on controller B.
> > 4. Connect the cable back on controller A.
> > 5. multipath tool will detect the physical Luns on controller A and
> run the priority test.
> > 6. It will find that Lun 0 is not on preferred path and will initiate
> a failback.
> >    Because it is a controller level failover it will move the Lun 1
> also to controller A.
> >    Now both the Luns are on controller A.
> > 7. Multipath tool will come again and find Lun 1 not on preferred
> path and initiate failback.
> >    This will both the Luns to controller B.
> >    This will continue forever.
> >
> Hmm. Yes, correct.
> After all, the patch I sent was meant to be a proof of concept, not a
> fully fledged
> solution. (In fact, I'm quite surprised it worked so well :-)
> 
> What about modifying the LUN select code to switch all _visible_ LUNs
> (ie all LUNs which
> are _not_ on the preferred path) in one go?
> That way we wouldn't run into this issue.

In this case handler should be able to do two tasks.
1. move all the Luns when one of the path failed(failover)
2. move only the luns which are not there on preferred path(failback).

With the current code rdac_activate(struct scsi_device *sdev) we cannot make that decision.  There is no information passed from dm layer to know if this command is for task 1(failover) or task 2(failback). 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: RE: [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous
  2009-10-09 14:06             ` Moger, Babu
@ 2009-10-09 14:55               ` Hannes Reinecke
  0 siblings, 0 replies; 22+ messages in thread
From: Hannes Reinecke @ 2009-10-09 14:55 UTC (permalink / raw)
  To: Moger, Babu
  Cc: michaelc@cs.wisc.edu, Stankey, Robert, linux-scsi@vger.kernel.org,
	sekharan@linux.vnet.ibm.com, Dachepalli, Sudhir,
	device-mapper development, Chauhan, Vijay, Benoit_Arthur@emc.com,
	Qi, Yanling, Eddie.Williams@steeleye.com

[-- Attachment #1: Type: text/plain, Size: 2528 bytes --]

Moger, Babu wrote:
> Hannes, Thanks again for the response..  Please see the comments below..
>  
>> Moger, Babu wrote:
>>> 1. Run I/O on both these Luns
>>> 2. Pull the cable connected to controller A.
>>> 3. Failover will happen and Lun 0 will move to controller B. Now both
>> the Luns are on controller B.
>>> 4. Connect the cable back on controller A.
>>> 5. multipath tool will detect the physical Luns on controller A and
>> run the priority test.
>>> 6. It will find that Lun 0 is not on preferred path and will initiate
>> a failback.
>>>    Because it is a controller level failover it will move the Lun 1
>> also to controller A.
>>>    Now both the Luns are on controller A.
>>> 7. Multipath tool will come again and find Lun 1 not on preferred
>> path and initiate failback.
>>>    This will both the Luns to controller B.
>>>    This will continue forever.
>>>
>> Hmm. Yes, correct.
>> After all, the patch I sent was meant to be a proof of concept, not a
>> fully fledged
>> solution. (In fact, I'm quite surprised it worked so well :-)
>>
>> What about modifying the LUN select code to switch all _visible_ LUNs
>> (ie all LUNs which
>> are _not_ on the preferred path) in one go?
>> That way we wouldn't run into this issue.
> 
> In this case handler should be able to do two tasks.
> 1. move all the Luns when one of the path failed(failover)
> 2. move only the luns which are not there on preferred path(failback).
> 
> With the current code rdac_activate(struct scsi_device *sdev) we cannot make that decision.
>  There is no information passed from dm layer to know if this command is for task 1(failover) or task 2(failback). 

Correct, but we don't need to.
We can detect the preferred path bit and use that to make this decision.
When the preferred path bit is not set for this LUN, we're switching _from_ the preferred path.
So chances are quite good that we're in failover mode and should transfer all LUNs.
When the preferred path bit is set, we're switching _to_ it and we're in failback mode.
So we should be transferring this path only; reasoning here is that failback is
triggered from the daemon and is running asynchronously anyway. So it doesn't really
matter how long this will take.

So something like the attached patch should work.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

[-- Attachment #2: rdac-failover-all-luns --]
[-- Type: text/plain, Size: 3532 bytes --]

diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c
index 0bd33ee..d163cd8 100644
--- a/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -42,6 +42,7 @@
 /*
  * Controller modes definitions
  */
+#define RDAC_MODE_TRANSFER_ALL_VISIBLE_LUNS	0x01
 #define RDAC_MODE_TRANSFER_SPECIFIED_LUNS	0x02
 
 /*
@@ -178,6 +179,7 @@ struct rdac_dh_data {
 #define RDAC_LUN_UNOWNED	0
 #define RDAC_LUN_OWNED		1
 #define RDAC_LUN_AVT		2
+#define RDAC_LUN_PREF		4
 	char			lun_state;
 	unsigned char		sense[SCSI_SENSE_BUFFERSIZE];
 	union			{
@@ -192,7 +194,12 @@ static const char *lun_state[] =
 {
 	"unowned",
 	"owned",
-	"owned (AVT mode)",
+	"unowned (AVT)",
+	"owned (AVT)",
+	"unowned (preferred)",
+	"owned (preferred)",
+	"unowned (preferred,AVT)",
+	"owned (preferred,AVT)",
 };
 
 static LIST_HEAD(ctlr_list);
@@ -253,7 +260,8 @@ static struct request *rdac_failover_get(struct scsi_device *sdev,
 		rdac_pg->subpage_code = 0x1;
 		rdac_pg->page_len[0] = 0x01;
 		rdac_pg->page_len[1] = 0x28;
-		rdac_pg->lun_table[h->lun] = 0x81;
+		if (h->lun->state & RDAC_LUN_PREF)
+			rdac_pg->lun_table[h->lun] = 0x81;
 	} else {
 		struct rdac_pg_legacy *rdac_pg;
 
@@ -263,9 +271,16 @@ static struct request *rdac_failover_get(struct scsi_device *sdev,
 		common = &rdac_pg->common;
 		rdac_pg->page_code = RDAC_PAGE_CODE_REDUNDANT_CONTROLLER;
 		rdac_pg->page_len = 0x68;
-		rdac_pg->lun_table[h->lun] = 0x81;
+		if (h->lun_state & RDAC_LUN_PREF)
+			rdac_pg->lun_table[h->lun] = 0x81;
 	}
-	common->rdac_mode[1] = RDAC_MODE_TRANSFER_SPECIFIED_LUNS;
+	if (h->lun_state & RDAC_LUN_PREF)
+		/* Failback mode; swith this LUN only */
+		common->rdac_mode[1] = RDAC_MODE_TRANSFER_SPECIFIED_LUNS;
+	else
+		/* Failover mode; switch all LUNs */
+		common->rdac_mode[1] = RDAC_MODE_TRANSFER_ALL_VISIBLE_LUNS;
+
 	common->quiescence_timeout = RDAC_QUIESCENCE_TIME;
 	common->rdac_options = RDAC_FORCED_QUIESENCE;
 
@@ -326,6 +341,7 @@ static struct rdac_controller *get_controller(u8 *subsys_id, u8 *slot_id)
 	memcpy(ctlr->slot_id, slot_id, SLOT_ID_LEN);
 	kref_init(&ctlr->kref);
 	ctlr->use_ms10 = -1;
+	ctlr->transfer_all_luns = 1;
 	list_add(&ctlr->node, &ctlr_list);
 done:
 	spin_unlock(&list_lock);
@@ -391,19 +407,24 @@ static int check_ownership(struct scsi_device *sdev, struct rdac_dh_data *h)
 	err = submit_inquiry(sdev, 0xC9, sizeof(struct c9_inquiry), h);
 	if (err == SCSI_DH_OK) {
 		inqp = &h->inq.c9;
+		if ((inqp->avte_cvp & 0x1) != 0) {
+			/* LUN was owned by the controller */
+			h->lun_state = RDAC_LUN_OWNED;
+		}
 		if ((inqp->avte_cvp >> 7) == 0x1) {
 			/* LUN in AVT mode */
 			sdev_printk(KERN_NOTICE, sdev,
 				    "%s: AVT mode detected\n",
 				    RDAC_NAME);
-			h->lun_state = RDAC_LUN_AVT;
-		} else if ((inqp->avte_cvp & 0x1) != 0) {
-			/* LUN was owned by the controller */
-			h->lun_state = RDAC_LUN_OWNED;
+			h->lun_state |= RDAC_LUN_AVT;
+		}
+		if ((inqp->path_prio && 0xf) == 0x1) {
+			/* LUN access through preferred path */
+			h->lun_state |= RDAC_LUN_PREF;
 		}
 	}
 
-	if (h->lun_state == RDAC_LUN_UNOWNED)
+	if ((h->lun_state & RDAC_LUN_OWNED) == 0)
 		h->state = RDAC_STATE_PASSIVE;
 
 	return err;
@@ -537,7 +558,8 @@ static int rdac_activate(struct scsi_device *sdev)
 		if (err != SCSI_DH_OK)
 			goto done;
 	}
-	if (h->lun_state == RDAC_LUN_UNOWNED)
+	if ((h->lun_state & RDAC_LUN_OWNED) == 0 &&
+	    (h->lun_state & RDAC_LUN_AVT) == 0 )
 		err = send_mode_select(sdev, h);
 done:
 	return err;

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous
@ 2009-10-21 16:22 Chandra Seetharaman
  0 siblings, 0 replies; 22+ messages in thread
From: Chandra Seetharaman @ 2009-10-21 16:22 UTC (permalink / raw)
  To: linux-scsi
  Cc: dm-devel, babu.moger, michaelc, Benoit_Arthur, Eddie.Williams,
	berthiaume_wayne, Chandra Seetharaman

Hello All,

Currently, device handlers process path activation in series. This leads
to a lot of time delay when more than 100 luns are involved. For example,
with lsi rdac 100+ luns take about 12-15 minutes. This was found by Moger
Babu of LSI.

This time delay can be avoided if we can do the activations asynchronously.
By making scsi_dh_activate() async, we can give the device handlers an
oppurtunity to decide on how to send the device activation down the wire
to make the turn around time faster. They can send the commands
asynchronously or send them in batches.

I brought up this issue on the mailing list
(http://marc.info/?l=linux-scsi&m=123888063818755&w=2) and provided a
set of patch a loooong while back http://marc.info/?l=linux-scsi&m=124088712709821&w=2

I posted another set of patches against 2.6.31 on 2009/09/29:
http://marc.info/?l=linux-scsi&m=125427648406736&w=2

This set of patches applies cleanly on 2.6.32-rc5 and I tested the rdac handler
and alua handler on the same.

Please review and provide comments.

This set of patched adds asynchronous support for rdac, hp, and alua handlers.

Didn't add async support for EMC for the following reasons:
 - EMC handler is getting more hairy (when I try to move to _nowait())
 - I do not know if EMC has a special command to group them (like rdac).
 - I wasn't sure if EMC has this problem in the first place ( I did not get
   any response from EMC).

Can somebody from EMC comment, please.

Thanks & Regards,

chandra


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2009-10-21 16:22 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-09-30  2:08 [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous Chandra Seetharaman
2009-09-30  2:08 ` [PATCH 1/4] scsi_dh: Change the scsidh_activate interface to be asynchronous Chandra Seetharaman
2009-10-02 22:04   ` Moger, Babu
2009-10-02 22:36     ` Chandra Seetharaman
2009-10-02 23:02       ` Moger, Babu
2009-09-30  2:08 ` [PATCH 2/4] scsi_dh: rdac handler: Make rdac hardware handler async Chandra Seetharaman
2009-10-02  0:03   ` Moger, Babu
2009-10-02  0:29     ` Chandra Seetharaman
2009-09-30  2:08 ` [PATCH 3/4] scsi_dh: rdac handler: Make hp " Chandra Seetharaman
2009-09-30  2:08 ` [PATCH 4/4] scsi_dh: rdac handler: Make alua " Chandra Seetharaman
2009-10-01  4:19 ` [PATCH 0/4] scsi_dh: Make scsi_dh_activate asynchronous Moger, Babu
2009-10-01 20:54   ` Chandra Seetharaman
2009-10-05 13:01 ` Hannes Reinecke
2009-10-05 14:35   ` Hannes Reinecke
2009-10-05 23:25   ` Chandra Seetharaman
2009-10-06  8:08     ` Hannes Reinecke
2009-10-06 19:46       ` Moger, Babu
2009-10-07 23:08         ` Moger, Babu
2009-10-09  9:44           ` Hannes Reinecke
2009-10-09 14:06             ` Moger, Babu
2009-10-09 14:55               ` Hannes Reinecke
  -- strict thread matches above, loose matches on Subject: below --
2009-10-21 16:22 Chandra Seetharaman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).