linux-scsi.vger.kernel.org archive mirror
* [PATCH 00/20, v4] Make ib_srp better suited for H.A. purposes
@ 2012-08-09 15:41 Bart Van Assche
  2012-08-09 15:57 ` [PATCH 14/20] srp_transport: Simplify attribute initialization code Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Bart Van Assche @ 2012-08-09 15:41 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-scsi,
	David Dillow

This patch series makes the ib_srp driver better suited for use in an
H.A. setup because:
- multipathd is notified faster about transport layer failures.
- Transport layer failures reliably result in a reconnect.
- Switchover can be triggered explicitly by deleting an initiator
  device.
- Disconnecting from a target without unloading ib_srp is now possible.

Changes since v3:
- Restored the dev_loss_tmo and fast_io_fail_tmo sysfs attributes.
- Included a patch to fix an ib_srp crash that could be triggered by
  cable pulling.

Changes since v2:
- Addressed the v2 review comments.
- Dropped the patches that have already been merged.
- Dropped the patches for integration with multipathd.
- Dropped the micro-optimization of the IB completion handlers.

The individual patches are:
0001-ib_srp-Fix-a-race-condition.patch
0002-ib_srp-Enlarge-block-layer-timeout.patch
0003-ib_srp-Move-QP-state-check-into-srp_send_tsk_mgmt.patch
0004-ib_srp-Stop-queueing-if-QP-in-error.patch
0005-ib_srp-Eliminate-state-SRP_TARGET_CONNECTING.patch
0006-ib_srp-Suppress-superfluous-error-messages.patch
0007-ib_srp-Avoid-that-SCSI-error-handling-triggers-a-cra.patch
0008-ib_srp-Introduce-the-helper-function-srp_remove_targ.patch
0009-ib_srp-Eliminate-state-SRP_TARGET_DEAD.patch
0010-ib_srp-Keep-processing-commands-during-scsi_remove_h.patch
0011-ib_srp-Make-srp_disconnect_target-wait-for-IB-comple.patch
0012-ib_srp-Document-sysfs-attributes.patch
0013-srp_transport-Fix-atttribute-registration.patch
0014-srp_transport-Simplify-attribute-initialization-code.patch
0015-srp_transport-Document-sysfs-attributes.patch
0016-ib_srp-Allow-SRP-disconnect-through-sysfs.patch
0017-ib_srp-Introduce-a-temporary-variable-in-srp_remove_.patch
0018-ib_srp-Maintain-a-single-connection-per-I_T-nexus.patch
0019-srp_transport-Add-transport-layer-error-handling.patch
0020-ib_srp-Add-dev_loss_tmo-support.patch

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 13/20] srp_transport: Fix attribute registration
       [not found] ` <5023DA39.7020000-HInyCGIudOg@public.gmane.org>
@ 2012-08-09 15:56   ` Bart Van Assche
  2012-08-09 15:58   ` [PATCH 15/20] srp_transport: Document sysfs attributes Bart Van Assche
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Bart Van Assche @ 2012-08-09 15:56 UTC (permalink / raw)
  To: Robert Jennings
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-scsi,
	David Dillow, Roland Dreier, FUJITA Tomonori

Register the transport attributes after the attribute array has been
set up instead of before. The current code can trigger a race
condition because the code that reads the attribute array can run
on a different thread than the code that initializes it.
Make sure that any code reading the attribute array sees all
values written into that array.
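
The ordering argument above can be illustrated outside the kernel. What
follows is only a user-space analogy in C11, not the transport class code:
attr_table, published_table, attach_table() and visible_attr_count() are
hypothetical names, and the release/acquire pair stands in for whatever
ordering the real registration path provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_ATTRS 2

/* Hypothetical stand-in for the NULL-terminated attribute array. */
struct attr_table {
	const char *attrs[MAX_ATTRS + 1];
};

/* Readers only ever reach the table through this pointer. */
static _Atomic(struct attr_table *) published_table;

/* Fill in every slot first, then make the table reachable. */
static void attach_table(struct attr_table *t)
{
	size_t count = 0;

	t->attrs[count++] = "port_id";
	t->attrs[count++] = "roles";
	assert(count <= MAX_ATTRS);
	t->attrs[count] = NULL;

	/* Release: the stores above are visible before the pointer is. */
	atomic_store_explicit(&published_table, t, memory_order_release);
}

/* Number of attributes a reader can see; 0 while nothing is published. */
static size_t visible_attr_count(void)
{
	struct attr_table *t =
		atomic_load_explicit(&published_table, memory_order_acquire);
	size_t n = 0;

	if (!t)
		return 0;
	while (t->attrs[n])
		n++;
	return n;
}
```

A reader that runs before attach_table() sees no table at all, and a reader
that runs afterwards sees the complete, terminated array. It can never
observe a half-filled one, which is the property this patch is after.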

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: FUJITA Tomonori <fujita.tomonori-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
Cc: Robert Jennings <rcj-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: <stable-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/scsi/scsi_transport_srp.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index 21a045e..07c4394 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -324,13 +324,14 @@ srp_attach_transport(struct srp_function_template *ft)
 	i->rport_attr_cont.ac.attrs = &i->rport_attrs[0];
 	i->rport_attr_cont.ac.class = &srp_rport_class.class;
 	i->rport_attr_cont.ac.match = srp_rport_match;
-	transport_container_register(&i->rport_attr_cont);
 
 	count = 0;
 	SETUP_RPORT_ATTRIBUTE_RD(port_id);
 	SETUP_RPORT_ATTRIBUTE_RD(roles);
 	i->rport_attrs[count] = NULL;
 
+	transport_container_register(&i->rport_attr_cont);
+
 	i->f = ft;
 
 	return &i->t;
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 14/20] srp_transport: Simplify attribute initialization code
  2012-08-09 15:41 [PATCH 00/20, v4] Make ib_srp better suited for H.A. purposes Bart Van Assche
@ 2012-08-09 15:57 ` Bart Van Assche
       [not found] ` <5023DA39.7020000-HInyCGIudOg@public.gmane.org>
  2012-08-09 16:18 ` Bart Van Assche
  2 siblings, 0 replies; 16+ messages in thread
From: Bart Van Assche @ 2012-08-09 15:57 UTC (permalink / raw)
  To: Robert Jennings
  Cc: linux-rdma@vger.kernel.org, linux-scsi, David Dillow,
	Roland Dreier, FUJITA Tomonori

Eliminate the private_rport_attrs[] array and the SETUP_*() macros
used to set up that array since the information in that array
duplicates the information in the static device attributes. Also,
verify whether SRP_RPORT_ATTRS is large enough since it is easy
to forget to update that macro when adding new attributes.
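
As a rough user-space illustration of the simplified setup and the size
check (a sketch only; struct attr, struct internal, RPORT_ATTRS and
setup_rport_attrs() are hypothetical stand-ins, and assert() plays the
role of BUG_ON()):

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define RPORT_ATTRS 2	/* hypothetical analogue of SRP_RPORT_ATTRS */

struct attr {
	const char *name;
};

/* Static attributes, referenced directly instead of being copied. */
static struct attr attr_port_id = { "port_id" };
static struct attr attr_roles = { "roles" };

struct internal {
	struct attr *rport_attrs[RPORT_ATTRS + 1];	/* +1 for NULL */
};

/* Returns the number of slots used, terminator included. */
static size_t setup_rport_attrs(struct internal *i)
{
	size_t count = 0;

	i->rport_attrs[count++] = &attr_port_id;
	i->rport_attrs[count++] = &attr_roles;
	i->rport_attrs[count++] = NULL;
	/* Fires if an attribute was added without bumping RPORT_ATTRS. */
	assert(count <= ARRAY_SIZE(i->rport_attrs));
	return count;
}
```

Note that the check runs after the stores, so it flags a stale size macro
during development rather than preventing the out-of-bounds write itself.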

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Cc: David Dillow <dillowda@ornl.gov>
Cc: Roland Dreier <roland@purestorage.com>
---
 drivers/scsi/scsi_transport_srp.c |   26 ++++----------------------
 1 files changed, 4 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index 07c4394..0d85f79 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -47,7 +47,6 @@ struct srp_internal {
 	struct device_attribute *host_attrs[SRP_HOST_ATTRS + 1];
 
 	struct device_attribute *rport_attrs[SRP_RPORT_ATTRS + 1];
-	struct device_attribute private_rport_attrs[SRP_RPORT_ATTRS];
 	struct transport_container rport_attr_cont;
 };
 
@@ -72,24 +71,6 @@ static DECLARE_TRANSPORT_CLASS(srp_host_class, "srp_host", srp_host_setup,
 static DECLARE_TRANSPORT_CLASS(srp_rport_class, "srp_remote_ports",
 			       NULL, NULL, NULL);
 
-#define SETUP_TEMPLATE(attrb, field, perm, test, ro_test, ro_perm)	\
-	i->private_##attrb[count] = dev_attr_##field;		\
-	i->private_##attrb[count].attr.mode = perm;			\
-	if (ro_test) {							\
-		i->private_##attrb[count].attr.mode = ro_perm;		\
-		i->private_##attrb[count].store = NULL;			\
-	}								\
-	i->attrb[count] = &i->private_##attrb[count];			\
-	if (test)							\
-		count++
-
-#define SETUP_RPORT_ATTRIBUTE_RD(field)					\
-	SETUP_TEMPLATE(rport_attrs, field, S_IRUGO, 1, 0, 0)
-
-#define SETUP_RPORT_ATTRIBUTE_RW(field)					\
-	SETUP_TEMPLATE(rport_attrs, field, S_IRUGO | S_IWUSR,		\
-		       1, 1, S_IRUGO)
-
 #define SRP_PID(p) \
 	(p)->port_id[0], (p)->port_id[1], (p)->port_id[2], (p)->port_id[3], \
 	(p)->port_id[4], (p)->port_id[5], (p)->port_id[6], (p)->port_id[7], \
@@ -326,9 +307,10 @@ srp_attach_transport(struct srp_function_template *ft)
 	i->rport_attr_cont.ac.match = srp_rport_match;
 
 	count = 0;
-	SETUP_RPORT_ATTRIBUTE_RD(port_id);
-	SETUP_RPORT_ATTRIBUTE_RD(roles);
-	i->rport_attrs[count] = NULL;
+	i->rport_attrs[count++] = &dev_attr_port_id;
+	i->rport_attrs[count++] = &dev_attr_roles;
+	i->rport_attrs[count++] = NULL;
+	BUG_ON(count > ARRAY_SIZE(i->rport_attrs));
 
 	transport_container_register(&i->rport_attr_cont);
 
-- 
1.7.7


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 15/20] srp_transport: Document sysfs attributes
       [not found] ` <5023DA39.7020000-HInyCGIudOg@public.gmane.org>
  2012-08-09 15:56   ` [PATCH 13/20] srp_transport: Fix attribute registration Bart Van Assche
@ 2012-08-09 15:58   ` Bart Van Assche
  2012-08-09 15:59   ` [PATCH 16/20] ib_srp: Allow SRP disconnect through sysfs Bart Van Assche
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Bart Van Assche @ 2012-08-09 15:58 UTC (permalink / raw)
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-scsi,
	David Dillow, Roland Dreier, FUJITA Tomonori, Robert Jennings

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: FUJITA Tomonori <fujita.tomonori-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
Cc: Robert Jennings <rcj-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
---
 Documentation/ABI/stable/sysfs-transport-srp |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/ABI/stable/sysfs-transport-srp

diff --git a/Documentation/ABI/stable/sysfs-transport-srp b/Documentation/ABI/stable/sysfs-transport-srp
new file mode 100644
index 0000000..7b0d4a5
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-transport-srp
@@ -0,0 +1,12 @@
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/port_id
+Date:		June 27, 2007
+KernelVersion:	2.6.24
+Contact:	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	16-byte local SRP port identifier in hexadecimal format. An
+		example: 4c:49:4e:55:58:20:56:49:4f:00:00:00:00:00:00:00.
+
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/roles
+Date:		June 27, 2007
+KernelVersion:	2.6.24
+Contact:	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	Role of the remote port. Either "SRP Initiator" or "SRP Target".
-- 
1.7.7

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 16/20] ib_srp: Allow SRP disconnect through sysfs
       [not found] ` <5023DA39.7020000-HInyCGIudOg@public.gmane.org>
  2012-08-09 15:56   ` [PATCH 13/20] srp_transport: Fix attribute registration Bart Van Assche
  2012-08-09 15:58   ` [PATCH 15/20] srp_transport: Document sysfs attributes Bart Van Assche
@ 2012-08-09 15:59   ` Bart Van Assche
  2012-08-09 16:02   ` [PATCH 19/20] srp_transport: Add transport layer error handling Bart Van Assche
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Bart Van Assche @ 2012-08-09 15:59 UTC (permalink / raw)
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-scsi,
	David Dillow, Roland Dreier, FUJITA Tomonori, Robert Jennings

Make it possible to disconnect the IB RC connection used by the
SRP protocol to communicate with a target.

Let the SRP transport layer create a sysfs "delete" attribute for
initiator drivers that support this functionality.
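
The optional-callback pattern described above can be sketched in plain C.
This is a user-space illustration under assumed names (struct rport,
struct function_template, store_delete() and demo_rport_delete() are all
hypothetical), not the transport class code itself:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical remote port; 'deleted' records that removal was requested. */
struct rport {
	int deleted;
};

/* Hypothetical function template; the callback is optional. */
struct function_template {
	void (*rport_delete)(struct rport *rport);
};

/*
 * Model of a sysfs store handler: forward the request to the driver if it
 * implements the callback, otherwise report that it is not supported.
 * Returns the number of bytes consumed or a negative errno value.
 */
static long store_delete(const struct function_template *ft,
			 struct rport *rport, size_t count)
{
	if (!ft->rport_delete)
		return -ENOSYS;
	ft->rport_delete(rport);
	return (long)count;
}

/* Example driver callback: only marks the port as deleted. */
static void demo_rport_delete(struct rport *rport)
{
	rport->deleted = 1;
}
```

In the patch below the attribute is only created when the driver supplies
the callback, so the -ENOSYS branch is a second line of defense.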

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: FUJITA Tomonori <fujita.tomonori-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
Cc: Robert Jennings <rcj-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 Documentation/ABI/stable/sysfs-transport-srp |    7 +++++++
 drivers/infiniband/ulp/srp/ib_srp.c          |   10 ++++++++++
 drivers/scsi/scsi_transport_srp.c            |   22 +++++++++++++++++++++-
 include/scsi/scsi_transport_srp.h            |    8 ++++++++
 4 files changed, 46 insertions(+), 1 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-transport-srp b/Documentation/ABI/stable/sysfs-transport-srp
index 7b0d4a5..b36fb0d 100644
--- a/Documentation/ABI/stable/sysfs-transport-srp
+++ b/Documentation/ABI/stable/sysfs-transport-srp
@@ -1,3 +1,10 @@
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/delete
+Date:		June 1, 2012
+KernelVersion:	3.7
+Contact:	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	Instructs an SRP initiator to disconnect from a target and to
+		remove all LUNs imported from that target.
+
 What:		/sys/class/srp_remote_ports/port-<h>:<n>/port_id
 Date:		June 27, 2007
 KernelVersion:	2.6.24
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 4de7c46..d90100e 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -609,6 +609,13 @@ static void srp_remove_work(struct work_struct *work)
 	srp_remove_target(target);
 }
 
+static void srp_rport_delete(struct srp_rport *rport)
+{
+	struct srp_target_port *target = rport->lld_data;
+
+	srp_queue_remove_work(target);
+}
+
 static int srp_connect_target(struct srp_target_port *target)
 {
 	int retries = 3;
@@ -2029,6 +2036,8 @@ static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
 		return PTR_ERR(rport);
 	}
 
+	rport->lld_data = target;
+
 	spin_lock(&host->target_lock);
 	list_add_tail(&target->list, &host->target_list);
 	spin_unlock(&host->target_lock);
@@ -2596,6 +2605,7 @@ static void srp_remove_one(struct ib_device *device)
 }
 
 static struct srp_function_template ib_srp_transport_functions = {
+	.rport_delete		 = srp_rport_delete,
 };
 
 static int __init srp_init_module(void)
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index 0d85f79..f379c7f 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -38,7 +38,7 @@ struct srp_host_attrs {
 #define to_srp_host_attrs(host)	((struct srp_host_attrs *)(host)->shost_data)
 
 #define SRP_HOST_ATTRS 0
-#define SRP_RPORT_ATTRS 2
+#define SRP_RPORT_ATTRS 3
 
 struct srp_internal {
 	struct scsi_transport_template t;
@@ -116,6 +116,24 @@ show_srp_rport_roles(struct device *dev, struct device_attribute *attr,
 
 static DEVICE_ATTR(roles, S_IRUGO, show_srp_rport_roles, NULL);
 
+static ssize_t store_srp_rport_delete(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t count)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+	struct Scsi_Host *shost = dev_to_shost(dev);
+	struct srp_internal *i = to_srp_internal(shost->transportt);
+
+	if (i->f->rport_delete) {
+		i->f->rport_delete(rport);
+		return count;
+	} else {
+		return -ENOSYS;
+	}
+}
+
+static DEVICE_ATTR(delete, S_IWUSR, NULL, store_srp_rport_delete);
+
 static void srp_rport_release(struct device *dev)
 {
 	struct srp_rport *rport = dev_to_rport(dev);
@@ -309,6 +327,8 @@ srp_attach_transport(struct srp_function_template *ft)
 	count = 0;
 	i->rport_attrs[count++] = &dev_attr_port_id;
 	i->rport_attrs[count++] = &dev_attr_roles;
+	if (ft->rport_delete)
+		i->rport_attrs[count++] = &dev_attr_delete;
 	i->rport_attrs[count++] = NULL;
 	BUG_ON(count > ARRAY_SIZE(i->rport_attrs));
 
diff --git a/include/scsi/scsi_transport_srp.h b/include/scsi/scsi_transport_srp.h
index 9c60ca1..ff0f04a 100644
--- a/include/scsi/scsi_transport_srp.h
+++ b/include/scsi/scsi_transport_srp.h
@@ -14,13 +14,21 @@ struct srp_rport_identifiers {
 };
 
 struct srp_rport {
+	/* for initiator and target drivers */
+
 	struct device dev;
 
 	u8 port_id[16];
 	u8 roles;
+
+	/* for initiator drivers */
+
+	void *lld_data;	/* LLD private data */
 };
 
 struct srp_function_template {
+	/* for initiator drivers */
+	void (*rport_delete)(struct srp_rport *rport);
 	/* for target drivers */
 	int (* tsk_mgmt_response)(struct Scsi_Host *, u64, u64, int);
 	int (* it_nexus_response)(struct Scsi_Host *, u64, int);
-- 
1.7.7

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 19/20] srp_transport: Add transport layer error handling
       [not found] ` <5023DA39.7020000-HInyCGIudOg@public.gmane.org>
                     ` (2 preceding siblings ...)
  2012-08-09 15:59   ` [PATCH 16/20] ib_srp: Allow SRP disconnect through sysfs Bart Van Assche
@ 2012-08-09 16:02   ` Bart Van Assche
  2012-08-27 18:37   ` [PATCH 00/20, v4] Make ib_srp better suited for H.A. purposes Dongsu Park
  2012-09-25 15:05   ` Bart Van Assche
  5 siblings, 0 replies; 16+ messages in thread
From: Bart Van Assche @ 2012-08-09 16:02 UTC (permalink / raw)
  To: Robert Jennings
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-scsi,
	David Dillow, FUJITA Tomonori

Add the necessary functions in the SRP transport module to allow
an SRP initiator driver to implement transport error handling.
This includes:
- Support for implementing fast_io_fail_tmo, the time that should
  elapse after having detected a transport layer problem and
  before failing I/O.
- Support for implementing dev_loss_tmo, the time that should
  elapse after having detected a transport layer problem and
  before removing a remote port.
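
The relation between the two timeouts that this patch enforces can be
spelled out as a small stand-alone function. This is a sketch, not the
kernel code: DEMO_BLOCK_MAX_TIMEOUT and DEMO_HZ are assumed example values
standing in for SCSI_DEVICE_BLOCK_MAX_TIMEOUT and HZ, and tmo_valid() is a
hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define DEMO_BLOCK_MAX_TIMEOUT	600u	/* assumed limit, in seconds */
#define DEMO_HZ			1000ul	/* assumed ticks per second */

/*
 * Returns 0 if the combination is acceptable and -EINVAL otherwise.
 * A negative fast_io_fail_tmo means that fast I/O failure is disabled.
 */
static int tmo_valid(int fast_io_fail_tmo, unsigned int dev_loss_tmo)
{
	if (fast_io_fail_tmo < 0)
		return dev_loss_tmo <= DEMO_BLOCK_MAX_TIMEOUT ? 0 : -EINVAL;

	/* I/O must fail before the port is removed, without overflow. */
	if ((unsigned int)fast_io_fail_tmo < dev_loss_tmo &&
	    dev_loss_tmo < ULONG_MAX / DEMO_HZ)
		return 0;

	return -EINVAL;
}
```

With these assumed values, (-1, 60) and (5, 60) are accepted, while
(-1, 601) and (60, 60) are rejected.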

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: FUJITA Tomonori <fujita.tomonori-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
Cc: Robert Jennings <rcj-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
---
 Documentation/ABI/stable/sysfs-transport-srp |   17 +++
 drivers/scsi/scsi_transport_srp.c            |  199 +++++++++++++++++++++++++-
 include/scsi/scsi_transport_srp.h            |   11 ++-
 3 files changed, 224 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-transport-srp b/Documentation/ABI/stable/sysfs-transport-srp
index b36fb0d..2f14a5b 100644
--- a/Documentation/ABI/stable/sysfs-transport-srp
+++ b/Documentation/ABI/stable/sysfs-transport-srp
@@ -5,6 +5,23 @@ Contact:	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
 Description:	Instructs an SRP initiator to disconnect from a target and to
 		remove all LUNs imported from that target.
 
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/dev_loss_tmo
+Date:		January 1, 2012
+KernelVersion:	3.7
+Contact:	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	Number of seconds the SCSI layer will wait after a transport
+		layer error has been observed before removing a target port.
+		Zero means immediate removal.
+
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/fast_io_fail_tmo
+Date:		January 1, 2012
+KernelVersion:	3.7
+Contact:	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	Number of seconds the SCSI layer will wait after a transport
+		layer error has been observed before failing I/O. Zero means
+		failing I/O immediately. A negative value or "off" disables
+		this behavior.
+
 What:		/sys/class/srp_remote_ports/port-<h>:<n>/port_id
 Date:		June 27, 2007
 KernelVersion:	2.6.24
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index f379c7f..965a91f 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -2,6 +2,7 @@
  * SCSI RDMA (SRP) transport class
  *
  * Copyright (C) 2007 FUJITA Tomonori <tomof-HInyCGIudOg@public.gmane.org>
+ * Copyright (C) 2012 Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License as
@@ -30,6 +31,7 @@
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_transport.h>
 #include <scsi/scsi_transport_srp.h>
+#include "scsi_priv.h"
 #include "scsi_transport_srp_internal.h"
 
 struct srp_host_attrs {
@@ -38,7 +40,7 @@ struct srp_host_attrs {
 #define to_srp_host_attrs(host)	((struct srp_host_attrs *)(host)->shost_data)
 
 #define SRP_HOST_ATTRS 0
-#define SRP_RPORT_ATTRS 3
+#define SRP_RPORT_ATTRS 5
 
 struct srp_internal {
 	struct scsi_transport_template t;
@@ -54,6 +56,10 @@ struct srp_internal {
 
 #define	dev_to_rport(d)	container_of(d, struct srp_rport, dev)
 #define transport_class_to_srp_rport(dev) dev_to_rport((dev)->parent)
+static inline struct Scsi_Host *rport_to_shost(struct srp_rport *r)
+{
+	return dev_to_shost(r->dev.parent);
+}
 
 static int srp_host_setup(struct transport_container *tc, struct device *dev,
 			  struct device *cdev)
@@ -134,6 +140,186 @@ static ssize_t store_srp_rport_delete(struct device *dev,
 
 static DEVICE_ATTR(delete, S_IWUSR, NULL, store_srp_rport_delete);
 
+/**
+ * srp_tmo_valid() - Check timeout combination validity.
+ *
+ * If no fast I/O fail timeout has been configured then the device loss timeout
+ * must not exceed SCSI_DEVICE_BLOCK_MAX_TIMEOUT. If a fast I/O fail timeout has
+ * been configured then it must be below the device loss timeout.
+ */
+static int srp_tmo_valid(int fast_io_fail_tmo, unsigned dev_loss_tmo)
+{
+	return (fast_io_fail_tmo < 0 &&
+		dev_loss_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
+		|| (0 <= fast_io_fail_tmo &&
+		    fast_io_fail_tmo < dev_loss_tmo &&
+		    dev_loss_tmo < ULONG_MAX / HZ) ? 0 : -EINVAL;
+}
+
+static ssize_t show_srp_rport_fast_io_fail_tmo(struct device *dev,
+					   struct device_attribute *attr,
+					   char *buf)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+
+	if (rport->fast_io_fail_tmo >= 0)
+		return sprintf(buf, "%d\n", rport->fast_io_fail_tmo);
+	else
+		return sprintf(buf, "off\n");
+}
+
+static ssize_t store_srp_rport_fast_io_fail_tmo(struct device *dev,
+					    struct device_attribute *attr,
+					    const char *buf, size_t count)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+	char ch[16];
+	int res;
+	int fast_io_fail_tmo;
+
+	if (count >= 3 && memcmp(buf, "off", 3) == 0) {
+		fast_io_fail_tmo = -1;
+	} else {
+		sprintf(ch, "%.*s", min_t(int, sizeof(ch) - 1, count), buf);
+		res = kstrtoint(ch, 0, &fast_io_fail_tmo);
+		if (res)
+			goto out;
+	}
+	res = srp_tmo_valid(fast_io_fail_tmo, rport->dev_loss_tmo);
+	if (res)
+		goto out;
+	rport->fast_io_fail_tmo = fast_io_fail_tmo;
+	res = count;
+out:
+	return res;
+}
+
+static DEVICE_ATTR(fast_io_fail_tmo, S_IRUGO | S_IWUSR,
+		   show_srp_rport_fast_io_fail_tmo,
+		   store_srp_rport_fast_io_fail_tmo);
+
+static ssize_t show_srp_rport_dev_loss_tmo(struct device *dev,
+					   struct device_attribute *attr,
+					   char *buf)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+
+	return sprintf(buf, "%u\n", rport->dev_loss_tmo);
+}
+
+static ssize_t store_srp_rport_dev_loss_tmo(struct device *dev,
+					    struct device_attribute *attr,
+					    const char *buf, size_t count)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+	char ch[16];
+	int res;
+	unsigned dev_loss_tmo;
+
+	sprintf(ch, "%.*s", min_t(int, sizeof(ch) - 1, count), buf);
+	res = kstrtouint(ch, 0, &dev_loss_tmo);
+	if (res)
+		goto out;
+	res = srp_tmo_valid(rport->fast_io_fail_tmo, dev_loss_tmo);
+	if (res)
+		goto out;
+	rport->dev_loss_tmo = dev_loss_tmo;
+	res = count;
+out:
+	return res;
+}
+
+static DEVICE_ATTR(dev_loss_tmo, S_IRUGO | S_IWUSR,
+		   show_srp_rport_dev_loss_tmo,
+		   store_srp_rport_dev_loss_tmo);
+
+/**
+ * rport_fast_io_fail_timedout() - Fast I/O failure timeout handler.
+ *
+ * Unblocks the SCSI host.
+ */
+static void rport_fast_io_fail_timedout(struct work_struct *work)
+{
+	struct srp_rport *rport =
+		container_of(to_delayed_work(work), struct srp_rport,
+			     fast_io_fail_work);
+	struct Scsi_Host *shost;
+	struct srp_internal *i;
+
+	pr_err("SRP transport: fast_io_fail_tmo (%ds) expired - unblocking %s.\n",
+	       rport->fast_io_fail_tmo, dev_name(&rport->dev));
+
+	shost = rport_to_shost(rport);
+	i = to_srp_internal(shost->transportt);
+	/* Involve the LLDD if possible to terminate all io on the rport. */
+	if (i->f->terminate_rport_io)
+		i->f->terminate_rport_io(rport);
+
+	scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);
+}
+
+/**
+ * rport_dev_loss_timedout() - Device loss timeout handler.
+ *
+ * Note: the rport_delete callback must either unblock the SCSI host or schedule
+ * SCSI host removal.
+ */
+static void rport_dev_loss_timedout(struct work_struct *work)
+{
+	struct srp_rport *rport =
+		container_of(to_delayed_work(work), struct srp_rport,
+			     dev_loss_work);
+	struct Scsi_Host *shost;
+	struct srp_internal *i;
+
+	pr_err("SRP transport: dev_loss_tmo (%us) expired - removing %s.\n",
+	       rport->dev_loss_tmo, dev_name(&rport->dev));
+
+	shost = rport_to_shost(rport);
+	i = to_srp_internal(shost->transportt);
+	BUG_ON(!i->f);
+	BUG_ON(!i->f->rport_delete);
+
+	i->f->rport_delete(rport);
+}
+
+/**
+ * srp_start_tl_fail_timers() - Start the transport layer failure timers.
+ *
+ * Start the transport layer fast I/O failure and device loss timers. Do not
+ * modify a timer that was already started.
+ */
+void srp_start_tl_fail_timers(struct srp_rport *rport)
+{
+	if (rport->fast_io_fail_tmo >= 0)
+		queue_delayed_work(system_long_wq, &rport->fast_io_fail_work,
+				   1UL * rport->fast_io_fail_tmo * HZ);
+	queue_delayed_work(system_long_wq, &rport->dev_loss_work,
+			   1UL * rport->dev_loss_tmo * HZ);
+}
+EXPORT_SYMBOL(srp_start_tl_fail_timers);
+
+void srp_stop_tl_fail_timers(struct srp_rport *rport)
+{
+	cancel_delayed_work_sync(&rport->fast_io_fail_work);
+	cancel_delayed_work_sync(&rport->dev_loss_work);
+}
+EXPORT_SYMBOL(srp_stop_tl_fail_timers);
+
+/**
+ * srp_stop_rport() - Stop asynchronous work for an rport.
+ */
+void srp_stop_rport(struct srp_rport *rport)
+{
+	struct device *dev = rport->dev.parent;
+
+	device_remove_file(dev, &dev_attr_fast_io_fail_tmo);
+	device_remove_file(dev, &dev_attr_dev_loss_tmo);
+	srp_stop_tl_fail_timers(rport);
+	scsi_target_unblock(rport->dev.parent, SDEV_RUNNING);
+}
+EXPORT_SYMBOL(srp_stop_rport);
+
 static void srp_rport_release(struct device *dev)
 {
 	struct srp_rport *rport = dev_to_rport(dev);
@@ -210,6 +396,12 @@ struct srp_rport *srp_rport_add(struct Scsi_Host *shost,
 	memcpy(rport->port_id, ids->port_id, sizeof(rport->port_id));
 	rport->roles = ids->roles;
 
+	rport->fast_io_fail_tmo = -1;
+	rport->dev_loss_tmo = 60;
+	INIT_DELAYED_WORK(&rport->fast_io_fail_work,
+			  rport_fast_io_fail_timedout);
+	INIT_DELAYED_WORK(&rport->dev_loss_work, rport_dev_loss_timedout);
+
 	id = atomic_inc_return(&to_srp_host_attrs(shost)->next_port_id);
 	dev_set_name(&rport->dev, "port-%d:%d", shost->host_no, id);
 
@@ -327,8 +519,11 @@ srp_attach_transport(struct srp_function_template *ft)
 	count = 0;
 	i->rport_attrs[count++] = &dev_attr_port_id;
 	i->rport_attrs[count++] = &dev_attr_roles;
-	if (ft->rport_delete)
+	if (ft->rport_delete) {
+		i->rport_attrs[count++] = &dev_attr_dev_loss_tmo;
+		i->rport_attrs[count++] = &dev_attr_fast_io_fail_tmo;
 		i->rport_attrs[count++] = &dev_attr_delete;
+	}
 	i->rport_attrs[count++] = NULL;
 	BUG_ON(count > ARRAY_SIZE(i->rport_attrs));
 
diff --git a/include/scsi/scsi_transport_srp.h b/include/scsi/scsi_transport_srp.h
index ff0f04a..170bace 100644
--- a/include/scsi/scsi_transport_srp.h
+++ b/include/scsi/scsi_transport_srp.h
@@ -23,11 +23,17 @@ struct srp_rport {
 
 	/* for initiator drivers */
 
-	void *lld_data;	/* LLD private data */
+	void			*lld_data;	/* LLD private data */
+
+	int			fast_io_fail_tmo;
+	unsigned		dev_loss_tmo;
+	struct delayed_work	fast_io_fail_work;
+	struct delayed_work	dev_loss_work;
 };
 
 struct srp_function_template {
 	/* for initiator drivers */
+	void (*terminate_rport_io)(struct srp_rport *rport);
 	void (*rport_delete)(struct srp_rport *rport);
 	/* for target drivers */
 	int (* tsk_mgmt_response)(struct Scsi_Host *, u64, u64, int);
@@ -41,6 +47,9 @@ extern void srp_release_transport(struct scsi_transport_template *);
 extern struct srp_rport *srp_rport_add(struct Scsi_Host *,
 				       struct srp_rport_identifiers *);
 extern void srp_rport_del(struct srp_rport *);
+extern void srp_start_tl_fail_timers(struct srp_rport *rport);
+extern void srp_stop_tl_fail_timers(struct srp_rport *rport);
+extern void srp_stop_rport(struct srp_rport *rport);
 
 extern void srp_remove_host(struct Scsi_Host *);
 
-- 
1.7.7

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 00/20, v4] Make ib_srp better suited for H.A. purposes
  2012-08-09 15:41 [PATCH 00/20, v4] Make ib_srp better suited for H.A. purposes Bart Van Assche
  2012-08-09 15:57 ` [PATCH 14/20] srp_transport: Simplify attribute initialization code Bart Van Assche
       [not found] ` <5023DA39.7020000-HInyCGIudOg@public.gmane.org>
@ 2012-08-09 16:18 ` Bart Van Assche
       [not found]   ` <5023E2E3.4030602-HInyCGIudOg@public.gmane.org>
  2 siblings, 1 reply; 16+ messages in thread
From: Bart Van Assche @ 2012-08-09 16:18 UTC (permalink / raw)
  To: linux-rdma@vger.kernel.org, linux-scsi, David Dillow

On 08/09/12 15:41, Bart Van Assche wrote:
> [ ... ]

The patch series is also available on top of 3.6-rc1 here:
http://github.com/bvanassche/linux/tree/srp-ha

Bart.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 00/20, v4] Make ib_srp better suited for H.A. purposes
       [not found]   ` <5023E2E3.4030602-HInyCGIudOg@public.gmane.org>
@ 2012-08-11  8:29     ` Joseph Glanville
  0 siblings, 0 replies; 16+ messages in thread
From: Joseph Glanville @ 2012-08-11  8:29 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-scsi,
	David Dillow

Hi Bart,

I have tested this series; all issues related to hard-removing
targets are resolved.
Thanks!

Joseph.

On 10 August 2012 02:18, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:
> On 08/09/12 15:41, Bart Van Assche wrote:
>> [ ... ]
>
> The patch series is also available on top of 3.6-rc1 here:
> http://github.com/bvanassche/linux/tree/srp-ha
>
> Bart.



-- 
CTO | Orion Virtualisation Solutions | www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 00/20, v4] Make ib_srp better suited for H.A. purposes
       [not found] ` <5023DA39.7020000-HInyCGIudOg@public.gmane.org>
                     ` (3 preceding siblings ...)
  2012-08-09 16:02   ` [PATCH 19/20] srp_transport: Add transport layer error handling Bart Van Assche
@ 2012-08-27 18:37   ` Dongsu Park
  2012-08-28 10:04     ` Bart Van Assche
  2012-09-25 15:05   ` Bart Van Assche
  5 siblings, 1 reply; 16+ messages in thread
From: Dongsu Park @ 2012-08-27 18:37 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-scsi,
	David Dillow

Hi Bart,

While testing ib_srp based on your srp-ha branch,
we sometimes hit kernel crashes with the call trace below.

How to reproduce:

0. Kernel 3.2.15 with SCST v4193 on the target,
   Kernel 3.2.8 with ib_srp-ha on the initiator.
1. Configure 500+ vdisks on the target and connect the initiator.
2. Exchange data intensively, which works well.
3. (On initiator) occasionally delete the SRP remote port, e.g.
   # echo "1" > /sys/class/srp_remote_ports/port-6\:1/delete
   Then configure the SRP target again.
4. (On target) disable the InfiniBand interface, then enable it again.
5. Repeat 3 and 4.
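
Step 3 above can be scripted when the test has to run many iterations.
The following is only a sketch in C, assuming the sysfs path from this
report; write_attr() is a hypothetical helper name, not part of ib_srp
or of any library, and it has to run as root on a system that exposes
this attribute.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write a short string to a sysfs-style attribute
 * file. Returns 0 on success and -1 if the file cannot be opened or
 * the write fails.
 */
static int write_attr(const char *path, const char *value)
{
	FILE *f;
	int ok;

	assert(path != NULL && value != NULL);

	f = fopen(path, "w");
	if (f == NULL)
		return -1;

	ok = (fputs(value, f) >= 0);
	/* stdio buffers the data, so a failed write may only show up here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Calling write_attr("/sys/class/srp_remote_ports/port-6:1/delete", "1")
corresponds to the echo command in step 3; the backslash before the
colon there is shell escaping and is not part of the path.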

The initiator's kernel then crashes, though not every time.

Do you have any idea why?

Thanks in advance,
Dongsu

---------------------------------------------------------------
BUG: unable to handle kernel paging request at 0000000000010001
IP: [<ffffffff8139ec55>] strnlen+0x5/0x40
PGD 212fea067 PUD 2162f8067 PMD 0
Oops: 0000 [#1] SMP
CPU 0
Modules linked in: ib_srp scsi_transport_srp scsi_tgt rdma_ucm rdma_cm
iw_cm ib_addr ib_ipoib ib_cm ib_sa ib_uverbs ib_umad loop psmouse
serio_raw evdev i2c_piix4 tpm_tis tpm tpm_bios ib_mthca sg ib_mad
processor amd64_edac_mod edac_core thermal_sys edac_mce_amd ib_core
button sd_mod crc_t10dif hid_cherry usb_storage ahci libahci libata
scsi_mod [last unloaded: scsi_wait_scan]

Pid: 2311, comm: kworker/0:2 Not tainted 3.2.8 #1 Supermicro H8DGU/H8DGU
RIP: 0010:[<ffffffff8139ec55>]  [<ffffffff8139ec55>] strnlen+0x5/0x40
RSP: 0018:ffff880215fe3c28  EFLAGS: 00010086
RAX: ffffffff81915991 RBX: ffffffff81ba5497 RCX: ffffffffff0a0004
RDX: 0000000000010001 RSI: ffffffffffffffff RDI: 0000000000010001
RBP: ffffffff81ba5860 R08: 000000000000fffb R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000010001
R13: ffffffffffffffff R14: 00000000ffffffff R15: 0000000000000000
FS:  00007faafb63e700(0000) GS:ffff880217c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000010001 CR3: 0000000212f87000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:2 (pid: 2311, threadinfo ffff880215fe2000, task ffff88020f2ce540)
Stack:
 ffffffff813a023c 0000000000000000 ffffffff81ba5497 ffffffffa0131d82
 ffffffffa0131d80 ffff880215fe3da0 ffffffff81ba5860 ffff880215fe3c90
 ffffffff813a142d 0000000000000016 ffffffff81ba5460 0000000000000400
Call Trace:
 [<ffffffff813a023c>] ? string+0x4c/0xe0
 [<ffffffff813a142d>] ? vsnprintf+0x1ed/0x5b0
 [<ffffffffa0131900>] ? do_srp_rport_del+0x30/0x30 [scsi_transport_srp]
 [<ffffffff813a18a9>] ? vscnprintf+0x9/0x20
 [<ffffffff81049b7f>] ? vprintk+0xaf/0x440
 [<ffffffff810f3cc0>] ? next_online_pgdat+0x20/0x50
 [<ffffffff810f3d20>] ? next_zone+0x30/0x40
 [<ffffffff810f4c60>] ? refresh_cpu_vm_stats+0xf0/0x160
 [<ffffffffa0131900>] ? do_srp_rport_del+0x30/0x30 [scsi_transport_srp]
 [<ffffffff816533b6>] ? printk+0x40/0x4a
 [<ffffffffa013192d>] ? rport_dev_loss_timedout+0x2d/0xa0 [scsi_transport_srp]
 [<ffffffff81063383>] ? process_one_work+0x113/0x470
 [<ffffffff81065c73>] ? worker_thread+0x163/0x3e0
 [<ffffffff81065b10>] ? manage_workers+0x200/0x200
 [<ffffffff81065b10>] ? manage_workers+0x200/0x200
 [<ffffffff8106a126>] ? kthread+0x96/0xa0
 [<ffffffff8165f674>] ? kernel_thread_helper+0x4/0x10
 [<ffffffff8106a090>] ? kthread_worker_fn+0x180/0x180
 [<ffffffff8165f670>] ? gs_change+0x13/0x13
Code: 1f 80 00 00 00 00 31 c0 80 3f 00 48 89 fa 74 14 66 0f 1f 44 00 00
48 ff c2 80 3a 00 75 f8 48 89 d0 48 29 f8 f3 c3 48 85 f6 74 27 <80> 3f
00 74 22 48 ff ce 48 89 f8 eb 0e 66 0f 1f 44 00 00 48 ff
RIP  [<ffffffff8139ec55>] strnlen+0x5/0x40
RSP <ffff880215fe3c28>
CR2: 0000000000010001
---[ end trace d55b61cd78c54a0a ]---
BUG: unable to handle kernel paging request at fffffffffffffff8
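
One way to read the first oops: CR2 (the faulting address) and RDI both
hold 0x10001, and on x86-64 RDI carries the first function argument,
which for strnlen() is the string pointer. Kernel virtual addresses sit
in the upper half of the address space, so 0x10001 is very unlikely to
be a valid kernel string pointer. The sketch below only illustrates that
check in userspace C; plausible_kernel_pointer() and KERNEL_HALF_START
are names made up for this note, and 48-bit virtual addressing is
assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed start of the x86-64 upper (kernel) half with 48-bit virtual
 * addresses. Anything below it is not a kernel virtual address.
 */
#define KERNEL_HALF_START UINT64_C(0xffff800000000000)

/* Hypothetical helper: could addr be a kernel virtual address at all? */
static bool plausible_kernel_pointer(uint64_t addr)
{
	return addr >= KERNEL_HALF_START;
}

/* Values copied from the register dump of the first oops. */
static void check_trace_values(void)
{
	/* CR2 == RDI: the pointer strnlen() tried to read. */
	assert(!plausible_kernel_pointer(UINT64_C(0x0000000000010001)));
	/* RSP: a normal kernel stack address, for comparison. */
	assert(plausible_kernel_pointer(UINT64_C(0xffff880215fe3c28)));
}
```

This suggests that the string argument was already garbage when printk()
formatted it; it does not say how the pointer got that value.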



* Re: [PATCH 00/20, v4] Make ib_srp better suited for H.A. purposes
  2012-08-27 18:37   ` [PATCH 00/20, v4] Make ib_srp better suited for H.A. purposes Dongsu Park
@ 2012-08-28 10:04     ` Bart Van Assche
  2012-08-28 12:25       ` Dongsu Park
  0 siblings, 1 reply; 16+ messages in thread
From: Bart Van Assche @ 2012-08-28 10:04 UTC (permalink / raw)
  To: Dongsu Park; +Cc: linux-rdma@vger.kernel.org, linux-scsi, David Dillow

On 08/27/12 18:37, Dongsu Park wrote:
> while testing ib_srp based on your srp-ha,
> we sometimes hit kernel crashes with the call trace below.
> 
> How to reproduce:
> 
> 0. Kernel 3.2.15 with SCST v4193 on the target,
>    Kernel 3.2.8 with ib_srp-ha on the initiator.
> 1. Configure 500+ vdisks on the target and connect the initiator.
> 2. Exchange data intensively, which works well.
> 3. (On initiator) occasionally delete the SRP remote port, e.g.
>    # echo "1" > /sys/class/srp_remote_ports/port-6\:1/delete
>    Then configure the SRP target again.
> 4. (On target) disable the InfiniBand interface, then enable it again.
> 5. Repeat 3 and 4.
> 
> The initiator's kernel then crashes, though not every time.
> 
> Do you have any idea why?

Hello Dongsu,

That's unfortunate. I've just finished running the above test 1000 times
on my test setup. The test ran perfectly - login succeeded every time,
the test finished in the expected time, no kernel crash occurred and no
memory was leaked. I've been running my test with kernel 3.6-rc3 instead
of kernel 3.2.8 though. Can you repeat your test with kernel 3.6-rc3 on
the initiator system instead of kernel 3.2.8? The 3.6-rc3 kernel
contains multiple patches that improve robustness with regard to SCSI
device removal.

Thanks,

Bart.


* Re: [PATCH 00/20, v4] Make ib_srp better suited for H.A. purposes
  2012-08-28 10:04     ` Bart Van Assche
@ 2012-08-28 12:25       ` Dongsu Park
  2012-08-28 12:58         ` Bart Van Assche
  0 siblings, 1 reply; 16+ messages in thread
From: Dongsu Park @ 2012-08-28 12:25 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-rdma@vger.kernel.org, linux-scsi, David Dillow

Hi Bart,

On 28.08.2012 10:04, Bart Van Assche wrote:
> On 08/27/12 18:37, Dongsu Park wrote:
> > while testing ib_srp based on your srp-ha,
> > we sometimes hit kernel crashes with the call trace below.
> > 
> > How to reproduce:
> > 
> > 0. Kernel 3.2.15 with SCST v4193 on the target,
> >    Kernel 3.2.8 with ib_srp-ha on the initiator.
> > 1. Configure 500+ vdisks on the target and connect the initiator.
> > 2. Exchange data intensively, which works well.
> > 3. (On initiator) occasionally delete the SRP remote port, e.g.
> >    # echo "1" > /sys/class/srp_remote_ports/port-6\:1/delete
> >    Then configure the SRP target again.
> > 4. (On target) disable the InfiniBand interface, then enable it again.
> > 5. Repeat 3 and 4.
> > 
> > The initiator's kernel then crashes, though not every time.
> > 
> > Do you have any idea why?
> 
> Hello Dongsu,
> 
> That's unfortunate. I've just finished running the above test 1000 times
> on my test setup. The test ran perfectly - login succeeded every time,
> the test finished in the expected time, no kernel crash occurred and no
> memory was leaked. I've been running my test with kernel 3.6-rc3 instead
> of kernel 3.2.8 though. Can you repeat your test with kernel 3.6-rc3 on
> the initiator system instead of kernel 3.2.8? The 3.6-rc3 kernel
> contains multiple patches that improve robustness with regard to SCSI
> device removal.

Ok, when I get a chance to set up a new test system with kernel 3.6-rc3,
I'll do a new test and let you know.

By the way, as far as I've observed today, the crash occurs only if
rport_dev_loss_timedout() is called. That is, without device loss,
a simple rport_delete does not cause a crash.

Could this be because the arguments to pr_err() point to invalid
addresses?

drivers/scsi/scsi_transport_srp.c:275

        pr_err("SRP transport: dev_loss_tmo (%ds) expired - removing %s.\n",
               rport->dev_loss_tmo, dev_name(&rport->dev));

Cheers,
Dongsu



* Re: [PATCH 00/20, v4] Make ib_srp better suited for H.A. purposes
  2012-08-28 12:25       ` Dongsu Park
@ 2012-08-28 12:58         ` Bart Van Assche
  0 siblings, 0 replies; 16+ messages in thread
From: Bart Van Assche @ 2012-08-28 12:58 UTC (permalink / raw)
  To: Dongsu Park; +Cc: linux-rdma@vger.kernel.org, linux-scsi, David Dillow

On 08/28/12 12:25, Dongsu Park wrote:
> By the way, as far as I've observed today, the crash occurs only if
> rport_dev_loss_timedout() is called. That is, without device loss,
> a simple rport_delete does not cause a crash.
> 
> Could this be because the arguments to pr_err() point to invalid
> addresses?
> 
> drivers/scsi/scsi_transport_srp.c:275
> 
>         pr_err("SRP transport: dev_loss_tmo (%ds) expired - removing %s.\n",
>                rport->dev_loss_tmo, dev_name(&rport->dev));

It's not clear to me how that could happen. The dev_loss timer is
stopped before the rport device is removed. See also the
srp_stop_rport() call (which stops the dev_loss timer) and
srp_remove_host() call (which removes the rport) in srp_remove_target().
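
The ordering argument can be illustrated with a small single-threaded
model in C. Everything in it is hypothetical: struct model_rport and the
model_*() functions are made up for this note and are not the
scsi_transport_srp code. The model also deliberately ignores the case of
a timer callback that is already running when the stop is requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an rport with a dev_loss timer. */
struct model_rport {
	bool timer_pending;	/* dev_loss timer is armed */
	bool removed;		/* rport device has been removed */
};

static void model_arm_timer(struct model_rport *r)
{
	assert(r != NULL);
	r->timer_pending = true;
}

/* Models stopping the dev_loss timer. */
static void model_stop_rport(struct model_rport *r)
{
	assert(r != NULL);
	r->timer_pending = false;
}

/* Models removing the rport device. */
static void model_remove_host(struct model_rport *r)
{
	assert(r != NULL);
	r->removed = true;
}

/*
 * Let the timer expire. Returns true if the callback ran against an
 * rport that had already been removed, which is the unsafe case.
 */
static bool model_fire_timer(struct model_rport *r)
{
	assert(r != NULL);
	if (!r->timer_pending)
		return false;
	r->timer_pending = false;
	return r->removed;
}

/* Tear down an rport with an armed timer, with or without stopping it. */
static bool model_teardown(bool stop_first)
{
	struct model_rport r = { false, false };

	model_arm_timer(&r);
	if (stop_first)
		model_stop_rport(&r);
	model_remove_host(&r);
	return model_fire_timer(&r);
}
```

In this model, model_teardown(true) never lets the callback see a
removed rport, while model_teardown(false) does. A crash inside the
timeout handler would therefore point at a path where the stop does not
happen first, or at a race that this model does not cover.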

Bart.


* Re: [PATCH 00/20, v4] Make ib_srp better suited for H.A. purposes
       [not found] ` <5023DA39.7020000-HInyCGIudOg@public.gmane.org>
                     ` (4 preceding siblings ...)
  2012-08-27 18:37   ` [PATCH 00/20, v4] Make ib_srp better suited for H.A. purposes Dongsu Park
@ 2012-09-25 15:05   ` Bart Van Assche
  2012-09-27  0:31     ` David Dillow
  5 siblings, 1 reply; 16+ messages in thread
From: Bart Van Assche @ 2012-09-25 15:05 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-scsi,
	David Dillow

On 08/09/12 17:41, Bart Van Assche wrote:
> [ ... ]

Hello Dave,

More than six weeks have elapsed since I posted version four of this 
patch series. It would be appreciated if you could tell me when review 
comments for this patch series will be posted. I'd also like to remind 
you that some time ago you asked other people to wait with posting more 
ib_srp patches until this patch series is upstream [1, 2].

Thanks,

Bart.

[1] David Dillow, Re: [PATCH 1/1] ib_srp: Infiniband srp fast failover
patch, May 29, 2012
(http://www.mail-archive.com/linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg11801.html).
[2] David Dillow, Re: [PATCH] srp: convert SRP_RQ_SHIFT into a module
parameter, May 29, 2012
(http://www.mail-archive.com/linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg11802.html).


* Re: [PATCH 00/20, v4] Make ib_srp better suited for H.A. purposes
  2012-09-25 15:05   ` Bart Van Assche
@ 2012-09-27  0:31     ` David Dillow
       [not found]       ` <1348705896.26028.3.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: David Dillow @ 2012-09-27  0:31 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-rdma@vger.kernel.org, linux-scsi

On Tue, 2012-09-25 at 17:05 +0200, Bart Van Assche wrote:
> On 08/09/12 17:41, Bart Van Assche wrote:
> > [ ... ]
> 
> Hello Dave,
> 
> More than six weeks have elapsed since I posted version four of this 
> patch series. It would be appreciated if you could tell me when review 
> comments for this patch series will be posted. I'd also like to remind 
> you that some time ago you asked other people to wait with posting more 
> ib_srp patches until this patch series is upstream [1, 2].

Yes, it has taken me far more time than I expected to get to these. I am
in the middle of fiscal-year-end thrash, and will attend to the SRP
backlog next week.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office



* Re: [PATCH 00/20, v4] Make ib_srp better suited for H.A. purposes
       [not found]       ` <1348705896.26028.3.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
@ 2012-11-23 15:07         ` Bart Van Assche
       [not found]           ` <50AF9146.5000405-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Bart Van Assche @ 2012-11-23 15:07 UTC (permalink / raw)
  To: David Dillow
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-scsi

On 09/27/12 02:31, David Dillow wrote:
> On Tue, 2012-09-25 at 17:05 +0200, Bart Van Assche wrote:
>> On 08/09/12 17:41, Bart Van Assche wrote:
>>> [ ... ]
>>
>> Hello Dave,
>>
>> More than six weeks have elapsed since I posted version four of this
>> patch series. It would be appreciated if you could tell me when review
>> comments for this patch series will be posted. I'd also like to remind
>> you that some time ago you asked other people to wait with posting more
>> ib_srp patches until this patch series is upstream [1, 2].
>
> Yes, it has taken me far more time than I expected to get to these. I am
> in the middle of fiscal-year-end thrash, and will attend to the SRP
> backlog next week.

(replying to an e-mail from about two months ago)

Hello Dave,

Will you have time next week to review the srp-ha patch series?

Thanks,

Bart.

* Re: [PATCH 00/20, v4] Make ib_srp better suited for H.A. purposes
       [not found]           ` <50AF9146.5000405-HInyCGIudOg@public.gmane.org>
@ 2012-11-26  4:47             ` David Dillow
  0 siblings, 0 replies; 16+ messages in thread
From: David Dillow @ 2012-11-26  4:47 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-scsi

On Fri, 2012-11-23 at 16:07 +0100, Bart Van Assche wrote:
> On 09/27/12 02:31, David Dillow wrote:
> > On Tue, 2012-09-25 at 17:05 +0200, Bart Van Assche wrote:
> >> On 08/09/12 17:41, Bart Van Assche wrote:
> >>> [ ... ]
> >>
> >> Hello Dave,
> >>
> >> More than six weeks have elapsed since I posted version four of this
> >> patch series. It would be appreciated if you could tell me when review
> >> comments for this patch series will be posted. I'd also like to remind
> >> you that some time ago you asked other people to wait with posting more
> >> ib_srp patches until this patch series is upstream [1, 2].
> >
> > Yes, it has taken me far more time than I expected to get to these. I am
> > in the middle of fiscal-year-end thrash, and will attend to the SRP
> > backlog next week.
> 
> (replying to an e-mail from about two months ago)
> 
> Hello Dave,
> 
> Will you have time next week to review the srp-ha patch series?

I just pushed my first patch for your review; fingers crossed to build
some momentum.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


Thread overview: 16+ messages
2012-08-09 15:41 [PATCH 00/20, v4] Make ib_srp better suited for H.A. purposes Bart Van Assche
2012-08-09 15:57 ` [PATCH 14/20] srp_transport: Simplify attribute initialization code Bart Van Assche
     [not found] ` <5023DA39.7020000-HInyCGIudOg@public.gmane.org>
2012-08-09 15:56   ` [PATCH 13/20] srp_transport: Fix atttribute registration Bart Van Assche
2012-08-09 15:58   ` [PATCH 15/20] srp_transport: Document sysfs attributes Bart Van Assche
2012-08-09 15:59   ` [PATCH 16/20] ib_srp: Allow SRP disconnect through sysfs Bart Van Assche
2012-08-09 16:02   ` [PATCH 19/20] srp_transport: Add transport layer error handling Bart Van Assche
2012-08-27 18:37   ` [PATCH 00/20, v4] Make ib_srp better suited for H.A. purposes Dongsu Park
2012-08-28 10:04     ` Bart Van Assche
2012-08-28 12:25       ` Dongsu Park
2012-08-28 12:58         ` Bart Van Assche
2012-09-25 15:05   ` Bart Van Assche
2012-09-27  0:31     ` David Dillow
     [not found]       ` <1348705896.26028.3.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2012-11-23 15:07         ` Bart Van Assche
     [not found]           ` <50AF9146.5000405-HInyCGIudOg@public.gmane.org>
2012-11-26  4:47             ` David Dillow
2012-08-09 16:18 ` Bart Van Assche
     [not found]   ` <5023E2E3.4030602-HInyCGIudOg@public.gmane.org>
2012-08-11  8:29     ` Joseph Glanville
