* [PATCH 00/14] Make ib_srp better suited for H.A. purposes
@ 2011-12-01 18:54 Bart Van Assche
2011-12-01 19:05 ` [PATCH 08/14] srp_transport: Document sysfs attributes Bart Van Assche
` (5 more replies)
0 siblings, 6 replies; 36+ messages in thread
From: Bart Van Assche @ 2011-12-01 18:54 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: linux-scsi-u79uwXL29TY76Z2rM5mHXA, David Dillow, Roland Dreier,
Vu Pham
This patch series makes the ib_srp driver better suited for use in a H.A.
setup because:
- Switchover without triggering read or write errors become possible. Such
errors are bad because these can make a filesystem switch to read-only
mode.
- A ping mechanism has been added that allows to reduce the switch-over
time greatly.
- Disconnecting from a target without unloading ib_srp becomes possible.
- Switchover can be triggered explicitly by deleting an initiator device.
Where possible these mechanisms have been modelled after similar
functionality in the open-iscsi kernel code.
According to the measurements I ran these patches do not affect ib_srp
performance.
The individual patches are:
0001-ib_srp-Introduce-pr_fmt.patch
0002-ib_srp-Consolidate-repetitive-sysfs-code.patch
0003-ib_srp-Disallow-duplicate-logins.patch
0004-ib_srp-Set-block-layer-timeout.patch
0005-ib_srp-Avoid-that-late-SRP-replies-cause-trouble.patch
0006-ib_srp-Micro-optimize-completion-handlers.patch
0007-ib_srp-Introduce-srp_handle_qp_err.patch
0008-srp_transport-Document-sysfs-attributes.patch
0009-srp_transport-Fix-atttribute-registration-race.patch
0010-srp_transport-Simplify-attribute-initialization-code.patch
0011-ib_srp-Document-sysfs-attributes.patch
0012-ib_srp-Rework-error-handling.patch
0013-ib_srp-Implement-transport-layer-ping.patch
0014-ib_srp-Allow-SRP-disconnect-through-sysfs.patch
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 36+ messages in thread* [PATCH 08/14] srp_transport: Document sysfs attributes 2011-12-01 18:54 [PATCH 00/14] Make ib_srp better suited for H.A. purposes Bart Van Assche @ 2011-12-01 19:05 ` Bart Van Assche 2011-12-01 19:06 ` [PATCH 09/14] srp_transport: Fix attribute registration Bart Van Assche ` (4 subsequent siblings) 5 siblings, 0 replies; 36+ messages in thread From: Bart Van Assche @ 2011-12-01 19:05 UTC (permalink / raw) To: linux-rdma; +Cc: linux-scsi, Fujita Tomonori, Brian King Document the srp_transport sysfs attributes according to the rules specified in Documentation/ABI/README. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Brian King <brking@linux.vnet.ibm.com> --- Documentation/ABI/stable/sysfs-transport-srp | 12 ++++++++++++ 1 files changed, 12 insertions(+), 0 deletions(-) create mode 100644 Documentation/ABI/stable/sysfs-transport-srp diff --git a/Documentation/ABI/stable/sysfs-transport-srp b/Documentation/ABI/stable/sysfs-transport-srp new file mode 100644 index 0000000..7b0d4a5 --- /dev/null +++ b/Documentation/ABI/stable/sysfs-transport-srp @@ -0,0 +1,12 @@ +What: /sys/class/srp_remote_ports/port-<h>:<n>/port_id +Date: June 27, 2007 +KernelVersion: 2.6.24 +Contact: linux-scsi@vger.kernel.org +Description: 16-byte local SRP port identifier in hexadecimal format. An + example: 4c:49:4e:55:58:20:56:49:4f:00:00:00:00:00:00:00. + +What: /sys/class/srp_remote_ports/port-<h>:<n>/roles +Date: June 27, 2007 +KernelVersion: 2.6.24 +Contact: linux-scsi@vger.kernel.org +Description: Role of the remote port. Either "SRP Initiator" or "SRP Target". -- 1.7.3.4 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 09/14] srp_transport: Fix attribute registration 2011-12-01 18:54 [PATCH 00/14] Make ib_srp better suited for H.A. purposes Bart Van Assche 2011-12-01 19:05 ` [PATCH 08/14] srp_transport: Document sysfs attributes Bart Van Assche @ 2011-12-01 19:06 ` Bart Van Assche [not found] ` <201112012006.50445.bvanassche-HInyCGIudOg@public.gmane.org> 2011-12-01 19:10 ` [PATCH 12/14] ib_srp: Rework error handling Bart Van Assche ` (3 subsequent siblings) 5 siblings, 1 reply; 36+ messages in thread From: Bart Van Assche @ 2011-12-01 19:06 UTC (permalink / raw) To: linux-rdma, David Dillow, Roland Dreier, stable Cc: linux-scsi, Fujita Tomonori, Brian King Register transport attributes after the attribute array has been set up instead of before. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: David Dillow <dillowda@ornl.gov> Cc: Roland Dreier <roland@purestorage.com> Cc: stable@vger.kernel.org --- drivers/scsi/scsi_transport_srp.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c index 21a045e..07c4394 100644 --- a/drivers/scsi/scsi_transport_srp.c +++ b/drivers/scsi/scsi_transport_srp.c @@ -324,13 +324,14 @@ srp_attach_transport(struct srp_function_template *ft) i->rport_attr_cont.ac.attrs = &i->rport_attrs[0]; i->rport_attr_cont.ac.class = &srp_rport_class.class; i->rport_attr_cont.ac.match = srp_rport_match; - transport_container_register(&i->rport_attr_cont); count = 0; SETUP_RPORT_ATTRIBUTE_RD(port_id); SETUP_RPORT_ATTRIBUTE_RD(roles); i->rport_attrs[count] = NULL; + transport_container_register(&i->rport_attr_cont); + i->f = ft; return &i->t; -- 1.7.3.4 ^ permalink raw reply related [flat|nested] 36+ messages in thread
[parent not found: <201112012006.50445.bvanassche-HInyCGIudOg@public.gmane.org>]
* Re: [PATCH 09/14] srp_transport: Fix attribute registration [not found] ` <201112012006.50445.bvanassche-HInyCGIudOg@public.gmane.org> @ 2011-12-15 20:09 ` David Dillow 0 siblings, 0 replies; 36+ messages in thread From: David Dillow @ 2011-12-15 20:09 UTC (permalink / raw) To: Bart Van Assche Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier, stable-u79uwXL29TY76Z2rM5mHXA, linux-scsi-u79uwXL29TY76Z2rM5mHXA, Fujita Tomonori, Brian King On Thu, 2011-12-01 at 20:06 +0100, Bart Van Assche wrote: > Register transport attributes after the attribute array has been > set up instead of before. Why? I don't disagree with the change, but there should be at least a little discussion of why you're doing it. Your last message about this to James Bottomley had quite a bit of content, perhaps too much, but then his comment that 'it's best practice to initialize a structure before registering it' seems appropriate. > Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> > Cc: FUJITA Tomonori <fujita.tomonori-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> > Cc: Brian King <brking-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> > Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> > Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org> > Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > --- > drivers/scsi/scsi_transport_srp.c | 3 ++- > 1 files changed, 2 insertions(+), 1 deletions(-) > > diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c > index 21a045e..07c4394 100644 > --- a/drivers/scsi/scsi_transport_srp.c > +++ b/drivers/scsi/scsi_transport_srp.c > @@ -324,13 +324,14 @@ srp_attach_transport(struct srp_function_template *ft) > i->rport_attr_cont.ac.attrs = &i->rport_attrs[0]; > i->rport_attr_cont.ac.class = &srp_rport_class.class; > i->rport_attr_cont.ac.match = srp_rport_match; > - transport_container_register(&i->rport_attr_cont); > > count = 0; > SETUP_RPORT_ATTRIBUTE_RD(port_id); > SETUP_RPORT_ATTRIBUTE_RD(roles); > i->rport_attrs[count] = NULL; > > + transport_container_register(&i->rport_attr_cont); > + > i->f = ft; > > return &i->t; -- Dave Dillow National Center for Computational Science Oak Ridge National Laboratory (865) 241-6602 office -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 12/14] ib_srp: Rework error handling 2011-12-01 18:54 [PATCH 00/14] Make ib_srp better suited for H.A. purposes Bart Van Assche 2011-12-01 19:05 ` [PATCH 08/14] srp_transport: Document sysfs attributes Bart Van Assche 2011-12-01 19:06 ` [PATCH 09/14] srp_transport: Fix attribute registration Bart Van Assche @ 2011-12-01 19:10 ` Bart Van Assche [not found] ` <201112012010.37276.bvanassche-HInyCGIudOg@public.gmane.org> [not found] ` <201112011954.25811.bvanassche-HInyCGIudOg@public.gmane.org> ` (2 subsequent siblings) 5 siblings, 1 reply; 36+ messages in thread From: Bart Van Assche @ 2011-12-01 19:10 UTC (permalink / raw) To: linux-rdma, David Dillow, Fujita Tomonori, Brian King, Roland Dreier Cc: linux-scsi Add the necessary functions in the SRP transport module to allow an SRP initiator driver to implement transport layer recovery. Convert srp_target_port.qp_in_error into a target port state. Factor out the code for removing an SRP target into the new function srp_remove_target(). In the ib_srp IB completion handlers, do not stop processing completions after the first error completion or a CM DREQ has been received. Block the SCSI target as soon as the first IB error completion has been received. Eliminate the SCSI target state test from srp_queuecommand(). Rework ib_srp transport layer error handling. Instead of letting SCSI commands time out if a transport layer error occurs, block the SCSI host and try to reconnect until the reconnect timeout elapses or until the maximum number of reconnect attempts has been exceeded, whichever happens first. Rescan LUNs after having unblocked a SCSI target controlled by ib_srp. Add the sysfs attributes reconnect_tmo, failed_reconnects and max_reconnects. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: David Dillow <dillowda@ornl.gov> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Roland Dreier <roland@purestorage.com> --- Documentation/ABI/stable/sysfs-driver-ib_srp | 25 ++ drivers/infiniband/ulp/srp/ib_srp.c | 398 ++++++++++++++++++++------ drivers/infiniband/ulp/srp/ib_srp.h | 27 ++- drivers/scsi/scsi_transport_srp.c | 162 +++++++++++- include/scsi/scsi_transport_srp.h | 31 ++ 5 files changed, 548 insertions(+), 95 deletions(-) diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp b/Documentation/ABI/stable/sysfs-driver-ib_srp index 6b74c09..b1f449c 100644 --- a/Documentation/ABI/stable/sysfs-driver-ib_srp +++ b/Documentation/ABI/stable/sysfs-driver-ib_srp @@ -74,6 +74,12 @@ Contact: linux-rdma@vger.kernel.org Description: InfiniBand destination GID used for communication with the SRP target. Differs from orig_dgid if port redirection has happened. +What: /sys/class/scsi_host/host<n>/failed_reconnects +Date: November 1, 2011 +KernelVersion: 3.3 +Contact: linux-rdma@vger.kernel.org +Description: Number of consecutive failed SRP reconnect attempts. + What: /sys/class/scsi_host/host<n>/id_ext Date: June 17, 2006 KernelVersion: 2.6.17 @@ -102,6 +108,14 @@ Contact: linux-rdma@vger.kernel.org Description: Number of the HCA port used for communicating with the SRP target. +What: /sys/class/scsi_host/host<n>/max_reconnects +Date: November 1, 2011 +KernelVersion: 3.3 +Contact: linux-rdma@vger.kernel.org +Description: Maximum number of times ib_srp should attempt to reconnect + to the SRP target after an I/O error occurred. The value -1 + means that ib_srp should keep trying to reconnect forever. + What: /sys/class/scsi_host/host<n>/orig_dgid Date: June 17, 2006 KernelVersion: 2.6.17 @@ -116,6 +130,17 @@ Contact: linux-rdma@vger.kernel.org Description: A 16-bit number representing the InfiniBand partition key used for communication with the SRP target. +What: /sys/class/scsi_host/host<n>/reconnect_tmo +Date: November 1, 2011 +KernelVersion: 3.3 +Contact: linux-rdma@vger.kernel.org +Description: Time in seconds to wait after a failed connection attempt + before trying to reconnect. Setting this parameter to zero or + to a negative value prevents ib_srp to attempt to reconnect. + Changing this parameter from a value <= 0 into a value > 0 at + a time there is no connection with the target will cause + ib_srp to try to reconnect with the target. + What: /sys/class/scsi_host/host<n>/req_lim Date: October 20, 2010 KernelVersion: 2.6.36 diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 44c810b..797a8f1 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -1,5 +1,6 @@ /* * Copyright (c) 2005 Cisco Systems. All rights reserved. + * Copyright (c) 2010-2011 Bart Van Assche <bvanassche@acm.org>. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -430,6 +431,22 @@ static int srp_send_req(struct srp_target_port *target) static void srp_disconnect_target(struct srp_target_port *target) { + bool disconnect = false; + + spin_lock_irq(&target->lock); + switch (target->state) { + case SRP_TARGET_LIVE: + case SRP_TARGET_BLOCKED: + disconnect = true; + target->state = SRP_TARGET_DISCONNECTED; + default: + break; + } + spin_unlock_irq(&target->lock); + + if (!disconnect) + return; + /* XXX should send SRP_I_LOGOUT request */ init_completion(&target->done); @@ -489,32 +506,77 @@ static void srp_del_scsi_host_attr(struct Scsi_Host *shost) device_remove_file(&shost->shost_dev, *attr); } -static void srp_remove_work(struct work_struct *work) +/** + * srp_remove_target() - Remove an SRP target. + * + * The strategy to remove a target is as follows: + * - The caller must have set target->state to SRP_TARGET_REMOVED before + * invoking or queueing this function such that new calls to + * srp_disconnect_target(), srp_reconnect_target(), srp_recovery_timedout() + * or srp_block_work() do not have any effect. + * - Cancel block_work and lock and unlock target->mutex to make sure that any + * concurrent srp_block_rport() calls have finished. + * - Unblock the rport such that any blocked scanning work resumes. + * - Tear down the rport, the SCSI host and the IB resources associated with + * the target. + */ +static void srp_remove_target(struct srp_target_port *target) { - struct srp_target_port *target = - container_of(work, struct srp_target_port, work); - - if (!srp_change_state(target, SRP_TARGET_DEAD, SRP_TARGET_REMOVED)) - return; - - spin_lock(&target->srp_host->target_lock); - list_del(&target->list); - spin_unlock(&target->srp_host->target_lock); + WARN_ON(target->state != SRP_TARGET_REMOVED); srp_del_scsi_host_attr(target->scsi_host); + cancel_work_sync(&target->block_work); + mutex_lock(&target->mutex); + mutex_unlock(&target->mutex); + srp_unblock_rport(rport); + srp_rport_disable_recovery(target->rport); + cancel_delayed_work_sync(&target->reconnect_work); + srp_stop_rport(target->rport); srp_remove_host(target->scsi_host); scsi_remove_host(target->scsi_host); + cancel_work_sync(&target->scan_work); + srp_disconnect_target(target); ib_destroy_cm_id(target->cm_id); srp_free_target_ib(target); srp_free_req_data(target); scsi_host_put(target->scsi_host); } +static void srp_remove_work(struct work_struct *work) +{ + struct srp_target_port *target; + + target = container_of(work, struct srp_target_port, remove_work); + + spin_lock(&target->srp_host->target_lock); + list_del(&target->list); + spin_unlock(&target->srp_host->target_lock); + + srp_remove_target(target); +} + +static void srp_recovery_timedout(struct srp_rport *rport) +{ + struct srp_target_port *target = rport->lld_data; + + pr_debug("recovery timeout: rport = %p; target = %p / state %d\n", + rport, target, target->state); + + cancel_delayed_work_sync(&target->reconnect_work); + + mutex_lock(&target->mutex); + if (srp_change_state(target, SRP_TARGET_BLOCKED, SRP_TARGET_LIVE)) + srp_unblock_rport(rport); + mutex_unlock(&target->mutex); +} + static int srp_connect_target(struct srp_target_port *target) { int retries = 3; int ret; + WARN_ON(target->state != SRP_TARGET_CONNECTING); + ret = srp_lookup_path(target); if (ret) return ret; @@ -606,16 +668,40 @@ static void srp_reset_req(struct srp_target_port *target, struct srp_request *re srp_remove_req(target, req, 0); } +static void srp_scan_target(struct srp_target_port *target) +{ + scsi_scan_target(&target->scsi_host->shost_gendev, 0, target->scsi_id, + SCAN_WILD_CARD, 0); +} + +static void srp_scan_work(struct work_struct *work) +{ + struct srp_target_port *target; + + target = container_of(work, struct srp_target_port, scan_work); + srp_scan_target(target); +} + static int srp_reconnect_target(struct srp_target_port *target) { struct ib_qp_attr qp_attr; struct ib_wc wc; int i, ret; - if (!srp_change_state(target, SRP_TARGET_LIVE, SRP_TARGET_CONNECTING)) - return -EAGAIN; + mutex_lock(&target->mutex); + if (srp_change_state(target, SRP_TARGET_LIVE, SRP_TARGET_BLOCKED)) { + shost_printk(KERN_INFO, target->scsi_host, + PFX "blocked SCSI host.\n"); + srp_block_rport(target->rport); + } + mutex_unlock(&target->mutex); srp_disconnect_target(target); + + if (!srp_change_state(target, SRP_TARGET_DISCONNECTED, + SRP_TARGET_CONNECTING)) + return -EAGAIN; + /* * Now get a new local CM ID so that we avoid confusing the * target in case things are really fouled up. @@ -648,38 +734,65 @@ static int srp_reconnect_target(struct srp_target_port *target) for (i = 0; i < SRP_SQ_SIZE; ++i) list_add(&target->tx_ring[i]->list, &target->free_tx); - target->qp_in_error = 0; ret = srp_connect_target(target); if (ret) goto err; - if (!srp_change_state(target, SRP_TARGET_CONNECTING, SRP_TARGET_LIVE)) + mutex_lock(&target->mutex); + if (srp_change_state(target, SRP_TARGET_CONNECTING, SRP_TARGET_LIVE)) { + shost_printk(KERN_INFO, target->scsi_host, + PFX "unblocked SCSI host.\n"); + srp_unblock_rport(target->rport); + } else { ret = -EAGAIN; + } + mutex_unlock(&target->mutex); + + /* + * Since this code can be invoked from the context of the SCSI error + * handler, invoke SCSI scanning asynchronously. + */ + if (ret == 0) + queue_work(system_long_wq, &target->scan_work); return ret; err: - shost_printk(KERN_ERR, target->scsi_host, - PFX "reconnect failed (%d), removing target port.\n", ret); + srp_change_state(target, SRP_TARGET_CONNECTING, SRP_TARGET_BLOCKED); + return ret; +} - /* - * We couldn't reconnect, so kill our target port off. - * However, we have to defer the real removal because we - * are in the context of the SCSI error handler now, which - * will deadlock if we call scsi_remove_host(). - * - * Schedule our work inside the lock to avoid a race with - * the flush_scheduled_work() in srp_remove_one(). - */ - spin_lock_irq(&target->lock); - if (target->state == SRP_TARGET_CONNECTING) { - target->state = SRP_TARGET_DEAD; - INIT_WORK(&target->work, srp_remove_work); - queue_work(ib_wq, &target->work); +static void srp_reconnect_repeatedly(struct srp_target_port *target) +{ + int res, tmo; + + res = srp_reconnect_target(target); + if (res == 0 || target->state != SRP_TARGET_BLOCKED) + return; + + ++target->failed_reconnects; + + shost_printk(KERN_ERR, target->scsi_host, + PFX "reconnect attempt %d failed (%d).\n", + target->failed_reconnects, res); + + tmo = target->reconnect_tmo; + if (tmo > 0 && + (target->max_reconnects < 0 || + target->failed_reconnects < target->max_reconnects)) { + queue_delayed_work(system_long_wq, &target->reconnect_work, + tmo * HZ); } - spin_unlock_irq(&target->lock); +} - return ret; +static void srp_reconnect_work(struct work_struct *work) +{ + struct srp_target_port *target; + + target = container_of(to_delayed_work(work), struct srp_target_port, + reconnect_work); + + srp_reconnect_repeatedly(target); } static void srp_map_desc(struct srp_map_state *state, dma_addr_t dma_addr, @@ -1217,13 +1330,39 @@ static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc) PFX "Recv failed with error code %d\n", res); } +static void srp_block_work(struct work_struct *work) +{ + struct srp_target_port *target; + bool reconnect = false; + + target = container_of(work, struct srp_target_port, block_work); + + mutex_lock(&target->mutex); + if (srp_change_state(target, SRP_TARGET_LIVE, SRP_TARGET_BLOCKED)) { + shost_printk(KERN_INFO, target->scsi_host, + PFX "blocked SCSI host.\n"); + srp_block_rport(target->rport); + srp_start_recovery_timer(target->rport); + reconnect = true; + } + mutex_unlock(&target->mutex); + + if (reconnect) { + target->failed_reconnects = 0; + srp_reconnect_repeatedly(target); + } +} + static void srp_handle_qp_err(enum ib_wc_status wc_status, enum ib_wc_opcode wc_opcode, struct srp_target_port *target) { - shost_printk(KERN_ERR, target->scsi_host, - PFX "failed receive status %d\n", wc_status); - target->qp_in_error = 1; + if (!queue_work(system_long_wq, &target->block_work) || + target->state == SRP_TARGET_BLOCKED) + return; + + shost_printk(KERN_ERR, target->scsi_host, PFX "failed %s status %d\n", + wc_opcode & IB_WC_RECV ? "receive" : "send", wc_status); } static void srp_recv_completion(struct ib_cq *cq, void *target_ptr) @@ -1233,12 +1372,10 @@ static void srp_recv_completion(struct ib_cq *cq, void *target_ptr) ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); while (ib_poll_cq(cq, 1, &wc) > 0) { - if (wc.status == IB_WC_SUCCESS) { + if (wc.status == IB_WC_SUCCESS) srp_handle_recv(target, &wc); - } else { + else srp_handle_qp_err(wc.status, wc.opcode, target); - break; - } } } @@ -1254,7 +1391,6 @@ static void srp_send_completion(struct ib_cq *cq, void *target_ptr) list_add(&iu->list, &target->free_tx); } else { srp_handle_qp_err(wc.status, wc.opcode, target); - break; } } } @@ -1269,16 +1405,6 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd) unsigned long flags; int len; - if (target->state == SRP_TARGET_CONNECTING) - goto err; - - if (target->state == SRP_TARGET_DEAD || - target->state == SRP_TARGET_REMOVED) { - scmnd->result = DID_BAD_TARGET << 16; - scmnd->scsi_done(scmnd); - return 0; - } - spin_lock_irqsave(&target->lock, flags); iu = __srp_get_tx_iu(target, SRP_IU_CMD); if (!iu) @@ -1335,7 +1461,6 @@ err_iu: err_unlock: spin_unlock_irqrestore(&target->lock, flags); -err: return SCSI_MLQUEUE_HOST_BUSY; } @@ -1589,6 +1714,7 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) if (ib_send_cm_drep(cm_id, NULL, 0)) shost_printk(KERN_ERR, target->scsi_host, PFX "Sending CM DREP failed\n"); + queue_work(system_long_wq, &target->block_work); break; case IB_CM_TIMEWAIT_EXIT: @@ -1623,10 +1749,6 @@ static int srp_send_tsk_mgmt(struct srp_target_port *target, struct srp_iu *iu; struct srp_tsk_mgmt *tsk_mgmt; - if (target->state == SRP_TARGET_DEAD || - target->state == SRP_TARGET_REMOVED) - return -1; - init_completion(&target->tsk_mgmt_done); spin_lock_irq(&target->lock); @@ -1669,7 +1791,7 @@ static int srp_abort(struct scsi_cmnd *scmnd) shost_printk(KERN_ERR, target->scsi_host, "SRP abort called\n"); - if (!req || target->qp_in_error) + if (!req) return FAILED; if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun, SRP_TSK_ABORT_TASK)) @@ -1693,8 +1815,6 @@ static int srp_reset_device(struct scsi_cmnd *scmnd) shost_printk(KERN_ERR, target->scsi_host, "SRP reset_device called\n"); - if (target->qp_in_error) - return FAILED; if (srp_send_tsk_mgmt(target, SRP_TAG_NO_REQ, scmnd->device->lun, SRP_TSK_LUN_RESET)) return FAILED; @@ -1713,12 +1833,38 @@ static int srp_reset_device(struct scsi_cmnd *scmnd) static int srp_reset_host(struct scsi_cmnd *scmnd) { struct srp_target_port *target = host_to_target(scmnd->device->host); + struct srp_host *host = target->srp_host; int ret = FAILED; + bool remove = false; shost_printk(KERN_ERR, target->scsi_host, PFX "SRP reset_host called\n"); - if (!srp_reconnect_target(target)) + if (!srp_reconnect_target(target)) { ret = SUCCESS; + } else { + /* + * We couldn't reconnect, so kill our target port off. + * However, we have to defer the real removal because we + * are in the context of the SCSI error handler now, which + * will deadlock if we call scsi_remove_host(). + * + * Schedule our work inside the lock to avoid a race with + * the flush_work_sync() call in srp_remove_one(). + */ + spin_lock(&host->target_lock); + spin_lock_irq(&target->lock); + if (target->state != SRP_TARGET_REMOVED) { + target->state = SRP_TARGET_REMOVED; + remove = true; + } + spin_unlock_irq(&target->lock); + if (remove) { + shost_printk(KERN_ERR, target->scsi_host, PFX "recon" + "nect failed, removing target port.\n"); + queue_work(system_long_wq, &target->remove_work); + } + spin_unlock(&host->target_lock); + } return ret; } @@ -1822,6 +1968,66 @@ static ssize_t show_allow_ext_sg(struct device *dev, return sprintf(buf, "%s\n", target->allow_ext_sg ? "true" : "false"); } +static ssize_t show_reconnect_tmo(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct srp_target_port *target = host_to_target(class_to_shost(dev)); + + return sprintf(buf, "%d\n", target->reconnect_tmo); +} + +static ssize_t store_reconnect_tmo(struct device *dev, + struct device_attribute *attr, + const char *buf, const size_t count) +{ + struct srp_target_port *target = host_to_target(class_to_shost(dev)); + char ch[16]; + int res, tmo; + + sprintf(ch, "%.*s", (int)min(sizeof(ch) - 1, count), buf); + res = kstrtoint(ch, 0, &tmo); + if (res) + goto out; + target->reconnect_tmo = tmo; + if (tmo > 0 && target->state == SRP_TARGET_BLOCKED) + queue_delayed_work(system_long_wq, &target->reconnect_work, + tmo * HZ); + else if (tmo <= 0) + cancel_delayed_work(&target->reconnect_work); + res = count; +out: + return res; +} + +static ssize_t show_failed_reconnects(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct srp_target_port *target = host_to_target(class_to_shost(dev)); + + return sprintf(buf, "%d\n", target->failed_reconnects); +} + +static ssize_t show_max_reconnects(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct srp_target_port *target = host_to_target(class_to_shost(dev)); + + return sprintf(buf, "%d\n", target->max_reconnects); +} + +static ssize_t store_max_reconnects(struct device *dev, + struct device_attribute *attr, + const char *buf, const size_t count) +{ + struct srp_target_port *target = host_to_target(class_to_shost(dev)); + char ch[16]; + int res; + + sprintf(ch, "%.*s", (int)min(sizeof(ch) - 1, count), buf); + res = kstrtoint(ch, 0, &target->max_reconnects); + return res == 0 ? count : res; +} + static DEVICE_ATTR(id_ext, S_IRUGO, show_id_ext, NULL); static DEVICE_ATTR(ioc_guid, S_IRUGO, show_ioc_guid, NULL); static DEVICE_ATTR(service_id, S_IRUGO, show_service_id, NULL); @@ -1834,6 +2040,11 @@ static DEVICE_ATTR(local_ib_port, S_IRUGO, show_local_ib_port, NULL); static DEVICE_ATTR(local_ib_device, S_IRUGO, show_local_ib_device, NULL); static DEVICE_ATTR(cmd_sg_entries, S_IRUGO, show_cmd_sg_entries, NULL); static DEVICE_ATTR(allow_ext_sg, S_IRUGO, show_allow_ext_sg, NULL); +static DEVICE_ATTR(reconnect_tmo, S_IRUGO | S_IWUSR, show_reconnect_tmo, + store_reconnect_tmo); +static DEVICE_ATTR(failed_reconnects, S_IRUGO, show_failed_reconnects, NULL); +static DEVICE_ATTR(max_reconnects, S_IRUGO | S_IWUSR, show_max_reconnects, + store_max_reconnects); static struct device_attribute *srp_host_attrs[] = { &dev_attr_id_ext, @@ -1848,6 +2059,9 @@ static struct device_attribute *srp_host_attrs[] = { &dev_attr_local_ib_device, &dev_attr_cmd_sg_entries, &dev_attr_allow_ext_sg, + &dev_attr_reconnect_tmo, + &dev_attr_failed_reconnects, + &dev_attr_max_reconnects, NULL }; @@ -1869,6 +2083,10 @@ static struct scsi_host_template srp_template = { .shost_attrs = srp_host_attrs }; +static struct srp_function_template ib_srp_transport_functions = { + .rport_recovery_timedout = srp_recovery_timedout, +}; + static int srp_add_target(struct srp_host *host, struct srp_target_port *target) { struct srp_rport_identifiers ids; @@ -1889,14 +2107,17 @@ static int srp_add_target(struct srp_host *host, struct srp_target_port *target) return PTR_ERR(rport); } + rport->ft = &ib_srp_transport_functions; + rport->lld_data = target; + spin_lock(&host->target_lock); list_add_tail(&target->list, &host->target_list); spin_unlock(&host->target_lock); target->state = SRP_TARGET_LIVE; + target->rport = rport; - scsi_scan_target(&target->scsi_host->shost_gendev, - 0, target->scsi_id, SCAN_WILD_CARD, 0); + srp_scan_target(target); return 0; } @@ -2213,6 +2434,13 @@ static ssize_t srp_create_target(struct device *dev, sizeof (struct srp_indirect_buf) + target->cmd_sg_cnt * sizeof (struct srp_direct_buf); + mutex_init(&target->mutex); + INIT_WORK(&target->block_work, srp_block_work); + INIT_WORK(&target->remove_work, srp_remove_work); + INIT_WORK(&target->scan_work, srp_scan_work); + INIT_DELAYED_WORK(&target->reconnect_work, srp_reconnect_work); + target->reconnect_tmo = 10; + target->max_reconnects = -1; spin_lock_init(&target->lock); INIT_LIST_HEAD(&target->free_tx); INIT_LIST_HEAD(&target->free_reqs); @@ -2257,7 +2485,8 @@ static ssize_t srp_create_target(struct device *dev, if (ret) goto err_free_ib; - target->qp_in_error = 0; + target->state = SRP_TARGET_CONNECTING; + ret = srp_connect_target(target); if (ret) { shost_printk(KERN_ERR, target->scsi_host, @@ -2448,8 +2677,7 @@ static void srp_remove_one(struct ib_device *device) { struct srp_device *srp_dev; struct srp_host *host, *tmp_host; - LIST_HEAD(target_list); - struct srp_target_port *target, *tmp_target; + struct srp_target_port *target; srp_dev = ib_get_client_data(device, &srp_client); @@ -2462,36 +2690,31 @@ static void srp_remove_one(struct ib_device *device) wait_for_completion(&host->released); /* - * Mark all target ports as removed, so we stop queueing - * commands and don't try to reconnect. + * Mark all target ports as removed so we don't try to + * reconnect. Wait for any removal tasks that may have + * started before we marked our target ports as + * removed. */ spin_lock(&host->target_lock); - list_for_each_entry(target, &host->target_list, list) { + while (!list_empty(&host->target_list)) { + target = list_first_entry(&host->target_list, + struct srp_target_port, list); spin_lock_irq(&target->lock); target->state = SRP_TARGET_REMOVED; spin_unlock_irq(&target->lock); + if (work_pending(&target->remove_work)) { + spin_unlock(&host->target_lock); + flush_work_sync(&target->remove_work); + spin_lock(&host->target_lock); + } else { + list_del(&target->list); + spin_unlock(&host->target_lock); + srp_remove_target(target); + spin_lock(&host->target_lock); + } } spin_unlock(&host->target_lock); - /* - * Wait for any reconnection tasks that may have - * started before we marked our target ports as - * removed, and any target port removal tasks. - */ - flush_workqueue(ib_wq); - - list_for_each_entry_safe(target, tmp_target, - &host->target_list, list) { - srp_del_scsi_host_attr(target->scsi_host); - srp_remove_host(target->scsi_host); - scsi_remove_host(target->scsi_host); - srp_disconnect_target(target); - ib_destroy_cm_id(target->cm_id); - srp_free_target_ib(target); - srp_free_req_data(target); - scsi_host_put(target->scsi_host); - } - kfree(host); } @@ -2503,9 +2726,6 @@ static void srp_remove_one(struct ib_device *device) kfree(srp_dev); } -static struct srp_function_template ib_srp_transport_functions = { -}; - static int __init srp_init_module(void) { int ret; diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h index f010bd9..5e30cff 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.h +++ b/drivers/infiniband/ulp/srp/ib_srp.h @@ -78,11 +78,20 @@ enum { SRP_MAP_NO_FMR = 1, }; +/** + * @SRP_TARGET_LIVE: IB RC connection is established and SCSI host is unblocked. + * @SRP_TARGET_CONNECTING: IB RC connection being established. + * @SRP_TARGET_BLOCKED: An IB RC error occurred. Recovery timer may be running. + * SCSI host is blocked. + * @SRP_TARGET_DISCONNECTED: IB RC connection has been disconnected. + * @SRP_TARGET_REMOVED: SCSI host removal is pending. + */ enum srp_target_state { SRP_TARGET_LIVE, SRP_TARGET_CONNECTING, - SRP_TARGET_DEAD, - SRP_TARGET_REMOVED + SRP_TARGET_BLOCKED, + SRP_TARGET_DISCONNECTED, + SRP_TARGET_REMOVED, }; enum srp_iu_type { @@ -155,6 +164,7 @@ struct srp_target_port { u16 io_class; struct srp_host *srp_host; struct Scsi_Host *scsi_host; + struct srp_rport *rport; char target_name[32]; unsigned int scsi_id; unsigned int sg_tablesize; @@ -174,15 +184,22 @@ struct srp_target_port { struct srp_iu *rx_ring[SRP_RQ_SIZE]; struct srp_request req_ring[SRP_CMD_SQ_SIZE]; - struct work_struct work; - struct list_head list; struct completion done; int status; - int qp_in_error; + struct mutex mutex; struct completion tsk_mgmt_done; u8 tsk_mgmt_status; + + struct work_struct block_work; + struct work_struct remove_work; + struct work_struct scan_work; + + int reconnect_tmo; + int max_reconnects; + int failed_reconnects; + struct delayed_work reconnect_work; }; struct srp_iu { diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c index 1cbf097..9a57b7b 100644 --- a/drivers/scsi/scsi_transport_srp.c +++ b/drivers/scsi/scsi_transport_srp.c @@ -2,6 +2,7 @@ * SCSI RDMA (SRP) transport class * * Copyright (C) 2007 FUJITA Tomonori <tomof@acm.org> + * Copyright (C) 2011 Bart Van Assche <bvanassche@acm.org> * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License as @@ -38,7 +39,7 @@ struct srp_host_attrs { #define to_srp_host_attrs(host) ((struct srp_host_attrs *)(host)->shost_data) #define SRP_HOST_ATTRS 0 -#define SRP_RPORT_ATTRS 2 +#define SRP_RPORT_ATTRS 3 struct srp_internal { struct scsi_transport_template t; @@ -116,6 +117,158 @@ show_srp_rport_roles(struct device *dev, struct device_attribute *attr, static DEVICE_ATTR(roles, S_IRUGO, show_srp_rport_roles, NULL); +static ssize_t show_srp_rport_recovery_tmo(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct srp_rport *rport = transport_class_to_srp_rport(dev); + + return sprintf(buf, "%d\n", rport->recovery_tmo); +} + +static ssize_t store_srp_rport_recovery_tmo(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct srp_rport *rport = transport_class_to_srp_rport(dev); + char ch[16]; + int res; + int recovery_tmo; + + sprintf(ch, "%.*s", (int)min(sizeof(ch) - 1, count), buf); + res = kstrtoint(ch, 0, &recovery_tmo); + if (res) + goto out; + rport->recovery_tmo = recovery_tmo; + if (recovery_tmo > 0 && rport->state == SRP_RPORT_BLOCKED) + queue_delayed_work(system_long_wq, &rport->recovery_work, + recovery_tmo * HZ); + else if (recovery_tmo <= 0) + cancel_delayed_work(&rport->recovery_work); + res = count; +out: + return res; +} + +static DEVICE_ATTR(recovery_tmo, S_IRUGO | S_IWUSR, show_srp_rport_recovery_tmo, + store_srp_rport_recovery_tmo); + +/** + * rport_recovery_timedout() - Transport layer recovery timeout handler. + * + * If rport->ft->rport_recovery_timedout is not NULL, invoke that function. + * Otherwise unblock the SCSI host. + * + * Note: rport->ft->rport_recovery_timedout must either unblock or remove the + * SCSI host. + */ +static void rport_recovery_timedout(struct work_struct *work) +{ + struct srp_rport *rport; + + rport = container_of(to_delayed_work(work), struct srp_rport, + recovery_work); + + pr_debug("rport %p recovery timed out after %d secs\n", rport, + rport->recovery_tmo); + + BUG_ON(!rport->ft); + + if (rport->state == SRP_RPORT_BLOCKED && + rport->ft->rport_recovery_timedout) { + rport->ft->rport_recovery_timedout(rport); + } +} + +/** + * srp_unblock_rport() - Unblock an rport. + */ +void srp_unblock_rport(struct srp_rport *rport) +{ + unsigned long flags; + enum srp_rport_state prev_state; + bool unblock = false; + + pr_debug("Trying to unblock rport %p\n", rport); + + spin_lock_irqsave(&rport->lock, flags); + prev_state = rport->state; + if (prev_state == SRP_RPORT_BLOCKED) { + rport->state = SRP_RPORT_LIVE; + unblock = true; + } + spin_unlock_irqrestore(&rport->lock, flags); + + if (!unblock) { + pr_debug("Not unblocking rport %p because it is in state %d\n", + rport, prev_state); + return; + } + + cancel_delayed_work(&rport->recovery_work); + scsi_target_unblock(rport->dev.parent); + + pr_debug("Completed unblocking rport %p\n", rport); +} +EXPORT_SYMBOL(srp_unblock_rport); + +/** + * srp_block_rport() - Block an rport. + */ +void srp_block_rport(struct srp_rport *rport) +{ + unsigned long flags; + bool block = false; + + pr_debug("Blocking rport %p\n", rport); + + spin_lock_irqsave(&rport->lock, flags); + if (rport->state == SRP_RPORT_LIVE) { + rport->state = SRP_RPORT_BLOCKED; + block = true; + } + spin_unlock_irqrestore(&rport->lock, flags); + + scsi_target_block(rport->dev.parent); + + pr_debug("Completed blocking rport %p\n", rport); +} +EXPORT_SYMBOL(srp_block_rport); + +/** + * srp_start_recovery_timer() - Start the rport transport layer recovery timer. + */ +void srp_start_recovery_timer(struct srp_rport *rport) +{ + WARN_ON(rport->state != SRP_RPORT_BLOCKED); + + if (rport->recovery_tmo >= 0) + queue_delayed_work(system_long_wq, &rport->recovery_work, + rport->recovery_tmo * HZ); +} +EXPORT_SYMBOL(srp_start_recovery_timer); + +/** + * srp_rport_disable_recovery() - Disable the transport layer recovery timer. + */ +void srp_rport_disable_recovery(struct srp_rport *rport) +{ + struct device *dev = rport->dev.parent; + + device_remove_file(dev, &dev_attr_recovery_tmo); + cancel_delayed_work_sync(&rport->recovery_work); +} +EXPORT_SYMBOL(srp_rport_disable_recovery); + +/** + * srp_stop_rport() - Stop asynchronous work for an rport. + */ +void srp_stop_rport(struct srp_rport *rport) +{ + srp_rport_disable_recovery(rport); +} +EXPORT_SYMBOL(srp_stop_rport); + static void srp_rport_release(struct device *dev) { struct srp_rport *rport = dev_to_rport(dev); @@ -192,6 +345,11 @@ struct srp_rport *srp_rport_add(struct Scsi_Host *shost, memcpy(rport->port_id, ids->port_id, sizeof(rport->port_id)); rport->roles = ids->roles; + rport->recovery_tmo = 120; + INIT_DELAYED_WORK(&rport->recovery_work, rport_recovery_timedout); + spin_lock_init(&rport->lock); + rport->state = SRP_RPORT_LIVE; + id = atomic_inc_return(&to_srp_host_attrs(shost)->next_port_id); dev_set_name(&rport->dev, "port-%d:%d", shost->host_no, id); @@ -309,6 +467,8 @@ srp_attach_transport(struct srp_function_template *ft) count = 0; i->rport_attrs[count++] = &dev_attr_port_id; i->rport_attrs[count++] = &dev_attr_roles; + if (ft->rport_recovery_timedout) + i->rport_attrs[count++] = &dev_attr_recovery_tmo; i->rport_attrs[count++] = NULL; WARN_ON(count > ARRAY_SIZE(i->rport_attrs)); diff --git a/include/scsi/scsi_transport_srp.h b/include/scsi/scsi_transport_srp.h index 9c60ca1..fcda8e3 100644 --- a/include/scsi/scsi_transport_srp.h +++ b/include/scsi/scsi_transport_srp.h @@ -8,19 +8,45 @@ #define SRP_RPORT_ROLE_INITIATOR 0 #define SRP_RPORT_ROLE_TARGET 1 +/** + * enum srp_rport_state - Rport state. + * SRP_RPORT_LIVE: SCSI host logged in. + * SRP_RPORT_BLOCKED: SCSI host blocked because of a transport layer issue. + */ +enum srp_rport_state { + SRP_RPORT_LIVE, + SRP_RPORT_BLOCKED, +}; + struct srp_rport_identifiers { u8 port_id[16]; u8 roles; }; struct srp_rport { + /* for initiator and target drivers */ + + struct srp_function_template *ft; + struct device dev; u8 port_id[16]; u8 roles; + + /* for initiator drivers */ + + void *lld_data; /* LLD private data */ + + spinlock_t lock; + enum srp_rport_state state; + + int recovery_tmo; + struct delayed_work recovery_work; }; struct srp_function_template { + /* for initiator drivers */ + void (*rport_recovery_timedout) (struct srp_rport *rport); /* for target drivers */ int (* tsk_mgmt_response)(struct Scsi_Host *, u64, u64, int); int (* it_nexus_response)(struct Scsi_Host *, u64, int); @@ -33,6 +59,11 @@ extern void srp_release_transport(struct scsi_transport_template *); extern struct srp_rport *srp_rport_add(struct Scsi_Host *, struct srp_rport_identifiers *); extern void srp_rport_del(struct srp_rport *); +extern void srp_unblock_rport(struct srp_rport *rport); +extern void srp_block_rport(struct srp_rport *rport); +extern void srp_start_recovery_timer(struct srp_rport *rport); +extern void srp_rport_disable_recovery(struct srp_rport *rport); +extern void srp_stop_rport(struct srp_rport *rport); extern void srp_remove_host(struct Scsi_Host *); -- 1.7.3.4 ^ permalink raw reply related [flat|nested] 36+ messages in thread
[parent not found: <201112012010.37276.bvanassche-HInyCGIudOg@public.gmane.org>]
* Re: [PATCH 12/14] ib_srp: Rework error handling [not found] ` <201112012010.37276.bvanassche-HInyCGIudOg@public.gmane.org> @ 2011-12-15 20:20 ` David Dillow 2011-12-19 3:36 ` David Dillow 1 sibling, 0 replies; 36+ messages in thread From: David Dillow @ 2011-12-15 20:20 UTC (permalink / raw) To: Bart Van Assche Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fujita Tomonori, Brian King, Roland Dreier, linux-scsi-u79uwXL29TY76Z2rM5mHXA On Thu, 2011-12-01 at 20:10 +0100, Bart Van Assche wrote: > Add the necessary functions in the SRP transport module to allow > an SRP initiator driver to implement transport layer recovery. I've run out of time today to look further, but I did want to point out that I think you're doing way too much in this patch. I like that you're moving some pieces into the transport, as it would be good to handle in-fabric/out-of-fabric there as well, as done for the FC transport. Getting there needs to be done through a series of patches rather one big patches, especially since you're touching code that multiple drivers use and need to get the core changes reviewed by so many people. I would need some quite some persuasion to be comfortable making large changes to srp_transport through the IB tree without ack's by Fujita and Brian. -- Dave Dillow National Center for Computational Science Oak Ridge National Laboratory (865) 241-6602 office -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 12/14] ib_srp: Rework error handling [not found] ` <201112012010.37276.bvanassche-HInyCGIudOg@public.gmane.org> 2011-12-15 20:20 ` David Dillow @ 2011-12-19 3:36 ` David Dillow 2011-12-19 10:38 ` Bart Van Assche [not found] ` <1324265791.17849.92.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org> 1 sibling, 2 replies; 36+ messages in thread From: David Dillow @ 2011-12-19 3:36 UTC (permalink / raw) To: Bart Van Assche Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fujita Tomonori, Brian King, Roland Dreier, linux-scsi-u79uwXL29TY76Z2rM5mHXA On Thu, 2011-12-01 at 20:10 +0100, Bart Van Assche wrote: > Add the necessary functions in the SRP transport module to allow > an SRP initiator driver to implement transport layer recovery. > > Convert srp_target_port.qp_in_error into a target port state. This part should probably be folded into patch 7. > Factor out the code for removing an SRP target into the new function > srp_remove_target(). This could be a separate patch, making that easy to review. > In the ib_srp IB completion handlers, do not stop processing > completions after the first error completion or a CM DREQ has been > received. Why do this? It seems like you're just going to create a flurry of error messages when one would suffice, and we've got to tear down and reconnect anyway. > Block the SCSI target as soon as the first IB error > completion has been received. Eliminate the SCSI target state test > from srp_queuecommand(). I think you still want to check the SRP target state in srp_queuecommand() -- you schedule the blocking of the queue, but it could be a while before it happens. No sense sending down more commands when you know they are going to fail -- and if you're worried about the performance of the conditional in the hot path, we can make it unlikely(). Cleaning up the target states lets us just check for SRP_TARGET_LIVE, though you would want to change the creation sequence so that isn't the birth state of the target. > Rework ib_srp transport layer error handling. Instead of letting SCSI > commands time out if a transport layer error occurs, This is good, but should probably be part of the initial disconnect. We want to run the receive queue dry, processing responses to avoid unnecessarily resetting a command that was successful. > block the SCSI > host and try to reconnect until the reconnect timeout elapses or until > the maximum number of reconnect attempts has been exceeded, whichever > happens first. When we're blocked for a disconnected target, we may want to call the state something other than SRP_TARGET_BLOCKED. I'd like to eventually handle the case where the target leaves the fabric temporarily -- we don't necessarily disconnect in that case, but we need to block commands until it comes back or we give up on it. > /* > * Copyright (c) 2005 Cisco Systems. All rights reserved. > + * Copyright (c) 2010-2011 Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>. You've tried to add this in the past; I don't mean to deny you credit, and I certainly value your work, but I'm not sure where the threshold is for adding a copyright line. Roland, care to comment here? > +static void srp_remove_target(struct srp_target_port *target) > { > srp_del_scsi_host_attr(target->scsi_host); > + cancel_work_sync(&target->block_work); > + mutex_lock(&target->mutex); > + mutex_unlock(&target->mutex); You lock and unlock here without doing anything in the critical section. > static int srp_reset_host(struct scsi_cmnd *scmnd) > { > struct srp_target_port *target = host_to_target(scmnd->device->host); > + struct srp_host *host = target->srp_host; > int ret = FAILED; > + bool remove = false; > > shost_printk(KERN_ERR, target->scsi_host, PFX "SRP reset_host called\n"); > > - if (!srp_reconnect_target(target)) > + if (!srp_reconnect_target(target)) { > ret = SUCCESS; > + } else { > + /* > + * We couldn't reconnect, so kill our target port off. > + * However, we have to defer the real removal because we > + * are in the context of the SCSI error handler now, which > + * will deadlock if we call scsi_remove_host(). Part of me wonders if killing the SCSI host on a failed reconnect is the correct thing to do. Certainly, the checks in scsi_eh_test_devices() are going to kick the drives to offline state, but the FC transport blocks the error handling to avoid that. I think we need to give our error behavior a bit more thought. -- Dave Dillow National Center for Computational Science Oak Ridge National Laboratory (865) 241-6602 office -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 12/14] ib_srp: Rework error handling 2011-12-19 3:36 ` David Dillow @ 2011-12-19 10:38 ` Bart Van Assche 2011-12-19 22:51 ` David Dillow [not found] ` <1324265791.17849.92.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org> 1 sibling, 1 reply; 36+ messages in thread From: Bart Van Assche @ 2011-12-19 10:38 UTC (permalink / raw) To: David Dillow Cc: linux-rdma, Fujita Tomonori, Brian King, Roland Dreier, linux-scsi On Mon, Dec 19, 2011 at 3:36 AM, David Dillow <dillowda@ornl.gov> wrote: > On Thu, 2011-12-01 at 20:10 +0100, Bart Van Assche wrote: >> Rework ib_srp transport layer error handling. Instead of letting SCSI >> commands time out if a transport layer error occurs, > > This is good, but should probably be part of the initial disconnect. We > want to run the receive queue dry, processing responses to avoid > unnecessarily resetting a command that was successful. Blocking the SCSI host doesn't prevent ib_srp from continuing to process IB completion notifications. >> block the SCSI >> host and try to reconnect until the reconnect timeout elapses or until >> the maximum number of reconnect attempts has been exceeded, whichever >> happens first. > > When we're blocked for a disconnected target, we may want to call the > state something other than SRP_TARGET_BLOCKED. I'd like to eventually > handle the case where the target leaves the fabric temporarily -- we > don't necessarily disconnect in that case, but we need to block commands > until it comes back or we give up on it. The case where the target leaves the fabric temporarily should already be covered by this patch series. >> +static void srp_remove_target(struct srp_target_port *target) >> { > >> srp_del_scsi_host_attr(target->scsi_host); >> + cancel_work_sync(&target->block_work); >> + mutex_lock(&target->mutex); >> + mutex_unlock(&target->mutex); > > You lock and unlock here without doing anything in the critical section. That's on purpose. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 12/14] ib_srp: Rework error handling 2011-12-19 10:38 ` Bart Van Assche @ 2011-12-19 22:51 ` David Dillow [not found] ` <1324335083.7043.66.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org> 0 siblings, 1 reply; 36+ messages in thread From: David Dillow @ 2011-12-19 22:51 UTC (permalink / raw) To: Bart Van Assche Cc: linux-rdma@vger.kernel.org, Fujita Tomonori, Brian King, Roland Dreier, linux-scsi@vger.kernel.org On Mon, 2011-12-19 at 05:38 -0500, Bart Van Assche wrote: > On Mon, Dec 19, 2011 at 3:36 AM, David Dillow <dillowda@ornl.gov> wrote: > > On Thu, 2011-12-01 at 20:10 +0100, Bart Van Assche wrote: > >> Rework ib_srp transport layer error handling. Instead of letting SCSI > >> commands time out if a transport layer error occurs, > > > > This is good, but should probably be part of the initial disconnect. We > > want to run the receive queue dry, processing responses to avoid > > unnecessarily resetting a command that was successful. > > Blocking the SCSI host doesn't prevent ib_srp from continuing to > process IB completion notifications. In your series, it doesn't -- and I was wrong about the message spam, you check for the proper state and the work being queued. I haven't parsed it all out from your changes just yet, but I think part of the reason you may have had problems with req->scmd being null in srp_handle_recv() is due to a new race between the tear down of the connection and continuing to process completion notifications. > >> block the SCSI > >> host and try to reconnect until the reconnect timeout elapses or until > >> the maximum number of reconnect attempts has been exceeded, whichever > >> happens first. > > > > When we're blocked for a disconnected target, we may want to call the > > state something other than SRP_TARGET_BLOCKED. I'd like to eventually > > handle the case where the target leaves the fabric temporarily -- we > > don't necessarily disconnect in that case, but we need to block commands > > until it comes back or we give up on it. > > The case where the target leaves the fabric temporarily should already > be covered by this patch series. Only once a command is sent to it and timesout or we get a QP error. The idea here would be to start the dev_loss_tmo timer once it was detected that it left the fabric. This was handled in OFED via sysfs attributes, but it seems like we could register for the events ourselves. This isn't something I expect you to implement, but I'd like to leave room for it in the future. > >> +static void srp_remove_target(struct srp_target_port *target) > >> { > > > >> srp_del_scsi_host_attr(target->scsi_host); > >> + cancel_work_sync(&target->block_work); > >> + mutex_lock(&target->mutex); > >> + mutex_unlock(&target->mutex); > > > > You lock and unlock here without doing anything in the critical section. > > That's on purpose. I'm assuming you do this to ensure that everyone has seen the appropriate state and exited the critical section, then? Best to add a comment to that effect. Actually, is there any reason to unlock it again since we're removing the target? I don't have the code in front of me ATM, so I'm assuming that you don't call any other routines that need the lock after this. sparse may complain, though. -- Dave Dillow National Center for Computational Science Oak Ridge National Laboratory (865) 241-6602 office ^ permalink raw reply [flat|nested] 36+ messages in thread
[parent not found: <1324335083.7043.66.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>]
* Re: [PATCH 12/14] ib_srp: Rework error handling [not found] ` <1324335083.7043.66.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org> @ 2011-12-20 9:01 ` Bart Van Assche [not found] ` <CAO+b5-qF2taG0B4n9SBwqnuh0wajH5fXFLTb-VAaDrfT9TZ6aQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 36+ messages in thread From: Bart Van Assche @ 2011-12-20 9:01 UTC (permalink / raw) To: David Dillow Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Fujita Tomonori, Brian King, Roland Dreier, linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Mon, Dec 19, 2011 at 10:51 PM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote: > I haven't parsed it all out from your changes just yet, but I think part > of the reason you may have had problems with req->scmd being null in > srp_handle_recv() is due to a new race between the tear down of the > connection and continuing to process completion notifications. The resources used by the QP completion handler are the srp_target port data structure and the QP data structure. And as you can see in srp_remove_target(), the scsi_host_put() call is invoked after srp_disconnect_target(). That last function waits for the DREP notification from the IB CM. That should guarantee that no IB completions arrive during or after the scsi_host_put() call. Or did I miss something ? Note: I haven't observed the req->scmd == NULL condition yet. > > > You lock and unlock here without doing anything in the critical section. > > > > That's on purpose. > > I'm assuming you do this to ensure that everyone has seen the > appropriate state and exited the critical section, then? Best to add a > comment to that effect. OK, will do. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
[parent not found: <CAO+b5-qF2taG0B4n9SBwqnuh0wajH5fXFLTb-VAaDrfT9TZ6aQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH 12/14] ib_srp: Rework error handling [not found] ` <CAO+b5-qF2taG0B4n9SBwqnuh0wajH5fXFLTb-VAaDrfT9TZ6aQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2011-12-21 3:33 ` David Dillow [not found] ` <1324438387.7621.53.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org> 0 siblings, 1 reply; 36+ messages in thread From: David Dillow @ 2011-12-21 3:33 UTC (permalink / raw) To: Bart Van Assche Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Fujita Tomonori, Brian King, Roland Dreier, linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Tue, 2011-12-20 at 09:01 +0000, Bart Van Assche wrote: > On Mon, Dec 19, 2011 at 10:51 PM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote: > > I haven't parsed it all out from your changes just yet, but I think part > > of the reason you may have had problems with req->scmd being null in > > srp_handle_recv() is due to a new race between the tear down of the > > connection and continuing to process completion notifications. > > The resources used by the QP completion handler are the srp_target > port data structure and the QP data structure. And as you can see in > srp_remove_target(), the scsi_host_put() call is invoked after > srp_disconnect_target(). That last function waits for the DREP > notification from the IB CM. That should guarantee that no IB > completions arrive during or after the scsi_host_put() call. Or did I > miss something ? What keeps the srp_recv_completion() --> srp_handle_recv() --> srp_process_rsp() --> etc. call chain from racing with srp_reconnect_target()? It looks like the srp_reset_req() in srp_reconnect_tartget() could race with srp_process_rsp() if receive completions continue to be processed after a send failure, but perhaps it is I who is missing something. -- Dave Dillow National Center for Computational Science Oak Ridge National Laboratory (865) 241-6602 office -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
[parent not found: <1324438387.7621.53.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>]
* Re: [PATCH 12/14] ib_srp: Rework error handling [not found] ` <1324438387.7621.53.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org> @ 2011-12-21 13:26 ` Bart Van Assche 0 siblings, 0 replies; 36+ messages in thread From: Bart Van Assche @ 2011-12-21 13:26 UTC (permalink / raw) To: David Dillow Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Fujita Tomonori, Brian King, Roland Dreier, linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Wed, Dec 21, 2011 at 3:33 AM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote: > What keeps the srp_recv_completion() --> srp_handle_recv() --> > srp_process_rsp() --> etc. call chain from racing with > srp_reconnect_target()? > > It looks like the srp_reset_req() in srp_reconnect_target() could race > with srp_process_rsp() if receive completions continue to be processed > after a send failure, but perhaps it is I who is missing something. In the tests I ran so far no issues came up most likely because the decision to reconnect is taken a long time after the most recent response has been received. I'll see what has to change to make this safe even if there wouldn't be any such delay. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
[parent not found: <1324265791.17849.92.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>]
* Re: [PATCH 12/14] ib_srp: Rework error handling [not found] ` <1324265791.17849.92.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org> @ 2011-12-26 19:13 ` Bart Van Assche 0 siblings, 0 replies; 36+ messages in thread From: Bart Van Assche @ 2011-12-26 19:13 UTC (permalink / raw) To: David Dillow Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fujita Tomonori, Brian King, Roland Dreier, linux-scsi-u79uwXL29TY76Z2rM5mHXA On Mon, Dec 19, 2011 at 3:36 AM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote: > On Thu, 2011-12-01 at 20:10 +0100, Bart Van Assche wrote: > > /* > > * Copyright (c) 2005 Cisco Systems. All rights reserved. > > + * Copyright (c) 2010-2011 Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>. > > You've tried to add this in the past; I don't mean to deny you credit, > and I certainly value your work, but I'm not sure where the threshold is > for adding a copyright line. The excerpts below might provide some guidance (source: "The Copyright Handbook: What Every Writer Needs to Know"): [ ... ] There are strict technical requirements as to what a copyright notice must contain. Follow these rules exactly or your notice may be found to be invalid and not accomplish its intended purpose. A valid copyright notice contains three elements: * the copyright symbol * the year in which the work was published * the name of the copyright owner. It is not required that these elements appear in any particular order in the notice, but most notices are written in the order set forth above. We'll discuss each element in turn. [ ... ] If there are multiple authors, they should all be listed in the copyright notice. The authors' names can appear in any order. [ ... ] If I interpret the above correctly, at least two names are missing in the copyright header of the source file ib_srp.c. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
[parent not found: <201112011954.25811.bvanassche-HInyCGIudOg@public.gmane.org>]
* [PATCH 10/14] srp_transport: Simplify attribute initialization code [not found] ` <201112011954.25811.bvanassche-HInyCGIudOg@public.gmane.org> @ 2011-12-01 19:08 ` Bart Van Assche [not found] ` <201112012008.00502.bvanassche-HInyCGIudOg@public.gmane.org> 2011-12-01 19:11 ` [PATCH 13/14] ib_srp: Implement transport layer ping Bart Van Assche 1 sibling, 1 reply; 36+ messages in thread From: Bart Van Assche @ 2011-12-01 19:08 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fujita Tomonori, Brian King, David Dillow, Roland Dreier Cc: linux-scsi-u79uwXL29TY76Z2rM5mHXA Eliminate the private_rport_attrs[] array and the SETUP_*() macros used to set up that array since the information in that array duplicates the information in the static device attributes. Also, trigger a kernel warning if SRP_RPORT_ATTRS is not large enough since it is easy to forget to update that macro when adding new attributes. Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> Cc: FUJITA Tomonori <fujita.tomonori-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> Cc: Brian King <brking-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org> --- drivers/scsi/scsi_transport_srp.c | 26 ++++---------------------- 1 files changed, 4 insertions(+), 22 deletions(-) diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c index 07c4394..1cbf097 100644 --- a/drivers/scsi/scsi_transport_srp.c +++ b/drivers/scsi/scsi_transport_srp.c @@ -47,7 +47,6 @@ struct srp_internal { struct device_attribute *host_attrs[SRP_HOST_ATTRS + 1]; struct device_attribute *rport_attrs[SRP_RPORT_ATTRS + 1]; - struct device_attribute private_rport_attrs[SRP_RPORT_ATTRS]; struct transport_container rport_attr_cont; }; @@ -72,24 +71,6 @@ static DECLARE_TRANSPORT_CLASS(srp_host_class, "srp_host", srp_host_setup, static DECLARE_TRANSPORT_CLASS(srp_rport_class, "srp_remote_ports", NULL, NULL, NULL); -#define SETUP_TEMPLATE(attrb, field, perm, test, ro_test, ro_perm) \ - i->private_##attrb[count] = dev_attr_##field; \ - i->private_##attrb[count].attr.mode = perm; \ - if (ro_test) { \ - i->private_##attrb[count].attr.mode = ro_perm; \ - i->private_##attrb[count].store = NULL; \ - } \ - i->attrb[count] = &i->private_##attrb[count]; \ - if (test) \ - count++ - -#define SETUP_RPORT_ATTRIBUTE_RD(field) \ - SETUP_TEMPLATE(rport_attrs, field, S_IRUGO, 1, 0, 0) - -#define SETUP_RPORT_ATTRIBUTE_RW(field) \ - SETUP_TEMPLATE(rport_attrs, field, S_IRUGO | S_IWUSR, \ - 1, 1, S_IRUGO) - #define SRP_PID(p) \ (p)->port_id[0], (p)->port_id[1], (p)->port_id[2], (p)->port_id[3], \ (p)->port_id[4], (p)->port_id[5], (p)->port_id[6], (p)->port_id[7], \ @@ -326,9 +307,10 @@ srp_attach_transport(struct srp_function_template *ft) i->rport_attr_cont.ac.match = srp_rport_match; count = 0; - SETUP_RPORT_ATTRIBUTE_RD(port_id); - SETUP_RPORT_ATTRIBUTE_RD(roles); - i->rport_attrs[count] = NULL; + i->rport_attrs[count++] = &dev_attr_port_id; + i->rport_attrs[count++] = &dev_attr_roles; + i->rport_attrs[count++] = NULL; + WARN_ON(count > ARRAY_SIZE(i->rport_attrs)); transport_container_register(&i->rport_attr_cont); -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 36+ messages in thread
[parent not found: <201112012008.00502.bvanassche-HInyCGIudOg@public.gmane.org>]
* Re: [PATCH 10/14] srp_transport: Simplify attribute initialization code [not found] ` <201112012008.00502.bvanassche-HInyCGIudOg@public.gmane.org> @ 2011-12-19 0:07 ` David Dillow [not found] ` <1324253243.17849.45.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org> 0 siblings, 1 reply; 36+ messages in thread From: David Dillow @ 2011-12-19 0:07 UTC (permalink / raw) To: Bart Van Assche Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fujita Tomonori, Brian King, Roland Dreier, linux-scsi-u79uwXL29TY76Z2rM5mHXA On Thu, 2011-12-01 at 20:08 +0100, Bart Van Assche wrote: > Eliminate the private_rport_attrs[] array and the SETUP_*() macros > used to set up that array since the information in that array > duplicates the information in the static device attributes. Also, > trigger a kernel warning if SRP_RPORT_ATTRS is not large enough > since it is easy to forget to update that macro when adding new > attributes. Not sure I care one way or the other, but it looks to have been written the way it was to share a common code structure with the FC, SAS, and other SCSI transport code. They use a BUG_ON instead of WARN_ON, which seems more appropriate, since we just overran a buffer... The SCSI guys need to review this one, as it needs their ack. -- Dave Dillow National Center for Computational Science Oak Ridge National Laboratory (865) 241-6602 office -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
[parent not found: <1324253243.17849.45.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>]
* Re: [PATCH 10/14] srp_transport: Simplify attribute initialization code [not found] ` <1324253243.17849.45.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org> @ 2011-12-20 10:21 ` Bart Van Assche 2011-12-21 3:23 ` David Dillow 0 siblings, 1 reply; 36+ messages in thread From: Bart Van Assche @ 2011-12-20 10:21 UTC (permalink / raw) To: David Dillow Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fujita Tomonori, Brian King, Roland Dreier, linux-scsi-u79uwXL29TY76Z2rM5mHXA On Mon, Dec 19, 2011 at 12:07 AM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote: > On Thu, 2011-12-01 at 20:08 +0100, Bart Van Assche wrote: >> Eliminate the private_rport_attrs[] array and the SETUP_*() macros >> used to set up that array since the information in that array >> duplicates the information in the static device attributes. Also, >> trigger a kernel warning if SRP_RPORT_ATTRS is not large enough >> since it is easy to forget to update that macro when adding new >> attributes. > > Not sure I care one way or the other, but it looks to have been written > the way it was to share a common code structure with the FC, SAS, and > other SCSI transport code. My opinion about the SETUP_*() macros is that - at least in the SRP transport code - they make the code harder to read without having any real value and hence should be removed. > They use a BUG_ON instead of WARN_ON, which > seems more appropriate, since we just overran a buffer... Sorry, but I disagree. The guideline for Linux kernel code is to use BUG_ON() only if a crash can't be avoided. With SLUB red zoning enabled it is still possible to unload kernel modules after a (mild) buffer overrun. And if you have a look at mm/slub.c, you will see that slab_err() is invoked in case padding has been overwritten. That last function does something similar to WARN_ON(). Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 10/14] srp_transport: Simplify attribute initialization code 2011-12-20 10:21 ` Bart Van Assche @ 2011-12-21 3:23 ` David Dillow 0 siblings, 0 replies; 36+ messages in thread From: David Dillow @ 2011-12-21 3:23 UTC (permalink / raw) To: Bart Van Assche Cc: linux-rdma, Fujita Tomonori, Brian King, Roland Dreier, linux-scsi On Tue, 2011-12-20 at 10:21 +0000, Bart Van Assche wrote: > On Mon, Dec 19, 2011 at 12:07 AM, David Dillow <dillowda@ornl.gov> wrote: > > They use a BUG_ON instead of WARN_ON, which > > seems more appropriate, since we just overran a buffer... > > Sorry, but I disagree. The guideline for Linux kernel code is to use > BUG_ON() only if a crash can't be avoided. With SLUB red zoning > enabled it is still possible to unload kernel modules after a (mild) > buffer overrun. And if you have a look at mm/slub.c, you will see that > slab_err() is invoked in case padding has been overwritten. That last > function does something similar to WARN_ON(). You aren't guaranteed to have SLUB as your allocator, much less having the red-zoning turned on, so you have no guarantee it is safe to continue. -- Dave Dillow National Center for Computational Science Oak Ridge National Laboratory (865) 241-6602 office ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 13/14] ib_srp: Implement transport layer ping [not found] ` <201112011954.25811.bvanassche-HInyCGIudOg@public.gmane.org> 2011-12-01 19:08 ` [PATCH 10/14] srp_transport: Simplify attribute initialization code Bart Van Assche @ 2011-12-01 19:11 ` Bart Van Assche 2011-12-19 0:50 ` David Dillow 1 sibling, 1 reply; 36+ messages in thread From: Bart Van Assche @ 2011-12-01 19:11 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA Cc: linux-scsi-u79uwXL29TY76Z2rM5mHXA, David Dillow, Roland Dreier, Fujita Tomonori, Brian King Add a time-based transport layer test such that fail-over in a multipath setup can happen quickly. Add the necessary functions in the SRP transport module to allow an SRP initiator driver to implement this. Add a slave_delete callback in the SCSI host template such that SCSI hosts that hold a reference to a SCSI device can be deleted via the sysfs SCSI host "delete" attribute. Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org> Cc: FUJITA Tomonori <fujita.tomonori-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> Cc: Brian King <brking-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> --- Documentation/ABI/stable/sysfs-transport-srp | 18 +++ drivers/infiniband/ulp/srp/ib_srp.c | 48 ++++++++ drivers/scsi/scsi_sysfs.c | 7 +- drivers/scsi/scsi_transport_srp.c | 168 +++++++++++++++++++++++++- include/scsi/scsi_host.h | 9 ++ include/scsi/scsi_transport_srp.h | 9 ++ 6 files changed, 257 insertions(+), 2 deletions(-) diff --git a/Documentation/ABI/stable/sysfs-transport-srp b/Documentation/ABI/stable/sysfs-transport-srp index 7b0d4a5..d8a9048 100644 --- a/Documentation/ABI/stable/sysfs-transport-srp +++ b/Documentation/ABI/stable/sysfs-transport-srp @@ -1,3 +1,21 @@ +What: /sys/class/srp_remote_ports/port-<h>:<n>/ping_interval +Date: November 1, 2011 +KernelVersion: 3.3 +Contact: linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org +Description: Time in seconds between two sucessive ping attempts. Setting + this parameter to zero or a negative value disables the ping + mechanism. + +What: /sys/class/srp_remote_ports/port-<h>:<n>/ping_timeout +Date: November 1, 2011 +KernelVersion: 3.3 +Contact: linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org +Description: If more time has elapsed than the specified number of seconds + since the latest successful ping attempt, the SRP initiator + driver that enabled this feature is informed about a transport + layer timeout by invoking its rport_ping_timedout callback + function. + What: /sys/class/srp_remote_ports/port-<h>:<n>/port_id Date: June 27, 2007 KernelVersion: 2.6.24 diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 797a8f1..82057f2 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -507,6 +507,26 @@ static void srp_del_scsi_host_attr(struct Scsi_Host *shost) } /** + * srp_disable_ping() - Stop pinging a target. + * + * Note: Can be invoked concurrently via the SCSI host sysfs attribute "delete" + * and one of the rport callback functions. + */ +static void srp_disable_ping(struct scsi_device *sdev) +{ + struct Scsi_Host *shost = sdev->host; + struct srp_target_port *target = host_to_target(shost); + struct srp_rport *rport = target->rport; + + if (rport->sdev == sdev) { + pr_debug("Disabling pinging rport %p via sdev %p\n", rport, + sdev); + srp_rport_set_sdev(rport, NULL); + srp_rport_disable_ping(rport); + } +} + +/* * srp_remove_target() - Remove an SRP target. * * The strategy to remove a target is as follows: @@ -522,6 +542,8 @@ static void srp_del_scsi_host_attr(struct Scsi_Host *shost) */ static void srp_remove_target(struct srp_target_port *target) { + struct srp_rport *rport = target->rport; + WARN_ON(target->state != SRP_TARGET_REMOVED); srp_del_scsi_host_attr(target->scsi_host); @@ -529,6 +551,8 @@ static void srp_remove_target(struct srp_target_port *target) mutex_lock(&target->mutex); mutex_unlock(&target->mutex); srp_unblock_rport(rport); + if (rport->sdev) + srp_disable_ping(rport->sdev); srp_rport_disable_recovery(target->rport); cancel_delayed_work_sync(&target->reconnect_work); srp_stop_rport(target->rport); @@ -555,6 +579,21 @@ static void srp_remove_work(struct work_struct *work) srp_remove_target(target); } +static void srp_ping_timedout(struct srp_rport *rport) +{ + struct srp_target_port *target = rport->lld_data; + + pr_debug("ping timeout: rport = %p; target = %p / state %d\n", + rport, target, target->state); + + mutex_lock(&target->mutex); + if (srp_change_state(target, SRP_TARGET_LIVE, SRP_TARGET_BLOCKED)) { + srp_block_rport(rport); + srp_start_recovery_timer(target->rport); + } + mutex_unlock(&target->mutex); +} + static void srp_recovery_timedout(struct srp_rport *rport) { struct srp_target_port *target = rport->lld_data; @@ -1506,6 +1545,13 @@ static int srp_slave_alloc(struct scsi_device *sdev) { struct Scsi_Host *shost = sdev->host; struct srp_target_port *target = host_to_target(shost); + struct srp_rport *rport = target->rport; + + if (!rport->sdev) { + pr_debug("Enable pinging rport %p through sdev %p\n", rport, + sdev); + srp_rport_set_sdev(rport, sdev); + } if (!WARN_ON(target->rq_tmo_jiffies == 0)) blk_queue_rq_timeout(sdev->request_queue, @@ -2070,6 +2116,7 @@ static struct scsi_host_template srp_template = { .name = "InfiniBand SRP initiator", .proc_name = DRV_NAME, .slave_alloc = srp_slave_alloc, + .slave_delete = srp_disable_ping, .info = srp_target_info, .queuecommand = srp_queuecommand, .eh_abort_handler = srp_abort, @@ -2084,6 +2131,7 @@ static struct scsi_host_template srp_template = { }; static struct srp_function_template ib_srp_transport_functions = { + .rport_ping_timedout = srp_ping_timedout, .rport_recovery_timedout = srp_recovery_timedout, }; diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index e0bd3f7..328caa3 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -534,7 +534,12 @@ static DEVICE_ATTR(rescan, S_IWUSR, NULL, store_rescan_field); static void sdev_store_delete_callback(struct device *dev) { - scsi_remove_device(to_scsi_device(dev)); + struct scsi_device *sdev = to_scsi_device(dev); + struct scsi_host_template *sht = sdev->host->hostt; + + if (sht->slave_delete) + sht->slave_delete(sdev); + scsi_remove_device(sdev); } static ssize_t diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c index 9a57b7b..135a870 100644 --- a/drivers/scsi/scsi_transport_srp.c +++ b/drivers/scsi/scsi_transport_srp.c @@ -29,6 +29,7 @@ #include <scsi/scsi.h> #include <scsi/scsi_device.h> #include <scsi/scsi_host.h> +#include <scsi/scsi_eh.h> #include <scsi/scsi_transport.h> #include <scsi/scsi_transport_srp.h> #include "scsi_transport_srp_internal.h" @@ -39,7 +40,7 @@ struct srp_host_attrs { #define to_srp_host_attrs(host) ((struct srp_host_attrs *)(host)->shost_data) #define SRP_HOST_ATTRS 0 -#define SRP_RPORT_ATTRS 3 +#define SRP_RPORT_ATTRS 5 struct srp_internal { struct scsi_transport_template t; @@ -117,6 +118,67 @@ show_srp_rport_roles(struct device *dev, struct device_attribute *attr, static DEVICE_ATTR(roles, S_IRUGO, show_srp_rport_roles, NULL); +static ssize_t show_srp_rport_ping_interval(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct srp_rport *rport = transport_class_to_srp_rport(dev); + + return sprintf(buf, "%d\n", rport->ping_itv); +} + +static ssize_t store_srp_rport_ping_interval(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct srp_rport *rport = transport_class_to_srp_rport(dev); + char ch[16]; + int res, ping_itv; + + sprintf(ch, "%.*s", (int)min(sizeof(ch) - 1, count), buf); + res = kstrtoint(ch, 0, &ping_itv); + if (res) + goto out; + rport->ping_itv = ping_itv; + pr_debug("rport %p: new ping interval = %d seconds\n", rport, ping_itv); + if (ping_itv > 0) + queue_delayed_work(system_long_wq, &rport->ping_work, + ping_itv * HZ); + else + cancel_delayed_work(&rport->ping_work); + res = count; +out: + return res; +} + +static DEVICE_ATTR(ping_interval, S_IRUGO | S_IWUSR, + show_srp_rport_ping_interval, store_srp_rport_ping_interval); + +static ssize_t show_srp_rport_ping_timeout(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct srp_rport *rport = transport_class_to_srp_rport(dev); + + return sprintf(buf, "%d\n", rport->ping_tmo); +} + +static ssize_t store_srp_rport_ping_timeout(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct srp_rport *rport = transport_class_to_srp_rport(dev); + char ch[16]; + int res; + + sprintf(ch, "%.*s", (int)min(sizeof(ch) - 1, count), buf); + res = kstrtoint(ch, 0, &rport->ping_tmo); + return res == 0 ? count : res; +} + +static DEVICE_ATTR(ping_timeout, S_IRUGO | S_IWUSR, show_srp_rport_ping_timeout, + store_srp_rport_ping_timeout); + static ssize_t show_srp_rport_recovery_tmo(struct device *dev, struct device_attribute *attr, char *buf) @@ -194,6 +256,7 @@ void srp_unblock_rport(struct srp_rport *rport) spin_lock_irqsave(&rport->lock, flags); prev_state = rport->state; if (prev_state == SRP_RPORT_BLOCKED) { + rport->latest_ping_response = jiffies; rport->state = SRP_RPORT_LIVE; unblock = true; } @@ -249,6 +312,20 @@ void srp_start_recovery_timer(struct srp_rport *rport) EXPORT_SYMBOL(srp_start_recovery_timer); /** + * srp_rport_disable_ping() - Stop pinging and prevent reenabling pinging. + */ +void srp_rport_disable_ping(struct srp_rport *rport) +{ + struct device *dev = rport->dev.parent; + + WARN_ON(rport->state == SRP_RPORT_BLOCKED); + device_remove_file(dev, &dev_attr_ping_timeout); + device_remove_file(dev, &dev_attr_ping_interval); + cancel_delayed_work_sync(&rport->ping_work); +} +EXPORT_SYMBOL(srp_rport_disable_ping); + +/** * srp_rport_disable_recovery() - Disable the transport layer recovery timer. */ void srp_rport_disable_recovery(struct srp_rport *rport) @@ -265,10 +342,92 @@ EXPORT_SYMBOL(srp_rport_disable_recovery); */ void srp_stop_rport(struct srp_rport *rport) { + srp_rport_disable_ping(rport); srp_rport_disable_recovery(rport); } EXPORT_SYMBOL(srp_stop_rport); +/** + * srp_rport_set_sdev() - Set the SCSI device that will be used for pinging. + */ +void srp_rport_set_sdev(struct srp_rport *rport, struct scsi_device *sdev) +{ + unsigned long flags; + + if (sdev && !get_device(&sdev->sdev_dev)) + sdev = NULL; + + spin_lock_irqsave(&rport->lock, flags); + swap(rport->sdev, sdev); + spin_unlock_irqrestore(&rport->lock, flags); + + if (sdev) + put_device(&sdev->sdev_dev); +} +EXPORT_SYMBOL(srp_rport_set_sdev); + +/** + * srp_rport_get_sdev() - Get the SCSI device used for pinging. + */ +static struct scsi_device *rport_get_sdev(struct srp_rport *rport) +{ + struct scsi_device *sdev; + unsigned long flags; + + spin_lock_irqsave(&rport->lock, flags); + sdev = rport->sdev; + if (sdev && !get_device(&sdev->sdev_dev)) + sdev = NULL; + spin_unlock_irqrestore(&rport->lock, flags); + + return sdev; +} + +/** + * rport_ping() - Verify whether the transport layer is still operational. + */ +static void rport_ping(struct work_struct *work) +{ + struct scsi_sense_hdr sshdr; + struct srp_rport *rport; + struct scsi_device *sdev; + int res, itv, tmo; + + rport = container_of(work, struct srp_rport, ping_work.work); + itv = rport->ping_itv; + tmo = rport->ping_tmo; + sdev = rport_get_sdev(rport); + pr_debug("rport %p has state %d; sdev = %p; ping interval = %d\n", + rport, rport->state, sdev, itv); + if (itv <= 0) + goto out; + if (!sdev) + goto schedule; + if (rport->state == SRP_RPORT_BLOCKED) + goto put; + memset(&sshdr, 0, sizeof(sshdr)); + res = scsi_test_unit_ready(sdev, itv, 1, NULL); + pr_debug("scsi_test_unit_ready() result = 0x%x / %s%s\n", res, + scsi_sense_valid(&sshdr) ? "sense valid" : "sense not valid", + scsi_sense_valid(&sshdr) && + sshdr.sense_key == UNIT_ATTENTION ? " (unit attention)" : ""); + if (scsi_status_is_good(res) || (res & SAM_STAT_CHECK_CONDITION)) { + rport->latest_ping_response = jiffies; + } else if (tmo > 0 && + time_after(jiffies, rport->latest_ping_response + tmo)) { + shost_printk(KERN_INFO, sdev->host, + "SRP ping timeout elapsed\n"); + if (rport->ft->rport_ping_timedout) + rport->ft->rport_ping_timedout(rport); + } +put: + put_device(&sdev->sdev_dev); +schedule: + queue_delayed_work(system_long_wq, &rport->ping_work, itv * HZ); +out: + return; +} + static void srp_rport_release(struct device *dev) { struct srp_rport *rport = dev_to_rport(dev); @@ -346,6 +505,7 @@ struct srp_rport *srp_rport_add(struct Scsi_Host *shost, rport->roles = ids->roles; rport->recovery_tmo = 120; + INIT_DELAYED_WORK(&rport->ping_work, rport_ping); INIT_DELAYED_WORK(&rport->recovery_work, rport_recovery_timedout); spin_lock_init(&rport->lock); rport->state = SRP_RPORT_LIVE; @@ -353,6 +513,8 @@ struct srp_rport *srp_rport_add(struct Scsi_Host *shost, id = atomic_inc_return(&to_srp_host_attrs(shost)->next_port_id); dev_set_name(&rport->dev, "port-%d:%d", shost->host_no, id); + pr_debug("rport %p has name %s\n", rport, dev_name(&rport->dev)); + transport_setup_device(&rport->dev); ret = device_add(&rport->dev); @@ -467,6 +629,10 @@ srp_attach_transport(struct srp_function_template *ft) count = 0; i->rport_attrs[count++] = &dev_attr_port_id; i->rport_attrs[count++] = &dev_attr_roles; + if (ft->rport_ping_timedout) { + i->rport_attrs[count++] = &dev_attr_ping_interval; + i->rport_attrs[count++] = &dev_attr_ping_timeout; + } if (ft->rport_recovery_timedout) i->rport_attrs[count++] = &dev_attr_recovery_tmo; i->rport_attrs[count++] = NULL; diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h index f1f2644..16536a6 100644 --- a/include/scsi/scsi_host.h +++ b/include/scsi/scsi_host.h @@ -233,6 +233,15 @@ struct scsi_host_template { */ int (* slave_configure)(struct scsi_device *); + /** + * Callback invoked just before scsi_remove_device() will be invoked + * e.g. if device removal has been requested via the sdev "delete" + * sysfs attribute. + * + * Status: OPTIONAL + */ + void (* slave_delete)(struct scsi_device *); + /* * Immediately prior to deallocating the device and after all activity * has ceased the mid layer calls this point so that the low level diff --git a/include/scsi/scsi_transport_srp.h b/include/scsi/scsi_transport_srp.h index fcda8e3..565cb79 100644 --- a/include/scsi/scsi_transport_srp.h +++ b/include/scsi/scsi_transport_srp.h @@ -38,14 +38,20 @@ struct srp_rport { void *lld_data; /* LLD private data */ spinlock_t lock; + struct scsi_device *sdev; enum srp_rport_state state; + int ping_itv; + int ping_tmo; + unsigned long latest_ping_response; int recovery_tmo; + struct delayed_work ping_work; struct delayed_work recovery_work; }; struct srp_function_template { /* for initiator drivers */ + void (*rport_ping_timedout) (struct srp_rport *rport); void (*rport_recovery_timedout) (struct srp_rport *rport); /* for target drivers */ int (* tsk_mgmt_response)(struct Scsi_Host *, u64, u64, int); @@ -62,6 +68,9 @@ extern void srp_rport_del(struct srp_rport *); extern void srp_unblock_rport(struct srp_rport *rport); extern void srp_block_rport(struct srp_rport *rport); extern void srp_start_recovery_timer(struct srp_rport *rport); +extern void srp_rport_set_sdev(struct srp_rport *rport, + struct scsi_device *sdev); +extern void srp_rport_disable_ping(struct srp_rport *rport); extern void srp_rport_disable_recovery(struct srp_rport *rport); extern void srp_stop_rport(struct srp_rport *rport); -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 13/14] ib_srp: Implement transport layer ping 2011-12-01 19:11 ` [PATCH 13/14] ib_srp: Implement transport layer ping Bart Van Assche @ 2011-12-19 0:50 ` David Dillow 2011-12-19 10:16 ` Bart Van Assche 0 siblings, 1 reply; 36+ messages in thread From: David Dillow @ 2011-12-19 0:50 UTC (permalink / raw) To: Bart Van Assche Cc: linux-rdma, linux-scsi, Roland Dreier, Fujita Tomonori, Brian King On Thu, 2011-12-01 at 20:11 +0100, Bart Van Assche wrote: > Add a time-based transport layer test such that fail-over in a multipath > setup can happen quickly. Why should this be done in the kernel? multipathd already verifies all paths to a SCSI device are up and that the device is reachable. -- Dave Dillow National Center for Computational Science Oak Ridge National Laboratory (865) 241-6602 office ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 13/14] ib_srp: Implement transport layer ping 2011-12-19 0:50 ` David Dillow @ 2011-12-19 10:16 ` Bart Van Assche 2011-12-19 22:32 ` David Dillow 0 siblings, 1 reply; 36+ messages in thread From: Bart Van Assche @ 2011-12-19 10:16 UTC (permalink / raw) To: David Dillow Cc: linux-rdma, linux-scsi, Roland Dreier, Fujita Tomonori, Brian King On Mon, Dec 19, 2011 at 12:50 AM, David Dillow <dillowda@ornl.gov> wrote: > On Thu, 2011-12-01 at 20:11 +0100, Bart Van Assche wrote: >> Add a time-based transport layer test such that fail-over in a multipath >> setup can happen quickly. > > Why should this be done in the kernel? multipathd already verifies all > paths to a SCSI device are up and that the device is reachable. I'm afraid it's impossible to make a transport layer check work reliably from user space. As an example, srp_reset_host() blocks the SCSI host before reconnecting. Before starting to attempt to reconnect, that action does block the SCSI host and hence also all transport layer checks issued from user space. I doubt it's possible to fix the resulting race between a transport layer reconnect issued from srp_reset_host() and a transport layer reconnect triggered from user space. Bart. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 13/14] ib_srp: Implement transport layer ping 2011-12-19 10:16 ` Bart Van Assche @ 2011-12-19 22:32 ` David Dillow [not found] ` <1324333931.7043.52.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org> 0 siblings, 1 reply; 36+ messages in thread From: David Dillow @ 2011-12-19 22:32 UTC (permalink / raw) To: Bart Van Assche Cc: linux-rdma@vger.kernel.org, linux-scsi@vger.kernel.org, Roland Dreier, Fujita Tomonori, Brian King On Mon, 2011-12-19 at 05:16 -0500, Bart Van Assche wrote: > On Mon, Dec 19, 2011 at 12:50 AM, David Dillow <dillowda@ornl.gov> wrote: > > On Thu, 2011-12-01 at 20:11 +0100, Bart Van Assche wrote: > >> Add a time-based transport layer test such that fail-over in a multipath > >> setup can happen quickly. > > > > Why should this be done in the kernel? multipathd already verifies all > > paths to a SCSI device are up and that the device is reachable. > > I'm afraid it's impossible to make a transport layer check work > reliably from user space. As an example, srp_reset_host() blocks the > SCSI host before reconnecting. Before starting to attempt to > reconnect, that action does block the SCSI host and hence also all > transport layer checks issued from user space. I doubt it's possible > to fix the resulting race between a transport layer reconnect issued > from srp_reset_host() and a transport layer reconnect triggered from > user space. Part of the problem is introduced by allowing for permanent connections rather than using the familiar dev_loss_tmo and fast_io_fail_tmo parameters from other SCSI transports. For instance, in the FC transport, rports are allowed to disappear for up to dev_loss_tmo seconds before being removed from the SCSI device tree. Until they have been gone for fast_fail_io_tmo seconds, they are blocked (as is error handling to prevent offlining devices). Once they have been gone longer that fast_fail_io_tmo, they become unblocked and new IO will be failed. Now, the FC transport is probably a bit more complex than we want right now, but following it (and the SAS transport's) lead should keep us in sync with where the rest of the SCSI stack is headed. As for reliability from user space, multipathd checks that the SCSI initiator is not blocked before checking the liveness of the path; blocked paths are assumed to be down. It still seems to default to a 30 second timeout for the test TUR/directIO/etc check, and that doesn't currently look to be configurable (fixable). These timeouts are independent of the SCSI layer, and will mark a path down for new traffic without waiting for the lower level timeout. The transport layer reconnect from user space (I'm assuming you were thinking ioctl or using the sg device) would coordinated by the SCSI-midlayer, using calls to srp_reset_host(), so I think we avoid any race condition. Or did you mean a manual reconnect attempt from manipulating srp_host/X/reconnect_tmo via sysfs -- in which case it is in our code, so we can certainly avoid race conditions. I still think this is already solved in user space, but the new reconnect model you've implemented doesn't match up with the expected semantics. It'd be better to match the rest of the SCSI stack for this. -- Dave Dillow National Center for Computational Science Oak Ridge National Laboratory (865) 241-6602 office ^ permalink raw reply [flat|nested] 36+ messages in thread
[parent not found: <1324333931.7043.52.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>]
* Re: [PATCH 13/14] ib_srp: Implement transport layer ping [not found] ` <1324333931.7043.52.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org> @ 2011-12-20 10:13 ` Bart Van Assche [not found] ` <CAO+b5-qLxmcXCCxA8+bPYsinjr1eqCDO2JUJbjgVr59N55CU1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2011-12-20 10:27 ` Bart Van Assche 1 sibling, 1 reply; 36+ messages in thread From: Bart Van Assche @ 2011-12-20 10:13 UTC (permalink / raw) To: David Dillow Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Roland Dreier, Fujita Tomonori, Brian King On Mon, Dec 19, 2011 at 10:32 PM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote: > I still think this is already solved in user space, but the new > reconnect model you've implemented doesn't match up with the expected > semantics. It'd be better to match the rest of the SCSI stack for this. Sorry, but I'm afraid there is no "rest of the SCSI stack [semantics]". I've been following the semantics implemented in open-iscsi. If the semantics implemented for FC differ of those proposed in this patch set, then that means that the semantics implemented for FC differ of those for open-iscsi. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
[parent not found: <CAO+b5-qLxmcXCCxA8+bPYsinjr1eqCDO2JUJbjgVr59N55CU1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH 13/14] ib_srp: Implement transport layer ping [not found] ` <CAO+b5-qLxmcXCCxA8+bPYsinjr1eqCDO2JUJbjgVr59N55CU1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2011-12-21 2:32 ` David Dillow 0 siblings, 0 replies; 36+ messages in thread From: David Dillow @ 2011-12-21 2:32 UTC (permalink / raw) To: Bart Van Assche Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Roland Dreier, Fujita Tomonori, Brian King On Tue, 2011-12-20 at 10:13 +0000, Bart Van Assche wrote: > On Mon, Dec 19, 2011 at 10:32 PM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote: > > I still think this is already solved in user space, but the new > > reconnect model you've implemented doesn't match up with the expected > > semantics. It'd be better to match the rest of the SCSI stack for this. > > Sorry, but I'm afraid there is no "rest of the SCSI stack > [semantics]". I've been following the semantics implemented in > open-iscsi. If the semantics implemented for FC differ of those > proposed in this patch set, then that means that the semantics > implemented for FC differ of those for open-iscsi. FC, SAS, SPI, (current) SRP, and FCoE currently use the same semantic to check if the transport is still alive -- user space initiates activity, usually via multipathd. You are changing SRP to match the iSCSI stack, and because SRP does not have a NOP command, you have to fake one up using Test Unit Ready to the first LUN to come through slave_alloc(). You kill the entire connection if that LUN has issues and causes a timeout, though the rest of the LUNs may be fine and able to communicate. This is all handled properly by existing user space code; there's no need for it to be in kernel, especially if you have to fake it while going against the grain. -- Dave Dillow National Center for Computational Science Oak Ridge National Laboratory (865) 241-6602 office -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 13/14] ib_srp: Implement transport layer ping [not found] ` <1324333931.7043.52.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org> 2011-12-20 10:13 ` Bart Van Assche @ 2011-12-20 10:27 ` Bart Van Assche 2011-12-21 3:05 ` David Dillow 1 sibling, 1 reply; 36+ messages in thread From: Bart Van Assche @ 2011-12-20 10:27 UTC (permalink / raw) To: David Dillow Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Roland Dreier, Fujita Tomonori, Brian King On Mon, Dec 19, 2011 at 10:32 PM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote: > Part of the problem is introduced by allowing for permanent connections > rather than using the familiar dev_loss_tmo and fast_io_fail_tmo > parameters from other SCSI transports. For instance, in the FC > transport, rports are allowed to disappear for up to dev_loss_tmo > seconds before being removed from the SCSI device tree. Until they have > been gone for fast_fail_io_tmo seconds, they are blocked (as is error > handling to prevent offlining devices). Once they have been gone longer > that fast_fail_io_tmo, they become unblocked and new IO will be failed. I'm not convinced an equivalent of fast_fail_io_tmo is necessary for the SRP transport. If a target disappears briefly from the IB fabric what will happen with the posted patch series is that the initiator is blocked during one ping interval and also that a reconnect is triggered. Also, some SCSI commands may be reissued after reconnecting. But that shouldn't have any adverse consequences, isn't it ? Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 13/14] ib_srp: Implement transport layer ping 2011-12-20 10:27 ` Bart Van Assche @ 2011-12-21 3:05 ` David Dillow [not found] ` <1324436736.7621.38.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org> 0 siblings, 1 reply; 36+ messages in thread From: David Dillow @ 2011-12-21 3:05 UTC (permalink / raw) To: Bart Van Assche Cc: linux-rdma@vger.kernel.org, linux-scsi@vger.kernel.org, Roland Dreier, Fujita Tomonori, Brian King On Tue, 2011-12-20 at 10:27 +0000, Bart Van Assche wrote: > On Mon, Dec 19, 2011 at 10:32 PM, David Dillow <dillowda@ornl.gov> wrote: > > Part of the problem is introduced by allowing for permanent connections > > rather than using the familiar dev_loss_tmo and fast_io_fail_tmo > > parameters from other SCSI transports. For instance, in the FC > > transport, rports are allowed to disappear for up to dev_loss_tmo > > seconds before being removed from the SCSI device tree. Until they have > > been gone for fast_fail_io_tmo seconds, they are blocked (as is error > > handling to prevent offlining devices). Once they have been gone longer > > that fast_fail_io_tmo, they become unblocked and new IO will be failed. > > I'm not convinced an equivalent of fast_fail_io_tmo is necessary for > the SRP transport. If a target disappears briefly from the IB fabric > what will happen with the posted patch series is that the initiator is > blocked during one ping interval and also that a reconnect is > triggered. Also, some SCSI commands may be reissued after > reconnecting. But that shouldn't have any adverse consequences, isn't > it ? We don't want to leave a target blocked indefinitely -- commands caught in the blocked queue won't be reissued until the queue is unblocked -- but we may want to keep the sdX mappings around for a long time. fast_io_fail_tmo gives us the ability to do that -- the expiration of fast_io_fail_tmo unblocks the queue and allows commands to be failed due to the transport error. See commit f2818663. Also, these settings can be used to help tune multipath failover for devices with relatively long LUN transfer times (say with RDAC) vs short ones (ALUA). You still don't want the mappings to go away, but you want to move data to the other paths in a reasonably quick fashion. http://lkml.org/lkml/2008/1/7/244 talks about this a bit. As for reissued commands, most shouldn't be an issue -- once they get out of the blocked queue. I would expect there to be problems with certain vendor specific commands, and tape drives will have issues, but those are common to any multipath solution. -- Dave Dillow National Center for Computational Science Oak Ridge National Laboratory (865) 241-6602 office ^ permalink raw reply [flat|nested] 36+ messages in thread
[parent not found: <1324436736.7621.38.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>]
* Re: [PATCH 13/14] ib_srp: Implement transport layer ping [not found] ` <1324436736.7621.38.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org> @ 2011-12-21 14:07 ` Bart Van Assche 2011-12-23 22:34 ` David Dillow 0 siblings, 1 reply; 36+ messages in thread From: Bart Van Assche @ 2011-12-21 14:07 UTC (permalink / raw) To: David Dillow Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Roland Dreier, Fujita Tomonori, Brian King On Wed, Dec 21, 2011 at 3:05 AM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote: > We don't want to leave a target blocked indefinitely -- commands caught > in the blocked queue won't be reissued until the queue is unblocked -- > but we may want to keep the sdX mappings around for a long time. > fast_io_fail_tmo gives us the ability to do that -- the expiration of > fast_io_fail_tmo unblocks the queue and allows commands to be failed due > to the transport error. See commit f2818663. Our opinions may differ here. My opinion is that for some use cases it is crucial to be able to block a target indefinitely, e.g. when there is only a single path between initiator and target and when the user prefers that I/O blocks instead of encountering I/O errors. See e.g. the "iSCSI settings for iSCSI root" section in the iSCSI README (http://www.open-iscsi.org/docs/README). Regarding commit f2818663: as far as I can see what that commit does is to make it impossible for a user to set fast_io_fail_tmo == -1 together with dev_loss_tmo > SCSI_DEVICE_BLOCK_MAX_TIMEOUT. That change is specific to the FC code - only the FC transport code limits the blocked state time to the SCSI_DEVICE_BLOCK_MAX_TIMEOUT limit. > Also, these settings can be used to help tune multipath failover for > devices with relatively long LUN transfer times (say with RDAC) vs short > ones (ALUA). You still don't want the mappings to go away, but you want > to move data to the other paths in a reasonably quick fashion. > http://lkml.org/lkml/2008/1/7/244 talks about this a bit. In a multipath setup the ping_interval, ping_timeout, recovery_tmo and max_reconnects parameters (all added via this patch series) should be sufficient to allow control over how quickly multipath failover happens. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 13/14] ib_srp: Implement transport layer ping 2011-12-21 14:07 ` Bart Van Assche @ 2011-12-23 22:34 ` David Dillow [not found] ` <1324679698.3004.12.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org> 0 siblings, 1 reply; 36+ messages in thread From: David Dillow @ 2011-12-23 22:34 UTC (permalink / raw) To: Bart Van Assche Cc: linux-rdma@vger.kernel.org, linux-scsi@vger.kernel.org, Roland Dreier, Fujita Tomonori, Brian King On Wed, 2011-12-21 at 14:07 +0000, Bart Van Assche wrote: > On Wed, Dec 21, 2011 at 3:05 AM, David Dillow <dillowda@ornl.gov> wrote: > > We don't want to leave a target blocked indefinitely -- commands caught > > in the blocked queue won't be reissued until the queue is unblocked -- > > but we may want to keep the sdX mappings around for a long time. > > fast_io_fail_tmo gives us the ability to do that -- the expiration of > > fast_io_fail_tmo unblocks the queue and allows commands to be failed due > > to the transport error. See commit f2818663. > > Our opinions may differ here. My opinion is that for some use cases it > is crucial to be able to block a target indefinitely, e.g. when there > is only a single path between initiator and target and when the user > prefers that I/O blocks instead of encountering I/O errors. See e.g. > the "iSCSI settings for iSCSI root" section in the iSCSI README > (http://www.open-iscsi.org/docs/README). I can see admins desiring that option, but it's also handled by putting your root on a dm-multipath volume. Either way, I'm open to letting things block indefinitely, but I do think we should look into why the SCSI stack has a concept of the maximum blocking time. > Regarding commit f2818663: as far as I can see what that commit does > is to make it impossible for a user to set fast_io_fail_tmo == -1 I was pointing to the commit message as evidence of more prominent SCSI developer's line of thought on handling missing devices. As you note, the actual diff is uninteresting. We should match the rest of the SCSI stack unless there is good reason to go our own way. The iSCSI transport can do what it does because it has a native ping, but I'm not convinced they should have introduced new semantics and/or names when taking advantage of that. As far as I can tell without deeply studying it, replacement_timeout is equivalent to fast_io_fail_tmo. -- Dave Dillow National Center for Computational Science Oak Ridge National Laboratory (865) 241-6602 office ^ permalink raw reply [flat|nested] 36+ messages in thread
[parent not found: <1324679698.3004.12.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>]
* Re: [PATCH 13/14] ib_srp: Implement transport layer ping [not found] ` <1324679698.3004.12.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org> @ 2011-12-23 22:56 ` Mike Christie 2011-12-24 20:07 ` David Dillow 2011-12-26 20:01 ` Bart Van Assche 0 siblings, 2 replies; 36+ messages in thread From: Mike Christie @ 2011-12-23 22:56 UTC (permalink / raw) To: David Dillow Cc: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Roland Dreier, Fujita Tomonori, Brian King On 12/23/2011 04:34 PM, David Dillow wrote: > On Wed, 2011-12-21 at 14:07 +0000, Bart Van Assche wrote: >> On Wed, Dec 21, 2011 at 3:05 AM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote: >>> We don't want to leave a target blocked indefinitely -- commands caught >>> in the blocked queue won't be reissued until the queue is unblocked -- >>> but we may want to keep the sdX mappings around for a long time. >>> fast_io_fail_tmo gives us the ability to do that -- the expiration of >>> fast_io_fail_tmo unblocks the queue and allows commands to be failed due >>> to the transport error. See commit f2818663. >> >> Our opinions may differ here. My opinion is that for some use cases it >> is crucial to be able to block a target indefinitely, e.g. when there >> is only a single path between initiator and target and when the user >> prefers that I/O blocks instead of encountering I/O errors. See e.g. >> the "iSCSI settings for iSCSI root" section in the iSCSI README >> (http://www.open-iscsi.org/docs/README). > > I can see admins desiring that option, but it's also handled by putting > your root on a dm-multipath volume. Either way, I'm open to letting > things block indefinitely, but I do think we should look into why the > SCSI stack has a concept of the maximum blocking time. > >> Regarding commit f2818663: as far as I can see what that commit does >> is to make it impossible for a user to set fast_io_fail_tmo == -1 > > I was pointing to the commit message as evidence of more prominent SCSI > developer's line of thought on handling missing devices. As you note, > the actual diff is uninteresting. > > We should match the rest of the SCSI stack unless there is good reason > to go our own way. The iSCSI transport can do what it does because it > has a native ping, but I'm not convinced they should have introduced new > semantics and/or names when taking advantage of that. As far as I can > tell without deeply studying it, replacement_timeout is equivalent to > fast_io_fail_tmo. > iSCSI replacement_timeout is the same as fast_io_fail_tmo for FC. iSCSI replacement_timeout actually came first so you should say FC should have copied our name :) So FC has fast_io_fail_tmo which controls when to fail IO when a port is blocked. This can be set to so that it does not ever fire. However, FC also has the dev_loss_tmo which controls when to remove devices (and also fail IO) when a port is blocked. This one cannot be turned off, so it will eventually fire and you will get IO errors (this is due to the devices being removed and the scsi layer then failing all IO). iSCSI had replacement_timeout first, and it only fails IO, because at the time things did not handle hotplug removal well, and I did not know there was a need for the removal. There is patch to add dev_loss_tmo to the iSCSI layer. There have been some bugs though, so I have not pushed it. But it will also work like FC where you cannot turn it off. SAS has something like the dev_loss_tmo. It does not have something like the fast_io_fail/replacement_timeout. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 13/14] ib_srp: Implement transport layer ping 2011-12-23 22:56 ` Mike Christie @ 2011-12-24 20:07 ` David Dillow 2011-12-26 19:39 ` Bart Van Assche 2011-12-26 20:01 ` Bart Van Assche 1 sibling, 1 reply; 36+ messages in thread From: David Dillow @ 2011-12-24 20:07 UTC (permalink / raw) To: Mike Christie Cc: Bart Van Assche, linux-rdma@vger.kernel.org, linux-scsi@vger.kernel.org, Roland Dreier, Fujita Tomonori, Brian King On Fri, 2011-12-23 at 16:56 -0600, Mike Christie wrote: > On 12/23/2011 04:34 PM, David Dillow wrote: > > We should match the rest of the SCSI stack unless there is good reason > > to go our own way. The iSCSI transport can do what it does because it > > has a native ping, but I'm not convinced they should have introduced new > > semantics and/or names when taking advantage of that. As far as I can > > tell without deeply studying it, replacement_timeout is equivalent to > > fast_io_fail_tmo. > > > > iSCSI replacement_timeout is the same as fast_io_fail_tmo for FC. iSCSI > replacement_timeout actually came first so you should say FC should have > copied our name :) Indeed, some git-foo seems to indicate both stacks had that functionality mainlined in 2.6.18, with iSCSI getting it first. FC should have used your name. > iSCSI had replacement_timeout first, and it only fails IO, because at > the time things did not handle hotplug removal well, and I did not know > there was a need for the removal. There is patch to add dev_loss_tmo to > the iSCSI layer. There have been some bugs though, so I have not pushed > it. But it will also work like FC where you cannot turn it off. > > SAS has something like the dev_loss_tmo. It does not have something like > the fast_io_fail/replacement_timeout. This says to me that SRP should use the dev_loss_tmo semantics, though the naming of fast_io_fail vs replacement_timeout is a bit more of a question than I thought. I tend to think of SRP more in terms of FC than iSCSI, so I still prefer the former, but perhaps not as strongly now. I still believe consistency is good -- what do the SCSI devs think here? Should we work on getting a consistent name for this property -- and if so, which one -- or does the name not matter? -- Dave Dillow National Center for Computational Science Oak Ridge National Laboratory (865) 241-6602 office ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 13/14] ib_srp: Implement transport layer ping 2011-12-24 20:07 ` David Dillow @ 2011-12-26 19:39 ` Bart Van Assche 2011-12-28 23:53 ` David Dillow 0 siblings, 1 reply; 36+ messages in thread From: Bart Van Assche @ 2011-12-26 19:39 UTC (permalink / raw) To: David Dillow Cc: Mike Christie, linux-rdma@vger.kernel.org, linux-scsi@vger.kernel.org, Roland Dreier, Fujita Tomonori, Brian King On Sat, Dec 24, 2011 at 8:07 PM, David Dillow <dillowda@ornl.gov> wrote: > This says to me that SRP should use the dev_loss_tmo semantics, though > the naming of fast_io_fail vs replacement_timeout is a bit more of a > question than I thought. I tend to think of SRP more in terms of FC than > iSCSI, so I still prefer the former, but perhaps not as strongly now. Do we need dev_loss_tmo functionality ? Since multipathd switches over if the active path is in the blocked state, the posted patch set already provides a way to make multipathd switch over if communication is lost. Bart. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 13/14] ib_srp: Implement transport layer ping 2011-12-26 19:39 ` Bart Van Assche @ 2011-12-28 23:53 ` David Dillow 0 siblings, 0 replies; 36+ messages in thread From: David Dillow @ 2011-12-28 23:53 UTC (permalink / raw) To: Bart Van Assche Cc: Mike Christie, linux-rdma@vger.kernel.org, linux-scsi@vger.kernel.org, Roland Dreier, Fujita Tomonori, Brian King On Mon, 2011-12-26 at 19:39 +0000, Bart Van Assche wrote: > On Sat, Dec 24, 2011 at 8:07 PM, David Dillow <dillowda@ornl.gov> wrote: > > This says to me that SRP should use the dev_loss_tmo semantics, though > > the naming of fast_io_fail vs replacement_timeout is a bit more of a > > question than I thought. I tend to think of SRP more in terms of FC than > > iSCSI, so I still prefer the former, but perhaps not as strongly now. > > Do we need dev_loss_tmo functionality ? Since multipathd switches over > if the active path is in the blocked state, the posted patch set > already provides a way to make multipathd switch over if communication > is lost. I think so. multipathd isn't the only use case, though I think we need to reorganize the SRP device model to fully take advantage of dev_loss_tmo -- but that can come later. I'd like to see dev_loss_tmo take the place of max_reconnects to give admins an easy way to set a maximum time. -- Dave Dillow National Center for Computational Science Oak Ridge National Laboratory (865) 241-6602 office ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 13/14] ib_srp: Implement transport layer ping 2011-12-23 22:56 ` Mike Christie 2011-12-24 20:07 ` David Dillow @ 2011-12-26 20:01 ` Bart Van Assche 1 sibling, 0 replies; 36+ messages in thread From: Bart Van Assche @ 2011-12-26 20:01 UTC (permalink / raw) To: Mike Christie Cc: David Dillow, linux-rdma@vger.kernel.org, linux-scsi@vger.kernel.org, Roland Dreier, Fujita Tomonori, Brian King On Fri, Dec 23, 2011 at 10:56 PM, Mike Christie <michaelc@cs.wisc.edu> wrote: > iSCSI replacement_timeout is the same as fast_io_fail_tmo for FC. iSCSI > replacement_timeout actually came first so you should say FC should have > copied our name :) Agreed. But as far as I can see multipathd only supports the FC name (fast_io_fail_tmo) and not the iSCSI name (recovery_tmo). That looks like an argument to me to switch to the FC name for ib_srp. Bart. ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 14/14] ib_srp: Allow SRP disconnect through sysfs 2011-12-01 18:54 [PATCH 00/14] Make ib_srp better suited for H.A. purposes Bart Van Assche ` (3 preceding siblings ...) [not found] ` <201112011954.25811.bvanassche-HInyCGIudOg@public.gmane.org> @ 2011-12-01 19:13 ` Bart Van Assche 2011-12-19 4:03 ` David Dillow 2011-12-01 23:26 ` [PATCH 00/14] Make ib_srp better suited for H.A. purposes David Dillow 5 siblings, 1 reply; 36+ messages in thread From: Bart Van Assche @ 2011-12-01 19:13 UTC (permalink / raw) To: linux-rdma Cc: linux-scsi, David Dillow, Roland Dreier, Fujita Tomonori, Brian King Make it possible to disconnect via sysfs the IB RC connection used by the SRP protocol to communicate with a target. Let the SRP transport layer create a sysfs "delete" attribute for initiator drivers that support this functionality. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: David Dillow <dillowda@ornl.gov> Cc: Roland Dreier <roland@purestorage.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Brian King <brking@linux.vnet.ibm.com> --- Documentation/ABI/stable/sysfs-transport-srp | 7 ++++ drivers/infiniband/ulp/srp/ib_srp.c | 44 +++++++++++++++---------- drivers/scsi/scsi_transport_srp.c | 20 +++++++++++- include/scsi/scsi_transport_srp.h | 1 + 4 files changed, 53 insertions(+), 19 deletions(-) diff --git a/Documentation/ABI/stable/sysfs-transport-srp b/Documentation/ABI/stable/sysfs-transport-srp index d8a9048..b1c8acd 100644 --- a/Documentation/ABI/stable/sysfs-transport-srp +++ b/Documentation/ABI/stable/sysfs-transport-srp @@ -1,3 +1,10 @@ +What: /sys/class/srp_remote_ports/port-<h>:<n>/delete +Date: November 1, 2011 +KernelVersion: 3.3 +Contact: linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org +Description: Instructs an SRP initiator to disconnect from a target and to + remove all LUNs imported from that target. + What: /sys/class/srp_remote_ports/port-<h>:<n>/ping_interval Date: November 1, 2011 KernelVersion: 3.3 diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 82057f2..f7e0224 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -579,6 +579,28 @@ static void srp_remove_work(struct work_struct *work) srp_remove_target(target); } +static void srp_rport_delete(struct srp_rport *rport) +{ + struct srp_target_port *target = rport->lld_data; + struct srp_host *host = target->srp_host; + bool remove = false; + + /* + * Schedule our work inside the lock to avoid a race with + * the flush_work_sync() call in srp_remove_one(). + */ + spin_lock(&host->target_lock); + spin_lock_irq(&target->lock); + if (target->state != SRP_TARGET_REMOVED) { + target->state = SRP_TARGET_REMOVED; + remove = true; + } + spin_unlock_irq(&target->lock); + if (remove) + queue_work(system_long_wq, &target->remove_work); + spin_unlock(&host->target_lock); +} + static void srp_ping_timedout(struct srp_rport *rport) { struct srp_target_port *target = rport->lld_data; @@ -1879,9 +1901,7 @@ static int srp_reset_device(struct scsi_cmnd *scmnd) static int srp_reset_host(struct scsi_cmnd *scmnd) { struct srp_target_port *target = host_to_target(scmnd->device->host); - struct srp_host *host = target->srp_host; int ret = FAILED; - bool remove = false; shost_printk(KERN_ERR, target->scsi_host, PFX "SRP reset_host called\n"); @@ -1893,23 +1913,10 @@ static int srp_reset_host(struct scsi_cmnd *scmnd) * However, we have to defer the real removal because we * are in the context of the SCSI error handler now, which * will deadlock if we call scsi_remove_host(). - * - * Schedule our work inside the lock to avoid a race with - * the flush_work_sync() call in srp_remove_one(). */ - spin_lock(&host->target_lock); - spin_lock_irq(&target->lock); - if (target->state != SRP_TARGET_REMOVED) { - target->state = SRP_TARGET_REMOVED; - remove = true; - } - spin_unlock_irq(&target->lock); - if (remove) { - shost_printk(KERN_ERR, target->scsi_host, PFX "recon" - "nect failed, removing target port.\n"); - queue_work(system_long_wq, &target->remove_work); - } - spin_unlock(&host->target_lock); + shost_printk(KERN_ERR, target->scsi_host, + PFX "reconnect failed, removing target port.\n"); + srp_rport_delete(target->rport); } return ret; @@ -2131,6 +2138,7 @@ static struct scsi_host_template srp_template = { }; static struct srp_function_template ib_srp_transport_functions = { + .rport_delete = srp_rport_delete, .rport_ping_timedout = srp_ping_timedout, .rport_recovery_timedout = srp_recovery_timedout, }; diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c index 135a870..e0359b4 100644 --- a/drivers/scsi/scsi_transport_srp.c +++ b/drivers/scsi/scsi_transport_srp.c @@ -40,7 +40,7 @@ struct srp_host_attrs { #define to_srp_host_attrs(host) ((struct srp_host_attrs *)(host)->shost_data) #define SRP_HOST_ATTRS 0 -#define SRP_RPORT_ATTRS 5 +#define SRP_RPORT_ATTRS 6 struct srp_internal { struct scsi_transport_template t; @@ -118,6 +118,22 @@ show_srp_rport_roles(struct device *dev, struct device_attribute *attr, static DEVICE_ATTR(roles, S_IRUGO, show_srp_rport_roles, NULL); +static ssize_t store_srp_rport_delete(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct srp_rport *rport = transport_class_to_srp_rport(dev); + + if (rport->ft->rport_delete) { + rport->ft->rport_delete(rport); + return count; + } else { + return -ENOSYS; + } +} + +static DEVICE_ATTR(delete, S_IWUSR, NULL, store_srp_rport_delete); + static ssize_t show_srp_rport_ping_interval(struct device *dev, struct device_attribute *attr, char *buf) @@ -635,6 +651,8 @@ srp_attach_transport(struct srp_function_template *ft) } if (ft->rport_recovery_timedout) i->rport_attrs[count++] = &dev_attr_recovery_tmo; + if (ft->rport_delete) + i->rport_attrs[count++] = &dev_attr_delete; i->rport_attrs[count++] = NULL; WARN_ON(count > ARRAY_SIZE(i->rport_attrs)); diff --git a/include/scsi/scsi_transport_srp.h b/include/scsi/scsi_transport_srp.h index 565cb79..8782d6a 100644 --- a/include/scsi/scsi_transport_srp.h +++ b/include/scsi/scsi_transport_srp.h @@ -51,6 +51,7 @@ struct srp_rport { struct srp_function_template { /* for initiator drivers */ + void (*rport_delete)(struct srp_rport *rport); void (*rport_ping_timedout) (struct srp_rport *rport); void (*rport_recovery_timedout) (struct srp_rport *rport); /* for target drivers */ -- 1.7.3.4 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 14/14] ib_srp: Allow SRP disconnect through sysfs 2011-12-01 19:13 ` [PATCH 14/14] ib_srp: Allow SRP disconnect through sysfs Bart Van Assche @ 2011-12-19 4:03 ` David Dillow [not found] ` <1324267414.17849.98.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org> 0 siblings, 1 reply; 36+ messages in thread From: David Dillow @ 2011-12-19 4:03 UTC (permalink / raw) To: Bart Van Assche Cc: linux-rdma, linux-scsi, Roland Dreier, Fujita Tomonori, Brian King On Thu, 2011-12-01 at 20:13 +0100, Bart Van Assche wrote: > Make it possible to disconnect via sysfs the IB RC connection used by the > SRP protocol to communicate with a target. > > Let the SRP transport layer create a sysfs "delete" attribute for > initiator drivers that support this functionality. This will be a nice feature to have, but I wonder about the proper behavior. I had thought about having a 'drain' state, where we wouldn't start new commands and would let existing ones complete or timeout before killing the connection. That may be in addition to a 'kill it now' variant. Though perhaps my gentler delete is best done by having a user script walk all of the devices under the host and set them to offline, wait for scsi_host/hostX/host_busy to drop to zero, then hit the SRP delete attr. -- Dave Dillow National Center for Computational Science Oak Ridge National Laboratory (865) 241-6602 office ^ permalink raw reply [flat|nested] 36+ messages in thread
[parent not found: <1324267414.17849.98.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>]
* Re: [PATCH 14/14] ib_srp: Allow SRP disconnect through sysfs [not found] ` <1324267414.17849.98.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org> @ 2011-12-19 9:04 ` Bart Van Assche 0 siblings, 0 replies; 36+ messages in thread From: Bart Van Assche @ 2011-12-19 9:04 UTC (permalink / raw) To: David Dillow Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, linux-scsi-u79uwXL29TY76Z2rM5mHXA, Roland Dreier, Fujita Tomonori, Brian King On Mon, Dec 19, 2011 at 4:03 AM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote: > On Thu, 2011-12-01 at 20:13 +0100, Bart Van Assche wrote: >> Make it possible to disconnect via sysfs the IB RC connection used by the >> SRP protocol to communicate with a target. >> >> Let the SRP transport layer create a sysfs "delete" attribute for >> initiator drivers that support this functionality. > > This will be a nice feature to have, but I wonder about the proper > behavior. I had thought about having a 'drain' state, where we wouldn't > start new commands and would let existing ones complete or timeout > before killing the connection. That may be in addition to a 'kill it > now' variant. > > Though perhaps my gentler delete is best done by having a user script > walk all of the devices under the host and set them to offline, wait for > scsi_host/hostX/host_busy to drop to zero, then hit the SRP delete attr. Sorry, but I disagree. One of the goals of this patch set is to make fast fail-over possible. Waiting for an error completion or time out contradicts with that goal. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 00/14] Make ib_srp better suited for H.A. purposes 2011-12-01 18:54 [PATCH 00/14] Make ib_srp better suited for H.A. purposes Bart Van Assche ` (4 preceding siblings ...) 2011-12-01 19:13 ` [PATCH 14/14] ib_srp: Allow SRP disconnect through sysfs Bart Van Assche @ 2011-12-01 23:26 ` David Dillow 5 siblings, 0 replies; 36+ messages in thread From: David Dillow @ 2011-12-01 23:26 UTC (permalink / raw) To: Bart Van Assche Cc: linux-rdma@vger.kernel.org, linux-scsi@vger.kernel.org, Roland Dreier, Vu Pham On Thu, 2011-12-01 at 13:54 -0500, Bart Van Assche wrote: > This patch series makes the ib_srp driver better suited for use in a H.A. > setup because: > - Switchover without triggering read or write errors become possible. Such > errors are bad because these can make a filesystem switch to read-only > mode. > - A ping mechanism has been added that allows to reduce the switch-over > time greatly. > - Disconnecting from a target without unloading ib_srp becomes possible. > - Switchover can be triggered explicitly by deleting an initiator device. In general, I'm glad to see this being done. I won't have the time this week for a detailed review, but will take a look as soon as I can. I have a number of ideas I'd like to see implemented (and have partially done), so it will be interesting to see if we're using similar approaches. -- Dave Dillow National Center for Computational Science Oak Ridge National Laboratory (865) 241-6602 office ^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2011-12-28 23:54 UTC | newest]
Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-12-01 18:54 [PATCH 00/14] Make ib_srp better suited for H.A. purposes Bart Van Assche
2011-12-01 19:05 ` [PATCH 08/14] srp_transport: Document sysfs attributes Bart Van Assche
2011-12-01 19:06 ` [PATCH 09/14] srp_transport: Fix attribute registration Bart Van Assche
[not found] ` <201112012006.50445.bvanassche-HInyCGIudOg@public.gmane.org>
2011-12-15 20:09 ` David Dillow
2011-12-01 19:10 ` [PATCH 12/14] ib_srp: Rework error handling Bart Van Assche
[not found] ` <201112012010.37276.bvanassche-HInyCGIudOg@public.gmane.org>
2011-12-15 20:20 ` David Dillow
2011-12-19 3:36 ` David Dillow
2011-12-19 10:38 ` Bart Van Assche
2011-12-19 22:51 ` David Dillow
[not found] ` <1324335083.7043.66.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
2011-12-20 9:01 ` Bart Van Assche
[not found] ` <CAO+b5-qF2taG0B4n9SBwqnuh0wajH5fXFLTb-VAaDrfT9TZ6aQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-21 3:33 ` David Dillow
[not found] ` <1324438387.7621.53.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2011-12-21 13:26 ` Bart Van Assche
[not found] ` <1324265791.17849.92.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2011-12-26 19:13 ` Bart Van Assche
[not found] ` <201112011954.25811.bvanassche-HInyCGIudOg@public.gmane.org>
2011-12-01 19:08 ` [PATCH 10/14] srp_transport: Simplify attribute initialization code Bart Van Assche
[not found] ` <201112012008.00502.bvanassche-HInyCGIudOg@public.gmane.org>
2011-12-19 0:07 ` David Dillow
[not found] ` <1324253243.17849.45.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2011-12-20 10:21 ` Bart Van Assche
2011-12-21 3:23 ` David Dillow
2011-12-01 19:11 ` [PATCH 13/14] ib_srp: Implement transport layer ping Bart Van Assche
2011-12-19 0:50 ` David Dillow
2011-12-19 10:16 ` Bart Van Assche
2011-12-19 22:32 ` David Dillow
[not found] ` <1324333931.7043.52.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
2011-12-20 10:13 ` Bart Van Assche
[not found] ` <CAO+b5-qLxmcXCCxA8+bPYsinjr1eqCDO2JUJbjgVr59N55CU1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-21 2:32 ` David Dillow
2011-12-20 10:27 ` Bart Van Assche
2011-12-21 3:05 ` David Dillow
[not found] ` <1324436736.7621.38.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2011-12-21 14:07 ` Bart Van Assche
2011-12-23 22:34 ` David Dillow
[not found] ` <1324679698.3004.12.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2011-12-23 22:56 ` Mike Christie
2011-12-24 20:07 ` David Dillow
2011-12-26 19:39 ` Bart Van Assche
2011-12-28 23:53 ` David Dillow
2011-12-26 20:01 ` Bart Van Assche
2011-12-01 19:13 ` [PATCH 14/14] ib_srp: Allow SRP disconnect through sysfs Bart Van Assche
2011-12-19 4:03 ` David Dillow
[not found] ` <1324267414.17849.98.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2011-12-19 9:04 ` Bart Van Assche
2011-12-01 23:26 ` [PATCH 00/14] Make ib_srp better suited for H.A. purposes David Dillow
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).