From: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
To: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>,
Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Sebastian Riemer
<sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>,
linux-rdma <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: [PATCH 07/10] IB/srp: Add periodic reconnect functionality
Date: Thu, 10 Oct 2013 14:16:59 +0200 [thread overview]
Message-ID: <52569ABB.4060205@acm.org> (raw)
In-Reply-To: <52569806.9080201-HInyCGIudOg@public.gmane.org>
After a transport layer occurred, periodically try to reconnect
to the target until the dev_loss timer expires. Protect the
callback functions that can be invoked from inside the SCSI EH
against concurrent invocation with srp_reconnect_rport() via the
rport mutex. Change the default dev_loss_tmo from 60s into 600s
to give the reconnect mechanism a chance to kick in.
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
Documentation/ABI/stable/sysfs-transport-srp | 8 +++++
drivers/infiniband/ulp/srp/ib_srp.c | 52 ++++++++++++++++++++++++----
2 files changed, 54 insertions(+), 6 deletions(-)
diff --git a/Documentation/ABI/stable/sysfs-transport-srp b/Documentation/ABI/stable/sysfs-transport-srp
index 178a44d..21bd480 100644
--- a/Documentation/ABI/stable/sysfs-transport-srp
+++ b/Documentation/ABI/stable/sysfs-transport-srp
@@ -30,6 +30,14 @@ Contact: linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Description: 16-byte local SRP port identifier in hexadecimal format. An
example: 4c:49:4e:55:58:20:56:49:4f:00:00:00:00:00:00:00.
+What: /sys/class/srp_remote_ports/port-<h>:<n>/reconnect_delay
+Date: December 1, 2013
+KernelVersion: 3.12
+Contact: linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description: Number of seconds the SCSI layer will wait after a reconnect
+ attempt failed before retrying. Setting this attribute to
+ "off" will disable time-based reconnecting.
+
What: /sys/class/srp_remote_ports/port-<h>:<n>/roles
Date: June 27, 2007
KernelVersion: 2.6.24
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index ceb84b6..4be3eb8 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -88,6 +88,11 @@ MODULE_PARM_DESC(topspin_workarounds,
static struct kernel_param_ops srp_tmo_ops;
+static int srp_reconnect_delay = 10;
+module_param_cb(reconnect_delay, &srp_tmo_ops, &srp_reconnect_delay,
+ S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(reconnect_delay, "Time between successive reconnect attempts");
+
static int srp_fast_io_fail_tmo = 15;
module_param_cb(fast_io_fail_tmo, &srp_tmo_ops, &srp_fast_io_fail_tmo,
S_IRUGO | S_IWUSR);
@@ -96,7 +101,7 @@ MODULE_PARM_DESC(fast_io_fail_tmo,
" layer error and failing all I/O. \"off\" means that this"
" functionality is disabled.");
-static int srp_dev_loss_tmo = 60;
+static int srp_dev_loss_tmo = 600;
module_param_cb(dev_loss_tmo, &srp_tmo_ops, &srp_dev_loss_tmo,
S_IRUGO | S_IWUSR);
MODULE_PARM_DESC(dev_loss_tmo,
@@ -144,10 +149,14 @@ static int srp_tmo_set(const char *val, const struct kernel_param *kp)
} else {
tmo = -1;
}
- if (kp->arg == &srp_fast_io_fail_tmo)
- res = srp_tmo_valid(tmo, srp_dev_loss_tmo);
+ if (kp->arg == &srp_reconnect_delay)
+ res = srp_tmo_valid(tmo, srp_fast_io_fail_tmo,
+ srp_dev_loss_tmo);
+ else if (kp->arg == &srp_fast_io_fail_tmo)
+ res = srp_tmo_valid(srp_reconnect_delay, tmo, srp_dev_loss_tmo);
else
- res = srp_tmo_valid(srp_fast_io_fail_tmo, tmo);
+ res = srp_tmo_valid(srp_reconnect_delay, srp_fast_io_fail_tmo,
+ tmo);
if (res)
goto out;
*(int *)kp->arg = tmo;
@@ -1426,18 +1435,29 @@ static void srp_send_completion(struct ib_cq *cq, void *target_ptr)
static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
{
struct srp_target_port *target = host_to_target(shost);
+ struct srp_rport *rport = target->rport;
struct srp_request *req;
struct srp_iu *iu;
struct srp_cmd *cmd;
struct ib_device *dev;
unsigned long flags;
int len, result;
+ const bool in_scsi_eh = !in_interrupt() && current == shost->ehandler;
+
+ /*
+ * The SCSI EH thread is the only context from which srp_queuecommand()
+ * can get invoked for blocked devices (SDEV_BLOCK /
+ * SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() by
+ * locking the rport mutex if invoked from inside the SCSI EH.
+ */
+ if (in_scsi_eh)
+ mutex_lock(&rport->mutex);
result = srp_chkready(target->rport);
if (unlikely(result)) {
scmnd->result = result;
scmnd->scsi_done(scmnd);
- return 0;
+ goto unlock_rport;
}
spin_lock_irqsave(&target->lock, flags);
@@ -1482,6 +1502,10 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
goto err_unmap;
}
+unlock_rport:
+ if (in_scsi_eh)
+ mutex_unlock(&rport->mutex);
+
return 0;
err_unmap:
@@ -1496,6 +1520,9 @@ err_iu:
err_unlock:
spin_unlock_irqrestore(&target->lock, flags);
+ if (in_scsi_eh)
+ mutex_unlock(&rport->mutex);
+
return SCSI_MLQUEUE_HOST_BUSY;
}
@@ -1780,6 +1807,7 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
static int srp_send_tsk_mgmt(struct srp_target_port *target,
u64 req_tag, unsigned int lun, u8 func)
{
+ struct srp_rport *rport = target->rport;
struct ib_device *dev = target->srp_host->srp_dev->dev;
struct srp_iu *iu;
struct srp_tsk_mgmt *tsk_mgmt;
@@ -1789,12 +1817,20 @@ static int srp_send_tsk_mgmt(struct srp_target_port *target,
init_completion(&target->tsk_mgmt_done);
+ /*
+ * Lock the rport mutex to avoid that srp_create_target_ib() is
+ * invoked while a task management function is being sent.
+ */
+ mutex_lock(&rport->mutex);
spin_lock_irq(&target->lock);
iu = __srp_get_tx_iu(target, SRP_IU_TSK_MGMT);
spin_unlock_irq(&target->lock);
- if (!iu)
+ if (!iu) {
+ mutex_unlock(&rport->mutex);
+
return -1;
+ }
ib_dma_sync_single_for_cpu(dev, iu->dma, sizeof *tsk_mgmt,
DMA_TO_DEVICE);
@@ -1811,8 +1847,11 @@ static int srp_send_tsk_mgmt(struct srp_target_port *target,
DMA_TO_DEVICE);
if (srp_post_send(target, iu, sizeof *tsk_mgmt)) {
srp_put_tx_iu(target, iu, SRP_IU_TSK_MGMT);
+ mutex_unlock(&rport->mutex);
+
return -1;
}
+ mutex_unlock(&rport->mutex);
if (!wait_for_completion_timeout(&target->tsk_mgmt_done,
msecs_to_jiffies(SRP_ABORT_TIMEOUT_MS)))
@@ -2713,6 +2752,7 @@ static void srp_remove_one(struct ib_device *device)
static struct srp_function_template ib_srp_transport_functions = {
.has_rport_state = true,
.reset_timer_if_blocked = true,
+ .reconnect_delay = &srp_reconnect_delay,
.fast_io_fail_tmo = &srp_fast_io_fail_tmo,
.dev_loss_tmo = &srp_dev_loss_tmo,
.reconnect = srp_rport_reconnect,
--
1.8.1.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2013-10-10 12:16 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-10 12:05 [PATCH 0/10] IB SRP initiator patches for kernel 3.12 Bart Van Assche
2013-10-10 12:10 ` [PATCH 02/10] IB/srp: Keep rport as long as the IB transport layer Bart Van Assche
[not found] ` <52569806.9080201-HInyCGIudOg@public.gmane.org>
2013-10-10 12:08 ` [PATCH 01/10] IB/srp: Make transport layer retry count configurable Bart Van Assche
2013-10-10 12:12 ` [PATCH 03/10] scsi_transport_srp: Add transport layer error handling Bart Van Assche
[not found] ` <525699A9.6090802-HInyCGIudOg@public.gmane.org>
2013-10-19 16:13 ` Bart Van Assche
[not found] ` <5262AF8D.7070700-HInyCGIudOg@public.gmane.org>
2013-10-25 22:42 ` David Dillow
2013-10-26 6:30 ` Bart Van Assche
2013-10-10 12:13 ` [PATCH 04/10] IB/srp: Use SRP transport layer error recovery Bart Van Assche
2013-10-10 12:14 ` [PATCH 05/10] IB/srp: Start timers if a transport layer error occurs Bart Van Assche
2013-10-10 12:15 ` [PATCH 06/10] scsi_transport_srp: Add periodic reconnect support Bart Van Assche
2013-10-10 12:16 ` Bart Van Assche [this message]
2013-10-10 12:18 ` [PATCH 08/10] IB/srp: Export sgid to sysfs Bart Van Assche
2013-10-10 12:18 ` [PATCH 09/10] IB/srp: Introduce srp_alloc_req_data() Bart Van Assche
2013-10-10 12:19 ` [PATCH 10/10] IB/srp: Make queue size configurable Bart Van Assche
[not found] ` <52569B6A.30301-HInyCGIudOg@public.gmane.org>
2013-10-11 11:34 ` Jack Wang
2013-10-25 22:13 ` David Dillow
[not found] ` <1382739213.17280.3.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org>
2013-10-26 6:13 ` Bart Van Assche
2013-10-25 22:14 ` [PATCH 0/10] IB SRP initiator patches for kernel 3.12 David Dillow
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52569ABB.4060205@acm.org \
--to=bvanassche-hinycgiudog@public.gmane.org \
--cc=dillowda-1Heg1YXhbW8@public.gmane.org \
--cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
--cc=sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org \
--cc=vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).