* [PATCH 0/3] SCSI SRP transport layer patches for kernel 3.14
@ 2013-12-11 16:04 Bart Van Assche
2013-12-11 16:05 ` [PATCH 1/3] scsi_transport_srp: Block rport upon TL error even with fast_io_fail_tmo = off Bart Van Assche
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Bart Van Assche @ 2013-12-11 16:04 UTC (permalink / raw)
To: David Dillow
Cc: Vu Pham, Sebastian Riemer, James Bottomley, linux-rdma,
linux-scsi
This series of three patches is what I came up with after several weeks
of path failover testing of the 3.13-rc1 IB/SRP initiator on a large
setup. The patches in this series are:
* Make SRP transport layer behavior more consistent with that of the FC
transport layer.
* Fix a rare race condition triggered by path failover.
* Document the rport state transitions.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/3] scsi_transport_srp: Block rport upon TL error even with fast_io_fail_tmo = off
2013-12-11 16:04 [PATCH 0/3] SCSI SRP transport layer patches for kernel 3.14 Bart Van Assche
@ 2013-12-11 16:05 ` Bart Van Assche
2013-12-11 16:06 ` [PATCH 2/3] scsi_transport_srp: Fix a race condition Bart Van Assche
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Bart Van Assche @ 2013-12-11 16:05 UTC (permalink / raw)
To: David Dillow
Cc: Vu Pham, Sebastian Riemer, James Bottomley, linux-rdma,
linux-scsi
The current behavior of the SRP transport layer when a transport
layer error is encountered is to block SCSI command processing only
if fast_io_fail_tmo != off. The current behavior of the FC transport
layer when a transport layer error is encountered is to block SCSI
command processing no matter which value fast_io_fail_tmo has been
set to. Make the behavior of the SRP transport layer consistent with
that of the FC transport layer to avoid confusion.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: David Dillow <dillowda@ornl.gov>
Cc: Vu Pham <vu@mellanox.com>
Cc: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Cc: James Bottomley <JBottomley@Parallels.com>
---
drivers/scsi/scsi_transport_srp.c | 30 +++++++++++++++++-------------
1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index 2700a5a..8b9cb22 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -67,7 +67,8 @@ static inline struct Scsi_Host *rport_to_shost(struct srp_rport *r)
*
* The combination of the timeout parameters must be such that SCSI commands
* are finished in a reasonable time. Hence do not allow the fast I/O fail
- * timeout to exceed SCSI_DEVICE_BLOCK_MAX_TIMEOUT. Furthermore, these
+ * timeout to exceed SCSI_DEVICE_BLOCK_MAX_TIMEOUT nor allow dev_loss_tmo to
+ * exceed that limit if failing I/O fast has been disabled. Furthermore, these
* parameters must be such that multipath can detect failed paths timely.
* Hence do not allow all three parameters to be disabled simultaneously.
*/
@@ -79,6 +80,9 @@ int srp_tmo_valid(int reconnect_delay, int fast_io_fail_tmo, int dev_loss_tmo)
return -EINVAL;
if (fast_io_fail_tmo > SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
return -EINVAL;
+ if (fast_io_fail_tmo < 0 &&
+ dev_loss_tmo > SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
+ return -EINVAL;
if (dev_loss_tmo >= LONG_MAX / HZ)
return -EINVAL;
if (fast_io_fail_tmo >= 0 && dev_loss_tmo >= 0 &&
@@ -463,20 +467,20 @@ static void __srp_start_tl_fail_timers(struct srp_rport *rport)
queue_delayed_work(system_long_wq,
&rport->reconnect_work,
1UL * delay * HZ);
- if (fast_io_fail_tmo >= 0 &&
- srp_rport_set_state(rport, SRP_RPORT_BLOCKED) == 0) {
+ if (srp_rport_set_state(rport, SRP_RPORT_BLOCKED) == 0) {
pr_debug("%s new state: %d\n",
dev_name(&shost->shost_gendev),
rport->state);
scsi_target_block(&shost->shost_gendev);
- queue_delayed_work(system_long_wq,
- &rport->fast_io_fail_work,
- 1UL * fast_io_fail_tmo * HZ);
+ if (fast_io_fail_tmo >= 0)
+ queue_delayed_work(system_long_wq,
+ &rport->fast_io_fail_work,
+ 1UL * fast_io_fail_tmo * HZ);
+ if (dev_loss_tmo >= 0)
+ queue_delayed_work(system_long_wq,
+ &rport->dev_loss_work,
+ 1UL * dev_loss_tmo * HZ);
}
- if (dev_loss_tmo >= 0)
- queue_delayed_work(system_long_wq,
- &rport->dev_loss_work,
- 1UL * dev_loss_tmo * HZ);
} else {
pr_debug("%s has already been deleted\n",
dev_name(&shost->shost_gendev));
@@ -578,9 +582,9 @@ int srp_reconnect_rport(struct srp_rport *rport)
spin_unlock_irq(shost->host_lock);
} else if (rport->state == SRP_RPORT_RUNNING) {
/*
- * srp_reconnect_rport() was invoked with fast_io_fail
- * off. Mark the port as failed and start the TL failure
- * timers if these had not yet been started.
+ * srp_reconnect_rport() has been invoked with fast_io_fail
+ * and dev_loss off. Mark the port as failed and start the TL
+ * failure timers if these had not yet been started.
*/
__rport_fail_io_fast(rport);
scsi_target_unblock(&shost->shost_gendev,
--
1.8.1.4
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 2/3] scsi_transport_srp: Fix a race condition
2013-12-11 16:04 [PATCH 0/3] SCSI SRP transport layer patches for kernel 3.14 Bart Van Assche
2013-12-11 16:05 ` [PATCH 1/3] scsi_transport_srp: Block rport upon TL error even with fast_io_fail_tmo = off Bart Van Assche
@ 2013-12-11 16:06 ` Bart Van Assche
[not found] ` <52A88CF9.206-HInyCGIudOg@public.gmane.org>
2013-12-30 21:46 ` [PATCH 0/3] SCSI SRP transport layer patches for kernel 3.14 David Dillow
3 siblings, 0 replies; 5+ messages in thread
From: Bart Van Assche @ 2013-12-11 16:06 UTC (permalink / raw)
To: David Dillow
Cc: Vu Pham, Sebastian Riemer, James Bottomley, linux-rdma,
linux-scsi
The rport timers must be stopped before the SRP initiator destroys the
resources associated with the SCSI host. This is necessary because
otherwise the callback functions invoked from the SRP transport layer
could trigger a use-after-free. Stopping the rport timers before
invoking scsi_remove_host() can trigger long delays in the SCSI error
handler if a transport layer failure occurs while scsi_remove_host()
is in progress. Hence move the code for stopping the rport timers
from srp_rport_release() into a new function and invoke that function
after scsi_remove_host() has finished. This patch fixes the following
sporadic kernel crash:
kernel BUG at include/asm-generic/dma-mapping-common.h:64!
invalid opcode: 0000 [#1] SMP
RIP: 0010:[<ffffffffa03b20b1>] [<ffffffffa03b20b1>] srp_unmap_data+0x121/0x130 [ib_srp]
Call Trace:
[<ffffffffa03b20fc>] srp_free_req+0x3c/0x80 [ib_srp]
[<ffffffffa03b2188>] srp_finish_req+0x48/0x70 [ib_srp]
[<ffffffffa03b21fb>] srp_terminate_io+0x4b/0x60 [ib_srp]
[<ffffffffa03a6fb5>] __rport_fail_io_fast+0x75/0x80 [scsi_transport_srp]
[<ffffffffa03a7438>] rport_fast_io_fail_timedout+0x88/0xc0 [scsi_transport_srp]
[<ffffffff8108b370>] worker_thread+0x170/0x2a0
[<ffffffff81090876>] kthread+0x96/0xa0
[<ffffffff8100c0ca>] child_rip+0xa/0x20
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: David Dillow <dillowda@ornl.gov>
Cc: Vu Pham <vu@mellanox.com>
Cc: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Cc: James Bottomley <JBottomley@Parallels.com>
---
drivers/infiniband/ulp/srp/ib_srp.c | 1 +
drivers/scsi/scsi_transport_srp.c | 83 +++++++++++++++++++------------------
include/scsi/scsi_transport_srp.h | 4 +-
3 files changed, 46 insertions(+), 42 deletions(-)
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 6787d7b..91bb991 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -695,6 +695,7 @@ static void srp_remove_target(struct srp_target_port *target)
srp_rport_get(target->rport);
srp_remove_host(target->scsi_host);
scsi_remove_host(target->scsi_host);
+ srp_stop_rport_timers(target->rport);
srp_disconnect_target(target);
ib_destroy_cm_id(target->cm_id);
srp_free_target_ib(target);
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index 8b9cb22..a349d44 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -456,37 +456,29 @@ static void __srp_start_tl_fail_timers(struct srp_rport *rport)
lockdep_assert_held(&rport->mutex);
- if (!rport->deleted) {
- delay = rport->reconnect_delay;
- fast_io_fail_tmo = rport->fast_io_fail_tmo;
- dev_loss_tmo = rport->dev_loss_tmo;
- pr_debug("%s current state: %d\n",
- dev_name(&shost->shost_gendev), rport->state);
+ delay = rport->reconnect_delay;
+ fast_io_fail_tmo = rport->fast_io_fail_tmo;
+ dev_loss_tmo = rport->dev_loss_tmo;
+ pr_debug("%s current state: %d\n", dev_name(&shost->shost_gendev),
+ rport->state);
- if (delay > 0)
+ if (rport->state == SRP_RPORT_LOST)
+ return;
+ if (delay > 0)
+ queue_delayed_work(system_long_wq, &rport->reconnect_work,
+ 1UL * delay * HZ);
+ if (srp_rport_set_state(rport, SRP_RPORT_BLOCKED) == 0) {
+ pr_debug("%s new state: %d\n", dev_name(&shost->shost_gendev),
+ rport->state);
+ scsi_target_block(&shost->shost_gendev);
+ if (fast_io_fail_tmo >= 0)
queue_delayed_work(system_long_wq,
- &rport->reconnect_work,
- 1UL * delay * HZ);
- if (srp_rport_set_state(rport, SRP_RPORT_BLOCKED) == 0) {
- pr_debug("%s new state: %d\n",
- dev_name(&shost->shost_gendev),
- rport->state);
- scsi_target_block(&shost->shost_gendev);
- if (fast_io_fail_tmo >= 0)
- queue_delayed_work(system_long_wq,
- &rport->fast_io_fail_work,
- 1UL * fast_io_fail_tmo * HZ);
- if (dev_loss_tmo >= 0)
- queue_delayed_work(system_long_wq,
- &rport->dev_loss_work,
- 1UL * dev_loss_tmo * HZ);
- }
- } else {
- pr_debug("%s has already been deleted\n",
- dev_name(&shost->shost_gendev));
- srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST);
- scsi_target_unblock(&shost->shost_gendev,
- SDEV_TRANSPORT_OFFLINE);
+ &rport->fast_io_fail_work,
+ 1UL * fast_io_fail_tmo * HZ);
+ if (dev_loss_tmo >= 0)
+ queue_delayed_work(system_long_wq,
+ &rport->dev_loss_work,
+ 1UL * dev_loss_tmo * HZ);
}
}
@@ -560,7 +552,7 @@ int srp_reconnect_rport(struct srp_rport *rport)
scsi_target_block(&shost->shost_gendev);
while (scsi_request_fn_active(shost))
msleep(20);
- res = i->f->reconnect(rport);
+ res = rport->state != SRP_RPORT_LOST ? i->f->reconnect(rport) : -ENODEV;
pr_debug("%s (state %d): transport.reconnect() returned %d\n",
dev_name(&shost->shost_gendev), rport->state, res);
if (res == 0) {
@@ -626,10 +618,6 @@ static void srp_rport_release(struct device *dev)
{
struct srp_rport *rport = dev_to_rport(dev);
- cancel_delayed_work_sync(&rport->reconnect_work);
- cancel_delayed_work_sync(&rport->fast_io_fail_work);
- cancel_delayed_work_sync(&rport->dev_loss_work);
-
put_device(dev->parent);
kfree(rport);
}
@@ -784,12 +772,6 @@ void srp_rport_del(struct srp_rport *rport)
device_del(dev);
transport_destroy_device(dev);
- mutex_lock(&rport->mutex);
- if (rport->state == SRP_RPORT_BLOCKED)
- __rport_fail_io_fast(rport);
- rport->deleted = true;
- mutex_unlock(&rport->mutex);
-
put_device(dev);
}
EXPORT_SYMBOL_GPL(srp_rport_del);
@@ -814,6 +796,27 @@ void srp_remove_host(struct Scsi_Host *shost)
}
EXPORT_SYMBOL_GPL(srp_remove_host);
+/**
+ * srp_stop_rport_timers - stop the transport layer recovery timers
+ *
+ * Must be called after srp_remove_host() and scsi_remove_host(). The caller
+ * must hold a reference on the rport (rport->dev) and on the SCSI host
+ * (rport->dev.parent).
+ */
+void srp_stop_rport_timers(struct srp_rport *rport)
+{
+ mutex_lock(&rport->mutex);
+ if (rport->state == SRP_RPORT_BLOCKED)
+ __rport_fail_io_fast(rport);
+ srp_rport_set_state(rport, SRP_RPORT_LOST);
+ mutex_unlock(&rport->mutex);
+
+ cancel_delayed_work_sync(&rport->reconnect_work);
+ cancel_delayed_work_sync(&rport->fast_io_fail_work);
+ cancel_delayed_work_sync(&rport->dev_loss_work);
+}
+EXPORT_SYMBOL_GPL(srp_stop_rport_timers);
+
static int srp_tsk_mgmt_response(struct Scsi_Host *shost, u64 nexus, u64 tm_id,
int result)
{
diff --git a/include/scsi/scsi_transport_srp.h b/include/scsi/scsi_transport_srp.h
index 4ebf691..f24e763 100644
--- a/include/scsi/scsi_transport_srp.h
+++ b/include/scsi/scsi_transport_srp.h
@@ -19,7 +19,7 @@ struct srp_rport_identifiers {
* @SRP_RPORT_BLOCKED: Transport layer not operational; fast I/O fail timer
* is running and I/O has been blocked.
* @SRP_RPORT_FAIL_FAST: Fast I/O fail timer has expired; fail I/O fast.
- * @SRP_RPORT_LOST: Device loss timer has expired; port is being removed.
+ * @SRP_RPORT_LOST: Port is being removed.
*/
enum srp_rport_state {
SRP_RPORT_RUNNING,
@@ -48,7 +48,6 @@ struct srp_rport {
struct mutex mutex;
enum srp_rport_state state;
- bool deleted;
int reconnect_delay;
int failed_reconnects;
struct delayed_work reconnect_work;
@@ -101,6 +100,7 @@ extern int srp_tmo_valid(int reconnect_delay, int fast_io_fail_tmo,
extern int srp_reconnect_rport(struct srp_rport *rport);
extern void srp_start_tl_fail_timers(struct srp_rport *rport);
extern void srp_remove_host(struct Scsi_Host *);
+extern void srp_stop_rport_timers(struct srp_rport *rport);
/**
* srp_chkready() - evaluate the transport layer state before I/O
--
1.8.1.4
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 3/3] scsi_transport_srp: Add rport state diagram
[not found] ` <52A88CF9.206-HInyCGIudOg@public.gmane.org>
@ 2013-12-11 16:08 ` Bart Van Assche
0 siblings, 0 replies; 5+ messages in thread
From: Bart Van Assche @ 2013-12-11 16:08 UTC (permalink / raw)
To: David Dillow
Cc: Vu Pham, Sebastian Riemer, James Bottomley, linux-rdma,
linux-scsi
[-- Attachment #1: Type: text/plain, Size: 2833 bytes --]
Add a diagram in Documentation/scsi/scsi_transport_srp that
illustrates the rport state transitions.
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Cc: James Bottomley <JBottomley-MU7nAjRaF3makBO8gow8eQ@public.gmane.org>
---
Documentation/scsi/scsi_transport_srp/Makefile | 7 ++++++
.../scsi_transport_srp/rport_state_diagram.dot | 26 ++++++++++++++++++++++
2 files changed, 33 insertions(+)
create mode 100644 Documentation/scsi/scsi_transport_srp/Makefile
create mode 100644 Documentation/scsi/scsi_transport_srp/rport_state_diagram.dot
diff --git a/Documentation/scsi/scsi_transport_srp/Makefile b/Documentation/scsi/scsi_transport_srp/Makefile
new file mode 100644
index 0000000..5f6b567
--- /dev/null
+++ b/Documentation/scsi/scsi_transport_srp/Makefile
@@ -0,0 +1,7 @@
+all: rport_state_diagram.svg rport_state_diagram.png
+
+rport_state_diagram.svg: rport_state_diagram.dot
+ dot -Tsvg -o $@ $<
+
+rport_state_diagram.png: rport_state_diagram.dot
+ dot -Tpng -o $@ $<
diff --git a/Documentation/scsi/scsi_transport_srp/rport_state_diagram.dot b/Documentation/scsi/scsi_transport_srp/rport_state_diagram.dot
new file mode 100644
index 0000000..75d610d
--- /dev/null
+++ b/Documentation/scsi/scsi_transport_srp/rport_state_diagram.dot
@@ -0,0 +1,26 @@
+digraph srp_initiator {
+ node [shape = doublecircle]; running lost;
+ node [shape = circle];
+
+ {
+ rank = min;
+ running_rta [ label = "running;\nreconnect\ntimer\nactive" ];
+ };
+ running [ label = "running;\nreconnect\ntimer\nstopped" ];
+ blocked;
+ failfast [ label = "fail I/O\nfast" ];
+ lost;
+
+ running -> running_rta [ label = "fast_io_fail_tmo = off and\ndev_loss_tmo = off;\nsrp_start_tl_fail_timers()" ];
+ running_rta -> running [ label = "fast_io_fail_tmo = off and\ndev_loss_tmo = off;\nreconnecting succeeded" ];
+ running -> blocked [ label = "fast_io_fail_tmo >= 0 or\ndev_loss_tmo >= 0;\nsrp_start_tl_fail_timers()" ];
+ running -> failfast [ label = "fast_io_fail_tmo = off and\ndev_loss_tmo = off;\nreconnecting failed\n" ];
+ blocked -> failfast [ label = "fast_io_fail_tmo\nexpired or\nreconnecting\nfailed" ];
+ blocked -> lost [ label = "dev_loss_tmo\nexpired or\nsrp_stop_rport_timers()" ];
+ failfast -> lost [ label = "dev_loss_tmo\nexpired or\nsrp_stop_rport_timers()" ];
+ blocked -> running [ label = "reconnecting\nsucceeded" ];
+ failfast -> failfast [ label = "reconnecting\nfailed" ];
+ failfast -> running [ label = "reconnecting\nsucceeded" ];
+ running -> lost [ label = "srp_stop_rport_timers()" ];
+ running_rta -> lost [ label = "srp_stop_rport_timers()" ];
+}
--
1.8.1.4
[-- Attachment #2: rport_state_diagram.svg --]
[-- Type: image/svg+xml, Size: 11319 bytes --]
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 0/3] SCSI SRP transport layer patches for kernel 3.14
2013-12-11 16:04 [PATCH 0/3] SCSI SRP transport layer patches for kernel 3.14 Bart Van Assche
` (2 preceding siblings ...)
[not found] ` <52A88CF9.206-HInyCGIudOg@public.gmane.org>
@ 2013-12-30 21:46 ` David Dillow
3 siblings, 0 replies; 5+ messages in thread
From: David Dillow @ 2013-12-30 21:46 UTC (permalink / raw)
To: Bart Van Assche
Cc: Vu Pham, Sebastian Riemer, James Bottomley, linux-rdma,
linux-scsi
On Wed, 2013-12-11 at 17:04 +0100, Bart Van Assche wrote:
> This series of three patches is what I came up with after several weeks
> of path failover testing of the 3.13-rc1 IB/SRP initiator on a large
> setup. The patches in this series are:
> * Make SRP transport layer behavior more consistent with that of the FC
> transport layer.
> * Fix a rare race condition triggered by path failover.
> * Document the rport state transitions.
For all three,
Acked-by: David Dillow <dillowda@ornl.gov>
Thanks Bart!
--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2013-12-30 21:56 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-12-11 16:04 [PATCH 0/3] SCSI SRP transport layer patches for kernel 3.14 Bart Van Assche
2013-12-11 16:05 ` [PATCH 1/3] scsi_transport_srp: Block rport upon TL error even with fast_io_fail_tmo = off Bart Van Assche
2013-12-11 16:06 ` [PATCH 2/3] scsi_transport_srp: Fix a race condition Bart Van Assche
[not found] ` <52A88CF9.206-HInyCGIudOg@public.gmane.org>
2013-12-11 16:08 ` [PATCH 3/3] scsi_transport_srp: Add rport state diagram Bart Van Assche
2013-12-30 21:46 ` [PATCH 0/3] SCSI SRP transport layer patches for kernel 3.14 David Dillow
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).