* [PATCH v3 0/3] IB/SRP patches for kernel 3.8
@ 2012-12-19 14:20 Bart Van Assche
[not found] ` <50D1CD49.5080607-HInyCGIudOg@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2012-12-19 14:20 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, David Dillow,
Or Gerlitz, Vu Pham, Alex Turin
This patch series avoids that SCSI error handling triggers an endless
loop and also restores reporting of QP errors in the kernel log.
Changes between v3 and v2:
- As proposed by Dave, added a patch that prevents sending of a task
management function over a closed connection.
Changes between v2 and v1:
- Track connection state properly.
- Make srp_reset_host() reset requests even if reconnecting fails
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 17+ messages in thread[parent not found: <50D1CD49.5080607-HInyCGIudOg@public.gmane.org>]
* [PATCH v3 1/3] IB/srp: Track connection state properly [not found] ` <50D1CD49.5080607-HInyCGIudOg@public.gmane.org> @ 2012-12-19 14:21 ` Bart Van Assche [not found] ` <50D1CD7D.6030905-HInyCGIudOg@public.gmane.org> 2012-12-19 14:22 ` [PATCH v3 2/3] IB/srp: Avoid sending a task management function needlessly Bart Van Assche ` (2 subsequent siblings) 3 siblings, 1 reply; 17+ messages in thread From: Bart Van Assche @ 2012-12-19 14:21 UTC (permalink / raw) Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, David Dillow, Or Gerlitz, Vu Pham, Alex Turin, Roland Dreier The connection state must be initialized before srp_connect_target() is invoked. Drop the assignment in srp_add_target() since it occurs after srp_connect_target() and since scsi_host_alloc() zero-initializes the Scsi_Host structure anyway. Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> Acked-by: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org> Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org> --- drivers/infiniband/ulp/srp/ib_srp.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index d5088ce..94f76b9 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -1972,7 +1972,6 @@ static int srp_add_target(struct srp_host *host, struct srp_target_port *target) spin_unlock(&host->target_lock); target->state = SRP_TARGET_LIVE; - target->connected = false; scsi_scan_target(&target->scsi_host->shost_gendev, 0, target->scsi_id, SCAN_WILD_CARD, 0); -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 17+ messages in thread
[parent not found: <50D1CD7D.6030905-HInyCGIudOg@public.gmane.org>]
* Re: [PATCH v3 1/3] IB/srp: Track connection state properly [not found] ` <50D1CD7D.6030905-HInyCGIudOg@public.gmane.org> @ 2012-12-19 18:04 ` David Dillow [not found] ` <1355940274.23687.2.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org> 0 siblings, 1 reply; 17+ messages in thread From: David Dillow @ 2012-12-19 18:04 UTC (permalink / raw) To: Bart Van Assche Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Or Gerlitz, Vu Pham, Alex Turin, Roland Dreier On Wed, 2012-12-19 at 15:21 +0100, Bart Van Assche wrote: > The connection state must be initialized before srp_connect_target() > is invoked. Drop the assignment in srp_add_target() since it occurs > after srp_connect_target() and since scsi_host_alloc() > zero-initializes the Scsi_Host structure anyway. The commit message is still wrong. The issue is that srp_connect_target() has marked us as active, and this patch removes an assignment that reverses that. This shouldn't go upstream without the fixed commit message. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <1355940274.23687.2.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org>]
* Re: [PATCH v3 1/3] IB/srp: Track connection state properly [not found] ` <1355940274.23687.2.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org> @ 2012-12-20 8:13 ` Bart Van Assche [not found] ` <50D2C8C4.3030308-HInyCGIudOg@public.gmane.org> 0 siblings, 1 reply; 17+ messages in thread From: Bart Van Assche @ 2012-12-20 8:13 UTC (permalink / raw) To: David Dillow; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On 12/19/12 19:04, David Dillow wrote: > On Wed, 2012-12-19 at 15:21 +0100, Bart Van Assche wrote: >> The connection state must be initialized before srp_connect_target() >> is invoked. Drop the assignment in srp_add_target() since it occurs >> after srp_connect_target() and since scsi_host_alloc() >> zero-initializes the Scsi_Host structure anyway. > > The commit message is still wrong. The issue is that > srp_connect_target() has marked us as active, and this patch removes an > assignment that reverses that. > > This shouldn't go upstream without the fixed commit message. Hi Dave, How about replacing the commit message by the following text: "Remove an assignment that incorrectly overwrites the connection state update by srp_connect_target()." Thanks, Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <50D2C8C4.3030308-HInyCGIudOg@public.gmane.org>]
* Re: [PATCH v3 1/3] IB/srp: Track connection state properly [not found] ` <50D2C8C4.3030308-HInyCGIudOg@public.gmane.org> @ 2012-12-20 15:10 ` David Dillow [not found] ` <1356016241.2507.1.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org> 0 siblings, 1 reply; 17+ messages in thread From: David Dillow @ 2012-12-20 15:10 UTC (permalink / raw) To: Bart Van Assche; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Thu, 2012-12-20 at 09:13 +0100, Bart Van Assche wrote: > On 12/19/12 19:04, David Dillow wrote: > > On Wed, 2012-12-19 at 15:21 +0100, Bart Van Assche wrote: > >> The connection state must be initialized before srp_connect_target() > >> is invoked. Drop the assignment in srp_add_target() since it occurs > >> after srp_connect_target() and since scsi_host_alloc() > >> zero-initializes the Scsi_Host structure anyway. > > > > The commit message is still wrong. The issue is that > > srp_connect_target() has marked us as active, and this patch removes an > > assignment that reverses that. > > > > This shouldn't go upstream without the fixed commit message. > > Hi Dave, > > How about replacing the commit message by the following text: > > "Remove an assignment that incorrectly overwrites the connection state > update by srp_connect_target()." Works for me, ship it! -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <1356016241.2507.1.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>]
* Re: [PATCH v3 1/3] IB/srp: Track connection state properly [not found] ` <1356016241.2507.1.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org> @ 2012-12-20 15:24 ` Bart Van Assche [not found] ` <50D32DC7.9000007-HInyCGIudOg@public.gmane.org> 2012-12-20 15:27 ` [PATCH v3 " Or Gerlitz 1 sibling, 1 reply; 17+ messages in thread From: Bart Van Assche @ 2012-12-20 15:24 UTC (permalink / raw) To: David Dillow; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On 12/20/12 16:10, David Dillow wrote: > On Thu, 2012-12-20 at 09:13 +0100, Bart Van Assche wrote: >> On 12/19/12 19:04, David Dillow wrote: >>> On Wed, 2012-12-19 at 15:21 +0100, Bart Van Assche wrote: >>>> The connection state must be initialized before srp_connect_target() >>>> is invoked. Drop the assignment in srp_add_target() since it occurs >>>> after srp_connect_target() and since scsi_host_alloc() >>>> zero-initializes the Scsi_Host structure anyway. >>> >>> The commit message is still wrong. The issue is that >>> srp_connect_target() has marked us as active, and this patch removes an >>> assignment that reverses that. >>> >>> This shouldn't go upstream without the fixed commit message. >> >> Hi Dave, >> >> How about replacing the commit message by the following text: >> >> "Remove an assignment that incorrectly overwrites the connection state >> update by srp_connect_target()." > > Works for me, ship it! Thanks ! I will post a fourth version of this patch with the above commit message to make it easier for Roland. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <50D32DC7.9000007-HInyCGIudOg@public.gmane.org>]
* [PATCH v4 1/3] IB/srp: Track connection state properly [not found] ` <50D32DC7.9000007-HInyCGIudOg@public.gmane.org> @ 2012-12-20 15:26 ` Bart Van Assche 0 siblings, 0 replies; 17+ messages in thread From: Bart Van Assche @ 2012-12-20 15:26 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Cc: David Dillow, Roland Dreier Remove an assignment that incorrectly overwrites the connection state update by srp_connect_target(). Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> Acked-by: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org> Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org> --- drivers/infiniband/ulp/srp/ib_srp.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index d5088ce..94f76b9 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -1972,7 +1972,6 @@ static int srp_add_target(struct srp_host *host, struct srp_target_port *target) spin_unlock(&host->target_lock); target->state = SRP_TARGET_LIVE; - target->connected = false; scsi_scan_target(&target->scsi_host->shost_gendev, 0, target->scsi_id, SCAN_WILD_CARD, 0); -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v3 1/3] IB/srp: Track connection state properly [not found] ` <1356016241.2507.1.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org> 2012-12-20 15:24 ` Bart Van Assche @ 2012-12-20 15:27 ` Or Gerlitz [not found] ` <50D32E7A.4070704-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 1 sibling, 1 reply; 17+ messages in thread From: Or Gerlitz @ 2012-12-20 15:27 UTC (permalink / raw) To: David Dillow Cc: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On 20/12/2012 17:10, David Dillow wrote: > Works for me, ship it! Dave, did you gave the patches a try? Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <50D32E7A.4070704-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH v3 1/3] IB/srp: Track connection state properly [not found] ` <50D32E7A.4070704-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> @ 2012-12-20 16:27 ` David Dillow [not found] ` <1356020823.8322.0.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org> 0 siblings, 1 reply; 17+ messages in thread From: David Dillow @ 2012-12-20 16:27 UTC (permalink / raw) To: Or Gerlitz Cc: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Thu, 2012-12-20 at 17:27 +0200, Or Gerlitz wrote: > On 20/12/2012 17:10, David Dillow wrote: > > Works for me, ship it! > > Dave, did you gave the patches a try? This one looks to be an obvious fix, but I have not tested the others; I have been relying on Bart, Alex, and yourself for the validation. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <1356020823.8322.0.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org>]
* Re: [PATCH v3 1/3] IB/srp: Track connection state properly [not found] ` <1356020823.8322.0.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org> @ 2012-12-20 20:06 ` Or Gerlitz 0 siblings, 0 replies; 17+ messages in thread From: Or Gerlitz @ 2012-12-20 20:06 UTC (permalink / raw) To: David Dillow Cc: Or Gerlitz, Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Thu, Dec 20, 2012 at 6:27 PM, David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org> wrote: > On Thu, 2012-12-20 at 17:27 +0200, Or Gerlitz wrote: >> Dave, did you gave the patches a try? > This one looks to be an obvious fix, but I have not tested the others; I > have been relying on Bart, Alex, and yourself for the validation. Taking into account that we've been going back and forth w.r.t that validation -- that is initially things weren't working on our setups, later with picking some patches from Bart's tree, that host removal went OK, and now with the patches posted here, it doesn't go fine on my setup -- all in all, sounds like a 2nd opinion which is strengthen by a test is needed here. You only need to boot with Roland's for-next + Bart's patches, connect to a target, take down the IB link and issue dd. No multipath scheme is needed for the very basic validation, can you? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v3 2/3] IB/srp: Avoid sending a task management function needlessly [not found] ` <50D1CD49.5080607-HInyCGIudOg@public.gmane.org> 2012-12-19 14:21 ` [PATCH v3 1/3] IB/srp: Track connection state properly Bart Van Assche @ 2012-12-19 14:22 ` Bart Van Assche [not found] ` <50D1CDB4.9070906-HInyCGIudOg@public.gmane.org> 2012-12-19 14:24 ` [PATCH v3 3/3] IB/srp: Avoid endless SCSI error handling loop Bart Van Assche 2012-12-20 12:38 ` [PATCH v3 0/3] IB/SRP patches for kernel 3.8 Or Gerlitz 3 siblings, 1 reply; 17+ messages in thread From: Bart Van Assche @ 2012-12-19 14:22 UTC (permalink / raw) Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, David Dillow, Or Gerlitz, Vu Pham, Alex Turin, Roland Dreier Do not send a task management function if sending will fail anyway because either there is no RDMA/RC connection or the QP is in the error state. Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> Cc: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org> Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org> --- drivers/infiniband/ulp/srp/ib_srp.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 94f76b9..2633258 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -1695,6 +1695,9 @@ static int srp_send_tsk_mgmt(struct srp_target_port *target, struct srp_iu *iu; struct srp_tsk_mgmt *tsk_mgmt; + if (!target->connected || target->qp_in_error) + return -1; + init_completion(&target->tsk_mgmt_done); spin_lock_irq(&target->lock); @@ -1754,8 +1757,6 @@ static int srp_reset_device(struct scsi_cmnd *scmnd) shost_printk(KERN_ERR, target->scsi_host, "SRP reset_device called\n"); - if (target->qp_in_error) - return FAILED; if (srp_send_tsk_mgmt(target, SRP_TAG_NO_REQ, scmnd->device->lun, SRP_TSK_LUN_RESET)) return FAILED; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 17+ messages in thread
[parent not found: <50D1CDB4.9070906-HInyCGIudOg@public.gmane.org>]
* Re: [PATCH v3 2/3] IB/srp: Avoid sending a task management function needlessly [not found] ` <50D1CDB4.9070906-HInyCGIudOg@public.gmane.org> @ 2012-12-19 18:05 ` David Dillow 0 siblings, 0 replies; 17+ messages in thread From: David Dillow @ 2012-12-19 18:05 UTC (permalink / raw) To: Bart Van Assche Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Or Gerlitz, Vu Pham, Alex Turin, Roland Dreier On Wed, 2012-12-19 at 15:22 +0100, Bart Van Assche wrote: > Do not send a task management function if sending will fail anyway > because either there is no RDMA/RC connection or the QP is in the > error state. > > Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v3 3/3] IB/srp: Avoid endless SCSI error handling loop [not found] ` <50D1CD49.5080607-HInyCGIudOg@public.gmane.org> 2012-12-19 14:21 ` [PATCH v3 1/3] IB/srp: Track connection state properly Bart Van Assche 2012-12-19 14:22 ` [PATCH v3 2/3] IB/srp: Avoid sending a task management function needlessly Bart Van Assche @ 2012-12-19 14:24 ` Bart Van Assche [not found] ` <50D1CE0B.2080902-HInyCGIudOg@public.gmane.org> 2012-12-20 12:38 ` [PATCH v3 0/3] IB/SRP patches for kernel 3.8 Or Gerlitz 3 siblings, 1 reply; 17+ messages in thread From: Bart Van Assche @ 2012-12-19 14:24 UTC (permalink / raw) Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, David Dillow, Or Gerlitz, Vu Pham, Alex Turin, Roland Dreier If a SCSI command times out it is passed to the SCSI error handler. The SCSI error handler will try to abort the commands that timed out. If aborting failed a device reset will be attempted. If the device reset also fails a host reset will be attempted. If the host reset also fails the whole procedure will be repeated. srp_abort() and srp_reset_device() fail for a QP in the error state. srp_reset_host() fails after host removal has started. Hence if the SCSI error handler gets invoked after host removal has started and with the QP in the error state an endless loop will be triggered. Modify the SCSI error handling functions in ib_srp as follows: - Abort SCSI commands properly even if the QP is in the error state. - Make srp_reset_host() reset SCSI requests even after host removal has already started or if reconnecting fails. Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> Cc: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org> Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org> Cc: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> Cc: Alex Turin <alextu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/ulp/srp/ib_srp.c | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 2633258..8a7eb9f 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -700,23 +700,24 @@ static int srp_reconnect_target(struct srp_target_port *target) struct Scsi_Host *shost = target->scsi_host; int i, ret; - if (target->state != SRP_TARGET_LIVE) - return -EAGAIN; - scsi_target_block(&shost->shost_gendev); srp_disconnect_target(target); /* - * Now get a new local CM ID so that we avoid confusing the - * target in case things are really fouled up. + * Now get a new local CM ID so that we avoid confusing the target in + * case things are really fouled up. Doing so also ensures that all CM + * callbacks will have finished before a new QP is allocated. */ ret = srp_new_cm_id(target); - if (ret) - goto unblock; - - ret = srp_create_target_ib(target); - if (ret) - goto unblock; + /* + * Whether or not creating a new CM ID succeeded, create a new + * QP. This guarantees that all completion callback function + * invocations have finished before request resetting starts. + */ + if (ret == 0) + ret = srp_create_target_ib(target); + else + srp_create_target_ib(target); for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) { struct srp_request *req = &target->req_ring[i]; @@ -728,9 +729,9 @@ static int srp_reconnect_target(struct srp_target_port *target) for (i = 0; i < SRP_SQ_SIZE; ++i) list_add(&target->tx_ring[i]->list, &target->free_tx); - ret = srp_connect_target(target); + if (ret == 0) + ret = srp_connect_target(target); -unblock: scsi_target_unblock(&shost->shost_gendev, ret == 0 ? SDEV_RUNNING : SDEV_TRANSPORT_OFFLINE); @@ -1739,7 +1740,7 @@ static int srp_abort(struct scsi_cmnd *scmnd) shost_printk(KERN_ERR, target->scsi_host, "SRP abort called\n"); - if (!req || target->qp_in_error || !srp_claim_req(target, req, scmnd)) + if (!req || !srp_claim_req(target, req, scmnd)) return FAILED; srp_send_tsk_mgmt(target, req->index, scmnd->device->lun, SRP_TSK_ABORT_TASK); -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 17+ messages in thread
[parent not found: <50D1CE0B.2080902-HInyCGIudOg@public.gmane.org>]
* Re: [PATCH v3 3/3] IB/srp: Avoid endless SCSI error handling loop [not found] ` <50D1CE0B.2080902-HInyCGIudOg@public.gmane.org> @ 2012-12-19 18:07 ` David Dillow 0 siblings, 0 replies; 17+ messages in thread From: David Dillow @ 2012-12-19 18:07 UTC (permalink / raw) To: Bart Van Assche Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Or Gerlitz, Vu Pham, Alex Turin, Roland Dreier On Wed, 2012-12-19 at 15:24 +0100, Bart Van Assche wrote: > If a SCSI command times out it is passed to the SCSI error > handler. The SCSI error handler will try to abort the commands > that timed out. If aborting failed a device reset will be > attempted. If the device reset also fails a host reset will > be attempted. If the host reset also fails the whole procedure > will be repeated. > > srp_abort() and srp_reset_device() fail for a QP in the error > state. srp_reset_host() fails after host removal has started. > Hence if the SCSI error handler gets invoked after host removal > has started and with the QP in the error state an endless loop > will be triggered. > > Modify the SCSI error handling functions in ib_srp as follows: > - Abort SCSI commands properly even if the QP is in the error > state. > - Make srp_reset_host() reset SCSI requests even after host > removal has already started or if reconnecting fails. > > Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> It's not the cleanest way to handle this, but it is the simplest and we're short on time. Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 0/3] IB/SRP patches for kernel 3.8 [not found] ` <50D1CD49.5080607-HInyCGIudOg@public.gmane.org> ` (2 preceding siblings ...) 2012-12-19 14:24 ` [PATCH v3 3/3] IB/srp: Avoid endless SCSI error handling loop Bart Van Assche @ 2012-12-20 12:38 ` Or Gerlitz [not found] ` <50D306DE.7080505-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 3 siblings, 1 reply; 17+ messages in thread From: Or Gerlitz @ 2012-12-20 12:38 UTC (permalink / raw) To: Bart Van Assche Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, David Dillow, Vu Pham, Alex Turin On 19/12/2012 16:20, Bart Van Assche wrote: > This patch series avoids that SCSI error handling triggers an endless > loop and also restores reporting of QP errors in the kernel log. > > Changes between v3 and v2: > - As proposed by Dave, added a patch that prevents sending of a task > management function over a closed connection. > > Changes between v2 and v1: > - Track connection state properly. > - Make srp_reset_host() reset requests even if reconnecting fails > Bart, I tried the patches now, took these three commits from git://github.com/bvanassche/linux.git branch 3.8-ib-srp-fixes which I assume are the ones you posted here and applied them on Roland's for-next branch commit 62862c0f93d47853b8d321fe5dcdd5d789e92d08 IB/srp: Avoid endless SCSI error handling loop commit d73006faf751df4fc2fff7514f6fc6a74c41fd35 IB/srp: Avoid sending a task management function needlessly commit befde6a2ceca6b894fb183ff96d175d6c380546b IB/srp: Track connection state properly Basically, I connected to an SRP target, later took down the initiator IB port, and then attempted to issue some IO over the SRP lun (no multipathing). What happens is that things are seemed to be detected properly, reconnection goes in a loop, but the SCSI host (host6) isn't deleted, I can't unload the module since there's non-zero ref count. Only when I bring back the port, I see scsi host6: ib_srp: reconnect succeeded sd 6:0:0:1: [sdc] Synchronizing SCSI cache ... etc on all the luns of that host and only then the SCSI host is finally removed, the module ref count goes to zero and I canunload the module. Maybe another patch is needed here? I think few days ago you had a patch on your tree named "Save and restore host_scribble during error handling", is it possible we need this here for happy removal of the scsi host? Or. Dec 20 14:06:13 vsa33 kernel: scsi6 : SRP.T10:0002C9030010B014 Dec 20 14:06:13 vsa33 kernel: scsi 6:0:0:0: Direct-Access SUN COMSTAR 1.0 PQ: 0 ANSI: 5 Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:0: Attached scsi generic sg1 type 0 Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:0: [sdb] 409600 512-byte logical blocks: (209 MB/200 MiB) Dec 20 14:06:13 vsa33 kernel: scsi 6:0:0:1: Direct-Access SUN COMSTAR 1.0 PQ: 0 ANSI: 5 Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:1: Attached scsi generic sg2 type 0 Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:1: [sdc] 409600 512-byte logical blocks: (209 MB/200 MiB) Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:1: [sdc] Write Protect is off Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:1: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Dec 20 14:06:13 vsa33 kernel: scsi 6:0:0:2: Direct-Access SUN COMSTAR 1.0 PQ: 0 ANSI: 5 Dec 20 14:06:13 vsa33 kernel: sdc: unknown partition table Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:2: Attached scsi generic sg3 type 0 Dec 20 14:06:13 vsa33 kernel: scsi 6:0:0:3: Direct-Access SUN COMSTAR 1.0 PQ: 0 ANSI: 5 Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:3: Attached scsi generic sg4 type 0 Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:2: [sdd] 409600 512-byte logical blocks: (209 MB/200 MiB) Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:3: [sde] 104857600 512-byte logical blocks: (53.6 GB/50.0 GiB) Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:0: [sdb] Write Protect is off Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:3: [sde] Write Protect is off Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:3: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:1: [sdc] Attached SCSI disk Dec 20 14:06:13 vsa33 kernel: sdb: unknown partition table Dec 20 14:06:13 vsa33 kernel: sde: unknown partition table Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:2: [sdd] Write Protect is off Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:0: [sdb] Attached SCSI disk Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:3: [sde] Attached SCSI disk Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:2: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Dec 20 14:06:13 vsa33 kernel: sdd: unknown partition table Dec 20 14:06:13 vsa33 kernel: sd 6:0:0:2: [sdd] Attached SCSI disk Dec 20 14:08:02 vsa33 kernel: scsi host6: ib_srp: failed send status 5 Dec 20 14:08:29 vsa33 kernel: scsi host6: SRP abort called Dec 20 14:08:39 vsa33 kernel: scsi host6: SRP abort called Dec 20 14:08:39 vsa33 kernel: scsi host6: SRP reset_device called Dec 20 14:08:39 vsa33 kernel: scsi host6: ib_srp: SRP reset_host called Dec 20 14:08:40 vsa33 kernel: scsi host6: ib_srp: Got failed path rec status -110 Dec 20 14:08:40 vsa33 kernel: scsi host6: ib_srp: Path record query failed Dec 20 14:08:40 vsa33 kernel: scsi host6: ib_srp: reconnect failed (-110), removing target port. Dec 20 14:08:40 vsa33 kernel: sd 6:0:0:1: Device offlined - not ready after error recovery Dec 20 14:08:40 vsa33 kernel: sd 6:0:0:1: [sdc] Unhandled error code Dec 20 14:08:40 vsa33 kernel: sd 6:0:0:1: [sdc] Dec 20 14:08:40 vsa33 kernel: sd 6:0:0:1: [sdc] CDB: Dec 20 14:08:40 vsa33 kernel: end_request: I/O error, dev sdc, sector 0 Dec 20 14:08:40 vsa33 kernel: Buffer I/O error on device sdc, logical block 0 Dec 20 14:08:40 vsa33 kernel: Buffer I/O error on device sdc, logical block 1 Dec 20 14:08:40 vsa33 kernel: sd 6:0:0:1: rejecting I/O to offline device Dec 20 14:08:40 vsa33 kernel: Buffer I/O error on device sdc, logical block 2 Dec 20 14:08:40 vsa33 kernel: Buffer I/O error on device sdc, logical block 3 Dec 20 14:08:40 vsa33 kernel: sd 6:0:0:0: [sdb] Synchronizing SCSI cache Dec 20 14:09:41 vsa33 kernel: scsi host6: SRP abort called Dec 20 14:09:51 vsa33 kernel: scsi host6: SRP abort called Dec 20 14:09:51 vsa33 kernel: scsi host6: SRP reset_device called Dec 20 14:09:51 vsa33 kernel: scsi host6: ib_srp: SRP reset_host called Dec 20 14:09:52 vsa33 kernel: scsi host6: ib_srp: Got failed path rec status -110 Dec 20 14:09:52 vsa33 kernel: scsi host6: ib_srp: Path record query failed Dec 20 14:09:52 vsa33 kernel: scsi host6: ib_srp: reconnect failed (-110), removing target port. Dec 20 14:09:52 vsa33 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery Dec 20 14:10:53 vsa33 kernel: scsi host6: SRP abort called Dec 20 14:11:03 vsa33 kernel: scsi host6: SRP abort called Dec 20 14:11:03 vsa33 kernel: scsi host6: SRP reset_device called Dec 20 14:11:03 vsa33 kernel: scsi host6: ib_srp: SRP reset_host called Dec 20 14:11:04 vsa33 kernel: scsi host6: ib_srp: Got failed path rec status -110 Dec 20 14:11:04 vsa33 kernel: scsi host6: ib_srp: Path record query failed Dec 20 14:11:04 vsa33 kernel: scsi host6: ib_srp: reconnect failed (-110), removing target port. Dec 20 14:11:04 vsa33 kernel: sd 6:0:0:0: Device offlined - not ready after error recovery Dec 20 14:12:05 vsa33 kernel: scsi host6: SRP abort called Dec 20 14:12:15 vsa33 kernel: scsi host6: SRP abort called Dec 20 14:12:15 vsa33 kernel: scsi host6: SRP reset_device called Dec 20 14:12:15 vsa33 kernel: scsi host6: ib_srp: SRP reset_host called Dec 20 14:12:15 vsa33 kernel: scsi host6: ib_srp: reconnect succeeded Dec 20 14:12:25 vsa33 kernel: sd 6:0:0:1: [sdc] Synchronizing SCSI cache Dec 20 14:12:25 vsa33 kernel: sd 6:0:0:2: [sdd] Synchronizing SCSI cache Dec 20 14:12:25 vsa33 kernel: sd 6:0:0:3: [sde] Synchronizing SCSI cache -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <50D306DE.7080505-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH v3 0/3] IB/SRP patches for kernel 3.8 [not found] ` <50D306DE.7080505-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> @ 2012-12-20 13:19 ` Bart Van Assche [not found] ` <50D3105C.10005-HInyCGIudOg@public.gmane.org> 0 siblings, 1 reply; 17+ messages in thread From: Bart Van Assche @ 2012-12-20 13:19 UTC (permalink / raw) To: Or Gerlitz Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, David Dillow On 12/20/12 13:38, Or Gerlitz wrote: > I think few days ago you had a patch on your tree named "Save and > restore host_scribble during error handling", is it possible we need > this here for happy removal of the scsi host? No. Host removal works fine even without that patch. That's because srp_abort() has been modified such that it finishes a request whether or not sending a task management function to the target succeeded. As you can see in scsi_eh_abort_cmnds() if the eh_abort_handler callback function returns SUCCESS then scsi_eh_test_devices() will be passed an empty work_q list. Hence the SCSI error handler function scsi_eh_prep_cmnd() does not get invoked and hence does not have a chance of overwriting the host_scribble field. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <50D3105C.10005-HInyCGIudOg@public.gmane.org>]
* Re: [PATCH v3 0/3] IB/SRP patches for kernel 3.8 [not found] ` <50D3105C.10005-HInyCGIudOg@public.gmane.org> @ 2012-12-20 14:56 ` Or Gerlitz 0 siblings, 0 replies; 17+ messages in thread From: Or Gerlitz @ 2012-12-20 14:56 UTC (permalink / raw) To: Bart Van Assche Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, David Dillow On 20/12/2012 15:19, Bart Van Assche wrote: > On 12/20/12 13:38, Or Gerlitz wrote: >> I think few days ago you had a patch on your tree named "Save and >> restore host_scribble during error handling", is it possible we need >> this here for happy removal of the scsi host? > > No. Host removal works fine even without that patch. not on my setup, how do you suggest to proceed here? Or. > That's because srp_abort() has been modified such that it finishes a > request whether or not sending a task management function to the > target succeeded. As you can see in scsi_eh_abort_cmnds() if the > eh_abort_handler callback function returns SUCCESS then > scsi_eh_test_devices() will be passed an empty work_q list. Hence the > SCSI error handler function scsi_eh_prep_cmnd() does not get invoked > and hence does not have a chance of overwriting the host_scribble field. > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2012-12-20 20:06 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-19 14:20 [PATCH v3 0/3] IB/SRP patches for kernel 3.8 Bart Van Assche
[not found] ` <50D1CD49.5080607-HInyCGIudOg@public.gmane.org>
2012-12-19 14:21 ` [PATCH v3 1/3] IB/srp: Track connection state properly Bart Van Assche
[not found] ` <50D1CD7D.6030905-HInyCGIudOg@public.gmane.org>
2012-12-19 18:04 ` David Dillow
[not found] ` <1355940274.23687.2.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org>
2012-12-20 8:13 ` Bart Van Assche
[not found] ` <50D2C8C4.3030308-HInyCGIudOg@public.gmane.org>
2012-12-20 15:10 ` David Dillow
[not found] ` <1356016241.2507.1.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2012-12-20 15:24 ` Bart Van Assche
[not found] ` <50D32DC7.9000007-HInyCGIudOg@public.gmane.org>
2012-12-20 15:26 ` [PATCH v4 " Bart Van Assche
2012-12-20 15:27 ` [PATCH v3 " Or Gerlitz
[not found] ` <50D32E7A.4070704-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2012-12-20 16:27 ` David Dillow
[not found] ` <1356020823.8322.0.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org>
2012-12-20 20:06 ` Or Gerlitz
2012-12-19 14:22 ` [PATCH v3 2/3] IB/srp: Avoid sending a task management function needlessly Bart Van Assche
[not found] ` <50D1CDB4.9070906-HInyCGIudOg@public.gmane.org>
2012-12-19 18:05 ` David Dillow
2012-12-19 14:24 ` [PATCH v3 3/3] IB/srp: Avoid endless SCSI error handling loop Bart Van Assche
[not found] ` <50D1CE0B.2080902-HInyCGIudOg@public.gmane.org>
2012-12-19 18:07 ` David Dillow
2012-12-20 12:38 ` [PATCH v3 0/3] IB/SRP patches for kernel 3.8 Or Gerlitz
[not found] ` <50D306DE.7080505-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2012-12-20 13:19 ` Bart Van Assche
[not found] ` <50D3105C.10005-HInyCGIudOg@public.gmane.org>
2012-12-20 14:56 ` Or Gerlitz
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).