Re: hosts resets in SRP and the rest of the world, was: Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()


From: Bart Van Assche <bart.vanassche@sandisk.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Sagi Grimberg <sagig@dev.mellanox.co.il>,
	Doug Ledford <dledford@redhat.com>,
	James Bottomley <jbottomley@odin.com>,
	Sagi Grimberg <sagig@mellanox.com>,
	Sebastian Parschauer <sebastian.riemer@profitbricks.com>,
	Jens Axboe <axboe@fb.com>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	Hannes Reinecke <hare@suse.de>
Subject: Re: hosts resets in SRP and the rest of the world, was: Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
Date: Mon, 11 May 2015 10:54:30 +0200
Message-ID: <55506E46.2060103@sandisk.com>
In-Reply-To: <20150511075058.GA18483@infradead.org>

On 05/11/15 09:50, Christoph Hellwig wrote:
> Hi Bart,
>
> I've looked at this and didn't really like the unconditional hctx lock
> in the blk-mq path which might have nasty effects when just using a
> single hctx.
>
> So I'm taking another step back and trying to understand what you're
> doing here.
>
> Let me try to recreate the issue:
>
>   - we get a ->host_reset call for the SRP initiator, which then
>     calls srp_reconnect_rport, at which point we still have outstanding
>     commands on the wire, and we still allow concurrent I/O submission
>   - srp_reconnect_rport then blocks new I/O, and tries to drain the
>     pending requests from ->queuecommand.  It then calls into
>     srp_rport_reconnect, which after some work also clears out all
>     commands on the wire and then reconnects
>
> Maybe it's time to move to what Hannes suggested in
> events.linuxfoundation.org/sites/events/files/slides/SCSI-EH.pdf
> slides 56+ at least for SRP as a start, that is:
>
>   - once escalating to a LUN reset fail all commands for the LUN
>     and block the LUN for I/O and send a TMF abort
>   - once escalating to the host reset fail all I/O for the host
>     and block the host (all LUNs) for I/O, and only then call
>     the host reset action (reconnect in the SRP case)
>     (or rather replace the current SRP host reset with the
>     I_T Nexus reset suggested by Hannes)
>
> The advantage is that we can do the full drain much more easily
> than just waiting for commands leaving ->queuecommand.  The other
> advantage is that we can implement this with fairly small changes
> in the scsi_error.c code triggered off a host or transport template
> flag, without touching code in the block layer while at the same
> time significantly simplifying the transport layer and drivers.

Hello Christoph,

There are multiple events that can cause the SRP initiator driver to 
initiate a reconnect:
1. The SCSI core invoking eh_host_reset_handler().
2. An error reported by the IB HCA or by the IB core, e.g. an RDMA
    transmit timeout or a transport layer disconnect reported by the
    IB/CM.

The reason I added (2) is to reduce the failover time in a H.A. setup. 
If e.g. a path fails it can take up to (2 * SCSI timeout) before all 
outstanding SCSI commands have timed out. The next step is that the SCSI 
error handler invokes a device reset. If a cable has been pulled the 
task management function issued by srp_reset_device() will time out. The 
next step is that srp_reset_host() will try to perform a reconnect. If a 
cable has been pulled this reconnect attempt will also time out. Because 
of how the retry count and timeout parameters for establishing a 
connection in the SRP initiator have been chosen it can take 
considerable time before a reconnect attempt times out and hence before 
srp_reset_host() reports a failure.
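
The escalation sequence above can be turned into a rough worst-case
estimate. The sketch below is illustrative only: the function name and
all timeout values are hypothetical and are not taken from the SRP
initiator or the SCSI core. Only the order of the steps follows the
description above.

```python
def eh_escalation_failover_time(scsi_timeout, tmf_timeout, reconnect_timeout):
    """Rough upper bound, in seconds, on the time before srp_reset_host()
    reports a failure after a cable pull, if only the SCSI error handler
    triggers the reconnect. All three parameters are assumed values."""
    # Step 1: up to 2 * SCSI timeout before all outstanding commands time out.
    command_phase = 2 * scsi_timeout
    # Step 2: the task management function sent by the device reset times out.
    # Step 3: the reconnect attempt made by the host reset also times out.
    return command_phase + tmf_timeout + reconnect_timeout


# Hypothetical numbers, chosen only to show the arithmetic.
print(eh_escalation_failover_time(scsi_timeout=30, tmf_timeout=60,
                                  reconnect_timeout=120))  # prints 240
```

With these assumed values the estimate is four minutes, which is the
order of magnitude of the "several minutes" complaint mentioned below.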

A common complaint about older versions of the SRP initiator was that 
failover took too long, namely several minutes instead of less than a 
minute. The reason (2) was introduced is to reduce the path 
failover time to less than a minute. As soon as the IB HCA and/or IB 
core have reported an error we know that a connection has to be 
reestablished. Waiting until the SCSI error handler has finished its 
escalation strategy only slows down failover and does not provide any 
benefits from the point of view of an SRP initiator.
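
A minimal sketch of that argument, under the assumption that the only
thing that matters is when the reconnect starts. The function and
parameter names are hypothetical and do not correspond to anything in
ib_srp or scsi_transport_srp.

```python
def reconnect_start_time(transport_error_at, eh_host_reset_at,
                         react_to_transport_errors):
    """Time (seconds after the path failure) at which a reconnect starts.

    transport_error_at: when the IB HCA or IB core reports an error,
        or None if no such report arrives (assumed input).
    eh_host_reset_at: when the SCSI error handler has escalated to the
        host reset handler (assumed input).
    react_to_transport_errors: True models event (2) being enabled.
    """
    if react_to_transport_errors and transport_error_at is not None:
        # Whichever trigger fires first starts the reconnect.
        return min(transport_error_at, eh_host_reset_at)
    # Without (2), the reconnect waits for the full EH escalation.
    return eh_host_reset_at


# Hypothetical timings: error reported after 5 s, EH escalation done at 120 s.
with_trigger = reconnect_start_time(5, 120, True)
without_trigger = reconnect_start_time(5, 120, False)
```

In this model, dropping (2) can only delay the start of the reconnect
or leave it unchanged; it can never make it start earlier.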

In summary, if it is possible to modify the SCSI error handling 
strategy such that (2) can be dropped without increasing the SRP 
initiator failover time, I would definitely like to hear about it. But 
I'm not sure that's possible.

Best regards,

Bart.
