From: Bart Van Assche <bart.vanassche@sandisk.com>
To: Doug Ledford <dledford@redhat.com>
Cc: James Bottomley <jbottomley@odin.com>,
Sagi Grimberg <sagig@mellanox.com>,
Sebastian Parschauer <sebastian.riemer@profitbricks.com>,
linux-rdma <linux-rdma@vger.kernel.org>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: [PATCH 04/12] IB/srp: Fix connection state tracking
Date: Tue, 5 May 2015 17:27:17 +0200
Message-ID: <5548E155.70007@sandisk.com>
In-Reply-To: <1430838637.2407.209.camel@redhat.com>
On 05/05/15 17:10, Doug Ledford wrote:
> On Tue, 2015-05-05 at 16:26 +0200, Bart Van Assche wrote:
>> On 05/05/15 16:10, Doug Ledford wrote:
>>> However, while looking through the driver to research this, I noticed
>>> something else that seems more important if you ask me. With this patch
>>> we now implement individual channel connection tracking. However, in
>>> srp_queuecommand() you pick the channel based on the tag, and the blk
>>> layer has no idea of these disconnects, so the blk layer is free to
>>> assign a tag/channel to a channel that's disconnected, and then as best
>>> I can tell, you will simply try to post a work request to a channel
>>> that's already disconnected, which I would expect to fail if we have
>>> already disconnected this particular qp and not brought up a new one
>>> yet. So it seems to me there is a race condition between new incoming
>>> SCSI commands and this disconnect/reconnect window, and that maybe we
>>> should be sending these commands back to the mid layer for requeueing
>>> when the channel the blk_mq tag points to is disconnected. Or am I
>>> missing something in there?
>>
>> Hello Doug,
>>
>> Around the time a cable disconnect or other link layer failure is
>> detected by the SRP initiator or any other SCSI LLD it is unavoidable
>> that one or more SCSI requests fail. It is up to a higher layer (e.g.
>> dm-multipath + multipathd) to decide what to do with such requests, e.g.
>> queue these requests and resend these over another path.
>
> Sure, but that wasn't my point. My point was that if you know the
> channel is disconnected, then why don't you go immediately to the
> correct action in queuecommand (where correct action could be requeue
> waiting on reconnect or return with error, whatever is appropriate)?
> Instead you attempt to post a command to a known disconnected queue
> pair.
>
>> The SRP initiator driver has been tested thoroughly with the multipath
>> queue_if_no_path policy, with a fio job with I/O verification enabled
>> running on top of a dm device while concurrently repeatedly simulating
>> link layer failures (via ibportstate).
>
> Part of my questions here are because I don't know how the blk_mq
> handles certain conditions. However, your testing above only handles
> one case: all channels get dropped. As unlikely it may be, what if
> resource constraints caused just one channel to be dropped out of the
> bunch and the others stayed alive? Then the blk_mq would see requests
> on just one queue come back errored, while the others finished
> successfully. Does it drop that one queue out of rotation, or does it
> fail over the entire connection?

Hello Doug,

Sorry, but I don't think that a SCSI LLD is the appropriate layer to
choose between requeuing and failing a request. If multiple paths are
available between an initiator system and a SAN and one path fails,
only the multipath layer knows whether other working paths are
available. If a working path is still available, the request should
be resent as soon as possible over another path. The multipath layer can
only make that decision after a SCSI LLD has failed the request.

If only one channel fails, all other channels are disconnected and the
transport layer error handling mechanism is started.

Bart.
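
The division of labor argued for in this reply can be illustrated with a toy model in plain C. This is only a sketch under stated assumptions, not code from ib_srp, the SCSI mid-layer, or dm-multipath: struct path, lld_submit(), path_channel_failed(), mp_submit() and the enum names are all hypothetical and exist only for this illustration. The model assumes that a path is unusable as soon as any one of its channels fails, and that the multipath layer queues a request when no path works (the behavior the queue_if_no_path policy mentioned in the thread provides).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in the kernel. */

enum io_result { IO_OK, IO_FAILED };
enum mp_action { MP_DONE, MP_RESENT_OTHER_PATH, MP_QUEUED };

struct path {
    bool connected; /* false once any channel of this path has failed */
};

/* Assumption from the reply: one failed channel takes down the whole path. */
static void path_channel_failed(struct path *p)
{
    p->connected = false;
}

/* LLD role: complete the request or fail it. It never picks a retry policy. */
static enum io_result lld_submit(const struct path *p)
{
    return p->connected ? IO_OK : IO_FAILED;
}

/*
 * Multipath role: only this layer sees every path, so only it can decide
 * what to do with a request after the LLD has failed it.
 */
static enum mp_action mp_submit(const struct path *paths, size_t npaths,
                                size_t preferred)
{
    size_t i;

    assert(npaths > 0 && preferred < npaths);

    if (lld_submit(&paths[preferred]) == IO_OK)
        return MP_DONE;

    /* The LLD failed the request: resend it over another working path. */
    for (i = 0; i < npaths; i++) {
        if (i != preferred && lld_submit(&paths[i]) == IO_OK)
            return MP_RESENT_OTHER_PATH;
    }

    /* No working path is left: hold the request until one comes back. */
    return MP_QUEUED;
}
```

In this model the decision to resend or queue is reachable only after lld_submit() has returned IO_FAILED, which is the ordering the reply describes.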