From: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>,
	Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>,
	James Bottomley
	<James.Bottomley-JuX6DAaQMKPCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>,
	"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	linux-scsi <linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	fujita.tomonori-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org,
	rcj-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org,
	Alex Turin <alextu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Subject: Re: [PATCH 00/11] First pass at merging Bart's HA work
Date: Wed, 05 Dec 2012 20:50:54 +0100
Message-ID: <50BFA59E.10208@acm.org>
In-Reply-To: <50BF9760.2080801-HInyCGIudOg@public.gmane.org>

On 12/05/12 19:50, Bart Van Assche wrote:
> On 12/05/12 19:23, Or Gerlitz wrote:
>> On Fri, Nov 30, 2012 at 4:21 AM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
>> [...]
>>> Modulo a few style issues (braces around one line if branches, etc.) and
>>> having three state variables vs one, I can live with everything up to
>>> aabfa852acd27962 at git://github.com/bvanassche/linux.git#srp-ha. Those
>>> two are small things that can be fixed later and are not worth holding
>>> things up any further.
>>>
>>> I'll try to spend some time on the final four patches tomorrow
>>> afternoon.
>>
>> Dave, Bart
>>
>> My colleague Alex Turin <alextu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> today tried the bits as
>> they appear in Roland's kernel.org tree / for-next branch up to commit
>> fb57e1dbbd4, and here's some feedback.
>>
>> Basically, what he did was connect to a target, then take down the
>> IB port on the initiator side, and issue some I/O (dd if=/dev/sdb
>> of=/dev/null count=1).
>>
>> Our reconstruction of events from the logs (below) is the following:
>>
>> 1. The queued command gets completion status 5.
>>
>> 2. As part of error handling, srp_reset_host() is called.
>>
>> 3. srp_reset_host() calls srp_reconnect_target(), which fails because
>> the port is down.
>>
>> 4. On failure, srp_reconnect_target() calls srp_queue_remove_work(),
>> which sets target->state to SRP_TARGET_REMOVED.
>>
>> 5. srp_reset_host() is called a second time. It calls
>> srp_reconnect_target(), but target->state == SRP_TARGET_REMOVED.
>> srp_reconnect_target() checks whether target->state != SRP_TARGET_LIVE and
>> returns -EAGAIN.
>>
>> This probably means that even after the port is enabled it will still
>> fail to reconnect?
>
> Hello Or,
>
> The only way to make I/O work reliably if a failure can occur at the
> transport layer is to use multipathd on top of ib_srp. If a connection
> fails for some reason, then the SRP SCSI host will be removed after the
> SCSI error handler has finished its error recovery strategy. Once the
> transport layer is operational again and srp_daemon detects that the
> initiator is no longer logged in, srp_daemon will make ib_srp log in
> again. multipathd will then cause I/O to continue over the new path.
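
The sequence in steps 1-5 quoted above can be sketched as a small
userspace model. This is only an illustration of the reported behaviour,
not the ib_srp code: struct model_target, the port_up flag, the model_*
function names and the -ENOTCONN return value are all assumptions made
for the sketch. Only the state names SRP_TARGET_LIVE and
SRP_TARGET_REMOVED and the -EAGAIN result come from the report.

```c
/* Userspace model of the reported sequence; NOT the ib_srp implementation. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum srp_target_state {
	SRP_TARGET_LIVE,
	SRP_TARGET_REMOVED,
};

/* Hypothetical stand-in for the driver's per-target structure. */
struct model_target {
	enum srp_target_state state;
	bool port_up;	/* assumption: models whether the IB port is usable */
};

/* Step 4: queuing removal marks the target as REMOVED. */
void model_queue_remove_work(struct model_target *t)
{
	t->state = SRP_TARGET_REMOVED;
}

/* Steps 3 and 5: what a reconnect attempt does in this model. */
int model_reconnect_target(struct model_target *t)
{
	assert(t != NULL);

	/* Step 5: a target that is no longer LIVE is rejected up front. */
	if (t->state != SRP_TARGET_LIVE)
		return -EAGAIN;

	/* Step 3: reconnecting fails while the port is down. */
	if (!t->port_up) {
		model_queue_remove_work(t);
		return -ENOTCONN;	/* placeholder; real error code not given in the report */
	}

	return 0;
}
```

In this model, a first call with port_up == false leaves the target in
SRP_TARGET_REMOVED, and every later call returns -EAGAIN even if port_up
is set to true again, which is the situation the question above is about.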

(replying to my own e-mail)

Another possible approach would be to follow the FC model: block I/O
when a port goes down and unblock it once I/O is possible again.
Some time ago I posted a patch that went somewhat in this direction,
in which ib_srp repeatedly tried to reconnect to a target after a
transport layer failure. That patch can be found here:

http://www.mail-archive.com/linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg10158.html
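
To make the block/unblock idea concrete, here is a rough userspace
sketch of the control flow. It is not taken from the patch linked above;
struct model_rport, the MODEL_IO_* states and the function names are
invented for illustration, and a real implementation would drive the
retry from a timer or work item rather than from explicit calls.

```c
/* Userspace sketch of "block on port down, retry, unblock"; names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_io_state {
	MODEL_IO_RUNNING,	/* requests are passed to the transport */
	MODEL_IO_BLOCKED,	/* requests are held back, not failed */
};

struct model_rport {
	enum model_io_state io;
	bool port_up;			/* assumption: models transport availability */
	unsigned int reconnect_attempts;
};

/* Transport failure: hold I/O instead of removing the SCSI host. */
void model_port_down(struct model_rport *r)
{
	assert(r != NULL);
	r->port_up = false;
	r->io = MODEL_IO_BLOCKED;
}

/*
 * One periodic reconnect attempt. Returns true once I/O is running
 * again, false if the port is still blocked and another attempt is needed.
 */
bool model_reconnect_tick(struct model_rport *r)
{
	assert(r != NULL);

	if (r->io == MODEL_IO_RUNNING)
		return true;

	r->reconnect_attempts++;
	if (!r->port_up)
		return false;	/* still down: stay blocked and retry later */

	r->io = MODEL_IO_RUNNING;	/* reconnect succeeded: unblock I/O */
	return true;
}
```

The difference from the removal-based flow is that a failed attempt
leaves the port in a retryable state, so bringing the port back up is
enough for a later attempt to succeed.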

Bart.
