From: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
To: Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>,
Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>,
James Bottomley
<James.Bottomley-JuX6DAaQMKPCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>,
"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
linux-scsi <linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
fujita.tomonori-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org,
rcj-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org,
Alex Turin <alextu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Subject: Re: [PATCH 00/11] First pass at merging Bart's HA work
Date: Wed, 05 Dec 2012 19:50:08 +0100
Message-ID: <50BF9760.2080801@acm.org>
In-Reply-To: <CAJZOPZJBTRXftrW5NWEEHnf2QWsni0HMTAV_PKSgDtA7GO=wRw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On 12/05/12 19:23, Or Gerlitz wrote:
> On Fri, Nov 30, 2012 at 4:21 AM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> [...]
>> Modulo a few style issues (braces around one line if branches, etc.) and
>> having three state variables vs one, I can live with everything up to
>> aabfa852acd27962 at git://github.com/bvanassche/linux.git#srp-ha. Those
>> two are small things that can be fixed later and are not worth holding
>> things up any further.
>>
>> I'll try to spend some time on the final four patches tomorrow afternoon.
>
> Dave, Bart
>
> My colleague Alex Turin <alextu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> tested the bits today as
> they appear in Roland's kernel.org tree / for-next branch up to commit
> fb57e1dbbd4, and here is some feedback.
>
> Basically, what he did was connect to a target, then take down the
> IB port on the initiator side, and issue some I/O (dd if=/dev/sdb
> of=/dev/null count=1).
>
> Our reconstruction of events from the logs (below) is as follows:
>
> 1. The queued command gets completion status 5.
>
> 2. As part of error handling, srp_reset_host() is called.
>
> 3. srp_reset_host() calls srp_reconnect_target(), which fails because
> the port is down.
>
> 4. On failure, srp_reconnect_target() calls srp_queue_remove_work(),
> which sets target->state to SRP_TARGET_REMOVED.
>
> 5. srp_reset_host() is called a second time. It calls
> srp_reconnect_target(), but target->state == SRP_TARGET_REMOVED.
> srp_reconnect_target() checks whether target->state != SRP_TARGET_LIVE
> and returns -EAGAIN.
>
> This probably means that even after the port is re-enabled it will
> still fail to reconnect?
Hello Or,
The only way to make I/O work reliably when a failure can occur at the
transport layer is to run multipathd on top of ib_srp. If a connection
fails for some reason, the SRP SCSI host is removed once the SCSI error
handler has finished its error recovery strategy. Once the transport
layer is operational again and srp_daemon detects that the initiator is
no longer logged in, srp_daemon makes ib_srp log in again. multipathd
then causes I/O to continue over the new path.
Bart.
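The sequence in steps 2 through 5 of the quoted report can be replayed
with a small userspace model. This is only a sketch of the behaviour as
described in this thread, not the ib_srp code: the state names
SRP_TARGET_LIVE and SRP_TARGET_REMOVED come from the report, while
struct model_target, the model_* functions, the port_up flag and the
-ENOTCONN return value are hypothetical stand-ins chosen for
illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* State names as quoted in the thread; the rest is a simplified model. */
enum srp_target_state {
	SRP_TARGET_LIVE,
	SRP_TARGET_REMOVED,
};

/* Hypothetical stand-in for the target; not the kernel structure. */
struct model_target {
	enum srp_target_state state;
	bool port_up;	/* assumed flag representing the IB port status */
};

/* Models step 4: queueing removal marks the target as removed. */
static void model_queue_remove_work(struct model_target *t)
{
	t->state = SRP_TARGET_REMOVED;
}

/* Models steps 3 to 5: refuse unless LIVE, fail if the port is down. */
static int model_reconnect_target(struct model_target *t)
{
	if (t->state != SRP_TARGET_LIVE)
		return -EAGAIN;
	if (!t->port_up) {
		model_queue_remove_work(t);
		return -ENOTCONN;	/* assumed error code for the model */
	}
	return 0;
}

/* Models step 2: the error handler's host reset tries to reconnect. */
static int model_reset_host(struct model_target *t)
{
	return model_reconnect_target(t);
}

/*
 * Replays the reported scenario: two resets with the port down, then
 * one more after the port comes back. Results land in results[0..2].
 */
static void model_replay_scenario(int results[3])
{
	struct model_target t = { .state = SRP_TARGET_LIVE, .port_up = false };

	results[0] = model_reset_host(&t);	/* port down, target removed */
	results[1] = model_reset_host(&t);	/* target no longer LIVE */
	t.port_up = true;
	results[2] = model_reset_host(&t);	/* port up, same target object */
}
```

In this model the third reset still returns -EAGAIN, because the old
target object never leaves SRP_TARGET_REMOVED. That matches the concern
raised in the report, and it is consistent with the reply above: the
recovery path is a fresh login triggered by srp_daemon plus multipathd
switching I/O to the new path, not a revival of the removed host.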