From: Bart Van Assche <bart.vanassche@sandisk.com>
To: Laurence Oberman <loberman@redhat.com>
Cc: linux-block@vger.kernel.org,
linux-scsi <linux-scsi@vger.kernel.org>,
Mike Snitzer <snitzer@redhat.com>,
James Bottomley <James.Bottomley@HansenPartnership.com>,
device-mapper development <dm-devel@redhat.com>,
lsf@lists.linux-foundation.org
Subject: Re: [dm-devel] [Lsf] Notes from the four separate IO track sessions at LSF/MM
Date: Mon, 2 May 2016 11:49:54 -0700
Message-ID: <5727A152.70109@sandisk.com>
In-Reply-To: <1184712515.32596182.1461977223746.JavaMail.zimbra@redhat.com>

On 04/29/2016 05:47 PM, Laurence Oberman wrote:
> From: "Bart Van Assche" <bart.vanassche@sandisk.com>
> To: "Laurence Oberman" <loberman@redhat.com>
> Cc: "James Bottomley" <James.Bottomley@HansenPartnership.com>, "linux-scsi" <linux-scsi@vger.kernel.org>, "Mike Snitzer" <snitzer@redhat.com>, linux-block@vger.kernel.org, "device-mapper development" <dm-devel@redhat.com>, lsf@lists.linux-foundation.org
> Sent: Friday, April 29, 2016 8:36:22 PM
> Subject: Re: [dm-devel] [Lsf] Notes from the four separate IO track sessions at LSF/MM
>
>> On 04/29/2016 02:47 PM, Laurence Oberman wrote:
>>> Recovery with 21 LUNS is 300s that have in-flights to abort.
>>> [ ... ]
>>> eh_deadline is set to 10 on the 2 qlogic ports, eh_timeout is set
>>> to 10 for all devices. In multipath fast_io_fail_tmo=5
>>>
>>> I jam one of the target array ports and discard the commands
>>> effectively black-holing the commands and leave it that way until
>>> we recover and I watch the I/O. The recovery takes around 300s even
>>> with all the tuning and this effectively lands up in Oracle cluster
>>> evictions.
>>
>> This discussion started as a discussion about the time needed to fail
>> over from one path to another. How long did it take in your test before
>> I/O failed over from the jammed port to another port?
>
> Around 300s before the paths were declared hard failed and the
> devices offlined. This is when I/O restarts.
> The remaining paths on the second Qlogic port (that are not jammed)
> will not be used until the error handler activity completes.
>
> Until we get these for example, and device-mapper starts declaring
> paths down we are blocked.
> Apr 29 17:20:51 localhost kernel: sd 1:0:1:0: Device offlined - not
> ready after error recovery
> Apr 29 17:20:51 localhost kernel: sd 1:0:1:13: Device offlined - not
> ready after error recovery

Hello Laurence,

Everyone else on all mailing lists to which this message has been posted
replies below the message. Please follow this convention.

Regarding the fail-over time: the ib_srp driver guarantees that
scsi_done() is invoked from inside its terminate_rport_io() function.
Apparently the lpfc and the qla2xxx drivers behave differently. Please
work with the maintainers of these drivers to reduce fail-over time.

Bart.
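
For anyone reproducing the test quoted above, it may help to record the
recovery tunables actually in effect (eh_deadline, eh_timeout,
fast_io_fail_tmo) before jamming a port. The following is only a minimal
sketch, assuming the usual sysfs locations for these attributes; the names
read_recovery_tunables and PATTERNS are made up for this illustration and
are not part of any kernel or multipath tool.

```python
import glob
import os

# Glob patterns, relative to the sysfs root, for the three tunables named in
# the quoted test description. These locations are an assumption about the
# usual sysfs layout; adjust them if your kernel exposes them elsewhere.
PATTERNS = {
    "eh_deadline": "class/scsi_host/host*/eh_deadline",
    "eh_timeout": "class/scsi_device/*/device/eh_timeout",
    "fast_io_fail_tmo": "class/fc_remote_ports/rport-*/fast_io_fail_tmo",
}


def read_recovery_tunables(sysfs_root="/sys"):
    """Return {tunable: {path relative to sysfs_root: value as a string}}."""
    result = {}
    for name, pattern in PATTERNS.items():
        values = {}
        for path in sorted(glob.glob(os.path.join(sysfs_root, pattern))):
            try:
                with open(path) as f:
                    values[os.path.relpath(path, sysfs_root)] = f.read().strip()
            except OSError:
                # Attribute missing or unreadable on this system: skip it.
                continue
        result[name] = values
    return result


if __name__ == "__main__":
    for name, values in read_recovery_tunables().items():
        for path, value in values.items():
            print(f"{name}: {path} = {value}")
```

The sysfs root is a parameter so that the function can also be pointed at a
saved copy of the tree. Values are returned as strings because some of these
attributes can hold non-numeric settings such as "off".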