From: Vu Pham <vuhuong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
To: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <rdreier-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>,
Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>,
Linux RDMA list
<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Bart Van Assche
<bart.vanassche-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Subject: Re: [ofa-general][PATCH 3/4] SRP fail-over faster
Date: Fri, 23 Oct 2009 09:50:55 -0700
Message-ID: <4AE1DEEF.5070205@mellanox.com>
In-Reply-To: <1256258049.1598.8.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
David Dillow wrote:
> On Thu, 2009-10-22 at 20:24 -0400, Vu Pham wrote:
>
>> David Dillow wrote:
>>
>>> On Thu, 2009-10-22 at 20:04 -0400, Vu Pham wrote:
>>>
>>>> Yes, and you cannot disable it entirely. I'm still looking at the
>>>> benefits/advantages of disabling it entirely
>>>>
>>> To me, the advantage is I have a perfectly viable backup path to the
>>> storage, and can immediately start issuing commands to it rather than
>>> waiting for any timeout. On my systems, 1 second can be up to 1500 MB
>>> transferred and a _huge_ number of compute cycles. And I expect those
>>> numbers to grow.
>>>
>>>
>> You can still do so with these patches applied by using the right device
>> name (ie. /dev/sdXXX)
>>
>
> Not in a multipath situation configured for failover. I have to use the
> multipath device, which will then use the appropriate path as
> prioritized by ALUA.
>
>
I don't know much about multipath in ALUA mode.
How would the multipath driver (in ALUA mode) switch paths? (i.e. based on
what criteria?)
Can you switch paths manually from user space (while there are commands
stuck on the currently active path)?
Without this patch, all outstanding I/Os have to go through error recovery
before being returned with an error code so that dm-multipath can fail over.
>>>> I use the user supplied setting for local async event on port error
>>>> where link is broken from host to switch
>>>>
>>> Perhaps that part should be in the patch that adds that support, then?
>>>
>>>
>> That's patch #4
>>
>
> Sure, and perhaps the part that massages the timeout should be in the
> patch that introduces it and actually uses it, no?
>
>
I will look at it and rework the patch.
>>> This makes a certain amount of sense; I was confused by the two
>>> unrelated changes in this patch. I'm still not all that happy about a
>>> hard-coded 5 seconds, especially with no explanation about the magic
>>> number.
>>>
>>>
>> As I said above, it's not magic at all; it's just that an unknown number
>> of seconds has already passed, so I just picked X seconds to sleep.
>>
>
> Sorry, this is a common idiom here -- a bare number in source code, with
> no explanation as to where it came from or why it was picked, is often
> called a "magic number."
>
> I'm saying you should comment on it, either in the commit message or in
> a comment in the code. Or better yet, give it a #define and a comment
> above that definition that says why you picked it.
>
> In other words, don't make someone who comes along after us have to
> search for this mail thread to figure out that the 5 second sleep was
> pulled out of thin air.
>
>
Understood.
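Something along these lines is what I understand the request to be. This
is only a sketch: the macro name SRP_PORT_ERR_EXTRA_WAIT_SEC and the helper
srp_extra_wait_ms() are made up for illustration and are not taken from the
patch.

```c
/*
 * Hypothetical name, for illustration only (not from the patch).
 *
 * By the time the driver sees the port error, an unknown number of
 * seconds has already elapsed, so the remaining wait cannot be derived
 * from the user-supplied timeout. 5 seconds is an arbitrary pick.
 */
#define SRP_PORT_ERR_EXTRA_WAIT_SEC 5

/* Convert the extra wait to milliseconds, e.g. for a msleep()-style call. */
static unsigned int srp_extra_wait_ms(void)
{
	return SRP_PORT_ERR_EXTRA_WAIT_SEC * 1000u;
}
```

The comment above the definition carries the reasoning, so nobody has to
dig through the list archive to learn where the 5 came from.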
>>>> To really sleep the user-supplied number of seconds, we need to register
>>>> for a trap with the SM and receive the trap for a node leaving the fabric.
>>>> That requires a lot of changes in srp_daemon (registering for the trap,
>>>> passing the event down to the srp driver) and in the srp driver
>>>> (handling this event)
>>>>
>>>>
>>> Well, if this were done, then you wouldn't need to sleep at all would
>>> you? Just wait for the trap telling you the target rejoined the fabric?
>>> Perhaps you'd want a delay before tearing down the target connection,
>>> but then that could be part of the user settings above?
>>>
>>> Not that I'm sure it is worth it, though.
>>>
>>>
>> If it's done, you still need to sleep target->device_loss_timeout
>> (instead of some unknown seconds + 5) to tear down connection so that
>> dm-multipath can fail-over.
>>
>
> Or I can just start failing requests due to knowing they won't get to
> the target so dm-multipath will use the backup path immediately. I can
> sleep as long as I want before killing the connection, just in case it
> comes back, but my commands will still be going to the other path.
>
>
If you want to fail requests right away, you can just set
device_loss_timeout=1; others don't want dm-multipath to switch paths
right away. That's the whole idea of the patches that I submitted.
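
To make the intended behaviour concrete, here is a simplified model of the
decision, not the patch code. The struct srp_target_model and the function
srp_should_fail_io() are hypothetical names; only device_loss_timeout comes
from the patches, and the comparison below is my shorthand for "fail the
commands once the timeout has elapsed".

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-target state discussed above. */
struct srp_target_model {
	unsigned int device_loss_timeout;	/* seconds, user supplied */
};

/*
 * Return true once outstanding commands should be failed back to the
 * upper layers so that dm-multipath can switch to another path.
 * Until then the commands stay queued in case the path comes back.
 */
static bool srp_should_fail_io(const struct srp_target_model *t,
			       unsigned int secs_since_loss)
{
	return secs_since_loss >= t->device_loss_timeout;
}
```

With device_loss_timeout=1 this returns true one second after the loss is
detected; with a larger value the commands are held for longer before
dm-multipath gets to fail over.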