From: Joseph Qi <joseph.qi@huawei.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH 0/3] ocfs2: o2net: fix packets lost issue when reconnect
Date: Fri, 16 May 2014 17:01:14 +0800 [thread overview]
Message-ID: <5375D3DA.6030603@huawei.com> (raw)
In-Reply-To: <5375CD34.60101@oracle.com>
On 2014/5/16 16:32, Junxiao Bi wrote:
> On 05/16/2014 04:05 PM, Joseph Qi wrote:
>> Hi Junxiao,
>>
>> On 2014/5/16 10:19, Junxiao Bi wrote:
>>> Hi Joseph,
>>>
>>> On 05/15/2014 04:27 PM, Joseph Qi wrote:
>>>> On 2014/5/15 12:26, Junxiao Bi wrote:
>>>>> Hi,
>>>>>
>>>>> After the TCP connection is established between two ocfs2 nodes, an
>>>>> idle timer is set to check its state periodically; if no messages
>>>>> are received within that interval, the idle timer times out, shuts
>>>>> down the connection and tries to rebuild it, so messages pending in
>>>>> the TCP queues are lost. This can cause the whole ocfs2 cluster to
>>>>> hang, and it is very likely to happen when the network goes bad.
>>>>> Reconnecting is useless here: it will keep failing as long as the
>>>>> network has not recovered. Simply waiting for the network to recover
>>>>> may be a better idea: it will not lose messages, and no node will be
>>>>> fenced until the cluster goes into a split-brain state. For this
>>>>> case, the TCP user timeout is used to override the TCP retransmit
>>>>> timeout. It will time out after about 25 days; users should have
>>>>> noticed the problem through the provided log and fixed the network
>>>>> by then. If they don't, ocfs2 falls back to the original reconnect
>>>>> behavior.
>>>>> The following series of patches fixes the bug. Please help review.
>>>> The TCP retransmission timeout grows with each failure (the RTT
>>>> estimate is auto-regressive and the RTO backs off exponentially),
>>>> which means the following case may take place:
>>>> Suppose the current retransmission interval is ΔT (somewhat long).
>>>> The network recovers but goes down again before the next
>>>> retransmission window comes (< ΔT), so the network recovery won't
>>>> be detected and the ocfs2 cluster still hangs.
>>> "The network recovers but goes down again" means the network is still
>>> down. An ocfs2 hang is expected behavior when the network is down in
>>> the split-brain case. What we need to take care of is how long ocfs2
>>> takes to recover from the hang after the network recovers (and stays
>>> up). I don't know the TCP internals of how packets are retransmitted;
>>> I just tested blocking the network for half an hour, and it needed
>>> only several seconds to recover from the hang. Of course, how long
>>> recovery takes may also depend on how hard dlm is hung.
>>>
>>>
>> Yes, it is an expected behavior. But currently ocfs2 makes a quorum
>> decision after the timeout, so the cluster won't hang for long.
> Not always; sometimes the quorum decision can't fence any node. For
> example, in a three-node cluster with nodes 1, 2 and 3, if the network
> between node 2 and node 3 is down but each of them can still reach
> node 1, no node will be fenced. This is what we call the split-brain
> case. The cluster will hang.
Yes, you are right. Currently ocfs2 cannot handle such a case.
But if all nodes are connected to the same switch, I am curious how
this happens.
>> So wouldn't it be better to fence than to wait for recovery in this
>> situation? After all, it widely affects cluster operations.
> Yes, but making the fence decision is not that easy in the split-brain
> case. It requires a node to know the status of every connection in the
> cluster; then it can decide to cut off some nodes to make the cluster
> work again. But right now every node only knows the status of its own
> connections; for example, node 1 doesn't know the connection status
> between node 2 and node 3.
>> Another thought is, could we retry the message? And to avoid a BUG
>> when the same message is handled twice, we could add a unique message
>> sequence number.
> Retrying is useless while the network is bad. It will fail again and
> again until the network recovers.
The thought is based on the assumption that a quorum decision will be
made on timeout. And I suppose the network failure is within the
cluster.
>
> Thanks,
> Junxiao.
>>
>>> Thanks,
>>> Junxiao.
>>>>> Thanks,
>>>>> Junxiao.
>>>>>
>>>>> _______________________________________________
>>>>> Ocfs2-devel mailing list
>>>>> Ocfs2-devel@oss.oracle.com
>>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Thread overview: 12+ messages
2014-05-15 4:26 [Ocfs2-devel] [PATCH 0/3] ocfs2: o2net: fix packets lost issue when reconnect Junxiao Bi
2014-05-15 4:26 ` [Ocfs2-devel] [PATCH 1/3] ocfs2: o2net: don't shutdown connection when idle timeout Junxiao Bi
2014-05-15 4:26 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: o2net: set tcp user timeout to max value Junxiao Bi
2014-05-15 4:26 ` [Ocfs2-devel] [PATCH 3/3] ocfs2: quorum: add a log for node not fenced Junxiao Bi
2014-05-15 8:27 ` [Ocfs2-devel] [PATCH 0/3] ocfs2: o2net: fix packets lost issue when reconnect Joseph Qi
2014-05-16 2:19 ` Junxiao Bi
2014-05-16 8:05 ` Joseph Qi
2014-05-16 8:32 ` Junxiao Bi
2014-05-16 9:01 ` Joseph Qi [this message]
2014-05-19 1:36 ` Junxiao Bi
2014-06-06 2:18 ` Junxiao Bi
2014-06-12 21:03 ` Andrew Morton