From: Junxiao Bi <junxiao.bi@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH 0/3] ocfs2: o2net: fix packets lost issue when reconnect
Date: Thu, 15 May 2014 12:26:20 +0800
Message-ID: <1400127983-9774-1-git-send-email-junxiao.bi@oracle.com>
Hi,
After the TCP connection is established between two ocfs2 nodes, an idle
timer is set to check its state periodically. If no messages are received
during this period, the idle timer times out, shuts down the connection,
and tries to rebuild it, so pending messages in the TCP queues are lost.
This may cause the whole ocfs2 cluster to hang.
This is very likely to happen when the network state goes bad. Reconnecting
is useless in that case; it will fail if the network state doesn't recover.
Just waiting for the network to recover may be a better idea: no messages
are lost, and some node will be fenced if the cluster goes into a
split-brain state. For this case, the TCP user timeout is used to override
the TCP retransmit timeout. It will time out after 25 days; the user should
have noticed the problem through the provided log and fixed the network by
then. If they don't, ocfs2 will fall back to the original reconnect behavior.
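To make the 25-day figure concrete, here is a minimal userspace sketch of
the TCP user timeout mechanism on Linux. It is not the o2net code from this
series: the helper names are hypothetical, and it assumes the "max value"
is INT_MAX milliseconds, which would be consistent with a timeout of about
25 days.

```c
#include <limits.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch series): set TCP_USER_TIMEOUT,
 * the time in milliseconds that transmitted data may stay unacknowledged
 * before the kernel closes the connection. Returns 0 on success, -1 on error.
 */
static int set_tcp_user_timeout(int fd, unsigned int timeout_ms)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
			  &timeout_ms, sizeof(timeout_ms));
}

/* Hypothetical helper: read the current TCP_USER_TIMEOUT back from the socket. */
static int get_tcp_user_timeout(int fd, unsigned int *timeout_ms)
{
	socklen_t len = sizeof(*timeout_ms);

	return getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms, &len);
}

/* Convert a timeout in milliseconds to days. */
static double ms_to_days(unsigned int ms)
{
	return ms / (1000.0 * 60.0 * 60.0 * 24.0);
}
```

Under the INT_MAX assumption, ms_to_days(INT_MAX) is roughly 24.86, which
matches the 25 days mentioned above. A caller would invoke
set_tcp_user_timeout(fd, INT_MAX) on a connected TCP socket.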
The following series of patches fixes the bug. Please help review.
Thanks,
Junxiao.
Thread overview: 12+ messages
2014-05-15 4:26 Junxiao Bi [this message]
2014-05-15 4:26 ` [Ocfs2-devel] [PATCH 1/3] ocfs2: o2net: don't shutdown connection when idle timeout Junxiao Bi
2014-05-15 4:26 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: o2net: set tcp user timeout to max value Junxiao Bi
2014-05-15 4:26 ` [Ocfs2-devel] [PATCH 3/3] ocfs2: quorum: add a log for node not fenced Junxiao Bi
2014-05-15 8:27 ` [Ocfs2-devel] [PATCH 0/3] ocfs2: o2net: fix packets lost issue when reconnect Joseph Qi
2014-05-16 2:19 ` Junxiao Bi
2014-05-16 8:05 ` Joseph Qi
2014-05-16 8:32 ` Junxiao Bi
2014-05-16 9:01 ` Joseph Qi
2014-05-19 1:36 ` Junxiao Bi
2014-06-06 2:18 ` Junxiao Bi
2014-06-12 21:03 ` Andrew Morton