From: Junxiao Bi <junxiao.bi@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] ocfs2: o2hb: not fence self if storage down
Date: Fri, 22 Jan 2016 13:08:31 +0800
Message-ID: <56A1B94F.5050705@oracle.com>
In-Reply-To: <56A1AF34.4030006@huawei.com>
Hi Joseph,
On 01/22/2016 12:25 PM, Joseph Qi wrote:
> Hi Junxiao,
>
> On 2016/1/21 9:48, Junxiao Bi wrote:
>> On 01/21/2016 08:46 AM, Joseph Qi wrote:
>>> Hi Junxiao,
>>> So you mean the negotiation you added only happens if all nodes'
>>> storage links are down?
>> Negotiation starts when one node finds its storage link down, but it
>> succeeds only when all nodes' storage links are down; otherwise the
>> behavior stays the same as before.
> IC, thanks for your explanation.
> IMHO, if storage is down, all business deployed on the storage will be
> impacted even if the nodes don't fence.
Yes, but the storage may come back online after a while. Not fencing in
that case can improve the system's stability and availability.
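
To make the rule discussed in this thread concrete, here is a minimal
sketch of the decision only. It assumes each node reports three flags.
All names (hb_action, node_report, negotiate) are hypothetical and do
not come from heartbeat.c; node panic is not modelled, and treating an
empty report set as "fence" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; none of these come from heartbeat.c. */
enum hb_action { HB_KEEP_RUNNING, HB_FENCE_SELF };

struct node_report {
    bool write_timed_out; /* node saw its heartbeat write time out */
    bool io_error;        /* node got an io error rather than a timeout */
    bool net_reachable;   /* node answered over the cluster network */
};

/*
 * Negotiation succeeds only if every node reports a write timeout,
 * no node reports an io error, and every node is reachable over the
 * network. In every other case the old behavior (fence self) applies.
 */
static enum hb_action negotiate(const struct node_report *nodes, size_t n)
{
    assert(nodes != NULL || n == 0);

    /* Assumption of this sketch: no reports means no agreement. */
    if (n == 0)
        return HB_FENCE_SELF;

    for (size_t i = 0; i < n; i++) {
        if (nodes[i].io_error ||
            !nodes[i].net_reachable ||
            !nodes[i].write_timed_out)
            return HB_FENCE_SELF;
    }
    return HB_KEEP_RUNNING;
}
```

With two nodes that both report a write timeout and nothing else, the
sketch returns HB_KEEP_RUNNING; if only one of them reports a timeout,
it returns HB_FENCE_SELF, which matches the "same behavior as before"
case.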
> I have another scenario: only some paths (in a multipath environment)
> on several nodes have problems and, as a result, ocfs2 fences these
> nodes. So I wonder if we have a better way to resolve this issue.
That does not seem to follow ocfs2's usual policy. Wouldn't fencing these
nodes at that point be good for the availability of ocfs2?
Anyway, I am not sure whether it is feasible now. The problem is that we
need a way for the good nodes to reach an agreement in an environment
where more errors may be coming, while making sure the good nodes are not
hurt even if the agreement cannot be reached.
Thanks,
Junxiao.
>
> Thanks,
> Joseph
>
>>
>> Thanks,
>> Junxiao.
>>>
>>> Thanks,
>>> Joseph
>>>
>>> On 2016/1/20 21:27, Junxiao Bi wrote:
>>>> Hi Joseph,
>>>>
>>>>> On Jan 20, 2016, at 5:18 PM, Joseph Qi <joseph.qi@huawei.com> wrote:
>>>>>
>>>>> Hi Junxiao,
>>>>> Thanks for the patch set.
>>>>> In case only one node's storage link is down, if this node doesn't
>>>>> fence itself, other nodes will still check and mark this node dead,
>>>>> which will cause cluster membership inconsistency.
>>>>> In your patch set, I cannot see any logic to handle this. Am I missing
>>>>> something?
>>>> No, there is no logic for this. But why didn't the node fence itself when storage went down? What would keep a softirq timer from running, another bug?
>>>>
>>>> Thanks,
>>>> Junxiao.
>>>>>
>>>>> On 2016/1/20 11:13, Junxiao Bi wrote:
>>>>>> Hi,
>>>>>>
>>>>>> This series of patches fixes the issue that, when storage goes down,
>>>>>> all nodes fence themselves due to write timeout.
>>>>>> With this patch set, all nodes keep going until storage comes back
>>>>>> online, unless one of the following happens, in which case all nodes
>>>>>> fence themselves as before:
>>>>>> 1. an io error is returned
>>>>>> 2. the network between nodes goes down
>>>>>> 3. nodes panic
>>>>>>
>>>>>> Junxiao Bi (6):
>>>>>> ocfs2: o2hb: add negotiate timer
>>>>>> ocfs2: o2hb: add NEGO_TIMEOUT message
>>>>>> ocfs2: o2hb: add NEGOTIATE_APPROVE message
>>>>>> ocfs2: o2hb: add some user/debug log
>>>>>> ocfs2: o2hb: don't negotiate if last hb fail
>>>>>> ocfs2: o2hb: fix hb hung time
>>>>>>
>>>>>> fs/ocfs2/cluster/heartbeat.c | 181 ++++++++++++++++++++++++++++++++++++++++--
>>>>>> 1 file changed, 175 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> Thanks,
>>>>>> Junxiao.
>>>>>>
Thread overview: 34+ messages
2016-01-20 3:13 [Ocfs2-devel] ocfs2: o2hb: not fence self if storage down Junxiao Bi
2016-01-20 3:13 ` [Ocfs2-devel] [PATCH 1/6] ocfs2: o2hb: add negotiate timer Junxiao Bi
2016-01-21 23:42 ` Andrew Morton
2016-01-22 3:23 ` Junxiao Bi
2016-01-22 0:56 ` Joseph Qi
2016-01-22 3:19 ` Junxiao Bi
2016-01-20 3:13 ` [Ocfs2-devel] [PATCH 2/6] ocfs2: o2hb: add NEGO_TIMEOUT message Junxiao Bi
2016-01-21 23:47 ` Andrew Morton
2016-01-22 5:12 ` Junxiao Bi
2016-01-22 5:45 ` Andrew Morton
2016-01-22 5:46 ` Junxiao Bi
2016-01-25 3:18 ` Eric Ren
2016-01-25 4:28 ` Junxiao Bi
2016-01-25 5:59 ` Eric Ren
2016-01-20 3:13 ` [Ocfs2-devel] [PATCH 3/6] ocfs2: o2hb: add NEGOTIATE_APPROVE message Junxiao Bi
2016-01-20 3:13 ` [Ocfs2-devel] [PATCH 4/6] ocfs2: o2hb: add some user/debug log Junxiao Bi
2016-01-25 3:28 ` Eric Ren
2016-01-25 4:29 ` Junxiao Bi
2016-01-25 6:00 ` Eric Ren
2016-01-20 3:13 ` [Ocfs2-devel] [PATCH 5/6] ocfs2: o2hb: don't negotiate if last hb fail Junxiao Bi
2016-01-20 3:13 ` [Ocfs2-devel] [PATCH 6/6] ocfs2: o2hb: fix hb hung time Junxiao Bi
2016-01-20 6:00 ` [Ocfs2-devel] ocfs2: o2hb: not fence self if storage down Gang He
2016-01-20 8:09 ` Junxiao Bi
2016-01-20 9:18 ` Joseph Qi
2016-01-20 13:27 ` Junxiao Bi
2016-01-21 0:46 ` Joseph Qi
2016-01-21 1:48 ` Junxiao Bi
2016-01-22 4:25 ` Joseph Qi
2016-01-22 5:08 ` Junxiao Bi [this message]
2016-01-21 8:34 ` rwxybh
2016-01-21 8:41 ` Junxiao Bi
-- strict thread matches above, loose matches on Subject: below --
2016-03-02 7:56 Junxiao Bi
2016-05-23 21:50 ` Andrew Morton
2016-05-23 23:40 ` Mark Fasheh