* [Ocfs2-devel] [RFC] make ocfs2/o2net reliable
@ 2017-11-16  9:49 Changwei Ge
From: Changwei Ge @ 2017-11-16  9:49 UTC (permalink / raw)
  To: ocfs2-devel

Hi all,
As far as we know, ocfs2/o2net is not a reliable messaging mechanism.
Messages can get lost when a TCP socket connection shuts down suddenly.
The only user of o2net is ocfs2/dlm, so a lost message may cause
ocfs2/dlm to hang (a missing AST or ASSERT MASTER). Sometimes it also
leaves ocfs2/dlm waiting indefinitely for DLM recovery to complete. But
that recovery never comes, since the target node is still heartbeating
and so no DLM recovery procedure is launched.

So I think the cases above should drive us to improve the current
ocfs2/o2net and make it more reliable. I already have a draft design for
it, and we will indeed need to change o2net's behavior.

To accomplish this goal, we tag each o2net message with a sequence
number, ::msg_seq, so that the receiver can tell whether a newly arrived
message is a duplicate. ::msg_seq also serves as the key for looking up
the structure described below in a red-black tree.

A brand-new structure named *Message Holder* is added to o2net; it is
responsible for storing _handle_status_.
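
To make the idea concrete, here is a rough userspace model of the
receiver side. It is only a sketch, not the draft patch: the names
MessageHolder, Receiver, receive and remove_holder are hypothetical, a
plain dict stands in for the red-black tree, and it assumes a single
thread so the handler never races with a duplicate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class MessageHolder:
    """Hypothetical receiver-side record for one message, keyed by msg_seq."""
    msg_seq: int
    handle_status: Optional[int] = None  # None until the handler has run


class Receiver:
    def __init__(self, handler: Callable[[bytes], int]) -> None:
        self.handler = handler
        # Assumption: a dict stands in for the red-black tree keyed by msg_seq.
        self.holders: Dict[int, MessageHolder] = {}
        self.handled = 0  # how many times the handler actually ran

    def receive(self, msg_seq: int, payload: bytes) -> Optional[int]:
        holder = self.holders.get(msg_seq)
        if holder is None:
            # NOT FOUND: first delivery, so insert a holder and handle it.
            holder = MessageHolder(msg_seq)
            self.holders[msg_seq] = holder
            holder.handle_status = self.handler(payload)
            self.handled += 1
        # FOUND: a duplicate, so answer from the stored status instead of
        # running the handler a second time.
        return holder.handle_status

    def remove_holder(self, msg_seq: int) -> bool:
        """Drop the holder once the sender confirms it has the status."""
        return self.holders.pop(msg_seq, None) is not None
```

The point of keeping the status in the holder is that a re-sent message
gets the same answer as the first delivery, without the DLM handler
running twice.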

When TCP has to shut down or reset for an unknown reason, we lose the
packets in the send or receive buffers, but o2net still manages those
messages. This gives o2net a chance to re-send the messages once the TCP
connection is established again.
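
A minimal sketch of the sender side of that, under the assumption that
o2net keeps a copy of every message until its status comes back. The
Sender class, its method names and the transmit callback are all
hypothetical and only model the bookkeeping, not real socket handling.

```python
from typing import Callable, Dict

Transmit = Callable[[int, bytes], None]  # hypothetical "hand to socket" hook


class Sender:
    def __init__(self) -> None:
        self.next_seq = 1
        # Messages sent but not yet answered; dicts keep insertion order,
        # so re-sends go out in the original order.
        self.pending: Dict[int, bytes] = {}

    def send(self, payload: bytes, transmit: Transmit) -> int:
        msg_seq = self.next_seq
        self.next_seq += 1
        self.pending[msg_seq] = payload  # keep a copy until the status arrives
        transmit(msg_seq, payload)
        return msg_seq

    def on_status(self, msg_seq: int) -> None:
        # The status came back, so this message no longer needs re-sending.
        self.pending.pop(msg_seq, None)

    def on_reconnect(self, transmit: Transmit) -> int:
        # The connection is up again: re-send everything still unanswered.
        for msg_seq, payload in list(self.pending.items()):
            transmit(msg_seq, payload)
        return len(self.pending)
```

Because a re-sent message carries its original ::msg_seq, the receiver
can recognize it as a duplicate if the first copy had in fact arrived.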

The diagram below demonstrates how it works:

SEND					RECV
send message				
tag message header with ::msg_seq	
					search for Message Holder with
					  ::msg_seq
					NOT FOUND - insert one
					(FOUND - means a duplicated one)
					handle message
					store status into Message Holder
					send back status
instruct RECV to remove MH
					notify SEND that MH is already
					  removed
return to caller
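
The same flow can be walked through in a few lines of Python. This is
a toy simulation only: deliver, send_message and the lose_first_reply
switch are hypothetical names, the holders dict maps ::msg_seq straight
to the stored status, and the "TCP reset" is just a flag that drops the
first status reply.

```python
from typing import Callable, Dict, List, Tuple


def deliver(holders: Dict[int, int], msg_seq: int, payload: bytes,
            handler: Callable[[bytes], int], trace: List[str]) -> int:
    """RECV side: look up the Message Holder, handle once, return status."""
    if msg_seq in holders:
        trace.append("FOUND (duplicate)")
    else:
        trace.append("NOT FOUND (insert, handle)")
        holders[msg_seq] = handler(payload)  # store status into the holder
    return holders[msg_seq]


def send_message(holders: Dict[int, int], msg_seq: int, payload: bytes,
                 handler: Callable[[bytes], int],
                 lose_first_reply: bool = False) -> Tuple[int, List[str]]:
    """SEND side: re-send until a status arrives, then have RECV drop the MH."""
    trace: List[str] = []
    reply_lost = lose_first_reply
    while True:
        status = deliver(holders, msg_seq, payload, handler, trace)
        if reply_lost:
            # Simulated TCP reset: the status never reaches SEND, so re-send.
            reply_lost = False
            trace.append("status lost, re-send")
            continue
        break
    # SEND instructs RECV to remove the MH; RECV confirms the removal.
    del holders[msg_seq]
    trace.append("MH removed")
    return status, trace
```

With lose_first_reply=True the handler still runs only once; the second
delivery finds the holder and returns the stored status.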

I look forward to your comments, especially from @Mark, @Joseph and
@Junxiao.

Thanks,
Changwei.


