From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lars Marowsky-Bree
Date: Sat, 31 Oct 2009 01:20:25 +0100
Subject: [Cluster-devel] SCTP versus OpenAIS/corosync time-outs
Message-ID: <20091031002025.GS14882@suse.de>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi all, David,

I'm contemplating SCTP versus OpenAIS/corosync.

Is dlm_controld(.pcmk) pro-actively informed if a single ring/link goes
down, so as to trigger faster SCTP recovery - or is it left for SCTP to
time out on its own and proceed?

If the latter - is there a way to auto-tune the SCTP time-outs to make
sure the DLM doesn't stall longer than that? I'm wondering whether
there's any chance of hitting higher-level time-outs, i.e. a monitor
operation on a filesystem-using service. RFC 5061 seems to support
dynamic reconfiguration in such a fashion.

If I'm reading http://tools.ietf.org/html/rfc4960#page-87 correctly,
SCTP multi-homing is "active/passive", so there's some latency on the
fail-over at least. If several links go down at once, SCTP might try
them in sequence and pick the one surviving link last, incurring a large
latency.

There is no concurrently active transmission ("rrp_mode active") - I
wonder if it is possible to put SCTP into such a mode, or, vice-versa,
if this means the DLM might be better off directly opening several TCP
connections on its own (and using them all at once, simply discarding
duplicate messages)?

I'm not sure what kind of problems exist, if any, but this may be a
worthwhile thing to consider or at least contemplate. I welcome
feedback ;-)


Regards,
    Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
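
P.S. To make the "use all connections at once and discard duplicates" idea a bit more concrete, here is a rough receiver-side sketch. It is not how the DLM does anything today; it assumes each sender stamps every message with a monotonically increasing sequence number and sends the same message over every link. The names `DuplicateFilter` and `deliver` are made up for illustration.

```python
class DuplicateFilter:
    """Receiver-side filter for messages sent redundantly over several links.

    Assumption: one filter per sender, and that sender numbers its
    messages 0, 1, 2, ... before copying them onto every link.
    """

    def __init__(self):
        self.next_expected = 0    # lowest sequence number not yet seen
        self.seen_ahead = set()   # seen sequence numbers >= next_expected

    def accept(self, seq):
        """Return True if seq is new, False if it is a duplicate."""
        if seq < self.next_expected or seq in self.seen_ahead:
            return False
        self.seen_ahead.add(seq)
        # Advance the low-water mark so the set stays small.
        while self.next_expected in self.seen_ahead:
            self.seen_ahead.remove(self.next_expected)
            self.next_expected += 1
        return True


def deliver(arrivals):
    """Keep the first copy of each message, in arrival order.

    arrivals: iterable of (link_name, seq, payload) tuples.
    """
    filt = DuplicateFilter()
    return [(seq, payload)
            for _link, seq, payload in arrivals
            if filt.accept(seq)]
```

Whichever link delivers a given sequence number first wins, so a dead link costs no fail-over latency as long as one other link is alive. The sketch only drops duplicates; it does not reorder messages that arrive out of sequence, which a real implementation would presumably also have to handle.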