cluster-devel.redhat.com archive mirror
* [Cluster-devel] [RFC] Common cluster connection handler API
@ 2008-06-27  5:26 Fabio M. Di Nitto
  2008-06-27 16:35 ` David Teigland
From: Fabio M. Di Nitto @ 2008-06-27  5:26 UTC
  To: cluster-devel.redhat.com


Hi,

this issue has been bothering me for some time now.

Looking around the code, I have seen way too many different methods of 
connecting to libccs and cman, and most of them don't do it right.

The current implementations suffer from a set of race conditions at 
startup time and behave differently across daemons/subsystems, which 
makes starting the software confusing and dangerous.

What I see now:

- loops on ccs_connect/ccs_force_connect in blocking mode.
- loops on cman_init/cman_admin_init
- poor information to the users on what the daemon is waiting for
- loops to know when cman is actually available.
- attempts to call ccs_connect and exit on failure (clashes with other 
daemons waiting for ccs)
- attempts to call cman_init and exit on failure (clashes with other 
daemons waiting for cman)
- use of cman_get_node to verify whether cman has completed its 
configuration/startup, when cman_is_active serves exactly that 
purpose.

As you can see, this is not exactly an ideal situation.

So my suggestion boils down to a very simple API that takes care of 
connecting to libccs and cman, guarantees feedback to the user about 
what we are waiting for, and guarantees that the connecting application 
gets access to ccs/cman only when it is the right time to do so.

I don't have a strong opinion on what this API should look like; this 
is just one suggestion, based on the most important bits of booting 
a cluster:

int cluster_connect(
	char *subsystem_name,
	int *ccsfd,
	cman_handle_t *ch,
	int cman_admin,
	int wait_quorum,
	int blocking,
	int max_attempts);

subsystem_name will be used to report to the users (logging or 
stderr) that this subsystem is waiting for ccs/cman/cman_is_active.

ccsfd and ch will hold the usual suspects.

cman_admin = 0 normal cman_init, 1 cman_admin_init.

wait_quorum = 0 just go ahead, 1 loop also on cman_is_quorate.

blocking = 1 if you want to wait for the services to appear, 0 if you 
want a one-shot attempt.

max_attempts = (useful only in blocking mode), 0 loop forever, > 0 loop N 
times.

The flow will look like:

connect to ccs
connect to cman
wait for cman_is_active
wait for quorum (only if wait_quorum is set)
return 0 on success, or < 0 if we fail to connect within max_attempts, 
if blocking is set to 0 and something is not yet available, or on any 
other error.
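
To make the flow above concrete, here is a minimal sketch of how such a 
function could be structured. It is only an illustration under stated 
assumptions, not a proposed implementation: the stub_* functions and the 
*_countdown variables are hypothetical stand-ins for ccs_connect, 
cman_init/cman_admin_init, cman_is_active and the quorum check, so that 
the example compiles without libccs or libcman. The message format, the 
absence of a retry delay, and the lack of cleanup on failure are also 
assumptions.

```c
#include <stdio.h>

/* Hypothetical stand-ins for libccs/libcman (NOT the real libraries).
 * Each fake service reports "not ready" for a set number of polls. */
typedef void *cman_handle_t;

static int ccs_countdown, cman_countdown, active_countdown, quorum_countdown;
static int fake_handle; /* its address serves as a dummy cman handle */

static int countdown_ready(int *countdown)
{
    if (*countdown > 0) { (*countdown)--; return 0; }
    return 1;
}
static int stub_ccs_connect(void)
{ return countdown_ready(&ccs_countdown) ? 3 : -1; }
static cman_handle_t stub_cman_init(int admin)
{ (void)admin; return countdown_ready(&cman_countdown) ? &fake_handle : NULL; }
static int stub_cman_is_active(cman_handle_t h)
{ (void)h; return countdown_ready(&active_countdown); }
static int stub_cman_is_quorate(cman_handle_t h)
{ (void)h; return countdown_ready(&quorum_countdown); }

/* State shared by the stage probes. */
struct conn { int ccsfd; cman_handle_t ch; int cman_admin; };

static int probe_ccs(struct conn *c)
{ c->ccsfd = stub_ccs_connect(); return c->ccsfd >= 0; }
static int probe_cman(struct conn *c)
{ c->ch = stub_cman_init(c->cman_admin); return c->ch != NULL; }
static int probe_active(struct conn *c) { return stub_cman_is_active(c->ch); }
static int probe_quorum(struct conn *c) { return stub_cman_is_quorate(c->ch); }

/* Poll one stage. Non-blocking: a single probe. Blocking: retry forever
 * if max_attempts is 0, otherwise give up after max_attempts probes.
 * Every failed probe that will be retried is reported to the user. */
static int wait_for(const char *subsystem, const char *what,
                    int (*probe)(struct conn *), struct conn *c,
                    int blocking, int max_attempts)
{
    int attempt = 0;

    for (;;) {
        if (probe(c))
            return 0;
        attempt++;
        if (!blocking || (max_attempts > 0 && attempt >= max_attempts))
            return -1;
        fprintf(stderr, "%s: waiting for %s (attempt %d)\n",
                subsystem, what, attempt);
        /* a real daemon would sleep here before retrying */
    }
}

int cluster_connect(const char *subsystem_name, int *ccsfd, cman_handle_t *ch,
                    int cman_admin, int wait_quorum,
                    int blocking, int max_attempts)
{
    struct conn c = { -1, NULL, cman_admin };

    /* Stages run strictly in the order given in the flow. A real
     * implementation would also release what it acquired on failure. */
    if (wait_for(subsystem_name, "ccs", probe_ccs, &c,
                 blocking, max_attempts) < 0)
        return -1;
    if (wait_for(subsystem_name, "cman", probe_cman, &c,
                 blocking, max_attempts) < 0)
        return -1;
    if (wait_for(subsystem_name, "cman_is_active", probe_active, &c,
                 blocking, max_attempts) < 0)
        return -1;
    if (wait_quorum &&
        wait_for(subsystem_name, "quorum", probe_quorum, &c,
                 blocking, max_attempts) < 0)
        return -1;

    /* Outputs are only written once every stage has succeeded. */
    *ccsfd = c.ccsfd;
    *ch = c.ch;
    return 0;
}
```

A daemon that wants to block until the cluster is quorate would call 
something like cluster_connect("mydaemon", &fd, &ch, 0, 1, 1, 0), while 
a one-shot check would pass blocking = 0 and handle the < 0 return 
itself.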

Thanks
Fabio

--
I'm going to make him an offer he can't refuse.




end of thread, other threads:[~2008-06-28  5:01 UTC | newest]

Thread overview: 6+ messages
2008-06-27  5:26 [Cluster-devel] [RFC] Common cluster connection handler API Fabio M. Di Nitto
2008-06-27 16:35 ` David Teigland
2008-06-27 18:19   ` Fabio M. Di Nitto
2008-06-27 19:27     ` David Teigland
2008-06-28  4:58       ` Fabio M. Di Nitto
2008-06-28  5:01       ` Fabio M. Di Nitto
