* [Cluster-devel] [RFC] Common cluster connection handler API
@ 2008-06-27 5:26 Fabio M. Di Nitto
2008-06-27 16:35 ` David Teigland
0 siblings, 1 reply; 6+ messages in thread
From: Fabio M. Di Nitto @ 2008-06-27 5:26 UTC (permalink / raw)
To: cluster-devel.redhat.com
Hi,
this issue has been bothering me for some time now.
Looking around the code, I have seen far too many different methods to
connect to libccs and cman, and most of them don't do it right.
The current implementations suffer from a set of race conditions at startup
time and behave differently across daemons/subsystems, which leads to
a confusing and dangerous way of starting the software.
What I see now:
- loops on ccs_connect/ccs_force_connect in blocking mode.
- loops on cman_init/cman_admin_init
- poor information to the users on what the daemon is waiting for
- loops to know when cman is actually available.
- attempts to call ccs_connect and exit on failure (clashes with other
daemons waiting for ccs)
- attempts to call cman_init and exit on failure (clashes with other
daemons waiting for cman)
- use of cman_get_node to verify whether cman has completed its
configuration/startup, when cman_is_active fits exactly the
same purpose.
As you can see this is not exactly the ideal solution.
So my suggestion boils down to a very simple API that takes care of
connecting to libccs and cman, guarantees feedback to the user on what we
are waiting for, and guarantees that the connecting application will get
access to ccs/cman only when it is the right time to do so.
I don't have a strong opinion on what this API should look like, but here
is one suggestion, based merely on the most important bits of booting
a cluster:
int cluster_connect(char *subsystem_name,
                    int *ccsfd,
                    cman_handle_t *ch,
                    int cman_admin,
                    int wait_quorum,
                    int blocking,
                    int max_attempts)
char *subsystem_name will be used to report to the users (via logging or
stderr) that this subsystem is waiting for ccs/cman/cman_is_active.
ccsfd and ch will hold the usual suspects.
cman_admin = 0 for a normal cman_init, 1 for cman_admin_init.
wait_quorum = 0 to just go ahead, 1 to also loop on cman_is_quorate.
blocking = 1 to wait for the services to appear, 0 for a one-time shot.
max_attempts = (only meaningful in blocking mode) 0 to loop forever, > 0 to
loop N times.
The flow will look like:
connect to ccs
connect to cman
wait for cman_is_active
return 0 on success, < 0 if we fail to connect within max_attempts, if
blocking is set to 0, or on any other error for that matter.
Thanks
Fabio
--
I'm going to make him an offer he can't refuse.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Cluster-devel] [RFC] Common cluster connection handler API
2008-06-27 5:26 [Cluster-devel] [RFC] Common cluster connection handler API Fabio M. Di Nitto
@ 2008-06-27 16:35 ` David Teigland
2008-06-27 18:19 ` Fabio M. Di Nitto
From: David Teigland @ 2008-06-27 16:35 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Fri, Jun 27, 2008 at 07:26:09AM +0200, Fabio M. Di Nitto wrote:
> connect to ccs
> connect to cman
> wait for cman_is_active
> return 0 on success, < 0 if we fail to connect within max_attempts, if
> blocking is set to 0, or on any other error for that matter.
I'm not in favor of another layer, let's just fix things up where needed.
I was actually hoping that with no more ccsd there'd be no more
"connecting" to ccs, but that's probably a topic for one of the ccs
meetings...
* [Cluster-devel] [RFC] Common cluster connection handler API
2008-06-27 16:35 ` David Teigland
@ 2008-06-27 18:19 ` Fabio M. Di Nitto
2008-06-27 19:27 ` David Teigland
From: Fabio M. Di Nitto @ 2008-06-27 18:19 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Fri, 27 Jun 2008, David Teigland wrote:
> On Fri, Jun 27, 2008 at 07:26:09AM +0200, Fabio M. Di Nitto wrote:
>> connect to ccs
>> connect to cman
>> wait for cman_is_active
>> return 0 on success, < 0 if we fail to connect within max_attempts, if
>> blocking is set to 0, or on any other error for that matter.
>
> I'm not in favor of another layer, let's just fix things up where needed.
That code is in virtually every tool/daemon, so I don't see this as a real
layer, just a simple helper to do it right everywhere.
> I was actually hoping that with no more ccsd there'd be no more
> "connecting" to ccs, but that's probably a topic for one of the ccs
> meetings...
The only partial advantage you have, as I documented and wrote to
cluster-devel, is that if you are connected to cman and cman_is_active,
you are 99.9% guaranteed to connect to ccs without problems (the only
reason for rejection would be lack of resources on the machine, but at
that point you have more serious issues to worry about).
Fabio
* [Cluster-devel] [RFC] Common cluster connection handler API
2008-06-27 18:19 ` Fabio M. Di Nitto
@ 2008-06-27 19:27 ` David Teigland
2008-06-28 4:58 ` Fabio M. Di Nitto
2008-06-28 5:01 ` Fabio M. Di Nitto
From: David Teigland @ 2008-06-27 19:27 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Fri, Jun 27, 2008 at 08:19:36PM +0200, Fabio M. Di Nitto wrote:
> >I was actually hoping that with no more ccsd there'd be no more
> >"connecting" to ccs, but that's probably a topic for one of the ccs
> >meetings...
>
> The only partial advantage you have, as I documented and wrote to
> cluster-devel, is that if you are connected to cman and cman_is_active,
> you are 99.9% guaranteed to connect to ccs without problems (the only
> reason for rejection would be lack of resources on the machine, but at
> that point you have more serious issues to worry about).
Oops, sorry, I'm still not thinking straight about the new ccs... yeah,
that makes sense that if cman is up then ccs should be there, since both
are openais extensions. I'm curious, after cman_init() succeeds, what
more does cman_is_active() mean? In practice would cman_init() ever be
ok, but cman_is_active() not be ok?
* [Cluster-devel] [RFC] Common cluster connection handler API
2008-06-27 19:27 ` David Teigland
@ 2008-06-28 4:58 ` Fabio M. Di Nitto
2008-06-28 5:01 ` Fabio M. Di Nitto
From: Fabio M. Di Nitto @ 2008-06-28 4:58 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Fri, 27 Jun 2008, David Teigland wrote:
> On Fri, Jun 27, 2008 at 08:19:36PM +0200, Fabio M. Di Nitto wrote:
>>> I was actually hoping that with no more ccsd there'd be no more
>>> "connecting" to ccs, but that's probably a topic for one of the ccs
>>> meetings...
>>
>> The only partial advantage you have, as I documented and wrote to
>> cluster-devel, is that if you are connected to cman and cman_is_active,
>> you are 99.9% guaranteed to connect to ccs without problems (the only
>> reason for rejection would be lack of resources on the machine, but at
>> that point you have more serious issues to worry about).
>
> Oops, sorry, I'm still not thinking straight about the new ccs... yeah,
> that makes sense that if cman is up then ccs should be there, since both
> are openais extensions. I'm curious, after cman_init() succeeds, what
> more does cman_is_active() mean? In practice would cman_init() ever be
> ok, but cman_is_active() not be ok?
>
There is a small window between the point where cman has set up its
sockets, so you can connect, and the point where it is actually configured,
running and ready to process data.
cman_is_active will return 1 only when the whole startup process has
completed and all of aisexec is running; as a consequence, we also know
that the objdb has been filled with data.
I think hitting that window is unlikely, but it can happen, and it is a
race condition we want to avoid.
The new ccs could theoretically load a standalone version of the objdb
(without cman or without starting aisexec), but:
1) we would have to replicate everywhere the information about which
plugin to load the configuration from.
2) the startup sequence would be much more complex and racy (load, unload,
reload, etc.)
3) basically all our daemons need cman anyway, which makes the point moot.
I think all of the above is undesirable.
Fabio
* [Cluster-devel] [RFC] Common cluster connection handler API
2008-06-27 19:27 ` David Teigland
2008-06-28 4:58 ` Fabio M. Di Nitto
@ 2008-06-28 5:01 ` Fabio M. Di Nitto
From: Fabio M. Di Nitto @ 2008-06-28 5:01 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Fri, 27 Jun 2008, David Teigland wrote:
> On Fri, Jun 27, 2008 at 08:19:36PM +0200, Fabio M. Di Nitto wrote:
>>> I was actually hoping that with no more ccsd there'd be no more
>>> "connecting" to ccs, but that's probably a topic for one of the ccs
>>> meetings...
>>
>> The only partial advantage you have, as I documented and wrote to
>> cluster-devel, is that if you are connected to cman and cman_is_active,
>> you are 99.9% guaranteed to connect to ccs without problems (the only
>> reason for rejection would be lack of resources on the machine, but at
>> that point you have more serious issues to worry about).
>
> Oops, sorry, I'm still not thinking straight about the new ccs... yeah,
> that makes sense that if cman is up then ccs should be there, since both
> are openais extensions. I'm curious, after cman_init() succeeds, what
> more does cman_is_active() mean? In practice would cman_init() ever be
> ok, but cman_is_active() not be ok?
>
Sorry, I forgot to mention another important bit: there is also
cman_is_quorate, which I mentioned in the first email.
Some daemons need quorum to be of any use, and they often rely on the old
ccs_connect behaviour for that; they should instead rely on cman_is_quorate
etc.
Fabio