* [Cluster-devel] rind-0.8.1 patch
@ 2007-11-30 16:49 Lon Hohberger
  2008-02-04 17:41 ` Marc Grimme
  0 siblings, 1 reply; 10+ messages in thread

From: Lon Hohberger @ 2007-11-30 16:49 UTC (permalink / raw)
To: cluster-devel.redhat.com

Minor bugfixes.

-- Lon

-------------- next part --------------
A non-text attachment was scrubbed...
Name: rind-0.8.1.patch
Type: text/x-patch
Size: 142853 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20071130/32d892b9/attachment.bin>

^ permalink raw reply [flat|nested] 10+ messages in thread
* [Cluster-devel] rind-0.8.1 patch
  2007-11-30 16:49 [Cluster-devel] rind-0.8.1 patch Lon Hohberger
@ 2008-02-04 17:41 ` Marc Grimme
  2008-02-05 17:58   ` Lon Hohberger
  2008-02-14 21:56   ` Lon Hohberger
  0 siblings, 2 replies; 10+ messages in thread

From: Marc Grimme @ 2008-02-04 17:41 UTC (permalink / raw)
To: cluster-devel.redhat.com

Hi Lon,
finally I had time to look at this patch, and I adapted your example for the
follow-service a little bit.

Apart from that, the event triggering is running as expected; I stumbled over
some minor issues (patch attached).

1. Isn't it better to organize the configuration as follows:

<event name="followservice_node" class="node"
       file="/usr/local/cluster/follow-service.sl">
    follow_service("service:ip_a", "service:ip_b", "ip_a", 1);
</event>

Now you can use the follow_service function as a library function and keep the
invocation in cluster.conf (this is already integrated in the patch).

I would also like something like this:

<event name="followservice_node" class="node">
    <file="/usr/local/cluster/another-lib.sl">
    <file="/usr/local/cluster/follow-service.sl">
    follow_service("service:ip_a", "service:ip_b", "ip_a", 1);
</event>

This would make using sl-files very modular. I haven't had time to implement
it yet, but wanted to hear what you think.

2. I found that the sl-function nodes_online() also reports a node as online
if it is in the cluster but has no rgmanager running. For me it worked to
change the line at rgmanager/src/daemons/slang_event.c:606:

-       if (membership->cml_members[i].cn_member &&
+       if (membership->cml_members[i].cn_member > 0 &&

But I'm not sure if this is right. For me it worked perfectly well ;-).

Next, I reimplemented your example on follow-service and made it more general.
Some cases might still not be handled, but all my tests (which were not too
many up to now) didn't show any problems.
I will hand it over to the SAP guys this week to let them see if this suits
their requirements for master/slave queue replication (example attached).
I hope this feedback helps.

Regards
Marc.

On Friday 30 November 2007 17:49:05 Lon Hohberger wrote:
> Minor bugfixes.
>
> -- Lon

-- 
Gruss / Regards,
Marc Grimme
http://www.atix.de/ http://www.open-sharedroot.org/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: rgmanager-rind.patch
Type: text/x-diff
Size: 1498 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20080204/bedcf2fe/attachment.bin>

-------------- next part --------------
%
% Returns the list of nodes for the given service that are online and in the
% failover domain.
%
define nodelist_online(service_name)
{
    variable nodes, nofailback, restricted, ordered, node_list;

    nodes = nodes_online();
    (nofailback, restricted, ordered, node_list) = service_domain_info(service_name);
    return intersection(nodes, node_list);
}

%
% Idea:
% General-purpose function for a setup where Service(svc1) and Service(svc2)
% should not be running on the same node, even after failover.
% There are two options to influence the behaviour. If both services have to
% run on the same node (only one node is left in the failover domain), the
% "master" option decides which service is the master, and whether both
% services keep running or only the master service survives. If master is
% neither svc1 nor svc2, both services may run on the same node. If master is
% either svc1 or svc2, the specified one will be the surviving service.
% If followslave is not 0, svc1 always follows svc2. That means it will be
% started on the same node as svc2, and, if available, svc2 will be relocated
% to any other node.
%
define follow_service(svc1, svc2, master, followslave)
{
    variable state, owner_svc1, owner_svc2;
    variable nodes1, nodes2, allowed;

    debug("*** FOLLOW_SERVICE: follow_service(", svc1, ", ", svc2, ", ",
          master, ", ", followslave, ")");
    debug("*** FOLLOW_SERVICE: event_type: ", event_type,
          "service_name: ", service_name, ", service_state: ", service_state);

    %
    % setup the master
    %
    if ((master != svc1) and (master != svc2)) {
        debug("*** FOLLOW_SERVICE: master=NULL");
        master = NULL;
    }

    % get the information we need to decide further
    (owner_svc1, state) = service_status(svc1);
    (owner_svc2, state) = service_status(svc2);
    nodes1 = nodelist_online(svc1);
    nodes2 = nodelist_online(svc2);
    debug("*** FOLLOW_SERVICE: service_status(", svc1, "): ", service_status(svc1));
    debug("*** FOLLOW_SERVICE: owner_svc1: ", owner_svc1, ", owner_svc2: ",
          owner_svc2, ", nodes1: ", nodes1, ", nodes2: ", nodes2);

    if ((event_type == EVENT_NODE) and (owner_svc1 == node_id) and
        (node_state == NODE_OFFLINE) and (owner_svc2 >= 0)) {
        %
        % uh oh, the owner of the master service died.  Restart it
        % on the node running the slave service, or, if we should not
        % follow the slave, start it somewhere else.
        %
        if (followslave > 0) {
            if (master != svc2) {
                () = service_start(svc1, owner_svc2);
            }
        } else {
            allowed = subtract(nodes1, owner_svc2);
            if (length(allowed) > 0) {
                () = service_start(svc1, allowed);
            } else if (master == svc1) {
                () = service_start(svc1, owner_svc2);
                () = service_stop(svc2);
            } else if (master == NULL) {
                () = service_start(svc1, owner_svc2);
            }
        }
    } else if ((event_type == EVENT_NODE) and (owner_svc2 == node_id) and
               (node_state == NODE_OFFLINE) and (owner_svc1 >= 0)) {
        %
        % uh oh, the owner of svc2 died.  Restart it on any other node,
        % but not the one running svc1.
        % If the node running svc1 is the only one left, only start it
        % there if master == svc2.
        %
        allowed = subtract(nodes2, owner_svc1);
        if (length(allowed) > 0) {
            () = service_start(svc2, allowed);
        } else if (master == svc2) {
            () = service_start(svc2, owner_svc1);
            () = service_stop(svc1);
        } else if (master == NULL) {
            () = service_start(svc2, owner_svc1);
        }
    } else if (((event_type == EVENT_SERVICE) and (service_state == "started") and
                (owner_svc2 == owner_svc1) and (owner_svc1 > 0) and (owner_svc2 > 0)) or
               ((event_type == EVENT_CONFIG) and (owner_svc2 == owner_svc1))) {
        allowed = subtract(nodes2, owner_svc1);
        debug("*** FOLLOW SERVICE: service event started triggered.", allowed);
        if (length(allowed) > 0) {
            () = service_stop(svc2);
            () = service_start(svc2, allowed);
        } else if ((master == svc2) and (owner_svc2 > 0)) {
            debug("*** FOLLOW SERVICE: will stop service ", svc1);
            () = service_stop(svc1);
        } else if ((master == svc1) and (owner_svc1 > 0)) {
            debug("*** FOLLOW SERVICE: will stop service ", svc2);
            () = service_stop(svc2);
        } else {
            debug("*** FOLLOW SERVICE: both services running on the same node or only one is running.",
                  allowed, ", ", master);
        }
    }
    return;
}

^ permalink raw reply [flat|nested] 10+ messages in thread
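[Editorial example] The placement decisions in the S-Lang script above come down to a few set operations over node IDs. The following Python sketch models just that set logic; it is not the rgmanager API (node lists are modeled as plain sets, and failover-domain data is passed in explicitly):

```python
# Illustrative Python model of the set logic used by the S-Lang script above.
# This is NOT the rgmanager API: node lists are modeled as sets of integer
# node IDs, and failover-domain membership is passed in directly.

def nodelist_online(online_nodes, domain_members):
    """Nodes that are both online and in the service's failover domain
    (the script's intersection(nodes, node_list) step)."""
    return set(online_nodes) & set(domain_members)

def allowed_targets(candidates, busy_node):
    """Candidate nodes minus the node already running the other service
    (the script's subtract(nodes, owner) step)."""
    return set(candidates) - {busy_node}

# Example: nodes 1-3 are online, the domain contains 2-4, node 2 runs svc2.
online = nodelist_online({1, 2, 3}, {2, 3, 4})   # {2, 3}
targets = allowed_targets(online, 2)             # {3}
```

When `targets` is empty, the script falls back to the master/followslave rules: either both services land on the same node, or the non-master service is stopped.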
* [Cluster-devel] rind-0.8.1 patch
  2008-02-04 17:41 ` Marc Grimme
@ 2008-02-05 17:58   ` Lon Hohberger
  2008-02-06  9:03     ` Marc Grimme
  1 sibling, 1 reply; 10+ messages in thread

From: Lon Hohberger @ 2008-02-05 17:58 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Mon, 2008-02-04 at 18:41 +0100, Marc Grimme wrote:
> Hi Lon,
> finally I had time to look at this patch, and I adapted your example for
> the follow-service a little bit.
>
> Apart from that, the event triggering is running as expected; I stumbled
> over some minor issues (patch attached).
>
> 1. Isn't it better to organize the configuration as follows:
> <event name="followservice_node" class="node"
>        file="/usr/local/cluster/follow-service.sl">
>     follow_service("service:ip_a", "service:ip_b", "ip_a", 1);
> </event>

See below...

> Now you can use the follow_service function as a library function and
> keep the invocation in cluster.conf (this is already integrated in the
> patch).
>
> I would also like something like this:
> <event name="followservice_node" class="node">
>     <file="/usr/local/cluster/another-lib.sl">
>     <file="/usr/local/cluster/follow-service.sl">
>     follow_service("service:ip_a", "service:ip_b", "ip_a", 1);
> </event>
> This would make using sl-files very modular. I haven't had time to
> implement it yet, but wanted to hear what you think.

Nothing to implement, really. The following should handle both cases
without changing how configuration works (and without requiring more
parsing of cluster.conf):

<event name="followservice_node" class="node">
    evalfile("another-lib.sl");
    evalfile("follow-service.sl");
    follow_service("service:ip_a", "service:ip_b", "ip_a", 1);
</event>

I do, however, need a way to set search paths for the S-Lang interpreter
as a matter of configuration.

<events search_path="/usr/share/cluster:/usr/local/cluster:..."
/> <!-- for example -->
...
</events>

(The above should work if you drop another-lib.sl and follow-service.sl in
/usr/share/cluster...)

(However, I don't consider this critical...)

I looked into modules, but it'd be more complicated, and it seems import()
doesn't work on RHEL (or maybe I did it wrong...).

Note that the reason I was calling external scripts is that there's a limit
in ccsd on the amount of data you can get back from ccs_get() - it's a
couple hundred bytes. So embedding an entire script won't work, but a short
script like the one you made should work.

> 2. I found that the sl-function nodes_online() also reports a node as
> online if it is in the cluster but has no rgmanager running. For me it
> worked to change the line at rgmanager/src/daemons/slang_event.c:606:
> -       if (membership->cml_members[i].cn_member &&
> +       if (membership->cml_members[i].cn_member > 0 &&
> But I'm not sure if this is right. For me it worked perfectly well ;-).

That's strange... I'll look at that. That *needs* to work. :)

> Next, I reimplemented your example on follow-service and made it more
> general.

I'll take a look at it. Mine was really a PoC / example. If yours is
better, then we should document it and put it up on the cluster wiki at
some point as an example of how to make rgmanager do backflips.

> Some cases might still not be handled, but all my tests (which were not
> too many up to now) didn't show any problems. I will hand it over to the
> SAP guys this week to let them see if this suits their requirements for
> master/slave queue replication (example attached).

:) Good good.

-- Lon

^ permalink raw reply [flat|nested] 10+ messages in thread
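[Editorial example] The `search_path` attribute proposed above would presumably behave like a PATH-style lookup: try each colon-separated directory in order, with absolute paths bypassing the search. A small Python sketch of that resolution logic (the attribute name and colon-separated format come from Lon's example; the resolver itself is hypothetical, not rgmanager code):

```python
# Hypothetical resolver for the proposed <events search_path="..."/> setting:
# given a colon-separated directory list, return the first location where the
# requested .sl file exists.  The `exists` predicate is injected so the logic
# can be exercised without touching the real filesystem.
import os

def resolve_sl_file(name, search_path, exists=os.path.isfile):
    if os.path.isabs(name):                  # absolute paths bypass the search
        return name if exists(name) else None
    for d in search_path.split(":"):
        if not d:
            continue                          # skip empty path components
        candidate = os.path.join(d, name)
        if exists(candidate):
            return candidate
    return None

# Example with a fake filesystem:
fake = {"/usr/share/cluster/follow-service.sl"}
path = resolve_sl_file("follow-service.sl",
                       "/usr/share/cluster:/usr/local/cluster",
                       exists=fake.__contains__)
# path == "/usr/share/cluster/follow-service.sl"
```

As the thread goes on to note, S-Lang's evalfile() with absolute paths makes this largely unnecessary in practice.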
* [Cluster-devel] rind-0.8.1 patch
  2008-02-05 17:58 ` Lon Hohberger
@ 2008-02-06  9:03   ` Marc Grimme
  2008-02-06 17:01     ` Lon Hohberger
  0 siblings, 1 reply; 10+ messages in thread

From: Marc Grimme @ 2008-02-06 9:03 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Tuesday 05 February 2008 18:58:25 Lon Hohberger wrote:
> On Mon, 2008-02-04 at 18:41 +0100, Marc Grimme wrote:
> > Hi Lon,
> > finally I had time to look at this patch, and I adapted your example
> > for the follow-service a little bit.
> >
> > Apart from that, the event triggering is running as expected; I
> > stumbled over some minor issues (patch attached).
> >
> > 1. Isn't it better to organize the configuration as follows:
> > <event name="followservice_node" class="node"
> >        file="/usr/local/cluster/follow-service.sl">
> >     follow_service("service:ip_a", "service:ip_b", "ip_a", 1);
> > </event>
>
> See below...
>
> > Now you can use the follow_service function as a library function and
> > keep the invocation in cluster.conf (this is already integrated in the
> > patch).
> >
> > I would also like something like this:
> > <event name="followservice_node" class="node">
> >     <file="/usr/local/cluster/another-lib.sl">
> >     <file="/usr/local/cluster/follow-service.sl">
> >     follow_service("service:ip_a", "service:ip_b", "ip_a", 1);
> > </event>
> > This would make using sl-files very modular. I haven't had time to
> > implement it yet, but wanted to hear what you think.
>
> Nothing to implement, really. The following should handle both cases
> without changing how configuration works (and without requiring more
> parsing of cluster.conf):
>
> <event name="followservice_node" class="node">
>     evalfile("another-lib.sl");
>     evalfile("follow-service.sl");
>     follow_service("service:ip_a", "service:ip_b", "ip_a", 1);
> </event>
>
> I do, however, need a way to set search paths for the S-Lang interpreter
> as a matter of configuration.
> (The above should work if you drop another-lib.sl and follow-service.sl
> in /usr/share/cluster...)
>
> <events search_path="/usr/share/cluster:/usr/local/cluster:..." />
> <!-- for example -->
> ...
> </events>

Ah, got it. I wasn't aware of evalfile. But wouldn't file tags work around
the search-path problem and be pretty easy to implement?

> (However, I don't consider this critical...)

It's not critical, but it could help make the development of those sl-files
more general.

> I looked into modules, but it'd be more complicated, and it seems
> import() doesn't work on RHEL (or maybe I did it wrong...).
>
> Note that the reason I was calling external scripts is that there's a
> limit in ccsd on the amount of data you can get back from ccs_get() -
> it's a couple hundred bytes. So embedding an entire script won't work,
> but a short script like the one you made should work.

And you can develop sl-scripts independently of the cluster.conf, so you
don't need a new version number every time you change an sl-file. Besides,
you could build up libraries (one example is follow-service) for general
use.

> > 2. I found that the sl-function nodes_online() also reports a node as
> > online if it is in the cluster but has no rgmanager running. For me it
> > worked to change the line at rgmanager/src/daemons/slang_event.c:606:
> > -       if (membership->cml_members[i].cn_member &&
> > +       if (membership->cml_members[i].cn_member > 0 &&
> > But I'm not sure if this is right. For me it worked perfectly well ;-).
>
> That's strange... I'll look at that. That *needs* to work. :)

Right, that should not make a difference, should it. ;-)

> > Next, I reimplemented your example on follow-service and made it more
> > general.
>
> I'll take a look at it. Mine was really a PoC / example. If yours is
> better, then we should document it and put it up on the cluster wiki at
> some point as an example of how to make rgmanager do backflips.
I just thought to make it more like a library. And I also took the failover
domains into account when returning the nodes which are capable of running
the service in question.

> > Some cases might still not be handled, but all my tests (which were not
> > too many up to now) didn't show any problems. I will hand it over to
> > the SAP guys this week to let them see if this suits their requirements
> > for master/slave queue replication (example attached).
>
> :) Good good.
>
> -- Lon

Marc.
-- 
Gruss / Regards,
Marc Grimme
http://www.atix.de/ http://www.open-sharedroot.org/

^ permalink raw reply [flat|nested] 10+ messages in thread
* [Cluster-devel] rind-0.8.1 patch
  2008-02-06  9:03 ` Marc Grimme
@ 2008-02-06 17:01   ` Lon Hohberger
  2008-02-06 17:22     ` Lon Hohberger
                        ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread

From: Lon Hohberger @ 2008-02-06 17:01 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Wed, 2008-02-06 at 10:03 +0100, Marc Grimme wrote:
> On Tuesday 05 February 2008 18:58:25 Lon Hohberger wrote:
> > <events search_path="/usr/share/cluster:/usr/local/cluster:..." />
> > <!-- for example -->
> > ...
> > </events>
>
> Ah, got it. I wasn't aware of evalfile. But wouldn't file tags work
> around the search-path problem and be pretty easy to implement?

I don't see search paths as a problem, and in fact, I might not have to
fix it anyway (yay!). Turns out, this works, too (I thought it didn't for
some reason):

evalfile("/tmp/lon.sl");
lon_function();

/tmp/lon.sl:

evalfile("/root/foo.sl");
define lon_function()
{
    foo_function();
    printf("Hello, world!\n");
}

/root/foo.sl:

define foo_function()
{
    foo_function();
    printf("Test\n");
}

> > (However, I don't consider this critical...)
>
> It's not critical, but it could help make the development of those
> sl-files more general.

Given that absolute paths also work, does this satisfy the requirement?
I really can't see adding more parsing code for something S-Lang already
does.

I mean, it's not -that- hard to add, but it's kind of pointless to do:

<event>
    <file "/rgmanager/slang-scripts/foo1.sl"/>
    <file "/rgmanager/slang-scripts/foo2.sl"/>
    script_body();
</event>

instead of:

<event>
    evalfile("/rgmanager/slang-scripts/foo1.sl");
    evalfile("/rgmanager/slang-scripts/foo2.sl");
    script_body();
</event>

> > Note that the reason I was calling external scripts is that there's a
> > limit in ccsd on the amount of data you can get back from ccs_get() -
> > it's a couple hundred bytes. So embedding an entire script won't work,
> > but a short script like the one you made should work.
> And you can develop sl-scripts independently of the cluster.conf, so
> you don't need a new version number every time you change an sl-file.
> Besides, you could build up libraries (one example is follow-service)
> for general use.

That's also a benefit (and using evalfile() in your code instead of
embedding the equivalent in cluster.conf also coincides with this).

> > > +       if (membership->cml_members[i].cn_member > 0 &&
> > > But I'm not sure if this is right. For me it worked perfectly well
> > > ;-).
> >
> > That's strange... I'll look at that. That *needs* to work. :)
>
> Right, that should not make a difference, should it. ;-)

Definitely not. :)

One thing I think is missing is intelligence about event collapsing in
default_event_handler. For example, if a service fails and you restart
it, but the restart fails, so you move it to another node (all in a
single event handler execution), we get 5-ish events for that:

* failure event
* stopped event
* start event
* stopped event
* start event

The middle 3 events become irrelevant. We could fix it in
default_event_script.sl by checking the current state and, if the current
state doesn't match the event, throwing it out. (I think throwing them
out for user-defined event scripts is a bad idea, however, which is why I
suggested changing it in default_event_handler.)

This could also be a good 'library' function (as could several of the
functions in default_event_handler.sl).

-- Lon

^ permalink raw reply [flat|nested] 10+ messages in thread
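[Editorial example] The stale-event check Lon describes - discard a queued service event when the service's current state no longer matches it - can be sketched as follows. This is Python pseudocode of the idea only; rgmanager's real event structures and state names differ:

```python
# Sketch of the event-collapsing idea: before handling queued service events,
# drop any whose embedded service state no longer matches the service's
# *current* state, and collapse immediate duplicates.  Event and state names
# here are illustrative, not rgmanager's.

def collapse_stale(events, current_state):
    """events: list of (event_type, reported_state) tuples, oldest first.
    Returns only the events still consistent with current_state."""
    kept = []
    for etype, estate in events:
        if estate != current_state:
            continue                          # stale: reality has moved on
        if kept and kept[-1] == (etype, estate):
            continue                          # collapse immediate duplicates
        kept.append((etype, estate))
    return kept

# The fail/stop/start/stop/start sequence from the mail:
queue = [("service_failure", "failed"),
         ("service_stop",    "stopped"),
         ("service_start",   "started"),
         ("service_stop",    "stopped"),
         ("service_start",   "started")]
# With the service now 'started', only one start event survives:
# collapse_stale(queue, "started") == [("service_start", "started")]
```

Note that this simple filter also drops the initial failure event, which user-defined scripts may still want to see - consistent with Lon's point that the collapsing belongs in the default handler only.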
* [Cluster-devel] rind-0.8.1 patch
  2008-02-06 17:01 ` Lon Hohberger
@ 2008-02-06 17:22   ` Lon Hohberger
  2008-02-06 19:18   ` Marc Grimme
  2008-02-07  8:38   ` Marc Grimme
  2 siblings, 0 replies; 10+ messages in thread

From: Lon Hohberger @ 2008-02-06 17:22 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Wed, 2008-02-06 at 12:01 -0500, Lon Hohberger wrote:
> /root/foo.sl:
>
> define foo_function()
> {
>     foo_function();
>     printf("Test\n");
> }

Typo... :)

-- Lon

^ permalink raw reply [flat|nested] 10+ messages in thread
* [Cluster-devel] rind-0.8.1 patch
  2008-02-06 17:01 ` Lon Hohberger
  2008-02-06 17:22 ` Lon Hohberger
@ 2008-02-06 19:18 ` Marc Grimme
  2008-02-07  8:38 ` Marc Grimme
  2 siblings, 0 replies; 10+ messages in thread

From: Marc Grimme @ 2008-02-06 19:18 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Wednesday 06 February 2008 18:01:34 Lon Hohberger wrote:
> On Wed, 2008-02-06 at 10:03 +0100, Marc Grimme wrote:
> > On Tuesday 05 February 2008 18:58:25 Lon Hohberger wrote:
> > > <events search_path="/usr/share/cluster:/usr/local/cluster:..." />
> > > <!-- for example -->
> > > ...
> > > </events>
> >
> > Ah, got it. I wasn't aware of evalfile. But wouldn't file tags work
> > around the search-path problem and be pretty easy to implement?
>
> I don't see search paths as a problem, and in fact, I might not have to
> fix it anyway (yay!). Turns out, this works, too (I thought it didn't
> for some reason):
>
> evalfile("/tmp/lon.sl");
> lon_function();

And evalfile could also be used in sl-files, I suppose.

> /tmp/lon.sl:
>
> evalfile("/root/foo.sl");
> define lon_function()
> {
>     foo_function();
>     printf("Hello, world!\n");
> }
>
> /root/foo.sl:
>
> define foo_function()
> {
>     foo_function();
>     printf("Test\n");
> }
>
> > > (However, I don't consider this critical...)
> >
> > It's not critical, but it could help make the development of those
> > sl-files more general.
>
> Given that absolute paths also work, does this satisfy the requirement?
> I really can't see adding more parsing code for something S-Lang already
> does.
>
> I mean, it's not -that- hard to add, but it's kind of pointless to do:
>
> <event>
>     <file "/rgmanager/slang-scripts/foo1.sl"/>
>     <file "/rgmanager/slang-scripts/foo2.sl"/>
>     script_body();
> </event>
>
> instead of:
>
> <event>
>     evalfile("/rgmanager/slang-scripts/foo1.sl");
>     evalfile("/rgmanager/slang-scripts/foo2.sl");
>     script_body();
> </event>

Yes, this way is better, I agree.
> > > Note that the reason I was calling external scripts is that there's
> > > a limit in ccsd on the amount of data you can get back from
> > > ccs_get() - it's a couple hundred bytes. So embedding an entire
> > > script won't work, but a short script like the one you made should
> > > work.
> >
> > And you can develop sl-scripts independently of the cluster.conf, so
> > you don't need a new version number every time you change an sl-file.
> > Besides, you could build up libraries (one example is follow-service)
> > for general use.
>
> That's also a benefit (and using evalfile() in your code instead of
> embedding the equivalent in cluster.conf also coincides with this).
>
> > > > +       if (membership->cml_members[i].cn_member > 0 &&
> > > > But I'm not sure if this is right. For me it worked perfectly well
> > > > ;-).
> > >
> > > That's strange... I'll look at that. That *needs* to work. :)
> >
> > Right, that should not make a difference, should it. ;-)
>
> Definitely not. :)
>
> One thing I think is missing is intelligence about event collapsing in
> default_event_handler. For example, if a service fails and you restart
> it, but the restart fails, so you move it to another node (all in a
> single event handler execution), we get 5-ish events for that:
>
> * failure event
> * stopped event
> * start event
> * stopped event
> * start event
>
> The middle 3 events become irrelevant. We could fix it in
> default_event_script.sl by checking the current state and, if the
> current state doesn't match the event, throwing it out. (I think
> throwing them out for user-defined event scripts is a bad idea, however,
> which is why I suggested changing it in default_event_handler.)
>
> This could also be a good 'library' function (as could several of the
> functions in default_event_handler.sl).

Yes, and it would therefore be usable with evalfile and everything. But I
like those event scripts, especially when they don't complicate the
cluster.conf files.

> -- Lon

Marc.
-- 
Gruss / Regards,
Marc Grimme
http://www.atix.de/ http://www.open-sharedroot.org/

^ permalink raw reply [flat|nested] 10+ messages in thread
* [Cluster-devel] rind-0.8.1 patch
  2008-02-06 17:01 ` Lon Hohberger
  2008-02-06 17:22 ` Lon Hohberger
  2008-02-06 19:18 ` Marc Grimme
@ 2008-02-07  8:38 ` Marc Grimme
  2008-02-08 20:56   ` Lon Hohberger
  2 siblings, 1 reply; 10+ messages in thread

From: Marc Grimme @ 2008-02-07 8:38 UTC (permalink / raw)
To: cluster-devel.redhat.com

Something else I was thinking about while playing with those things:

1. Why are USER, CONFIG and MIGRATION events not yet being passed? It
could be quite interesting to trigger on those as well.

2. And wouldn't it be a good idea to be able to call some kind of
higher-level OS script? I thought it might then be possible to generate a
more dynamic failover domain, for example one that prioritizes the
lowest-loaded node. That can be quite nice when you have services or VMs
which produce very high load.

Marc.
-- 
Gruss / Regards,
Marc Grimme
http://www.atix.de/ http://www.open-sharedroot.org/

^ permalink raw reply [flat|nested] 10+ messages in thread
* [Cluster-devel] rind-0.8.1 patch
  2008-02-07  8:38 ` Marc Grimme
@ 2008-02-08 20:56   ` Lon Hohberger
  0 siblings, 0 replies; 10+ messages in thread

From: Lon Hohberger @ 2008-02-08 20:56 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Thu, 2008-02-07 at 09:38 +0100, Marc Grimme wrote:
> Something else I was thinking about while playing with those things:
> 1. Why are USER, CONFIG and MIGRATION events not yet being passed? It
> could be quite interesting to trigger on those as well.

USER + CONFIG are being passed to the event handlers in CVS; you just
can't define events off of them in the configuration currently. I think
what we have right now is plenty for blowing your own foot off, but we
certainly could add those.

Virtual machine requests (e.g. clusvcadm -M) aren't going out with 5.2
for central_processing.

> 2. And wouldn't it be a good idea to be able to call some kind of
> higher-level OS script?

I disagree here, sort of:

* I don't think the possibility of lots of fork/execs while trying to
determine service placement after a failure is a great idea. We want to
be as neutral as we can during this situation.

A really low-impact script interface that reorders a node list might be
okay, i.e.:

node_list = external_reorder("my_script", old_node_list);

I suppose it's kind of like shuffle(), but with intelligence. That
script could then sort the node IDs by whatever criteria it wanted.

As for processing events in external scripts, I disagree fairly strongly:

* The data rgmanager is currently using to make decisions (e.g.
configuration info such as failover domains, service recovery policies,
and extended stuff which you can randomly add) is difficult to access
from shell scripts.

* Internal rgmanager operations (flipping service states, for example)
can't be done from outside rgmanager in a sane way.

> I thought it might then be possible to generate a more dynamic
> failover domain.

Agreed.
> For example one that prioritizes the lowest-loaded node. That can be
> quite nice when you have services or VMs which produce very high load.

There are lots of kinds of load:

* memory pressure
* cpu load
* run queue average length (the 'uptime' load)
* i/o bandwidth to shared storage
* network bandwidth

I'd recommend that whatever load monitoring we care about be done
proactively. That is, have something publish current load states
periodically, and have the data 'already there' - so that in the event of
a failure, we can just act on what is known, rather than asking around
for various pieces of data.

We're getting a little far out, though - does what's in CVS work for
doing the 'follows' logic or not? :)

-- Lon

^ permalink raw reply [flat|nested] 10+ messages in thread
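[Editorial example] Lon's external_reorder() idea - hand an existing node list to a small hook that reorders it by externally published load data - might look roughly like this. The function name comes from his example; everything else (the metric table, the sorting policy) is an assumption for illustration:

```python
# Sketch of an external_reorder()-style hook: sort candidate node IDs by
# load metrics that were published proactively, so nothing expensive runs
# during failure recovery.  The published_load table is hypothetical;
# rgmanager defines no such structure.

def reorder_by_load(node_ids, published_load, default=float("inf")):
    """Return node IDs sorted ascending by last published load.
    Nodes with no published data sort last (we know nothing about them)."""
    return sorted(node_ids, key=lambda n: published_load.get(n, default))

# Load states published periodically, 'already there' at failure time:
load = {1: 0.75, 2: 0.10, 3: 0.42}
# reorder_by_load([1, 2, 3, 4], load) == [2, 3, 1, 4]
```

This matches the proactive-monitoring point above: at failure time, the hook only sorts data that is already known; it never probes the cluster.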
* [Cluster-devel] rind-0.8.1 patch
  2008-02-04 17:41 ` Marc Grimme
  2008-02-05 17:58 ` Lon Hohberger
@ 2008-02-14 21:56 ` Lon Hohberger
  1 sibling, 0 replies; 10+ messages in thread

From: Lon Hohberger @ 2008-02-14 21:56 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Mon, 2008-02-04 at 18:41 +0100, Marc Grimme wrote:
> 2. I found that the sl-function nodes_online() also reports a node as
> online if it is in the cluster but has no rgmanager running. For me it
> worked to change the line at rgmanager/src/daemons/slang_event.c:606:
> -       if (membership->cml_members[i].cn_member &&
> +       if (membership->cml_members[i].cn_member > 0 &&

This is really strange -- I just retested this and it worked for me
(unmodified). I wonder if there's something I'm missing. The differing
behavior would normally indicate an uninitialized variable, but I don't
see where it could be (get_member_list() memsets the membership list).

-- Lon

^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads: [~2008-02-14 21:56 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-11-30 16:49 [Cluster-devel] rind-0.8.1 patch Lon Hohberger
2008-02-04 17:41 ` Marc Grimme
2008-02-05 17:58   ` Lon Hohberger
2008-02-06  9:03     ` Marc Grimme
2008-02-06 17:01       ` Lon Hohberger
2008-02-06 17:22         ` Lon Hohberger
2008-02-06 19:18         ` Marc Grimme
2008-02-07  8:38         ` Marc Grimme
2008-02-08 20:56           ` Lon Hohberger
2008-02-14 21:56 ` Lon Hohberger