* [Cluster-devel] rind-0.8.1 patch
@ 2007-11-30 16:49 Lon Hohberger
  2008-02-04 17:41 ` Marc Grimme
  0 siblings, 1 reply; 10+ messages in thread

From: Lon Hohberger @ 2007-11-30 16:49 UTC (permalink / raw)
To: cluster-devel.redhat.com

Minor bugfixes.

-- Lon

-------------- next part --------------
A non-text attachment was scrubbed...
Name: rind-0.8.1.patch
Type: text/x-patch
Size: 142853 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20071130/32d892b9/attachment.bin>

^ permalink raw reply [flat|nested] 10+ messages in thread
* [Cluster-devel] rind-0.8.1 patch
  2007-11-30 16:49 [Cluster-devel] rind-0.8.1 patch Lon Hohberger
@ 2008-02-04 17:41 ` Marc Grimme
  2008-02-05 17:58   ` Lon Hohberger
  2008-02-14 21:56   ` Lon Hohberger
  0 siblings, 2 replies; 10+ messages in thread

From: Marc Grimme @ 2008-02-04 17:41 UTC (permalink / raw)
To: cluster-devel.redhat.com

Hi Lon,
finally I had time to look at this patch, and I adapted your example for the
follow-service a little bit.

Apart from that, the event triggering is running as expected; I stumbled over
some minor issues (patch attached).

1. Isn't it better to organize the configuration as follows:

<event name="followservice_node" class="node"
       file="/usr/local/cluster/follow-service.sl">
    follow_service("service:ip_a", "service:ip_b", "ip_a", 1);
</event>

Now you can use the follow_service function as a library function and keep the
invocation in cluster.conf (this is already integrated in the patch).

I would also like something like this:

<event name="followservice_node" class="node">
    <file="/usr/local/cluster/another-lib.sl">
    <file="/usr/local/cluster/follow-service.sl">
    follow_service("service:ip_a", "service:ip_b", "ip_a", 1);
</event>

This would make using sl-files very modular. I haven't had time to implement
it yet, but wanted to hear what you think.

2. I found that the sl-function nodes_online() also reports a node as online
if it is in the cluster but has no rgmanager running. For me it worked to
change the line at rgmanager/src/daemons/slang_event.c:606:

-       if (membership->cml_members[i].cn_member &&
+       if (membership->cml_members[i].cn_member > 0 &&

But I'm not sure if this is right. For me it worked perfectly well ;-).

Next, I reimplemented your example on follow-service and made it more general.
Some cases might still not be handled, but all my tests (which were not too
many up to now) didn't show any problems.
I will hand it over to the SAP guys this week to let them see if this suits
their requirements for master/slave queue replication (example attached).
I hope this feedback helps.

Regards
Marc.

On Friday 30 November 2007 17:49:05 Lon Hohberger wrote:
> Minor bugfixes.
>
> -- Lon

-- 
Gruss / Regards,
Marc Grimme
http://www.atix.de/ http://www.open-sharedroot.org/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: rgmanager-rind.patch
Type: text/x-diff
Size: 1498 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20080204/bedcf2fe/attachment.bin>

-------------- next part --------------
%
% Returns the list of nodes for the given service that are online and in the
% failover domain.
%
define nodelist_online(service_name)
{
    variable nodes, nofailback, restricted, ordered, node_list;

    nodes = nodes_online();
    (nofailback, restricted, ordered, node_list) = service_domain_info(service_name);
    return intersection(nodes, node_list);
}

%
% Idea:
% General-purpose function for a setup where Service(svc1) and Service(svc2)
% should not be running on the same node, even after failover.
% There are two options to influence the behaviour. If both services have to
% run on the same node (only one node is left in the failover domain), the
% "master" option decides which service is the master, and whether both
% services keep running or only the master service survives. If master is
% neither svc1 nor svc2, both services may run on the same node. If master is
% either svc1 or svc2, the specified one will be the surviving service.
% If followslave is not 0, svc1 always follows svc2. That means it will be
% started on the same node as svc2, and, if available, svc2 will be relocated
% to any other node.
%
define follow_service(svc1, svc2, master, followslave)
{
    variable state, owner_svc1, owner_svc2;
    variable nodes1, nodes2, allowed;

    debug("*** FOLLOW_SERVICE: follow_service(", svc1, ", ", svc2, ", ",
          master, ", ", followslave, ")");
    debug("*** FOLLOW_SERVICE: event_type: ", event_type,
          "service_name: ", service_name, ", service_state: ", service_state);

    %
    % setup the master
    %
    if ((master != svc1) and (master != svc2)) {
        debug("*** FOLLOW_SERVICE: master=NULL");
        master = NULL;
    }

    % get the information we need to decide further
    (owner_svc1, state) = service_status(svc1);
    (owner_svc2, state) = service_status(svc2);
    nodes1 = nodelist_online(svc1);
    nodes2 = nodelist_online(svc2);
    debug("*** FOLLOW_SERVICE: service_status(", svc1, "): ", service_status(svc1));
    debug("*** FOLLOW_SERVICE: owner_svc1: ", owner_svc1, ", owner_svc2: ",
          owner_svc2, ", nodes1: ", nodes1, ", nodes2: ", nodes2);

    if ((event_type == EVENT_NODE) and (owner_svc1 == node_id) and
        (node_state == NODE_OFFLINE) and (owner_svc2 >= 0)) {
        %
        % uh oh, the owner of the master service died.  Restart it
        % on the node running the slave service, or, if we should not
        % follow the slave, start it somewhere else.
        %
        if (followslave > 0) {
            if (master != svc2) {
                () = service_start(svc1, owner_svc2);
            }
        } else {
            allowed = subtract(nodes1, owner_svc2);
            if (length(allowed) > 0) {
                () = service_start(svc1, allowed);
            } else if (master == svc1) {
                () = service_start(svc1, owner_svc2);
                () = service_stop(svc2);
            } else if (master == NULL) {
                () = service_start(svc1, owner_svc2);
            }
        }
    } else if ((event_type == EVENT_NODE) and (owner_svc2 == node_id) and
               (node_state == NODE_OFFLINE) and (owner_svc1 >= 0)) {
        %
        % uh oh, the owner of svc2 died.  Restart it on any other node,
        % but not the one running svc1.
        % If the node running svc1 is the only one left, only start it
        % there if master == svc2.
        %
        allowed = subtract(nodes2, owner_svc1);
        if (length(allowed) > 0) {
            () = service_start(svc2, allowed);
        } else if (master == svc2) {
            () = service_start(svc2, owner_svc1);
            () = service_stop(svc1);
        } else if (master == NULL) {
            () = service_start(svc2, owner_svc1);
        }
    } else if (((event_type == EVENT_SERVICE) and (service_state == "started") and
                (owner_svc2 == owner_svc1) and (owner_svc1 > 0) and (owner_svc2 > 0)) or
               ((event_type == EVENT_CONFIG) and (owner_svc2 == owner_svc1))) {
        allowed = subtract(nodes2, owner_svc1);
        debug("*** FOLLOW SERVICE: service event started triggered.", allowed);
        if (length(allowed) > 0) {
            () = service_stop(svc2);
            () = service_start(svc2, allowed);
        } else if ((master == svc2) and (owner_svc2 > 0)) {
            debug("*** FOLLOW SERVICE: will stop service ", svc1);
            () = service_stop(svc1);
        } else if ((master == svc1) and (owner_svc1 > 0)) {
            debug("*** FOLLOW SERVICE: will stop service ", svc2);
            () = service_stop(svc2);
        } else {
            debug("*** FOLLOW SERVICE: both services running on the same node or only one is running.",
                  allowed, ", ", master);
        }
    }
    return;
}

^ permalink raw reply [flat|nested] 10+ messages in thread
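[Editorial example] The placement decisions in the S-Lang script above come down to a few set operations over node IDs. The following Python sketch models just that set logic; it is not the rgmanager API (node lists are modeled as plain sets, and failover-domain data is passed in explicitly):

```python
# Illustrative Python model of the set logic used by the S-Lang script above.
# This is NOT the rgmanager API: node lists are modeled as sets of integer
# node IDs, and failover-domain membership is passed in directly.

def nodelist_online(online_nodes, domain_members):
    """Nodes that are both online and in the service's failover domain
    (the script's intersection(nodes, node_list) step)."""
    return set(online_nodes) & set(domain_members)

def allowed_targets(candidates, busy_node):
    """Candidate nodes minus the node already running the other service
    (the script's subtract(nodes, owner) step)."""
    return set(candidates) - {busy_node}

# Example: nodes 1-3 are online, the domain contains 2-4, node 2 runs svc2.
online = nodelist_online({1, 2, 3}, {2, 3, 4})   # {2, 3}
targets = allowed_targets(online, 2)             # {3}
```

When `targets` is empty, the script falls back to the master/followslave rules: either both services land on the same node, or the non-master service is stopped.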
* [Cluster-devel] rind-0.8.1 patch
  2008-02-04 17:41 ` Marc Grimme
@ 2008-02-05 17:58   ` Lon Hohberger
  2008-02-06  9:03     ` Marc Grimme
  1 sibling, 1 reply; 10+ messages in thread

From: Lon Hohberger @ 2008-02-05 17:58 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Mon, 2008-02-04 at 18:41 +0100, Marc Grimme wrote:
> Hi Lon,
> finally I had time to look at this patch, and I adapted your example for
> the follow-service a little bit.
>
> Apart from that, the event triggering is running as expected; I stumbled
> over some minor issues (patch attached).
>
> 1. Isn't it better to organize the configuration as follows:
> <event name="followservice_node" class="node"
>        file="/usr/local/cluster/follow-service.sl">
>     follow_service("service:ip_a", "service:ip_b", "ip_a", 1);
> </event>

See below...

> Now you can use the follow_service function as a library function and
> keep the invocation in cluster.conf (this is already integrated in the
> patch).
>
> I would also like something like this:
> <event name="followservice_node" class="node">
>     <file="/usr/local/cluster/another-lib.sl">
>     <file="/usr/local/cluster/follow-service.sl">
>     follow_service("service:ip_a", "service:ip_b", "ip_a", 1);
> </event>
> This would make using sl-files very modular. I haven't had time to
> implement it yet, but wanted to hear what you think.

Nothing to implement, really. The following should handle both cases
without changing how configuration works (and without requiring more
parsing of cluster.conf):

<event name="followservice_node" class="node">
    evalfile("another-lib.sl");
    evalfile("follow-service.sl");
    follow_service("service:ip_a", "service:ip_b", "ip_a", 1);
</event>

I do, however, need a way to set search paths for the S-Lang interpreter
as a matter of configuration.

<events search_path="/usr/share/cluster:/usr/local/cluster:..."
/> <!-- for example -->
...
</events>

(The above should work if you drop another-lib.sl and follow-service.sl in
/usr/share/cluster...)

(However, I don't consider this critical...)

I looked into modules, but it'd be more complicated, and it seems import()
doesn't work on RHEL (or maybe I did it wrong...).

Note that the reason I was calling external scripts is that there's a limit
in ccsd on the amount of data you can get back from ccs_get() - it's a
couple hundred bytes. So embedding an entire script won't work, but a short
script like the one you made should work.

> 2. I found that the sl-function nodes_online() also reports a node as
> online if it is in the cluster but has no rgmanager running. For me it
> worked to change the line at rgmanager/src/daemons/slang_event.c:606:
> -       if (membership->cml_members[i].cn_member &&
> +       if (membership->cml_members[i].cn_member > 0 &&
> But I'm not sure if this is right. For me it worked perfectly well ;-).

That's strange... I'll look at that. That *needs* to work. :)

> Next, I reimplemented your example on follow-service and made it more
> general.

I'll take a look at it. Mine was really a PoC / example. If yours is
better, then we should document it and put it up on the cluster wiki at
some point as an example of how to make rgmanager do backflips.

> Some cases might still not be handled, but all my tests (which were not
> too many up to now) didn't show any problems. I will hand it over to the
> SAP guys this week to let them see if this suits their requirements for
> master/slave queue replication (example attached).

:) Good good.

-- Lon

^ permalink raw reply [flat|nested] 10+ messages in thread
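[Editorial example] The `search_path` attribute proposed above would presumably behave like a PATH-style lookup: try each colon-separated directory in order, with absolute paths bypassing the search. A small Python sketch of that resolution logic (the attribute name and colon-separated format come from Lon's example; the resolver itself is hypothetical, not rgmanager code):

```python
# Hypothetical resolver for the proposed <events search_path="..."/> setting:
# given a colon-separated directory list, return the first location where the
# requested .sl file exists.  The `exists` predicate is injected so the logic
# can be exercised without touching the real filesystem.
import os

def resolve_sl_file(name, search_path, exists=os.path.isfile):
    if os.path.isabs(name):                  # absolute paths bypass the search
        return name if exists(name) else None
    for d in search_path.split(":"):
        if not d:
            continue                          # skip empty path components
        candidate = os.path.join(d, name)
        if exists(candidate):
            return candidate
    return None

# Example with a fake filesystem:
fake = {"/usr/share/cluster/follow-service.sl"}
path = resolve_sl_file("follow-service.sl",
                       "/usr/share/cluster:/usr/local/cluster",
                       exists=fake.__contains__)
# path == "/usr/share/cluster/follow-service.sl"
```

As the thread goes on to note, S-Lang's evalfile() with absolute paths makes this largely unnecessary in practice.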
* [Cluster-devel] rind-0.8.1 patch
  2008-02-05 17:58 ` Lon Hohberger
@ 2008-02-06  9:03   ` Marc Grimme
  2008-02-06 17:01     ` Lon Hohberger
  0 siblings, 1 reply; 10+ messages in thread

From: Marc Grimme @ 2008-02-06 9:03 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Tuesday 05 February 2008 18:58:25 Lon Hohberger wrote:
> On Mon, 2008-02-04 at 18:41 +0100, Marc Grimme wrote:
> > Hi Lon,
> > finally I had time to look at this patch, and I adapted your example
> > for the follow-service a little bit.
> >
> > Apart from that, the event triggering is running as expected; I
> > stumbled over some minor issues (patch attached).
> >
> > 1. Isn't it better to organize the configuration as follows:
> > <event name="followservice_node" class="node"
> >        file="/usr/local/cluster/follow-service.sl">
> >     follow_service("service:ip_a", "service:ip_b", "ip_a", 1);
> > </event>
>
> See below...
>
> > Now you can use the follow_service function as a library function and
> > keep the invocation in cluster.conf (this is already integrated in the
> > patch).
> >
> > I would also like something like this:
> > <event name="followservice_node" class="node">
> >     <file="/usr/local/cluster/another-lib.sl">
> >     <file="/usr/local/cluster/follow-service.sl">
> >     follow_service("service:ip_a", "service:ip_b", "ip_a", 1);
> > </event>
> > This would make using sl-files very modular. I haven't had time to
> > implement it yet, but wanted to hear what you think.
>
> Nothing to implement, really. The following should handle both cases
> without changing how configuration works (and without requiring more
> parsing of cluster.conf):
>
> <event name="followservice_node" class="node">
>     evalfile("another-lib.sl");
>     evalfile("follow-service.sl");
>     follow_service("service:ip_a", "service:ip_b", "ip_a", 1);
> </event>
>
> I do, however, need a way to set search paths for the S-Lang interpreter
> as a matter of configuration.
> (The above should work if you drop another-lib.sl and follow-service.sl
> in /usr/share/cluster...)
>
> <events search_path="/usr/share/cluster:/usr/local/cluster:..." />
> <!-- for example -->
> ...
> </events>

Ah, got it. I wasn't aware of evalfile. But wouldn't file tags work around
the search-path problem and be pretty easy to implement?

> (However, I don't consider this critical...)

It's not critical, but it could help make the development of those sl-files
more general.

> I looked into modules, but it'd be more complicated, and it seems
> import() doesn't work on RHEL (or maybe I did it wrong...).
>
> Note that the reason I was calling external scripts is that there's a
> limit in ccsd on the amount of data you can get back from ccs_get() -
> it's a couple hundred bytes. So embedding an entire script won't work,
> but a short script like the one you made should work.

And you can develop sl-scripts independently of the cluster.conf, so you
don't need a new version number every time you change an sl-file. Besides,
you could build up libraries (one example is follow-service) for general
use.

> > 2. I found that the sl-function nodes_online() also reports a node as
> > online if it is in the cluster but has no rgmanager running. For me it
> > worked to change the line at rgmanager/src/daemons/slang_event.c:606:
> > -       if (membership->cml_members[i].cn_member &&
> > +       if (membership->cml_members[i].cn_member > 0 &&
> > But I'm not sure if this is right. For me it worked perfectly well ;-).
>
> That's strange... I'll look at that. That *needs* to work. :)

Right, that should not make a difference, should it. ;-)

> > Next, I reimplemented your example on follow-service and made it more
> > general.
>
> I'll take a look at it. Mine was really a PoC / example. If yours is
> better, then we should document it and put it up on the cluster wiki at
> some point as an example of how to make rgmanager do backflips.
I just thought to make it more like a library. And I also took the failover
domains into account when returning the nodes which are capable of running
the service in question.

> > Some cases might still not be handled, but all my tests (which were not
> > too many up to now) didn't show any problems. I will hand it over to
> > the SAP guys this week to let them see if this suits their requirements
> > for master/slave queue replication (example attached).
>
> :) Good good.
>
> -- Lon

Marc.
-- 
Gruss / Regards,
Marc Grimme
http://www.atix.de/ http://www.open-sharedroot.org/

^ permalink raw reply [flat|nested] 10+ messages in thread
* [Cluster-devel] rind-0.8.1 patch
  2008-02-06  9:03 ` Marc Grimme
@ 2008-02-06 17:01   ` Lon Hohberger
  2008-02-06 17:22     ` Lon Hohberger
                        ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread

From: Lon Hohberger @ 2008-02-06 17:01 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Wed, 2008-02-06 at 10:03 +0100, Marc Grimme wrote:
> On Tuesday 05 February 2008 18:58:25 Lon Hohberger wrote:
> > <events search_path="/usr/share/cluster:/usr/local/cluster:..." />
> > <!-- for example -->
> > ...
> > </events>
>
> Ah, got it. I wasn't aware of evalfile. But wouldn't file tags work
> around the search-path problem and be pretty easy to implement?

I don't see search paths as a problem, and in fact, I might not have to
fix it anyway (yay!). Turns out, this works, too (I thought it didn't for
some reason):

evalfile("/tmp/lon.sl");
lon_function();

/tmp/lon.sl:

evalfile("/root/foo.sl");
define lon_function()
{
    foo_function();
    printf("Hello, world!\n");
}

/root/foo.sl:

define foo_function()
{
    foo_function();
    printf("Test\n");
}

> > (However, I don't consider this critical...)
>
> It's not critical, but it could help make the development of those
> sl-files more general.

Given that absolute paths also work, does this satisfy the requirement?
I really can't see adding more parsing code for something S-Lang already
does.

I mean, it's not -that- hard to add, but it's kind of pointless to do:

<event>
    <file "/rgmanager/slang-scripts/foo1.sl"/>
    <file "/rgmanager/slang-scripts/foo2.sl"/>
    script_body();
</event>

instead of:

<event>
    evalfile("/rgmanager/slang-scripts/foo1.sl");
    evalfile("/rgmanager/slang-scripts/foo2.sl");
    script_body();
</event>

> > Note that the reason I was calling external scripts is that there's a
> > limit in ccsd on the amount of data you can get back from ccs_get() -
> > it's a couple hundred bytes. So embedding an entire script won't work,
> > but a short script like the one you made should work.
> And you can develop sl-scripts independently of the cluster.conf, so
> you don't need a new version number every time you change an sl-file.
> Besides, you could build up libraries (one example is follow-service)
> for general use.

That's also a benefit (and using evalfile() in your code instead of
embedding the equivalent in cluster.conf also coincides with this).

> > > +       if (membership->cml_members[i].cn_member > 0 &&
> > > But I'm not sure if this is right. For me it worked perfectly well
> > > ;-).
> >
> > That's strange... I'll look at that. That *needs* to work. :)
>
> Right, that should not make a difference, should it. ;-)

Definitely not. :)

One thing I think is missing is intelligence about event collapsing in
default_event_handler. For example, if a service fails and you restart
it, but the restart fails, so you move it to another node (all in a
single event handler execution), we get 5-ish events for that:

* failure event
* stopped event
* start event
* stopped event
* start event

The middle 3 events become irrelevant. We could fix it in
default_event_script.sl by checking the current state and, if the current
state doesn't match the event, throwing it out. (I think throwing them
out for user-defined event scripts is a bad idea, however, which is why I
suggested changing it in default_event_handler.)

This could also be a good 'library' function (as could several of the
functions in default_event_handler.sl).

-- Lon

^ permalink raw reply [flat|nested] 10+ messages in thread
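[Editorial example] The stale-event check Lon describes - discard a queued service event when the service's current state no longer matches it - can be sketched as follows. This is Python pseudocode of the idea only; rgmanager's real event structures and state names differ:

```python
# Sketch of the event-collapsing idea: before handling queued service events,
# drop any whose embedded service state no longer matches the service's
# *current* state, and collapse immediate duplicates.  Event and state names
# here are illustrative, not rgmanager's.

def collapse_stale(events, current_state):
    """events: list of (event_type, reported_state) tuples, oldest first.
    Returns only the events still consistent with current_state."""
    kept = []
    for etype, estate in events:
        if estate != current_state:
            continue                          # stale: reality has moved on
        if kept and kept[-1] == (etype, estate):
            continue                          # collapse immediate duplicates
        kept.append((etype, estate))
    return kept

# The fail/stop/start/stop/start sequence from the mail:
queue = [("service_failure", "failed"),
         ("service_stop",    "stopped"),
         ("service_start",   "started"),
         ("service_stop",    "stopped"),
         ("service_start",   "started")]
# With the service now 'started', only one start event survives:
# collapse_stale(queue, "started") == [("service_start", "started")]
```

Note that this simple filter also drops the initial failure event, which user-defined scripts may still want to see - consistent with Lon's point that the collapsing belongs in the default handler only.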
* [Cluster-devel] rind-0.8.1 patch
  2008-02-06 17:01 ` Lon Hohberger
@ 2008-02-06 17:22   ` Lon Hohberger
  2008-02-06 19:18   ` Marc Grimme
  2008-02-07  8:38   ` Marc Grimme
  2 siblings, 0 replies; 10+ messages in thread

From: Lon Hohberger @ 2008-02-06 17:22 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Wed, 2008-02-06 at 12:01 -0500, Lon Hohberger wrote:
> /root/foo.sl:
>
> define foo_function()
> {
>     foo_function();
>     printf("Test\n");
> }

Typo... :)

-- Lon

^ permalink raw reply [flat|nested] 10+ messages in thread
* [Cluster-devel] rind-0.8.1 patch
  2008-02-06 17:01 ` Lon Hohberger
  2008-02-06 17:22 ` Lon Hohberger
@ 2008-02-06 19:18 ` Marc Grimme
  2008-02-07  8:38 ` Marc Grimme
  2 siblings, 0 replies; 10+ messages in thread

From: Marc Grimme @ 2008-02-06 19:18 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Wednesday 06 February 2008 18:01:34 Lon Hohberger wrote:
> On Wed, 2008-02-06 at 10:03 +0100, Marc Grimme wrote:
> > On Tuesday 05 February 2008 18:58:25 Lon Hohberger wrote:
> > > <events search_path="/usr/share/cluster:/usr/local/cluster:..." />
> > > <!-- for example -->
> > > ...
> > > </events>
> >
> > Ah, got it. I wasn't aware of evalfile. But wouldn't file tags work
> > around the search-path problem and be pretty easy to implement?
>
> I don't see search paths as a problem, and in fact, I might not have to
> fix it anyway (yay!). Turns out, this works, too (I thought it didn't
> for some reason):
>
> evalfile("/tmp/lon.sl");
> lon_function();

And evalfile could also be used in sl-files, I suppose.

> /tmp/lon.sl:
>
> evalfile("/root/foo.sl");
> define lon_function()
> {
>     foo_function();
>     printf("Hello, world!\n");
> }
>
> /root/foo.sl:
>
> define foo_function()
> {
>     foo_function();
>     printf("Test\n");
> }
>
> > > (However, I don't consider this critical...)
> >
> > It's not critical, but it could help make the development of those
> > sl-files more general.
>
> Given that absolute paths also work, does this satisfy the requirement?
> I really can't see adding more parsing code for something S-Lang already
> does.
>
> I mean, it's not -that- hard to add, but it's kind of pointless to do:
>
> <event>
>     <file "/rgmanager/slang-scripts/foo1.sl"/>
>     <file "/rgmanager/slang-scripts/foo2.sl"/>
>     script_body();
> </event>
>
> instead of:
>
> <event>
>     evalfile("/rgmanager/slang-scripts/foo1.sl");
>     evalfile("/rgmanager/slang-scripts/foo2.sl");
>     script_body();
> </event>

Yes, this way is better, I agree.
> > > Note that the reason I was calling external scripts is that there's
> > > a limit in ccsd on the amount of data you can get back from
> > > ccs_get() - it's a couple hundred bytes. So embedding an entire
> > > script won't work, but a short script like the one you made should
> > > work.
> >
> > And you can develop sl-scripts independently of the cluster.conf, so
> > you don't need a new version number every time you change an sl-file.
> > Besides, you could build up libraries (one example is follow-service)
> > for general use.
>
> That's also a benefit (and using evalfile() in your code instead of
> embedding the equivalent in cluster.conf also coincides with this).
>
> > > > +       if (membership->cml_members[i].cn_member > 0 &&
> > > > But I'm not sure if this is right. For me it worked perfectly well
> > > > ;-).
> > >
> > > That's strange... I'll look at that. That *needs* to work. :)
> >
> > Right, that should not make a difference, should it. ;-)
>
> Definitely not. :)
>
> One thing I think is missing is intelligence about event collapsing in
> default_event_handler. For example, if a service fails and you restart
> it, but the restart fails, so you move it to another node (all in a
> single event handler execution), we get 5-ish events for that:
>
> * failure event
> * stopped event
> * start event
> * stopped event
> * start event
>
> The middle 3 events become irrelevant. We could fix it in
> default_event_script.sl by checking the current state and, if the
> current state doesn't match the event, throwing it out. (I think
> throwing them out for user-defined event scripts is a bad idea, however,
> which is why I suggested changing it in default_event_handler.)
>
> This could also be a good 'library' function (as could several of the
> functions in default_event_handler.sl).

Yes, and it would therefore be usable with evalfile and everything. But I
like those event scripts, especially when they don't complicate the
cluster.conf files.

> -- Lon

Marc.
-- 
Gruss / Regards,
Marc Grimme
http://www.atix.de/ http://www.open-sharedroot.org/

^ permalink raw reply [flat|nested] 10+ messages in thread
* [Cluster-devel] rind-0.8.1 patch
  2008-02-06 17:01 ` Lon Hohberger
  2008-02-06 17:22 ` Lon Hohberger
  2008-02-06 19:18 ` Marc Grimme
@ 2008-02-07  8:38 ` Marc Grimme
  2008-02-08 20:56   ` Lon Hohberger
  2 siblings, 1 reply; 10+ messages in thread

From: Marc Grimme @ 2008-02-07 8:38 UTC (permalink / raw)
To: cluster-devel.redhat.com

Something else I was thinking about while playing with those things:

1. Why are USER, CONFIG and MIGRATION events not yet being passed? It
could be quite interesting to trigger on those as well.

2. And wouldn't it be a good idea to be able to call some kind of
higher-level OS script? I thought it might then be possible to generate a
more dynamic failover domain, for example one that prioritizes the
lowest-loaded node. That can be quite nice when you have services or VMs
which produce very high load.

Marc.
-- 
Gruss / Regards,
Marc Grimme
http://www.atix.de/ http://www.open-sharedroot.org/

^ permalink raw reply [flat|nested] 10+ messages in thread
* [Cluster-devel] rind-0.8.1 patch
  2008-02-07  8:38 ` Marc Grimme
@ 2008-02-08 20:56   ` Lon Hohberger
  0 siblings, 0 replies; 10+ messages in thread

From: Lon Hohberger @ 2008-02-08 20:56 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Thu, 2008-02-07 at 09:38 +0100, Marc Grimme wrote:
> Something else I was thinking about while playing with those things:
> 1. Why are USER, CONFIG and MIGRATION events not yet being passed? It
> could be quite interesting to trigger on those as well.

USER + CONFIG are being passed to the event handlers in CVS; you just
can't define events off of them in the configuration currently. I think
what we have right now is plenty for blowing your own foot off, but we
certainly could add those.

Virtual machine requests (e.g. clusvcadm -M) aren't going out with 5.2
for central_processing.

> 2. And wouldn't it be a good idea to be able to call some kind of
> higher-level OS script?

I disagree here, sort of:

* I don't think the possibility of lots of fork/execs while trying to
determine service placement after a failure is a great idea. We want to
be as neutral as we can during this situation.

A really low-impact script interface that reorders a node list might be
okay, i.e.:

node_list = external_reorder("my_script", old_node_list);

I suppose it's kind of like shuffle(), but with intelligence. That
script could then sort the node IDs by whatever criteria it wanted.

As for processing events in external scripts, I disagree fairly strongly:

* The data rgmanager is currently using to make decisions (e.g.
configuration info such as failover domains, service recovery policies,
and extended stuff which you can randomly add) is difficult to access
from shell scripts.

* Internal rgmanager operations (flipping service states, for example)
can't be done from outside rgmanager in a sane way.

> I thought it might then be possible to generate a more dynamic
> failover domain.

Agreed.
> For example one that prioritizes the lowest-loaded node. That can be
> quite nice when you have services or VMs which produce very high load.

There are lots of kinds of load:

* memory pressure
* cpu load
* run queue average length (the 'uptime' load)
* i/o bandwidth to shared storage
* network bandwidth

I'd recommend that whatever load monitoring we care about be done
proactively. That is, have something publish current load states
periodically, and have the data 'already there' - so that in the event of
a failure, we can just act on what is known, rather than asking around
for various pieces of data.

We're getting a little far out, though - does what's in CVS work for
doing the 'follows' logic or not? :)

-- Lon

^ permalink raw reply [flat|nested] 10+ messages in thread
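[Editorial example] Lon's external_reorder() idea - hand an existing node list to a small hook that reorders it by externally published load data - might look roughly like this. The function name comes from his example; everything else (the metric table, the sorting policy) is an assumption for illustration:

```python
# Sketch of an external_reorder()-style hook: sort candidate node IDs by
# load metrics that were published proactively, so nothing expensive runs
# during failure recovery.  The published_load table is hypothetical;
# rgmanager defines no such structure.

def reorder_by_load(node_ids, published_load, default=float("inf")):
    """Return node IDs sorted ascending by last published load.
    Nodes with no published data sort last (we know nothing about them)."""
    return sorted(node_ids, key=lambda n: published_load.get(n, default))

# Load states published periodically, 'already there' at failure time:
load = {1: 0.75, 2: 0.10, 3: 0.42}
# reorder_by_load([1, 2, 3, 4], load) == [2, 3, 1, 4]
```

This matches the proactive-monitoring point above: at failure time, the hook only sorts data that is already known; it never probes the cluster.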
* [Cluster-devel] rind-0.8.1 patch
  2008-02-04 17:41 ` Marc Grimme
  2008-02-05 17:58 ` Lon Hohberger
@ 2008-02-14 21:56 ` Lon Hohberger
  1 sibling, 0 replies; 10+ messages in thread

From: Lon Hohberger @ 2008-02-14 21:56 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Mon, 2008-02-04 at 18:41 +0100, Marc Grimme wrote:
> 2. I found that the sl-function nodes_online() also reports a node as
> online if it is in the cluster but has no rgmanager running. For me it
> worked to change the line at rgmanager/src/daemons/slang_event.c:606:
> -       if (membership->cml_members[i].cn_member &&
> +       if (membership->cml_members[i].cn_member > 0 &&

This is really strange -- I just retested this and it worked for me
(unmodified). I wonder if there's something I'm missing. The differing
behavior would normally indicate an uninitialized variable, but I don't
see where it could be (get_member_list() memsets the membership list).

-- Lon

^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads: [~2008-02-14 21:56 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-11-30 16:49 [Cluster-devel] rind-0.8.1 patch Lon Hohberger
2008-02-04 17:41 ` Marc Grimme
2008-02-05 17:58   ` Lon Hohberger
2008-02-06  9:03     ` Marc Grimme
2008-02-06 17:01       ` Lon Hohberger
2008-02-06 17:22         ` Lon Hohberger
2008-02-06 19:18         ` Marc Grimme
2008-02-07  8:38         ` Marc Grimme
2008-02-08 20:56           ` Lon Hohberger
2008-02-14 21:56 ` Lon Hohberger