From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lon Hohberger
Date: Wed, 29 Feb 2012 18:53:19 -0500
Subject: [Cluster-devel] [PATCH] rgmanager: Retry when config is out of sync [RHEL5]
Message-ID: <1330559599-6220-1-git-send-email-lhh@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

[This patch is already in RHEL5]

If you add a service to rgmanager v1 or v2 and that service fails to
start on the first node but succeeds in its initial stop operation,
there is a chance that the remote instance of rgmanager has not yet
reread the configuration, causing the service to be placed into the
'recovering' state without further action.

This patch causes the originator of the request to retry the
operation.

Later versions of rgmanager (e.g. the STABLE3 branch and derivatives)
are unlikely to have this problem, since configuration updates are not
polled but rather delivered to clients.

Update 22-Feb-2012: The above is incorrect; this was reproduced on an
rgmanager v3 installation.

Resolves: rhbz#796272

Signed-off-by: Lon Hohberger
---
 rgmanager/src/daemons/rg_state.c |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/rgmanager/src/daemons/rg_state.c b/rgmanager/src/daemons/rg_state.c
index 23a4bec..8c5af5b 100644
--- a/rgmanager/src/daemons/rg_state.c
+++ b/rgmanager/src/daemons/rg_state.c
@@ -1801,6 +1801,7 @@ handle_relocate_req(char *svcName, int orig_request, int preferred_target,
 	rg_state_t svcStatus;
 	int target = preferred_target, me = my_id();
 	int ret, x, request = orig_request;
+	int retries;
 
 	get_rg_state_local(svcName, &svcStatus);
 	if (svcStatus.rs_state == RG_STATE_DISABLED ||
@@ -1933,6 +1934,8 @@ handle_relocate_req(char *svcName, int orig_request, int preferred_target,
 		if (target == me)
 			goto exhausted;
 
+		retries = 0;
+retry:
 		ret = svc_start_remote(svcName, request, target);
 		switch (ret) {
 		case RG_ERUN:
@@ -1942,6 +1945,22 @@ handle_relocate_req(char *svcName, int orig_request, int preferred_target,
 				*new_owner = svcStatus.rs_owner;
 			free_member_list(allowed_nodes);
 			return 0;
+		case RG_ENOSERVICE:
+			/*
+			 * Configuration update pending on remote node?  Give it
+			 * a few seconds to sync up.  rhbz#568126
+			 *
+			 * Configuration updates are synchronized in later releases
+			 * of rgmanager; this should not be needed.
+			 */
+			if (retries++ < 4) {
+				sleep(3);
+				goto retry;
+			}
+			logt_print(LOG_WARNING, "Member #%d has a different "
+				   "configuration than I do; trying next "
+				   "member.", target);
+			/* Deliberate */
 		case RG_EDEPEND:
 		case RG_EFAIL:
 			/* Uh oh - we failed to relocate to this node.
-- 
1.7.7.6
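
For readers who want to see the retry pattern from the RG_ENOSERVICE case in
isolation, the following is a minimal standalone sketch, not rgmanager code.
Every identifier prefixed with demo_ or DEMO_ is hypothetical: the return
codes are made-up values, demo_start_remote() is a stub standing in for
svc_start_remote(), and the wait is passed in as a callback so the sketch can
run without sleeping (the patch itself calls sleep(3)). The retry bound of 4
and the deliberate fall-through into the failure case mirror the patch.

```c
/* Hypothetical return codes; these are not the real rgmanager values. */
enum { DEMO_OK = 0, DEMO_ENOSERVICE = 1, DEMO_EFAIL = 2 };

#define DEMO_MAX_RETRIES 4

/*
 * Number of upcoming calls for which the simulated remote node still
 * runs with a stale configuration and does not know the service.
 */
static int demo_calls_until_synced;

/* Hypothetical stub standing in for svc_start_remote(). */
static int demo_start_remote(void)
{
	if (demo_calls_until_synced > 0) {
		demo_calls_until_synced--;
		return DEMO_ENOSERVICE;
	}
	return DEMO_OK;
}

/* No-op wait so the sketch runs instantly; the patch uses sleep(3). */
static void demo_no_wait(void)
{
}

/*
 * Try to start the service on one target.  On DEMO_ENOSERVICE, wait and
 * retry up to DEMO_MAX_RETRIES times, then treat the target as failed.
 * *attempts receives the number of calls made to demo_start_remote().
 */
static int demo_try_target(int *attempts, void (*wait_fn)(void))
{
	int retries = 0;
	int ret;

	*attempts = 0;
retry:
	ret = demo_start_remote();
	(*attempts)++;
	switch (ret) {
	case DEMO_ENOSERVICE:
		if (retries++ < DEMO_MAX_RETRIES) {
			wait_fn();
			goto retry;
		}
		/* fall through */
	case DEMO_EFAIL:
		/* The caller would move on to the next member here. */
		return DEMO_EFAIL;
	default:
		return ret;
	}
}
```

In this sketch the condition retries++ < 4 allows one initial attempt plus
four retries, so a target that never learns about the service is given up on
after five calls. With a 3-second wait between attempts, as in the patch, that
is about 12 seconds of waiting per target before the next member is tried.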