From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ryan O'Hara
Date: Tue, 22 Feb 2011 16:26:49 -0600
Subject: [Cluster-devel] fenced: don't ignore victim_done messages for reduced victims
In-Reply-To: <20110222220127.GA27941@redhat.com>
References: <20110222220127.GA27941@redhat.com>
Message-ID: <20110222222649.GC688@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Looks correct to me. ACK.

On Tue, Feb 22, 2011 at 05:01:27PM -0500, David Teigland wrote:
>
> Needs ACK for RHEL6.
>
>
> When a victim is "reduced" (i.e. fenced skips fencing it because it
> rejoins the cluster cleanly before fenced fences it), it is immediately
> removed from the list of victims, before the "victim_done" message is
> sent for it. The victim_done message updates the time of the last
> successful fencing operation for a failed node.
>
> The code that processes received victim_done messages was ignoring the
> message for the reduced victim because the node couldn't be found in
> the victims list. This caused the latest fencing information to not be
> recorded for the node, causing dlm_controld to wait indefinitely for
> fencing to complete for the reduced victim.
>
> The fix is to simply record the information from a victim_done message
> even if the node is not in the victims list.
>
> bz 678704
>
> Signed-off-by: David Teigland
> ---
>  fence/fenced/cpg.c |   18 ++++++++++++------
>  1 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/fence/fenced/cpg.c b/fence/fenced/cpg.c
> index a8629b9..99e16a0 100644
> --- a/fence/fenced/cpg.c
> +++ b/fence/fenced/cpg.c
> @@ -652,9 +652,9 @@ static void receive_victim_done(struct fd *fd, struct fd_header *hd, int len)
>
>  	node = get_node_victim(fd, id->nodeid);
>  	if (!node) {
> +		/* see comment below about no node */
>  		log_debug("receive_victim_done %d:%u no victim nodeid %d",
>  			  hd->nodeid, seq, id->nodeid);
> -		return;
>  	}
>
>  	log_debug("receive_victim_done %d:%u remove victim %d time %llu how %d",
> @@ -670,9 +670,11 @@ static void receive_victim_done(struct fd *fd, struct fd_header *hd, int len)
>  	if (hd->nodeid == our_nodeid) {
>  		/* sanity check, I don't think this should happen;
>  		   see comment in fence_victims() */
> -		if (!node->local_victim_done)
> -			log_error("expect local_victim_done");
> -		node->local_victim_done = 0;
> +		if (node) {
> +			if (!node->local_victim_done)
> +				log_error("expect local_victim_done");
> +			node->local_victim_done = 0;
> +		}
>  	} else {
>  		/* save details of fencing operation from master, which
>  		   master saves at the time it completes it */
> @@ -680,8 +682,12 @@ static void receive_victim_done(struct fd *fd, struct fd_header *hd, int len)
>  				id->fence_how, id->fence_time);
>  	}
>
> -	list_del(&node->list);
> -	free(node);
> +	/* we can have no node when reduce_victims() removes it, bz 678704 */
> +
> +	if (node) {
> +		list_del(&node->list);
> +		free(node);
> +	}
>  }
>
>  /* we know that the quorum value here is consistent with the cpg events
> --
> 1.7.1.1
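
For anyone reading this without the fenced source at hand, the pattern the patch applies can be sketched as a small standalone model. This is only an illustration, not the fenced code: struct victim, struct domain, MAX_NODES, add_victim(), take_victim(), reduce_victim() and victim_done() are all hypothetical names, and the fixed-size fence_time table is an assumed simplification of however fenced actually stores the last fencing time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_NODES 16	/* hypothetical limit, for illustration only */

/* Hypothetical, simplified stand-in for an entry in the victims list. */
struct victim {
	int nodeid;
	struct victim *next;
};

/* Hypothetical stand-in for the fence domain state. */
struct domain {
	struct victim *victims;		/* nodes still waiting to be fenced */
	uint64_t fence_time[MAX_NODES];	/* last recorded fencing time per nodeid */
};

static void add_victim(struct domain *d, int nodeid)
{
	struct victim *v = malloc(sizeof(*v));

	assert(v != NULL);
	v->nodeid = nodeid;
	v->next = d->victims;
	d->victims = v;
}

/* Unlink and return the victim for nodeid, or NULL if it is not listed. */
static struct victim *take_victim(struct domain *d, int nodeid)
{
	struct victim **pp, *v;

	for (pp = &d->victims; (v = *pp) != NULL; pp = &v->next) {
		if (v->nodeid == nodeid) {
			*pp = v->next;
			return v;
		}
	}
	return NULL;
}

/* Models the "reduced" case: the node rejoined cleanly, so drop it unfenced. */
static void reduce_victim(struct domain *d, int nodeid)
{
	free(take_victim(d, nodeid));
}

/*
 * Models the patched handling of a victim_done message: the fencing time
 * is recorded whether or not the node is still in the victims list.
 * Returns 1 if a victim entry was found and removed, 0 otherwise.
 */
static int victim_done(struct domain *d, int nodeid, uint64_t when)
{
	struct victim *v = take_victim(d, nodeid);
	int found = (v != NULL);

	if (nodeid >= 0 && nodeid < MAX_NODES)
		d->fence_time[nodeid] = when;	/* recorded even when !found */

	free(v);	/* free(NULL) is a no-op */
	return found;
}
```

In this model, calling reduce_victim() and then victim_done() for the same nodeid still updates fence_time, which corresponds to the behaviour the patch is after. The pre-patch code returned early when the lookup failed, so in the terms of this sketch the timestamp would never have been stored.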