From mboxrd@z Thu Jan 1 00:00:00 1970
From: Fabio M. Di Nitto
Date: Thu, 19 Nov 2009 23:01:01 +0100
Subject: [Cluster-devel] Re: [Debian-ha-maintainers] again: "redhat-cluster: services are not relocated when a node fails"
In-Reply-To: <20091119124739.GA30480@bogon.sigxcpu.org>
References: <20091119124739.GA30480@bogon.sigxcpu.org>
Message-ID: <4B05C01D.8040603@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Guido Günther wrote:
> Hi Ernesto,
> On Wed, Nov 18, 2009 at 02:30:57PM +0100, Ernesto Rodriguez Reina wrote:
>> Hi everyone!
>>
>> I recently start using RHCS for a project I'm working on but I found
>> that RHCS2 in Debian Lenny do not relocate services when a node fails.
>> I found the thread [1] where Guido Günther says that this problem was
>> solved on RHCS 3.0.2. Then I downloaded and installed RHCS 3.0.4 (the
>> deb packages from debian mirror) and reproduced the experiment of
>> Martin Waite and again the service was not relocated on node fail.
>> Does someone had make it work as it should in Debian? Martin, or Guido
>> or anybody can you please help me to find out why it is not working as
>> it should?
> I checked with RHCS 3.0.4 as it's currently in unstable rebuilt for
> Lenny. The kernel enters a soft lock after I shut off one node (see
> attached log) and no resource takeover happens. Fabione, any idea what
> triggers this?

Since you guys are running cluster 3.0.4, please do the following:

1) add in cluster.conf

...

2) reproduce the above scenario, then collect all the logs, from all
daemons, from all nodes from /var/log/cluster (this is upstream default,
check with Debian if they have changed it please).

Then I'd like to see your cluster.conf and have a better idea on how a
node is "killed".

If cluster.conf contains sensitive data such as passwords, either blank
them or send the file to me only.
I'll keep it confidential but please do NOT randomly mangle the
configuration to hide bits.

The recovery operation is strictly dependent on different things. The
configuration and the logs should be able to tell us something.

Thanks
Fabio
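
For the log collection in step 2, something along these lines could be run on each node. This is only a minimal Python sketch and not part of the cluster tooling: the function name `bundle_cluster_logs` and the archive naming are made up for illustration, and `/var/log/cluster` is just the upstream default mentioned above, so adjust it if Debian uses a different location.

```python
import socket
import tarfile
from pathlib import Path


def bundle_cluster_logs(log_dir="/var/log/cluster", out_dir="."):
    """Pack everything under log_dir into <hostname>-cluster-logs.tar.gz.

    Hypothetical helper: the name and the archive layout are assumptions.
    out_dir should not be inside log_dir, or the archive would try to
    include itself.
    """
    log_path = Path(log_dir)
    if not log_path.is_dir():
        raise FileNotFoundError(f"log directory not found: {log_path}")

    host = socket.gethostname()
    archive = Path(out_dir) / f"{host}-cluster-logs.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        # Prefix every entry with the hostname so that archives from
        # different nodes can be unpacked side by side without clashing.
        tar.add(log_path, arcname=f"{host}/cluster")

    return archive
```

Calling `bundle_cluster_logs()` as root on every node should leave one archive per node in the current directory; cluster.conf still has to be sent separately, unmodified apart from blanked passwords.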