From: Joel Becker <Joel.Becker@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] shutdown by o2net_idle_timer causes Xen to hang
Date: Wed, 18 Mar 2009 10:07:13 -0700
Message-ID: <20090318170713.GB23209@mail.oracle.com>
In-Reply-To: <8C38B97A-7440-4F01-ADFE-FCF3777D6486@zeec.biz>

On Wed, Mar 18, 2009 at 12:17:36PM +0100, David Winter wrote:
> Hello,
> 
> we've had some serious trouble with a two-node Xen-based OCFS2  
> cluster. In brief: we had two incidents where one node detects an idle  
> timeout and shuts the other node down which causes the other node and  
> the Dom0 to hang. Both times this could only be resolved by rebooting  
> the whole machine using the built-in IPMI card.
> 
> All machines (including the other DomUs) run CentOS 5.2 and the OCFS2
> nodes use ocfs2-tools-1.4.1-1.el5 and
> ocfs2-2.6.18-92.1.13.el5xen-1.4.1-1.el5.
> 
> Unfortunately, not much of relevance was logged, except for the
> /var/log/messages of the node that issued the shutdown (see below) and
> the nearly five-hour gap in the logs of the other node.

	Just to clarify, the o2cb stack doesn't shut down other nodes.
Nodes can only self-fence.  The 'shutting it down' message in the logs
is about the connection.  In other words, cod-2 is already hanging.
ugc-1 notices and closes the network connection.
	So you want to figure out why cod-2 hung or crashed.  Sunil is
right that you'll want netconsole for a better idea of what's going on.
We can't diagnose cod-2 from this information.
	If your dom0 is hanging, that's a separate issue.  A hanging
domU, no matter the cause, shouldn't hang dom0.

Joel

-- 

"Sometimes when reading Goethe I have the paralyzing suspicion
 that he is trying to be funny."
         - Guy Davenport

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

Thread overview:
2009-03-18 11:17 [Ocfs2-devel] shutdown by o2net_idle_timer causes Xen to hang David Winter
2009-03-18 12:05 ` Sunil Mushran
2009-03-18 17:07 ` Joel Becker [this message]
