From: Fabio Massimo Di Nitto <fabbione@ubuntu.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] qdiskd hangs cluster activity
Date: Tue, 30 Oct 2007 14:39:02 +0100
Message-ID: <472733F6.8070702@ubuntu.com>
In-Reply-To: <47258982.9060400@ubuntu.com>

The culprit is a patch missing from the openais 0.82 release:

https://bugzilla.redhat.com/show_bug.cgi?id=314641

Fabio

Fabio Massimo Di Nitto wrote:
> Hi Lon,
>
> I found a very interesting bug that manages to hang the entire cluster.
>
> Setup is a 3-node cluster with nothing fancy running at all (I will be able
> to show it to you next Wednesday, as it lives on my laptop ;)).
>
> <quorumd label="test1">
> <heuristic program="ping 192.168.1.1 -c1 -t1" score="1" interval="2" tko="3"/>
> </quorumd>
>
> test1 is a 1 GB AoE device shared between the 3 nodes.
>
> The cluster starts without problems. After firing up qdiskd -f -d:
>
> qdiskd -f -d
> [12681] debug: Loading configuration information
> [12681] debug: Heuristic: 'ping 192.168.1.1 -c1 -t1' score=1 interval=2 tko=3
> [12681] debug: 1 heuristics loaded
> [12681] debug: Quorum Daemon: 1 heuristics, 1 interval, 10 tko, 0 votes
> open_partition: seek: Invalid argument
> qdisk_validate: open of /dev/sda2 for RDWR failed: Illegal seek
> qdisk_verify: Illegal seek
> [12681] info: Quorum Partition: /dev/etherd/e1.0 Label: test1
> [12681] info: Quorum Daemon Initializing
> [12682] info: Heuristic: 'ping 192.168.1.1 -c1 -t1' UP
> [12681] debug: Node 2 is UP
> [12681] debug: Node 3 is UP
> [12681] info: Initial score 1/1
> [12681] info: Initialization complete
> [12681] notice: Score sufficient for master operation (1/1; required=1); upgrading
> [12681] debug: Making bid for master
> [12681] info: Assuming master role
>
> A few seconds after the node assumes the master role, it hangs. The others
> follow within seconds.
>
> aisexec is stalled in recv(..
>
> No way to recover. kill -9 all over is required.
>
> Attached is a qdiskd strace from all 3 nodes, started at exactly the same
> time.
>
> Fabio
>
> PS I wonder if we are hitting this:
>
> from qdisk/disk.c:
>
> /*
> * All IOs must be of size which is a multiple of 512. Here we
> * just add in enough extra to accommodate.
> * XXX - if the on-disk offsets don't provide enough room we're cooked!
> */
>
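
For reference, the size adjustment that the disk.c comment describes
amounts to rounding an I/O length up to the next multiple of 512. Below is
a minimal sketch in C of that idea only; it is not the actual qdiskd code,
and the names round_up_512 and SECTOR_SIZE are hypothetical, chosen here
for illustration.

```c
#include <stddef.h>

/* Assumed block size, taken from the "multiple of 512" in the comment. */
#define SECTOR_SIZE 512

/*
 * Hypothetical helper: round len up to the next multiple of SECTOR_SIZE.
 * A length that is already a multiple of SECTOR_SIZE is returned unchanged.
 */
static size_t round_up_512(size_t len)
{
    size_t rem = len % SECTOR_SIZE;

    return rem == 0 ? len : len + (SECTOR_SIZE - rem);
}
```

With this, a 1-byte request is padded to 512 bytes and a 513-byte request
to 1024. Whether the on-disk offsets leave room for that padding is the
concern raised by the XXX note.
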
--
I'm going to make him an offer he can't refuse.