From: Fabio Massimo Di Nitto <fabbione@ubuntu.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] qdiskd hangs cluster activity
Date: Mon, 29 Oct 2007 08:19:30 +0100
Message-ID: <47258982.9060400@ubuntu.com>
Hi Lon,
I found a very interesting bug that manages to hang the entire cluster.
Setup is a 3-node cluster, with no fancy stuff running at all (I will be able
to show it to you next Wednesday as it lives on my laptop ;)).
<quorumd label="test1">
<heuristic program="ping 192.168.1.1 -c1 -t1" score="1" interval="2" tko="3"/>
</quorumd>
test1 is a 1GB AoE device shared between the 3 nodes.
The cluster starts without problems. After firing up qdiskd -f -d:
qdiskd -f -d
[12681] debug: Loading configuration information
[12681] debug: Heuristic: 'ping 192.168.1.1 -c1 -t1' score=1 interval=2 tko=3
[12681] debug: 1 heuristics loaded
[12681] debug: Quorum Daemon: 1 heuristics, 1 interval, 10 tko, 0 votes
open_partition: seek: Invalid argument
qdisk_validate: open of /dev/sda2 for RDWR failed: Illegal seek
qdisk_verify: Illegal seek
[12681] info: Quorum Partition: /dev/etherd/e1.0 Label: test1
[12681] info: Quorum Daemon Initializing
[12682] info: Heuristic: 'ping 192.168.1.1 -c1 -t1' UP
[12681] debug: Node 2 is UP
[12681] debug: Node 3 is UP
[12681] info: Initial score 1/1
[12681] info: Initialization complete
[12681] notice: Score sufficient for master operation (1/1; required=1); upgrading
[12681] debug: Making bid for master
[12681] info: Assuming master role
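For reference, the "required=1" in the log above is consistent with the default described in qdisk(5): when min_score is not set, the minimum is floor((n+1)/2), where n is the sum of the heuristic scores. A minimal sketch of that arithmetic follows. The function names are hypothetical and are not taken from the qdiskd source.

```c
/* Hypothetical helper, not from qdiskd: default minimum score when
 * min_score is unset, floor((n + 1) / 2), with n the sum of all
 * heuristic scores (assumed non-negative). */
int default_min_score(int total_score)
{
    return (total_score + 1) / 2;
}

/* Hypothetical helper: returns 1 if the current score meets the default
 * minimum, 0 otherwise. */
int score_sufficient(int score, int total_score)
{
    return score >= default_min_score(total_score);
}
```

With the single heuristic above (score=1), default_min_score(1) is 1, so a score of 1/1 is sufficient and the node goes on to bid for master.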
A few seconds after the node assumes the master role, it hangs. The others
follow within seconds.
aisexec is stalled in recv(..
There is no way to recover; kill -9 everywhere is required.
Attached is a qdiskd strace from all 3 nodes, started at the exact same
time.
Fabio
PS I wonder if we are hitting this:
from qdisk/disk.c:
/*
* All IOs must be of size which is a multiple of 512. Here we
* just add in enough extra to accommodate.
* XXX - if the on-disk offsets don't provide enough room we're cooked!
*/
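To make the concern in that comment concrete, here is a minimal sketch of rounding an IO length up to a multiple of 512 and checking whether the padded IO still fits before the next on-disk offset. The names and the layout check are assumptions for illustration only; this is not the code in qdisk/disk.c.

```c
#include <stddef.h>

/* Sector size assumed from the "multiple of 512" comment. */
#define SECTOR_SIZE 512u

/* Hypothetical helper: round len up to the next multiple of 512. */
size_t round_up_to_sector(size_t len)
{
    return (len + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
}

/* Hypothetical check: returns 1 if a padded IO of len bytes starting at
 * offset ends at or before next_offset (the start of the following
 * on-disk block), 0 if the padding would spill into that block. */
int padded_io_fits(size_t offset, size_t len, size_t next_offset)
{
    return offset + round_up_to_sector(len) <= next_offset;
}
```

For example, a 513-byte structure is padded to 1024 bytes, so it would not fit in a slot whose next block starts 512 bytes later. That is the "cooked" case the XXX note warns about.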
--
I'm going to make him an offer he can't refuse.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: qdisk.logs.tar.bz2
Type: application/x-bzip
Size: 10859 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20071029/300ad994/attachment.bin>