From mboxrd@z Thu Jan 1 00:00:00 1970
From: Fabio Massimo Di Nitto
Date: Mon, 29 Oct 2007 08:19:30 +0100
Subject: [Cluster-devel] qdiskd hangs cluster activity
Message-ID: <47258982.9060400@ubuntu.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi Lon,

I found a very interesting bug that manages to hang the entire cluster.

The setup is a 3-node cluster with no fancy stuff running at all (I will be
able to show it to you next Wed, as it lives on my laptop ;)).

test1 is a 1GB AOE device shared between the 3 nodes.

The cluster starts without problems.

After firing up qdiskd -f -d:

qdiskd -f -d
[12681] debug: Loading configuration information
[12681] debug: Heuristic: 'ping 192.168.1.1 -c1 -t1' score=1 interval=2 tko=3
[12681] debug: 1 heuristics loaded
[12681] debug: Quorum Daemon: 1 heuristics, 1 interval, 10 tko, 0 votes
open_partition: seek: Invalid argument
qdisk_validate: open of /dev/sda2 for RDWR failed: Illegal seek
qdisk_verify: Illegal seek
[12681] info: Quorum Partition: /dev/etherd/e1.0 Label: test1
[12681] info: Quorum Daemon Initializing
[12682] info: Heuristic: 'ping 192.168.1.1 -c1 -t1' UP
[12681] debug: Node 2 is UP
[12681] debug: Node 3 is UP
[12681] info: Initial score 1/1
[12681] info: Initialization complete
[12681] notice: Score sufficient for master operation (1/1; required=1); upgrading
[12681] debug: Making bid for master
[12681] info: Assuming master role

A few seconds after the node assumes the master role, it hangs. The others
follow within seconds.

aisexec is stalled in recv(..

There is no way to recover; kill -9 on everything is required.

Attached is a qdiskd strace from all 3 nodes, started at exactly the same
time.

Fabio

PS I wonder if we are hitting this:

from qdisk/disk.c:

        /*
         * All IOs must be of size which is a multiple of 512.  Here we
         * just add in enough extra to accommodate.
         * XXX - if the on-disk offsets don't provide enough room we're cooked!
         */

--
I'm going to make him an offer he can't refuse.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: qdisk.logs.tar.bz2
Type: application/x-bzip
Size: 10859 bytes
Desc: not available
URL:
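
To make the concern in the PS concrete, here is a minimal sketch of the kind of rounding the quoted comment describes, and of the case its "XXX" note warns about. This is not the actual qdisk/disk.c code: the names SECTOR_SIZE, round_up_to_sector and padded_io_fits are hypothetical, and the only thing taken from the source is that every I/O size must be a multiple of 512.

```c
#include <stddef.h>

/* Assumed block size; the quoted comment only says "a multiple of 512". */
#define SECTOR_SIZE 512u

/* Hypothetical helper: round len up to the next multiple of SECTOR_SIZE. */
static size_t round_up_to_sector(size_t len)
{
    return (len + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
}

/*
 * Hypothetical check for the "XXX" case: does the padded I/O that starts
 * at `offset` still end at or before `next_offset`, the start of the next
 * on-disk record? Returns 1 if it fits, 0 if the padding would overrun.
 */
static int padded_io_fits(size_t offset, size_t len, size_t next_offset)
{
    return offset + round_up_to_sector(len) <= next_offset;
}
```

With these definitions, a 100-byte record at offset 0 pads to 512 bytes and fits in front of a record at offset 512, while a 513-byte record pads to 1024 bytes and would overrun it. Whether anything like that overrun actually happens here, or has anything to do with the hang, is only a guess at this point.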