From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mx1.redhat.com (ext-mx04.extmail.prod.ext.phx2.redhat.com [10.5.110.8])
	by int-mx01.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id o297rBZ3018415
	for ; Tue, 9 Mar 2010 02:53:11 -0500
Received: from qw-out-2122.google.com (qw-out-2122.google.com [74.125.92.24])
	by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id o297quBq019884
	for ; Tue, 9 Mar 2010 02:52:56 -0500
Received: by qw-out-2122.google.com with SMTP id 8so1557554qwh.39
	for ; Mon, 08 Mar 2010 23:52:55 -0800 (PST)
MIME-Version: 1.0
Date: Tue, 9 Mar 2010 15:52:55 +0800
Message-ID: <1cafab771003082352qbce64e2idf24cddbe30c4e55@mail.gmail.com>
From: Xinwei Hu
Subject: [linux-lvm] vgscan fails when other nodes quit cleanly.
Reply-To: LVM general discussion and development
List-Id: LVM general discussion and development
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: LVM general discussion and development

Hi all,

Here's an interesting issue. When we shut down the cluster stack
cleanly, all lvm commands fail to grab the global lock, like this:

--->8----
sys3:~ # vgscan
  cluster request failed: Host is down
  Unable to obtain global lock.
---8<----

I went through the code history a bit. It seems to be caused by
e65ffb8e, which I think was intended for gulm only.

--->8----
commit e65ffb8e687bbce4e7edff70ebff2b3f1c0b6157
Author: Christine Caulfield
Date:   Fri Jun 20 10:58:28 2008 +0000

    Make clvmd return immediately if other nodes are down in a gulm cluster.
    bz#447799

diff --git a/WHATS_NEW b/WHATS_NEW
index ec7ff54..023659e 100644
--- a/WHATS_NEW
+++ b/WHATS_NEW
@@ -1,5 +1,6 @@
 Version 2.02.39 -
 ================================
+  Make clvmd return immediately if other nodes are down in a gulm cluster.
   Improve/Fix read ahead 'auto' calculation for stripe_size
   Fix lvchange output for -r auto setting if auto is already set
   Add testcase for read ahead
diff --git a/daemons/clvmd/clvmd-gulm.c b/daemons/clvmd/clvmd-gulm.c
index 3a230b5..a2f2148 100644
--- a/daemons/clvmd/clvmd-gulm.c
+++ b/daemons/clvmd/clvmd-gulm.c
@@ -665,6 +665,7 @@ static int _cluster_do_node_callback(struct local_client *master_client,
 {
     struct dm_hash_node *hn;
     struct node_info *ninfo;
+    int somedown = 0;
 
     dm_hash_iterate(hn, node_hash)
     {
@@ -686,12 +687,14 @@ static int _cluster_do_node_callback(struct local_client *master_client,
 	    client = dm_hash_lookup_binary(sock_hash, csid, GULM_MAX_CSID_LEN);
 	}
+	DEBUGLOG("down_callback2. node %s, state = %d\n", ninfo->name, ninfo->state);
 	if (ninfo->state != NODE_DOWN)
 	    callback(master_client, csid, ninfo->state == NODE_CLVMD);
-
+	if (ninfo->state != NODE_CLVMD)
+	    somedown = -1;
     }
-    return 0;
+    return somedown;
 }
 
 /* Convert gulm error codes to unix errno numbers */
---8<----

clvmd-corosync.c was copied from clvmd-openais.c, which in turn came
from clvmd-gulm.c, so it carries the same change. I'd suggest removing
this patch from both clvmd-corosync and clvmd-gulm.

Any comments?

Thanks.
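
P.S. To make the effect of the quoted patch concrete, here is a minimal
standalone sketch of the patched loop. It is not the clvmd code: the
node array, do_node_callback() and request_global_lock() are
hypothetical stand-ins, and mapping the -1 result to EHOSTDOWN (the
errno whose message is "Host is down") is my assumption about how the
failure reaches the command line.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the node states named in the patch. */
enum node_state { NODE_DOWN, NODE_UP, NODE_CLVMD };

struct node_info {
    const char *name;
    enum node_state state;
};

/*
 * Sketch of the patched loop: every node that is not down still gets
 * the callback, but the function now reports -1 as soon as any node is
 * in a state other than NODE_CLVMD, including nodes that left cleanly.
 */
static int do_node_callback(const struct node_info *nodes, size_t n,
                            void (*callback)(const struct node_info *, int node_up))
{
    int somedown = 0;

    for (size_t i = 0; i < n; i++) {
        if (nodes[i].state != NODE_DOWN && callback)
            callback(&nodes[i], nodes[i].state == NODE_CLVMD);
        if (nodes[i].state != NODE_CLVMD)
            somedown = -1;
    }
    return somedown;
}

/*
 * Hypothetical caller (assumption, not taken from clvmd): treat a
 * negative result as "some host is down" and refuse the lock request.
 */
static int request_global_lock(const struct node_info *nodes, size_t n)
{
    if (do_node_callback(nodes, n, NULL) < 0)
        return EHOSTDOWN;
    return 0;
}
```

With every node in NODE_CLVMD the request succeeds. As soon as one node
is NODE_DOWN (or merely NODE_UP without clvmd), the sketch returns
EHOSTDOWN, which matches the symptom we see after a clean shutdown of
the other nodes.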