From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Teigland
Date: Thu, 13 Oct 2011 10:20:59 -0400
Subject: [Cluster-devel] cluster4 gfs_controld
Message-ID: <20111013142059.GA6704@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Here's the outline of my plan to remove/replace the essential bits of
gfs_controld in cluster4.  I expect it'll go away entirely, but there
could be one or two minor things it would still handle on the side.

kernel dlm/gfs2 will continue to be operable with either
. cluster3 dlm_controld/gfs_controld combination, or
. cluster4 dlm_controld only

Two main things from gfs_controld need replacing:

1. jid allocation, first mounter

cluster3
. both from gfs_controld

cluster4
. jid from dlm-kernel "slots" which will be assigned similarly
. first mounter using a dlm lock in lock_dlm

2. recovery coordination, failure notification

cluster3
. coordination of dlm-kernel/gfs-kernel recovery is done
  indirectly in userspace between dlm_controld/gfs_controld,
  which then toggle sysfs files.
. write("sysfs block", 1) -> block_store(1)
  write("sysfs recover", jid) -> recover_store(jid)
  write("sysfs block", 0) -> block_store(0)

cluster4
. coordination of dlm-kernel/gfs-kernel recovery is done
  directly in kernel using callbacks from dlm-kernel to gfs-kernel.
. gdlm_mount(struct gfs2_sbd *sdp, const char *table, int *first, int *jid)
  calls dlm_recover_register(dlm, &jid, &recover_callbacks)
. gdlm_recover_prep() -> block_store(1)
  gdlm_recover_slot(jid) -> recover_store(jid)
  gdlm_recover_done() -> block_store(0)

cluster3 dlm/gfs recovery
. dlm_controld sees nodedown                      (libcpg)
. gfs_controld sees nodedown                      (libcpg)
. dlm_controld stops dlm-kernel                   (sysfs control 0)
. gfs_controld stops gfs-kernel                   (sysfs block 1)
. dlm_controld waits for gfs_controld kernel stop (libdlmcontrol)
. gfs_controld waits for dlm_controld kernel stop (libdlmcontrol)
. dlm_controld syncs state among all nodes        (libcpg)
. gfs_controld syncs state among all nodes        (libcpg)
. dlm_controld starts dlm-kernel recovery         (sysfs control 1)
. gfs_controld starts gfs-kernel recovery         (sysfs recover jid)
. gfs_controld starts gfs-kernel                  (sysfs block 0)

cluster4 dlm/gfs recovery
. dlm_controld sees nodedown                      (libcpg)
. dlm_controld stops dlm-kernel                   (sysfs control 0)
. dlm-kernel stops gfs-kernel                     (callback block 1)
. dlm_controld syncs state among all nodes        (libcpg)
. dlm_controld starts dlm-kernel recovery         (sysfs control 1)
. dlm-kernel starts gfs-kernel recovery           (callback recover jid)
. dlm-kernel starts gfs-kernel                    (callback block 0)
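
To make the cluster4 ordering above a bit more concrete, here is a rough
userspace sketch of the callback sequence.  It is not kernel code and not
the planned implementation.  Only the gdlm_recover_prep/slot/done names and
their mapping to block_store/recover_store come from the outline; the
struct recover_callbacks layout, the void *arg parameter, struct mock_gfs
and mock_dlm_recover() are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_JIDS 8

/* Hypothetical callback table: the outline names the three callbacks,
 * but this layout and the void *arg parameter are assumptions. */
struct recover_callbacks {
    void (*recover_prep)(void *arg);
    void (*recover_slot)(void *arg, int jid);
    void (*recover_done)(void *arg);
};

/* Stand-in for the gfs-kernel state that the sysfs files toggle in cluster3. */
struct mock_gfs {
    int blocked;              /* 1 between block_store(1) and block_store(0) */
    int recovered[MAX_JIDS];  /* jids passed to recover_store(), in order */
    int nrecovered;
    int violations;           /* recover_slot calls seen while not blocked */
};

static void gdlm_recover_prep(void *arg)
{
    struct mock_gfs *fs = arg;

    fs->blocked = 1;                            /* block_store(1) */
}

static void gdlm_recover_slot(void *arg, int jid)
{
    struct mock_gfs *fs = arg;

    if (!fs->blocked)
        fs->violations++;                       /* ordering bug in the caller */
    if (fs->nrecovered < MAX_JIDS)
        fs->recovered[fs->nrecovered++] = jid;  /* recover_store(jid) */
}

static void gdlm_recover_done(void *arg)
{
    struct mock_gfs *fs = arg;

    fs->blocked = 0;                            /* block_store(0) */
}

static const struct recover_callbacks gdlm_callbacks = {
    .recover_prep = gdlm_recover_prep,
    .recover_slot = gdlm_recover_slot,
    .recover_done = gdlm_recover_done,
};

/* Hypothetical model of what dlm-kernel does around its own recovery. */
static void mock_dlm_recover(const struct recover_callbacks *cb, void *arg,
                             const int *failed_jids, int n)
{
    int i;

    assert(cb && cb->recover_prep && cb->recover_slot && cb->recover_done);

    /* dlm_controld wrote "sysfs control 0": dlm stops, then stops gfs. */
    cb->recover_prep(arg);

    /* dlm_controld syncs state and writes "sysfs control 1"; dlm-kernel
     * recovery runs here, then each failed slot is handed to gfs. */
    for (i = 0; i < n; i++)
        cb->recover_slot(arg, failed_jids[i]);

    /* dlm recovery finished: gfs may resume normal locking. */
    cb->recover_done(arg);
}
```

Calling mock_dlm_recover(&gdlm_callbacks, &fs, failed, n) with a zeroed
struct mock_gfs and a list of failed jids walks through the three cluster4
callback steps in order.  The violations counter is only there to show that
journal recovery is expected to happen while gfs is still blocked.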