From: David Teigland
Date: Wed, 8 Oct 2008 09:00:41 -0500
Subject: [Cluster-devel] unifying fencing/stonith
Message-ID: <20081008140041.GA25435@redhat.com>
To: cluster-devel.redhat.com

As discussed at the Prague cluster summit, here is a description of what a
unified fencing/stonith system might look like. I've used fenced as a
starting point, and added/changed things according to what I learned about
pacemaker/stonith.

. what causes fencing

The fenced daemon will join a cpg. When a cpg member fails (confchg from
libcpg), that node will be fenced. After a node leaves the cpg, it will not
be fenced if it fails. The join should happen prior to any shared data
being used (e.g. mounting a cluster fs), and the leave should be prevented
until all shared data is done being used (e.g. all cluster fs's are
unmounted).

If another program (pacemaker, a control daemon, etc.) decides that a
cluster member is unfit to be in the cluster, that program can tell
corosync to kill the node (through a new lib api). corosync will forcibly
remove the member; it will appear as a failed node in the corresponding
confchg callbacks. If the node was a member of the fenced cpg, it will be
fenced as a result.

A program that wants to know when a node has been fenced can query fenced
(or register a callback) via libfenced.

. fenced communication

When fenced needs to fence a node, the cpg members will communicate among
themselves (via cpg messages) to decide which of them should carry out the
fencing. When a fencing operation completes, the result is communicated to
the cpg members. If the fencing operation failed (or the node doing the
fencing fails), a different node will try.

. calling agents

To carry out fencing, fenced needs to read the fencing configuration for
the victim node. fenced will have a fence_node_config structure that
defines all the information for fencing a node. One of two different
plugins will be called to fill in this structure (until we have a common
configuration system). Plugin A will fill in the structure using info from
cluster.conf (libccs). Plugin B will fill it in using the pacemaker config
source. (This config structure will probably be filled in prior to the
negotiation above so that the specific config can factor into the decision
about who fences.)

Once the fence_node_config structure is filled in, fenced will call the
first agent specified in the config, passing it the necessary parameters.
If the first agent fails, it can try the second, etc. If all methods fail,
another node can make an attempt. It would be simplest if fenced could
fork/exec all agents.

I'm not familiar with fence device monitoring, but it sounds like
something that's probably best done by the resource manager, outside the
scope of fenced.

Plugins would fill in these structures using specific config files/sources.
Most of the time, only a single method is used. If multiple methods are
defined, each is tried sequentially until one succeeds, e.g. if the first
method is power-reset and the second is SAN-disconnect, fenced will first
try to reset the power, but if that fails, will try to disable the node's
SAN port.

struct fence_node_config {
        int nodeid;
        char node_name[MAX];
        int num_methods;
        struct fence_node_method methods[8];
};

/* A method will usually contain one device. It contains multiple devices
   in the case of dual power supplies or multiple SAN paths that all need
   to be disabled. Each device in a method is a separate invocation of the
   agent. All device invocations for a method must succeed in order for
   the method to be considered successful. */

struct fence_node_method {
        int num_devices;
        struct fence_node_device devices[8];
};

/* Agent parameter strings are opaque to fenced, which doesn't need to
   grok them. fenced just copies them directly from the config plugin to
   the agent. */

struct fence_node_device {
        char agent[MAX];          /* name of script/program */
        char general_params[MAX]; /* ip address, password, etc */
        char node_params[MAX];    /* port number, etc */
        char other_params[MAX];   /* on, off, querying? */
};
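To make the method/device semantics concrete, here's a minimal sketch (not
actual fenced code) of what the agent-calling loop could look like. The
run_agent() helper here is a placeholder; real fence agents conventionally
take their options as key=value lines on stdin, which is glossed over by
passing the opaque parameter strings as argv.

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork/exec one agent invocation, return 0 if the
   agent exited successfully. */
static int run_agent(struct fence_node_device *dev)
{
        pid_t pid;
        int status;

        pid = fork();
        if (pid < 0)
                return -1;
        if (!pid) {
                execlp(dev->agent, dev->agent, dev->general_params,
                       dev->node_params, dev->other_params, (char *)NULL);
                _exit(127); /* exec failed */
        }
        waitpid(pid, &status, 0);
        return (WIFEXITED(status) && !WEXITSTATUS(status)) ? 0 : -1;
}

/* Try each method in order. A method succeeds only if every device
   invocation in it succeeds (e.g. both power supplies switched off);
   if any device fails, fall through to the next method. */
static int fence_victim(struct fence_node_config *cfg)
{
        int m, d, error;

        for (m = 0; m < cfg->num_methods; m++) {
                struct fence_node_method *method = &cfg->methods[m];

                error = 0;
                for (d = 0; d < method->num_devices; d++) {
                        error = run_agent(&method->devices[d]);
                        if (error)
                                break;
                }
                if (!error)
                        return 0; /* victim fenced */
        }
        return -1; /* all methods failed; another node can try */
}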
. quorum

fenced will use the new quorum plugin being developed for corosync. When
the cluster loses quorum, fenced will not execute fence agents, but it
otherwise continues to operate normally, i.e. it queues fencing operations
for failed cpg members. When quorum is regained, fenced will act upon any
queued fencing operations for failed cpg members.

If a failed node rejoins the fenced cpg before the other cpg members have
carried out the fencing operation against it, the operation is skipped.
This is common when quorum is lost, e.g.

. fenced cpg members A,B,C
. B,C fail; A loses quorum
. A queues fencing operations against B,C
. A does not execute fencing operations without quorum
. B is rebooted somehow and rejoins the cpg
. A removes B from the list of queued victims (instead of fencing it)
. A,B now have quorum and can execute fencing of C
  (unless C also rejoins the cpg beforehand)

[This assumes that B rejoining the cpg implies that B has been rebooted
and no longer needs to be fenced. That isn't necessarily true without
some extra enforcement.]

. startup fencing

This has always been a thorny problem, and fenced has never had a great
solution. The approach fenced takes is described here; I'm hoping
pacemaker might give us some better options for dealing with this, or
maybe we can collectively come up with a better solution.

An example is the simplest way to define the problem:

. A,B,C are cluster members and fenced cpg members
. A,B,C all have cluster fs foo mounted and are writing to it
. A,B experience a power reset and C freezes/hangs, all simultaneously
. A,B reboot, and form a new cluster (C remains hung and unresponsive)
. A,B both join the fenced cpg

A,B know nothing about node C. Is it safe for A,B to mount and use cluster
fs foo? No, not until C is fenced.

fenced's solution has been: when nodes first join the fenced cpg, and have
quorum, they fence any nodes listed in the static cluster configuration
that are not members of the cluster (or the fenced cpg). In this example,
A,B will fence C at the last step listed above.
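A rough sketch of that startup rule follows. get_static_node_list(),
is_member() and queue_victim() are hypothetical placeholders (the real
lookups would come from the config plugin and the cpg/quorum state), not
an existing fenced or libfenced api:

#include <stdint.h>

#define MAX_NODES 64

/* hypothetical: static node list from cluster.conf or the pacemaker
   config source, via the config plugin */
extern int get_static_node_list(uint32_t *nodeids, int max);

/* hypothetical: is the node a current cluster / fenced cpg member? */
extern int is_member(uint32_t nodeid);

/* hypothetical: add to the queue of victims fenced while quorate */
extern void queue_victim(uint32_t nodeid);

/* Called when nodes first join the fenced cpg and have quorum: any
   statically configured node that is not a current member (node C in
   the example above) is queued for fencing. */
void startup_fencing(void)
{
        uint32_t nodeids[MAX_NODES];
        int i, count;

        count = get_static_node_list(nodeids, MAX_NODES);

        for (i = 0; i < count; i++) {
                if (!is_member(nodeids[i]))
                        queue_victim(nodeids[i]);
        }
}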