* [Cluster-devel] RFC: generic improvement to fence agents api
From: Fabio M. Di Nitto @ 2011-03-19 6:34 UTC
To: cluster-devel.redhat.com

Hi all,

while discussing on linux-cluster the support of the Tripp Lite switched
PDU, it occurred to me that we can effectively improve (almost half) the
time it takes to perform power fencing of certain devices, when for
example, more than one PSU needs to be powered off to complete the action.

Node X has 2 PSU.

In our current state, the config would look like:

<clusternode .....>
  <fence>
    <method...>
      <device name="..." port="1"/>
      <device name="..." port="2"/>
.....

it means effectively spawning, most likely the same agent, twice.
Increasing the time it takes to fence and maybe increasing the
possibility to fail to fence if the second connection fails.

My suggestion would be to allow to specify a list of ports instead.

<clusternode .....>
  <fence>
    <method...>
      <device name="..." ports="1 2"/>
....

Either by using a new keyword "ports" or re-using "port" itself. If
using "port", current configuration will continue to work as-is and the
change effectively would not introduce any backward compatibility issue.

This way the agent can:

1) connect once (reducing in most cases the ssh/telnet/whatever time)
2) issue the OFF command as fast as possible (almost in parallel)
3) then wait for the results.

By adopting a list, the configuration would look cleaner too IMHO.

A quick glance, the change should not affect fenced (David can you
confirm please?), and most agents could handle it via the fencing python
lib (Marek?).

Does it sound reasonable?

Cheers
Fabio

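To make the three steps in the proposal concrete, here is a rough sketch of what an agent could do with a space-separated `ports` value. It is only an illustration: `FakePDU` and `fence_ports` are made-up names, the PDU is an in-memory stand-in, and nothing here is taken from the real fencing python lib.

```python
# Hypothetical sketch: FakePDU and fence_ports are illustrative names only,
# they are not part of any existing fence agent or library.
class FakePDU:
    """In-memory stand-in for a switched PDU reached over one session."""

    def __init__(self, outlets):
        self.state = {name: "on" for name in outlets}
        self.connections = 0

    def connect(self):
        self.connections += 1

    def power_off(self, port):
        self.state[port] = "off"

    def status(self, port):
        return self.state[port]


def fence_ports(pdu, ports_attr):
    """Power off every port named in a space-separated `ports` value."""
    ports = ports_attr.split()
    pdu.connect()                 # 1) connect once
    for port in ports:            # 2) issue OFF for each port back to back
        pdu.power_off(port)
    # 3) only then collect the results
    return {port: pdu.status(port) for port in ports}


pdu = FakePDU(["1", "2", "3"])
result = fence_ports(pdu, "1 2")
```

With the current one-device-per-port config the connect step would run once per PSU; in this sketch it runs once for the whole list.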
* [Cluster-devel] RFC: generic improvement to fence agents api
From: Digimer @ 2011-03-19 17:14 UTC
To: cluster-devel.redhat.com

On 03/19/2011 02:34 AM, Fabio M. Di Nitto wrote:
> Hi all,
>
> while discussing on linux-cluster the support of the Tripp Lite switched
> PDU, it occurred to me that we can effectively improve (almost half) the
> time it takes to perform power fencing of certain devices, when for
> example, more than one PSU needs to be powered off to complete the action.
>
> Node X has 2 PSU.
>
> In our current state, the config would look like:
>
> <clusternode .....>
>   <fence>
>     <method...>
>       <device name="..." port="1"/>
>       <device name="..." port="2"/>
> .....
>
> it means effectively spawning, most likely the same agent, twice.
> Increasing the time it takes to fence and maybe increasing the
> possibility to fail to fence if the second connection fails.
>
> My suggestion would be to allow to specify a list of ports instead.
>
> <clusternode .....>
>   <fence>
>     <method...>
>       <device name="..." ports="1 2"/>
> ....
>
> Either by using a new keyword "ports" or re-using "port" itself. If
> using "port", current configuration will continue to work as-is and the
> change effectively would not introduce any backward compatibility issue.
>
> This way the agent can:
>
> 1) connect once (reducing in most cases the ssh/telnet/whatever time)
> 2) issue the OFF command as fast as possible (almost in parallel)
> 3) then wait for the results.
>
> By adopting a list, the configuration would look cleaner too IMHO.
>
> A quick glance, the change should not affect fenced (David can you
> confirm please?), and most agents could handle it via the fencing python
> lib (Marek?).
>
> Does it sound reasonable?
>
> Cheers
> Fabio

I like this idea, but would like to suggest:

* Keep 'port' for a single port, as it is, and add 'ports' for multiple
  port definitions.
* When using ports, I'd recommend comma-separated values and
  dash-separated ranges (ie: ports="1,2", ports="1-4", ports="1,3-5") and
  combinations there-of. This strikes me as more "standard" and possibly
  less prone to typos.

--
Digimer
E-Mail: digimer at alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin: http://nodeassassin.org

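For reference, the comma and dash syntax suggested above could be expanded roughly like this. This is a sketch only: `expand_ports` is a hypothetical helper, and it assumes port names are plain integers, which real devices do not guarantee.

```python
def expand_ports(spec):
    """Expand a value such as "1,3-5" into ["1", "3", "4", "5"].

    Assumes every port is a plain integer; a port *name* that itself
    contains "," or "-" cannot be expressed with this syntax.
    """
    ports = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            first, last = item.split("-", 1)
            start, end = int(first), int(last)
            if start > end:
                raise ValueError(f"descending range: {item!r}")
            ports.extend(str(n) for n in range(start, end + 1))
        else:
            ports.append(item)
    return ports
```

A value like `ports="pdu-a"` would raise `ValueError` here, because the dash is always read as a range separator.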
* [Cluster-devel] RFC: generic improvement to fence agents api
From: Fabio M. Di Nitto @ 2011-03-19 17:32 UTC
To: cluster-devel.redhat.com

On 3/19/2011 6:14 PM, Digimer wrote:
> On 03/19/2011 02:34 AM, Fabio M. Di Nitto wrote:
>> Hi all,
>>
>> while discussing on linux-cluster the support of the Tripp Lite switched
>> PDU, it occurred to me that we can effectively improve (almost half) the
>> time it takes to perform power fencing of certain devices, when for
>> example, more than one PSU needs to be powered off to complete the action.
>>
>> Node X has 2 PSU.
>>
>> In our current state, the config would look like:
>>
>> <clusternode .....>
>>   <fence>
>>     <method...>
>>       <device name="..." port="1"/>
>>       <device name="..." port="2"/>
>> .....
>>
>> it means effectively spawning, most likely the same agent, twice.
>> Increasing the time it takes to fence and maybe increasing the
>> possibility to fail to fence if the second connection fails.
>>
>> My suggestion would be to allow to specify a list of ports instead.
>>
>> <clusternode .....>
>>   <fence>
>>     <method...>
>>       <device name="..." ports="1 2"/>
>> ....
>>
>> Either by using a new keyword "ports" or re-using "port" itself. If
>> using "port", current configuration will continue to work as-is and the
>> change effectively would not introduce any backward compatibility issue.
>>
>> This way the agent can:
>>
>> 1) connect once (reducing in most cases the ssh/telnet/whatever time)
>> 2) issue the OFF command as fast as possible (almost in parallel)
>> 3) then wait for the results.
>>
>> By adopting a list, the configuration would look cleaner too IMHO.
>>
>> A quick glance, the change should not affect fenced (David can you
>> confirm please?), and most agents could handle it via the fencing python
>> lib (Marek?).
>>
>> Does it sound reasonable?
>>
>> Cheers
>> Fabio
>
> I like this idea, but would like to suggest:
>
> * Keep 'port' for a single port, as it is, and add 'ports' for multiple
> port definitions.
> * When using ports, I'd recommend comma-separated values and
> dash-separated ranges (ie: ports="1,2", ports="1-4", ports="1,3-5") and
> combinations there-of. This strikes me as more "standard" and possibly
> less prone to typos.
>

The only thing I have against "," or "-" is that they might be easily
part of a port name already.

Range doesn't make sense to me and it's complex to interpret/implement.
How many machines have you seen around with so many PSU's anyway that
need a range to avoid headache? (leaving aside E10K or s390 ;)).

Fabio

* [Cluster-devel] RFC: generic improvement to fence agents api
From: Digimer @ 2011-03-19 18:44 UTC
To: cluster-devel.redhat.com

On 03/19/2011 01:32 PM, Fabio M. Di Nitto wrote:
> On 3/19/2011 6:14 PM, Digimer wrote:
>> On 03/19/2011 02:34 AM, Fabio M. Di Nitto wrote:
>>> Hi all,
>>>
>>> while discussing on linux-cluster the support of the Tripp Lite switched
>>> PDU, it occurred to me that we can effectively improve (almost half) the
>>> time it takes to perform power fencing of certain devices, when for
>>> example, more than one PSU needs to be powered off to complete the action.
>>>
>>> Node X has 2 PSU.
>>>
>>> In our current state, the config would look like:
>>>
>>> <clusternode .....>
>>>   <fence>
>>>     <method...>
>>>       <device name="..." port="1"/>
>>>       <device name="..." port="2"/>
>>> .....
>>>
>>> it means effectively spawning, most likely the same agent, twice.
>>> Increasing the time it takes to fence and maybe increasing the
>>> possibility to fail to fence if the second connection fails.
>>>
>>> My suggestion would be to allow to specify a list of ports instead.
>>>
>>> <clusternode .....>
>>>   <fence>
>>>     <method...>
>>>       <device name="..." ports="1 2"/>
>>> ....
>>>
>>> Either by using a new keyword "ports" or re-using "port" itself. If
>>> using "port", current configuration will continue to work as-is and the
>>> change effectively would not introduce any backward compatibility issue.
>>>
>>> This way the agent can:
>>>
>>> 1) connect once (reducing in most cases the ssh/telnet/whatever time)
>>> 2) issue the OFF command as fast as possible (almost in parallel)
>>> 3) then wait for the results.
>>>
>>> By adopting a list, the configuration would look cleaner too IMHO.
>>>
>>> A quick glance, the change should not affect fenced (David can you
>>> confirm please?), and most agents could handle it via the fencing python
>>> lib (Marek?).
>>>
>>> Does it sound reasonable?
>>>
>>> Cheers
>>> Fabio
>>
>> I like this idea, but would like to suggest:
>>
>> * Keep 'port' for a single port, as it is, and add 'ports' for multiple
>> port definitions.
>> * When using ports, I'd recommend comma-separated values and
>> dash-separated ranges (ie: ports="1,2", ports="1-4", ports="1,3-5") and
>> combinations there-of. This strikes me as more "standard" and possibly
>> less prone to typos.
>>
>
> The only thing I have against "," or "-" is that they might be easily
> part of a port name already.
>
> Range doesn't make sense to me and it's complex to interpret/implement.
> How many machines have you seen around with so many PSU's anyway that
> need a range to avoid headache? (leaving aside E10K or s390 ;)).
>
> Fabio

Lol, I've seen up to four in n-1 setups, but you are right, it's not
common enough to justify increasing complexity, so simple
space-separated numbers is fine.

I still argue for the "port" vs. "ports" though. :)

--
Digimer
E-Mail: digimer at alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin: http://nodeassassin.org

* [Cluster-devel] RFC: generic improvement to fence agents api
From: Marek Grac @ 2011-03-21 8:40 UTC
To: cluster-devel.redhat.com

Hi,

On 03/19/2011 07:34 AM, Fabio M. Di Nitto wrote:
> <device name="..." ports="1 2"/>
> ....
>
> Either by using a new keyword "ports" or re-using "port" itself. If
> using "port", current configuration will continue to work as-is and the
> change effectively would not introduce any backward compatibility issue.
>
> This way the agent can:
>
> 1) connect once (reducing in most cases the ssh/telnet/whatever time)
> 2) issue the OFF command as fast as possible (almost in parallel)
> 3) then wait for the results.
>
> By adopting a list, the configuration would look cleaner too IMHO.
>
> A quick glance, the change should not affect fenced (David can you
> confirm please?), and most agents could handle it via the fencing python
> lib (Marek?).

1) connect once will work only for connection-based fence agents. It
won't help with SNMP + HTTP REST and there won't be any benefits for
drac/ilo/ipmi that can turn off only one machine. Rough estimate is that
it can help us to improve time in 1/3 to 1/2 of the fence agents.

2) parallelism is possible only on those fence devices that work in
async mode. Issuing more than one command will also increase the need for
QE. Some of those devices are not even able to handle 'get status'
immediately after 'power off' (reason for --power-wait). Serialization
within the same connection is definitely possible and for the fencing
python lib we can implement that directly in the library.

-) "ports" is better than "port" because such a change will also have an
impact on the UI and we have to distinguish whether a fence agent accepts
more than one port or not.

-) There is no character that can't be used for the name of a virtual
machine.

m,

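Serialization within one connection, as described in point 2, might look something like the sketch below. Everything in it is an assumption made for illustration: `FakeSession` and `power_off_serialized` are invented names, and `power_wait` only mimics the idea behind the `--power-wait` option mentioned above, not how the fencing python lib actually implements it.

```python
import time


class FakeSession:
    """Hypothetical stand-in for one open ssh/telnet session to a device."""

    def __init__(self):
        self.log = []
        self.off = set()

    def power_off(self, port):
        self.log.append(("off", port))
        self.off.add(port)

    def status(self, port):
        self.log.append(("status", port))
        return "off" if port in self.off else "on"


def power_off_serialized(session, ports, power_wait=0, sleep=time.sleep):
    """Handle one port at a time over a single, already open session."""
    results = {}
    for port in ports:
        session.power_off(port)
        if power_wait > 0:
            sleep(power_wait)  # give the device time before asking for status
        results[port] = session.status(port)
    return results


# The sleep function is injectable so the example runs without real delays.
waits = []
session = FakeSession()
results = power_off_serialized(session, ["1", "2"], power_wait=5, sleep=waits.append)
```

The saving compared to two agent runs is only the second connection setup; the commands themselves still go out one after the other.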
* [Cluster-devel] RFC: generic improvement to fence agents api
From: Fabio M. Di Nitto @ 2011-03-21 10:44 UTC
To: cluster-devel.redhat.com

On 3/21/2011 9:40 AM, Marek Grac wrote:
> Hi,
>
> On 03/19/2011 07:34 AM, Fabio M. Di Nitto wrote:
>> <device name="..." ports="1 2"/>
>> ....
>>
>> Either by using a new keyword "ports" or re-using "port" itself. If
>> using "port", current configuration will continue to work as-is and the
>> change effectively would not introduce any backward compatibility issue.
>>
>> This way the agent can:
>>
>> 1) connect once (reducing in most cases the ssh/telnet/whatever time)
>> 2) issue the OFF command as fast as possible (almost in parallel)
>> 3) then wait for the results.
>>
>> By adopting a list, the configuration would look cleaner too IMHO.
>>
>> A quick glance, the change should not affect fenced (David can you
>> confirm please?), and most agents could handle it via the fencing python
>> lib (Marek?).
>
> 1) connect once will work only for connection-based fence agents. It
> won't help with SNMP + HTTP REST and there won't be any benefits for
> drac/ilo/ipmi that can turn off only one machine. Rough estimate is that
> it can help us to improve time in 1/3 to 1/2 of the fence agents.

Of course, it's a benefit for a subset of the agents.

>
> 2) parallelism is possible only on those fence devices that work in
> async mode. Issuing more than one command will also increase the need for
> QE. Some of those devices are not even able to handle 'get status'
> immediately after 'power off' (reason for --power-wait). Serialization
> within the same connection is definitely possible and for the fencing
> python lib we can implement that directly in the library.

Assuming we agree to do it, let's get it upstream first, then we will
worry about QE at a later stage.

I think starting from serialization within the same connection is
already a good start. The parallelism is not real anyway. I don't expect
forking of commands as that would lead to other issues, as you already
described.

>
> -) "ports" is better than "port" because such a change will also have an
> impact on the UI and we have to distinguish whether a fence agent accepts
> more than one port or not.

ACK.

>
> -) There is no character that can't be used for the name of a virtual
> machine.

I don't think vms are a problem here, since each vm has only one port?

Fabio

* [Cluster-devel] RFC: generic improvement to fence agents api
From: David Teigland @ 2011-03-21 17:07 UTC
To: cluster-devel.redhat.com

On Sat, Mar 19, 2011 at 07:34:55AM +0100, Fabio M. Di Nitto wrote:
> My suggestion would be to allow to specify a list of ports instead.

This comes up now and then. The current rule of one action per agent
execution is a tried and true, fundamental property of the agent api.
It should not be changed IMO. I'll need some time to come up with the
various specific reasons against it, but at least one of them (a big
one) is partial failure/success.

Dave

* [Cluster-devel] RFC: generic improvement to fence agents api
From: Digimer @ 2011-03-21 17:09 UTC
To: cluster-devel.redhat.com

On 03/21/2011 01:07 PM, David Teigland wrote:
> On Sat, Mar 19, 2011 at 07:34:55AM +0100, Fabio M. Di Nitto wrote:
>> My suggestion would be to allow to specify a list of ports instead.
>
> This comes up now and then. The current rule of one action per agent
> execution is a tried and true, fundamental property of the agent api.
> It should not be changed IMO. I'll need some time to come up with the
> various specific reasons against it, but at least one of them (a big
> one) is partial failure/success.
>
> Dave

Could it not be set so that anything shy of a complete success is
treated as a failure?

--
Digimer
E-Mail: digimer at alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin: http://nodeassassin.org

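The "anything shy of a complete success is a failure" rule can be written down in a few lines. This is just a sketch with a made-up helper name, `fence_succeeded`, operating on a hypothetical per-port result mapping; it is not how fenced or any agent reports status today.

```python
def fence_succeeded(results, ports):
    """Return True only if every requested port is reported as "off".

    `results` maps port name to the status the device reported; a port
    missing from the mapping counts as a failure, and so does an empty
    request.
    """
    return bool(ports) and all(results.get(port) == "off" for port in ports)


ok = fence_succeeded({"1": "off", "2": "off"}, ["1", "2"])
partial = fence_succeeded({"1": "off", "2": "on"}, ["1", "2"])
missing = fence_succeeded({"1": "off"}, ["1", "2"])
```

Note that this only collapses the outcome into a single pass/fail. It does not say what to do about an outlet that was already switched off when another one failed.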
* [Cluster-devel] RFC: generic improvement to fence agents api
From: Fabio M. Di Nitto @ 2011-03-21 17:16 UTC
To: cluster-devel.redhat.com

On 3/21/2011 6:07 PM, David Teigland wrote:
> On Sat, Mar 19, 2011 at 07:34:55AM +0100, Fabio M. Di Nitto wrote:
>> My suggestion would be to allow to specify a list of ports instead.
>
> This comes up now and then. The current rule of one action per agent
> execution is a tried and true, fundamental property of the agent api.
> It should not be changed IMO. I'll need some time to come up with the
> various specific reasons against it, but at least one of them (a big
> one) is partial failure/success.

No don't waste your time on it. This is big enough to nullify the benefit.

Indeed it is something that I didn't consider and would make the
recovery matrix from a failure scenario too complex to handle.

Thanks
Fabio

* [Cluster-devel] RFC: generic improvement to fence agents api
From: Lon Hohberger @ 2011-03-21 17:37 UTC
To: cluster-devel.redhat.com

On Mon, Mar 21, 2011 at 01:07:02PM -0400, David Teigland wrote:
> On Sat, Mar 19, 2011 at 07:34:55AM +0100, Fabio M. Di Nitto wrote:
> > My suggestion would be to allow to specify a list of ports instead.
>
> This comes up now and then. The current rule of one action per agent
> execution is a tried and true, fundamental property of the agent api.
> It should not be changed IMO. I'll need some time to come up with the
> various specific reasons against it, but at least one of them (a big
> one) is partial failure/success.

All or nothing -- Some devices actually support port grouping; turn off
port "Foo" and it operates on plugs 1,2,3 at the same time.

--
Lon Hohberger - Red Hat, Inc.