* RFC: pid "ownership" of ip config information
From: Patrick Schaaf @ 2011-01-21 9:28 UTC (permalink / raw)
To: netdev
Dear netdev,
I want to solicit comments on a feature enhancement that occurred
to me recently.
Feature:
- For "ip addr add", "ip route add", "ip rule add", and maybe "ip link add",
  implement an option 'pid XXXXX' to specify a PID
- if that PID does not currently exist, fail the operation
- if, at a later time, that PID dies, automatically remove the configuration,
  as if a corresponding "ip ... del" had been given
The feature would be useful in any kind of "IP takeover" scenario.
I'm concretely working on deployment of keepalived (VRRP address
takeover) and memcachedb (address takeover after berkeley DB master
selection).
It would also apply to all kinds of routing daemons (zebra, quagga...).
In all these cases, for as long as the process is working normally,
it can trigger the relevant address withdrawal itself. But when the
process dies unexpectedly (oom killer or whatever), its addresses are
left configured while a partner on another host might take them over,
resulting in duplicate active IPs and a broken application.
The alternative to such a feature would be an additional monitoring
process, which would watch the PID somehow and would need to be
configured to know what to withdraw when that PID dies.
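A minimal sketch of such a user-space monitoring process, for illustration
only. It polls the PID with signal 0 and runs a configured list of
"ip ... del" commands once the PID is gone. The function names, the address
192.0.2.10/24 and the device eth0 are assumptions made for this example, not
part of the proposal. Like any PID-based watcher it is open to PID reuse
between polls, and it is itself a process that can be killed.

```python
import os
import subprocess
import time


def pid_alive(pid):
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        raise ValueError("pid must be positive")
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def watch_and_withdraw(pid, cleanup_cmds, run=subprocess.run, poll=0.5):
    """Block until `pid` is gone, then run each cleanup command once.

    Mirrors the proposed semantics: fail immediately if the PID does not
    exist, otherwise withdraw the configuration when it dies.
    """
    if not pid_alive(pid):
        raise ProcessLookupError(f"no such process: {pid}")
    while pid_alive(pid):
        time.sleep(poll)
    for cmd in cleanup_cmds:
        run(cmd, check=False)  # keep going even if one withdrawal fails


if __name__ == "__main__":
    import sys

    # Hypothetical usage: watch the PID given on the command line.
    watch_and_withdraw(
        int(sys.argv[1]),
        [["ip", "addr", "del", "192.0.2.10/24", "dev", "eth0"]],
    )
```

On newer kernels and Python versions, os.pidfd_open() could replace the
polling loop and avoid the PID reuse window; that variant is not shown here.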
Before I go ahead and try to implement that, I would like to have
some feedback regarding the idea:
- has it been discussed before?
- would it be accepted by the relevant maintainers?
- did I overlook alternative solutions to the problem?
best regards
Patrick
* Re: RFC: pid "ownership" of ip config information
From: Nicolas de Pesloüan @ 2011-01-21 10:17 UTC (permalink / raw)
To: Patrick Schaaf; +Cc: netdev
On 21/01/2011 10:28, Patrick Schaaf wrote:
> - has it been discussed before?
> - would it be accepted by the relevant maintainers?
> - did I overlook alternative solutions to the problem?
Some user-space clustering systems should provide the same functionality. Did you
have a look at http://www.linux-ha.org/ ?
* Re: RFC: pid "ownership" of ip config information
From: Patrick Schaaf @ 2011-01-23 10:24 UTC (permalink / raw)
To: Nicolas de Pesloüan; +Cc: netdev
On Fri, 2011-01-21 at 11:17 +0100, Nicolas de Pesloüan wrote:
> On 21/01/2011 10:28, Patrick Schaaf wrote:
> > The alternative to such a feature would be an additional
> > monitoring process, which would watch the PID somehow and would need to
> > be configured to know what to withdraw when that PID dies.
> Some user-space clustering systems should provide the same functionality. Did you
> have a look at http://www.linux-ha.org/ ?
Those would be the more complex instances of "an additional monitoring
process", right?
What happens when heartbeat is "kill -9"ed? Assume that I want to avoid
STOMITH-like approaches.
My proposal could be _used_ by such complex clustering managers, too.
Or did I overlook a kernel-based solution there to "withdraw IP config
when processes die"? Can you provide a direct link on linux-ha?
best regards
Patrick
* Re: RFC: pid "ownership" of ip config information
From: Nicolas de Pesloüan @ 2011-01-23 12:32 UTC (permalink / raw)
To: Patrick Schaaf; +Cc: netdev
On 23/01/2011 11:24, Patrick Schaaf wrote:
> On Fri, 2011-01-21 at 11:17 +0100, Nicolas de Pesloüan wrote:
>> On 21/01/2011 10:28, Patrick Schaaf wrote:
>>> The alternative to such a feature would be an additional
>>> monitoring process, which would watch the PID somehow and would need to
>>> be configured to know what to withdraw when that PID dies.
>
>> Some user-space clustering systems should provide the same functionality. Did you
>> have a look at http://www.linux-ha.org/ ?
>
> Those would be the more complex instances of "an additional monitoring
> process", right?
>
> What happens when heartbeat is "kill -9"ed? Assume that I want to avoid
> STOMITH-like approaches.
>
> My proposal could be _used_ by such complex clustering managers, too.
>
> Or did I overlook a kernel-based solution there to "withdraw IP config
> when processes die"?
>
> Can you provide a direct link on linux-ha?
Do you consider "withdraw IP config" the only feature that is needed when a process dies? Or should
we instead design a more generic framework to run a command or make a system call when a process
dies? /sbin/init is probably already doing something similar. Arguably, even init may hang...
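A minimal user-space sketch of that generic "run a command when a process dies" idea, in the style
of an init-like supervisor: the daemon is started as a child, and the cleanup commands run once it
exits for any reason, including SIGKILL. The function name, the address 192.0.2.10/24 and the device
eth0 are assumptions made for this example. It offers no protection if the supervisor itself hangs
or is killed.

```python
import subprocess


def run_with_cleanup(daemon_cmd, cleanup_cmds, run=subprocess.run):
    """Run daemon_cmd as a child; after it exits, run each cleanup command.

    Returns the child's return code (negative signal number if it was
    killed by a signal, as reported by subprocess).
    """
    child = subprocess.Popen(daemon_cmd)
    returncode = child.wait()  # blocks until the child exits or is killed
    for cmd in cleanup_cmds:
        run(cmd, check=False)  # keep going even if one cleanup step fails
    return returncode


if __name__ == "__main__":
    import sys

    # Hypothetical usage: supervise the command given on the command line.
    sys.exit(
        run_with_cleanup(
            sys.argv[1:],
            [["ip", "addr", "del", "192.0.2.10/24", "dev", "eth0"]],
        )
    )
```

Because the parent waits on its own child, there is no polling and no PID reuse race, but the
daemon has to be started through the wrapper.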
If your point is to provide a safety net for a very sick but not really dead node, then no userland
system would help. As such, I agree with you that an automatic withdrawal of IP config might help.
However, how would you protect against a simple never-ending loop in the process, or against a very
slow process due to high load on the node? You probably also need to guard against a process that
no longer reads the network receive queue.
This might end up as some sort of local heartbeat monitoring of userland processes, in the
kernel, and I'm not sure anyone would support this.
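To make the heartbeat idea concrete, here is a minimal user-space sketch, not a kernel design: the
monitored process must call beat() regularly, and the monitor withdraws the configuration once no
beat has arrived within the timeout. This catches a process that is alive but stuck in a loop, which
a pure "PID died" check cannot. The class name, the timeout value and the callback are assumptions
made for this example.

```python
import time


class HeartbeatWatchdog:
    """Fire `on_expire` once if no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout, on_expire, clock=time.monotonic):
        self.timeout = timeout
        self.on_expire = on_expire
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_beat = clock()
        self.expired = False

    def beat(self):
        """Called by the monitored process to signal it is still making progress."""
        self.last_beat = self.clock()

    def check(self):
        """Called periodically by the monitor; returns True once expired.

        After expiry the watchdog stays expired; re-adding the configuration
        is left to the caller.
        """
        if not self.expired and self.clock() - self.last_beat > self.timeout:
            self.expired = True
            self.on_expire()
        return self.expired


if __name__ == "__main__":
    wd = HeartbeatWatchdog(timeout=0.2, on_expire=lambda: print("withdraw IP config"))
    wd.beat()
    time.sleep(0.3)  # simulate a process that stopped sending heartbeats
    wd.check()
```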
And whatever you do locally on a node to ensure proper operation, you also need a way to check for
proper operation from outside the node. A STOMITH system is always required in order to kill a
totally mad node. Even the kernel may go mad.
Nicolas.