netdev.vger.kernel.org archive mirror
From: "Nicolas de Pesloüan" <nicolas.2p.debian@gmail.com>
To: Patrick Schaaf <netdev@bof.de>
Cc: netdev@vger.kernel.org
Subject: Re: RFC: pid "ownership" of ip config information
Date: Sun, 23 Jan 2011 13:32:53 +0100
Message-ID: <4D3C1FF5.2010607@gmail.com>
In-Reply-To: <1295778271.5657.7.camel@lat1>

On 23/01/2011 11:24, Patrick Schaaf wrote:
> On Fri, 2011-01-21 at 11:17 +0100, Nicolas de Pesloüan wrote:
>> On 21/01/2011 10:28, Patrick Schaaf wrote:
>>> The alternative to such a feature, would be to have an additional
>>> monitoring process, which would watch the PID somehow, and need to
>>> be configured to know what to withdraw when it dies.
>
>> Some user-space clustering systems should provide the same functionality. Did you
>> have a look at http://www.linux-ha.org/ ?
>
> Those would be the more complex instances of "an additional monitoring
> process", right?
>
> What happens when heartbeat is "kill -9"ed? Assume that I want to avoid
> STONITH-like approaches.
>
> My proposal could be _used_ by such complex clustering managers, too.
>
> Or, did I overlook a kernel-based solution to "withdraw IP config
> when processes die"?
>
> Can you provide a direct link on linux-ha?

Do you consider "withdraw IP config" the only feature that is needed when a process dies? Or should
we instead design a more generic framework to run a command or invoke a system call when a process
dies? /sbin/init probably already does something similar. Arguably, even init may hang...
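
To illustrate the generic "run something when a process dies" idea in its init-like, userland form, here
is a minimal sketch. The names supervise() and withdraw_ip_config() are hypothetical, and the cleanup hook
only prints a message; a real hook would have to remove the actual addresses and routes.

```python
import subprocess
import sys
from typing import Callable, Sequence


def supervise(argv: Sequence[str], on_exit: Callable[[int], None]) -> int:
    """Run argv as a child, block until it exits, then call on_exit(returncode)."""
    child = subprocess.Popen(argv)
    # wait() returns for a normal exit and for a fatal signal alike; on POSIX
    # a child killed by signal N is reported as returncode -N (kill -9 -> -9).
    returncode = child.wait()
    on_exit(returncode)
    return returncode


def withdraw_ip_config(returncode: int) -> None:
    # Hypothetical cleanup hook: a real one would withdraw the addresses and
    # routes owned by the dead service. Here we only report the event.
    print(f"service exited with status {returncode}; withdrawing IP config")


if __name__ == "__main__":
    # Stand-in "service": a short-lived Python child process.
    supervise([sys.executable, "-c", "import time; time.sleep(0.1)"],
              withdraw_ip_config)
```

Such a supervisor has the weakness discussed in this thread: if the supervisor itself is killed or hangs,
nothing runs the hook.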

If your point is to provide a safety net for a very sick but not quite dead node, then no userland
system would help. In that respect, I agree with you that an automatic withdrawal of IP config might help.
However, how would you protect against a simple never-ending loop in the process, or against a very
slow process due to high load on the node? You probably also need to guard against a process that is
no longer reading its network receive queue.

This might end up as some sort of local heartbeat monitoring of userland processes, in the
kernel, and I'm not sure anyone would support this.
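
For illustration only, the bookkeeping behind such heartbeat monitoring can be sketched in userland as
follows. This is not a kernel mechanism, and the class and method names are hypothetical. The point is that
the process must report progress periodically, so a looping or stalled process is detected even though its
PID still exists.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Tracks the last heartbeat of a process and reports when it is overdue."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout      # seconds allowed between two beats
        self.clock = clock          # injectable clock, handy for testing
        self.last_beat = clock()

    def beat(self) -> None:
        """Called by the monitored process each time it makes real progress."""
        self.last_beat = self.clock()

    def is_overdue(self) -> bool:
        """True once more than `timeout` seconds have passed without a beat."""
        return self.clock() - self.last_beat > self.timeout
```

A service would call beat() from the loop that actually drains its receive queue, and the monitor would
trigger the withdrawal once is_overdue() returns True. Choosing the timeout is the hard part: too short,
and a merely loaded node loses its addresses.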

And whatever you do locally on a node to ensure proper operation, you also need a way to check for
proper operation from outside the node. A STONITH system is always required, in order to kill a
node that has gone totally mad. Even the kernel may go mad.

	Nicolas.

Thread overview:
2011-01-21  9:28 RFC: pid "ownership" of ip config information Patrick Schaaf
2011-01-21 10:17 ` Nicolas de Pesloüan
2011-01-23 10:24   ` Patrick Schaaf
2011-01-23 12:32     ` Nicolas de Pesloüan [this message]
