From: Nicolas de Pesloüan
Subject: Re: RFC: pid "ownership" of ip config information
Date: Sun, 23 Jan 2011 13:32:53 +0100
Message-ID: <4D3C1FF5.2010607@gmail.com>
References: <1295602091.3582.1.camel@lat1> <4D395D3C.9010308@gmail.com> <1295778271.5657.7.camel@lat1>
In-Reply-To: <1295778271.5657.7.camel@lat1>
To: Patrick Schaaf
Cc: netdev@vger.kernel.org

On 23/01/2011 11:24, Patrick Schaaf wrote:
> On Fri, 2011-01-21 at 11:17 +0100, Nicolas de Pesloüan wrote:
>> On 21/01/2011 10:28, Patrick Schaaf wrote:
>>> The alternative to such a feature, would be to have an additional
>>> monitoring process, which would watch the PID somehow, and need to
>>> be configured to know what to withdraw when it dies.
>
>> There exist some user space clustering systems that should provide
>> the same functionality. Did you have a look at
>> http://www.linux-ha.org/ ?
>
> Those would be the more complex instances of "an additional monitoring
> process", right?
>
> What happens when heartbeat is "kill -9"ed? Assume that I want to avoid
> STONITH-like approaches.
>
> My proposal could be _used_ by such complex clustering managers, too.
>
> Or, did I overlook there a kernel based solution to "withdraw IP config
> when processes die"?
>
> Can you provide a direct link on linux-ha?

Do you consider "withdraw IP config" the only feature that is needed
when a process dies?
Or should we instead design a more generic framework to run a command
or make a system call when a process dies?

/sbin/init is probably already doing something similar. Arguably, even
init may hang...

If your point is to provide a safety net for a very sick but not really
dead node, then no userland system would help. As such, I agree with
you that an automatic withdrawal of IP config might help. However, how
would you protect against a simple never-ending loop in the process, or
against a very slow process due to high load on the node? You probably
also need to guard against a process that no longer reads the network
receive queue.

This might end up as some sort of local heartbeat monitoring of
userland processes in the kernel, and I'm not sure anyone would
support this.

And whatever you do locally on a node to ensure proper operation, you
also need a way to check for proper operation from outside the node. A
STONITH system is always required, in order to kill a totally mad node.
Even the kernel may become mad.

Nicolas.