From mboxrd@z Thu Jan 1 00:00:00 1970
From: Oren Laadan
Subject: Re: C/R without "leaks"
Date: Thu, 16 Apr 2009 14:39:05 -0400
Message-ID: <49E77B49.3020102@cs.columbia.edu>
References: <49E40662.2040508@cs.columbia.edu>
 <20090414163633.GE27461@x200.localdomain>
 <49E4D89D.9060903@cs.columbia.edu>
 <20090415195629.GD26994@x200.localdomain>
 <1239835337.6610.6.camel@bahia>
 <20090416161215.GA8505@x200.localdomain>
 <49E774B1.5060505@nortel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <49E774B1.5060505-ZIRUuHA3oDzQT0dZR+AlfA@public.gmane.org>
Sender: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: Chris Friesen
Cc: Ingo Molnar, Linux-Kernel, Dave Hansen,
 containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org, Andrew Morton,
 Linus Torvalds, Alexey Dobriyan
List-Id: containers.vger.kernel.org

Chris Friesen wrote:
> Alexey Dobriyan wrote:
>> On Thu, Apr 16, 2009 at 12:42:17AM +0200, Greg Kurz wrote:
>>> On Wed, 2009-04-15 at 23:56 +0400, Alexey Dobriyan wrote:
>>
>>>> There are sockets and live netns as the most complex example. I'm not
>>>> prepared to describe it exactly, but people wishing to do C/R with
>>>> "leaks" should be very careful with their wishes.
>>> They should close their sockets before checkpoint and find/have some way
>>> to reconnect after. This implies some kind of C/R awareness in the code
>>> to be checkpointed.
>>
>> How do you imagine sshd closing sockets and reconnecting?
>
> Don't you already have to handle the case where an sshd connection is
> checkpointed, then the system is shutdown and the restore doesn't happen
> until after the TCP timeout?

Any connection in that case is, of course, lost, and it's up to the
application to do something about it.
If the application relies on the state of the connection, it will have
to give up (e.g. sshd and ssh die).

However, there are many applications that can withstand a lost
connection without crashing. They simply retry (web browsers, IRC
clients, database clients). With time, there may be more applications
that are 'c/r-aware'.

Moreover, in some cases you could, on restart, use a wrapper to create
a new connection to somewhere (*), then ask restart(2) to use that
socket instead of the original, so that from the user's point of view
things continue to work transparently.

(*) That somewhere could be the original peer, or another server if it
has a way to somehow continue a cut connection, or a special wrapper
server that you write for that purpose.

Oren.
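
To make the "they simply retry" case concrete, here is a minimal sketch of a client that treats a dead connection as recoverable: it reconnects and resends instead of giving up. It does not use restart(2) or any checkpoint interface. The names `connect_with_retry`, `RetryingClient` and `start_echo_server` are hypothetical and exist only for this illustration, and the echo server stands in for whatever peer the application would really talk to.

```python
import socket
import threading
import time


def connect_with_retry(host, port, attempts=5, delay=0.1):
    """Return a connected TCP socket, retrying a few times (hypothetical helper)."""
    last_error = None
    for _ in range(attempts):
        try:
            return socket.create_connection((host, port), timeout=2.0)
        except OSError as exc:
            last_error = exc
            time.sleep(delay)
    raise ConnectionError(f"could not reach {host}:{port}") from last_error


class RetryingClient:
    """Client that reconnects once when its connection turns out to be dead."""

    def __init__(self, host, port):
        self.host, self.port = host, port
        self.sock = connect_with_retry(host, port)
        self.reconnects = 0

    def request(self, payload):
        # First try the existing socket; on failure reconnect and try once more.
        for attempt in range(2):
            try:
                self.sock.sendall(payload)
                reply = self.sock.recv(4096)
                if reply:
                    return reply
                raise ConnectionError("peer closed the connection")
            except OSError:
                if attempt == 1:
                    raise
                self.sock.close()
                self.sock = connect_with_retry(self.host, self.port)
                self.reconnects += 1


def start_echo_server():
    """Start a throwaway localhost echo server and return its port (demo only)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()

    def echo(conn):
        with conn:
            while True:
                data = conn.recv(4096)
                if not data:
                    break
                conn.sendall(data)

    def serve():
        while True:
            conn, _ = listener.accept()
            threading.Thread(target=echo, args=(conn,), daemon=True).start()

    threading.Thread(target=serve, daemon=True).start()
    return listener.getsockname()[1]
```

Closing `client.sock` by hand and then calling `client.request(...)` again imitates, very roughly, what the application sees after a restore that happened past the TCP timeout: the old socket is useless, and the request succeeds only because the client opens a new one. This only works for protocols where resending a request is safe, which is exactly why sshd and ssh cannot do it.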