Re: hibernate hangs TCP Re: [EXAMPLE CODE] Parasite thread injection and TCP connection hijacking

public inbox for linux-pm@vger.kernel.org

* Re: hibernate hangs TCP Re: [EXAMPLE CODE] Parasite thread injection and TCP connection hijacking
From: Tejun Heo @ 2011-10-30 20:16 UTC
  To: David Fries; +Cc: netdev, linux-pm, linux-kernel

(cc'ing Rafael and linux-pm)

On Sat, Oct 29, 2011 at 11:48:21PM -0500, David Fries wrote:
> I saw the write up on this on lwn.net, pretty creative by the way, and
> it got me thinking about a different checkpoint/restart problem I've
> been running into.  Specifically in hibernating to disk.  In the
> hibernate case active TCP connections hang after resuming, while an
> idle TCP connection will continue after the system is back up.  My
> observation is the kernel checkpoints itself to memory, enables
> devices, writes out that checkpoint image to storage, then powers off.
> The problem is if TCP packets are received while writing to storage,
> the kernel will continue to queue and ack those TCP packets, but the
> running kernel and its network state is shortly lost.  When the
> computer resumes, those TCP byte sequences hang the TCP connection for
> an extended period of time while the resumed computer refuses to
> acknowledge the data that was received after checkpointing and the now
> running kernel knew nothing about, and the other computer tries in
> vain to resend any data that hadn't yet been acknowledged, which is
> always after the data that was lost, until one of them eventually
> gives up.
> 
> I've been wondering if it was safe or possible to leave any network
> interfaces down after the checkpoint, or what the right solution would
> be.  I didn't think marking every TCP connection with a ZOMBIE_KERNEL
> bit just after the kernel checkpoint (for the kernel is walking dead
> and won't remember anything that happens), and then prevent any TCP
> acks from being sent for those connections would be the right
> solution.  I've taken to unplugging the physical lan cable,
> hibernating to disk, and plugging it back in after the system is down,
> to avoid the problem.  Any ideas?

Hmmm... sounds like taking down network interfaces before starting
hibernation sequence should be enough, which shouldn't be too
difficult to implement from userland.  Rafael, what do you think?
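
A minimal sketch of that userland approach, assuming the iproute2 `ip`
command is available and that hibernation is triggered by writing `disk`
to /sys/power/state.  The function names are hypothetical, and the
sketch only cycles link state: it does not restore manually added
routes, bridges, or other configuration that goes away with the link.

```python
import os
import subprocess

SYS_NET = "/sys/class/net"  # sysfs directory with one entry per interface


def list_interfaces(sys_net=SYS_NET):
    """Return interface names from sysfs, excluding loopback."""
    return sorted(name for name in os.listdir(sys_net) if name != "lo")


def plan_hibernate(interfaces):
    """Build the ordered steps: links down, hibernate, links back up."""
    down = [["ip", "link", "set", "dev", i, "down"] for i in interfaces]
    up = [["ip", "link", "set", "dev", i, "up"] for i in interfaces]
    # The write to /sys/power/state returns only after resume, so the
    # "up" steps run in the restored system.
    hibernate = [["sh", "-c", "echo disk > /sys/power/state"]]
    return down + hibernate + up


def run(plan, dry_run=True):
    """Print the steps, or execute them (as root) when dry_run is False."""
    for argv in plan:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

Calling `run(plan_hibernate(list_interfaces()))` prints the steps
without executing anything; passing `dry_run=False` would actually
hibernate the machine.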

Thanks.

-- 
tejun

* Re: hibernate hangs TCP Re: [EXAMPLE CODE] Parasite thread injection and TCP connection hijacking
From: David Fries @ 2011-10-30 20:43 UTC
  To: Tejun Heo; +Cc: netdev, linux-pm, linux-kernel

On Sun, Oct 30, 2011 at 01:16:18PM -0700, Tejun Heo wrote:
> (cc'ing Rafael and linux-pm)
> 
> On Sat, Oct 29, 2011 at 11:48:21PM -0500, David Fries wrote:
> > I saw the write up on this on lwn.net, pretty creative by the way, and
> > it got me thinking about a different checkpoint/restart problem I've
> > been running into.  Specifically in hibernating to disk.  In the
> > hibernate case active TCP connections hang after resuming, while an
> > idle TCP connection will continue after the system is back up.  My
> > observation is the kernel checkpoints itself to memory, enables
> > devices, writes out that checkpoint image to storage, then powers off.
> > The problem is if TCP packets are received while writing to storage,
> > the kernel will continue to queue and ack those TCP packets, but the
> > running kernel and its network state is shortly lost.  When the
> > computer resumes, those TCP byte sequences hang the TCP connection for
> > an extended period of time while the resumed computer refuses to
> > acknowledge the data that was received after checkpointing and the now
> > running kernel knew nothing about, and the other computer tries in
> > vain to resend any data that hadn't yet been acknowledged, which is
> > always after the data that was lost, until one of them eventually
> > gives up.
> > 
> > I've been wondering if it was safe or possible to leave any network
> > interfaces down after the checkpoint, or what the right solution would
> > be.  I didn't think marking every TCP connection with a ZOMBIE_KERNEL
> > bit just after the kernel checkpoint (for the kernel is walking dead
> > and won't remember anything that happens), and then prevent any TCP
> > acks from being sent for those connections would be the right
> > solution.  I've taken to unplugging the physical lan cable,
> > hibernating to disk, and plugging it back in after the system is down,
> > to avoid the problem.  Any ideas?
> 
> Hmmm... sounds like taking down network interfaces before starting
> hibernation sequence should be enough, which shouldn't be too
> difficult to implement from userland.  Rafael, what do you think?

What I observe is that the kernel prints "Preallocating image memory",
then the screen goes blank and the network link light goes out.  The
screen then comes back on with "Compressing and saving" and the link
light comes back on, and both stay on until the image has been saved
and the system shuts down.  So the kernel is already bringing the
network down; it just needs to keep it down until the original
checkpointed kernel is back up.
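
The effect of the link coming back up while the image is written can be
shown with a toy model.  This is only a sketch under strong
simplifications: one cumulative ACK per segment, no out-of-order queue,
no timers, and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    rcv_nxt: int = 0  # next byte the hibernating host expects

    def receive(self, seq, length):
        """Accept only an in-order segment; return the cumulative ACK."""
        if seq == self.rcv_nxt:
            self.rcv_nxt += length
        return self.rcv_nxt


@dataclass
class Sender:
    snd_una: int = 0  # oldest byte the peer still holds for retransmit

    def on_ack(self, ack):
        # Acked data is discarded; an ACK below snd_una is ignored as stale.
        if ack > self.snd_una:
            self.snd_una = ack


def hibernate_cycle(link_up_during_image_write, segments=3, seg_len=1000):
    rx, tx = Receiver(), Sender()
    tx.on_ack(rx.receive(tx.snd_una, seg_len))  # traffic before the snapshot
    snapshot = rx.rcv_nxt                       # state saved in the image
    if link_up_during_image_write:
        # The doomed kernel keeps receiving and acking data.
        for _ in range(segments):
            tx.on_ack(rx.receive(tx.snd_una, seg_len))
    rx = Receiver(rcv_nxt=snapshot)             # resume restores the snapshot
    tx.on_ack(rx.receive(tx.snd_una, seg_len))  # peer sends from snd_una
    return {"rcv_nxt": rx.rcv_nxt, "snd_una": tx.snd_una,
            "stalled": rx.rcv_nxt < tx.snd_una}
```

With the link up during the image write, the peer ends with snd_una
ahead of the restored rcv_nxt and can no longer supply the missing
bytes, so the connection stalls.  With the link kept down, the two
values stay in step.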

Having userspace bring the network interfaces down is problematic.
As an example, one of my systems runs hostapd as an access point and
bridges that to the wired ethernet.  That is not trivial to set up and
take down (the Debian ifup can set it up, but I have not yet figured
out how to get ifdown to take everything down cleanly, and I sometimes
run hostapd manually when I'm troubleshooting).  Any manually added
routes would go away, and good luck setting everything back up the way
it was for all the different configurations out there in userspace.
On top of that, programs would now see a period with networking down
that they would not otherwise have seen.

-- 
David Fries <david@fries.net>    PGP pub CB1EE8F0
http://fries.net/~david/

* Re: hibernate hangs TCP Re: [EXAMPLE CODE] Parasite thread injection and TCP connection hijacking
From: MyungJoo Ham @ 2011-11-02  9:44 UTC
  To: Tejun Heo; +Cc: netdev, linux-pm, David Fries, linux-kernel

On Mon, Oct 31, 2011 at 5:16 AM, Tejun Heo <tj@kernel.org> wrote:
> (cc'ing Rafael and linux-pm)
>
> On Sat, Oct 29, 2011 at 11:48:21PM -0500, David Fries wrote:
>> I saw the write up on this on lwn.net, pretty creative by the way, and
>> it got me thinking about a different checkpoint/restart problem I've
>> been running into.  Specifically in hibernating to disk.  In the
>> hibernate case active TCP connections hang after resuming, while an
>> idle TCP connection will continue after the system is back up.  My
>> observation is the kernel checkpoints itself to memory, enables
>> devices, writes out that checkpoint image to storage, then powers off.
>> The problem is if TCP packets are received while writing to storage,
>> the kernel will continue to queue and ack those TCP packets, but the
>> running kernel and its network state is shortly lost.  When the
>> computer resumes, those TCP byte sequences hang the TCP connection for
>> an extended period of time while the resumed computer refuses to
>> acknowledge the data that was received after checkpointing and the now
>> running kernel knew nothing about, and the other computer tries in
>> vain to resend any data that hadn't yet been acknowledged, which is
>> always after the data that was lost, until one of them eventually
>> gives up.
>>
>> I've been wondering if it was safe or possible to leave any network
>> interfaces down after the checkpoint, or what the right solution would
>> be.  I didn't think marking every TCP connection with a ZOMBIE_KERNEL
>> bit just after the kernel checkpoint (for the kernel is walking dead
>> and won't remember anything that happens), and then prevent any TCP
>> acks from being sent for those connections would be the right
>> solution.  I've taken to unplugging the physical lan cable,
>> hibernating to disk, and plugging it back in after the system is down,
>> to avoid the problem.  Any ideas?
>
> Hmmm... sounds like taking down network interfaces before starting
> hibernation sequence should be enough, which shouldn't be too
> difficult to implement from userland.  Rafael, what do you think?
>
> Thanks.

Um... it seems that the "thaw" callbacks of network interfaces or TCP
should do something about this.

Probably, the "thaw" callbacks should make sure that the TCP
connections are closed?



Cheers,
MyungJoo


-- 
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab, DMC Business, Samsung Electronics

* Re: hibernate hangs TCP Re: [EXAMPLE CODE] Parasite thread injection and TCP connection hijacking
From: Tejun Heo @ 2011-11-02 15:10 UTC
  To: MyungJoo Ham; +Cc: netdev, linux-pm, David Fries, linux-kernel

Hello,

On Wed, Nov 02, 2011 at 06:44:31PM +0900, MyungJoo Ham wrote:
> > Hmmm... sounds like taking down network interfaces before starting
> > hibernation sequence should be enough, which shouldn't be too
> > difficult to implement from userland.  Rafael, what do you think?
> >
> > Thanks.
> 
> Um... it seems that the "thaw" callbacks of network interfaces or TCP
> should do something on this.
> 
> Probably, the "thaw" callbacks should make sure that the TCP
> connections are closed?

I don't think it's a good idea to diddle with TCP connections from
that layer.  From what I understand, it seems all we need is to plug
tx/rx while preparing for hibernation.  That shouldn't be too
difficult.
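
As a rough model of what "plugging tx/rx" would mean, the sketch below
keeps an interface plugged from the start of hibernation preparation
until the restored kernel is running, including the image-write phase.
The phase names and the class are hypothetical, not kernel interfaces.

```python
from enum import Enum, auto


class Phase(Enum):
    RUNNING = auto()      # normal operation
    PREPARE = auto()      # hibernation requested, snapshot not yet taken
    IMAGE_WRITE = auto()  # snapshot taken, devices re-enabled to save it
    RESUMED = auto()      # snapshotted kernel is back in control


class PluggedInterface:
    """Hypothetical interface whose tx/rx paths can be plugged."""

    def __init__(self):
        self.plugged = False
        self.passed = []
        self.dropped = []

    def on_phase(self, phase):
        # Stay plugged across the image write, not only around the snapshot.
        self.plugged = phase in (Phase.PREPARE, Phase.IMAGE_WRITE)

    def transfer(self, direction, frame):
        """Pass a frame ("rx" or "tx") unless plugged; return True if passed."""
        target = self.dropped if self.plugged else self.passed
        target.append((direction, frame))
        return not self.plugged


def simulate():
    """Offer one incoming frame in each phase and record whether it passed."""
    nic = PluggedInterface()
    results = []
    for phase in Phase:
        nic.on_phase(phase)
        results.append((phase.name, nic.transfer("rx", "segment")))
    return results
```

In this model nothing received during PREPARE or IMAGE_WRITE reaches the
stack, so nothing is acked that the restored kernel would not know
about.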

Thanks.

-- 
tejun

* Re: hibernate hangs TCP Re: [EXAMPLE CODE] Parasite thread injection and TCP connection hijacking
From: Pavel Machek @ 2012-02-17 19:28 UTC
  To: Tejun Heo; +Cc: netdev, linux-pm, linux-kernel, David Fries

On Wed 2011-11-02 08:10:39, Tejun Heo wrote:
> Hello,
> 
> On Wed, Nov 02, 2011 at 06:44:31PM +0900, MyungJoo Ham wrote:
> > > Hmmm... sounds like taking down network interfaces before starting
> > > hibernation sequence should be enough, which shouldn't be too
> > > difficult to implement from userland.  Rafael, what do you think?
> > >
> > > Thanks.
> > 
> > Um... it seems that the "thaw" callbacks of network interfaces or TCP
> > should do something on this.
> > 
> > Probably, the "thaw" callbacks should make sure that the TCP
> > connections are closed?
> 
> I don't think it's a good idea to diddle with TCP connections from
> that layer.  From what I understand, it seems all we need is plugging
> tx/rx while preparing for hibernation.  That shouldn't be too
> difficult.

Yes, that should be done. 

If someone has a uswsusp setup where they talk over the network, it
might break them, but hopefully no one is doing that.

Also, hopefully no one does hibernation on /dev/nbd.

      	    							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
