* Re: ATA over ethernet swapping
       [not found]     ` <20070731162140.GI3206@coraid.com>
@ 2007-07-31 22:27       ` Pavel Machek
  2007-08-01  9:18         ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Pavel Machek @ 2007-07-31 22:27 UTC
  To: Ed L. Cashin; +Cc: kernel list, ak, Netdev list

Hi!

> ...
> > Is the protocol documented somewhere? aoe.txt only points at
> > HOWTO... aha, protocol is linked from wikipedia.
> > http://www.coraid.com/documents/AoEr10.txt ... perhaps that should be
> > linked from aoe.txt, too?
> 
> Perhaps.  Most people reading the aoe.txt file won't need to refer to
> the protocol itself, though.

Some of your users are developers, too :-). Should I generate a patch?

> > Hmm, aoe protocol is really trivial. Perhaps netpoll/netconsole
> > infrastructure could be used to create driver good enough for
> > swapping? (Ok, it would not necessarily perform too well, but... we'd
> > simply wait for the reply synchronously. It should be pretty simple).
> 
> I think that in general you still need a way to receive write
> confirmations without allocating memory, and the driver can't provide
> that mechanism.  The problem is that when memory is scarce, writes of
> dirty data must be able to complete, but because memory is scarce,
> there might not be enough to receive and process write-response
> packets, and the driver has no way of affecting the situation.  That's
> why I think a callback could work: The network layer could allow
> storage drivers to register a callback that recognizes write
> responses.

Hmm, ok, it is not as simple as I thought. include/linux/netpoll.h
already includes a mechanism to notify interested parties really soon,
but drivers still call dev_alloc_skb() before that.

> Usually the callback would not be used, but if free pages became so
> scarce that network receives could not take place in a normal fashion,
> the (zero or few) registered callbacks would be used to quickly
> determine whether each packet was a write response.  The distinction
> is important, because write responses can result in the freeing of
> pages.

Hmm, adding GFP_GIVE_ME_EMERGENCY_POOLS to dev_alloc_skb(), then doing

...

int netif_rx(struct sk_buff *skb)
{
	struct softnet_data *queue;
	unsigned long flags;

	/* if netpoll wants it, pretend we never saw it */
	if (netpoll_rx(skb))
		return NET_RX_DROP;

	if (memory_is_still_very_low())
		return NET_RX_DROP;

...should do the trick. Would something like that be acceptable to
network people?
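
As a rough illustration of the idea discussed above, here is a userspace model of the receive decision: under memory pressure a frame is kept only if a registered callback recognizes it as a storage response. This is a sketch under stated assumptions, not kernel code. The names register_write_resp_cb(), rx_decide() and aoe_is_ata_response() are hypothetical and exist in no kernel tree; the header offsets follow the AoE header layout (EtherType 0x88A2, response flag in the ver/flags byte, command byte).

```c
/* Userspace model only: none of these functions are real kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_AOE         0x88A2 /* AoE EtherType */
#define AOE_FLAG_RESPONSE 0x08   /* R bit in the low nibble of ver/flags */
#define AOE_CMD_ATA       0x00   /* "issue ATA command" */
#define MAX_CBS           4      /* hypothetical limit: "zero or few" callbacks */

typedef bool (*write_resp_cb)(const uint8_t *frame, size_t len);

static write_resp_cb cbs[MAX_CBS];
static size_t ncbs;

/* Hypothetical registration hook a storage driver would call once. */
bool register_write_resp_cb(write_resp_cb cb)
{
    if (ncbs == MAX_CBS)
        return false;
    cbs[ncbs++] = cb;
    return true;
}

enum rx_verdict { RX_DELIVER, RX_DROP };

/* Normal operation: deliver everything. Low memory: deliver only frames
 * that some registered callback claims, drop the rest. */
enum rx_verdict rx_decide(const uint8_t *frame, size_t len, bool memory_low)
{
    assert(frame != NULL);
    if (!memory_low)
        return RX_DELIVER;
    for (size_t i = 0; i < ncbs; i++)
        if (cbs[i](frame, len))
            return RX_DELIVER;
    return RX_DROP;
}

/* Example callback: matches AoE ATA responses by header fields alone.
 * A real driver would also have to match the tag against its
 * outstanding writes; that part is omitted here. */
bool aoe_is_ata_response(const uint8_t *f, size_t len)
{
    if (len < 24) /* 14-byte Ethernet header + 10-byte AoE header */
        return false;
    uint16_t type = (uint16_t)(f[12] << 8 | f[13]);
    if (type != ETH_P_AOE)
        return false;
    if (!(f[14] & AOE_FLAG_RESPONSE))
        return false;
    return f[19] == AOE_CMD_ATA;
}
```

The model ignores the harder part raised in the thread: the frame must already be in memory before any callback can inspect it, so the allocation itself still needs a reserve.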
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: ATA over ethernet swapping
  2007-07-31 22:27       ` ATA over ethernet swapping Pavel Machek
@ 2007-08-01  9:18         ` Peter Zijlstra
  2007-08-09 10:11           ` Pavel Machek
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2007-08-01  9:18 UTC
  To: Pavel Machek; +Cc: Ed L. Cashin, kernel list, ak, Netdev list

I've been working on this for quite some time. And should post again
soon. Please see the patches:

  http://programming.kicks-ass.net/kernel-patches/vm_deadlock/current/

For now it requires one uses SLUB, I hope that SLAB will go away (will
save me the trouble of adding support) and I guess I ought to do SLOB
some time (if that does stay).

You'd need the first 22 patches of that series, and then call
sk_set_memalloc(sk) on the proper socket, and do some fiddling with the
reconnect logic. See nfs-swapfile.patch for examples.




* Re: ATA over ethernet swapping
  2007-08-01  9:18         ` Peter Zijlstra
@ 2007-08-09 10:11           ` Pavel Machek
  2007-08-13  7:45             ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Pavel Machek @ 2007-08-09 10:11 UTC
  To: Peter Zijlstra; +Cc: Ed L. Cashin, kernel list, ak, Netdev list

Hi!

> I've been working on this for quite some time. And should post again
> soon. Please see the patches:
> 
>   http://programming.kicks-ass.net/kernel-patches/vm_deadlock/current/
> 
> For now it requires one uses SLUB, I hope that SLAB will go away (will
> save me the trouble of adding support) and I guess I ought to do SLOB
> some time (if that does stay).
> 
> You'd need the first 22 patches of that series, and then call
> sk_set_memalloc(sk) on the proper socket, and do some fiddling with the
> reconnect logic. See nfs-swapfile.patch for examples.

What do you use for testing? I set up ata over ethernet... swapping
over that should deadlock w/o your patches.

But I'm able to compile kernel (-j 10) on 128MB machine, and I tried
cat /dev/zero | grep foo to exhaust memory... and could not reproduce
the deadlock. Should I pingflood? Tweak down the amount of atomic memory
available to make deadlocks easier to reproduce?
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: ATA over ethernet swapping
  2007-08-09 10:11           ` Pavel Machek
@ 2007-08-13  7:45             ` Peter Zijlstra
  2007-08-21  7:42               ` Pavel Machek
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2007-08-13  7:45 UTC
  To: Pavel Machek; +Cc: Ed L. Cashin, kernel list, ak, Netdev list

On Thu, 2007-08-09 at 12:11 +0200, Pavel Machek wrote:
> Hi!
> 
> > I've been working on this for quite some time. And should post again
> > soon. Please see the patches:
> > 
> >   http://programming.kicks-ass.net/kernel-patches/vm_deadlock/current/
> > 
> > For now it requires one uses SLUB, I hope that SLAB will go away (will
> > save me the trouble of adding support) and I guess I ought to do SLOB
> > some time (if that does stay).
> > 
> > You'd need the first 22 patches of that series, and then call
> > sk_set_memalloc(sk) on the proper socket, and do some fiddling with the
> > reconnect logic. See nfs-swapfile.patch for examples.
> 
> What do you use for testing? I set up ata over ethernet... swapping
> over that should deadlock w/o your patches.
> 
> But I'm able to compile kernel (-j 10) on 128MB machine, and I tried
> cat /dev/zero | grep foo to exhaust memory... and could not reproduce
> the deadlock. Should I pingflood? Tweak down the amount of atomic memory
> available to make deadlocks easier to reproduce?

I usually test swap over NFS in the following manner: I set up a regular
inet service on the machine (apache or a bunch of ncat sockets piping to
files or something) and run a heavy workload on the machine (128M):
2*64M file backed thrashers and 2*64M anonymous thrashers. Then I start
clients for the regular inet service, wait for a bit, and shut down the
NFS server.

This makes the machine grind to a halt; I then restart the NFS server,
wait for it to reconnect and the client to come alive again.

Without the last few swap-over-NFS patches this last bit - getting back
out of that situation - never happens.

The basic idea is to make connectivity to the machine where swap traffic
goes very hard (pull a cable, cleanly shut down the server) and to keep
other network traffic pounding the machine.
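
The thread does not show the thrasher programs themselves. As a minimal sketch of what an anonymous thrasher might look like, the function below allocates a buffer and dirties pseudo-randomly chosen pages so that the kernel has to keep swapping them. The name thrash_anon(), the 4096-byte page size and the xorshift generator are all assumptions made for illustration, not the tools actually used for the tests described above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical anonymous-memory thrasher: dirties `touches` pseudo-random
 * pages inside a heap buffer of roughly `bytes` bytes.
 * Returns the number of pages written, or 0 if nothing could be done. */
size_t thrash_anon(size_t bytes, size_t touches, uint32_t seed)
{
    const size_t page = 4096;          /* assumed page size */
    size_t npages = bytes / page;
    if (npages == 0)
        return 0;

    unsigned char *buf = malloc(npages * page);
    if (buf == NULL)
        return 0;

    uint32_t x = seed ? seed : 1;      /* xorshift32 state must be non-zero */
    for (size_t i = 0; i < touches; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        /* One write per iteration is enough to make the page dirty. */
        buf[(size_t)(x % npages) * page] = (unsigned char)i;
    }

    free(buf);
    return touches;
}
```

For the workload described above one would run two such processes with a 64M buffer each and a very large touch count, alongside two file-backed equivalents; the file-backed variant is not sketched here.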



* Re: ATA over ethernet swapping
  2007-08-13  7:45             ` Peter Zijlstra
@ 2007-08-21  7:42               ` Pavel Machek
  0 siblings, 0 replies; 5+ messages in thread
From: Pavel Machek @ 2007-08-21  7:42 UTC
  To: Peter Zijlstra; +Cc: Ed L. Cashin, kernel list, ak, Netdev list

Hi!

> > But I'm able to compile kernel (-j 10) on 128MB machine, and I tried
> > cat /dev/zero | grep foo to exhaust memory... and could not reproduce
> > the deadlock. Should I pingflood? Tweak down the amount of atomic memory
> > available to make deadlocks easier to reproduce?
> 
> I usually test swap over NFS in the following manner: I set up a regular
> inet service on the machine (apache or a bunch of ncat sockets piping to
> files or something) and run a heavy workload on the machine (128M):
> 2*64M file backed thrashers and 2*64M anonymous thrashers. Then I start
> clients for the regular inet service, wait for a bit, and shut down the
> NFS server.
> 
> This makes the machine grind to a halt; I then restart the NFS server,
> wait for it to reconnect and the client to come alive again.
> 
> Without the last few swap-over-NFS patches this last bit - getting back
> out of that situation - never happens.
> 
> The basic idea is to make connectivity to the machine where swap traffic
> goes very hard (pull a cable, cleanly shut down the server) and to keep
> other network traffic pounding the machine.

Hmm, I could not get swap-over-ata-over-ethernet to break. Maybe I
should not have a local / filesystem, because it allows the kernel to
relieve some memory pressure by dropping clean pages? Plus I guess
ata-over-ethernet has some significant advantages, as it works over
ethernet directly, not over IP.

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
