linux-mm.kvack.org archive mirror
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Jeff Garzik <jeff@garzik.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	netdev@vger.kernel.org, trond.myklebust@fys.uio.no
Subject: Re: [PATCH 00/33] Swap over NFS -v14
Date: Wed, 31 Oct 2007 13:56:53 +0100
Message-ID: <1193835413.27652.205.camel@twins>
In-Reply-To: <47287220.8050804@garzik.org>

On Wed, 2007-10-31 at 08:16 -0400, Jeff Garzik wrote:
> Thoughts:
> 
> 1) I absolutely agree that NFS is far more prominent and useful than any 
> network block device, at the present time.
> 
> 
> 2) Nonetheless, swap over NFS is a pretty rare case.  I view this work 
> as interesting, but I really don't see a huge need, for swapping over 
> NBD or swapping over NFS.  I tend to think swapping to a remote resource 
> starts to approach "migration" rather than merely swapping.  Yes, we can 
> do it...  but given the lack of burning need one must examine the price.

There is large corporate demand for this, which is why I'm working on it.

The typical usage scenarios are:
 - cluster/blades, where having local disks is a cost issue (maintenance
   of failed disks, heat, etc.)
 - virtualisation, where putting the storage on a networked storage unit
   makes for trivial migration and the like.

But please, people who want this (I'm sure some of you are reading) do
speak up. I'm just the motivated corporate drone implementing the
feature :-)

> 3) You note
> > Swap over network has the problem that the network subsystem does not use fixed
> > sized allocations, but heavily relies on kmalloc(). This makes mempools
> > unusable.
> 
> True, but IMO there are mitigating factors that should be researched and 
> taken into account:
> 
> a) To give you some net driver background/history, most mainstream net 
> drivers were coded to allocate RX skbs of size 1538, under the theory 
> that they would all be allocating out of the same underlying slab cache. 
>   It would not be difficult to update a great many of the [non-jumbo] 
> cases to create a fixed size allocation pattern.

One issue that comes to mind is how to ensure we'd still overflow the
IP-reassembly buffers. Currently those are managed on the number of
bytes present, not the number of fragments.

One of the goals of my approach was to not rewrite the network subsystem
to accommodate this feature (and I hope I succeeded).
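
To make the mempool point concrete: a pool of preallocated elements can
only guarantee progress when every request fits its one element size.
The following is a minimal user-space C sketch of such a fixed-size pool,
assuming the 1538-byte RX buffer size mentioned above. The names
(struct rx_pool, rx_pool_alloc(), rx_pool_free()) and the pool depth are
hypothetical, and this is a simplified model, not the kernel's mempool_t
API. Returning buffers to the free list on release also illustrates the
kind of recycling that keeps memory use bounded.

```c
#include <stddef.h>

/* Assumed element size: the 1538-byte RX buffer discussed above. */
#define RX_BUF_SIZE 1538
/* Hypothetical pool depth, kept small for illustration. */
#define POOL_NR     4

struct rx_pool {
    unsigned char storage[POOL_NR][RX_BUF_SIZE]; /* preallocated buffers */
    void *free_list[POOL_NR];                    /* stack of free buffers */
    size_t nr_free;
};

/* Put every preallocated buffer on the free list. */
static void rx_pool_init(struct rx_pool *p)
{
    for (size_t i = 0; i < POOL_NR; i++)
        p->free_list[i] = p->storage[i];
    p->nr_free = POOL_NR;
}

/*
 * Hand out one buffer. Requests larger than the fixed element size cannot
 * be served at all, and an empty pool also fails: memory use never grows
 * beyond POOL_NR * RX_BUF_SIZE bytes.
 */
static void *rx_pool_alloc(struct rx_pool *p, size_t len)
{
    if (len > RX_BUF_SIZE || p->nr_free == 0)
        return NULL;
    return p->free_list[--p->nr_free];
}

/* Recycle a buffer back into the pool instead of freeing it. */
static void rx_pool_free(struct rx_pool *p, void *buf)
{
    p->free_list[p->nr_free++] = buf;
}
```

A 9000-byte (jumbo) request is refused outright, which mirrors why
variable-size kmalloc() users cannot be backed by such a pool without
first being converted to a fixed allocation pattern.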

> b) Spare-time experiments and anecdotal evidence point to RX and TX skb 
> recycling as a potentially valuable area of research.  If you are able 
> to do something like that, then memory suddenly becomes a lot more 
> bounded and predictable.
> 
> 
> So my gut feeling is that taking a hard look at how net drivers function 
> in the field should give you a lot of good ideas that approach the 
> shared goal of making network memory allocations more predictable and 
> bounded.

Note that boundedness only comes from dropping most packets before
tying them to a socket. That is the crucial part of the RX path: receive
all packets from the NIC (regardless of their size), but do not pass
them on to the network stack unless they belong to a 'special' socket
that promises undelayed processing.
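
The receive-side policy described above can be sketched as a single
decision function. This is a minimal user-space C model under stated
assumptions: the types and names (struct sock_model, struct pkt_model,
rx_filter(), the memalloc and emergency flags) are hypothetical stand-ins
loosely inspired by the "filter emergency skbs" patch in this series, not
its actual API. Packet size plays no part in the decision; only the
origin of the buffer and the owning socket matter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified socket: does it promise undelayed processing? */
struct sock_model {
    bool memalloc;          /* e.g. the socket carrying swap traffic */
};

/* Hypothetical, simplified received packet. */
struct pkt_model {
    bool emergency;         /* buffer was allocated from the memory reserve */
    struct sock_model *sk;  /* socket found by lookup, NULL if none */
};

enum rx_verdict { RX_DELIVER, RX_DROP };

/* Decide whether a received packet may be passed up to its socket. */
static enum rx_verdict rx_filter(const struct pkt_model *pkt)
{
    if (!pkt->emergency)
        return RX_DELIVER;  /* ordinary memory: no restriction */
    if (pkt->sk != NULL && pkt->sk->memalloc)
        return RX_DELIVER;  /* reserve memory, but it will be consumed promptly */
    return RX_DROP;         /* reserve memory, ordinary socket: drop and free it */
}
```

The drop happens after the packet has been received into reserve memory
but before it is queued on a socket, so reserve memory is only held by
traffic that will be processed without delay.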

Thanks for these ideas, I'll look into them.

Thread overview: 72+ messages
2007-10-30 16:04 [PATCH 00/33] Swap over NFS -v14 Peter Zijlstra
2007-10-30 16:04 ` [PATCH 01/33] mm: gfp_to_alloc_flags() Peter Zijlstra
2007-10-30 16:04 ` [PATCH 02/33] mm: tag reseve pages Peter Zijlstra
2007-10-30 16:04 ` [PATCH 03/33] mm: slub: add knowledge of reserve pages Peter Zijlstra
2007-10-31  3:37   ` Nick Piggin
2007-10-31 10:42     ` Peter Zijlstra
2007-10-31 10:46       ` Nick Piggin
2007-10-31 12:17         ` Peter Zijlstra
2007-10-31 11:25           ` Nick Piggin
2007-10-31 12:54             ` Peter Zijlstra
2007-10-31 13:08               ` Peter Zijlstra
2007-10-30 16:04 ` [PATCH 04/33] mm: allow mempool to fall back to memalloc reserves Peter Zijlstra
2007-10-31  3:40   ` Nick Piggin
2007-10-30 16:04 ` [PATCH 05/33] mm: kmem_estimate_pages() Peter Zijlstra
2007-10-31  3:43   ` Nick Piggin
2007-10-31 10:42     ` Peter Zijlstra
2007-10-30 16:04 ` [PATCH 06/33] mm: allow PF_MEMALLOC from softirq context Peter Zijlstra
2007-10-31  3:51   ` Nick Piggin
2007-10-31 10:42     ` Peter Zijlstra
2007-10-31 10:49       ` Nick Piggin
2007-10-31 13:06         ` Peter Zijlstra
2007-10-30 16:04 ` [PATCH 07/33] mm: serialize access to min_free_kbytes Peter Zijlstra
2007-10-30 16:04 ` [PATCH 08/33] mm: emergency pool Peter Zijlstra
2007-10-30 16:04 ` [PATCH 09/33] mm: system wide ALLOC_NO_WATERMARK Peter Zijlstra
2007-10-31  3:52   ` Nick Piggin
2007-10-31 10:45     ` Peter Zijlstra
2007-10-30 16:04 ` [PATCH 10/33] mm: __GFP_MEMALLOC Peter Zijlstra
2007-10-30 16:04 ` [PATCH 11/33] mm: memory reserve management Peter Zijlstra
2007-10-30 16:04 ` [PATCH 12/33] selinux: tag avc cache alloc as non-critical Peter Zijlstra
2007-10-30 16:04 ` [PATCH 13/33] net: wrap sk->sk_backlog_rcv() Peter Zijlstra
2007-10-30 16:04 ` [PATCH 14/33] net: packet split receive api Peter Zijlstra
2007-10-30 16:04 ` [PATCH 15/33] net: sk_allocation() - concentrate socket related allocations Peter Zijlstra
2007-10-30 16:04 ` [PATCH 16/33] netvm: network reserve infrastructure Peter Zijlstra
2007-10-30 16:04 ` [PATCH 17/33] sysctl: propagate conv errors Peter Zijlstra
2007-10-30 16:04 ` [PATCH 18/33] netvm: INET reserves Peter Zijlstra
2007-10-30 16:04 ` [PATCH 19/33] netvm: hook skb allocation to reserves Peter Zijlstra
2007-10-30 16:04 ` [PATCH 20/33] netvm: filter emergency skbs Peter Zijlstra
2007-10-30 16:04 ` [PATCH 21/33] netvm: prevent a TCP specific deadlock Peter Zijlstra
2007-10-30 16:04 ` [PATCH 22/33] netfilter: NF_QUEUE vs emergency skbs Peter Zijlstra
2007-10-30 16:04 ` [PATCH 23/33] netvm: skb processing Peter Zijlstra
2007-10-30 21:26   ` Stephen Hemminger
2007-10-30 21:44     ` Peter Zijlstra
2007-10-30 16:04 ` [PATCH 24/33] mm: prepare swap entry methods for use in page methods Peter Zijlstra
2007-10-30 16:04 ` [PATCH 25/33] mm: add support for non block device backed swap files Peter Zijlstra
2007-10-30 16:04 ` [PATCH 26/33] mm: methods for teaching filesystems about PG_swapcache pages Peter Zijlstra
2007-10-30 16:04 ` [PATCH 27/33] nfs: remove mempools Peter Zijlstra
2007-10-30 16:04 ` [PATCH 28/33] nfs: teach the NFS client how to treat PG_swapcache pages Peter Zijlstra
2007-10-31  8:52   ` Christoph Hellwig
2007-10-30 16:04 ` [PATCH 29/33] nfs: disable data cache revalidation for swapfiles Peter Zijlstra
2007-10-30 16:04 ` [PATCH 30/33] nfs: swap vs nfs_writepage Peter Zijlstra
2007-10-30 16:04 ` [PATCH 31/33] nfs: enable swap on NFS Peter Zijlstra
2007-10-30 16:04 ` [PATCH 32/33] nfs: fix various memory recursions possible with swap over NFS Peter Zijlstra
2007-10-30 16:04 ` [PATCH 33/33] nfs: do not warn on radix tree node allocation failures Peter Zijlstra
2007-10-31  3:26 ` [PATCH 00/33] Swap over NFS -v14 Nick Piggin
2007-10-31  4:37   ` David Miller, Nick Piggin
2007-10-31  4:04     ` Nick Piggin
2007-10-31 14:03       ` Byron Stanoszek
2007-10-31  8:50     ` Christoph Hellwig
2007-10-31 10:56       ` Peter Zijlstra
2007-10-31 11:18         ` NBD was " Pavel Machek
2007-10-31 11:24           ` Peter Zijlstra
2007-10-31 14:54         ` Mike Snitzer
2007-10-31 16:31           ` Evgeniy Polyakov
2007-10-31  9:53     ` Peter Zijlstra
2007-10-31 11:27   ` Peter Zijlstra
2007-10-31 12:16     ` Jeff Garzik
2007-10-31 12:56       ` Peter Zijlstra [this message]
2007-10-31 13:18         ` Arnaldo Carvalho de Melo
2007-10-31 13:44         ` Gregory Haskins
2007-11-02  8:54         ` Pavel Machek
2007-11-18 18:09         ` Robin Humble
