From: Eric B Munson <emunson@mgebm.net>
To: Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linux-MM <linux-mm@kvack.org>,
Linux-Netdev <netdev@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
David Miller <davem@davemloft.net>, Neil Brown <neilb@suse.de>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Mike Christie <michaelc@cs.wisc.edu>
Subject: Re: [PATCH 00/16] Swap-over-NBD without deadlocking V9
Date: Sat, 21 Apr 2012 14:15:41 -0400
Message-ID: <20120421181541.GC17039@mgebm.net>
In-Reply-To: <1334578624-23257-1-git-send-email-mgorman@suse.de>
On Mon, 16 Apr 2012, Mel Gorman wrote:
> Changelog since V8
> o Rebase to 3.4-rc2
> o Use page flag instead of slab fields to keep structures the same size
> o Properly detect allocations from softirq context that use PF_MEMALLOC
> o Ensure kswapd does not sleep while processes are throttled
> o Do not accidentally throttle !__GFP_FS processes indefinitely
>
> Changelog since V7
> o Rebase to 3.3-rc2
> o Take greater care propagating page->pfmemalloc to skb
> o Propagate pfmemalloc from netdev_alloc_page to skb where possible
> o Release RCU lock properly on preempt kernel
>
> Changelog since V6
> o Rebase to 3.1-rc8
> o Use wake_up instead of wake_up_interruptible()
> o Do not throttle kernel threads
> o Avoid a potential race between kswapd going to sleep and processes being
> throttled
>
> Changelog since V5
> o Rebase to 3.1-rc5
>
> Changelog since V4
> o Update comment clarifying what protocols can be used (Michal)
> o Rebase to 3.0-rc3
>
> Changelog since V3
> o Propagate pfmemalloc from packet fragment pages to skb (Neil)
> o Rebase to 3.0-rc2
>
> Changelog since V2
> o Document that __GFP_NOMEMALLOC overrides __GFP_MEMALLOC (Neil)
> o Use wait_event_interruptible (Neil)
> o Use !! when casting to bool to avoid any possibility of type
> truncation (Neil)
> o Nicer logic when using skb_pfmemalloc_protocol (Neil)
>
> Changelog since V1
> o Rebase on top of mmotm
> o Use atomic_t for memalloc_socks (David Miller)
> o Remove use of sk_memalloc_socks in vmscan (Neil Brown)
> o Check throttle within prepare_to_wait (Neil Brown)
> o Add statistics on throttling instead of printk
>
> When a user or administrator requires swap for their application, they
> create a swap partition or file, format it with mkswap and activate it
> with swapon. Swap over the network is considered as an option in diskless
> systems. The two likely scenarios are thin clients and blade servers
> used as part of a cluster where the form factor or maintenance costs do
> not allow the use of disks.
>
> The Linux Terminal Server Project recommends the use of the
> Network Block Device (NBD) for swap according to the manual at
> https://sourceforge.net/projects/ltsp/files/Docs-Admin-Guide/LTSPManual.pdf/download
> There are also documentation and tutorials on how to set up swap over NBD
> at places like https://help.ubuntu.com/community/UbuntuLTSP/EnableNBDSWAP
> The nbd-client also documents the use of NBD as swap. Despite this, the
> fact is that a machine using NBD for swap can deadlock within minutes if
> swap is used intensively. This patch series addresses the problem.
>
> The core issue is that network block devices do not use mempools like
> normal block devices do. As a host cannot control where it receives
> packets from, it cannot reliably work out in advance how much memory
> it might need. Some years ago, Peter Zijlstra developed a series of
> patches that supported swap over NFS that at least one distribution
> is carrying within their kernels. This patch series borrows very heavily
> from Peter's work to support swapping over NBD as a pre-requisite to
> supporting swap-over-NFS. The bulk of the complexity is concerned with
> preserving memory that is allocated from the PFMEMALLOC reserves for use
> by the network layer which is needed for both NBD and NFS.
>
> Patch 1 serialises access to min_free_kbytes. It's not strictly needed
> by this series but as the series cares about watermarks in
> general, it's a harmless fix. It could be merged independently
> and may be if CMA is merged in advance.
>
> Patch 2 adds knowledge of the PFMEMALLOC reserves to SLAB and SLUB to
> preserve access to pages allocated under low memory situations
> to callers that are freeing memory.
>
> Patch 3 introduces __GFP_MEMALLOC to allow access to the PFMEMALLOC
> reserves without setting PFMEMALLOC.
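
The flag precedence described here (together with the V2 changelog note that __GFP_NOMEMALLOC overrides __GFP_MEMALLOC) can be modelled in a few lines of user-space C. This is only a sketch of the decision, not the kernel code: the DEMO_* bit values and the demo_* helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the real flags are defined by the kernel. */
#define DEMO_GFP_MEMALLOC   (1u << 0)  /* this allocation may use the reserves */
#define DEMO_GFP_NOMEMALLOC (1u << 1)  /* this allocation must never use them */

/* Hypothetical helper: may this allocation dip into the reserves? */
static bool demo_may_use_reserves(unsigned int gfp_mask, bool task_pf_memalloc)
{
    /* NOMEMALLOC overrides everything else. */
    if (gfp_mask & DEMO_GFP_NOMEMALLOC)
        return false;
    /* Per-allocation opt-in, independent of the task-wide flag. */
    if (gfp_mask & DEMO_GFP_MEMALLOC)
        return true;
    /* Otherwise fall back to the task's PF_MEMALLOC state. */
    return task_pf_memalloc;
}

/* Usage example: the three cases the cover letter distinguishes. */
static void demo_gfp_self_check(void)
{
    assert(demo_may_use_reserves(DEMO_GFP_MEMALLOC, false));
    assert(!demo_may_use_reserves(DEMO_GFP_MEMALLOC | DEMO_GFP_NOMEMALLOC, false));
    assert(demo_may_use_reserves(0, true));
}
```
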
>
> Patch 4 opens the possibility for softirqs to use PFMEMALLOC reserves
> for later use by network packet processing.
>
> Patch 5 ignores memory policies when ALLOC_NO_WATERMARKS is set.
>
> Patches 6-13 allow network processing to use PFMEMALLOC reserves when
> the socket has been marked as being used by the VM to clean pages. If
> packets are received and stored in pages that were allocated under
> low-memory situations and are unrelated to the VM, the packets
> are dropped.
>
> Patch 11 reintroduces __netdev_alloc_page which the networking
> folk may object to but is needed in some cases to propagate
> pfmemalloc from a newly allocated page to an skb. If there is a
> strong objection, this patch can be dropped with the impact being
> that swap-over-network will be slower in some cases but it should
> not fail.
>
> Patch 13 is a micro-optimisation to avoid a function call in the
> common case.
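
The drop rule described for patches 6-13 boils down to one predicate: a packet held in reserve-backed memory is only kept if the receiving socket is one the VM depends on. A rough user-space model follows, assuming two hypothetical structs (demo_skb, demo_sock) that stand in for the real skb and socket state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the state the series tracks. */
struct demo_skb  { bool pfmemalloc; };  /* data sits in reserve-allocated pages */
struct demo_sock { bool memalloc; };    /* socket is used by the VM to clean pages */

/* Drop packets that consumed reserves but are unrelated to the VM. */
static bool demo_should_drop(const struct demo_skb *skb,
                             const struct demo_sock *sk)
{
    return skb->pfmemalloc && !sk->memalloc;
}

/* Usage example covering the four combinations. */
static void demo_drop_self_check(void)
{
    struct demo_skb  normal_skb = { false }, reserve_skb = { true };
    struct demo_sock plain_sk = { false }, swap_sk = { true };

    assert(!demo_should_drop(&normal_skb, &plain_sk));
    assert(!demo_should_drop(&normal_skb, &swap_sk));
    assert(demo_should_drop(&reserve_skb, &plain_sk));
    assert(!demo_should_drop(&reserve_skb, &swap_sk));
}
```

Dropping is safe for the unrelated traffic because the sender is expected to retransmit, and the memory goes back to the reserve.
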
>
> Patch 14 tags NBD sockets as being SOCK_MEMALLOC so they can use
> PFMEMALLOC if necessary.
>
> Patch 15 notes that it is still possible for the PFMEMALLOC reserve
> to be depleted. To prevent this, direct reclaimers get throttled on
> a waitqueue if 50% of the PFMEMALLOC reserves are depleted. It is
> expected that kswapd and the direct reclaimers already running
> will clean enough pages for the low watermark to be reached, at
> which point the throttled processes are woken up.
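
The 50% check can be sketched as below. The model assumes the reserve is the sum of the per-zone min watermarks, which is my reading rather than something the cover letter states, and struct demo_zone and demo_reserves_ok() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-zone counters, in pages. */
struct demo_zone {
    unsigned long min_wmark;   /* assumed to define this zone's share of the reserve */
    unsigned long free_pages;  /* pages currently free in this zone */
};

/*
 * Returns true while more than half of the summed reserve is still free.
 * A direct reclaimer would go to sleep on the waitqueue when this is false.
 */
static bool demo_reserves_ok(const struct demo_zone *zones, size_t n)
{
    unsigned long reserve = 0, free_pages = 0;

    assert(zones != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        reserve += zones[i].min_wmark;
        free_pages += zones[i].free_pages;
    }
    return free_pages > reserve / 2;
}
```

With two zones whose min watermarks are 100 pages each, 110 free pages in total passes the check, while 100 free pages means half the reserve is gone and the caller would be throttled.
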
>
> Patch 16 adds a statistic to track how often processes get throttled.
>
> Some basic performance testing was run using kernel builds, netperf
> on loopback for UDP and TCP, hackbench (pipes and sockets), iozone
> and sysbench. Each of them was expected to use the sl*b allocators
> reasonably heavily but there did not appear to be significant
> performance variances.
>
> For testing swap-over-NBD, a machine was booted with 2G of RAM with a
> swapfile backed by NBD. 8*NUM_CPU processes were started that create
> anonymous memory mappings and read them linearly in a loop. The total
> size of the mappings was 4*PHYSICAL_MEMORY to use swap heavily under
> memory pressure.
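
For anyone wanting to reproduce this, a workload of the shape described might look like the sketch below. It is not the test program used for these numbers. The demo_* names are hypothetical, and writing one byte per page before the read loop is my assumption, added so that the anonymous pages are really instantiated and can be swapped out.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Map `bytes` of anonymous memory, write one byte per page, then read the
 * mapping linearly `passes` times. Returns 0 on success, -1 on failure.
 */
static int demo_touch_and_read(size_t bytes, int passes)
{
    long page = sysconf(_SC_PAGESIZE);
    volatile unsigned char *p;
    unsigned long sum = 0;
    unsigned char *map;

    assert(page > 0 && passes >= 0);
    map = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return -1;
    p = map;
    for (size_t off = 0; off < bytes; off += (size_t)page)
        p[off] = 1;                       /* instantiate the page */
    for (int i = 0; i < passes; i++)
        for (size_t off = 0; off < bytes; off += (size_t)page)
            sum += p[off];                /* linear read pass */
    munmap(map, bytes);
    /* Every touched page contributes exactly 1 per pass. */
    return sum == (unsigned long)passes * ((bytes + page - 1) / page) ? 0 : -1;
}

/* Fork `nproc` readers and return how many of them exited cleanly. */
static int demo_run_readers(int nproc, size_t bytes_per_proc, int passes)
{
    int started = 0, ok = 0;

    for (int i = 0; i < nproc; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(demo_touch_and_read(bytes_per_proc, passes) == 0 ? 0 : 1);
        if (pid > 0)
            started++;
    }
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

To match the description, a caller would pass 8 times the CPU count as nproc and size bytes_per_proc so the total comes to four times physical memory. With small values it runs harmlessly on any machine.
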
>
> Using SLUB, the machine locks up within minutes without the patches and
> runs to completion with them applied. With SLAB, the story is different
> as an unpatched kernel runs to completion. However, the patched kernel
> completed the test 40% faster.
>
>                                          3.4.0-rc2     3.4.0-rc2
>                                       vanilla-slab       swapnbd
> Sys Time Running Test (seconds)              87.90         73.45
> User+Sys Time Running Test (seconds)         91.93         76.91
> Total Elapsed Time (seconds)               4174.37       2953.96
>
I have tested these with an artificial swap benchmark and with a large project
compile on a BeagleBoard. They work great for me. My tests only exercised this
set via swap over NFS, so coverage was probably not very thorough.
Tested-by: Eric B Munson <emunson@mgebm.net>