From: Mel Gorman <mgorman@suse.de>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Dave Hansen <dave.hansen@intel.com>,
Rik van Riel <riel@redhat.com>, Linux-MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 5/7] mm: page_alloc: Make zone distribution page aging policy configurable
Date: Tue, 17 Dec 2013 15:29:54 +0000
Message-ID: <20131217152954.GA24067@suse.de>
In-Reply-To: <20131216204215.GA21724@cmpxchg.org>

On Mon, Dec 16, 2013 at 03:42:15PM -0500, Johannes Weiner wrote:
> On Fri, Dec 13, 2013 at 02:10:05PM +0000, Mel Gorman wrote:
> > Commit 81c0a2bb ("mm: page_alloc: fair zone allocator policy") solved a
> > bug whereby new pages could be reclaimed before old pages because of
> > how the page allocator and kswapd interacted on the per-zone LRU lists.
> > Unfortunately it was missed during review that a consequence is that
> > we also round-robin between NUMA nodes. This is bad for two reasons
> >
> > 1. It alters the semantics of MPOL_LOCAL without telling anyone
> > 2. It incurs an immediate remote memory performance hit in exchange
> > for a potential performance gain when memory needs to be reclaimed
> > later
> >
> > No cookies for the reviewers on this one.
> >
> > This patch makes the behaviour of the fair zone allocator policy
> > configurable. By default it will only distribute pages that are going
> > to exist on the LRU between zones local to the allocating process. This
> > preserves the historical semantics of MPOL_LOCAL.
> >
> > By default, slab pages are not distributed between zones after this patch is
> > applied. It can be argued that they should get similar treatment but they
> > have different lifecycles to LRU pages, the shrinkers are not zone-aware
> > and the interaction between the page allocator and kswapd is different
> > for slabs. If it turns out to be an almost universal win, we can change
> > the default.
> >
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > ---
> > Documentation/sysctl/vm.txt | 32 ++++++++++++++
> > include/linux/mmzone.h | 2 +
> > include/linux/swap.h | 2 +
> > kernel/sysctl.c | 8 ++++
> > mm/page_alloc.c | 102 ++++++++++++++++++++++++++++++++++++++------
> > 5 files changed, 134 insertions(+), 12 deletions(-)
> >
> > diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
> > index 1fbd4eb..8eaa562 100644
> > --- a/Documentation/sysctl/vm.txt
> > +++ b/Documentation/sysctl/vm.txt
> > @@ -56,6 +56,7 @@ Currently, these files are in /proc/sys/vm:
> > - swappiness
> > - user_reserve_kbytes
> > - vfs_cache_pressure
> > +- zone_distribute_mode
> > - zone_reclaim_mode
> >
> > ==============================================================
> > @@ -724,6 +725,37 @@ causes the kernel to prefer to reclaim dentries and inodes.
> >
> > ==============================================================
> >
> > +zone_distribute_mode
> > +
> > +Page allocation and reclaim are managed on a per-zone basis. When the
> > +system needs to reclaim memory, candidate pages are selected from these
> > +per-zone lists. Historically, a potential consequence was that recently
> > +allocated pages were considered reclaim candidates. From a zone-local
> > +perspective, page aging was preserved but from a system-wide perspective
> > +there was an age inversion problem.
> > +
> > +A similar problem occurs on a node level where young pages may be reclaimed
> > +from the local node instead of allocating remote memory. Unfortunately, the
> > +cost of accessing remote nodes is higher so the system must choose by default
> > +between favouring page aging or node locality. zone_distribute_mode controls
> > +how the system will distribute page ages between zones.
> > +
> > +0 = Never round-robin based on age
>
> I think we should be very conservative with the userspace interface we
> export on a mechanism we are obviously just figuring out.
>

And we have a proposal on how to limit this. I'll be layering another
patch on top that removes this interface again. That will allow us to
roll back one patch and still have a usable interface if necessary.

> > +Otherwise the values are ORed together
> > +
> > +1 = Distribute anon pages between zones local to the allocating node
> > +2 = Distribute file pages between zones local to the allocating node
> > +4 = Distribute slab pages between zones local to the allocating node
>
> Zone fairness within a node does not affect mempolicy or remote
> reference costs. Is there a reason to have this configurable?
>

Symmetry.

> > +The following three flags effectively alter MPOL_DEFAULT, be careful.
> > +
> > +8 = Distribute anon pages between zones remote to the allocating node
> > +16 = Distribute file pages between zones remote to the allocating node
> > +32 = Distribute slab pages between zones remote to the allocating node
>
> Yes, it's conceivable that somebody might want to disable remote
> distribution because of the extra references.
>
> But at this point, I'd much rather back out anon and slab distribution
> entirely, it was a mistake to include them.
>
> That would leave us with a single knob to disable remote page cache
> placement.
>

Looking at this more closely, I found that SysV shared memory is a weird
exception. It is file-backed as far as most of the VM is concerned but
looks anonymous to most applications that care. SysV shared memory and
MAP_SHARED anonymous pages should not be treated like files, but we still
want tmpfs to be treated as files. Details will be in the changelog of the
next series.

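For anyone following along, the flag values in the quoted vm.txt hunk are
ORed together into a single mode. A minimal userspace sketch of how such a
mask could be decoded is below. Only the numeric values (1, 2, 4, 8, 16, 32)
come from the quoted text. The enum names, the helper functions and the
label strings are hypothetical and are not taken from the patch, and the
interface itself is a proposal that is expected to be removed again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Flag values as listed in the proposed zone_distribute_mode text.
 * The identifier names are illustrative only, not the patch's names.
 */
enum {
	DISTRIBUTE_LOCAL_ANON  = 1,
	DISTRIBUTE_LOCAL_FILE  = 2,
	DISTRIBUTE_LOCAL_SLAB  = 4,
	DISTRIBUTE_REMOTE_ANON = 8,
	DISTRIBUTE_REMOTE_FILE = 16,
	DISTRIBUTE_REMOTE_SLAB = 32,
	DISTRIBUTE_REMOTE_MASK = 8 | 16 | 32,
	DISTRIBUTE_ALL_MASK    = 63,
};

struct flag_name {
	unsigned int bit;
	const char *name;	/* hypothetical label for printing */
};

static const struct flag_name flag_names[] = {
	{ DISTRIBUTE_LOCAL_ANON,  "local-anon"  },
	{ DISTRIBUTE_LOCAL_FILE,  "local-file"  },
	{ DISTRIBUTE_LOCAL_SLAB,  "local-slab"  },
	{ DISTRIBUTE_REMOTE_ANON, "remote-anon" },
	{ DISTRIBUTE_REMOTE_FILE, "remote-file" },
	{ DISTRIBUTE_REMOTE_SLAB, "remote-slab" },
};

/* True if no bits outside the six documented flags are set. */
static bool mode_is_valid(unsigned int mode)
{
	return (mode & ~(unsigned int)DISTRIBUTE_ALL_MASK) == 0;
}

/* True if any flag that distributes pages to remote nodes is set. */
static bool mode_touches_remote_nodes(unsigned int mode)
{
	return (mode & DISTRIBUTE_REMOTE_MASK) != 0;
}

/*
 * Write a '|'-separated list of the set flags into buf.
 * Returns the number of flag names written. If buf is too small,
 * the loop stops early and the count covers only complete names.
 */
static int describe_mode(unsigned int mode, char *buf, size_t len)
{
	size_t used = 0;
	int count = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (size_t i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
		int n;

		if (!(mode & flag_names[i].bit))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     count ? "|" : "", flag_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* out of space */

		used += (size_t)n;
		count++;
	}
	return count;
}
```

With these helpers, a mode of 3 (1 | 2) decodes to the two local LRU
flags and does not touch remote nodes, while ORing in 16 adds remote file
page distribution, which is the case the quoted text warns effectively
alters MPOL_DEFAULT.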
--
Mel Gorman
SUSE Labs