From: Anton Blanchard <anton@samba.org>
To: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] powerpc/mm: Fix RECLAIM_DISTANCE
Date: Tue, 31 Jan 2017 15:58:16 +1100
Message-ID: <20170131155816.13cf819f@kryten>
In-Reply-To: <20170131043355.GA25724@gwshan>
Hi,
> Anton, I think the behaviour looks good. Actually, it's not very
> relevant to the issue addressed by the patch. I will reply to
> Michael's reply about the reason. There are two nodes in your system
> and the memory is expected to be allocated from node-0. If node-0
> doesn't have enough free memory, the allocator switches to node-1. It
> means we need more stress.
Did you try setting zone_reclaim_mode? Surely we should reclaim local
clean pagecache if enabled?
Anton
--
zone_reclaim_mode:
Zone_reclaim_mode allows someone to set more or less aggressive approaches to
reclaim memory when a zone runs out of memory. If it is set to zero then no
zone reclaim occurs. Allocations will be satisfied from other zones / nodes
in the system.
This is a value ORed together from:
1 = Zone reclaim on
2 = Zone reclaim writes dirty pages out
4 = Zone reclaim swaps pages
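
As an illustration of how the three bits above combine, here is a minimal
sketch that decodes a zone_reclaim_mode value. The function and table names
are hypothetical; only the bit values 1, 2 and 4 and the
/proc/sys/vm/zone_reclaim_mode path come from this thread.

```python
from pathlib import Path

# Bit values as listed in the documentation excerpt; the names of this
# table and of the helpers below are illustrative, not kernel identifiers.
ZONE_RECLAIM_BITS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}


def decode_zone_reclaim_mode(value):
    """Return descriptions of the bits set in value (empty list for 0)."""
    return [desc for bit, desc in ZONE_RECLAIM_BITS.items() if value & bit]


def read_zone_reclaim_mode(path="/proc/sys/vm/zone_reclaim_mode"):
    """Read the current mode; the file is assumed to hold one integer."""
    return int(Path(path).read_text().strip())
```

On a NUMA system, decode_zone_reclaim_mode(read_zone_reclaim_mode()) would
list the behaviours currently enabled; a value of 0 yields an empty list,
which corresponds to zone reclaim being off.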
zone_reclaim_mode is disabled by default. For file servers or workloads
that benefit from having their data cached, zone_reclaim_mode should be
left disabled as the caching effect is likely to be more important than
data locality.
zone_reclaim may be enabled if it's known that the workload is partitioned
such that each partition fits within a NUMA node and that accessing remote
memory would cause a measurable performance reduction. The page allocator
will then reclaim easily reusable pages (those page cache pages that are
currently not used) before allocating off node pages.
Allowing zone reclaim to write out pages stops processes that are
writing large amounts of data from dirtying pages on other nodes. Zone
reclaim will write out dirty pages if a zone fills up and so effectively
throttle the process. This may decrease the performance of a single process
since it cannot use all of system memory to buffer the outgoing writes
anymore but it preserves the memory on other nodes so that the performance
of other processes running on other nodes will not be affected.
Allowing regular swap effectively restricts allocations to the local
node unless explicitly overridden by memory policies or cpuset
configurations.
>
> In the experiment, 38GB is allocated: 16GB for pagecache and 24GB for
> heap. It's not exceeding the memory capacity (64GB). So page reclaim
> in the fast and slow paths wasn't triggered. That's why the pagecache
> wasn't dropped. I think __GFP_THISNODE isn't specified when the
> page-fault handler tries to allocate a page to accommodate the VMA for
> the heap.
>
> *Without* the patch applied, I got the following on a system with
> two NUMA nodes, each with 64GB of memory. Also, I don't think the
> patch is going to change the behaviour:
>
> # cat /proc/sys/vm/zone_reclaim_mode
> 0
>
> Drop pagecache
> Read 8GB file, for pagecache to consume 8GB memory.
> Node 0 FilePages: 8496960 kB
> taskset -c 0 ./alloc 137438953472 <- 128GB sized heap
> Node 0 FilePages: 503424 kB
>
> Eventually, some of the swap clusters have been used as well:
>
> # free -m
>               total        used        free      shared  buff/cache   available
> Mem:         130583      129203         861          10         518         297
> Swap:         10987        3145        7842
>
> Thanks,
> Gavin
>
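
To make the quoted figures easier to compare, here is a rough sketch that
parses the "Node 0 FilePages" lines and does the unit arithmetic. The helper
names are hypothetical, and the line format is assumed to be exactly as
quoted above ("Node <n> <Field>: <value> kB").

```python
import re


def parse_node_meminfo_kb(line):
    """Parse 'Node <n> <Field>: <value> kB' into (node, field, value_kb)."""
    m = re.match(r"\s*Node\s+(\d+)\s+(\w+):\s+(\d+)\s+kB", line)
    if m is None:
        raise ValueError("unrecognised per-node meminfo line: %r" % line)
    return int(m.group(1)), m.group(2), int(m.group(3))


def kb_to_gib(kb):
    """Convert kB (1024-byte units, as in meminfo) to GiB."""
    return kb / (1024 * 1024)


# Figures copied from the quoted experiment.
before = parse_node_meminfo_kb("Node 0 FilePages: 8496960 kB")
after = parse_node_meminfo_kb("Node 0 FilePages: 503424 kB")

reclaimed_kb = before[2] - after[2]   # pagecache dropped on node 0
heap_gib = 137438953472 / 2**30       # size argument passed to ./alloc

print("pagecache before: %.2f GiB" % kb_to_gib(before[2]))
print("pagecache reclaimed: %d kB" % reclaimed_kb)
print("heap requested: %.0f GiB" % heap_gib)
```

The heap argument works out to exactly 128 GiB, the combined size of the two
64GB nodes, and the FilePages difference is 7993536 kB, so most of the 8GB of
pagecache on node 0 was dropped once memory was exhausted.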
Thread overview: 13+ messages
2017-01-23 23:32 [PATCH] powerpc/mm: Fix RECLAIM_DISTANCE Gavin Shan
2017-01-25 3:57 ` Balbir Singh
2017-01-25 4:58 ` Gavin Shan
2017-01-27 12:49 ` Balbir Singh
2017-01-30 1:02 ` Anton Blanchard
2017-01-30 4:38 ` Gavin Shan
2017-01-30 21:11 ` Michael Ellerman
2017-01-31 5:01 ` Gavin Shan
2017-01-31 5:40 ` Gavin Shan
2017-02-07 23:40 ` Gavin Shan
2017-01-31 4:33 ` Gavin Shan
2017-01-31 4:58 ` Anton Blanchard [this message]
2017-01-31 5:30 ` Gavin Shan