From: Mel Gorman <mgorman@techsingularity.net>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Helge Deller <deller@gmx.de>,
"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
John David Anglin <dave.anglin@bell.net>,
linux-parisc@vger.kernel.org, linux-mm@kvack.org,
Vlastimil Babka <vbabka@suse.cz>,
Andrea Arcangeli <aarcange@redhat.com>,
Zi Yan <zi.yan@cs.rutgers.edu>
Subject: Re: Memory management broken by "mm: reclaim small amounts of memory when an external fragmentation event occurs"
Date: Mon, 8 Apr 2019 13:54:48 +0100
Message-ID: <20190408125448.GB18914@techsingularity.net>
In-Reply-To: <alpine.LRH.2.02.1904080639570.4674@file01.intranet.prod.int.rdu2.redhat.com>
On Mon, Apr 08, 2019 at 07:10:11AM -0400, Mikulas Patocka wrote:
> > First, if pa-risc is !NUMA then why are separate local ranges
> > represented as separate nodes? Is it because of DISCONTIGMEM or something
> > else? DISCONTIGMEM is before my time so I'm not familiar with it and
>
> I'm not an expert in this area, I don't know.
>
Ok.
> > I consider it "essentially dead" but the arch init code seems to set up
> > pgdats for each physical contiguous range so it's a possibility. The most
> > likely explanation is pa-risc does not have hardware with addressing
> > limitations smaller than the CPU's physical address limits and it's
> > possible to have more ranges than available zones but clarification would
> > be nice. By rights, SPARSEMEM would be supported on pa-risc but that
> > would be a time-consuming and somewhat futile exercise. Regardless of the
> > explanation, as pa-risc does not appear to support transparent hugepages,
> > an option is to special case watermark_boost_factor to be 0 on DISCONTIGMEM
> > as that commit was primarily about THP with secondary concerns around
> > SLUB. This is probably the most straight-forward solution but it'd need
> > a comment obviously. I do not know what the distro configurations for
> > pa-risc set as I'm not a user of gentoo or debian.
>
> I use Debian Sid, but I compile my own kernel. I uploaded the kernel
> .config here:
> http://people.redhat.com/~mpatocka/testcases/parisc-config.txt
>
DISCONTIGMEM is set, which matches what the arch init code suggested.
Glancing at the history, it seems my assumption was accurate: DISCONTIGMEM
used NUMA structures on non-NUMA machines to allow code to be reused and
to simplify matters.
I'll put together a patch that disables watermark boosting on DISCONTIGMEM,
as the behaviour is surprising for that memory model.
> > Second, if you set the sysctl vm.watermark_boost_factor=0, does the
> > problem go away? If so, an option would be to set this sysctl to 0 by
> > default on distros that support pa-risc. Would that be suitable?
>
> I have tried it and the problem almost goes away. With
> vm.watermark_boost_factor=0, if I read 2GiB data from the disk, the buffer
> cache will contain about 1.8GiB. So, there's still some superfluous page
> reclaim, but it is smaller.
>
Ok, for NUMA, I would generally expect some small amounts of reclaim on
a per-node basis from kswapd waking up as the node fills. I know in your
case there is no NUMA but from a memory consumption/reclaim point of
view, it doesn't matter. There are multiple active node structures so
it's treated as such.
In the short term, I suggest you set vm.watermark_boost_factor=0 in
/etc/sysctl.conf to work around the issue.
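A minimal sketch of that workaround, assuming the distribution reads
/etc/sysctl.conf at boot (some setups use files under /etc/sysctl.d/
instead, which is an assumption to check locally):

```
# /etc/sysctl.conf
# Disable watermark boosting after external fragmentation events.
# Intended as a short-term workaround on DISCONTIGMEM systems such as pa-risc.
vm.watermark_boost_factor = 0
```

Running "sysctl -p" as root should load the file without a reboot, and
"sysctl vm.watermark_boost_factor" shows the value currently in effect.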
> BTW. I'm interested - on real NUMA machines - is reclaiming the file cache
> really a better option than allocating the file cache from non-local node?
>
The patch is not related to file cache concerns; it's for the long-term
viability of high-order allocations, particularly THP but also SLUB, which
uses high-order allocations by default.
>
> > Finally, I'm sure this has been asked before but why is pa-risc alive?
> > It appears a new CPU has not been manufactured since 2005. Even Alpha
> > I can understand being semi-alive since it's an interesting case for
> > weakly-ordered memory models. pa-risc appears to be supported and active
> > for debian at least so someone cares. It's not the only feature like this
> > that is bizarrely alive but it is curious -- 32 bit NUMA support on x86,
> > I'm looking at you, your machines are all dead since the early 2000's
> > AFAIK and anyone else using NUMA on 32-bit x86 needs their head examined.
>
> I use it to test programs for portability to risc.
>
> If one could choose between buying an expensive power system or a cheap
> pa-risc system, pa-risc may be a better choice. The last pa-risc model has
> four cores at 1.1GHz, so it is not completely unusable.
Well, if it were me and I was checking portability to RISC, I'd probably
get hold of a Raspberry Pi, but we all have different ways of looking at
things.
--
Mel Gorman
SUSE Labs