From: Mel Gorman <mgorman@suse.de>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Peter Xu <peterx@redhat.com>,
Blake Caldwell <blake.caldwell@colorado.edu>,
Mike Rapoport <rppt@linux.vnet.ibm.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
Michal Hocko <mhocko@kernel.org>,
Vlastimil Babka <vbabka@suse.cz>,
David Rientjes <rientjes@google.com>
Subject: Re: [LSF/MM TOPIC] NUMA remote THP vs NUMA local non-THP under MADV_HUGEPAGE
Date: Fri, 1 Feb 2019 14:17:33 +0000 [thread overview]
Message-ID: <20190201141733.GC4926@suse.de> (raw)
In-Reply-To: <20190129234058.GH31695@redhat.com>
On Tue, Jan 29, 2019 at 06:40:58PM -0500, Andrea Arcangeli wrote:
> I posted some benchmark results showing that for tasks without strong
> NUMA locality the __GFP_THISNODE logic is not guaranteed to be optimal
> (and here of course I mean even if we ignore the large slowdown with
> swap storms at allocation time that might be caused by
> __GFP_THISNODE). The results also show NUMA remote THPs help
> intrasocket as well as intersocket.
>
> https://lkml.kernel.org/r/20181210044916.GC24097@redhat.com
> https://lkml.kernel.org/r/20181212104418.GE1130@redhat.com
>
> The following seems the interim conclusion which I happen to be in
> agreement with Michal and Mel:
>
> https://lkml.kernel.org/r/20181212095051.GO1286@dhcp22.suse.cz
> https://lkml.kernel.org/r/20181212170016.GG1130@redhat.com
>
> Hopefully this strict issue will be hot-fixed before April (like we
> had to hot-fix it in the enterprise kernels to avoid the 3 years old
> regression to break large workloads that can't fit it in a single NUMA
> node and I assume other enterprise distributions will follow suit),
> but whatever hot-fix will likely allow ample margin for discussions on
> what we can do better to optimize the decision between local non-THP
> and remote THP under MADV_HUGEPAGE.
>
> It is clear that the __GFP_THISNODE forced in the current code
> provides some minor advantage to apps using MADV_HUGEPAGE that can fit
> in a single NUMA node, but we should try to achieve it without major
> disadvantages to apps that can't fit in a single NUMA node.
>
> For example it was mentioned that we could allocate readily available
> already-free local 4k if local compaction fails and the watermarks
> still allows local 4k allocations without invoking reclaim, before
> invoking compaction on remote nodes. The same can be repeated at a
> second level with intra-socket non-THP memory before invoking
> compaction inter-socket. However we can't do things like that with the
> current page allocator workflow. It's possible some larger change is
> required than just sending a single gfp bitflag down to the page
> allocator that creates an implicit MPOL_LOCAL binding to make it
> behave like the obsoleted numa/zone reclaim behavior, but weirdly only
> applied to THP allocations.
>
I would also be interested in discussing this topic. My activity is
mostly compaction-related but I believe it will evolve into something
that returns more sane data to the page allocator. That should make it a
bit easier to detect when local compaction fails and make it easier to
improve the page allocator workflow without throwing another workload
under a bus.
--
Mel Gorman
SUSE Labs
prev parent reply other threads:[~2019-02-01 14:17 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-29 23:40 [LSF/MM TOPIC] NUMA remote THP vs NUMA local non-THP under MADV_HUGEPAGE Andrea Arcangeli
2019-01-30 7:17 ` Michal Hocko
2019-01-30 8:13 ` [LSF/MM TOPIC]: userfaultfd (was: [LSF/MM TOPIC] NUMA remote THP vs NUMA local non-THP under MADV_HUGEPAGE) Mike Rapoport
2019-01-30 9:23 ` Peter Xu
2019-01-31 9:54 ` Mike Rapoport
2019-01-30 14:43 ` Andrea Arcangeli
2019-01-30 23:14 ` [LSF/MM TOPIC] NUMA remote THP vs NUMA local non-THP under MADV_HUGEPAGE Mike Kravetz
2019-02-01 14:17 ` Mel Gorman [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190201141733.GC4926@suse.de \
--to=mgorman@suse.de \
--cc=aarcange@redhat.com \
--cc=blake.caldwell@colorado.edu \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lsf-pc@lists.linux-foundation.org \
--cc=mhocko@kernel.org \
--cc=mike.kravetz@oracle.com \
--cc=peterx@redhat.com \
--cc=rientjes@google.com \
--cc=rppt@linux.vnet.ibm.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.