From: Andrea Arcangeli <aarcange@redhat.com>
To: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	ying.huang@intel.com, s.priebe@profihost.ag,
	mgorman@techsingularity.net,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	alex.williamson@redhat.com, lkp@01.org, kirill@shutemov.name,
	Andrew Morton <akpm@linux-foundation.org>,
	zi.yan@cs.rutgers.edu
Subject: Re: [patch 0/2 for-4.20] mm, thp: fix remote access and allocation regressions
Date: Wed, 5 Dec 2018 16:45:42 -0500
Message-ID: <20181205214542.GC11899@redhat.com>
In-Reply-To: <alpine.DEB.2.21.1812051142040.240991@chino.kir.corp.google.com>

On Wed, Dec 05, 2018 at 11:49:26AM -0800, David Rientjes wrote:
> High thp utilization is not always better, especially when those hugepages 
> are accessed remotely and introduce the regressions that I've reported.  
> Seeking high thp utilization at all costs is not the goal if it causes 
> workloads to regress.

Is it possible that what you need is a defrag=compactonly_thisnode
setting instead of the default defrag=madvise? The fact that you seem
concerned about page fault latencies doesn't make your workload an
obvious candidate for MADV_HUGEPAGE to begin with, at least unless you
decide to smooth the MADV_HUGEPAGE behavior with an mbind that will
simply add __GFP_THISNODE to the allocations. In that case you may be
even faster, because reclaim would then run on the local node for 4k
allocations too.
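A minimal userspace sketch of the MADV_HUGEPAGE + mbind combination, assuming Linux and glibc, with mbind(2) invoked through syscall() so libnuma isn't required. thp_bind_local() is a hypothetical helper name. Whether a single-node MPOL_BIND actually amounts to __GFP_THISNODE for THP faults depends on the kernel version under discussion.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall(), MAP_ANONYMOUS and MADV_HUGEPAGE */
#endif
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values from <linux/mempolicy.h> and <asm-generic/mman-common.h>,
 * defined here only as a fallback so libnuma headers are not needed. */
#ifndef MPOL_BIND
#define MPOL_BIND 2
#endif
#ifndef MADV_HUGEPAGE
#define MADV_HUGEPAGE 14
#endif

/*
 * Hypothetical helper: request THP for [addr, addr + len) and restrict the
 * range to a single NUMA node. addr must be page aligned (e.g. from mmap()).
 * Returns 0 on success or a negative errno value.
 */
static int thp_bind_local(void *addr, size_t len, int node)
{
	unsigned long mask;

	/* The kernel uses maxnode - 1 bits, so one unsigned long covers 0..62. */
	if (node < 0 || node >= (int)(sizeof(mask) * CHAR_BIT) - 1)
		return -EINVAL;

	/* Opt this range into THP at fault time (defrag=madvise honors this). */
	if (madvise(addr, len, MADV_HUGEPAGE) != 0)
		return -errno;

	/* Bind to one node: faults in this range may only allocate there. */
	mask = 1UL << node;
	if (syscall(SYS_mbind, addr, len, MPOL_BIND, &mask,
		    sizeof(mask) * CHAR_BIT, 0) != 0)
		return -errno;

	return 0;
}
```

Call it right after mmap() and before the first touch so the policy applies to the initial faults. Expect -EINVAL from madvise() on kernels built without THP, and -EPERM in sandboxes whose seccomp filter blocks mbind.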

It looks like for your workload THP is a nice-to-have add-on, which is
practically true of all workloads (with a few corner cases that must
use MADV_NOHUGEPAGE), and it's what the defrag= default is about.

Is it possible that you just don't want to shut off compaction in the
page fault completely, and that if you're OK with that for your
library, you may be OK with it for all other apps too?

That's a different stance from other MADV_HUGEPAGE users, because you
don't seem to mind severely crippled THP utilization in your app.

With your patch, utilization will go down a lot compared to the
previous, swap-storm-capable __GFP_THISNODE behavior, and you're still
fine with that. The fact that you're fine with that points in the
direction of changing the default tuning for defrag= to something
stronger than madvise (that is precisely the default setting that
forces you to use MADV_HUGEPAGE to get a chance at some THP once in a
while during the page fault, after some uptime).
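For reference, the defrag= tuning is exposed as /sys/kernel/mm/transparent_hugepage/defrag, which lists every mode and brackets the active one. A small C sketch for checking which mode is in effect before relying on MADV_HUGEPAGE; thp_active_mode() and thp_defrag_mode() are hypothetical helper names, and the sample string assumes a kernel that offers the defer+madvise mode.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the bracketed (active) mode from a THP sysfs line such as
 *   "always defer defer+madvise [madvise] never"
 * Returns 0 and copies the mode into out, or -1 if no bracketed mode is
 * found or out is too small.
 */
static int thp_active_mode(const char *line, char *out, size_t outlen)
{
	const char *open = strchr(line, '[');
	const char *close = open ? strchr(open + 1, ']') : NULL;
	size_t n;

	if (!open || !close)
		return -1;

	n = (size_t)(close - open - 1);
	if (n + 1 > outlen)
		return -1;

	memcpy(out, open + 1, n);
	out[n] = '\0';
	return 0;
}

/* Read the live defrag setting; returns -1 if the file is unavailable. */
static int thp_defrag_mode(char *out, size_t outlen)
{
	char line[256];
	FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/defrag", "r");
	int ok;

	if (!f)
		return -1;
	ok = fgets(line, sizeof(line), f) != NULL;
	fclose(f);
	return ok ? thp_active_mode(line, out, outlen) : -1;
}
```

On a kernel with the default tuning, thp_defrag_mode() should yield "madvise", meaning only MADV_HUGEPAGE regions get direct compaction at fault time.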

Considering that mbind surprisingly isn't privileged (so I suppose it
may cause swap storms equivalent to __GFP_THISNODE if used maliciously,
after all), you could even consider a defrag=thisnode setting that
forces compaction and defrag local to the node, to retain your THP+NUMA
dynamic partitioning behavior that ends up swapping heavily in the
local node.
