linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Alex Williamson <alex.williamson@redhat.com>,
	David Rientjes <rientjes@google.com>
Subject: Re: [PATCH 0/2] fix for "pathological THP behavior"
Date: Tue, 21 Aug 2018 18:05:04 -0400	[thread overview]
Message-ID: <20180821220504.GH13047@redhat.com> (raw)
In-Reply-To: <6120e1b6-b4d2-96cb-2555-d8fab65c23c8@suse.cz>

On Tue, Aug 21, 2018 at 05:30:11PM +0200, Vlastimil Babka wrote:
> If it's "not possible to compact" then the expected outcome of this is
> to fail?

It'll just call __alloc_pages_direct_compact once and fail if that
fails.

> You could do that without calling watermark checking explicitly, but
> it's rather complicated:
> 
> 1. try alocating with __GFP_THISNODE and ~GFP_DIRECT_RECLAIM
> 2. if that fails, try PAGE_SIZE with same flags
> 3. if that fails, try THP size without __GFP_THISNODE
> 4. PAGE_SIZE without __GFP_THISNODE

It's not complicated, it's slow, why to call 4 times into the
allocator, just to skip 1 watermark check?

> Yeah, not possible in current alloc_pages_vma() which should return the
> requested order. But the advantage is that it's not prone to races
> between watermark checking and actual attempt.

Watermark checking is always racy, zone_watermark_fast doesn't take
any lock before invoking rmqueue.

The racy in this case is the least issue because it doesn't need to be
perfect: if once in a while a THP is allocated from a remote node
despite there was a bit more of PAGE_SIZEd memory free than expected
in the local node it wouldn't be an issue. If it's wrong in the other
way around it'll just behave not-optimally (like today) once in a while.

> Frankly, I would rather go with this option and assume that if someone
> explicitly wants THP's, they don't care about NUMA locality that much.

I'm fine either ways, either way will work, large NUMA will prefer
__GFP_COMPACT_ONLY option 1), small NUMA will likely prefer option
2) and making this configurable at runtime with a different default is
possible too later but then I'm not sure it's worth it.

The main benefit of 1) really is to cause the least possible
interference to NUMA balancing (and the further optimization possible
with the watermark check would not cause interference either).

> (Note: I hate __GFP_THISNODE, it's an endless source of issues.)
> Trying to be clever about "is there still PAGE_SIZEd free memory in the
> local node" is imperfect anyway. If there isn't, is it because there's
> clean page cache that we can easily reclaim (so it would be worth
> staying local) or is it really exhausted? Watermark check won't tell...

I'm not sure if it's worth reclaiming cache to stay local, I mean it
totally depends on the app running, reclaim cache to stay local is
even harder to tell if it's worth it. This is why especially on small
NUMA option 2) that removes __GFP_THISNODE is likely to perform best.

However the tradeoff about clean cache reclaiming or not combined if
to do or not the watermark check on the PAGE_SIZEd free memory, is all
about the MADV_HUGEPAGE case only.

The default case (not even calling compaction in the page fault) would
definitely benefit from the (imperfect racy) heuristic "is there still
PAGE_SIZEd free memory in the local node" and that remains true even
if we go with option 2) to solve the bug (option 2 only changes the
behavior of MADV_HUGEPAGE, not the default).

The imperfect watermark check it's net gain for the default case
without __GFP_DIRECT_RECLAIM set, because currently the code has zero
chance to get THP even if already free and available in other nodes
and it only tries to get them from the local node. It costs a
watermark fast check only to get THP from all nodes if already
available (or if made available from kswapd). Costs 1 cacheline just
for the pcp test, but then we're in the page allocator anyway so all
cachelines involving watermarks may be activated regardless.

Thanks,
Andrea

  parent reply	other threads:[~2018-08-21 22:05 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-20  3:22 [PATCH 0/2] fix for "pathological THP behavior" Andrea Arcangeli
2018-08-20  3:22 ` [PATCH 1/2] mm: thp: consolidate policy_nodemask call Andrea Arcangeli
2018-08-20  3:22 ` [PATCH 2/2] mm: thp: fix transparent_hugepage/defrag = madvise || always Andrea Arcangeli
2018-08-20  3:26   ` [PATCH 0/1] fix for "pathological THP behavior" v2 Andrea Arcangeli
2018-08-20  3:26     ` [PATCH 1/1] mm: thp: fix transparent_hugepage/defrag = madvise || always Andrea Arcangeli
2018-08-20 12:35   ` [PATCH 2/2] " Zi Yan
2018-08-20 15:32     ` Andrea Arcangeli
2018-08-21 11:50   ` Michal Hocko
2018-08-21 21:40     ` Andrea Arcangeli
2018-08-22  9:02       ` Michal Hocko
2018-08-22 11:07         ` Michal Hocko
2018-08-22 14:24           ` Andrea Arcangeli
2018-08-22 14:45             ` Michal Hocko
2018-08-22 15:24               ` Andrea Arcangeli
2018-08-23 10:50                 ` Michal Hocko
2018-08-22 15:52         ` Andrea Arcangeli
2018-08-23 10:52           ` Michal Hocko
2018-08-28  7:53             ` Michal Hocko
2018-08-28  8:18               ` Michal Hocko
2018-08-28  8:54                 ` Stefan Priebe - Profihost AG
2018-08-29 11:11                   ` Stefan Priebe - Profihost AG
     [not found]                 ` <D5F4A33C-0A37-495C-9468-D6866A862097@cs.rutgers.edu>
2018-08-29 14:28                   ` Michal Hocko
2018-08-29 14:35                     ` Michal Hocko
2018-08-29 15:22                       ` Zi Yan
2018-08-29 15:47                         ` Michal Hocko
2018-08-29 16:06                           ` Zi Yan
2018-08-29 16:25                             ` Michal Hocko
2018-08-29 19:24                               ` [PATCH] mm, thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings Michal Hocko
2018-08-29 22:54                                 ` Zi Yan
2018-08-30  7:00                                   ` Michal Hocko
2018-08-30 13:22                                     ` Zi Yan
2018-08-30 13:45                                       ` Michal Hocko
2018-08-30 14:02                                         ` Zi Yan
2018-08-30 16:19                                           ` Stefan Priebe - Profihost AG
2018-08-30 16:40                                           ` Michal Hocko
2018-09-05  3:44                                             ` Andrea Arcangeli
2018-09-05  7:08                                               ` Michal Hocko
2018-09-06 11:10                                                 ` Vlastimil Babka
2018-09-06 11:16                                                   ` Vlastimil Babka
2018-09-06 11:25                                                     ` Michal Hocko
2018-09-06 12:35                                                       ` Zi Yan
2018-09-06 10:59                                   ` Vlastimil Babka
2018-09-06 11:17                                     ` Zi Yan
2018-08-30  6:47                                 ` Michal Hocko
2018-09-06 11:18                                   ` Vlastimil Babka
2018-09-06 11:27                                     ` Michal Hocko
2018-09-12 17:29                                 ` Mel Gorman
2018-09-17  6:11                                   ` Michal Hocko
2018-09-17  7:04                                     ` Stefan Priebe - Profihost AG
2018-09-17  9:32                                       ` Stefan Priebe - Profihost AG
2018-09-17 11:27                                       ` Michal Hocko
2018-08-20 11:58 ` [PATCH 0/2] fix for "pathological THP behavior" Kirill A. Shutemov
2018-08-20 15:19   ` Andrea Arcangeli
2018-08-21 15:30     ` Vlastimil Babka
2018-08-21 17:26       ` David Rientjes
2018-08-21 22:18         ` Andrea Arcangeli
2018-08-21 22:05       ` Andrea Arcangeli [this message]
2018-08-22  9:24       ` Michal Hocko
2018-08-22 15:56         ` Andrea Arcangeli
2018-08-20 19:06   ` Yang Shi
2018-08-20 23:24     ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180821220504.GH13047@redhat.com \
    --to=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=kirill@shutemov.name \
    --cc=linux-mm@kvack.org \
    --cc=rientjes@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).