From: Michal Hocko <mhocko@kernel.org>
To: lkp@lists.01.org
Subject: MADV_HUGEPAGE vs. NUMA semantic (was: Re: [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression)
Date: Thu, 06 Dec 2018 10:14:06 +0100
Message-ID: <20181206091405.GD1286@dhcp22.suse.cz>
In-Reply-To: <CAHk-=wjm9V843eg0uesMrxKnCCq7UfWn8VJ+z-cNztb_0fVW6A@mail.gmail.com>


On Wed 05-12-18 16:58:02, Linus Torvalds wrote:
[...]
> I realize that we probably do want to just have explicit policies that
> do not exist right now, but what are (a) sane defaults, and (b) sane
> policies?

I would focus on the current default first (which is defrag=madvise).
This means that, without MADV_HUGEPAGE, we only make the cheapest possible
THP attempt. If there is none, we simply fall back. We do restrict the
attempt to the local node. I guess there is general agreement that this
is a sane default.
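The defrag mode referred to here is the one exposed in
/sys/kernel/mm/transparent_hugepage/defrag, where the kernel lists all
modes and brackets the active one (e.g. "always defer defer+madvise
[madvise] never"). A minimal C sketch for extracting the active mode from
that line; the helper name thp_active_mode() is hypothetical, and reading
the file (e.g. with fgets()) is left to the caller:

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed (active) mode out of a sysfs THP setting line such as
 * "always defer defer+madvise [madvise] never". Returns 0 on success, -1 if
 * no bracketed token is found or it does not fit into out.
 * thp_active_mode() is a hypothetical helper, not a kernel or libc API.
 */
int thp_active_mode(const char *line, char *out, size_t outlen)
{
    const char *open = strchr(line, '[');
    if (open == NULL)
        return -1;

    const char *close = strchr(open + 1, ']');
    if (close == NULL)
        return -1;

    size_t len = (size_t)(close - (open + 1));
    if (outlen == 0 || len >= outlen)
        return -1;              /* no room for the token plus NUL */

    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}
```

With defrag=madvise, as described above, the call yields "madvise".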

MADV_HUGEPAGE changes the picture because the caller has expressed a need
for THP and is willing to go the extra mile to get it. That involves
allocation latency and, as of now, also potential remote access. We do
not have complete agreement on the latter, but the prevailing argument is
that any strong NUMA locality is just reinventing the node-reclaim story
again or sends the THP success rate down the toilet (to quote Mel). I agree
that we do not want to fall back to a remote node overeagerly. I believe
that something like the following would be sensible:
	1) THP on a local node with compaction not giving up too early
	2) THP on a remote node in NOWAIT mode - so no direct
	   compaction/reclaim (trigger kswapd/kcompactd only for
	   defrag=defer+madvise)
	3) fall back to the base page allocation
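The three-step order can be modelled as a pure decision function. This is
a hedged userspace sketch, not kernel code: struct node_view, enum
alloc_result and thp_madvised_alloc() are hypothetical, and each node is
reduced to three booleans (a huge page is free right away, direct
compaction could produce one, a base page is available):

```c
#include <stdbool.h>

/* Hypothetical, simplified view of what one NUMA node can provide. */
struct node_view {
    bool thp_free;          /* a huge page is available without any work */
    bool thp_after_compact; /* direct compaction would produce a huge page */
    bool base_free;         /* a base page can be allocated */
};

enum alloc_result {
    ALLOC_THP_LOCAL,  /* step 1 */
    ALLOC_THP_REMOTE, /* step 2 */
    ALLOC_BASE_PAGE,  /* step 3 */
    ALLOC_FAILED,
};

/*
 * Sketch of the proposed MADV_HUGEPAGE fault order:
 *  1) local THP, allowing direct compaction;
 *  2) remote THP in NOWAIT mode: only an already free huge page counts,
 *     no direct compaction/reclaim;
 *  3) fall back to a base page.
 * *wake_background reports whether kswapd/kcompactd would be kicked after
 * the NOWAIT attempt fails (only for defrag=defer+madvise).
 */
enum alloc_result thp_madvised_alloc(const struct node_view *local,
                                     const struct node_view *remote,
                                     bool defer_madvise,
                                     bool *wake_background)
{
    *wake_background = false;

    if (local->thp_free || local->thp_after_compact)
        return ALLOC_THP_LOCAL;

    if (remote->thp_free)
        return ALLOC_THP_REMOTE;

    /* NOWAIT attempt failed: no direct work, maybe wake background threads. */
    if (defer_madvise)
        *wake_background = true;

    /* Ordinary base-page allocation, which may itself use either node. */
    if (local->base_free || remote->base_free)
        return ALLOC_BASE_PAGE;

    return ALLOC_FAILED;
}
```

A remote node whose huge page would only appear after compaction is never
used in step 2; that is exactly what the NOWAIT restriction buys.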

This would allow full memory utilization while trying to stay as local as
possible. Whoever strongly prefers NUMA locality should be using
MPOL_NODE_RECLAIM (or similar), which would skip 2) and make the remaining
steps, 1) and 3), use more aggressive compaction and reclaim.
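For callers opting into the stricter locality policy, the ladder loses its
remote step. A sketch of the resulting plan, with hypothetical names
(enum locality_policy, struct fault_plan, thp_fault_plan()); note that
MPOL_NODE_RECLAIM is only a proposal in this thread, not an existing
mempolicy mode. The sketch reads "more aggressive compaction and reclaim"
as applying to the two remaining local steps, since the remote one is
skipped:

```c
#include <stdbool.h>

/* Hypothetical policy selector; MPOL_NODE_RECLAIM does not exist today. */
enum locality_policy {
    POLICY_DEFAULT,      /* plain MADV_HUGEPAGE behaviour */
    POLICY_NODE_RECLAIM, /* proposed MPOL_NODE_RECLAIM-like behaviour */
};

/* Which steps of the fault-time ladder run, and how hard they try. */
struct fault_plan {
    bool try_remote_thp;          /* step 2: remote THP in NOWAIT mode */
    bool aggressive_local_thp;    /* step 1 with harder compaction/reclaim */
    bool aggressive_base_reclaim; /* step 3 with harder local reclaim */
};

struct fault_plan thp_fault_plan(enum locality_policy policy)
{
    struct fault_plan plan = {
        .try_remote_thp = true,
        .aggressive_local_thp = false,
        .aggressive_base_reclaim = false,
    };

    if (policy == POLICY_NODE_RECLAIM) {
        plan.try_remote_thp = false;         /* skip step 2 entirely */
        plan.aggressive_local_thp = true;    /* work harder locally for THP */
        plan.aggressive_base_reclaim = true; /* and for base pages */
    }
    return plan;
}
```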

This would also fit into our existing NUMA API. Obviously, MPOL_NODE_RECLAIM
wouldn't be restricted to THP. It would act on base pages as well, and it
would basically reuse the implementation we already have for the global
node_reclaim and make it usable again.
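The global node_reclaim mentioned here is switched on today through the
vm.zone_reclaim_mode sysctl (/proc/sys/vm/zone_reclaim_mode), which the
kernel's vm sysctl documentation describes as a bitmask: 1 turns zone
reclaim on, 2 lets it write out dirty pages, 4 lets it unmap (swap out)
pages. A small decoder sketch; parse_zone_reclaim_mode() and struct
zone_reclaim_cfg are hypothetical names:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Bit values documented for vm.zone_reclaim_mode. */
#define ZRM_ON    1u /* documented as "zone reclaim on" */
#define ZRM_WRITE 2u /* reclaim may write out dirty pages */
#define ZRM_UNMAP 4u /* reclaim may unmap (swap out) pages */

struct zone_reclaim_cfg {
    bool enabled;
    bool write_dirty;
    bool unmap;
};

/*
 * Parse the text of /proc/sys/vm/zone_reclaim_mode, e.g. "0\n" or "7\n".
 * Returns 0 on success, -1 if the text does not start with a digit or has
 * trailing garbage.
 */
int parse_zone_reclaim_mode(const char *text, struct zone_reclaim_cfg *cfg)
{
    if (*text < '0' || *text > '9')
        return -1;

    char *end;
    unsigned long v = strtoul(text, &end, 10);
    while (*end == '\n' || *end == ' ')
        end++;
    if (*end != '\0')
        return -1;

    cfg->enabled = (v & ZRM_ON) != 0;
    cfg->write_dirty = (v & ZRM_WRITE) != 0;
    cfg->unmap = (v & ZRM_UNMAP) != 0;
    return 0;
}
```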

Does this sound at least remotely sane?
-- 
Michal Hocko
SUSE Labs
