From: Michal Hocko <mhocko@kernel.org>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Jonathan Corbet <corbet@lwn.net>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Vlastimil Babka <vbabka@suse.cz>,
Mel Gorman <mgorman@techsingularity.net>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] mm, thp: always direct reclaim for MADV_HUGEPAGE even when deferred
Date: Fri, 23 Dec 2016 09:51:50 +0100
Message-ID: <20161223085150.GA23109@dhcp22.suse.cz>
In-Reply-To: <alpine.DEB.2.10.1612221259100.29036@chino.kir.corp.google.com>
On Thu 22-12-16 13:05:27, David Rientjes wrote:
> On Thu, 22 Dec 2016, Michal Hocko wrote:
>
> > > Currently, when defrag is set to "madvise", thp allocations will direct
> > > reclaim. However, when defrag is set to "defer", all thp allocations do
> > > not attempt reclaim regardless of MADV_HUGEPAGE.
> > >
> > > This patch always directly reclaims for MADV_HUGEPAGE regions when defrag
> > > is not set to "never." The idea is that MADV_HUGEPAGE regions really
> > > want to be backed by hugepages and are willing to endure the latency at
> > > fault as it was the default behavior prior to commit 444eb2a449ef ("mm:
> > > thp: set THP defrag by default to madvise and add a stall-free defrag
> > > option").
> >
> > AFAIR "defer" is implemented exactly as intended. To offer a never-stall
> > but allow to form THP in the background option. The patch description
> > doesn't explain why this is not good anymore. Could you give us more
> > details about the motivation and why "madvise" doesn't work for
> > you? This is a user visible change so the reason should better be really
> > documented and strong.
> >
>
> The offering of defer breaks backwards compatibility with previous
> settings of defrag=madvise, where we could set madvise(MADV_HUGEPAGE) on
> .text segment remap and try to force thp backing if available but not
> directly reclaim for non VM_HUGEPAGE vmas.
I do not understand the backward compatibility part here. Maybe I am
missing something, but the semantics of defrag=madvise haven't changed,
and a new flag can hardly break backward compatibility.
> This was very advantageous.
> We prefer that to stay unchanged and allow kcompactd compaction to be
> triggered in background by everybody else as opposed to direct reclaim.
> We do not have that ability without this patch.
So why don't you use defrag=madvise?
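
For reference, a minimal sketch of checking which defrag mode is
currently active. It assumes the usual sysfs location
/sys/kernel/mm/transparent_hugepage/defrag and the usual format where
the active value is shown in brackets; the helper names active_choice
and current_defrag are made up for illustration.

```python
from pathlib import Path

# Assumed location of the THP defrag knob on a kernel built with THP support.
DEFRAG_PATH = Path("/sys/kernel/mm/transparent_hugepage/defrag")


def active_choice(text: str) -> str:
    """Return the bracketed (active) token from a sysfs multiple-choice line."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no bracketed choice in {text!r}")


def current_defrag(path: Path = DEFRAG_PATH) -> str:
    """Read the knob and return the active defrag mode (Linux only)."""
    return active_choice(path.read_text())
```

Switching modes is then a matter of writing the wanted value, e.g.
"madvise", to the same file as root.
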
> Without this patch, we will be forced to offer multiple sysfs tunables to
> define (1) direct vs background compact, (2) madvise behavior, (3) always,
> (4) never and we cannot have 2^4 settings for "defrag" alone.
I disagree. I think the current set of defrag values is sufficient. We
can completely disable direct reclaim ("never"), enable it only for
opt-in regions ("madvise"), enable it for everybody ("always"), or never
stall while still forming THP in the background ("defer"). The advantage
of this set of values is that they have _clear_ semantics and behave
consistently. If you change defer to "almost never stall, except for
MADV_HUGEPAGE", then the semantics become less clear. An admin might have
a good reason to never allow stalls, especially when they have no control
over the code they are running. Your patch would break this use case.
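
To make the four values concrete, here is a rough model of the
semantics as described in this thread. It is not kernel code:
may_stall_at_fault and its "patched" parameter are hypothetical names,
and the function only answers whether a THP fault may enter direct
reclaim (and so stall).

```python
def may_stall_at_fault(defrag: str, madv_hugepage: bool, patched: bool = False) -> bool:
    """Model: may a THP page fault enter direct reclaim under this defrag mode?

    defrag        -- one of "always", "defer", "madvise", "never"
    madv_hugepage -- True if the region was marked with madvise(MADV_HUGEPAGE)
    patched       -- True to model the change proposed in this thread
    """
    if defrag == "always":
        return True                      # everybody may stall
    if defrag == "madvise":
        return madv_hugepage             # only opt-in regions may stall
    if defrag == "defer":
        # Current behaviour: nobody stalls, compaction runs in the background.
        # Proposed behaviour: MADV_HUGEPAGE regions stall after all.
        return patched and madv_hugepage
    if defrag == "never":
        return False                     # no direct reclaim at all
    raise ValueError(f"unknown defrag mode: {defrag!r}")


if __name__ == "__main__":
    for mode in ("always", "defer", "madvise", "never"):
        for madvised in (False, True):
            print(mode, madvised,
                  may_stall_at_fault(mode, madvised),
                  may_stall_at_fault(mode, madvised, patched=True))
```

The only row that differs between the two columns is defer with
MADV_HUGEPAGE, which is exactly the case where an admin who chose defer
to avoid stalls would start seeing them.
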

If we want to provide better background THP availability, we should
focus more on kcompactd and how it is invoked. Currently we only wake it
up from the page allocation path. Long term, I believe we want to make it
more watermark-based, similar to kswapd. Vlastimil is already playing
with this idea. I would prefer such a long-term plan over tweaking the
THP configuration.
--
Michal Hocko
SUSE Labs