public inbox for linux-kernel@vger.kernel.org
From: Vlastimil Babka <vbabka@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>,
	Davidlohr Bueso <dave@stgolabs.net>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Hugh Dickins <hughd@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Michal Hocko <mhocko@suse.cz>,
	Ebru Akagunduz <ebru.akagunduz@gmail.com>,
	Alex Thorlton <athorlton@sgi.com>,
	David Rientjes <rientjes@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: [RFC 0/6] the big khugepaged redesign
Date: Tue, 24 Feb 2015 11:32:30 +0100
Message-ID: <54EC533E.8040805@suse.cz>
In-Reply-To: <20150223145619.64f3a225b914034a17d4f520@linux-foundation.org>

On 02/23/2015 11:56 PM, Andrew Morton wrote:
> On Mon, 23 Feb 2015 14:46:43 -0800 Davidlohr Bueso <dave@stgolabs.net> wrote:
>
>> On Mon, 2015-02-23 at 13:58 +0100, Vlastimil Babka wrote:
>>> Recently, there was concern expressed (e.g. [1]) whether the quite aggressive
>>> THP allocation attempts on page faults are a good performance trade-off.
>>>
>>> - THP allocations add to page fault latency, as high-order allocations are
>>>    notoriously expensive. Page allocation slowpath now does extra checks for
>>>    GFP_TRANSHUGE && !PF_KTHREAD to avoid the more expensive synchronous
>>>    compaction for user page faults. But even async compaction can be expensive.
>>> - During the first page fault in a 2MB range we cannot predict how much of the
>>>    range will be actually accessed - we can theoretically waste as much as 511
>>>    worth of pages [2]. Or, the pages in the range might be accessed from CPUs
>>>    from different NUMA nodes and while base pages could be all local, THP could
>>>    be remote to all but one CPU. The cost of remote accesses due to this false
>>>    sharing would be higher than any savings on the TLB.
>>> - The interaction with memcg are also problematic [1].
>>>
>>> Now I don't have any hard data to show how big these problems are, and I
>>> expect we will discuss this on LSF/MM (and hope somebody has such data [3]).
>>> But it's certain that e.g. SAP recommends to disable THPs [4] for their apps
>>> for performance reasons.
>>
>> There are plenty of examples of this, ie for Oracle:
>>
>> https://blogs.oracle.com/linux/entry/performance_issues_with_transparent_huge
>
> hm, five months ago and I don't recall seeing any followup to this.

Actually it's a year and five months, but nevertheless...

> Does anyone know what's happening?

I would suspect mmap_sem being held during the whole THP page fault
(including the needed reclaim and compaction), which I forgot to mention
in the first e-mail. So it's not just a problem of page fault latency, but
also of potentially holding back other processes, which is why we should
allow shifting from THP page faults to deferred collapsing.
That said, the attempts at opportunistic page faults without mmap_sem
would also help in this particular case.

Khugepaged also used to hold mmap_sem (for read) during the allocation
attempt, but that has been fixed since. It could also be zone lru_lock
pressure.
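
For reference, the "511 pages" worst case quoted above follows from simple
arithmetic: a 2MB range holds 512 base pages, so a THP allocated on the first
fault wastes up to 511 of them if only one is ever touched. A minimal sketch
of that calculation, assuming 4KB base pages (as on x86-64); the function and
constant names are made up for illustration and are not kernel identifiers:

```python
# Illustrative arithmetic only; names here are hypothetical, not kernel symbols.
BASE_PAGE_SIZE = 4 * 1024          # assumed 4KB base page (x86-64)
HUGE_PAGE_SIZE = 2 * 1024 * 1024   # 2MB transparent huge page


def pages_per_huge_page(huge=HUGE_PAGE_SIZE, base=BASE_PAGE_SIZE):
    """Number of base pages that fit into one huge page."""
    return huge // base


def wasted_pages(touched_pages=1, huge=HUGE_PAGE_SIZE, base=BASE_PAGE_SIZE):
    """Base pages backed by the THP that the task never accessed."""
    total = pages_per_huge_page(huge, base)
    if not 1 <= touched_pages <= total:
        raise ValueError("touched_pages must be between 1 and %d" % total)
    return total - touched_pages


if __name__ == "__main__":
    # Worst case: only the page that faulted is ever used.
    print(pages_per_huge_page(), wasted_pages(1))
```

With other base or huge page sizes the same two functions apply; only the
constants change.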


