public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>,
	Hugh Dickins <hughd@google.com>, Mel Gorman <mgorman@suse.de>,
	Rik van Riel <riel@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH, RFC 00/10] THP refcounting redesign
Date: Tue, 10 Jun 2014 10:10:56 +0200	[thread overview]
Message-ID: <5396BD90.4060104@suse.cz> (raw)
In-Reply-To: <1402329861-7037-1-git-send-email-kirill.shutemov@linux.intel.com>

On 06/09/2014 06:04 PM, Kirill A. Shutemov wrote:
> Hello everybody,
>
> We've discussed few times that is would be nice to allow huge pages to be
> mapped with 4k pages too. Here's my first attempt to actually implement
> this. It's early prototype and not stabilized yet, but I want to share it
> to discuss any potential show stoppers early.
>
> The main reason why we can't map THP with 4k is how refcounting on THP
> designed. It built around two requirements:
>
>    - split of huge page should never fail;
>    - we can't change interface of get_user_page();
>
> To be able to split huge page at any point we have to track which tail
> page was pinned. It leads to tricky and expensive get_page() on tail pages
> and also occupy tail_page->_mapcount.
>
> Most split_huge_page*() users want PMD to be split into table of PTEs and
> don't care whether compound page is going to be split or not.
>
> The plan is:
>
>   - allow split_huge_page() to fail if the page is pinned. It's trivial to
>     split non-pinned page and it doesn't require tail page refcounting, so
>     tail_page->_mapcount is free to be reused.
>
>   - introduce new routine -- split_huge_pmd() -- to split PMD into table of
>     PTEs. It splits only one PMD, not touching other PMDs the page is
>     mapped with or underlying compound page. Unlike new split_huge_page(),
>     split_huge_pmd() never fails.
>
> Fortunately, we have only few places where split_huge_page() is needed:
> swap out, memory failure, migration, KSM. And all of them can handle
> split_huge_page() fail.
>
> In new scheme we use tail_page->_mapcount is used to account how many time
> the tail page is mapped. head_page->_mapcount is used for both PMD mapping
> of whole huge page and PTE mapping of the firt 4k page of the compound
> page. It seems work fine, except the fact that we don't have a cheap way
> to check whether the page mapped with PMDs or not.
>
> Introducing split_huge_pmd() effectively allows THP to be mapped with 4k.
> It can break some kernel expectations. I.e. VMA now can start and end in
> middle of compound page. IIUC, it will break compactation and probably
> something else (any hints?).

I don't think compaction cares at all about VMA's. Unless the underlying 
page migration does. What will break is munlock due to 
VM_BUG_ON(PageTail(page)) in the PageTransHuge() check.

> Also munmap() on part of huge page will not split and free unmapped part
> immediately. We need to be careful here to keep memory footprint under
> control.

So who will take care of it, if it's not done immediately?

> As side effect we don't need to mark PMD splitting since we have
> split_huge_pmd(). get_page()/put_page() on tail of THP is cheaper (and
> cleaner) now.

But per patch 2, PageAnon() is more expensive. Also there are no side 
effects to this change?

> I will continue with stabilizing this. The patchset also available on
> git[1].
>
> Any commemnt?
>
> [1] git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git thp/refcounting/v1
>

  parent reply	other threads:[~2014-06-10  8:11 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-09 16:04 [PATCH, RFC 00/10] THP refcounting redesign Kirill A. Shutemov
2014-06-09 16:04 ` [PATCH, RFC 01/10] mm, thp: drop FOLL_SPLIT Kirill A. Shutemov
2014-06-09 16:04 ` [PATCH, RFC 02/10] mm: change PageAnon() to work on tail pages Kirill A. Shutemov
2014-06-09 16:04 ` [PATCH, RFC 03/10] thp: rename split_huge_page_pmd() to split_huge_pmd() Kirill A. Shutemov
2014-06-09 16:04 ` [PATCH, RFC 04/10] thp: PMD splitting without splitting compound page Kirill A. Shutemov
2014-06-09 16:04 ` [PATCH, RFC 05/10] mm, vmstats: new THP splitting event Kirill A. Shutemov
2014-06-09 16:04 ` [PATCH, RFC 06/10] thp: implement new split_huge_page() Kirill A. Shutemov
2014-06-09 16:04 ` [PATCH, RFC 07/10] mm, thp: remove infrastructure for handling splitting PMDs Kirill A. Shutemov
2014-06-09 16:04 ` [PATCH, RFC 08/10] x86, thp: remove " Kirill A. Shutemov
2014-06-09 16:04 ` [PATCH, RFC 09/10] futex, thp: remove special case for THP in get_futex_key Kirill A. Shutemov
2014-06-09 16:04 ` [PATCH, RFC 10/10] thp: update documentation Kirill A. Shutemov
2014-06-10  8:10 ` Vlastimil Babka [this message]
2014-06-10 13:52   ` [PATCH, RFC 00/10] THP refcounting redesign Kirill A. Shutemov
2014-06-10 14:29     ` Andrea Arcangeli
2014-06-10 15:24       ` Kirill A. Shutemov
2014-06-10 20:25 ` Christoph Lameter
2014-06-10 20:46   ` Kirill A. Shutemov
2014-06-10 21:21     ` Christoph Lameter
2014-06-10 22:04     ` Andrea Arcangeli
2014-06-10 22:14       ` Kirill A. Shutemov
2014-06-10 22:37         ` Andrea Arcangeli
2014-06-10 21:58   ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5396BD90.4060104@suse.cz \
    --to=vbabka@suse.cz \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@intel.com \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox