From: Vlastimil Babka <vbabka@suse.cz>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>,
Hugh Dickins <hughd@google.com>, Mel Gorman <mgorman@suse.de>,
Rik van Riel <riel@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH, RFC 00/10] THP refcounting redesign
Date: Tue, 10 Jun 2014 10:10:56 +0200
Message-ID: <5396BD90.4060104@suse.cz>
In-Reply-To: <1402329861-7037-1-git-send-email-kirill.shutemov@linux.intel.com>

On 06/09/2014 06:04 PM, Kirill A. Shutemov wrote:
> Hello everybody,
>
> We've discussed a few times that it would be nice to allow huge pages to be
> mapped with 4k pages too. Here's my first attempt to actually implement
> this. It's an early prototype and not stabilized yet, but I want to share it
> to discuss any potential show stoppers early.
>
> The main reason why we can't map THP with 4k is how refcounting on THP is
> designed. It is built around two requirements:
>
> - split of huge page should never fail;
> - we can't change the interface of get_user_pages();
>
> To be able to split a huge page at any point we have to track which tail
> page was pinned. That leads to a tricky and expensive get_page() on tail
> pages and also occupies tail_page->_mapcount.
>
> Most split_huge_page*() users want PMD to be split into table of PTEs and
> don't care whether compound page is going to be split or not.
>
> The plan is:
>
> - allow split_huge_page() to fail if the page is pinned. It's trivial to
> split a non-pinned page and it doesn't require tail page refcounting, so
> tail_page->_mapcount is free to be reused.
>
> - introduce a new routine -- split_huge_pmd() -- to split a PMD into a table
> of PTEs. It splits only one PMD, not touching other PMDs the page is
> mapped with or the underlying compound page. Unlike the new
> split_huge_page(), split_huge_pmd() never fails.
>
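
The two-level plan above can be pictured with a small userspace toy model.
This is only a sketch, not code from the patchset: struct toy_thp, the
toy_* function names, TOY_NR_PMDS and the -EBUSY return value are all
assumptions made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_NR_PMDS 4	/* hypothetical: number of PMDs mapping the page */

/* Hypothetical toy model of one huge page; not a kernel structure. */
struct toy_thp {
	int pins;			/* extra references, e.g. from GUP */
	bool compound;			/* still a single compound page? */
	bool pmd_mapped[TOY_NR_PMDS];	/* true = PMD, false = table of PTEs */
};

/*
 * Split one PMD mapping into PTEs. Never fails, and leaves the other
 * mappings and the compound page itself untouched.
 */
static void toy_split_huge_pmd(struct toy_thp *page, int idx)
{
	page->pmd_mapped[idx] = false;
}

/*
 * Split the compound page itself. In this model it is allowed to fail
 * when the page is pinned (the error code is an assumption).
 */
static int toy_split_huge_page(struct toy_thp *page)
{
	if (page->pins > 0)
		return -EBUSY;

	for (int i = 0; i < TOY_NR_PMDS; i++)
		toy_split_huge_pmd(page, i);
	page->compound = false;
	return 0;
}
```

In this model a caller that only needs PTEs uses toy_split_huge_pmd() and
never has to handle an error, while a caller of toy_split_huge_page() has
to cope with a pinned page and retry or give up.
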
> Fortunately, we have only a few places where split_huge_page() is needed:
> swap out, memory failure, migration, KSM. And all of them can handle
> split_huge_page() failure.
>
> In the new scheme tail_page->_mapcount is used to account how many times
> the tail page is mapped. head_page->_mapcount is used for both PMD mapping
> of the whole huge page and PTE mapping of the first 4k page of the compound
> page. It seems to work fine, except for the fact that we don't have a cheap
> way to check whether the page is mapped with PMDs or not.
>
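
To make the ambiguity in the head page's counter concrete, here is a
simplified userspace sketch. It is not the kernel's data layout: struct
toy_compound and the toy_map_* helpers are hypothetical, and the counters
start at 0 here rather than at the kernel's -1.

```c
#include <assert.h>

#define TOY_HPAGE_NR 512	/* 4k subpages in a 2M huge page on x86-64 */

/* Hypothetical model: one mapcount per subpage, index 0 is the head. */
struct toy_compound {
	int mapcount[TOY_HPAGE_NR];
};

/* A PMD mapping of the whole huge page is accounted on the head. */
static void toy_map_pmd(struct toy_compound *c)
{
	c->mapcount[0]++;
}

/* A PTE mapping of subpage 'sub' is accounted on that subpage. */
static void toy_map_pte(struct toy_compound *c, int sub)
{
	c->mapcount[sub]++;
}
```

With this accounting, a page mapped once by a PMD and a page whose first
4k subpage is mapped once by a PTE end up with the same head counter, so
the counter alone cannot say whether a PMD mapping exists.
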
> Introducing split_huge_pmd() effectively allows THP to be mapped with 4k.
> It can break some kernel expectations. E.g. a VMA can now start and end in
> the middle of a compound page. IIUC, it will break compaction and probably
> something else (any hints?).

I don't think compaction cares about VMAs at all, unless the underlying
page migration does. What will break is munlock, due to the
VM_BUG_ON(PageTail(page)) in the PageTransHuge() check.

> Also munmap() on part of a huge page will not split and free the unmapped part
> immediately. We need to be careful here to keep memory footprint under
> control.

So who will take care of it, if it's not done immediately?

> As a side effect we don't need to mark PMD splitting since we have
> split_huge_pmd(). get_page()/put_page() on tail of THP is cheaper (and
> cleaner) now.

But per patch 2, PageAnon() is more expensive. Also, are there no side
effects to this change?

> I will continue with stabilizing this. The patchset is also available on
> git[1].
>
> Any comments?
>
> [1] git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git thp/refcounting/v1
>