linux-mm.kvack.org archive mirror
From: Wanpeng Li <liwanp@linux.vnet.ibm.com>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	linux-mm@kvack.org, Andi Kleen <ak@linux.intel.com>,
	"H. Peter Anvin" <hpa@linux.intel.com>,
	linux-kernel@vger.kernel.org,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Gavin Shan <shangw@linux.vnet.ibm.com>
Subject: Re: [PATCH, RFC 0/9] Introduce huge zero page
Date: Fri, 10 Aug 2012 11:49:12 +0800
Message-ID: <20120810034912.GA31071@hacker.(null)>
In-Reply-To: <1344503300-9507-1-git-send-email-kirill.shutemov@linux.intel.com>

On Thu, Aug 09, 2012 at 12:08:11PM +0300, Kirill A. Shutemov wrote:
>From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
>During testing I noticed big (up to 2.5 times) memory consumption overhead
>on some workloads (e.g. ft.A from NPB) if THP is enabled.
>
>The main reason for that big difference is the lack of a zero page in the
>THP case. We have to allocate a real page on a read page fault.
>
>A program to demonstrate the issue:
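>#include <assert.h>
>#include <stdlib.h>
>#include <unistd.h>
>
>#define MB (1024 * 1024)
>
>int main(void)
>{
>        char *p;
>        int i;
>
>        /* 200MB buffer, 2MB-aligned so THP can back it with huge pages */
>        if (posix_memalign((void **)&p, 2 * MB, 200 * MB))
>                return 1;
>        /* read (never write) one byte in every 4K page */
>        for (i = 0; i < 200 * MB; i += 4096)
>                assert(p[i] == 0);
>        pause();        /* stay alive so RSS can be inspected */
>        return 0;
>}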
>
>With thp-never RSS is about 400k, but with thp-always it's 200M.
>After the patchset thp-always RSS is 400k too.
>
Hi Kirill,

Thank you for the patchset. I have a few questions.

1. With your patchset, a read page fault populates the pmd with the huge
zero page. IIUC, assert(p[i] == 0) is a read operation, so why is the
thp-always RSS 400K? The buffer spans 100 huge pages; why would each cost
4K? I would expect the overhead to be 2MB for the huge zero page rather
than 400K. What am I missing?

2. If the user wants to allocate 200MB, 100 huge pages are needed in
total. With your patchset, the code allocates one 2MB huge zero page and
maps it in every associated pmd. When the user then writes to one of these
pages, a write-protect fault is triggered. If the huge page allocation
fails, the code falls back to do_huge_pmd_wp_zero_page_fallback(), but
that path only creates a new page table and points the pte around the
fault address at the newly allocated page; all other ptes are set to the
normal zero page. In this case the user gets only one 4K page, with
everything else still mapped to zero pages. How can the code continue to
work? Why not fall back to allocating normal pages, even if they are not
physically contiguous?

3. Your patchset says:
"In fallback path we create a new table and set pte around fault address
to the newly allocated page. All other ptes set to normal zero page."
When will these zero pages be replaced by real pages and charged to the
memcg?

I look forward to your detailed response. Thank you! :)

Regards,
Wanpeng Li


>H. Peter Anvin proposed to use a "virtual huge zero page" -- a pmd table
>with all ptes set to the 4k zero page. I haven't tried that approach and
>I'm not sure if it's a good idea (cache vs. tlb thrashing). And I guess it
>will require more code to handle.
>For now, I just allocate a 2M page and use it.
>
>Kirill A. Shutemov (9):
>  thp: huge zero page: basic preparation
>  thp: zap_huge_pmd(): zap huge zero pmd
>  thp: copy_huge_pmd(): copy huge zero page
>  thp: do_huge_pmd_wp_page(): handle huge zero page
>  thp: change_huge_pmd(): keep huge zero page write-protected
>  thp: add address parameter to split_huge_page_pmd()
>  thp: implement splitting pmd for huge zero page
>  thp: setup huge zero page on non-write page fault
>  thp: lazy huge zero page allocation
>
> Documentation/vm/transhuge.txt |    4 +-
> arch/x86/kernel/vm86_32.c      |    2 +-
> fs/proc/task_mmu.c             |    2 +-
> include/linux/huge_mm.h        |   10 +-
> include/linux/mm.h             |    8 ++
> mm/huge_memory.c               |  228 +++++++++++++++++++++++++++++++++++-----
> mm/memory.c                    |   11 +--
> mm/mempolicy.c                 |    2 +-
> mm/mprotect.c                  |    2 +-
> mm/mremap.c                    |    3 +-
> mm/pagewalk.c                  |    2 +-
> 11 files changed, 226 insertions(+), 48 deletions(-)
>
>-- 
>1.7.7.6
>
>--
>To unsubscribe, send a message with 'unsubscribe linux-mm' in
>the body to majordomo@kvack.org.  For more info on Linux MM,
>see: http://www.linux-mm.org/ .
>Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>



Thread overview: 25+ messages
2012-08-09  9:08 [PATCH, RFC 0/9] Introduce huge zero page Kirill A. Shutemov
2012-08-09  9:08 ` [PATCH, RFC 1/9] thp: huge zero page: basic preparation Kirill A. Shutemov
2012-08-09  9:08 ` [PATCH, RFC 2/9] thp: zap_huge_pmd(): zap huge zero pmd Kirill A. Shutemov
2012-08-09  9:08 ` [PATCH, RFC 3/9] thp: copy_huge_pmd(): copy huge zero page Kirill A. Shutemov
2012-08-09  9:08 ` [PATCH, RFC 4/9] thp: do_huge_pmd_wp_page(): handle " Kirill A. Shutemov
2012-08-09  9:08 ` [PATCH, RFC 5/9] thp: change_huge_pmd(): keep huge zero page write-protected Kirill A. Shutemov
2012-08-09  9:08 ` [PATCH, RFC 6/9] thp: add address parameter to split_huge_page_pmd() Kirill A. Shutemov
2012-08-16 19:42   ` Andrea Arcangeli
2012-08-17  7:49     ` Kirill A. Shutemov
2012-08-09  9:08 ` [PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page Kirill A. Shutemov
2012-08-16 19:27   ` Andrea Arcangeli
2012-08-17  8:12     ` Kirill A. Shutemov
2012-08-17 16:33       ` Andrea Arcangeli
2012-08-31 14:06     ` Kirill A. Shutemov
2012-08-09  9:08 ` [PATCH, RFC 8/9] thp: setup huge zero page on non-write page fault Kirill A. Shutemov
2012-08-09  9:08 ` [PATCH, RFC 9/9] thp: lazy huge zero page allocation Kirill A. Shutemov
2012-08-10  3:49 ` Wanpeng Li [this message]
2012-08-10 10:33   ` [PATCH, RFC 0/9] Introduce huge zero page Kirill A. Shutemov
2012-08-11  1:10     ` Wanpeng Li
2012-08-11  1:10     ` Wanpeng Li
2012-08-10  3:49 ` Wanpeng Li
2012-08-16 19:20 ` Andrew Morton
2012-08-16 19:40   ` Andrea Arcangeli
2012-08-16 23:08     ` H. Peter Anvin
2012-08-16 23:12     ` Andi Kleen
