linux-mm.kvack.org archive mirror
From: Andrew Morton <akpm@linux-foundation.org>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: linux-mm@kvack.org, Andrea Arcangeli <aarcange@redhat.com>,
	Andi Kleen <ak@linux.intel.com>,
	"H. Peter Anvin" <hpa@linux.intel.com>,
	linux-kernel@vger.kernel.org,
	"Kirill A. Shutemov" <kirill@shutemov.name>
Subject: Re: [PATCH v3 00/10] Introduce huge zero page
Date: Tue, 2 Oct 2012 15:31:48 -0700
Message-ID: <20121002153148.1ae1020a.akpm@linux-foundation.org>
In-Reply-To: <1349191172-28855-1-git-send-email-kirill.shutemov@linux.intel.com>

On Tue,  2 Oct 2012 18:19:22 +0300
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> During testing I noticed big (up to 2.5 times) memory consumption overhead
> on some workloads (e.g. ft.A from NPB) if THP is enabled.
> 
> The main reason for that big difference is the lack of a zero page in the
> THP case. We have to allocate a real page on a read page fault.
> 
> A program to demonstrate the issue:
> #include <assert.h>
> #include <stdlib.h>
> #include <unistd.h>
> 
> #define MB (1024 * 1024)
> 
> int main(int argc, char **argv)
> {
>         char *p;
>         int i;
> 
>         posix_memalign((void **)&p, 2 * MB, 200 * MB);
>         for (i = 0; i < 200 * MB; i+= 4096)
>                 assert(p[i] == 0);
>         pause();
>         return 0;
> }
> 
> With thp-never RSS is about 400k, but with thp-always it's 200M.
> After the patchset thp-always RSS is 400k too.

I'd like to see a full description of the design, please.

From reading the code, it appears that we initially allocate a huge
page and point the pmd at that.  If/when there is a write fault against
that page we then populate the mm with ptes which point at the normal
4k zero page and populate the pte at the fault address with a newly
allocated page?   Correct and complete?  If not, please fix ;)
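
To make the question concrete, the sequence as described above can be modelled in a few lines of userspace C. This is only a toy sketch, not kernel code: `toy_pmd`, `toy_read_fault()`, `toy_write_fault()` and every other name here are hypothetical and do not come from the patchset, and the sizes assume x86-64 (4k base pages, 512 ptes per pmd). Whether the real code behaves this way is exactly what the changelog should confirm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PTRS_PER_PMD 512    /* assumption: 2M huge page / 4k base pages */
#define TOY_PAGE_SIZE    4096

/* Stand-in for the normal shared 4k zero page (hypothetical name). */
const char toy_zero_page[TOY_PAGE_SIZE] = {0};

enum toy_pmd_kind { TOY_PMD_HUGE_ZERO, TOY_PMD_PTE_TABLE };

struct toy_pmd {
        enum toy_pmd_kind kind;
        /* The two arrays are only meaningful once kind == TOY_PMD_PTE_TABLE. */
        const char *pte[TOY_PTRS_PER_PMD];
        bool private_page[TOY_PTRS_PER_PMD];
};

/* Read fault: point the pmd at the (shared) huge zero page. */
void toy_read_fault(struct toy_pmd *pmd)
{
        assert(pmd != NULL);
        pmd->kind = TOY_PMD_HUGE_ZERO;
}

/*
 * Write fault at pte index idx: split the huge zero pmd into ptes that all
 * point at the 4k zero page, then give the faulting pte a freshly allocated
 * zeroed page. Returns 0 on success, -1 on bad index or allocation failure.
 */
int toy_write_fault(struct toy_pmd *pmd, size_t idx)
{
        size_t i;
        char *page;

        if (idx >= TOY_PTRS_PER_PMD)
                return -1;
        if (pmd->kind == TOY_PMD_HUGE_ZERO) {
                for (i = 0; i < TOY_PTRS_PER_PMD; i++) {
                        pmd->pte[i] = toy_zero_page;
                        pmd->private_page[i] = false;
                }
                pmd->kind = TOY_PMD_PTE_TABLE;
        }
        if (pmd->private_page[idx])
                return 0;       /* already has its own page */
        page = calloc(1, TOY_PAGE_SIZE);
        if (!page)
                return -1;
        pmd->pte[idx] = page;
        pmd->private_page[idx] = true;
        return 0;
}

/* Number of 4k pages that actually consume memory under this pmd. */
size_t toy_private_pages(const struct toy_pmd *pmd)
{
        size_t i, n = 0;

        assert(pmd != NULL);
        if (pmd->kind == TOY_PMD_HUGE_ZERO)
                return 0;
        for (i = 0; i < TOY_PTRS_PER_PMD; i++)
                n += pmd->private_page[i];
        return n;
}

/* Free the privately allocated pages and return to the huge zero state. */
void toy_release(struct toy_pmd *pmd)
{
        size_t i;

        if (pmd->kind == TOY_PMD_PTE_TABLE)
                for (i = 0; i < TOY_PTRS_PER_PMD; i++)
                        if (pmd->private_page[i])
                                free((void *)pmd->pte[i]);
        pmd->kind = TOY_PMD_HUGE_ZERO;
}
```

In this model a single write after a read fault costs one 4k page rather than a 2M page, but the range is mapped by 512 ptes afterwards instead of one pmd, which is where the TLB cost question comes from.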

Also, IIRC, the early versions of the patch did not allocate the
initial huge page at all - it immediately filled the mm with ptes which
point at the normal 4k zero page.  Is that a correct recollection?
If so, why the change?

Also IIRC, Andrea had a little test app which demonstrated the TLB
costs of the initial approach, and they were high?

Please, let's capture all this knowledge in a single place, right here
in the changelog.  And in code comments, where appropriate.  Otherwise
people won't know why we made these decisions unless they go off and
find lengthy, years-old and quite possibly obsolete email threads.


Also, you've presented some data on the memory savings, but no
quantitative testing results on the performance cost.  Both you and
Andrea have run these tests and those results are important.  Let's
capture them here.  And when designing such tests we should not just
try to demonstrate the benefits of a code change - we should think of
test cases which might be adversely affected and run those as well.
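
As one possible shape for such a test, the sketch below extends the demonstration program quoted above: it read-faults a 2M-aligned region, then writes one byte per 4k page, and records the wall-clock time and RSS after each pass. The write pass is the case that might be adversely affected, since every region has to be converted away from the zero page. The helper names (`rss_kb()`, `read_touch()`, `write_touch()`, `measure_fault_cost()`) and the `struct fault_cost` layout are hypothetical, the RSS figure assumes a Linux `/proc/self/statm`, and this is not the test that was actually run by anyone in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define HPAGE_SIZE (2UL << 20)  /* assumption: 2M huge pages */
#define BASE_PAGE  4096UL       /* assumption: 4k base pages */

struct fault_cost {
        double read_sec, write_sec;
        long rss_after_read_kb, rss_after_write_kb;   /* -1 if unavailable */
        size_t nonzero;                               /* non-zero bytes seen */
};

/* Resident set size in kB from /proc/self/statm, or -1 on error. */
long rss_kb(void)
{
        long size, resident;
        FILE *f = fopen("/proc/self/statm", "r");

        if (!f)
                return -1;
        if (fscanf(f, "%ld %ld", &size, &resident) != 2)
                resident = -1;
        fclose(f);
        if (resident < 0)
                return -1;
        return resident * (sysconf(_SC_PAGESIZE) / 1024);
}

static double now_sec(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Read one byte per stride; returns how many of them were non-zero. */
size_t read_touch(const volatile char *p, size_t len, size_t stride)
{
        size_t i, nonzero = 0;

        for (i = 0; i < len; i += stride)
                nonzero += (p[i] != 0);
        return nonzero;
}

/* Write one byte per stride, forcing a write fault on each untouched page. */
void write_touch(volatile char *p, size_t len, size_t stride)
{
        size_t i;

        for (i = 0; i < len; i += stride)
                p[i] = 1;
}

/* Time a read pass followed by a write pass over len bytes. 0 on success. */
int measure_fault_cost(size_t len, struct fault_cost *out)
{
        void *mem;
        double t0, t1, t2, t3;

        if (posix_memalign(&mem, HPAGE_SIZE, len) != 0)
                return -1;
        t0 = now_sec();
        out->nonzero = read_touch(mem, len, BASE_PAGE);
        t1 = now_sec();
        out->rss_after_read_kb = rss_kb();
        t2 = now_sec();
        write_touch(mem, len, BASE_PAGE);
        t3 = now_sec();
        out->rss_after_write_kb = rss_kb();
        out->read_sec = t1 - t0;
        out->write_sec = t3 - t2;
        free(mem);
        return 0;
}
```

Calling `measure_fault_cost(200UL << 20, &c)` under thp-never and thp-always, with and without the patchset, would give four sets of numbers to compare; a TLB-sensitive access pattern would still need a separate test.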


It's not an appropriate time to be merging new features - please plan
on preparing this patchset against 3.7-rc1.


