Re: [PATCH] RFC: clear 1G pages with streaming stores on x86

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Matthew Wilcox <willy@infradead.org>
To: Cannon Matthews <cannonmatthews@google.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andres Lagar-Cavilla <andreslc@google.com>,
	Salman Qazi <sqazi@google.com>, Paul Turner <pjt@google.com>,
	David Matlack <dmatlack@google.com>,
	Peter Feiner <pfeiner@google.com>,
	Alain Trinh <nullptr@google.com>
Subject: Re: [PATCH] RFC: clear 1G pages with streaming stores on x86
Date: Tue, 24 Jul 2018 14:09:23 -0700	[thread overview]
Message-ID: <20180724210923.GA20168@bombadil.infradead.org> (raw)
In-Reply-To: <20180724204639.26934-1-cannonmatthews@google.com>

On Tue, Jul 24, 2018 at 01:46:39PM -0700, Cannon Matthews wrote:
> Reimplement clear_gigantic_page() to clear gigabytes pages using the
> non-temporal streaming store instructions that bypass the cache
> (movnti), since an entire 1GiB region will not fit in the cache anyway.
> 
> Doing an mlock() on a 512GiB 1G-hugetlb region previously would take on
> average 134 seconds, about 260ms/GiB which is quite slow. Using `movnti`
> and optimizing the control flow over the constituent small pages, this
> can be improved roughly by a factor of 3-4x, with the 512GiB mlock()
> taking only 34 seconds on average, or 67ms/GiB.

This is great data ...

> - The calls to cond_resched() have been reduced from between every 4k
> page to every 64, as between all of the 256K page seemed overly
> frequent.  Does this seem like an appropriate frequency? On an idle
> system with many spare CPUs it get's rescheduled typically once or twice
> out of the 4096 times it calls cond_resched(), which seems like it is
> maybe the right amount, but more insight from a scheduling/latency point
> of view would be helpful.

... which makes the lack of data here disappointing -- what're the
comparable timings if you do check every 4kB or every 64kB instead of
every 256kB?

> The assembly code for the __clear_page_nt routine is more or less
> taken directly from the output of gcc with -O3 for this function with
> some tweaks to support arbitrary sizes and moving memory barriers:
> 
> void clear_page_nt_64i (void *page)
> {
>   for (int i = 0; i < GiB /sizeof(long long int); ++i)
>     {
>       _mm_stream_si64 (((long long int*)page) + i, 0);
>     }
>   sfence();
> }
> 
> In general I would love to hear any thoughts and feedback on this
> approach and any ways it could be improved.
> 
> Some specific questions:
> 
> - What is the appropriate method for defining an arch specific
> implementation like this, is the #ifndef code sufficient, and did stuff
> land in appropriate files?
> 
> - Are there any obvious pitfalls or caveats that have not been
> considered? In particular the iterator over mem_map_next() seemed like a
> no-op on x86, but looked like it could be important in certain
> configurations or architectures I am not familiar with.
> 
> - Are there any x86_64 implementations that do not support SSE2
> instructions like `movnti` ? What is the appropriate way to detect and
> code around that if so?

No.  SSE2 was introduced with the Pentium 4, before x86-64.  The XMM
registers are used as part of the x86-64 calling conventions, so SSE2
is mandatory for x86-64 implementations.

> - Is there anything that could be improved about the assembly code? I
> originally wrote it in C and don't have much experience hand writing x86
> asm, which seems riddled with optimization pitfalls.

I suspect it might be slightly faster if implemented as inline asm in the
x86 clear_gigantic_page() implementation instead of a function call.
Might not affect performance a lot though.

> - Is the highmem codepath really necessary? would 1GiB pages really be
> of much use on a highmem system? We recently removed some other parts of
> the code that support HIGHMEM for gigantic pages (see:
> http://lkml.kernel.org/r/20180711195913.1294-1-mike.kravetz@oracle.com)
> so this seems like a logical continuation.

PAE paging doesn't support 1GB pages, so there's no need for it on x86.

> diff --git a/mm/memory.c b/mm/memory.c
> index 7206a634270b..2515cae4af4e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -70,6 +70,7 @@
>  #include <linux/dax.h>
>  #include <linux/oom.h>
> 
> +
>  #include <asm/io.h>
>  #include <asm/mmu_context.h>
>  #include <asm/pgalloc.h>

Spurious.

next prev parent reply	other threads:[~2018-07-24 21:09 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-24 20:46 [PATCH] RFC: clear 1G pages with streaming stores on x86 Cannon Matthews
2018-07-24 20:53 ` Andrew Morton
2018-07-25  2:50   ` Cannon Matthews
2018-07-24 21:09 ` Matthew Wilcox [this message]
2018-07-25  2:37   ` [PATCH v2] " Cannon Matthews
2018-07-25  5:02     ` Elliott, Robert (Persistent Memory)
2018-07-25 14:38       ` Matthew Wilcox
2018-07-25 17:30       ` Cannon Matthews
2018-07-25 18:23         ` Matthew Wilcox
2018-07-25 18:48           ` Cannon Matthews
2018-07-25 12:57     ` Michal Hocko
2018-07-25 17:55       ` Cannon Matthews
2018-07-26 13:19         ` Michal Hocko
2018-07-27  0:05         ` Huang, Ying
2018-07-30 16:29     ` Borislav Petkov
2018-07-31  0:28       ` Cannon Matthews
2018-07-31  0:45       ` Matthew Wilcox
2018-07-25  2:46   ` [PATCH] " Cannon Matthews

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180724210923.GA20168@bombadil.infradead.org \
    --to=willy@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=andreslc@google.com \
    --cc=cannonmatthews@google.com \
    --cc=dmatlack@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=nullptr@google.com \
    --cc=pfeiner@google.com \
    --cc=pjt@google.com \
    --cc=sqazi@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).