Re: [PATCH 00/11] Use global pages with PTI

From: Ingo Molnar <mingo@kernel.org>
To: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Andrew Lutomirski" <luto@kernel.org>,
	"Kees Cook" <keescook@google.com>,
	"Hugh Dickins" <hughd@google.com>,
	"Jürgen Groß" <jgross@suse.com>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	namit@vmware.com
Subject: Re: [PATCH 00/11] Use global pages with PTI
Date: Sat, 24 Mar 2018 12:05:34 +0100
Message-ID: <20180324110534.t52m5gvn4r7kvmnj@gmail.com>
In-Reply-To: <be2e683c-bf0a-e9ce-2f02-4905f6bd56d3@linux.intel.com>

* Dave Hansen <dave.hansen@linux.intel.com> wrote:

> This is time doing a modestly-sized kernel compile on a 4-core Skylake
> desktop.
> 
>                         User Time       Kernel Time     Clock Elapsed
> Baseline ( 0 GLB PTEs)  803.79          67.77           237.30
> w/series (28 GLB PTEs)  807.70 (+0.7%)  68.07 (+0.7%)   238.07 (+0.3%)
> 
> Without PCIDs, it behaves the way I would expect.
>
> I'll ask around, but I'm open to any ideas about what the heck might be
> causing this.

Hm, so it's a bit weird that while user time and kernel time both increased by 
about 0.7%, elapsed time only increased by 0.3%? Kernel builds are typically far 
too parallel for that kind of mismatch, so maybe there's some noise in the 
measurement?
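
One way to sanity-check the quoted table is to recompute the relative changes and 
the average number of busy CPUs from its absolute values. This is only a sketch: 
the helper names rel_change and parallelism are made up for illustration, and the 
results are derived from the rounded table values, so they need not match the 
quoted percentages exactly.

```shell
#!/bin/sh
# rel_change OLD NEW: relative change from OLD to NEW, in percent.
rel_change() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.2f\n", (new - old) / old * 100 }'
}

# parallelism USER SYS ELAPSED: average number of CPUs kept busy,
# i.e. total CPU time divided by wall-clock time.
parallelism() {
    awk -v u="$1" -v s="$2" -v e="$3" 'BEGIN { printf "%.2f\n", (u + s) / e }'
}

# Values copied from the quoted table (seconds).
echo "user time change:    $(rel_change 803.79 807.70)%"
echo "kernel time change:  $(rel_change 67.77 68.07)%"
echo "elapsed time change: $(rel_change 237.30 238.07)%"
echo "busy CPUs, baseline: $(parallelism 803.79 67.77 237.30)"
echo "busy CPUs, w/series: $(parallelism 807.70 68.07 238.07)"
```

If the build parallelism is the same in both runs, the CPU time and the elapsed 
time should grow by roughly the same percentage.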

Before spending too much time on the global-TLB patch angle I'd suggest investing 
a bit of time into making sure that the regression you are seeing is actually 
real:

You haven't described how you measured kernel build times, and the "+0.7% 
regression" might turn out to be the real number, but kernel build times measured 
to sub-1% accuracy are *awfully* susceptible to:

 - various sources of noise

 - systematic statistical errors which don't show up as 
   measurement-to-measurement noise but which skew the results,
   such as the boot-to-boot memory layout of the source code and
   object files.

 - cpufreq artifacts

Even repeated builds with 'make clean' in between can be misleading, because the 
exact layout of the key include files and binaries which get accessed most often 
during a build is set in stone once they've been read into the page cache for 
the first time after bootup. Automated reboots between measurements can be 
misleading as well, if the file layout after bootup is too deterministic.

So here's a pretty reliable way to measure kernel build time, which tries to avoid 
the various pitfalls of caching [*].

First I make sure that cpufreq is set to 'performance':

  for ((cpu=0; cpu<120; cpu++)); do
    G=/sys/devices/system/cpu/cpu$cpu/cpufreq/scaling_governor
    [ -f $G ] && echo performance > $G
  done

[ ... because it can be *really* annoying to discover that an ostensible 
  performance regression was a cpufreq artifact ... again. ;-) ]
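
Since a silently failed write is easy to miss, it can help to read the governors 
back before starting a measurement. A minimal sketch, assuming the same sysfs 
layout as the loop above; check_governors is a hypothetical helper name, and the 
optional directory argument exists only so it can be pointed at a test tree:

```shell
#!/bin/sh
# check_governors [CPU_SYSFS_DIR]: print every CPU whose cpufreq governor is
# not 'performance'. Returns 0 if all readable governors are 'performance'
# (or if no cpufreq files exist), 1 otherwise.
check_governors() {
    base="${1:-/sys/devices/system/cpu}"
    bad=0
    for g in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip CPUs without cpufreq support (and an unmatched glob).
        [ -f "$g" ] || continue
        gov=$(cat "$g")
        if [ "$gov" != "performance" ]; then
            echo "$g: $gov"
            bad=1
        fi
    done
    return "$bad"
}

check_governors || echo "warning: some CPUs are not using the performance governor"
```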

Then I copy a kernel tree to /tmp (ramfs) as root:

	cd /tmp
	rm -rf linux
	git clone ~/linux linux
	cd linux
	make defconfig >/dev/null

... and then we can build the kernel in such a loop (as root again):

  perf stat --repeat 10 --null --pre			'\
	cp -a kernel ../kernel.copy.$(date +%s);	 \
	rm -rf *;					 \
	git checkout .;					 \
	echo 1 > /proc/sys/vm/drop_caches;		 \
	find ../kernel* -type f | xargs cat >/dev/null;  \
	make -j kernel >/dev/null;			 \
	make clean >/dev/null 2>&1;			 \
	sync						'\
							 \
	make -j16 >/dev/null

( I have tested these by pasting them into a terminal. Adjust the ~/linux source 
  git tree and the '-j16' to your system. )

Notes:

 - the 'pre' script portion is not timed by 'perf stat', only the raw build times

 - we flush all caches via drop_caches and re-establish everything again, but:

 - we also introduce an intentional memory leak by slowly filling up ramfs with 
   copies of 'kernel/', thus continuously changing the layout of free memory, 
   cached data such as compiler binaries and the source code hierarchy. (Note 
   that the leak is about 8MB per iteration, so it isn't massive.)

With 10 iterations this is the statistical stability I get on a big box:

 Performance counter stats for 'make -j128 kernel' (10 runs):

      26.346436425 seconds time elapsed    (+- 0.19%)

... which, despite a high iteration count of 10, is still surprisingly noisy, 
right?

A 0.2% stddev is probably not enough to call a 0.7% regression with good 
confidence, so I had to use *30* iterations to get measurement noise about 
an order of magnitude lower than the effect I'm trying to measure:

 Performance counter stats for 'make -j128' (30 runs):

      26.334767571 seconds time elapsed    (+- 0.09% )

i.e. "26.334 +- 0.023" seconds is a number we can have pretty high confidence in, 
on this system.
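
The arithmetic behind those figures can be reproduced with a few lines of awk. 
This is a sketch only: abs_error and effect_to_noise are hypothetical helper 
names, and the inputs are the means and '+-' percentages printed by perf stat 
above, together with the 0.7% effect under discussion.

```shell
#!/bin/sh
# abs_error MEAN REL_PCT: convert a relative error in percent into seconds.
abs_error() {
    awk -v m="$1" -v r="$2" 'BEGIN { printf "%.4f\n", m * r / 100 }'
}

# effect_to_noise EFFECT_PCT NOISE_PCT: how many times larger the effect
# is than the measurement noise.
effect_to_noise() {
    awk -v e="$1" -v n="$2" 'BEGIN { printf "%.1f\n", e / n }'
}

echo "10 runs: +- $(abs_error 26.346436425 0.19) s, effect/noise = $(effect_to_noise 0.7 0.19)"
echo "30 runs: +- $(abs_error 26.334767571 0.09) s, effect/noise = $(effect_to_noise 0.7 0.09)"
```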

And just to demonstrate that it's all real, I repeated the whole 30-iteration 
measurement again:

 Performance counter stats for 'make -j128' (30 runs):

      26.311166142 seconds time elapsed    (+- 0.07%)
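
To judge whether two such results agree, one rough approach is to divide their 
difference by the combined error. A sketch under stated assumptions: the '+-' 
figures are treated as standard errors of the mean, the two measurements are 
treated as independent, and z_score is a hypothetical helper name.

```shell
#!/bin/sh
# z_score MEAN1 REL1_PCT MEAN2 REL2_PCT: absolute difference of the two means
# divided by their combined error (errors added in quadrature).
z_score() {
    awk -v m1="$1" -v r1="$2" -v m2="$3" -v r2="$4" 'BEGIN {
        e1 = m1 * r1 / 100          # absolute error of the first mean
        e2 = m2 * r2 / 100          # absolute error of the second mean
        d = m2 - m1
        if (d < 0) d = -d
        printf "%.2f\n", d / sqrt(e1 * e1 + e2 * e2)
    }'
}

# The two 30-run measurements above:
z_score 26.334767571 0.09 26.311166142 0.07
```

A value below about 1 means the two means differ by less than their combined 
error, i.e. they are consistent with each other; a genuine 0.7% shift at this 
noise level would produce a much larger value.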

Even if in the end you get a similar result, close to the +0.7% overhead you 
already measured, we would then have more confidence in blaming global TLBs for 
the performance regression.

BYMMV.

Thanks,

	Ingo

[*] Note that even this doesn't eliminate certain sources of measurement error: 
    such as the boot-to-boot variance in the layout of certain key kernel data
    structures - but kernel builds are mostly user-space dominated, so drop_caches 
    should be good enough.
