From: Ingo Molnar <mingo@elte.hu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>, Jesper Juhl <jj@chaosbits.net>,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Daniel Lezcano <daniel.lezcano@free.fr>,
	Eric Paris <eparis@redhat.com>,
	Roman Zippel <zippel@linux-m68k.org>,
	linux-kbuild@vger.kernel.org,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
Date: Wed, 23 Mar 2011 22:14:15 +0100
Message-ID: <20110323211415.GA8791@elte.hu>
In-Reply-To: <AANLkTikz+vJGFuysDXAdVb33q1q3L547dXNJa9NmeqeM@mail.gmail.com>


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Mar 22, 2011 at 3:27 AM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > If that situation has changed - if GCC has regressed in this area then a commit
> > changing the default IMHO gains a lot of credibility if it is backed by careful
> > measurements using perf stat --repeat or similar tools.
> 
> Also, please don't back up any numbers for the "-O2 is faster than
> -Os" case with some benchmark that is hot in the caches.
> 
> The thing is, many optimizations that make the code larger look really
> good if there are no cache misses, and the code is run a million times
> in a tight loop.
> 
> But kernel code in particular tends to not be like that. [...]

To throw some numbers into the discussion, here's the size versus speed 
comparison for 'hackbench 15' - which is more on the microbenchmark side of the 
equation - but has macrobenchmark properties as well, because it runs 3000 
tasks and moves a lot of data, hence thrashes the caches constantly:

     CONFIG_CC_OPTIMIZE_FOR_SIZE=y
     ----------------------------------------
     6,757,858,145 cycles                   #   2525.983 M/sec   ( +-   0.388% )
     2,949,907,036 instructions             #      0.437 IPC     ( +-   0.191% )
       595,955,367 branches                 #    222.759 M/sec   ( +-   0.238% )
        31,504,981 branch-misses            #      5.286 %       ( +-   0.187% )

        0.164320722  seconds time elapsed   ( +-   0.524% )


     # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
     ----------------------------------------
     6,061,867,073 cycles                   #   2510.283 M/sec   ( +-   0.494% )
     2,510,505,732 instructions             #      0.414 IPC     ( +-   0.243% )
       493,721,089 branches                 #    204.455 M/sec   ( +-   0.302% )
        38,731,708 branch-misses            #      7.845 %       ( +-   0.206% )

        0.148203574  seconds time elapsed   ( +-   0.673% )

These were 'perf stat --repeat 100' runs, repeated a couple of times to make 
sure the results are real. I used GCC 4.6.0, a relatively recent compiler. 
(64-bit x86, typical .config, etc.)

The text size differences:

      text	   data	    bss	    dec	         filename
  -------------------------------------------------------------------------
   8809558	1790428	2719744	13319730	 vmlinux.optimize_for_size
  10268082	1825292	2727936	14821310	 vmlinux.optimize_for_speed

So by enabling CONFIG_CC_OPTIMIZE_FOR_SIZE=y, we get this total effect:

  -14.2% text size reduction (i.e. the -O2 text is ~16.5% larger)
  +17.5% instruction count increase
  +20.7% branches executed increase
  -18.7% branch-miss reduction (i.e. -O2 has 22.9% more branch misses)
  +11.5% cycle count increase
  +10.9% total runtime increase
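
The deltas above can be rechecked from the raw counters. What follows is a 
minimal Python sketch, assuming every percentage is taken relative to the -O2 
build (CONFIG_CC_OPTIMIZE_FOR_SIZE not set). The dictionary names and the 
pct_change() helper are illustrative only; they are not part of perf. Under 
this convention the two reductions come out smaller than when they are 
measured from the -Os side, so it is worth stating the baseline next to any 
reduction figure.

```python
# Values copied from the two perf stat runs and the size table in this mail.
os_build = {   # CONFIG_CC_OPTIMIZE_FOR_SIZE=y
    "text":          8_809_558,
    "instructions":  2_949_907_036,
    "branches":      595_955_367,
    "branch-misses": 31_504_981,
    "cycles":        6_757_858_145,
    "elapsed":       0.164320722,
}
o2_build = {   # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
    "text":          10_268_082,
    "instructions":  2_510_505_732,
    "branches":      493_721_089,
    "branch-misses": 38_731_708,
    "cycles":        6_061_867_073,
    "elapsed":       0.148203574,
}

def pct_change(new, base):
    """Percentage change of `new` relative to `base`."""
    return (new - base) / base * 100.0

# Effect of enabling -Os, measured against the -O2 build.
deltas = {name: pct_change(os_build[name], o2_build[name]) for name in os_build}

for name, delta in deltas.items():
    print(f"{name:14s} {delta:+6.1f}%")

# The same two builds seen from the other side: how much larger -O2 is.
print(f"-O2 text vs -Os: {pct_change(o2_build['text'], os_build['text']):+.1f}%")
```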

A few observations:

 - the branch-miss reduction suggests that almost none of the new branches
   introduced by -Os generates a branch miss.

 - the cycle count increase is in line with the total runtime increase.

 - workloads where 16.5% more instruction cache footprint slows down the 
   workload by more than ~11% would win from enabling 
   CONFIG_CC_OPTIMIZE_FOR_SIZE=y.
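
The break-even point in the last observation can be made explicit. This is a 
rough sketch, assuming a very simple model in which a cache-cold workload adds 
an extra slowdown only to the larger -O2 build and none to the -Os build. That 
is a simplification and not something measured here, and the function names 
are hypothetical.

```python
# Elapsed times (seconds) from the two perf stat runs in this mail.
T_OS = 0.164320722   # CONFIG_CC_OPTIMIZE_FOR_SIZE=y
T_O2 = 0.148203574   # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set

def break_even_penalty(t_os=T_OS, t_o2=T_O2):
    """Extra slowdown (as a fraction) that the -O2 build must suffer
    before the -Os build becomes the faster one.

    Assumption: the -Os build suffers no extra slowdown of its own."""
    return t_os / t_o2 - 1.0

def faster_build(extra_o2_slowdown, t_os=T_OS, t_o2=T_O2):
    """Return which build wins under a hypothetical extra -O2 slowdown."""
    return "-Os" if t_o2 * (1.0 + extra_o2_slowdown) > t_os else "-O2"

print(f"break-even extra slowdown for -O2: {break_even_penalty():.1%}")
for penalty in (0.05, 0.10, 0.15):
    print(f"{penalty:.0%} extra icache cost on -O2 -> {faster_build(penalty)} wins")
```

In this model -O2 stays ahead at 5% and 10% extra cost and loses only once the 
extra cost passes roughly 11%.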

Looking at these numbers I became more pessimistic about the usefulness of the 
current implementation of CONFIG_CC_OPTIMIZE_FOR_SIZE=y - it would need some 
*serious* icache thrashing to cause a larger than 11% slowdown, right?

I'm not sure what the best way would be to measure realistic macro workloads 
where the kernel's instructions generate a lot of instruction-cache misses. 
Most of the 'real' workloads tend to be hard to measure precisely, tend to be 
very noisy and take a long time to run.

I could perhaps try to simulate them: I could patch a debug-only 'icache 
flusher' function into every system call, and compare the perf stat results - 
would that be an acceptable simulation of cache-cold kernel execution?

The 'icache flusher' would be something simple, like 10,000x 5-byte NOP 
instructions in a row, or so. This would slow things down immensely, but this 
particular slowdown is the same for both OPTIMIZE_FOR_SIZE=y and 
OPTIMIZE_FOR_SIZE=n.
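
For a sense of scale, the footprint of such a NOP run can be worked out 
directly. This is a small sketch, assuming the usual x86 5-byte NOP encoding 
(0f 1f 44 00 00) and a hypothetical 32 KiB L1 instruction cache. The cache 
size is an assumption for illustration; the mail does not state which CPU was 
used.

```python
# x86 5-byte NOP: nopl 0x0(%rax,%rax,1)
NOP5 = bytes([0x0F, 0x1F, 0x44, 0x00, 0x00])
NOP_COUNT = 10_000                 # "10,000x 5-byte NOP instructions in a row"

# Assumption: L1 instruction cache size; not given in the mail.
ASSUMED_L1I_BYTES = 32 * 1024

# The instruction bytes the flusher function would consist of.
sled = NOP5 * NOP_COUNT

def covers_cache(sled_bytes, cache_bytes):
    """True if fetching the whole sled touches at least cache_bytes of code."""
    return len(sled_bytes) >= cache_bytes

print(f"sled size: {len(sled)} bytes ({len(sled) / 1024:.1f} KiB)")
print(f"covers assumed L1i: {covers_cache(sled, ASSUMED_L1I_BYTES)}")
```

At 50,000 bytes the run is larger than the assumed L1i, so executing it once 
per system call would be expected to push most kernel text out of that cache 
level. It would not necessarily evict the same code from larger L2/L3 caches.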

Any better ideas?

	Ingo

Thread overview: 8+ messages
2011-03-21 20:08 PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N Jesper Juhl
2011-03-22  2:52 ` Steven Rostedt
2011-03-22  8:21 ` Pekka Enberg
2011-03-22  8:25   ` Jesper Juhl
2011-03-22 10:27   ` Ingo Molnar
2011-03-22 16:59     ` Linus Torvalds
2011-03-23 17:45       ` Andi Kleen
2011-03-23 21:14       ` Ingo Molnar [this message]
