public inbox for linux-kernel@vger.kernel.org
* [patch 00/2] improve .text size on gcc 4.0 and newer compilers
@ 2005-12-28 11:46 Ingo Molnar
  2005-12-28 19:17 ` Linus Torvalds
  2005-12-29  4:38 ` Adrian Bunk
  0 siblings, 2 replies; 167+ messages in thread
From: Ingo Molnar @ 2005-12-28 11:46 UTC (permalink / raw)
  To: lkml; +Cc: Linus Torvalds, Andrew Morton, Arjan van de Ven, Matt Mackall

this patchset (for the 2.6.16 tree) consists of two patches:

  gcc-no-forced-inlining.patch
  gcc-unit-at-a-time.patch

the purpose of these patches is to reduce the kernel's .text size, in 
particular if CONFIG_CC_OPTIMIZE_FOR_SIZE is specified. The effect of 
the patches on x86 is:

    text    data     bss     dec     hex filename
 3286166  869852  387260 4543278  45532e vmlinux-orig
 3194123  955168  387260 4536551  4538e7 vmlinux-inline
 3119495  884960  387748 4392203  43050b vmlinux-inline+units
  437271   77646   32192  547109   85925 vmlinux-tiny-orig
  452694   77646   32192  562532   89564 vmlinux-tiny-inline
  431891   77422   32128  541441   84301 vmlinux-tiny-inline+units

i.e. a 5.3% .text reduction (!) with a larger .config, and a 1.2% .text 
reduction with a smaller .config.
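
The percentages can be re-derived from the size(1) table above. The helper below is a minimal sketch and not part of the patchset; the name pct_saved and the explicit `base` argument are assumptions made for illustration. Measured against the original .text the combined saving on the larger .config comes out at about 5.1%, while dividing by the patched .text gives the quoted 5.3%; for the tiny .config both bases round to 1.2%.

```c
#include <assert.h>

/*
 * Percentage by which `after` is smaller than `before`, expressed
 * relative to `base` (pass `before` or `after`, depending on which
 * convention is wanted).
 */
double pct_saved(long before, long after, long base)
{
    assert(base > 0);
    return 100.0 * (double)(before - after) / (double)base;
}

/* .text sizes copied from the table above */
enum {
    TEXT_ORIG      = 3286166,  /* vmlinux-orig */
    TEXT_BOTH      = 3119495,  /* vmlinux-inline+units */
    TEXT_TINY_ORIG = 437271,   /* vmlinux-tiny-orig */
    TEXT_TINY_BOTH = 431891    /* vmlinux-tiny-inline+units */
};
```

For example, pct_saved(TEXT_ORIG, TEXT_BOTH, TEXT_ORIG) uses the original size as the base, and pct_saved(TEXT_ORIG, TEXT_BOTH, TEXT_BOTH) uses the patched size.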

i've also done test-builds with CC_OPTIMIZE_FOR_SIZE disabled:

   text    data     bss     dec     hex filename
4080998  870384  387260 5338642  517612 vmlinux-speed-orig
4084421  872024  387260 5343705  5189d9 vmlinux-speed-inline
4010957  834048  387748 5232753  4fd871 vmlinux-speed-inline+units

so the more flexible inlining did not result in many changes [which is 
good, we want gcc to inline those in the optimized-for-speed case], but 
unit-at-a-time optimization resulted in smaller code - very likely 
meaning speed advantages as well.

unit-at-a-time still increases the kernel stack footprint somewhat (by 
about 5% in the CC_OPTIMIZE_FOR_SIZE case), but not by the insane degree 
gcc3 used to, which prompted the original -fno-unit-at-a-time addition.

so i think the combination of the two patches is a win both for small 
and for large systems. In fact the 5.3% .text reduction for embedded 
kernels is very significant.

the patches are against -git, and were test-built and test-booted on 
x86, using gcc 4.0.2.

	Ingo

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-28 11:46 Ingo Molnar
@ 2005-12-28 19:17 ` Linus Torvalds
  2005-12-28 19:34   ` Arjan van de Ven
  2005-12-29  4:38 ` Adrian Bunk
  1 sibling, 1 reply; 167+ messages in thread
From: Linus Torvalds @ 2005-12-28 19:17 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: lkml, Andrew Morton, Arjan van de Ven, Matt Mackall



On Wed, 28 Dec 2005, Ingo Molnar wrote:
>
> this patchset (for the 2.6.16 tree) consists of two patches:
> 
>   gcc-no-forced-inlining.patch
>   gcc-unit-at-a-time.patch

Why do you mix the two up? I'd assume they are independent, and if they 
aren't, please explain why?

The forced inlining is not just a good idea. Several versions of gcc would 
NOT COMPILE the kernel without it. The fact that it works with your 
configurations and your particular compiler version has absolutely ZERO 
relevance.

Gcc has had horrible mistakes in inlining functions. Inlining too much, 
and quite often, not inlining things that absolutely _have_ to be inlined. 
Trivial things that inline to an instruction or two, but that look 
complicated because they have a big switch-statement that just happens to 
be known at compile-time.

And not inlining them not only results in horribly bad code (dynamic tests 
for something that should be static), but also results in link errors when 
cases that should be statically unreachable suddenly become reachable 
after all.

So the fact that your gcc-4.x version happens to get things right for your 
case in no way means that you can do this in general.
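
The pattern described above (a switch on a value that is constant at every call site, with an "impossible" default case) might look roughly like the sketch below. All names here (store_sized, STORE, bad_store_size) are hypothetical and are not taken from the kernel tree. In the idiom being discussed, the default-case function would be declared but left undefined, so that a call surviving into the object file shows up as a link error; here it is given a body only so that the sketch links at any optimization level.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical trap function. The kernel-style idiom would only declare
 * it (no body), turning any surviving call into a link error. It is
 * defined here so the example also links when nothing is optimized away.
 */
void bad_store_size(void)
{
    abort();
}

/*
 * `size` is a compile-time constant at every call site (see STORE below),
 * so once this is inlined the switch collapses to a single store.
 * If the compiler declines to inline it, the switch becomes a run-time
 * test and the default case stays reachable.
 */
static inline __attribute__((always_inline))
void store_sized(void *dst, uint64_t val, size_t size)
{
    assert(dst != NULL);
    switch (size) {
    case 1: *(uint8_t *)dst  = (uint8_t)val;  break;
    case 2: *(uint16_t *)dst = (uint16_t)val; break;
    case 4: *(uint32_t *)dst = (uint32_t)val; break;
    case 8: *(uint64_t *)dst = val;           break;
    default: bad_store_size();                break;
    }
}

/* The size argument comes from sizeof, so it is always a constant. */
#define STORE(ptr, val) \
    store_sized((ptr), (uint64_t)(val), sizeof(*(ptr)))
```

With forced inlining, a call such as STORE(&byte, x) on a one-byte object should reduce to one store instruction; without it, every call carries the whole switch.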

Also, the inlining patch apparently makes code larger in some cases, so 
it's not even an unconditional win.

What's the effect of _just_ the "unit-at-a-time" thing which we can (and 
you did) much more easily make gcc-version-dependent?

			Linus


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-28 19:17 ` Linus Torvalds
@ 2005-12-28 19:34   ` Arjan van de Ven
  2005-12-28 21:02     ` Linus Torvalds
  0 siblings, 1 reply; 167+ messages in thread
From: Arjan van de Ven @ 2005-12-28 19:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, lkml, Andrew Morton, Matt Mackall


> 
> The forced inlining is not just a good idea. Several versions of gcc would 
> NOT COMPILE the kernel without it.

yup that's why the patch only does it for gcc4, in which the inlining
heuristics finally got rewritten to something that seems to resemble
sanity...

> Also, the inlining patch apparently makes code larger in some cases, so 
> it's not even an unconditional win.

.... as long as you give the inlining algorithm enough information.
-fno-unit-at-a-time prevents gcc from having the information, and the
decisions it makes are then less optimal... 

(unit-at-a-time allows gcc to look at the entire .c file, eg things like
number of callers etc etc, disabling that tells gcc to do the .c file as
single pass top-to-bottom only)
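
A small illustration of the "number of callers" point, as a sketch only: the function names are made up, and the comments describe the general single-pass versus whole-file distinction rather than the exact behaviour of any particular gcc release. Here clamp_index() has exactly one caller, but its body appears after that caller in the file.

```c
#include <assert.h>

/* Forward declaration: the body only appears further down the file. */
static int clamp_index(int idx, int len);

/*
 * A compiler working strictly top to bottom emits pick() before it has
 * seen the body of clamp_index() or knows how many callers it has.
 * A compiler that reads the whole file first has both facts available
 * when it decides whether to inline.
 */
int pick(const int *table, int len, int idx)
{
    assert(len > 0);
    return table[clamp_index(idx, len)];
}

/* Single caller, small body: a natural inlining candidate. */
static int clamp_index(int idx, int len)
{
    if (idx < 0)
        return 0;
    if (idx >= len)
        return len - 1;
    return idx;
}
```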





* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-28 19:34   ` Arjan van de Ven
@ 2005-12-28 21:02     ` Linus Torvalds
  2005-12-28 21:17       ` Arjan van de Ven
  2005-12-28 21:23       ` Ingo Molnar
  0 siblings, 2 replies; 167+ messages in thread
From: Linus Torvalds @ 2005-12-28 21:02 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Ingo Molnar, lkml, Andrew Morton, Matt Mackall



On Wed, 28 Dec 2005, Arjan van de Ven wrote:
> 
> yup that's why the patch only does it for gcc4, in which the inlining
> heuristics finally got rewritten to something that seems to resemble
> sanity...

Is that actually true of all gcc4 versions? I seem to remember gcc-4.0 
being a real stinker.

> > Also, the inlining patch apparently makes code larger in some cases, 
> > so it's not even an unconditional win.
> 
> .... as long as you give the inlining algorithm enough information. 
> -fno-unit-at-a-time prevents gcc from having the information, and the 
> decisions it makes are then less optimal...
> 
> (unit-at-a-time allows gcc to look at the entire .c file, eg things like
> number of callers etc etc, disabling that tells gcc to do the .c file as
> single pass top-to-bottom only)

I'd still prefer to see numbers with -funit-at-a-time only. I think it's 
an independent knob, and I'd be much less worried about that, because we 
do know that unit-at-a-time has been enabled on x86-64 for a long time 
("forever"). So that's less of a change, I feel. 

			Linus


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-28 21:02     ` Linus Torvalds
@ 2005-12-28 21:17       ` Arjan van de Ven
  2005-12-28 21:23       ` Ingo Molnar
  1 sibling, 0 replies; 167+ messages in thread
From: Arjan van de Ven @ 2005-12-28 21:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, lkml, Andrew Morton, Matt Mackall

On Wed, 2005-12-28 at 13:02 -0800, Linus Torvalds wrote:
> 
> On Wed, 28 Dec 2005, Arjan van de Ven wrote:
> > 
> > yup that's why the patch only does it for gcc4, in which the inlining
> > heuristics finally got rewritten to something that seems to resemble
> > sanity...
> 
> Is that actually true of all gcc4 versions? I seem to remember gcc-4.0 
> being a real stinker.

it is... if you disable unit-at-a-time for sure.
But I'm not entirely sure when this got in, if it was 4.0 or 4.1

> > (unit-at-a-time allows gcc to look at the entire .c file, eg things like
> > number of callers etc etc, disabling that tells gcc to do the .c file as
> > single pass top-to-bottom only)
> 
> I'd still prefer to see numbers with -funit-at-a-time only. I think it's 
> an independent knob, and I'd be much less worried about that, because we 
> do know that unit-at-a-time has been enabled on x86-64 for a long time 
> ("forever"). So that's less of a change, I feel. 

the only effect I expect is more inlining actually, since we on the one
hand tie gcc's hands via the forced inline, and on the other hand now
give it more room to inline more. But yeah it's worth looking at for
sure, even if it is to see it's getting bigger ;) 



* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-28 21:02     ` Linus Torvalds
  2005-12-28 21:17       ` Arjan van de Ven
@ 2005-12-28 21:23       ` Ingo Molnar
  2005-12-28 21:48         ` Ingo Molnar
                           ` (2 more replies)
  1 sibling, 3 replies; 167+ messages in thread
From: Ingo Molnar @ 2005-12-28 21:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Arjan van de Ven, lkml, Andrew Morton, Matt Mackall


* Linus Torvalds <torvalds@osdl.org> wrote:

> On Wed, 28 Dec 2005, Arjan van de Ven wrote:
> > 
> > yup that's why the patch only does it for gcc4, in which the inlining
> > heuristics finally got rewritten to something that seems to resemble
> > sanity...
> 
> Is that actually true of all gcc4 versions? I seem to remember gcc-4.0 
> being a real stinker.

all my tests were with gcc 4.0.2.

> > > Also, the inlining patch apparently makes code larger in some cases, 
> > > so it's not even an unconditional win.
> > 
> > .... as long as you give the inlining algorithm enough information. 
> > -fno-unit-at-a-time prevents gcc from having the information, and the 
> > decisions it makes are then less optimal...
> > 
> > (unit-at-a-time allows gcc to look at the entire .c file, eg things like
> > number of callers etc etc, disabling that tells gcc to do the .c file as
> > single pass top-to-bottom only)
> 
> I'd still prefer to see numbers with -funit-at-a-time only. I think 
> it's an independent knob, and I'd be much less worried about that, 
> because we do know that unit-at-a-time has been enabled on x86-64 for 
> a long time ("forever"). So that's less of a change, I feel.

the two patches are completely independent, and the only reason i did 
them together was because i was looking at .text size in general and 
these were the two things that made a difference. Also, the inlining was 
a loss in one of the .config's, unless combined with the wider-scope 
unit-at-a-time optimization.

(there's a third thing that i was also playing with, -ffunction-sections 
and -fdata-sections, but those dont seem to be reliable on the binutils 
side yet.)

here are the isolated unit-at-a-time numbers as well:

     text    data     bss     dec     hex filename
  3286166  869852  387260 4543278  45532e vmlinux-orig
  3259928  833176  387748 4480852  445f54 vmlinux-units         -0.8%
  3194123  955168  387260 4536551  4538e7 vmlinux-inline        -2.9%
  3119495  884960  387748 4392203  43050b vmlinux-inline+units  -5.3%

so both inlining and unit-at-a-time are a win independently [although 
inlining alone does bloat .data], but applied together they bring an 
additional 1.6% of .text savings. All builds done with:

   gcc version 4.0.2 20051109 (Red Hat 4.0.2-6)

how about giving the inlining stuff some more exposure in -mm (if it's 
fine with Andrew), to check for any regressions? I'd suggest the same 
for the unit-at-a-time thing too, in any case.

	Ingo


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-28 21:23       ` Ingo Molnar
@ 2005-12-28 21:48         ` Ingo Molnar
  2005-12-28 23:56           ` Krzysztof Halasa
                             ` (2 more replies)
  2005-12-29  0:37         ` Rogério Brito
  2006-01-03  3:36         ` Daniel Jacobowitz
  2 siblings, 3 replies; 167+ messages in thread
From: Ingo Molnar @ 2005-12-28 21:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Arjan van de Ven, lkml, Andrew Morton, Matt Mackall


* Ingo Molnar <mingo@elte.hu> wrote:

> (there's a third thing that i was also playing with, -ffunction-sections 
> and -fdata-sections, but those dont seem to be reliable on the binutils 
> side yet.)
> 
> here are the isolated unit-at-a-time numbers as well:
> 
>      text    data     bss     dec     hex filename
>   3286166  869852  387260 4543278  45532e vmlinux-orig
>   3259928  833176  387748 4480852  445f54 vmlinux-units         -0.8%
>   3194123  955168  387260 4536551  4538e7 vmlinux-inline        -2.9%
>   3119495  884960  387748 4392203  43050b vmlinux-inline+units  -5.3%
> 
> so both inlining and unit-at-a-time are a win independently [although 
> inlining alone does bloat .data], but applied together they bring an 
> additional 1.6% of .text savings. All builds done with:
> 
>    gcc version 4.0.2 20051109 (Red Hat 4.0.2-6)
> 
> how about giving the inlining stuff some more exposure in -mm (if it's 
> fine with Andrew), to check for any regressions? I'd suggest the same 
> for the unit-at-a-time thing too, in any case.

another thing: i wanted to decrease the size of -Os 
(CONFIG_CC_OPTIMIZE_FOR_SIZE) kernels, which e.g. Fedora uses too (to 
keep the icache footprint down).

I think gcc should arguably not be forced to inline things when doing 
-Os, and it's also expected to mess up much less than when optimizing 
for speed. So maybe forced inlining should be dependent on 
!CONFIG_CC_OPTIMIZE_FOR_SIZE?

I.e. like the patch below?

	Ingo

----------------->
Subject: allow gcc4 to control inlining

allow gcc4 compilers to decide what to inline and what not - instead
of the kernel forcing gcc to inline all the time.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
----

 include/linux/compiler-gcc4.h |   13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)

Index: linux-gcc.q/include/linux/compiler-gcc4.h
===================================================================
--- linux-gcc.q.orig/include/linux/compiler-gcc4.h
+++ linux-gcc.q/include/linux/compiler-gcc4.h
@@ -3,14 +3,19 @@
 /* These definitions are for GCC v4.x.  */
 #include <linux/compiler-gcc.h>
 
-#define inline			inline		__attribute__((always_inline))
-#define __inline__		__inline__	__attribute__((always_inline))
-#define __inline		__inline	__attribute__((always_inline))
+
+#ifndef CONFIG_CC_OPTIMIZE_FOR_SIZE
+# define inline			inline		__attribute__((always_inline))
+# define __inline__		__inline__	__attribute__((always_inline))
+# define __inline		__inline	__attribute__((always_inline))
+#endif
+
 #define __deprecated		__attribute__((deprecated))
 #define __attribute_used__	__attribute__((__used__))
 #define __attribute_pure__	__attribute__((pure))
 #define __attribute_const__	__attribute__((__const__))
-#define  noinline		__attribute__((noinline))
+#define noinline		__attribute__((noinline))
+#define __always_inline		inline __attribute__((always_inline))
 #define __must_check 		__attribute__((warn_unused_result))
 #define __compiler_offsetof(a,b) __builtin_offsetof(a,b)
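
The effect of the patch above can be sketched outside the kernel as follows. This is an illustration under stated assumptions, not the kernel header: OPTIMIZE_FOR_SIZE stands in for CONFIG_CC_OPTIMIZE_FOR_SIZE, and the K_* macro names are hypothetical substitutes chosen so that the example does not redefine the `inline` keyword or clash with identifiers that libc headers may already define.

```c
#include <assert.h>

/* Stand-in for the Kconfig option; remove this line to model a speed build. */
#define OPTIMIZE_FOR_SIZE 1

#ifdef OPTIMIZE_FOR_SIZE
/* Size build: plain inline is only a hint, the compiler decides. */
# define K_INLINE static inline
#else
/* Speed build: every inline function is forced inline, as before the patch. */
# define K_INLINE static inline __attribute__((always_inline))
#endif

/* Explicit opt-in and opt-out, independent of the build mode. */
#define K_ALWAYS_INLINE static inline __attribute__((always_inline))
#define K_NOINLINE      __attribute__((noinline))

/* Left to the compiler's judgment in a size build. */
K_INLINE int add_one(int x)
{
    return x + 1;
}

/* Must always be inlined, whatever the build mode. */
K_ALWAYS_INLINE unsigned align_up(unsigned v, unsigned a)
{
    assert(a != 0 && (a & (a - 1)) == 0);  /* power of two only */
    return (v + a - 1) & ~(a - 1);
}

/* Never inlined, e.g. to keep a cold path out of its callers. */
K_NOINLINE int slow_path(int x)
{
    return x * 2;
}
```

The separate always-inline macro matters for helpers that are only correct when inlined: they keep the forced behaviour even when ordinary inline functions are left to the compiler.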
 


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-28 21:48         ` Ingo Molnar
@ 2005-12-28 23:56           ` Krzysztof Halasa
  2005-12-29  7:41             ` Ingo Molnar
  2005-12-29  4:11           ` Andrew Morton
  2005-12-29 14:38           ` Christoph Hellwig
  2 siblings, 1 reply; 167+ messages in thread
From: Krzysztof Halasa @ 2005-12-28 23:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Arjan van de Ven, lkml, Andrew Morton,
	Matt Mackall

Ingo Molnar <mingo@elte.hu> writes:

>>    gcc version 4.0.2 20051109 (Red Hat 4.0.2-6)

> another thing: i wanted to decrease the size of -Os 
> (CONFIG_CC_OPTIMIZE_FOR_SIZE) kernels, which e.g. Fedora uses too (to 
> keep the icache footprint down).

Remember the above gcc miscompiles the x86-32 kernel with -Os:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=173764
-- 
Krzysztof Halasa


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-28 21:23       ` Ingo Molnar
  2005-12-28 21:48         ` Ingo Molnar
@ 2005-12-29  0:37         ` Rogério Brito
  2006-01-03  3:36         ` Daniel Jacobowitz
  2 siblings, 0 replies; 167+ messages in thread
From: Rogério Brito @ 2005-12-29  0:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Arjan van de Ven, lkml, Andrew Morton,
	Matt Mackall

On Dec 28 2005, Ingo Molnar wrote:
> how about giving the inlining stuff some more exposure in -mm (if it's 
> fine with Andrew), to check for any regressions? I'd suggest the same 
> for the unit-at-a-time thing too, in any case.

I am willing to give the patches a try on both ia32 and ppc (which is
what I have at hand). I'm using Debian testing, but I can, perhaps, give
GCC 4.1 a shot (if I happen to get my hands on such a patched tree soon
enough).

I am interested in anything that could bring me memory reduction.
Actually, I am even considering using the -tiny patches here on my
father's computer---an old Pentium MMX 200MHz with 64MB of RAM.

Also, the PowerMac 9500 that I have here was inherited from my uncle and
it has a slow SCSI disk (only 2MB/s of transfer rates) and 192MB of RAM.
Anything that makes it avoid hitting swap is a plus, as you can imagine.


Thanks, Rogério.

-- 
Rogério Brito : rbrito@ime.usp.br : http://www.ime.usp.br/~rbrito
Homepage of the algorithms package : http://algorithms.berlios.de
Homepage on freshmeat:  http://freshmeat.net/projects/algorithms/


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-28 21:48         ` Ingo Molnar
  2005-12-28 23:56           ` Krzysztof Halasa
@ 2005-12-29  4:11           ` Andrew Morton
  2005-12-29  7:32             ` Ingo Molnar
                               ` (3 more replies)
  2005-12-29 14:38           ` Christoph Hellwig
  2 siblings, 4 replies; 167+ messages in thread
From: Andrew Morton @ 2005-12-29  4:11 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: torvalds, arjan, linux-kernel, mpm

Ingo Molnar <mingo@elte.hu> wrote:
>
> I think gcc should arguably not be forced to inline things when doing 
> -Os, and it's also expected to mess up much less than when optimizing 
> for speed. So maybe forced inlining should be dependent on 
> !CONFIG_CC_OPTIMIZE_FOR_SIZE?

When it comes to inlining I just don't trust gcc as far as I can spit it. 
We're putting the kernel at the mercy of future random brainfarts and bugs
from the gcc guys.  It would be better and safer IMO to continue to force
`inline' to have strict and sane semantics, and to simply be vigilant about
our use of it.

IOW: I'd prefer that we be the ones who specify which functions are going
to be inlined and which ones are not.


If no-forced-inlining makes the kernel smaller then we probably have (yet
more) incorrect inlining.  We should hunt those down and fix them.  We did
quite a lot of this in 2.5.x/2.6.early.  Didn't someone have a script which
would identify which functions are a candidate for uninlining?


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-28 11:46 Ingo Molnar
  2005-12-28 19:17 ` Linus Torvalds
@ 2005-12-29  4:38 ` Adrian Bunk
  2005-12-29  7:59   ` Ingo Molnar
  1 sibling, 1 reply; 167+ messages in thread
From: Adrian Bunk @ 2005-12-29  4:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: lkml, Linus Torvalds, Andrew Morton, Arjan van de Ven,
	Matt Mackall

On Wed, Dec 28, 2005 at 12:46:37PM +0100, Ingo Molnar wrote:
> this patchset (for the 2.6.16 tree) consists of two patches:
> 
>   gcc-no-forced-inlining.patch
>   gcc-unit-at-a-time.patch
> 
> the purpose of these patches is to reduce the kernel's .text size, in 
> particular if CONFIG_CC_OPTIMIZE_FOR_SIZE is specified. The effect of 
> the patches on x86 is:
> 
>     text    data     bss     dec     hex filename
>  3286166  869852  387260 4543278  45532e vmlinux-orig
>  3194123  955168  387260 4536551  4538e7 vmlinux-inline
>...

The most interesting question is:
Which object files do these savings come from?

We have two cases in the kernel:
- header files where forced inlining is required
- C files where forced inlining is nearly always wrong

The classic example is functions that were marked "inline" when they 
were tiny and had one caller, but are now huge and have many callers.

An interesting number would be the space saving after doing some kind of 
s/inline//g in all .c files.

> unit-at-a-time still increases the kernel stack footprint somewhat (by 
> about 5% in the CC_OPTIMIZE_FOR_SIZE case), but not by the insane degree 
> gcc3 used to, which prompted the original -fno-unit-at-a-time addition.
>...

Please hold off this patch.

I do already plan to look at this after the smoke has cleared after the 
4k stacks issue. I want to avoid two different knobs both with negative 
effects on stack usage (currently CONFIG_4KSTACKS=y, and after your 
patch gcc >= 4.0) giving a low testing coverage of the worst cases.

> 	Ingo

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29  4:11           ` Andrew Morton
@ 2005-12-29  7:32             ` Ingo Molnar
  2005-12-29 14:58               ` Horst von Brand
                                 ` (2 more replies)
  2005-12-29  7:49             ` Arjan van de Ven
                               ` (2 subsequent siblings)
  3 siblings, 3 replies; 167+ messages in thread
From: Ingo Molnar @ 2005-12-29  7:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: torvalds, arjan, linux-kernel, mpm


* Andrew Morton <akpm@osdl.org> wrote:

> Ingo Molnar <mingo@elte.hu> wrote:
> >
> > I think gcc should arguably not be forced to inline things when doing 
> > -Os, and it's also expected to mess up much less than when optimizing 
> > for speed. So maybe forced inlining should be dependent on 
> > !CONFIG_CC_OPTIMIZE_FOR_SIZE?
> 
> When it comes to inlining I just don't trust gcc as far as I can spit 
> it.  We're putting the kernel at the mercy of future random brainfarts 
> and bugs from the gcc guys.  It would be better and safer IMO to 
> continue to force `inline' to have strict and sane semantics, and to 
> simply be vigilant about our use of it.

i think there's quite an attitude here - we are at the mercy of "gcc 
brainfarts" anyway, and users are at the mercy of "kernel brainfarts" 
just as much. Should users disable swapping and trash-talk it just 
because the Linux kernel used to have a poor VM? (And the gcc folks are 
certainly listening - it's not like they were unwilling to fix stuff, 
they simply had their own decade-old technological legacies that made 
certain seemingly random problems much harder to attack. E.g. -Os has 
recently been improved quite significantly in to-be-gcc-4.2.)

at least let us allow gcc to do it in the CONFIG_CC_OPTIMIZE_FOR_SIZE case, 
-Os means "optimize for space" - no ifs and buts, it's a _very_ clear 
and definite goal. I dont think there's much space for gcc to mess up 
there, it's a mostly binary decision: either the inlining of a 
particular function saves space, or not.

in the other case, when optimizing for speed, the decisions are a lot 
less clear, and gcc has arguably a lot more leeway to mess up.

also, there's a fundamental conflict of 'size vs. performance' here, 
for a certain boundary region. For the extremes, very small and very 
large functions, the decision is clear, but if e.g. a CPU has tons of 
cache, it might prefer more aggressive inlining than if it doesnt. So 
it's not like we can do it in a fully static manner.

> If no-forced-inlining makes the kernel smaller then we probably have 
> (yet more) incorrect inlining. We should hunt those down and fix them.  
> We did quite a lot of this in 2.5.x/2.6.early.  Didn't someone have a 
> script which would identify which functions are a candidate for 
> uninlining?

this is going to be a never ending battle, and it's not about peanuts 
either: we are talking about 5% of .text space here, on a .config that 
carries most of the important subsystems and drivers. Do we really want 
to take on this battle and fight it for 30,000+ kernel functions - when 
gcc today can arguably do a _better_ job than what we attempted to do 
manually for years? We went to great trouble going to BK just to make 
development easier - shouldnt we let a fully open-source tool like gcc 
make our lives easier and not worry about details like that? Whether to 
inline or not _is_ mostly thoughtless work with almost zero intellect 
in it. I'd rather trust gcc to do it than some script doing the same much 
worse.

	Ingo


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-28 23:56           ` Krzysztof Halasa
@ 2005-12-29  7:41             ` Ingo Molnar
  2005-12-29  8:02               ` Dave Jones
  2005-12-29 19:44               ` Krzysztof Halasa
  0 siblings, 2 replies; 167+ messages in thread
From: Ingo Molnar @ 2005-12-29  7:41 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: Linus Torvalds, Arjan van de Ven, lkml, Andrew Morton,
	Matt Mackall


* Krzysztof Halasa <khc@pm.waw.pl> wrote:

> Ingo Molnar <mingo@elte.hu> writes:
> 
> >>    gcc version 4.0.2 20051109 (Red Hat 4.0.2-6)
> 
> > another thing: i wanted to decrease the size of -Os 
> > (CONFIG_CC_OPTIMIZE_FOR_SIZE) kernels, which e.g. Fedora uses too (to 
> > keep the icache footprint down).
> 
> Remember the above gcc miscompiles the x86-32 kernel with -Os:
> 
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=173764

i'm not sure what the point is. There was no sudden rush of -Os related 
bugs when Fedora switched to it for the kernel, and the 35% code-size 
savings were certainly worth it in terms of icache footprint. Yes, -Os 
is a major change for how the compiler works, and the kernel is a major 
piece of software.

	Ingo


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29  4:11           ` Andrew Morton
  2005-12-29  7:32             ` Ingo Molnar
@ 2005-12-29  7:49             ` Arjan van de Ven
  2005-12-29 15:01               ` Horst von Brand
  2005-12-30 15:28             ` Alan Cox
  2005-12-30 22:12             ` Matt Mackall
  3 siblings, 1 reply; 167+ messages in thread
From: Arjan van de Ven @ 2005-12-29  7:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, torvalds, linux-kernel, mpm


> IOW: I'd prefer that we be the ones who specify which functions are going
> to be inlined and which ones are not.

a bold statement... especially since the "and which ones are not" isn't
currently there, we still leave gcc a lot of freedom there ... but only
in one direction.



* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29  4:38 ` Adrian Bunk
@ 2005-12-29  7:59   ` Ingo Molnar
  2005-12-29 13:52     ` Adrian Bunk
  0 siblings, 1 reply; 167+ messages in thread
From: Ingo Molnar @ 2005-12-29  7:59 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: lkml, Linus Torvalds, Andrew Morton, Arjan van de Ven,
	Matt Mackall


* Adrian Bunk <bunk@stusta.de> wrote:

> > unit-at-a-time still increases the kernel stack footprint somewhat (by 
> > about 5% in the CC_OPTIMIZE_FOR_SIZE case), but not by the insane degree 
> > gcc3 used to, which prompted the original -fno-unit-at-a-time addition.
> >...
> 
> Please hold off this patch.
> 
> I do already plan to look at this after the smoke has cleared after 
> the 4k stacks issue. I want to avoid two different knobs both with 
> negative effects on stack usage (currently CONFIG_4KSTACKS=y, and 
> after your patch gcc >= 4.0) giving a low testing coverage of the 
> worst cases.

this is obviously not 2.6.15 stuff, so we've got enough time to see the 
effects. [ And what does "I do plan to look at this" mean? When 
precisely, and can i thus go to other topics without the issue being 
dropped on the floor indefinitely? ]

also note that the inlining patch actually _reduces_ average stack 
footprint by ~3-4%:
                                            orig        +inlining
        # of functions above 256 bytes:      683              660
               total stackspace, bytes:   148492           142884

it is the unit-at-a-time patch that increases stack footprint (by about 
7-8%, which together with the inlining patch gives a net ~5%).

	Ingo


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29  7:41             ` Ingo Molnar
@ 2005-12-29  8:02               ` Dave Jones
  2005-12-29 19:44               ` Krzysztof Halasa
  1 sibling, 0 replies; 167+ messages in thread
From: Dave Jones @ 2005-12-29  8:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Krzysztof Halasa, Linus Torvalds, Arjan van de Ven, lkml,
	Andrew Morton, Matt Mackall

On Thu, Dec 29, 2005 at 08:41:07AM +0100, Ingo Molnar wrote:
 > 
 > * Krzysztof Halasa <khc@pm.waw.pl> wrote:
 > 
 > > Ingo Molnar <mingo@elte.hu> writes:
 > > 
 > > >>    gcc version 4.0.2 20051109 (Red Hat 4.0.2-6)
 > > 
 > > > another thing: i wanted to decrease the size of -Os 
 > > > (CONFIG_CC_OPTIMIZE_FOR_SIZE) kernels, which e.g. Fedora uses too (to 
 > > > keep the icache footprint down).
 > > 
 > > Remember the above gcc miscompiles the x86-32 kernel with -Os:
 > > 
 > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=173764
 > 
 > i'm not sure what the point is. There was no sudden rush of -Os related 
 > bugs when Fedora switched to it for the kernel, and the 35% code-size 
 > savings were certainly worth it in terms of icache footprint. Yes, -Os 
 > is a major change for how the compiler works, and the kernel is a major 
 > piece of software.

The bug referenced is also fixed in gcc 4.1

		Dave



* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29  7:59   ` Ingo Molnar
@ 2005-12-29 13:52     ` Adrian Bunk
  2005-12-29 19:57       ` Horst von Brand
  2005-12-29 20:25       ` Ingo Molnar
  0 siblings, 2 replies; 167+ messages in thread
From: Adrian Bunk @ 2005-12-29 13:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: lkml, Linus Torvalds, Andrew Morton, Arjan van de Ven,
	Matt Mackall

On Thu, Dec 29, 2005 at 08:59:36AM +0100, Ingo Molnar wrote:
> 
> * Adrian Bunk <bunk@stusta.de> wrote:
> 
> > > unit-at-a-time still increases the kernel stack footprint somewhat (by 
> > > about 5% in the CC_OPTIMIZE_FOR_SIZE case), but not by the insane degree 
> > > gcc3 used to, which prompted the original -fno-unit-at-a-time addition.
> > >...
> > 
> > Please hold off this patch.
> > 
> > I do already plan to look at this after the smoke has cleared after 
> > the 4k stacks issue. I want to avoid two different knobs both with 
> > negative effects on stack usage (currently CONFIG_4KSTACKS=y, and 
> > after your patch gcc >= 4.0) giving a low testing coverage of the 
> > worst cases.
> 
> this is obviously not 2.6.15 stuff, so we've got enough time to see the 
> effects. [ And what does "I do plan to look at this" mean? When 
> precisely, and can i thus go to other topics without the issue being 
> dropped on the floor indefinitely? ]

It won't be dropped on the floor indefinitely.

"I do plan to look at this" means that I'd currently estimate this being 
2.6.19 stuff.

Yes that's one year from now, but we need it properly analyzed and 
tested before getting it into Linus' tree, and I do really want it 
untangled from and therefore after 4k stacks.

> also note that the inlining patch actually _reduces_ average stack 
> footprint by ~3-4%:
>                                             orig        +inlining
>         # of functions above 256 bytes:      683              660
>                total stackspace, bytes:   148492           142884
> 
> it is the unit-at-a-time patch that increases stack footprint (by about 
> 7-8%, which together with the inlining patch gives a net ~5%).

The problem with the stack is that average stack usage is relatively 
uninteresting - what matters is the worst case stack usage. And I'd 
expect the stack footprint improvements you see with less inlining to be 
in different places than the deteriorations with unit-at-a-time.
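
A minimal sketch of why a shrinking total says little about the worst
case. This is not from the thread: the demo_* names and any frame sizes
fed to it are hypothetical, and a real analysis would walk the actual
call graph rather than one hand-picked chain.

```c
#include <stddef.h>

/* Hypothetical stack frame size of one function, in bytes. */
struct demo_frame {
	const char *name;
	size_t bytes;
};

/* Sum over all functions: comparable to a "total stackspace" figure. */
size_t demo_total_frames(const struct demo_frame *frames, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += frames[i].bytes;
	return total;
}

/*
 * Stack needed by one call chain: every function on the chain has its
 * frame live at the same time, so the sizes add up.
 */
size_t demo_chain_depth(const struct demo_frame *frames,
			const size_t *chain, size_t chain_len)
{
	size_t depth = 0;

	for (size_t i = 0; i < chain_len; i++)
		depth += frames[chain[i]].bytes;
	return depth;
}
```

With made-up frames of 100, 400, 50 and 300 bytes changing to 40, 420,
10 and 360 bytes, the total drops from 850 to 830 while the chain through
the first, second and fourth function grows from 800 to 820.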

> 	Ingo

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-28 21:48         ` Ingo Molnar
  2005-12-28 23:56           ` Krzysztof Halasa
  2005-12-29  4:11           ` Andrew Morton
@ 2005-12-29 14:38           ` Christoph Hellwig
  2005-12-29 14:54             ` Arjan van de Ven
  2 siblings, 1 reply; 167+ messages in thread
From: Christoph Hellwig @ 2005-12-29 14:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Arjan van de Ven, lkml, Andrew Morton,
	Matt Mackall

> another thing: i wanted to decrease the size of -Os 
> (CONFIG_CC_OPTIMIZE_FOR_SIZE) kernels, which e.g. Fedora uses too (to 
> keep the icache footprint down).
> 
> I think gcc should arguably not be forced to inline things when doing 
> -Os, and it's also expected to mess up much less than when optimizing 
> for speed. So maybe forced inlining should be dependent on 
> !CONFIG_CC_OPTIMIZE_FOR_SIZE?

I don't care too much whether we put always_inline or inline on the functions
we _really_ want to inline.  But all others shouldn't have any inline marker.
So instead of changing the pretty useful redefinitions we have to keep the
code a little more readable, what about getting rid of all the stupid inlines
we have all over the place?  I think many things we have as static inline in
headers now should move to proper out-of-line functions.  This is more work,
but also more useful than just flipping a bit.


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 14:38           ` Christoph Hellwig
@ 2005-12-29 14:54             ` Arjan van de Ven
  2005-12-29 15:35               ` Adrian Bunk
  0 siblings, 1 reply; 167+ messages in thread
From: Arjan van de Ven @ 2005-12-29 14:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ingo Molnar, Linus Torvalds, lkml, Andrew Morton, Matt Mackall


> I don't care too much whether we put always_inline or inline on the functions
> we _really_ want to inline.  But all others shouldn't have any inline marker.
> So instead of changing the pretty useful redefinitions we have to keep the
> code a little more readable, what about getting rid of all the stupid inlines
> we have all over the place? 

just in drivers/ there are well over 6400 of those. Changing most of
those is going to be a huge effort. The reality is, most driver writers
(in fact kernel code writers) tend to overestimate the gain of inline in
THEIR code, and to underestimate the cumulative cost of it. Despite what
akpm says, I think gcc can make a better judgement than most of these
authors (probably including me :). We can remove 6400 now, but a year
from now, another 1000 have been added back again I bet.

You describe a nice utopia where only the most essential functions are
inlined.. but so far that hasn't worked out all that well ;) Turning
"inline" back into the hint to the compiler that the C language makes it
is maybe a cop-out, but it's a sustainable approach at least.

> I think many things we have static inline in headers
> now should move to proper out of line functions.

I suspect the biggest gains aren't the ones in the headers; those tend
to be quite small and often mostly optimize away due to constant
arguments (there may be a few exceptions of course), and also have been
attacked by various people in the 2.5/2.6 series before. It's the local
functions that got too many "inline" hints.




^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29  7:32             ` Ingo Molnar
@ 2005-12-29 14:58               ` Horst von Brand
  2005-12-29 15:40               ` Adrian Bunk
  2005-12-29 17:41               ` Linus Torvalds
  2 siblings, 0 replies; 167+ messages in thread
From: Horst von Brand @ 2005-12-29 14:58 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, torvalds, arjan, linux-kernel, mpm

Ingo Molnar <mingo@elte.hu> wrote:
> * Andrew Morton <akpm@osdl.org> wrote:
> > Ingo Molnar <mingo@elte.hu> wrote:
> > > I think gcc should arguably not be forced to inline things when doing 
> > > -Os, and it's also expected to mess up much less than when optimizing 
> > > for speed. So maybe forced inlining should be dependent on 
> > > !CONFIG_CC_OPTIMIZE_FOR_SIZE?

> > When it comes to inlining I just don't trust gcc as far as I can spit 
> > it.  We're putting the kernel at the mercy of future random brainfarts 
> > and bugs from the gcc guys.  It would be better and safer IMO to 
> > continue to force `inline' to have strict and sane semantics, and to 
> > simply be vigilant about our use of it.

> i think there's quite an attitude here - we are at the mercy of "gcc 
> brainfarts" anyway, and users are at the mercy of "kernel brainfarts" 
> just as much. Should users disable swapping and trash-talk it just 
> because the Linux kernel used to have a poor VM? (And the gcc folks are 
> certainly listening - it's not like they were unwilling to fix stuff, 
> they simply had their own decade-old technological legacies that made 
> certain seemingly random problems much harder to attack. E.g. -Os has 
> recently been improved quite significantly in to-be-gcc-4.2.)

Also, we do trust gcc not to screw up on lots of other stuff. I.e., we
trust it to use registers wisely (register anyone?), to set up sane
counting loops and related array handling (no one is using pointers to
traverse arrays "for speed" anymore), and to select the best code sequence
for the machine at hand in lots of cases, ... And not only for the kernel,
for the whole userspace too!

Sure, this is a large change, and it might be warranted to place it under
CONFIG_NEW_COMPILER_OPTIONS (Marked experimental, high explosive, etc if it
makes you too uneasy).
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29  7:49             ` Arjan van de Ven
@ 2005-12-29 15:01               ` Horst von Brand
  0 siblings, 0 replies; 167+ messages in thread
From: Horst von Brand @ 2005-12-29 15:01 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Andrew Morton, Ingo Molnar, torvalds, linux-kernel, mpm

Arjan van de Ven <arjan@infradead.org> wrote:
> > IOW: I'd prefer that we be the ones who specify which functions are going
> > to be inlined and which ones are not.

> a bold statement... especially since the "and which ones are not" isn't
> currently there, we still leave gcc a lot of freedom there ... but only
> in one direction.

Besides, this is currently an everywhere or nowhere switch. gcc (in
principle at least) could decide which calls to inline and for which ones
it isn't worth it. Just like the (also slow to die) "register" keyword.


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 14:54             ` Arjan van de Ven
@ 2005-12-29 15:35               ` Adrian Bunk
  2005-12-29 15:38                 ` Arjan van de Ven
  2005-12-29 15:42                 ` Jakub Jelinek
  0 siblings, 2 replies; 167+ messages in thread
From: Adrian Bunk @ 2005-12-29 15:35 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Christoph Hellwig, Ingo Molnar, Linus Torvalds, lkml,
	Andrew Morton, Matt Mackall

On Thu, Dec 29, 2005 at 03:54:09PM +0100, Arjan van de Ven wrote:
> 
> > I don't care too much whether we put always_inline or inline on the functions
> > we _really_ want to inline.  But all others shouldn't have any inline marker.
> > So instead of changing the pretty useful redefinitions we have to keep the
> > code a little more readable, what about getting rid of all the stupid inlines
> > we have all over the place? 
> 
> just in drivers/ there are well over 6400 of those. Changing most of
> those is going to be a huge effort. The reality is, most driver writers
> (in fact kernel code writers) tend to overestimate the gain of inline in
> THEIR code, and to underestimate the cumulative cost of it. Despite what
> akpm says, I think gcc can make a better judgement than most of these
> authors (probably including me :). We can remove 6400 now, but a year
> from now, another 1000 have been added back again I bet.

Are we that bad reviewing code?

An "inline" in a .c file is simply nearly always wrong in the kernel, 
and unless the author has a good justification for it it should be 
removed.

> You describe a nice utopia where only the most essential functions are
> inlined.. but so far that hasn't worked out all that well ;) Turning
> "inline" back into the hint to the compiler that the C language makes it
> is maybe a cop-out, but it's a sustainable approach at least.
>...

But shouldn't nowadays gcc be able to know best even without an "inline" 
hint?

cu
Adrian



^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 15:35               ` Adrian Bunk
@ 2005-12-29 15:38                 ` Arjan van de Ven
  2005-12-29 15:42                 ` Jakub Jelinek
  1 sibling, 0 replies; 167+ messages in thread
From: Arjan van de Ven @ 2005-12-29 15:38 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Christoph Hellwig, Ingo Molnar, Linus Torvalds, lkml,
	Andrew Morton, Matt Mackall


> > You describe a nice utopia where only the most essential functions are
> > inlined.. but so far that hasn't worked out all that well ;) Turning
> > "inline" back into the hint to the compiler that the C language makes it
> > is maybe a cop-out, but it's a sustainable approach at least.
> >...
> 
> But shouldn't nowadays gcc be able to know best even without an "inline" 
> hint?

it will, the inline hint only affects the thresholds so it's not
entirely without effects, but I can imagine that there are cases that
truly are performance critical and can be optimized out and where you
do want to help gcc a bit (say a one line wrapper around readl or
writel). Otoh I suspect that modern gcc will be more than smart enough
and inline one liners anyway (if they're static of course).
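
To make the hint-versus-force distinction concrete, here is a minimal
sketch that is not from the thread. The demo_* names are hypothetical
and demo_readl() is only a stand-in for the kernel's readl(); the
always_inline attribute is real gcc syntax.

```c
#include <stdint.h>

/* Hypothetical stand-in for readl(): a single volatile 32-bit load. */
static inline uint32_t demo_readl(const volatile uint32_t *addr)
{
	return *addr;
}

/* Plain "inline": a hint that only shifts gcc's inlining thresholds. */
static inline uint32_t demo_read_status(const volatile uint32_t *regs)
{
	return demo_readl(&regs[0]) & 0xffu;
}

/* Forced: gcc has to inline this regardless of its own heuristics. */
static inline __attribute__((always_inline))
uint32_t demo_read_status_forced(const volatile uint32_t *regs)
{
	return demo_readl(&regs[0]) & 0xffu;
}
```

For a wrapper this small both variants will most likely compile to the
same code at -O2 or -Os, which is the point made above: the forced form
mainly matters for functions near the threshold.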



^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29  7:32             ` Ingo Molnar
  2005-12-29 14:58               ` Horst von Brand
@ 2005-12-29 15:40               ` Adrian Bunk
  2005-12-29 17:41               ` Linus Torvalds
  2 siblings, 0 replies; 167+ messages in thread
From: Adrian Bunk @ 2005-12-29 15:40 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, torvalds, arjan, linux-kernel, mpm

On Thu, Dec 29, 2005 at 08:32:59AM +0100, Ingo Molnar wrote:
> 
> * Andrew Morton <akpm@osdl.org> wrote:
> 
> > Ingo Molnar <mingo@elte.hu> wrote:
> > >
> > > I think gcc should arguably not be forced to inline things when doing 
> > > -Os, and it's also expected to mess up much less than when optimizing 
> > > for speed. So maybe forced inlining should be dependent on 
> > > !CONFIG_CC_OPTIMIZE_FOR_SIZE?
> > 
> > When it comes to inlining I just don't trust gcc as far as I can spit 
> > it.  We're putting the kernel at the mercy of future random brainfarts 
> > and bugs from the gcc guys.  It would be better and safer IMO to 
> > > continue to force `inline' to have strict and sane semantics, and to 
> > simply be vigilant about our use of it.
> 
> i think there's quite an attitude here - we are at the mercy of "gcc 
> brainfarts" anyway, and users are at the mercy of "kernel brainfarts" 
> just as much. Should users disable swapping and trash-talk it just 
> because the Linux kernel used to have a poor VM? (And the gcc folks are 
> certainly listening - it's not like they were unwilling to fix stuff, 
> they simply had their own decade-old technological legacies that made 
> certain seemingly random problems much harder to attack. E.g. -Os has 
> recently been improved quite significantly in to-be-gcc-4.2.)
>...
> also, there's a fundamental conflict of 'speed vs. performance' here, 
> for a certain boundary region. For the extremes, very small and very 
> large functions, the decision is clear, but if e.g. a CPU has tons of 
> cache, it might prefer more aggressive inlining than if it doesn't. So 
> it's not like we can do it in a fully static manner.
>...

I'd formulate it the other way round from Andrew:

We should force gcc to inline code where we do know best
("static inline"s in header files) and leave the decision
to gcc in the cases where gcc should know best controlled
by some high-level knobs like -Os/-O2.

gcc simply needs to be forced to inline in some cases in which we really 
need inlining, but in all other cases gcc knows best and we can trust 
gcc to make the right decision.

> 	Ingo

cu
Adrian



^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 15:35               ` Adrian Bunk
  2005-12-29 15:38                 ` Arjan van de Ven
@ 2005-12-29 15:42                 ` Jakub Jelinek
  2005-12-29 19:14                   ` Adrian Bunk
  2005-12-30  9:28                   ` Andi Kleen
  1 sibling, 2 replies; 167+ messages in thread
From: Jakub Jelinek @ 2005-12-29 15:42 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Arjan van de Ven, Christoph Hellwig, Ingo Molnar, Linus Torvalds,
	lkml, Andrew Morton, Matt Mackall

On Thu, Dec 29, 2005 at 04:35:29PM +0100, Adrian Bunk wrote:
> > You describe a nice utopia where only the most essential functions are
> > inlined.. but so far that hasn't worked out all that well ;) Turning
> > "inline" back into the hint to the compiler that the C language makes it
> > is maybe a cop-out, but it's a sustainable approach at least.
> >...
> 
> But shouldn't nowadays gcc be able to know best even without an "inline" 
> hint?

Only for static functions (and in -funit-at-a-time mode).
Anything else would require full IMA over the whole kernel and we aren't
there yet.  So inline hints are useful.  But most of the inline keywords
in the kernel really should be that, hints, because e.g. while it can be
beneficial to inline something on one arch, it may not be beneficial on
another arch, depending on cache sizes, number of general registers
available to the compiler, register pressure, speed of the call/ret
pair, calling convention and many other factors.
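
A small sketch of the visibility point, assuming a single translation
unit. It is not from the thread and the demo_* functions are made up for
illustration.

```c
/*
 * static, no "inline" keyword: the body and every caller are in this
 * file, so in unit-at-a-time mode gcc can decide on its own whether to
 * inline it and whether an out-of-line copy is needed at all.
 */
static int demo_scale(int x)
{
	return x * 3;
}

/*
 * Non-static: callers in this file can still get it inlined because the
 * body is visible, but gcc has to keep an out-of-line copy since other
 * files might call it.
 */
int demo_offset(int x)
{
	return x + 7;
}

/* Calls both helpers; either call may or may not end up inlined. */
int demo_compute(int x)
{
	return demo_offset(demo_scale(x));
}
```

A function that is only declared here and defined in another file could
not be inlined into demo_compute() without the whole-kernel analysis
mentioned above.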

	Jakub

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29  7:32             ` Ingo Molnar
  2005-12-29 14:58               ` Horst von Brand
  2005-12-29 15:40               ` Adrian Bunk
@ 2005-12-29 17:41               ` Linus Torvalds
  2005-12-29 18:42                 ` Arjan van de Ven
                                   ` (3 more replies)
  2 siblings, 4 replies; 167+ messages in thread
From: Linus Torvalds @ 2005-12-29 17:41 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, arjan, linux-kernel, mpm



On Thu, 29 Dec 2005, Ingo Molnar wrote:
> 
> * Andrew Morton <akpm@osdl.org> wrote:
> > 
> > When it comes to inlining I just don't trust gcc as far as I can spit 
> > it.  We're putting the kernel at the mercy of future random brainfarts 
> > and bugs from the gcc guys.  It would be better and safer IMO to 
> > continue to force `inline' to have strict and sane semantics, and to 
> > simply be vigilant about our use of it.
> 
> i think there's quite an attitude here - we are at the mercy of "gcc 
> brainfarts" anyway, and users are at the mercy of "kernel brainfarts" 
> just as much.

There's a huge difference here. The gcc people very much have a "Oh, we 
changed old documented behaviour - live with it" attitude, together with 
"That was a gcc extension, not part of the C language, so when we change 
how gcc behaves, it's _your_ problem" approach.

At least they used to. 

So yes, there's a huge attitude difference. The gcc people have a BAD 
attitude. When the meaning of "inline" changed (from a "inline this" to 
"hey, it's a hint"), the gcc people never EVER said "sorry". They 
effectively said "screw you".

I know this is why I don't trust gcc wrt inlining. It's not so much about 
any technical issues, as about the fact that the kernel tends to be a lot 
heavier user of gcc features than most programs, and has correctness 
issues with them, AND THE GCC PEOPLE SIMPLY DON'T CARE.

Comparing it to the kernel is ludicrous. We care about user-space 
interfaces to an insane degree. We go to extreme lengths to maintain even 
badly designed or unintentional interfaces. Breaking user programs simply 
isn't acceptable. We're _not_ like the gcc developers. We know that 
people use old binaries for years and years, and that making a new 
release doesn't mean that you can just throw that out. You can trust us.

Maybe gcc development has changed. Maybe it hasn't.

THAT is what makes me worry. I don't know if this is why Andrew doesn't 
trust inlining, but I suspect it has similar roots. Not trusting it 
because we haven't been able to trust the people behind it. No heads-up, 
no warnings, no discussions. Just a "screw you, things changed, your 
usage doesn't matter, and we're not even interested in listening to you 
or telling you why things changed".

There have been situations where documented gcc semantics changed, and 
instead of saying "sorry", the gcc people changed the documentation. What 
the hell is the point of documented semantics if you can't depend on them 
anyway?

One thing we could do: I think modern gcc's at least have an option to 
warn when they don't inline something. It might make sense to just enable 
that warning, and see _which_ functions -Os and -funit-at-a-time say are 
too large to be inlined.

Maybe the right thing to do is to just heed that warning, and remove such 
functions from header files and make them no-inline? That way we get the 
size fixes _regardless_ of any compiler options.
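
The gcc option in question is presumably -Winline, which warns when a
function declared inline cannot be inlined. A minimal sketch, not from
the thread: the file and the demo_* names are hypothetical, and a
variadic function is used because gcc documents those as unsuitable for
inline substitution. Whether a merely large function triggers the
warning depends on the gcc version and its size thresholds.

```c
#include <stdarg.h>

/*
 * Hypothetical helper: sums `count` int arguments.  It is declared
 * inline, but it is variadic, so gcc is expected to refuse to inline it.
 * Building with something like
 *     gcc -O2 -Winline -c demo.c
 * is how one would ask gcc to report such refusals.
 */
static inline int demo_sum(int count, ...)
{
	va_list ap;
	int total = 0;

	va_start(ap, count);
	for (int i = 0; i < count; i++)
		total += va_arg(ap, int);
	va_end(ap);
	return total;
}

/* A caller, so that gcc has a call site to consider for inlining. */
int demo_sum_caller(void)
{
	return demo_sum(3, 1, 2, 3);
}
```

Dropping the inline keyword from a function that gcc reports this way
costs nothing, since it was not being inlined anyway.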

				Linus

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 17:41               ` Linus Torvalds
@ 2005-12-29 18:42                 ` Arjan van de Ven
  2005-12-29 18:45                   ` Arjan van de Ven
  2005-12-29 20:19                 ` Ingo Molnar
                                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 167+ messages in thread
From: Arjan van de Ven @ 2005-12-29 18:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, mpm


> 
> One thing we could do: I think modern gcc's at least have an option to 
> warn when they don't inline something. It might make sense to just enable 
> that warning, and see _which_ functions -Os and -funit-at-a-time say are 
> too large to be inlined.


with -Os gcc gets a bit picky and warns a LOT; with -O2... you get the
following fixes (all huge functions)


diff -purN linux-org/drivers/acpi/ec.c linux-2.6.15-rc6/drivers/acpi/ec.c
--- linux-org/drivers/acpi/ec.c	2005-10-28 02:02:08.000000000 +0200
+++ linux-2.6.15-rc6/drivers/acpi/ec.c	2005-12-29 19:21:37.000000000 +0100
@@ -153,7 +153,7 @@ static int acpi_ec_polling_mode = EC_POL
                              Transaction Management
    -------------------------------------------------------------------------- */
 
-static inline u32 acpi_ec_read_status(union acpi_ec *ec)
+static u32 acpi_ec_read_status(union acpi_ec *ec)
 {
 	u32 status = 0;
 
diff -purN linux-org/drivers/bluetooth/hci_bcsp.c linux-2.6.15-rc6/drivers/bluetooth/hci_bcsp.c
--- linux-org/drivers/bluetooth/hci_bcsp.c	2005-12-22 19:54:33.000000000 +0100
+++ linux-2.6.15-rc6/drivers/bluetooth/hci_bcsp.c	2005-12-29 19:23:21.000000000 +0100
@@ -494,7 +494,7 @@ static inline void bcsp_unslip_one_byte(
 	}
 }
 
-static inline void bcsp_complete_rx_pkt(struct hci_uart *hu)
+static void bcsp_complete_rx_pkt(struct hci_uart *hu)
 {
 	struct bcsp_struct *bcsp = hu->priv;
 	int pass_up;
diff -purN linux-org/drivers/char/drm/r128_state.c linux-2.6.15-rc6/drivers/char/drm/r128_state.c
--- linux-org/drivers/char/drm/r128_state.c	2005-12-22 19:54:33.000000000 +0100
+++ linux-2.6.15-rc6/drivers/char/drm/r128_state.c	2005-12-29 19:24:59.000000000 +0100
@@ -220,7 +220,7 @@ static __inline__ void r128_emit_tex1(dr
 	ADVANCE_RING();
 }
 
-static __inline__ void r128_emit_state(drm_r128_private_t * dev_priv)
+static void r128_emit_state(drm_r128_private_t * dev_priv)
 {
 	drm_r128_sarea_t *sarea_priv = dev_priv->sarea_priv;
 	unsigned int dirty = sarea_priv->dirty;
diff -purN linux-org/drivers/isdn/hisax/avm_pci.c linux-2.6.15-rc6/drivers/isdn/hisax/avm_pci.c
--- linux-org/drivers/isdn/hisax/avm_pci.c	2005-12-22 19:54:33.000000000 +0100
+++ linux-2.6.15-rc6/drivers/isdn/hisax/avm_pci.c	2005-12-29 19:29:31.000000000 +0100
@@ -358,7 +358,7 @@ hdlc_fill_fifo(struct BCState *bcs)
 	}
 }
 
-static inline void
+static void
 HDLC_irq(struct BCState *bcs, u_int stat) {
 	int len;
 	struct sk_buff *skb;
diff -purN linux-org/drivers/isdn/hisax/diva.c linux-2.6.15-rc6/drivers/isdn/hisax/diva.c
--- linux-org/drivers/isdn/hisax/diva.c	2005-10-28 02:02:08.000000000 +0200
+++ linux-2.6.15-rc6/drivers/isdn/hisax/diva.c	2005-12-29 19:29:42.000000000 +0100
@@ -476,7 +476,7 @@ Memhscx_fill_fifo(struct BCState *bcs)
 	}
 }
 
-static inline void
+static void
 Memhscx_interrupt(struct IsdnCardState *cs, u_char val, u_char hscx)
 {
 	u_char r;
diff -purN linux-org/drivers/isdn/hisax/hscx_irq.c linux-2.6.15-rc6/drivers/isdn/hisax/hscx_irq.c
--- linux-org/drivers/isdn/hisax/hscx_irq.c	2005-10-28 02:02:08.000000000 +0200
+++ linux-2.6.15-rc6/drivers/isdn/hisax/hscx_irq.c	2005-12-29 19:30:21.000000000 +0100
@@ -119,7 +119,7 @@ hscx_fill_fifo(struct BCState *bcs)
 	}
 }
 
-static inline void
+static void
 hscx_interrupt(struct IsdnCardState *cs, u_char val, u_char hscx)
 {
 	u_char r;
@@ -221,7 +221,7 @@ hscx_interrupt(struct IsdnCardState *cs,
 	}
 }
 
-static inline void
+static void
 hscx_int_main(struct IsdnCardState *cs, u_char val)
 {
 
diff -purN linux-org/drivers/isdn/hisax/jade_irq.c linux-2.6.15-rc6/drivers/isdn/hisax/jade_irq.c
--- linux-org/drivers/isdn/hisax/jade_irq.c	2005-10-28 02:02:08.000000000 +0200
+++ linux-2.6.15-rc6/drivers/isdn/hisax/jade_irq.c	2005-12-29 19:30:07.000000000 +0100
@@ -110,7 +110,7 @@ jade_fill_fifo(struct BCState *bcs)
 }
 
 
-static inline void
+static void
 jade_interrupt(struct IsdnCardState *cs, u_char val, u_char jade)
 {
 	u_char r;
diff -purN linux-org/drivers/md/dm-crypt.c linux-2.6.15-rc6/drivers/md/dm-crypt.c
--- linux-org/drivers/md/dm-crypt.c	2005-12-22 19:54:33.000000000 +0100
+++ linux-2.6.15-rc6/drivers/md/dm-crypt.c	2005-12-29 19:28:58.000000000 +0100
@@ -228,7 +228,7 @@ static struct crypt_iv_operations crypt_
 };
 
 
-static inline int
+static int
 crypt_convert_scatterlist(struct crypt_config *cc, struct scatterlist *out,
                           struct scatterlist *in, unsigned int length,
                           int write, sector_t sector)
diff -purN linux-org/drivers/media/video/cx25840/cx25840-audio.c linux-2.6.15-rc6/drivers/media/video/cx25840/cx25840-audio.c
--- linux-org/drivers/media/video/cx25840/cx25840-audio.c	2005-12-22 19:54:33.000000000 +0100
+++ linux-2.6.15-rc6/drivers/media/video/cx25840/cx25840-audio.c	2005-12-29 19:31:11.000000000 +0100
@@ -23,7 +23,7 @@
 
 #include "cx25840.h"
 
-inline static int set_audclk_freq(struct i2c_client *client,
+static int set_audclk_freq(struct i2c_client *client,
 				 enum v4l2_audio_clock_freq freq)
 {
 	struct cx25840_state *state = i2c_get_clientdata(client);
diff -purN linux-org/drivers/media/video/tvp5150.c linux-2.6.15-rc6/drivers/media/video/tvp5150.c
--- linux-org/drivers/media/video/tvp5150.c	2005-12-22 19:54:33.000000000 +0100
+++ linux-2.6.15-rc6/drivers/media/video/tvp5150.c	2005-12-29 19:31:41.000000000 +0100
@@ -87,7 +87,7 @@ struct tvp5150 {
 	int sat;
 };
 
-static inline int tvp5150_read(struct i2c_client *c, unsigned char addr)
+static int tvp5150_read(struct i2c_client *c, unsigned char addr)
 {
 	unsigned char buffer[1];
 	int rc;
diff -purN linux-org/drivers/mtd/nand/diskonchip.c linux-2.6.15-rc6/drivers/mtd/nand/diskonchip.c
--- linux-org/drivers/mtd/nand/diskonchip.c	2005-12-22 19:54:34.000000000 +0100
+++ linux-2.6.15-rc6/drivers/mtd/nand/diskonchip.c	2005-12-29 19:31:26.000000000 +0100
@@ -1506,7 +1506,7 @@ static inline int __init doc2001plus_ini
 	return 1;
 }
 
-static inline int __init doc_probe(unsigned long physadr)
+static int __init doc_probe(unsigned long physadr)
 {
 	unsigned char ChipID;
 	struct mtd_info *mtd;
diff -purN linux-org/drivers/net/wireless/ipw2100.c linux-2.6.15-rc6/drivers/net/wireless/ipw2100.c
--- linux-org/drivers/net/wireless/ipw2100.c	2005-12-22 19:54:34.000000000 +0100
+++ linux-2.6.15-rc6/drivers/net/wireless/ipw2100.c	2005-12-29 19:33:50.000000000 +0100
@@ -2346,7 +2346,7 @@ static inline void ipw2100_corruption_de
 	schedule_reset(priv);
 }
 
-static inline void isr_rx(struct ipw2100_priv *priv, int i,
+static void isr_rx(struct ipw2100_priv *priv, int i,
 			  struct ieee80211_rx_stats *stats)
 {
 	struct ipw2100_status *status = &priv->status_queue.drv[i];
@@ -2481,7 +2481,7 @@ static inline int ipw2100_corruption_che
  * The WRITE index is cached in the variable 'priv->rx_queue.next'.
  *
  */
-static inline void __ipw2100_rx_process(struct ipw2100_priv *priv)
+static void __ipw2100_rx_process(struct ipw2100_priv *priv)
 {
 	struct ipw2100_bd_queue *rxq = &priv->rx_queue;
 	struct ipw2100_status_queue *sq = &priv->status_queue;
@@ -2634,7 +2634,7 @@ static inline void __ipw2100_rx_process(
  * for use by future command and data packets.
  *
  */
-static inline int __ipw2100_tx_process(struct ipw2100_priv *priv)
+static int __ipw2100_tx_process(struct ipw2100_priv *priv)
 {
 	struct ipw2100_bd_queue *txq = &priv->tx_queue;
 	struct ipw2100_bd *tbd;
diff -purN linux-org/drivers/scsi/iscsi_tcp.c linux-2.6.15-rc6/drivers/scsi/iscsi_tcp.c
--- linux-org/drivers/scsi/iscsi_tcp.c	2005-12-22 19:54:34.000000000 +0100
+++ linux-2.6.15-rc6/drivers/scsi/iscsi_tcp.c	2005-12-29 19:32:02.000000000 +0100
@@ -1437,7 +1437,7 @@ iscsi_buf_data_digest_update(struct iscs
 	}
 }
 
-static inline int
+static int
 iscsi_digest_final_send(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask,
 			struct iscsi_buf *buf, uint32_t *digest, int final)
 {
diff -purN linux-org/drivers/video/matrox/matroxfb_maven.c linux-2.6.15-rc6/drivers/video/matrox/matroxfb_maven.c
--- linux-org/drivers/video/matrox/matroxfb_maven.c	2005-10-28 02:02:08.000000000 +0200
+++ linux-2.6.15-rc6/drivers/video/matrox/matroxfb_maven.c	2005-12-29 19:34:05.000000000 +0100
@@ -968,7 +968,7 @@ static inline int maven_compute_timming(
 	return 0;
 }
 
-static inline int maven_program_timming(struct maven_data* md,
+static int maven_program_timming(struct maven_data* md,
 		const struct mavenregs* m) {
 	struct i2c_client* c = md->client;
 
diff -purN linux-org/fs/9p/conv.c linux-2.6.15-rc6/fs/9p/conv.c
--- linux-org/fs/9p/conv.c	2005-10-28 02:02:08.000000000 +0200
+++ linux-2.6.15-rc6/fs/9p/conv.c	2005-12-29 19:20:19.000000000 +0100
@@ -350,7 +350,7 @@ serialize_stat(struct v9fs_session_info 
  *
  */
 
-static inline int
+static int
 deserialize_stat(struct v9fs_session_info *v9ses, struct cbuf *bufp,
 		 struct v9fs_stat *stat, struct cbuf *dbufp)
 {
diff -purN linux-org/fs/nfsd/nfsxdr.c linux-2.6.15-rc6/fs/nfsd/nfsxdr.c
--- linux-org/fs/nfsd/nfsxdr.c	2005-10-28 02:02:08.000000000 +0200
+++ linux-2.6.15-rc6/fs/nfsd/nfsxdr.c	2005-12-29 19:24:28.000000000 +0100
@@ -151,7 +151,7 @@ decode_sattr(u32 *p, struct iattr *iap)
 	return p;
 }
 
-static inline u32 *
+static u32 *
 encode_fattr(struct svc_rqst *rqstp, u32 *p, struct svc_fh *fhp)
 {
 	struct vfsmount *mnt = fhp->fh_export->ex_mnt;
diff -purN linux-org/net/ieee80211/ieee80211_rx.c linux-2.6.15-rc6/net/ieee80211/ieee80211_rx.c
--- linux-org/net/ieee80211/ieee80211_rx.c	2005-12-22 19:54:36.000000000 +0100
+++ linux-2.6.15-rc6/net/ieee80211/ieee80211_rx.c	2005-12-29 19:24:05.000000000 +0100
@@ -1295,7 +1295,7 @@ static inline int is_beacon(int fc)
 	return (WLAN_FC_GET_STYPE(le16_to_cpu(fc)) == IEEE80211_STYPE_BEACON);
 }
 
-static inline void ieee80211_process_probe_response(struct ieee80211_device
+static void ieee80211_process_probe_response(struct ieee80211_device
 						    *ieee, struct
 						    ieee80211_probe_response
 						    *beacon, struct ieee80211_rx_stats
diff -purN linux-org/net/netfilter/nfnetlink.c linux-2.6.15-rc6/net/netfilter/nfnetlink.c
--- linux-org/net/netfilter/nfnetlink.c	2005-12-22 19:54:36.000000000 +0100
+++ linux-2.6.15-rc6/net/netfilter/nfnetlink.c	2005-12-29 19:28:08.000000000 +0100
@@ -212,7 +212,7 @@ int nfnetlink_unicast(struct sk_buff *sk
 }
 
 /* Process one complete nfnetlink message. */
-static inline int nfnetlink_rcv_msg(struct sk_buff *skb,
+static int nfnetlink_rcv_msg(struct sk_buff *skb,
 				    struct nlmsghdr *nlh, int *errp)
 {
 	struct nfnl_callback *nc;
diff -purN linux-org/sound/oss/esssolo1.c linux-2.6.15-rc6/sound/oss/esssolo1.c
--- linux-org/sound/oss/esssolo1.c	2005-10-28 02:02:08.000000000 +0200
+++ linux-2.6.15-rc6/sound/oss/esssolo1.c	2005-12-29 19:23:05.000000000 +0100
@@ -515,7 +515,7 @@ static inline int prog_dmabuf_adc(struct
 	return 0;
 }
 
-static inline int prog_dmabuf_dac(struct solo1_state *s)
+static int prog_dmabuf_dac(struct solo1_state *s)
 {
 	unsigned long va;
 	int c;



^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 18:42                 ` Arjan van de Ven
@ 2005-12-29 18:45                   ` Arjan van de Ven
  0 siblings, 0 replies; 167+ messages in thread
From: Arjan van de Ven @ 2005-12-29 18:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, mpm

On Thu, 2005-12-29 at 19:42 +0100, Arjan van de Ven wrote:
> > 
> > One thing we could do: I think modern gcc's at least have an option to 
> > warn when they don't inline something. It might make sense to just enable 
> > that warning, and see _which_ functions -Os and -funit-at-a-time say are 
> > too large to be inlined.
> 
> 
> with -Os gcc gets a bit picky and warns a LOT; with -O2... you get the
> following fixes (all huge functions)
> 


btw this caught one bug that the forced-inline attribute was hiding: there
was a function which was "inline" and which uses a variable-sized array.
normally gcc refuses to inline that (rightfully; %esp-relative addressing
gets really complex in that scenario), but the forced attribute causes it
to be inlined anyway. No idea if the result is sane in that case...
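
For illustration, the pattern described above looks roughly like the
sketch below. The helper name copy_via_vla is hypothetical and is not the
function that was actually found; the only point is that the size of the
local array is a run-time value, so the frame layout is not known at
compile time. Whether a given gcc version inlines such a function is up
to the compiler; building with -Winline is one way to see when a plain
"inline" request is declined.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the kernel tree): an "inline" function
 * whose local buffer is a variable-length array. The size of tmp[] is
 * only known at run time. The caller must pass n > 0.
 */
static inline size_t copy_via_vla(char *dst, const char *src, size_t n)
{
	char tmp[n];		/* variable-length array */

	memcpy(tmp, src, n);	/* stage the bytes in the VLA */
	memcpy(dst, tmp, n);	/* then copy them out to the caller */
	return n;
}
```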




* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 15:42                 ` Jakub Jelinek
@ 2005-12-29 19:14                   ` Adrian Bunk
  2005-12-30  9:28                   ` Andi Kleen
  1 sibling, 0 replies; 167+ messages in thread
From: Adrian Bunk @ 2005-12-29 19:14 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Arjan van de Ven, Christoph Hellwig, Ingo Molnar, Linus Torvalds,
	lkml, Andrew Morton, Matt Mackall

On Thu, Dec 29, 2005 at 10:42:41AM -0500, Jakub Jelinek wrote:
> On Thu, Dec 29, 2005 at 04:35:29PM +0100, Adrian Bunk wrote:
> > > You describe a nice utopia where only the most essential functions are
> > > inlined.. but so far that hasn't worked out all that well ;) Turning
> > > "inline" back into the hint to the compiler that the C language makes it
> > > is maybe a cop-out, but it's a sustainable approach at least.
> > >...
> > 
> > But shouldn't nowadays gcc be able to know best even without an "inline" 
> > hint?
> 
> Only for static functions (and in -funit-at-a-time mode).

I'm assuming -funit-at-a-time mode. Currently it's disabled on i386, but 
this will change in the medium-term future.

> Anything else would require full IMA over the whole kernel and we aren't
> there yet.  So inline hints are useful.  But most of the inline keywords
> in the kernel really should be that, hints, because e.g. while it can be

Are there (on !alpha) any places in the kernel where a function is 
inline but not static, and this is wanted?

> beneficial to inline something on one arch, it may be not beneficial on
> another arch, depending on cache sizes, number of general registers
> available to the compiler, register pressure, speed of the call/ret
> pair, calling convention and many other factors.

Does gcc really need hints when the functions are static?

> 	Jakub

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29  7:41             ` Ingo Molnar
  2005-12-29  8:02               ` Dave Jones
@ 2005-12-29 19:44               ` Krzysztof Halasa
  1 sibling, 0 replies; 167+ messages in thread
From: Krzysztof Halasa @ 2005-12-29 19:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Arjan van de Ven, lkml, Andrew Morton,
	Matt Mackall

Ingo Molnar <mingo@elte.hu> writes:

>> Remember the above gcc miscompiles the x86-32 kernel with -Os:
>> 
>> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=173764
>
> i'm not sure what the point is.

Nothing special, just a side note.

> There was no sudden rush of -Os related 
> bugs when Fedora switched to it for the kernel,

I found 'ip route add' was broken with -Os. I use FC4s but the kernel
is usually a mutated version of Linus' tree so I can't check it.

> and the 35% code-size 
> savings were certainly worth it in terms of icache footprint.

Sure.

Good to hear gcc 4.1 is fixed.
-- 
Krzysztof Halasa


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 13:52     ` Adrian Bunk
@ 2005-12-29 19:57       ` Horst von Brand
  2005-12-29 20:25       ` Ingo Molnar
  1 sibling, 0 replies; 167+ messages in thread
From: Horst von Brand @ 2005-12-29 19:57 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Ingo Molnar, lkml, Linus Torvalds, Andrew Morton,
	Arjan van de Ven, Matt Mackall

Adrian Bunk <bunk@stusta.de> wrote:
> On Thu, Dec 29, 2005 at 08:59:36AM +0100, Ingo Molnar wrote:
> > * Adrian Bunk <bunk@stusta.de> wrote:

> > > > unit-at-a-time still increases the kernel stack footprint somewhat
> > > > (by about 5% in the CC_OPTIMIZE_FOR_SIZE case), but not by the
> > > > insane degree gcc3 used to, which prompted the original
> > > > -fno-unit-at-a-time addition.
> > > >...

> > > Please hold off this patch.
> > > 
> > > I do already plan to look at this after the smoke has cleared after 
> > > the 4k stacks issue. I want to avoid two different knobs both with 
> > > negative effects on stack usage (currently CONFIG_4KSTACKS=y, and 
> > > after your patch gcc >= 4.0) giving a low testing coverage of the 
> > > worst cases.

This is /one/ knob with effect on stack usage...

> > this is obviously not 2.6.15 stuff, so we've got enough time to see the 
> > effects. [ And what does "I do plan to look at this" mean? When 
> > precisely, and can i thus go to other topics without the issue being 
> > dropped on the floor indefinitely? ]

> It won't be dropped on the floor indefinitely.
> 
> "I do plan to look at this" means that I'd currently estimate this being 
> 2.6.19 stuff.
> 
> Yes that's one year from now, but we need it properly analyzed and 
> tested before getting it into Linus' tree, and I do really want it 
> untangled from and therefore after 4k stacks.

That is "indefinitely" in my book. Or nearly so. And in the meantime it
will get many hackers to patch it in by hand and forget to tell anyone...

> > also note that the inlining patch actually _reduces_ average stack 
> > footprint by ~3-4%:
> >                                             orig        +inlining
> >         # of functions above 256 bytes:      683              660
> >                total stackspace, bytes:   148492           142884
> > 
> > it is the unit-at-a-time patch that increases stack footprint (by about 
> > 7-8%, which together with the inlining patch gives a net ~5%).
> 
> The problem with the stack is that average stack usage is relatively 
> uninteresting - what matters is the worst case stack usage. And I'd 
> expect the stack footprint improvements you see with less inlining in 
> different places than the deteriorations with unit-at-a-time.

That is a red herring. The numbers are for number of large stack users
(goes down) and cumulative stack usage (goes down too). Sure, if the
number of > 256 bytes stack users goes down while the largest stack uses go
up we are in trouble. And if the grand total goes down but stack usage by
some critical users go up we might be screwed. That could be answered by
looking at the details behind the above numbers. But there is only one way
to find out if it causes problems (and fix them)...
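
As a quick check of the figures quoted above, a minimal sketch (the
helper name percent_reduction is made up for this illustration):

```c
/* Percentage by which `after` is smaller than `before`. */
static double percent_reduction(double before, double after)
{
	return (before - after) * 100.0 / before;
}
```

With the quoted values, percent_reduction(683, 660) is about 3.4 and
percent_reduction(148492, 142884) is about 3.8, which matches the
"~3-4%" claim. Neither number says anything about the deepest single
call chain, which is the worst-case question raised above.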

I'd (tend to) buy an argument about possible instabilities in that gcc
code, but then again, it has to be tested sometime...

Make it a configuration option, under EXPERIMENTAL, VERY DANGEROUS, HIGH
EXPLOSIVE if you must.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513



* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 17:41               ` Linus Torvalds
  2005-12-29 18:42                 ` Arjan van de Ven
@ 2005-12-29 20:19                 ` Ingo Molnar
  2005-12-29 22:20                   ` Matt Mackall
  2005-12-29 20:28                 ` Dave Jones
  2005-12-29 23:16                 ` Willy Tarreau
  3 siblings, 1 reply; 167+ messages in thread
From: Ingo Molnar @ 2005-12-29 20:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, arjan, linux-kernel, mpm


* Linus Torvalds <torvalds@osdl.org> wrote:

> > i think there's quite an attitude here - we are at the mercy of "gcc 
> > brainfarts" anyway, and users are at the mercy of "kernel brainfarts" 
> > just as much.
> 
> There's a huge difference here. The gcc people very much have a "Oh, 
> we changed old documented behaviour - live with it" attitude, together 
> with "That was a gcc extension, not part of the C language, so when we 
> change how gcc behaves, it's _your_ problem" approach.
> 
> At least they used to.

yeah, i think that was definitely the case historically.

> Maybe the right thing to do is to just heed that warning, and remove 
> such functions from header files and make them no-inline? That way we 
> get the size fixes _regardless_ of any compiler options.

i think the eye-opener (for me at least) was that there's really a
massive 5%+ size difference here, from 2 simple patches. And meanwhile
Matt is doing truly hard size-reduction work and is mailing patches to
lkml that remove 200-300 bytes of .text, which is 0.01% of code, apiece.

Debloating is like scalability, a piece-by-piece process where we'll
only see the full effects after doing 100 independent steps, but still
we must not ignore the big effects either, nor must we get ourselves
into losing maintenance battles.

The current inline model seems to be a lost battle, the 'size noise'
caused by spurious inlines (which count in the thousands) is _far_
outpowering most of the size reduction efforts. And i think it can be
argued that at least in the -Os case gcc has a very clear directive wrt.
what to do - and much less room to mess up. Independently of how much we
trust it.

	Ingo


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 13:52     ` Adrian Bunk
  2005-12-29 19:57       ` Horst von Brand
@ 2005-12-29 20:25       ` Ingo Molnar
  2005-12-31 15:22         ` Adrian Bunk
  1 sibling, 1 reply; 167+ messages in thread
From: Ingo Molnar @ 2005-12-29 20:25 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: lkml, Linus Torvalds, Andrew Morton, Arjan van de Ven,
	Matt Mackall, Dave Jones


* Adrian Bunk <bunk@stusta.de> wrote:

> It won't be dropped on the floor indefinitely.
> 
> "I do plan to look at this" means that I'd currently estimate this 
> being 2.6.19 stuff.

you must be kidding ...

> Yes that's one year from now, but we need it properly analyzed and 
> tested before getting it into Linus' tree, and I do really want it 
> untangled from and therefore after 4k stacks.

you are really using the wrong technology for this.

look at the latency tracing patch i posted today: it includes a feature 
that prints the worst-case stack footprint _as it happens_, and thus 
allows the mapping of such effects in a very efficient and very 
practical way. As it works on a live system, and profiles live function 
traces, it goes through function pointers and irq entry nesting effects 
too. We could perhaps put that into Fedora for a while and get the 
worst-case footprints mapped.

in fact i've been running this feature in the -rt kernel for quite some 
time, and it enabled the fixing of a couple of bad stack abusers, and it 
also told us what our current worst-case stack footprint is [when 4K 
stacks are enabled]: it's execve of an ELF binary.

	Ingo


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 17:41               ` Linus Torvalds
  2005-12-29 18:42                 ` Arjan van de Ven
  2005-12-29 20:19                 ` Ingo Molnar
@ 2005-12-29 20:28                 ` Dave Jones
  2005-12-29 20:49                   ` Linus Torvalds
  2005-12-29 23:16                 ` Willy Tarreau
  3 siblings, 1 reply; 167+ messages in thread
From: Dave Jones @ 2005-12-29 20:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, Andrew Morton, arjan, linux-kernel, mpm

On Thu, Dec 29, 2005 at 09:41:12AM -0800, Linus Torvalds wrote:

 > Comparing it to the kernel is ludicrous. We care about user-space 
 > interfaces to an insane degree. We go to extreme lengths to maintain even 
 > badly designed or unintentional interfaces. Breaking user programs simply 
 > isn't acceptable. We're _not_ like the gcc developers. We know that 
 > people use old binaries for years and years, and that making a new 
 > release doesn't mean that you can just throw that out. You can trust us.

Does this mean you're holding back the 2.6.15 release until we don't
need to update udev to stop X from breaking ?
</tongue-in-cheek>

Seriously, we break things _every_ release. Sometimes in tiny
'doesn't really matter' ways, sometimes in "fuck, my system no
longer works" ways, but the days where I didn't have to tell
our userspace packagers to rev a half dozen or so packages up to the
latest upstream revisions when I've pushed a rebased kernel are
a distant memory.

		Dave



* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 20:28                 ` Dave Jones
@ 2005-12-29 20:49                   ` Linus Torvalds
  2005-12-29 21:25                     ` Linus Torvalds
  2005-12-30  3:47                     ` Mark Lord
  0 siblings, 2 replies; 167+ messages in thread
From: Linus Torvalds @ 2005-12-29 20:49 UTC (permalink / raw)
  To: Dave Jones; +Cc: Ingo Molnar, Andrew Morton, arjan, linux-kernel, mpm



On Thu, 29 Dec 2005, Dave Jones wrote:
> 
> Seriously, we break things _every_ release. Sometimes in tiny
> 'doesn't really matter' ways, sometimes in "fuck, my system no
> longer works" ways, but the days where I didn't have to tell
> our userspace packagers to rev a half dozen or so packages up to the
> latest upstream revisions when I've pushed a rebased kernel are
> a distant memory.

Umm.. Complain more. I upgrade kernels a lot more often than I upgrade 
distros, and things don't break. They're not allowed to break, because I 
refuse to upgrade my user programs just because I do kernel development. 
But I'd only notice a small part of user space, so if people don't 
complain, they break not because we don't care, but because we didn't even 
know.

So if you have a user program that breaks, _complain_. It's really not 
supposed to happen outside of perhaps kernel module loaders etc things 
that get really really chummy with kernel internals (and even that was 
fixed: the modern way of loading modules isn't that chummy any more, so 
hopefully we'll not need to break even module loaders again).

If we change some /proc file thing, breakage is often totally 
unintentional, and complaining is the right thing - people might not even 
have realized it broke.

At least _I_ take breakage reports seriously. If there are maintainers 
that don't, complain to them. I'll back you up. Breaking user space simply 
isn't acceptable without years of preparation and warning.

			Linus


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 20:49                   ` Linus Torvalds
@ 2005-12-29 21:25                     ` Linus Torvalds
       [not found]                       ` <20051229224839.GA12247@elte.hu>
  2005-12-30  3:47                     ` Mark Lord
  1 sibling, 1 reply; 167+ messages in thread
From: Linus Torvalds @ 2005-12-29 21:25 UTC (permalink / raw)
  To: Dave Jones; +Cc: Ingo Molnar, Andrew Morton, arjan, linux-kernel, mpm



On Thu, 29 Dec 2005, Linus Torvalds wrote:
> 
> At least _I_ take breakage reports seriously. If there are maintainers 
> that don't, complain to them. I'll back you up. Breaking user space simply 
> isn't acceptable without years of preparation and warning.

Btw, sometimes we knowingly change semantics that we believe that nobody 
would ever be able to care about. Then we literally _depend_ on people 
complaining about breakage in case we were wrong, and if you guys don't, 
and just curse, and upgrade programs, we actually miss out on real 
information.

And yes, occasionally we don't have much choice, and things break. It 
should be extremely rare, though. Much more commonly it would be a bug or 
an unintentional change that somebody didn't even realize changed 
semantics subtly.

		Linus


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 20:19                 ` Ingo Molnar
@ 2005-12-29 22:20                   ` Matt Mackall
  0 siblings, 0 replies; 167+ messages in thread
From: Matt Mackall @ 2005-12-29 22:20 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, Andrew Morton, arjan, linux-kernel

On Thu, Dec 29, 2005 at 09:19:53PM +0100, Ingo Molnar wrote:
> 
> * Linus Torvalds <torvalds@osdl.org> wrote:
> 
> > > i think there's quite an attitude here - we are at the mercy of "gcc 
> > > brainfarts" anyway, and users are at the mercy of "kernel brainfarts" 
> > > just as much.
> > 
> > There's a huge difference here. The gcc people very much have a "Oh, 
> > we changed old documented behaviour - live with it" attitude, together 
> > with "That was a gcc extension, not part of the C language, so when we 
> > change how gcc behaves, it's _your_ problem" approach.
> > 
> > At least they used to.
> 
> yeah, i think that was definitely the case historically.
> 
> > Maybe the right thing to do is to just heed that warning, and remove 
> > such functions from header files and make them no-inline? That way we 
> > get the size fixes _regardless_ of any compiler options.
> 
> i think the eye-opener (for me at least) was that there's really a
> massive 5%+ size difference here, from 2 simple patches. And meanwhile
> Matt is doing truly hard size-reduction work and is mailing patches to
> lkml that remove 200-300 bytes of .text, which is 0.01% of code, apiece.

For the record, my cut-off for non-trivial stuff is currently about
1K. Which is more like 0.1% for minimal kernels. Unfortunately, the
impact of these patches on a stripped-down kernel is less substantial
than on a featureful one, so 

> Debloating is like scalability, a piece-by-piece process where we'll
> only see the full effects after doing 100 independent steps, but still
> we must not ignore the big effects either, nor must we get ourselves
> into losing maintenance battles.
> 
> The current inline model seems to be a lost battle, the 'size noise'
> caused by spurious inlines (which count in the thousands) is _far_
> outpowering most of the size reduction efforts. And i think it can be
> argued that at least in the -Os case gcc has a very clear directive wrt.
> what to do - and much less room to mess up. Independently of how much we
> trust it.

I think both these patches deserve a spin in -mm. But I can see
arguments for staging it. Hopefully we can get Andrew to take the
unit-at-a-time piece for post-2.6.15 and try out the inlining after
we've got some confidence with the first.

-- 
Mathematics is the supreme nostalgia of our time.


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
       [not found]                       ` <20051229224839.GA12247@elte.hu>
@ 2005-12-29 22:58                         ` Arjan van de Ven
  2005-12-30  2:03                           ` Tim Schmielau
  2005-12-30  3:31                           ` Nicolas Pitre
  0 siblings, 2 replies; 167+ messages in thread
From: Arjan van de Ven @ 2005-12-29 22:58 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, Dave Jones, Andrew Morton, linux-kernel, mpm

Some data from an x86-64 allyesconfig build.

Below is a *rough* estimate of savings that could be achieved by
uninlining specific functions. The estimate is rough in the sense that it assumes
that no "trick" allows the uninlined version to be significantly smaller
than the inlined version, which for certain functions is not a valid
assumption (kmalloc comes to mind as an obvious one).

The saving is estimated at (count-1) * (size-6), i.e. a function call is
estimated to cost 6 bytes, and the size a function takes when inlined is
assumed to be the same as its uninlined size.


These are the top items only; a more complete list can be gotten 
from http://www.fenrus.org/savings

Est saving       function name                   count   uninline size
----------------------------------------------------------------------
95940            down                            [2461]  <45>
84392            skb_put                         [1097]  <83>
50932            kfree_skb                       [1499]  <40>
44880            init_waitqueue_head             [881]   <57>
34840            lowmem_page_address             [537]   <71>
25573            cfi_build_cmd                   [108]   <245>
19825            skb_push                        [326]   <67>
17992            aic_outb                        [347]   <58>
17434            module_put                      [380]   <52>
16318            ahd_outb                        [399]   <47>
16035            kmalloc                         [3208]  <11>
14040            netif_wake_queue                [361]   <45>
13266            dev_kfree_skb_irq               [202]   <72>
12078            signal_pending                  [672]   <24>
11979            ahc_outb                        [364]   <39>
11603            down_interruptible              [284]   <47>
11552            ahd_inb                         [305]   <44>
11310            dst_release                     [175]   <71>
11275            netif_stop_queue                [452]   <31>
11165            down_write                      [320]   <41>
11107            ahc_inb                         [384]   <35>
10807            usb_fill_bulk_urb               [102]   <113>
10508            ahd_set_modes                   [72]    <154>
10266            skb_queue_head_init             [178]   <64>
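
The estimate used for the first column can be reproduced with a few
lines of C. This is only a sketch of the stated formula; the names
estimated_saving and CALL_COST_BYTES are made up for the illustration.

```c
/* Assumed cost of one function call in bytes, per the estimate above. */
#define CALL_COST_BYTES 6

/*
 * Implements (count-1) * (size-6) as stated: count is the number of
 * inlined copies, size is the uninlined size of the function in bytes.
 */
static long estimated_saving(long count, long size)
{
	return (count - 1) * (size - CALL_COST_BYTES);
}
```

For example, estimated_saving(2461, 45) gives 95940 (the "down" row) and
estimated_saving(3208, 11) gives 16035 (the "kmalloc" row).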




* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 17:41               ` Linus Torvalds
                                   ` (2 preceding siblings ...)
  2005-12-29 20:28                 ` Dave Jones
@ 2005-12-29 23:16                 ` Willy Tarreau
  2005-12-30  8:05                   ` Arjan van de Ven
                                     ` (2 more replies)
  3 siblings, 3 replies; 167+ messages in thread
From: Willy Tarreau @ 2005-12-29 23:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, Andrew Morton, arjan, linux-kernel, mpm

On Thu, Dec 29, 2005 at 09:41:12AM -0800, Linus Torvalds wrote:
 
> There have been situations where documented gcc semantics changed, and 
> instead of saying "sorry", the gcc people changed the documentation. What 
> the hell is the point of documented semantics if you can't depend on them 
> anyway?

Remember the #arg and ##arg mess in macros between gcc2 and gcc3 ?

I feel like I'm starting to understand where your hate for specifications
comes from. As much as I like to stick to specs, which is generally
OK for hardware and network protocols, I can say that with GCC, there
is clearly no rule telling you whether your program will still compile
with version N+1 or not.

Can't we elect a recommended gcc version that distro makers could
ship under the name kgcc, as has been the case for some time,
and try to stick to that version for as long as possible ? The only
real reason to upgrade it would be to support newer archs, while at
the moment, we try to support compilers which are shipped as default
*user-space* compilers.

Willy



* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 22:58                         ` Arjan van de Ven
@ 2005-12-30  2:03                           ` Tim Schmielau
  2005-12-30  2:15                             ` Tim Schmielau
                                               ` (2 more replies)
  2005-12-30  3:31                           ` Nicolas Pitre
  1 sibling, 3 replies; 167+ messages in thread
From: Tim Schmielau @ 2005-12-30  2:03 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Ingo Molnar, Linus Torvalds, Dave Jones, Andrew Morton, lkml, mpm

On Thu, 29 Dec 2005, Arjan van de Ven wrote:

> Some data from an x86-64 allyesconfig build.

Thanks for the table. This certainly is a good starting point to find 
valid candidates for uninlining.

> Below is a *rough* estimate of savings that could be achieved by
> uninlining specific functions. The estimate is rough in the sense that 
> it assumes
> that no "trick" allows the uninlined version to be significantly smaller
> than the inlined version, which for certain functions is not a valid
> assumption (kmalloc comes to mind as an obvious one).

What about the (probably more common) case that the inlined version is 
smaller because of optimizations that are not possible in the general 
case?

> The saving is estimated at (count-1) * (size-6), eg the estimate for a
> function call is 6 bytes as well and the estimate for the size something 
> takes as inlined is the same as the uninline size. 

Maybe the estimate is a little bit too rough. All savings add up to 
1780743 bytes, which seems a bit too large to me (can't compare to the
total size of an allyesconfig kernel since that gives me a 'File size 
limit exceeded' when linking).


What about the previous suggestion to remove inline from *all* static 
inline functions in .c files?
I just tried that for the fun of it. It got rid of 8806 'inline' 
annotations and produced the ~2 MB (uncompressed) patch at
   http://www.physik3.uni-rostock.de/tim/kernel/2.6/deinline.patch.gz
The resulting kernel actually booted (am running it right now). However, 
catching just these low-hanging fruits doesn't get me anywhere near 
Arjan's numbers. For my non-representative personal config I get (on 
i386 with -funit-at-a-time):

   > size vmlinux*
      text    data     bss     dec     hex filename
   2197105  386568  316840 2900513  2c4221 vmlinux
   2144453  392100  316840 2853393  2b8a11 vmlinux.deinline
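
To read the two rows side by side, a small sketch (struct size_row and
row_total are names invented for this illustration, not part of any
tool):

```c
/* One row of `size` output; all values are in bytes. */
struct size_row {
	long text;
	long data;
	long bss;
};

/* The "dec" column is the sum of the three section sizes. */
static long row_total(struct size_row r)
{
	return r.text + r.data + r.bss;
}
```

Plugging in the rows above, .text shrinks by 52652 bytes (about 2.4%),
.data grows by 5532 bytes, and the total drops by 47120 bytes.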

I just started an allyesconfig build to get some real numbers.

Tim


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  2:03                           ` Tim Schmielau
@ 2005-12-30  2:15                             ` Tim Schmielau
  2005-12-30  7:49                             ` Ingo Molnar
  2005-12-31  3:51                             ` Kurt Wall
  2 siblings, 0 replies; 167+ messages in thread
From: Tim Schmielau @ 2005-12-30  2:15 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Ingo Molnar, Linus Torvalds, Dave Jones, Andrew Morton, lkml, mpm

On Fri, 30 Dec 2005, Tim Schmielau wrote:

>    > size vmlinux*
>       text    data     bss     dec     hex filename
>    2197105  386568  316840 2900513  2c4221 vmlinux
>    2144453  392100  316840 2853393  2b8a11 vmlinux.deinline

Doh! I forgot to set -Os.
Will better go to bed now and redo the numbers tomorrow.


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 22:58                         ` Arjan van de Ven
  2005-12-30  2:03                           ` Tim Schmielau
@ 2005-12-30  3:31                           ` Nicolas Pitre
  1 sibling, 0 replies; 167+ messages in thread
From: Nicolas Pitre @ 2005-12-30  3:31 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Ingo Molnar, Linus Torvalds, Dave Jones, Andrew Morton,
	linux-kernel, mpm

On Thu, 29 Dec 2005, Arjan van de Ven wrote:

> Some data from an x86-64 allyesconfig build.
> 
> 25573            cfi_build_cmd                   [108]   <245>

Beware this one.  The CFI code is not realistically ever used with 
everything set to y in real life scenarios.  In fact, when only the 
needed buswidth and interleave option are selected then this particular 
inlined function gets reduced to a simple constant, such as 0x00700070 
for example.

However if gcc wasn't forced to always inline, then in the allyesconfig 
this function would benefit from being uninlined automatically.
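
To illustrate the constant-folding point, here is a simplified sketch.
It is NOT the real cfi_build_cmd(); the function name, its parameters and
the chosen geometry (two interleaved chips, 16 bits each) are assumptions
made only to show how a fixed configuration collapses to one constant.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a command builder: replicate an 8-bit
 * command once per interleaved chip, device_bits apart. The caller must
 * keep interleave * device_bits <= 32.
 */
static inline uint32_t build_cmd_sketch(uint8_t cmd, int interleave,
					int device_bits)
{
	uint32_t word = 0;
	int i;

	for (i = 0; i < interleave; i++)
		word |= (uint32_t)cmd << (i * device_bits);
	return word;
}
```

When all three arguments are compile-time constants, as in
build_cmd_sketch(0x70, 2, 16), an optimizing compiler can typically
reduce the inlined call to the constant 0x00700070. With every geometry
enabled the arguments are run-time values and the whole loop is
duplicated at each call site.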


Nicolas


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 20:49                   ` Linus Torvalds
  2005-12-29 21:25                     ` Linus Torvalds
@ 2005-12-30  3:47                     ` Mark Lord
  2005-12-30  3:56                       ` Dave Jones
  2005-12-30  3:57                       ` Mark Lord
  1 sibling, 2 replies; 167+ messages in thread
From: Mark Lord @ 2005-12-30  3:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Ingo Molnar, Andrew Morton, arjan, linux-kernel, mpm

 >If we change some /proc file thing, breakage is often totally
 >unintentional, and complaining is the right thing - people might not even
 >have realized it broke.

Okay, I'm complaining:  /proc/cpuinfo is no longer correct
for my Pentium-M notebook, as of 2.6.15-rc7.  Now it reports
a cpu speed of approx 800Mhz for a 2.0Mhz Pentium-M.

Cheers!


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  3:47                     ` Mark Lord
@ 2005-12-30  3:56                       ` Dave Jones
  2005-12-30  3:57                       ` Mark Lord
  1 sibling, 0 replies; 167+ messages in thread
From: Dave Jones @ 2005-12-30  3:56 UTC (permalink / raw)
  To: Mark Lord
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, arjan, linux-kernel,
	mpm

On Thu, Dec 29, 2005 at 10:47:28PM -0500, Mark Lord wrote:
 > >If we change some /proc file thing, breakage is often totally
 > >unintentional, and complaining is the right thing - people might not even
 > >have realized it broke.
 > 
 > Okay, I'm complaining:  /proc/cpuinfo is no longer correct
 > for my Pentium-M notebook, as of 2.6.15-rc7.  Now it reports
 > a cpu speed of approx 800Mhz for a 2.0Mhz Pentium-M.

It's reporting the 'current running speed'. You have speedstep-centrino
loaded, (and probably a governor changing the speed down when idle).

I don't see how this can be construed as breakage btw.
Which application breaks due to this changing ?

		Dave



* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  3:47                     ` Mark Lord
  2005-12-30  3:56                       ` Dave Jones
@ 2005-12-30  3:57                       ` Mark Lord
  2005-12-30  4:02                         ` Dave Jones
  1 sibling, 1 reply; 167+ messages in thread
From: Mark Lord @ 2005-12-30  3:57 UTC (permalink / raw)
  To: Mark Lord
  Cc: Linus Torvalds, Dave Jones, Ingo Molnar, Andrew Morton, arjan,
	linux-kernel, mpm

Mark Lord wrote:
>
> Okay, I'm complaining:  /proc/cpuinfo is no longer correct
> for my Pentium-M notebook, as of 2.6.15-rc7.  Now it reports
> a cpu speed of approx 800Mhz for a 2.0Mhz Pentium-M.

2.0GHz, not Mhz!  (blush)

Prior to -rc7, /proc/cpuinfo would scale according to the
current speedstep of the CPU.  Now it seems stuck at the
lowest setting for some reason.

Cheers

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  3:57                       ` Mark Lord
@ 2005-12-30  4:02                         ` Dave Jones
  2005-12-30  4:11                           ` Mark Lord
  0 siblings, 1 reply; 167+ messages in thread
From: Dave Jones @ 2005-12-30  4:02 UTC (permalink / raw)
  To: Mark Lord
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, arjan, linux-kernel,
	mpm

On Thu, Dec 29, 2005 at 10:57:40PM -0500, Mark Lord wrote:
 > Mark Lord wrote:
 > >
 > >Okay, I'm complaining:  /proc/cpuinfo is no longer correct
 > >for my Pentium-M notebook, as of 2.6.15-rc7.  Now it reports
 > >a cpu speed of approx 800Mhz for a 2.0Mhz Pentium-M.
 > 
 > 2.0GHz, not Mhz!  (blush)
 > 
 > Prior to -rc7, /proc/cpuinfo would scale according to the
 > current speedstep of the CPU.  Now it seems stuck at the
 > lowest setting for some reason.

Ok, if the scaling doesn't work any more, that's a bug rather
than an intentional breakage.  More details please? dmesg ?
/sys/devices/system/cpu/cpufreq contents? What were you using
to do the scaling previously?  (An app, or ondemand)

		Dave


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  4:02                         ` Dave Jones
@ 2005-12-30  4:11                           ` Mark Lord
  2005-12-30  4:14                             ` Mark Lord
  0 siblings, 1 reply; 167+ messages in thread
From: Mark Lord @ 2005-12-30  4:11 UTC (permalink / raw)
  To: Dave Jones
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, arjan, linux-kernel,
	mpm

Dave Jones wrote:
> On Thu, Dec 29, 2005 at 10:57:40PM -0500, Mark Lord wrote:
>  > Mark Lord wrote:
>  > >
>  > >Okay, I'm complaining:  /proc/cpuinfo is no longer correct
>  > >for my Pentium-M notebook, as of 2.6.15-rc7.  Now it reports
>  > >a cpu speed of approx 800Mhz for a 2.0Mhz Pentium-M.
>  > 
>  > 2.0GHz, not Mhz!  (blush)
>  > 
>  > Prior to -rc7, /proc/cpuinfo would scale according to the
>  > current speedstep of the CPU.  Now it seems stuck at the
>  > lowest setting for some reason.
> 
> Ok, if the scaling doesn't work any more, that's a bug rather
> than an intentional breakage.  More details please? dmesg ?
> /sys/devices/system/cpu/cpufreq contents? What were you using
> to do the scaling previously?  (An app, or ondemand)

The actual speedstep component ("ondemand" cpufreq) is working just
fine, according to /sys/devices/system/cpu/cpufreq.  But /proc/cpuinfo
is no longer reflecting the current values -- stuck at 800Mhz
regardless of /sys/devices/system/cpu/cpufreq showing other values.

Cheers

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  4:11                           ` Mark Lord
@ 2005-12-30  4:14                             ` Mark Lord
  2005-12-30  4:20                               ` Mark Lord
  0 siblings, 1 reply; 167+ messages in thread
From: Mark Lord @ 2005-12-30  4:14 UTC (permalink / raw)
  To: Mark Lord
  Cc: Dave Jones, Linus Torvalds, Ingo Molnar, Andrew Morton, arjan,
	linux-kernel, mpm

Mark Lord wrote:
>
> The actual speedstep component ("ondemand" cpufreq) is working just
> fine, according to /sys/devices/system/cpu/cpufreq.  But /proc/cpuinfo
> is no longer reflecting the current values -- stuck at 800Mhz
> regardless of /sys/devices/system/cpu/cpufreq showing other values.

Actually, the path is /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq.

And tonight it appears to be working again (/proc/cpuinfo showing
correct values, something it was not doing when I first checked it
after upgrading to -rc7.. something buggy there??).

Cheers

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  4:14                             ` Mark Lord
@ 2005-12-30  4:20                               ` Mark Lord
  2005-12-30  5:04                                 ` Dave Jones
  0 siblings, 1 reply; 167+ messages in thread
From: Mark Lord @ 2005-12-30  4:20 UTC (permalink / raw)
  To: Mark Lord
  Cc: Dave Jones, Linus Torvalds, Ingo Molnar, Andrew Morton, arjan,
	linux-kernel, mpm

Mark Lord wrote:
..
> And tonight it appears to be working again (/proc/cpuinfo showing
> correct values, something it was not doing when I first checked it
> after upgrading to -rc7.. something buggy there??).

Okay, I've tried a couple of reboots, and it's working fine tonight.
Maybe it only fails when doing a public demo for Windows people?
(as when it first failed).

Leave it.  If I can catch it again, I'll scream again then.

Cheers

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  4:20                               ` Mark Lord
@ 2005-12-30  5:04                                 ` Dave Jones
  0 siblings, 0 replies; 167+ messages in thread
From: Dave Jones @ 2005-12-30  5:04 UTC (permalink / raw)
  To: Mark Lord
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, arjan, linux-kernel,
	mpm

On Thu, Dec 29, 2005 at 11:20:12PM -0500, Mark Lord wrote:
 > Mark Lord wrote:
 > ..
 > >And tonight it appears to be working again (/proc/cpuinfo showing
 > >correct values, something it was not doing when I first checked it
 > >after upgrading to -rc7.. something buggy there??).
 > 
 > Okay, I've tried a couple of reboots, and it's working fine tonight.
 > Maybe it only fails when doing a public demo for Windows people?
 > (as when it first failed).
 > 
 > Leave it.  If I can catch it again, I'll scream again then.

One thing that could explain it.. SMP kernels currently don't
report scaling correctly. It'll always show the boot frequency.
There's a fix for this in the cpufreq.git repo (and -mm)
that's going to Linus once 2.6.15 is out.

		Dave


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  2:03                           ` Tim Schmielau
  2005-12-30  2:15                             ` Tim Schmielau
@ 2005-12-30  7:49                             ` Ingo Molnar
  2005-12-31 14:38                               ` Adrian Bunk
  2005-12-31  3:51                             ` Kurt Wall
  2 siblings, 1 reply; 167+ messages in thread
From: Ingo Molnar @ 2005-12-30  7:49 UTC (permalink / raw)
  To: Tim Schmielau
  Cc: Arjan van de Ven, Linus Torvalds, Dave Jones, Andrew Morton, lkml,
	mpm


* Tim Schmielau <tim@physik3.uni-rostock.de> wrote:

> What about the previous suggestion to remove inline from *all* static 
> inline functions in .c files?

i think this is far too static an approach. Why go from one extreme to 
the other, when my 3 simple patches (which arguably create a more 
flexible scenario) give us savings of 7.7%?

	Ingo

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 23:16                 ` Willy Tarreau
@ 2005-12-30  8:05                   ` Arjan van de Ven
  2005-12-30  8:15                     ` Willy Tarreau
  2005-12-30  8:33                   ` Jesper Juhl
  2005-12-30 19:53                   ` Alistair John Strachan
  2 siblings, 1 reply; 167+ messages in thread
From: Arjan van de Ven @ 2005-12-30  8:05 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, linux-kernel, mpm


> Can't we elect a recommended gcc version that distro makers could
> ship under the name kgcc as it has been the case for some time,

speaking as someone who used to work for a distro: this sucks for
distros. Shipping 2 compilers is NOT fun. Not fun at all! It's double
the maintenance, actually more since 1 of the 2 is only used in 1
package, so it gets a lot less testing.



^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  8:05                   ` Arjan van de Ven
@ 2005-12-30  8:15                     ` Willy Tarreau
  2005-12-30  8:24                       ` Arjan van de Ven
  0 siblings, 1 reply; 167+ messages in thread
From: Willy Tarreau @ 2005-12-30  8:15 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, linux-kernel, mpm

On Fri, Dec 30, 2005 at 09:05:17AM +0100, Arjan van de Ven wrote:
> 
> > Can't we elect a recommended gcc version that distro makers could
> > ship under the name kgcc as it has been the case for some time,
> 
> speaking as someone who used to work for a distro: this sucks for
> distros. Shipping 2 compilers is NOT fun. Not fun at all! It's double
> the maintenance, actually more since 1 of the 2 is only used in 1
> package, so it gets a lot less testing.

I trust your experience on this, but wasn't the lack of testing
primarily due to the use of a "special" version of the compiler ?
For instance, if we put a short howto in Documentation/ explaining
how to build a kgcc toolchain describing what versions to use, there
are chances that most LKML users will use the exact same version.
Distro maintainers may want to follow the same version too. Also,
the fact that the kernel would be designed to work with *that*
compiler will limit the maintenance trouble you certainly have
encountered trying to keep the compiler up-to-date with more recent
kernel patches and updates.

Of course I may be wrong, but I think that kernel developers spend
a huge amount of time adapting the kernel to newer versions of gcc (and
fixing bugs caused by new versions too), and this time would be better
spent developing new features and fixing bugs (and of course sometimes
maintaining the kgcc toolchain when needed).

Willy


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  8:15                     ` Willy Tarreau
@ 2005-12-30  8:24                       ` Arjan van de Ven
  2005-12-30  9:20                         ` Willy Tarreau
  0 siblings, 1 reply; 167+ messages in thread
From: Arjan van de Ven @ 2005-12-30  8:24 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, linux-kernel, mpm

On Fri, 2005-12-30 at 09:15 +0100, Willy Tarreau wrote:
> 
> 
> I trust your experience on this, but wasn't the lack of testing
> primarily due to the use of a "special" version of the compiler ?
> For instance, if we put a short howto in Documentation/ explaining
> how to build a kgcc toolchain describing what versions to use, there
> are chances that most LKML users will use the exact same version.
> Distro maintainers may want to follow the same version too. Also,
> the fact that the kernel would be designed to work with *that*
> compiler will limit the maintenance trouble you certainly have
> encountered trying to keep the compiler up-to-date with more recent
> kernel patches and updates.

it's not that easy. Simply put: the gcc people release an update every 6
months; distros "jump ahead" the bugfixes on that usually. (think of it
like -stable, where distros would ship patches accepted for -stable but
before -stable got released). Taking an older compiler from gcc.gnu.org
doesn't mean it's bug free. It just means you're not getting bugfixes.



^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 23:16                 ` Willy Tarreau
  2005-12-30  8:05                   ` Arjan van de Ven
@ 2005-12-30  8:33                   ` Jesper Juhl
  2005-12-30  9:28                     ` Willy Tarreau
  2005-12-30 19:53                   ` Alistair John Strachan
  2 siblings, 1 reply; 167+ messages in thread
From: Jesper Juhl @ 2005-12-30  8:33 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, arjan, linux-kernel,
	mpm

On 12/30/05, Willy Tarreau <willy@w.ods.org> wrote:
<!-- snip -->
>
> Can't we elect a recommended gcc version that distro makers could
> ship under the name kgcc as it has been the case for some time,
> and try to stick to that version for as long as possible ? The only
> real reason to upgrade it would be to support newer archs, while at
> the moment, we try to support compilers which are shipped as default
> *user-space* compilers.
>
As I see it, doing that would
 - put extra work on distributors.
 - bloat users systems with the need to have two gcc versions installed.
 - decrease testing with different gcc versions, which sometimes uncover bugs.


--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  8:24                       ` Arjan van de Ven
@ 2005-12-30  9:20                         ` Willy Tarreau
  2005-12-30 13:38                           ` Adrian Bunk
  0 siblings, 1 reply; 167+ messages in thread
From: Willy Tarreau @ 2005-12-30  9:20 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, linux-kernel, mpm

On Fri, Dec 30, 2005 at 09:24:32AM +0100, Arjan van de Ven wrote:
> On Fri, 2005-12-30 at 09:15 +0100, Willy Tarreau wrote:
> > 
> > 
> > I trust your experience on this, but wasn't the lack of testing
> > primarily due to the use of a "special" version of the compiler ?
> > For instance, if we put a short howto in Documentation/ explaining
> > how to build a kgcc toolchain describing what versions to use, there
> > are chances that most LKML users will use the exact same version.
> > Distro maintainers may want to follow the same version too. Also,
> > the fact that the kernel would be designed to work with *that*
> > compiler will limit the maintenance trouble you certainly have
> > encountered trying to keep the compiler up-to-date with more recent
> > kernel patches and updates.
> 
> it's not that easy. Simply put: the gcc people release an update every 6
> months; distros "jump ahead" the bugfixes on that usually. (think of it
> like -stable, where distros would ship patches accepted for -stable but
> before -stable got released). Taking an older compiler from gcc.gnu.org
> doesn't mean it's bug free. It just means you're not getting bugfixes.

OK, but precisely, we don't have any bug free version of gcc anyway. The
kernel has a long history of workarounds for gcc bugs. So probably there
will be less work with a -possibly buggy- old gcc version than with a
constantly changing one. For instance, if we stick to 3.4 for 2 years,
we will of course encounter a lot of bugs. But they will be worked around
just like gcc-2.95 bugs have been, and we will be able to keep the same
compiler very long at virtually zero maintenance work.

A few years ago, I had to work on a mainframe system with gcc 1.37.
Yes, 1.37 !!! It was very limited, but I could adapt my code to it
without thinking about what would happen when they update it precisely
because it was not meant to evolve at all. It had been shipped like
this with the OS for 5 years and that was OK. With stable tools like
this, any bug becomes a feature because you don't risk someone fixing
it and breaking your workaround.

While it would be a real problem for user-space tools, I think it
is compatible with kernel needs. The kernel already has strict
requirements to be built and does not need the same level of
portability as pdksh or openssh for instance.

Willy


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  8:33                   ` Jesper Juhl
@ 2005-12-30  9:28                     ` Willy Tarreau
  2005-12-30  9:37                       ` Jesper Juhl
  0 siblings, 1 reply; 167+ messages in thread
From: Willy Tarreau @ 2005-12-30  9:28 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, arjan, linux-kernel,
	mpm

On Fri, Dec 30, 2005 at 09:33:14AM +0100, Jesper Juhl wrote:
> On 12/30/05, Willy Tarreau <willy@w.ods.org> wrote:
> <!-- snip -->
> >
> > Can't we elect a recommended gcc version that distro makers could
> > ship under the name kgcc as it has been the case for some time,
> > and try to stick to that version for as long as possible ? The only
> > real reason to upgrade it would be to support newer archs, while at
> > the moment, we try to support compilers which are shipped as default
> > *user-space* compilers.
> >
> As I see it, doing that would
>  - put extra work on distributors.

In the short term, yes. In the mid-term, I don't think so. Having one package
which does not need to change and another one which evolves regardless of
kernel needs is less work than ensuring that a single package is still
compatible with everyone's needs. Think about support too : "what gcc version
did you use ?" would simply become "did you build with kgcc ?"

>  - bloat users systems with the need to have two gcc versions installed.

$ size /usr/lib/gcc-lib/i586-pc-linux-gnu/3.3.6/cc1
   text    data     bss     dec     hex filename
3430228    2680  746688 4179596  3fc68c /usr/lib/gcc-lib/i586-pc-linux-gnu/3.3.6/cc1

You don't even need libgcc nor c++ to build the kernel. Anyway, it should
not be an absolute requirement, but the *recommended* and *supported* version.

>  - decrease testing with different gcc versions, which sometimes uncover bugs.

gcc testing should not consume kernel developers' time, but gcc users' time.
How many kernel bugs have finally been attributed to a recent change in gcc ?
A lot I think. Uncovering bugs in gcc is useful but not the primary goal of
kernel developers.

> Jesper Juhl <jesper.juhl@gmail.com>

Willy


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 15:42                 ` Jakub Jelinek
  2005-12-29 19:14                   ` Adrian Bunk
@ 2005-12-30  9:28                   ` Andi Kleen
  2005-12-30  9:40                     ` Ingo Molnar
  1 sibling, 1 reply; 167+ messages in thread
From: Andi Kleen @ 2005-12-30  9:28 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Arjan van de Ven, Christoph Hellwig, Ingo Molnar, Linus Torvalds,
	lkml, Andrew Morton, Matt Mackall

Jakub Jelinek <jakub@redhat.com> writes:
> 
> Only for static functions (and in -funit-at-a-time mode).
> Anything else would require full IMA over the whole kernel and we aren't
> there yet.  So inline hints are useful.  But most of the inline keywords
> in the kernel really should be that, hints, because e.g. while it can be
> beneficial to inline something on one arch, it may be not beneficial on
> another arch, depending on cache sizes, number of general registers
> available to the compiler, register pressure, speed of the call/ret
> pair, calling convention and many other factors.

There are important exceptions like: 

- Code that really wants to do compile time constant resolution
(like the x86 copy_*_user)  and even throws linker errors when wrong.
- Anything in an include file (otherwise it gets duplicated for
every #include which can actually increase text size a lot) 
- There is some code which absolutely needs inline in the x86-64 
vsyscall code.

But arguably they should be force_inline.

I'm not quite sure I buy Ingo's original argument either.   If he's only
looking at text size then, with the above fixed, he would ideally
like to not inline anything (because, apart from the exceptions
above, .text nearly always shrinks when not inlining). But that's
not necessarily best for performance.

-Andi

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  9:28                     ` Willy Tarreau
@ 2005-12-30  9:37                       ` Jesper Juhl
  2005-12-30  9:38                         ` Willy Tarreau
  0 siblings, 1 reply; 167+ messages in thread
From: Jesper Juhl @ 2005-12-30  9:37 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, arjan, linux-kernel,
	mpm

On 12/30/05, Willy Tarreau <willy@w.ods.org> wrote:
> On Fri, Dec 30, 2005 at 09:33:14AM +0100, Jesper Juhl wrote:
> > On 12/30/05, Willy Tarreau <willy@w.ods.org> wrote:
> > <!-- snip -->
> > >
> > > Can't we elect a recommended gcc version that distro makers could
> > > ship under the name kgcc as it has been the case for some time,
> > > and try to stick to that version for as long as possible ? The only
> > > real reason to upgrade it would be to support newer archs, while at
> > > the moment, we try to support compilers which are shipped as default
> > > *user-space* compilers.
> > >
> > As I see it, doing that would
> >  - put extra work on distributors.
>
> In the short term, yes. In the mid-term, I don't think so. Having one package
> which does not need to change and another one which evolves regardless of
> kernel needs is less work than ensuring that a single package is still
> compatible with everyone's needs. Think about support too : "what gcc version
> did you use ?" would simply become "did you build with kgcc ?"
>
> >  - bloat users systems with the need to have two gcc versions installed.
>
> $ size /usr/lib/gcc-lib/i586-pc-linux-gnu/3.3.6/cc1
>    text    data     bss     dec     hex filename
> 3430228    2680  746688 4179596  3fc68c /usr/lib/gcc-lib/i586-pc-linux-gnu/3.3.6/cc1
>
It's not much, agreed, but if the user's regular gcc can build the
kernel it's still unnecessary extra bloat to have two gcc's.
But you are right, the bloat issue is just a minor thing.


> You don't even need libgcc nor c++ to build the kernel. Anyway, it should
> not be an absolute requirement, but the *recommended* and *supported* version.
>
> >  - decrease testing with different gcc versions, which sometimes uncover bugs.
>
> gcc testing should not consume kernel developers' time, but gcc users' time.
> How many kernel bugs have finally been attributed to a recent change in gcc ?
> A lot I think. Uncovering bugs in gcc is useful but not the primary goal of
> kernel developers.
>
That's not what I meant. I meant that building the kernel with
different gcc versions sometimes uncovers bugs in the *kernel*.  I was
not talking about finding bugs in gcc.

--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  9:37                       ` Jesper Juhl
@ 2005-12-30  9:38                         ` Willy Tarreau
  0 siblings, 0 replies; 167+ messages in thread
From: Willy Tarreau @ 2005-12-30  9:38 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, arjan, linux-kernel,
	mpm

On Fri, Dec 30, 2005 at 10:37:08AM +0100, Jesper Juhl wrote:
> On 12/30/05, Willy Tarreau <willy@w.ods.org> wrote:
> > On Fri, Dec 30, 2005 at 09:33:14AM +0100, Jesper Juhl wrote:
> > > On 12/30/05, Willy Tarreau <willy@w.ods.org> wrote:
> > > <!-- snip -->
> > > >
> > > > Can't we elect a recommended gcc version that distro makers could
> > > > ship under the name kgcc as it has been the case for some time,
> > > > and try to stick to that version for as long as possible ? The only
> > > > real reason to upgrade it would be to support newer archs, while at
> > > > the moment, we try to support compilers which are shipped as default
> > > > *user-space* compilers.
> > > >
> > > As I see it, doing that would
> > >  - put extra work on distributors.
> >
> > In the short term, yes. In the mid-term, I don't think so. Having one package
> > which does not need to change and another one which evolves regardless of
> > kernel needs is less work than ensuring that a single package is still
> > compatible with everyone's needs. Think about support too : "what gcc version
> > did you use ?" would simply become "did you build with kgcc ?"
> >
> > >  - bloat users systems with the need to have two gcc versions installed.
> >
> > $ size /usr/lib/gcc-lib/i586-pc-linux-gnu/3.3.6/cc1
> >    text    data     bss     dec     hex filename
> > 3430228    2680  746688 4179596  3fc68c /usr/lib/gcc-lib/i586-pc-linux-gnu/3.3.6/cc1
> >
> It's not much, agreed, but if the user's regular gcc can build the
> kernel it's still unnecessary extra bloat to have two gcc's.
> But you are right, the bloat issue is just a minor thing.
> 
> 
> > You don't even need libgcc nor c++ to build the kernel. Anyway, it should
> > not be an absolute requirement, but the *recommended* and *supported* version.
> >
> > >  - decrease testing with different gcc versions, which sometimes uncover bugs.
> >
> > gcc testing should not consume kernel developers' time, but gcc users' time.
> > How many kernel bugs have finally been attributed to a recent change in gcc ?
> > A lot I think. Uncovering bugs in gcc is useful but not the primary goal of
> > kernel developers.
> >
> That's not what I meant. I meant that building the kernel with
> different gcc versions sometimes uncovers bugs in the *kernel*.  I was
> not talking about finding bugs in gcc.

OK. But there will always be people trying to build kernels with any gcc so
I don't think we would lose this bug report channel anyway.

> Jesper Juhl <jesper.juhl@gmail.com>

Willy


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  9:28                   ` Andi Kleen
@ 2005-12-30  9:40                     ` Ingo Molnar
  2005-12-30 10:14                       ` Ingo Molnar
  2005-12-30 10:25                       ` Andi Kleen
  0 siblings, 2 replies; 167+ messages in thread
From: Ingo Molnar @ 2005-12-30  9:40 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jakub Jelinek, Arjan van de Ven, Christoph Hellwig,
	Linus Torvalds, lkml, Andrew Morton, Matt Mackall


* Andi Kleen <ak@suse.de> wrote:

> There are important exceptions like: 
> 
> - Code that really wants to do compile time constant resolution
> (like the x86 copy_*_user)  and even throws linker errors when wrong.
> - Anything in an include file (otherwise it gets duplicated for
> every #include which can actually increase text size a lot) 
> - There is some code which absolutely needs inline in the x86-64 
> vsyscall code.
> 
> But arguably they should be force_inline.

FYI, i picked up a couple of those in the 3rd patch that i sent 
yesterday (see below too). That patch marks a handful of functions 
__always_inline. This improved size by another 2-3%. Not bad from a 
small patch:

 asm-i386/apic.h        |    6 +++---
 asm-i386/bitops.h      |    2 +-
 asm-i386/current.h     |    2 +-
 asm-i386/string.h      |    8 ++++----
 linux/buffer_head.h    |   10 +++++-----
 linux/byteorder/swab.h |   18 +++++++++---------
 linux/mm.h             |    2 +-
 linux/slab.h           |    2 +-
 8 files changed, 25 insertions(+), 25 deletions(-)

> I'm not quite sure I buy Ingo's original argument either.  If he's only 
> looking at text size then, with the above fixed, he would ideally 
> like to not inline anything (because, apart from the exceptions above, 
> .text nearly always shrinks when not inlining). But that's not 
> necessarily best for performance.

well, i think the numbers talk for themselves. Here are my latest 
results:

---- 
The effect of the patches on x86, using a generic .config is:

    text    data     bss     dec     hex filename
 3286166  869852  387260 4543278  45532e vmlinux-orig
 3194123  955168  387260 4536551  4538e7 vmlinux-inline
 3119495  884960  387748 4392203  43050b vmlinux-inline+units
 3051709  869380  387748 4308837  41bf65 vmlinux-inline+units+fixes
 3049357  868928  387748 4306033  41b471 vmlinux-inline+units+fixes+capable

i.e. a 7.8% code-size reduction. Using a tiny .config gives:

   text    data     bss     dec     hex filename
 437271   77646   32192  547109   85925 vmlinux-orig
 452694   77646   32192  562532   89564 vmlinux-inline
 431891   77422   32128  541441   84301 vmlinux-inline+units
 414803   77422   32128  524353   80041 vmlinux-inline+units+fixes
 414020   77422   32128  523570   7fd32 vmlinux-inline+units+fixes+capable

or a 5.6% reduction.

i've also done test-builds with CC_OPTIMIZE_FOR_SIZE disabled:

    text    data     bss     dec     hex filename
 4080998  870384  387260 5338642  517612 vmlinux-orig
 4084421  872024  387260 5343705  5189d9 vmlinux-inline
 4010957  834048  387748 5232753  4fd871 vmlinux-inline+units
 4010039  833112  387748 5230899  4fd133 vmlinux-inline+units+fixes
 4007617  833120  387748 5228485  4fc7c5 vmlinux-inline+units+fixes+capable

or a 1.8% code size reduction.

	Ingo

--------
Subject: mark a handful of inline functions as 'must inline'

this patch marks a number of functions as 'must inline' - so that they
get inlined even if optimizing for size. This patch gives another 2-3%
of size saved, when CONFIG_CC_OPTIMIZE_FOR_SIZE is enabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

----

 include/asm-i386/apic.h        |    6 +++---
 include/asm-i386/bitops.h      |    2 +-
 include/asm-i386/current.h     |    2 +-
 include/asm-i386/string.h      |    8 ++++----
 include/linux/buffer_head.h    |   10 +++++-----
 include/linux/byteorder/swab.h |   18 +++++++++---------
 include/linux/mm.h             |    2 +-
 include/linux/slab.h           |    2 +-
 8 files changed, 25 insertions(+), 25 deletions(-)

Index: linux-gcc.q/include/asm-i386/apic.h
===================================================================
--- linux-gcc.q.orig/include/asm-i386/apic.h
+++ linux-gcc.q/include/asm-i386/apic.h
@@ -49,17 +49,17 @@ static inline void lapic_enable(void)
  * Basic functions accessing APICs.
  */
 
-static __inline void apic_write(unsigned long reg, unsigned long v)
+static __always_inline void apic_write(unsigned long reg, unsigned long v)
 {
 	*((volatile unsigned long *)(APIC_BASE+reg)) = v;
 }
 
-static __inline void apic_write_atomic(unsigned long reg, unsigned long v)
+static __always_inline void apic_write_atomic(unsigned long reg, unsigned long v)
 {
 	xchg((volatile unsigned long *)(APIC_BASE+reg), v);
 }
 
-static __inline unsigned long apic_read(unsigned long reg)
+static __always_inline unsigned long apic_read(unsigned long reg)
 {
 	return *((volatile unsigned long *)(APIC_BASE+reg));
 }
Index: linux-gcc.q/include/asm-i386/bitops.h
===================================================================
--- linux-gcc.q.orig/include/asm-i386/bitops.h
+++ linux-gcc.q/include/asm-i386/bitops.h
@@ -247,7 +247,7 @@ static inline int test_and_change_bit(in
 static int test_bit(int nr, const volatile void * addr);
 #endif
 
-static inline int constant_test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline int constant_test_bit(int nr, const volatile unsigned long *addr)
 {
 	return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
 }
Index: linux-gcc.q/include/asm-i386/current.h
===================================================================
--- linux-gcc.q.orig/include/asm-i386/current.h
+++ linux-gcc.q/include/asm-i386/current.h
@@ -5,7 +5,7 @@
 
 struct task_struct;
 
-static inline struct task_struct * get_current(void)
+static __always_inline struct task_struct * get_current(void)
 {
 	return current_thread_info()->task;
 }
Index: linux-gcc.q/include/asm-i386/string.h
===================================================================
--- linux-gcc.q.orig/include/asm-i386/string.h
+++ linux-gcc.q/include/asm-i386/string.h
@@ -201,7 +201,7 @@ __asm__ __volatile__(
 return __res;
 }
 
-static inline void * __memcpy(void * to, const void * from, size_t n)
+static __always_inline void * __memcpy(void * to, const void * from, size_t n)
 {
 int d0, d1, d2;
 __asm__ __volatile__(
@@ -223,7 +223,7 @@ return (to);
  * This looks ugly, but the compiler can optimize it totally,
  * as the count is constant.
  */
-static inline void * __constant_memcpy(void * to, const void * from, size_t n)
+static __always_inline void * __constant_memcpy(void * to, const void * from, size_t n)
 {
 	long esi, edi;
 	if (!n) return to;
@@ -367,7 +367,7 @@ return s;
  * things 32 bits at a time even when we don't know the size of the
  * area at compile-time..
  */
-static inline void * __constant_c_memset(void * s, unsigned long c, size_t count)
+static __always_inline void * __constant_c_memset(void * s, unsigned long c, size_t count)
 {
 int d0, d1;
 __asm__ __volatile__(
@@ -416,7 +416,7 @@ extern char *strstr(const char *cs, cons
  * This looks horribly ugly, but the compiler can optimize it totally,
  * as we by now know that both pattern and count is constant..
  */
-static inline void * __constant_c_and_count_memset(void * s, unsigned long pattern, size_t count)
+static __always_inline void * __constant_c_and_count_memset(void * s, unsigned long pattern, size_t count)
 {
 	switch (count) {
 		case 0:
Index: linux-gcc.q/include/linux/buffer_head.h
===================================================================
--- linux-gcc.q.orig/include/linux/buffer_head.h
+++ linux-gcc.q/include/linux/buffer_head.h
@@ -72,15 +72,15 @@ struct buffer_head {
  * and buffer_foo() functions.
  */
 #define BUFFER_FNS(bit, name)						\
-static inline void set_buffer_##name(struct buffer_head *bh)		\
+static __always_inline void set_buffer_##name(struct buffer_head *bh)	\
 {									\
 	set_bit(BH_##bit, &(bh)->b_state);				\
 }									\
-static inline void clear_buffer_##name(struct buffer_head *bh)		\
+static __always_inline void clear_buffer_##name(struct buffer_head *bh)	\
 {									\
 	clear_bit(BH_##bit, &(bh)->b_state);				\
 }									\
-static inline int buffer_##name(const struct buffer_head *bh)		\
+static __always_inline int buffer_##name(const struct buffer_head *bh)	\
 {									\
 	return test_bit(BH_##bit, &(bh)->b_state);			\
 }
@@ -89,11 +89,11 @@ static inline int buffer_##name(const st
  * test_set_buffer_foo() and test_clear_buffer_foo()
  */
 #define TAS_BUFFER_FNS(bit, name)					\
-static inline int test_set_buffer_##name(struct buffer_head *bh)	\
+static __always_inline int test_set_buffer_##name(struct buffer_head *bh)\
 {									\
 	return test_and_set_bit(BH_##bit, &(bh)->b_state);		\
 }									\
-static inline int test_clear_buffer_##name(struct buffer_head *bh)	\
+static __always_inline int test_clear_buffer_##name(struct buffer_head *bh)\
 {									\
 	return test_and_clear_bit(BH_##bit, &(bh)->b_state);		\
 }									\
Index: linux-gcc.q/include/linux/byteorder/swab.h
===================================================================
--- linux-gcc.q.orig/include/linux/byteorder/swab.h
+++ linux-gcc.q/include/linux/byteorder/swab.h
@@ -130,34 +130,34 @@
 #endif /* OPTIMIZE */
 
 
-static __inline__ __attribute_const__ __u16 __fswab16(__u16 x)
+static __always_inline __attribute_const__ __u16 __fswab16(__u16 x)
 {
 	return __arch__swab16(x);
 }
-static __inline__ __u16 __swab16p(const __u16 *x)
+static __always_inline __u16 __swab16p(const __u16 *x)
 {
 	return __arch__swab16p(x);
 }
-static __inline__ void __swab16s(__u16 *addr)
+static __always_inline void __swab16s(__u16 *addr)
 {
 	__arch__swab16s(addr);
 }
 
-static __inline__ __attribute_const__ __u32 __fswab32(__u32 x)
+static __always_inline __attribute_const__ __u32 __fswab32(__u32 x)
 {
 	return __arch__swab32(x);
 }
-static __inline__ __u32 __swab32p(const __u32 *x)
+static __always_inline __u32 __swab32p(const __u32 *x)
 {
 	return __arch__swab32p(x);
 }
-static __inline__ void __swab32s(__u32 *addr)
+static __always_inline void __swab32s(__u32 *addr)
 {
 	__arch__swab32s(addr);
 }
 
 #ifdef __BYTEORDER_HAS_U64__
-static __inline__ __attribute_const__ __u64 __fswab64(__u64 x)
+static __always_inline __attribute_const__ __u64 __fswab64(__u64 x)
 {
 #  ifdef __SWAB_64_THRU_32__
 	__u32 h = x >> 32;
@@ -167,11 +167,11 @@ static __inline__ __attribute_const__ __
 	return __arch__swab64(x);
 #  endif
 }
-static __inline__ __u64 __swab64p(const __u64 *x)
+static __always_inline __u64 __swab64p(const __u64 *x)
 {
 	return __arch__swab64p(x);
 }
-static __inline__ void __swab64s(__u64 *addr)
+static __always_inline void __swab64s(__u64 *addr)
 {
 	__arch__swab64s(addr);
 }
Index: linux-gcc.q/include/linux/mm.h
===================================================================
--- linux-gcc.q.orig/include/linux/mm.h
+++ linux-gcc.q/include/linux/mm.h
@@ -507,7 +507,7 @@ static inline void set_page_links(struct
 extern struct page *mem_map;
 #endif
 
-static inline void *lowmem_page_address(struct page *page)
+static __always_inline void *lowmem_page_address(struct page *page)
 {
 	return __va(page_to_pfn(page) << PAGE_SHIFT);
 }
Index: linux-gcc.q/include/linux/slab.h
===================================================================
--- linux-gcc.q.orig/include/linux/slab.h
+++ linux-gcc.q/include/linux/slab.h
@@ -76,7 +76,7 @@ struct cache_sizes {
 extern struct cache_sizes malloc_sizes[];
 extern void *__kmalloc(size_t, gfp_t);
 
-static inline void *kmalloc(size_t size, gfp_t flags)
+static __always_inline void *kmalloc(size_t size, gfp_t flags)
 {
 	if (__builtin_constant_p(size)) {
 		int i = 0;

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  9:40                     ` Ingo Molnar
@ 2005-12-30 10:14                       ` Ingo Molnar
  2005-12-30 13:31                         ` Adrian Bunk
  2005-12-30 14:08                         ` Christian Trefzer
  2005-12-30 10:25                       ` Andi Kleen
  1 sibling, 2 replies; 167+ messages in thread
From: Ingo Molnar @ 2005-12-30 10:14 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jakub Jelinek, Arjan van de Ven, Christoph Hellwig,
	Linus Torvalds, lkml, Andrew Morton, Matt Mackall


* Ingo Molnar <mingo@elte.hu> wrote:

> > I'm not quite sure I buy Ingo's original argument also.  If he's only 
> > looking at text size then with the above fixed then he ideally would 
> > like to not inline anything (because except these exceptions above 
> > .text usually near always shrinks when not inlining). But that's not 
> > necessarily best for performance.
> 
> well, i think the numbers talk for themselves. Here are my latest 
> results:

i now have x86 allyesconfig numbers too:

    text     data     bss      dec  filename
 24190215 6737902 1775592 32703709  vmlinux-allyes-speed-orig
 20096423 6758758 1775592 28630773  vmlinux-allyes-orig
 19223511 6844002 1775656 27843169  vmlinux-allyes-inline+units+fixes+capable

i.e. enabling CONFIG_CC_OPTIMIZE_FOR_SIZE gives a 20.4% .text reduction, 
and adding my latest debloating-queue on top of that gives an additional 
4.5% reduction (both relative to the smaller .text). The queue is at:

  http://redhat.com/~mingo/debloating-patches/

note: my focus is still mostly on CC_OPTIMIZE_FOR_SIZE (which is only 
offered if CONFIG_EMBEDDED is enabled) - if you want a larger kernel 
optimized for speed, do not enable it.

	Ingo

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  9:40                     ` Ingo Molnar
  2005-12-30 10:14                       ` Ingo Molnar
@ 2005-12-30 10:25                       ` Andi Kleen
  1 sibling, 0 replies; 167+ messages in thread
From: Andi Kleen @ 2005-12-30 10:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Jakub Jelinek, Arjan van de Ven, Christoph Hellwig,
	Linus Torvalds, lkml, Andrew Morton, Matt Mackall

On Fri, Dec 30, 2005 at 10:40:45AM +0100, Ingo Molnar wrote:
>     text    data     bss     dec     hex filename
>  4080998  870384  387260 5338642  517612 vmlinux-orig
>  4084421  872024  387260 5343705  5189d9 vmlinux-inline
>  4010957  834048  387748 5232753  4fd871 vmlinux-inline+units
>  4010039  833112  387748 5230899  4fd133 vmlinux-inline+units+fixes
>  4007617  833120  387748 5228485  4fc7c5 vmlinux-inline+units+fixes+capable
> 
> or a 1.8% code size reduction.

But again, if you only look at text size you would ideally want
to never inline anything, except the cases above and functions that
are only called once. So just turn it off except when forced? That would
be the logical conclusion from your strategy. I'm not sure it's a good
one.

-Andi

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30 10:14                       ` Ingo Molnar
@ 2005-12-30 13:31                         ` Adrian Bunk
  2005-12-30 14:08                         ` Christian Trefzer
  1 sibling, 0 replies; 167+ messages in thread
From: Adrian Bunk @ 2005-12-30 13:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Jakub Jelinek, Arjan van de Ven, Christoph Hellwig,
	Linus Torvalds, lkml, Andrew Morton, Matt Mackall

On Fri, Dec 30, 2005 at 11:14:43AM +0100, Ingo Molnar wrote:
>...
> note: my focus is still mostly on CC_OPTIMIZE_FOR_SIZE (which is only 
> offered if CONFIG_EMBEDDED is enabled) - if you want a larger kernel 
> optimized for speed, do not enable it.

Since 2.6.15-rc6, CC_OPTIMIZE_FOR_SIZE only depends on EXPERIMENTAL.

> 	Ingo

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  9:20                         ` Willy Tarreau
@ 2005-12-30 13:38                           ` Adrian Bunk
  0 siblings, 0 replies; 167+ messages in thread
From: Adrian Bunk @ 2005-12-30 13:38 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Arjan van de Ven, Linus Torvalds, Ingo Molnar, Andrew Morton,
	linux-kernel, mpm

On Fri, Dec 30, 2005 at 10:20:15AM +0100, Willy Tarreau wrote:
> On Fri, Dec 30, 2005 at 09:24:32AM +0100, Arjan van de Ven wrote:
> > On Fri, 2005-12-30 at 09:15 +0100, Willy Tarreau wrote:
> > > 
> > > 
> > > I trust your experience on this, but wasn't the lack of testing
> > > primarily due to the use of a "special" version of the compiler ?
> > > For instance, if we put a short howto in Documentation/ explaining
> > > how to build a kgcc toolchain describing what versions to use, there
> > > are chances that most LKML users will use the exact same version.
> > > Distro maintainers may want to follow the same version too. Also,
> > > the fact that the kernel would be designed to work with *that*
> > > compiler will limit the maintenance trouble you certainly have
> > > encountered trying to keep the compiler up-to-date with more recent
> > > kernel patches and updates.
> > 
> > it's not that easy. Simply put: the gcc people release an update every 6
> > months; distros "jump ahead" the bugfixes on that usually. (think of it
> > like -stable, where distros would ship patches accepted for -stable but
> > before -stable got released). Taking an older compiler from gcc.gnu.org
> > doesn't mean it's bug free. It just means you're not getting bugfixes.
> 
> OK, but precisely, we don't have any bug free version of gcc anyway. The
> kernel has a long history of workaround for gcc bugs. So probably there
> will be less work with a -possibly buggy- old gcc version than with a
> constantly changing one. For instance, if we stick to 3.4 for 2 years,
> we will of course encounter a lot of bugs. But they will be worked around
> just like gcc-2.95 bugs have been, and we will be able to keep the same
> compiler very long at virtually zero maintenance work.
>...

The changes in gcc aren't _that_ big.

As an example, I tried compiling recent 2.6 kernels with gcc CVS HEAD 
shortly before the 4.1 branch was created, and except for two or three 
internal compiler errors (that are OK considering that I used a random 
CVS snapshot) the kernel compiled fine.

Every gcc release might have its own issues, but compared to e.g. the 
pains your proposal would impose on new ports, they aren't that big.
And you shouldn't forget that it's even non-trivial to find one gcc 
release that works fine compiling kernels on all architectures. As an 
example, gcc 3.2 is a known bad compiler on arm.

> Willy

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30 10:14                       ` Ingo Molnar
  2005-12-30 13:31                         ` Adrian Bunk
@ 2005-12-30 14:08                         ` Christian Trefzer
  1 sibling, 0 replies; 167+ messages in thread
From: Christian Trefzer @ 2005-12-30 14:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Jakub Jelinek, Arjan van de Ven, Christoph Hellwig,
	Linus Torvalds, lkml, Andrew Morton, Matt Mackall


[-- Attachment #1.1: Type: text/plain, Size: 626 bytes --]

Hi Ingo,

On Fri, Dec 30, 2005 at 11:14:43AM +0100, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> [...] The queue is at:
> 
>   http://redhat.com/~mingo/debloating-patches/
> 

I was curious and applied among others the uninline-capable patch, with
the result that modules complain about an unknown symbol "capable". The
attached patch is a manually adapted version of yours, extended by the
EXPORT_SYMBOL_GPL required for modules to load again.

The code with the EXPORT_SYMBOL_GPL line works (I am currently running
that kernel) and the patch should apply cleanly.

Regards,
Chris


[-- Attachment #1.2: uninline-and-export-capable.patch --]
[-- Type: text/plain, Size: 1639 bytes --]

Subject: uninline capable()

uninline capable(). Saves 2K of kernel text on a generic .config, and 1K
on a tiny config.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

----

 include/linux/sched.h |   15 ++-------------
 kernel/sys.c          |   12 ++++++++++++
 2 files changed, 14 insertions(+), 13 deletions(-)

Index: linux-gcc.q/include/linux/sched.h
===================================================================
--- linux-gcc.q.orig/include/linux/sched.h
+++ linux-gcc.q/include/linux/sched.h
@@ -1102,19 +1102,8 @@ static inline int sas_ss_flags(unsigned 
 }
 
 
-#ifdef CONFIG_SECURITY
-/* code is in security.c */
-extern int capable(int cap);
-#else
-static inline int capable(int cap)
-{
-	if (cap_raised(current->cap_effective, cap)) {
-		current->flags |= PF_SUPERPRIV;
-		return 1;
-	}
-	return 0;
-}
-#endif
+/* code is in security.c or kernel/sys.c if !SECURITY */
+extern int FASTCALL(capable(int cap));
 
 /*
  * Routines for handling mm_structs
Index: linux-gcc.q/kernel/sys.c
===================================================================
--- linux-gcc.q.orig/kernel/sys.c
+++ linux-gcc.q/kernel/sys.c
@@ -222,6 +222,18 @@ int unregister_reboot_notifier(struct no
 
 EXPORT_SYMBOL(unregister_reboot_notifier);
 
+#ifndef CONFIG_SECURITY
+int fastcall capable(int cap)
+{
+	if (cap_raised(current->cap_effective, cap)) {
+		current->flags |= PF_SUPERPRIV;
+		return 1;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(capable);
+#endif
+
 static int set_one_prio(struct task_struct *p, int niceval, int error)
 {
 	int no_nice;

[-- Attachment #2: Type: application/pgp-signature, Size: 827 bytes --]

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29  4:11           ` Andrew Morton
  2005-12-29  7:32             ` Ingo Molnar
  2005-12-29  7:49             ` Arjan van de Ven
@ 2005-12-30 15:28             ` Alan Cox
  2005-12-30 20:59               ` Adrian Bunk
  2005-12-30 22:12             ` Matt Mackall
  3 siblings, 1 reply; 167+ messages in thread
From: Alan Cox @ 2005-12-30 15:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, torvalds, arjan, linux-kernel, mpm

On Mer, 2005-12-28 at 20:11 -0800, Andrew Morton wrote:
> If no-forced-inlining makes the kernel smaller then we probably have (yet
> more) incorrect inlining.  We should hunt those down and fix them.  We did
> quite a lot of this in 2.5.x/2.6.early.  Didn't someone have a script which
> would identify which functions are a candidate for uninlining?

There is a tool that does this quite well. It's called "gcc" ;)

More seriously we need to separate "things Andrew thinks are good inline
candidates" and "things that *must* be inlined". That allows 'build for
size' to do the equivalent of "-Dplease_inline" and the other build to
do "-Dplease_inline=inline". Gcc's inliner isn't aware of things cross
module so isn't going to make all the decisions right, but will make the
tedious local decisions.

As far as bugs go - gcc -Os has also fixed bugs in the past. It doesn't
introduce bugs so much as change them. Fedora means we have good long
term data on -Os with modern gcc (not with old gcc but we just dumped <
3.2 anyway).

Nowadays the -Os code paths are also getting real hammering because many
people build desktops, even OpenOffice with -Os and see overall
performance gains for the system.

Alan


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 23:16                 ` Willy Tarreau
  2005-12-30  8:05                   ` Arjan van de Ven
  2005-12-30  8:33                   ` Jesper Juhl
@ 2005-12-30 19:53                   ` Alistair John Strachan
  2 siblings, 0 replies; 167+ messages in thread
From: Alistair John Strachan @ 2005-12-30 19:53 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, arjan, linux-kernel,
	mpm

On Thursday 29 December 2005 23:16, Willy Tarreau wrote:
> On Thu, Dec 29, 2005 at 09:41:12AM -0800, Linus Torvalds wrote:
> > There have been situations where documented gcc semantics changed, and
> > instead of saying "sorry", the gcc people changed the documentation. What
> > the hell is the point of documented semantics if you can't depend on them
> > anyway?
>
> Remember the #arg and ##arg mess in macros between gcc2 and gcc3 ?
>
> I feel like I start to understand where your hate for specifications
> comes from. As much as I like to stick to specs, which is generally
> OK for hardware and network protocols, I can say that with GCC, there
> is clearly no rule telling you whether your program will still compile
> with version N+1 or not.
>
> Can't we elect a recommended gcc version that distro makers could
> ship under the name kgcc as it has been the case for some time,
> and try to stick to that version for as long as possible ? The only
> real reason to upgrade it would be to support newer archs, while at
> the moment, we try to support compilers which are shipped as default
> *user-space* compilers.

Leave this decision to distributors. Ubuntu already seem to use (and require 
you to install) gcc 3.4 if you want to recompile their kernel or any kernel 
modules. It ships with 4.0.1, iirc.

I see GCC improving currently. 3.0 was horrendously slow and buggy versus 
2.95, but 3.3 was a very good compiler, and 4.1 looks like it will be even 
better. Maybe things will continue to improve and this will become less of an 
issue over time.

-- 
Cheers,
Alistair.

'No sense being pessimistic, it probably wouldn't work anyway.'
Third year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30 15:28             ` Alan Cox
@ 2005-12-30 20:59               ` Adrian Bunk
  0 siblings, 0 replies; 167+ messages in thread
From: Adrian Bunk @ 2005-12-30 20:59 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andrew Morton, Ingo Molnar, torvalds, arjan, linux-kernel, mpm

On Fri, Dec 30, 2005 at 03:28:00PM +0000, Alan Cox wrote:
> On Mer, 2005-12-28 at 20:11 -0800, Andrew Morton wrote:
> > If no-forced-inlining makes the kernel smaller then we probably have (yet
> > more) incorrect inlining.  We should hunt those down and fix them.  We did
> > quite a lot of this in 2.5.x/2.6.early.  Didn't someone have a script which
> > would identify which functions are a candidate for uninlining?
> 
> There is a tool that does this quite well. It's called "gcc" ;)
> 
> More seriously we need to separate "things Andrew thinks are good inline
> candidates" and "things that *must* be inlined". That allows 'build for
> size' to do the equivalent of "-Dplease_inline" and the other build to
> do "-Dplease_inline=inline". Gcc's inliner isn't aware of things cross
> module so isn't going to make all the decisions right, but will make the
> tedious local decisions.
>...

I'm not getting the point:

Shouldn't "static" versus not "static" already give gcc everything it 
needs for making the decision?

If stuff is cross-module (more exactly: cross-objects) it's not static 
and I doubt there are many (if any) cases of non-static code we want 
inline'd when used inside the file it's in.

> Alan

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29  4:11           ` Andrew Morton
                               ` (2 preceding siblings ...)
  2005-12-30 15:28             ` Alan Cox
@ 2005-12-30 22:12             ` Matt Mackall
  2005-12-30 23:54               ` Adrian Bunk
  2005-12-31  9:20               ` Arjan van de Ven
  3 siblings, 2 replies; 167+ messages in thread
From: Matt Mackall @ 2005-12-30 22:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, torvalds, arjan, linux-kernel

On Wed, Dec 28, 2005 at 08:11:50PM -0800, Andrew Morton wrote:
> If no-forced-inlining makes the kernel smaller then we probably have (yet
> more) incorrect inlining.  We should hunt those down and fix them.  We did
> quite a lot of this in 2.5.x/2.6.early.  Didn't someone have a script which
> would identify which functions are a candidate for uninlining?

It was a combination of a tool I wrote for -tiny, which added
deprecation warnings to inlines along with a post-processing tool to
count instantiations, nestings, etc., and a post-post-processing tool
written by Denis Vlasenko that guessed at the space usage.

We cleaned up most of the obvious offenders quite a while ago, but
there's quite a long tail on the usage distribution. It's simply not
worth the trouble to go through the far half of the distribution one
by one to figure out whether inlining makes sense.

So I'm in favor of changing our inlining philosophy moving forward.
The world has changed since we started physically marking functions
inline. When we started, basically all arches gained advantage from
heavy inlining due to favorable CPU to memory speed ratios. And the
compiler's automatic inlining was quite primitive. Now most (but not
all!) arches heavily favor out of line code except in fairly critical
locations and the compiler has gotten (just recently) quite a bit
smarter with its inlining.

So we should really go back to using inline as a hint for 90%+ of
candidate functions (using always.. and no.. for the rest), and using
our compile-time size and arch information to fine-tune the compiler's
decisions as to which hints to take.

-- 
Mathematics is the supreme nostalgia of our time.

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30 22:12             ` Matt Mackall
@ 2005-12-30 23:54               ` Adrian Bunk
  2005-12-31  9:20               ` Arjan van de Ven
  1 sibling, 0 replies; 167+ messages in thread
From: Adrian Bunk @ 2005-12-30 23:54 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Andrew Morton, Ingo Molnar, torvalds, arjan, linux-kernel

On Fri, Dec 30, 2005 at 04:12:22PM -0600, Matt Mackall wrote:
> On Wed, Dec 28, 2005 at 08:11:50PM -0800, Andrew Morton wrote:
> > If no-forced-inlining makes the kernel smaller then we probably have (yet
> > more) incorrect inlining.  We should hunt those down and fix them.  We did
> > quite a lot of this in 2.5.x/2.6.early.  Didn't someone have a script which
> > would identify which functions are a candidate for uninlining?
> 
> It was a combination of a tool I wrote for -tiny, which added
> deprecation warnings to inlines along with a post-processing tool to
> count instantiations, nestings, etc., and a post-post-processing tool
> written by Denis Vlasenko that guessed at the space usage.
> 
> We cleaned up most of the obvious offenders quite a while ago, but
> there's quite a long tail on the usage distribution. It's simply not
> worth the trouble to go through the far half of the distribution one
> by one to figure out whether inlining makes sense.

The "figure out" task is easy:

There has to be a _very_ good reason for not deleting an inline in a .c 
file.

inline's in header files are a different topic, but in their case an 
explicit review would be better since the correct solution is in such 
cases often to move the code to a .c file (this might even result in 
additional space savings).

> So I'm in favor of changing our inlining philosophy moving forward.
> The world has changed since we started physically marking functions
> inline. When we started, basically all arches gained advantage from
> heavy inlining due to favorable CPU to memory speed ratios. And the
> compiler's automatic inlining was quite primitive. Now most (but not
> all!) arches heavily favor out of line code except in fairly critical
> locations and the compiler has gotten (just recently) quite a bit
> smarter with its inlining.
> 
> So we should really go back to using inline as a hint for 90%+ of
> candidate functions (using always.. and no.. for the rest), and using
> our compile-time size and arch information to fine-tune the compiler's
> decisions as to which hints to take.

I still don't understand why gcc needs any "inline" hints at all except 
in the cases where we want to force gcc to inline a function.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  2:03                           ` Tim Schmielau
  2005-12-30  2:15                             ` Tim Schmielau
  2005-12-30  7:49                             ` Ingo Molnar
@ 2005-12-31  3:51                             ` Kurt Wall
  2 siblings, 0 replies; 167+ messages in thread
From: Kurt Wall @ 2005-12-31  3:51 UTC (permalink / raw)
  To: lkml

On Fri, Dec 30, 2005 at 03:03:22AM +0100, Tim Schmielau took 0 lines to write:

>    http://www.physik3.uni-rostock.de/tim/kernel/2.6/deinline.patch.gz
> The resulting kernel actually booted (am running it right now). However, 
> catching just these low-hanging fruits doesn't get me anywhere near 
> Arjan's numbers. For my non-representative personal config I get (on 
> i386 with -unit-at-a-time):
> 
>    > size vmlinux*
>       text    data     bss     dec     hex filename
>    2197105  386568  316840 2900513  2c4221 vmlinux
>    2144453  392100  316840 2853393  2b8a11 vmlinux.deinline

For two more datapoints, also from an x86_64 2.6.15-rc7 kernel, here
are the values for my main desktop .config and an allyesconfig .config.
The .deinline kernels have the above patch applied and are also built
with CONFIG_CC_OPTIMIZE_FOR_SIZE=y.

$ size vmlinux.krw*
   text    data     bss     dec     hex filename
2338371  462208  479920 3280499  320e73 vmlinux.krw
2309384  468168  479920 3257472  31b480 vmlinux.krw.deinline

.text is only 1.24% smaller

For an allyesconfig, the results are slightly worse:

$ size vmlinux*
    text    data     bss      dec     hex filename
24076648 7465782 1996904 33539334 1ffc506 vmlinux
23791161 7513590 1996904 33301655 1fc2497 vmlinux.deinline

.text is only 1.19% smaller

Kurt
-- 
Nothing cures insomnia like the realization that it's time to get up.

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30 22:12             ` Matt Mackall
  2005-12-30 23:54               ` Adrian Bunk
@ 2005-12-31  9:20               ` Arjan van de Ven
  1 sibling, 0 replies; 167+ messages in thread
From: Arjan van de Ven @ 2005-12-31  9:20 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Andrew Morton, Ingo Molnar, torvalds, linux-kernel

On Fri, 2005-12-30 at 16:12 -0600, Matt Mackall wrote:
> On Wed, Dec 28, 2005 at 08:11:50PM -0800, Andrew Morton wrote:
> > If no-forced-inlining makes the kernel smaller then we probably have (yet
> > more) incorrect inlining.  We should hunt those down and fix them.  We did
> > quite a lot of this in 2.5.x/2.6.early.  Didn't someone have a script which
> > would identify which functions are a candidate for uninlining?
> 
> It was a combination of a tool I wrote for -tiny, which added
> deprecation warnings to inlines along with a post-processing tool to
> count instantiations, nestings, etc., and a post-post-processing tool
> written by Denis Vlasenko that guessed at the space usage.


my current patch to deinline a bunch of big offenders is at
http://www.fenrus.org/noinline

(it's on the big side for lkml for now)


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-30  7:49                             ` Ingo Molnar
@ 2005-12-31 14:38                               ` Adrian Bunk
  2005-12-31 14:45                                 ` Ingo Molnar
  0 siblings, 1 reply; 167+ messages in thread
From: Adrian Bunk @ 2005-12-31 14:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tim Schmielau, Arjan van de Ven, Linus Torvalds, Dave Jones,
	Andrew Morton, lkml, mpm

On Fri, Dec 30, 2005 at 08:49:16AM +0100, Ingo Molnar wrote:
> 
> * Tim Schmielau <tim@physik3.uni-rostock.de> wrote:
> 
> > What about the previous suggestion to remove inline from *all* static 
> > inline functions in .c files?
> 
> i think this is a way too static approach. Why go from one extreme to 
> the other, when my 3 simple patches (which arguably create a more 
> flexible scenario) give us savings of 7.7%?

This point only discusses the inline change, which (without 
unit-at-a-time) was 2.9% in your measurements.

Your patch might be simple, but it also might have side effects in cases 
where we _really_ want the code forced to be inlined. How simple is it 
to prove that your uninline patch doesn't cause a subtle breakage 
somewhere?

inline's in .c files are nearly always wrong (there might be very few 
exceptions), and this should simply be fixed.

Applying Arjan's uninlining patch [1] against 2.6.15-rc5-mm3 (ignoring a 
few rejects when applying the patch), I'm getting more than 0.6% .text 
savings (this is with a "compile everything .config", without 
unit-at-a-time and with -Os).

> 	Ingo

cu
Adrian

[1] http://www.fenrus.org/noinline

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-31 14:38                               ` Adrian Bunk
@ 2005-12-31 14:45                                 ` Ingo Molnar
  2005-12-31 15:08                                   ` Adrian Bunk
  0 siblings, 1 reply; 167+ messages in thread
From: Ingo Molnar @ 2005-12-31 14:45 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Tim Schmielau, Arjan van de Ven, Linus Torvalds, Dave Jones,
	Andrew Morton, lkml, mpm


* Adrian Bunk <bunk@stusta.de> wrote:

> On Fri, Dec 30, 2005 at 08:49:16AM +0100, Ingo Molnar wrote:
> > 
> > * Tim Schmielau <tim@physik3.uni-rostock.de> wrote:
> > 
> > > What about the previous suggestion to remove inline from *all* static 
> > > inline functions in .c files?
> > 
> > i think this is a way too static approach. Why go from one extreme to 
> > the other, when my 3 simple patches (which arguably create a more 
> > flexible scenario) gives us savings of 7.7%?
> 
> This point only discusses the inline change, which were (without 
> unit-at-a-time) in your measurements 2.9%.
> 
> Your patch might be simple, but it also might have side effects in 
> cases where we _really_ want the code forced to be inlined. How simple 
> is it to prove that your uninline patch doesn't cause a subtle 
> breakage somewhere?

it's quite simple: run the latency tracer with stack-trace debugging 
enabled, and it will measure the worst-case stack footprint that is 
triggered on that system. Obviously any compiler version change or 
option change can cause problems, there's nothing new about it - and 
it's not realistic to wait one year for changes like that. If you have 
to wait that long, you are testing it the wrong way.

	Ingo


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-31 14:45                                 ` Ingo Molnar
@ 2005-12-31 15:08                                   ` Adrian Bunk
  2006-01-02 10:37                                     ` Ingo Molnar
  0 siblings, 1 reply; 167+ messages in thread
From: Adrian Bunk @ 2005-12-31 15:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tim Schmielau, Arjan van de Ven, Linus Torvalds, Dave Jones,
	Andrew Morton, lkml, mpm

[ this email discusses only your uninline patch ]

On Sat, Dec 31, 2005 at 03:45:35PM +0100, Ingo Molnar wrote:
> 
> * Adrian Bunk <bunk@stusta.de> wrote:
> 
> > On Fri, Dec 30, 2005 at 08:49:16AM +0100, Ingo Molnar wrote:
> > > 
> > > * Tim Schmielau <tim@physik3.uni-rostock.de> wrote:
> > > 
> > > > What about the previous suggestion to remove inline from *all* static 
> > > > inline functions in .c files?
> > > 
> > > i think this is a way too static approach. Why go from one extreme to 
> > > the other, when my 3 simple patches (which arguably create a more 
> > > flexible scenario) gives us savings of 7.7%?
> > 
> > This point only discusses the inline change, which were (without 
> > unit-at-a-time) in your measurements 2.9%.
> > 
> > Your patch might be simple, but it also might have side effects in 
> > cases where we _really_ want the code forced to be inlined. How simple 
> > is it to prove that your uninline patch doesn't cause a subtle 
> > breakage somewhere?
> 
> it's quite simple: run the latency tracer with stack-trace debugging 
> enabled, and it will measure the worst-case stack footprint that is 
> triggered on that system. Obviously any compiler version change or 
> option change can cause problems, there's nothing new about it - and 
> it's not realistic to wait one year for changes like that. If you have 
> to wait that long, you are testing it the wrong way.

What are you talking about?

You sent two different patches:
1. uninline
2. unit-at-a-time for i386

These are two separate patches that should be discussed separately.

Your answer regarding your second patch doesn't in any way fit my email 
regarding your first patch.

Your uninline patch shouldn't cause any regressions regarding stack 
footprint, and stack usage is not what I was talking about.

My email was about things like Andi's example of the x86-64 vsyscall 
code where we really need inlining, and due to your proposed inline 
semantics change there might be breakages if an __always_inline is 
forgotten at a place where it was required.

Your uninline patch might be simple, but the safe way would be Arjan's 
approach to start removing all the buggy inline's from .c files.

> 	Ingo

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-29 20:25       ` Ingo Molnar
@ 2005-12-31 15:22         ` Adrian Bunk
  0 siblings, 0 replies; 167+ messages in thread
From: Adrian Bunk @ 2005-12-31 15:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: lkml, Linus Torvalds, Andrew Morton, Arjan van de Ven,
	Matt Mackall, Dave Jones

[ this email discusses only your unit-at-a-time patch ]

On Thu, Dec 29, 2005 at 09:25:50PM +0100, Ingo Molnar wrote:
> 
> * Adrian Bunk <bunk@stusta.de> wrote:
> 
> > It won't be dropped on the floor indefinitely.
> > 
> > "I do plan to look at this" means that I'd currently estimate this 
> > being 2.6.19 stuff.
> 
> you must be kidding ...

No.

Both 4k stacks and unit-at-a-time are changes with negative impact on 
the stack usage, and I want to have problems sorted out separately.

We wouldn't have much discussion here if 4k stacks were only judged by 
technical facts, but although the last known problem was fixed in -mm 
nearly two months ago, it seems the ndiswrapper groupies have managed to 
spread enough FUD to even persuade Andrew that 4k stacks were evil.  :-(

> > Yes that's one year from now, but we need it properly analyzed and 
> > tested before getting it into Linus' tree, and I do really want it 
> > untangled from and therefore after 4k stacks.
> 
> you are really using the wrong technology for this.
> 
> look at the latency tracing patch i posted today: it includes a feature 
> that prints the worst-case stack footprint _as it happens_, and thus 
> allows the mapping of such effects in a very efficient and very 
> practical way. As it works on a live system, and profiles live function 
> traces, it goes through function pointers and irq entry nesting effects 
> too. We could perhaps put that into Fedora for a while and get the 
> worst-case footprints mapped.
> 
> in fact i've been running this feature in the -rt kernel for quite some 
> time, and it enabled the fixing of a couple of bad stack abusers, and it 
> also told us what our current worst-case stack footprint is [when 4K 
> stacks are enabled]: it's execve of an ELF binary.

That's nice.

Could you try to get at least the part that checks whether more than 
STACK_WARN stack is left (if CONFIG_DEBUG_STACKOVERFLOW is set) into
-mm (and perhaps later into Linus' tree)?
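
A rough sketch of the kind of check meant here, written as plain
userspace C so it can be tried outside the kernel. The constants and the
helper names stack_bytes_left() and stack_nearly_exhausted() are
assumptions for illustration only, not the kernel's actual code. It
assumes a downward-growing stack that is THREAD_SIZE-aligned, with a
small reserved area at its base:

```c
#include <stdint.h>

/* Illustrative values only (assumptions, not taken from any kernel tree). */
#define THREAD_SIZE   4096u              /* one 4k stack */
#define STACK_WARN    (THREAD_SIZE / 8)  /* warn below this much free stack */
#define RESERVED_SIZE 64u                /* bookkeeping area at the stack base */

/* Bytes between the stack pointer and the reserved area at the base. */
static unsigned long stack_bytes_left(uintptr_t sp)
{
    unsigned long offset = sp & (THREAD_SIZE - 1);

    return offset > RESERVED_SIZE ? offset - RESERVED_SIZE : 0;
}

/* Non-zero when less than STACK_WARN bytes of stack remain. */
static int stack_nearly_exhausted(uintptr_t sp)
{
    return stack_bytes_left(sp) < STACK_WARN;
}
```

Such a check only sees the stack depth at the moment it runs, so it
complements rather than replaces a tracer that records the worst case.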

> 	Ingo

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-31 15:08                                   ` Adrian Bunk
@ 2006-01-02 10:37                                     ` Ingo Molnar
  2006-01-02 10:48                                       ` Arjan van de Ven
  2006-01-02 13:42                                       ` Adrian Bunk
  0 siblings, 2 replies; 167+ messages in thread
From: Ingo Molnar @ 2006-01-02 10:37 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Tim Schmielau, Arjan van de Ven, Linus Torvalds, Dave Jones,
	Andrew Morton, lkml, mpm


* Adrian Bunk <bunk@stusta.de> wrote:

> My email was about things like Andi's example of the x86-64 vsyscall 
> code where we really need inlining, and due to your proposed inline 
> semantics change there might be breakages if an __always_inline is 
> forgotten at a place where it was required.

we can have two types of breakages:

 - stuff won't build if not always_inline. Really easy to find and fix.

 - stuff won't work at all (e.g. vsyscalls) because they have some
   unspecified reliance on gcc's code output. Such code Is Bad anyway,
   and the breakage is still clear: the vsyscalls won't work at all, it's
   quickly found, the appropriate always_inline is inserted, and the
   incident is forgotten.

talking about 'safer' or 'risky' in this context is misleading, these 
are very clear symptoms which are easy to fix.
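
To make the three cases concrete, here is a minimal sketch of how the
annotations differ once plain 'inline' is only a hint. The macro names
demo_always_inline and demo_noinline are hypothetical stand-ins, loosely
modelled on the kernel's annotations, and the functions are made up for
illustration:

```c
/* Hypothetical macro names, loosely modelled on the kernel's annotations. */
#define demo_always_inline inline __attribute__((always_inline))
#define demo_noinline      __attribute__((noinline))

/* Plain 'inline': under the proposed semantics this is only a hint. */
static inline int hinted_double(int x)
{
    return 2 * x;
}

/* Must be inlined: for code that breaks if a real call is emitted. */
static demo_always_inline int forced_double(int x)
{
    return 2 * x;
}

/* Must stay out of line: for code that relies on not being inlined. */
static demo_noinline int outlined_double(int x)
{
    return 2 * x;
}

int demo_sum(int x)
{
    return hinted_double(x) + forced_double(x) + outlined_double(x);
}
```

Only the middle kind has to be found and marked by hand; everything else
can be left to the compiler.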

[ If you didn't talk about this uninline patch in the "we have to wait 
  one year" comment then please clarify that - all that came through to 
  me was some vague "let's wait with this" message, and it wasn't clear 
  (to me) which patch it applied to and why. ]

> Your uninline patch might be simple, but the safe way would be Arjan's 
> approach to start removing all the buggy inline's from .c files.

sure, that's another thing to do, but it's also clear that there's no 
reason to force inlines in the -Os case.

There are 22,000+ inline functions in the kernel right now (inlined 
about 100,000 times), and we'd have to change _thousands_ of them. 
They are causing an unjustified code bloat of somewhere around 20-30%. 
(some of them are very much justified, especially in core kernel code)

to say it loud and clear again: our current way of handling inlines is 
_FUNDAMENTALLY BROKEN_. To me this means that fundamental changes are 
needed for the _mechanics_ and meaning of inlines. We default to 'always 
inline' which has a current information to noise ratio of 1:10 perhaps.  
My patch changes the mechanics and meaning of inlines, and pretty much 
anything else but a change to the meaning of inlines will still result 
in the same scenario occurring over and over again.

	Ingo


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 10:37                                     ` Ingo Molnar
@ 2006-01-02 10:48                                       ` Arjan van de Ven
  2006-01-02 13:43                                         ` Adrian Bunk
  2006-01-02 13:42                                       ` Adrian Bunk
  1 sibling, 1 reply; 167+ messages in thread
From: Arjan van de Ven @ 2006-01-02 10:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Adrian Bunk, Tim Schmielau, Linus Torvalds, Dave Jones,
	Andrew Morton, lkml, mpm


> > Your uninline patch might be simple, but the safe way would be Arjan's 
> > approach to start removing all the buggy inline's from .c files.
> 
> sure, that's another thing to do, but it's also clear that there's no 
> reason to force inlines in the -Os case.
> 
> There are 22,000+ inline functions in the kernel right now (inlined 
> about a 100,000 times), and we'd have to change _thousands_ of them. 
> They are causing an unjustified code bloat of somewhere around 20-30%. 
> (some of them are very much justified, especially in core kernel code)

my patch attacks the top bloaters, and gains about 30k to 40k (depending
on compiler). Gaining the other 300k is going to be a LOT of churn, not
just in amount of work... so to some degree my patch shows that it's a
bit of a hopeless battle.




* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 10:37                                     ` Ingo Molnar
  2006-01-02 10:48                                       ` Arjan van de Ven
@ 2006-01-02 13:42                                       ` Adrian Bunk
  2006-01-02 18:28                                         ` Andrew Morton
  1 sibling, 1 reply; 167+ messages in thread
From: Adrian Bunk @ 2006-01-02 13:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tim Schmielau, Arjan van de Ven, Linus Torvalds, Dave Jones,
	Andrew Morton, lkml, mpm

On Mon, Jan 02, 2006 at 11:37:21AM +0100, Ingo Molnar wrote:
>...
> to say it loud and clear again: our current way of handling inlines is 
> _FUNDAMENTALLY BROKEN_. To me this means that fundamental changes are 
> needed for the _mechanics_ and meaning of inlines. We default to 'always 
> inline' which has a current information to noise ratio of 1:10 perhaps.  
> My patch changes the mechanics and meaning of inlines, and pretty much 
> anything else but a change to the meaning of inlines will still result 
> in the same scenario occurring over and over again.

Let's emphasize what we both agree on:
It is _FUNDAMENTALLY BROKEN_ that too much code is marked as
'always inline'.

We only disagree on how to achieve an improvement.

> 	Ingo

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 10:48                                       ` Arjan van de Ven
@ 2006-01-02 13:43                                         ` Adrian Bunk
  2006-01-02 14:05                                           ` Ingo Molnar
  0 siblings, 1 reply; 167+ messages in thread
From: Adrian Bunk @ 2006-01-02 13:43 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Ingo Molnar, Tim Schmielau, Linus Torvalds, Dave Jones,
	Andrew Morton, lkml, mpm

On Mon, Jan 02, 2006 at 11:48:22AM +0100, Arjan van de Ven wrote:
> 
> > > Your uninline patch might be simple, but the safe way would be Arjan's 
> > > approach to start removing all the buggy inline's from .c files.
> > 
> > sure, that's another thing to do, but it's also clear that there's no 
> > reason to force inlines in the -Os case.
> > 
> > There are 22,000+ inline functions in the kernel right now (inlined 
> > about a 100,000 times), and we'd have to change _thousands_ of them. 
> > They are causing an unjustified code bloat of somewhere around 20-30%. 
> > (some of them are very much justified, especially in core kernel code)
> 
> my patch attacks the top bloaters, and gains about 30k to 40k (depending
> on compiler). Gaining the other 300k is going to be a LOT of churn, not
> just in amount of work... so to some degree my patch shows that it's a
> bit of a hopeless battle.

A quick grep shows about 10,000 inline's in .c files, and nearly all 
of them should be removed.

Yes this is a serious amount of work, but it's an ideal janitorial task.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 13:43                                         ` Adrian Bunk
@ 2006-01-02 14:05                                           ` Ingo Molnar
  2006-01-02 15:01                                             ` Adrian Bunk
  2006-01-02 18:44                                             ` Krzysztof Halasa
  0 siblings, 2 replies; 167+ messages in thread
From: Ingo Molnar @ 2006-01-02 14:05 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Arjan van de Ven, Tim Schmielau, Linus Torvalds, Dave Jones,
	Andrew Morton, lkml, mpm


* Adrian Bunk <bunk@stusta.de> wrote:

> > > > Your uninline patch might be simple, but the safe way would be Arjan's 
> > > > approach to start removing all the buggy inline's from .c files.
> > > 
> > > sure, that's another thing to do, but it's also clear that there's no 
> > > reason to force inlines in the -Os case.
> > > 
> > > There are 22,000+ inline functions in the kernel right now (inlined 
> > > about a 100,000 times), and we'd have to change _thousands_ of them. 
> > > They are causing an unjustified code bloat of somewhere around 20-30%. 
> > > (some of them are very much justified, especially in core kernel code)
> > 
> > my patch attacks the top bloaters, and gains about 30k to 40k (depending
> > on compiler). Gaining the other 300k is going to be a LOT of churn, not
> > just in amount of work... so to some degree my patch shows that it's a
> > bit of a hopeless battle.
> 
> A quick grep shows at about 10.000 inline's in .c files, and nearly 
> all of them should be removed.
> 
> Yes this is a serious amount of work, but it's an ideal janitorial 
> task.

oh, it is certainly an insane amount of janitorial work - which is also 
precisely why this well-known and seemingly trivial problem has 
escalated so much!

the nontrivial thing is that the moment trivial things get widespread, 
_the mechanism_ needs a change. I.e. the 'widespread inlines' aren't the 
big problem, the big problem is that the widespread inlines _got 
widespread_. I'm not sure whether i'm being clear enough: think of the 
22,000 inlines as a symptom of a deeper problem, not as the problem 
itself. That is what i am trying to get through (to you and to others).

trying to attack a problem that has a number-of-changes size of 22,000 
is also trivially _futile_, we'll be quickly overwhelmed by the 
underlying problem again, if it's not addressed.

both Arjan and me are trying to attack the underlying problem, while 
also fixing all the results of that problem. But the main focus is on 
mapping and eliminating the _hidden_ problem, not on the very visible 
'symptoms'. The symptoms are easy to fix! I do think the underlying 
problem is fixable too, but i find it slightly frustrating that most 
people are focused on the symptoms, missing the bigger picture.

what is the 'deeper problem'? I believe it is a combination of two 
(well-known) things:

  1) people add 'inline' too easily
  2) we default to 'always inline'

problem #1 is very hard to influence, because it's a basic psychology 
issue. Pretty much the only approach to fix this is to educate people.  
But it is hard to change the ways people code, and it's a long-term 
thing with no reasonable expectation of success. So while we can and 
should improve education of this issue (this thread will certainly raise 
awareness of it), we cannot rely on it alone.

i think the only realistic approach is to attack component #2: do not 
default to 'always inline' but default to 'inline if the compiler agrees 
too'. I do think we should default to 'compiler decides' (when using a 
gcc4 compiler), as this also has some obvious advantages:

 - different inlining when compiler optimizes for size not for speed

changing this also means we need to map a few trivial cases where kernel 
code relies on inlining (or relies on non-inlining), but those are 
fortunately easy and mostly well-known.
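
as a sketch of what changing the default means in practice: the forced 
behaviour comes from redefining the 'inline' keyword itself, so making 
that redefinition conditional is enough to hand the decision back to the 
compiler. DEMO_FORCED_INLINING below is a hypothetical switch used for 
illustration only, and the popcount helpers are made-up examples:

```c
/* DEMO_FORCED_INLINING is a hypothetical switch, not a real config option. */
#ifdef DEMO_FORCED_INLINING
/* Forced behaviour: every 'inline' is silently upgraded to a demand. */
# define inline inline __attribute__((always_inline))
#endif

/* With the switch off, the compiler may weigh size against speed here. */
static inline unsigned int popcount8(unsigned int v)
{
    unsigned int n = 0;

    for (v &= 0xffu; v; v >>= 1)
        n += v & 1u;
    return n;
}

unsigned int popcount16(unsigned int v)
{
    return popcount8(v) + popcount8(v >> 8);
}
```

the source is identical either way; only the meaning of the keyword 
changes, which is why the few places that truly need inlining have to be 
marked explicitly.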

taking Arjan's patch alone, or running a script to change all static 
inlines in .c files to non-inline, without doing some of the other 
changes i'm proposing has a number of disadvantages:

 - it leaves the defaults in place, and we'll again gain ~50-100 new 
   'incorrect' inlines per week as we did until today. Barring the 
   initial few weeks of enthusiasm, nobody will really address that in 
   the long run, and we'll be back to square one.

 - it likely destroys the inlines that _were_ put into .c files 
   judiciously and correctly. The keyword 'inline' is a hint for gcc 
   which increases the likelihood that a function will be inlined. 
   Removing them en bloc is _wrong_ and punishes the 'good guys', while 
   still allowing the 'bad guys' to continue business as usual.

so all in one, unless we attack #1 or #2 _with a bigger effective effort 
than we spend on attacking the symptoms_, we will only achieve a 
temporary, short-term reprieve.

	Ingo


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 14:05                                           ` Ingo Molnar
@ 2006-01-02 15:01                                             ` Adrian Bunk
  2006-01-02 18:44                                             ` Krzysztof Halasa
  1 sibling, 0 replies; 167+ messages in thread
From: Adrian Bunk @ 2006-01-02 15:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arjan van de Ven, Tim Schmielau, Linus Torvalds, Dave Jones,
	Andrew Morton, lkml, mpm

On Mon, Jan 02, 2006 at 03:05:11PM +0100, Ingo Molnar wrote:
> 
> * Adrian Bunk <bunk@stusta.de> wrote:
> 
> > > > > Your uninline patch might be simple, but the safe way would be Arjan's 
> > > > > approach to start removing all the buggy inline's from .c files.
> > > > 
> > > > sure, that's another thing to do, but it's also clear that there's no 
> > > > reason to force inlines in the -Os case.
> > > > 
> > > > There are 22,000+ inline functions in the kernel right now (inlined 
> > > > about a 100,000 times), and we'd have to change _thousands_ of them. 
> > > > They are causing an unjustified code bloat of somewhere around 20-30%. 
> > > > (some of them are very much justified, especially in core kernel code)
> > > 
> > > my patch attacks the top bloaters, and gains about 30k to 40k (depending
> > > on compiler). Gaining the other 300k is going to be a LOT of churn, not
> > > just in amount of work... so to some degree my patch shows that it's a
> > > bit of a hopeless battle.
> > 
> > A quick grep shows at about 10.000 inline's in .c files, and nearly 
> > all of them should be removed.
> > 
> > Yes this is a serious amount of work, but it's an ideal janitorial 
> > task.
> 
> oh, it is certainly an insane amount of janitorial work - which is also 
> precisely why this well-known and seemingly trivial problem has 
> escalated so much!
> 
> the nontrivial thing is that the moment trivial things get widespread, 
> _the mechanism_ needs a change. I.e. the 'widespread inlines' aren't the 
> big problem, the big problem is that the widespread inlines _got 
> widespread_. I'm not sure whether i'm being clear enough: think of the 
> 22,000 inlines as a symptom of a deeper problem, not as the problem 
> itself. That is what i am trying to get through (to you and to others).

My point goes more into the following direction:
- 10,000 inline's are in .c files
- 12,000 inline's are in .h files

The interesting ones are the latter:
- if they are too big, the smallest solution is to move them to .c files
- it might cause a size _increase_ if some version of gcc will not 
  inline all of them

The latter gives warnings with -Winline, and adding this flag to the 
global CFLAGS and analyzing all the warnings (especially with -Os) could 
make my point void.
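
A small made-up example of what -Winline reports: the GCC documentation 
lists variadic functions among those that cannot be inlined, so a 
function like the hypothetical sum_ints() below is a candidate for the 
warning when compiled with something like "gcc -O2 -Winline -c file.c". 
The exact wording of the diagnostic depends on the gcc version:

```c
#include <stdarg.h>

/* Marked 'inline', but a function that walks its own variable argument
 * list is not suitable for inline substitution in GCC. */
static inline int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    while (count-- > 0)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

int sum_three(int a, int b, int c)
{
    return sum_ints(3, a, b, c);
}
```

The code still builds and works; the warning only tells us that the 
'inline' keyword was not honoured at that call site.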

>...
> what is the 'deeper problem'? I believe it is a combination of two 
> (well-known) things:
> 
>   1) people add 'inline' too easily
>   2) we default to 'always inline'
> 
> problem #1 is very hard to influence, because it's a basic psychology
> issue. Pretty much the only approach to fix this is to educate people.
> But it is hard to change the ways people code, and it's a long-term
> thing with no reasonable expectation of success. So while we can and
> should improve education of this issue (this thread will certainly raise
> awareness of it), we cannot rely on it alone.
>
> i think the only realistic approach is to attack component #2: do not
> default to 'always inline' but default to 'inline if the compiler agrees
> too'. I do think we should default to 'compiler decides' (when using a
> gcc4 compiler), as this also has some obvious advantages:
>
>  - different inlining when compiler optimizes for size not for speed
>
> changing this also means we need to map a few trivial cases where kernel
> code relies on inlining (or relies on non-inlining), but those are
> fortunately easy and mostly well-known.
>...
> so all in one, unless we attack #1 or #2 _with a bigger effective effort 
> than we spend on attacking the symptoms_, we will only achieve a 
> temporary, short-term reprieve.

#1 should definitely be done.

And you do slightly manage to convince me that #2 is a good (additional) 
approach (if -Winline gets added to the global CFLAGS).

> 	Ingo

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 13:42                                       ` Adrian Bunk
@ 2006-01-02 18:28                                         ` Andrew Morton
  2006-01-02 18:49                                           ` Arjan van de Ven
                                                             ` (2 more replies)
  0 siblings, 3 replies; 167+ messages in thread
From: Andrew Morton @ 2006-01-02 18:28 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: mingo, tim, arjan, torvalds, davej, linux-kernel, mpm

Adrian Bunk <bunk@stusta.de> wrote:
>
> On Mon, Jan 02, 2006 at 11:37:21AM +0100, Ingo Molnar wrote:
> >...
> > to say it loud and clear again: our current way of handling inlines is 
> > _FUNDAMENTALLY BROKEN_. To me this means that fundamental changes are 
> > needed for the _mechanics_ and meaning of inlines. We default to 'always 
> > inline' which has a current information to noise ratio of 1:10 perhaps.  
> > My patch changes the mechanics and meaning of inlines, and pretty much 
> > anything else but a change to the meaning of inlines will still result 
> > in the same scenario occurring over and over again.
> 
> Let's emphasize what we both agree on:
> It is _FUNDAMENTALLY BROKEN_ that too much code is marked as
> 'always inline'.
> 
> We only disagree on how to achieve an improvement.
> 

The best approach is to manually review and fix up all the inline statements.

We cannot just delete them all, because that would cause performance loss
for well-chosen inlinings when using gcc-3.

I'd be reluctant to trust gcc-4 to do the right thing in all cases.  If the
compiler fails to inline functions in certain critical cases we'll suffer
some performance loss and the source of that performance loss will be
utterly obscure.

If someone types `inline' into a piece of code then we want to inline the
function, dammit.  The fact that lots of people typed `inline' when they
shouldn't have is not a good argument for defeating (or adding uncertainty
to) manual inlining in well-developed and well-maintained code.

All those squillions of bogus inlines which you've identified are probably
mainly in code we just don't care about much.  We shouldn't penalise
well-maintained code because of legacy problems in less well-maintained
code.


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 14:05                                           ` Ingo Molnar
  2006-01-02 15:01                                             ` Adrian Bunk
@ 2006-01-02 18:44                                             ` Krzysztof Halasa
  2006-01-02 18:51                                               ` Arjan van de Ven
                                                                 ` (2 more replies)
  1 sibling, 3 replies; 167+ messages in thread
From: Krzysztof Halasa @ 2006-01-02 18:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Adrian Bunk, Arjan van de Ven, Tim Schmielau, Linus Torvalds,
	Dave Jones, Andrew Morton, lkml, mpm

Ingo Molnar <mingo@elte.hu> writes:

> what is the 'deeper problem'? I believe it is a combination of two 
> (well-known) things:
>
>   1) people add 'inline' too easily
>   2) we default to 'always inline'

For example, I add "inline" for static functions which are only called
from one place.

If I were able to say "this is a static function which is called from one
place" I'd do so instead of saying "inline". But omitting the "inline"
in the hope that some new gcc will probably inline it anyway (on some
platform?) doesn't seem like the best idea.

But what _is_ the best idea?
-- 
Krzysztof Halasa


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 18:28                                         ` Andrew Morton
@ 2006-01-02 18:49                                           ` Arjan van de Ven
  2006-01-02 19:26                                             ` Jörn Engel
  2006-01-02 21:51                                             ` Grant Coady
  2006-01-03  5:31                                           ` Nick Piggin
  2006-01-03 23:40                                           ` Martin J. Bligh
  2 siblings, 2 replies; 167+ messages in thread
From: Arjan van de Ven @ 2006-01-02 18:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Adrian Bunk, mingo, tim, torvalds, davej, linux-kernel, mpm


> I'd be reluctant to trust gcc-4 to do the right thing in all cases.  If the
> compiler fails to inline functions in certain critical cases we'll suffer
> some performance loss and the source of that performance loss will be
> utterly obscure.

yet.. turning inline into a hint (which causes gcc to greatly bias
towards inlining without making it absolute) was also opposed by either
you or Linus. Forcing is ALSO wrong. For example it causes gcc to do
inlining even for functions that use variable sized arrays ;(

the performance loss potential should not be overstated; unless the code
can get a real advantage in optimizing due to constant arguments (eg the
kmalloc example), there is not much to gain from inlining. A "call" is 1
cycle at most normally, unless the code inside the inline is highly
trivial/small that's negligible. (eg anything that does port or mmio is
already 100x more expensive, as is anything that gets a cache-miss or a
branch predictor miss such as a loop). 
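
A sketch of the constant-argument case, to show where inlining really 
pays off. size_class() and size_class_runtime() are hypothetical helpers 
written for illustration, not the kernel's kmalloc(). When the call is 
inlined and 'size' is a compile-time constant, the optimizer can reduce 
the whole comparison chain to one constant:

```c
#include <stddef.h>
#include <stdint.h>

/* Out-of-line fallback for sizes not known at compile time. Saturates at
 * the largest power of two that fits in size_t. */
static size_t size_class_runtime(size_t size)
{
    size_t cls = 32;

    while (cls < size && cls <= SIZE_MAX / 2)
        cls <<= 1;
    return cls;
}

/* Rounds 'size' up to a power-of-two class, with a minimum of 32. */
static inline size_t size_class(size_t size)
{
    /* __builtin_constant_p() is a GCC builtin; this branch only survives
     * when the compiler can prove 'size' is a constant. */
    if (__builtin_constant_p(size)) {
        if (size <= 32)
            return 32;
        if (size <= 64)
            return 64;
        if (size <= 128)
            return 128;
        if (size <= 256)
            return 256;
    }
    return size_class_runtime(size);
}
```

Both paths return the same value, so the function stays correct whether 
or not it is inlined; only the cost differs.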

> All those squillions of bogus inlines which you've identified are probably
> mainly in code we just don't care about much.  We shouldn't penalise
> well-maintained code because of legacy problems in less well-maintained
> code.

this is not a correct assumption as far as I can see. Especially for
drivers, but also even for kernel/. I sent you a patch to fix the
biggest offenders, and I can do more of that of course. But to some
degree the end isn't in sight, esp if new code keeps introducing new
bogus inlines.

Maybe the right approach is to start rejecting in reviews new code that
uses inline inappropriately. (where "inappropriate" sort of is "more
than 3 lines of C unless there is some constant-optimizes-away trick")




* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 18:44                                             ` Krzysztof Halasa
@ 2006-01-02 18:51                                               ` Arjan van de Ven
  2006-01-02 19:49                                                 ` Krzysztof Halasa
  2006-01-02 22:23                                                 ` Russell King
  2006-01-02 19:03                                               ` Andrew Morton
  2006-01-02 19:12                                               ` Linus Torvalds
  2 siblings, 2 replies; 167+ messages in thread
From: Arjan van de Ven @ 2006-01-02 18:51 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: Ingo Molnar, Adrian Bunk, Tim Schmielau, Linus Torvalds,
	Dave Jones, Andrew Morton, lkml, mpm

On Mon, 2006-01-02 at 19:44 +0100, Krzysztof Halasa wrote:
> Ingo Molnar <mingo@elte.hu> writes:
> 
> > what is the 'deeper problem'? I believe it is a combination of two 
> > (well-known) things:
> >
> >   1) people add 'inline' too easily
> >   2) we default to 'always inline'
> 
> For example, I add "inline" for static functions which are only called
> from one place.

you know what? gcc inlines those automatically even without you typing
"inline". (esp if you have unit-at-a-time enabled)

> 
> If I'm able to say "this is static function which is called from one
> place" I'd do so instead of saying "inline". But omitting the "inline"
> with hope that some new gcc probably will inline it anyway (on some
> platform?) doesn't seem like a best idea.

well.. gcc is not stupid, especially if you give it visibility by
enabling unit-at-a-time. It *WILL* inline static-used-once functions
unless there is a technical reason not to (say variable sized arrays)
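
A small sketch of the static-used-once case, assuming GCC with
optimization enabled. The names checksum_block() and process() are
hypothetical. The helper carries no inline keyword; because it is static
and has a single caller, the compiler can see every use and is free to
merge it into process() on its own:

```c
#include <assert.h>
#include <stddef.h>

/* No "inline" keyword: static, and called from exactly one place. */
static unsigned int checksum_block(const unsigned char *buf, size_t len)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum += buf[i];
	return sum;
}

/* The only caller of checksum_block(). */
unsigned int process(const unsigned char *buf, size_t len)
{
	assert(buf != NULL);
	return checksum_block(buf, len) & 0xffu;
}
```

If a second caller shows up later, the compiler can revisit the decision
by itself, which it cannot do once the inlining is forced.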
> 
> But what _is_ the best idea?

you save about 1 cycle by inlining unless there is a trick for the
optimizer. Especially in the case you mention where gcc will dtrt...
it's not worth typing "inline", what if you change the code later to use
the function twice... most people at least forget to remove the
redundant inline, turning it into a bloater...




* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 18:44                                             ` Krzysztof Halasa
  2006-01-02 18:51                                               ` Arjan van de Ven
@ 2006-01-02 19:03                                               ` Andrew Morton
  2006-01-02 19:17                                                 ` Jakub Jelinek
                                                                   ` (2 more replies)
  2006-01-02 19:12                                               ` Linus Torvalds
  2 siblings, 3 replies; 167+ messages in thread
From: Andrew Morton @ 2006-01-02 19:03 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: mingo, bunk, arjan, tim, torvalds, davej, linux-kernel, mpm

Krzysztof Halasa <khc@pm.waw.pl> wrote:
>
> Ingo Molnar <mingo@elte.hu> writes:
> 
> > what is the 'deeper problem'? I believe it is a combination of two 
> > (well-known) things:
> >
> >   1) people add 'inline' too easily
> >   2) we default to 'always inline'
> 
> For example, I add "inline" for static functions which are only called
> from one place.
> 
> If I'm able to say "this is static function which is called from one
> place" I'd do so instead of saying "inline". But omitting the "inline"
> with hope that some new gcc probably will inline it anyway (on some
> platform?) doesn't seem like a best idea.
> 
> But what _is_ the best idea?

Just use `inline'.  With gcc-3 it'll be inlined.

With gcc-4 and Ingo's patch it _might_ be inlined.  And it _might_ be
uninlined by the compiler if someone adds a second callsite later on. 
Maybe.  We just don't know.  That's a problem.  Use of __always_inline will
remove this uncertainty.

So our options appear to be:

a) Go fix up stupid inlinings (again) or

b) Apply Ingo's patch, then go add __always_inline to places which we
   care about.

Either way, we need to go all over the tree.  In practice, we'll only
bother going over the bits which we most care about (core kernel, core
networking, a handful of net and block drivers).  I suspect many of the bad
inlining decisions are in poorly-maintained code - we've been pretty
careful about this for several years.


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 18:44                                             ` Krzysztof Halasa
  2006-01-02 18:51                                               ` Arjan van de Ven
  2006-01-02 19:03                                               ` Andrew Morton
@ 2006-01-02 19:12                                               ` Linus Torvalds
  2006-01-02 19:59                                                 ` Krzysztof Halasa
  2006-01-02 20:13                                                 ` Ingo Molnar
  2 siblings, 2 replies; 167+ messages in thread
From: Linus Torvalds @ 2006-01-02 19:12 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: Ingo Molnar, Adrian Bunk, Arjan van de Ven, Tim Schmielau,
	Dave Jones, Andrew Morton, lkml, mpm



On Mon, 2 Jan 2006, Krzysztof Halasa wrote:
> 
> For example, I add "inline" for static functions which are only called
> from one place.

That's actually not a good practice. Two reasons:

 - debuggability goes way down. Oops reports give a much nicer call-chain 
   and better locality for uninlined code.

 - Gcc can suck at big functions with lots of local variables. A 
   function call can be _cheaper_ than trying to inline a function, 
   regardless of whether it's called once or many times. I've seen 
   functions that had several silly (and unnecessary) spills suddenly 
   become quite readable when they were separate functions.

   More importantly, the "inline" sticks around. Later on, the function is 
   used for some other place too, and the inline doesn't get removed.

The second "the inline sticks around" case is something that happens to 
helper functions too. They started out as trivial macros in a header file, 
but then they get converted to an inline function because they get more 
complex, and eventually it grows a new hook. Suddenly what used to 
generate two instructions generates ten instructions and a call, and would 
have been much better off being uninlined in a .c file.

So inlines that make sense at one point have a tendency to _not_ make 
sense a year or two later. 

I suspect we'd be best off removing almost all inlines, both from C files 
and headers. There are a few cases where inlining really helps: the 
function really ends up being just a few instructions, and it's really 
just a wrapper around a simpler calling convention or an inline assembly, 
or it's called with constant arguments that are better left off simplified 
at compile-time. Those are the cases where inlining really helps.

(Of course, then there's ia64. Which wants inlining just because..)

		Linus


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 19:03                                               ` Andrew Morton
@ 2006-01-02 19:17                                                 ` Jakub Jelinek
  2006-01-02 19:30                                                   ` Andrew Morton
  2006-01-02 19:41                                                   ` Linus Torvalds
  2006-01-02 20:09                                                 ` Ingo Molnar
  2006-01-02 20:30                                                 ` Ingo Molnar
  2 siblings, 2 replies; 167+ messages in thread
From: Jakub Jelinek @ 2006-01-02 19:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Krzysztof Halasa, mingo, bunk, arjan, tim, torvalds, davej,
	linux-kernel, mpm

On Mon, Jan 02, 2006 at 11:03:41AM -0800, Andrew Morton wrote:
> > But what _is_ the best idea?
> 
> Just use `inline'.  With gcc-3 it'll be inlined.

Where does this certainty come from?  gcc-3.x (as well as 2.x) each had
its own heuristics which functions should be inlined and which should not.
inline keyword has always been a hint.

	Jakub


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 18:49                                           ` Arjan van de Ven
@ 2006-01-02 19:26                                             ` Jörn Engel
  2006-01-02 21:51                                             ` Grant Coady
  1 sibling, 0 replies; 167+ messages in thread
From: Jörn Engel @ 2006-01-02 19:26 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andrew Morton, Adrian Bunk, mingo, tim, torvalds, davej,
	linux-kernel, mpm

On Mon, 2 January 2006 19:49:06 +0100, Arjan van de Ven wrote:
> 
> > I'd be reluctant to trust gcc-4 to do the right thing in all cases.  If the
> > compiler fails to inline functions in certain critical cases we'll suffer
> > some performance loss and the source of that performance loss will be
> > utterly obscure.
> 
> yet.. turning inline into a hint (which causes gcc to greatly bias
> towards inlining without making it absolute) was also opposed by either
> you or Linus. Forcing is ALSO wrong. For example it causes gcc to do
> inlining even for functions that use variable sized arrays ;(

I believe your example is a non-issue by Linus' terms.  AFAIR, he
always considered variable sized arrays a bug, when used for kernel
code.  Line of argument is something like this:

o If a variable-sized array has an unknown upper bound, the stack
  will explode.
o If the upper bound is well-known, using a fixed-size array works
  just as well.
o make checkstack is unable to interpret the function preamble in
  presence of alloca() or variable sized arrays.

The last point is from irc today, not from Linus.
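
The first two points in the list above can be sketched as follows, in
userspace C. MAX_NAME_LEN and reverse_fixed() are hypothetical names
chosen for this example. The upper bound is written down once, the
buffer is a fixed-size array, and a length beyond the bound is rejected
instead of growing the stack:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME_LEN 64	/* hypothetical, explicitly stated upper bound */

/*
 * Reverse len bytes of src into dst.
 * Returns 0 on success, -1 if len exceeds the stated bound.
 * A variable-sized array ("char tmp[len];") would instead make the
 * stack footprint depend on a run-time value.
 */
static int reverse_fixed(char *dst, const char *src, size_t len)
{
	char tmp[MAX_NAME_LEN];	/* stack use known at compile time */
	size_t i;

	assert(dst != NULL && src != NULL);
	if (len > MAX_NAME_LEN)
		return -1;
	for (i = 0; i < len; i++)
		tmp[i] = src[len - 1 - i];
	memcpy(dst, tmp, len);
	return 0;
}
```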


Interestingly, Linus and Ingo/you are arguing similarly:

Linus: I don't want to allow sloppiness.  If you know the upper bound,
I force you to write it down explicitly by using a fixed-size array.

Ingo: I don't want to allow sloppiness.  Either the inline is required
and shall be written as always_inline, or it is not and should be
removed.

Disallowing sloppiness might be a good idea in both cases.  Along
those lines, "inline" is quite seductive to over-use by sloppy
developers.  "always_inline" or "in_this_rare_case_inline_makes_sense"
would at least force sloppy people to get second thoughts.

Jörn

-- 
Good warriors cause others to come to them and do not go to others.
-- Sun Tzu


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 19:17                                                 ` Jakub Jelinek
@ 2006-01-02 19:30                                                   ` Andrew Morton
  2006-01-02 19:41                                                   ` Linus Torvalds
  1 sibling, 0 replies; 167+ messages in thread
From: Andrew Morton @ 2006-01-02 19:30 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: khc, mingo, bunk, arjan, tim, torvalds, davej, linux-kernel, mpm

Jakub Jelinek <jakub@redhat.com> wrote:
>
> inline keyword has always been a hint.
>

Kernel does

# define inline           inline          __attribute__((always_inline))

for gcc-3 and gcc-4.
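
To make the distinction concrete, here is a userspace sketch for
GCC-compatible compilers. The macro names hint_inline and force_inline
are invented for this example and are not the kernel's spellings;
always_inline is the documented GCC function attribute:

```c
#include <assert.h>

/* Hypothetical names, for illustration only. */
#define hint_inline	inline
#define force_inline	inline __attribute__((always_inline))

/* Inlined at every call site, regardless of the usual size heuristics. */
static force_inline int scale2(int x)
{
	return 2 * x;
}

/* Left to the compiler's own inlining heuristics. */
static hint_inline int scale3(int x)
{
	return 3 * x;
}

int combined(int x)
{
	assert(x > -1000000 && x < 1000000);	/* stay well inside int range */
	return scale2(x) + scale3(x);
}
```

The kernel macro quoted above has the effect of turning every plain
"inline" into the force_inline variant.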


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 19:17                                                 ` Jakub Jelinek
  2006-01-02 19:30                                                   ` Andrew Morton
@ 2006-01-02 19:41                                                   ` Linus Torvalds
  2006-01-02 19:53                                                     ` Ingo Molnar
  2006-01-02 20:28                                                     ` Jakub Jelinek
  1 sibling, 2 replies; 167+ messages in thread
From: Linus Torvalds @ 2006-01-02 19:41 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Andrew Morton, Krzysztof Halasa, mingo, bunk, arjan, tim, davej,
	linux-kernel, mpm



On Mon, 2 Jan 2006, Jakub Jelinek wrote:
> 
> Where does this certainity come from?  gcc-3.x (as well as 2.x) each had
> its own heuristics which functions should be inlined and which should not.
> inline keyword has always been a hint.

NO IT HAS NOT.

This is total revisionist history by gcc people. It did not use to be a 
hint. If you asked for inlining, you got it unless there was some 
technical reason why gcc couldn't inline it (ie recursive inlining, and 
trampolines and some other issues). End of story. 

So don't fall for the "hint" argument. It's simply not true.

At some point during gcc-3.1, gcc people changed it to be a hint, and did 
so very badly indeed, where functions that really needed inlining (because 
constant propagation made them go away) were not inlined any more. As a 
result, we do

	#define inline   inline __attribute__((always_inline))

in <linux/compiler-gcc3.h> exactly because gcc-3.1 broke so badly.

And nobody sane will argue that we would _ever_ not do that. gcc-3 was 
just too broken. Some architectures (sparc64, MIPS, s390) ended up trying 
to tune the inlining limits by hand (usually making them effectively 
infinitely large), but the basic rule was that gcc-3 inlining was just not 
working.

It may have improved in later gcc-3 versions, and apparently it's getting 
better in gcc-4. And that's the only thing we're discussing: whether to 
let gcc _4_ finally make some inlining decisions.

And people are nervous about it, exactly because the gcc people have 
historically just changed what "inline" means, with little regard for 
real-life code that uses it. And then they have this revisionist agenda, 
trying to change history and claiming that it's always been "just a hint". 
Despite the fact that the gcc manual itself very much said otherwise (I'm 
sure the manuals have been changed too).

		Linus


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 18:51                                               ` Arjan van de Ven
@ 2006-01-02 19:49                                                 ` Krzysztof Halasa
  2006-01-02 19:54                                                   ` Arjan van de Ven
  2006-01-02 22:23                                                 ` Russell King
  1 sibling, 1 reply; 167+ messages in thread
From: Krzysztof Halasa @ 2006-01-02 19:49 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Ingo Molnar, Adrian Bunk, Tim Schmielau, Linus Torvalds,
	Dave Jones, Andrew Morton, lkml, mpm

Arjan van de Ven <arjan@infradead.org> writes:

> you know what? gcc inlines those automatic even without you typing
> "inline". (esp if you have unit-at-a-time enabled)

And if it's not or if it's an older gcc?
Such functions should always be inlined, except maybe while debugging.

> well.. gcc is not stupid, especially if you give it visibility by
> enabling unit-at-a-time.

A *.c author can't do that. Who knows what flags will be used?

> you save about 1 cycle by inlining unless there is a trick for the
> optimizer.

There probably is but even without further optimizations I still save
at least that 1 cycle (and probably it caches better and has less stack
impact etc).

> Especially in the case you mention where gcc will dtrt...
> it's not worth typing "inline", what if you change the code later to use
> the function twice... most people at least forget to remove the
> redundant inline, turning it into a bloater...

I'd probably not forget that. BTW: most people don't write Linux.
Still, in cases where there are only gains and nothing to lose, why
not use some form of "inline"?

We could be more descriptive, though. An average reader should probably
be able to _read_ why a particular function is inlined, without guessing.
That would also help WRT different gcc options and versions, and it would
help checking if the "inline" is indeed correct (a public function with
callers all over the place marked as "one caller static" would be
obviously incorrect).
-- 
Krzysztof Halasa


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 19:41                                                   ` Linus Torvalds
@ 2006-01-02 19:53                                                     ` Ingo Molnar
  2006-01-02 20:28                                                     ` Jakub Jelinek
  1 sibling, 0 replies; 167+ messages in thread
From: Ingo Molnar @ 2006-01-02 19:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jakub Jelinek, Andrew Morton, Krzysztof Halasa, bunk, arjan, tim,
	davej, linux-kernel, mpm


* Linus Torvalds <torvalds@osdl.org> wrote:

> And people are nervous about it, exactly because the gcc people have 
> historically just changed what "inline" means, with little regard for 
> real-life code that uses it. [...]

i think whatever gcc does, we probably cannot get hurt more than we are 
hurting right now: everything is inlined, which bloats stuff to the 
maximum level. Stating that doesn't in any way excuse gcc 3.1's 
unilateral change of what 'inline' means, it doesn't reduce the distrust 
that might exist towards gcc, it's simply a statement of the situation 
we have right now. We can continue to distrust gcc (and probably 
rightfully so), but we probably cannot continue to hurt users as 
collateral damage of whatever tool-level fight. (and i'm not suggesting 
that this collateral damage is intentional in any way, it slowly evolved 
into the mess we have now.)

	Ingo


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 19:49                                                 ` Krzysztof Halasa
@ 2006-01-02 19:54                                                   ` Arjan van de Ven
  2006-01-02 20:05                                                     ` Krzysztof Halasa
  0 siblings, 1 reply; 167+ messages in thread
From: Arjan van de Ven @ 2006-01-02 19:54 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: Ingo Molnar, Adrian Bunk, Tim Schmielau, Linus Torvalds,
	Dave Jones, Andrew Morton, lkml, mpm

On Mon, 2006-01-02 at 20:49 +0100, Krzysztof Halasa wrote:
> Arjan van de Ven <arjan@infradead.org> writes:
> 
> > you know what? gcc inlines those automatic even without you typing
> > "inline". (esp if you have unit-at-a-time enabled)
> 
> And if it's not or if it's an older gcc?

older than 3.0? that's not possible much longer

> > well.. gcc is not stupid, especially if you give it visibility by
> > enabling unit-at-a-time.
> 
> A *.c author can't do that. Who knows what flags will be used?
> 

the author should assume sane flags ;)

> > you save about 1 cycle by inlining unless there is a trick for the
> > optimizer.
> 
> There probably is but even without further optimizations I still save
> at least that 1 cycle (and probably it caches better and have less stack
> impact etc).

actually that's not so sure, especially with the current regparm
calling convention. Inlining will cause more spills, which causes more
stack usage. It's doubtful the extra space you use with "call" will
outweigh that.

> > Especially in the case you mention where gcc will dtrt...
> > it's not worth typing "inline", what if you change the code later to use
> > the function twice... most people at least forget to remove the
> > redundant inline, turning it into a bloater...
> 
> I'd probably not forget that. BTW: most people don't write Linux.
> Still, in cases where there are only gains and nothing to lose, why
> not use some form of "inline"?

it's by far not only gains. And there are maintenance costs too.




* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 19:12                                               ` Linus Torvalds
@ 2006-01-02 19:59                                                 ` Krzysztof Halasa
  2006-01-02 20:13                                                 ` Ingo Molnar
  1 sibling, 0 replies; 167+ messages in thread
From: Krzysztof Halasa @ 2006-01-02 19:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Adrian Bunk, Arjan van de Ven, Tim Schmielau,
	Dave Jones, Andrew Morton, lkml, mpm

Linus Torvalds <torvalds@osdl.org> writes:

> That's actually not a good practice. Two reasons:
>
>  - debuggability goes way down. Oops reports give a much nicer call-chain 
>    and better locality for uninlined code.

Right.

>  - Gcc can suck at big functions with lots of local variables. A 
>    function call can be _cheaper_ than trying to inline a function, 
>    regardless of whether it's called once or many times. I've seen 
>    functions that had several silly (and unnecessary) spills suddenly 
>    become quite readable when they were separate functions.

OTOH inlining helps much if both the caller and the called share the
same variables (values, calculated the same way from the same arguments).
-- 
Krzysztof Halasa


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 19:54                                                   ` Arjan van de Ven
@ 2006-01-02 20:05                                                     ` Krzysztof Halasa
  2006-01-02 20:18                                                       ` Jörn Engel
  0 siblings, 1 reply; 167+ messages in thread
From: Krzysztof Halasa @ 2006-01-02 20:05 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Ingo Molnar, Adrian Bunk, Tim Schmielau, Linus Torvalds,
	Dave Jones, Andrew Morton, lkml, mpm

Arjan van de Ven <arjan@infradead.org> writes:

> it's by far not only gains. And maintenance costs too.

Anyway, distinctive "inlines" could help here, right?
-- 
Krzysztof Halasa


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 19:03                                               ` Andrew Morton
  2006-01-02 19:17                                                 ` Jakub Jelinek
@ 2006-01-02 20:09                                                 ` Ingo Molnar
  2006-01-02 20:24                                                   ` Andrew Morton
  2006-01-02 20:30                                                 ` Ingo Molnar
  2 siblings, 1 reply; 167+ messages in thread
From: Ingo Molnar @ 2006-01-02 20:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Krzysztof Halasa, bunk, arjan, tim, torvalds, davej, linux-kernel,
	mpm


* Andrew Morton <akpm@osdl.org> wrote:

> > > what is the 'deeper problem'? I believe it is a combination of two 
> > > (well-known) things:
> > >
> > >   1) people add 'inline' too easily
> > >   2) we default to 'always inline'
> > 
> > For example, I add "inline" for static functions which are only called
> > from one place.
> > 
> > If I'm able to say "this is static function which is called from one
> > place" I'd do so instead of saying "inline". But omitting the "inline"
> > with hope that some new gcc probably will inline it anyway (on some
> > platform?) doesn't seem like a best idea.
> > 
> > But what _is_ the best idea?
> 
> Just use `inline'.  With gcc-3 it'll be inlined.
> 
> With gcc-4 and Ingo's patch it _might_ be inlined.  And it _might_ be 
> uninlined by the compiler if someone adds a second callsite later on.  
> Maybe.  We just don't know.  That's a problem.  Use of __always_inline 
> will remove this uncertainty.

i agree with your later points, so this is only a minor nit: why is a 
dynamic decision done by the compiler a 'problem' in itself?

It _could_ _become_ a problem if the compiler does it incorrectly, but 
that's so true for just about any other dynamic gcc decision: what 
registers it opts to use in a hotpath, what amount of loop-unrolling it 
does, what machine-ops it chooses, how it schedules them, how it reorders 
them, how it generates memory access patterns, etc., etc. Sure, the 
compiler can mess up in _any_ of these dynamic decisions, with possibly 
bad effects on performance, but that by itself doesn't create some magic 
'dynamic inlining is bad' axiom.

In fact, i believe the opposite is true: inlining is arguably best done 
dynamically. Whether gcc makes use of that theoretical opening is 
another question, but my measurements show that gcc4 does a quite good 
job of it. (It certainly does a better job than what we humans did over 
the last 5 years, creating 20,000+ inline markers.)

and even if we let gcc do the inlining, a global -finline-limit=0 
compile-time flag will essentially turn off all 'hinted' inlining done 
by gcc.

> So our options appear to be:
> 
> a) Go fix up stupid inlinings (again) or
> 
> b) Apply Ingo's patch, then go add __always_inline to places which we
>    care about.

note that one of the patches i did (a small one in fact) does exactly 
that, for x86: i marked all things __always_inline that allyesconfig 
needs inlined.

> Either way, we need to go all over the tree. In practice, we'll only 
> bother going over the bits which we most care about (core kernel, core 
> networking, a handful of net and block drivers).  I suspect many of 
> the bad inlining decisions are in poorly-maintained code - we've been 
> pretty careful about this for several years.

yes. And this pretty much supports my point: we should flip the meaning 
of 'inline' around, from 'always inline', to at least: 'inline only if 
gcc thinks so too, if we are optimizing for size'.

and i'd be equally happy with making the flip-around even more aggressive 
than my first patch-queue did, to e.g. alias 'inline' to 'nothing':

	#define inline

and to then remove inline from all .c level files (and most .h level 
files) as well - albeit this would punish places that use inline 
judiciously.

Even in this case, it is very likely much easier to re-add inlines to 
the few places that really benefit from them (even though they don't need 
it in the __always_inline sense like vsyscalls or kmalloc()), than to 
keep the current 'always inline' logic in place and to hope for a 
gradual reduction of the use of the inline keyword...

	Ingo


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 19:12                                               ` Linus Torvalds
  2006-01-02 19:59                                                 ` Krzysztof Halasa
@ 2006-01-02 20:13                                                 ` Ingo Molnar
  2006-01-02 21:00                                                   ` Jan Engelhardt
  1 sibling, 1 reply; 167+ messages in thread
From: Ingo Molnar @ 2006-01-02 20:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Krzysztof Halasa, Adrian Bunk, Arjan van de Ven, Tim Schmielau,
	Dave Jones, Andrew Morton, lkml, mpm


* Linus Torvalds <torvalds@osdl.org> wrote:

> > For example, I add "inline" for static functions which are only called
> > from one place.
> 
> That's actually not a good practice. Two reasons:
> 
>  - debuggability goes way down. Oops reports give a much nicer call-chain 
>    and better locality for uninlined code.

yes, and to improve debuggability, i often do this at the top of 
debugged .c modules:

	#undef inline
	#define inline

to get good stacktraces. So debuggability is i think another argument to 
further decouple 'inline' from 'always inline' - so a global .config 
DEBUG_ option could turn all inlines into real function calls. (we 
already have CONFIG_FRAME_POINTER to improve stack-trace output, at the 
price of slightly slower code.)
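
One way such a debug switch could look, sketched in userspace C for
GCC-compatible compilers. DEBUG_NO_INLINE, maybe_inline and the two
functions are hypothetical names, not existing kernel symbols; noinline
is the documented GCC function attribute. Unlike an empty
"#define inline", noinline also keeps the compiler from inlining the
function on its own initiative, so it always shows up in a stack trace:

```c
#include <assert.h>

/* Build with -DDEBUG_NO_INLINE to get a real call for every helper. */
#ifdef DEBUG_NO_INLINE
# define maybe_inline	__attribute__((noinline))
#else
# define maybe_inline	inline
#endif

/* Clamp a value to the 0..100 range. */
static maybe_inline int clamp_percent(int v)
{
	if (v < 0)
		return 0;
	if (v > 100)
		return 100;
	return v;
}

int report_percent(int done, int total)
{
	assert(total > 0);
	return clamp_percent(done * 100 / total);
}
```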

	Ingo


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 20:05                                                     ` Krzysztof Halasa
@ 2006-01-02 20:18                                                       ` Jörn Engel
  0 siblings, 0 replies; 167+ messages in thread
From: Jörn Engel @ 2006-01-02 20:18 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: Arjan van de Ven, Ingo Molnar, Adrian Bunk, Tim Schmielau,
	Linus Torvalds, Dave Jones, Andrew Morton, lkml, mpm

On Mon, 2 January 2006 21:05:31 +0100, Krzysztof Halasa wrote:
> Arjan van de Ven <arjan@infradead.org> writes:
> 
> > it's by far not only gains. And maintenance costs too.
> 
> Anyway, distinctive "inlines" could help here, right?

Not really.  Your example, as Linus already explained, is a great
example of how _not_ to inline stuff.  If in doubt, don't inline.  If
a function is called just once, don't inline unless there are other
good reasons for it.  If you get a minimal gain (10 bytes) in text
size, don't inline.  The maintenance issue is more important.

Jörn

-- 
A defeated army first battles and then seeks victory.
-- Sun Tzu


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 20:09                                                 ` Ingo Molnar
@ 2006-01-02 20:24                                                   ` Andrew Morton
  2006-01-02 20:40                                                     ` Ingo Molnar
  0 siblings, 1 reply; 167+ messages in thread
From: Andrew Morton @ 2006-01-02 20:24 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: khc, bunk, arjan, tim, torvalds, davej, linux-kernel, mpm

Ingo Molnar <mingo@elte.hu> wrote:
>
> i marked all things __always_inline that allyesconfig 
>  needs inlined.

I hope you fixed __always_inline too.  It's currently a no-op on all but
alpha.




* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 19:41                                                   ` Linus Torvalds
  2006-01-02 19:53                                                     ` Ingo Molnar
@ 2006-01-02 20:28                                                     ` Jakub Jelinek
  1 sibling, 0 replies; 167+ messages in thread
From: Jakub Jelinek @ 2006-01-02 20:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Krzysztof Halasa, mingo, bunk, arjan, tim, davej,
	linux-kernel, mpm

On Mon, Jan 02, 2006 at 11:41:24AM -0800, Linus Torvalds wrote:
> > Where does this certainity come from?  gcc-3.x (as well as 2.x) each had
> > its own heuristics which functions should be inlined and which should not.
> > inline keyword has always been a hint.
> 
> NO IT HAS NOT.
> 
> This is total revisionist history by gcc people. It did not use to be a 
> hint. If you asked for inlining, you got it unless there was some 
> technical reason why gcc couldn't inline it (ie recursive inlining, and 
> trampolines and some other issues). End of story. 

One of the "technical reasons" was if the function was bigger than some
threshold.  And in that case I think it is ok to speak about inline keyword
as a hint.  The default inline limit (in rtx count after constant folding,
but not other optimizations) was bigger than in the GCC 3.x era, sure, but
there was a limit and GCC wasn't inlining functions bigger than that limit,
even if they could be simplified due to constant arguments to something
much smaller.

	Jakub


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 19:03                                               ` Andrew Morton
  2006-01-02 19:17                                                 ` Jakub Jelinek
  2006-01-02 20:09                                                 ` Ingo Molnar
@ 2006-01-02 20:30                                                 ` Ingo Molnar
  2 siblings, 0 replies; 167+ messages in thread
From: Ingo Molnar @ 2006-01-02 20:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Krzysztof Halasa, bunk, arjan, tim, torvalds, davej, linux-kernel,
	mpm


* Andrew Morton <akpm@osdl.org> wrote:

> Either way, we need to go all over the tree.  In practice, we'll only 
> bother going over the bits which we most care about (core kernel, core 
> networking, a handful of net and block drivers).  I suspect many of 
> the bad inlining decisions are in poorly-maintained code - we've been 
> pretty careful about this for several years.

i've gone over quite a few inlining decisions, and i have to say that 
even for my _own code_, most of the historic inlining decisions were 
wrong. What seems simple on the C level can be quite bloaty on the 
assembly level - even if you are well aware of how C maps to assembly!

furthermore, there's also a new CPU-architecture argument: the cost of 
icache misses has gone up disproportionately over the past couple of 
years. Firstly, lots of instruction-scheduling 'metadata' got embedded 
into the L1 cache (like what used to be the BTB cache). Secondly, the 
(physical) latency gap between L1 cache and L2 cache has increased. 
Thirdly, CPUs are much better at untangling 
data dependencies, hence more compact but also more complex code can 
still perform well. So the L1 icache is more important than it used to 
be, and small code size is more important than raw cycle count - _and_ 
small code has less of a speed hit than it used to have. x86 CPUs have 
become simple JIT compilers, and code size reductions tend to become the 
best way to inform the CPU of what operations we want to compute.

So as an end-result, most of the historic inlining decisions _even in 
well-maintained core code_ are incorrect these days. We really 
only want to inline things where the inlining results in _smaller_ code 
(due to less clobbering of function-call related caller-saved 
registers). That pretty much only leaves the truly trivial 1-2 
instruction code sequences like get_current() and the __always_inline 
stuff like kmalloc() or memset(). All the rest wants to be a function 
call these days.

	Ingo

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 20:24                                                   ` Andrew Morton
@ 2006-01-02 20:40                                                     ` Ingo Molnar
  0 siblings, 0 replies; 167+ messages in thread
From: Ingo Molnar @ 2006-01-02 20:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: khc, bunk, arjan, tim, torvalds, davej, linux-kernel, mpm


* Andrew Morton <akpm@osdl.org> wrote:

> Ingo Molnar <mingo@elte.hu> wrote:
> >
> > i marked all things __always_inline that allyesconfig 
> >  needs inlined.
> 
> I hope you fixed __always_inline too.  It's currently a no-op on all 
> but alpha.

yeah, i fixed that, my patch-queue does:

#define __always_inline         inline __attribute__((always_inline))
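
a minimal sketch of how that definition would be used, assuming a 
compiler that understands the always_inline attribute (gcc does). The 
struct and both functions are made up for illustration, they are not 
taken from the patch-queue:

```c
/* guarded, because userspace headers may already define this name */
#ifndef __always_inline
#define __always_inline         inline __attribute__((always_inline))
#endif

/* hypothetical structure, for illustration only */
struct task {
	int pid;
};

/* trivial accessor: a single load, so inlining it should not grow the caller */
static __always_inline int task_pid(const struct task *t)
{
	return t->pid;
}

/* ordinary out-of-line caller; the body of task_pid() is expanded here */
int task_is_init(const struct task *t)
{
	return task_pid(t) == 1;
}
```

with the attribute in place the compiler is no longer free to turn 
task_pid() into a real function call, which is the point of the fix.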

	Ingo

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 20:13                                                 ` Ingo Molnar
@ 2006-01-02 21:00                                                   ` Jan Engelhardt
  2006-01-02 22:43                                                     ` Linus Torvalds
  0 siblings, 1 reply; 167+ messages in thread
From: Jan Engelhardt @ 2006-01-02 21:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Krzysztof Halasa, Adrian Bunk, Arjan van de Ven,
	Tim Schmielau, Dave Jones, Andrew Morton, lkml, mpm

>* Linus Torvalds <torvalds@osdl.org> wrote:
>
>> > For example, I add "inline" for static functions which are only called
>> > from one place.
>> 
>> That's actually not a good practice. Two reasons:
>> 
>>  - debuggability goes way down. Oops reports give a much nicer call-chain 
>>    and better locality for uninlined code.

When I want to debug, I use
CFLAGS="-O0 -ggdb3 -fno-inline -fno-omit-frame-pointer"
for that particular file(s). That sure gets good results. Not sure about 
who wins in the kernel case: always_inline or -fno-inline.



Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 18:49                                           ` Arjan van de Ven
  2006-01-02 19:26                                             ` Jörn Engel
@ 2006-01-02 21:51                                             ` Grant Coady
  2006-01-02 22:03                                               ` Antonio Vargas
  1 sibling, 1 reply; 167+ messages in thread
From: Grant Coady @ 2006-01-02 21:51 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andrew Morton, Adrian Bunk, mingo, tim, torvalds, davej,
	linux-kernel, mpm

On Mon, 02 Jan 2006 19:49:06 +0100, Arjan van de Ven <arjan@infradead.org> wrote:

>Maybe the right approach is to start rejecting in reviews new code that
>uses inline inappropriately. (where "inappropriate" sort of is "more
>than 3 lines of C unless there is some constant-optimizes-away trick")

Well, I can own up to half a dozen inlines in a .c file, CodingStyle 
suggests to convert macros to static inline, so I did:

/* adm9240 internally scales voltage measurements */
static const u16 nom_mv[] = { 2500, 2700, 3300, 5000, 12000, 2700 };

static inline unsigned int IN_FROM_REG(u8 reg, int n)
{
        return SCALE(reg, nom_mv[n], 192);
}

static inline u8 IN_TO_REG(unsigned long val, int n)
{
        return SENSORS_LIMIT(SCALE(val, 192, nom_mv[n]), 0, 255);
}

/* temperature range: -40..125, 127 disables temperature alarm */
static inline s8 TEMP_TO_REG(long val)
{
        return SENSORS_LIMIT(SCALE(val, 1, 1000), -40, 127);
}

Are these typical targets for non-inline? 

Thanks,
Grant.

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 21:51                                             ` Grant Coady
@ 2006-01-02 22:03                                               ` Antonio Vargas
  2006-01-02 22:56                                                 ` Arjan van de Ven
  0 siblings, 1 reply; 167+ messages in thread
From: Antonio Vargas @ 2006-01-02 22:03 UTC (permalink / raw)
  To: gcoady, Arjan van de Ven, Andrew Morton, Adrian Bunk, mingo, tim,
	torvalds, davej, linux-kernel, mpm

On 1/2/06, Grant Coady <grant_lkml@dodo.com.au> wrote:
> On Mon, 02 Jan 2006 19:49:06 +0100, Arjan van de Ven <arjan@infradead.org> wrote:
>
> >Maybe the right approach is to start rejecting in reviews new code that
> >uses inline inappropriately. (where "inappropriate" sort of is "more
> >than 3 lines of C unless there is some constant-optimizes-away trick")
>
> Well, I can own up to half a dozen inlines in a .c file, CodingStyle
> suggests to convert macros to static inline, so I did:
>
> /* adm9240 internally scales voltage measurements */
> static const u16 nom_mv[] = { 2500, 2700, 3300, 5000, 12000, 2700 };
>

[snip some static inline functions]

>
> Are these typical targets for non-inline?

according to the latest flamewars, maybe it would be better
to just turn the #defines into static functions instead of static inlines...
guess even better would be to just get CodingStyle fixed ASAP ;)

--
Greetz, Antonio Vargas aka winden of network

http://wind.codepixel.com/
windNOenSPAMntw@gmail.com
thesameasabove@amigascne.org

Every day, every year
you have to work
you have to study
you have to scene.

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 18:51                                               ` Arjan van de Ven
  2006-01-02 19:49                                                 ` Krzysztof Halasa
@ 2006-01-02 22:23                                                 ` Russell King
  2006-01-02 23:55                                                   ` Alan Cox
  2006-01-03  3:59                                                   ` Daniel Jacobowitz
  1 sibling, 2 replies; 167+ messages in thread
From: Russell King @ 2006-01-02 22:23 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Krzysztof Halasa, Ingo Molnar, Adrian Bunk, Tim Schmielau,
	Linus Torvalds, Dave Jones, Andrew Morton, lkml, mpm

On Mon, Jan 02, 2006 at 07:51:32PM +0100, Arjan van de Ven wrote:
> On Mon, 2006-01-02 at 19:44 +0100, Krzysztof Halasa wrote:
> > Ingo Molnar <mingo@elte.hu> writes:
> > 
> > > what is the 'deeper problem'? I believe it is a combination of two 
> > > (well-known) things:
> > >
> > >   1) people add 'inline' too easily
> > >   2) we default to 'always inline'
> > 
> > For example, I add "inline" for static functions which are only called
> > from one place.
> 
> you know what? gcc inlines those automatically even without you typing
> "inline". (esp if you have unit-at-a-time enabled)

Rubbish it will.

static void fn1(void *f)
{
}

void fn2(void *f)
{
        fn1(f);
}

on ARM produces:

        .text
        .align  2
        .type   fn1, %function
fn1:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        @ lr needed for prologue
        mov     pc, lr
        .size   fn1, .-fn1
        .align  2
        .global fn2
        .type   fn2, %function
fn2:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        @ lr needed for prologue
        b       fn1
        .size   fn2, .-fn2
        .ident  "GCC: (GNU) 3.3 20030728 (Red Hat Linux 3.3-16)"

You can't get a simpler function than fn1 to automatically inline.

GCC will only automatically inline using -O3.  We don't use -O3 with
the kernel - only -O2 and -Os.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 21:00                                                   ` Jan Engelhardt
@ 2006-01-02 22:43                                                     ` Linus Torvalds
  0 siblings, 0 replies; 167+ messages in thread
From: Linus Torvalds @ 2006-01-02 22:43 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Ingo Molnar, Krzysztof Halasa, Adrian Bunk, Arjan van de Ven,
	Tim Schmielau, Dave Jones, Andrew Morton, lkml, mpm



On Mon, 2 Jan 2006, Jan Engelhardt wrote:

> >* Linus Torvalds <torvalds@osdl.org> wrote:
> >
> >> > For example, I add "inline" for static functions which are only called
> >> > from one place.
> >> 
> >> That's actually not a good practice. Two reasons:
> >> 
> >>  - debuggability goes way down. Oops reports give a much nicer call-chain 
> >>    and better locality for uninlined code.
> 
> When I want to debug, I use
> CFLAGS="-O0 -ggdb3 -fno-inline -fno-omit-frame-pointer"
> for that particular file(s). That sure gets good results. Not sure about 
> who wins in the kernel case: always_inline or -fno-inline.

This is totally not relevant.

99% of all bug-reports happen for non-developers. What developers can and 
can not do from a debuggability standpoint is almost totally 
uninteresting: quite often the developers won't even be able to recreate 
the bug, but have to go on the bug report that comes in from the outside.

And yes, some users are willing to recompile the kernel, and try ten 
different versions, and in general are just worth their weight in gold. 
But many people have trouble even reporting the (short) oops details, much 
less follow up on it. 

So it's actually important that the default config is reasonably easy to 
debug from the oops report. Because it may be the only thing you ever get.

So -O0 and -fno-inline simply aren't practical, because they are not an 
option for a normal kernel. 

		Linus

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 22:03                                               ` Antonio Vargas
@ 2006-01-02 22:56                                                 ` Arjan van de Ven
  2006-01-02 23:10                                                   ` Grant Coady
  0 siblings, 1 reply; 167+ messages in thread
From: Arjan van de Ven @ 2006-01-02 22:56 UTC (permalink / raw)
  To: Antonio Vargas
  Cc: gcoady, Andrew Morton, Adrian Bunk, mingo, tim, torvalds, davej,
	linux-kernel, mpm


> > Are these typical targets for non-inline?

these are very simple 1 line things, and are the cases where inline is
just fine. The problem cases are the ones with a whole lot more than
that though, say 3 or more real code lines with things like loops or
udelays or ... There's 50+ line functions marked "inline". Those are the
"bad guys" not so much the simple 1 liners
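
A rough sketch of the "constant-optimizes-away" exception, in the
spirit of kmalloc() but not copied from it. All names and bucket sizes
below are made up, and it assumes gcc's __builtin_constant_p():

```c
#include <stddef.h>

/* hypothetical out-of-line path for sizes not known at compile time */
unsigned int bucket_slow(size_t size)
{
	unsigned int idx = 0;
	size_t cap = 32;

	/* double the bucket capacity until the request fits */
	while (cap < size) {
		cap <<= 1;
		idx++;
	}
	return idx;
}

/*
 * More than 3 lines, but with a constant argument and optimization
 * enabled the compiler can fold the whole chain down to one constant.
 */
static inline unsigned int bucket_index(size_t size)
{
	if (__builtin_constant_p(size)) {
		if (size <= 32)
			return 0;
		if (size <= 64)
			return 1;
		if (size <= 128)
			return 2;
	}
	return bucket_slow(size);
}

/* caller with a constant size: this is the case where inline pays off */
unsigned int bucket_for_header(void)
{
	return bucket_index(48);
}
```

Without the constant argument the same inline just duplicates the
comparisons at every call site, which is the bloat case.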

> 
> according to the latest flamewars, maybe it would be better
> to just turn the #defines into static functions instead of static inlines...
> guess even better would be to just get CodingStyle fixed ASAP ;)

I proposed the following chunks:

Adds a bit of text to Documentation/CodingStyle to state that
inlining everything "just because" is a bad idea

Signed-off-by: Arjan van de Ven <arjan@infradead.org>

diff -purN linux-2.6.15-rc6/Documentation/CodingStyle linux-2.6.15-rc6-deinline/Documentation/CodingStyle
--- linux-2.6.15-rc6/Documentation/CodingStyle	2005-10-28 02:02:08.000000000 +0200
+++ linux-2.6.15-rc6-deinline/Documentation/CodingStyle	2005-12-30 13:31:13.000000000 +0100
@@ -344,7 +344,7 @@ Remember: if another thread can find you
 have a reference count on it, you almost certainly have a bug.
 
 
-		Chapter 11: Macros, Enums, Inline functions and RTL
+		Chapter 11: Macros, Enums and RTL
 
 Names of macros defining constants and labels in enums are capitalized.
 
@@ -429,7 +429,35 @@ from void pointer to any other pointer t
 language.
 
 
-		Chapter 14: References
+		Chapter 14: The inline disease
+
+There appears to be a common misperception that gcc has a magic "make me
+faster" ricing option called "inline". While the use of inlines can be
+appropriate (for example as a means of replacing macros, see Chapter 11), it
+very often is not. Abundant use of the inline keyword leads to a much bigger
+kernel, which in turn slows the system as a whole down, due to a bigger
+icache footprint for the CPU and simply because there is less memory
+available for the pagecache. Just think about it; a pagecache miss causes a
+disk seek, which easily takes 5 milliseconds. There are a LOT of cpu cycles
+that can go into these 5 milliseconds.
+
+A reasonable rule of thumb is to not put inline at functions that have more
+than 3 lines of code in them. An exception to this rule are the cases where
+a parameter is known to be a compiletime constant, and as a result of this
+constantness you *know* the compiler will be able to optimize most of your
+function away at compile time. For a good example of this latter case, see
+the kmalloc() inline function.
+
+Often people argue that adding inline to functions that are static and used
+only once is always a win since there is no space tradeoff. While this is
+technically correct, gcc is capable of inlining these automatically without
+help, and the maintenance issue of removing the inline when a second user
+appears outweighs the potential value of the hint that tells gcc to do
+something it would have done anyway.
+
+
+
+		Chapter 15: References
 
 The C Programming Language, Second Edition
 by Brian W. Kernighan and Dennis M. Ritchie.
@@ -450,4 +478,4 @@ WG14 is the international standardizatio
 language C, URL: http://std.dkuug.dk/JTC1/SC22/WG14/
 
 --
-Last updated on 16 February 2004 by a community effort on LKML.
+Last updated on 30 December 2005 by a community effort on LKML.



^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 22:56                                                 ` Arjan van de Ven
@ 2006-01-02 23:10                                                   ` Grant Coady
  2006-01-02 23:57                                                     ` Alan Cox
  0 siblings, 1 reply; 167+ messages in thread
From: Grant Coady @ 2006-01-02 23:10 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Antonio Vargas, gcoady, Andrew Morton, Adrian Bunk, mingo, tim,
	torvalds, davej, linux-kernel, mpm

On Mon, 02 Jan 2006 23:56:23 +0100, Arjan van de Ven <arjan@infradead.org> wrote:

>I proposed the following chunks:
>
>Adds a bit of text to Documentation/CodingStyle to state that
>inlining everything "just because" is a bad idea
>
>Signed-off-by: Arjan van de Ven <arjan@infradead.org>
>
>diff -purN linux-2.6.15-rc6/Documentation/CodingStyle linux-2.6.15-rc6-deinline/Documentation/CodingStyle
>--- linux-2.6.15-rc6/Documentation/CodingStyle	2005-10-28 02:02:08.000000000 +0200
>+++ linux-2.6.15-rc6-deinline/Documentation/CodingStyle	2005-12-30 13:31:13.000000000 +0100
>@@ -344,7 +344,7 @@ Remember: if another thread can find you
> have a reference count on it, you almost certainly have a bug.
> 
> 
>-		Chapter 11: Macros, Enums, Inline functions and RTL
>+		Chapter 11: Macros, Enums and RTL
> 
> Names of macros defining constants and labels in enums are capitalized.
> 
>@@ -429,7 +429,35 @@ from void pointer to any other pointer t
> language.
> 
> 
>-		Chapter 14: References
>+		Chapter 14: The inline disease
>+
>+There appears to be a common misperception that gcc has a magic "make me
>+faster" ricing option called "inline". While the use of inlines can be
          ^^^^^^--?? not sure what this is
>+appropriate (for example as a means of replacing macros, see Chapter 11), it
>+very often is not. Abundant use of the inline keyword leads to a much bigger
>+kernel, which in turn slows the system as a whole down, due to a bigger
>+icache footprint for the CPU and simply because there is less memory
>+available for the pagecache. Just think about it; a pagecache miss causes a
>+disk seek, which easily takes 5 milliseconds. There are a LOT of cpu cycles
>+that can go into these 5 milliseconds.
>+
>+A reasonable rule of thumb is to not put inline at functions that have more
>+than 3 lines of code in them. An exception to this rule are the cases where
>+a parameter is known to be a compiletime constant, and as a result of this
>+constantness you *know* the compiler will be able to optimize most of your
>+function away at compile time. For a good example of this latter case, see
>+the kmalloc() inline function.
>+
>+Often people argue that adding inline to functions that are static and used
>+only once is always a win since there is no space tradeoff. While this is
>+technically correct, gcc is capable of inlining these automatically without
>+help, and the maintenance issue of removing the inline when a second user
>+appears outweighs the potential value of the hint that tells gcc to do
>+something it would have done anyway.

Seems sane, reasonable and mostly readable to me, thank you.

Grant.

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 22:23                                                 ` Russell King
@ 2006-01-02 23:55                                                   ` Alan Cox
  2006-01-03  3:59                                                   ` Daniel Jacobowitz
  1 sibling, 0 replies; 167+ messages in thread
From: Alan Cox @ 2006-01-02 23:55 UTC (permalink / raw)
  To: Russell King
  Cc: Arjan van de Ven, Krzysztof Halasa, Ingo Molnar, Adrian Bunk,
	Tim Schmielau, Linus Torvalds, Dave Jones, Andrew Morton, lkml,
	mpm

On Llu, 2006-01-02 at 22:23 +0000, Russell King wrote:
> > you know what? gcc inlines those automatically even without you typing
> > "inline". (esp if you have unit-at-a-time enabled)

> GCC will only automatically inline using -O3.  We don't use -O3 with
> the kernel - only -O2 and -Os.

Or using -finline-functions which is a good idea. -O3 has some other
undesirable side effects. Both -O2 -finline-functions and -Os
-finline-functions will do what looks to be the right thing.

Mind you anything more than a few instructions is questionable with
current processors.

Alan


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 23:10                                                   ` Grant Coady
@ 2006-01-02 23:57                                                     ` Alan Cox
  2006-01-02 23:58                                                       ` Grant Coady
  0 siblings, 1 reply; 167+ messages in thread
From: Alan Cox @ 2006-01-02 23:57 UTC (permalink / raw)
  To: gcoady
  Cc: Arjan van de Ven, Antonio Vargas, Andrew Morton, Adrian Bunk,
	mingo, tim, torvalds, davej, linux-kernel, mpm

On Maw, 2006-01-03 at 10:10 +1100, Grant Coady wrote:
> >+There appears to be a common misperception that gcc has a magic "make me
> >+faster" ricing option called "inline". While the use of inlines can be
>           ^^^^^^--?? not sure what this is

American car kiddy slang. Not appropriate for internationally read
documentation. Think "go faster stripes" etc.

Alan


^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 23:57                                                     ` Alan Cox
@ 2006-01-02 23:58                                                       ` Grant Coady
  0 siblings, 0 replies; 167+ messages in thread
From: Grant Coady @ 2006-01-02 23:58 UTC (permalink / raw)
  To: Alan Cox
  Cc: gcoady, Arjan van de Ven, Antonio Vargas, Andrew Morton,
	Adrian Bunk, mingo, tim, torvalds, davej, linux-kernel, mpm

On Mon, 02 Jan 2006 23:57:53 +0000, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

>On Maw, 2006-01-03 at 10:10 +1100, Grant Coady wrote:
>> >+There appears to be a common misperception that gcc has a magic "make me
>> >+faster" ricing option called "inline". While the use of inlines can be
>>           ^^^^^^--?? not sure what this is
>
>American car kiddy slang. Not appropriate for internationally read
>documentation. Think "go faster stripes" etc.

"go faster stripes" works for me :o)  (from the land down under)

Cheers,
Grant.

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2005-12-28 21:23       ` Ingo Molnar
  2005-12-28 21:48         ` Ingo Molnar
  2005-12-29  0:37         ` Rogério Brito
@ 2006-01-03  3:36         ` Daniel Jacobowitz
  2 siblings, 0 replies; 167+ messages in thread
From: Daniel Jacobowitz @ 2006-01-03  3:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Arjan van de Ven, lkml, Andrew Morton,
	Matt Mackall

On Wed, Dec 28, 2005 at 10:23:13PM +0100, Ingo Molnar wrote:
> (there's a third thing that i was also playing with, -ffunction-sections 
> and -fdata-sections, but those dont seem to be reliable on the binutils 
> side yet.)

They should be - I'd bet more on the kernel's linker scripts, and other
weird bits like that.  Feel free to report any problems, though.

However, -ffunction-sections tends to increase size in non-discarded
functions.

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 22:23                                                 ` Russell King
  2006-01-02 23:55                                                   ` Alan Cox
@ 2006-01-03  3:59                                                   ` Daniel Jacobowitz
  2006-01-03  8:53                                                     ` Russell King
  1 sibling, 1 reply; 167+ messages in thread
From: Daniel Jacobowitz @ 2006-01-03  3:59 UTC (permalink / raw)
  To: Arjan van de Ven, Krzysztof Halasa, Ingo Molnar, Adrian Bunk,
	Tim Schmielau, Linus Torvalds, Dave Jones, Andrew Morton, lkml,
	mpm

On Mon, Jan 02, 2006 at 10:23:35PM +0000, Russell King wrote:
> On Mon, Jan 02, 2006 at 07:51:32PM +0100, Arjan van de Ven wrote:
> > On Mon, 2006-01-02 at 19:44 +0100, Krzysztof Halasa wrote:
> > > Ingo Molnar <mingo@elte.hu> writes:
> > > 
> > > > what is the 'deeper problem'? I believe it is a combination of two 
> > > > (well-known) things:
> > > >
> > > >   1) people add 'inline' too easily
> > > >   2) we default to 'always inline'
> > > 
> > > For example, I add "inline" for static functions which are only called
> > > from one place.
> > 
> > > you know what? gcc inlines those automatically even without you typing
> > "inline". (esp if you have unit-at-a-time enabled)
> 
> Rubbish it will.
> 
> static void fn1(void *f)
> {
> }
> 
> void fn2(void *f)
> {
>         fn1(f);
> }
> 
> on ARM produces:

On 3.4, 4.0, and 4.1 you only need -O for this (I just checked both x86
and ARM compilers).  I believe this came in with unit-at-a-time, as
Arjan said - which was GCC 3.4.
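
For anyone who wants to check their own compiler, here is a slightly
less degenerate test case (a sketch only; the function names are made
up).  Compile it with something like "gcc -O2 -S" and look at the
assembly for caller(): if add_one() was inlined there is no call or
branch to it.  The noinline variant is there only for comparison.

```c
/* static, one caller, no inline keyword: candidate for automatic inlining */
static int add_one(int x)
{
	return x + 1;
}

int caller(int x)
{
	return add_one(x) * 2;
}

/* same shape, but the attribute asks gcc to keep this as a real call */
static __attribute__((noinline)) int add_one_kept(int x)
{
	return x + 1;
}

int caller_kept(int x)
{
	return add_one_kept(x) * 2;
}
```

Both callers compute the same value; only the generated code should
differ, and whether it does depends on the GCC version and flags.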

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 18:28                                         ` Andrew Morton
  2006-01-02 18:49                                           ` Arjan van de Ven
@ 2006-01-03  5:31                                           ` Nick Piggin
  2006-01-03 23:40                                           ` Martin J. Bligh
  2 siblings, 0 replies; 167+ messages in thread
From: Nick Piggin @ 2006-01-03  5:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Adrian Bunk, Ingo Molnar, tim, arjan, torvalds, davej, lkml, mpm

On Mon, 2006-01-02 at 10:28 -0800, Andrew Morton wrote:
> Adrian Bunk <bunk@stusta.de> wrote:

> > We only disagree on how to achieve an improvement.
> > 
> 
> The best approach is to manually review and fix up all the inline statements.
> 

I agree with this. Turning off inlining in one big hit can punish
correct users of inline and may cause regressions.

Reducing inline abuse seems to be the easiest possible case for
incremental patches, which is how we've always tried to do things.
Andrew (and others) have been reducing inlines for years and things
are going along in the right direction.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-03  3:59                                                   ` Daniel Jacobowitz
@ 2006-01-03  8:53                                                     ` Russell King
  2006-01-03  8:56                                                       ` Arjan van de Ven
  0 siblings, 1 reply; 167+ messages in thread
From: Russell King @ 2006-01-03  8:53 UTC (permalink / raw)
  To: Arjan van de Ven, Krzysztof Halasa, Ingo Molnar, Adrian Bunk,
	Tim Schmielau, Linus Torvalds, Dave Jones, Andrew Morton, lkml,
	mpm

On Mon, Jan 02, 2006 at 10:59:04PM -0500, Daniel Jacobowitz wrote:
> On Mon, Jan 02, 2006 at 10:23:35PM +0000, Russell King wrote:
> > static void fn1(void *f)
> > {
> > }
> > 
> > void fn2(void *f)
> > {
> >         fn1(f);
> > }
> > 
> > on ARM produces:
> 
> On 3.4, 4.0, and 4.1 you only need -O for this (I just checked both x86
> and ARM compilers).  I believe this came in with unit-at-a-time, as
> Arjan said - which was GCC 3.4.

Well, as demonstrated, it doesn't work with gcc 3.3.  Since we aren't
about to increase the minimum gcc version to 3.4, this isn't acceptable.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-03  8:53                                                     ` Russell King
@ 2006-01-03  8:56                                                       ` Arjan van de Ven
  2006-01-03  9:00                                                         ` Russell King
  2006-01-03  9:14                                                         ` Vitaly Wool
  0 siblings, 2 replies; 167+ messages in thread
From: Arjan van de Ven @ 2006-01-03  8:56 UTC (permalink / raw)
  To: Russell King
  Cc: Krzysztof Halasa, Ingo Molnar, Adrian Bunk, Tim Schmielau,
	Linus Torvalds, Dave Jones, Andrew Morton, lkml, mpm

On Tue, 2006-01-03 at 08:53 +0000, Russell King wrote:
> On Mon, Jan 02, 2006 at 10:59:04PM -0500, Daniel Jacobowitz wrote:
> > On Mon, Jan 02, 2006 at 10:23:35PM +0000, Russell King wrote:
> > > static void fn1(void *f)
> > > {
> > > }
> > > 
> > > void fn2(void *f)
> > > {
> > >         fn1(f);
> > > }
> > > 
> > > on ARM produces:
> > 
> > On 3.4, 4.0, and 4.1 you only need -O for this (I just checked both x86
> > and ARM compilers).  I believe this came in with unit-at-a-time, as
> > Arjan said - which was GCC 3.4.
> 
> Well, as demonstrated, it doesn't work with gcc 3.3.  Since we aren't
> about to increase the minimum gcc version to 3.4, this isn't acceptable.

s/isn't acceptable/is suboptimal/




^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-03  8:56                                                       ` Arjan van de Ven
@ 2006-01-03  9:00                                                         ` Russell King
  2006-01-03  9:10                                                           ` Arjan van de Ven
  2006-01-03  9:14                                                         ` Vitaly Wool
  1 sibling, 1 reply; 167+ messages in thread
From: Russell King @ 2006-01-03  9:00 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Krzysztof Halasa, Ingo Molnar, Adrian Bunk, Tim Schmielau,
	Linus Torvalds, Dave Jones, Andrew Morton, lkml, mpm

On Tue, Jan 03, 2006 at 09:56:26AM +0100, Arjan van de Ven wrote:
> On Tue, 2006-01-03 at 08:53 +0000, Russell King wrote:
> > On Mon, Jan 02, 2006 at 10:59:04PM -0500, Daniel Jacobowitz wrote:
> > > On Mon, Jan 02, 2006 at 10:23:35PM +0000, Russell King wrote:
> > > > static void fn1(void *f)
> > > > {
> > > > }
> > > > 
> > > > void fn2(void *f)
> > > > {
> > > >         fn1(f);
> > > > }
> > > > 
> > > > on ARM produces:
> > > 
> > > On 3.4, 4.0, and 4.1 you only need -O for this (I just checked both x86
> > > and ARM compilers).  I believe this came in with unit-at-a-time, as
> > > Arjan said - which was GCC 3.4.
> > 
> > Well, as demonstrated, it doesn't work with gcc 3.3.  Since we aren't
> > about to increase the minimum gcc version to 3.4, this isn't acceptable.
> 
> s/isn't acceptable/is suboptimal/

No - it's a case of going overboard with this inline removal idea.
If we would prefer a function to be inlined because it is only used
once, we should specify it as such rather than relying on some quirky
idea that it _might_ do the right thing if we don't specify it.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-03  9:00                                                         ` Russell King
@ 2006-01-03  9:10                                                           ` Arjan van de Ven
  0 siblings, 0 replies; 167+ messages in thread
From: Arjan van de Ven @ 2006-01-03  9:10 UTC (permalink / raw)
  To: Russell King
  Cc: Krzysztof Halasa, Ingo Molnar, Adrian Bunk, Tim Schmielau,
	Linus Torvalds, Dave Jones, Andrew Morton, lkml, mpm

On Tue, 2006-01-03 at 09:00 +0000, Russell King wrote:
> On Tue, Jan 03, 2006 at 09:56:26AM +0100, Arjan van de Ven wrote:
> > On Tue, 2006-01-03 at 08:53 +0000, Russell King wrote:
> > > On Mon, Jan 02, 2006 at 10:59:04PM -0500, Daniel Jacobowitz wrote:
> > > > On Mon, Jan 02, 2006 at 10:23:35PM +0000, Russell King wrote:
> > > > > static void fn1(void *f)
> > > > > {
> > > > > }
> > > > > 
> > > > > void fn2(void *f)
> > > > > {
> > > > >         fn1(f);
> > > > > }
> > > > > 
> > > > > on ARM produces:
> > > > 
> > > > On 3.4, 4.0, and 4.1 you only need -O for this (I just checked both x86
> > > > and ARM compilers).  I believe this came in with unit-at-a-time, as
> > > > Arjan said - which was GCC 3.4.
> > > 
> > > Well, as demonstrated, it doesn't work with gcc 3.3.  Since we aren't
> > > about to increase the minimum gcc version to 3.4, this isn't acceptable.
> > 
> > s/isn't acceptable/is suboptimal/
> 
> No - it's a case of going overboard with this inline removal idea.
> If we would prefer a function to be inlined because it is only used
> once, we should specify it as such rather than relying on some quirky
> idea that it _might_ do the right thing if we don't specify it

so for those gcc's one passes -finline-functions ....
(or -finline-functions-called-once if it's supported, which newer gccs
have again :)
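
For illustration, here is a minimal sketch of the three ways the intent can be expressed in source, using a variation on the fn1()/fn2() test case quoted above. The int * parameter, the counter increment and the fn1_auto/fn1_hint/fn1_forced names are assumptions added for this example; they are not from the original test case.

```c
/*
 * Variation on the quoted fn1()/fn2() test case. The int * parameter and
 * the increment are assumptions, added so that each call has a visible
 * effect.
 */

/* Plain static: whether this is inlined is left to the compiler. */
static void fn1_auto(int *f)
{
	(*f)++;
}

/* Explicit hint: the author states that inlining is wanted. */
static inline void fn1_hint(int *f)
{
	(*f)++;
}

/* GCC-specific: inline regardless of the optimizer's heuristics. */
static inline __attribute__((always_inline)) void fn1_forced(int *f)
{
	(*f)++;
}

void fn2(int *f)
{
	fn1_auto(f);
	fn1_hint(f);
	fn1_forced(f);
}
```

Compiling this with -O2 -S and looking for a call to fn1_auto in fn2 shows what a given compiler decides on its own; adding -finline-functions (or -finline-functions-called-once where supported) should show the effect of the flags mentioned above.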



^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-03  8:56                                                       ` Arjan van de Ven
  2006-01-03  9:00                                                         ` Russell King
@ 2006-01-03  9:14                                                         ` Vitaly Wool
  1 sibling, 0 replies; 167+ messages in thread
From: Vitaly Wool @ 2006-01-03  9:14 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Russell King, Krzysztof Halasa, Ingo Molnar, Adrian Bunk,
	Tim Schmielau, Linus Torvalds, Dave Jones, Andrew Morton, lkml,
	mpm

Arjan van de Ven wrote:

>On Tue, 2006-01-03 at 08:53 +0000, Russell King wrote:
>>On Mon, Jan 02, 2006 at 10:59:04PM -0500, Daniel Jacobowitz wrote:
>>>On Mon, Jan 02, 2006 at 10:23:35PM +0000, Russell King wrote:
>>>>static void fn1(void *f)
>>>>{
>>>>}
>>>>
>>>>void fn2(void *f)
>>>>{
>>>>        fn1(f);
>>>>}
>>>>
>>>>on ARM produces:
>>>
>>>On 3.4, 4.0, and 4.1 you only need -O for this (I just checked both x86
>>>and ARM compilers).  I believe this came in with unit-at-a-time, as
>>>Arjan said - which was GCC 3.4.
>>
>>Well, as demonstrated, it doesn't work with gcc 3.3.  Since we aren't
>>about to increase the minimum gcc version to 3.4, this isn't acceptable.
>
>s/isn't acceptable/is suboptimal/

I'm afraid it _isn't_ _acceptable_, since it can simply kill the current
XIP implementation.

Vitaly

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-02 18:28                                         ` Andrew Morton
  2006-01-02 18:49                                           ` Arjan van de Ven
  2006-01-03  5:31                                           ` Nick Piggin
@ 2006-01-03 23:40                                           ` Martin J. Bligh
  2006-01-04  4:28                                             ` Matt Mackall
  2 siblings, 1 reply; 167+ messages in thread
From: Martin J. Bligh @ 2006-01-03 23:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Adrian Bunk, mingo, tim, arjan, torvalds, davej, linux-kernel,
	mpm

Andrew Morton wrote:

>Adrian Bunk <bunk@stusta.de> wrote:
>>On Mon, Jan 02, 2006 at 11:37:21AM +0100, Ingo Molnar wrote:
>>>...
>>>to say it loud and clear again: our current way of handling inlines is 
>>>_FUNDAMENTALLY BROKEN_. To me this means that fundamental changes are 
>>>needed for the _mechanics_ and meaning of inlines. We default to 'always 
>>>inline' which has a current information to noise ratio of 1:10 perhaps.  
>>>My patch changes the mechanics and meaning of inlines, and pretty much 
>>>anything else but a change to the meaning of inlines will still result 
>>>in the same scenario occuring over and over again.
>>
>>Let's emphasize what we both agree on:
>>It is _FUNDAMENTALLY BROKEN_ that too much code is marked as
>>'always inline'.
>>
>>We only disagree on how to achieve an improvement.
>
>The best approach is to manually review and fix up all the inline statements.
>
>We cannot just delete them all, because that would cause performance loss
>for well-chosen inlinings when using gcc-3.
>
>I'd be reluctant to trust gcc-4 to do the right thing in all cases.  If the
>compiler fails to inline functions in certain critical cases we'll suffer
>some performance loss and the source of that performance loss will be
>utterly obscure.
>
>If someone types `inline' into a piece of code then we want to inline the
>function, dammit.  The fact that lots of people typed `inline' when they
>shouldn't have is not a good argument for defeating (or adding uncertainty
>to) manual inlining in well-developed and well-maintained code.
>
>All those squillions of bogus inlines which you've identified are probably
>mainly in code we just don't care about much.  We shouldn't penalise
>well-maintained code because of legacy problems in less well-maintained
>code.

It seems odd to me that we're doing this by second-hand effect on
code size ... the objective of making the code smaller is to make it
run faster, right? So ... how come there are no benchmark results
for this?

M.

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-03 23:40                                           ` Martin J. Bligh
@ 2006-01-04  4:28                                             ` Matt Mackall
  2006-01-04  5:51                                               ` Martin J. Bligh
  2006-01-04 17:36                                               ` Zwane Mwaikambo
  0 siblings, 2 replies; 167+ messages in thread
From: Matt Mackall @ 2006-01-04  4:28 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Andrew Morton, Adrian Bunk, mingo, tim, arjan, torvalds, davej,
	linux-kernel, Zwane Mwaikambo

On Tue, Jan 03, 2006 at 03:40:59PM -0800, Martin J. Bligh wrote:
> It seems odd to me that we're doing this by second-hand effect on
> code size ... the objective of making the code smaller is to make it
> run faster, right? So ... howcome there are no benchmark results
> for this?

Because it's extremely hard to design a benchmark that will show a
significant change one way or the other for single kernel functions
that doesn't also make said functions unusually cache-hot. And part of
the presumed advantage of uninlining is that it leaves icache room for
random other code that you're _not_ benchmarking.

In other words, if it's not a microbenchmark, it generally can't be
measured, directly or indirectly. And if it is a microbenchmark, the
result is known to be biased.

In the rare case of functions that are extremely popular (like
spinlock and friends), we _can_ actually see small improvements in
macrobenchmarks like kernel compiles. So it's fairly reasonable to
assume that reducing icache footprint really does matter more than
cycle count and extrapolate that to other functions.

(Unfortunately, Zwane is an enemy of history and the URL for the
benchmarks he posted for out-of-line spinlock has gone stale.)

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-04  4:28                                             ` Matt Mackall
@ 2006-01-04  5:51                                               ` Martin J. Bligh
  2006-01-04 17:10                                                 ` Matt Mackall
  2006-01-04 22:37                                                 ` Linus Torvalds
  2006-01-04 17:36                                               ` Zwane Mwaikambo
  1 sibling, 2 replies; 167+ messages in thread
From: Martin J. Bligh @ 2006-01-04  5:51 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Andrew Morton, Adrian Bunk, mingo, tim, arjan, torvalds, davej,
	linux-kernel, Zwane Mwaikambo

Matt Mackall wrote:

>On Tue, Jan 03, 2006 at 03:40:59PM -0800, Martin J. Bligh wrote:
>>It seems odd to me that we're doing this by second-hand effect on
>>code size ... the objective of making the code smaller is to make it
>>run faster, right? So ... howcome there are no benchmark results
>>for this?
>
>Because it's extremely hard to design a benchmark that will show a
>significant change one way or the other for single kernel functions
>that doesn't also make said functions unusually cache-hot. And part of
>the presumed advantage of uninlining is that it leaves icache room for
>random other code that you're _not_ benchmarking.
>
>In other words, if it's not a microbenchmark, it generally can't be
>measured, directly or indirectly. And if it is a microbenchmark, the
>result is known to be biased.

Well, it's not just one function, is it? It'd seem that if you uninlined
a whole bunch of stuff (according to this theory) then normal
macro-benchmarks would go faster? Otherwise it's all just rather
theoretical, is it not?

>In the rare case of functions that are extremely popular (like
>spinlock and friends), we _can_ actually see small improvements in
>macrobenchmarks like kernel compiles. So it's fairly reasonable to
>assume that reducing icache footprint really does matter more than
>cycle count and extrapolate that to other functions.

Cool, that sounds good. How much are we talking about?
I didn't see that in the thread anywhere ... perhaps I just
missed it, sorry it got long ;-)

M.

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-04  5:51                                               ` Martin J. Bligh
@ 2006-01-04 17:10                                                 ` Matt Mackall
  2006-01-04 22:37                                                 ` Linus Torvalds
  1 sibling, 0 replies; 167+ messages in thread
From: Matt Mackall @ 2006-01-04 17:10 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Andrew Morton, Adrian Bunk, mingo, tim, arjan, torvalds, davej,
	linux-kernel, Zwane Mwaikambo

On Tue, Jan 03, 2006 at 09:51:17PM -0800, Martin J. Bligh wrote:
> Matt Mackall wrote:
> 
> >On Tue, Jan 03, 2006 at 03:40:59PM -0800, Martin J. Bligh wrote:
> >>It seems odd to me that we're doing this by second-hand effect on
> >>code size ... the objective of making the code smaller is to make it
> >>run faster, right? So ... howcome there are no benchmark results
> >>for this?
> >
> >Because it's extremely hard to design a benchmark that will show a
> >significant change one way or the other for single kernel functions
> >that doesn't also make said functions unusually cache-hot. And part of
> >the presumed advantage of uninlining is that it leaves icache room for
> >random other code that you're _not_ benchmarking.
> >
> >In other words, if it's not a microbenchmark, it generally can't be
> >measured, directly or indirectly. And if it is a microbenchmark, the
> >result is known to be biased.
>
> Well, it's not just one function, is it? It'd seem that if you unlined
> a whole bunch of stuff (according to this theory) then normal
> macro-benchmarks would go faster? Otherwise it's all just rather
> theoretical, is it not?

Yes, if we uninline a big hunk of stuff, we should see results. But
which hunk? Running benchmarks w/ and w/o Ingo's 3 patches should be
educational, as would running them with -Os.

Also note that there are going to be cases where smaller wins by
orders of magnitude: where it makes the difference between the working
set fitting in cache or RAM or not.
 
> Cool, that sounds good. How much are we talking about?
> I didn't see that in the thread anywhere ... perhaps I just
> missed it, sorry it got long ;-)

The out of line spinlock stuff went by about a year ago. Google for
"zwane spinlock out of line".

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-04  4:28                                             ` Matt Mackall
  2006-01-04  5:51                                               ` Martin J. Bligh
@ 2006-01-04 17:36                                               ` Zwane Mwaikambo
  1 sibling, 0 replies; 167+ messages in thread
From: Zwane Mwaikambo @ 2006-01-04 17:36 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Martin J. Bligh, Andrew Morton, Adrian Bunk, Ingo Molnar, tim,
	arjan, Linus Torvalds, Dave Jones, Linux Kernel

On Tue, 3 Jan 2006, Matt Mackall wrote:

> On Tue, Jan 03, 2006 at 03:40:59PM -0800, Martin J. Bligh wrote:
> > It seems odd to me that we're doing this by second-hand effect on
> > code size ... the objective of making the code smaller is to make it
> > run faster, right? So ... howcome there are no benchmark results
> > for this?
> 
> Because it's extremely hard to design a benchmark that will show a
> significant change one way or the other for single kernel functions
> that doesn't also make said functions unusually cache-hot. And part of
> the presumed advantage of uninlining is that it leaves icache room for
> random other code that you're _not_ benchmarking.
> 
> In other words, if it's not a microbenchmark, it generally can't be
> measured, directly or indirectly. And if it is a microbenchmark, the
> result is known to be biased.
> 
> In the rare case of functions that are extremely popular (like
> spinlock and friends), we _can_ actually see small improvements in
> macrobenchmarks like kernel compiles. So it's fairly reasonable to
> assume that reducing icache footprint really does matter more than
> cycle count and extrapolate that to other functions.
> 
> (Unfortunately, Zwane is an enemy of history and the URL for the
> benchmarks he posted for out-of-line spinlock has gone stale.)

Hey, I resent that :P Luckily my ~ directory hasn't been cleaned in years.
The following bonnie runs were based on an initial implementation; I was
only able to conclude that there was no negative cost to out-of-lining.
-cool is completely out of line.

-cool
Version  @version@      ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
stp4-000      7336M  5755  99 62465  40 11037  10  5384  94 73519  29 299.0   1
stp4-000      7336M  5777  99 67203  44 14018  13  5392  94 69436  27 300.2   1
stp4-000      7336M  5725  99 61389  40 19385  18  5196  91 75178  30 307.5   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
stp4-000        100    80  99 +++++ +++ 62151 100    80 100 +++++ +++   276 100
stp4-000        100    80  99 +++++ +++ 62775 100    81  99 +++++ +++   277 100
stp4-000        100    80  99 +++++ +++ 62857 100    80  99 +++++ +++   271 100

Version  @version@      ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
stp2-000         2G  7018  99 64560  36 21694  16  6789  97 43729  14 340.6   1
stp2-000         2G  7055  99 64836  39 21899  16  6752  97 44827  17 330.8   2
stp2-000         2G  7023  99 64525  38 22987  17  6704  96 44777  14 337.3   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
stp2-000        100    93  99 +++++ +++ 82831  99    94  99 +++++ +++   351  99
stp2-000        100    93  99 +++++ +++ 82211 100    94  99 +++++ +++   350  99
stp2-000        100    93  99 +++++ +++ 81940 100    94  99 +++++ +++   350  99

-mainline
Version  @version@      ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
stp4-000      7336M  5726  99 65615  42 10112   9  4854  85 70211  28 295.4   1
stp4-000      7336M  5764  99 64931  43 13884  13  5242  93 67963  27 302.5   1
stp4-000      7336M  5748  99 68806  46 18061  17  5139  91 70335  28 310.0   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
stp4-000        100    80  99 +++++ +++ 60958  99    79  99 +++++ +++   282 100
stp4-000        100    79  99 +++++ +++ 60120 100    80  99 +++++ +++   275 100
stp4-000        100    80  99 +++++ +++ 62174  99    81 100 +++++ +++   278 100

Version  @version@      ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
stp2-000         2G  7048  99 64912  38 22510  17  6732  96 43900  14 332.0   1
stp2-000         2G  7018  99 63821  39 21732  16  6787  97 44889  17 326.7   2
stp2-000         2G  7063  99 63834  38 22361  17  6738  97 43310  14 338.3   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
stp2-000        100    93  99 +++++ +++ 80963 100    94  99 +++++ +++   344  99
stp2-000        100    93  99 +++++ +++ 80998  99    94  99 +++++ +++   348  99
stp2-000        100    93  99 +++++ +++ 81237 100    94  99 +++++ +++   349  99

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-04  5:51                                               ` Martin J. Bligh
  2006-01-04 17:10                                                 ` Matt Mackall
@ 2006-01-04 22:37                                                 ` Linus Torvalds
  2006-01-05  0:55                                                   ` Martin Bligh
  1 sibling, 1 reply; 167+ messages in thread
From: Linus Torvalds @ 2006-01-04 22:37 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Matt Mackall, Andrew Morton, Adrian Bunk, mingo, tim, arjan,
	davej, linux-kernel, Zwane Mwaikambo



On Tue, 3 Jan 2006, Martin J. Bligh wrote:
>
> Well, it's not just one function, is it? It'd seem that if you unlined
> a whole bunch of stuff (according to this theory) then normal
> macro-benchmarks would go faster? Otherwise it's all just rather
> theoretical, is it not?

One of the problems with code size optimizations is that they 
fundamentally show much less impact under pretty much any traditional 
benchmark.

In user space, for example, the biggest impact of size optimization tends 
to be in things like load time and low-memory situations. Yet, that's 
never what is benchmarked (yeah, people will benchmark some low-memory 
kernel situations by comparing different swap-out algorithms etc against 
each other - but they'll almost never compare the perceived speed of your 
desktop when you start swapping).

Now, I'll happily also admit that code and data _layout_ is often a lot 
more effective than just code size optimizations. That's especially true 
in low-memory situations where, the page size being larger than many of 
the code sequences, you can make a bigger impact by changing utilization 
than by changing absolute size.

But even then it's actually really hard to measure. Cache effects tend to 
be hard _anyway_ to measure, and it's really hard when the interesting case 
is the cold-cache case and you can't even do simple microbenchmarks that 
repeat a million times to get stable results.

So the best we can usually do is "microbenchmarks don't show any 
noticeable _worse_ behaviour, and code size went down by 10%".

Just as an example: there's an old paper on the impact of the OS design on 
the memory subsystem, and one of the tables is about how many cycles per 
instruction is spent on cache effects (this was Ultrix vs Mach, the load 
was the average of a workload that was either specint or looked a lot 
like it).

			I$	D$

	Ultrix user:	0.07	0.08
	Mach user:	0.07	0.08
	Ultrix system:	0.43	0.23
	Mach system:	0.57	0.29

Now, that's an oldish paper, and we really don't care about Ultrix vs Mach 
nor about the particular hw (Decstation), but the same kind of basic 
numbers have been repeated over and over. Namely that system code tends to 
have very different behaviour from user code wrt caches. Caches may have 
gotten better, but code has gotten bigger, and cores have gotten faster.

And in particular, they tend to be _much_ worse. Something you'll seldom 
see as clearly in micro-benchmarks, if only because under those benchmarks 
things will generally be more cached - especially on the I$ side.

So I should probably try to find something slightly more modern, but what 
it boils down to is that at least one well-cited paper that is fairly easy 
to find claims that about _half_a_cycle_ for each instruction was spent on 
I$ misses in system code on a perfectly regular mix of programs. And that 
the cost of I$ was actually higher than the cost of D$ misses.

Now, most of the _time_ tends to be spent in user mode. That is probably 
part of the reason for why system mode gets higher cache misses (another 
is that obviously you'd hope that the user program can optimize for its 
particular memory usage, while the kernel can't). But the result remains: 
I$ miss time is a noticeable portion of system time.

Now, I claim that I$ miss overhead on a system would probably tend to have 
a pretty linear relationship with the size of the static code. There 
aren't that many big hot loops or small code that fits in the I$ to skew 
that trivial rule.

So at a total guess, and taking the above numbers (that are questionable, 
but hey, they should be ok as a starting point for WAGging), reducing code 
size by 10% should give about 0.007 cycles per instruction in user space. 
Whee. Not very noticeable. But on system code (the only thing the kernel 
can help), it's actually a much more visible 0.05 CPI.

Yeah, the math is bogus, the numbers may be irrelevant, but it does show 
why I$ should matter, even though it's much harder to measure.
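
The back-of-envelope estimate above can be written out as a short sketch. It assumes, as the text does, that I$ miss cost scales linearly with static code size. The function and constant names are made up for illustration, and using the mean of the Ultrix and Mach system figures is an assumption made here to arrive at "about 0.05".

```c
/* I$ miss cost in cycles per instruction, from the I$ column of the table. */
static const double user_icache_cpi = 0.07;
static const double ultrix_system_icache_cpi = 0.43;
static const double mach_system_icache_cpi = 0.57;

/*
 * Linear model: shrinking the code by a given fraction removes the same
 * fraction of the I$ miss cost.
 */
double icache_cpi_saved(double icache_cpi, double size_reduction)
{
	return icache_cpi * size_reduction;
}

/* 10% smaller code in user space: 0.07 * 0.10 CPI. */
double user_saving(void)
{
	return icache_cpi_saved(user_icache_cpi, 0.10);
}

/* 10% smaller system code, using the mean of the two systems (assumption). */
double system_saving(void)
{
	double mean = (ultrix_system_icache_cpi + mach_system_icache_cpi) / 2.0;

	return icache_cpi_saved(mean, 0.10);
}
```

With these inputs the user-space saving comes out at about 0.007 CPI and the system saving at about 0.05 CPI, matching the figures in the text.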

			Linus

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
  2006-01-04 22:37                                                 ` Linus Torvalds
@ 2006-01-05  0:55                                                   ` Martin Bligh
  0 siblings, 0 replies; 167+ messages in thread
From: Martin Bligh @ 2006-01-05  0:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matt Mackall, Andrew Morton, Adrian Bunk, mingo, tim, arjan,
	davej, linux-kernel, Zwane Mwaikambo

 > ...
> But even then it's actually really hard to measure. Cache effects tend to 
> be hard _anyway_ to measure, it's really hard when the interesting case is 
> the cold-cache case and can't even do simple microbenchmarks that repeat a 
> million times to get stable results.
> 
> So the best we can usually do is "microbenchmarks don't show any 
> noticeable _worse_ behaviour, and code size went down by 10%".

OK, that makes a lot of sense to me. It was just worrying that nobody 
seemed to be measuring performance _at all_, just talking about code 
size, which is all very nice, but not really an end goal (for most 
systems). It seems to me like a simple tradeoff - cycles vs cache 
misses, and people seem to be only looking at one side.

I wasn't trying to say it was bad ... just seemed insufficiently 
justified to me (at least regression tested, as you say).

> So at a total guess, and taking the above numbers (that are questionable, 
> but hey, they should be ok as a starting point for WAGging), reducing code 
> size by 10% should give about 0.007 cycles per instruction in user space. 
> Whee. Not very noticeable. But on system code (the only thing the kernel 
> can help), it's actually a much more visible 0.05 CPI.
> 
> Yeah, the math is bogus, the numbers may be irrelevant, but it does show 
> why I$ should matter, even though it's much harder to measure.

The IBM PPC people had some fancy way of measuring CPI and were very 
interested in it. Perhaps we can taunt them into helping measure things.

M.

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
@ 2006-01-05  0:55 Chuck Ebbert
  2006-01-05  1:07 ` Martin Bligh
  0 siblings, 1 reply; 167+ messages in thread
From: Chuck Ebbert @ 2006-01-05  0:55 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Adrian Bunk, Andrew Morton, Arjan van de Ven, Ingo Molnar,
	Linus Torvalds, linux-kernel, Dave Jones, Martin J. Bligh,
	Tim Schmielau

In-Reply-To: <20060104042822.GA3356@waste.org>

On Tue, 3 Jan 2006 at 22:28:23 -0600, Matt Mackall wrote:

> On Tue, Jan 03, 2006 at 03:40:59PM -0800, Martin J. Bligh wrote:
> 
> > It seems odd to me that we're doing this by second-hand effect on
> > code size ... the objective of making the code smaller is to make it
> > run faster, right? So ... howcome there are no benchmark results
> > for this?
> 
> Because it's extremely hard to design a benchmark that will show a
> significant change one way or the other for single kernel functions
> that doesn't also make said functions unusually cache-hot. And part of
> the presumed advantage of uninlining is that it leaves icache room for
> random other code that you're _not_ benchmarking.

Moving code out-of-line can have some serious drawbacks, too.  For example,
if you have a 12 byte function that is called frequently, you are pinning
an additional 52 bytes of code (the rest of a 64-byte cache line) in the
L1 cache.  Unless that code is also called often you might waste more
space than you gain from un-inlining.

E.g. a look at arch/i386/kernel/apic.c shows a bunch of functions that are
only there because of CPU hotplug, to handle errors or deal with
suspend/resume/shutdown.  Such code should be in its own text section.

And the only way to get meaningful benchmarks so you can figure out where
to place the code would be to instrument the kernel and then run real-world
applications.
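
The 52-byte figure follows from cache-line granularity: a 12-byte function occupies one line, and the rest of that line is pulled in with it. A small sketch of the arithmetic, assuming a 64-byte line and a function that starts on a line boundary (both assumptions; the helper name is hypothetical):

```c
#include <assert.h>

/*
 * Bytes of neighbouring code that stay cached alongside a function of
 * func_size bytes, assuming it starts on a cache-line boundary.
 */
unsigned int extra_bytes_pinned(unsigned int func_size, unsigned int line_size)
{
	unsigned int lines;

	assert(line_size > 0);
	lines = (func_size + line_size - 1) / line_size;	/* round up */
	return lines * line_size - func_size;
}
```

For example, extra_bytes_pinned(12, 64) gives 52, while a function that exactly fills a line pins nothing extra.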

-- 
Chuck

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05  0:55 [patch 00/2] improve .text size on gcc 4.0 and newer compilers Chuck Ebbert
@ 2006-01-05  1:07 ` Martin Bligh
  2006-01-05 12:19   ` Arjan van de Ven
  0 siblings, 1 reply; 167+ messages in thread
From: Martin Bligh @ 2006-01-05  1:07 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: Matt Mackall, Adrian Bunk, Andrew Morton, Arjan van de Ven,
	Ingo Molnar, Linus Torvalds, linux-kernel, Dave Jones,
	Tim Schmielau

Chuck Ebbert wrote:
> In-Reply-To: <20060104042822.GA3356@waste.org>
> 
> On Tue, 3 Jan 2006 at 22:28:23 -0600, Matt Mackall wrote:
> 
> 
>>On Tue, Jan 03, 2006 at 03:40:59PM -0800, Martin J. Bligh wrote:
>>
>>
>>>It seems odd to me that we're doing this by second-hand effect on
>>>code size ... the objective of making the code smaller is to make it
>>>run faster, right? So ... howcome there are no benchmark results
>>>for this?
>>
>>Because it's extremely hard to design a benchmark that will show a
>>significant change one way or the other for single kernel functions
>>that doesn't also make said functions unusually cache-hot. And part of
>>the presumed advantage of uninlining is that it leaves icache room for
>>random other code that you're _not_ benchmarking.
> 
> 
> Moving code out-of-line can have some serious drawbacks, too.  For example,
> if you have a 12 byte function that is called frequently, you are pinning
> an additional 52 bytes of code in the L1 cache.  Unless that code is also
> called often you might waste more space than you gain from un-inlining.
> 
> E.g. a look at arch/i386/kernel/apic.c shows a bunch of functions that are
> only there because of CPU hotplug, to handle errors or deal with
> suspend/resume/shutdown.  Such code should be in its own text section.
> 
> And the only way to get meaningful benchmarks so you can figure out where
> to place the code would be to instrument the kernel and then run real-world
> applications.

What would be nice to do is pack all the frequently used code together 
in close proximity. Would probably have much larger effects with 
userspace code, esp where we touch disk (which is more page-size 
granularity), but is probably worth doing with kernel code too (where 
AFAICS we'd only get cacheline granular).

M.

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05  1:07 ` Martin Bligh
@ 2006-01-05 12:19   ` Arjan van de Ven
  2006-01-05 14:30     ` Jakub Jelinek
  2006-01-05 17:02     ` Matt Mackall
  0 siblings, 2 replies; 167+ messages in thread
From: Arjan van de Ven @ 2006-01-05 12:19 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Chuck Ebbert, Matt Mackall, Adrian Bunk, Andrew Morton,
	Ingo Molnar, Linus Torvalds, linux-kernel, Dave Jones,
	Tim Schmielau


> What would be nice to do is pack all the frequently used code together 
> in close proximity. Would probably have much larger effects with 
> userspace code, esp where we touch disk (which is more page-size 
> granularity), but is probably worth doing with kernel code too (where 
> AFAICS we'd only get cacheline granular).

in the kernel we could make a .text.rare section for functions, which we
could annotate with __rare.
The other way around, __fastpath or whatever is a bad idea, everyone
will consider all of their own functions as such (just like inline ;)...
go-fast-stripes all the way :-(


obvious candidates for __rare are
* pm suspend/resume functions
* error handling functions
* initialization stuff (including mount time stuff for filesystems,
  and hardware setup for drivers)

I wonder if gcc can be convinced to put all unlikely() code sections
into a .text.rare as well, that'd be really cool.
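
A minimal sketch of what such an annotation could look like with GCC's section attribute. The __rare macro is the name proposed above; its definition here, the noinline addition and the report_error()/process() functions are assumptions for illustration, not existing kernel code. The section name syntax assumes an ELF target.

```c
/*
 * Hypothetical definition of the proposed __rare annotation: place the
 * function in .text.rare and keep it out of line so it stays there.
 */
#define __rare __attribute__((section(".text.rare"), noinline))

/* Rarely executed: error handling lives away from the hot code. */
static __rare int report_error(int code)
{
	return -code;
}

int process(int value)
{
	if (value < 0)
		return report_error(1);	/* rare path, in .text.rare */
	return value * 2;		/* hot path, in .text */
}
```

Running objdump -h on the resulting object file should list a separate .text.rare section. In the kernel the linker script would also need a rule saying where that section goes.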




^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 12:19   ` Arjan van de Ven
@ 2006-01-05 14:30     ` Jakub Jelinek
  2006-01-05 16:55       ` Linus Torvalds
  2006-01-05 17:02     ` Matt Mackall
  1 sibling, 1 reply; 167+ messages in thread
From: Jakub Jelinek @ 2006-01-05 14:30 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Martin Bligh, Chuck Ebbert, Matt Mackall, Adrian Bunk,
	Andrew Morton, Ingo Molnar, Linus Torvalds, linux-kernel,
	Dave Jones, Tim Schmielau

On Thu, Jan 05, 2006 at 01:19:12PM +0100, Arjan van de Ven wrote:
> obvious candidates for __rare are
> * pm suspend/resume functions
> * error handling functions
> * initialization stuff (including mount time stuff for filesystems,
>   and hardware setup for drivers)
> 
> I wonder if gcc can be convinced to put all unlikely() code sections
> into a .text.rare as well, that'd be really cool.

gcc 4.1 calls them .text.unlikely and you need to use
-freorder-blocks-and-partition
switch.  But I haven't been able to reproduce it on a short testcase I
cooked up, so maybe it is broken ATM (it put the whole function into
.text rather than the expected part into .text.unlikely and left
empty .text.unlikely).
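
For reference, a short testcase of the kind described could look like the sketch below. The likely()/unlikely() definitions follow the usual __builtin_expect pattern; checked_div() is a made-up example. Whether the compiler actually moves the cold block into .text.unlikely depends on the GCC version and on the profile information it has, so this only shows the source side.

```c
/* Branch hints in the usual __builtin_expect form. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/*
 * Divide a by b. The division-by-zero branch is marked unlikely, which
 * makes it a candidate for placement in a cold section.
 */
int checked_div(int a, int b, int *out)
{
	if (unlikely(b == 0))
		return -1;	/* cold path */

	*out = a / b;		/* hot path */
	return 0;
}
```

Building with gcc -O2 -freorder-blocks-and-partition -c and inspecting the object with objdump -h shows whether a non-empty .text.unlikely was produced.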

	Jakub

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 14:30     ` Jakub Jelinek
@ 2006-01-05 16:55       ` Linus Torvalds
  2006-01-05 18:42         ` Daniel Jacobowitz
  0 siblings, 1 reply; 167+ messages in thread
From: Linus Torvalds @ 2006-01-05 16:55 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Arjan van de Ven, Martin Bligh, Chuck Ebbert, Matt Mackall,
	Adrian Bunk, Andrew Morton, Ingo Molnar, linux-kernel, Dave Jones,
	Tim Schmielau



On Thu, 5 Jan 2006, Jakub Jelinek wrote:
> > 
> > I wonder if gcc can be convinced to put all unlikely() code sections
> > into a .text.rare as well, that'd be really cool.
> 
> gcc 4.1 calls them .text.unlikely and you need to use
> -freorder-blocks-and-partition
> switch.  But I haven't been able to reproduce it on a short testcase I
> cooked up, so maybe it is broken ATM (it put the whole function into
> .text rather than the expected part into .text.unlikely and left
> empty .text.unlikely).

If it causes the conditional jump to become a long one instead of a byte 
offset one, it's actually a pessimisation for no gain (yes, it might give 
better cache density _if_ the function that is linked after the current 
one is cache-dense with the function in question and _if_ the unlikely 
sequence is really really unlikely, but that's two fairly big ifs).

So I'm not at all convinced of the feature (or maybe gcc actually does the 
right thing, and the reason you can't reproduce it is because gcc is being 
understandably reluctant to use the other section).

Basically, there's "biased one way", "unlikely", and there's "practically 
never happens". And even the "practically never" case will probably be 
better off with the unlikely case close-by, if it means that the likely 
case can use a short branch.
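
To put rough numbers on the short-versus-long branch point: on x86 a
conditional jump with a signed 8-bit displacement encodes in 2 bytes,
while the 32-bit displacement form takes 6 bytes in 32-bit code. The
helper names below are made up for illustration:

```c
#include <stdint.h>

/*
 * x86 conditional jump sizes in 32-bit code:
 *   short form: opcode + rel8       = 2 bytes
 *   near form:  0F 8x    + rel32    = 6 bytes
 */
enum { JCC_SHORT_BYTES = 2, JCC_NEAR_BYTES = 6 };

/* The short form only reaches targets within a signed 8-bit displacement. */
int jcc_fits_short(int64_t displacement)
{
	return displacement >= INT8_MIN && displacement <= INT8_MAX;
}

/* Encoded size of a conditional jump for a given displacement. */
int jcc_encoded_bytes(int64_t displacement)
{
	return jcc_fits_short(displacement) ? JCC_SHORT_BYTES : JCC_NEAR_BYTES;
}
```

An unlikely block moved into another section will generally end up far
outside the rel8 range, so every such branch pays the extra 4 bytes in
the hot path.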

			Linus

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 12:19   ` Arjan van de Ven
  2006-01-05 14:30     ` Jakub Jelinek
@ 2006-01-05 17:02     ` Matt Mackall
  2006-01-05 17:59       ` Martin Bligh
  1 sibling, 1 reply; 167+ messages in thread
From: Matt Mackall @ 2006-01-05 17:02 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Martin Bligh, Chuck Ebbert, Adrian Bunk, Andrew Morton,
	Ingo Molnar, Linus Torvalds, linux-kernel, Dave Jones,
	Tim Schmielau

On Thu, Jan 05, 2006 at 01:19:12PM +0100, Arjan van de Ven wrote:
> 
> > What would be nice to do is pack all the frequently used code together 
> > in close proximity. Would probably have much larger effects with 
> > userspace code, esp where we touch disk (which is more page-size 
> > granularity), but is probably worth doing with kernel code too (where 
> > AFAICS we'd only get cacheline granular).
> 
> in the kernel we could make a .text.rare section for functions, which we
> could annotate with __rare.
> The other way around, __fastpath or whatever is a bad idea, everyone
> will consider all of their own functions as such (just like inline ;)...
> go-fast-stripes all the way :-(

Gah, we don't want to do this by hand in either direction. It's the
inline nightmare all over again.

It'd be better to take a tool like oprofile and run it against some
test suite to generate a usage map, then re-sort based on the map.
Then ship a "standard" map in the stock tarball. Note that the map
need only list the popular functions.

The ideal sampling tool can collect second order information: which
functions are executed near each other as well as which are executed
most frequently.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 17:02     ` Matt Mackall
@ 2006-01-05 17:59       ` Martin Bligh
  2006-01-05 18:09         ` Arjan van de Ven
  2006-01-05 19:17         ` Linus Torvalds
  0 siblings, 2 replies; 167+ messages in thread
From: Martin Bligh @ 2006-01-05 17:59 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Arjan van de Ven, Chuck Ebbert, Adrian Bunk, Andrew Morton,
	Ingo Molnar, Linus Torvalds, linux-kernel, Dave Jones,
	Tim Schmielau

Matt Mackall wrote:
> On Thu, Jan 05, 2006 at 01:19:12PM +0100, Arjan van de Ven wrote:
> 
>>>What would be nice to do is pack all the frequently used code together 
>>>in close proximity. Would probably have much larger effects with 
>>>userspace code, esp where we touch disk (which is more page-size 
>>>granularity), but is probably worth doing with kernel code too (where 
>>>AFAICS we'd only get cacheline granular).
>>
>>in the kernel we could make a .text.rare section for functions, which we
>>could annotate with __rare.
>>The other way around, __fastpath or whatever is a bad idea, everyone
>>will consider all of their own functions as such (just like inline ;)...
>>go-fast-stripes all the way :-(
> 
> 
> Gah, we don't want to do this by hand in either direction. It's the
> inline nightmare all over again.

Absolutely.

> It'd be better to take a tool like oprofile and run it against some
> test suite to generate a usage map, then re-sort based on the map.
> Then ship a "standard" map in the stock tarball. Note that the map
> need only list the popular functions.
> 
> The ideal sampling tool can collect second order information: which
> functions are executed near each other as well as which are executed
> most frequently.

There are tools already around to do this sort of thing as well - 
"profile directed optimization" or whatever they called it. Seems to be 
fairly commonly done with userspace, but not with the kernel. I'm not 
sure why not ... possibly because it's not available for gcc ?

M.

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 17:59       ` Martin Bligh
@ 2006-01-05 18:09         ` Arjan van de Ven
  2006-01-05 18:43           ` Daniel Jacobowitz
  2006-01-05 19:17         ` Linus Torvalds
  1 sibling, 1 reply; 167+ messages in thread
From: Arjan van de Ven @ 2006-01-05 18:09 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Matt Mackall, Chuck Ebbert, Adrian Bunk, Andrew Morton,
	Ingo Molnar, Linus Torvalds, linux-kernel, Dave Jones,
	Tim Schmielau


> There are tools already around to do this sort of thing as well - 
> "profile directed optimization" or whatever they called it. Seems to be 
> fairly commonly done with userspace, but not with the kernel. I'm not 
> sure why not ... possibly because it's not available for gcc ?

gcc has this for sure
the problem is that it expects the profile info in a special format
that.. gets written to a file. So to do it in the kernel you need
SomeMagic(tm), for example to use the kernel profiler but to let it
output it somehow in a gcc compatible format.



^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 16:55       ` Linus Torvalds
@ 2006-01-05 18:42         ` Daniel Jacobowitz
  0 siblings, 0 replies; 167+ messages in thread
From: Daniel Jacobowitz @ 2006-01-05 18:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jakub Jelinek, Arjan van de Ven, Martin Bligh, Chuck Ebbert,
	Matt Mackall, Adrian Bunk, Andrew Morton, Ingo Molnar,
	linux-kernel, Dave Jones, Tim Schmielau

On Thu, Jan 05, 2006 at 08:55:27AM -0800, Linus Torvalds wrote:
> 
> 
> On Thu, 5 Jan 2006, Jakub Jelinek wrote:
> > > 
> > > I wonder if gcc can be convinced to put all unlikely() code sections
> > > into a .text.rare as well, that'd be really cool.
> > 
> > gcc 4.1 calls them .text.unlikely and you need to use
> > -freorder-blocks-and-partition
> > switch.  But I haven't been able to reproduce it on a short testcase I
> > cooked up, so maybe it is broken ATM (it put the whole function into
> > .text rather than the expected part into .text.unlikely and left
> > empty .text.unlikely).
> 
> If it causes the conditional jump to become a long one instead of a byte 
> offset one, it's actually a pessimisation for no gain (yes, it might give 
> better cache density _if_ the function that is linked after the current 
> one is cache-dense with the function in question and _if_ the unlikely 
> sequence is really really unlikely, but that's two fairly big ifs).
> 
> So I'm not at all convinced of the feature (or maybe gcc actually does the 
> right thing, and the reason you can't reproduce it is because gcc is being 
> understandably reluctant to use the other section).

It triggers either rarely or never in basic compilation; it's designed
to work off profile feedback.  With that it worked in gcc 4.1 a couple
of weeks ago.

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 18:09         ` Arjan van de Ven
@ 2006-01-05 18:43           ` Daniel Jacobowitz
  0 siblings, 0 replies; 167+ messages in thread
From: Daniel Jacobowitz @ 2006-01-05 18:43 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Martin Bligh, Matt Mackall, Chuck Ebbert, Adrian Bunk,
	Andrew Morton, Ingo Molnar, Linus Torvalds, linux-kernel,
	Dave Jones, Tim Schmielau

On Thu, Jan 05, 2006 at 07:09:36PM +0100, Arjan van de Ven wrote:
> 
> > There are tools already around to do this sort of thing as well - 
> > "profile directed optimization" or whatever they called it. Seems to be 
> > fairly commonly done with userspace, but not with the kernel. I'm not 
> > sure why not ... possibly because it's not available for gcc ?
> 
> gcc has this for sure
> the problem is that it expects the profile info in a special format
> that.. gets written to a file. So to do it in the kernel you need
> SomeMagic(tm), for example to use the kernel profiler but to let it
> output it somehow in a gcc compatible format.

Right - at some point I remember a discussion of nifty Eclipse plugins
to allow test runs on a custom workload and rebuild with feedback,
but it never materialized.

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 17:59       ` Martin Bligh
  2006-01-05 18:09         ` Arjan van de Ven
@ 2006-01-05 19:17         ` Linus Torvalds
  2006-01-05 19:40           ` Linus Torvalds
  2006-01-05 21:32           ` Grzegorz Kulewski
  1 sibling, 2 replies; 167+ messages in thread
From: Linus Torvalds @ 2006-01-05 19:17 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Matt Mackall, Arjan van de Ven, Chuck Ebbert, Adrian Bunk,
	Andrew Morton, Ingo Molnar, linux-kernel, Dave Jones,
	Tim Schmielau



On Thu, 5 Jan 2006, Martin Bligh wrote:
> 
> There are tools already around to do this sort of thing as well - "profile
> directed optimization" or whatever they called it. Seems to be fairly commonly
> done with userspace, but not with the kernel. I'm not sure why not ...
> possibly because it's not available for gcc ?

.. and they are totally useless.

The fact is, the last thing we want to do is to ship a magic profile file 
around for each and every release. And that's what we'd have to do to
get consistent and _useful_ performance increases.

That kind of profile-directed stuff is useful mainly for commercial binary 
releases (where the release binary can be guided by a profile file), or 
speciality programs that can tune themselves a few times before running.

A kernel that people recompile themselves simply isn't something where it 
works.

What _would_ work is something that actually CHECKS (and suggests) the 
hints we already have in the kernel. IOW, you could have an automated 
test-bed that runs some reasonable load, and then verifies whether there 
are branches that go only one way that could be annotated as such, or 
whether some annotation is wrong.
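
A rough sketch of such a checker, assuming each annotated branch gets
its own counter record. The struct and function names here are
hypothetical and only illustrate the bookkeeping, not any existing
kernel facility:

```c
#include <stdio.h>

/* Per-branch counters; 'expected' records what the source annotation claims. */
struct branch_stat {
	const char *where;       /* label for the report, e.g. "file.c:123" */
	int expected;            /* 1 for likely(), 0 for unlikely() */
	unsigned long correct;   /* outcomes that matched the annotation */
	unsigned long incorrect; /* outcomes that contradicted it */
};

/* Record one evaluation of an annotated condition and pass its value through. */
int branch_check(struct branch_stat *s, int cond)
{
	cond = !!cond;
	if (cond == s->expected)
		s->correct++;
	else
		s->incorrect++;
	return cond;
}

/* An annotation is suspect when it was wrong more often than right. */
int branch_suspect(const struct branch_stat *s)
{
	return s->incorrect > s->correct;
}

/* Print one line per suspect annotation; return how many were found. */
int report_suspects(const struct branch_stat *stats, int n)
{
	int found = 0;

	for (int i = 0; i < n; i++) {
		if (!branch_suspect(&stats[i]))
			continue;
		printf("%s: annotated %s, but wrong %lu of %lu times\n",
		       stats[i].where,
		       stats[i].expected ? "likely" : "unlikely",
		       stats[i].incorrect,
		       stats[i].correct + stats[i].incorrect);
		found++;
	}
	return found;
}
```

The report is keyed by source location, so the result of a test run is a
list of annotations to fix in the source rather than a binary blob.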

That way the "profile data" actually follows the source code, and is thus 
actually relevant to an open-source project. Because we do _not_ start 
having specially optimized binaries. That's against the whole point of 
being open source and trying to get users to get more deeply involved with 
the project.

			Linus

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 19:17         ` Linus Torvalds
@ 2006-01-05 19:40           ` Linus Torvalds
  2006-01-05 19:49             ` Martin Bligh
  2006-01-05 21:34             ` Matt Mackall
  2006-01-05 21:32           ` Grzegorz Kulewski
  1 sibling, 2 replies; 167+ messages in thread
From: Linus Torvalds @ 2006-01-05 19:40 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Matt Mackall, Arjan van de Ven, Chuck Ebbert, Adrian Bunk,
	Andrew Morton, Ingo Molnar, linux-kernel, Dave Jones,
	Tim Schmielau



On Thu, 5 Jan 2006, Linus Torvalds wrote:
> 
> That way the "profile data" actually follows the source code, and is thus 
> actually relevant to an open-source project. Because we do _not_ start 
> having specially optimized binaries. That's against the whole point of 
> being open source and trying to get users to get more deeply involved with 
> the project.

Btw, having annotations obviously works, although it equally obviously 
will limit the scope of this kind of profile data. You won't get the same 
kind of granularity, and you'd only do the annotations for cases that end 
up being very clear-cut. But having an automated feedback cycle for adding 
(and removing!) annotations should make it pretty maintainable in the long 
run, although the initial annotations might only end up being for really 
core code.

There's a few papers around that claim that programmers are often very 
wrong when they estimate probabilities for different code-paths, and that 
you absolutely need automation to get it right. I believe them. But the 
fact that you need automation doesn't automatically mean that you should 
feed the compiler a profile-data-blob.

You can definitely automate this on a source level too, the same way 
sparse annotations can help find user access problems. 

There's a nice secondary advantage to source code annotations that are 
actively checked: they focus the programmers themselves on the issue. One 
of the biggest advantages (in my opinion) of the "struct xyzzy __user *" 
annotations has actually been that it's much more immediately clear to the 
kernel programmer that it's a user pointer. Many of the bugs we had were 
just the stupid unnecessary ones because it wasn't always obvious.

The same is likely true of rare functions etc. A function that is marked 
"rare" as a hint to the compiler to put it into another segment (and 
perhaps optimize more aggressively for size etc rather than performance) 
is also a big hint to a programmer that he shouldn't care. On the other 
hand, if some branch is marked as "always()", that also tells the 
programmer something real.

So explicit source hints may be more work, but they have tons of 
advantages. Ranging from repeatability and distribution to just the 
programmer being aware of them.

In other projects, maybe people don't care as much about the programmer 
being aware of what's going on - garbage collection etc silent automation 
is all wonderful. In the kernel, I'd rather have people be aware of what 
happens.

		Linus

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 19:40           ` Linus Torvalds
@ 2006-01-05 19:49             ` Martin Bligh
  2006-01-05 20:13               ` Linus Torvalds
  2006-01-05 21:34             ` Matt Mackall
  1 sibling, 1 reply; 167+ messages in thread
From: Martin Bligh @ 2006-01-05 19:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matt Mackall, Arjan van de Ven, Chuck Ebbert, Adrian Bunk,
	Andrew Morton, Ingo Molnar, linux-kernel, Dave Jones,
	Tim Schmielau

Linus Torvalds wrote:
> 
> On Thu, 5 Jan 2006, Linus Torvalds wrote:
> 
>>That way the "profile data" actually follows the source code, and is thus 
>>actually relevant to an open-source project. Because we do _not_ start 
>>having specially optimized binaries. That's against the whole point of 
>>being open source and trying to get users to get more deeply involved with 
>>the project.
> 
> 
> Btw, having annotations obviously works, although it equally obviously 
> will limit the scope of this kind of profile data. You won't get the same 
> kind of granularity, and you'd only do the annotations for cases that end 
> up being very clear-cut. But having an automated feedback cycle for adding 
> (and removing!) annotations should make it pretty maintainable in the long 
> run, although the initial annotations might only end up being for really 
> core code.
> 
> There's a few papers around that claim that programmers are often very 
> wrong when they estimate probabilities for different code-paths, and that 
> you absolutely need automation to get it right. I believe them. But the 
> fact that you need automation doesn't automatically mean that you should 
> feed the compiler a profile-data-blob.

Hmm. if you're just going to do it as binary on/off ...is it not pretty 
trivial to do a crude test implementation by booting the kernel, turning
on profiling, running a bunch of different tests, then marking anything
that never appears at all in profiling as rare?

Not saying it's a good long-term approach, but would it not give us 
enough data to know whether the whole approach was worthwhile? I suspect
(on random gut-feel) we never call over 50% of the functions we have
(an even easier hypothesis to test)
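
The crude binary split could be sketched roughly like this, assuming
the profiler output has already been reduced to one sample count per
function. The struct layout and function names are assumptions made for
illustration:

```c
#include <stddef.h>

/* One entry per kernel function: its name and how often the profiler saw it. */
struct func_sample {
	const char *name;
	unsigned long samples;
};

/*
 * Store the names of never-sampled functions in rare[] (at most max of
 * them) and return the total number of functions with zero samples.
 */
size_t collect_rare(const struct func_sample *funcs, size_t n,
		    const char **rare, size_t max)
{
	size_t found = 0;

	for (size_t i = 0; i < n; i++) {
		if (funcs[i].samples != 0)
			continue;
		if (found < max)
			rare[found] = funcs[i].name;
		found++;
	}
	return found;
}

/* Percentage of functions that never showed up in the profile. */
unsigned int percent_never_sampled(const struct func_sample *funcs, size_t n)
{
	if (n == 0)
		return 0;
	return (unsigned int)(collect_rare(funcs, n, NULL, 0) * 100 / n);
}
```

percent_never_sampled() is the quick check of the "over 50%" guess;
collect_rare() produces the candidate list for marking.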

OTOH, do we have that much to gain anyway in kernel space? all we're 
doing is packing stuff down into the same cacheline or not, isn't it?
As we have all pages pinned in memory, does it matter for any reason
beyond that?

M.

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 19:49             ` Martin Bligh
@ 2006-01-05 20:13               ` Linus Torvalds
  2006-01-05 20:15                 ` Linus Torvalds
  0 siblings, 1 reply; 167+ messages in thread
From: Linus Torvalds @ 2006-01-05 20:13 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Matt Mackall, Arjan van de Ven, Chuck Ebbert, Adrian Bunk,
	Andrew Morton, Ingo Molnar, linux-kernel, Dave Jones,
	Tim Schmielau



On Thu, 5 Jan 2006, Martin Bligh wrote:
> 
> Hmm. if you're just going to do it as binary on/off ...is it not pretty
> trivial to do a crude test implementation by booting the kernel, turning
> on profiling, running a bunch of different tests, then marking anything
> that never appears at all in profiling as rare?

Yes, I think "crude" is exactly where we want to start. It's much easier 
to then make it smarter later.

> Not saying it's a good long-term approach, but would it not give us enough
> data to know whether the whole approach was worthwhile?

Yes. And it's entirely possible that "crude" is perfectly fine even in the 
long run. I suspect this is very much a "5% of the work gets us 80% of the 
benefit", with a _very_ long tail with tons of more work to get very minor 
incremental savings..

> OTOH, do we have that much to gain anyway in kernel space? all we're doing is
> packing stuff down into the same cacheline or not, isn't it?
> As we have all pages pinned in memory, does it matter for any reason
> beyond that?

The cache effects are likely the biggest ones, and no, I don't know how 
much denser it will be in the cache. Especially with a 64-byte one.. 
(although 128 bytes is fairly common too).

There are some situations where we have TLB issues, but those are likely 
cases where we don't care about placement performance anyway (ie they'd
be in situations where you use the page-alloc-debug stuff, which is very 
expensive for _other_ reasons ;)

		Linus

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 20:13               ` Linus Torvalds
@ 2006-01-05 20:15                 ` Linus Torvalds
  2006-01-05 23:30                   ` Ingo Molnar
  2006-01-06  0:50                   ` Mitchell Blank Jr
  0 siblings, 2 replies; 167+ messages in thread
From: Linus Torvalds @ 2006-01-05 20:15 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Matt Mackall, Arjan van de Ven, Chuck Ebbert, Adrian Bunk,
	Andrew Morton, Ingo Molnar, linux-kernel, Dave Jones,
	Tim Schmielau



On Thu, 5 Jan 2006, Linus Torvalds wrote:
> 
> The cache effects are likely the biggest ones, and no, I don't know how 
> much denser it will be in the cache. Especially with a 64-byte one.. 
> (although 128 bytes is fairly common too).

Oh, but validating things like "likely()" and "unlikely()" branch hints 
might be a noticeably bigger issue. 

In user space, placement on the macro level is probably a bigger deal, but 
in the kernel we probably care mostly about just single cachelines and 
about branch prediction/placement.

		Linus

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 19:17         ` Linus Torvalds
  2006-01-05 19:40           ` Linus Torvalds
@ 2006-01-05 21:32           ` Grzegorz Kulewski
  1 sibling, 0 replies; 167+ messages in thread
From: Grzegorz Kulewski @ 2006-01-05 21:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin Bligh, Matt Mackall, Arjan van de Ven, Chuck Ebbert,
	Adrian Bunk, Andrew Morton, Ingo Molnar, linux-kernel, Dave Jones,
	Tim Schmielau

On Thu, 5 Jan 2006, Linus Torvalds wrote:
> On Thu, 5 Jan 2006, Martin Bligh wrote:
>>
>> There are tools already around to do this sort of thing as well - "profile
>> directed optimization" or whatever they called it. Seems to be fairly commonly
>> done with userspace, but not with the kernel. I'm not sure why not ...
>> possibly because it's not available for gcc ?
>
> .. and they are totally useless.
[snip]
>
> A kernel that people recompile themselves simply isn't something where it
> works.
>
> What _would_ work is something that actually CHECKS (and suggests) the
> hints we already have in the kernel. IOW, you could have an automated
> test-bed that runs some reasonable load, and then verifies whether there
> are branches that go only one way that could be annotated as such, or
> whether some annotation is wrong.
>
> That way the "profile data" actually follows the source code, and is thus
> actually relevant to an open-source project. Because we do _not_ start
> having specially optimized binaries. That's against the whole point of
> being open source and trying to get users to get more deeply involved with
> the project.

This is exactly what I have been thinking about for quite some time (for 
my userspace projects).

I would really want to see some program in gcc package that takes source 
code and some profile results (maybe even multiple profile results from 
many different environments and systems under different loads) and tries 
to suggest some annotations for the code like

- __fastpath,
- likely/unlikely,
- inline (normal for performance reasons not some forced inline because 
code breaks if not inlined - this could be marked __always_inline)
- maybe something for small loops on fastpath
- possibly some detection of dead code - if some part of code is never or 
nearly never executed it should be double checked for 1. is it still 
needed? 2. bugs - because if it never executes it is probably not tested 
enough,

and so on. It also should warn if it finds annotations already in the code 
that don't match profile results (especially if it differs greatly).

It could also try to check possible inline scenarios using some heuristics 
and searches. Something like "Ok, if I will inline this function and this 
--- will it improve things [== reduce code size for example] because some 
things will become constants?" This is probably not possible to do at 
compile time because it will make normal compiles much longer. But it 
could be run from time to time or continuously on some automated testing 
farm and then generate reports for programmer to add or remove some 
annotations (and why and how certain the report is about some annotation). 
This way the (not only) linux kernel problem with inlines will be solved 
once and for all.

Also I think that "profile driven optimizations" like currently in gcc are 
evil because the profile is most often biased and incomplete (for example 
the one used in gcc compile itself). Also it can make debugging strange 
problems (that can be caused by a buggy compiler) much harder because 
everyone can have completely different binary code generated by their 
compiler because a completely different profile was used. This way 
reproducing the bug is much harder for the developer, if not nearly 
impossible.

And gcc (like any other compiler) can have some bug that will cause it to 
make some completely stupid decision in some cases, making the program much 
slower. This can't be investigated without looking directly at assembler 
code for several hours, and this is probably not what human programmers 
should be doing. And if we use such a "hinter" and it has some bug and 
tells us that something is likely while it clearly is not, we could easily 
investigate - the whole process becomes transparent and could be well 
managed.

Unfortunately I don't know gcc well enough to add such code to it. But I 
will happily help by testing and reporting bugs if someone wants to 
write such code.... ;-)


Grzegorz Kulewski

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 19:40           ` Linus Torvalds
  2006-01-05 19:49             ` Martin Bligh
@ 2006-01-05 21:34             ` Matt Mackall
  2006-01-05 22:08               ` Linus Torvalds
  1 sibling, 1 reply; 167+ messages in thread
From: Matt Mackall @ 2006-01-05 21:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin Bligh, Arjan van de Ven, Chuck Ebbert, Adrian Bunk,
	Andrew Morton, Ingo Molnar, linux-kernel, Dave Jones,
	Tim Schmielau

On Thu, Jan 05, 2006 at 11:40:08AM -0800, Linus Torvalds wrote:
> 
> 
> On Thu, 5 Jan 2006, Linus Torvalds wrote:
> > 
> > That way the "profile data" actually follows the source code, and is thus 
> > actually relevant to an open-source project. Because we do _not_ start 
> > having specially optimized binaries. That's against the whole point of 
> > being open source and trying to get users to get more deeply involved with 
> > the project.
> 
> Btw, having annotations obviously works, although it equally obviously 
> will limit the scope of this kind of profile data. You won't get the same 
> kind of granularity, and you'd only do the annotations for cases that end 
> up being very clear-cut. But having an automated feedback cycle for adding 
> (and removing!) annotations should make it pretty maintainable in the long 
> run, although the initial annotations might only end up being for really 
> core code.
> 
> There's a few papers around that claim that programmers are often very 
> wrong when they estimate probabilities for different code-paths, and that 
> you absolutely need automation to get it right. I believe them. But the 
> fact that you need automation doesn't automatically mean that you should 
> feed the compiler a profile-data-blob.

I think it's a mistake to interleave this data into the C source. It's
expensive and tedious to change relative to its volatility. What I was
proposing was something like, say, arch/i386/popularity.lst, which
would simply contain a list of the most popular n% of functions sorted
by popularity. As text, of course.
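
As a sketch of how such a list could feed the link step: assuming the
kernel were built with -ffunction-sections (so each function lands in
its own .text.<name> input section) and that the list holds one
function name per line, a small tool could turn it into linker script
lines. The function names and the '#' comment convention below are
assumptions for illustration only:

```c
#include <stdio.h>
#include <string.h>

/*
 * Format one linker-script input-section pattern for a function name.
 * Returns 1 on success, 0 if the result did not fit into buf.
 */
int format_entry(char *buf, size_t len, const char *func)
{
	int n = snprintf(buf, len, "*(.text.%s)", func);

	return n > 0 && (size_t)n < len;
}

/*
 * Read function names (one per line, most popular first) from 'in' and
 * write the matching patterns to 'out' in the same order.
 * Blank lines and lines starting with '#' are skipped.
 * Returns the number of entries written.
 */
int emit_order(FILE *in, FILE *out)
{
	char line[256], entry[300];
	int count = 0;

	while (fgets(line, sizeof(line), in)) {
		line[strcspn(line, "\r\n")] = '\0';
		if (line[0] == '\0' || line[0] == '#')
			continue;
		if (!format_entry(entry, sizeof(entry), line))
			continue;
		fprintf(out, "\t%s\n", entry);
		count++;
	}
	return count;
}
```

The output would be included at the start of the .text output section,
ahead of the catch-all *(.text) pattern, so that the listed functions
are laid out first and next to each other.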

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 21:34             ` Matt Mackall
@ 2006-01-05 22:08               ` Linus Torvalds
  2006-01-05 22:36                 ` Matt Mackall
  0 siblings, 1 reply; 167+ messages in thread
From: Linus Torvalds @ 2006-01-05 22:08 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Martin Bligh, Arjan van de Ven, Chuck Ebbert, Adrian Bunk,
	Andrew Morton, Ingo Molnar, linux-kernel, Dave Jones,
	Tim Schmielau



On Thu, 5 Jan 2006, Matt Mackall wrote:
> 
> I think it's a mistake to interleave this data into the C source. It's
> expensive and tedious to change relative to its volatility.

I don't believe it is actually all _that_ volatile. Yes, it would be a 
huge issue _initially_, but the incremental effects shouldn't be that big, 
or there is something wrong with the approach.

> What I was proposing was something like, say, arch/i386/popularity.lst, 
> which would simply contain a list of the most popular n% of functions 
> sorted by popularity. As text, of course.

I suspect that would certainly work for pure function-based popularity, 
and yes, it has the advantage of being simple (especially for something 
that ends up being almost totally separated from the compiler: if we're 
using this purely to modify link scripts etc with special tools).

But what about the unlikely/likely conditional hints that we currently do 
by hand? How are you going to sanely maintain a list of those without 
doing that in source code?

			Linus

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 22:08               ` Linus Torvalds
@ 2006-01-05 22:36                 ` Matt Mackall
  2006-01-05 22:49                   ` Martin Bligh
  2006-01-05 22:55                   ` Ingo Molnar
  0 siblings, 2 replies; 167+ messages in thread
From: Matt Mackall @ 2006-01-05 22:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin Bligh, Arjan van de Ven, Chuck Ebbert, Adrian Bunk,
	Andrew Morton, Ingo Molnar, linux-kernel, Dave Jones,
	Tim Schmielau

On Thu, Jan 05, 2006 at 02:08:06PM -0800, Linus Torvalds wrote:
> 
> On Thu, 5 Jan 2006, Matt Mackall wrote:
> > 
> > I think it's a mistake to interleave this data into the C source. It's
> > expensive and tedious to change relative to its volatility.
> 
> I don't believe it is actually all _that_ volatile. Yes, it would be a 
> huge issue _initially_, but the incremental effects shouldn't be that big, 
> or there is something wrong with the approach.

No, perhaps not. But it would be nice in theory for people to be able
to do things like profile their production system and relink. And
having to touch hundreds of files to do it would be painful.

> > What I was proposing was something like, say, arch/i386/popularity.lst, 
> > which would simply contain a list of the most popular n% of functions 
> > sorted by popularity. As text, of course.
> 
> I suspect that would certainly work for pure function-based popularity, 
> and yes, it has the advantage of being simple (especially for something 
> that ends up being almost totally separated from the compiler: if we're 
> using this purely to modify link scripts etc with special tools).
> 
> But what about the unlikely/likely conditional hints that we currently do 
> by hand? How are you going to sanely maintain a list of those without 
> doing that in source code?

Dunno. Those bits are all anonymous so marking them in situ is about
the only way to go. But we can do better for whole functions.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 22:36                 ` Matt Mackall
@ 2006-01-05 22:49                   ` Martin Bligh
  2006-01-05 23:02                     ` Matt Mackall
  2006-01-05 22:55                   ` Ingo Molnar
  1 sibling, 1 reply; 167+ messages in thread
From: Martin Bligh @ 2006-01-05 22:49 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Linus Torvalds, Arjan van de Ven, Chuck Ebbert, Adrian Bunk,
	Andrew Morton, Ingo Molnar, linux-kernel, Dave Jones,
	Tim Schmielau


>>>What I was proposing was something like, say, arch/i386/popularity.lst, 
>>>which would simply contain a list of the most popular n% of functions 
>>>sorted by popularity. As text, of course.
>>
>>I suspect that would certainly work for pure function-based popularity, 
>>and yes, it has the advantage of being simple (especially for something 
>>that ends up being almost totally separated from the compiler: if we're 
>>using this purely to modify link scripts etc with special tools).
>>
>>But what about the unlikely/likely conditional hints that we currently do 
>>by hand? How are you going to sanely maintain a list of those without 
>>doing that in source code?
> 
> 
> Dunno. Those bits are all anonymous so marking them in situ is about
> the only way to go. But we can do better for whole functions.

Would also make it easier to rank it as a percentage, or group by
locality of reference to other functions, rather than just a binary
split of "rare" vs "not-rare".

Of course it's all very dependent on workload, which drivers you're 
using too, etc, etc. So a profile that's separate also makes it much
easier to tweak for one machine than the source base in general, which
theoretically represents everyone (and thus has little info ;-)).

Which also makes me think it's easier to mark hot functions than cold
ones, in a more general maintenance sense.

M.


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 22:36                 ` Matt Mackall
  2006-01-05 22:49                   ` Martin Bligh
@ 2006-01-05 22:55                   ` Ingo Molnar
  2006-01-05 23:11                     ` Matt Mackall
  1 sibling, 1 reply; 167+ messages in thread
From: Ingo Molnar @ 2006-01-05 22:55 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Linus Torvalds, Martin Bligh, Arjan van de Ven, Chuck Ebbert,
	Adrian Bunk, Andrew Morton, linux-kernel, Dave Jones,
	Tim Schmielau


* Matt Mackall <mpm@selenic.com> wrote:

> > I don't believe it is actually all _that_ volatile. Yes, it would be a 
> > huge issue _initially_, but the incremental effects shouldn't be that big, 
> > or there is something wrong with the approach.
> 
> No, perhaps not. But it would be nice in theory for people to be able 
> to do things like profile their production system and relink. And 
> having to touch hundreds of files to do it would be painful.

we can (almost) do that: via -ffunction-sections. It does seem to work 
on both the gcc and the ld side. [i tried to use this for --gc-sections 
to save more space, but that ld option seems unstable, i couldn't link a 
bootable image. -ffunction-sections itself seems to work fine in gcc4.]

i think all that is needed to reorder the functions is a build-time 
generated ld script, which is generated off the 'popularity list'.

so i think the two concepts could nicely co-exist: in-source annotations 
help us maintain the popularity list, -ffunction-sections allows us to 
reorder at link time. In fact such a kernel could be shipped in 
'unlinked' state, and could be relinked based on per-system profiling 
data. As long as we have KALLSYMS, it's not even a big debuggability 
issue.
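
A minimal sketch of how such a build-time ld script fragment might be generated from a popularity list. It assumes the kernel is built with -ffunction-sections, so that gcc places each function foo in its own input section named .text.foo, and it relies on ld placing input sections in the order the patterns are listed. The function names and the fragment layout are hypothetical; a real kernel linker script has many more sections to account for.

```python
def make_text_section_order(popular_functions, prefix=".text."):
    """Return ld input-section patterns, hottest functions first.

    popular_functions: iterable of function names sorted by popularity,
    e.g. the lines of a plain-text popularity list. Blank lines,
    '#' comments and duplicates are skipped.
    """
    seen = set()
    patterns = []
    for name in popular_functions:
        name = name.strip()
        if not name or name.startswith("#") or name in seen:
            continue
        seen.add(name)
        patterns.append(f"*({prefix}{name})")
    # Everything that is not on the list falls through to the generic
    # patterns and is placed after the hot functions.
    patterns.append("*(.text)")
    patterns.append(f"*({prefix}*)")
    return patterns


def render_fragment(popular_functions):
    """Render a (hypothetical) .text output-section fragment as text."""
    body = "\n".join("\t\t" + p for p in make_text_section_order(popular_functions))
    return "\t.text : {\n" + body + "\n\t}\n"


if __name__ == "__main__":
    print(render_fragment(["schedule", "kmem_cache_alloc", "sys_gettimeofday"]))
```

Because the list only affects placement, an incomplete or stale list still yields a correct image: unlisted functions simply land after the listed ones.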

the branch-likelihood thing is a separate issue from function-cohesion, 
but could be handled by the same concept: if there was a 
-ffunction-unlikely-sections option in gcc (there's none currently, 
AFAICS), then those could be reordered in a smart way too. (We wouldn't 
get the 8-bit relative jumps back though, gcc would always have to 
assume that the jump is far away.)

	Ingo


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 22:49                   ` Martin Bligh
@ 2006-01-05 23:02                     ` Matt Mackall
  0 siblings, 0 replies; 167+ messages in thread
From: Matt Mackall @ 2006-01-05 23:02 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Linus Torvalds, Arjan van de Ven, Chuck Ebbert, Adrian Bunk,
	Andrew Morton, Ingo Molnar, linux-kernel, Dave Jones,
	Tim Schmielau

On Thu, Jan 05, 2006 at 02:49:21PM -0800, Martin Bligh wrote:
> 
> >>>What I was proposing was something like, say, arch/i386/popularity.lst, 
> >>>which would simply contain a list of the most popular n% of functions 
> >>>sorted by popularity. As text, of course.
> >>
> >>I suspect that would certainly work for pure function-based popularity, 
> >>and yes, it has the advantage of being simple (especially for something 
> >>that ends up being almost totally separated from the compiler: if we're 
> >>using this purely to modify link scripts etc with special tools).
> >>
> >>But what about the unlikely/likely conditional hints that we currently do 
> >>by hand? How are you going to sanely maintain a list of those without 
> >>doing that in source code?
> >
> >
> >Dunno. Those bits are all anonymous so marking them in situ is about
> >the only way to go. But we can do better for whole functions.
> 
> Would also make it easier to rank it as a percentage, or group by
> locality of reference to other functions, rather than just a binary
> split of "rare" vs "not-rare".
> 
> Of course it's all very dependent on workload, which drivers you're 
> using too, etc, etc. So a profile that's separate also makes it much
> easier to tweak for one machine than the source base in general, which
> theoretically represents everyone (and thus has little info ;-)).
> 
> Which also makes me think it's easier to mark hot functions than cold
> ones, in a more general maintenance sense.

Yes, I think it definitely makes sense to think in terms of hot
functions. We surely have a nice long tail on the popularity
distribution and only the first 5% or so are actually worth sorting
and packing.

-- 
Mathematics is the supreme nostalgia of our time.


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 22:55                   ` Ingo Molnar
@ 2006-01-05 23:11                     ` Matt Mackall
  2006-01-05 23:27                       ` Jesse Barnes
  2006-01-05 23:58                       ` Ingo Molnar
  0 siblings, 2 replies; 167+ messages in thread
From: Matt Mackall @ 2006-01-05 23:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Martin Bligh, Arjan van de Ven, Chuck Ebbert,
	Adrian Bunk, Andrew Morton, linux-kernel, Dave Jones,
	Tim Schmielau

On Thu, Jan 05, 2006 at 11:55:13PM +0100, Ingo Molnar wrote:
> 
> * Matt Mackall <mpm@selenic.com> wrote:
> 
> > > I don't believe it is actually all _that_ volatile. Yes, it would be a 
> > > huge issue _initially_, but the incremental effects shouldn't be that big, 
> > > or there is something wrong with the approach.
> > 
> > No, perhaps not. But it would be nice in theory for people to be able 
> > to do things like profile their production system and relink. And 
> > having to touch hundreds of files to do it would be painful.
> 
> we can (almost) do that: via -ffunction-sections. It does seem to work 
> on both the gcc and the ld side. [i tried to use this for --gc-sections 
> to save more space, but that ld option seems unstable, i couldn't link a 
> bootable image. -ffunction-sections itself seems to work fine in gcc4.]

Yeah, we've been talking about --gc-sections for years. It'd be nice
if we could work the build system in that direction with this
profiling concept.

(I suspect something silly happened in your test like dropping the
fixup table, btw.)

> i think all that is needed to reorder the functions is a build-time 
> generated ld script, which is generated off the 'popularity list'.
> 
> so i think the two concepts could nicely co-exist: in-source annotations 
> help us maintain the popularity list, -ffunction-sections allows us to 
> reorder at link time. In fact such a kernel could be shipped in 
> 'unlinked' state, and could be relinked based on per-system profiling 
> data. As long as we have KALLSYMS, it's not even a big debuggability 
> issue.

I'm still not sure about in-source annotations for popularity. My
suspicion is that it's just too workload-dependent, and a given
author's workload will likely be biased towards their code.

-- 
Mathematics is the supreme nostalgia of our time.


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 23:11                     ` Matt Mackall
@ 2006-01-05 23:27                       ` Jesse Barnes
  2006-01-05 23:58                       ` Ingo Molnar
  1 sibling, 0 replies; 167+ messages in thread
From: Jesse Barnes @ 2006-01-05 23:27 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Ingo Molnar, Linus Torvalds, Martin Bligh, Arjan van de Ven,
	Chuck Ebbert, Adrian Bunk, Andrew Morton, linux-kernel,
	Dave Jones, Tim Schmielau

On Thursday, January 5, 2006 3:11 pm, Matt Mackall wrote:
> I'm still not sure about in-source annotations for popularity. My
> suspicion is that it's just too workload-dependent, and a given
> author's workload will likely be biased towards their code.

To some extent that's true, but like Linus implied with his "5% work gets 
us 80% there" I think there are a ton of obvious cases, e.g. kmalloc, 
alloc_pages, interrupt handling, etc. that could be marked right away 
and put into a frequently used section.

Jesse


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 20:15                 ` Linus Torvalds
@ 2006-01-05 23:30                   ` Ingo Molnar
  2006-01-05 23:54                     ` Linus Torvalds
  2006-01-06  0:02                     ` Martin Bligh
  2006-01-06  0:50                   ` Mitchell Blank Jr
  1 sibling, 2 replies; 167+ messages in thread
From: Ingo Molnar @ 2006-01-05 23:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin Bligh, Matt Mackall, Arjan van de Ven, Chuck Ebbert,
	Adrian Bunk, Andrew Morton, linux-kernel, Dave Jones,
	Tim Schmielau


* Linus Torvalds <torvalds@osdl.org> wrote:

> On Thu, 5 Jan 2006, Linus Torvalds wrote:
> > 
> > The cache effects are likely the biggest ones, and no, I don't know how 
> > much denser it will be in the cache. Especially with a 64-byte one.. 
> > (although 128 bytes is fairly common too).
> 
> Oh, but validating things like "likely()" and "unlikely()" branch 
> hints might be a noticeably bigger issue.

i frequently validate branches in performance-critical kernel code like 
the scheduler (and the mutex code ;), via instruction-granularity 
profiling, driven by a high-frequency (10-100 kHz) NMI interrupt. A bad 
branch layout shows up pretty clearly in annotated assembly listings:

c037313c:     1715 <schedule>:
c037313c:     1715 	55                   	push   %ebp
c037313d:      264 	b8 00 f0 ff ff       	mov    $0xfffff000,%eax
c0373142:      150 	89 e5                	mov    %esp,%ebp
c0373144:        0 	57                   	push   %edi
c0373145:      852 	56                   	push   %esi
c0373146:      215 	53                   	push   %ebx
c0373147:        0 	83 ec 30             	sub    $0x30,%esp
c037314a:      184 	21 e0                	and    %esp,%eax
c037314c:      120 	8b 10                	mov    (%eax),%edx
c037314e:        0 	83 ba 84 00 00 00 00 	cmpl   $0x0,0x84(%edx)
c0373155:       83 	75 2b                	jne    c0373182 <schedule+0x46>
c0373157:      104 	8b 48 14             	mov    0x14(%eax),%ecx
c037315a:       39 	f7 c1 ff ff ff ef    	test   $0xefffffff,%ecx
c0373160:      112 	74 20                	je     c0373182 <schedule+0x46>
c0373162:        0 	ff b2 9c 00 00 00    	pushl  0x9c(%edx)
c0373168:        0 	8d 82 a4 01 00 00    	lea    0x1a4(%edx),%eax
c037316e:        0 	51                   	push   %ecx
c037316f:        0 	50                   	push   %eax
c0373170:        0 	68 7e 0e 39 c0       	push   $0xc0390e7e
c0373175:        0 	e8 a3 36 da ff       	call   c011681d <printk>
c037317a:        0 	e8 48 03 d9 ff       	call   c01034c7 <dump_stack>
c037317f:        0 	83 c4 10             	add    $0x10,%esp
c0373182:      323 	8b 55 04             	mov    0x4(%ebp),%edx
c0373185:        5 	b8 02 00 00 00       	mov    $0x2,%eax
c037318a:        0 	e8 b3 3f da ff       	call   c0117142 <profile_hit>
c037318f:      349 	b8 00 f0 ff ff       	mov    $0xfffff000,%eax
c0373194:      880 	21 e0                	and    %esp,%eax
c0373196:        0 	8b 00                	mov    (%eax),%eax
c0373198:        0 	89 45 d4             	mov    %eax,0xffffffd4(%ebp)
c037319b:      440 	83 78 14 00          	cmpl   $0x0,0x14(%eax)
c037319f:        5 	78 05                	js     c03731a6 <schedule+0x6a>

the second column is the number of profiler hits. As you can see, the 
branch at c0373160 is always taken, and there's a hole of 32 bytes in 
the instruction stream. It is relatively easy to identify the 
likely/unlikely candidates for various workloads. (It would probably be 
even better to have a visual tool that also associates the source code 
with the data.)

i've seen a lot of such profiles on a lot of different workloads, and my 
guesstimate would be that with 'perfect' likely/unlikely hints, and with 
'perfect' function ordering, we could roughly halve (!) the current 
icache footprint of the kernel on complex workloads too.

Especially with 64 or 128 byte L1 cachelines our codepaths are really 
fragmented and we can easily have 3-4 times the optimal icache 
footprint, for a given syscall. We very often have cruft in the hotpath, 
and we often have functions that belong together ripped apart by things 
like e.g. __sched annotators. I haven't seen many cases of wrongly judged 
likely/unlikely hints; what typically happens is that there's no 
annotation and the default compiler guess is wrong.

the dcache footprint of the kernel is much better, mostly because it's 
so easy to control it in C. The icache footprint is a lot more elusive.  
(and also a lot more critical to execution speed on nontrivial workloads)

so i think there are two major focus areas to improve our icache 
footprint:

 - reduce code size
 - reduce fragmentation of the codepath

fortunately both are hard and technically challenging projects, and both 
will improve the icache footprint - and they will also bring other 
benefits. [ We usually have much more problems with the easy and boring 
stuff ;-) ]

icache fragmentation reduction is also hard because it has to deal with 
fundamentally conflicting constraints: one workload's ideal ordering is 
different from another workload's ideal ordering, and such workloads can 
be superimposed on a system.

I think the only sane solution [that would be endorsed by distributions] 
is to allow users to reorder function sections at runtime (per boot). That 
is a lot faster and more robust (from a production POV) than a full 
recompilation of the kernel. Recompilation is always risky, it needs too 
much context, and has too many tool dependencies - and is thus currently 
untestable. And we don't really need a recompilation of the kernel 
technically - we need a relinking. Relinking is much safer from a 
testability POV: reordering of the functions doesn't change their 
internal instruction sequence or their interactions.

and we could use mcount() to gather function-cohesion data at runtime. The 
mcount() call could be patched out of the image at runtime, when no data 
gathering is happening. Given that the average function size is ~100 
bytes, and an mcount call costs 5 bytes, the overhead would be +5% of 
size and an extra 5-byte NOP per function. That's not good, but it is 
still at least an order of magnitude smaller than the possible gain in 
icache footprint. (Also, people could run mcount()-less kernels as well, 
once the data has been gathered, and the relink was done.)
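
The size estimate above is plain arithmetic. A minimal sketch, taking the ~100-byte average function and the 5-byte mcount call from the text as inputs (the function name is hypothetical):

```python
def mcount_size_overhead(avg_function_bytes=100, mcount_call_bytes=5):
    """Fraction of extra .text caused by one mcount call
    (or an equally sized NOP once patched out) per function."""
    return mcount_call_bytes / avg_function_bytes


if __name__ == "__main__":
    # With the defaults from the text this is 5 bytes per 100-byte function.
    print(f"{mcount_size_overhead():.0%}")
```

The estimate scales inversely with function size, so a kernel with smaller average functions would pay proportionally more.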

one problem is modules, though - they could only be reordered within 
themselves. On an average system which has ~100 modules loaded, the 
average icache fragmentation is +100*128/2 == 6.4K [with 128 byte L1 
cachelines], which can be significant (depending on the workload). OTOH, 
modules do have strong internal cohesion - they contain functions that 
belong together conceptually. So by reordering functions within modules 
we'll likely be able to realize most of the icache savings possible. The 
only exception would be workloads that utilize many modules at a high 
frequency. Such workloads will likely trash the icache anyway.
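
The 6.4K figure can be reproduced with a small sketch. It assumes (as the estimate above implicitly does) that each module's text ends at an effectively random offset within a cacheline, so on average half a line per module is wasted; the function name is hypothetical:

```python
def module_fragmentation_bytes(n_modules=100, cacheline_bytes=128):
    """Expected icache bytes lost to partially used cachelines,
    at half a cacheline per loaded module on average."""
    return n_modules * cacheline_bytes // 2


if __name__ == "__main__":
    for line_size in (64, 128):
        print(line_size, module_fragmentation_bytes(cacheline_bytes=line_size))
```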

	Ingo


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 23:30                   ` Ingo Molnar
@ 2006-01-05 23:54                     ` Linus Torvalds
  2006-01-06  0:15                       ` Ingo Molnar
  2006-01-06  0:02                     ` Martin Bligh
  1 sibling, 1 reply; 167+ messages in thread
From: Linus Torvalds @ 2006-01-05 23:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Martin Bligh, Matt Mackall, Arjan van de Ven, Chuck Ebbert,
	Adrian Bunk, Andrew Morton, linux-kernel, Dave Jones,
	Tim Schmielau



On Fri, 6 Jan 2006, Ingo Molnar wrote:
> 
> i frequently validate branches in performance-critical kernel code like 
> the scheduler (and the mutex code ;), via instruction-granularity 
> profiling, driven by a high-frequency (10-100 kHz) NMI interrupt. A bad 
> branch layout shows up pretty clearly in annotated assembly listings:

Yes, but we only do this for routines that we look at anyway.

Also, the profiles can be misleading at times: you often get instructions 
with zero hits, because they always schedule together with another 
instruction. So parsing things and then matching them up (correctly) with 
the source code in order to annotate them is probably pretty nontrivial.

But starting with the code-paths that get literally zero profile events is 
definitely the way to go.

> Especially with 64 or 128 byte L1 cachelines our codepaths are really 
> fragmented and we can easily have 3-4 times the optimal icache 
> footprint, for a given syscall. We very often have cruft in the hotpath, 
> and we often have functions that belong together ripped apart by things 
> like e.g. __sched annotators. I haven't seen many cases of wrongly judged 
> likely/unlikely hints; what typically happens is that there's no 
> annotation and the default compiler guess is wrong.

We don't have likely()/unlikely() that often, and at least in my case it's 
partly because the syntax is a pain (it would probably have been better to 
include the "if ()" part in the syntax - the millions of parentheses just 
drive me wild).

So yeah, we tend to put likely/unlikely only on really obvious stuff, and 
only on functions where we think about it. So we probably don't get it 
wrong that often.

> the dcache footprint of the kernel is much better, mostly because it's 
> so easy to control it in C. The icache footprint is a lot more elusive.  
> (and also a lot more critical to execution speed on nontrivial workloads)
> 
> so i think there are two major focus areas to improve our icache 
> footprint:
> 
>  - reduce code size
>  - reduce fragmentation of the codepath
> 
> fortunately both are hard and technically challenging projects

That's an interesting use of "fortunately". I tend to prefer the form 
where it means "fortunately, we can trivially fix this with a two-line 
solution that is obviously correct" ;)

		Linus


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 23:11                     ` Matt Mackall
  2006-01-05 23:27                       ` Jesse Barnes
@ 2006-01-05 23:58                       ` Ingo Molnar
  1 sibling, 0 replies; 167+ messages in thread
From: Ingo Molnar @ 2006-01-05 23:58 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Linus Torvalds, Martin Bligh, Arjan van de Ven, Chuck Ebbert,
	Adrian Bunk, Andrew Morton, linux-kernel, Dave Jones,
	Tim Schmielau


* Matt Mackall <mpm@selenic.com> wrote:

> > so i think the two concepts could nicely co-exist: in-source annotations 
> > help us maintain the popularity list, -ffunction-sections allows us to 
> > reorder at link time. In fact such a kernel could be shipped in 
> > 'unlinked' state, and could be relinked based on per-system profiling 
> > data. As long as we have KALLSYMS, it's not even a big debuggability 
> > issue.
> 
> I'm still not sure about in-source annotations for popularity. My 
> suspicion is that it's just too workload-dependent, and a given 
> author's workload will likely be biased towards their code.

in-source annotations can do more:

- inlines could be driven by profile data: if a function is used in a 
  hot path and it's used only once, it makes sense to inline that 
  function into that hot path - because the kernel size increase will be 
  in the cold portion.

- we could drive the likely/unlikely annotations via profiling data.

OTOH i think that _most_ of the benefit (80% :-) could be achieved via 
the much simpler (and more robust) link-time-reordering solution. It is 
also a lot less intrusive, and can still be presented in some plain-text 
format that can be distributed along with the upstream kernel:

	linux/profiles/webserver.list
	linux/profiles/database-server.list
	linux/profiles/desktop.list
	linux/profiles/beowulf-node.list

and users could pick their profile at build time / relink time. There is 
no binary compatibility problem with such plaintext lists - they don't 
have to be fully complete, nor do they have to be fully accurate - they 
don't impact correctness in any way. In fact profiles could merge all 
architectures into one file, so they would be pretty generic as well.

	Ingo


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 23:30                   ` Ingo Molnar
  2006-01-05 23:54                     ` Linus Torvalds
@ 2006-01-06  0:02                     ` Martin Bligh
  2006-01-06  0:40                       ` Ingo Molnar
  1 sibling, 1 reply; 167+ messages in thread
From: Martin Bligh @ 2006-01-06  0:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Matt Mackall, Arjan van de Ven, Chuck Ebbert,
	Adrian Bunk, Andrew Morton, linux-kernel, Dave Jones,
	Tim Schmielau

> I think the only sane solution [that would be endorsed by distributions] 
> is to allow users to reorder function sections at runtime (per boot). That 
> is a lot faster and more robust (from a production POV) than a full 
> recompilation of the kernel. Recompilation is always risky, it needs too 
> much context, and has too many tool dependencies - and is thus currently 
> untestable. 

<smhuch> - the sound of my eyeballs popping out and splatting against 
the opposite wall.

So ... recompilation is not testable, but boot time reordering of the 
code somehow is? ;-) Yes, I understand the distro toolchain issues, but 
it's still a scary solution ...

Personally, I'd think the sane thing is not to try to optimise by 
workload, but get 80% of the benefit by just reordering on a more 
generalized workload. Doing boot-time reordering for this on non-custom 
kernels just seems terrifying .. it's not that huge a benefit, surely?

> one problem is modules, though - they could only be reordered within 
> themselves. On an average system which has ~100 modules loaded, the 
> average icache fragmentation is +100*128/2 == 6.4K [with 128 byte L1 
> cachelines], which can be significant (depending on the workload). OTOH, 
> modules do have strong internal cohesion - they contain functions that 
> belong together conceptually. So by reordering functions within modules 
> we'll likely be able to realize most of the icache savings possible. The 
> only exception would be workloads that utilize many modules at a high 
> frequency. Such workloads will likely trash the icache anyway.

I was thinking about that with modules earlier, and whether modular 
kernels would actually be faster because of that than a statically 
compiled one. But don't you get similar effects from the .o groupings by 
file we get? or does the linker not preserve those groupings?

M.


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 23:54                     ` Linus Torvalds
@ 2006-01-06  0:15                       ` Ingo Molnar
  2006-01-06  0:27                         ` Linus Torvalds
  0 siblings, 1 reply; 167+ messages in thread
From: Ingo Molnar @ 2006-01-06  0:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin Bligh, Matt Mackall, Arjan van de Ven, Chuck Ebbert,
	Adrian Bunk, Andrew Morton, linux-kernel, Dave Jones,
	Tim Schmielau


* Linus Torvalds <torvalds@osdl.org> wrote:

> Also, the profiles can be misleading at times: you often get 
> instructions with zero hits, because they always schedule together 
> with another instruction. So parsing things and then matching them up 
> (correctly) with the source code in order to annotate them is probably 
> pretty nontrivial.

yeah, but schedules-together isn't a big problem in terms of branch 
predictions: unused branches really stick out with their zero counters.  
Especially if there are enough profiling hits, it's usually a quick glance 
to figure out the hotpath:

c0119e1f:   582904 <sys_gettimeofday>:
c0119e1f:   582904 	57                   	push   %edi
c0119e20:   312621 	56                   	push   %esi
c0119e21:       29 	53                   	push   %ebx
c0119e22:        0 	50                   	push   %eax
c0119e23:   285471 	50                   	push   %eax
c0119e24:       15 	8b 74 24 18          	mov    0x18(%esp),%esi
c0119e28:       21 	8b 7c 24 1c          	mov    0x1c(%esp),%edi
c0119e2c:   325688 	89 f0                	mov    %esi,%eax
c0119e2e:       26 	89 fa                	mov    %edi,%edx
c0119e30:        0 	e8 86 fe ff ff       	call   c0119cbb <timeofday_API_hacks>
c0119e35:   377758 	83 f8 01             	cmp    $0x1,%eax
c0119e38:   384539 	75 3f                	jne    c0119e79 <sys_gettimeofday+0x5a>
c0119e3a:        0 	85 f6                	test   %esi,%esi
c0119e3c:        0 	74 19                	je     c0119e57 <sys_gettimeofday+0x38>
c0119e3e:        0 	89 e0                	mov    %esp,%eax
c0119e40:        0 	e8 4b c6 fe ff       	call   c0106490 <do_gettimeofday>
c0119e45:        0 	b9 08 00 00 00       	mov    $0x8,%ecx
c0119e4a:        0 	89 f0                	mov    %esi,%eax
c0119e4c:        0 	89 e2                	mov    %esp,%edx
c0119e4e:        0 	e8 3e f2 0b 00       	call   c01d9091 <copy_to_user>
c0119e53:        0 	85 c0                	test   %eax,%eax
c0119e55:        0 	75 19                	jne    c0119e70 <sys_gettimeofday+0x51>
c0119e57:        0 	85 ff                	test   %edi,%edi
c0119e59:        0 	74 1c                	je     c0119e77 <sys_gettimeofday+0x58>
c0119e5b:        0 	b9 08 00 00 00       	mov    $0x8,%ecx
c0119e60:        0 	ba 88 3e 53 c0       	mov    $0xc0533e88,%edx
c0119e65:        0 	89 f8                	mov    %edi,%eax
c0119e67:        0 	e8 25 f2 0b 00       	call   c01d9091 <copy_to_user>
c0119e6c:        0 	85 c0                	test   %eax,%eax
c0119e6e:        0 	74 07                	je     c0119e77 <sys_gettimeofday+0x58>
c0119e70:        0 	b8 f2 ff ff ff       	mov    $0xfffffff2,%eax
c0119e75:        0 	eb 02                	jmp    c0119e79 <sys_gettimeofday+0x5a>
c0119e77:        0 	31 c0                	xor    %eax,%eax
c0119e79:      308 	5e                   	pop    %esi
c0119e7a:   749654 	5f                   	pop    %edi
c0119e7b:   415831 	5b                   	pop    %ebx
c0119e7c:      744 	5e                   	pop    %esi
c0119e7d:   361201 	5f                   	pop    %edi
c0119e7e:   373195 	c3                   	ret    

here at the top you can see that the CPU is a nice 3-issue design and 
that in this workload the branch at c0119e38 is always taken and the 
function returns afterwards. A branch instruction followed by more than 2 
zero profile-count instructions (that are not jumps) is a good sign of 
an always-taken branch. This would be a pretty strong heuristic as well i 
think. We could really make the requirement be 'zero profiling hits', 
and the branch instruction would have to get 'enough' hits, to conclude 
that the branch is a candidate for likely/unlikely.
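
A rough sketch of how that heuristic might be automated, using a few lines of the schedule() listing from earlier in this thread as input. It is one possible reading of the rule, not a finished tool: walk forward from a conditional branch while the hit count stays at zero, count the instructions that are not jumps, and flag the branch if it has 'enough' hits and more than 2 such instructions follow. The listing format (address, hits, opcode bytes, mnemonic), the min_hits threshold and all function names are assumptions.

```python
import re

# address: hits  opcode-bytes  mnemonic operands
LINE_RE = re.compile(
    r"^([0-9a-f]+):\s+(\d+)\s+((?:[0-9a-f]{2}\s)+)\s*([a-z]\w*)\s*(.*)$"
)

SAMPLE = """\
c0373155:       83  75 2b                 jne    c0373182 <schedule+0x46>
c0373157:      104  8b 48 14              mov    0x14(%eax),%ecx
c037315a:       39  f7 c1 ff ff ff ef     test   $0xefffffff,%ecx
c0373160:      112  74 20                 je     c0373182 <schedule+0x46>
c0373162:        0  ff b2 9c 00 00 00     pushl  0x9c(%edx)
c0373168:        0  8d 82 a4 01 00 00     lea    0x1a4(%edx),%eax
c037316e:        0  51                    push   %ecx
c037316f:        0  50                    push   %eax
c0373170:        0  68 7e 0e 39 c0        push   $0xc0390e7e
c0373175:        0  e8 a3 36 da ff        call   c011681d <printk>
c037317a:        0  e8 48 03 d9 ff        call   c01034c7 <dump_stack>
c037317f:        0  83 c4 10              add    $0x10,%esp
c0373182:      323  8b 55 04              mov    0x4(%ebp),%edx
"""


def parse_listing(text):
    """Return (address, hits, mnemonic, operands) for each instruction line.
    Lines that do not look like instructions (e.g. symbol headers) are skipped."""
    insns = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            addr, hits, _opcode_bytes, mnemonic, operands = m.groups()
            insns.append((addr, int(hits), mnemonic, operands.strip()))
    return insns


def find_branch_candidates(insns, min_hits=50, min_zero_run=3):
    """Flag conditional branches whose fall-through path never gets a hit."""
    candidates = []
    for i, (addr, hits, mnemonic, _operands) in enumerate(insns):
        is_cond_branch = mnemonic.startswith("j") and mnemonic != "jmp"
        if not is_cond_branch or hits < min_hits:
            continue
        zero_non_jumps = 0
        for _addr, next_hits, next_mnemonic, _ops in insns[i + 1:]:
            if next_hits != 0:
                break  # end of the zero-hit hole
            if not next_mnemonic.startswith("j"):
                zero_non_jumps += 1
        if zero_non_jumps >= min_zero_run:
            candidates.append((addr, mnemonic, hits, zero_non_jumps))
    return candidates


if __name__ == "__main__":
    for cand in find_branch_candidates(parse_listing(SAMPLE)):
        print(cand)
```

On this excerpt only the je at c0373160 is flagged; the jne at c0373155 is not, because the instruction right after it gets hits. Mapping a flagged address back to the source line that needs the likely/unlikely annotation is the harder part, and is not attempted here.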

	Ingo


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-06  0:15                       ` Ingo Molnar
@ 2006-01-06  0:27                         ` Linus Torvalds
  2006-01-06  0:54                           ` Ingo Molnar
  0 siblings, 1 reply; 167+ messages in thread
From: Linus Torvalds @ 2006-01-06  0:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Martin Bligh, Matt Mackall, Arjan van de Ven, Chuck Ebbert,
	Adrian Bunk, Andrew Morton, linux-kernel, Dave Jones,
	Tim Schmielau



On Fri, 6 Jan 2006, Ingo Molnar wrote:
>
> Especially if there are enough profiling hits, it's usually a quick glance 
> to figure out the hotpath:

Ehh. What's a "quick glance" to a human can be quite hard to automate. 
That's my point.

If we do the "human quick glances", we won't be seeing much come out of 
this. That's what we've already been doing, for several years.

I thought the discussion was about trying to automate this..

		Linus


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-06  0:02                     ` Martin Bligh
@ 2006-01-06  0:40                       ` Ingo Molnar
  2006-01-06  0:55                         ` Martin Bligh
  0 siblings, 1 reply; 167+ messages in thread
From: Ingo Molnar @ 2006-01-06  0:40 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Linus Torvalds, Matt Mackall, Arjan van de Ven, Chuck Ebbert,
	Adrian Bunk, Andrew Morton, linux-kernel, Dave Jones,
	Tim Schmielau


* Martin Bligh <mbligh@mbligh.org> wrote:

> >I think the only sane solution [that would be endorsed by distributions] 
> >is to allow users to reorder function sections at runtime (per boot). That 
> >is a lot faster and more robust (from a production POV) than a full 
> >recompilation of the kernel. Recompilation is always risky, it needs too 
> >much context, and has too many tool dependencies - and is thus currently 
> >untestable. 
> 
> <smhuch> - the sound of my eyeballs popping out and splatting against 
> the opposite wall.
> 
> So ... recompilation is not testable, but boot time reordering of the 
> code somehow is? ;-) Yes, I understand the distro toolchain issues, 
> but it's still a scary solution ...

'testable' in the sense that the stability and reliability of the system 
is a lot less dependent on function ordering, than it is dependent on 
compilation. It's not 'testable' in the sense of being able to cycle 
through all the 60000-factorial function combinations - but that's not 
really a problem, as long as the risk of the worst-case effect of 
function reordering can be judged and covered. The kernel stopped being 
'fully testable' sometime around version 0.2 already ;-)

> Personally, I'd think the sane thing is not to try to optimise by 
> workload, but get 80% of the benefit by just reordering on a more 
> generalized workload. Doing boot-time reordering for this on 
> non-custom kernels just seems terrifying .. it's not that huge a 
> benefit, surely?

i'm not so sure, see the ballpark figures below.

> >one problem is modules, though - they could only be reordered within 
> >themselves. On an average system which has ~100 modules loaded, the 
> >average icache fragmentation is +100*128/2 == 6.4K [with 128 byte L1 
> >cachelines], which can be significant (depending on the workload). OTOH, 
> >modules do have strong internal cohesion - they contain functions that 
> >belong together conceptually. So by reordering functions within modules 
> >we'll likely be able to realize most of the icache savings possible. The 
> >only exception would be workloads that utilize many modules at a high 
> >frequency. Such workloads will likely trash the icache anyway.
> 
> I was thinking about that with modules earlier, and whether modular 
> kernels would actually be faster because of that than a statically 
> compiled one. But don't you get similar effects from the .o groupings 
> by file we get? or does the linker not preserve those groupings?

they are mostly preserved (except for things like __sched which move 
functions out of .o), but look at a call-chain like this:

| new stack-footprint maximum: khelper/1125, 2412 bytes (out of 8140 bytes).
------------|
{   40} [<c0144a31>] debug_stackoverflow+0xb6/0xc4
{   60} [<c0144f66>] __mcount+0x47/0xe0
{   20} [<c0110b24>] mcount+0x14/0x18
{  116} [<c05fec5e>] __down_mutex+0xe/0x87b
{   28} [<c0601544>] _spin_lock+0x24/0x49
{   36} [<c015aa7e>] kmem_cache_alloc+0x40/0xe9
{   16} [<c0154ddd>] mempool_alloc_slab+0x1d/0x1f
{   56} [<c0154c82>] mempool_alloc+0x39/0xf1
{   44} [<c02f3519>] get_request+0xf7/0x2dc
{   60} [<c02f372d>] get_request_wait+0x2f/0x10f
{   88} [<c02f43bc>] __make_request+0xae/0x552
{   88} [<c02f4b9b>] generic_make_request+0xaf/0x24c
{   84} [<c02f4d90>] submit_bio+0x58/0x127
{   20} [<c0196912>] mpage_bio_submit+0x31/0x39
{  292} [<c0196d4e>] do_mpage_readpage+0x2f5/0x451
{   88} [<c0196f99>] mpage_readpages+0xef/0x19e
{   24} [<c01dfeb4>] ext3_readpages+0x2c/0x2e
{   80} [<c0158b4b>] read_pages+0x38/0x14d
{   60} [<c0158de1>] __do_page_cache_readahead+0x181/0x186
{   36} [<c0158f40>] blockable_page_cache_readahead+0x69/0xd0
{   44} [<c015918e>] page_cache_readahead+0x13c/0x185
{  120} [<c0151abf>] do_generic_mapping_read+0x436/0x6c3
{   80} [<c0151fc8>] __generic_file_aio_read+0x17f/0x22e
{   44} [<c01520c8>] generic_file_aio_read+0x51/0x7f
{  160} [<c0171091>] do_sync_read+0xba/0x109
{   40} [<c017119e>] vfs_read+0xbe/0x1a0
{   40} [<c017c5fa>] kernel_read+0x4b/0x55
{  192} [<c019f844>] load_elf_binary+0x5e9/0xd2d
{   32} [<c017d4d1>] search_binary_handler+0x80/0x136
{  176} [<c019e87e>] load_script+0x246/0x258
{   32} [<c017d4d1>] search_binary_handler+0x80/0x136
{   36} [<c017d7c2>] do_execve+0x23b/0x268
{   36} [<c0101c4e>] sys_execve+0x41/0x8f
{=2368} [<c01030ca>] syscall_call+0x7/0xb
<---------------------------

[readers beware, quick & dirty and possibly wrong guesstimations: ]

these 30 functions involve roughly 10-15 .o files, so we've got 2-3 
functions per .o file only. And that's typical for VFS, IO, networking 
and lots of other syscall types. So if the full kernel image is 3MB, and 
we've got 30 functions totalling to 3000 bytes, they are spread out in 
10-15 groups right now - creating 10-15 split icache lines. (in reality 
we have in excess of 20 split icache-lines, due to weak cohesion even 
within .o files) With 64-byte lines that's 320-480 bytes 'lost' due to 
fragmentation alone, with 128-byte lines it's 640-960 bytes - which is 
10%-21% in the 64-byte case, and 21%-42% in the 128-byte case. I.e. the 
icache bloat just due to the placement is quite significant. Adding 
.o-level fragmentation plus inter-function inactive code to the mix can 
easily balloon this even higher. Plus the current method of doing 
unlikely() means the unlikely instructions are near the end of the 
function - so they still 'spread apart' the footprint and thus have a 
nontrivial icache cost.
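
The guesstimate above can be reproduced with a few lines of C. This is only a sketch under the same assumption the text makes: each separately placed group of hot code wastes half a cache line on average. The helper names frag_bytes() and frag_percent() are made up for illustration.

```c
/*
 * Expected bytes wasted when `groups` separately placed chunks of hot
 * code each start at an arbitrary offset within a cache line: on
 * average half a line is wasted per group (assumption from the text).
 */
static unsigned int frag_bytes(unsigned int groups, unsigned int line_size)
{
	return groups * line_size / 2;
}

/* The same waste, as a percentage of the hot code footprint. */
static double frag_percent(unsigned int groups, unsigned int line_size,
			   unsigned int hot_bytes)
{
	return 100.0 * frag_bytes(groups, line_size) / hot_bytes;
}
```

For the 3000-byte call chain, frag_bytes() gives 320-480 bytes for 10-15 groups with 64-byte lines and 640-960 bytes with 128-byte lines, i.e. roughly 10.7%-16% and 21.3%-32%. The 21% and 42% upper bounds quoted above match the ~20 split lines mentioned in the parenthesis (640 and 1280 bytes). The module figure quoted earlier is the same formula: frag_bytes(100, 128) is 6400 bytes.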

[ now the above one is a random example out of my logs that is also a 
  really bad example: in reality it would win little from better icache 
  footprint: an execve() takes 100,000-200,000 cycles even when
  everything comes from the pagecache, and most of those cycles are
  spent in very tight codepaths. ]

	Ingo



* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-05 20:15                 ` Linus Torvalds
  2006-01-05 23:30                   ` Ingo Molnar
@ 2006-01-06  0:50                   ` Mitchell Blank Jr
  2006-01-06  0:58                     ` Ingo Molnar
  1 sibling, 1 reply; 167+ messages in thread
From: Mitchell Blank Jr @ 2006-01-06  0:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin Bligh, Matt Mackall, Arjan van de Ven, Chuck Ebbert,
	Adrian Bunk, Andrew Morton, Ingo Molnar, linux-kernel, Dave Jones,
	Tim Schmielau

Linus Torvalds wrote:
> Oh, but validating things like "likely()" and "unlikely()" branch hints 
> might be a noticeably bigger issue. 

I think the issues are somewhat intertwined.

For instance, assume you have code like:

	if (some_function(skb)) {
		blah();
		printk(KERN_WARNING "bad packet...\n");
	} else {
		process_skb(skb);
	}

Now just by annotating printk() as "rare" then gcc should be able to guess
that the "if" is unlikely() without explicitly marking it as such since
one of its paths calls a rare function and the other does not.  If instead
both paths called rare functions then the compiler could decide that the
whole block is probably "rare" and optimize accordingly.

I haven't looked at gcc 4.1 yet, so I don't know how sophisticated its "rare"
promotion rules are, but this is certainly the kind of thing the compiler
should be able to handle.

So basically better inter-functional locality hints should also help
intra-functional locality.
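
A minimal sketch of the idea, assuming a compiler that supports the GCC-style "cold" function attribute (added in GCC releases later than the ones discussed in this thread). The names report_bad_packet(), handle_packet() and bad_packets are hypothetical, and how much the compiler really infers depends on its version and flags.

```c
#include <stdio.h>

static unsigned long bad_packets;

/*
 * Marking the callee as cold lets the compiler treat paths that lead
 * to this call as unlikely, without any annotation at the call site.
 */
__attribute__((cold)) static void report_bad_packet(int len)
{
	bad_packets++;
	fprintf(stderr, "bad packet (len=%d)\n", len);
}

static int handle_packet(int len)
{
	if (len < 0) {
		/* no unlikely() here: the hint comes from the callee */
		report_bad_packet(len);
		return -1;
	}
	return len;
}
```

The branch itself carries no likely()/unlikely() marker; the only annotation is on the rarely called function.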

[from another message]
> We don't have likely()/unlikely() that often, and at least in my case it's
> partly because the syntax is a pain (it would probably have been better to
> include the "if ()" part in the syntax - the millions of parentheses just
> drive me wild).

I actually did that in a project once (an "unlikely_if()" macro).  It was
not a good idea.  The problem is that every syntax-highlighter knows that
"if" is a keyword but you'd have to teach it about "unlikely_if".  It was
surprising how visually jarring having different pretty-printing for
different types of "if" statements was.  "if (unlikely())" looks much
cleaner in comparison.

-Mitch


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-06  0:27                         ` Linus Torvalds
@ 2006-01-06  0:54                           ` Ingo Molnar
  0 siblings, 0 replies; 167+ messages in thread
From: Ingo Molnar @ 2006-01-06  0:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin Bligh, Matt Mackall, Arjan van de Ven, Chuck Ebbert,
	Adrian Bunk, Andrew Morton, linux-kernel, Dave Jones,
	Tim Schmielau


* Linus Torvalds <torvalds@osdl.org> wrote:

> > Especially if there are enough profiling hits, it's usually a quick glance 
> > to figure out the hotpath:
> 
> Ehh. What's a "quick glance" to a human can be quite hard to automate.  
> That's my point.
> 
> If we do the "human quick glances", we won't be seeing much come out 
> of this. That's what we've already been doing, for several years.
> 
> I thought the discussion was about trying to automate this..

i think it could be automated reasonably well. An 80% effective "which 
condition is judged incorrectly" decision could be made based on:

  branch instruction with more than 10% of the average per-instruction 
  cycle count, followed by an at least 4-instruction sequence of 
  non-branch (and non-jump) instructions that have exactly zero 
  profiling hits. ('few hits' we are not interested in - those are not 
  likely/unlikely candidates)
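
A rough sketch of how that rule could be coded, assuming the profile has already been reduced to one record per instruction in address order. struct insn_sample and both function names are invented for illustration; a real tool would have to get this data from the profiler and a disassembler.

```c
#include <stddef.h>

/* Hypothetical per-instruction profile record, in address order. */
struct insn_sample {
	int is_branch;		/* nonzero for branch or jump instructions */
	unsigned long hits;	/* profiling hits on this instruction */
};

/*
 * Instruction i is a candidate if it is a branch with more than 10% of
 * the average per-instruction hit count and the next 4 instructions are
 * non-branch instructions with exactly zero hits.
 */
static int is_candidate(const struct insn_sample *insns, size_t n,
			size_t i, double avg_hits)
{
	size_t j;

	if (!insns[i].is_branch || (double)insns[i].hits <= 0.10 * avg_hits)
		return 0;
	if (i + 4 >= n)
		return 0;	/* fewer than 4 instructions follow */
	for (j = i + 1; j <= i + 4; j++)
		if (insns[j].is_branch || insns[j].hits != 0)
			return 0;
	return 1;
}

/* Count the candidate branches in a profile of n instructions. */
static size_t count_candidates(const struct insn_sample *insns, size_t n)
{
	double total = 0.0;
	size_t i, count = 0;

	if (n == 0)
		return 0;
	for (i = 0; i < n; i++)
		total += (double)insns[i].hits;
	for (i = 0; i < n; i++)
		count += is_candidate(insns, n, i, total / (double)n);
	return count;
}
```

The sketch only flags branches; mapping a flagged address back to a source line is the separate, less reliable step.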

another part is to feed this back into .c, automatically. I've done 
DEBUG_INFO, gdb vmlinux and list *0x12341234 based scripts before, but 
they are not always reliable. They could probably do something like: "if 
the resulting source code contains a clear 'if (' sequence, modify it to 
'if __unlikely (', or something like that".

i'd expect such a method to catch ~50-60% of the interesting cases, not 
more. (the rest would be stuff the heuristics don't catch, and it would 
also be stuff like 'while' or 'break' or 'goto', which are much harder 
to rewrite automatically.)

but it does feel quite a bit fragile.

	Ingo


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-06  0:40                       ` Ingo Molnar
@ 2006-01-06  0:55                         ` Martin Bligh
  2006-01-06  1:48                           ` Mitchell Blank Jr
  0 siblings, 1 reply; 167+ messages in thread
From: Martin Bligh @ 2006-01-06  0:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Matt Mackall, Arjan van de Ven, Chuck Ebbert,
	Adrian Bunk, Andrew Morton, linux-kernel, Dave Jones,
	Tim Schmielau


> these 30 functions involve roughly 10-15 .o files, so we've got 2-3 
> functions per .o file only. And that's typical for VFS, IO, networking 
> and lots of other syscall types. So if the full kernel image is 3MB, and 
> we've got 30 functions totalling to 3000 bytes, they are spread out in 
> 10-15 groups right now - creating 10-15 split icache lines. (in reality 
> we have in excess of 20 split icache-lines, due to weak cohesion even 
> within .o files) With 64-byte lines that's 320-480 bytes 'lost' due to 
> fragmentation alone, with 128-byte lines it's 640-960 bytes - which is 
> 10%-21% in the 64-byte case, and 21%-42% in the 128-byte case. I.e. the 
> icache bloat just due to the placement is quite significant. Adding 
> .o-level fragmentation plus inter-function inactive code to the mix can 
> easily balloon this even higher. Plus the current method of doing 
> unlikely() means the unlikely instructions are near the end of the 
> function - so they still 'spread apart' the footprint and thus have a 
> nontrivial icache cost.

mmm. will take me a little time to digest that.

But we were just discussing here ... wouldn't it be worth moving 
"unlikely" sections of code completely out of line? If they were calls 
to separate functions, all this optimisation stuff could just work at a 
function level, and would be pretty trivial to do?

ie instead of:

if (unlikely(condition)) {
	do;
	some;
	stuff;
	BUG();
	error();
	oh_dear();
}


we'd have


if (unlikely(condition)) {
	call_oh_shit();
}

__rarely_called void call_oh_shit()
{
	do;
	some;
	stuff;
	BUG();
	error();
	oh_dear();
}

depends how long they are, I suppose. Moving that out of line would seem 
to make more difference to icache footprint to me than just cacheline 
packing functions.
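
A compilable sketch of that transformation, assuming GCC-style attributes ("noinline" and "cold", the latter from later GCC releases) and __builtin_expect(). struct request_state and both function names are hypothetical. The locals the slow path needs are handed over through a pointer.

```c
/* Hypothetical state shared between the hot and the cold path. */
struct request_state {
	int id;
	int error;
};

/*
 * Slow path moved out of line: noinline stops the compiler from
 * folding it back into the caller, cold marks it as rarely called.
 */
static __attribute__((noinline, cold)) int
handle_bad_request(struct request_state *rs)
{
	rs->error = 1;
	return -1;
}

static int process_request(struct request_state *rs, int len)
{
	/* the hot function keeps only the test and the call */
	if (__builtin_expect(len < 0, 0))
		return handle_bad_request(rs);

	rs->error = 0;
	return len;
}
```

Whether this beats leaving the block in place depends on how much state the slow path needs; every extra local becomes another argument or structure member.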

M.


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-06  0:50                   ` Mitchell Blank Jr
@ 2006-01-06  0:58                     ` Ingo Molnar
  2006-01-06  1:22                       ` Mitchell Blank Jr
  0 siblings, 1 reply; 167+ messages in thread
From: Ingo Molnar @ 2006-01-06  0:58 UTC (permalink / raw)
  To: Mitchell Blank Jr
  Cc: Linus Torvalds, Martin Bligh, Matt Mackall, Arjan van de Ven,
	Chuck Ebbert, Adrian Bunk, Andrew Morton, linux-kernel,
	Dave Jones, Tim Schmielau


* Mitchell Blank Jr <mitch@sfgoth.com> wrote:

> I actually did that in a project once (an "unlikely_if()" macro) It 
> was not a good idea.  The problem is that every syntax-highlighter 
> knows that "if" is a keyword but you'd have to teach it about 
> "unlikely_if".  It was surprising how visually jarring having 
> different pretty-printing for different types of "if" statements was.  
> "if (unlikely())" looks much cleaner in comparison.

a better syntax would be:

	if __unlikely (cond) {
		...
	}

since it's the extra parentheses that are causing the visual complexity.

	Ingo


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-06  0:58                     ` Ingo Molnar
@ 2006-01-06  1:22                       ` Mitchell Blank Jr
  0 siblings, 0 replies; 167+ messages in thread
From: Mitchell Blank Jr @ 2006-01-06  1:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Martin Bligh, Matt Mackall, Arjan van de Ven,
	Chuck Ebbert, Adrian Bunk, Andrew Morton, linux-kernel,
	Dave Jones, Tim Schmielau

Ingo Molnar wrote:
> a better syntax would be:
> 
> 	if __unlikely (cond) {
> 		...
> 	}

Well you could just throw an extra set of parentheses around the expansion
of the current "unlikely()" macro and get this effect now.
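
A small sketch of what that looks like, assuming the usual __builtin_expect() based definition of unlikely(). The __unlikely name follows the syntax proposed above and the clamp helpers are made up for illustration; outside the kernel, identifiers starting with a double underscore are reserved, so a real project would likely pick another name.

```c
/* Common definition: the caller supplies the parentheses of the if. */
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Same hint, but the expansion carries its own outer parentheses. */
#define __unlikely(x)	(__builtin_expect(!!(x), 0))

static int clamp_len(int len)
{
	/* expands to: if (__builtin_expect(!!(len < 0), 0)) */
	if __unlikely (len < 0)
		return 0;
	return len;
}

static int clamp_len_classic(int len)
{
	if (unlikely(len < 0))
		return 0;
	return len;
}
```

Both functions carry the same branch hint; only the spelling at the call site differs.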

-Mitch


* Re: [patch 00/2] improve .text size on gcc 4.0 and newer  compilers
  2006-01-06  0:55                         ` Martin Bligh
@ 2006-01-06  1:48                           ` Mitchell Blank Jr
  0 siblings, 0 replies; 167+ messages in thread
From: Mitchell Blank Jr @ 2006-01-06  1:48 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Ingo Molnar, Linus Torvalds, Matt Mackall, Arjan van de Ven,
	Chuck Ebbert, Adrian Bunk, Andrew Morton, linux-kernel,
	Dave Jones, Tim Schmielau

Martin Bligh wrote:
> But we were just discussing here ... wouldn't it be worth moving 
> "unlikely" sections of code completely out of line? If they were calls 
> to separate functions, all this optimisation stuff could just work at a 
> function level, and would be pretty trivial to do?

...assuming that they don't need to access many local variables.   And don't
have any "goto" statements... and... etc, etc.

> we'd have
> 
> if (unlikely(condition)) {
> 	call_oh_shit();
> }
> 
> __rarely_called void call_oh_shit()
> {
> 	do;
> 	some;
> 	stuff;
> 	BUG();
> 	error();
> 	oh_dear();
> }

As I described in my other mail on this thread, the _ideal_ solution would
be to tell the compiler that BUG() is a __rarely_called function (well, it's
a macro now but it could be made into an inline function) and let the
compiler figure the rest out without further annotation.

> Moving that out of line would seem 
> to make more difference to icache footprint to me than just cacheline 
> packing functions.

Assuming "-funit-at-a-time" (which all archs will probably be using soon)
you'd probably get exactly the same opcodes either way.

-Mitch


end of thread, other threads:[~2006-01-06  1:48 UTC | newest]

Thread overview: 167+ messages
2006-01-05  0:55 [patch 00/2] improve .text size on gcc 4.0 and newer compilers Chuck Ebbert
2006-01-05  1:07 ` Martin Bligh
2006-01-05 12:19   ` Arjan van de Ven
2006-01-05 14:30     ` Jakub Jelinek
2006-01-05 16:55       ` Linus Torvalds
2006-01-05 18:42         ` Daniel Jacobowitz
2006-01-05 17:02     ` Matt Mackall
2006-01-05 17:59       ` Martin Bligh
2006-01-05 18:09         ` Arjan van de Ven
2006-01-05 18:43           ` Daniel Jacobowitz
2006-01-05 19:17         ` Linus Torvalds
2006-01-05 19:40           ` Linus Torvalds
2006-01-05 19:49             ` Martin Bligh
2006-01-05 20:13               ` Linus Torvalds
2006-01-05 20:15                 ` Linus Torvalds
2006-01-05 23:30                   ` Ingo Molnar
2006-01-05 23:54                     ` Linus Torvalds
2006-01-06  0:15                       ` Ingo Molnar
2006-01-06  0:27                         ` Linus Torvalds
2006-01-06  0:54                           ` Ingo Molnar
2006-01-06  0:02                     ` Martin Bligh
2006-01-06  0:40                       ` Ingo Molnar
2006-01-06  0:55                         ` Martin Bligh
2006-01-06  1:48                           ` Mitchell Blank Jr
2006-01-06  0:50                   ` Mitchell Blank Jr
2006-01-06  0:58                     ` Ingo Molnar
2006-01-06  1:22                       ` Mitchell Blank Jr
2006-01-05 21:34             ` Matt Mackall
2006-01-05 22:08               ` Linus Torvalds
2006-01-05 22:36                 ` Matt Mackall
2006-01-05 22:49                   ` Martin Bligh
2006-01-05 23:02                     ` Matt Mackall
2006-01-05 22:55                   ` Ingo Molnar
2006-01-05 23:11                     ` Matt Mackall
2006-01-05 23:27                       ` Jesse Barnes
2006-01-05 23:58                       ` Ingo Molnar
2006-01-05 21:32           ` Grzegorz Kulewski
  -- strict thread matches above, loose matches on Subject: below --
2005-12-28 11:46 Ingo Molnar
2005-12-28 19:17 ` Linus Torvalds
2005-12-28 19:34   ` Arjan van de Ven
2005-12-28 21:02     ` Linus Torvalds
2005-12-28 21:17       ` Arjan van de Ven
2005-12-28 21:23       ` Ingo Molnar
2005-12-28 21:48         ` Ingo Molnar
2005-12-28 23:56           ` Krzysztof Halasa
2005-12-29  7:41             ` Ingo Molnar
2005-12-29  8:02               ` Dave Jones
2005-12-29 19:44               ` Krzysztof Halasa
2005-12-29  4:11           ` Andrew Morton
2005-12-29  7:32             ` Ingo Molnar
2005-12-29 14:58               ` Horst von Brand
2005-12-29 15:40               ` Adrian Bunk
2005-12-29 17:41               ` Linus Torvalds
2005-12-29 18:42                 ` Arjan van de Ven
2005-12-29 18:45                   ` Arjan van de Ven
2005-12-29 20:19                 ` Ingo Molnar
2005-12-29 22:20                   ` Matt Mackall
2005-12-29 20:28                 ` Dave Jones
2005-12-29 20:49                   ` Linus Torvalds
2005-12-29 21:25                     ` Linus Torvalds
     [not found]                       ` <20051229224839.GA12247@elte.hu>
2005-12-29 22:58                         ` Arjan van de Ven
2005-12-30  2:03                           ` Tim Schmielau
2005-12-30  2:15                             ` Tim Schmielau
2005-12-30  7:49                             ` Ingo Molnar
2005-12-31 14:38                               ` Adrian Bunk
2005-12-31 14:45                                 ` Ingo Molnar
2005-12-31 15:08                                   ` Adrian Bunk
2006-01-02 10:37                                     ` Ingo Molnar
2006-01-02 10:48                                       ` Arjan van de Ven
2006-01-02 13:43                                         ` Adrian Bunk
2006-01-02 14:05                                           ` Ingo Molnar
2006-01-02 15:01                                             ` Adrian Bunk
2006-01-02 18:44                                             ` Krzysztof Halasa
2006-01-02 18:51                                               ` Arjan van de Ven
2006-01-02 19:49                                                 ` Krzysztof Halasa
2006-01-02 19:54                                                   ` Arjan van de Ven
2006-01-02 20:05                                                     ` Krzysztof Halasa
2006-01-02 20:18                                                       ` Jörn Engel
2006-01-02 22:23                                                 ` Russell King
2006-01-02 23:55                                                   ` Alan Cox
2006-01-03  3:59                                                   ` Daniel Jacobowitz
2006-01-03  8:53                                                     ` Russell King
2006-01-03  8:56                                                       ` Arjan van de Ven
2006-01-03  9:00                                                         ` Russell King
2006-01-03  9:10                                                           ` Arjan van de Ven
2006-01-03  9:14                                                         ` Vitaly Wool
2006-01-02 19:03                                               ` Andrew Morton
2006-01-02 19:17                                                 ` Jakub Jelinek
2006-01-02 19:30                                                   ` Andrew Morton
2006-01-02 19:41                                                   ` Linus Torvalds
2006-01-02 19:53                                                     ` Ingo Molnar
2006-01-02 20:28                                                     ` Jakub Jelinek
2006-01-02 20:09                                                 ` Ingo Molnar
2006-01-02 20:24                                                   ` Andrew Morton
2006-01-02 20:40                                                     ` Ingo Molnar
2006-01-02 20:30                                                 ` Ingo Molnar
2006-01-02 19:12                                               ` Linus Torvalds
2006-01-02 19:59                                                 ` Krzysztof Halasa
2006-01-02 20:13                                                 ` Ingo Molnar
2006-01-02 21:00                                                   ` Jan Engelhardt
2006-01-02 22:43                                                     ` Linus Torvalds
2006-01-02 13:42                                       ` Adrian Bunk
2006-01-02 18:28                                         ` Andrew Morton
2006-01-02 18:49                                           ` Arjan van de Ven
2006-01-02 19:26                                             ` Jörn Engel
2006-01-02 21:51                                             ` Grant Coady
2006-01-02 22:03                                               ` Antonio Vargas
2006-01-02 22:56                                                 ` Arjan van de Ven
2006-01-02 23:10                                                   ` Grant Coady
2006-01-02 23:57                                                     ` Alan Cox
2006-01-02 23:58                                                       ` Grant Coady
2006-01-03  5:31                                           ` Nick Piggin
2006-01-03 23:40                                           ` Martin J. Bligh
2006-01-04  4:28                                             ` Matt Mackall
2006-01-04  5:51                                               ` Martin J. Bligh
2006-01-04 17:10                                                 ` Matt Mackall
2006-01-04 22:37                                                 ` Linus Torvalds
2006-01-05  0:55                                                   ` Martin Bligh
2006-01-04 17:36                                               ` Zwane Mwaikambo
2005-12-31  3:51                             ` Kurt Wall
2005-12-30  3:31                           ` Nicolas Pitre
2005-12-30  3:47                     ` Mark Lord
2005-12-30  3:56                       ` Dave Jones
2005-12-30  3:57                       ` Mark Lord
2005-12-30  4:02                         ` Dave Jones
2005-12-30  4:11                           ` Mark Lord
2005-12-30  4:14                             ` Mark Lord
2005-12-30  4:20                               ` Mark Lord
2005-12-30  5:04                                 ` Dave Jones
2005-12-29 23:16                 ` Willy Tarreau
2005-12-30  8:05                   ` Arjan van de Ven
2005-12-30  8:15                     ` Willy Tarreau
2005-12-30  8:24                       ` Arjan van de Ven
2005-12-30  9:20                         ` Willy Tarreau
2005-12-30 13:38                           ` Adrian Bunk
2005-12-30  8:33                   ` Jesper Juhl
2005-12-30  9:28                     ` Willy Tarreau
2005-12-30  9:37                       ` Jesper Juhl
2005-12-30  9:38                         ` Willy Tarreau
2005-12-30 19:53                   ` Alistair John Strachan
2005-12-29  7:49             ` Arjan van de Ven
2005-12-29 15:01               ` Horst von Brand
2005-12-30 15:28             ` Alan Cox
2005-12-30 20:59               ` Adrian Bunk
2005-12-30 22:12             ` Matt Mackall
2005-12-30 23:54               ` Adrian Bunk
2005-12-31  9:20               ` Arjan van de Ven
2005-12-29 14:38           ` Christoph Hellwig
2005-12-29 14:54             ` Arjan van de Ven
2005-12-29 15:35               ` Adrian Bunk
2005-12-29 15:38                 ` Arjan van de Ven
2005-12-29 15:42                 ` Jakub Jelinek
2005-12-29 19:14                   ` Adrian Bunk
2005-12-30  9:28                   ` Andi Kleen
2005-12-30  9:40                     ` Ingo Molnar
2005-12-30 10:14                       ` Ingo Molnar
2005-12-30 13:31                         ` Adrian Bunk
2005-12-30 14:08                         ` Christian Trefzer
2005-12-30 10:25                       ` Andi Kleen
2005-12-29  0:37         ` Rogério Brito
2006-01-03  3:36         ` Daniel Jacobowitz
2005-12-29  4:38 ` Adrian Bunk
2005-12-29  7:59   ` Ingo Molnar
2005-12-29 13:52     ` Adrian Bunk
2005-12-29 19:57       ` Horst von Brand
2005-12-29 20:25       ` Ingo Molnar
2005-12-31 15:22         ` Adrian Bunk
