linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] Explicitly mark global functions with vmlinux with __global
@ 2007-05-16 16:34 David Woodhouse
  2007-05-16 16:50 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: David Woodhouse @ 2007-05-16 16:34 UTC (permalink / raw)
  To: torvalds, akpm; +Cc: sam, linux-arch

A while ago, I played with using '-fwhole-program --combine' for
building kernel objects -- http://lwn.net/Articles/197097/

A found a few compiler bugs which I think should mostly be fixed now, so
I'm revisiting the kernel bits. The original patches I had looked
something like this:

http://david.woodhou.se/combine/combine-diff-1-fixes.patch
http://david.woodhou.se/combine/combine-diff-2-core.patch
http://david.woodhou.se/combine/combine-diff-3-global.patch
http://david.woodhou.se/combine/combine-diff-4-hacks.patch

Essentially, it added a CONFIG_COMBINED_COMPILE option which would build
every multi-part object (including built-in.o) with the -fwhole-program
and --combine flags. We saw savings of up to 14% in size in a few
places, and a more realistic 5-6% in many more.

Since -fwhole-program makes the compiler assume everything is 'static' I
needed to explicitly mark some stuff as _not_ static, if it was used by
other things outside the same directory. EXPORT_SYMBOL could obviously
be used for that, but then there were some things which were global
_within_ vmlinux but not exported to modules -- and marking those with
__global was the majority of the patchset above; the third patch.

Before I steam ahead and do it all again, I'd like to revisit that. My
old patches had
	 #define __global __attribute__((externally_visible,used))
and went on to look a bit like the patch below. For the OLPC test build
I was playing with, I had:
 75 files changed, 238 insertions(+), 239 deletions(-)

Another way of doing it would be to add an 'EXPORT_SYMBOL_INTERNAL' or
some such, instead of __global. To be perfectly honest, I don't recall
my reasons for choosing __global over that.

Anyway, the point of the RFC... should I go ahead with creating patches
which add this __global marking where it's needed (it can be a no-op for
now anyway)? Or does anyone have any better plans? It wouldn't be as
_much_ of a pain for Andrew as the irc_reqs thing was, but it's possibly
still worth co-ordinating it -- if I go ahead and do it, when would be
the best time to merge it?

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 65814b0..f5c63c8 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -3037,7 +3037,7 @@ static struct kobj_type md_ktype = {
        .default_attrs  = md_default_attrs,
 };
 
-int mdp_major = 0;
+int __global mdp_major = 0;
 
 static struct kobject *md_probe(dev_t dev, int *part, void *data)
 {
@@ -5747,13 +5747,12 @@ static int __init md_init(void)
 static dev_t detected_devices[128];
 static int dev_cnt;
 
-void md_autodetect_dev(dev_t dev)
+void __global md_autodetect_dev(dev_t dev)
 {
        if (dev_cnt >= 0 && dev_cnt < 127)
                detected_devices[dev_cnt++] = dev;
 }
 
-
 static void autostart_arrays(int part)
 {
        mdk_rdev_t *rdev;

What I had before was...

 arch/i386/mm/extable.c         |    2 -
 arch/i386/mm/fault.c           |   10 ++++----
 arch/i386/mm/hugetlbpage.c     |   14 +++++------
 arch/i386/mm/init.c            |   26 +++++++++++-----------
 arch/i386/mm/ioremap.c         |    4 +--
 arch/i386/mm/mmap.c            |    2 -
 arch/i386/mm/pgtable.c         |   12 +++++-----
 drivers/ide/ppc/pmac.c         |    2 -
 drivers/md/md.c                |    5 +---
 drivers/video/fbcvt.c          |    2 -
 fs/aio.c                       |    6 ++---
 fs/bio.c                       |    8 +++---
 fs/block_dev.c                 |    2 -
 fs/buffer.c                    |   10 ++++----
 fs/char_dev.c                  |    2 -
 fs/compat.c                    |    4 +--
 fs/dcache.c                    |    8 +++---
 fs/devpts/inode.c              |    6 ++---
 fs/dnotify.c                   |    2 -
 fs/drop_caches.c               |    4 +--
 fs/exec.c                      |   12 +++++-----
 fs/fcntl.c                     |    6 ++---
 fs/file.c                      |    8 +++---
 fs/file_table.c                |   16 ++++++-------
 fs/filesystems.c               |    2 -
 fs/fs-writeback.c              |    4 +--
 fs/hugetlbfs/inode.c           |   10 ++++----
 fs/inode.c                     |    8 +++---
 fs/inotify_user.c              |    2 -
 fs/locks.c                     |    8 +++---
 fs/namei.c                     |    4 +--
 fs/namespace.c                 |   12 +++++-----
 fs/pipe.c                      |    4 +--
 fs/proc/base.c                 |    2 -
 fs/proc/generic.c              |    2 -
 fs/proc/kcore.c                |    2 -
 fs/proc/proc_devtree.c         |   10 ++++----
 fs/proc/proc_tty.c             |    4 +--
 fs/proc/root.c                 |    4 +--
 fs/ramfs/inode.c               |    2 -
 fs/splice.c                    |    2 -
 fs/super.c                     |    8 +++---
 init/calibrate.c               |    2 -
 init/do_mounts.c               |   11 ++++-----
 init/do_mounts_initrd.c        |    7 +++--
 init/do_mounts_rd.c            |    4 +--
 init/main.c                    |   14 +++++------
 ipc/compat.c                   |   14 +++++------
 ipc/msg.c                      |    6 ++---
 ipc/sem.c                      |    6 ++---
 ipc/shm.c                      |    8 +++---
 kernel/sys_ni.c                |    1 
 net/core/dev.c                 |   10 ++++----
 net/core/dv.c                  |    3 --
 net/core/flow.c                |    2 -
 net/core/iovec.c               |    3 +-
 net/core/neighbour.c           |    4 +--
 net/core/request_sock.c        |    2 -
 net/core/rtnetlink.c           |    4 +--
 net/core/skbuff.c              |    2 -
 net/core/sock.c                |   10 ++++----
 net/core/sysctl_net_core.c     |    2 -
 net/core/user_dma.c            |    4 +--
 net/ipv4/inet_hashtables.c     |    4 +--
 net/ipv4/sysctl_net_ipv4.c     |    2 -
 net/ipv4/tcp_timer.c           |    3 --
 net/netfilter/core.c           |    2 -
 net/sched/sch_generic.c        |   10 ++++----
 net/xfrm/xfrm_policy.c         |    4 +--
 security/selinux/avc.c         |    4 +--
 security/selinux/exports.c     |   10 ++++----
 security/selinux/hooks.c       |    4 +--
 security/selinux/netlink.c     |    2 -
 security/selinux/ss/policydb.c |    2 -
 security/selinux/ss/services.c |   48 ++++++++++++++++++++---------------------
 75 files changed, 238 insertions(+), 239 deletions(-)

-- 
dwmw2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC] Explicitly mark global functions with vmlinux with __global
  2007-05-16 16:34 [RFC] Explicitly mark global functions with vmlinux with __global David Woodhouse
@ 2007-05-16 16:50 ` Andrew Morton
  2007-05-16 18:10   ` David Woodhouse
  2007-05-16 18:33 ` Linus Torvalds
  2007-05-16 19:10 ` [RFC] Explicitly mark global functions with vmlinux with __global Sam Ravnborg
  2 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2007-05-16 16:50 UTC (permalink / raw)
  To: David Woodhouse; +Cc: torvalds, sam, linux-arch

On Thu, 17 May 2007 00:34:33 +0800 David Woodhouse <dwmw2@infradead.org> wrote:

> A while ago, I played with using '-fwhole-program --combine' for
> building kernel objects -- http://lwn.net/Articles/197097/
> 
> A found a few compiler bugs which I think should mostly be fixed now, so
> I'm revisiting the kernel bits. The original patches I had looked
> something like this:
> 
> http://david.woodhou.se/combine/combine-diff-1-fixes.patch
> http://david.woodhou.se/combine/combine-diff-2-core.patch
> http://david.woodhou.se/combine/combine-diff-3-global.patch
> http://david.woodhou.se/combine/combine-diff-4-hacks.patch
> 
> Essentially, it added a CONFIG_COMBINED_COMPILE option which would build
> every multi-part object (including built-in.o) with the -fwhole-program
> and --combine flags. We saw savings of up to 14% in size in a few
> places, and a more realistic 5-6% in many more.

So..  is 5% a reasonable estimate of the overall gain which we're likely to
see here?

And do we know specifically where that gain is coming from?  How the
compiler/linker is achieving this?  If it's because we're all slackers,
perhaps similar gains could come from manual fixes.

> Since -fwhole-program makes the compiler assume everything is 'static' I
> needed to explicitly mark some stuff as _not_ static, if it was used by
> other things outside the same directory. EXPORT_SYMBOL could obviously
> be used for that, but then there were some things which were global
> _within_ vmlinux but not exported to modules -- and marking those with
> __global was the majority of the patchset above; the third patch.
> 
> Before I steam ahead and do it all again, I'd like to revisit that. My
> old patches had
> 	 #define __global __attribute__((externally_visible,used))
> and went on to look a bit like the patch below. For the OLPC test build
> I was playing with, I had:
>  75 files changed, 238 insertions(+), 239 deletions(-)
> 
> Another way of doing it would be to add an 'EXPORT_SYMBOL_INTERNAL' or
> some such, instead of __global. To be perfectly honest, I don't recall
> my reasons for choosing __global over that.

Would it be true to say that most such symbols are already marked with
EXPORT_SYMBOL()?  If so, then I'd have thought that EXPORT_SYMBOL_INTERNAL
would be a better approach, as less additional markups would be needed.

> Anyway, the point of the RFC... should I go ahead with creating patches
> which add this __global marking where it's needed (it can be a no-op for
> now anyway)? Or does anyone have any better plans? It wouldn't be as
> _much_ of a pain for Andrew as the irc_reqs thing was, but it's possibly
> still worth co-ordinating it -- if I go ahead and do it, when would be
> the best time to merge it?

I'd be concerned about ongoing maintainability.  If someone makes a change
which breaks CONFIG_COMBINED_COMPILE then how would we be notified about
it?

And I assume that notification would only be visible to
CONFIG_COMBINED_COMPILE users, so...  will this feature be available on
x86_64 and i386, and what toolchain version is required?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Explicitly mark global functions with vmlinux with __global
  2007-05-16 16:50 ` Andrew Morton
@ 2007-05-16 18:10   ` David Woodhouse
  2007-05-17 14:51     ` David Woodhouse
  0 siblings, 1 reply; 12+ messages in thread
From: David Woodhouse @ 2007-05-16 18:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: torvalds, sam, linux-arch

On Wed, 2007-05-16 at 09:50 -0700, Andrew Morton wrote:
> On Thu, 17 May 2007 00:34:33 +0800 David Woodhouse <dwmw2@infradead.org> wrote:
> 
> > A while ago, I played with using '-fwhole-program --combine' for
> > building kernel objects -- http://lwn.net/Articles/197097/
> > 
> > A found a few compiler bugs which I think should mostly be fixed now, so
> > I'm revisiting the kernel bits. The original patches I had looked
> > something like this:
> > 
> > http://david.woodhou.se/combine/combine-diff-1-fixes.patch
> > http://david.woodhou.se/combine/combine-diff-2-core.patch
> > http://david.woodhou.se/combine/combine-diff-3-global.patch
> > http://david.woodhou.se/combine/combine-diff-4-hacks.patch
> > 
> > Essentially, it added a CONFIG_COMBINED_COMPILE option which would build
> > every multi-part object (including built-in.o) with the -fwhole-program
> > and --combine flags. We saw savings of up to 14% in size in a few
> > places, and a more realistic 5-6% in many more.
> 
> So..  is 5% a reasonable estimate of the overall gain which we're likely to
> see here?

That kind of ball-park, I imagine; maybe a little less. I'll have to try
it again and see precisely what the numbers were overall. My old numbers
are at http://david.woodhou.se/combine/sizes-sorted.csv -- tuples of
(old,new,Δ,%,object).

The reason I say 'maybe a little less' is because I concentrated on
modules at the time because the Kbuild stuff was relatively easy for
those -- and I have a vague recollection that the built-in stuff didn't
show as big a difference. Searching for 'built-in.o' in the above-linked
CSV file seems to confirm that recollection.

When I do turn my attention to the kernel proper, once thing I want to
try is combining more than one directory at a time. I wouldn't do many,
but I suspect it would make sense to combine mm/ and arch/$ARCH/mm -- to
build all files in both of those at the same time. Likewise for kernel/
and arch/$ARCH/kernel/ -- or maybe even all four together. Or maybe
copmbine fs/ and mm/. I'll do some investigation as to precisely where
the best benefits are to be found.

There's a trade-off, obviously. We can't just build _every_ C file all
at the same time because the amount of memory used for that would be
insane. So we pick the bits which actually make _sense_ to show to the
optimiser at the same time, while keeping it broken down into manageable
chunks.

> And do we know specifically where that gain is coming from?  How the
> compiler/linker is achieving this?  If it's because we're all slackers,
> perhaps similar gains could come from manual fixes.

It's not always as simple as that. Sometimes it's cleaner to split stuff
up according to some _other_ criterion than what functions get exposed
to the optimiser together.

File systems are an obvious example of this -- perhaps they'd be
_optimal_ if we stuck them all into a single file, but I don't think
anyone would seriously advocate that.

The split between mm and arch/$ARCH/mm is another example.

> Would it be true to say that most such symbols are already marked with
> EXPORT_SYMBOL()?  If so, then I'd have thought that EXPORT_SYMBOL_INTERNAL
> would be a better approach, as less additional markups would be needed.

Er, it's exactly the same number of additional markups either way.

Yes, although I didn't actually count, I do think that 'most' are
already marked with EXPORT_SYMBOL, which is why I've already hacked
EXPORT_SYMBOL() to also mark the symbol in question with the appropriate
attributes -- so we don't need to add __global to anything which is
marked with EXPORT_SYMBOL().

It's only those symbols which are global within vmlinux but _not_
exported to modules which would need an additional marking. That marking
could be either EXPORT_SYMBOL_INTERNAL() or __global -- I'm mostly
ambivalent. I think I chose __global last time just because it was
easier to do in a semi-automatic fashion.

> I'd be concerned about ongoing maintainability.  If someone makes a change
> which breaks CONFIG_COMBINED_COMPILE then how would we be notified about
> it?

The most likely failure mode -- other than the WeirdShit™ which
accompanies a compiler bug -- is an unresolved symbol in the final link.
That isn't particularly hard to deal with, and in fact is precisely how
most of my 'combine-diff-3-global.patch' came about. You find the symbol
it's bitching about, and you mark it __global. I'm sure we have an
_army_ of people capable of doing test builds and catching missing tags
like that.

> And I assume that notification would only be visible to
> CONFIG_COMBINED_COMPILE users, so...  will this feature be available on
> x86_64 and i386, and what toolchain version is required?

Er, there may be tricks we can play to make notification available to
others. Maybe not, but I'll take a look.

Regarding the toolchain version -- last time I looked, at this, about a
year ago, I filed a stream of GCC bugs (again, listed under the lwn link
above¹). I will test gcc 4.2 and I expect it should be OK. I believe
Fedora's toolchain should also cope -- I'll make sure it does, because
we want this for OLPC.

Obviously I posted these patches just as an FYI last time, since I was
the only person in the world with a compiler that could cope. The reason
I'm revisiting it now is because it should actually be of use to someone
else too, this time :)

I'd normally do ppc32, ppc64 and i386 for myself -- I can't test booting
x86_64 but I can at least make sure it compiles. Would you believe
x86_64 is one of the few Linux architectures which I _don't_ have a
sample of?

-- 
dwmw2

¹ although the list there has a typo -- s/27889/27899/


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Explicitly mark global functions with vmlinux with __global
  2007-05-16 16:34 [RFC] Explicitly mark global functions with vmlinux with __global David Woodhouse
  2007-05-16 16:50 ` Andrew Morton
@ 2007-05-16 18:33 ` Linus Torvalds
  2007-05-17  2:53   ` David Woodhouse
  2007-05-16 19:10 ` [RFC] Explicitly mark global functions with vmlinux with __global Sam Ravnborg
  2 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2007-05-16 18:33 UTC (permalink / raw)
  To: David Woodhouse; +Cc: akpm, sam, linux-arch



On Thu, 17 May 2007, David Woodhouse wrote:
>
> A while ago, I played with using '-fwhole-program --combine' for
> building kernel objects -- http://lwn.net/Articles/197097/

I'm worried about three things:

 - compiler stability. Quite frankly, I don't get the warm and fuzzies 
   about features like this that I suspect have had almost zero testing in 
   real life projects. 

 - memory use. I've seen some projects try to use various vendors 
   aggressive optimizations (including things like whole-program: others 
   have supported it for much longer than gcc has), and memory use tends 
   to skyrocket. Which in turn means that most users wouldn't necessarily 
   use it, just because it makes things so slow to compile.

   (Related to that - if you used to just recompile a single file, you now 
   end up recompiling a whole group, so you often have a double whammy: 
   the compile itself is much slower, and you do a lot more of it!)

 - how much of a win is this on a sane and relevant architecture?

   In particular, the -fwhole-program thing tends to matter a lot more on 
   broken and/or uninteresting archtectures. Architectures like ia64, 
   where you can do real additional optimizations that matter. While these 
   things tend to make much less of an impact on some modern x86, which 
   doesn't have a lot of callee-saved registers anyway, and where function 
   calls aren't really all *that* expensive..

IOW, I'd like to see numbers from something like a 64-bit Core 2 build, 
and both for performance and size and compile time. I think that's more 
interesting than some other architectures (ia64, alpha, ppc) that have 
specific issues that make function boundaries artificially more painful 
than they should be.

That said, I don't think adding "__global" annotations is wrong per se. So 
I don't object to the patches - I think they can be a real advantage in 
the long run, and that it can even be interesting to see which functions 
are "internal" to a subsystem and which ones aren't, even if the compiler 
doesn't use the information. 

But I'm just not very excited about plunging into using experimental gcc 
features unless there is some major advantage for major architectures, and 
the disadvantages are known..

		Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Explicitly mark global functions with vmlinux with __global
  2007-05-16 16:34 [RFC] Explicitly mark global functions with vmlinux with __global David Woodhouse
  2007-05-16 16:50 ` Andrew Morton
  2007-05-16 18:33 ` Linus Torvalds
@ 2007-05-16 19:10 ` Sam Ravnborg
  2007-05-17  0:21   ` David Woodhouse
  2 siblings, 1 reply; 12+ messages in thread
From: Sam Ravnborg @ 2007-05-16 19:10 UTC (permalink / raw)
  To: David Woodhouse; +Cc: torvalds, akpm, linux-arch, Marcelo Tosatti

On Thu, May 17, 2007 at 12:34:33AM +0800, David Woodhouse wrote:
> A while ago, I played with using '-fwhole-program --combine' for
> building kernel objects -- http://lwn.net/Articles/197097/

To save binary size the approach used by Marcelo Tosatti
should be considered.
See http://lkml.org/lkml/2006/6/4/169

But that also rely on newer toolchain and I do not
know if this has been added since then.

Googling for 'print-gc-section' says that -M has been enhanced 
to give same info.

This is on my TODO list to look into but buried somewhere deep.

	Sam

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Explicitly mark global functions with vmlinux with __global
  2007-05-16 19:10 ` [RFC] Explicitly mark global functions with vmlinux with __global Sam Ravnborg
@ 2007-05-17  0:21   ` David Woodhouse
  0 siblings, 0 replies; 12+ messages in thread
From: David Woodhouse @ 2007-05-17  0:21 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: torvalds, akpm, linux-arch, Marcelo Tosatti

On Wed, 2007-05-16 at 21:10 +0200, Sam Ravnborg wrote:
> On Thu, May 17, 2007 at 12:34:33AM +0800, David Woodhouse wrote:
> > A while ago, I played with using '-fwhole-program --combine' for
> > building kernel objects -- http://lwn.net/Articles/197097/
> 
> To save binary size the approach used by Marcelo Tosatti
> should be considered.
> See http://lkml.org/lkml/2006/6/4/169

Yeah, I know about that -- I was the one who pointed him at it :)

> But that also rely on newer toolchain and I do not
> know if this has been added since then.

Does it? I've been doing --gc-sections in the kernel for years. I
thought the only thing we wanted a toolchain patch for was to _print_
the symbols which got discarded. Just throwing them away is something
we've been able to do for ever.

--gc-sections doesn't achieve as much as --combine does -- it can only
throw away functions which are _entirely_ unused; it can't do any
cross-unit optimising and inlining.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Explicitly mark global functions with vmlinux with __global
  2007-05-16 18:33 ` Linus Torvalds
@ 2007-05-17  2:53   ` David Woodhouse
  2007-05-17  9:49     ` sparse (was: Re: [RFC] Explicitly mark global functions with vmlinux with __global) Geert Uytterhoeven
  0 siblings, 1 reply; 12+ messages in thread
From: David Woodhouse @ 2007-05-17  2:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: akpm, sam, linux-arch

On Wed, 2007-05-16 at 11:33 -0700, Linus Torvalds wrote:
> I'm worried about three things:
> 
>  - compiler stability. Quite frankly, I don't get the warm and fuzzies 
>    about features like this that I suspect have had almost zero testing in 
>    real life projects. 

(WARNING: I am about to speak of compiler internals; of which I know
little and _want_ to know less than that. Take with a pinch of salt).

Yeah -- warm and fuzzy doesn't really describe my feelings when I filed
all those PRs either -- but none of them were _painful_ bugs. The
--combine thing is all front-end stuff; just hacking the parser to make
it look like you'd just done one big C file which #includes all the
others. And then 'noticing' that certain objects which are declared
twice are in fact the _same_ object, to work around all the compile
failures you'd normally see if you _actually_ did '#include "*.c"'.

IIRC the majority of the PRs I filed were just cases where we failed to
'notice' that objects should be merged. I don't recall a single
'wrong-code' bug, because it just isn't that kind of feature.

There were one or two 'bad optimisation' bugs, caused by stuff like
over-aggressive inlining, but that's not so much of a pain -- it's the
kind of thing that affects us now anyway, and we'd never have noticed it
unless CONFIG_COMBINED_COMPILE had actually caused one or two objects to
_grow_ instead of shrinking, causing me to investigate.

So on the whole, I'm not _too_ worried about the compiler stability.
Either it'll work, or it'll fall over. It won't hurt.

>  - memory use. I've seen some projects try to use various vendors 
>    aggressive optimizations (including things like whole-program: others 
>    have supported it for much longer than gcc has), and memory use tends 
>    to skyrocket. Which in turn means that most users wouldn't necessarily 
>    use it, just because it makes things so slow to compile.

Well, the -fwhole-program _option_ really just means "Assume everything
is static unless told otherwise". It doesn't affect memory usage at all.
What hurts is actually compiling the whole program all at once, which
you don't _really_ need to do if you keep the inter-module linkage
aroundwith judicious used of __attribute__((externally_visible,used)) :)

As I said, it's a trade-off -- we certainly wouldn't actually want to
build the _whole_ kernel all in one compiler invocation. I believe the
biggest wins from --combine come from combining files from within the
same directory (with the probable exceptions I noted), so I see little
benefit in going that far -- I was doing one directory at a time. I
think that's likely to be about the sweet spot.

Even if(^W when) every compiler would work with it, the memory usage is
one reason we'd still want it to be optional, I think.

>    (Related to that - if you used to just recompile a single file, you now 
>    end up recompiling a whole group, so you often have a double whammy: 
>    the compile itself is much slower, and you do a lot more of it!)

True. In the debugging 'compile once, run once, swear, edit' loop I'd be
inclined to build without the CONFIG_COMBINED_COMPILE option -- it's
more fun in the 'compile once, run many times' kind of situation where
the compile time isn't really in the fast path. That's one of the
reasons I wanted to keep it optional -- so that this works:

  make modules CONFIG_COMBINED_COMPILE= SUBDIRS=what/i/broke/today

>  - how much of a win is this on a sane and relevant architecture?

Well, I'm not sure I'd call x86_64 'sane and relevant' in this
particular context -- I was trying to save memory on
resource-constrained systems. But there's probably some potential for
x86_64 to gain from the improved opportunities for optimisation too.

>    In particular, the -fwhole-program thing tends to matter a lot more on 
>    broken and/or uninteresting archtectures. Architectures like ia64, 
>    where you can do real additional optimizations that matter. While these 
>    things tend to make much less of an impact on some modern x86, which 
>    doesn't have a lot of callee-saved registers anyway, and where function 
>    calls aren't really all *that* expensive..
> 
> IOW, I'd like to see numbers from something like a 64-bit Core 2 build, 
> and both for performance and size and compile time. I think that's more 
> interesting than some other architectures (ia64, alpha, ppc) that have 
> specific issues that make function boundaries artificially more painful 
> than they should be.

OK, I'll try. Size and compile time shouldn't be hard to do even for
x86_64 -- for runtime stuff I'll need to find some suitable hardware or
a volunteer. Any _particular_ runtime stuff you want tested, or shall I
attempt to contrive a benchmark of my own devising? :)

Left to my own devices, I'd probably do something based on mounting,
reading and writing a large JFFS2 filesystem on a fake ram-backed
device. That exercises lots of cpu-intensive code I know well, and it's
"real world" stuff for me, if not for you.

Me and my laptop are about to be locked in various tin cans for a total
period of 24 hours or so, and can compile for ppc, ppc64 and i386 -- so
I'll see if I can get that lot building again by the beginning of next
week.

> That said, I don't think adding "__global" annotations is wrong per se. So 
> I don't object to the patches - I think they can be a real advantage in 
> the long run, and that it can even be interesting to see which functions 
> are "internal" to a subsystem and which ones aren't, even if the compiler 
> doesn't use the information. 

OK.

> But I'm just not very excited about plunging into using experimental gcc 
> features unless there is some major advantage for major architectures, and 
> the disadvantages are known..

The architectures I see as 'major' in this particular context are the
ones that can sanely be used in 'embedded' systems -- so I'd normally be
looking at i386, ppc32 and ARM to start with. And I'd probably do FR-V
too just for fun. My original tests did show a worthwhile win on at
least the first two of those, which I think is enough in itself to
justify taking it further.

It does have other benefits for those who aren't using it, as well as
highlighting cases where the compiler is overaggressive about inlining
-- it found a few bugs with mismatching prototypes, such as the fact
that ipxrtr_route_packet() actually takes a 'len' argument of type
'size_t', but is declared and called in af_ipx.c as if that argument is
'int'. (Oops, I thought I'd chased all of combine-diff-1-fixes.patch
upstream already).

-- 
dwmw2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* sparse (was: Re: [RFC] Explicitly mark global functions with vmlinux with __global)
  2007-05-17  2:53   ` David Woodhouse
@ 2007-05-17  9:49     ` Geert Uytterhoeven
  2007-05-17  9:56       ` David Woodhouse
  2007-05-17 13:56       ` David Woodhouse
  0 siblings, 2 replies; 12+ messages in thread
From: Geert Uytterhoeven @ 2007-05-17  9:49 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linus Torvalds, akpm, sam, linux-arch

On Thu, 17 May 2007, David Woodhouse wrote:
> -- it found a few bugs with mismatching prototypes, such as the fact
> that ipxrtr_route_packet() actually takes a 'len' argument of type
> 'size_t', but is declared and called in af_ipx.c as if that argument is
> 'int'. (Oops, I thought I'd chased all of combine-diff-1-fixes.patch
> upstream already).

I guess af_ipx.c didn't include the relevant header file?

This is something sparse can find out now, too (as I found out
yesterday): if something is non-static, sparse gives a warning if it hasn't
seen a prototype in a header file.

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: sparse (was: Re: [RFC] Explicitly mark global functions with vmlinux with __global)
  2007-05-17  9:49     ` sparse (was: Re: [RFC] Explicitly mark global functions with vmlinux with __global) Geert Uytterhoeven
@ 2007-05-17  9:56       ` David Woodhouse
  2007-05-17 12:45         ` Geert Uytterhoeven
  2007-05-17 13:56       ` David Woodhouse
  1 sibling, 1 reply; 12+ messages in thread
From: David Woodhouse @ 2007-05-17  9:56 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Linus Torvalds, akpm, sam, linux-arch

On Thu, 2007-05-17 at 11:49 +0200, Geert Uytterhoeven wrote:
> I guess af_ipx.c didn't include the relevant header file?
> 
> This is something sparse can find out now, too (as I found out
> yesterday): if something is non-static, sparse gives a warning if it
> hasn't seen a prototype in a header file.

In the IPX case, af_ipx.c _does_ have a prototype for the offending
function; it's just that it doesn't actually match the function itself.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: sparse (was: Re: [RFC] Explicitly mark global functions with vmlinux with __global)
  2007-05-17  9:56       ` David Woodhouse
@ 2007-05-17 12:45         ` Geert Uytterhoeven
  0 siblings, 0 replies; 12+ messages in thread
From: Geert Uytterhoeven @ 2007-05-17 12:45 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linus Torvalds, akpm, sam, linux-arch

On Thu, 17 May 2007, David Woodhouse wrote:
> On Thu, 2007-05-17 at 11:49 +0200, Geert Uytterhoeven wrote:
> > I guess af_ipx.c didn't include the relevant header file?
> > 
> > This is something sparse can find out now, too (as I found out
> > yesterday): if something is non-static, sparse gives a warning if it
> > hasn't seen a prototype in a header file.
> 
> In the IPX case, af_ipx.c _does_ have a prototype for the offending
> function; it's just that it doesn't actually match the function itself.

But the prototype is in af_ipx.c, not in a header file.

Hence sparse complains for ipx_route.c:

|   CHECK   /usr/people/geert.nba/ps3/ps3-linux-2.6.22-rc1/net/ipx/ipx_route.c
| linux-2.6.22-rc1/net/ipx/ipx_route.c:32:18: warning: symbol 'ipxrtr_lookup' was not declared. Should it be static?
| linux-2.6.22-rc1/net/ipx/ipx_route.c:51:5: warning: symbol 'ipxrtr_add_route' was not declared. Should it be static?
| linux-2.6.22-rc1/net/ipx/ipx_route.c:93:6: warning: symbol 'ipxrtr_del_routes' was not declared. Should it be static?
| linux-2.6.22-rc1/net/ipx/ipx_route.c:148:5: warning: symbol 'ipxrtr_route_skb' was not declared. Should it be static?
| linux-2.6.22-rc1/net/ipx/ipx_route.c:170:5: warning: symbol 'ipxrtr_route_packet' was not declared. Should it be static?
| linux-2.6.22-rc1/net/ipx/ipx_route.c:260:5: warning: symbol 'ipxrtr_ioctl' was not declared. Should it be static?
|   CC      net/ipx/ipx_route.o

Killing these warnings implies putting the prototypes in a header file.
If it's already in a header file, ipx_route.c doesn't include it.
If it's not yet in a header file, this is a sign there may be some prototypes
in source files (i.e. af_ipx.c), which may not match.

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: sparse (was: Re: [RFC] Explicitly mark global functions with vmlinux with __global)
  2007-05-17  9:49     ` sparse (was: Re: [RFC] Explicitly mark global functions with vmlinux with __global) Geert Uytterhoeven
  2007-05-17  9:56       ` David Woodhouse
@ 2007-05-17 13:56       ` David Woodhouse
  1 sibling, 0 replies; 12+ messages in thread
From: David Woodhouse @ 2007-05-17 13:56 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Linus Torvalds, akpm, sam, linux-arch

On Thu, 2007-05-17 at 11:49 +0200, Geert Uytterhoeven wrote:
> This is something sparse can find out now, too (as I found out
> yesterday): if something is non-static, sparse gives a warning if it
> hasn't seen a prototype in a header file. 

Ah, nice :)

-- 
dwmw2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Explicitly mark global functions with vmlinux with __global
  2007-05-16 18:10   ` David Woodhouse
@ 2007-05-17 14:51     ` David Woodhouse
  0 siblings, 0 replies; 12+ messages in thread
From: David Woodhouse @ 2007-05-17 14:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: torvalds, sam, linux-arch

On Thu, 2007-05-17 at 02:10 +0800, David Woodhouse wrote:
> The reason I say 'maybe a little less' is because I concentrated on
> modules at the time because the Kbuild stuff was relatively easy for
> those -- and I have a vague recollection that the built-in stuff didn't
> show as big a difference. Searching for 'built-in.o' in the above-linked
> CSV file seems to confirm that recollection. 

Actually I'm full of crap. The reason we don't see much improvement in
built-in.o in that list because I didn't _do_ it for built-in.o yet --
it's still only happening for the few multi-obj .o files which get
linked into built-on.o (e.g. mounts-y in init/, etc.).

I lack the wit to make that one work right now. It might look something
like this...

Actually, built-in.o probably ought to be just another case of
$(multi-obj-y). And I need to filter out the non-C sources of those too,
to fix md and crypto stuff (which uses .S files) so I don't need to add
'CONFIG_COMBINED_COMPILE=' to their makefiles.



ifdef CONFIG_COMBINED_COMPILEx
builtin-cfiles-y := $(wildcard $(obj-y:.o=.c))

multi-objs-y += $(obj)/builtin-cfiles.o
multi-used-y += $(obj)/builtin-cfiles.o

$(builtin-target): $(filter-out $(builtin-cfiles-y:.c=.o),$(obj-y))
$(obj)/builtin-cfiles.o FORCE
	$(call if_changed,link_o_target)
else
$(builtin-target): $(obj-y) FORCE
	$(call if_changed,link_o_target)
endif # CONFIG_COMBINED_COMPILE




-- 
dwmw2


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2007-05-17 14:51 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-16 16:34 [RFC] Explicitly mark global functions with vmlinux with __global David Woodhouse
2007-05-16 16:50 ` Andrew Morton
2007-05-16 18:10   ` David Woodhouse
2007-05-17 14:51     ` David Woodhouse
2007-05-16 18:33 ` Linus Torvalds
2007-05-17  2:53   ` David Woodhouse
2007-05-17  9:49     ` sparse (was: Re: [RFC] Explicitly mark global functions with vmlinux with __global) Geert Uytterhoeven
2007-05-17  9:56       ` David Woodhouse
2007-05-17 12:45         ` Geert Uytterhoeven
2007-05-17 13:56       ` David Woodhouse
2007-05-16 19:10 ` [RFC] Explicitly mark global functions with vmlinux with __global Sam Ravnborg
2007-05-17  0:21   ` David Woodhouse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).