* Inlining can be _very_bad...
From: J.A. Magallón @ 2007-03-28 23:18 UTC
To: Linux-Kernel
Hi all...
I'm posting this here as it may be of direct interest for kernel development
(I recall many discussions about whether to inline or not...).
While testing other problems, I finally ran into this issue: the same short
and stupid loop took 3 to 5 times longer when it was in main() than
when it was in an out-of-line function. The same (bad) thing happens if
the function is inlined.
The basic code is like this:
    float data[];

    [inline] double one()
    {
        double sum;

        sum = 0;
        for (i=0; i<SIZE; i++) sum += data[i];
        return sum;
    }

    int main()
    {
        gettimeofday(&tv0,0);
        for (i=0; i<SIZE; i++)
            s0 += data[i];
        gettimeofday(&tv1,0);
        printf("T0: %6.2f ms\n",elap(tv0,tv1));

        gettimeofday(&tv0,0);
        s1 = one();
        gettimeofday(&tv1,0);
        printf("T1: %6.2f ms\n",elap(tv0,tv1));
    }
The times when one() is not inlined (EM64T, 2.33GHz):
apolo:~/e4> tst
T0: 1145.12 ms
S0: 268435456.00
T1: 457.19 ms
S1: 268435456.00
With one() inlined:
apolo:~/e4> tst
T0: 1200.52 ms
S0: 268435456.00
T1: 1200.14 ms
S1: 268435456.00
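The T0/T1 figures above are differences between two gettimeofday() readings. As a minimal sketch of that arithmetic, here is the elap() macro written as a function, plus a ratio helper; elapsed_ms() and speedup() are hypothetical names that do not appear in tst.c:

```c
#include <sys/time.h>

/* Elapsed time from t0 to t1 in milliseconds.
 * Same idea as the elap() macro in tst.c, but the seconds and
 * microseconds are subtracted first, so the values stay small. */
static double elapsed_ms(struct timeval t0, struct timeval t1)
{
    double sec  = (double)(t1.tv_sec  - t0.tv_sec);
    double usec = (double)(t1.tv_usec - t0.tv_usec);

    return 1000.0 * sec + 0.001 * usec;
}

/* Ratio of a slow timing to a fast one, both in milliseconds. */
static double speedup(double slow_ms, double fast_ms)
{
    return slow_ms / fast_ms;
}
```

For the non-inlined run quoted above, speedup(1145.12, 457.19) comes out at roughly 2.5.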
Looking at the assembler, the non-inlined version does:
    .L2:
        cvtss2sd (%rdx,%rax,4), %xmm0
        incq %rax
        cmpq $268435456, %rax
        addsd %xmm0, %xmm1
        jne .L2
and the inlined one does:
    .L13:
        cvtss2sd (%rdx,%rax,4), %xmm0
        incq %rax
        cmpq $268435456, %rax
        addsd 8(%rsp), %xmm0
        movsd %xmm0, 8(%rsp)
        jne .L13
It looks like it is updating the stack on each iteration... This is -march=opteron
code; the -march=pentium4 output is similar. Same behaviour with gcc3 and gcc4.
tst.c and Makefile attached.
Nice, isn't it? Please point out where my mistake is...
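Since the out-of-line loop was the fast one in the timings above, one possible experiment is to force the reduction to stay out of line. This is only a sketch, assuming a GCC-compatible compiler: noinline is a GCC extension attribute, and sum_floats() is a hypothetical name, not part of tst.c. Whether it avoids the per-iteration store on a given gcc version has to be checked in the generated assembler.

```c
#include <stddef.h>

/* GCC extension: ask the compiler never to inline this function,
 * so the loop is always compiled as a separate out-of-line body. */
__attribute__((noinline))
double sum_floats(const float *data, size_t n)
{
    double sum = 0.0;   /* local accumulator */

    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}
```

In tst.c this would be called as sum_floats(data, SIZE) in place of the loop written directly in main().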
--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.20-jam06 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #1 SMP PREEMPT
[-- Attachment #2: Makefile --]
[-- Type: application/octet-stream, Size: 307 bytes --]
PROG=tst
SRCS=tst.c

CC=gcc4 -m64 -march=opteron -O2
#CC=gcc4 -m32 -march=pentium4 -O2
#CC+=-DINLINE

LIBS=
OBJS=$(SRCS:.c=.o)
ASMS=$(SRCS:.c=.s)

all: $(PROG) $(ASMS)

$(PROG): $(OBJS)
	$(CC) -o $@ $(OBJS) $(LIBS)

.c.o:
	$(CC) -c $<

.c.s:
	$(CC) -c -S $<

clean:
	@rm -f $(PROG) $(OBJS) $(ASMS) core tags
[-- Attachment #3: tst.c --]
[-- Type: text/x-csrc; name=tst.c, Size: 958 bytes --]
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define SIZE (256*1024*1024)

/* Elapsed time between two struct timeval values, in milliseconds.
 * 1000.0 keeps the arithmetic in double, so it cannot overflow a
 * 32-bit tv_sec in the -m32 build. */
#define elap(t0,t1) \
	((1000.0*(t1).tv_sec+0.001*(t1).tv_usec) - (1000.0*(t0).tv_sec+0.001*(t0).tv_usec))

double one();

float *data;

#ifdef INLINE
inline
#endif
double one()
{
	int i;
	double sum;

	sum = 0;
	asm("#FBGN");	/* marker: start of this loop in the generated .s */
	for (i=0; i<SIZE; i++)
		sum += data[i];
	asm("#FEND");	/* marker: end of this loop */

	return sum;
}

int main(int argc,char** argv)
{
	struct timeval tv0,tv1;
	double s0,s1;
	int i;

	data = malloc(SIZE*sizeof(float));
	if (!data) {
		perror("malloc");
		return 1;
	}
	for (i=0; i<SIZE; i++)
		data[i] = 1;

	/* T0: the loop written directly in main() */
	gettimeofday(&tv0,0);
	s0 = 0;
	asm("#MBGN");
	for (i=0; i<SIZE; i++)
		s0 += data[i];
	asm("#MEND");
	gettimeofday(&tv1,0);
	printf("T0: %6.2f ms\n",elap(tv0,tv1));
	printf("S0: %0.2lf\n",s0);

	/* T1: the same loop through one() */
	gettimeofday(&tv0,0);
	s1 = one();
	gettimeofday(&tv1,0);
	printf("T1: %6.2f ms\n",elap(tv0,tv1));
	printf("S1: %0.2lf\n",s1);

	free(data);

	return 0;
}
* Re: Inlining can be _very_bad...
From: Benjamin LaHaise @ 2007-03-29 1:29 UTC
To: J.A. Magallón; +Cc: Linux-Kernel
On Thu, Mar 29, 2007 at 01:18:38AM +0200, J.A. Magallón wrote:
> It looks like it is updating the stack on each iteration... This is -march=opteron
> code; the -march=pentium4 output is similar. Same behaviour with gcc3 and gcc4.
>
> tst.c and Makefile attached.
>
> Nice, isn't it? Please point out where my mistake is...
Yes, gcc sucks in its handling of large return values, news at 11. I have
several outstanding bugs on cases where gcc could keep things in registers
but doesn't.
That said, it tends to do much better on plain integer code, as that is
what it gets tuned for. Do NOT propagate the blanket myth that inlining is
a bad thing. It is very useful for small functions where the overhead
associated with call/ret sequences and register clobbers overshadows the
work being done. The call/ret updates alone can make a big difference when
there are lots of other (more useful) memory transactions to complete. Take
a look at things like the notifier hooks for an example of something that
does far too little work per function call and should really be inlined.
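To illustrate the kind of small function meant here, a minimal sketch: the body is a single increment, so a real call/ret pair would likely cost more than the work itself. struct counter, counter_hit() and count_nonzero() are hypothetical names made up for this example; they are not the kernel's notifier hooks.

```c
/* Hypothetical example type, not taken from the kernel. */
struct counter {
    unsigned long hits;
};

/* One increment: so little work that call overhead would dominate. */
static inline void counter_hit(struct counter *c)
{
    c->hits++;
}

/* Caller with a hot loop; counter_hit() is a natural inlining candidate. */
unsigned long count_nonzero(const int *v, int n)
{
    struct counter c = { 0 };

    for (int i = 0; i < n; i++)
        if (v[i] != 0)
            counter_hit(&c);
    return c.hits;
}
```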
-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <zyntrop@kvack.org>.
* Re: Inlining can be _very_bad...
From: Adrian Bunk @ 2007-03-29 17:52 UTC
To: J.A. Magallón; +Cc: Linux-Kernel
On Thu, Mar 29, 2007 at 01:18:38AM +0200, J.A. Magallón wrote:
> Hi all...
>
> I'm posting this here as it may be of direct interest for kernel development
> (I recall many discussions about whether to inline or not...).
>
> While testing other problems, I finally ran into this issue: the same short
> and stupid loop took 3 to 5 times longer when it was in main() than
> when it was in an out-of-line function. The same (bad) thing happens if
> the function is inlined.
>...
> It looks like it is updating the stack on each iteration... This is -march=opteron
> code; the -march=pentium4 output is similar. Same behaviour with gcc3 and gcc4.
>
> tst.c and Makefile attached.
>
> Nice, isn't it? Please point out where my mistake is...
The only fault is to post this issue here instead of the gcc Bugzilla.
In your example the compiler should produce code not slower than with
the out-of-line version when inlining. If it doesn't the bug in the
compiler resulting in this should be fixed.
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
* Re: Inlining can be _very_bad...
From: J.A. Magallón @ 2007-03-29 22:01 UTC
To: Adrian Bunk; +Cc: Linux-Kernel
On Thu, 29 Mar 2007 19:52:54 +0200, Adrian Bunk <bunk@stusta.de> wrote:
> On Thu, Mar 29, 2007 at 01:18:38AM +0200, J.A. Magallón wrote:
> > Hi all...
> >
> > I'm posting this here as it may be of direct interest for kernel development
> > (I recall many discussions about whether to inline or not...).
> >
> > While testing other problems, I finally ran into this issue: the same short
> > and stupid loop took 3 to 5 times longer when it was in main() than
> > when it was in an out-of-line function. The same (bad) thing happens if
> > the function is inlined.
> >...
> > It looks like it is updating the stack on each iteration... This is -march=opteron
> > code; the -march=pentium4 output is similar. Same behaviour with gcc3 and gcc4.
> >
> > tst.c and Makefile attached.
> >
> > Nice, isn't it? Please point out where my mistake is...
>
> The only fault is to post this issue here instead of the gcc Bugzilla.
>
Sorry, my intention was just something like 'take a look at your
reduction-like code, perhaps it's sloooow', something like checksum
functions in tcp or raid that are inlined expecting them to be faster,
when in fact they are slower...
> In your example the compiler should produce code not slower than with
> the out-of-line version when inlining. If it doesn't the bug in the
> compiler resulting in this should be fixed.
>
That's what I expected, but...
Going to gcc bugzilla...
--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.20-jam06 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #2 SMP PREEMPT
* Re: Inlining can be _very_bad...
From: Adrian Bunk @ 2007-03-29 22:28 UTC
To: J.A. Magallón; +Cc: Linux-Kernel
On Fri, Mar 30, 2007 at 12:01:11AM +0200, J.A. Magallón wrote:
> On Thu, 29 Mar 2007 19:52:54 +0200, Adrian Bunk <bunk@stusta.de> wrote:
>
> > On Thu, Mar 29, 2007 at 01:18:38AM +0200, J.A. Magallón wrote:
> > > Hi all...
> > >
> > > I'm posting this here as it may be of direct interest for kernel development
> > > (I recall many discussions about whether to inline or not...).
> > >
> > > While testing other problems, I finally ran into this issue: the same short
> > > and stupid loop took 3 to 5 times longer when it was in main() than
> > > when it was in an out-of-line function. The same (bad) thing happens if
> > > the function is inlined.
> > >...
> > > It looks like it is updating the stack on each iteration... This is -march=opteron
> > > code; the -march=pentium4 output is similar. Same behaviour with gcc3 and gcc4.
> > >
> > > tst.c and Makefile attached.
> > >
> > > Nice, isn't it? Please point out where my mistake is...
> >
> > The only fault is to post this issue here instead of the gcc Bugzilla.
>
> Sorry, my intention was just something like 'take a look at your
> reduction-like code, perhaps it's sloooow', something like checksum
> functions in tcp or raid that are inlined expecting them to be faster,
> when in fact they are slower...
Unless a function with more than one caller is very tiny, or reduces at
compile time to a very tiny remainder, inlining it is not expected to be
faster on current CPUs.
But most of the time that is already up to the compiler anyway: e.g. current
gcc versions automatically inline all static functions with only
one caller.
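A minimal sketch of that case: a static helper with exactly one caller and no inline keyword. With GCC this is the situation the -finline-functions-called-once option covers when optimization is enabled, though whether the helper really gets inlined has to be checked in the -S output. fold16() and checksum_bytes() are hypothetical names for this example, not kernel functions.

```c
#include <stddef.h>

/* Fold a wide sum down to 16 bits by adding the carries back in.
 * static, no "inline" keyword, and only one caller below. */
static unsigned int fold16(unsigned long sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffUL) + (sum >> 16);
    return (unsigned int)sum;
}

/* Hypothetical byte-wise checksum; the only caller of fold16(). */
unsigned int checksum_bytes(const unsigned char *buf, size_t len)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return fold16(sum);
}
```

Marking fold16() with __attribute__((noinline)) is one way to compare the two variants in the generated assembler.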
> > In your example the compiler should produce code not slower than with
> > the out-of-line version when inlining. If it doesn't the bug in the
> > compiler resulting in this should be fixed.
>
> That's what I expected, but...
> Going to gcc bugzilla...
Thanks.
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed