the cost of inlining?

* the cost of inlining?
@ 2014-12-05  1:32 Jeff Haran
  2014-12-05  3:14 ` John de la Garza
  2014-12-06  3:25 ` Max Filippov
  0 siblings, 2 replies; 5+ messages in thread
From: Jeff Haran @ 2014-12-05  1:32 UTC
  To: kernelnewbies

Hoping this isn't too far off the topic, but I figure it might be of interest to other kernel developers and it has me a bit baffled.

The primary benefit to inlining functions is to avoid the cost of making function calls. At least that's how I've understood it.

So I was playing with a bit of sample code:

$ cat atomic_read.c

#include <asm/atomic.h>
#include <asm/system.h>

int samp_atomic_read(atomic_t *v)
{
        int val;

        val = atomic_read(v);
        return val;
}

atomic_read() is declared like so:

static inline int atomic_read(const atomic_t *v)
{
        return v->counter;
}

So I figured that compiling my sample code would produce no function call inside samp_atomic_read().

But after I build the above with the following Makefile:

$ cat Makefile
obj-m += atomic_read.o

all:
        make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
        make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

I dump the resultant .ko, I get this:

> objdump -S -M intel atomic_read.ko

atomic_read.ko:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <samp_atomic_read>:
#include <asm/atomic.h>
#include <asm/system.h>

int samp_atomic_read(atomic_t *v)
{
   0:   55                      push   rbp
   1:   48 89 e5                mov    rbp,rsp
   4:   e8 00 00 00 00          call   9 <samp_atomic_read+0x9>
 *
 * Atomically reads the value of @v.
 */
static inline int atomic_read(const atomic_t *v)
{
        return v->counter;
   9:   8b 07                   mov    eax,DWORD PTR [rdi]
    int val;

        val = atomic_read(v);
        return val;
}
   b:   c9                      leave
   c:   c3                      ret
   d:   90                      nop
   e:   90                      nop
   f:   90                      nop

I think I understand most of it. The first 2 instructions save the base pointer of the caller and set up a new one for samp_atomic_read().

The instruction at offset 9 reads the contents of v->counter into eax to return to the caller.

The instruction at offset 0xb restores the base pointer and stack pointer of the caller, and the ret at offset 0xc returns execution to the caller. I am guessing the nops at the end are there to make the next function land on an 8-byte boundary (this is for an X86_64 target).
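
The padding guess can be checked with a little arithmetic. What follows is only a minimal sketch, not anything taken from the kernel build, and the helper name pad_to_boundary is made up for illustration. The ret at offset 0xc is one byte long, so the function body ends at 0xd. Both an 8-byte and a 16-byte alignment call for the same three bytes of padding, so this dump alone cannot tell which alignment the toolchain used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of filler bytes needed so that the next
 * symbol starts on a multiple of `align` (which must be nonzero). */
static size_t pad_to_boundary(size_t end_offset, size_t align)
{
        return (align - end_offset % align) % align;
}

/* Checks against the dump above: code ends at 0xd, three nops follow. */
static void check_padding(void)
{
        assert(pad_to_boundary(0xd, 8) == 3);   /* 0xd -> 0x10 */
        assert(pad_to_boundary(0xd, 16) == 3);  /* 0xd -> 0x10 */
}
```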

But what is that call instruction at offset 4 for?

It would seem to accomplish nothing, since without it execution would proceed to the mov at offset 9 as I'd expect, and since no new base frame gets set up inside atomic_read() itself, the leave/ret causes control to return to the caller of samp_atomic_read() anyway.

If atomic_read() were a macro, we wouldn't have this seemingly superfluous call instruction.
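
To compare the two forms outside the kernel, here is a minimal userspace sketch. The names my_atomic_t, my_atomic_read, MY_ATOMIC_READ, read_via_inline and read_via_macro are stand-ins invented for illustration, not the kernel's definitions. With optimization enabled, gcc would normally be expected to compile both wrappers to the same single load, which would suggest that the extra call is not something the inline function itself adds.

```c
#include <assert.h>

/* Stand-in for the kernel's atomic_t; an assumption for illustration. */
typedef struct { int counter; } my_atomic_t;

/* Inline-function form, mirroring the shape of atomic_read(). */
static inline int my_atomic_read(const my_atomic_t *v)
{
        return v->counter;
}

/* Macro form of the same access. */
#define MY_ATOMIC_READ(v) ((v)->counter)

int read_via_inline(my_atomic_t *v)
{
        return my_atomic_read(v);
}

int read_via_macro(my_atomic_t *v)
{
        return MY_ATOMIC_READ(v);
}

/* Both forms must return the same value. */
static void check_same_result(void)
{
        my_atomic_t a = { 7 };

        assert(read_via_inline(&a) == 7);
        assert(read_via_macro(&a) == 7);
}
```

Building this with gcc -O2 -c and running objdump -d on the object file lets you compare the two function bodies side by side.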

Anybody know why it's there?

Thanks,

Jeff Haran

* the cost of inlining?
  2014-12-05  1:32 the cost of inlining? Jeff Haran
@ 2014-12-05  3:14 ` John de la Garza
  2014-12-05 22:35   ` Jeff Haran
  2014-12-06  3:25 ` Max Filippov
  1 sibling, 1 reply; 5+ messages in thread
From: John de la Garza @ 2014-12-05  3:14 UTC
  To: kernelnewbies

On Fri, Dec 05, 2014 at 01:32:35AM +0000, Jeff Haran wrote:
> $ cat atomic_read.c
> 
> #include <asm/atomic.h>
> #include <asm/system.h>
> 
> int samp_atomic_read(atomic_t *v)
> {
>         int val;
> 
>         val = atomic_read(v);
>         return val;
> }
I couldn't get it to build with the #include <asm/system.h>, but it built
when I removed it.

> I dump the resultant .ko, I get this:
> 
> > objdump -S -M intel atomic_read.ko
> 
> atomic_read.ko:     file format elf64-x86-64
> 
> 
> Disassembly of section .text:
> 
> 0000000000000000 <samp_atomic_read>:
> #include <asm/atomic.h>
> #include <asm/system.h>
> 
> int samp_atomic_read(atomic_t *v)
> {
>    0:   55                      push   rbp
>    1:   48 89 e5                mov    rbp,rsp
>    4:   e8 00 00 00 00          call   9 <samp_atomic_read+0x9>
>  *
>  * Atomically reads the value of @v.
>  */
> static inline int atomic_read(const atomic_t *v)
> {
>         return v->counter;
>    9:   8b 07                   mov    eax,DWORD PTR [rdi]
>     int val;
> 
>         val = atomic_read(v);
>         return val;
> }
>    b:   c9                      leave
>    c:   c3                      ret
>    d:   90                      nop
>    e:   90                      nop
>    f:   90                      nop
> 

My output differs:
john@vega:~/foo$ objdump -S -M intel atomic_read.ko

atomic_read.ko:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <samp_atomic_read>:
   0: 55                    push   rbp
   1: 8b 07                 mov    eax,DWORD PTR [rdi]
   3: 48 89 e5              mov    rbp,rsp
   6: 5d                    pop    rbp
   7: c3                    ret

* the cost of inlining?
  2014-12-05  3:14 ` John de la Garza
@ 2014-12-05 22:35   ` Jeff Haran
  2014-12-06  2:53     ` John de la Garza
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Haran @ 2014-12-05 22:35 UTC
  To: kernelnewbies

> -----Original Message-----
> From: John de la Garza [mailto:john at jjdev.com]
> Sent: Thursday, December 04, 2014 7:14 PM
> To: Jeff Haran
> Cc: Kernel Newbies
> Subject: Re: the cost of inlining?
> 
> On Fri, Dec 05, 2014 at 01:32:35AM +0000, Jeff Haran wrote:
> > $ cat atomic_read.c
> >
> > #include <asm/atomic.h>
> > #include <asm/system.h>
> >
> > int samp_atomic_read(atomic_t *v)
> > {
> >         int val;
> >
> >         val = atomic_read(v);
> >         return val;
> > }
> I couldn't get it to build with the #include <asm/system.h>, but it built when I
> removed it.
> 
> > I dump the resultant .ko, I get this:
> >
> > > objdump -S -M intel atomic_read.ko
> >
> > atomic_read.ko:     file format elf64-x86-64
> >
> >
> > Disassembly of section .text:
> >
> > 0000000000000000 <samp_atomic_read>:
> > #include <asm/atomic.h>
> > #include <asm/system.h>
> >
> > int samp_atomic_read(atomic_t *v)
> > {
> >    0:   55                      push   rbp
> >    1:   48 89 e5                mov    rbp,rsp
> >    4:   e8 00 00 00 00          call   9 <samp_atomic_read+0x9>
> >  *
> >  * Atomically reads the value of @v.
> >  */
> > static inline int atomic_read(const atomic_t *v) {
> >         return v->counter;
> >    9:   8b 07                   mov    eax,DWORD PTR [rdi]
> >     int val;
> >
> >         val = atomic_read(v);
> >         return val;
> > }
> >    b:   c9                      leave
> >    c:   c3                      ret
> >    d:   90                      nop
> >    e:   90                      nop
> >    f:   90                      nop
> >
> 
> My output differs:
> john@vega:~/foo$ objdump -S -M intel atomic_read.ko
> 
> atomic_read.ko:     file format elf64-x86-64
> 
> 
> Disassembly of section .text:
> 
> 0000000000000000 <samp_atomic_read>:
>    0: 55                    push   rbp
>    1: 8b 07                 mov    eax,DWORD PTR [rdi]
>    3: 48 89 e5              mov    rbp,rsp
>    6: 5d                    pop    rbp
>    7: c3                    ret

John,

Would you mind sharing what kernel version/distro you are using?

I'm using a somewhat dated Red Hat version for this:

$ cat /proc/version
Linux version 2.6.32-71.15.1.el6.x86_64 (mockbuild@x86-009.build.bos.redhat.com) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #1 SMP Sun Jan 23 10:39:44 EST 2011
[jharan@build-fusion-linux]~$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.0 (Santiago)
$ gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC)

I wonder if my results are from one of Red Hat's tweaks.

The only benefit I can think of from this extra call instruction is that, if v is a bad address and dereferencing it causes a trap, the stack trace will include that return address.

Thanks,

Jeff Haran

* the cost of inlining?
  2014-12-05 22:35   ` Jeff Haran
@ 2014-12-06  2:53     ` John de la Garza
  0 siblings, 0 replies; 5+ messages in thread
From: John de la Garza @ 2014-12-06  2:53 UTC
  To: kernelnewbies

On Fri, Dec 05, 2014 at 10:35:13PM +0000, Jeff Haran wrote:
> John,
> 
> Would you mind sharing what kernel version/distro you are using?
> 
john@vega:~$ cat /proc/version
Linux version 3.18.0-rc7+ (john@vega) (gcc version 4.9.1 (Debian 4.9.1-16) ) #88 SMP Thu Dec 4 21:55:41 EST 2014

john@vega:~$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.9/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.9.1-16' --with-bugurl=file:///usr/share/doc/gcc-4.9/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.9 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.9 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.9-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.9-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.9-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --with-arch-32=i586 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.9.1 (Debian 4.9.1-16)

* the cost of inlining?
  2014-12-05  1:32 the cost of inlining? Jeff Haran
  2014-12-05  3:14 ` John de la Garza
@ 2014-12-06  3:25 ` Max Filippov
  1 sibling, 0 replies; 5+ messages in thread
From: Max Filippov @ 2014-12-06  3:25 UTC
  To: kernelnewbies

On Fri, Dec 5, 2014 at 4:32 AM, Jeff Haran <Jeff.Haran@citrix.com> wrote:
> Disassembly of section .text:
>
> 0000000000000000 <samp_atomic_read>:
> #include <asm/atomic.h>
> #include <asm/system.h>
>
> int samp_atomic_read(atomic_t *v)
> {
>    0:   55                      push   rbp
>    1:   48 89 e5                mov    rbp,rsp
>    4:   e8 00 00 00 00          call   9 <samp_atomic_read+0x9>
>  *
>  * Atomically reads the value of @v.
>  */
> static inline int atomic_read(const atomic_t *v)
> {
>         return v->counter;
>    9:   8b 07                   mov    eax,DWORD PTR [rdi]
>     int val;
>
>         val = atomic_read(v);
>         return val;
> }
>    b:   c9                      leave
>    c:   c3                      ret
>    d:   90                      nop
>    e:   90                      nop
>    f:   90                      nop

[...]

> But what is that call instruction at offset 4 for?

Looks like you have ftrace enabled in your kernel config, and this call is
a call to _mcount.
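
For anyone who wants to see the effect outside the kernel, here is a minimal userspace sketch; it is an illustration under stated assumptions, not the kernel's build. When the function tracer is configured, the kernel is compiled with gcc's -pg option, which makes the compiler emit a profiling call at the entry of each function. In a .ko that has not been loaded yet, the call target is still an unresolved relocation, which would explain why objdump shows e8 00 00 00 00 pointing at the next instruction; objdump -r should list the symbol the relocation refers to. The function names below are invented, and the notrace macro is defined locally to mirror the kernel's macro of the same name.

```c
#include <assert.h>

/* Local stand-in that mirrors the kernel's notrace macro. */
#define notrace __attribute__((no_instrument_function))

/* When built with -pg, gcc adds a profiling call at this function's entry. */
int traced_read(const int *p)
{
        return *p;
}

/* Same body, but the attribute asks gcc to skip the profiling call. */
notrace int untraced_read(const int *p)
{
        return *p;
}

/* The instrumentation must not change what either function returns. */
static void check_reads(void)
{
        int x = 5;

        assert(traced_read(&x) == 5);
        assert(untraced_read(&x) == 5);
}
```

Compiling this with gcc -O2 -pg -c and disassembling the object with objdump -dr should show the extra call only in traced_read().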

-- 
Thanks.
-- Max
