public inbox for linux-kernel@vger.kernel.org
* Large inlines in include/linux/skbuff.h
@ 2004-04-21 19:26 Denis Vlasenko
  2004-04-22  0:47 ` James Morris
  0 siblings, 1 reply; 13+ messages in thread
From: Denis Vlasenko @ 2004-04-21 19:26 UTC (permalink / raw)
  To: David S. Miller; +Cc: Jeff Garzik, linux-kernel

Let's skip stuff which is smaller than 100 bytes.

Size  Uses Wasted Name and definition
===== ==== ====== ================================================
   58   20    722 __skb_dequeue include/linux/skbuff.h
  122  119  12036 skb_dequeue   include/linux/skbuff.h
  103   10    747 __skb_queue_purge     include/linux/skbuff.h
  164   78  11088 skb_queue_purge       include/linux/skbuff.h
   46   26    650 __skb_queue_tail      include/linux/skbuff.h
  109  101   8900 skb_queue_tail        include/linux/skbuff.h
   47   11    270 __skb_queue_head      include/linux/skbuff.h
  110   44   3870 skb_queue_head        include/linux/skbuff.h
   51   16    465 __skb_unlink  include/linux/skbuff.h
  132    9    896 skb_unlink    include/linux/skbuff.h
   46    2     26 __skb_append  include/linux/skbuff.h
  114    5    376 skb_append    include/linux/skbuff.h

'Uses' was counted over .c files only (.h files were excluded), so
there is no double counting here.

As you can see, the non-locking functions are more or less sanely
sized (except __skb_queue_purge). Once the spin_lock/unlock code gets
added, we cross the 100-byte mark.

What shall be done about this? I'll make a patch to move the locking
functions into net/core/skbuff.c unless there is some reason not to.

Just to give you a feel for what 100+ bytes means in assembly:
void inline_skb_dequeue_1() { skb_dequeue(0); }
gets compiled into:

00003513 <inline_skb_dequeue_1>:
    3513:       55                      push   %ebp
    3514:       89 e5                   mov    %esp,%ebp
    3516:       53                      push   %ebx
    3517:       9c                      pushf
    3518:       5b                      pop    %ebx
    3519:       fa                      cli
    351a:       b9 00 e0 ff ff          mov    $0xffffe000,%ecx
    351f:       21 e1                   and    %esp,%ecx
    3521:       ff 41 14                incl   0x14(%ecx)
    3524:       f0 fe 0d 0c 00 00 00    lock decb 0xc
    352b:       78 02                   js     352f <inline_skb_dequeue_1+0x1c>
    352d:       eb 0d                   jmp    353c <inline_skb_dequeue_1+0x29>
    352f:       f3 90                   repz nop
    3531:       80 3d 0c 00 00 00 00    cmpb   $0x0,0xc
    3538:       7e f5                   jle    352f <inline_skb_dequeue_1+0x1c>
    353a:       eb e8                   jmp    3524 <inline_skb_dequeue_1+0x11>
    353c:       8b 15 00 00 00 00       mov    0x0,%edx
    3542:       85 d2                   test   %edx,%edx
    3544:       74 2b                   je     3571 <inline_skb_dequeue_1+0x5e>
    3546:       89 d0                   mov    %edx,%eax
    3548:       8b 12                   mov    (%edx),%edx
    354a:       89 15 00 00 00 00       mov    %edx,0x0
    3550:       ff 0d 08 00 00 00       decl   0x8
    3556:       c7 42 04 00 00 00 00    movl   $0x0,0x4(%edx)
    355d:       c7 00 00 00 00 00       movl   $0x0,(%eax)
    3563:       c7 40 04 00 00 00 00    movl   $0x0,0x4(%eax)
    356a:       c7 40 08 00 00 00 00    movl   $0x0,0x8(%eax)
    3571:       b0 01                   mov    $0x1,%al
    3573:       86 05 0c 00 00 00       xchg   %al,0xc
    3579:       53                      push   %ebx
    357a:       9d                      popf
    357b:       8b 41 08                mov    0x8(%ecx),%eax
    357e:       ff 49 14                decl   0x14(%ecx)
    3581:       a8 08                   test   $0x8,%al
    3583:       74 05                   je     358a <inline_skb_dequeue_1+0x77>
    3585:       e8 fc ff ff ff          call   3586 <inline_skb_dequeue_1+0x73>
    358a:       5b                      pop    %ebx
    358b:       c9                      leave
    358c:       c3                      ret
--
vda


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Large inlines in include/linux/skbuff.h
  2004-04-21 19:26 Denis Vlasenko
@ 2004-04-22  0:47 ` James Morris
       [not found]   ` <200404221756.46240.vda@port.imtp.ilyichevsk.odessa.ua>
  0 siblings, 1 reply; 13+ messages in thread
From: James Morris @ 2004-04-22  0:47 UTC (permalink / raw)
  To: Denis Vlasenko; +Cc: David S. Miller, Jeff Garzik, linux-kernel

On Wed, 21 Apr 2004, Denis Vlasenko wrote:

> What shall be done about this? I'll make a patch to move the locking
> functions into net/core/skbuff.c unless there is some reason not to.

How will these changes impact performance?  I asked this last time you 
posted about inlines and didn't see any response.



- James
-- 
James Morris
<jmorris@redhat.com>



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Large inlines in include/linux/skbuff.h
       [not found] ` <1NAJr-61F-3@gated-at.bofh.it>
@ 2004-04-22 14:59   ` Andi Kleen
  2004-04-22 15:15     ` James Morris
       [not found]   ` <1NRqX-2GI-15@gated-at.bofh.it>
  1 sibling, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2004-04-22 14:59 UTC (permalink / raw)
  To: James Morris; +Cc: linux-kernel, vda

James Morris <jmorris@redhat.com> writes:

> On Wed, 21 Apr 2004, Denis Vlasenko wrote:
>
>> What shall be done about this? I'll make a patch to move the locking
>> functions into net/core/skbuff.c unless there is some reason not to.
>
> How will these changes impact performance?  I asked this last time you 
> posted about inlines and didn't see any response.

I don't think it will be an issue. The optimization guidelines
from AMD and Intel recommend moving functions that generate
more than 30-40 instructions out of line. 100 instructions
is certainly enough to amortize the call overhead, and you
save some icache too, so it may even be faster.

-Andi


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Large inlines in include/linux/skbuff.h
  2004-04-22 14:59   ` Large inlines in include/linux/skbuff.h Andi Kleen
@ 2004-04-22 15:15     ` James Morris
  2004-04-22 15:37       ` Eric Dumazet
  2004-04-22 22:23       ` Andi Kleen
  0 siblings, 2 replies; 13+ messages in thread
From: James Morris @ 2004-04-22 15:15 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, vda

On Thu, 22 Apr 2004, Andi Kleen wrote:

> > How will these changes impact performance?  I asked this last time you 
> > posted about inlines and didn't see any response.
> 
> I don't think it will be an issue. The optimization guidelines
> from AMD and Intel recommend moving functions that generate
> more than 30-40 instructions out of line. 100 instructions
> is certainly enough to amortize the call overhead, and you
> save some icache too, so it may even be faster.

Of course, but it would be good to see some measurements.


- James
-- 
James Morris
<jmorris@redhat.com>



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Large inlines in include/linux/skbuff.h
  2004-04-22 15:15     ` James Morris
@ 2004-04-22 15:37       ` Eric Dumazet
       [not found]         ` <200404221927.34717.vda@port.imtp.ilyichevsk.odessa.ua>
  2004-04-22 22:23       ` Andi Kleen
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2004-04-22 15:37 UTC (permalink / raw)
  To: James Morris; +Cc: Andi Kleen, linux-kernel, vda

James Morris wrote:

>On Thu, 22 Apr 2004, Andi Kleen wrote:
>
>  
>
>>>How will these changes impact performance?  I asked this last time you 
>>>posted about inlines and didn't see any response.
>>>      
>>>
>>I don't think it will be an issue. The optimization guidelines
>>from AMD and Intel recommend moving functions that generate
>>more than 30-40 instructions out of line. 100 instructions
>>is certainly enough to amortize the call overhead, and you
>>save some icache too, so it may even be faster.
>>    
>>
>
>Of course, but it would be good to see some measurements.
>
>
>- James
>  
>
It depends a lot on the workload of the machine.

If this is a specialized machine, with a small program that mostly uses 
recv() & send() syscalls, then inlining these functions is a gain, since 
the icache may have a 100% hit ratio. Optimization guidelines are good for 
the common case, not every case.

Eric Dumazet


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Large inlines in include/linux/skbuff.h
       [not found]   ` <200404221756.46240.vda@port.imtp.ilyichevsk.odessa.ua>
@ 2004-04-22 18:36     ` David S. Miller
       [not found]       ` <200404222239.34760.vda@port.imtp.ilyichevsk.odessa.ua>
  0 siblings, 1 reply; 13+ messages in thread
From: David S. Miller @ 2004-04-22 18:36 UTC (permalink / raw)
  To: Denis Vlasenko; +Cc: jmorris, jgarzik, linux-kernel

On Thu, 22 Apr 2004 17:56:46 +0300
Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> wrote:

> On Thursday 22 April 2004 03:47, James Morris wrote:
> > On Wed, 21 Apr 2004, Denis Vlasenko wrote:
> > > What shall be done about this? I'll make a patch to move the locking
> > > functions into net/core/skbuff.c unless there is some reason not to.
> >
> > How will these changes impact performance?  I asked this last time you
> > posted about inlines and didn't see any response.
> 
> Hard to say. We will lose ~2% to extra call/return instructions
> but will gain 40kb of icache space.

He's asking you to test and quantify the performance changes that
occur as a result of this patch, not to figure it out via calculations
in your head :-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Large inlines in include/linux/skbuff.h
       [not found]         ` <200404221927.34717.vda@port.imtp.ilyichevsk.odessa.ua>
@ 2004-04-22 19:24           ` Eric Dumazet
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2004-04-22 19:24 UTC (permalink / raw)
  To: Denis Vlasenko; +Cc: James Morris, Andi Kleen, linux-kernel

Denis Vlasenko wrote:

>
>>
>>If this is a specialized machine, with a small program that mostly uses
>>recv() & send() syscalls, then inlining these functions is a gain, since
>>the icache may have a 100% hit ratio. Optimization guidelines are good for
>>the common case, not every case.
>>    
>>
>
>And if it is NOT a specialized machine? An icache miss will nullify
>any speed advantage, while you still pay the space penalty.
>We don't need to pay the ~250 kbytes wasted overall
>in an allyesconfig kernel for each and every specialized
>setup conceivable, IMHO.
>  
>
The point is: if this IS a specialized machine, then the kernel is a 
custom one, not allyesconfig.

That is what I do for specialized machines, and yes, I even inline 
some specific functions, like fget() and others.

But I didn't ask you not to do the un-inlining; I was just pointing out 
that some people (like me) do micro-optimizations that 'seem' to go 
against the optimization guidelines from Intel or AMD.

Eric


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Large inlines in include/linux/skbuff.h
       [not found]     ` <1NRqX-2GI-13@gated-at.bofh.it>
@ 2004-04-22 19:51       ` Andi Kleen
  0 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2004-04-22 19:51 UTC (permalink / raw)
  To: David S. Miller; +Cc: vda, linux-kernel

"David S. Miller" <davem@redhat.com> writes:
>
> He's asking you to test and quantify the performance changes that
> occur as a result of this patch, not to figure it out via calculations
> in your head :-)

I bet it will be completely unnoticeable in macrobenchmarks.

The only way to measure a difference would likely be some unrealistic
microbenchmark (a program that calls this in a tight loop).

-Andi


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Large inlines in include/linux/skbuff.h
       [not found]       ` <200404222239.34760.vda@port.imtp.ilyichevsk.odessa.ua>
@ 2004-04-22 22:10         ` David S. Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David S. Miller @ 2004-04-22 22:10 UTC (permalink / raw)
  To: Denis Vlasenko; +Cc: jmorris, jgarzik, linux-kernel

On Thu, 22 Apr 2004 22:39:34 +0300
Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> wrote:

> Okay. I am willing do it. Hmm...
> Which type of test will highlight the difference?
> I don't have gigabit network to play with. Any hints of how can I measure
> it on 100mbit?

Get someone to run specweb before and after your changes.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Large inlines in include/linux/skbuff.h
  2004-04-22 15:15     ` James Morris
  2004-04-22 15:37       ` Eric Dumazet
@ 2004-04-22 22:23       ` Andi Kleen
  2004-04-22 22:25         ` David S. Miller
  1 sibling, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2004-04-22 22:23 UTC (permalink / raw)
  To: James Morris; +Cc: Andi Kleen, linux-kernel, vda

On Thu, Apr 22, 2004 at 11:15:47AM -0400, James Morris wrote:
> On Thu, 22 Apr 2004, Andi Kleen wrote:
> 
> > > How will these changes impact performance?  I asked this last time you 
> > > posted about inlines and didn't see any response.
> > 
> > I don't think it will be an issue. The optimization guidelines
> > from AMD and Intel recommend moving functions that generate
> > more than 30-40 instructions out of line. 100 instructions
> > is certainly enough to amortize the call overhead, and you
> > save some icache too, so it may even be faster.
> 
> Of course, but it would be good to see some measurements.

It's useless in this case. Networking is dominated by cache misses
and locks, and a modern CPU can do hundreds of function calls in the
time of a single cache miss.

-Andi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Large inlines in include/linux/skbuff.h
  2004-04-22 22:23       ` Andi Kleen
@ 2004-04-22 22:25         ` David S. Miller
  2004-04-22 22:34           ` Andi Kleen
  0 siblings, 1 reply; 13+ messages in thread
From: David S. Miller @ 2004-04-22 22:25 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jmorris, ak, linux-kernel, vda

On 23 Apr 2004 00:23:26 +0200
Andi Kleen <ak@muc.de> wrote:

> > Of course, but it would be good to see some measurements.
> 
> It's useless in this case. Networking is dominated by cache misses
> and locks, and a modern CPU can do hundreds of function calls in the
> time of a single cache miss.

Indeed, and this change does influence the I-cache footprint of the
networking code, so I conclude that it is relevant in this case.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Large inlines in include/linux/skbuff.h
  2004-04-22 22:25         ` David S. Miller
@ 2004-04-22 22:34           ` Andi Kleen
  2004-04-22 22:35             ` David S. Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2004-04-22 22:34 UTC (permalink / raw)
  To: David S. Miller; +Cc: Andi Kleen, jmorris, linux-kernel, vda

On Thu, Apr 22, 2004 at 03:25:25PM -0700, David S. Miller wrote:
> On 23 Apr 2004 00:23:26 +0200
> Andi Kleen <ak@muc.de> wrote:
> 
> > > Of course, but it would be good to see some measurements.
> > 
> > It's useless in this case. Networking is dominated by cache misses
> > and locks, and a modern CPU can do hundreds of function calls in the
> > time of a single cache miss.
> 
> Indeed, this change does influence the I-cache footprint of the
> networking, so I conclude that it is relevant in this case.

All it does is lower the cache footprint, not enlarge it.
So even if it made a difference, things could only get better.
It's extremely unlikely to make anything detectably worse.

-Andi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Large inlines in include/linux/skbuff.h
  2004-04-22 22:34           ` Andi Kleen
@ 2004-04-22 22:35             ` David S. Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David S. Miller @ 2004-04-22 22:35 UTC (permalink / raw)
  To: Andi Kleen; +Cc: ak, jmorris, linux-kernel, vda

On 23 Apr 2004 00:34:01 +0200
Andi Kleen <ak@muc.de> wrote:

> All it does is lower the cache footprint, not enlarge it.
> So even if it made a difference, things could only get better.
> It's extremely unlikely to make anything detectably worse.

That's a good argument.  I would still like to see some
numbers, you know, to help me sleep at night :-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2004-04-22 22:36 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1NvJL-1QO-9@gated-at.bofh.it>
     [not found] ` <1NAJr-61F-3@gated-at.bofh.it>
2004-04-22 14:59   ` Large inlines in include/linux/skbuff.h Andi Kleen
2004-04-22 15:15     ` James Morris
2004-04-22 15:37       ` Eric Dumazet
     [not found]         ` <200404221927.34717.vda@port.imtp.ilyichevsk.odessa.ua>
2004-04-22 19:24           ` Eric Dumazet
2004-04-22 22:23       ` Andi Kleen
2004-04-22 22:25         ` David S. Miller
2004-04-22 22:34           ` Andi Kleen
2004-04-22 22:35             ` David S. Miller
     [not found]   ` <1NRqX-2GI-15@gated-at.bofh.it>
     [not found]     ` <1NRqX-2GI-13@gated-at.bofh.it>
2004-04-22 19:51       ` Andi Kleen
2004-04-21 19:26 Denis Vlasenko
2004-04-22  0:47 ` James Morris
     [not found]   ` <200404221756.46240.vda@port.imtp.ilyichevsk.odessa.ua>
2004-04-22 18:36     ` David S. Miller
     [not found]       ` <200404222239.34760.vda@port.imtp.ilyichevsk.odessa.ua>
2004-04-22 22:10         ` David S. Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox