From: Eric Dumazet <dada1@cosmosbay.com>
To: Thomas Graf <tgraf@suug.ch>
Cc: "David S. Miller" <davem@davemloft.net>, netdev@oss.sgi.com
Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c
Date: Tue, 05 Jul 2005 15:04:21 +0200
Message-ID: <42CA8555.9050607@cosmosbay.com>
In-Reply-To: <20050705115108.GE16076@postel.suug.ch>

Thomas Graf wrote:
> * Eric Dumazet <42CA390C.9000801@cosmosbay.com> 2005-07-05 09:38
> 
>>[NET] : unroll a small loop in pfifo_fast_dequeue(). Compiler generates 
>>better code.
>>	(Using skb_queue_empty() to test the queue is faster than trying to 
>>	__skb_dequeue())
>>	oprofile says this function uses now 0.29% instead of 1.22 %, on a 
>>	x86_64 target.
> 
> 
> I think this patch is pretty much pointless. __skb_dequeue() and
> !skb_queue_empty() should produce almost the same code and as soon
> as you disable profiling and debugging you'll see that the compiler
> unrolls the loop itself if possible.
> 
> 

OK. At least my compiler (gcc-3.3.1) does NOT unroll the loop.

The original 2.6.12 code gives:

ffffffff802a9790 <pfifo_fast_dequeue>: /* pfifo_fast_dequeue total: 2904054  1.9531 */
258371  0.1738 :ffffffff802a9790:       lea    0xc0(%rdi),%rcx
273669  0.1841 :ffffffff802a9797:       xor    %esi,%esi
  12533  0.0084 :ffffffff802a9799:       mov    (%rcx),%rdx
292315  0.1966 :ffffffff802a979c:       cmp    %rcx,%rdx
  11717  0.0079 :ffffffff802a979f:       je     ffffffff802a97d1 <pfifo_fast_dequeue+0x41>
   4474  0.0030 :ffffffff802a97a1:       mov    %rdx,%rax
   6238  0.0042 :ffffffff802a97a4:       mov    (%rdx),%rdx
     41 2.8e-05 :ffffffff802a97a7:       decl   0x10(%rcx)
   6089  0.0041 :ffffffff802a97aa:       test   %rax,%rax
    126 8.5e-05 :ffffffff802a97ad:       movq   $0x0,0x10(%rax)
     39 2.6e-05 :ffffffff802a97b5:       mov    %rcx,0x8(%rdx)
   6974  0.0047 :ffffffff802a97b9:       mov    %rdx,(%rcx)
   2841  0.0019 :ffffffff802a97bc:       movq   $0x0,0x8(%rax)
    366 2.5e-04 :ffffffff802a97c4:       movq   $0x0,(%rax)
  14757  0.0099 :ffffffff802a97cb:       je     ffffffff802a97d1 <pfifo_fast_dequeue+0x41>
    288 1.9e-04 :ffffffff802a97cd:       decl   0x40(%rdi)
     94 6.3e-05 :ffffffff802a97d0:       retq
970400  0.6526 :ffffffff802a97d1:       inc    %esi
982402  0.6607 :ffffffff802a97d3:       add    $0x18,%rcx
      4 2.7e-06 :ffffffff802a97d7:       cmp    $0x2,%esi
      1 6.7e-07 :ffffffff802a97da:       jle    ffffffff802a9799 <pfifo_fast_dequeue+0x9>
  59754  0.0402 :ffffffff802a97dc:       xor    %eax,%eax
    561 3.8e-04 :ffffffff802a97de:       data16
                :ffffffff802a97df:       nop
                :ffffffff802a97e0:       retq


And the new code (2.6.12-ed):

ffffffff802b1020 <pfifo_fast_dequeue>: /* pfifo_fast_dequeue total: 153139  0.2934 */
  27388  0.0525 :ffffffff802b1020:       lea    0xc0(%rdi),%rdx
  42091  0.0806 :ffffffff802b1027:       cmp    %rdx,0xc0(%rdi)
                :ffffffff802b102e:       jne    ffffffff802b1052 <pfifo_fast_dequeue+0x32>
    474 9.1e-04 :ffffffff802b1030:       lea    0xd8(%rdi),%rdx
   5571  0.0107 :ffffffff802b1037:       cmp    %rdx,0xd8(%rdi)
      2 3.8e-06 :ffffffff802b103e:       jne    ffffffff802b1052 <pfifo_fast_dequeue+0x32>
      1 1.9e-06 :ffffffff802b1040:       lea    0xf0(%rdi),%rdx
  20030  0.0384 :ffffffff802b1047:       xor    %eax,%eax
      6 1.1e-05 :ffffffff802b1049:       cmp    %rdx,0xf0(%rdi)
      6 1.1e-05 :ffffffff802b1050:       je     ffffffff802b1086 <pfifo_fast_dequeue+0x66>
                :ffffffff802b1052:       mov    (%rdx),%rcx
  11796  0.0226 :ffffffff802b1055:       xor    %eax,%eax
                :ffffffff802b1057:       cmp    %rdx,%rcx
      8 1.5e-05 :ffffffff802b105a:       je     ffffffff802b1083 <pfifo_fast_dequeue+0x63>
   3146  0.0060 :ffffffff802b105c:       mov    %rcx,%rax
     12 2.3e-05 :ffffffff802b105f:       mov    (%rcx),%rcx
    118 2.3e-04 :ffffffff802b1062:       decl   0x10(%rdx)
   4924  0.0094 :ffffffff802b1065:       movq   $0x0,0x10(%rax)
     65 1.2e-04 :ffffffff802b106d:       mov    %rdx,0x8(%rcx)
    725  0.0014 :ffffffff802b1071:       mov    %rcx,(%rdx)
  11493  0.0220 :ffffffff802b1074:       movq   $0x0,0x8(%rax)
    194 3.7e-04 :ffffffff802b107c:       movq   $0x0,(%rax)
   2995  0.0057 :ffffffff802b1083:       decl   0x40(%rdi)
  19607  0.0376 :ffffffff802b1086:       nop
   2487  0.0048 :ffffffff802b1087:       retq


Please show us the code your compiler produces, and explain to me how disabling oprofile can change the generated assembly. :)
Debugging has no impact on this code either.
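
For readers comparing the two listings, here is a minimal user-space sketch of the two shapes of the function. It is not the kernel source and not the patch itself: `struct node`, `struct queue`, `queue_empty()`, `dequeue()`, `dequeue_loop()` and `dequeue_unrolled()` are hypothetical stand-ins for `struct sk_buff`, `struct sk_buff_head`, `skb_queue_empty()`, `__skb_dequeue()` and the two variants of `pfifo_fast_dequeue()`. The only facts taken from the listings are that there are three bands and that the empty test is a comparison of the list head against its own `next` pointer.

```c
#include <assert.h>
#include <stddef.h>

#define NBANDS 3 /* three priority bands, as in the listings above */

/* Hypothetical stand-in for struct sk_buff: a node in a circular list. */
struct node {
    struct node *next, *prev;
};

/* Hypothetical stand-in for struct sk_buff_head: sentinel plus length. */
struct queue {
    struct node head;
    unsigned int qlen;
};

static void queue_init(struct queue *q)
{
    q->head.next = q->head.prev = &q->head;
    q->qlen = 0;
}

/* Cheap test: one load and one compare, no list manipulation. */
static int queue_empty(const struct queue *q)
{
    return q->head.next == &q->head;
}

static void enqueue_tail(struct queue *q, struct node *n)
{
    n->next = &q->head;
    n->prev = q->head.prev;
    q->head.prev->next = n;
    q->head.prev = n;
    q->qlen++;
}

/* Unlinks and returns the first node, or NULL if the queue is empty. */
static struct node *dequeue(struct queue *q)
{
    struct node *n = q->head.next;

    if (n == &q->head)
        return NULL;
    q->head.next = n->next;
    n->next->prev = &q->head;
    n->next = n->prev = NULL;
    q->qlen--;
    return n;
}

/* Loop form: try a full dequeue on each band in priority order. */
static struct node *dequeue_loop(struct queue bands[NBANDS])
{
    for (int prio = 0; prio < NBANDS; prio++) {
        struct node *n = dequeue(&bands[prio]);

        if (n)
            return n;
    }
    return NULL;
}

/* Unrolled form: pick the first non-empty band with the cheap test,
 * then dequeue once from that band. */
static struct node *dequeue_unrolled(struct queue bands[NBANDS])
{
    struct queue *q = &bands[0];

    if (queue_empty(q)) {
        q = &bands[1];
        if (queue_empty(q)) {
            q = &bands[2];
            if (queue_empty(q))
                return NULL;
        }
    }
    return dequeue(q);
}
```

Both functions return the same node for the same input; whether a given compiler turns the loop form into the unrolled form by itself is exactly the point under discussion, and can be checked by compiling both and comparing the disassembly.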

Thank you

Eric

