All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Gallatin <gallatin@myri.com>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
	David Miller <davem@davemloft.net>,
	brice@myri.com, sgruszka@redhat.com, netdev@vger.kernel.org
Subject: Re: [PATCH] myr10ge: again fix lro_gen_skb() alignment
Date: Wed, 29 Apr 2009 13:28:58 -0400	[thread overview]
Message-ID: <49F88E5A.8070908@myri.com> (raw)
In-Reply-To: <49F87188.9000904@cosmosbay.com>

Eric Dumazet wrote:

 >
 > Sure, probably more cache misses or something...

Yes, that's what I thought.  The code is much more complete,
and spread out than LRO, and seems to open itself to cache
misses.

 > You could try a longer oprofile session (with at least one million 
samples)
 > and :
 >
 > opannotate -a vmlinux >/tmp/FILE
 >
 > And select 3 or 4 suspect functions : inet_gro_receive() 
tcp_gro_receive(),
 > skb_gro_receive(), skb_gro_header()

Here is the opreport -l output from this machine for GRO for a 25 minute
profiling run:


samples  %        image name               app name 
symbol name
3742674  32.2793  vmlinux                  vmlinux 
copy_user_generic_string
890179    7.6775  myri10ge.ko              myri10ge 
myri10ge_poll
547572    4.7226  vmlinux                  vmlinux 
inet_gro_receive
477479    4.1181  vmlinux                  vmlinux 
skb_gro_receive
406562    3.5065  vmlinux                  vmlinux 
free_hot_cold_page
396796    3.4222  vmlinux                  vmlinux 
tcp_gro_receive
332364    2.8665  vmlinux                  vmlinux 
__rmqueue_smallest
319455    2.7552  vmlinux                  vmlinux 
skb_gro_header
269040    2.3204  vmlinux                  vmlinux 
dev_gro_receive
252885    2.1810  vmlinux                  vmlinux 
free_pages_bulk
247832    2.1375  vmlinux                  vmlinux 
get_pageblock_flags_group
211592    1.8249  myri10ge.ko              myri10ge 
myri10ge_alloc_rx_pages
208867    1.8014  vmlinux                  vmlinux 
__list_add
201491    1.7378  vmlinux                  vmlinux 
tcp4_gro_receive
187591    1.6179  vmlinux                  vmlinux 
__napi_gro_receive
170156    1.4675  vmlinux                  vmlinux 
get_page_from_freelist
116321    1.0032  vmlinux                  vmlinux                  list_del
107994    0.9314  vmlinux                  vmlinux                  kfree
106434    0.9180  vmlinux                  vmlinux 
skb_copy_datagram_iovec
100675    0.8683  vmlinux                  vmlinux                  put_page

And is here is the opannotate -a output for a few GRO functions.  BTW, 
did you mean -s
rather than -a?   I'd naively think source might be more helpful.  But 
here is
what you asked for:

ffffffff80479f20 <inet_gro_receive>: /* inet_gro_receive total: 547572 
5.2554 */
  12187  0.1170 :ffffffff80479f20:       push   %r13
   2611  0.0251 :ffffffff80479f22:       mov    %rdi,%r13
                :ffffffff80479f25:       push   %r12
                :ffffffff80479f27:       push   %rbp
   4031  0.0387 :ffffffff80479f28:       push   %rbx
                :ffffffff80479f29:       mov    %rsi,%rbx
                :ffffffff80479f2c:       mov    $0x14,%esi
   6303  0.0605 :ffffffff80479f31:       mov    %rbx,%rdi
                :ffffffff80479f34:       sub    $0x8,%rsp
                :ffffffff80479f38:       callq  ffffffff804357a1 
<skb_gro_header>
                :ffffffff80479f3d:       test   %rax,%rax
   2494  0.0239 :ffffffff80479f40:       mov    %rax,%r8
                :ffffffff80479f43:       je     ffffffff8047a0a4 
<inet_gro_receive+0x184>
                :ffffffff80479f49:       movzbl 0x9(%rax),%eax
   2541  0.0244 :ffffffff80479f4d:       mov 
0xffffffff80d06280(,%rax,8),%r11
     33 3.2e-04 :ffffffff80479f55:       test   %r11,%r11
      5 4.8e-05 :ffffffff80479f58:       je     ffffffff8047a0a4 
<inet_gro_receive+0x184>
  11016  0.1057 :ffffffff80479f5e:       cmpq   $0x0,0x20(%r11)
    292  0.0028 :ffffffff80479f63:       je     ffffffff8047a0a4 
<inet_gro_receive+0x184>
      1 9.6e-06 :ffffffff80479f69:       cmpb   $0x45,(%r8)
   4297  0.0412 :ffffffff80479f6d:       jne    ffffffff8047a0a4 
<inet_gro_receive+0x184>
   6086  0.0584 :ffffffff80479f73:       mov    $0x5,%eax
                :ffffffff80479f78:       mov    %r8,%rcx
  18706  0.1795 :ffffffff80479f7b:       mov    (%rcx),%edx
    341  0.0033 :ffffffff80479f7d:       sub    $0x4,%eax
                :ffffffff80479f80:       jbe    ffffffff80479fa6 
<inet_gro_receive+0x86>
   4609  0.0442 :ffffffff80479f82:       add    0x4(%rcx),%edx
    398  0.0038 :ffffffff80479f85:       adc    0x8(%rcx),%edx
                :ffffffff80479f88:       adc    0xc(%rcx),%edx
   4310  0.0414 :ffffffff80479f8b:       adc    0x10(%rcx),%edx
    790  0.0076 :ffffffff80479f8e:       lea    0x4(%rcx),%rcx
                :ffffffff80479f92:       dec    %eax
   9097  0.0873 :ffffffff80479f94:       jne    ffffffff80479f8b 
<inet_gro_receive+0x6b>
    541  0.0052 :ffffffff80479f96:       adc    $0x0,%edx
                :ffffffff80479f99:       mov    %edx,%eax
   1919  0.0184 :ffffffff80479f9b:       shr    $0x10,%edx
    535  0.0051 :ffffffff80479f9e:       add    %ax,%dx
                :ffffffff80479fa1:       adc    $0x0,%edx
   3633  0.0349 :ffffffff80479fa4:       not    %edx
    683  0.0066 :ffffffff80479fa6:       test   %dx,%dx
      1 9.6e-06 :ffffffff80479fa9:       jne    ffffffff8047a0a4 
<inet_gro_receive+0x184>
   4725  0.0453 :ffffffff80479faf:       movzwl 0x2(%r8),%eax
   9728  0.0934 :ffffffff80479fb4:       mov    0x68(%rbx),%edx
      8 7.7e-05 :ffffffff80479fb7:       mov    $0x1,%ebp
  43000  0.4127 :ffffffff80479fbc:       sub    0x38(%rbx),%edx
  11149  0.1070 :ffffffff80479fbf:       mov    %eax,%ecx
                :ffffffff80479fc1:       shl    $0x8,%eax
  66497  0.6382 :ffffffff80479fc4:       shr    $0x8,%ecx
    735  0.0071 :ffffffff80479fc7:       or     %ecx,%eax
                :ffffffff80479fc9:       movzwl %ax,%eax
   5459  0.0524 :ffffffff80479fcc:       cmp    %edx,%eax
    522  0.0050 :ffffffff80479fce:       jne    ffffffff80479fdc 
<inet_gro_receive+0xbc>
                :ffffffff80479fd0:       xor    %ebp,%ebp
   5373  0.0516 :ffffffff80479fd2:       cmpw   $0x40,0x6(%r8)
    345  0.0033 :ffffffff80479fd8:       setne  %bpl
                :ffffffff80479fdc:       movzwl 0x4(%r8),%eax
   2384  0.0229 :ffffffff80479fe1:       mov    0x0(%r13),%r10
    631  0.0061 :ffffffff80479fe5:       mov    %eax,%edx
                :ffffffff80479fe7:       shl    $0x8,%eax
   3044  0.0292 :ffffffff80479fea:       shr    $0x8,%edx
    303  0.0029 :ffffffff80479fed:       or     %edx,%eax
                :ffffffff80479fef:       movzwl %ax,%r12d
   2747  0.0264 :ffffffff80479ff3:       jmp    ffffffff8047a071 
<inet_gro_receive+0x151>
   2109  0.0202 :ffffffff80479ff5:       lea    0x38(%r10),%r9
     12 1.2e-04 :ffffffff80479ff9:       cmpl   $0x0,0x4(%r9)
     23 2.2e-04 :ffffffff80479ffe:       je     ffffffff8047a06e 
<inet_gro_receive+0x14e>
   2104  0.0202 :ffffffff8047a000:       mov    0xac(%r10),%edi
      2 1.9e-05 :ffffffff8047a007:       add    0xc0(%r10),%rdi
                :ffffffff8047a00e:       mov    0x9(%rdi),%sil
   2391  0.0229 :ffffffff8047a012:       mov    0x1(%rdi),%al
      2 1.9e-05 :ffffffff8047a015:       xor    0x9(%r8),%sil
      7 6.7e-05 :ffffffff8047a019:       xor    0x1(%r8),%al
   2101  0.0202 :ffffffff8047a01d:       mov    0xc(%rdi),%edx
      1 9.6e-06 :ffffffff8047a020:       mov    0x10(%rdi),%ecx
                :ffffffff8047a023:       xor    0xc(%r8),%edx
   2775  0.0266 :ffffffff8047a027:       xor    0x10(%r8),%ecx
                :ffffffff8047a02b:       or     %esi,%eax
                :ffffffff8047a02d:       movzbl %al,%eax
  62734  0.6021 :ffffffff8047a030:       or     %edx,%ecx
                :ffffffff8047a032:       or     %eax,%ecx
                :ffffffff8047a034:       je     ffffffff8047a040 
<inet_gro_receive+0x120>
                :ffffffff8047a036:       movl   $0x0,0x4(%r9)
                :ffffffff8047a03e:       jmp    ffffffff8047a06e 
<inet_gro_receive+0x14e>
   2106  0.0202 :ffffffff8047a040:       movzwl 0x4(%rdi),%edx
                :ffffffff8047a044:       mov    0x8(%rdi),%al
                :ffffffff8047a047:       xor    0x8(%r8),%eax
  64244  0.6166 :ffffffff8047a04b:       mov    %edx,%ecx
                :ffffffff8047a04d:       shl    $0x8,%edx
                :ffffffff8047a050:       shr    $0x8,%ecx
   2072  0.0199 :ffffffff8047a053:       movzbl %al,%eax
                :ffffffff8047a056:       or     0x8(%r9),%eax
                :ffffffff8047a05a:       or     %ecx,%edx
   2629  0.0252 :ffffffff8047a05c:       add    0xc(%r9),%edx
      2 1.9e-05 :ffffffff8047a060:       movzwl %dx,%edx
                :ffffffff8047a063:       xor    %r12d,%edx
  58223  0.5588 :ffffffff8047a066:       or     %edx,%eax
      3 2.9e-05 :ffffffff8047a068:       or     %ebp,%eax
                :ffffffff8047a06a:       mov    %eax,0x8(%r9)
  21878  0.2100 :ffffffff8047a06e:       mov    (%r10),%r10
   2156  0.0207 :ffffffff8047a071:       test   %r10,%r10
                :ffffffff8047a074:       jne    ffffffff80479ff5 
<inet_gro_receive+0xd5>
   3007  0.0289 :ffffffff8047a07a:       mov    0x38(%rbx),%eax
     61 5.9e-04 :ffffffff8047a07d:       or     %ebp,0x40(%rbx)
      3 2.9e-05 :ffffffff8047a080:       mov    %rbx,%rsi
   3091  0.0297 :ffffffff8047a083:       mov    %r13,%rdi
     41 3.9e-04 :ffffffff8047a086:       add    $0x14,%eax
                :ffffffff8047a089:       mov    %eax,0x38(%rbx)
   3704  0.0355 :ffffffff8047a08c:       sub    0xc0(%rbx),%eax
     33 3.2e-04 :ffffffff8047a092:       add    0xc8(%rbx),%eax
                :ffffffff8047a098:       mov    %eax,0xa8(%rbx)
   2468  0.0237 :ffffffff8047a09e:       callq  *0x20(%r11)
  20011  0.1921 :ffffffff8047a0a2:       jmp    ffffffff8047a0ab 
<inet_gro_receive+0x18b>
                :ffffffff8047a0a4:       xor    %eax,%eax
                :ffffffff8047a0a6:       mov    $0x1,%ebp
  24082  0.2311 :ffffffff8047a0ab:       or     %ebp,0x40(%rbx)
    626  0.0060 :ffffffff8047a0ae:       pop    %r10
   1718  0.0165 :ffffffff8047a0b0:       pop    %rbx
    446  0.0043 :ffffffff8047a0b1:       pop    %rbp
   4074  0.0391 :ffffffff8047a0b2:       pop    %r12
   2089  0.0200 :ffffffff8047a0b4:       pop    %r13
    434  0.0042 :ffffffff8047a0b6:       retq




ffffffff80430ea9 <skb_gro_receive>: /* skb_gro_receive total: 477479 
4.5827 */
   2158  0.0207 :ffffffff80430ea9:       push   %r15
   2492  0.0239 :ffffffff80430eab:       mov    %rdi,%r15
                :ffffffff80430eae:       push   %r14
                :ffffffff80430eb0:       push   %r13
   2432  0.0233 :ffffffff80430eb2:       push   %r12
      1 9.6e-06 :ffffffff80430eb4:       push   %rbp
      1 9.6e-06 :ffffffff80430eb5:       mov    %rsi,%rbp
   2430  0.0233 :ffffffff80430eb8:       push   %rbx
                :ffffffff80430eb9:       sub    $0x8,%rsp
                :ffffffff80430ebd:       mov    0x68(%rsi),%ecx
   2420  0.0232 :ffffffff80430ec0:       mov    (%rdi),%r12
      1 9.6e-06 :ffffffff80430ec3:       mov    %ecx,%r14d
      1 9.6e-06 :ffffffff80430ec6:       sub    0x38(%rsi),%r14d
   2317  0.0222 :ffffffff80430eca:       mov    %r14d,%eax
      1 9.6e-06 :ffffffff80430ecd:       add    0x68(%r12),%eax
      1 9.6e-06 :ffffffff80430ed2:       cmp    $0xffff,%eax
   3865  0.0371 :ffffffff80430ed7:       ja     ffffffff80431261 
<skb_gro_receive+0x3b8>
                :ffffffff80430edd:       mov    0xb8(%r12),%eax
                :ffffffff80430ee5:       mov    0xc0(%r12),%rdx
   8082  0.0776 :ffffffff80430eed:       lea    (%rdx,%rax,1),%rsi
                :ffffffff80430ef1:       cmpq   $0x0,0x18(%rsi)
      2 1.9e-05 :ffffffff80430ef6:       jne    ffffffff804311ab 
<skb_gro_receive+0x302>
   9249  0.0888 :ffffffff80430efc:       mov    %ecx,%edi
                :ffffffff80430efe:       sub    0x6c(%rbp),%edi
      6 5.8e-05 :ffffffff80430f01:       cmp    0x38(%rbp),%edi
   3104  0.0298 :ffffffff80430f04:       ja     ffffffff80430fe2 
<skb_gro_receive+0x139>
      2 1.9e-05 :ffffffff80430f0a:       mov    0xb8(%rbp),%ecx
                :ffffffff80430f10:       movzwl 0x4(%rsi),%edx
   8825  0.0847 :ffffffff80430f14:       add    0xc0(%rbp),%rcx
                :ffffffff80430f1b:       movzwl 0x4(%rcx),%eax
     21 2.0e-04 :ffffffff80430f1f:       add    %edx,%eax
  19668  0.1888 :ffffffff80430f21:       cmp    $0x12,%eax
      1 9.6e-06 :ffffffff80430f24:       ja     ffffffff80431261 
<skb_gro_receive+0x3b8>
                :ffffffff80430f2a:       mov    0x38(%rcx),%eax
   1974  0.0189 :ffffffff80430f2d:       add    0x38(%rbp),%eax
                :ffffffff80430f30:       cld
                :ffffffff80430f31:       sub    %edi,%eax
   7666  0.0736 :ffffffff80430f33:       mov    %eax,0x38(%rcx)
      2 1.9e-05 :ffffffff80430f36:       mov    0xb8(%rbp),%edx
                :ffffffff80430f3c:       add    0xc0(%rbp),%rdx
  52468  0.5036 :ffffffff80430f43:       mov    0x3c(%rdx),%eax
      2 1.9e-05 :ffffffff80430f46:       add    0x68(%rbp),%eax
      1 9.6e-06 :ffffffff80430f49:       sub    0x6c(%rbp),%eax
   6592  0.0633 :ffffffff80430f4c:       sub    0x38(%rbp),%eax
                :ffffffff80430f4f:       mov    %eax,0x3c(%rdx)
                :ffffffff80430f52:       mov    0xb8(%r12),%eax
  23018  0.2209 :ffffffff80430f5a:       add    0xc0(%r12),%rax
      1 9.6e-06 :ffffffff80430f62:       mov    0xb8(%rbp),%esi
                :ffffffff80430f68:       add    0xc0(%rbp),%rsi
   8477  0.0814 :ffffffff80430f6f:       movzwl 0x4(%rax),%edi
      6 5.8e-05 :ffffffff80430f73:       movzwl 0x4(%rsi),%ecx
                :ffffffff80430f77:       add    $0x30,%rsi
  21338  0.2048 :ffffffff80430f7b:       shl    $0x4,%rdi
      3 2.9e-05 :ffffffff80430f7f:       lea    0x30(%rdi,%rax,1),%rdi
      1 9.6e-06 :ffffffff80430f84:       shl    $0x4,%rcx
150632  1.4457 :ffffffff80430f88:       rep movsb %ds:(%rsi),%es:(%rdi)
   3988  0.0383 :ffffffff80430f8a:       mov    0xb8(%r12),%eax
   2015  0.0193 :ffffffff80430f92:       mov    0xb8(%rbp),%ecx
     11 1.1e-04 :ffffffff80430f98:       add    0xc0(%r12),%rax
      8 7.7e-05 :ffffffff80430fa0:       mov    0xc0(%rbp),%rdx
   3295  0.0316 :ffffffff80430fa7:       mov    0x4(%rdx,%rcx,1),%edx
                :ffffffff80430fab:       add    %dx,0x4(%rax)
      8 7.7e-05 :ffffffff80430faf:       mov    0xb8(%rbp),%edx
   2507  0.0241 :ffffffff80430fb5:       mov    0xc0(%rbp),%rax
                :ffffffff80430fbc:       movw   $0x0,0x4(%rax,%rdx,1)
   3233  0.0310 :ffffffff80430fc3:       mov    0x6c(%rbp),%eax
      1 9.6e-06 :ffffffff80430fc6:       sub    %eax,0xd0(%rbp)
                :ffffffff80430fcc:       sub    %eax,0x68(%rbp)
  41540  0.3987 :ffffffff80430fcf:       movl   $0x0,0x6c(%rbp)
                :ffffffff80430fd6:       movl   $0x1,0x48(%rbp)
                :ffffffff80430fdd:       jmpq   ffffffff8043123f 
<skb_gro_receive+0x396>
                :ffffffff80430fe2:       mov    0xc8(%r12),%rax
                :ffffffff80430fea:       mov    0x20(%r12),%rdi
                :ffffffff80430fef:       mov    %eax,%r13d
                :ffffffff80430ff2:       sub    %edx,%r13d
                :ffffffff80430ff5:       mov    $0x20,%edx
                :ffffffff80430ffa:       mov    %r13d,%esi
                :ffffffff80430ffd:       add    0x38(%r12),%esi
                :ffffffff80431002:       callq  ffffffff8042ffe0 
<__netdev_alloc_skb>
                :ffffffff80431007:       mov    %rax,%rbx
                :ffffffff8043100a:       mov    $0xfffffff4,%eax
                :ffffffff8043100f:       test   %rbx,%rbx
                :ffffffff80431012:       je     ffffffff80431266 
<skb_gro_receive+0x3bd>
                :ffffffff80431018:       mov    %r12,%rsi
                :ffffffff8043101b:       mov    %rbx,%rdi
                :ffffffff8043101e:       callq  ffffffff8042e2c0 
<__copy_skb_header>
                :ffffffff80431023:       mov    0x70(%r12),%eax
                :ffffffff80431028:       add    %r13d,0xb4(%rbx)
                :ffffffff8043102f:       mov    %ax,0x70(%rbx)
                :ffffffff80431033:       movslq %r13d,%rax
                :ffffffff80431036:       add    %rax,0xc8(%rbx)
                :ffffffff8043103d:       cmpl   $0x0,0x6c(%rbx)
                :ffffffff80431041:       mov    0x38(%r12),%edx
                :ffffffff80431046:       mov    0xb4(%rbx),%eax
                :ffffffff8043104c:       je     ffffffff80431052 
<skb_gro_receive+0x1a9>
                :ffffffff8043104e:       ud2a
                :ffffffff80431050:       jmp    ffffffff80431050 
<skb_gro_receive+0x1a7>
                :ffffffff80431052:       lea    (%rdx,%rax,1),%eax
                :ffffffff80431055:       add    %edx,0x68(%rbx)
                :ffffffff80431058:       mov    0xc8(%r12),%rcx
                :ffffffff80431060:       mov    0xc8(%rbx),%rdx
                :ffffffff80431067:       sub    0xc0(%rbx),%edx
                :ffffffff8043106d:       mov    %eax,0xb4(%rbx)
                :ffffffff80431073:       mov    0xb0(%r12),%eax
                :ffffffff8043107b:       add    0xc0(%r12),%rax
                :ffffffff80431083:       sub    %ecx,%eax
                :ffffffff80431085:       add    %edx,%eax
                :ffffffff80431087:       mov    %eax,0xb0(%rbx)
                :ffffffff8043108d:       mov    0xac(%r12),%eax
                :ffffffff80431095:       add    0xc0(%r12),%rax
                :ffffffff8043109d:       sub    %ecx,%eax
                :ffffffff8043109f:       add    %edx,%eax
                :ffffffff804310a1:       mov    %eax,0xac(%rbx)
                :ffffffff804310a7:       mov    0xa8(%r12),%eax
                :ffffffff804310af:       add    0xc0(%r12),%rax
                :ffffffff804310b7:       sub    %ecx,%eax
                :ffffffff804310b9:       add    %edx,%eax
                :ffffffff804310bb:       mov    %eax,0xa8(%rbx)
                :ffffffff804310c1:       mov    0x68(%r12),%eax
                :ffffffff804310c6:       mov    0x38(%r12),%edx
                :ffffffff804310cb:       sub    %edx,%eax
                :ffffffff804310cd:       cmp    0x6c(%r12),%eax
                :ffffffff804310d2:       mov    %eax,0x68(%r12)
                :ffffffff804310d7:       jae    ffffffff804310dd 
<skb_gro_receive+0x234>
                :ffffffff804310d9:       ud2a
                :ffffffff804310db:       jmp    ffffffff804310db 
<skb_gro_receive+0x232>
                :ffffffff804310dd:       mov    0xb0(%r12),%esi
                :ffffffff804310e5:       mov    %edx,%ecx
                :ffffffff804310e7:       add    0xc8(%r12),%rcx
                :ffffffff804310ef:       add    0xc0(%r12),%rsi
                :ffffffff804310f7:       mov    0xb0(%rbx),%edi
                :ffffffff804310fd:       add    0xc0(%rbx),%rdi
                :ffffffff80431104:       cld
                :ffffffff80431105:       mov    %rcx,0xc8(%r12)
                :ffffffff8043110d:       sub    %rsi,%rcx
                :ffffffff80431110:       rep movsb %ds:(%rsi),%es:(%rdi)
                :ffffffff80431112:       lea    0x38(%rbx),%rdi
                :ffffffff80431116:       lea    0x38(%r12),%rsi
                :ffffffff8043111b:       mov    $0x5,%cl
                :ffffffff8043111d:       rep movsl %ds:(%rsi),%es:(%rdi)
                :ffffffff8043111f:       mov    0xb8(%rbx),%edx
                :ffffffff80431125:       mov    0xc0(%rbx),%rax
                :ffffffff8043112c:       mov    %r12,0x18(%rax,%rdx,1)
                :ffffffff80431131:       mov    0xb8(%r12),%edx
                :ffffffff80431139:       mov    0xc0(%r12),%rax
                :ffffffff80431141:       mov    0xb8(%rbx),%esi
                :ffffffff80431147:       mov    0xc0(%rbx),%rcx
                :ffffffff8043114e:       mov    0x6(%rax,%rdx,1),%ax
                :ffffffff80431153:       mov    %ax,0x6(%rcx,%rsi,1)
                :ffffffff80431158:       testb  $0x10,0x7c(%r12)
                :ffffffff8043115e:       je     ffffffff80431164 
<skb_gro_receive+0x2bb>
                :ffffffff80431160:       ud2a
                :ffffffff80431162:       jmp    ffffffff80431162 
<skb_gro_receive+0x2b9>
                :ffffffff80431164:       mov    0xb8(%r12),%eax
                :ffffffff8043116c:       orb    $0x10,0x7c(%r12)
                :ffffffff80431172:       add    0xc0(%r12),%rax
                :ffffffff8043117a:       lock addl $0x10000,(%rax)
                :ffffffff80431181:       mov    0x68(%r12),%eax
                :ffffffff80431186:       mov    %r12,0x8(%rbx)
                :ffffffff8043118a:       add    %eax,0x6c(%rbx)
                :ffffffff8043118d:       add    %eax,0xd0(%rbx)
                :ffffffff80431193:       add    %eax,0x68(%rbx)
                :ffffffff80431196:       mov    %rbx,(%r15)
                :ffffffff80431199:       mov    (%r12),%rax
                :ffffffff8043119d:       mov    %rax,(%rbx)
                :ffffffff804311a0:       movq   $0x0,(%r12)
                :ffffffff804311a8:       mov    %rbx,%r12
                :ffffffff804311ab:       mov    0x68(%rbp),%ecx
                :ffffffff804311ae:       sub    0x6c(%rbp),%ecx
                :ffffffff804311b1:       cmp    %ecx,0x38(%rbp)
                :ffffffff804311b4:       jbe    ffffffff804311f3 
<skb_gro_receive+0x34a>
                :ffffffff804311b6:       mov    0xb8(%rbp),%edx
                :ffffffff804311bc:       add    0xc0(%rbp),%rdx
                :ffffffff804311c3:       mov    0x38(%rdx),%eax
                :ffffffff804311c6:       add    0x38(%rbp),%eax
                :ffffffff804311c9:       sub    %ecx,%eax
                :ffffffff804311cb:       mov    %eax,0x38(%rdx)
                :ffffffff804311ce:       mov    0xb8(%rbp),%edx
                :ffffffff804311d4:       add    0xc0(%rbp),%rdx
                :ffffffff804311db:       mov    0x3c(%rdx),%eax
                :ffffffff804311de:       add    0x68(%rbp),%eax
                :ffffffff804311e1:       sub    0x6c(%rbp),%eax
                :ffffffff804311e4:       sub    0x38(%rbp),%eax
                :ffffffff804311e7:       mov    %eax,0x3c(%rdx)
                :ffffffff804311ea:       mov    0x68(%rbp),%eax
                :ffffffff804311ed:       sub    0x6c(%rbp),%eax
                :ffffffff804311f0:       mov    %eax,0x38(%rbp)
                :ffffffff804311f3:       mov    0x68(%rbp),%eax
                :ffffffff804311f6:       mov    0x38(%rbp),%edx
                :ffffffff804311f9:       sub    %edx,%eax
                :ffffffff804311fb:       cmp    0x6c(%rbp),%eax
                :ffffffff804311fe:       mov    %eax,0x68(%rbp)
                :ffffffff80431201:       jae    ffffffff80431207 
<skb_gro_receive+0x35e>
                :ffffffff80431203:       ud2a
                :ffffffff80431205:       jmp    ffffffff80431205 
<skb_gro_receive+0x35c>
                :ffffffff80431207:       mov    %edx,%eax
                :ffffffff80431209:       add    %rax,0xc8(%rbp)
                :ffffffff80431210:       mov    0x8(%r12),%rax
                :ffffffff80431215:       mov    %rbp,0x8(%r12)
                :ffffffff8043121a:       mov    %rbp,(%rax)
                :ffffffff8043121d:       testb  $0x10,0x7c(%rbp)
                :ffffffff80431221:       je     ffffffff80431227 
<skb_gro_receive+0x37e>
                :ffffffff80431223:       ud2a
                :ffffffff80431225:       jmp    ffffffff80431225 
<skb_gro_receive+0x37c>
                :ffffffff80431227:       mov    0xb8(%rbp),%eax
                :ffffffff8043122d:       orb    $0x10,0x7c(%rbp)
                :ffffffff80431231:       add    0xc0(%rbp),%rax
                :ffffffff80431238:       lock addl $0x10000,(%rax)
  34919  0.3351 :ffffffff8043123f:       add    %r14d,0x6c(%r12)
   1989  0.0191 :ffffffff80431244:       add    %r14d,0xd0(%r12)
      1 9.6e-06 :ffffffff8043124c:       xor    %eax,%eax
                :ffffffff8043124e:       add    %r14d,0x68(%r12)
  20605  0.1978 :ffffffff80431253:       incl   0x44(%r12)
                :ffffffff80431258:       movl   $0x1,0x3c(%rbp)
                :ffffffff8043125f:       jmp    ffffffff80431266 
<skb_gro_receive+0x3bd>
                :ffffffff80431261:       mov    $0xfffffff9,%eax
  13260  0.1273 :ffffffff80431266:       pop    %r11
   1946  0.0187 :ffffffff80431268:       pop    %rbx
   2010  0.0193 :ffffffff80431269:       pop    %rbp
     64 6.1e-04 :ffffffff8043126a:       pop    %r12
   1948  0.0187 :ffffffff8043126c:       pop    %r13
   2746  0.0264 :ffffffff8043126e:       pop    %r14
     57 5.5e-04 :ffffffff80431270:       pop    %r15
   2067  0.0198 :ffffffff80431272:       retq

ffffffff80460663 <tcp_gro_receive>: /* tcp_gro_receive total: 396796 
3.8083 */
   4433  0.0425 :ffffffff80460663:       push   %r15
   2204  0.0212 :ffffffff80460665:       push   %r14
                :ffffffff80460667:       mov    %rdi,%r14
                :ffffffff8046066a:       push   %r13
   2275  0.0218 :ffffffff8046066c:       push   %r12
                :ffffffff8046066e:       mov    %rsi,%r12
                :ffffffff80460671:       mov    $0x14,%esi
   5933  0.0569 :ffffffff80460676:       mov    %r12,%rdi
                :ffffffff80460679:       push   %rbp
                :ffffffff8046067a:       push   %rbx
   2180  0.0209 :ffffffff8046067b:       sub    $0x8,%rsp
                :ffffffff8046067f:       callq  ffffffff804357a1 
<skb_gro_header>
                :ffffffff80460684:       test   %rax,%rax
   3218  0.0309 :ffffffff80460687:       je     ffffffff804607ed 
<tcp_gro_receive+0x18a>
                :ffffffff8046068d:       mov    0xc(%rax),%al
      1 9.6e-06 :ffffffff80460690:       shr    $0x4,%al
   3528  0.0339 :ffffffff80460693:       movzbl %al,%eax
                :ffffffff80460696:       lea    0x0(,%rax,4),%r13d
      1 9.6e-06 :ffffffff8046069e:       cmp    $0x13,%r13d
   2773  0.0266 :ffffffff804606a2:       jbe    ffffffff804607ed 
<tcp_gro_receive+0x18a>
                :ffffffff804606a8:       mov    %r13d,%esi
                :ffffffff804606ab:       mov    %r12,%rdi
   3327  0.0319 :ffffffff804606ae:       callq  ffffffff804357a1 
<skb_gro_header>
                :ffffffff804606b3:       test   %rax,%rax
   2094  0.0201 :ffffffff804606b6:       mov    %rax,%r8
                :ffffffff804606b9:       je     ffffffff804607ed 
<tcp_gro_receive+0x18a>
                :ffffffff804606bf:       lea    0x38(%r12),%r15
   2245  0.0215 :ffffffff804606c4:       add    %r13d,(%r15)
                :ffffffff804606c7:       mov    0x68(%r12),%ebp
                :ffffffff804606cc:       sub    0x38(%r12),%ebp
   2394  0.0230 :ffffffff804606d1:       mov    0xc(%rax),%ebx
                :ffffffff804606d4:       jmp    ffffffff80460710 
<tcp_gro_receive+0xad>
   2111  0.0203 :ffffffff804606d6:       lea    0x38(%rdi),%r9
      3 2.9e-05 :ffffffff804606da:       cmpl   $0x0,0x4(%r9)
     21 2.0e-04 :ffffffff804606df:       je     ffffffff8046070d 
<tcp_gro_receive+0xaa>
   2592  0.0249 :ffffffff804606e1:       mov    0xa8(%rdi),%eax
                :ffffffff804606e7:       mov    0xc0(%rdi),%r10
                :ffffffff804606ee:       mov    0x2(%r8),%dx
   2440  0.0234 :ffffffff804606f3:       lea    (%r10,%rax,1),%rcx
                :ffffffff804606f7:       mov    (%r8),%eax
      1 9.6e-06 :ffffffff804606fa:       xor    0x2(%rcx),%dx
   6275  0.0602 :ffffffff804606fe:       xor    (%rcx),%eax
      3 2.9e-05 :ffffffff80460700:       or     %ax,%dx
                :ffffffff80460703:       je     ffffffff8046071d 
<tcp_gro_receive+0xba>
                :ffffffff80460705:       movl   $0x0,0x4(%r9)
                :ffffffff8046070d:       mov    %rdi,%r14
   2920  0.0280 :ffffffff80460710:       mov    (%r14),%rdi
     18 1.7e-04 :ffffffff80460713:       test   %rdi,%rdi
      2 1.9e-05 :ffffffff80460716:       jne    ffffffff804606d6 
<tcp_gro_receive+0x73>
     33 3.2e-04 :ffffffff80460718:       jmpq   ffffffff80460807 
<tcp_gro_receive+0x1a4>
   4253  0.0408 :ffffffff8046071d:       mov    0xe(%r8),%ax
   2125  0.0204 :ffffffff80460722:       xor    0xe(%rcx),%ax
      2 1.9e-05 :ffffffff80460726:       mov    %ebx,%edx
                :ffffffff80460728:       and    $0x8000,%edx
   8066  0.0774 :ffffffff8046072e:       or     0x8(%r9),%edx
                :ffffffff80460732:       movzwl %ax,%esi
                :ffffffff80460735:       mov    0x8(%r8),%eax
  64740  0.6214 :ffffffff80460739:       xor    0x8(%rcx),%eax
                :ffffffff8046073c:       or     %eax,%esi
                :ffffffff8046073e:       mov    %ebx,%eax
   2084  0.0200 :ffffffff80460740:       xor    0xc(%rcx),%eax
                :ffffffff80460743:       and    $0x76,%ah
                :ffffffff80460746:       or     %eax,%edx
   2132  0.0205 :ffffffff80460748:       or     %edx,%esi
                :ffffffff8046074a:       mov    $0x14,%edx
                :ffffffff8046074f:       jmp    ffffffff8046075e 
<tcp_gro_receive+0xfb>
                :ffffffff80460751:       movslq %edx,%rax
                :ffffffff80460754:       add    $0x4,%edx
                :ffffffff80460757:       mov    (%r8,%rax,1),%esi
                :ffffffff8046075b:       xor    (%rcx,%rax,1),%esi
   3670  0.0352 :ffffffff8046075e:       test   %esi,%esi
   2162  0.0208 :ffffffff80460760:       jne    ffffffff80460767 
<tcp_gro_receive+0x104>
                :ffffffff80460762:       cmp    %r13d,%edx
      1 9.6e-06 :ffffffff80460765:       jb     ffffffff80460751 
<tcp_gro_receive+0xee>
  50209  0.4819 :ffffffff80460767:       mov    0xb8(%rdi),%eax
   4473  0.0429 :ffffffff8046076d:       mov    0x4(%rcx),%edx
                :ffffffff80460770:       bswap  %edx
   9554  0.0917 :ffffffff80460772:       mov    0x4(%r8),%ecx
                :ffffffff80460776:       bswap  %ecx
                :ffffffff80460778:       movzwl 0x6(%r10,%rax,1),%r13d
   7572  0.0727 :ffffffff8046077e:       mov    0x68(%rdi),%eax
                :ffffffff80460781:       sub    0x38(%rdi),%eax
                :ffffffff80460784:       add    %edx,%eax
   9803  0.0941 :ffffffff80460786:       xor    %eax,%ecx
                :ffffffff80460788:       cmp    %r13d,%ebp
                :ffffffff8046078b:       seta   %al
  50608  0.4857 :ffffffff8046078e:       test   %ebp,%ebp
                :ffffffff80460790:       sete   %dl
                :ffffffff80460793:       or     %edx,%eax
   3161  0.0303 :ffffffff80460795:       movzbl %al,%eax
                :ffffffff80460798:       or     %eax,%esi
                :ffffffff8046079a:       or     %esi,%ecx
   3278  0.0315 :ffffffff8046079c:       jne    ffffffff804607f6 
<tcp_gro_receive+0x193>
                :ffffffff8046079e:       mov    %r12,%rsi
      2 1.9e-05 :ffffffff804607a1:       mov    %r14,%rdi
   2579  0.0248 :ffffffff804607a4:       callq  ffffffff80430ea9 
<skb_gro_receive>
   2059  0.0198 :ffffffff804607a9:       test   %eax,%eax
     49 4.7e-04 :ffffffff804607ab:       jne    ffffffff804607f6 
<tcp_gro_receive+0x193>
                :ffffffff804607ad:       mov    (%r14),%rcx
   1945  0.0187 :ffffffff804607b0:       mov    %ebx,%edx
      3 2.9e-05 :ffffffff804607b2:       and    $0x900,%edx
                :ffffffff804607b8:       mov    0xa8(%rcx),%eax
   2530  0.0243 :ffffffff804607be:       add    0xc0(%rcx),%rax
      3 2.9e-05 :ffffffff804607c5:       or     %edx,0xc(%rax)
     13 1.2e-04 :ffffffff804607c8:       xor    %eax,%eax
   4881  0.0468 :ffffffff804607ca:       cmp    %r13d,%ebp
                :ffffffff804607cd:       setb   %al
                :ffffffff804607d0:       and    $0x2f00,%ebx
   1912  0.0184 :ffffffff804607d6:       or     %ebx,%eax
                :ffffffff804607d8:       test   %rcx,%rcx
                :ffffffff804607db:       je     ffffffff80460816 
<tcp_gro_receive+0x1b3>
   2163  0.0208 :ffffffff804607dd:       cmpl   $0x0,0x4(%r15)
    136  0.0013 :ffffffff804607e2:       je     ffffffff804607e8 
<tcp_gro_receive+0x185>
   2455  0.0236 :ffffffff804607e4:       test   %eax,%eax
     57 5.5e-04 :ffffffff804607e6:       je     ffffffff80460816 
<tcp_gro_receive+0x1b3>
    148  0.0014 :ffffffff804607e8:       mov    %r14,%rdi
    735  0.0071 :ffffffff804607eb:       jmp    ffffffff80460818 
<tcp_gro_receive+0x1b5>
                :ffffffff804607ed:       xor    %edi,%edi
                :ffffffff804607ef:       mov    $0x1,%eax
                :ffffffff804607f4:       jmp    ffffffff80460818 
<tcp_gro_receive+0x1b5>
     68 6.5e-04 :ffffffff804607f6:       xor    %eax,%eax
      1 9.6e-06 :ffffffff804607f8:       test   %ebp,%ebp
     67 6.4e-04 :ffffffff804607fa:       sete   %al
     47 4.5e-04 :ffffffff804607fd:       and    $0x2f00,%ebx
                :ffffffff80460803:       or     %ebx,%eax
     58 5.6e-04 :ffffffff80460805:       jmp    ffffffff804607dd 
<tcp_gro_receive+0x17a>
    122  0.0012 :ffffffff80460807:       xor    %eax,%eax
      9 8.6e-05 :ffffffff80460809:       test   %ebp,%ebp
                :ffffffff8046080b:       sete   %al
     67 6.4e-04 :ffffffff8046080e:       and    $0x2f00,%ebx
      6 5.8e-05 :ffffffff80460814:       or     %ebx,%eax
   1995  0.0191 :ffffffff80460816:       xor    %edi,%edi
     68 6.5e-04 :ffffffff80460818:       or     %eax,0x40(%r12)
    275  0.0026 :ffffffff8046081d:       mov    %rdi,%rax
   2037  0.0196 :ffffffff80460820:       pop    %r11
    191  0.0018 :ffffffff80460822:       pop    %rbx
   4346  0.0417 :ffffffff80460823:       pop    %rbp
   4739  0.0455 :ffffffff80460824:       pop    %r12
    167  0.0016 :ffffffff80460826:       pop    %r13
  23735  0.2278 :ffffffff80460828:       pop    %r14
  56070  0.5381 :ffffffff8046082a:       pop    %r15
    140  0.0013 :ffffffff8046082c:       retq

ffffffff804357a1 <skb_gro_header>: /* skb_gro_header total: 319455 
3.0660 */
  13604  0.1306 :ffffffff804357a1:       push   %rbp
  14938  0.1434 :ffffffff804357a2:       push   %rbx
                :ffffffff804357a3:       mov    %rdi,%rbx
                :ffffffff804357a6:       sub    $0x8,%rsp
  18392  0.1765 :ffffffff804357aa:       mov    0x38(%rdi),%ebp
                :ffffffff804357ad:       mov    0x68(%rdi),%edx
      1 9.6e-06 :ffffffff804357b0:       add    %ebp,%esi
  20559  0.1973 :ffffffff804357b2:       mov    %edx,%edi
                :ffffffff804357b4:       sub    0x6c(%rbx),%edi
                :ffffffff804357b7:       jne    ffffffff804357cc 
<skb_gro_header+0x2b>
  36626  0.3515 :ffffffff804357b9:       mov    0xb8(%rbx),%ecx
      2 1.9e-05 :ffffffff804357bf:       mov    0xc0(%rbx),%rax
      3 2.9e-05 :ffffffff804357c6:       cmp    %esi,0x3c(%rax,%rcx,1)
  18577  0.1783 :ffffffff804357ca:       jae    ffffffff804357ee 
<skb_gro_header+0x4d>
                :ffffffff804357cc:       cmp    %edi,%esi
                :ffffffff804357ce:       jbe    ffffffff804357e3 
<skb_gro_header+0x42>
                :ffffffff804357d0:       cmp    %edx,%esi
                :ffffffff804357d2:       ja     ffffffff80435833 
<skb_gro_header+0x92>
                :ffffffff804357d4:       sub    %edi,%esi
                :ffffffff804357d6:       mov    %rbx,%rdi
                :ffffffff804357d9:       callq  ffffffff8042f6ee 
<__pskb_pull_tail>
                :ffffffff804357de:       test   %rax,%rax
                :ffffffff804357e1:       je     ffffffff80435833 
<skb_gro_header+0x92>
                :ffffffff804357e3:       mov    %ebp,%eax
                :ffffffff804357e5:       add    0xc8(%rbx),%rax
                :ffffffff804357ec:       jmp    ffffffff80435835 
<skb_gro_header+0x94>
      3 2.9e-05 :ffffffff804357ee:       add    0xc0(%rbx),%rcx
  25999  0.2495 :ffffffff804357f5:       mov    $0x1e0000000000,%rax
                :ffffffff804357ff:       mov    $0x6db6db6db6db6db7,%rdx
  44557  0.4276 :ffffffff80435809:       add    0x30(%rcx),%rax
                :ffffffff8043580d:       sar    $0x3,%rax
  12588  0.1208 :ffffffff80435811:       imul   %rdx,%rax
  10104  0.0970 :ffffffff80435815:       mov    $0xffff880000000000,%rdx
                :ffffffff8043581f:       shl    $0xc,%rax
                :ffffffff80435823:       add    %rdx,%rax
  16404  0.1574 :ffffffff80435826:       mov    0x38(%rcx),%edx
                :ffffffff80435829:       add    %rdx,%rax
                :ffffffff8043582c:       mov    %ebp,%edx
  15264  0.1465 :ffffffff8043582e:       add    %rdx,%rax
                :ffffffff80435831:       jmp    ffffffff80435835 
<skb_gro_header+0x94>
                :ffffffff80435833:       xor    %eax,%eax
  45844  0.4400 :ffffffff80435835:       pop    %r10
      2 1.9e-05 :ffffffff80435837:       pop    %rbx
  12844  0.1233 :ffffffff80435838:       pop    %rbp
  13144  0.1262 :ffffffff80435839:       retq

Thanks for your help,

Drew

  reply	other threads:[~2009-04-29 17:30 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-04-15  8:09 [PATCH] myr10ge: again fix lro_gen_skb() alignment Stanislaw Gruszka
2009-04-15  9:28 ` David Miller
2009-04-15  9:48   ` Brice Goglin
2009-04-15 10:02     ` David Miller
2009-04-15 13:01       ` Andrew Gallatin
2009-04-15 21:04         ` Andrew Gallatin
2009-04-15 23:42           ` David Miller
2009-04-16  8:50             ` Herbert Xu
2009-04-16  9:02               ` David Miller
2009-04-21 19:19               ` Andrew Gallatin
2009-04-22 10:48                 ` Herbert Xu
2009-04-22 15:37                   ` Andrew Gallatin
2009-04-24  5:45                     ` Herbert Xu
2009-04-24 12:45                       ` Andrew Gallatin
2009-04-24 12:51                         ` Herbert Xu
2009-04-24 17:13                         ` Rick Jones
2009-04-24 16:16                       ` Andrew Gallatin
2009-04-24 16:30                         ` Herbert Xu
2009-04-24 16:31                           ` Herbert Xu
2009-04-27  8:05                         ` Herbert Xu
2009-04-27  8:07                           ` Herbert Xu
2009-04-27  9:32                             ` David Miller
2009-04-27 11:01                               ` Herbert Xu
2009-04-27 12:45                             ` David Miller
2009-04-27 12:45                           ` David Miller
2009-04-28  6:12                           ` Herbert Xu
2009-04-28 15:00                             ` Andrew Gallatin
2009-04-28 15:02                               ` David Miller
2009-04-28 15:20                               ` Herbert Xu
2009-04-28 15:44                                 ` Andrew Gallatin
2009-04-28 21:12                                 ` Andrew Gallatin
2009-04-29 13:42                                   ` Andrew Gallatin
2009-04-29 13:53                                     ` Eric Dumazet
2009-04-29 14:18                                       ` Andrew Gallatin
2009-04-29 15:26                                         ` Eric Dumazet
2009-04-29 17:28                                           ` Andrew Gallatin [this message]
2009-04-30  8:10                                             ` Herbert Xu
2009-04-30  8:14                                               ` Herbert Xu
2009-04-30  8:17                                             ` Eric Dumazet
2009-04-30 19:14                                               ` Andrew Gallatin
2009-04-23  8:00                 ` Herbert Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49F88E5A.8070908@myri.com \
    --to=gallatin@myri.com \
    --cc=brice@myri.com \
    --cc=dada1@cosmosbay.com \
    --cc=davem@davemloft.net \
    --cc=herbert@gondor.apana.org.au \
    --cc=netdev@vger.kernel.org \
    --cc=sgruszka@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.