* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
       [not found] <C925F8B43D79CC49ACD0601FB68FF50C045E0FB0@orsmsx408>
@ 2005-07-07 22:30 ` David S. Miller
  0 siblings, 0 replies; 31+ messages in thread
From: David S. Miller @ 2005-07-07 22:30 UTC (permalink / raw)
  To: jesse.brandeburg; +Cc: tgraf, dada1, netdev

From: "Brandeburg, Jesse" <jesse.brandeburg@intel.com>
Date: Thu, 7 Jul 2005 15:02:17 -0700

> Arg, this thread wasn't on the new list, is there any chance we can just
> get netdev@oss.sgi.com to forward to netdev@vger.kernel.org?

It does already.

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [TG3]: About hw coalescing infrastructure.
@ 2005-07-04 22:47 David S. Miller
2005-07-04 22:55 ` Eric Dumazet
0 siblings, 1 reply; 31+ messages in thread
From: David S. Miller @ 2005-07-04 22:47 UTC (permalink / raw)
To: dada1; +Cc: mchan, netdev
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Tue, 05 Jul 2005 00:31:55 +0200
> Oops, I forgot to tell you I applied the patch : [TG3]: Eliminate all hw IRQ handler spinlocks.
> (Date: Fri, 03 Jun 2005 12:25:58 -0700 (PDT))
>
> Maybe I should revert to stock 2.6.12 tg3 driver ?
Please don't ever do stuff like that :-( That makes the driver
version, and any other information you report completely meaningless
and useless. You've just wasted a lot of our time.
I have no way to even know _WHICH_ IRQ spinlock patch you applied.
The one that ended up in Linus's tree has many bug fixes and
refinements. The ones which were posted on netdev had many bugs and
deadlocks which needed to be cured before pushing the change upstream.
^ permalink raw reply  [flat|nested] 31+ messages in thread

* Re: [TG3]: About hw coalescing infrastructure.
  2005-07-04 22:47 [TG3]: About hw coalescing infrastructure David S. Miller
@ 2005-07-04 22:55 ` Eric Dumazet
  2005-07-04 22:57 ` Eric Dumazet
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2005-07-04 22:55 UTC (permalink / raw)
  To: David S. Miller; +Cc: mchan, netdev

David S. Miller wrote:
> Please don't ever do stuff like that :-(  That makes the driver
> version, and any other information you report, completely meaningless
> and useless.  You've just wasted a lot of our time.

Yes. But if you don't want us to test your patches, don't send them to
the list.

For your information, 2.6.13-rc1 locks up too.

Very easy way to lock it up:

ping -f some_destination

> I have no way to even know _WHICH_ IRQ spinlock patch you applied.
>
> The one that ended up in Linus's tree has many bug fixes and
> refinements.  The ones which were posted on netdev had many bugs and
> deadlocks which needed to be cured before pushing the change upstream.

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [TG3]: About hw coalescing infrastructure.
  2005-07-04 22:55 ` Eric Dumazet
@ 2005-07-04 22:57 ` Eric Dumazet
  2005-07-04 23:01 ` David S. Miller
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2005-07-04 22:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, mchan, netdev

Eric Dumazet wrote:
>
> For your information, 2.6.13-rc1 locks up too.
>
> Very easy way to lock it up:
>
> ping -f some_destination

Arg. False alarm. Sorry.

Time for me to sleep.

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [TG3]: About hw coalescing infrastructure.
  2005-07-04 22:57 ` Eric Dumazet
@ 2005-07-04 23:01 ` David S. Miller
  2005-07-05  7:38 ` [PATCH] loop unrolling in net/sched/sch_generic.c Eric Dumazet
  0 siblings, 1 reply; 31+ messages in thread
From: David S. Miller @ 2005-07-04 23:01 UTC (permalink / raw)
  To: dada1; +Cc: mchan, netdev

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Tue, 05 Jul 2005 00:57:58 +0200

> Eric Dumazet wrote:
>
> > Very easy way to lock it up:
> >
> > ping -f some_destination
>
> Arg. False alarm. Sorry.
>
> Time for me to sleep.

Oh well...

^ permalink raw reply  [flat|nested] 31+ messages in thread
* [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-04 23:01 ` David S. Miller
@ 2005-07-05  7:38 ` Eric Dumazet
  2005-07-05 11:51 ` Thomas Graf
  2005-07-05 21:26 ` David S. Miller
  0 siblings, 2 replies; 31+ messages in thread
From: Eric Dumazet @ 2005-07-05 7:38 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 305 bytes --]

[NET]: unroll a small loop in pfifo_fast_dequeue(). Compiler generates
better code.

(Using skb_queue_empty() to test the queue is faster than trying to
__skb_dequeue())

oprofile says this function now uses 0.29% instead of 1.22%, on an
x86_64 target.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

[-- Attachment #2: patch.sch_generic --]
[-- Type: text/plain, Size: 1022 bytes --]

--- linux-2.6.12/net/sched/sch_generic.c	2005-06-17 21:48:29.000000000 +0200
+++ linux-2.6.12-ed/net/sched/sch_generic.c	2005-07-05 09:11:30.000000000 +0200
@@ -333,18 +333,23 @@
 static struct sk_buff *
 pfifo_fast_dequeue(struct Qdisc* qdisc)
 {
-	int prio;
 	struct sk_buff_head *list = qdisc_priv(qdisc);
 	struct sk_buff *skb;
 
-	for (prio = 0; prio < 3; prio++, list++) {
-		skb = __skb_dequeue(list);
-		if (skb) {
-			qdisc->q.qlen--;
-			return skb;
-		}
+	for (;;) {
+		if (!skb_queue_empty(list))
+			break;
+		list++;
+		if (!skb_queue_empty(list))
+			break;
+		list++;
+		if (!skb_queue_empty(list))
+			break;
+		return NULL;
 	}
-	return NULL;
+	skb = __skb_dequeue(list);
+	qdisc->q.qlen--;
+	return skb;
 }
 
 static int

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-05  7:38 ` [PATCH] loop unrolling in net/sched/sch_generic.c Eric Dumazet
@ 2005-07-05 11:51 ` Thomas Graf
  2005-07-05 12:03 ` Thomas Graf
  2005-07-05 13:04 ` Eric Dumazet
  2005-07-05 21:26 ` David S. Miller
  1 sibling, 2 replies; 31+ messages in thread
From: Thomas Graf @ 2005-07-05 11:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev

* Eric Dumazet <42CA390C.9000801@cosmosbay.com> 2005-07-05 09:38
> [NET]: unroll a small loop in pfifo_fast_dequeue(). Compiler generates
> better code.
> (Using skb_queue_empty() to test the queue is faster than trying to
> __skb_dequeue())
> oprofile says this function now uses 0.29% instead of 1.22%, on an
> x86_64 target.

I think this patch is pretty much pointless. __skb_dequeue() and
!skb_queue_empty() should produce almost the same code, and as soon
as you disable profiling and debugging you'll see that the compiler
unrolls the loop itself if possible.

^ permalink raw reply  [flat|nested] 31+ messages in thread
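[Aside, not part of the original thread: a minimal sketch contrasting the
two forms being compared above. The helper names are hypothetical; the
point is that skb_queue_empty() is a single pointer compare against the
list head, while __skb_dequeue() performs the full unlink (several
stores) before its result can even be tested.]

#include <linux/skbuff.h>

static struct sk_buff *dequeue_test_first(struct sk_buff_head *list)
{
	if (skb_queue_empty(list))	/* one load + compare, no stores */
		return NULL;
	return __skb_dequeue(list);	/* unlink only when known non-empty */
}

static struct sk_buff *dequeue_then_test(struct sk_buff_head *list)
{
	/* the unlink attempt happens (and returns NULL) even when empty */
	return __skb_dequeue(list);
}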
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-05 11:51 ` Thomas Graf
@ 2005-07-05 12:03 ` Thomas Graf
  0 siblings, 0 replies; 31+ messages in thread
From: Thomas Graf @ 2005-07-05 12:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev

* Thomas Graf <20050705115108.GE16076@postel.suug.ch> 2005-07-05 13:51
> * Eric Dumazet <42CA390C.9000801@cosmosbay.com> 2005-07-05 09:38
> > [NET]: unroll a small loop in pfifo_fast_dequeue(). Compiler generates
> > better code.
> > (Using skb_queue_empty() to test the queue is faster than trying to
> > __skb_dequeue())
> > oprofile says this function now uses 0.29% instead of 1.22%, on an
> > x86_64 target.
>
> I think this patch is pretty much pointless. __skb_dequeue() and
> !skb_queue_empty() should produce almost the same code, and as soon
> as you disable profiling and debugging you'll see that the compiler
> unrolls the loop itself if possible.

... given one enables it, of course.

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-05 11:51 ` Thomas Graf
  2005-07-05 12:03 ` Thomas Graf
@ 2005-07-05 13:04 ` Eric Dumazet
  2005-07-05 13:48 ` Thomas Graf
  1 sibling, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2005-07-05 13:04 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

Thomas Graf wrote:
> * Eric Dumazet <42CA390C.9000801@cosmosbay.com> 2005-07-05 09:38
>
> > [NET]: unroll a small loop in pfifo_fast_dequeue(). Compiler generates
> > better code.
> > (Using skb_queue_empty() to test the queue is faster than trying to
> > __skb_dequeue())
> > oprofile says this function now uses 0.29% instead of 1.22%, on an
> > x86_64 target.
>
> I think this patch is pretty much pointless. __skb_dequeue() and
> !skb_queue_empty() should produce almost the same code, and as soon
> as you disable profiling and debugging you'll see that the compiler
> unrolls the loop itself if possible.

OK. At least my compiler (gcc-3.3.1) does NOT unroll the loop:

Original 2.6.12 gives:

ffffffff802a9790 <pfifo_fast_dequeue>: /* pfifo_fast_dequeue total: 2904054  1.9531 */
258371  0.1738 :ffffffff802a9790: lea    0xc0(%rdi),%rcx
273669  0.1841 :ffffffff802a9797: xor    %esi,%esi
 12533  0.0084 :ffffffff802a9799: mov    (%rcx),%rdx
292315  0.1966 :ffffffff802a979c: cmp    %rcx,%rdx
 11717  0.0079 :ffffffff802a979f: je     ffffffff802a97d1 <pfifo_fast_dequeue+0x41>
  4474  0.0030 :ffffffff802a97a1: mov    %rdx,%rax
  6238  0.0042 :ffffffff802a97a4: mov    (%rdx),%rdx
    41  2.8e-05 :ffffffff802a97a7: decl   0x10(%rcx)
  6089  0.0041 :ffffffff802a97aa: test   %rax,%rax
   126  8.5e-05 :ffffffff802a97ad: movq   $0x0,0x10(%rax)
    39  2.6e-05 :ffffffff802a97b5: mov    %rcx,0x8(%rdx)
  6974  0.0047 :ffffffff802a97b9: mov    %rdx,(%rcx)
  2841  0.0019 :ffffffff802a97bc: movq   $0x0,0x8(%rax)
   366  2.5e-04 :ffffffff802a97c4: movq   $0x0,(%rax)
 14757  0.0099 :ffffffff802a97cb: je     ffffffff802a97d1 <pfifo_fast_dequeue+0x41>
   288  1.9e-04 :ffffffff802a97cd: decl   0x40(%rdi)
    94  6.3e-05 :ffffffff802a97d0: retq
970400  0.6526 :ffffffff802a97d1: inc    %esi
982402  0.6607 :ffffffff802a97d3: add    $0x18,%rcx
     4  2.7e-06 :ffffffff802a97d7: cmp    $0x2,%esi
     1  6.7e-07 :ffffffff802a97da: jle    ffffffff802a9799 <pfifo_fast_dequeue+0x9>
 59754  0.0402 :ffffffff802a97dc: xor    %eax,%eax
   561  3.8e-04 :ffffffff802a97de: data16
               :ffffffff802a97df: nop
               :ffffffff802a97e0: retq

And new code (2.6.12-ed):

ffffffff802b1020 <pfifo_fast_dequeue>: /* pfifo_fast_dequeue total: 153139  0.2934 */
 27388  0.0525 :ffffffff802b1020: lea    0xc0(%rdi),%rdx
 42091  0.0806 :ffffffff802b1027: cmp    %rdx,0xc0(%rdi)
               :ffffffff802b102e: jne    ffffffff802b1052 <pfifo_fast_dequeue+0x32>
   474  9.1e-04 :ffffffff802b1030: lea    0xd8(%rdi),%rdx
  5571  0.0107 :ffffffff802b1037: cmp    %rdx,0xd8(%rdi)
     2  3.8e-06 :ffffffff802b103e: jne    ffffffff802b1052 <pfifo_fast_dequeue+0x32>
     1  1.9e-06 :ffffffff802b1040: lea    0xf0(%rdi),%rdx
 20030  0.0384 :ffffffff802b1047: xor    %eax,%eax
     6  1.1e-05 :ffffffff802b1049: cmp    %rdx,0xf0(%rdi)
     6  1.1e-05 :ffffffff802b1050: je     ffffffff802b1086 <pfifo_fast_dequeue+0x66>
               :ffffffff802b1052: mov    (%rdx),%rcx
 11796  0.0226 :ffffffff802b1055: xor    %eax,%eax
               :ffffffff802b1057: cmp    %rdx,%rcx
     8  1.5e-05 :ffffffff802b105a: je     ffffffff802b1083 <pfifo_fast_dequeue+0x63>
  3146  0.0060 :ffffffff802b105c: mov    %rcx,%rax
    12  2.3e-05 :ffffffff802b105f: mov    (%rcx),%rcx
   118  2.3e-04 :ffffffff802b1062: decl   0x10(%rdx)
  4924  0.0094 :ffffffff802b1065: movq   $0x0,0x10(%rax)
    65  1.2e-04 :ffffffff802b106d: mov    %rdx,0x8(%rcx)
   725  0.0014 :ffffffff802b1071: mov    %rcx,(%rdx)
 11493  0.0220 :ffffffff802b1074: movq   $0x0,0x8(%rax)
   194  3.7e-04 :ffffffff802b107c: movq   $0x0,(%rax)
  2995  0.0057 :ffffffff802b1083: decl   0x40(%rdi)
 19607  0.0376 :ffffffff802b1086: nop
  2487  0.0048 :ffffffff802b1087: retq

Please give us the code your compiler produces, and explain to me how
disabling oprofile can change the generated assembly. :)

Debugging has no impact on this code either.

Thank you

Eric

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-05 13:04 ` Eric Dumazet
@ 2005-07-05 13:48 ` Thomas Graf
  2005-07-05 15:58 ` Eric Dumazet
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Graf @ 2005-07-05 13:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev

* Eric Dumazet <42CA8555.9050607@cosmosbay.com> 2005-07-05 15:04
> Thomas Graf wrote:
> > * Eric Dumazet <42CA390C.9000801@cosmosbay.com> 2005-07-05 09:38
> >
> > > [NET]: unroll a small loop in pfifo_fast_dequeue(). Compiler generates
> > > better code.
> > > (Using skb_queue_empty() to test the queue is faster than trying to
> > > __skb_dequeue())
> > > oprofile says this function now uses 0.29% instead of 1.22%, on an
> > > x86_64 target.
> >
> > I think this patch is pretty much pointless. __skb_dequeue() and
> > !skb_queue_empty() should produce almost the same code, and as soon
> > as you disable profiling and debugging you'll see that the compiler
> > unrolls the loop itself if possible.
>
> OK. At least my compiler (gcc-3.3.1) does NOT unroll the loop:

Because you don't specify -funroll-loops.

[...]

> Please give us the code your compiler produces,

Unrolled version:

pfifo_fast_dequeue:
	pushl	%esi
	xorl	%edx, %edx
	pushl	%ebx
	movl	12(%esp), %esi
	movl	128(%esi), %eax
	leal	128(%esi), %ecx
	cmpl	%ecx, %eax
	je	.L132
	movl	%eax, %edx
	movl	(%eax), %eax
	decl	8(%ecx)
	movl	$0, 8(%edx)
	movl	%ecx, 4(%eax)
	movl	%eax, 128(%esi)
	movl	$0, 4(%edx)
	movl	$0, (%edx)
.L132:
	testl	%edx, %edx
	je	.L131
	movl	96(%edx), %ebx
	movl	80(%esi), %eax
	decl	40(%esi)
	subl	%ebx, %eax
	movl	%eax, 80(%esi)
	movl	%edx, %eax
.L117:
	popl	%ebx
	popl	%esi
	ret
.L131:
	movl	20(%ecx), %eax
	leal	20(%ecx), %edx
	xorl	%ebx, %ebx
	cmpl	%edx, %eax
	je	.L137
	movl	%eax, %ebx
	movl	(%eax), %eax
	decl	8(%edx)
	movl	$0, 8(%ebx)
	movl	%edx, 4(%eax)
	movl	%eax, 20(%ecx)
	movl	$0, 4(%ebx)
	movl	$0, (%ebx)
.L137:
	testl	%ebx, %ebx
	je	.L147
.L146:
	movl	96(%ebx), %ecx
	movl	80(%esi), %eax
	decl	40(%esi)
	subl	%ecx, %eax
	movl	%eax, 80(%esi)
	movl	%ebx, %eax
	jmp	.L117
.L147:
	movl	40(%ecx), %eax
	leal	40(%ecx), %edx
	xorl	%ebx, %ebx
	cmpl	%edx, %eax
	je	.L142
	movl	%eax, %ebx
	movl	(%eax), %eax
	decl	8(%edx)
	movl	$0, 8(%ebx)
	movl	%edx, 4(%eax)
	movl	%eax, 40(%ecx)
	movl	$0, 4(%ebx)
	movl	$0, (%ebx)
.L142:
	xorl	%eax, %eax
	testl	%ebx, %ebx
	jne	.L146
	jmp	.L117

> and explain to me how disabling oprofile can change the generated
> assembly. :)
> Debugging has no impact on this code either.

I just noticed that this is a local modification of my own, so in
the vanilla tree it indeed doesn't have any impact on the code
generated.

Still, your patch does not make sense to me. The latest tree
also includes my pfifo_fast changes which modified the code to
maintain a backlog and made it easy to add more fifos at compile
time. If you want the loop unrolled then let the compiler do it
via -funroll-loops. These kinds of optimizations seem as unnecessary
to me as all the loopback optimizations.

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-05 13:48 ` Thomas Graf
@ 2005-07-05 15:58 ` Eric Dumazet
  2005-07-05 17:34 ` Thomas Graf
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2005-07-05 15:58 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

Thomas Graf wrote:
> > OK. At least my compiler (gcc-3.3.1) does NOT unroll the loop:
>
> Because you don't specify -funroll-loops.

I'm using vanilla 2.6.12: no -funroll-loops in it. Maybe in your tree,
not on 99.9% of 2.6.12 trees.

Are you suggesting everybody should use this compiler flag?

Something like:

net/sched/Makefile:
CFLAGS_sch_generic.o := -funroll-loops

?

> [...]
>
> > Please give us the code your compiler produces,
>
> Unrolled version:
>
> pfifo_fast_dequeue:
> 	pushl	%esi
> 	xorl	%edx, %edx
> 	pushl	%ebx
> 	movl	12(%esp), %esi
> 	movl	128(%esi), %eax
> 	leal	128(%esi), %ecx
> 	cmpl	%ecx, %eax
> 	je	.L132
> 	movl	%eax, %edx
> 	movl	(%eax), %eax
> 	decl	8(%ecx)
> 	movl	$0, 8(%edx)
> 	movl	%ecx, 4(%eax)
> 	movl	%eax, 128(%esi)
> 	movl	$0, 4(%edx)
> 	movl	$0, (%edx)
> .L132:
> 	testl	%edx, %edx
> 	je	.L131
> 	movl	96(%edx), %ebx
> 	movl	80(%esi), %eax
> 	decl	40(%esi)
> 	subl	%ebx, %eax
> 	movl	%eax, 80(%esi)
> 	movl	%edx, %eax
> .L117:
> 	popl	%ebx
> 	popl	%esi
> 	ret
> .L131:
> 	movl	20(%ecx), %eax
> 	leal	20(%ecx), %edx
> 	xorl	%ebx, %ebx
> 	cmpl	%edx, %eax
> 	je	.L137
> 	movl	%eax, %ebx
> 	movl	(%eax), %eax
> 	decl	8(%edx)
> 	movl	$0, 8(%ebx)
> 	movl	%edx, 4(%eax)
> 	movl	%eax, 20(%ecx)
> 	movl	$0, 4(%ebx)
> 	movl	$0, (%ebx)
> .L137:
> 	testl	%ebx, %ebx
> 	je	.L147
> .L146:
> 	movl	96(%ebx), %ecx
> 	movl	80(%esi), %eax
> 	decl	40(%esi)
> 	subl	%ecx, %eax
> 	movl	%eax, 80(%esi)
> 	movl	%ebx, %eax
> 	jmp	.L117
> .L147:
> 	movl	40(%ecx), %eax
> 	leal	40(%ecx), %edx
> 	xorl	%ebx, %ebx
> 	cmpl	%edx, %eax
> 	je	.L142
> 	movl	%eax, %ebx
> 	movl	(%eax), %eax
> 	decl	8(%edx)
> 	movl	$0, 8(%ebx)
> 	movl	%edx, 4(%eax)
> 	movl	%eax, 40(%ecx)
> 	movl	$0, 4(%ebx)
> 	movl	$0, (%ebx)
> .L142:
> 	xorl	%eax, %eax
> 	testl	%ebx, %ebx
> 	jne	.L146
> 	jmp	.L117

OK thanks, but you didn't give the code for my version :) It is shorter
and unrolled, as you can see, and with nicely predicted branches.

00000fc0 <pfifo_fast_dequeue>:
 fc0: 56                    push   %esi
 fc1: 89 c1                 mov    %eax,%ecx
 fc3: 53                    push   %ebx
 fc4: 8d 98 a0 00 00 00     lea    0xa0(%eax),%ebx
 fca: 39 98 a0 00 00 00     cmp    %ebx,0xa0(%eax)
 fd0: 89 da                 mov    %ebx,%edx
 fd2: 75 22                 jne    ff6 <pfifo_fast_dequeue+0x36>
 fd4: 8d 90 c4 00 00 00     lea    0xc4(%eax),%edx
 fda: 39 90 c4 00 00 00     cmp    %edx,0xc4(%eax)
 fe0: 89 d3                 mov    %edx,%ebx
 fe2: 75 12                 jne    ff6 <pfifo_fast_dequeue+0x36>
 fe4: 8d 98 e8 00 00 00     lea    0xe8(%eax),%ebx
 fea: 31 f6                 xor    %esi,%esi
 fec: 39 98 e8 00 00 00     cmp    %ebx,0xe8(%eax)
 ff2: 89 da                 mov    %ebx,%edx
 ff4: 74 27                 je     101d <pfifo_fast_dequeue+0x5d>
 ff6: 8b 32                 mov    (%edx),%esi
 ff8: 39 d6                 cmp    %edx,%esi
 ffa: 74 26                 je     1022 <pfifo_fast_dequeue+0x62>
 ffc: 8b 06                 mov    (%esi),%eax
 ffe: ff 4b 08              decl   0x8(%ebx)
1001: c7 46 08 00 00 00 00  movl   $0x0,0x8(%esi)
1008: 89 50 04              mov    %edx,0x4(%eax)
100b: 89 02                 mov    %eax,(%edx)
100d: c7 46 04 00 00 00 00  movl   $0x0,0x4(%esi)
1014: c7 06 00 00 00 00     movl   $0x0,(%esi)
101a: ff 49 28              decl   0x28(%ecx)
101d: 5b                    pop    %ebx
101e: 89 f0                 mov    %esi,%eax
1020: 5e                    pop    %esi
1021: c3                    ret
1022: ff 49 28              decl   0x28(%ecx)
1025: 31 f6                 xor    %esi,%esi
1027: eb f4                 jmp    101d <pfifo_fast_dequeue+0x5d>

> I just noticed that this is a local modification of my own, so in
> the vanilla tree it indeed doesn't have any impact on the code
> generated.
>
> Still, your patch does not make sense to me. The latest tree
> also includes my pfifo_fast changes which modified the code to
> maintain a backlog and made it easy to add more fifos at compile
> time. If you want the loop unrolled then let the compiler do it
> via -funroll-loops. These kinds of optimizations seem as unnecessary
> to me as all the loopback optimizations.

I don't want to change compiler flags in my tree and lose this
optimization when 2.6.13 is released.

I don't know about the loopback optimizations, I am not involved with
that stuff, maybe you think I'm another guy?

It seems to me you give unrelated arguments.

I don't know what your plans are, but mine was not to say you are
writing bad code, just to give my performance analysis and feedback.
I'm sorry if it hurts you.

Eric Dumazet

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-05 15:58 ` Eric Dumazet
@ 2005-07-05 17:34 ` Thomas Graf
  2005-07-05 21:22 ` David S. Miller
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Graf @ 2005-07-05 17:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev

* Eric Dumazet <42CAAE2F.5070807@cosmosbay.com> 2005-07-05 17:58
> I'm using vanilla 2.6.12: no -funroll-loops in it. Maybe in your tree,
> not on 99.9% of 2.6.12 trees.
>
> Are you suggesting everybody should use this compiler flag?

> I don't know about the loopback optimizations, I am not involved with
> that stuff, maybe you think I'm another guy?
>
> It seems to me you give unrelated arguments.

Do as you wish, I don't feel like arguing about micro optimizations.

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-05 17:34 ` Thomas Graf
@ 2005-07-05 21:22 ` David S. Miller
  2005-07-05 21:33 ` Thomas Graf
  0 siblings, 1 reply; 31+ messages in thread
From: David S. Miller @ 2005-07-05 21:22 UTC (permalink / raw)
  To: tgraf; +Cc: dada1, netdev

From: Thomas Graf <tgraf@suug.ch>
Date: Tue, 5 Jul 2005 19:34:11 +0200

> Do as you wish, I don't feel like arguing about micro optimizations.

I bet the performance gain really comes from the mispredicted branches
in the loop. A loop of fixed duration, say, 5 or 6 iterations or less,
totally defeats the branch prediction logic in most processors. By the
time the chip moves the I-cache branch state to "likely", the loop has
ended and we eat a mispredict.

I think the original patch is OK, hand unrolling the loop in the C
code. Adding -funroll-loops to the CFLAGS has lots of implications,
and in particular the embedded folks might not be happy with some
things that result from that.

So I'll apply the original unrolling patch for now.

^ permalink raw reply  [flat|nested] 31+ messages in thread
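[Aside, not part of the original thread: a minimal sketch of the effect
David describes, with a fixed three-band search like pfifo_fast. The
function names are hypothetical illustrations.]

#define NBANDS 3	/* fixed, short trip count */

/*
 * Rolled form: the backward loop branch is taken on the first two
 * iterations and then falls through, so a predictor that has just
 * learned "taken" mispredicts the exit on every call.
 */
static int first_nonempty_rolled(const int qlen[NBANDS])
{
	int i;

	for (i = 0; i < NBANDS; i++)
		if (qlen[i])
			return i;
	return -1;
}

/*
 * Unrolled form: straight-line code with three forward branches and
 * no loop-back branch left to mispredict.
 */
static int first_nonempty_unrolled(const int qlen[NBANDS])
{
	if (qlen[0])
		return 0;
	if (qlen[1])
		return 1;
	if (qlen[2])
		return 2;
	return -1;
}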
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-05 21:22 ` David S. Miller
@ 2005-07-05 21:33 ` Thomas Graf
  2005-07-05 21:35 ` David S. Miller
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Graf @ 2005-07-05 21:33 UTC (permalink / raw)
  To: David S. Miller; +Cc: dada1, netdev

* David S. Miller <20050705.142210.14973612.davem@davemloft.net> 2005-07-05 14:22
> So I'll apply the original unrolling patch for now.

The patch must be changed to use __qdisc_dequeue_head() instead of
__skb_dequeue() or we screw up the backlog.

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-05 21:33 ` Thomas Graf
@ 2005-07-05 21:35 ` David S. Miller
  2005-07-05 23:16 ` Eric Dumazet
  0 siblings, 1 reply; 31+ messages in thread
From: David S. Miller @ 2005-07-05 21:35 UTC (permalink / raw)
  To: tgraf; +Cc: dada1, netdev

From: Thomas Graf <tgraf@suug.ch>
Date: Tue, 5 Jul 2005 23:33:55 +0200

> * David S. Miller <20050705.142210.14973612.davem@davemloft.net> 2005-07-05 14:22
> > So I'll apply the original unrolling patch for now.
>
> The patch must be changed to use __qdisc_dequeue_head() instead of
> __skb_dequeue() or we screw up the backlog.

Ok, good thing the patch didn't apply correctly anyways :)

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-05 21:35 ` David S. Miller
@ 2005-07-05 23:16 ` Eric Dumazet
  2005-07-05 23:41 ` Thomas Graf
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2005-07-05 23:16 UTC (permalink / raw)
  To: David S. Miller, tgraf; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 921 bytes --]

David S. Miller wrote:
> From: Thomas Graf <tgraf@suug.ch>
> Date: Tue, 5 Jul 2005 23:33:55 +0200
>
> > * David S. Miller <20050705.142210.14973612.davem@davemloft.net> 2005-07-05 14:22
> > > So I'll apply the original unrolling patch for now.
> >
> > The patch must be changed to use __qdisc_dequeue_head() instead of
> > __skb_dequeue() or we screw up the backlog.
>
> Ok, good thing the patch didn't apply correctly anyways :)

Oh well, I was unaware of the latest changes in 2.6.13-rc1 :(

Given the fact that the PFIFO_FAST_BANDS macro was introduced, I wonder
if the patch should be this one or not...

Should we assume PFIFO_FAST_BANDS will stay at 3, or what?

[NET]: unroll a small loop in pfifo_fast_dequeue(). Compiler generates
better code.

oprofile says this function now uses 0.29% instead of 1.22%, on an
x86_64 target.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

[-- Attachment #2: patch.sch_generic --]
[-- Type: text/plain, Size: 1015 bytes --]

--- linux-2.6.13-rc1/net/sched/sch_generic.c	2005-07-06 00:46:53.000000000 +0200
+++ linux-2.6.13-rc1-ed/net/sched/sch_generic.c	2005-07-06 01:05:04.000000000 +0200
@@ -328,18 +328,31 @@
 
 static struct sk_buff *pfifo_fast_dequeue(struct Qdisc* qdisc)
 {
-	int prio;
 	struct sk_buff_head *list = qdisc_priv(qdisc);
 
-	for (prio = 0; prio < PFIFO_FAST_BANDS; prio++, list++) {
-		struct sk_buff *skb = __qdisc_dequeue_head(qdisc, list);
-		if (skb) {
-			qdisc->q.qlen--;
-			return skb;
+#if PFIFO_FAST_BANDS == 3
+	for (;;) {
+		if (!skb_queue_empty(list))
+			break;
+		list++;
+		if (!skb_queue_empty(list))
+			break;
+		list++;
+		if (!skb_queue_empty(list))
+			break;
+		return NULL;
 	}
-	}
-
-	return NULL;
+#else
+	int prio;
+	for (prio = 0;; list++) {
+		if (!skb_queue_empty(list))
+			break;
+		if (++prio == PFIFO_FAST_BANDS)
+			return NULL;
+	}
+#endif
+	qdisc->q.qlen--;
+	return __qdisc_dequeue_head(qdisc, list);
 }
 
 static int pfifo_fast_requeue(struct sk_buff *skb, struct Qdisc* qdisc)

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-05 23:16 ` Eric Dumazet
@ 2005-07-05 23:41 ` Thomas Graf
  2005-07-05 23:45 ` David S. Miller
  2005-07-06  0:32 ` Eric Dumazet
  0 siblings, 2 replies; 31+ messages in thread
From: Thomas Graf @ 2005-07-05 23:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev

* Eric Dumazet <42CB14B2.5090601@cosmosbay.com> 2005-07-06 01:16
> Oh well, I was unaware of the latest changes in 2.6.13-rc1 :(

Ok, this clarifies a lot for me, I was under the impression you knew
about these changes.

> Given the fact that the PFIFO_FAST_BANDS macro was introduced, I wonder
> if the patch should be this one or not...
> Should we assume PFIFO_FAST_BANDS will stay at 3, or what?

It is very unlikely to change within mainline but the idea behind it
is to allow it to be changed at compile time.

I still think we can fix this performance issue without manually
unrolling the loop, or we should at least try to. In the end gcc
should notice the constant part of the loop and move it out, so
basically the only difference should be the additional prio++ and
possibly a failing branch prediction.

What about this? I'm still not sure where exactly all the time
is lost so this is a shot in the dark.

Index: net-2.6/net/sched/sch_generic.c
===================================================================
--- net-2.6.orig/net/sched/sch_generic.c
+++ net-2.6/net/sched/sch_generic.c
@@ -330,10 +330,11 @@ static struct sk_buff *pfifo_fast_dequeu
 {
 	int prio;
 	struct sk_buff_head *list = qdisc_priv(qdisc);
+	struct sk_buff *skb;
 
-	for (prio = 0; prio < PFIFO_FAST_BANDS; prio++, list++) {
-		struct sk_buff *skb = __qdisc_dequeue_head(qdisc, list);
-		if (skb) {
+	for (prio = 0; prio < PFIFO_FAST_BANDS; prio++) {
+		if (!skb_queue_empty(list + prio)) {
+			skb = __qdisc_dequeue_head(qdisc, list);
 			qdisc->q.qlen--;
 			return skb;
 		}

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-05 23:41 ` Thomas Graf
@ 2005-07-05 23:45 ` David S. Miller
  2005-07-05 23:55 ` Thomas Graf
  0 siblings, 1 reply; 31+ messages in thread
From: David S. Miller @ 2005-07-05 23:45 UTC (permalink / raw)
  To: tgraf; +Cc: dada1, netdev

From: Thomas Graf <tgraf@suug.ch>
Date: Wed, 6 Jul 2005 01:41:04 +0200

> I still think we can fix this performance issue without manually
> unrolling the loop, or we should at least try to. In the end gcc
> should notice the constant part of the loop and move it out, so
> basically the only difference should be the additional prio++ and
> possibly a failing branch prediction.

But the branch prediction is where I personally think a lot of the
lossage is coming from. These can cost upwards of 20 or 30 processor
cycles, easily. That's getting close to the cost of an L2 cache miss.

I see the difficulties with this change now, why don't we revisit
this some time in the future?

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-05 23:45 ` David S. Miller
@ 2005-07-05 23:55 ` Thomas Graf
  0 siblings, 0 replies; 31+ messages in thread
From: Thomas Graf @ 2005-07-05 23:55 UTC (permalink / raw)
  To: David S. Miller; +Cc: dada1, netdev

* David S. Miller <20050705.164503.104035718.davem@davemloft.net> 2005-07-05 16:45
> From: Thomas Graf <tgraf@suug.ch>
> Date: Wed, 6 Jul 2005 01:41:04 +0200
>
> > I still think we can fix this performance issue without manually
> > unrolling the loop, or we should at least try to. In the end gcc
> > should notice the constant part of the loop and move it out, so
> > basically the only difference should be the additional prio++ and
> > possibly a failing branch prediction.
>
> But the branch prediction is where I personally think a lot of the
> lossage is coming from. These can cost upwards of 20 or 30 processor
> cycles, easily. That's getting close to the cost of an L2 cache miss.

Absolutely. I think what happens is that we produce prediction
failures due to the logic within qdisc_dequeue_head(); I cannot back
this up with numbers though.

> I see the difficulties with this change now, why don't we revisit
> this some time in the future?

Fine with me.

Eric, the patch I just posted should result in the same branch
prediction as your loop unrolling. The only additional overhead we
still have is the list + prio thing and an additional conditional jump
to do the loop. If you have the cycles etc. it would be nice to compare
it with your numbers.

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-05 23:41 ` Thomas Graf
  2005-07-05 23:45 ` David S. Miller
@ 2005-07-06  0:32 ` Eric Dumazet
  2005-07-06  0:51 ` Thomas Graf
  2005-07-06  0:53 ` Eric Dumazet
  1 sibling, 2 replies; 31+ messages in thread
From: Eric Dumazet @ 2005-07-06 0:32 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

Thomas Graf wrote:
> I still think we can fix this performance issue without manually
> unrolling the loop, or we should at least try to. In the end gcc
> should notice the constant part of the loop and move it out, so
> basically the only difference should be the additional prio++ and
> possibly a failing branch prediction.
>
> What about this? I'm still not sure where exactly all the time
> is lost so this is a shot in the dark.
>
> Index: net-2.6/net/sched/sch_generic.c
> ===================================================================
> --- net-2.6.orig/net/sched/sch_generic.c
> +++ net-2.6/net/sched/sch_generic.c
> @@ -330,10 +330,11 @@ static struct sk_buff *pfifo_fast_dequeu
>  {
>  	int prio;
>  	struct sk_buff_head *list = qdisc_priv(qdisc);
> +	struct sk_buff *skb;
>  
> -	for (prio = 0; prio < PFIFO_FAST_BANDS; prio++, list++) {
> -		struct sk_buff *skb = __qdisc_dequeue_head(qdisc, list);
> -		if (skb) {
> +	for (prio = 0; prio < PFIFO_FAST_BANDS; prio++) {
> +		if (!skb_queue_empty(list + prio)) {
> +			skb = __qdisc_dequeue_head(qdisc, list);
>  			qdisc->q.qlen--;
>  			return skb;
>  		}

Hum... shouldn't it be:

+			skb = __qdisc_dequeue_head(qdisc, list + prio);

?

Anyway, the branch mispredictions come from the fact that most packets
are queued in the prio=2 list.

So each time this function is called, a non-unrolled version has to pay
2 to 5 branch mispredictions:

if (!skb_queue_empty(list + prio)) /* branch not taken, mispredict when prio=0 */
if (!skb_queue_empty(list + prio)) /* branch not taken, mispredict when prio=1 */
if (!skb_queue_empty(list + prio)) /* branch taken (or not if queue is really empty), mispredict when prio=2 */

Maybe we can rewrite the whole thing without branches, examining prio
from PFIFO_FAST_BANDS-1 down to 0, at least for modern CPUs with a
conditional move (cmov):

struct sk_buff_head *best = NULL;
struct sk_buff_head *list = qdisc_priv(qdisc) + PFIFO_FAST_BANDS - 1;
if (!skb_queue_empty(list))
	best = list;
list--;
if (!skb_queue_empty(list))
	best = list;
list--;
if (!skb_queue_empty(list))
	best = list;
if (best != NULL) {
	qdisc->q.qlen--;
	return __qdisc_dequeue_head(qdisc, best);
}

This version should have one branch.

I will test this after some sleep :)

See you

Eric

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-06  0:32 ` Eric Dumazet
@ 2005-07-06  0:51 ` Thomas Graf
  2005-07-06  1:04 ` Eric Dumazet
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Graf @ 2005-07-06 0:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev

* Eric Dumazet <42CB2698.2080904@cosmosbay.com> 2005-07-06 02:32
> Hum... shouldn't it be:
>
> +			skb = __qdisc_dequeue_head(qdisc, list + prio);

Correct.

> Anyway, the branch mispredictions come from the fact that most packets
> are queued in the prio=2 list.
>
> So each time this function is called, a non-unrolled version has to pay
> 2 to 5 branch mispredictions:
>
> if (!skb_queue_empty(list + prio)) /* branch not taken, mispredict when prio=0 */

The !expr implies an unlikely so the prediction should be right and
equal to your unrolled version.

> Maybe we can rewrite the whole thing without branches, examining prio
> from PFIFO_FAST_BANDS-1 down to 0, at least for modern CPUs with a
> conditional move (cmov)

This would break the whole thing, the qdisc is supposed to try and
dequeue from the highest priority queue (prio=0) first.

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-06  0:51 ` Thomas Graf
@ 2005-07-06  1:04 ` Eric Dumazet
  2005-07-06  1:07 ` Thomas Graf
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2005-07-06 1:04 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

Thomas Graf wrote:
> > Maybe we can rewrite the whole thing without branches, examining prio
> > from PFIFO_FAST_BANDS-1 down to 0, at least for modern CPUs with a
> > conditional move (cmov)
>
> This would break the whole thing, the qdisc is supposed to try and
> dequeue from the highest priority queue (prio=0) first.

I still dequeue a packet from the highest priority queue. But nothing
prevents us from looking at the three queues in reverse order, if that
lets us avoid the conditional branches.

No memory penalty, since most of the time we were looking at the three
queues anyway, and the 3 sk_buff_heads are in the same cache line.

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-06  1:04 ` Eric Dumazet
@ 2005-07-06  1:07 ` Thomas Graf
  0 siblings, 0 replies; 31+ messages in thread
From: Thomas Graf @ 2005-07-06 1:07 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev

* Eric Dumazet <42CB2E24.6010303@cosmosbay.com> 2005-07-06 03:04
> Thomas Graf wrote:
>
> > > Maybe we can rewrite the whole thing without branches, examining prio
> > > from PFIFO_FAST_BANDS-1 down to 0, at least for modern CPUs with a
> > > conditional move (cmov)
> >
> > This would break the whole thing, the qdisc is supposed to try and
> > dequeue from the highest priority queue (prio=0) first.
>
> I still dequeue a packet from the highest priority queue.

Ahh... sorry, I misread your patch, interesting idea. I'll be waiting
for your numbers.

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-06  0:32 ` Eric Dumazet
  2005-07-06  0:51 ` Thomas Graf
@ 2005-07-06  0:53 ` Eric Dumazet
  2005-07-06  1:02 ` Thomas Graf
  2005-07-06 12:42 ` Thomas Graf
  1 sibling, 2 replies; 31+ messages in thread
From: Eric Dumazet @ 2005-07-06 0:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Thomas Graf, David S. Miller, netdev

Eric Dumazet wrote:
>
> Maybe we can rewrite the whole thing without branches, examining prio
> from PFIFO_FAST_BANDS-1 down to 0, at least for modern CPUs with a
> conditional move (cmov):
>
> struct sk_buff_head *best = NULL;
> struct sk_buff_head *list = qdisc_priv(qdisc) + PFIFO_FAST_BANDS - 1;
> if (!skb_queue_empty(list))
> 	best = list;
> list--;
> if (!skb_queue_empty(list))
> 	best = list;
> list--;
> if (!skb_queue_empty(list))
> 	best = list;
> if (best != NULL) {
> 	qdisc->q.qlen--;
> 	return __qdisc_dequeue_head(qdisc, best);
> }
>
> This version should have one branch.
> I will test this after some sleep :)

(Sorry, still using 2.6.12, but the idea remains.)

static struct sk_buff *
pfifo_fast_dequeue(struct Qdisc* qdisc)
{
	struct sk_buff_head *list = qdisc_priv(qdisc);
	struct sk_buff_head *best = NULL;

	list += 2;
	if (!skb_queue_empty(list))
		best = list;
	list--;
	if (!skb_queue_empty(list))
		best = list;
	list--;
	if (!skb_queue_empty(list))
		best = list;
	if (best) {
		qdisc->q.qlen--;
		return __skb_dequeue(best);
	}
	return NULL;
}

At least the compiler output seems promising:

0000000000000550 <pfifo_fast_dequeue>:
 550: 48 8d 97 f0 00 00 00  lea    0xf0(%rdi),%rdx
 557: 31 c9                 xor    %ecx,%ecx
 559: 48 8d 87 c0 00 00 00  lea    0xc0(%rdi),%rax
 560: 48 39 97 f0 00 00 00  cmp    %rdx,0xf0(%rdi)
 567: 48 0f 45 ca           cmovne %rdx,%rcx
 56b: 48 8d 97 d8 00 00 00  lea    0xd8(%rdi),%rdx
 572: 48 39 97 d8 00 00 00  cmp    %rdx,0xd8(%rdi)
 579: 48 0f 45 ca           cmovne %rdx,%rcx
 57d: 48 39 87 c0 00 00 00  cmp    %rax,0xc0(%rdi)
 584: 48 0f 45 c8           cmovne %rax,%rcx
 588: 31 c0                 xor    %eax,%eax
 58a: 48 85 c9              test   %rcx,%rcx
 58d: 74 32                 je     5c1 <pfifo_fast_dequeue+0x71>  // one conditional branch
 58f: ff 4f 40              decl   0x40(%rdi)
 592: 48 8b 11              mov    (%rcx),%rdx
 595: 48 39 ca              cmp    %rcx,%rdx
 598: 74 27                 je     5c1 <pfifo_fast_dequeue+0x71>  // never-taken branch: always predicted OK
 59a: 48 89 d0              mov    %rdx,%rax
 59d: 48 8b 12              mov    (%rdx),%rdx
 5a0: ff 49 10              decl   0x10(%rcx)
 5a3: 48 c7 40 10 00 00 00  movq   $0x0,0x10(%rax)
 5aa: 00
 5ab: 48 89 4a 08           mov    %rcx,0x8(%rdx)
 5af: 48 89 11              mov    %rdx,(%rcx)
 5b2: 48 c7 40 08 00 00 00  movq   $0x0,0x8(%rax)
 5b9: 00
 5ba: 48 c7 00 00 00 00 00  movq   $0x0,(%rax)
 5c1: 90                    nop
 5c2: c3                    retq

I will post some profiling results tomorrow.

Eric

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-06  0:53 ` Eric Dumazet
@ 2005-07-06  1:02 ` Thomas Graf
  2005-07-06  1:09 ` Eric Dumazet
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Graf @ 2005-07-06 1:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev

* Eric Dumazet <42CB2B84.50702@cosmosbay.com> 2005-07-06 02:53
> (Sorry, still using 2.6.12, but the idea remains.)

I think you got me wrong, the whole point of this qdisc is to
prioritize, which means that we cannot dequeue from prio 1 as long as
the queue in prio 0 is not empty.

If you have no traffic at all for prio=0 and prio=1 then the best
solution is to replace the qdisc on the device with a simple fifo.

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-06  1:02 ` Thomas Graf
@ 2005-07-06  1:09 ` Eric Dumazet
  0 siblings, 0 replies; 31+ messages in thread
From: Eric Dumazet @ 2005-07-06 1:09 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

Thomas Graf wrote:
> I think you got me wrong, the whole point of this qdisc is to
> prioritize, which means that we cannot dequeue from prio 1 as long as
> the queue in prio 0 is not empty.

If prio 0 is not empty, then the last

	if (!skb_queue_empty(list))
		best = list;

will set 'best' to the prio 0 list, and we dequeue the packet from this
prio 0 list, not from prio 1 or prio 2.

> If you have no traffic at all for prio=0 and prio=1 then the best
> solution is to replace the qdisc on the device with a simple fifo.

Yes sure, but I know that already. Unfortunately I have some traffic
on prio=1 and prio=0 (about 5%).

Thank you

Eric

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-06  0:53 ` Eric Dumazet
  2005-07-06  1:02 ` Thomas Graf
@ 2005-07-06 12:42 ` Thomas Graf
  2005-07-07 21:17 ` David S. Miller
  1 sibling, 1 reply; 31+ messages in thread
From: Thomas Graf @ 2005-07-06 12:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev

* Eric Dumazet <42CB2B84.50702@cosmosbay.com> 2005-07-06 02:53

A short recap after some coffee and sleep:

The initial issue you brought up, which you backed up with numbers, is
probably caused by multiple wrong branch predictions due to the fact
that I wrote skb = dequeue(); if (skb), which was assumed to be likely
by the compiler. In your patch you fixed this with !skb_queue_empty(),
which fixed this wrong prediction and also acts as a little
optimization due to skb_queue_empty() being really simple for the
compiler to implement.

The patch I posted should give almost the same result; apart from the
additional wrong branch prediction for the loop, it always has one
wrong prediction, which happens when we hit a non-empty queue.

In your unrolled version you could optimize it even more by taking
advantage of the fact that prio=2 is the most likely non-empty queue,
so you could change the check to likely() and save a wrong branch
prediction for the common case, at the cost of a branch misprediction
if all queues are empty.

The patch I posted results in something like this:

pfifo_fast_dequeue:
	pushl	%ebx
	xorl	%ecx, %ecx
	movl	8(%esp), %ebx
	leal	128(%ebx), %edx
.L129:
	movl	(%edx), %eax
	cmpl	%edx, %eax
	jne	.L132			; if (!skb_queue_empty())
	incl	%ecx
	addl	$20, %edx
	cmpl	$2, %ecx
	jle	.L129			; end of loop
	xorl	%eax, %eax		; all queues empty
.L117:
	popl	%ebx
	ret

I regard the miss here as acceptable for the increased flexibility we
get. It can be optimized with a loop unrolling but my opinion is to
try and avoid that if possible.

Now your second thought is quite interesting, although it heavily
depends on the fact that prio=2 is the most often used band. It will
be interesting to see some numbers.

> static struct sk_buff *
> pfifo_fast_dequeue(struct Qdisc* qdisc)
> {
> 	struct sk_buff_head *list = qdisc_priv(qdisc);
> 	struct sk_buff_head *best = NULL;
>
> 	list += 2;
> 	if (!skb_queue_empty(list))
> 		best = list;
> 	list--;
> 	if (!skb_queue_empty(list))
> 		best = list;
> 	list--;
> 	if (!skb_queue_empty(list))
> 		best = list;

Here is what I mean, a likely() should be even better.

> 	if (best) {
> 		qdisc->q.qlen--;
> 		return __skb_dequeue(best);
> 	}
> 	return NULL;
> }

^ permalink raw reply  [flat|nested] 31+ messages in thread
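[Aside, not part of the original thread: a minimal sketch of the
likely()/unlikely() idea Thomas mentions, applied to Eric's unrolled
2.6.12-style version. The 'found' label layout is an assumption about
the intended shape, not a patch that was posted.]

static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
{
	struct sk_buff_head *list = qdisc_priv(qdisc);

	if (unlikely(!skb_queue_empty(list)))	/* prio 0: rarely busy */
		goto found;
	list++;
	if (unlikely(!skb_queue_empty(list)))	/* prio 1: rarely busy */
		goto found;
	list++;
	if (likely(!skb_queue_empty(list)))	/* prio 2: the common case */
		goto found;
	return NULL;				/* all three bands empty */
found:
	qdisc->q.qlen--;
	return __skb_dequeue(list);
}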
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-06 12:42 ` Thomas Graf
@ 2005-07-07 21:17 ` David S. Miller
  2005-07-07 21:34 ` Thomas Graf
  [not found] ` <42CE22CE.7030902@cosmosbay.com>
  0 siblings, 2 replies; 31+ messages in thread
From: David S. Miller @ 2005-07-07 21:17 UTC (permalink / raw)
  To: tgraf; +Cc: dada1, netdev

From: Thomas Graf <tgraf@suug.ch>
Date: Wed, 6 Jul 2005 14:42:06 +0200

> The initial issue you brought up, which you backed up with numbers, is
> probably caused by multiple wrong branch predictions due to the fact
> that I wrote skb = dequeue(); if (skb), which was assumed to be likely
> by the compiler. In your patch you fixed this with !skb_queue_empty(),
> which fixed this wrong prediction and also acts as a little
> optimization due to skb_queue_empty() being really simple for the
> compiler to implement.

As an aside, this reminds me that as part of my quest to make sk_buff
smaller, I intend to walk across the tree and change all tests of the
form:

	if (!skb_queue_len(list))

into:

	if (skb_queue_empty(list))

It would be really nice, after the above transformation and some
others, to get rid of sk_buff_head->qlen.

Why? Because that also allows us to remove the skb->list member as
well, as its only reason for existing is so that the SKB queue removal
routines can decrement the queue length.

That's kind of silly, and most SKB lists in the kernel do not care
about the queue length at all. Rather, they care about empty and
non-empty. The cases that do care (mostly packet schedulers) can keep
track of the queue length themselves in their private data structures.
When they remove packets, they _know_ which queue to decrement the
queue length of.

^ permalink raw reply  [flat|nested] 31+ messages in thread
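[Aside, not part of the original thread: a minimal sketch of the private
accounting David describes. The struct and function names are
hypothetical; the point is that a queue owner which needs a length keeps
its own counter next to the list, so sk_buff_head no longer has to carry
qlen and skb->list can eventually go away.]

struct my_sched_band {
	struct sk_buff_head	queue;
	unsigned int		qlen;	/* maintained by the owner */
};

static void my_band_enqueue(struct my_sched_band *b, struct sk_buff *skb)
{
	__skb_queue_tail(&b->queue, skb);
	b->qlen++;			/* we know which queue grew */
}

static struct sk_buff *my_band_dequeue(struct my_sched_band *b)
{
	struct sk_buff *skb = __skb_dequeue(&b->queue);

	if (skb)
		b->qlen--;		/* and which queue shrank */
	return skb;
}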
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-07 21:17 ` David S. Miller
@ 2005-07-07 21:34 ` Thomas Graf
  2005-07-07 22:24 ` David S. Miller
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Graf @ 2005-07-07 21:34 UTC (permalink / raw)
  To: David S. Miller; +Cc: dada1, netdev

* David S. Miller <20050707.141718.85410359.davem@davemloft.net> 2005-07-07 14:17
> As an aside, this reminds me that as part of my quest to make
> sk_buff smaller, I intend to walk across the tree and change
> all tests of the form:
>
> 	if (!skb_queue_len(list))
>
> into:
>
> 	if (skb_queue_empty(list))
>
> [...]
>
> The cases that do care (mostly packet schedulers) can keep
> track of the queue length themselves in their private data
> structures. When they remove packets, they _know_ which queue to
> decrement the queue length of.

Since I'm changing the classful qdiscs to use a generic API for queue
management anyway, I could take care of this if you want. WRT the leaf
qdiscs it's a bit more complicated since we have to change the new API
to take a new struct which includes the qlen and the sk_buff_head, but
that's not a problem either.

^ permalink raw reply  [flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-07 21:34 ` Thomas Graf
@ 2005-07-07 22:24 ` David S. Miller
  0 siblings, 0 replies; 31+ messages in thread
From: David S. Miller @ 2005-07-07 22:24 UTC (permalink / raw)
  To: tgraf; +Cc: dada1, netdev

From: Thomas Graf <tgraf@suug.ch>
Date: Thu, 7 Jul 2005 23:34:50 +0200

> Since I'm changing the classful qdiscs to use a generic API for queue
> management anyway, I could take care of this if you want. WRT the leaf
> qdiscs it's a bit more complicated since we have to change the new API
> to take a new struct which includes the qlen and the sk_buff_head, but
> that's not a problem either.

Ok. I'm going to check something like the following into my tree.

It takes care of the obvious cases of a direct binary test of the queue
length being zero vs. non-zero.

This uncovered some seriously questionable stuff along the way. For
example, take a look at drivers/usb/net/usbnet.c:usbnet_stop(). That
code seems to want to wait until all the SKB queues are empty, but the
way it is coded it only waits if all the queues have at least one
packet. I preserved the behavior there, but if someone could verify my
analysis and post a bug fix, I'd really appreciate it.

Thanks.

diff --git a/drivers/bluetooth/hci_vhci.c b/drivers/bluetooth/hci_vhci.c
--- a/drivers/bluetooth/hci_vhci.c
+++ b/drivers/bluetooth/hci_vhci.c
@@ -120,7 +120,7 @@ static unsigned int hci_vhci_chr_poll(st
 
 	poll_wait(file, &hci_vhci->read_wait, wait);
 
-	if (skb_queue_len(&hci_vhci->readq))
+	if (!skb_queue_empty(&hci_vhci->readq))
 		return POLLIN | POLLRDNORM;
 
 	return POLLOUT | POLLWRNORM;
diff --git a/drivers/isdn/hisax/isdnl1.c b/drivers/isdn/hisax/isdnl1.c
--- a/drivers/isdn/hisax/isdnl1.c
+++ b/drivers/isdn/hisax/isdnl1.c
@@ -279,7 +279,8 @@ BChannel_proc_xmt(struct BCState *bcs)
 	if (test_and_clear_bit(FLG_L1_PULL_REQ, &st->l1.Flags))
 		st->l1.l1l2(st, PH_PULL | CONFIRM, NULL);
 	if (!test_bit(BC_FLG_ACTIV, &bcs->Flag)) {
-		if (!test_bit(BC_FLG_BUSY, &bcs->Flag) && (!skb_queue_len(&bcs->squeue))) {
+		if (!test_bit(BC_FLG_BUSY, &bcs->Flag) &&
+		    skb_queue_empty(&bcs->squeue)) {
 			st->l2.l2l1(st, PH_DEACTIVATE | CONFIRM, NULL);
 		}
 	}
diff --git a/drivers/isdn/hisax/isdnl2.c b/drivers/isdn/hisax/isdnl2.c
--- a/drivers/isdn/hisax/isdnl2.c
+++ b/drivers/isdn/hisax/isdnl2.c
@@ -108,7 +108,8 @@ static int l2addrsize(struct Layer2 *l2)
 static void
 set_peer_busy(struct Layer2 *l2) {
 	test_and_set_bit(FLG_PEER_BUSY, &l2->flag);
-	if (skb_queue_len(&l2->i_queue) || skb_queue_len(&l2->ui_queue))
+	if (!skb_queue_empty(&l2->i_queue) ||
+	    !skb_queue_empty(&l2->ui_queue))
 		test_and_set_bit(FLG_L2BLOCK, &l2->flag);
 }
 
@@ -754,7 +755,7 @@ l2_restart_multi(struct FsmInst *fi, int
 		st->l2.l2l3(st, DL_ESTABLISH | INDICATION, NULL);
 
 	if ((ST_L2_7==state) || (ST_L2_8 == state))
-		if (skb_queue_len(&st->l2.i_queue) && cansend(st))
+		if (!skb_queue_empty(&st->l2.i_queue) && cansend(st))
 			st->l2.l2l1(st, PH_PULL | REQUEST, NULL);
 }
 
@@ -810,7 +811,7 @@ l2_connected(struct FsmInst *fi, int eve
 	if (pr != -1)
 		st->l2.l2l3(st, pr, NULL);
 
-	if (skb_queue_len(&st->l2.i_queue) && cansend(st))
+	if (!skb_queue_empty(&st->l2.i_queue) && cansend(st))
 		st->l2.l2l1(st, PH_PULL | REQUEST, NULL);
 }
 
@@ -1014,7 +1015,7 @@ l2_st7_got_super(struct FsmInst *fi, int
 			if(typ != RR) FsmDelTimer(&st->l2.t203, 9);
 			restart_t200(st, 12);
 		}
-		if (skb_queue_len(&st->l2.i_queue) && (typ == RR))
+		if (!skb_queue_empty(&st->l2.i_queue) && (typ == RR))
 			st->l2.l2l1(st, PH_PULL | REQUEST, NULL);
 	} else
 		nrerrorrecovery(fi);
@@ -1120,7 +1121,7 @@ l2_got_iframe(struct FsmInst *fi, int ev
 			return;
 		}
 
-	if (skb_queue_len(&st->l2.i_queue) && (fi->state == ST_L2_7))
+	if (!skb_queue_empty(&st->l2.i_queue) && (fi->state == ST_L2_7))
 		st->l2.l2l1(st, PH_PULL | REQUEST, NULL);
 	if (test_and_clear_bit(FLG_ACK_PEND, &st->l2.flag))
 		enquiry_cr(st, RR, RSP, 0);
@@ -1138,7 +1139,7 @@ l2_got_tei(struct FsmInst *fi, int event
 		test_and_set_bit(FLG_L3_INIT, &st->l2.flag);
 	} else
 		FsmChangeState(fi, ST_L2_4);
-	if (skb_queue_len(&st->l2.ui_queue))
+	if (!skb_queue_empty(&st->l2.ui_queue))
 		tx_ui(st);
 }
 
@@ -1301,7 +1302,7 @@ l2_pull_iqueue(struct FsmInst *fi, int e
 		FsmDelTimer(&st->l2.t203, 13);
 		FsmAddTimer(&st->l2.t200, st->l2.T200, EV_L2_T200, NULL, 11);
 	}
-	if (skb_queue_len(&l2->i_queue) && cansend(st))
+	if (!skb_queue_empty(&l2->i_queue) && cansend(st))
 		st->l2.l2l1(st, PH_PULL | REQUEST, NULL);
 }
 
@@ -1347,7 +1348,7 @@ l2_st8_got_super(struct FsmInst *fi, int
 		}
 		invoke_retransmission(st, nr);
 		FsmChangeState(fi, ST_L2_7);
-		if (skb_queue_len(&l2->i_queue) && cansend(st))
+		if (!skb_queue_empty(&l2->i_queue) && cansend(st))
 			st->l2.l2l1(st, PH_PULL | REQUEST, NULL);
 	} else
 		nrerrorrecovery(fi);
diff --git a/drivers/isdn/hisax/isdnl3.c b/drivers/isdn/hisax/isdnl3.c
--- a/drivers/isdn/hisax/isdnl3.c
+++ b/drivers/isdn/hisax/isdnl3.c
@@ -302,7 +302,7 @@ release_l3_process(struct l3_process *p)
 		    !test_bit(FLG_PTP, &p->st->l2.flag)) {
 			if (p->debug)
 				l3_debug(p->st, "release_l3_process: last process");
-			if (!skb_queue_len(&p->st->l3.squeue)) {
+			if (skb_queue_empty(&p->st->l3.squeue)) {
 				if (p->debug)
 					l3_debug(p->st, "release_l3_process: release link");
 				if (p->st->protocol != ISDN_PTYPE_NI1)
diff --git a/drivers/isdn/i4l/isdn_tty.c b/drivers/isdn/i4l/isdn_tty.c
--- a/drivers/isdn/i4l/isdn_tty.c
+++ b/drivers/isdn/i4l/isdn_tty.c
@@ -1223,7 +1223,7 @@ isdn_tty_write(struct tty_struct *tty, c
 			total += c;
 		}
 		atomic_dec(&info->xmit_lock);
-		if ((info->xmit_count) || (skb_queue_len(&info->xmit_queue))) {
+		if ((info->xmit_count) || !skb_queue_empty(&info->xmit_queue)) {
 			if (m->mdmreg[REG_DXMT] & BIT_DXMT) {
 				isdn_tty_senddown(info);
 				isdn_tty_tint(info);
@@ -1284,7 +1284,7 @@ isdn_tty_flush_chars(struct tty_struct *
 	if (isdn_tty_paranoia_check(info, tty->name, "isdn_tty_flush_chars"))
 		return;
-	if ((info->xmit_count) || (skb_queue_len(&info->xmit_queue)))
+	if ((info->xmit_count) || !skb_queue_empty(&info->xmit_queue))
 		isdn_timer_ctrl(ISDN_TIMER_MODEMXMIT, 1);
 }
 
diff --git a/drivers/isdn/icn/icn.c b/drivers/isdn/icn/icn.c
--- a/drivers/isdn/icn/icn.c
+++ b/drivers/isdn/icn/icn.c
@@ -304,12 +304,12 @@ icn_pollbchan_send(int channel, icn_card
 	isdn_ctrl cmd;
 
 	if (!(card->sndcount[channel] || card->xskb[channel] ||
-	      skb_queue_len(&card->spqueue[channel])))
+	      !skb_queue_empty(&card->spqueue[channel])))
 		return;
 
 	if (icn_trymaplock_channel(card, mch)) {
 		while (sbfree && (card->sndcount[channel] ||
-		       skb_queue_len(&card->spqueue[channel]) ||
+		       !skb_queue_empty(&card->spqueue[channel]) ||
 		       card->xskb[channel])) {
 			spin_lock_irqsave(&card->lock, flags);
 			if (card->xmit_lock[channel]) {
diff --git a/drivers/net/hamradio/scc.c b/drivers/net/hamradio/scc.c
--- a/drivers/net/hamradio/scc.c
+++ b/drivers/net/hamradio/scc.c
@@ -304,7 +304,7 @@ static inline void scc_discard_buffers(s
 		scc->tx_buff = NULL;
 	}
 
-	while (skb_queue_len(&scc->tx_queue))
+	while (!skb_queue_empty(&scc->tx_queue))
 		dev_kfree_skb(skb_dequeue(&scc->tx_queue));
 
 	spin_unlock_irqrestore(&scc->lock, flags);
@@ -1126,8 +1126,7 @@ static void t_dwait(unsigned long channe
 
 	if (scc->stat.tx_state == TXS_WAIT)	/* maxkeyup or idle timeout */
 	{
-		if (skb_queue_len(&scc->tx_queue) == 0)	/* nothing to send */
-		{
+		if (skb_queue_empty(&scc->tx_queue)) {	/* nothing to send */
 			scc->stat.tx_state = TXS_IDLE;
 			netif_wake_queue(scc->dev);	/* t_maxkeyup locked it. */
 			return;
diff --git a/drivers/net/ppp_async.c b/drivers/net/ppp_async.c
--- a/drivers/net/ppp_async.c
+++ b/drivers/net/ppp_async.c
@@ -364,7 +364,7 @@ ppp_asynctty_receive(struct tty_struct *
 	spin_lock_irqsave(&ap->recv_lock, flags);
 	ppp_async_input(ap, buf, cflags, count);
 	spin_unlock_irqrestore(&ap->recv_lock, flags);
-	if (skb_queue_len(&ap->rqueue))
+	if (!skb_queue_empty(&ap->rqueue))
 		tasklet_schedule(&ap->tsk);
 	ap_put(ap);
 	if (test_and_clear_bit(TTY_THROTTLED, &tty->flags)
diff --git a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c
--- a/drivers/net/ppp_generic.c
+++ b/drivers/net/ppp_generic.c
@@ -1237,8 +1237,8 @@ static int ppp_mp_explode(struct ppp *pp
 		pch = list_entry(list, struct channel, clist);
 		navail += pch->avail = (pch->chan != NULL);
 		if (pch->avail) {
-			if (skb_queue_len(&pch->file.xq) == 0
-			    || !pch->had_frag) {
+			if (skb_queue_empty(&pch->file.xq) ||
+			    !pch->had_frag) {
 				pch->avail = 2;
 				++nfree;
 			}
@@ -1374,8 +1374,8 @@ static int ppp_mp_explode(struct ppp *pp
 
 		/* try to send it down the channel */
 		chan = pch->chan;
-		if (skb_queue_len(&pch->file.xq)
-		    || !chan->ops->start_xmit(chan, frag))
+		if (!skb_queue_empty(&pch->file.xq) ||
+		    !chan->ops->start_xmit(chan, frag))
 			skb_queue_tail(&pch->file.xq, frag);
 		pch->had_frag = 1;
 		p += flen;
@@ -1412,7 +1412,7 @@ ppp_channel_push(struct channel *pch)
 
 	spin_lock_bh(&pch->downl);
 	if (pch->chan != 0) {
-		while (skb_queue_len(&pch->file.xq) > 0) {
+		while (!skb_queue_empty(&pch->file.xq)) {
 			skb = skb_dequeue(&pch->file.xq);
 			if (!pch->chan->ops->start_xmit(pch->chan, skb)) {
 				/* put the packet back and try again later */
@@ -1426,7 +1426,7 @@ ppp_channel_push(struct channel *pch)
 	}
 	spin_unlock_bh(&pch->downl);
 	/* see if there is anything from the attached unit to be sent */
-	if (skb_queue_len(&pch->file.xq) == 0) {
+	if (skb_queue_empty(&pch->file.xq)) {
 		read_lock_bh(&pch->upl);
 		ppp = pch->ppp;
 		if (ppp != 0)
diff --git a/drivers/net/ppp_synctty.c b/drivers/net/ppp_synctty.c
--- a/drivers/net/ppp_synctty.c
+++ b/drivers/net/ppp_synctty.c
@@ -406,7 +406,7 @@ ppp_sync_receive(struct tty_struct *tty,
 	spin_lock_irqsave(&ap->recv_lock, flags);
 	ppp_sync_input(ap, buf, cflags, count);
 	spin_unlock_irqrestore(&ap->recv_lock, flags);
-	if (skb_queue_len(&ap->rqueue))
+	if (!skb_queue_empty(&ap->rqueue))
 		tasklet_schedule(&ap->tsk);
 	sp_put(ap);
 	if (test_and_clear_bit(TTY_THROTTLED, &tty->flags)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -215,7 +215,7 @@ static unsigned int tun_chr_poll(struct
 
 	poll_wait(file, &tun->read_wait, wait);
 
-	if (skb_queue_len(&tun->readq))
+	if (!skb_queue_empty(&tun->readq))
 		mask |= POLLIN | POLLRDNORM;
 
 	return mask;
diff --git a/drivers/net/wireless/airo.c b/drivers/net/wireless/airo.c
--- a/drivers/net/wireless/airo.c
+++ b/drivers/net/wireless/airo.c
@@ -2374,7 +2374,7 @@ void stop_airo_card( struct net_device *
 	/*
 	 * Clean out tx queue
 	 */
-	if (test_bit(FLAG_MPI, &ai->flags) && skb_queue_len (&ai->txq) > 0) {
+	if (test_bit(FLAG_MPI, &ai->flags) && !skb_queue_empty(&ai->txq)) {
 		struct sk_buff *skb = NULL;
 		for (;(skb = skb_dequeue(&ai->txq));)
 			dev_kfree_skb(skb);
@@ -3287,7 +3287,7 @@ exitrx:
 	if (status & EV_TXEXC)
 		get_tx_error(apriv, -1);
 	spin_lock_irqsave(&apriv->aux_lock, flags);
-	if (skb_queue_len (&apriv->txq)) {
+	if (!skb_queue_empty(&apriv->txq)) {
 		spin_unlock_irqrestore(&apriv->aux_lock,flags);
 		mpi_send_packet (dev);
 	} else {
diff --git a/drivers/s390/net/claw.c b/drivers/s390/net/claw.c
--- a/drivers/s390/net/claw.c
+++ b/drivers/s390/net/claw.c
@@ -428,7 +428,7 @@ claw_pack_skb(struct claw_privbk *privpt
 	new_skb = NULL;		/* assume no dice */
 	pkt_cnt = 0;
 	CLAW_DBF_TEXT(4,trace,"PackSKBe");
-	if (skb_queue_len(&p_ch->collect_queue) > 0) {
+	if (!skb_queue_empty(&p_ch->collect_queue)) {
 		/* some data */
 		held_skb = skb_dequeue(&p_ch->collect_queue);
 		if (p_env->packing != DO_PACKED)
@@ -1254,7 +1254,7 @@ claw_write_next ( struct chbk * p_ch )
 	privptr = (struct claw_privbk *) dev->priv;
 	claw_free_wrt_buf( dev );
 	if ((privptr->write_free_count > 0) &&
-	    (skb_queue_len(&p_ch->collect_queue) > 0)) {
+	    !skb_queue_empty(&p_ch->collect_queue)) {
 		pk_skb = claw_pack_skb(privptr);
 		while (pk_skb != NULL) {
 			rc = claw_hw_tx( pk_skb, dev,1);
diff --git a/drivers/s390/net/ctctty.c b/drivers/s390/net/ctctty.c
--- a/drivers/s390/net/ctctty.c
+++ b/drivers/s390/net/ctctty.c
@@ -156,7 +156,7 @@ ctc_tty_readmodem(ctc_tty_info *info)
 				skb_queue_head(&info->rx_queue, skb);
 			else {
 				kfree_skb(skb);
-				ret = skb_queue_len(&info->rx_queue);
+				ret = !skb_queue_empty(&info->rx_queue);
 			}
 		}
 	}
@@ -530,7 +530,7 @@ ctc_tty_write(struct tty_struct *tty, co
 		total += c;
 		count -= c;
 	}
-	if (skb_queue_len(&info->tx_queue)) {
+	if (!skb_queue_empty(&info->tx_queue)) {
 		info->lsr &= ~UART_LSR_TEMT;
 		tasklet_schedule(&info->tasklet);
 	}
@@ -594,7 +594,7 @@ ctc_tty_flush_chars(struct tty_struct *t
 		return;
 	if (ctc_tty_paranoia_check(info, tty->name, "ctc_tty_flush_chars"))
 		return;
-	if (tty->stopped || tty->hw_stopped || (!skb_queue_len(&info->tx_queue)))
+	if (tty->stopped || tty->hw_stopped || skb_queue_empty(&info->tx_queue))
 		return;
 	tasklet_schedule(&info->tasklet);
 }
diff --git a/drivers/usb/net/usbnet.c b/drivers/usb/net/usbnet.c
--- a/drivers/usb/net/usbnet.c
+++ b/drivers/usb/net/usbnet.c
@@ -3227,9 +3227,9 @@ static int usbnet_stop (struct net_devic
 	temp = unlink_urbs (dev, &dev->txq) + unlink_urbs (dev, &dev->rxq);
 
 	// maybe wait for deletions to finish.
-	while (skb_queue_len (&dev->rxq)
-			&& skb_queue_len (&dev->txq)
-			&& skb_queue_len (&dev->done)) {
+	while (!skb_queue_empty(&dev->rxq) &&
+	       !skb_queue_empty(&dev->txq) &&
+	       !skb_queue_empty(&dev->done)) {
 		msleep(UNLINK_TIMEOUT_MS);
 		if (netif_msg_ifdown (dev))
 			devdbg (dev, "waited for %d urb completions", temp);
diff --git a/include/net/irda/irda_device.h b/include/net/irda/irda_device.h
--- a/include/net/irda/irda_device.h
+++ b/include/net/irda/irda_device.h
@@ -224,7 +224,7 @@ int  irda_device_is_receiving(struct net
 /* Interface for internal use */
 static inline int irda_device_txqueue_empty(const struct net_device *dev)
 {
-	return (skb_queue_len(&dev->qdisc->q) == 0);
+	return skb_queue_empty(&dev->qdisc->q);
 }
 int  irda_device_set_raw_mode(struct net_device* self, int status);
 struct net_device *alloc_irdadev(int sizeof_priv);
diff --git a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -991,7 +991,7 @@ static __inline__ void tcp_fast_path_on(
 
 static inline void tcp_fast_path_check(struct sock *sk, struct tcp_sock *tp)
 {
-	if (skb_queue_len(&tp->out_of_order_queue) == 0 &&
+	if (skb_queue_empty(&tp->out_of_order_queue) &&
 	    tp->rcv_wnd &&
 	    atomic_read(&sk->sk_rmem_alloc) < sk->sk_rcvbuf &&
 	    !tp->urg_data)
diff --git a/net/bluetooth/cmtp/core.c b/net/bluetooth/cmtp/core.c
--- a/net/bluetooth/cmtp/core.c
+++ b/net/bluetooth/cmtp/core.c
@@ -213,7 +213,7 @@ static int cmtp_send_frame(struct cmtp_s
 	return kernel_sendmsg(sock, &msg, &iv, 1, len);
 }
 
-static int cmtp_process_transmit(struct cmtp_session *session)
+static void cmtp_process_transmit(struct cmtp_session *session)
 {
 	struct sk_buff *skb, *nskb;
 	unsigned char *hdr;
@@ -223,7 +223,7 @@ static int cmtp_process_transmit(struct
 
 	if (!(nskb = alloc_skb(session->mtu, GFP_ATOMIC))) {
 		BT_ERR("Can't allocate memory for new frame");
-		return -ENOMEM;
+		return;
 	}
 
 	while ((skb = skb_dequeue(&session->transmit))) {
@@ -275,8 +275,6 @@ static int cmtp_process_transmit(struct
 	cmtp_send_frame(session, nskb->data, nskb->len);
 
 	kfree_skb(nskb);
-
-	return skb_queue_len(&session->transmit);
 }
 
 static int cmtp_session(void *arg)
diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
--- a/net/bluetooth/hidp/core.c
+++ b/net/bluetooth/hidp/core.c
@@ -428,7 +428,7 @@ static int hidp_send_frame(struct socket
 	return kernel_sendmsg(sock, &msg, &iv, 1, len);
 }
 
-static int hidp_process_transmit(struct hidp_session *session)
+static void hidp_process_transmit(struct hidp_session *session)
 {
 	struct sk_buff *skb;
 
@@ -453,9 +453,6 @@ static int hidp_process_transmit(struct
 		hidp_set_timer(session);
 		kfree_skb(skb);
 	}
-
-	return skb_queue_len(&session->ctrl_transmit) +
-		skb_queue_len(&session->intr_transmit);
 }
 
 static int hidp_session(void *arg)
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -590,8 +590,11 @@ static long rfcomm_sock_data_wait(struct
 	for (;;) {
 		set_current_state(TASK_INTERRUPTIBLE);
 
-		if (skb_queue_len(&sk->sk_receive_queue) || sk->sk_err || (sk->sk_shutdown & RCV_SHUTDOWN) ||
-		    signal_pending(current) || !timeo)
+		if (!skb_queue_empty(&sk->sk_receive_queue) ||
+		    sk->sk_err ||
+		    (sk->sk_shutdown & RCV_SHUTDOWN) ||
+		    signal_pending(current) ||
+		    !timeo)
 			break;
 
 		set_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
diff --git a/net/bluetooth/rfcomm/tty.c b/net/bluetooth/rfcomm/tty.c
--- a/net/bluetooth/rfcomm/tty.c
+++ b/net/bluetooth/rfcomm/tty.c
@@ -781,7 +781,7 @@ static int rfcomm_tty_chars_in_buffer(st
 
 	BT_DBG("tty %p dev %p", tty, dev);
 
-	if (skb_queue_len(&dlc->tx_queue))
+	if (!skb_queue_empty(&dlc->tx_queue))
 		return dlc->mtu;
 
 	return 0;
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -536,7 +536,7 @@ static void dn_keepalive(struct sock *sk
 	 * we are double checking that we are not sending too
 	 * many of these keepalive frames.
	 */
-	if (skb_queue_len(&scp->other_xmit_queue) == 0)
+	if (skb_queue_empty(&scp->other_xmit_queue))
 		dn_nsp_send_link(sk, DN_NOCHANGE, 0);
 }
 
@@ -1191,7 +1191,7 @@ static unsigned int dn_poll(struct file
 	struct dn_scp *scp = DN_SK(sk);
 	int mask = datagram_poll(file, sock, wait);
 
-	if (skb_queue_len(&scp->other_receive_queue))
+	if (!skb_queue_empty(&scp->other_receive_queue))
 		mask |= POLLRDBAND;
 
 	return mask;
@@ -1214,7 +1214,7 @@ static int dn_ioctl(struct socket *sock,
 
 	case SIOCATMARK:
 		lock_sock(sk);
-		val = (skb_queue_len(&scp->other_receive_queue) != 0);
+		val = !skb_queue_empty(&scp->other_receive_queue);
 		if (scp->state != DN_RUN)
 			val = -ENOTCONN;
 		release_sock(sk);
@@ -1630,7 +1630,7 @@ static int dn_data_ready(struct sock *sk
 	int len = 0;
 
 	if (flags & MSG_OOB)
-		return skb_queue_len(q) ? 1 : 0;
+		return !skb_queue_empty(q) ? 1 : 0;
 
 	while(skb != (struct sk_buff *)q) {
 		struct dn_skb_cb *cb = DN_SKB_CB(skb);
@@ -1707,7 +1707,7 @@ static int dn_recvmsg(struct kiocb *iocb
 	if (sk->sk_err)
 		goto out;
 
-	if (skb_queue_len(&scp->other_receive_queue)) {
+	if (!skb_queue_empty(&scp->other_receive_queue)) {
 		if (!(flags & MSG_OOB)) {
 			msg->msg_flags |= MSG_OOB;
 			if (!scp->other_report) {
diff --git a/net/decnet/dn_nsp_out.c b/net/decnet/dn_nsp_out.c
--- a/net/decnet/dn_nsp_out.c
+++ b/net/decnet/dn_nsp_out.c
@@ -342,7 +342,8 @@ int dn_nsp_xmit_timeout(struct sock *sk)
 
 	dn_nsp_output(sk);
 
-	if (skb_queue_len(&scp->data_xmit_queue) || skb_queue_len(&scp->other_xmit_queue))
+	if (!skb_queue_empty(&scp->data_xmit_queue) ||
+	    !skb_queue_empty(&scp->other_xmit_queue))
 		scp->persist = dn_nsp_persist(sk);
 
 	return 0;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1105,7 +1105,7 @@ static void tcp_prequeue_process(struct
 	struct sk_buff *skb;
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	NET_ADD_STATS_USER(LINUX_MIB_TCPPREQUEUED, skb_queue_len(&tp->ucopy.prequeue));
+	NET_INC_STATS_USER(LINUX_MIB_TCPPREQUEUED);
 
 	/* RX process wants to run with disabled BHs, though it is not
 	 * necessary */
@@ -1369,7 +1369,7 @@ int tcp_recvmsg(struct kiocb *iocb, stru
 			 * is not empty. It is more elegant, but eats cycles,
 			 * unfortunately.
 			 */
-			if (skb_queue_len(&tp->ucopy.prequeue))
+			if (!skb_queue_empty(&tp->ucopy.prequeue))
 				goto do_prequeue;
 
 			/* __ Set realtime policy in scheduler __ */
@@ -1394,7 +1394,7 @@ int tcp_recvmsg(struct kiocb *iocb, stru
 		}
 
 		if (tp->rcv_nxt == tp->copied_seq &&
-		    skb_queue_len(&tp->ucopy.prequeue)) {
+		    !skb_queue_empty(&tp->ucopy.prequeue)) {
 do_prequeue:
 			tcp_prequeue_process(sk);
 
@@ -1476,7 +1476,7 @@ skip_copy:
 	} while (len > 0);
 
 	if (user_recv) {
-		if (skb_queue_len(&tp->ucopy.prequeue)) {
+		if (!skb_queue_empty(&tp->ucopy.prequeue)) {
 			int chunk;
 
 			tp->ucopy.len = copied > 0 ? len : 0;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2802,7 +2802,7 @@ static void tcp_sack_remove(struct tcp_s
 	int this_sack;
 
 	/* Empty ofo queue, hence, all the SACKs are eaten. Clear.
*/ - if (skb_queue_len(&tp->out_of_order_queue) == 0) { + if (skb_queue_empty(&tp->out_of_order_queue)) { tp->rx_opt.num_sacks = 0; tp->rx_opt.eff_sacks = tp->rx_opt.dsack; return; @@ -2935,13 +2935,13 @@ queue_and_out: if(th->fin) tcp_fin(skb, sk, th); - if (skb_queue_len(&tp->out_of_order_queue)) { + if (!skb_queue_empty(&tp->out_of_order_queue)) { tcp_ofo_queue(sk); /* RFC2581. 4.2. SHOULD send immediate ACK, when * gap in queue is filled. */ - if (!skb_queue_len(&tp->out_of_order_queue)) + if (skb_queue_empty(&tp->out_of_order_queue)) tp->ack.pingpong = 0; } @@ -3249,9 +3249,8 @@ static int tcp_prune_queue(struct sock * * This must not ever occur. */ /* First, purge the out_of_order queue. */ - if (skb_queue_len(&tp->out_of_order_queue)) { - NET_ADD_STATS_BH(LINUX_MIB_OFOPRUNED, - skb_queue_len(&tp->out_of_order_queue)); + if (!skb_queue_empty(&tp->out_of_order_queue)) { + NET_INC_STATS_BH(LINUX_MIB_OFOPRUNED); __skb_queue_purge(&tp->out_of_order_queue); /* Reset SACK state. A conforming SACK implementation will diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -231,11 +231,10 @@ static void tcp_delack_timer(unsigned lo } tp->ack.pending &= ~TCP_ACK_TIMER; - if (skb_queue_len(&tp->ucopy.prequeue)) { + if (!skb_queue_empty(&tp->ucopy.prequeue)) { struct sk_buff *skb; - NET_ADD_STATS_BH(LINUX_MIB_TCPSCHEDULERFAILED, - skb_queue_len(&tp->ucopy.prequeue)); + NET_INC_STATS_BH(LINUX_MIB_TCPSCHEDULERFAILED); while ((skb = __skb_dequeue(&tp->ucopy.prequeue)) != NULL) sk->sk_backlog_rcv(sk, skb); diff --git a/net/irda/irlap.c b/net/irda/irlap.c --- a/net/irda/irlap.c +++ b/net/irda/irlap.c @@ -445,9 +445,8 @@ void irlap_disconnect_request(struct irl IRDA_ASSERT(self->magic == LAP_MAGIC, return;); /* Don't disconnect until all data frames are successfully sent */ - if (skb_queue_len(&self->txq) > 0) { + if (!skb_queue_empty(&self->txq)) { self->disconnect_pending = TRUE; - return; } diff --git a/net/irda/irlap_event.c b/net/irda/irlap_event.c --- a/net/irda/irlap_event.c +++ b/net/irda/irlap_event.c @@ -191,7 +191,7 @@ static void irlap_start_poll_timer(struc * Send out the RR frames faster if our own transmit queue is empty, or * if the peer is busy. The effect is a much faster conversation */ - if ((skb_queue_len(&self->txq) == 0) || (self->remote_busy)) { + if (skb_queue_empty(&self->txq) || self->remote_busy) { if (self->fast_RR == TRUE) { /* * Assert that the fast poll timer has not reached the @@ -263,7 +263,7 @@ void irlap_do_event(struct irlap_cb *sel IRDA_DEBUG(2, "%s() : queue len = %d\n", __FUNCTION__, skb_queue_len(&self->txq)); - if (skb_queue_len(&self->txq)) { + if (!skb_queue_empty(&self->txq)) { /* Prevent race conditions with irlap_data_request() */ self->local_busy = TRUE; @@ -1074,7 +1074,7 @@ static int irlap_state_xmit_p(struct irl #else /* CONFIG_IRDA_DYNAMIC_WINDOW */ /* Window has been adjusted for the max packet * size, so much simpler... - Jean II */ - nextfit = (skb_queue_len(&self->txq) > 0); + nextfit = !skb_queue_empty(&self->txq); #endif /* CONFIG_IRDA_DYNAMIC_WINDOW */ /* * Send data with poll bit cleared only if window > 1 @@ -1814,7 +1814,7 @@ static int irlap_state_xmit_s(struct irl #else /* CONFIG_IRDA_DYNAMIC_WINDOW */ /* Window has been adjusted for the max packet * size, so much simpler... 
- Jean II */ - nextfit = (skb_queue_len(&self->txq) > 0); + nextfit = !skb_queue_empty(&self->txq); #endif /* CONFIG_IRDA_DYNAMIC_WINDOW */ /* * Send data with final bit cleared only if window > 1 @@ -1937,7 +1937,7 @@ static int irlap_state_nrm_s(struct irla irlap_data_indication(self, skb, FALSE); /* Any pending data requests? */ - if ((skb_queue_len(&self->txq) > 0) && + if (!skb_queue_empty(&self->txq) && (self->window > 0)) { self->ack_required = TRUE; @@ -2038,7 +2038,7 @@ static int irlap_state_nrm_s(struct irla /* * Any pending data requests? */ - if ((skb_queue_len(&self->txq) > 0) && + if (!skb_queue_empty(&self->txq) && (self->window > 0) && !self->remote_busy) { irlap_data_indication(self, skb, TRUE); @@ -2069,7 +2069,7 @@ static int irlap_state_nrm_s(struct irla */ nr_status = irlap_validate_nr_received(self, info->nr); if (nr_status == NR_EXPECTED) { - if ((skb_queue_len( &self->txq) > 0) && + if (!skb_queue_empty(&self->txq) && (self->window > 0)) { self->remote_busy = FALSE; diff --git a/net/irda/irlap_frame.c b/net/irda/irlap_frame.c --- a/net/irda/irlap_frame.c +++ b/net/irda/irlap_frame.c @@ -1018,11 +1018,10 @@ void irlap_resend_rejected_frames(struct /* * We can now fill the window with additional data frames */ - while (skb_queue_len( &self->txq) > 0) { + while (!skb_queue_empty(&self->txq)) { IRDA_DEBUG(0, "%s(), sending additional frames!\n", __FUNCTION__); - if ((skb_queue_len( &self->txq) > 0) && - (self->window > 0)) { + if (self->window > 0) { skb = skb_dequeue( &self->txq); IRDA_ASSERT(skb != NULL, return;); @@ -1031,8 +1030,7 @@ void irlap_resend_rejected_frames(struct * bit cleared */ if ((self->window > 1) && - skb_queue_len(&self->txq) > 0) - { + !skb_queue_empty(&self->txq)) { irlap_send_data_primary(self, skb); } else { irlap_send_data_primary_poll(self, skb); diff --git a/net/irda/irttp.c b/net/irda/irttp.c --- a/net/irda/irttp.c +++ b/net/irda/irttp.c @@ -1513,7 +1513,7 @@ int irttp_disconnect_request(struct tsap /* * Check if there is still data segments in the transmit queue */ - if (skb_queue_len(&self->tx_queue) > 0) { + if (!skb_queue_empty(&self->tx_queue)) { if (priority == P_HIGH) { /* * No need to send the queued data, if we are diff --git a/net/llc/llc_c_ev.c b/net/llc/llc_c_ev.c --- a/net/llc/llc_c_ev.c +++ b/net/llc/llc_c_ev.c @@ -84,7 +84,7 @@ static u16 llc_util_nr_inside_tx_window( if (llc->dev->flags & IFF_LOOPBACK) goto out; rc = 1; - if (!skb_queue_len(&llc->pdu_unack_q)) + if (skb_queue_empty(&llc->pdu_unack_q)) goto out; skb = skb_peek(&llc->pdu_unack_q); pdu = llc_pdu_sn_hdr(skb); diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -858,7 +858,7 @@ static inline void netlink_rcv_wake(stru { struct netlink_sock *nlk = nlk_sk(sk); - if (!skb_queue_len(&sk->sk_receive_queue)) + if (skb_queue_empty(&sk->sk_receive_queue)) clear_bit(0, &nlk->state); if (!test_bit(0, &nlk->state)) wake_up_interruptible(&nlk->wait); diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c --- a/net/sched/sch_red.c +++ b/net/sched/sch_red.c @@ -385,7 +385,7 @@ static int red_change(struct Qdisc *sch, memcpy(q->Stab, RTA_DATA(tb[TCA_RED_STAB-1]), 256); q->qcount = -1; - if (skb_queue_len(&sch->q) == 0) + if (skb_queue_empty(&sch->q)) PSCHED_SET_PASTPERFECT(q->qidlestart); sch_tree_unlock(sch); return 0; diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -302,7 +302,7 @@ static void unix_write_space(struct sock * may receive 
messages only from that peer. */ static void unix_dgram_disconnected(struct sock *sk, struct sock *other) { - if (skb_queue_len(&sk->sk_receive_queue)) { + if (!skb_queue_empty(&sk->sk_receive_queue)) { skb_queue_purge(&sk->sk_receive_queue); wake_up_interruptible_all(&unix_sk(sk)->peer_wait); @@ -1619,7 +1619,7 @@ static long unix_stream_data_wait(struct for (;;) { prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE); - if (skb_queue_len(&sk->sk_receive_queue) || + if (!skb_queue_empty(&sk->sk_receive_queue) || sk->sk_err || (sk->sk_shutdown & RCV_SHUTDOWN) || signal_pending(current) || ^ permalink raw reply [flat|nested] 31+ messages in thread
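A note on the substitution above: skb_queue_empty() tests whether the list
head points back at itself, while skb_queue_len() reads the cached qlen
counter, so "skb_queue_len(q) == 0" and "skb_queue_empty(q)" agree whenever
the queue is in a consistent state. The standalone C sketch below models the
2.6-era helpers to make that equivalence concrete; it is illustrative only,
not kernel source, and the struct is reduced to the fields that matter here.

#include <assert.h>
#include <stdio.h>

struct sk_buff;

/* reduced model of struct sk_buff_head from include/linux/skbuff.h */
struct sk_buff_head {
	struct sk_buff *next;	/* circular list: points back at the    */
	struct sk_buff *prev;	/* head itself while the queue is empty */
	unsigned int qlen;	/* cached element count                 */
};

/* one pointer comparison, no dependence on the qlen field */
static int skb_queue_empty(const struct sk_buff_head *list)
{
	return list->next == (const struct sk_buff *)list;
}

/* reads the cached counter instead */
static unsigned int skb_queue_len(const struct sk_buff_head *list)
{
	return list->qlen;
}

int main(void)
{
	struct sk_buff_head q = {
		.next = (struct sk_buff *)&q,	/* empty queue */
		.prev = (struct sk_buff *)&q,
		.qlen = 0,
	};

	/* the conversions in the patch rely on this equivalence */
	assert(skb_queue_empty(&q) == (skb_queue_len(&q) == 0));
	printf("empty=%d len=%u\n", skb_queue_empty(&q), skb_queue_len(&q));
	return 0;
}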
[parent not found: <42CE22CE.7030902@cosmosbay.com>]
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  [not found] ` <42CE22CE.7030902@cosmosbay.com>
@ 2005-07-08  7:30 ` David S. Miller
  2005-07-08  8:19 ` Eric Dumazet
  0 siblings, 1 reply; 31+ messages in thread
From: David S. Miller @ 2005-07-08 7:30 UTC (permalink / raw)
To: dada1; +Cc: tgraf, netdev
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Fri, 08 Jul 2005 08:53:02 +0200

> About making sk_buff smaller, I use this patch to declare 'struct
> sec_path *sp' only ifdef CONFIG_XFRM, what do you think ? I also
> use a patch to declare nfcache, nfctinfo and nfct only if
> CONFIG_NETFILTER_CONNTRACK or CONFIG_NETFILTER_CONNTRACK_MODULE are
> defined, but that's more intrusive. Also, tc_index is not used if
> CONFIG_NET_SCHED only is declared but none of CONFIG_NET_SCH_*. In my
> case, I am using CONFIG_NET_SCHED only to be able to do: tc -s -d
> qdisc

Distributions enable all of the ifdefs, and that is thus the size and
resultant performance most users see.  That's why I'm working on
shrinking the size assuming all the config options are enabled,
because that is the reality for most installations.

For all of this stuff we could consider stealing some ideas from BSD,
namely doing something similar to their MBUF tags.

If a subsystem wants to add a cookie to a networking buffer, it
allocates a tag and links it into the struct.  So, you basically get
away with only one pointer (a struct hlist_head).  We could use this
for the security, netfilter, and TC stuff.

I don't know exactly what our tags would look like, but perhaps:

struct skb_tag;

struct skb_tag_type {
	void (*destructor)(struct skb_tag *);
	kmem_cache_t *slab_cache;
	const char *name;
};

struct skb_tag {
	struct hlist_node list;
	struct skb_tag_type *owner;
	int tag_id;
	char data[0];
};

struct sk_buff {
	...
	struct hlist_head tag_list;
	...
};

Then netfilter does stuff like:

	struct sk_buff *skb;
	struct skb_tag *tag;
	struct conntrack_skb_info *info;

	tag = skb_find_tag(skb, SKB_TAG_NETFILTER_CONNTRACK);
	info = (struct conntrack_skb_info *) tag->data;

etc. etc.

The downsides to this approach are:

1) Tagging an SKB eats a memory allocation, which isn't nice.
   This is mainly why I haven't mentioned this idea before.

   It may be that, on an active system, the per-cpu SLAB caches for
   such tag objects might keep the allocation costs real low.
   Another factor is that tags are relatively tiny, so a large number
   of them fit in one SLAB.

   But on the other hand we've been trying to reduce per-packet
   kmalloc() counts, see the SKB fast-clone discussions about that.
   And people ask for SKB recycling all the time.

2) skb_clone() would get more expensive.  This is because you'd need
   to clone the SKB tags as well.

   There is the possibility to hang the tags off of the skb_shinfo()
   area.  I know this idea sounds crazy, but the theory goes that if
   the netfilter et al. info would change (and thus, so would the
   associated tags), then you'd need to COW the SKB anyways.

   This is actually an idea worth considering regardless of whether
   we do tags or not.  It would result in less reference counting
   when we clone an SKB with netfilter stuff or security stuff
   attached.

Overall I'm not too thrilled with the idea, but I'm enthusiastic
about being convinced otherwise, since this would shrink sk_buff
dramatically. :-)

^ permalink raw reply	[flat|nested] 31+ messages in thread
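As a sketch of how lookup and insertion over that tag_list might go --
skb_find_tag() is only named in the proposal above, skb_add_tag() is invented
here for illustration, and neither exists in any tree -- something like the
following, written against the hlist primitives in <linux/list.h> of that era:

static struct skb_tag *skb_find_tag(struct sk_buff *skb, int tag_id)
{
	struct skb_tag *tag;
	struct hlist_node *pos;

	/* walk the per-skb tag list looking for the subsystem's id */
	hlist_for_each_entry(tag, pos, &skb->tag_list, list)
		if (tag->tag_id == tag_id)
			return tag;
	return NULL;
}

static struct skb_tag *skb_add_tag(struct sk_buff *skb,
				   struct skb_tag_type *type, int tag_id)
{
	/* this is the per-packet allocation listed as downside 1) */
	struct skb_tag *tag = kmem_cache_alloc(type->slab_cache, GFP_ATOMIC);

	if (!tag)
		return NULL;
	tag->owner = type;
	tag->tag_id = tag_id;
	hlist_add_head(&tag->list, &skb->tag_list);
	return tag;
}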
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-08  7:30 ` David S. Miller
@ 2005-07-08  8:19 ` Eric Dumazet
  2005-07-08 11:08 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2005-07-08 8:19 UTC (permalink / raw)
To: David S. Miller; +Cc: tgraf, netdev

David S. Miller a écrit :
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Fri, 08 Jul 2005 08:53:02 +0200
>
>> About making sk_buff smaller, I use this patch to declare 'struct
>> sec_path *sp' only ifdef CONFIG_XFRM, what do you think ? I also
>> use a patch to declare nfcache, nfctinfo and nfct only if
>> CONFIG_NETFILTER_CONNTRACK or CONFIG_NETFILTER_CONNTRACK_MODULE are
>> defined, but that's more intrusive. Also, tc_index is not used if
>> CONFIG_NET_SCHED only is declared but none of CONFIG_NET_SCH_*. In my
>> case, I am using CONFIG_NET_SCHED only to be able to do: tc -s -d
>> qdisc
>
> Distributions enable all of the ifdefs, and that is thus the
> size and resultant performance most users see.

Well, I had this idea because I found another similar use in
include/linux/ip.h:

struct inet_sock {
	/* sk and pinet6 has to be the first two members of inet_sock */
	struct sock		sk;
#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
	struct ipv6_pinfo	*pinet6;
#endif

You are right, such conditions are a distribution nightmare :(

Eric

^ permalink raw reply	[flat|nested] 31+ messages in thread
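The space argument behind such an #ifdef is easy to demonstrate outside the
kernel; in this toy program the field names follow the quoted inet_sock, the
stand-in sizes are invented, and only the one-pointer delta is the point:

#include <stdio.h>

struct sock { char opaque[232]; };	/* stand-in; the real size varies */
struct ipv6_pinfo;			/* opaque here */

struct inet_sock_v4only {
	struct sock sk;			/* no pinet6 member at all */
};

struct inet_sock_dual {
	struct sock sk;
	struct ipv6_pinfo *pinet6;	/* present with CONFIG_IPV6[_MODULE] */
};

int main(void)
{
	printf("v4-only: %zu bytes, dual: %zu bytes\n",
	       sizeof(struct inet_sock_v4only),
	       sizeof(struct inet_sock_dual));
	return 0;
}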
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-08  8:19 ` Eric Dumazet
@ 2005-07-08 11:08 ` Arnaldo Carvalho de Melo
  2005-07-12  4:02 ` David S. Miller
  0 siblings, 1 reply; 31+ messages in thread
From: Arnaldo Carvalho de Melo @ 2005-07-08 11:08 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David S. Miller, tgraf, netdev

[-- Attachment #1: Type: text/plain, Size: 1345 bytes --]

On 7/8/05, Eric Dumazet <dada1@cosmosbay.com> wrote:
>
> David S. Miller a écrit :
> > From: Eric Dumazet <dada1@cosmosbay.com>
> > Date: Fri, 08 Jul 2005 08:53:02 +0200
> >
> >
> >> About making sk_buff smaller, I use this patch to declare 'struct
> >> sec_path *sp' only ifdef CONFIG_XFRM, what do you think ? I also
> >> use a patch to declare nfcache, nfctinfo and nfct only if
> >> CONFIG_NETFILTER_CONNTRACK or CONFIG_NETFILTER_CONNTRACK_MODULE are
> >> defined, but that's more intrusive. Also, tc_index is not used if
> >> CONFIG_NET_SCHED only is declared but none of CONFIG_NET_SCH_*. In my
> >> case, I am using CONFIG_NET_SCHED only to be able to do: tc -s -d
> >> qdisc
> >
> >
> > Distributions enable all of the ifdefs, and that is thus the
> > size and resultant performance most users see.
>
> Well, I had this idea because I found another similar use in
> include/linux/ip.h
>
> struct inet_sock {
>         /* sk and pinet6 has to be the first two members of inet_sock */
>         struct sock sk;
> #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
>         struct ipv6_pinfo *pinet6;
> #endif

/me pleads guilty. Dave, any problem with removing this #ifdef? Hmm, I'll
think about using the skb_alloc_extension() idea for struct sock, but this
pinet6 sucker is a bit more difficult, I guess...

- Arnaldo

[-- Attachment #2: Type: text/html, Size: 1903 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-08 11:08 ` Arnaldo Carvalho de Melo
@ 2005-07-12  4:02 ` David S. Miller
  0 siblings, 0 replies; 31+ messages in thread
From: David S. Miller @ 2005-07-12 4:02 UTC (permalink / raw)
To: arnaldo.melo; +Cc: dada1, tgraf, netdev
From: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
Date: Fri, 8 Jul 2005 08:08:54 -0300

> > struct inet_sock {
> >         /* sk and pinet6 has to be the first two members of inet_sock */
> >         struct sock sk;
> > #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
> >         struct ipv6_pinfo *pinet6;
> > #endif
>
> /me pleads guilty. Dave, any problem with removing this #ifdef? Hmm,
> I'll think about using

Just leave it for now.  If we come up with some spectacularly nice way
to deal with this in the future, we can change it then.

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH] loop unrolling in net/sched/sch_generic.c
  2005-07-05  7:38 ` [PATCH] loop unrolling in net/sched/sch_generic.c Eric Dumazet
  2005-07-05 11:51 ` Thomas Graf
@ 2005-07-05 21:26 ` David S. Miller
  1 sibling, 0 replies; 31+ messages in thread
From: David S. Miller @ 2005-07-05 21:26 UTC (permalink / raw)
To: dada1; +Cc: netdev

Eric, I've told you this before many times.  Please do something so
that your email client does not corrupt the patches.

Once again, your email client turned all the tab characters into
spaces and thus made the patch unusable.

Even though you used an attachment, the tab-->space transformation
still happened somehow.

Please fix this, and make a serious mental note to prevent this
somehow in the future, thanks a lot.

^ permalink raw reply	[flat|nested] 31+ messages in thread
end of thread, other threads:[~2005-07-12 4:02 UTC | newest]
Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <C925F8B43D79CC49ACD0601FB68FF50C045E0FB0@orsmsx408>
2005-07-07 22:30 ` [PATCH] loop unrolling in net/sched/sch_generic.c David S. Miller
2005-07-04 22:47 [TG3]: About hw coalescing infrastructure David S. Miller
2005-07-04 22:55 ` Eric Dumazet
2005-07-04 22:57 ` Eric Dumazet
2005-07-04 23:01 ` David S. Miller
2005-07-05 7:38 ` [PATCH] loop unrolling in net/sched/sch_generic.c Eric Dumazet
2005-07-05 11:51 ` Thomas Graf
2005-07-05 12:03 ` Thomas Graf
2005-07-05 13:04 ` Eric Dumazet
2005-07-05 13:48 ` Thomas Graf
2005-07-05 15:58 ` Eric Dumazet
2005-07-05 17:34 ` Thomas Graf
2005-07-05 21:22 ` David S. Miller
2005-07-05 21:33 ` Thomas Graf
2005-07-05 21:35 ` David S. Miller
2005-07-05 23:16 ` Eric Dumazet
2005-07-05 23:41 ` Thomas Graf
2005-07-05 23:45 ` David S. Miller
2005-07-05 23:55 ` Thomas Graf
2005-07-06 0:32 ` Eric Dumazet
2005-07-06 0:51 ` Thomas Graf
2005-07-06 1:04 ` Eric Dumazet
2005-07-06 1:07 ` Thomas Graf
2005-07-06 0:53 ` Eric Dumazet
2005-07-06 1:02 ` Thomas Graf
2005-07-06 1:09 ` Eric Dumazet
2005-07-06 12:42 ` Thomas Graf
2005-07-07 21:17 ` David S. Miller
2005-07-07 21:34 ` Thomas Graf
2005-07-07 22:24 ` David S. Miller
[not found] ` <42CE22CE.7030902@cosmosbay.com>
2005-07-08 7:30 ` David S. Miller
2005-07-08 8:19 ` Eric Dumazet
2005-07-08 11:08 ` Arnaldo Carvalho de Melo
2005-07-12 4:02 ` David S. Miller
2005-07-05 21:26 ` David S. Miller