* Re: [PATCH 001/001] ipv4: enable use of 240/4 address space
From: Andi Kleen @ 2008-01-11 11:17 UTC (permalink / raw)
To: Vince Fuller; +Cc: netdev, linux-kernel
In-Reply-To: <20080108011057.GA21168@cisco.com>
Vince Fuller <vaf@cisco.com> writes:
> from Vince Fuller <vaf@vaf.net>
>
> This set of diffs modify the 2.6.20 kernel to enable use of the 240/4
> (aka "class-E") address space as consistent with the Internet Draft
> draft-fuller-240space-00.txt.
Wouldn't it be wise to at least wait for it to become an RFC first?
-Andi
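
For readers less familiar with the notation, here is a minimal sketch (not part of the patch under discussion) of what the 240/4 block covers. It uses only Python's standard ipaddress module; the helper name in_240_slash_4 is made up for illustration.

```python
import ipaddress

# 240.0.0.0/4 is the block historically called "class E".
CLASS_E = ipaddress.ip_network("240.0.0.0/4")


def in_240_slash_4(addr: str) -> bool:
    """Return True if addr falls inside 240.0.0.0/4."""
    return ipaddress.ip_address(addr) in CLASS_E


if __name__ == "__main__":
    for a in ("239.255.255.255", "240.0.0.1", "255.255.255.254"):
        print(a, in_240_slash_4(a))
    # A /4 leaves 28 host bits, i.e. 2**28 addresses.
    print(CLASS_E.num_addresses)
```

The block spans 240.0.0.0 through 255.255.255.255, so the limited broadcast address sits numerically inside it.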
* Re: e1000 performance issue in 4 simultaneous links
From: Benny Amorsen @ 2008-01-11 11:09 UTC (permalink / raw)
To: netdev
In-Reply-To: <20080110.172830.16409182.davem@davemloft.net>
David Miller <davem@davemloft.net> writes:
> No IRQ balancing should be done at all for networking device
> interrupts, with zero exceptions. It destroys performance.
Does irqbalanced need to be taught about this? And how about the
initial balancing, so that each network card gets assigned to one CPU?
/Benny
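
A rough sketch of the "one CPU per network card" idea, assuming the usual /proc/irq/<n>/smp_affinity interface, which takes a hexadecimal CPU bitmask. The IRQ numbers are hypothetical, the sketch only computes the masks (it writes nothing), and it assumes fewer than 32 CPUs so a single hex word is enough.

```python
def affinity_mask(cpu: int) -> str:
    """Hex bitmask selecting exactly one CPU, as smp_affinity expects."""
    return format(1 << cpu, "x")


def assign_one_cpu_per_nic(irqs, ncpus):
    """Map each NIC IRQ to a single CPU, wrapping if IRQs outnumber CPUs."""
    return {irq: affinity_mask(i % ncpus) for i, irq in enumerate(irqs)}


if __name__ == "__main__":
    # Hypothetical IRQ numbers for four e1000 ports on a 4-CPU box.
    for irq, mask in assign_one_cpu_per_nic([16, 17, 18, 19], 4).items():
        print(f"echo {mask} > /proc/irq/{irq}/smp_affinity")
```

Whether irqbalance should produce such a static layout itself is exactly the open question above; the sketch only shows what the initial assignment would look like.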
* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
From: Herbert Xu @ 2008-01-11 10:37 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Eric Dumazet, Paul E. McKenney, davem, dipankar, netdev
In-Reply-To: <20080111083010.GA2183@ff.dom.local>
On Fri, Jan 11, 2008 at 09:30:10AM +0100, Jarek Poplawski wrote:
>
> It looks like I'm really too lazy and/or these selfdocumenting features
> of RCU are a bit overrated: one can never be sure which pointer is
> really RCU protected without checking a few places?! So, after looking
> at this rt_cache_get_next() and this patch only, it looks like the
> third candidate after seq->private and rtable...
Perhaps we could introduce a sparse attribute for it?
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
From: Herbert Xu @ 2008-01-11 10:38 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Eric Dumazet, Paul E. McKenney, davem, dipankar, netdev
In-Reply-To: <20080111091140.GB2183@ff.dom.local>
On Fri, Jan 11, 2008 at 10:11:40AM +0100, Jarek Poplawski wrote:
>
> So, IOW: strictly speaking you are right, r can't change here, but I
> meant r vs. the returned value! Before the patch the returned value
> couldn't be NULL unless all elements of the list were looped. After
> this patch it seems possible...
Since rcu_dereference(r) is always the same as r this patch cannot
change the value returned.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Improving performance of bonding driver (eql) using round robin algorithm
From: Jeba Anandhan @ 2008-01-11 10:04 UTC (permalink / raw)
To: netdev
Hi All,
The existing algorithm in the eql bonding driver selects a slave based on
the priority of each slave. The priority is assigned from the speed of the
particular line. The current problem is that not all slaves get a chance
to be the best slave for transmission.
Would a round robin algorithm for selecting the best slave to transmit the
data improve the performance?
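
To make the question concrete, here is a toy model of the two selection policies. It is not the eql driver's code: the slave names and speeds are invented, and "priority" is simply taken to be the line speed as described above.

```python
from itertools import cycle

# Hypothetical slaves: (name, line speed in bit/s, used as priority).
SLAVES = [("sl0", 28800), ("sl1", 14400), ("sl2", 9600)]


def best_by_priority(slaves):
    """Pick the slave with the highest priority (here: line speed)."""
    return max(slaves, key=lambda s: s[1])[0]


def round_robin(slaves):
    """Return an endless iterator that hands out slaves in turn."""
    return cycle(name for name, _ in slaves)


if __name__ == "__main__":
    print([best_by_priority(SLAVES) for _ in range(4)])
    rr = round_robin(SLAVES)
    print([next(rr) for _ in range(4)])
```

With static priorities the first policy keeps returning the fastest line, while round robin gives every slave a turn regardless of speed. Whether that improves throughput depends on how unequal the lines are, which this sketch does not measure.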
Thanks
Jeba
* Re: virtio_net and SMP guests
From: Rusty Russell @ 2008-01-11 9:53 UTC (permalink / raw)
To: virtualization
Cc: Christian Borntraeger, kvm-devel, netdev, Anthony Liguori,
Dor Laor
In-Reply-To: <200801101651.58908.borntraeger@de.ibm.com>
On Friday 11 January 2008 02:51:58 Christian Borntraeger wrote:
> What about the following patch:
Looks correct and in fact pretty orthodox.
I've folded this in, thanks!
Rusty.
* Re: [PATCH][ROSE][AX25] af_ax25: possible circular locking
From: Jarek Poplawski @ 2008-01-11 9:40 UTC (permalink / raw)
To: David Miller; +Cc: f6bvp, ralf, adobriyan, netdev
In-Reply-To: <20080110.212242.42433023.davem@davemloft.net>
On Thu, Jan 10, 2008 at 09:22:42PM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Sun, 30 Dec 2007 15:13:23 +0100
>
> > On Sat, Dec 29, 2007 at 07:14:43PM -0800, David Miller wrote:
...
> I've removed the warning and made the branch back to 'again'
> unconditional as I think this is the safest version of the
> change.
>
> I'll push this upstream, thanks for fixing this Jarek.
>
Thanks for checking this and making it safer!
Regards,
Jarek P.
* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
From: Zhang, Yanmin @ 2008-01-11 9:30 UTC (permalink / raw)
To: LKML; +Cc: netdev
In-Reply-To: <1199871330.3298.132.camel@ymzhang>
On Wed, 2008-01-09 at 17:35 +0800, Zhang, Yanmin wrote:
> The regression is:
> 1)stoakley with 2 quad-core processors: 11%;
> 2)Tulsa with 4 dual-core(+hyperThread) processors:13%;
I have a new update on this issue and am also cc'ing the netdev mailing list.
Thanks to David Miller for pointing me to the netdev mailing list.
>
> The test command is:
> #sudo taskset -c 7 ./netserver
> #sudo taskset -c 0 ./netperf -t TCP_RR -l 60 -H 127.0.0.1 -i 50,3 -I 99,5 -- -r 1,1
>
> As a matter of fact, 2.6.23 has about 6% regression and 2.6.24-rc's
> regression is between 16%~11%.
>
> I tried to use bisect to locate the bad patch between 2.6.22 and 2.6.23-rc1,
> but the bisected kernel wasn't stable and went crazy.
>
> I tried both CONFIG_SLUB=y and CONFIG_SLAB=y to make sure SLUB isn't the
> culprit.
>
> The oprofile data of CONFIG_SLAB=y. Top cpu utilizations are:
> 1) 2.6.22
> 2067379 9.4888 vmlinux schedule
> 1873604 8.5994 vmlinux mwait_idle
> 1568131 7.1974 vmlinux resched_task
> 1066976 4.8972 vmlinux tcp_v4_rcv
> 986641 4.5285 vmlinux tcp_rcv_established
> 979518 4.4958 vmlinux find_busiest_group
> 767069 3.5207 vmlinux sock_def_readable
> 736808 3.3818 vmlinux tcp_sendmsg
> 595889 2.7350 vmlinux task_rq_lock
> 557193 2.5574 vmlinux tcp_ack
> 470570 2.1598 vmlinux __mod_timer
> 392220 1.8002 vmlinux __alloc_skb
> 358106 1.6436 vmlinux skb_release_data
> 313372 1.4383 vmlinux skb_clone
>
> 2) 2.6.24-rc7
> 2668426 12.4497 vmlinux vmlinux schedule
> 955698 4.4589 vmlinux vmlinux skb_release_data
> 836311 3.9018 vmlinux vmlinux tcp_v4_rcv
> 762398 3.5570 vmlinux vmlinux skb_release_all
> 728907 3.4007 vmlinux vmlinux task_rq_lock
> 705037 3.2894 vmlinux vmlinux __wake_up
> 694206 3.2388 vmlinux vmlinux __mod_timer
> 617616 2.8815 vmlinux vmlinux mwait_idle
>
> It looks like tcp in 2.6.22 sends more packets, but frees far less skb than 2.6.24-rc6.
> tcp_rcv_established in 2.6.22 is highlighted on cpu utilization.
I instrumented the kernel to capture the function call counts.
1) 2.6.22
skb_release_data:50148649
tcp_ack: 25062858
tcp_transmit_skb:25063150
tcp_v4_rcv: 25063279
2) 2.6.24-rc6
skb_release_data:21429692
tcp_ack: 10707710
tcp_transmit_skb:10707866
tcp_v4_rcv: 10707959
The data doesn't show that 2.6.22 sends more packets while freeing far fewer skbs than
2.6.24-rc6.
The data shows that the skb_release_data count of kernel 2.6.22 is more than double that of
2.6.24-rc6, but the netperf result showed only about a 10% regression.
As each packet has only 1 byte, I suspect 2.6.24-rc6 tries to merge packets after waiting for
some latency. 2.6.22 might not have that wait latency, or the latency is very small, so 2.6.22
sends the packets almost immediately. I will check the source code later.
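
A small sketch of the arithmetic behind the two observations above, using only the call counts listed in this mail. The helper names are made up for illustration.

```python
# Function call counts copied from the instrumented runs above.
COUNTS = {
    "2.6.22": {
        "skb_release_data": 50148649,
        "tcp_ack": 25062858,
        "tcp_transmit_skb": 25063150,
        "tcp_v4_rcv": 25063279,
    },
    "2.6.24-rc6": {
        "skb_release_data": 21429692,
        "tcp_ack": 10707710,
        "tcp_transmit_skb": 10707866,
        "tcp_v4_rcv": 10707959,
    },
}


def releases_per_rcv(kernel: str) -> float:
    """skb_release_data calls per tcp_v4_rcv call within one kernel."""
    c = COUNTS[kernel]
    return c["skb_release_data"] / c["tcp_v4_rcv"]


def cross_kernel_ratio(func: str) -> float:
    """How many times more often func ran on 2.6.22 than on 2.6.24-rc6."""
    return COUNTS["2.6.22"][func] / COUNTS["2.6.24-rc6"][func]


if __name__ == "__main__":
    for kernel in COUNTS:
        print(kernel, round(releases_per_rcv(kernel), 3))
    for func in COUNTS["2.6.22"]:
        print(func, round(cross_kernel_ratio(func), 3))
```

Within each kernel there are roughly two skb_release_data calls per tcp_v4_rcv call, so neither kernel frees disproportionately more skbs per packet. Across kernels, all four counters differ by roughly the same factor (about 2.34), which is much larger than the 10% difference netperf reports.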
-yanmin
* Re: [klibc] [patch] import socket defines
From: Mike Frysinger @ 2008-01-11 9:28 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: David Miller, netdev, klibc
In-Reply-To: <200801110403.00003.vapier@gentoo.org>
On Friday 11 January 2008, Mike Frysinger wrote:
> On Friday 11 January 2008, H. Peter Anvin wrote:
> > Again, I don't particularly care about what they're named, but the whole
> > point is
> >
> > #include <linux/foo.h>
> >
> > if you want the subset and
> >
> > #include <linux/bar.h>
> >
> > if you want the whole set.
>
> i looked more at glibc/uClibc and my primary/original concern (and what i
> thought David was raising and you confirming) was that building of
> glibc was broken and glibc headers would need updates. that does not seem
> to be the case. the breakage here is for packages that include both
> sys/socket.h (directly/indirectly) and linux/socket.h
> (directly/indirectly).
>
> due to the way the network headers depend on each other, this case is
> trivial to induce. but i dont think linux/socket.h is any more special
> than the current retarded conflicts we have between the network headers
> from the libc (which are required by POSIX and beyond) and the kernel
> headers.
>
> > No libc specifics, and no feature test macros, which I think we can both
> > agree are uglier than hell.
>
> i think in general, all of the network related headers under linux/ are
> fubared for userspace.
>
> > I thought the naming worked out nicer with <linux/sockaddr.h>
>
> placing the sockaddr definitions into linux/sockaddr.h makes sense.
so there's no confusion, since the building of the libc itself and using pure
libc headers are generally unaffected, and all of the network linux headers
are already screwed for userspace usage, i'm not against the proposed change
from Peter. it doesnt really make the situation any better/worse.
-mike
* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
From: Jarek Poplawski @ 2008-01-11 9:23 UTC (permalink / raw)
To: Herbert Xu; +Cc: Eric Dumazet, Paul E. McKenney, davem, dipankar, netdev
In-Reply-To: <20080111091140.GB2183@ff.dom.local>
On Fri, Jan 11, 2008 at 10:11:40AM +0100, Jarek Poplawski wrote:
...
> So, IOW: strictly speaking you are right, r can't change here, but I
> meant r vs. the returned value! Before the patch the returned value
> couldn't be NULL unless all elements of the list were looped. After
...even more strictly:
couldn't be NULL unless all buckets of the hash table were looped. After
> this patch it seems possible...
Jarek P.
* Re: RFC: igb: Intel 82575 gigabit ethernet driver (take #3)
From: Christoph Hellwig @ 2008-01-11 9:17 UTC (permalink / raw)
To: Kok, Auke
Cc: NetDev, Jeff Garzik, Arjan van de Ven, Jesse Brandeburg,
Ronciak, John, Andrew Morton
In-Reply-To: <4786AB0C.6010202@intel.com>
On Thu, Jan 10, 2008 at 03:32:28PM -0800, Kok, Auke wrote:
> - cleaned up largely against sparse, checkpatch
largely means not completely, right? Please make sure there are no sparse
warnings left at least. checkpatch is not that critical, but it would
be good to have an explanation for everything left.
some comments on the patch
- please remove that silly copyright header on the Makefile, it's hard
to claim any rights on a trivial 3 line makefile.
- also please use igb-y instead of igb-objs in the Makefile
- the driver would be a lot more readable (and more importantly
hackable) if it was written in a natural flow instead of having dozens
of lines of forward declarations in every file.
- so you're adding your own phy abstraction. Is there a good reason
you can't simply use the generic phylib?
* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
From: Jarek Poplawski @ 2008-01-11 9:11 UTC (permalink / raw)
To: Herbert Xu; +Cc: Eric Dumazet, Paul E. McKenney, davem, dipankar, netdev
In-Reply-To: <20080111083010.GA2183@ff.dom.local>
On Fri, Jan 11, 2008 at 09:30:10AM +0100, Jarek Poplawski wrote:
> On Fri, Jan 11, 2008 at 11:00:20AM +1100, Herbert Xu wrote:
> > On Fri, Jan 11, 2008 at 12:10:42AM +0100, Jarek Poplawski wrote:
> > >
> > > It seems this optimization could've a side effect: if during such a
> > > loop updates are done, and r is seen !NULL during while() check, but
> > > NULL after rcu_dereference(), the listing/counting could stop too
> > > soon. So, IMHO, probably the first version of this patch is more
> > > reliable. (Or alternatively additional check is needed before return.)
> >
> > No, while the value of r->u.dst.rt_next can change between two readings,
> > the value of r cannot.
>
> ...Then, of course, it's O.K.!
>
> It looks like I'm really too lazy and/or these selfdocumenting features
> of RCU are a bit overrated: one can never be sure which pointer is
> really RCU protected without checking a few places?! So, after looking
> at this rt_cache_get_next() and this patch only, it looks like the
> third candidate after seq->private and rtable...
OOPS! ...it seems we are talking about the same, properly documented
(second) pointer yet...
So, IOW: strictly speaking you are right, r can't change here, but I
meant r vs. the returned value! Before the patch the returned value
couldn't be NULL unless all elements of the list were looped. After
this patch it seems possible...
Jarek P.
* Re: [klibc] [patch] import socket defines
From: Mike Frysinger @ 2008-01-11 9:02 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: David Miller, netdev, klibc
In-Reply-To: <4787221F.3030201@zytor.com>
On Friday 11 January 2008, H. Peter Anvin wrote:
> Mike Frysinger wrote:
> >> all this stuff is ABI constants, and the only reason glibc
> >> doesn't use them is that glibc prefers to use enums over #defines.
> >
> > a proper libc defines things in their headers according to the POSIX
> > specs rather than relying on others to do it for them. if you want to
> > argue about linux-specific ABI pieces being exported, then you probably
> > have a valid point, but socket.h is hardly that.
>
> Have you looked at it?!!? It's full of ABI constants, and that's what I
> care about. POSIX doesn't define, say, AF_UNIX; that's an ABI specific.
i guess it depends on how you define "define" :P. no, POSIX does not state
the specific numerical value (ABI) for the define (API), but POSIX does
require sys/socket.h provide the macro AF_UNIX.
> > so if the only consumer is klibc and you're against adding these things
> > to it, special case it for __KLIBC__.
>
> No, let's split the header so that there is *no* libc knowledge in the
> kernel. For the kernel to have knowledge about the specifics of any
> particular libc (klibc, glibc, or any other) is stupid, and that's the
> whole reason we're in this spot to begin with.
we're in this spot at the moment to appease klibc only. is there any other
libc out there that is not providing its own complete sys/socket.h but
instead relying on linux/socket.h ? glibc/uClibc rely on linux/socket.h only
for the kernel's definition of sockaddr.
> Again, I don't particularly care about what they're named, but the whole
> point is
>
> #include <linux/foo.h>
>
> if you want the subset and
>
> #include <linux/bar.h>
>
> if you want the whole set.
i looked more at glibc/uClibc and my primary/original concern (and what i
thought David was raising and you confirming) was that building of glibc
was broken and glibc headers would need updates. that does not seem to be
the case. the breakage here is for packages that include both sys/socket.h
(directly/indirectly) and linux/socket.h (directly/indirectly).
due to the way the network headers depend on each other, this case is trivial
to induce. but i dont think linux/socket.h is any more special than the
current retarded conflicts we have between the network headers from the libc
(which are required by POSIX and beyond) and the kernel headers.
> No libc specifics, and no feature test macros, which I think we can both
> agree are uglier than hell.
i think in general, all of the network related headers under linux/ are
fubared for userspace.
> I thought the naming worked out nicer with <linux/sockaddr.h>
placing the sockaddr definitions into linux/sockaddr.h makes sense.
-mike
* Re: HTB classify performance
From: Badalian Vyacheslav @ 2008-01-11 8:53 UTC (permalink / raw)
To: netdev
In-Reply-To: <47872845.8000702@bigtelecom.ru>
New info. I waited some time and reset the oprofile statistics (I think the
ipt_unregister_table entries showed up because some script was running...).
Here is the clean profile after adding the FILTER:
First PC
CPU: P4 / Xeon, speed 3409.96 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not
stopped) with a unit mask of 0x01 (mandatory) count 100000
samples % app name symbol name
1158171 19.1292 vmlinux ipt_do_table
722416 11.9319 vmlinux e1000_intr
627406 10.3627 vmlinux u32_classify
565286 9.3367 vmlinux e1000_irq_enable
269309 4.4481 vmlinux htb_dequeue
191016 3.1550 vmlinux ip_route_input
187127 3.0907 vmlinux sfq_dequeue
172775 2.8537 vmlinux e1000_clean_tx_irq
154654 2.5544 vmlinux e1000_clean_rx_irq
146926 2.4267 vmlinux sfq_enqueue
116782 1.9289 vmlinux htb_add_to_wait_tree
79398 1.3114 vmlinux rb_erase
74411 1.2290 vmlinux e1000_xmit_frame
65451 1.0810 vmlinux kfree
59966 0.9904 vmlinux irq_entries_start
59893 0.9892 vmlinux eth_type_trans
55510 0.9168 vmlinux dev_queue_xmit
52688 0.8702 vmlinux e1000_alloc_rx_buffers
> Hello all.
> For N days I have been trying to tune the system for best performance,
> and I see a strange thing.
>
> I have N htb classes.
> The root class is HTB. param: default 7 (if not classified - go to 1:7)
>
> The filters classify only matched IPs. Others go to the HTB DEFAULT rule.
>
> Running oprofile:
> First PC (htb and iptables compiled into the kernel):
> CPU: P4 / Xeon, speed 3409.94 MHz (estimated)
> Counted GLOBAL_POWER_EVENTS events (time during which processor is not
> stopped) with a unit mask of 0x01 (mandatory) count 100000
> samples % app name symbol name
> 743501 47.6081 vmlinux htb_classify
> 208718 13.3647 vmlinux ipt_do_table
> 94473 6.0493 vmlinux u32_classify
> 43088 2.7590 vmlinux e1000_intr
> 35086 2.2466 vmlinux e1000_clean_tx_irq
> 34925 2.2363 vmlinux ip_route_input
> 33972 2.1753 vmlinux e1000_irq_enable
> 33788 2.1635 vmlinux htb_dequeue
> 29197 1.8696 vmlinux e1000_clean_rx_irq
> 20177 1.2920 vmlinux sfq_dequeue
> 17825 1.1414 vmlinux sfq_enqueue
> 15135 0.9691 vmlinux e1000_xmit_frame
> 15123 0.9684 vmlinux eth_type_trans
> 13081 0.8376 vmlinux kfree
> 12153 0.7782 vmlinux dev_queue_xmit
>
> Second PC (htb and iptables are modules)
> CPU: P4 / Xeon with 2 hyper-threads, speed 3192.35 MHz (estimated)
> Counted GLOBAL_POWER_EVENTS events (time during which processor is not
> stopped) with a unit mask of 0x01 (mandatory) count 100000
> samples % app name symbol name
> 102108 30.7351 sch_htb (no symbols)
> 21559 6.4894 vmlinux e1000_intr
> 17428 5.2459 cls_u32 (no symbols)
> 13887 4.1801 ip_tables (no symbols)
> 11984 3.6072 sch_sfq (no symbols)
> 11785 3.5473 vmlinux e1000_irq_enable
> 9684 2.9149 vmlinux mwait_idle_with_hints
> 9227 2.7774 vmlinux e1000_clean_rx_irq
> 8686 2.6145 vmlinux e1000_clean_tx_irq
> 6747 2.0309 vmlinux ip_route_input
> 6533 1.9665 vmlinux irq_entries_start
> 6419 1.9322 vmlinux e1000_xmit_frame
> 5605 1.6871 vmlinux dev_queue_xmit
> 4030 1.2131 vmlinux __kfree_skb
> 3997 1.2031 vmlinux __qdisc_run
> 3931 1.1833 vmlinux e1000_clean
> 3565 1.0731 vmlinux net_rx_action
> 3518 1.0589 vmlinux ip_rcv
> 3377 1.0165 vmlinux getnstimeofday
> 3215 0.9677 vmlinux rb_erase
> 2973 0.8949 vmlinux eth_type_trans
> 2707 0.8148 vmlinux ip_output
> 2586 0.7784 vmlinux handle_fasteoi_irq
>
> Hmm.. strange... looking at the htb_classify code, I see only one place
> where it could use that much CPU.
>
> OK... I tried adding this to the end of the tc batch file:
> filter add dev eth1 protocol ip parent 1:0 prio 5 u32 ht 800:: match ip protocol 1 0x00 flowid 1:7
> filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 800:: match ip protocol 1 0x00 flowid 1:7
> (Off-topic... strange... I did not find a way to add a filter without
> any match.)
>
> Wow!
> CPU: P4 / Xeon, speed 3409.94 MHz (estimated)
> Counted GLOBAL_POWER_EVENTS events (time during which processor is not
> stopped) with a unit mask of 0x01 (mandatory) count 100000
> samples % app name symbol name
> 153128 20.9497 vmlinux ipt_unregister_table
> 121569 16.6321 vmlinux e1000_request_irq
> 60727 8.3082 vmlinux e1000_update_itr
> 47241 6.4631 vmlinux u32_delete
> 25836 3.5347 vmlinux htb_dequeue
> 18304 2.5042 vmlinux ipt_do_table
> 15980 2.1862 vmlinux mwait_idle_with_hints
> 15977 2.1858 vmlinux irq_entries_start
> 13337 1.8247 vmlinux htb_classify
> 12512 1.7118 vmlinux __ip_route_output_key
> 8821 1.2068 vmlinux sfq_init
> 8495 1.1622 vmlinux e1000_clean_rx_irq
> 8408 1.1503 vmlinux htb_enqueue
> 8018 1.0970 vmlinux e1000_xmit_frame
> 7867 1.0763 vmlinux e1000_clean_tx_ring
> 6336 0.8668 vmlinux htb_delete
> 5828 0.7973 vmlinux ___pskb_trim
> 5781 0.7909 vmlinux s_start
> 5234 0.7161 vmlinux e1000_clean_rx_irq_ps
> 4504 0.6162 vmlinux cache_alloc_refill
> 4133 0.5654 vmlinux radix_tree_delete
>
> Second PC
> CPU: P4 / Xeon with 2 hyper-threads, speed 3192.35 MHz (estimated)
> Counted GLOBAL_POWER_EVENTS events (time during which processor is not
> stopped) with a unit mask of 0x01 (mandatory) count 100000
> samples % app name symbol name
> 37747 13.3616 sch_htb (no symbols)
> 23606 8.3560 vmlinux e1000_intr
> 18158 6.4275 cls_u32 (no symbols)
> 14726 5.2127 ip_tables (no symbols)
> 13137 4.6502 vmlinux e1000_irq_enable
> 12307 4.3564 sch_sfq (no symbols)
> 9974 3.5306 vmlinux mwait_idle_with_hints
> 9855 3.4884 vmlinux e1000_clean_rx_irq
> 9077 3.2131 vmlinux e1000_clean_tx_irq
> 7293 2.5816 vmlinux irq_entries_start
> 6956 2.4623 vmlinux ip_route_input
> 6652 2.3547 vmlinux e1000_xmit_frame
> 6202 2.1954 vmlinux dev_queue_xmit
> 4403 1.5586 vmlinux __kfree_skb
> 4230 1.4973 vmlinux net_rx_action
> 4224 1.4952 vmlinux e1000_clean
> 4042 1.4308 vmlinux __qdisc_run
> 3513 1.2435 vmlinux ip_rcv
> 3509 1.2421 vmlinux getnstimeofday
> 3377 1.1954 vmlinux rb_erase
> 3191 1.1295 vmlinux eth_type_trans
> 2953 1.0453 vmlinux handle_fasteoi_irq
> 2830 1.0018 vmlinux ip_output
>
>
> Looks great!
> I hope I found an interesting place for optimization.
>
> Thanks.
> Slavon.
* HTB classify performance
From: Badalian Vyacheslav @ 2008-01-11 8:26 UTC (permalink / raw)
To: netdev
Hello all.
For N days I have been trying to tune the system for best performance, and I
see a strange thing.
I have N htb classes.
The root class is HTB. param: default 7 (if not classified - go to 1:7)
The filters classify only matched IPs. Others go to the HTB DEFAULT rule.
Running oprofile:
First PC (htb and iptables compiled into the kernel):
CPU: P4 / Xeon, speed 3409.94 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not
stopped) with a unit mask of 0x01 (mandatory) count 100000
samples % app name symbol name
743501 47.6081 vmlinux htb_classify
208718 13.3647 vmlinux ipt_do_table
94473 6.0493 vmlinux u32_classify
43088 2.7590 vmlinux e1000_intr
35086 2.2466 vmlinux e1000_clean_tx_irq
34925 2.2363 vmlinux ip_route_input
33972 2.1753 vmlinux e1000_irq_enable
33788 2.1635 vmlinux htb_dequeue
29197 1.8696 vmlinux e1000_clean_rx_irq
20177 1.2920 vmlinux sfq_dequeue
17825 1.1414 vmlinux sfq_enqueue
15135 0.9691 vmlinux e1000_xmit_frame
15123 0.9684 vmlinux eth_type_trans
13081 0.8376 vmlinux kfree
12153 0.7782 vmlinux dev_queue_xmit
Second PC (htb and iptables are modules)
CPU: P4 / Xeon with 2 hyper-threads, speed 3192.35 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not
stopped) with a unit mask of 0x01 (mandatory) count 100000
samples % app name symbol name
102108 30.7351 sch_htb (no symbols)
21559 6.4894 vmlinux e1000_intr
17428 5.2459 cls_u32 (no symbols)
13887 4.1801 ip_tables (no symbols)
11984 3.6072 sch_sfq (no symbols)
11785 3.5473 vmlinux e1000_irq_enable
9684 2.9149 vmlinux mwait_idle_with_hints
9227 2.7774 vmlinux e1000_clean_rx_irq
8686 2.6145 vmlinux e1000_clean_tx_irq
6747 2.0309 vmlinux ip_route_input
6533 1.9665 vmlinux irq_entries_start
6419 1.9322 vmlinux e1000_xmit_frame
5605 1.6871 vmlinux dev_queue_xmit
4030 1.2131 vmlinux __kfree_skb
3997 1.2031 vmlinux __qdisc_run
3931 1.1833 vmlinux e1000_clean
3565 1.0731 vmlinux net_rx_action
3518 1.0589 vmlinux ip_rcv
3377 1.0165 vmlinux getnstimeofday
3215 0.9677 vmlinux rb_erase
2973 0.8949 vmlinux eth_type_trans
2707 0.8148 vmlinux ip_output
2586 0.7784 vmlinux handle_fasteoi_irq
Hmm.. strange... looking at the htb_classify code, I see only one place where
it could use that much CPU.
OK... I tried adding this to the end of the tc batch file:
filter add dev eth1 protocol ip parent 1:0 prio 5 u32 ht 800:: match ip protocol 1 0x00 flowid 1:7
filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 800:: match ip protocol 1 0x00 flowid 1:7
(Off-topic... strange... I did not find a way to add a filter without any match.)
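
Why the filter above acts as a catch-all can be sketched as follows, assuming u32 compares only the bits selected by the mask, i.e. (field & mask) against (value & mask). The function name u32_match is hypothetical and models a single 8-bit key, not the real classifier.

```python
def u32_match(field: int, value: int, mask: int) -> bool:
    """Masked compare: only bits set in mask take part in the comparison."""
    return (field & mask) == (value & mask)


if __name__ == "__main__":
    # "match ip protocol 1 0x00": value 1, mask 0x00.
    for proto in (1, 6, 17):  # ICMP, TCP, UDP
        print(proto, u32_match(proto, 1, 0x00), u32_match(proto, 1, 0xFF))
```

With a mask of 0x00 no bit is compared, so every packet matches and is sent to flowid 1:7 by the filter itself; with 0xff only protocol 1 would match.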
Wow!
CPU: P4 / Xeon, speed 3409.94 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not
stopped) with a unit mask of 0x01 (mandatory) count 100000
samples % app name symbol name
153128 20.9497 vmlinux ipt_unregister_table
121569 16.6321 vmlinux e1000_request_irq
60727 8.3082 vmlinux e1000_update_itr
47241 6.4631 vmlinux u32_delete
25836 3.5347 vmlinux htb_dequeue
18304 2.5042 vmlinux ipt_do_table
15980 2.1862 vmlinux mwait_idle_with_hints
15977 2.1858 vmlinux irq_entries_start
13337 1.8247 vmlinux htb_classify
12512 1.7118 vmlinux __ip_route_output_key
8821 1.2068 vmlinux sfq_init
8495 1.1622 vmlinux e1000_clean_rx_irq
8408 1.1503 vmlinux htb_enqueue
8018 1.0970 vmlinux e1000_xmit_frame
7867 1.0763 vmlinux e1000_clean_tx_ring
6336 0.8668 vmlinux htb_delete
5828 0.7973 vmlinux ___pskb_trim
5781 0.7909 vmlinux s_start
5234 0.7161 vmlinux e1000_clean_rx_irq_ps
4504 0.6162 vmlinux cache_alloc_refill
4133 0.5654 vmlinux radix_tree_delete
Second PC
CPU: P4 / Xeon with 2 hyper-threads, speed 3192.35 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not
stopped) with a unit mask of 0x01 (mandatory) count 100000
samples % app name symbol name
37747 13.3616 sch_htb (no symbols)
23606 8.3560 vmlinux e1000_intr
18158 6.4275 cls_u32 (no symbols)
14726 5.2127 ip_tables (no symbols)
13137 4.6502 vmlinux e1000_irq_enable
12307 4.3564 sch_sfq (no symbols)
9974 3.5306 vmlinux mwait_idle_with_hints
9855 3.4884 vmlinux e1000_clean_rx_irq
9077 3.2131 vmlinux e1000_clean_tx_irq
7293 2.5816 vmlinux irq_entries_start
6956 2.4623 vmlinux ip_route_input
6652 2.3547 vmlinux e1000_xmit_frame
6202 2.1954 vmlinux dev_queue_xmit
4403 1.5586 vmlinux __kfree_skb
4230 1.4973 vmlinux net_rx_action
4224 1.4952 vmlinux e1000_clean
4042 1.4308 vmlinux __qdisc_run
3513 1.2435 vmlinux ip_rcv
3509 1.2421 vmlinux getnstimeofday
3377 1.1954 vmlinux rb_erase
3191 1.1295 vmlinux eth_type_trans
2953 1.0453 vmlinux handle_fasteoi_irq
2830 1.0018 vmlinux ip_output
Looks great!
I hope I found an interesting place for optimization.
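
For comparing the profiles, a minimal sketch that parses oprofile rows of the form "samples % app symbol" and looks up one symbol. The rows are copied from the First PC listings above (before and after adding the filters); parse_oprofile is an illustrative helper, not an oprofile tool.

```python
def parse_oprofile(text: str) -> dict:
    """Map symbol name -> percentage from 'samples % app symbol' rows."""
    result = {}
    for line in text.strip().splitlines():
        # The symbol column may contain spaces, e.g. "(no symbols)".
        _samples, pct, _app, symbol = line.split(None, 3)
        result[symbol] = float(pct)
    return result


BEFORE = parse_oprofile("""
743501 47.6081 vmlinux htb_classify
208718 13.3647 vmlinux ipt_do_table
94473 6.0493 vmlinux u32_classify
""")

AFTER = parse_oprofile("""
18304 2.5042 vmlinux ipt_do_table
13337 1.8247 vmlinux htb_classify
""")

if __name__ == "__main__":
    sym = "htb_classify"
    print(sym, BEFORE[sym], "->", AFTER[sym])
```

Percentages from two separate runs are relative to each run's own total, so the comparison only shows that htb_classify stopped dominating the profile.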
Thanks.
Slavon.
* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
From: Jarek Poplawski @ 2008-01-11 8:30 UTC (permalink / raw)
To: Herbert Xu; +Cc: Eric Dumazet, Paul E. McKenney, davem, dipankar, netdev
In-Reply-To: <20080111000020.GB22040@gondor.apana.org.au>
On Fri, Jan 11, 2008 at 11:00:20AM +1100, Herbert Xu wrote:
> On Fri, Jan 11, 2008 at 12:10:42AM +0100, Jarek Poplawski wrote:
> >
> > It seems this optimization could've a side effect: if during such a
> > loop updates are done, and r is seen !NULL during while() check, but
> > NULL after rcu_dereference(), the listing/counting could stop too
> > soon. So, IMHO, probably the first version of this patch is more
> > reliable. (Or alternatively additional check is needed before return.)
>
> No, while the value of r->u.dst.rt_next can change between two readings,
> the value of r cannot.
...Then, of course, it's O.K.!
It looks like I'm really too lazy and/or these selfdocumenting features
of RCU are a bit overrated: one can never be sure which pointer is
really RCU protected without checking a few places?! So, after looking
at this rt_cache_get_next() and this patch only, it looks like the
third candidate after seq->private and rtable...
Thanks for explanation and sorry for disturbing!
Jarek P.
* Re: [klibc] [patch] import socket defines
From: H. Peter Anvin @ 2008-01-11 8:00 UTC (permalink / raw)
To: Mike Frysinger; +Cc: David Miller, netdev, klibc
In-Reply-To: <200801110257.42272.vapier@gentoo.org>
Mike Frysinger wrote:
>> all this stuff is ABI constants, and the only reason glibc
>> doesn't use them is that glibc prefers to use enums over #defines.
>
> a proper libc defines things in their headers according to the POSIX specs
> rather than relying on others to do it for them. if you want to argue about
> linux-specific ABI pieces being exported, then you probably have a valid
> point, but socket.h is hardly that.
Have you looked at it?!!? It's full of ABI constants, and that's what I
care about. POSIX doesn't define, say, AF_UNIX; that's an ABI specific.
> so if the only consumer is klibc and you're against adding these things to it,
> special case it for __KLIBC__.
No, let's split the header so that there is *no* libc knowledge in the
kernel. For the kernel to have knowledge about the specifics of any
particular libc (klibc, glibc, or any other) is stupid, and that's the
whole reason we're in this spot to begin with.
Again, I don't particularly care about what they're named, but the whole
point is
#include <linux/foo.h>
if you want the subset and
#include <linux/bar.h>
if you want the whole set.
No libc specifics, and no feature test macros, which I think we can both
agree are uglier than hell.
I thought the naming worked out nicer with <linux/sockaddr.h>, but I
*don't really care*.
-hpa
* Re: [klibc] [patch] import socket defines
From: Mike Frysinger @ 2008-01-11 7:57 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: David Miller, netdev, klibc
In-Reply-To: <4787164F.9030805@zytor.com>
On Friday 11 January 2008, H. Peter Anvin wrote:
> Mike Frysinger wrote:
> > oh, sorry, i see what you mean. i was thinking in terms of crap removed
> > (as that's what i'm after), not crap added (which is what Peter is
> > after). i hadnt noticed that. i dont know if it'll break glibc (and
> > really, any other sane libc). if that is the case, then i think klibc
> > here is the 2nd class citizen to everyone else.
>
> I don't really understand why you insist on using such inflammatory
> language;
it's true. it is much easier to adapt klibc on the fly than everyone else.
klibc is also of significantly less importance in the larger open source
landscape than glibc or other libc's. this isnt really inflammatory so much
as fact.
> all this stuff is ABI constants, and the only reason glibc
> doesn't use them is that glibc prefers to use enums over #defines.
a proper libc defines things in their headers according to the POSIX specs
rather than relying on others to do it for them. if you want to argue about
linux-specific ABI pieces being exported, then you probably have a valid
point, but socket.h is hardly that. imo klibc is being lazy. you disagree.
here we are.
> Right now, glibc is special-cased. glibc also tends to be very
> deliberate about its kernel header inclusions. It wants a subset of the
> available defines, so it can include a subset header.
>
> The reverse is definitely possible too -- all other users (kernel,
> newlib, dietlibc, uclibc, and klibc) can change and leave the current
> state for glibc.
that list is inaccurate. newlib is generally not used under linux, but of the
few headers it does use, socket.h is not one of them.
i dont use dietlibc myself as ive found it generally not suitable for much,
but a quick look through their source code shows linux/socket.h not in use at
all.
uClibc is the exact reason i started this thread in the first place. we have
a __GLIBC__ define in place to account for broken code and it's because of
things like linux/socket.h that we cannot yet remove it.
glibc has not used the typedefs/defines in question in ages, so they're not
affected by the things not being available.
that leaves klibc.
> We can special-case the kernel in the above case, but that would involve
> some additional ugliness.
so if the only consumer is klibc and you're against adding these things to it,
special case it for __KLIBC__.
-mike
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 835 bytes --]
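A minimal sketch of the special-casing suggested above, assuming a
hypothetical header fragment (this is not the actual linux/socket.h).
The names example_klibc_family_t, EXAMPLE_SOCKET_DEFS_EXPORTED and
example_socket_defs_exported() are invented for illustration, and
__KLIBC__ is defined by hand only to simulate a klibc build:

```c
/* Hypothetical header fragment, not the real linux/socket.h. */

/* Simulate a klibc build; a real build is assumed to get this from klibc. */
#define __KLIBC__ 1

#if defined(__KERNEL__) || defined(__KLIBC__)
/* Only the kernel and klibc see the kernel's own definitions. */
typedef unsigned short example_klibc_family_t;
#define EXAMPLE_SOCKET_DEFS_EXPORTED 1
#else
/* Every other libc is expected to provide these from its own headers. */
#define EXAMPLE_SOCKET_DEFS_EXPORTED 0
#endif

/* Hypothetical helper so the outcome can be checked at run time. */
static int example_socket_defs_exported(void)
{
	return EXAMPLE_SOCKET_DEFS_EXPORTED;
}
```

Removing the hand-written __KLIBC__ define makes the same fragment
export nothing, which is the behaviour the other libcs in this thread
would see.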
^ permalink raw reply
* Re: [klibc] [patch] import socket defines
From: H. Peter Anvin @ 2008-01-11 7:42 UTC (permalink / raw)
To: Mike Frysinger; +Cc: David Miller, netdev, klibc
In-Reply-To: <4787164F.9030805@zytor.com>
H. Peter Anvin wrote:
>
> Right now, glibc is special-cased. glibc also tends to be very
> deliberate about its kernel header inclusions. It wants a subset of the
> available defines, so it can include a subset header.
>
> The reverse is definitely possible too -- all other users (kernel,
> newlib, dietlibc, uclibc, and klibc) can change and leave the current
> state for glibc.
>
> We can special-case the kernel in the above case, but that would involve
> some additional ugliness.
>
Just to clarify: I don't have any strong opinions for any particular
option -- I'm fine with either. I'd just like to get rid of the
ugliness of having #defines for any particular user spaces, and I'd
prefer two include files over feature test macros.
-hpa
^ permalink raw reply
* [PATCH 3/5] spidernet: change interrupt masks
From: Ishizaki Kou @ 2008-01-11 6:43 UTC (permalink / raw)
To: linas; +Cc: netdev, cbe-oss-dev
This patch changes the spidernet interrupt masks.
- unmask GDAINVAINT. The spidernet interrupt handler has an operation
to perform for this interrupt.
- mask some interrupts. The interrupt handler performs no operations for them.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
---
Index: linux-powerpc-git/drivers/net/spider_net.h
===================================================================
--- linux-powerpc-git.orig/drivers/net/spider_net.h
+++ linux-powerpc-git/drivers/net/spider_net.h
@@ -159,9 +159,8 @@ extern char spider_net_driver_name[];
/** interrupt mask registers */
#define SPIDER_NET_INT0_MASK_VALUE 0x3f7fe2c7
-#define SPIDER_NET_INT1_MASK_VALUE 0xffff7ff7
-/* no MAC aborts -> auto retransmission */
-#define SPIDER_NET_INT2_MASK_VALUE 0xffef7ff1
+#define SPIDER_NET_INT1_MASK_VALUE 0x0000fff2
+#define SPIDER_NET_INT2_MASK_VALUE 0x000003f1
/* we rely on flagged descriptor interrupts */
#define SPIDER_NET_FRAMENUM_VALUE 0x00000000
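To see what the new mask values change, the old and new constants can
be compared bit by bit. The sketch below is only an illustration: the
four mask values are copied from the diff above, while the EXAMPLE_*
names and the two helper functions are hypothetical and do not exist
in the driver. It assumes, as the handler code suggests, that a set
bit means the interrupt is handled.

```c
#include <stdint.h>

/* Values copied from the diff; the EXAMPLE_ names are made up here. */
#define EXAMPLE_INT1_MASK_OLD 0xffff7ff7u
#define EXAMPLE_INT1_MASK_NEW 0x0000fff2u
#define EXAMPLE_INT2_MASK_OLD 0xffef7ff1u
#define EXAMPLE_INT2_MASK_NEW 0x000003f1u

/* Bits that are set in the new mask but were clear in the old one. */
static uint32_t example_newly_enabled(uint32_t old_mask, uint32_t new_mask)
{
	return new_mask & ~old_mask;
}

/* Bits that were set in the old mask but are clear in the new one. */
static uint32_t example_newly_masked(uint32_t old_mask, uint32_t new_mask)
{
	return old_mask & ~new_mask;
}
```

With these values, example_newly_enabled() on the INT1 masks yields
0x00008000 and on the INT2 masks yields 0, which is consistent with a
single interrupt being unmasked. The patch itself does not say which
bit is GDAINVAINT, so that mapping is left open here.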
^ permalink raw reply
* Re: [klibc] [patch] import socket defines
From: H. Peter Anvin @ 2008-01-11 7:10 UTC (permalink / raw)
To: Mike Frysinger; +Cc: David Miller, netdev, klibc
In-Reply-To: <200801110207.39736.vapier@gentoo.org>
Mike Frysinger wrote:
> oh, sorry, i see what you mean. i was thinking in terms of crap removed (as
> that's what i'm after), not crap added (which is what Peter is after). i
> hadnt noticed that. i dont know if it'll break glibc (and really, any other
> sane libc). if that is the case, then i think klibc here is the 2nd class
> citizen to everyone else.
I don't really understand why you insist on using such inflammatory
language; all this stuff is ABI constants, and the only reason glibc
doesn't use them is that glibc prefers to use enums over #defines.
Right now, glibc is special-cased. glibc also tends to be very
deliberate about its kernel header inclusions. It wants a subset of the
available defines, so it can include a subset header.
The reverse is definitely possible too -- all other users (kernel,
newlib, dietlibc, uclibc, and klibc) can change and leave the current
state for glibc.
We can special-case the kernel in the above case, but that would involve
some additional ugliness.
-hpa
^ permalink raw reply
* Re: [klibc] [patch] import socket defines
From: Mike Frysinger @ 2008-01-11 7:07 UTC (permalink / raw)
To: David Miller; +Cc: hpa, netdev, klibc
In-Reply-To: <20080110.224749.66236050.davem@davemloft.net>
[-- Attachment #1: Type: text/plain, Size: 1502 bytes --]
On Friday 11 January 2008, David Miller wrote:
> From: Mike Frysinger <vapier@gentoo.org>
> Date: Fri, 11 Jan 2008 01:23:37 -0500
>
> > On Friday 11 January 2008, David Miller wrote:
> > > From: "H. Peter Anvin" <hpa@zytor.com>
> > >
> > > > Seems the most logical thing to do would be to break out the small
> > > > portion that everyone wants into <linux/sockaddr.h> or somesuch, and
> > > > then remove those ifdefs entirely.
> > > >
> > > > Proposed patch (still being tested) attached...
> > >
> > > I think this would clearly break existing glibc builds.
> > >
> > > I agree with fixing the ifdef checks, but not like this.
> >
> > how ? the large crap in linux/socket.h never made it into glibc builds,
> > and the few things at the top which were relocated to linux/sockaddr.h
> > are still pulled in via linux/socket.h. for glibc, the resulting
> > '#include <linux/socket.h>' should be unchanged.
>
> Hmmm...
>
> Doesn't glibc include linux/socket.h? If so, before it wouldn't get
> the sa_family_t et al. defines (because __GLIBC__ will be defined and
> it will be >= 2), but with your change it get those things.
oh, sorry, i see what you mean. i was thinking in terms of crap removed (as
that's what i'm after), not crap added (which is what Peter is after). i
hadnt noticed that. i dont know if it'll break glibc (and really, any other
sane libc). if that is the case, then i think klibc here is the 2nd class
citizen to everyone else.
-mike
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 835 bytes --]
^ permalink raw reply
* Re: [klibc] [patch] import socket defines
From: H. Peter Anvin @ 2008-01-11 7:02 UTC (permalink / raw)
To: David Miller; +Cc: vapier, netdev, klibc
In-Reply-To: <20080110.224749.66236050.davem@davemloft.net>
David Miller wrote:
>
> Hmmm...
>
> Doesn't glibc include linux/socket.h? If so, before it wouldn't get
> the sa_family_t et al. defines (because __GLIBC__ will be defined and
> it will be >= 2), but with your change it get those things.
>
> Correct me if I'm wrong.
>
At the moment, yes, it does.
-hpa
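The guard under discussion can be sketched roughly as follows. This
only illustrates the condition described in the quoted text (__GLIBC__
defined and >= 2 hides the definitions); it is not a copy of the real
header, and every EXAMPLE_/example_ name is made up:

```c
#include <stdlib.h> /* on a glibc system, libc headers are assumed to define __GLIBC__ */

/* Record the glibc major version, or 0 when building against another libc. */
#ifdef __GLIBC__
#define EXAMPLE_GLIBC_MAJOR __GLIBC__
#else
#define EXAMPLE_GLIBC_MAJOR 0
#endif

/* The definitions are visible unless a glibc with major version >= 2 is in use. */
#if !defined(__GLIBC__) || (__GLIBC__ < 2)
typedef unsigned short example_sa_family_t;
#define EXAMPLE_KERNEL_SOCKET_TYPES 1
#else
#define EXAMPLE_KERNEL_SOCKET_TYPES 0
#endif

/* Hypothetical helper reporting which branch of the guard was taken. */
static int example_kernel_socket_types_visible(void)
{
	return EXAMPLE_KERNEL_SOCKET_TYPES;
}
```

Dropping the guard, as the proposed patch does for the relocated
definitions, means the typedef becomes visible in both cases, which is
the change in behaviour being asked about.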
^ permalink raw reply
* [PATCH 4/5] spidernet: fix error interrupt handling
From: Ishizaki Kou @ 2008-01-11 6:46 UTC (permalink / raw)
To: linas; +Cc: netdev, cbe-oss-dev
In addition to the value of GHIINT0STS, the spidernet interrupt handler
should check the values of the GHIINT1STS/GHIINT2STS registers at the
beginning of spider_net_interrupt() so as not to drop error
interrupts.
The GHIINT1STS/GHIINT2STS registers indicate some of the erroneous
conditions in spidernet, and a few bits of the GHIINT0STS register
reflect these conditions. But GHIINT0MSK masks these bits, so you should
check these conditions by reading the GHIINT1STS/GHIINT2STS registers
directly.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
---
Index: linux-powerpc-git/drivers/net/spider_net.c
===================================================================
--- linux-powerpc-git.orig/drivers/net/spider_net.c
+++ linux-powerpc-git/drivers/net/spider_net.c
@@ -1415,18 +1415,12 @@ spider_net_link_reset(struct net_device
* found when an interrupt is presented
*/
static void
-spider_net_handle_error_irq(struct spider_net_card *card, u32 status_reg)
+spider_net_handle_error_irq(struct spider_net_card *card, u32 status_reg,
+ u32 error_reg1, u32 error_reg2)
{
- u32 error_reg1, error_reg2;
u32 i;
int show_error = 1;
- error_reg1 = spider_net_read_reg(card, SPIDER_NET_GHIINT1STS);
- error_reg2 = spider_net_read_reg(card, SPIDER_NET_GHIINT2STS);
-
- error_reg1 &= SPIDER_NET_INT1_MASK_VALUE;
- error_reg2 &= SPIDER_NET_INT2_MASK_VALUE;
-
/* check GHIINT0STS ************************************/
if (status_reg)
for (i = 0; i < 32; i++)
@@ -1656,12 +1650,15 @@ spider_net_interrupt(int irq, void *ptr)
{
struct net_device *netdev = ptr;
struct spider_net_card *card = netdev_priv(netdev);
- u32 status_reg;
+ u32 status_reg, error_reg1, error_reg2;
status_reg = spider_net_read_reg(card, SPIDER_NET_GHIINT0STS);
- status_reg &= SPIDER_NET_INT0_MASK_VALUE;
+ error_reg1 = spider_net_read_reg(card, SPIDER_NET_GHIINT1STS);
+ error_reg2 = spider_net_read_reg(card, SPIDER_NET_GHIINT2STS);
- if (!status_reg)
+ if (!(status_reg & SPIDER_NET_INT0_MASK_VALUE) &&
+ !(error_reg1 & SPIDER_NET_INT1_MASK_VALUE) &&
+ !(error_reg2 & SPIDER_NET_INT2_MASK_VALUE))
return IRQ_NONE;
if (status_reg & SPIDER_NET_RXINT ) {
@@ -1676,7 +1673,8 @@ spider_net_interrupt(int irq, void *ptr)
spider_net_link_reset(netdev);
if (status_reg & SPIDER_NET_ERRINT )
- spider_net_handle_error_irq(card, status_reg);
+ spider_net_handle_error_irq(card, status_reg,
+ error_reg1, error_reg2);
/* clear interrupt sources */
spider_net_write_reg(card, SPIDER_NET_GHIINT0STS, status_reg);
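The entry check introduced by this patch can be modelled outside the
driver as a small predicate. This is a sketch under stated assumptions:
the mask values are the ones from patch 3/5 of this series, register
reads are replaced by plain function arguments, and
example_irq_is_ours() is a hypothetical name that does not exist in
spider_net.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask values as they stand after patch 3/5; the names are shortened here. */
#define EXAMPLE_INT0_MASK 0x3f7fe2c7u
#define EXAMPLE_INT1_MASK 0x0000fff2u
#define EXAMPLE_INT2_MASK 0x000003f1u

/*
 * Returns true when any of the three status values has a bit set that
 * its mask lets through. The handler returns IRQ_NONE only when this
 * is false for all three registers.
 */
static bool example_irq_is_ours(uint32_t status_reg, uint32_t error_reg1,
				uint32_t error_reg2)
{
	return (status_reg & EXAMPLE_INT0_MASK) ||
	       (error_reg1 & EXAMPLE_INT1_MASK) ||
	       (error_reg2 & EXAMPLE_INT2_MASK);
}
```

Before the patch only status_reg took part in this decision, so an
error reported solely in GHIINT1STS or GHIINT2STS led to IRQ_NONE.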
^ permalink raw reply
* [PATCH 1/5] spidernet: add missing initialization
From: Ishizaki Kou @ 2008-01-11 6:38 UTC (permalink / raw)
To: linas; +Cc: netdev, cbe-oss-dev
This patch fixes initialization of the "aneg_count" and "medium" fields in
spider_net_card so that the spidernet driver correctly sets "link status".
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
---
Index: linux-powerpc-git/drivers/net/spider_net.c
===================================================================
--- linux-powerpc-git.orig/drivers/net/spider_net.c
+++ linux-powerpc-git/drivers/net/spider_net.c
@@ -1399,6 +1399,8 @@ spider_net_link_reset(struct net_device
spider_net_write_reg(card, SPIDER_NET_GMACINTEN, 0);
/* reset phy and setup aneg */
+ card->aneg_count = 0;
+ card->medium = BCM54XX_COPPER;
spider_net_setup_aneg(card);
mod_timer(&card->aneg_timer, jiffies + SPIDER_NET_ANEG_TIMER);
@@ -1982,6 +1984,8 @@ spider_net_open(struct net_device *netde
goto init_firmware_failed;
/* start probing with copper */
+ card->aneg_count = 0;
+ card->medium = BCM54XX_COPPER;
spider_net_setup_aneg(card);
if (card->phy.def->phy_id)
mod_timer(&card->aneg_timer, jiffies + SPIDER_NET_ANEG_TIMER);
^ permalink raw reply