* CPU utilization increased in 2.6.27rc
From: Andrew Gallatin <gallatin@myri.com>
Date: 2008-08-13 0:56 UTC
To: netdev
I noticed a performance degradation in the 2.6.27rc series having to
do with TCP transmits. The problem seems to be most noticeable when
using a fast (10GbE) network and a pitifully slow (2.0GHz Athlon64)
host with a small (1500-byte) MTU using TSO and sendpage, but I also
see it with 1GbE hardware, without TSO and sendpage.

I used git-bisect to track down where the problem seems to have been
introduced in Linus' tree:
37437bb2e1ae8af470dfcd5b4ff454110894ccaf is first bad commit
commit 37437bb2e1ae8af470dfcd5b4ff454110894ccaf
Author: David S. Miller <davem@davemloft.net>
Date: Wed Jul 16 02:15:04 2008 -0700
pkt_sched: Schedule qdiscs instead of netdev_queue.
Something about this commit is maxing out the CPU on my very low-end
test machines. Just prior to the above commit, I see the same good
performance as 2.6.26.2 and the rest of the 2.6 series.

Here is output from netperf -tTCP_SENDFILE -C -c between 2 of my
low-end hosts (columns: recv/send socket size, send message size,
elapsed time, throughput in 10^6 bits/sec, local/remote CPU
utilization, local/remote service demand):
Forcedeth (1GbE):
87380  65536  65536  10.05  949.03   14.54  20.01  2.510  3.455

Myri10ge (10GbE):
87380  65536  65536  10.01  9466.27  19.00  73.43  0.329  1.271
Just after the above commit, the CPU utilization increases
dramatically. Note the large difference in CPU utilization
for both 1GbE (14.5% -> 46.5%) and 10GbE (19% -> 49.8%):
Forcedeth (1GbE):
87380  65536  65536  10.01  947.04   46.48  20.05  8.042  3.468

Myri10ge (10GbE):
87380  65536  65536  10.00  7693.19  49.81  60.03  1.061  1.278
For 1GbE, I see a similar increase in CPU utilization when using
normal socket writes (netperf -t TCP_STREAM):

87380  65536  65536  10.05  948.92  19.89  18.65  3.434  3.220

vs

87380  65536  65536  10.07  949.35  49.38  20.77  8.523  3.584
Without TSO enabled, the difference is less evident, but still
there (~30% -> 49%).
For 10GbE, this only seems to happen for sendpage. Normal socket
write (netperf TCP_STREAM) tests do not seem to show this degradation,
perhaps because a CPU is already maxed out copying data...
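
For concreteness, the two netperf tests boil down to something like
the following userspace sketch ("sock" and "fd" stand in for an
already-connected TCP socket and an open file; netperf's actual
internals differ):

#include <unistd.h>
#include <sys/sendfile.h>

/* TCP_STREAM-style: every byte is copied user->kernel in write(),
 * so a slow CPU saturates on the copy itself. */
static void send_copying(int sock, int fd, char *buf, size_t len)
{
	ssize_t n = read(fd, buf, len);
	if (n > 0)
		write(sock, buf, (size_t)n);
}

/* TCP_SENDFILE-style: sendfile() hands page-cache pages to the stack
 * (the sendpage path) without copying, leaving CPU headroom in which
 * the extra qdisc overhead becomes visible. */
static void send_zerocopy(int sock, int fd, size_t len)
{
	off_t off = 0;
	sendfile(sock, fd, &off, len);
}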
According to oprofile, the system is spending a lot of
time in __qdisc_run() when sending on the 1GbE forcedeth
interface:
17978 17.5929 vmlinux __qdisc_run
9828 9.6175 vmlinux net_tx_action
8306 8.1281 vmlinux _raw_spin_lock
5762 5.6386 oprofiled (no symbols)
5443 5.3264 vmlinux __netif_schedule
5352 5.2374 vmlinux _raw_spin_unlock
4921 4.8156 vmlinux __do_softirq
3366 3.2939 vmlinux raise_softirq_irqoff
1730 1.6929 vmlinux pfifo_fast_requeue
1689 1.6528 vmlinux pfifo_fast_dequeue
1406 1.3759 oprofile (no symbols)
1346 1.3172 vmlinux _raw_spin_trylock
1194 1.1684 vmlinux nv_start_xmit_optimized
1114 1.0901 vmlinux handle_IRQ_event
1031 1.0089 vmlinux tcp_ack
<....>
Does anybody understand what's happening?
Thanks,
Drew

* Re: CPU utilization increased in 2.6.27rc
From: David Miller <davem@davemloft.net>
Date: 2008-08-13 1:05 UTC
To: gallatin; +Cc: netdev

From: Andrew Gallatin <gallatin@myri.com>
Date: Tue, 12 Aug 2008 20:56:23 -0400

> According to oprofile, the system is spending a lot of
> time in __qdisc_run() when sending on the 1GbE forcedeth
> interface:

What does the profile look like beforehand?

* Re: CPU utilization increased in 2.6.27rc
From: Andrew Gallatin <gallatin@myri.com>
Date: 2008-08-13 15:06 UTC
To: David Miller; +Cc: netdev

David Miller wrote:
> From: Andrew Gallatin <gallatin@myri.com>
> Date: Tue, 12 Aug 2008 20:56:23 -0400
>
>> According to oprofile, the system is spending a lot of
>> time in __qdisc_run() when sending on the 1GbE forcedeth
>> interface:
>
> What does the profile look like beforehand?

The qdisc stuff is gone, and nearly everything is in the noise.
Beforehand, we're at ~15% CPU. Here is the first page or so from
opreport -l from immediately prior:

7566  6.4373  vmlinux    _raw_spin_lock
5894  5.0147  oprofiled  (no symbols)
4136  3.5190  ehci_hcd   (no symbols)
3965  3.3735  vmlinux    handle_IRQ_event
3333  2.8358  vmlinux    tcp_ack
2952  2.5116  vmlinux    __copy_skb_header
2869  2.4410  vmlinux    default_idle
2702  2.2989  vmlinux    nv_rx_process_optimized
2511  2.1364  vmlinux    nv_start_xmit_optimized
2310  1.9654  vmlinux    sk_run_filter
2157  1.8352  vmlinux    kmem_cache_alloc
2139  1.8199  vmlinux    IRQ0x69_interrupt
1797  1.5289  vmlinux    nv_nic_irq_optimized
1796  1.5281  vmlinux    kmem_cache_free
1784  1.5179  vmlinux    kfree
1690  1.4379  vmlinux    _raw_spin_unlock
1594  1.3562  vmlinux    tcp_sendpage
1578  1.3426  vmlinux    __tcp_push_pending_frames
1576  1.3409  vmlinux    packet_rcv_spkt
1560  1.3273  vmlinux    __inet_lookup_established
1558  1.3256  vmlinux    nv_tx_done_optimized

[On this system, forcedeth shares an irq with ehci_hcd, so that's why
that is so high.]

Drew

* Re: CPU utilization increased in 2.6.27rc
From: David Miller <davem@davemloft.net>
Date: 2008-08-13 1:15 UTC
To: gallatin; +Cc: netdev, robert

From: Andrew Gallatin <gallatin@myri.com>
Date: Tue, 12 Aug 2008 20:56:23 -0400

> pkt_sched: Schedule qdiscs instead of netdev_queue.

While I'm waiting for your beforehand profile data, here is a stab in
the dark patch which might fix the problem.

Robert, this could explain some of the things in the multiqueue
testing profile you sent me a week or so ago.

Let me know how well it works:

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 6affcfa..720cae6 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -89,7 +89,10 @@ extern void __qdisc_run(struct Qdisc *q);
 
 static inline void qdisc_run(struct Qdisc *q)
 {
-	if (!test_and_set_bit(__QDISC_STATE_RUNNING, &q->state))
+	struct netdev_queue *txq = q->dev_queue;
+
+	if (!netif_tx_queue_stopped(txq) &&
+	    !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state))
 		__qdisc_run(q);
 }
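
The mechanism, roughly: once qdiscs are scheduled independently of the
netdev queue, nothing stops the TX softirq from re-running a qdisc
whose underlying ring is full. A condensed sketch of the transmit step
(illustrative only, not the actual 2.6.27 source; helper bodies are
elided):

/* Condensed, illustrative sketch -- NOT the real kernel code. */
static int qdisc_restart_sketch(struct Qdisc *q)
{
	struct netdev_queue *txq = q->dev_queue;
	struct sk_buff *skb = dequeue_skb(q);	/* next packet, if any */

	if (!skb)
		return 0;

	if (dev_hard_start_xmit(skb, txq->dev, txq) == NETDEV_TX_BUSY) {
		/* Driver ring full: the skb goes back on the queue and
		 * the qdisc is rescheduled.  Without the
		 * netif_tx_queue_stopped() test in qdisc_run(), the
		 * softirq re-enters immediately and spins on this
		 * dequeue/requeue cycle until the TX-completion IRQ
		 * wakes the queue -- the cycles that show up under
		 * __qdisc_run()/net_tx_action() in the profile above. */
		return dev_requeue_skb(skb, q);
	}

	return qdisc_qlen(q);	/* nonzero: more packets to send */
}

With the stopped-queue check added, the qdisc simply stays idle until
the driver wakes the queue from its TX-cleanup path.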

* Re: CPU utilization increased in 2.6.27rc
From: Andrew Gallatin <gallatin@myri.com>
Date: 2008-08-13 16:13 UTC
To: David Miller; +Cc: netdev, robert

David Miller wrote:
> From: Andrew Gallatin <gallatin@myri.com>
> Date: Tue, 12 Aug 2008 20:56:23 -0400
>
>> pkt_sched: Schedule qdiscs instead of netdev_queue.
>
> While I'm waiting for your beforehand profile data,
> here is a stab in the dark patch which might fix
> the problem.
>
> Robert, this could explain some of the things in the
> multiqueue testing profile you sent me a week or so
> ago.
>
> Let me know how well it works:

Excellent! This completely fixes the increased CPU utilization I
observed on both 10GbE and 1GbE interfaces, and CPU utilization is now
reduced back to 2.6.26 levels.

Oprofile now is nearly identical to what it was prior to
37437bb2e1ae8af470dfcd5b4ff454110894ccaf:

8363  6.5081  vmlinux    _raw_spin_lock
5612  4.3672  oprofiled  (no symbols)
4420  3.4396  ehci_hcd   (no symbols)
4325  3.3657  vmlinux    handle_IRQ_event
3688  2.8700  vmlinux    default_idle
3164  2.4622  vmlinux    nv_start_xmit_optimized
3092  2.4062  vmlinux    sk_run_filter
3072  2.3906  vmlinux    tcp_ack
2969  2.3105  vmlinux    __copy_skb_header
2453  1.9089  vmlinux    kmem_cache_free
2400  1.8677  vmlinux    IRQ0x69_interrupt
2295  1.7860  vmlinux    nv_rx_process_optimized
2092  1.6280  vmlinux    kmem_cache_alloc
2072  1.6124  vmlinux    kfree
2049  1.5945  vmlinux    packet_rcv_spkt
1984  1.5439  vmlinux    __tcp_push_pending_frames
1942  1.5113  vmlinux    nv_nic_irq_optimized
1933  1.5043  vmlinux    _raw_spin_unlock
1637  1.2739  vmlinux    nv_tx_done_optimized
1630  1.2685  vmlinux    eth_type_trans
1517  1.1805  vmlinux    __qdisc_run

Thank you,
Drew

* Re: CPU utilization increased in 2.6.27rc
From: Robert Olsson <robert@robur.slu.se>
Date: 2008-08-13 19:52 UTC
To: Andrew Gallatin; +Cc: David Miller, netdev, Robert.Olsson

Andrew Gallatin writes:
>
> Excellent! This completely fixes the increased CPU
> utilization I observed on both 10GbE and 1GbE interfaces,
> and CPU utilization is now reduced back to 2.6.26 levels.
>
> > Robert, this could explain some of the things in the
> > multiqueue testing profile you sent me a week or so
> > ago.

I've just rerun the virtual 10g router experiment with the current git
including the pkt_sched patch. The full experiment is below. In this
case the profile looks the same as before: no improvement due to this
patch here.

In this case we don't have any old numbers to compare with, as we're
testing new functionality. I'm not too unhappy about the performance,
and the time has to show up in some functions in the profile...

Virtual IP forwarding experiment: we're splitting an incoming flow
load (10g) among 4 CPUs and keeping the flows per-CPU, including TX
and also skb cleaning.

Network flow load into (eth0) 10G 82598. Total 295+293+293+220 kpps,
4 * (4096 concurrent flows at 30 pkts).

eth0  1500  0  3996889  0  1280  0  19       0  0  0  BMRU
eth1  1500  0  1        0  0     0  3998236  0  0  0  BMRU

I've configured RSS with ixgbe so all 4 CPUs are used, and hacked the
driver so the skb gets tagged with the incoming CPU. The 2nd column in
softnet_stat is used to verify that tagging and affinity are correct
until hard_xmit, and even for TX-skb cleaning, to avoid all cache
misses and get true per-CPU forwarding. The ixgbe driver 1.3.31.5 from
Intel's site is needed for RSS etc. and is a bit modified for this
test.

softnet_stat
000f3236 001e63f8 00000872 00000000 00000000 00000000 00000000 00000000 00000000
000f52df 001ea58c 000008b8 00000000 00000000 00000000 00000000 00000000 00000000
000f3d90 001e7af8 00000a3b 00000000 00000000 00000000 00000000 00000000 00000000
000f4174 001e82c2 00000a17 00000000 00000000 00000000 00000000 00000000 00000000

eth0 (incoming)
214:  4     0     0     6623  PCI-MSI-edge  eth0:v3-Rx
215:  0     5     6635  0     PCI-MSI-edge  eth0:v2-Rx
216:  0     7152  5     0     PCI-MSI-edge  eth0:v1-Rx
217:  7115  0     0     5     PCI-MSI-edge  eth0:v0-Rx

eth1 (outgoing)
201:  3     0     0     3738  PCI-MSI-edge  eth1:v7-Tx
202:  0     4     3743  0     PCI-MSI-edge  eth1:v6-Tx
203:  0     3743  4     0     PCI-MSI-edge  eth1:v5-Tx
204:  3746  0     0     6     PCI-MSI-edge  eth1:v4-Tx

CPU: AMD64 processors, speed 3000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 3000
samples  %       image name  app name  symbol name
407896   8.7211  vmlinux     vmlinux   cache_alloc_refill
339524   7.2592  vmlinux     vmlinux   __qdisc_run
243352   5.2030  vmlinux     vmlinux   dev_queue_xmit
227855   4.8717  vmlinux     vmlinux   kfree
214975   4.5963  vmlinux     vmlinux   __alloc_skb
172008   3.6776  vmlinux     vmlinux   cache_flusharray
168307   3.5985  vmlinux     vmlinux   ip_route_input
160995   3.4422  vmlinux     vmlinux   dev_kfree_skb_irq
146116   3.1240  vmlinux     vmlinux   netif_receive_skb
137763   2.9455  vmlinux     vmlinux   free_block
133732   2.8593  vmlinux     vmlinux   eth_type_trans
124262   2.6568  vmlinux     vmlinux   ip_rcv
110170   2.3555  vmlinux     vmlinux   list_del
100508   2.1489  vmlinux     vmlinux   ip_finish_output
96777    2.0691  vmlinux     vmlinux   ip_forward
89212    1.9074  vmlinux     vmlinux   check_addr

diff --git a/net/core/dev.c b/net/core/dev.c
index 8d13a9b..6fdf427 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1714,6 +1714,9 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
 {
 	u16 queue_index = 0;
 
+	if (dev->real_num_tx_queues > 1)
+		return netdev_get_tx_queue(dev, skb->queue_mapping);
+
 	if (dev->select_queue)
 		queue_index = dev->select_queue(dev, skb);
 	else if (dev->real_num_tx_queues > 1)
@@ -4872,3 +4875,4 @@ EXPORT_SYMBOL(dev_load);
 #endif
 
 EXPORT_PER_CPU_SYMBOL(softnet_data);
+EXPORT_PER_CPU_SYMBOL(netdev_rx_stat);

--- ixgbe.h.orig	2008-07-30 13:11:46.000000000 +0200
+++ ixgbe.h	2008-07-30 17:42:59.000000000 +0200
@@ -28,6 +28,8 @@
 #ifndef _IXGBE_H_
 #define _IXGBE_H_
 
+#define CONFIG_NETDEVICES_MULTIQUEUE
+
 #include <linux/pci.h>
 #include <linux/netdevice.h>
 #include <linux/vmalloc.h>
@@ -106,6 +108,10 @@
 #define IXGBE_TX_FLAGS_VLAN_PRIO_MASK	0x0000e000
 #define IXGBE_TX_FLAGS_VLAN_SHIFT	16
 
+#define IXGBE_NO_LRO
+#define IXGBE_NAPI
+#define CONFIG_IXGBE_NAPI
+
 #ifndef IXGBE_NO_LRO
 #define IXGBE_LRO_MAX 32	/*Maximum number of LRO descriptors*/
 #define IXGBE_LRO_GLOBAL 10

--- ixgbe_main.c.orig	2008-07-30 13:12:02.000000000 +0200
+++ ixgbe_main.c	2008-07-30 19:26:07.000000000 +0200
@@ -71,7 +71,7 @@
 #endif
 
-#define BASE_VERSION "1.3.31.5"
+#define BASE_VERSION "1.3.31.5-080730"
 #define DRV_VERSION BASE_VERSION LRO DRIVERNAPI DRV_HW_PERF
 char ixgbe_driver_version[] = DRV_VERSION;
@@ -257,6 +257,9 @@
 			total_packets++;
 			total_bytes += skb->len;
 #endif
+			if(skb->queue_mapping == smp_processor_id())
+				__get_cpu_var(netdev_rx_stat).dropped++;
+
 		}
 		ixgbe_unmap_and_free_tx_resource(adapter,
@@ -426,6 +429,9 @@
 			  struct sk_buff *skb, bool is_vlan, u16 tag)
 {
 	int ret;
+
+	skb->queue_mapping = smp_processor_id();
+
 #ifdef CONFIG_IXGBE_NAPI
 	if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL)) {
 #ifdef NETIF_F_HW_VLAN_TX
@@ -2875,7 +2881,11 @@
 		rss_i = min(4, rss_i);
 		rss_m = 0x3;
 		nrq = dcb_i * vmdq_i * rss_i;
+#ifdef CONFIG_NETDEVICES_MULTIQUEUE
+		ntq = nrq;
+#else
 		ntq = dcb_i * vmdq_i;
+#endif
 		break;
 	case (IXGBE_FLAG_VMDQ_ENABLED | IXGBE_FLAG_DCB_ENABLED):
 		dcb_m = 0x7 << 3;
@@ -3242,7 +3252,7 @@
 out:
 #ifdef CONFIG_NETDEVICES_MULTIQUEUE
 	/* Notify the stack of the (possibly) reduced Tx Queue count. */
-	adapter->netdev->egress_subqueue_count = adapter->num_tx_queues;
+	// adapter->netdev->egress_subqueue_count = adapter->num_tx_queues;
 #endif
 
 	return err;
@@ -3794,6 +3804,8 @@
 }
 #endif /* CONFIG_PM */
 
+extern DEFINE_PER_CPU(struct netif_rx_stats, netdev_rx_stat);
+
 static int ixgbe_suspend(struct pci_dev *pdev, pm_message_t state)
 {
 	struct net_device *netdev = pci_get_drvdata(pdev);
@@ -4402,6 +4414,9 @@
 
 #ifdef CONFIG_NETDEVICES_MULTIQUEUE
 	r_idx = (adapter->num_tx_queues - 1) & skb->queue_mapping;
+
+	if(skb->queue_mapping == smp_processor_id())
+		__get_cpu_var(netdev_rx_stat).dropped++;
 #endif
 	tx_ring = &adapter->tx_ring[r_idx];

Cheers.
--ro
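
Pulled out of the diffs above, the affinity trick reduces to this pair
(a condensed restatement in kernel-style C, not additional buildable
driver code):

/* RX side (driver receive path): tag the skb with the CPU that
 * received it. */
static void rx_tag_sketch(struct sk_buff *skb)
{
	skb->queue_mapping = smp_processor_id();
}

/* TX side (dev_pick_tx): transmit on that same CPU's queue, so the
 * skb, its data, and the TX ring stay within one CPU's cache. */
static struct netdev_queue *tx_pick_sketch(struct net_device *dev,
					   struct sk_buff *skb)
{
	if (dev->real_num_tx_queues > 1)
		return netdev_get_tx_queue(dev, skb->queue_mapping);
	return netdev_get_tx_queue(dev, 0);
}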

* Re: CPU utilization increased in 2.6.27rc
From: Stephen Hemminger
Date: 2008-08-13 21:34 UTC
To: Robert Olsson; +Cc: Andrew Gallatin, David Miller, netdev, Robert.Olsson

On Wed, 13 Aug 2008 21:52:08 +0200
Robert Olsson <robert@robur.slu.se> wrote:

> I've just rerun the virtual 10g router experiment with the current
> git including the pkt_sched patch. The full experiment is below. In
> this case the profile looks the same as before: no improvement due
> to this patch here.
> [...]
> samples  %       image name  app name  symbol name
> 407896   8.7211  vmlinux     vmlinux   cache_alloc_refill
> 339524   7.2592  vmlinux     vmlinux   __qdisc_run
> 243352   5.2030  vmlinux     vmlinux   dev_queue_xmit
> 227855   4.8717  vmlinux     vmlinux   kfree
> 214975   4.5963  vmlinux     vmlinux   __alloc_skb
> 172008   3.6776  vmlinux     vmlinux   cache_flusharray

I see you are still using the SLAB allocator. Does SLUB change the
numbers?

* Re: CPU utilization increased in 2.6.27rc
From: Robert Olsson <robert@robur.slu.se>
Date: 2008-08-13 21:56 UTC
To: Stephen Hemminger; +Cc: Robert Olsson, Andrew Gallatin, David Miller, netdev

Stephen Hemminger writes:
>
> I see you are still using the SLAB allocator. Does SLUB change the
> numbers?

Correct. I did try SLUB a couple of months ago but got less
performance; there have been some SLUB patches since, though. Have you
experimented?

Cheers
--ro

* Re: CPU utilization increased in 2.6.27rc
From: Stephen Hemminger
Date: 2008-08-13 22:06 UTC
To: Robert Olsson; +Cc: Robert Olsson, Andrew Gallatin, David Miller, netdev

On Wed, 13 Aug 2008 23:56:59 +0200
Robert Olsson <robert@robur.slu.se> wrote:

> Correct. I did try SLUB a couple of months ago but got less
> performance; there have been some SLUB patches since, though. Have
> you experimented?

Not yet, but there was a movement to kill SLAB. If SLAB is still
faster, then Christoph probably wants to know (and fix the problem).
The problem is that one-way flows might still be moving memory between
CPUs.

* Re: CPU utilization increased in 2.6.27rc
From: Robert Olsson <robert@robur.slu.se>
Date: 2008-08-13 22:21 UTC
To: Stephen Hemminger; +Cc: Robert Olsson, Andrew Gallatin, David Miller, netdev

> Not yet, but there was a movement to kill SLAB. If SLAB is still
> faster, then Christoph probably wants to know (and fix the problem).
> The problem is that one-way flows might still be moving memory
> between CPUs.

How does this happen? (Assuming affinity is set up correctly, of
course.)

Cheers
--ro

* Re: CPU utilization increased in 2.6.27rc
From: Andi Kleen
Date: 2008-08-13 20:03 UTC
To: Andrew Gallatin; +Cc: David Miller, netdev, robert

Andrew Gallatin <gallatin@myri.com> writes:

> 8363  6.5081  vmlinux    _raw_spin_lock
> 5612  4.3672  oprofiled  (no symbols)
> 4420  3.4396  ehci_hcd   (no symbols)
> 4325  3.3657  vmlinux    handle_IRQ_event
> 3688  2.8700  vmlinux    default_idle
> 3164  2.4622  vmlinux    nv_start_xmit_optimized
> 3092  2.4062  vmlinux    sk_run_filter
                           ^^^^^^^^^^^^^

Looks like you have one of those nasty dhcpcds running that always
open a raw socket and intercept everything using a filter? I always
hoped those would eventually disappear and just bind to the proper
protocol, but they seem to refuse to die.

-Andi
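
The pattern in question, sketched as a minimal standalone program
(hypothetical illustration; the filter is a rough stand-in for a DHCP
client's IPv4/UDP match, and running it requires root):

#include <stdio.h>
#include <sys/socket.h>
#include <linux/if_ether.h>
#include <linux/filter.h>
#include <arpa/inet.h>

int main(void)
{
	/* Classic BPF: accept IPv4/UDP (IPPROTO_UDP == 17), drop the
	 * rest -- roughly the shape of a DHCP client's filter. */
	struct sock_filter code[] = {
		{ 0x28, 0, 0, 12 },		/* ldh [12]: ethertype     */
		{ 0x15, 0, 3, ETH_P_IP },	/* not IPv4 -> drop        */
		{ 0x30, 0, 0, 23 },		/* ldb [23]: IP protocol   */
		{ 0x15, 0, 1, 17 },		/* not UDP  -> drop        */
		{ 0x06, 0, 0, 0xffff },		/* accept                  */
		{ 0x06, 0, 0, 0 },		/* drop                    */
	};
	struct sock_fprog prog = { .len = 6, .filter = code };
	int fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

	if (fd < 0 || setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER,
				 &prog, sizeof(prog)) < 0) {
		perror("packet socket / SO_ATTACH_FILTER");
		return 1;
	}
	printf("filter attached; kernel now runs it on every frame\n");
	return 0;
}

Once a filter like this is attached to a PF_PACKET socket, the kernel
runs it against every frame on the machine, which is exactly the
sk_run_filter() time in the profile.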

* Re: CPU utilization increased in 2.6.27rc
From: Andrew Gallatin <gallatin@myri.com>
Date: 2008-08-13 20:36 UTC
To: Andi Kleen; +Cc: David Miller, netdev, robert

Andi Kleen wrote:
> Looks like you have one of those nasty dhcpcds running that always
> open a raw socket and intercept everything using a filter? I always
> hoped those would eventually disappear and just bind to the proper
> protocol, but they seem to refuse to die.

Yeah, the box is running an ancient CentOS 4, so the dhcpcd is pretty
old.

Drew

* Re: CPU utilization increased in 2.6.27rc
From: David Miller <davem@davemloft.net>
Date: 2008-08-13 20:27 UTC
To: gallatin; +Cc: netdev, robert

From: Andrew Gallatin <gallatin@myri.com>
Date: Wed, 13 Aug 2008 12:13:40 -0400

> Excellent! This completely fixes the increased CPU
> utilization I observed on both 10GbE and 1GbE interfaces,
> and CPU utilization is now reduced back to 2.6.26 levels.
>
> Oprofile now is nearly identical to what it was prior to
> 37437bb2e1ae8af470dfcd5b4ff454110894ccaf:

Thanks for testing and providing those profiles.

I'll get this fix to Linus soon.

* Re: CPU utilization increased in 2.6.27rc
From: Andrew Gallatin <gallatin@myri.com>
Date: 2008-08-13 20:58 UTC
To: David Miller; +Cc: netdev, robert

David Miller wrote:
> Thanks for testing and providing those profiles.
>
> I'll get this fix to Linus soon.

No problem. Thanks for the excellent multi-queue tx work. FWIW, I
tripped over this when testing a myri10ge patch for multi-queue tx...

Drew