* Re: [PATCH] netdev/phy: Fixup lockdep warnings in mdio-mux.c
From: David Miller @ 2012-07-09 7:13 UTC (permalink / raw)
To: ddaney.cavm; +Cc: netdev, linux-kernel, david.daney
In-Reply-To: <1341439576-1413-1-git-send-email-ddaney.cavm@gmail.com>
From: David Daney <ddaney.cavm@gmail.com>
Date: Wed, 4 Jul 2012 15:06:16 -0700
> From: David Daney <david.daney@cavium.com>
>
> With lockdep enabled we get:
...
> This is a false positive, since we are indeed using 'nested' locking,
> we need to use mutex_lock_nested().
>
> Now in theory we can stack multiple MDIO multiplexers, but that would
> require passing the nesting level (which is difficult to know) to
> mutex_lock_nested(). Instead we assume the simple case of a single
> level of nesting. Since these are only warning messages, it isn't so
> important to solve the general case.
>
> Signed-off-by: David Daney <david.daney@cavium.com>
Applied to 'net', thanks.
^ permalink raw reply
* Re: [PATCH] bcm87xx: fix reg-init comment typo
From: David Miller @ 2012-07-09 7:12 UTC (permalink / raw)
To: ddaney.cavm; +Cc: jacmet, netdev, david.daney
In-Reply-To: <4FF47BDE.3010002@gmail.com>
From: David Daney <ddaney.cavm@gmail.com>
Date: Wed, 04 Jul 2012 10:22:38 -0700
> On 07/04/2012 08:05 AM, Peter Korsgaard wrote:
>> broadcom, not marvell.
>>
>> Signed-off-by: Peter Korsgaard<jacmet@sunsite.dk>
>
> Indeed, it was a cut-and-paste error. Thanks for fixing it...
>
> Acked-by: David Daney <david.daney@cavium.com>
Applied to net-next, thanks.
^ permalink raw reply
* Re: [PATCH] phylib: Support registering a bunch of drivers
From: David Miller @ 2012-07-09 7:11 UTC (permalink / raw)
To: chohnstaedt; +Cc: netdev
In-Reply-To: <20120704154434.GZ19422@elara.bln.innominate.local>
From: Christian Hohnstaedt <chohnstaedt@innominate.com>
Date: Wed, 4 Jul 2012 17:44:34 +0200
> If registering of one of them fails, all already registered drivers
> of this module will be unregistered.
>
> Use the new register/unregister functions in all drivers
> registering more than one driver.
>
> amd.c, realtek.c: Simplify: directly return registration result.
>
> Tested with broadcom.c
> All others compile-tested.
>
> Signed-off-by: Christian Hohnstaedt <chohnstaedt@innominate.com>
Applied, thanks.
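The all-or-nothing registration described in the commit message can be sketched in plain C as below. This is an illustration only, not the phylib code: struct ex_driver, ex_driver_register(), ex_driver_unregister() and ex_drivers_register() are hypothetical names, and the "fail" flag merely simulates a registration error.

```c
#include <stddef.h>

/* Hypothetical stand-in for a driver object; not the phylib structure. */
struct ex_driver {
	const char *name;
	int registered;	/* set to 1 once registration succeeded */
	int fail;	/* test hook: force registration to fail */
};

/* Hypothetical single-driver helpers. */
static int ex_driver_register(struct ex_driver *d)
{
	if (d->fail)
		return -1;
	d->registered = 1;
	return 0;
}

static void ex_driver_unregister(struct ex_driver *d)
{
	d->registered = 0;
}

/*
 * Register n drivers. If one of them fails, unregister every driver
 * that was already registered (in reverse order) and return the error.
 */
static int ex_drivers_register(struct ex_driver *drv, size_t n)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = ex_driver_register(&drv[i]);
		if (ret) {
			while (i-- > 0)
				ex_driver_unregister(&drv[i]);
			return ret;
		}
	}
	return 0;
}
```

A module init function would then return the result of the array registration directly, which is the simplification the commit message mentions for amd.c and realtek.c.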
^ permalink raw reply
* Re: [net] ixgbe: DCB and SR-IOV can not co-exist and will cause hangs
From: David Miller @ 2012-07-09 7:10 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: alexander.h.duyck, netdev, gospo, sassmann
In-Reply-To: <1341403225-1326-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 4 Jul 2012 05:00:25 -0700
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> DCB and SR-IOV cannot currently be enabled at the same time as the queueing
> schemes are incompatible. If they are both enabled it will result in Tx
> hangs since only the first Tx queue will be able to transmit any traffic.
>
> The simple fix for this is to block us from enabling TCs in ixgbe_setup_tc
> if SR-IOV is enabled. This change will be reverted once we can support
> SR-IOV and DCB coexistence.
>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Acked-by: John Fastabend <john.r.fastabend@intel.com>
> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
> Tested-by: Ross Brattain <ross.b.brattain@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH] bcm87xx: disable autonegotiation by default
From: David Miller @ 2012-07-09 7:09 UTC (permalink / raw)
To: jacmet; +Cc: netdev, david.daney
In-Reply-To: <1341398037-7591-1-git-send-email-jacmet@sunsite.dk>
From: Peter Korsgaard <jacmet@sunsite.dk>
Date: Wed, 4 Jul 2012 12:33:57 +0200
> The bcm87xx phys don't support autonegotiation, so don't use it by
> default, as otherwise phy_state_machine() will try to enable it (using
> c22 requests, which also don't make any sense for the bcm87xx).
>
> Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Applied to net-next, thanks.
^ permalink raw reply
* Re: [RFC PATCH] tcp: limit data skbs in qdisc layer
From: David Miller @ 2012-07-09 7:08 UTC (permalink / raw)
To: eric.dumazet
Cc: ycheng, dave.taht, netdev, codel, therbert, mattmathis, nanditad,
ncardwell, andrewmcgr
In-Reply-To: <1341396687.2583.1757.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 04 Jul 2012 12:11:27 +0200
> sk->sk_wmem_alloc not allowed to grow above a given limit,
> allowing no more than ~4 segments [1] per tcp socket in qdisc layer at a
> given time. (if TSO is enabled, then a single TSO packet hits the limit)
I'm suspicious and anticipate that 10G will need more queueing than
you are able to get away with tg3 at 1G speeds. But it is an exciting
idea nonetheless :-)
^ permalink raw reply
* Re: [PATCH 0/2] [net-next] Marvell sky2 updates
From: David Miller @ 2012-07-09 7:06 UTC (permalink / raw)
To: mlindner; +Cc: shemminger, netdev
In-Reply-To: <1341394709.14972.39.camel@mlindner-lin.skd.de>
Applied, but you must put a "sky2: " prefix in the subject lines
of future patches.
Otherwise someone scanning the commit log summary has no idea what
driver your changes are for.
^ permalink raw reply
* Re: [PATCH] net/macb: manage carrier state with call to netif_carrier_{on|off}()
From: David Miller @ 2012-07-09 7:03 UTC (permalink / raw)
To: nicolas.ferre
Cc: netdev, bhutchings, Arvid.Brodin, kuznet, shemminger,
linux-arm-kernel
In-Reply-To: <1341393253-6531-1-git-send-email-nicolas.ferre@atmel.com>
From: Nicolas Ferre <nicolas.ferre@atmel.com>
Date: Wed, 4 Jul 2012 11:14:13 +0200
> The carrier OFF state is set up in the probe(), open() and suspend() functions.
> The carrier ON state is managed in macb_handle_link_change().
>
> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Applied to net-next, thanks.
^ permalink raw reply
* Re: [PATCH] netem: add limitation to reordered packets
From: David Miller @ 2012-07-09 7:02 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, hagen, msg, aterzis, ycheng
In-Reply-To: <1341384921.2583.1462.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 04 Jul 2012 08:55:21 +0200
> From: Eric Dumazet <edumazet@google.com>
>
> Fix two netem bugs :
>
> 1) When a frame was dropped by tfifo_enqueue(), drop counter
> was incremented twice.
>
> 2) When reordering is triggered, we enqueue a packet without
> checking queue limit. This can OOM pretty fast when this
> is repeated enough, since skbs are orphaned, no socket limit
> can help in this situation.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied, thanks Eric.
^ permalink raw reply
* Re: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed out
From: David Miller @ 2012-07-09 7:00 UTC (permalink / raw)
To: cjren; +Cc: netdev, linux-kernel, qca-linux-team, nic-devel
In-Reply-To: <1341370308-23233-1-git-send-email-cjren@qca.qualcomm.com>
From: <cjren@qca.qualcomm.com>
Date: Wed, 4 Jul 2012 10:51:48 +0800
> Some people report that atl1c can cause a system hang with the following
> kernel trace info:
> ---------------------------------------
> WARNING: at.../net/sched/sch_generic.c:258 dev_watchdog+0x1db/0x1d0()
> ...
> NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
> ...
> ---------------------------------------
> This is caused by calling netif_stop_queue when the cable link is down.
> So remove netif_stop_queue, because link_watch will take it over.
>
> Signed-off-by: xiong <xiong@qca.qualcomm.com>
> Cc: stable <stable@vger.kernel.org>
> Signed-off-by: Cloud Ren <cjren@qca.qualcomm.com>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH] net/fsl_pq_mdio: use spin_event_timeout() to poll the indicator register
From: David Miller @ 2012-07-09 6:59 UTC (permalink / raw)
To: timur; +Cc: afleming, netdev
In-Reply-To: <1341357381-10861-1-git-send-email-timur@freescale.com>
From: Timur Tabi <timur@freescale.com>
Date: Tue, 3 Jul 2012 18:16:21 -0500
> Macro spin_event_timeout() was designed for simple polling of hardware
> registers with a timeout, so use it when we poll the MIIMIND register.
> This allows us to return an error code instead of polling indefinitely.
>
> Note that PHY_INIT_TIMEOUT is a count of loop iterations, so we can't use
> it for spin_event_timeout(), which asks for microseconds.
>
> Signed-off-by: Timur Tabi <timur@freescale.com>
Define a macro for the timeout value rather than use an arbitrary
constant.
> + status = spin_event_timeout(!(in_be32(&regs->miimind) & MIIMIND_BUSY),
> + 1000, 0);
This indentation is absolutely terrible.
> + status = spin_event_timeout(!(in_be32(&regs->miimind) &
> + (MIIMIND_NOTVALID | MIIMIND_BUSY)), 1000, 0);
Same here.
> + status = spin_event_timeout(!(in_be32(&regs->miimind) & MIIMIND_BUSY),
> + 1000, 0);
And here too.
^ permalink raw reply
* Re: [PATCH] etherdevice: introduce broadcast_ether_addr
From: David Miller @ 2012-07-09 6:58 UTC (permalink / raw)
To: johannes-cdvu00un1VgdHxzADdlk8Q
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
linux-wireless-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1341310587.5131.2.camel-8upI4CBIZJIJvtFkdXX2HixXY32XiHfO@public.gmane.org>
From: Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
Date: Tue, 03 Jul 2012 12:16:27 +0200
> From: Johannes Berg <johannes.berg-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>
> A lot of code has either the memset or an
> inefficient copy from a static array that
> contains the all-ones broadcast address.
> Introduce broadcast_ether_addr() to fill
> an address with all ones, making the code
> clearer and allowing us to get rid of the
> various constant arrays.
>
> Signed-off-by: Johannes Berg <johannes.berg-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
I would prefer if this were named "eth_something()", thanks.
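A minimal userspace sketch of the helper under discussion, assuming a 6-byte Ethernet address. The name ex_eth_broadcast_addr() and the constant EX_ETH_ALEN are hypothetical and only illustrate an eth_-style name; they are not the name that was eventually chosen.

```c
#include <stdint.h>
#include <string.h>

#define EX_ETH_ALEN 6	/* length of an Ethernet address in bytes */

/*
 * Fill addr with the all-ones broadcast address (ff:ff:ff:ff:ff:ff).
 * Hypothetical name; replaces open-coded memset() calls and copies
 * from static all-ones arrays.
 */
static void ex_eth_broadcast_addr(uint8_t *addr)
{
	memset(addr, 0xff, EX_ETH_ALEN);
}
```

Callers that previously kept their own constant broadcast array could call such a helper on the destination buffer instead.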
^ permalink raw reply
* Re: [PATCH] sctp: refactor sctp_packet_append_chunk and clenup some memory leaks
From: David Miller @ 2012-07-09 6:54 UTC (permalink / raw)
To: vyasevich; +Cc: nhorman, netdev, linux-sctp
In-Reply-To: <4FF30428.6070403@gmail.com>
From: Vlad Yasevich <vyasevich@gmail.com>
Date: Tue, 03 Jul 2012 10:39:36 -0400
> On 07/02/2012 03:59 PM, Neil Horman wrote:
>> While doing some recent work on sctp sack bundling I noted that
>> sctp_packet_append_chunk was pretty inefficient. Specifically, it was
>> called
>> recursively while trying to bundle auth and sack chunks. Because of
>> that we
>> call sctp_packet_bundle_sack and sctp_packet_bundle_auth a total of 4
>> times for
>> every call to sctp_packet_append_chunk, knowing that at least 3 of
>> those calls
>> will do nothing.
>>
>> So let's refactor sctp_packet_bundle_auth to have an outer part that
>> does the
>> attempted bundling, and an inner part that just does the chunk
>> appends. This
>> saves us several calls per iteration that we just don't need.
>>
>> Also, noticed that the auth and sack bundling fail to free the chunks
>> they
>> allocate if the append fails, so make sure we add that in
>>
>> Signed-off-by: Neil Horman<nhorman@tuxdriver.com>
>> CC: Vlad Yasevich<vyasevich@gmail.com>
>
> Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Applied to net-next, thanks.
^ permalink raw reply
* Re: [PATCH] net: dont use __netdev_alloc_skb for bounce buffer
From: David Miller @ 2012-07-09 6:52 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, stefan.bader
In-Reply-To: <1341254172.22621.456.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 02 Jul 2012 20:36:12 +0200
> From: Eric Dumazet <edumazet@google.com>
>
> commit a1c7fff7e1 (net: netdev_alloc_skb() use build_skb()) broke b44 on
> some 64bit machines.
>
> It appears b44 and b43 use __netdev_alloc_skb() instead of alloc_skb()
> for their bounce buffers.
>
> There is no need to add an extra NET_SKB_PAD reservation for bounce
> buffers :
>
> - In TX path, NET_SKB_PAD is useless
>
> - In RX path in b44, we force a copy of incoming frames if
> GFP_DMA allocations were needed.
>
> Reported-and-bisected-by: Stefan Bader <stefan.bader@canonical.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied, thanks Eric.
^ permalink raw reply
* Re: [patch] [SCSI] bnx2i: use strlcpy() instead of memcpy() for strings
From: David Miller @ 2012-07-09 6:51 UTC (permalink / raw)
To: mchan
Cc: dan.carpenter, David.Laight, JBottomley, barak, eddie.wai,
linux-scsi, netdev
In-Reply-To: <1341242018.7472.5.camel@LTIRV-MCHAN1.corp.ad.broadcom.com>
From: "Michael Chan" <mchan@broadcom.com>
Date: Mon, 2 Jul 2012 08:13:38 -0700
> This came from the net-next tree, so David is the right person to apply
> this. Thanks.
>
> Acked-by: Michael Chan <mchan@broadcom.com>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH v3] ieee802154: verify packet size before trying to allocate it
From: David Miller @ 2012-07-09 6:50 UTC (permalink / raw)
To: levinsasha928
Cc: dbaryshkov, slapin, linux-zigbee-devel, netdev, linux-kernel
In-Reply-To: <1341228595-9883-1-git-send-email-levinsasha928@gmail.com>
From: Sasha Levin <levinsasha928@gmail.com>
Date: Mon, 2 Jul 2012 13:29:55 +0200
> Currently when sending data over datagram, the send function will attempt to
> allocate any size passed on from the userspace.
>
> We should make sure that this size is checked and limited. We'll limit it
> to the MTU of the device, which is checked later anyway.
>
> Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Applied.
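The check described above can be sketched in userspace C as follows. This is an illustration under assumptions: ex_dgram_alloc() is a hypothetical name, malloc() stands in for the skb allocation, and the choice of -EMSGSIZE as the error code is an assumption, not taken from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Validate a user-supplied payload size against the device MTU before
 * allocating a buffer for it. Returns 0 and stores the buffer in *out
 * on success, or a negative errno value on failure.
 */
static int ex_dgram_alloc(size_t size, size_t mtu, void **out)
{
	*out = NULL;

	/* Reject oversized requests before any allocation is attempted. */
	if (size > mtu)
		return -EMSGSIZE;	/* assumed error code */

	*out = malloc(size);
	if (!*out)
		return -ENOMEM;
	return 0;
}
```

The point is the ordering: the size comes from userspace, so the bound is checked first and the allocation only happens for sizes the device could send anyway.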
^ permalink raw reply
* Re: [PATCH net-next 1/2] r8169: support RTL8106E
From: David Miller @ 2012-07-09 6:48 UTC (permalink / raw)
To: hayeswang; +Cc: romieu, netdev, linux-kernel, hayes
In-Reply-To: <1340966060-2749-1-git-send-email-hayeswang@realtek.com>
Francois, what would you like me to do with these two patches? I
haven't seen full ACKs from you yet.
Thanks.
^ permalink raw reply
* Re: [net-next PATCH 02/02] net/ipv4: VTI support new module for ip_vti.
From: David Miller @ 2012-07-09 6:47 UTC (permalink / raw)
To: saurabh.mohan; +Cc: netdev
In-Reply-To: <20120629013017.GA4649@debian-saurabh-64.vyatta.com>
From: Saurabh <saurabh.mohan@vyatta.com>
Date: Thu, 28 Jun 2012 18:30:17 -0700
> +#define HASH_SIZE 16
> +#define HASH(addr) (((__force u32)addr^((__force u32)addr>>4))&0xF)
Define HASH such that it masks with (HASH_SIZE - 1) instead of
0xf, so that if HASH_SIZE is changed everything automatically
still works without having to remember to update the value in
HASH()'s definition too.
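A sketch of what that could look like, written as standalone C so it can be compiled outside the kernel. The EX_ prefix and the ex_hash() wrapper are hypothetical, uint32_t replaces the (__force u32) cast, and the mask only works as intended if the table size is a power of two.

```c
#include <stdint.h>

/* Table size; must be a power of two for the mask below to be valid. */
#define EX_HASH_SIZE 16u

/* Mask derived from the size, so changing EX_HASH_SIZE needs no second edit. */
#define EX_HASH(addr) \
	((((uint32_t)(addr)) ^ (((uint32_t)(addr)) >> 4)) & (EX_HASH_SIZE - 1))

/* Thin wrapper, only here to make the macro easy to call and test. */
static inline unsigned int ex_hash(uint32_t addr)
{
	return EX_HASH(addr);
}
```

With EX_HASH_SIZE at 16 this computes the same values as the 0xF version in the patch.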
> + if (skb->protocol != htons(ETH_P_IP))
> + goto tx_error;
We are really past the point where we can add major inet protocol
features without supporting ipv6 as well.
> + if (IS_ERR(rt)) {
> + dev->stats.tx_carrier_errors++;
> + goto tx_error_icmp;
> + }
> +#ifdef CONFIG_XFRM
> + /* if there is no transform then this tunnel is not functional.
> + * Or if the xfrm is not mode tunnel.
> + */
> + if (!rt->dst.xfrm ||
> + rt->dst.xfrm->props.mode != XFRM_MODE_TUNNEL) {
> + stats->tx_carrier_errors++;
> + goto tx_error_icmp;
> + }
> +#endif
This code in the CONFIG_XFRM block is not indented properly.
And this is a pointless CONFIG_* check, you can't even register
this tunnel outside of the XFRM code. In fact the code already
depends upon INET_XFRM_MODE_TUNNEL which therefore automatically
means that CONFIG_XFRM must be set for this code.
> + }
> +
> +
> + if (tunnel->err_count > 0) {
Get rid of these extra blank lines.
> + }
> +
> +
> + IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED |
Again.
The reason there are long periods of time between my attempts to
review your code (and probably the reason I'm the only person still
reviewing your work at all) is that I know there are going to be so
many problems to let you know about. It's really painful to review
your work and I've spent so much time on the coding style and the
simpler issues that I really haven't considered the high level issues
of what your code is trying to do.
^ permalink raw reply
* [PATCH] netns: correctly use per-netns ipv4 sysctl_tcp_mem
From: Huang Qiang @ 2012-07-09 6:05 UTC (permalink / raw)
To: davem, glommer; +Cc: netdev, containers, yangzhenzhang
From: Yang Zhenzhang <yangzhenzhang@huawei.com>
The kernel now allows each net namespace to independently set up its levels
for tcp memory pressure thresholds.
But there seems to be a bug, which can be reproduced with the following steps:
[root@host socket]# lxc-start -n test -f config /bin/bash
[root@net-test socket]# ip route add default via 192.168.58.2
[root@net-test socket]# echo 0 0 0 > /proc/sys/net/ipv4/tcp_mem
[root@net-test socket]# scp root@192.168.58.174:/home/tcp_mem_test .
The "tcp_mem_test" file is still transferred successfully, although we
expect the transfer to fail.
This is because inet_init() (net/ipv4/af_inet.c) initializes
tcp_prot.sysctl_mem as follows:
tcp_prot.sysctl_mem = init_net.ipv4.sysctl_tcp_mem;
So when the protocol is TCP, sk->sk_prot->sysctl_mem (in the following code)
always uses the ipv4 sysctl_tcp_mem of the init_net namespace rather than
that of its own net namespace.
This patch simply sets "prot" to net->ipv4.sysctl_tcp_mem when
the protocol type is TCP.
Signed-off-by: Yang Zhenzhang <yangzhenzhang@huawei.com>
---
include/net/sock.h | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 4a45216..b62a8d9 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -59,6 +59,7 @@
#include <linux/static_key.h>
#include <linux/aio.h>
#include <linux/sched.h>
+#include <linux/in.h>
#include <linux/filter.h>
#include <linux/rculist_nulls.h>
@@ -1062,7 +1063,12 @@ static inline void sk_enter_memory_pressure(struct sock *sk)
static inline long sk_prot_mem_limits(const struct sock *sk, int index)
{
+ struct net *net = sock_net(sk);
long *prot = sk->sk_prot->sysctl_mem;
+
+ if (sk->sk_protocol == IPPROTO_TCP)
+ prot = net->ipv4.sysctl_tcp_mem;
+
if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
prot = sk->sk_cgrp->sysctl_mem;
return prot[index];
--
1.7.1
^ permalink raw reply related
* Re: [PATCH 2/2] ksz884x: fix Endian
From: Joe Perches @ 2012-07-09 5:44 UTC (permalink / raw)
To: RongQing Li; +Cc: Ben Hutchings, netdev, Tristram.Ha
In-Reply-To: <CAJFZqHzm7=-PpsiNZJ9TgkDY2bt5WW7XwY6nBOa_E4eerRh1pg@mail.gmail.com>
On Mon, 2012-07-09 at 13:26 +0800, RongQing Li wrote:
> 2012/7/7, Ben Hutchings <bhutchings@solarflare.com>:
> > On Thu, 2012-07-05 at 10:06 +0800, roy.qing.li@gmail.com wrote:
> >> ETH_P_IP is host Endian, skb->protocol is big Endian, when
> >> compare them, we should change skb->protocol from big endian
> >> to host endian, ntohs, not htons.
[]
> >> diff --git a/drivers/net/ethernet/micrel/ksz884x.c
[]
> >> @@ -4882,7 +4882,7 @@ static netdev_tx_t netdev_tx(struct sk_buff *skb,
> >> struct net_device *dev)
> >> if (left) {
> >> if (left < num ||
> >> ((CHECKSUM_PARTIAL == skb->ip_summed) &&
> >> - (ETH_P_IPV6 == htons(skb->protocol)))) {
> >> + (ETH_P_IPV6 == ntohs(skb->protocol)))) {
> >
> > This should really be changed to the idiomatic 'skb->protocol ==
> > htons(ETH_P_IPV6)'. For the current code, the compiler will probably
> > generate a run-time byte-swap for little-endian systems.
True. Perhaps this would be better written as:
if (left) {
if (left < num ||
(skb->ip_summed == CHECKSUM_PARTIAL &&
skb->protocol == htons(ETH_P_IPV6))) {
etc...
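The idiom Ben describes can be shown in a small userspace sketch. The constant EX_ETH_P_IPV6 and the function ex_is_ipv6() are hypothetical stand-ins; the value 0x86DD is the IPv6 EtherType. The protocol field is kept in network (big-endian) order and the constant is converted instead, which the compiler can usually do at build time.

```c
#include <arpa/inet.h>
#include <stdint.h>

#define EX_ETH_P_IPV6 0x86DD	/* IPv6 EtherType, host byte order */

/*
 * protocol_be is in network byte order, as skb->protocol is.
 * Convert the constant rather than the variable: the swap of a
 * compile-time constant can be folded away, while swapping the
 * field costs a run-time byte swap on little-endian machines.
 */
static int ex_is_ipv6(uint16_t protocol_be)
{
	return protocol_be == htons(EX_ETH_P_IPV6);
}
```

Both forms give the same answer for 16-bit values, since htons() and ntohs() perform the same swap; the difference is only in where the conversion happens and in how the code reads.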
^ permalink raw reply
* Re: [net-next RFC V5 0/5] Multiqueue virtio-net
From: Jason Wang @ 2012-07-09 5:35 UTC (permalink / raw)
To: Ronen Hod
Cc: krkumar2, habanero, mashirle, kvm, mst, netdev, linux-kernel,
virtualization, edumazet, tahm, jwhan, davem, sri
In-Reply-To: <4FF9429A.8020508@redhat.com>
On 07/08/2012 04:19 PM, Ronen Hod wrote:
> On 07/05/2012 01:29 PM, Jason Wang wrote:
>> Hello All:
>>
>> This series is an updated version of the multiqueue virtio-net driver
>> based on
>> Krishna Kumar's work to let virtio-net use multiple rx/tx queues to
>> do the
>> packet reception and transmission. Please review and comment.
>>
>> Test Environment:
>> - Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, 8 cores 2 numa nodes
>> - Two directly connected 82599
>>
>> Test Summary:
>>
>> - Highlights: huge improvements on TCP_RR test
>
> Hi Jason,
>
> It might be that the good TCP_RR results are due to the large number
> of sessions (50-250). Can you test it also with small number of sessions?
Sure, I will test them.
>
>> - Lowlights: regression on small packet transmission, higher cpu
>> utilization
>> than single queue, need further optimization
>>
>> Analysis of the performance result:
>>
>> - I count the number of packets sending/receiving during the test, and
>> multiqueue show much more ability in terms of packets per second.
>>
>> - For the tx regression, multiqueue sends about 1-2 times more packets
>> than single queue, and the packet sizes are much smaller than with
>> single queue. I suspect tcp does less batching in multiqueue, so I
>> hacked tcp_write_xmit() to force more batching; with that, multiqueue
>> works as well as single queue for both small transmission and throughput
>
> Could it be that since the CPUs are not busy they are available for
> immediate handling of the packets (little batching)? In such scenario
> the CPU utilization is not really interesting. What will happen on a
> busy machine?
>
The regression happens when testing guest transmission in the stream test;
the cpu utilization is 100% in this situation.
> Ronen.
>
>>
>> - I didn't pack the accelerated RFS with virtio-net in this series as
>> it still
>> needs further shaping; for those interested in this, please see:
>> http://www.mail-archive.com/kvm@vger.kernel.org/msg64111.html
>>
>> Changes from V4:
>> - Add ability to negotiate the number of queues through control
>> virtqueue
>> - Ethtool -{L|l} support and default the tx/rx queue number to 1
>> - Expose the API to set irq affinity instead of irq itself
>>
>> Changes from V3:
>>
>> - Rebase to the net-next
>> - Let queue 2 be the control virtqueue to obey the spec
>> - Provides irq affinity
>> - Choose txq based on processor id
>>
>> References:
>>
>> - V4: https://lkml.org/lkml/2012/6/25/120
>> - V3: http://lwn.net/Articles/467283/
>>
>> Test result:
>>
>> 1) 1 vm 2 vcpu 1q vs 2q, 1 - 1q, 2 - 2q, no pinning
>>
>> - Guest to External Host TCP STREAM
>> sessions size throughput1 throughput2 % norm1 norm2 %
>> 1 64 650.55 655.61 100% 24.88 24.86 99%
>> 2 64 1446.81 1309.44 90% 30.49 27.16 89%
>> 4 64 1430.52 1305.59 91% 30.78 26.80 87%
>> 8 64 1450.89 1270.82 87% 30.83 25.95 84%
>> 1 256 1699.45 1779.58 104% 56.75 59.08 104%
>> 2 256 4902.71 3446.59 70% 98.53 62.78 63%
>> 4 256 4803.76 2980.76 62% 97.44 54.68 56%
>> 8 256 5128.88 3158.74 61% 104.68 58.61 55%
>> 1 512 2837.98 2838.42 100% 89.76 90.41 100%
>> 2 512 6742.59 5495.83 81% 155.03 99.07 63%
>> 4 512 9193.70 5900.17 64% 202.84 106.44 52%
>> 8 512 9287.51 7107.79 76% 202.18 129.08 63%
>> 1 1024 4166.42 4224.98 101% 128.55 129.86 101%
>> 2 1024 6196.94 7823.08 126% 181.80 168.81 92%
>> 4 1024 9113.62 9219.49 101% 235.15 190.93 81%
>> 8 1024 9324.25 9402.66 100% 239.10 179.99 75%
>> 1 2048 7441.63 6534.04 87% 248.01 215.63 86%
>> 2 2048 7024.61 7414.90 105% 225.79 219.62 97%
>> 4 2048 8971.49 9269.00 103% 278.94 220.84 79%
>> 8 2048 9314.20 9359.96 100% 268.36 192.23 71%
>> 1 4096 8282.60 8990.08 108% 277.45 320.05 115%
>> 2 4096 9194.80 9293.78 101% 317.02 248.76 78%
>> 4 4096 9340.73 9313.19 99% 300.34 230.35 76%
>> 8 4096 9148.23 9347.95 102% 279.49 199.43 71%
>> 1 16384 8787.89 8766.31 99% 312.38 316.53 101%
>> 2 16384 9306.35 9156.14 98% 319.53 279.83 87%
>> 4 16384 9177.81 9307.50 101% 312.69 230.07 73%
>> 8 16384 9035.82 9188.00 101% 298.32 199.17 66%
>> - TCP RR
>> sessions size throughput1 throughput2 % norm1 norm2 %
>> 50 1 54695.41 84164.98 153% 1957.33 1901.31 97%
>> 100 1 60141.88 88598.94 147% 2157.90 2000.45 92%
>> 250 1 74763.56 135584.22 181% 2541.94 2628.59 103%
>> 50 64 51628.38 82867.50 160% 1872.55 1812.16 96%
>> 100 64 60367.73 84080.60 139% 2215.69 1867.69 84%
>> 250 64 68502.70 124910.59 182% 2321.43 2495.76 107%
>> 50 128 53477.08 77625.07 145% 1905.10 1870.99 98%
>> 100 128 59697.56 74902.37 125% 2230.66 1751.03 78%
>> 250 128 71248.74 133963.55 188% 2453.12 2711.72 110%
>> 50 256 47663.86 67742.63 142% 1880.45 1735.30 92%
>> 100 256 54051.84 68738.57 127% 2123.03 1778.59 83%
>> 250 256 68250.06 124487.90 182% 2321.89 2598.60 111%
>> - External Host to Guest TCP STREAM
>> sessions size throughput1 throughput2 % norm1 norm2 %
>> 1 64 847.71 864.83 102% 57.99 57.93 99%
>> 2 64 1690.82 1544.94 91% 80.13 55.09 68%
>> 4 64 3434.98 3455.53 100% 127.17 89.00 69%
>> 8 64 5890.19 6557.35 111% 194.70 146.52 75%
>> 1 256 2094.04 2109.14 100% 130.73 127.14 97%
>> 2 256 5218.13 3731.97 71% 219.15 114.02 52%
>> 4 256 6734.51 9213.47 136% 227.87 208.31 91%
>> 8 256 6452.86 9402.78 145% 224.83 207.77 92%
>> 1 512 3945.07 4203.68 106% 279.72 273.30 97%
>> 2 512 7878.96 8122.55 103% 278.25 231.71 83%
>> 4 512 7645.89 9402.13 122% 252.10 217.42 86%
>> 8 512 6657.06 9403.71 141% 239.81 214.89 89%
>> 1 1024 5729.06 5111.21 89% 289.38 303.09 104%
>> 2 1024 8097.27 8159.67 100% 269.29 242.97 90%
>> 4 1024 7778.93 8919.02 114% 261.28 205.50 78%
>> 8 1024 6458.02 9360.02 144% 221.26 208.09 94%
>> 1 2048 6426.94 5195.59 80% 292.52 307.47 105%
>> 2 2048 8221.90 9025.66 109% 283.80 242.25 85%
>> 4 2048 7364.72 8527.79 115% 248.10 198.36 79%
>> 8 2048 6760.63 9161.07 135% 230.53 205.12 88%
>> 1 4096 7247.02 6874.21 94% 276.23 287.68 104%
>> 2 4096 8346.04 8818.65 105% 281.49 254.81 90%
>> 4 4096 6710.00 9354.59 139% 216.41 210.13 97%
>> 8 4096 6265.69 9406.87 150% 206.69 210.92 102%
>> 1 16384 8159.50 8048.79 98% 266.94 283.11 106%
>> 2 16384 8525.66 8552.41 100% 294.36 239.27 81%
>> 4 16384 6042.24 8447.86 139% 200.21 196.40 98%
>> 8 16384 6432.63 9403.49 146% 211.48 206.13 97%
>>
>> 2) 1 vm 4 vcpu 1q vs 4q, 1 - 1q, 2 - 4q, no pinning
>>
>> - Guest to External Host TCP STREAM
>> sessions size throughput1 throughput2 % norm1 norm2 %
>> 1 64 636.93 657.69 103% 23.55 24.42 103%
>> 2 64 1457.46 1268.78 87% 30.97 26.02 84%
>> 4 64 3062.86 2302.43 75% 41.00 29.64 72%
>> 8 64 3107.68 2308.32 74% 41.62 29.07 69%
>> 1 256 1743.50 1750.11 100% 59.00 56.63 95%
>> 2 256 4582.61 2870.31 62% 92.47 51.97 56%
>> 4 256 8440.96 4795.37 56% 135.10 56.39 41%
>> 8 256 9240.31 6654.82 72% 144.76 74.89 51%
>> 1 512 2918.25 2735.26 93% 91.08 86.47 94%
>> 2 512 8978.32 5107.95 56% 200.00 94.97 47%
>> 4 512 8850.39 6864.37 77% 190.32 101.09 53%
>> 8 512 9270.30 8483.01 91% 193.44 118.73 61%
>> 1 1024 4416.10 3679.70 83% 135.54 110.63 81%
>> 2 1024 9085.20 8770.48 96% 242.23 175.59 72%
>> 4 1024 9158.57 9011.56 98% 234.39 159.17 67%
>> 8 1024 9345.89 9067.43 97% 233.35 138.73 59%
>> 1 2048 8455.19 6077.94 71% 338.52 190.16 56%
>> 2 2048 9223.32 8237.73 89% 270.00 198.27 73%
>> 4 2048 9080.75 9257.63 101% 261.30 172.80 66%
>> 8 2048 9177.39 8977.10 97% 256.89 147.50 57%
>> 1 4096 8665.35 8394.78 96% 289.63 289.85 100%
>> 2 4096 7850.73 8857.86 112% 253.33 252.62 99%
>> 4 4096 9332.55 8508.37 91% 289.19 151.29 52%
>> 8 4096 8482.30 9146.80 107% 255.41 156.02 61%
>> 1 16384 8825.72 8778.26 99% 314.60 308.89 98%
>> 2 16384 9283.85 8927.40 96% 316.48 246.98 78%
>> 4 16384 7766.95 8708.06 112% 265.25 155.59 58%
>> 8 16384 8945.55 8940.23 99% 298.45 151.32 50%
>> - TCP_RR
>> sessions size throughput1 throughput2 % norm1 norm2 %
>> 50 1 60848.70 81719.39 134% 2196.86 1551.05 70%
>> 100 1 61886.19 81425.02 131% 2215.76 1517.52 68%
>> 250 1 72058.41 162597.84 225% 2441.84 2278.14 93%
>> 50 64 51646.93 74160.10 143% 1861.07 1322.22 71%
>> 100 64 57574.86 83488.26 145% 2076.54 1479.79 71%
>> 250 64 67583.35 138482.15 204% 2314.46 2022.83 87%
>> 50 128 59931.51 71633.03 119% 2244.60 1309.18 58%
>> 100 128 58329.80 73104.90 125% 2202.98 1329.52 60%
>> 250 128 71021.55 161067.73 226% 2469.11 2205.28 89%
>> 50 256 47509.24 64330.24 135% 1915.75 1269.90 66%
>> 100 256 49293.03 68507.94 138% 1939.75 1263.64 65%
>> 250 256 63169.07 138390.68 219% 2255.47 2098.13 93%
>> - External Host to Guest TCP STREAM
>> sessions size throughput1 throughput2 % norm1 norm2 %
>> 1 64 850.18 854.96 100% 56.94 58.25 102%
>> 2 64 1659.12 1730.25 104% 81.65 67.57 82%
>> 4 64 3254.70 3397.17 104% 118.57 76.21 64%
>> 8 64 6251.97 6389.29 102% 207.68 104.21 50%
>> 1 256 2029.14 2105.18 103% 116.45 119.69 102%
>> 2 256 5412.02 4260.32 78% 240.87 139.73 58%
>> 4 256 7777.28 8743.12 112% 263.20 174.65 66%
>> 8 256 6459.51 9388.93 145% 218.94 158.37 72%
>> 1 512 4566.31 4269.30 93% 274.74 289.83 105%
>> 2 512 7444.52 8240.64 110% 286.24 243.74 85%
>> 4 512 7722.29 9391.16 121% 261.96 180.36 68%
>> 8 512 6228.50 9134.52 146% 209.17 161.00 76%
>> 1 1024 4965.50 4953.68 99% 307.64 280.48 91%
>> 2 1024 8270.08 7733.71 93% 288.32 197.04 68%
>> 4 1024 7551.04 9394.58 124% 268.41 206.62 76%
>> 8 1024 6307.78 9179.03 145% 216.67 159.63 73%
>> 1 2048 5741.12 5948.80 103% 290.34 268.66 92%
>> 2 2048 7932.79 8766.05 110% 262.96 215.90 82%
>> 4 2048 6907.55 9255.97 133% 233.56 203.96 87%
>> 8 2048 6037.22 9399.41 155% 197.14 164.09 83%
>> 1 4096 7131.70 7535.10 105% 279.43 275.12 98%
>> 2 4096 8109.17 9348.04 115% 274.29 211.49 77%
>> 4 4096 6878.92 9319.13 135% 244.21 192.06 78%
>> 8 4096 6265.92 9408.35 150% 211.85 159.26 75%
>> 1 16384 8288.01 8596.39 103% 272.85 290.22 106%
>> 2 16384 8166.29 9280.12 113% 277.04 236.61 85%
>> 4 16384 6446.97 9382.22 145% 222.91 187.24 83%
>> 8 16384 6066.98 9405.51 155% 198.98 157.09 78%
>>
>> 3) 2 vms each with 2 vcpus, 1q vs 2q - pin vhost/vcpu in the same node
>>
>> - 2 Guests to External Hosts TCP STREAM
>> sessions size throughput1 throughput2 % norm1 norm2 %
>> 1 64 1442.07 1475.11 102% 30.82 31.21 101%
>> 2 64 3124.87 2900.93 92% 40.29 35.95 89%
>> 4 64 3166.52 2864.04 90% 40.70 35.47 87%
>> 8 64 3141.45 2848.94 90% 40.38 35.34 87%
>> 1 256 3628.54 3711.73 102% 68.47 70.22 102%
>> 2 256 7806.95 7586.69 97% 111.23 84.38 75%
>> 4 256 8823.65 7612.74 86% 132.92 85.04 63%
>> 8 256 9194.89 9373.41 101% 135.98 119.62 87%
>> 1 512 7106.67 7128.00 100% 124.79 124.30 99%
>> 2 512 9190.22 9397.33 102% 180.84 149.34 82%
>> 4 512 9401.01 9376.67 99% 173.00 140.15 81%
>> 8 512 8572.84 9032.90 105% 150.49 127.58 84%
>> 1 1024 9361.93 9379.24 100% 205.81 202.94 98%
>> 2 1024 9386.69 9389.04 100% 201.78 165.75 82%
>> 4 1024 9403.43 9378.54 99% 195.33 152.06 77%
>> 8 1024 9213.63 9180.64 99% 178.99 141.51 79%
>> 1 2048 9338.95 9384.67 100% 223.22 227.86 102%
>> 2 2048 9389.28 9389.45 100% 202.37 170.08 84%
>> 4 2048 9405.86 9388.71 99% 193.76 161.54 83%
>> 8 2048 9352.40 9384.06 100% 189.16 157.06 83%
>> 1 4096 9380.74 9384.90 100% 239.37 241.56 100%
>> 2 4096 9393.47 9376.74 99% 213.84 195.61 91%
>> 4 4096 9393.85 9381.50 99% 198.06 170.18 85%
>> 8 4096 9400.41 9232.31 98% 192.87 163.56 84%
>> 1 16384 9348.18 9335.55 99% 253.02 254.86 100%
>> 2 16384 9384.97 9359.53 99% 218.56 208.59 95%
>> 4 16384 9326.60 9382.15 100% 206.24 179.72 87%
>> 8 16384 9355.82 9392.85 100% 198.22 172.89 87%
>> - TCP RR
>> sessions size throughput1 throughput2 % norm1 norm2 %
>> 50 1 200340.33 261750.19 130% 2935.27 3018.59 102%
>> 100 1 236141.58 266304.49 112% 3452.16 3071.74 88%
>> 250 1 361574.59 320825.08 88% 4972.98 3705.70 74%
>> 50 64 225748.53 242671.12 107% 3011.48 2869.07 95%
>> 100 64 249885.37 260453.72 104% 3240.21 3063.67 94%
>> 250 64 360341.12 310775.60 86% 4682.42 3657.91 78%
>> 50 128 227995.27 289320.38 126% 2950.92 3479.37 117%
>> 100 128 239491.11 291135.77 121% 3099.55 3508.75 113%
>> 250 128 390390.68 362484.35 92% 5042.30 4368.52 86%
>> 50 256 222604.51 317140.97 142% 3058.08 3839.39 125%
>> 100 256 254770.92 335606.03 131% 3326.16 4046.65 121%
>> 250 256 400584.52 436749.22 109% 5220.79 5278.86 101%
>> - External Host to 2 Guests
>> sessions size throughput1 throughput2 % norm1 norm2 %
>> 1 64 1667.99 1684.50 100% 59.66 60.77 101%
>> 2 64 3338.83 3379.97 101% 83.61 64.82 77%
>> 4 64 6613.65 6619.11 100% 131.00 97.19 74%
>> 8 64 6553.07 6418.31 97% 141.35 98.27 69%
>> 1 256 3938.40 4068.52 103% 125.21 123.76 98%
>> 2 256 9215.57 9210.88 99% 185.31 154.27 83%
>> 4 256 9407.29 9008.13 95% 186.72 150.01 80%
>> 8 256 9377.17 9385.57 100% 190.28 137.59 72%
>> 1 512 7360.19 6984.80 94% 214.09 211.66 98%
>> 2 512 9392.91 9401.88 100% 193.92 173.11 89%
>> 4 512 9382.64 9394.34 100% 189.27 145.80 77%
>> 8 512 9308.60 9094.08 97% 189.70 141.26 74%
>> 1 1024 9153.26 9066.06 99% 223.07 219.95 98%
>> 2 1024 9393.38 9398.43 100% 194.02 173.82 89%
>> 4 1024 9395.92 8960.73 95% 192.61 145.82 75%
>> 8 1024 9388.92 9399.08 100% 191.18 143.87 75%
>> 1 2048 9355.32 9240.63 98% 221.50 223.03 100%
>> 2 2048 9395.68 9399.62 100% 193.31 177.21 91%
>> 4 2048 9397.67 9399.56 100% 195.25 157.53 80%
>> 8 2048 9397.89 9401.70 100% 197.57 146.96 74%
>> 1 4096 9375.84 9381.72 100% 223.06 225.06 100%
>> 2 4096 9389.47 9396.00 100% 193.91 197.13 101%
>> 4 4096 9397.45 9400.11 100% 192.33 163.60 85%
>> 8 4096 9105.40 9415.76 103% 192.71 140.41 72%
>> 1 16384 9381.53 9381.40 99% 223.53 225.66 100%
>> 2 16384 9387.90 9395.44 100% 193.34 177.03 91%
>> 4 16384 9397.92 9410.98 100% 195.04 151.14 77%
>> 8 16384 9259.00 9419.48 101% 194.91 153.48 78%
>>
>> 4) Local vm to vm 2 vcpu 1q vs 2q - pin vcpu/thread in the same numa node
>>
>> - VM to VM TCP STREAM
>> sessions size throughput1 throughput2 % norm1 norm2 %
>> 1 64 576.05 576.14 100% 12.25 12.32 100%
>> 2 64 1266.75 1160.04 91% 19.10 16.05 84%
>> 4 64 1267.34 1123.70 88% 19.08 15.51 81%
>> 8 64 1230.88 1174.70 95% 18.53 15.58 84%
>> 1 256 1311.00 1303.02 99% 25.34 25.35 100%
>> 2 256 5400.26 2794.00 51% 75.92 36.43 47%
>> 4 256 5200.67 2818.88 54% 72.81 33.92 46%
>> 8 256 5234.55 2893.74 55% 73.10 34.97 47%
>> 1 512 3244.09 3263.72 100% 56.48 56.65 100%
>> 2 512 8172.16 4661.15 57% 119.05 67.89 57%
>> 4 512 10567.44 7063.25 66% 147.76 77.27 52%
>> 8 512 10477.87 8471.33 80% 145.94 102.91 70%
>> 1 1024 5432.54 5333.99 98% 93.69 92.38 98%
>> 2 1024 12590.24 9259.97 73% 185.37 135.28 72%
>> 4 1024 15600.53 10731.93 68% 222.20 123.60 55%
>> 8 1024 16222.87 10704.85 65% 227.05 113.81 50%
>> 1 2048 6667.61 7484.37 112% 116.75 129.72 111%
>> 2 2048 8180.43 11500.88 140% 137.84 156.64 113%
>> 4 2048 15127.93 14416.16 95% 227.60 154.59 67%
>> 8 2048 16381.79 14794.10 90% 244.29 158.45 64%
>> 1 4096 7375.63 8948.90 121% 131.97 156.57 118%
>> 2 4096 9321.16 14443.21 154% 161.24 163.74 101%
>> 4 4096 13028.45 15984.94 122% 212.78 171.26 80%
>> 8 4096 15611.28 18810.54 120% 245.15 198.65 81%
>> 1 16384 15304.38 14202.08 92% 259.94 244.04 93%
>> 2 16384 15508.97 15913.09 102% 261.30 244.26 93%
>> 4 16384 14859.98 20164.34 135% 248.29 214.26 86%
>> 8 16384 15594.59 19960.99 127% 253.79 211.27 83%
>> - TCP RR
>> sessions size throughput1 throughput2 % norm1 norm2 %
>> 50 1 54972.51 69820.99 127% 1133.58 1063.58 93%
>> 100 1 55847.16 72407.93 129% 1155.73 1024.35 88%
>> 250 1 60066.23 108266.50 180% 1114.30 1323.55 118%
>> 50 64 48727.63 62378.32 128% 1014.29 888.78 87%
>> 100 64 51804.65 69250.51 133% 1077.78 986.97 91%
>> 250 64 61278.68 100015.78 163% 1076.93 1243.18 115%
>> 50 256 51593.29 62046.22 120% 1069.14 871.08 81%
>> 100 256 51647.00 68197.43 132% 1071.66 958.51 89%
>> 250 256 60433.88 99072.59 163% 1072.41 1199.10 111%
>> 50 512 52177.79 66483.77 127% 1082.65 960.82 88%
>> 100 512 50351.67 62537.63 124% 1041.61 876.41 84%
>> 250 512 60510.14 103856.79 171% 1055.21 1245.17 118%
>>
>>
>> Jason Wang (4):
>> virtio_ring: move queue_index to vring_virtqueue
>> virtio: introduce an API to set affinity for a virtqueue
>> virtio_net: multiqueue support
>> virtio_net: support negotiating the number of queues through ctrl vq
>>
>> Krishna Kumar (1):
>> virtio_net: Introduce VIRTIO_NET_F_MULTIQUEUE
>>
>> drivers/net/virtio_net.c | 792 +++++++++++++++++++++++++++++------------
>> drivers/virtio/virtio_mmio.c | 5 +-
>> drivers/virtio/virtio_pci.c | 58 +++-
>> drivers/virtio/virtio_ring.c | 17 +
>> include/linux/virtio.h | 4 +
>> include/linux/virtio_config.h | 21 ++
>> include/linux/virtio_net.h | 10 +
>> 7 files changed, 677 insertions(+), 230 deletions(-)
>>
>
>
^ permalink raw reply
* Re: [PATCH 2/2] ksz884x: fix Endian
From: RongQing Li @ 2012-07-09 5:30 UTC (permalink / raw)
To: Ben Hutchings; +Cc: netdev, Tristram.Ha
In-Reply-To: <CAJFZqHzm7=-PpsiNZJ9TgkDY2bt5WW7XwY6nBOa_E4eerRh1pg@mail.gmail.com>
Please ignore the first reply.
OK, I will change it as Ben suggested.
Thanks
-Roy
^ permalink raw reply
* Re: [PATCH 2/2] ksz884x: fix Endian
From: RongQing Li @ 2012-07-09 5:26 UTC (permalink / raw)
To: Ben Hutchings; +Cc: netdev, Tristram.Ha
In-Reply-To: <1341614416.2923.12.camel@bwh-desktop.uk.solarflarecom.com>
2012/7/7, Ben Hutchings <bhutchings@solarflare.com>:
> On Thu, 2012-07-05 at 10:06 +0800, roy.qing.li@gmail.com wrote:
>> From: Li RongQing <roy.qing.li@gmail.com>
>>
>> ETH_P_IP is host endian and skb->protocol is big endian. When
>> comparing them, we should convert skb->protocol from big endian
>> to host endian with ntohs(), not htons().
>>
>> CC: Tristram Ha <Tristram.Ha@micrel.com>
>> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
>> ---
>> drivers/net/ethernet/micrel/ksz884x.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/micrel/ksz884x.c b/drivers/net/ethernet/micrel/ksz884x.c
>> index eaf9ff0..d9727f7 100644
>> --- a/drivers/net/ethernet/micrel/ksz884x.c
>> +++ b/drivers/net/ethernet/micrel/ksz884x.c
>> @@ -4882,7 +4882,7 @@ static netdev_tx_t netdev_tx(struct sk_buff *skb, struct net_device *dev)
>> if (left) {
>> if (left < num ||
>> ((CHECKSUM_PARTIAL == skb->ip_summed) &&
>> - (ETH_P_IPV6 == htons(skb->protocol)))) {
>> + (ETH_P_IPV6 == ntohs(skb->protocol)))) {
>
> This should really be changed to the idiomatic 'skb->protocol ==
> htons(ETH_P_IPV6)'. For the current code, the compiler will probably
> generate a run-time byte-swap for little-endian systems.
>
> Ben.
>
>> struct sk_buff *org_skb = skb;
>>
>> skb = netdev_alloc_skb(dev, org_skb->len);
>
> --
> Ben Hutchings, Staff Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
>
>
>
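
As a rough illustration of Ben's point (a user-space sketch, not the driver
code): both comparisons below give the same answer, because for a 16-bit
value htons() and ntohs() perform the same operation on any given host. The
difference is where the byte swap lands. The helper names and the
EXAMPLE_ETH_P_IPV6 macro are made up for this sketch; 0x86DD is the value of
the kernel's ETH_P_IPV6.

```c
#include <arpa/inet.h>  /* htons(), ntohs() */
#include <stdint.h>

/* Hypothetical stand-in for the kernel's ETH_P_IPV6 (same value, 0x86DD). */
#define EXAMPLE_ETH_P_IPV6 0x86DD

/*
 * Form used in the patch: the packet field (network byte order) is
 * converted to host order, so the swap applies to a run-time value.
 */
static int is_ipv6_swap_field(uint16_t protocol_be)
{
    return EXAMPLE_ETH_P_IPV6 == ntohs(protocol_be);
}

/*
 * Idiomatic form: the constant is converted to network byte order.
 * A compiler can typically fold that conversion at build time, leaving
 * a plain 16-bit compare against the field.
 */
static int is_ipv6_swap_const(uint16_t protocol_be)
{
    return protocol_be == htons(EXAMPLE_ETH_P_IPV6);
}
```

Both helpers take the field exactly as it would sit in a packet header, so a
caller passes the big-endian value unchanged.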
^ permalink raw reply
* Re: TCP transmit performance regression
From: Ming Lei @ 2012-07-09 5:13 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Network Development, David Miller
In-Reply-To: <1341551764.3265.47.camel@edumazet-glaptop>
On Fri, Jul 6, 2012 at 1:16 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2012-07-06 at 06:58 +0200, Eric Dumazet wrote:
>> On Fri, 2012-07-06 at 08:45 +0800, Ming Lei wrote:
>>
>> > Unfortunately, the patch still brings no improvement to the transmit
>> > performance of beagle-xm.
>>
>> Ah yes, I need to change usbnet as well to be able to fully recycle the
>> big skbs allocated in turbo mode.
>>
>> Right now they are constantly allocated/freed and this sucks if SLAB
>> wants to check poison bytes in debug mode.
>
> In the meantime, you can also use the following patch, which I still have
> to polish. It should give you a nice boost, since the big skb's skb->head
> won't be checked by SLAB debug:
Unfortunately, with the patch the result of the same test is worse than
without it. :-(
Thanks,
--
Ming Lei
^ permalink raw reply
* Re: [RFC] Introduce to batch variants of accept() and epoll_ctl() syscall
From: Li Yu @ 2012-07-09 3:36 UTC (permalink / raw)
To: Eric Dumazet
Cc: Changli Gao, Linux Netdev List, Linux Kernel Mailing List,
davidel
In-Reply-To: <4FF6B20E.7000402@gmail.com>
On 2012-07-06 17:38, Li Yu wrote:
> On 2012-06-15 16:51, Eric Dumazet wrote:
>> On Fri, 2012-06-15 at 13:37 +0800, Li Yu wrote:
>>
>>> Of course, I think that implementing them should not be hard work :)
>>>
>>> Em. I really do not know whether it is necessary to introduce a new
>>> syscall here. An alternative solution is to add a new socket option to
>>> handle such batch requirements, so applications can also detect whether
>>> the kernel has this extended ability with an easy getsockopt() call.
>>>
>>> Anyway, I am going to try to write a prototype first.
>>
>> Before that, could you post the result of "perf top", or "perf
>> record ...;perf report"
>>
>
> Sorry, I only just found time to write a benchmark to reproduce this
> problem on my test bed. Below are the results of "perf record -g -C 0".
> The kernel is 3.4.0:
>
> Events: 7K cycles
> + 54.87% swapper [kernel.kallsyms] [k] poll_idle
> - 3.10% :22984 [kernel.kallsyms] [k] _raw_spin_lock
> - _raw_spin_lock
> - 64.62% sch_direct_xmit
> dev_queue_xmit
> ip_finish_output
> ip_output
> - ip_local_out
> + 49.48% ip_queue_xmit
> + 37.48% ip_build_and_send_pkt
> + 13.04% ip_send_skb
>
> I cannot reproduce exactly the same high CPU usage in my testing
> environment, but top shows a similar ratio of sys% to si% on one
> CPU:
>
> Tasks: 125 total, 2 running, 123 sleeping, 0 stopped, 0 zombie
> Cpu0 : 1.0%us, 30.7%sy, 0.0%ni, 18.8%id, 0.0%wa, 0.0%hi, 49.5%si, 0.0%st
>
> Well, it seems I must acknowledge that I was wrong here. However,
> I recall that I did encounter this once in another benchmark of
> small-packet performance.
>
> I guess this is because the TX softirq and the syscall context contend
> for the same lock in sch_direct_xmit(). Is this right?
>
Em, do we have some means to decrease the lock contention here?
> thanks
>
> Yu
>
>>> top shows that the kernel is the biggest CPU hog. The test is simple,
>>> just an accept() -> epoll_ctl(ADD) loop, and the ratio of CPU
>>> utilization sys% to si% is about 2:5.
>>
>> This ratio is not meaningful if we don't know where the time is spent.
>>
>>
>> I doubt epoll_ctl(ADD) is a problem here...
>>
>> If it is, batching the fds won't speed things up anyway...
>>
>> I believe accept() is the problem here, because it contends with the
>> softirq processing the tcp session handshake.
>>
>>
>>
>>
>
>
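
For readers who want to picture the accept() -> epoll_ctl(ADD) loop under
discussion, here is a minimal user-space sketch. It is not the benchmark
from this thread. The function name accept_and_register is hypothetical,
and it assumes a Linux system and a listening socket that is already in
non-blocking mode. Each new connection costs at least one accept() and one
epoll_ctl() call, which is the per-connection syscall cost the batching
proposal is aimed at.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: drain the accept queue of a non-blocking listening
 * socket and register every new connection with the epoll instance epfd.
 * Returns the number of connections registered, or -1 on the first error.
 * If listen_fd is a blocking socket, the loop blocks in accept() instead
 * of returning once the queue is empty.
 */
static int accept_and_register(int epfd, int listen_fd)
{
    int registered = 0;

    for (;;) {
        int conn_fd = accept(listen_fd, NULL, NULL);
        if (conn_fd < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return registered;      /* accept queue is empty */
            if (errno == EINTR)
                continue;               /* interrupted, try again */
            return -1;                  /* real error, e.g. EBADF */
        }

        /* Make the new connection non-blocking before handing it to epoll. */
        int flags = fcntl(conn_fd, F_GETFL, 0);
        if (flags < 0 || fcntl(conn_fd, F_SETFL, flags | O_NONBLOCK) < 0) {
            close(conn_fd);
            return -1;
        }

        struct epoll_event ev = { .events = EPOLLIN, .data.fd = conn_fd };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, conn_fd, &ev) < 0) {
            close(conn_fd);
            return -1;
        }
        registered++;
    }
}
```

A caller would typically invoke this whenever epoll_wait() reports the
listening socket as readable.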
^ permalink raw reply