* [PATCH] Network Checksum Removal
@ 2005-05-20 23:30 Jon Mason
  2005-05-21 14:53 ` Keir Fraser
  0 siblings, 1 reply; 53+ messages in thread
From: Jon Mason @ 2005-05-20 23:30 UTC
  To: Xen-devel

Currently in Xen, interdomain communication needlessly wastes CPU cycles
calculating and verifying TCP/UDP checksums.  This is unnecessary, as
the possibility of packet corruption between domains is minuscule (and
can be detected in memory via ECC).  Also, domUs are unable to take
advantage of any adapter hardware checksum offload capabilities when
transmitting packets outside of the system.

This patch removes the inter-xen network checksums by using the existing
Linux hardware checksum offload infrastructure.  Reusing it reduced the
changes needed by this patch, and enabled me to easily use hardware
checksum offload on the physical devices.

Here is how the traffic flow now works (generically):
Traffic generated by dom0 will not do the TCP/UDP checksums and will
notify domU of this via the csum bit in netif_rx_response_t.  domU will
check for the csum bit on each incoming packet, and if it is not set it
will verify the checksum.

For traffic generated externally, if rx hardware checksum is available
and enabled, dom0 will notify domU that it is unnecessary to validate
the checksum (provided the checksum is valid) by setting the csum bit.
If domU is not notified that it is unnecessary to validate the
checksum, then domU will do it.
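
The receive-side rule described above can be sketched in a few lines of C.
This is only an illustration, not code from the patch: the helper names are
hypothetical, and placing the csum flag in the lowest bit of the 16-bit word
is an assumption (C bit-field layout is compiler-defined, so the patch's
`csum:1` / `id:15` fields may be laid out differently).

```c
#include <stdbool.h>
#include <stdint.h>

#define CSUM_FLAG 0x0001u /* assumed position of the 1-bit csum field */
#define ID_MAX    0x7fffu /* the id field shrinks from 16 to 15 bits */

/* Pack a 15-bit request id and the csum flag into one 16-bit word. */
static uint16_t pack_id_csum(uint16_t id, bool csum)
{
    return (uint16_t)(((id & ID_MAX) << 1) | (csum ? CSUM_FLAG : 0u));
}

/* Recover the 15-bit id. */
static uint16_t unpack_id(uint16_t word)
{
    return (uint16_t)(word >> 1);
}

/* Recover the csum flag. */
static bool unpack_csum(uint16_t word)
{
    return (word & CSUM_FLAG) != 0;
}

/* Receiver policy: verify in software only when the sender did not
 * mark the packet as "checksum unnecessary". */
static bool must_verify_checksum(uint16_t word)
{
    return !unpack_csum(word);
}
```

One consequence worth noting is that ids above 32767 no longer fit once a
bit is taken for the flag.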

Traffic generated by domU will not do the TCP/UDP checksums and will
notify dom0 of this via the csum bit in netif_tx_request_t.  dom0 will
check for the csum bit on each incoming packet, and if it is set it will
calculate the necessary bits for hardware checksum offload (skb->csum,
which is the offset at which to insert the checksum).  It also sets
skb->ip_summed = CHECKSUM_UNNECESSARY;
skb->flags |= SKB_FDW_NO_CSUM;

ip_summed is set in case the packet is destined for dom0, which
will prevent dom0 from checking the TCP/UDP checksum.  Unfortunately,
this field is stomped on by both routing and bridging.  So I added a new
skb field and a new flag, SKB_FDW_NO_CSUM.  This flag is checked on
transmission, and the fields that have been modified by the
bridging/routing code are corrected.  Once these fields have been
corrected, the adapter (if tx csum capable) or the stack (via
skb_checksum_help()) will calculate the TCP/UDP checksum.
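
The transmit-time correction described above amounts to two lookups on the
IPv4 header: where the transport header starts (the header length field
times 4) and where the checksum field sits inside it.  The sketch below shows
that arithmetic on a raw byte buffer.  It is not the patch's code: the
function names are hypothetical, and the offsets 16 and 6 are the standard
positions of the TCP and UDP checksum fields, written out as constants
instead of using offsetof() on the kernel structures.

```c
#include <stddef.h>
#include <stdint.h>

enum { PROTO_TCP = 6, PROTO_UDP = 17 }; /* IPv4 protocol numbers */

/* Byte offset of the transport header from the start of the IPv4
 * header: the low nibble of byte 0 is the header length in 32-bit
 * words (the ihl field). */
static size_t transport_offset(const uint8_t *ip)
{
    return (size_t)(ip[0] & 0x0f) * 4;
}

/* Offset of the checksum field within the transport header, or -1
 * for protocols this sketch does not handle. */
static int csum_field_offset(const uint8_t *ip)
{
    switch (ip[9]) {            /* protocol field of the IPv4 header */
    case PROTO_TCP:
        return 16;              /* position of the TCP checksum */
    case PROTO_UDP:
        return 6;               /* position of the UDP checksum */
    default:
        return -1;
    }
}
```

For a packet with no IP options the transport header starts at byte 20, so
the TCP checksum would be written at byte 36 of the IP packet.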

Performance:
I ran the following test cases with netperf3 TCP_STREAM, and got the
following throughput boosts (using bridging):
domU->dom0              500Mbps
dom0->domU              10Mbps
domU->remote host       none
domU->domU              70Mbps
Note: I have a small bridging patch which increases dom0 throughput.  I
am in the process of having it accepted into the Linux kernel.

I currently do not have CPU utilization numbers (where the real boost of
this patch would be), and I do not have throughput numbers for
routing/nat.


Also, I added the ability to enable/disable checksum offload via the
ethtool command.  

Signed-off-by: Jon Mason <jdmason@us.ibm.com>

--- ../xen-unstable-pristine/xen/include/public/io/netif.h	2005-05-04 22:20:10.000000000 -0500
+++ xen/include/public/io/netif.h	2005-05-18 12:05:41.000000000 -0500
@@ -12,7 +12,8 @@
 typedef struct {
     memory_t addr;   /*  0: Machine address of packet.  */
     MEMORY_PADDING;
-    u16      id;     /*  8: Echoed in response message. */
+    u16      csum:1;
+    u16      id:15;     /*  8: Echoed in response message. */
     u16      size;   /* 10: Packet size in bytes.       */
 } PACKED netif_tx_request_t; /* 12 bytes */
 
@@ -29,7 +30,8 @@ typedef struct {
 typedef struct {
     memory_t addr;   /*  0: Machine address of packet.              */
     MEMORY_PADDING;
-    u16      id;     /*  8:  */
+    u16      csum:1;
+    u16      id:15;     /*  8:  */
     s16      status; /* 10: -ve: BLKIF_RSP_* ; +ve: Rx'ed pkt size. */
 } PACKED netif_rx_response_t; /* 12 bytes */
 
--- ../xen-unstable-pristine/linux-2.6.11-xen-sparse/drivers/xen/netback/netback.c	2005-05-04 22:20:01.000000000 -0500
+++ linux-2.6.11-xen-sparse/drivers/xen/netback/netback.c	2005-05-19 13:25:50.000000000 -0500
@@ -13,6 +13,9 @@
 #include "common.h"
 #include <asm-xen/balloon.h>
 #include <asm-xen/evtchn.h>
+#include <net/ip.h>
+#include <linux/tcp.h>
+#include <linux/udp.h>
 
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0)
 #include <linux/delay.h>
@@ -154,10 +157,14 @@ int netif_be_start_xmit(struct sk_buff *
         __skb_put(nskb, skb->len);
         (void)skb_copy_bits(skb, -hlen, nskb->data - hlen, skb->len + hlen);
         nskb->dev = skb->dev;
+	nskb->ip_summed = skb->ip_summed;
         dev_kfree_skb(skb);
         skb = nskb;
     }
 
+    if (skb->ip_summed > 0)
+	netif->rx->ring[MASK_NETIF_RX_IDX(netif->rx_resp_prod)].resp.csum = 1;
+	
     netif->rx_req_cons++;
     netif_get(netif);
 
@@ -646,6 +653,18 @@ static void net_tx_action(unsigned long 
         skb->dev      = netif->dev;
         skb->protocol = eth_type_trans(skb, skb->dev);
 
+	skb->csum = 0;
+	if (txreq.csum) {
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+		skb->flags |= SKB_FDW_NO_CSUM;
+		skb->nh.iph = (struct iphdr *) skb->data;
+		if (skb->nh.iph->protocol == IPPROTO_TCP)
+			skb->csum = offsetof(struct tcphdr, check);
+		if (skb->nh.iph->protocol == IPPROTO_UDP)
+			skb->csum = offsetof(struct udphdr, check);
+	} else
+		skb->ip_summed = CHECKSUM_NONE;
+
         netif->stats.rx_bytes += txreq.size;
         netif->stats.rx_packets++;
 
--- ../xen-unstable-pristine/linux-2.6.11-xen-sparse/drivers/xen/netback/interface.c	2005-05-04 22:20:09.000000000 -0500
+++ linux-2.6.11-xen-sparse/drivers/xen/netback/interface.c	2005-05-20 10:36:14.000000000 -0500
@@ -159,6 +159,7 @@ void netif_create(netif_be_create_t *cre
     dev->get_stats       = netif_be_get_stats;
     dev->open            = net_open;
     dev->stop            = net_close;
+    dev->features        = NETIF_F_NO_CSUM;
 
     /* Disable queuing. */
     dev->tx_queue_len = 0;
--- ../xen-unstable-pristine/linux-2.6.11-xen-sparse/drivers/xen/netfront/netfront.c	2005-05-04 22:20:11.000000000 -0500
+++ linux-2.6.11-xen-sparse/drivers/xen/netfront/netfront.c	2005-05-20 13:15:39.000000000 -0500
@@ -40,6 +40,7 @@
 #include <linux/init.h>
 #include <linux/bitops.h>
 #include <linux/proc_fs.h>
+#include <linux/ethtool.h>
 #include <net/sock.h>
 #include <net/pkt_sched.h>
 #include <net/arp.h>
@@ -287,6 +288,11 @@ static int send_fake_arp(struct net_devi
     return dev_queue_xmit(skb);
 }
 
+static struct ethtool_ops network_ethtool_ops = {
+	.get_tx_csum = ethtool_op_get_tx_csum,
+	.set_tx_csum = ethtool_op_set_tx_csum,
+};
+
 static int network_open(struct net_device *dev)
 {
     struct net_private *np = netdev_priv(dev);
@@ -472,6 +478,7 @@ static int network_start_xmit(struct sk_
     tx->id   = id;
     tx->addr = virt_to_machine(skb->data);
     tx->size = skb->len;
+    tx->csum = (skb->ip_summed) ? 1 : 0;
 
     wmb(); /* Ensure that backend will see the request. */
     np->tx->req_prod = i + 1;
@@ -572,6 +579,9 @@ static int netif_poll(struct net_device 
         skb->len  = rx->status;
         skb->tail = skb->data + skb->len;
 
+	if (rx->csum)
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+		
         np->stats.rx_packets++;
         np->stats.rx_bytes += rx->status;
 
@@ -966,7 +976,9 @@ static int create_netdev(int handle, str
     dev->get_stats       = network_get_stats;
     dev->poll            = netif_poll;
     dev->weight          = 64;
-    
+    dev->features 	 = NETIF_F_IP_CSUM;
+    SET_ETHTOOL_OPS(dev, &network_ethtool_ops);
+
     if ((err = register_netdev(dev)) != 0) {
         printk(KERN_WARNING "%s> register_netdev err=%d\n", __FUNCTION__, err);
         goto exit;
--- ../xen-unstable-pristine/linux-2.6.11-xen0/include/linux/skbuff.h	2005-03-02 01:38:38.000000000 -0600
+++ linux-2.6.11-xen0/include/linux/skbuff.h	2005-05-18 12:05:41.000000000 -0500
@@ -37,6 +37,10 @@
 #define CHECKSUM_HW 1
 #define CHECKSUM_UNNECESSARY 2
 
+#define SKB_CLONED	1
+#define SKB_NOHDR	2
+#define SKB_FDW_NO_CSUM	4
+
 #define SKB_DATA_ALIGN(X)	(((X) + (SMP_CACHE_BYTES - 1)) & \
 				 ~(SMP_CACHE_BYTES - 1))
 #define SKB_MAX_ORDER(X, ORDER)	(((PAGE_SIZE << (ORDER)) - (X) - \
@@ -238,7 +242,7 @@ struct sk_buff {
 				mac_len,
 				csum;
 	unsigned char		local_df,
-				cloned,
+				flags,
 				pkt_type,
 				ip_summed;
 	__u32			priority;
@@ -370,7 +374,7 @@ static inline void kfree_skb(struct sk_b
  */
 static inline int skb_cloned(const struct sk_buff *skb)
 {
-	return skb->cloned && atomic_read(&skb_shinfo(skb)->dataref) != 1;
+	return (skb->flags & SKB_CLONED) && atomic_read(&skb_shinfo(skb)->dataref) != 1;
 }
 
 /**
--- ../xen-unstable-pristine/linux-2.6.11-xen0/net/core/skbuff.c	2005-03-02 01:38:17.000000000 -0600
+++ linux-2.6.11-xen0/net/core/skbuff.c	2005-05-18 12:05:41.000000000 -0500
@@ -240,7 +240,7 @@ static void skb_clone_fraglist(struct sk
 
 void skb_release_data(struct sk_buff *skb)
 {
-	if (!skb->cloned ||
+	if (!(skb->flags & SKB_CLONED) ||
 	    atomic_dec_and_test(&(skb_shinfo(skb)->dataref))) {
 		if (skb_shinfo(skb)->nr_frags) {
 			int i;
@@ -352,7 +352,7 @@ struct sk_buff *skb_clone(struct sk_buff
 	C(data_len);
 	C(csum);
 	C(local_df);
-	n->cloned = 1;
+	n->flags = skb->flags | SKB_CLONED;
 	C(pkt_type);
 	C(ip_summed);
 	C(priority);
@@ -395,7 +395,7 @@ struct sk_buff *skb_clone(struct sk_buff
 	C(end);
 
 	atomic_inc(&(skb_shinfo(skb)->dataref));
-	skb->cloned = 1;
+	skb->flags |= SKB_CLONED;
 
 	return n;
 }
@@ -603,7 +603,7 @@ int pskb_expand_head(struct sk_buff *skb
 	skb->mac.raw += off;
 	skb->h.raw   += off;
 	skb->nh.raw  += off;
-	skb->cloned   = 0;
+	skb->flags    &= ~SKB_CLONED;
 	atomic_set(&skb_shinfo(skb)->dataref, 1);
 	return 0;
 
--- ../xen-unstable-pristine/linux-2.6.11-xen0/net/core/dev.c	2005-03-02 01:38:09.000000000 -0600
+++ linux-2.6.11-xen0/net/core/dev.c	2005-05-20 10:20:36.000000000 -0500
@@ -98,6 +98,7 @@
 #include <linux/stat.h>
 #include <linux/if_bridge.h>
 #include <linux/divert.h>
+#include <net/ip.h> 
 #include <net/dst.h>
 #include <net/pkt_sched.h>
 #include <net/checksum.h>
@@ -1182,7 +1183,7 @@ int __skb_linearize(struct sk_buff *skb,
 	skb->data    += offset;
 
 	/* We are no longer a clone, even if we were. */
-	skb->cloned    = 0;
+	skb->flags    &= ~SKB_CLONED;
 
 	skb->tail     += skb->data_len;
 	skb->data_len  = 0;
@@ -1236,6 +1237,15 @@ int dev_queue_xmit(struct sk_buff *skb)
 	    __skb_linearize(skb, GFP_ATOMIC))
 		goto out_kfree_skb;
 
+	/* If packet is forwarded to a device that needs a checksum and not 
+	 * checksummed, correct the pointers and enable checksumming in the 
+	 * next function.
+	 */
+	if (skb->flags & SKB_FDW_NO_CSUM) {
+		skb->ip_summed = CHECKSUM_HW;
+		skb->h.raw = (void *)skb->nh.iph + (skb->nh.iph->ihl * 4);
+	}
+
 	/* If packet is not checksummed and device does not support
 	 * checksumming for this protocol, complete checksumming here.
 	 */

* RE: [PATCH] Network Checksum Removal
@ 2005-05-23 20:22 Ian Pratt
  2005-05-23 20:38 ` Keir Fraser
  2005-05-23 21:01 ` Bin Ren
  0 siblings, 2 replies; 53+ messages in thread
From: Ian Pratt @ 2005-05-23 20:22 UTC
  To: Keir Fraser, bin.ren; +Cc: Andrew Theurer, xen-devel, Jon Mason

> Overall though these are the kind of results I would expect. 
> Linux usually does csumming at the same time as it has to do 
> a copy anyway, and it ends up being limited by 
> memory/L2-cache bandwidth, not the extra computation. But the 
> offload extensions haven't cost much to implement and there 
> are probably cases where it helps a little.
> 
> Maybe I'm being pessimistic though: Can you reproduce the 
> rather more impressive speedups that you previously saw, Jon?

We should be getting some benefit on the receive path, where the
checksum is normally forced to happen independent of a copy. Having this
offloaded to hardware should produce some measurable gain.
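
To make the "checksum as part of a copy" point concrete, here is a minimal
sketch of a combined copy-and-sum loop, assuming the usual 16-bit one's
complement Internet checksum.  The function name is hypothetical and this is
not the kernel's implementation; it only shows why the sum is nearly free
when the data is being touched for a copy anyway, and why a receive path
that does no copy has to pay for a separate pass over the data.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy len bytes from src to dst and return the folded 16-bit one's
 * complement sum of the data (big-endian 16-bit words, odd trailing
 * byte padded with zero).  The checksum itself is the bitwise
 * complement of the returned value. */
static uint16_t copy_and_sum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }
    if (i < len) {              /* odd length: last byte is the high half */
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }
    while (sum >> 16)           /* fold carries back into 16 bits */
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}
```

Each byte is read once and serves both the copy and the sum, which is
consistent with the observation quoted above that the cost is dominated by
memory bandwidth and not by the additions.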

Bin: The numbers you're seeing are terrible anyway. You should be seeing
890Mb/s for external traffic. What kind of machine is this on?

Ian

* RE: [PATCH] Network Checksum Removal
@ 2005-05-23 20:26 Ian Pratt
  0 siblings, 0 replies; 53+ messages in thread
From: Ian Pratt @ 2005-05-23 20:26 UTC
  To: Jon Mason, xen-devel; +Cc: Andrew Theurer, bin.ren

 
> I would if I could.  As I don't use BK, I'll have to wait for 
> the nightly build to pull in your latest fix.

Jon,
Do you know about either of the following?
 http://www.bitkeeper.com/press/2005-03-17.html
 http://sourceforge.net/projects/sourcepuller/

I haven't used either myself, but I'd be interested to know whether they
work.

Thanks,
Ian

* RE: [PATCH] Network Checksum Removal
@ 2005-05-23 23:59 Ian Pratt
  2005-05-24 16:12 ` Jon Mason
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Pratt @ 2005-05-23 23:59 UTC
  To: Jon Mason, xen-devel; +Cc: Andrew Theurer, bin.ren


> I get the following domU->dom0 throughput on my system (using 
> netperf3 TCP_STREAM testcase):
> tx on	~1580Mbps
> tx off	~1230Mbps
> 
> with my previous patch (on Friday's build), I was seeing the 
> following:
> with patch	~1610Mbps
> no patch		~1100Mbps
> 
> The slight difference between the two might be caused by the 
> changes that were incorporated in xen between those dates.  
> If you think it is worth the time, I can back port the latest 
> patch to Friday's build to see if that makes a difference.

Are you sure these aren't within 'experimental error'? I can't think of
anything that's changed since Friday that could be affecting this, but
it would be good to dig a bit further as the difference in 'no patch'
results is quite significant. 
It might be revealing to try running some results on the unpatched
Fri/Sat/Sun tree. 

BTW, dom0<->domU is not that interesting as I'd generally discourage
people from running services in dom0. I'd be really interested to see
the following tests:

domU <-> external [dom0 on cpu0; dom1 on cpu1]
domU <-> external [dom0 on cpu0; dom1 on cpu0]
domU <-> domU [dom0 on cpu0; dom1 on cpu1; dom2 on cpu2 ** on a 4 way]
domU <-> domU [dom0 on cpu0; dom1 on cpu0; dom2 on cpu0 ]
domU <-> domU [dom0 on cpu0; dom1 on cpu1; dom2 on cpu1 ]
domU <-> domU [dom0 on cpu0; dom1 on cpu0; dom2 on cpu1 ]
domU <-> domU [dom0 on cpu0; dom1 on cpu1; dom2 on cpu2 ** cpu2
hyperthread w/ cpu 0]
domU <-> domU [dom0 on cpu0; dom1 on cpu1; dom2 on cpu3 ** cpu3
hyperthread w/ cpu 1]

This might help us understand the performance of interdomain networking
rather better than we do at present. If you could fill a few of these in
that would be great.

Best,
Ian

* RE: [PATCH] Network Checksum Removal
@ 2005-05-24  1:22 Ian Pratt
  2005-05-24  1:35 ` Bin Ren
  2005-05-24 22:54 ` Bin Ren
  0 siblings, 2 replies; 53+ messages in thread
From: Ian Pratt @ 2005-05-24  1:22 UTC
  To: bin.ren, Rolf Neugebauer; +Cc: xen-devel, Andrew Theurer, Jon Mason

> What I'm currently really obsessed with is (1) 
> dom1->external with default BVT gives only ~400Mbps (2) 
> dom1->external with my EEVDF scheduler (everything else is 
> exactly the same) gives 610Mbps, very close to 
> dom0->external. With scheduler latency histograms, it seems 
> to be caused by *far too frequent* context switches in BVT. 
> I'm still digging.

Have you tried SEDF? I'm itching to make it the default scheduler...

Ian

* RE: [PATCH] Network Checksum Removal
@ 2005-05-25 16:48 Ian Pratt
  2005-05-25 17:13 ` Jon Mason
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Pratt @ 2005-05-25 16:48 UTC
  To: Andrew Theurer, Jon Mason, xen-devel; +Cc: bin.ren


What does the tx hw csum control actually turn on and off?

I'm surprised there's much benefit to csum offload on the tx side at all
as it's almost always done as part of a copy.

I'd have thought the main benefit of csum offload was on the rx side, so
that packets received by the NIC are hardware csum'ed, passed through
the bridge, and then into the domU where the csum re-calculation is
avoided [it would normally need to be done before the TCP ack is sent,
and can't be done as part of a copy as the data won't be moved out of
the skb until the user app does a read].  The same rx csum check will be
avoided and hence provide benefit to domU <-> domU transfers.

In the figures below, which direction is the data stream heading? (I
presume it's a one way test, like ttcp?)

It's somewhat surprising that the dom0 bridge code is burning so much
CPU. xenoprofile results will be quite interesting to see what functions
are eating the CPU.

Ultimately, the best way of doing domU <-> domU networking will be to
allow point-to-point connections where netfronts are connected direct to
other netfronts if the hosts are on the same machine. However, the
priority for 3.0 is to optimise the normal front-back-bridge-back-front
path.

Thanks,
Ian

> -----Original Message-----
> From: Andrew Theurer [mailto:habanero@us.ibm.com] 
> Sent: 25 May 2005 15:39
> To: Jon Mason; xen-devel@lists.xensource.com
> Cc: Ian Pratt; bin.ren@cl.cam.ac.uk
> Subject: Re: [Xen-devel] [PATCH] Network Checksum Removal
> 
> Tests for domU->dom0, domU->host, and domU->domU are completed:
> 
> 3.2 GHz Xeon with Hyper-Threading, 2GB (correction) memory
>
> Benchmark: netperf2 -T TCP_STREAM
>
> dom0, dom1, and dom2 on cpu0 (first SMT thread on first core)
>  domU to host
>   hw tx csum
>    msg-size: 00064  Mbps: 0186  d0-cpu: 49.38  d1-cpu: 44.35
>    msg-size: 01500  Mbps: 0917  d0-cpu: 62.13  d1-cpu: 37.87
>    msg-size: 16384  Mbps: 0933  d0-cpu: 66.63  d1-cpu: 33.37
>    msg-size: 32768  Mbps: 0928  d0-cpu: 66.96  d1-cpu: 32.66
>   sw tx csum
>    msg-size: 00064  Mbps: 0187  d0-cpu: 49.50  d1-cpu: 44.52
>    msg-size: 01500  Mbps: 0904  d0-cpu: 60.63  d1-cpu: 39.36
>    msg-size: 16384  Mbps: 0924  d0-cpu: 63.98  d1-cpu: 35.98
>    msg-size: 32768  Mbps: 0926  d0-cpu: 64.18  d1-cpu: 35.68
> 	^^about 2% reduction in cpu util on dom1^^
>  domU to dom0
>   hw tx csum
>    msg-size: 00064  Mbps: 0014  d0-cpu: 64.02  d1-cpu: 31.71
>    msg-size: 01500  Mbps: 1087  d0-cpu: 63.34  d1-cpu: 36.67
>    msg-size: 16384  Mbps: 1204  d0-cpu: 67.30  d1-cpu: 32.71
>    msg-size: 32768  Mbps: 1148  d0-cpu: 68.08  d1-cpu: 31.93
>   sw tx csum
>    msg-size: 00064  Mbps: 0014  d0-cpu: 64.88  d1-cpu: 32.39
>    msg-size: 01500  Mbps: 0948  d0-cpu: 62.20  d1-cpu: 37.80
>    msg-size: 16384  Mbps: 1063  d0-cpu: 64.73  d1-cpu: 35.27
>    msg-size: 32768  Mbps: 1012  d0-cpu: 65.71  d1-cpu: 34.30
> 	^^up to 13% throughput increase with cpu util down ~2% on dom1^^
> 	  Note the dismal performance for very small msg sizes
>  domU to domU
>   hw tx csum
>    msg-size: 00064  Mbps: 0359  d0-cpu: 27.85  d1-cpu: 53.68  d2-cpu: 18.48
>    msg-size: 01500  Mbps: 0594  d0-cpu: 47.42  d1-cpu: 21.77  d2-cpu: 30.78
>    msg-size: 16384  Mbps: 0619  d0-cpu: 49.66  d1-cpu: 18.81  d2-cpu: 31.53
>    msg-size: 32768  Mbps: 0616  d0-cpu: 49.58  d1-cpu: 18.68  d2-cpu: 31.74
>   sw tx csum
>    msg-size: 00064  Mbps: 0361  d0-cpu: 27.81  d1-cpu: 53.58  d2-cpu: 18.62
>    msg-size: 01500  Mbps: 0584  d0-cpu: 46.22  d1-cpu: 23.18  d2-cpu: 30.60
>    msg-size: 16384  Mbps: 0602  d0-cpu: 47.99  d1-cpu: 20.33  d2-cpu: 31.69
>    msg-size: 32768  Mbps: 0603  d0-cpu: 47.67  d1-cpu: 20.59  d2-cpu: 31.74
> 	^^About a 2% throughput increase, and cpu down on d1
> 	  The cpu wasted on dom1 should be enough justification for
> 	  domU<->domU communication with point to point front end driver
> 	  communication.  
> dom0 on cpu0, dom1 on cpu2, and dom2 on cpu3 (dom1 and dom2 on same core)
>  domU to host
>   hw tx csum
>    msg-size: 00064  Mbps: 0540  d0-cpu: 92.98  d1-cpu: 100.00
>    msg-size: 01500  Mbps: 0941  d0-cpu: 99.74  d1-cpu: 48.62
>    msg-size: 16384  Mbps: 0941  d0-cpu: 99.71  d1-cpu: 43.32
>    msg-size: 32768  Mbps: 0941  d0-cpu: 99.72  d1-cpu: 43.21
>   sw tx csum
>    msg-size: 00064  Mbps: 0545  d0-cpu: 93.47  d1-cpu: 100.00
>    msg-size: 01500  Mbps: 0941  d0-cpu: 99.76  d1-cpu: 51.43
>    msg-size: 16384  Mbps: 0941  d0-cpu: 99.69  d1-cpu: 46.58
>    msg-size: 32768  Mbps: 0941  d0-cpu: 99.72  d1-cpu: 45.39
> 	^^Finally at wire speed, but at a cost of 100% cpu on dom0
> 	  This cpu util seems excessive, maybe oprofile will show
> 	  some problems.  Notice dom1 has ~2% lower cpu.
>  domU to dom0
>   hw tx csum
>    msg-size: 00064  Mbps: 0390  d0-cpu: 97.92  d1-cpu: 100.00
>    msg-size: 01500  Mbps: 1571  d0-cpu: 97.36  d1-cpu: 54.83
>    msg-size: 16384  Mbps: 1582  d0-cpu: 96.20  d1-cpu: 49.93
>    msg-size: 32768  Mbps: 1596  d0-cpu: 96.32  d1-cpu: 49.63
>   sw tx csum
>    msg-size: 00064  Mbps: 0375  d0-cpu: 97.65  d1-cpu: 100.00 
>    msg-size: 01500  Mbps: 1546  d0-cpu: 96.36  d1-cpu: 52.99
>    msg-size: 16384  Mbps: 1598  d0-cpu: 95.88  d1-cpu: 47.48
>    msg-size: 32768  Mbps: 1641  d0-cpu: 95.89  d1-cpu: 46.37 
> 	^^very slightly better avg throughput, and lower cpu on dom1
>  domU to domU
>   hw tx csum
>    msg-size: 00064  Mbps: 0287  d0-cpu: 84.97  d1-cpu: 100.0  d2-cpu: 75.46
>    msg-size: 01500  Mbps: 1004  d0-cpu: 90.98  d1-cpu: 68.29  d2-cpu: 76.94
>    msg-size: 16384  Mbps: 1018  d0-cpu: 89.78  d1-cpu: 60.82  d2-cpu: 78.12
>    msg-size: 32768  Mbps: 1010  d0-cpu: 89.30  d1-cpu: 59.83  d2-cpu: 77.99
>   sw tx csum
>    msg-size: 00064  Mbps: 0286  d0-cpu: 84.81  d1-cpu: 99.93  d2-cpu: 76.28
>    msg-size: 01500  Mbps: 1018  d0-cpu: 91.30  d1-cpu: 67.27  d2-cpu: 75.08
>    msg-size: 16384  Mbps: 1012  d0-cpu: 88.46  d1-cpu: 55.56  d2-cpu: 71.37
>    msg-size: 32768  Mbps: 1017  d0-cpu: 88.33  d1-cpu: 54.96  d2-cpu: 70.96
> 	^^about same throughput, but ~4% lower cpu on d1
> 	  Again, point to point front end comms would be great here.
> 
> 
> IMO, I think the patch is a good thing.  There are other very major 
> issues with networking, like the massive cpu overhead for dom0.  I 
> wonder if we could have a layer 2 networking model like:
> 
> -Xen has front end ethernet drivers only
> -dom0 has a Xen bridge front end driver, just to put eth0 (or whatever
>  phys dev) on it.
> -no domain hosted bridge device or backend ethernet drivers
>  
> With this, Xen acts as an ethernet "switch", switching ethernet traffic
> in xen itself, without the help of a domain hosted bridge.  Packets are
> forwarded to either a domain's front end driver, or the front end
> bridge interface in dom0 (or any other driver domain).  With this we
> may have better control of emulating offload functions, and we should
> avoid some hops (and in many cases involving dom0) for the network
> traffic.  Comments?
> 
> -Andrew

* RE: [PATCH] Network Checksum Removal
@ 2005-05-25 20:06 Ian Pratt
  2005-05-25 21:14 ` Keir Fraser
  2005-05-25 21:38 ` Cédric Schieli
  0 siblings, 2 replies; 53+ messages in thread
From: Ian Pratt @ 2005-05-25 20:06 UTC
  To: Jon Mason, xen-devel; +Cc: Andrew Theurer, bin.ren

 


> > I'm surprised there's much benefit to csum offload on the tx side at
> > all as it's almost always done as part of a copy.
> 
> Why?  The tx checksumming is just as expensive as the rx checksumming.

[Nivedita has already posted a nice explanation.]
 

> There is a patch on netdev which can decrease the CPU load of
> bridging.  Specifically, it allows the bridge device to take advantage
> of the network device features (like hardware checksum
> offload).  Stephen Hemminger says it should go in the 2.6.13 kernel.

Please can you post it as a patch so that we can include it in our
2.6.11 patches directory.

With the patch, csum offload will be much more interesting in the rx
case.

Thanks,
Ian


end of thread, other threads:[~2005-05-26 13:37 UTC]

Thread overview: 53+ messages
2005-05-20 23:30 [PATCH] Network Checksum Removal Jon Mason
2005-05-21 14:53 ` Keir Fraser
2005-05-21 19:16   ` Keir Fraser
2005-05-21 21:49     ` Jon Mason
2005-05-23 15:29     ` Andrew Theurer
2005-05-23 15:31       ` Bin Ren
2005-05-23 15:47         ` Andrew Theurer
2005-05-23 15:56           ` Bin Ren
2005-05-23 16:06             ` Bin Ren
2005-05-23 16:16               ` Jon Mason
2005-05-23 16:36                 ` Bin Ren
2005-05-23 17:54                   ` Keir Fraser
2005-05-23 18:08                     ` Bin Ren
2005-05-23 18:18                       ` Jon Mason
2005-05-23 18:43                         ` Keir Fraser
2005-05-23 18:53                           ` Bin Ren
2005-05-23 19:55                     ` Bin Ren
2005-05-23 20:13                       ` Keir Fraser
2005-05-23 20:20                         ` Jon Mason
2005-05-23 21:52                         ` Bin Ren
2005-05-23 21:58                         ` Jon Mason
2005-05-23 22:05                           ` Bin Ren
2005-05-23 22:41                             ` Jon Mason
2005-05-23 21:12                       ` Nivedita Singhvi
2005-05-23 21:48                         ` Bin Ren
2005-05-23 23:55                           ` Rolf Neugebauer
2005-05-24  0:38                             ` Bin Ren
  -- strict thread matches above, loose matches on Subject: below --
2005-05-23 20:22 Ian Pratt
2005-05-23 20:38 ` Keir Fraser
2005-05-23 20:44   ` Jon Mason
2005-05-23 21:01 ` Bin Ren
2005-05-23 21:09   ` Andrew Theurer
2005-05-23 20:26 Ian Pratt
2005-05-23 23:59 Ian Pratt
2005-05-24 16:12 ` Jon Mason
2005-05-24 20:45   ` Andrew Theurer
2005-05-25 14:38     ` Andrew Theurer
2005-05-24  1:22 Ian Pratt
2005-05-24  1:35 ` Bin Ren
2005-05-24 22:54 ` Bin Ren
2005-05-25 16:48 Ian Pratt
2005-05-25 17:13 ` Jon Mason
2005-05-25 18:19   ` Nivedita Singhvi
2005-05-25 20:06 Ian Pratt
2005-05-25 21:14 ` Keir Fraser
2005-05-25 21:35   ` Jon Mason
2005-05-25 21:40     ` Keir Fraser
2005-05-25 23:41       ` Jon Mason
2005-05-26  8:07         ` Keir Fraser
2005-05-26 13:37           ` Jon Mason
2005-05-25 21:38 ` Cédric Schieli
2005-05-25 21:47   ` Keir Fraser
2005-05-25 21:54     ` Keir Fraser
