* [PATCH] Network Checksum Removal
@ 2005-05-20 23:30 Jon Mason
  2005-05-21 14:53 ` Keir Fraser
  0 siblings, 1 reply; 53+ messages in thread
From: Jon Mason @ 2005-05-20 23:30 UTC (permalink / raw)
  To: Xen-devel

Currently in Xen, interdomain communication needlessly wastes CPU cycles
calculating and verifying TCP/UDP checksums.  This is unnecessary, as
the possibility of packet corruption between domains is minuscule (and
can be detected in memory via ECC).  Also, domUs are unable to take
advantage of any adapter hardware checksum offload capabilities when
transmitting packets outside of the system.

This patch removes the interdomain network checksums by using the existing
Linux hardware checksum offload infrastructure.  This reduced the number
of changes needed by this patch and made it easy to use hardware checksum
offload on the physical devices.

Here is how the traffic flow now works (generically):
Traffic generated by dom0 will not compute the TCP/UDP checksums; dom0
notifies domU of this via the csum bit in netif_rx_response_t.  domU
checks the csum bit on each incoming packet, and if it is not set it
verifies the checksum.

For traffic generated externally, if rx hardware checksum is available and
enabled, dom0 will notify domU that it is unnecessary to validate the
checksum (provided the checksum is valid) by setting the csum bit.  If
domU is not notified that validation is unnecessary, it will validate
the checksum itself.

Traffic generated by domU will not compute the TCP/UDP checksums; domU
notifies dom0 of this via the csum bit in netif_tx_request_t.  dom0
checks the csum bit on each incoming packet, and if it is set it
calculates the values needed for hardware checksum offload (skb->csum,
which is the offset at which to insert the checksum).  It also sets:
    skb->ip_summed = CHECKSUM_UNNECESSARY;
    skb->flags |= SKB_FDW_NO_CSUM;

ip_summed is set for the case where the packet is destined for dom0, which
prevents dom0 from checking the TCP/UDP checksum.  Unfortunately,
ip_summed is overwritten by both routing and bridging.  So I added a new
skb field (flags) and a new flag, SKB_FDW_NO_CSUM.  The flag is checked on
transmission, where the fields that were modified by the bridging/routing
code are corrected.  Once these fields have been corrected, the adapter
(if tx csum capable) or the stack (via skb_checksum_help()) will
calculate the TCP/UDP checksum.
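
The transmit-side fix-up described above can be sketched in userspace C.
This is only an illustration under stated assumptions: struct pkt,
fixup_for_tx() and the local CSUM_*/FWD_NO_CSUM constants are hypothetical
stand-ins for the skb fields, and the checksum-field offsets (16 for TCP,
6 for UDP) come from the standard header layouts rather than from kernel
headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the skb state touched by the patch. */
enum { CSUM_NONE = 0, CSUM_HW = 1, CSUM_UNNECESSARY = 2 };
#define FWD_NO_CSUM 4u

struct pkt {
    const uint8_t *ip;     /* start of the IPv4 header                  */
    size_t transport_off;  /* offset of the TCP/UDP header from ip      */
    size_t csum_off;       /* offset of the checksum field inside the
                              transport header                          */
    unsigned flags;
    int ip_summed;
};

/*
 * If the packet was flagged as "forwarded without a checksum", recompute
 * the transport header position from the IPv4 IHL field and request that
 * the checksum be filled in later (by the device or by the stack).
 * Returns 0 on success, -1 for a malformed header or unsupported protocol.
 */
static int fixup_for_tx(struct pkt *p)
{
    unsigned ihl;

    if (!(p->flags & FWD_NO_CSUM))
        return 0;                      /* nothing to correct */

    ihl = p->ip[0] & 0x0f;             /* header length in 32-bit words */
    if (ihl < 5)
        return -1;

    switch (p->ip[9]) {                /* IPv4 protocol field */
    case 6:  p->csum_off = 16; break;  /* TCP checksum field  */
    case 17: p->csum_off = 6;  break;  /* UDP checksum field  */
    default: return -1;
    }

    p->transport_off = ihl * 4u;
    p->ip_summed = CSUM_HW;
    return 0;
}
```

For a 20-byte IPv4 header carrying TCP, fixup_for_tx() leaves
transport_off at 20, csum_off at 16 and ip_summed at CSUM_HW; a packet
without the flag is left untouched.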

Performance:
I ran the following test cases with netperf3 TCP_STREAM and got the
following throughput boosts (using bridging):
domU->dom0          500Mbps
dom0->domU          10Mbps
domU->remote host   none
domU->domU          70Mbps
Note: I have a small bridging patch which increases dom0 throughput.  I
am in the process of having it accepted into the Linux kernel.

I currently do not have CPU utilization numbers (where the real boost of
this patch would be), and I do not have throughput numbers for
routing/nat.


Also, I added the ability to enable/disable checksum offload via the
ethtool command.  

Signed-off-by: Jon Mason <jdmason@us.ibm.com>

--- ../xen-unstable-pristine/xen/include/public/io/netif.h	2005-05-04 22:20:10.000000000 -0500
+++ xen/include/public/io/netif.h	2005-05-18 12:05:41.000000000 -0500
@@ -12,7 +12,8 @@
 typedef struct {
     memory_t addr;   /*  0: Machine address of packet.  */
     MEMORY_PADDING;
-    u16      id;     /*  8: Echoed in response message. */
+    u16      csum:1;
+    u16      id:15;     /*  8: Echoed in response message. */
     u16      size;   /* 10: Packet size in bytes.       */
 } PACKED netif_tx_request_t; /* 12 bytes */
 
@@ -29,7 +30,8 @@ typedef struct {
 typedef struct {
     memory_t addr;   /*  0: Machine address of packet.              */
     MEMORY_PADDING;
-    u16      id;     /*  8:  */
+    u16      csum:1;
+    u16      id:15;     /*  8:  */
     s16      status; /* 10: -ve: BLKIF_RSP_* ; +ve: Rx'ed pkt size. */
 } PACKED netif_rx_response_t; /* 12 bytes */
 
--- ../xen-unstable-pristine/linux-2.6.11-xen-sparse/drivers/xen/netback/netback.c	2005-05-04 22:20:01.000000000 -0500
+++ linux-2.6.11-xen-sparse/drivers/xen/netback/netback.c	2005-05-19 13:25:50.000000000 -0500
@@ -13,6 +13,9 @@
 #include "common.h"
 #include <asm-xen/balloon.h>
 #include <asm-xen/evtchn.h>
+#include <net/ip.h>
+#include <linux/tcp.h>
+#include <linux/udp.h>
 
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0)
 #include <linux/delay.h>
@@ -154,10 +157,14 @@ int netif_be_start_xmit(struct sk_buff *
         __skb_put(nskb, skb->len);
         (void)skb_copy_bits(skb, -hlen, nskb->data - hlen, skb->len + hlen);
         nskb->dev = skb->dev;
+	nskb->ip_summed = skb->ip_summed;
         dev_kfree_skb(skb);
         skb = nskb;
     }
 
+    if (skb->ip_summed > 0)
+	netif->rx->ring[MASK_NETIF_RX_IDX(netif->rx_resp_prod)].resp.csum = 1;
+	
     netif->rx_req_cons++;
     netif_get(netif);
 
@@ -646,6 +653,18 @@ static void net_tx_action(unsigned long 
         skb->dev      = netif->dev;
         skb->protocol = eth_type_trans(skb, skb->dev);
 
+	skb->csum = 0;
+	if (txreq.csum) {
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+		skb->flags |= SKB_FDW_NO_CSUM;
+		skb->nh.iph = (struct iphdr *) skb->data;
+		if (skb->nh.iph->protocol == IPPROTO_TCP)
+			skb->csum = offsetof(struct tcphdr, check);
+		if (skb->nh.iph->protocol == IPPROTO_UDP)
+			skb->csum = offsetof(struct udphdr, check);
+	} else
+		skb->ip_summed = CHECKSUM_NONE;
+
         netif->stats.rx_bytes += txreq.size;
         netif->stats.rx_packets++;
 
--- ../xen-unstable-pristine/linux-2.6.11-xen-sparse/drivers/xen/netback/interface.c	2005-05-04 22:20:09.000000000 -0500
+++ linux-2.6.11-xen-sparse/drivers/xen/netback/interface.c	2005-05-20 10:36:14.000000000 -0500
@@ -159,6 +159,7 @@ void netif_create(netif_be_create_t *cre
     dev->get_stats       = netif_be_get_stats;
     dev->open            = net_open;
     dev->stop            = net_close;
+    dev->features        = NETIF_F_NO_CSUM;
 
     /* Disable queuing. */
     dev->tx_queue_len = 0;
--- ../xen-unstable-pristine/linux-2.6.11-xen-sparse/drivers/xen/netfront/netfront.c	2005-05-04 22:20:11.000000000 -0500
+++ linux-2.6.11-xen-sparse/drivers/xen/netfront/netfront.c	2005-05-20 13:15:39.000000000 -0500
@@ -40,6 +40,7 @@
 #include <linux/init.h>
 #include <linux/bitops.h>
 #include <linux/proc_fs.h>
+#include <linux/ethtool.h>
 #include <net/sock.h>
 #include <net/pkt_sched.h>
 #include <net/arp.h>
@@ -287,6 +288,11 @@ static int send_fake_arp(struct net_devi
     return dev_queue_xmit(skb);
 }
 
+static struct ethtool_ops network_ethtool_ops = {
+	.get_tx_csum = ethtool_op_get_tx_csum,
+	.set_tx_csum = ethtool_op_set_tx_csum,
+};
+
 static int network_open(struct net_device *dev)
 {
     struct net_private *np = netdev_priv(dev);
@@ -472,6 +478,7 @@ static int network_start_xmit(struct sk_
     tx->id   = id;
     tx->addr = virt_to_machine(skb->data);
     tx->size = skb->len;
+    tx->csum = (skb->ip_summed) ? 1 : 0;
 
     wmb(); /* Ensure that backend will see the request. */
     np->tx->req_prod = i + 1;
@@ -572,6 +579,9 @@ static int netif_poll(struct net_device 
         skb->len  = rx->status;
         skb->tail = skb->data + skb->len;
 
+	if (rx->csum)
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+		
         np->stats.rx_packets++;
         np->stats.rx_bytes += rx->status;
 
@@ -966,7 +976,9 @@ static int create_netdev(int handle, str
     dev->get_stats       = network_get_stats;
     dev->poll            = netif_poll;
     dev->weight          = 64;
-    
+    dev->features 	 = NETIF_F_IP_CSUM;
+    SET_ETHTOOL_OPS(dev, &network_ethtool_ops);
+
     if ((err = register_netdev(dev)) != 0) {
         printk(KERN_WARNING "%s> register_netdev err=%d\n", __FUNCTION__, err);
         goto exit;
--- ../xen-unstable-pristine/linux-2.6.11-xen0/include/linux/skbuff.h	2005-03-02 01:38:38.000000000 -0600
+++ linux-2.6.11-xen0/include/linux/skbuff.h	2005-05-18 12:05:41.000000000 -0500
@@ -37,6 +37,10 @@
 #define CHECKSUM_HW 1
 #define CHECKSUM_UNNECESSARY 2
 
+#define SKB_CLONED	1
+#define SKB_NOHDR	2
+#define SKB_FDW_NO_CSUM	4
+
 #define SKB_DATA_ALIGN(X)	(((X) + (SMP_CACHE_BYTES - 1)) & \
 				 ~(SMP_CACHE_BYTES - 1))
 #define SKB_MAX_ORDER(X, ORDER)	(((PAGE_SIZE << (ORDER)) - (X) - \
@@ -238,7 +242,7 @@ struct sk_buff {
 				mac_len,
 				csum;
 	unsigned char		local_df,
-				cloned,
+				flags,
 				pkt_type,
 				ip_summed;
 	__u32			priority;
@@ -370,7 +374,7 @@ static inline void kfree_skb(struct sk_b
  */
 static inline int skb_cloned(const struct sk_buff *skb)
 {
-	return skb->cloned && atomic_read(&skb_shinfo(skb)->dataref) != 1;
+	return (skb->flags & SKB_CLONED) && atomic_read(&skb_shinfo(skb)->dataref) != 1;
 }
 
 /**
--- ../xen-unstable-pristine/linux-2.6.11-xen0/net/core/skbuff.c	2005-03-02 01:38:17.000000000 -0600
+++ linux-2.6.11-xen0/net/core/skbuff.c	2005-05-18 12:05:41.000000000 -0500
@@ -240,7 +240,7 @@ static void skb_clone_fraglist(struct sk
 
 void skb_release_data(struct sk_buff *skb)
 {
-	if (!skb->cloned ||
+	if (!(skb->flags & SKB_CLONED) ||
 	    atomic_dec_and_test(&(skb_shinfo(skb)->dataref))) {
 		if (skb_shinfo(skb)->nr_frags) {
 			int i;
@@ -352,7 +352,7 @@ struct sk_buff *skb_clone(struct sk_buff
 	C(data_len);
 	C(csum);
 	C(local_df);
-	n->cloned = 1;
+	n->flags = skb->flags | SKB_CLONED;
 	C(pkt_type);
 	C(ip_summed);
 	C(priority);
@@ -395,7 +395,7 @@ struct sk_buff *skb_clone(struct sk_buff
 	C(end);
 
 	atomic_inc(&(skb_shinfo(skb)->dataref));
-	skb->cloned = 1;
+	skb->flags |= SKB_CLONED;
 
 	return n;
 }
@@ -603,7 +603,7 @@ int pskb_expand_head(struct sk_buff *skb
 	skb->mac.raw += off;
 	skb->h.raw   += off;
 	skb->nh.raw  += off;
-	skb->cloned   = 0;
+	skb->flags    &= ~SKB_CLONED;
 	atomic_set(&skb_shinfo(skb)->dataref, 1);
 	return 0;
 
--- ../xen-unstable-pristine/linux-2.6.11-xen0/net/core/dev.c	2005-03-02 01:38:09.000000000 -0600
+++ linux-2.6.11-xen0/net/core/dev.c	2005-05-20 10:20:36.000000000 -0500
@@ -98,6 +98,7 @@
 #include <linux/stat.h>
 #include <linux/if_bridge.h>
 #include <linux/divert.h>
+#include <net/ip.h> 
 #include <net/dst.h>
 #include <net/pkt_sched.h>
 #include <net/checksum.h>
@@ -1182,7 +1183,7 @@ int __skb_linearize(struct sk_buff *skb,
 	skb->data    += offset;
 
 	/* We are no longer a clone, even if we were. */
-	skb->cloned    = 0;
+	skb->flags    &= ~SKB_CLONED;
 
 	skb->tail     += skb->data_len;
 	skb->data_len  = 0;
@@ -1236,6 +1237,15 @@ int dev_queue_xmit(struct sk_buff *skb)
 	    __skb_linearize(skb, GFP_ATOMIC))
 		goto out_kfree_skb;
 
+	/* If packet is forwarded to a device that needs a checksum and not 
+	 * checksummed, correct the pointers and enable checksumming in the 
+	 * next function.
+	 */
+	if (skb->flags & SKB_FDW_NO_CSUM) {
+		skb->ip_summed = CHECKSUM_HW;
+		skb->h.raw = (void *)skb->nh.iph + (skb->nh.iph->ihl * 4);
+	}
+
 	/* If packet is not checksummed and device does not support
 	 * checksumming for this protocol, complete checksumming here.
 	 */

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] Network Checksum Removal
  2005-05-20 23:30 [PATCH] Network Checksum Removal Jon Mason
@ 2005-05-21 14:53 ` Keir Fraser
  2005-05-21 19:16   ` Keir Fraser
  0 siblings, 1 reply; 53+ messages in thread
From: Keir Fraser @ 2005-05-21 14:53 UTC (permalink / raw)
  To: Jon Mason; +Cc: Xen-devel


On 21 May 2005, at 00:30, Jon Mason wrote:

>
> Traffic generated externally, if rx hardware checksum is available and
> enabled, then dom0 will notify domU that it is unnecessary to validate
> this checksum (providing the checksum is valid) by enabling the csum
> bit.  If domU is not notified that it is unnecessary to validate the
> checksum, then domU will do it.

Unfortunately you can't trust the ip_summed flag because, as you point 
out yourself, the bridge and IP forwarding paths both clobber it to 
CHECKSUM_NONE. This puts us in a pickle: without hacking in some more 
info we have no way to know whether the physical interface (eth0, say) 
summed the packet or not. And, if it did, whether it was a 
CHECKSUM_UNNECESSARY or a CHECKSUM_HW kind of summing (they differ in 
how you interpret the result).

Your patch as it stands is only correct if eth0 sets 
ip_summed==CHECKSUM_UNNECESSARY on received packets.
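
To make the distinction concrete, here is a minimal userspace sketch of
how a receiver might treat the three states. It assumes the 2.6-era
receive semantics (CHECKSUM_UNNECESSARY: already verified; CHECKSUM_HW:
the device supplies a raw ones'-complement sum that still has to be
combined with the pseudo-header sum). rx_csum_ok(), csum_add_bytes(),
fold32() and the RX_* constants are hypothetical names, not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

enum { RX_NONE = 0, RX_HW = 1, RX_UNNECESSARY = 2 };

/* Add a byte buffer to a running ones'-complement sum (RFC 1071 style). */
static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)p[0] << 8;    /* pad the odd trailing byte */
    return sum;
}

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t fold32(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Returns 1 if the segment may be accepted.  A correct checksum makes the
 * sum over pseudo-header plus segment fold to 0xffff.
 */
static int rx_csum_ok(int ip_summed, uint32_t hw_sum, uint32_t pseudo_sum,
                      const uint8_t *seg, size_t len)
{
    switch (ip_summed) {
    case RX_UNNECESSARY:               /* already verified: trust it   */
        return 1;
    case RX_HW:                        /* device summed the segment    */
        return fold32((uint32_t)fold32(hw_sum) + fold32(pseudo_sum))
               == 0xffff;
    default:                           /* RX_NONE: verify in software  */
        return fold32(csum_add_bytes(pseudo_sum, seg, len)) == 0xffff;
    }
}
```

If forwarding has reset the state to RX_NONE, the receiver falls into the
software path; if the state is wrongly reported as RX_UNNECESSARY, a
corrupt segment is accepted without any check.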

  -- Keir


* Re: [PATCH] Network Checksum Removal
  2005-05-21 14:53 ` Keir Fraser
@ 2005-05-21 19:16   ` Keir Fraser
  2005-05-21 21:49     ` Jon Mason
  2005-05-23 15:29     ` Andrew Theurer
  0 siblings, 2 replies; 53+ messages in thread
From: Keir Fraser @ 2005-05-21 19:16 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Xen-devel, Jon Mason


On 21 May 2005, at 15:53, Keir Fraser wrote:

>> Traffic generated externally, if rx hardware checksum is available and
>> enabled, then dom0 will notify domU that it is unnecessary to validate
>> this checksum (providing the checksum is valid) by enabling the csum
>> bit.  If domU is not notified that it is unnecessary to validate the
>> checksum, then domU will do it.
>
> Unfortunately you can't trust the ip_summed flag because, as you point 
> out yourself, the bridge and IP forwarding paths both clobber it to 
> CHECKSUM_NONE. This puts us in a pickle: without hacking in some more 
> info we have no way to know whether the physical interface (eth0, say) 
> summed the packet or not. And, if it did, whether it was a 
> CHECKSUM_UNNECESSARY or a CHECKSUM_HW kind of summing (they differ in 
> how you interpret the result).
>
> Your patch as it stands is only correct if eth0 sets 
> ip_summed==CHECKSUM_UNNECESSARY on received packets.

I've checked in a modified version of your patch that hopefully deals 
with propagating checksum information in both directions across a 
virtual bridge or router. I replaced your skb flags with two new ones 
-- proto_csum_blank and proto_csum_valid.

The former indicates that the protocol-level checksum needs filling in. 
This is not a problem for local processing, but the flag is picked up 
before sending to a physical interface and fixed up.

The latter indicates that the proto-level checksum has been validated 
since arrival at localhost (*or* that the packet originated from a domU 
on localhost). This flag survives crossing a bridge/router so we can 
trust it when deciding if checksum validation is required.
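
As a rough illustration of why two separate flags help, here is a small
userspace model. It is not the checked-in code: struct pkt_meta and the
helper names are hypothetical, and the only behaviour assumed is what is
described above (forwarding clobbers ip_summed but leaves both flags
alone).

```c
enum { SUM_NONE = 0, SUM_HW = 1, SUM_UNNECESSARY = 2 };

/* Hypothetical per-packet metadata modelled on the description above. */
struct pkt_meta {
    int ip_summed;
    unsigned proto_csum_blank : 1;  /* checksum still needs filling in  */
    unsigned proto_csum_valid : 1;  /* validated, or from a local domU  */
};

/* Packet handed over by a domU on the same host. */
static void mark_from_local_domU(struct pkt_meta *m, int blank)
{
    m->proto_csum_blank = blank ? 1 : 0;
    m->proto_csum_valid = 1;
    m->ip_summed = SUM_UNNECESSARY;
}

/* Bridging/routing resets ip_summed; the two flags are not touched. */
static void bridge_forward(struct pkt_meta *m)
{
    m->ip_summed = SUM_NONE;
}

/* Receiver side: is software validation still required? */
static int needs_validation(const struct pkt_meta *m)
{
    return !(m->proto_csum_valid || m->ip_summed == SUM_UNNECESSARY);
}

/* Before transmission on a physical interface: must the sum be filled? */
static int needs_fill_before_physical_tx(const struct pkt_meta *m)
{
    return m->proto_csum_blank;
}
```

In this model a packet from a local domU that crosses the bridge ends up
with ip_summed reset to SUM_NONE, yet needs_validation() still returns 0
and needs_fill_before_physical_tx() still returns 1, which is the
information that was previously lost.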

I'll push the patch to the bkbits repository just as soon as bkbits 
rematerialises. :-)

If you have any performance or stress tests that you were using to test 
checksum offloading, it would be great to find out how they perform on 
the checked-in version!

  Thanks,
  Keir


* Re: [PATCH] Network Checksum Removal
  2005-05-21 19:16   ` Keir Fraser
@ 2005-05-21 21:49     ` Jon Mason
  2005-05-23 15:29     ` Andrew Theurer
  1 sibling, 0 replies; 53+ messages in thread
From: Jon Mason @ 2005-05-21 21:49 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Xen-devel

On Saturday 21 May 2005 02:16 pm, Keir Fraser wrote:
> On 21 May 2005, at 15:53, Keir Fraser wrote:
> >> Traffic generated externally, if rx hardware checksum is available and
> >> enabled, then dom0 will notify domU that it is unnecessary to validate
> >> this checksum (providing the checksum is valid) by enabling the csum
> >> bit.  If domU is not notified that it is unnecessary to validate the
> >> checksum, then domU will do it.
> >
> > Unfortunately you can't trust the ip_summed flag because, as you point
> > out yourself, the bridge and IP forwarding paths both clobber it to
> > CHECKSUM_NONE. This puts us in a pickle: without hacking in some more
> > info we have no way to know whether the physical interface (eth0, say)
> > summed the packet or not. And, if it did, whether it was a
> > CHECKSUM_UNNECESSARY or a CHECKSUM_HW kind of summing (they differ in
> > how you interpret the result).
> >
> > Your patch as it stands is only correct if eth0 sets
> > ip_summed==CHECKSUM_UNNECESSARY on received packets.

Silly mistake on my part.  Good catch.

> I've checked in a modified version of your patch that hopefully deals
> with propagating checksum information in both directions across a
> virtual bridge or router. I replaced your skb flags with two new ones
> -- proto_csum_blank and proto_csum_valid.
>
> The former indicates that the protocol-level checksum needs filling in.
> This is not a problem for local processing, but the flag is picked up
> before sending to a physical interface and fixed up.
>
> The latter indicates that the proto-level checksum has been validated
> since arrival at localhost (*or* that the packet originated from a domU
> on localhost). This flag survives crossing a bridge/router so we can
> trust it when deciding if checksum validation is required.
>
> I'll push the patch to the bkbits repository just as soon as bkbits
> rematerialises. :-)

I'd be interested in seeing the bits you added.  

> If you have any performance or stress tests that you were using to test
> checksum offloading, it would be great to find out how they perform on
> the checked-in version!

I am happy to give the latest patch some testing (though I probably won't be 
able Monday).

Thanks,
Jon


* Re: [PATCH] Network Checksum Removal
  2005-05-21 19:16   ` Keir Fraser
  2005-05-21 21:49     ` Jon Mason
@ 2005-05-23 15:29     ` Andrew Theurer
  2005-05-23 15:31       ` Bin Ren
  1 sibling, 1 reply; 53+ messages in thread
From: Andrew Theurer @ 2005-05-23 15:29 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Xen-devel, Jon Mason

> I've checked in a modified version of your patch that hopefully deals
> with propagating checksum information in both directions across a
> virtual bridge or router. I replaced your skb flags with two new ones
> -- proto_csum_blank and proto_csum_valid.
>
> The former indicates that the protocol-level checksum needs filling
> in. This is not a problem for local processing, but the flag is
> picked up before sending to a physical interface and fixed up.
>
> The latter indicates that the proto-level checksum has been validated
> since arrival at localhost (*or* that the packet originated from a
> domU on localhost). This flag survives crossing a bridge/router so we
> can trust it when deciding if checksum validation is required.
>
> I'll push the patch to the bkbits repository just as soon as bkbits
> rematerialises. :-)
>
> If you have any performance or stress tests that you were using to
> test checksum offloading, it would be great to find out how they
> perform on the checked-in version!

Now that BK is up, I'll run some netperf tests before/after that 
changeset and see what we get.

-Andrew


* Re: [PATCH] Network Checksum Removal
  2005-05-23 15:29     ` Andrew Theurer
@ 2005-05-23 15:31       ` Bin Ren
  2005-05-23 15:47         ` Andrew Theurer
  0 siblings, 1 reply; 53+ messages in thread
From: Bin Ren @ 2005-05-23 15:31 UTC (permalink / raw)
  To: Andrew Theurer; +Cc: Xen-devel, Jon Mason

It seems to break the interdomain ssh and nfs on my machine. Digging
for reasons.

- Bin

On 5/23/05, Andrew Theurer <habanero@us.ibm.com> wrote:
> > I've checked in a modified version of your patch that hopefully deals
> > with propagating checksum information in both directions across a
> > virtual bridge or router. I replaced your skb flags with two new ones
> > -- proto_csum_blank and proto_csum_valid.
> >
> > The former indicates that the protocol-level checksum needs filling
> > in. This is not a problem for local processing, but the flag is
> > picked up before sending to a physical interface and fixed up.
> >
> > The latter indicates that the proto-level checksum has been validated
> > since arrival at localhost (*or* that the packet originated from a
> > domU on localhost). This flag survives crossing a bridge/router so we
> > can trust it when deciding if checksum validation is required.
> >
> > I'll push the patch to the bkbits repository just as soon as bkbits
> > rematerialises. :-)
> >
> > If you have any performance or stress tests that you were using to
> > test checksum offloading, it would be great to find out how they
> > perform on the checked-in version!
> 
> Now that BK is up, I'll run some netperf tests before/after that
> changeset and see what we get.
> 
> -Andrew
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>


* Re: [PATCH] Network Checksum Removal
  2005-05-23 15:31       ` Bin Ren
@ 2005-05-23 15:47         ` Andrew Theurer
  2005-05-23 15:56           ` Bin Ren
  0 siblings, 1 reply; 53+ messages in thread
From: Andrew Theurer @ 2005-05-23 15:47 UTC (permalink / raw)
  To: bin.ren, Bin Ren; +Cc: Xen-devel, Jon Mason

On Monday 23 May 2005 10:31, Bin Ren wrote:
> It seems to break the interdomain ssh and nfs on my machine. Digging
> for reasons.

Are you using bridge or network model?
>
> - Bin
>
> On 5/23/05, Andrew Theurer <habanero@us.ibm.com> wrote:
> > > I've checked in a modified version of your patch that hopefully
> > > deals with propagating checksum information in both directions
> > > across a virtual bridge or router. I replaced your skb flags with
> > > two new ones -- proto_csum_blank and proto_csum_valid.
> > >
> > > The former indicates that the protocol-level checksum needs
> > > filling in. This is not a problem for local processing, but the
> > > flag is picked up before sending to a physical interface and
> > > fixed up.
> > >
> > > The latter indicates that the proto-level checksum has been
> > > validated since arrival at localhost (*or* that the packet
> > > originated from a domU on localhost). This flag survives crossing
> > > a bridge/router so we can trust it when deciding if checksum
> > > validation is required.
> > >
> > > I'll push the patch to the bkbits repository just as soon as
> > > bkbits rematerialises. :-)
> > >
> > > If you have any performance or stress tests that you were using
> > > to test checksum offloading, it would be great to find out how
> > > they perform on the checked-in version!
> >
> > Now that BK is up, I'll run some netperf tests before/after that
> > changeset and see what we get.
> >
> > -Andrew
> >


* Re: [PATCH] Network Checksum Removal
  2005-05-23 15:47         ` Andrew Theurer
@ 2005-05-23 15:56           ` Bin Ren
  2005-05-23 16:06             ` Bin Ren
  0 siblings, 1 reply; 53+ messages in thread
From: Bin Ren @ 2005-05-23 15:56 UTC (permalink / raw)
  To: Andrew Theurer; +Cc: Xen-devel, Jon Mason

I'm using bridge and stock scripts. I'm starting to doubt it's caused by csum
offloading, as I'm seeing some weird things. (1) it's possible to do
interdomain iperf, which binds to ports > 1024 (2) ssh and nfs don't
work. In both cases, dom0 is the server, dom1 is the client. tcpdump
on dom0 doesn't show any incoming packets from dom1.

I'm recompiling everything again.

Cheers,
Bin

On 5/23/05, Andrew Theurer <habanero@us.ibm.com> wrote:
> On Monday 23 May 2005 10:31, Bin Ren wrote:
> > It seems to break the interdomain ssh and nfs on my machine. Digging
> > for reasons.
> 
> Are you using bridge or network model?
> >
> > - Bin
> >
> > On 5/23/05, Andrew Theurer <habanero@us.ibm.com> wrote:
> > > > I've checked in a modified version of your patch that hopefully
> > > > deals with propagating checksum information in both directions
> > > > across a virtual bridge or router. I replaced your skb flags with
> > > > two new ones -- proto_csum_blank and proto_csum_valid.
> > > >
> > > > The former indicates that the protocol-level checksum needs
> > > > filling in. This is not a problem for local processing, but the
> > > > flag is picked up before sending to a physical interface and
> > > > fixed up.
> > > >
> > > > The latter indicates that the proto-level checksum has been
> > > > validated since arrival at localhost (*or* that the packet
> > > > originated from a domU on localhost). This flag survives crossing
> > > > a bridge/router so we can trust it when deciding if checksum
> > > > validation is required.
> > > >
> > > > I'll push the patch to the bkbits repository just as soon as
> > > > bkbits rematerialises. :-)
> > > >
> > > > If you have any performance or stress tests that you were using
> > > > to test checksum offloading, it would be great to find out how
> > > > they perform on the checked-in version!
> > >
> > > Now that BK is up, I'll run some netperf tests before/after that
> > > changeset and see what we get.
> > >
> > > -Andrew
> > >


* Re: [PATCH] Network Checksum Removal
  2005-05-23 15:56           ` Bin Ren
@ 2005-05-23 16:06             ` Bin Ren
  2005-05-23 16:16               ` Jon Mason
  0 siblings, 1 reply; 53+ messages in thread
From: Bin Ren @ 2005-05-23 16:06 UTC (permalink / raw)
  To: Andrew Theurer; +Cc: Xen-devel, Jon Mason

Started from fresh again. The same weird symptoms.

- Bin

On 5/23/05, Bin Ren <bin.ren@gmail.com> wrote:
> I'm using bridge and stock scripts. I'm starting to doubt it's caused by csum
> offloading, as I'm seeing some weird things. (1) it's possible to do
> interdomain iperf, which binds to ports > 1024 (2) ssh and nfs don't
> work. In both cases, dom0 is the server, dom1 is the client. tcpdump
> on dom0 doesn't show any incoming packets from dom1.
> 
> I'm recompiling everything again.
> 
> Cheers,
> Bin
> 
> On 5/23/05, Andrew Theurer <habanero@us.ibm.com> wrote:
> > On Monday 23 May 2005 10:31, Bin Ren wrote:
> > > It seems to break the interdomain ssh and nfs on my machine. Digging
> > > for reasons.
> >
> > Are you using bridge or network model?
> > >
> > > - Bin
> > >
> > > On 5/23/05, Andrew Theurer <habanero@us.ibm.com> wrote:
> > > > > I've checked in a modified version of your patch that hopefully
> > > > > deals with propagating checksum information in both directions
> > > > > across a virtual bridge or router. I replaced your skb flags with
> > > > > two new ones -- proto_csum_blank and proto_csum_valid.
> > > > >
> > > > > The former indicates that the protocol-level checksum needs
> > > > > filling in. This is not a problem for local processing, but the
> > > > > flag is picked up before sending to a physical interface and
> > > > > fixed up.
> > > > >
> > > > > The latter indicates that the proto-level checksum has been
> > > > > validated since arrival at localhost (*or* that the packet
> > > > > originated from a domU on localhost). This flag survives crossing
> > > > > a bridge/router so we can trust it when deciding if checksum
> > > > > validation is required.
> > > > >
> > > > > I'll push the patch to the bkbits repository just as soon as
> > > > > bkbits rematerialises. :-)
> > > > >
> > > > > If you have any performance or stress tests that you were using
> > > > > to test checksum offloading, it would be great to find out how
> > > > > they perform on the checked-in version!
> > > >
> > > > Now that BK is up, I'll run some netperf tests before/after that
> > > > changeset and see what we get.
> > > >
> > > > -Andrew
> > > >


* Re: [PATCH] Network Checksum Removal
  2005-05-23 16:06             ` Bin Ren
@ 2005-05-23 16:16               ` Jon Mason
  2005-05-23 16:36                 ` Bin Ren
  0 siblings, 1 reply; 53+ messages in thread
From: Jon Mason @ 2005-05-23 16:16 UTC (permalink / raw)
  To: xen-devel, bin.ren; +Cc: Andrew Theurer

You can disable the checksum "offload" with ethtool (in domU).  
"ethtool -k eth0" will show whether it is enabled or not.  
"ethtool -K eth0 tx off" will disable it.
"ethtool -K eth0 tx on" will enable it.

I tested it thoroughly with bridging before I submitted the patch, so it should 
be working.  I'll download the latest source and verify that it works on my 
test system.  

Thanks for your help,
Jon

On Monday 23 May 2005 11:06 am, Bin Ren wrote:
> Start from fresh again. The same weird symptoms.
>
> - Bin
>
> On 5/23/05, Bin Ren <bin.ren@gmail.com> wrote:
> > I'm using bridge and stock scripts. I'm starting to doubt it's caused by csum
> > offloading, as I'm seeing some weird things. (1) it's possible to do
> > interdomain iperf, which binds to ports > 1024 (2) ssh and nfs don't
> > work. In both cases, dom0 is the server, dom1 is the client. tcpdump
> > on dom0 doesn't show any incoming packets from dom1.
> >
> > I'm recompiling everything again.
> >
> > Cheers,
> > Bin
> >
> > On 5/23/05, Andrew Theurer <habanero@us.ibm.com> wrote:
> > > On Monday 23 May 2005 10:31, Bin Ren wrote:
> > > > It seems to break the interdomain ssh and nfs on my machine. Digging
> > > > for reasons.
> > >
> > > Are you using bridge or network model?
> > >
> > > > - Bin
> > > >
> > > > On 5/23/05, Andrew Theurer <habanero@us.ibm.com> wrote:
> > > > > > I've checked in a modified version of your patch that hopefully
> > > > > > deals with propagating checksum information in both directions
> > > > > > across a virtual bridge or router. I replaced your skb flags with
> > > > > > two new ones -- proto_csum_blank and proto_csum_valid.
> > > > > >
> > > > > > The former indicates that the protocol-level checksum needs
> > > > > > filling in. This is not a problem for local processing, but the
> > > > > > flag is picked up before sending to a physical interface and
> > > > > > fixed up.
> > > > > >
> > > > > > The latter indicates that the proto-level checksum has been
> > > > > > validated since arrival at localhost (*or* that the packet
> > > > > > originated from a domU on localhost). This flag survives crossing
> > > > > > a bridge/router so we can trust it when deciding if checksum
> > > > > > validation is required.
> > > > > >
> > > > > > I'll push the patch to the bkbits repository just as soon as
> > > > > > bkbits rematerialises. :-)
> > > > > >
> > > > > > If you have any performance or stress tests that you were using
> > > > > > to test checksum offloading, it would be great to find out how
> > > > > > they perform on the checked-in version!
> > > > >
> > > > > Now that BK is up, I'll run some netperf tests before/after that
> > > > > changeset and see what we get.
> > > > >
> > > > > -Andrew
> > > > >


* Re: [PATCH] Network Checksum Removal
  2005-05-23 16:16               ` Jon Mason
@ 2005-05-23 16:36                 ` Bin Ren
  2005-05-23 17:54                   ` Keir Fraser
  0 siblings, 1 reply; 53+ messages in thread
From: Bin Ren @ 2005-05-23 16:36 UTC (permalink / raw)
  To: Jon Mason; +Cc: Andrew Theurer, xen-devel

Keir has removed 'SET_ETHTOOL_OPS(dev, &network_ethtool_ops);' from
your patch. The operations are not supported.

- Bin

On 5/23/05, Jon Mason <jdmason@us.ibm.com> wrote:
> You can disable the checksum "offload" with ethtool (in domU).
> "ethtool -k eth0" will show whether it is enabled or not.
> "ethtool -K eth0 tx off" will disable it.
> "ethtool -K eth0 tx on" will enable it.
> 
> I tested it thoroughly with bridging before I submitted the patch, so it should
> be working.  I'll download the latest source and verify that it works on my
> test system.
> 
> Thanks for your help,
> Jon
> 
> On Monday 23 May 2005 11:06 am, Bin Ren wrote:
> > Start from fresh again. The same weird symptoms.
> >
> > - Bin
> >
> > On 5/23/05, Bin Ren <bin.ren@gmail.com> wrote:
> > > I'm using bridge and stock scripts. I'm starting to doubt it's caused by csum
> > > offloading, as I'm seeing some weird things. (1) it's possible to do
> > > interdomain iperf, which binds to ports > 1024 (2) ssh and nfs don't
> > > work. In both cases, dom0 is the server, dom1 is the client. tcpdump
> > > on dom0 doesn't show any incoming packets from dom1.
> > >
> > > I'm recompiling everything again.
> > >
> > > Cheers,
> > > Bin
> > >
> > > On 5/23/05, Andrew Theurer <habanero@us.ibm.com> wrote:
> > > > On Monday 23 May 2005 10:31, Bin Ren wrote:
> > > > > It seems to break the interdomain ssh and nfs on my machine. Digging
> > > > > for reasons.
> > > >
> > > > Are you using bridge or network model?
> > > >
> > > > > - Bin
> > > > >
> > > > > On 5/23/05, Andrew Theurer <habanero@us.ibm.com> wrote:
> > > > > > > I've checked in a modified version of your patch that hopefully
> > > > > > > deals with propagating checksum information in both directions
> > > > > > > across a virtual bridge or router. I replaced your skb flags with
> > > > > > > two new ones -- proto_csum_blank and proto_csum_valid.
> > > > > > >
> > > > > > > The former indicates that the protocol-level checksum needs
> > > > > > > filling in. This is not a problem for local processing, but the
> > > > > > > flag is picked up before sending to a physical interface and
> > > > > > > fixed up.
> > > > > > >
> > > > > > > The latter indicates that the proto-level checksum has been
> > > > > > > validated since arrival at localhost (*or* that the packet
> > > > > > > originated from a domU on localhost). This flag survives crossing
> > > > > > > a bridge/router so we can trust it when deciding if checksum
> > > > > > > validation is required.
> > > > > > >
> > > > > > > I'll push the patch to the bkbits repository just as soon as
> > > > > > > bkbits rematerialises. :-)
> > > > > > >
> > > > > > > If you have any performance or stress tests that you were using
> > > > > > > to test checksum offloading, it would be great to find out how
> > > > > > > they perform on the checked-in version!
> > > > > >
> > > > > > Now that BK is up, I'll run some netperf tests before/after that
> > > > > > changeset and see what we get.
> > > > > >
> > > > > > -Andrew
> > > > > >
> > > > > > _______________________________________________
> > > > > > Xen-devel mailing list
> > > > > > Xen-devel@lists.xensource.com
> > > > > > http://lists.xensource.com/xen-devel
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com
> > http://lists.xensource.com/xen-devel
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] Network Checksum Removal
  2005-05-23 16:36                 ` Bin Ren
@ 2005-05-23 17:54                   ` Keir Fraser
  2005-05-23 18:08                     ` Bin Ren
  2005-05-23 19:55                     ` Bin Ren
  0 siblings, 2 replies; 53+ messages in thread
From: Keir Fraser @ 2005-05-23 17:54 UTC (permalink / raw)
  To: bin.ren; +Cc: Andrew Theurer, xen-devel, Jon Mason


On 23 May 2005, at 17:36, Bin Ren wrote:

> Keir has removed 'SET_ETHTOOL_OPS(dev, &network_ethtool_ops);' from
> your patch. The operations are not supported.

Ah, I thought that was just testing infrastructure. I'll take a patch 
to add the ethtool ops back in.

Bin -- does your domain0 traffic get delivered via the bridge device or 
via the new vif0.0/veth0 that I added? If the former you might want to 
try updating your /etc/xen/scripts/network script. Although delivery 
via the bridge ought to work...

  -- Keir

* Re: [PATCH] Network Checksum Removal
  2005-05-23 17:54                   ` Keir Fraser
@ 2005-05-23 18:08                     ` Bin Ren
  2005-05-23 18:18                       ` Jon Mason
  2005-05-23 19:55                     ` Bin Ren
  1 sibling, 1 reply; 53+ messages in thread
From: Bin Ren @ 2005-05-23 18:08 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Andrew Theurer, xen-devel, Jon Mason

It's via the new vif0.0/veth0. I did tcpdump on vif1.0 in dom0 and saw
packets sent by dom0, but they got dropped by the netfront on dom1.

Cheers,
Bin

On 5/23/05, Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:
> 
> On 23 May 2005, at 17:36, Bin Ren wrote:
> 
> > Keir has removed 'SET_ETHTOOL_OPS(dev, &network_ethtool_ops);' from
> > your patch. The operations are not supported.
> 
> Ah, I thought that was just testing infrastructure. I'll take a patch
> to add the ethtool ops back in.
> 
> Bin -- does your domain0 traffic get delivered via the bridge device or
> via the new vif0.0/veth0 that I added? If the former you might want to
> try updating your /etc/xen/scripts/network script. Although delivery
> via the bridge ought to work...
> 
>   -- Keir

* Re: [PATCH] Network Checksum Removal
  2005-05-23 18:08                     ` Bin Ren
@ 2005-05-23 18:18                       ` Jon Mason
  2005-05-23 18:43                         ` Keir Fraser
  0 siblings, 1 reply; 53+ messages in thread
From: Jon Mason @ 2005-05-23 18:18 UTC (permalink / raw)
  To: bin.ren; +Cc: xen-devel, Andrew Theurer

thanks Bin,
I'll take a look at that.

Jon

On Monday 23 May 2005 01:08 pm, Bin Ren wrote:
> It's via the new vif0.0/veth0. I did tcpdump on vif1.0 in dom0 and saw
> packets sent by dom0, but got dropped by the netfront on dom1.
>
> Cheers,
> Bin
>
> On 5/23/05, Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:
> > On 23 May 2005, at 17:36, Bin Ren wrote:
> > > Keir has removed 'SET_ETHTOOL_OPS(dev, &network_ethtool_ops);' from
> > > your patch. The operations are not supported.
> >
> > Ah, I thought that was just testing infrastructure. I'll take a patch
> > to add the ethtool ops back in.
> >
> > Bin -- does your domain0 traffic get delivered via the bridge device or
> > via the new vif0.0/veth0 that I added? If the former you might want to
> > try updating your /etc/xen/scripts/network script. Although delivery
> > via the bridge ought to work...
> >
> >   -- Keir

* Re: [PATCH] Network Checksum Removal
  2005-05-23 18:18                       ` Jon Mason
@ 2005-05-23 18:43                         ` Keir Fraser
  2005-05-23 18:53                           ` Bin Ren
  0 siblings, 1 reply; 53+ messages in thread
From: Keir Fraser @ 2005-05-23 18:43 UTC (permalink / raw)
  To: Jon Mason; +Cc: Andrew Theurer, xen-devel, bin.ren


I think I found the problem, and I've checked in a fix.

Bin: can you try dom0->domU networking with latest unstable tree? 
Hopefully your problem is fixed.

As further work on this, I think I chose a bad name for the 
'proto_csum_valid' field because sometimes it is set for local packets 
that have had no csum poked into the packet at all. Something like 
'proto_data_valid' might be better. And communicating this information 
between domains (i.e., that the csum field is blank, but the packet 
data is known good anyway) would be nice. Then domU can decide to add 
the checksum if it passes the packet off to a context that expects a 
valid checksum.
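
To make the naming issue concrete, here is a minimal sketch of the two
per-packet flags and the two decisions they drive. It is an illustration
only, not the checked-in code: struct pkt_csum_state, needs_rx_verify()
and needs_csum_fill() are hypothetical names, and proto_data_valid is the
rename proposed above rather than an existing field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-packet state discussed
 * above; in the real code these flags live in struct sk_buff. */
struct pkt_csum_state {
    bool proto_csum_blank; /* checksum field in the packet is not filled in */
    bool proto_data_valid; /* data known good: validated, or local origin  */
};

/* Receiver side: verify in software only if nobody vouches for the data. */
static bool needs_rx_verify(const struct pkt_csum_state *s)
{
    assert(s != NULL);
    return !s->proto_data_valid;
}

/* Before the packet goes to a context that expects a real checksum
 * (for example a physical interface), a blank field must be filled in. */
static bool needs_csum_fill(const struct pkt_csum_state *s)
{
    assert(s != NULL);
    return s->proto_csum_blank;
}
```

A packet that originated in a local domU would carry both flags set: no
verification is needed on receive, but the checksum has to be filled in
before the packet leaves the machine.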

  -- Keir

On 23 May 2005, at 19:18, Jon Mason wrote:

> thanks Bin,
> I'll take a look at that.
>
> Jon
>
> On Monday 23 May 2005 01:08 pm, Bin Ren wrote:
>> It's via the new vif0.0/veth0. I did tcpdump on vif1.0 in dom0 and saw
>> packets sent by dom0, but got dropped by the netfront on dom1.
>>
>> Cheers,
>> Bin

* Re: [PATCH] Network Checksum Removal
  2005-05-23 18:43                         ` Keir Fraser
@ 2005-05-23 18:53                           ` Bin Ren
  0 siblings, 0 replies; 53+ messages in thread
From: Bin Ren @ 2005-05-23 18:53 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Andrew Theurer, xen-devel, Jon Mason

Fantastic! It's working :-D

Thanks a great deal,
Bin

On 5/23/05, Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:
> 
> I think I found the problem, and I've checked in a fix.
> 
> Bin: can you try dom0->domU networking with latest unstable tree?
> Hopefully your problem is fixed.
> 
> As further work on this, I think I chose a bad name for the
> 'proto_csum_valid' field because sometimes it is set for local packets
> that have had no csum poked into the packet at all. Something like
> 'proto_data_valid' might be better. And communicating this information
> between domains (i.e., that the csum field is blank, but the packet
> data is known good anyway) would be nice. Then domU can decide to add
> the checksum if it passes the packet off to a context that expects a
> valid checksum.
> 
>   -- Keir
> 
> On 23 May 2005, at 19:18, Jon Mason wrote:
> 
> > thanks Bin,
> > I'll take a look at that.
> >
> > Jon
> >
> > On Monday 23 May 2005 01:08 pm, Bin Ren wrote:
> >> It's via the new vif0.0/veth0. I did tcpdump on vif1.0 in dom0 and saw
> >> packets sent by dom0, but got dropped by the netfront on dom1.
> >>
> >> Cheers,
> >> Bin

* Re: [PATCH] Network Checksum Removal
  2005-05-23 17:54                   ` Keir Fraser
  2005-05-23 18:08                     ` Bin Ren
@ 2005-05-23 19:55                     ` Bin Ren
  2005-05-23 20:13                       ` Keir Fraser
  2005-05-23 21:12                       ` Nivedita Singhvi
  1 sibling, 2 replies; 53+ messages in thread
From: Bin Ren @ 2005-05-23 19:55 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Andrew Theurer, xen-devel, Jon Mason

I've added support for ethtool. By turning netfront checksum
offloading on and off, I'm getting the following throughput numbers,
using iperf. Each test was run three times. CPU usage is quite
similar in the two cases ('top' output). Looks like checksum computation
is not a major overhead in domU networking.

dom0/1/2 all have 128M memory. dom0 has e1000 tx checksum offloading turned on.

With Tx checksum on:

dom1->dom2: 300Mb/s (dom0 cpu maxed out by software interrupts)
dom1->dom0: 459Mb/s (dom0 cpu 80% in SI, dom1 cpu 20% in SI)
dom1->external: 439Mb/s (over 1Gb/s ethernet) (dom0 cpu 50% in SI,
dom1 60% in SI)

With Tx checksum off:

dom1->dom2: 301Mb/s
dom1->dom0: 454Mb/s
dom1->external: 437Mb/s (over 1Gb/s ethernet)

On 5/23/05, Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:
> 
> On 23 May 2005, at 17:36, Bin Ren wrote:
> 
> > Keir has removed 'SET_ETHTOOL_OPS(dev, &network_ethtool_ops);' from
> > your patch. The operations are not supported.
> 
> Ah, I thought that was just testing infrastructure. I'll take a patch
> to add the ethtool ops back in.
> 
> Bin -- does your domain0 traffic get delivered via the bridge device or
> via the new vif0.0/veth0 that I added? If the former you might want to
> try updating your /etc/xen/scripts/network script. Although delivery
> via the bridge ought to work...
> 
>   -- Keir

* Re: [PATCH] Network Checksum Removal
  2005-05-23 19:55                     ` Bin Ren
@ 2005-05-23 20:13                       ` Keir Fraser
  2005-05-23 20:20                         ` Jon Mason
                                           ` (2 more replies)
  2005-05-23 21:12                       ` Nivedita Singhvi
  1 sibling, 3 replies; 53+ messages in thread
From: Keir Fraser @ 2005-05-23 20:13 UTC (permalink / raw)
  To: bin.ren; +Cc: Andrew Theurer, xen-devel, Jon Mason


On 23 May 2005, at 20:55, Bin Ren wrote:

> I've added the support for ethtools. By turning on and off netfront
> checksum offloading, I'm getting the following throughput numbers,
> using iperf. Each test was run three times. CPU usages are quite
> similar in two cases ('top' output). Looks like checksum computation
> is not a major overhead in domU networking.
>
> dom0/1/2 all have 128M memory. dom0 has e1000 tx checksum offloading 
> turned on.

What happens to CPU usage in dom1 when tx checksumming is disabled?

Overall though these are the kind of results I would expect. Linux 
usually does csumming at the same time as it has to do a copy anyway, 
and it ends up being limited by memory/L2-cache bandwidth, not the 
extra computation. But the offload extensions haven't cost much to 
implement and there are probably cases where it helps a little.
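
As a rough illustration of why checksumming during a copy is nearly
free, the sketch below folds the Internet checksum (RFC 1071) into the
same loop that moves the bytes, so each byte is loaded once and the
cost is dominated by memory traffic. copy_and_csum() is a hypothetical
name for this example; the kernel uses optimized per-architecture
routines rather than anything this simple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy len bytes from src to dst and return the 16-bit Internet checksum
 * (RFC 1071) of the data, computed in the same pass as the copy. */
static uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    assert(dst != NULL && src != NULL);

    /* Each byte is read once and used for both the copy and the sum. */
    for (i = 0; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }
    if (i < len) {              /* odd trailing byte, padded with zero */
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }
    while (sum >> 16)           /* fold carries back into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

The receive path is different: there the checksum has to be verified
before the data is copied to the user, so it cannot always be merged
with a copy in this way.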

Maybe I'm being pessimistic though: Can you reproduce the rather more 
impressive speedups that you previously saw, Jon?

  -- Keir

* Re: [PATCH] Network Checksum Removal
  2005-05-23 20:13                       ` Keir Fraser
@ 2005-05-23 20:20                         ` Jon Mason
  2005-05-23 21:52                         ` Bin Ren
  2005-05-23 21:58                         ` Jon Mason
  2 siblings, 0 replies; 53+ messages in thread
From: Jon Mason @ 2005-05-23 20:20 UTC (permalink / raw)
  To: xen-devel; +Cc: bin.ren, Andrew Theurer

On Monday 23 May 2005 03:13 pm, Keir Fraser wrote:
> On 23 May 2005, at 20:55, Bin Ren wrote:
> > I've added the support for ethtools. By turning on and off netfront
> > checksum offloading, I'm getting the following throughput numbers,
> > using iperf. Each test was run three times. CPU usages are quite
> > similar in two cases ('top' output). Looks like checksum computation
> > is not a major overhead in domU networking.
> >
> > dom0/1/2 all have 128M memory. dom0 has e1000 tx checksum offloading
> > turned on.
>
> What happens to CPU usage in dom1 when tx checksumming is disabled?
>
> Overall though these are the kind of results I would expect. Linux
> usually does csumming at the same time as it has to do a copy anyway,
> and it ends up being limited by memory/L2-cache bandwidth, not the
> extra computation. But the offload extensions haven't cost much to
> implement and there are probably cases where it helps a little.
>
> Maybe I'm being pessimistic though: Can you reproduce the rather more
> impressive speedups that you previously saw, Jon?

I would if I could.  As I don't use BK, I'll have to wait for the nightly 
build to pull in your latest fix.

Thanks,
Jon

* RE: [PATCH] Network Checksum Removal
@ 2005-05-23 20:22 Ian Pratt
  2005-05-23 20:38 ` Keir Fraser
  2005-05-23 21:01 ` Bin Ren
  0 siblings, 2 replies; 53+ messages in thread
From: Ian Pratt @ 2005-05-23 20:22 UTC (permalink / raw)
  To: Keir Fraser, bin.ren; +Cc: Andrew Theurer, xen-devel, Jon Mason

> Overall though these are the kind of results I would expect. 
> Linux usually does csumming at the same time as it has to do 
> a copy anyway, and it ends up being limited by 
> memory/L2-cache bandwidth, not the extra computation. But the 
> offload extensions haven't cost much to implement and there 
> are probably cases where it helps a little.
> 
> Maybe I'm being pessimistic though: Can you reproduce the 
> rather more impressive speedups that you previously saw, Jon?

We should be getting some benefit on the receive path, where the
checksum is normally forced to happen independent of a copy. Having this
offloaded to hardware should produce some measurable gain.

Bin: The numbers you're seeing are terrible anyway. You should be seeing
890Mb/s for external traffic. What kind of machine is this on?

Ian

* RE: [PATCH] Network Checksum Removal
@ 2005-05-23 20:26 Ian Pratt
  0 siblings, 0 replies; 53+ messages in thread
From: Ian Pratt @ 2005-05-23 20:26 UTC (permalink / raw)
  To: Jon Mason, xen-devel; +Cc: Andrew Theurer, bin.ren

 
> I would if I could.  As I don't use BK, I'll have to wait for 
> the nightly build to pull in your latest fix.

Jon,
Do you know about either of the following?
 http://www.bitkeeper.com/press/2005-03-17.html
 http://sourceforge.net/projects/sourcepuller/

I haven't used either myself, but I'd be interested to know whether they
work.

Thanks,
Ian

* Re: [PATCH] Network Checksum Removal
  2005-05-23 20:22 Ian Pratt
@ 2005-05-23 20:38 ` Keir Fraser
  2005-05-23 20:44   ` Jon Mason
  2005-05-23 21:01 ` Bin Ren
  1 sibling, 1 reply; 53+ messages in thread
From: Keir Fraser @ 2005-05-23 20:38 UTC (permalink / raw)
  To: Ian Pratt; +Cc: Andrew Theurer, xen-devel, bin.ren, Jon Mason


On 23 May 2005, at 21:22, Ian Pratt wrote:

>> Maybe I'm being pessimistic though: Can you reproduce the
>> rather more impressive speedups that you previously saw, Jon?
>
> We should be getting some benefit on the receive path, where the
> checksum is normally forced to happen independent of a copy. Having this
> offloaded to hardware should produce some measurable gain.

Ah, I forgot about that. But rx csum was not being toggled in the 
experiment.

The external bandwidth was definitely very low, so I guess there must 
be some other bottleneck in Bin's setup.

  -- Keir

* Re: [PATCH] Network Checksum Removal
  2005-05-23 20:38 ` Keir Fraser
@ 2005-05-23 20:44   ` Jon Mason
  0 siblings, 0 replies; 53+ messages in thread
From: Jon Mason @ 2005-05-23 20:44 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Ian Pratt, Andrew Theurer, xen-devel, bin.ren

On Monday 23 May 2005 03:38 pm, Keir Fraser wrote:
> On 23 May 2005, at 21:22, Ian Pratt wrote:
> >> Maybe I'm being pessimistic though: Can you reproduce the
> >> rather more impressive speedups that you previously saw, Jon?
> >
> > We should be getting some benefit on the receive path, where the
> > checksum is normally forced to happen independent of a copy. Having this
> > offloaded to hardware should produce some measurable gain.
>
> Ah, I forgot about that. But rx csum was not being toggled in the
> experiment.
>
> The external bandwidth was definitely very low, so I guess there must
> be some other bottleneck in Bin's setup.

I was using 256MB/domain in my test system (and Bin is using 128MB).  That 
might be the bottleneck.

Thanks,
Jon

* Re: [PATCH] Network Checksum Removal
  2005-05-23 20:22 Ian Pratt
  2005-05-23 20:38 ` Keir Fraser
@ 2005-05-23 21:01 ` Bin Ren
  2005-05-23 21:09   ` Andrew Theurer
  1 sibling, 1 reply; 53+ messages in thread
From: Bin Ren @ 2005-05-23 21:01 UTC (permalink / raw)
  To: Ian Pratt; +Cc: xen-devel, Andrew Theurer, Jon Mason

Machines spec:

External server:
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU: AMD Athlon(tm) 64 Processor 3200+ stepping 08
Memory: 1024M DDR400 CAS 3
NIC: 1Gb/s Intel Pro/1000 MT Desktop

Xen machine:
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 256K (64 bytes/line)
CPU: AMD Sempron(tm) 2200+ stepping 01
Memory: 1024M DDR400 CAS 3
NIC: 1Gb/s Intel Pro/1000 MT Desktop

The highest number I'm seeing here is 760Mbps running native linux on
the Xen machine. dom0->external server gets 650Mbps. dom1->external
server is definitely low using the default BVT. I've recently
implemented a Xen scheduler based on Earliest Eligible Virtual
Deadline First, which gives 610Mbps for dom1->external, ~50%
improvement over BVT. I'm still figuring out why.

On 5/23/05, Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> wrote:
> > Overall though these are the kind of results I would expect.
> > Linux usually does csumming at the same time as it has to do
> > a copy anyway, and it ends up being limited by
> > memory/L2-cache bandwidth, not the extra computation. But the
> > offload extensions haven't cost much to implement and there
> > are probably cases where it helps a little.
> >
> > Maybe I'm being pessimistic though: Can you reproduce the
> > rather more impressive speedups that you previously saw, Jon?
> 
> We should be getting some benefit on the receive path, where the
> checksum is normally forced to happen independent of a copy. Having this
> offloaded to hardware should produce some measurable gain.
> 
> Bin: The numbers you're seeing are terrible anyway. You should be seeing
> 890Mb/s for external traffic. What kind of machine is this on?
> 
> Ian
>

* Re: [PATCH] Network Checksum Removal
  2005-05-23 21:01 ` Bin Ren
@ 2005-05-23 21:09   ` Andrew Theurer
  0 siblings, 0 replies; 53+ messages in thread
From: Andrew Theurer @ 2005-05-23 21:09 UTC (permalink / raw)
  To: bin.ren, Bin Ren, Ian Pratt; +Cc: xen-devel, Jon Mason

On Monday 23 May 2005 16:01, Bin Ren wrote:
> Machines spec:
>
> External server:
> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> CPU: L2 Cache: 1024K (64 bytes/line)
> CPU: AMD Athlon(tm) 64 Processor 3200+ stepping 08
> Memory: 1024M DDR400 CAS 3
> NIC: 1Gb/s Intel Pro/1000 MT Desktop
>
> Xen machine:
> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> CPU: L2 Cache: 256K (64 bytes/line)
> CPU: AMD Sempron(tm) 2200+ stepping 01
> Memory: 1024M DDR400 CAS 3
> NIC: 1Gb/s Intel Pro/1000 MT Desktop
>
> The highest number I'm seeing here is 760Mbps running native linux on
> the Xen machine.

This still seems kind of low.  With the netperf TCP_STREAM test I see 940
Mbps, basically wire speed, with somewhere around 30% CPU on a P4 Xeon.
Do you know the CPU utilization for the native Linux test?

-Andrew

* Re: [PATCH] Network Checksum Removal
  2005-05-23 19:55                     ` Bin Ren
  2005-05-23 20:13                       ` Keir Fraser
@ 2005-05-23 21:12                       ` Nivedita Singhvi
  2005-05-23 21:48                         ` Bin Ren
  1 sibling, 1 reply; 53+ messages in thread
From: Nivedita Singhvi @ 2005-05-23 21:12 UTC (permalink / raw)
  To: bin.ren; +Cc: xen-devel, Andrew Theurer, Jon Mason

Bin Ren wrote:
> I've added the support for ethtools. By turning on and off netfront
> checksum offloading, I'm getting the following throughput numbers,
> using iperf. Each test was run three times. CPU usages are quite
> similar in two cases ('top' output). Looks like checksum computation
> is not a major overhead in domU networking.
> 
> dom0/1/2 all have 128M memory. dom0 has e1000 tx checksum offloading turned on.

Yeah, if you want to do anything network intensive, 128MB is just
not enough - you really need more memory in your system.


> With Tx checksum on:
> 
> dom1->dom2: 300Mb/s (dom0 cpu maxed out by software interrupts)
> dom1->dom0: 459Mb/s (dom0 cpu 80% in SI, dom1 cpu 20% in SI)
> dom1->external: 439Mb/s (over 1Gb/s ethernet) (dom0 cpu 50% in SI,
> dom1 60% in SI)
> 
> With Tx checksum off:
> 
> dom1->dom2: 301Mb/s
> dom1->dom0: 454Mb/s
> dom1->external: 437Mb/s (over 1Gb/s ethernet)


iperf is a directional send test, correct?
i.e. is dom1 -> dom0 perf the same as dom0 -> dom1 for you?

thanks,
Nivedita

* Re: [PATCH] Network Checksum Removal
  2005-05-23 21:12                       ` Nivedita Singhvi
@ 2005-05-23 21:48                         ` Bin Ren
  2005-05-23 23:55                           ` Rolf Neugebauer
  0 siblings, 1 reply; 53+ messages in thread
From: Bin Ren @ 2005-05-23 21:48 UTC (permalink / raw)
  To: Nivedita Singhvi; +Cc: Ian Pratt, xen-devel, Andrew Theurer, Jon Mason

On 5/23/05, Nivedita Singhvi <niv@us.ibm.com> wrote:
> Bin Ren wrote:
> > I've added the support for ethtools. By turning on and off netfront
> > checksum offloading, I'm getting the following throughput numbers,
> > using iperf. Each test was run three times. CPU usages are quite
> > similar in two cases ('top' output). Looks like checksum computation
> > is not a major overhead in domU networking.
> >
> > dom0/1/2 all have 128M memory. dom0 has e1000 tx checksum offloading turned on.
> 
> Yeah, if you want to do anything network intensive, 128MB is just
> not enough - you really need more memory in your system.

I've given all the domains 256M memory and switched to netperf
TCP_STREAM (netperf -H server). Almost no change. Details:

dom1->external: 420Mbps
dom1->dom0: 437Mbps
dom0->dom1: 200Mbps (!!!)
dom1->dom2: 327Mbps

>  
> > With Tx checksum on:
> >
> > dom1->dom2: 300Mb/s (dom0 cpu maxed out by software interrupts)
> > dom1->dom0: 459Mb/s (dom0 cpu 80% in SI, dom1 cpu 20% in SI)
> > dom1->external: 439Mb/s (over 1Gb/s ethernet) (dom0 cpu 50% in SI,
> > dom1 60% in SI)
> >
> > With Tx checksum off:
> >
> > dom1->dom2: 301Mb/s
> > dom1->dom0: 454Mb/s
> > dom1->external: 437Mb/s (over 1Gb/s ethernet)
> 
> 
> iperf is a directional send test, correct?
> i.e. is dom1 -> dom0 perf the same as dom0 -> dom1 for you?

Please see above.

* Re: [PATCH] Network Checksum Removal
  2005-05-23 20:13                       ` Keir Fraser
  2005-05-23 20:20                         ` Jon Mason
@ 2005-05-23 21:52                         ` Bin Ren
  2005-05-23 21:58                         ` Jon Mason
  2 siblings, 0 replies; 53+ messages in thread
From: Bin Ren @ 2005-05-23 21:52 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Andrew Theurer, xen-devel, Jon Mason

On 5/23/05, Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:
> What happens to CPU usage in dom1 when tx checksumming is disabled?

dom1->dom0: 70.7% id,  0.0% wa,  1.7% hi, 15.0% si
dom1->external: 20.0% id,  0.0% wa,  0.7% hi, 60.0% si
dom1->dom2: 77.7% id,  0.0% wa,  1.0% hi,  9.3% si

> 
> Overall though these are the kind of results I would expect. Linux
> usually does csumming at the same time as it has to do a copy anyway,
> and it ends up being limited by memory/L2-cache bandwidth, not the
> extra computation. But the offload extensions haven't cost much to
> implement and there are probably cases where it helps a little.
> 
> Maybe I'm being pessimistic though: Can you reproduce the rather more
> impressive speedups that you previously saw, Jon?
> 
>   -- Keir

* Re: [PATCH] Network Checksum Removal
  2005-05-23 20:13                       ` Keir Fraser
  2005-05-23 20:20                         ` Jon Mason
  2005-05-23 21:52                         ` Bin Ren
@ 2005-05-23 21:58                         ` Jon Mason
  2005-05-23 22:05                           ` Bin Ren
  2 siblings, 1 reply; 53+ messages in thread
From: Jon Mason @ 2005-05-23 21:58 UTC (permalink / raw)
  To: xen-devel; +Cc: bin.ren, Andrew Theurer

On Monday 23 May 2005 03:13 pm, Keir Fraser wrote:
> On 23 May 2005, at 20:55, Bin Ren wrote:
> > I've added the support for ethtools. By turning on and off netfront
> > checksum offloading, I'm getting the following throughput numbers,
> > using iperf. Each test was run three times. CPU usages are quite
> > similar in two cases ('top' output). Looks like checksum computation
> > is not a major overhead in domU networking.
> >
> > dom0/1/2 all have 128M memory. dom0 has e1000 tx checksum offloading
> > turned on.
>
> What happens to CPU usage in dom1 when tx checksumming is disabled?
>
> Overall though these are the kind of results I would expect. Linux
> usually does csumming at the same time as it has to do a copy anyway,
> and it ends up being limited by memory/L2-cache bandwidth, not the
> extra computation. But the offload extensions haven't cost much to
> implement and there are probably cases where it helps a little.
>
> Maybe I'm being pessimistic though: Can you reproduce the rather more
> impressive speedups that you previously saw, Jon?

Alright, I broke down and got a BK puller.  

I get the following domU->dom0 throughput on my system (using netperf3 
TCP_STREAM testcase):
tx on	~1580Mbps
tx off	~1230Mbps

with my previous patch (on Friday's build), I was seeing the following:
with patch	~1610Mbps
no patch		~1100Mbps

The slight difference between the two might be caused by the changes that were 
incorporated in xen between those dates.  If you think it is worth the time, 
I can backport the latest patch to Friday's build to see if that makes a 
difference.
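
For comparing the two sets of figures, the relative gain can be worked
out as (on - off) / off. The helper below is only a convenience sketch
with a made-up name, gain_percent(). Applied to the numbers above it
gives roughly 28% for tx on versus tx off (1580 vs 1230) and roughly 46%
for the earlier patch versus no patch (1610 vs 1100).

```c
#include <assert.h>

/* Relative throughput gain of `on` over `off`, in percent. */
static double gain_percent(double on, double off)
{
    assert(off > 0.0);
    return (on - off) / off * 100.0;
}
```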

Thanks,
Jon

* Re: [PATCH] Network Checksum Removal
  2005-05-23 21:58                         ` Jon Mason
@ 2005-05-23 22:05                           ` Bin Ren
  2005-05-23 22:41                             ` Jon Mason
  0 siblings, 1 reply; 53+ messages in thread
From: Bin Ren @ 2005-05-23 22:05 UTC (permalink / raw)
  To: Jon Mason; +Cc: Andrew Theurer, xen-devel

On 5/23/05, Jon Mason <jdmason@us.ibm.com> wrote:
> I get the following domU->dom0 throughput on my system (using netperf3
> TCP_STREAM testcase):
> tx on   ~1580Mbps
> tx off  ~1230Mbps
> 
> with my previous patch (on Friday's build), I was seeing the following:
> with patch      ~1610Mbps
> no patch                ~1100Mbps

I suppose you are running dom0 and dom1 on different CPUs. Is it
possible for you to pin them to the same CPU and get the numbers
again? It'll show how much overhead context switches and a halved
CPU share could incur.

Thanks a lot,
Bin

> The slight difference between the two might be caused by the changes that were
> incorporated in xen between those dates.  If you think it is worth the time,
> I can back port the latest patch to Friday's build to see if that makes a
> difference.
> 
> Thanks,
> Jon

* Re: [PATCH] Network Checksum Removal
  2005-05-23 22:05                           ` Bin Ren
@ 2005-05-23 22:41                             ` Jon Mason
  0 siblings, 0 replies; 53+ messages in thread
From: Jon Mason @ 2005-05-23 22:41 UTC (permalink / raw)
  To: bin.ren; +Cc: Andrew Theurer, xen-devel

On Monday 23 May 2005 05:05 pm, Bin Ren wrote:
> On 5/23/05, Jon Mason <jdmason@us.ibm.com> wrote:
> > I get the following domU->dom0 throughput on my system (using netperf3
> > TCP_STREAM testcase):
> > tx on   ~1580Mbps
> > tx off  ~1230Mbps
> >
> > with my previous patch (on Friday's build), I was seeing the following:
> > with patch      ~1610Mbps
> > no patch                ~1100Mbps
>
> I suppose you are running dom0 and dom1 on different CPUs. Is it

Yes, I am.

> possible for you to pin them to the same CPU and get the numbers
> again? It'll show how much overhead context switches and CPU share
> halved could incur.

I pinned them to the same CPU and got the following:
tx on	~1480Mbps
tx off	~1330Mbps

> > The slight difference between the two might be caused by the changes that
> > were incorporated in xen between those dates.  If you think it is worth
> > the time, I can back port the latest patch to Friday's build to see if
> > that makes a difference.
> >
> > Thanks,
> > Jon
> >

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] Network Checksum Removal
  2005-05-23 21:48                         ` Bin Ren
@ 2005-05-23 23:55                           ` Rolf Neugebauer
  2005-05-24  0:38                             ` Bin Ren
  0 siblings, 1 reply; 53+ messages in thread
From: Rolf Neugebauer @ 2005-05-23 23:55 UTC (permalink / raw)
  To: bin.ren, Nivedita Singhvi; +Cc: Ian Pratt, Andrew Theurer, xen-devel, Jon Mason

These results are pretty bad.

What do you get for dom0->external? That definitely should be close or equal
to native.

Have you tweaked /proc/sys/net/core/rmem_max?
Is the socket buffer set to some large value?
Are you transmitting/receiving enough data?

I don't know netperf but for ttcp I would normally do:

echo 1048576 > /proc/sys/net/core/rmem_max
ttcp -b 65536 (or similar) ...
And then transmit a few gigabytes
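
For anyone who wants to apply the same socket buffer setting outside of ttcp, here is a minimal sketch, assuming Python on a Linux host. The helper names are made up for illustration and are not part of ttcp or netperf. It requests the 65536-byte buffers mentioned above and reads back what the kernel actually granted; the grant is capped by rmem_max/wmem_max, which is why the echo above comes first.

```python
import socket


def open_tuned_socket(requested_bytes=65536):
    """Create a TCP socket and request larger send/receive buffers.

    Hypothetical helper: roughly the per-socket equivalent of `ttcp -b 65536`.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested_bytes)
    return s


def effective_buffers(s):
    """Return the (receive, send) buffer sizes the kernel actually applied."""
    rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return rcv, snd


if __name__ == "__main__":
    sock = open_tuned_socket()
    print("effective rcv/snd buffers:", effective_buffers(sock))
    sock.close()
```

The values read back may differ from the request (Linux reports a doubled figure to account for bookkeeping overhead), so it is worth printing them before trusting a benchmark run.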

What's the interrupt rate etc.

Rolf


On 23/5/05 10:48 pm, "Bin Ren" <bin.ren@gmail.com> wrote:

> On 5/23/05, Nivedita Singhvi <niv@us.ibm.com> wrote:
>> Bin Ren wrote:
>>> I've added support for ethtool. By turning on and off netfront
>>> checksum offloading, I'm getting the following throughput numbers,
>>> using iperf. Each test was run three times. CPU usages are quite
>>> similar in two cases ('top' output). Looks like checksum computation
>>> is not a major overhead in domU networking.
>>> 
>>> dom0/1/2 all have 128M memory. dom0 has e1000 tx checksum offloading turned
>>> on.
>> 
>> Yeah, if you want to do anything network intensive, 128MB is just
>> not enough - you really need more memory in your system.
> 
> I've given all the domains 256M memory and switched to netperf
> TCP_STREAM (netperf -H server). almost no change. Details:
> 
> dom1->external: 420Mbps
> dom1->dom0: 437Mbps
> dom0->dom1: 200Mbps (!!!)
> dom1->dom2: 327Mbps
> 
>>  
>>> With Tx checksum on:
>>> 
>>> dom1->dom2: 300Mb/s (dom0 cpu maxed out by software interrupts)
>>> dom1->dom0: 459Mb/s (dom0 cpu 80% in SI, dom1 cpu 20% in SI)
>>> dom1->external: 439Mb/s (over 1Gb/s ethernet) (dom0 cpu 50% in SI,
>>> dom1 60% in SI)
>>> 
>>> With Tx checksum off:
>>> 
>>> dom1->dom2: 301Mb/s
>>> dom1->dom0: 454Mb/s
>>> dom1->external: 437Mb/s (over 1Gb/s ethernet)
>> 
>> 
>> iperf is a directional send test, correct?
>> i.e. is dom1 -> dom0 perf the same as dom0 -> dom1 for you?
> 
> Please see above.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* RE: [PATCH] Network Checksum Removal
@ 2005-05-23 23:59 Ian Pratt
  2005-05-24 16:12 ` Jon Mason
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Pratt @ 2005-05-23 23:59 UTC (permalink / raw)
  To: Jon Mason, xen-devel; +Cc: Andrew Theurer, bin.ren


> I get the following domU->dom0 throughput on my system (using 
> netperf3 TCP_STREAM testcase):
> tx on	~1580Mbps
> tx off	~1230Mbps
> 
> with my previous patch (on Friday's build), I was seeing the 
> following:
> with patch	~1610Mbps
> no patch		~1100Mbps
> 
> The slight difference between the two might be caused by the 
> changes that were incorporated in xen between those dates.  
> If you think it is worth the time, I can back port the latest 
> patch to Friday's build to see if that makes a difference.

Are you sure these aren't within 'experimental error'? I can't think of
anything that's changed since Friday that could be affecting this, but
it would be good to dig a bit further as the difference in 'no patch'
results is quite significant. 
It might be revealing to try running some results on the unpatched
Fri/Sat/Sun tree. 

BTW, dom0<->domU is not that interesting as I'd generally discourage
people from running services in dom0. I'd be really interested to see
the following tests:

domU <-> external [dom0 on cpu0; dom1 on cpu1]
domU <-> external [dom0 on cpu0; dom1 on cpu0]
domU <-> domU [dom0 on cpu0; dom1 on cpu1; dom2 on cpu2 ** on a 4 way]
domU <-> domU [dom0 on cpu0; dom1 on cpu0; dom2 on cpu0 ]
domU <-> domU [dom0 on cpu0; dom1 on cpu1; dom2 on cpu1 ]
domU <-> domU [dom0 on cpu0; dom1 on cpu0; dom2 on cpu1 ]
domU <-> domU [dom0 on cpu0; dom1 on cpu1; dom2 on cpu2 ** cpu2
hyperthread w/ cpu 0]
domU <-> domU [dom0 on cpu0; dom1 on cpu1; dom2 on cpu3 ** cpu3
hyperthread w/ cpu 1]

This might help us understand the performance of interdomain networking
rather better than we do at present. If you could fill a few of these in
that would be great.

Best,
Ian

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] Network Checksum Removal
  2005-05-23 23:55                           ` Rolf Neugebauer
@ 2005-05-24  0:38                             ` Bin Ren
  0 siblings, 0 replies; 53+ messages in thread
From: Bin Ren @ 2005-05-24  0:38 UTC (permalink / raw)
  To: Rolf Neugebauer; +Cc: Ian Pratt, xen-devel, Andrew Theurer, Jon Mason

On 5/24/05, Rolf Neugebauer <rolf.neugebauer@intel.com> wrote:
> These results are pretty bad.
> 
> What do you get for dom0->external? That definitely should be close or equal
> to native.

With default BVT, dom0->external gets 643Mbps. Native gets 744Mbps.

> Have you tweaked /proc/sys/net/core/rmem_max?

No. I once did Linux tcp tuning on native linux and increased the
throughput to around 810Mbps. But it's not very stable and
occasionally produced weird behaviors so I turned off tuning on both
server and client.

> Is the socket buffer set to some large value?

Both sender and receiver buffers are 32K.

> Are you transmitting/receiving enough data?

Each test lasts 50 seconds, transmitting around 3GB of data.

> 
> I don't know netperf but for ttcp I would normally do:
> 
> echo 1048576 > /proc/sys/net/core/rmem_max
> ttcp -b 65536 (or similar) ...
> And then transmit a few gigabytes
> 
> What's the interrupt rate etc.

Haven't noticed yet. I'll get you the number tomorrow.

What I'm currently really obsessed with is that (1) dom1->external with
default BVT gives only ~400Mbps, while (2) dom1->external with my EEVDF
scheduler (everything else exactly the same) gives 610Mbps, very
close to dom0->external. Judging by the scheduler latency histograms, the
gap seems to be caused by *far too frequent* context switches in BVT. I'm
still digging.

Thanks a lot,
Bin

> 
> Rolf
> 
> 
> On 23/5/05 10:48 pm, "Bin Ren" <bin.ren@gmail.com> wrote:
> 
> > On 5/23/05, Nivedita Singhvi <niv@us.ibm.com> wrote:
> >> Bin Ren wrote:
> >>> I've added support for ethtool. By turning on and off netfront
> >>> checksum offloading, I'm getting the following throughput numbers,
> >>> using iperf. Each test was run three times. CPU usages are quite
> >>> similar in two cases ('top' output). Looks like checksum computation
> >>> is not a major overhead in domU networking.
> >>>
> >>> dom0/1/2 all have 128M memory. dom0 has e1000 tx checksum offloading turned
> >>> on.
> >>
> >> Yeah, if you want to do anything network intensive, 128MB is just
> >> not enough - you really need more memory in your system.
> >
> > I've given all the domains 256M memory and switched to netperf
> > TCP_STREAM (netperf -H server). almost no change. Details:
> >
> > dom1->external: 420Mbps
> > dom1->dom0: 437Mbps
> > dom0->dom1: 200Mbps (!!!)
> > dom1->dom2: 327Mbps
> >
> >>
> >>> With Tx checksum on:
> >>>
> >>> dom1->dom2: 300Mb/s (dom0 cpu maxed out by software interrupts)
> >>> dom1->dom0: 459Mb/s (dom0 cpu 80% in SI, dom1 cpu 20% in SI)
> >>> dom1->external: 439Mb/s (over 1Gb/s ethernet) (dom0 cpu 50% in SI,
> >>> dom1 60% in SI)
> >>>
> >>> With Tx checksum off:
> >>>
> >>> dom1->dom2: 301Mb/s
> >>> dom1->dom0: 454Mb/s
> >>> dom1->external: 437Mb/s (over 1Gb/s ethernet)
> >>
> >>
> >> iperf is a directional send test, correct?
> >> i.e. is dom1 -> dom0 perf the same as dom0 -> dom1 for you?
> >
> > Please see above.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* RE: [PATCH] Network Checksum Removal
@ 2005-05-24  1:22 Ian Pratt
  2005-05-24  1:35 ` Bin Ren
  2005-05-24 22:54 ` Bin Ren
  0 siblings, 2 replies; 53+ messages in thread
From: Ian Pratt @ 2005-05-24  1:22 UTC (permalink / raw)
  To: bin.ren, Rolf Neugebauer; +Cc: xen-devel, Andrew Theurer, Jon Mason

> What I'm currently really obsessed with is (1) 
> dom1->external with default BVT gives only ~400Mbps (2) 
> dom1->external with my EEVDF scheduler (everything else is 
> exactly the same) gives 610Mbps, very close to 
> dom0->external. With scheduler latency histograms, it seems 
> to be caused by *far too frequent* context switches in BVT. 
> I'm still digging.

Have you tried SEDF? I'm itching to make it the default scheduler...

Ian

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] Network Checksum Removal
  2005-05-24  1:22 Ian Pratt
@ 2005-05-24  1:35 ` Bin Ren
  2005-05-24 22:54 ` Bin Ren
  1 sibling, 0 replies; 53+ messages in thread
From: Bin Ren @ 2005-05-24  1:35 UTC (permalink / raw)
  To: Ian Pratt; +Cc: xen-devel, Andrew Theurer, Rolf Neugebauer, Jon Mason

Not yet. I'll give it a shot tomorrow and post the numbers here.

Cheers,
Bin

On 5/24/05, Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> wrote:
> > What I'm currently really obsessed with is (1)
> > dom1->external with default BVT gives only ~400Mbps (2)
> > dom1->external with my EEVDF scheduler (everything else is
> > exactly the same) gives 610Mbps, very close to
> > dom0->external. With scheduler latency histograms, it seems
> > to be caused by *far too frequent* context switches in BVT.
> > I'm still digging.
> 
> Have you tried SEDF? I'm itching to make it the default scheduler...
> 
> Ian
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] Network Checksum Removal
  2005-05-23 23:59 Ian Pratt
@ 2005-05-24 16:12 ` Jon Mason
  2005-05-24 20:45   ` Andrew Theurer
  0 siblings, 1 reply; 53+ messages in thread
From: Jon Mason @ 2005-05-24 16:12 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Pratt, Andrew Theurer, bin.ren

On Monday 23 May 2005 06:59 pm, Ian Pratt wrote:
> > I get the following domU->dom0 throughput on my system (using
> > netperf3 TCP_STREAM testcase):
> > tx on	~1580Mbps
> > tx off	~1230Mbps
> >
> > with my previous patch (on Friday's build), I was seeing the
> > following:
> > with patch	~1610Mbps
> > no patch		~1100Mbps
> >
> > The slight difference between the two might be caused by the
> > changes that were incorporated in xen between those dates.
> > If you think it is worth the time, I can back port the latest
> > patch to Friday's build to see if that makes a difference.
>
> Are you sure these aren't within 'experimental error'? I can't think of
> anything that's changed since Friday that could be affecting this, but
> it would be good to dig a bit further as the difference in 'no patch'
> results is quite significant.

The "tx off" is probably higher because of the offloading for the rx (in both 
the netback not checksumming and the physical ethernet checksum verification 
being passed to domU).

I'm not sure why "tx on" is lower than my previous tests.  It could be 
something outside the patch which has been incorporated, or it could be 
something in the patch that was committed.  The changelog patch diff was 
truncated, so I will have to create a diff to apply to my Friday tree to see 
if the problem lies in the latter.  

> It might be revealing to try running some results on the unpatched
> Fri/Sat/Sun tree.
>
> BTW, dom0<->domU is not that interesting as I'd generally discourage
> people from running services in dom0. 

That is why I designed the checksum offload patch the way I did: there were 
other ways that would give significantly better domU->dom0 communication (but 
would require significantly more calculation in dom0).

> I'd be really interested to see 
> the following tests:
>
> domU <-> external [dom0 on cpu0; dom1 on cpu1]
> domU <-> external [dom0 on cpu0; dom1 on cpu0]
> domU <-> domU [dom0 on cpu0; dom1 on cpu1; dom2 on cpu2 ** on a 4 way]
> domU <-> domU [dom0 on cpu0; dom1 on cpu0; dom2 on cpu0 ]
> domU <-> domU [dom0 on cpu0; dom1 on cpu1; dom2 on cpu1 ]
> domU <-> domU [dom0 on cpu0; dom1 on cpu0; dom2 on cpu1 ]
> domU <-> domU [dom0 on cpu0; dom1 on cpu1; dom2 on cpu2 ** cpu2
> hyperthread w/ cpu 0]
> domU <-> domU [dom0 on cpu0; dom1 on cpu1; dom2 on cpu3 ** cpu3
> hyperthread w/ cpu 1]
>
> This might help us understand the performance of interdomain networking
> rather better than we do at present. If you could fill a few of these in
> that would be great.

I wish I had all the hardware you describe ;-)

My tests are running on a pentium4 (which has hyperthreading, which shows up 
as 2 cpus).  dom0 was on cpu0 and domU was on cpu1.  I'll be happy to run 
netperf on the hardware I have.

Thanks,
Jon

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] Network Checksum Removal
  2005-05-24 16:12 ` Jon Mason
@ 2005-05-24 20:45   ` Andrew Theurer
  2005-05-25 14:38     ` Andrew Theurer
  0 siblings, 1 reply; 53+ messages in thread
From: Andrew Theurer @ 2005-05-24 20:45 UTC (permalink / raw)
  To: Jon Mason, xen-devel; +Cc: Ian Pratt, bin.ren

First round of test results for netperf2:  I will also run domU->domU, 
and all of these tests again with domains on different cpus (these are 
all on the same HW thread).

The cpu util is from xc_domain_get_cpu_usage(), not sar, vmstat, etc (I 
am not confident those are accurate for Xen right now).

DomU cpu util is about 2% lower on domU->host, which is about the % time 
spent in csum_partial_copy based on a timer int based oprofile.  Not 
sure why dom0 uses that extra 2% cpu, and we see maybe 1% throughput 
increase in our best cases.  I do think cpu util in dom0 is the biggest 
problem right now.  On this same box, we might use 30% of one cpu total 
to max out this Gbps adapter (tg3).  Adding ~60% cpu to just "proxy" 
this network seems like a lot.

DomU to dom0 is quite good, 13% better in the best case.

Also note the horrible throughput rates for 64 byte messages, most 
likely due to excessive context switching.

Also, BTW, this is the "old" bridge networking, no veth0 used yet.

-Andrew

3.2 GHz Xeon with Hyperthreading, 1GB memory

Benchmark: netperf2 -T TCP_STREAM

dom0 and dom1 on cpu0 (first SMT thread on first core)
 domU to host
  "hw" tx csum
   msg-size: 00064  Mbps: 0186  d0-cpu: 49.38  d1-cpu: 44.35
   msg-size: 01500  Mbps: 0917  d0-cpu: 62.13  d1-cpu: 37.87
   msg-size: 16384  Mbps: 0933  d0-cpu: 66.63  d1-cpu: 33.37
   msg-size: 32768  Mbps: 0928  d0-cpu: 66.96  d1-cpu: 32.66
  sw tx csum
   msg-size: 00064  Mbps: 0187  d0-cpu: 49.50  d1-cpu: 44.52
   msg-size: 01500  Mbps: 0904  d0-cpu: 60.63  d1-cpu: 39.36
   msg-size: 16384  Mbps: 0924  d0-cpu: 63.98  d1-cpu: 35.98
   msg-size: 32768  Mbps: 0926  d0-cpu: 64.18  d1-cpu: 35.68
 domU to dom0
  "hw" tx csum
   msg-size: 00064  Mbps: 0014  d0-cpu: 64.02  d1-cpu: 31.71
   msg-size: 01500  Mbps: 1087  d0-cpu: 63.34  d1-cpu: 36.67
   msg-size: 16384  Mbps: 1204  d0-cpu: 67.30  d1-cpu: 32.71
   msg-size: 32768  Mbps: 1148  d0-cpu: 68.08  d1-cpu: 31.93
  sw tx csum
   msg-size: 00064  Mbps: 0014  d0-cpu: 64.88  d1-cpu: 32.39
   msg-size: 01500  Mbps: 0948  d0-cpu: 62.20  d1-cpu: 37.80
   msg-size: 16384  Mbps: 1063  d0-cpu: 64.73  d1-cpu: 35.27
   msg-size: 32768  Mbps: 1012  d0-cpu: 65.71  d1-cpu: 34.30
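
To make the comparison in the table easier to check, here is a small sketch in Python that computes the relative throughput gain of "hw" tx csum over sw tx csum for the domU to dom0 rows. The numbers are copied from the table above; the function and variable names are illustrative only. The 64-byte row is left out because both cases report 14 Mbps.

```python
# Throughput figures (Mbps) copied from the "domU to dom0" rows above:
# message size -> ("hw" tx csum, sw tx csum).
DOMU_TO_DOM0 = {
    1500: (1087, 948),
    16384: (1204, 1063),
    32768: (1148, 1012),
}


def pct_gain(offload_mbps, software_mbps):
    """Relative throughput gain of offload over software checksumming, in percent."""
    return (offload_mbps - software_mbps) / software_mbps * 100.0


def gains(table):
    """Map each message size to its gain, rounded to one decimal place."""
    return {size: round(pct_gain(hw, sw), 1) for size, (hw, sw) in table.items()}


if __name__ == "__main__":
    for size, gain in gains(DOMU_TO_DOM0).items():
        print(f"msg-size {size}: {gain:+.1f}%")
```

The same helper can be pointed at the domU to host rows, where the differences are within roughly 1-2%.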

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] Network Checksum Removal
  2005-05-24  1:22 Ian Pratt
  2005-05-24  1:35 ` Bin Ren
@ 2005-05-24 22:54 ` Bin Ren
  1 sibling, 0 replies; 53+ messages in thread
From: Bin Ren @ 2005-05-24 22:54 UTC (permalink / raw)
  To: Ian Pratt; +Cc: xen-devel, Andrew Theurer, Rolf Neugebauer, Jon Mason

On 5/24/05, Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> wrote:
> > What I'm currently really obsessed with is (1)
> > dom1->external with default BVT gives only ~400Mbps (2)
> > dom1->external with my EEVDF scheduler (everything else is
> > exactly the same) gives 610Mbps, very close to
> > dom0->external. With scheduler latency histograms, it seems
> > to be caused by *far too frequent* context switches in BVT.
> > I'm still digging.
> 
> Have you tried SEDF? I'm itching to make it the default scheduler...
> 
> Ian

The following numbers are all for dom1->external. Each test runs for 50
seconds. dom0 and dom1 share one CPU.

With default SEDF, throughput is even worse than with default BVT: 318Mbps
(down from 410Mbps). I guess, without having looked at the source code,
that under default SEDF dom0 and dom1 each get 50% of the CPU. I tweaked
their CPU shares and got the following:

dom1 60%: 493Mbps
dom1 70%: 371Mbps
dom1 80%: 243Mbps

After these tests, dom0 /proc/interrupts is:

           CPU0
 14:      11148        Phys-irq  ide0
 15:          2        Phys-irq  ide1
 16:    1722970        Phys-irq  eth0
 21:          0        Phys-irq  uhci_hcd, uhci_hcd, uhci_hcd, uhci_hcd
256:          5     Dynamic-irq  ctrl-if
257:      92682     Dynamic-irq  timer0
258:         35     Dynamic-irq  console
259:          0     Dynamic-irq  net-be-dbg
260:       4842     Dynamic-irq  blkif-backend
261:    2943112     Dynamic-irq  vif1.0
NMI:          0
LOC:          0
ERR:          0
MIS:          0

dom1 /proc/interrupts is:

           CPU0
256:        474     Dynamic-irq  ctrl-if
257:      45584     Dynamic-irq  timer0
258:       5158     Dynamic-irq  blkif
259:    1097273     Dynamic-irq  eth0
NMI:          0
ERR:          0

SEDF doesn't work well out of the box, and parameter tuning is tricky for
driver domains.

- Bin

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] Network Checksum Removal
  2005-05-24 20:45   ` Andrew Theurer
@ 2005-05-25 14:38     ` Andrew Theurer
  0 siblings, 0 replies; 53+ messages in thread
From: Andrew Theurer @ 2005-05-25 14:38 UTC (permalink / raw)
  To: Jon Mason, xen-devel; +Cc: Ian Pratt, bin.ren

Tests for domU->dom0, domU->host, and domU->domU are completed:

3.2 GHz Xeon with Hyperthreading, 2GB (correction) memory

Benchmark: netperf2 -T TCP_STREAM

dom0, dom1, and dom2 on cpu0 (first SMT thread on first core)
 domU to host
  hw tx csum
   msg-size: 00064  Mbps: 0186  d0-cpu: 49.38  d1-cpu: 44.35
   msg-size: 01500  Mbps: 0917  d0-cpu: 62.13  d1-cpu: 37.87
   msg-size: 16384  Mbps: 0933  d0-cpu: 66.63  d1-cpu: 33.37
   msg-size: 32768  Mbps: 0928  d0-cpu: 66.96  d1-cpu: 32.66
  sw tx csum
   msg-size: 00064  Mbps: 0187  d0-cpu: 49.50  d1-cpu: 44.52
   msg-size: 01500  Mbps: 0904  d0-cpu: 60.63  d1-cpu: 39.36
   msg-size: 16384  Mbps: 0924  d0-cpu: 63.98  d1-cpu: 35.98
   msg-size: 32768  Mbps: 0926  d0-cpu: 64.18  d1-cpu: 35.68
	^^about 2% reduction in cpu util on dom1^^
 domU to dom0
  hw tx csum
   msg-size: 00064  Mbps: 0014  d0-cpu: 64.02  d1-cpu: 31.71
   msg-size: 01500  Mbps: 1087  d0-cpu: 63.34  d1-cpu: 36.67
   msg-size: 16384  Mbps: 1204  d0-cpu: 67.30  d1-cpu: 32.71
   msg-size: 32768  Mbps: 1148  d0-cpu: 68.08  d1-cpu: 31.93
  sw tx csum
   msg-size: 00064  Mbps: 0014  d0-cpu: 64.88  d1-cpu: 32.39
   msg-size: 01500  Mbps: 0948  d0-cpu: 62.20  d1-cpu: 37.80
   msg-size: 16384  Mbps: 1063  d0-cpu: 64.73  d1-cpu: 35.27
   msg-size: 32768  Mbps: 1012  d0-cpu: 65.71  d1-cpu: 34.30
	^^up to 13% throughput increase with cpu util down ~2% on dom1^^
	  Note the dismal performance for very small msg sizes
 domU to domU
  hw tx csum
   msg-size:00064 Mbps: 0359  d0-cpu: 27.85  d1-cpu: 53.68 d2-cpu: 18.48
   msg-size:01500 Mbps: 0594  d0-cpu: 47.42  d1-cpu: 21.77 d2-cpu: 30.78
   msg-size:16384 Mbps: 0619  d0-cpu: 49.66  d1-cpu: 18.81 d2-cpu: 31.53
   msg-size:32768 Mbps: 0616  d0-cpu: 49.58  d1-cpu: 18.68 d2-cpu: 31.74
  sw tx csum
   msg-size:00064 Mbps: 0361  d0-cpu: 27.81  d1-cpu: 53.58 d2-cpu: 18.62
   msg-size:01500 Mbps: 0584  d0-cpu: 46.22  d1-cpu: 23.18 d2-cpu: 30.60
   msg-size:16384 Mbps: 0602  d0-cpu: 47.99  d1-cpu: 20.33 d2-cpu: 31.69
   msg-size:32768 Mbps: 0603  d0-cpu: 47.67  d1-cpu: 20.59 d2-cpu: 31.74
	^^About a 2% throughput increase, and cpu down on d1
	  The cpu wasted on dom1 should be enough justification for
	  domU<->domU communication with point to point front end driver
	  communication.  
dom0 on cpu0, dom1 on cpu2, and dom2 on cpu3 (dom1 and dom2 on same 
core)
 domU to host
  hw tx csum
   msg-size: 00064  Mbps: 0540  d0-cpu: 92.98  d1-cpu: 100.00
   msg-size: 01500  Mbps: 0941  d0-cpu: 99.74  d1-cpu: 48.62
   msg-size: 16384  Mbps: 0941  d0-cpu: 99.71  d1-cpu: 43.32
   msg-size: 32768  Mbps: 0941  d0-cpu: 99.72  d1-cpu: 43.21
  sw tx csum
   msg-size: 00064  Mbps: 0545  d0-cpu: 93.47  d1-cpu: 100.00
   msg-size: 01500  Mbps: 0941  d0-cpu: 99.76  d1-cpu: 51.43
   msg-size: 16384  Mbps: 0941  d0-cpu: 99.69  d1-cpu: 46.58
   msg-size: 32768  Mbps: 0941  d0-cpu: 99.72  d1-cpu: 45.39
	^^Finally at wire speed, but at a cost of 100% cpu on dom0
	  This cpu util seems excessive, maybe oprofile will show
	  some problems.  Notice dom1 has ~2% lower cpu.
 domU to dom0
  hw tx csum
   msg-size: 00064  Mbps: 0390  d0-cpu: 97.92  d1-cpu: 100.00
   msg-size: 01500  Mbps: 1571  d0-cpu: 97.36  d1-cpu: 54.83
   msg-size: 16384  Mbps: 1582  d0-cpu: 96.20  d1-cpu: 49.93
   msg-size: 32768  Mbps: 1596  d0-cpu: 96.32  d1-cpu: 49.63
  sw tx csum
   msg-size: 00064  Mbps: 0375  d0-cpu: 97.65  d1-cpu: 100.00 
   msg-size: 01500  Mbps: 1546  d0-cpu: 96.36  d1-cpu: 52.99
   msg-size: 16384  Mbps: 1598  d0-cpu: 95.88  d1-cpu: 47.48
   msg-size: 32768  Mbps: 1641  d0-cpu: 95.89  d1-cpu: 46.37 
	^^very slightly better avg throughput, and lower cpu on dom1
 domU to domU
  hw tx csum
   msg-size:00064 Mbps: 0287  d0-cpu: 84.97  d1-cpu: 100.0 d2-cpu: 75.46
   msg-size:01500 Mbps: 1004  d0-cpu: 90.98  d1-cpu: 68.29 d2-cpu: 76.94
   msg-size:16384 Mbps: 1018  d0-cpu: 89.78  d1-cpu: 60.82 d2-cpu: 78.12
   msg-size:32768 Mbps: 1010  d0-cpu: 89.30  d1-cpu: 59.83 d2-cpu: 77.99
  sw tx csum
   msg-size:00064 Mbps: 0286  d0-cpu: 84.81  d1-cpu: 99.93 d2-cpu: 76.28
   msg-size:01500 Mbps: 1018  d0-cpu: 91.30  d1-cpu: 67.27 d2-cpu: 75.08
   msg-size:16384 Mbps: 1012  d0-cpu: 88.46  d1-cpu: 55.56 d2-cpu: 71.37
   msg-size:32768 Mbps: 1017  d0-cpu: 88.33  d1-cpu: 54.96 d2-cpu: 70.96
	^^about same throughput, but ~4% lower cpu on d1
	  Again, point to point front end comms would be great here.


IMO, I think the patch is a good thing.  There are other very major 
issues with networking, like the massive cpu overhead for dom0.  I 
wonder if we could have a layer 2 networking model like:

-Xen has front end ethernet drivers only
-dom0 has a Xen bridge front end driver, just to put eth0 (or whatever 
phys dev) on it.
-no domain hosted bridge device or backend ethernet drivers
 
With this, Xen acts as an ethernet "switch", switching ethernet traffic 
in xen itself, without the help of a domain hosted bridge.  Packets are 
forwarded to either a domain's front end driver, or the front end 
bridge interface in dom0 (or any other driver domain).  With this we 
may have better control of emulating offload functions, and we should 
avoid some hops (and in many cases involving dom0) for the network 
traffic.  Comments?

-Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* RE: [PATCH] Network Checksum Removal
@ 2005-05-25 16:48 Ian Pratt
  2005-05-25 17:13 ` Jon Mason
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Pratt @ 2005-05-25 16:48 UTC (permalink / raw)
  To: Andrew Theurer, Jon Mason, xen-devel; +Cc: bin.ren


What does the tx hw csum control actually turn on and off?

I'm surprised there's much benefit to csum offload on the tx side at all
as it's almost always done as part of a copy.

I'd have thought the main benefit of csum offload was on the rx side, so
that packets received by the NIC are hardware csum'ed, passed through
the bridge, and then into the domU where the csum re-calculation is
avoided [it would normally need to be done before the TCP ack is sent,
and can't be done as part of a copy as the data won't be moved out of
the skb until the user app does a read].  The same rx csum check will be
avoided and hence provide benefit to domU <-> domU transfers.

In the figures below, which direction is the data stream heading? (I
presume it's a one way test, like ttcp?)

It's somewhat surprising that the dom0 bridge code is burning so much
CPU. xenoprofile results will be quite interesting to see what functions
are eating the CPU.

Ultimately, the best way of doing domU <-> domU networking will be to
allow point-to-point connections where netfronts are connected direct to
other netfronts if the hosts are on the same machine. However, the
priority for 3.0 is to optimise the normal front-back-bridge-back-front
path.

Thanks,
Ian

> -----Original Message-----
> From: Andrew Theurer [mailto:habanero@us.ibm.com] 
> Sent: 25 May 2005 15:39
> To: Jon Mason; xen-devel@lists.xensource.com
> Cc: Ian Pratt; bin.ren@cl.cam.ac.uk
> Subject: Re: [Xen-devel] [PATCH] Network Checksum Removal
> 
> Tests for domU->dom0, domU->host, and domU->domU are completed:
> 
> 3.2 GHz Xeon with Hyperthreading, 2GB (correction) memory
>
> Benchmark: netperf2 -T TCP_STREAM
>
> dom0, dom1, and dom2 on cpu0 (first SMT thread on first core)
>  domU to host
>   hw tx csum
>    msg-size: 00064  Mbps: 0186  d0-cpu: 49.38  d1-cpu: 44.35
>    msg-size: 01500  Mbps: 0917  d0-cpu: 62.13  d1-cpu: 37.87
>    msg-size: 16384  Mbps: 0933  d0-cpu: 66.63  d1-cpu: 33.37
>    msg-size: 32768  Mbps: 0928  d0-cpu: 66.96  d1-cpu: 32.66
>   sw tx csum
>    msg-size: 00064  Mbps: 0187  d0-cpu: 49.50  d1-cpu: 44.52
>    msg-size: 01500  Mbps: 0904  d0-cpu: 60.63  d1-cpu: 39.36
>    msg-size: 16384  Mbps: 0924  d0-cpu: 63.98  d1-cpu: 35.98
>    msg-size: 32768  Mbps: 0926  d0-cpu: 64.18  d1-cpu: 35.68
> 	^^about 2% reduction in cpu util on dom1^^
>  domU to dom0
>   hw tx csum
>    msg-size: 00064  Mbps: 0014  d0-cpu: 64.02  d1-cpu: 31.71
>    msg-size: 01500  Mbps: 1087  d0-cpu: 63.34  d1-cpu: 36.67
>    msg-size: 16384  Mbps: 1204  d0-cpu: 67.30  d1-cpu: 32.71
>    msg-size: 32768  Mbps: 1148  d0-cpu: 68.08  d1-cpu: 31.93
>   sw tx csum
>    msg-size: 00064  Mbps: 0014  d0-cpu: 64.88  d1-cpu: 32.39
>    msg-size: 01500  Mbps: 0948  d0-cpu: 62.20  d1-cpu: 37.80
>    msg-size: 16384  Mbps: 1063  d0-cpu: 64.73  d1-cpu: 35.27
>    msg-size: 32768  Mbps: 1012  d0-cpu: 65.71  d1-cpu: 34.30
> 	^^up to 13% throughput increase with cpu util down ~2% on dom1^^
> 	  Note the dismal performance for very small msg sizes
>  domU to domU
>   hw tx csum
>    msg-size:00064 Mbps: 0359  d0-cpu: 27.85  d1-cpu: 53.68 d2-cpu: 18.48
>    msg-size:01500 Mbps: 0594  d0-cpu: 47.42  d1-cpu: 21.77 d2-cpu: 30.78
>    msg-size:16384 Mbps: 0619  d0-cpu: 49.66  d1-cpu: 18.81 d2-cpu: 31.53
>    msg-size:32768 Mbps: 0616  d0-cpu: 49.58  d1-cpu: 18.68 d2-cpu: 31.74
>   sw tx csum
>    msg-size:00064 Mbps: 0361  d0-cpu: 27.81  d1-cpu: 53.58 d2-cpu: 18.62
>    msg-size:01500 Mbps: 0584  d0-cpu: 46.22  d1-cpu: 23.18 d2-cpu: 30.60
>    msg-size:16384 Mbps: 0602  d0-cpu: 47.99  d1-cpu: 20.33 d2-cpu: 31.69
>    msg-size:32768 Mbps: 0603  d0-cpu: 47.67  d1-cpu: 20.59 d2-cpu: 31.74
> 	^^About a 2% throughput increase, and cpu down on d1
> 	  The cpu wasted on dom1 should be enough justification for
> 	  domU<->domU communication with point to point front end driver
> 	  communication.  
> dom0 on cpu0, dom1 on cpu2, and dom2 on cpu3 (dom1 and dom2 on same 
> core)
>  domU to host
>   hw tx csum
>    msg-size: 00064  Mbps: 0540  d0-cpu: 92.98  d1-cpu: 100.00
>    msg-size: 01500  Mbps: 0941  d0-cpu: 99.74  d1-cpu: 48.62
>    msg-size: 16384  Mbps: 0941  d0-cpu: 99.71  d1-cpu: 43.32
>    msg-size: 32768  Mbps: 0941  d0-cpu: 99.72  d1-cpu: 43.21
>   sw tx csum
>    msg-size: 00064  Mbps: 0545  d0-cpu: 93.47  d1-cpu: 100.00
>    msg-size: 01500  Mbps: 0941  d0-cpu: 99.76  d1-cpu: 51.43
>    msg-size: 16384  Mbps: 0941  d0-cpu: 99.69  d1-cpu: 46.58
>    msg-size: 32768  Mbps: 0941  d0-cpu: 99.72  d1-cpu: 45.39
> 	^^Finally at wire speed, but at a cost of 100% cpu on dom0
> 	  This cpu util seems excessive, maybe oprofile will show
> 	  some problems.  Notice dom1 has ~2% lower cpu.
>  domU to dom0
>   hw tx csum
>    msg-size: 00064  Mbps: 0390  d0-cpu: 97.92  d1-cpu: 100.00
>    msg-size: 01500  Mbps: 1571  d0-cpu: 97.36  d1-cpu: 54.83
>    msg-size: 16384  Mbps: 1582  d0-cpu: 96.20  d1-cpu: 49.93
>    msg-size: 32768  Mbps: 1596  d0-cpu: 96.32  d1-cpu: 49.63
>   sw tx csum
>    msg-size: 00064  Mbps: 0375  d0-cpu: 97.65  d1-cpu: 100.00 
>    msg-size: 01500  Mbps: 1546  d0-cpu: 96.36  d1-cpu: 52.99
>    msg-size: 16384  Mbps: 1598  d0-cpu: 95.88  d1-cpu: 47.48
>    msg-size: 32768  Mbps: 1641  d0-cpu: 95.89  d1-cpu: 46.37 
> 	^^very slightly better avg throughput, and lower cpu on dom1
>  domU to domU
>   hw tx csum
>    msg-size:00064 Mbps: 0287  d0-cpu: 84.97  d1-cpu: 100.0 d2-cpu: 75.46
>    msg-size:01500 Mbps: 1004  d0-cpu: 90.98  d1-cpu: 68.29 d2-cpu: 76.94
>    msg-size:16384 Mbps: 1018  d0-cpu: 89.78  d1-cpu: 60.82 d2-cpu: 78.12
>    msg-size:32768 Mbps: 1010  d0-cpu: 89.30  d1-cpu: 59.83 d2-cpu: 77.99
>   sw tx csum
>    msg-size:00064 Mbps: 0286  d0-cpu: 84.81  d1-cpu: 99.93 d2-cpu: 76.28
>    msg-size:01500 Mbps: 1018  d0-cpu: 91.30  d1-cpu: 67.27 d2-cpu: 75.08
>    msg-size:16384 Mbps: 1012  d0-cpu: 88.46  d1-cpu: 55.56 d2-cpu: 71.37
>    msg-size:32768 Mbps: 1017  d0-cpu: 88.33  d1-cpu: 54.96 d2-cpu: 70.96
> 	^^about same throughput, but ~4% lower cpu on d1
> 	  Again, point to point front end comms would be great here.
> 
> 
> IMO, I think the patch is a good thing.  There are other very major 
> issues with networking, like the massive cpu overhead for dom0.  I 
> wonder if we could have a layer 2 networking model like:
> 
> -Xen has front end ethernet drivers only
> -dom0 has a Xen bridge front end driver, just to put eth0 (or whatever
> phys dev) on it.
> -no domain hosted bridge device or backend ethernet drivers
>
> With this, Xen acts as an ethernet "switch", switching ethernet traffic
> in xen itself, without the help of a domain hosted bridge.  Packets are
> forwarded to either a domain's front end driver, or the front end
> bridge interface in dom0 (or any other driver domain).  With this we
> may have better control of emulating offload functions, and we should
> avoid some hops (and in many cases involving dom0) for the network
> traffic.  Comments?
>
> -Andrew

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] Network Checksum Removal
  2005-05-25 16:48 Ian Pratt
@ 2005-05-25 17:13 ` Jon Mason
  2005-05-25 18:19   ` Nivedita Singhvi
  0 siblings, 1 reply; 53+ messages in thread
From: Jon Mason @ 2005-05-25 17:13 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Pratt, Andrew Theurer, bin.ren

On Wednesday 25 May 2005 11:48 am, Ian Pratt wrote:
> What does the tx hw csum control actually turn on and off?

The tx hw csum control tells the TCP/IP stack whether or not to checksum
outgoing packets in software.  If tx checksum offload is enabled, the stack
will not checksum them in software.
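
To make that decision concrete, here is a rough userspace sketch (not kernel
code).  The names `struct fake_dev`, `struct fake_pkt`, `FEAT_TX_CSUM`,
`sw_checksum` and `tx_prepare` are invented for illustration; the only part
taken from the discussion is that the stack consults the device's feature
flags before deciding whether to checksum in software.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bit standing in for "tx checksum offload enabled". */
#define FEAT_TX_CSUM 0x1u

struct fake_dev {
    unsigned int features;
};

struct fake_pkt {
    const uint8_t *data;
    size_t len;
    uint16_t csum;        /* set only when the stack checksums in software */
    bool csum_offloaded;  /* true when the checksum is left to the device */
};

/* 16-bit ones'-complement Internet checksum (RFC 1071 style). */
uint16_t sw_checksum(const uint8_t *p, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;   /* pad the odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16); /* fold carries back in */
    return (uint16_t)~sum;
}

/* Checksum in software only if the device does not offer tx offload. */
void tx_prepare(const struct fake_dev *dev, struct fake_pkt *pkt)
{
    if (dev->features & FEAT_TX_CSUM) {
        pkt->csum_offloaded = true;
        pkt->csum = 0;
    } else {
        pkt->csum_offloaded = false;
        pkt->csum = sw_checksum(pkt->data, pkt->len);
    }
}
```

Toggling the (hypothetical) feature bit is all that changes between the
"hw tx csum" and "sw tx csum" paths in this sketch.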

> I'm surprised there's much benefit to csum offload on the tx side at all
> as its almost always done as part of a copy.

Why?  The tx checksumming is just as expensive as the rx checksumming.

> I'd have thought the main benefit of csum offload was on the rx side, so
> that packets received by the NIC are hardware csum'ed, passed through
> the bridge, and then into the domU where the csum re-calculation is
> avoided [it would normally need to be done before the TCP ack is sent,
> and can't be done as part of a copy as the data won't be moved out of
> the skb until the user app does a read].  The same rx csum check will be
> avoided and hence provide benefit to domU <-> domU transfers.

I can add an ethtool feature to disable rx checksum offload (so that domU will 
verify the checksum in software).

> In the figures below, which direction is the data stream heading? (I
> presume it's a one way test, like ttcp?)
>
> It's somewhat surprising that the dom0 bridge code is burning so much
> CPU. xenoprofile results will be quite interesting to see what functions
> are eating the CPU.

There is a patch on netdev which can decrease the CPU load of bridging.  
Specifically, it allows the bridge device to take advantage of the network 
device features (like hardware checksum offload).  Stephen Hemminger says it 
should go in the 2.6.13 kernel.  

> Ultimately, the best way of doing domU <-> domU networking will be to
> allow point-to-point connections where netfronts are connected direct to
> other netfronts if the hosts are on the same machine. However, the
> priority for 3.0 is to optimise the normal front-back-bridge-back-front
> path.
>
> Thanks,
> Ian
>
> > -----Original Message-----
> > From: Andrew Theurer [mailto:habanero@us.ibm.com]
> > Sent: 25 May 2005 15:39
> > To: Jon Mason; xen-devel@lists.xensource.com
> > Cc: Ian Pratt; bin.ren@cl.cam.ac.uk
> > Subject: Re: [Xen-devel] [PATCH] Network Checksum Removal
> >
> > Tests for domU->dom0, domU->host, and domU->domU are completed:
> >
> > 3.2 GHz Xeon with Hyperthreading, 2GB (correction) memory
> >
> >
> > Benchmark: netperf2 -T TCP_STREAM
> >
> >
> > dom0, dom1, and dom2 on cpu0 (first SMT thread on first core)
> >  domU to host
> >   hw tx csum
> >    msg-size: 00064  Mbps: 0186  d0-cpu: 49.38  d1-cpu: 44.35
> >    msg-size: 01500  Mbps: 0917  d0-cpu: 62.13  d1-cpu: 37.87
> >    msg-size: 16384  Mbps: 0933  d0-cpu: 66.63  d1-cpu: 33.37
> >    msg-size: 32768  Mbps: 0928  d0-cpu: 66.96  d1-cpu: 32.66
> >   sw tx csum
> >    msg-size: 00064  Mbps: 0187  d0-cpu: 49.50  d1-cpu: 44.52
> >    msg-size: 01500  Mbps: 0904  d0-cpu: 60.63  d1-cpu: 39.36
> >    msg-size: 16384  Mbps: 0924  d0-cpu: 63.98  d1-cpu: 35.98
> >    msg-size: 32768  Mbps: 0926  d0-cpu: 64.18  d1-cpu: 35.68
> > 	^^about 2% reduction in cpu util on dom1^^
> >  domU to dom0
> >   hw tx csum
> >    msg-size: 00064  Mbps: 0014  d0-cpu: 64.02  d1-cpu: 31.71
> >    msg-size: 01500  Mbps: 1087  d0-cpu: 63.34  d1-cpu: 36.67
> >    msg-size: 16384  Mbps: 1204  d0-cpu: 67.30  d1-cpu: 32.71
> >    msg-size: 32768  Mbps: 1148  d0-cpu: 68.08  d1-cpu: 31.93
> >   sw tx csum
> >    msg-size: 00064  Mbps: 0014  d0-cpu: 64.88  d1-cpu: 32.39
> >    msg-size: 01500  Mbps: 0948  d0-cpu: 62.20  d1-cpu: 37.80
> >    msg-size: 16384  Mbps: 1063  d0-cpu: 64.73  d1-cpu: 35.27
> >    msg-size: 32768  Mbps: 1012  d0-cpu: 65.71  d1-cpu: 34.30
> > 	^^up to 13% throughput increase with cpu util down ~2% on dom1^^
> > 	  Note the dismal performance for very small msg sizes
> >  domU to domU
> >   hw tx csum
> >    msg-size:00064 Mbps: 0359  d0-cpu: 27.85  d1-cpu: 53.68  d2-cpu: 18.48
> >    msg-size:01500 Mbps: 0594  d0-cpu: 47.42  d1-cpu: 21.77  d2-cpu: 30.78
> >    msg-size:16384 Mbps: 0619  d0-cpu: 49.66  d1-cpu: 18.81  d2-cpu: 31.53
> >    msg-size:32768 Mbps: 0616  d0-cpu: 49.58  d1-cpu: 18.68  d2-cpu: 31.74
> >   sw tx csum
> >    msg-size:00064 Mbps: 0361  d0-cpu: 27.81  d1-cpu: 53.58  d2-cpu: 18.62
> >    msg-size:01500 Mbps: 0584  d0-cpu: 46.22  d1-cpu: 23.18  d2-cpu: 30.60
> >    msg-size:16384 Mbps: 0602  d0-cpu: 47.99  d1-cpu: 20.33  d2-cpu: 31.69
> >    msg-size:32768 Mbps: 0603  d0-cpu: 47.67  d1-cpu: 20.59  d2-cpu: 31.74
> > 	^^About a 2% throughput increase, and cpu down on d1
> > 	  The cpu wasted on dom1 should be enough justification for
> > 	  domU<->domU communication with point to point front end driver
> > 	  communication.
> > dom0 on cpu0, dom1 on cpu2, and dom2 on cpu3 (dom1 and dom2 on same
> > core)
> >  domU to host
> >   hw tx csum
> >    msg-size: 00064  Mbps: 0540  d0-cpu: 92.98  d1-cpu: 100.00
> >    msg-size: 01500  Mbps: 0941  d0-cpu: 99.74  d1-cpu: 48.62
> >    msg-size: 16384  Mbps: 0941  d0-cpu: 99.71  d1-cpu: 43.32
> >    msg-size: 32768  Mbps: 0941  d0-cpu: 99.72  d1-cpu: 43.21
> >   sw tx csum
> >    msg-size: 00064  Mbps: 0545  d0-cpu: 93.47  d1-cpu: 100.00
> >    msg-size: 01500  Mbps: 0941  d0-cpu: 99.76  d1-cpu: 51.43
> >    msg-size: 16384  Mbps: 0941  d0-cpu: 99.69  d1-cpu: 46.58
> >    msg-size: 32768  Mbps: 0941  d0-cpu: 99.72  d1-cpu: 45.39
> > 	^^Finally at wire speed, but at a cost of 100% cpu on dom0
> > 	  This cpu util seems excessive, maybe oprofile will show
> > 	  some problems.  Notice dom1 has ~2% lower cpu.
> >  domU to dom0
> >   tx csum
> >    msg-size: 00064  Mbps: 0390  d0-cpu: 97.92  d1-cpu: 100.00
> >    msg-size: 01500  Mbps: 1571  d0-cpu: 97.36  d1-cpu: 54.83
> >    msg-size: 16384  Mbps: 1582  d0-cpu: 96.20  d1-cpu: 49.93
> >    msg-size: 32768  Mbps: 1596  d0-cpu: 96.32  d1-cpu: 49.63
> >   sw tx csum
> >    msg-size: 00064  Mbps: 0375  d0-cpu: 97.65  d1-cpu: 100.00
> >    msg-size: 01500  Mbps: 1546  d0-cpu: 96.36  d1-cpu: 52.99
> >    msg-size: 16384  Mbps: 1598  d0-cpu: 95.88  d1-cpu: 47.48
> >    msg-size: 32768  Mbps: 1641  d0-cpu: 95.89  d1-cpu: 46.37
> > 	^^very slightly better avg throughput, and lower cpu on dom1
> >  domU to domU
> >   tx csum
> >    msg-size:00064 Mbps: 0287  d0-cpu: 84.97  d1-cpu: 100.0  d2-cpu: 75.46
> >    msg-size:01500 Mbps: 1004  d0-cpu: 90.98  d1-cpu: 68.29  d2-cpu: 76.94
> >    msg-size:16384 Mbps: 1018  d0-cpu: 89.78  d1-cpu: 60.82  d2-cpu: 78.12
> >    msg-size:32768 Mbps: 1010  d0-cpu: 89.30  d1-cpu: 59.83  d2-cpu: 77.99
> >   sw tx csum
> >    msg-size:00064 Mbps: 0286  d0-cpu: 84.81  d1-cpu: 99.93  d2-cpu: 76.28
> >    msg-size:01500 Mbps: 1018  d0-cpu: 91.30  d1-cpu: 67.27  d2-cpu: 75.08
> >    msg-size:16384 Mbps: 1012  d0-cpu: 88.46  d1-cpu: 55.56  d2-cpu: 71.37
> >    msg-size:32768 Mbps: 1017  d0-cpu: 88.33  d1-cpu: 54.96  d2-cpu: 70.96
> > 	^^about same throughput, but ~4% lower cpu on d1
> > 	  Again, point to point front end comms would be great here.
> >
> >
> > IMO, I think the patch is a good thing.  There are other very major
> > issues with networking, like the massive cpu overhead for dom0.  I
> > wonder if we could have a layer 2 networking model like:
> >
> > -Xen has front end ethernet drivers only
> > -dom0 has a Xen bridge front end driver, just to put eth0 (or whatever
> >  phys dev) on it.
> > -no domain hosted bridge device or backend ethernet drivers
> >
> > With this, Xen acts as an ethernet "switch", switching ethernet traffic
> > in xen itself, without the help of a domain hosted bridge.  Packets are
> > forwarded to either a domain's front end driver, or the front end
> > bridge interface in dom0 (or any other driver domain).  With this we
> > may have better control of emulating offload functions, and we should
> > avoid some hops (and in many cases involving dom0) for the network
> > traffic.  Comments?
> >
> > -Andrew


* Re: [PATCH] Network Checksum Removal
  2005-05-25 17:13 ` Jon Mason
@ 2005-05-25 18:19   ` Nivedita Singhvi
  0 siblings, 0 replies; 53+ messages in thread
From: Nivedita Singhvi @ 2005-05-25 18:19 UTC (permalink / raw)
  To: Jon Mason; +Cc: Ian Pratt, Andrew Theurer, xen-devel, bin.ren

Jon Mason wrote:

>>I'm surprised there's much benefit to csum offload on the tx side at all
>>as its almost always done as part of a copy.
> 
> 
> Why?  The tx checksumming is just as expensive as the rx checksumming.

Normally (i.e. non sendfile() case), on the transmit side, you
have to copy the data from user space to kernel space, and
usually, during this step, you perform the checksum operation
for a few extra instructions - you have to take the hit of
pulling in each byte of data in any case.

So checksum offload on the transmit path _normally_ buys you
no throughput gain, and very slight reduction in CPU utilization.
Of course, for every segment sent out (or bunches thereof),
we get an ack back. But checksumming a TCP header (pure
ack case) is again, fairly trivial (20 bytes).
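
A minimal sketch of the "checksum as part of the copy" idea, in plain
userspace C.  The function name `copy_and_checksum` is made up for this
example and the loop is deliberately naive; it only shows that once each byte
is being pulled in for the copy, accumulating the ones'-complement sum costs a
few extra instructions per word.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes from src to dst and return the 16-bit Internet checksum
 * (ones'-complement of the ones'-complement sum, RFC 1071 style) of the
 * data that was copied.  Hypothetical helper for illustration only.
 */
uint16_t copy_and_checksum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        /* The copy touches every byte anyway ... */
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        /* ... so adding the 16-bit word to the sum is nearly free. */
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }
    if (len & 1) {
        dst[len - 1] = src[len - 1];
        sum += (uint32_t)src[len - 1] << 8; /* pad the odd byte with zero */
    }
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16); /* fold carries back in */
    return (uint16_t)~sum;
}
```

A separate pass over the data (copy first, checksum later) would have to
bring every byte through the CPU a second time, which is the cost that tx
offload avoids only when no such combined copy exists.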


thanks,
Nivedita


* RE: [PATCH] Network Checksum Removal
@ 2005-05-25 20:06 Ian Pratt
  2005-05-25 21:14 ` Keir Fraser
  2005-05-25 21:38 ` Cédric Schieli
  0 siblings, 2 replies; 53+ messages in thread
From: Ian Pratt @ 2005-05-25 20:06 UTC (permalink / raw)
  To: Jon Mason, xen-devel; +Cc: Andrew Theurer, bin.ren

 


> > I'm surprised there's much benefit to csum offload on the tx side at
> > all as its almost always done as part of a copy.
> 
> Why?  The tx checksumming is just as expensive as the rx checksumming.

[Nivedita has already posted a nice explanation.]
 

> There is a patch on netdev which can decrease the CPU load of
> bridging.  Specifically, it allows the bridge device to take advantage
> of the network device features (like hardware checksum
> offload).  Stephen Hemminger says it should go in the 2.6.13 kernel.

Please can you post it as a patch so that we can include it in our
2.6.11 patches directory.

With the patch, csum offload will be much more interesting in the rx
case.

Thanks,
Ian


* Re: [PATCH] Network Checksum Removal
  2005-05-25 20:06 Ian Pratt
@ 2005-05-25 21:14 ` Keir Fraser
  2005-05-25 21:35   ` Jon Mason
  2005-05-25 21:38 ` Cédric Schieli
  1 sibling, 1 reply; 53+ messages in thread
From: Keir Fraser @ 2005-05-25 21:14 UTC (permalink / raw)
  To: Ian Pratt; +Cc: Andrew Theurer, xen-devel, bin.ren, Jon Mason


On 25 May 2005, at 21:06, Ian Pratt wrote:

>> There is a patch on netdev which can decrease the CPU load of
>> bridging.
>> Specifically, it allows the bridge device to take advantage
>> of the network device features (like hardware checksum
>> offload).  Stephen Hemminger says it should go in the 2.6.13 kernel.
>
> Please can you post it as a patch so that we can include it in our
> 2.6.11 patches directory.
>
> With the patch, csum offload will be much more interesting in the rx
> case

The code we already have offloads rx csums both for dom0 and domU's 
(the dom0 traffic has to be received through veth0 though, not the 
bridge device itself).

  -- Keir


* Re: [PATCH] Network Checksum Removal
  2005-05-25 21:14 ` Keir Fraser
@ 2005-05-25 21:35   ` Jon Mason
  2005-05-25 21:40     ` Keir Fraser
  0 siblings, 1 reply; 53+ messages in thread
From: Jon Mason @ 2005-05-25 21:35 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Ian Pratt, Andrew Theurer, xen-devel, bin.ren

On Wednesday 25 May 2005 04:14 pm, Keir Fraser wrote:
> On 25 May 2005, at 21:06, Ian Pratt wrote:
> >> There is a patch on netdev which can decrease the CPU load of
> >> bridging.
> >> Specifically, it allows the bridge device to take advantage
> >> of the network device features (like hardware checksum
> >> offload).  Stephen Hemminger says it should go in the 2.6.13 kernel.
> >
> > Please can you post it as a patch so that we can include it in our
> > 2.6.11 patches directory.
> >
> > With the patch, csum offload will be much more interesting in the rx
> > case
>
> The code we already have offloads rx csums both for dom0 and domU's
> (the dom0 traffic has to be received through veth0 though, not the
> bridge device itself).

The problem with the bridge device is that all traffic generated in dom0 will 
be software checksummed, regardless of whether it needs to be or not.  In 
Xen's case, it will software checksum all traffic to domU, even though the 
vif device is advertising NETIF_F_NO_CSUM.  

This is because the stack doesn't see the features of the children devices of 
the bridge, only the features of the bridge device itself.  I created a quick 
hack to work around this, and started the discussion on the Linux netdev 
mailing list about how to fix the problem.  From this discussion, a patch was 
created which does most of what we want, but needs to be slightly modified to 
be optimal for Xen.  I will post the Xen optimized patch as soon as I have it 
done.  

Thanks,
Jon


* RE: [PATCH] Network Checksum Removal
  2005-05-25 20:06 Ian Pratt
  2005-05-25 21:14 ` Keir Fraser
@ 2005-05-25 21:38 ` Cédric Schieli
  2005-05-25 21:47   ` Keir Fraser
  1 sibling, 1 reply; 53+ messages in thread
From: Cédric Schieli @ 2005-05-25 21:38 UTC (permalink / raw)
  To: xen-devel

Hello

It seems this patch breaks something in netfilter.

My setup is a classical bridge (no veth0/vif0.0) plus some stateful
firewalling on Dom0.

With tx offload off and firewall on, pings from Dom0 to DomU work, and ssh
from Dom0 to DomU works.
With tx offload on and firewall off, the same.
With tx offload on and firewall on, ping works but ssh does not.

Here are the iptables rules :

iptables -P INPUT DROP
iptables -A INPUT -p icmp -j ACCEPT
iptables -A INPUT -i xen-br0 -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -P OUTPUT ACCEPT


Here is a capture of vif1.0 :

IP DOM0.2486 > DOM1.22: S
IP DOM1.22 > DOM0.2486: S
IP DOM0.2486 > DOM1.22: . ack 1
IP DOM1.22 > DOM0.2486: P 1:23(22) ack 1
IP DOM1.22 > DOM0.2486: P 1:23(22) ack 1
IP DOM1.22 > DOM0.2486: P 1:23(22) ack 1
IP DOM1.22 > DOM0.2486: P 1:23(22) ack 1
IP DOM1.22 > DOM0.2486: P 1:23(22) ack 1
IP DOM1.22 > DOM0.2486: P 1:23(22) ack 1
IP DOM1.22 > DOM0.2486: P 1:23(22) ack 1
...

The response to the original SYN goes through the third rule, but the
ACKs don't.

I added a rule to log packets with invalid state and the ACKs got
logged.


* Re: [PATCH] Network Checksum Removal
  2005-05-25 21:35   ` Jon Mason
@ 2005-05-25 21:40     ` Keir Fraser
  2005-05-25 23:41       ` Jon Mason
  0 siblings, 1 reply; 53+ messages in thread
From: Keir Fraser @ 2005-05-25 21:40 UTC (permalink / raw)
  To: Jon Mason; +Cc: Ian Pratt, Andrew Theurer, xen-devel, bin.ren


On 25 May 2005, at 22:35, Jon Mason wrote:

> The problem with the bridge device is that all traffic generated in dom0 will
> be software checksummed, regardless of whether it needs to be or not.  In
> Xen's case, it will software checksum all traffic to domU, even though the
> vif device is advertising NETIF_F_NO_CSUM.
>
> This is because the stack doesn't see the features of the children devices of
> the bridge, only the features of the bridge device itself.  I created a quick
> hack to work around this, and started the discussion on the Linux netdev
> mailing list about how to fix the problem.  From this discussion, a patch was
> created which does most of what we want, but needs to be slightly modified to
> be optimal for Xen.  I will post the Xen optimized patch as soon as I have it
> done.

But we no longer bring up an IP interface on the bridge device -- we 
use veth0 instead, which advertises NETIF_F_IP_CSUM.

  -- Keir


* Re: [PATCH] Network Checksum Removal
  2005-05-25 21:38 ` Cédric Schieli
@ 2005-05-25 21:47   ` Keir Fraser
  2005-05-25 21:54     ` Keir Fraser
  0 siblings, 1 reply; 53+ messages in thread
From: Keir Fraser @ 2005-05-25 21:47 UTC (permalink / raw)
  To: Cédric Schieli; +Cc: xen-devel


On 25 May 2005, at 22:38, Cédric Schieli wrote:

> The response from the original SYN goes through the third rule, but the
> ACKs don't.
>
> I added a rule to log packets with invalid state and the ACKs got
> logged.

This may be a hard one to fix. The problem is probably that the packets 
coming from domU haven't been checksummed, so a checksum check will 
fail. We set ip_summed==CHECKSUM_UNNECESSARY, but perhaps the firewall 
code checksums anyway, or the bridge is clobbering ip_summed when it 
locally delivers. :-(

veth0 is careful to preserve CHECKSUM_UNNECESSARY -- it may be worth 
trying it out rather than bringing up your IP interface on the bridge 
device. See tools/examples/network for an example script that brings up 
veth0.

If that doesn't work then I'm not sure there's a clean solution (i.e. 
one that doesn't require hacking the network stack), other than 
disabling checksum offload.

  -- Keir


* Re: [PATCH] Network Checksum Removal
  2005-05-25 21:47   ` Keir Fraser
@ 2005-05-25 21:54     ` Keir Fraser
  0 siblings, 0 replies; 53+ messages in thread
From: Keir Fraser @ 2005-05-25 21:54 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Cédric Schieli, xen-devel


On 25 May 2005, at 22:47, Keir Fraser wrote:

> This may be a hard one to fix. The problem is probably that the 
> packets coming from domU haven't been checksummed, so a checksum check 
> will fail. We set ip_summed==CHECKSUM_UNNECESSARY, but perhaps the 
> firewall code checksums anyway, or the bridge is clobbering ip_summed 
> when it locally delivers. :-(

Perhaps not so hard....

Try modifying tcp_error() in 
net/ipv4/netfilter/ip_conntrack_proto_tcp.c.

Wrap the entire if statement that checks for invalid checksum in:
   if ( skb->ip_summed != CHECKSUM_UNNECESSARY ) {
        <checksum checking code goes here>
   }

I expect this should solve the problem. :-)
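
The shape of that guard, as a standalone userspace sketch rather than the
actual conntrack code: `struct fake_skb`, the `FAKE_CSUM_*` values and
`tcp_csum_invalid` are stand-ins invented here, and the `csum_ok` field simply
pretends to be the result of really verifying the TCP checksum.

```c
#include <stdbool.h>

/* Stand-ins for the kernel's ip_summed states; the values are arbitrary. */
enum fake_ip_summed {
    FAKE_CSUM_NONE,         /* checksum must be verified in software */
    FAKE_CSUM_UNNECESSARY   /* sender/backend vouches for the packet */
};

struct fake_skb {
    enum fake_ip_summed ip_summed;
    bool csum_ok;  /* pretend outcome of a real checksum verification */
};

/*
 * Returns true when connection tracking should treat the packet as
 * invalid because of its checksum.  The verification is skipped
 * entirely when the packet is marked as not needing one.
 */
bool tcp_csum_invalid(const struct fake_skb *skb)
{
    if (skb->ip_summed != FAKE_CSUM_UNNECESSARY) {
        if (!skb->csum_ok)
            return true;
    }
    return false;
}
```

With this guard, a packet from domU that carries no computed checksum but is
marked as not needing verification is no longer classified as invalid, while
ordinary packets with a bad checksum still are.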

  -- Keir


* Re: [PATCH] Network Checksum Removal
  2005-05-25 21:40     ` Keir Fraser
@ 2005-05-25 23:41       ` Jon Mason
  2005-05-26  8:07         ` Keir Fraser
  0 siblings, 1 reply; 53+ messages in thread
From: Jon Mason @ 2005-05-25 23:41 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Pratt, bin.ren, Andrew Theurer

On Wednesday 25 May 2005 04:40 pm, Keir Fraser wrote:
> On 25 May 2005, at 22:35, Jon Mason wrote:
> > The problem with the bridge device is that all traffic generated in dom0
> > will be software checksummed, regardless of whether it needs to be or not.
> > In Xen's case, it will software checksum all traffic to domU, even though
> > the vif device is advertising NETIF_F_NO_CSUM.
> >
> > This is because the stack doesn't see the features of the children devices
> > of the bridge, only the features of the bridge device itself.  I created a
> > quick hack to work around this, and started the discussion on the Linux
> > netdev mailing list about how to fix the problem.  From this discussion, a
> > patch was created which does most of what we want, but needs to be slightly
> > modified to be optimal for Xen.  I will post the Xen optimized patch as
> > soon as I have it done.
>
> But we no longer bring up an IP interface on the bridge device -- we
> use veth0 instead, which advertises NETIF_F_IP_CSUM.

The bridge device still is the device that the stack sees, and uses its 
features to determine what to do during transmission.  If you monitor the 
skb->ip_summed flag going into netif_be_start_xmit(), you will see that it is 
0 (meaning that the stack did the checksum in software).  Now if you add the 
following patch to the bridging device, you will notice that ip_summed is now 
being used.

--- net/bridge/br_device.c.orig 2005-05-13 11:23:02.552751024 -0500
+++ net/bridge/br_device.c      2005-05-13 11:25:39.155943720 -0500
@@ -101,4 +101,5 @@ void br_dev_setup(struct net_device *dev
        dev->tx_queue_len = 0;
        dev->set_mac_address = NULL;
        dev->priv_flags = IFF_EBRIDGE;
+       dev->features = NETIF_F_HW_CSUM | NETIF_F_SG;
 }

This patch oversimplifies what needs to be done, but it provides the general 
idea and speedup that we are looking for.
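
One way to go beyond a hard-coded feature mask, sketched here purely as an
assumption (the thread does not say how the netdev patch actually does it),
is to have the bridge advertise only the features that every attached port
supports.  The `FAKE_F_*` bits and `bridge_common_features` below are
hypothetical names in a userspace sketch, not kernel symbols.

```c
#include <stddef.h>

/* Hypothetical feature bits, loosely modelled on scatter/gather and
 * hardware checksum flags; the values are arbitrary. */
#define FAKE_F_SG      0x1u
#define FAKE_F_HW_CSUM 0x2u

/*
 * Intersect the feature masks of all ports.  A bridge with no ports
 * advertises nothing, so the stack falls back to software checksums.
 */
unsigned int bridge_common_features(const unsigned int *port_features,
                                    size_t nports)
{
    unsigned int common;
    size_t i;

    if (nports == 0)
        return 0;

    common = port_features[0];
    for (i = 1; i < nports; i++)
        common &= port_features[i];
    return common;
}
```

Under this assumption, a single port without checksum support would turn the
feature off for the whole bridge, which is the conservative choice.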

Thanks,
Jon


* Re: [PATCH] Network Checksum Removal
  2005-05-25 23:41       ` Jon Mason
@ 2005-05-26  8:07         ` Keir Fraser
  2005-05-26 13:37           ` Jon Mason
  0 siblings, 1 reply; 53+ messages in thread
From: Keir Fraser @ 2005-05-26  8:07 UTC (permalink / raw)
  To: Jon Mason; +Cc: Ian Pratt, Andrew Theurer, xen-devel, bin.ren


On 26 May 2005, at 00:41, Jon Mason wrote:

> The bridge device still is the device that the stack sees, and uses its
> features to determine what to do during transmission.  If you monitor the
> skb->ip_summed flag going into netif_be_start_xmit(), you will see that it is
> 0 (meaning that the stack did the checksum in software).  Now if you add the
> following patch to the bridging device, you will notice that ip_summed is now
> being used.

For local traffic transmitted via veth0, ip_summed is zero 
(CHECKSUM_NONE) at netif_be_start_xmit because the bridge forwarding 
code nobbles the ip_summed field. It does *not* checksum the packet: 
etherbridge never checksums packets it forwards because that is the 
destination's job (it's an end-to-end checksum at the protocol level).

If you transmit local traffic directly on the bridge device then yes, 
you need a patch because it does not advertise NETIF_F_*_CSUM.

  -- Keir


* Re: [PATCH] Network Checksum Removal
  2005-05-26  8:07         ` Keir Fraser
@ 2005-05-26 13:37           ` Jon Mason
  0 siblings, 0 replies; 53+ messages in thread
From: Jon Mason @ 2005-05-26 13:37 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Ian Pratt, Andrew Theurer, xen-devel, bin.ren

On Thursday 26 May 2005 03:07 am, Keir Fraser wrote:
> On 26 May 2005, at 00:41, Jon Mason wrote:
> > The bridge device still is the device that the stack sees, and uses its
> > features to determine what to do during transmission.  If you monitor the
> > skb->ip_summed flag going into netif_be_start_xmit(), you will see that it
> > is 0 (meaning that the stack did the checksum in software).  Now if you add
> > the following patch to the bridging device, you will notice that ip_summed
> > is now being used.
>
> For local traffic transmitted via veth0, ip_summed is zero
> (CHECKSUM_NONE) at netif_be_start_xmit because the bridge forwarding
> code nobbles the ip_summed field. It does *not* checksum the packet:
> etherbridge never checksums packets it forwards because that is the
> destination's job (it's an end-to-end checksum at the protocol level).
>
> If you transmit local traffic directly on the bridge device then yes,
> you need a patch because it does not advertise NETIF_F_*_CSUM.

That is exactly the case I am referring to (sorry for the confusion).  The 
issue is not Xen specific (which is why I addressed the issue on the Linux 
networking mailing list), but Xen will see a boost when using the patch.

Thanks,
Jon

