* [PATCH 1/4] [RFC] netdevice: Introduce per-txq xmit_restart
2011-05-04 14:02 [PATCH 0/4] [RFC] virtio-net: Improve small packet performance Krishna Kumar
@ 2011-05-04 14:03 ` Krishna Kumar
2011-05-04 14:03 ` [PATCH 2/4] [RFC] virtio: Introduce new API to get free space Krishna Kumar
` (3 subsequent siblings)
4 siblings, 0 replies; 26+ messages in thread
From: Krishna Kumar @ 2011-05-04 14:03 UTC (permalink / raw)
To: davem; +Cc: eric.dumazet, kvm, mst, netdev, rusty, Krishna Kumar
Add a per-txq field that can (optionally) be set by participating
drivers to indicate when to restart tx.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
include/linux/netdevice.h | 1 +
1 file changed, 1 insertion(+)
diff -ruNp org/include/linux/netdevice.h new/include/linux/netdevice.h
--- org/include/linux/netdevice.h 2011-05-04 18:57:06.000000000 +0530
+++ new/include/linux/netdevice.h 2011-05-04 18:57:09.000000000 +0530
@@ -571,6 +571,7 @@ struct netdev_queue {
* please use this field instead of dev->trans_start
*/
unsigned long trans_start;
+ unsigned long xmit_restart_jiffies; /* jiffies to restart */
} ____cacheline_aligned_in_smp;
static inline int netdev_queue_numa_node_read(const struct netdev_queue *q)
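For illustration only (patch 3/4 below is the real user), a minimal
sketch of how a participating driver might use the new field; the
driver and foo_ring_has_room() are hypothetical:
static netdev_tx_t foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct netdev_queue *txq = netdev_get_tx_queue(dev, 0);

	if (!foo_ring_has_room(dev)) {		/* hypothetical helper */
		/* Ask the stack to retry this txq after 1 jiffy. */
		txq->xmit_restart_jiffies = jiffies + 1;
		return NETDEV_TX_BUSY;
	}
	/* ... map and queue the skb as usual ... */
	return NETDEV_TX_OK;
}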
* [PATCH 2/4] [RFC] virtio: Introduce new API to get free space
2011-05-04 14:02 [PATCH 0/4] [RFC] virtio-net: Improve small packet performance Krishna Kumar
2011-05-04 14:03 ` [PATCH 1/4] [RFC] netdevice: Introduce per-txq xmit_restart Krishna Kumar
@ 2011-05-04 14:03 ` Krishna Kumar
2011-05-04 14:50 ` Michael S. Tsirkin
2011-05-04 19:58 ` Michael S. Tsirkin
2011-05-04 14:03 ` [PATCH 3/4] [RFC] virtio-net: Changes to virtio-net driver Krishna Kumar
` (2 subsequent siblings)
4 siblings, 2 replies; 26+ messages in thread
From: Krishna Kumar @ 2011-05-04 14:03 UTC (permalink / raw)
To: davem; +Cc: eric.dumazet, kvm, mst, netdev, rusty, Krishna Kumar
Introduce virtqueue_get_capacity() to help bail out of the
transmit path early. Also remove the notification when we run out
of space (I am not sure if this should be under a feature bit).
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
drivers/virtio/virtio_ring.c | 13 ++++++++-----
include/linux/virtio.h | 5 +++++
2 files changed, 13 insertions(+), 5 deletions(-)
diff -ruNp org/include/linux/virtio.h new/include/linux/virtio.h
--- org/include/linux/virtio.h 2011-05-04 18:57:06.000000000 +0530
+++ new/include/linux/virtio.h 2011-05-04 18:57:09.000000000 +0530
@@ -27,6 +27,9 @@ struct virtqueue {
/**
* operations for virtqueue
+ * virtqueue_get_capacity: Get vq capacity
+ * vq: the struct virtqueue we're talking about.
+ * Returns remaining capacity of queue
* virtqueue_add_buf: expose buffer to other end
* vq: the struct virtqueue we're talking about.
* sg: the description of the buffer(s).
@@ -62,6 +65,8 @@ struct virtqueue {
* All operations can be called in any context.
*/
+int virtqueue_get_capacity(struct virtqueue *vq);
+
int virtqueue_add_buf_gfp(struct virtqueue *vq,
struct scatterlist sg[],
unsigned int out_num,
diff -ruNp org/drivers/virtio/virtio_ring.c new/drivers/virtio/virtio_ring.c
--- org/drivers/virtio/virtio_ring.c 2011-05-04 18:57:06.000000000 +0530
+++ new/drivers/virtio/virtio_ring.c 2011-05-04 18:57:09.000000000 +0530
@@ -156,6 +156,14 @@ static int vring_add_indirect(struct vri
return head;
}
+int virtqueue_get_capacity(struct virtqueue *_vq)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ return vq->num_free;
+}
+EXPORT_SYMBOL_GPL(virtqueue_get_capacity);
+
int virtqueue_add_buf_gfp(struct virtqueue *_vq,
struct scatterlist sg[],
unsigned int out,
@@ -185,11 +193,6 @@ int virtqueue_add_buf_gfp(struct virtque
if (vq->num_free < out + in) {
pr_debug("Can't add buf len %i - avail = %i\n",
out + in, vq->num_free);
- /* FIXME: for historical reasons, we force a notify here if
- * there are outgoing parts to the buffer. Presumably the
- * host should service the ring ASAP. */
- if (out)
- vq->notify(&vq->vq);
END_USE(vq);
return -ENOSPC;
}
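For context, a hedged sketch of the intended caller side, assuming
no indirect descriptors (as patch 3/4 does); foo_try_add() is
invented, the virtio calls are the ones above:
static int foo_try_add(struct virtqueue *vq, struct scatterlist sg[],
		       unsigned int out, unsigned int in, void *data)
{
	/* Bail out before touching the ring if the request cannot fit. */
	if (out + in > virtqueue_get_capacity(vq))
		return -ENOSPC;		/* caller requeues and retries */

	return virtqueue_add_buf_gfp(vq, sg, out, in, data, GFP_ATOMIC);
}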
* Re: [PATCH 2/4] [RFC] virtio: Introduce new API to get free space
2011-05-04 14:03 ` [PATCH 2/4] [RFC] virtio: Introduce new API to get free space Krishna Kumar
@ 2011-05-04 14:50 ` Michael S. Tsirkin
2011-05-04 20:00 ` Michael S. Tsirkin
2011-05-04 19:58 ` Michael S. Tsirkin
1 sibling, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2011-05-04 14:50 UTC (permalink / raw)
To: Krishna Kumar; +Cc: davem, eric.dumazet, kvm, netdev, rusty
On Wed, May 04, 2011 at 07:33:19PM +0530, Krishna Kumar wrote:
> Introduce virtqueue_get_capacity() to help bail out of the
> transmit path early. Also remove the notification when we run out
> of space (I am not sure if this should be under a feature bit).
>
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> ---
> drivers/virtio/virtio_ring.c | 13 ++++++++-----
> include/linux/virtio.h | 5 +++++
> 2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff -ruNp org/include/linux/virtio.h new/include/linux/virtio.h
> --- org/include/linux/virtio.h 2011-05-04 18:57:06.000000000 +0530
> +++ new/include/linux/virtio.h 2011-05-04 18:57:09.000000000 +0530
> @@ -27,6 +27,9 @@ struct virtqueue {
>
> /**
> * operations for virtqueue
> + * virtqueue_get_capacity: Get vq capacity
> + * vq: the struct virtqueue we're talking about.
> + * Returns remaining capacity of queue
> * virtqueue_add_buf: expose buffer to other end
> * vq: the struct virtqueue we're talking about.
> * sg: the description of the buffer(s).
> @@ -62,6 +65,8 @@ struct virtqueue {
> * All operations can be called in any context.
> */
>
> +int virtqueue_get_capacity(struct virtqueue *vq);
> +
> int virtqueue_add_buf_gfp(struct virtqueue *vq,
> struct scatterlist sg[],
> unsigned int out_num,
This is the same as what Shirley sent?
Maybe split it out and attribute it ...
> diff -ruNp org/drivers/virtio/virtio_ring.c new/drivers/virtio/virtio_ring.c
> --- org/drivers/virtio/virtio_ring.c 2011-05-04 18:57:06.000000000 +0530
> +++ new/drivers/virtio/virtio_ring.c 2011-05-04 18:57:09.000000000 +0530
> @@ -156,6 +156,14 @@ static int vring_add_indirect(struct vri
> return head;
> }
>
> +int virtqueue_get_capacity(struct virtqueue *_vq)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> +
> + return vq->num_free;
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_get_capacity);
> +
> int virtqueue_add_buf_gfp(struct virtqueue *_vq,
> struct scatterlist sg[],
> unsigned int out,
> @@ -185,11 +193,6 @@ int virtqueue_add_buf_gfp(struct virtque
> if (vq->num_free < out + in) {
> pr_debug("Can't add buf len %i - avail = %i\n",
> out + in, vq->num_free);
> - /* FIXME: for historical reasons, we force a notify here if
> - * there are outgoing parts to the buffer. Presumably the
> - * host should service the ring ASAP. */
> - if (out)
> - vq->notify(&vq->vq);
> END_USE(vq);
> return -ENOSPC;
> }
This will break qemu versions 0.13 and back.
I'm adding some new virtio ring flags, we'll be
able to reuse one of these to mean 'no need for a
workaround', I think.
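One way to express that (a sketch of the ENOSPC path above; the
feature bit name is invented, not an existing virtio flag):
	/* Keep the historical notify for hosts that have not acked a
	 * (hypothetical) 'no workaround needed' feature bit. */
	if (out && !virtio_has_feature(vq->vq.vdev, VIRTIO_F_NO_NOTIFY_ON_FULL))
		vq->notify(&vq->vq);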
--
MST
* Re: [PATCH 2/4] [RFC] virtio: Introduce new API to get free space
2011-05-04 14:50 ` Michael S. Tsirkin
@ 2011-05-04 20:00 ` Michael S. Tsirkin
2011-05-05 3:08 ` Krishna Kumar2
2011-05-05 9:13 ` Michael S. Tsirkin
0 siblings, 2 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2011-05-04 20:00 UTC (permalink / raw)
To: Krishna Kumar; +Cc: davem, eric.dumazet, kvm, netdev, rusty
On Wed, May 04, 2011 at 05:50:19PM +0300, Michael S. Tsirkin wrote:
> > @@ -185,11 +193,6 @@ int virtqueue_add_buf_gfp(struct virtque
> > if (vq->num_free < out + in) {
> > pr_debug("Can't add buf len %i - avail = %i\n",
> > out + in, vq->num_free);
> > - /* FIXME: for historical reasons, we force a notify here if
> > - * there are outgoing parts to the buffer. Presumably the
> > - * host should service the ring ASAP. */
> > - if (out)
> > - vq->notify(&vq->vq);
> > END_USE(vq);
> > return -ENOSPC;
> > }
>
> This will break qemu versions 0.13 and back.
> I'm adding some new virtio ring flags, we'll be
> able to reuse one of these to mean 'no need for a
> workaround', I think.
Not really, it won't. We shall almost never get here at all.
But then, why would this help performance?
> --
> MST
* Re: [PATCH 2/4] [RFC] virtio: Introduce new API to get free space
2011-05-04 20:00 ` Michael S. Tsirkin
@ 2011-05-05 3:08 ` Krishna Kumar2
2011-05-05 9:13 ` Michael S. Tsirkin
1 sibling, 0 replies; 26+ messages in thread
From: Krishna Kumar2 @ 2011-05-05 3:08 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: davem, eric.dumazet, kvm, netdev, rusty
"Michael S. Tsirkin" <mst@redhat.com> wrote on 05/05/2011 01:30:23 AM:
> > > @@ -185,11 +193,6 @@ int virtqueue_add_buf_gfp(struct virtque
> > > if (vq->num_free < out + in) {
> > > pr_debug("Can't add buf len %i - avail = %i\n",
> > > out + in, vq->num_free);
> > > - /* FIXME: for historical reasons, we force a notify here if
> > > - * there are outgoing parts to the buffer. Presumably the
> > > - * host should service the ring ASAP. */
> > > - if (out)
> > > - vq->notify(&vq->vq);
> > > END_USE(vq);
> > > return -ENOSPC;
> > > }
> >
> > This will break qemu versions 0.13 and back.
> > I'm adding some new virtio ring flags, we'll be
> > able to reuse one of these to mean 'no need for a
> > workaround', I think.
>
> Not really, it won't. We shall almost never get here at all.
> But then, why would this help performance?
Yes, it is not needed. I will also test without it.
thanks,
- KK
* Re: [PATCH 2/4] [RFC] virtio: Introduce new API to get free space
2011-05-04 20:00 ` Michael S. Tsirkin
2011-05-05 3:08 ` Krishna Kumar2
@ 2011-05-05 9:13 ` Michael S. Tsirkin
1 sibling, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2011-05-05 9:13 UTC (permalink / raw)
To: Krishna Kumar; +Cc: davem, eric.dumazet, kvm, netdev, rusty
On Wed, May 04, 2011 at 11:00:23PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 04, 2011 at 05:50:19PM +0300, Michael S. Tsirkin wrote:
> > > @@ -185,11 +193,6 @@ int virtqueue_add_buf_gfp(struct virtque
> > > if (vq->num_free < out + in) {
> > > pr_debug("Can't add buf len %i - avail = %i\n",
> > > out + in, vq->num_free);
> > > - /* FIXME: for historical reasons, we force a notify here if
> > > - * there are outgoing parts to the buffer. Presumably the
> > > - * host should service the ring ASAP. */
> > > - if (out)
> > > - vq->notify(&vq->vq);
> > > END_USE(vq);
> > > return -ENOSPC;
> > > }
> >
> > This will break qemu versions 0.13 and back.
> > I'm adding some new virtio ring flags, we'll be
> > able to reuse one of these to mean 'no need for a
> > workaround', I think.
>
> Not really, it won't. We shall almost never get here at all.
> But then, why would this help performance?
I think I understand this finally.
By itself, this patch does not help performance and does not
hurt it. But a later patch makes us try to xmit and fail there
instead of doing capacity checks. With *that* patch applied
on top of this one, and with qemu 0.13 and older, performance
will be hurt.
We need to either
- ignore these older hosts
- add a feature bit (or use one of the new ones I added: for example
with avail_event userspace never needs this behaviour as it can
ask to get events when the ring gets full)
- keep doing capacity checks, which will make us almost never get here
> > --
> > MST
* Re: [PATCH 2/4] [RFC] virtio: Introduce new API to get free space
2011-05-04 14:03 ` [PATCH 2/4] [RFC] virtio: Introduce new API to get free space Krishna Kumar
2011-05-04 14:50 ` Michael S. Tsirkin
@ 2011-05-04 19:58 ` Michael S. Tsirkin
1 sibling, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2011-05-04 19:58 UTC (permalink / raw)
To: Krishna Kumar; +Cc: davem, eric.dumazet, kvm, netdev, rusty, mashirle
On Wed, May 04, 2011 at 07:33:19PM +0530, Krishna Kumar wrote:
> @@ -185,11 +193,6 @@ int virtqueue_add_buf_gfp(struct virtque
> if (vq->num_free < out + in) {
> pr_debug("Can't add buf len %i - avail = %i\n",
> out + in, vq->num_free);
> - /* FIXME: for historical reasons, we force a notify here if
> - * there are outgoing parts to the buffer. Presumably the
> - * host should service the ring ASAP. */
> - if (out)
> - vq->notify(&vq->vq);
> END_USE(vq);
> return -ENOSPC;
> }
I thought about it some more. We should typically not get into this
state with the current driver as we check capacity upfront.
So why would this change help performance?
Shirley, any idea?
--
MST
* [PATCH 3/4] [RFC] virtio-net: Changes to virtio-net driver
2011-05-04 14:02 [PATCH 0/4] [RFC] virtio-net: Improve small packet performance Krishna Kumar
2011-05-04 14:03 ` [PATCH 1/4] [RFC] netdevice: Introduce per-txq xmit_restart Krishna Kumar
2011-05-04 14:03 ` [PATCH 2/4] [RFC] virtio: Introduce new API to get free space Krishna Kumar
@ 2011-05-04 14:03 ` Krishna Kumar
2011-05-05 12:28 ` Michael S. Tsirkin
2011-05-04 14:03 ` [PATCH 4/4] [RFC] sched: Changes to dequeue_skb Krishna Kumar
2011-05-04 14:46 ` [PATCH 0/4] [RFC] virtio-net: Improve small packet performance Michael S. Tsirkin
4 siblings, 1 reply; 26+ messages in thread
From: Krishna Kumar @ 2011-05-04 14:03 UTC (permalink / raw)
To: davem; +Cc: eric.dumazet, kvm, mst, netdev, rusty, Krishna Kumar
Changes:
1. Remove xmit notification.
2. free_old_xmit_skbs() frees up to a limit to reduce tx jitter.
3. xmit_skb() precalculates the number of slots and checks if
that many are available. It assumes that we are not using
indirect descriptors at this time.
4. start_xmit() becomes a small routine that removes most error
checks, does not drop packets but instead returns EBUSY if
there is no space to transmit. It also sets when to restart
xmits in the future.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
drivers/net/virtio_net.c | 70 ++++++++++---------------------------
1 file changed, 20 insertions(+), 50 deletions(-)
diff -ruNp org/drivers/net/virtio_net.c new/drivers/net/virtio_net.c
--- org/drivers/net/virtio_net.c 2011-05-04 18:57:06.000000000 +0530
+++ new/drivers/net/virtio_net.c 2011-05-04 18:57:09.000000000 +0530
@@ -117,17 +117,6 @@ static struct page *get_a_page(struct vi
return p;
}
-static void skb_xmit_done(struct virtqueue *svq)
-{
- struct virtnet_info *vi = svq->vdev->priv;
-
- /* Suppress further interrupts. */
- virtqueue_disable_cb(svq);
-
- /* We were probably waiting for more output buffers. */
- netif_wake_queue(vi->dev);
-}
-
static void set_skb_frag(struct sk_buff *skb, struct page *page,
unsigned int offset, unsigned int *len)
{
@@ -509,19 +498,18 @@ again:
return received;
}
-static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
+static inline void free_old_xmit_skbs(struct virtnet_info *vi)
{
struct sk_buff *skb;
- unsigned int len, tot_sgs = 0;
+ unsigned int count = 0, len;
- while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
+ while (count++ < MAX_SKB_FRAGS+2 &&
+ (skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
pr_debug("Sent skb %p\n", skb);
vi->dev->stats.tx_bytes += skb->len;
vi->dev->stats.tx_packets++;
- tot_sgs += skb_vnet_hdr(skb)->num_sg;
dev_kfree_skb_any(skb);
}
- return tot_sgs;
}
static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
@@ -531,6 +519,12 @@ static int xmit_skb(struct virtnet_info
pr_debug("%s: xmit %p %pM\n", vi->dev->name, skb, dest);
+ hdr->num_sg = skb_to_sgvec(skb, vi->tx_sg + 1, 0, skb->len) + 1;
+ if (unlikely(hdr->num_sg > virtqueue_get_capacity(vi->svq))) {
+ /* Don't rely on indirect descriptors when reaching capacity */
+ return -ENOSPC;
+ }
+
if (skb->ip_summed == CHECKSUM_PARTIAL) {
hdr->hdr.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
hdr->hdr.csum_start = skb_checksum_start_offset(skb);
@@ -566,7 +560,6 @@ static int xmit_skb(struct virtnet_info
else
sg_set_buf(vi->tx_sg, &hdr->hdr, sizeof hdr->hdr);
- hdr->num_sg = skb_to_sgvec(skb, vi->tx_sg + 1, 0, skb->len) + 1;
return virtqueue_add_buf(vi->svq, vi->tx_sg, hdr->num_sg,
0, skb);
}
@@ -574,30 +567,21 @@ static int xmit_skb(struct virtnet_info
static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct virtnet_info *vi = netdev_priv(dev);
- int capacity;
/* Free up any pending old buffers before queueing new ones. */
free_old_xmit_skbs(vi);
/* Try to transmit */
- capacity = xmit_skb(vi, skb);
+ if (unlikely(xmit_skb(vi, skb) < 0)) {
+ struct netdev_queue *txq;
- /* This can happen with OOM and indirect buffers. */
- if (unlikely(capacity < 0)) {
- if (net_ratelimit()) {
- if (likely(capacity == -ENOMEM)) {
- dev_warn(&dev->dev,
- "TX queue failure: out of memory\n");
- } else {
- dev->stats.tx_fifo_errors++;
- dev_warn(&dev->dev,
- "Unexpected TX queue failure: %d\n",
- capacity);
- }
- }
- dev->stats.tx_dropped++;
- kfree_skb(skb);
- return NETDEV_TX_OK;
+ /*
+ * Tell kernel to restart xmits after 1 jiffy to help the
+ * host catch up.
+ */
+ txq = netdev_get_tx_queue(dev, 0);
+ txq->xmit_restart_jiffies = jiffies + 1;
+ return NETDEV_TX_BUSY;
}
virtqueue_kick(vi->svq);
@@ -605,20 +589,6 @@ static netdev_tx_t start_xmit(struct sk_
skb_orphan(skb);
nf_reset(skb);
- /* Apparently nice girls don't return TX_BUSY; stop the queue
- * before it gets out of hand. Naturally, this wastes entries. */
- if (capacity < 2+MAX_SKB_FRAGS) {
- netif_stop_queue(dev);
- if (unlikely(!virtqueue_enable_cb(vi->svq))) {
- /* More just got used, free them then recheck. */
- capacity += free_old_xmit_skbs(vi);
- if (capacity >= 2+MAX_SKB_FRAGS) {
- netif_start_queue(dev);
- virtqueue_disable_cb(vi->svq);
- }
- }
- }
-
return NETDEV_TX_OK;
}
@@ -881,7 +851,7 @@ static int virtnet_probe(struct virtio_d
struct net_device *dev;
struct virtnet_info *vi;
struct virtqueue *vqs[3];
- vq_callback_t *callbacks[] = { skb_recv_done, skb_xmit_done, NULL};
+ vq_callback_t *callbacks[] = { skb_recv_done, NULL, NULL};
const char *names[] = { "input", "output", "control" };
int nvqs;
* Re: [PATCH 3/4] [RFC] virtio-net: Changes to virtio-net driver
2011-05-04 14:03 ` [PATCH 3/4] [RFC] virtio-net: Changes to virtio-net driver Krishna Kumar
@ 2011-05-05 12:28 ` Michael S. Tsirkin
0 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2011-05-05 12:28 UTC (permalink / raw)
To: Krishna Kumar; +Cc: davem, eric.dumazet, kvm, netdev, rusty
On Wed, May 04, 2011 at 07:33:32PM +0530, Krishna Kumar wrote:
> Changes:
>
> 1. Remove xmit notification.
> 2. free_old_xmit_skbs() frees up to a limit to reduce tx jitter.
> 3. xmit_skb() precalculates the number of slots and checks if
> that many are available. It assumes that we are not using
> indirect descriptors at this time.
> 4. start_xmit() becomes a small routine that removes most error
> checks, does not drop packets but instead returns EBUSY if
> there is no space to transmit. It also sets when to restart
> xmits in the future.
>
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> ---
> drivers/net/virtio_net.c | 70 ++++++++++---------------------------
> 1 file changed, 20 insertions(+), 50 deletions(-)
>
> diff -ruNp org/drivers/net/virtio_net.c new/drivers/net/virtio_net.c
> --- org/drivers/net/virtio_net.c 2011-05-04 18:57:06.000000000 +0530
> +++ new/drivers/net/virtio_net.c 2011-05-04 18:57:09.000000000 +0530
> @@ -117,17 +117,6 @@ static struct page *get_a_page(struct vi
> return p;
> }
>
> -static void skb_xmit_done(struct virtqueue *svq)
> -{
> - struct virtnet_info *vi = svq->vdev->priv;
> -
> - /* Suppress further interrupts. */
> - virtqueue_disable_cb(svq);
> -
> - /* We were probably waiting for more output buffers. */
> - netif_wake_queue(vi->dev);
> -}
> -
> static void set_skb_frag(struct sk_buff *skb, struct page *page,
> unsigned int offset, unsigned int *len)
> {
> @@ -509,19 +498,18 @@ again:
> return received;
> }
>
> -static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
> +static inline void free_old_xmit_skbs(struct virtnet_info *vi)
> {
> struct sk_buff *skb;
> - unsigned int len, tot_sgs = 0;
> + unsigned int count = 0, len;
>
> - while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> + while (count++ < MAX_SKB_FRAGS+2 &&
> + (skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> pr_debug("Sent skb %p\n", skb);
> vi->dev->stats.tx_bytes += skb->len;
> vi->dev->stats.tx_packets++;
> - tot_sgs += skb_vnet_hdr(skb)->num_sg;
> dev_kfree_skb_any(skb);
> }
> - return tot_sgs;
> }
>
> static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
> @@ -531,6 +519,12 @@ static int xmit_skb(struct virtnet_info
>
> pr_debug("%s: xmit %p %pM\n", vi->dev->name, skb, dest);
>
> + hdr->num_sg = skb_to_sgvec(skb, vi->tx_sg + 1, 0, skb->len) + 1;
> + if (unlikely(hdr->num_sg > virtqueue_get_capacity(vi->svq))) {
> + /* Don't rely on indirect descriptors when reaching capacity */
> + return -ENOSPC;
> + }
> +
This is minor, but when the ring gets full, we are doing extra
work.
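One way to avoid that (a sketch, not from the thread): derive the
slot count from the frag count before building the sg list, so a
full ring costs almost nothing. This assumes no frag_list on the
skb:
	/* linear data + nr_frags fragments + the virtio-net header */
	unsigned int need = skb_shinfo(skb)->nr_frags + 2;

	if (unlikely(need > virtqueue_get_capacity(vi->svq)))
		return -ENOSPC;
	hdr->num_sg = skb_to_sgvec(skb, vi->tx_sg + 1, 0, skb->len) + 1;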
> if (skb->ip_summed == CHECKSUM_PARTIAL) {
> hdr->hdr.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
> hdr->hdr.csum_start = skb_checksum_start_offset(skb);
> @@ -566,7 +560,6 @@ static int xmit_skb(struct virtnet_info
> else
> sg_set_buf(vi->tx_sg, &hdr->hdr, sizeof hdr->hdr);
>
> - hdr->num_sg = skb_to_sgvec(skb, vi->tx_sg + 1, 0, skb->len) + 1;
> return virtqueue_add_buf(vi->svq, vi->tx_sg, hdr->num_sg,
> 0, skb);
> }
> @@ -574,30 +567,21 @@ static int xmit_skb(struct virtnet_info
> static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> {
> struct virtnet_info *vi = netdev_priv(dev);
> - int capacity;
>
> /* Free up any pending old buffers before queueing new ones. */
> free_old_xmit_skbs(vi);
>
> /* Try to transmit */
> - capacity = xmit_skb(vi, skb);
> + if (unlikely(xmit_skb(vi, skb) < 0)) {
> + struct netdev_queue *txq;
>
> - /* This can happen with OOM and indirect buffers. */
> - if (unlikely(capacity < 0)) {
> - if (net_ratelimit()) {
> - if (likely(capacity == -ENOMEM)) {
> - dev_warn(&dev->dev,
> - "TX queue failure: out of memory\n");
> - } else {
> - dev->stats.tx_fifo_errors++;
> - dev_warn(&dev->dev,
> - "Unexpected TX queue failure: %d\n",
> - capacity);
> - }
> - }
> - dev->stats.tx_dropped++;
> - kfree_skb(skb);
> - return NETDEV_TX_OK;
> + /*
> + * Tell kernel to restart xmits after 1 jiffy to help the
> + * host catch up.
> + */
> + txq = netdev_get_tx_queue(dev, 0);
> + txq->xmit_restart_jiffies = jiffies + 1;
> + return NETDEV_TX_BUSY;
> }
> virtqueue_kick(vi->svq);
>
> @@ -605,20 +589,6 @@ static netdev_tx_t start_xmit(struct sk_
> skb_orphan(skb);
> nf_reset(skb);
>
> - /* Apparently nice girls don't return TX_BUSY; stop the queue
> - * before it gets out of hand. Naturally, this wastes entries. */
> - if (capacity < 2+MAX_SKB_FRAGS) {
> - netif_stop_queue(dev);
> - if (unlikely(!virtqueue_enable_cb(vi->svq))) {
> - /* More just got used, free them then recheck. */
> - capacity += free_old_xmit_skbs(vi);
> - if (capacity >= 2+MAX_SKB_FRAGS) {
> - netif_start_queue(dev);
> - virtqueue_disable_cb(vi->svq);
> - }
> - }
> - }
> -
> return NETDEV_TX_OK;
> }
>
> @@ -881,7 +851,7 @@ static int virtnet_probe(struct virtio_d
> struct net_device *dev;
> struct virtnet_info *vi;
> struct virtqueue *vqs[3];
> - vq_callback_t *callbacks[] = { skb_recv_done, skb_xmit_done, NULL};
> + vq_callback_t *callbacks[] = { skb_recv_done, NULL, NULL};
> const char *names[] = { "input", "output", "control" };
> int nvqs;
>
* [PATCH 4/4] [RFC] sched: Changes to dequeue_skb
2011-05-04 14:02 [PATCH 0/4] [RFC] virtio-net: Improve small packet performance Krishna Kumar
` (2 preceding siblings ...)
2011-05-04 14:03 ` [PATCH 3/4] [RFC] virtio-net: Changes to virtio-net driver Krishna Kumar
@ 2011-05-04 14:03 ` Krishna Kumar
2011-05-04 14:46 ` [PATCH 0/4] [RFC] virtio-net: Improve small packet performance Michael S. Tsirkin
4 siblings, 0 replies; 26+ messages in thread
From: Krishna Kumar @ 2011-05-04 14:03 UTC (permalink / raw)
To: davem; +Cc: eric.dumazet, kvm, mst, netdev, rusty, Krishna Kumar
dequeue_skb() has an additional check, for the first packet that
is requeued, to see if the device has asked for xmits to restart
only after an interval. This is intended not to affect the fast
xmit path, and to add minimal overhead to the slow path. Drivers
setting the restart time should not stop/start their tx queues,
and hence the frozen/stopped check can be avoided.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
net/sched/sch_generic.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff -ruNp org/net/sched/sch_generic.c new/net/sched/sch_generic.c
--- org/net/sched/sch_generic.c 2011-05-04 18:57:06.000000000 +0530
+++ new/net/sched/sch_generic.c 2011-05-04 18:57:09.000000000 +0530
@@ -50,17 +50,30 @@ static inline int dev_requeue_skb(struct
return 0;
}
+/*
+ * This function can return a rare false positive for drivers setting
+ * xmit_restart_jiffies (e.g. virtio-net) when xmit_restart_jiffies is
+ * zero but the device may not be ready. That only leads to the skb
+ * being requeued again.
+ */
+static inline int can_restart_xmit(struct Qdisc *q, struct sk_buff *skb)
+{
+ struct net_device *dev = qdisc_dev(q);
+ struct netdev_queue *txq;
+
+ txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));
+ if (unlikely(txq->xmit_restart_jiffies))
+ return time_after_eq(jiffies, txq->xmit_restart_jiffies);
+ return !netif_tx_queue_frozen_or_stopped(txq);
+}
+
static inline struct sk_buff *dequeue_skb(struct Qdisc *q)
{
struct sk_buff *skb = q->gso_skb;
if (unlikely(skb)) {
- struct net_device *dev = qdisc_dev(q);
- struct netdev_queue *txq;
-
/* check the reason of requeuing without tx lock first */
- txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));
- if (!netif_tx_queue_frozen_or_stopped(txq)) {
+ if (can_restart_xmit(q, skb)) {
q->gso_skb = NULL;
q->q.qlen--;
} else
* Re: [PATCH 0/4] [RFC] virtio-net: Improve small packet performance
2011-05-04 14:02 [PATCH 0/4] [RFC] virtio-net: Improve small packet performance Krishna Kumar
` (3 preceding siblings ...)
2011-05-04 14:03 ` [PATCH 4/4] [RFC] sched: Changes to dequeue_skb Krishna Kumar
@ 2011-05-04 14:46 ` Michael S. Tsirkin
2011-05-04 14:59 ` Krishna Kumar2
4 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2011-05-04 14:46 UTC (permalink / raw)
To: Krishna Kumar; +Cc: davem, eric.dumazet, kvm, netdev, rusty
On Wed, May 04, 2011 at 07:32:58PM +0530, Krishna Kumar wrote:
> The earlier approach to improving small packet performance went
> along the lines of dropping packets when the txq is full, to
> avoid stop/start of the txq. Though performance improved
> significantly (up to 3x) for a single thread, multiple netperf
> sessions showed a regression of up to 17% (starting from 4
> sessions).
>
> This patch proposes a different approach with the following
> changes:
>
> A. virtio:
> - Provide an API to get the available number of slots.
>
> B. virtio-net:
> - Remove stop/start txq's and associated callback.
> - Pre-calculate the number of slots needed to transmit
> the skb in xmit_skb and bail out early if enough space
> is not available. My testing shows that 2.5-3% of
> packets benefit from using this API.
> - Do not drop skbs but instead return TX_BUSY like other
> drivers.
> - When returning EBUSY, set a per-txq variable to indicate
> to dev_queue_xmit() whether to restart xmits on this txq.
>
> C. net/sched/sch_generic.c:
> Since virtio-net now returns EBUSY, the skb is requeued to
> gso_skb. This allows adding the additional check for restarting
> xmits in just the slow-path (the first re-queued packet
> case of dequeue_skb, where it checks for gso_skb) before
> deciding whether to call the driver or not.
>
> Patch was also tested between two servers with Emulex OneConnect
> 10G cards to confirm there is no regression. Though the patch is
> an attempt to improve only small packet performance, there was
> improvement for 1K, 2K and also 16K both in BW and SD. Results
> from Guest -> Remote Host (BW in Mbps) for 1K and 16K I/O sizes:
>
> ________________________________________________________
> I/O Size: 1K
> # BW1 BW2 (%) SD1 SD2 (%)
> ________________________________________________________
> 1 1226 3313 (170.2) 6.6 1.9 (-71.2)
> 2 3223 7705 (139.0) 18.0 7.1 (-60.5)
> 4 7223 8716 (20.6) 36.5 29.7 (-18.6)
> 8 8689 8693 (0) 131.5 123.0 (-6.4)
> 16 8059 8285 (2.8) 578.3 506.2 (-12.4)
> 32 7758 7955 (2.5) 2281.4 2244.2 (-1.6)
> 64 7503 7895 (5.2) 9734.0 9424.4 (-3.1)
> 96 7496 7751 (3.4) 21980.9 20169.3 (-8.2)
> 128 7389 7741 (4.7) 40467.5 34995.5 (-13.5)
> ________________________________________________________
> Summary: BW: 16.2% SD: -10.2%
>
> ________________________________________________________
> I/O Size: 16K
> # BW1 BW2 (%) SD1 SD2 (%)
> ________________________________________________________
> 1 6684 7019 (5.0) 1.1 1.1 (0)
> 2 7674 7196 (-6.2) 5.0 4.8 (-4.0)
> 4 7358 8032 (9.1) 21.3 20.4 (-4.2)
> 8 7393 8015 (8.4) 82.7 82.0 (-.8)
> 16 7958 8366 (5.1) 283.2 310.7 (9.7)
> 32 7792 8113 (4.1) 1257.5 1363.0 (8.3)
> 64 7673 8040 (4.7) 5723.1 5812.4 (1.5)
> 96 7462 7883 (5.6) 12731.8 12119.8 (-4.8)
> 128 7338 7800 (6.2) 21331.7 21094.7 (-1.1)
> ________________________________________________________
> Summary: BW: 4.6% SD: -1.5%
>
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> ---
So IIUC, we delay transmit by an arbitrary value and hope
that the host is done with the packets by then?
Interesting.
I am currently testing an approach where
we tell the host explicitly to interrupt us only after
a large part of the queue is empty.
With 256 entries in a queue, we should get on the order of
1 interrupt per 100 packets, which does not seem like a lot.
I can post it, mind testing this?
--
MST
* Re: [PATCH 0/4] [RFC] virtio-net: Improve small packet performance
2011-05-04 14:46 ` [PATCH 0/4] [RFC] virtio-net: Improve small packet performance Michael S. Tsirkin
@ 2011-05-04 14:59 ` Krishna Kumar2
2011-05-04 21:23 ` Michael S. Tsirkin
0 siblings, 1 reply; 26+ messages in thread
From: Krishna Kumar2 @ 2011-05-04 14:59 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: davem, eric.dumazet, kvm, netdev, rusty
"Michael S. Tsirkin" <mst@redhat.com> wrote on 05/04/2011 08:16:22 PM:
> > A. virtio:
> > - Provide an API to get the available number of slots.
> >
> > B. virtio-net:
> > - Remove stop/start txq's and associated callback.
> > - Pre-calculate the number of slots needed to transmit
> > the skb in xmit_skb and bail out early if enough space
> > is not available. My testing shows that 2.5-3% of
> > packets benefit from using this API.
> > - Do not drop skbs but instead return TX_BUSY like other
> > drivers.
> > - When returning EBUSY, set a per-txq variable to indicate
> > to dev_queue_xmit() whether to restart xmits on this txq.
> >
> > C. net/sched/sch_generic.c:
> > Since virtio-net now returns EBUSY, the skb is requeued to
> > gso_skb. This allows adding the additional check for restarting
> > xmits in just the slow-path (the first re-queued packet
> > case of dequeue_skb, where it checks for gso_skb) before
> > deciding whether to call the driver or not.
> >
> > Patch was also tested between two servers with Emulex OneConnect
> > 10G cards to confirm there is no regression. Though the patch is
> > an attempt to improve only small packet performance, there was
> > improvement for 1K, 2K and also 16K both in BW and SD. Results
> > from Guest -> Remote Host (BW in Mbps) for 1K and 16K I/O sizes:
> >
> > ________________________________________________________
> > I/O Size: 1K
> > # BW1 BW2 (%) SD1 SD2 (%)
> > ________________________________________________________
> > 1 1226 3313 (170.2) 6.6 1.9 (-71.2)
> > 2 3223 7705 (139.0) 18.0 7.1 (-60.5)
> > 4 7223 8716 (20.6) 36.5 29.7 (-18.6)
> > 8 8689 8693 (0) 131.5 123.0 (-6.4)
> > 16 8059 8285 (2.8) 578.3 506.2 (-12.4)
> > 32 7758 7955 (2.5) 2281.4 2244.2 (-1.6)
> > 64 7503 7895 (5.2) 9734.0 9424.4 (-3.1)
> > 96 7496 7751 (3.4) 21980.9 20169.3 (-8.2)
> > 128 7389 7741 (4.7) 40467.5 34995.5 (-13.5)
> > ________________________________________________________
> > Summary: BW: 16.2% SD: -10.2%
> >
> > ________________________________________________________
> > I/O Size: 16K
> > # BW1 BW2 (%) SD1 SD2 (%)
> > ________________________________________________________
> > 1 6684 7019 (5.0) 1.1 1.1 (0)
> > 2 7674 7196 (-6.2) 5.0 4.8 (-4.0)
> > 4 7358 8032 (9.1) 21.3 20.4 (-4.2)
> > 8 7393 8015 (8.4) 82.7 82.0 (-.8)
> > 16 7958 8366 (5.1) 283.2 310.7 (9.7)
> > 32 7792 8113 (4.1) 1257.5 1363.0 (8.3)
> > 64 7673 8040 (4.7) 5723.1 5812.4 (1.5)
> > 96 7462 7883 (5.6) 12731.8 12119.8 (-4.8)
> > 128 7338 7800 (6.2) 21331.7 21094.7 (-1.1)
> > ________________________________________________________
> > Summary: BW: 4.6% SD: -1.5%
> >
> > Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> > ---
>
> So IIUC, we delay transmit by an arbitrary value and hope
> that the host is done with the packets by then?
Not "hope" exactly. If the device is not ready, then
the packet is requeued. The main idea is to avoid
drops/stop/starts, etc.
> Interesting.
>
> I am currently testing an approach where
> we tell the host explicitly to interrupt us only after
> a large part of the queue is empty.
> With 256 entries in a queue, we should get on the order of
> 1 interrupt per 100 packets, which does not seem like a lot.
>
> I can post it, mind testing this?
Sure.
- KK
* Re: [PATCH 0/4] [RFC] virtio-net: Improve small packet performance
2011-05-04 14:59 ` Krishna Kumar2
@ 2011-05-04 21:23 ` Michael S. Tsirkin
2011-05-05 8:03 ` Krishna Kumar2
0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2011-05-04 21:23 UTC (permalink / raw)
To: Krishna Kumar2; +Cc: davem, eric.dumazet, kvm, netdev, rusty
On Wed, May 04, 2011 at 08:29:44PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 05/04/2011 08:16:22 PM:
>
> > > A. virtio:
> > > - Provide an API to get the available number of slots.
> > >
> > > B. virtio-net:
> > > - Remove stop/start txq's and associated callback.
> > > - Pre-calculate the number of slots needed to transmit
> > > the skb in xmit_skb and bail out early if enough space
> > > is not available. My testing shows that 2.5-3% of
> > > packets benefit from using this API.
> > > - Do not drop skbs but instead return TX_BUSY like other
> > > drivers.
> > > - When returning EBUSY, set a per-txq variable to indicate
> > > to dev_queue_xmit() whether to restart xmits on this txq.
> > >
> > > C. net/sched/sch_generic.c:
> > > Since virtio-net now returns EBUSY, the skb is requeued to
> > > gso_skb. This allows adding the additional check for restarting
> > > xmits in just the slow-path (the first re-queued packet
> > > case of dequeue_skb, where it checks for gso_skb) before
> > > deciding whether to call the driver or not.
> > >
> > > Patch was also tested between two servers with Emulex OneConnect
> > > 10G cards to confirm there is no regression. Though the patch is
> > > an attempt to improve only small packet performance, there was
> > > improvement for 1K, 2K and also 16K both in BW and SD. Results
> > > from Guest -> Remote Host (BW in Mbps) for 1K and 16K I/O sizes:
> > >
> > > ________________________________________________________
> > > I/O Size: 1K
> > > # BW1 BW2 (%) SD1 SD2 (%)
> > > ________________________________________________________
> > > 1 1226 3313 (170.2) 6.6 1.9 (-71.2)
> > > 2 3223 7705 (139.0) 18.0 7.1 (-60.5)
> > > 4 7223 8716 (20.6) 36.5 29.7 (-18.6)
> > > 8 8689 8693 (0) 131.5 123.0 (-6.4)
> > > 16 8059 8285 (2.8) 578.3 506.2 (-12.4)
> > > 32 7758 7955 (2.5) 2281.4 2244.2 (-1.6)
> > > 64 7503 7895 (5.2) 9734.0 9424.4 (-3.1)
> > > 96 7496 7751 (3.4) 21980.9 20169.3 (-8.2)
> > > 128 7389 7741 (4.7) 40467.5 34995.5 (-13.5)
> > > ________________________________________________________
> > > Summary: BW: 16.2% SD: -10.2%
> > >
> > > ________________________________________________________
> > > I/O Size: 16K
> > > # BW1 BW2 (%) SD1 SD2 (%)
> > > ________________________________________________________
> > > 1 6684 7019 (5.0) 1.1 1.1 (0)
> > > 2 7674 7196 (-6.2) 5.0 4.8 (-4.0)
> > > 4 7358 8032 (9.1) 21.3 20.4 (-4.2)
> > > 8 7393 8015 (8.4) 82.7 82.0 (-.8)
> > > 16 7958 8366 (5.1) 283.2 310.7 (9.7)
> > > 32 7792 8113 (4.1) 1257.5 1363.0 (8.3)
> > > 64 7673 8040 (4.7) 5723.1 5812.4 (1.5)
> > > 96 7462 7883 (5.6) 12731.8 12119.8 (-4.8)
> > > 128 7338 7800 (6.2) 21331.7 21094.7 (-1.1)
> > > ________________________________________________________
> > > Summary: BW: 4.6% SD: -1.5%
> > >
> > > Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> > > ---
> >
> > So IIUC, we delay transmit by an arbitrary value and hope
> > that the host is done with the packets by then?
>
> Not "hope" exactly. If the device is not ready, then
> the packet is requeued. The main idea is to avoid
> drops/stop/starts, etc.
Yes, I see that, definitely. I guess it's a win if the
interrupt takes at least a jiffy to arrive anyway,
and a loss if not. Is there some reason interrupts
might be delayed until the next jiffy?
> > Interesting.
> >
> > I am currently testing an approach where
> > we tell the host explicitly to interrupt us only after
> > a large part of the queue is empty.
> > With 256 entries in a queue, we should get on the order of
> > 1 interrupt per 100 packets, which does not seem like a lot.
> >
> > I can post it, mind testing this?
>
> Sure.
>
> - KK
Just posted. Would appreciate feedback.
--
MST
* Re: [PATCH 0/4] [RFC] virtio-net: Improve small packet performance
2011-05-04 21:23 ` Michael S. Tsirkin
@ 2011-05-05 8:03 ` Krishna Kumar2
2011-05-05 9:04 ` Michael S. Tsirkin
0 siblings, 1 reply; 26+ messages in thread
From: Krishna Kumar2 @ 2011-05-05 8:03 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: davem, eric.dumazet, kvm, netdev, rusty
"Michael S. Tsirkin" <mst@redhat.com> wrote on 05/05/2011 02:53:59 AM:
> > Not "hope" exactly. If the device is not ready, then
> > the packet is requeued. The main idea is to avoid
> > drops/stop/starts, etc.
>
> Yes, I see that, definitely. I guess it's a win if the
> interrupt takes at least a jiffy to arrive anyway,
> and a loss if not. Is there some reason interrupts
> might be delayed until the next jiffy?
I can explain this a bit as I have three debug counters
in start_xmit() just for this:
1. Whether the current xmit call was good, i.e. we had
returned BUSY last time and this xmit was successful.
2. Whether the current xmit call was bad, i.e. we had
returned BUSY last time and this xmit still failed.
3. The free capacity when we *resumed* xmits. This is
after calling free_old_xmit_skbs(), where the function
is not throttled and in effect processes *all* the
completed skbs. This counter is a sum:
if (If_I_had_returned_EBUSY_last_iteration)
free_slots += virtqueue_get_capacity();
The counters after a 30 min run of 1K,2K,16K netperf
sessions are:
Good: 1059172
Bad: 31226
Sum of slots: 47551557
(Total of Good+Bad tallies with the total number of requeues
as shown by tc:
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 1560854473453 bytes 1075873684 pkt (dropped 718379, overlimits 0 requeues 1090398)
backlog 0b 0p requeues 1090398
)
It shows that 2.9% of the time, the 1 jiffy was not enough
to free up space in the txq. That could also mean that we
had set xmit_restart just before jiffies changed. But the
average free capacity when we *resumed* xmits is:
Sum of slots / (Good + Bad) = 43.
So the delay of 1 jiffy helped the host clean up, on average,
just 43 entries, which is 16% of total entries. This is
intended to show that the guest is not sitting idle waiting
for the jiffy to expire.
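(For reference, a sketch of where these counters could sit; every
name below is invented for illustration:)
static bool dbg_was_busy;
static unsigned long dbg_good, dbg_bad, dbg_free_slots;

/* Called from start_xmit() right after free_old_xmit_skbs(). */
static void dbg_account(struct virtnet_info *vi, int xmit_ret)
{
	if (dbg_was_busy) {
		/* capacity seen when resuming after an EBUSY */
		dbg_free_slots += virtqueue_get_capacity(vi->svq);
		if (xmit_ret >= 0)
			dbg_good++;	/* was BUSY, this xmit succeeded */
		else
			dbg_bad++;	/* was BUSY, still failing */
	}
	dbg_was_busy = xmit_ret < 0;
}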
> > > I can post it, mind testing this?
> >
> > Sure.
>
> Just posted. Would appreciate feedback.
Do I need to apply all the patches and simply test?
Thanks,
- KK
* Re: [PATCH 0/4] [RFC] virtio-net: Improve small packet performance
2011-05-05 8:03 ` Krishna Kumar2
@ 2011-05-05 9:04 ` Michael S. Tsirkin
2011-05-05 9:43 ` Krishna Kumar2
2011-05-05 15:27 ` Krishna Kumar2
0 siblings, 2 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2011-05-05 9:04 UTC (permalink / raw)
To: Krishna Kumar2; +Cc: davem, eric.dumazet, kvm, netdev, rusty
On Thu, May 05, 2011 at 01:33:14PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 05/05/2011 02:53:59 AM:
>
> > > Not "hope" exactly. If the device is not ready, then
> > > the packet is requeued. The main idea is to avoid
> > > drops/stop/starts, etc.
> >
> > Yes, I see that, definitely. I guess it's a win if the
> > interrupt takes at least a jiffy to arrive anyway,
> > and a loss if not. Is there some reason interrupts
> > might be delayed until the next jiffy?
>
> I can explain this a bit as I have three debug counters
> in start_xmit() just for this:
>
> 1. Whether the current xmit call was good, i.e. we had
> returned BUSY last time and this xmit was successful.
> 2. Whether the current xmit call was bad, i.e. we had
> returned BUSY last time and this xmit still failed.
> 3. The free capacity when we *resumed* xmits. This is
> after calling free_old_xmit_skbs(), where the function
> is not throttled and in effect processes *all* the
> completed skbs. This counter is a sum:
>
> if (If_I_had_returned_EBUSY_last_iteration)
> free_slots += virtqueue_get_capacity();
>
> The counters after a 30 min run of 1K,2K,16K netperf
> sessions are:
>
> Good: 1059172
> Bad: 31226
> Sum of slots: 47551557
>
> (Total of Good+Bad tallies with the total number of requeues
> as shown by tc:
>
> qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> Sent 1560854473453 bytes 1075873684 pkt (dropped 718379, overlimits 0 requeues 1090398)
> backlog 0b 0p requeues 1090398
> )
>
> It shows that 2.9% of the time, the 1 jiffy was not enough
> to free up space in the txq.
How common is it to free up space in *less than* 1 jiffy?
> That could also mean that we
> had set xmit_restart just before jiffies changed. But the
> average free capacity when we *resumed* xmits is:
> Sum of slots / (Good + Bad) = 43.
>
> So the delay of 1 jiffy helped the host clean up, on average,
> just 43 entries, which is 16% of total entries. This is
> intended to show that the guest is not sitting idle waiting
> for the jiffy to expire.
OK, nice, this is exactly what my patchset is trying
to do, without playing with timers: tell the host
to interrupt us after 3/4 of the ring is free.
Why 3/4 and not all of the ring? My hope is we can
get some parallelism with the host this way.
Why 3/4 and not 7/8? No idea :)
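(A sketch of the idea: ask the host to defer the tx interrupt until
it has consumed another 3/4 of the ring. Field and macro names here
follow the event-index proposal and may not match the posted series:)
static void tx_set_interrupt_threshold(struct vring_virtqueue *vq)
{
	u16 bufs = (vq->vring.num * 3) / 4;

	/* host interrupts us only once the used idx passes this point */
	vring_used_event(&vq->vring) = vq->last_used_idx + bufs;
}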
> > > > I can post it, mind testing this?
> > >
> > > Sure.
> >
> > Just posted. Would appreciate feedback.
>
> Do I need to apply all the patches and simply test?
>
> Thanks,
>
> - KK
Exactly. You can also try to tune the threshold
for interrupts as well.
--
MST
* Re: [PATCH 0/4] [RFC] virtio-net: Improve small packet performance
2011-05-05 9:04 ` Michael S. Tsirkin
@ 2011-05-05 9:43 ` Krishna Kumar2
2011-05-05 10:12 ` Michael S. Tsirkin
2011-05-05 15:27 ` Krishna Kumar2
1 sibling, 1 reply; 26+ messages in thread
From: Krishna Kumar2 @ 2011-05-05 9:43 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: davem, eric.dumazet, kvm, netdev, rusty
"Michael S. Tsirkin" <mst@redhat.com> wrote on 05/05/2011 02:34:39 PM:
> > It shows that 2.9% of the time, the 1 jiffy was not enough
> > to free up space in the txq.
>
> How common is it to free up space in *less than* 1 jiffy?
True, but the point is that the space freed is just
enough for 43 entries; keeping it lower means a flood
of (pseudo) stops and restarts.
> > That could also mean that we
> > had set xmit_restart just before jiffies changed. But the
> > average free capacity when we *resumed* xmits is:
> > Sum of slots / (Good + Bad) = 43.
> >
> > So the delay of 1 jiffy helped the host clean up, on average,
> > just 43 entries, which is 16% of total entries. This is
> > intended to show that the guest is not sitting idle waiting
> > for the jiffy to expire.
>
> OK, nice, this is exactly what my patchset is trying
> to do, without playing with timers: tell the host
> to interrupt us after 3/4 of the ring is free.
> Why 3/4 and not all of the ring? My hope is we can
> get some parallelism with the host this way.
> Why 3/4 and not 7/8? No idea :)
>
> > > > > I can post it, mind testing this?
> > > >
> > > > Sure.
> > >
> > > Just posted. Would appreciate feedback.
> >
> > Do I need to apply all the patches and simply test?
> >
> > Thanks,
> >
> > - KK
>
> Exactly. You can also try to tune the threshold
> for interrupts as well.
Could you send me (privately) the entire virtio-net/vhost
patch in a single file? It will help me quite a bit :)
Either attachment or inline is fine.
thanks,
- KK
* Re: [PATCH 0/4] [RFC] virtio-net: Improve small packet performance
2011-05-05 9:43 ` Krishna Kumar2
@ 2011-05-05 10:12 ` Michael S. Tsirkin
2011-05-05 10:57 ` Krishna Kumar2
0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2011-05-05 10:12 UTC (permalink / raw)
To: Krishna Kumar2; +Cc: davem, eric.dumazet, kvm, netdev, rusty
On Thu, May 05, 2011 at 03:13:43PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 05/05/2011 02:34:39 PM:
>
> > > It shows that 2.9% of the time, the 1 jiffy was not enough
> > > to free up space in the txq.
> >
> > How common is it to free up space in *less than* 1 jiffy?
>
> True,
Sorry, which statement do you say is true? That an interrupt
after less than 1 jiffy is common?
> but the point is that the space freed is just
> enough for 43 entries; keeping it lower means a flood
> of (pseudo) stops and restarts.
>
> > > That could also mean that we
> > > had set xmit_restart just before jiffies changed. But the
> > > average free capacity when we *resumed* xmits is:
> > > Sum of slots / (Good + Bad) = 43.
> > >
> > > So the delay of 1 jiffy helped the host clean up, on average,
> > > just 43 entries, which is 16% of total entries. This is
> > > intended to show that the guest is not sitting idle waiting
> > > for the jiffy to expire.
> >
> > OK, nice, this is exactly what my patchset is trying
> > to do, without playing with timers: tell the host
> > to interrupt us after 3/4 of the ring is free.
> > Why 3/4 and not all of the ring? My hope is we can
> > get some parallelism with the host this way.
> > Why 3/4 and not 7/8? No idea :)
> >
> > > > > > I can post it, mind testing this?
> > > > >
> > > > > Sure.
> > > >
> > > > Just posted. Would appreciate feedback.
> > >
> > > Do I need to apply all the patches and simply test?
> > >
> > > Thanks,
> > >
> > > - KK
> >
> > Exactly. You can also try to tune the threshold
> > for interrupts as well.
>
> Could you send me (privately) the entire virtio-net/vhost
> patch in a single file? It will help me quite a bit :)
> Either attachment or inline is fine.
>
> thanks,
>
> - KK
Better yet, here they are in git:
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net-next-event-idx-v1
git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu-kvm.git virtio-net-event-idx-v1
--
MST
* Re: [PATCH 0/4] [RFC] virtio-net: Improve small packet performance
2011-05-05 10:12 ` Michael S. Tsirkin
@ 2011-05-05 10:57 ` Krishna Kumar2
0 siblings, 0 replies; 26+ messages in thread
From: Krishna Kumar2 @ 2011-05-05 10:57 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: davem, eric.dumazet, kvm, netdev, rusty
"Michael S. Tsirkin" <mst@redhat.com> wrote on 05/05/2011 03:42:29 PM:
> > > > It shows that 2.9% of the time, the 1 jiffy was not enough
> > > > to free up space in the txq.
> > >
> > > How common is it to free up space in *less than* 1 jiffy?
> >
> > True,
>
> Sorry, which statement do you say is true? That an interrupt
> after less than 1 jiffy is common?
I meant to say that 97% of the time, the space was enough for
the next xmit to succeed. This is keeping in mind that on
average 43 slots were freed up, indicating that the guest
was not waiting around for too long.
Regarding whether an interrupt in less than 1 jiffy is
common: I think most of the time it is. But increasing
the limit for when to do the cb would push that delay
towards a jiffy.
To confirm, I just put some counters in the original
code and found that interrupts happen in less than a
jiffy around 96.75% of the time; only 3.25% took 1
jiffy. But as expected, this is with the host
interrupting immediately, which leads to many
stops/starts/interrupts due to very little free capacity.
> > but the point is that the space freed is just
> > enough for 43 entries, keeping it lower means a flood
> > of (psuedo) stop's and restart's.
> Better yet, here they are in git:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net-next-event-idx-v1
> git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu-kvm.git virtio-net-event-idx-v1
Great, I will pick up from here.
thanks,
- KK
* Re: [PATCH 0/4] [RFC] virtio-net: Improve small packet performance
2011-05-05 9:04 ` Michael S. Tsirkin
2011-05-05 9:43 ` Krishna Kumar2
@ 2011-05-05 15:27 ` Krishna Kumar2
2011-05-05 15:34 ` Michael S. Tsirkin
` (2 more replies)
1 sibling, 3 replies; 26+ messages in thread
From: Krishna Kumar2 @ 2011-05-05 15:27 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: davem, eric.dumazet, kvm, netdev, rusty
"Michael S. Tsirkin" <mst@redhat.com> wrote on 05/05/2011 02:34:39 PM:
> > Do I need to apply all the patches and simply test?
> >
> > Thanks,
> >
> > - KK
>
> Exactly. You can also try to tune the threshold
> for interrupts as well.
I haven't tuned the threshold; it is left at 3/4. I ran
the new qemu/vhost/guest, and the results for 1K, 2K and 16K
are below. Note this is a different kernel version from my
earlier test results. So, e.g., BW1 represents 2.6.39-rc2,
the original kernel; while BW2 represents 2.6.37-rc5 (MST's
kernel). This also isn't with the fixes you have sent just
now. I will get a run with that either late tonight or
tomorrow.
________________________________________________________
I/O size: 1K
# BW1 BW2 (%) SD1 SD2 (%)
________________________________________________________
1 1723 3016 (75.0) 4.7 2.6 (-44.6)
2 3223 6712 (108.2) 18.0 7.1 (-60.5)
4 7223 8258 (14.3) 36.5 24.3 (-33.4)
8 8689 7943 (-8.5) 131.5 101.6 (-22.7)
16 8059 7398 (-8.2) 578.3 406.4 (-29.7)
32 7758 7208 (-7.0) 2281.4 1574.7 (-30.9)
64 7503 7155 (-4.6) 9734.0 6368.0 (-34.5)
96 7496 7078 (-5.5) 21980.9 15477.6 (-29.5)
128 7389 6900 (-6.6) 40467.5 26031.9 (-35.6)
________________________________________________________
Summary: BW: (4.4) SD: (-33.5)
________________________________________________________
I/O size: 2K
# BW1 BW2 (%) SD1 SD2 (%)
________________________________________________________
1 1608 4968 (208.9) 5.0 1.3 (-74.0)
2 3354 6974 (107.9) 18.6 4.9 (-73.6)
4 8234 8344 (1.3) 35.6 17.9 (-49.7)
8 8427 7818 (-7.2) 103.5 71.2 (-31.2)
16 7995 7491 (-6.3) 410.1 273.9 (-33.2)
32 7863 7149 (-9.0) 1678.6 1080.4 (-35.6)
64 7661 7092 (-7.4) 7245.3 4717.2 (-34.8)
96 7517 6984 (-7.0) 15711.2 9838.9 (-37.3)
128 7389 6851 (-7.2) 27121.6 18255.7 (-32.6)
________________________________________________________
Summary: BW: (6.0) SD: (-34.5)
________________________________________________________
I/O size: 16K
# BW1 BW2 (%) SD1 SD2 (%)
________________________________________________________
1 6684 7019 (5.0) 1.1 1.1 (0)
2 7674 7196 (-6.2) 5.0 4.8 (-4.0)
4 7358 8032 (9.1) 21.3 20.4 (-4.2)
8 7393 8015 (8.4) 82.7 82.0 (-.8)
16 7958 8366 (5.1) 283.2 310.7 (9.7)
32 7792 8113 (4.1) 1257.5 1363.0 (8.3)
64 7673 8040 (4.7) 5723.1 5812.4 (1.5)
96 7462 7883 (5.6) 12731.8 12119.8 (-4.8)
128 7338 7800 (6.2) 21331.7 21094.7 (-1.1)
________________________________________________________
Summary: BW: (4.6) SD: (-1.5)
Thanks,
- KK
* Re: [PATCH 0/4] [RFC] virtio-net: Improve small packet performance
2011-05-05 15:27 ` Krishna Kumar2
@ 2011-05-05 15:34 ` Michael S. Tsirkin
2011-05-07 7:15 ` Krishna Kumar2
2011-05-05 15:36 ` Krishna Kumar2
2011-05-05 15:42 ` Michael S. Tsirkin
2 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2011-05-05 15:34 UTC (permalink / raw)
To: Krishna Kumar2; +Cc: davem, eric.dumazet, kvm, netdev, rusty
On Thu, May 05, 2011 at 08:57:13PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 05/05/2011 02:34:39 PM:
>
> > > Do I need to apply all the patches and simply test?
> > >
> > > Thanks,
> > >
> > > - KK
> >
> > Exactly. You can also try to tune the threshold
> > for interrupts as well.
>
> I haven't tuned the threshold; it is left at 3/4. I ran
> the new qemu/vhost/guest, and the results for 1K, 2K and 16K
> are below. Note this is a different kernel version from my
> earlier test results. So, e.g., BW1 represents 2.6.39-rc2,
> the original kernel; while BW2 represents 2.6.37-rc5 (MST's
> kernel).
Weird. My kernel is actually 2.6.39-rc2. So which is which?
> This also isn't with the fixes you have sent just
> now. I will get a run with that either late tonight or
> tomorrow.
Shouldn't affect anything performance-wise.
> ________________________________________________________
> I/O size: 1K
> # BW1 BW2 (%) SD1 SD2 (%)
> ________________________________________________________
> 1 1723 3016 (75.0) 4.7 2.6 (-44.6)
> 2 3223 6712 (108.2) 18.0 7.1 (-60.5)
> 4 7223 8258 (14.3) 36.5 24.3 (-33.4)
> 8 8689 7943 (-8.5) 131.5 101.6 (-22.7)
> 16 8059 7398 (-8.2) 578.3 406.4 (-29.7)
> 32 7758 7208 (-7.0) 2281.4 1574.7 (-30.9)
> 64 7503 7155 (-4.6) 9734.0 6368.0 (-34.5)
> 96 7496 7078 (-5.5) 21980.9 15477.6 (-29.5)
> 128 7389 6900 (-6.6) 40467.5 26031.9 (-35.6)
> ________________________________________________________
> Summary: BW: (4.4) SD: (-33.5)
>
> ________________________________________________________
> I/O size: 2K
> # BW1 BW2 (%) SD1 SD2 (%)
> ________________________________________________________
> 1 1608 4968 (208.9) 5.0 1.3 (-74.0)
> 2 3354 6974 (107.9) 18.6 4.9 (-73.6)
> 4 8234 8344 (1.3) 35.6 17.9 (-49.7)
> 8 8427 7818 (-7.2) 103.5 71.2 (-31.2)
> 16 7995 7491 (-6.3) 410.1 273.9 (-33.2)
> 32 7863 7149 (-9.0) 1678.6 1080.4 (-35.6)
> 64 7661 7092 (-7.4) 7245.3 4717.2 (-34.8)
> 96 7517 6984 (-7.0) 15711.2 9838.9 (-37.3)
> 128 7389 6851 (-7.2) 27121.6 18255.7 (-32.6)
> ________________________________________________________
> Summary: BW: (6.0) SD: (-34.5)
>
> ________________________________________________________
> I/O size: 16K
> # BW1 BW2 (%) SD1 SD2 (%)
> ________________________________________________________
> 1 6684 7019 (5.0) 1.1 1.1 (0)
> 2 7674 7196 (-6.2) 5.0 4.8 (-4.0)
> 4 7358 8032 (9.1) 21.3 20.4 (-4.2)
> 8 7393 8015 (8.4) 82.7 82.0 (-.8)
> 16 7958 8366 (5.1) 283.2 310.7 (9.7)
> 32 7792 8113 (4.1) 1257.5 1363.0 (8.3)
> 64 7673 8040 (4.7) 5723.1 5812.4 (1.5)
> 96 7462 7883 (5.6) 12731.8 12119.8 (-4.8)
> 128 7338 7800 (6.2) 21331.7 21094.7 (-1.1)
> ________________________________________________________
> Summary: BW: (4.6) SD: (-1.5)
>
> Thanks,
>
> - KK
* Re: [PATCH 0/4] [RFC] virtio-net: Improve small packet performance
2011-05-05 15:34 ` Michael S. Tsirkin
@ 2011-05-07 7:15 ` Krishna Kumar2
0 siblings, 0 replies; 26+ messages in thread
From: Krishna Kumar2 @ 2011-05-07 7:15 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: davem, eric.dumazet, kvm, netdev, rusty
"Michael S. Tsirkin" <mst@redhat.com> wrote on 05/05/2011 09:04:13 PM:
> > I haven't tuned the threshold; it is left at 3/4. I ran
> > the new qemu/vhost/guest, and the results for 1K, 2K and 16K
> > are below. Note this is a different kernel version from my
> > earlier test results. So, e.g., BW1 represents 2.6.39-rc2,
> > the original kernel; while BW2 represents 2.6.37-rc5 (MST's
> > kernel).
>
> Weird. My kernel is actually 2.6.39-rc2. So which is which?
I cloned git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git
# git branch -a
vhost
* vhost-net-next-event-idx-v1
remotes/origin/HEAD -> origin/vhost
remotes/origin/for-linus
remotes/origin/master
remotes/origin/net-2.6
remotes/origin/vhost
remotes/origin/vhost-broken
remotes/origin/vhost-devel
remotes/origin/vhost-mrg-rxbuf
remotes/origin/vhost-net
remotes/origin/vhost-net-next
remotes/origin/vhost-net-next-event-idx-v1
remotes/origin/vhost-net-next-rebased
remotes/origin/virtio-layout-aligned
remotes/origin/virtio-layout-minimal
remotes/origin/virtio-layout-original
remotes/origin/virtio-layout-padded
remotes/origin/virtio-publish-used
# git checkout vhost-net-next-event-idx-v1
Already on 'vhost-net-next-event-idx-v1'
# head -4 Makefile
VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 37
EXTRAVERSION = -rc5
I am not sure what I am missing.
thanks,
- KK
* Re: [PATCH 0/4] [RFC] virtio-net: Improve small packet performance
2011-05-05 15:27 ` Krishna Kumar2
2011-05-05 15:34 ` Michael S. Tsirkin
@ 2011-05-05 15:36 ` Krishna Kumar2
2011-05-05 15:37 ` Michael S. Tsirkin
2011-05-05 15:42 ` Michael S. Tsirkin
2 siblings, 1 reply; 26+ messages in thread
From: Krishna Kumar2 @ 2011-05-05 15:36 UTC (permalink / raw)
To: Krishna Kumar2
Cc: davem, eric.dumazet, kvm, Michael S. Tsirkin, netdev,
netdev-owner, rusty
Krishna Kumar wrote on 05/05/2011 08:57:13 PM:
Oops, I sent my patch's test results for the 16K case.
The correct one is:
________________________________________________________
I/O size: 16K
# BW1 BW2 (%) SD1 SD2 (%)
________________________________________________________
1 6684 6670 (-.2) 1.1 .6 (-45.4)
2 7674 7859 (2.4) 5.0 2.6 (-48.0)
4 7358 7421 (.8) 21.3 11.6 (-45.5)
8 7393 7289 (-1.4) 82.7 44.8 (-45.8)
16 7958 7280 (-8.5) 283.2 166.3 (-41.2)
32 7792 7163 (-8.0) 1257.5 692.4 (-44.9)
64 7673 7096 (-7.5) 5723.1 2870.3 (-49.8)
96 7462 6963 (-6.6) 12731.8 6475.6 (-49.1)
128 7338 6919 (-5.7) 21331.7 12345.7 (-42.1)
________________________________________________________
Summary: BW: (-3.9) SD: (-45.4)
Sorry for the confusion.
Regards,
- KK
> ________________________________________________________
> I/O size: 16K
> # BW1 BW2 (%) SD1 SD2 (%)
> ________________________________________________________
> 1 6684 7019 (5.0) 1.1 1.1 (0)
> 2 7674 7196 (-6.2) 5.0 4.8 (-4.0)
> 4 7358 8032 (9.1) 21.3 20.4 (-4.2)
> 8 7393 8015 (8.4) 82.7 82.0 (-.8)
> 16 7958 8366 (5.1) 283.2 310.7 (9.7)
> 32 7792 8113 (4.1) 1257.5 1363.0 (8.3)
> 64 7673 8040 (4.7) 5723.1 5812.4 (1.5)
> 96 7462 7883 (5.6) 12731.8 12119.8 (-4.8)
> 128 7338 7800 (6.2) 21331.7 21094.7 (-1.1)
> ________________________________________________________
> Summary: BW: (4.6) SD: (-1.5)
* Re: [PATCH 0/4] [RFC] virtio-net: Improve small packet performance
2011-05-05 15:36 ` Krishna Kumar2
@ 2011-05-05 15:37 ` Michael S. Tsirkin
0 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2011-05-05 15:37 UTC (permalink / raw)
To: Krishna Kumar2; +Cc: davem, eric.dumazet, kvm, netdev, netdev-owner, rusty
On Thu, May 05, 2011 at 09:06:00PM +0530, Krishna Kumar2 wrote:
> Krishna Kumar wrote on 05/05/2011 08:57:13 PM:
>
> Oops, I sent my patch's test results for the 16K case.
> The correct one is:
>
> ________________________________________________________
> I/O size: 16K
> # BW1 BW2 (%) SD1 SD2 (%)
> ________________________________________________________
> 1 6684 6670 (-.2) 1.1 .6 (-45.4)
> 2 7674 7859 (2.4) 5.0 2.6 (-48.0)
> 4 7358 7421 (.8) 21.3 11.6 (-45.5)
> 8 7393 7289 (-1.4) 82.7 44.8 (-45.8)
> 16 7958 7280 (-8.5) 283.2 166.3 (-41.2)
> 32 7792 7163 (-8.0) 1257.5 692.4 (-44.9)
> 64 7673 7096 (-7.5) 5723.1 2870.3 (-49.8)
> 96 7462 6963 (-6.6) 12731.8 6475.6 (-49.1)
> 128 7338 6919 (-5.7) 21331.7 12345.7 (-42.1)
> ________________________________________________________
> Summary: BW: (-3.9) SD: (-45.4)
>
> Sorry for the confusion.
>
> Regards,
>
> - KK
Interesting. So which is which?
> > ________________________________________________________
> > I/O size: 16K
> > # BW1 BW2 (%) SD1 SD2 (%)
> > ________________________________________________________
> > 1 6684 7019 (5.0) 1.1 1.1 (0)
> > 2 7674 7196 (-6.2) 5.0 4.8 (-4.0)
> > 4 7358 8032 (9.1) 21.3 20.4 (-4.2)
> > 8 7393 8015 (8.4) 82.7 82.0 (-.8)
> > 16 7958 8366 (5.1) 283.2 310.7 (9.7)
> > 32 7792 8113 (4.1) 1257.5 1363.0 (8.3)
> > 64 7673 8040 (4.7) 5723.1 5812.4 (1.5)
> > 96 7462 7883 (5.6) 12731.8 12119.8 (-4.8)
> > 128 7338 7800 (6.2) 21331.7 21094.7 (-1.1)
> > ________________________________________________________
> > Summary: BW: (4.6) SD: (-1.5)
* Re: [PATCH 0/4] [RFC] virtio-net: Improve small packet performance
2011-05-05 15:27 ` Krishna Kumar2
2011-05-05 15:34 ` Michael S. Tsirkin
2011-05-05 15:36 ` Krishna Kumar2
@ 2011-05-05 15:42 ` Michael S. Tsirkin
2 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2011-05-05 15:42 UTC (permalink / raw)
To: Krishna Kumar2; +Cc: davem, eric.dumazet, kvm, netdev, rusty
On Thu, May 05, 2011 at 08:57:13PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 05/05/2011 02:34:39 PM:
>
> > > Do I need to apply all the patches and simply test?
> > >
> > > Thanks,
> > >
> > > - KK
> >
> > Exactly. You can also try to tune the threshold
> > for interrupts as well.
>
> I haven't tuned the threshold; it is left at 3/4. I ran
> the new qemu/vhost/guest, and the results for 1K, 2K and 16K
> are below. Note this is a different kernel version from my
> earlier test results. So, e.g., BW1 represents 2.6.39-rc2,
> the original kernel; while BW2 represents 2.6.37-rc5 (MST's
> kernel). This also isn't with the fixes you have sent just
> now. I will get a run with that either late tonight or
> tomorrow.
One thing I'd suggest is merging v2.6.39-rc6 into that tree.
rc2 is still pretty early; I use it because that is where
net-next is.
--
MST