Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCHv2 1/2] net: Allow ethtool to set interface in loopback mode.
From: Mahesh Bandewar @ 2011-05-05  1:28 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Ben Hutchings, Michał Mirosław, Tom Herbert,
	Mahesh Bandewar
In-Reply-To: <1304558800-15877-1-git-send-email-maheshb@google.com>

Please discard this, I do not wish to create a new thread for this.

Thanks,
--mahesh..

On Wed, May 4, 2011 at 6:26 PM, Mahesh Bandewar <maheshb@google.com> wrote:
> This patch enables ethtool to set the loopback mode on a given interface.
> By configuring the interface in loopback mode in conjunction with a policy
> route / rule, a userland application can stress the egress / ingress path
> exposing the flows of the change in progress and potentially help developer(s)
> understand the impact of those changes without even sending a packet out
> on the network.
>
> Following set of commands illustrates one such example -
>    a) ip -4 addr add 192.168.1.1/24 dev eth1
>    b) ip -4 rule add from all iif eth1 lookup 250
>    c) ip -4 route add local 0/0 dev lo proto kernel scope host table 250
>    d) arp -Ds 192.168.1.100 eth1
>    e) arp -Ds 192.168.1.200 eth1
>    f) sysctl -w net.ipv4.ip_nonlocal_bind=1
>    g) sysctl -w net.ipv4.conf.all.accept_local=1
>    # Assuming that the machine has 8 cores
>    h) taskset 000f netserver -L 192.168.1.200
>    i) taskset 00f0 netperf -t TCP_CRR -L 192.168.1.100 -H 192.168.1.200 -l 30
>
> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> ---
> Changes since v1
>  Added NETIF_F_LOOPBACK in loopback device's feature-set.
>
>  drivers/net/loopback.c    |    3 ++-
>  include/linux/netdevice.h |    3 ++-
>  net/core/ethtool.c        |    2 +-
>  3 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
> index d70fb76..4ce9e5f 100644
> --- a/drivers/net/loopback.c
> +++ b/drivers/net/loopback.c
> @@ -174,7 +174,8 @@ static void loopback_setup(struct net_device *dev)
>                | NETIF_F_HIGHDMA
>                | NETIF_F_LLTX
>                | NETIF_F_NETNS_LOCAL
> -               | NETIF_F_VLAN_CHALLENGED;
> +               | NETIF_F_VLAN_CHALLENGED
> +               | NETIF_F_LOOPBACK;
>        dev->ethtool_ops        = &loopback_ethtool_ops;
>        dev->header_ops         = &eth_header_ops;
>        dev->netdev_ops         = &loopback_ops;
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index d5de66a..e7244ed 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1067,6 +1067,7 @@ struct net_device {
>  #define NETIF_F_RXHASH         (1 << 28) /* Receive hashing offload */
>  #define NETIF_F_RXCSUM         (1 << 29) /* Receive checksumming offload */
>  #define NETIF_F_NOCACHE_COPY   (1 << 30) /* Use no-cache copyfromuser */
> +#define NETIF_F_LOOPBACK       (1 << 31) /* Enable loopback */
>
>        /* Segmentation offload features */
>  #define NETIF_F_GSO_SHIFT      16
> @@ -1082,7 +1083,7 @@ struct net_device {
>        /* = all defined minus driver/device-class-related */
>  #define NETIF_F_NEVER_CHANGE   (NETIF_F_VLAN_CHALLENGED | \
>                                  NETIF_F_LLTX | NETIF_F_NETNS_LOCAL)
> -#define NETIF_F_ETHTOOL_BITS   (0x7f3fffff & ~NETIF_F_NEVER_CHANGE)
> +#define NETIF_F_ETHTOOL_BITS   (0xff3fffff & ~NETIF_F_NEVER_CHANGE)
>
>        /* List of features with software fallbacks. */
>  #define NETIF_F_GSO_SOFTWARE   (NETIF_F_TSO | NETIF_F_TSO_ECN | \
> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> index d8b1a8d..f26649d 100644
> --- a/net/core/ethtool.c
> +++ b/net/core/ethtool.c
> @@ -362,7 +362,7 @@ static const char netdev_features_strings[ETHTOOL_DEV_FEATURE_WORDS * 32][ETH_GS
>        /* NETIF_F_RXHASH */          "rx-hashing",
>        /* NETIF_F_RXCSUM */          "rx-checksum",
>        /* NETIF_F_NOCACHE_COPY */    "tx-nocache-copy"
> -       "",
> +       /* NETIF_F_LOOPBACK */        "loopback",
>  };
>
>  static int __ethtool_get_sset_count(struct net_device *dev, int sset)
> --
> 1.7.3.1
>
>

^ permalink raw reply

* [PATCHv2] forcedeth: Allow ethtool to enable/disable loopback.
From: Mahesh Bandewar @ 2011-05-05  1:36 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Tom Herbert, Mahesh Bandewar
In-Reply-To: <1304472127-652-1-git-send-email-maheshb@google.com>

This patch updates nv_set_features() to handle loopback mode.
When enabled; it sets internal-PHY loopback.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
Changes since v1:
 Clear the loopback bit in ndo_open callback to avoid features discrepancy.

 drivers/net/forcedeth.c |   61 +++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 61 insertions(+), 0 deletions(-)

diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index d09e8b0..55eb558 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -4474,6 +4474,53 @@ static int nv_set_pauseparam(struct net_device *dev, struct ethtool_pauseparam*
 	return 0;
 }
 
+static int nv_set_loopback(struct net_device *dev, u32 features)
+{
+	struct fe_priv *np = netdev_priv(dev);
+	unsigned long flags;
+	u32 miicontrol;
+	int err, retval = 0;
+
+	spin_lock_irqsave(&np->lock, flags);
+	miicontrol = mii_rw(dev, np->phyaddr, MII_BMCR, MII_READ);
+	if (features & NETIF_F_LOOPBACK) {
+		if (miicontrol & BMCR_LOOPBACK) {
+			spin_unlock_irqrestore(&np->lock, flags);
+			return retval;
+		}
+		nv_disable_irq(dev);
+		/* Turn on loopback mode */
+		miicontrol |= BMCR_LOOPBACK | BMCR_FULLDPLX;
+		err = mii_rw(dev, np->phyaddr, MII_BMCR, miicontrol);
+		if (err) {
+			err = -EAGAIN;
+			spin_unlock_irqrestore(&np->lock, flags);
+			phy_init(dev);
+		} else {
+			/* Force link up */
+			netif_carrier_on(dev);
+			spin_unlock_irqrestore(&np->lock, flags);
+			netdev_info(dev, "Internal PHY loopback mode enabled.\n");
+		}
+	} else {
+		if (!(miicontrol & BMCR_LOOPBACK)) {
+			spin_unlock_irqrestore(&np->lock, flags);
+			return retval;
+		}	
+		nv_disable_irq(dev);
+		/* Turn off loopback */
+		spin_unlock_irqrestore(&np->lock, flags);
+		netdev_info(dev, "Internal PHY loopback mode disabled.\n");
+		phy_init(dev);
+	}
+	msleep(500);
+	spin_lock_irqsave(&np->lock, flags);
+	nv_enable_irq(dev);
+	spin_unlock_irqrestore(&np->lock, flags);
+
+	return retval;
+}
+
 static u32 nv_fix_features(struct net_device *dev, u32 features)
 {
 	/* vlan is dependent on rx checksum offload */
@@ -4502,6 +4549,9 @@ static int nv_set_features(struct net_device *dev, u32 features)
 
 		spin_unlock_irq(&np->lock);
 	}
+	if ((changed & NETIF_F_LOOPBACK) && netif_running(dev)) {
+		nv_set_loopback(dev, features);
+	}
 
 	return 0;
 }
@@ -5142,6 +5192,14 @@ static int nv_open(struct net_device *dev)
 
 	spin_unlock_irq(&np->lock);
 
+	/* If the loopback feature was set while the device was down, make sure
+	 * that it's cleared to avoid any discrepancy in features reporting.
+	 */
+	if (dev->features & NETIF_F_LOOPBACK) {
+		dev->features &= ~NETIF_F_LOOPBACK;
+		dev->wanted_features &= ~NETIF_F_LOOPBACK;
+	}
+
 	return 0;
 out_drain:
 	nv_drain_rxtx(dev);
@@ -5341,6 +5399,9 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
 		dev->features |= dev->hw_features;
 	}
 
+	/* Add loopback capability to the device. */
+	dev->hw_features |= NETIF_F_LOOPBACK;
+
 	np->vlanctl_bits = 0;
 	if (id->driver_data & DEV_HAS_VLAN) {
 		np->vlanctl_bits = NVREG_VLANCONTROL_ENABLE;
-- 
1.7.3.1


^ permalink raw reply related

* Re: [PATCH 2/4] [RFC] virtio: Introduce new API to get free space
From: Krishna Kumar2 @ 2011-05-05  3:08 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: davem, eric.dumazet, kvm, netdev, rusty
In-Reply-To: <20110504200023.GA20303@redhat.com>

"Michael S. Tsirkin" <mst@redhat.com> wrote on 05/05/2011 01:30:23 AM:

> > > @@ -185,11 +193,6 @@ int virtqueue_add_buf_gfp(struct virtque
> > >     if (vq->num_free < out + in) {
> > >        pr_debug("Can't add buf len %i - avail = %i\n",
> > >            out + in, vq->num_free);
> > > -      /* FIXME: for historical reasons, we force a notify here if
> > > -       * there are outgoing parts to the buffer.  Presumably the
> > > -       * host should service the ring ASAP. */
> > > -      if (out)
> > > -         vq->notify(&vq->vq);
> > >        END_USE(vq);
> > >        return -ENOSPC;
> > >     }
> >
> > This will break qemu versions 0.13 and back.
> > I'm adding some new virtio ring flags, we'll be
> > able to reuse one of these to mean 'no need for
> > work around', I think.
>
> Not really, it wont. We shall almost never get here at all.
> But then, why would this help performance?

Yes, it is not needed. I will be testing it without this
also.

thanks,

- KK


^ permalink raw reply

* Hello;
From: Loveth Jerry @ 2011-05-03  8:40 UTC (permalink / raw)




Hello, 

My name is Loveth Niame Jerry . I got your contact email while making some research for reliable foreigner on the Internet (Remember the distance or colour does not matter but our good heart and sincerity matters allot in life ) i will be waiting to hear from you so that i will tell you more about my self and send my picture to you.I wish you all the best for your day.Always stay blessed from my deepest heart. 

Yours respectfully, 
Miss Loveth. 


^ permalink raw reply

* Re: [Bugme-new] [Bug 33502] New: Caught 64-bit read from uninitialized memory in __alloc_skb
From: Eric Dumazet @ 2011-05-05  6:18 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, casteyde.christian, Andrew Morton, netdev,
	bugzilla-daemon, bugme-daemon, Vegard Nossum
In-Reply-To: <1303311687.3186.100.camel@edumazet-laptop>

Le mercredi 20 avril 2011 à 17:01 +0200, Eric Dumazet a écrit :
> Le mercredi 20 avril 2011 à 09:42 -0500, Christoph Lameter a écrit :
> 
> > Avoiding the irq handling yields the savings that improve the fastpath. if
> > you do both then there is only a regression left. So lets go with
> > disabling the CMPXCHG_LOCAL.
> 
> OK, let's do that then.
> 
> Thanks
> 
> [PATCH v4] slub: dont use cmpxchg_double if KMEMCHECK or DEBUG_PAGEALLOC
> 
> Christian Casteyde reported a KMEMCHECK splat in slub code.
> 
> Problem is now we are lockless and allow IRQ in slab_alloc(), the object
> we manipulate from freelist can be allocated and freed right before we
> try to read object->next.
> 
> Same problem can happen with DEBUG_PAGEALLOC
> 
> Just dont use cmpxchg_double() if either CONFIG_KMEMCHECK or
> CONFIG_DEBUG_PAGEALLOC is defined.
> 
> Reported-by: Christian Casteyde <casteyde.christian@free.fr>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Pekka Enberg <penberg@cs.helsinki.fi>
> Cc: Vegard Nossum <vegardno@ifi.uio.no>
> Cc: Christoph Lameter <cl@linux.com>
> ---
>  mm/slub.c |   45 +++++++++++++++++++++++++--------------------
>  1 files changed, 25 insertions(+), 20 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 94d2a33..f31ab2c 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1540,7 +1540,12 @@ static void unfreeze_slab(struct kmem_cache *s, struct page *page, int tail)
>  	}
>  }
>  
> -#ifdef CONFIG_CMPXCHG_LOCAL
> +#if defined(CONFIG_CMPXCHG_LOCAL) && !defined(CONFIG_KMEMCHECK) && \
> +				     !defined(CONFIG_DEBUG_PAGEALLOC)
> +#define SLUB_USE_CMPXCHG_DOUBLE
> +#endif
> +
> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>  #ifdef CONFIG_PREEMPT
>  /*
>   * Calculate the next globally unique transaction for disambiguiation
> @@ -1604,7 +1609,7 @@ static inline void note_cmpxchg_failure(const char *n,
>  
>  void init_kmem_cache_cpus(struct kmem_cache *s)
>  {
> -#ifdef CONFIG_CMPXCHG_LOCAL
> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>  	int cpu;
>  
>  	for_each_possible_cpu(cpu)
> @@ -1643,7 +1648,7 @@ static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
>  		page->inuse--;
>  	}
>  	c->page = NULL;
> -#ifdef CONFIG_CMPXCHG_LOCAL
> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>  	c->tid = next_tid(c->tid);
>  #endif
>  	unfreeze_slab(s, page, tail);
> @@ -1780,7 +1785,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  {
>  	void **object;
>  	struct page *new;
> -#ifdef CONFIG_CMPXCHG_LOCAL
> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>  	unsigned long flags;
>  
>  	local_irq_save(flags);
> @@ -1819,7 +1824,7 @@ load_freelist:
>  	c->node = page_to_nid(c->page);
>  unlock_out:
>  	slab_unlock(c->page);
> -#ifdef CONFIG_CMPXCHG_LOCAL
> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>  	c->tid = next_tid(c->tid);
>  	local_irq_restore(flags);
>  #endif
> @@ -1858,7 +1863,7 @@ new_slab:
>  	}
>  	if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit())
>  		slab_out_of_memory(s, gfpflags, node);
> -#ifdef CONFIG_CMPXCHG_LOCAL
> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>  	local_irq_restore(flags);
>  #endif
>  	return NULL;
> @@ -1887,7 +1892,7 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
>  {
>  	void **object;
>  	struct kmem_cache_cpu *c;
> -#ifdef CONFIG_CMPXCHG_LOCAL
> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>  	unsigned long tid;
>  #else
>  	unsigned long flags;
> @@ -1896,7 +1901,7 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
>  	if (slab_pre_alloc_hook(s, gfpflags))
>  		return NULL;
>  
> -#ifndef CONFIG_CMPXCHG_LOCAL
> +#ifndef SLUB_USE_CMPXCHG_DOUBLE
>  	local_irq_save(flags);
>  #else
>  redo:
> @@ -1910,7 +1915,7 @@ redo:
>  	 */
>  	c = __this_cpu_ptr(s->cpu_slab);
>  
> -#ifdef CONFIG_CMPXCHG_LOCAL
> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>  	/*
>  	 * The transaction ids are globally unique per cpu and per operation on
>  	 * a per cpu queue. Thus they can be guarantee that the cmpxchg_double
> @@ -1927,7 +1932,7 @@ redo:
>  		object = __slab_alloc(s, gfpflags, node, addr, c);
>  
>  	else {
> -#ifdef CONFIG_CMPXCHG_LOCAL
> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>  		/*
>  		 * The cmpxchg will only match if there was no additional
>  		 * operation and if we are on the right processor.
> @@ -1954,7 +1959,7 @@ redo:
>  		stat(s, ALLOC_FASTPATH);
>  	}
>  
> -#ifndef CONFIG_CMPXCHG_LOCAL
> +#ifndef SLUB_USE_CMPXCHG_DOUBLE
>  	local_irq_restore(flags);
>  #endif
>  
> @@ -2034,7 +2039,7 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
>  {
>  	void *prior;
>  	void **object = (void *)x;
> -#ifdef CONFIG_CMPXCHG_LOCAL
> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>  	unsigned long flags;
>  
>  	local_irq_save(flags);
> @@ -2070,7 +2075,7 @@ checks_ok:
>  
>  out_unlock:
>  	slab_unlock(page);
> -#ifdef CONFIG_CMPXCHG_LOCAL
> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>  	local_irq_restore(flags);
>  #endif
>  	return;
> @@ -2084,7 +2089,7 @@ slab_empty:
>  		stat(s, FREE_REMOVE_PARTIAL);
>  	}
>  	slab_unlock(page);
> -#ifdef CONFIG_CMPXCHG_LOCAL
> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>  	local_irq_restore(flags);
>  #endif
>  	stat(s, FREE_SLAB);
> @@ -2113,7 +2118,7 @@ static __always_inline void slab_free(struct kmem_cache *s,
>  {
>  	void **object = (void *)x;
>  	struct kmem_cache_cpu *c;
> -#ifdef CONFIG_CMPXCHG_LOCAL
> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>  	unsigned long tid;
>  #else
>  	unsigned long flags;
> @@ -2121,7 +2126,7 @@ static __always_inline void slab_free(struct kmem_cache *s,
>  
>  	slab_free_hook(s, x);
>  
> -#ifndef CONFIG_CMPXCHG_LOCAL
> +#ifndef SLUB_USE_CMPXCHG_DOUBLE
>  	local_irq_save(flags);
>  
>  #else
> @@ -2136,7 +2141,7 @@ redo:
>  	 */
>  	c = __this_cpu_ptr(s->cpu_slab);
>  
> -#ifdef CONFIG_CMPXCHG_LOCAL
> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>  	tid = c->tid;
>  	barrier();
>  #endif
> @@ -2144,7 +2149,7 @@ redo:
>  	if (likely(page == c->page && c->node != NUMA_NO_NODE)) {
>  		set_freepointer(s, object, c->freelist);
>  
> -#ifdef CONFIG_CMPXCHG_LOCAL
> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>  		if (unlikely(!this_cpu_cmpxchg_double(
>  				s->cpu_slab->freelist, s->cpu_slab->tid,
>  				c->freelist, tid,
> @@ -2160,7 +2165,7 @@ redo:
>  	} else
>  		__slab_free(s, page, x, addr);
>  
> -#ifndef CONFIG_CMPXCHG_LOCAL
> +#ifndef SLUB_USE_CMPXCHG_DOUBLE
>  	local_irq_restore(flags);
>  #endif
>  }
> @@ -2354,7 +2359,7 @@ static inline int alloc_kmem_cache_cpus(struct kmem_cache *s)
>  	BUILD_BUG_ON(PERCPU_DYNAMIC_EARLY_SIZE <
>  			SLUB_PAGE_SHIFT * sizeof(struct kmem_cache_cpu));
>  
> -#ifdef CONFIG_CMPXCHG_LOCAL
> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>  	/*
>  	 * Must align to double word boundary for the double cmpxchg instructions
>  	 * to work.
> 


Is anybody taking the patch and sending it upstream ?

Thanks



^ permalink raw reply

* Re: [Bugme-new] [Bug 33502] New: Caught 64-bit read from uninitialized memory in __alloc_skb
From: Pekka Enberg @ 2011-05-05  6:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Lameter, casteyde.christian, Andrew Morton, netdev,
	bugzilla-daemon, bugme-daemon, Vegard Nossum
In-Reply-To: <1304576296.32152.713.camel@edumazet-laptop>

On 5/5/11 9:18 AM, Eric Dumazet wrote:
> Le mercredi 20 avril 2011 à 17:01 +0200, Eric Dumazet a écrit :
>> Le mercredi 20 avril 2011 à 09:42 -0500, Christoph Lameter a écrit :
>>
>>> Avoiding the irq handling yields the savings that improve the fastpath. if
>>> you do both then there is only a regression left. So lets go with
>>> disabling the CMPXCHG_LOCAL.
>>
>> OK, let's do that then.
>>
>> Thanks
>>
>> [PATCH v4] slub: dont use cmpxchg_double if KMEMCHECK or DEBUG_PAGEALLOC
>>
>> Christian Casteyde reported a KMEMCHECK splat in slub code.
>>
>> Problem is now we are lockless and allow IRQ in slab_alloc(), the object
>> we manipulate from freelist can be allocated and freed right before we
>> try to read object->next.
>>
>> Same problem can happen with DEBUG_PAGEALLOC
>>
>> Just dont use cmpxchg_double() if either CONFIG_KMEMCHECK or
>> CONFIG_DEBUG_PAGEALLOC is defined.
>>
>> Reported-by: Christian Casteyde<casteyde.christian@free.fr>
>> Signed-off-by: Eric Dumazet<eric.dumazet@gmail.com>
>> Cc: Andrew Morton<akpm@linux-foundation.org>
>> Cc: Pekka Enberg<penberg@cs.helsinki.fi>
>> Cc: Vegard Nossum<vegardno@ifi.uio.no>
>> Cc: Christoph Lameter<cl@linux.com>
>> ---
>>   mm/slub.c |   45 +++++++++++++++++++++++++--------------------
>>   1 files changed, 25 insertions(+), 20 deletions(-)
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 94d2a33..f31ab2c 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1540,7 +1540,12 @@ static void unfreeze_slab(struct kmem_cache *s, struct page *page, int tail)
>>   	}
>>   }
>>
>> -#ifdef CONFIG_CMPXCHG_LOCAL
>> +#if defined(CONFIG_CMPXCHG_LOCAL)&&  !defined(CONFIG_KMEMCHECK)&&  \
>> +				     !defined(CONFIG_DEBUG_PAGEALLOC)
>> +#define SLUB_USE_CMPXCHG_DOUBLE
>> +#endif
>> +
>> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>>   #ifdef CONFIG_PREEMPT
>>   /*
>>    * Calculate the next globally unique transaction for disambiguiation
>> @@ -1604,7 +1609,7 @@ static inline void note_cmpxchg_failure(const char *n,
>>
>>   void init_kmem_cache_cpus(struct kmem_cache *s)
>>   {
>> -#ifdef CONFIG_CMPXCHG_LOCAL
>> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>>   	int cpu;
>>
>>   	for_each_possible_cpu(cpu)
>> @@ -1643,7 +1648,7 @@ static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
>>   		page->inuse--;
>>   	}
>>   	c->page = NULL;
>> -#ifdef CONFIG_CMPXCHG_LOCAL
>> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>>   	c->tid = next_tid(c->tid);
>>   #endif
>>   	unfreeze_slab(s, page, tail);
>> @@ -1780,7 +1785,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>   {
>>   	void **object;
>>   	struct page *new;
>> -#ifdef CONFIG_CMPXCHG_LOCAL
>> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>>   	unsigned long flags;
>>
>>   	local_irq_save(flags);
>> @@ -1819,7 +1824,7 @@ load_freelist:
>>   	c->node = page_to_nid(c->page);
>>   unlock_out:
>>   	slab_unlock(c->page);
>> -#ifdef CONFIG_CMPXCHG_LOCAL
>> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>>   	c->tid = next_tid(c->tid);
>>   	local_irq_restore(flags);
>>   #endif
>> @@ -1858,7 +1863,7 @@ new_slab:
>>   	}
>>   	if (!(gfpflags&  __GFP_NOWARN)&&  printk_ratelimit())
>>   		slab_out_of_memory(s, gfpflags, node);
>> -#ifdef CONFIG_CMPXCHG_LOCAL
>> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>>   	local_irq_restore(flags);
>>   #endif
>>   	return NULL;
>> @@ -1887,7 +1892,7 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
>>   {
>>   	void **object;
>>   	struct kmem_cache_cpu *c;
>> -#ifdef CONFIG_CMPXCHG_LOCAL
>> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>>   	unsigned long tid;
>>   #else
>>   	unsigned long flags;
>> @@ -1896,7 +1901,7 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
>>   	if (slab_pre_alloc_hook(s, gfpflags))
>>   		return NULL;
>>
>> -#ifndef CONFIG_CMPXCHG_LOCAL
>> +#ifndef SLUB_USE_CMPXCHG_DOUBLE
>>   	local_irq_save(flags);
>>   #else
>>   redo:
>> @@ -1910,7 +1915,7 @@ redo:
>>   	 */
>>   	c = __this_cpu_ptr(s->cpu_slab);
>>
>> -#ifdef CONFIG_CMPXCHG_LOCAL
>> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>>   	/*
>>   	 * The transaction ids are globally unique per cpu and per operation on
>>   	 * a per cpu queue. Thus they can be guarantee that the cmpxchg_double
>> @@ -1927,7 +1932,7 @@ redo:
>>   		object = __slab_alloc(s, gfpflags, node, addr, c);
>>
>>   	else {
>> -#ifdef CONFIG_CMPXCHG_LOCAL
>> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>>   		/*
>>   		 * The cmpxchg will only match if there was no additional
>>   		 * operation and if we are on the right processor.
>> @@ -1954,7 +1959,7 @@ redo:
>>   		stat(s, ALLOC_FASTPATH);
>>   	}
>>
>> -#ifndef CONFIG_CMPXCHG_LOCAL
>> +#ifndef SLUB_USE_CMPXCHG_DOUBLE
>>   	local_irq_restore(flags);
>>   #endif
>>
>> @@ -2034,7 +2039,7 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
>>   {
>>   	void *prior;
>>   	void **object = (void *)x;
>> -#ifdef CONFIG_CMPXCHG_LOCAL
>> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>>   	unsigned long flags;
>>
>>   	local_irq_save(flags);
>> @@ -2070,7 +2075,7 @@ checks_ok:
>>
>>   out_unlock:
>>   	slab_unlock(page);
>> -#ifdef CONFIG_CMPXCHG_LOCAL
>> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>>   	local_irq_restore(flags);
>>   #endif
>>   	return;
>> @@ -2084,7 +2089,7 @@ slab_empty:
>>   		stat(s, FREE_REMOVE_PARTIAL);
>>   	}
>>   	slab_unlock(page);
>> -#ifdef CONFIG_CMPXCHG_LOCAL
>> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>>   	local_irq_restore(flags);
>>   #endif
>>   	stat(s, FREE_SLAB);
>> @@ -2113,7 +2118,7 @@ static __always_inline void slab_free(struct kmem_cache *s,
>>   {
>>   	void **object = (void *)x;
>>   	struct kmem_cache_cpu *c;
>> -#ifdef CONFIG_CMPXCHG_LOCAL
>> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>>   	unsigned long tid;
>>   #else
>>   	unsigned long flags;
>> @@ -2121,7 +2126,7 @@ static __always_inline void slab_free(struct kmem_cache *s,
>>
>>   	slab_free_hook(s, x);
>>
>> -#ifndef CONFIG_CMPXCHG_LOCAL
>> +#ifndef SLUB_USE_CMPXCHG_DOUBLE
>>   	local_irq_save(flags);
>>
>>   #else
>> @@ -2136,7 +2141,7 @@ redo:
>>   	 */
>>   	c = __this_cpu_ptr(s->cpu_slab);
>>
>> -#ifdef CONFIG_CMPXCHG_LOCAL
>> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>>   	tid = c->tid;
>>   	barrier();
>>   #endif
>> @@ -2144,7 +2149,7 @@ redo:
>>   	if (likely(page == c->page&&  c->node != NUMA_NO_NODE)) {
>>   		set_freepointer(s, object, c->freelist);
>>
>> -#ifdef CONFIG_CMPXCHG_LOCAL
>> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>>   		if (unlikely(!this_cpu_cmpxchg_double(
>>   				s->cpu_slab->freelist, s->cpu_slab->tid,
>>   				c->freelist, tid,
>> @@ -2160,7 +2165,7 @@ redo:
>>   	} else
>>   		__slab_free(s, page, x, addr);
>>
>> -#ifndef CONFIG_CMPXCHG_LOCAL
>> +#ifndef SLUB_USE_CMPXCHG_DOUBLE
>>   	local_irq_restore(flags);
>>   #endif
>>   }
>> @@ -2354,7 +2359,7 @@ static inline int alloc_kmem_cache_cpus(struct kmem_cache *s)
>>   	BUILD_BUG_ON(PERCPU_DYNAMIC_EARLY_SIZE<
>>   			SLUB_PAGE_SHIFT * sizeof(struct kmem_cache_cpu));
>>
>> -#ifdef CONFIG_CMPXCHG_LOCAL
>> +#ifdef SLUB_USE_CMPXCHG_DOUBLE
>>   	/*
>>   	 * Must align to double word boundary for the double cmpxchg instructions
>>   	 * to work.
>>
>
>
> Is anybody taking the patch and sending it upstream ?

Oh, sorry, it got lost in my inbox. I can take care of it.

			Pekka

^ permalink raw reply

* RE: [PATCH] mlx4_en: Setting RSS hash result to skb->rxhash field
From: Yevgeny Petrilin @ 2011-05-05  6:47 UTC (permalink / raw)
  To: Eric Dumazet, Ben Hutchings; +Cc: davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <1304527586.32152.5.camel@edumazet-laptop>

> >
> > An 8-bit hash is almost useless.  It's entirely useless if you then
> > shift it into the top bits of rxhash.
> >
> 
> Agreed. This is very bad.
> 
> Yevgeny probably did this shift because get_rps_cpu() does :
> 
> tcpu = map->cpus[((u64) skb->rxhash * map->len) >> 32];
> 
> (If rxhash is not a pure random u32 distribution, then high order bits are more important than low order bits)
> 
> 
Eric, you are correct.
We do plan to enable full 32 bit hash for our devices.
Once it is done, we will naturally use the whole 32 bits.
In the meanwhile, even with this change we see improved performance when enabling RPS.

Thanks,
Yevgeny

^ permalink raw reply

* Re: [Bugme-new] [Bug 33502] New: Caught 64-bit read from uninitialized memory in __alloc_skb
From: Eric Dumazet @ 2011-05-05  6:50 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, casteyde.christian, Andrew Morton, netdev,
	bugzilla-daemon, bugme-daemon, Vegard Nossum
In-Reply-To: <4DC24239.7050806@cs.helsinki.fi>

Le jeudi 05 mai 2011 à 09:22 +0300, Pekka Enberg a écrit :

> Oh, sorry, it got lost in my inbox. I can take care of it.
> 

Thanks Pekka !



^ permalink raw reply

* Re: [PATCH] [RFC] vlan: fix updating wanted_features for vlan device
From: Michał Mirosław @ 2011-05-05  6:51 UTC (permalink / raw)
  To: Yi Zou; +Cc: netdev, jeffrey.t.kirsher, devel
In-Reply-To: <20110505004841.26691.74872.stgit@localhost6.localdomain6>

On Wed, May 04, 2011 at 05:48:42PM -0700, Yi Zou wrote:
> commit 8a0427b "vlan: convert VLAN devices to use ndo_fix_features()" converts the
> vlan to support ndo_fix_features. However, the wanted_features is not updated
> for the vlan device, causing real_dev->features not be populated to the vlan
> device when real_dev->features are changed by the driver through FEAT_CHANGE.
> This is breaking FCoE related netdev feature flags on vlan devices. Add updating
> wanted_features to vlan_transfer_features() so netdev_get_wanted_features() will
> can get the updated wanted feature flags for vlan device properly.

Can you describe the situation further? There might be problems if device
changes its vlan_features after creation of VLAN devices on top (bonding?).

dev->wanted_features is only what user wants to get and should not be changed
by anything else.

Best Regards,
Michał Mirosław

^ permalink raw reply

* RE: [PATCH] mlx4_en: Setting RSS hash result to skb->rxhash field
From: Eric Dumazet @ 2011-05-05  6:55 UTC (permalink / raw)
  To: Yevgeny Petrilin
  Cc: Ben Hutchings, davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <953B660C027164448AE903364AC447D2070B8E35@mtldag01.mtl.com>

Le jeudi 05 mai 2011 à 06:47 +0000, Yevgeny Petrilin a écrit :
> > 
> Eric, you are correct.
> We do plan to enable full 32 bit hash for our devices.
> Once it is done, we will naturally use the whole 32 bits.
> In the meanwhile, even with this change we see improved performance when enabling RPS.

Hmm, thats strange because you have only 256 possible values for rxhash,
and it's OK for maybe one hundred flows. (I am talking not of RPS but
RFS here)

So your patch is a win only for small hosts, or particular workloads
(did I say biased benchmarks ? ;) )

Really, we better use the linux/software full 32bit rxhash in this case,
and wait for your 32bit full support.

^ permalink raw reply

* RE: [PATCH] mlx4_en: Setting RSS hash result to skb->rxhash field
From: Eric Dumazet @ 2011-05-05  6:57 UTC (permalink / raw)
  To: Yevgeny Petrilin
  Cc: Ben Hutchings, davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <1304578518.32152.794.camel@edumazet-laptop>

Le jeudi 05 mai 2011 à 08:55 +0200, Eric Dumazet a écrit :
> Le jeudi 05 mai 2011 à 06:47 +0000, Yevgeny Petrilin a écrit :
> > > 
> > Eric, you are correct.
> > We do plan to enable full 32 bit hash for our devices.
> > Once it is done, we will naturally use the whole 32 bits.
> > In the meanwhile, even with this change we see improved performance when enabling RPS.
> 
> Hmm, thats strange because you have only 256 possible values for rxhash,
> and it's OK for maybe one hundred flows. (I am talking not of RPS but
> RFS here)
> 
> So your patch is a win only for small hosts, or particular workloads
> (did I say biased benchmarks ? ;) )
> 
> Really, we better use the linux/software full 32bit rxhash in this case,
> and wait for your 32bit full support.
> 
> 

BTW, before you submit another rxhash patch, please make sure ethtool
support is included. An admin must be able to switch off hardware
support :

ethtool -K eth0 rxhash off




^ permalink raw reply

* Re: [PATCHv4 2/2] tg3: Allow ethtool to enable/disable loopback.
From: Michał Mirosław @ 2011-05-05  6:59 UTC (permalink / raw)
  To: Mahesh Bandewar
  Cc: Matt Carlson, David Miller, netdev, Michael Chan, Tom Herbert
In-Reply-To: <1304559247-16111-1-git-send-email-maheshb@google.com>

2011/5/5 Mahesh Bandewar <maheshb@google.com>:
> This patch adds tg3_set_features() to handle loopback mode. Currently the
> capability is added for the devices which support internal MAC loopback mode.
> So when enabled, it enables internal-MAC loopback.
>
> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> ---
> Changes since v3:
>  (a) Corrected the condition (|| => &&) where loopback capability is added.
>  (b) set_features() always returns 0.
>  (c) Clear the loopback bit in ndo_open callback to avoid discrepancy.
>
> Changes since v2:
>  Implemtned Joe Perches's style change.
>
> Changes since v1:
>  Implemented Matt Carlson's suggestions.
>  (a) Enable this capability on the devices which are capable of MAC-loopback
>      mode.
>  (b) check if the device is running before making changes.
>  (c) check bits before making changes.
>
>  drivers/net/tg3.c |   69 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 69 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> index 7c7c9a8..7f7a290 100644
> --- a/drivers/net/tg3.c
> +++ b/drivers/net/tg3.c
> @@ -6309,6 +6309,44 @@ dma_error:
>        return NETDEV_TX_OK;
>  }
>
> +static int tg3_set_loopback(struct net_device *dev, u32 features)
> +{
> +       struct tg3 *tp = netdev_priv(dev);
> +       u32 cur_mode = 0;
> +       int retval = 0;

void? You always return zero and don't check it in callers.

[...]
> @@ -9485,6 +9533,15 @@ static int tg3_open(struct net_device *dev)
>
>        netif_tx_start_all_queues(dev);
>
> +       /*
> +        * Reset loopback feature if it was turned on while the device was down
> +        * to avoid and any discrepancy in features reporting.
> +        */
> +       if (dev->features & NETIF_F_LOOPBACK) {
> +               dev->features &= ~NETIF_F_LOOPBACK;
> +               dev->wanted_features &= ~NETIF_F_LOOPBACK;
> +       }
> +

if (dev->features & NETIF_F_LOOPBACK)
  tg3_set_loopback(dev, dev->features);

Whatever you do, don't modify wanted_features in drivers.

BTW, similar problems (also like in previous versions) are in
forcedeth implementation.

Best Regards,
Michał Mirosław

^ permalink raw reply

* RE: [PATCH] mlx4_en: Setting RSS hash result to skb->rxhash field
From: Yevgeny Petrilin @ 2011-05-05  7:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Ben Hutchings, davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <1304578635.32152.800.camel@edumazet-laptop>

> > > >
> > > Eric, you are correct.
> > > We do plan to enable full 32 bit hash for our devices.
> > > Once it is done, we will naturally use the whole 32 bits.
> > > In the meanwhile, even with this change we see improved performance when enabling RPS.
> >
> > Hmm, thats strange because you have only 256 possible values for rxhash,
> > and it's OK for maybe one hundred flows. (I am talking not of RPS but
> > RFS here)
> >
> > So your patch is a win only for small hosts, or particular workloads
> > (did I say biased benchmarks ? ;) )
> >
> > Really, we better use the linux/software full 32bit rxhash in this case,
> > and wait for your 32bit full support.
> >
> >
> 
> BTW, before you submit another rxhash patch, please make sure ethtool
> support is included. An admin must be able to switch off hardware
> support :
> 
> ethtool -K eth0 rxhash off
> 
> 
Thanks for this input

^ permalink raw reply

* RE: [PATCH] [RFC] vlan: fix updating wanted_features for vlan device
From: Zou, Yi @ 2011-05-05  7:49 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: netdev@vger.kernel.org, Kirsher, Jeffrey T, devel@open-fcoe.org
In-Reply-To: <20110505065153.GA3288@rere.qmqm.pl>

> 
> On Wed, May 04, 2011 at 05:48:42PM -0700, Yi Zou wrote:
> > commit 8a0427b "vlan: convert VLAN devices to use ndo_fix_features()"
> converts the
> > vlan to support ndo_fix_features. However, the wanted_features is not
> updated
> > for the vlan device, causing real_dev->features not be populated to the
> vlan
> > device when real_dev->features are changed by the driver through
> FEAT_CHANGE.
> > This is breaking FCoE related netdev feature flags on vlan devices. Add
> updating
> > wanted_features to vlan_transfer_features() so
> netdev_get_wanted_features() will
> > can get the updated wanted feature flags for vlan device properly.
> 
> Can you describe the situation further? There might be problems if device
> changes its vlan_features after creation of VLAN devices on top
> (bonding?).

Not sure for bonding, I noticed this when creating FCoE instance on a vlan
Device on top of an Intel 82599 device. The real_dev is created with 
vlan_features set of related FCoE flags, e.g., NETIF_F_FCOE_MTU, NETIF_F_FCOE_CRC,
etc. However, the real_dev's features are not set w/ these FCoE flags till 
the FCoE protocol stack starts using FCoE when ndo_fcoe_enable() is called, 
where in the case of 82599's case, these flags are toggled on in netdev's 
features, and netdev_features_change() is called to populate to the upper 
vlan device since previously we do:

       vlandev->features &= ~dev->vlan_features;
       vlandev->features |= dev->features & dev->vlan_features; 

in vlan_transfer_features(). FCoE only checks the corresponding netdev features
(vlan or real_dev) passed in from user space as a interface name e.g., eth2.100,
to tell if, for example, FCOE_MTU is supported. Particularly for this case, there
is no change on vlan_features. 

> 
> dev->wanted_features is only what user wants to get and should not be
> changed
> by anything else.
Hmm...so my usage for that seems to be wrong if that's what wanted_features is
meant to be. Well, wanted_features in vlan is features & hw_features, if it stays,
then, it seems to me will never get changes from real_dev's features since 
netdev_get_wanted_features() won't get it to begin w/.

Also, note my RFC isn't complete either, as it only sets not clears previously
flags in wanted_features.

Thanks,
yi
> 
> Best Regards,
> Michał Mirosław

^ permalink raw reply

* Re: [PATCH 0/4] [RFC] virtio-net: Improve small packet performance
From: Krishna Kumar2 @ 2011-05-05  8:03 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: davem, eric.dumazet, kvm, netdev, rusty
In-Reply-To: <20110504212359.GA21446@redhat.com>

"Michael S. Tsirkin" <mst@redhat.com> wrote on 05/05/2011 02:53:59 AM:

> > Not "hope" exactly. If the device is not ready, then
> > the packet is requeued. The main idea is to avoid
> > drops/stop/starts, etc.
>
> Yes, I see that, definitely. I guess it's a win if the
> interrupt takes at least a jiffy to arrive anyway,
> and a loss if not. Is there some reason interrupts
> might be delayed until the next jiffy?

I can explain this a bit as I have three debug counters
in start_xmit() just for this:

1. Whether the current xmit call was good, i.e. we had
   returned BUSY last time and this xmit was successful.
2. Whether the current xmit call was bad, i.e. we had
   returned BUSY last time and this xmit still failed.
3. The free capacity when we *resumed* xmits. This is
   after calling free_old_xmit_skbs where this function
   is not throttled, in effect it processes *all* the
   completed skbs. This counter is a sum:

   if (If_I_had_returned_EBUSY_last_iteration)
       free_slots += virtqueue_get_capacity();

The counters after a 30 min run of 1K,2K,16K netperf
sessions are:

Good:          1059172
Bad:           31226
Sum of slots:  47551557

(Total of Good+Bad tallies with the total number of requeues
as shown by tc:

qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1
1 1 1 1
 Sent 1560854473453 bytes 1075873684 pkt (dropped 718379, overlimits 0
requeues 1090398)
 backlog 0b 0p requeues 1090398
)

It shows that 2.9% of the time, the 1 jiffy was not enough
to free up space in the txq. That could also mean that we
had set xmit_restart just before jiffies changed. But the
average free capacity when we *resumed* xmits is:
Sum of slots / (Good + Bad) = 43.

So the delay of 1 jiffy helped the host clean up, on average,
just 43 entries, which is 16% of total entries. This is
intended to show that the guest is not sitting idle waiting
for the jiffy to expire.

> > > I can post it, mind testing this?
> >
> > Sure.
>
> Just posted. Would appreciate feedback.

Do I need to apply all the patches and simply test?

Thanks,

- KK

^ permalink raw reply

* Re: [PATCH 07/18] virtio ring: inline function to check for events
From: Stefan Hajnoczi @ 2011-05-05  8:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Krishna Kumar, Carsten Otte, lguest, Shirley Ma,
	kvm, linux-s390, netdev, habanero, Heiko Carstens, virtualization,
	steved, Christian Borntraeger, Tom Lendacky, Martin Schwidefsky,
	linux390
In-Reply-To: <cc4596a5ecdbf8dd37b5567a36667e3841f18ca3.1304541919.git.mst@redhat.com>

On Wed, May 4, 2011 at 9:51 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> With the new used_event and avail_event and features, both
> host and guest need similar logic to check whether events are
> enabled, so it helps to put the common code in the header.
>
> Note that Xen has similar logic for notification hold-off
> in include/xen/interface/io/ring.h with req_event and req_prod
> corresponding to event_idx + 1 and new_idx respectively.
> +1 comes from the fact that req_event and req_prod in Xen start at 1,
> while event index in virtio starts at 0.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  include/linux/virtio_ring.h |   14 ++++++++++++++
>  1 files changed, 14 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> index f791772..2a3b0ea 100644
> --- a/include/linux/virtio_ring.h
> +++ b/include/linux/virtio_ring.h
> @@ -124,6 +124,20 @@ static inline unsigned vring_size(unsigned int num, unsigned long align)
>                + sizeof(__u16) * 3 + sizeof(struct vring_used_elem) * num;
>  }
>
> +/* The following is used with USED_EVENT_IDX and AVAIL_EVENT_IDX */
> +/* Assuming a given event_idx value from the other size, if

s/other size/other side/ ?

Stefan

^ permalink raw reply

* [PATCH] be2net: Fixed bugs related to PVID.
From: Somnath Kotur @ 2011-05-05  8:40 UTC (permalink / raw)
  To: netdev, davem; +Cc: Somnath Kotur

Fixed bug to make sure 'pvid' retrieval will work on big endian hosts.
Fixed incorrect comparison between the Rx Completion's 16-bit VLAN TCI
and the PVID. Now comparing only the relevant 12 bits corresponding to
the VID.
Renamed 'vid' field under Rx Completion to 'vlan_tag' to reflect
accurate description.

Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
---
 drivers/net/benet/be.h      |    2 +-
 drivers/net/benet/be_cmds.c |    2 +-
 drivers/net/benet/be_main.c |   18 ++++++++++++------
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/net/benet/be.h b/drivers/net/benet/be.h
index 66823ed..2353eca 100644
--- a/drivers/net/benet/be.h
+++ b/drivers/net/benet/be.h
@@ -213,7 +213,7 @@ struct be_rx_stats {
 
 struct be_rx_compl_info {
 	u32 rss_hash;
-	u16 vid;
+	u16 vlan_tag;
 	u16 pkt_size;
 	u16 rxq_idx;
 	u16 mac_id;
diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index 1e2d825..9dc9394 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -132,7 +132,7 @@ static void be_async_grp5_pvid_state_process(struct be_adapter *adapter,
 		struct be_async_event_grp5_pvid_state *evt)
 {
 	if (evt->enabled)
-		adapter->pvid = evt->tag;
+		adapter->pvid = le16_to_cpu(evt->tag);
 	else
 		adapter->pvid = 0;
 }
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 02a0443..9187fb4 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -1018,7 +1018,8 @@ static void be_rx_compl_process(struct be_adapter *adapter,
 			kfree_skb(skb);
 			return;
 		}
-		vlan_hwaccel_receive_skb(skb, adapter->vlan_grp, rxcp->vid);
+		vlan_hwaccel_receive_skb(skb, adapter->vlan_grp,
+					rxcp->vlan_tag);
 	} else {
 		netif_receive_skb(skb);
 	}
@@ -1076,7 +1077,8 @@ static void be_rx_compl_process_gro(struct be_adapter *adapter,
 	if (likely(!rxcp->vlanf))
 		napi_gro_frags(&eq_obj->napi);
 	else
-		vlan_gro_frags(&eq_obj->napi, adapter->vlan_grp, rxcp->vid);
+		vlan_gro_frags(&eq_obj->napi, adapter->vlan_grp,
+				rxcp->vlan_tag);
 }
 
 static void be_parse_rx_compl_v1(struct be_adapter *adapter,
@@ -1102,7 +1104,8 @@ static void be_parse_rx_compl_v1(struct be_adapter *adapter,
 	rxcp->pkt_type =
 		AMAP_GET_BITS(struct amap_eth_rx_compl_v1, cast_enc, compl);
 	rxcp->vtm = AMAP_GET_BITS(struct amap_eth_rx_compl_v1, vtm, compl);
-	rxcp->vid = AMAP_GET_BITS(struct amap_eth_rx_compl_v1, vlan_tag, compl);
+	rxcp->vlan_tag = AMAP_GET_BITS(struct amap_eth_rx_compl_v1, vlan_tag,
+					compl);
 }
 
 static void be_parse_rx_compl_v0(struct be_adapter *adapter,
@@ -1128,7 +1131,8 @@ static void be_parse_rx_compl_v0(struct be_adapter *adapter,
 	rxcp->pkt_type =
 		AMAP_GET_BITS(struct amap_eth_rx_compl_v0, cast_enc, compl);
 	rxcp->vtm = AMAP_GET_BITS(struct amap_eth_rx_compl_v0, vtm, compl);
-	rxcp->vid = AMAP_GET_BITS(struct amap_eth_rx_compl_v0, vlan_tag, compl);
+	rxcp->vlan_tag = AMAP_GET_BITS(struct amap_eth_rx_compl_v0, vlan_tag,
+					compl);
 }
 
 static struct be_rx_compl_info *be_rx_compl_get(struct be_rx_obj *rxo)
@@ -1155,9 +1159,11 @@ static struct be_rx_compl_info *be_rx_compl_get(struct be_rx_obj *rxo)
 		rxcp->vlanf = 0;
 
 	if (!lancer_chip(adapter))
-		rxcp->vid = swab16(rxcp->vid);
+		rxcp->vlan_tag = swab16(rxcp->vlan_tag);
 
-	if ((adapter->pvid == rxcp->vid) && !adapter->vlan_tag[rxcp->vid])
+	if (((adapter->pvid & VLAN_VID_MASK) ==
+		(rxcp->vlan_tag & VLAN_VID_MASK)) &&
+		!adapter->vlan_tag[rxcp->vlan_tag])
 		rxcp->vlanf = 0;
 
 	/* As the compl has been parsed, reset it; we wont touch it again */
-- 
1.5.6.1


^ permalink raw reply related

* Re: tap/bridge: Dropping NETIF_F_GSO/NETIF_F_SG
From: Michael S. Tsirkin @ 2011-05-05  8:44 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: Herbert Xu, netdev, Ben Hutchings
In-Reply-To: <20110504232854.GA11687@rere.qmqm.pl>

On Thu, May 05, 2011 at 01:28:54AM +0200, Michał Mirosław wrote:
> On Thu, May 05, 2011 at 08:34:15AM +1000, Herbert Xu wrote:
> > On Wed, May 04, 2011 at 09:18:14PM +0300, Michael S. Tsirkin wrote:
> > > BTW, I just noticed that net-next spits out
> > > many of the following when I run any VMs:
> > > 
> > > tap0: Dropping NETIF_F_SG since no checksum feature.
> > > tap0: Dropping NETIF_F_GSO since no SG feature.
> > > tap0: Features changed: 0x401b4849 -> 0x40004040
> > > device msttap0 entered promiscuous mode
> > > br0: Dropping NETIF_F_GSO since no SG feature.
> > > br0: port 1(msttap0) entering forwarding state
> > > br0: port 1(msttap0) entering forwarding state
> > > tap0: Features changed: 0x40004040 -> 0x40024849
> > > tap0: Dropping NETIF_F_SG since no checksum feature.
> > > tap0: Dropping NETIF_F_GSO since no SG feature.
> > > tap0: Features changed: 0x40024849 -> 0x40004040
> > > br0: Dropping NETIF_F_GSO since no SG feature.
> > > tap0: Dropping NETIF_F_SG since no checksum feature.
> > > tap0: Dropping NETIF_F_GSO since no SG feature.
> > > tap0: Dropping NETIF_F_SG since no checksum feature.
> > > tap0: Dropping NETIF_F_GSO since no SG feature.
> > > tap0: Dropping NETIF_F_SG since no checksum feature.
> > > tap0: Dropping NETIF_F_GSO since no SG feature.
> > > tap0: Dropping NETIF_F_SG since no checksum feature.
> > > tap0: Dropping NETIF_F_GSO since no SG feature.
> > > tap0: Features changed: 0x40004040 -> 0x401b4849
> > > tap0: Dropping NETIF_F_SG since no checksum feature.
> > > tap0: Dropping NETIF_F_GSO since no SG feature.
> > > tap0: Features changed: 0x401b4849 -> 0x40004040
> > > br0: Dropping NETIF_F_GSO since no SG feature.
> > > 
> > > My problem is not primarily with warnings:
> > > 
> > > will that linearize all packets and disable GSO
> > > for tap and bridge? If yes it can't be good
> > > for performance...
> > I think so.  So the question is why is checksum off?
> 
> Whatever application is creating the tap0 device is not calling
> ioctl(TUNSETOFFLOAD) with TUN_F_CSUM set. This is userspace bug/feature
> exposed by recent changes to netdev features handling.
> 
> Best Regards,
> Michał Mirosław

No, I think it's not a bug in userspace. Here is what it does:

    /* Check if our kernel supports TUNSETOFFLOAD */
    if (ioctl(fd, TUNSETOFFLOAD, 0) != 0 && errno == EINVAL) {
        return;
    }

    if (csum) {
        offload |= TUN_F_CSUM;
        if (tso4)
            offload |= TUN_F_TSO4;
        if (tso6)
            offload |= TUN_F_TSO6;
        if ((tso4 || tso6) && ecn)
            offload |= TUN_F_TSO_ECN;
        if (ufo)
            offload |= TUN_F_UFO;
    }

    if (ioctl(fd, TUNSETOFFLOAD, offload) != 0) {
        offload &= ~TUN_F_UFO;
        if (ioctl(fd, TUNSETOFFLOAD, offload) != 0) {
            fprintf(stderr, "TUNSETOFFLOAD ioctl() failed: %s\n",
                    strerror(errno));
        }
    }

The first call is just to check that ioctl works.
When checking for ioctl in this way, userspace clears checksum.
This will clear SG and thus GSO; later userspace enables checksum.
checksum is on but SG is by now disabled so GSO gets cleared again too.


It's also likely a problem that
userspace can trigger warnings in log for what used to be
a legal way to check for ioctl and/or disable checksum offloading,
but that is more minor.

-- 
MST

^ permalink raw reply

* Re: [PATCH 2/2 net-next] net: drivers: set TSO/UFO offload option explicitly
From: Shan Wei @ 2011-05-05  8:51 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: David Miller, netdev, rusty, mst, Eric Dumazet, mirq-linux,
	bhutchings, dm
In-Reply-To: <BANLkTikckKWTBY0udurXi09UhdqOUExk6A@mail.gmail.com>

Michał Mirosław wrote, at 05/04/2011 08:36 PM:
> 2011/4/29 Shan Wei <shanwei@cn.fujitsu.com>:
>> The device drivers should not use NETIF_F_ALL_TSO mask to set hw_features(or features),
>> but have to explicitly set offload option. Because, This would make drivers automatically
>> clain to support any new TSO feature an the moment of NETIF_F_ALL_TSO is expanded.
>>
>> Some code style tuning. Just compile test.
>>
>> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
>> ---
>>  drivers/net/loopback.c   |   18 ++++++++----------
>>  drivers/net/virtio_net.c |    9 ++++++---
>>  2 files changed, 14 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
>> index d70fb76..bfb6a4a 100644
>> --- a/drivers/net/loopback.c
>> +++ b/drivers/net/loopback.c
>> @@ -152,6 +152,9 @@ static const struct net_device_ops loopback_ops = {
>>        .ndo_get_stats64 = loopback_get_stats64,
>>  };
>>
>> +#define LOOPBACK_USER_FEATURES (NETIF_F_TSO | NETIF_F_TSO_ECN | \
>> +                               NETIF_F_TSO6 | NETIF_F_UFO)
>> +
>>  /*
>>  * The loopback device is special. There is only one instance
>>  * per network namespace.
>> @@ -165,16 +168,11 @@ static void loopback_setup(struct net_device *dev)
>>        dev->type               = ARPHRD_LOOPBACK;      /* 0x0001*/
>>        dev->flags              = IFF_LOOPBACK;
>>        dev->priv_flags        &= ~IFF_XMIT_DST_RELEASE;
>> -       dev->hw_features        = NETIF_F_ALL_TSO | NETIF_F_UFO;
>> -       dev->features           = NETIF_F_SG | NETIF_F_FRAGLIST
>> -               | NETIF_F_ALL_TSO
>> -               | NETIF_F_UFO
>> -               | NETIF_F_NO_CSUM
>> -               | NETIF_F_RXCSUM
>> -               | NETIF_F_HIGHDMA
>> -               | NETIF_F_LLTX
>> -               | NETIF_F_NETNS_LOCAL
>> -               | NETIF_F_VLAN_CHALLENGED;
>> +       dev->hw_features        = LOOPBACK_USER_FEATURES;
>> +       dev->features           = NETIF_F_SG | NETIF_F_FRAGLIST
>> +               | LOOPBACK_USER_FEATURES | NETIF_F_NO_CSUM | NETIF_F_RXCSUM
>> +               | NETIF_F_HIGHDMA | NETIF_F_LLTX
>> +               | NETIF_F_NETNS_LOCAL | NETIF_F_VLAN_CHALLENGED;
>>        dev->ethtool_ops        = &loopback_ethtool_ops;
>>        dev->header_ops         = &eth_header_ops;
>>        dev->netdev_ops         = &loopback_ops;
> 
> You can add NETIF_F_HIGHDMA and NETIF_F_NO_CSUM to
> LOOPBACK_USER_FEATURES in one go. NETIF_F_RXCSUM could match
> NETIF_F_NO_CSUM state (this needs ndo_fix_features callback), but this
> won't have much real functional impact.

Do you mean that defining LOOPBACK_USER_FEATURES as:
#define LOOPBACK_USER_FEATURES (NETIF_F_TSO | NETIF_F_TSO_ECN | \
                               NETIF_F_TSO6 | NETIF_F_UFO | NETIF_F_HIGHDMA | NETIF_F_NO_CSUM)

Is it necessary to add NETIF_F_NO_CSUM to hw_features?
After that, user can change TX hw csum offload(NETIF_F_NO_CSUM is disabled). But, this is not surportted before.

-- 
Best Regards
-----
Shan Wei

^ permalink raw reply

* Re: [PATCH 07/18] virtio ring: inline function to check for events
From: Michael S. Tsirkin @ 2011-05-05  8:56 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-kernel, Krishna Kumar, Carsten Otte, lguest, Shirley Ma,
	kvm, linux-s390, netdev, habanero, Heiko Carstens, virtualization,
	steved, Christian Borntraeger, Tom Lendacky, Martin Schwidefsky,
	linux390
In-Reply-To: <BANLkTi=eH-iK7L2LBAm0eqidapqXqhSqxw@mail.gmail.com>

On Thu, May 05, 2011 at 09:34:46AM +0100, Stefan Hajnoczi wrote:
> On Wed, May 4, 2011 at 9:51 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > With the new used_event and avail_event and features, both
> > host and guest need similar logic to check whether events are
> > enabled, so it helps to put the common code in the header.
> >
> > Note that Xen has similar logic for notification hold-off
> > in include/xen/interface/io/ring.h with req_event and req_prod
> > corresponding to event_idx + 1 and new_idx respectively.
> > +1 comes from the fact that req_event and req_prod in Xen start at 1,
> > while event index in virtio starts at 0.
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  include/linux/virtio_ring.h |   14 ++++++++++++++
> >  1 files changed, 14 insertions(+), 0 deletions(-)
> >
> > diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> > index f791772..2a3b0ea 100644
> > --- a/include/linux/virtio_ring.h
> > +++ b/include/linux/virtio_ring.h
> > @@ -124,6 +124,20 @@ static inline unsigned vring_size(unsigned int num, unsigned long align)
> >                + sizeof(__u16) * 3 + sizeof(struct vring_used_elem) * num;
> >  }
> >
> > +/* The following is used with USED_EVENT_IDX and AVAIL_EVENT_IDX */
> > +/* Assuming a given event_idx value from the other size, if
> 
> s/other size/other side/ ?
> 
> Stefan

Exactly. Good catch, thanks.

-- 
MST

^ permalink raw reply

* Re: [PATCH 0/4] [RFC] virtio-net: Improve small packet performance
From: Michael S. Tsirkin @ 2011-05-05  9:04 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: davem, eric.dumazet, kvm, netdev, rusty
In-Reply-To: <OF2E0182A1.3818A07E-ON65257887.00225EE8-65257887.002C000C@in.ibm.com>

On Thu, May 05, 2011 at 01:33:14PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 05/05/2011 02:53:59 AM:
> 
> > > Not "hope" exactly. If the device is not ready, then
> > > the packet is requeued. The main idea is to avoid
> > > drops/stop/starts, etc.
> >
> > Yes, I see that, definitely. I guess it's a win if the
> > interrupt takes at least a jiffy to arrive anyway,
> > and a loss if not. Is there some reason interrupts
> > might be delayed until the next jiffy?
> 
> I can explain this a bit as I have three debug counters
> in start_xmit() just for this:
> 
> 1. Whether the current xmit call was good, i.e. we had
>    returned BUSY last time and this xmit was successful.
> 2. Whether the current xmit call was bad, i.e. we had
>    returned BUSY last time and this xmit still failed.
> 3. The free capacity when we *resumed* xmits. This is
>    after calling free_old_xmit_skbs where this function
>    is not throttled, in effect it processes *all* the
>    completed skbs. This counter is a sum:
> 
>    if (If_I_had_returned_EBUSY_last_iteration)
>        free_slots += virtqueue_get_capacity();
> 
> The counters after a 30 min run of 1K,2K,16K netperf
> sessions are:
> 
> Good:          1059172
> Bad:           31226
> Sum of slots:  47551557
> 
> (Total of Good+Bad tallies with the total number of requeues
> as shown by tc:
> 
> qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1
> 1 1 1 1
>  Sent 1560854473453 bytes 1075873684 pkt (dropped 718379, overlimits 0
> requeues 1090398)
>  backlog 0b 0p requeues 1090398
> )
> 
> It shows that 2.9% of the time, the 1 jiffy was not enough
> to free up space in the txq.

How common is it to free up space in *less than* 1 jiffy?

> That could also mean that we
> had set xmit_restart just before jiffies changed. But the
> average free capacity when we *resumed* xmits is:
> Sum of slots / (Good + Bad) = 43.
> 
> So the delay of 1 jiffy helped the host clean up, on average,
> just 43 entries, which is 16% of total entries. This is
> intended to show that the guest is not sitting idle waiting
> for the jiffy to expire.

OK, nice, this is exactly what my patchset is trying
to do, without playing with timers: tell the host
to interrupt us after 3/4 of the ring is free.
Why 3/4 and not all of the ring? My hope is we can
get some parallelism with the host this way.
Why 3/4 and not 7/8? No idea :)

> > > > I can post it, mind testing this?
> > >
> > > Sure.
> >
> > Just posted. Would appreciate feedback.
> 
> Do I need to apply all the patches and simply test?
> 
> Thanks,
> 
> - KK

Exactly. You can also try to tune the threshold
for interrupts as well.

-- 
MST

^ permalink raw reply

* Re: [PATCH 2/4] [RFC] virtio: Introduce new API to get free space
From: Michael S. Tsirkin @ 2011-05-05  9:13 UTC (permalink / raw)
  To: Krishna Kumar; +Cc: davem, eric.dumazet, kvm, netdev, rusty
In-Reply-To: <20110504200023.GA20303@redhat.com>

On Wed, May 04, 2011 at 11:00:23PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 04, 2011 at 05:50:19PM +0300, Michael S. Tsirkin wrote:
> > > @@ -185,11 +193,6 @@ int virtqueue_add_buf_gfp(struct virtque
> > >  	if (vq->num_free < out + in) {
> > >  		pr_debug("Can't add buf len %i - avail = %i\n",
> > >  			 out + in, vq->num_free);
> > > -		/* FIXME: for historical reasons, we force a notify here if
> > > -		 * there are outgoing parts to the buffer.  Presumably the
> > > -		 * host should service the ring ASAP. */
> > > -		if (out)
> > > -			vq->notify(&vq->vq);
> > >  		END_USE(vq);
> > >  		return -ENOSPC;
> > >  	}
> > 
> > This will break qemu versions 0.13 and back.
> > I'm adding some new virtio ring flags, we'll be
> > able to reuse one of these to mean 'no need for
> > work around', I think.
> 
> Not really, it wont. We shall almost never get here at all.
> But then, why would this help performance?

I think I understand this finally.
By itself, this patch does not help performance and does not
hurt it. But later patch makes us try to xmit and fail there
instead of doing capacity checks. With *that* patch applied
on top of this one, and with qemu 0.13 and older, performance
will be hurt.

We need to either
- ignore these older hosts
- add a feature bit (or use one of the new ones I added: for example
  with avail_event userspace never needs this behaviour as it can
  ask to get events when ring gets full)
- keep doing capacity checks, which will make us almost never get here


> > -- 
> > MST

^ permalink raw reply

* Re: [PATCH 09/18] virtio: use avail_event index
From: Michael S. Tsirkin @ 2011-05-05  9:34 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-kernel, Rusty Russell, Carsten Otte, Christian Borntraeger,
	linux390, Martin Schwidefsky, Heiko Carstens, Shirley Ma, lguest,
	virtualization, netdev, linux-s390, kvm, Krishna Kumar, steved,
	habanero
In-Reply-To: <201105041658.19917.tahm@linux.vnet.ibm.com>

On Wed, May 04, 2011 at 04:58:18PM -0500, Tom Lendacky wrote:
> 
> On Wednesday, May 04, 2011 03:51:47 PM Michael S. Tsirkin wrote:
> > Use the new avail_event feature to reduce the number
> > of exits from the guest.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  drivers/virtio/virtio_ring.c |   39
> > ++++++++++++++++++++++++++++++++++++++- 1 files changed, 38 insertions(+),
> > 1 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 3a3ed75..262dfe6 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -82,6 +82,15 @@ struct vring_virtqueue
> >  	/* Host supports indirect buffers */
> >  	bool indirect;
> > 
> > +	/* Host publishes avail event idx */
> > +	bool event;
> > +
> > +	/* Is kicked_avail below valid? */
> > +	bool kicked_avail_valid;
> > +
> > +	/* avail idx value we already kicked. */
> > +	u16 kicked_avail;
> > +
> >  	/* Number of free buffers */
> >  	unsigned int num_free;
> >  	/* Head of free buffer list. */
> > @@ -228,6 +237,12 @@ add_head:
> >  	 * new available array entries. */
> >  	virtio_wmb();
> >  	vq->vring.avail->idx++;
> > +	/* If the driver never bothers to kick in a very long while,
> > +	 * avail index might wrap around. If that happens, invalidate
> > +	 * kicked_avail index we stored. TODO: make sure all drivers
> > +	 * kick at least once in 2^16 and remove this. */
> > +	if (unlikely(vq->vring.avail->idx == vq->kicked_avail))
> > +		vq->kicked_avail_valid = true;
> 
> vq->kicked_avail_valid should be set to false here.
> 
> Tom

Right, good catch.

> > 
> >  	pr_debug("Added buffer head %i to %p\n", head, vq);
> >  	END_USE(vq);
> > @@ -236,6 +251,23 @@ add_head:
> >  }
> >  EXPORT_SYMBOL_GPL(virtqueue_add_buf_gfp);
> > 
> > +
> > +static bool vring_notify(struct vring_virtqueue *vq)
> > +{
> > +	u16 old, new;
> > +	bool v;
> > +	if (!vq->event)
> > +		return !(vq->vring.used->flags & VRING_USED_F_NO_NOTIFY);
> > +
> > +	v = vq->kicked_avail_valid;
> > +	old = vq->kicked_avail;
> > +	new = vq->kicked_avail = vq->vring.avail->idx;
> > +	vq->kicked_avail_valid = true;
> > +	if (unlikely(!v))
> > +		return true;
> > +	return vring_need_event(vring_avail_event(&vq->vring), new, old);
> > +}
> > +
> >  void virtqueue_kick(struct virtqueue *_vq)
> >  {
> >  	struct vring_virtqueue *vq = to_vvq(_vq);
> > @@ -244,7 +276,7 @@ void virtqueue_kick(struct virtqueue *_vq)
> >  	/* Need to update avail index before checking if we should notify */
> >  	virtio_mb();
> > 
> > -	if (!(vq->vring.used->flags & VRING_USED_F_NO_NOTIFY))
> > +	if (vring_notify(vq))
> >  		/* Prod other side to tell it about changes. */
> >  		vq->notify(&vq->vq);
> > 
> > @@ -437,6 +469,8 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
> >  	vq->vq.name = name;
> >  	vq->notify = notify;
> >  	vq->broken = false;
> > +	vq->kicked_avail_valid = false;
> > +	vq->kicked_avail = 0;
> >  	vq->last_used_idx = 0;
> >  	list_add_tail(&vq->vq.list, &vdev->vqs);
> >  #ifdef DEBUG
> > @@ -444,6 +478,7 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
> >  #endif
> > 
> >  	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
> > +	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_AVAIL_EVENT_IDX);
> > 
> >  	/* No callback?  Tell other side not to bother us. */
> >  	if (!callback)
> > @@ -482,6 +517,8 @@ void vring_transport_features(struct virtio_device
> > *vdev) break;
> >  		case VIRTIO_RING_F_USED_EVENT_IDX:
> >  			break;
> > +		case VIRTIO_RING_F_AVAIL_EVENT_IDX:
> > +			break;
> >  		default:
> >  			/* We don't understand this bit. */
> >  			clear_bit(i, vdev->features);

^ permalink raw reply

* Re: [PATCH 05/18] virtio: used event index interface
From: Michael S. Tsirkin @ 2011-05-05  9:38 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-kernel, Rusty Russell, Carsten Otte, Christian Borntraeger,
	linux390, Martin Schwidefsky, Heiko Carstens, Shirley Ma, lguest,
	virtualization, netdev, linux-s390, kvm, Krishna Kumar, steved,
	habanero
In-Reply-To: <201105041656.11009.tahm@linux.vnet.ibm.com>

On Wed, May 04, 2011 at 04:56:09PM -0500, Tom Lendacky wrote:
> On Wednesday, May 04, 2011 03:51:09 PM Michael S. Tsirkin wrote:
> > Define a new feature bit for the guest to utilize a used_event index
> > (like Xen) instead if a flag bit to enable/disable interrupts.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  include/linux/virtio_ring.h |    9 +++++++++
> >  1 files changed, 9 insertions(+), 0 deletions(-)
> > 
> > diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> > index e4d144b..f5c1b75 100644
> > --- a/include/linux/virtio_ring.h
> > +++ b/include/linux/virtio_ring.h
> > @@ -29,6 +29,10 @@
> >  /* We support indirect buffer descriptors */
> >  #define VIRTIO_RING_F_INDIRECT_DESC	28
> > 
> > +/* The Guest publishes the used index for which it expects an interrupt
> > + * at the end of the avail ring. Host should ignore the avail->flags
> > field. */ +#define VIRTIO_RING_F_USED_EVENT_IDX	29
> > +
> >  /* Virtio ring descriptors: 16 bytes.  These can chain together via
> > "next". */ struct vring_desc {
> >  	/* Address (guest-physical). */
> > @@ -83,6 +87,7 @@ struct vring {
> >   *	__u16 avail_flags;
> >   *	__u16 avail_idx;
> >   *	__u16 available[num];
> > + *	__u16 used_event_idx;
> >   *
> >   *	// Padding to the next align boundary.
> >   *	char pad[];
> > @@ -93,6 +98,10 @@ struct vring {
> >   *	struct vring_used_elem used[num];
> >   * };
> >   */
> > +/* We publish the used event index at the end of the available ring.
> > + * It is at the end for backwards compatibility. */
> > +#define vring_used_event(vr) ((vr)->avail->ring[(vr)->num])
> > +
> >  static inline void vring_init(struct vring *vr, unsigned int num, void *p,
> >  			      unsigned long align)
> >  {
> 
> You should update the vring_size procedure to account for the extra field at 
> the end of the available ring by change the "(2 + num)" to "(3 + num)":
>     return ((sizeof(struct vring_desc) * num + sizeof(__u16) * (3 + num)
> 
> Tom

In practice it gives the same result because of the alignment, but sure.
Thanks!

-- 
MST

^ permalink raw reply

* Re: [PATCH 0/4] [RFC] virtio-net: Improve small packet performance
From: Krishna Kumar2 @ 2011-05-05  9:43 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: davem, eric.dumazet, kvm, netdev, rusty
In-Reply-To: <20110505090439.GD17647@redhat.com>

"Michael S. Tsirkin" <mst@redhat.com> wrote on 05/05/2011 02:34:39 PM:

> > It shows that 2.9% of the time, the 1 jiffy was not enough
> > to free up space in the txq.
>
> How common is it to free up space in *less than* 1 jiffy?

True, but the point is that the space freed is just
enough for 43 entries, keeping it lower means a flood
of (psuedo) stop's and restart's.

> > That could also mean that we
> > had set xmit_restart just before jiffies changed. But the
> > average free capacity when we *resumed* xmits is:
> > Sum of slots / (Good + Bad) = 43.
> >
> > So the delay of 1 jiffy helped the host clean up, on average,
> > just 43 entries, which is 16% of total entries. This is
> > intended to show that the guest is not sitting idle waiting
> > for the jiffy to expire.
>
> OK, nice, this is exactly what my patchset is trying
> to do, without playing with timers: tell the host
> to interrupt us after 3/4 of the ring is free.
> Why 3/4 and not all of the ring? My hope is we can
> get some parallelism with the host this way.
> Why 3/4 and not 7/8? No idea :)
>
> > > > > I can post it, mind testing this?
> > > >
> > > > Sure.
> > >
> > > Just posted. Would appreciate feedback.
> >
> > Do I need to apply all the patches and simply test?
> >
> > Thanks,
> >
> > - KK
>
> Exactly. You can also try to tune the threshold
> for interrupts as well.

Could you send me (privately) the entire virtio-net/vhost
patch in a single file? It will help me quite a bit :)
Either attachment or inline is fine.

thanks,

- KK


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox