Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain
From: Michael S. Tsirkin @ 2014-10-15 10:22 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
In-Reply-To: <1413357930-45302-7-git-send-email-jasowang@redhat.com>

On Wed, Oct 15, 2014 at 03:25:30PM +0800, Jason Wang wrote:
> With the help of xmit_more and virtqueue_enable_cb_avail(), this patch
> enable tx interrupt only for the final skb in the chain if
> needed. This will help to mitigate tx interrupts.
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

I think you should split xmit_more stuff out.



> ---
>  drivers/net/virtio_net.c |   10 +++++++---
>  1 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 2afc2e2..5943f3f 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -993,12 +993,16 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  				virtqueue_disable_cb(sq->vq);
>  			}
>  		}
> -	} else if (virtqueue_enable_cb(sq->vq)) {
> -		free_old_xmit_skbs(sq, qsize);
>  	}
>  
> -	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
> +	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more) {
>  		virtqueue_kick(sq->vq);
> +		if (sq->vq->num_free >= 2 +MAX_SKB_FRAGS &&
> +		    unlikely(virtqueue_enable_cb_avail(sq->vq))) {
> +			free_old_xmit_skbs(sq, qsize);
> +			virtqueue_disable_cb(sq->vq);
> +		}
> +	}
>  
>  	return NETDEV_TX_OK;
>  }
> -- 
> 1.7.1

^ permalink raw reply

* Re: [PATCH] gianfar: disable vlan tag insertion by default on 2.6.x
From: zhuyj @ 2014-10-15 10:22 UTC (permalink / raw)
  To: yzhu1, Willy Tarreau
  Cc: sandeep.kumar, netdev, linux-kernel, Yue.Tao, guang.yang, joe,
	festevam, richardcochran, clarocq, yongjun_wei, claudiu.manoil,
	roy.xu, sky.wangfeng, zhuyj
In-Reply-To: <543E4949.1050606@windriver.com>

[-- Attachment #1: Type: text/plain, Size: 1309 bytes --]

On 10/15/2014 06:15 PM, yzhu1 wrote:
> On 10/15/2014 06:09 PM, Willy Tarreau wrote:
>> Hi Zhu,
>>
>> On Wed, Oct 15, 2014 at 06:00:33PM +0800, Zhu Yanjun wrote:
>>> In commit f43c75d4b6[gianfar: disable TX vlan based on kernel 2.6.x],
>>> gianfar nic disables TX vlan. But gianfar nic enables vlan tag
>>> insertion by default. This will lead to unusable connections on
>>> some configurations. Since vlan tag insertion is disabled by default
>>> and it is not enabled any longer, it is not necessary to disable it 
>>> again.
>>>
>>> Zhu Yanjun (1):
>>>    gianfar: disable vlan tag insertion by default
>>>
>>>   drivers/net/gianfar.c | 6 ------
>>>   1 file changed, 6 deletions(-)
>>>
>>> -- 
>>> 1.9.1
>> There's no patch in this e-mail. Since you sent another e-mail with 
>> almost
>> the same subject, I'm confused, it's unclear to me whether I only 
>> need to
>> apply the patch in the other one with this commit message or if it's 
>> just
>> that you accidently dropped the patch when sending this e-mail. Could 
>> you
>> please enlighten me ?
>>
>> Thanks,
>> Willy
>>
> Hi, Willy
>
> Sorry. Please apply the patch in the other one with this commit message.
>
> Thanks a lot.
> Zhu Yanjun
>
Hi, Willy

Sorry, it is my fault. Please apply this patch in the attachment.

Thanks a lot.
Zhu Yanjun


[-- Attachment #2: 0001-gianfar-disable-vlan-tag-insertion-by-default.patch --]
[-- Type: text/x-patch, Size: 1687 bytes --]

>From 48cb11f57e363e8831c4bcaa854d6df23ca300f7 Mon Sep 17 00:00:00 2001
From: Zhu Yanjun <Yanjun.Zhu@windriver.com>
Date: Wed, 15 Oct 2014 16:52:51 +0800
Subject: [PATCH] gianfar: disable vlan tag insertion by default

2.6.x kernels require a similar logic change as commit 51b8cbfc
[gianfar: fix bug caused by e1653c3e] introduces for newer kernels.

Gianfar driver originally enables vlan tag insertion by default.
This will lead to unusable connections on some configurations.

Since gianfar nic vlan tag insertion is disabled by default and 
it is not enabled any longer, it is not necessary to disable it 
again.

Reported-by: Xu Jianrong <roy.xu@huawei.com>
Suggested-by: Wang Feng <sky.wangfeng@huawei.com>
Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com>
---
 drivers/net/gianfar.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 8aa2cf6..afdcb41 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -1115,7 +1115,6 @@ int startup_gfar(struct net_device *dev)
 	/* keep vlan related bits if it's enabled */
 	if (priv->vlgrp) {
 		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
-		tctrl |= TCTRL_VLINS;
 	}
 
 	/* Init rctrl based on our settings */
@@ -1456,11 +1455,6 @@ static void gfar_vlan_rx_register(struct net_device *dev,
 		tempval |= (RCTRL_VLEX | RCTRL_PRSDEP_INIT);
 		gfar_write(&priv->regs->rctrl, tempval);
 	} else {
-		/* Disable VLAN tag insertion */
-		tempval = gfar_read(&priv->regs->tctrl);
-		tempval &= ~TCTRL_VLINS;
-		gfar_write(&priv->regs->tctrl, tempval);
-
 		/* Disable VLAN tag extraction */
 		tempval = gfar_read(&priv->regs->rctrl);
 		tempval &= ~RCTRL_VLEX;
-- 
1.9.1


^ permalink raw reply related

* Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
From: Jason Wang @ 2014-10-15 10:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: mst, netdev, linux-kernel, virtualization, davem
In-Reply-To: <1413365865.12304.55.camel@edumazet-glaptop2.roam.corp.google.com>

On 10/15/2014 05:37 PM, Eric Dumazet wrote:
> On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote:
>
> ...
>
>> +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
>> +{
>> +	struct sk_buff *skb;
>> +	unsigned int len;
>> +	struct virtnet_info *vi = sq->vq->vdev->priv;
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>> +	u64 tx_bytes = 0, tx_packets = 0;
>> +
>> +	while (tx_packets < budget &&
>> +	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>> +		pr_debug("Sent skb %p\n", skb);
>> +
>> +		tx_bytes += skb->len;
>> +		tx_packets++;
>> +
>> +		dev_kfree_skb_any(skb);
>> +	}
>> +
>> +	u64_stats_update_begin(&stats->tx_syncp);
>> +	stats->tx_bytes += tx_bytes;
>> +	stats->tx_packets =+ tx_packets;
> 	
> 	stats->tx_packets += tx_packets;

My bad, will correct it in next version.

Thanks

^ permalink raw reply

* Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
From: Jason Wang @ 2014-10-15 10:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
In-Reply-To: <20141015092849.GA25776@redhat.com>

On 10/15/2014 05:28 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:
>> This patch introduces virtio_enable_cb_avail() to publish avail idx
>> and used event. This could be used by batched buffer submitting to
>> reduce the number of tx interrupts.
>>
>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>  drivers/virtio/virtio_ring.c |   22 ++++++++++++++++++++--
>>  include/linux/virtio.h       |    2 ++
>>  2 files changed, 22 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>> index 1b3929f..d67fbf8 100644
>> --- a/drivers/virtio/virtio_ring.c
>> +++ b/drivers/virtio/virtio_ring.c
>> @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>>  	 * entry. Always do both to keep code simple. */
>>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
>>  	/* Make sure used event never go backwards */
>> -	if (!vring_need_event(vring_used_event(&vq->vring),
>> -			      vq->vring.avail->idx, last_used_idx))
>> +	if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
>> +	    !vring_need_event(vring_used_event(&vq->vring),
>> +			      vq->vring.avail->idx, last_used_idx)) {
>>  		vring_used_event(&vq->vring) = last_used_idx;
>> +	}
>>  	END_USE(vq);
>>  	return last_used_idx;
>>  }
>>  EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
>>
> I see you are also changing virtqueue_enable_cb_prepare, why?

This is also used to prevent it from moving the used event backwards.
This may happens when we handle tx napi after we publish avail idx as
used event (virtqueue_enable_cb_avail() was called).
>
>> +bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
>> +{
>> +	struct vring_virtqueue *vq = to_vvq(_vq);
>> +	bool ret;
>> +
>> +	START_USE(vq);
>> +	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
>> +	vring_used_event(&vq->vring) = vq->vring.avail->idx;
>> +	ret = vring_need_event(vq->vring.avail->idx,
>> +			       vq->last_used_idx, vq->vring.used->idx);
>> +	END_USE(vq);
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
>> +
>>  /**
>>   * virtqueue_poll - query pending used buffers
>>   * @vq: the struct virtqueue we're talking about.
> Could not figure out what this does.
> Please add documentation.
>

Sure, does something like below explain what does this function do?

/**                                                                            

 * virtqueue_enable_cb_avail - restart callbacks after
disable_cb.           
 * @vq: the struct virtqueue we're talking
about.                              
 *                                                                             

 * This re-enables callbacks but hints to the other side to
delay              
 * interrupts all of the available buffers have been processed;         
 * it returns "false" if there are at least one pending buffer in the
queue,          
 * to detect a possible race between the driver checking for more
work,        
 * and enabling
callbacks.                                                     
 *                                                                             

 * Caller must ensure we don't call this with other
virtqueue                  
 * operations at the same time (except where
noted).                           
 */

>> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
>> index b46671e..bfaf058 100644
>> --- a/include/linux/virtio.h
>> +++ b/include/linux/virtio.h
>> @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
>>  
>>  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
>>  
>> +bool virtqueue_enable_cb_avail(struct virtqueue *vq);
>> +
>>  bool virtqueue_poll(struct virtqueue *vq, unsigned);
>>  
>>  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
>> -- 
>> 1.7.1
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply

* Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
From: Michael S. Tsirkin @ 2014-10-15 10:18 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
In-Reply-To: <1413357930-45302-6-git-send-email-jasowang@redhat.com>

On Wed, Oct 15, 2014 at 03:25:29PM +0800, Jason Wang wrote:
> Orphan skb in ndo_start_xmit() breaks socket accounting and packet
> queuing. This in fact breaks lots of things such as pktgen and several
> TCP optimizations. And also make BQL can't be implemented for
> virtio-net.
> 
> This patch tries to solve this issue by enabling tx interrupt. To
> avoid introducing extra spinlocks, a tx napi was scheduled to free
> those packets.
> 
> More tx interrupt mitigation method could be used on top.
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/virtio_net.c |  125 +++++++++++++++++++++++++++++++---------------
>  1 files changed, 85 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ccf98f9..2afc2e2 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -72,6 +72,8 @@ struct send_queue {
>  
>  	/* Name of the send queue: output.$index */
>  	char name[40];
> +
> +	struct napi_struct napi;
>  };
>  
>  /* Internal representation of a receive virtqueue */
> @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
>  	return p;
>  }
>  
> +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
> +{
> +	struct sk_buff *skb;
> +	unsigned int len;
> +	struct virtnet_info *vi = sq->vq->vdev->priv;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +	u64 tx_bytes = 0, tx_packets = 0;
> +
> +	while (tx_packets < budget &&
> +	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> +		pr_debug("Sent skb %p\n", skb);
> +
> +		tx_bytes += skb->len;
> +		tx_packets++;
> +
> +		dev_kfree_skb_any(skb);
> +	}
> +
> +	u64_stats_update_begin(&stats->tx_syncp);
> +	stats->tx_bytes += tx_bytes;
> +	stats->tx_packets =+ tx_packets;
> +	u64_stats_update_end(&stats->tx_syncp);
> +
> +	return tx_packets;
> +}
> +
>  static void skb_xmit_done(struct virtqueue *vq)
>  {
>  	struct virtnet_info *vi = vq->vdev->priv;
> +	struct send_queue *sq = &vi->sq[vq2txq(vq)];
>  
> -	/* Suppress further interrupts. */
> -	virtqueue_disable_cb(vq);
> -
> -	/* We were probably waiting for more output buffers. */
> -	netif_wake_subqueue(vi->dev, vq2txq(vq));
> +	if (napi_schedule_prep(&sq->napi)) {
> +		__napi_schedule(&sq->napi);
> +	}
>  }
>  
>  static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
> @@ -774,7 +801,39 @@ again:
>  	return received;
>  }
>  
> +static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> +{
> +	struct send_queue *sq =
> +		container_of(napi, struct send_queue, napi);
> +	struct virtnet_info *vi = sq->vq->vdev->priv;
> +	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
> +	unsigned int r, sent = 0;
> +
> +again:
> +	__netif_tx_lock(txq, smp_processor_id());
> +	virtqueue_disable_cb(sq->vq);
> +	sent += free_old_xmit_skbs(sq, budget - sent);
> +
> +	if (sent < budget) {
> +		r = virtqueue_enable_cb_prepare(sq->vq);
> +		napi_complete(napi);
> +		__netif_tx_unlock(txq);
> +		if (unlikely(virtqueue_poll(sq->vq, r)) &&


So you are enabling callback on the next packet,
which is almost sure to cause an interrupt storm
on the guest.


I think it's a bad idea, this is why I used
enable_cb_delayed in my patch.




> +		    napi_schedule_prep(napi)) {
> +			virtqueue_disable_cb(sq->vq);
> +			__napi_schedule(napi);
> +			goto again;
> +		}
> +	} else {
> +		__netif_tx_unlock(txq);
> +	}
> +
> +	netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
> +	return sent;
> +}
> +
>  #ifdef CONFIG_NET_RX_BUSY_POLL
> +
>  /* must be called with local_bh_disable()d */
>  static int virtnet_busy_poll(struct napi_struct *napi)
>  {
> @@ -822,36 +881,12 @@ static int virtnet_open(struct net_device *dev)
>  			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
>  				schedule_delayed_work(&vi->refill, 0);
>  		virtnet_napi_enable(&vi->rq[i]);
> +		napi_enable(&vi->sq[i].napi);
>  	}
>  
>  	return 0;
>  }
>  
> -static int free_old_xmit_skbs(struct send_queue *sq)
> -{
> -	struct sk_buff *skb;
> -	unsigned int len;
> -	struct virtnet_info *vi = sq->vq->vdev->priv;
> -	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> -	u64 tx_bytes = 0, tx_packets = 0;
> -
> -	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> -		pr_debug("Sent skb %p\n", skb);
> -
> -		tx_bytes += skb->len;
> -		tx_packets++;
> -
> -		dev_kfree_skb_any(skb);
> -	}
> -
> -	u64_stats_update_begin(&stats->tx_syncp);
> -	stats->tx_bytes += tx_bytes;
> -	stats->tx_packets =+ tx_packets;
> -	u64_stats_update_end(&stats->tx_syncp);
> -
> -	return tx_packets;
> -}
> -

So you end up moving it all anyway, why bother splitting out
minor changes in previous patches?


>  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>  {
>  	struct skb_vnet_hdr *hdr;
> @@ -917,6 +952,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>  		sg_set_buf(sq->sg, hdr, hdr_len);
>  		num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
>  	}
> +
>  	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
>  }
>  
> @@ -925,10 +961,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	struct virtnet_info *vi = netdev_priv(dev);
>  	int qnum = skb_get_queue_mapping(skb);
>  	struct send_queue *sq = &vi->sq[qnum];
> -	int err;
> +	int err, qsize = virtqueue_get_vring_size(sq->vq);
>  
> +	virtqueue_disable_cb(sq->vq);
>  	/* Free up any pending old buffers before queueing new ones. */
> -	free_old_xmit_skbs(sq);
> +	free_old_xmit_skbs(sq, qsize);
>  
>  	/* Try to transmit */
>  	err = xmit_skb(sq, skb);
> @@ -944,22 +981,20 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  		return NETDEV_TX_OK;
>  	}
>  
> -	/* Don't wait up for transmitted skbs to be freed. */
> -	skb_orphan(skb);
> -	nf_reset(skb);
> -
>  	/* Apparently nice girls don't return TX_BUSY; stop the queue
>  	 * before it gets out of hand.  Naturally, this wastes entries. */
>  	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
>  		netif_stop_subqueue(dev, qnum);
>  		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
>  			/* More just got used, free them then recheck. */
> -			free_old_xmit_skbs(sq);
> +			free_old_xmit_skbs(sq, qsize);
>  			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
>  				netif_start_subqueue(dev, qnum);
>  				virtqueue_disable_cb(sq->vq);
>  			}
>  		}
> +	} else if (virtqueue_enable_cb(sq->vq)) {
> +		free_old_xmit_skbs(sq, qsize);
>  	}
>  
>  	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
> @@ -1141,8 +1176,10 @@ static int virtnet_close(struct net_device *dev)
>  	/* Make sure refill_work doesn't re-enable napi! */
>  	cancel_delayed_work_sync(&vi->refill);
>  
> -	for (i = 0; i < vi->max_queue_pairs; i++)
> +	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		napi_disable(&vi->rq[i].napi);
> +		napi_disable(&vi->sq[i].napi);
> +	}
>  
>  	return 0;
>  }
> @@ -1461,8 +1498,10 @@ static void virtnet_free_queues(struct virtnet_info *vi)
>  {
>  	int i;
>  
> -	for (i = 0; i < vi->max_queue_pairs; i++)
> +	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		netif_napi_del(&vi->rq[i].napi);
> +		netif_napi_del(&vi->sq[i].napi);
> +	}
>  
>  	kfree(vi->rq);
>  	kfree(vi->sq);
> @@ -1616,6 +1655,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
>  		netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
>  			       napi_weight);
>  		napi_hash_add(&vi->rq[i].napi);
> +		netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
> +			       napi_weight);
>  
>  		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
>  		ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT);
> @@ -1920,8 +1961,10 @@ static int virtnet_freeze(struct virtio_device *vdev)
>  	if (netif_running(vi->dev)) {
>  		for (i = 0; i < vi->max_queue_pairs; i++) {
>  			napi_disable(&vi->rq[i].napi);
> +			napi_disable(&vi->sq[i].napi);
>  			napi_hash_del(&vi->rq[i].napi);
>  			netif_napi_del(&vi->rq[i].napi);
> +			netif_napi_del(&vi->sq[i].napi);
>  		}
>  	}
>  
> @@ -1946,8 +1989,10 @@ static int virtnet_restore(struct virtio_device *vdev)
>  			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
>  				schedule_delayed_work(&vi->refill, 0);
>  
> -		for (i = 0; i < vi->max_queue_pairs; i++)
> +		for (i = 0; i < vi->max_queue_pairs; i++) {
>  			virtnet_napi_enable(&vi->rq[i]);
> +			napi_enable(&vi->sq[i].napi);
> +		}
>  	}
>  
>  	netif_device_attach(vi->dev);
> -- 
> 1.7.1

^ permalink raw reply

* Re: [PATCH] gianfar: disable vlan tag insertion by default on 2.6.x
From: yzhu1 @ 2014-10-15 10:15 UTC (permalink / raw)
  To: Willy Tarreau, Zhu Yanjun
  Cc: sandeep.kumar, netdev, linux-kernel, Yue.Tao, guang.yang, joe,
	festevam, richardcochran, clarocq, yongjun_wei, claudiu.manoil,
	roy.xu, sky.wangfeng, zhuyj
In-Reply-To: <20141015100949.GA30089@1wt.eu>

On 10/15/2014 06:09 PM, Willy Tarreau wrote:
> Hi Zhu,
>
> On Wed, Oct 15, 2014 at 06:00:33PM +0800, Zhu Yanjun wrote:
>> In commit f43c75d4b6[gianfar: disable TX vlan based on kernel 2.6.x],
>> gianfar nic disables TX vlan. But gianfar nic enables vlan tag
>> insertion by default. This will lead to unusable connections on
>> some configurations. Since vlan tag insertion is disabled by default
>> and it is not enabled any longer, it is not necessary to disable it again.
>>
>> Zhu Yanjun (1):
>>    gianfar: disable vlan tag insertion by default
>>
>>   drivers/net/gianfar.c | 6 ------
>>   1 file changed, 6 deletions(-)
>>
>> -- 
>> 1.9.1
> There's no patch in this e-mail. Since you sent another e-mail with almost
> the same subject, I'm confused, it's unclear to me whether I only need to
> apply the patch in the other one with this commit message or if it's just
> that you accidently dropped the patch when sending this e-mail. Could you
> please enlighten me ?
>
> Thanks,
> Willy
>
Hi, Willy

Sorry. Please apply the patch in the other one with this commit message.

Thanks a lot.
Zhu Yanjun

^ permalink raw reply

* Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
From: Jason Wang @ 2014-10-15 10:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
In-Reply-To: <20141015093432.GB25776@redhat.com>

On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
>> This patch checks the new event idx to make sure used event idx never
>> goes back. This is used to synchronize the calls between
>> virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
>>
>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> the implication being that moving event idx back might cause some race
> condition?  

This will cause race condition when tx interrupt is enabled. Consider
the following cases

1) tx napi was scheduled
2) start_xmit() call virtqueue_enable_cb_delayed() and disable cb, [used
event is vq->last_used_idx + 3/4 pendg bufs]
3) tx napi enable the callback by virtqueue_enable_cb() [ used event is
vq->last_used_idx ]
 
After step 3, used event was moved back, unnecessary tx interrupt was
triggered.
> If yes but please describe the race explicitly.
> Is there a bug we need to fix on stable?

Looks not, current code does not have such race condition.
> Please also explicitly describe a configuration that causes event idx
> to go back.
>
> All this info should go in the commit log.

Will do this.
>> ---
>>  drivers/virtio/virtio_ring.c |    7 +++++--
>>  1 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>> index 3b1f89b..1b3929f 100644
>> --- a/drivers/virtio/virtio_ring.c
>> +++ b/drivers/virtio/virtio_ring.c
>> @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>>  	u16 last_used_idx;
>>  
>>  	START_USE(vq);
>> -
>> +	last_used_idx = vq->last_used_idx;
>>  	/* We optimistically turn back on interrupts, then check if there was
>>  	 * more to do. */
>>  	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
>>  	 * either clear the flags bit or point the event index at the next
>>  	 * entry. Always do both to keep code simple. */
>>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
>> -	vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
>> +	/* Make sure used event never go backwards */
> s/go/goes/
>
>> +	if (!vring_need_event(vring_used_event(&vq->vring),
>> +			      vq->vring.avail->idx, last_used_idx))
>> +		vring_used_event(&vq->vring) = last_used_idx;
> The result will be that driver will *not* get an interrupt
> on the next consumed buffer, which is likely not what driver
> intended when it called virtqueue_enable_cb.

This will only happen when we want to delay the interrupt for next few
consumed buffers (virtqueue_enable_cb_delayed() was called). For the
other case, vq->last_used_idx should be ahead of previous used event. Do
you see any other case?
>
> Instead, how about we simply document the requirement that drivers either
> always call virtqueue_enable_cb_delayed or virtqueue_enable_cb
> but not both?

We need call them both when tx interrupt is enabled I believe.
>
>>  	END_USE(vq);
>>  	return last_used_idx;
>>  }
>> -- 
>> 1.7.1

^ permalink raw reply

* Re: [PATCH] gianfar: disable vlan tag insertion by default on 2.6.x
From: Willy Tarreau @ 2014-10-15 10:09 UTC (permalink / raw)
  To: Zhu Yanjun
  Cc: sandeep.kumar, netdev, linux-kernel, Yue.Tao, guang.yang, joe,
	festevam, richardcochran, clarocq, yongjun_wei, claudiu.manoil,
	roy.xu, sky.wangfeng, Zhu Yanjun
In-Reply-To: <1413367234-28652-1-git-send-email-Yanjun.Zhu@windriver.com>

Hi Zhu,

On Wed, Oct 15, 2014 at 06:00:33PM +0800, Zhu Yanjun wrote:
> In commit f43c75d4b6[gianfar: disable TX vlan based on kernel 2.6.x],
> gianfar nic disables TX vlan. But gianfar nic enables vlan tag 
> insertion by default. This will lead to unusable connections on 
> some configurations. Since vlan tag insertion is disabled by default 
> and it is not enabled any longer, it is not necessary to disable it again.
> 
> Zhu Yanjun (1):
>   gianfar: disable vlan tag insertion by default
> 
>  drivers/net/gianfar.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> -- 
> 1.9.1

There's no patch in this e-mail. Since you sent another e-mail with almost
the same subject, I'm confused, it's unclear to me whether I only need to
apply the patch in the other one with this commit message or if it's just
that you accidently dropped the patch when sending this e-mail. Could you
please enlighten me ?

Thanks,
Willy

^ permalink raw reply

* [ethtool PATCH 1/1] bug fix: SFP Tx BIAS uses memory wrong offset
From: Jamal Hadi Salim @ 2014-10-15 10:09 UTC (permalink / raw)
  To: bwh; +Cc: netdev, Jamal Hadi Salim

From: Jamal Hadi Salim <jhs@mojatatu.com>

SFF-8472 rev 12.0 indicates the SFP BIAS is at offset 100 of page A2

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 sfpdiag.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sfpdiag.c b/sfpdiag.c
index 812a2fa..a3dbc9b 100644
--- a/sfpdiag.c
+++ b/sfpdiag.c
@@ -48,7 +48,7 @@
 #define SFF_A2_VCC_HWARN                  12
 #define SFF_A2_VCC_LWARN                  14
 
-#define SFF_A2_BIAS                       96
+#define SFF_A2_BIAS                       100
 #define SFF_A2_BIAS_HALRM                 16
 #define SFF_A2_BIAS_LALRM                 18
 #define SFF_A2_BIAS_HWARN                 20
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH v2 1/1] net: fec: ptp: fix convergence issue to support LinuxPTP stack
From: Fugang Duan @ 2014-10-15  9:30 UTC (permalink / raw)
  To: richardcochran, davem; +Cc: netdev, bhutchings, b20596, b38611

iMX6SX IEEE 1588 module has one hw issue in capturing the ATVR register.
The current SW flow is:
		ENET0->ATCR |= ENET_ATCR_CAPTURE_MASK;
		ts_counter_ns = ENET0->ATVR;
The ATVR value is not expected value that cause LinuxPTP stack cannot be convergent.

ENET Block Guide/ Chapter for the iMX6SX (PELE) address the issue:
After set ENET_ATCR[Capture], there need some time cycles before the counter
value is capture in the register clock domain. The wait-time-cycles is at least
6 clock cycles of the slower clock between the register clock and the 1588 clock.
So need something like:
		ENET0->ATCR |= ENET_ATCR_CAPTURE_MASK;
		wait();
		ts_counter_ns = ENET0->ATVR;

For iMX6SX, the 1588 ts_clk is fixed to 25Mhz, register clock is 66Mhz, so the
wait-time-cycles must be greater than 240ns (40ns * 6). The patch add 1us delay
before cpu read ATVR register.

Changes V2:
Modify the commit/comments log to describe the issue clearly.

Signed-off-by: Fugang Duan <B38611@freescale.com>
---
 drivers/net/ethernet/freescale/fec.h      |   50 +++++++++++++++++++++++++++++
 drivers/net/ethernet/freescale/fec_main.c |   43 +------------------------
 drivers/net/ethernet/freescale/fec_ptp.c  |    5 +++
 3 files changed, 56 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
index 1d5e182..906b739 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -367,6 +367,56 @@ struct bufdesc_ex {
 #define FEC_VLAN_TAG_LEN       0x04
 #define FEC_ETHTYPE_LEN                0x02
 
+/* Controller is ENET-MAC */
+#define FEC_QUIRK_ENET_MAC		(1 << 0)
+/* Controller needs driver to swap frame */
+#define FEC_QUIRK_SWAP_FRAME		(1 << 1)
+/* Controller uses gasket */
+#define FEC_QUIRK_USE_GASKET		(1 << 2)
+/* Controller has GBIT support */
+#define FEC_QUIRK_HAS_GBIT		(1 << 3)
+/* Controller has extend desc buffer */
+#define FEC_QUIRK_HAS_BUFDESC_EX	(1 << 4)
+/* Controller has hardware checksum support */
+#define FEC_QUIRK_HAS_CSUM		(1 << 5)
+/* Controller has hardware vlan support */
+#define FEC_QUIRK_HAS_VLAN		(1 << 6)
+/* ENET IP errata ERR006358
+ *
+ * If the ready bit in the transmit buffer descriptor (TxBD[R]) is previously
+ * detected as not set during a prior frame transmission, then the
+ * ENET_TDAR[TDAR] bit is cleared at a later time, even if additional TxBDs
+ * were added to the ring and the ENET_TDAR[TDAR] bit is set. This results in
+ * frames not being transmitted until there is a 0-to-1 transition on
+ * ENET_TDAR[TDAR].
+ */
+#define FEC_QUIRK_ERR006358            (1 << 7)
+/* ENET IP hw AVB
+ *
+ * i.MX6SX ENET IP add Audio Video Bridging (AVB) feature support.
+ * - Two class indicators on receive with configurable priority
+ * - Two class indicators and line speed timer on transmit allowing
+ *   implementation class credit based shapers externally
+ * - Additional DMA registers provisioned to allow managing up to 3
+ *   independent rings
+ */
+#define FEC_QUIRK_HAS_AVB		(1 << 8)
+/* There is a TDAR race condition for mutliQ when the software sets TDAR
+ * and the UDMA clears TDAR simultaneously or in a small window (2-4 cycles).
+ * This will cause the udma_tx and udma_tx_arbiter state machines to hang.
+ * The issue exist at i.MX6SX enet IP.
+ */
+#define FEC_QUIRK_ERR007885		(1 << 9)
+/* ENET Block Guide/ Chapter for the iMX6SX (PELE) address one issue:
+ * After set ENET_ATCR[Capture], there need some time cycles before the counter
+ * value is capture in the register clock domain.
+ * The wait-time-cycles is at least 6 clock cycles of the slower clock between
+ * the register clock and the 1588 clock. The 1588 ts_clk is fixed to 25Mhz,
+ * register clock is 66Mhz, so the wait-time-cycles must be greater than 240ns
+ * (40ns * 6).
+ */
+#define FEC_QUIRK_BUG_CAPTURE		(1 << 10)
+
 struct fec_enet_priv_tx_q {
 	int index;
 	unsigned char *tx_bounce[TX_RING_SIZE];
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 87975b5..965f1c8 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -78,47 +78,6 @@ static void fec_enet_itr_coal_init(struct net_device *ndev);
 #define FEC_ENET_RAFL_V	0x8
 #define FEC_ENET_OPD_V	0xFFF0
 
-/* Controller is ENET-MAC */
-#define FEC_QUIRK_ENET_MAC		(1 << 0)
-/* Controller needs driver to swap frame */
-#define FEC_QUIRK_SWAP_FRAME		(1 << 1)
-/* Controller uses gasket */
-#define FEC_QUIRK_USE_GASKET		(1 << 2)
-/* Controller has GBIT support */
-#define FEC_QUIRK_HAS_GBIT		(1 << 3)
-/* Controller has extend desc buffer */
-#define FEC_QUIRK_HAS_BUFDESC_EX	(1 << 4)
-/* Controller has hardware checksum support */
-#define FEC_QUIRK_HAS_CSUM		(1 << 5)
-/* Controller has hardware vlan support */
-#define FEC_QUIRK_HAS_VLAN		(1 << 6)
-/* ENET IP errata ERR006358
- *
- * If the ready bit in the transmit buffer descriptor (TxBD[R]) is previously
- * detected as not set during a prior frame transmission, then the
- * ENET_TDAR[TDAR] bit is cleared at a later time, even if additional TxBDs
- * were added to the ring and the ENET_TDAR[TDAR] bit is set. This results in
- * frames not being transmitted until there is a 0-to-1 transition on
- * ENET_TDAR[TDAR].
- */
-#define FEC_QUIRK_ERR006358            (1 << 7)
-/* ENET IP hw AVB
- *
- * i.MX6SX ENET IP add Audio Video Bridging (AVB) feature support.
- * - Two class indicators on receive with configurable priority
- * - Two class indicators and line speed timer on transmit allowing
- *   implementation class credit based shapers externally
- * - Additional DMA registers provisioned to allow managing up to 3
- *   independent rings
- */
-#define FEC_QUIRK_HAS_AVB		(1 << 8)
-/* There is a TDAR race condition for mutliQ when the software sets TDAR
- * and the UDMA clears TDAR simultaneously or in a small window (2-4 cycles).
- * This will cause the udma_tx and udma_tx_arbiter state machines to hang.
- * The issue exist at i.MX6SX enet IP.
- */
-#define FEC_QUIRK_ERR007885		(1 << 9)
-
 static struct platform_device_id fec_devtype[] = {
 	{
 		/* keep it for coldfire */
@@ -146,7 +105,7 @@ static struct platform_device_id fec_devtype[] = {
 		.driver_data = FEC_QUIRK_ENET_MAC | FEC_QUIRK_HAS_GBIT |
 				FEC_QUIRK_HAS_BUFDESC_EX | FEC_QUIRK_HAS_CSUM |
 				FEC_QUIRK_HAS_VLAN | FEC_QUIRK_HAS_AVB |
-				FEC_QUIRK_ERR007885,
+				FEC_QUIRK_ERR007885 | FEC_QUIRK_BUG_CAPTURE,
 	}, {
 		/* sentinel */
 	}
diff --git a/drivers/net/ethernet/freescale/fec_ptp.c b/drivers/net/ethernet/freescale/fec_ptp.c
index cca3617..380bb10 100644
--- a/drivers/net/ethernet/freescale/fec_ptp.c
+++ b/drivers/net/ethernet/freescale/fec_ptp.c
@@ -82,12 +82,17 @@ static cycle_t fec_ptp_read(const struct cyclecounter *cc)
 {
 	struct fec_enet_private *fep =
 		container_of(cc, struct fec_enet_private, cc);
+	const struct platform_device_id *id_entry =
+		platform_get_device_id(fep->pdev);
 	u32 tempval;
 
 	tempval = readl(fep->hwp + FEC_ATIME_CTRL);
 	tempval |= FEC_T_CTRL_CAPTURE;
 	writel(tempval, fep->hwp + FEC_ATIME_CTRL);
 
+	if (id_entry->driver_data & FEC_QUIRK_BUG_CAPTURE)
+		udelay(1);
+
 	return readl(fep->hwp + FEC_ATIME);
 }
 
-- 
1.7.8

^ permalink raw reply related

* [PATCH] gianfar: disable vlan tag insertion by default
From: Zhu Yanjun @ 2014-10-15 10:00 UTC (permalink / raw)
  To: sandeep.kumar, netdev, linux-kernel, Yue.Tao, guang.yang, joe,
	festevam, richardcochran, clarocq, yongjun_wei, claudiu.manoil, w,
	roy.xu, sky.wangfeng, zyjzyj2000
  Cc: Zhu Yanjun
In-Reply-To: <1413367234-28652-1-git-send-email-Yanjun.Zhu@windriver.com>

2.6.x kernels require a similar logic change as commit 51b8cbfc
[gianfar: fix bug caused by e1653c3e] introduces for newer kernels.

Gianfar driver originally enables vlan tag insertion by default.
This will lead to unusable connections on some configurations.

Since gianfar nic vlan tag insertion is disabled by default and 
it is not enabled any longer, it is not necessary to disable it 
again.

Reported-by: Xu Jianrong <roy.xu@huawei.com>
Suggested-by: Wang Feng <sky.wangfeng@huawei.com>
---
 drivers/net/gianfar.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 8aa2cf6..afdcb41 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -1115,7 +1115,6 @@ int startup_gfar(struct net_device *dev)
 	/* keep vlan related bits if it's enabled */
 	if (priv->vlgrp) {
 		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
-		tctrl |= TCTRL_VLINS;
 	}
 
 	/* Init rctrl based on our settings */
@@ -1456,11 +1455,6 @@ static void gfar_vlan_rx_register(struct net_device *dev,
 		tempval |= (RCTRL_VLEX | RCTRL_PRSDEP_INIT);
 		gfar_write(&priv->regs->rctrl, tempval);
 	} else {
-		/* Disable VLAN tag insertion */
-		tempval = gfar_read(&priv->regs->tctrl);
-		tempval &= ~TCTRL_VLINS;
-		gfar_write(&priv->regs->tctrl, tempval);
-
 		/* Disable VLAN tag extraction */
 		tempval = gfar_read(&priv->regs->rctrl);
 		tempval &= ~RCTRL_VLEX;
-- 
1.9.1

^ permalink raw reply related

* [PATCH] gianfar: disable vlan tag insertion by default on 2.6.x
From: Zhu Yanjun @ 2014-10-15 10:00 UTC (permalink / raw)
  To: sandeep.kumar, netdev, linux-kernel, Yue.Tao, guang.yang, joe,
	festevam, richardcochran, clarocq, yongjun_wei, claudiu.manoil, w,
	roy.xu, sky.wangfeng, zyjzyj2000
  Cc: Zhu Yanjun

In commit f43c75d4b6[gianfar: disable TX vlan based on kernel 2.6.x],
gianfar nic disables TX vlan. But gianfar nic enables vlan tag 
insertion by default. This will lead to unusable connections on 
some configurations. Since vlan tag insertion is disabled by default 
and it is not enabled any longer, it is not necessary to disable it again.

Zhu Yanjun (1):
  gianfar: disable vlan tag insertion by default

 drivers/net/gianfar.c | 6 ------
 1 file changed, 6 deletions(-)

-- 
1.9.1

^ permalink raw reply

* [PATCH v3 net] ixgbe: check adapter->vfinfo before dereference
From: Thierry Herbelot @ 2014-10-15  9:58 UTC (permalink / raw)
  To: Jeff Kirsher, Jesse Brandeburg, Bruce Allan, netdev,
	emil.s.tantilov
  Cc: Thierry Herbelot
In-Reply-To: <1412930732-892-1-git-send-email-thierry.herbelot@6wind.com>

this protects against the following panic:
(before a VF was actually created on p96p1 PF Ethernet port)

ip link set p96p1 vf 0 spoofchk off
BUG: unable to handle kernel NULL pointer dereference at 0000000000000052
IP: [<ffffffffa044a1c1>] ixgbe_ndo_set_vf_spoofchk+0x51/0x150 [ixgbe]

Signed-off-by: Thierry Herbelot <thierry.herbelot@6wind.com>
---

v2:
  compilation fixes

v3:
  remove checks in functions where vfinfo is known not to be NULL
  return -EINVAL as error code

 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |   42 ++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 706fc69..fab0157 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -365,6 +365,9 @@ void ixgbe_restore_vf_multicasts(struct ixgbe_adapter *adapter)
 	u32 vector_reg;
 	u32 mta_reg;
 
+	if (!adapter->vfinfo)
+		return;
+
 	for (i = 0; i < adapter->num_vfs; i++) {
 		u32 vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(i));
 		vfinfo = &adapter->vfinfo[i];
@@ -504,9 +507,13 @@ static void ixgbe_clear_vmvir(struct ixgbe_adapter *adapter, u32 vf)
 static inline void ixgbe_vf_reset_event(struct ixgbe_adapter *adapter, u32 vf)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
-	struct vf_data_storage *vfinfo = &adapter->vfinfo[vf];
+	struct vf_data_storage *vfinfo;
 	u8 num_tcs = netdev_get_num_tc(adapter->netdev);
 
+	if (!adapter->vfinfo)
+		return;
+	vfinfo = &adapter->vfinfo[vf];
+
 	/* add PF assigned VLAN or VLAN 0 */
 	ixgbe_set_vf_vlan(adapter, true, vfinfo->pf_vlan, vf);
 
@@ -612,6 +619,9 @@ int ixgbe_vf_configuration(struct pci_dev *pdev, unsigned int event_mask)
 
 	bool enable = ((event_mask & 0x10000000U) != 0);
 
+	if (!adapter->vfinfo)
+		return -EINVAL;
+
 	if (enable)
 		eth_zero_addr(adapter->vfinfo[vfn].vf_mac_addresses);
 
@@ -937,6 +947,9 @@ static int ixgbe_rcv_msg_from_vf(struct ixgbe_adapter *adapter, u32 vf)
 	struct ixgbe_hw *hw = &adapter->hw;
 	s32 retval;
 
+	if (!adapter->vfinfo)
+		return -EINVAL;
+
 	retval = ixgbe_read_mbx(hw, msgbuf, mbx_size, vf);
 
 	if (retval) {
@@ -1010,6 +1023,9 @@ static void ixgbe_rcv_ack_from_vf(struct ixgbe_adapter *adapter, u32 vf)
 	struct ixgbe_hw *hw = &adapter->hw;
 	u32 msg = IXGBE_VT_MSGTYPE_NACK;
 
+	if (!adapter->vfinfo)
+		return;
+
 	/* if device isn't clear to send it shouldn't be reading either */
 	if (!adapter->vfinfo[vf].clear_to_send)
 		ixgbe_write_mbx(hw, &msg, 1, vf);
@@ -1053,6 +1069,9 @@ void ixgbe_ping_all_vfs(struct ixgbe_adapter *adapter)
 	u32 ping;
 	int i;
 
+	if (!adapter->vfinfo)
+		return;
+
 	for (i = 0 ; i < adapter->num_vfs; i++) {
 		ping = IXGBE_PF_CONTROL_MSG;
 		if (adapter->vfinfo[i].clear_to_send)
@@ -1066,6 +1085,9 @@ int ixgbe_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	if (!is_valid_ether_addr(mac) || (vf >= adapter->num_vfs))
 		return -EINVAL;
+	if (!adapter->vfinfo)
+		return -EINVAL;
+
 	adapter->vfinfo[vf].pf_set_mac = true;
 	dev_info(&adapter->pdev->dev, "setting MAC %pM on VF %d\n", mac, vf);
 	dev_info(&adapter->pdev->dev, "Reload the VF driver to make this"
@@ -1085,6 +1107,9 @@ int ixgbe_ndo_set_vf_vlan(struct net_device *netdev, int vf, u16 vlan, u8 qos)
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
 
+	if (!adapter->vfinfo)
+		return -EINVAL;
+
 	if ((vf >= adapter->num_vfs) || (vlan > 4095) || (qos > 7))
 		return -EINVAL;
 	if (vlan || qos) {
@@ -1149,7 +1174,11 @@ static void ixgbe_set_vf_rate_limit(struct ixgbe_adapter *adapter, int vf)
 	struct ixgbe_hw *hw = &adapter->hw;
 	u32 bcnrc_val = 0;
 	u16 queue, queues_per_pool;
-	u16 tx_rate = adapter->vfinfo[vf].tx_rate;
+	u16 tx_rate;
+
+	if (!adapter->vfinfo)
+		return;
+	tx_rate = adapter->vfinfo[vf].tx_rate;
 
 	if (tx_rate) {
 		/* start with base link speed value */
@@ -1199,6 +1228,9 @@ void ixgbe_check_vf_rate_limit(struct ixgbe_adapter *adapter)
 {
 	int i;
 
+	if (!adapter->vfinfo)
+		return;
+
 	/* VF Tx rate limit was not set */
 	if (!adapter->vf_rate_link_speed)
 		return;
@@ -1261,6 +1293,9 @@ int ixgbe_ndo_set_vf_spoofchk(struct net_device *netdev, int vf, bool setting)
 	struct ixgbe_hw *hw = &adapter->hw;
 	u32 regval;
 
+	if (!adapter->vfinfo)
+		return -EINVAL;
+
 	adapter->vfinfo[vf].spoofchk_enabled = setting;
 
 	regval = IXGBE_READ_REG(hw, IXGBE_PFVFSPOOF(vf_target_reg));
@@ -1285,6 +1320,9 @@ int ixgbe_ndo_get_vf_config(struct net_device *netdev,
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	if (vf >= adapter->num_vfs)
 		return -EINVAL;
+	if (!adapter->vfinfo)
+		return -EINVAL;
+
 	ivi->vf = vf;
 	memcpy(&ivi->mac, adapter->vfinfo[vf].vf_mac_addresses, ETH_ALEN);
 	ivi->max_tx_rate = adapter->vfinfo[vf].tx_rate;
-- 
1.7.10.4

^ permalink raw reply related

* RE: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
From: David Laight @ 2014-10-15  9:49 UTC (permalink / raw)
  To: 'Michael S. Tsirkin', Jason Wang
  Cc: eric.dumazet@gmail.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, davem@davemloft.net
In-Reply-To: <20141015093739.GC25776@redhat.com>

From: Of Michael S. Tsirkin
> On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
> > Accumulate the sent packets and sent bytes in local variables and perform a
> > single u64_stats_update_begin/end() after.
> >
> > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> Not sure how much it's worth but since Eric suggested it ...

Probably depends on the actual cost of u64_stats_update_begin/end
against the likely extra saving of the tx_bytes and tx_packets
values onto the stack across the call to dev_kfree_skb_any().
(Which depends on the number of caller saved registers.)

> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> 
> > ---
> >  drivers/net/virtio_net.c |   12 ++++++++----
> >  1 files changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 3d0ce44..a4d56b8 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
> >  	unsigned int len;
> >  	struct virtnet_info *vi = sq->vq->vdev->priv;
> >  	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> > +	u64 tx_bytes = 0, tx_packets = 0;

tx_packets need only be 'unsigned int'.
The same is almost certainly true of tx_bytes.

	David

> >  	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> >  		pr_debug("Sent skb %p\n", skb);
> >
> > -		u64_stats_update_begin(&stats->tx_syncp);
> > -		stats->tx_bytes += skb->len;
> > -		stats->tx_packets++;
> > -		u64_stats_update_end(&stats->tx_syncp);
> > +		tx_bytes += skb->len;
> > +		tx_packets++;
> >
> >  		dev_kfree_skb_any(skb);
> >  	}
> > +
> > +	u64_stats_update_begin(&stats->tx_syncp);
> > +	stats->tx_bytes += tx_bytes;
> > +	stats->tx_packets =+ tx_packets;
> > +	u64_stats_update_end(&stats->tx_syncp);
> >  }
> >
> >  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> > --
> > 1.7.1
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
From: Eric Dumazet @ 2014-10-15  9:37 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, netdev, linux-kernel, virtualization, davem
In-Reply-To: <1413357930-45302-6-git-send-email-jasowang@redhat.com>

On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote:

...

> +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
> +{
> +	struct sk_buff *skb;
> +	unsigned int len;
> +	struct virtnet_info *vi = sq->vq->vdev->priv;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +	u64 tx_bytes = 0, tx_packets = 0;
> +
> +	while (tx_packets < budget &&
> +	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> +		pr_debug("Sent skb %p\n", skb);
> +
> +		tx_bytes += skb->len;
> +		tx_packets++;
> +
> +		dev_kfree_skb_any(skb);
> +	}
> +
> +	u64_stats_update_begin(&stats->tx_syncp);
> +	stats->tx_bytes += tx_bytes;
> +	stats->tx_packets =+ tx_packets;

	
	stats->tx_packets += tx_packets;

> +	u64_stats_update_end(&stats->tx_syncp);
> +
> +	return tx_packets;
> +}

^ permalink raw reply

* Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
From: Michael S. Tsirkin @ 2014-10-15  9:37 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
In-Reply-To: <1413357930-45302-4-git-send-email-jasowang@redhat.com>

On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
> Accumulate the sent packets and sent bytes in local variables and perform a
> single u64_stats_update_begin/end() after.
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Not sure how much it's worth but since Eric suggested it ...

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/virtio_net.c |   12 ++++++++----
>  1 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 3d0ce44..a4d56b8 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
>  	unsigned int len;
>  	struct virtnet_info *vi = sq->vq->vdev->priv;
>  	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +	u64 tx_bytes = 0, tx_packets = 0;
>  
>  	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>  		pr_debug("Sent skb %p\n", skb);
>  
> -		u64_stats_update_begin(&stats->tx_syncp);
> -		stats->tx_bytes += skb->len;
> -		stats->tx_packets++;
> -		u64_stats_update_end(&stats->tx_syncp);
> +		tx_bytes += skb->len;
> +		tx_packets++;
>  
>  		dev_kfree_skb_any(skb);
>  	}
> +
> +	u64_stats_update_begin(&stats->tx_syncp);
> +	stats->tx_bytes += tx_bytes;
> +	stats->tx_packets =+ tx_packets;
> +	u64_stats_update_end(&stats->tx_syncp);
>  }
>  
>  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> -- 
> 1.7.1

^ permalink raw reply

* Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
From: Eric Dumazet @ 2014-10-15  9:36 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, netdev, linux-kernel, virtualization, davem
In-Reply-To: <1413357930-45302-4-git-send-email-jasowang@redhat.com>

On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote:
> Accumulate the sent packets and sent bytes in local variables and perform a
> single u64_stats_update_begin/end() after.
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/virtio_net.c |   12 ++++++++----
>  1 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 3d0ce44..a4d56b8 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
>  	unsigned int len;
>  	struct virtnet_info *vi = sq->vq->vdev->priv;
>  	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +	u64 tx_bytes = 0, tx_packets = 0;
>  
>  	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>  		pr_debug("Sent skb %p\n", skb);
>  
> -		u64_stats_update_begin(&stats->tx_syncp);
> -		stats->tx_bytes += skb->len;
> -		stats->tx_packets++;
> -		u64_stats_update_end(&stats->tx_syncp);
> +		tx_bytes += skb->len;
> +		tx_packets++;
>  
>  		dev_kfree_skb_any(skb);
>  	}
> +
> +	u64_stats_update_begin(&stats->tx_syncp);
> +	stats->tx_bytes += tx_bytes;
> +	stats->tx_packets =+ tx_packets;

	
	stats->tx_packets += tx_packets;


> +	u64_stats_update_end(&stats->tx_syncp);
>  }
>  
>  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)

^ permalink raw reply

* Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
From: Michael S. Tsirkin @ 2014-10-15  9:34 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
In-Reply-To: <1413357930-45302-2-git-send-email-jasowang@redhat.com>

On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
> This patch checks the new event idx to make sure used event idx never
> goes back. This is used to synchronize the calls between
> virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

the implication being that moving event idx back might cause some race
condition?  If yes but please describe the race explicitly.
Is there a bug we need to fix on stable?
Please also explicitly describe a configuration that causes event idx
to go back.

All this info should go in the commit log.

> ---
>  drivers/virtio/virtio_ring.c |    7 +++++--
>  1 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 3b1f89b..1b3929f 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>  	u16 last_used_idx;
>  
>  	START_USE(vq);
> -
> +	last_used_idx = vq->last_used_idx;
>  	/* We optimistically turn back on interrupts, then check if there was
>  	 * more to do. */
>  	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
>  	 * either clear the flags bit or point the event index at the next
>  	 * entry. Always do both to keep code simple. */
>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> -	vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
> +	/* Make sure used event never go backwards */

s/go/goes/

> +	if (!vring_need_event(vring_used_event(&vq->vring),
> +			      vq->vring.avail->idx, last_used_idx))
> +		vring_used_event(&vq->vring) = last_used_idx;

The result will be that driver will *not* get an interrupt
on the next consumed buffer, which is likely not what driver
intended when it called virtqueue_enable_cb.

Instead, how about we simply document the requirement that drivers either
always call virtqueue_enable_cb_delayed or virtqueue_enable_cb
but not both?

>  	END_USE(vq);
>  	return last_used_idx;
>  }
> -- 
> 1.7.1

^ permalink raw reply

* Re: [PATCH] net: use hardware buffer pool to allocate skb
From: Stephen Hemminger @ 2014-10-15  9:33 UTC (permalink / raw)
  To: Pan Jiafei; +Cc: davem, jkosina, netdev, LeoLi, linux-doc
In-Reply-To: <1413343571-33231-1-git-send-email-Jiafei.Pan@freescale.com>

Since an skb can sit forever in an application queue, you have created
an easy way to livelock the system when enough skb's are waiting to be
read.

^ permalink raw reply

* Re: [PATCH v2] ipv4: dst_entry leak in ip_append_data()
From: Eric Dumazet @ 2014-10-15  9:30 UTC (permalink / raw)
  To: Vasily Averin
  Cc: netdev, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
In-Reply-To: <543E1AAF.9050601@parallels.com>

On Wed, 2014-10-15 at 10:56 +0400, Vasily Averin wrote:
> On 15.10.2014 08:46, Eric Dumazet wrote:
> > On Tue, 2014-10-14 at 08:57 +0400, Vasily Averin wrote:
> >> v2: adjust the indentation of the arguments __ip_append_data() call
> >>
> >> Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
> >>
> >> If sk_write_queue is empty ip_append_data() executes ip_setup_cork()
> >> that "steals" dst entry from rt to cork. Later it calls __ip_append_data()
> >> that creates skb and adds it to sk_write_queue.
> >>
> >> If skb was added successfully following ip_push_pending_frames() call
> >> reassign dst entries from cork to skb, and kfree_skb frees dst_entry.
> >>
> >> However nobody frees stolen dst_entry if skb was not added into sk_write_queue.
> > 
> > I thought this was done by ip_flush_pending_frames() ?
> 
> Take look at ip_send_unicast_reply():

So maybe the bug is here ?


> 
> ip_flush_pending_frames() is not called if skb was not added to sk_write_queue.
> And ip_rt_put() does not work, because dst entry was stolen in ip_setup_cork().
> 
> Probably it can happen in raw_sendmsg() and udp_sendmsg() too.

UDP & RAW do :

err = ip_append_data(...);
if (err)
    udp_flush_pending_frames(sk);


It seems you chose to add a test in fast path, with not even adding an
unlikely() clause, while it seems that we took care of all the cases but
missed a single one : ip_send_unicast_reply()

I am suggesting to fix this bug in another way.

Thanks.

^ permalink raw reply

* Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
From: Michael S. Tsirkin @ 2014-10-15  9:28 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
In-Reply-To: <1413357930-45302-3-git-send-email-jasowang@redhat.com>

On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:
> This patch introduces virtio_enable_cb_avail() to publish avail idx
> and used event. This could be used by batched buffer submitting to
> reduce the number of tx interrupts.
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/virtio/virtio_ring.c |   22 ++++++++++++++++++++--
>  include/linux/virtio.h       |    2 ++
>  2 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 1b3929f..d67fbf8 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>  	 * entry. Always do both to keep code simple. */
>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
>  	/* Make sure used event never go backwards */
> -	if (!vring_need_event(vring_used_event(&vq->vring),
> -			      vq->vring.avail->idx, last_used_idx))
> +	if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
> +	    !vring_need_event(vring_used_event(&vq->vring),
> +			      vq->vring.avail->idx, last_used_idx)) {
>  		vring_used_event(&vq->vring) = last_used_idx;
> +	}
>  	END_USE(vq);
>  	return last_used_idx;
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
>

I see you are also changing virtqueue_enable_cb_prepare, why?

> +bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	bool ret;
> +
> +	START_USE(vq);
> +	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> +	vring_used_event(&vq->vring) = vq->vring.avail->idx;
> +	ret = vring_need_event(vq->vring.avail->idx,
> +			       vq->last_used_idx, vq->vring.used->idx);
> +	END_USE(vq);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
> +
>  /**
>   * virtqueue_poll - query pending used buffers
>   * @vq: the struct virtqueue we're talking about.

Could not figure out what this does.
Please add documentation.

> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index b46671e..bfaf058 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
>  
>  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
>  
> +bool virtqueue_enable_cb_avail(struct virtqueue *vq);
> +
>  bool virtqueue_poll(struct virtqueue *vq, unsigned);
>  
>  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> -- 
> 1.7.1

^ permalink raw reply

* Re: [PATCH] net: use hardware buffer pool to allocate skb
From: Eric Dumazet @ 2014-10-15  9:15 UTC (permalink / raw)
  To: Jiafei.Pan@freescale.com
  Cc: David Miller, jkosina@suse.cz, netdev@vger.kernel.org,
	LeoLi@freescale.com, linux-doc@vger.kernel.org
In-Reply-To: <c02005b9d8794d14bc2149f468ab571c@BY2PR03MB524.namprd03.prod.outlook.com>

On Wed, 2014-10-15 at 05:34 +0000, Jiafei.Pan@freescale.com wrote:

> Yes, for this matter, in order to do this we should modify the Ethernet
> drivers. For example, driver A want to driver B, C, D.. to support driver
> A's Hardware block access functions, so we have to modify driver B, C, D...
> It will be so complex for this matter.
> 
> But by using my patch, I just modify a Ethernet device (I don't care
> Which driver it is used) flag in driver A in order to implement this
> Ethernet device using hardware block access functions provided by
> Driver A.

We care a lot of all the bugs added by your patches. You have little
idea of how many of them were added. We do not want to spend days of
work explaining everything or fixing all the details for you.

Carefully read net/core/skbuff.c, net/core/dev.c, GRO layer, you'll see
how many spots you missed.

You cannot control how skbs are cooked before reaching your driver
ndo_start_xmit(). You are not going to add hooks in UDP , TCP, or other
drivers RX path. This would be absolutely insane.

Trying to control how skb are cooked in RX path is absolutely something
drivers do, using page frags that are read-only by all the stack.

Fix your driver to use existing infra, your suggestion is not going to
be accepted.

^ permalink raw reply

* Re: [PATCH] net: use hardware buffer pool to allocate skb
From: Eric Dumazet @ 2014-10-15  9:15 UTC (permalink / raw)
  To: Jiafei.Pan@freescale.com
  Cc: davem@davemloft.net, jkosina@suse.cz, netdev@vger.kernel.org,
	LeoLi@freescale.com, linux-doc@vger.kernel.org
In-Reply-To: <efb0b7f4955243fbab9b24e0c31ea806@BY2PR03MB524.namprd03.prod.outlook.com>

On Wed, 2014-10-15 at 05:43 +0000, Jiafei.Pan@freescale.com wrote:

> For my case, there are some shortcoming to use page frags. Firstly, I have to
> modify each Ethernet drivers to support it especially I don’t which vendor's
> driver the customer will use. Secondly, it is not enough only 
> build skb by 'lock' pages, the buffer address comes from hardware block and
> should be aligned to hardware block.

So you align to hardware block. What is the problem with this ?



^ permalink raw reply

* Re: [PATCH linux v3 1/1] fs/proc: use a rb tree for the directory entries
From: Nicolas Dichtel @ 2014-10-15  9:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev, linux-kernel, davem, akpm, adobriyan, rui.xiang, viro,
	oleg, gorcunov, kirill.shutemov, grant.likely, tytso,
	Linus Torvalds
In-Reply-To: <87y4sis9wk.fsf@x220.int.ebiederm.org>

Le 14/10/2014 21:56, Eric W. Biederman a écrit :
> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>
>> Le 07/10/2014 11:02, Nicolas Dichtel a écrit :
>>> The current implementation for the directories in /proc is using a single
>>> linked list. This is slow when handling directories with large numbers of
>>> entries (eg netdevice-related entries when lots of tunnels are opened).
>>>
>>> This patch replaces this linked list by a red-black tree.
>>>
>>> Here are some numbers:
>>>
>>> dummy30000.batch contains 30 000 times 'link add type dummy'.
>>>
>>> Before the patch:
>>> $ time ip -b dummy30000.batch
>>> real	2m31.950s
>>> user	0m0.440s
>>> sys	2m21.440s
>>> $ time rmmod dummy
>>> real	1m35.764s
>>> user	0m0.000s
>>> sys	1m24.088s
>>>
>>> After the patch:
>>> $ time ip -b dummy30000.batch
>>> real	2m0.874s
>>> user	0m0.448s
>>> sys	1m49.720s
>>> $ time rmmod dummy
>>> real	1m13.988s
>>> user	0m0.000s
>>> sys	1m1.008s
>>>
>>> The idea of improving this part was suggested by
>>> Thierry Herbelot <thierry.herbelot@6wind.com>.
>>>
>>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>>> Acked-by: David S. Miller <davem@davemloft.net>
> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>> ---
>>
>> I'm not sure who is in charge of taking this patch. Should I resend it to
>> someone else or is it already included in a tree?
>
> There are a couple of things going on here.
>
> This patch came out at the beginning of the merge window which is a time
> when everything that was ready and well tested ahead of time gets
> merged.
>
> Your numbers don't look too bad, so I expect this code is ready to go
> (although I am a smidge disappointed in the small size of the
> performance improvement), my quick read through earlier did not show
> anything scary.   Do you have any idea why going from O(N^2) algorithm
> to a O(NlogN) algorithm showed such a small performance improvement with
> 30,000 entries?
perf top shows that a lot of time was wasted in vsscanf().
With my previous test file (dummy30000.batch), kernel needs to calculate
the interface name (iproute2 command was : 'link add type dummy'). Here is
another bench:

Files dummywithnameX.batch are created like this:
for i in `seq 1 X`; do echo "link add dummy$i type dummy" >> 
dummywithnameX.batch; done

Before the patch:
$ time ip -b dummywithname10000.batch
real	0m22.496s
user	0m0.196s
sys	0m5.924s
$ rmmod dummy
$ time ip -b dummywithname15000.batch
real	0m37.903s
user	0m0.348s
sys	0m13.160s
$ rmmod dummy
$ time ip -b dummywithname20000.batch
real	0m52.941s
user	0m0.396s
sys	0m20.164s
$ rmmod dummy
$ time ip -b dummywithname30000.batch
real	1m32.447s
user	0m0.660s
sys	0m43.436s

After the patch:
$ time ip -b dummywithname10000.batch
real	0m19.648s
user	0m0.180s
sys	0m2.260s
$ rmmod dummy
$ time ip -b dummywithname15000.batch
real	0m29.149s
user	0m0.256s
sys	0m3.616s
$ rmmod dummy
$ time ip -b dummywithname20000.batch
real	0m39.133s
user	0m0.440s
sys	0m4.756s
$ rmmod dummy
$ time ip -b dummywithname30000.batch
real    0m59.837s
user    0m0.520s
sys     0m7.780s

Thus an improvement of ~35% for 30k ifaces, but more importantly, after the
patch, it scales linearly.

perf top output after the patch:
      4.30%  [kernel]          [k] arch_local_irq_restore
      2.92%  [kernel]          [k] format_decode
      2.10%  [kernel]          [k] vsnprintf
      2.08%  [kernel]          [k] arch_local_irq_enable
      1.82%  [kernel]          [k] number.isra.1
      1.81%  [kernel]          [k] arch_local_irq_enable
      1.41%  [kernel]          [k] kmem_cache_alloc
      1.33%  [kernel]          [k] unmap_single_vma
      1.10%  [kernel]          [k] do_raw_spin_lock
      1.09%  [kernel]          [k] clear_page
      1.00%  [kernel]          [k] arch_local_irq_enable

>
> Normally proc is looked at by a group of folks me, Alexey Dobriyan, and
> Al Viro all sort of tag team taking care of the proc infrastructure with
> (except for Al) Andrew Morton typically taking the patches and merging
> them.
>
> I am currently in the middle of a move so I don't have the time to carry
> this change or do much of anything until I am settled again.
>
> What I would recommend is verifying your patch works against v3.18-rc1
> at the begginning of next week and sending the code to Andrew Morton.
Ok, thank you. I will do this.

Nicolas

^ permalink raw reply

* RE: [PATCH] net: use hardware buffer pool to allocate skb
From: David Laight @ 2014-10-15  8:57 UTC (permalink / raw)
  To: 'Pan Jiafei', davem@davemloft.net, jkosina@suse.cz
  Cc: netdev@vger.kernel.org, LeoLi@freescale.com,
	linux-doc@vger.kernel.org
In-Reply-To: <1413343571-33231-1-git-send-email-Jiafei.Pan@freescale.com>

From: Pan Jiafei
> In some platform, there are some hardware block provided
> to manage buffers to improve performance. So in some case,
> it is expected that the packets received by some generic
> NIC should be put into such hardware managed buffers
> directly, so that such buffer can be released by hardware
> or by driver.

This looks like some strange variant of 'buffer loaning'.
In general it just doesn't work due to the limited number
of such buffers - they soon all become queued waiting for
applications to read from sockets.

It also isn't at all clear how you expect a 'generic NIC'
to actually allocate buffers from your 'special area'.

	David

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox