Netdev List
 help / color / mirror / Atom feed
* Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
From: Michael S. Tsirkin @ 2014-10-15 11:43 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
In-Reply-To: <543E5347.7060106@redhat.com>

On Wed, Oct 15, 2014 at 06:58:15PM +0800, Jason Wang wrote:
> On 10/15/2014 06:41 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 15, 2014 at 06:19:15PM +0800, Jason Wang wrote:
> >> On 10/15/2014 05:28 PM, Michael S. Tsirkin wrote:
> >>> On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:
> >>>> This patch introduces virtio_enable_cb_avail() to publish avail idx
> >>>> and used event. This could be used by batched buffer submitting to
> >>>> reduce the number of tx interrupts.
> >>>>
> >>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
> >>>> Cc: Michael S. Tsirkin <mst@redhat.com>
> >>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>>> ---
> >>>>  drivers/virtio/virtio_ring.c |   22 ++++++++++++++++++++--
> >>>>  include/linux/virtio.h       |    2 ++
> >>>>  2 files changed, 22 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> >>>> index 1b3929f..d67fbf8 100644
> >>>> --- a/drivers/virtio/virtio_ring.c
> >>>> +++ b/drivers/virtio/virtio_ring.c
> >>>> @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> >>>>  	 * entry. Always do both to keep code simple. */
> >>>>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> >>>>  	/* Make sure used event never go backwards */
> >>>> -	if (!vring_need_event(vring_used_event(&vq->vring),
> >>>> -			      vq->vring.avail->idx, last_used_idx))
> >>>> +	if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
> >>>> +	    !vring_need_event(vring_used_event(&vq->vring),
> >>>> +			      vq->vring.avail->idx, last_used_idx)) {
> >>>>  		vring_used_event(&vq->vring) = last_used_idx;
> >>>> +	}
> >>>>  	END_USE(vq);
> >>>>  	return last_used_idx;
> >>>>  }
> >>>>  EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
> >>>>
> >>> I see you are also changing virtqueue_enable_cb_prepare, why?
> >> This is also used to prevent it from moving the used event backwards.
> >> This may happens when we handle tx napi after we publish avail idx as
> >> used event (virtqueue_enable_cb_avail() was called).
> > So it's wrong exactly in the same way.
> >
> > But also, please document this stuff, don't put
> > unrelated changes in a patch called "introduce
> > virtqueue_enable_cb_avail".
> >
> >
> >>>> +bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
> >>>> +{
> >>>> +	struct vring_virtqueue *vq = to_vvq(_vq);
> >>>> +	bool ret;
> >>>> +
> >>>> +	START_USE(vq);
> >>>> +	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> >>>> +	vring_used_event(&vq->vring) = vq->vring.avail->idx;
> >>>> +	ret = vring_need_event(vq->vring.avail->idx,
> >>>> +			       vq->last_used_idx, vq->vring.used->idx);
> >>>> +	END_USE(vq);
> >>>> +
> >>>> +	return ret;
> >>>> +}
> >>>> +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
> >>>> +
> >>>>  /**
> >>>>   * virtqueue_poll - query pending used buffers
> >>>>   * @vq: the struct virtqueue we're talking about.
> >>> Could not figure out what this does.
> >>> Please add documentation.
> >>>
> >> Sure, does something like below explain what does this function do?
> >>
> >> /**                                                                            
> >>
> >>  * virtqueue_enable_cb_avail - restart callbacks after
> >> disable_cb.           
> >>  * @vq: the struct virtqueue we're talking
> >> about.                              
> >>  *                                                                             
> >>
> >>  * This re-enables callbacks but hints to the other side to
> >> delay              
> >>  * interrupts all of the available buffers have been processed;         
> >
> > So this is like virtqueue_enable_cb_delayed but even more
> > aggressive?
> > I think it's too agressive: it's better to wake up guest
> > after you are through most of buffers, but not all,
> > so guest and host can work in parallel.
> 
> Note that:
> 
> - it was only used when there are still few of free slots (which is
> greater than 2 + MAX_SKB_FRAGS)
> - my patch keeps the free_old_xmit_skbs() in the beginning of
> start_xmit(), so the tx skb reclaiming does not depends totally on tx
> interrupt. If more packets comes, we'd expect some of them were freed in
> ndo_start_xmit(). If not, finally we may trigger the
> virtqueue_enable_cb_delayed().
> 
> So probably not as aggressive as it looks. I will do benchmark on this.

Mine too:
        } else if (virtqueue_enable_cb_delayed(sq->vq)) {
                free_old_xmit_skbs(txq, sq, qsize);
        }



> >
> >
> >>  * it returns "false" if there are at least one pending buffer in the
> >> queue,          
> >>  * to detect a possible race between the driver checking for more
> >> work,        
> >>  * and enabling
> >> callbacks.                                                     
> >>  *                                                                             
> >>
> >>  * Caller must ensure we don't call this with other
> >> virtqueue                  
> >>  * operations at the same time (except where
> >> noted).                           
> >>  */
> >>
> >>>> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> >>>> index b46671e..bfaf058 100644
> >>>> --- a/include/linux/virtio.h
> >>>> +++ b/include/linux/virtio.h
> >>>> @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
> >>>>  
> >>>>  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
> >>>>  
> >>>> +bool virtqueue_enable_cb_avail(struct virtqueue *vq);
> >>>> +
> >>>>  bool virtqueue_poll(struct virtqueue *vq, unsigned);
> >>>>  
> >>>>  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> >>>> -- 
> >>>> 1.7.1
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> >>> the body of a message to majordomo@vger.kernel.org
> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>> Please read the FAQ at  http://www.tux.org/lkml/
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply

* Re: [PATCH] tcp: refine autocork condition check
From: Eric Dumazet @ 2014-10-15 11:47 UTC (permalink / raw)
  To: Weiping Pan; +Cc: netdev, edumazet
In-Reply-To: <8e0610510498c7f6ecbe2e99ab6044030f93f792.1413369212.git.panweiping3@gmail.com>

On Wed, 2014-10-15 at 18:34 +0800, Weiping Pan wrote:
> Inspired by commit b2532eb9abd8 (tcp: fix ooo_okay setting vs Small Queues).
> 
> The last check in tcp_should_autocork() was meant to check that whether we
> only have an ACK in Qdisc/NIC queues, or if TX completion was delayed after we
> processed ACK packet, if so, we should push the packet immediately instead of
> corking it.
> Therefore we should compare sk_wmem_alloc with SKB_TRUESIZE(1) instead of
> skb->truesize.
> 
> After this patch, tcp should have more chances to be corked, and the
> performance should be a little better.  And netperf shows that this patch
> works as expected.
> 
> ./super_netperf.sh 300 -H 10.16.42.249 -t TCP_STREAM -- -m 1 -M 1
> 	      speed	TCPAutoCorking
> Before patch: 169.38    222278
> After patch:  173.27    232988
> 

I do not see how this patch changes anything on this workload, I suspect
noise in your tests ? Full nstat output would give some hints maybe.

TCP_STREAM netperfs send no ACK packets at all.

I am concerned that this patch adds some latencies, and this wont be
seen with your TCP_STREAM test.

Autocorking is a trade off between throughput and latencies.

We need extensive tests, using TCP_RR with various sizes.

Existing behavior is telling that if a prior packet is in qdisc, and
this skb has a bigger truesize, we do not autocork.


In practice, you might hold now packets that are quite big, (more than
SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER) bytes of payload.

Typical cases is applications using two writes, one to send a small
header, one for the body of the request/answer.

Existing code is better because we allow the second send() to be pushed
to the qdisc/NIC, before first send is TX completed.

^ permalink raw reply

* Re: [PATCH RFC] virtio_net: enable tx interrupt
From: Michael S. Tsirkin @ 2014-10-15 11:57 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <543E54B4.7040602@redhat.com>

On Wed, Oct 15, 2014 at 07:04:20PM +0800, Jason Wang wrote:
> On 10/15/2014 05:53 AM, Michael S. Tsirkin wrote:
> > On newer hosts that support delayed tx interrupts,
> > we probably don't have much to gain from orphaning
> > packets early.
> >
> > Based on patch by Jason Wang.
> >
> > Note: this will likely degrade performance for hosts without event idx
> > support.  Various fallback options are available, including
> > orphaning conditionally.
> > Testing TBD.
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  drivers/net/virtio_net.c | 119 +++++++++++++++++++++++++++++++++--------------
> >  1 file changed, 83 insertions(+), 36 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 6b6e136..62c059d 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -72,6 +72,8 @@ struct send_queue {
> >  
> >  	/* Name of the send queue: output.$index */
> >  	char name[40];
> > +
> > +	struct napi_struct napi;
> >  };
> >  
> >  /* Internal representation of a receive virtqueue */
> > @@ -211,15 +213,38 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
> >  	return p;
> >  }
> >  
> > +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
> > +{
> > +	struct sk_buff *skb;
> > +	unsigned int len;
> > +	struct virtnet_info *vi = sq->vq->vdev->priv;
> > +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> > +	int sent = 0;
> > +
> > +	while (sent < budget &&
> > +	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > +		pr_debug("Sent skb %p\n", skb);
> > +
> > +		u64_stats_update_begin(&stats->tx_syncp);
> > +		stats->tx_bytes += skb->len;
> > +		stats->tx_packets++;
> > +		u64_stats_update_end(&stats->tx_syncp);
> > +
> > +		dev_kfree_skb_any(skb);
> > +		sent++;
> > +	}
> > +
> > +	return sent;
> > +}
> > +
> >  static void skb_xmit_done(struct virtqueue *vq)
> >  {
> >  	struct virtnet_info *vi = vq->vdev->priv;
> > +	struct send_queue *sq = &vi->sq[vq2txq(vq)];
> >  
> > -	/* Suppress further interrupts. */
> > -	virtqueue_disable_cb(vq);
> > -
> > -	/* We were probably waiting for more output buffers. */
> > -	netif_wake_subqueue(vi->dev, vq2txq(vq));
> > +	if (napi_schedule_prep(&sq->napi)) {
> > +		__napi_schedule(&sq->napi);
> > +	}
> >  }
> >  
> >  static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
> > @@ -766,6 +791,37 @@ again:
> >  	return received;
> >  }
> >  
> > +static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> > +{
> > +	struct send_queue *sq =
> > +		container_of(napi, struct send_queue, napi);
> > +	struct virtnet_info *vi = sq->vq->vdev->priv;
> > +	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
> > +	unsigned int r, sent = 0;
> > +
> > +again:
> > +	__netif_tx_lock(txq, smp_processor_id());
> > +	virtqueue_disable_cb(sq->vq);
> > +	sent += free_old_xmit_skbs(sq, budget - sent);
> > +
> > +	if (sent < budget) {
> > +		r = virtqueue_enable_cb_prepare(sq->vq);
> 
> So even virtqueue_enable_cb_delayed() was used in start_xmit(). This can
> move used index backwards to trigger unnecessary interrupts.

Good point. I'll rework this to use virtqueue_enable_cb_delayed.

virtqueue_enable_cb_delayed_prepare might be nice to
reduce lock contention, but that needs to be benchmarked.


> > +		napi_complete(napi);
> > +		__netif_tx_unlock(txq);
> > +		if (unlikely(virtqueue_poll(sq->vq, r)) &&
> > +		    napi_schedule_prep(napi)) {
> > +			virtqueue_disable_cb(sq->vq);
> > +			__napi_schedule(napi);
> > +			goto again;
> > +		}
> > +	} else {
> > +		__netif_tx_unlock(txq);
> > +	}
> > +
> > +	netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
> > +	return sent;
> > +}
> > +
> >  #ifdef CONFIG_NET_RX_BUSY_POLL
> >  /* must be called with local_bh_disable()d */
> >  static int virtnet_busy_poll(struct napi_struct *napi)
> > @@ -814,30 +870,12 @@ static int virtnet_open(struct net_device *dev)
> >  			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
> >  				schedule_delayed_work(&vi->refill, 0);
> >  		virtnet_napi_enable(&vi->rq[i]);
> > +		napi_enable(&vi->sq[i].napi);
> >  	}
> >  
> >  	return 0;
> >  }
> >  
> > -static void free_old_xmit_skbs(struct send_queue *sq)
> > -{
> > -	struct sk_buff *skb;
> > -	unsigned int len;
> > -	struct virtnet_info *vi = sq->vq->vdev->priv;
> > -	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> > -
> > -	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > -		pr_debug("Sent skb %p\n", skb);
> > -
> > -		u64_stats_update_begin(&stats->tx_syncp);
> > -		stats->tx_bytes += skb->len;
> > -		stats->tx_packets++;
> > -		u64_stats_update_end(&stats->tx_syncp);
> > -
> > -		dev_kfree_skb_any(skb);
> > -	}
> > -}
> > -
> >  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> >  {
> >  	struct skb_vnet_hdr *hdr;
> > @@ -902,7 +940,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> >  		sg_set_buf(sq->sg, hdr, hdr_len);
> >  		num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
> >  	}
> > -	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
> > +
> > +	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb,
> > +				    GFP_ATOMIC);
> >  }
> >  
> >  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > @@ -910,10 +950,9 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	struct virtnet_info *vi = netdev_priv(dev);
> >  	int qnum = skb_get_queue_mapping(skb);
> >  	struct send_queue *sq = &vi->sq[qnum];
> > -	int err;
> > +	int err, qsize = virtqueue_get_vring_size(sq->vq);
> >  
> > -	/* Free up any pending old buffers before queueing new ones. */
> > -	free_old_xmit_skbs(sq);
> 
> I think we'd better keep this. Since it may the tx skb freeing not
> totally depends on the tx interrupt, delayed interrupt may work better
> without damage the latency.

Hmm ok but I think it's best to do it at the end,
after we have sent the packet.
Will update the patch.

> > +	virtqueue_disable_cb(sq->vq);
> >  
> >  	/* Try to transmit */
> >  	err = xmit_skb(sq, skb);
> > @@ -930,22 +969,20 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	}
> >  	virtqueue_kick(sq->vq);
> >  
> > -	/* Don't wait up for transmitted skbs to be freed. */
> > -	skb_orphan(skb);
> > -	nf_reset(skb);
> > -
> >  	/* Apparently nice girls don't return TX_BUSY; stop the queue
> >  	 * before it gets out of hand.  Naturally, this wastes entries. */
> >  	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
> >  		netif_stop_subqueue(dev, qnum);
> >  		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
> >  			/* More just got used, free them then recheck. */
> > -			free_old_xmit_skbs(sq);
> > +			free_old_xmit_skbs(sq, qsize);
> >  			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
> >  				netif_start_subqueue(dev, qnum);
> >  				virtqueue_disable_cb(sq->vq);
> >  			}
> >  		}
> > +	} else if (virtqueue_enable_cb_delayed(sq->vq)) {
> > +		free_old_xmit_skbs(sq, qsize);
> >  	}
> >  
> >  	return NETDEV_TX_OK;
> > @@ -1124,8 +1161,10 @@ static int virtnet_close(struct net_device *dev)
> >  	/* Make sure refill_work doesn't re-enable napi! */
> >  	cancel_delayed_work_sync(&vi->refill);
> >  
> > -	for (i = 0; i < vi->max_queue_pairs; i++)
> > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> >  		napi_disable(&vi->rq[i].napi);
> > +		napi_disable(&vi->sq[i].napi);
> > +	}
> >  
> >  	return 0;
> >  }
> > @@ -1438,8 +1477,10 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> >  {
> >  	int i;
> >  
> > -	for (i = 0; i < vi->max_queue_pairs; i++)
> > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> >  		netif_napi_del(&vi->rq[i].napi);
> > +		netif_napi_del(&vi->sq[i].napi);
> > +	}
> >  
> >  	kfree(vi->rq);
> >  	kfree(vi->sq);
> > @@ -1593,6 +1634,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
> >  		netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
> >  			       napi_weight);
> >  		napi_hash_add(&vi->rq[i].napi);
> > +		netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
> > +			       napi_weight);
> >  
> >  		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
> >  		ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT);
> > @@ -1893,8 +1936,10 @@ static int virtnet_freeze(struct virtio_device *vdev)
> >  	if (netif_running(vi->dev)) {
> >  		for (i = 0; i < vi->max_queue_pairs; i++) {
> >  			napi_disable(&vi->rq[i].napi);
> > +			napi_disable(&vi->sq[i].napi);
> >  			napi_hash_del(&vi->rq[i].napi);
> >  			netif_napi_del(&vi->rq[i].napi);
> > +			netif_napi_del(&vi->sq[i].napi);
> >  		}
> >  	}
> >  
> > @@ -1919,8 +1964,10 @@ static int virtnet_restore(struct virtio_device *vdev)
> >  			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
> >  				schedule_delayed_work(&vi->refill, 0);
> >  
> > -		for (i = 0; i < vi->max_queue_pairs; i++)
> > +		for (i = 0; i < vi->max_queue_pairs; i++) {
> >  			virtnet_napi_enable(&vi->rq[i]);
> > +			napi_enable(&vi->sq[i].napi);
> > +		}
> >  	}
> >  
> >  	netif_device_attach(vi->dev);

^ permalink raw reply

* Re: [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net
From: Michael S. Tsirkin @ 2014-10-15 11:58 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
In-Reply-To: <543E5706.10305@redhat.com>

On Wed, Oct 15, 2014 at 07:14:14PM +0800, Jason Wang wrote:
> On 10/15/2014 06:25 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 15, 2014 at 03:25:24PM +0800, Jason Wang wrote:
> >> According to David, proper accounting and queueing (at all levels, not
> >> just TCP sockets) is more important than trying to skim a bunch of
> >> cycles by avoiding TX interrupts.
> > He also mentioned we should find other ways to batch
> >
> 
> Right.
> >> Having an event to free the SKB is
> >> absolutely essential for the stack to operate correctly.
> >>
> >> This series tries to enable tx interrupt for virtio-net. The idea is
> >> simple: enable tx interrupt and schedule a tx napi to free old xmit
> >> skbs.
> >>
> >> Several notes:
> >> - Tx interrupt storm avoidance when queue is about to be full is
> >>   kept.
> > But queue is typically *not* full. More important to avoid interrupt
> > storms in that case IMO.
> 
> Yes.
> >> Since we may enable callbacks in both ndo_start_xmit() and tx
> >>   napi, patch 1 adds a check to make sure used event never go
> >>   back. This will let the napi not enable the callbacks wrongly after
> >>   delayed callbacks was used.
> > So why not just use delayed callbacks?
> 
> This means the tx interrupt are coalesced in a somewhat adaptive way.
> Need benchmark to see its effect.

I think it's a minimal change, and does not need new APIs.
If that's not optimal, we can do smarter things on top.

> >
> >> - For bulk dequeuing, there's no need to enable tx interrupt for each
> >>   packet. The last patch only enable tx interrupt for the final skb in
> >>   the chain through xmit_more and a new helper to publish current avail
> >>   idx as used event.
> >>
> >> This series fixes several issues of original rfc pointed out by Michael.
> > Could you list the issues, for ease of review?
> 
> Probably just one:
> 
> - Move the virtqueue_disable_cb() from skb_xmit_done() into
> virtnet_poll_tx() under tx lock.

I think I did this already, I'll recheck.

^ permalink raw reply

* Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
From: Michael S. Tsirkin @ 2014-10-15 12:00 UTC (permalink / raw)
  To: David Laight
  Cc: eric.dumazet@gmail.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, davem@davemloft.net
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D1C9D7688@AcuExch.aculab.com>

On Wed, Oct 15, 2014 at 10:51:32AM +0000, David Laight wrote:
> From: Michael S. Tsirkin
> > On Wed, Oct 15, 2014 at 09:49:01AM +0000, David Laight wrote:
> > > From: Of Michael S. Tsirkin
> > > > On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
> > > > > Accumulate the sent packets and sent bytes in local variables and perform a
> > > > > single u64_stats_update_begin/end() after.
> > > > >
> > > > > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > > > > Cc: Michael S. Tsirkin <mst@redhat.com>
> > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > >
> > > > Not sure how much it's worth but since Eric suggested it ...
> > >
> > > Probably depends on the actual cost of u64_stats_update_begin/end
> > > against the likely extra saving of the tx_bytes and tx_packets
> > > values onto the stack across the call to dev_kfree_skb_any().
> > > (Which depends on the number of caller saved registers.)
> > 
> > Yea, some benchmark results would be nice to see.
> 
> I there are likely to be multiple skb on the queue the fastest
> code would probably do one 'virtqueue_get_all()' that returned
> a linked list of buffers, then follow the list to get the stats,
> and follow it again to free the skb.
> 
> 	David

Interesting. Each time we tried playing with linked list in the past,
it simply destroyed performance.
Maybe this case is different but I have my doubts.

> > > > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > > >
> > > > > ---
> > > > >  drivers/net/virtio_net.c |   12 ++++++++----
> > > > >  1 files changed, 8 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 3d0ce44..a4d56b8 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
> > > > >  	unsigned int len;
> > > > >  	struct virtnet_info *vi = sq->vq->vdev->priv;
> > > > >  	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> > > > > +	u64 tx_bytes = 0, tx_packets = 0;
> > >
> > > tx_packets need only be 'unsigned int'.
> > > The same is almost certainly true of tx_bytes.
> > >
> > > 	David
> > >
> > > > >  	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > > > >  		pr_debug("Sent skb %p\n", skb);
> > > > >
> > > > > -		u64_stats_update_begin(&stats->tx_syncp);
> > > > > -		stats->tx_bytes += skb->len;
> > > > > -		stats->tx_packets++;
> > > > > -		u64_stats_update_end(&stats->tx_syncp);
> > > > > +		tx_bytes += skb->len;
> > > > > +		tx_packets++;
> > > > >
> > > > >  		dev_kfree_skb_any(skb);
> > > > >  	}
> > > > > +
> > > > > +	u64_stats_update_begin(&stats->tx_syncp);
> > > > > +	stats->tx_bytes += tx_bytes;
> > > > > +	stats->tx_packets =+ tx_packets;
> > > > > +	u64_stats_update_end(&stats->tx_syncp);
> > > > >  }
> > > > >
> > > > >  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> > > > > --
> > > > > 1.7.1
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >
> 

^ permalink raw reply

* Re: [PATCH] gianfar: disable vlan tag insertion by default on 2.6.x
From: Willy Tarreau @ 2014-10-15 12:01 UTC (permalink / raw)
  To: zhuyj
  Cc: yzhu1, sandeep.kumar, netdev, linux-kernel, Yue.Tao, guang.yang,
	joe, festevam, richardcochran, clarocq, yongjun_wei,
	claudiu.manoil, roy.xu, sky.wangfeng
In-Reply-To: <543E4AE0.2010207@gmail.com>

Hi,

On Wed, Oct 15, 2014 at 06:22:24PM +0800, zhuyj wrote:
> Sorry, it is my fault.

no problem, don't worry :-)

> Please apply this patch in the attachment.

Thank you, patch queued now!

Willy

^ permalink raw reply

* [PATCH] ipv4: dst_entry leak in ip_send_unicast_reply()
From: Vasily Averin @ 2014-10-15 12:24 UTC (permalink / raw)
  To: Eric Dumazet, netdev, David S. Miller
  Cc: Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy

Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")

ip_setup_cork() called inside ip_append_data() steals dst entry from rt to cork
and in case errors in __ip_append_data() nobody frees stolen dst entry

Signed-off-by: Vasily Averin <vvs@parallels.com>
---
 net/ipv4/ip_output.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e35b712..88e5ef2 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1535,6 +1535,7 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb,
 	struct sk_buff *nskb;
 	struct sock *sk;
 	struct inet_sock *inet;
+	int err;
 
 	if (__ip_options_echo(&replyopts.opt.opt, skb, sopt))
 		return;
@@ -1574,8 +1575,13 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb,
 	sock_net_set(sk, net);
 	__skb_queue_head_init(&sk->sk_write_queue);
 	sk->sk_sndbuf = sysctl_wmem_default;
-	ip_append_data(sk, &fl4, ip_reply_glue_bits, arg->iov->iov_base, len, 0,
-		       &ipc, &rt, MSG_DONTWAIT);
+	err = ip_append_data(sk, &fl4, ip_reply_glue_bits, arg->iov->iov_base,
+			     len, 0, &ipc, &rt, MSG_DONTWAIT);
+	if (unlikely(err)) {
+		ip_flush_pending_frames(sk);
+		goto out;
+	}
+
 	nskb = skb_peek(&sk->sk_write_queue);
 	if (nskb) {
 		if (arg->csumoffset >= 0)
@@ -1587,7 +1593,7 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb,
 		skb_set_queue_mapping(nskb, skb_get_queue_mapping(skb));
 		ip_push_pending_frames(sk, &fl4);
 	}
-
+out:
 	put_cpu_var(unicast_sock);
 
 	ip_rt_put(rt);
-- 
1.9.1

^ permalink raw reply related

* [PATCH] virtio_net: fix use after free
From: Michael S. Tsirkin @ 2014-10-15 13:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, David S. Miller, virtualization

commit 0b725a2ca61bedc33a2a63d0451d528b268cf975
    net: Remove ndo_xmit_flush netdev operation, use signalling instead.

added code that looks at skb->xmit_more after the skb has
been put in TX VQ. Since some paths process the ring and free the skb
immediately, this can cause use after free.

Fix by storing xmit_more in a local variable.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

David, am I using the API correctly?
Seems to work for me.
You used __netif_subqueue_stopped but that seems to use
a slightly more expensive test_bit internally.
The reason I added a variable for the txq here is because it's handy for
BQL patch later on.


 drivers/net/virtio_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3d0ce44..13d0a8b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -920,6 +920,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	int qnum = skb_get_queue_mapping(skb);
 	struct send_queue *sq = &vi->sq[qnum];
 	int err;
+	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
+	bool kick = !skb->xmit_more;
 
 	/* Free up any pending old buffers before queueing new ones. */
 	free_old_xmit_skbs(sq);
@@ -956,7 +958,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 	}
 
-	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
+	if (kick || netif_xmit_stopped(txq))
 		virtqueue_kick(sq->vq);
 
 	return NETDEV_TX_OK;
-- 
MST

^ permalink raw reply related

* RE: [PATCH] virtio_net: fix use after free
From: David Laight @ 2014-10-15 13:24 UTC (permalink / raw)
  To: 'Michael S. Tsirkin', linux-kernel@vger.kernel.org
  Cc: David S. Miller, Rusty Russell,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	Jason Wang
In-Reply-To: <1413378878-28118-1-git-send-email-mst@redhat.com>

From: Michael S. Tsirkin
> commit 0b725a2ca61bedc33a2a63d0451d528b268cf975
>     net: Remove ndo_xmit_flush netdev operation, use signalling instead.
> 
> added code that looks at skb->xmit_more after the skb has
> been put in TX VQ. Since some paths process the ring and free the skb
> immediately, this can cause use after free.
> 
> Fix by storing xmit_more in a local variable.
> 
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> 
> David, am I using the API correctly?
> Seems to work for me.
> You used __netif_subqueue_stopped but that seems to use
> a slightly more expensive test_bit internally.
> The reason I added a variable for the txq here is because it's handy for
> BQL patch later on.
> 
> 
>  drivers/net/virtio_net.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 3d0ce44..13d0a8b 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -920,6 +920,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	int qnum = skb_get_queue_mapping(skb);
>  	struct send_queue *sq = &vi->sq[qnum];
>  	int err;
> +	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);

Do you need to cache 'txq' on stack for the entire call?
Looks like it is only needed when 'kick' is true.
I've not looked to see if saves both 'dev' and 'qnum' being kept.

In any case it isn't mentioned in the commit message.

	David

> +	bool kick = !skb->xmit_more;
> 
>  	/* Free up any pending old buffers before queueing new ones. */
>  	free_old_xmit_skbs(sq);
> @@ -956,7 +958,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  		}
>  	}
> 
> -	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
> +	if (kick || netif_xmit_stopped(txq))
>  		virtqueue_kick(sq->vq);
> 
>  	return NETDEV_TX_OK;
> --
> MST
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2 1/1] net: fec: ptp: fix convergence issue to support LinuxPTP stack
From: Richard Cochran @ 2014-10-15 13:25 UTC (permalink / raw)
  To: Fugang Duan; +Cc: davem, netdev, bhutchings, b20596
In-Reply-To: <1413365412-22072-1-git-send-email-b38611@freescale.com>

On Wed, Oct 15, 2014 at 05:30:12PM +0800, Fugang Duan wrote:
> Changes V2:
> Modify the commit/comments log to describe the issue clearly.

Thanks, it is more clear now.

Acked-by: Richard Cochran <richardcochran@gmail.com>

^ permalink raw reply

* Re: [PATCH] virtio_net: fix use after free
From: Michael S. Tsirkin @ 2014-10-15 13:37 UTC (permalink / raw)
  To: David Laight
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, David S. Miller
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D1C9D77AB@AcuExch.aculab.com>

On Wed, Oct 15, 2014 at 01:24:57PM +0000, David Laight wrote:
> From: Michael S. Tsirkin
> > commit 0b725a2ca61bedc33a2a63d0451d528b268cf975
> >     net: Remove ndo_xmit_flush netdev operation, use signalling instead.
> > 
> > added code that looks at skb->xmit_more after the skb has
> > been put in TX VQ. Since some paths process the ring and free the skb
> > immediately, this can cause use after free.
> > 
> > Fix by storing xmit_more in a local variable.
> > 
> > Cc: David S. Miller <davem@davemloft.net>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > 
> > David, am I using the API correctly?
> > Seems to work for me.
> > You used __netif_subqueue_stopped but that seems to use
> > a slightly more expensive test_bit internally.
> > The reason I added a variable for the txq here is because it's handy for
> > BQL patch later on.
> > 
> > 
> >  drivers/net/virtio_net.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 3d0ce44..13d0a8b 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -920,6 +920,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	int qnum = skb_get_queue_mapping(skb);
> >  	struct send_queue *sq = &vi->sq[qnum];
> >  	int err;
> > +	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
> 
> Do you need to cache 'txq' on stack for the entire call?
> Looks like it is only needed when 'kick' is true.
> I've not looked to see if saves both 'dev' and 'qnum' being kept.
> 
> In any case it isn't mentioned in the commit message.
> 
> 	David

Code seems slightly neater this way, I haven't bothered to
micro-optimize it to this level yet.
Want to benchmark and send a patch on top?

> > +	bool kick = !skb->xmit_more;
> > 
> >  	/* Free up any pending old buffers before queueing new ones. */
> >  	free_old_xmit_skbs(sq);
> > @@ -956,7 +958,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  		}
> >  	}
> > 
> > -	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
> > +	if (kick || netif_xmit_stopped(txq))
> >  		virtqueue_kick(sq->vq);
> > 
> >  	return NETDEV_TX_OK;
> > --
> > MST
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* [NET/USB] ax88179_set_mac_addr: return only error or zero
From: Ian Morgan @ 2014-10-15 13:49 UTC (permalink / raw)
  To: netdev, linux-usb

[-- Attachment #1: Type: text/plain, Size: 2024 bytes --]

(It's about a USB NIC, so not sure if this is suitable to the netdev
mailing list or the linux-usb mailing list, so you both get it.)

I had an issue with bonding one of these USB3 Gigabit NIC devices
because set_mac_addr was returning a positive value (6), and the
bonding driver thinks that any non-zero value is an error.

My proposed fix was to (I thought correctly) check for negative return
codes in bond_enslave() instead of any non-zero value.

I was told by the bonding driver maintainers that my proposed fix was
incorrect, and that the device driver should return only zero on
success, as other drivers do this and that assumption has been made.
I'm really not sure why the more vigilant check in the bonding driver
was "wrong" though. We all know the rule about assumptions ;-) But I
defer to their superior knowledge of kernel interfaces.

Anyhow, it would seem that the following patch is the "more
acceptable" solution, and is in line with what other drivers do:


--- linux-3.17/drivers/net/usb/ax88179_178a.c.orig  2014-10-05
15:23:04.000000000 -0400
+++ linux-3.17/drivers/net/usb/ax88179_178a.c   2014-10-15
09:07:31.217810217 -0400
@@ -937,6 +937,7 @@ static int ax88179_set_mac_addr(struct n
 {
        struct usbnet *dev = netdev_priv(net);
        struct sockaddr *addr = p;
+       int ret;

        if (netif_running(net))
                return -EBUSY;
@@ -946,8 +947,12 @@ static int ax88179_set_mac_addr(struct n
        memcpy(net->dev_addr, addr->sa_data, ETH_ALEN);

        /* Set the MAC address */
-       return ax88179_write_cmd(dev, AX_ACCESS_MAC, AX_NODE_ID, ETH_ALEN,
+       ret = ax88179_write_cmd(dev, AX_ACCESS_MAC, AX_NODE_ID, ETH_ALEN,
                                 ETH_ALEN, net->dev_addr);
+       if (ret < 0)
+               return ret;
+
+       return 0;
 }

 static const struct net_device_ops ax88179_netdev_ops = {


(non-tab-mangled patch file attached)

Your feedback is duly appreciated. Please CC me, as I am not
subscribed to these mailing lists.

--
Ian Morgan

[-- Attachment #2: ax88719_setmacaddr_rc.patch --]
[-- Type: text/x-patch, Size: 775 bytes --]

--- linux-3.17/drivers/net/usb/ax88179_178a.c~	2014-10-05 15:23:04.000000000 -0400
+++ linux-3.17/drivers/net/usb/ax88179_178a.c	2014-10-15 09:07:31.217810217 -0400
@@ -937,6 +937,7 @@ static int ax88179_set_mac_addr(struct n
 {
 	struct usbnet *dev = netdev_priv(net);
 	struct sockaddr *addr = p;
+	int ret;
 
 	if (netif_running(net))
 		return -EBUSY;
@@ -946,8 +947,12 @@ static int ax88179_set_mac_addr(struct n
 	memcpy(net->dev_addr, addr->sa_data, ETH_ALEN);
 
 	/* Set the MAC address */
-	return ax88179_write_cmd(dev, AX_ACCESS_MAC, AX_NODE_ID, ETH_ALEN,
+	ret = ax88179_write_cmd(dev, AX_ACCESS_MAC, AX_NODE_ID, ETH_ALEN,
 				 ETH_ALEN, net->dev_addr);
+	if (ret < 0)
+		return ret;
+
+	return 0;
 }
 
 static const struct net_device_ops ax88179_netdev_ops = {

^ permalink raw reply

* sunvnet NAPIfication
From: Sowmini Varadhan @ 2014-10-15 14:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20141003.120802.1213573830649867131.davem@davemloft.net>


I have the NAPIfication-of-sunvnet patch-set ready for 
code review. AFAIK net-next is not currently open
for new features this week, but does it make sense for
me to go ahead and send the patch-set to the list (I'm sure
it will take some time to review) or will that just create
confusion to the netdev maintainers?

--Sowmini

Previously discussed:

> >> For example, you can now move everything into software IRQ context,
> >> just disable the VIO interrupt and unconditionally go into NAPI
> >> context from the VIO event.
> >> No more irqsave/irqrestore.
> >> Then the TX path even can run mostly lockless, it just needs
> >> to hold the VIO lock for a minute period of time.  The caller
> >> holds the xmit_lock of the network device to prevent re-entry
> >> into the ->ndo_start_xmit() path.
> 
> I think you should be able to get rid of all of the in-driver
> locking in the fast paths.
> 
> NAPI ->poll() is non-reentrant, so all RX processing occurs
> strictly in a serialized environment.
> 
> Once you do TX reclaim in NAPI context, then all you have to do is
> take the generic netdev TX queue lock during the evaluation of whether
> to wakeup the TX queue or not.  Worst case you need to hold the
> TX netdev queue lock across the whole TX reclaim operation.
> 
> The VIO lock really ought to be entirely superfluous in the data
> paths.

^ permalink raw reply

* [PATCH RFC v2 1/3] virtio_net: enable tx interrupt
From: Michael S. Tsirkin @ 2014-10-15 14:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, virtualization
In-Reply-To: <1413383116-30977-1-git-send-email-mst@redhat.com>

On newer hosts that support delayed tx interrupts,
we probably don't have much to gain from orphaning
packets early.

Based on patch by Jason Wang.

Note: this might degrade performance for
hosts without event idx support.
Should be addressed by the next patch.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c | 137 ++++++++++++++++++++++++++++++++---------------
 1 file changed, 94 insertions(+), 43 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 13d0a8b..a9bf178 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -72,6 +72,8 @@ struct send_queue {
 
 	/* Name of the send queue: output.$index */
 	char name[40];
+
+	struct napi_struct napi;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -217,15 +219,37 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
 }
 
+static unsigned int free_old_xmit_skbs(struct send_queue *sq, int budget)
+{
+	struct sk_buff *skb;
+	unsigned int len;
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	unsigned int packets = 0;
+
+	while (packets < budget &&
+	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+		pr_debug("Sent skb %p\n", skb);
+
+		u64_stats_update_begin(&stats->tx_syncp);
+		stats->tx_bytes += skb->len;
+		stats->tx_packets++;
+		u64_stats_update_end(&stats->tx_syncp);
+
+		dev_kfree_skb_any(skb);
+		packets++;
+	}
+
+	return packets;
+}
+
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
+	struct send_queue *sq = &vi->sq[vq2txq(vq)];
 
-	/* Suppress further interrupts. */
-	virtqueue_disable_cb(vq);
-
-	/* We were probably waiting for more output buffers. */
-	netif_wake_subqueue(vi->dev, vq2txq(vq));
+	if (napi_schedule_prep(&sq->napi))
+		__napi_schedule(&sq->napi);
 }
 
 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -774,6 +798,37 @@ again:
 	return received;
 }
 
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct send_queue *sq =
+		container_of(napi, struct send_queue, napi);
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+	unsigned int sent = 0;
+	bool enable_done;
+
+again:
+	__netif_tx_lock(txq, smp_processor_id());
+	virtqueue_disable_cb(sq->vq);
+	sent += free_old_xmit_skbs(sq, budget - sent);
+
+	if (sent < budget) {
+		enable_done = virtqueue_enable_cb_delayed(sq->vq);
+		napi_complete(napi);
+		__netif_tx_unlock(txq);
+		if (unlikely(enable_done) && napi_schedule_prep(napi)) {
+			virtqueue_disable_cb(sq->vq);
+			__napi_schedule(napi);
+			goto again;
+		}
+	} else {
+		__netif_tx_unlock(txq);
+	}
+
+	netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+	return sent;
+}
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
 /* must be called with local_bh_disable()d */
 static int virtnet_busy_poll(struct napi_struct *napi)
@@ -822,30 +877,12 @@ static int virtnet_open(struct net_device *dev)
 			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 		virtnet_napi_enable(&vi->rq[i]);
+		napi_enable(&vi->sq[i].napi);
 	}
 
 	return 0;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq)
-{
-	struct sk_buff *skb;
-	unsigned int len;
-	struct virtnet_info *vi = sq->vq->vdev->priv;
-	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
-
-	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		pr_debug("Sent skb %p\n", skb);
-
-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
-
-		dev_kfree_skb_any(skb);
-	}
-}
-
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 {
 	struct skb_vnet_hdr *hdr;
@@ -911,7 +948,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 		sg_set_buf(sq->sg, hdr, hdr_len);
 		num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
 	}
-	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
+
+	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb,
+				    GFP_ATOMIC);
 }
 
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -919,12 +958,16 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct virtnet_info *vi = netdev_priv(dev);
 	int qnum = skb_get_queue_mapping(skb);
 	struct send_queue *sq = &vi->sq[qnum];
-	int err;
+	int err, qsize = virtqueue_get_vring_size(sq->vq);
 	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
 	bool kick = !skb->xmit_more;
+	bool stopped;
+
+	virtqueue_disable_cb(sq->vq);
 
-	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(sq);
+	/* We are going to push one skb.
+	 * Try to pop one off to free space for it. */
+	free_old_xmit_skbs(sq, 1);
 
 	/* Try to transmit */
 	err = xmit_skb(sq, skb);
@@ -940,27 +983,25 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		return NETDEV_TX_OK;
 	}
 
-	/* Don't wait up for transmitted skbs to be freed. */
-	skb_orphan(skb);
-	nf_reset(skb);
-
 	/* Apparently nice girls don't return TX_BUSY; stop the queue
 	 * before it gets out of hand.  Naturally, this wastes entries. */
 	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
 		netif_stop_subqueue(dev, qnum);
-		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
-			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(sq);
-			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
-				netif_start_subqueue(dev, qnum);
-				virtqueue_disable_cb(sq->vq);
-			}
-		}
+		stopped = true;
+	} else {
+		stopped = false;
 	}
 
 	if (kick || netif_xmit_stopped(txq))
 		virtqueue_kick(sq->vq);
 
+	if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
+		/* More just got used, free them then recheck. */
+		free_old_xmit_skbs(sq, qsize);
+		if (stopped && sq->vq->num_free >= 2+MAX_SKB_FRAGS)
+			netif_start_subqueue(dev, qnum);
+	}
+
 	return NETDEV_TX_OK;
 }
 
@@ -1137,8 +1178,10 @@ static int virtnet_close(struct net_device *dev)
 	/* Make sure refill_work doesn't re-enable napi! */
 	cancel_delayed_work_sync(&vi->refill);
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->max_queue_pairs; i++) {
 		napi_disable(&vi->rq[i].napi);
+		napi_disable(&vi->sq[i].napi);
+	}
 
 	return 0;
 }
@@ -1457,8 +1500,10 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 {
 	int i;
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->max_queue_pairs; i++) {
 		netif_napi_del(&vi->rq[i].napi);
+		netif_napi_del(&vi->sq[i].napi);
+	}
 
 	kfree(vi->rq);
 	kfree(vi->sq);
@@ -1612,6 +1657,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
 		netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
 			       napi_weight);
 		napi_hash_add(&vi->rq[i].napi);
+		netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
+			       napi_weight);
 
 		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
 		ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT);
@@ -1916,8 +1963,10 @@ static int virtnet_freeze(struct virtio_device *vdev)
 	if (netif_running(vi->dev)) {
 		for (i = 0; i < vi->max_queue_pairs; i++) {
 			napi_disable(&vi->rq[i].napi);
+			napi_disable(&vi->sq[i].napi);
 			napi_hash_del(&vi->rq[i].napi);
 			netif_napi_del(&vi->rq[i].napi);
+			netif_napi_del(&vi->sq[i].napi);
 		}
 	}
 
@@ -1942,8 +1991,10 @@ static int virtnet_restore(struct virtio_device *vdev)
 			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 
-		for (i = 0; i < vi->max_queue_pairs; i++)
+		for (i = 0; i < vi->max_queue_pairs; i++) {
 			virtnet_napi_enable(&vi->rq[i]);
+			napi_enable(&vi->sq[i].napi);
+		}
 	}
 
 	netif_device_attach(vi->dev);
-- 
MST

^ permalink raw reply related

* [PATCH RFC v2 2/3] virtio_net: bql
From: Michael S. Tsirkin @ 2014-10-15 14:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, virtualization
In-Reply-To: <1413383116-30977-1-git-send-email-mst@redhat.com>

Improve tx batching using byte queue limits.
Should be especially effective for MQ.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a9bf178..8dea411 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -219,13 +219,15 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
 }
 
-static unsigned int free_old_xmit_skbs(struct send_queue *sq, int budget)
+static unsigned int free_old_xmit_skbs(struct netdev_queue *txq,
+				       struct send_queue *sq, int budget)
 {
 	struct sk_buff *skb;
 	unsigned int len;
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
 	unsigned int packets = 0;
+	unsigned int bytes = 0;
 
 	while (packets < budget &&
 	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
@@ -233,6 +235,7 @@ static unsigned int free_old_xmit_skbs(struct send_queue *sq, int budget)
 
 		u64_stats_update_begin(&stats->tx_syncp);
 		stats->tx_bytes += skb->len;
+		bytes += skb->len;
 		stats->tx_packets++;
 		u64_stats_update_end(&stats->tx_syncp);
 
@@ -240,6 +243,8 @@ static unsigned int free_old_xmit_skbs(struct send_queue *sq, int budget)
 		packets++;
 	}
 
+	netdev_tx_completed_queue(txq, packets, bytes);
+
 	return packets;
 }
 
@@ -810,7 +815,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 again:
 	__netif_tx_lock(txq, smp_processor_id());
 	virtqueue_disable_cb(sq->vq);
-	sent += free_old_xmit_skbs(sq, budget - sent);
+	sent += free_old_xmit_skbs(txq, sq, budget - sent);
 
 	if (sent < budget) {
 		enable_done = virtqueue_enable_cb_delayed(sq->vq);
@@ -962,12 +967,13 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
 	bool kick = !skb->xmit_more;
 	bool stopped;
+	unsigned int bytes = skb->len;
 
 	virtqueue_disable_cb(sq->vq);
 
 	/* We are going to push one skb.
 	 * Try to pop one off to free space for it. */
-	free_old_xmit_skbs(sq, 1);
+	free_old_xmit_skbs(txq, sq, 1);
 
 	/* Try to transmit */
 	err = xmit_skb(sq, skb);
@@ -983,6 +989,12 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		return NETDEV_TX_OK;
 	}
 
+	netdev_tx_sent_queue(txq, bytes);
+
+	/* Kick early so device can process descriptors in parallel with us. */
+	if (kick)
+		virtqueue_kick(sq->vq);
+
 	/* Apparently nice girls don't return TX_BUSY; stop the queue
 	 * before it gets out of hand.  Naturally, this wastes entries. */
 	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
@@ -997,7 +1009,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
 		/* More just got used, free them then recheck. */
-		free_old_xmit_skbs(sq, qsize);
+		free_old_xmit_skbs(txq, sq, qsize);
 		if (stopped && sq->vq->num_free >= 2+MAX_SKB_FRAGS)
 			netif_start_subqueue(dev, qnum);
 	}
-- 
MST

^ permalink raw reply related

* [PATCH RFC v2 3/3] virtio-net: optimize free_old_xmit_skbs stats
From: Michael S. Tsirkin @ 2014-10-15 14:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jason Wang, Rusty Russell, virtualization, netdev
In-Reply-To: <1413383116-30977-1-git-send-email-mst@redhat.com>

From: Jason Wang <jasowang@redhat.com>

We already have counters for sent packets and sent bytes.
Use them to reduce the number of u64_stats_update_begin/end().

Take care not to bother with stats update when called
speculatively.

Based on a patch by Jason Wang.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8dea411..4e12023 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -233,16 +233,22 @@ static unsigned int free_old_xmit_skbs(struct netdev_queue *txq,
 	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		pr_debug("Sent skb %p\n", skb);
 
-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
 		bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
+		packets++;
 
 		dev_kfree_skb_any(skb);
-		packets++;
 	}
 
+	/* Avoid overhead when no packets have been processed
+	 * happens when called speculatively from start_xmit. */
+	if (!packets)
+		return 0;
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += bytes;
+	stats->tx_packets += packets;
+	u64_stats_update_end(&stats->tx_syncp);
+
 	netdev_tx_completed_queue(txq, packets, bytes);
 
 	return packets;
-- 
MST

^ permalink raw reply related

* Re: bnx2: can't disable rx-vlan-offload?
From: Vlad Yasevich @ 2014-10-15 14:37 UTC (permalink / raw)
  To: Lukas Tribus, Sony Chacko, Dept-HSGLinuxNICDev@qlogic.com
  Cc: netdev@vger.kernel.org
In-Reply-To: <DUB123-W35654C3B747AEBC067DAF8EDAD0@phx.gbl>

On 10/14/2014 06:05 PM, Lukas Tribus wrote:
> Hi!
> 
> 
> For a passive, libpcap based sniffer I'm trying to disable rx vlan
> offload on a bnx2 card (vlan capture filter in libpcap doesn't work
> correctly), but I'm not able to:
> 
> disabling rxvlan fails:
> 
> root@sniffer:~# ethtool -K eth1 rxvlan off
> Could not change any device features
> root@sniffer:~#
> root@sniffer:~# ethtool -k eth1
> Features for eth1:
> rx-checksumming: on
> tx-checksumming: on
>         tx-checksum-ipv4: on
>         tx-checksum-ip-generic: off [fixed]
>         tx-checksum-ipv6: on
>         tx-checksum-fcoe-crc: off [fixed]
>         tx-checksum-sctp: off [fixed]
> scatter-gather: on
>         tx-scatter-gather: on
>         tx-scatter-gather-fraglist: off [fixed]
> tcp-segmentation-offload: on
>         tx-tcp-segmentation: on
>         tx-tcp-ecn-segmentation: on
>         tx-tcp6-segmentation: on
> udp-fragmentation-offload: off [fixed]
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: off [fixed]
> rx-vlan-offload: on [requested off]
> tx-vlan-offload: on
> ntuple-filters: off [fixed]
> receive-hashing: on
> highdma: on [fixed]
> rx-vlan-filter: off [fixed]
> vlan-challenged: off [fixed]
> tx-lockless: off [fixed]
> netns-local: off [fixed]
> tx-gso-robust: off [fixed]
> tx-fcoe-segmentation: off [fixed]
> tx-gre-segmentation: off [fixed]
> tx-ipip-segmentation: off [fixed]
> tx-sit-segmentation: off [fixed]
> tx-udp_tnl-segmentation: off [fixed]
> tx-mpls-segmentation: off [fixed]
> fcoe-mtu: off [fixed]
> tx-nocache-copy: on
> loopback: off [fixed]
> rx-fcs: off [fixed]
> rx-all: off [fixed]
> tx-vlan-stag-hw-insert: off [fixed]
> rx-vlan-stag-hw-parse: off [fixed]
> rx-vlan-stag-filter: off [fixed]
> l2-fwd-offload: off [fixed]
> root@sniffer:~#
> root@sniffer:~# ethtool -i eth1
> driver: bnx2
> version: 2.2.4
> firmware-version: 5.0.9 bc 5.0.6 NCSI 2.0.3
> bus-info: 0000:01:00.1
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: no
> root@sniffer:~#
> 
> dmesg bnx2 lines:
> [    6.974972] bnx2: Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.2.4 (Aug 05, 2013)
> [    6.975681] bnx2 0000:01:00.0 eth0: Broadcom NetXtreme II BCM5716 1000Base-T (C0) PCI Express found at mem da000000, IRQ 36, node addr 00:26:b9:51:ba:59
> [    6.976631] bnx2 0000:01:00.1 eth1: Broadcom NetXtreme II BCM5716 1000Base-T (C0) PCI Express found at mem dc000000, IRQ 48, node addr 00:26:b9:51:ba:5a
> 
> 
> I'm using Ubuntus 3.13 kernel and ethtool version 3.13, but I previously
> tried ethtool 3.1 on both ubuntu 3.8 kernel and a vanilla 3.14.21
> kernel as well.
> 
> 
> Any ideas or workarounds that may help disabling rx vlan acceleration?

Try bringing the interface down first.  BNX2 is a bit strange in how it handles
vlan feature changes.

-vlad

> 
> 
> 
> 
> Thanks!
> 
> Lukas
> 
> 
>  		 	   		  --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: feature suggestion: implement SO_PEERCRED on local AF_INET/AF_INET6 sockets (allow uid-based identification on localhost)
From: Andy Lutomirski @ 2014-10-15 14:41 UTC (permalink / raw)
  To: David Madore, Linux Kernel mailing-list, netdev
In-Reply-To: <20141015133518.GA7179@achernar.madore.org>

On 10/15/2014 06:35 AM, David Madore wrote:
> Given an AF_UNIX socket, the getsockopt(, SOL_SOCKET, SO_PEERCRED,,)
> call allows one endpoint to authenticate the other endpoint's pid, uid
> and gid.
> 
> The call is valid on AF_INET and AF_INET6 sockets but returns no data
> (pid=0, uid=-1, gid=-1).  Obviously it is meaningless to try to get
> such credentials from a INET/INET6 socket in general, but there is one
> case where it would make sense: namely, when the endpoint is local
> (i.e., when the socket is a connection to the same machine, e.g., when
> connecting to 127.0.0.0/8 or ::1/32).
> 
> Being able to authenticate local INET/INET6 sockets would be immensely
> valuable for a number of programs, to provide some kind of access
> control to local sockets.  For example, ssh allows port forwarding
> using the -L and -D options: by default or by option (cf. the
> GatewayPorts option of ssh), these port-forwarding sockets can be
> restricted to localhost, but of course they cannot be restricted to
> the user running ssh, which makes them a huge security problem.  Many
> programs suffer from the same problem (they restrict some kind of
> connection to localhost, but they of course cannot make a restriction
> on which user will be able to connect).
> 
> One cannot simply retort "these programs should be using Unix-domain
> sockets instead": I don't think many browsers support using a SOCKS
> proxy or an HTTP proxy over a Unix-domain socket, and in the latter
> case I'm not even sure it would make sense (protocol-wise).
> 
> If I believe <URL:
> http://www.lehman.cuny.edu/cgi-bin/man-cgi?getpeerucred+3
>  > ("The system currently supports both sides of connection end-points
> for local AF_UNIX, AF_INET, and AF_INET6 sockets"), Solaris, or at
> least some version thereof, support authentication of local AF_INET
> and AF_INET6 sockets.
> 
> I think it would be wonderful if Linux had this.  I'm willing to work
> on the implementation if it is considered *a priori* acceptable for
> inclusion.
> 
> The data seems to be available, since it is exposed in /proc/net/tcp
> and /proc/net/tcp6 and whatnots (implementation details left aside, it
> is merely a question of matching a line with opposite endpoints to the
> current socket and returning it).
> 
> [In principle, a userland program can parse /proc/net/tcp so it does
> not need the feature I am suggesting, but in practice parsing a text
> file to communicate with the kernel is yucky at best, and probably not
> very robust (e.g., /proc might not be mounted), and it would be very
> difficult to convince, say, the OpenSSH authors to include code that
> parses the Linux /proc/net/tcp format (or even link with a library
> which does this) in order to add access-control on ssh port-forwards:
> having this under a more standard getsockpot() interface is cleaner
> and opens at least some kind of hope that programs would agree to use
> it.]
> 
> Question number 1: If this feature were implemented, would it be
> considered acceptable for inclusion in the kernel?  (If there is some
> reason why it can't be accepted, I'd like to know in advance, to avoid
> working in vain.)

I will object to adding it as described, for the same reason that I
object to anything that extends the current model of socket-based
credential passing.  Ideally, credentials would *never* be implicitly
captured by socket syscalls.  We live in the real world, and SO_****CRED
exists, so I think the best we can do is to try to minimize its use.

I can elaborate further, or you can IIRC search the archives for
SCM_IDENTITY, and you can also look at CVE-2013-1979 for a nasty example
of why this model is broken.

That being said, if anyone ever finished the SCM_IDENTITY work, I think
I'd be okay with allowing it over TCP.  I can't speak for the network
people, though, and you should ask on netdev (cc'd).

> 
> Question number 2: A priori, how difficult would it be to implement
> this?  (As mentioned above, it seems trivial in principle to merely go
> through the local endpoints to find a matching connection, but maybe
> there are locking issues that I don't understand that make it much
> more difficult than it would seem.)  Any guidelines on implementation?
> (I imagine one should try to fill sk->sk_peer_cred at connect time,
> but I don't really know how difficult this might turn out.)

Dunno.

--Andy

^ permalink raw reply

* (unknown)
From: Microsoft Promotion @ 2014-10-15 15:17 UTC (permalink / raw)

In-Reply-To: <723568878.141830.1413385719263.JavaMail.yahoo@jws10091.mail.ne1.yahoo.com>

[-- Attachment #1: Type: text/plain, Size: 14 bytes --]

Congratulation

[-- Attachment #2: MICROSOFT AWARD PROMOTION.doc --]
[-- Type: application/msword, Size: 53248 bytes --]

^ permalink raw reply

* Re: something is wrong in commit 971f10eca1 - tcp: better TCP_SKB_CB layout to reduce cache line misses
From: Eric Dumazet @ 2014-10-15 15:32 UTC (permalink / raw)
  To: Krzysztof Kolasa; +Cc: Cong Wang, David Miller, Eric Dumazet, netdev
In-Reply-To: <543E5D1F.2020207@winsoft.pl>

On Wed, 2014-10-15 at 13:40 +0200, Krzysztof Kolasa wrote:

> on a 32bit system, the patch did not solve the problem :(
> I have exactly the same problem as before the patch


You mean Cong patch ?

> 
> I do not understand this, perhaps the problem is hidden somewhere else,
> one thing is certain after revert commit 971f10eca1 everything works 
> correctly
> 

Can you privately send me the vmlinux file ?

It might be a compiler issue...

^ permalink raw reply

* Re: [PATCH (net.git) 1/1] stmmac: fix sti compatibililies
From: David Miller @ 2014-10-15 15:46 UTC (permalink / raw)
  To: peppe.cavallaro; +Cc: netdev, sfr, linux-kernel, linux-next
In-Reply-To: <1413351041-11369-1-git-send-email-peppe.cavallaro@st.com>

From: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Date: Wed, 15 Oct 2014 07:30:41 +0200

> this patch is to fix the stmmac data compatibilities for
> all the SoCs inside the platform file.
> 
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

Please in the future provide a proper "Reported-by: " tag for people
like Stephen who let you know about the problem.

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] net: use hardware buffer pool to allocate skb
From: David Miller @ 2014-10-15 15:51 UTC (permalink / raw)
  To: Jiafei.Pan; +Cc: jkosina, netdev, LeoLi, linux-doc
In-Reply-To: <c02005b9d8794d14bc2149f468ab571c@BY2PR03MB524.namprd03.prod.outlook.com>

From: "Jiafei.Pan@freescale.com" <Jiafei.Pan@freescale.com>
Date: Wed, 15 Oct 2014 05:34:24 +0000

> Yes, for this matter, in order to do this we should modify the Ethernet
> drivers. For example, driver A want to driver B, C, D.. to support driver
> A's Hardware block access functions, so we have to modify driver B, C, D...
> It will be so complex for this matter.

Experience says otherwise.  It's three or four lines of code to attach
a page to an SKB frag.

The driver needs it's own buffer management and setup code anyways,
and no generic facility will replace that.

I think your precondition for these changes therefore doesn't really
exist.

Please, look over all of the drivers that exist already in the tree
and build SKBs using page frage from hardware device buffer pools.

You have to show us that all of those drivers can make use of your
facility.

^ permalink raw reply

* Re: [PATCH 00/12] dmaengine: remove users of device_control
From: Vinod Koul @ 2014-10-15 15:58 UTC (permalink / raw)
  To: dmaengine
  Cc: Viresh Kumar, Tejun Heo, Linus Walleij, Dan Williams,
	Guennadi Liakhovetski, Mauro Carvalho Chehab, David Woodhouse,
	Brian Norris, Nicolas Ferre, Mark Brown, Greg Kroah-Hartman,
	Jiri Slaby, Felipe Balbi, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Liam Girdwood, Jaroslav Kysela, Takashi Iwai,
	Jingoo Han, Bartlomiej Zolnierkiewicz, Laurent Pinchart,
	"Rafael J. Wysocki" <ra
In-Reply-To: <1413041973-28146-1-git-send-email-vinod.koul@intel.com>

On Sat, Oct 11, 2014 at 09:09:33PM +0530, Vinod Koul wrote:
> The recent discussion [1] on the API have resulted in moving away from
> device_control ioctl method to proper channel APIs.
> There are still few users on the device_control which should use the wrappers
> existing rather than access device_control.
> This will aid us in deprecating and removing device_control, possibly after
> the merge window.
> 
> These can be merged thru respective subsystem tree or dmaengine tree. Either
> way please just let me know.

Applying to dmaengine-next with acks recived, dropping the ones which are
applied by respective folks

-- 
~Vinod

> 
> Feng's kbuild has tested these as well [2]
> 
> [1]: http://www.spinics.net/lists/dmaengine/msg02212.html
> [2]: http://git.infradead.org/users/vkoul/slave-dma.git/shortlog/refs/heads/topic/dma_control_cleanup
> 
> Vinod Koul (12):
>   pata_arasan_cf: use dmaengine_terminate_all() API
>   dmaengine: coh901318: use dmaengine_terminate_all() API
>   [media] V4L2: mx3_camer: use dmaengine_pause() API
>   mtd: fsmc_nand: use dmaengine_terminate_all() API
>   mtd: sh_flctl: use dmaengine_terminate_all() API
>   net: ks8842: use dmaengine_terminate_all() API
>   spi/atmel: use dmaengine_terminate_all() API
>   spi/spi-dw-mid.c: use dmaengine_slave_config() API
>   serial: sh-sci: use dmaengine_terminate_all() API
>   usb: musb: ux500_dma: use dmaengine_xxx() APIs
>   ASoC: txx9: use dmaengine_terminate_all() API
>   video: mx3fb: use dmaengine_terminate_all() API
> 
>  drivers/ata/pata_arasan_cf.c                   |    5 ++---
>  drivers/dma/coh901318.c                        |    2 +-
>  drivers/media/platform/soc_camera/mx3_camera.c |    6 ++----
>  drivers/mtd/nand/fsmc_nand.c                   |    2 +-
>  drivers/mtd/nand/sh_flctl.c                    |    2 +-
>  drivers/net/ethernet/micrel/ks8842.c           |    6 ++----
>  drivers/spi/spi-atmel.c                        |    6 ++----
>  drivers/spi/spi-dw-mid.c                       |    6 ++----
>  drivers/tty/serial/sh-sci.c                    |    2 +-
>  drivers/usb/musb/ux500_dma.c                   |    7 ++-----
>  drivers/video/fbdev/mx3fb.c                    |    3 +--
>  sound/soc/txx9/txx9aclc.c                      |    7 +++----
>  12 files changed, 20 insertions(+), 34 deletions(-)
> 
> --
> To unsubscribe from this list: send the line "unsubscribe dmaengine" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 

^ permalink raw reply

* Re: [PATCH v2] net: Add ndo_gso_check
From: David Miller @ 2014-10-15 16:11 UTC (permalink / raw)
  To: therbert; +Cc: netdev
In-Reply-To: <1413325146-2404-1-git-send-email-therbert@google.com>

From: Tom Herbert <therbert@google.com>
Date: Tue, 14 Oct 2014 15:19:06 -0700

> Add ndo_gso_check which a device can define to indicate whether is
> is capable of doing GSO on a packet. This funciton would be called from
> the stack to determine whether software GSO is needed to be done. A
> driver should populate this function if it advertises GSO types for
> which there are combinations that it wouldn't be able to handle. For
> instance a device that performs UDP tunneling might only implement
> support for transparent Ethernet bridging type of inner packets
> or might have limitations on lengths of inner headers.
> 
> Signed-off-by: Tom Herbert <therbert@google.com>

Applied, thanks Tom.

^ permalink raw reply

* [net] gianfar: Add FCS to rx buffer size (fix)
From: Claudiu Manoil @ 2014-10-15 16:11 UTC (permalink / raw)
  To: netdev; +Cc: Kristian Otnes, David S. Miller

For each Rx frame the eTSEC writes its FCS (Frame Check Sequence)
to the Rx buffer.

The eTSEC h/w manual states in the "Receive Buffer Descriptor Field
Descriptions" table:
"Data length is the number of octets written by the eTSEC into this BD's
data buffer if L is cleared (the value is equal to MRBLR), or, if L is
set, the length of the frame including *CRC*, FCB (if RCTRL[PRSDEP > 00),
preamble (if MACCFG2[PreAmRxEn]=1), time stamp (if RCTRL[TS] = 1) and
any padding (RCTRL[PAL])."

Though the FCS bytes are removed by the driver before passing the skb
to the net stack, the Rx buffer size computation does not currently
take into account the FCS bytes (4 bytes).
Because the Rx buffer size is multiple of 512 bytes, leaving out the
FCS is not a problem for the default MTU of 1500, as the Rx buffer size
is 1536 in this case.  However, for custom MTUs, where the difference
between the MTU size and the Rx buffer size is less, this can be a
problem as the computed Rx buffer size won't be enough to accomodate
the FCS for a received frame that is big enough (close to MTU size).
In such case the received frame is considered to be incomplete (L flag
not set in the RxBD status) and silently dropped.

Note that the driver does not currently support S/G on Rx, so it has to
compute its Rx buffer size based on the MTU of the device.

Reported-by: Kristian Otnes <kotnes@cisco.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
---
 drivers/net/ethernet/freescale/gianfar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 379b1a5..4fdf0aa 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -338,7 +338,7 @@ static void gfar_init_tx_rx_base(struct gfar_private *priv)
 
 static void gfar_rx_buff_size_config(struct gfar_private *priv)
 {
-	int frame_size = priv->ndev->mtu + ETH_HLEN;
+	int frame_size = priv->ndev->mtu + ETH_HLEN + ETH_FCS_LEN;
 
 	/* set this when rx hw offload (TOE) functions are being used */
 	priv->uses_rxfcb = 0;
-- 
1.7.11.7

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox