Netdev List
 help / color / mirror / Atom feed
* [PATCH net 0/4] ipv6 and related cleanup for cxgb4/cxgb4i
From: Anish Bhatt @ 2014-10-14 22:13 UTC (permalink / raw)
  To: netdev; +Cc: davem, hariprasad, leedom, kxie, manojmalviya, Anish Bhatt

This patch set removes some duplicated/extraneous code from cxgb4i, and guards
cxgb4 against compilation failure based on ipv6 tristate.

Anish Bhatt (4):
  cxgb4i : Remove duplicated code from cxgb4i, fucntionality already
    present in cxgb4
  cxgb4 : Fix build failure in cxgb4 when ipv6 is disabled/not in-built
  cxgb4i : All this code is only needed when IPV6 is enabled, disable
    when not required, also fixes -Wunused-function warning
  cxgb4i: Remove duplicate call to dst_neigh_lookup()

 drivers/net/ethernet/chelsio/Kconfig            |   2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |  15 +++
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c              | 146 ++----------------------
 drivers/scsi/cxgbi/libcxgbi.c                   |   2 +
 4 files changed, 26 insertions(+), 139 deletions(-)

-- 
2.1.2

^ permalink raw reply

* Re: [PATCH] net: ethernet: marvell: sky2.c: Cleaning up missing null-terminate in conjunction with strncpy
From: Rickard Strandqvist @ 2014-10-14 22:11 UTC (permalink / raw)
  To: David Miller
  Cc: Stephen Hemminger, Mirko Lindner, Network Development,
	linux-kernel@vger.kernel.org
In-Reply-To: <20140915.165620.1169777676117933052.davem@davemloft.net>

2014-09-15 22:56 GMT+02:00 David Miller <davem@davemloft.net>:
> From: Stephen Hemminger <stephen@networkplumber.org>
> Date: Mon, 15 Sep 2014 13:53:39 -0700
>
>> On Mon, 15 Sep 2014 13:07:21 -0400 (EDT)
>> David Miller <davem@davemloft.net> wrote:
>>
>>> From: Stephen Hemminger <stephen@networkplumber.org>
>>> Date: Sun, 14 Sep 2014 19:05:57 -0700
>>>
>>> > On Sun, 14 Sep 2014 19:33:43 +0200
>>> > Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> wrote:
>>> >
>>> >> Replacing strncpy with strlcpy to avoid strings that lacks null terminate.
>>> >>
>>> >> Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
>>> >> ---
>>> >>  drivers/net/ethernet/marvell/sky2.c |    2 +-
>>> >>  1 file changed, 1 insertion(+), 1 deletion(-)
>>> >>
>>> >> diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
>>> >> index dba48a5c..7053d38 100644
>>> >> --- a/drivers/net/ethernet/marvell/sky2.c
>>> >> +++ b/drivers/net/ethernet/marvell/sky2.c
>>> >> @@ -4907,7 +4907,7 @@ static const char *sky2_name(u8 chipid, char *buf, int sz)
>>> >>   };
>>> >>
>>> >>   if (chipid >= CHIP_ID_YUKON_XL && chipid <= CHIP_ID_YUKON_OP_2)
>>> >> -         strncpy(buf, name[chipid - CHIP_ID_YUKON_XL], sz);
>>> >> +         strlcpy(buf, name[chipid - CHIP_ID_YUKON_XL], sz);
>>> >>   else
>>> >>           snprintf(buf, sz, "(chip %#x)", chipid);
>>> >>   return buf;
>>> >
>>> > Useless and unnecessary since the list of names is right there.
>>> > Why not avoid the copy all together?
>>> >
>>> > Subject: sky2: avoid strncpy
>>> >
>>> > Don't use strncpy() since security thought police think it is bad.
>>> >
>>> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>>
>>> I think providing the buffer on the stack of the thread executing the
>>> probe is superior because it will allow enabling parallel probing
>>> in the future.
>>>
>>> I don't think you have to change that aspect to achieve your goal
>>> of returning the const char * string when possible.
>>
>> What is benefit of s/strncpy/strlcpy/ for known safe code?
>> Seems like more of the checkpatch police state.
>>
>
> Stephen, read my reply again.
>
> I didn't say to go back to the strlcpy change.
>
> I said to use your patch, but keep the buf[] parameter allocated on
> the caller's stack.
>
> Thanks.


Hi

Has not happened anything here ...

Stephen, how can this be "." Then you might as well use strcpy,
because the use here does not guarantee that the string will be null
terminated, anyway.


And David, as I understand you want to use code like this:

static const char *sky2_name(u8 chipid, char *buf, int sz)
{
....

    if (chipid >= CHIP_ID_YUKON_XL && chipid <= CHIP_ID_YUKON_OP_2)
        return name[chipid - CHIP_ID_YUKON_XL];

    snprintf(buf, sz, "(chip %#x)", chipid);

    return buf;
}

Or?


And we can all easily see how the code is used here.

So Stephen statement as true, sz = 16.
And David, since we only use the return value.

But is it really such we should assume always is true?

What if someone uses it like this:

char tmp[8];
...
sky2_name(id, tmp, sizeof(tmp));
printf("The chip: %s\n", tmp);



Kind regards
Rickard Strandqvist

^ permalink raw reply

* [PATCH RFC] virtio_net: enable tx interrupt
From: Michael S. Tsirkin @ 2014-10-14 21:53 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, virtualization

On newer hosts that support delayed tx interrupts,
we probably don't have much to gain from orphaning
packets early.

Based on patch by Jason Wang.

Note: this will likely degrade performance for hosts without event idx
support.  Various fallback options are available, including
orphaning conditionally.
Testing TBD.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c | 119 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 83 insertions(+), 36 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6b6e136..62c059d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -72,6 +72,8 @@ struct send_queue {
 
 	/* Name of the send queue: output.$index */
 	char name[40];
+
+	struct napi_struct napi;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -211,15 +213,38 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
 }
 
+static int free_old_xmit_skbs(struct send_queue *sq, int budget)
+{
+	struct sk_buff *skb;
+	unsigned int len;
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	int sent = 0;
+
+	while (sent < budget &&
+	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+		pr_debug("Sent skb %p\n", skb);
+
+		u64_stats_update_begin(&stats->tx_syncp);
+		stats->tx_bytes += skb->len;
+		stats->tx_packets++;
+		u64_stats_update_end(&stats->tx_syncp);
+
+		dev_kfree_skb_any(skb);
+		sent++;
+	}
+
+	return sent;
+}
+
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
+	struct send_queue *sq = &vi->sq[vq2txq(vq)];
 
-	/* Suppress further interrupts. */
-	virtqueue_disable_cb(vq);
-
-	/* We were probably waiting for more output buffers. */
-	netif_wake_subqueue(vi->dev, vq2txq(vq));
+	if (napi_schedule_prep(&sq->napi)) {
+		__napi_schedule(&sq->napi);
+	}
 }
 
 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -766,6 +791,37 @@ again:
 	return received;
 }
 
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct send_queue *sq =
+		container_of(napi, struct send_queue, napi);
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+	unsigned int r, sent = 0;
+
+again:
+	__netif_tx_lock(txq, smp_processor_id());
+	virtqueue_disable_cb(sq->vq);
+	sent += free_old_xmit_skbs(sq, budget - sent);
+
+	if (sent < budget) {
+		r = virtqueue_enable_cb_prepare(sq->vq);
+		napi_complete(napi);
+		__netif_tx_unlock(txq);
+		if (unlikely(virtqueue_poll(sq->vq, r)) &&
+		    napi_schedule_prep(napi)) {
+			virtqueue_disable_cb(sq->vq);
+			__napi_schedule(napi);
+			goto again;
+		}
+	} else {
+		__netif_tx_unlock(txq);
+	}
+
+	netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+	return sent;
+}
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
 /* must be called with local_bh_disable()d */
 static int virtnet_busy_poll(struct napi_struct *napi)
@@ -814,30 +870,12 @@ static int virtnet_open(struct net_device *dev)
 			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 		virtnet_napi_enable(&vi->rq[i]);
+		napi_enable(&vi->sq[i].napi);
 	}
 
 	return 0;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq)
-{
-	struct sk_buff *skb;
-	unsigned int len;
-	struct virtnet_info *vi = sq->vq->vdev->priv;
-	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
-
-	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		pr_debug("Sent skb %p\n", skb);
-
-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
-
-		dev_kfree_skb_any(skb);
-	}
-}
-
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 {
 	struct skb_vnet_hdr *hdr;
@@ -902,7 +940,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 		sg_set_buf(sq->sg, hdr, hdr_len);
 		num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
 	}
-	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
+
+	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb,
+				    GFP_ATOMIC);
 }
 
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -910,10 +950,9 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct virtnet_info *vi = netdev_priv(dev);
 	int qnum = skb_get_queue_mapping(skb);
 	struct send_queue *sq = &vi->sq[qnum];
-	int err;
+	int err, qsize = virtqueue_get_vring_size(sq->vq);
 
-	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(sq);
+	virtqueue_disable_cb(sq->vq);
 
 	/* Try to transmit */
 	err = xmit_skb(sq, skb);
@@ -930,22 +969,20 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 	virtqueue_kick(sq->vq);
 
-	/* Don't wait up for transmitted skbs to be freed. */
-	skb_orphan(skb);
-	nf_reset(skb);
-
 	/* Apparently nice girls don't return TX_BUSY; stop the queue
 	 * before it gets out of hand.  Naturally, this wastes entries. */
 	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
 		netif_stop_subqueue(dev, qnum);
 		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
 			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(sq);
+			free_old_xmit_skbs(sq, qsize);
 			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
 				netif_start_subqueue(dev, qnum);
 				virtqueue_disable_cb(sq->vq);
 			}
 		}
+	} else if (virtqueue_enable_cb_delayed(sq->vq)) {
+		free_old_xmit_skbs(sq, qsize);
 	}
 
 	return NETDEV_TX_OK;
@@ -1124,8 +1161,10 @@ static int virtnet_close(struct net_device *dev)
 	/* Make sure refill_work doesn't re-enable napi! */
 	cancel_delayed_work_sync(&vi->refill);
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->max_queue_pairs; i++) {
 		napi_disable(&vi->rq[i].napi);
+		napi_disable(&vi->sq[i].napi);
+	}
 
 	return 0;
 }
@@ -1438,8 +1477,10 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 {
 	int i;
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->max_queue_pairs; i++) {
 		netif_napi_del(&vi->rq[i].napi);
+		netif_napi_del(&vi->sq[i].napi);
+	}
 
 	kfree(vi->rq);
 	kfree(vi->sq);
@@ -1593,6 +1634,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
 		netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
 			       napi_weight);
 		napi_hash_add(&vi->rq[i].napi);
+		netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
+			       napi_weight);
 
 		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
 		ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT);
@@ -1893,8 +1936,10 @@ static int virtnet_freeze(struct virtio_device *vdev)
 	if (netif_running(vi->dev)) {
 		for (i = 0; i < vi->max_queue_pairs; i++) {
 			napi_disable(&vi->rq[i].napi);
+			napi_disable(&vi->sq[i].napi);
 			napi_hash_del(&vi->rq[i].napi);
 			netif_napi_del(&vi->rq[i].napi);
+			netif_napi_del(&vi->sq[i].napi);
 		}
 	}
 
@@ -1919,8 +1964,10 @@ static int virtnet_restore(struct virtio_device *vdev)
 			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 
-		for (i = 0; i < vi->max_queue_pairs; i++)
+		for (i = 0; i < vi->max_queue_pairs; i++) {
 			virtnet_napi_enable(&vi->rq[i]);
+			napi_enable(&vi->sq[i].napi);
+		}
 	}
 
 	netif_device_attach(vi->dev);
-- 
MST

^ permalink raw reply related

* Re: [PATCH net-next RFC 3/3] virtio-net: conditionally enable tx interrupt
From: Michael S. Tsirkin @ 2014-10-14 21:51 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, netdev, linux-kernel, virtualization, linux-api
In-Reply-To: <1413011806-3813-4-git-send-email-jasowang@redhat.com>

On Sat, Oct 11, 2014 at 03:16:46PM +0800, Jason Wang wrote:
> We free transmitted packets in ndo_start_xmit() in the past to get better
> performance in the past. One side effect is that skb_orphan() needs to be
> called in ndo_start_xmit() which makes sk_wmem_alloc not accurate in
> fact. For TCP protocol, this means several optimization could not work well
> such as TCP small queue and auto corking. This can lead extra low
> throughput of small packets stream.
> 
> Thanks to the urgent descriptor support. This patch tries to solve this
> issue by enable the tx interrupt selectively for stream packets. This means
> we don't need to orphan TCP stream packets in ndo_start_xmit() but enable
> tx interrupt for those packets. After we get tx interrupt, a tx napi was
> scheduled to free those packets.
> 
> With this method, sk_wmem_alloc of TCP socket were more accurate than in
> the past which let TCP can batch more through TSQ and auto corking.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/virtio_net.c | 164 ++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 128 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 5810841..b450fc4 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -72,6 +72,8 @@ struct send_queue {
>  
>  	/* Name of the send queue: output.$index */
>  	char name[40];
> +
> +	struct napi_struct napi;
>  };
>  
>  /* Internal representation of a receive virtqueue */
> @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
>  	return p;
>  }
>  
> +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
> +{
> +	struct sk_buff *skb;
> +	unsigned int len;
> +	struct virtnet_info *vi = sq->vq->vdev->priv;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +	int sent = 0;
> +
> +	while (sent < budget &&
> +	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> +		pr_debug("Sent skb %p\n", skb);
> +
> +		u64_stats_update_begin(&stats->tx_syncp);
> +		stats->tx_bytes += skb->len;
> +		stats->tx_packets++;
> +		u64_stats_update_end(&stats->tx_syncp);
> +
> +		dev_kfree_skb_any(skb);
> +		sent++;
> +	}
> +
> +	return sent;
> +}
> +
>  static void skb_xmit_done(struct virtqueue *vq)
>  {
>  	struct virtnet_info *vi = vq->vdev->priv;
> +	struct send_queue *sq = &vi->sq[vq2txq(vq)];
>  
> -	/* Suppress further interrupts. */
> -	virtqueue_disable_cb(vq);
> -
> -	/* We were probably waiting for more output buffers. */
> -	netif_wake_subqueue(vi->dev, vq2txq(vq));
> +	if (napi_schedule_prep(&sq->napi)) {
> +		virtqueue_disable_cb(vq);
> +		virtqueue_disable_cb_urgent(vq);

This disable_cb is no longer safe in xmit_done callback,
since queue can be running at the same time.

You must do it under tx lock. And yes, this likely will not work
work well without event_idx. We'll probably need extra
synchronization for such old hosts.



> +		__napi_schedule(&sq->napi);
> +	}
>  }
>  
>  static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
> @@ -772,7 +799,38 @@ again:
>  	return received;
>  }
>  
> +static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> +{
> +	struct send_queue *sq =
> +		container_of(napi, struct send_queue, napi);
> +	struct virtnet_info *vi = sq->vq->vdev->priv;
> +	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
> +	unsigned int r, sent = 0;
> +
> +again:
> +	__netif_tx_lock(txq, smp_processor_id());
> +	sent += free_old_xmit_skbs(sq, budget - sent);
> +
> +	if (sent < budget) {
> +		r = virtqueue_enable_cb_prepare_urgent(sq->vq);
> +		napi_complete(napi);
> +		__netif_tx_unlock(txq);
> +		if (unlikely(virtqueue_poll(sq->vq, r)) &&
> +		    napi_schedule_prep(napi)) {
> +			virtqueue_disable_cb_urgent(sq->vq);
> +			__napi_schedule(napi);
> +			goto again;
> +		}
> +	} else {
> +		__netif_tx_unlock(txq);
> +	}
> +
> +	netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
> +	return sent;
> +}
> +
>  #ifdef CONFIG_NET_RX_BUSY_POLL
> +
>  /* must be called with local_bh_disable()d */
>  static int virtnet_busy_poll(struct napi_struct *napi)
>  {
> @@ -820,31 +878,13 @@ static int virtnet_open(struct net_device *dev)
>  			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
>  				schedule_delayed_work(&vi->refill, 0);
>  		virtnet_napi_enable(&vi->rq[i]);
> +		napi_enable(&vi->sq[i].napi);
>  	}
>  
>  	return 0;
>  }
>  
> -static void free_old_xmit_skbs(struct send_queue *sq)
> -{
> -	struct sk_buff *skb;
> -	unsigned int len;
> -	struct virtnet_info *vi = sq->vq->vdev->priv;
> -	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> -
> -	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> -		pr_debug("Sent skb %p\n", skb);
> -
> -		u64_stats_update_begin(&stats->tx_syncp);
> -		stats->tx_bytes += skb->len;
> -		stats->tx_packets++;
> -		u64_stats_update_end(&stats->tx_syncp);
> -
> -		dev_kfree_skb_any(skb);
> -	}
> -}
> -
> -static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> +static int xmit_skb(struct send_queue *sq, struct sk_buff *skb, bool urgent)
>  {
>  	struct skb_vnet_hdr *hdr;
>  	const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
> @@ -908,7 +948,43 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>  		sg_set_buf(sq->sg, hdr, hdr_len);
>  		num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
>  	}
> -	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
> +	if (urgent)
> +		return virtqueue_add_outbuf_urgent(sq->vq, sq->sg, num_sg,
> +						   skb, GFP_ATOMIC);
> +	else
> +		return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb,
> +					    GFP_ATOMIC);
> +}
> +
> +static bool virtnet_skb_needs_intr(struct sk_buff *skb)
> +{
> +	union {
> +		unsigned char *network;
> +		struct iphdr *ipv4;
> +		struct ipv6hdr *ipv6;
> +	} hdr;
> +	struct tcphdr *th = tcp_hdr(skb);
> +	u16 payload_len;
> +
> +	hdr.network = skb_network_header(skb);
> +
> +	/* Only IPv4/IPv6 with TCP is supported */
> +	if ((skb->protocol == htons(ETH_P_IP)) &&
> +	    hdr.ipv4->protocol == IPPROTO_TCP) {
> +		payload_len = ntohs(hdr.ipv4->tot_len) - hdr.ipv4->ihl * 4 -
> +			      th->doff * 4;
> +	} else if ((skb->protocol == htons(ETH_P_IPV6) ||
> +		   hdr.ipv6->nexthdr == IPPROTO_TCP)) {
> +		payload_len = ntohs(hdr.ipv6->payload_len) - th->doff * 4;
> +	} else {
> +		return false;
> +	}
> +
> +	/* We don't want to dealy packet with PUSH bit and pure ACK packet */
> +	if (!th->psh && payload_len)
> +		return true;
> +
> +	return false;
>  }
>  
>  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> @@ -916,13 +992,15 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	struct virtnet_info *vi = netdev_priv(dev);
>  	int qnum = skb_get_queue_mapping(skb);
>  	struct send_queue *sq = &vi->sq[qnum];
> -	int err;
> +	bool urgent = virtnet_skb_needs_intr(skb);
> +	int err, qsize = virtqueue_get_vring_size(sq->vq);
>  
> +	virtqueue_disable_cb_urgent(sq->vq);
>  	/* Free up any pending old buffers before queueing new ones. */
> -	free_old_xmit_skbs(sq);
> +	free_old_xmit_skbs(sq, qsize);
>  
>  	/* Try to transmit */
> -	err = xmit_skb(sq, skb);
> +	err = xmit_skb(sq, skb, urgent);
>  
>  	/* This should not happen! */
>  	if (unlikely(err)) {
> @@ -935,22 +1013,26 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  		return NETDEV_TX_OK;
>  	}
>  
> -	/* Don't wait up for transmitted skbs to be freed. */
> -	skb_orphan(skb);
> -	nf_reset(skb);
> +	if (!urgent) {
> +		skb_orphan(skb);
> +		nf_reset(skb);
> +	}
>  
>  	/* Apparently nice girls don't return TX_BUSY; stop the queue
>  	 * before it gets out of hand.  Naturally, this wastes entries. */
>  	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
>  		netif_stop_subqueue(dev, qnum);
> +		virtqueue_disable_cb_urgent(sq->vq);
>  		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
>  			/* More just got used, free them then recheck. */
> -			free_old_xmit_skbs(sq);
> +			free_old_xmit_skbs(sq, qsize);
>  			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
>  				netif_start_subqueue(dev, qnum);
>  				virtqueue_disable_cb(sq->vq);
>  			}
>  		}
> +	} else if (virtqueue_enable_cb_urgent(sq->vq)) {
> +		free_old_xmit_skbs(sq, qsize);
>  	}
>  
>  	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
> @@ -1132,8 +1214,10 @@ static int virtnet_close(struct net_device *dev)
>  	/* Make sure refill_work doesn't re-enable napi! */
>  	cancel_delayed_work_sync(&vi->refill);
>  
> -	for (i = 0; i < vi->max_queue_pairs; i++)
> +	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		napi_disable(&vi->rq[i].napi);
> +		napi_disable(&vi->sq[i].napi);
> +	}
>  
>  	return 0;
>  }
> @@ -1452,8 +1536,10 @@ static void virtnet_free_queues(struct virtnet_info *vi)
>  {
>  	int i;
>  
> -	for (i = 0; i < vi->max_queue_pairs; i++)
> +	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		netif_napi_del(&vi->rq[i].napi);
> +		netif_napi_del(&vi->sq[i].napi);
> +	}
>  
>  	kfree(vi->rq);
>  	kfree(vi->sq);
> @@ -1607,6 +1693,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
>  		netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
>  			       napi_weight);
>  		napi_hash_add(&vi->rq[i].napi);
> +		netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
> +			       napi_weight);
>  
>  		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
>  		ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT);
> @@ -1912,8 +2000,10 @@ static int virtnet_freeze(struct virtio_device *vdev)
>  	if (netif_running(vi->dev)) {
>  		for (i = 0; i < vi->max_queue_pairs; i++) {
>  			napi_disable(&vi->rq[i].napi);
> +			napi_disable(&vi->sq[i].napi);
>  			napi_hash_del(&vi->rq[i].napi);
>  			netif_napi_del(&vi->rq[i].napi);
> +			netif_napi_del(&vi->sq[i].napi);
>  		}
>  	}
>  
> @@ -1938,8 +2028,10 @@ static int virtnet_restore(struct virtio_device *vdev)
>  			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
>  				schedule_delayed_work(&vi->refill, 0);
>  
> -		for (i = 0; i < vi->max_queue_pairs; i++)
> +		for (i = 0; i < vi->max_queue_pairs; i++) {
>  			virtnet_napi_enable(&vi->rq[i]);
> +			napi_enable(&vi->sq[i].napi);
> +		}
>  	}
>  
>  	netif_device_attach(vi->dev);
> -- 
> 1.8.3.1

^ permalink raw reply

* Re: [PATCH net-next RFC 0/3] virtio-net: Conditionally enable tx interrupt
From: Michael S. Tsirkin @ 2014-10-14 21:51 UTC (permalink / raw)
  To: David Miller; +Cc: kvm, netdev, linux-kernel, virtualization, linux-api
In-Reply-To: <20141014.145327.365091739350390288.davem@davemloft.net>

On Tue, Oct 14, 2014 at 02:53:27PM -0400, David Miller wrote:
> From: Jason Wang <jasowang@redhat.com>
> Date: Sat, 11 Oct 2014 15:16:43 +0800
> 
> > We free old transmitted packets in ndo_start_xmit() currently, so any
> > packet must be orphaned also there. This was used to reduce the overhead of
> > tx interrupt to achieve better performance. But this may not work for some
> > protocols such as TCP stream. TCP depends on the value of sk_wmem_alloc to
> > implement various optimization for small packets stream such as TCP small
> > queue and auto corking. But orphaning packets early in ndo_start_xmit()
> > disable such things more or less since sk_wmem_alloc was not accurate. This
> > lead extra low throughput for TCP stream of small writes.
> > 
> > This series tries to solve this issue by enable tx interrupts for all TCP
> > packets other than the ones with push bit or pure ACK. This is done through
> > the support of urgent descriptor which can force an interrupt for a
> > specified packet. If tx interrupt was enabled for a packet, there's no need
> > to orphan it in ndo_start_xmit(), we can free it tx napi which is scheduled
> > by tx interrupt. Then sk_wmem_alloc was more accurate than before and TCP
> > can batch more for small write. More larger skb was produced by TCP in this
> > case to improve both throughput and cpu utilization.
> > 
> > Test shows great improvements on small write tcp streams. For most of the
> > other cases, the throughput and cpu utilization are the same in the
> > past. Only few cases, more cpu utilization was noticed which needs more
> > investigation.
> > 
> > Review and comments are welcomed.
> 
> I think proper accounting and queueing (at all levels, not just TCP
> sockets) is more important than trying to skim a bunch of cycles by
> avoiding TX interrupts.
> 
> Having an event to free the SKB is absolutely essential for the stack
> to operate correctly.
> 
> And with virtio-net you don't even have the excuse of "the HW
> unfortunately doesn't have an appropriate TX event."
> 
> So please don't play games, and instead use TX interrupts all the
> time.  You can mitigate them in various ways, but don't turn them on
> selectively based upon traffic type, that's terrible.

This got me thinking: how about using virtqueue_enable_cb_delayed
for this mitigation?

It's pretty easy to implement - I'll send a proof of concept patch
separately.

^ permalink raw reply

* RE: [PATCH net-next,v2] hyperv: Add handling of IP header with option field in netvsc_set_hash()
From: Haiyang Zhang @ 2014-10-14 21:47 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, KY Srinivasan, olaf@aepfle.de,
	jasowang@redhat.com, linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org
In-Reply-To: <20141014.173717.1412701736631211553.davem@davemloft.net>



> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Tuesday, October 14, 2014 5:37 PM
> To: Haiyang Zhang
> Cc: netdev@vger.kernel.org; KY Srinivasan; olaf@aepfle.de;
> jasowang@redhat.com; linux-kernel@vger.kernel.org; driverdev-
> devel@linuxdriverproject.org
> Subject: Re: [PATCH net-next,v2] hyperv: Add handling of IP header with option
> field in netvsc_set_hash()
> 
> From: Haiyang Zhang <haiyangz@microsoft.com>
> Date: Tue, 14 Oct 2014 15:16:28 -0700
> 
> > In case that the IP header has optional field at the end, this patch
> > will get the port numbers after that field, and compute the hash.
> >
> > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> > Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
> 
> Instead of adding hack after hack after hack to your internal header parser, just
> use the generic flow dissector we already have in the kernel to fetch out the
> values you need.
> 
> __skb_flow_get_ports() etc.

Thanks. I will update the patch.

- Haiyang

^ permalink raw reply

* Re: something is wrong in commit 971f10eca1 - tcp: better TCP_SKB_CB layout to reduce cache line misses
From: Eric Dumazet @ 2014-10-14 21:43 UTC (permalink / raw)
  To: Cong Wang; +Cc: David Miller, Eric Dumazet, Krzysztof Kolasa, netdev
In-Reply-To: <CAHA+R7PyUZyjHPf-SJRiiOw=jQjJnXVaosYU7TuiVe6sQTC9Ww@mail.gmail.com>

On Tue, 2014-10-14 at 14:24 -0700, Cong Wang wrote:

> 
> Since we are still in merge window, I don't think we have to use
> a one-line fix for a bug introduced in this merge window.

You are clearly refactoring here. Its a nice cleanup.

If I was the maintainer, I would prefer the one line fix.

Then when net-next is open, you refactor.

As I said, I wont argue, do whatever you want.

Thanks

^ permalink raw reply

* Re: [PATCH net-next,v2] hyperv: Add handling of IP header with option field in netvsc_set_hash()
From: David Miller @ 2014-10-14 21:37 UTC (permalink / raw)
  To: haiyangz; +Cc: olaf, netdev, jasowang, driverdev-devel, linux-kernel
In-Reply-To: <1413324988-8863-1-git-send-email-haiyangz@microsoft.com>

From: Haiyang Zhang <haiyangz@microsoft.com>
Date: Tue, 14 Oct 2014 15:16:28 -0700

> In case that the IP header has optional field at the end, this patch will
> get the port numbers after that field, and compute the hash.
> 
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>

Instead of adding hack after hack after hack to your internal header
parser, just use the generic flow dissector we already have in the
kernel to fetch out the values you need.

__skb_flow_get_ports() etc.

^ permalink raw reply

* Re: [Patch net] rds: avoid calling sock_kfree_s() on allocation failure
From: Cong Wang @ 2014-10-14 21:27 UTC (permalink / raw)
  To: David Miller; +Cc: Cong Wang, netdev, rds-devel, chien.yen, Stephen Hemminger
In-Reply-To: <20141014.170311.2104285680542945400.davem@davemloft.net>

On Tue, Oct 14, 2014 at 2:03 PM, David Miller <davem@davemloft.net> wrote:
> From: Cong Wang <xiyou.wangcong@gmail.com>
> Date: Tue, 14 Oct 2014 12:35:08 -0700
>
>> From: Cong Wang <cwang@twopensource.com>
>>
>> It is okay to free a NULL pointer but not okay to mischarge the socket optmem
>> accounting. Compile test only.
>>
>> Reported-by: rucsoftsec@gmail.com
>> Cc: Chien Yen <chien.yen@oracle.com>
>> Cc: Stephen Hemminger <stephen@networkplumber.org>
>> Signed-off-by: Cong Wang <cwang@twopensource.com>
>> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>
> Applied, and I'm going to add the following bug check to the tree too.
>
> ====================
> [PATCH] net: Trap attempts to call sock_kfree_s() with a NULL pointer.
>
> Unlike normal kfree() it is never right to call sock_kfree_s() with
> a NULL pointer, because sock_kfree_s() also has the side effect of
> discharging the memory from the sockets quota.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>


Acked-by: Cong Wang <cwang@twopensource.com>

Sounds reasonable. It could catch more bugs similar to this one.

Thanks!

^ permalink raw reply

* Re: something is wrong in commit 971f10eca1 - tcp: better TCP_SKB_CB layout to reduce cache line misses
From: Cong Wang @ 2014-10-14 21:24 UTC (permalink / raw)
  To: David Miller; +Cc: Eric Dumazet, Krzysztof Kolasa, Eric Dumazet, netdev
In-Reply-To: <20141014.171559.1355616486999711590.davem@davemloft.net>

On Tue, Oct 14, 2014 at 2:15 PM, David Miller <davem@davemloft.net> wrote:
> From: Eric Dumazet <edumazet@gmail.com>
> Date: Tue, 14 Oct 2014 14:03:51 -0700
>
>> 2014-10-14 13:47 GMT-07:00 David Miller <davem@davemloft.net>:
>>>
>>>
>>> > Patch looks good, but seems a net-next candidate to me.
>>>
>>> I thought it is a potential fix for this corruption report?
>>
>>
>> Sure, but a one liner fix seemed more obvious, as it points to the problem.
>>
>> I do not particularly care, as long as we fixed the bug.


Since we are still in merge window, I don't think we have to use
a one-line fix for a bug introduced in this merge window.

>
> Ok, let's get testing feedback and then I'll look for a formal submission.

Yeah, there is no evidence to say my patch fixes the bug Krzysztof
reported before he tests it, I found it during code review.

Thanks!

^ permalink raw reply

* Re: [PATCH net] tcp: TCP Small Queues and strange attractors
From: David Miller @ 2014-10-14 21:16 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1413321018.17109.7.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 14 Oct 2014 14:10:18 -0700

> On Tue, 2014-10-14 at 15:33 -0400, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Mon, 13 Oct 2014 06:27:47 -0700
>> 
>> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> > index 8d4eac793700..4a7e97811d71 100644
>> > --- a/net/ipv4/tcp_output.c
>> > +++ b/net/ipv4/tcp_output.c
>> > @@ -839,26 +839,38 @@ void tcp_wfree(struct sk_buff *skb)
>> >  {
>>  ...
>> >  		local_irq_restore(flags);
>> > -	} else {
>> > -		sock_wfree(skb);
>> > +		return;
>> >  	}
>> > +out:
>> > +	sk_free(sk);
>> >  }
>> >  
>> 
>> Why do we need to release the socket here?
> 
> Thats because we had to keep a reference at the very beginning :
> 
> +       /* Keep one reference on sk_wmem_alloc.
> +        * Will be released by sk_free() from here or tcp_tasklet_func()
> +        */
> +       wmem = atomic_sub_return(skb->truesize - 1, &sk->sk_wmem_alloc);
> +
> 
> 
> If we find that we do not arm the tasklet (and keep the reference), we
> then can remove the last ref we kept.
> 
> sk_free() is essentially doing :
> 
> if (atomic_dec_and_test(&sk->sk_wmem_alloc))
>     __sk_free(sk)

Gotcha, now it makes sense.

Patch applied, thanks Eric.

^ permalink raw reply

* Re: something is wrong in commit 971f10eca1 - tcp: better TCP_SKB_CB layout to reduce cache line misses
From: David Miller @ 2014-10-14 21:15 UTC (permalink / raw)
  To: edumazet; +Cc: cwang, kkolasa, eric.dumazet, netdev
In-Reply-To: <CAFrEnejET5wer6S6qx13tMKWMshmRj+JnQeZL-zdPGcs__9NoQ@mail.gmail.com>

From: Eric Dumazet <edumazet@gmail.com>
Date: Tue, 14 Oct 2014 14:03:51 -0700

> 2014-10-14 13:47 GMT-07:00 David Miller <davem@davemloft.net>:
>>
>>
>> > Patch looks good, but seems a net-next candidate to me.
>>
>> I thought it is a potential fix for this corruption report?
> 
> 
> Sure, but a one liner fix seemed more obvious, as it points to the problem.
> 
> I do not particularly care, as long as we fixed the bug.

Ok, let's get testing feedback and then I'll look for a formal submission.

^ permalink raw reply

* Re: [PATCH RFC net-next] sfc: add support for skb->xmit_more
From: David Miller @ 2014-10-14 21:15 UTC (permalink / raw)
  To: ecree; +Cc: dborkman, nikolay, netdev, sshah, jcooper, linux-net-drivers
In-Reply-To: <alpine.LFD.2.03.1410141933500.26972@solarflare.com>

From: Edward Cree <ecree@solarflare.com>
Date: Tue, 14 Oct 2014 19:41:37 +0100

> Don't ring the doorbell, and don't do PIO.  This will also prevent
>  TX Push, because there will be more than one buffer waiting when
>  the doorbell is rung.
> 
> Signed-off-by: Edward Cree <ecree@solarflare.com>

This looks good to me, mind if I apply this now?

^ permalink raw reply

* Re: [PATCH net] tcp: TCP Small Queues and strange attractors
From: Eric Dumazet @ 2014-10-14 21:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20141014.153308.2221538197341630790.davem@davemloft.net>

On Tue, 2014-10-14 at 15:33 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 13 Oct 2014 06:27:47 -0700
> 
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index 8d4eac793700..4a7e97811d71 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -839,26 +839,38 @@ void tcp_wfree(struct sk_buff *skb)
> >  {
>  ...
> >  		local_irq_restore(flags);
> > -	} else {
> > -		sock_wfree(skb);
> > +		return;
> >  	}
> > +out:
> > +	sk_free(sk);
> >  }
> >  
> 
> Why do we need to release the socket here?

Thats because we had to keep a reference at the very beginning :

+       /* Keep one reference on sk_wmem_alloc.
+        * Will be released by sk_free() from here or tcp_tasklet_func()
+        */
+       wmem = atomic_sub_return(skb->truesize - 1, &sk->sk_wmem_alloc);
+


If we find that we do not arm the tasklet (and keep the reference), we
then can remove the last ref we kept.

sk_free() is essentially doing :

if (atomic_dec_and_test(&sk->sk_wmem_alloc))
    __sk_free(sk)

sock_wfree() is the same, its only that we know that TCP socket do not
pass this test :

if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE)) {
   sk->sk_write_space(sk);
}

^ permalink raw reply

* Re: [PATCH net 0/2] qlcnic: Bug fixes
From: David Miller @ 2014-10-14 21:06 UTC (permalink / raw)
  To: rajesh.borundia; +Cc: netdev, Dept-HSGLinuxNICDev
In-Reply-To: <1413286906-14220-1-git-send-email-rajesh.borundia@qlogic.com>

From: Rajesh Borundia <rajesh.borundia@qlogic.com>
Date: Tue, 14 Oct 2014 07:41:44 -0400

> This series fixes following issues.
> 
> * We were programming maximum number of arguments supported by
>   adapter instead of required in a command.
> * Destroy tx command requires three arguments instead of two.
> 
> Please apply these patches to net.

Series applied, thanks.

^ permalink raw reply

* Re: [net] genl_magic: Resolve logical-op warnings
From: David Miller @ 2014-10-14 21:04 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: mark.d.rustad, linux-kernel, johannes.berg, netdev
In-Reply-To: <1413293318-9070-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 14 Oct 2014 06:28:38 -0700

> From: Mark Rustad <mark.d.rustad@intel.com>
> 
> Resolve "logical 'and' applied to non-boolean constant" warnings"
> that appear in W=2 builds by adding !! to a bit test.
> 
> Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied, thanks Jeff.

^ permalink raw reply

* Re: something is wrong in commit 971f10eca1 - tcp: better TCP_SKB_CB layout to reduce cache line misses
From: Eric Dumazet @ 2014-10-14 21:03 UTC (permalink / raw)
  To: David Miller; +Cc: cwang, kkolasa, Eric Dumazet, netdev
In-Reply-To: <20141014.164705.1874094084240216813.davem@davemloft.net>

2014-10-14 13:47 GMT-07:00 David Miller <davem@davemloft.net>:
>
>
> > Patch looks good, but seems a net-next candidate to me.
>
> I thought it is a potential fix for this corruption report?


Sure, but a one liner fix seemed more obvious, as it points to the problem.

I do not particularly care, as long as we fixed the bug.

Thanks !

^ permalink raw reply

* Re: [Patch net] rds: avoid calling sock_kfree_s() on allocation failure
From: David Miller @ 2014-10-14 21:03 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, rds-devel, chien.yen, stephen, cwang
In-Reply-To: <1413315308-32288-1-git-send-email-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Tue, 14 Oct 2014 12:35:08 -0700

> From: Cong Wang <cwang@twopensource.com>
> 
> It is okay to free a NULL pointer but not okay to mischarge the socket optmem
> accounting. Compile test only.
> 
> Reported-by: rucsoftsec@gmail.com
> Cc: Chien Yen <chien.yen@oracle.com>
> Cc: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Cong Wang <cwang@twopensource.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied, and I'm going to add the following bug check to the tree too.

====================
[PATCH] net: Trap attempts to call sock_kfree_s() with a NULL pointer.

Unlike normal kfree() it is never right to call sock_kfree_s() with
a NULL pointer, because sock_kfree_s() also has the side effect of
discharging the memory from the sockets quota.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/core/sock.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/sock.c b/net/core/sock.c
index b4f3ea2..15e0c67 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1718,6 +1718,8 @@ EXPORT_SYMBOL(sock_kmalloc);
  */
 void sock_kfree_s(struct sock *sk, void *mem, int size)
 {
+	if (WARN_ON_ONCE(!mem))
+		return;
 	kfree(mem);
 	atomic_sub(size, &sk->sk_omem_alloc);
 }
-- 
1.7.11.7

^ permalink raw reply related

* Re: Fwd: micrel: ksz8051 badly detected as ksz8031
From: Florian Fainelli @ 2014-10-14 20:56 UTC (permalink / raw)
  To: Angelo Dureghello, netdev@vger.kernel.org
In-Reply-To: <543D7A84.4030202@gmail.com>

On 10/14/2014 12:33 PM, Angelo Dureghello wrote:
> Hi Florian,
> 
>> On 10/14/2014 10:24 AM, Angelo Dureghello wrote:
>>> Dear,
>>>
>>> have to apologize for the confusion, previous patch is not the proper
>>> fix,
>>> since it is not solving completely the issue.
>>>
>>> And also, i mainly misunderstood the issue.
>>>
>>> The issue i am experiencing is :
>>>
>>> https://lkml.org/lkml/2013/9/18/259
>>>
>>> Mainly, i have Micrel chip marked KSZ8051(RNL), but the product Id in
>>> the
>>> silicon is KSZ8031 and linux detects it as KSZ8031.
>>> The attmept to mdio boot override register kill the Micrel
>>> functionality.
>> Ok, so basically your bootloader does something that Linux does, and
>> once Linux boots, it will reset the PHY to put it in a known state. If
>> you can snoop the MDIO read/writes done in your bootloader environment,
>> that might help narrow down the issue.
>>
> Bootloader is u-boot and seems it uses generic PHY setup, and so it works.
> 
> Linux at boot detects the phy and does a soft_reset. If the detection
> sets the
> driver as for KSZ8031, ethernet/link will not work, becouse micrel.c uses
> the incorrect config_init function, attempts to write to the bootstrap
> override register, that can't be written for KSZ8051, and so puts the
> micrel
> chip in a broken state.
> 
> The guy in this link (https://lkml.org/lkml/2013/9/18/259) seems are
> discussing a
> better solution.
> 
> I patched my linux as below, but this is a fast fixup for me:

You could register a PHY fixup in your board code that does exactly what
you are you doing here, except that you would match 0x00221556,
something along those lines:

static int micrel_ipam390_phy_fixup(struct phy_device *dev)
{
       if (dev->phy_id == 0x00221556)
		dev->phy_id = 0x00221550;

        return 0;
}

int __init micrel_ipam390_phy_fixup_register(void)
{
        return phy_register_fixup_for_id(PHY_ANY_ID, micrel_ipam390);
}

> 
> diff -rupN drivers/net/phy/phy_device.c
> ../linux-3.17/drivers/net/phy/phy_device.c
> --- drivers/net/phy/phy_device.c        2014-10-14 21:05:56.191117190 +0200
> +++ ../linux-3.17/drivers/net/phy/phy_device.c  2014-10-05
> 21:23:04.000000000 +0200
> @@ -310,19 +310,6 @@ static int get_phy_id(struct mii_bus *bu
> 
>         *phy_id |= (phy_reg & 0xffff);
> 
> -       /*
> -        * Angelo - Barix
> -        * Micrel produced chips marked KSZ8051 but with KSZ8031 id code
> -        * in the silicon. After getting crazy to understand why in
> recent kernel
> -        * the ethenret was not workeing, i find it out.
> -        *
> -        * From the schematic, we assume to use KSZ8051
> -        * I hardcode the fix here for ipam390.
> -        */
> -#if CONFIG_MACH_BARIX_IPAM390
> -       if (*phy_id == 0x00221556) *phy_id = 0x00221550;
> -#endif
> -
>         return 0;
>  }
> 
> Regards
> angelo
> 

^ permalink raw reply

* Re: [PATCH net] cxgb4: Fix FW flash logic using ethtool
From: David Miller @ 2014-10-14 20:55 UTC (permalink / raw)
  To: hariprasad; +Cc: netdev, leedom, kumaras, nirranjan, santosh, anish
In-Reply-To: <1413318254-23813-1-git-send-email-hariprasad@chelsio.com>

From: Hariprasad Shenai <hariprasad@chelsio.com>
Date: Wed, 15 Oct 2014 01:54:14 +0530

> Use t4_fw_upgrade instead of t4_load_fw to write firmware into FLASH, since
> t4_load_fw doesn't co-ordinate with the firmware and the adapter can get hosed
> enough to require a power cycle of the system.
> 
> Based on original work by Casey Leedom <leedom@chelsio.com>
> 
> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>

Applied.

^ permalink raw reply

* RE: ixgbe: Question about Flow Control on 10G
From: Tantilov, Emil S @ 2014-10-14 20:54 UTC (permalink / raw)
  To: dom, Skidmore, Donald C, netdev@vger.kernel.org
In-Reply-To: <543D72FE.9090107@citrix.com>

>-----Original Message-----
>From: netdev-owner@vger.kernel.org [mailto:netdev-
>owner@vger.kernel.org] On Behalf Of dom
>Sent: Tuesday, October 14, 2014 12:01 PM
>To: Skidmore, Donald C; netdev@vger.kernel.org
>Subject: ixgbe: Question about Flow Control on 10G
>
>Hi
>
>I have a question about the ixgbe driver's handling of
>'ethtool -a ethX'
>when the NIC is using fibre.
>
>Specifically I don't understand the code introduced by this
>commit:
>
>commit 73d80953dfd1d5a92948005798c857c311c2834b
>Author: Don Skidmore <donald.c.skidmore@intel.com>
>Date:   Wed Jul 31 02:19:24 2013 +0000
>Subject: ixgbe: fix fc autoneg ethtool reporting.
>
>The function introduced the function:
>        ixgbe_device_supports_autoneg_fc()
>
>which gets called by
>ixgbe_get_pauseparam()/ixgbe_set_pauseparam().
>
>specifically there is a  case in
>ixgbe_device_supports_autoneg_fc()
>
>     case ixgbe_media_type_fiber_qsfp:
>     case ixgbe_media_type_fiber:
>         hw->mac.ops.check_link(hw, &speed, &link_up,
>false);
>         /* if link is down, assume supported */
>         if (link_up)
>             supported = speed == IXGBE_LINK_SPEED_1GB_FULL ?
>                 true : false;
>
>If link_up=1 then why is supported only true for a
>speed=IXGBE_LINK_SPEED_1GB_FULL ?
>
>Why is Flow Control not supported for IXGBE_LINK_SPEED_10GB_FULL ?

For SFP modules (media_type_fiber) flow control autoneg is not supported at 10gig. You can still set flow control manually to enabled/disabled, just not autoneg.

Thanks,
Emil

^ permalink raw reply

* Re: something is wrong in commit 971f10eca1 - tcp: better TCP_SKB_CB layout to reduce cache line misses
From: David Miller @ 2014-10-14 20:47 UTC (permalink / raw)
  To: edumazet; +Cc: cwang, kkolasa, eric.dumazet, netdev
In-Reply-To: <CAFrEnehWNzmzZGV-ZGbZF=wJ1q0TjFtB1yG6Dp1yVrJmBj9B7Q@mail.gmail.com>

From: Eric Dumazet <edumazet@gmail.com>
Date: Tue, 14 Oct 2014 13:42:01 -0700

> 2014-10-14 12:03 GMT-07:00 David Miller <davem@davemloft.net>:
> 
>> From: Cong Wang <cwang@twopensource.com>
>> Date: Tue, 14 Oct 2014 11:59:25 -0700
>>
>> > On Tue, Oct 14, 2014 at 6:25 AM, Krzysztof Kolasa <kkolasa@winsoft.pl>
>> wrote:
>> >> W dniu 14.10.2014 o 02:09, Cong Wang pisze:
>> >>
>> >>> On Mon, Oct 13, 2014 at 4:59 PM, Cong Wang <cwang@twopensource.com>
>> wrote:
>> >>>>
>> >>>> Probably not related with this bug, but with regarding to the
>> >>>> offending commit, what's the point of the memmove() in tcp_v4_rcv()
>> >>>> since ip_rcv() already clears IPCB()?
>> >>>
>> >>> Oh, ip options are actually saved in ip_rcv_finish()... Hmm, looks
>> scary
>> >>> to play with variable-length array with memmove()....
>> >>>
>> >> On my other old laptop with 32bit kernel next and graphics card Intel
>> 945GM
>> >> just after the revert commit working OK,
>> >> before, after login to gnome shell in some seconds decorations disappear
>> >>
>> >> 32 bit Ubuntu 12.04.5 LTS, gnome shell, kernel source next 14-10-2014
>> >>
>> >> Can anyone confirm this ?
>> >>
>> >
>> > Sorry, believe it or not, for me it is hard to find a 32bit machine even
>> VM. :)
>> >
>> > Could the attached patch by any chance help? I noticed cookie_v4_check()
>> > still uses IPCB() instead of TCPCB() at TCP layer.
>>
>> Eric, please review.
>>
> 
> For an unknown reason I do not see this patch on my eric.dumazet@gmail.com
> 
> It seems my migration to Trusty was not a good move :(
> 
> Patch looks good, but seems a net-next candidate to me.

I thought it is a potential fix for this corruption report?

^ permalink raw reply

* Re: [PATCH (net.git) 0/2] stmmac: review and fix the dwmac-sti glue-logic
From: David Miller @ 2014-10-14 20:41 UTC (permalink / raw)
  To: peppe.cavallaro; +Cc: netdev, maxime.coquelin
In-Reply-To: <1413267176-325-1-git-send-email-peppe.cavallaro@st.com>

From: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Date: Tue, 14 Oct 2014 08:12:54 +0200

> This patch is to review the whole glue logic adopted on STi SoCs that
> was bugged.
> In the old glue-logic there was a lot of confusion when setup the
> retiming especially for STiD127 where, for example, the bits 6 and 7
> (in the GMAC  control register) have a different meaning of what is
> used for STiH4xx SoCs. So we cannot adopt the same glue for all these
> SoCs.
> Moreover, GiGa on STiD127 didn't work and, for all the SoCs, the RGMII
> couldn't run when the speed was 10Mbps (because the clock was not properly
> managed).
> Note that the phy clock needs to be provided by the platform as well as
> documented in the related binding file (updated as consequence).
> 
> The old code supported too many configurations never adopted and validated.
> This made the code very complex to maintain and debug in case of issues.
> 
> The patch simplifies all the configurations as commented in the tables
> inside the file and obviously it has been tested on all the boards
> based on the SoCs mentioned.
> 
> With this patch, the dwmac-sti is also ready to support new configurations that
> will be available on next SoC generations.

Series applied.

^ permalink raw reply

* Re: [PATCH (net.git)] stmmac: platform: fix FIXED_PHY support.
From: David Miller @ 2014-10-14 20:41 UTC (permalink / raw)
  To: peppe.cavallaro; +Cc: netdev
In-Reply-To: <1413267114-24043-1-git-send-email-peppe.cavallaro@st.com>

From: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Date: Tue, 14 Oct 2014 08:11:54 +0200

> On several STi platforms: e.g. stihxxx-b2120 an Ethernet switch is
> embedded and connected to the stmmac via RGMII mode. So this is managed
> by using the FIXED_PHY. In that case, the support in the platform needs
> to be fixed to allow the stmmac to dialog with the switch via fixed-link
> by using phy_bus_name property.
> 
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

Applied.

^ permalink raw reply

* Re: [PATCH v9 net-next 2/4] net: filter: split filter.h and expose eBPF to user space
From: Daniel Borkmann @ 2014-10-14 20:27 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Ingo Molnar, Linus Torvalds, Andy Lutomirski,
	Steven Rostedt, Hannes Frederic Sowa, Chema Gonzalez,
	Eric Dumazet, Peter Zijlstra, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development, LKML
In-Reply-To: <CAMEtUuws+AtOdwud8_YjXhs=yomt8nY+49f_UuhofcmhV58c1Q@mail.gmail.com>

On 10/14/2014 10:43 AM, Alexei Starovoitov wrote:
> On Tue, Oct 14, 2014 at 12:34 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
>> On 10/13/2014 11:49 PM, Alexei Starovoitov wrote:
>>>
>>> On Mon, Oct 13, 2014 at 10:21 AM, Daniel Borkmann <dborkman@redhat.com>
>>> wrote:
>>>>
>>>> On 09/03/2014 05:46 PM, Daniel Borkmann wrote:
>>>> ...
>>>>>
>>>>> Ok, given you post the remaining two RFCs later on this window as
>>>>> you indicate, I have no objections:
>>>>>
>>>>> Acked-by: Daniel Borkmann <dborkman@redhat.com>
>>>>
>>>> Ping, Alexei, are you still sending the patch for bpf_common.h or
>>>> do you want me to take care of this?
>>>
>>> It's not forgotten.
>>> I'm not sending it only because net-next is closed
>>> and it seems to be -next material.
>>
>> Well, the point was since it's UAPI you're modifying, that it needs
>> to be shipped before it first gets exposed to user land ...
>>
>> I think that should be reason enough ... there's no point in doing
>> this at a later point in time.
>
> Moving common #defines from filter.h into bpf_common.h can
> be done at any point in time. For the sake of argument if
> there is an app that includes both filter.h and bpf.h, it will
> continue to work just fine.

Correct, but the argument was that we can _avoid_ this from the
very beginning. Thus, user space applications making use of eBPF
only need to include <linux/bpf.h>, nothing more.

Doing this at any later point in time will just lead to the need
to include both headers.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox