Netdev List
 help / color / mirror / Atom feed
* Re: bnx2: can't disable rx-vlan-offload?
From: Vlad Yasevich @ 2014-10-15 14:37 UTC (permalink / raw)
  To: Lukas Tribus, Sony Chacko, Dept-HSGLinuxNICDev@qlogic.com
  Cc: netdev@vger.kernel.org
In-Reply-To: <DUB123-W35654C3B747AEBC067DAF8EDAD0@phx.gbl>

On 10/14/2014 06:05 PM, Lukas Tribus wrote:
> Hi!
> 
> 
> For a passive, libpcap based sniffer I'm trying to disable rx vlan
> offload on a bnx2 card (vlan capture filter in libpcap doesn't work
> correctly), but I'm not able to:
> 
> disabling rxvlan fails:
> 
> root@sniffer:~# ethtool -K eth1 rxvlan off
> Could not change any device features
> root@sniffer:~#
> root@sniffer:~# ethtool -k eth1
> Features for eth1:
> rx-checksumming: on
> tx-checksumming: on
>         tx-checksum-ipv4: on
>         tx-checksum-ip-generic: off [fixed]
>         tx-checksum-ipv6: on
>         tx-checksum-fcoe-crc: off [fixed]
>         tx-checksum-sctp: off [fixed]
> scatter-gather: on
>         tx-scatter-gather: on
>         tx-scatter-gather-fraglist: off [fixed]
> tcp-segmentation-offload: on
>         tx-tcp-segmentation: on
>         tx-tcp-ecn-segmentation: on
>         tx-tcp6-segmentation: on
> udp-fragmentation-offload: off [fixed]
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: off [fixed]
> rx-vlan-offload: on [requested off]
> tx-vlan-offload: on
> ntuple-filters: off [fixed]
> receive-hashing: on
> highdma: on [fixed]
> rx-vlan-filter: off [fixed]
> vlan-challenged: off [fixed]
> tx-lockless: off [fixed]
> netns-local: off [fixed]
> tx-gso-robust: off [fixed]
> tx-fcoe-segmentation: off [fixed]
> tx-gre-segmentation: off [fixed]
> tx-ipip-segmentation: off [fixed]
> tx-sit-segmentation: off [fixed]
> tx-udp_tnl-segmentation: off [fixed]
> tx-mpls-segmentation: off [fixed]
> fcoe-mtu: off [fixed]
> tx-nocache-copy: on
> loopback: off [fixed]
> rx-fcs: off [fixed]
> rx-all: off [fixed]
> tx-vlan-stag-hw-insert: off [fixed]
> rx-vlan-stag-hw-parse: off [fixed]
> rx-vlan-stag-filter: off [fixed]
> l2-fwd-offload: off [fixed]
> root@sniffer:~#
> root@sniffer:~# ethtool -i eth1
> driver: bnx2
> version: 2.2.4
> firmware-version: 5.0.9 bc 5.0.6 NCSI 2.0.3
> bus-info: 0000:01:00.1
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: no
> root@sniffer:~#
> 
> dmesg bnx2 lines:
> [    6.974972] bnx2: Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.2.4 (Aug 05, 2013)
> [    6.975681] bnx2 0000:01:00.0 eth0: Broadcom NetXtreme II BCM5716 1000Base-T (C0) PCI Express found at mem da000000, IRQ 36, node addr 00:26:b9:51:ba:59
> [    6.976631] bnx2 0000:01:00.1 eth1: Broadcom NetXtreme II BCM5716 1000Base-T (C0) PCI Express found at mem dc000000, IRQ 48, node addr 00:26:b9:51:ba:5a
> 
> 
> I'm using Ubuntus 3.13 kernel and ethtool version 3.13, but I previously
> tried ethtool 3.1 on both ubuntu 3.8 kernel and a vanilla 3.14.21
> kernel as well.
> 
> 
> Any ideas or workarounds that may help disabling rx vlan acceleration?

Try bringing the interface down first.  BNX2 is a bit strange in how it handles
vlan feature changes.

-vlad

> 
> 
> 
> 
> Thanks!
> 
> Lukas
> 
> 
>  		 	   		  --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* [PATCH RFC v2 3/3] virtio-net: optimize free_old_xmit_skbs stats
From: Michael S. Tsirkin @ 2014-10-15 14:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jason Wang, Rusty Russell, virtualization, netdev
In-Reply-To: <1413383116-30977-1-git-send-email-mst@redhat.com>

From: Jason Wang <jasowang@redhat.com>

We already have counters for sent packets and sent bytes.
Use them to reduce the number of u64_stats_update_begin/end().

Take care not to bother with stats update when called
speculatively.

Based on a patch by Jason Wang.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8dea411..4e12023 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -233,16 +233,22 @@ static unsigned int free_old_xmit_skbs(struct netdev_queue *txq,
 	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		pr_debug("Sent skb %p\n", skb);
 
-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
 		bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
+		packets++;
 
 		dev_kfree_skb_any(skb);
-		packets++;
 	}
 
+	/* Avoid overhead when no packets have been processed
+	 * happens when called speculatively from start_xmit. */
+	if (!packets)
+		return 0;
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += bytes;
+	stats->tx_packets += packets;
+	u64_stats_update_end(&stats->tx_syncp);
+
 	netdev_tx_completed_queue(txq, packets, bytes);
 
 	return packets;
-- 
MST

^ permalink raw reply related

* [PATCH RFC v2 2/3] virtio_net: bql
From: Michael S. Tsirkin @ 2014-10-15 14:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, virtualization
In-Reply-To: <1413383116-30977-1-git-send-email-mst@redhat.com>

Improve tx batching using byte queue limits.
Should be especially effective for MQ.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a9bf178..8dea411 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -219,13 +219,15 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
 }
 
-static unsigned int free_old_xmit_skbs(struct send_queue *sq, int budget)
+static unsigned int free_old_xmit_skbs(struct netdev_queue *txq,
+				       struct send_queue *sq, int budget)
 {
 	struct sk_buff *skb;
 	unsigned int len;
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
 	unsigned int packets = 0;
+	unsigned int bytes = 0;
 
 	while (packets < budget &&
 	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
@@ -233,6 +235,7 @@ static unsigned int free_old_xmit_skbs(struct send_queue *sq, int budget)
 
 		u64_stats_update_begin(&stats->tx_syncp);
 		stats->tx_bytes += skb->len;
+		bytes += skb->len;
 		stats->tx_packets++;
 		u64_stats_update_end(&stats->tx_syncp);
 
@@ -240,6 +243,8 @@ static unsigned int free_old_xmit_skbs(struct send_queue *sq, int budget)
 		packets++;
 	}
 
+	netdev_tx_completed_queue(txq, packets, bytes);
+
 	return packets;
 }
 
@@ -810,7 +815,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 again:
 	__netif_tx_lock(txq, smp_processor_id());
 	virtqueue_disable_cb(sq->vq);
-	sent += free_old_xmit_skbs(sq, budget - sent);
+	sent += free_old_xmit_skbs(txq, sq, budget - sent);
 
 	if (sent < budget) {
 		enable_done = virtqueue_enable_cb_delayed(sq->vq);
@@ -962,12 +967,13 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
 	bool kick = !skb->xmit_more;
 	bool stopped;
+	unsigned int bytes = skb->len;
 
 	virtqueue_disable_cb(sq->vq);
 
 	/* We are going to push one skb.
 	 * Try to pop one off to free space for it. */
-	free_old_xmit_skbs(sq, 1);
+	free_old_xmit_skbs(txq, sq, 1);
 
 	/* Try to transmit */
 	err = xmit_skb(sq, skb);
@@ -983,6 +989,12 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		return NETDEV_TX_OK;
 	}
 
+	netdev_tx_sent_queue(txq, bytes);
+
+	/* Kick early so device can process descriptors in parallel with us. */
+	if (kick)
+		virtqueue_kick(sq->vq);
+
 	/* Apparently nice girls don't return TX_BUSY; stop the queue
 	 * before it gets out of hand.  Naturally, this wastes entries. */
 	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
@@ -997,7 +1009,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
 		/* More just got used, free them then recheck. */
-		free_old_xmit_skbs(sq, qsize);
+		free_old_xmit_skbs(txq, sq, qsize);
 		if (stopped && sq->vq->num_free >= 2+MAX_SKB_FRAGS)
 			netif_start_subqueue(dev, qnum);
 	}
-- 
MST

^ permalink raw reply related

* [PATCH RFC v2 1/3] virtio_net: enable tx interrupt
From: Michael S. Tsirkin @ 2014-10-15 14:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, virtualization
In-Reply-To: <1413383116-30977-1-git-send-email-mst@redhat.com>

On newer hosts that support delayed tx interrupts,
we probably don't have much to gain from orphaning
packets early.

Based on patch by Jason Wang.

Note: this might degrade performance for
hosts without event idx support.
Should be addressed by the next patch.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c | 137 ++++++++++++++++++++++++++++++++---------------
 1 file changed, 94 insertions(+), 43 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 13d0a8b..a9bf178 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -72,6 +72,8 @@ struct send_queue {
 
 	/* Name of the send queue: output.$index */
 	char name[40];
+
+	struct napi_struct napi;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -217,15 +219,37 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
 }
 
+static unsigned int free_old_xmit_skbs(struct send_queue *sq, int budget)
+{
+	struct sk_buff *skb;
+	unsigned int len;
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	unsigned int packets = 0;
+
+	while (packets < budget &&
+	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+		pr_debug("Sent skb %p\n", skb);
+
+		u64_stats_update_begin(&stats->tx_syncp);
+		stats->tx_bytes += skb->len;
+		stats->tx_packets++;
+		u64_stats_update_end(&stats->tx_syncp);
+
+		dev_kfree_skb_any(skb);
+		packets++;
+	}
+
+	return packets;
+}
+
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
+	struct send_queue *sq = &vi->sq[vq2txq(vq)];
 
-	/* Suppress further interrupts. */
-	virtqueue_disable_cb(vq);
-
-	/* We were probably waiting for more output buffers. */
-	netif_wake_subqueue(vi->dev, vq2txq(vq));
+	if (napi_schedule_prep(&sq->napi))
+		__napi_schedule(&sq->napi);
 }
 
 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -774,6 +798,37 @@ again:
 	return received;
 }
 
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct send_queue *sq =
+		container_of(napi, struct send_queue, napi);
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+	unsigned int sent = 0;
+	bool enable_done;
+
+again:
+	__netif_tx_lock(txq, smp_processor_id());
+	virtqueue_disable_cb(sq->vq);
+	sent += free_old_xmit_skbs(sq, budget - sent);
+
+	if (sent < budget) {
+		enable_done = virtqueue_enable_cb_delayed(sq->vq);
+		napi_complete(napi);
+		__netif_tx_unlock(txq);
+		if (unlikely(enable_done) && napi_schedule_prep(napi)) {
+			virtqueue_disable_cb(sq->vq);
+			__napi_schedule(napi);
+			goto again;
+		}
+	} else {
+		__netif_tx_unlock(txq);
+	}
+
+	netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+	return sent;
+}
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
 /* must be called with local_bh_disable()d */
 static int virtnet_busy_poll(struct napi_struct *napi)
@@ -822,30 +877,12 @@ static int virtnet_open(struct net_device *dev)
 			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 		virtnet_napi_enable(&vi->rq[i]);
+		napi_enable(&vi->sq[i].napi);
 	}
 
 	return 0;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq)
-{
-	struct sk_buff *skb;
-	unsigned int len;
-	struct virtnet_info *vi = sq->vq->vdev->priv;
-	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
-
-	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		pr_debug("Sent skb %p\n", skb);
-
-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
-
-		dev_kfree_skb_any(skb);
-	}
-}
-
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 {
 	struct skb_vnet_hdr *hdr;
@@ -911,7 +948,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 		sg_set_buf(sq->sg, hdr, hdr_len);
 		num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
 	}
-	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
+
+	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb,
+				    GFP_ATOMIC);
 }
 
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -919,12 +958,16 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct virtnet_info *vi = netdev_priv(dev);
 	int qnum = skb_get_queue_mapping(skb);
 	struct send_queue *sq = &vi->sq[qnum];
-	int err;
+	int err, qsize = virtqueue_get_vring_size(sq->vq);
 	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
 	bool kick = !skb->xmit_more;
+	bool stopped;
+
+	virtqueue_disable_cb(sq->vq);
 
-	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(sq);
+	/* We are going to push one skb.
+	 * Try to pop one off to free space for it. */
+	free_old_xmit_skbs(sq, 1);
 
 	/* Try to transmit */
 	err = xmit_skb(sq, skb);
@@ -940,27 +983,25 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		return NETDEV_TX_OK;
 	}
 
-	/* Don't wait up for transmitted skbs to be freed. */
-	skb_orphan(skb);
-	nf_reset(skb);
-
 	/* Apparently nice girls don't return TX_BUSY; stop the queue
 	 * before it gets out of hand.  Naturally, this wastes entries. */
 	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
 		netif_stop_subqueue(dev, qnum);
-		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
-			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(sq);
-			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
-				netif_start_subqueue(dev, qnum);
-				virtqueue_disable_cb(sq->vq);
-			}
-		}
+		stopped = true;
+	} else {
+		stopped = false;
 	}
 
 	if (kick || netif_xmit_stopped(txq))
 		virtqueue_kick(sq->vq);
 
+	if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
+		/* More just got used, free them then recheck. */
+		free_old_xmit_skbs(sq, qsize);
+		if (stopped && sq->vq->num_free >= 2+MAX_SKB_FRAGS)
+			netif_start_subqueue(dev, qnum);
+	}
+
 	return NETDEV_TX_OK;
 }
 
@@ -1137,8 +1178,10 @@ static int virtnet_close(struct net_device *dev)
 	/* Make sure refill_work doesn't re-enable napi! */
 	cancel_delayed_work_sync(&vi->refill);
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->max_queue_pairs; i++) {
 		napi_disable(&vi->rq[i].napi);
+		napi_disable(&vi->sq[i].napi);
+	}
 
 	return 0;
 }
@@ -1457,8 +1500,10 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 {
 	int i;
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->max_queue_pairs; i++) {
 		netif_napi_del(&vi->rq[i].napi);
+		netif_napi_del(&vi->sq[i].napi);
+	}
 
 	kfree(vi->rq);
 	kfree(vi->sq);
@@ -1612,6 +1657,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
 		netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
 			       napi_weight);
 		napi_hash_add(&vi->rq[i].napi);
+		netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
+			       napi_weight);
 
 		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
 		ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT);
@@ -1916,8 +1963,10 @@ static int virtnet_freeze(struct virtio_device *vdev)
 	if (netif_running(vi->dev)) {
 		for (i = 0; i < vi->max_queue_pairs; i++) {
 			napi_disable(&vi->rq[i].napi);
+			napi_disable(&vi->sq[i].napi);
 			napi_hash_del(&vi->rq[i].napi);
 			netif_napi_del(&vi->rq[i].napi);
+			netif_napi_del(&vi->sq[i].napi);
 		}
 	}
 
@@ -1942,8 +1991,10 @@ static int virtnet_restore(struct virtio_device *vdev)
 			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 
-		for (i = 0; i < vi->max_queue_pairs; i++)
+		for (i = 0; i < vi->max_queue_pairs; i++) {
 			virtnet_napi_enable(&vi->rq[i]);
+			napi_enable(&vi->sq[i].napi);
+		}
 	}
 
 	netif_device_attach(vi->dev);
-- 
MST

^ permalink raw reply related

* sunvnet NAPIfication
From: Sowmini Varadhan @ 2014-10-15 14:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20141003.120802.1213573830649867131.davem@davemloft.net>


I have the NAPIfication-of-sunvnet patch-set ready for 
code review. AFAIK net-next is not currently open
for new features this week, but does it make sense for
me to go ahead and send the patch-set to the list (I'm sure
it will take some time to review) or will that just create
confusion to the netdev maintainers?

--Sowmini

Previously discussed:

> >> For example, you can now move everything into software IRQ context,
> >> just disable the VIO interrupt and unconditionally go into NAPI
> >> context from the VIO event.
> >> No more irqsave/irqrestore.
> >> Then the TX path even can run mostly lockless, it just needs
> >> to hold the VIO lock for a minute period of time.  The caller
> >> holds the xmit_lock of the network device to prevent re-entry
> >> into the ->ndo_start_xmit() path.
> 
> I think you should be able to get rid of all of the in-driver
> locking in the fast paths.
> 
> NAPI ->poll() is non-reentrant, so all RX processing occurs
> strictly in a serialized environment.
> 
> Once you do TX reclaim in NAPI context, then all you have to do is
> take the generic netdev TX queue lock during the evaluation of whether
> to wakeup the TX queue or not.  Worst case you need to hold the
> TX netdev queue lock across the whole TX reclaim operation.
> 
> The VIO lock really ought to be entirely superfluous in the data
> paths.

^ permalink raw reply

* [NET/USB] ax88179_set_mac_addr: return only error or zero
From: Ian Morgan @ 2014-10-15 13:49 UTC (permalink / raw)
  To: netdev, linux-usb

[-- Attachment #1: Type: text/plain, Size: 2024 bytes --]

(It's about a USB NIC, so not sure if this is suitable to the netdev
mailing list or the linux-usb mailing list, so you both get it.)

I had an issue with bonding one of these USB3 Gigabit NIC devices
because set_mac_addr was returning a positive value (6), and the
bonding driver thinks that any non-zero value is an error.

My proposed fix was to (I thought correctly) check for negative return
codes in bond_enslave() instead of any non-zero value.

I was told by the bonding driver maintainers that my proposed fix was
incorrect, and that the device driver should return only zero on
success, as other drivers do this and that assumption has been made.
I'm really not sure why the more vigilant check in the bonding driver
was "wrong" though. We all know the rule about assumptions ;-) But I
defer to their superior knowledge of kernel interfaces.

Anyhow, it would seem that the following patch is the "more
acceptable" solution, and is in line with what other drivers do:


--- linux-3.17/drivers/net/usb/ax88179_178a.c.orig  2014-10-05
15:23:04.000000000 -0400
+++ linux-3.17/drivers/net/usb/ax88179_178a.c   2014-10-15
09:07:31.217810217 -0400
@@ -937,6 +937,7 @@ static int ax88179_set_mac_addr(struct n
 {
        struct usbnet *dev = netdev_priv(net);
        struct sockaddr *addr = p;
+       int ret;

        if (netif_running(net))
                return -EBUSY;
@@ -946,8 +947,12 @@ static int ax88179_set_mac_addr(struct n
        memcpy(net->dev_addr, addr->sa_data, ETH_ALEN);

        /* Set the MAC address */
-       return ax88179_write_cmd(dev, AX_ACCESS_MAC, AX_NODE_ID, ETH_ALEN,
+       ret = ax88179_write_cmd(dev, AX_ACCESS_MAC, AX_NODE_ID, ETH_ALEN,
                                 ETH_ALEN, net->dev_addr);
+       if (ret < 0)
+               return ret;
+
+       return 0;
 }

 static const struct net_device_ops ax88179_netdev_ops = {


(non-tab-mangled patch file attached)

Your feedback is duly appreciated. Please CC me, as I am not
subscribed to these mailing lists.

--
Ian Morgan

[-- Attachment #2: ax88719_setmacaddr_rc.patch --]
[-- Type: text/x-patch, Size: 775 bytes --]

--- linux-3.17/drivers/net/usb/ax88179_178a.c~	2014-10-05 15:23:04.000000000 -0400
+++ linux-3.17/drivers/net/usb/ax88179_178a.c	2014-10-15 09:07:31.217810217 -0400
@@ -937,6 +937,7 @@ static int ax88179_set_mac_addr(struct n
 {
 	struct usbnet *dev = netdev_priv(net);
 	struct sockaddr *addr = p;
+	int ret;
 
 	if (netif_running(net))
 		return -EBUSY;
@@ -946,8 +947,12 @@ static int ax88179_set_mac_addr(struct n
 	memcpy(net->dev_addr, addr->sa_data, ETH_ALEN);
 
 	/* Set the MAC address */
-	return ax88179_write_cmd(dev, AX_ACCESS_MAC, AX_NODE_ID, ETH_ALEN,
+	ret = ax88179_write_cmd(dev, AX_ACCESS_MAC, AX_NODE_ID, ETH_ALEN,
 				 ETH_ALEN, net->dev_addr);
+	if (ret < 0)
+		return ret;
+
+	return 0;
 }
 
 static const struct net_device_ops ax88179_netdev_ops = {

^ permalink raw reply

* Re: [PATCH] virtio_net: fix use after free
From: Michael S. Tsirkin @ 2014-10-15 13:37 UTC (permalink / raw)
  To: David Laight
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, David S. Miller
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D1C9D77AB@AcuExch.aculab.com>

On Wed, Oct 15, 2014 at 01:24:57PM +0000, David Laight wrote:
> From: Michael S. Tsirkin
> > commit 0b725a2ca61bedc33a2a63d0451d528b268cf975
> >     net: Remove ndo_xmit_flush netdev operation, use signalling instead.
> > 
> > added code that looks at skb->xmit_more after the skb has
> > been put in TX VQ. Since some paths process the ring and free the skb
> > immediately, this can cause use after free.
> > 
> > Fix by storing xmit_more in a local variable.
> > 
> > Cc: David S. Miller <davem@davemloft.net>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > 
> > David, am I using the API correctly?
> > Seems to work for me.
> > You used __netif_subqueue_stopped but that seems to use
> > a slightly more expensive test_bit internally.
> > The reason I added a variable for the txq here is because it's handy for
> > BQL patch later on.
> > 
> > 
> >  drivers/net/virtio_net.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 3d0ce44..13d0a8b 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -920,6 +920,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	int qnum = skb_get_queue_mapping(skb);
> >  	struct send_queue *sq = &vi->sq[qnum];
> >  	int err;
> > +	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
> 
> Do you need to cache 'txq' on stack for the entire call?
> Looks like it is only needed when 'kick' is true.
> I've not looked to see if saves both 'dev' and 'qnum' being kept.
> 
> In any case it isn't mentioned in the commit message.
> 
> 	David

Code seems slightly neater this way, I haven't bothered to
micro-optimize it to this level yet.
Want to benchmark and send a patch on top?

> > +	bool kick = !skb->xmit_more;
> > 
> >  	/* Free up any pending old buffers before queueing new ones. */
> >  	free_old_xmit_skbs(sq);
> > @@ -956,7 +958,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  		}
> >  	}
> > 
> > -	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
> > +	if (kick || netif_xmit_stopped(txq))
> >  		virtqueue_kick(sq->vq);
> > 
> >  	return NETDEV_TX_OK;
> > --
> > MST
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: [PATCH v2 1/1] net: fec: ptp: fix convergence issue to support LinuxPTP stack
From: Richard Cochran @ 2014-10-15 13:25 UTC (permalink / raw)
  To: Fugang Duan; +Cc: davem, netdev, bhutchings, b20596
In-Reply-To: <1413365412-22072-1-git-send-email-b38611@freescale.com>

On Wed, Oct 15, 2014 at 05:30:12PM +0800, Fugang Duan wrote:
> Changes V2:
> Modify the commit/comments log to describe the issue clearly.

Thanks, it is more clear now.

Acked-by: Richard Cochran <richardcochran@gmail.com>

^ permalink raw reply

* RE: [PATCH] virtio_net: fix use after free
From: David Laight @ 2014-10-15 13:24 UTC (permalink / raw)
  To: 'Michael S. Tsirkin', linux-kernel@vger.kernel.org
  Cc: David S. Miller, Rusty Russell,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	Jason Wang
In-Reply-To: <1413378878-28118-1-git-send-email-mst@redhat.com>

From: Michael S. Tsirkin
> commit 0b725a2ca61bedc33a2a63d0451d528b268cf975
>     net: Remove ndo_xmit_flush netdev operation, use signalling instead.
> 
> added code that looks at skb->xmit_more after the skb has
> been put in TX VQ. Since some paths process the ring and free the skb
> immediately, this can cause use after free.
> 
> Fix by storing xmit_more in a local variable.
> 
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> 
> David, am I using the API correctly?
> Seems to work for me.
> You used __netif_subqueue_stopped but that seems to use
> a slightly more expensive test_bit internally.
> The reason I added a variable for the txq here is because it's handy for
> BQL patch later on.
> 
> 
>  drivers/net/virtio_net.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 3d0ce44..13d0a8b 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -920,6 +920,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	int qnum = skb_get_queue_mapping(skb);
>  	struct send_queue *sq = &vi->sq[qnum];
>  	int err;
> +	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);

Do you need to cache 'txq' on stack for the entire call?
Looks like it is only needed when 'kick' is true.
I've not looked to see if saves both 'dev' and 'qnum' being kept.

In any case it isn't mentioned in the commit message.

	David

> +	bool kick = !skb->xmit_more;
> 
>  	/* Free up any pending old buffers before queueing new ones. */
>  	free_old_xmit_skbs(sq);
> @@ -956,7 +958,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  		}
>  	}
> 
> -	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
> +	if (kick || netif_xmit_stopped(txq))
>  		virtqueue_kick(sq->vq);
> 
>  	return NETDEV_TX_OK;
> --
> MST
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] virtio_net: fix use after free
From: Michael S. Tsirkin @ 2014-10-15 13:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, David S. Miller, virtualization

commit 0b725a2ca61bedc33a2a63d0451d528b268cf975
    net: Remove ndo_xmit_flush netdev operation, use signalling instead.

added code that looks at skb->xmit_more after the skb has
been put in TX VQ. Since some paths process the ring and free the skb
immediately, this can cause use after free.

Fix by storing xmit_more in a local variable.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

David, am I using the API correctly?
Seems to work for me.
You used __netif_subqueue_stopped but that seems to use
a slightly more expensive test_bit internally.
The reason I added a variable for the txq here is because it's handy for
BQL patch later on.


 drivers/net/virtio_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3d0ce44..13d0a8b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -920,6 +920,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	int qnum = skb_get_queue_mapping(skb);
 	struct send_queue *sq = &vi->sq[qnum];
 	int err;
+	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
+	bool kick = !skb->xmit_more;
 
 	/* Free up any pending old buffers before queueing new ones. */
 	free_old_xmit_skbs(sq);
@@ -956,7 +958,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 	}
 
-	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
+	if (kick || netif_xmit_stopped(txq))
 		virtqueue_kick(sq->vq);
 
 	return NETDEV_TX_OK;
-- 
MST

^ permalink raw reply related

* [PATCH] ipv4: dst_entry leak in ip_send_unicast_reply()
From: Vasily Averin @ 2014-10-15 12:24 UTC (permalink / raw)
  To: Eric Dumazet, netdev, David S. Miller
  Cc: Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy

Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")

ip_setup_cork() called inside ip_append_data() steals dst entry from rt to cork
and in case errors in __ip_append_data() nobody frees stolen dst entry

Signed-off-by: Vasily Averin <vvs@parallels.com>
---
 net/ipv4/ip_output.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e35b712..88e5ef2 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1535,6 +1535,7 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb,
 	struct sk_buff *nskb;
 	struct sock *sk;
 	struct inet_sock *inet;
+	int err;
 
 	if (__ip_options_echo(&replyopts.opt.opt, skb, sopt))
 		return;
@@ -1574,8 +1575,13 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb,
 	sock_net_set(sk, net);
 	__skb_queue_head_init(&sk->sk_write_queue);
 	sk->sk_sndbuf = sysctl_wmem_default;
-	ip_append_data(sk, &fl4, ip_reply_glue_bits, arg->iov->iov_base, len, 0,
-		       &ipc, &rt, MSG_DONTWAIT);
+	err = ip_append_data(sk, &fl4, ip_reply_glue_bits, arg->iov->iov_base,
+			     len, 0, &ipc, &rt, MSG_DONTWAIT);
+	if (unlikely(err)) {
+		ip_flush_pending_frames(sk);
+		goto out;
+	}
+
 	nskb = skb_peek(&sk->sk_write_queue);
 	if (nskb) {
 		if (arg->csumoffset >= 0)
@@ -1587,7 +1593,7 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb,
 		skb_set_queue_mapping(nskb, skb_get_queue_mapping(skb));
 		ip_push_pending_frames(sk, &fl4);
 	}
-
+out:
 	put_cpu_var(unicast_sock);
 
 	ip_rt_put(rt);
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH] gianfar: disable vlan tag insertion by default on 2.6.x
From: Willy Tarreau @ 2014-10-15 12:01 UTC (permalink / raw)
  To: zhuyj
  Cc: yzhu1, sandeep.kumar, netdev, linux-kernel, Yue.Tao, guang.yang,
	joe, festevam, richardcochran, clarocq, yongjun_wei,
	claudiu.manoil, roy.xu, sky.wangfeng
In-Reply-To: <543E4AE0.2010207@gmail.com>

Hi,

On Wed, Oct 15, 2014 at 06:22:24PM +0800, zhuyj wrote:
> Sorry, it is my fault.

no problem, don't worry :-)

> Please apply this patch in the attachment.

Thank you, patch queued now!

Willy

^ permalink raw reply

* Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
From: Michael S. Tsirkin @ 2014-10-15 12:00 UTC (permalink / raw)
  To: David Laight
  Cc: eric.dumazet@gmail.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, davem@davemloft.net
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D1C9D7688@AcuExch.aculab.com>

On Wed, Oct 15, 2014 at 10:51:32AM +0000, David Laight wrote:
> From: Michael S. Tsirkin
> > On Wed, Oct 15, 2014 at 09:49:01AM +0000, David Laight wrote:
> > > From: Of Michael S. Tsirkin
> > > > On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
> > > > > Accumulate the sent packets and sent bytes in local variables and perform a
> > > > > single u64_stats_update_begin/end() after.
> > > > >
> > > > > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > > > > Cc: Michael S. Tsirkin <mst@redhat.com>
> > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > >
> > > > Not sure how much it's worth but since Eric suggested it ...
> > >
> > > Probably depends on the actual cost of u64_stats_update_begin/end
> > > against the likely extra saving of the tx_bytes and tx_packets
> > > values onto the stack across the call to dev_kfree_skb_any().
> > > (Which depends on the number of caller saved registers.)
> > 
> > Yea, some benchmark results would be nice to see.
> 
> I there are likely to be multiple skb on the queue the fastest
> code would probably do one 'virtqueue_get_all()' that returned
> a linked list of buffers, then follow the list to get the stats,
> and follow it again to free the skb.
> 
> 	David

Interesting. Each time we tried playing with linked list in the past,
it simply destroyed performance.
Maybe this case is different but I have my doubts.

> > > > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > > >
> > > > > ---
> > > > >  drivers/net/virtio_net.c |   12 ++++++++----
> > > > >  1 files changed, 8 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 3d0ce44..a4d56b8 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
> > > > >  	unsigned int len;
> > > > >  	struct virtnet_info *vi = sq->vq->vdev->priv;
> > > > >  	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> > > > > +	u64 tx_bytes = 0, tx_packets = 0;
> > >
> > > tx_packets need only be 'unsigned int'.
> > > The same is almost certainly true of tx_bytes.
> > >
> > > 	David
> > >
> > > > >  	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > > > >  		pr_debug("Sent skb %p\n", skb);
> > > > >
> > > > > -		u64_stats_update_begin(&stats->tx_syncp);
> > > > > -		stats->tx_bytes += skb->len;
> > > > > -		stats->tx_packets++;
> > > > > -		u64_stats_update_end(&stats->tx_syncp);
> > > > > +		tx_bytes += skb->len;
> > > > > +		tx_packets++;
> > > > >
> > > > >  		dev_kfree_skb_any(skb);
> > > > >  	}
> > > > > +
> > > > > +	u64_stats_update_begin(&stats->tx_syncp);
> > > > > +	stats->tx_bytes += tx_bytes;
> > > > > +	stats->tx_packets =+ tx_packets;
> > > > > +	u64_stats_update_end(&stats->tx_syncp);
> > > > >  }
> > > > >
> > > > >  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> > > > > --
> > > > > 1.7.1
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >
> 

^ permalink raw reply

* Re: [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net
From: Michael S. Tsirkin @ 2014-10-15 11:58 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
In-Reply-To: <543E5706.10305@redhat.com>

On Wed, Oct 15, 2014 at 07:14:14PM +0800, Jason Wang wrote:
> On 10/15/2014 06:25 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 15, 2014 at 03:25:24PM +0800, Jason Wang wrote:
> >> According to David, proper accounting and queueing (at all levels, not
> >> just TCP sockets) is more important than trying to skim a bunch of
> >> cycles by avoiding TX interrupts.
> > He also mentioned we should find other ways to batch
> >
> 
> Right.
> >> Having an event to free the SKB is
> >> absolutely essential for the stack to operate correctly.
> >>
> >> This series tries to enable tx interrupt for virtio-net. The idea is
> >> simple: enable tx interrupt and schedule a tx napi to free old xmit
> >> skbs.
> >>
> >> Several notes:
> >> - Tx interrupt storm avoidance when queue is about to be full is
> >>   kept.
> > But queue is typically *not* full. More important to avoid interrupt
> > storms in that case IMO.
> 
> Yes.
> >> Since we may enable callbacks in both ndo_start_xmit() and tx
> >>   napi, patch 1 adds a check to make sure used event never go
> >>   back. This will let the napi not enable the callbacks wrongly after
> >>   delayed callbacks was used.
> > So why not just use delayed callbacks?
> 
> This means the tx interrupt are coalesced in a somewhat adaptive way.
> Need benchmark to see its effect.

I think it's a minimal change, and does not need new APIs.
If that's not optimal, we can do smarter things on top.

> >
> >> - For bulk dequeuing, there's no need to enable tx interrupt for each
> >>   packet. The last patch only enable tx interrupt for the final skb in
> >>   the chain through xmit_more and a new helper to publish current avail
> >>   idx as used event.
> >>
> >> This series fixes several issues of original rfc pointed out by Michael.
> > Could you list the issues, for ease of review?
> 
> Probably just one:
> 
> - Move the virtqueue_disable_cb() from skb_xmit_done() into
> virtnet_poll_tx() under tx lock.

I think I did this already, I'll recheck.

^ permalink raw reply

* Re: [PATCH RFC] virtio_net: enable tx interrupt
From: Michael S. Tsirkin @ 2014-10-15 11:57 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <543E54B4.7040602@redhat.com>

On Wed, Oct 15, 2014 at 07:04:20PM +0800, Jason Wang wrote:
> On 10/15/2014 05:53 AM, Michael S. Tsirkin wrote:
> > On newer hosts that support delayed tx interrupts,
> > we probably don't have much to gain from orphaning
> > packets early.
> >
> > Based on patch by Jason Wang.
> >
> > Note: this will likely degrade performance for hosts without event idx
> > support.  Various fallback options are available, including
> > orphaning conditionally.
> > Testing TBD.
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  drivers/net/virtio_net.c | 119 +++++++++++++++++++++++++++++++++--------------
> >  1 file changed, 83 insertions(+), 36 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 6b6e136..62c059d 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -72,6 +72,8 @@ struct send_queue {
> >  
> >  	/* Name of the send queue: output.$index */
> >  	char name[40];
> > +
> > +	struct napi_struct napi;
> >  };
> >  
> >  /* Internal representation of a receive virtqueue */
> > @@ -211,15 +213,38 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
> >  	return p;
> >  }
> >  
> > +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
> > +{
> > +	struct sk_buff *skb;
> > +	unsigned int len;
> > +	struct virtnet_info *vi = sq->vq->vdev->priv;
> > +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> > +	int sent = 0;
> > +
> > +	while (sent < budget &&
> > +	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > +		pr_debug("Sent skb %p\n", skb);
> > +
> > +		u64_stats_update_begin(&stats->tx_syncp);
> > +		stats->tx_bytes += skb->len;
> > +		stats->tx_packets++;
> > +		u64_stats_update_end(&stats->tx_syncp);
> > +
> > +		dev_kfree_skb_any(skb);
> > +		sent++;
> > +	}
> > +
> > +	return sent;
> > +}
> > +
> >  static void skb_xmit_done(struct virtqueue *vq)
> >  {
> >  	struct virtnet_info *vi = vq->vdev->priv;
> > +	struct send_queue *sq = &vi->sq[vq2txq(vq)];
> >  
> > -	/* Suppress further interrupts. */
> > -	virtqueue_disable_cb(vq);
> > -
> > -	/* We were probably waiting for more output buffers. */
> > -	netif_wake_subqueue(vi->dev, vq2txq(vq));
> > +	if (napi_schedule_prep(&sq->napi)) {
> > +		__napi_schedule(&sq->napi);
> > +	}
> >  }
> >  
> >  static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
> > @@ -766,6 +791,37 @@ again:
> >  	return received;
> >  }
> >  
> > +static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> > +{
> > +	struct send_queue *sq =
> > +		container_of(napi, struct send_queue, napi);
> > +	struct virtnet_info *vi = sq->vq->vdev->priv;
> > +	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
> > +	unsigned int r, sent = 0;
> > +
> > +again:
> > +	__netif_tx_lock(txq, smp_processor_id());
> > +	virtqueue_disable_cb(sq->vq);
> > +	sent += free_old_xmit_skbs(sq, budget - sent);
> > +
> > +	if (sent < budget) {
> > +		r = virtqueue_enable_cb_prepare(sq->vq);
> 
> So even virtqueue_enable_cb_delayed() was used in start_xmit(). This can
> move used index backwards to trigger unnecessary interrupts.

Good point. I'll rework this to use virtqueue_enable_cb_delayed.

virtqueue_enable_cb_delayed_prepare might be nice to
reduce lock contention, but that needs to be benchmarked.


> > +		napi_complete(napi);
> > +		__netif_tx_unlock(txq);
> > +		if (unlikely(virtqueue_poll(sq->vq, r)) &&
> > +		    napi_schedule_prep(napi)) {
> > +			virtqueue_disable_cb(sq->vq);
> > +			__napi_schedule(napi);
> > +			goto again;
> > +		}
> > +	} else {
> > +		__netif_tx_unlock(txq);
> > +	}
> > +
> > +	netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
> > +	return sent;
> > +}
> > +
> >  #ifdef CONFIG_NET_RX_BUSY_POLL
> >  /* must be called with local_bh_disable()d */
> >  static int virtnet_busy_poll(struct napi_struct *napi)
> > @@ -814,30 +870,12 @@ static int virtnet_open(struct net_device *dev)
> >  			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
> >  				schedule_delayed_work(&vi->refill, 0);
> >  		virtnet_napi_enable(&vi->rq[i]);
> > +		napi_enable(&vi->sq[i].napi);
> >  	}
> >  
> >  	return 0;
> >  }
> >  
> > -static void free_old_xmit_skbs(struct send_queue *sq)
> > -{
> > -	struct sk_buff *skb;
> > -	unsigned int len;
> > -	struct virtnet_info *vi = sq->vq->vdev->priv;
> > -	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> > -
> > -	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > -		pr_debug("Sent skb %p\n", skb);
> > -
> > -		u64_stats_update_begin(&stats->tx_syncp);
> > -		stats->tx_bytes += skb->len;
> > -		stats->tx_packets++;
> > -		u64_stats_update_end(&stats->tx_syncp);
> > -
> > -		dev_kfree_skb_any(skb);
> > -	}
> > -}
> > -
> >  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> >  {
> >  	struct skb_vnet_hdr *hdr;
> > @@ -902,7 +940,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> >  		sg_set_buf(sq->sg, hdr, hdr_len);
> >  		num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
> >  	}
> > -	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
> > +
> > +	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb,
> > +				    GFP_ATOMIC);
> >  }
> >  
> >  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > @@ -910,10 +950,9 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	struct virtnet_info *vi = netdev_priv(dev);
> >  	int qnum = skb_get_queue_mapping(skb);
> >  	struct send_queue *sq = &vi->sq[qnum];
> > -	int err;
> > +	int err, qsize = virtqueue_get_vring_size(sq->vq);
> >  
> > -	/* Free up any pending old buffers before queueing new ones. */
> > -	free_old_xmit_skbs(sq);
> 
> I think we'd better keep this. Since it may the tx skb freeing not
> totally depends on the tx interrupt, delayed interrupt may work better
> without damage the latency.

Hmm ok but I think it's best to do it at the end,
after we have sent the packet.
Will update the patch.

> > +	virtqueue_disable_cb(sq->vq);
> >  
> >  	/* Try to transmit */
> >  	err = xmit_skb(sq, skb);
> > @@ -930,22 +969,20 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	}
> >  	virtqueue_kick(sq->vq);
> >  
> > -	/* Don't wait up for transmitted skbs to be freed. */
> > -	skb_orphan(skb);
> > -	nf_reset(skb);
> > -
> >  	/* Apparently nice girls don't return TX_BUSY; stop the queue
> >  	 * before it gets out of hand.  Naturally, this wastes entries. */
> >  	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
> >  		netif_stop_subqueue(dev, qnum);
> >  		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
> >  			/* More just got used, free them then recheck. */
> > -			free_old_xmit_skbs(sq);
> > +			free_old_xmit_skbs(sq, qsize);
> >  			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
> >  				netif_start_subqueue(dev, qnum);
> >  				virtqueue_disable_cb(sq->vq);
> >  			}
> >  		}
> > +	} else if (virtqueue_enable_cb_delayed(sq->vq)) {
> > +		free_old_xmit_skbs(sq, qsize);
> >  	}
> >  
> >  	return NETDEV_TX_OK;
> > @@ -1124,8 +1161,10 @@ static int virtnet_close(struct net_device *dev)
> >  	/* Make sure refill_work doesn't re-enable napi! */
> >  	cancel_delayed_work_sync(&vi->refill);
> >  
> > -	for (i = 0; i < vi->max_queue_pairs; i++)
> > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> >  		napi_disable(&vi->rq[i].napi);
> > +		napi_disable(&vi->sq[i].napi);
> > +	}
> >  
> >  	return 0;
> >  }
> > @@ -1438,8 +1477,10 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> >  {
> >  	int i;
> >  
> > -	for (i = 0; i < vi->max_queue_pairs; i++)
> > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> >  		netif_napi_del(&vi->rq[i].napi);
> > +		netif_napi_del(&vi->sq[i].napi);
> > +	}
> >  
> >  	kfree(vi->rq);
> >  	kfree(vi->sq);
> > @@ -1593,6 +1634,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
> >  		netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
> >  			       napi_weight);
> >  		napi_hash_add(&vi->rq[i].napi);
> > +		netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
> > +			       napi_weight);
> >  
> >  		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
> >  		ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT);
> > @@ -1893,8 +1936,10 @@ static int virtnet_freeze(struct virtio_device *vdev)
> >  	if (netif_running(vi->dev)) {
> >  		for (i = 0; i < vi->max_queue_pairs; i++) {
> >  			napi_disable(&vi->rq[i].napi);
> > +			napi_disable(&vi->sq[i].napi);
> >  			napi_hash_del(&vi->rq[i].napi);
> >  			netif_napi_del(&vi->rq[i].napi);
> > +			netif_napi_del(&vi->sq[i].napi);
> >  		}
> >  	}
> >  
> > @@ -1919,8 +1964,10 @@ static int virtnet_restore(struct virtio_device *vdev)
> >  			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
> >  				schedule_delayed_work(&vi->refill, 0);
> >  
> > -		for (i = 0; i < vi->max_queue_pairs; i++)
> > +		for (i = 0; i < vi->max_queue_pairs; i++) {
> >  			virtnet_napi_enable(&vi->rq[i]);
> > +			napi_enable(&vi->sq[i].napi);
> > +		}
> >  	}
> >  
> >  	netif_device_attach(vi->dev);

^ permalink raw reply

* Re: [PATCH] tcp: refine autocork condition check
From: Eric Dumazet @ 2014-10-15 11:47 UTC (permalink / raw)
  To: Weiping Pan; +Cc: netdev, edumazet
In-Reply-To: <8e0610510498c7f6ecbe2e99ab6044030f93f792.1413369212.git.panweiping3@gmail.com>

On Wed, 2014-10-15 at 18:34 +0800, Weiping Pan wrote:
> Inspired by commit b2532eb9abd8 (tcp: fix ooo_okay setting vs Small Queues).
> 
> The last check in tcp_should_autocork() was meant to check that whether we
> only have an ACK in Qdisc/NIC queues, or if TX completion was delayed after we
> processed ACK packet, if so, we should push the packet immediately instead of
> corking it.
> Therefore we should compare sk_wmem_alloc with SKB_TRUESIZE(1) instead of
> skb->truesize.
> 
> After this patch, tcp should have more chances to be corked, and the
> performance should be a little better.  And netperf shows that this patch
> works as expected.
> 
> ./super_netperf.sh 300 -H 10.16.42.249 -t TCP_STREAM -- -m 1 -M 1
> 	      speed	TCPAutoCorking
> Before patch: 169.38    222278
> After patch:  173.27    232988
> 

I do not see how this patch changes anything on this workload, I suspect
noise in your tests ? Full nstat output would give some hints maybe.

TCP_STREAM netperfs send no ACK packets at all.

I am concerned that this patch adds some latencies, and this wont be
seen with your TCP_STREAM test.

Autocorking is a trade off between throughput and latencies.

We need extensive tests, using TCP_RR with various sizes.

Existing behavior is telling that if a prior packet is in qdisc, and
this skb has a bigger truesize, we do not autocork.


In practice, you might hold now packets that are quite big, (more than
SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER) bytes of payload.

Typical cases is applications using two writes, one to send a small
header, one for the body of the request/answer.

Existing code is better because we allow the second send() to be pushed
to the qdisc/NIC, before first send is TX completed.

^ permalink raw reply

* Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
From: Michael S. Tsirkin @ 2014-10-15 11:43 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
In-Reply-To: <543E5347.7060106@redhat.com>

On Wed, Oct 15, 2014 at 06:58:15PM +0800, Jason Wang wrote:
> On 10/15/2014 06:41 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 15, 2014 at 06:19:15PM +0800, Jason Wang wrote:
> >> On 10/15/2014 05:28 PM, Michael S. Tsirkin wrote:
> >>> On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:
> >>>> This patch introduces virtio_enable_cb_avail() to publish avail idx
> >>>> and used event. This could be used by batched buffer submitting to
> >>>> reduce the number of tx interrupts.
> >>>>
> >>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
> >>>> Cc: Michael S. Tsirkin <mst@redhat.com>
> >>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>>> ---
> >>>>  drivers/virtio/virtio_ring.c |   22 ++++++++++++++++++++--
> >>>>  include/linux/virtio.h       |    2 ++
> >>>>  2 files changed, 22 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> >>>> index 1b3929f..d67fbf8 100644
> >>>> --- a/drivers/virtio/virtio_ring.c
> >>>> +++ b/drivers/virtio/virtio_ring.c
> >>>> @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> >>>>  	 * entry. Always do both to keep code simple. */
> >>>>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> >>>>  	/* Make sure used event never go backwards */
> >>>> -	if (!vring_need_event(vring_used_event(&vq->vring),
> >>>> -			      vq->vring.avail->idx, last_used_idx))
> >>>> +	if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
> >>>> +	    !vring_need_event(vring_used_event(&vq->vring),
> >>>> +			      vq->vring.avail->idx, last_used_idx)) {
> >>>>  		vring_used_event(&vq->vring) = last_used_idx;
> >>>> +	}
> >>>>  	END_USE(vq);
> >>>>  	return last_used_idx;
> >>>>  }
> >>>>  EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
> >>>>
> >>> I see you are also changing virtqueue_enable_cb_prepare, why?
> >> This is also used to prevent it from moving the used event backwards.
> >> This may happens when we handle tx napi after we publish avail idx as
> >> used event (virtqueue_enable_cb_avail() was called).
> > So it's wrong exactly in the same way.
> >
> > But also, please document this stuff, don't put
> > unrelated changes in a patch called "introduce
> > virtqueue_enable_cb_avail".
> >
> >
> >>>> +bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
> >>>> +{
> >>>> +	struct vring_virtqueue *vq = to_vvq(_vq);
> >>>> +	bool ret;
> >>>> +
> >>>> +	START_USE(vq);
> >>>> +	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> >>>> +	vring_used_event(&vq->vring) = vq->vring.avail->idx;
> >>>> +	ret = vring_need_event(vq->vring.avail->idx,
> >>>> +			       vq->last_used_idx, vq->vring.used->idx);
> >>>> +	END_USE(vq);
> >>>> +
> >>>> +	return ret;
> >>>> +}
> >>>> +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
> >>>> +
> >>>>  /**
> >>>>   * virtqueue_poll - query pending used buffers
> >>>>   * @vq: the struct virtqueue we're talking about.
> >>> Could not figure out what this does.
> >>> Please add documentation.
> >>>
> >> Sure, does something like below explain what does this function do?
> >>
> >> /**                                                                            
> >>
> >>  * virtqueue_enable_cb_avail - restart callbacks after
> >> disable_cb.           
> >>  * @vq: the struct virtqueue we're talking
> >> about.                              
> >>  *                                                                             
> >>
> >>  * This re-enables callbacks but hints to the other side to
> >> delay              
> >>  * interrupts all of the available buffers have been processed;         
> >
> > So this is like virtqueue_enable_cb_delayed but even more
> > aggressive?
> > I think it's too agressive: it's better to wake up guest
> > after you are through most of buffers, but not all,
> > so guest and host can work in parallel.
> 
> Note that:
> 
> - it was only used when there are still few of free slots (which is
> greater than 2 + MAX_SKB_FRAGS)
> - my patch keeps the free_old_xmit_skbs() in the beginning of
> start_xmit(), so the tx skb reclaiming does not depends totally on tx
> interrupt. If more packets comes, we'd expect some of them were freed in
> ndo_start_xmit(). If not, finally we may trigger the
> virtqueue_enable_cb_delayed().
> 
> So probably not as aggressive as it looks. I will do benchmark on this.

Mine too:
        } else if (virtqueue_enable_cb_delayed(sq->vq)) {
                free_old_xmit_skbs(txq, sq, qsize);
        }



> >
> >
> >>  * it returns "false" if there are at least one pending buffer in the
> >> queue,          
> >>  * to detect a possible race between the driver checking for more
> >> work,        
> >>  * and enabling
> >> callbacks.                                                     
> >>  *                                                                             
> >>
> >>  * Caller must ensure we don't call this with other
> >> virtqueue                  
> >>  * operations at the same time (except where
> >> noted).                           
> >>  */
> >>
> >>>> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> >>>> index b46671e..bfaf058 100644
> >>>> --- a/include/linux/virtio.h
> >>>> +++ b/include/linux/virtio.h
> >>>> @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
> >>>>  
> >>>>  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
> >>>>  
> >>>> +bool virtqueue_enable_cb_avail(struct virtqueue *vq);
> >>>> +
> >>>>  bool virtqueue_poll(struct virtqueue *vq, unsigned);
> >>>>  
> >>>>  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> >>>> -- 
> >>>> 1.7.1
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> >>> the body of a message to majordomo@vger.kernel.org
> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>> Please read the FAQ at  http://www.tux.org/lkml/
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply

* Re: something is wrong in commit 971f10eca1 - tcp: better TCP_SKB_CB layout to reduce cache line misses
From: Krzysztof Kolasa @ 2014-10-15 11:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Cong Wang, David Miller, Eric Dumazet, netdev
In-Reply-To: <1413372422.17365.10.camel@edumazet-glaptop2.roam.corp.google.com>

W dniu 15.10.2014 o 13:27, Eric Dumazet pisze:
> On Wed, 2014-10-15 at 12:35 +0200, Krzysztof Kolasa wrote:
>> W dniu 14.10.2014 o 23:43, Eric Dumazet pisze:
>>> On Tue, 2014-10-14 at 14:24 -0700, Cong Wang wrote:
>>>
>>>> Since we are still in merge window, I don't think we have to use
>>>> a one-line fix for a bug introduced in this merge window.
>>> You are clearly refactoring here. Its a nice cleanup.
>>>
>>> If I was the maintainer, I would prefer the one line fix.
>>>
>>> Then when net-next is open, you refactor.
>>>
>>> As I said, I wont argue, do whatever you want.
>>>
>>> Thanks
>>>
>>>
>>>
>> one-line patch not resolve problem, fix created by Cong Wang resolves
>> problem !!!
> Hmm, there should be no difference with either patch.
>
> tcp_v4_rcv()
> ...
> memmove(&TCP_SKB_CB(skb)->header.h4, IPCB(skb),
>          sizeof(struct inet_skb_parm));
> ...
> -> tcp_v4_do_rcv()
>     ->  tcp_v4_hnd_req()
>        -> cookie_v4_check(... , &TCP_SKB_CB(skb)->header.h4.opt)
>
> Hmm...
>
>

on a 32bit system, the patch did not solve the problem :(
I have exactly the same problem as before the patch

I do not understand this, perhaps the problem is hidden somewhere else,
one thing is certain after revert commit 971f10eca1 everything works 
correctly

^ permalink raw reply

* Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
From: Michael S. Tsirkin @ 2014-10-15 11:38 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
In-Reply-To: <543E5019.9020507@redhat.com>

On Wed, Oct 15, 2014 at 06:44:41PM +0800, Jason Wang wrote:
> On 10/15/2014 06:32 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 15, 2014 at 06:13:19PM +0800, Jason Wang wrote:
> >> On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote:
> >>> On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
> >>>> This patch checks the new event idx to make sure used event idx never
> >>>> goes back. This is used to synchronize the calls between
> >>>> virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
> >>>>
> >>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
> >>>> Cc: Michael S. Tsirkin <mst@redhat.com>
> >>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>> the implication being that moving event idx back might cause some race
> >>> condition?  
> >> This will cause race condition when tx interrupt is enabled. Consider
> >> the following cases
> >>
> >> 1) tx napi was scheduled
> >> 2) start_xmit() call virtqueue_enable_cb_delayed() and disable cb, [used
> >> event is vq->last_used_idx + 3/4 pendg bufs]
> >> 3) tx napi enable the callback by virtqueue_enable_cb() [ used event is
> >> vq->last_used_idx ]
> >>  
> >> After step 3, used event was moved back, unnecessary tx interrupt was
> >> triggered.
> > Well unnecessary interrupts are safe.
> 
> But it that is what we want to reduce.

It's all about correctness. I don't think mixing enable_cb
and enable_cb_delayed makes sense, let's just make
virtio behave correctly if that happens, no need to
optimize for that.


> > With your patch caller of virtqueue_enable_cb will not get an
> > interrupt on the next buffer which is not safe.
> >
> > If you don't want an interrupt on the next buffer, don't
> > call virtqueue_enable_cb.
> 
> So something like this patch should be done in virtio core somewhere
> else. Virtio-net can not do this since it does not have the knowledge of
> event index.

Take a look at my patch - no calls to enable_cb, only
enable_cb_delayed, so we should be fine.

> >
> >>> If yes but please describe the race explicitly.
> >>> Is there a bug we need to fix on stable?
> >> Looks not, current code does not have such race condition.
> >>> Please also explicitly describe a configuration that causes event idx
> >>> to go back.
> >>>
> >>> All this info should go in the commit log.
> >> Will do this.
> >>>> ---
> >>>>  drivers/virtio/virtio_ring.c |    7 +++++--
> >>>>  1 files changed, 5 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> >>>> index 3b1f89b..1b3929f 100644
> >>>> --- a/drivers/virtio/virtio_ring.c
> >>>> +++ b/drivers/virtio/virtio_ring.c
> >>>> @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> >>>>  	u16 last_used_idx;
> >>>>  
> >>>>  	START_USE(vq);
> >>>> -
> >>>> +	last_used_idx = vq->last_used_idx;
> >>>>  	/* We optimistically turn back on interrupts, then check if there was
> >>>>  	 * more to do. */
> >>>>  	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
> >>>>  	 * either clear the flags bit or point the event index at the next
> >>>>  	 * entry. Always do both to keep code simple. */
> >>>>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> >>>> -	vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
> >>>> +	/* Make sure used event never go backwards */
> >>> s/go/goes/
> >>>
> >>>> +	if (!vring_need_event(vring_used_event(&vq->vring),
> >>>> +			      vq->vring.avail->idx, last_used_idx))
> >>>> +		vring_used_event(&vq->vring) = last_used_idx;
> >>> The result will be that driver will *not* get an interrupt
> >>> on the next consumed buffer, which is likely not what driver
> >>> intended when it called virtqueue_enable_cb.
> >> This will only happen when we want to delay the interrupt for next few
> >> consumed buffers (virtqueue_enable_cb_delayed() was called). For the
> >> other case, vq->last_used_idx should be ahead of previous used event. Do
> >> you see any other case?
> > Call virtqueue_enable_cb_delayed, later call virtqueue_enable_cb.  If
> > event index is not updated in virtqueue_enable_cb, driver will not get
> > an interrupt on the next buffer.
> 
> This is just what we want I think. The interrupt was not lost but fired
> after 3/4 pending buffers were consumed. Do you see any real issue on this?

Yes, this violates the API. For example device might never
consume the rest of buffers.


> >
> >>> Instead, how about we simply document the requirement that drivers either
> >>> always call virtqueue_enable_cb_delayed or virtqueue_enable_cb
> >>> but not both?
> >> We need call them both when tx interrupt is enabled I believe.
> > Can you pls reply to my patch and document issues you see?
> >
> 
> In the previous reply you said you're using
> virtuqueue_enable_cb_delayed(), so no race in your patch.

OK so you think my patch is also correct, but that yours gives better
efficiency?

-- 
MST

^ permalink raw reply

* Re: [PATCH v2] ipv4: dst_entry leak in ip_append_data()
From: Vasily Averin @ 2014-10-15 11:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
In-Reply-To: <1413365433.12304.53.camel@edumazet-glaptop2.roam.corp.google.com>

On 15.10.2014 13:30, Eric Dumazet wrote:
> On Wed, 2014-10-15 at 10:56 +0400, Vasily Averin wrote:
>> On 15.10.2014 08:46, Eric Dumazet wrote:
>>> On Tue, 2014-10-14 at 08:57 +0400, Vasily Averin wrote:
>>>> v2: adjust the indentation of the arguments __ip_append_data() call
>>>>
>>>> Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
>>>>
>>>> If sk_write_queue is empty ip_append_data() executes ip_setup_cork()
>>>> that "steals" dst entry from rt to cork. Later it calls __ip_append_data()
>>>> that creates skb and adds it to sk_write_queue.
>>>>
>>>> If skb was added successfully following ip_push_pending_frames() call
>>>> reassign dst entries from cork to skb, and kfree_skb frees dst_entry.
>>>>
>>>> However nobody frees stolen dst_entry if skb was not added into sk_write_queue.
>>>
>>> I thought this was done by ip_flush_pending_frames() ?
>>
>> Take look at ip_send_unicast_reply():
> 
> So maybe the bug is here ?

Thank you, I'll remake my patch.

^ permalink raw reply

* Re: something is wrong in commit 971f10eca1 - tcp: better TCP_SKB_CB layout to reduce cache line misses
From: Eric Dumazet @ 2014-10-15 11:27 UTC (permalink / raw)
  To: Krzysztof Kolasa; +Cc: Cong Wang, David Miller, Eric Dumazet, netdev
In-Reply-To: <543E4DD8.80203@winsoft.pl>

On Wed, 2014-10-15 at 12:35 +0200, Krzysztof Kolasa wrote:
> W dniu 14.10.2014 o 23:43, Eric Dumazet pisze:
> > On Tue, 2014-10-14 at 14:24 -0700, Cong Wang wrote:
> >
> >> Since we are still in merge window, I don't think we have to use
> >> a one-line fix for a bug introduced in this merge window.
> > You are clearly refactoring here. Its a nice cleanup.
> >
> > If I was the maintainer, I would prefer the one line fix.
> >
> > Then when net-next is open, you refactor.
> >
> > As I said, I wont argue, do whatever you want.
> >
> > Thanks
> >
> >
> >
> 
> one-line patch not resolve problem, fix created by Cong Wang resolves 
> problem !!!

Hmm, there should be no difference with either patch.

tcp_v4_rcv() 
...
memmove(&TCP_SKB_CB(skb)->header.h4, IPCB(skb),
        sizeof(struct inet_skb_parm));
...
-> tcp_v4_do_rcv() 
   ->  tcp_v4_hnd_req()
      -> cookie_v4_check(... , &TCP_SKB_CB(skb)->header.h4.opt)

Hmm...

^ permalink raw reply

* Re: [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net
From: Jason Wang @ 2014-10-15 11:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
In-Reply-To: <20141015102553.GF25776@redhat.com>

On 10/15/2014 06:25 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 03:25:24PM +0800, Jason Wang wrote:
>> According to David, proper accounting and queueing (at all levels, not
>> just TCP sockets) is more important than trying to skim a bunch of
>> cycles by avoiding TX interrupts.
> He also mentioned we should find other ways to batch
>

Right.
>> Having an event to free the SKB is
>> absolutely essential for the stack to operate correctly.
>>
>> This series tries to enable tx interrupt for virtio-net. The idea is
>> simple: enable tx interrupt and schedule a tx napi to free old xmit
>> skbs.
>>
>> Several notes:
>> - Tx interrupt storm avoidance when queue is about to be full is
>>   kept.
> But queue is typically *not* full. More important to avoid interrupt
> storms in that case IMO.

Yes.
>> Since we may enable callbacks in both ndo_start_xmit() and tx
>>   napi, patch 1 adds a check to make sure used event never go
>>   back. This will let the napi not enable the callbacks wrongly after
>>   delayed callbacks was used.
> So why not just use delayed callbacks?

This means the tx interrupt are coalesced in a somewhat adaptive way.
Need benchmark to see its effect.
>
>> - For bulk dequeuing, there's no need to enable tx interrupt for each
>>   packet. The last patch only enable tx interrupt for the final skb in
>>   the chain through xmit_more and a new helper to publish current avail
>>   idx as used event.
>>
>> This series fixes several issues of original rfc pointed out by Michael.
> Could you list the issues, for ease of review?

Probably just one:

- Move the virtqueue_disable_cb() from skb_xmit_done() into
virtnet_poll_tx() under tx lock.

^ permalink raw reply

* Re: [PATCH RFC net-next] sfc: add support for skb->xmit_more
From: Edward Cree @ 2014-10-15 11:05 UTC (permalink / raw)
  To: David Miller; +Cc: dborkman, nikolay, netdev, sshah, jcooper, linux-net-drivers
In-Reply-To: <20141014.171501.448630399243434370.davem@davemloft.net>

On 14/10/14 22:15, David Miller wrote:
> From: Edward Cree <ecree@solarflare.com>
> Date: Tue, 14 Oct 2014 19:41:37 +0100
>
>> Don't ring the doorbell, and don't do PIO.  This will also prevent
>>  TX Push, because there will be more than one buffer waiting when
>>  the doorbell is rung.
>>
>> Signed-off-by: Edward Cree <ecree@solarflare.com>
> This looks good to me, mind if I apply this now?
I'd rather wait until Jon Cooper's had a chance to look at it; he
understands our TX path better than I.

-Edward

^ permalink raw reply

* Re: [PATCH RFC] virtio_net: enable tx interrupt
From: Jason Wang @ 2014-10-15 11:04 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel; +Cc: netdev, virtualization
In-Reply-To: <1413323524-23380-1-git-send-email-mst@redhat.com>

On 10/15/2014 05:53 AM, Michael S. Tsirkin wrote:
> On newer hosts that support delayed tx interrupts,
> we probably don't have much to gain from orphaning
> packets early.
>
> Based on patch by Jason Wang.
>
> Note: this will likely degrade performance for hosts without event idx
> support.  Various fallback options are available, including
> orphaning conditionally.
> Testing TBD.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  drivers/net/virtio_net.c | 119 +++++++++++++++++++++++++++++++++--------------
>  1 file changed, 83 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 6b6e136..62c059d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -72,6 +72,8 @@ struct send_queue {
>  
>  	/* Name of the send queue: output.$index */
>  	char name[40];
> +
> +	struct napi_struct napi;
>  };
>  
>  /* Internal representation of a receive virtqueue */
> @@ -211,15 +213,38 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
>  	return p;
>  }
>  
> +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
> +{
> +	struct sk_buff *skb;
> +	unsigned int len;
> +	struct virtnet_info *vi = sq->vq->vdev->priv;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +	int sent = 0;
> +
> +	while (sent < budget &&
> +	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> +		pr_debug("Sent skb %p\n", skb);
> +
> +		u64_stats_update_begin(&stats->tx_syncp);
> +		stats->tx_bytes += skb->len;
> +		stats->tx_packets++;
> +		u64_stats_update_end(&stats->tx_syncp);
> +
> +		dev_kfree_skb_any(skb);
> +		sent++;
> +	}
> +
> +	return sent;
> +}
> +
>  static void skb_xmit_done(struct virtqueue *vq)
>  {
>  	struct virtnet_info *vi = vq->vdev->priv;
> +	struct send_queue *sq = &vi->sq[vq2txq(vq)];
>  
> -	/* Suppress further interrupts. */
> -	virtqueue_disable_cb(vq);
> -
> -	/* We were probably waiting for more output buffers. */
> -	netif_wake_subqueue(vi->dev, vq2txq(vq));
> +	if (napi_schedule_prep(&sq->napi)) {
> +		__napi_schedule(&sq->napi);
> +	}
>  }
>  
>  static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
> @@ -766,6 +791,37 @@ again:
>  	return received;
>  }
>  
> +static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> +{
> +	struct send_queue *sq =
> +		container_of(napi, struct send_queue, napi);
> +	struct virtnet_info *vi = sq->vq->vdev->priv;
> +	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
> +	unsigned int r, sent = 0;
> +
> +again:
> +	__netif_tx_lock(txq, smp_processor_id());
> +	virtqueue_disable_cb(sq->vq);
> +	sent += free_old_xmit_skbs(sq, budget - sent);
> +
> +	if (sent < budget) {
> +		r = virtqueue_enable_cb_prepare(sq->vq);

So even virtqueue_enable_cb_delayed() was used in start_xmit(). This can
move used index backwards to trigger unnecessary interrupts.
> +		napi_complete(napi);
> +		__netif_tx_unlock(txq);
> +		if (unlikely(virtqueue_poll(sq->vq, r)) &&
> +		    napi_schedule_prep(napi)) {
> +			virtqueue_disable_cb(sq->vq);
> +			__napi_schedule(napi);
> +			goto again;
> +		}
> +	} else {
> +		__netif_tx_unlock(txq);
> +	}
> +
> +	netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
> +	return sent;
> +}
> +
>  #ifdef CONFIG_NET_RX_BUSY_POLL
>  /* must be called with local_bh_disable()d */
>  static int virtnet_busy_poll(struct napi_struct *napi)
> @@ -814,30 +870,12 @@ static int virtnet_open(struct net_device *dev)
>  			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
>  				schedule_delayed_work(&vi->refill, 0);
>  		virtnet_napi_enable(&vi->rq[i]);
> +		napi_enable(&vi->sq[i].napi);
>  	}
>  
>  	return 0;
>  }
>  
> -static void free_old_xmit_skbs(struct send_queue *sq)
> -{
> -	struct sk_buff *skb;
> -	unsigned int len;
> -	struct virtnet_info *vi = sq->vq->vdev->priv;
> -	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> -
> -	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> -		pr_debug("Sent skb %p\n", skb);
> -
> -		u64_stats_update_begin(&stats->tx_syncp);
> -		stats->tx_bytes += skb->len;
> -		stats->tx_packets++;
> -		u64_stats_update_end(&stats->tx_syncp);
> -
> -		dev_kfree_skb_any(skb);
> -	}
> -}
> -
>  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>  {
>  	struct skb_vnet_hdr *hdr;
> @@ -902,7 +940,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>  		sg_set_buf(sq->sg, hdr, hdr_len);
>  		num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
>  	}
> -	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
> +
> +	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb,
> +				    GFP_ATOMIC);
>  }
>  
>  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> @@ -910,10 +950,9 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	struct virtnet_info *vi = netdev_priv(dev);
>  	int qnum = skb_get_queue_mapping(skb);
>  	struct send_queue *sq = &vi->sq[qnum];
> -	int err;
> +	int err, qsize = virtqueue_get_vring_size(sq->vq);
>  
> -	/* Free up any pending old buffers before queueing new ones. */
> -	free_old_xmit_skbs(sq);

I think we'd better keep this. Since it may the tx skb freeing not
totally depends on the tx interrupt, delayed interrupt may work better
without damage the latency.
> +	virtqueue_disable_cb(sq->vq);
>  
>  	/* Try to transmit */
>  	err = xmit_skb(sq, skb);
> @@ -930,22 +969,20 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	}
>  	virtqueue_kick(sq->vq);
>  
> -	/* Don't wait up for transmitted skbs to be freed. */
> -	skb_orphan(skb);
> -	nf_reset(skb);
> -
>  	/* Apparently nice girls don't return TX_BUSY; stop the queue
>  	 * before it gets out of hand.  Naturally, this wastes entries. */
>  	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
>  		netif_stop_subqueue(dev, qnum);
>  		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
>  			/* More just got used, free them then recheck. */
> -			free_old_xmit_skbs(sq);
> +			free_old_xmit_skbs(sq, qsize);
>  			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
>  				netif_start_subqueue(dev, qnum);
>  				virtqueue_disable_cb(sq->vq);
>  			}
>  		}
> +	} else if (virtqueue_enable_cb_delayed(sq->vq)) {
> +		free_old_xmit_skbs(sq, qsize);
>  	}
>  
>  	return NETDEV_TX_OK;
> @@ -1124,8 +1161,10 @@ static int virtnet_close(struct net_device *dev)
>  	/* Make sure refill_work doesn't re-enable napi! */
>  	cancel_delayed_work_sync(&vi->refill);
>  
> -	for (i = 0; i < vi->max_queue_pairs; i++)
> +	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		napi_disable(&vi->rq[i].napi);
> +		napi_disable(&vi->sq[i].napi);
> +	}
>  
>  	return 0;
>  }
> @@ -1438,8 +1477,10 @@ static void virtnet_free_queues(struct virtnet_info *vi)
>  {
>  	int i;
>  
> -	for (i = 0; i < vi->max_queue_pairs; i++)
> +	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		netif_napi_del(&vi->rq[i].napi);
> +		netif_napi_del(&vi->sq[i].napi);
> +	}
>  
>  	kfree(vi->rq);
>  	kfree(vi->sq);
> @@ -1593,6 +1634,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
>  		netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
>  			       napi_weight);
>  		napi_hash_add(&vi->rq[i].napi);
> +		netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
> +			       napi_weight);
>  
>  		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
>  		ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT);
> @@ -1893,8 +1936,10 @@ static int virtnet_freeze(struct virtio_device *vdev)
>  	if (netif_running(vi->dev)) {
>  		for (i = 0; i < vi->max_queue_pairs; i++) {
>  			napi_disable(&vi->rq[i].napi);
> +			napi_disable(&vi->sq[i].napi);
>  			napi_hash_del(&vi->rq[i].napi);
>  			netif_napi_del(&vi->rq[i].napi);
> +			netif_napi_del(&vi->sq[i].napi);
>  		}
>  	}
>  
> @@ -1919,8 +1964,10 @@ static int virtnet_restore(struct virtio_device *vdev)
>  			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
>  				schedule_delayed_work(&vi->refill, 0);
>  
> -		for (i = 0; i < vi->max_queue_pairs; i++)
> +		for (i = 0; i < vi->max_queue_pairs; i++) {
>  			virtnet_napi_enable(&vi->rq[i]);
> +			napi_enable(&vi->sq[i].napi);
> +		}
>  	}
>  
>  	netif_device_attach(vi->dev);

^ permalink raw reply

* Re: [PATCH v3 net] ixgbe: check adapter->vfinfo before dereference
From: Jeff Kirsher @ 2014-10-15 11:00 UTC (permalink / raw)
  To: Thierry Herbelot; +Cc: Jesse Brandeburg, Bruce Allan, netdev, emil.s.tantilov
In-Reply-To: <1413367080-31540-1-git-send-email-thierry.herbelot@6wind.com>

[-- Attachment #1: Type: text/plain, Size: 811 bytes --]

On Wed, 2014-10-15 at 11:58 +0200, Thierry Herbelot wrote:
> this protects against the following panic:
> (before a VF was actually created on p96p1 PF Ethernet port)
> 
> ip link set p96p1 vf 0 spoofchk off
> BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000052
> IP: [<ffffffffa044a1c1>] ixgbe_ndo_set_vf_spoofchk+0x51/0x150 [ixgbe]
> 
> Signed-off-by: Thierry Herbelot <thierry.herbelot@6wind.com>
> ---
> 
> v2:
>   compilation fixes
> 
> v3:
>   remove checks in functions where vfinfo is known not to be NULL
>   return -EINVAL as error code
> 
>  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |   42
> ++++++++++++++++++++++--
>  1 file changed, 40 insertions(+), 2 deletions(-)

Thanks Thierry, I have added this patch to my queue (and dropped your
v2).

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox