* [PATCH net-next 0/6] mvneta driver performance improvements
@ 2014-01-12 11:24 Willy Tarreau
2014-01-12 11:24 ` [PATCH net-next 1/6] net: mvneta: remove tests for impossible cases in the tx_done path Willy Tarreau
` (6 more replies)
0 siblings, 7 replies; 14+ messages in thread
From: Willy Tarreau @ 2014-01-12 11:24 UTC (permalink / raw)
To: davem; +Cc: netdev, Willy Tarreau
Hi,
this patch series implements several performance improvements on the
mvneta driver.
- The first 3 patches are essentially cleanups: code deduplication
and minor optimizations to avoid re-fetching a value (the status
word) we already have.
- patch 4 changes the Rx descriptor prefetch from the current one to
the next one. In benchmarks, this results in about a 1% general
performance increase on HTTP traffic, probably because prefetching
the current descriptor does not leave enough time between the start
of the prefetch and its use.
- patch 5 implements support for build_skb() on Rx path. The driver
now preallocates frags instead of skbs and builds an skb just before
delivering it. This results in a 2% performance increase on HTTP
traffic, and up to 5% on small packet Rx rate.
- patch 6 implements rx_copybreak for small packets (256 bytes). It
avoids a dma_map_single()/dma_unmap_single() pair and increases the Rx
rate by 16.4%, from 486kpps to 573kpps. Further improvements up to
711kpps are possible depending on how the DMA is used.
This patch series depends on the previous series of fixes and is only
for the net-next tree.
Thanks!
Willy
---
Willy Tarreau (6):
net: mvneta: remove tests for impossible cases in the tx_done path
net: mvneta: factor rx refilling code
net: mvneta: simplify access to the rx descriptor status
net: mvneta: prefetch next rx descriptor instead of current one
net: mvneta: convert to build_skb()
net: mvneta: implement rx_copybreak
drivers/net/ethernet/marvell/mvneta.c | 152 +++++++++++++++++++++-------------
1 file changed, 95 insertions(+), 57 deletions(-)
--
1.7.12.2.21.g234cd45.dirty
* [PATCH net-next 1/6] net: mvneta: remove tests for impossible cases in the tx_done path
2014-01-12 11:24 [PATCH net-next 0/6] mvneta driver performance improvements Willy Tarreau
@ 2014-01-12 11:24 ` Willy Tarreau
2014-01-12 11:24 ` [PATCH net-next 2/6] net: mvneta: factor rx refilling code Willy Tarreau
` (5 subsequent siblings)
6 siblings, 0 replies; 14+ messages in thread
From: Willy Tarreau @ 2014-01-12 11:24 UTC (permalink / raw)
To: davem; +Cc: netdev, Willy Tarreau, Thomas Petazzoni, Gregory CLEMENT
Currently, mvneta_txq_bufs_free() calls mvneta_tx_done_policy() with
a non-zero cause to retrieve the pointer to the next queue to process.
It performs useless tests on the returned queue number and on the
pointer, both of which are well defined within a known, limited set.
This code path is fast, although not critical. Removing the 3 tests
here that the compiler could not optimize away (verified) is still
desirable.
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
drivers/net/ethernet/marvell/mvneta.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index df75a23..2fb9559 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1276,13 +1276,16 @@ static void mvneta_rx_csum(struct mvneta_port *pp,
skb->ip_summed = CHECKSUM_NONE;
}
-/* Return tx queue pointer (find last set bit) according to causeTxDone reg */
+/* Return tx queue pointer (find last set bit) according to <cause> returned
+ * from the tx_done reg. <cause> must not be zero. The return value is always
+ * a valid queue, matching the first bit found in <cause>.
+ */
static struct mvneta_tx_queue *mvneta_tx_done_policy(struct mvneta_port *pp,
u32 cause)
{
int queue = fls(cause) - 1;
- return (queue < 0 || queue >= txq_number) ? NULL : &pp->txqs[queue];
+ return &pp->txqs[queue];
}
/* Free tx queue skbuffs */
@@ -1651,7 +1654,9 @@ static void mvneta_txq_done_force(struct mvneta_port *pp,
txq->txq_get_index = 0;
}
-/* handle tx done - called from tx done timer callback */
+/* Handle tx done - called in softirq context. The <cause_tx_done> argument
+ * must be a valid cause according to MVNETA_TXQ_INTR_MASK_ALL.
+ */
static u32 mvneta_tx_done_gbe(struct mvneta_port *pp, u32 cause_tx_done,
int *tx_todo)
{
@@ -1660,10 +1665,8 @@ static u32 mvneta_tx_done_gbe(struct mvneta_port *pp, u32 cause_tx_done,
struct netdev_queue *nq;
*tx_todo = 0;
- while (cause_tx_done != 0) {
+ while (cause_tx_done) {
txq = mvneta_tx_done_policy(pp, cause_tx_done);
- if (!txq)
- break;
nq = netdev_get_tx_queue(pp->dev, txq->id);
__netif_tx_lock(nq, smp_processor_id());
--
1.7.12.2.21.g234cd45.dirty
* [PATCH net-next 2/6] net: mvneta: factor rx refilling code
2014-01-12 11:24 [PATCH net-next 0/6] mvneta driver performance improvements Willy Tarreau
2014-01-12 11:24 ` [PATCH net-next 1/6] net: mvneta: remove tests for impossible cases in the tx_done path Willy Tarreau
@ 2014-01-12 11:24 ` Willy Tarreau
2014-01-12 11:24 ` [PATCH net-next 3/6] net: mvneta: simplify access to the rx descriptor status Willy Tarreau
` (4 subsequent siblings)
6 siblings, 0 replies; 14+ messages in thread
From: Willy Tarreau @ 2014-01-12 11:24 UTC (permalink / raw)
To: davem; +Cc: netdev, Willy Tarreau, Thomas Petazzoni, Gregory CLEMENT
Make mvneta_rxq_fill() use mvneta_rx_refill() instead of using
duplicate code.
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
drivers/net/ethernet/marvell/mvneta.c | 23 +++--------------------
1 file changed, 3 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 2fb9559..eccafd1 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1969,32 +1969,15 @@ static int mvneta_poll(struct napi_struct *napi, int budget)
static int mvneta_rxq_fill(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
int num)
{
- struct net_device *dev = pp->dev;
int i;
for (i = 0; i < num; i++) {
- struct sk_buff *skb;
- struct mvneta_rx_desc *rx_desc;
- unsigned long phys_addr;
-
- skb = dev_alloc_skb(pp->pkt_size);
- if (!skb) {
- netdev_err(dev, "%s:rxq %d, %d of %d buffs filled\n",
+ memset(rxq->descs + i, 0, sizeof(struct mvneta_rx_desc));
+ if (mvneta_rx_refill(pp, rxq->descs + i) != 0) {
+ netdev_err(pp->dev, "%s:rxq %d, %d of %d buffs filled\n",
__func__, rxq->id, i, num);
break;
}
-
- rx_desc = rxq->descs + i;
- memset(rx_desc, 0, sizeof(struct mvneta_rx_desc));
- phys_addr = dma_map_single(dev->dev.parent, skb->head,
- MVNETA_RX_BUF_SIZE(pp->pkt_size),
- DMA_FROM_DEVICE);
- if (unlikely(dma_mapping_error(dev->dev.parent, phys_addr))) {
- dev_kfree_skb(skb);
- break;
- }
-
- mvneta_rx_desc_fill(rx_desc, phys_addr, (u32)skb);
}
/* Add this number of RX descriptors as non occupied (ready to
--
1.7.12.2.21.g234cd45.dirty
* [PATCH net-next 3/6] net: mvneta: simplify access to the rx descriptor status
2014-01-12 11:24 [PATCH net-next 0/6] mvneta driver performance improvements Willy Tarreau
2014-01-12 11:24 ` [PATCH net-next 1/6] net: mvneta: remove tests for impossible cases in the tx_done path Willy Tarreau
2014-01-12 11:24 ` [PATCH net-next 2/6] net: mvneta: factor rx refilling code Willy Tarreau
@ 2014-01-12 11:24 ` Willy Tarreau
2014-01-12 11:24 ` [PATCH net-next 4/6] net: mvneta: prefetch next rx descriptor instead of current one Willy Tarreau
` (3 subsequent siblings)
6 siblings, 0 replies; 14+ messages in thread
From: Willy Tarreau @ 2014-01-12 11:24 UTC (permalink / raw)
To: davem; +Cc: netdev, Willy Tarreau, Thomas Petazzoni, Gregory CLEMENT
In several places, we already know the value of the rx status, yet we
call functions which dereference the descriptor pointer again just to
get it, and don't need the descriptor for anything else. Simplify this
by passing the status word itself instead of the rx desc pointer.
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
drivers/net/ethernet/marvell/mvneta.c | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index eccafd1..aa3a4f7 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -528,14 +528,14 @@ struct rtnl_link_stats64 *mvneta_get_stats64(struct net_device *dev,
/* Rx descriptors helper methods */
-/* Checks whether the given RX descriptor is both the first and the
- * last descriptor for the RX packet. Each RX packet is currently
+/* Checks whether the RX descriptor having this status is both the first
+ * and the last descriptor for the RX packet. Each RX packet is currently
* received through a single RX descriptor, so not having each RX
* descriptor with its first and last bits set is an error
*/
-static int mvneta_rxq_desc_is_first_last(struct mvneta_rx_desc *desc)
+static int mvneta_rxq_desc_is_first_last(u32 status)
{
- return (desc->status & MVNETA_RXD_FIRST_LAST_DESC) ==
+ return (status & MVNETA_RXD_FIRST_LAST_DESC) ==
MVNETA_RXD_FIRST_LAST_DESC;
}
@@ -1234,10 +1234,10 @@ static void mvneta_rx_error(struct mvneta_port *pp,
{
u32 status = rx_desc->status;
- if (!mvneta_rxq_desc_is_first_last(rx_desc)) {
+ if (!mvneta_rxq_desc_is_first_last(status)) {
netdev_err(pp->dev,
"bad rx status %08x (buffer oversize), size=%d\n",
- rx_desc->status, rx_desc->data_size);
+ status, rx_desc->data_size);
return;
}
@@ -1261,13 +1261,12 @@ static void mvneta_rx_error(struct mvneta_port *pp,
}
}
-/* Handle RX checksum offload */
-static void mvneta_rx_csum(struct mvneta_port *pp,
- struct mvneta_rx_desc *rx_desc,
+/* Handle RX checksum offload based on the descriptor's status */
+static void mvneta_rx_csum(struct mvneta_port *pp, u32 status,
struct sk_buff *skb)
{
- if ((rx_desc->status & MVNETA_RXD_L3_IP4) &&
- (rx_desc->status & MVNETA_RXD_L4_CSUM_OK)) {
+ if ((status & MVNETA_RXD_L3_IP4) &&
+ (status & MVNETA_RXD_L4_CSUM_OK)) {
skb->csum = 0;
skb->ip_summed = CHECKSUM_UNNECESSARY;
return;
@@ -1449,7 +1448,7 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
rx_status = rx_desc->status;
skb = (struct sk_buff *)rx_desc->buf_cookie;
- if (!mvneta_rxq_desc_is_first_last(rx_desc) ||
+ if (!mvneta_rxq_desc_is_first_last(rx_status) ||
(rx_status & MVNETA_RXD_ERR_SUMMARY)) {
dev->stats.rx_errors++;
mvneta_rx_error(pp, rx_desc);
@@ -1472,7 +1471,7 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
skb->protocol = eth_type_trans(skb, dev);
- mvneta_rx_csum(pp, rx_desc, skb);
+ mvneta_rx_csum(pp, rx_status, skb);
napi_gro_receive(&pp->napi, skb);
--
1.7.12.2.21.g234cd45.dirty
* [PATCH net-next 4/6] net: mvneta: prefetch next rx descriptor instead of current one
2014-01-12 11:24 [PATCH net-next 0/6] mvneta driver performance improvements Willy Tarreau
` (2 preceding siblings ...)
2014-01-12 11:24 ` [PATCH net-next 3/6] net: mvneta: simplify access to the rx descriptor status Willy Tarreau
@ 2014-01-12 11:24 ` Willy Tarreau
2014-01-12 11:24 ` [PATCH net-next 5/6] net: mvneta: convert to build_skb() Willy Tarreau
` (2 subsequent siblings)
6 siblings, 0 replies; 14+ messages in thread
From: Willy Tarreau @ 2014-01-12 11:24 UTC (permalink / raw)
To: davem; +Cc: netdev, Willy Tarreau, Thomas Petazzoni, Gregory CLEMENT
Currently, the mvneta driver tries to prefetch the current Rx
descriptor during read. Tests have shown that prefetching the
next one instead increases general performance by about 1% on
HTTP traffic.
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
drivers/net/ethernet/marvell/mvneta.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index aa3a4f7..c7b37e0 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -611,6 +611,7 @@ mvneta_rxq_next_desc_get(struct mvneta_rx_queue *rxq)
int rx_desc = rxq->next_desc_to_proc;
rxq->next_desc_to_proc = MVNETA_QUEUE_NEXT_DESC(rxq, rx_desc);
+ prefetch(rxq->descs + rxq->next_desc_to_proc);
return rxq->descs + rx_desc;
}
@@ -1442,7 +1443,6 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
u32 rx_status;
int rx_bytes, err;
- prefetch(rx_desc);
rx_done++;
rx_filled++;
rx_status = rx_desc->status;
--
1.7.12.2.21.g234cd45.dirty
* [PATCH net-next 5/6] net: mvneta: convert to build_skb()
2014-01-12 11:24 [PATCH net-next 0/6] mvneta driver performance improvements Willy Tarreau
` (3 preceding siblings ...)
2014-01-12 11:24 ` [PATCH net-next 4/6] net: mvneta: prefetch next rx descriptor instead of current one Willy Tarreau
@ 2014-01-12 11:24 ` Willy Tarreau
2014-01-12 11:24 ` [PATCH net-next 6/6] net: mvneta: implement rx_copybreak Willy Tarreau
2014-01-15 1:00 ` [PATCH net-next 0/6] mvneta driver performance improvements David Miller
6 siblings, 0 replies; 14+ messages in thread
From: Willy Tarreau @ 2014-01-12 11:24 UTC (permalink / raw)
To: davem; +Cc: netdev, Willy Tarreau, Thomas Petazzoni, Gregory CLEMENT
Make use of build_skb() to allocate frags on the RX path. When the frag
size is smaller than a page, we can use netdev_alloc_frag(), and we fall
back to kmalloc() for larger sizes. The frag size is stored in the
mvneta_port struct. The alloc/free functions check the frag size to
decide which alloc/free method to use. MTU changes are safe because the
MTU change function stops the device and clears the queues before
applying the change.
With this patch, I observed a reproducible 2% performance improvement on
HTTP-based benchmarks, and 5% on small packet RX rate.
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
drivers/net/ethernet/marvell/mvneta.c | 49 +++++++++++++++++++++++++----------
1 file changed, 35 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index c7b37e0..726a8d2 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -268,6 +268,7 @@ struct mvneta_pcpu_stats {
struct mvneta_port {
int pkt_size;
+ unsigned int frag_size;
void __iomem *base;
struct mvneta_rx_queue *rxqs;
struct mvneta_tx_queue *txqs;
@@ -1332,28 +1333,43 @@ static int mvneta_txq_done(struct mvneta_port *pp,
return tx_done;
}
+static void *mvneta_frag_alloc(const struct mvneta_port *pp)
+{
+ if (likely(pp->frag_size <= PAGE_SIZE))
+ return netdev_alloc_frag(pp->frag_size);
+ else
+ return kmalloc(pp->frag_size, GFP_ATOMIC);
+}
+
+static void mvneta_frag_free(const struct mvneta_port *pp, void *data)
+{
+ if (likely(pp->frag_size <= PAGE_SIZE))
+ put_page(virt_to_head_page(data));
+ else
+ kfree(data);
+}
+
/* Refill processing */
static int mvneta_rx_refill(struct mvneta_port *pp,
struct mvneta_rx_desc *rx_desc)
{
dma_addr_t phys_addr;
- struct sk_buff *skb;
+ void *data;
- skb = netdev_alloc_skb(pp->dev, pp->pkt_size);
- if (!skb)
+ data = mvneta_frag_alloc(pp);
+ if (!data)
return -ENOMEM;
- phys_addr = dma_map_single(pp->dev->dev.parent, skb->head,
+ phys_addr = dma_map_single(pp->dev->dev.parent, data,
MVNETA_RX_BUF_SIZE(pp->pkt_size),
DMA_FROM_DEVICE);
if (unlikely(dma_mapping_error(pp->dev->dev.parent, phys_addr))) {
- dev_kfree_skb(skb);
+ mvneta_frag_free(pp, data);
return -ENOMEM;
}
- mvneta_rx_desc_fill(rx_desc, phys_addr, (u32)skb);
-
+ mvneta_rx_desc_fill(rx_desc, phys_addr, (u32)data);
return 0;
}
@@ -1407,9 +1423,9 @@ static void mvneta_rxq_drop_pkts(struct mvneta_port *pp,
rx_done = mvneta_rxq_busy_desc_num_get(pp, rxq);
for (i = 0; i < rxq->size; i++) {
struct mvneta_rx_desc *rx_desc = rxq->descs + i;
- struct sk_buff *skb = (struct sk_buff *)rx_desc->buf_cookie;
+ void *data = (void *)rx_desc->buf_cookie;
- dev_kfree_skb_any(skb);
+ mvneta_frag_free(pp, data);
dma_unmap_single(pp->dev->dev.parent, rx_desc->buf_phys_addr,
MVNETA_RX_BUF_SIZE(pp->pkt_size), DMA_FROM_DEVICE);
}
@@ -1440,20 +1456,21 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
while (rx_done < rx_todo) {
struct mvneta_rx_desc *rx_desc = mvneta_rxq_next_desc_get(rxq);
struct sk_buff *skb;
+ unsigned char *data;
u32 rx_status;
int rx_bytes, err;
rx_done++;
rx_filled++;
rx_status = rx_desc->status;
- skb = (struct sk_buff *)rx_desc->buf_cookie;
+ data = (unsigned char *)rx_desc->buf_cookie;
if (!mvneta_rxq_desc_is_first_last(rx_status) ||
- (rx_status & MVNETA_RXD_ERR_SUMMARY)) {
+ (rx_status & MVNETA_RXD_ERR_SUMMARY) ||
+ !(skb = build_skb(data, pp->frag_size > PAGE_SIZE ? 0 : pp->frag_size))) {
dev->stats.rx_errors++;
mvneta_rx_error(pp, rx_desc);
- mvneta_rx_desc_fill(rx_desc, rx_desc->buf_phys_addr,
- (u32)skb);
+ /* leave the descriptor untouched */
continue;
}
@@ -1466,7 +1483,7 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
rcvd_bytes += rx_bytes;
/* Linux processing */
- skb_reserve(skb, MVNETA_MH_SIZE);
+ skb_reserve(skb, MVNETA_MH_SIZE + NET_SKB_PAD);
skb_put(skb, rx_bytes);
skb->protocol = eth_type_trans(skb, dev);
@@ -2276,6 +2293,8 @@ static int mvneta_change_mtu(struct net_device *dev, int mtu)
mvneta_cleanup_rxqs(pp);
pp->pkt_size = MVNETA_RX_PKT_SIZE(pp->dev->mtu);
+ pp->frag_size = SKB_DATA_ALIGN(MVNETA_RX_BUF_SIZE(pp->pkt_size)) +
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
ret = mvneta_setup_rxqs(pp);
if (ret) {
@@ -2423,6 +2442,8 @@ static int mvneta_open(struct net_device *dev)
mvneta_mac_addr_set(pp, dev->dev_addr, rxq_def);
pp->pkt_size = MVNETA_RX_PKT_SIZE(pp->dev->mtu);
+ pp->frag_size = SKB_DATA_ALIGN(MVNETA_RX_BUF_SIZE(pp->pkt_size)) +
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
ret = mvneta_setup_rxqs(pp);
if (ret)
--
1.7.12.2.21.g234cd45.dirty
* [PATCH net-next 6/6] net: mvneta: implement rx_copybreak
2014-01-12 11:24 [PATCH net-next 0/6] mvneta driver performance improvements Willy Tarreau
` (4 preceding siblings ...)
2014-01-12 11:24 ` [PATCH net-next 5/6] net: mvneta: convert to build_skb() Willy Tarreau
@ 2014-01-12 11:24 ` Willy Tarreau
2014-01-13 10:13 ` David Laight
2014-01-15 1:00 ` [PATCH net-next 0/6] mvneta driver performance improvements David Miller
6 siblings, 1 reply; 14+ messages in thread
From: Willy Tarreau @ 2014-01-12 11:24 UTC (permalink / raw)
To: davem; +Cc: netdev, Willy Tarreau, Thomas Petazzoni, Gregory CLEMENT
Calling dma_map_single()/dma_unmap_single() is quite expensive compared
to copying a small packet. So let's copy short frames and keep the buffers
mapped. We set the limit to 256 bytes, which seems to give good results
both on the XP-GP board and on the AX3/4.
The Rx small packet rate increased by 16.4% with this change, from
486kpps to 573kpps. It is worth noting that even the call to
dma_sync_single_range_for_cpu() is expensive (300 ns), although less so
than dma_unmap_single(). Without it, the packet rate rises to 711kpps
(+24%). Thus on systems where coherency from device to CPU is
guaranteed by a snoop control unit, this patch should provide even more
gains, and rx_copybreak could probably be increased.
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
drivers/net/ethernet/marvell/mvneta.c | 44 ++++++++++++++++++++++++++++++-----
1 file changed, 38 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 726a8d2..f5fc7a2 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -444,6 +444,8 @@ static int txq_number = 8;
static int rxq_def;
+static int rx_copybreak __read_mostly = 256;
+
#define MVNETA_DRIVER_NAME "mvneta"
#define MVNETA_DRIVER_VERSION "1.0"
@@ -1463,22 +1465,51 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
rx_done++;
rx_filled++;
rx_status = rx_desc->status;
+ rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE);
data = (unsigned char *)rx_desc->buf_cookie;
if (!mvneta_rxq_desc_is_first_last(rx_status) ||
- (rx_status & MVNETA_RXD_ERR_SUMMARY) ||
- !(skb = build_skb(data, pp->frag_size > PAGE_SIZE ? 0 : pp->frag_size))) {
+ (rx_status & MVNETA_RXD_ERR_SUMMARY)) {
+ err_drop_frame:
dev->stats.rx_errors++;
mvneta_rx_error(pp, rx_desc);
/* leave the descriptor untouched */
continue;
}
- dma_unmap_single(pp->dev->dev.parent, rx_desc->buf_phys_addr,
+ if (rx_bytes <= rx_copybreak) {
+ /* better copy a small frame and not unmap the DMA region */
+ skb = netdev_alloc_skb_ip_align(dev, rx_bytes);
+ if (unlikely(!skb))
+ goto err_drop_frame;
+
+ dma_sync_single_range_for_cpu(dev->dev.parent,
+ rx_desc->buf_phys_addr,
+ MVNETA_MH_SIZE + NET_SKB_PAD,
+ rx_bytes,
+ DMA_FROM_DEVICE);
+ memcpy(skb_put(skb, rx_bytes),
+ data + MVNETA_MH_SIZE + NET_SKB_PAD,
+ rx_bytes);
+
+ skb->protocol = eth_type_trans(skb, dev);
+ mvneta_rx_csum(pp, rx_status, skb);
+ napi_gro_receive(&pp->napi, skb);
+
+ rcvd_pkts++;
+ rcvd_bytes += rx_bytes;
+
+ /* leave the descriptor and buffer untouched */
+ continue;
+ }
+
+ skb = build_skb(data, pp->frag_size > PAGE_SIZE ? 0 : pp->frag_size);
+ if (!skb)
+ goto err_drop_frame;
+
+ dma_unmap_single(dev->dev.parent, rx_desc->buf_phys_addr,
MVNETA_RX_BUF_SIZE(pp->pkt_size), DMA_FROM_DEVICE);
- rx_bytes = rx_desc->data_size -
- (ETH_FCS_LEN + MVNETA_MH_SIZE);
rcvd_pkts++;
rcvd_bytes += rx_bytes;
@@ -1495,7 +1526,7 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
/* Refill processing */
err = mvneta_rx_refill(pp, rx_desc);
if (err) {
- netdev_err(pp->dev, "Linux processing - Can't refill\n");
+ netdev_err(dev, "Linux processing - Can't refill\n");
rxq->missed++;
rx_filled--;
}
@@ -2945,3 +2976,4 @@ module_param(rxq_number, int, S_IRUGO);
module_param(txq_number, int, S_IRUGO);
module_param(rxq_def, int, S_IRUGO);
+module_param(rx_copybreak, int, S_IRUGO | S_IWUSR);
--
1.7.12.2.21.g234cd45.dirty
* RE: [PATCH net-next 6/6] net: mvneta: implement rx_copybreak
2014-01-12 11:24 ` [PATCH net-next 6/6] net: mvneta: implement rx_copybreak Willy Tarreau
@ 2014-01-13 10:13 ` David Laight
2014-01-13 10:49 ` Willy Tarreau
0 siblings, 1 reply; 14+ messages in thread
From: David Laight @ 2014-01-13 10:13 UTC (permalink / raw)
To: 'Willy Tarreau', davem@davemloft.net
Cc: netdev@vger.kernel.org, Thomas Petazzoni, Gregory CLEMENT
From: Willy Tarreau
> calling dma_map_single()/dma_unmap_single() is quite expensive compared
> to copying a small packet. So let's copy short frames and keep the buffers
> mapped. We set the limit to 256 bytes which seems to give good results both
> on the XP-GP board and on the AX3/4.
Which architecture is this?
I presume it is one that needs iommu setup and/or cache flushing.
> The Rx small packet rate increased by 16.4% doing this, from 486kpps to
> 573kpps. It is worth noting that even the call to the function
> dma_sync_single_range_for_cpu() is expensive (300 ns) although less
> than dma_unmap_single(). Without it, the packet rate raises to 711kpps
> (+24% more). Thus on systems where coherency from device to CPU is
> guaranteed by a snoop control unit, this patch should provide even more
> gains, and probably rx_copybreak could be increased.
Is that the right way around?
If cache coherency is guaranteed then I'd have thought that the dma sync
would be a nop.
...
> + memcpy(skb_put(skb, rx_bytes),
> + data + MVNETA_MH_SIZE + NET_SKB_PAD,
> + rx_bytes);
You can probably arrange for the copy to be fully aligned since
the partial words at both ends can be safely read and written.
That might speed things up further.
David
* Re: [PATCH net-next 6/6] net: mvneta: implement rx_copybreak
2014-01-13 10:13 ` David Laight
@ 2014-01-13 10:49 ` Willy Tarreau
0 siblings, 0 replies; 14+ messages in thread
From: Willy Tarreau @ 2014-01-13 10:49 UTC (permalink / raw)
To: David Laight
Cc: davem@davemloft.net, netdev@vger.kernel.org, Thomas Petazzoni,
Gregory CLEMENT
Hi David,
On Mon, Jan 13, 2014 at 10:13:16AM +0000, David Laight wrote:
> From: Willy Tarreau
> > calling dma_map_single()/dma_unmap_single() is quite expensive compared
> > to copying a small packet. So let's copy short frames and keep the buffers
> > mapped. We set the limit to 256 bytes which seems to give good results both
> > on the XP-GP board and on the AX3/4.
>
> Which architecture is this?
It's an ARMv7.
> I presume it is one that needs iommu setup and/or cache flushing.
It just has to wait for cache snoop completion.
> > The Rx small packet rate increased by 16.4% doing this, from 486kpps to
> > 573kpps. It is worth noting that even the call to the function
> > dma_sync_single_range_for_cpu() is expensive (300 ns) although less
> > than dma_unmap_single(). Without it, the packet rate raises to 711kpps
> > (+24% more). Thus on systems where coherency from device to CPU is
> > guaranteed by a snoop control unit, this patch should provide even more
> > gains, and probably rx_copybreak could be increased.
>
> Is that the right way around?
> If cache coherency is guaranteed then I'd have thought that the dma sync
> would be a nop.
It's a bit more tricky. I found that the DMA API is not optimal for such
an architecture, because we need to wait *once* for the cache snooping to
complete at the beginning of the Rx loop, and then all other accesses may
be done with a NOP. However, since we don't currently have the ability to
do a first call and replace the other ones with a NOP, we still have to do
a dma_sync_single_for_cpu() for each packet, resulting in waiting for cache
snoop completion for each packet.
I've hacked the DMA API to add support for ops->iobarrier: I test for it
at the beginning of the loop, call it, then avoid doing dma_sync_*
afterwards if it's defined. That way I reach the higher performance
mentioned above. But in my opinion, this is only material for future
discussions.
> ...
> > + memcpy(skb_put(skb, rx_bytes),
> > + data + MVNETA_MH_SIZE + NET_SKB_PAD,
> > + rx_bytes);
>
> You can probably arrange for the copy to be fully aligned since
> the partial words at both ends can be safely read and written.
> That might speed things up further.
In fact it does not, this is what the very first patch did but I did not
see any difference.
Thanks,
Willy
* Re: [PATCH net-next 0/6] mvneta driver performance improvements
2014-01-12 11:24 [PATCH net-next 0/6] mvneta driver performance improvements Willy Tarreau
` (5 preceding siblings ...)
2014-01-12 11:24 ` [PATCH net-next 6/6] net: mvneta: implement rx_copybreak Willy Tarreau
@ 2014-01-15 1:00 ` David Miller
2014-01-15 7:25 ` Willy Tarreau
2014-01-15 7:48 ` Willy Tarreau
6 siblings, 2 replies; 14+ messages in thread
From: David Miller @ 2014-01-15 1:00 UTC (permalink / raw)
To: w; +Cc: netdev
From: Willy Tarreau <w@1wt.eu>
Date: Sun, 12 Jan 2014 12:24:02 +0100
> this patch series implements several performance improvements on the
> mvneta driver.
Ignore that last email, I meant to say that this series did not apply
cleanly to net-next, sorry for the confusion.
* Re: [PATCH net-next 0/6] mvneta driver performance improvements
2014-01-15 1:00 ` [PATCH net-next 0/6] mvneta driver performance improvements David Miller
@ 2014-01-15 7:25 ` Willy Tarreau
2014-01-15 7:48 ` Willy Tarreau
1 sibling, 0 replies; 14+ messages in thread
From: Willy Tarreau @ 2014-01-15 7:25 UTC (permalink / raw)
To: David Miller; +Cc: netdev
On Tue, Jan 14, 2014 at 05:00:26PM -0800, David Miller wrote:
> From: Willy Tarreau <w@1wt.eu>
> Date: Sun, 12 Jan 2014 12:24:02 +0100
>
> > this patch series implements several performance improvements on the
> > mvneta driver.
>
> Ignore that last email, I meant to say that this series did not apply
> cleanly to net-next, sorry for the confusion.
No problem, I'll redo it.
thanks,
Willy
* Re: [PATCH net-next 0/6] mvneta driver performance improvements
2014-01-15 1:00 ` [PATCH net-next 0/6] mvneta driver performance improvements David Miller
2014-01-15 7:25 ` Willy Tarreau
@ 2014-01-15 7:48 ` Willy Tarreau
2014-01-15 8:05 ` David Miller
1 sibling, 1 reply; 14+ messages in thread
From: Willy Tarreau @ 2014-01-15 7:48 UTC (permalink / raw)
To: David Miller; +Cc: netdev
On Tue, Jan 14, 2014 at 05:00:26PM -0800, David Miller wrote:
> From: Willy Tarreau <w@1wt.eu>
> Date: Sun, 12 Jan 2014 12:24:02 +0100
>
> > this patch series implements several performance improvements on the
> > mvneta driver.
>
> Ignore that last email, I meant to say that this series did not apply
> cleanly to net-next, sorry for the confusion.
David, this is because the first series is not applied first. I can
understand it was not very clear (it was just mentioned in the e-mail's
body). Do you prefer me to send you the two series for net-next, or is
it better to wait for the first series to be merged before applying the
second one?
Just tell me what you prefer and I'll adapt.
Thanks,
Willy
* Re: [PATCH net-next 0/6] mvneta driver performance improvements
2014-01-15 7:48 ` Willy Tarreau
@ 2014-01-15 8:05 ` David Miller
2014-01-15 8:18 ` Willy Tarreau
0 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2014-01-15 8:05 UTC (permalink / raw)
To: w; +Cc: netdev
From: Willy Tarreau <w@1wt.eu>
Date: Wed, 15 Jan 2014 08:48:31 +0100
> On Tue, Jan 14, 2014 at 05:00:26PM -0800, David Miller wrote:
>> From: Willy Tarreau <w@1wt.eu>
>> Date: Sun, 12 Jan 2014 12:24:02 +0100
>>
>> > this patch series implements several performance improvements on the
>> > mvneta driver.
>>
>> Ignore that last email, I meant to say that this series did not apply
>> cleanly to net-next, sorry for the confusion.
>
> David, this is because the first series is not applied first. I can
> understand it was not very clear (just mentionned in the e-mail's body).
> Do you prefer me to send you the two series for net-next or is it better
> to wait for the first series to be merged first before applying the
> second one ?
>
> Just tell me what you prefer and I'll adapt.
Hmmm, didn't I apply the first series?
* Re: [PATCH net-next 0/6] mvneta driver performance improvements
2014-01-15 8:05 ` David Miller
@ 2014-01-15 8:18 ` Willy Tarreau
0 siblings, 0 replies; 14+ messages in thread
From: Willy Tarreau @ 2014-01-15 8:18 UTC (permalink / raw)
To: David Miller; +Cc: netdev
On Wed, Jan 15, 2014 at 12:05:40AM -0800, David Miller wrote:
> From: Willy Tarreau <w@1wt.eu>
> Date: Wed, 15 Jan 2014 08:48:31 +0100
>
> > On Tue, Jan 14, 2014 at 05:00:26PM -0800, David Miller wrote:
> >> From: Willy Tarreau <w@1wt.eu>
> >> Date: Sun, 12 Jan 2014 12:24:02 +0100
> >>
> >> > this patch series implements several performance improvements on the
> >> > mvneta driver.
> >>
> >> Ignore that last email, I meant to say that this series did not apply
> >> cleanly to net-next, sorry for the confusion.
> >
> > David, this is because the first series is not applied first. I can
> > understand it was not very clear (just mentionned in the e-mail's body).
> > Do you prefer me to send you the two series for net-next or is it better
> > to wait for the first series to be merged first before applying the
> > second one ?
> >
> > Just tell me what you prefer and I'll adapt.
>
> Hmmm, didn't I apply the first series?
Possible :-)
Willy