* [PATCH net-next 0/7] net: aquantia: RX performance optimization patches
@ 2019-03-23 15:23 Igor Russkikh
2019-03-23 15:23 ` [PATCH net-next 1/7] net: aquantia: optimize rx path using larger preallocated skb len Igor Russkikh
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: Igor Russkikh @ 2019-03-23 15:23 UTC (permalink / raw)
To: David S . Miller; +Cc: netdev@vger.kernel.org, Igor Russkikh
Here is a set of patches targeting for performance improvement
on various platforms and protocols.
Our main target was rx performance on iommu systems, notably
NVIDIA Jetson TX2 and NVIDIA Xavier platforms.
We introduce page reuse strategy to better deal with iommu dma mapping costs.
With it we see 80-90% of page reuse under some test configurations on UDP traffic.
This shows good improvements on other systems with IOMMU hardware, like
AMD Ryzen.
We've also improved TCP LRO configuration parameters, allowing packets to better
coalesce.
Page reuse tests were carried out using iperf3, iperf2, netperf and pktgen.
Mainly on UDP traffic, with various packet lengths.
Jetson TX2, UDP, Default MTU:
RX Lost Datagrams
Before: Max: 69% Min: 68% Avg: 68.5%
After: Max: 41% Min: 38% Avg: 39.2%
Maximum throughput
Before: 1.27 Gbits/sec
After: 2.41 Gbits/sec
AMD Ryzen 5 2400G, UDP, Default MTU:
RX Lost Datagrams
Before: Max: 12% Min: 4.5% Avg: 7.17%
After: Max: 6.2% Min: 2.3% Avg: 4.26%
Igor Russkikh (6):
net: aquantia: optimize rx path using larger preallocated skb len
net: aquantia: optimize rx performance by page reuse strategy
net: aquantia: Introduce rx refill threshold value
net: aquantia: Make RX default frame size 2K
net: aquantia: Increase rx ring default size from 1K to 2K
net: aquantia: enable driver build for arm64 or compile_test
Nikita Danilov (1):
net: aquantia: improve LRO configuration
drivers/net/ethernet/aquantia/Kconfig | 3 +-
.../net/ethernet/aquantia/atlantic/aq_cfg.h | 10 +-
.../net/ethernet/aquantia/atlantic/aq_nic.c | 1 +
.../net/ethernet/aquantia/atlantic/aq_nic.h | 1 +
.../net/ethernet/aquantia/atlantic/aq_ring.c | 187 +++++++++++++-----
.../net/ethernet/aquantia/atlantic/aq_ring.h | 34 +++-
.../net/ethernet/aquantia/atlantic/aq_vec.c | 3 +
.../aquantia/atlantic/hw_atl/hw_atl_a0.c | 4 -
.../aquantia/atlantic/hw_atl/hw_atl_b0.c | 16 +-
.../atlantic/hw_atl/hw_atl_b0_internal.h | 2 +-
.../aquantia/atlantic/hw_atl/hw_atl_llh.c | 15 ++
.../aquantia/atlantic/hw_atl/hw_atl_llh.h | 6 +
.../atlantic/hw_atl/hw_atl_llh_internal.h | 13 ++
13 files changed, 223 insertions(+), 72 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH net-next 1/7] net: aquantia: optimize rx path using larger preallocated skb len
2019-03-23 15:23 [PATCH net-next 0/7] net: aquantia: RX performance optimization patches Igor Russkikh
@ 2019-03-23 15:23 ` Igor Russkikh
2019-03-23 15:23 ` [PATCH net-next 2/7] net: aquantia: optimize rx performance by page reuse strategy Igor Russkikh
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Igor Russkikh @ 2019-03-23 15:23 UTC (permalink / raw)
To: David S . Miller; +Cc: netdev@vger.kernel.org, Igor Russkikh
Atlantic driver used 14 bytes preallocated skb size. That made L3 protocol
processing inefficient because pskb_pull had to fetch all the L3/L4 headers
from extra fragments.
Specially on UDP flows that caused extra packet drops because CPU was
overloaded with pskb_pull.
This patch uses eth_get_headlen for skb preallocation.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
.../net/ethernet/aquantia/atlantic/aq_cfg.h | 2 ++
.../net/ethernet/aquantia/atlantic/aq_ring.c | 27 ++++++++++++-------
2 files changed, 20 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h b/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
index 3944ce7f0870..aba550770adf 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
@@ -38,6 +38,8 @@
#define AQ_CFG_TX_CLEAN_BUDGET 256U
+#define AQ_CFG_RX_HDR_SIZE 256U
+
/* LRO */
#define AQ_CFG_IS_LRO_DEF 1U
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
index 74550ccc7a20..42163b5d1cac 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
@@ -200,17 +200,18 @@ int aq_ring_rx_clean(struct aq_ring_s *self,
int budget)
{
struct net_device *ndev = aq_nic_get_ndev(self->aq_nic);
- int err = 0;
bool is_rsc_completed = true;
+ int err = 0;
for (; (self->sw_head != self->hw_head) && budget;
self->sw_head = aq_ring_next_dx(self, self->sw_head),
--budget, ++(*work_done)) {
struct aq_ring_buff_s *buff = &self->buff_ring[self->sw_head];
+ struct aq_ring_buff_s *buff_ = NULL;
struct sk_buff *skb = NULL;
unsigned int next_ = 0U;
unsigned int i = 0U;
- struct aq_ring_buff_s *buff_ = NULL;
+ u16 hdr_len;
if (buff->is_error) {
__free_pages(buff->page, 0);
@@ -254,20 +255,28 @@ int aq_ring_rx_clean(struct aq_ring_s *self,
err = -ENOMEM;
goto err_exit;
}
-
skb_put(skb, buff->len);
} else {
- skb = netdev_alloc_skb(ndev, ETH_HLEN);
+ skb = napi_alloc_skb(napi, AQ_CFG_RX_HDR_SIZE);
if (unlikely(!skb)) {
err = -ENOMEM;
goto err_exit;
}
- skb_put(skb, ETH_HLEN);
- memcpy(skb->data, page_address(buff->page), ETH_HLEN);
- skb_add_rx_frag(skb, 0, buff->page, ETH_HLEN,
- buff->len - ETH_HLEN,
- SKB_TRUESIZE(buff->len - ETH_HLEN));
+ hdr_len = buff->len;
+ if (hdr_len > AQ_CFG_RX_HDR_SIZE)
+ hdr_len = eth_get_headlen(page_address(buff->page),
+ AQ_CFG_RX_HDR_SIZE);
+
+ memcpy(__skb_put(skb, hdr_len), page_address(buff->page),
+ ALIGN(hdr_len, sizeof(long)));
+
+ if (buff->len - hdr_len > 0) {
+ skb_add_rx_frag(skb, 0, buff->page,
+ hdr_len,
+ buff->len - hdr_len,
+ SKB_TRUESIZE(buff->len - hdr_len));
+ }
if (!buff->is_eop) {
for (i = 1U, next_ = buff->next,
--
2.17.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH net-next 2/7] net: aquantia: optimize rx performance by page reuse strategy
2019-03-23 15:23 [PATCH net-next 0/7] net: aquantia: RX performance optimization patches Igor Russkikh
2019-03-23 15:23 ` [PATCH net-next 1/7] net: aquantia: optimize rx path using larger preallocated skb len Igor Russkikh
@ 2019-03-23 15:23 ` Igor Russkikh
2019-03-23 15:23 ` [PATCH net-next 3/7] net: aquantia: Introduce rx refill threshold value Igor Russkikh
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Igor Russkikh @ 2019-03-23 15:23 UTC (permalink / raw)
To: David S . Miller; +Cc: netdev@vger.kernel.org, Igor Russkikh
We introduce internal aq_rxpage wrapper over regular page
where extra field is tracked: rxpage offset inside of allocated page.
This offset allows to reuse one page for multiple packets.
When needed (for example with large frames processing), allocated
pageorder could be customized. This gives even larger page reuse
efficiency.
page_ref_count is used to track page users. If during rx refill
underlying page has users, we increase pg_off by rx frame size
thus the top half of the page is reused.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
.../net/ethernet/aquantia/atlantic/aq_cfg.h | 2 +
.../net/ethernet/aquantia/atlantic/aq_nic.c | 1 +
.../net/ethernet/aquantia/atlantic/aq_nic.h | 1 +
.../net/ethernet/aquantia/atlantic/aq_ring.c | 166 +++++++++++++-----
.../net/ethernet/aquantia/atlantic/aq_ring.h | 34 ++--
.../net/ethernet/aquantia/atlantic/aq_vec.c | 3 +
.../aquantia/atlantic/hw_atl/hw_atl_a0.c | 4 -
.../aquantia/atlantic/hw_atl/hw_atl_b0.c | 4 -
8 files changed, 152 insertions(+), 63 deletions(-)
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h b/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
index aba550770adf..80c16ab87771 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
@@ -40,6 +40,8 @@
#define AQ_CFG_RX_HDR_SIZE 256U
+#define AQ_CFG_RX_PAGEORDER 0U
+
/* LRO */
#define AQ_CFG_IS_LRO_DEF 1U
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index ff83667410bd..059df86e8e37 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -73,6 +73,7 @@ void aq_nic_cfg_start(struct aq_nic_s *self)
cfg->tx_itr = aq_itr_tx;
cfg->rx_itr = aq_itr_rx;
+ cfg->rxpageorder = AQ_CFG_RX_PAGEORDER;
cfg->is_rss = AQ_CFG_IS_RSS_DEF;
cfg->num_rss_queues = AQ_CFG_NUM_RSS_QUEUES_DEF;
cfg->aq_rss.base_cpu_number = AQ_CFG_RSS_BASE_CPU_NUM_DEF;
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.h b/drivers/net/ethernet/aquantia/atlantic/aq_nic.h
index 8e34c1e49bf2..b1372430f62f 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.h
@@ -31,6 +31,7 @@ struct aq_nic_cfg_s {
u32 itr;
u16 rx_itr;
u16 tx_itr;
+ u32 rxpageorder;
u32 num_rss_queues;
u32 mtu;
u32 flow_control;
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
index 42163b5d1cac..1b258694144c 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
@@ -12,10 +12,89 @@
#include "aq_ring.h"
#include "aq_nic.h"
#include "aq_hw.h"
+#include "aq_hw_utils.h"
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
+static inline void aq_free_rxpage(struct aq_rxpage *rxpage, struct device *dev)
+{
+ unsigned int len = PAGE_SIZE << rxpage->order;
+
+ dma_unmap_page(dev, rxpage->daddr, len, DMA_FROM_DEVICE);
+
+ /* Drop the ref for being in the ring. */
+ __free_pages(rxpage->page, rxpage->order);
+ rxpage->page = NULL;
+}
+
+static int aq_get_rxpage(struct aq_rxpage *rxpage, unsigned int order,
+ struct device *dev)
+{
+ struct page *page;
+ dma_addr_t daddr;
+ int ret = -ENOMEM;
+
+ page = dev_alloc_pages(order);
+ if (unlikely(!page))
+ goto err_exit;
+
+ daddr = dma_map_page(dev, page, 0, PAGE_SIZE << order,
+ DMA_FROM_DEVICE);
+
+ if (unlikely(dma_mapping_error(dev, daddr)))
+ goto free_page;
+
+ rxpage->page = page;
+ rxpage->daddr = daddr;
+ rxpage->order = order;
+ rxpage->pg_off = 0;
+
+ return 0;
+
+free_page:
+ __free_pages(page, order);
+
+err_exit:
+ return ret;
+}
+
+static int aq_get_rxpages(struct aq_ring_s *self, struct aq_ring_buff_s *rxbuf,
+ int order)
+{
+ int ret;
+
+ if (rxbuf->rxdata.page) {
+ /* One means ring is the only user and can reuse */
+ if (page_ref_count(rxbuf->rxdata.page) > 1) {
+ /* Try reuse buffer */
+ rxbuf->rxdata.pg_off += AQ_CFG_RX_FRAME_MAX;
+ if (rxbuf->rxdata.pg_off + AQ_CFG_RX_FRAME_MAX <=
+ (PAGE_SIZE << order)) {
+ self->stats.rx.pg_flips++;
+ } else {
+ /* Buffer exhausted. We have other users and
+ * should release this page and realloc
+ */
+ aq_free_rxpage(&rxbuf->rxdata,
+ aq_nic_get_dev(self->aq_nic));
+ self->stats.rx.pg_losts++;
+ }
+ } else {
+ rxbuf->rxdata.pg_off = 0;
+ self->stats.rx.pg_reuses++;
+ }
+ }
+
+ if (!rxbuf->rxdata.page) {
+ ret = aq_get_rxpage(&rxbuf->rxdata, order,
+ aq_nic_get_dev(self->aq_nic));
+ return ret;
+ }
+
+ return 0;
+}
+
static struct aq_ring_s *aq_ring_alloc(struct aq_ring_s *self,
struct aq_nic_s *aq_nic)
{
@@ -81,6 +160,11 @@ struct aq_ring_s *aq_ring_rx_alloc(struct aq_ring_s *self,
self->idx = idx;
self->size = aq_nic_cfg->rxds;
self->dx_size = aq_nic_cfg->aq_hw_caps->rxd_size;
+ self->page_order = fls(AQ_CFG_RX_FRAME_MAX / PAGE_SIZE +
+ (AQ_CFG_RX_FRAME_MAX % PAGE_SIZE ? 1 : 0)) - 1;
+
+ if (aq_nic_cfg->rxpageorder > self->page_order)
+ self->page_order = aq_nic_cfg->rxpageorder;
self = aq_ring_alloc(self, aq_nic);
if (!self) {
@@ -213,10 +297,8 @@ int aq_ring_rx_clean(struct aq_ring_s *self,
unsigned int i = 0U;
u16 hdr_len;
- if (buff->is_error) {
- __free_pages(buff->page, 0);
+ if (buff->is_error)
continue;
- }
if (buff->is_cleaned)
continue;
@@ -246,16 +328,22 @@ int aq_ring_rx_clean(struct aq_ring_s *self,
}
}
+ dma_sync_single_range_for_cpu(aq_nic_get_dev(self->aq_nic),
+ buff->rxdata.daddr,
+ buff->rxdata.pg_off,
+ buff->len, DMA_FROM_DEVICE);
+
/* for single fragment packets use build_skb() */
if (buff->is_eop &&
buff->len <= AQ_CFG_RX_FRAME_MAX - AQ_SKB_ALIGN) {
- skb = build_skb(page_address(buff->page),
+ skb = build_skb(aq_buf_vaddr(&buff->rxdata),
AQ_CFG_RX_FRAME_MAX);
if (unlikely(!skb)) {
err = -ENOMEM;
goto err_exit;
}
skb_put(skb, buff->len);
+ page_ref_inc(buff->rxdata.page);
} else {
skb = napi_alloc_skb(napi, AQ_CFG_RX_HDR_SIZE);
if (unlikely(!skb)) {
@@ -265,34 +353,41 @@ int aq_ring_rx_clean(struct aq_ring_s *self,
hdr_len = buff->len;
if (hdr_len > AQ_CFG_RX_HDR_SIZE)
- hdr_len = eth_get_headlen(page_address(buff->page),
+ hdr_len = eth_get_headlen(aq_buf_vaddr(&buff->rxdata),
AQ_CFG_RX_HDR_SIZE);
- memcpy(__skb_put(skb, hdr_len), page_address(buff->page),
+ memcpy(__skb_put(skb, hdr_len), aq_buf_vaddr(&buff->rxdata),
ALIGN(hdr_len, sizeof(long)));
if (buff->len - hdr_len > 0) {
- skb_add_rx_frag(skb, 0, buff->page,
- hdr_len,
+ skb_add_rx_frag(skb, 0, buff->rxdata.page,
+ buff->rxdata.pg_off + hdr_len,
buff->len - hdr_len,
- SKB_TRUESIZE(buff->len - hdr_len));
+ AQ_CFG_RX_FRAME_MAX);
+ page_ref_inc(buff->rxdata.page);
}
if (!buff->is_eop) {
- for (i = 1U, next_ = buff->next,
- buff_ = &self->buff_ring[next_];
- true; next_ = buff_->next,
- buff_ = &self->buff_ring[next_], ++i) {
- skb_add_rx_frag(skb, i,
- buff_->page, 0,
+ buff_ = buff;
+ i = 1U;
+ do {
+ next_ = buff_->next,
+ buff_ = &self->buff_ring[next_];
+
+ dma_sync_single_range_for_cpu(
+ aq_nic_get_dev(self->aq_nic),
+ buff_->rxdata.daddr,
+ buff_->rxdata.pg_off,
+ buff_->len,
+ DMA_FROM_DEVICE);
+ skb_add_rx_frag(skb, i++,
+ buff_->rxdata.page,
+ buff_->rxdata.pg_off,
buff_->len,
- SKB_TRUESIZE(buff->len -
- ETH_HLEN));
+ AQ_CFG_RX_FRAME_MAX);
+ page_ref_inc(buff_->rxdata.page);
buff_->is_cleaned = 1;
-
- if (buff_->is_eop)
- break;
- }
+ } while (!buff_->is_eop);
}
}
@@ -318,8 +413,7 @@ int aq_ring_rx_clean(struct aq_ring_s *self,
int aq_ring_rx_fill(struct aq_ring_s *self)
{
- unsigned int pages_order = fls(AQ_CFG_RX_FRAME_MAX / PAGE_SIZE +
- (AQ_CFG_RX_FRAME_MAX % PAGE_SIZE ? 1 : 0)) - 1;
+ unsigned int page_order = self->page_order;
struct aq_ring_buff_s *buff = NULL;
int err = 0;
int i = 0;
@@ -331,30 +425,15 @@ int aq_ring_rx_fill(struct aq_ring_s *self)
buff->flags = 0U;
buff->len = AQ_CFG_RX_FRAME_MAX;
- buff->page = alloc_pages(GFP_ATOMIC | __GFP_COMP, pages_order);
- if (!buff->page) {
- err = -ENOMEM;
+ err = aq_get_rxpages(self, buff, page_order);
+ if (err)
goto err_exit;
- }
-
- buff->pa = dma_map_page(aq_nic_get_dev(self->aq_nic),
- buff->page, 0,
- AQ_CFG_RX_FRAME_MAX, DMA_FROM_DEVICE);
-
- if (dma_mapping_error(aq_nic_get_dev(self->aq_nic), buff->pa)) {
- err = -ENOMEM;
- goto err_exit;
- }
+ buff->pa = aq_buf_daddr(&buff->rxdata);
buff = NULL;
}
err_exit:
- if (err < 0) {
- if (buff && buff->page)
- __free_pages(buff->page, 0);
- }
-
return err;
}
@@ -367,10 +446,7 @@ void aq_ring_rx_deinit(struct aq_ring_s *self)
self->sw_head = aq_ring_next_dx(self, self->sw_head)) {
struct aq_ring_buff_s *buff = &self->buff_ring[self->sw_head];
- dma_unmap_page(aq_nic_get_dev(self->aq_nic), buff->pa,
- AQ_CFG_RX_FRAME_MAX, DMA_FROM_DEVICE);
-
- __free_pages(buff->page, 0);
+ aq_free_rxpage(&buff->rxdata, aq_nic_get_dev(self->aq_nic));
}
err_exit:;
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.h b/drivers/net/ethernet/aquantia/atlantic/aq_ring.h
index ac1329f4051d..cfffc301e746 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.h
@@ -17,6 +17,13 @@
struct page;
struct aq_nic_cfg_s;
+struct aq_rxpage {
+ struct page *page;
+ dma_addr_t daddr;
+ unsigned int order;
+ unsigned int pg_off;
+};
+
/* TxC SOP DX EOP
* +----------+----------+----------+-----------
* 8bytes|len l3,l4 | pa | pa | pa
@@ -31,28 +38,21 @@ struct aq_nic_cfg_s;
*/
struct __packed aq_ring_buff_s {
union {
+ /* RX/TX */
+ dma_addr_t pa;
/* RX */
struct {
u32 rss_hash;
u16 next;
u8 is_hash_l4;
u8 rsvd1;
- struct page *page;
+ struct aq_rxpage rxdata;
};
/* EOP */
struct {
dma_addr_t pa_eop;
struct sk_buff *skb;
};
- /* DX */
- struct {
- dma_addr_t pa;
- };
- /* SOP */
- struct {
- dma_addr_t pa_sop;
- u32 len_pkt_sop;
- };
/* TxC */
struct {
u32 mss;
@@ -91,6 +91,9 @@ struct aq_ring_stats_rx_s {
u64 bytes;
u64 lro_packets;
u64 jumbo_packets;
+ u64 pg_losts;
+ u64 pg_flips;
+ u64 pg_reuses;
};
struct aq_ring_stats_tx_s {
@@ -116,6 +119,7 @@ struct aq_ring_s {
unsigned int size; /* descriptors number */
unsigned int dx_size; /* TX or RX descriptor size, */
/* stored here for fater math */
+ unsigned int page_order;
union aq_ring_stats_s stats;
dma_addr_t dx_ring_pa;
};
@@ -126,6 +130,16 @@ struct aq_ring_param_s {
cpumask_t affinity_mask;
};
+static inline void *aq_buf_vaddr(struct aq_rxpage *rxpage)
+{
+ return page_to_virt(rxpage->page) + rxpage->pg_off;
+}
+
+static inline dma_addr_t aq_buf_daddr(struct aq_rxpage *rxpage)
+{
+ return rxpage->daddr + rxpage->pg_off;
+}
+
static inline unsigned int aq_ring_next_dx(struct aq_ring_s *self,
unsigned int dx)
{
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_vec.c b/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
index d335c334fa56..a2e4ca1782ae 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
@@ -353,6 +353,9 @@ void aq_vec_add_stats(struct aq_vec_s *self,
stats_rx->errors += rx->errors;
stats_rx->jumbo_packets += rx->jumbo_packets;
stats_rx->lro_packets += rx->lro_packets;
+ stats_rx->pg_losts += rx->pg_losts;
+ stats_rx->pg_flips += rx->pg_flips;
+ stats_rx->pg_reuses += rx->pg_reuses;
stats_tx->packets += tx->packets;
stats_tx->bytes += tx->bytes;
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c
index f6f8338153a2..65ffaa7ad69e 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c
@@ -619,8 +619,6 @@ static int hw_atl_a0_hw_ring_tx_head_update(struct aq_hw_s *self,
static int hw_atl_a0_hw_ring_rx_receive(struct aq_hw_s *self,
struct aq_ring_s *ring)
{
- struct device *ndev = aq_nic_get_dev(ring->aq_nic);
-
for (; ring->hw_head != ring->sw_tail;
ring->hw_head = aq_ring_next_dx(ring, ring->hw_head)) {
struct aq_ring_buff_s *buff = NULL;
@@ -687,8 +685,6 @@ static int hw_atl_a0_hw_ring_rx_receive(struct aq_hw_s *self,
is_err &= ~0x18U;
is_err &= ~0x04U;
- dma_unmap_page(ndev, buff->pa, buff->len, DMA_FROM_DEVICE);
-
if (is_err || rxd_wb->type & 0x1000U) {
/* status error or DMA error */
buff->is_error = 1U;
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
index b31dba1b1a55..f4e895906b1a 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
@@ -654,8 +654,6 @@ static int hw_atl_b0_hw_ring_tx_head_update(struct aq_hw_s *self,
static int hw_atl_b0_hw_ring_rx_receive(struct aq_hw_s *self,
struct aq_ring_s *ring)
{
- struct device *ndev = aq_nic_get_dev(ring->aq_nic);
-
for (; ring->hw_head != ring->sw_tail;
ring->hw_head = aq_ring_next_dx(ring, ring->hw_head)) {
struct aq_ring_buff_s *buff = NULL;
@@ -697,8 +695,6 @@ static int hw_atl_b0_hw_ring_rx_receive(struct aq_hw_s *self,
buff->is_cso_err = 0U;
}
- dma_unmap_page(ndev, buff->pa, buff->len, DMA_FROM_DEVICE);
-
if ((rx_stat & BIT(0)) || rxd_wb->type & 0x1000U) {
/* MAC error or DMA error */
buff->is_error = 1U;
--
2.17.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH net-next 3/7] net: aquantia: Introduce rx refill threshold value
2019-03-23 15:23 [PATCH net-next 0/7] net: aquantia: RX performance optimization patches Igor Russkikh
2019-03-23 15:23 ` [PATCH net-next 1/7] net: aquantia: optimize rx path using larger preallocated skb len Igor Russkikh
2019-03-23 15:23 ` [PATCH net-next 2/7] net: aquantia: optimize rx performance by page reuse strategy Igor Russkikh
@ 2019-03-23 15:23 ` Igor Russkikh
2019-03-23 15:23 ` [PATCH net-next 4/7] net: aquantia: Make RX default frame size 2K Igor Russkikh
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Igor Russkikh @ 2019-03-23 15:23 UTC (permalink / raw)
To: David S . Miller; +Cc: netdev@vger.kernel.org, Igor Russkikh
Before that, we've refilled ring even on single descriptor move.
Under high packet load that caused page allocation logic to be triggered
too often. That made overall ring processing slower.
Moreover, with page buffer reuse implemented, we should give a chance
higher networking levels to process received packets faster, release
the pages they consumed and therefore give a higher chance for these
pages to be reused.
RX ring is now refilled only when AQ_CFG_RX_REFILL_THRES or more
descriptors were processed (32 by default). Under regular traffic this
gives quite enough time for packet to be consumed and page to be reused.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
drivers/net/ethernet/aquantia/atlantic/aq_cfg.h | 2 ++
drivers/net/ethernet/aquantia/atlantic/aq_ring.c | 4 ++++
2 files changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h b/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
index 80c16ab87771..551c5cc1714b 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
@@ -38,6 +38,8 @@
#define AQ_CFG_TX_CLEAN_BUDGET 256U
+#define AQ_CFG_RX_REFILL_THRES 32U
+
#define AQ_CFG_RX_HDR_SIZE 256U
#define AQ_CFG_RX_PAGEORDER 0U
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
index 1b258694144c..21c486cecbad 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
@@ -418,6 +418,10 @@ int aq_ring_rx_fill(struct aq_ring_s *self)
int err = 0;
int i = 0;
+ if (aq_ring_avail_dx(self) < min_t(unsigned int, AQ_CFG_RX_REFILL_THRES,
+ self->size / 2))
+ return err;
+
for (i = aq_ring_avail_dx(self); i--;
self->sw_tail = aq_ring_next_dx(self, self->sw_tail)) {
buff = &self->buff_ring[self->sw_tail];
--
2.17.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH net-next 4/7] net: aquantia: Make RX default frame size 2K
2019-03-23 15:23 [PATCH net-next 0/7] net: aquantia: RX performance optimization patches Igor Russkikh
` (2 preceding siblings ...)
2019-03-23 15:23 ` [PATCH net-next 3/7] net: aquantia: Introduce rx refill threshold value Igor Russkikh
@ 2019-03-23 15:23 ` Igor Russkikh
2019-03-23 15:23 ` [PATCH net-next 5/7] net: aquantia: Increase rx ring default size from 1K to 2K Igor Russkikh
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Igor Russkikh @ 2019-03-23 15:23 UTC (permalink / raw)
To: David S . Miller; +Cc: netdev@vger.kernel.org, Igor Russkikh
This correlates with default internet MTU. This also allows page
flip/reuse to be activated, since each allocated RX page now serves for
two frags/packets.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
drivers/net/ethernet/aquantia/atlantic/aq_cfg.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h b/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
index 551c5cc1714b..0dc18f5f381d 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
@@ -34,7 +34,7 @@
#define AQ_CFG_TCS_MAX 8U
#define AQ_CFG_TX_FRAME_MAX (16U * 1024U)
-#define AQ_CFG_RX_FRAME_MAX (4U * 1024U)
+#define AQ_CFG_RX_FRAME_MAX (2U * 1024U)
#define AQ_CFG_TX_CLEAN_BUDGET 256U
--
2.17.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH net-next 5/7] net: aquantia: Increase rx ring default size from 1K to 2K
2019-03-23 15:23 [PATCH net-next 0/7] net: aquantia: RX performance optimization patches Igor Russkikh
` (3 preceding siblings ...)
2019-03-23 15:23 ` [PATCH net-next 4/7] net: aquantia: Make RX default frame size 2K Igor Russkikh
@ 2019-03-23 15:23 ` Igor Russkikh
2019-03-23 15:23 ` [PATCH net-next 6/7] net: aquantia: improve LRO configuration Igor Russkikh
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Igor Russkikh @ 2019-03-23 15:23 UTC (permalink / raw)
To: David S . Miller; +Cc: netdev@vger.kernel.org, Igor Russkikh
For multigig rates 1K ring size is often not enough and causes extra
packet drops in hardware.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
drivers/net/ethernet/aquantia/atlantic/aq_cfg.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h b/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
index 0dc18f5f381d..8f35c3f883f0 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
@@ -16,7 +16,7 @@
#define AQ_CFG_TCS_DEF 1U
#define AQ_CFG_TXDS_DEF 4096U
-#define AQ_CFG_RXDS_DEF 1024U
+#define AQ_CFG_RXDS_DEF 2048U
#define AQ_CFG_IS_POLLING_DEF 0U
--
2.17.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH net-next 6/7] net: aquantia: improve LRO configuration
2019-03-23 15:23 [PATCH net-next 0/7] net: aquantia: RX performance optimization patches Igor Russkikh
` (4 preceding siblings ...)
2019-03-23 15:23 ` [PATCH net-next 5/7] net: aquantia: Increase rx ring default size from 1K to 2K Igor Russkikh
@ 2019-03-23 15:23 ` Igor Russkikh
2019-03-23 15:23 ` [PATCH net-next 7/7] net: aquantia: enable driver build for arm64 or compile_test Igor Russkikh
2019-03-24 2:17 ` [PATCH net-next 0/7] net: aquantia: RX performance optimization patches David Miller
7 siblings, 0 replies; 9+ messages in thread
From: Igor Russkikh @ 2019-03-23 15:23 UTC (permalink / raw)
To: David S . Miller; +Cc: netdev@vger.kernel.org, Igor Russkikh, Nikita Danilov
From: Nikita Danilov <nikita.danilov@aquantia.com>
Default LRO HW configuration was very conservative.
Low Number of Descriptors per LRO Sequence, small session
timeout, inefficient settings in interrupt generation logic.
Change max number of LRO descriptors from 2 to 16 to
increase performance. Increase maximum coalescing interval
in HW to 250uS. Tune up HW LRO interrupt generation setting
to prevent hw issues with long LRO sessions.
Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
.../ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c | 12 +++++++++++-
.../aquantia/atlantic/hw_atl/hw_atl_b0_internal.h | 2 +-
.../aquantia/atlantic/hw_atl/hw_atl_llh.c | 15 +++++++++++++++
.../aquantia/atlantic/hw_atl/hw_atl_llh.h | 6 ++++++
.../atlantic/hw_atl/hw_atl_llh_internal.h | 13 +++++++++++++
5 files changed, 46 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
index f4e895906b1a..7e95804e2180 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
@@ -259,7 +259,13 @@ static int hw_atl_b0_hw_offload_set(struct aq_hw_s *self,
hw_atl_rpo_lro_time_base_divider_set(self, 0x61AU);
hw_atl_rpo_lro_inactive_interval_set(self, 0);
- hw_atl_rpo_lro_max_coalescing_interval_set(self, 2);
+ /* the LRO timebase divider is 5 uS (0x61a),
+ * which is multiplied by 50(0x32)
+ * to get a maximum coalescing interval of 250 uS,
+ * which is the default value
+ */
+ hw_atl_rpo_lro_max_coalescing_interval_set(self, 50);
+
hw_atl_rpo_lro_qsessions_lim_set(self, 1U);
@@ -273,6 +279,10 @@ static int hw_atl_b0_hw_offload_set(struct aq_hw_s *self,
hw_atl_rpo_lro_en_set(self,
aq_nic_cfg->is_lro ? 0xFFFFFFFFU : 0U);
+ hw_atl_itr_rsc_en_set(self,
+ aq_nic_cfg->is_lro ? 0xFFFFFFFFU : 0U);
+
+ hw_atl_itr_rsc_delay_set(self, 1U);
}
return aq_hw_err_from_flags(self);
}
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0_internal.h b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0_internal.h
index b318eefd36ae..ea98a08d7820 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0_internal.h
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0_internal.h
@@ -78,7 +78,7 @@
#define HW_ATL_B0_TC_MAX 1U
#define HW_ATL_B0_RSS_MAX 8U
-#define HW_ATL_B0_LRO_RXD_MAX 2U
+#define HW_ATL_B0_LRO_RXD_MAX 16U
#define HW_ATL_B0_RS_SLIP_ENABLED 0U
/* (256k -1(max pay_len) - 54(header)) */
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c
index 0722b8e01964..9442deff98a8 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c
@@ -315,6 +315,21 @@ void hw_atl_itr_res_irq_set(struct aq_hw_s *aq_hw, u32 res_irq)
HW_ATL_ITR_RES_SHIFT, res_irq);
}
+/* set RSC interrupt */
+void hw_atl_itr_rsc_en_set(struct aq_hw_s *aq_hw, u32 enable)
+{
+ aq_hw_write_reg(aq_hw, HW_ATL_ITR_RSC_EN_ADR, enable);
+}
+
+/* set RSC delay */
+void hw_atl_itr_rsc_delay_set(struct aq_hw_s *aq_hw, u32 delay)
+{
+ aq_hw_write_reg_bit(aq_hw, HW_ATL_ITR_RSC_DELAY_ADR,
+ HW_ATL_ITR_RSC_DELAY_MSK,
+ HW_ATL_ITR_RSC_DELAY_SHIFT,
+ delay);
+}
+
/* rdm */
void hw_atl_rdm_cpu_id_set(struct aq_hw_s *aq_hw, u32 cpuid, u32 dca)
{
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.h b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.h
index d46351890b16..4cfa4bd80ad3 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.h
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.h
@@ -152,6 +152,12 @@ u32 hw_atl_itr_res_irq_get(struct aq_hw_s *aq_hw);
/* set reset interrupt */
void hw_atl_itr_res_irq_set(struct aq_hw_s *aq_hw, u32 res_irq);
+/* set RSC interrupt */
+void hw_atl_itr_rsc_en_set(struct aq_hw_s *aq_hw, u32 enable);
+
+/* set RSC delay */
+void hw_atl_itr_rsc_delay_set(struct aq_hw_s *aq_hw, u32 delay);
+
/* rdm */
/* set cpu id */
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh_internal.h b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh_internal.h
index fb45bc2d99cf..430bbd45b2f0 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh_internal.h
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh_internal.h
@@ -95,6 +95,19 @@
#define HW_ATL_ITR_RES_MSK 0x80000000
/* lower bit position of bitfield itr_reset */
#define HW_ATL_ITR_RES_SHIFT 31
+
+/* register address for bitfield rsc_en */
+#define HW_ATL_ITR_RSC_EN_ADR 0x00002200
+
+/* register address for bitfield rsc_delay */
+#define HW_ATL_ITR_RSC_DELAY_ADR 0x00002204
+/* bitmask for bitfield rsc_delay */
+#define HW_ATL_ITR_RSC_DELAY_MSK 0x0000000f
+/* width of bitfield rsc_delay */
+#define HW_ATL_ITR_RSC_DELAY_WIDTH 4
+/* lower bit position of bitfield rsc_delay */
+#define HW_ATL_ITR_RSC_DELAY_SHIFT 0
+
/* register address for bitfield dca{d}_cpuid[7:0] */
#define HW_ATL_RDM_DCADCPUID_ADR(dca) (0x00006100 + (dca) * 0x4)
/* bitmask for bitfield dca{d}_cpuid[7:0] */
--
2.17.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH net-next 7/7] net: aquantia: enable driver build for arm64 or compile_test
2019-03-23 15:23 [PATCH net-next 0/7] net: aquantia: RX performance optimization patches Igor Russkikh
` (5 preceding siblings ...)
2019-03-23 15:23 ` [PATCH net-next 6/7] net: aquantia: improve LRO configuration Igor Russkikh
@ 2019-03-23 15:23 ` Igor Russkikh
2019-03-24 2:17 ` [PATCH net-next 0/7] net: aquantia: RX performance optimization patches David Miller
7 siblings, 0 replies; 9+ messages in thread
From: Igor Russkikh @ 2019-03-23 15:23 UTC (permalink / raw)
To: David S . Miller; +Cc: netdev@vger.kernel.org, Igor Russkikh
The driver is now constantly tested in our lab on aarch64 hardware:
Jetson tx2, Pascal and Xavier tegra based hardware.
Many of tegra smmu related HW bugs were fixed or workarounded already.
Thus, add ARM64 into Kconfig.
Add also COMPILE_TEST dependency.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
drivers/net/ethernet/aquantia/Kconfig | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/aquantia/Kconfig b/drivers/net/ethernet/aquantia/Kconfig
index 7d623e90dc19..12472c5bb34d 100644
--- a/drivers/net/ethernet/aquantia/Kconfig
+++ b/drivers/net/ethernet/aquantia/Kconfig
@@ -17,7 +17,8 @@ if NET_VENDOR_AQUANTIA
config AQTION
tristate "aQuantia AQtion(tm) Support"
- depends on PCI && X86_64
+ depends on PCI
+ depends on X86_64 || ARM64 || COMPILE_TEST
---help---
This enables the support for the aQuantia AQtion(tm) Ethernet card.
--
2.17.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH net-next 0/7] net: aquantia: RX performance optimization patches
2019-03-23 15:23 [PATCH net-next 0/7] net: aquantia: RX performance optimization patches Igor Russkikh
` (6 preceding siblings ...)
2019-03-23 15:23 ` [PATCH net-next 7/7] net: aquantia: enable driver build for arm64 or compile_test Igor Russkikh
@ 2019-03-24 2:17 ` David Miller
7 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2019-03-24 2:17 UTC (permalink / raw)
To: Igor.Russkikh; +Cc: netdev
From: Igor Russkikh <Igor.Russkikh@aquantia.com>
Date: Sat, 23 Mar 2019 15:23:29 +0000
> Here is a set of patches targeting for performance improvement
> on various platforms and protocols.
...
Series applied, thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2019-03-24 2:17 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-03-23 15:23 [PATCH net-next 0/7] net: aquantia: RX performance optimization patches Igor Russkikh
2019-03-23 15:23 ` [PATCH net-next 1/7] net: aquantia: optimize rx path using larger preallocated skb len Igor Russkikh
2019-03-23 15:23 ` [PATCH net-next 2/7] net: aquantia: optimize rx performance by page reuse strategy Igor Russkikh
2019-03-23 15:23 ` [PATCH net-next 3/7] net: aquantia: Introduce rx refill threshold value Igor Russkikh
2019-03-23 15:23 ` [PATCH net-next 4/7] net: aquantia: Make RX default frame size 2K Igor Russkikh
2019-03-23 15:23 ` [PATCH net-next 5/7] net: aquantia: Increase rx ring default size from 1K to 2K Igor Russkikh
2019-03-23 15:23 ` [PATCH net-next 6/7] net: aquantia: improve LRO configuration Igor Russkikh
2019-03-23 15:23 ` [PATCH net-next 7/7] net: aquantia: enable driver build for arm64 or compile_test Igor Russkikh
2019-03-24 2:17 ` [PATCH net-next 0/7] net: aquantia: RX performance optimization patches David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).