* [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe
@ 2016-12-19 20:10 Alexander Duyck
2016-12-19 20:10 ` [Intel-wired-lan] [RFC PATCH 01/20] mm: Rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free Alexander Duyck
` (19 more replies)
0 siblings, 20 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:10 UTC (permalink / raw)
To: intel-wired-lan
This patch series is based off of Dave Miller's latest net tree. The plan
is to submit it for next-queue once net has been merged back into net-next.
The first 3 patches are in a bit of limbo. They have been accepted for
linux-mm, however I haven't seen them submitted to Linus's tree as of yet.
According to Andrew Morton they are on his list to pull in post 4.9.
The remaining patches implement the changes to support build_skb and
various other performance improvements. So far I am doing most of my
performance testing on ixgbe with this patch set and the results are
promising. I am seeing throughput improvements of 10% with netperf
TCP_STREAM tests w/o LRO. With VXLAN tunnels I am seeing improvements of
as much as 30% when out checksums are enabled.
Any initial feedback would be useful. My next steps for this is to look at
generating changes for i40e and adding support to out-of-tree drivers for
these features.
---
Alexander Duyck (20):
mm: Rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
mm: Rename __page_frag functions to __page_frag_cache, drop order from drain
mm: Add documentation for page fragment APIs
igb: Add support for DMA_ATTR_WEAK_ORDERING
igb: Use length to determine if descriptor is done
igb: Limit maximum frame Rx based on MTU
igb: Add support for padding packet
igb: Add support for ethtool private flag to allow use of legacy Rx
igb: Break out Rx buffer page management
igb: Revert "igb: Revert support for build_skb in igb"
ixgbe: Add function for checking to see if we can reuse page
ixgbe: Only DMA sync frame length
ixgbe: Update driver to make use of DMA attributes in Rx path
ixgbe: Update code to better handle incrementing page count
ixgbe: Make use of order 1 pages and 3K buffers independent of FCoE
ixgbe: Use length to determine if descriptor is done
ixgbe: Break out Rx buffer page management
ixgbe: Add support for padding packet
ixgbe: Add private flag to control buffer mode
ixgbe: Add support for build_skb
Documentation/vm/page_frags | 42 ++
drivers/net/ethernet/intel/igb/igb.h | 55 +++
drivers/net/ethernet/intel/igb/igb_ethtool.c | 52 +++
drivers/net/ethernet/intel/igb/igb_main.c | 357 ++++++++++++-------
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 45 ++
drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 47 ++
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 414 +++++++++++++++-------
include/linux/gfp.h | 9
include/linux/skbuff.h | 2
mm/page_alloc.c | 33 +-
net/core/skbuff.c | 8
11 files changed, 758 insertions(+), 306 deletions(-)
create mode 100644 Documentation/vm/page_frags
--
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 01/20] mm: Rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
@ 2016-12-19 20:10 ` Alexander Duyck
2016-12-19 20:10 ` [Intel-wired-lan] [RFC PATCH 02/20] mm: Rename __page_frag functions to __page_frag_cache, drop order from drain Alexander Duyck
` (18 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:10 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
This patch renames the page frag functions to be more consistent with other
APIs. Specifically we place the name page_frag first in the name and then
have either an alloc or free call name that we append as the suffix. This
makes it a bit clearer in terms of naming.
In addition we drop the leading double underscores since we are technically
no longer a backing interface and instead the front end that is called from
the networking APIs.
The last bit I changed is I rebased page_frag_free to actually function
very similar to the function free_pages, the only real difference now is
the fact that we have to get the page order by calling compound page
instead of having it passed as a part of the function call.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
include/linux/gfp.h | 6 +++---
include/linux/skbuff.h | 2 +-
mm/page_alloc.c | 20 ++++++++++++--------
net/core/skbuff.c | 8 ++++----
4 files changed, 20 insertions(+), 16 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4175dca4ac39..6238c74e0a01 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -508,9 +508,9 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
struct page_frag_cache;
extern void __page_frag_drain(struct page *page, unsigned int order,
unsigned int count);
-extern void *__alloc_page_frag(struct page_frag_cache *nc,
- unsigned int fragsz, gfp_t gfp_mask);
-extern void __free_page_frag(void *addr);
+extern void *page_frag_alloc(struct page_frag_cache *nc,
+ unsigned int fragsz, gfp_t gfp_mask);
+extern void page_frag_free(void *addr);
#define __free_page(page) __free_pages((page), 0)
#define free_page(addr) free_pages((addr), 0)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ac7fa34db8a7..e0b7f492b628 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2480,7 +2480,7 @@ static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
static inline void skb_free_frag(void *addr)
{
- __free_page_frag(addr);
+ page_frag_free(addr);
}
void *napi_alloc_frag(unsigned int fragsz);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2c6d5f64feca..4fccadbcdab9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3939,8 +3939,8 @@ void __page_frag_drain(struct page *page, unsigned int order,
}
EXPORT_SYMBOL(__page_frag_drain);
-void *__alloc_page_frag(struct page_frag_cache *nc,
- unsigned int fragsz, gfp_t gfp_mask)
+void *page_frag_alloc(struct page_frag_cache *nc,
+ unsigned int fragsz, gfp_t gfp_mask)
{
unsigned int size = PAGE_SIZE;
struct page *page;
@@ -3991,19 +3991,23 @@ void *__alloc_page_frag(struct page_frag_cache *nc,
return nc->va + offset;
}
-EXPORT_SYMBOL(__alloc_page_frag);
+EXPORT_SYMBOL(page_frag_alloc);
/*
* Frees a page fragment allocated out of either a compound or order 0 page.
*/
-void __free_page_frag(void *addr)
+void page_frag_free(void *addr)
{
- struct page *page = virt_to_head_page(addr);
+ struct page *page;
+
+ if (addr != 0) {
+ VM_BUG_ON(!virt_addr_valid(addr));
+ page = virt_to_head_page(addr);
- if (unlikely(put_page_testzero(page)))
- __free_pages_ok(page, compound_order(page));
+ __free_pages(page, compound_order(page));
+ }
}
-EXPORT_SYMBOL(__free_page_frag);
+EXPORT_SYMBOL(page_frag_free);
static void *make_alloc_exact(unsigned long addr, unsigned int order,
size_t size)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 65a74e13c45b..df4598ad6473 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -369,7 +369,7 @@ static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
local_irq_save(flags);
nc = this_cpu_ptr(&netdev_alloc_cache);
- data = __alloc_page_frag(nc, fragsz, gfp_mask);
+ data = page_frag_alloc(nc, fragsz, gfp_mask);
local_irq_restore(flags);
return data;
}
@@ -391,7 +391,7 @@ static void *__napi_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
{
struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
- return __alloc_page_frag(&nc->page, fragsz, gfp_mask);
+ return page_frag_alloc(&nc->page, fragsz, gfp_mask);
}
void *napi_alloc_frag(unsigned int fragsz)
@@ -441,7 +441,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
local_irq_save(flags);
nc = this_cpu_ptr(&netdev_alloc_cache);
- data = __alloc_page_frag(nc, len, gfp_mask);
+ data = page_frag_alloc(nc, len, gfp_mask);
pfmemalloc = nc->pfmemalloc;
local_irq_restore(flags);
@@ -505,7 +505,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
if (sk_memalloc_socks())
gfp_mask |= __GFP_MEMALLOC;
- data = __alloc_page_frag(&nc->page, len, gfp_mask);
+ data = page_frag_alloc(&nc->page, len, gfp_mask);
if (unlikely(!data))
return NULL;
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 02/20] mm: Rename __page_frag functions to __page_frag_cache, drop order from drain
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
2016-12-19 20:10 ` [Intel-wired-lan] [RFC PATCH 01/20] mm: Rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free Alexander Duyck
@ 2016-12-19 20:10 ` Alexander Duyck
2016-12-19 20:10 ` [Intel-wired-lan] [RFC PATCH 03/20] mm: Add documentation for page fragment APIs Alexander Duyck
` (17 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:10 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
This patch does two things.
First it goes through and renames the __page_frag prefixed functions to
__page_frag_cache so that we can be clear that we are draining or refilling
the cache, not the frags themselves.
Second we drop the order parameter from __page_frag_cache_drain since we
don't actually need to pass it since all fragments are either order 0 or
must be a compound page.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/igb/igb_main.c | 6 +++---
include/linux/gfp.h | 3 +--
mm/page_alloc.c | 13 +++++++------
3 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index a761001308dc..1515abaa5ac9 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3962,8 +3962,8 @@ static void igb_clean_rx_ring(struct igb_ring *rx_ring)
PAGE_SIZE,
DMA_FROM_DEVICE,
DMA_ATTR_SKIP_CPU_SYNC);
- __page_frag_drain(buffer_info->page, 0,
- buffer_info->pagecnt_bias);
+ __page_frag_cache_drain(buffer_info->page,
+ buffer_info->pagecnt_bias);
buffer_info->page = NULL;
}
@@ -6991,7 +6991,7 @@ static struct sk_buff *igb_fetch_rx_buffer(struct igb_ring *rx_ring,
dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma,
PAGE_SIZE, DMA_FROM_DEVICE,
DMA_ATTR_SKIP_CPU_SYNC);
- __page_frag_drain(page, 0, rx_buffer->pagecnt_bias);
+ __page_frag_cache_drain(page, rx_buffer->pagecnt_bias);
}
/* clear contents of rx_buffer */
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 6238c74e0a01..884080404e24 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -506,8 +506,7 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
extern void free_hot_cold_page_list(struct list_head *list, bool cold);
struct page_frag_cache;
-extern void __page_frag_drain(struct page *page, unsigned int order,
- unsigned int count);
+extern void __page_frag_cache_drain(struct page *page, unsigned int count);
extern void *page_frag_alloc(struct page_frag_cache *nc,
unsigned int fragsz, gfp_t gfp_mask);
extern void page_frag_free(void *addr);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4fccadbcdab9..dc396953331c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3904,8 +3904,8 @@ void free_pages(unsigned long addr, unsigned int order)
* drivers to provide a backing region of memory for use as either an
* sk_buff->head, or to be used in the "frags" portion of skb_shared_info.
*/
-static struct page *__page_frag_refill(struct page_frag_cache *nc,
- gfp_t gfp_mask)
+static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
+ gfp_t gfp_mask)
{
struct page *page = NULL;
gfp_t gfp = gfp_mask;
@@ -3925,19 +3925,20 @@ static struct page *__page_frag_refill(struct page_frag_cache *nc,
return page;
}
-void __page_frag_drain(struct page *page, unsigned int order,
- unsigned int count)
+void __page_frag_cache_drain(struct page *page, unsigned int count)
{
VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
if (page_ref_sub_and_test(page, count)) {
+ unsigned int order = compound_order(page);
+
if (order == 0)
free_hot_cold_page(page, false);
else
__free_pages_ok(page, order);
}
}
-EXPORT_SYMBOL(__page_frag_drain);
+EXPORT_SYMBOL(__page_frag_cache_drain);
void *page_frag_alloc(struct page_frag_cache *nc,
unsigned int fragsz, gfp_t gfp_mask)
@@ -3948,7 +3949,7 @@ void *page_frag_alloc(struct page_frag_cache *nc,
if (unlikely(!nc->va)) {
refill:
- page = __page_frag_refill(nc, gfp_mask);
+ page = __page_frag_cache_refill(nc, gfp_mask);
if (!page)
return NULL;
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 03/20] mm: Add documentation for page fragment APIs
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
2016-12-19 20:10 ` [Intel-wired-lan] [RFC PATCH 01/20] mm: Rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free Alexander Duyck
2016-12-19 20:10 ` [Intel-wired-lan] [RFC PATCH 02/20] mm: Rename __page_frag functions to __page_frag_cache, drop order from drain Alexander Duyck
@ 2016-12-19 20:10 ` Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 04/20] igb: Add support for DMA_ATTR_WEAK_ORDERING Alexander Duyck
` (16 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:10 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
This is a first pass at trying to add documentation for the page_frag APIs.
They may still change over time but for now I thought I would try to get
these documented so that as more network drivers and stack calls make use
of them we have one central spot to document how they are meant to be used.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
Documentation/vm/page_frags | 42 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
create mode 100644 Documentation/vm/page_frags
diff --git a/Documentation/vm/page_frags b/Documentation/vm/page_frags
new file mode 100644
index 000000000000..a6714565dbf9
--- /dev/null
+++ b/Documentation/vm/page_frags
@@ -0,0 +1,42 @@
+Page fragments
+--------------
+
+A page fragment is an arbitrary-length arbitrary-offset area of memory
+which resides within a 0 or higher order compound page. Multiple
+fragments within that page are individually refcounted, in the page's
+reference counter.
+
+The page_frag functions, page_frag_alloc and page_frag_free, provide a
+simple allocation framework for page fragments. This is used by the
+network stack and network device drivers to provide a backing region of
+memory for use as either an sk_buff->head, or to be used in the "frags"
+portion of skb_shared_info.
+
+In order to make use of the page fragment APIs a backing page fragment
+cache is needed. This provides a central point for the fragment allocation
+and tracks allows multiple calls to make use of a cached page. The
+advantage to doing this is that multiple calls to get_page can be avoided
+which can be expensive at allocation time. However due to the nature of
+this caching it is required that any calls to the cache be protected by
+either a per-cpu limitation, or a per-cpu limitation and forcing interrupts
+to be disabled when executing the fragment allocation.
+
+The network stack uses two separate caches per CPU to handle fragment
+allocation. The netdev_alloc_cache is used by callers making use of the
+__netdev_alloc_frag and __netdev_alloc_skb calls. The napi_alloc_cache is
+used by callers of the __napi_alloc_frag and __napi_alloc_skb calls. The
+main difference between these two calls is the context in which they may be
+called. The "netdev" prefixed functions are usable in any context as these
+functions will disable interrupts, while the "napi" prefixed functions are
+only usable within the softirq context.
+
+Many network device drivers use a similar methodology for allocating page
+fragments, but the page fragments are cached at the ring or descriptor
+level. In order to enable these cases it is necessary to provide a generic
+way of tearing down a page cache. For this reason __page_frag_cache_drain
+was implemented. It allows for freeing multiple references from a single
+page via a single call. The advantage to doing this is that it allows for
+cleaning up the multiple references that were added to a page in order to
+avoid calling get_page per allocation.
+
+Alexander Duyck, Nov 29, 2016.
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 04/20] igb: Add support for DMA_ATTR_WEAK_ORDERING
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
` (2 preceding siblings ...)
2016-12-19 20:10 ` [Intel-wired-lan] [RFC PATCH 03/20] mm: Add documentation for page fragment APIs Alexander Duyck
@ 2016-12-19 20:11 ` Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 05/20] igb: Use length to determine if descriptor is done Alexander Duyck
` (15 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:11 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
Since we are already using DMA attributes in igb for Rx there is no reason
why we can't also apply DMA_ATTR_WEAK_ORDERING which is needed on some
platforms to improve performance.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/igb/igb.h | 3 +++
drivers/net/ethernet/intel/igb/igb_main.c | 6 +++---
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index acbc3abe2ddd..87c9fe9d6f18 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -148,6 +148,9 @@ struct vf_data_storage {
/* How many Rx Buffers do we bundle into one write to the hardware ? */
#define IGB_RX_BUFFER_WRITE 16 /* Must be power of 2 */
+#define IGB_RX_DMA_ATTR \
+ (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)
+
#define AUTO_ALL_MODES 0
#define IGB_EEPROM_APME 0x0400
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 1515abaa5ac9..7e21cdbde948 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3961,7 +3961,7 @@ static void igb_clean_rx_ring(struct igb_ring *rx_ring)
buffer_info->dma,
PAGE_SIZE,
DMA_FROM_DEVICE,
- DMA_ATTR_SKIP_CPU_SYNC);
+ IGB_RX_DMA_ATTR);
__page_frag_cache_drain(buffer_info->page,
buffer_info->pagecnt_bias);
@@ -6990,7 +6990,7 @@ static struct sk_buff *igb_fetch_rx_buffer(struct igb_ring *rx_ring,
*/
dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma,
PAGE_SIZE, DMA_FROM_DEVICE,
- DMA_ATTR_SKIP_CPU_SYNC);
+ IGB_RX_DMA_ATTR);
__page_frag_cache_drain(page, rx_buffer->pagecnt_bias);
}
@@ -7250,7 +7250,7 @@ static bool igb_alloc_mapped_page(struct igb_ring *rx_ring,
/* map page for use */
dma = dma_map_page_attrs(rx_ring->dev, page, 0, PAGE_SIZE,
- DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);
+ DMA_FROM_DEVICE, IGB_RX_DMA_ATTR);
/* if mapping failed free memory back to system since
* there isn't much point in holding memory we can't use
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 05/20] igb: Use length to determine if descriptor is done
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
` (3 preceding siblings ...)
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 04/20] igb: Add support for DMA_ATTR_WEAK_ORDERING Alexander Duyck
@ 2016-12-19 20:11 ` Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 06/20] igb: Limit maximum frame Rx based on MTU Alexander Duyck
` (14 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:11 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
This change makes it so that we use the length of the packet instead of the
DD status bit to determine if a new descriptor is ready to be processed.
The obvious advantage is that it cuts down on reads as we don't really even
need the DD bit if going from a 0 to a non-zero value on size is enough to
inform us that the packet has been completed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/igb/igb_main.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 7e21cdbde948..7c3c3abc72bb 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -7172,7 +7172,7 @@ static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
rx_desc = IGB_RX_DESC(rx_ring, rx_ring->next_to_clean);
- if (!rx_desc->wb.upper.status_error)
+ if (!rx_desc->wb.upper.length)
break;
/* This memory barrier is needed to keep us from reading
@@ -7312,8 +7312,8 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
i -= rx_ring->count;
}
- /* clear the status bits for the next_to_use descriptor */
- rx_desc->wb.upper.status_error = 0;
+ /* clear the length for the next_to_use descriptor */
+ rx_desc->wb.upper.length = 0;
cleaned_count--;
} while (cleaned_count);
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 06/20] igb: Limit maximum frame Rx based on MTU
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
` (4 preceding siblings ...)
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 05/20] igb: Use length to determine if descriptor is done Alexander Duyck
@ 2016-12-19 20:11 ` Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 07/20] igb: Add support for padding packet Alexander Duyck
` (13 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:11 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
In order to support the use of build_skb going forward it will be necessary
to place a maximum limit on the amount of data we can receive when jumbo
frames is not enabled. In order to do this I am adding a new upper limit
for receive based on the size of a 2K buffer minus padding.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/igb/igb.h | 10 +++++++++-
drivers/net/ethernet/intel/igb/igb_main.c | 16 ++++++++++++----
2 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 87c9fe9d6f18..390c2a6a98aa 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -143,8 +143,17 @@ struct vf_data_storage {
#define IGB_RXBUFFER_256 256
#define IGB_RXBUFFER_2048 2048
#define IGB_RX_HDR_LEN IGB_RXBUFFER_256
+#define IGB_TS_HDR_LEN 16
#define IGB_RX_BUFSZ IGB_RXBUFFER_2048
+#define IGB_SKB_PAD (NET_SKB_PAD + NET_IP_ALIGN)
+#if (PAGE_SIZE < 8192)
+#define IGB_MAX_FRAME_BUILD_SKB \
+ (SKB_WITH_OVERHEAD(IGB_RXBUFFER_2048) - IGB_SKB_PAD - IGB_TS_HDR_LEN)
+#else
+#define IGB_MAX_FRAME_BUILD_SKB (IGB_RXBUFFER_2048 - IGB_TS_HDR_LEN)
+#endif
+
/* How many Rx Buffers do we bundle into one write to the hardware ? */
#define IGB_RX_BUFFER_WRITE 16 /* Must be power of 2 */
@@ -561,7 +570,6 @@ struct igb_adapter {
#define IGB_DMCTLX_DCFLUSH_DIS 0x80000000 /* Disable DMA Coal Flush */
#define IGB_82576_TSYNC_SHIFT 19
-#define IGB_TS_HDR_LEN 16
enum e1000_state_t {
__IGB_TESTING,
__IGB_RESETTING,
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 7c3c3abc72bb..da6219c37f05 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -4238,7 +4238,7 @@ static void igb_set_rx_mode(struct net_device *netdev)
struct igb_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
unsigned int vfn = adapter->vfs_allocated_count;
- u32 rctl = 0, vmolr = 0;
+ u32 rctl = 0, vmolr = 0, rlpml = MAX_JUMBO_FRAME_SIZE;
int count;
/* Check for Promiscuous and All Multicast modes */
@@ -4310,12 +4310,20 @@ static void igb_set_rx_mode(struct net_device *netdev)
vmolr |= rd32(E1000_VMOLR(vfn)) &
~(E1000_VMOLR_ROPE | E1000_VMOLR_MPME | E1000_VMOLR_ROMPE);
- /* enable Rx jumbo frames, no need for restriction */
+ /* enable Rx jumbo frames, restrict as needed to support build_skb */
vmolr &= ~E1000_VMOLR_RLPML_MASK;
- vmolr |= MAX_JUMBO_FRAME_SIZE | E1000_VMOLR_LPE;
+#if (PAGE_SIZE < 8192)
+ if (adapter->max_frame_size <= IGB_MAX_FRAME_BUILD_SKB) {
+ if (!adapter->vfs_allocated_count)
+ rlpml = IGB_MAX_FRAME_BUILD_SKB;
+ vmolr |= IGB_MAX_FRAME_BUILD_SKB;
+ } else
+#endif
+ vmolr |= MAX_JUMBO_FRAME_SIZE;
+ vmolr |= E1000_VMOLR_LPE;
wr32(E1000_VMOLR(vfn), vmolr);
- wr32(E1000_RLPML, MAX_JUMBO_FRAME_SIZE);
+ wr32(E1000_RLPML, rlpml);
igb_restore_vf_multicasts(adapter);
}
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 07/20] igb: Add support for padding packet
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
` (5 preceding siblings ...)
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 06/20] igb: Limit maximum frame Rx based on MTU Alexander Duyck
@ 2016-12-19 20:11 ` Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 08/20] igb: Add support for ethtool private flag to allow use of legacy Rx Alexander Duyck
` (12 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:11 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
With the size of the frame limited we can now write to an offset within the
buffer instead of having to write at the very start of the buffer. The
advantage to this is that it allows us to leave padding room for things
like supporting XDP in the future.
One side effect of this patch is that we can end up using a larger buffer
if jumbo frames is enabled. The impact shouldn't be too great, but it
could hurt small packet performance for UDP workloads if jumbo frames is
enabled as the truesize of frames will be larger.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/igb/igb.h | 42 +++++++++++-
drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 +
drivers/net/ethernet/intel/igb/igb_main.c | 93 +++++++++++++++++++-------
3 files changed, 110 insertions(+), 29 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 390c2a6a98aa..084de568060b 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -142,9 +142,9 @@ struct vf_data_storage {
/* Supported Rx Buffer Sizes */
#define IGB_RXBUFFER_256 256
#define IGB_RXBUFFER_2048 2048
+#define IGB_RXBUFFER_3072 3072
#define IGB_RX_HDR_LEN IGB_RXBUFFER_256
#define IGB_TS_HDR_LEN 16
-#define IGB_RX_BUFSZ IGB_RXBUFFER_2048
#define IGB_SKB_PAD (NET_SKB_PAD + NET_IP_ALIGN)
#if (PAGE_SIZE < 8192)
@@ -313,12 +313,51 @@ struct igb_q_vector {
};
enum e1000_ring_flags_t {
+ IGB_RING_FLAG_RX_3K_BUFFER,
+ IGB_RING_FLAG_RX_BUILD_SKB_ENABLED,
IGB_RING_FLAG_RX_SCTP_CSUM,
IGB_RING_FLAG_RX_LB_VLAN_BSWAP,
IGB_RING_FLAG_TX_CTX_IDX,
IGB_RING_FLAG_TX_DETECT_HANG
};
+#define ring_uses_build_skb(ring) \
+ test_bit(IGB_RING_FLAG_RX_BUILD_SKB_ENABLED, &(ring)->flags)
+#define set_ring_build_skb_enabled(ring) \
+ set_bit(IGB_RING_FLAG_RX_BUILD_SKB_ENABLED, &(ring)->flags)
+#define clear_ring_build_skb_enabled(ring) \
+ clear_bit(IGB_RING_FLAG_RX_BUILD_SKB_ENABLED, &(ring)->flags)
+
+#define ring_uses_large_buffer(ring) \
+ test_bit(IGB_RING_FLAG_RX_3K_BUFFER, &(ring)->flags)
+#define set_ring_uses_large_buffer(ring) \
+ set_bit(IGB_RING_FLAG_RX_3K_BUFFER, &(ring)->flags)
+#define clear_ring_uses_large_buffer(ring) \
+ clear_bit(IGB_RING_FLAG_RX_3K_BUFFER, &(ring)->flags)
+
+static inline unsigned int igb_rx_bufsz(struct igb_ring *ring)
+{
+#if (PAGE_SIZE < 8192)
+ if (ring_uses_large_buffer(ring))
+ return IGB_RXBUFFER_3072;
+
+ if (ring_uses_build_skb(ring))
+ return IGB_MAX_FRAME_BUILD_SKB + IGB_TS_HDR_LEN;
+#endif
+ return IGB_RXBUFFER_2048;
+}
+
+static inline unsigned int igb_rx_pg_order(struct igb_ring *ring)
+{
+#if (PAGE_SIZE < 8192)
+ if (ring_uses_large_buffer(ring))
+ return 1;
+#endif
+ return 0;
+}
+
+#define igb_rx_pg_size(_ring) (PAGE_SIZE << igb_rx_pg_order(_ring))
+
#define IGB_TXD_DCMD (E1000_ADVTXD_DCMD_EOP | E1000_ADVTXD_DCMD_RS)
#define IGB_RX_DESC(R, i) \
@@ -557,6 +596,7 @@ struct igb_adapter {
#define IGB_FLAG_HAS_MSIX BIT(13)
#define IGB_FLAG_EEE BIT(14)
#define IGB_FLAG_VLAN_PROMISC BIT(15)
+#define IGB_FLAG_RX_LEGACY BIT(16)
/* Media Auto Sense */
#define IGB_MAS_ENABLE_0 0X0001
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 737b664d004c..167977e80b66 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -1818,7 +1818,7 @@ static int igb_clean_test_rings(struct igb_ring *rx_ring,
/* sync Rx buffer for CPU read */
dma_sync_single_for_cpu(rx_ring->dev,
rx_buffer_info->dma,
- IGB_RX_BUFSZ,
+ size,
DMA_FROM_DEVICE);
/* verify contents of skb */
@@ -1828,7 +1828,7 @@ static int igb_clean_test_rings(struct igb_ring *rx_ring,
/* sync Rx buffer for device write */
dma_sync_single_for_device(rx_ring->dev,
rx_buffer_info->dma,
- IGB_RX_BUFSZ,
+ size,
DMA_FROM_DEVICE);
/* unmap buffer on Tx side */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index da6219c37f05..ddc2c2788959 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -554,7 +554,7 @@ static void igb_dump(struct igb_adapter *adapter)
16, 1,
page_address(buffer_info->page) +
buffer_info->page_offset,
- IGB_RX_BUFSZ, true);
+ igb_rx_bufsz(rx_ring), true);
}
}
}
@@ -3739,7 +3739,10 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
/* set descriptor configuration */
srrctl = IGB_RX_HDR_LEN << E1000_SRRCTL_BSIZEHDRSIZE_SHIFT;
- srrctl |= IGB_RX_BUFSZ >> E1000_SRRCTL_BSIZEPKT_SHIFT;
+ if (ring_uses_large_buffer(ring))
+ srrctl |= IGB_RXBUFFER_3072 >> E1000_SRRCTL_BSIZEPKT_SHIFT;
+ else
+ srrctl |= IGB_RXBUFFER_2048 >> E1000_SRRCTL_BSIZEPKT_SHIFT;
srrctl |= E1000_SRRCTL_DESCTYPE_ADV_ONEBUF;
if (hw->mac.type >= e1000_82580)
srrctl |= E1000_SRRCTL_TIMESTAMP;
@@ -3761,6 +3764,26 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
wr32(E1000_RXDCTL(reg_idx), rxdctl);
}
+static void igb_set_rx_buffer_len(struct igb_adapter *adapter,
+ struct igb_ring *rx_ring)
+{
+ /* set build_skb and buffer size flags */
+ clear_ring_build_skb_enabled(rx_ring);
+ clear_ring_uses_large_buffer(rx_ring);
+
+ if (adapter->flags & IGB_FLAG_RX_LEGACY)
+ return;
+
+ set_ring_build_skb_enabled(rx_ring);
+
+#if (PAGE_SIZE < 8192)
+ if (adapter->max_frame_size <= IGB_MAX_FRAME_BUILD_SKB)
+ return;
+
+ set_ring_uses_large_buffer(rx_ring);
+#endif
+}
+
/**
* igb_configure_rx - Configure receive Unit after Reset
* @adapter: board private structure
@@ -3778,8 +3801,12 @@ static void igb_configure_rx(struct igb_adapter *adapter)
/* Setup the HW Rx Head and Tail Descriptor Pointers and
* the Base and Length of the Rx Descriptor Ring
*/
- for (i = 0; i < adapter->num_rx_queues; i++)
- igb_configure_rx_ring(adapter, adapter->rx_ring[i]);
+ for (i = 0; i < adapter->num_rx_queues; i++) {
+ struct igb_ring *rx_ring = adapter->rx_ring[i];
+
+ igb_set_rx_buffer_len(adapter, rx_ring);
+ igb_configure_rx_ring(adapter, rx_ring);
+ }
}
/**
@@ -3931,7 +3958,7 @@ static void igb_free_all_rx_resources(struct igb_adapter *adapter)
static void igb_clean_rx_ring(struct igb_ring *rx_ring)
{
unsigned long size;
- u16 i;
+ u16 i, bufsz;
if (rx_ring->skb)
dev_kfree_skb(rx_ring->skb);
@@ -3940,6 +3967,8 @@ static void igb_clean_rx_ring(struct igb_ring *rx_ring)
if (!rx_ring->rx_buffer_info)
return;
+ bufsz = igb_rx_bufsz(rx_ring);
+
/* Free all the Rx ring sk_buffs */
for (i = 0; i < rx_ring->count; i++) {
struct igb_rx_buffer *buffer_info = &rx_ring->rx_buffer_info[i];
@@ -3953,13 +3982,13 @@ static void igb_clean_rx_ring(struct igb_ring *rx_ring)
dma_sync_single_range_for_cpu(rx_ring->dev,
buffer_info->dma,
buffer_info->page_offset,
- IGB_RX_BUFSZ,
+ bufsz,
DMA_FROM_DEVICE);
/* free resources associated with mapping */
dma_unmap_page_attrs(rx_ring->dev,
buffer_info->dma,
- PAGE_SIZE,
+ igb_rx_pg_size(rx_ring),
DMA_FROM_DEVICE,
IGB_RX_DMA_ATTR);
__page_frag_cache_drain(buffer_info->page,
@@ -6841,7 +6870,7 @@ static inline bool igb_page_is_reserved(struct page *page)
static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer,
struct page *page,
- unsigned int truesize)
+ const unsigned int truesize)
{
unsigned int pagecnt_bias = rx_buffer->pagecnt_bias--;
@@ -6855,12 +6884,14 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer,
return false;
/* flip page offset to other buffer */
- rx_buffer->page_offset ^= IGB_RX_BUFSZ;
+ rx_buffer->page_offset ^= truesize;
#else
/* move offset up to the next cache line */
rx_buffer->page_offset += truesize;
+#define IGB_LAST_OFFSET \
+ (SKB_WITH_OVERHEAD(PAGE_SIZE) - IGB_RXBUFFER_2048)
- if (rx_buffer->page_offset > (PAGE_SIZE - IGB_RX_BUFSZ))
+ if (rx_buffer->page_offset > IGB_LAST_OFFSET)
return false;
#endif
@@ -6899,12 +6930,14 @@ static bool igb_add_rx_frag(struct igb_ring *rx_ring,
{
struct page *page = rx_buffer->page;
unsigned char *va = page_address(page) + rx_buffer->page_offset;
+ unsigned int pull_len;
#if (PAGE_SIZE < 8192)
- unsigned int truesize = IGB_RX_BUFSZ;
+ unsigned int truesize = igb_rx_pg_size(rx_ring) / 2;
#else
- unsigned int truesize = SKB_DATA_ALIGN(size);
+ unsigned int truesize = ring_uses_build_skb(rx_ring) ?
+ SKB_DATA_ALIGN(IGB_SKB_PAD + size) :
+ SKB_DATA_ALIGN(size);
#endif
- unsigned int pull_len;
if (unlikely(skb_is_nonlinear(skb)))
goto add_tail_frag;
@@ -6940,7 +6973,7 @@ static bool igb_add_rx_frag(struct igb_ring *rx_ring,
add_tail_frag:
skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
- (unsigned long)va & ~PAGE_MASK, size, truesize);
+ va - page_address(page), size, truesize);
return igb_can_reuse_rx_page(rx_buffer, page, truesize);
}
@@ -6965,13 +6998,12 @@ static struct sk_buff *igb_fetch_rx_buffer(struct igb_ring *rx_ring,
DMA_FROM_DEVICE);
if (likely(!skb)) {
- void *page_addr = page_address(page) +
- rx_buffer->page_offset;
+ void *va = page_address(page) + rx_buffer->page_offset;
/* prefetch first cache line of first page */
- prefetch(page_addr);
+ prefetch(va);
#if L1_CACHE_BYTES < 128
- prefetch(page_addr + L1_CACHE_BYTES);
+ prefetch(va + L1_CACHE_BYTES);
#endif
/* allocate a skb to store the frags */
@@ -6997,7 +7029,7 @@ static struct sk_buff *igb_fetch_rx_buffer(struct igb_ring *rx_ring,
* any references we are holding to it
*/
dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma,
- PAGE_SIZE, DMA_FROM_DEVICE,
+ igb_rx_pg_size(rx_ring), DMA_FROM_DEVICE,
IGB_RX_DMA_ATTR);
__page_frag_cache_drain(page, rx_buffer->pagecnt_bias);
}
@@ -7239,6 +7271,11 @@ static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
return total_packets;
}
+static inline unsigned int igb_rx_offset(struct igb_ring *rx_ring)
+{
+ return ring_uses_build_skb(rx_ring) ? IGB_SKB_PAD : 0;
+}
+
static bool igb_alloc_mapped_page(struct igb_ring *rx_ring,
struct igb_rx_buffer *bi)
{
@@ -7250,21 +7287,23 @@ static bool igb_alloc_mapped_page(struct igb_ring *rx_ring,
return true;
/* alloc new page for storage */
- page = dev_alloc_page();
+ page = dev_alloc_pages(igb_rx_pg_order(rx_ring));
if (unlikely(!page)) {
rx_ring->rx_stats.alloc_failed++;
return false;
}
/* map page for use */
- dma = dma_map_page_attrs(rx_ring->dev, page, 0, PAGE_SIZE,
- DMA_FROM_DEVICE, IGB_RX_DMA_ATTR);
+ dma = dma_map_page_attrs(rx_ring->dev, page, 0,
+ igb_rx_pg_size(rx_ring),
+ DMA_FROM_DEVICE,
+ IGB_RX_DMA_ATTR);
/* if mapping failed free memory back to system since
* there isn't much point in holding memory we can't use
*/
if (dma_mapping_error(rx_ring->dev, dma)) {
- __free_page(page);
+ __free_pages(page, igb_rx_pg_order(rx_ring));
rx_ring->rx_stats.alloc_failed++;
return false;
@@ -7272,7 +7311,7 @@ static bool igb_alloc_mapped_page(struct igb_ring *rx_ring,
bi->dma = dma;
bi->page = page;
- bi->page_offset = 0;
+ bi->page_offset = igb_rx_offset(rx_ring);
bi->pagecnt_bias = 1;
return true;
@@ -7287,6 +7326,7 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
union e1000_adv_rx_desc *rx_desc;
struct igb_rx_buffer *bi;
u16 i = rx_ring->next_to_use;
+ u16 bufsz;
/* nothing to do */
if (!cleaned_count)
@@ -7296,14 +7336,15 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
bi = &rx_ring->rx_buffer_info[i];
i -= rx_ring->count;
+ bufsz = igb_rx_bufsz(rx_ring);
+
do {
if (!igb_alloc_mapped_page(rx_ring, bi))
break;
/* sync the buffer for use by the device */
dma_sync_single_range_for_device(rx_ring->dev, bi->dma,
- bi->page_offset,
- IGB_RX_BUFSZ,
+ bi->page_offset, bufsz,
DMA_FROM_DEVICE);
/* Refresh the desc even if buffer_addrs didn't change
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 08/20] igb: Add support for ethtool private flag to allow use of legacy Rx
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
` (6 preceding siblings ...)
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 07/20] igb: Add support for padding packet Alexander Duyck
@ 2016-12-19 20:11 ` Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 09/20] igb: Break out Rx buffer page management Alexander Duyck
` (11 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:11 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
Since there are potential drawbacks to the new Rx allocation approach I
thought it best to add a "chicken bit" so that we can turn the feature off
if in the event that a problem is found.
It also provides a means of validating the legacy Rx path in the event that
we are forced to fall back. At some point int he future when we are
convinced we don't need it anymore we might be able to drop the legacy-rx
flag.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/igb/igb_ethtool.c | 48 ++++++++++++++++++++++++++
1 file changed, 48 insertions(+)
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 167977e80b66..01b188a59623 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -144,6 +144,13 @@ enum igb_diagnostics_results {
};
#define IGB_TEST_LEN (sizeof(igb_gstrings_test) / ETH_GSTRING_LEN)
+static const char igb_priv_flags_strings[][ETH_GSTRING_LEN] = {
+#define IGB_PRIV_FLAGS_LEGACY_RX BIT(0)
+ "legacy-rx",
+};
+
+#define IGB_PRIV_FLAGS_STR_LEN ARRAY_SIZE(igb_priv_flags_strings)
+
static int igb_get_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
{
struct igb_adapter *adapter = netdev_priv(netdev);
@@ -852,6 +859,8 @@ static void igb_get_drvinfo(struct net_device *netdev,
sizeof(drvinfo->fw_version));
strlcpy(drvinfo->bus_info, pci_name(adapter->pdev),
sizeof(drvinfo->bus_info));
+
+ drvinfo->n_priv_flags = IGB_PRIV_FLAGS_STR_LEN;
}
static void igb_get_ringparam(struct net_device *netdev,
@@ -2271,6 +2280,8 @@ static int igb_get_sset_count(struct net_device *netdev, int sset)
return IGB_STATS_LEN;
case ETH_SS_TEST:
return IGB_TEST_LEN;
+ case ETH_SS_PRIV_FLAGS:
+ return IGB_PRIV_FLAGS_STR_LEN;
default:
return -ENOTSUPP;
}
@@ -2376,6 +2387,10 @@ static void igb_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
}
/* BUG_ON(p - data != IGB_STATS_LEN * ETH_GSTRING_LEN); */
break;
+ case ETH_SS_PRIV_FLAGS:
+ memcpy(data, igb_priv_flags_strings,
+ IGB_PRIV_FLAGS_STR_LEN * ETH_GSTRING_LEN);
+ break;
}
}
@@ -3388,6 +3403,37 @@ static int igb_set_channels(struct net_device *netdev,
return 0;
}
+static u32 igb_get_priv_flags(struct net_device *netdev)
+{
+ struct igb_adapter *adapter = netdev_priv(netdev);
+ u32 priv_flags = 0;
+
+ if (adapter->flags & IGB_FLAG_RX_LEGACY)
+ priv_flags |= IGB_PRIV_FLAGS_LEGACY_RX;
+
+ return priv_flags;
+}
+
+static int igb_set_priv_flags(struct net_device *netdev, u32 priv_flags)
+{
+ struct igb_adapter *adapter = netdev_priv(netdev);
+ unsigned int flags = adapter->flags;
+
+ flags &= ~IGB_FLAG_RX_LEGACY;
+ if (priv_flags & IGB_PRIV_FLAGS_LEGACY_RX)
+ flags |= IGB_FLAG_RX_LEGACY;
+
+ if (flags != adapter->flags) {
+ adapter->flags = flags;
+
+ /* reset interface to repopulate queues */
+ if (netif_running(netdev))
+ igb_reinit_locked(adapter);
+ }
+
+ return 0;
+}
+
static const struct ethtool_ops igb_ethtool_ops = {
.get_settings = igb_get_settings,
.set_settings = igb_set_settings,
@@ -3426,6 +3472,8 @@ static int igb_set_channels(struct net_device *netdev,
.set_rxfh = igb_set_rxfh,
.get_channels = igb_get_channels,
.set_channels = igb_set_channels,
+ .get_priv_flags = igb_get_priv_flags,
+ .set_priv_flags = igb_set_priv_flags,
.begin = igb_ethtool_begin,
.complete = igb_ethtool_complete,
};
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 09/20] igb: Break out Rx buffer page management
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
` (7 preceding siblings ...)
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 08/20] igb: Add support for ethtool private flag to allow use of legacy Rx Alexander Duyck
@ 2016-12-19 20:11 ` Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 10/20] igb: Revert "igb: Revert support for build_skb in igb" Alexander Duyck
` (10 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:11 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
At this point we have 2 to 3 paths that can be taken depending on what Rx
modes are enabled. In order to better support that and improve the
maintainability I am breaking out the common bits from those paths and
making them into their own functions.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/igb/igb_main.c | 227 +++++++++++++++--------------
1 file changed, 115 insertions(+), 112 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index ddc2c2788959..cea252fc7582 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6868,11 +6868,10 @@ static inline bool igb_page_is_reserved(struct page *page)
return (page_to_nid(page) != numa_mem_id()) || page_is_pfmemalloc(page);
}
-static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer,
- struct page *page,
- const unsigned int truesize)
+static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer)
{
- unsigned int pagecnt_bias = rx_buffer->pagecnt_bias--;
+ unsigned int pagecnt_bias = rx_buffer->pagecnt_bias;
+ struct page *page = rx_buffer->page;
/* avoid re-using remote pages */
if (unlikely(igb_page_is_reserved(page)))
@@ -6880,14 +6879,9 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer,
#if (PAGE_SIZE < 8192)
/* if we are only owner of page we can reuse it */
- if (unlikely(page_ref_count(page) != pagecnt_bias))
+ if (unlikely((page_ref_count(page) - pagecnt_bias) > 1))
return false;
-
- /* flip page offset to other buffer */
- rx_buffer->page_offset ^= truesize;
#else
- /* move offset up to the next cache line */
- rx_buffer->page_offset += truesize;
#define IGB_LAST_OFFSET \
(SKB_WITH_OVERHEAD(PAGE_SIZE) - IGB_RXBUFFER_2048)
@@ -6899,7 +6893,7 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer,
* the pagecnt_bias and page count so that we fully restock the
* number of references the driver holds.
*/
- if (unlikely(pagecnt_bias == 1)) {
+ if (unlikely(!pagecnt_bias)) {
page_ref_add(page, USHRT_MAX);
rx_buffer->pagecnt_bias = USHRT_MAX;
}
@@ -6911,26 +6905,16 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer,
* igb_add_rx_frag - Add contents of Rx buffer to sk_buff
* @rx_ring: rx descriptor ring to transact packets on
* @rx_buffer: buffer containing page to add
- * @rx_desc: descriptor containing length of buffer written by hardware
* @skb: sk_buff to place the data into
+ * @size: size of buffer to be added
*
* This function will add the data contained in rx_buffer->page to the skb.
- * This is done either through a direct copy if the data in the buffer is
- * less than the skb header size, otherwise it will just attach the page as
- * a frag to the skb.
- *
- * The function will then update the page offset if necessary and return
- * true if the buffer can be reused by the adapter.
**/
-static bool igb_add_rx_frag(struct igb_ring *rx_ring,
+static void igb_add_rx_frag(struct igb_ring *rx_ring,
struct igb_rx_buffer *rx_buffer,
- unsigned int size,
- union e1000_adv_rx_desc *rx_desc,
- struct sk_buff *skb)
+ struct sk_buff *skb,
+ unsigned int size)
{
- struct page *page = rx_buffer->page;
- unsigned char *va = page_address(page) + rx_buffer->page_offset;
- unsigned int pull_len;
#if (PAGE_SIZE < 8192)
unsigned int truesize = igb_rx_pg_size(rx_ring) / 2;
#else
@@ -6938,9 +6922,39 @@ static bool igb_add_rx_frag(struct igb_ring *rx_ring,
SKB_DATA_ALIGN(IGB_SKB_PAD + size) :
SKB_DATA_ALIGN(size);
#endif
+ skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page,
+ rx_buffer->page_offset, size, truesize);
+#if (PAGE_SIZE < 8192)
+ rx_buffer->page_offset ^= truesize;
+#else
+ rx_buffer->page_offset += truesize;
+#endif
+}
+
+static struct sk_buff *igb_construct_skb(struct igb_ring *rx_ring,
+ struct igb_rx_buffer *rx_buffer,
+ union e1000_adv_rx_desc *rx_desc,
+ unsigned int size)
+{
+ void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
+#if (PAGE_SIZE < 8192)
+ unsigned int truesize = igb_rx_pg_size(rx_ring) / 2;
+#else
+ unsigned int truesize = SKB_DATA_ALIGN(size);
+#endif
+ unsigned int headlen;
+ struct sk_buff *skb;
- if (unlikely(skb_is_nonlinear(skb)))
- goto add_tail_frag;
+ /* prefetch first cache line of first page */
+ prefetch(va);
+#if L1_CACHE_BYTES < 128
+ prefetch(va + L1_CACHE_BYTES);
+#endif
+
+ /* allocate a skb to store the frags */
+ skb = napi_alloc_skb(&rx_ring->q_vector->napi, IGB_RX_HDR_LEN);
+ if (unlikely(!skb))
+ return NULL;
if (unlikely(igb_test_staterr(rx_desc, E1000_RXDADV_STAT_TSIP))) {
igb_ptp_rx_pktstamp(rx_ring->q_vector, va, skb);
@@ -6948,95 +6962,31 @@ static bool igb_add_rx_frag(struct igb_ring *rx_ring,
size -= IGB_TS_HDR_LEN;
}
- if (likely(size <= IGB_RX_HDR_LEN)) {
- memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
-
- /* page is not reserved, we can reuse buffer as-is */
- if (likely(!igb_page_is_reserved(page)))
- return true;
-
- /* this page cannot be reused so discard it */
- return false;
- }
-
- /* we need the header to contain the greater of either ETH_HLEN or
- * 60 bytes if the skb->len is less than 60 for skb_pad.
- */
- pull_len = eth_get_headlen(va, IGB_RX_HDR_LEN);
+ /* Determine available headroom for copy */
+ headlen = size;
+ if (headlen > IGB_RX_HDR_LEN)
+ headlen = eth_get_headlen(va, IGB_RX_HDR_LEN);
+ else
+ headlen = size;
/* align pull length to size of long to optimize memcpy performance */
- memcpy(__skb_put(skb, pull_len), va, ALIGN(pull_len, sizeof(long)));
+ memcpy(__skb_put(skb, headlen), va, ALIGN(headlen, sizeof(long)));
/* update all of the pointers */
- va += pull_len;
- size -= pull_len;
-
-add_tail_frag:
- skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
- va - page_address(page), size, truesize);
-
- return igb_can_reuse_rx_page(rx_buffer, page, truesize);
-}
-
-static struct sk_buff *igb_fetch_rx_buffer(struct igb_ring *rx_ring,
- union e1000_adv_rx_desc *rx_desc,
- struct sk_buff *skb)
-{
- unsigned int size = le16_to_cpu(rx_desc->wb.upper.length);
- struct igb_rx_buffer *rx_buffer;
- struct page *page;
-
- rx_buffer = &rx_ring->rx_buffer_info[rx_ring->next_to_clean];
- page = rx_buffer->page;
- prefetchw(page);
-
- /* we are reusing so sync this buffer for CPU use */
- dma_sync_single_range_for_cpu(rx_ring->dev,
- rx_buffer->dma,
- rx_buffer->page_offset,
- size,
- DMA_FROM_DEVICE);
-
- if (likely(!skb)) {
- void *va = page_address(page) + rx_buffer->page_offset;
-
- /* prefetch first cache line of first page */
- prefetch(va);
-#if L1_CACHE_BYTES < 128
- prefetch(va + L1_CACHE_BYTES);
+ size -= headlen;
+ if (size) {
+ skb_add_rx_frag(skb, 0, rx_buffer->page,
+ (va + headlen) - page_address(rx_buffer->page),
+ size, truesize);
+#if (PAGE_SIZE < 8192)
+ rx_buffer->page_offset ^= truesize;
+#else
+ rx_buffer->page_offset += truesize;
#endif
-
- /* allocate a skb to store the frags */
- skb = napi_alloc_skb(&rx_ring->q_vector->napi, IGB_RX_HDR_LEN);
- if (unlikely(!skb)) {
- rx_ring->rx_stats.alloc_failed++;
- return NULL;
- }
-
- /* we will be copying header into skb->data in
- * pskb_may_pull so it is in our interest to prefetch
- * it now to avoid a possible cache miss
- */
- prefetchw(skb->data);
- }
-
- /* pull page into skb */
- if (igb_add_rx_frag(rx_ring, rx_buffer, size, rx_desc, skb)) {
- /* hand second half of page back to the ring */
- igb_reuse_rx_page(rx_ring, rx_buffer);
} else {
- /* We are not reusing the buffer so unmap it and free
- * any references we are holding to it
- */
- dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma,
- igb_rx_pg_size(rx_ring), DMA_FROM_DEVICE,
- IGB_RX_DMA_ATTR);
- __page_frag_cache_drain(page, rx_buffer->pagecnt_bias);
+ rx_buffer->pagecnt_bias++;
}
- /* clear contents of rx_buffer */
- rx_buffer->page = NULL;
-
return skb;
}
@@ -7194,6 +7144,47 @@ static void igb_process_skb_fields(struct igb_ring *rx_ring,
skb->protocol = eth_type_trans(skb, rx_ring->netdev);
}
+static struct igb_rx_buffer *igb_get_rx_buffer(struct igb_ring *rx_ring,
+ const unsigned int size)
+{
+ struct igb_rx_buffer *rx_buffer;
+
+ rx_buffer = &rx_ring->rx_buffer_info[rx_ring->next_to_clean];
+ prefetchw(rx_buffer->page);
+
+ /* we are reusing so sync this buffer for CPU use */
+ dma_sync_single_range_for_cpu(rx_ring->dev,
+ rx_buffer->dma,
+ rx_buffer->page_offset,
+ size,
+ DMA_FROM_DEVICE);
+
+ rx_buffer->pagecnt_bias--;
+
+ return rx_buffer;
+}
+
+static void igb_put_rx_buffer(struct igb_ring *rx_ring,
+ struct igb_rx_buffer *rx_buffer)
+{
+ if (igb_can_reuse_rx_page(rx_buffer)) {
+ /* hand second half of page back to the ring */
+ igb_reuse_rx_page(rx_ring, rx_buffer);
+ } else {
+ /* We are not reusing the buffer so unmap it and free
+ * any references we are holding to it
+ */
+ dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma,
+ igb_rx_pg_size(rx_ring), DMA_FROM_DEVICE,
+ IGB_RX_DMA_ATTR);
+ __page_frag_cache_drain(rx_buffer->page,
+ rx_buffer->pagecnt_bias);
+ }
+
+ /* clear contents of rx_buffer */
+ rx_buffer->page = NULL;
+}
+
static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
{
struct igb_ring *rx_ring = q_vector->rx.ring;
@@ -7203,6 +7194,8 @@ static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
while (likely(total_packets < budget)) {
union e1000_adv_rx_desc *rx_desc;
+ struct igb_rx_buffer *rx_buffer;
+ unsigned int size;
/* return some buffers to hardware, one@a time is too slow */
if (cleaned_count >= IGB_RX_BUFFER_WRITE) {
@@ -7211,8 +7204,8 @@ static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
}
rx_desc = IGB_RX_DESC(rx_ring, rx_ring->next_to_clean);
-
- if (!rx_desc->wb.upper.length)
+ size = le16_to_cpu(rx_desc->wb.upper.length);
+ if (!size)
break;
/* This memory barrier is needed to keep us from reading
@@ -7221,13 +7214,23 @@ static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
*/
dma_rmb();
+ rx_buffer = igb_get_rx_buffer(rx_ring, size);
+
/* retrieve a buffer from the ring */
- skb = igb_fetch_rx_buffer(rx_ring, rx_desc, skb);
+ if (skb)
+ igb_add_rx_frag(rx_ring, rx_buffer, skb, size);
+ else
+ skb = igb_construct_skb(rx_ring, rx_buffer,
+ rx_desc, size);
/* exit if we failed to retrieve a buffer */
- if (!skb)
+ if (!skb) {
+ rx_ring->rx_stats.alloc_failed++;
+ rx_buffer->pagecnt_bias++;
break;
+ }
+ igb_put_rx_buffer(rx_ring, rx_buffer);
cleaned_count++;
/* fetch next buffer in frame if non-eop */
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 10/20] igb: Revert "igb: Revert support for build_skb in igb"
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
` (8 preceding siblings ...)
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 09/20] igb: Break out Rx buffer page management Alexander Duyck
@ 2016-12-19 20:11 ` Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 11/20] ixgbe: Add function for checking to see if we can reuse page Alexander Duyck
` (9 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:11 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
This reverts commit f9d40f6a9921 ("igb: Revert support for build_skb in
igb") and adds a few changes to update it to work with the latest version
of igb. We are now able to revert the removal of this due to the fact
that with the recent changes to the page count and the use of
DMA_ATTR_SKIP_CPU_SYNC we can make the pages writable so we should not be
invalidating the additional data added when we call build_skb.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/igb/igb_main.c | 47 +++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index cea252fc7582..ffafb50f458f 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6990,6 +6990,51 @@ static struct sk_buff *igb_construct_skb(struct igb_ring *rx_ring,
return skb;
}
+static struct sk_buff *igb_build_skb(struct igb_ring *rx_ring,
+ struct igb_rx_buffer *rx_buffer,
+ union e1000_adv_rx_desc *rx_desc,
+ unsigned int size)
+{
+ void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
+#if (PAGE_SIZE < 8192)
+ unsigned int truesize = igb_rx_pg_size(rx_ring) / 2;
+#else
+ unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +
+ SKB_DATA_ALIGN(IGB_SKB_PAD + size);
+#endif
+ struct sk_buff *skb;
+
+ /* prefetch first cache line of first page */
+ prefetch(va);
+#if L1_CACHE_BYTES < 128
+ prefetch(va + L1_CACHE_BYTES);
+#endif
+
+ /* build an skb to around the page buffer */
+ skb = build_skb(va - IGB_SKB_PAD, truesize);
+ if (unlikely(!skb))
+ return NULL;
+
+ /* update pointers within the skb to store the data */
+ skb_reserve(skb, IGB_SKB_PAD);
+ __skb_put(skb, size);
+
+ /* pull timestamp out of packet data */
+ if (igb_test_staterr(rx_desc, E1000_RXDADV_STAT_TSIP)) {
+ igb_ptp_rx_pktstamp(rx_ring->q_vector, skb->data, skb);
+ __skb_pull(skb, IGB_TS_HDR_LEN);
+ }
+
+ /* update buffer offset */
+#if (PAGE_SIZE < 8192)
+ rx_buffer->page_offset ^= truesize;
+#else
+ rx_buffer->page_offset += truesize;
+#endif
+
+ return skb;
+}
+
static inline void igb_rx_checksum(struct igb_ring *ring,
union e1000_adv_rx_desc *rx_desc,
struct sk_buff *skb)
@@ -7219,6 +7264,8 @@ static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
/* retrieve a buffer from the ring */
if (skb)
igb_add_rx_frag(rx_ring, rx_buffer, skb, size);
+ else if (ring_uses_build_skb(rx_ring))
+ skb = igb_build_skb(rx_ring, rx_buffer, rx_desc, size);
else
skb = igb_construct_skb(rx_ring, rx_buffer,
rx_desc, size);
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 11/20] ixgbe: Add function for checking to see if we can reuse page
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
` (9 preceding siblings ...)
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 10/20] igb: Revert "igb: Revert support for build_skb in igb" Alexander Duyck
@ 2016-12-19 20:11 ` Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 12/20] ixgbe: Only DMA sync frame length Alexander Duyck
` (8 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:11 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
This patch consolidates the code for the ixgbe driver so that it is more
inline with what is already in igb. The general idea is to just
consolidate functions that represent logical steps in the Rx process so we
can later update them more easily.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 70 +++++++++++++++----------
1 file changed, 41 insertions(+), 29 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 1e2f39ebd824..f62191f7d080 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1931,6 +1931,41 @@ static inline bool ixgbe_page_is_reserved(struct page *page)
return (page_to_nid(page) != numa_mem_id()) || page_is_pfmemalloc(page);
}
+static bool ixgbe_can_reuse_rx_page(struct ixgbe_rx_buffer *rx_buffer,
+ struct page *page,
+ const unsigned int truesize)
+{
+#if (PAGE_SIZE >= 8192)
+ unsigned int last_offset = ixgbe_rx_pg_size(rx_ring) -
+ ixgbe_rx_bufsz(rx_ring);
+#endif
+ /* avoid re-using remote pages */
+ if (unlikely(ixgbe_page_is_reserved(page)))
+ return false;
+
+#if (PAGE_SIZE < 8192)
+ /* if we are only owner of page we can reuse it */
+ if (unlikely(page_count(page) != 1))
+ return false;
+
+ /* flip page offset to other buffer */
+ rx_buffer->page_offset ^= truesize;
+#else
+ /* move offset up to the next cache line */
+ rx_buffer->page_offset += truesize;
+
+ if (rx_buffer->page_offset > last_offset)
+ return false;
+#endif
+
+ /* Even if we own the page, we are not allowed to use atomic_set()
+ * This would break get_page_unless_zero() users.
+ */
+ page_ref_inc(page);
+
+ return true;
+}
+
/**
* ixgbe_add_rx_frag - Add contents of Rx buffer to sk_buff
* @rx_ring: rx descriptor ring to transact packets on
@@ -1952,18 +1987,18 @@ static bool ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
struct sk_buff *skb)
{
struct page *page = rx_buffer->page;
+ unsigned char *va = page_address(page) + rx_buffer->page_offset;
unsigned int size = le16_to_cpu(rx_desc->wb.upper.length);
#if (PAGE_SIZE < 8192)
unsigned int truesize = ixgbe_rx_bufsz(rx_ring);
#else
unsigned int truesize = ALIGN(size, L1_CACHE_BYTES);
- unsigned int last_offset = ixgbe_rx_pg_size(rx_ring) -
- ixgbe_rx_bufsz(rx_ring);
#endif
- if ((size <= IXGBE_RX_HDR_SIZE) && !skb_is_nonlinear(skb)) {
- unsigned char *va = page_address(page) + rx_buffer->page_offset;
+ if (unlikely(skb_is_nonlinear(skb)))
+ goto add_tail_frag;
+ if (size <= IXGBE_RX_HDR_SIZE) {
memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
/* page is not reserved, we can reuse buffer as-is */
@@ -1975,34 +2010,11 @@ static bool ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
return false;
}
+add_tail_frag:
skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
rx_buffer->page_offset, size, truesize);
- /* avoid re-using remote pages */
- if (unlikely(ixgbe_page_is_reserved(page)))
- return false;
-
-#if (PAGE_SIZE < 8192)
- /* if we are only owner of page we can reuse it */
- if (unlikely(page_count(page) != 1))
- return false;
-
- /* flip page offset to other buffer */
- rx_buffer->page_offset ^= truesize;
-#else
- /* move offset up to the next cache line */
- rx_buffer->page_offset += truesize;
-
- if (rx_buffer->page_offset > last_offset)
- return false;
-#endif
-
- /* Even if we own the page, we are not allowed to use atomic_set()
- * This would break get_page_unless_zero() users.
- */
- page_ref_inc(page);
-
- return true;
+ return ixgbe_can_reuse_rx_page(rx_buffer, page, truesize);
}
static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 12/20] ixgbe: Only DMA sync frame length
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
` (10 preceding siblings ...)
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 11/20] ixgbe: Add function for checking to see if we can reuse page Alexander Duyck
@ 2016-12-19 20:11 ` Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 13/20] ixgbe: Update driver to make use of DMA attributes in Rx path Alexander Duyck
` (7 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:11 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
On some platforms, syncing a buffer for DMA is expensive. Rather than
sync the whole 2K receive buffer, only synchronise the length of the
frame, which will typically be the MTU, or a much smaller TCP ACK.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index f62191f7d080..0ff32c05bfc3 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1842,7 +1842,7 @@ static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
dma_sync_single_range_for_cpu(rx_ring->dev,
IXGBE_CB(skb)->dma,
frag->page_offset,
- ixgbe_rx_bufsz(rx_ring),
+ skb_frag_size(frag),
DMA_FROM_DEVICE);
}
IXGBE_CB(skb)->dma = 0;
@@ -1983,12 +1983,11 @@ static bool ixgbe_can_reuse_rx_page(struct ixgbe_rx_buffer *rx_buffer,
**/
static bool ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
struct ixgbe_rx_buffer *rx_buffer,
- union ixgbe_adv_rx_desc *rx_desc,
+ unsigned int size,
struct sk_buff *skb)
{
struct page *page = rx_buffer->page;
unsigned char *va = page_address(page) + rx_buffer->page_offset;
- unsigned int size = le16_to_cpu(rx_desc->wb.upper.length);
#if (PAGE_SIZE < 8192)
unsigned int truesize = ixgbe_rx_bufsz(rx_ring);
#else
@@ -2020,6 +2019,7 @@ static bool ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
union ixgbe_adv_rx_desc *rx_desc)
{
+ unsigned int size = le16_to_cpu(rx_desc->wb.upper.length);
struct ixgbe_rx_buffer *rx_buffer;
struct sk_buff *skb;
struct page *page;
@@ -2074,14 +2074,14 @@ static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
dma_sync_single_range_for_cpu(rx_ring->dev,
rx_buffer->dma,
rx_buffer->page_offset,
- ixgbe_rx_bufsz(rx_ring),
+ size,
DMA_FROM_DEVICE);
rx_buffer->skb = NULL;
}
/* pull page into skb */
- if (ixgbe_add_rx_frag(rx_ring, rx_buffer, rx_desc, skb)) {
+ if (ixgbe_add_rx_frag(rx_ring, rx_buffer, size, skb)) {
/* hand second half of page back to the ring */
ixgbe_reuse_rx_page(rx_ring, rx_buffer);
} else if (IXGBE_CB(skb)->dma == rx_buffer->dma) {
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 13/20] ixgbe: Update driver to make use of DMA attributes in Rx path
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
` (11 preceding siblings ...)
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 12/20] ixgbe: Only DMA sync frame length Alexander Duyck
@ 2016-12-19 20:11 ` Alexander Duyck
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 14/20] ixgbe: Update code to better handle incrementing page count Alexander Duyck
` (6 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:11 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
This patch adds support for DMA_ATTR_SKIP_CPU_SYNC and
DMA_ATTR_WEAK_ORDERING. By enabling both of these for the Rx path we are
able to see performance improvements on architectures that implement either
one due to the fact that page mapping and unmapping only has to sync what
is actually being used instead of the entire buffer. In addition by
enabling the weak ordering attribute enables a performance improvement for
architectures that can associate a memory ordering with a DMA buffer such
as Sparc.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 3 +
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 56 +++++++++++++++++--------
2 files changed, 40 insertions(+), 19 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index ef81c3d8c295..108a8e22c33e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -107,6 +107,9 @@
/* How many Rx Buffers do we bundle into one write to the hardware ? */
#define IXGBE_RX_BUFFER_WRITE 16 /* Must be power of 2 */
+#define IXGBE_RX_DMA_ATTR \
+ (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)
+
enum ixgbe_tx_flags {
/* cmd_type flags */
IXGBE_TX_FLAGS_HW_VLAN = 0x01,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0ff32c05bfc3..1928a479dcc1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1567,8 +1567,10 @@ static bool ixgbe_alloc_mapped_page(struct ixgbe_ring *rx_ring,
}
/* map page for use */
- dma = dma_map_page(rx_ring->dev, page, 0,
- ixgbe_rx_pg_size(rx_ring), DMA_FROM_DEVICE);
+ dma = dma_map_page_attrs(rx_ring->dev, page, 0,
+ ixgbe_rx_pg_size(rx_ring),
+ DMA_FROM_DEVICE,
+ IXGBE_RX_DMA_ATTR);
/*
* if mapping failed free memory back to system since
@@ -1611,6 +1613,12 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
if (!ixgbe_alloc_mapped_page(rx_ring, bi))
break;
+ /* sync the buffer for use by the device */
+ dma_sync_single_range_for_device(rx_ring->dev, bi->dma,
+ bi->page_offset,
+ ixgbe_rx_bufsz(rx_ring),
+ DMA_FROM_DEVICE);
+
/*
* Refresh the desc even if buffer_addrs didn't change
* because each write-back erases this info.
@@ -1833,8 +1841,10 @@ static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
{
/* if the page was released unmap it, else just sync our portion */
if (unlikely(IXGBE_CB(skb)->page_released)) {
- dma_unmap_page(rx_ring->dev, IXGBE_CB(skb)->dma,
- ixgbe_rx_pg_size(rx_ring), DMA_FROM_DEVICE);
+ dma_unmap_page_attrs(rx_ring->dev, IXGBE_CB(skb)->dma,
+ ixgbe_rx_pg_size(rx_ring),
+ DMA_FROM_DEVICE,
+ IXGBE_RX_DMA_ATTR);
IXGBE_CB(skb)->page_released = false;
} else {
struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[0];
@@ -1918,12 +1928,6 @@ static void ixgbe_reuse_rx_page(struct ixgbe_ring *rx_ring,
/* transfer page from old buffer to new buffer */
*new_buff = *old_buff;
-
- /* sync the buffer for use by the device */
- dma_sync_single_range_for_device(rx_ring->dev, new_buff->dma,
- new_buff->page_offset,
- ixgbe_rx_bufsz(rx_ring),
- DMA_FROM_DEVICE);
}
static inline bool ixgbe_page_is_reserved(struct page *page)
@@ -2089,9 +2093,10 @@ static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
IXGBE_CB(skb)->page_released = true;
} else {
/* we are not reusing the buffer so unmap it */
- dma_unmap_page(rx_ring->dev, rx_buffer->dma,
- ixgbe_rx_pg_size(rx_ring),
- DMA_FROM_DEVICE);
+ dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma,
+ ixgbe_rx_pg_size(rx_ring),
+ DMA_FROM_DEVICE,
+ IXGBE_RX_DMA_ATTR);
}
/* clear contents of buffer_info */
@@ -4906,10 +4911,11 @@ static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
if (rx_buffer->skb) {
struct sk_buff *skb = rx_buffer->skb;
if (IXGBE_CB(skb)->page_released)
- dma_unmap_page(dev,
- IXGBE_CB(skb)->dma,
- ixgbe_rx_bufsz(rx_ring),
- DMA_FROM_DEVICE);
+ dma_unmap_page_attrs(dev,
+ IXGBE_CB(skb)->dma,
+ ixgbe_rx_bufsz(rx_ring),
+ DMA_FROM_DEVICE,
+ IXGBE_RX_DMA_ATTR);
dev_kfree_skb(skb);
rx_buffer->skb = NULL;
}
@@ -4917,8 +4923,20 @@ static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
if (!rx_buffer->page)
continue;
- dma_unmap_page(dev, rx_buffer->dma,
- ixgbe_rx_pg_size(rx_ring), DMA_FROM_DEVICE);
+ /* Invalidate cache lines that may have been written to by
+ * device so that we avoid corrupting memory.
+ */
+ dma_sync_single_range_for_cpu(rx_ring->dev,
+ rx_buffer->dma,
+ rx_buffer->page_offset,
+ ixgbe_rx_pg_size(rx_ring),
+ DMA_FROM_DEVICE);
+
+ /* free resources associated with mapping */
+ dma_unmap_page_attrs(dev, rx_buffer->dma,
+ ixgbe_rx_pg_size(rx_ring),
+ DMA_FROM_DEVICE,
+ IXGBE_RX_DMA_ATTR);
__free_pages(rx_buffer->page, ixgbe_rx_pg_order(rx_ring));
rx_buffer->page = NULL;
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 14/20] ixgbe: Update code to better handle incrementing page count
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
` (12 preceding siblings ...)
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 13/20] ixgbe: Update driver to make use of DMA attributes in Rx path Alexander Duyck
@ 2016-12-19 20:12 ` Alexander Duyck
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 15/20] ixgbe: Make use of order 1 pages and 3K buffers independent of FCoE Alexander Duyck
` (5 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:12 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
Batch the page count updates instead of doing them one at a time. By doing
this we can improve the overall performance as the atomic increment
operations can be expensive due to the fact that on x86 they are locked
operations which can cause stalls. By doing bulk updates we can
consolidate the stall which should help to improve the overall receive
performance.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 7 ++++
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 39 ++++++++++++++++---------
2 files changed, 31 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 108a8e22c33e..c9c00b58a9ad 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -197,7 +197,12 @@ struct ixgbe_rx_buffer {
struct sk_buff *skb;
dma_addr_t dma;
struct page *page;
- unsigned int page_offset;
+#if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536)
+ __u32 page_offset;
+#else
+ __u16 page_offset;
+#endif
+ __u16 pagecnt_bias;
};
struct ixgbe_queue_stats {
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 1928a479dcc1..ac65b568fe01 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1586,6 +1586,7 @@ static bool ixgbe_alloc_mapped_page(struct ixgbe_ring *rx_ring,
bi->dma = dma;
bi->page = page;
bi->page_offset = 0;
+ bi->pagecnt_bias = 1;
return true;
}
@@ -1943,13 +1944,15 @@ static bool ixgbe_can_reuse_rx_page(struct ixgbe_rx_buffer *rx_buffer,
unsigned int last_offset = ixgbe_rx_pg_size(rx_ring) -
ixgbe_rx_bufsz(rx_ring);
#endif
+ unsigned int pagecnt_bias = rx_buffer->pagecnt_bias--;
+
/* avoid re-using remote pages */
if (unlikely(ixgbe_page_is_reserved(page)))
return false;
#if (PAGE_SIZE < 8192)
/* if we are only owner of page we can reuse it */
- if (unlikely(page_count(page) != 1))
+ if (unlikely(page_count(page) != pagecnt_bias))
return false;
/* flip page offset to other buffer */
@@ -1962,10 +1965,14 @@ static bool ixgbe_can_reuse_rx_page(struct ixgbe_rx_buffer *rx_buffer,
return false;
#endif
- /* Even if we own the page, we are not allowed to use atomic_set()
- * This would break get_page_unless_zero() users.
+ /* If we have drained the page fragment pool we need to update
+ * the pagecnt_bias and page count so that we fully restock the
+ * number of references the driver holds.
*/
- page_ref_inc(page);
+ if (unlikely(pagecnt_bias == 1)) {
+ page_ref_add(page, USHRT_MAX);
+ rx_buffer->pagecnt_bias = USHRT_MAX;
+ }
return true;
}
@@ -2009,7 +2016,6 @@ static bool ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
return true;
/* this page cannot be reused so discard it */
- __free_pages(page, ixgbe_rx_pg_order(rx_ring));
return false;
}
@@ -2088,15 +2094,19 @@ static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
if (ixgbe_add_rx_frag(rx_ring, rx_buffer, size, skb)) {
/* hand second half of page back to the ring */
ixgbe_reuse_rx_page(rx_ring, rx_buffer);
- } else if (IXGBE_CB(skb)->dma == rx_buffer->dma) {
- /* the page has been released from the ring */
- IXGBE_CB(skb)->page_released = true;
} else {
- /* we are not reusing the buffer so unmap it */
- dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma,
- ixgbe_rx_pg_size(rx_ring),
- DMA_FROM_DEVICE,
- IXGBE_RX_DMA_ATTR);
+ if (IXGBE_CB(skb)->dma == rx_buffer->dma) {
+ /* the page has been released from the ring */
+ IXGBE_CB(skb)->page_released = true;
+ } else {
+ /* we are not reusing the buffer so unmap it */
+ dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma,
+ ixgbe_rx_pg_size(rx_ring),
+ DMA_FROM_DEVICE,
+ IXGBE_RX_DMA_ATTR);
+ }
+ __page_frag_cache_drain(page,
+ rx_buffer->pagecnt_bias);
}
/* clear contents of buffer_info */
@@ -4937,7 +4947,8 @@ static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
ixgbe_rx_pg_size(rx_ring),
DMA_FROM_DEVICE,
IXGBE_RX_DMA_ATTR);
- __free_pages(rx_buffer->page, ixgbe_rx_pg_order(rx_ring));
+ __page_frag_cache_drain(rx_buffer->page,
+ rx_buffer->pagecnt_bias);
rx_buffer->page = NULL;
}
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 15/20] ixgbe: Make use of order 1 pages and 3K buffers independent of FCoE
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
` (13 preceding siblings ...)
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 14/20] ixgbe: Update code to better handle incrementing page count Alexander Duyck
@ 2016-12-19 20:12 ` Alexander Duyck
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 16/20] ixgbe: Use length to determine if descriptor is done Alexander Duyck
` (4 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:12 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
In order to support build_skb with jumbo frames it will be necessary to use
3K buffers for the Rx path with 8K pages backing them. This is needed on
architectures that implement 4K pages because we can't support 2K buffers
plus padding in a 4K page.
In the case of systems that support page sizes larger than 4K the 3K
attribute will only be applied to FCoE as we can fall back to using just 2K
buffers and adding the padding.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 20 +++++++++-----------
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 19 +++++++++++++------
2 files changed, 22 insertions(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index c9c00b58a9ad..cb1e8d4c10bb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -233,13 +233,14 @@ struct ixgbe_rx_queue_stats {
#define IXGBE_TS_HDR_LEN 8
enum ixgbe_ring_state_t {
+ __IXGBE_RX_3K_BUFFER,
+ __IXGBE_RX_RSC_ENABLED,
+ __IXGBE_RX_CSUM_UDP_ZERO_ERR,
+ __IXGBE_RX_FCOE,
__IXGBE_TX_FDIR_INIT_DONE,
__IXGBE_TX_XPS_INIT_DONE,
__IXGBE_TX_DETECT_HANG,
__IXGBE_HANG_CHECK_ARMED,
- __IXGBE_RX_RSC_ENABLED,
- __IXGBE_RX_CSUM_UDP_ZERO_ERR,
- __IXGBE_RX_FCOE,
};
struct ixgbe_fwd_adapter {
@@ -351,19 +352,16 @@ struct ixgbe_ring_feature {
*/
static inline unsigned int ixgbe_rx_bufsz(struct ixgbe_ring *ring)
{
-#ifdef IXGBE_FCOE
- if (test_bit(__IXGBE_RX_FCOE, &ring->state))
- return (PAGE_SIZE < 8192) ? IXGBE_RXBUFFER_4K :
- IXGBE_RXBUFFER_3K;
-#endif
+ if (test_bit(__IXGBE_RX_3K_BUFFER, &ring->state))
+ return IXGBE_RXBUFFER_3K;
return IXGBE_RXBUFFER_2K;
}
static inline unsigned int ixgbe_rx_pg_order(struct ixgbe_ring *ring)
{
-#ifdef IXGBE_FCOE
- if (test_bit(__IXGBE_RX_FCOE, &ring->state))
- return (PAGE_SIZE < 8192) ? 1 : 0;
+#if (PAGE_SIZE < 8192)
+ if (test_bit(__IXGBE_RX_3K_BUFFER, &ring->state))
+ return 1;
#endif
return 0;
}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ac65b568fe01..56a2408ac0d8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1601,6 +1601,7 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
union ixgbe_adv_rx_desc *rx_desc;
struct ixgbe_rx_buffer *bi;
u16 i = rx_ring->next_to_use;
+ u16 bufsz;
/* nothing to do */
if (!cleaned_count)
@@ -1610,14 +1611,15 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
bi = &rx_ring->rx_buffer_info[i];
i -= rx_ring->count;
+ bufsz = ixgbe_rx_bufsz(rx_ring);
+
do {
if (!ixgbe_alloc_mapped_page(rx_ring, bi))
break;
/* sync the buffer for use by the device */
dma_sync_single_range_for_device(rx_ring->dev, bi->dma,
- bi->page_offset,
- ixgbe_rx_bufsz(rx_ring),
+ bi->page_offset, bufsz,
DMA_FROM_DEVICE);
/*
@@ -2000,9 +2002,9 @@ static bool ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
struct page *page = rx_buffer->page;
unsigned char *va = page_address(page) + rx_buffer->page_offset;
#if (PAGE_SIZE < 8192)
- unsigned int truesize = ixgbe_rx_bufsz(rx_ring);
+ unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
#else
- unsigned int truesize = ALIGN(size, L1_CACHE_BYTES);
+ unsigned int truesize = SKB_DATA_ALIGN(size);
#endif
if (unlikely(skb_is_nonlinear(skb)))
@@ -3882,10 +3884,15 @@ static void ixgbe_set_rx_buffer_len(struct ixgbe_adapter *adapter)
*/
for (i = 0; i < adapter->num_rx_queues; i++) {
rx_ring = adapter->rx_ring[i];
+
+ clear_ring_rsc_enabled(rx_ring);
+ clear_bit(__IXGBE_RX_3K_BUFFER, &rx_ring->state);
+
if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED)
set_ring_rsc_enabled(rx_ring);
- else
- clear_ring_rsc_enabled(rx_ring);
+
+ if (test_bit(__IXGBE_RX_FCOE, &rx_ring->state))
+ set_bit(__IXGBE_RX_3K_BUFFER, &rx_ring->state);
}
}
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 16/20] ixgbe: Use length to determine if descriptor is done
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
` (14 preceding siblings ...)
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 15/20] ixgbe: Make use of order 1 pages and 3K buffers independent of FCoE Alexander Duyck
@ 2016-12-19 20:12 ` Alexander Duyck
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 17/20] ixgbe: Break out Rx buffer page management Alexander Duyck
` (3 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:12 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
This change makes it so that we use the length of the packet instead of the
DD status bit to determine if a new descriptor is ready to be processed.
The obvious advantage is that it cuts down on reads as we don't really even
need the DD bit if going from a 0 to a non-zero value on size is enough to
inform us that the packet has been completed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 56a2408ac0d8..abd68b9b628e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1637,8 +1637,8 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
i -= rx_ring->count;
}
- /* clear the status bits for the next_to_use descriptor */
- rx_desc->wb.upper.status_error = 0;
+ /* clear the length for the next_to_use descriptor */
+ rx_desc->wb.upper.length = 0;
cleaned_count--;
} while (cleaned_count);
@@ -2154,7 +2154,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
rx_desc = IXGBE_RX_DESC(rx_ring, rx_ring->next_to_clean);
- if (!rx_desc->wb.upper.status_error)
+ if (!rx_desc->wb.upper.length)
break;
/* This memory barrier is needed to keep us from reading
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 17/20] ixgbe: Break out Rx buffer page management
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
` (15 preceding siblings ...)
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 16/20] ixgbe: Use length to determine if descriptor is done Alexander Duyck
@ 2016-12-19 20:12 ` Alexander Duyck
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 18/20] ixgbe: Add support for padding packet Alexander Duyck
` (2 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:12 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
We are going to be expanding the number of Rx paths in the driver. Instead
of duplicating all that code I am pulling it apart into separate functions
so that we don't have so much code duplication.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 220 +++++++++++++------------
1 file changed, 114 insertions(+), 106 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index abd68b9b628e..3f94cad1fdc8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1848,7 +1848,6 @@ static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
ixgbe_rx_pg_size(rx_ring),
DMA_FROM_DEVICE,
IXGBE_RX_DMA_ATTR);
- IXGBE_CB(skb)->page_released = false;
} else {
struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[0];
@@ -1858,7 +1857,6 @@ static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
skb_frag_size(frag),
DMA_FROM_DEVICE);
}
- IXGBE_CB(skb)->dma = 0;
}
/**
@@ -1938,15 +1936,10 @@ static inline bool ixgbe_page_is_reserved(struct page *page)
return (page_to_nid(page) != numa_mem_id()) || page_is_pfmemalloc(page);
}
-static bool ixgbe_can_reuse_rx_page(struct ixgbe_rx_buffer *rx_buffer,
- struct page *page,
- const unsigned int truesize)
+static bool ixgbe_can_reuse_rx_page(struct ixgbe_rx_buffer *rx_buffer)
{
-#if (PAGE_SIZE >= 8192)
- unsigned int last_offset = ixgbe_rx_pg_size(rx_ring) -
- ixgbe_rx_bufsz(rx_ring);
-#endif
- unsigned int pagecnt_bias = rx_buffer->pagecnt_bias--;
+ unsigned int pagecnt_bias = rx_buffer->pagecnt_bias;
+ struct page *page = rx_buffer->page;
/* avoid re-using remote pages */
if (unlikely(ixgbe_page_is_reserved(page)))
@@ -1954,16 +1947,17 @@ static bool ixgbe_can_reuse_rx_page(struct ixgbe_rx_buffer *rx_buffer,
#if (PAGE_SIZE < 8192)
/* if we are only owner of page we can reuse it */
- if (unlikely(page_count(page) != pagecnt_bias))
+ if (unlikely((page_ref_count(page) - pagecnt_bias) > 1))
return false;
-
- /* flip page offset to other buffer */
- rx_buffer->page_offset ^= truesize;
#else
- /* move offset up to the next cache line */
- rx_buffer->page_offset += truesize;
-
- if (rx_buffer->page_offset > last_offset)
+ /* The last offset is a bit aggressive in that we assume the
+ * worst case of FCoE being enabled and using a 3K buffer.
+ * However this should have minimal impact as the 1K extra is
+ * still less than one buffer in size.
+ */
+#define IXGBE_LAST_OFFSET \
+ (SKB_WITH_OVERHEAD(PAGE_SIZE) - IXGBE_RXBUFFER_3072)
+ if (rx_buffer->page_offset > IXGBE_LAST_OFFSET)
return false;
#endif
@@ -1971,7 +1965,7 @@ static bool ixgbe_can_reuse_rx_page(struct ixgbe_rx_buffer *rx_buffer,
* the pagecnt_bias and page count so that we fully restock the
* number of references the driver holds.
*/
- if (unlikely(pagecnt_bias == 1)) {
+ if (unlikely(!pagecnt_bias)) {
page_ref_add(page, USHRT_MAX);
rx_buffer->pagecnt_bias = USHRT_MAX;
}
@@ -1994,106 +1988,65 @@ static bool ixgbe_can_reuse_rx_page(struct ixgbe_rx_buffer *rx_buffer,
* The function will then update the page offset if necessary and return
* true if the buffer can be reused by the adapter.
**/
-static bool ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
+static void ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
struct ixgbe_rx_buffer *rx_buffer,
- unsigned int size,
- struct sk_buff *skb)
+ struct sk_buff *skb,
+ unsigned int size)
{
- struct page *page = rx_buffer->page;
- unsigned char *va = page_address(page) + rx_buffer->page_offset;
#if (PAGE_SIZE < 8192)
unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
#else
unsigned int truesize = SKB_DATA_ALIGN(size);
#endif
-
- if (unlikely(skb_is_nonlinear(skb)))
- goto add_tail_frag;
-
- if (size <= IXGBE_RX_HDR_SIZE) {
- memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
-
- /* page is not reserved, we can reuse buffer as-is */
- if (likely(!ixgbe_page_is_reserved(page)))
- return true;
-
- /* this page cannot be reused so discard it */
- return false;
- }
-
-add_tail_frag:
- skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
+ skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page,
rx_buffer->page_offset, size, truesize);
-
- return ixgbe_can_reuse_rx_page(rx_buffer, page, truesize);
+#if (PAGE_SIZE < 8192)
+ rx_buffer->page_offset ^= truesize;
+#else
+ rx_buffer->page_offset += truesize;
+#endif
}
-static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
- union ixgbe_adv_rx_desc *rx_desc)
+static struct ixgbe_rx_buffer *ixgbe_get_rx_buffer(struct ixgbe_ring *rx_ring,
+ union ixgbe_adv_rx_desc *rx_desc,
+ struct sk_buff **skb,
+ const unsigned int size)
{
- unsigned int size = le16_to_cpu(rx_desc->wb.upper.length);
struct ixgbe_rx_buffer *rx_buffer;
- struct sk_buff *skb;
- struct page *page;
rx_buffer = &rx_ring->rx_buffer_info[rx_ring->next_to_clean];
- page = rx_buffer->page;
- prefetchw(page);
-
- skb = rx_buffer->skb;
-
- if (likely(!skb)) {
- void *page_addr = page_address(page) +
- rx_buffer->page_offset;
-
- /* prefetch first cache line of first page */
- prefetch(page_addr);
-#if L1_CACHE_BYTES < 128
- prefetch(page_addr + L1_CACHE_BYTES);
-#endif
-
- /* allocate a skb to store the frags */
- skb = napi_alloc_skb(&rx_ring->q_vector->napi,
- IXGBE_RX_HDR_SIZE);
- if (unlikely(!skb)) {
- rx_ring->rx_stats.alloc_rx_buff_failed++;
- return NULL;
- }
-
- /*
- * we will be copying header into skb->data in
- * pskb_may_pull so it is in our interest to prefetch
- * it now to avoid a possible cache miss
- */
- prefetchw(skb->data);
-
- /*
- * Delay unmapping of the first packet. It carries the
- * header information, HW may still access the header
- * after the writeback. Only unmap it when EOP is
- * reached
- */
- if (likely(ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_EOP)))
- goto dma_sync;
+ prefetchw(rx_buffer->page);
+ *skb = rx_buffer->skb;
- IXGBE_CB(skb)->dma = rx_buffer->dma;
+ /* Delay unmapping of the first packet. It carries the header
+ * information, HW may still access the header after the writeback.
+ * Only unmap it when EOP is reached
+ */
+ if (!ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_EOP)) {
+ if (!*skb)
+ goto skip_sync;
} else {
- if (ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_EOP))
- ixgbe_dma_sync_frag(rx_ring, skb);
+ if (*skb)
+ ixgbe_dma_sync_frag(rx_ring, *skb);
+ }
-dma_sync:
- /* we are reusing so sync this buffer for CPU use */
- dma_sync_single_range_for_cpu(rx_ring->dev,
- rx_buffer->dma,
- rx_buffer->page_offset,
- size,
- DMA_FROM_DEVICE);
+ /* we are reusing so sync this buffer for CPU use */
+ dma_sync_single_range_for_cpu(rx_ring->dev,
+ rx_buffer->dma,
+ rx_buffer->page_offset,
+ size,
+ DMA_FROM_DEVICE);
+skip_sync:
+ rx_buffer->pagecnt_bias--;
- rx_buffer->skb = NULL;
- }
+ return rx_buffer;
+}
- /* pull page into skb */
- if (ixgbe_add_rx_frag(rx_ring, rx_buffer, size, skb)) {
+static void ixgbe_put_rx_buffer(struct ixgbe_ring *rx_ring,
+ struct ixgbe_rx_buffer *rx_buffer,
+ struct sk_buff *skb)
+{
+ if (ixgbe_can_reuse_rx_page(rx_buffer)) {
/* hand second half of page back to the ring */
ixgbe_reuse_rx_page(rx_ring, rx_buffer);
} else {
@@ -2107,12 +2060,55 @@ static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
DMA_FROM_DEVICE,
IXGBE_RX_DMA_ATTR);
}
- __page_frag_cache_drain(page,
+ __page_frag_cache_drain(rx_buffer->page,
rx_buffer->pagecnt_bias);
}
- /* clear contents of buffer_info */
+ /* clear contents of rx_buffer */
rx_buffer->page = NULL;
+ rx_buffer->skb = NULL;
+}
+
+static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
+ struct ixgbe_rx_buffer *rx_buffer,
+ union ixgbe_adv_rx_desc *rx_desc,
+ unsigned int size)
+{
+ void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
+#if (PAGE_SIZE < 8192)
+ unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
+#else
+ unsigned int truesize = ALIGN(size, L1_CACHE_BYTES);
+#endif
+ struct sk_buff *skb;
+
+ /* prefetch first cache line of first page */
+ prefetch(va);
+#if L1_CACHE_BYTES < 128
+ prefetch(va + L1_CACHE_BYTES);
+#endif
+
+ /* allocate a skb to store the frags */
+ skb = napi_alloc_skb(&rx_ring->q_vector->napi, IXGBE_RX_HDR_SIZE);
+ if (unlikely(!skb))
+ return NULL;
+
+ if (size > IXGBE_RX_HDR_SIZE) {
+ if (!ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_EOP))
+ IXGBE_CB(skb)->dma = rx_buffer->dma;
+
+ skb_add_rx_frag(skb, 0, rx_buffer->page,
+ rx_buffer->page_offset,
+ size, truesize);
+#if (PAGE_SIZE < 8192)
+ rx_buffer->page_offset ^= truesize;
+#else
+ rx_buffer->page_offset += truesize;
+#endif
+ } else {
+ memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
+ rx_buffer->pagecnt_bias++;
+ }
return skb;
}
@@ -2144,7 +2140,9 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
while (likely(total_rx_packets < budget)) {
union ixgbe_adv_rx_desc *rx_desc;
+ struct ixgbe_rx_buffer *rx_buffer;
struct sk_buff *skb;
+ unsigned int size;
/* return some buffers to hardware, one@a time is too slow */
if (cleaned_count >= IXGBE_RX_BUFFER_WRITE) {
@@ -2153,8 +2151,8 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
}
rx_desc = IXGBE_RX_DESC(rx_ring, rx_ring->next_to_clean);
-
- if (!rx_desc->wb.upper.length)
+ size = le16_to_cpu(rx_desc->wb.upper.length);
+ if (!size)
break;
/* This memory barrier is needed to keep us from reading
@@ -2163,13 +2161,23 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
*/
dma_rmb();
+ rx_buffer = ixgbe_get_rx_buffer(rx_ring, rx_desc, &skb, size);
+
/* retrieve a buffer from the ring */
- skb = ixgbe_fetch_rx_buffer(rx_ring, rx_desc);
+ if (skb)
+ ixgbe_add_rx_frag(rx_ring, rx_buffer, skb, size);
+ else
+ skb = ixgbe_construct_skb(rx_ring, rx_buffer,
+ rx_desc, size);
/* exit if we failed to retrieve a buffer */
- if (!skb)
+ if (!skb) {
+ rx_ring->rx_stats.alloc_rx_buff_failed++;
+ rx_buffer->pagecnt_bias++;
break;
+ }
+ ixgbe_put_rx_buffer(rx_ring, rx_buffer, skb);
cleaned_count++;
/* place incomplete frames back on ring for completion */
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 18/20] ixgbe: Add support for padding packet
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
` (16 preceding siblings ...)
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 17/20] ixgbe: Break out Rx buffer page management Alexander Duyck
@ 2016-12-19 20:12 ` Alexander Duyck
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 19/20] ixgbe: Add private flag to control buffer mode Alexander Duyck
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 20/20] ixgbe: Add support for build_skb Alexander Duyck
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:12 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
This patch adds support for providing a buffer with headroom and tailroom
to allow for shared info, NET_SKB_PAD, and NET_IP_ALIGN. With this
combined with the DMA changes we can start using build_skb to build frames
around an incoming Rx buffer instead of having to memcpy the headers.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 17 ++++++++++
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 43 +++++++++++++++++++++++--
2 files changed, 56 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index cb1e8d4c10bb..a535b538ad9e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -94,6 +94,14 @@
#define IXGBE_RXBUFFER_4K 4096
#define IXGBE_MAX_RXBUFFER 16384 /* largest size for a single descriptor */
+#define IXGBE_SKB_PAD (NET_SKB_PAD + NET_IP_ALIGN)
+#if (PAGE_SIZE < 8192)
+#define IXGBE_MAX_FRAME_BUILD_SKB \
+ (SKB_WITH_OVERHEAD(IXGBE_RXBUFFER_2K) - IXGBE_SKB_PAD)
+#else
+#define IGB_MAX_FRAME_BUILD_SKB IXGBE_RXBUFFER_2K
+#endif
+
/*
* NOTE: netdev_alloc_skb reserves up to 64 bytes, NET_IP_ALIGN means we
* reserve 64 more, and skb_shared_info adds an additional 320 bytes more,
@@ -234,6 +242,7 @@ struct ixgbe_rx_queue_stats {
enum ixgbe_ring_state_t {
__IXGBE_RX_3K_BUFFER,
+ __IXGBE_RX_BUILD_SKB_ENABLED,
__IXGBE_RX_RSC_ENABLED,
__IXGBE_RX_CSUM_UDP_ZERO_ERR,
__IXGBE_RX_FCOE,
@@ -243,6 +252,9 @@ enum ixgbe_ring_state_t {
__IXGBE_HANG_CHECK_ARMED,
};
+#define ring_uses_build_skb(ring) \
+ test_bit(__IXGBE_RX_BUILD_SKB_ENABLED, &(ring)->state)
+
struct ixgbe_fwd_adapter {
unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
struct net_device *netdev;
@@ -354,6 +366,10 @@ static inline unsigned int ixgbe_rx_bufsz(struct ixgbe_ring *ring)
{
if (test_bit(__IXGBE_RX_3K_BUFFER, &ring->state))
return IXGBE_RXBUFFER_3K;
+#if (PAGE_SIZE < 8192)
+ if (ring_uses_build_skb(ring))
+ return IXGBE_MAX_FRAME_BUILD_SKB;
+#endif
return IXGBE_RXBUFFER_2K;
}
@@ -667,6 +683,7 @@ struct ixgbe_adapter {
#define IXGBE_FLAG2_PHY_INTERRUPT BIT(11)
#define IXGBE_FLAG2_UDP_TUN_REREG_NEEDED BIT(12)
#define IXGBE_FLAG2_VLAN_PROMISC BIT(13)
+#define IXGBE_FLAG2_RX_LEGACY BIT(14)
/* Tx fast path data */
int num_tx_queues;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 3f94cad1fdc8..d509b6842b9a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1549,6 +1549,11 @@ static inline void ixgbe_rx_checksum(struct ixgbe_ring *ring,
}
}
+static inline unsigned int ixgbe_rx_offset(struct ixgbe_ring *rx_ring)
+{
+ return ring_uses_build_skb(rx_ring) ? IXGBE_SKB_PAD : 0;
+}
+
static bool ixgbe_alloc_mapped_page(struct ixgbe_ring *rx_ring,
struct ixgbe_rx_buffer *bi)
{
@@ -1585,7 +1590,7 @@ static bool ixgbe_alloc_mapped_page(struct ixgbe_ring *rx_ring,
bi->dma = dma;
bi->page = page;
- bi->page_offset = 0;
+ bi->page_offset = ixgbe_rx_offset(rx_ring);
bi->pagecnt_bias = 1;
return true;
@@ -1996,7 +2001,9 @@ static void ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
#if (PAGE_SIZE < 8192)
unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
#else
- unsigned int truesize = SKB_DATA_ALIGN(size);
+ unsigned int truesize = ring_uses_build_skb(rx_ring) ?
+ SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) :
+ SKB_DATA_ALIGN(size);
#endif
skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page,
rx_buffer->page_offset, size, truesize);
@@ -2078,7 +2085,7 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
#if (PAGE_SIZE < 8192)
unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
#else
- unsigned int truesize = ALIGN(size, L1_CACHE_BYTES);
+ unsigned int truesize = SKB_DATA_ALIGN(size);
#endif
struct sk_buff *skb;
@@ -3421,7 +3428,10 @@ static void ixgbe_configure_srrctl(struct ixgbe_adapter *adapter,
srrctl = IXGBE_RX_HDR_SIZE << IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT;
/* configure the packet buffer length */
- srrctl |= ixgbe_rx_bufsz(rx_ring) >> IXGBE_SRRCTL_BSIZEPKT_SHIFT;
+ if (test_bit(__IXGBE_RX_3K_BUFFER, &rx_ring->state))
+ srrctl |= IXGBE_RXBUFFER_3K >> IXGBE_SRRCTL_BSIZEPKT_SHIFT;
+ else
+ srrctl |= IXGBE_RXBUFFER_2K >> IXGBE_SRRCTL_BSIZEPKT_SHIFT;
/* configure descriptor type */
srrctl |= IXGBE_SRRCTL_DESCTYPE_ADV_ONEBUF;
@@ -3754,6 +3764,17 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
*/
rxdctl &= ~0x3FFFFF;
rxdctl |= 0x080420;
+#if (PAGE_SIZE < 8192)
+ } else {
+ rxdctl &= ~(IXGBE_RXDCTL_RLPMLMASK |
+ IXGBE_RXDCTL_RLPML_EN);
+
+ /* Limit the maximum frame size so we don't overrun the skb */
+ if (ring_uses_build_skb(ring) &&
+ !test_bit(__IXGBE_RX_3K_BUFFER, &ring->state))
+ rxdctl |= IXGBE_MAX_FRAME_BUILD_SKB |
+ IXGBE_RXDCTL_RLPML_EN;
+#endif
}
/* enable receive descriptor ring */
@@ -3895,12 +3916,26 @@ static void ixgbe_set_rx_buffer_len(struct ixgbe_adapter *adapter)
clear_ring_rsc_enabled(rx_ring);
clear_bit(__IXGBE_RX_3K_BUFFER, &rx_ring->state);
+ clear_bit(__IXGBE_RX_BUILD_SKB_ENABLED, &rx_ring->state);
if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED)
set_ring_rsc_enabled(rx_ring);
if (test_bit(__IXGBE_RX_FCOE, &rx_ring->state))
set_bit(__IXGBE_RX_3K_BUFFER, &rx_ring->state);
+
+ if (adapter->flags2 & IXGBE_FLAG2_RX_LEGACY)
+ continue;
+
+ set_bit(__IXGBE_RX_BUILD_SKB_ENABLED, &rx_ring->state);
+
+#if (PAGE_SIZE < 8192)
+ if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED)
+ set_bit(__IXGBE_RX_3K_BUFFER, &rx_ring->state);
+
+ if (max_frame > (ETH_FRAME_LEN + ETH_FCS_LEN))
+ set_bit(__IXGBE_RX_3K_BUFFER, &rx_ring->state);
+#endif
}
}
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 19/20] ixgbe: Add private flag to control buffer mode
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
` (17 preceding siblings ...)
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 18/20] ixgbe: Add support for padding packet Alexander Duyck
@ 2016-12-19 20:12 ` Alexander Duyck
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 20/20] ixgbe: Add support for build_skb Alexander Duyck
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:12 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
Since there are potential drawbacks to the new Rx allocation approach I
thought it best to add a "chicken bit" so that we can turn the feature off
if in the event that a problem is found.
It also provides a means of validating the legacy Rx path in the event that
we are forced to fall back. At some point int he future when we are
convinced we don't need it anymore we might be able to drop the legacy-rx
flag.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 47 ++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index fd192bf29b26..5117c1845b35 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -151,6 +151,13 @@ struct ixgbe_stats {
};
#define IXGBE_TEST_LEN sizeof(ixgbe_gstrings_test) / ETH_GSTRING_LEN
+static const char ixgbe_priv_flags_strings[][ETH_GSTRING_LEN] = {
+#define IXGBE_PRIV_FLAGS_LEGACY_RX BIT(0)
+ "legacy-rx",
+};
+
+#define IXGBE_PRIV_FLAGS_STR_LEN ARRAY_SIZE(ixgbe_priv_flags_strings)
+
/* currently supported speeds for 10G */
#define ADVRTSD_MSK_10G (SUPPORTED_10000baseT_Full | \
SUPPORTED_10000baseKX4_Full | \
@@ -989,6 +996,8 @@ static void ixgbe_get_drvinfo(struct net_device *netdev,
strlcpy(drvinfo->bus_info, pci_name(adapter->pdev),
sizeof(drvinfo->bus_info));
+
+ drvinfo->n_priv_flags = IXGBE_PRIV_FLAGS_STR_LEN;
}
static void ixgbe_get_ringparam(struct net_device *netdev,
@@ -1128,6 +1137,8 @@ static int ixgbe_get_sset_count(struct net_device *netdev, int sset)
return IXGBE_TEST_LEN;
case ETH_SS_STATS:
return IXGBE_STATS_LEN;
+ case ETH_SS_PRIV_FLAGS:
+ return IXGBE_PRIV_FLAGS_STR_LEN;
default:
return -EOPNOTSUPP;
}
@@ -1292,6 +1303,9 @@ static void ixgbe_get_strings(struct net_device *netdev, u32 stringset,
}
/* BUG_ON(p - data != IXGBE_STATS_LEN * ETH_GSTRING_LEN); */
break;
+ case ETH_SS_PRIV_FLAGS:
+ memcpy(data, ixgbe_priv_flags_strings,
+ IXGBE_PRIV_FLAGS_STR_LEN * ETH_GSTRING_LEN);
}
}
@@ -3237,6 +3251,37 @@ static int ixgbe_get_module_eeprom(struct net_device *dev,
return 0;
}
+static u32 ixgbe_get_priv_flags(struct net_device *netdev)
+{
+ struct ixgbe_adapter *adapter = netdev_priv(netdev);
+ u32 priv_flags = 0;
+
+ if (adapter->flags2 & IXGBE_FLAG2_RX_LEGACY)
+ priv_flags |= IXGBE_PRIV_FLAGS_LEGACY_RX;
+
+ return priv_flags;
+}
+
+static int ixgbe_set_priv_flags(struct net_device *netdev, u32 priv_flags)
+{
+ struct ixgbe_adapter *adapter = netdev_priv(netdev);
+ unsigned int flags2 = adapter->flags2;
+
+ flags2 &= ~IXGBE_FLAG2_RX_LEGACY;
+ if (priv_flags & IXGBE_PRIV_FLAGS_LEGACY_RX)
+ flags2 |= IXGBE_FLAG2_RX_LEGACY;
+
+ if (flags2 != adapter->flags2) {
+ adapter->flags2 = flags2;
+
+ /* reset interface to repopulate queues */
+ if (netif_running(netdev))
+ ixgbe_reinit_locked(adapter);
+ }
+
+ return 0;
+}
+
static const struct ethtool_ops ixgbe_ethtool_ops = {
.get_settings = ixgbe_get_settings,
.set_settings = ixgbe_set_settings,
@@ -3271,6 +3316,8 @@ static int ixgbe_get_module_eeprom(struct net_device *dev,
.set_rxfh = ixgbe_set_rxfh,
.get_channels = ixgbe_get_channels,
.set_channels = ixgbe_set_channels,
+ .get_priv_flags = ixgbe_get_priv_flags,
+ .set_priv_flags = ixgbe_set_priv_flags,
.get_ts_info = ixgbe_get_ts_info,
.get_module_info = ixgbe_get_module_info,
.get_module_eeprom = ixgbe_get_module_eeprom,
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [Intel-wired-lan] [RFC PATCH 20/20] ixgbe: Add support for build_skb
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
` (18 preceding siblings ...)
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 19/20] ixgbe: Add private flag to control buffer mode Alexander Duyck
@ 2016-12-19 20:12 ` Alexander Duyck
19 siblings, 0 replies; 21+ messages in thread
From: Alexander Duyck @ 2016-12-19 20:12 UTC (permalink / raw)
To: intel-wired-lan
From: Alexander Duyck <alexander.h.duyck@intel.com>
This patch adds build_skb support to the Rx path. There are several
advantages to this change.
1. It avoids the memcpy and skb->head allocation for small packets which
improves performance by about 5% in my tests.
2. It avoids the memcpy, skb->head allocation, and eth_get_headlen
for larger packets improving performance by about 10% in my tests.
3. For VXLAN packets it allows the full header to be in skb->data which
improves the performance by as much as 30% in some of my tests.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 49 ++++++++++++++++++++++++-
1 file changed, 48 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d509b6842b9a..06a97b003639 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1897,7 +1897,7 @@ static bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
}
/* place header in linear portion of buffer */
- if (skb_is_nonlinear(skb))
+ if (!skb_headlen(skb))
ixgbe_pull_tail(rx_ring, skb);
#ifdef IXGBE_FCOE
@@ -2120,6 +2120,49 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
return skb;
}
+static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
+ struct ixgbe_rx_buffer *rx_buffer,
+ union ixgbe_adv_rx_desc *rx_desc,
+ unsigned int size)
+{
+ void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
+#if (PAGE_SIZE < 8192)
+ unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
+#else
+ unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +
+ SKB_DATA_ALIGN(IXGBE_SKB_PAD + size);
+#endif
+ struct sk_buff *skb;
+
+ /* prefetch first cache line of first page */
+ prefetch(va);
+#if L1_CACHE_BYTES < 128
+ prefetch(va + L1_CACHE_BYTES);
+#endif
+
+ /* build an skb to around the page buffer */
+ skb = build_skb(va - IXGBE_SKB_PAD, truesize);
+ if (unlikely(!skb))
+ return NULL;
+
+ /* update pointers within the skb to store the data */
+ skb_reserve(skb, IXGBE_SKB_PAD);
+ __skb_put(skb, size);
+
+ /* record DMA address if this is the start of a chain of buffers */
+ if (!ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_EOP))
+ IXGBE_CB(skb)->dma = rx_buffer->dma;
+
+ /* update buffer offset */
+#if (PAGE_SIZE < 8192)
+ rx_buffer->page_offset ^= truesize;
+#else
+ rx_buffer->page_offset += truesize;
+#endif
+
+ return skb;
+}
+
/**
* ixgbe_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
* @q_vector: structure containing interrupt and ring information
@@ -2173,6 +2216,9 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
/* retrieve a buffer from the ring */
if (skb)
ixgbe_add_rx_frag(rx_ring, rx_buffer, skb, size);
+ else if (ring_uses_build_skb(rx_ring))
+ skb = ixgbe_build_skb(rx_ring, rx_buffer,
+ rx_desc, size);
else
skb = ixgbe_construct_skb(rx_ring, rx_buffer,
rx_desc, size);
@@ -3924,6 +3970,7 @@ static void ixgbe_set_rx_buffer_len(struct ixgbe_adapter *adapter)
if (test_bit(__IXGBE_RX_FCOE, &rx_ring->state))
set_bit(__IXGBE_RX_3K_BUFFER, &rx_ring->state);
+ clear_bit(__IXGBE_RX_BUILD_SKB_ENABLED, &rx_ring->state);
if (adapter->flags2 & IXGBE_FLAG2_RX_LEGACY)
continue;
^ permalink raw reply related [flat|nested] 21+ messages in thread
end of thread, other threads:[~2016-12-19 20:12 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-12-19 20:10 [Intel-wired-lan] [RFC PATCH 00/20] Enable the use of build_skb in igb/ixgbe Alexander Duyck
2016-12-19 20:10 ` [Intel-wired-lan] [RFC PATCH 01/20] mm: Rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free Alexander Duyck
2016-12-19 20:10 ` [Intel-wired-lan] [RFC PATCH 02/20] mm: Rename __page_frag functions to __page_frag_cache, drop order from drain Alexander Duyck
2016-12-19 20:10 ` [Intel-wired-lan] [RFC PATCH 03/20] mm: Add documentation for page fragment APIs Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 04/20] igb: Add support for DMA_ATTR_WEAK_ORDERING Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 05/20] igb: Use length to determine if descriptor is done Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 06/20] igb: Limit maximum frame Rx based on MTU Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 07/20] igb: Add support for padding packet Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 08/20] igb: Add support for ethtool private flag to allow use of legacy Rx Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 09/20] igb: Break out Rx buffer page management Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 10/20] igb: Revert "igb: Revert support for build_skb in igb" Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 11/20] ixgbe: Add function for checking to see if we can reuse page Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 12/20] ixgbe: Only DMA sync frame length Alexander Duyck
2016-12-19 20:11 ` [Intel-wired-lan] [RFC PATCH 13/20] ixgbe: Update driver to make use of DMA attributes in Rx path Alexander Duyck
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 14/20] ixgbe: Update code to better handle incrementing page count Alexander Duyck
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 15/20] ixgbe: Make use of order 1 pages and 3K buffers independent of FCoE Alexander Duyck
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 16/20] ixgbe: Use length to determine if descriptor is done Alexander Duyck
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 17/20] ixgbe: Break out Rx buffer page management Alexander Duyck
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 18/20] ixgbe: Add support for padding packet Alexander Duyck
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 19/20] ixgbe: Add private flag to control buffer mode Alexander Duyck
2016-12-19 20:12 ` [Intel-wired-lan] [RFC PATCH 20/20] ixgbe: Add support for build_skb Alexander Duyck
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.