* [PATCH v3 iwl-net 0/3] ice: fix Rx data path for heavy 9k MTU traffic
@ 2025-01-20 15:50 Maciej Fijalkowski
2025-01-20 15:50 ` [PATCH v3 iwl-net 1/3] ice: put Rx buffers after being done with current frame Maciej Fijalkowski
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Maciej Fijalkowski @ 2025-01-20 15:50 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, anthony.l.nguyen, magnus.karlsson, jacob.e.keller, xudu,
mschmidt, jmaxwell, poros, przemyslaw.kitszel, Maciej Fijalkowski
v2->v3:
* s/intel/iwl in patch subjects
v1->v2:
* pass ntc to ice_put_rx_mbuf() (pointed out by Petr Oros) in patch 1
* add review tags from Przemek Kitszel (thanks!)
* make sure patches compile and work ;)
Hello in 2025,
this patchset fixes a pretty nasty issue that was reported by Red Hat
folks which occurred after ~30 minutes (this value varied; the point is
that it was not observed immediately but rather after a considerably
longer amount of time) when the ice driver was tortured with jumbo
frames via a mix of iperf traffic executed simultaneously with wrk/nginx
on client/server sides (HTTP and TCP workloads, basically).
The reported splats spanned all the bad things that can happen to the
state of a page: refcount underflow, use-after-free, etc. One of these
looked as follows:
[ 2084.019891] BUG: Bad page state in process swapper/34 pfn:97fcd0
[ 2084.025990] page:00000000a60ee772 refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x97fcd0
[ 2084.035462] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[ 2084.041990] raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000
[ 2084.049730] raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000
[ 2084.057468] page dumped because: nonzero _refcount
[ 2084.062260] Modules linked in: bonding tls sunrpc intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm mgag200 irqd
[ 2084.137829] CPU: 34 PID: 0 Comm: swapper/34 Kdump: loaded Not tainted 5.14.0-427.37.1.el9_4.x86_64 #1
[ 2084.147039] Hardware name: Dell Inc. PowerEdge R750/0216NK, BIOS 1.13.2 12/19/2023
[ 2084.154604] Call Trace:
[ 2084.157058] <IRQ>
[ 2084.159080] dump_stack_lvl+0x34/0x48
[ 2084.162752] bad_page.cold+0x63/0x94
[ 2084.166333] check_new_pages+0xb3/0xe0
[ 2084.170083] rmqueue_bulk+0x2d2/0x9e0
[ 2084.173749] ? ktime_get+0x35/0xa0
[ 2084.177159] rmqueue_pcplist+0x13b/0x210
[ 2084.181081] rmqueue+0x7d3/0xd40
[ 2084.184316] ? xas_load+0x9/0xa0
[ 2084.187547] ? xas_find+0x183/0x1d0
[ 2084.191041] ? xa_find_after+0xd0/0x130
[ 2084.194879] ? intel_iommu_iotlb_sync_map+0x89/0xe0
[ 2084.199759] get_page_from_freelist+0x11f/0x530
[ 2084.204291] __alloc_pages+0xf2/0x250
[ 2084.207958] ice_alloc_rx_bufs+0xcc/0x1c0 [ice]
[ 2084.212543] ice_clean_rx_irq+0x631/0xa20 [ice]
[ 2084.217111] ice_napi_poll+0xdf/0x2a0 [ice]
[ 2084.221330] __napi_poll+0x27/0x170
[ 2084.224824] net_rx_action+0x233/0x2f0
[ 2084.228575] __do_softirq+0xc7/0x2ac
[ 2084.232155] __irq_exit_rcu+0xa1/0xc0
[ 2084.235821] common_interrupt+0x80/0xa0
[ 2084.239662] </IRQ>
[ 2084.241768] <TASK>
The fix is mostly about reverting what was done in commit 1dc1a7e7f410
("ice: Centrallize Rx buffer recycling"), followed by correcting the
timing of page_count() storage and then removing the ice_rx_buf::act
related logic (which was introduced mostly for the purposes of the cited
commit).
Special thanks to Xu Du for providing a reproducer and to Jacob Keller
for the initial extensive analysis.
Thanks,
Maciej
Maciej Fijalkowski (3):
ice: put Rx buffers after being done with current frame
ice: gather page_count()'s of each frag right before XDP prog call
ice: stop storing XDP verdict within ice_rx_buf
drivers/net/ethernet/intel/ice/ice_txrx.c | 128 +++++++++++-------
drivers/net/ethernet/intel/ice/ice_txrx.h | 1 -
drivers/net/ethernet/intel/ice/ice_txrx_lib.h | 43 ------
3 files changed, 82 insertions(+), 90 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v3 iwl-net 1/3] ice: put Rx buffers after being done with current frame
2025-01-20 15:50 [PATCH v3 iwl-net 0/3] ice: fix Rx data path for heavy 9k MTU traffic Maciej Fijalkowski
@ 2025-01-20 15:50 ` Maciej Fijalkowski
2025-01-20 16:38 ` Simon Horman
2025-01-20 15:50 ` [PATCH v3 iwl-net 2/3] ice: gather page_count()'s of each frag right before XDP prog call Maciej Fijalkowski
2025-01-20 15:50 ` [PATCH v3 iwl-net 3/3] ice: stop storing XDP verdict within ice_rx_buf Maciej Fijalkowski
2 siblings, 1 reply; 8+ messages in thread
From: Maciej Fijalkowski @ 2025-01-20 15:50 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, anthony.l.nguyen, magnus.karlsson, jacob.e.keller, xudu,
mschmidt, jmaxwell, poros, przemyslaw.kitszel, Maciej Fijalkowski
Introduce a new helper ice_put_rx_mbuf() that will go through gathered
frags from current frame and will call ice_put_rx_buf() on them. Current
logic that was supposed to simplify and optimize the driver where we go
through a batch of all buffers processed in current NAPI instance turned
out to be broken for jumbo frames and very heavy load that was coming
from both multi-thread iperf and nginx/wrk pair between server and
client. The delay introduced by approach that we are dropping is simply
too big and we need to take the decision regarding page
recycling/releasing as quick as we can.
While at it, address an error path of ice_add_xdp_frag() - we were
missing buffer putting from day 1 there.
As a nice side effect we get rid of annoying and repetetive three-liner:
xdp->data = NULL;
rx_ring->first_desc = ntc;
rx_ring->nr_frags = 0;
by embedding it within introduced routine.
Fixes: 1dc1a7e7f410 ("ice: Centrallize Rx buffer recycling")
Reported-and-tested-by: Xu Du <xudu@redhat.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Co-developed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
drivers/net/ethernet/intel/ice/ice_txrx.c | 68 +++++++++++++----------
1 file changed, 39 insertions(+), 29 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 5d2d7736fd5f..f2134ad57ead 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1103,6 +1103,38 @@ ice_put_rx_buf(struct ice_rx_ring *rx_ring, struct ice_rx_buf *rx_buf)
rx_buf->page = NULL;
}
+static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
+ u32 *xdp_xmit, u32 ntc)
+{
+ u32 nr_frags = rx_ring->nr_frags + 1;
+ u32 idx = rx_ring->first_desc;
+ u32 cnt = rx_ring->count;
+ struct ice_rx_buf *buf;
+ int i;
+
+ for (i = 0; i < nr_frags; i++) {
+ buf = &rx_ring->rx_buf[idx];
+
+ if (buf->act & (ICE_XDP_TX | ICE_XDP_REDIR)) {
+ ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
+ *xdp_xmit |= buf->act;
+ } else if (buf->act & ICE_XDP_CONSUMED) {
+ buf->pagecnt_bias++;
+ } else if (buf->act == ICE_XDP_PASS) {
+ ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
+ }
+
+ ice_put_rx_buf(rx_ring, buf);
+
+ if (++idx == cnt)
+ idx = 0;
+ }
+
+ xdp->data = NULL;
+ rx_ring->first_desc = ntc;
+ rx_ring->nr_frags = 0;
+}
+
/**
* ice_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
* @rx_ring: Rx descriptor ring to transact packets on
@@ -1120,7 +1152,6 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
unsigned int total_rx_bytes = 0, total_rx_pkts = 0;
unsigned int offset = rx_ring->rx_offset;
struct xdp_buff *xdp = &rx_ring->xdp;
- u32 cached_ntc = rx_ring->first_desc;
struct ice_tx_ring *xdp_ring = NULL;
struct bpf_prog *xdp_prog = NULL;
u32 ntc = rx_ring->next_to_clean;
@@ -1128,7 +1159,6 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
u32 xdp_xmit = 0;
u32 cached_ntu;
bool failure;
- u32 first;
xdp_prog = READ_ONCE(rx_ring->xdp_prog);
if (xdp_prog) {
@@ -1190,6 +1220,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
xdp_prepare_buff(xdp, hard_start, offset, size, !!offset);
xdp_buff_clear_frags_flag(xdp);
} else if (ice_add_xdp_frag(rx_ring, xdp, rx_buf, size)) {
+ ice_put_rx_mbuf(rx_ring, xdp, NULL, ntc);
break;
}
if (++ntc == cnt)
@@ -1205,9 +1236,8 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
total_rx_bytes += xdp_get_buff_len(xdp);
total_rx_pkts++;
- xdp->data = NULL;
- rx_ring->first_desc = ntc;
- rx_ring->nr_frags = 0;
+ ice_put_rx_mbuf(rx_ring, xdp, &xdp_xmit, ntc);
+
continue;
construct_skb:
if (likely(ice_ring_uses_build_skb(rx_ring)))
@@ -1221,14 +1251,11 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
if (unlikely(xdp_buff_has_frags(xdp)))
ice_set_rx_bufs_act(xdp, rx_ring,
ICE_XDP_CONSUMED);
- xdp->data = NULL;
- rx_ring->first_desc = ntc;
- rx_ring->nr_frags = 0;
- break;
}
- xdp->data = NULL;
- rx_ring->first_desc = ntc;
- rx_ring->nr_frags = 0;
+ ice_put_rx_mbuf(rx_ring, xdp, &xdp_xmit, ntc);
+
+ if (!skb)
+ break;
stat_err_bits = BIT(ICE_RX_FLEX_DESC_STATUS0_RXE_S);
if (unlikely(ice_test_staterr(rx_desc->wb.status_error0,
@@ -1257,23 +1284,6 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
total_rx_pkts++;
}
- first = rx_ring->first_desc;
- while (cached_ntc != first) {
- struct ice_rx_buf *buf = &rx_ring->rx_buf[cached_ntc];
-
- if (buf->act & (ICE_XDP_TX | ICE_XDP_REDIR)) {
- ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
- xdp_xmit |= buf->act;
- } else if (buf->act & ICE_XDP_CONSUMED) {
- buf->pagecnt_bias++;
- } else if (buf->act == ICE_XDP_PASS) {
- ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
- }
-
- ice_put_rx_buf(rx_ring, buf);
- if (++cached_ntc >= cnt)
- cached_ntc = 0;
- }
rx_ring->next_to_clean = ntc;
/* return up to cleaned_count buffers to hardware */
failure = ice_alloc_rx_bufs(rx_ring, ICE_RX_DESC_UNUSED(rx_ring));
--
2.43.0
* [PATCH v3 iwl-net 2/3] ice: gather page_count()'s of each frag right before XDP prog call
2025-01-20 15:50 [PATCH v3 iwl-net 0/3] ice: fix Rx data path for heavy 9k MTU traffic Maciej Fijalkowski
2025-01-20 15:50 ` [PATCH v3 iwl-net 1/3] ice: put Rx buffers after being done with current frame Maciej Fijalkowski
@ 2025-01-20 15:50 ` Maciej Fijalkowski
2025-01-20 15:50 ` [PATCH v3 iwl-net 3/3] ice: stop storing XDP verdict within ice_rx_buf Maciej Fijalkowski
2 siblings, 0 replies; 8+ messages in thread
From: Maciej Fijalkowski @ 2025-01-20 15:50 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, anthony.l.nguyen, magnus.karlsson, jacob.e.keller, xudu,
mschmidt, jmaxwell, poros, przemyslaw.kitszel, Maciej Fijalkowski
If we store the pgcnt on a few fragments while in the middle of
gathering the whole frame and we stumble upon the DD bit not being set,
we terminate the NAPI Rx processing loop and come back later on. Then,
on the next NAPI execution, we work on the previously stored pgcnt.
Imagine that the second half of the page was actively used by the
networking stack and, by the time we came back, the stack was done with
this page and had decremented the refcnt. The page reuse algorithm in
this case should be free to reuse the page, but given the old refcnt it
will not do so and will instead attempt to release the page via
page_frag_cache_drain() with pagecnt_bias used as an arg. This in turn
results in a negative refcnt on the struct page, which is what Xu Du
initially observed.
Therefore, move the page count storage from ice_get_rx_buf() to a place
where we are sure that the whole frame has been collected, but before
calling the XDP program, as it can internally also change the page count
of fragments belonging to the xdp_buff.
Fixes: ac0753391195 ("ice: Store page count inside ice_rx_buf")
Reported-and-tested-by: Xu Du <xudu@redhat.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Co-developed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
drivers/net/ethernet/intel/ice/ice_txrx.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index f2134ad57ead..9aa53ad2d8f2 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -924,7 +924,6 @@ ice_get_rx_buf(struct ice_rx_ring *rx_ring, const unsigned int size,
struct ice_rx_buf *rx_buf;
rx_buf = &rx_ring->rx_buf[ntc];
- rx_buf->pgcnt = page_count(rx_buf->page);
prefetchw(rx_buf->page);
if (!size)
@@ -940,6 +939,22 @@ ice_get_rx_buf(struct ice_rx_ring *rx_ring, const unsigned int size,
return rx_buf;
}
+static void ice_get_pgcnts(struct ice_rx_ring *rx_ring)
+{
+ u32 nr_frags = rx_ring->nr_frags + 1;
+ u32 idx = rx_ring->first_desc;
+ struct ice_rx_buf *rx_buf;
+ u32 cnt = rx_ring->count;
+
+ for (int i = 0; i < nr_frags; i++) {
+ rx_buf = &rx_ring->rx_buf[idx];
+ rx_buf->pgcnt = page_count(rx_buf->page);
+
+ if (++idx == cnt)
+ idx = 0;
+ }
+}
+
/**
* ice_build_skb - Build skb around an existing buffer
* @rx_ring: Rx descriptor ring to transact packets on
@@ -1230,6 +1245,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
if (ice_is_non_eop(rx_ring, rx_desc))
continue;
+ ice_get_pgcnts(rx_ring);
ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring, rx_buf, rx_desc);
if (rx_buf->act == ICE_XDP_PASS)
goto construct_skb;
--
2.43.0
* [PATCH v3 iwl-net 3/3] ice: stop storing XDP verdict within ice_rx_buf
2025-01-20 15:50 [PATCH v3 iwl-net 0/3] ice: fix Rx data path for heavy 9k MTU traffic Maciej Fijalkowski
2025-01-20 15:50 ` [PATCH v3 iwl-net 1/3] ice: put Rx buffers after being done with current frame Maciej Fijalkowski
2025-01-20 15:50 ` [PATCH v3 iwl-net 2/3] ice: gather page_count()'s of each frag right before XDP prog call Maciej Fijalkowski
@ 2025-01-20 15:50 ` Maciej Fijalkowski
2025-01-20 16:37 ` Simon Horman
2025-01-20 21:23 ` [Intel-wired-lan] " kernel test robot
2 siblings, 2 replies; 8+ messages in thread
From: Maciej Fijalkowski @ 2025-01-20 15:50 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, anthony.l.nguyen, magnus.karlsson, jacob.e.keller, xudu,
mschmidt, jmaxwell, poros, przemyslaw.kitszel, Maciej Fijalkowski
Idea behind having ice_rx_buf::act was to simplify and speed up the Rx
data path by walking through buffers that were representing cleaned HW
Rx descriptors. Since it caused us a major headache recently and we
rolled back to old approach that 'puts' Rx buffers right after running
XDP prog/creating skb, this is useless now and should be removed.
Get rid of ice_rx_buf::act and related logic. We still need to take care
of a corner case where XDP program releases a particular fragment.
Make ice_run_xdp() to return its result and use it within
ice_put_rx_mbuf().
Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
drivers/net/ethernet/intel/ice/ice_txrx.c | 60 +++++++++++--------
drivers/net/ethernet/intel/ice/ice_txrx.h | 1 -
drivers/net/ethernet/intel/ice/ice_txrx_lib.h | 43 -------------
3 files changed, 35 insertions(+), 69 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 9aa53ad2d8f2..77d75664c14d 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -532,10 +532,10 @@ int ice_setup_rx_ring(struct ice_rx_ring *rx_ring)
*
* Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR}
*/
-static void
+static u32
ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring,
- struct ice_rx_buf *rx_buf, union ice_32b_rx_flex_desc *eop_desc)
+ union ice_32b_rx_flex_desc *eop_desc)
{
unsigned int ret = ICE_XDP_PASS;
u32 act;
@@ -574,7 +574,7 @@ ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
ret = ICE_XDP_CONSUMED;
}
exit:
- ice_set_rx_bufs_act(xdp, rx_ring, ret);
+ return ret;
}
/**
@@ -860,10 +860,8 @@ ice_add_xdp_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
xdp_buff_set_frags_flag(xdp);
}
- if (unlikely(sinfo->nr_frags == MAX_SKB_FRAGS)) {
- ice_set_rx_bufs_act(xdp, rx_ring, ICE_XDP_CONSUMED);
+ if (unlikely(sinfo->nr_frags == MAX_SKB_FRAGS))
return -ENOMEM;
- }
__skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, rx_buf->page,
rx_buf->page_offset, size);
@@ -1066,12 +1064,12 @@ ice_construct_skb(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp)
rx_buf->page_offset + headlen, size,
xdp->frame_sz);
} else {
- /* buffer is unused, change the act that should be taken later
- * on; data was copied onto skb's linear part so there's no
+ /* buffer is unused, restore biased page count in Rx buffer;
+ * data was copied onto skb's linear part so there's no
* need for adjusting page offset and we can reuse this buffer
* as-is
*/
- rx_buf->act = ICE_SKB_CONSUMED;
+ rx_buf->pagecnt_bias++;
}
if (unlikely(xdp_buff_has_frags(xdp))) {
@@ -1119,23 +1117,27 @@ ice_put_rx_buf(struct ice_rx_ring *rx_ring, struct ice_rx_buf *rx_buf)
}
static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
- u32 *xdp_xmit, u32 ntc)
+ u32 *xdp_xmit, u32 ntc, u32 verdict)
{
u32 nr_frags = rx_ring->nr_frags + 1;
u32 idx = rx_ring->first_desc;
u32 cnt = rx_ring->count;
+ u32 post_xdp_frags = 1;
struct ice_rx_buf *buf;
int i;
- for (i = 0; i < nr_frags; i++) {
+ if (unlikely(xdp_buff_has_frags(xdp)))
+ post_xdp_frags += xdp_get_shared_info_from_buff(xdp)->nr_frags;
+
+ for (i = 0; i < post_xdp_frags; i++) {
buf = &rx_ring->rx_buf[idx];
- if (buf->act & (ICE_XDP_TX | ICE_XDP_REDIR)) {
+ if (verdict & (ICE_XDP_TX | ICE_XDP_REDIR)) {
ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
- *xdp_xmit |= buf->act;
- } else if (buf->act & ICE_XDP_CONSUMED) {
+ *xdp_xmit |= verdict;
+ } else if (verdict & ICE_XDP_CONSUMED) {
buf->pagecnt_bias++;
- } else if (buf->act == ICE_XDP_PASS) {
+ } else if (verdict == ICE_XDP_PASS) {
ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
}
@@ -1144,6 +1146,17 @@ static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
if (++idx == cnt)
idx = 0;
}
+ /* handle buffers that represented frags released by XDP prog;
+ * for these we keep pagecnt_bias as-is; refcount from struct page
+ * has been decremented within XDP prog and we do not have to increase
+ * the biased refcnt
+ */
+ for (; i < nr_frags; i++) {
+ buf = &rx_ring->rx_buf[idx];
+ ice_put_rx_buf(rx_ring, buf);
+ if (++idx == cnt)
+ idx = 0;
+ }
xdp->data = NULL;
rx_ring->first_desc = ntc;
@@ -1170,9 +1183,9 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
struct ice_tx_ring *xdp_ring = NULL;
struct bpf_prog *xdp_prog = NULL;
u32 ntc = rx_ring->next_to_clean;
+ u32 cached_ntu, xdp_verdict;
u32 cnt = rx_ring->count;
u32 xdp_xmit = 0;
- u32 cached_ntu;
bool failure;
xdp_prog = READ_ONCE(rx_ring->xdp_prog);
@@ -1235,7 +1248,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
xdp_prepare_buff(xdp, hard_start, offset, size, !!offset);
xdp_buff_clear_frags_flag(xdp);
} else if (ice_add_xdp_frag(rx_ring, xdp, rx_buf, size)) {
- ice_put_rx_mbuf(rx_ring, xdp, NULL, ntc);
+ ice_put_rx_mbuf(rx_ring, xdp, NULL, ntc, ICE_XDP_CONSUMED);
break;
}
if (++ntc == cnt)
@@ -1246,13 +1259,13 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
continue;
ice_get_pgcnts(rx_ring);
- ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring, rx_buf, rx_desc);
- if (rx_buf->act == ICE_XDP_PASS)
+ xdp_verdict = ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring, rx_desc);
+ if (xdp_verdict == ICE_XDP_PASS)
goto construct_skb;
total_rx_bytes += xdp_get_buff_len(xdp);
total_rx_pkts++;
- ice_put_rx_mbuf(rx_ring, xdp, &xdp_xmit, ntc);
+ ice_put_rx_mbuf(rx_ring, xdp, &xdp_xmit, ntc, xdp_verdict);
continue;
construct_skb:
@@ -1263,12 +1276,9 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
/* exit if we failed to retrieve a buffer */
if (!skb) {
rx_ring->ring_stats->rx_stats.alloc_page_failed++;
- rx_buf->act = ICE_XDP_CONSUMED;
- if (unlikely(xdp_buff_has_frags(xdp)))
- ice_set_rx_bufs_act(xdp, rx_ring,
- ICE_XDP_CONSUMED);
+ xdp_verdict = ICE_XDP_CONSUMED;
}
- ice_put_rx_mbuf(rx_ring, xdp, &xdp_xmit, ntc);
+ ice_put_rx_mbuf(rx_ring, xdp, &xdp_xmit, ntc, xdp_verdict);
if (!skb)
break;
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index cb347c852ba9..806bce701df3 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -201,7 +201,6 @@ struct ice_rx_buf {
struct page *page;
unsigned int page_offset;
unsigned int pgcnt;
- unsigned int act;
unsigned int pagecnt_bias;
};
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
index 79f960c6680d..6cf32b404127 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
@@ -5,49 +5,6 @@
#define _ICE_TXRX_LIB_H_
#include "ice.h"
-/**
- * ice_set_rx_bufs_act - propagate Rx buffer action to frags
- * @xdp: XDP buffer representing frame (linear and frags part)
- * @rx_ring: Rx ring struct
- * act: action to store onto Rx buffers related to XDP buffer parts
- *
- * Set action that should be taken before putting Rx buffer from first frag
- * to the last.
- */
-static inline void
-ice_set_rx_bufs_act(struct xdp_buff *xdp, const struct ice_rx_ring *rx_ring,
- const unsigned int act)
-{
- u32 sinfo_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
- u32 nr_frags = rx_ring->nr_frags + 1;
- u32 idx = rx_ring->first_desc;
- u32 cnt = rx_ring->count;
- struct ice_rx_buf *buf;
-
- for (int i = 0; i < nr_frags; i++) {
- buf = &rx_ring->rx_buf[idx];
- buf->act = act;
-
- if (++idx == cnt)
- idx = 0;
- }
-
- /* adjust pagecnt_bias on frags freed by XDP prog */
- if (sinfo_frags < rx_ring->nr_frags && act == ICE_XDP_CONSUMED) {
- u32 delta = rx_ring->nr_frags - sinfo_frags;
-
- while (delta) {
- if (idx == 0)
- idx = cnt - 1;
- else
- idx--;
- buf = &rx_ring->rx_buf[idx];
- buf->pagecnt_bias--;
- delta--;
- }
- }
-}
-
/**
* ice_test_staterr - tests bits in Rx descriptor status and error fields
* @status_err_n: Rx descriptor status_error0 or status_error1 bits
--
2.43.0
* Re: [PATCH v3 iwl-net 3/3] ice: stop storing XDP verdict within ice_rx_buf
2025-01-20 15:50 ` [PATCH v3 iwl-net 3/3] ice: stop storing XDP verdict within ice_rx_buf Maciej Fijalkowski
@ 2025-01-20 16:37 ` Simon Horman
2025-01-22 12:50 ` Maciej Fijalkowski
2025-01-20 21:23 ` [Intel-wired-lan] " kernel test robot
1 sibling, 1 reply; 8+ messages in thread
From: Simon Horman @ 2025-01-20 16:37 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: intel-wired-lan, netdev, anthony.l.nguyen, magnus.karlsson,
jacob.e.keller, xudu, mschmidt, jmaxwell, poros,
przemyslaw.kitszel
On Mon, Jan 20, 2025 at 04:50:16PM +0100, Maciej Fijalkowski wrote:
> Idea behind having ice_rx_buf::act was to simplify and speed up the Rx
> data path by walking through buffers that were representing cleaned HW
> Rx descriptors. Since it caused us a major headache recently and we
> rolled back to old approach that 'puts' Rx buffers right after running
> XDP prog/creating skb, this is useless now and should be removed.
>
> Get rid of ice_rx_buf::act and related logic. We still need to take care
> of a corner case where XDP program releases a particular fragment.
>
> Make ice_run_xdp() to return its result and use it within
> ice_put_rx_mbuf().
>
> Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side")
> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
> drivers/net/ethernet/intel/ice/ice_txrx.c | 60 +++++++++++--------
> drivers/net/ethernet/intel/ice/ice_txrx.h | 1 -
> drivers/net/ethernet/intel/ice/ice_txrx_lib.h | 43 -------------
> 3 files changed, 35 insertions(+), 69 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
> index 9aa53ad2d8f2..77d75664c14d 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx.c
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
> @@ -532,10 +532,10 @@ int ice_setup_rx_ring(struct ice_rx_ring *rx_ring)
> *
> * Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR}
> */
> -static void
> +static u32
> ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
> struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring,
> - struct ice_rx_buf *rx_buf, union ice_32b_rx_flex_desc *eop_desc)
> + union ice_32b_rx_flex_desc *eop_desc)
> {
> unsigned int ret = ICE_XDP_PASS;
> u32 act;
nit: The Kernel doc for ice_run_xdp should also be updated to no
longer document the rx_buf parameter.
...
* Re: [PATCH v3 iwl-net 1/3] ice: put Rx buffers after being done with current frame
2025-01-20 15:50 ` [PATCH v3 iwl-net 1/3] ice: put Rx buffers after being done with current frame Maciej Fijalkowski
@ 2025-01-20 16:38 ` Simon Horman
0 siblings, 0 replies; 8+ messages in thread
From: Simon Horman @ 2025-01-20 16:38 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: intel-wired-lan, netdev, anthony.l.nguyen, magnus.karlsson,
jacob.e.keller, xudu, mschmidt, jmaxwell, poros,
przemyslaw.kitszel
On Mon, Jan 20, 2025 at 04:50:14PM +0100, Maciej Fijalkowski wrote:
> Introduce a new helper ice_put_rx_mbuf() that will go through gathered
> frags from current frame and will call ice_put_rx_buf() on them. Current
> logic that was supposed to simplify and optimize the driver where we go
> through a batch of all buffers processed in current NAPI instance turned
> out to be broken for jumbo frames and very heavy load that was coming
> from both multi-thread iperf and nginx/wrk pair between server and
> client. The delay introduced by approach that we are dropping is simply
> too big and we need to take the decision regarding page
> recycling/releasing as quick as we can.
>
> While at it, address an error path of ice_add_xdp_frag() - we were
> missing buffer putting from day 1 there.
>
> As a nice side effect we get rid of annoying and repetetive three-liner:
nit: repetitive
...
* Re: [Intel-wired-lan] [PATCH v3 iwl-net 3/3] ice: stop storing XDP verdict within ice_rx_buf
2025-01-20 15:50 ` [PATCH v3 iwl-net 3/3] ice: stop storing XDP verdict within ice_rx_buf Maciej Fijalkowski
2025-01-20 16:37 ` Simon Horman
@ 2025-01-20 21:23 ` kernel test robot
1 sibling, 0 replies; 8+ messages in thread
From: kernel test robot @ 2025-01-20 21:23 UTC (permalink / raw)
To: Maciej Fijalkowski, intel-wired-lan
Cc: oe-kbuild-all, Maciej Fijalkowski, netdev, xudu, anthony.l.nguyen,
przemyslaw.kitszel, jacob.e.keller, jmaxwell, magnus.karlsson
Hi Maciej,
kernel test robot noticed the following build warnings:
[auto build test WARNING on tnguy-net-queue/dev-queue]
url: https://github.com/intel-lab-lkp/linux/commits/Maciej-Fijalkowski/ice-put-Rx-buffers-after-being-done-with-current-frame/20250120-235320
base: https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue.git dev-queue
patch link: https://lore.kernel.org/r/20250120155016.556735-4-maciej.fijalkowski%40intel.com
patch subject: [Intel-wired-lan] [PATCH v3 iwl-net 3/3] ice: stop storing XDP verdict within ice_rx_buf
config: arc-randconfig-001-20250121 (https://download.01.org/0day-ci/archive/20250121/202501210750.KInYtrPt-lkp@intel.com/config)
compiler: arceb-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250121/202501210750.KInYtrPt-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501210750.KInYtrPt-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/net/ethernet/intel/ice/ice_txrx.c:539: warning: Excess function parameter 'rx_buf' description in 'ice_run_xdp'
vim +539 drivers/net/ethernet/intel/ice/ice_txrx.c
cdedef59deb020 Anirudh Venkataramanan 2018-03-20 523
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 524 /**
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 525 * ice_run_xdp - Executes an XDP program on initialized xdp_buff
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 526 * @rx_ring: Rx ring
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 527 * @xdp: xdp_buff used as input to the XDP program
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 528 * @xdp_prog: XDP program to run
eb087cd828648d Maciej Fijalkowski 2021-08-19 529 * @xdp_ring: ring to be used for XDP_TX action
1dc1a7e7f4108b Maciej Fijalkowski 2023-01-31 530 * @rx_buf: Rx buffer to store the XDP action
d951c14ad237b0 Larysa Zaremba 2023-12-05 531 * @eop_desc: Last descriptor in packet to read metadata from
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 532 *
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 533 * Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR}
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 534 */
55a1a17189d7a5 Maciej Fijalkowski 2025-01-20 535 static u32
e72bba21355dbb Maciej Fijalkowski 2021-08-19 536 ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
1dc1a7e7f4108b Maciej Fijalkowski 2023-01-31 537 struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring,
55a1a17189d7a5 Maciej Fijalkowski 2025-01-20 538 union ice_32b_rx_flex_desc *eop_desc)
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 @539 {
1dc1a7e7f4108b Maciej Fijalkowski 2023-01-31 540 unsigned int ret = ICE_XDP_PASS;
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 541 u32 act;
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 542
1dc1a7e7f4108b Maciej Fijalkowski 2023-01-31 543 if (!xdp_prog)
1dc1a7e7f4108b Maciej Fijalkowski 2023-01-31 544 goto exit;
1dc1a7e7f4108b Maciej Fijalkowski 2023-01-31 545
d951c14ad237b0 Larysa Zaremba 2023-12-05 546 ice_xdp_meta_set_desc(xdp, eop_desc);
d951c14ad237b0 Larysa Zaremba 2023-12-05 547
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 548 act = bpf_prog_run_xdp(xdp_prog, xdp);
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 549 switch (act) {
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 550 case XDP_PASS:
1dc1a7e7f4108b Maciej Fijalkowski 2023-01-31 551 break;
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 552 case XDP_TX:
22bf877e528f68 Maciej Fijalkowski 2021-08-19 553 if (static_branch_unlikely(&ice_xdp_locking_key))
22bf877e528f68 Maciej Fijalkowski 2021-08-19 554 spin_lock(&xdp_ring->tx_lock);
055d0920685e53 Alexander Lobakin 2023-02-10 555 ret = __ice_xmit_xdp_ring(xdp, xdp_ring, false);
22bf877e528f68 Maciej Fijalkowski 2021-08-19 556 if (static_branch_unlikely(&ice_xdp_locking_key))
22bf877e528f68 Maciej Fijalkowski 2021-08-19 557 spin_unlock(&xdp_ring->tx_lock);
1dc1a7e7f4108b Maciej Fijalkowski 2023-01-31 558 if (ret == ICE_XDP_CONSUMED)
89d65df024c599 Magnus Karlsson 2021-05-10 559 goto out_failure;
1dc1a7e7f4108b Maciej Fijalkowski 2023-01-31 560 break;
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 561 case XDP_REDIRECT:
1dc1a7e7f4108b Maciej Fijalkowski 2023-01-31 562 if (xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog))
89d65df024c599 Magnus Karlsson 2021-05-10 563 goto out_failure;
1dc1a7e7f4108b Maciej Fijalkowski 2023-01-31 564 ret = ICE_XDP_REDIR;
1dc1a7e7f4108b Maciej Fijalkowski 2023-01-31 565 break;
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 566 default:
c8064e5b4adac5 Paolo Abeni 2021-11-30 567 bpf_warn_invalid_xdp_action(rx_ring->netdev, xdp_prog, act);
4e83fc934e3a04 Bruce Allan 2020-01-22 568 fallthrough;
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 569 case XDP_ABORTED:
89d65df024c599 Magnus Karlsson 2021-05-10 570 out_failure:
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 571 trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
4e83fc934e3a04 Bruce Allan 2020-01-22 572 fallthrough;
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 573 case XDP_DROP:
1dc1a7e7f4108b Maciej Fijalkowski 2023-01-31 574 ret = ICE_XDP_CONSUMED;
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 575 }
1dc1a7e7f4108b Maciej Fijalkowski 2023-01-31 576 exit:
55a1a17189d7a5 Maciej Fijalkowski 2025-01-20 577 return ret;
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 578 }
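The switch above collapses the XDP program's action into one of the ICE_XDP_* verdicts that the caller then acts on. A minimal sketch of that mapping is below; the names and values are illustrative stand-ins, not the driver's actual definitions (the real ICE_XDP_{PASS, CONSUMED, TX, REDIR} codes live in ice_txrx_lib.h):

```c
#include <assert.h>

/* Illustrative stand-ins for the driver's Rx verdict codes. */
enum demo_verdict {
	DEMO_XDP_PASS,
	DEMO_XDP_CONSUMED,
	DEMO_XDP_TX,
	DEMO_XDP_REDIR,
};

/* Illustrative stand-ins for the XDP program actions. */
enum demo_action {
	DEMO_ACT_PASS,
	DEMO_ACT_TX,
	DEMO_ACT_REDIRECT,
	DEMO_ACT_ABORTED,
	DEMO_ACT_DROP,
};

/*
 * Mirror the control flow of the switch in ice_run_xdp(): TX and
 * REDIRECT turn into a drop (CONSUMED) when the transmit/redirect
 * step fails, and aborted or unknown actions fall through to a
 * drop as well.
 */
static enum demo_verdict demo_map_verdict(enum demo_action act, int xmit_ok)
{
	switch (act) {
	case DEMO_ACT_PASS:
		return DEMO_XDP_PASS;
	case DEMO_ACT_TX:
		return xmit_ok ? DEMO_XDP_TX : DEMO_XDP_CONSUMED;
	case DEMO_ACT_REDIRECT:
		return xmit_ok ? DEMO_XDP_REDIR : DEMO_XDP_CONSUMED;
	default: /* DEMO_ACT_ABORTED, DEMO_ACT_DROP, anything unexpected */
		return DEMO_XDP_CONSUMED;
	}
}
```

Returning a single verdict like this is what lets the caller decide how to release (or keep) the Rx buffers after the fact, which is the point of patch 3 below.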
efc2214b6047b6 Maciej Fijalkowski 2019-11-04 579
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v3 iwl-net 3/3] ice: stop storing XDP verdict within ice_rx_buf
2025-01-20 16:37 ` Simon Horman
@ 2025-01-22 12:50 ` Maciej Fijalkowski
0 siblings, 0 replies; 8+ messages in thread
From: Maciej Fijalkowski @ 2025-01-22 12:50 UTC (permalink / raw)
To: Simon Horman
Cc: intel-wired-lan, netdev, anthony.l.nguyen, magnus.karlsson,
jacob.e.keller, xudu, mschmidt, jmaxwell, poros,
przemyslaw.kitszel
On Mon, Jan 20, 2025 at 04:37:55PM +0000, Simon Horman wrote:
> On Mon, Jan 20, 2025 at 04:50:16PM +0100, Maciej Fijalkowski wrote:
> > The idea behind having ice_rx_buf::act was to simplify and speed up the
> > Rx data path by walking through buffers that represented cleaned HW Rx
> > descriptors. Since it caused us a major headache recently and we rolled
> > back to the old approach that 'puts' Rx buffers right after running the
> > XDP prog/creating the skb, this field is useless now and should be
> > removed.
> >
> > Get rid of ice_rx_buf::act and the related logic. We still need to take
> > care of the corner case where an XDP program releases a particular
> > fragment.
> >
> > Make ice_run_xdp() return its result and use it within
> > ice_put_rx_mbuf().
> >
> > Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side")
> > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> > drivers/net/ethernet/intel/ice/ice_txrx.c | 60 +++++++++++--------
> > drivers/net/ethernet/intel/ice/ice_txrx.h | 1 -
> > drivers/net/ethernet/intel/ice/ice_txrx_lib.h | 43 -------------
> > 3 files changed, 35 insertions(+), 69 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
> > index 9aa53ad2d8f2..77d75664c14d 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_txrx.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
> > @@ -532,10 +532,10 @@ int ice_setup_rx_ring(struct ice_rx_ring *rx_ring)
> > *
> > * Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR}
> > */
> > -static void
> > +static u32
> > ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
> > struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring,
> > - struct ice_rx_buf *rx_buf, union ice_32b_rx_flex_desc *eop_desc)
> > + union ice_32b_rx_flex_desc *eop_desc)
> > {
> > unsigned int ret = ICE_XDP_PASS;
> > u32 act;
>
> nit: The Kernel doc for ice_run_xdp should also be updated to no
> longer document the rx_buf parameter.
Heh - but after making it return the verdict again, the return
description is valid :D
I have been missing the kdoc descriptions for the functions introduced in
this patchset, so let me add them as well.
Thanks for review!
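For reference, a kernel-doc block for the new signature could take roughly the shape below. This is only a sketch: the stub body, the parameter name, and all DEMO_* names are mine, standing in for the real ice_run_xdp() and the ICE_XDP_* verdicts, and the doc wording is not taken from the patch.

```c
/* Hypothetical verdict codes standing in for ICE_XDP_* from ice_txrx_lib.h. */
#define DEMO_XDP_PASS     0
#define DEMO_XDP_CONSUMED 1

/**
 * demo_run_xdp - run an XDP program against an Rx frame (illustrative stub)
 * @prog_loaded: whether an XDP program is attached to the ring
 *
 * After dropping the rx_buf argument, the kernel-doc comment describes only
 * the remaining parameters; the old @rx_buf line is simply gone, while the
 * Return: line is valid again now that the function returns the verdict.
 *
 * Return: any of DEMO_XDP_{PASS, CONSUMED}.
 */
static int demo_run_xdp(int prog_loaded)
{
	/* Stub: a real implementation would run the program and map its
	 * action to a verdict; here we only model "no program => pass". */
	return prog_loaded ? DEMO_XDP_CONSUMED : DEMO_XDP_PASS;
}
```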
>
> ...
Thread overview: 8+ messages
2025-01-20 15:50 [PATCH v3 iwl-net 0/3] ice: fix Rx data path for heavy 9k MTU traffic Maciej Fijalkowski
2025-01-20 15:50 ` [PATCH v3 iwl-net 1/3] ice: put Rx buffers after being done with current frame Maciej Fijalkowski
2025-01-20 16:38 ` Simon Horman
2025-01-20 15:50 ` [PATCH v3 iwl-net 2/3] ice: gather page_count()'s of each frag right before XDP prog call Maciej Fijalkowski
2025-01-20 15:50 ` [PATCH v3 iwl-net 3/3] ice: stop storing XDP verdict within ice_rx_buf Maciej Fijalkowski
2025-01-20 16:37 ` Simon Horman
2025-01-22 12:50 ` Maciej Fijalkowski
2025-01-20 21:23 ` [Intel-wired-lan] " kernel test robot