* [PATCH net 3/6] ice: fix Rx page leak on multi-buffer frames
       [not found] <20250815204205.1407768-1-anthony.l.nguyen@intel.com>
@ 2025-08-15 20:41 ` Tony Nguyen
  2025-08-18 11:05   ` Jesper Dangaard Brouer
  2025-08-15 20:42 ` [PATCH net 4/6] ixgbe: xsk: resolve the negative overflow of budget in ixgbe_xmit_zc Tony Nguyen
  2025-08-15 20:42 ` [PATCH net 5/6] ixgbe: fix ndo_xdp_xmit() workloads Tony Nguyen
  2 siblings, 1 reply; 9+ messages in thread
From: Tony Nguyen @ 2025-08-15 20:41 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Jacob Keller, anthony.l.nguyen, maciej.fijalkowski,
	magnus.karlsson, ast, daniel, hawk, john.fastabend, sdf, bpf,
	horms, przemyslaw.kitszel, aleksander.lobakin, jaroslav.pulchart,
	jdamato, christoph.petrausch, Rinitha S, Priya Singh

From: Jacob Keller <jacob.e.keller@intel.com>

The ice_put_rx_mbuf() function handles calling ice_put_rx_buf() for each
buffer in the current frame. This function was introduced as part of
handling multi-buffer XDP support in the ice driver.

It works by iterating over the buffers from first_desc up to 1 plus the
total number of fragments in the frame, cached from before the XDP program
was executed.

If the hardware posts a descriptor with a size of 0, the logic used in
ice_put_rx_mbuf() breaks. Such descriptors get skipped and don't get added
as fragments in ice_add_xdp_frag(). Since the buffer isn't counted as a
fragment, we do not iterate over it in ice_put_rx_mbuf(), and thus we
don't call ice_put_rx_buf().

Because we don't call ice_put_rx_buf(), we don't attempt to re-use the
page or free it. This leaves a stale page in the ring, as we don't
increment next_to_alloc.

The ice_reuse_rx_page() function assumes that next_to_alloc has been
incremented properly, and that it always points to a buffer with a NULL
page. Since this function doesn't check, it will happily recycle a page
over the top of the next_to_alloc buffer, losing track of the old page.

Note that this leak only occurs for multi-buffer frames. The
ice_put_rx_mbuf() function always handles at least one buffer, so a
single-buffer frame will always get handled correctly. It is not clear
precisely why the hardware sometimes hands us descriptors with a size of
0, but it happens somewhat regularly with "jumbo frames" used by a 9K MTU.

To fix ice_put_rx_mbuf(), we need to make sure to call ice_put_rx_buf() on
all buffers between first_desc and next_to_clean. Borrow the logic of a
similar function in i40e used for this same purpose. Use the same logic
also in ice_get_pgcnts().

Instead of iterating over just the number of fragments, use a loop which
iterates until the current index reaches the next_to_clean element just
past the current frame. Check the current number of fragments (post XDP
program). For all buffers up to 1 more than the number of fragments, we'll
update the pagecnt_bias. For any buffers past this, pagecnt_bias is left
as-is. This ensures that fragments released by the XDP program, as well as
any zero-size buffers, won't have their pagecnt_bias updated incorrectly.
Unlike i40e, the ice_put_rx_mbuf() function does call ice_put_rx_buf() on
the last buffer of the frame indicating end of packet.

The xdp_xmit value only needs to be updated if an XDP program is run, and
only once per packet. Drop the xdp_xmit pointer argument from
ice_put_rx_mbuf(). Instead, set xdp_xmit in the ice_clean_rx_irq() function
directly. This avoids needing to pass the argument and avoids an extra
bit-wise OR for each buffer in the frame.

Move the increment of the ntc local variable to ensure it's updated
*before* all calls to ice_get_pgcnts() or ice_put_rx_mbuf(), as the loop
logic requires the index of the element just after the current frame.

This has the advantage that we also no longer need to track or cache the
number of fragments in the rx_ring, which saves a few bytes in the ring.

Cc: Christoph Petrausch <christoph.petrausch@deepl.com>
Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Closes: https://lore.kernel.org/netdev/CAK8fFZ4hY6GUJNENz3wY9jaYLZXGfpr7dnZxzGMYoE44caRbgw@mail.gmail.com/
Fixes: 743bbd93cf29 ("ice: put Rx buffers after being done with current frame")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Priya Singh <priyax.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c | 81 +++++++++--------------
 drivers/net/ethernet/intel/ice/ice_txrx.h |  1 -
 2 files changed, 33 insertions(+), 49 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 29e0088ab6b2..93907ab2eac7 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -894,10 +894,6 @@ ice_add_xdp_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 	__skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, rx_buf->page,
 				   rx_buf->page_offset, size);
 	sinfo->xdp_frags_size += size;
-	/* remember frag count before XDP prog execution; bpf_xdp_adjust_tail()
-	 * can pop off frags but driver has to handle it on its own
-	 */
-	rx_ring->nr_frags = sinfo->nr_frags;
 
 	if (page_is_pfmemalloc(rx_buf->page))
 		xdp_buff_set_frag_pfmemalloc(xdp);
@@ -968,20 +964,20 @@ ice_get_rx_buf(struct ice_rx_ring *rx_ring, const unsigned int size,
 /**
  * ice_get_pgcnts - grab page_count() for gathered fragments
  * @rx_ring: Rx descriptor ring to store the page counts on
+ * @ntc: the next to clean element (not included in this frame!)
  *
  * This function is intended to be called right before running XDP
  * program so that the page recycling mechanism will be able to take
  * a correct decision regarding underlying pages; this is done in such
  * way as XDP program can change the refcount of page
  */
-static void ice_get_pgcnts(struct ice_rx_ring *rx_ring)
+static void ice_get_pgcnts(struct ice_rx_ring *rx_ring, unsigned int ntc)
 {
-	u32 nr_frags = rx_ring->nr_frags + 1;
 	u32 idx = rx_ring->first_desc;
 	struct ice_rx_buf *rx_buf;
 	u32 cnt = rx_ring->count;
 
-	for (int i = 0; i < nr_frags; i++) {
+	while (idx != ntc) {
 		rx_buf = &rx_ring->rx_buf[idx];
 		rx_buf->pgcnt = page_count(rx_buf->page);
 
@@ -1154,62 +1150,48 @@ ice_put_rx_buf(struct ice_rx_ring *rx_ring, struct ice_rx_buf *rx_buf)
 }
 
 /**
- * ice_put_rx_mbuf - ice_put_rx_buf() caller, for all frame frags
+ * ice_put_rx_mbuf - ice_put_rx_buf() caller, for all buffers in frame
  * @rx_ring: Rx ring with all the auxiliary data
  * @xdp: XDP buffer carrying linear + frags part
- * @xdp_xmit: XDP_TX/XDP_REDIRECT verdict storage
- * @ntc: a current next_to_clean value to be stored at rx_ring
+ * @ntc: the next to clean element (not included in this frame!)
  * @verdict: return code from XDP program execution
  *
- * Walk through gathered fragments and satisfy internal page
- * recycle mechanism; we take here an action related to verdict
- * returned by XDP program;
+ * Called after XDP program is completed, or on error with verdict set to
+ * ICE_XDP_CONSUMED.
+ *
+ * Walk through buffers from first_desc to the end of the frame, releasing
+ * buffers and satisfying internal page recycle mechanism. The action depends
+ * on verdict from XDP program.
  */
 static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
-			    u32 *xdp_xmit, u32 ntc, u32 verdict)
+			    u32 ntc, u32 verdict)
 {
-	u32 nr_frags = rx_ring->nr_frags + 1;
+	u32 nr_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
 	u32 idx = rx_ring->first_desc;
 	u32 cnt = rx_ring->count;
-	u32 post_xdp_frags = 1;
 	struct ice_rx_buf *buf;
-	int i;
-
-	if (unlikely(xdp_buff_has_frags(xdp)))
-		post_xdp_frags += xdp_get_shared_info_from_buff(xdp)->nr_frags;
+	int i = 0;
 
-	for (i = 0; i < post_xdp_frags; i++) {
+	while (idx != ntc) {
 		buf = &rx_ring->rx_buf[idx];
+		if (++idx == cnt)
+			idx = 0;
 
-		if (verdict & (ICE_XDP_TX | ICE_XDP_REDIR)) {
+		/* An XDP program could release fragments from the end of the
+		 * buffer. For these, we need to keep the pagecnt_bias as-is.
+		 * To do this, only adjust pagecnt_bias for fragments up to
+		 * the total remaining after the XDP program has run.
+		 */
+		if (verdict != ICE_XDP_CONSUMED)
 			ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
-			*xdp_xmit |= verdict;
-		} else if (verdict & ICE_XDP_CONSUMED) {
+		else if (i++ <= nr_frags)
 			buf->pagecnt_bias++;
-		} else if (verdict == ICE_XDP_PASS) {
-			ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
-		}
 
 		ice_put_rx_buf(rx_ring, buf);
-
-		if (++idx == cnt)
-			idx = 0;
-	}
-	/* handle buffers that represented frags released by XDP prog;
-	 * for these we keep pagecnt_bias as-is; refcount from struct page
-	 * has been decremented within XDP prog and we do not have to increase
-	 * the biased refcnt
-	 */
-	for (; i < nr_frags; i++) {
-		buf = &rx_ring->rx_buf[idx];
-		ice_put_rx_buf(rx_ring, buf);
-		if (++idx == cnt)
-			idx = 0;
 	}
 
 	xdp->data = NULL;
 	rx_ring->first_desc = ntc;
-	rx_ring->nr_frags = 0;
 }
 
 /**
@@ -1317,6 +1299,10 @@ static int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
 		/* retrieve a buffer from the ring */
 		rx_buf = ice_get_rx_buf(rx_ring, size, ntc);
 
+		/* Increment ntc before calls to ice_put_rx_mbuf() */
+		if (++ntc == cnt)
+			ntc = 0;
+
 		if (!xdp->data) {
 			void *hard_start;
 
@@ -1325,24 +1311,23 @@ static int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
 			xdp_prepare_buff(xdp, hard_start, offset, size, !!offset);
 			xdp_buff_clear_frags_flag(xdp);
 		} else if (ice_add_xdp_frag(rx_ring, xdp, rx_buf, size)) {
-			ice_put_rx_mbuf(rx_ring, xdp, NULL, ntc, ICE_XDP_CONSUMED);
+			ice_put_rx_mbuf(rx_ring, xdp, ntc, ICE_XDP_CONSUMED);
 			break;
 		}
-		if (++ntc == cnt)
-			ntc = 0;
 
 		/* skip if it is NOP desc */
 		if (ice_is_non_eop(rx_ring, rx_desc))
 			continue;
 
-		ice_get_pgcnts(rx_ring);
+		ice_get_pgcnts(rx_ring, ntc);
 		xdp_verdict = ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring, rx_desc);
 		if (xdp_verdict == ICE_XDP_PASS)
 			goto construct_skb;
 		total_rx_bytes += xdp_get_buff_len(xdp);
 		total_rx_pkts++;
 
-		ice_put_rx_mbuf(rx_ring, xdp, &xdp_xmit, ntc, xdp_verdict);
+		ice_put_rx_mbuf(rx_ring, xdp, ntc, xdp_verdict);
+		xdp_xmit |= xdp_verdict & (ICE_XDP_TX | ICE_XDP_REDIR);
 
 		continue;
 construct_skb:
@@ -1355,7 +1340,7 @@ static int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
 			rx_ring->ring_stats->rx_stats.alloc_page_failed++;
 			xdp_verdict = ICE_XDP_CONSUMED;
 		}
-		ice_put_rx_mbuf(rx_ring, xdp, &xdp_xmit, ntc, xdp_verdict);
+		ice_put_rx_mbuf(rx_ring, xdp, ntc, xdp_verdict);
 
 		if (!skb)
 			break;
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index fef750c5f288..2fd8e78178a2 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -358,7 +358,6 @@ struct ice_rx_ring {
 	struct ice_tx_ring *xdp_ring;
 	struct ice_rx_ring *next;	/* pointer to next ring in q_vector */
 	struct xsk_buff_pool *xsk_pool;
-	u32 nr_frags;
 	u16 max_frame;
 	u16 rx_buf_len;
 	dma_addr_t dma;			/* physical address of ring */
-- 
2.47.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread
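(As a reader's aid, the fixed walk condenses to the following standalone
sketch. The types and helpers here are stand-ins, not the driver's own;
the authoritative code is the diff above:

    struct rx_buf { unsigned int pagecnt_bias; };
    struct rx_ring {
            unsigned int first_desc;        /* first buffer of the frame */
            unsigned int count;             /* ring size */
            struct rx_buf *rx_buf;
    };

    /* Stubs standing in for ice_rx_buf_adjust_pg_offset()/ice_put_rx_buf() */
    static void adjust_pg_offset(struct rx_buf *buf) { (void)buf; }
    static void put_rx_buf(struct rx_ring *ring, struct rx_buf *buf)
    { (void)ring; (void)buf; }

    /* Release every buffer between first_desc and ntc (exclusive),
     * wrapping at the ring size, so zero-size descriptors that never
     * became fragments are no longer skipped.
     */
    static void put_rx_mbuf_sketch(struct rx_ring *ring, unsigned int ntc,
                                   unsigned int nr_frags, int consumed)
    {
            unsigned int idx = ring->first_desc;
            int i = 0;

            while (idx != ntc) {
                    struct rx_buf *buf = &ring->rx_buf[idx];

                    if (++idx == ring->count)
                            idx = 0;        /* ring wraparound */

                    if (!consumed)
                            adjust_pg_offset(buf);  /* PASS/TX/REDIR */
                    else if (i++ <= nr_frags)
                            buf->pagecnt_bias++;    /* CONSUMED: only the
                                                     * first 1 + nr_frags
                                                     * buffers; frags the
                                                     * XDP prog released
                                                     * keep their bias */

                    put_rx_buf(ring, buf);          /* recycle or free */
            }

            ring->first_desc = ntc;
    }

The loop terminates on index equality with next_to_clean rather than on a
cached fragment count, which is what makes zero-size descriptors safe.)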
* Re: [PATCH net 3/6] ice: fix Rx page leak on multi-buffer frames
  2025-08-15 20:41 ` [PATCH net 3/6] ice: fix Rx page leak on multi-buffer frames Tony Nguyen
@ 2025-08-18 11:05   ` Jesper Dangaard Brouer
  2025-08-19  0:38     ` Jacob Keller
  0 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2025-08-18 11:05 UTC (permalink / raw)
  To: Tony Nguyen, davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Jacob Keller, maciej.fijalkowski, magnus.karlsson, ast, daniel,
	john.fastabend, sdf, bpf, horms, przemyslaw.kitszel,
	aleksander.lobakin, jaroslav.pulchart, jdamato,
	christoph.petrausch, Rinitha S, Priya Singh, Eelco Chaudron

On 15/08/2025 22.41, Tony Nguyen wrote:
> From: Jacob Keller <jacob.e.keller@intel.com>
>
[... quoted commit message trimmed ...]
> This has the advantage that we also no longer need to track or cache the
> number of fragments in the rx_ring, which saves a few bytes in the ring.
>

Has anyone tested the performance impact for XDP_DROP?
(with standard non-multi-buffer frames)

The code change below will always touch cache-lines in the shared_info
area. Before, it was guarded with a xdp_buff_has_frags() check.

[... quoted tags, diffstat, and unchanged hunks trimmed ...]
>  static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
> -			    u32 *xdp_xmit, u32 ntc, u32 verdict)
> +			    u32 ntc, u32 verdict)
>  {
> -	u32 nr_frags = rx_ring->nr_frags + 1;
> +	u32 nr_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;

Here we unconditionally access the skb_shared_info area.

>  	u32 idx = rx_ring->first_desc;
>  	u32 cnt = rx_ring->count;
> -	u32 post_xdp_frags = 1;
>  	struct ice_rx_buf *buf;
> -	int i;
> -
> -	if (unlikely(xdp_buff_has_frags(xdp)))

Previously we only touched the shared_info area if this is a multi-buff
frame.

> -		post_xdp_frags += xdp_get_shared_info_from_buff(xdp)->nr_frags;
> +	int i = 0;
>
[... rest of quoted patch trimmed ...]

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH net 3/6] ice: fix Rx page leak on multi-buffer frames
  2025-08-18 11:05 ` Jesper Dangaard Brouer
@ 2025-08-19  0:38   ` Jacob Keller
  2025-08-19 16:44     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 9+ messages in thread
From: Jacob Keller @ 2025-08-19  0:38 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Tony Nguyen, davem, kuba, pabeni,
	edumazet, andrew+netdev, netdev
  Cc: maciej.fijalkowski, magnus.karlsson, ast, daniel, john.fastabend,
	sdf, bpf, horms, przemyslaw.kitszel, aleksander.lobakin,
	jaroslav.pulchart, jdamato, christoph.petrausch, Rinitha S,
	Priya Singh, Eelco Chaudron

[-- Attachment #1.1: Type: text/plain, Size: 9118 bytes --]

On 8/18/2025 4:05 AM, Jesper Dangaard Brouer wrote:
> On 15/08/2025 22.41, Tony Nguyen wrote:
>> This has the advantage that we also no longer need to track or cache the
>> number of fragments in the rx_ring, which saves a few bytes in the ring.
>>
>
> Has anyone tested the performance impact for XDP_DROP?
> (with standard non-multi-buffer frames)
>
> The code change below will always touch cache-lines in the shared_info
> area. Before, it was guarded with a xdp_buff_has_frags() check.
>

I did some basic testing with XDP_DROP previously using the xdp-bench
tool, and do not recall noticing an issue. I don't recall the actual
numbers now though, so I did some quick tests again.

without patch...

Client:
$ iperf3 -u -c 192.168.93.1 -t86400 -l1200 -P20 -b5G
...
[SUM]  10.00-10.33  sec   626 MBytes  16.0 Gbits/sec  546909

$ iperf3 -s -B 192.168.93.1%ens260f0
[SUM]   0.00-10.00  sec  17.7 GBytes  15.2 Gbits/sec  0.011 ms
9712/15888183 (0.061%)  receiver

$ xdp-bench drop ens260f0
Summary                 1,778,935 rx/s            0 err/s
Summary                 2,041,087 rx/s            0 err/s
Summary                 2,005,052 rx/s            0 err/s
Summary                 1,918,967 rx/s            0 err/s

with patch...

Client:
$ iperf3 -u -c 192.168.93.1 -t86400 -l1200 -P20 -b5G
...
[SUM]  78.00-78.90  sec  2.01 GBytes  19.1 Gbits/sec  1801284

Server:
$ iperf3 -s -B 192.168.93.1%ens260f0
[SUM]  77.00-78.00  sec  2.14 GBytes  18.4 Gbits/sec  0.012 ms
9373/1921186 (0.49%)

xdp-bench:
$ xdp-bench drop ens260f0
Dropping packets on ens260f0 (ifindex 8; driver ice)
Summary                 1,910,918 rx/s            0 err/s
Summary                 1,866,562 rx/s            0 err/s
Summary                 1,901,233 rx/s            0 err/s
Summary                 1,859,854 rx/s            0 err/s
Summary                 1,593,493 rx/s            0 err/s
Summary                 1,891,426 rx/s            0 err/s
Summary                 1,880,673 rx/s            0 err/s
Summary                 1,866,043 rx/s            0 err/s
Summary                 1,872,845 rx/s            0 err/s

I ran a few times and it seemed to waffle a bit around 15 Gbit/sec to
20 Gbit/sec, with throughput varying regardless of which patch was
applied. I actually tended to see slightly higher numbers with this fix
applied, but it was not consistent and hard to measure.

without the patch: Without xdp-bench running the XDP program, top showed
a CPU usage of 740% and an ~86 idle score. With xdp-bench running, the
iperf cpu drops off the top listing and the CPU idle score goes up to
99.9.

with the patch: The iperf3 CPU use seems to go up, but so does the
throughput. It is hard to get an isolated measure. I don't have an
immediate setup for fine-tuned performance testing available to do
anything more rigorous.

Personally, I think it's overall in the noise, as I saw the same peak
performance and CPU usage with and without the patch.

I also tried testing TCP and also didn't see a significant difference
with or without the patch. Though, testing xdp-bench with TCP is not
that useful since the client stops transmitting once the packets are
dropped instead of handled.

$ iperf3 -c 192.168.93.1 -t86400 -l8000 -P5

Without patch:
[SUM]  24.00-25.00  sec  7.80 GBytes  67.0 Gbits/sec

With patch:
[SUM]  28.00-29.00  sec  7.85 GBytes  67.4 Gbits/sec

Again, it ranges from 60 to 68 Gbit/sec in both cases, though I think
the peak is slightly higher with the fix applied; sometimes I saw it
spike up to 70 Gbit/sec but it mostly hovers around 67 Gbit/sec.

I'm sure there are a lot of factors impacting the performance here, but
I think there's not much evidence that it's significantly different.

[... quoted patch trimmed ...]
>> -	u32 nr_frags = rx_ring->nr_frags + 1;
>> +	u32 nr_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
>
> Here we unconditionally access the skb_shared_info area.
>
>> -
>> -	if (unlikely(xdp_buff_has_frags(xdp)))
>
> Previously we only touched the shared_info area if this is a multi-buff
> frame.
>

I'm not certain, but reading the helpers it might be correct to do
something like this:

  if (unlikely(xdp_buff_has_frags(xdp)))
      nr_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
  else
      nr_frags = 1;

either in the driver code or by adding a new xdp helper function.

I'm not sure it's worth it though. We have pending work from our
development team to refactor ice to use page pool and switch to libeth
XDP helpers, which eliminates all of this driver-specific logic.

I don't personally think it's worth holding up this series and this
important memory leak fix for a minor potential code change that I
can't measure an obvious improvement on.

Thanks,
Jake

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH net 3/6] ice: fix Rx page leak on multi-buffer frames
  2025-08-19  0:38     ` Jacob Keller
@ 2025-08-19 16:44       ` Jesper Dangaard Brouer
  2025-08-19 19:53         ` Jacob Keller
  0 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2025-08-19 16:44 UTC (permalink / raw)
  To: Jacob Keller, Tony Nguyen, ast, davem, kuba, pabeni, netdev
  Cc: maciej.fijalkowski, magnus.karlsson, andrew+netdev, daniel,
	john.fastabend, sdf, bpf, horms, przemyslaw.kitszel,
	aleksander.lobakin, jaroslav.pulchart, jdamato,
	christoph.petrausch, Rinitha S, Priya Singh, Eelco Chaudron,
	edumazet

On 19/08/2025 02.38, Jacob Keller wrote:
> On 8/18/2025 4:05 AM, Jesper Dangaard Brouer wrote:
>> Has anyone tested the performance impact for XDP_DROP?
>> (with standard non-multi-buffer frames)
>>
[...]
> I did some basic testing with XDP_DROP previously using the xdp-bench
> tool, and do not recall noticing an issue. I don't recall the actual
> numbers now though, so I did some quick tests again.
>
[... iperf3 / xdp-bench numbers trimmed ...]
>
> I ran a few times and it seemed to waffle a bit around 15 Gbit/sec to
> 20 Gbit/sec, with throughput varying regardless of which patch was
> applied. I actually tended to see slightly higher numbers with this fix
> applied, but it was not consistent and hard to measure.
>

Above testing is not a valid XDP_DROP test.

The packet generator needs to be much, much faster, as XDP_DROP is for
DDoS protection use-cases (one of Cloudflare's main products).

I recommend using the script for pktgen in kernel tree:
  samples/pktgen/pktgen_sample03_burst_single_flow.sh

Example:
  ./pktgen_sample03_burst_single_flow.sh -vi mlx5p2 -d 198.18.100.1 \
      -m b4:96:91:ad:0b:09 -t $(nproc)

> without the patch:

On my testlab with CPU: AMD EPYC 9684X (SRSO=IBPB) running:
  - sudo ./xdp-bench drop ice4   # (defaults to no-touch)

XDP_DROP (with no-touch)
  Without patch : 54,052,300 rx/s = 18.50 nanosec/packet
  With the patch: 33,420,619 rx/s = 29.92 nanosec/packet
  Diff: 11.42 nanosec

Using perf stat I can see an increase in cache-misses.

The difference is smaller if we read packet data, running:
  - sudo ./xdp-bench drop ice4 --packet-operation read-data

XDP_DROP (with read-data)
  Without patch : 27,200,683 rx/s = 36.76 nanosec/packet
  With the patch: 24,348,751 rx/s = 41.07 nanosec/packet
  Diff: 4.31 nanosec

On this CPU we don't have DDIO/DCA, so we take a big hit reading the
packet data in XDP. This will be needed by our DDoS bpf_prog. The
nanosec diff isn't the same, so it seems this change can hide a little
behind the cache-miss in the XDP bpf_prog.

> Without xdp-bench running the XDP program, top showed a CPU usage of
> 740% and an ~86 idle score.
>

We don't want a scaling test for this. For this XDP_DROP/DDoS test we
want to target a single CPU. This is easiest done by generating a
single flow (hint: the pktgen script is called _single_flow). We want
to see a single CPU running ksoftirqd 100% of the time.

> With xdp-bench running, the iperf cpu drops off the top listing and the
> CPU idle score goes up to 99.9
>

With the single-flow generator DoS "attacking" a single CPU, we want to
see a single CPU running ksoftirqd 100% of the time, for the XDP_DROP
case.

[... remaining quoted performance discussion trimmed ...]

> I'm not certain, but reading the helpers it might be correct to do
> something like this:
>
>   if (unlikely(xdp_buff_has_frags(xdp)))
>       nr_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
>   else
>       nr_frags = 1;

Yes, that looks like a correct pattern.

> either in the driver code or by adding a new xdp helper function.
>
> I'm not sure it's worth it though. We have pending work from our
> development team to refactor ice to use page pool and switch to libeth
> XDP helpers, which eliminates all of this driver-specific logic.

Please do proper testing of the XDP_DROP case when doing this change.

> I don't personally think it's worth holding up this series and this
> important memory leak fix for a minor potential code change that I
> can't measure an obvious improvement on.

IMHO you included an optimization (that wasn't a gain) in a fix patch.
I think you can fix the memory leak without the "optimization" part.

pw-bot: cr

--Jesper

^ permalink raw reply	[flat|nested] 9+ messages in thread
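(A hedged example of the kind of "perf stat" measurement referenced
above; the event list is standard, while pinning to CPU 3 is an
assumption about which core the single flow lands on:

    $ sudo perf stat -C 3 -e cycles,instructions,cache-references,cache-misses -- sleep 10

Comparing the cache-miss counts with and without the patch, while the
ksoftirqd core is saturated, is what surfaces the extra shared_info
cache-line touch.)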
* Re: [PATCH net 3/6] ice: fix Rx page leak on multi-buffer frames
  2025-08-19 16:44       ` Jesper Dangaard Brouer
@ 2025-08-19 19:53         ` Jacob Keller
  2025-08-19 20:44           ` Tony Nguyen
  2025-08-21 16:27           ` Jacob Keller
  0 siblings, 2 replies; 9+ messages in thread
From: Jacob Keller @ 2025-08-19 19:53 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Tony Nguyen, ast, davem, kuba, pabeni, netdev
  Cc: maciej.fijalkowski, magnus.karlsson, andrew+netdev, daniel,
	john.fastabend, sdf, bpf, horms, przemyslaw.kitszel,
	aleksander.lobakin, jaroslav.pulchart, jdamato,
	christoph.petrausch, Rinitha S, Priya Singh, Eelco Chaudron,
	edumazet

[-- Attachment #1.1: Type: text/plain, Size: 6025 bytes --]

On 8/19/2025 9:44 AM, Jesper Dangaard Brouer wrote:
> Above testing is not a valid XDP_DROP test.
>

Fair. I'm no XDP expert, so I have a lot to learn here :)

> The packet generator needs to be much, much faster, as XDP_DROP is for
> DDoS protection use-cases (one of Cloudflare's main products).
>
> I recommend using the script for pktgen in kernel tree:
>   samples/pktgen/pktgen_sample03_burst_single_flow.sh
>
[...]
> On my testlab with CPU: AMD EPYC 9684X (SRSO=IBPB) running:
>   - sudo ./xdp-bench drop ice4   # (defaults to no-touch)
>
> XDP_DROP (with no-touch)
>   Without patch : 54,052,300 rx/s = 18.50 nanosec/packet
>   With the patch: 33,420,619 rx/s = 29.92 nanosec/packet
>   Diff: 11.42 nanosec
>

Oof. Yeah, that's not good.

> Using perf stat I can see an increase in cache-misses.
>
[... read-data numbers trimmed ...]
>
> We don't want a scaling test for this. For this XDP_DROP/DDoS test we
> want to target a single CPU. This is easiest done by generating a
> single flow (hint: the pktgen script is called _single_flow). We want
> to see a single CPU running ksoftirqd 100% of the time.
>

Ok.

>> I'm not certain, but reading the helpers it might be correct to do
>> something like this:
>>
>>   if (unlikely(xdp_buff_has_frags(xdp)))
>>       nr_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
>>   else
>>       nr_frags = 1;
>
> Yes, that looks like a correct pattern.
>
>> either in the driver code or by adding a new xdp helper function.
>>
>> I'm not sure it's worth it though. We have pending work from our
>> development team to refactor ice to use page pool and switch to libeth
>> XDP helpers, which eliminates all of this driver-specific logic.
>
> Please do proper testing of the XDP_DROP case when doing this change.
>
>> I don't personally think it's worth holding up this series and this
>> important memory leak fix for a minor potential code change that I
>> can't measure an obvious improvement on.
>
> IMHO you included an optimization (that wasn't a gain) in a fix patch.
> I think you can fix the memory leak without the "optimization" part.
>

It wasn't intended as an optimization in any case, but was me trying to
make it easier to keep track of what the driver was doing, which
obviously ended up regressing here.

@Jakub, @Tony, I guess we'll have to drop this patch from the series,
and I'll work on a v2 to avoid this regression.

> pw-bot: cr
>
> --Jesper
>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH net 3/6] ice: fix Rx page leak on multi-buffer frames
  2025-08-19 19:53         ` Jacob Keller
@ 2025-08-19 20:44           ` Tony Nguyen
  0 siblings, 0 replies; 9+ messages in thread
From: Tony Nguyen @ 2025-08-19 20:44 UTC (permalink / raw)
  To: Jacob Keller, Jesper Dangaard Brouer, ast, davem, kuba, pabeni, netdev
  Cc: maciej.fijalkowski, magnus.karlsson, andrew+netdev, daniel,
	john.fastabend, sdf, bpf, horms, przemyslaw.kitszel,
	aleksander.lobakin, jaroslav.pulchart, jdamato,
	christoph.petrausch, Rinitha S, Priya Singh, Eelco Chaudron,
	edumazet

On 8/19/2025 12:53 PM, Jacob Keller wrote:
> On 8/19/2025 9:44 AM, Jesper Dangaard Brouer wrote:

...

> @Jakub, @Tony, I guess we'll have to drop this patch from the series,
> and I'll work on a v2 to avoid this regression.

Ok. I'll re-send the rest of the series with this dropped.

Thanks,
Tony

>> pw-bot: cr

I don't think the bot picked this up so...

pw-bot: changes-requested

>>
>> --Jesper
>>

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH net 3/6] ice: fix Rx page leak on multi-buffer frames
  2025-08-19 19:53         ` Jacob Keller
  2025-08-19 20:44           ` Tony Nguyen
@ 2025-08-21 16:27           ` Jacob Keller
  1 sibling, 0 replies; 9+ messages in thread
From: Jacob Keller @ 2025-08-21 16:27 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Tony Nguyen, ast, davem, kuba, pabeni, netdev
  Cc: maciej.fijalkowski, magnus.karlsson, andrew+netdev, daniel,
	john.fastabend, sdf, bpf, horms, przemyslaw.kitszel,
	aleksander.lobakin, jaroslav.pulchart, jdamato,
	christoph.petrausch, Rinitha S, Priya Singh, Eelco Chaudron,
	edumazet

[-- Attachment #1.1: Type: text/plain, Size: 5873 bytes --]

On 8/19/2025 12:53 PM, Jacob Keller wrote:
> On 8/19/2025 9:44 AM, Jesper Dangaard Brouer wrote:
[... earlier discussion trimmed ...]
>>> I'm not certain, but reading the helpers it might be correct to do
>>> something like this:
>>>
>>>   if (unlikely(xdp_buff_has_frags(xdp)))
>>>       nr_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
>>>   else
>>>       nr_frags = 1;
>>
>> Yes, that looks like a correct pattern.
>>

It looks like i40e has the same mistake, but perhaps it's less impacted
because of lower network speeds.

This mistake crept in because i40e_process_rx_buffs() (which I borrowed
the same logic from) unconditionally checks the shared info for the
nr_frags. In actuality, that counts the number of fragments excluding
the initial descriptor, but the check in the loop body is aware of and
accounts for that.

Thus, I think what we really want here is to set nr_frags to 0 if
xdp_buff_has_frags() is false, not 1. A helper function seems like the
best solution, and I can submit a change to i40e to fix that code,
assuming I can measure the difference there as well.

Thanks,
Jake

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread
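(A minimal sketch of the helper described above. The wrapper itself is
hypothetical, not an existing kernel API, while xdp_buff_has_frags() and
xdp_get_shared_info_from_buff() are real helpers from include/net/xdp.h:

    /* Hypothetical helper following the thread's conclusion: report 0
     * fragments for single-buffer frames, and only touch the
     * skb_shared_info cache line when the frame actually has frags.
     */
    static inline u32 xdp_buff_nr_frags(const struct xdp_buff *xdp)
    {
            if (unlikely(xdp_buff_has_frags(xdp)))
                    return xdp_get_shared_info_from_buff(xdp)->nr_frags;

            return 0;
    }

ice_put_rx_mbuf() could then do "nr_frags = xdp_buff_nr_frags(xdp)" in
place of the unconditional shared_info read, restoring the guard that
the fix accidentally dropped.)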
* [PATCH net 4/6] ixgbe: xsk: resolve the negative overflow of budget in ixgbe_xmit_zc
       [not found] <20250815204205.1407768-1-anthony.l.nguyen@intel.com>
  2025-08-15 20:41 ` [PATCH net 3/6] ice: fix Rx page leak on multi-buffer frames Tony Nguyen
@ 2025-08-15 20:42 ` Tony Nguyen
  2025-08-15 20:42 ` [PATCH net 5/6] ixgbe: fix ndo_xdp_xmit() workloads Tony Nguyen
  2 siblings, 0 replies; 9+ messages in thread
From: Tony Nguyen @ 2025-08-15 20:42 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Jason Xing, anthony.l.nguyen, maciej.fijalkowski,
	magnus.karlsson, ast, daniel, hawk, john.fastabend, sdf, bpf,
	bjorn, przemyslaw.kitszel, larysa.zaremba, Paul Menzel,
	Aleksandr Loktionov

From: Jason Xing <kernelxing@tencent.com>

Resolve the negative overflow of budget in ixgbe_xmit_zc, which leads to
returning true even when the budget of descs is fully consumed.

Before this patch, when the budget is decreased to zero after sending the
last allowed desc in ixgbe_xmit_zc, the code still turns back and enters
the while() statement to see if it should keep processing packets, but in
the meantime it unexpectedly decreases the value again to 'unsigned int
(0--)', namely UINT_MAX. Finally, ixgbe_xmit_zc returns true, showing 'we
completed cleaning the budget'. That also means 'clean_complete = true'
in ixgbe_poll.

The theory behind this is: if the budget number of descs is consumed, it
implies that we might have more descs to be done, so we should return
false in ixgbe_xmit_zc to tell the napi poll to find another chance to
start polling and handle the rest of the descs. On the contrary,
returning true here means the job is done, we finished all the possible
descs this time, and we don't intend to start a new napi poll. That is
clearly against our expectations.

Please also see how ixgbe_clean_tx_irq() handles the problem: it uses a
do..while() statement to make sure the budget can be decreased to zero
at most and the negative overflow never happens.

The patch adds 'likely' because we would rarely fail to hit the loop
condition, since the standard budget is 256.

Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index ac58964b2f08..7b941505a9d0 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -398,7 +398,7 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
 	dma_addr_t dma;
 	u32 cmd_type;
 
-	while (budget-- > 0) {
+	while (likely(budget)) {
 		if (unlikely(!ixgbe_desc_unused(xdp_ring))) {
 			work_done = false;
 			break;
@@ -433,6 +433,8 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
 		xdp_ring->next_to_use++;
 		if (xdp_ring->next_to_use == xdp_ring->count)
 			xdp_ring->next_to_use = 0;
+
+		budget--;
 	}
 
 	if (tx_desc) {
-- 
2.47.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread
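(To make the unsigned underflow concrete, here is a minimal user-space
sketch. It is illustrative only, not driver code, and the names are made
up:

    #include <stdbool.h>
    #include <stdio.h>

    /* Pre-patch pattern: the post-decrement in the loop condition runs
     * one extra time after budget reaches zero, leaving the unsigned
     * counter at UINT_MAX instead of 0.
     */
    static bool xmit_zc_sketch(unsigned int budget)
    {
            while (budget-- > 0)
                    ;       /* pretend to send one descriptor */

            /* budget is now UINT_MAX when fully consumed, so this
             * wrongly claims the budget was not exhausted. */
            return budget > 0;
    }

    int main(void)
    {
            /* Prints "complete=1" even though all 256 descs were used. */
            printf("complete=%d\n", xmit_zc_sketch(256));
            return 0;
    }

Decrementing at the bottom of the loop body, as the patch does, leaves
budget at exactly 0 when the quota is used up, so the caller correctly
reports that more work may remain.)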
* [PATCH net 5/6] ixgbe: fix ndo_xdp_xmit() workloads
  [not found] <20250815204205.1407768-1-anthony.l.nguyen@intel.com>
  2025-08-15 20:41 ` [PATCH net 3/6] ice: fix Rx page leak on multi-buffer frames Tony Nguyen
  2025-08-15 20:42 ` [PATCH net 4/6] ixgbe: xsk: resolve the negative overflow of budget in ixgbe_xmit_zc Tony Nguyen
@ 2025-08-15 20:42 ` Tony Nguyen
  2 siblings, 0 replies; 9+ messages in thread
From: Tony Nguyen @ 2025-08-15 20:42 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Maciej Fijalkowski, anthony.l.nguyen, magnus.karlsson, ast, daniel,
	hawk, john.fastabend, sdf, bpf, Tobias Böhm, Marcus Wichelmann,
	Aleksandr Loktionov

From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

Currently the ixgbe driver periodically checks, in its watchdog
subtask, whether there is anything to be transmitted (considering both
Tx and XDP rings) while the carrier is not 'ok'. Such an event is
interpreted as a Tx hang and therefore results in an interface reset.

This is currently problematic for ndo_xdp_xmit(), as it is allowed to
produce descriptors while the interface is going through a reset or its
carrier is turned off. Furthermore, XDP rings should not really be
objects of Tx hang detection. That mechanism is rather a matter of
ndo_tx_timeout() being called from dev_watchdog against the Tx rings
exposed to the networking stack.

Taking the issues described above into account, the fix is two-fold:
do not consider XDP rings in the local ixgbe watchdog, and do not
produce Tx descriptors in the ndo_xdp_xmit callback when the carrier
is currently down. For now, keep the Tx hang checks in the clean Tx
irq routine, but adjust them so they do not execute for XDP rings.

Cc: Tobias Böhm <tobias.boehm@hetzner-cloud.de>
Reported-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
Closes: https://lore.kernel.org/netdev/eca1880f-253a-4955-afe6-732d7c6926ee@hetzner-cloud.de/
Fixes: 6453073987ba ("ixgbe: add initial support for xdp redirect")
Fixes: 33fdc82f0883 ("ixgbe: add support for XDP_TX action")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 34 ++++++-------------
 1 file changed, 11 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 6122a0abb41f..80e6a2ef1350 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -968,10 +968,6 @@ static void ixgbe_update_xoff_rx_lfc(struct ixgbe_adapter *adapter)
 	for (i = 0; i < adapter->num_tx_queues; i++)
 		clear_bit(__IXGBE_HANG_CHECK_ARMED,
 			  &adapter->tx_ring[i]->state);
-
-	for (i = 0; i < adapter->num_xdp_queues; i++)
-		clear_bit(__IXGBE_HANG_CHECK_ARMED,
-			  &adapter->xdp_ring[i]->state);
 }
 
 static void ixgbe_update_xoff_received(struct ixgbe_adapter *adapter)
@@ -1214,7 +1210,7 @@ static void ixgbe_pf_handle_tx_hang(struct ixgbe_ring *tx_ring,
 	struct ixgbe_adapter *adapter = netdev_priv(tx_ring->netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
 
-	e_err(drv, "Detected Tx Unit Hang%s\n"
+	e_err(drv, "Detected Tx Unit Hang\n"
 	      "  Tx Queue             <%d>\n"
 	      "  TDH, TDT             <%x>, <%x>\n"
 	      "  next_to_use          <%x>\n"
@@ -1222,16 +1218,14 @@ static void ixgbe_pf_handle_tx_hang(struct ixgbe_ring *tx_ring,
 	      "tx_buffer_info[next_to_clean]\n"
 	      "  time_stamp           <%lx>\n"
 	      "  jiffies              <%lx>\n",
-	      ring_is_xdp(tx_ring) ? " (XDP)" : "",
 	      tx_ring->queue_index,
 	      IXGBE_READ_REG(hw, IXGBE_TDH(tx_ring->reg_idx)),
 	      IXGBE_READ_REG(hw, IXGBE_TDT(tx_ring->reg_idx)),
 	      tx_ring->next_to_use, next,
 	      tx_ring->tx_buffer_info[next].time_stamp, jiffies);
 
-	if (!ring_is_xdp(tx_ring))
-		netif_stop_subqueue(tx_ring->netdev,
-				    tx_ring->queue_index);
+	netif_stop_subqueue(tx_ring->netdev,
+			    tx_ring->queue_index);
 }
 
 /**
@@ -1451,6 +1445,9 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
			  total_bytes);
 	adapter->tx_ipsec += total_ipsec;
 
+	if (ring_is_xdp(tx_ring))
+		return !!budget;
+
 	if (check_for_tx_hang(tx_ring) && ixgbe_check_tx_hang(tx_ring)) {
 		if (adapter->hw.mac.type == ixgbe_mac_e610)
 			ixgbe_handle_mdd_event(adapter, tx_ring);
@@ -1468,9 +1465,6 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 		return true;
 	}
 
-	if (ring_is_xdp(tx_ring))
-		return !!budget;
-
 #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
 	txq = netdev_get_tx_queue(tx_ring->netdev, tx_ring->queue_index);
 	if (!__netif_txq_completed_wake(txq, total_packets, total_bytes,
@@ -7974,12 +7968,9 @@ static void ixgbe_check_hang_subtask(struct ixgbe_adapter *adapter)
 		return;
 
 	/* Force detection of hung controller */
-	if (netif_carrier_ok(adapter->netdev)) {
+	if (netif_carrier_ok(adapter->netdev))
 		for (i = 0; i < adapter->num_tx_queues; i++)
 			set_check_for_tx_hang(adapter->tx_ring[i]);
-		for (i = 0; i < adapter->num_xdp_queues; i++)
-			set_check_for_tx_hang(adapter->xdp_ring[i]);
-	}
 
 	if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED)) {
 		/*
@@ -8199,13 +8190,6 @@ static bool ixgbe_ring_tx_pending(struct ixgbe_adapter *adapter)
 			return true;
 	}
 
-	for (i = 0; i < adapter->num_xdp_queues; i++) {
-		struct ixgbe_ring *ring = adapter->xdp_ring[i];
-
-		if (ring->next_to_use != ring->next_to_clean)
-			return true;
-	}
-
 	return false;
 }
 
@@ -11005,6 +10989,10 @@ static int ixgbe_xdp_xmit(struct net_device *dev, int n,
 	if (unlikely(test_bit(__IXGBE_DOWN, &adapter->state)))
 		return -ENETDOWN;
 
+	if (!netif_carrier_ok(adapter->netdev) ||
+	    !netif_running(adapter->netdev))
+		return -ENETDOWN;
+
 	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
 		return -EINVAL;
 
-- 
2.47.1
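The carrier guard added above generalizes to any ndo_xdp_xmit
implementation; a minimal sketch follows, where the 'foo' names are
hypothetical stand-ins for a real driver (netif_carrier_ok(),
netif_running(), and the XDP types are the only real kernel APIs used):

#include <linux/netdevice.h>
#include <net/xdp.h>

/* Hypothetical driver state; illustrative only. */
struct foo_adapter {
	unsigned long state;
};
#define __FOO_DOWN 0

static int foo_xdp_xmit(struct net_device *dev, int n,
			struct xdp_frame **frames, u32 flags)
{
	struct foo_adapter *adapter = netdev_priv(dev);

	if (unlikely(test_bit(__FOO_DOWN, &adapter->state)))
		return -ENETDOWN;

	/* The fix above: refuse to produce descriptors while the link is
	 * down or the interface is mid-reset, so pending XDP descriptors
	 * cannot be misread as a Tx hang by the driver watchdog.
	 */
	if (!netif_carrier_ok(dev) || !netif_running(dev))
		return -ENETDOWN;

	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
		return -EINVAL;

	/* ... queue up to n frames on the XDP ring ... */
	return n;
}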