netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v3 0/1] i40e: xsk: advance next_to_clean on status descriptors
@ 2025-11-13  8:24 Alessandro Decina
  2025-11-13  8:24 ` [PATCH net v3 1/1] " Alessandro Decina
  0 siblings, 1 reply; 8+ messages in thread
From: Alessandro Decina @ 2025-11-13  8:24 UTC (permalink / raw)
  To: netdev
  Cc: Maciej Fijalkowski, David S. Miller, Alexei Starovoitov,
	Andrew Lunn, Daniel Borkmann, Eric Dumazet, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Paolo Abeni,
	Przemek Kitszel, Stanislav Fomichev, Tirthendu Sarkar,
	Tony Nguyen, bpf, intel-wired-lan, linux-kernel,
	Alessandro Decina

Hello,

As suggested by Maciej, submitting v3 which makes i40e_clean_rx_irq and
i40e_clean_rx_irq_zc use similar logic and a shared function
i40e_inc_ntp_ntc() to advance next_to_process and next_to_clean when
handling status descriptors. 

I've left the rest of the i40e_clean_rx_irq logic unchanged or this
patch would snowball. I think it'd be nice to change the function to
work with local variables and update the rx_ring only at the end like
_zc, but seems out of scope for this patch. 

Changes since v2:
 * use common utility function i40e_inc_ntp_ntc to advance indexes

Alessandro Decina (1):
  i40e: xsk: advance next_to_clean on status descriptors

 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 33 ++++++++++++-------
 .../ethernet/intel/i40e/i40e_txrx_common.h    |  2 ++
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 17 ++++++----
 3 files changed, 34 insertions(+), 18 deletions(-)


base-commit: 96a9178a29a6b84bb632ebeb4e84cf61191c73d5
-- 
2.43.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH net v3 1/1] i40e: xsk: advance next_to_clean on status descriptors
  2025-11-13  8:24 [PATCH net v3 0/1] i40e: xsk: advance next_to_clean on status descriptors Alessandro Decina
@ 2025-11-13  8:24 ` Alessandro Decina
  2025-11-14  5:49   ` [Intel-wired-lan] " Loktionov, Aleksandr
  2025-11-14 13:01   ` Maciej Fijalkowski
  0 siblings, 2 replies; 8+ messages in thread
From: Alessandro Decina @ 2025-11-13  8:24 UTC (permalink / raw)
  To: netdev
  Cc: Maciej Fijalkowski, David S. Miller, Alexei Starovoitov,
	Andrew Lunn, Daniel Borkmann, Eric Dumazet, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Paolo Abeni,
	Przemek Kitszel, Stanislav Fomichev, Tirthendu Sarkar,
	Tony Nguyen, bpf, intel-wired-lan, linux-kernel,
	Alessandro Decina

Whenever a status descriptor is received, i40e processes and skips over
it, correctly updating next_to_process but forgetting to update
next_to_clean. In the next iteration this accidentally causes the
creation of an invalid multi-buffer xdp_buff where the first fragment
is the status descriptor.

If then a skb is constructed from such an invalid buffer - because the
eBPF program returns XDP_PASS - a panic occurs:

[ 5866.367317] BUG: unable to handle page fault for address: ffd31c37eab1c980
[ 5866.375050] #PF: supervisor read access in kernel mode
[ 5866.380825] #PF: error_code(0x0000) - not-present page
[ 5866.386602] PGD 0
[ 5866.388867] Oops: Oops: 0000 [#1] SMP NOPTI
[ 5866.393575] CPU: 34 UID: 0 PID: 0 Comm: swapper/34 Not tainted 6.17.0-custom #1 PREEMPT(voluntary)
[ 5866.403740] Hardware name: Supermicro AS -2115GT-HNTR/H13SST-G, BIOS 3.2 03/20/2025
[ 5866.412339] RIP: 0010:memcpy+0x8/0x10
[ 5866.416454] Code: cc cc 90 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 48 89 f8 48 89 d1 <f3> a4 e9 fc 26 c0 fe 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
[ 5866.437538] RSP: 0018:ff428d9ec0bb0ca8 EFLAGS: 00010286
[ 5866.443415] RAX: ff2dd26dbd8f0000 RBX: ff2dd265ad161400 RCX: 00000000000004e1
[ 5866.451435] RDX: 00000000000004e1 RSI: ffd31c37eab1c980 RDI: ff2dd26dbd8f0000
[ 5866.459454] RBP: ff428d9ec0bb0d40 R08: 0000000000000000 R09: 0000000000000000
[ 5866.467470] R10: 0000000000000000 R11: 0000000000000000 R12: ff428d9eec726ef8
[ 5866.475490] R13: ff2dd26dbd8f0000 R14: ff2dd265ca2f9fc0 R15: ff2dd26548548b80
[ 5866.483509] FS:  0000000000000000(0000) GS:ff2dd2c363592000(0000) knlGS:0000000000000000
[ 5866.492600] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5866.499060] CR2: ffd31c37eab1c980 CR3: 0000000178d7b040 CR4: 0000000000f71ef0
[ 5866.507079] PKRU: 55555554
[ 5866.510125] Call Trace:
[ 5866.512867]  <IRQ>
[ 5866.515132]  ? i40e_clean_rx_irq_zc+0xc50/0xe60 [i40e]
[ 5866.520921]  i40e_napi_poll+0x2d8/0x1890 [i40e]
[ 5866.526022]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 5866.531408]  ? raise_softirq+0x24/0x70
[ 5866.535623]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 5866.541011]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 5866.546397]  ? rcu_sched_clock_irq+0x225/0x1800
[ 5866.551493]  __napi_poll+0x30/0x230
[ 5866.555423]  net_rx_action+0x20b/0x3f0
[ 5866.559643]  handle_softirqs+0xe4/0x340
[ 5866.563962]  __irq_exit_rcu+0x10e/0x130
[ 5866.568283]  irq_exit_rcu+0xe/0x20
[ 5866.572110]  common_interrupt+0xb6/0xe0
[ 5866.576425]  </IRQ>
[ 5866.578791]  <TASK>

Advance next_to_clean to ensure invalid xdp_buff(s) aren't created.

Rename i40e_inc_ntp to i40e_inc_ntp_ntc. Make it take an optional
pointer to next_to_clean so it's harder for callers to accidentally
forget to advance it.

Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support")
Signed-off-by: Alessandro Decina <alessandro.d@gmail.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 33 ++++++++++++-------
 .../ethernet/intel/i40e/i40e_txrx_common.h    |  2 ++
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 17 ++++++----
 3 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index cc0b9efc2637..d3dae895a058 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2359,15 +2359,24 @@ void i40e_finalize_xdp_rx(struct i40e_ring *rx_ring, unsigned int xdp_res)
 }
 
 /**
- * i40e_inc_ntp: Advance the next_to_process index
+ * i40e_inc_ntp_ntc: Advance the next_to_process and next_to_clean indexes
  * @rx_ring: Rx ring
+ * @next_to_process: Pointer to next_to_process
+ * @next_to_clean: Pointer to next_to_clean or NULL
+ *
+ * This function advances the next_to_process index. If next_to_clean is not
+ * NULL, it is advanced as well.
  **/
-static void i40e_inc_ntp(struct i40e_ring *rx_ring)
+void i40e_inc_ntp_ntc(struct i40e_ring *rx_ring, u16 *next_to_process,
+		      u16 *next_to_clean)
 {
-	u32 ntp = rx_ring->next_to_process + 1;
+	u16 ntp = *next_to_process + 1;
 
 	ntp = (ntp < rx_ring->count) ? ntp : 0;
-	rx_ring->next_to_process = ntp;
+	*next_to_process = ntp;
+	if (next_to_clean)
+		*next_to_clean = ntp;
+
 	prefetch(I40E_RX_DESC(rx_ring, ntp));
 }
 
@@ -2484,17 +2493,19 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget,
 			i40e_clean_programming_status(rx_ring,
 						      rx_desc->raw.qword[0],
 						      qword);
+			bool eop;
+
 			rx_buffer = i40e_rx_bi(rx_ring, ntp);
-			i40e_inc_ntp(rx_ring);
-			i40e_reuse_rx_page(rx_ring, rx_buffer);
 			/* Update ntc and bump cleaned count if not in the
 			 * middle of mb packet.
 			 */
-			if (rx_ring->next_to_clean == ntp) {
-				rx_ring->next_to_clean =
-					rx_ring->next_to_process;
+			eop = rx_ring->next_to_process ==
+			      rx_ring->next_to_clean;
+			i40e_inc_ntp_ntc(rx_ring, &rx_ring->next_to_process,
+					 eop ? &rx_ring->next_to_clean : NULL);
+			if (eop)
 				cleaned_count++;
-			}
+			i40e_reuse_rx_page(rx_ring, rx_buffer);
 			continue;
 		}
 
@@ -2507,7 +2518,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget,
 		rx_buffer = i40e_get_rx_buffer(rx_ring, size);
 
 		neop = i40e_is_non_eop(rx_ring, rx_desc);
-		i40e_inc_ntp(rx_ring);
+		i40e_inc_ntp_ntc(rx_ring, &rx_ring->next_to_process, NULL);
 
 		if (!xdp->data) {
 			unsigned char *hard_start;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
index e26807fd2123..3d7e4b3404f0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
@@ -17,6 +17,8 @@ void i40e_update_rx_stats(struct i40e_ring *rx_ring,
 			  unsigned int total_rx_packets);
 void i40e_finalize_xdp_rx(struct i40e_ring *rx_ring, unsigned int xdp_res);
 void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val);
+void i40e_inc_ntp_ntc(struct i40e_ring *rx_ring, u16 *next_to_process,
+		      u16 *next_to_clean);
 
 #define I40E_XDP_PASS		0
 #define I40E_XDP_CONSUMED	BIT(0)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 9f47388eaba5..fdf72446ed67 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -410,7 +410,6 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 	u16 next_to_clean = rx_ring->next_to_clean;
 	unsigned int xdp_res, xdp_xmit = 0;
 	struct xdp_buff *first = NULL;
-	u32 count = rx_ring->count;
 	struct bpf_prog *xdp_prog;
 	u32 entries_to_alloc;
 	bool failure = false;
@@ -430,6 +429,7 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 		struct xdp_buff *bi;
 		unsigned int size;
 		u64 qword;
+		bool neop;
 
 		rx_desc = I40E_RX_DESC(rx_ring, next_to_process);
 		qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
@@ -446,8 +446,10 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 						      qword);
 			bi = *i40e_rx_bi(rx_ring, next_to_process);
 			xsk_buff_free(bi);
-			if (++next_to_process == count)
-				next_to_process = 0;
+			i40e_inc_ntp_ntc(rx_ring, &next_to_process,
+					 next_to_process == next_to_clean ?
+						 &next_to_clean :
+						 NULL);
 			continue;
 		}
 
@@ -466,16 +468,17 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 			break;
 		}
 
-		if (++next_to_process == count)
-			next_to_process = 0;
+		neop = i40e_is_non_eop(rx_ring, rx_desc);
+		// advance next_to_process. on EOP, advance next_to_clean as well.
+		i40e_inc_ntp_ntc(rx_ring, &next_to_process,
+				 !neop ? &next_to_clean : NULL);
 
-		if (i40e_is_non_eop(rx_ring, rx_desc))
+		if (neop)
 			continue;
 
 		xdp_res = i40e_run_xdp_zc(rx_ring, first, xdp_prog);
 		i40e_handle_xdp_result_zc(rx_ring, first, rx_desc, &rx_packets,
 					  &rx_bytes, xdp_res, &failure);
-		next_to_clean = next_to_process;
 		if (failure)
 			break;
 		total_rx_packets += rx_packets;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* RE: [Intel-wired-lan] [PATCH net v3 1/1] i40e: xsk: advance next_to_clean on status descriptors
  2025-11-13  8:24 ` [PATCH net v3 1/1] " Alessandro Decina
@ 2025-11-14  5:49   ` Loktionov, Aleksandr
  2025-11-14 13:01   ` Maciej Fijalkowski
  1 sibling, 0 replies; 8+ messages in thread
From: Loktionov, Aleksandr @ 2025-11-14  5:49 UTC (permalink / raw)
  To: Alessandro Decina, netdev@vger.kernel.org
  Cc: Fijalkowski, Maciej, David S. Miller, Alexei Starovoitov,
	Andrew Lunn, Daniel Borkmann, Eric Dumazet, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Paolo Abeni,
	Kitszel, Przemyslaw, Stanislav Fomichev, Sarkar, Tirthendu,
	Nguyen, Anthony L, bpf@vger.kernel.org,
	intel-wired-lan@lists.osuosl.org, linux-kernel@vger.kernel.org



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Alessandro Decina
> Sent: Thursday, November 13, 2025 9:25 AM
> To: netdev@vger.kernel.org
> Cc: Fijalkowski, Maciej <maciej.fijalkowski@intel.com>; David S.
> Miller <davem@davemloft.net>; Alexei Starovoitov <ast@kernel.org>;
> Andrew Lunn <andrew+netdev@lunn.ch>; Daniel Borkmann
> <daniel@iogearbox.net>; Eric Dumazet <edumazet@google.com>; Jakub
> Kicinski <kuba@kernel.org>; Jesper Dangaard Brouer <hawk@kernel.org>;
> John Fastabend <john.fastabend@gmail.com>; Paolo Abeni
> <pabeni@redhat.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Stanislav Fomichev <sdf@fomichev.me>;
> Sarkar, Tirthendu <tirthendu.sarkar@intel.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; bpf@vger.kernel.org; intel-wired-
> lan@lists.osuosl.org; linux-kernel@vger.kernel.org; Alessandro Decina
> <alessandro.d@gmail.com>
> Subject: [Intel-wired-lan] [PATCH net v3 1/1] i40e: xsk: advance
> next_to_clean on status descriptors
> 
> Whenever a status descriptor is received, i40e processes and skips
> over it, correctly updating next_to_process but forgetting to update
> next_to_clean. In the next iteration this accidentally causes the
> creation of an invalid multi-buffer xdp_buff where the first fragment
> is the status descriptor.
> 
> If then a skb is constructed from such an invalid buffer - because the
> eBPF program returns XDP_PASS - a panic occurs:
> 
> [ 5866.367317] BUG: unable to handle page fault for address:
> ffd31c37eab1c980 [ 5866.375050] #PF: supervisor read access in kernel
> mode [ 5866.380825] #PF: error_code(0x0000) - not-present page [
> 5866.386602] PGD 0 [ 5866.388867] Oops: Oops: 0000 [#1] SMP NOPTI [
> 5866.393575] CPU: 34 UID: 0 PID: 0 Comm: swapper/34 Not tainted
> 6.17.0-custom #1 PREEMPT(voluntary) [ 5866.403740] Hardware name:
> Supermicro AS -2115GT-HNTR/H13SST-G, BIOS 3.2 03/20/2025 [
> 5866.412339] RIP: 0010:memcpy+0x8/0x10 [ 5866.416454] Code: cc cc 90
> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90
> 90 90 90 90 90 90 90 90 66 90 48 89 f8 48 89 d1 <f3> a4 e9 fc 26 c0 fe
> 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 [ 5866.437538] RSP:
> 0018:ff428d9ec0bb0ca8 EFLAGS: 00010286 [ 5866.443415] RAX:
> ff2dd26dbd8f0000 RBX: ff2dd265ad161400 RCX: 00000000000004e1 [
> 5866.451435] RDX: 00000000000004e1 RSI: ffd31c37eab1c980 RDI:
> ff2dd26dbd8f0000 [ 5866.459454] RBP: ff428d9ec0bb0d40 R08:
> 0000000000000000 R09: 0000000000000000 [ 5866.467470] R10:
> 0000000000000000 R11: 0000000000000000 R12: ff428d9eec726ef8 [
> 5866.475490] R13: ff2dd26dbd8f0000 R14: ff2dd265ca2f9fc0 R15:
> ff2dd26548548b80 [ 5866.483509] FS:  0000000000000000(0000)
> GS:ff2dd2c363592000(0000) knlGS:0000000000000000 [ 5866.492600] CS:
> 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5866.499060] CR2:
> ffd31c37eab1c980 CR3: 0000000178d7b040 CR4: 0000000000f71ef0 [
> 5866.507079] PKRU: 55555554 [ 5866.510125] Call Trace:
> [ 5866.512867]  <IRQ>
> [ 5866.515132]  ? i40e_clean_rx_irq_zc+0xc50/0xe60 [i40e] [
> 5866.520921]  i40e_napi_poll+0x2d8/0x1890 [i40e] [ 5866.526022]  ?
> srso_alias_return_thunk+0x5/0xfbef5
> [ 5866.531408]  ? raise_softirq+0x24/0x70 [ 5866.535623]  ?
> srso_alias_return_thunk+0x5/0xfbef5
> [ 5866.541011]  ? srso_alias_return_thunk+0x5/0xfbef5
> [ 5866.546397]  ? rcu_sched_clock_irq+0x225/0x1800 [ 5866.551493]
> __napi_poll+0x30/0x230 [ 5866.555423]  net_rx_action+0x20b/0x3f0 [
> 5866.559643]  handle_softirqs+0xe4/0x340 [ 5866.563962]
> __irq_exit_rcu+0x10e/0x130 [ 5866.568283]  irq_exit_rcu+0xe/0x20 [
> 5866.572110]  common_interrupt+0xb6/0xe0 [ 5866.576425]  </IRQ> [
> 5866.578791]  <TASK>
> 
> Advance next_to_clean to ensure invalid xdp_buff(s) aren't created.
> 
> Rename i40e_inc_ntp to i40e_inc_ntp_ntc. Make it take an optional
> pointer to next_to_clean so it's harder for callers to accidentally
> forget to advance it.
> 
> Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support")
> Signed-off-by: Alessandro Decina <alessandro.d@gmail.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 33 ++++++++++++------
> -
>  .../ethernet/intel/i40e/i40e_txrx_common.h    |  2 ++
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 17 ++++++----
>  3 files changed, 34 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index cc0b9efc2637..d3dae895a058 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -2359,15 +2359,24 @@ void i40e_finalize_xdp_rx(struct i40e_ring
> *rx_ring, unsigned int xdp_res)  }
> 
>  /**
> - * i40e_inc_ntp: Advance the next_to_process index
> + * i40e_inc_ntp_ntc: Advance the next_to_process and next_to_clean
> + indexes
>   * @rx_ring: Rx ring
> + * @next_to_process: Pointer to next_to_process
> + * @next_to_clean: Pointer to next_to_clean or NULL
> + *
> + * This function advances the next_to_process index. If next_to_clean
> + is not
> + * NULL, it is advanced as well.
>   **/
> -static void i40e_inc_ntp(struct i40e_ring *rx_ring)
> +void i40e_inc_ntp_ntc(struct i40e_ring *rx_ring, u16
> *next_to_process,
> +		      u16 *next_to_clean)
>  {
> -	u32 ntp = rx_ring->next_to_process + 1;
> +	u16 ntp = *next_to_process + 1;
> 
>  	ntp = (ntp < rx_ring->count) ? ntp : 0;
> -	rx_ring->next_to_process = ntp;
> +	*next_to_process = ntp;
> +	if (next_to_clean)
> +		*next_to_clean = ntp;
> +
>  	prefetch(I40E_RX_DESC(rx_ring, ntp));
>  }
> 
> @@ -2484,17 +2493,19 @@ static int i40e_clean_rx_irq(struct i40e_ring
> *rx_ring, int budget,
>  			i40e_clean_programming_status(rx_ring,
>  						      rx_desc->raw.qword[0],
>  						      qword);
> +			bool eop;
> +
>  			rx_buffer = i40e_rx_bi(rx_ring, ntp);
> -			i40e_inc_ntp(rx_ring);
> -			i40e_reuse_rx_page(rx_ring, rx_buffer);
>  			/* Update ntc and bump cleaned count if not in
> the
>  			 * middle of mb packet.
>  			 */
> -			if (rx_ring->next_to_clean == ntp) {
> -				rx_ring->next_to_clean =
> -					rx_ring->next_to_process;
> +			eop = rx_ring->next_to_process ==
> +			      rx_ring->next_to_clean;
> +			i40e_inc_ntp_ntc(rx_ring, &rx_ring-
> >next_to_process,
> +					 eop ? &rx_ring->next_to_clean :
> NULL);
> +			if (eop)
>  				cleaned_count++;
> -			}
> +			i40e_reuse_rx_page(rx_ring, rx_buffer);
>  			continue;
>  		}
> 
> @@ -2507,7 +2518,7 @@ static int i40e_clean_rx_irq(struct i40e_ring
> *rx_ring, int budget,
>  		rx_buffer = i40e_get_rx_buffer(rx_ring, size);
> 
>  		neop = i40e_is_non_eop(rx_ring, rx_desc);
> -		i40e_inc_ntp(rx_ring);
> +		i40e_inc_ntp_ntc(rx_ring, &rx_ring->next_to_process,
> NULL);
> 
>  		if (!xdp->data) {
>  			unsigned char *hard_start;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
> b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
> index e26807fd2123..3d7e4b3404f0 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
> @@ -17,6 +17,8 @@ void i40e_update_rx_stats(struct 

...

> --
> 2.43.0

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3 1/1] i40e: xsk: advance next_to_clean on status descriptors
  2025-11-13  8:24 ` [PATCH net v3 1/1] " Alessandro Decina
  2025-11-14  5:49   ` [Intel-wired-lan] " Loktionov, Aleksandr
@ 2025-11-14 13:01   ` Maciej Fijalkowski
  2025-11-15  7:58     ` Alessandro Decina
  1 sibling, 1 reply; 8+ messages in thread
From: Maciej Fijalkowski @ 2025-11-14 13:01 UTC (permalink / raw)
  To: Alessandro Decina
  Cc: netdev, David S. Miller, Alexei Starovoitov, Andrew Lunn,
	Daniel Borkmann, Eric Dumazet, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Paolo Abeni,
	Przemek Kitszel, Stanislav Fomichev, Tirthendu Sarkar,
	Tony Nguyen, bpf, intel-wired-lan, linux-kernel

On Thu, Nov 13, 2025 at 07:24:38PM +1100, Alessandro Decina wrote:
> Whenever a status descriptor is received, i40e processes and skips over
> it, correctly updating next_to_process but forgetting to update
> next_to_clean. In the next iteration this accidentally causes the
> creation of an invalid multi-buffer xdp_buff where the first fragment
> is the status descriptor.
> 
> If then a skb is constructed from such an invalid buffer - because the
> eBPF program returns XDP_PASS - a panic occurs:
> 
> [ 5866.367317] BUG: unable to handle page fault for address: ffd31c37eab1c980
> [ 5866.375050] #PF: supervisor read access in kernel mode
> [ 5866.380825] #PF: error_code(0x0000) - not-present page
> [ 5866.386602] PGD 0
> [ 5866.388867] Oops: Oops: 0000 [#1] SMP NOPTI
> [ 5866.393575] CPU: 34 UID: 0 PID: 0 Comm: swapper/34 Not tainted 6.17.0-custom #1 PREEMPT(voluntary)
> [ 5866.403740] Hardware name: Supermicro AS -2115GT-HNTR/H13SST-G, BIOS 3.2 03/20/2025
> [ 5866.412339] RIP: 0010:memcpy+0x8/0x10
> [ 5866.416454] Code: cc cc 90 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 48 89 f8 48 89 d1 <f3> a4 e9 fc 26 c0 fe 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
> [ 5866.437538] RSP: 0018:ff428d9ec0bb0ca8 EFLAGS: 00010286
> [ 5866.443415] RAX: ff2dd26dbd8f0000 RBX: ff2dd265ad161400 RCX: 00000000000004e1
> [ 5866.451435] RDX: 00000000000004e1 RSI: ffd31c37eab1c980 RDI: ff2dd26dbd8f0000
> [ 5866.459454] RBP: ff428d9ec0bb0d40 R08: 0000000000000000 R09: 0000000000000000
> [ 5866.467470] R10: 0000000000000000 R11: 0000000000000000 R12: ff428d9eec726ef8
> [ 5866.475490] R13: ff2dd26dbd8f0000 R14: ff2dd265ca2f9fc0 R15: ff2dd26548548b80
> [ 5866.483509] FS:  0000000000000000(0000) GS:ff2dd2c363592000(0000) knlGS:0000000000000000
> [ 5866.492600] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 5866.499060] CR2: ffd31c37eab1c980 CR3: 0000000178d7b040 CR4: 0000000000f71ef0
> [ 5866.507079] PKRU: 55555554
> [ 5866.510125] Call Trace:
> [ 5866.512867]  <IRQ>
> [ 5866.515132]  ? i40e_clean_rx_irq_zc+0xc50/0xe60 [i40e]
> [ 5866.520921]  i40e_napi_poll+0x2d8/0x1890 [i40e]
> [ 5866.526022]  ? srso_alias_return_thunk+0x5/0xfbef5
> [ 5866.531408]  ? raise_softirq+0x24/0x70
> [ 5866.535623]  ? srso_alias_return_thunk+0x5/0xfbef5
> [ 5866.541011]  ? srso_alias_return_thunk+0x5/0xfbef5
> [ 5866.546397]  ? rcu_sched_clock_irq+0x225/0x1800
> [ 5866.551493]  __napi_poll+0x30/0x230
> [ 5866.555423]  net_rx_action+0x20b/0x3f0
> [ 5866.559643]  handle_softirqs+0xe4/0x340
> [ 5866.563962]  __irq_exit_rcu+0x10e/0x130
> [ 5866.568283]  irq_exit_rcu+0xe/0x20
> [ 5866.572110]  common_interrupt+0xb6/0xe0
> [ 5866.576425]  </IRQ>
> [ 5866.578791]  <TASK>
> 
> Advance next_to_clean to ensure invalid xdp_buff(s) aren't created.
> 
> Rename i40e_inc_ntp to i40e_inc_ntp_ntc. Make it take an optional
> pointer to next_to_clean so it's harder for callers to accidentally
> forget to advance it.
> 
> Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support")
> Signed-off-by: Alessandro Decina <alessandro.d@gmail.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 33 ++++++++++++-------
>  .../ethernet/intel/i40e/i40e_txrx_common.h    |  2 ++
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 17 ++++++----
>  3 files changed, 34 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index cc0b9efc2637..d3dae895a058 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -2359,15 +2359,24 @@ void i40e_finalize_xdp_rx(struct i40e_ring *rx_ring, unsigned int xdp_res)
>  }
>  
>  /**
> - * i40e_inc_ntp: Advance the next_to_process index
> + * i40e_inc_ntp_ntc: Advance the next_to_process and next_to_clean indexes
>   * @rx_ring: Rx ring
> + * @next_to_process: Pointer to next_to_process
> + * @next_to_clean: Pointer to next_to_clean or NULL
> + *
> + * This function advances the next_to_process index. If next_to_clean is not
> + * NULL, it is advanced as well.
>   **/
> -static void i40e_inc_ntp(struct i40e_ring *rx_ring)
> +void i40e_inc_ntp_ntc(struct i40e_ring *rx_ring, u16 *next_to_process,
> +		      u16 *next_to_clean)
>  {
> -	u32 ntp = rx_ring->next_to_process + 1;
> +	u16 ntp = *next_to_process + 1;
>  
>  	ntp = (ntp < rx_ring->count) ? ntp : 0;
> -	rx_ring->next_to_process = ntp;
> +	*next_to_process = ntp;
> +	if (next_to_clean)
> +		*next_to_clean = ntp;
> +
>  	prefetch(I40E_RX_DESC(rx_ring, ntp));
>  }
>  
> @@ -2484,17 +2493,19 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget,
>  			i40e_clean_programming_status(rx_ring,
>  						      rx_desc->raw.qword[0],
>  						      qword);
> +			bool eop;
> +
>  			rx_buffer = i40e_rx_bi(rx_ring, ntp);
> -			i40e_inc_ntp(rx_ring);
> -			i40e_reuse_rx_page(rx_ring, rx_buffer);
>  			/* Update ntc and bump cleaned count if not in the
>  			 * middle of mb packet.
>  			 */
> -			if (rx_ring->next_to_clean == ntp) {
> -				rx_ring->next_to_clean =
> -					rx_ring->next_to_process;
> +			eop = rx_ring->next_to_process ==
> +			      rx_ring->next_to_clean;
> +			i40e_inc_ntp_ntc(rx_ring, &rx_ring->next_to_process,
> +					 eop ? &rx_ring->next_to_clean : NULL);

Woah, that's not what I had on mind...I meant to pull whole block that
takes care of FDIR descriptors onto common function. That logic should be
shared between normal Rx and ZC Rx. The only different action we need to
take is how we release the buffer.

Could you try pulling whole i40e_rx_is_programming_status() branch onto
function within i40e_txrx_common.h and see how much of a work would it
take to have this as a common function?

> +			if (eop)
>  				cleaned_count++;
> -			}
> +			i40e_reuse_rx_page(rx_ring, rx_buffer);
>  			continue;
>  		}
>  
> @@ -2507,7 +2518,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget,
>  		rx_buffer = i40e_get_rx_buffer(rx_ring, size);
>  
>  		neop = i40e_is_non_eop(rx_ring, rx_desc);
> -		i40e_inc_ntp(rx_ring);
> +		i40e_inc_ntp_ntc(rx_ring, &rx_ring->next_to_process, NULL);
>  
>  		if (!xdp->data) {
>  			unsigned char *hard_start;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
> index e26807fd2123..3d7e4b3404f0 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
> @@ -17,6 +17,8 @@ void i40e_update_rx_stats(struct i40e_ring *rx_ring,
>  			  unsigned int total_rx_packets);
>  void i40e_finalize_xdp_rx(struct i40e_ring *rx_ring, unsigned int xdp_res);
>  void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val);
> +void i40e_inc_ntp_ntc(struct i40e_ring *rx_ring, u16 *next_to_process,
> +		      u16 *next_to_clean);
>  
>  #define I40E_XDP_PASS		0
>  #define I40E_XDP_CONSUMED	BIT(0)
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index 9f47388eaba5..fdf72446ed67 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -410,7 +410,6 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
>  	u16 next_to_clean = rx_ring->next_to_clean;
>  	unsigned int xdp_res, xdp_xmit = 0;
>  	struct xdp_buff *first = NULL;
> -	u32 count = rx_ring->count;
>  	struct bpf_prog *xdp_prog;
>  	u32 entries_to_alloc;
>  	bool failure = false;
> @@ -430,6 +429,7 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
>  		struct xdp_buff *bi;
>  		unsigned int size;
>  		u64 qword;
> +		bool neop;
>  
>  		rx_desc = I40E_RX_DESC(rx_ring, next_to_process);
>  		qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
> @@ -446,8 +446,10 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
>  						      qword);
>  			bi = *i40e_rx_bi(rx_ring, next_to_process);
>  			xsk_buff_free(bi);
> -			if (++next_to_process == count)
> -				next_to_process = 0;
> +			i40e_inc_ntp_ntc(rx_ring, &next_to_process,
> +					 next_to_process == next_to_clean ?
> +						 &next_to_clean :
> +						 NULL);
>  			continue;
>  		}
>  
> @@ -466,16 +468,17 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
>  			break;
>  		}
>  
> -		if (++next_to_process == count)
> -			next_to_process = 0;
> +		neop = i40e_is_non_eop(rx_ring, rx_desc);
> +		// advance next_to_process. on EOP, advance next_to_clean as well.
> +		i40e_inc_ntp_ntc(rx_ring, &next_to_process,
> +				 !neop ? &next_to_clean : NULL);
>  
> -		if (i40e_is_non_eop(rx_ring, rx_desc))
> +		if (neop)
>  			continue;
>  
>  		xdp_res = i40e_run_xdp_zc(rx_ring, first, xdp_prog);
>  		i40e_handle_xdp_result_zc(rx_ring, first, rx_desc, &rx_packets,
>  					  &rx_bytes, xdp_res, &failure);
> -		next_to_clean = next_to_process;
>  		if (failure)
>  			break;
>  		total_rx_packets += rx_packets;
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3 1/1] i40e: xsk: advance next_to_clean on status descriptors
  2025-11-14 13:01   ` Maciej Fijalkowski
@ 2025-11-15  7:58     ` Alessandro Decina
  2025-11-17 16:37       ` Maciej Fijalkowski
  0 siblings, 1 reply; 8+ messages in thread
From: Alessandro Decina @ 2025-11-15  7:58 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: netdev, David S. Miller, Alexei Starovoitov, Andrew Lunn,
	Daniel Borkmann, Eric Dumazet, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Paolo Abeni,
	Przemek Kitszel, Stanislav Fomichev, Tirthendu Sarkar,
	Tony Nguyen, bpf, intel-wired-lan, linux-kernel

On Fri, Nov 14, 2025 at 02:01:14PM +0100, Maciej Fijalkowski wrote:
> Woah, that's not what I had on mind...I meant to pull whole block that
> takes care of FDIR descriptors onto common function. That logic should be
> shared between normal Rx and ZC Rx. The only different action we need to
> take is how we release the buffer.
> 
> Could you try pulling whole i40e_rx_is_programming_status() branch onto
> function within i40e_txrx_common.h and see how much of a work would it
> take to have this as a common function?

Just before I send another rev, you mean something like this? 
https://github.com/alessandrod/linux/commit/a6fa91d5b5d1cc283a2f1faa378085c44bda8b4a

My rationale for i40e_inc_ntp_ntc was that _that_ is where the bug lies:
letting ntp and ntc get out of sync. By introducing a function that
forces you to _have_ to think about ntc and explicitly pass NULL if you
don't want to sync it, bugs like this become less easy to introduce.

That said I don't mind either way! Let me know if you want me to send v4
with the i40e_clean_programming_status() change.

Ciao,
Alessandro

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3 1/1] i40e: xsk: advance next_to_clean on status descriptors
  2025-11-15  7:58     ` Alessandro Decina
@ 2025-11-17 16:37       ` Maciej Fijalkowski
  2025-11-18  6:37         ` Alessandro Decina
  0 siblings, 1 reply; 8+ messages in thread
From: Maciej Fijalkowski @ 2025-11-17 16:37 UTC (permalink / raw)
  To: Alessandro Decina
  Cc: netdev, David S. Miller, Alexei Starovoitov, Andrew Lunn,
	Daniel Borkmann, Eric Dumazet, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Paolo Abeni,
	Przemek Kitszel, Stanislav Fomichev, Tirthendu Sarkar,
	Tony Nguyen, bpf, intel-wired-lan, linux-kernel

On Sat, Nov 15, 2025 at 06:58:41PM +1100, Alessandro Decina wrote:
> On Fri, Nov 14, 2025 at 02:01:14PM +0100, Maciej Fijalkowski wrote:
> > Woah, that's not what I had on mind...I meant to pull whole block that
> > takes care of FDIR descriptors onto common function. That logic should be
> > shared between normal Rx and ZC Rx. The only different action we need to
> > take is how we release the buffer.
> > 
> > Could you try pulling whole i40e_rx_is_programming_status() branch onto
> > function within i40e_txrx_common.h and see how much of a work would it
> > take to have this as a common function?
> 
> Just before I send another rev, you mean something like this? 
> https://github.com/alessandrod/linux/commit/a6fa91d5b5d1cc283a2f1faa378085c44bda8b4a
> 
> My rationale for i40e_inc_ntp_ntc was that _that_ is where the bug lies:
> letting ntp and ntc get out of sync. By introducing a function that
> forces you to _have_ to think about ntc and explicitly pass NULL if you
> don't want to sync it, bugs like this become less easy to introduce.
> 
> That said I don't mind either way! Let me know if you want me to send v4
> with the i40e_clean_programming_status() change.

This revision is much more clear to me. Only thing that might be bothering
someone is doubled i40e_rx_bi() call in i40e_get_rx_buffer(). Not sure if
we can do about it though as we need to use ntp from before potential
increment.

...maybe pass rx_buffer to i40e_get_rx_buffer() ?

> 
> Ciao,
> Alessandro

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3 1/1] i40e: xsk: advance next_to_clean on status descriptors
  2025-11-17 16:37       ` Maciej Fijalkowski
@ 2025-11-18  6:37         ` Alessandro Decina
  2025-11-18  8:04           ` Maciej Fijalkowski
  0 siblings, 1 reply; 8+ messages in thread
From: Alessandro Decina @ 2025-11-18  6:37 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: netdev, David S. Miller, Alexei Starovoitov, Andrew Lunn,
	Daniel Borkmann, Eric Dumazet, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Paolo Abeni,
	Przemek Kitszel, Stanislav Fomichev, Tirthendu Sarkar,
	Tony Nguyen, bpf, intel-wired-lan, linux-kernel

On Mon, Nov 17, 2025 at 05:37:49PM +0100, Maciej Fijalkowski wrote:
> This revision is much more clear to me. Only thing that might be bothering
> someone is doubled i40e_rx_bi() call in i40e_get_rx_buffer(). Not sure if
> we can do about it though as we need to use ntp from before potential
> increment.
> 
> ...maybe pass rx_buffer to i40e_get_rx_buffer() ?

Surely the compiler isn't going to actually reload here, but yeah not
great code wise. How about I pass it the buffer and rename to
i40e_prepare_rx_buffer to better match what's happening now?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3 1/1] i40e: xsk: advance next_to_clean on status descriptors
  2025-11-18  6:37         ` Alessandro Decina
@ 2025-11-18  8:04           ` Maciej Fijalkowski
  0 siblings, 0 replies; 8+ messages in thread
From: Maciej Fijalkowski @ 2025-11-18  8:04 UTC (permalink / raw)
  To: Alessandro Decina
  Cc: netdev, David S. Miller, Alexei Starovoitov, Andrew Lunn,
	Daniel Borkmann, Eric Dumazet, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Paolo Abeni,
	Przemek Kitszel, Stanislav Fomichev, Tirthendu Sarkar,
	Tony Nguyen, bpf, intel-wired-lan, linux-kernel

On Tue, Nov 18, 2025 at 05:37:14PM +1100, Alessandro Decina wrote:
> On Mon, Nov 17, 2025 at 05:37:49PM +0100, Maciej Fijalkowski wrote:
> > This revision is much more clear to me. Only thing that might be bothering
> > someone is doubled i40e_rx_bi() call in i40e_get_rx_buffer(). Not sure if
> > we can do about it though as we need to use ntp from before potential
> > increment.
> > 
> > ...maybe pass rx_buffer to i40e_get_rx_buffer() ?
> 
> Surely the compiler isn't going to actually reload here, but yeah not
> great code wise. How about I pass it the buffer and rename to
> i40e_prepare_rx_buffer to better match what's happening now?

SGTM!

> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-11-18  8:05 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-13  8:24 [PATCH net v3 0/1] i40e: xsk: advance next_to_clean on status descriptors Alessandro Decina
2025-11-13  8:24 ` [PATCH net v3 1/1] " Alessandro Decina
2025-11-14  5:49   ` [Intel-wired-lan] " Loktionov, Aleksandr
2025-11-14 13:01   ` Maciej Fijalkowski
2025-11-15  7:58     ` Alessandro Decina
2025-11-17 16:37       ` Maciej Fijalkowski
2025-11-18  6:37         ` Alessandro Decina
2025-11-18  8:04           ` Maciej Fijalkowski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).