Netdev List

* RE: [PATCH net v5 1/4] net: ethernet: oa_tc6: Interrupt is active low, level triggered.
From: Selvamani Rajagopal @ 2026-06-18  4:26 UTC
  To: Parthiban.Veerasooran@microchip.com, andrew+netdev@lunn.ch,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, robh@kernel.org, krzk+dt@kernel.org,
	conor+dt@kernel.org, Piergiorgio Beruto
  Cc: andrew@lunn.ch, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Conor.Dooley@microchip.com,
	devicetree@vger.kernel.org
In-Reply-To: <7c89df6b-32ac-46c8-8400-945879037f2e@microchip.com>

> Subject: Re: [PATCH net v5 1/4] net: ethernet: oa_tc6: Interrupt is active low, level triggered.
> 
> 
> 
> Test case 2: Two LAN8651 instances on the same RPI4
> 
> Setup:
> 
> RPI4 #1 + LAN8651 (IP: 192.168.10.101) <--- RPI4 #2 + EVB-LAN8670-USB (IP: 192.168.10.102)
> RPI4 #1 + LAN8651 (IP: 192.168.20.101) <--- RPI4 #2 + EVB-LAN8670-USB (IP: 192.168.20.102)
> 
> Result:
> 

Parthiban,

It appears that we can't reproduce the crash you saw in your setup. The code has been running
all day and has hit "Receive buffer overflow error" more than 5 million times (yes, I added a
counter to see how many times the code returns the EAGAIN error code).

One obvious reason is that our EVB has only one network interface, just like your setup in
Test case 1, where you didn't see any issue.

The AI review bot Sashiko flagged one potential issue where skb pointers aren't protected, but
those concerns are in the transmit path, while this crash seems to be in the receive path. If you
think it might help, I can generate a patch for that.

What do you suggest? Since you are able to see the crash, would you have time to investigate?

Sincerely
Selva


* Re: [PATCH bpf v2] bpf, sockmap: fix use-after-free when the stream parser resizes the skb
From: Sechang Lim @ 2026-06-18  4:57 UTC
  To: Kuniyuki Iwashima
  Cc: bobbyeshleman, bpf, davem, edumazet, horms, jakub, john.fastabend,
	kuba, linux-kernel, netdev, pabeni
In-Reply-To: <20260618002559.1479884-1-kuniyu@google.com>

On Thu, Jun 18, 2026 at 12:25:57AM +0000, Kuniyuki Iwashima wrote:
>From: Sechang Lim <rhkrqnwk98@gmail.com>
>Date: Fri, 12 Jun 2026 12:35:51 +0000
>> sk_psock_strp_parse() runs the BPF_PROG_TYPE_SK_SKB stream-parser program
>> to find the length of the next message. strparser assembles a message out
>> of several received skbs by chaining them onto the head's frag_list and
>> recording where to append the next one in strp->skb_nextp:
>>
>> 	*strp->skb_nextp = skb;
>> 	strp->skb_nextp = &skb->next;
>>
>> and then calls the parser on the head:
>>
>> 	len = (*strp->cb.parse_msg)(strp, head);
>>
>> The parser is only meant to inspect the skb, but the program may call
>> bpf_skb_change_tail() -- or the sibling bpf_skb_pull_data(),
>> bpf_skb_change_head(), bpf_skb_adjust_room(), all allowed for SK_SKB.
>
>It's bpf prog's responsibility not to abuse them.
>
>Even setting aside that, why not simply block such BPF prog ?
>
>It cannot be done at load time, but doable at attach time.
>
>>

Thanks, this is cleaner than cloning. Will fix in v3.

Best,
Sechang
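The failure sequence described in the quoted patch can be illustrated with a small userspace model in C. This is a sketch under stated assumptions, not kernel code: `struct seg`, `struct strp_model`, `strp_append()`, `parser_pulls_tail()` and `run_message()` are hypothetical stand-ins for sk_buff, strp->skb_nextp and the frag_list consumption in __pskb_pull_tail(), and a `freed` flag replaces real freeing so the model itself stays memory-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one received sk_buff. */
struct seg {
	struct seg *next;
	bool freed;	/* set instead of calling free() to keep the model memory-safe */
};

/* Hypothetical stand-in for the strparser state of one message. */
struct strp_model {
	struct seg *head;
	struct seg *frag_list;		/* models skb_shinfo(head)->frag_list */
	struct seg **nextp;		/* models strp->skb_nextp */
	struct seg *nextp_owner;	/* segment nextp points into, NULL for frag_list */
	bool uaf;			/* set when a write would land in a freed segment */
};

static void strp_start(struct strp_model *s, struct seg *head)
{
	s->head = head;
	s->frag_list = NULL;
	s->nextp = &s->frag_list;
	s->nextp_owner = NULL;
	s->uaf = false;
}

/* Models "*strp->skb_nextp = skb; strp->skb_nextp = &skb->next;". */
static void strp_append(struct strp_model *s, struct seg *skb)
{
	if (s->nextp_owner && s->nextp_owner->freed)
		s->uaf = true;	/* the kernel would write 8 bytes into freed memory here */
	*s->nextp = skb;
	s->nextp = &skb->next;
	s->nextp_owner = skb;
}

/* Models a resizing parser reaching __pskb_pull_tail(): the frag_list is
 * consumed, but nextp is left pointing where it was. */
static void parser_pulls_tail(struct strp_model *s)
{
	struct seg *list;

	while ((list = s->frag_list) != NULL) {
		s->frag_list = list->next;
		list->freed = true;
	}
}

/* Feeds a message of nsegs segments, running the parser after each new one.
 * Returns true if the model saw a write through a freed segment. */
static bool run_message(int nsegs)
{
	struct seg segs[4] = { 0 };
	struct strp_model s;

	assert(nsegs >= 1 && nsegs <= 4);
	strp_start(&s, &segs[0]);
	for (int i = 1; i < nsegs; i++) {
		strp_append(&s, &segs[i]);
		parser_pulls_tail(&s);
	}
	return s.uaf;
}
```

Traced by hand, `run_message(2)` returns false and `run_message(3)` returns true: the pointer goes stale in one call and is written through in the next, which is why the quoted description says the message has to span at least three segments.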


* Re: [PATCH bpf v2] bpf, sockmap: fix use-after-free when the stream parser resizes the skb
From: Sechang Lim @ 2026-06-18  5:19 UTC
  To: Jiayuan Chen
  Cc: John Fastabend, Jakub Sitnicki, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Bobby Eshleman, netdev,
	bpf, linux-kernel
In-Reply-To: <04931588-e708-40d8-a1b7-3700a1a3b376@linux.dev>

On Thu, Jun 18, 2026 at 10:45:02AM +0800, Jiayuan Chen wrote:
>
>On 6/12/26 8:35 PM, Sechang Lim wrote:
>>sk_psock_strp_parse() runs the BPF_PROG_TYPE_SK_SKB stream-parser program
>>to find the length of the next message. strparser assembles a message out
>>of several received skbs by chaining them onto the head's frag_list and
>>recording where to append the next one in strp->skb_nextp:
>>
>>	*strp->skb_nextp = skb;
>>	strp->skb_nextp = &skb->next;
>>
>>and then calls the parser on the head:
>>
>>	len = (*strp->cb.parse_msg)(strp, head);
>>
>>The parser is only meant to inspect the skb, but the program may call
>>bpf_skb_change_tail() -- or the sibling bpf_skb_pull_data(),
>>bpf_skb_change_head(), bpf_skb_adjust_room(), all allowed for SK_SKB.
>>Once the head carries a frag_list these go
>>
>>	... -> skb_ensure_writable -> pskb_may_pull -> __pskb_pull_tail
>>
>>and __pskb_pull_tail() frees the frag_list skbs that strparser still
>>tracks through skb_nextp:
>>
>>	while ((list = skb_shinfo(skb)->frag_list) != insp) {
>>		skb_shinfo(skb)->frag_list = list->next;
>>		consume_skb(list);
>>	}
>>
>>strp->skb_nextp now points into a freed sk_buff. The next segment of
>>the same message arrives in __strp_recv(), which links it with
>>*strp->skb_nextp = skb, an 8-byte write into the freed skb. The free
>>and the write happen in different __strp_recv() calls, so the message
>>has to span at least three segments before it triggers.
>>
>>   BUG: KASAN: slab-use-after-free in __strp_recv+0x447/0xda0
>>   Write of size 8 at addr ffff88810db86140 by task repro/349
>>
>>   Call Trace:
>>    <IRQ>
>>    __strp_recv+0x447/0xda0
>>    __tcp_read_sock+0x13d/0x590
>>    tcp_bpf_strp_read_sock+0x195/0x320
>>    strp_data_ready+0x267/0x340
>>    sk_psock_strp_data_ready+0x1ce/0x350
>>    tcp_data_queue+0x1364/0x2fd0
>>    tcp_rcv_established+0xe07/0x1640
>>    [...]
>>
>>   Allocated by task 349:
>>    skb_clone+0x17b/0x210
>>    __strp_recv+0x2c3/0xda0
>>    __tcp_read_sock+0x13d/0x590
>>    [...]
>>
>>   Freed by task 349:
>>    kmem_cache_free+0x150/0x570
>>    __pskb_pull_tail+0x57b/0xc20
>>    skb_ensure_writable+0x236/0x260
>>    __bpf_skb_change_tail+0x1d4/0x590
>>    sk_skb_change_tail+0x2a/0x40
>>    bpf_prog_1b285dcd6c41373e+0x27/0x30
>>    bpf_prog_run_pin_on_cpu+0xf3/0x260
>>    sk_psock_strp_parse+0x118/0x1e0
>>    __strp_recv+0x4f6/0xda0
>>    [...]
>>
>>The same resize also leaves the head's length inconsistent with its
>>frags, so a later __pskb_pull_tail() can instead hit the
>>BUG_ON(skb_copy_bits(...)) in net/core/skbuff.c.
>>
>>Run the parser on a private clone of the head when the message spans more
>>than one skb and the program can modify the packet
>>(prog->aux->changes_pkt_data), so a resizing helper can only touch the
>>clone and strparser's head and skb_nextp stay valid. Single-skb messages
>>have no frag_list and read-only parsers cannot resize, so both are still
>>parsed in place. If the clone cannot be allocated, return 0 so the caller
>>retries on the next read rather than failing the parser.
>>
>>Fixes: 8a31db561566 ("bpf: add access to sock fields and pkt data from sk_skb programs")
>
>
>Please consider Kuniyuki Iwashima's suggestion.
>
>But it only covers the ATTACH path; the other two paths should be 
>covered as well:
>
>- BPF_PROG_ATTACH → sock_map_get_from_fd → sock_map_prog_update
>- BPF_LINK_CREATE → sock_map_link_create → sock_map_prog_update
>- replace prog → sock_map_link_update_prog
>
>A new helper for this check is probably needed, called from both
>sock_map_prog_update() and sock_map_link_update_prog().
>

Thanks, agreed. v3 will cover prog attach, link create and link update.

>
>Since this rejects the program at attach time rather than fixing a
>runtime crash, I'm not sure a Fixes tag is appropriate here - thoughts?
>

I'd keep it. The commit in the current Fixes tag is where bpf_skb_change_tail() became callable
from SK_SKB programs, which is what made the UAF reachable. Happy to drop it if you prefer.

Best,
Sechang
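A rough model of the attach-time approach discussed above, with one shared check called from every update path, is sketched in C below. Everything in it is hypothetical: `enum attach_model`, `struct prog_model`, `struct slot_model`, `check_parser_prog()` and the two `*_model()` update functions only mirror the roles of BPF_SK_SKB_STREAM_PARSER, prog->aux->changes_pkt_data, sock_map_prog_update() and sock_map_link_update_prog(), and the -EINVAL return value is an assumption, not necessarily what v3 will use.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attach types, modelled on BPF_SK_SKB_STREAM_PARSER and
 * BPF_SK_SKB_STREAM_VERDICT. */
enum attach_model { MODEL_STREAM_PARSER, MODEL_STREAM_VERDICT };

/* Hypothetical stand-in for struct bpf_prog. */
struct prog_model {
	bool changes_pkt_data;	/* models prog->aux->changes_pkt_data */
};

/* Hypothetical stand-in for the program slot a sockmap keeps per attach type. */
struct slot_model {
	const struct prog_model *prog;
};

/* Hypothetical shared check: a stream parser must not be able to resize the
 * skb. The -EINVAL value is an assumption. */
static int check_parser_prog(const struct prog_model *prog, enum attach_model type)
{
	assert(prog != NULL);
	if (type == MODEL_STREAM_PARSER && prog->changes_pkt_data)
		return -EINVAL;
	return 0;
}

/* Models the BPF_PROG_ATTACH and BPF_LINK_CREATE paths, which both reach
 * sock_map_prog_update(). */
static int prog_update_model(struct slot_model *slot, const struct prog_model *prog,
			     enum attach_model type)
{
	int err = check_parser_prog(prog, type);

	if (err)
		return err;
	slot->prog = prog;
	return 0;
}

/* Models the link replace path (sock_map_link_update_prog()); on failure the
 * previously attached program stays in place. Locking and bookkeeping that
 * differ between the two kernel paths are not modelled. */
static int link_update_prog_model(struct slot_model *slot, const struct prog_model *new_prog,
				  enum attach_model type)
{
	int err = check_parser_prog(new_prog, type);

	if (err)
		return err;
	slot->prog = new_prog;
	return 0;
}
```

Keeping the check in one helper means a replace through the link path cannot bypass what the attach path enforces.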


* RE: [Intel-wired-lan] [PATCH net v2] ice: eswitch: fix use-after-free of metadata_dst in repr release
From: Loktionov, Aleksandr @ 2026-06-18  5:20 UTC
  To: Doruk Tan Ozturk, Nguyen, Anthony L, Kitszel, Przemyslaw,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com
  Cc: michal.swiatkowski@linux.intel.com, Drewek, Wojciech,
	horms@kernel.org, intel-wired-lan@lists.osuosl.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
In-Reply-To: <20260617100556.83620-1-doruk@0sec.ai>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Doruk Tan Ozturk
> Sent: Wednesday, June 17, 2026 12:06 PM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
> Przemyslaw <przemyslaw.kitszel@intel.com>; andrew+netdev@lunn.ch;
> davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com
> Cc: michal.swiatkowski@linux.intel.com; Drewek, Wojciech
> <wojciech.drewek@intel.com>; horms@kernel.org;
> intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; Doruk Tan Ozturk <doruk@0sec.ai>;
> stable@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH net v2] ice: eswitch: fix use-after-free of metadata_dst in repr release
> 
> ice_eswitch_release_repr() frees the port representor metadata_dst via
> metadata_dst_free(), which directly kfree()s the object and ignores
> the dst_entry refcount. The eswitch slow-path TX routine
> ice_eswitch_port_start_xmit() takes a reference on this dst with
> dst_hold() and attaches it to the skb via skb_dst_set(). If such an
> skb is still in flight (e.g. queued in a qdisc) when the representor
> is torn down, the metadata_dst is freed while the skb still points at
> it. When the skb is later freed, dst_release() operates on
> already-freed memory.
> 
> Replace metadata_dst_free() with dst_release() so the metadata_dst is
> freed only after the last reference is dropped. The dst subsystem
> frees metadata_dst objects from dst_destroy() once the refcount
> reaches zero (DST_METADATA is set by metadata_dst_alloc()).
> 
> Same class of bug and fix as commit c32b26aaa2f9 ("netfilter:
> nft_tunnel: fix use-after-free on object destroy").
> 
> Fixes: 1a1c40df2e80 ("ice: set and release switchdev environment")
> Cc: stable@vger.kernel.org
> Signed-off-by: Doruk Tan Ozturk <doruk@0sec.ai>
> Reviewed-by: Simon Horman <horms@kernel.org>
> ---
>  v2:
>   - Correct the Fixes: tag to the commit that introduced the switchdev
>     teardown (Simon Horman); add his Reviewed-by. No functional change.
>  v1: https://lore.kernel.org/netdev/20260615140532.52676-1-doruk@0sec.ai/
> 
>  drivers/net/ethernet/intel/ice/ice_eswitch.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.c b/drivers/net/ethernet/intel/ice/ice_eswitch.c
> index 2e4f0969035f..41b30a7ca4a9 100644
> --- a/drivers/net/ethernet/intel/ice/ice_eswitch.c
> +++ b/drivers/net/ethernet/intel/ice/ice_eswitch.c
> @@ -95,7 +95,7 @@ ice_eswitch_release_repr(struct ice_pf *pf, struct ice_repr *repr)
>  		return;
> 
>  	ice_vsi_update_security(vsi, ice_vsi_ctx_set_antispoof);
> -	metadata_dst_free(repr->dst);
> +	dst_release(&repr->dst->dst);
>  	repr->dst = NULL;
>  	ice_fltr_add_mac_and_broadcast(vsi, repr->parent_mac,
>  				       ICE_FWD_TO_VSI);
> --
> 2.43.0

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
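
The difference between the two teardown calls comes down to reference counting. The C sketch below is a hypothetical userspace model, not the kernel's dst API: `struct dst_model` and the `dst_model_*()` functions only mimic the roles of dst_hold() and dst_release(), with the representor owning one reference and an in-flight skb holding another.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model of a refcounted metadata_dst; not the
 * kernel's struct dst_entry. */
struct dst_model {
	int refcnt;
	bool *freed;	/* lets the caller observe when the object is freed */
};

static struct dst_model *dst_model_alloc(bool *freed)
{
	struct dst_model *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->refcnt = 1;	/* the reference owned by the representor */
	d->freed = freed;
	*freed = false;
	return d;
}

/* Like dst_hold(): the slow-path TX routine takes a reference for the skb. */
static void dst_model_hold(struct dst_model *d)
{
	d->refcnt++;
}

/* Like dst_release(): the object is freed only when the last reference goes. */
static void dst_model_release(struct dst_model *d)
{
	assert(d->refcnt > 0);
	if (--d->refcnt == 0) {
		*d->freed = true;
		free(d);
	}
}
```

With an skb still in flight, the representor's release leaves the object alive and the skb's later release frees it. Freeing unconditionally at teardown, as metadata_dst_free() does, would instead leave the skb pointing at freed memory.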



* RE: [Intel-wired-lan] [PATCH net v2] ice: Fix use-after-scope in ice_sched_add_nodes_to_layer()
From: Loktionov, Aleksandr @ 2026-06-18  5:21 UTC
  To: NeKon69, Nguyen, Anthony L, Kitszel, Przemyslaw
  Cc: andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, horms@kernel.org,
	Kwapulinski, Piotr, intel-wired-lan@lists.osuosl.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260617072155.1172432-1-nobodqwe@gmail.com>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of NeKon69
> Sent: Wednesday, June 17, 2026 9:22 AM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
> Przemyslaw <przemyslaw.kitszel@intel.com>
> Cc: andrew+netdev@lunn.ch; davem@davemloft.net; edumazet@google.com;
> kuba@kernel.org; pabeni@redhat.com; horms@kernel.org; Kwapulinski,
> Piotr <piotr.kwapulinski@intel.com>; intel-wired-lan@lists.osuosl.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; NeKon69
> <nobodqwe@gmail.com>
> Subject: [Intel-wired-lan] [PATCH net v2] ice: Fix use-after-scope in ice_sched_add_nodes_to_layer()
> 
> Commit 7fb09a737536 ("ice: Modify recursive way of adding nodes")
> changed ice_sched_add_nodes_to_layer() from recursive control flow to
> an iterative loop.
> 
> Inside the loop, first_teid_ptr may be set to the address of a
> block-local variable:
> 
>     u32 temp;
>     ...
>     if (num_added)
>         first_teid_ptr = &temp;
> 
> On the next loop iteration, first_teid_ptr may be passed to
> ice_sched_add_nodes_to_hw_layer(), after temp from the previous
> iteration has gone out of scope.
> 
> Instead of keeping temporary storage for later calls, allow
> first_node_teid to be NULL when the caller does not need the TEID.
> 
> This was found by Clang with LifetimeSafety enabled while testing C
> language support on a Linux allmodconfig build.
> 
> Fixes: 7fb09a737536 ("ice: Modify recursive way of adding nodes")
> Link: https://github.com/llvm/llvm-project/pull/203270
> Signed-off-by: NeKon69 <nobodqwe@gmail.com>
> ---
> v2:
> - Allow first_node_teid to be NULL when callers do not need the TEID.
> - Pass NULL after the first TEID has already been returned instead of
>   using temporary stack storage.
> - Update kernel-doc for helpers accepting NULL.
> - Link to v1: https://lore.kernel.org/netdev/20260613101440.80190-1-nobodqwe@gmail.com/
> - Compile-tested with:
>   make drivers/net/ethernet/intel/ice/ice_sched.o
> 
>  drivers/net/ethernet/intel/ice/ice_sched.c | 16 +++++++---------
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_sched.c b/drivers/net/ethernet/intel/ice/ice_sched.c
> index fff0c1afdb41..89e191c839b1 100644
> --- a/drivers/net/ethernet/intel/ice/ice_sched.c
> +++ b/drivers/net/ethernet/intel/ice/ice_sched.c
> @@ -895,7 +895,8 @@ void ice_sched_cleanup_all(struct ice_hw *hw)
>   * @layer: layer number to add nodes
>   * @num_nodes: number of nodes
>   * @num_nodes_added: pointer to num nodes added
> - * @first_node_teid: if new nodes are added then return the TEID of first node
> + * @first_node_teid: if new nodes are added then return the TEID of first node,
> + *                   may be NULL
>   * @prealloc_nodes: preallocated nodes struct for software DB
>   *
>   * This function add nodes to HW as well as to SW DB for a given layer
> @@ -1000,7 +1001,7 @@ ice_sched_add_elems(struct ice_port_info *pi, struct ice_sched_node *tc_node,
>  		if (!pi->sib_head[tc_node->tc_num][layer])
>  			pi->sib_head[tc_node->tc_num][layer] = new_node;
> 
> -		if (i == 0)
> +		if (first_node_teid && i == 0)
>  			*first_node_teid = teid;
>  	}
> 
> @@ -1015,7 +1016,7 @@ ice_sched_add_elems(struct ice_port_info *pi, struct ice_sched_node *tc_node,
>   * @parent: pointer to parent node
>   * @layer: layer number to add nodes
>   * @num_nodes: number of nodes to be added
> - * @first_node_teid: pointer to the first node TEID
> + * @first_node_teid: pointer to the first node TEID, may be NULL
>   * @num_nodes_added: pointer to number of nodes added
>   *
>   * Add nodes into specific HW layer.
> @@ -1078,7 +1079,6 @@ ice_sched_add_nodes_to_layer(struct ice_port_info *pi,
>  	*num_nodes_added = 0;
>  	while (*num_nodes_added < num_nodes) {
>  		u16 max_child_nodes, num_added = 0;
> -		u32 temp;
> 
>  		status = ice_sched_add_nodes_to_hw_layer(pi, tc_node, parent,
>  							 layer, new_num_nodes,
> @@ -1109,13 +1109,11 @@ ice_sched_add_nodes_to_layer(struct ice_port_info *pi,
>  			 * try the next available sibling.
>  			 */
>  			parent = ice_sched_find_next_vsi_node(parent);
> -			/* Don't modify the first node TEID memory if the
> -			 * first node was added already in the above call.
> -			 * Instead send some temp memory for all other
> -			 * recursive calls.
> +			/* Don't modify the first node TEID memory if the first node
> +			 * was added already in the above call.
>  			 */
>  			if (num_added)
> -				first_teid_ptr = &temp;
> +				first_teid_ptr = NULL;
> 
>  			new_num_nodes = num_nodes - *num_nodes_added;
>  		}
> --
> 2.54.0

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
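
The pattern the patch switches to, an optional out-parameter that may be NULL instead of a pointer to a loop-local temporary, can be sketched in C as follows. `add_elems_model()` and `add_nodes_model()` are hypothetical simplifications of ice_sched_add_elems() and ice_sched_add_nodes_to_layer(); the batch size, the starting TEID of 100 and the absence of any hardware calls are assumptions made to keep the model self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ice_sched_add_elems(): hands out n consecutive
 * TEIDs and reports the first one only if the caller asked for it. */
static void add_elems_model(uint32_t *next_teid, uint16_t n, uint32_t *first_node_teid)
{
	for (uint16_t i = 0; i < n; i++) {
		uint32_t teid = (*next_teid)++;

		if (first_node_teid && i == 0)
			*first_node_teid = teid;
	}
}

/* Hypothetical loop modelled on ice_sched_add_nodes_to_layer(): nodes are
 * added in batches of at most max_per_parent (one batch per parent), and
 * only the first batch reports its TEID. */
static uint16_t add_nodes_model(uint16_t num_nodes, uint16_t max_per_parent,
				uint32_t *first_node_teid)
{
	uint32_t next_teid = 100;	/* arbitrary first TEID for the model */
	uint32_t *first_teid_ptr = first_node_teid;
	uint16_t added = 0;

	assert(max_per_parent > 0);
	while (added < num_nodes) {
		uint16_t batch = num_nodes - added;

		if (batch > max_per_parent)
			batch = max_per_parent;
		add_elems_model(&next_teid, batch, first_teid_ptr);
		added += batch;
		/* Later batches get NULL, so no address of a block-scoped
		 * variable has to outlive its iteration. */
		first_teid_ptr = NULL;
	}
	return added;
}
```

For example, adding 10 nodes with at most 4 per parent takes three batches, and only the first one writes the caller's TEID (100 in this model).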



* RE: [Intel-wired-lan] [PATCH] idpf: bound interrupt-vector register fill to the allocated array
From: Loktionov, Aleksandr @ 2026-06-18  5:22 UTC
  To: Michael Bommarito, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Hay, Joshua A, Pavan Kumar Linga, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20260617215754.1117178-1-michael.bommarito@gmail.com>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Michael Bommarito
> Sent: Wednesday, June 17, 2026 11:58 PM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
> Przemyslaw <przemyslaw.kitszel@intel.com>; Hay, Joshua A
> <joshua.a.hay@intel.com>; Pavan Kumar Linga
> <pavan.kumar.linga@intel.com>; Andrew Lunn <andrew+netdev@lunn.ch>;
> David S . Miller <davem@davemloft.net>; Eric Dumazet
> <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni
> <pabeni@redhat.com>
> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH] idpf: bound interrupt-vector register fill to the allocated array
> 
> idpf_get_reg_intr_vecs() fills the caller-allocated reg_vals[] array
> from the VIRTCHNL2_OP_ALLOC_VECTORS reply in adapter->req_vec_chunks,
> bounding its inner loop only by the per-chunk num_vectors. The array
> is sized separately: idpf_intr_reg_init() allocates
> kzalloc_objs(struct idpf_vec_regs, total_vecs) from
> caps.num_allocated_vectors and only
> checks the returned count after the fill. The sum of per-chunk
> num_vectors is never reconciled against total_vecs, so a reply with a
> small num_allocated_vectors but chunks summing higher writes past the
> end of reg_vals[].
> 
> Impact: a control plane (a PF or hypervisor device model) that returns
> a VIRTCHNL2_OP_ALLOC_VECTORS reply whose per-chunk num_vectors sum
> exceeds num_allocated_vectors writes struct idpf_vec_regs entries past
> the end of the reg_vals kmalloc allocation (KASAN slab-out-of-bounds
> write).
> 
> Bound the fill loop to the array capacity passed in by the callers,
> mirroring the sibling idpf_vport_get_q_reg(). The existing
> num_regs < num_vecs check then rejects an undersized reply without the
> out-of-bounds write happening first.
> 
> Fixes: d4d558718266 ("idpf: initialize interrupts and enable vport")
> Assisted-by: Claude:claude-opus-4-7
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> ---
> The reply originates from the control plane (a PF or hypervisor device
> model), which is trusted in a standard deployment, so this is a
> defense-in-depth / robustness fix: it bounds a malformed or internally
> inconsistent ALLOC_VECTORS reply. It is a genuine trust-boundary
> crossing only where the guest distrusts the control plane (a
> confidential VM or an Intel IPU posture) or the control plane is
> simply buggy. It is not remotely or unprivileged-reachable.
> 
> Reproduced with a KUnit harness that calls the unmodified
> idpf_get_reg_intr_vecs() against a crafted req_vec_chunks reply
> (num_allocated_vectors = 1, four chunks of sixteen vectors) under KASAN:
> stock reports a slab-out-of-bounds write 0 bytes past a 12-byte
> kmalloc-16 object and the test fails; the patched build is KASAN-clean;
> a well-formed 64-vector reply still fills 64 entries on both.
> The KUnit wiring is repro-only scaffolding, not part of this patch;
> harness on request.
> 
>  drivers/net/ethernet/intel/idpf/idpf_dev.c      | 2 +-
>  drivers/net/ethernet/intel/idpf/idpf_vf_dev.c   | 2 +-
>  drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 5 +++--
>  drivers/net/ethernet/intel/idpf/idpf_virtchnl.h | 2 +-
>  4 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_dev.c b/drivers/net/ethernet/intel/idpf/idpf_dev.c
> index 1a0c71c95ef12..4079a787657f1 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_dev.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_dev.c
> @@ -87,7 +87,7 @@ static int idpf_intr_reg_init(struct idpf_vport *vport,
>  	if (!reg_vals)
>  		return -ENOMEM;
> 
> -	num_regs = idpf_get_reg_intr_vecs(adapter, reg_vals);
> +	num_regs = idpf_get_reg_intr_vecs(adapter, reg_vals, total_vecs);
>  	if (num_regs < num_vecs) {
>  		err = -EINVAL;
>  		goto free_reg_vals;
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c b/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c
> index a07d7e808ca9b..6726084f6cfa0 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c
> @@ -86,7 +86,7 @@ static int idpf_vf_intr_reg_init(struct idpf_vport *vport,
>  	if (!reg_vals)
>  		return -ENOMEM;
> 
> -	num_regs = idpf_get_reg_intr_vecs(adapter, reg_vals);
> +	num_regs = idpf_get_reg_intr_vecs(adapter, reg_vals, total_vecs);
>  	if (num_regs < num_vecs) {
>  		err = -EINVAL;
>  		goto free_reg_vals;
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> index be66f9b2e101c..ec7330603ff84 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> @@ -1318,11 +1318,12 @@ idpf_vport_init_queue_reg_chunks(struct idpf_vport_config *vport_config,
>   * idpf_get_reg_intr_vecs - Get vector queue register offset
>   * @adapter: adapter structure to get the vector chunks
>   * @reg_vals: Register offsets to store in
> + * @num_vecs: number of entries the @reg_vals array can hold
>   *
>   * Return: number of registers that got populated
>   */
>  int idpf_get_reg_intr_vecs(struct idpf_adapter *adapter,
> -			   struct idpf_vec_regs *reg_vals)
> +			   struct idpf_vec_regs *reg_vals, int num_vecs)
>  {
>  	struct virtchnl2_vector_chunks *chunks;
>  	struct idpf_vec_regs reg_val;
> @@ -1346,7 +1347,7 @@ int idpf_get_reg_intr_vecs(struct idpf_adapter *adapter,
>  		dynctl_reg_spacing = le32_to_cpu(chunk->dynctl_reg_spacing);
>  		itrn_reg_spacing = le32_to_cpu(chunk->itrn_reg_spacing);
> 
> -		for (i = 0; i < num_vec; i++) {
> +		for (i = 0; i < num_vec && num_regs < num_vecs; i++) {
>  			reg_vals[num_regs].dyn_ctl_reg = reg_val.dyn_ctl_reg;
>  			reg_vals[num_regs].itrn_reg = reg_val.itrn_reg;
>  			reg_vals[num_regs].itrn_index_spacing =
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.h b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.h
> index 6876e3ed9d1be..9b1c9c86f6eac 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.h
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.h
> @@ -104,7 +104,7 @@ int idpf_vc_core_init(struct idpf_adapter *adapter);
>  void idpf_vc_core_deinit(struct idpf_adapter *adapter);
> 
>  int idpf_get_reg_intr_vecs(struct idpf_adapter *adapter,
> -			   struct idpf_vec_regs *reg_vals);
> +			   struct idpf_vec_regs *reg_vals, int num_vecs);
>  int idpf_queue_reg_init(struct idpf_vport *vport,
>  			struct idpf_q_vec_rsrc *rsrc,
>  			struct idpf_queue_id_reg_info *chunks);
> --
> 2.53.0

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
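
The effect of the added bound can be shown with a small userspace model in C. It is a hypothetical sketch, not driver code: `struct vec_chunk_model`, `struct vec_regs_model` and `fill_regs_model()` only mirror the roles of the ALLOC_VECTORS reply chunks, struct idpf_vec_regs and idpf_get_reg_intr_vecs(), and the register offsets used with it are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for one reply chunk and one per-vector entry. */
struct vec_chunk_model {
	uint16_t num_vectors;
	uint32_t start_reg;
	uint32_t reg_spacing;
};

struct vec_regs_model {
	uint32_t dyn_ctl_reg;
};

/* Fills reg_vals[] from the chunks but never writes more than cap entries,
 * the bound the patch adds through the new num_vecs argument. */
static int fill_regs_model(const struct vec_chunk_model *chunks, int num_chunks,
			   struct vec_regs_model *reg_vals, int cap)
{
	int num_regs = 0;

	for (int j = 0; j < num_chunks; j++) {
		uint32_t reg = chunks[j].start_reg;

		for (int i = 0; i < chunks[j].num_vectors && num_regs < cap; i++) {
			reg_vals[num_regs].dyn_ctl_reg = reg;
			reg += chunks[j].reg_spacing;
			num_regs++;
		}
	}
	assert(num_regs <= cap);
	return num_regs;
}
```

With the reply shape from the KUnit reproducer (capacity 1, four chunks of sixteen vectors) the fill stops after one entry, leaving the caller's existing num_regs < num_vecs check to reject the reply; a well-formed 64-vector reply with capacity 64 still yields 64 entries.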



* Re: [linux-next:master] [selftests]  a3f88d89f6: kernel-selftests-bpf.net.test_bridge_neigh_suppress.sh.arping.fail
From: Oliver Sang @ 2026-06-18  5:43 UTC
  To: Danielle Ratson
  Cc: oe-lkp@lists.linux.dev, lkp@intel.com, Jakub Kicinski,
	Nikolay Aleksandrov, netdev@vger.kernel.org, oliver.sang
In-Reply-To: <SJ2PR12MB9008502979A1D3566074A273D8E42@SJ2PR12MB9008.namprd12.prod.outlook.com>

hi, Danielle,

On Wed, Jun 17, 2026 at 07:29:48AM +0000, Danielle Ratson wrote:
> Hi Oliver,
> 
> Thank you for confirming the arping version. ARPing 2.25 by Thomas Habets has incompatible semantics for several flags that the test relies on (-D, -b, -U, -A).
> So, the failures are a tool version issue rather than a kernel regression.
> 
> This is not limited to the new commit: looking at the 56 failures in the added log, the other arping-based test cases that existed before commit a3f88d89f698 are also failing, which confirms the root cause.
> 
> The same assumption (iputils arping) is made by other net selftests as well:
> test_vxlan_nh.sh, forwarding/vxlan_asymmetric.sh, and arp_ndisc_untracked_subnets.sh all use iputils-specific flags.
> 
> There are two options to address this on your end:
> 
> 1. Install iputils-arping and ensure it takes precedence over ARPing 2.25 in PATH. Running "arping -V" should then show "iputils".

Thanks a lot for the information and guidance! We will definitely install
iputils-arping.

> 2. If that is not feasible, I can send a fix that adds an iputils version check to test_bridge_neigh_suppress.sh, causing it to SKIP cleanly when iputils arping is not present:
> 
> if ! arping -V 2>&1 | grep -q "iputils"; then
>      echo "SKIP: Test requires iputils arping"
>      exit $ksft_skip
> fi
> 
> This will result in a SKIP rather than a FAIL for your environment.

There is no need to add a fix for our environment, IMHO. The purpose of this kernel test robot
is to bisect regressions and report the first bad commit (fbc) to the Linux kernel community to
help developers improve kernel code quality. We need to enable as many test cases as possible
for this purpose, so we will try option #1 above :)

thanks a lot!


> 
> Thanks,
> Danielle
> 
> > -----Original Message-----
> > From: Oliver Sang <oliver.sang@intel.com>
> > Sent: Monday, 15 June 2026 16:03
> > To: Danielle Ratson <danieller@nvidia.com>
> > Cc: oe-lkp@lists.linux.dev; lkp@intel.com; Jakub Kicinski <kuba@kernel.org>;
> > Nikolay Aleksandrov <razor@blackwall.org>; netdev@vger.kernel.org;
> > oliver.sang@intel.com
> > Subject: Re: [linux-next:master] [selftests] a3f88d89f6: kernel-selftests-bpf.net.test_bridge_neigh_suppress.sh.arping.fail
> > 
> > hi, Danielle,
> > 
> > On Thu, Jun 11, 2026 at 11:44:39AM +0000, Danielle Ratson wrote:
> > > Hi Oliver,
> > >
> > > Thank you for the report.
> > >
> > > The failures appear to be caused by an arping tool version mismatch.
> > > The test was written assuming iputils arping semantics, but not all
> > > distributions ship that version. Different arping implementations have
> > > incompatible behavior for the flags used throughout
> > > test_bridge_neigh_suppress.sh.
> > >
> > > Looking at the added log, the 56 failures are not limited to the
> > > neigh_suppress_arp_probe section.
> > > The other arping-based test cases in the file are also affected, which is
> > consistent with a tool version issue rather than a kernel regression.
> > >
> > > To confirm the root cause on your end, please share the results for running
> > > the below:
> > > $ arping -V
> > > $ ./test_bridge_neigh_suppress.sh -t neigh_suppress_arp -v
> > 
> > Sorry for the late reply.
> > 
> > Our tests run in an automated framework, so I had to add some code to print the
> > above information, but so far it only generates the output below.
> > Before we try further, could you advise whether this information is
> > enough?
> > 
> > KERNEL SELFTESTS: linux_headers_dir is /usr/src/linux-headers-x86_64-rhel-9.4-bpf-a3f88d89f698743a8cd91fb43f997e2d292a168d
> > ### arping -V
> > arping: option requires an argument -- 'V'
> > ARPing 2.25, by Thomas Habets <thomas@habets.se>
> > usage: arping [ -0aAbdDeFpPqrRuUvzZ ] [ -w <sec> ] [ -W <sec> ] [ -S <host/ip> ]
> >               [ -T <host/ip ] [ -s <MAC> ] [ -t <MAC> ] [ -c <count> ]
> >               [ -C <count> ] [ -i <interface> ] [ -m <type> ] [ -g <group> ]
> >               [ -V <vlan> ] [ -Q <priority> ] <host/ip/MAC | -B>
> > For complete usage info, use --help or check the manpage.
> > ### ./test_bridge_neigh_suppress.sh -t neigh_suppress_arp -v
> >                                                 <-------- there seems to be no output here
> > Per-port ARP suppression - VLAN 10              <-------- the tests seem to have already started
> > ----------------------------------
> > COMMAND: tc -n sw1-U1mYwE qdisc replace dev vx0 clsact
> > 
> > 
> > >
> > > Thanks,
> > > Danielle
> > >
> > > > -----Original Message-----
> > > > From: kernel test robot <oliver.sang@intel.com>
> > > > Sent: Thursday, 11 June 2026 10:23
> > > > To: Danielle Ratson <danieller@nvidia.com>
> > > > Cc: oe-lkp@lists.linux.dev; lkp@intel.com; Jakub Kicinski
> > > > <kuba@kernel.org>; Nikolay Aleksandrov <razor@blackwall.org>;
> > > > netdev@vger.kernel.org; oliver.sang@intel.com
> > > > Subject: [linux-next:master] [selftests] a3f88d89f6: kernel-selftests-bpf.net.test_bridge_neigh_suppress.sh.arping.fail
> > > >
> > > >
> > > > hi, Danielle Ratson,
> > > >
> > > > For the newly added tests, we still see some failures in our runs; not
> > > > sure whether we missed any dependencies? Thanks.
> > > >
> > > >
> > > > Hello,
> > > >
> > > > kernel test robot noticed "kernel-selftests-bpf.net.test_bridge_neigh_suppress.sh.arping.fail" on:
> > > >
> > > > commit: a3f88d89f698743a8cd91fb43f997e2d292a168d ("selftests: net:
> > > > Add tests for ARP probe and DAD NS handling")
> > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git
> > > > master
> > > >
> > > > in testcase: kernel-selftests-bpf
> > > > version:
> > > > with following parameters:
> > > >
> > > > 	group: net
> > > >
> > > >
> > > > config: x86_64-rhel-9.4-bpf
> > > > compiler: gcc-14
> > > > test machine: 16 threads Intel(R) Core(TM) i7-13620H (Raptor Lake)
> > > > with 32G memory
> > > >
> > > > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > > >
> > > >
> > > >
> > > > If you fix the issue in a separate patch/commit (i.e. not just a new
> > > > version of the same patch/commit), kindly add following tags
> > > > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > > > | Closes: https://lore.kernel.org/oe-lkp/202606110955.8f29025d-lkp@intel.com
> > > >
> > > >
> > > > # timeout set to 3600
> > > > # selftests: net: test_bridge_neigh_suppress.sh
> > > > #
> > > >
> > > > [...]
> > > >
> > > > #
> > > > # Per-port ARP probe suppression
> > > > # ------------------------------
> > > > # TEST: ARP probe suppression                                         [FAIL]
> > > > # TEST: "neigh_suppress" is on                                        [ OK ]
> > > > # TEST: ARP probe suppression                                         [FAIL]
> > > > # TEST: FDB and neighbor entry installation                           [ OK ]
> > > > # TEST: arping                                                        [FAIL]
> > > > # TEST: ARP probe suppression                                         [FAIL]
> > > > # TEST: neighbor removal                                              [ OK ]
> > > > # TEST: ARP probe suppression                                         [FAIL]
> > > > # TEST: "neigh_suppress" is off                                       [ OK ]
> > > > # TEST: ARP probe suppression                                         [FAIL]
> > > > #
> > > > # Per-port DAD NS suppression
> > > > # ---------------------------
> > > > # TEST: DAD NS suppression                                            [ OK ]
> > > > # TEST: "neigh_suppress" is on                                        [ OK ]
> > > > # TEST: DAD NS suppression                                            [ OK ]
> > > > # TEST: FDB and neighbor entry installation                           [ OK ]
> > > > # TEST: DAD NS suppression                                            [ OK ]
> > > > # TEST: DAD NS proxy NA reply                                         [ OK ]
> > > > # TEST: neighbor removal                                              [ OK ]
> > > > # TEST: DAD NS suppression                                            [ OK ]
> > > > # TEST: "neigh_suppress" is off                                       [ OK ]
> > > > # TEST: DAD NS suppression                                            [ OK ]
> > > > #
> > > > # Tests passed: 124
> > > > # Tests failed:  56
> > > > not ok 110 selftests: net: test_bridge_neigh_suppress.sh # exit=1
> > > >
> > > >
> > > >
> > > > The kernel config and materials to reproduce are available at:
> > > > https://download.01.org/0day-ci/archive/20260611/202606110955.8f29025d-lkp@intel.com
> > > >
> > > >
> > > >
> > > > --
> > > > 0-DAY CI Kernel Test Service
> > > > https://github.com/intel/lkp-tests/wiki
> > >


* RE: [Intel-wired-lan] [PATCH net v2 1/2] ice: dpll: set pointers to NULL after kfree in ice_dpll_deinit_info
From: Rinitha, SX @ 2026-06-18  5:52 UTC (permalink / raw)
  To: ZhaoJinming, Nguyen, Anthony L, Kitszel, Przemyslaw, Andrew Lunn,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20260529053733.764996-2-zhaojinming@uniontech.com>

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of ZhaoJinming
> Sent: 29 May 2026 11:08
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; Andrew Lunn <andrew+netdev@lunn.ch>; David S . Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>
> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; ZhaoJinming <zhaojinming@uniontech.com>
> Subject: [Intel-wired-lan] [PATCH net v2 1/2] ice: dpll: set pointers to NULL after kfree in ice_dpll_deinit_info
>
> ice_dpll_deinit_info() calls kfree() on several pf->dplls fields (inputs, outputs, eec.input_prio, pps.input_prio) but does not set the pointers to NULL afterward. This leaves dangling pointers in the
> pf->dplls structure.
>
> While not currently exploitable through existing code paths, this is unsafe because:
>
> 1. If ice_dpll_init_info() is called again after a deinit (e.g. during
>   driver recovery), and a subsequent allocation within init fails, the
>   error path will jump to deinit_info and call ice_dpll_deinit_info()
>   again. Since some pointers still hold the old freed addresses, this
>   would result in a double-free.
>
> 2. Any future code that checks these pointers before use or after free
>   would be unprotected against use-after-free.
>
> Follow the common kernel convention of setting pointers to NULL after
> kfree() so that:
> - kfree(NULL) is a safe no-op, preventing double-free
> - NULL checks on these pointers become meaningful
>
> This is a preparatory fix for a subsequent patch that routes additional error paths in ice_dpll_init_info() to the deinit_info label.
>
> Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu")
> Signed-off-by: ZhaoJinming <zhaojinming@uniontech.com>
> ---
> drivers/net/ethernet/intel/ice/ice_dpll.c | 4 ++++
> 1 file changed, 4 insertions(+)
>

Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)


* RE: [Intel-wired-lan] [PATCH net v2 2/2] ice: dpll: fix memory leak in ice_dpll_init_info error paths
From: Rinitha, SX @ 2026-06-18  5:52 UTC (permalink / raw)
  To: ZhaoJinming, Nguyen, Anthony L, Kitszel, Przemyslaw, Andrew Lunn,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20260529053733.764996-3-zhaojinming@uniontech.com>

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of ZhaoJinming
> Sent: 29 May 2026 11:08
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; Andrew Lunn <andrew+netdev@lunn.ch>; David S . Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>
> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; ZhaoJinming <zhaojinming@uniontech.com>
> Subject: [Intel-wired-lan] [PATCH net v2 2/2] ice: dpll: fix memory leak in ice_dpll_init_info error paths
>
> Several error return paths in ice_dpll_init_info() directly return without freeing previously allocated resources, causing memory leaks:
>
> - When de->input_prio allocation fails, d->inputs is leaked
> - When dp->input_prio allocation fails, d->inputs and de->input_prio
>  are leaked
> - When ice_get_cgu_rclk_pin_info() fails, all previously allocated
>  inputs/outputs/input_prio are leaked
> - When ice_dpll_init_pins_info(RCLK_INPUT) fails, same resources
>  are leaked
>
> Fix this by jumping to the deinit_info label which properly calls
> ice_dpll_deinit_info() to free all allocated resources.
>
> Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu")
> Signed-off-by: ZhaoJinming <zhaojinming@uniontech.com>
> ---
> drivers/net/ethernet/intel/ice/ice_dpll.c | 16 ++++++++++------
> 1 file changed, 10 insertions(+), 6 deletions(-)
>

Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)


* Re: [PATCH bpf v2] bpf, sockmap: fix use-after-free when the stream parser resizes the skb
From: Sechang Lim @ 2026-06-18  5:58 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: John Fastabend, Jakub Sitnicki, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, bpf,
	linux-kernel
In-Reply-To: <ajMhV7O5YfYbwQzE@devvm29614.prn0.facebook.com>

On Wed, Jun 17, 2026 at 03:36:07PM -0700, Bobby Eshleman wrote:
>On Fri, Jun 12, 2026 at 12:35:51PM +0000, Sechang Lim wrote:
>> sk_psock_strp_parse() runs the BPF_PROG_TYPE_SK_SKB stream-parser program
>> to find the length of the next message. strparser assembles a message out
>> of several received skbs by chaining them onto the head's frag_list and
>> recording where to append the next one in strp->skb_nextp:
>>
>> 	*strp->skb_nextp = skb;
>> 	strp->skb_nextp = &skb->next;
>>
>> and then calls the parser on the head:
>>
>> 	len = (*strp->cb.parse_msg)(strp, head);
>>
>> The parser is only meant to inspect the skb, but the program may call
>> bpf_skb_change_tail() -- or the sibling bpf_skb_pull_data(),
>> bpf_skb_change_head(), bpf_skb_adjust_room(), all allowed for SK_SKB.
>> Once the head carries a frag_list these go
>>
>> 	... -> skb_ensure_writable -> pskb_may_pull -> __pskb_pull_tail
>>
>> and __pskb_pull_tail() frees the frag_list skbs that strparser still
>> tracks through skb_nextp:
>>
>> 	while ((list = skb_shinfo(skb)->frag_list) != insp) {
>> 		skb_shinfo(skb)->frag_list = list->next;
>> 		consume_skb(list);
>> 	}
>>
>> strp->skb_nextp now points into a freed sk_buff. The next segment of
>> the same message arrives in __strp_recv(), which links it with
>> *strp->skb_nextp = skb, an 8-byte write into the freed skb. The free
>> and the write happen in different __strp_recv() calls, so the message
>> has to span at least three segments before it triggers.
>>
>>   BUG: KASAN: slab-use-after-free in __strp_recv+0x447/0xda0
>>   Write of size 8 at addr ffff88810db86140 by task repro/349
>>
>>   Call Trace:
>>    <IRQ>
>>    __strp_recv+0x447/0xda0
>>    __tcp_read_sock+0x13d/0x590
>>    tcp_bpf_strp_read_sock+0x195/0x320
>>    strp_data_ready+0x267/0x340
>>    sk_psock_strp_data_ready+0x1ce/0x350
>>    tcp_data_queue+0x1364/0x2fd0
>>    tcp_rcv_established+0xe07/0x1640
>>    [...]
>>
>>   Allocated by task 349:
>>    skb_clone+0x17b/0x210
>>    __strp_recv+0x2c3/0xda0
>>    __tcp_read_sock+0x13d/0x590
>>    [...]
>>
>>   Freed by task 349:
>>    kmem_cache_free+0x150/0x570
>>    __pskb_pull_tail+0x57b/0xc20
>>    skb_ensure_writable+0x236/0x260
>>    __bpf_skb_change_tail+0x1d4/0x590
>>    sk_skb_change_tail+0x2a/0x40
>>    bpf_prog_1b285dcd6c41373e+0x27/0x30
>>    bpf_prog_run_pin_on_cpu+0xf3/0x260
>>    sk_psock_strp_parse+0x118/0x1e0
>>    __strp_recv+0x4f6/0xda0
>>    [...]
>>
>> The same resize also leaves the head's length inconsistent with its
>> frags, so a later __pskb_pull_tail() can instead hit the
>> BUG_ON(skb_copy_bits(...)) in net/core/skbuff.c.
>>
>> Run the parser on a private clone of the head when the message spans more
>> than one skb and the program can modify the packet
>> (prog->aux->changes_pkt_data), so a resizing helper can only touch the
>> clone and strparser's head and skb_nextp stay valid. Single-skb messages
>> have no frag_list and read-only parsers cannot resize, so both are still
>> parsed in place. If the clone cannot be allocated, return 0 so the caller
>> retries on the next read rather than failing the parser.
>>
>> Fixes: 8a31db561566 ("bpf: add access to sock fields and pkt data from sk_skb programs")
>> Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
>> ---
>> v2:
>>  - clone only when prog->aux->changes_pkt_data (Bobby Eshleman)
>>  - return 0 on clone failure instead of -ENOMEM (Bobby Eshleman)
>>  - free the clone with consume_skb() instead of kfree_skb()
>>  - drop the unrelated guard(rcu)() change (Bobby Eshleman)
>>
>> v1:
>>  - https://lore.kernel.org/all/20260609112316.3685738-1-rhkrqnwk98@gmail.com/
>>
>>  net/core/skmsg.c | 26 +++++++++++++++++++++++---
>>  1 file changed, 23 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
>> index e1850caf1a71..97e5bc5f38c3 100644
>> --- a/net/core/skmsg.c
>> +++ b/net/core/skmsg.c
>> @@ -1149,9 +1149,29 @@ static int sk_psock_strp_parse(struct strparser *strp, struct sk_buff *skb)
>>  	rcu_read_lock();
>>  	prog = READ_ONCE(psock->progs.stream_parser);
>>  	if (likely(prog)) {
>> -		skb->sk = psock->sk;
>> -		ret = bpf_prog_run_pin_on_cpu(prog, skb);
>> -		skb->sk = NULL;
>> +		struct sk_buff *parse_skb = skb;
>> +
>> +		/*
>> +		 * strparser chains the message skbs through skb->frag_list and
>> +		 * keeps a pointer into that list in strp->skb_nextp.  The parser
>> +		 * program may call bpf_skb_change_tail() and friends, which go
>> +		 * through __pskb_pull_tail() and free the frag_list skbs that
>> +		 * strparser still tracks.  Run the program on a clone when the head
>> +		 * has a frag_list and the program can modify the packet, so it
>> +		 * cannot drop frags strparser owns.
>> +		 */
>> +		if (skb_has_frag_list(skb) && prog->aux->changes_pkt_data) {
>> +			parse_skb = skb_clone(skb, GFP_ATOMIC);
>> +			if (!parse_skb) {
>> +				rcu_read_unlock();
>> +				return 0;
>> +			}
>> +		}
>> +		parse_skb->sk = psock->sk;
>> +		ret = bpf_prog_run_pin_on_cpu(prog, parse_skb);
>> +		parse_skb->sk = NULL;
>> +		if (parse_skb != skb)
>> +			consume_skb(parse_skb);
>>  	}
>>  	rcu_read_unlock();
>>  	return ret;
>> --
>> 2.43.0
>>
>I'm still on the fence about "return 0" vs ENOMEM. I hate to flip-flop
>on you here, but now I'm not sure if it is worth the complication to
>return 0 since we're really only buying a single timer interval in which
>we need 1) suddenly more memory to alloc the clone, and 2) another data
>ready event to cause the stream parsing to pick up again. If either one
>doesn't happen, the end result is the same. Not sure it's a good
>trade-off for the complexity of basically tricking the caller with the
>zero return. Maybe let's go back to ENOMEM?
>

Per Kuniyuki's and Jiayuan's suggestion, v3 will reject a packet-modifying
stream parser at attach time instead of runtime, so the return-0 vs
ENOMEM question goes away with that code.

>BTW, based on the comm name "repro", it sounds like you have a decent
>reproducer for this. I wonder if it is possible to add something to the
>selftests to catch this?
>

I will add a selftest in v3.

Best,
Sechang


* [PATCH net 0/2] airoha: fixes for sched HTB offload support
From: Lorenzo Bianconi @ 2026-06-18  6:00 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Wayen Yan, linux-arm-kernel, linux-mediatek, netdev,
	Lorenzo Bianconi


---
Lorenzo Bianconi (2):
      net: airoha: Fix off-by-one in airoha_tc_remove_htb_queue()
      net: airoha: fix netif_set_real_num_tx_queues for sparse QoS channels

 drivers/net/ethernet/airoha/airoha_eth.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)
---
base-commit: 7d8297e26b4e20b5d1c3c3fe51fe81a1c7fbc823
change-id: 20260618-airoha-qos-fixes-b6460b085680

Best regards,
-- 
Lorenzo Bianconi <lorenzo@kernel.org>



* [PATCH net 1/2] net: airoha: Fix off-by-one in airoha_tc_remove_htb_queue()
From: Lorenzo Bianconi @ 2026-06-18  6:00 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Wayen Yan, linux-arm-kernel, linux-mediatek, netdev,
	Lorenzo Bianconi
In-Reply-To: <20260618-airoha-qos-fixes-v1-0-37192652157f@kernel.org>

airoha_tc_htb_alloc_leaf_queue() computes the HTB QoS channel index
as opt->classid % AIROHA_NUM_QOS_CHANNELS and stores it in qos_sq_bmap.
However, airoha_tc_remove_htb_queue() clears the HTB configuration
using queue + 1 as the channel index, causing an off-by-one error.
Use queue directly as the QoS channel index to match the allocation
logic.

Fixes: ef1ca9271313b ("net: airoha: Add sched HTB offload support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/airoha/airoha_eth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 64dde6464f3f..aa98d1823ab6 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -3006,7 +3006,7 @@ static void airoha_tc_remove_htb_queue(struct net_device *netdev, int queue)
 	struct airoha_qdma *qdma = dev->qdma;
 
 	netif_set_real_num_tx_queues(netdev, netdev->real_num_tx_queues - 1);
-	airoha_qdma_set_tx_rate_limit(netdev, queue + 1, 0, 0);
+	airoha_qdma_set_tx_rate_limit(netdev, queue, 0, 0);
 
 	clear_bit(queue, qdma->qos_channel_map);
 	clear_bit(queue, dev->qos_sq_bmap);

-- 
2.54.0



* [PATCH net 2/2] net: airoha: fix netif_set_real_num_tx_queues for sparse QoS channels
From: Lorenzo Bianconi @ 2026-06-18  6:00 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Wayen Yan, linux-arm-kernel, linux-mediatek, netdev,
	Lorenzo Bianconi
In-Reply-To: <20260618-airoha-qos-fixes-v1-0-37192652157f@kernel.org>

airoha_tc_htb_alloc_leaf_queue() assigns queue IDs based on the channel
index (opt->qid = AIROHA_NUM_TX_RING + channel), but updates
real_num_tx_queues with a simple increment (num_tx_queues + 1). When QoS
channels are allocated sparsely (e.g., channels 0 and 3 without 1 and
2), the returned qid can exceed real_num_tx_queues, causing out-of-bounds
accesses in the networking stack.

For example, allocating channel 0 then channel 3 results in
real_num_tx_queues = 34 but qid = 35, which is out of range [0, 34).

Fix this by computing real_num_tx_queues based on the highest active
channel index rather than using a simple counter, in both the allocation
and deletion paths.

Fixes: ef1ca9271313b ("net: airoha: Add sched HTB offload support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/airoha/airoha_eth.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index aa98d1823ab6..e2652cff67c0 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -2789,7 +2789,7 @@ static int airoha_tc_htb_alloc_leaf_queue(struct net_device *netdev,
 					  struct tc_htb_qopt_offload *opt)
 {
 	u32 channel = TC_H_MIN(opt->classid) % AIROHA_NUM_QOS_CHANNELS;
-	int err, num_tx_queues = netdev->real_num_tx_queues;
+	int err, num_tx_queues = AIROHA_NUM_TX_RING + channel + 1;
 	struct airoha_gdm_dev *dev = netdev_priv(netdev);
 	struct airoha_qdma *qdma = dev->qdma;
 
@@ -2806,7 +2806,10 @@ static int airoha_tc_htb_alloc_leaf_queue(struct net_device *netdev,
 	if (err)
 		goto error;
 
-	err = netif_set_real_num_tx_queues(netdev, num_tx_queues + 1);
+	if (num_tx_queues <= netdev->real_num_tx_queues)
+		goto set_qos_sq_bmap;
+
+	err = netif_set_real_num_tx_queues(netdev, num_tx_queues);
 	if (err) {
 		airoha_qdma_set_tx_rate_limit(netdev, channel, 0,
 					      opt->quantum);
@@ -2815,6 +2818,7 @@ static int airoha_tc_htb_alloc_leaf_queue(struct net_device *netdev,
 		goto error;
 	}
 
+set_qos_sq_bmap:
 	set_bit(channel, dev->qos_sq_bmap);
 	opt->qid = AIROHA_NUM_TX_RING + channel;
 
@@ -3003,13 +3007,18 @@ static int airoha_dev_setup_tc_block(struct net_device *dev,
 static void airoha_tc_remove_htb_queue(struct net_device *netdev, int queue)
 {
 	struct airoha_gdm_dev *dev = netdev_priv(netdev);
+	int num_tx_queues = AIROHA_NUM_TX_RING;
 	struct airoha_qdma *qdma = dev->qdma;
 
-	netif_set_real_num_tx_queues(netdev, netdev->real_num_tx_queues - 1);
 	airoha_qdma_set_tx_rate_limit(netdev, queue, 0, 0);
 
 	clear_bit(queue, qdma->qos_channel_map);
 	clear_bit(queue, dev->qos_sq_bmap);
+
+	if (!bitmap_empty(dev->qos_sq_bmap, AIROHA_NUM_QOS_CHANNELS))
+		num_tx_queues += find_last_bit(dev->qos_sq_bmap,
+					       AIROHA_NUM_QOS_CHANNELS) + 1;
+	netif_set_real_num_tx_queues(netdev, num_tx_queues);
 }
 
 static int airoha_tc_htb_delete_leaf_queue(struct net_device *netdev,

-- 
2.54.0



* [PATCH net] net: airoha: fix BQL underflow and UAF in shared QDMA TX ring
From: Lorenzo Bianconi @ 2026-06-18  6:13 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Wayen Yan, linux-arm-kernel, linux-mediatek, netdev,
	Lorenzo Bianconi

When multiple netdevs share a QDMA TX ring and one device is stopped,
netdev_tx_reset_subqueue() zeroes that device's BQL counters while its
pending skbs remain in the shared HW TX ring. When NAPI later completes
those skbs via netdev_tx_completed_queue(), the already-zeroed
dql->num_queued counter underflows.

Moreover, in the airoha_remove() path, netdevs are unregistered
sequentially while skbs from previously unregistered netdevs may still
reference freed net_device memory via skb->dev, causing a use-after-free
during BQL accounting.

Fix both issues:
- Remove netdev_tx_reset_subqueue() from airoha_dev_stop() so pending
  skbs are completed naturally by NAPI with proper BQL accounting.
- Add netdev_tx_completed_queue() in airoha_qdma_cleanup_tx_queue()
  to properly account for skbs freed during queue teardown.
- Introduce airoha_qdma_tx_disable() to stop TX on all registered
  netdevs for a given QDMA instance under RTNL lock.
- Move DMA engine start/stop into probe/remove and
  airoha_qdma_cleanup(), ensuring TX queues are cleaned up while all
  netdevs are still registered and skb->dev is valid.

Fixes: 6df0488dc9dd ("net: airoha: fix BQL accounting in airoha_qdma_cleanup_tx_queue()")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/airoha/airoha_eth.c | 95 ++++++++++++++++++++++++--------
 1 file changed, 72 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 64dde6464f3f..4d6a061cd779 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -1004,6 +1004,7 @@ static int airoha_qdma_tx_napi_poll(struct napi_struct *napi, int budget)
 
 		e = &q->entry[index];
 		skb = e->skb;
+		e->skb = NULL;
 
 		dma_unmap_single(eth->dev, e->dma_addr, e->dma_len,
 				 DMA_TO_DEVICE);
@@ -1147,6 +1148,42 @@ static int airoha_qdma_init_tx(struct airoha_qdma *qdma)
 	return 0;
 }
 
+static void airoha_qdma_tx_disable(struct airoha_qdma *qdma)
+{
+	struct airoha_eth *eth = qdma->eth;
+	int i;
+
+	/* Protect netdev->reg_state and netif_tx_disable() calls. */
+	rtnl_lock();
+
+	for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
+		struct airoha_gdm_port *port = eth->ports[i];
+		int j;
+
+		if (!port)
+			continue;
+
+		for (j = 0; j < ARRAY_SIZE(port->devs); j++) {
+			struct airoha_gdm_dev *dev = port->devs[j];
+			struct net_device *netdev;
+
+			if (!dev)
+				continue;
+
+			if (dev->qdma != qdma)
+				continue;
+
+			netdev = netdev_from_priv(dev);
+			if (netdev->reg_state != NETREG_REGISTERED)
+				continue;
+
+			netif_tx_disable(netdev);
+		}
+	}
+
+	rtnl_unlock();
+}
+
 static void airoha_qdma_cleanup_tx_queue(struct airoha_queue *q)
 {
 	struct airoha_qdma *qdma = q->qdma;
@@ -1158,13 +1195,20 @@ static void airoha_qdma_cleanup_tx_queue(struct airoha_queue *q)
 	for (i = 0; i < q->ndesc; i++) {
 		struct airoha_queue_entry *e = &q->entry[i];
 		struct airoha_qdma_desc *desc = &q->desc[i];
+		struct sk_buff *skb = e->skb;
 
 		if (!e->dma_addr)
 			continue;
 
 		dma_unmap_single(eth->dev, e->dma_addr, e->dma_len,
 				 DMA_TO_DEVICE);
-		dev_kfree_skb_any(e->skb);
+		if (skb) {
+			struct netdev_queue *txq;
+
+			txq = skb_get_tx_queue(skb->dev, skb);
+			netdev_tx_completed_queue(txq, 1, skb->len);
+			dev_kfree_skb_any(skb);
+		}
 		e->dma_addr = 0;
 		e->skb = NULL;
 		list_add_tail(&e->list, &q->tx_list);
@@ -1527,6 +1571,23 @@ static void airoha_qdma_cleanup(struct airoha_qdma *qdma)
 {
 	int i;
 
+	if (test_bit(DEV_STATE_INITIALIZED, &qdma->eth->state)) {
+		u32 status;
+
+		airoha_qdma_tx_disable(qdma);
+
+		airoha_qdma_clear(qdma, REG_QDMA_GLOBAL_CFG,
+				  GLOBAL_CFG_TX_DMA_EN_MASK |
+				  GLOBAL_CFG_RX_DMA_EN_MASK);
+		if (read_poll_timeout(airoha_qdma_rr, status,
+				      !(status & (GLOBAL_CFG_TX_DMA_BUSY_MASK |
+						  GLOBAL_CFG_RX_DMA_BUSY_MASK)),
+				      USEC_PER_MSEC, 50 * USEC_PER_MSEC, true,
+				      qdma, REG_QDMA_GLOBAL_CFG))
+			dev_warn(qdma->eth->dev,
+				 "QDMA DMA engine busy timeout\n");
+	}
+
 	for (i = 0; i < ARRAY_SIZE(qdma->q_rx); i++) {
 		if (!qdma->q_rx[i].ndesc)
 			continue;
@@ -1837,9 +1898,6 @@ static int airoha_dev_open(struct net_device *netdev)
 	}
 	port->users++;
 
-	airoha_qdma_set(qdma, REG_QDMA_GLOBAL_CFG,
-			GLOBAL_CFG_TX_DMA_EN_MASK |
-			GLOBAL_CFG_RX_DMA_EN_MASK);
 	qdma->users++;
 
 	if (!airoha_is_lan_gdm_dev(dev) &&
@@ -1880,12 +1938,9 @@ static int airoha_dev_stop(struct net_device *netdev)
 	struct airoha_gdm_dev *dev = netdev_priv(netdev);
 	struct airoha_gdm_port *port = dev->port;
 	struct airoha_qdma *qdma = dev->qdma;
-	int i;
 
 	netif_tx_disable(netdev);
 	airoha_set_vip_for_gdm_port(dev, false);
-	for (i = 0; i < netdev->num_tx_queues; i++)
-		netdev_tx_reset_subqueue(netdev, i);
 
 	if (--port->users)
 		airoha_set_port_mtu(dev->eth, port);
@@ -1893,19 +1948,7 @@ static int airoha_dev_stop(struct net_device *netdev)
 		airoha_set_gdm_port_fwd_cfg(qdma->eth,
 					    REG_GDM_FWD_CFG(port->id),
 					    FE_PSE_PORT_DROP);
-
-	if (!--qdma->users) {
-		airoha_qdma_clear(qdma, REG_QDMA_GLOBAL_CFG,
-				  GLOBAL_CFG_TX_DMA_EN_MASK |
-				  GLOBAL_CFG_RX_DMA_EN_MASK);
-
-		for (i = 0; i < ARRAY_SIZE(qdma->q_tx); i++) {
-			if (!qdma->q_tx[i].ndesc)
-				continue;
-
-			airoha_qdma_cleanup_tx_queue(&qdma->q_tx[i]);
-		}
-	}
+	qdma->users--;
 
 	return 0;
 }
@@ -3413,8 +3456,12 @@ static int airoha_probe(struct platform_device *pdev)
 	if (err)
 		goto error_netdev_free;
 
-	for (i = 0; i < ARRAY_SIZE(eth->qdma); i++)
+	for (i = 0; i < ARRAY_SIZE(eth->qdma); i++) {
 		airoha_qdma_start_napi(&eth->qdma[i]);
+		airoha_qdma_set(&eth->qdma[i], REG_QDMA_GLOBAL_CFG,
+				GLOBAL_CFG_TX_DMA_EN_MASK |
+				GLOBAL_CFG_RX_DMA_EN_MASK);
+	}
 
 	for_each_child_of_node(pdev->dev.of_node, np) {
 		if (!of_device_is_compatible(np, "airoha,eth-mac"))
@@ -3440,6 +3487,8 @@ static int airoha_probe(struct platform_device *pdev)
 	for (i = 0; i < ARRAY_SIZE(eth->qdma); i++)
 		airoha_qdma_stop_napi(&eth->qdma[i]);
 
+	airoha_hw_cleanup(eth);
+
 	for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
 		struct airoha_gdm_port *port = eth->ports[i];
 		int j;
@@ -3461,7 +3510,6 @@ static int airoha_probe(struct platform_device *pdev)
 		}
 		airoha_metadata_dst_free(port);
 	}
-	airoha_hw_cleanup(eth);
 error_netdev_free:
 	free_netdev(eth->napi_dev);
 	platform_set_drvdata(pdev, NULL);
@@ -3477,6 +3525,8 @@ static void airoha_remove(struct platform_device *pdev)
 	for (i = 0; i < ARRAY_SIZE(eth->qdma); i++)
 		airoha_qdma_stop_napi(&eth->qdma[i]);
 
+	airoha_hw_cleanup(eth);
+
 	for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
 		struct airoha_gdm_port *port = eth->ports[i];
 		int j;
@@ -3497,7 +3547,6 @@ static void airoha_remove(struct platform_device *pdev)
 		}
 		airoha_metadata_dst_free(port);
 	}
-	airoha_hw_cleanup(eth);
 
 	free_netdev(eth->napi_dev);
 	platform_set_drvdata(pdev, NULL);

---
base-commit: 7d8297e26b4e20b5d1c3c3fe51fe81a1c7fbc823
change-id: 20260618-airoha-bql-fixes-f57b2d108573

Best regards,
-- 
Lorenzo Bianconi <lorenzo@kernel.org>



* Re: [PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready
From: Mahanta Jambigi @ 2026-06-18  6:24 UTC (permalink / raw)
  To: Runyu Xiao, D. Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Tony Lu, Wen Gu, Simon Horman, Karsten Graul, linux-rdma,
	linux-s390, netdev, linux-kernel, jianhao.xu, stable
In-Reply-To: <20260617152855.1039151-1-runyu.xiao@seu.edu.cn>



On 17/06/26 8:58 pm, Runyu Xiao wrote:
> smc_listen() installs smc_clcsock_data_ready() as the underlying TCP
> listen socket's sk_data_ready callback.  smc_clcsock_data_ready() then
> immediately takes sk_callback_lock before looking up the SMC listener and
> queuing smc_tcp_listen_work().
> 
> That is unsafe once the TCP listen socket is leaving TCP_LISTEN.  The TCP
> close/flush path can run the installed sk_data_ready callback with
> sk_callback_lock already held, so entering smc_clcsock_data_ready() again
> tries to take the same rwlock recursively in the same thread.  The nvmet

Could you provide the exact call stack showing the recursive lock? Also,
please share the details of the nvmet commit.

> TCP listener had to make the same state check before taking
> sk_callback_lock for this reason.
> 
> This issue was found by our static analysis tool and then manually
> reviewed against the current tree.
> 
> The grounded PoC kept the SMC listen callback installation path:
> 
>   smc_listen()
>   smc_clcsock_replace_cb()
>   sk_data_ready = smc_clcsock_data_ready()
> 
> It then modeled the close/flush carrier that invokes the installed
> sk_data_ready callback while sk_callback_lock is already held.  Lockdep
> reported the same-thread recursive acquisition:
> 
>   WARNING: possible recursive locking detected
>   smc_clcsock_data_ready+0xa/0x4d [vuln_msv]
>   smc_close_flush_work+0x1f/0x30 [vuln_msv]
>   *** DEADLOCK ***
> 
> Return before taking sk_callback_lock when the underlying TCP socket is no
> longer in TCP_LISTEN.  In that state there is no listen accept work to
> queue for SMC, and avoiding the callback lock mirrors the fix used by the
> TCP nvmet listener.
> 
> Fixes: 0558226cebee ("net/smc: Fix slab-out-of-bounds issue in fallback")
> Cc: stable@vger.kernel.org
> Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
> ---
>  net/smc/af_smc.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 6421c2e1c84d..1af4e3c333ff 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -2631,6 +2631,9 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
>  {
>  	struct smc_sock *lsmc;
>  
> +	if (READ_ONCE(listen_clcsock->sk_state) != TCP_LISTEN)

Is *TCP_LISTEN* check sufficient? What about *TCP_SYN_RECV* or
*TCP_ESTABLISHED*?

> +		return;
> +
>  	read_lock_bh(&listen_clcsock->sk_callback_lock);
>  	lsmc = smc_clcsock_user_data(listen_clcsock);
>  	if (!lsmc)



* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
From: Andrei Vagin @ 2026-06-18  6:34 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Joanne Koong, Val Packett, Al Viro, Linus Torvalds, Askar Safin,
	linux-kernel, linux-mm, linux-api, netdev, Matthew Wilcox,
	Jens Axboe, Christoph Hellwig, David Howells, Andrew Morton,
	David Hildenbrand, Pedro Falcato, Miklos Szeredi, patches,
	linux-fsdevel, Jan Kara, Steven Rostedt, fuse-devel,
	Bernd Schubert, Aleksandr Mikhalitsyn, criu@lists.linux.dev
In-Reply-To: <CANaxB-zK5q=Xw6UZTmeFtXsDZjUsPkFk=p485m-wtNTBnf4hgg@mail.gmail.com>

On Wed, Jun 17, 2026 at 12:57 PM Andrei Vagin <avagin@gmail.com> wrote:
>
> On Wed, Jun 17, 2026 at 4:08 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > > After this patch, step b) is a straight copy which means step d)'s
> > > fixup doesn't modify what's in the pipe. This could be fixed up in
> > > libfuse to not depend on modify-after-vmsplice, but I don't think this
> > > helps for applications using already-released libfuse versions. I
> > > think this patch needs to be reverted.
> >
> > Note, nothing was merged. I deliberately kept in -next though for a long
> > time to see how quickly we'd see regressions.
>
> The bait worked. CRIU wins a prize in this lottery.
>
> The CRIU fifo test fails with this change. The problem is that vmsplice
> with SPLICE_F_NONBLOCK to a fifo file descriptor fails with -EOPNOTSUPP.

Actually, this change introduces a performance and functional
regression for CRIU.

Here is a brief overview of how CRIU currently dumps memory pages:

CRIU injects a parasite code blob into the target process's address
space. The parasite invokes vmsplice() with the SPLICE_F_GIFT flag to
pin physical pages directly inside a pipe without copying them. The main
CRIU process then takes over from outside the target context, calling
splice() on the other end of the pipe to stream the data directly into
checkpoint image files or a remote network socket.

I ran a simple test that creates an anonymous mapping and touches every
page within it:
Without this patch, CRIU takes 9 seconds to dump the test process.
With this patch, it takes 18 seconds.

Plus, it obviously introduces some memory overhead.

If these changes are merged, we will need to completely rework the
memory dumping mechanism in CRIU. Using vmsplice() in this proposed form
no longer makes any sense for our architecture...

Thanks,
Andrei


* [PATCH net] net: stmmac: dwmac-spacemit: Fix wrong ctrl register definition
From: Inochi Amaoto @ 2026-06-18  6:41 UTC (permalink / raw)
  To: Inochi Amaoto, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin, Alexandre Torgue,
	Yixun Lan, Russell King (Oracle)
  Cc: netdev, linux-stm32, linux-arm-kernel, linux-riscv, spacemit,
	linux-kernel, Yixun Lan, Longbin Li

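The PHY ctrl register layout is defined incorrectly: the PHY interface
mode is a 2-bit field at bits [4:3], and the PHY IRQ and wake IRQ enable
bits are swapped. Fix the definitions to match the actual layout.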

Fixes: 30f0ba420ed3 ("net: stmmac: Add glue layer for Spacemit K3 SoC")
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
---
 .../net/ethernet/stmicro/stmmac/dwmac-spacemit.c    | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-spacemit.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-spacemit.c
index 223754cc5c79..6feffaa3ef3a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-spacemit.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-spacemit.c
@@ -18,10 +18,12 @@
 #include "stmmac_platform.h"
 
 /* ctrl register bits */
-#define CTRL_PHY_INTF_RGMII		BIT(3)
-#define CTRL_PHY_INTF_MII		BIT(4)
-#define CTRL_WAKE_IRQ_EN		BIT(9)
-#define CTRL_PHY_IRQ_EN			BIT(12)
+#define CTRL_PHY_INTF_MODE		GENMASK(4, 3)
+#define CTRL_PHY_INTF_RMII		FIELD_PREP(CTRL_PHY_INTF_MODE, 0)
+#define CTRL_PHY_INTF_RGMII		FIELD_PREP(CTRL_PHY_INTF_MODE, 1)
+#define CTRL_PHY_INTF_MII		FIELD_PREP(CTRL_PHY_INTF_MODE, 3)
+#define CTRL_PHY_IRQ_EN			BIT(9)
+#define CTRL_WAKE_IRQ_EN		BIT(12)
 
 /* dline register bits */
 #define RGMII_RX_DLINE_EN		BIT(0)
@@ -118,7 +120,7 @@ static void spacemit_get_interfaces(struct stmmac_priv *priv, void *bsp_priv,
 
 static int spacemit_set_phy_intf_sel(void *bsp_priv, u8 phy_intf_sel)
 {
-	unsigned int mask = CTRL_PHY_INTF_MII | CTRL_PHY_INTF_RGMII;
+	unsigned int mask = CTRL_PHY_INTF_MODE;
 	struct spacmit_dwmac *dwmac = bsp_priv;
 	unsigned int val = 0;
 
@@ -128,6 +130,7 @@ static int spacemit_set_phy_intf_sel(void *bsp_priv, u8 phy_intf_sel)
 		break;
 
 	case PHY_INTF_SEL_RMII:
+		val = CTRL_PHY_INTF_RMII;
 		break;
 
 	case PHY_INTF_SEL_RGMII:
-- 
2.54.0
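To see what the redefinition changes numerically, the sketch below evaluates the old and new macros side by side in userspace. `BIT()`, `GENMASK()` and `FIELD_PREP()` here are simplified 32-bit stand-ins for the kernel macros (no compile-time checks; `__builtin_ctz` assumes GCC or Clang), and `set_intf_mode()` is a hypothetical helper modelling the read-modify-write of the mode field. It shows that the RGMII encoding is unchanged (0x08), MII moves from 0x10 to 0x18, RMII becomes an explicit 0, and the PHY and wake IRQ enable bits trade places.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's BIT(), GENMASK() and
 * FIELD_PREP(): 32-bit only, no compile-time checks. */
#define BIT(n)            (1u << (n))
#define GENMASK(h, l)     (((~0u) << (l)) & (~0u >> (31 - (h))))
#define FIELD_PREP(m, v)  (((uint32_t)(v) << __builtin_ctz(m)) & (m))

/* Definitions before the patch. */
#define OLD_CTRL_PHY_INTF_RGMII   BIT(3)
#define OLD_CTRL_PHY_INTF_MII     BIT(4)
#define OLD_CTRL_WAKE_IRQ_EN      BIT(9)
#define OLD_CTRL_PHY_IRQ_EN       BIT(12)

/* Definitions after the patch: bits 4:3 form a 2-bit mode field. */
#define CTRL_PHY_INTF_MODE   GENMASK(4, 3)
#define CTRL_PHY_INTF_RMII   FIELD_PREP(CTRL_PHY_INTF_MODE, 0)
#define CTRL_PHY_INTF_RGMII  FIELD_PREP(CTRL_PHY_INTF_MODE, 1)
#define CTRL_PHY_INTF_MII    FIELD_PREP(CTRL_PHY_INTF_MODE, 3)
#define CTRL_PHY_IRQ_EN      BIT(9)
#define CTRL_WAKE_IRQ_EN     BIT(12)

/* Hypothetical helper: clear the whole mode field, then set the new value,
 * leaving every other bit of the register untouched. */
static uint32_t set_intf_mode(uint32_t reg, uint32_t mode_val)
{
	return (reg & ~CTRL_PHY_INTF_MODE) | (mode_val & CTRL_PHY_INTF_MODE);
}
```

Masking with the full `CTRL_PHY_INTF_MODE` field (rather than OR-ing the individual mode values, as the old mask did) guarantees a stale mode cannot leave a bit set when switching, for example from MII (0x18) to RGMII (0x08).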



* Re: [PATCH net] net: stmmac: dwmac-spacemit: Fix wrong ctrl register definition
From: Maxime Chevallier @ 2026-06-18  7:03 UTC (permalink / raw)
  To: Inochi Amaoto, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin, Alexandre Torgue,
	Yixun Lan, Russell King (Oracle)
  Cc: netdev, linux-stm32, linux-arm-kernel, linux-riscv, spacemit,
	linux-kernel, Yixun Lan, Longbin Li
In-Reply-To: <20260618064143.1102179-1-inochiama@gmail.com>

Hi Inochi,

On 6/18/26 08:41, Inochi Amaoto wrote:
> The layout of the PHY ctrl register is defined incorrectly; fix the
> definitions to match the actual register layout.
> 
> Fixes: 30f0ba420ed3 ("net: stmmac: Add glue layer for Spacemit K3 SoC")
> Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
> ---
>  .../net/ethernet/stmicro/stmmac/dwmac-spacemit.c    | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-spacemit.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-spacemit.c
> index 223754cc5c79..6feffaa3ef3a 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-spacemit.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-spacemit.c
> @@ -18,10 +18,12 @@
>  #include "stmmac_platform.h"
>  
>  /* ctrl register bits */
> -#define CTRL_PHY_INTF_RGMII		BIT(3)
> -#define CTRL_PHY_INTF_MII		BIT(4)
> -#define CTRL_WAKE_IRQ_EN		BIT(9)
> -#define CTRL_PHY_IRQ_EN			BIT(12)
> +#define CTRL_PHY_INTF_MODE		GENMASK(4, 3)
> +#define CTRL_PHY_INTF_RMII		FIELD_PREP(CTRL_PHY_INTF_MODE, 0)
> +#define CTRL_PHY_INTF_RGMII		FIELD_PREP(CTRL_PHY_INTF_MODE, 1)
> +#define CTRL_PHY_INTF_MII		FIELD_PREP(CTRL_PHY_INTF_MODE, 3)
> +#define CTRL_PHY_IRQ_EN			BIT(9)
> +#define CTRL_WAKE_IRQ_EN		BIT(12)

Looks like you're fixing 2 things there:

 -> Wake-on-LAN probably didn't work before, as the wake IRQ bit was apparently wrong
 -> The MII mode selection apparently also changes, but maybe you don't have an
    MII board around to test this?

Is it possible for you to address these issues independently (i.e. split this into 2 patches)?
That way, if we ever revert one, we won't re-break the other thing that was broken.


Maxime



* [PATCH net] net: sit: require CAP_NET_ADMIN in the device netns for changelink
From: Maoyi Xie @ 2026-06-18  7:08 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, Xiao Liang, Nicolas Dichtel,
	Kees Cook, netdev, linux-kernel, stable

ipip6_changelink() operates on at most two netns, dev_net(dev) and the
tunnel link netns t->net. They differ once the device is created in or
moved to a netns other than the one the request runs in. The rtnl
changelink path checks CAP_NET_ADMIN only against dev_net(dev), so a
caller privileged there but not in t->net can rewrite a tunnel that
lives in t->net.

Gate ipip6_changelink() on rtnl_dev_link_net_capable() at its top,
before any attribute is parsed. sit was the one tunnel type not covered
by the recent series that added this check to the other changelink()
handlers.

Fixes: 5e6700b3bf98 ("sit: add support of x-netns")
Link: https://lore.kernel.org/netdev/20260612085941.3158249-1-maoyixie.tju@gmail.com/
Cc: stable@vger.kernel.org
Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
---
 net/ipv6/sit.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 64f0d1b..a38b24f 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1613,6 +1613,9 @@ static int ipip6_changelink(struct net_device *dev, struct nlattr *tb[],
 	__u32 fwmark = t->fwmark;
 	int err;

+	if (!rtnl_dev_link_net_capable(dev, net))
+		return -EPERM;
+
 	if (dev == sitn->fb_tunnel_dev)
 		return -EINVAL;

--
2.43.0
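The permission gap described in the commit message can be modelled in a few lines. This is a hypothetical userspace model, not kernel code: namespaces are plain ids, the caller's CAP_NET_ADMIN holdings are a bitmask, and it assumes the combined effect of the rtnl core check plus `rtnl_dev_link_net_capable()` is that the caller needs CAP_NET_ADMIN in both `dev_net(dev)` and `t->net`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a network namespace is just an id. */
struct netns {
	unsigned int id;
};

struct caller {
	unsigned long admin_mask;	/* bit n set: CAP_NET_ADMIN in netns n */
};

static bool has_net_admin(const struct caller *c, const struct netns *ns)
{
	return (c->admin_mask >> ns->id) & 1UL;
}

/* Before the patch: only the rtnl core's check against dev_net(dev). */
static bool sit_changelink_allowed_old(const struct caller *c,
				       const struct netns *dev_net,
				       const struct netns *link_net)
{
	(void)link_net;		/* t->net was never consulted */
	return has_net_admin(c, dev_net);
}

/* After the patch: privilege is also required in the tunnel's link netns. */
static bool sit_changelink_allowed_new(const struct caller *c,
				       const struct netns *dev_net,
				       const struct netns *link_net)
{
	return has_net_admin(c, dev_net) && has_net_admin(c, link_net);
}
```

A caller that is admin only in a child namespace, holding a sit device whose link netns is the initial namespace, passes the old check but fails the new one.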


* Re: [PATCH net] net: stmmac: dwmac-spacemit: Fix wrong ctrl register definition
From: Inochi Amaoto @ 2026-06-18  7:12 UTC (permalink / raw)
  To: Maxime Chevallier, Inochi Amaoto, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
	Alexandre Torgue, Yixun Lan, Russell King (Oracle)
  Cc: netdev, linux-stm32, linux-arm-kernel, linux-riscv, spacemit,
	linux-kernel, Yixun Lan, Longbin Li
In-Reply-To: <9b39829d-92b4-4ffa-be0b-b2b0f857f58e@bootlin.com>

On Thu, Jun 18, 2026 at 09:03:21AM +0200, Maxime Chevallier wrote:
> Hi Inochi,
> 
> On 6/18/26 08:41, Inochi Amaoto wrote:
> > The layout of the PHY ctrl register is defined incorrectly; fix the
> > definitions to match the actual register layout.
> > 
> > Fixes: 30f0ba420ed3 ("net: stmmac: Add glue layer for Spacemit K3 SoC")
> > Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
> > ---
> >  .../net/ethernet/stmicro/stmmac/dwmac-spacemit.c    | 13 ++++++++-----
> >  1 file changed, 8 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-spacemit.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-spacemit.c
> > index 223754cc5c79..6feffaa3ef3a 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-spacemit.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-spacemit.c
> > @@ -18,10 +18,12 @@
> >  #include "stmmac_platform.h"
> >  
> >  /* ctrl register bits */
> > -#define CTRL_PHY_INTF_RGMII		BIT(3)
> > -#define CTRL_PHY_INTF_MII		BIT(4)
> > -#define CTRL_WAKE_IRQ_EN		BIT(9)
> > -#define CTRL_PHY_IRQ_EN			BIT(12)
> > +#define CTRL_PHY_INTF_MODE		GENMASK(4, 3)
> > +#define CTRL_PHY_INTF_RMII		FIELD_PREP(CTRL_PHY_INTF_MODE, 0)
> > +#define CTRL_PHY_INTF_RGMII		FIELD_PREP(CTRL_PHY_INTF_MODE, 1)
> > +#define CTRL_PHY_INTF_MII		FIELD_PREP(CTRL_PHY_INTF_MODE, 3)
> > +#define CTRL_PHY_IRQ_EN			BIT(9)
> > +#define CTRL_WAKE_IRQ_EN		BIT(12)
> 
> Looks like you're fixing 2 things there:
> 
>  -> Wake-on-LAN probably didn't work before, as the wake IRQ bit was apparently wrong.

I guess the vendor firmware and U-Boot may set this bit up, but the IRQ
definition is actually wrong.

>  -> The MII mode selection apparently also changes, but maybe you don't have an
>     MII board around to test this?
> 

Actually, the only K3 board available is the Pico-ITX board, and it only has
an RGMII PHY. I even doubt that SpacemiT has tested the MII PHY
well.

> Is it possible for you to address these issues independently (i.e. split this into 2 patches)?
> That way, if we ever revert one, we won't re-break the other thing that was broken.
> 
> 

Yes, I am fine with splitting it. I will send the split patches in a few days.

Regards,
Inochi


* Re: [PATCH] xfrm: Fix xfrm state cache insertion race
From: Steffen Klassert @ 2026-06-18  7:23 UTC (permalink / raw)
  To: Simon Horman
  Cc: Herbert Xu, netdev, Linus Torvalds, Jakub Kicinski, Paolo Abeni,
	zdi-disclosures@trendmicro.com, Willy Tarreau
In-Reply-To: <20260615084321.GE712698@horms.kernel.org>

On Mon, Jun 15, 2026 at 09:43:21AM +0100, Simon Horman wrote:
> On Fri, Jun 12, 2026 at 12:58:59PM +0800, Herbert Xu wrote:
> > The xfrm input state cache insertion code checks the validity of
> > the state before acquiring the global xfrm_state_lock.  Thus it's
> > possible for someone else to kill the state after it passed the
> > validity check, and then the insertion will add the dead state
> > to the cache.
> > 
> > Fix this by moving the validity check inside the lock.
> > 
> > This entire function is called on the input path, where BH must
> > be off (e.g., the caller of this function xfrm_input acquires
> > its spinlocks without disabling BH).
> > 
> > So there is no need to disable BH here or take the RCU read lock.
> > Remove both and replace them with an assertion that trips if BH
> > is accidentally enabled on some future calling path.
> > 
> > Fixes: 81a331a0e72d ("xfrm: Add an inbound percpu state cache.")
> > Reported-by: Zero Day Initiative <zdi-disclosures@trendmicro.com>
> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> Reviewed-by: Simon Horman <horms@kernel.org>

Applied, thanks everyone!
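The race fixed here is a classic check-then-act bug: the validity test and the insertion must sit in the same critical section as the code that kills the state. A minimal userspace sketch, assuming a single-slot cache and a C11 `atomic_flag` spinlock standing in for `xfrm_state_lock`; all names are hypothetical, and the BH and RCU changes from the patch are not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's xfrm structures. */
struct state {
	bool dead;
};

static atomic_flag state_lock = ATOMIC_FLAG_INIT;	/* models xfrm_state_lock */
static struct state *cache_slot;			/* single-slot input cache */

static void state_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&state_lock, memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void state_lock_release(void)
{
	atomic_flag_clear_explicit(&state_lock, memory_order_release);
}

/* In this model, killing a state marks it dead and evicts it from the cache,
 * both under the lock. */
static void kill_state(struct state *x)
{
	state_lock_acquire();
	x->dead = true;
	if (cache_slot == x)
		cache_slot = NULL;
	state_lock_release();
}

/*
 * Fixed insertion: the validity check happens inside the lock. A buggy variant
 * that tests x->dead before acquiring the lock lets kill_state() run between
 * the check and the store, leaving a dead state in the cache.
 */
static bool cache_insert(struct state *x)
{
	bool inserted = false;

	state_lock_acquire();
	if (!x->dead) {
		cache_slot = x;
		inserted = true;
	}
	state_lock_release();
	return inserted;
}
```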


* Re: [PATCH net] xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[]
From: Steffen Klassert @ 2026-06-18  7:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	netdev, eric.dumazet, syzbot+d85ba1c732720b9a4097, Herbert Xu
In-Reply-To: <20260612055634.3560352-1-edumazet@google.com>

On Fri, Jun 12, 2026 at 05:56:34AM +0000, Eric Dumazet wrote:
> KCSAN reported a data race involving net->xfrm.policy_count access.
> 
> Add missing READ_ONCE()/WRITE_ONCE() annotations on
> xfrm_policy_count and xfrm_policy_default.
> 
> Fixes: 2518c7c2b3d7 ("[XFRM]: Hash policies when non-prefixed.")
> Reported-by: syzbot+d85ba1c732720b9a4097@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/netdev/6a2b9e96.99669fcc.12a77b.0006.GAE@google.com/T/#u
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied to the ipsec tree, thanks a lot Eric!
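For readers unfamiliar with the annotations: `READ_ONCE()`/`WRITE_ONCE()` mark intentionally lockless accesses so the compiler performs each one exactly once, and KCSAN does not report races between such marked accesses. The sketch below uses a simplified volatile-cast approximation of the kernel macros (GNU C `__typeof__`) and a hypothetical two-entry counter loosely modelled on `xfrm_policy_count[]`; it is illustrative only.

```c
#include <assert.h>

/*
 * Simplified approximations of the kernel's READ_ONCE()/WRITE_ONCE() for
 * scalar types. The volatile cast makes the compiler emit exactly one load or
 * store that it may not fuse, repeat or elide. The real macros add type
 * checks and per-architecture details.
 */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical per-direction counter, loosely modelled on xfrm_policy_count[]. */
static unsigned int policy_count[2];

/* Writer: assumed to run under a lock, so the plain read of its own value is
 * fine; the store is marked because lockless readers may observe it. */
static void policy_count_inc(int dir)
{
	WRITE_ONCE(policy_count[dir], policy_count[dir] + 1);
}

/* Lockless reader: the load is marked so the compiler performs it exactly once. */
static int have_policies(int dir)
{
	return READ_ONCE(policy_count[dir]) != 0;
}
```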


* Re: [PATCH ipsec] espintcp: use sk_msg_free_partial to fix partial send
From: Steffen Klassert @ 2026-06-18  7:24 UTC (permalink / raw)
  To: Sabrina Dubroca; +Cc: netdev, stable, Aaron Esau, Yiming Qian
In-Reply-To: <68ef5bdae251f605b0743d2e51c2a5cb285e5772.1781270325.git.sd@queasysnail.net>

On Fri, Jun 12, 2026 at 04:11:39PM +0200, Sabrina Dubroca wrote:
> sk_msg_free_partial() ensures consistency of the skmsg at every
> iteration, without having to manually handle uncharges and offsets.
> This simplifies the code, and fixes some bugs in skmsg accounting when
> we don't send the full contents.
> 
> Cc: stable@vger.kernel.org
> Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)")
> Reported-by: Aaron Esau <aaron1esau@gmail.com>
> Reported-by: Yiming Qian <yimingqian591@gmail.com>
> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>

Applied, thanks a lot Sabrina!


* Re: [PATCH net] xfrm: validate selector family and prefixlen during match
From: Steffen Klassert @ 2026-06-18  7:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	netdev, eric.dumazet, syzbot+9383b1ff0df4b29ca5e6,
	Sabrina Dubroca
In-Reply-To: <20260615090237.2689082-1-edumazet@google.com>

On Mon, Jun 15, 2026 at 09:02:37AM +0000, Eric Dumazet wrote:
> syzbot reported a shift-out-of-bounds in xfrm_selector_match()
> due to AF_UNSPEC selector with large prefixlen (e.g. 128) matched
> against IPv4 flow (when XFRM_STATE_AF_UNSPEC is set).
> 
> Fix this by:
> 
> - Rejecting mismatched families in xfrm_selector_match.
> - Returning false in addr4_match if prefixlen > 32.
> - Returning false in addr_match if prefixlen > 128 (prevents overflow).
> 
> Fixes: 3f0ab59e6537 ("xfrm: validate new SA's prefixlen using SA family when sel.family is unset")
> Reported-by: syzbot+9383b1ff0df4b29ca5e6@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/netdev/6a2fbe35.be3f099c.2836ae.0018.GAE@google.com/T/#u
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Also applied, thanks a lot!
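The shift-out-of-bounds follows directly from building a mask as `~0u << (32 - prefixlen)`: with `prefixlen` = 128 the shift count is negative. Below is a userspace sketch of an IPv4 prefix match with the bound check the commit message describes, written from that description rather than copied from the kernel, and using host-order addresses for simplicity (the kernel works on network-order `__be32` values).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of a prefix comparison with an explicit upper bound on prefixlen. */
static bool addr4_match_sketch(uint32_t a1, uint32_t a2, uint8_t prefixlen)
{
	if (prefixlen == 0)
		return true;	/* a /0 prefix matches everything; also avoids ~0u << 32 */
	if (prefixlen > 32)
		return false;	/* 32 - prefixlen would be negative: undefined shift */
	/* prefixlen is in 1..32 here, so the shift count is in 0..31 */
	return ((a1 ^ a2) & (~0u << (32 - prefixlen))) == 0;
}
```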


* [PATCH 1/2] igc: Wait for MAC passthrough after reset
From: Chia-Lin Kao (AceLan) @ 2026-06-18  7:33 UTC (permalink / raw)
  To: Tony Nguyen, Przemek Kitszel
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, intel-wired-lan, netdev, linux-kernel

Some systems support MAC passthrough for dock Ethernet controllers by
having firmware rewrite the receive address registers after the controller
reset completes.

igc resets the controller before reading RAL0/RAH0, and that reset can
temporarily restore the controller's native MAC address. If the driver
reads the registers immediately, it can race with the firmware rewrite and
keep the native dock MAC instead of the host passthrough MAC.

For LMVP devices, poll RAL0/RAH0 after reset and before reading the MAC
address. Stop once the address registers change to a different valid
Ethernet address, giving the firmware a bounded window (up to 100 polls,
100 ms apart) to complete the passthrough update.

Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
---
 drivers/net/ethernet/intel/igc/igc_main.c | 48 +++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 2c9e2dfd8499..fa9752ed8bc5 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -11,6 +11,7 @@
 #include <net/pkt_sched.h>
 #include <linux/bpf_trace.h>
 #include <net/xdp_sock_drv.h>
+#include <linux/etherdevice.h>
 #include <linux/pci.h>
 #include <linux/mdio.h>
 
@@ -69,6 +70,52 @@ static const struct pci_device_id igc_pci_tbl[] = {
 
 MODULE_DEVICE_TABLE(pci, igc_pci_tbl);
 
+static void igc_read_rar0(struct igc_hw *hw, u8 *addr, u32 *ral, u32 *rah)
+{
+	*ral = rd32(IGC_RAL(0));
+	*rah = rd32(IGC_RAH(0));
+
+	addr[0] = *ral & 0xff;
+	addr[1] = (*ral >> 8) & 0xff;
+	addr[2] = (*ral >> 16) & 0xff;
+	addr[3] = (*ral >> 24) & 0xff;
+	addr[4] = *rah & 0xff;
+	addr[5] = (*rah >> 8) & 0xff;
+}
+
+static bool igc_is_lmvp_device(struct pci_dev *pdev)
+{
+	switch (pdev->device) {
+	case IGC_DEV_ID_I225_LMVP:
+	case IGC_DEV_ID_I226_LMVP:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static void igc_wait_for_lmvp_mac_passthrough(struct pci_dev *pdev,
+					      struct igc_hw *hw)
+{
+	u8 addr[ETH_ALEN] __aligned(2);
+	u32 orig_ral, orig_rah;
+	u32 ral, rah;
+	int i;
+
+	if (!igc_is_lmvp_device(pdev))
+		return;
+
+	igc_read_rar0(hw, addr, &orig_ral, &orig_rah);
+
+	for (i = 0; i < 100; i++) {
+		msleep(100);
+		igc_read_rar0(hw, addr, &ral, &rah);
+		if ((ral != orig_ral || rah != orig_rah) &&
+		    is_valid_ether_addr(addr))
+			return;
+	}
+}
+
 enum latency_range {
 	lowest_latency = 0,
 	low_latency = 1,
@@ -7259,6 +7306,7 @@ static int igc_probe(struct pci_dev *pdev,
 	 * known good starting state
 	 */
 	hw->mac.ops.reset_hw(hw);
+	igc_wait_for_lmvp_mac_passthrough(pdev, hw);
 
 	if (igc_get_flash_presence_i225(hw)) {
 		if (hw->nvm.ops.validate(hw) < 0) {
-- 
2.53.0
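The polling logic can be exercised without hardware. The sketch below reuses the RAL/RAH byte order from `igc_read_rar0()`, reimplements the `is_valid_ether_addr()` rule (unicast and non-zero), and drives a bounded poll from a simulated register source; `fake_rar`, `fake_read()` and `wait_for_passthrough()` are hypothetical, and the `msleep(100)` between reads is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_ALEN 6

/* Same byte order as igc_read_rar0() in the patch: RAL carries bytes 0-3,
 * least significant byte first, and the low 16 bits of RAH carry bytes 4-5. */
static void rar_to_mac(uint32_t ral, uint32_t rah, uint8_t addr[ETH_ALEN])
{
	addr[0] = ral & 0xff;
	addr[1] = (ral >> 8) & 0xff;
	addr[2] = (ral >> 16) & 0xff;
	addr[3] = (ral >> 24) & 0xff;
	addr[4] = rah & 0xff;
	addr[5] = (rah >> 8) & 0xff;
}

/* Same rule as the kernel's is_valid_ether_addr(): not multicast, not all-zero. */
static bool valid_ether_addr(const uint8_t a[ETH_ALEN])
{
	bool zero = !(a[0] | a[1] | a[2] | a[3] | a[4] | a[5]);

	return !(a[0] & 0x01) && !zero;
}

/* Hypothetical simulated RAR0: firmware rewrites it after `delay` reads. */
struct fake_rar {
	int reads, delay;
	uint32_t native_ral, native_rah;	/* controller's own address */
	uint32_t pt_ral, pt_rah;		/* passthrough address */
};

static void fake_read(struct fake_rar *f, uint32_t *ral, uint32_t *rah)
{
	bool rewritten = f->reads++ >= f->delay;

	*ral = rewritten ? f->pt_ral : f->native_ral;
	*rah = rewritten ? f->pt_rah : f->native_rah;
}

/* Bounded poll in the style of igc_wait_for_lmvp_mac_passthrough(), minus the
 * sleep. Returns the number of polls needed, or -1 on timeout. */
static int wait_for_passthrough(struct fake_rar *f, int max_polls)
{
	uint32_t orig_ral, orig_rah, ral, rah;
	uint8_t addr[ETH_ALEN];

	fake_read(f, &orig_ral, &orig_rah);
	for (int i = 0; i < max_polls; i++) {
		fake_read(f, &ral, &rah);
		rar_to_mac(ral, rah, addr);
		if ((ral != orig_ral || rah != orig_rah) && valid_ether_addr(addr))
			return i + 1;
	}
	return -1;
}
```

With the patch's parameters (100 polls, 100 ms apart), an LMVP device whose firmware never rewrites RAR0, or rewrites it to a multicast or zero address, runs the full loop and delays probe by at least 10 seconds.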




This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox