Netdev List


* Re: [PATCH net] octeontx2-af: npc: Log successful MCAM drop-on-non-hit install at debug level
From: Simon Horman @ 2026-06-16  7:14 UTC
  To: Ratheesh Kannoth
  Cc: kuba, linux-kernel, netdev, andrew+netdev, davem, edumazet,
	pabeni, sgoutham
In-Reply-To: <20260615033157.535237-1-rkannoth@marvell.com>

On Mon, Jun 15, 2026 at 09:01:57AM +0530, Ratheesh Kannoth wrote:
> npc_install_mcam_drop_rule() used dev_err() after a successful
> rvu_mbox_handler_npc_mcam_write_entry() call, so normal installs appeared
> as errors in dmesg.  Use dev_dbg() for the success path and keep dev_err()
> for real failures.
> 
> Fixes: 3571fe07a090 ("octeontx2-af: Drop rules for NPC MCAM")
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

Reviewed-by: Simon Horman <horms@kernel.org>


* Re: [PATCH net] octeontx2-pf: Fix leak of SQ timestamp buffer on teardown
From: Simon Horman @ 2026-06-16  7:10 UTC
  To: Ratheesh Kannoth
  Cc: amakarov, davem, jesse.brandeburg, kuba, linux-kernel, netdev,
	richardcochran, andrew+netdev, edumazet, pabeni, sgoutham
In-Reply-To: <20260615030704.504536-1-rkannoth@marvell.com>

On Mon, Jun 15, 2026 at 08:37:04AM +0530, Ratheesh Kannoth wrote:
> The send-queue timestamp ring is allocated with qmem_alloc() when
> timestamping is used, but otx2_free_sq_res() never freed sq->timestamps,
> leaking that memory across ifdown and device removal.  Add the missing
> qmem_free() alongside the other SQ companion buffers.
> 
> Fixes: c9c12d339d93 ("octeontx2-pf: Add support for PTP clock")
> Cc: Aleksey Makarov <amakarov@marvell.com>
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [PATCH net-next v3 2/4] udmabuf: emit one sg entry per pinned folio
From: Christian König @ 2026-06-16  7:04 UTC
  To: Jakub Kicinski, Bobby Eshleman
  Cc: Donald Hunter, David S. Miller, Eric Dumazet, Paolo Abeni,
	Simon Horman, Andrew Lunn, Gerd Hoffmann, Vivek Kasireddy,
	Sumit Semwal, Shuah Khan, netdev, linux-kernel, dri-devel,
	linux-media, linaro-mm-sig, linux-kselftest, sdf, razor, daniel,
	almasrymina, matttbe, skhawaja, dw, Bobby Eshleman
In-Reply-To: <20260615145757.0b2ddcf3@kernel.org>

On 6/15/26 23:57, Jakub Kicinski wrote:
> On Fri, 12 Jun 2026 09:25:58 -0700 Bobby Eshleman wrote:
>> dma_map_sgtable() does not always merge contiguous pages for us, so we
>> do this internally before exporting.
>>
>> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>> ---
>>  drivers/dma-buf/udmabuf.c 
> 
> This will need at the very least an ack from DMABUF maintainers,
> so it's a bit late to consider it for 7.2

Sorry for not replying earlier. I already nailed Bobby with questions on the why and what from the DMA-buf side, and while I don't have time to review the code in depth, the high-level rationale looks sane to me.

Feel free to add Acked-by: Christian König <christian.koenig@amd.com>

Regards,
Christian.


* Re: [PATCH v3 0/3] net: stmmac: L3/L4 filter bug fixes
From: Nazle Asmade, Muhammad Nazim Amirul @ 2026-06-16  6:57 UTC
  To: Maxime Chevallier, netdev@vger.kernel.org
  Cc: andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, rmk+kernel@armlinux.org.uk,
	Jose.Abreu@synopsys.com, linux-kernel@vger.kernel.org
In-Reply-To: <fbe4294d-98af-4817-97de-9b20df89a240@bootlin.com>

On 16/6/2026 2:06 pm, Maxime Chevallier wrote:
> Hi Nazim,
>
> On 6/16/26 06:26, muhammad.nazim.amirul.nazle.asmade@altera.com wrote:
>> From: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
>>
>> This series fixes three bugs in the stmmac L3/L4 TC flower filter
>> implementation for the XGMAC2 core. All three patches target net.
>>
>> The L3/L4 filter match count statistics patch (originally patch 4/4)
>> has been split out and will be sent separately against net-next per
>> Andrew Lunn's review of v1.
>>
>> Patch 1 fixes a register corruption bug in the L4 filter port configuration.
>> The XGMAC_L4_ADDR register holds both source and destination port match
>> values in a single register. The original code overwrites the entire register
>> when setting either field, silently erasing the other. This is fixed by
>> using a read-modify-write sequence.
>>
>> Patch 2 fixes the basic flow match parser to properly reject unsupported
>> offload requests with -EOPNOTSUPP instead of silently accepting them.
>> Unsupported cases include partial protocol masks, non-IPv4 network proto,
>> and non-TCP/UDP transport proto. Extack messages are now included so users
>> know exactly which part of the match is unsupported. The -EOPNOTSUPP is
>> also now returned directly instead of using break, which was silently
>> discarding the error on FLOW_CLS_REPLACE operations.
>>
>> Patch 3 fixes a stale action bug on filter deletion. When a filter entry
>> with a drop action is deleted, the action field was not reset, causing
>> it to persist and potentially affect subsequent filter configurations.
>>
>> All three patches fix the original L3/L4 filter implementation introduced in
>> 425eabddaf0f ("net: stmmac: Implement L3/L4 Filters using TC Flower").
>>
>> Changes in v3:
>> - Patch 2: add extack messages to each -EOPNOTSUPP return (Jakub Kicinski)
>> - Patch 2: return -EOPNOTSUPP directly instead of break to avoid silently
>>    reporting success on unsupported FLOW_CLS_REPLACE (Sashiko review)
>
> Please take a look at this page prior to reposting :
>
> https://netdev.bots.linux.dev/net-next.html
>
> There's also an announcement made on the netdev@ list when net-next
> opens/closes.
>
> You can't submit new series that target net-next during the merge window,
> this revision will have to wait 2 weeks.
>
> Maxime
>
Ah I see, Thanks Maxime :)
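
The read-modify-write fix that the quoted cover letter describes for patch 1 can be sketched in plain user-space C. This is only an illustration: the field layout (source port in the low 16 bits, destination port in the high 16 bits), the mask names, and the l4_set_* helpers are assumptions made for the example and are not taken from the stmmac driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for illustration: source port in bits 15:0,
 * destination port in bits 31:16 of one 32-bit register.
 */
#define L4_SP_MASK	0x0000ffffu
#define L4_DP_MASK	0xffff0000u
#define L4_DP_SHIFT	16

/* Buggy pattern: writes the whole register, erasing the other field. */
uint32_t l4_set_sport_overwrite(uint32_t reg, uint16_t sport)
{
	(void)reg;		/* old contents are ignored */
	return sport;
}

/* Read-modify-write: clear only the field being set, keep the rest. */
uint32_t l4_set_sport_rmw(uint32_t reg, uint16_t sport)
{
	reg &= ~L4_SP_MASK;
	reg |= (uint32_t)sport & L4_SP_MASK;
	return reg;
}

uint32_t l4_set_dport_rmw(uint32_t reg, uint16_t dport)
{
	reg &= ~L4_DP_MASK;
	reg |= ((uint32_t)dport << L4_DP_SHIFT) & L4_DP_MASK;
	return reg;
}
```

Setting the destination port and then the source port keeps both values with the read-modify-write helpers, while the overwrite variant leaves only the source port behind.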


* Re: [PATCH net] net: dsa: Fix skb ownership in taggers
From: David Yang @ 2026-06-16  6:54 UTC
  To: Linus Walleij
  Cc: Andrew Lunn, Vladimir Oltean, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Florian Fainelli,
	Jonas Gorski, Hauke Mehrtens, Kurt Kanzenbach, Woojung Huh,
	UNGLinuxDriver, Chester A. Unal, Daniel Golle, Matthias Brugger,
	AngeloGioacchino Del Regno, Wei Fang, Clark Wang,
	Clément Léger, George McCollister, netdev,
	Sashiko AI Review
In-Reply-To: <20260616-dsa-fix-free-skb-v1-1-fd30b35dcf66@kernel.org>

On Tue, Jun 16, 2026 at 6:33 AM Linus Walleij <linusw@kernel.org> wrote:
>
> The tag_8021q.c tagger calls vlan_insert_tag() in dsa_8021q_xmit().
> vlan_insert_tag() will consume the skb with kfree_skb() on failure
> and return NULL.
>
> When NULL is returned as error code to ->xmit() in dsa_user_xmit()
> it will free the same skb again leading to a double-free.
>
> The idea of dsa_user_xmit() and dsa_switch_rcv() dropping the skb
> they held before the call to ->xmit() and ->rcv() is conceptually
> wrong: the pattern elsewhere in the networking code is that consumers
> drop their skb:s on failure.
>
> Modify the ->xmit() and ->rcv() call sites to not drop the SKB if
> the taggers return NULL from any of these calls. Move those drops into
> the taggers so every callback error path that retains ownership consumes
> the skb before returning NULL.
>
> Keep the existing helper ownership rules: VLAN insertion helpers already
> free on failure (this is the case in tag_8021q.c), while deferred
> transmit paths either transfer the skb reference to worker context or
> hold a worker reference with skb_get() and drop the caller's reference.
>
> For SJA1105 meta RX, transfer the buffered stampable skb under the meta
> lock and return NULL while the skb is waiting for its meta frame: the
> skb is not dropped in this case.
>
> Reported-by: Sashiko AI Review <sashiko-bot@kernel.org>
> Closes: https://lore.kernel.org/r/20260610153952.1685895-1-kuba@kernel.org/
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Assisted-by: Codex:gpt-5-5
> Signed-off-by: Linus Walleij <linusw@kernel.org>

Better to use goto err, but I'm fine with the current patch. In either case,

Acked-by: David Yang <mmyangfl@gmail.com> # yt921x

> ---
>  net/dsa/tag_yt921x.c        |  7 ++++++-
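
The ownership rule in the quoted commit message can be modelled with a toy buffer that counts how often it is freed. This is a user-space sketch, not DSA code: struct toy_skb and every toy_* function are invented for illustration, and the only behaviour assumed is what the commit message states (the tag-insertion helper frees the buffer on failure and returns NULL).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an skb: counts how many times it has been freed. */
struct toy_skb {
	int free_count;
};

void toy_kfree_skb(struct toy_skb *skb)
{
	skb->free_count++;	/* a real double free would corrupt memory */
}

/* Helper that consumes the buffer on failure and returns NULL,
 * as the commit message describes for vlan_insert_tag().
 */
struct toy_skb *toy_insert_tag(struct toy_skb *skb, bool fail)
{
	if (fail) {
		toy_kfree_skb(skb);
		return NULL;
	}
	return skb;
}

/* Tagger callback: on NULL the buffer has already been consumed. */
struct toy_skb *toy_tagger_xmit(struct toy_skb *skb, bool fail)
{
	return toy_insert_tag(skb, fail);
}

/* Old caller rule: free the original skb whenever xmit returns NULL. */
void toy_user_xmit_old(struct toy_skb *skb, bool fail)
{
	if (!toy_tagger_xmit(skb, fail))
		toy_kfree_skb(skb);	/* second free of the same buffer */
}

/* New caller rule: a NULL return means the tagger consumed the skb. */
void toy_user_xmit_new(struct toy_skb *skb, bool fail)
{
	if (!toy_tagger_xmit(skb, fail))
		return;
	/* success path: hand the tagged skb on (omitted in this sketch) */
}
```

With the old caller rule a failing tag insertion frees the buffer twice; with the new rule it is freed exactly once, by the code that owned it when the failure happened.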


* Re: [PATCH v18 net-next 07/11] net/nebula-matrix: add intr resource implementation
From: Illusion Wang @ 2026-06-16  6:45 UTC
  To: Illusion Wang, Dimon, Alvin, Sam, netdev
  Cc: andrew+netdev, corbet, kuba, horms, linux-doc, pabeni,
	vadim.fedorenko, lukas.bulwahn, edumazet, enelsonmoore, skhan,
	hkallweit1, open list
In-Reply-To: <20260611044916.2383-8-illusion.wang@nebula-matrix.com>

>commit 7a32ca8335c9041ec03b80d30716c474e81723ad
>Author: illusion.wang <illusion.wang@nebula-matrix.com>
>net/nebula-matrix: add intr resource implementation
>This patch adds the nbl_interrupt module for management of driver-private
>MSI-X indices, plus hw_ops callbacks to program the function MSI-X mapping
>registers, configure MSI-X info, and enable mailbox interrupts.
>> diff --git a/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_interrupt.c b/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_interrupt.c
>> new file mode 100644
>> index 0000000000000..6d6924e7d9a82
>> --- /dev/null
>> +++ b/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_interrupt.c
>[ ... ]
>> +int nbl_res_intr_destroy_msix_map(struct nbl_resource_mgt *res_mgt,
>> +				  u16 func_id)
>> +{
>[ ... ]
>> +	/* use ctrl dev bdf */
>> +	hw_ops->configure_msix_map(res_mgt->hw_ops_tbl->priv, func_id, false, 0,
>> +				   0, 0, 0);
>> +
>> +	intr_mgt->func_intr_res[func_id].interrupts = NULL;
>> +	intr_mgt->func_intr_res[func_id].num_interrupts = 0;
>> +	intr_mgt->func_intr_res[func_id].num_net_interrupts = 0;
>> +	hw_ops->flush_write(res_mgt->hw_ops_tbl->priv);
>> +
>> +	return 0;
>> +}
>Does this code leak the device-managed memory? nbl_res_intr_configure_msix_map()
>allocates interrupts via devm_kcalloc() and msix_map_table->base_addr via
>dmam_alloc_coherent(). Setting the pointer to NULL here without calling
>devm_kfree() and dmam_free_coherent() means the memory won't be freed until
>the device is detached. If the MSI-X map is repeatedly reconfigured, could
>this exhaust memory?
>[ ... ]
>> +int nbl_res_intr_configure_msix_map(struct nbl_resource_mgt *res_mgt,
>> +				    u16 func_id, u16 num_net_msix,
>> +				    u16 num_others_msix,
>> +				    bool net_msix_mask_en)
>> +{
>[ ... ]
>> +	msix_map_table = &intr_mgt->func_intr_res[func_id].msix_map_table;
>> +	WARN_ON(msix_map_table->base_addr);
>> +	msix_map_table->size =
>> +		sizeof(struct nbl_msix_map) * NBL_MSIX_MAP_TABLE_MAX_ENTRIES;
>> +	/* HW indexes the MSI-X map table with a fixed stride of 1024 entries,
>> +	 * so we must allocate the full table even if fewer vectors are used.
>> +	 */
>> +	msix_map_table->base_addr = dmam_alloc_coherent(dev,
>> +							msix_map_table->size,
>> +							&msix_map_table->dma,
>> +							GFP_KERNEL);
>> +	if (!msix_map_table->base_addr) {
>> +		dev_err(dev, "Allocate DMA memory for function msix map table failed\n");
>> +		msix_map_table->size = 0;
>> +		return -ENOMEM;
>> +	}
>> +
>> +	interrupts = devm_kcalloc(dev, requested, sizeof(interrupts[0]),
>> +				  GFP_KERNEL);
>> +	if (!interrupts) {
>> +		ret = -ENOMEM;
>> +		goto alloc_interrupts_err;
>> +	}
>[ ... ]
>> +get_interrupt_err:
>> +	while (i--) {
>> +		intr_index = interrupts[i];
>> +		if (intr_index >= NBL_NET_INTR_BASE)
>> +			clear_bit(intr_index - NBL_NET_INTR_BASE,
>> +				  intr_mgt->interrupt_net_bitmap);
>> +		else
>> +			clear_bit(intr_index,
>> +				  intr_mgt->interrupt_others_bitmap);
>> +	}
>> +
>> +	intr_mgt->func_intr_res[func_id].num_interrupts = 0;
>> +	intr_mgt->func_intr_res[func_id].interrupts = NULL;
>> +	intr_mgt->func_intr_res[func_id].num_net_interrupts = 0;
>> +alloc_interrupts_err:
>> +
>> +	return ret;
>> +}
>If an error occurs after allocating the memory (e.g., jumping to
>get_interrupt_err or alloc_interrupts_err), does this error path also leak
>the devm_kcalloc() and dmam_alloc_coherent() memory by simply returning
>without freeing it?

Both use devm-managed resources, so I think no manual free is needed.

destroy_msix_map: The interrupts array is allocated with devm_kcalloc(), and
the MSI-X map table is allocated with dmam_alloc_coherent() on the device's
DMA pool. Both are tied to the device lifecycle — they are automatically
reclaimed on driver detach. Setting the pointer to NULL is defensive;
the devm framework handles the actual free.

configure_msix_map error path: Same reasoning. interrupts is devm_kcalloc(),
and the DMA table is device-managed. On any error, the function returns to
the caller, which will eventually trigger driver remove and devm cleanup.
There is no repeated-reconfig loop in the current design — configure_msix_map
is only called during probe/reinit, not in a hot path — so exhaustion is not
a practical concern. Adding manual free calls would duplicate what devm already
does and risk double-free.
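
The lifetime question in this exchange can be modelled in user space. The sketch below is a toy, not the kernel's devres implementation, and every toy_* name is invented for illustration. It assumes only what both sides of the thread state: managed allocations stay alive until detach unless they are released earlier. In the kernel, devm_kfree() and dmam_free_coherent() are the early-release counterparts of the two allocators discussed above.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy device-managed allocator: every allocation is linked to the
 * device and released when the device is detached.
 */
struct toy_res {
	struct toy_res *next;
};

struct toy_dev {
	struct toy_res *res;	/* list of live managed allocations */
	int live;		/* number of entries on the list */
};

void *toy_devm_alloc(struct toy_dev *dev, size_t size)
{
	struct toy_res *r = malloc(sizeof(*r) + size);

	if (!r)
		return NULL;
	r->next = dev->res;
	dev->res = r;
	dev->live++;
	return r + 1;		/* payload follows the list header */
}

/* Explicit early release: unlink the entry first, then free it, so the
 * detach path never sees it again.
 */
void toy_devm_free(struct toy_dev *dev, void *p)
{
	struct toy_res **pp, *r = (struct toy_res *)p - 1;

	for (pp = &dev->res; *pp; pp = &(*pp)->next) {
		if (*pp == r) {
			*pp = r->next;
			dev->live--;
			free(r);
			return;
		}
	}
}

/* Detach: release whatever is still on the list. */
void toy_dev_detach(struct toy_dev *dev)
{
	while (dev->res) {
		struct toy_res *r = dev->res;

		dev->res = r->next;
		dev->live--;
		free(r);
	}
}
```

In this model, dropping the pointer without an early release leaves the allocation on the list until detach, so each configure/destroy cycle adds one more live entry. An early release unlinks the entry before freeing it, so the detach path does not free it a second time. Whether the accumulation matters in the driver depends on how often the map is reconfigured, which the thread leaves open.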


* [PATCH net] net: llc: make empty have static storage duration
From: Wentao Guan @ 2026-06-16  6:40 UTC
  To: kuba; +Cc: joel.granados, netdev, linux-kernel, zhanjun, niecheng1,
	Wentao Guan

Make empty have static storage duration (as net/sysctl_net.c does) to
avoid a potential use-after-return and to stay consistent with the
__register_sysctl_table() requirement that @table "should not be free'd
after registration".

Fixes: 73dbd8cf7947 ("net: Remove ctl_table sentinel elements from several networking subsystems")
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
---
 net/llc/sysctl_net_llc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/llc/sysctl_net_llc.c b/net/llc/sysctl_net_llc.c
index c8d88e2508fce..15f1e5d88f208 100644
--- a/net/llc/sysctl_net_llc.c
+++ b/net/llc/sysctl_net_llc.c
@@ -47,7 +47,7 @@ static struct ctl_table_header *llc_station_header;
 
 int __init llc_sysctl_init(void)
 {
-	struct ctl_table empty[1] = {};
+	static struct ctl_table empty[1] = {};
 	llc2_timeout_header = register_net_sysctl(&init_net, "net/llc/llc2/timeout", llc2_timeout_table);
 	llc_station_header = register_net_sysctl_sz(&init_net, "net/llc/station", empty, 0);
 
-- 
2.30.2
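
The storage-duration point in the commit message above can be shown with a small user-space sketch. It assumes, as the quoted "should not be free'd after registration" wording suggests, that registration keeps a pointer to the caller's table rather than copying it. struct toy_ctl_table and the toy_* functions are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct toy_ctl_table {
	const char *procname;
};

/* The registry keeps the caller's pointer instead of copying the table. */
static const struct toy_ctl_table *registered_table;

void toy_register(const struct toy_ctl_table *table)
{
	registered_table = table;
}

/* With "static", the array outlives the call, so the stored pointer
 * stays valid. Without it, the array would live in this function's
 * stack frame and the stored pointer would dangle after return.
 */
void toy_sysctl_init(void)
{
	static struct toy_ctl_table empty[1] = { { NULL } };

	toy_register(empty);
}

const struct toy_ctl_table *toy_registered(void)
{
	return registered_table;
}
```

After toy_sysctl_init() returns, toy_registered() still points at valid storage, which is the property the one-word change in the patch is after.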



* Re: [PATCH net] net: dsa: Fix skb ownership in taggers
From: Kurt Kanzenbach @ 2026-06-16  6:38 UTC
  To: Linus Walleij, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Florian Fainelli, Jonas Gorski, Hauke Mehrtens, Woojung Huh,
	UNGLinuxDriver, Chester A. Unal, Daniel Golle, Matthias Brugger,
	AngeloGioacchino Del Regno, Wei Fang, Clark Wang,
	Clément Léger, George McCollister, David Yang
  Cc: netdev, Sashiko AI Review, Linus Walleij
In-Reply-To: <20260616-dsa-fix-free-skb-v1-1-fd30b35dcf66@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 1676 bytes --]

On Tue Jun 16 2026, Linus Walleij wrote:
> The tag_8021q.c tagger calls vlan_insert_tag() in dsa_8021q_xmit().
> vlan_insert_tag() will consume the skb with kfree_skb() on failure
> and return NULL.
>
> When NULL is returned as error code to ->xmit() in dsa_user_xmit()
> it will free the same skb again leading to a double-free.
>
> The idea of dsa_user_xmit() and dsa_switch_rcv() dropping the skb
> they held before the call to ->xmit() and ->rcv() is conceptually
> wrong: the pattern elsewhere in the networking code is that consumers
> drop their skb:s on failure.
>
> Modify the ->xmit() and ->rcv() call sites to not drop the SKB if
> the taggers return NULL from any of these calls. Move those drops into
> the taggers so every callback error path that retains ownership consumes
> the skb before returning NULL.
>
> Keep the existing helper ownership rules: VLAN insertion helpers already
> free on failure (this is the case in tag_8021q.c), while deferred
> transmit paths either transfer the skb reference to worker context or
> hold a worker reference with skb_get() and drop the caller's reference.
>
> For SJA1105 meta RX, transfer the buffered stampable skb under the meta
> lock and return NULL while the skb is waiting for its meta frame: the
> skb is not dropped in this case.
>
> Reported-by: Sashiko AI Review <sashiko-bot@kernel.org>
> Closes: https://lore.kernel.org/r/20260610153952.1685895-1-kuba@kernel.org/
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Assisted-by: Codex:gpt-5-5
> Signed-off-by: Linus Walleij <linusw@kernel.org>

Acked-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek

> ---
>  net/dsa/tag_hellcreek.c     |  9 +++++++--

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 861 bytes --]


* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
From: Joanne Koong @ 2026-06-16  6:38 UTC
  To: Askar Safin
  Cc: akpm, axboe, bernd, brauner, david, dhowells, fuse-devel, hch,
	jack, linux-api, linux-fsdevel, linux-kernel, linux-mm, miklos,
	netdev, patches, pfalcato, rostedt, torvalds, val, viro, willy
In-Reply-To: <20260616011516.4039110-1-safinaskar@gmail.com>

On Mon, Jun 15, 2026 at 9:15 PM Askar Safin <safinaskar@gmail.com> wrote:
>
> Joanne Koong <joannelkoong@gmail.com>:
> > > speaking of fuse_dev_splice……_write actually, this series has broken
> > > xdg-document-portal!
> > >
> > > https://github.com/flatpak/xdg-desktop-portal/issues/2026
> > >
> > > Specifically what happens is that the EINVAL is returned due to oh.len
> > > != nbytes:
> > >
> > > fuse_dev_do_write: oh.len 16400 != nbytes 15526
> > >
> > > (where 16400 == 16384 (read len) + 16, 15526 == 15510 (file len) + 16)
> > >
> > > After reverting the series, there is no error because oh.len
> > > becomes 15526 too.
> >
> > I think this is because of how libfuse handles eof / short reads. When
> > it detects a short read, it fixes up the header length after the
> > header was already vmspliced to the pipe because it assumes vmsplice
> > mapped the header's page into the pipe by reference. It assumes that
> > modifying the header length in place gets then reflected in what the
> > pipe later splices out.
> >
> > The logic for this happens in fuse_send_data_iov() [1]:
> > a) sets out->len = headerlen (16) + len (16384) = 16400 in the
> > stack-allocated fuse_out_header
> > b) vmsplices the header to the pipe
> > c) splices the backing file to the pipe. if this hits EOF, it'll get
> > back 15510 instead of 16384
> > d) detects the short read [2], fixes up the stack out->len = 16 + 15510 = 15526
> > e) splices the pipe to /dev/fuse
> >
> > After this patch, step b) is a straight copy which means step d)'s
> > fixup doesn't modify what's in the pipe. This could be fixed up in
> > libfuse to not depend on modify-after-vmsplice, but I don't think this
> > helps for applications using already-released libfuse versions. I
> > think this patch needs to be reverted.
> >
> > Thanks,
> > Joanne
> >
> > [1] https://github.com/libfuse/libfuse/blob/master/lib/fuse_lowlevel.c#L846
> > [2] https://github.com/libfuse/libfuse/blob/master/lib/fuse_lowlevel.c#L956
>
> Uh, this is very unfortunate. But I still want to remove vmsplice.
> Maybe we can somehow save my patchsets? For example, let's return EINVAL
> for this particular combination (writable pipe + SPLICE_F_NONBLOCK).

writable pipe + SPLICE_F_NONBLOCK is a valid vmsplice call today, so I
think returning -EINVAL would still cause regressions. It happens to
be a workaround for libfuse only because libfuse falls back to
writev() when vmsplice fails, but I don't think we can assume other
callers have the same fallback.

Thanks,
Joanne

>
> --
> Askar Safin


* [PATCH 5.10/5.15/6.1/6.6/6.12/6.18] ipvs: skip ipv6 extension headers for csum checks
From: Nazar Kalashnikov @ 2026-06-16  6:30 UTC
  To: stable, Greg Kroah-Hartman
  Cc: Nazar Kalashnikov, Simon Horman, Julian Anastasov,
	Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
	Phil Sutter, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Venkata Mohan Reddy, Patrick McHardy, Julius Volz,
	netdev, lvs-devel, netfilter-devel, coreteam, linux-kernel,
	Wensong Zhang, lvc-project

From: Julian Anastasov <ja@ssi.bg>

commit 05cfe9863ef049d98141dc2969eefde72fb07625 upstream.

Protocol checksum validation fails for IPv6 if there are extension
headers before the protocol header. iph->len already contains its
offset, so use it to fix the problem.

Fixes: 2906f66a5682 ("ipvs: SCTP Trasport Loadbalancing Support")
Fixes: 0bbdd42b7efa ("IPVS: Extend protocol DNAT/SNAT and state handlers")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Nazar Kalashnikov <nazarkalashnikov0@gmail.com>
---
Backport fix for CVE-2026-45850
 net/netfilter/ipvs/ip_vs_proto_sctp.c | 18 ++++++------------
 net/netfilter/ipvs/ip_vs_proto_tcp.c  | 21 +++++++--------------
 net/netfilter/ipvs/ip_vs_proto_udp.c  | 20 +++++++-------------
 3 files changed, 20 insertions(+), 39 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c b/net/netfilter/ipvs/ip_vs_proto_sctp.c
index 83e452916403..63c78a1f3918 100644
--- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
@@ -10,7 +10,8 @@
 #include <net/ip_vs.h>
 
 static int
-sctp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp);
+sctp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp,
+		unsigned int sctphoff);
 
 static int
 sctp_conn_schedule(struct netns_ipvs *ipvs, int af, struct sk_buff *skb,
@@ -108,7 +109,7 @@ sctp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
 		int ret;
 
 		/* Some checks before mangling */
-		if (!sctp_csum_check(cp->af, skb, pp))
+		if (!sctp_csum_check(cp->af, skb, pp, sctphoff))
 			return 0;
 
 		/* Call application helper if needed */
@@ -156,7 +157,7 @@ sctp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
 		int ret;
 
 		/* Some checks before mangling */
-		if (!sctp_csum_check(cp->af, skb, pp))
+		if (!sctp_csum_check(cp->af, skb, pp, sctphoff))
 			return 0;
 
 		/* Call application helper if needed */
@@ -185,19 +186,12 @@ sctp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
 }
 
 static int
-sctp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp)
+sctp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp,
+		unsigned int sctphoff)
 {
-	unsigned int sctphoff;
 	struct sctphdr *sh;
 	__le32 cmp, val;
 
-#ifdef CONFIG_IP_VS_IPV6
-	if (af == AF_INET6)
-		sctphoff = sizeof(struct ipv6hdr);
-	else
-#endif
-		sctphoff = ip_hdrlen(skb);
-
 	sh = (struct sctphdr *)(skb->data + sctphoff);
 	cmp = sh->checksum;
 	val = sctp_compute_cksum(skb, sctphoff);
diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c
index 7da51390cea6..ede4fa3b63f5 100644
--- a/net/netfilter/ipvs/ip_vs_proto_tcp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c
@@ -29,7 +29,8 @@
 #include <net/ip_vs.h>
 
 static int
-tcp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp);
+tcp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp,
+	       unsigned int tcphoff);
 
 static int
 tcp_conn_schedule(struct netns_ipvs *ipvs, int af, struct sk_buff *skb,
@@ -166,7 +167,7 @@ tcp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
 		int ret;
 
 		/* Some checks before mangling */
-		if (!tcp_csum_check(cp->af, skb, pp))
+		if (!tcp_csum_check(cp->af, skb, pp, tcphoff))
 			return 0;
 
 		/* Call application helper if needed */
@@ -244,7 +245,7 @@ tcp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
 		int ret;
 
 		/* Some checks before mangling */
-		if (!tcp_csum_check(cp->af, skb, pp))
+		if (!tcp_csum_check(cp->af, skb, pp, tcphoff))
 			return 0;
 
 		/*
@@ -301,17 +302,9 @@ tcp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
 
 
 static int
-tcp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp)
+tcp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp,
+	       unsigned int tcphoff)
 {
-	unsigned int tcphoff;
-
-#ifdef CONFIG_IP_VS_IPV6
-	if (af == AF_INET6)
-		tcphoff = sizeof(struct ipv6hdr);
-	else
-#endif
-		tcphoff = ip_hdrlen(skb);
-
 	switch (skb->ip_summed) {
 	case CHECKSUM_NONE:
 		skb->csum = skb_checksum(skb, tcphoff, skb->len - tcphoff, 0);
@@ -322,7 +315,7 @@ tcp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp)
 			if (csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
 					    &ipv6_hdr(skb)->daddr,
 					    skb->len - tcphoff,
-					    ipv6_hdr(skb)->nexthdr,
+					    IPPROTO_TCP,
 					    skb->csum)) {
 				IP_VS_DBG_RL_PKT(0, af, pp, skb, 0,
 						 "Failed checksum for");
diff --git a/net/netfilter/ipvs/ip_vs_proto_udp.c b/net/netfilter/ipvs/ip_vs_proto_udp.c
index 68260d91c988..ffbebda547fc 100644
--- a/net/netfilter/ipvs/ip_vs_proto_udp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_udp.c
@@ -25,7 +25,8 @@
 #include <net/ip6_checksum.h>
 
 static int
-udp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp);
+udp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp,
+	       unsigned int udphoff);
 
 static int
 udp_conn_schedule(struct netns_ipvs *ipvs, int af, struct sk_buff *skb,
@@ -155,7 +156,7 @@ udp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
 		int ret;
 
 		/* Some checks before mangling */
-		if (!udp_csum_check(cp->af, skb, pp))
+		if (!udp_csum_check(cp->af, skb, pp, udphoff))
 			return 0;
 
 		/*
@@ -238,7 +239,7 @@ udp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
 		int ret;
 
 		/* Some checks before mangling */
-		if (!udp_csum_check(cp->af, skb, pp))
+		if (!udp_csum_check(cp->af, skb, pp, udphoff))
 			return 0;
 
 		/*
@@ -297,17 +298,10 @@ udp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
 
 
 static int
-udp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp)
+udp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp,
+	       unsigned int udphoff)
 {
 	struct udphdr _udph, *uh;
-	unsigned int udphoff;
-
-#ifdef CONFIG_IP_VS_IPV6
-	if (af == AF_INET6)
-		udphoff = sizeof(struct ipv6hdr);
-	else
-#endif
-		udphoff = ip_hdrlen(skb);
 
 	uh = skb_header_pointer(skb, udphoff, sizeof(_udph), &_udph);
 	if (uh == NULL)
@@ -325,7 +319,7 @@ udp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp)
 				if (csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
 						    &ipv6_hdr(skb)->daddr,
 						    skb->len - udphoff,
-						    ipv6_hdr(skb)->nexthdr,
+						    IPPROTO_UDP,
 						    skb->csum)) {
 					IP_VS_DBG_RL_PKT(0, af, pp, skb, 0,
 							 "Failed checksum for");
-- 
2.47.3
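
Why sizeof(struct ipv6hdr) is the wrong transport offset when extension headers are present can be seen in a small user-space parser. This is a sketch for illustration, not the IPVS code path: the toy_* names are invented, only hop-by-hop, routing, and destination-options headers are walked, and fragment or other header types are left out on purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IPV6_HDR_LEN	40
#define TOY_NEXTHDR_HOP		0
#define TOY_NEXTHDR_TCP		6
#define TOY_NEXTHDR_ROUTING	43
#define TOY_NEXTHDR_DEST	60

/* Walk the extension header chain and return the offset of the
 * upper-layer header, or -1 if the packet is too short. The
 * upper-layer protocol number is stored in *proto.
 */
int toy_ipv6_transport_offset(const uint8_t *pkt, size_t len, uint8_t *proto)
{
	size_t off = TOY_IPV6_HDR_LEN;
	uint8_t nexthdr;

	if (len < TOY_IPV6_HDR_LEN)
		return -1;
	nexthdr = pkt[6];	/* Next Header field of the fixed header */

	while (nexthdr == TOY_NEXTHDR_HOP || nexthdr == TOY_NEXTHDR_ROUTING ||
	       nexthdr == TOY_NEXTHDR_DEST) {
		if (off + 8 > len)
			return -1;
		nexthdr = pkt[off];
		/* Hdr Ext Len counts 8-octet units beyond the first 8 octets */
		off += ((size_t)pkt[off + 1] + 1) * 8;
	}
	if (off > len)
		return -1;

	*proto = nexthdr;
	return (int)off;
}
```

For a packet with one 8-byte hop-by-hop header in front of TCP, the transport header starts at offset 48 rather than 40, and the Next Header field of the fixed header is 0 rather than 6. That second point is consistent with the patch passing IPPROTO_TCP and IPPROTO_UDP to csum_ipv6_magic() instead of ipv6_hdr(skb)->nexthdr.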


* Re: [BUG] kernel BUG in team driver: buffer overflow in team_add_slave()
From: Mukesh Kumar Chaurasiya @ 2026-06-16  6:26 UTC
  To: Yeswanth Krishna; +Cc: netdev, venkat88, linux-kernel, linuxppc-dev
In-Reply-To: <ad80ee33-400b-4c63-8197-838c3cf85cef@linux.ibm.com>

On Tue, Jun 16, 2026 at 11:38:37AM +0530, Yeswanth Krishna wrote:
> 
> > Please add below reported-by tag:
> > Reported-by: Yeswanth Krishna Tellakula <yeswanth@linux.ibm.com>
> > 
> > 
I am also not able to reproduce it.
Can you paste the full report of the crash?

Regards,
Mukesh


* Re: [PATCH bpf-next 2/2] selftests/bpf: Cover small conntrack opts error writes
From: bot+bpf-ci @ 2026-06-16  6:19 UTC
  To: chenyy23, bpf, netfilter-devel
  Cc: chenyy23, pablo, fw, phil, davem, edumazet, kuba, pabeni, horms,
	andrii, eddyz87, ast, daniel, memxor, martin.lau, song,
	yonghong.song, jolsa, emil, shuah, kartikey406, coreteam, netdev,
	linux-kernel, linux-kselftest, ast, andrii, daniel, martin.lau,
	eddyz87, yonghong.song, clm, ihor.solodrai
In-Reply-To: <c4c898dd23181b676ebf6b6b4d9c54f51bb69c75.1781586477.git.chenyy23@mails.tsinghua.edu.cn>

[-- Attachment #1: Type: text/plain, Size: 1565 bytes --]

> selftests/bpf: Cover small conntrack opts error writes
>
> Add a conntrack kfunc regression check for opts__sz values that do not
> cover opts->error. The BPF program initializes opts->error with a guard
> value, calls the lookup and allocation kfuncs with opts__sz set to
> sizeof(opts->netns_id), and verifies that the guard is still intact
> after the kfunc returns NULL.
>
> Without the conntrack wrapper guard, the kfunc error path overwrites
> that guard with -EINVAL even though the verifier checked only the first
> four bytes of the options object.
>
> Signed-off-by: Yiyang Chen <chenyy23@mails.tsinghua.edu.cn>

This is a regression test for the out-of-bounds opts->error write that
is fixed in the prior series commit 86b813c37ee5 ("bpf: Guard conntrack
opts error writes"), but it carries no Fixes: tags.

A regression test usually carries the same Fixes: tags as the fix it
covers, so it gets backported to stable alongside that fix. The two new
test cases map onto the two commits that introduced the kfuncs: the
small-opts lookup case covers the lookup kfunc and the small-opts alloc
case covers the alloc kfunc.

Should this include:

  Fixes: b4c2b9593a1c ("net/netfilter: Add unstable CT lookup helpers for XDP and TC-BPF")
  Fixes: d7e79c97c00c ("net: netfilter: Add kfuncs to allocate and insert CT")


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27597459087
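
The guard that the quoted commit message tests for can be sketched as a size check before the error write. This is a user-space illustration under assumptions: struct toy_ct_opts is a hypothetical layout with netns_id in the first four bytes and error right after it, and toy_set_error() is an invented helper, not the kernel's conntrack wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical options layout: netns_id first, error right after it. */
struct toy_ct_opts {
	int32_t netns_id;
	int32_t error;
};

/* Write the error code only when the caller-declared size covers the
 * whole error field. Returns 1 if the write happened, 0 otherwise.
 */
int toy_set_error(struct toy_ct_opts *opts, uint32_t opts_sz, int32_t err)
{
	if (opts_sz < offsetof(struct toy_ct_opts, error) + sizeof(opts->error))
		return 0;
	opts->error = err;
	return 1;
}
```

With opts_sz set to sizeof(opts->netns_id), the helper leaves a guard value in error untouched; with the full structure size it stores the error code.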


* Re: [BUG] kernel BUG in team driver: buffer overflow in team_add_slave()
From: Yeswanth Krishna @ 2026-06-16  6:08 UTC
  To: netdev, venkat88; +Cc: linux-kernel, linuxppc-dev
In-Reply-To: <a08b0a7f-089f-4428-9360-9edbec3a5453@linux.ibm.com>


> Please add below reported-by tag:
> Reported-by: Yeswanth Krishna Tellakula <yeswanth@linux.ibm.com>
>
>


* Re: [PATCH v3 0/3] net: stmmac: L3/L4 filter bug fixes
From: Maxime Chevallier @ 2026-06-16  6:06 UTC
  To: muhammad.nazim.amirul.nazle.asmade, netdev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, rmk+kernel,
	Jose.Abreu, linux-kernel
In-Reply-To: <20260616042655.7782-1-muhammad.nazim.amirul.nazle.asmade@altera.com>

Hi Nazim,

On 6/16/26 06:26, muhammad.nazim.amirul.nazle.asmade@altera.com wrote:
> From: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
> 
> This series fixes three bugs in the stmmac L3/L4 TC flower filter
> implementation for the XGMAC2 core. All three patches target net.
> 
> The L3/L4 filter match count statistics patch (originally patch 4/4)
> has been split out and will be sent separately against net-next per
> Andrew Lunn's review of v1.
> 
> Patch 1 fixes a register corruption bug in the L4 filter port configuration.
> The XGMAC_L4_ADDR register holds both source and destination port match
> values in a single register. The original code overwrites the entire register
> when setting either field, silently erasing the other. This is fixed by
> using a read-modify-write sequence.
> 
> Patch 2 fixes the basic flow match parser to properly reject unsupported
> offload requests with -EOPNOTSUPP instead of silently accepting them.
> Unsupported cases include partial protocol masks, non-IPv4 network proto,
> and non-TCP/UDP transport proto. Extack messages are now included so users
> know exactly which part of the match is unsupported. The -EOPNOTSUPP is
> also now returned directly instead of using break, which was silently
> discarding the error on FLOW_CLS_REPLACE operations.
> 
> Patch 3 fixes a stale action bug on filter deletion. When a filter entry
> with a drop action is deleted, the action field was not reset, causing
> it to persist and potentially affect subsequent filter configurations.
> 
> All three patches fix the original L3/L4 filter implementation introduced in
> 425eabddaf0f ("net: stmmac: Implement L3/L4 Filters using TC Flower").
> 
> Changes in v3:
> - Patch 2: add extack messages to each -EOPNOTSUPP return (Jakub Kicinski)
> - Patch 2: return -EOPNOTSUPP directly instead of break to avoid silently
>   reporting success on unsupported FLOW_CLS_REPLACE (Sashiko review)

Please take a look at this page prior to reposting:

https://netdev.bots.linux.dev/net-next.html

There's also an announcement made on the netdev@ list when net-next
opens/closes.

You can't submit a new series that targets net-next during the merge window,
so this revision will have to wait 2 weeks.

Maxime
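
For reference, the read-modify-write fix that the quoted cover letter
describes for patch 1 can be sketched in userspace C. This is only an
illustration under stated assumptions: the field layout (source port in
bits 15:0, destination port in bits 31:16), the fake_l4_addr variable
standing in for the register, and every function name are hypothetical
and are not taken from the stmmac driver.

```c
#include <stdint.h>

/* Assumed layout, for illustration only: source port in bits 15:0,
 * destination port in bits 31:16 of a single 32-bit register.
 */
#define L4_SP_MASK  0x0000ffffu
#define L4_DP_MASK  0xffff0000u
#define L4_DP_SHIFT 16

/* Plain variable standing in for the memory-mapped register. */
static uint32_t fake_l4_addr;

uint32_t read_l4_addr(void)
{
	return fake_l4_addr;
}

/* Buggy pattern: writing one field replaces the whole register,
 * so a previously programmed source port is lost.
 */
void set_dport_overwrite(uint16_t port)
{
	fake_l4_addr = (uint32_t)port << L4_DP_SHIFT;
}

/* Read-modify-write: clear only the destination field, keep the rest. */
void set_dport_rmw(uint16_t port)
{
	uint32_t val = fake_l4_addr;

	val &= ~L4_DP_MASK;
	val |= ((uint32_t)port << L4_DP_SHIFT) & L4_DP_MASK;
	fake_l4_addr = val;
}

/* Same pattern for the source field. */
void set_sport_rmw(uint16_t port)
{
	uint32_t val = fake_l4_addr;

	val &= ~L4_SP_MASK;
	val |= (uint32_t)port & L4_SP_MASK;
	fake_l4_addr = val;
}
```

Calling set_sport_rmw() and then set_dport_rmw() leaves both ports in
the register, whereas set_dport_overwrite() zeroes the source field.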


^ permalink raw reply

* RE: [PATCH net-next v2 2/4] udmabuf: emit one sg entry per pinned folio
From: Kasireddy, Vivek @ 2026-06-16  6:04 UTC (permalink / raw)
  To: Bobby Eshleman, Donald Hunter, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Andrew Lunn,
	Gerd Hoffmann, Sumit Semwal, Christian König, Shuah Khan,
	Jason Gunthorpe
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linux-media@vger.kernel.org,
	linaro-mm-sig@lists.linaro.org, linux-kselftest@vger.kernel.org,
	sdf@fomichev.me, razor@blackwall.org, daniel@iogearbox.net,
	almasrymina@google.com, matttbe@kernel.org, skhawaja@google.com,
	dw@davidwei.uk, Bobby Eshleman
In-Reply-To: <20260611-tcpdm-large-niovs-v2-2-ee2bf15e7523@meta.com>

Adding Jason to this discussion.

Hi Bobby,

> Subject: [PATCH net-next v2 2/4] udmabuf: emit one sg entry per pinned
> folio
> 
> From: Bobby Eshleman <bobbyeshleman@meta.com>
> 
> get_sg_table() emitted one PAGE_SIZE sg entry per page even when the
> underlying folio was larger.
> 
> Instead, walk folios[] and emit one sg entry per folio. When folios
We have recently merged a patch (that will make it into 7.2) from Jason that
replaced sg_set_folio() with sg_alloc_table_from_pages() in udmabuf driver:
https://gitlab.freedesktop.org/drm/tip/-/commit/5bf888673e0dda5a53220fa0c4956271a46c353c

Since you are relying on sg_set_folio(), the core argument against its usage
in udmabuf is that it doesn't work well with offsets > PAGE_SIZE, resulting
in a malformed scatterlist. Not sure if this can be fixed easily.

> represent large pages (as is for MFD_HUGETLB), each sg entry is a large
> page. Normal PAGE_SIZE sg tables are unchanged.
> 
> This is helpful for importers like net/core/devmem that expect dmabuf sg
IMO, udmabuf needs to detect whether importers can handle segments that
are > PAGE_SIZE and set the entries appropriately. Please look into how the
GPU drivers and other dmabuf exporters/importers handle this situation, so
that we can adopt best practices to address this issue.

Thanks,
Vivek

> entries to be size and length aligned. Prior to this patch udmabuf
> handed over one PAGE_SIZE sg entry per page, so devmem only saw
> PAGE_SIZE chunks regardless of the underlying folio size.
> 
> dma_map_sgtable() does not always merge contiguous pages for us, so we
> do this internally before exporting.
> 
> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> ---
>  drivers/dma-buf/udmabuf.c | 52
> ++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 47 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> index 94b8ecb892bb..9b751dd98b12 100644
> --- a/drivers/dma-buf/udmabuf.c
> +++ b/drivers/dma-buf/udmabuf.c
> @@ -141,26 +141,68 @@ static void vunmap_udmabuf(struct dma_buf
> *buf, struct iosys_map *map)
>  	vm_unmap_ram(map->vaddr, ubuf->pagecount);
>  }
> 
> +/* Return the number of contiguous pages backed by the folio at @i.
> + * A udmabuf may map only part of a folio, or reference the same folio
> + * in multiple non-contiguous runs, so folio_nr_pages() can't be used.
> + */
> +static pgoff_t udmabuf_folio_nr_pages(struct udmabuf *ubuf, pgoff_t i)
> +{
> +	struct folio *f = ubuf->folios[i];
> +	pgoff_t j;
> +
> +	for (j = 1; i + j < ubuf->pagecount; j++) {
> +		if (ubuf->folios[i + j] != f)
> +			break;
> +		/* Same folio, but not a sequential offset within it. */
> +		if (ubuf->offsets[i + j] != ubuf->offsets[i] + j * PAGE_SIZE)
> +			break;
> +	}
> +	return j;
> +}
> +
> +/* Count the contiguous folio runs in @ubuf, one sg entry per run.
> + *
> + * Coalescing folios into a single sg entry up front lets importers actually
> + * see large chunks. We can't rely on dma_map_sgtable() to do this for us
> as
> + * the dma_map_direct() path preserves the input scatterlist lengths
> verbatim.
> + */
> +static unsigned int udmabuf_sg_nents(struct udmabuf *ubuf)
> +{
> +	unsigned int nents = 0;
> +	pgoff_t i;
> +
> +	for (i = 0; i < ubuf->pagecount; i += udmabuf_folio_nr_pages(ubuf,
> i))
> +		nents++;
> +	return nents;
> +}
> +
>  static struct sg_table *get_sg_table(struct device *dev, struct dma_buf
> *buf,
>  				     enum dma_data_direction direction)
>  {
>  	struct udmabuf *ubuf = buf->priv;
> -	struct sg_table *sg;
>  	struct scatterlist *sgl;
> -	unsigned int i = 0;
> +	struct sg_table *sg;
> +	pgoff_t i, run;
> +	unsigned int nents;
>  	int ret;
> 
> +	nents = udmabuf_sg_nents(ubuf);
> +
>  	sg = kzalloc_obj(*sg);
>  	if (!sg)
>  		return ERR_PTR(-ENOMEM);
> 
> -	ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
> +	ret = sg_alloc_table(sg, nents, GFP_KERNEL);
>  	if (ret < 0)
>  		goto err_alloc;
> 
> -	for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
> -		sg_set_folio(sgl, ubuf->folios[i], PAGE_SIZE,
> +	sgl = sg->sgl;
> +	for (i = 0; i < ubuf->pagecount; i += run) {
> +		run = udmabuf_folio_nr_pages(ubuf, i);
> +		sg_set_folio(sgl, ubuf->folios[i], run << PAGE_SHIFT,
>  			     ubuf->offsets[i]);
> +		sgl = sg_next(sgl);
> +	}
> 
>  	ret = dma_map_sgtable(dev, sg, direction, 0);
>  	if (ret < 0)
> 
> --
> 2.53.0-Meta
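
The run detection in the quoted patch can be tried outside the kernel
with a small model. The sketch below is an assumption-laden
illustration, not the udmabuf code: folios are reduced to integer ids,
the page size is fixed at 4096, and the demo_* names are hypothetical.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u

/* Length, in pages, of the contiguous run starting at index i: the
 * entries must share a folio id and their offsets must advance by
 * exactly one page per step.
 */
size_t demo_run_len(const int *folio_id, const size_t *offset,
		    size_t n, size_t i)
{
	size_t j;

	for (j = 1; i + j < n; j++) {
		if (folio_id[i + j] != folio_id[i])
			break;
		/* Same folio, but not a sequential offset within it. */
		if (offset[i + j] != offset[i] + j * DEMO_PAGE_SIZE)
			break;
	}
	return j;
}

/* Number of runs, i.e. the number of sg entries that would be needed. */
size_t demo_count_runs(const int *folio_id, const size_t *offset, size_t n)
{
	size_t runs = 0;
	size_t i;

	for (i = 0; i < n; i += demo_run_len(folio_id, offset, n, i))
		runs++;
	return runs;
}
```

With ids {1, 1, 1, 2, 1} and offsets {0, 4096, 8192, 0, 0}, the first
three pages form one run and the last two are separate runs, because
the final page returns to folio 1 at a non-sequential offset.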


^ permalink raw reply

* Re: [PATCH 0/18] pull request (net-next): ipsec-next 2026-06-12
From: Antony Antony @ 2026-06-16  5:54 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Steffen Klassert, Antony Antony, David Miller, Herbert Xu, netdev
In-Reply-To: <20260613131552.2562d433@kernel.org>

On Sat, Jun 13, 2026 at 01:15:52PM -0700, Jakub Kicinski wrote:
> On Fri, 12 Jun 2026 09:46:16 +0200 Steffen Klassert wrote:
> > 3) Add a new netlink message XFRM_MSG_MIGRATE_STATE that
> >    allows migrating individual IPsec SAs independently of
> >    their policies. The existing XFRM_MSG_MIGRATE is tightly coupled
> >    to policy+SA migration, lacks SPI for unique SA identification,
> >    and cannot express reqid changes or migrate Transport mode
> >    selectors. The new interface identifies the SA via SPI and mark,
> >    supports reqid changes, address family changes, encap removal,
> >    and uses an atomic create+install flow under x->lock to prevent
> >    SN/IV reuse during AEAD SA migration.
> >    From Antony Antony.
> 
> Hi! There are some Sashiko comments here, please follow up:
> 
> https://sashiko.dev/#/patchset/20260612074725.1760473-8-steffen.klassert@secunet.com
> 

Thanks Jakub. I have fixes and am testing them now; I will send them soon.

The comments didn't click until I realized xfrm_user_state_lookup() only
keys on mark.v & mark.m, so distinct (v, m) pairs collapse to the same
masked value. A lookup key of {0, 0} matches a source SA with mark
{0, 0xffffff} (both mask to 0), but reusing {0, 0} as the migrated mark 
turns "match only mark 0x00" into "match all traffic".

The fix is to copy the mark from the old SA rather than from the old_mark
passed along. This also pointed to more issues.
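
The collapse described above can be reproduced with a small userspace
model. This is a sketch under assumptions: struct mark, the helper
names, and the simplified "(packet mark & mask) == value" acceptance
rule are stand-ins for illustration, not the xfrm code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an xfrm mark: a value plus a mask. */
struct mark {
	uint32_t v;
	uint32_t m;
};

/* The lookup key keeps only the masked value, so the mask is lost. */
uint32_t mark_lookup_key(struct mark k)
{
	return k.v & k.m;
}

/* Simplified model of an SA deciding whether a packet mark matches. */
bool sa_accepts(struct mark sa, uint32_t pkt_mark)
{
	return (pkt_mark & sa.m) == sa.v;
}

/* Walks through the {0, 0} versus {0, 0xffffff} case. */
void mark_collapse_demo(void)
{
	struct mark src = { .v = 0, .m = 0xffffff };	/* source SA */
	struct mark key = { .v = 0, .m = 0 };		/* lookup key */

	/* Both collapse to the same masked key, so the lookup succeeds. */
	assert(mark_lookup_key(src) == mark_lookup_key(key));

	/* The source SA accepts only marks whose low 24 bits are zero. */
	assert(sa_accepts(src, 0x0));
	assert(!sa_accepts(src, 0x5));

	/* An SA that inherits the lookup key as its mark accepts anything. */
	assert(sa_accepts(key, 0x5));
}
```

Calling mark_collapse_demo() from a test program runs the three checks;
the last one is the "match all traffic" outcome.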

-antony


^ permalink raw reply

* Re: [PATCH net v2] xfrm: Fix dev use-after-free in xfrm async resumption
From: Steffen Klassert @ 2026-06-16  6:01 UTC (permalink / raw)
  To: Dong Chenchen
  Cc: herbert, davem, edumazet, kuba, pabeni, horms, tpluszz77, idosch,
	netdev, zhangchangzhong, xuchunxiao3
In-Reply-To: <20260609092117.1362316-1-dongchenchen2@huawei.com>

On Tue, Jun 09, 2026 at 05:21:17PM +0800, Dong Chenchen wrote:
> xfrm async resumption hold skb->dev refcnt until after transport_finish.
> However, xfrm_rcv_cb may modify skb->dev to tunnel dev without taking
> device reference, such as vti_rcv_cb. The subsequent async resumption
> will decrement the tunnel device's reference count, which lead to uaf
> of tunnel dev and refcnt leak of orig dev as below:
> 
> unregister_netdevice: waiting for vti1 to become free. Usage count = -2
> 
> Stash the original skb->dev to fix refcnt imbalance. The new skb->dev set
> by xfrm_rcv_cb can race with device teardown. Extend rcu protection over
> xfrm_rcv_cb and transport_finish to prevent races.
> 
> Fixes: 1c428b038400 ("xfrm: hold dev ref until after transport_finish NF_HOOK")
> Reported-by: Xu Chunxiao <xuchunxiao3@huawei.com>
> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>

Applied to the ipsec tree, thanks Dong!

^ permalink raw reply

* Re: [PATCH net v5 1/4] net: ethernet: oa_tc6: Interrupt is active low, level triggered.
From: Parthiban.Veerasooran @ 2026-06-16  6:01 UTC (permalink / raw)
  To: Selvamani.Rajagopal, andrew+netdev, davem, edumazet, kuba, pabeni,
	robh, krzk+dt, conor+dt, pier.beruto
  Cc: andrew, netdev, linux-kernel, Conor.Dooley, devicetree
In-Reply-To: <20260611-level-trigger-v5-1-4533a9e85ce2@onsemi.com>

Hi Selvamani,

I did a quick test by connecting a Mikroe LAN8651 Click to a Raspberry Pi 
4; my feedback is below. Please let me know if you need any 
further details.

Test case 1: Single LAN8651 instance on RPI4

Setup:

RPI4 #1 + LAN8651 (IP: 192.168.10.101) <--- RPI4 #2 + EVB-LAN8670-USB 
(IP: 192.168.10.102)

Commands:

iperf3 -s -p 5001 <--- iperf3 -c 192.168.10.101 -u -b 9.4M -i 1 -t 0 -p 5001

Result:

No issues observed.

Test case 2: Two LAN8651 instances on the same RPI4

Setup:

RPI4 #1 + LAN8651 (IP: 192.168.10.101) <--- RPI4 #2 + EVB-LAN8670-USB 
(IP: 192.168.10.102)
RPI4 #1 + LAN8651 (IP: 192.168.20.101) <--- RPI4 #2 + EVB-LAN8670-USB 
(IP: 192.168.20.102)

Result:

Initially it worked, though with continuous "Receive buffer overflow" errors. 
This is expected, as both USB devices transmit at full speed while the RPI4 
can handle the traffic only up to a maximum SPI frequency of 15 MHz. 
Eventually, the system crashed. The crash message was captured in the 
dmesg log:

[ 8276.676335] net_ratelimit: 2448 callbacks suppressed
[ 8276.676341] eth1: Receive buffer overflow error
[ 8276.676349] eth2: Receive buffer overflow error
[ 8276.680025] eth2: Receive buffer overflow error
[ 8276.680033] eth1: Receive buffer overflow error
[ 8276.683701] eth2: Receive buffer overflow error
[ 8276.683710] eth1: Receive buffer overflow error
[ 8276.687378] eth2: Receive buffer overflow error
[ 8276.687387] eth1: Receive buffer overflow error
[ 8276.691055] eth2: Receive buffer overflow error
[ 8276.691064] eth1: Receive buffer overflow error
[ 8281.662600] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000074
[ 8281.670936] Mem abort info:
[ 8281.673747]   ESR = 0x0000000096000005
[ 8281.677544]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 8281.682917]   SET = 0, FnV = 0
[ 8281.685997]   EA = 0, S1PTW = 0
[ 8281.689173]   FSC = 0x05: level 1 translation fault
[ 8281.694109] Data abort info:
[ 8281.697017]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[ 8281.702571]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 8281.707680]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 8281.713056] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000040a1d000
[ 8281.719578] [0000000000000074] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[ 8281.728391] Internal error: Oops: 0000000096000005 [#1]  SMP
[ 8281.734115] Modules linked in: sch_fq lan865x_t1s(O) microchip_t1s(O) 
snd_seq_dummy snd_hrtimer snd_seq snd_seq_device rfcomm algif_hash 
aes_neon_bs algif_skcipher af_alg bnep binfmt_misc brcmfmac_cyw brcmfmac 
vc4 hci_uart brcmutil btbcm bluetooth v3d snd_soc_hdmi_codec bcm2835_isp 
cfg80211 rpi_hevc_dec bcm2835_codec(C) drm_exec bcm2835_v4l2(C) 
drm_display_helper ecdh_generic cec ecc bcm2835_mmal_vchiq gpu_sched 
videobuf2_vmalloc vc_sm_cma v4l2_mem2mem drm_dma_helper rfkill crc_ccitt 
drm_client_lib drm_shmem_helper videobuf2_dma_contig videobuf2_memops 
drm_kms_helper videobuf2_v4l2 snd_soc_core videodev raspberrypi_hwmon 
snd_bcm2835(C) snd_compress i2c_brcmstb snd_pcm_dmaengine snd_pcm 
videobuf2_common snd_timer mc raspberrypi_gpiomem spi_bcm2835 snd 
gpio_fan nvmem_rmem sch_fq_codel i2c_dev zram lz4_compress drm fuse 
drm_panel_orientation_quirks backlight nfnetlink
[ 8281.811847] CPU: 3 UID: 0 PID: 1759 Comm: irq/59-spi0.0 Tainted: G 
      C O        7.1.0-rc7-v8+ #1 PREEMPT
[ 8281.822067] Tainted: [C]=CRAP, [O]=OOT_MODULE
[ 8281.826473] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[ 8281.832377] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 8281.839427] pc : skb_put+0x14/0x80
[ 8281.842864] lr : oa_tc6_macphy_threaded_irq+0x428/0x880 [lan865x_t1s]
[ 8281.849386] sp : ffffffc083c4bd40
[ 8281.852735] x29: ffffffc083c4bd40 x28: 000000002020003e x27: ffffffe59e5609c8
[ 8281.859962] x26: ffffff8103d55080 x25: 0000000000000001 x24: ffffff8040566880
[ 8281.867187] x23: 0000000000000001 x22: 0000000000000000 x21: 0000000000000000
[ 8281.874414] x20: 000000003e002020 x19: ffffff80405668a0 x18: 00000000000b6748
[ 8281.881641] x17: ffffff9b64502000 x16: ffffffe59f08cef0 x15: 1ae8add2c0a08935
[ 8281.888867] x14: b154d86008ee08e7 x13: b66a1ae8add2c0a0 x12: 8935b154d86008ee
[ 8281.896093] x11: 00000000000000c0 x10: 0000000000001ae0 x9 : ffffffe590bb7918
[ 8281.903320] x8 : ffffff80493c1b40 x7 : 0000000000000004 x6 : ffffffffffffffff
[ 8281.910547] x5 : ffffffe59fb9d000 x4 : 0000000000000004 x3 : 0000000000000000
[ 8281.917773] x2 : 0000000000000000 x1 : 0000000000000040 x0 : 0000000000000000
[ 8281.925000] Call trace:
[ 8281.927468]  skb_put+0x14/0x80 (P)
[ 8281.930905]  oa_tc6_macphy_threaded_irq+0x428/0x880 [lan865x_t1s]
[ 8281.937073]  irq_thread_fn+0x34/0xc0
[ 8281.940686]  irq_thread+0x1a8/0x308
[ 8281.944212]  kthread+0x138/0x150
[ 8281.947472]  ret_from_fork+0x10/0x20
[ 8281.951089] Code: d503201f d503233f a9bf7bfd 910003fd (b9407406)
[ 8281.957258] ---[ end trace 0000000000000000 ]---
[ 8281.961969] genirq: exiting task "irq/59-spi0.0" (1759) is an active IRQ thread (irq 59)
[ 8282.080140] irq 59: nobody cared (try booting with the "irqpoll" option)
[ 8282.086344] CPU: 0 UID: 0 PID: 15 Comm: rcu_preempt Tainted: G      D 
  C O        7.1.0-rc7-v8+ #1 PREEMPT
[ 8282.086352] Tainted: [D]=DIE, [C]=CRAP, [O]=OOT_MODULE
[ 8282.086354] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[ 8282.086357] Call trace:
[ 8282.086359]  show_stack+0x20/0x38 (C)
[ 8282.086372]  dump_stack_lvl+0x60/0x80
[ 8282.086378]  dump_stack+0x18/0x24
[ 8282.086382]  __report_bad_irq+0x54/0xf0
[ 8282.086388]  note_interrupt+0x344/0x398
[ 8282.086393]  handle_irq_event+0xa4/0x110
[ 8282.086397]  handle_level_irq+0xe0/0x178
[ 8282.086401]  handle_irq_desc+0x3c/0x68
[ 8282.086407]  generic_handle_domain_irq+0x20/0x40
[ 8282.086413]  bcm2835_gpio_irq_handle_bank+0x180/0x1c8
[ 8282.086420]  bcm2835_gpio_irq_handler+0x88/0x188
[ 8282.086424]  handle_irq_desc+0x3c/0x68
[ 8282.086430]  generic_handle_domain_irq+0x20/0x40
[ 8282.086435]  gic_handle_irq+0x4c/0xe0
[ 8282.086438]  call_on_irq_stack+0x30/0x88
[ 8282.086444]  do_interrupt_handler+0x88/0x98
[ 8282.086447]  el1_interrupt+0x3c/0x60
[ 8282.086452]  el1h_64_irq_handler+0x18/0x30
[ 8282.086457]  el1h_64_irq+0x6c/0x70
[ 8282.086460]  _raw_spin_unlock_irq+0x10/0x60 (P)
[ 8282.086465]  rcu_gp_kthread+0x2f0/0x310
[ 8282.086471]  kthread+0x138/0x150
[ 8282.086476]  ret_from_fork+0x10/0x20
[ 8282.086481] handlers:
[ 8282.170193] lan8650 spi0.1: SPI transfer timed out
[ 8282.170591] [<0000000094f492f8>] oa_tc6_macphy_isr [lan865x_t1s]
[ 8282.174845] spi_master spi0: failed to transfer one message from queue
[ 8282.178435]  threaded [<000000008b769ba3>] oa_tc6_macphy_threaded_irq 
[lan865x_t1s]
[ 8282.178443] spi_master spi0: noqueue transfer failed

[ 8282.182578] Disabling IRQ #59
[ 8282.238458] lan8650 spi0.1 eth2: SPI data transfer failed: -110
[ 8282.244490] lan8650 spi0.1: Device interrupt disabled to avoid interrupt storm

Best regards,
Parthiban V

On 12/06/26 3:25 am, Selvamani Rajagopal via B4 Relay wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> From: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>
> 
> According OPEN Alliance 10BASET1x MAC-PHY Serial Interface
> specification, interrupt is active low, level triggered.
> 
> Code used edge triggered interrupt which has the risk of losing an
> interrupt on instances like when interrupt is disabled. Level
> triggered interrupt won't be deasserted unless handler runs and
> clear the interrupting conditions.
> 
> Interrupt handler mechanism is changed to threaded irq from
> interrupt handler and kernel thread waiting on work queue.
> Threaded irq mechanism is best suited for level triggered interrupt
> as it disables the interrupt until handler is run in thread level,
> while giving us an ability to have interrupt context handler to
> signal the threaded irq handler.
> 
> Introduced a logic to disable the device interrupt on error. Error
> could be due in data chunk's header and footer or SPI interface itself.
> This will avoid having repeated interrupts, in case the driver couldn't
> recover from the error condition with the available recovery mechanism.
> 
> Fixes: 2c6ce5354453 ("net: ethernet: oa_tc6: implement mac-phy interrupt")
> Signed-off-by: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>

^ permalink raw reply

* Re: [PATCH net v2] net: af_key: initialize alg_key_len for IPComp states
From: Steffen Klassert @ 2026-06-16  6:00 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: Zijing Yin, Herbert Xu, David S . Miller, Eric Dumazet,
	Paolo Abeni, Ido Schimmel, Simon Horman, netdev, linux-kernel,
	stable
In-Reply-To: <aibn3tkGc3Iz1r5n@krikkit>

On Mon, Jun 08, 2026 at 06:03:42PM +0200, Sabrina Dubroca wrote:
> note: fixes for IPsec should go to the "ipsec" tree, not net
> 
> 2026-06-08, 07:44:41 -0700, Zijing Yin wrote:
> > pfkey_msg2xfrm_state() handles the IPComp (SADB_X_SATYPE_IPCOMP) case by
> > allocating x->calg and copying only the algorithm name:
> > 
> > 	x->calg = kmalloc_obj(*x->calg);
> > 	if (!x->calg) {
> > 		err = -ENOMEM;
> > 		goto out;
> > 	}
> > 	strcpy(x->calg->alg_name, a->name);
> > 	x->props.calgo = sa->sadb_sa_encrypt;
> > 
> > Unlike the authentication (x->aalg) and encryption (x->ealg) branches of
> > the same function, the compression branch never initializes
> > calg->alg_key_len.  IPComp carries no key and the allocation only
> > reserves sizeof(struct xfrm_algo) (i.e. no room for a key), so the field
> > is left containing uninitialized slab data.
> > 
> > calg->alg_key_len is later used as a length by xfrm_algo_clone() when an
> > IPComp state is cloned during XFRM_MSG_MIGRATE:
> 
> The patch looks correct, but do we want to start fixing random bugs in
> code that we're trying to get rid of and that nobody actually uses?
> 
> If we do, then:
> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>

As long as we have the code in the repo, we do.

Applied, thanks everyone!

^ permalink raw reply

* Re: [PATCH ipsec] xfrm: use compat translator only for u64 alignment mismatch
From: Steffen Klassert @ 2026-06-16  5:58 UTC (permalink / raw)
  To: Pradhan, Sanman
  Cc: netdev@vger.kernel.org, herbert@gondor.apana.org.au,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, horms@kernel.org, 0x7f454c46@gmail.com,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	Sanman Pradhan
In-Reply-To: <20260607164726.1544435-1-sanman.pradhan@hpe.com>

On Sun, Jun 07, 2026 at 04:47:34PM +0000, Pradhan, Sanman wrote:
> From: Sanman Pradhan <psanman@juniper.net>
> 
> The XFRM compat layer (CONFIG_XFRM_USER_COMPAT) translates 32-bit xfrm
> netlink and setsockopt messages into the native 64-bit layout. It is
> only needed on architectures where the 32-bit and 64-bit ABIs disagree
> on u64 alignment, which the kernel encodes as COMPAT_FOR_U64_ALIGNMENT.
> 
> That symbol is defined only by arch/x86. XFRM_USER_COMPAT depends on it,
> so the translator can never be built on any other architecture,
> including arm64, which still provides a 32-bit compat ABI (CONFIG_COMPAT)
> for AArch32 EL0 userspace. On arm64 the AArch32 EABI already aligns u64
> to 8 bytes, identical to the AArch64 ABI, so no translation is required
> and the native code path is correct for 32-bit tasks.
> 
> However, xfrm_user_rcv_msg() and xfrm_user_policy() gate on
> in_compat_syscall() alone and then call xfrm_get_translator(), which
> returns NULL when no translator is registered. On arm64 that is always
> the case, so every xfrm netlink message and the XFRM_POLICY setsockopt
> issued by a 32-bit task returns -EOPNOTSUPP. A 32-bit userspace process
> on arm64 (and on any other arch with CONFIG_COMPAT but without
> COMPAT_FOR_U64_ALIGNMENT) therefore cannot configure XFRM state or
> policy through the XFRM_USER netlink API, and cannot use the XFRM_POLICY
> setsockopt path, because both fail before reaching the native parser.
> 
> The translator series replaced the blanket compat rejection with a
> translator lookup. That made the path usable on x86 when the translator
> is available, but left architectures that cannot build the translator
> permanently rejected even when their compat layout already matches the
> native layout. Let those architectures use the native parser instead.
> 
> Gate the translator requirement on COMPAT_FOR_U64_ALIGNMENT instead of
> on in_compat_syscall() alone. Gating on the ABI property rather than on
> CONFIG_XFRM_USER_COMPAT is deliberate: on x86 with IA32_EMULATION=y but
> XFRM_USER_COMPAT=n, a 32-bit task must still be rejected rather than
> routed through the native parser, which would misread genuinely
> 4-byte-aligned x86-32 messages. COMPAT_FOR_U64_ALIGNMENT is the ABI
> property that makes the XFRM translator mandatory.
> 
> Only the receive/input direction needs the guard. The send, dump and
> notification paths already call the translator as "if (xtr) { ... }"
> with no error on NULL, so on arches without a translator they no-op and
> the kernel emits native 64-bit-layout messages, which is what an AArch32
> task expects.
> 
> Tested on Juniper SRX hardware: with the fix, 32-bit IPsec userspace
> netlink and XFRM_POLICY setsockopt operations that previously failed
> with -EOPNOTSUPP now succeed; x86 behaviour is unchanged by inspection.
> 
> Fixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator")
> Fixes: 96392ee5a13b ("xfrm/compat: Translate 32-bit user_policy from sockptr")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sanman Pradhan <psanman@juniper.net>

Patch applied, thanks a lot!

^ permalink raw reply

* Re: [PATCH net-next v3 1/2] net: dsa: realtek: rtl8365mb: add SGMII support for RTL8367S
From: Maxime Chevallier @ 2026-06-16  5:55 UTC (permalink / raw)
  To: Johan Alvarado, linusw, alsi, andrew, olteanv, davem, edumazet,
	kuba, pabeni, netdev
  Cc: linux, namiltd, luizluca, linux-kernel
In-Reply-To: <0100019ecd045b3f-a7bfbb9d-6659-45c6-8fca-9cdce637092c-000000@email.amazonses.com>

Hi Johan,

On 6/15/26 22:41, Johan Alvarado wrote:
>> This comment implies that you could deal with SGMII aneg at some point.
>> [...] makes me wonder if this whole SGMII/2500BaseX series should be
>> represented as a PCS phylink driver.
> 
> Hi Maxime,
> 
> You're right, and I'll convert the SerDes path to a phylink_pcs for v4.
> It splits the MAC and SerDes layers cleanly, drops the "ext then sds"
> branches in mac_link_up/down, and makes future in-band aneg an additive
> change instead of a rewrite.

Great!

> 
> One point I'd like to confirm on scope: I can only test the forced-link
> path on my MR80X (fixed-link / conventional PHY), and I have no setup to
> exercise SGMII in-band autonegotiation. My plan is to do the PCS refactor
> keeping the link forced (outband / no in-band AN), and leave actual
> in-band aneg support for a follow-up once I have hardware to validate it.
> Does limiting v4 to the forced path sound acceptable, or would you prefer
> in-band aneg implemented up front? I'd rather not add a code path I can't
> test.

That's fine by me, it's usually better to have something smaller but fully
tested :)

> 
> I'll also reword the misleading "disable in-band aneg" comment.
> 
> net-next being closed until the 29th gives me time to do this properly,
> so v4 will carry the PCS conversion, retested on the MR80X v2.20.

Thanks,

Maxime

> 
> Best regards,
> Johan


^ permalink raw reply

* Re: [PATCH stable 6.6.y v3 1/4] bpf: Track equal scalars history on per-instruction level
From: Shung-Hsi Yu @ 2026-06-16  5:51 UTC (permalink / raw)
  To: Zhenzhong Wu
  Cc: bpf, netdev, linux-kernel, ast, daniel, john.fastabend, andrii,
	martin.lau, song, yonghong.song, kpsingh, haoluo, jolsa,
	menglong8.dong, eddyz87, stable, mykolal, tamird, Hao Sun
In-Reply-To: <7f27d335fa6280d5eb04e7b27a7e3d7e7ac1d641.1781194510.git.jt26wzz@gmail.com>

On Mon, Jun 15, 2026 at 12:58:38AM +0800, Zhenzhong Wu wrote:
[...]
> +/* For all R being scalar registers or spilled scalar registers
> + * in verifier state, save R in linked_regs if R->id == id.
> + * If there are too many Rs sharing same id, reset id for leftover Rs.
> + */
> +static void collect_linked_regs(struct bpf_verifier_state *vstate, u32 id,
> +				struct linked_regs *linked_regs)
> +{
> +	struct bpf_func_state *func;
>  	struct bpf_reg_state *reg;
> +	int i, j;
>  
> -	bpf_for_each_reg_in_vstate(vstate, state, reg, ({
> -		if (reg->type == SCALAR_VALUE && reg->id == known_reg->id) {
> +	for (i = vstate->curframe; i >= 0; i--) {
> +		func = vstate->frame[i];
> +		for (j = 0; j < BPF_REG_FP; j++) {
> +			reg = &func->regs[j];
> +			__collect_linked_regs(linked_regs, reg, id, i, j, true);
> +		}
> +		for (j = 0; j < func->allocated_stack / BPF_REG_SIZE; j++) {
> +			if (!is_spilled_reg(&func->stack[j]))
> +				continue;
> +			reg = &func->stack[j].spilled_ptr;
> +			__collect_linked_regs(linked_regs, reg, id, i, j, false);
> +		}
> +	}
> +
> +	if (linked_regs->cnt == 1)
> +		linked_regs->cnt = 0;

This part seems new: it is not in the original commit, nor in bpf-next.
Can you add some more explanation (in the notes before your
Signed-off-by) of why it is needed?

> +}
[...]
> @@ -14704,6 +14899,21 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
>  		return 0;
>  	}
>  
> +	/* Push scalar registers sharing same ID to jump history,
> +	 * do this before creating 'other_branch', so that both
> +	 * 'this_branch' and 'other_branch' share this history
> +	 * if parent state is created.
> +	 */
> +	if (BPF_SRC(insn->code) == BPF_X && src_reg->type == SCALAR_VALUE && src_reg->id)
> +		collect_linked_regs(this_branch, src_reg->id, &linked_regs);
> +	if (dst_reg->type == SCALAR_VALUE && dst_reg->id)
> +		collect_linked_regs(this_branch, dst_reg->id, &linked_regs);
> +	if (linked_regs.cnt > 0) {

Same here: the original commit and bpf-next have the '> 1' conditional,
whereas yours has '> 0'. Can you also add some explanation for this
part?

> +		err = push_jmp_history(env, this_branch, 0, linked_regs_pack(&linked_regs));
> +		if (err)
> +			return err;
> +	}
> +
...

^ permalink raw reply

* [PATCH bpf-next 1/2] bpf: Guard conntrack opts error writes
From: Yiyang Chen @ 2026-06-16  5:42 UTC (permalink / raw)
  To: bpf, netfilter-devel
  Cc: Yiyang Chen, pablo, fw, phil, davem, edumazet, kuba, pabeni,
	horms, andrii, eddyz87, ast, daniel, memxor, martin.lau, song,
	yonghong.song, jolsa, emil, shuah, kartikey406, coreteam, netdev,
	linux-kernel, linux-kselftest
In-Reply-To: <cover.1781586477.git.chenyy23@mails.tsinghua.edu.cn>

The conntrack lookup and allocation kfuncs take an opts pointer
together with an opts__sz argument. The verifier checks only the memory
range described by opts__sz, but the wrappers unconditionally write
opts->error whenever the internal lookup or allocation helper returns an
error.

For an invalid size smaller than the end of opts->error, that write can
land outside the verifier-checked range. Keep returning NULL for invalid
arguments, but only report the error through opts->error when the
supplied size includes the field.

This preserves error reporting for the supported 12-byte and 16-byte
layouts, and for other invalid sizes that still include opts->error.

Fixes: b4c2b9593a1c ("net/netfilter: Add unstable CT lookup helpers for XDP and TC-BPF")
Fixes: d7e79c97c00c ("net: netfilter: Add kfuncs to allocate and insert CT")
Signed-off-by: Yiyang Chen <chenyy23@mails.tsinghua.edu.cn>
---
 net/netfilter/nf_conntrack_bpf.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c
index 40c261cd0af38..3c182024ec509 100644
--- a/net/netfilter/nf_conntrack_bpf.c
+++ b/net/netfilter/nf_conntrack_bpf.c
@@ -65,6 +65,11 @@ enum {
 	NF_BPF_CT_OPTS_SZ = 16,
 };
 
+static bool bpf_ct_opts_has_error(u32 opts_len)
+{
+	return opts_len >= offsetofend(struct bpf_ct_opts, error);
+}
+
 static int bpf_nf_ct_tuple_parse(struct bpf_sock_tuple *bpf_tuple,
 				 u32 tuple_len, u8 protonum, u8 dir,
 				 struct nf_conntrack_tuple *tuple)
@@ -298,7 +303,8 @@ bpf_xdp_ct_alloc(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple,
 	nfct = __bpf_nf_ct_alloc_entry(dev_net(ctx->rxq->dev), bpf_tuple, tuple__sz,
 				       opts, opts__sz, 10);
 	if (IS_ERR(nfct)) {
-		opts->error = PTR_ERR(nfct);
+		if (bpf_ct_opts_has_error(opts__sz))
+			opts->error = PTR_ERR(nfct);
 		return NULL;
 	}
 
@@ -332,7 +338,8 @@ bpf_xdp_ct_lookup(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple,
 	caller_net = dev_net(ctx->rxq->dev);
 	nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts, opts__sz);
 	if (IS_ERR(nfct)) {
-		opts->error = PTR_ERR(nfct);
+		if (bpf_ct_opts_has_error(opts__sz))
+			opts->error = PTR_ERR(nfct);
 		return NULL;
 	}
 	return nfct;
@@ -364,7 +371,8 @@ bpf_skb_ct_alloc(struct __sk_buff *skb_ctx, struct bpf_sock_tuple *bpf_tuple,
 	net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk);
 	nfct = __bpf_nf_ct_alloc_entry(net, bpf_tuple, tuple__sz, opts, opts__sz, 10);
 	if (IS_ERR(nfct)) {
-		opts->error = PTR_ERR(nfct);
+		if (bpf_ct_opts_has_error(opts__sz))
+			opts->error = PTR_ERR(nfct);
 		return NULL;
 	}
 
@@ -398,7 +406,8 @@ bpf_skb_ct_lookup(struct __sk_buff *skb_ctx, struct bpf_sock_tuple *bpf_tuple,
 	caller_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk);
 	nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts, opts__sz);
 	if (IS_ERR(nfct)) {
-		opts->error = PTR_ERR(nfct);
+		if (bpf_ct_opts_has_error(opts__sz))
+			opts->error = PTR_ERR(nfct);
 		return NULL;
 	}
 	return nfct;
-- 
2.34.1
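
The size guard added by this patch can be exercised in userspace with a
reduced model. The sketch below is illustrative only: struct demo_opts
is a hypothetical layout that merely places an error field after a
4-byte netns_id, demo_offsetofend() is a local equivalent of the
kernel's offsetofend(), and none of the demo_* names exist in the
kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical options layout, for illustration only. */
struct demo_opts {
	int32_t netns_id;
	int32_t error;
	uint8_t l4proto;
	uint8_t dir;
	uint8_t reserved[2];
};

/* Userspace equivalent of the kernel's offsetofend() helper. */
#define demo_offsetofend(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* True when the caller-declared size covers the whole error field. */
bool demo_opts_has_error(uint32_t opts_len)
{
	return opts_len >= demo_offsetofend(struct demo_opts, error);
}

/* Report err through opts->error only when that write stays inside
 * the range the caller declared.
 */
void demo_report_error(struct demo_opts *opts, uint32_t opts_len,
		       int32_t err)
{
	if (demo_opts_has_error(opts_len))
		opts->error = err;
}
```

With opts_len set to sizeof(opts->netns_id), demo_report_error() leaves
a guard value in opts->error untouched, which mirrors what the selftest
in patch 2/2 checks.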


^ permalink raw reply related

* [PATCH bpf-next 2/2] selftests/bpf: Cover small conntrack opts error writes
From: Yiyang Chen @ 2026-06-16  5:42 UTC (permalink / raw)
  To: bpf, netfilter-devel
  Cc: Yiyang Chen, pablo, fw, phil, davem, edumazet, kuba, pabeni,
	horms, andrii, eddyz87, ast, daniel, memxor, martin.lau, song,
	yonghong.song, jolsa, emil, shuah, kartikey406, coreteam, netdev,
	linux-kernel, linux-kselftest
In-Reply-To: <cover.1781586477.git.chenyy23@mails.tsinghua.edu.cn>

Add a conntrack kfunc regression check for opts__sz values that do not
cover opts->error. The BPF program initializes opts->error with a guard
value, calls the lookup and allocation kfuncs with opts__sz set to
sizeof(opts->netns_id), and verifies that the guard is still intact
after the kfunc returns NULL.

Without the conntrack wrapper guard, the kfunc error path overwrites
that guard with -EINVAL even though the verifier checked only the first
four bytes of the options object.

Signed-off-by: Yiyang Chen <chenyy23@mails.tsinghua.edu.cn>
---
 .../testing/selftests/bpf/prog_tests/bpf_nf.c |  6 +++++
 .../testing/selftests/bpf/progs/test_bpf_nf.c | 26 +++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_nf.c b/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
index b33dba4b126e2..14d4c1793aed5 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
@@ -5,6 +5,8 @@
 #include "test_bpf_nf.skel.h"
 #include "test_bpf_nf_fail.skel.h"
 
+#define CT_OPTS_ERROR_GUARD 0x12345678
+
 static char log_buf[1024 * 1024];
 
 struct {
@@ -119,6 +121,10 @@ static void test_bpf_nf_ct(int mode)
 	ASSERT_EQ(skel->bss->test_einval_reserved_new, -EINVAL, "Test EINVAL for reserved in new struct not set to 0");
 	ASSERT_EQ(skel->bss->test_einval_netns_id, -EINVAL, "Test EINVAL for netns_id < -1");
 	ASSERT_EQ(skel->bss->test_einval_len_opts, -EINVAL, "Test EINVAL for len__opts != NF_BPF_CT_OPTS_SZ");
+	ASSERT_EQ(skel->bss->test_einval_len_opts_small_lookup, CT_OPTS_ERROR_GUARD,
+		  "Test no error write for lookup opts__sz before error field");
+	ASSERT_EQ(skel->bss->test_einval_len_opts_small_alloc, CT_OPTS_ERROR_GUARD,
+		  "Test no error write for alloc opts__sz before error field");
 	ASSERT_EQ(skel->bss->test_eproto_l4proto, -EPROTO, "Test EPROTO for l4proto != TCP or UDP");
 	ASSERT_EQ(skel->bss->test_enonet_netns_id, -ENONET, "Test ENONET for bad but valid netns_id");
 	ASSERT_EQ(skel->bss->test_enoent_lookup, -ENOENT, "Test ENOENT for failed lookup");
diff --git a/tools/testing/selftests/bpf/progs/test_bpf_nf.c b/tools/testing/selftests/bpf/progs/test_bpf_nf.c
index 076fbf03a1268..df43649ecb785 100644
--- a/tools/testing/selftests/bpf/progs/test_bpf_nf.c
+++ b/tools/testing/selftests/bpf/progs/test_bpf_nf.c
@@ -10,6 +10,8 @@
 #define EINVAL 22
 #define ENOENT 2
 
+#define CT_OPTS_ERROR_GUARD 0x12345678
+
 #define NF_CT_ZONE_DIR_ORIG (1 << IP_CT_DIR_ORIGINAL)
 #define NF_CT_ZONE_DIR_REPL (1 << IP_CT_DIR_REPLY)
 
@@ -19,6 +21,8 @@ int test_einval_reserved = 0;
 int test_einval_reserved_new = 0;
 int test_einval_netns_id = 0;
 int test_einval_len_opts = 0;
+int test_einval_len_opts_small_lookup = 0;
+int test_einval_len_opts_small_alloc = 0;
 int test_eproto_l4proto = 0;
 int test_enonet_netns_id = 0;
 int test_enoent_lookup = 0;
@@ -124,6 +128,28 @@ nf_ct_test(struct nf_conn *(*lookup_fn)(void *, struct bpf_sock_tuple *, u32,
 	else
 		test_einval_len_opts = opts_def.error;
 
+	opts_def.error = CT_OPTS_ERROR_GUARD;
+	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		       sizeof(opts_def.netns_id));
+	if (ct) {
+		bpf_ct_release(ct);
+		test_einval_len_opts_small_lookup = -EINVAL;
+	} else {
+		test_einval_len_opts_small_lookup = opts_def.error;
+	}
+
+	opts_def.error = CT_OPTS_ERROR_GUARD;
+	ct = alloc_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		      sizeof(opts_def.netns_id));
+	if (ct) {
+		ct = bpf_ct_insert_entry(ct);
+		if (ct)
+			bpf_ct_release(ct);
+		test_einval_len_opts_small_alloc = -EINVAL;
+	} else {
+		test_einval_len_opts_small_alloc = opts_def.error;
+	}
+
 	opts_def.l4proto = IPPROTO_ICMP;
 	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
 		       sizeof(opts_def));
-- 
2.34.1



* [PATCH bpf-next 0/2] bpf: Guard conntrack opts error writes
From: Yiyang Chen @ 2026-06-16  5:42 UTC (permalink / raw)
  To: bpf, netfilter-devel
  Cc: Yiyang Chen, pablo, fw, phil, davem, edumazet, kuba, pabeni,
	horms, andrii, eddyz87, ast, daniel, memxor, martin.lau, song,
	yonghong.song, jolsa, emil, shuah, kartikey406, coreteam, netdev,
	linux-kernel, linux-kselftest

The conntrack lookup/allocation kfuncs expose an opts/opts__sz pair.
The verifier checks the caller-provided opts__sz range, but the wrappers
currently write opts->error after internal errors even when opts__sz is too
small to include that field.

Patch 1 writes opts->error only when opts__sz includes it.
Patch 2 adds a bpf_nf regression check that keeps a guard in opts->error
while passing opts__sz covering only netns_id.

The regression check follows the existing bpf_nf test shape.  Before the
fix, the guard is overwritten with -EINVAL even though opts__sz covers only
the first four bytes of the options object.  After the fix, the kfunc still
returns NULL for the invalid size, but the guard remains intact.
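
As an illustration of the guard described above, here is a minimal
userspace sketch, not the kernel code.  The struct ct_opts_sketch layout
and the names opts_size_covers_error(), failing_wrapper() and
sketch_selfcheck() are assumptions made for this example; only the
ordering (netns_id in the first four bytes, error after it) and the guard
value are taken from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the options object:
 * netns_id occupies the first four bytes, error follows it. */
struct ct_opts_sketch {
	int32_t netns_id;
	int32_t error;
};

#define CT_OPTS_ERROR_GUARD 0x12345678

/* True only when the caller-declared size reaches the end of 'error'. */
static int opts_size_covers_error(uint32_t opts_sz)
{
	return opts_sz >= offsetof(struct ct_opts_sketch, error) +
			  sizeof(((struct ct_opts_sketch *)0)->error);
}

/* Models a wrapper whose inner call failed with -EINVAL.
 * It always returns NULL and reports the error only when it is safe to. */
static void *failing_wrapper(struct ct_opts_sketch *opts, uint32_t opts_sz)
{
	if (opts_size_covers_error(opts_sz))
		opts->error = -EINVAL;
	return NULL;
}

/* Mirrors the shape of the regression check: plant a guard, pass a size
 * that covers only netns_id, and confirm the guard survives. */
static void sketch_selfcheck(void)
{
	struct ct_opts_sketch o = { .netns_id = -1, .error = CT_OPTS_ERROR_GUARD };

	assert(failing_wrapper(&o, sizeof(o.netns_id)) == NULL);
	assert(o.error == CT_OPTS_ERROR_GUARD);
}
```

With a size that covers the whole object, the same wrapper would still
return NULL but would store -EINVAL in error, which is the behaviour the
existing full-size checks rely on.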

Validation, rebased and tested on bpf-next master e4287bf34f97
("selftests/bpf: Work around llvm stack overflow in crypto progs"):

  git diff --check origin/master..HEAD: OK
  scripts/checkpatch.pl --strict on 1/2 and 2/2: OK
  make O=/root/ebpf-verifier-bug-detection/kernel-build/bpf-next \
    net/netfilter/nf_conntrack_bpf.o: OK
  git am of exported 1/2 and 2/2 on a fresh worktree at base: OK
  range-diff between branch commits and git-am result: equivalent

The local direct clang build of test_bpf_nf.c is blocked by the local
kernel BTF/config: this environment's generated vmlinux.h lacks
struct nf_conn.mark, which is used by pre-existing test_bpf_nf.c code.
The changed kernel object and generated patch application were validated.

Yiyang Chen (2):
  bpf: Guard conntrack opts error writes
  selftests/bpf: Cover small conntrack opts error writes

 net/netfilter/nf_conntrack_bpf.c              | 17 +++++++++---
 .../testing/selftests/bpf/prog_tests/bpf_nf.c |  6 +++++
 .../testing/selftests/bpf/progs/test_bpf_nf.c | 26 +++++++++++++++++++
 3 files changed, 45 insertions(+), 4 deletions(-)

base-commit: e4287bf34f97a88c7d9322f5bde828724c073a6b
-- 
2.34.1

