Netdev List

* RE: the confusing 10000base_CR. Shouldn't it be 10000_SFI_DA?
From: D H, Siddaraju @ 2026-06-26 18:12 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev@vger.kernel.org, Das, Shubham, Chintalapalle, Balaji,
	Srinivasan, Vijay
In-Reply-To: <2994edfc-0894-49c9-abac-6daea5e12075@lunn.ch>

Sorry Andrew, I carried over IEEE's PMD naming convention of separating the media suffix ("-CR") :D. Will stick to ethtool's naming :).
I was referring to "ETHTOOL_LINK_MODE_10000baseCR_Full_BIT" and its associated documentation, Andrew.

- Thank you,
Siddaraju D H

-----Original Message-----
From: Andrew Lunn <andrew@lunn.ch> 
Sent: Friday, June 26, 2026 8:02 PM
To: D H, Siddaraju <siddaraju.dh@intel.com>
Cc: netdev@vger.kernel.org; Das, Shubham <shubham.das@intel.com>; Chintalapalle, Balaji <balaji.chintalapalle@intel.com>; Srinivasan, Vijay <vijay.srinivasan@intel.com>
Subject: Re: the confusing 10000base_CR. Shouldn't it be 10000_SFI_DA?

On Fri, Jun 26, 2026 at 02:15:27PM +0000, D H, Siddaraju wrote:
> Hi Linux Ethernet Team,
> 
> We explored Ethtool's "10000base_CR" PMD PHY type and we think it 
> might be just a wrong name.

~/linux$ grep -r 10000Base_CR
~/linux$ 

~/ethtool$ grep -r 10000Base_CR
~/ethtool$ 

What exactly do you mean by 10000Base_CR?

     Andrew

* Re: the confusing 10000base_CR. Shouldn't it be 10000_SFI_DA?
From: Andrew Lunn @ 2026-06-26 18:18 UTC (permalink / raw)
  To: D H, Siddaraju
  Cc: netdev@vger.kernel.org, Das, Shubham, Chintalapalle, Balaji,
	Srinivasan, Vijay
In-Reply-To: <MW4PR11MB6912BE7FEDE5C509B6C5D5DF9AEB2@MW4PR11MB6912.namprd11.prod.outlook.com>

On Fri, Jun 26, 2026 at 06:12:15PM +0000, D H, Siddaraju wrote:
> Sorry Andrew, I carried over IEEE's PMD naming convention of separating the media suffix ("-CR") :D. Will stick to ethtool's naming :).
> I was referring to "ETHTOOL_LINK_MODE_10000baseCR_Full_BIT" and its associated documentation, Andrew.

Please don't top post. And set your mail client to wrap to around 75
characters. All standard netiquette things.

It is a good idea to be specific when asking questions. Asking about
something which does not exist in the kernel just makes it a guessing
game.

As Maxime pointed out, this is uAPI, so cannot be changed.

	Andrew

* Re: [PATCH bpf-next v2 01/15] bpf: Remove __rcu tagging in st_link->map
From: Eduard Zingerman @ 2026-06-26 18:52 UTC (permalink / raw)
  To: Amery Hung, bpf
  Cc: netdev, alexei.starovoitov, andrii, daniel, memxor, martin.lau,
	shakeel.butt, roman.gushchin, kuniyu, kerneljasonxing,
	kernel-team
In-Reply-To: <20260623175006.3136053-2-ameryhung@gmail.com>

On Tue, 2026-06-23 at 10:49 -0700, Amery Hung wrote:
> From: Martin KaFai Lau <martin.lau@kernel.org>
> 
> st_link->map is always written under update_mutex. The paths that read
> st_link->map with rcu_read_lock() are not in the fast path, so they can
> simply take update_mutex instead. Remove the __rcu annotation and replace
> all RCU accessors with direct pointer reads under update_mutex. Use
> READ_ONCE() in bpf_struct_ops_map_link_poll() which reads the pointer
> without holding update_mutex.
> 
> It is a simplification change.
> 
> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
> Signed-off-by: Amery Hung <ameryhung@gmail.com>
> ---

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

* RE: the confusing 10000base_CR. Shouldn't it be 10000_SFI_DA?
From: D H, Siddaraju @ 2026-06-26 19:19 UTC (permalink / raw)
  To: Andrew Lunn, Maxime Chevallier
  Cc: netdev@vger.kernel.org, Das, Shubham, Chintalapalle, Balaji,
	Srinivasan, Vijay
In-Reply-To: <2baa5a77-3305-4ed9-b56d-ba27cecc1911@lunn.ch>

Sure, thanks for pointing them out, Andrew; I will follow them.
Now I realize what you meant there; thank you for the quick feedback.

About options,
Ok, got it: "option-(a): renaming *10000baseCR*" is out.
  Sure, will support this from uAPI backward-compatibility point-of-view.

  Just to highlight, Maxime: yes, during our exploration we too came
  across those few vendor products. But when we looked further to
  understand which standard those 10GBaseCR cables follow, we found they
  all explicitly state that they are SFP+ DA conforming to SFF-8431.

What about
"option-(b): create a new enum ETHTOOL_LINK_MODE_10G_SFI_DA_Full_BIT"?
  The idea is just to create a new enum with the same value as 10000baseCR.
  This will NOT consume a bit position in "ethtool_link_mode_bit_indices".
  It just helps those tech-savvy people who do not accept 10000baseCR
  and prefer 10000sfiDA for being explicit.

In the worst case, I hope we can agree on
"option-(c): ethtool.8 man page help strings to indicate 10G_SFI_DA"
  Something like
    "10000baseCR (10G_SFI_DA    SFF-8431 SFP+ DA)"
  under "advertise" mask values.
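
For illustration, option-(b) above (an alias that shares the value of the
existing entry) could be sketched in C as follows. This is only a sketch
under assumptions: the DEMO_* names and the numeric values are hypothetical
and are not the real uAPI definitions.

```c
#include <assert.h>

/* Hypothetical, cut-down stand-in for ethtool_link_mode_bit_indices.
 * Names and numeric values are illustrative only, not the real uAPI. */
enum demo_link_mode_bit_indices {
	DEMO_LINK_MODE_10000baseT_Full_BIT  = 0,
	DEMO_LINK_MODE_10000baseCR_Full_BIT = 1,
	/* Alias: reuses the existing value, so no new bit is consumed. */
	DEMO_LINK_MODE_10G_SFI_DA_Full_BIT  = DEMO_LINK_MODE_10000baseCR_Full_BIT,
	DEMO_LINK_MODE_10000baseSR_Full_BIT = 2,
	DEMO_LINK_MODE_COUNT                /* one past the last real bit */
};

/* Return 1 if the given mode bit is set in a simple 64-bit mask. */
static int demo_mode_is_set(unsigned long long mask, int bit)
{
	assert(bit >= 0 && bit < DEMO_LINK_MODE_COUNT);
	return (int)((mask >> bit) & 1ULL);
}
```

A mask built with the old name tests as set through the alias as well, and
DEMO_LINK_MODE_COUNT is unchanged by the extra enumerator.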

- Thank you,
Siddaraju D H

* RE: [PATCH] net: lan743x: Initialize eth_syslock spinlock before use
From: David Thompson @ 2026-06-26 19:50 UTC (permalink / raw)
  To: Andrea Righi, Bryan Whitehead, UNGLinuxDriver@microchip.com
  Cc: Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Raju Lakkaraju, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20260626163218.3591486-1-arighi@nvidia.com>

> -----Original Message-----
> From: Andrea Righi <arighi@nvidia.com>
> Sent: Friday, June 26, 2026 12:32 PM
> To: Bryan Whitehead <bryan.whitehead@microchip.com>;
> UNGLinuxDriver@microchip.com
> Cc: Andrew Lunn <andrew+netdev@lunn.ch>; David S . Miller
> <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub
> Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Raju
> Lakkaraju <Raju.Lakkaraju@microchip.com>; David Thompson
> <davthompson@nvidia.com>; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org
> Subject: [PATCH] net: lan743x: Initialize eth_syslock spinlock before use
> 
> lan743x_hardware_init() calls pci11x1x_strap_get_status() during the PCI11x1x
> probe sequence. That helper acquires the Ethernet subsystem hardware lock
> via lan743x_hs_syslock_acquire(), which relies on
> adapter->eth_syslock_spinlock to serialize access.
> 
> The spinlock is currently initialized only after the strap status is read. With
> CONFIG_DEBUG_SPINLOCK enabled, taking the zero-initialized spinlock can
> trip the spinlock debug check.
> 
> Fix by initializing adapter->eth_syslock_spinlock before reading the strap status
> so the probe path never attempts to lock an uninitialized spinlock.
> 
> Fixes: 46b777ad9a8c ("net: lan743x: Add support to SGMII 1G and 2.5G")
> Cc: stable@vger.kernel.org # v6.0+
> Signed-off-by: Andrea Righi <arighi@nvidia.com>
> ---
>  drivers/net/ethernet/microchip/lan743x_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
> index 1cdce35e14239..e759171bfd766 100644
> --- a/drivers/net/ethernet/microchip/lan743x_main.c
> +++ b/drivers/net/ethernet/microchip/lan743x_main.c
> @@ -3541,8 +3541,8 @@ static int lan743x_hardware_init(struct lan743x_adapter *adapter,
>  		adapter->max_tx_channels = PCI11X1X_MAX_TX_CHANNELS;
>  		adapter->used_tx_channels = PCI11X1X_USED_TX_CHANNELS;
>  		adapter->max_vector_count = PCI11X1X_MAX_VECTOR_COUNT;
> -		pci11x1x_strap_get_status(adapter);
>  		spin_lock_init(&adapter->eth_syslock_spinlock);
> +		pci11x1x_strap_get_status(adapter);
>  		mutex_init(&adapter->sgmii_rw_lock);
>  		pci11x1x_set_rfe_rd_fifo_threshold(adapter);
>  		sgmii_ctl = lan743x_csr_read(adapter, SGMII_CTL);
> --
> 2.54.0

Reviewed-by: David Thompson <davthompson@nvidia.com>
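
The ordering issue described in the quoted commit message can be modelled in
userspace roughly as below. This is a toy sketch, not driver code: every
toy_* name and the magic value are hypothetical, and the "debug check" only
stands in for what CONFIG_DEBUG_SPINLOCK is said to do above.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_LOCK_MAGIC 0x5a5a5a5au	/* arbitrary marker set by init */

struct toy_lock {
	unsigned int magic;	/* stays 0 until toy_lock_init() runs */
	bool held;
};

struct toy_adapter {
	struct toy_lock syslock;
};

static void toy_lock_init(struct toy_lock *l)
{
	l->magic = TOY_LOCK_MAGIC;
	l->held = false;
}

/* Return false where a debug build would complain about a bad lock. */
static bool toy_lock_acquire(struct toy_lock *l)
{
	if (l->magic != TOY_LOCK_MAGIC)
		return false;
	l->held = true;
	return true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Stand-in for a helper that needs the lock during probe. */
static bool toy_strap_get_status(struct toy_adapter *a)
{
	if (!toy_lock_acquire(&a->syslock))
		return false;
	toy_lock_release(&a->syslock);
	return true;
}

/* Probe with either ordering; returns whether the lock check passed. */
static bool toy_probe(bool init_lock_first)
{
	struct toy_adapter a = {0};	/* zeroed, like a fresh allocation */
	bool ok;

	if (init_lock_first)
		toy_lock_init(&a.syslock);
	ok = toy_strap_get_status(&a);
	if (!init_lock_first)
		toy_lock_init(&a.syslock);
	return ok;
}
```

In this model toy_probe(false) corresponds to the old ordering and reports
the failed check, while toy_probe(true) corresponds to the patched ordering.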


* Re: [BUG] net: tcp: SO_LINGER with l_linger=0 leaks memory when closing sockets with pending send data
From: Ahmed, Aaron @ 2026-06-26 20:26 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: stable@vger.kernel.org, netdev@vger.kernel.org,
	ncardwell@google.com, edumazet@google.com, aws-binance-tam
In-Reply-To: <34F462A1-CEB9-4812-8E98-239E38585F14@amazon.com>

+CC: aws-binance-tam@amazon.com

On 6/19/26, 3:58 PM, "Ahmed, Aaron" <aarnahmd@amazon.com> wrote:

>Hi Kuniyuki,
>
>Sorry to keep asking, but were you able to take a look at the updated reproducer? I can still reproduce the issue with the latest 6.18 LTS.
>
> Thanks,
> Aaron

* [PATCH bpf v2 3/4] selftests/bpf: Adapt sockmap update error handling
From: Michal Luczaj @ 2026-06-26 20:36 UTC (permalink / raw)
  To: Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
	John Fastabend, Jakub Sitnicki, Jiayuan Chen, David S. Miller,
	Jakub Kicinski, Simon Horman, Alexei Starovoitov, Cong Wang,
	Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Martin KaFai Lau, Song Liu,
	Yonghong Song, Jiri Olsa, Emil Tsalapatis, Shuah Khan
  Cc: netdev, bpf, linux-kernel, linux-kselftest, Michal Luczaj
In-Reply-To: <20260626-sockmap-lookup-udp-leak-v2-0-7e7e201c951a@rbox.co>

Update sockmap_listen to accommodate the recent change in sockmap that
rejects unbound UDP sockets.

TCP: Reject unbound and bound (unless established or listening).
UDP: Accept only bound sockets.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
 tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index cc0c68bab907..6ee1bc6b3b23 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -63,11 +63,8 @@ static void test_insert_opened(struct test_sockmap_listen *skel __always_unused,
 	errno = 0;
 	value = s;
 	err = bpf_map_update_elem(mapfd, &key, &value, BPF_NOEXIST);
-	if (sotype == SOCK_STREAM) {
-		if (!err || errno != EOPNOTSUPP)
-			FAIL_ERRNO("map_update: expected EOPNOTSUPP");
-	} else if (err)
-		FAIL_ERRNO("map_update: expected success");
+	if (!err || errno != EOPNOTSUPP)
+		FAIL_ERRNO("map_update: expected EOPNOTSUPP");
 	xclose(s);
 }
 
@@ -93,8 +90,12 @@ static void test_insert_bound(struct test_sockmap_listen *skel __always_unused,
 	errno = 0;
 	value = s;
 	err = bpf_map_update_elem(mapfd, &key, &value, BPF_NOEXIST);
-	if (!err || errno != EOPNOTSUPP)
-		FAIL_ERRNO("map_update: expected EOPNOTSUPP");
+	if (sotype == SOCK_STREAM) {
+		if (!err || errno != EOPNOTSUPP)
+			FAIL_ERRNO("map_update: expected EOPNOTSUPP");
+	} else if (err) {
+		FAIL_ERRNO("map_update: expected success");
+	}
 close:
 	xclose(s);
 }
@@ -1289,7 +1290,7 @@ static void test_ops(struct test_sockmap_listen *skel, struct bpf_map *map,
 		/* insert */
 		TEST(test_insert_invalid),
 		TEST(test_insert_opened),
-		TEST(test_insert_bound, SOCK_STREAM),
+		TEST(test_insert_bound),
 		TEST(test_insert),
 		/* delete */
 		TEST(test_delete_after_insert),

-- 
2.54.0


* [PATCH bpf v2 1/4] bpf, sockmap: Reject unhashed UDP sockets on sockmap update
From: Michal Luczaj @ 2026-06-26 20:36 UTC (permalink / raw)
  To: Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
	John Fastabend, Jakub Sitnicki, Jiayuan Chen, David S. Miller,
	Jakub Kicinski, Simon Horman, Alexei Starovoitov, Cong Wang,
	Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Martin KaFai Lau, Song Liu,
	Yonghong Song, Jiri Olsa, Emil Tsalapatis, Shuah Khan
  Cc: netdev, bpf, linux-kernel, linux-kselftest, Michal Luczaj
In-Reply-To: <20260626-sockmap-lookup-udp-leak-v2-0-7e7e201c951a@rbox.co>

UDP sockets get SOCK_RCU_FREE set when (auto-)bound. This means
sk_is_refcounted(unbound) = true, while sk_is_refcounted(bound) = false.

Because sockmap accepts unbound UDP sockets, a BPF program can increment a
socket's refcount via lookup. If the socket is subsequently bound, the
transition from unbound to bound causes bpf_sk_release() to skip the
decrement of the refcount, causing a memory leak.

unreferenced object 0xffff88810bc2eb40 (size 1984):
  comm "test_progs", pid 2451, jiffies 4295320596
  hex dump (first 32 bytes):
    7f 00 00 01 7f 00 00 01 d2 04 1b b7 04 d2 00 00  ................
    02 00 01 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............
  backtrace (crc bdee079d):
    kmem_cache_alloc_noprof+0x557/0x660
    sk_prot_alloc+0x69/0x240
    sk_alloc+0x30/0x460
    inet_create+0x2ce/0xf80
    __sock_create+0x25b/0x5c0
    __sys_socket+0x119/0x1d0
    __x64_sys_socket+0x72/0xd0
    do_syscall_64+0xa1/0x5f0
    entry_SYSCALL_64_after_hwframe+0x76/0x7e
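
The lookup-bind-release sequence described above can be modelled with a toy
userspace sketch. It assumes only what this commit message states (binding
flips sk_is_refcounted() from true to false, and both lookup and release act
only on refcounted sockets); all toy_* names are hypothetical and this is
not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_sock {
	int refcnt;
	bool rcu_free;		/* models SOCK_RCU_FREE, set on bind */
};

/* Models sk_is_refcounted(): true only while the flag is clear. */
static bool toy_is_refcounted(const struct toy_sock *sk)
{
	return !sk->rcu_free;
}

/* Models a BPF lookup: takes a reference only on refcounted sockets. */
static void toy_lookup(struct toy_sock *sk)
{
	if (toy_is_refcounted(sk))
		sk->refcnt++;
}

/* Models bpf_sk_release(): drops a reference only if refcounted. */
static void toy_release(struct toy_sock *sk)
{
	if (toy_is_refcounted(sk)) {
		assert(sk->refcnt > 0);
		sk->refcnt--;
	}
}

static void toy_bind(struct toy_sock *sk)
{
	sk->rcu_free = true;
}

/* Unbound socket in the map: lookup, then bind, then release. */
static int toy_leaked_refs_unbound_insert(void)
{
	struct toy_sock sk = { .refcnt = 1, .rcu_free = false };

	toy_lookup(&sk);	/* refcount goes 1 -> 2 */
	toy_bind(&sk);		/* socket stops being "refcounted" */
	toy_release(&sk);	/* decrement is skipped */
	return sk.refcnt - 1;	/* references beyond the owner's */
}

/* Only bound sockets in the map: lookup and release stay symmetric. */
static int toy_leaked_refs_bound_insert(void)
{
	struct toy_sock sk = { .refcnt = 1, .rcu_free = false };

	toy_bind(&sk);
	toy_lookup(&sk);
	toy_release(&sk);
	return sk.refcnt - 1;
}
```

In this model the first sequence leaves one stray reference and the second
leaves none, which is the asymmetry that rejecting unhashed sockets avoids.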

Instead of special-casing for refcounted sockets, reject unhashed UDP
sockets during sockmap updates, as there is no benefit to supporting those.
This effectively reverts the commit under Fixes, with two exceptions:

1. sock_map_sk_state_allowed() maintains a fall-through `return true`.
2. In the spirit of commit b8b8315e39ff ("bpf, sockmap: Remove unhash
   handler for BPF sockmap usage"), the proto::unhash BPF handler is not
   reintroduced.

Historical note: this issue is related to commit 67312adc96b5 ("bpf: reject
unhashed sockets in bpf_sk_assign").

Fixes: 0c48eefae712 ("sock_map: Lift socket state restriction for datagram sockets")
Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
 net/core/sock_map.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index c60ba6d292f9..9efbd8ca7db8 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -542,6 +542,8 @@ static bool sock_map_sk_state_allowed(const struct sock *sk)
 {
 	if (sk_is_tcp(sk))
 		return (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_LISTEN);
+	if (sk_is_udp(sk))
+		return sk_hashed(sk);
 	if (sk_is_stream_unix(sk))
 		return (1 << READ_ONCE(sk->sk_state)) & TCPF_ESTABLISHED;
 	if (sk_is_vsock(sk) &&

-- 
2.54.0

* [PATCH bpf v2 4/4] selftests/bpf: Fail unbound UDP on sockmap update
From: Michal Luczaj @ 2026-06-26 20:36 UTC (permalink / raw)
  To: Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
	John Fastabend, Jakub Sitnicki, Jiayuan Chen, David S. Miller,
	Jakub Kicinski, Simon Horman, Alexei Starovoitov, Cong Wang,
	Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Martin KaFai Lau, Song Liu,
	Yonghong Song, Jiri Olsa, Emil Tsalapatis, Shuah Khan
  Cc: netdev, bpf, linux-kernel, linux-kselftest, Michal Luczaj
In-Reply-To: <20260626-sockmap-lookup-udp-leak-v2-0-7e7e201c951a@rbox.co>

sockmap now rejects unbound UDP sockets. Adjust test_maps.

This effectively reverts commit c39aa2159974 ("bpf, selftests: Fix
test_maps now that sockmap supports UDP").

Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
 tools/testing/selftests/bpf/test_maps.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index c32da7bd8be2..81cd5d0d69c1 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -759,12 +759,12 @@ static void test_sockmap(unsigned int tasks, void *data)
 		goto out_sockmap;
 	}
 
-	/* Test update with unsupported UDP socket */
+	/* Test update with unsupported unbound UDP socket */
 	udp = socket(AF_INET, SOCK_DGRAM, 0);
 	i = 0;
 	err = bpf_map_update_elem(fd, &i, &udp, BPF_ANY);
-	if (err) {
-		printf("Failed socket update SOCK_DGRAM '%i:%i'\n",
+	if (!err) {
+		printf("Failed allowed unbound SOCK_DGRAM socket update '%i:%i'\n",
 		       i, udp);
 		goto out_sockmap;
 	}

-- 
2.54.0


* [PATCH bpf v2 0/4] bpf, sockmap: Fix sockmap leaking UDP socks
From: Michal Luczaj @ 2026-06-26 20:36 UTC (permalink / raw)
  To: Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
	John Fastabend, Jakub Sitnicki, Jiayuan Chen, David S. Miller,
	Jakub Kicinski, Simon Horman, Alexei Starovoitov, Cong Wang,
	Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Martin KaFai Lau, Song Liu,
	Yonghong Song, Jiri Olsa, Emil Tsalapatis, Shuah Khan
  Cc: netdev, bpf, linux-kernel, linux-kselftest, Michal Luczaj

Fix for UDP sockets getting leaked during sockmap lookup/release.
Accompanied by selftests updates.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
Changes in v2:
- selftest: drop the original, adapt old tests
- fix: change approach to rejecting unbound UDP [Kuniyuki]
- Link to v1: https://patch.msgid.link/20260623-sockmap-lookup-udp-leak-v1-0-05804f9308e4@rbox.co

---
Michal Luczaj (4):
      bpf, sockmap: Reject unhashed UDP sockets on sockmap update
      selftests/bpf: Ensure UDP sockets are bound
      selftests/bpf: Adapt sockmap update error handling
      selftests/bpf: Fail unbound UDP on sockmap update

 net/core/sock_map.c                                     |  2 ++
 tools/testing/selftests/bpf/prog_tests/sockmap_basic.c  |  6 +++---
 tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 17 +++++++++--------
 tools/testing/selftests/bpf/test_maps.c                 |  6 +++---
 4 files changed, 17 insertions(+), 14 deletions(-)
---
base-commit: 26490a375cb9be9bac96b5171610fd85ca6c2305
change-id: 20260617-sockmap-lookup-udp-leak-bc4e5c5481d7

Best regards,
--  
Michal Luczaj <mhal@rbox.co>


* [PATCH bpf v2 2/4] selftests/bpf: Ensure UDP sockets are bound
From: Michal Luczaj @ 2026-06-26 20:36 UTC (permalink / raw)
  To: Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
	John Fastabend, Jakub Sitnicki, Jiayuan Chen, David S. Miller,
	Jakub Kicinski, Simon Horman, Alexei Starovoitov, Cong Wang,
	Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Martin KaFai Lau, Song Liu,
	Yonghong Song, Jiri Olsa, Emil Tsalapatis, Shuah Khan
  Cc: netdev, bpf, linux-kernel, linux-kselftest, Michal Luczaj
In-Reply-To: <20260626-sockmap-lookup-udp-leak-v2-0-7e7e201c951a@rbox.co>

Update sockmap_basic tests to bind sockets before they are used. This
accommodates the recent change in sockmap that rejects unbound UDP sockets.
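
As a rough illustration of what "bound before use" means for a UDP socket,
a plain POSIX sketch follows. The demo_* helpers are hypothetical and are
not the selftests' socket_loopback() helper; they only show a bind to
127.0.0.1 with a kernel-chosen port.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a UDP socket bound to 127.0.0.1 on an ephemeral port.
 * Returns the fd, or -1 on error. */
static int demo_udp_bound_loopback(void)
{
	struct sockaddr_in addr;
	int fd;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Return the local port of a socket in host byte order, or -1. */
static int demo_local_port(int fd)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);

	assert(fd >= 0);
	if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
		return -1;
	return ntohs(addr.sin_port);
}
```

After a successful bind, demo_local_port() reports a non-zero port, whereas
a freshly created, never-bound UDP socket would report port 0.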

Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
 tools/testing/selftests/bpf/prog_tests/sockmap_basic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
index cb3229711f93..2d22a9058a8e 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@ -853,7 +853,7 @@ static void test_sockmap_many_socket(void)
 		return;
 	}
 
-	udp = xsocket(AF_INET, SOCK_DGRAM | SOCK_NONBLOCK, 0);
+	udp = socket_loopback(AF_INET, SOCK_DGRAM | SOCK_NONBLOCK);
 	if (udp < 0) {
 		close(dgram);
 		close(tcp);
@@ -922,7 +922,7 @@ static void test_sockmap_many_maps(void)
 		return;
 	}
 
-	udp = xsocket(AF_INET, SOCK_DGRAM | SOCK_NONBLOCK, 0);
+	udp = socket_loopback(AF_INET, SOCK_DGRAM | SOCK_NONBLOCK);
 	if (udp < 0) {
 		close(dgram);
 		close(tcp);
@@ -993,7 +993,7 @@ static void test_sockmap_same_sock(void)
 		return;
 	}
 
-	udp = xsocket(AF_INET, SOCK_DGRAM | SOCK_NONBLOCK, 0);
+	udp = socket_loopback(AF_INET, SOCK_DGRAM | SOCK_NONBLOCK);
 	if (udp < 0) {
 		close(dgram);
 		close(tcp);

-- 
2.54.0


* Re: [RFC PATCH 1/2] landlock: fix TCP Fast Open connection bypass
From: Mickaël Salaün @ 2026-06-26 20:40 UTC (permalink / raw)
  To: Matthieu Buffet
  Cc: Bryam Vargas, Günther Noack, linux-security-module,
	Mikhail Ivanov, Paul Moore, Eric Dumazet, Neal Cardwell,
	linux-kernel, netdev
In-Reply-To: <20260617180526.15627-2-matthieu@buffet.re>

Thanks Matthieu, could you please rebase this series on the master
branch (especially on top of your UDP changes)?

This patch will be useful for backports though.

On Wed, Jun 17, 2026 at 08:05:23PM +0200, Matthieu Buffet wrote:
> The documentation of the socket_connect() LSM hook states that it
> controls connecting a socket to a remote address. It has not been the
> case since the addition of TCP Fast Open (RFC 7413) support, which allows
> opening a TCP connection (thus, setting a socket's destination address)
> via the MSG_FASTOPEN flag passed to sendto()/sendmsg()/sendmmsg(). The
> problem then got duplicated into MPTCP.
> 
> Landlock did not take it into account when its TCP support was added,
> leaving a bypass of TCP connect policy.
> 
> Ideally a call to the LSM hook would be added in the fastopen code path,
> in order to fix this generically. But connect() hooks are designed to run
> with the socket locked, unlike sendmsg() hooks.
> 
> Closes: https://github.com/landlock-lsm/linux/issues/41
> Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")
> Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
> ---
>  security/landlock/net.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/security/landlock/net.c b/security/landlock/net.c
> index 4ee4002a8f56..a2375762c18b 100644
> --- a/security/landlock/net.c
> +++ b/security/landlock/net.c
> @@ -246,9 +246,26 @@ static int hook_socket_connect(struct socket *const sock,
>  					   access_request);
>  }
>  
> +static int hook_socket_sendmsg(struct socket *const sock,
> +			       struct msghdr *const msg, const int size)
> +{
> +	struct sockaddr *const address = msg->msg_name;
> +	const int addrlen = msg->msg_namelen;
> +
> +	if (sk_is_tcp(sock->sk) && address != NULL &&
> +	    (msg->msg_flags & MSG_FASTOPEN) != 0) {

This might be a bit better:

  if ((msg->msg_flags & MSG_FASTOPEN) && address && sk_is_tcp(sock->sk))

> +		return current_check_access_socket(
> +			sock, address, addrlen,
> +			LANDLOCK_ACCESS_NET_CONNECT_TCP);
> +	}
> +
> +	return 0;
> +}
> +
>  static struct security_hook_list landlock_hooks[] __ro_after_init = {
>  	LSM_HOOK_INIT(socket_bind, hook_socket_bind),
>  	LSM_HOOK_INIT(socket_connect, hook_socket_connect),
> +	LSM_HOOK_INIT(socket_sendmsg, hook_socket_sendmsg),
>  };
>  
>  __init void landlock_add_net_hooks(void)
> -- 
> 2.47.3
> 
> 

* Re: [PATCH bpf 1/2] bpf, sockmap: Don't leak UDP socks on lookup-bind-release
From: Michal Luczaj @ 2026-06-26 20:42 UTC (permalink / raw)
  To: Jakub Sitnicki, Kuniyuki Iwashima
  Cc: Willem de Bruijn, John Fastabend, Jiayuan Chen, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Alexei Starovoitov, Cong Wang, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Martin KaFai Lau,
	Song Liu, Yonghong Song, Jiri Olsa, Emil Tsalapatis, Shuah Khan,
	netdev, bpf, linux-kernel, linux-kselftest
In-Reply-To: <87o6gyyjxk.fsf@cloudflare.com>

On 6/25/26 12:48, Jakub Sitnicki wrote:
> On Wed, Jun 24, 2026 at 02:39 PM -07, Kuniyuki Iwashima wrote:
>> On Wed, Jun 24, 2026 at 2:33 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>>> ...
>>> Setting SOCK_RCU_FREE itself should not cause a problem, but I think
>>> we should take a step back.
>>>
>>> AFAIU, 0c48eefae712 was to allow putting AF_UNIX SOCK_DGRAM sockets
>>> into sockmap, not to allow using unconnected UDP sockets in sk_lookup etc.
>>>
>>> Actually, v4 of the patch was implemented as such but did not get any feedback,
>>> https://lore.kernel.org/bpf/20210508220835.53801-9-xiyou.wangcong@gmail.com/#t
>>>
>>> ... and v5 (the final commit) somehow removed the restriction for unconnected
>>> UDP socket as well.
>>> https://lore.kernel.org/bpf/20210704190252.11866-3-xiyou.wangcong@gmail.com/
>>>
>>> Given the initial use case, sockmap redirect, is still blocked by
>>> TCP_ESTABLISHED
>>> check in sock_map_redirect_allowed(), I feel there is no point in supporting
>>> unconnected UDP sockets in sockmap.  It cannot get any skb from anywhere
>>> (without buggy sk_lookup).
>>
>> s/unconnected/unhashed/g :)
> 
> Rejecting unhashed UDP sockets on insert to sockmap SGTM.
> It is also in line with disable-problematic-cases strategy.

OK, here's v2 with the sock_map_sk_state_allowed() check reintroduced:
https://lore.kernel.org/bpf/20260626-sockmap-lookup-udp-leak-v2-0-7e7e201c951a@rbox.co/

Thanks,
Michal


* Re: [PATCH bpf v2 1/4] bpf, sockmap: Reject unhashed UDP sockets on sockmap update
From: Kuniyuki Iwashima @ 2026-06-26 20:45 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: Eric Dumazet, Paolo Abeni, Willem de Bruijn, John Fastabend,
	Jakub Sitnicki, Jiayuan Chen, David S. Miller, Jakub Kicinski,
	Simon Horman, Alexei Starovoitov, Cong Wang, Daniel Borkmann,
	Andrii Nakryiko, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Martin KaFai Lau, Song Liu, Yonghong Song, Jiri Olsa,
	Emil Tsalapatis, Shuah Khan, netdev, bpf, linux-kernel,
	linux-kselftest
In-Reply-To: <20260626-sockmap-lookup-udp-leak-v2-1-7e7e201c951a@rbox.co>

On Fri, Jun 26, 2026 at 1:37 PM Michal Luczaj <mhal@rbox.co> wrote:
>
> UDP sockets get SOCK_RCU_FREE set when (auto-)bound. This means
> sk_is_refcounted(unbound) = true, while sk_is_refcounted(bound) = false.
>
> Because sockmap accepts unbound UDP sockets, a BPF program can increment a
> socket's refcount via lookup. If the socket is subsequently bound, the
> transition from unbound to bound causes bpf_sk_release() to skip the
> decrement of the refcount, causing a memory leak.
>
> unreferenced object 0xffff88810bc2eb40 (size 1984):
>   comm "test_progs", pid 2451, jiffies 4295320596
>   hex dump (first 32 bytes):
>     7f 00 00 01 7f 00 00 01 d2 04 1b b7 04 d2 00 00  ................
>     02 00 01 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............
>   backtrace (crc bdee079d):
>     kmem_cache_alloc_noprof+0x557/0x660
>     sk_prot_alloc+0x69/0x240
>     sk_alloc+0x30/0x460
>     inet_create+0x2ce/0xf80
>     __sock_create+0x25b/0x5c0
>     __sys_socket+0x119/0x1d0
>     __x64_sys_socket+0x72/0xd0
>     do_syscall_64+0xa1/0x5f0
>     entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Instead of special-casing for refcounted sockets, reject unhashed UDP
> sockets during sockmap updates, as there is no benefit to supporting those.
> This effectively reverts the commit under Fixes, with two exceptions:
>
> 1. sock_map_sk_state_allowed() maintains a fall-through `return true`.
> 2. In the spirit of commit b8b8315e39ff ("bpf, sockmap: Remove unhash
>    handler for BPF sockmap usage"), the proto::unhash BPF handler is not
>    reintroduced.
>
> Historical note: this issue is related to commit 67312adc96b5 ("bpf: reject
> unhashed sockets in bpf_sk_assign").
>
> Fixes: 0c48eefae712 ("sock_map: Lift socket state restriction for datagram sockets")
> Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
> Signed-off-by: Michal Luczaj <mhal@rbox.co>

Looks good, thanks !

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

* Re: [PATCH bpf v2 2/4] selftests/bpf: Ensure UDP sockets are bound
From: Kuniyuki Iwashima @ 2026-06-26 20:47 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: Eric Dumazet, Paolo Abeni, Willem de Bruijn, John Fastabend,
	Jakub Sitnicki, Jiayuan Chen, David S. Miller, Jakub Kicinski,
	Simon Horman, Alexei Starovoitov, Cong Wang, Daniel Borkmann,
	Andrii Nakryiko, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Martin KaFai Lau, Song Liu, Yonghong Song, Jiri Olsa,
	Emil Tsalapatis, Shuah Khan, netdev, bpf, linux-kernel,
	linux-kselftest
In-Reply-To: <20260626-sockmap-lookup-udp-leak-v2-2-7e7e201c951a@rbox.co>

On Fri, Jun 26, 2026 at 1:37 PM Michal Luczaj <mhal@rbox.co> wrote:
>
> Update sockmap_basic tests to bind sockets before they are used. This
> accommodates the recent change in sockmap that rejects unbound UDP sockets.
>
> Signed-off-by: Michal Luczaj <mhal@rbox.co>

nit: this should be patch 1.

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

* Re: [PATCH bpf v2 3/4] selftests/bpf: Adapt sockmap update error handling
From: Kuniyuki Iwashima @ 2026-06-26 20:58 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: Eric Dumazet, Paolo Abeni, Willem de Bruijn, John Fastabend,
	Jakub Sitnicki, Jiayuan Chen, David S. Miller, Jakub Kicinski,
	Simon Horman, Alexei Starovoitov, Cong Wang, Daniel Borkmann,
	Andrii Nakryiko, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Martin KaFai Lau, Song Liu, Yonghong Song, Jiri Olsa,
	Emil Tsalapatis, Shuah Khan, netdev, bpf, linux-kernel,
	linux-kselftest
In-Reply-To: <20260626-sockmap-lookup-udp-leak-v2-3-7e7e201c951a@rbox.co>

On Fri, Jun 26, 2026 at 1:37 PM Michal Luczaj <mhal@rbox.co> wrote:
>
> Update sockmap_listen to accommodate the recent change in sockmap that
> rejects unbound UDP sockets.
>
> TCP: Reject unbound and bound (unless established or listening).
> UDP: Accept only bound sockets.
>
> Signed-off-by: Michal Luczaj <mhal@rbox.co>
> ---
>  tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> index cc0c68bab907..6ee1bc6b3b23 100644
> --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> @@ -63,11 +63,8 @@ static void test_insert_opened(struct test_sockmap_listen *skel __always_unused,
>         errno = 0;
>         value = s;
>         err = bpf_map_update_elem(mapfd, &key, &value, BPF_NOEXIST);
> -       if (sotype == SOCK_STREAM) {
> -               if (!err || errno != EOPNOTSUPP)
> -                       FAIL_ERRNO("map_update: expected EOPNOTSUPP");
> -       } else if (err)
> -               FAIL_ERRNO("map_update: expected success");

Initially I thought AF_UNIX still exercised this path but it was removed
in f3de1cf621f7.  The leftover in family_str() was a bit confusing, so please
follow up on bpf-next.

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

* Re: [PATCH bpf v2 4/4] selftests/bpf: Fail unbound UDP on sockmap update
From: Kuniyuki Iwashima @ 2026-06-26 21:03 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: Eric Dumazet, Paolo Abeni, Willem de Bruijn, John Fastabend,
	Jakub Sitnicki, Jiayuan Chen, David S. Miller, Jakub Kicinski,
	Simon Horman, Alexei Starovoitov, Cong Wang, Daniel Borkmann,
	Andrii Nakryiko, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Martin KaFai Lau, Song Liu, Yonghong Song, Jiri Olsa,
	Emil Tsalapatis, Shuah Khan, netdev, bpf, linux-kernel,
	linux-kselftest
In-Reply-To: <20260626-sockmap-lookup-udp-leak-v2-4-7e7e201c951a@rbox.co>

On Fri, Jun 26, 2026 at 1:37 PM Michal Luczaj <mhal@rbox.co> wrote:
>
> sockmap now rejects unbound UDP sockets. Adjust test_maps.
>
> This effectively reverts commit c39aa2159974 ("bpf, selftests: Fix
> test_maps now that sockmap supports UDP").
>
> Signed-off-by: Michal Luczaj <mhal@rbox.co>
> ---
>  tools/testing/selftests/bpf/test_maps.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
> index c32da7bd8be2..81cd5d0d69c1 100644
> --- a/tools/testing/selftests/bpf/test_maps.c
> +++ b/tools/testing/selftests/bpf/test_maps.c
> @@ -759,12 +759,12 @@ static void test_sockmap(unsigned int tasks, void *data)
>                 goto out_sockmap;
>         }
>
> -       /* Test update with unsupported UDP socket */
> +       /* Test update with unsupported unbound UDP socket */
>         udp = socket(AF_INET, SOCK_DGRAM, 0);
>         i = 0;
>         err = bpf_map_update_elem(fd, &i, &udp, BPF_ANY);
> -       if (err) {
> -               printf("Failed socket update SOCK_DGRAM '%i:%i'\n",
> +       if (!err) {
> +               printf("Failed allowed unbound SOCK_DGRAM socket update '%i:%i'\n",

nit: Maybe s/Failed/Unexpectedly succeeded/ ?

If we want to avoid breakage, this patch needs to be squashed into
the fix patch, but that is discouraged in netdev; I'm not sure about
the bpf tree.

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>


* Re: [PATCH bpf v2 3/4] selftests/bpf: Adapt sockmap update error handling
From: bot+bpf-ci @ 2026-06-26 21:14 UTC (permalink / raw)
  To: mhal, edumazet, kuniyu, pabeni, willemb, john.fastabend, jakub,
	jiayuan.chen, davem, kuba, horms, ast, cong.wang, daniel, andrii,
	eddyz87, memxor, martin.lau, song, yonghong.song, jolsa, emil,
	shuah
  Cc: netdev, bpf, linux-kernel, linux-kselftest, mhal, ast, andrii,
	daniel, martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260626-sockmap-lookup-udp-leak-v2-3-7e7e201c951a@rbox.co>


>     selftests/bpf: Adapt sockmap update error handling
>
>     Update sockmap_listen to accommodate the recent change in sockmap that
>     rejects unbound UDP sockets.
>
>     TCP: Reject unbound and bound (unless established or listening).
>     UDP: Accept only bound sockets.
>
>     Signed-off-by: Michal Luczaj <mhal@rbox.co>

This adjusts the test expectations that commit 0c48eefae712 ("sock_map:
Lift socket state restriction for datagram sockets") had introduced,
namely the per-sotype branching in test_insert_opened() along with the
matching changes in test_insert_bound() and test_ops().

The companion kernel fix 77a34848a723 ("bpf, sockmap: Reject unhashed UDP
sockets on sockmap update") already carries:

  Fixes: 0c48eefae712 ("sock_map: Lift socket state restriction for datagram sockets")

Should this selftest commit carry the same Fixes: tag?

  Fixes: 0c48eefae712 ("sock_map: Lift socket state restriction for datagram sockets")


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/28264556257


* Re: [RFC PATCH 2/2] selftests/landlock: Add test for TCP fast open
From: Mickaël Salaün @ 2026-06-26 21:19 UTC (permalink / raw)
  To: Matthieu Buffet
  Cc: Bryam Vargas, Günther Noack, linux-security-module,
	Mikhail Ivanov, Paul Moore, Eric Dumazet, Neal Cardwell,
	linux-kernel, netdev
In-Reply-To: <20260617180526.15627-3-matthieu@buffet.re>

On Wed, Jun 17, 2026 at 08:05:24PM +0200, Matthieu Buffet wrote:
> Enforce that TCP Fast Open is controlled by
> LANDLOCK_ACCESS_NET_CONNECT_TCP. Semantics of connect() and
> sendmsg(MSG_FASTOPEN) should be identical from Landlock's perspective.
> Also enforce error code consistency, since UDP sockets ignore
> the MSG_FASTOPEN flag while Unix sockets reject it.
> 
> Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
> ---
>  tools/testing/selftests/landlock/net_test.c | 155 ++++++++++++++++++++
>  1 file changed, 155 insertions(+)
> 
> diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c
> index 0c256e7c8675..177ed28e70f6 100644
> --- a/tools/testing/selftests/landlock/net_test.c
> +++ b/tools/testing/selftests/landlock/net_test.c
> @@ -258,6 +258,64 @@ static int connect_variant(const int sock_fd,
>  	return connect_variant_addrlen(sock_fd, srv, get_addrlen(srv, false));
>  }
>  
> +static int sendto_variant_addrlen(const int sock_fd,
> +				  const struct service_fixture *const srv,
> +				  const socklen_t addrlen, void *buf,
> +				  size_t len, size_t flags)
> +{
> +	const struct sockaddr *dst = NULL;
> +	ssize_t ret;
> +
> +	/*
> +        * We never want our processes to be killed by SIGPIPE: we check return
> +        * codes and errno, so that we have actual error messages.
> +        */

There are some extra spaces above.

> +	flags |= MSG_NOSIGNAL;
> +
> +	if (srv != NULL) {

Just `if (srv) {`

> +		switch (srv->protocol.domain) {
> +		case AF_UNSPEC:
> +		case AF_INET:
> +			dst = (const struct sockaddr *)&srv->ipv4_addr;
> +			break;
> +
> +		case AF_INET6:
> +			dst = (const struct sockaddr *)&srv->ipv6_addr;
> +			break;
> +
> +		case AF_UNIX:
> +			dst = (const struct sockaddr *)&srv->unix_addr;
> +			break;
> +
> +		default:
> +			errno = EAFNOSUPPORT;
> +			return -errno;
> +		}
> +	}
> +
> +	ret = sendto(sock_fd, buf, len, flags, dst, addrlen);
> +	if (ret < 0)
> +		return -errno;
> +
> +	/* errno is not set in cases of partial writes. */
> +	if (ret != len)
> +		return -EINTR;
> +
> +	return 0;
> +}
> +
> +static int sendto_variant(const int sock_fd,
> +			  const struct service_fixture *const srv, void *buf,
> +			  size_t len, size_t flags)
> +{
> +	socklen_t addrlen = 0;
> +
> +	if (srv != NULL)

ditto

> +		addrlen = get_addrlen(srv, false);
> +
> +	return sendto_variant_addrlen(sock_fd, srv, addrlen, buf, len, flags);
> +}
> +
>  FIXTURE(protocol)
>  {
>  	struct service_fixture srv0, srv1, srv2, unspec_any0, unspec_srv0;
> @@ -950,6 +1008,103 @@ TEST_F(protocol, connect_unspec)
>  	EXPECT_EQ(0, close(bind_fd));
>  }
>  
> +TEST_F(protocol, tcp_fastopen)
> +{
> +	const bool restricted = variant->sandbox == TCP_SANDBOX &&
> +				variant->prot.type == SOCK_STREAM &&
> +				(variant->prot.protocol == IPPROTO_TCP ||
> +				 variant->prot.protocol == IPPROTO_IP) &&
> +				(variant->prot.domain == AF_INET ||
> +				 variant->prot.domain == AF_INET6);
> +	const struct landlock_ruleset_attr ruleset_attr = {
> +		.handled_access_net = LANDLOCK_ACCESS_NET_CONNECT_TCP,
> +	};
> +	int bind_fd, client_fd, status;
> +	char buf;
> +	pid_t child;
> +
> +	bind_fd = socket_variant(&self->srv0);
> +	ASSERT_LE(0, bind_fd);
> +	EXPECT_EQ(0, bind_variant(bind_fd, &self->srv0));
> +	if (self->srv0.protocol.type == SOCK_STREAM)
> +		EXPECT_EQ(0, listen(bind_fd, backlog));
> +
> +	child = fork();
> +	ASSERT_LE(0, child);
> +	if (child == 0) {
> +		int connect_fd, ret;
> +
> +		/* Closes listening socket for the child. */
> +		EXPECT_EQ(0, close(bind_fd));
> +
> +		connect_fd = socket_variant(&self->srv0);
> +		ASSERT_LE(0, connect_fd);
> +
> +		if (variant->sandbox == TCP_SANDBOX) {
> +			const int ruleset_fd = landlock_create_ruleset(
> +				&ruleset_attr, sizeof(ruleset_attr), 0);
> +			ASSERT_LE(0, ruleset_fd);
> +
> +			enforce_ruleset(_metadata, ruleset_fd);
> +			EXPECT_EQ(0, close(ruleset_fd));
> +		}
> +
> +		/* Fast Open with no address. */
> +		ret = sendto_variant(connect_fd, NULL, NULL, 0, MSG_FASTOPEN);


> +		if (self->srv0.protocol.domain == AF_UNIX) {
> +			ASSERT_EQ(-ENOTCONN, ret);
> +		} else if (self->srv0.protocol.type == SOCK_DGRAM) {
> +			ASSERT_EQ(-EDESTADDRREQ, ret);
> +		} else {
> +			ASSERT_EQ(-EINVAL, ret);
> +		}
> +
> +		/* Fast Open to a denied address. */
> +		ret = sendto_variant(connect_fd, &self->srv0, "A", 1,
> +				     MSG_FASTOPEN);
> +		if (restricted) {
> +			ASSERT_EQ(-EACCES, ret);
> +		} else if (self->srv0.protocol.domain == AF_UNIX &&
> +			   self->srv0.protocol.type == SOCK_STREAM) {
> +			ASSERT_EQ(-EOPNOTSUPP, ret);
> +		} else {
> +			ASSERT_EQ(0, ret);
> +		}

All these ret checks should use EXPECT_EQ: a failure here does not
prevent the rest of the test from running, and continuing past it gives
us more information.
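
To illustrate the difference, here is a minimal standalone sketch. It
does not use kselftest_harness.h; run_checks() and struct check_result
are hypothetical names. It models an ASSERT-style check as stopping at
the first mismatch and an EXPECT-style check as recording the failure
and continuing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one test run; not part of kselftest. */
struct check_result {
	size_t checks_run;
	size_t failures;
};

/*
 * Compare actual[i] with expected[i] for n values.
 * stop_on_failure == true models ASSERT_EQ (bail out on first mismatch),
 * stop_on_failure == false models EXPECT_EQ (record and keep going).
 */
static struct check_result run_checks(const int *actual, const int *expected,
				      size_t n, bool stop_on_failure)
{
	struct check_result res = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		res.checks_run++;
		if (actual[i] != expected[i]) {
			res.failures++;
			if (stop_on_failure)
				break;
		}
	}
	return res;
}
```

With two mismatching values out of three, the ASSERT-style run reports
only the first one, while the EXPECT-style run reports both.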

> +
> +		EXPECT_EQ(0, close(connect_fd));
> +		_exit(_metadata->exit_code);
> +		return;
> +	}
> +
> +	client_fd = bind_fd;
> +	if (!restricted && self->srv0.protocol.type == SOCK_STREAM &&
> +	    self->srv0.protocol.domain != AF_UNIX) {
> +		client_fd = accept(bind_fd, NULL, 0);
> +		ASSERT_LE(0, client_fd);
> +	}
> +
> +	if (restricted) {
> +		EXPECT_EQ(-1, read(client_fd, &buf, 1));
> +		EXPECT_EQ(ENOTCONN, errno);
> +	} else if (self->srv0.protocol.domain == AF_UNIX &&
> +		   self->srv0.protocol.type == SOCK_STREAM) {
> +		EXPECT_EQ(-1, read(client_fd, &buf, 1));
> +		EXPECT_EQ(EINVAL, errno);
> +	} else {
> +		EXPECT_EQ(1, read(client_fd, &buf, 1));
> +		EXPECT_EQ('A', buf);
> +	}
> +
> +	EXPECT_EQ(child, waitpid(child, &status, 0));
> +	EXPECT_EQ(1, WIFEXITED(status));
> +	EXPECT_EQ(EXIT_SUCCESS, WEXITSTATUS(status));
> +
> +	if (client_fd != bind_fd)
> +		EXPECT_LE(0, close(client_fd));
> +
> +	EXPECT_EQ(0, close(bind_fd));
> +}
> +
>  FIXTURE(ipv4)
>  {
>  	struct service_fixture srv0, srv1;
> -- 
> 2.47.3
> 
> 


* Re: [PATCH bpf 1/2] bpf, sockmap: Don't leak UDP socks on lookup-bind-release
From: John Fastabend @ 2026-06-26 21:43 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: Kuniyuki Iwashima, Michal Luczaj, Willem de Bruijn, Jiayuan Chen,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Alexei Starovoitov, Cong Wang, Daniel Borkmann,
	Andrii Nakryiko, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Martin KaFai Lau, Song Liu, Yonghong Song, Jiri Olsa,
	Emil Tsalapatis, Shuah Khan, netdev, bpf, linux-kernel,
	linux-kselftest
In-Reply-To: <87o6gyyjxk.fsf@cloudflare.com>

On Thu, Jun 25, 2026 at 12:48:55PM +0200, Jakub Sitnicki wrote:
>On Wed, Jun 24, 2026 at 02:39 PM -07, Kuniyuki Iwashima wrote:
>> On Wed, Jun 24, 2026 at 2:33 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>>>
>>> On Wed, Jun 24, 2026 at 2:26 PM Michal Luczaj <mhal@rbox.co> wrote:
>>> >
>>> > On 6/24/26 22:01, Willem de Bruijn wrote:
>>> > > Jakub Sitnicki wrote:
>>> > >> On Tue, Jun 23, 2026 at 08:03 PM +02, Michal Luczaj wrote:
>>> > >>> UDP sockets get SOCK_RCU_FREE set when (auto-)bound. This means
>>> > >>> sk_is_refcounted(unbound) = true, while sk_is_refcounted(bound) = false.
>>> > >>>
>>> > >>> Because sockmap accepts unbound UDP sockets, a BPF program can increment a
>>> > >>> socket's refcount via lookup. If the socket is subsequently bound, the
>>> > >>> transition from unbound to bound causes bpf_sk_release() to skip the
>>> > >>> decrement of the refcount, causing a memory leak.
>>> > >>>
>>> > >>> unreferenced object 0xffff88810bc2eb40 (size 1984):
>>> > >>>   comm "test_progs", pid 2451, jiffies 4295320596
>>> > >>>   hex dump (first 32 bytes):
>>> > >>>     7f 00 00 01 7f 00 00 01 d2 04 1b b7 04 d2 00 00  ................
>>> > >>>     02 00 01 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............
>>> > >>>   backtrace (crc bdee079d):
>>> > >>>     kmem_cache_alloc_noprof+0x557/0x660
>>> > >>>     sk_prot_alloc+0x69/0x240
>>> > >>>     sk_alloc+0x30/0x460
>>> > >>>     inet_create+0x2ce/0xf80
>>> > >>>     __sock_create+0x25b/0x5c0
>>> > >>>     __sys_socket+0x119/0x1d0
>>> > >>>     __x64_sys_socket+0x72/0xd0
>>> > >>>     do_syscall_64+0xa1/0x5f0
>>> > >>>     entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>> > >>>
>>> > >>> Maintain balanced refcounts across sk lookup/release: (re-)set
>>> > >>> SOCK_RCU_FREE on proto update to treat the socket (whether bound or
>>> > >>> unbound) as not requiring a refcount increment on (a RCU protected) lookup.
>>> > >>>
>>> > >>> Fixes: 0c48eefae712 ("sock_map: Lift socket state restriction for datagram sockets")
>>> > >>> Signed-off-by: Michal Luczaj <mhal@rbox.co>

[...]

>Rejecting unhashed UDP sockets on insert to sockmap SGTM.
>It is also in line with disable-problematic-cases strategy.

Agreed, ACK. Just disallow it.
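
For readers following along, the imbalance described in the quoted
commit message can be modeled in a few lines of userspace C. This is
only a toy sketch: struct toy_sock and the toy_*() helpers are
hypothetical stand-ins, and the real sk_is_refcounted() also handles
non-full sockets, which is ignored here.

```c
#include <stdbool.h>

/* Toy stand-in for struct sock; only the two fields that matter here. */
struct toy_sock {
	int refcnt;
	bool rcu_free;	/* models the SOCK_RCU_FREE flag */
};

/* For a full socket: refcounted only while SOCK_RCU_FREE is not set. */
static bool toy_is_refcounted(const struct toy_sock *sk)
{
	return !sk->rcu_free;
}

/* Models a BPF lookup: takes a reference only if the socket is refcounted. */
static void toy_lookup(struct toy_sock *sk)
{
	if (toy_is_refcounted(sk))
		sk->refcnt++;
}

/* Models bpf_sk_release(): drops a reference only if refcounted *now*. */
static void toy_release(struct toy_sock *sk)
{
	if (toy_is_refcounted(sk))
		sk->refcnt--;
}

/* Models (auto-)bind of a UDP socket, which sets SOCK_RCU_FREE. */
static void toy_bind(struct toy_sock *sk)
{
	sk->rcu_free = true;
}

/* Returns the refcount left over after lookup -> bind -> release. */
static int toy_lookup_bind_release(bool start_bound)
{
	struct toy_sock sk = { .refcnt = 1, .rcu_free = start_bound };

	toy_lookup(&sk);
	toy_bind(&sk);
	toy_release(&sk);
	return sk.refcnt;
}
```

Starting unbound, the sequence ends with one extra reference that nothing
will drop; starting bound, the count is balanced. Rejecting unhashed UDP
sockets on insert removes the unbound starting state.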


* Re: [PATCH bpf-next v2 02/15] bpf: Make struct_ops tasks_rcu grace period optional
From: Eduard Zingerman @ 2026-06-26 22:20 UTC (permalink / raw)
  To: Amery Hung, bpf
  Cc: netdev, alexei.starovoitov, andrii, daniel, memxor, martin.lau,
	shakeel.butt, roman.gushchin, kuniyu, kerneljasonxing,
	kernel-team
In-Reply-To: <20260623175006.3136053-3-ameryhung@gmail.com>

On Tue, 2026-06-23 at 10:49 -0700, Amery Hung wrote:
> From: Martin KaFai Lau <martin.lau@kernel.org>
> 
> bpf_struct_ops_map_free() currently waits for both a regular RCU grace
> period and a tasks RCU grace period for every struct_ops map through
> synchronize_rcu_mult(call_rcu, call_rcu_tasks).
> 
> A regular RCU grace period is still required for all struct_ops maps
> because the struct_ops trampoline ksyms requires a rcu grace period
> (take a look at the list_del_rcu in __bpf_ksym_del).
> Add a map_free_pre_rcu() callback so the struct_ops map can remove
> ksyms before bpf_map_put() wait for the regular rcu grace period.
> 
> The tasks RCU grace period is only needed by tcp_congestion_ops.
> Add free_after_tasks_rcu_gp only to struct bpf_struct_ops instead
> of the bpf_map.
> 
> When CONFIG_TASKS_RCU=n, synchronize_rcu_tasks() is the same as
> synchronize_rcu(). Since all struct_ops maps now complete a regular RCU
> grace period before bpf_struct_ops_map_free() runs, skip the extra
> synchronize_rcu_tasks() call in this case.
> 
> This cleanup prepares for a later patch that needs to support
> free_after_mult_rcu_gp.
> 
> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
> Signed-off-by: Amery Hung <ameryhung@gmail.com>
> ---

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

> @@ -997,24 +1006,8 @@ static void bpf_struct_ops_map_free(struct bpf_map *map)
>  
>  	bpf_struct_ops_map_dissoc_progs(st_map);
>  
> -	bpf_struct_ops_map_del_ksyms(st_map);
> -
> -	/* The struct_ops's function may switch to another struct_ops.
> -	 *
> -	 * For example, bpf_tcp_cc_x->init() may switch to
> -	 * another tcp_cc_y by calling
> -	 * setsockopt(TCP_CONGESTION, "tcp_cc_y").
> -	 * During the switch,  bpf_struct_ops_put(tcp_cc_x) is called
> -	 * and its refcount may reach 0 which then free its
> -	 * trampoline image while tcp_cc_x is still running.
> -	 *
> -	 * A vanilla rcu gp is to wait for all bpf-tcp-cc prog
> -	 * to finish. bpf-tcp-cc prog is non sleepable.
> -	 * A rcu_tasks gp is to wait for the last few insn
> -	 * in the tramopline image to finish before releasing
> -	 * the trampoline image.
> -	 */
> -	synchronize_rcu_mult(call_rcu, call_rcu_tasks);
> +	if (tasks_rcu && IS_ENABLED(CONFIG_TASKS_RCU))
> +		synchronize_rcu_tasks();

As far as I understand, this removes the synchronize_rcu_tasks() call
for the qdisc, sched_ext, smc and hid struct_ops. As far as I can tell,
each of them employs separate means to guarantee that there won't be
any pending BPF trampolines referring to the image being freed here.
So the change appears to be safe.

>  
>  	__bpf_struct_ops_map_free(map);
>  }

[...]


* [PATCH] netfilter: x_tables: replace strlcat() with snprintf()
From: Ian Bridges @ 2026-06-26 22:25 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	netfilter-devel, coreteam, netdev, linux-kernel
  Cc: linux-hardening

In preparation for removing the deprecated strlcat() API[1], replace the
strscpy()/strlcat() pairs in xt_proto_init() and xt_proto_fini() with
snprintf(), which builds each /proc file name in a single call.

Each name is "<prefix><suffix>", where <prefix> is the address-family
string xt_prefix[af] and <suffix> is one of the FORMAT_TABLES,
FORMAT_MATCHES or FORMAT_TARGETS literals. snprintf() with a "%s%s"
format produces the same NUL-terminated, length-bounded string as the
strscpy()/strlcat() chain it replaces, so the proc entry names are
unchanged.

Link: https://github.com/KSPP/linux/issues/370 [1]
Signed-off-by: Ian Bridges <icb@fastmail.org>
---
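
A userspace sketch of the single-call construction, for illustration
only. build_proc_name() is a hypothetical helper that is not part of
this patch, and the prefix/suffix strings used with it are example
values rather than quotes from x_tables.c.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Build "<prefix><suffix>" into buf in one call.
 * The result is always NUL-terminated when size > 0. The return value is
 * the length the full name would need (snprintf semantics), so a caller
 * can detect truncation with ret >= size.
 */
static int build_proc_name(char *buf, size_t size, const char *prefix,
			   const char *suffix)
{
	return snprintf(buf, size, "%s%s", prefix, suffix);
}
```

For example, build_proc_name(buf, sizeof(buf), "ip", "_tables_names")
with a 30-byte buffer leaves "ip_tables_names" in buf. If the buffer is
too small, the name is cut off but still terminated.
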
 net/netfilter/x_tables.c | 24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 4e6708c23922..56f4546be336 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -2033,8 +2033,7 @@ int xt_proto_init(struct net *net, u_int8_t af)
 	root_uid = make_kuid(net->user_ns, 0);
 	root_gid = make_kgid(net->user_ns, 0);
 
-	strscpy(buf, xt_prefix[af], sizeof(buf));
-	strlcat(buf, FORMAT_TABLES, sizeof(buf));
+	snprintf(buf, sizeof(buf), "%s%s", xt_prefix[af], FORMAT_TABLES);
 	proc = proc_create_net_data(buf, 0440, net->proc_net, &xt_table_seq_ops,
 			sizeof(struct seq_net_private),
 			(void *)(unsigned long)af);
@@ -2043,8 +2042,7 @@ int xt_proto_init(struct net *net, u_int8_t af)
 	if (uid_valid(root_uid) && gid_valid(root_gid))
 		proc_set_user(proc, root_uid, root_gid);
 
-	strscpy(buf, xt_prefix[af], sizeof(buf));
-	strlcat(buf, FORMAT_MATCHES, sizeof(buf));
+	snprintf(buf, sizeof(buf), "%s%s", xt_prefix[af], FORMAT_MATCHES);
 	proc = proc_create_seq_private(buf, 0440, net->proc_net,
 			&xt_match_seq_ops, sizeof(struct nf_mttg_trav),
 			(void *)(unsigned long)af);
@@ -2053,8 +2051,7 @@ int xt_proto_init(struct net *net, u_int8_t af)
 	if (uid_valid(root_uid) && gid_valid(root_gid))
 		proc_set_user(proc, root_uid, root_gid);
 
-	strscpy(buf, xt_prefix[af], sizeof(buf));
-	strlcat(buf, FORMAT_TARGETS, sizeof(buf));
+	snprintf(buf, sizeof(buf), "%s%s", xt_prefix[af], FORMAT_TARGETS);
 	proc = proc_create_seq_private(buf, 0440, net->proc_net,
 			 &xt_target_seq_ops, sizeof(struct nf_mttg_trav),
 			 (void *)(unsigned long)af);
@@ -2068,13 +2065,11 @@ int xt_proto_init(struct net *net, u_int8_t af)
 
 #ifdef CONFIG_PROC_FS
 out_remove_matches:
-	strscpy(buf, xt_prefix[af], sizeof(buf));
-	strlcat(buf, FORMAT_MATCHES, sizeof(buf));
+	snprintf(buf, sizeof(buf), "%s%s", xt_prefix[af], FORMAT_MATCHES);
 	remove_proc_entry(buf, net->proc_net);
 
 out_remove_tables:
-	strscpy(buf, xt_prefix[af], sizeof(buf));
-	strlcat(buf, FORMAT_TABLES, sizeof(buf));
+	snprintf(buf, sizeof(buf), "%s%s", xt_prefix[af], FORMAT_TABLES);
 	remove_proc_entry(buf, net->proc_net);
 out:
 	return -1;
@@ -2087,16 +2082,13 @@ void xt_proto_fini(struct net *net, u_int8_t af)
 #ifdef CONFIG_PROC_FS
 	char buf[XT_FUNCTION_MAXNAMELEN];
 
-	strscpy(buf, xt_prefix[af], sizeof(buf));
-	strlcat(buf, FORMAT_TABLES, sizeof(buf));
+	snprintf(buf, sizeof(buf), "%s%s", xt_prefix[af], FORMAT_TABLES);
 	remove_proc_entry(buf, net->proc_net);
 
-	strscpy(buf, xt_prefix[af], sizeof(buf));
-	strlcat(buf, FORMAT_TARGETS, sizeof(buf));
+	snprintf(buf, sizeof(buf), "%s%s", xt_prefix[af], FORMAT_TARGETS);
 	remove_proc_entry(buf, net->proc_net);
 
-	strscpy(buf, xt_prefix[af], sizeof(buf));
-	strlcat(buf, FORMAT_MATCHES, sizeof(buf));
+	snprintf(buf, sizeof(buf), "%s%s", xt_prefix[af], FORMAT_MATCHES);
 	remove_proc_entry(buf, net->proc_net);
 #endif /*CONFIG_PROC_FS*/
 }
-- 
2.47.3



* Re: [PATCH iwl-next v5 2/2] ice: implement symmetric RSS hash configuration
From: Jakub Kicinski @ 2026-06-26 22:26 UTC (permalink / raw)
  To: Aleksandr Loktionov; +Cc: intel-wired-lan, anthony.l.nguyen, netdev
In-Reply-To: <20260626054730.1126969-3-aleksandr.loktionov@intel.com>

On Fri, 26 Jun 2026 07:47:30 +0200 Aleksandr Loktionov wrote:
> -	/* Update the VSI's hash function */
> -	if (rxfh->input_xfrm & RXH_XFRM_SYM_XOR)
> -		hfunc = ICE_AQ_VSI_Q_OPT_RSS_HASH_SYM_TPLZ;
> +	/* Handle RSS symmetric hash transformation */
> +	if (rxfh->input_xfrm != RXH_XFRM_NO_CHANGE) {
> +		u8 new_hfunc;

I think this is the really bad part. Please extract it and send it as
a fix to net. It looks like any change to the RSS config on ice
unintentionally enables the symmetric xfrm. I isolated it to the
ntuple.py test, which only changes the indirection table, and the
driver says:

  ice 0000:e1:00.0 ens1f0np0: Hash function set to: Symmetric Toeplitz

Which we never asked for. I drafted this before seeing your reply:

--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -3692,10 +3692,10 @@ ice_set_rxfh(struct net_device *netdev, struct ethtool_rxfh_param *rxfh,
             struct netlink_ext_ack *extack)
 {
        struct ice_netdev_priv *np = netdev_priv(netdev);
-       u8 hfunc = ICE_AQ_VSI_Q_OPT_RSS_HASH_TPLZ;
        struct ice_vsi *vsi = np->vsi;
        struct ice_pf *pf = vsi->back;
        struct device *dev;
+       u8 hfunc;
        int err;
 
        dev = ice_pf_to_dev(pf);
@@ -3714,9 +3714,12 @@ ice_set_rxfh(struct net_device *netdev, struct ethtool_rxfh_param *rxfh,
                return -EOPNOTSUPP;
        }
 
-       /* Update the VSI's hash function */
-       if (rxfh->input_xfrm & RXH_XFRM_SYM_XOR)
+       if (rxfh->input_xfrm == RXH_XFRM_NO_CHANGE)
+               hfunc = vsi->rss_hfunc;
+       else if (rxfh->input_xfrm & RXH_XFRM_SYM_XOR)
                hfunc = ICE_AQ_VSI_Q_OPT_RSS_HASH_SYM_TPLZ;
+       else /* input_xfrm == 0; core rejects any other value */
+               hfunc = ICE_AQ_VSI_Q_OPT_RSS_HASH_TPLZ;
 
        err = ice_set_rss_hfunc(vsi, hfunc);
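
A standalone model of why the old check misfires, as a sketch only. The
TOY_* names are local stand-ins, and it assumes the "no change" value is
an all-ones sentinel (0xff) with the symmetric-XOR flag in bit 0, which
is how I read the ethtool UAPI; please double check against the headers.

```c
#include <stdint.h>

/* Local stand-ins; the real definitions live in the ethtool UAPI and ice. */
#define TOY_XFRM_SYM_XOR	(1u << 0)
#define TOY_XFRM_NO_CHANGE	0xffu	/* assumed all-ones sentinel */

enum toy_hfunc {
	TOY_HFUNC_TPLZ,
	TOY_HFUNC_SYM_TPLZ,
};

/* Old logic: the sentinel has bit 0 set, so it passes the SYM_XOR test. */
static enum toy_hfunc toy_pick_hfunc_old(uint8_t input_xfrm)
{
	if (input_xfrm & TOY_XFRM_SYM_XOR)
		return TOY_HFUNC_SYM_TPLZ;
	return TOY_HFUNC_TPLZ;
}

/* Drafted logic: handle the sentinel first and keep the current setting. */
static enum toy_hfunc toy_pick_hfunc(uint8_t input_xfrm,
				     enum toy_hfunc current)
{
	if (input_xfrm == TOY_XFRM_NO_CHANGE)
		return current;
	if (input_xfrm & TOY_XFRM_SYM_XOR)
		return TOY_HFUNC_SYM_TPLZ;
	return TOY_HFUNC_TPLZ;
}
```

Under that assumption, a request that only touches the indirection table
arrives with the sentinel and the old code selects Symmetric Toeplitz,
matching the log line above.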


* Re: [PATCH iwl-next v5 1/2] ethtool: treat RXH_GTP_TEID as intrinsically symmetric
From: Jakub Kicinski @ 2026-06-26 22:29 UTC (permalink / raw)
  To: Aleksandr Loktionov; +Cc: intel-wired-lan, anthony.l.nguyen, netdev
In-Reply-To: <20260626054730.1126969-2-aleksandr.loktionov@intel.com>

On Fri, 26 Jun 2026 07:47:29 +0200 Aleksandr Loktionov wrote:
> A GTP tunnel uses the same TEID value in both directions of a flow;
> including TEID in the hash input does not break src/dst symmetry.
> 
> ethtool_rxfh_config_is_sym() currently rejects any hash field bitmap
> that contains bits outside the four paired L3/L4 fields.  This causes
> drivers that hash GTP flows on TEID to fail the kernel's preflight
> validation in ethtool_check_flow_types(), making it impossible for
> those drivers to support symmetric-xor transforms at all.
> 
> Strip RXH_GTP_TEID from the bitmap before the paired-field check so
> that drivers may honestly report TEID hashing without blocking the
> configuration of symmetric transforms.

I don't know much about GTP, but "the Internet" does not seem to agree
with your claim:

  The TEID uniquely identifies the GSN tunnel endpoints. The tunnels 
  for an uplink and a downlink are separate and use a different TEID.

https://docs.paloaltonetworks.com/service-providers/10-1/mobile-network-infrastructure-getting-started/gtp/mobile-network-protection-profile

So I don't think this will fly.
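
To make the concern concrete, a toy sketch (not Toeplitz, and not what
any driver computes): toy_sym_hash() and struct toy_flow are
hypothetical. The paired fields are XOR-folded so that swapping source
and destination leaves the input unchanged; a per-direction TEID mixed
in on top breaks that property.

```c
#include <stdint.h>

/* Hypothetical flow key for illustration. */
struct toy_flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint32_t teid;
};

/*
 * XOR-fold the paired L3/L4 fields, so src/dst order does not matter.
 * If use_teid is set, the TEID is mixed in as an unpaired field.
 */
static uint32_t toy_sym_hash(const struct toy_flow *f, int use_teid)
{
	uint32_t h = (f->saddr ^ f->daddr) ^ (uint32_t)(f->sport ^ f->dport);

	if (use_teid)
		h ^= f->teid;
	return h;
}
```

With the same TEID in both directions the two hashes would match; with
separate uplink and downlink TEIDs, as the quoted text describes, they
do not.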


* [PATCH net] net: gianfar: dispose irq mappings on probe failure and device removal
From: Rosen Penev @ 2026-06-26 22:52 UTC (permalink / raw)
  To: netdev
  Cc: Claudiu Manoil, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Andy Fleming, open list

irq_of_parse_and_map() creates irqdomain mappings that should be
balanced with irq_dispose_mapping(). The driver never called
irq_dispose_mapping(), leaking mappings on probe failure and
device removal.

Fix by adding irq_dispose_mapping() in free_gfar_dev() and
expanding its loop from priv->num_grps to MAXGROUPS so the
error path also catches partially-initialized groups. All
irqinfo pointers are pre-initialized to NULL in gfar_of_init(),
making the NULL-guarded walk in free_gfar_dev() safe for every
scenario.

gfar_parse_group() itself is left as a simple parse function
with no resource management; cleanup is centralized in the
caller's error path.

Assisted-by: opencode:big-pickle
Fixes: b31a1d8b4151 ("gianfar: Convert gianfar to an of_platform_driver")
Signed-off-by: Rosen Penev <rosenp@gmail.com>
---
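
A userspace sketch of the cleanup pattern described above, for
illustration only. All toy_* names are hypothetical, and toy_dispose()
merely counts calls where the driver would call irq_dispose_mapping().

```c
#include <stdlib.h>

#define TOY_MAXGROUPS	2
#define TOY_NUM_IRQS	3

struct toy_irqinfo {
	unsigned int irq;
};

struct toy_priv {
	struct toy_irqinfo *irqinfo[TOY_MAXGROUPS][TOY_NUM_IRQS];
	unsigned int disposed;	/* number of stand-in dispose calls */
};

/* Stand-in for irq_dispose_mapping(); only counts calls. */
static void toy_dispose(struct toy_priv *priv, unsigned int irq)
{
	(void)irq;
	priv->disposed++;
}

/*
 * Walk every slot, not just the groups that finished parsing. Slots that
 * were never allocated are NULL and are skipped, so the same function
 * serves both the probe error path and normal removal.
 */
static void toy_free_all(struct toy_priv *priv)
{
	int i, j;

	for (i = 0; i < TOY_MAXGROUPS; i++)
		for (j = 0; j < TOY_NUM_IRQS; j++) {
			if (!priv->irqinfo[i][j])
				continue;
			toy_dispose(priv, priv->irqinfo[i][j]->irq);
			free(priv->irqinfo[i][j]);
			priv->irqinfo[i][j] = NULL;
		}
}
```

Because each slot is reset to NULL after it is freed, calling the
cleanup a second time is a no-op in this model.
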
 drivers/net/ethernet/freescale/gianfar.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 3271de5844f8..89215e1ddc2d 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -469,10 +469,13 @@ static void free_gfar_dev(struct gfar_private *priv)
 {
 	int i, j;
 
-	for (i = 0; i < priv->num_grps; i++)
+	for (i = 0; i < MAXGROUPS; i++)
 		for (j = 0; j < GFAR_NUM_IRQS; j++) {
-			kfree(priv->gfargrp[i].irqinfo[j]);
-			priv->gfargrp[i].irqinfo[j] = NULL;
+			if (priv->gfargrp[i].irqinfo[j]) {
+				irq_dispose_mapping(priv->gfargrp[i].irqinfo[j]->irq);
+				kfree(priv->gfargrp[i].irqinfo[j]);
+				priv->gfargrp[i].irqinfo[j] = NULL;
+			}
 		}
 
 	free_netdev(priv->ndev);
@@ -616,7 +619,7 @@ static phy_interface_t gfar_get_interface(struct net_device *dev)
 static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
 {
 	const char *model;
-	int err = 0, i;
+	int err = 0, i, j;
 	phy_interface_t interface;
 	struct net_device *dev = NULL;
 	struct gfar_private *priv = NULL;
@@ -702,8 +705,11 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
 	priv->rx_list.count = 0;
 	mutex_init(&priv->rx_queue_access);
 
-	for (i = 0; i < MAXGROUPS; i++)
+	for (i = 0; i < MAXGROUPS; i++) {
 		priv->gfargrp[i].regs = NULL;
+		for (j = 0; j < GFAR_NUM_IRQS; j++)
+			priv->gfargrp[i].irqinfo[j] = NULL;
+	}
 
 	/* Parse and initialize group specific information */
 	if (priv->mode == MQ_MG_MODE) {
-- 
2.54.0


