Netdev List
* Re: [PATCH] vhost/vdpa: reject overflowing PA map page counts
From: Yousef Alhouseen @ 2026-06-24 21:47 UTC
  To: Michael S. Tsirkin
  Cc: Jason Wang, Eugenio Pérez, kvm, virtualization, netdev,
	linux-kernel
In-Reply-To: <20260624153850-mutt-send-email-mst@kernel.org>

On Wed, Jun 24, 2026 at 01:53:38PM -0400, Michael S. Tsirkin wrote:
> You should add "on 32 bit systems" - I do not see how it can
> overflow on 64 bit.

Right, the overflow I was trying to cover is the unsigned long
page-count calculation on 32-bit systems, where size can be wider than
unsigned long and the page offset is added before PFN_UP(). I should
have made that scope explicit in the changelog.
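
For reference, a minimal userspace sketch of the truncation, assuming a
4 KiB page and modelling the 32-bit unsigned long as uint32_t. The helper
names are made up for illustration and this is not the kernel code; the
checked variant follows the bound proposed in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)

/* Stand-in for ULONG_MAX on a 32-bit kernel (assumption of this model). */
#define ULONG32_MAX UINT32_MAX

/* Unchecked form: the 64-bit page count is narrowed to 32 bits. */
uint32_t npages_unchecked(uint64_t iova, uint64_t size)
{
	uint64_t page_offset = iova & (PAGE_SIZE - 1);

	return (uint32_t)((size + page_offset + PAGE_SIZE - 1) >> PAGE_SHIFT);
}

/* Checked form: returns 0 on success, -1 if the size is rejected. */
int npages_checked(uint64_t iova, uint64_t size, uint32_t *npages)
{
	uint32_t page_offset = (uint32_t)(iova & (PAGE_SIZE - 1));

	/* Reject anything that cannot be represented in 32 bits. */
	if (size > ULONG32_MAX - page_offset)
		return -1;

	*npages = (uint32_t)((size + page_offset + PAGE_SIZE - 1) >> PAGE_SHIFT);
	return 0;
}

/* Hypothetical self-check for the two helpers above. */
void pa_map_model_selftest(void)
{
	uint64_t huge = ((uint64_t)1 << 44) + 4096;
	uint32_t n = 0;

	/* 2^32 + 1 pages requested, but only 1 survives the narrowing. */
	assert(npages_unchecked(0, huge) == 1);
	/* The bounded variant refuses the same request. */
	assert(npages_checked(0, huge, &n) == -1);
	/* A small, unaligned mapping is unaffected by the check. */
	assert(npages_checked(0x1234, 0x2000, &n) == 0 && n == 3);
}
```

In this model, iova = 0 with size = 2^44 + 4096 yields a page count of one
in the unchecked helper, while the checked helper rejects the request.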

> I don't see how this can happen at all - pinned_vm is in units of pages.

Agreed, that part is not needed for this fix. I'll drop the memlock
check change and send a v2 with the changelog clarified to say this is
for 32-bit systems.

Thanks,
Yousef


On Wed, 24 Jun 2026 15:53:38 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Jun 24, 2026 at 09:06:53PM +0200, Yousef Alhouseen wrote:
> > vhost_vdpa_pa_map() adds the IOVA page offset to the user-controlled map
> > size before computing the number of pages to pin. If that addition wraps,
> > the code can pin and map fewer pages than the requested IOTLB range.
> >
> > Reject sizes that overflow the page-count calculation.
>
> You should add "on 32 bit systems" - I do not see how it can
> overflow on 64 bit.
>
> > Also make the
> > memlock check subtraction-based so a large page count cannot wrap the
> > pinned page total.
>
> I don't see how this can happen at all - pinned_vm is in units of pages.
>
> > Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
> > ---
> > drivers/vhost/vdpa.c | 12 ++++++++++--
> > 1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > index ac55275fa..090cb8693 100644
> > --- a/drivers/vhost/vdpa.c
> > +++ b/drivers/vhost/vdpa.c
> > @@ -1102,6 +1102,8 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
> > unsigned int gup_flags = FOLL_LONGTERM;
> > unsigned long npages, cur_base, map_pfn, last_pfn = 0;
> > unsigned long lock_limit, sz2pin, nchunks, i;
> > + unsigned long page_offset;
> > + u64 pinned_vm;
> > u64 start = iova;
> > long pinned;
> > int ret = 0;
> > @@ -1114,7 +1116,12 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
> > if (perm & VHOST_ACCESS_WO)
> > gup_flags |= FOLL_WRITE;
> >
> > - npages = PFN_UP(size + (iova & ~PAGE_MASK));
> > + page_offset = iova & ~PAGE_MASK;
> > + if (size > ULONG_MAX - page_offset) {
> > + ret = -EINVAL;
> > + goto free;
> > + }
> > + npages = PFN_UP(size + page_offset);
> > if (!npages) {
> > ret = -EINVAL;
> > goto free;
> > @@ -1123,7 +1130,8 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
> > mmap_read_lock(dev->mm);
> >
> > lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK));
> > - if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
> > + pinned_vm = atomic64_read(&dev->mm->pinned_vm);
> > + if (npages > lock_limit || pinned_vm > lock_limit - npages) {
> > ret = -ENOMEM;
> > goto unlock;
> > }
> > --
> > 2.54.0


* Re: [PATCH net v4 2/2] net: phy: mdio-i2c: defer RollBall bridge probe to PHY discovery
From: Aleksander Jan Bajkowski @ 2026-06-24 21:44 UTC
  To: Petr Wozniak, Russell King, Andrew Lunn, Heiner Kallweit
  Cc: Jakub Kicinski, David S . Miller, Eric Dumazet, Paolo Abeni,
	netdev, linux-kernel, linux-phy, Maxime Chevallier, Bjorn Mork,
	Marek Behun
In-Reply-To: <20260624084814.20972-3-petr.wozniak@gmail.com>

Hi Petr,

On 24.06.2026 at 10:48, Petr Wozniak wrote:
> commit 8fe125892f40 ("net: phy: sfp: probe for RollBall I2C-to-MDIO
> bridge in mdio-i2c") introduced a regression: the RollBall I2C-to-MDIO
> bridge is not yet ready to respond to CMD_READ/CMD_DONE cycles when
> sfp_sm_add_mdio_bus() runs in SFP_S_INIT.  The 200 ms probe times out,
> i2c_mii_probe_rollball() returns -ENODEV, and sfp_sm_add_mdio_bus()
> sets mdio_protocol = MDIO_I2C_NONE.  By the time sfp_sm_probe_for_phy()
> runs (up to ~17 s later on affected hardware), the bridge is fully
> initialized but PHY probing is skipped because the protocol has already
> been changed to NONE.
>
> This affects both modules inserted before boot and hotplugged modules on
> hardware where bridge initialization exceeds the 200 ms probe window
> (confirmed: FLYPRO SFP-10GT-CS-30M with Aquantia AQR113C, hotplugged).
>
> Move the probe from i2c_mii_init_rollball(), called at bus-creation time,
> to sfp_sm_probe_for_phy() in sfp.c, where it runs after the SFP state
> machine module initialization delays.  Export the probe function as
> mdio_i2c_probe_rollball() so sfp.c can call it.
>
> For RTL8261BE-based modules the probe correctly returns -ENODEV at PHY
> discovery time, causing sfp_sm_probe_for_phy() to destroy the MDIO bus
> and set MDIO_I2C_NONE, eliminating the 5+ minute PHY probe retry loop.
>
> For genuine RollBall modules (e.g. FLYPRO SFP-10GT-CS-30M with Aquantia
> AQR113C) the probe now runs after initialization is complete and
> correctly returns 0, so PHY detection proceeds normally.
The FLYPRO SFP module still fails to detect the PHY. It is necessary to
increase `module_t_wait` to 20 seconds. Most likely, during this time
the module loads the PHY firmware from SPI memory or from the
microcontroller (RollBall bridge) via MDIO. The same probably applies to
most SFP modules with a PHY that loads firmware at start-up (AQR113,
RTL8261C etc.).
>
> Reported-by: Aleksander Bajkowski <olek2@wp.pl>
> Fixes: 8fe125892f40 ("net: phy: sfp: probe for RollBall I2C-to-MDIO bridge in mdio-i2c")
> Signed-off-by: Petr Wozniak <petr.wozniak@gmail.com>
> ---
>   drivers/net/mdio/mdio-i2c.c   | 15 +++++++++------
>   drivers/net/phy/sfp.c         | 22 ++++++++++++++--------
>   include/linux/mdio/mdio-i2c.h |  1 +
>   3 files changed, 24 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/net/mdio/mdio-i2c.c b/drivers/net/mdio/mdio-i2c.c
> index b88f63234b4e..2a3a418c1369 100644
> --- a/drivers/net/mdio/mdio-i2c.c
> +++ b/drivers/net/mdio/mdio-i2c.c
> @@ -419,7 +419,7 @@ static int i2c_mii_write_rollball(struct mii_bus *bus, int phy_id, int devad,
>   	return 0;
>   }
>   
> -static int i2c_mii_probe_rollball(struct i2c_adapter *i2c)
> +int mdio_i2c_probe_rollball(struct i2c_adapter *i2c)
>   {
>   	u8 data_buf[] = { ROLLBALL_DATA_ADDR, 0x01, 0x00, 0x00 };
>   	u8 cmd_buf[]  = { ROLLBALL_CMD_ADDR, ROLLBALL_CMD_READ };
> @@ -462,9 +462,13 @@ static int i2c_mii_probe_rollball(struct i2c_adapter *i2c)
>   
>   	return -ENODEV;
>   }
> +EXPORT_SYMBOL_GPL(mdio_i2c_probe_rollball);
>   
>   static int i2c_mii_init_rollball(struct i2c_adapter *i2c)
>   {
> +	/* Send the RollBall unlock password; bridge presence is verified
> +	 * later, in sfp_sm_probe_for_phy(), after module initialization.
> +	 */
>   	struct i2c_msg msg;
>   	u8 pw[5];
>   	int ret;
> @@ -486,7 +490,7 @@ static int i2c_mii_init_rollball(struct i2c_adapter *i2c)
>   	if (ret != 1)
>   		return -EIO;
>   
> -	return i2c_mii_probe_rollball(i2c);
> +	return 0;
>   }
>   
>   static bool mdio_i2c_check_functionality(struct i2c_adapter *i2c,
> @@ -531,10 +535,9 @@ struct mii_bus *mdio_i2c_alloc(struct device *parent, struct i2c_adapter *i2c,
>   	case MDIO_I2C_ROLLBALL:
>   		ret = i2c_mii_init_rollball(i2c);
>   		if (ret < 0) {
> -			if (ret != -ENODEV)
> -				dev_err(parent,
> -					"Cannot initialize RollBall MDIO I2C protocol: %d\n",
> -					ret);
> +			dev_err(parent,
> +				"Cannot initialize RollBall MDIO I2C protocol: %d\n",
> +				ret);
>   			mdiobus_free(mii);
>   			return ERR_PTR(ret);
>   		}
> diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
> index c4d274ab651e..01b941a38eed 100644
> --- a/drivers/net/phy/sfp.c
> +++ b/drivers/net/phy/sfp.c
> @@ -2174,17 +2174,10 @@ static void sfp_sm_fault(struct sfp *sfp, unsigned int next_state, bool warn)
>   
>   static int sfp_sm_add_mdio_bus(struct sfp *sfp)
>   {
> -	int ret;
> -
>   	if (sfp->mdio_protocol == MDIO_I2C_NONE)
>   		return 0;
>   
> -	ret = sfp_i2c_mdiobus_create(sfp);
> -	if (ret == -ENODEV) {
> -		sfp->mdio_protocol = MDIO_I2C_NONE;
> -		return 0;
> -	}
> -	return ret;
> +	return sfp_i2c_mdiobus_create(sfp);
>   }
>   
>   /* Probe a SFP for a PHY device if the module supports copper - the PHY
> @@ -2215,6 +2208,19 @@ static int sfp_sm_probe_for_phy(struct sfp *sfp)
>   		break;
>   
>   	case MDIO_I2C_ROLLBALL:
> +		/* Probe here, after module initialization delays, so that
> +		 * genuine RollBall bridges have had time to start up.
> +		 * Modules without a bridge (e.g. RTL8261BE) return -ENODEV.
> +		 */
> +		err = mdio_i2c_probe_rollball(sfp->i2c);
> +		if (err == -ENODEV) {
> +			sfp_i2c_mdiobus_destroy(sfp);
> +			sfp->mdio_protocol = MDIO_I2C_NONE;
> +			err = 0;
> +			break;
> +		}
> +		if (err)
> +			break;
>   		err = sfp_sm_probe_phy(sfp, SFP_PHY_ADDR_ROLLBALL, true);
>   		break;
>   	}
> diff --git a/include/linux/mdio/mdio-i2c.h b/include/linux/mdio/mdio-i2c.h
> index 65b550a6fc32..5cf14f45c94b 100644
> --- a/include/linux/mdio/mdio-i2c.h
> +++ b/include/linux/mdio/mdio-i2c.h
> @@ -20,5 +20,6 @@ enum mdio_i2c_proto {
>   
>   struct mii_bus *mdio_i2c_alloc(struct device *parent, struct i2c_adapter *i2c,
>   			       enum mdio_i2c_proto protocol);
> +int mdio_i2c_probe_rollball(struct i2c_adapter *i2c);
>   
>   #endif

Best regards,
Aleksander



* Re: [PATCH bpf 1/2] bpf, sockmap: Don't leak UDP socks on lookup-bind-release
From: Kuniyuki Iwashima @ 2026-06-24 21:39 UTC
  To: Michal Luczaj
  Cc: Willem de Bruijn, Jakub Sitnicki, John Fastabend, Jiayuan Chen,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Alexei Starovoitov, Cong Wang, Daniel Borkmann,
	Andrii Nakryiko, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Martin KaFai Lau, Song Liu, Yonghong Song, Jiri Olsa,
	Emil Tsalapatis, Shuah Khan, netdev, bpf, linux-kernel,
	linux-kselftest
In-Reply-To: <CAAVpQUBHMrSiWqn0Zo_D7sUCLFdkKggduoijRdFtHx5CPiSpsg@mail.gmail.com>

On Wed, Jun 24, 2026 at 2:33 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>
> On Wed, Jun 24, 2026 at 2:26 PM Michal Luczaj <mhal@rbox.co> wrote:
> >
> > On 6/24/26 22:01, Willem de Bruijn wrote:
> > > Jakub Sitnicki wrote:
> > >> On Tue, Jun 23, 2026 at 08:03 PM +02, Michal Luczaj wrote:
> > >>> UDP sockets get SOCK_RCU_FREE set when (auto-)bound. This means
> > >>> sk_is_refcounted(unbound) = true, while sk_is_refcounted(bound) = false.
> > >>>
> > >>> Because sockmap accepts unbound UDP sockets, a BPF program can increment a
> > >>> socket's refcount via lookup. If the socket is subsequently bound, the
> > >>> transition from unbound to bound causes bpf_sk_release() to skip the
> > >>> decrement of the refcount, causing a memory leak.
> > >>>
> > >>> unreferenced object 0xffff88810bc2eb40 (size 1984):
> > >>>   comm "test_progs", pid 2451, jiffies 4295320596
> > >>>   hex dump (first 32 bytes):
> > >>>     7f 00 00 01 7f 00 00 01 d2 04 1b b7 04 d2 00 00  ................
> > >>>     02 00 01 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............
> > >>>   backtrace (crc bdee079d):
> > >>>     kmem_cache_alloc_noprof+0x557/0x660
> > >>>     sk_prot_alloc+0x69/0x240
> > >>>     sk_alloc+0x30/0x460
> > >>>     inet_create+0x2ce/0xf80
> > >>>     __sock_create+0x25b/0x5c0
> > >>>     __sys_socket+0x119/0x1d0
> > >>>     __x64_sys_socket+0x72/0xd0
> > >>>     do_syscall_64+0xa1/0x5f0
> > >>>     entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > >>>
> > >>> Maintain balanced refcounts across sk lookup/release: (re-)set
> > >>> SOCK_RCU_FREE on proto update to treat the socket (whether bound or
> > >>> unbound) as not requiring a refcount increment on (a RCU protected) lookup.
> > >>>
> > >>> Fixes: 0c48eefae712 ("sock_map: Lift socket state restriction for datagram sockets")
> > >>> Signed-off-by: Michal Luczaj <mhal@rbox.co>
> > >>> ---
> > >>> Note: this issue is related to commit 67312adc96b5 ("bpf: reject unhashed
> > >>> sockets in bpf_sk_assign").
> > >>> ---
> > >>>  net/ipv4/udp_bpf.c | 3 +++
> > >>>  1 file changed, 3 insertions(+)
> > >>>
> > >>> diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
> > >>> index ad57c4c9eaab..970327b59582 100644
> > >>> --- a/net/ipv4/udp_bpf.c
> > >>> +++ b/net/ipv4/udp_bpf.c
> > >>> @@ -173,6 +173,9 @@ int udp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
> > >>>     if (sk->sk_family == AF_INET6)
> > >>>             udp_bpf_check_v6_needs_rebuild(psock->sk_proto);
> > >>>
> > >>> +   /* Treat all sockets as non-refcounted, regardless of binding state. */
> > >>> +   sock_set_flag(sk, SOCK_RCU_FREE);
> > >>> +
> > >>>     sock_replace_proto(sk, &udp_bpf_prots[family]);
> > >>>     return 0;
> > >>>  }
> > >>
> > >> There is a side effect that an unhashed (unbound) UDP socket can now be
> > >> selected in sk_lookup with bpf_sk_assign.
> > >
> > > The commit does mention a related fix, beneath the ---, commit
> > > 67312adc96b5 ("bpf: reject unhashed sockets in bpf_sk_assign").
> > > That fixes a similar issue by exactly disallowing this:
> > >
> > >     Fix the problem by rejecting unhashed sockets in bpf_sk_assign().
> > >     This matches the behaviour of __inet_lookup_skb which is ultimately
> > >     the goal of bpf_sk_assign().
> > >
> > > So ..
> > >
> > >> Though perhaps that's for the
> > >> better because TC bpf_sk_assign doesn't reject non-refcounted UDP
> > >> sockets either, so we would have both socket dispatch sites behave the
> > >> same way.
> > >
> > > .. there are two conflicting types of consistency here? Consistent with
> > > __inet_lookup_skb or the TC bpf hook. Of those the first is the more
> > > canonical.
> > >
> > >> Also, with this patch, if we insert & remove an unhashed UDP socket
> > >> into/from a sockmap, we end up with an unhashed non-refcounted UDP
> > >> socket. Not entirely sure if that is actually a problem or not.
> > >>
> > >> Willem, what is your take on having unhashed non-refcounted UDP sockets?
> > >
> > > I don't immediately see a problem, but I'm not an expert on SOCK_RCU_FREE.
> >
> > Perhaps it's worth mentioning that unhashed non-refcounted UDP socket is
> > already possible: first auto-bind via connect(AF_INET) (which also sets
> > SOCK_RCU_FREE), then unhash via connect(AF_UNSPEC).
>
> Setting SOCK_RCU_FREE itself should not cause a problem, but I think
> we should take a step back.
>
> AFAIU, 0c48eefae712 was to allow putting AF_UNIX SOCK_DGRAM sockets
> into sockmap, not to allow using unconnected UDP sockets in sk_lookup etc.
>
> Actually, v4 of the patch was implemented as such but did not get any feedback,
> https://lore.kernel.org/bpf/20210508220835.53801-9-xiyou.wangcong@gmail.com/#t
>
> ... and v5 (the final commit) somehow removed the restriction for unconnected
> UDP socket as well.
> https://lore.kernel.org/bpf/20210704190252.11866-3-xiyou.wangcong@gmail.com/
>
> Given that the initial use case, sockmap redirect, is still blocked by the
> TCP_ESTABLISHED check in sock_map_redirect_allowed(), I feel there is no
> point in supporting unconnected UDP sockets in sockmap.  It cannot get any
> skb from anywhere (without buggy sk_lookup).

s/unconnected/unhashed/g :)


* Re: [PATCH bpf 1/2] bpf, sockmap: Don't leak UDP socks on lookup-bind-release
From: Kuniyuki Iwashima @ 2026-06-24 21:33 UTC
  To: Michal Luczaj
  Cc: Willem de Bruijn, Jakub Sitnicki, John Fastabend, Jiayuan Chen,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Alexei Starovoitov, Cong Wang, Daniel Borkmann,
	Andrii Nakryiko, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Martin KaFai Lau, Song Liu, Yonghong Song, Jiri Olsa,
	Emil Tsalapatis, Shuah Khan, netdev, bpf, linux-kernel,
	linux-kselftest
In-Reply-To: <dd065bfb-52ce-48fd-b1ef-9c6166f714ed@rbox.co>

On Wed, Jun 24, 2026 at 2:26 PM Michal Luczaj <mhal@rbox.co> wrote:
>
> On 6/24/26 22:01, Willem de Bruijn wrote:
> > Jakub Sitnicki wrote:
> >> On Tue, Jun 23, 2026 at 08:03 PM +02, Michal Luczaj wrote:
> >>> UDP sockets get SOCK_RCU_FREE set when (auto-)bound. This means
> >>> sk_is_refcounted(unbound) = true, while sk_is_refcounted(bound) = false.
> >>>
> >>> Because sockmap accepts unbound UDP sockets, a BPF program can increment a
> >>> socket's refcount via lookup. If the socket is subsequently bound, the
> >>> transition from unbound to bound causes bpf_sk_release() to skip the
> >>> decrement of the refcount, causing a memory leak.
> >>>
> >>> unreferenced object 0xffff88810bc2eb40 (size 1984):
> >>>   comm "test_progs", pid 2451, jiffies 4295320596
> >>>   hex dump (first 32 bytes):
> >>>     7f 00 00 01 7f 00 00 01 d2 04 1b b7 04 d2 00 00  ................
> >>>     02 00 01 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............
> >>>   backtrace (crc bdee079d):
> >>>     kmem_cache_alloc_noprof+0x557/0x660
> >>>     sk_prot_alloc+0x69/0x240
> >>>     sk_alloc+0x30/0x460
> >>>     inet_create+0x2ce/0xf80
> >>>     __sock_create+0x25b/0x5c0
> >>>     __sys_socket+0x119/0x1d0
> >>>     __x64_sys_socket+0x72/0xd0
> >>>     do_syscall_64+0xa1/0x5f0
> >>>     entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >>>
> >>> Maintain balanced refcounts across sk lookup/release: (re-)set
> >>> SOCK_RCU_FREE on proto update to treat the socket (whether bound or
> >>> unbound) as not requiring a refcount increment on (a RCU protected) lookup.
> >>>
> >>> Fixes: 0c48eefae712 ("sock_map: Lift socket state restriction for datagram sockets")
> >>> Signed-off-by: Michal Luczaj <mhal@rbox.co>
> >>> ---
> >>> Note: this issue is related to commit 67312adc96b5 ("bpf: reject unhashed
> >>> sockets in bpf_sk_assign").
> >>> ---
> >>>  net/ipv4/udp_bpf.c | 3 +++
> >>>  1 file changed, 3 insertions(+)
> >>>
> >>> diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
> >>> index ad57c4c9eaab..970327b59582 100644
> >>> --- a/net/ipv4/udp_bpf.c
> >>> +++ b/net/ipv4/udp_bpf.c
> >>> @@ -173,6 +173,9 @@ int udp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
> >>>     if (sk->sk_family == AF_INET6)
> >>>             udp_bpf_check_v6_needs_rebuild(psock->sk_proto);
> >>>
> >>> +   /* Treat all sockets as non-refcounted, regardless of binding state. */
> >>> +   sock_set_flag(sk, SOCK_RCU_FREE);
> >>> +
> >>>     sock_replace_proto(sk, &udp_bpf_prots[family]);
> >>>     return 0;
> >>>  }
> >>
> >> There is a side effect that an unhashed (unbound) UDP socket can now be
> >> selected in sk_lookup with bpf_sk_assign.
> >
> > The commit does mention a related fix, beneath the ---, commit
> > 67312adc96b5 ("bpf: reject unhashed sockets in bpf_sk_assign").
> > That fixes a similar issue by exactly disallowing this:
> >
> >     Fix the problem by rejecting unhashed sockets in bpf_sk_assign().
> >     This matches the behaviour of __inet_lookup_skb which is ultimately
> >     the goal of bpf_sk_assign().
> >
> > So ..
> >
> >> Though perhaps that's for the
> >> better because TC bpf_sk_assign doesn't reject non-refcounted UDP
> >> sockets either, so we would have both socket dispatch sites behave the
> >> same way.
> >
> > .. there are two conflicting types of consistency here? Consistent with
> > __inet_lookup_skb or the TC bpf hook. Of those the first is the more
> > canonical.
> >
> >> Also, with this patch, if we insert & remove an unhashed UDP socket
> >> into/from a sockmap, we end up with an unhashed non-refcounted UDP
> >> socket. Not entirely sure if that is actually a problem or not.
> >>
> >> Willem, what is your take on having unhashed non-refcounted UDP sockets?
> >
> > I don't immediately see a problem, but I'm not an expert on SOCK_RCU_FREE.
>
> Perhaps it's worth mentioning that unhashed non-refcounted UDP socket is
> already possible: first auto-bind via connect(AF_INET) (which also sets
> SOCK_RCU_FREE), then unhash via connect(AF_UNSPEC).

Setting SOCK_RCU_FREE itself should not cause a problem, but I think
we should take a step back.

AFAIU, 0c48eefae712 was to allow putting AF_UNIX SOCK_DGRAM sockets
into sockmap, not to allow using unconnected UDP sockets in sk_lookup etc.

Actually, v4 of the patch was implemented as such but did not get any feedback,
https://lore.kernel.org/bpf/20210508220835.53801-9-xiyou.wangcong@gmail.com/#t

... and v5 (the final commit) somehow removed the restriction for unconnected
UDP socket as well.
https://lore.kernel.org/bpf/20210704190252.11866-3-xiyou.wangcong@gmail.com/

Given that the initial use case, sockmap redirect, is still blocked by the
TCP_ESTABLISHED check in sock_map_redirect_allowed(), I feel there is no
point in supporting unconnected UDP sockets in sockmap.  It cannot get any
skb from anywhere (without buggy sk_lookup).


* Re: [PATCH bpf 1/2] bpf, sockmap: Don't leak UDP socks on lookup-bind-release
From: Michal Luczaj @ 2026-06-24 21:25 UTC
  To: Willem de Bruijn, Jakub Sitnicki
  Cc: John Fastabend, Jiayuan Chen, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Alexei Starovoitov,
	Cong Wang, Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Martin KaFai Lau, Song Liu,
	Yonghong Song, Jiri Olsa, Emil Tsalapatis, Shuah Khan, netdev,
	bpf, linux-kernel, linux-kselftest, kuniyu
In-Reply-To: <willemdebruijn.kernel.24d11e11d5dc0@gmail.com>

On 6/24/26 22:01, Willem de Bruijn wrote:
> Jakub Sitnicki wrote:
>> On Tue, Jun 23, 2026 at 08:03 PM +02, Michal Luczaj wrote:
>>> UDP sockets get SOCK_RCU_FREE set when (auto-)bound. This means
>>> sk_is_refcounted(unbound) = true, while sk_is_refcounted(bound) = false.
>>>
>>> Because sockmap accepts unbound UDP sockets, a BPF program can increment a
>>> socket's refcount via lookup. If the socket is subsequently bound, the
>>> transition from unbound to bound causes bpf_sk_release() to skip the
>>> decrement of the refcount, causing a memory leak.
>>>
>>> unreferenced object 0xffff88810bc2eb40 (size 1984):
>>>   comm "test_progs", pid 2451, jiffies 4295320596
>>>   hex dump (first 32 bytes):
>>>     7f 00 00 01 7f 00 00 01 d2 04 1b b7 04 d2 00 00  ................
>>>     02 00 01 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............
>>>   backtrace (crc bdee079d):
>>>     kmem_cache_alloc_noprof+0x557/0x660
>>>     sk_prot_alloc+0x69/0x240
>>>     sk_alloc+0x30/0x460
>>>     inet_create+0x2ce/0xf80
>>>     __sock_create+0x25b/0x5c0
>>>     __sys_socket+0x119/0x1d0
>>>     __x64_sys_socket+0x72/0xd0
>>>     do_syscall_64+0xa1/0x5f0
>>>     entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>
>>> Maintain balanced refcounts across sk lookup/release: (re-)set
>>> SOCK_RCU_FREE on proto update to treat the socket (whether bound or
>>> unbound) as not requiring a refcount increment on (a RCU protected) lookup.
>>>
>>> Fixes: 0c48eefae712 ("sock_map: Lift socket state restriction for datagram sockets")
>>> Signed-off-by: Michal Luczaj <mhal@rbox.co>
>>> ---
>>> Note: this issue is related to commit 67312adc96b5 ("bpf: reject unhashed
>>> sockets in bpf_sk_assign").
>>> ---
>>>  net/ipv4/udp_bpf.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
>>> index ad57c4c9eaab..970327b59582 100644
>>> --- a/net/ipv4/udp_bpf.c
>>> +++ b/net/ipv4/udp_bpf.c
>>> @@ -173,6 +173,9 @@ int udp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
>>>  	if (sk->sk_family == AF_INET6)
>>>  		udp_bpf_check_v6_needs_rebuild(psock->sk_proto);
>>>  
>>> +	/* Treat all sockets as non-refcounted, regardless of binding state. */
>>> +	sock_set_flag(sk, SOCK_RCU_FREE);
>>> +
>>>  	sock_replace_proto(sk, &udp_bpf_prots[family]);
>>>  	return 0;
>>>  }
>>
>> There is a side effect that an unhashed (unbound) UDP socket can now be
>> selected in sk_lookup with bpf_sk_assign.
> 
> The commit does mention a related fix, beneath the ---, commit
> 67312adc96b5 ("bpf: reject unhashed sockets in bpf_sk_assign").
> That fixes a similar issue by exactly disallowing this:
> 
>     Fix the problem by rejecting unhashed sockets in bpf_sk_assign().
>     This matches the behaviour of __inet_lookup_skb which is ultimately
>     the goal of bpf_sk_assign().
> 
> So ..
> 
>> Though perhaps that's for the
>> better because TC bpf_sk_assign doesn't reject non-refcounted UDP
>> sockets either, so we would have both socket dispatch sites behave the
>> same way.
> 
> .. there are two conflicting types of consistency here? Consistent with
> __inet_lookup_skb or the TC bpf hook. Of those the first is the more
> canonical.
> 
>> Also, with this patch, if we insert & remove an unhashed UDP socket
>> into/from a sockmap, we end up with an unhashed non-refcounted UDP
>> socket. Not entirely sure if that is actually a problem or not.
>>
>> Willem, what is your take on having unhashed non-refcounted UDP sockets?
> 
> I don't immediately see a problem, but I'm not an expert on SOCK_RCU_FREE.

Perhaps it's worth mentioning that unhashed non-refcounted UDP socket is
already possible: first auto-bind via connect(AF_INET) (which also sets
SOCK_RCU_FREE), then unhash via connect(AF_UNSPEC).
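
For reference, the lookup-bind-release imbalance from the quoted changelog
can be modelled in userspace roughly as below. This is only a toy sketch:
the toy_* names are hypothetical, the flag stands in for SOCK_RCU_FREE, and
none of this is the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy socket: a reference count plus a flag modelling SOCK_RCU_FREE. */
struct toy_sock {
	int refcnt;
	bool rcu_free;
};

/* A socket counts as refcounted only while the flag is clear. */
bool toy_is_refcounted(const struct toy_sock *sk)
{
	return !sk->rcu_free;
}

/* Lookup takes a reference only for refcounted sockets. */
void toy_lookup(struct toy_sock *sk)
{
	if (toy_is_refcounted(sk))
		sk->refcnt++;
}

/* Release drops a reference only for refcounted sockets. */
void toy_release(struct toy_sock *sk)
{
	if (toy_is_refcounted(sk))
		sk->refcnt--;
}

/* Binding sets the flag, flipping the result of toy_is_refcounted(). */
void toy_bind(struct toy_sock *sk)
{
	sk->rcu_free = true;
}

/*
 * Run lookup -> bind -> release on an initially unbound socket and
 * return the number of references left over (0 means balanced).
 */
int toy_lookup_bind_release(bool set_flag_on_map_update)
{
	struct toy_sock sk = { .refcnt = 1, .rcu_free = false };

	/* Models the proposed fix: set the flag when entering the map. */
	if (set_flag_on_map_update)
		sk.rcu_free = true;

	toy_lookup(&sk);
	toy_bind(&sk);
	toy_release(&sk);

	return sk.refcnt - 1;
}
```

In the model, the sequence leaves one stray reference when the flag is only
set by bind, and none when the flag is already set at map update time.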


* Re: [PATCH bpf 2/2] selftests/bpf: Add test for UDP sock leak on sockmap lookup-bind-release
From: Michal Luczaj @ 2026-06-24 21:24 UTC
  To: bot+bpf-ci, john.fastabend, jakub, jiayuan.chen, davem, edumazet,
	kuba, pabeni, horms, ast, cong.wang, daniel, andrii, eddyz87,
	memxor, martin.lau, song, yonghong.song, jolsa, emil, shuah
  Cc: netdev, bpf, linux-kernel, linux-kselftest, martin.lau, clm,
	ihor.solodrai
In-Reply-To: <bd4abd3f584c38f1fc512a9d3ae1f90c2a374588fa23529b32b7f1aed6ab2ead@mail.kernel.org>

On 6/23/26 21:32, bot+bpf-ci@kernel.org wrote:
>> selftests/bpf: Add test for UDP sock leak on sockmap lookup-bind-release
>>
>>     Setup and join a cgroup, then attach a cgroup/connect4 program that runs
>>
>>        sk = bpf_map_lookup_elem(sockmap, 0)
>>        bpf_bind(ctx, sa, sizeof(sa))
>>        bpf_sk_release(sk)
>>
>>     Unpatched kernel leaks the socket.
>>
>>     Signed-off-by: Michal Luczaj <mhal@rbox.co>
> 
> This test reproduces a UDP socket leak across sockmap lookup-bind-release,
> but there is no Fixes: tag.
> 
> The sibling fix commit a2510dc351c5 ("bpf, sockmap: Don't leak UDP socks on
> lookup-bind-release") carries a Fixes: tag pointing at the commit that lifted
> the socket-state restriction allowing unbound UDP sockets into sockmap.
> 
> Should this test carry the same tag so it stays backportable alongside the
> fix?
> 
>   Fixes: 0c48eefae712 ("sock_map: Lift socket state restriction for datagram sockets")

I am skipping this suggestion as the test itself does not address any bugs
and requires no backporting.

Michal


* Re: [PATCH nf-next 1/4] netfilter: nf_conntrack_sane: replace u_int16_t with u16
From: Florian Westphal @ 2026-06-24 21:00 UTC
  To: Carlos Grillet
  Cc: Pablo Neira Ayuso, Phil Sutter, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, netfilter-devel,
	coreteam, netdev, linux-kernel
In-Reply-To: <20260624184036.71051-2-carlos@carlosgrillet.me>

Carlos Grillet <carlos@carlosgrillet.me> wrote:
> Use preferred kernel integer type u16 instead of the POSIX u_int16_t
> variant.
> 
> No functional change.
> 
> Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
> ---
>  net/netfilter/nf_conntrack_sane.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/netfilter/nf_conntrack_sane.c b/net/netfilter/nf_conntrack_sane.c
> index 39085acf7a71..130b3e68090e 100644
> --- a/net/netfilter/nf_conntrack_sane.c
> +++ b/net/netfilter/nf_conntrack_sane.c
> @@ -35,7 +35,7 @@ MODULE_DESCRIPTION("SANE connection tracking helper");
>  MODULE_ALIAS_NFCT_HELPER(HELPER_NAME);
>  
>  #define MAX_PORTS 8
> -static u_int16_t ports[MAX_PORTS];
> +static u16 ports[MAX_PORTS];

These port variables are useless and will be removed soon.


* Re: [PATCH bpf-next v2] bpf, unix: Guard sk_msg-dependent code behind CONFIG_NET_SOCK_MSG
From: Alexei Starovoitov @ 2026-06-24 20:57 UTC
  To: Jiayuan Chen, Jakub Sitnicki
  Cc: Amery Hung, Kuniyuki Iwashima, bpf, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, John Fastabend,
	Network Development, kernel-team
In-Reply-To: <a50cef70-d8fe-4f42-a89b-2c63c33a72ef@linux.dev>

On Tue Jun 23, 2026 at 6:32 PM PDT, Jiayuan Chen wrote:
>
> Hi Alexei and Jakub,
>
> skmsg is actually still pretty useful for gateways.
> I started with bpf by integrating skmsg into nginx as a module and envoy
> has something similar.
> The usual setup is cgroup/sk for L4 bypass (reject SYN), and skmsg for L7,
> redirecting between local apps by looking at the payload. So there are
> real users.

...

> Agree, just like we remove skmsg from KTLS which is rarely used.

...

> Hope not have skmsg disabled by default.

I wasn't suggesting to delete the whole skmsg,
but to disable combinations that are causing issues.
Like what was done for skmsg and ktls.
I'd allow plain tcp and udp sockets only.
Allowing unix sockets was fishy. I think we should reject it too.


* [PATCH net v2] net: pse-pd: scope pse_control regulator handle to kref lifetime
From: Carlo Szelinsky @ 2026-06-24 20:40 UTC
  To: Oleksij Rempel, Kory Maincent, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Corey Leavitt, Heiner Kallweit, Russell King,
	netdev, linux-kernel, Carlo Szelinsky

From: Corey Leavitt <corey@leavitt.info>

__pse_control_release() drops psec->ps via devm_regulator_put(), which
only succeeds if the devres entry added by the matching
devm_regulator_get_exclusive() is still present on pcdev->dev at the
time the pse_control's kref hits zero.

That assumption does not hold when the controller is unbound while a
pse_control still has consumers: pcdev->dev's devres list is released
LIFO, so every per-attach regulator-GET devres runs (and
regulator_put()s the underlying regulator) before
pse_controller_unregister() itself is invoked. Any later
pse_control_put() from that unbind path then reads psec->ps as a
dangling pointer inside devm_regulator_put() and WARNs at
drivers/regulator/devres.c:232 (devres_release() fails to find the
already-released match).

The pse_control's consumer handle is logically scoped to the
pse_control's refcount, not to pcdev->dev's devres lifetime. Switch to
the plain regulator_get_exclusive() / regulator_put() pair so the
regulator put in __pse_control_release() no longer depends on the
controller's devres still being present. No change to the
regulator-framework-visible refcount or lifetime of the underlying
regulator: a single get paired with a single put. The existing
devm_regulator_register() for the per-PI rails is unchanged (those ARE
correctly scoped to the controller's lifetime).

This addresses only the regulator handle. The same unbind-while-held
scenario also leaves __pse_control_release() reading psec->pcdev->pi[]
and psec->pcdev->owner after pse_controller_unregister() has freed
pcdev->pi, because the controller does not drain its outstanding
pse_control references on unregister. That wider pse_control vs
pcdev lifetime problem pre-dates this change and is addressed by the
PSE controller notifier series, which drains phydev->psec on
PSE_UNREGISTERED before pcdev->pi is freed.

Link: https://lore.kernel.org/netdev/20260620112440.1734404-1-github@szelinsky.de/
Fixes: d83e13761d5b ("net: pse-pd: Use regulator framework within PSE framework")
Signed-off-by: Corey Leavitt <corey@leavitt.info>
Acked-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: Carlo Szelinsky <github@szelinsky.de>
---
This is patch 1 of the "decouple controller lookup from MDIO probe"
series, reposted on its own for net as Jakub suggested. The rest of the
series targets net-next and is deferred until it reopens.

Changes in v2:
- Reword the commit message to scope the fix to the regulator handle.
  As Simon's review pointed out, the same unbind-while-held path also
  reads pcdev->pi[] and pcdev->owner after pse_release_pis(); that wider
  pse_control vs pcdev lifetime issue is fixed by the notifier series,
  not here. No code change.
  Link: https://lore.kernel.org/netdev/20260624151251.1137250-1-horms@kernel.org/
v1: https://lore.kernel.org/netdev/20260622192839.2508733-1-github@szelinsky.de/
---
 drivers/net/pse-pd/pse_core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/pse-pd/pse_core.c b/drivers/net/pse-pd/pse_core.c
index 69dbdbde9d71..a5e6d7b26b9f 100644
--- a/drivers/net/pse-pd/pse_core.c
+++ b/drivers/net/pse-pd/pse_core.c
@@ -1367,7 +1367,7 @@ static void __pse_control_release(struct kref *kref)
 
 	if (psec->pcdev->pi[psec->id].admin_state_enabled)
 		regulator_disable(psec->ps);
-	devm_regulator_put(psec->ps);
+	regulator_put(psec->ps);
 
 	module_put(psec->pcdev->owner);
 
@@ -1436,8 +1436,8 @@ pse_control_get_internal(struct pse_controller_dev *pcdev, unsigned int index,
 		goto free_psec;
 
 	pcdev->pi[index].admin_state_enabled = ret;
-	psec->ps = devm_regulator_get_exclusive(pcdev->dev,
-						rdev_get_name(pcdev->pi[index].rdev));
+	psec->ps = regulator_get_exclusive(pcdev->dev,
+					   rdev_get_name(pcdev->pi[index].rdev));
 	if (IS_ERR(psec->ps)) {
 		ret = PTR_ERR(psec->ps);
 		goto put_module;

base-commit: d87363b0edfc7504ff2b144fe4cdd8154f90f42e
-- 
2.43.0
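
For illustration, here is a minimal userspace sketch of the lifetime mismatch described in the changelog above. It is not kernel code: all demo_* names are hypothetical, and the "owner" list only imitates a LIFO release of managed resources. It shows why a handle registered on the owner's list is already gone when a consumer that still holds it tries to put it later, while a handle taken with a plain get/put pair stays valid until the consumer drops it.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_RES 8

/* Hypothetical stand-in for a consumer handle (e.g. a regulator). */
struct demo_handle {
	bool live;
};

/* Hypothetical stand-in for a device that owns managed resources. */
struct demo_owner {
	struct demo_handle *res[DEMO_MAX_RES];
	size_t nres;
};

/* Managed get: the handle is tied to the owner's release list. */
static bool demo_managed_get(struct demo_owner *o, struct demo_handle *h)
{
	if (o->nres == DEMO_MAX_RES)
		return false;
	h->live = true;
	o->res[o->nres++] = h;
	return true;
}

/* Owner unbind: every managed handle is released, newest first. */
static void demo_owner_release_all(struct demo_owner *o)
{
	while (o->nres)
		o->res[--o->nres]->live = false;
}

/* Plain get/put: the lifetime belongs to whoever holds the handle. */
static void demo_plain_get(struct demo_handle *h)
{
	h->live = true;
}

static void demo_plain_put(struct demo_handle *h)
{
	h->live = false;
}
```

With the managed variant, demo_owner_release_all() invalidates the handle even though a consumer may still hold it. With the plain variant, the handle survives the owner's release and is only dropped by demo_plain_put().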



* Re: [PATCH net] net: pse-pd: scope pse_control regulator handle to kref lifetime
From: Carlo Szelinsky @ 2026-06-24 20:38 UTC (permalink / raw)
  To: Simon Horman
  Cc: Oleksij Rempel, Kory Maincent, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Corey Leavitt, netdev,
	linux-kernel, Carlo Szelinsky
In-Reply-To: <20260624151251.1137250-1-horms@kernel.org>

On Wed, 24 Jun 2026 16:12:51 +0100, Simon Horman wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
[...]
> [High]
> Does this fix the use-after-free completely, or only the regulator-put
> part of it?
[...]
> Would a more complete fix also need pse_controller_unregister() to
> drain outstanding pse_control references, or have pse_control hold a
> refcount on pcdev, so that psec cannot outlive pcdev->pi and pcdev?

Thanks, the review is correct. This patch only fixes the regulator
handle. In the same unbind-while-held case __pse_control_release()
also reads psec->pcdev->pi[] and psec->pcdev->owner after
pse_controller_unregister() has freed pcdev->pi, so those are still
use-after-free reads on their own.

That wider problem is exactly what you describe: the controller does
not drain its outstanding pse_control references on unregister. It is
fixed by draining them, which is what the PSE notifier series does --
PSE_UNREGISTERED drops every phydev->psec before pse_release_pis()
frees pcdev->pi. This patch is patch 1 of that series (by Corey
Leavitt); the rest targets net-next and is deferred until it reopens:

  https://lore.kernel.org/netdev/20260620112440.1734404-1-github@szelinsky.de/

Jakub suggested sending this one to net on its own since it is a fix,
so it is here without the notifier patches. My v1 commit message
overclaimed by saying it makes __pse_control_release() correct
regardless of the controller's devres state, which is only true for
the regulator handle. I have reworded it in v2 to scope it to the
regulator put and to point at the series for the wider lifetime fix.

Do you agree? Another option would be to wait for the entire series.

cheers Carlo


* [PATCH net] eth: fbnic: fix race between concurrent hwmon sensor reads
From: Zinc Lim @ 2026-06-24 20:05 UTC (permalink / raw)
  To: Alexander Duyck, Jakub Kicinski, Andrew Lunn, David S. Miller,
	Eric Dumazet, Paolo Abeni
  Cc: alexander.duyck, kernel-team, netdev, linux-kernel, zinclim,
	Zinc Lim

From: Zinc Lim <limzhineng2@gmail.com>

Reading an hwmon sensor issues a TSENE firmware mailbox transaction that
uses a shared completion slot. Concurrent reads (e.g. parallel
"cat .../temp1_input" or a monitoring agent polling all attributes) race
over that slot, and the second transmit fails because a completion is
already pending:

  fbnic 0000:41:00.0: Failed to transmit TSENE read msg, err -17

Serialize the hwmon read path with a per-device mutex so only one TSENE
transaction is in flight at a time.

Fixes: 880630734102 ("eth: fbnic: Add hardware monitoring support via HWMON interface")
Signed-off-by: Zinc Lim <limzhineng2@gmail.com>
---
 drivers/net/ethernet/meta/fbnic/fbnic.h       |  2 ++
 drivers/net/ethernet/meta/fbnic/fbnic_hwmon.c | 15 +++++++++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/meta/fbnic/fbnic.h b/drivers/net/ethernet/meta/fbnic/fbnic.h
index d0715695c43e..e31d6f88b746 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic.h
@@ -6,6 +6,7 @@
 
 #include <linux/interrupt.h>
 #include <linux/io.h>
+#include <linux/mutex.h>
 #include <linux/ptp_clock_kernel.h>
 #include <linux/types.h>
 #include <linux/workqueue.h>
@@ -27,6 +28,7 @@ struct fbnic_dev {
 	struct net_device *netdev;
 	struct dentry *dbg_fbd;
 	struct device *hwmon;
+	struct mutex hwmon_mutex; /* Serializes hwmon sensor reads */
 	struct devlink_health_reporter *fw_reporter;
 	struct devlink_health_reporter *otp_reporter;
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_hwmon.c b/drivers/net/ethernet/meta/fbnic/fbnic_hwmon.c
index def8598aceec..ac1b4a422677 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_hwmon.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_hwmon.c
@@ -33,10 +33,17 @@ static int fbnic_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
 {
 	struct fbnic_dev *fbd = dev_get_drvdata(dev);
 	const struct fbnic_mac *mac = fbd->mac;
-	int id;
+	int id, err;
 
 	id = fbnic_hwmon_sensor_id(type);
-	return id < 0 ? id : mac->get_sensor(fbd, id, val);
+	if (id < 0)
+		return id;
+
+	mutex_lock(&fbd->hwmon_mutex);
+	err = mac->get_sensor(fbd, id, val);
+	mutex_unlock(&fbd->hwmon_mutex);
+
+	return err;
 }
 
 static const struct hwmon_ops fbnic_hwmon_ops = {
@@ -60,6 +67,8 @@ void fbnic_hwmon_register(struct fbnic_dev *fbd)
 	if (!IS_REACHABLE(CONFIG_HWMON))
 		return;
 
+	mutex_init(&fbd->hwmon_mutex);
+
 	fbd->hwmon = hwmon_device_register_with_info(fbd->dev, "fbnic",
 						     fbd, &fbnic_chip_info,
 						     NULL);
@@ -68,6 +77,7 @@ void fbnic_hwmon_register(struct fbnic_dev *fbd)
 			   "Failed to register hwmon device %pe\n",
 			   fbd->hwmon);
 		fbd->hwmon = NULL;
+		mutex_destroy(&fbd->hwmon_mutex);
 	}
 }
 
@@ -78,4 +88,5 @@ void fbnic_hwmon_unregister(struct fbnic_dev *fbd)
 
 	hwmon_device_unregister(fbd->hwmon);
 	fbd->hwmon = NULL;
+	mutex_destroy(&fbd->hwmon_mutex);
 }
-- 
2.53.0-Meta
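
For illustration, a minimal userspace sketch of the pattern this patch applies, using pthreads instead of the kernel mutex API. The thread does not show the fbnic mailbox internals, so the single "completion slot" below and every demo_* name are assumptions. The sketch only shows why a second claim of an occupied slot fails with -EEXIST (which is -17 on Linux) and how holding a lock around the whole transaction avoids that.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical device state; not the real struct fbnic_dev. */
struct demo_dev {
	pthread_mutex_t sensor_lock;	/* serializes sensor reads */
	bool cmpl_pending;		/* single shared completion slot */
	long sensor_value;		/* stand-in for the firmware reply */
};

/* Claim the shared slot; fails if a transaction is already pending. */
static int demo_slot_claim(struct demo_dev *d)
{
	if (d->cmpl_pending)
		return -EEXIST;
	d->cmpl_pending = true;
	return 0;
}

static void demo_slot_release(struct demo_dev *d)
{
	d->cmpl_pending = false;
}

/* One full transaction under the lock: claim, read, release. */
static int demo_read_sensor(struct demo_dev *d, long *val)
{
	int err;

	pthread_mutex_lock(&d->sensor_lock);
	err = demo_slot_claim(d);
	if (!err) {
		*val = d->sensor_value;
		demo_slot_release(d);
	}
	pthread_mutex_unlock(&d->sensor_lock);

	return err;
}
```

Because the slot is claimed and released while the lock is held, callers going through demo_read_sensor() never observe a pending slot left by another reader.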



* Re: [PATCH bpf 1/2] bpf, sockmap: Don't leak UDP socks on lookup-bind-release
From: Willem de Bruijn @ 2026-06-24 20:01 UTC (permalink / raw)
  To: Jakub Sitnicki, Michal Luczaj, Willem de Bruijn
  Cc: John Fastabend, Jiayuan Chen, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Alexei Starovoitov,
	Cong Wang, Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Martin KaFai Lau, Song Liu,
	Yonghong Song, Jiri Olsa, Emil Tsalapatis, Shuah Khan, netdev,
	bpf, linux-kernel, linux-kselftest, kuniyu
In-Reply-To: <87wlvoxdq1.fsf@cloudflare.com>

Jakub Sitnicki wrote:
> On Tue, Jun 23, 2026 at 08:03 PM +02, Michal Luczaj wrote:
> > UDP sockets get SOCK_RCU_FREE set when (auto-)bound. This means
> > sk_is_refcounted(unbound) = true, while sk_is_refcounted(bound) = false.
> >
> > Because sockmap accepts unbound UDP sockets, a BPF program can increment a
> > socket's refcount via lookup. If the socket is subsequently bound, the
> > transition from unbound to bound causes bpf_sk_release() to skip the
> > decrement of the refcount, causing a memory leak.
> >
> > unreferenced object 0xffff88810bc2eb40 (size 1984):
> >   comm "test_progs", pid 2451, jiffies 4295320596
> >   hex dump (first 32 bytes):
> >     7f 00 00 01 7f 00 00 01 d2 04 1b b7 04 d2 00 00  ................
> >     02 00 01 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............
> >   backtrace (crc bdee079d):
> >     kmem_cache_alloc_noprof+0x557/0x660
> >     sk_prot_alloc+0x69/0x240
> >     sk_alloc+0x30/0x460
> >     inet_create+0x2ce/0xf80
> >     __sock_create+0x25b/0x5c0
> >     __sys_socket+0x119/0x1d0
> >     __x64_sys_socket+0x72/0xd0
> >     do_syscall_64+0xa1/0x5f0
> >     entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > Maintain balanced refcounts across sk lookup/release: (re-)set
> > SOCK_RCU_FREE on proto update to treat the socket (whether bound or
> > unbound) as not requiring a refcount increment on (a RCU protected) lookup.
> >
> > Fixes: 0c48eefae712 ("sock_map: Lift socket state restriction for datagram sockets")
> > Signed-off-by: Michal Luczaj <mhal@rbox.co>
> > ---
> > Note: this issue is related to commit 67312adc96b5 ("bpf: reject unhashed
> > sockets in bpf_sk_assign").
> > ---
> >  net/ipv4/udp_bpf.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
> > index ad57c4c9eaab..970327b59582 100644
> > --- a/net/ipv4/udp_bpf.c
> > +++ b/net/ipv4/udp_bpf.c
> > @@ -173,6 +173,9 @@ int udp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
> >  	if (sk->sk_family == AF_INET6)
> >  		udp_bpf_check_v6_needs_rebuild(psock->sk_proto);
> >  
> > +	/* Treat all sockets as non-refcounted, regardless of binding state. */
> > +	sock_set_flag(sk, SOCK_RCU_FREE);
> > +
> >  	sock_replace_proto(sk, &udp_bpf_prots[family]);
> >  	return 0;
> >  }
> 
> There is a side effect that an unhashed (unbound) UDP socket can now be
> selected in sk_lookup with bpf_sk_assign.

The commit does mention a related fix, beneath the ---, commit
67312adc96b5 ("bpf: reject unhashed sockets in bpf_sk_assign").
That fixes a similar issue by exactly disallowing this:

    Fix the problem by rejecting unhashed sockets in bpf_sk_assign().
    This matches the behaviour of __inet_lookup_skb which is ultimately
    the goal of bpf_sk_assign().

So ..

> Though perhaps that's for the
> better because TC bpf_sk_assign doesn't reject non-refcounted UDP
> sockets either, so we would have both socket dispatch sites behave the
> same way.

.. there are two conflicting types of consistency here? Consistent with
__inet_lookup_skb or the TC bpf hook. Of those the first is the more
canonical.

> Also, with this patch, if we insert & remove an unhashed UDP socket
> into/from a sockmap, we end up with an unhashed non-refcounted UDP
> socket. Not entirely sure if that is actually a problem or not.
> 
> Willem, what is your take on having unhashed non-refcounted UDP sockets?

I don't immediately see a problem, but I'm not an expert on SOCK_RCU_FREE.
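
For illustration, a minimal userspace model of the imbalance described in the quoted changelog. It is a simplification: the real sk_is_refcounted() also covers non-full sockets, and every demo_* name here is hypothetical. The point is only that lookup and release consult the same flag, so flipping the flag between the two calls leaves the count unbalanced.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified socket model. */
struct demo_sock {
	int refcnt;
	bool rcu_free;	/* stands in for SOCK_RCU_FREE */
};

static bool demo_is_refcounted(const struct demo_sock *sk)
{
	return !sk->rcu_free;
}

/* Lookup takes a reference only for refcounted sockets. */
static void demo_lookup(struct demo_sock *sk)
{
	if (demo_is_refcounted(sk))
		sk->refcnt++;
}

/* Release drops a reference only for refcounted sockets. */
static void demo_release(struct demo_sock *sk)
{
	if (demo_is_refcounted(sk))
		sk->refcnt--;
}

/* Binding sets the flag, as described for UDP in the changelog. */
static void demo_bind(struct demo_sock *sk)
{
	sk->rcu_free = true;
}
```

If the flag is already set before the lookup, which is what the patch arranges at proto update time, lookup and release both skip the count and it stays balanced.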



* Re: [PATCH net] net: enetc: fix potential divide-by-zero when num_vsi is zero
From: Maxime Chevallier @ 2026-06-24 19:54 UTC (permalink / raw)
  To: wei.fang, claudiu.manoil, vladimir.oltean, xiaoning.wang,
	andrew+netdev, davem, edumazet, kuba, pabeni
  Cc: Frank.Li, wei.fang, imx, netdev, linux-kernel
In-Reply-To: <20260624072726.1238903-1-wei.fang@oss.nxp.com>

Hi,

On 6/24/26 09:27, wei.fang@oss.nxp.com wrote:
> From: Wei Fang <wei.fang@nxp.com>
> 
> For i.MX94 series, all the standalone ENETCs do not support SR-IOV, so
> pf->caps.num_vsi is zero. This leads to a divide-by-zero in
> enetc4_default_rings_allocation() when distributing rings among PF and
> VFs.
> 
> Division by zero is undefined behavior in C. On ARM64, the UDIV/SDIV
> instructions silently return zero rather than raising an exception, so
> the issue does not cause a visible crash. However, relying on this
> behavior is incorrect and poses a cross-platform compatibility risk.
> 
> Add an explicit check for num_vsi == 0 and return early after the PF's
> rings have been configured.
> 
> Fixes: 2d673b0e2f8d ("net: enetc: add standalone ENETC support for i.MX94")
> Signed-off-by: Wei Fang <wei.fang@nxp.com>
> ---
>  drivers/net/ethernet/freescale/enetc/enetc4_pf.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
> index 4e771f852358..437a15bbb47b 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
> +++ b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
> @@ -322,6 +322,9 @@ static void enetc4_default_rings_allocation(struct enetc_pf *pf)
>  	val = enetc4_psicfgr0_val_construct(false, num_tx_bdr, num_rx_bdr);
>  	enetc_port_wr(hw, ENETC4_PSICFGR0(0), val);
>  
> +	if (!pf->caps.num_vsi)
> +		return;
> +
>  	num_rx_bdr = pf->caps.num_rx_bdr - num_rx_bdr;
>  	rx_rem = num_rx_bdr % pf->caps.num_vsi;
>  	num_rx_bdr = num_rx_bdr / pf->caps.num_vsi;

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>

Maxime
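
For illustration, a minimal standalone sketch of the guard added by the quoted patch. The struct and function names are hypothetical and the ring accounting is simplified; it assumes pf_rings is never larger than total. It only shows the early return that keeps the division and modulo from running with a zero divisor.

```c
/* Hypothetical result of splitting the remaining rings among VSIs. */
struct demo_split {
	unsigned int per_vsi;
	unsigned int rem;
};

/*
 * Split the rings left over after the PF's share among num_vsi VSIs.
 * Returns 1 if a split was computed, 0 if there are no VSIs to serve.
 */
static int demo_split_rings(unsigned int total, unsigned int pf_rings,
			    unsigned int num_vsi, struct demo_split *out)
{
	unsigned int left;

	out->per_vsi = 0;
	out->rem = 0;

	/* No VSIs: nothing to distribute, and no division by zero. */
	if (!num_vsi)
		return 0;

	left = total - pf_rings;
	out->per_vsi = left / num_vsi;
	out->rem = left % num_vsi;
	return 1;
}
```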


* Re: [PATCH] vhost/vdpa: reject overflowing PA map page counts
From: Michael S. Tsirkin @ 2026-06-24 19:53 UTC (permalink / raw)
  To: Yousef Alhouseen
  Cc: Jason Wang, Eugenio Pérez, kvm, virtualization, netdev,
	linux-kernel
In-Reply-To: <20260624190653.2893-1-alhouseenyousef@gmail.com>

On Wed, Jun 24, 2026 at 09:06:53PM +0200, Yousef Alhouseen wrote:
> vhost_vdpa_pa_map() adds the IOVA page offset to the user-controlled map
> size before computing the number of pages to pin. If that addition wraps,
> the code can pin and map fewer pages than the requested IOTLB range.
> 
> Reject sizes that overflow the page-count calculation.

You should add "on 32 bit systems" - I do not see how it can
overflow on 64 bit.

> Also make the
> memlock check subtraction-based so a large page count cannot wrap the
> pinned page total.

I don't see how this can happen at all - pinned_vm is in units of pages.

> Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
> ---
>  drivers/vhost/vdpa.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index ac55275fa..090cb8693 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -1102,6 +1102,8 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
>  	unsigned int gup_flags = FOLL_LONGTERM;
>  	unsigned long npages, cur_base, map_pfn, last_pfn = 0;
>  	unsigned long lock_limit, sz2pin, nchunks, i;
> +	unsigned long page_offset;
> +	u64 pinned_vm;
>  	u64 start = iova;
>  	long pinned;
>  	int ret = 0;
> @@ -1114,7 +1116,12 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
>  	if (perm & VHOST_ACCESS_WO)
>  		gup_flags |= FOLL_WRITE;
>  
> -	npages = PFN_UP(size + (iova & ~PAGE_MASK));
> +	page_offset = iova & ~PAGE_MASK;
> +	if (size > ULONG_MAX - page_offset) {
> +		ret = -EINVAL;
> +		goto free;
> +	}
> +	npages = PFN_UP(size + page_offset);
>  	if (!npages) {
>  		ret = -EINVAL;
>  		goto free;
> @@ -1123,7 +1130,8 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
>  	mmap_read_lock(dev->mm);
>  
>  	lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK));
> -	if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
> +	pinned_vm = atomic64_read(&dev->mm->pinned_vm);
> +	if (npages > lock_limit || pinned_vm > lock_limit - npages) {
>  		ret = -ENOMEM;
>  		goto unlock;
>  	}
> -- 
> 2.54.0



* Re: [PATCH v2 net 0/2] tipc: syzbot related fixes
From: Xin Long @ 2026-06-24 19:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Kuniyuki Iwashima, Jon Maloy, tipc-discussion, netdev,
	eric.dumazet
In-Reply-To: <20260623173030.2925059-1-edumazet@google.com>

On Tue, Jun 23, 2026 at 1:30 PM Eric Dumazet <edumazet@google.com> wrote:
>
> First patch fixes a recent syzbot report.
>
> Second patch is inspired by numerous syzbot soft lockup
> reports with RTNL pressure.
>
> Eric Dumazet (2):
>   tipc: fix UAF in cleanup_bearer() due to premature dst_cache_destroy()
>   tipc: avoid busy looping in tipc_exit_net()
>
>  net/tipc/core.c      |  4 ++--
>  net/tipc/udp_media.c | 19 ++++++++++++++-----
>  2 files changed, 16 insertions(+), 7 deletions(-)
>
> --
> 2.55.0.rc0.799.gd6f94ed593-goog
>

Reviewed-by: Xin Long <lucien.xin@gmail.com>


* [PATCH] vhost/vdpa: reject overflowing PA map page counts
From: Yousef Alhouseen @ 2026-06-24 19:06 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang, Eugenio Pérez
  Cc: kvm, virtualization, netdev, linux-kernel, Yousef Alhouseen

vhost_vdpa_pa_map() adds the IOVA page offset to the user-controlled map
size before computing the number of pages to pin. If that addition wraps,
the code can pin and map fewer pages than the requested IOTLB range.

Reject sizes that overflow the page-count calculation. Also make the
memlock check subtraction-based so a large page count cannot wrap the
pinned page total.

Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
---
 drivers/vhost/vdpa.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index ac55275fa..090cb8693 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -1102,6 +1102,8 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
 	unsigned int gup_flags = FOLL_LONGTERM;
 	unsigned long npages, cur_base, map_pfn, last_pfn = 0;
 	unsigned long lock_limit, sz2pin, nchunks, i;
+	unsigned long page_offset;
+	u64 pinned_vm;
 	u64 start = iova;
 	long pinned;
 	int ret = 0;
@@ -1114,7 +1116,12 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
 	if (perm & VHOST_ACCESS_WO)
 		gup_flags |= FOLL_WRITE;
 
-	npages = PFN_UP(size + (iova & ~PAGE_MASK));
+	page_offset = iova & ~PAGE_MASK;
+	if (size > ULONG_MAX - page_offset) {
+		ret = -EINVAL;
+		goto free;
+	}
+	npages = PFN_UP(size + page_offset);
 	if (!npages) {
 		ret = -EINVAL;
 		goto free;
@@ -1123,7 +1130,8 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
 	mmap_read_lock(dev->mm);
 
 	lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK));
-	if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
+	pinned_vm = atomic64_read(&dev->mm->pinned_vm);
+	if (npages > lock_limit || pinned_vm > lock_limit - npages) {
 		ret = -ENOMEM;
 		goto unlock;
 	}
-- 
2.54.0
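
For illustration, a minimal userspace sketch of one way the page count could come out too small when the result type is narrower than the size. It assumes 4 KiB pages and uses uint32_t as a stand-in for a 32-bit unsigned long. All demo_* names are hypothetical and the code is not taken from vdpa.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for the example: 4 KiB pages. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (UINT64_C(1) << DEMO_PAGE_SHIFT)

/*
 * Unchecked: the sum is rounded up to pages in 64-bit arithmetic and
 * then truncated to the 32-bit result type.
 */
static uint32_t demo_npages_unchecked(uint64_t iova, uint64_t size)
{
	uint64_t offset = iova & (DEMO_PAGE_SIZE - 1);

	return (uint32_t)((size + offset + DEMO_PAGE_SIZE - 1) >>
			  DEMO_PAGE_SHIFT);
}

/*
 * Checked: reject any size for which size + offset does not fit in
 * the 32-bit type, mirroring the "size > ULONG_MAX - page_offset" test.
 */
static bool demo_npages_checked(uint64_t iova, uint64_t size,
				uint32_t *npages)
{
	uint32_t offset = (uint32_t)(iova & (DEMO_PAGE_SIZE - 1));

	if (size > UINT32_MAX - offset)
		return false;

	*npages = (uint32_t)((size + offset + DEMO_PAGE_SIZE - 1) >>
			     DEMO_PAGE_SHIFT);
	return true;
}
```

With a size of 2^44 + 4096 bytes, the unchecked variant reports a single page, while the checked variant rejects the request.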



* [PATCH net] net: ethtool: keep rtnl_lock for ops using ethtool_op_get_link()
From: Jakub Kicinski @ 2026-06-24 19:04 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
	Breno Leitao, joshwash, hramamurthy, anthony.l.nguyen,
	przemyslaw.kitszel, saeedm, tariqt, mbloch, leon, alexanderduyck,
	kernel-team, kys, haiyangz, wei.liu, decui, longli, jordanrhee,
	jacob.e.keller, nktgrg, debarghyak, mohsin.bashr, ernis, sdf, gal,
	linux-rdma, linux-hyperv

Breno reports the following splat on mlx5:

  RTNL: assertion failed at net/core/dev.c (2241)
  WARNING: net/core/dev.c:2241 at netif_state_change+0xed/0x130, CPU#5: ethtool/1335
  RIP: 0010:netif_state_change+0xf9/0x130
  Call Trace:
    <TASK>
     __linkwatch_sync_dev+0xea/0x120
     ethtool_op_get_link+0xe/0x20
     __ethtool_get_link+0x26/0x40
     linkstate_prepare_data+0x51/0x200
     ethnl_default_doit+0x213/0x470
     genl_family_rcv_msg_doit+0xdd/0x110

Looks like I missed ethtool_op_get_link() trying to sync linkwatch,
which needs rtnl_lock. Not all drivers do this - bnxt doesn't,
it just returns the link state, so add an opt-in bit.

Reported-by: Breno Leitao <leitao@debian.org>
Fixes: 45079e00133e ("net: ethtool: optionally skip rtnl_lock on Netlink path for GET ops")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: joshwash@google.com
CC: hramamurthy@google.com
CC: anthony.l.nguyen@intel.com
CC: przemyslaw.kitszel@intel.com
CC: saeedm@nvidia.com
CC: tariqt@nvidia.com
CC: mbloch@nvidia.com
CC: leon@kernel.org
CC: alexanderduyck@fb.com
CC: kernel-team@meta.com
CC: kys@microsoft.com
CC: haiyangz@microsoft.com
CC: wei.liu@kernel.org
CC: decui@microsoft.com
CC: longli@microsoft.com
CC: jordanrhee@google.com
CC: jacob.e.keller@intel.com
CC: nktgrg@google.com
CC: debarghyak@google.com
CC: leitao@debian.org
CC: mohsin.bashr@gmail.com
CC: ernis@linux.microsoft.com
CC: sdf@fomichev.me
CC: gal@nvidia.com
CC: linux-rdma@vger.kernel.org
CC: linux-hyperv@vger.kernel.org
---
 include/linux/ethtool.h                                 | 2 ++
 net/ethtool/common.h                                    | 4 ++++
 drivers/net/ethernet/google/gve/gve_ethtool.c           | 3 ++-
 drivers/net/ethernet/intel/iavf/iavf_ethtool.c          | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c    | 3 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c        | 3 ++-
 drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c | 4 +++-
 drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c         | 3 ++-
 drivers/net/ethernet/microsoft/mana/mana_ethtool.c      | 3 ++-
 9 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 1b834e2a522e..5d491a98265e 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -942,6 +942,7 @@ struct kernel_ethtool_ts_info {
 #define ETHTOOL_OP_NEEDS_RTNL_GPAUSEPARAM	BIT(5)
 #define ETHTOOL_OP_NEEDS_RTNL_SPAUSEPARAM	BIT(6)
 #define ETHTOOL_OP_NEEDS_RTNL_RSS		BIT(7)
+#define ETHTOOL_OP_NEEDS_RTNL_GLINK		BIT(8)
 
 /**
  * struct ethtool_ops - optional netdev operations
@@ -978,6 +979,7 @@ struct kernel_ethtool_ts_info {
  *	 - phylink helpers (note that phydev is currently unsupported!)
  *	 - netdev_update_features()
  *	 - netif_set_real_num_tx_queues()
+ *	 - ethtool_op_get_link() (syncs link watch under rtnl_lock)
  *
  * @get_drvinfo: Report driver/device information. Modern drivers no
  *	longer have to implement this callback. Most fields are
diff --git a/net/ethtool/common.h b/net/ethtool/common.h
index 2b3847f00801..4e5356e26f40 100644
--- a/net/ethtool/common.h
+++ b/net/ethtool/common.h
@@ -113,6 +113,8 @@ ethtool_nl_msg_needs_rtnl(const struct net_device *dev, u8 cmd)
 		return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_SPAUSEPARAM;
 	case ETHTOOL_MSG_RSS_SET:
 		return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_RSS;
+	case ETHTOOL_MSG_LINKSTATE_GET:
+		return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_GLINK;
 	case ETHTOOL_MSG_TSCONFIG_GET:
 	case ETHTOOL_MSG_TSCONFIG_SET:
 		/* tsconfig calls ndos (ndo_hwtstamp_set/get), not ethtool ops.
@@ -159,6 +161,8 @@ ethtool_ioctl_needs_rtnl(const struct net_device *dev, u32 ethcmd)
 	case ETHTOOL_SRXFH:
 	case ETHTOOL_SRXFHINDIR:
 		return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_RSS;
+	case ETHTOOL_GLINK:
+		return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_GLINK;
 	}
 	return false;
 }
diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
index 7cc22916852f..8199738ba979 100644
--- a/drivers/net/ethernet/google/gve/gve_ethtool.c
+++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
@@ -984,7 +984,8 @@ const struct ethtool_ops gve_ethtool_ops = {
 	.supported_ring_params = ETHTOOL_RING_USE_TCP_DATA_SPLIT |
 				 ETHTOOL_RING_USE_RX_BUF_LEN,
 	.op_needs_rtnl = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
-			 ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
+			 ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
+			 ETHTOOL_OP_NEEDS_RTNL_GLINK,
 	.get_drvinfo = gve_get_drvinfo,
 	.get_strings = gve_get_strings,
 	.get_sset_count = gve_get_sset_count,
diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
index a615d599b88e..e7cf12eaa268 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
@@ -1855,6 +1855,7 @@ static const struct ethtool_ops iavf_ethtool_ops = {
 	.supported_coalesce_params = ETHTOOL_COALESCE_USECS |
 				     ETHTOOL_COALESCE_USE_ADAPTIVE,
 	.supported_input_xfrm	= RXH_XFRM_SYM_XOR,
+	.op_needs_rtnl		= ETHTOOL_OP_NEEDS_RTNL_GLINK,
 	.get_drvinfo		= iavf_get_drvinfo,
 	.get_link		= ethtool_op_get_link,
 	.get_ringparam		= iavf_get_ringparam,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 2f5b626ba33f..112926d07634 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -2721,7 +2721,8 @@ const struct ethtool_ops mlx5e_ethtool_ops = {
 	.rxfh_max_num_contexts	= MLX5E_MAX_NUM_RSS,
 	.op_needs_rtnl		= ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
 				  ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
-				  ETHTOOL_OP_NEEDS_RTNL_SPFLAGS,
+				  ETHTOOL_OP_NEEDS_RTNL_SPFLAGS |
+				  ETHTOOL_OP_NEEDS_RTNL_GLINK,
 	.supported_coalesce_params = ETHTOOL_COALESCE_USECS |
 				     ETHTOOL_COALESCE_MAX_FRAMES |
 				     ETHTOOL_COALESCE_USE_ADAPTIVE |
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 1a8a19f980d3..c8b76d301c92 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -419,7 +419,8 @@ static const struct ethtool_ops mlx5e_rep_ethtool_ops = {
 				     ETHTOOL_COALESCE_MAX_FRAMES |
 				     ETHTOOL_COALESCE_USE_ADAPTIVE,
 	.op_needs_rtnl	   = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
-			     ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
+			     ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
+			     ETHTOOL_OP_NEEDS_RTNL_GLINK,
 	.get_drvinfo	   = mlx5e_rep_get_drvinfo,
 	.get_link	   = ethtool_op_get_link,
 	.get_strings       = mlx5e_rep_get_strings,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
index 9b3b32408c64..01ddc3def9ac 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
@@ -286,7 +286,8 @@ const struct ethtool_ops mlx5i_ethtool_ops = {
 				     ETHTOOL_COALESCE_MAX_FRAMES |
 				     ETHTOOL_COALESCE_USE_ADAPTIVE,
 	.op_needs_rtnl	    = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
-			      ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
+			      ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
+			      ETHTOOL_OP_NEEDS_RTNL_GLINK,
 	.get_drvinfo        = mlx5i_get_drvinfo,
 	.get_strings        = mlx5i_get_strings,
 	.get_sset_count     = mlx5i_get_sset_count,
@@ -309,6 +310,7 @@ const struct ethtool_ops mlx5i_ethtool_ops = {
 };
 
 const struct ethtool_ops mlx5i_pkey_ethtool_ops = {
+	.op_needs_rtnl	    = ETHTOOL_OP_NEEDS_RTNL_GLINK,
 	.get_drvinfo        = mlx5i_get_drvinfo,
 	.get_link           = ethtool_op_get_link,
 	.get_ts_info        = mlx5i_get_ts_info,
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
index cb34fc166ef9..0e47088ec44b 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
@@ -2024,7 +2024,8 @@ static const struct ethtool_ops fbnic_ethtool_ops = {
 					  ETHTOOL_OP_NEEDS_RTNL_GPAUSEPARAM |
 					  ETHTOOL_OP_NEEDS_RTNL_SPAUSEPARAM |
 					  ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
-					  ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
+					  ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
+					  ETHTOOL_OP_NEEDS_RTNL_GLINK,
 	.get_drvinfo			= fbnic_get_drvinfo,
 	.get_regs_len			= fbnic_get_regs_len,
 	.get_regs			= fbnic_get_regs,
diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
index 94e658d07a27..881df597d7f9 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
@@ -597,7 +597,8 @@ static int mana_get_link_ksettings(struct net_device *ndev,
 const struct ethtool_ops mana_ethtool_ops = {
 	.supported_coalesce_params = ETHTOOL_COALESCE_RX_CQE_FRAMES,
 	.op_needs_rtnl		= ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
-				  ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
+				  ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
+				  ETHTOOL_OP_NEEDS_RTNL_GLINK,
 	.get_ethtool_stats	= mana_get_ethtool_stats,
 	.get_sset_count		= mana_get_sset_count,
 	.get_strings		= mana_get_strings,
-- 
2.54.0
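
For illustration, a minimal standalone sketch of the opt-in bit pattern used above. The bit positions 7 and 8 match the defines in the patch; the demo_* names, the command enum and the ops struct are hypothetical simplifications of the real ethtool structures.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BIT(n)		(UINT32_C(1) << (n))
#define DEMO_NEEDS_RTNL_RSS	DEMO_BIT(7)
#define DEMO_NEEDS_RTNL_GLINK	DEMO_BIT(8)

/* Hypothetical subset of commands. */
enum demo_cmd {
	DEMO_CMD_RSS_SET,
	DEMO_CMD_LINKSTATE_GET,
	DEMO_CMD_OTHER,
};

/* Hypothetical stand-in for the driver's ops table. */
struct demo_ops {
	uint32_t op_needs_rtnl;
};

/* A command takes the lock only if the driver opted in for it. */
static bool demo_cmd_needs_rtnl(const struct demo_ops *ops,
				enum demo_cmd cmd)
{
	switch (cmd) {
	case DEMO_CMD_RSS_SET:
		return ops->op_needs_rtnl & DEMO_NEEDS_RTNL_RSS;
	case DEMO_CMD_LINKSTATE_GET:
		return ops->op_needs_rtnl & DEMO_NEEDS_RTNL_GLINK;
	default:
		return false;
	}
}
```

A driver whose get_link callback syncs linkwatch would set the GLINK bit. A driver that only returns the cached link state leaves it clear and keeps the lockless path.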



* Re: [PATCH v2 net 1/2] tipc: fix UAF in cleanup_bearer() due to premature dst_cache_destroy()
From: Eric Dumazet @ 2026-06-24 18:49 UTC (permalink / raw)
  To: Xin Long
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Kuniyuki Iwashima, Jon Maloy, tipc-discussion, netdev,
	eric.dumazet, syzbot+e14bc5d4942756023b77
In-Reply-To: <CADvbK_cnZmZkzCUxGEi=uaBug3VcfUd4MiAzQp1OGUsnvau=xA@mail.gmail.com>

On Wed, Jun 24, 2026 at 11:37 AM Xin Long <lucien.xin@gmail.com> wrote:
>
> > diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
> > index 988b8a7f953ad..66f3cb87a0aaa 100644
> > --- a/net/tipc/udp_media.c
> > +++ b/net/tipc/udp_media.c
> > @@ -803,6 +803,14 @@ static int tipc_udp_enable(struct net *net, struct tipc_bearer *b,
> > return err;
> > }
> >
> > +static void rcast_free_rcu(struct rcu_head *rcu)
> > +{
> > + struct udp_replicast *rcast = container_of(rcu, struct udp_replicast, rcu);
> > +
> > + dst_cache_destroy(&rcast->dst_cache);
> > + kfree(rcast);
> > +}
> > +
> Since this adds a module-specific callback rcast_free_rcu registered with RCU
> via call_rcu_hurry(), is an rcu_barrier() needed in the TIPC module exit
> function?

There is one already; this was part of my feedback on this patch:

commit 1579342d71133da7f00daa02c75cebec7372097b
Author: Weiming Shi <bestswngs@gmail.com>
Date:   Wed Jun 17 21:57:45 2026 +0800

    tipc: fix use-after-free of the discoverer in tipc_disc_rcv()



> If the module is unloaded, the RCU grace period might expire after the module
> memory is freed.
> net/tipc/core.c:tipc_exit() {
> tipc_netlink_compat_stop();
> ...
> pr_info("Deactivated\n");
> }
> Could this result in a kernel panic when RCU attempts to execute the unloaded
> rcast_free_rcu function?
>
> This sashiko report looks legit.
>
> I think synchronize_net() doesn't guarantee that rcast_free_rcu() has run.


* [PATCH nf-next 0/4] netfilter: replace u_int*_t with kernel int types (batch 2)
From: Carlos Grillet @ 2026-06-24 18:40 UTC (permalink / raw)
  To: Simon Horman, Julian Anastasov, David Ahern, Ido Schimmel,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netdev, lvs-devel, linux-kernel, netfilter-devel, coreteam

This patch series replaces POSIX u_int8_t/u_int16_t with the preferred
kernel types u8/u16 across several netfilter files and updates the
corresponding header definitions.

This continues the work started in:
https://lore.kernel.org/all/20260616182948.96865-1-carlos@carlosgrillet.me

No functional changes.

Carlos Grillet (4):
  netfilter: nf_conntrack_sane: replace u_int16_t with u16
  netfilter: nf_conntrack_h323_main: replace u_int8_t with u8
  netfilter: nf_conntrack_amanda: replace u_int16_t with u16
  netfilter: ip_vs_nfct: replace u_int8_t with u8

 include/net/ip_vs.h                    | 2 +-
 net/netfilter/ipvs/ip_vs_nfct.c        | 2 +-
 net/netfilter/nf_conntrack_amanda.c    | 2 +-
 net/netfilter/nf_conntrack_h323_main.c | 2 +-
 net/netfilter/nf_conntrack_sane.c      | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

-- 
2.54.0



* [PATCH nf-next 1/4] netfilter: nf_conntrack_sane: replace u_int16_t with u16
From: Carlos Grillet @ 2026-06-24 18:40 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman
  Cc: netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20260624184036.71051-1-carlos@carlosgrillet.me>

Use preferred kernel integer type u16 instead of the POSIX u_int16_t
variant.

No functional change.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 net/netfilter/nf_conntrack_sane.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_sane.c b/net/netfilter/nf_conntrack_sane.c
index 39085acf7a71..130b3e68090e 100644
--- a/net/netfilter/nf_conntrack_sane.c
+++ b/net/netfilter/nf_conntrack_sane.c
@@ -35,7 +35,7 @@ MODULE_DESCRIPTION("SANE connection tracking helper");
 MODULE_ALIAS_NFCT_HELPER(HELPER_NAME);
 
 #define MAX_PORTS 8
-static u_int16_t ports[MAX_PORTS];
+static u16 ports[MAX_PORTS];
 static unsigned int ports_c;
 module_param_array(ports, ushort, &ports_c, 0400);
 
-- 
2.54.0


^ permalink raw reply related

* [PATCH nf-next 3/4] netfilter: nf_conntrack_amanda: replace u_int16_t with u16
From: Carlos Grillet @ 2026-06-24 18:40 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman
  Cc: netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20260624184036.71051-1-carlos@carlosgrillet.me>

Use preferred kernel integer type u16 instead of the POSIX u_int16_t
variant.

No functional change.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 net/netfilter/nf_conntrack_amanda.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_amanda.c b/net/netfilter/nf_conntrack_amanda.c
index ddafbdfc96dc..f10ac2c49f4b 100644
--- a/net/netfilter/nf_conntrack_amanda.c
+++ b/net/netfilter/nf_conntrack_amanda.c
@@ -89,7 +89,7 @@ static int amanda_help(struct sk_buff *skb,
 	struct nf_conntrack_tuple *tuple;
 	unsigned int dataoff, start, stop, off, i;
 	char pbuf[sizeof("65535")], *tmp;
-	u_int16_t len;
+	u16 len;
 	__be16 port;
 	int ret = NF_ACCEPT;
 	nf_nat_amanda_hook_fn *nf_nat_amanda;
-- 
2.54.0


^ permalink raw reply related

* [PATCH nf-next 2/4] netfilter: nf_conntrack_h323_main: replace u_int8_t with u8
From: Carlos Grillet @ 2026-06-24 18:40 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman
  Cc: netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20260624184036.71051-1-carlos@carlosgrillet.me>

Use preferred kernel integer type u8 instead of the POSIX u_int8_t
variant.

No functional change.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 net/netfilter/nf_conntrack_h323_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_h323_main.c b/net/netfilter/nf_conntrack_h323_main.c
index 7f189dceb3c4..68ecaf0daf95 100644
--- a/net/netfilter/nf_conntrack_h323_main.c
+++ b/net/netfilter/nf_conntrack_h323_main.c
@@ -671,7 +671,7 @@ static int expect_h245(struct sk_buff *skb, struct nf_conn *ct,
 static int callforward_do_filter(struct net *net,
 				 const union nf_inet_addr *src,
 				 const union nf_inet_addr *dst,
-				 u_int8_t family)
+				 u8 family)
 {
 	int ret = 0;
 
-- 
2.54.0


^ permalink raw reply related

* [PATCH nf-next 4/4] netfilter: ip_vs_nfct: replace u_int8_t with u8
From: Carlos Grillet @ 2026-06-24 18:40 UTC (permalink / raw)
  To: David Ahern, Ido Schimmel, Simon Horman, Julian Anastasov,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netdev, lvs-devel, linux-kernel, netfilter-devel, coreteam
In-Reply-To: <20260624184036.71051-1-carlos@carlosgrillet.me>

Use preferred kernel integer type u8 instead of the POSIX u_int8_t
variant and update header to match definition.

No functional change.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 include/net/ip_vs.h             | 2 +-
 net/netfilter/ipvs/ip_vs_nfct.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 49297fec448a..ed2e9bc1bb4e 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -2123,7 +2123,7 @@ void ip_vs_update_conntrack(struct sk_buff *skb, struct ip_vs_conn *cp,
 			    int outin);
 int ip_vs_confirm_conntrack(struct sk_buff *skb);
 void ip_vs_nfct_expect_related(struct sk_buff *skb, struct nf_conn *ct,
-			       struct ip_vs_conn *cp, u_int8_t proto,
+			       struct ip_vs_conn *cp, u8 proto,
 			       const __be16 port, int from_rs);
 void ip_vs_conn_drop_conntrack(struct ip_vs_conn *cp);
 
diff --git a/net/netfilter/ipvs/ip_vs_nfct.c b/net/netfilter/ipvs/ip_vs_nfct.c
index 81974f69e5bb..347185fd0c8c 100644
--- a/net/netfilter/ipvs/ip_vs_nfct.c
+++ b/net/netfilter/ipvs/ip_vs_nfct.c
@@ -208,7 +208,7 @@ static void ip_vs_nfct_expect_callback(struct nf_conn *ct,
  * Use port 0 to expect connection from any port.
  */
 void ip_vs_nfct_expect_related(struct sk_buff *skb, struct nf_conn *ct,
-			       struct ip_vs_conn *cp, u_int8_t proto,
+			       struct ip_vs_conn *cp, u8 proto,
 			       const __be16 port, int from_rs)
 {
 	struct nf_conntrack_expect *exp;
-- 
2.54.0


^ permalink raw reply related

* Re: [PATCH v2 net 1/2] tipc: fix UAF in cleanup_bearer() due to premature dst_cache_destroy()
From: Xin Long @ 2026-06-24 18:37 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Kuniyuki Iwashima, Jon Maloy, tipc-discussion, netdev,
	eric.dumazet, syzbot+e14bc5d4942756023b77
In-Reply-To: <20260623173030.2925059-2-edumazet@google.com>

> diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
> index 988b8a7f953ad..66f3cb87a0aaa 100644
> --- a/net/tipc/udp_media.c
> +++ b/net/tipc/udp_media.c
> @@ -803,6 +803,14 @@ static int tipc_udp_enable(struct net *net, struct tipc_bearer *b,
>  	return err;
>  }
>
> +static void rcast_free_rcu(struct rcu_head *rcu)
> +{
> +	struct udp_replicast *rcast = container_of(rcu, struct udp_replicast, rcu);
> +
> +	dst_cache_destroy(&rcast->dst_cache);
> +	kfree(rcast);
> +}
> +
Since this adds a module-specific callback rcast_free_rcu registered with RCU
via call_rcu_hurry(), is an rcu_barrier() needed in the TIPC module exit
function?

If the module is unloaded, the RCU grace period might expire after the module
memory is freed.

net/tipc/core.c:tipc_exit() {
	tipc_netlink_compat_stop();
	...
	pr_info("Deactivated\n");
}

Could this result in a kernel panic when RCU attempts to execute the unloaded
rcast_free_rcu function?

This sashiko report looks legit.

I think synchronize_net() doesn't guarantee rcast_free_rcu() to be done.

^ permalink raw reply

* Re: [PATCH v2] netdevsim: fix use-after-free in nsim_create and __nsim_dev_port_del
From: Simon Horman @ 2026-06-24 18:35 UTC (permalink / raw)
  To: Hrushiraj Gandhi
  Cc: Jakub Kicinski, Andrew Lunn, David S . Miller, Eric Dumazet,
	Paolo Abeni, Jiri Pirko, netdev, linux-kernel, bpf,
	syzbot+6c25f4750230faf70be9
In-Reply-To: <20260623144447.255326-1-hrushirajg23@gmail.com>

On Tue, Jun 23, 2026 at 08:14:47PM +0530, Hrushiraj Gandhi wrote:
> debugfs files created under a port's ddir (ethtool/get_err,
> ethtool/set_err, ring params, bpf_offloaded_id, udp_ports/inject_error,
> etc.) store raw pointers directly into the netdevsim struct, which lives
> in the net_device private data kmalloc slab.
> 
> If these files outlive the netdevsim struct, a concurrent reader can
> trigger a slab-use-after-free by passing debugfs_file_get() (which only
> checks dentry lifetime) and then dereferencing the freed data pointer
> in debugfs_u32_get().
> 
> In __nsim_dev_port_del(), nsim_destroy() is called before
> nsim_dev_port_debugfs_exit(). However, nsim_destroy() calls free_netdev()
> at its end, while nsim_dev_port_debugfs_exit() removes the port's
> debugfs directory. This means the slab is freed before the debugfs
> files are removed.
> 
> The same window exists on nsim_create()'s error path:
> nsim_ethtool_init() creates debugfs files under ddir with pointers into
> ns before nsim_init_netdevsim()/nsim_init_netdevsim_vf() which can fail,
> and the err_free_netdev label calls free_netdev() while those debugfs
> entries are still live.
> 
> Fix both paths by calling debugfs_remove_recursive() on the port's
> ddir before every free_netdev() call. The subsequent
> nsim_dev_port_debugfs_exit() calls become harmless no-ops since ddir is
> set to NULL.
> 
> Reported-by: syzbot+6c25f4750230faf70be9@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=6c25f4750230faf70be9
> Fixes: e05b2d141fef ("netdevsim: move netdev creation/destruction to dev probe")
> Signed-off-by: Hrushiraj Gandhi <hrushirajg23@gmail.com>
> ---
> v2:
> - Also fix the same use-after-free window on the error path of nsim_create() as suggested by Simon Horman.
> - Shorten the code comment in nsim_destroy() to be more concise.

Thanks for the updates.

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply

