Netdev List

* Re: [PATCH v2] net: wwan: t7xx: validate port_count against message length in t7xx_port_enum_msg_handler
From: Willy Tarreau @ 2026-04-14 16:23 UTC
  To: Pavitra Jha; +Cc: pabeni, chandrashekar.devegowda, linux-wwan, netdev, stable
In-Reply-To: <20260414153201.1633720-1-jhapavitra98@gmail.com>

Hello,

On Tue, Apr 14, 2026 at 11:31:56AM -0400, Pavitra Jha wrote:
> t7xx_port_enum_msg_handler() uses the modem-supplied port_count field as
> a loop bound over port_msg->data[] without checking that the message buffer
> contains sufficient data. A modem sending port_count=65535 in a 12-byte
> buffer triggers a slab-out-of-bounds read of up to 262140 bytes.
> 
> Add a struct_size() check after extracting port_count and before the loop.
> Pass msg_len to t7xx_port_enum_msg_handler() and use it to validate
> the message size before accessing port_msg->data[].
> Pass msg_len from both call sites: skb->len at the DPMAIF path after
> skb_pull(), and the captured rt_feature->data_len at the handshake path.
> 
> Fixes: 39d439047f1d ("net: wwan: t7xx: Add control DMA interface")
> Cc: stable@vger.kernel.org
> Reported-by: Pavitra Jha <jhapavitra98@gmail.com>
> Signed-off-by: Pavitra Jha <jhapavitra98@gmail.com>

Please note that you don't need the Reported-by tag when it's the same
as the Signed-off-by one.

Also, I'm noticing a few empty-line removals out of context below:

> diff --git a/drivers/net/wwan/t7xx/t7xx_modem_ops.c b/drivers/net/wwan/t7xx/t7xx_modem_ops.c
> index 7968e208d..d0559fe16 100644
> --- a/drivers/net/wwan/t7xx/t7xx_modem_ops.c
> +++ b/drivers/net/wwan/t7xx/t7xx_modem_ops.c
> @@ -453,25 +453,25 @@ static int t7xx_parse_host_rt_data(struct t7xx_fsm_ctl *ctl, struct t7xx_sys_inf
>  {
>  	enum mtk_feature_support_type ft_spt_st, ft_spt_cfg;
>  	struct mtk_runtime_feature *rt_feature;
> +	size_t feat_data_len;
>  	int i, offset;
>  
>  	offset = sizeof(struct feature_query);
>  	for (i = 0; i < FEATURE_COUNT && offset < data_length; i++) {
>  		rt_feature = data + offset;
> -		offset += sizeof(*rt_feature) + le32_to_cpu(rt_feature->data_len);
> -
> +		feat_data_len = le32_to_cpu(rt_feature->data_len);
> +		offset += sizeof(*rt_feature) + feat_data_len;
>  		ft_spt_cfg = FIELD_GET(FEATURE_MSK, core->feature_set[i]);
>  		if (ft_spt_cfg != MTK_FEATURE_MUST_BE_SUPPORTED)
>  			continue;
> -

Here.

>  		ft_spt_st = FIELD_GET(FEATURE_MSK, rt_feature->support_info);
>  		if (ft_spt_st != MTK_FEATURE_MUST_BE_SUPPORTED)
>  			return -EINVAL;
> -

Here, the original author probably left the line to highlight the return
statement.

> -		if (i == RT_ID_MD_PORT_ENUM || i == RT_ID_AP_PORT_ENUM)
> -			t7xx_port_enum_msg_handler(ctl->md, rt_feature->data);
> +		if (i == RT_ID_MD_PORT_ENUM || i == RT_ID_AP_PORT_ENUM) {
> +			t7xx_port_enum_msg_handler(ctl->md, rt_feature->data,
> +						   feat_data_len);
> +		}
>  	}
> -

Here, why?

>  	return 0;
>  }
>  
> diff --git a/drivers/net/wwan/t7xx/t7xx_port_ctrl_msg.c b/drivers/net/wwan/t7xx/t7xx_port_ctrl_msg.c
> index ae632ef96..d984a688d 100644
> --- a/drivers/net/wwan/t7xx/t7xx_port_ctrl_msg.c
> +++ b/drivers/net/wwan/t7xx/t7xx_port_ctrl_msg.c
> @@ -154,7 +161,6 @@ int t7xx_port_enum_msg_handler(struct t7xx_modem *md, void *msg)
>  
>  	return 0;
>  }
> -

This one as well.

>  static int control_msg_handler(struct t7xx_port *port, struct sk_buff *skb)
>  {
>  	const struct t7xx_port_conf *port_conf = port->port_conf;

It's better to leave them untouched: that keeps the code as readable as
it previously was and reduces the overall review effort.

thanks,
willy
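
For reference, the length check described in the commit message can be sketched in userspace C. This is only an illustration: the struct layout (a 12-byte header followed by 4-byte entries) is an assumption inferred from the "12-byte buffer" and "262140 bytes" figures above, and `port_enum_len_ok()` is a hypothetical helper, not the function added by the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 12-byte fixed header, then port_count 4-byte entries. */
struct port_enum_msg {
	uint32_t head_pattern;
	uint32_t info;
	uint32_t tail_pattern;
	uint32_t data[];
};

/*
 * Hypothetical helper: return true only if a buffer of msg_len bytes can
 * hold the header plus port_count entries. This is the same arithmetic
 * that struct_size() performs in the kernel.
 */
bool port_enum_len_ok(size_t msg_len, uint16_t port_count)
{
	size_t need = offsetof(struct port_enum_msg, data) +
		      (size_t)port_count * sizeof(uint32_t);

	return msg_len >= need;
}
```

With this check in place, a 12-byte message that claims 65535 ports is rejected before the loop ever touches `data[]`.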

* Re: [PATCH net] net: pse-pd: fix kernel-doc function name for pse_control_find_by_id()
From: Andrew Lunn @ 2026-04-14 16:44 UTC
  To: Kory Maincent
  Cc: Jakub Kicinski, netdev, linux-kernel, thomas.petazzoni,
	Oleksij Rempel, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni
In-Reply-To: <20260414150948.744618-1-kory.maincent@bootlin.com>

On Tue, Apr 14, 2026 at 05:09:47PM +0200, Kory Maincent wrote:
> The kernel-doc comment header incorrectly referenced the function
> name pse_control_find_net_by_id() instead of the actual function name
> pse_control_find_by_id(). Correct the function name in the documentation
> to match the implementation.
> 
> Fixes: fc0e6db30941a ("net: pse-pd: Add support for reporting events")
> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

* [syzbot ci] Re: veth: add Byte Queue Limits (BQL) support
From: syzbot ci @ 2026-04-14 17:05 UTC
  To: nogikh, hawk, linux-kernel, netdev, syzbot, syzkaller-bugs
  Cc: syzbot, syzkaller-bugs
In-Reply-To: <20260414083316.19864-1-nogikh@google.com>

syzbot ci has tested the suggested fix patch on top of the following series:

[v2] veth: add Byte Queue Limits (BQL) support
https://lore.kernel.org/all/20260413094442.1376022-1-hawk@kernel.org

Patch: https://ci.syzbot.org/jobs/bf27b11f-8196-4c99-bc10-11843ded8a31/patch

The patch testing request could not be completed:
tree "net-next" is no longer known

Full report is available here:
https://ci.syzbot.org/session/31e795da-e7ea-4f46-8412-c30d420a5f1e

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.

* Re: [PATCH net v5 1/2] flow_dissector: do not dissect PPPoE PFC frames
From: Simon Horman @ 2026-04-14 17:08 UTC
  To: Qingfang Deng
  Cc: linux-ppp, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Tony Nguyen, Guillaume Nault, Wojciech Drewek,
	netdev, linux-kernel, Paul Mackerras, Jaco Kroon, James Carlson,
	Marcin Szycik
In-Reply-To: <20260414021353.23471-1-qingfang.deng@linux.dev>

On Tue, Apr 14, 2026 at 10:13:48AM +0800, Qingfang Deng wrote:
> RFC 2516 Section 7 states that Protocol Field Compression (PFC) is NOT
> RECOMMENDED for PPPoE. In practice, pppd does not support negotiating
> PFC for PPPoE sessions, and the flow dissector driver has assumed an
> uncompressed frame until the blamed commit.
> 
> During the review process of that commit [1], support for PFC is
> suggested. However, having a compressed (1-byte) protocol field means
> the subsequent PPP payload is shifted by one byte, causing 4-byte
> misalignment for the network header and an unaligned access exception
> on some architectures.
> 
> The exception can be reproduced by sending a PPPoE PFC frame to an
> ethernet interface of a MIPS board, with RPS enabled, even if no PPPoE
> session is active on that interface:
> 
> $ 0   : 00000000 80c40000 00000000 85144817
> $ 4   : 00000008 00000100 80a75758 81dc9bb8
> $ 8   : 00000010 8087ae2c 0000003d 00000000
> $12   : 000000e0 00000039 00000000 00000000
> $16   : 85043240 80a75758 81dc9bb8 00006488
> $20   : 0000002f 00000007 85144810 80a70000
> $24   : 81d1bda0 00000000
> $28   : 81dc8000 81dc9aa8 00000000 805ead08
> Hi    : 00009d51
> Lo    : 2163358a
> epc   : 805e91f0 __skb_flow_dissect+0x1b0/0x1b50
> ra    : 805ead08 __skb_get_hash_net+0x74/0x12c
> Status: 11000403        KERNEL EXL IE
> Cause : 40800010 (ExcCode 04)
> BadVA : 85144817
> PrId  : 0001992f (MIPS 1004Kc)
> Call Trace:
> [<805e91f0>] __skb_flow_dissect+0x1b0/0x1b50
> [<805ead08>] __skb_get_hash_net+0x74/0x12c
> [<805ef330>] get_rps_cpu+0x1b8/0x3fc
> [<805fca70>] netif_receive_skb_list_internal+0x324/0x364
> [<805fd120>] napi_complete_done+0x68/0x2a4
> [<8058de5c>] mtk_napi_rx+0x228/0xfec
> [<805fd398>] __napi_poll+0x3c/0x1c4
> [<805fd754>] napi_threaded_poll_loop+0x234/0x29c
> [<805fd848>] napi_threaded_poll+0x8c/0xb0
> [<80053544>] kthread+0x104/0x12c
> [<80002bd8>] ret_from_kernel_thread+0x14/0x1c
> 
> Code: 02d51821  1060045b  00000000 <8c640000> 3084000f  2c820005  144001a2  00042080  8e220000
> 
> To reduce the attack surface and maintain performance, do not process
> PPPoE PFC frames.
> 
> [1] https://patch.msgid.link/20220630231016.GA392@debian.home
> Fixes: 46126db9c861 ("flow_dissector: Add PPPoE dissectors")
> Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
> ---
> Changes in v5: drop byte-swap change
>  Link to v4: https://lore.kernel.org/netdev/20260410033627.93786-1-qingfang.deng@linux.dev/
> 
>  net/core/flow_dissector.c | 10 +---------
>  1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 1b61bb25ba0e..f9aaba554128 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -1374,16 +1374,8 @@ bool __skb_flow_dissect(const struct net *net,
>  			break;
>  		}
>  
> -		/* least significant bit of the most significant octet
> -		 * indicates if protocol field was compressed
> -		 */
>  		ppp_proto = ntohs(hdr->proto);
> -		if (ppp_proto & 0x0100) {
> -			ppp_proto = ppp_proto >> 8;
> -			nhoff += PPPOE_SES_HLEN - 1;
> -		} else {
> -			nhoff += PPPOE_SES_HLEN;
> -		}

I think it would be good to add a comment around here
describing how PFC is safely handled in this function.

> +		nhoff += PPPOE_SES_HLEN;
>  
>  		if (ppp_proto == PPP_IP) {
>  			proto = htons(ETH_P_IP);
> -- 
> 2.43.0
> 
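
To illustrate why a PFC frame is not mistaken for IP once the special case is removed, here is a small userspace sketch. It assumes the 16-bit value has already been converted to host byte order, and it applies the RFC 1661 rule for protocol numbers (low octet odd, high octet even). The function names are hypothetical and are not taken from the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

#define PPP_IP 0x0021 /* PPP protocol number for IPv4 */

/*
 * With PFC the first octet on the wire is already the (odd) protocol
 * number, so the least significant bit of the most significant octet of
 * the 16-bit read is set.
 */
bool ppp_proto_looks_compressed(uint16_t proto)
{
	return (proto & 0x0100) != 0;
}

/* RFC 1661: low octet must be odd, high octet must be even. */
bool ppp_proto_is_wellformed(uint16_t proto)
{
	return (proto & 0x0101) == 0x0001;
}
```

For example, a compressed IPv4 frame starts with 0x21 followed by the IP version/IHL byte 0x45, so the 16-bit read yields 0x2145. That value neither equals PPP_IP nor passes the well-formedness test, so code that only accepts uncompressed protocol numbers never reaches the misaligned network header.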

* Re: [PATCH] net: mdio: octeon: use %p for bus id
From: 古鎮榮 @ 2026-04-14 17:10 UTC
  To: Andrew Lunn
  Cc: davem, kuba, edumazet, pabeni, hkallweit1, linux, netdev,
	linux-kernel
In-Reply-To: <efc34ba8-730a-4c01-bc44-ee64569d2d4e@lunn.ch>

Thank you for the clarification. I understand the concern now and will
not pursue this patch further.

Best regards,
Chen Jung Ku

On Wed, Apr 15, 2026 at 12:16 AM, Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Tue, Apr 14, 2026 at 11:56:52PM +0800, Chen Jung Ku wrote:
> > Replace %px with %p to avoid exposing raw kernel pointer values.
>
> What exactly are we giving away here?
>
>                         compatible = "cavium,octeon-3860-mdio";
>                         #address-cells = <1>;
>                         #size-cells = <0>;
>                         reg = <0x11800 0x00001900 0x0 0x40>;
>
> Isn't bus->register_base this well known value?
>
> You also need to think about ABI.
>
>     Andrew

* Re: [Intel-wired-lan] [PATCH iwl-net v2 2/6] ixgbe: add bounds check for debugfs register access
From: Simon Horman @ 2026-04-14 17:16 UTC
  To: Jacob Keller
  Cc: Aleksandr Loktionov, intel-wired-lan, anthony.l.nguyen, netdev,
	Paul Greenwalt
In-Reply-To: <dda1f0f3-f57b-418a-93e6-2cdaa1d2ef35@intel.com>

On Mon, Apr 13, 2026 at 06:00:28PM -0700, Jacob Keller wrote:
> On 4/13/2026 3:30 AM, Simon Horman wrote:
> > On Wed, Apr 08, 2026 at 03:11:50PM +0200, Aleksandr Loktionov wrote:
> >> From: Paul Greenwalt <paul.greenwalt@intel.com>
> >>
> >> Prevent out-of-bounds MMIO accesses triggered through user-controlled
> >> register offsets.  IXGBE_HFDR (0x15FE8) is the highest valid MMIO
> >> register in the ixgbe register map; any offset beyond it would address
> >> unmapped memory.
> >>
> >> Add a defense-in-depth check at two levels:
> >>
> >> 1. ixgbe_read_reg() -- the noinline register read accessor.  A
> >>    WARN_ON_ONCE() guard here catches any future code path (including
> >>    ioctl extensions) that might inadvertently pass an out-of-range
> >>    offset without relying on higher layers to catch it first.
> >>    ixgbe_write_reg() is a static inline called from the TX/RX hot path;
> >>    adding WARN_ON_ONCE there would inline the check at every call site,
> >>    so only the read path gets this guard.
> >>
> >> 2. ixgbe_dbg_reg_ops_write() -- the debugfs 'reg_ops' interface is the
> >>    only current path where a raw, user-supplied offset enters the driver.
> >>    Gating it before invoking the register accessors provides a clean,
> >>    user-visible failure (silent ignore with no kernel splat) for
> >>    deliberately malformed debugfs writes.
> >>
> >> Add a reg <= IXGBE_HFDR guard to both the read and write paths in
> >> ixgbe_dbg_reg_ops_write(), and a WARN_ON_ONCE + early-return guard to
> >> ixgbe_read_reg().
> >>
> >> Fixes: 91fbd8f081e2 ("ixgbe: added reg_ops file to debugfs")
> >> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> >> ---
> >> v1 -> v2:
> >>  - Add Fixes: tag; reroute from iwl-next to iwl-net (security-relevant
> >>    hardening for user-controllable out-of-bounds MMIO).
> > 
> > Thanks for the update.
> > 
> > And sorry for not thinking to ask this earlier: this patch
> > addresses possible overruns of the mapped address space if the
> > supplied value for reg is too large. But do we also need a
> > guard against underrun if the value for reg is too small?
> > 
> 
> I don't think so. This is bounds checking a register offset which is an
> unsigned 32-bit value and begins at 0, so the map goes from 0 to
> IXGBE_HFDR. Since the value is unsigned, if it does underflow somehow it
> would then get caught by the check for IXGBE_HFDR right?

If the entire range from 0 to IXGBE_HFDR is mapped,
and it's ok for reg to have any value in that range,
then I agree there is no problem here.

Reviewed-by: Simon Horman <horms@kernel.org>
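
The point about unsigned offsets can be shown with a tiny userspace sketch. It assumes, as discussed above, that the valid range is 0 to IXGBE_HFDR (0x15FE8 per the commit message); `reg_offset_ok()` is a hypothetical name, not the helper in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define REG_MAX 0x15FE8u /* IXGBE_HFDR value quoted in the commit message */

/*
 * A single upper-bound test is enough for an unsigned offset: a value
 * that "underflows" wraps to a very large number and fails the same test.
 */
bool reg_offset_ok(uint32_t reg)
{
	return reg <= REG_MAX;
}
```

An offset computed as `0 - 4` in 32-bit unsigned arithmetic becomes 0xFFFFFFFC, which the check rejects.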


* Re: [PATCH net-next v2] net/smc: cap allocation order for SMC-R physically contiguous buffers
From: Simon Horman @ 2026-04-14 17:16 UTC
  To: D. Wythe
  Cc: David S. Miller, Dust Li, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Sidraya Jayagond, Wenjia Zhang, Mahanta Jambigi,
	Tony Lu, Wen Gu, linux-kernel, linux-rdma, linux-s390, netdev,
	oliver.yang, pasic
In-Reply-To: <20260414021054.GA111420@j66a10360.sqa.eu95>

On Tue, Apr 14, 2026 at 10:10:54AM +0800, D. Wythe wrote:
> On Fri, Apr 10, 2026 at 04:16:31PM +0100, Simon Horman wrote:
> > On Tue, Apr 07, 2026 at 08:43:37PM +0800, D. Wythe wrote:
> > > The alloc_pages() cannot satisfy requests exceeding MAX_PAGE_ORDER,
> > > and attempting such allocations will lead to guaranteed failures
> > > and potential kernel warnings.
> > > 
> > > For SMCR_PHYS_CONT_BUFS, cap the allocation order to MAX_PAGE_ORDER.
> > > This ensures the attempts to allocate the largest possible physically
> > > contiguous chunk succeed, instead of failing with an invalid order.
> > > This also avoids redundant "try-fail-degrade" cycles in
> > > __smc_buf_create().
> > > 
> > > For SMCR_MIXED_BUFS, no cap is needed: if the order exceeds
> > > MAX_PAGE_ORDER, alloc_pages() will silently fail (__GFP_NOWARN)
> > > and automatically fall back to virtual memory.
> > > 
> > > Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> > > Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
> > > ---
> > > Changes v1 -> v2:
> > > https://lore.kernel.org/netdev/20260312082154.36971-1-alibuda@linux.alibaba.com/
> > > 
> > > - Move the bufsize cap from smcr_new_buf_create() up to
> > >   __smc_buf_create(), which is simpler and avoids touching
> > >   the allocation logic itself.
> > 
> > The nit below notwithstanding, this looks good to me.
> > 
> > Reviewed-by: Simon Horman <horms@kernel.org>
> > 
> > > ---
> > >  net/smc/smc_core.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> > > index e2d083daeb7e..cdd881746e21 100644
> > > --- a/net/smc/smc_core.c
> > > +++ b/net/smc/smc_core.c
> > > @@ -2440,6 +2440,10 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb)
> > >  		/* use socket send buffer size (w/o overhead) as start value */
> > >  		bufsize = smc->sk.sk_sndbuf / 2;
> > >  
> > > +	/* limit bufsize for physically contiguous buffers */
> > > +	if (!is_smcd && lgr->buf_type == SMCR_PHYS_CONT_BUFS)
> > > +		bufsize = min_t(int, bufsize, (PAGE_SIZE << MAX_PAGE_ORDER));
> > 
> > nit: I think min() is sufficient here, and the inner parentheses are
> >      unnecessary
> 
> Hi Simon,
> 
> I think min_t is required here because min() triggers a signedness
> error:
> 
> ././include/linux/compiler_types.h:706:38: error: call to
> ‘__compiletime_assert_950’ declared with attribute error: min(bufsize,
> ((1UL) << 12) << 10) signedness error
> 
> The inner parentheses can be removed, though.

Ack, thanks for checking.
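
A userspace sketch of what the min_t() form does may help here. MIN_T below is a simplified stand-in for the kernel macro (it evaluates its arguments more than once), and `cap_bufsize()` is a hypothetical wrapper. The shift values 12 and 10 are taken from the compiler error quoted above.

```c
/* Simplified stand-in for min_t(): cast both sides to one type first. */
#define MIN_T(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/*
 * Hypothetical wrapper: cap a signed buffer size at the largest
 * physically contiguous allocation, PAGE_SIZE << MAX_PAGE_ORDER.
 * The limit is unsigned long, which is why a plain min() against the
 * signed bufsize is flagged; the cast assumes the limit fits in an int.
 */
int cap_bufsize(int bufsize, unsigned int page_shift, unsigned int max_order)
{
	unsigned long limit = (1UL << page_shift) << max_order;

	return MIN_T(int, bufsize, limit);
}
```

With a 4 KiB page and an order limit of 10, the cap is 4 MiB: an 8 MiB request is reduced to 4194304 bytes and a 64 KiB request is left unchanged.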

* Re: [RFC v2 1/2] vfio: add callback to get tph info for dmabuf
From: Keith Busch @ 2026-04-14 17:34 UTC
  To: Leon Romanovsky
  Cc: Zhiping Zhang, Jason Gunthorpe, Bjorn Helgaas, linux-rdma,
	linux-pci, netdev, dri-devel, Yochai Cohen, Yishai Hadas,
	Bjorn Helgaas
In-Reply-To: <20260409120415.GF86584@unreal>

On Thu, Apr 09, 2026 at 03:04:15PM +0300, Leon Romanovsky wrote:
> Something like that, on top of this proposal:

...
  
> +struct vfio_region_dma_tph {
> +	u16 tag;
> +	u8 ph;
> +};
> +
>  struct vfio_region_dma_range {
> -	__u64 offset;
> -	__u64 length;
> +	union {
> +		__u64 offset;
> +		struct vfio_region_dma_tph tph;
> +	};
> +	union {
> +		__u64 length;
> +		__u64 reserved;
> +	};
> +};
> +
> +enum {
> +	VFIO_DMABUF_FLAG_TPH = 1 << 0,
>  };

Okay, so you have the hints as a separate action from the dmabuf
creation. I was trying to set it up in one shot, but this proposal may
be fine. We'll try this idea out internally.

* Re: [PATCH] net: mdio: octeon: use %p for bus id
From: Russell King (Oracle) @ 2026-04-14 17:42 UTC
  To: Andrew Lunn
  Cc: Chen Jung Ku, davem, kuba, edumazet, pabeni, hkallweit1, netdev,
	linux-kernel
In-Reply-To: <efc34ba8-730a-4c01-bc44-ee64569d2d4e@lunn.ch>

On Tue, Apr 14, 2026 at 06:16:08PM +0200, Andrew Lunn wrote:
> On Tue, Apr 14, 2026 at 11:56:52PM +0800, Chen Jung Ku wrote:
> > Replace %px with %p to avoid exposing raw kernel pointer values.
> 
> What exactly are we giving away here?
> 
>                         compatible = "cavium,octeon-3860-mdio";
>                         #address-cells = <1>;
>                         #size-cells = <0>;
>                         reg = <0x11800 0x00001900 0x0 0x40>;
> 
> Isn't bus->register_base this well known value?
> 
> You also need to think about ABI.

There isn't ABI here.

        bus->register_base = devm_platform_ioremap_resource(pdev, 0);

        snprintf(bus->mii_bus->id, MII_BUS_ID_SIZE, "%px", bus->register_base);

bus->register_base is the ioremap'd version of the resource, which is
effectively random, and it can be either a 32 or 64-bit hex number
depending on the pointer size. It's an exceedingly bad choice of MDIO
bus ID.

A better, more stable choice would be to use the bus address or
dev_name().

Even so, I don't think there's any ABI here as the existing "ID" will
not be stable.
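
As a rough userspace illustration of the bus-address option, the sketch below formats an ID from a physical base address instead of a mapped pointer. Everything here is an assumption for illustration only: `format_bus_id()` and `BUS_ID_SIZE` are hypothetical, and the example address assumes the first two cells of the quoted `reg` property form a 64-bit address.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BUS_ID_SIZE 61 /* hypothetical stand-in for MII_BUS_ID_SIZE */

/*
 * Build the ID from the physical bus address, which is fixed by the
 * hardware description, rather than from the ioremap()ed pointer, which
 * can differ between boots and between 32-bit and 64-bit kernels.
 */
int format_bus_id(char *id, size_t size, uint64_t phys_base)
{
	return snprintf(id, size, "%llx", (unsigned long long)phys_base);
}
```

Called with 0x1180000001900, this produces the same string on every boot, which is the stability property being asked for.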

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

* Re: [PATCH v2 2/6] bus: mhi: host: Add support for non-posted TSC timesync feature
From: Vadim Fedorenko @ 2026-04-14 17:46 UTC
  To: Krishna Chaitanya Chundru, Manivannan Sadhasivam, Richard Cochran
  Cc: mhi, linux-arm-msm, linux-kernel, netdev, Vivek Pernamitta
In-Reply-To: <20260411-tsc_timesync-v2-2-6f25f72987b3@oss.qualcomm.com>

On 11/04/2026 09:12, Krishna Chaitanya Chundru wrote:
> From: Vivek Pernamitta <quic_vpernami@quicinc.com>
> 
> Implement non-posted time synchronization as described in section 5.1.1
> of the MHI v1.2 specification. The host disables low-power link states
> to minimize latency, reads the local time, issues a MMIO read to the
> device's TIME register.
> 
> Add support for initializing this feature and export a function to be
> used by the drivers which does the time synchronization.
> 
> MHI reads the device time registers in the MMIO address space pointed to
> by the capability register after disabling all low power modes and keeping
> MHI in M0. Before and after MHI reads, the local time is captured
> and shared for processing.

[...]

> +	/*
> +	 * time critical code to fetch device time, delay between these two steps
> +	 * should be deterministic as possible.
> +	 */
> +	preempt_disable();
> +	local_irq_disable();
> +
> +	time->t_host_pre = ktime_get_real();
> +
> +	/*
> +	 * To ensure the PCIe link is in L0 when ASPM is enabled, perform series
> +	 * of back-to-back reads. This is necessary because the link may be in a
> +	 * low-power state (e.g., L1 or L1ss), and need to be forced it to
> +	 * transition to L0.
> +	 */
> +	for (i = 0; i < MHI_NUM_BACK_TO_BACK_READS; i++) {
> +		ret = mhi_read_reg(mhi_cntrl, mhi_tsync->time_reg,
> +				   TSC_TIMESYNC_TIME_LOW_OFFSET, &time->t_dev_lo);
> +
> +		ret = mhi_read_reg(mhi_cntrl, mhi_tsync->time_reg,
> +				   TSC_TIMESYNC_TIME_HIGH_OFFSET, &time->t_dev_hi);
> +	}
> +
> +	time->t_host_post = ktime_get_real();
> +
> +	local_irq_enable();
> +	preempt_enable();

PTP_SYS_OFFSET_EXTENDED receives the number of samples to read from user
space, so you can use that instead of MHI_NUM_BACK_TO_BACK_READS. In that
case it's better to capture the host-pre and host-post times around a
single register read.

Also, PTP_SYS_OFFSET_EXTENDED was improved and now supports multiple
clockids as system time; it would be good to account for that.
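
The per-sample shape suggested here can be sketched in plain C. This is not the MHI or PTP API: `collect_samples()`, `struct ts_sample`, and the callback type are hypothetical, and `fake_tick()` is a deterministic stand-in for both the host clock and the device register read.

```c
#include <stddef.h>
#include <stdint.h>

/* One sample: host time before, device time, host time after. */
struct ts_sample {
	uint64_t host_pre;
	uint64_t dev;
	uint64_t host_post;
};

typedef uint64_t (*clock_fn)(void *ctx);

/*
 * Take n_samples samples, bracketing each single device read with its
 * own pair of host timestamps, so the window per sample stays small.
 */
void collect_samples(clock_fn host_now, clock_fn dev_read, void *ctx,
		     struct ts_sample *out, size_t n_samples)
{
	for (size_t i = 0; i < n_samples; i++) {
		out[i].host_pre = host_now(ctx);
		out[i].dev = dev_read(ctx);
		out[i].host_post = host_now(ctx);
	}
}

/* Stand-in clock: a counter that advances by one on every call. */
uint64_t fake_tick(void *ctx)
{
	uint64_t *t = ctx;

	return (*t)++;
}
```

The sample count would come from the caller, in the same way that the ioctl passes a user-supplied count, rather than from a compile-time constant.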

* Re: [PATCH net-next v3 3/5] net: phy: mscc: Drop unnecessary phydev->lock
From: Russell King (Oracle) @ 2026-04-14 17:47 UTC
  To: Biju
  Cc: Andrew Lunn, Heiner Kallweit, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Biju Das, Lad Prabhakar,
	Horatiu Vultur, Vladimir Oltean, netdev, linux-kernel,
	Geert Uytterhoeven, linux-renesas-soc
In-Reply-To: <20260412140032.122841-4-biju.das.jz@bp.renesas.com>

On Sun, Apr 12, 2026 at 03:00:25PM +0100, Biju wrote:
> @@ -486,15 +486,9 @@ static int vsc85xx_dt_led_modes_get(struct phy_device *phydev,
>  
>  static int vsc85xx_edge_rate_cntl_set(struct phy_device *phydev, u8 edge_rate)
>  {
> -	int rc;
> -
> -	mutex_lock(&phydev->lock);
> -	rc = phy_modify_paged(phydev, MSCC_PHY_PAGE_EXTENDED_2,
> -			      MSCC_PHY_WOL_MAC_CONTROL, EDGE_RATE_CNTL_MASK,
> -			      edge_rate << EDGE_RATE_CNTL_POS);
> -	mutex_unlock(&phydev->lock);
> -
> -	return rc;
> +	return phy_modify_paged(phydev, MSCC_PHY_PAGE_EXTENDED_2,
> +				MSCC_PHY_WOL_MAC_CONTROL, EDGE_RATE_CNTL_MASK,
> +				edge_rate << EDGE_RATE_CNTL_POS);

This one is fine.

> @@ -503,7 +497,6 @@ static int vsc85xx_mac_if_set(struct phy_device *phydev,
>  	int rc;
>  	u16 reg_val;
>  
> -	mutex_lock(&phydev->lock);
>  	reg_val = phy_read(phydev, MSCC_PHY_EXT_PHY_CNTL_1);
>  	reg_val &= ~(MAC_IF_SELECTION_MASK);
>  	switch (interface) {
> @@ -522,17 +515,15 @@ static int vsc85xx_mac_if_set(struct phy_device *phydev,
>  		break;
>  	default:
>  		rc = -EINVAL;
> -		goto out_unlock;
> +		goto err;
>  	}
>  	rc = phy_write(phydev, MSCC_PHY_EXT_PHY_CNTL_1, reg_val);

I would much rather this was converted to use phy_modify() as well so
that we ensure that the update is atomic.

	rc = phy_modify(phydev, MSCC_PHY_EXT_PHY_CNTL_1,
			MAC_IF_SELECTION_MASK, reg_val);

where reg_val is assigned the field value in the switch above.
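
The bit arithmetic that a masked update performs is shown below as a pure function. It is a userspace sketch only: `modify_bits()` is a hypothetical name, and it does not model the bus locking that makes the real read-modify-write atomic.

```c
#include <stdint.h>

/*
 * Clear the bits selected by mask, then OR in the new field value.
 * Bits outside the mask keep their old value.
 */
uint16_t modify_bits(uint16_t old, uint16_t mask, uint16_t set)
{
	return (uint16_t)((old & ~mask) | set);
}
```

With this form the switch statement only has to produce the field value; the caller no longer reads the register and clears the mask by hand.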

> @@ -668,19 +659,15 @@ static int vsc8531_pre_init_seq_set(struct phy_device *phydev)
>  	if (rc < 0)
>  		return rc;
>  
> -	mutex_lock(&phydev->lock);
>  	oldpage = phy_select_page(phydev, MSCC_PHY_PAGE_TR);
>  	if (oldpage < 0)
> -		goto out_unlock;
> +		goto restore_oldpage;
>  
>  	for (i = 0; i < ARRAY_SIZE(init_seq); i++)
>  		vsc85xx_tr_write(phydev, init_seq[i].reg, init_seq[i].val);
>  
> -out_unlock:
> -	oldpage = phy_restore_page(phydev, oldpage, oldpage);
> -	mutex_unlock(&phydev->lock);
> -
> -	return oldpage;
> +restore_oldpage:
> +	return phy_restore_page(phydev, oldpage, oldpage);

This is fine.

> @@ -708,19 +695,15 @@ static int vsc85xx_eee_init_seq_set(struct phy_device *phydev)
>  	unsigned int i;
>  	int oldpage;
>  
> -	mutex_lock(&phydev->lock);
>  	oldpage = phy_select_page(phydev, MSCC_PHY_PAGE_TR);
>  	if (oldpage < 0)
> -		goto out_unlock;
> +		goto restore_oldpage;
>  
>  	for (i = 0; i < ARRAY_SIZE(init_eee); i++)
>  		vsc85xx_tr_write(phydev, init_eee[i].reg, init_eee[i].val);
>  
> -out_unlock:
> -	oldpage = phy_restore_page(phydev, oldpage, oldpage);
> -	mutex_unlock(&phydev->lock);
> -
> -	return oldpage;
> +restore_oldpage:
> +	return phy_restore_page(phydev, oldpage, oldpage);

Also fine.

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

* Re: [PATCH v2] vsock/virtio: fix accept queue count leak on transport mismatch
From: Bobby Eshleman @ 2026-04-14 17:57 UTC
  To: Dudu Lu; +Cc: netdev, stefanha, sgarzare, mst, jasowang
In-Reply-To: <20260413131409.19022-1-phx0fer@gmail.com>

On Mon, Apr 13, 2026 at 09:14:09PM +0800, Dudu Lu wrote:
> virtio_transport_recv_listen() calls sk_acceptq_added() before
> vsock_assign_transport(). If vsock_assign_transport() fails or
> selects a different transport, the error path returns without
> calling sk_acceptq_removed(), permanently incrementing
> sk_ack_backlog.
> 
> After approximately backlog+1 such failures, sk_acceptq_is_full()
> returns true, causing the listener to reject all new connections.
> 
> Fix by moving sk_acceptq_added() to after the transport validation,
> matching the pattern used by vmci_transport and hyperv_transport.
> 
> Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
> Signed-off-by: Dudu Lu <phx0fer@gmail.com>
> ---

Just a heads up that version change lists are encouraged.

>  net/vmw_vsock/virtio_transport_common.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 8a9fb23c6e85..e01d983488e5 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -1560,8 +1560,6 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
>  		return -ENOMEM;
>  	}
>  
> -	sk_acceptq_added(sk);
> -
>  	lock_sock_nested(child, SINGLE_DEPTH_NESTING);
>  
>  	child->sk_state = TCP_ESTABLISHED;
> @@ -1583,6 +1581,7 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
>  		return ret;
>  	}
>  
> +	sk_acceptq_added(sk);
>  	if (virtio_transport_space_update(child, skb))
>  		child->sk_write_space(child);
>  
> -- 
> 2.39.3 (Apple Git-145)
> 

This makes sense to me.

Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
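
The leak and the fix can be modelled with a small userspace sketch. The struct and functions are hypothetical stand-ins for the socket backlog accounting; the flag `count_before_check` selects the old ordering (count first) or the patched ordering (count after transport validation).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the listener's backlog accounting. */
struct listener {
	unsigned int ack_backlog;
	unsigned int max_ack_backlog;
};

bool acceptq_is_full(const struct listener *l)
{
	return l->ack_backlog > l->max_ack_backlog;
}

/* Returns 0 if the connection is queued, -1 if it is rejected. */
int handle_request(struct listener *l, bool transport_ok,
		   bool count_before_check)
{
	if (acceptq_is_full(l))
		return -1;

	if (count_before_check)
		l->ack_backlog++;	/* old order: counted too early */

	if (!transport_ok)
		return -1;		/* old order leaks the count here */

	if (!count_before_check)
		l->ack_backlog++;	/* patched order: count on success */

	return 0;
}
```

With a backlog of 2, three failed requests under the old ordering leave the counter at 3 and the queue permanently "full"; under the patched ordering the counter stays at 0 and the next valid request is accepted.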

* RE: [Intel-wired-lan] [PATCH iwl-net v1] i40e: fix napi_enable/disable skipping ringless q_vectors
From: Mekala, SunithaX D @ 2026-04-14 17:58 UTC
  To: Loktionov, Aleksandr, intel-wired-lan@lists.osuosl.org,
	Nguyen, Anthony L, Loktionov, Aleksandr
  Cc: netdev@vger.kernel.org, Jakub Kicinski
In-Reply-To: <20260324130922.562714-1-aleksandr.loktionov@intel.com>

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Aleksandr Loktionov
> Sent: Tuesday, March 24, 2026 6:09 AM
> To: intel-wired-lan@lists.osuosl.org; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Loktionov, Aleksandr <aleksandr.loktionov@intel.com>
> Cc: netdev@vger.kernel.org; Jakub Kicinski <kuba@kernel.org>
> Subject: [Intel-wired-lan] [PATCH iwl-net v1] i40e: fix napi_enable/disable skipping ringless q_vectors
>
> After ethtool -L reduces the queue count, i40e_napi_disable_all() sets
> NAPI_STATE_SCHED on all q_vectors, then i40e_vsi_map_rings_to_vectors()
> clears ring pointers on the excess ones.  i40e_napi_enable_all() skips
> those with:
>
>	if (q_vector->rx.ring || q_vector->tx.ring)
>		napi_enable(&q_vector->napi);
>
> leaving them on dev->napi_list with NAPI_STATE_SCHED permanently set.
>
> Writing to /sys/class/net/<iface>/threaded calls napi_stop_kthread()
> on every entry in dev->napi_list.  The function loops on msleep(20)
> waiting for NAPI_STATE_SCHED to clear -- which never happens for the
> stale q_vectors.  The task hangs in D state forever; a concurrent write
> deadlocks on dev->lock held by the first.
>
> Commit 13a8cd191a2b added the guard to prevent a divide-by-zero in
> i40e_napi_poll() when epoll busy-poll iterated all device NAPIs (4.x
> era).  Since 7adc3d57fe2b ("net: Introduce preferred busy-polling",
> v5.11) napi_busy_loop() polls by napi_id keyed to the socket, so
> ringless q_vectors are never selected.  i40e_msix_clean_rings() also
> independently avoids scheduling NAPI for them.  The guard is safe to
> remove.
>
> Add an early return in i40e_napi_poll() for num_ringpairs == 0 so the
> function is self-defending against a NULL tx.ring dereference at the
> WB_ON_ITR check, should the NAPI ever fire through an unexpected path.
>
> Reported-by: Jakub Kicinski <kuba@kernel.org>
> Closes: https://lore.kernel.org/intel-wired-lan/20260316133100.6054a11f@kernel.org/
> Fixes: 13a8cd191a2b ("i40e: Do not enable NAPI on q_vectors that have no rings")
> Cc: stable@vger.kernel.org
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> ---
> Test configuration:
>   Kernel   : Linux 6.19.0-rc8+
>   NIC      : Intel Ethernet Controller XXV710 for 25GbE SFP28 [8086:158b]
>   Driver   : i40e (in-tree)
>   Firmware : 9.40 0x8000ed12 1.3429.0
>   CPU      : 2 x Intel Xeon Gold 6238M (88 logical CPUs, x86_64)
>   RAM      : 64 GiB
>
> Reproduction steps (FAIL before fix):
>   # 1. Reduce queues so excess q_vectors lose their ring pointers
>   ethtool -L <iface> combined 1
>
>   # 2. Enable threaded NAPI (completes fast in 6.19, no hang on enable path)
>   echo 1 > /sys/class/net/<iface>/threaded
>
>   # 3. Two concurrent writes to disable -- fires the msleep deadlock
>   echo 0 > /sys/class/net/<iface>/threaded &
>   echo 0 > /sys/class/net/<iface>/threaded &
>
>   Both background tasks enter uninterruptible sleep (D state) immediately
>   and never return.
>
>   Observed kernel stack (W1, holds dev->lock):
>     msleep+0x2d/0x50
>     napi_set_threaded+0x10b/0x110
>     netif_set_threaded+0xe1/0x140
>     threaded_store+0xd2/0x100
>     kernfs_fop_write_iter+0x138/0x1d0
>
>   Kernel hung_task message (~120 s after trigger):
>     INFO: task bash blocked for more than 122 seconds.
>     INFO: task bash is blocked on a mutex likely owned by task bash.
>
> Validation (PASS with fix):
>   Both background tasks exit within 1 second.
>   D-state process count: 0.
>   Busy-poll (net.core.busy_poll=50) + 50000-packet UDP flood with
>   1 active queue: no NULL dereference, no crash.
>
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 28 ++++++++++++---------
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c | 10 ++++++++
>  2 files changed, 26 insertions(+), 12 deletions(-)

Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)

* Re: [PATCH v10 01/12] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop()
From: Pawan Gupta @ 2026-04-14 18:05 UTC
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc
In-Reply-To: <20260414-vmscape-bhb-v10-1-efa924abae5f@linux.intel.com>

On Tue, Apr 14, 2026 at 12:05:28AM -0700, Pawan Gupta wrote:
> Currently, the BHB clearing sequence is followed by an LFENCE to prevent
> transient execution of subsequent indirect branches prematurely. However,
> the LFENCE barrier could be unnecessary in certain cases. For example, when
> the kernel is using the BHI_DIS_S mitigation, and BHB clearing is only
> needed for userspace. In such cases, the LFENCE is redundant because ring
> transitions would provide the necessary serialization.
> 
> Below is a quick recap of BHI mitigation options:
> 
> On Alder Lake and newer
> 
>     BHI_DIS_S: Hardware control to mitigate BHI in ring0. This has low
>     performance overhead.
> 
>     Long loop: Alternatively, a longer version of the BHB clearing sequence
>     can be used to mitigate BHI. It can also be used to mitigate the BHI
>     variant of VMSCAPE. This is not yet implemented in Linux.
> 
> On older CPUs
> 
>     Short loop: Clears BHB at kernel entry and VMexit. The "Long loop" is
>     effective on older CPUs as well, but should be avoided because of
>     unnecessary overhead.
> 
> On Alder Lake and newer CPUs, eIBRS isolates the indirect targets between
> guest and host. But when affected by the BHI variant of VMSCAPE, a guest's
> branch history may still influence indirect branches in userspace. This
> also means the big hammer IBPB could be replaced with a cheaper option that
> clears the BHB at exit-to-userspace after a VMexit.
> 
> In preparation for adding the support for the BHB sequence (without LFENCE)
> on newer CPUs, move the LFENCE to the caller side after clear_bhb_loop() is
> executed. Allow callers to decide whether they need the LFENCE or not. This
> adds a few extra bytes to the call sites, but it obviates the need for
> multiple variants of clear_bhb_loop().
> 
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Tested-by: Jon Kohler <jon@nutanix.com>
> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> ---

Sorry this is missing Boris's Ack, I will fix.

> Acked-by: Borislav Petkov (AMD) <bp@alien8.de>

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH iwl-net v1] i40e: don't advertise IFF_SUPP_NOFCS
From: Mekala, SunithaX D @ 2026-04-14 18:07 UTC (permalink / raw)
  To: Kohei Enju, intel-wired-lan@lists.osuosl.org,
	netdev@vger.kernel.org
  Cc: Nguyen, Anthony L, Kitszel, Przemyslaw, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Shannon Nelson, Jesse Brandeburg, kohei.enju@gmail.com
In-Reply-To: <20260325205054.109822-1-kohei@enjuk.jp>

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Kohei Enju
> Sent: Wednesday, March 25, 2026 1:50 PM
> To: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org
> Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; Andrew Lunn <andrew+netdev@lunn.ch>; David S. Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Shannon Nelson <sln@onemain.com>; Jesse Brandeburg <jesse.brandeburg@intel.com>; kohei.enju@gmail.com; Kohei Enju <kohei@enjuk.jp>
> Subject: [Intel-wired-lan] [PATCH iwl-net v1] i40e: don't advertise IFF_SUPP_NOFCS
>
> i40e advertises IFF_SUPP_NOFCS, allowing users to use the SO_NOFCS
> socket option. However, this option is silently ignored, as the driver
> does not check skb->no_fcs, and always enables FCS insertion offload.
>
> Fix this by removing the advertisement of IFF_SUPP_NOFCS.
>
> This behavior can be reproduced with a simple AF_PACKET socket:
>
>   import socket
>   s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
>   s.setsockopt(socket.SOL_SOCKET, 43, 1) # SO_NOFCS
>   s.bind(("eth0", 0))
>   s.send(b'\xff' * 64)
>
> Previously, send() succeeds but the driver ignores SO_NOFCS.
> With this change, send() fails with -EPROTONOSUPPORT, as expected.
>
> Fixes: 41c445ff0f48 ("i40e: main driver core")
> Signed-off-by: Kohei Enju <kohei@enjuk.jp>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 1 -
>  1 file changed, 1 deletion(-)

Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)

^ permalink raw reply

* TODOs in oa_tc6
From: Stefan Wahren @ 2026-04-14 18:18 UTC (permalink / raw)
  To: Parthiban Veerasooran; +Cc: netdev

Hello Parthiban,

thank you for upstreaming the lan865x driver.

When do you plan to fix these open TODOs in ethernet/oa_tc6?

Best regards

^ permalink raw reply

* Re: [PATCH net] net/sched: taprio: fix use-after-free in advance_sched() on schedule switch
From: Vinicius Costa Gomes @ 2026-04-14 18:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jakub Kicinski, vladimir.oltean, jhs, jiri, davem, edumazet,
	pabeni, horms, netdev, linux-kernel, qjx1298677004
In-Reply-To: <20260413230106.3654555-1-kuba@kernel.org>

Jakub Kicinski <kuba@kernel.org> writes:

> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> ---
> net/sched: taprio: fix use-after-free in advance_sched() on schedule switch
>
> This commit updates advance_sched() to select the next entry from the new
> oper schedule immediately after switch_schedules() returns. It relies on the
> pre-calculated end_time of the new schedule's first entry instead of
> overwriting it with the admin schedule's base time.
>
>> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
>> index f721c03514f60..0316f2dee06ac 100644
>> --- a/net/sched/sch_taprio.c
>> +++ b/net/sched/sch_taprio.c
>
> [ ... ]
>
>> @@ -972,11 +972,12 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
>>  	}
>>  
>>  	if (should_change_schedules(admin, oper, end_time)) {
>> -		/* Set things so the next time this runs, the new
>> -		 * schedule runs.
>> -		 */
>> -		end_time = sched_base_time(admin);
>
> Does removing this assignment introduce a regression where the new schedule
> activates prematurely?
>

should_change_schedules() checks if the previous 'end_time' is after the
start of the "new" admin schedule.

> Since the new schedule's first entry has its end_time pre-calculated
> as base_time + interval, if we activate it early (before its actual
> base_time), does it stretch the entry backwards in time?

If the hrtimer fires early, that might be possible. But this is the
software implementation, whose accuracy is bounded by the hrtimer's
accuracy, so second-guessing what the hrtimer did (or is doing) here
seems wrong.

>
> Could this cause its gate mask to apply during the time gap, violating the
> strictly configured gate durations?

Yeah, if the hrtimer fires early, we might have some schedule entries
that are longer than they should be. But that is an intrinsic limitation
of the software mode (at least in its current implementation).

So, in short, I am not really concerned about this "violation of
strictness", as long as we are not making it worse.
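
To put numbers on the "stretching" being discussed, here is a minimal
userspace sketch of the arithmetic, assuming the first entry's end_time
is pre-calculated as base_time + interval as described in the review.
The model_* names and the nanosecond values are hypothetical and are
not taken from sch_taprio.c.

```c
#include <stdint.h>

/* Hypothetical model; times are in nanoseconds. */

/* end_time of the first entry, pre-calculated from the schedule. */
static int64_t model_first_entry_end(int64_t base_time, int64_t interval)
{
	return base_time + interval;
}

/*
 * How long the first entry's gate mask is effectively applied when the
 * entry is activated at activation_time. Activating before base_time
 * makes the result larger than the configured interval.
 */
static int64_t model_effective_duration(int64_t activation_time,
					int64_t base_time, int64_t interval)
{
	return model_first_entry_end(base_time, interval) - activation_time;
}
```

With base_time = 1000 and interval = 300, activating exactly at 1000
gives 300, while activating 50 ns early gives 350; the excess is exactly
how early the timer fired.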

>
>>  		switch_schedules(q, &admin, &oper);
>
> Can this sequence cause a use-after-free regression due to the RCU grace
> period starting too early?
>
> Inside switch_schedules(), call_rcu() is invoked on the old oper
> schedule. However, q->current_entry is not updated until later in
> advance_sched() at the first_run label:
>
> first_run:
> 	rcu_assign_pointer(q->current_entry, next);
>
> If a concurrent reader calls rcu_dereference(q->current_entry) between
> switch_schedules() and this assignment, can it acquire a pointer to the
> old schedule after call_rcu() has already started? This would mean the
> grace period will not wait for this new reader.

I had to sleep on this. Yes, this is possible, but the window is very
narrow, and it is not introduced by this fix. I guess this problem has
existed since I added the concept of an "admin" schedule.

I will fix this in a separate patch; I need to think more about it.

>
> As a separate pre-existing issue, can switch_schedules() be called
> concurrently by lockless readers?
>
> If advance_sched() calls it under q->current_entry_lock, but a concurrent
> reader like get_packet_txtime() evaluates the switch condition and calls
> switch_schedules() without holding locks, can they race?

At configuration time, txtime-assisted mode (which calls
get_packet_txtime()) and the software mode (which "runs" advance_sched())
are prevented from being enabled together, so this should not be possible.

>
> Could this lead to call_rcu() being called twice on the same old oper
> object, potentially corrupting the RCU callback list?
>
>> +		/* After changing schedules, the next entry is the first one
>> +		 * in the new schedule, with a pre-calculated end_time.
>> +		 */
>> +		next = list_first_entry(&oper->entries, struct sched_entry, list);
>> +		end_time = next->end_time;
>>  	}
>>  
>>  	next->end_time = end_time;


Cheers,
-- 
Vinicius

^ permalink raw reply

* Re: [PATCH] net/sched: sch_dualpi2: fix NULL pointer dereference in dualpi2_change()
From: Simon Horman @ 2026-04-14 18:31 UTC (permalink / raw)
  To: Kito Xu (veritas501)
  Cc: Jamal Hadi Salim, Jiri Pirko, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Chia-Yu Chang, netdev, linux-kernel
In-Reply-To: <20260413075740.2234828-1-hxzene@gmail.com>

On Mon, Apr 13, 2026 at 03:57:40PM +0800, Kito Xu (veritas501) wrote:
> dualpi2_change() uses a trim loop to enforce the new queue limit after a
> configuration change. The loop calls qdisc_dequeue_internal(sch, true)
> which only dequeues from the C-queue (sch->q) and the requeue list
> (sch->gso_skb). It does not dequeue from the L-queue (q->l_queue).
> 
> However, the loop continuation condition checks qdisc_qlen(sch), which
> reflects the total packet count across both queues because
> dualpi2_enqueue_skb() manually increments sch->q.qlen for L-queue
> packets (line 418). Similarly, q->memory_used accounts for memory from
> both queues.
> 
> When all packets reside in the L-queue and the C-queue is empty, the
> loop condition remains true but qdisc_dequeue_internal() returns NULL.
> The subsequent skb->truesize dereference causes a NULL pointer oops.
> 
> An unprivileged user can trigger this from a user namespace:
> 
>   1. unshare(CLONE_NEWUSER | CLONE_NEWNET)
>   2. Create a dummy device and attach dualpi2 qdisc
>   3. Send ECT(1)-marked packets to fill the L-queue
>   4. Reduce the qdisc limit via RTM_NEWQDISC

...

> Fix this by adding a NULL check after qdisc_dequeue_internal(). When
> the C-queue is exhausted but L-queue packets keep qdisc_qlen(sch) above
> the limit, the loop breaks safely. Remaining excess L-queue packets will
> be drained by the normal dequeue path.
> 
> Fixes: 320d031ad6e4 ("sched: Struct definition and parsing of dualpi2 qdisc")
> Signed-off-by: Kito Xu (veritas501) <hxzene@gmail.com>

Reviewed-by: Simon Horman <horms@kernel.org>
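
For reference, the failure mode described in the commit message can be
sketched in userspace C. This is a minimal model, not the kernel code:
every model_* name, the fixed truesize of 256 and the packet counts are
hypothetical. It only shows why a loop bounded by the combined queue
length needs a NULL check when it dequeues from one queue only.

```c
#include <stddef.h>

/* Hypothetical userspace model of the trim loop; not kernel code. */
struct model_pkt {
	unsigned int truesize;
};

struct model_qdisc {
	unsigned int c_len;        /* packets in the C-queue */
	unsigned int l_len;        /* packets in the L-queue */
	unsigned int limit;        /* packet limit being enforced */
	unsigned int memory_used;  /* bytes accounted across both queues */
};

/* Counts both queues, as the commit message says of qdisc_qlen(). */
static unsigned int model_qlen(const struct model_qdisc *q)
{
	return q->c_len + q->l_len;
}

/* Dequeues from the C-queue only; returns NULL once it is empty. */
static struct model_pkt *model_dequeue_c(struct model_qdisc *q)
{
	static struct model_pkt pkt = { .truesize = 256 };

	if (q->c_len == 0)
		return NULL;
	q->c_len--;
	return &pkt;
}

/* Trim down towards the limit; returns the number of packets dropped. */
static unsigned int model_trim(struct model_qdisc *q)
{
	unsigned int dropped = 0;

	while (model_qlen(q) > q->limit) {
		struct model_pkt *pkt = model_dequeue_c(q);

		if (!pkt)	/* only L-queue packets remain: stop */
			break;
		q->memory_used -= pkt->truesize;
		dropped++;
	}
	return dropped;
}
```

With c_len = 0 and l_len = 5 against a limit of 2, the loop condition is
true on entry but the dequeue returns NULL at once; without the check,
the pkt->truesize access would be the NULL dereference.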

^ permalink raw reply

* Re: [PATCH] net/sched: sch_dualpi2: fix NULL pointer dereference in dualpi2_change()
From: Simon Horman @ 2026-04-14 18:36 UTC (permalink / raw)
  To: Kito Xu (veritas501)
  Cc: Jamal Hadi Salim, Jiri Pirko, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Chia-Yu Chang, netdev, linux-kernel
In-Reply-To: <20260414183132.GC772670@horms.kernel.org>

On Tue, Apr 14, 2026 at 07:31:32PM +0100, Simon Horman wrote:
> On Mon, Apr 13, 2026 at 03:57:40PM +0800, Kito Xu (veritas501) wrote:
> > dualpi2_change() uses a trim loop to enforce the new queue limit after a
> > configuration change. The loop calls qdisc_dequeue_internal(sch, true)
> > which only dequeues from the C-queue (sch->q) and the requeue list
> > (sch->gso_skb). It does not dequeue from the L-queue (q->l_queue).
> > 
> > However, the loop continuation condition checks qdisc_qlen(sch), which
> > reflects the total packet count across both queues because
> > dualpi2_enqueue_skb() manually increments sch->q.qlen for L-queue
> > packets (line 418). Similarly, q->memory_used accounts for memory from
> > both queues.
> > 
> > When all packets reside in the L-queue and the C-queue is empty, the
> > loop condition remains true but qdisc_dequeue_internal() returns NULL.
> > The subsequent skb->truesize dereference causes a NULL pointer oops.
> > 
> > An unprivileged user can trigger this from a user namespace:
> > 
> >   1. unshare(CLONE_NEWUSER | CLONE_NEWNET)
> >   2. Create a dummy device and attach dualpi2 qdisc
> >   3. Send ECT(1)-marked packets to fill the L-queue
> >   4. Reduce the qdisc limit via RTM_NEWQDISC
> 
> ...
> 
> > Fix this by adding a NULL check after qdisc_dequeue_internal(). When
> > the C-queue is exhausted but L-queue packets keep qdisc_qlen(sch) above
> > the limit, the loop breaks safely. Remaining excess L-queue packets will
> > be drained by the normal dequeue path.
> > 
> > Fixes: 320d031ad6e4 ("sched: Struct definition and parsing of dualpi2 qdisc")
> > Signed-off-by: Kito Xu (veritas501) <hxzene@gmail.com>
> 
> Reviewed-by: Simon Horman <horms@kernel.org>

Sorry, I now see that a more comprehensive fix for this code path
is available from the original author of the code.

- [PATCH v1 net 1/1] net/sched: sch_dualpi2: fix limit/memlimit enforcement when dequeueing L-queue
  https://lore.kernel.org/all/20260413163711.56191-1-chia-yu.chang@nokia-bell-labs.com/

^ permalink raw reply

* Re: [PATCH RFC bpf-next 1/8] kasan: expose generic kasan helpers
From: Alexis Lothoré @ 2026-04-14 18:41 UTC (permalink / raw)
  To: Alexei Starovoitov, Alexis Lothoré
  Cc: Andrey Konovalov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	John Fastabend, David S. Miller, David Ahern, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML, H. Peter Anvin,
	Shuah Khan, Maxime Coquelin, Alexandre Torgue, Andrey Ryabinin,
	Alexander Potapenko, Dmitry Vyukov, Vincenzo Frascino,
	Andrew Morton, ebpf, Bastien Curutchet, Thomas Petazzoni,
	Xu Kuohai, bpf, LKML, Network Development,
	open list:KERNEL SELFTEST FRAMEWORK, linux-stm32,
	linux-arm-kernel, kasan-dev, linux-mm
In-Reply-To: <CAADnVQLJ=fJ7t1i2+_RYqU1gqYqiLP9Zrwo4vdZsgzjK_yzJTQ@mail.gmail.com>

On Tue Apr 14, 2026 at 4:36 PM CEST, Alexei Starovoitov wrote:
> On Tue, Apr 14, 2026 at 6:13 AM Alexis Lothoré
> <alexis.lothore@bootlin.com> wrote:
>>
>> Hi Andrey, thanks for the prompt review !
>>
>> On Tue Apr 14, 2026 at 12:19 AM CEST, Andrey Konovalov wrote:
>> > On Mon, Apr 13, 2026 at 8:29 PM Alexis Lothoré (eBPF Foundation)
>> > <alexis.lothore@bootlin.com> wrote:
>> >>
>>
>> [...]
>>
>> >> +#ifdef CONFIG_KASAN_GENERIC
>> >> +void __asan_load1(void *p);
>> >> +void __asan_store1(void *p);
>> >> +void __asan_load2(void *p);
>> >> +void __asan_store2(void *p);
>> >> +void __asan_load4(void *p);
>> >> +void __asan_store4(void *p);
>> >> +void __asan_load8(void *p);
>> >> +void __asan_store8(void *p);
>> >> +void __asan_load16(void *p);
>> >> +void __asan_store16(void *p);
>> >> +#endif /* CONFIG_KASAN_GENERIC */
>> >
>> > This looks ugly, let's not do this unless it's really required.
>> >
>> > You can just use kasan_check_read/write() instead - these are public
>> > wrappers around the same shadow memory checking functions. And they
>> > also work with the SW_TAGS mode, in case the BPF would want to use
>> > that mode at some point. (For HW_TAGS, we only have kasan_check_byte()
>> > that checks a single byte, but it can be extended in the future if
>> > required to be used by BPF.)
>>
>> ACK, I'll try to use those kasan_check_read and kasan_check_write rather
>> than __asan_{load,store}X.
>
> No. The performance penalty will be too high.

Since we are mentioning it, I have not yet considered any performance
comparison/benchmarking (and I am not really familiar with the usual bpf
performance validation practices for new bpf features). Is there any
existing test I should take a look at for this? Maybe some specific
benches in tools/testing/selftests/bpf/bench?

> hw_tags won't work without corresponding JIT work.
> I see no point sacrificing performance for aesthetics.
> __asan_load/storeX is what compilers emit.
> In that sense JIT is a compiler it should emit exactly the same.
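
For context on what a per-access check amounts to, below is a rough
userspace sketch of a generic-KASAN-style shadow lookup, assuming one
signed shadow byte per 8-byte granule (0 = fully accessible, 1..7 =
only that many leading bytes accessible, negative = poisoned). The
model_* names and the 64-byte region are hypothetical, and the sketch
ignores accesses that cross a granule boundary; it is not the kernel's
implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_GRANULE_SHIFT 3	/* 8 bytes of memory per shadow byte */
#define MODEL_GRANULE_MASK  ((1u << MODEL_GRANULE_SHIFT) - 1)
#define MODEL_MEM_SIZE      64	/* size of the modelled region */

/* One signed shadow byte per granule; zero-initialised (all valid). */
static int8_t model_shadow[MODEL_MEM_SIZE >> MODEL_GRANULE_SHIFT];

/*
 * Returns true if an access of `size` bytes at offset `off` is allowed.
 * Assumes size is 1, 2, 4 or 8 and the access stays within one granule.
 */
static bool model_access_ok(size_t off, size_t size)
{
	int8_t shadow;
	size_t last;

	if (off >= MODEL_MEM_SIZE)
		return false;

	shadow = model_shadow[off >> MODEL_GRANULE_SHIFT];
	if (shadow == 0)	/* whole granule accessible */
		return true;
	if (shadow < 0)		/* whole granule poisoned */
		return false;

	/* 1..7: only the first `shadow` bytes of the granule are valid */
	last = (off & MODEL_GRANULE_MASK) + size - 1;
	return last < (size_t)shadow;
}
```

The common case is a single shadow load and a compare against zero,
which is why the cost of anything wrapped around that check matters.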




-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply

* RE: [PATCH net-next v3 5/5] net: phy: Move phy_init_hw() from phy_resume() to __phy_resume()
From: Biju Das @ 2026-04-14 18:43 UTC (permalink / raw)
  To: Andrew Lunn, biju.das.au
  Cc: Heiner Kallweit, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Russell King, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Geert Uytterhoeven,
	Prabhakar Mahadev Lad, linux-renesas-soc@vger.kernel.org
In-Reply-To: <b3521be0-c5da-45ef-b6bd-64e4a7b97966@lunn.ch>

Hi Andrew,

> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: 14 April 2026 17:03
> Subject: Re: [PATCH net-next v3 5/5] net: phy: Move phy_init_hw() from phy_resume() to __phy_resume()
> 
> On Sun, Apr 12, 2026 at 03:00:27PM +0100, Biju wrote:
> > From: Biju Das <biju.das.jz@bp.renesas.com>
> >
> > Now that redundant locking has been removed from PHY driver callbacks,
> > phy_init_hw() can be called with phydev->lock held.
> >
> > Many MAC drivers and the phylink framework resume the PHY via
> > phy_start(), which invokes __phy_resume() directly without going
> > through phy_resume(). Keeping phy_init_hw() in phy_resume() means it
> > is not called in this path.
> >
> > Move phy_init_hw() into __phy_resume() so that PHY soft reset and
> > re-initialisation happen unconditionally on every resume, regardless
> > of which code path triggers it.
> 
> I would change the order of these patches. First remove the redundant locks. You can then put
> phy_init_hw() into __phy_resume(), rather than first moving it into phy_resume() and then
> __phy_resume().

Agreed.

Cheers,
Biju

^ permalink raw reply

* RE: [PATCH net-next v3 4/5] net: phy: microchip_t1: Replace phydev->lock with mdio_lock in lan937x_dsp_workaround()
From: Biju Das @ 2026-04-14 18:44 UTC (permalink / raw)
  To: Andrew Lunn, biju.das.au
  Cc: Arun Ramadoss, Heiner Kallweit, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, UNGLinuxDriver@microchip.com,
	Russell King, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Geert Uytterhoeven,
	Prabhakar Mahadev Lad, linux-renesas-soc@vger.kernel.org
In-Reply-To: <7924b6fa-8a8a-4a17-bb3d-40a9578a3f8a@lunn.ch>

Hi Andrew,

> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: 14 April 2026 17:09
> Subject: Re: [PATCH net-next v3 4/5] net: phy: microchip_t1: Replace phydev->lock with mdio_lock in
> lan937x_dsp_workaround()
> 
> > -	mutex_lock(&phydev->lock);
> > +	mutex_lock(&phydev->mdio.bus->mdio_lock);
> 
> phy_lock_mdio_bus(), and then phy_unlock_mdio_bus().

OK, will fix this.

Cheers,
Biju

^ permalink raw reply

* RE: [PATCH net-next v3 3/5] net: phy: mscc: Drop unnecessary phydev->lock
From: Biju Das @ 2026-04-14 18:45 UTC (permalink / raw)
  To: Russell King, biju.das.au
  Cc: Andrew Lunn, Heiner Kallweit, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Prabhakar Mahadev Lad,
	Horatiu Vultur, Vladimir Oltean, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Geert Uytterhoeven,
	linux-renesas-soc@vger.kernel.org
In-Reply-To: <ad59y5ZfJDKFo3eU@shell.armlinux.org.uk>

Hi Russell King,

> -----Original Message-----
> From: Russell King <linux@armlinux.org.uk>
> Sent: 14 April 2026 18:48
> Subject: Re: [PATCH net-next v3 3/5] net: phy: mscc: Drop unnecessary phydev->lock
> 
> On Sun, Apr 12, 2026 at 03:00:25PM +0100, Biju wrote:
> > @@ -486,15 +486,9 @@ static int vsc85xx_dt_led_modes_get(struct
> > phy_device *phydev,
> >
> >  static int vsc85xx_edge_rate_cntl_set(struct phy_device *phydev, u8
> > edge_rate)  {
> > -	int rc;
> > -
> > -	mutex_lock(&phydev->lock);
> > -	rc = phy_modify_paged(phydev, MSCC_PHY_PAGE_EXTENDED_2,
> > -			      MSCC_PHY_WOL_MAC_CONTROL, EDGE_RATE_CNTL_MASK,
> > -			      edge_rate << EDGE_RATE_CNTL_POS);
> > -	mutex_unlock(&phydev->lock);
> > -
> > -	return rc;
> > +	return phy_modify_paged(phydev, MSCC_PHY_PAGE_EXTENDED_2,
> > +				MSCC_PHY_WOL_MAC_CONTROL, EDGE_RATE_CNTL_MASK,
> > +				edge_rate << EDGE_RATE_CNTL_POS);
> 
> This one is fine.
> 
> > @@ -503,7 +497,6 @@ static int vsc85xx_mac_if_set(struct phy_device *phydev,
> >  	int rc;
> >  	u16 reg_val;
> >
> > -	mutex_lock(&phydev->lock);
> >  	reg_val = phy_read(phydev, MSCC_PHY_EXT_PHY_CNTL_1);
> >  	reg_val &= ~(MAC_IF_SELECTION_MASK);
> >  	switch (interface) {
> > @@ -522,17 +515,15 @@ static int vsc85xx_mac_if_set(struct phy_device *phydev,
> >  		break;
> >  	default:
> >  		rc = -EINVAL;
> > -		goto out_unlock;
> > +		goto err;
> >  	}
> >  	rc = phy_write(phydev, MSCC_PHY_EXT_PHY_CNTL_1, reg_val);
> 
> I would much rather this was converted to use phy_modify() as well so that we ensure that the update is
> atomic.
> 
> 	rc = phy_modify(phydev, MSCC_PHY_EXT_PHY_CNTL_1,
> 			MAC_IF_SELECTION_MASK, reg_val);

Agreed, will use phy_modify()
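
As a note for the conversion, the masked update itself can be modelled
as plain arithmetic. This is only a sketch: the MODEL_* values are a
hypothetical field layout, not the MSCC register map, and the model does
not show the MDIO bus locking that makes the real read-modify-write
atomic.

```c
#include <stdint.h>

/* Hypothetical field layout for illustration only. */
#define MODEL_IF_SEL_MASK  0x1800u
#define MODEL_IF_SEL_A     0x0800u
#define MODEL_IF_SEL_B     0x1000u

/* Clear the bits in `mask`, then OR in `set`; other bits are preserved. */
static uint16_t model_modify(uint16_t old, uint16_t mask, uint16_t set)
{
	return (uint16_t)((old & ~mask) | set);
}
```

With this form, `set` should carry only the new field bits, so reg_val
would presumably be built up from zero in the switch rather than from a
prior phy_read().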

Cheers,
Biju

^ permalink raw reply

* Re: [PATCH net] net: pse-pd: fix out-of-bounds bitmap access in pse_isr() on 32-bit
From: Oleksij Rempel @ 2026-04-14 18:47 UTC (permalink / raw)
  To: Kory Maincent
  Cc: Jakub Kicinski, netdev, linux-kernel, Carlo Szelinsky,
	thomas.petazzoni, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni
In-Reply-To: <20260414151331.745552-1-kory.maincent@bootlin.com>

Hi Kory,

On Tue, Apr 14, 2026 at 05:13:30PM +0200, Kory Maincent wrote:
> @@ -1340,6 +1341,11 @@ int devm_pse_irq_helper(struct pse_controller_dev *pcdev, int irq,
>  	if (!h->notifs)
>  		return -ENOMEM;
>  
> +	h->notifs_mask = devm_kcalloc(dev, BITS_TO_LONGS(pcdev->nr_lines),
> +				      sizeof(*h->notifs_mask), GFP_KERNEL);

Maybe it would be better to use devm_bitmap_zalloc() instead of
devm_kcalloc() here?
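
For anyone checking the sizing, the BITS_TO_LONGS() rounding can be
sketched in userspace as below. The model_* helpers are hypothetical and
take the size of long as a parameter so that 32-bit and 64-bit results
can be compared on one machine; the line count of 48 is an example
value, not something taken from a real controller.

```c
#include <limits.h>
#include <stddef.h>

/* Number of longs needed to hold nbits, rounding up. */
static size_t model_bits_to_longs(size_t nbits, size_t long_bytes)
{
	size_t bits_per_long = long_bytes * CHAR_BIT;

	return (nbits + bits_per_long - 1) / bits_per_long;
}

/* Bytes a bitmap of nbits occupies for a given sizeof(long). */
static size_t model_bitmap_bytes(size_t nbits, size_t long_bytes)
{
	return model_bits_to_longs(nbits, long_bytes) * long_bytes;
}
```

For 48 lines this gives two 4-byte longs on 32-bit and one 8-byte long
on 64-bit, so any code that touches only a single long covers every line
on 64-bit but not on 32-bit.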

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply

* Re: [PATCH net-next 0/4] WireGuard fixes for 7.1-rc1
From: patchwork-bot+netdevbpf @ 2026-04-14 18:50 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: netdev, kuba, pabeni
In-Reply-To: <20260414153944.2742252-1-Jason@zx2c4.com>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 14 Apr 2026 17:39:40 +0200 you wrote:
> Hi Jakub,
> 
> Please find 4 simple patches attached:
> 
> 1) Asbjørn's YNL sample, finally merged. Sorry for the wait on this one.
> 
> 2) A simplification to use kfree_rcu instead of call_rcu, since
>    kfree_rcu now works with kmem caches.
> 
> [...]

Here is the summary with links:
  - [net-next,1/4] wireguard: allowedips: Use kfree_rcu() instead of call_rcu()
    https://git.kernel.org/netdev/net-next/c/e5549aecdd24
  - [net-next,2/4] tools: ynl: add sample for wireguard
    https://git.kernel.org/netdev/net-next/c/121f416756d6
  - [net-next,3/4] wireguard: allowedips: remove redundant space
    https://git.kernel.org/netdev/net-next/c/f364db381c9d
  - [net-next,4/4] wireguard: device: use exit_rtnl callback instead of manual rtnl_lock in pre_exit
    https://git.kernel.org/netdev/net-next/c/60a25ef8dacb

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply
