Netdev List
* Re: [PATCH net v4 0/5] nfc: fix multiple OOB reads in NCI and LLCP parsing paths
From: David Heidelberg @ 2026-06-24 16:11 UTC
  To: Lekë Hapçiu, oe-linux-nfc
  Cc: davem, edumazet, kuba, pabeni, krzk, horms, linux-kernel, netdev
In-Reply-To: <20260424180151.3808557-1-snowwlake@icloud.com>

On 24/04/2026 20:01, Lekë Hapçiu wrote:
> This series fixes five out-of-bounds / underflow bugs in the kernel NFC
> stack.  All are reachable from a remote NFC peer that the local stack
> has already associated with; in the LLCP cases the peer only needs to
> send a malformed frame.
> 
>    1/5  nci: u8 underflow in nci_store_general_bytes_nfc_dep() lets the
>         attacker-controlled atr_res_len skip the GT-offset subtraction
>         and cause an OOB read/write against general_bytes[].
>    2/5  llcp: parse_gb_tlv() / parse_connection_tlv() trust the TLV
>         length byte without checking remaining buffer, and the tlv16
>         accessors read past the end when length < 2.
>    3/5  llcp: nfc_llcp_recv_snl() has the same TLV-length trust bug, and
>         its SDRES handler uses an unbounded "%.16s" pr_debug() that
>         walks past service_name_len.
>    4/5  llcp: nfc_llcp_recv_dm() reads skb->data[3] without checking
>         skb->len, giving a 1-byte heap OOB read.
>    5/5  llcp: nfc_llcp_connect_sn() walks the TLV array with no length
>         validation; a crafted CONNECT frame drops it into OOB reads /
>         an unbounded service-name pointer.
> 
> The series applies on top of net/main.
> 
> Lekë Hapçiu (5):
>    nfc: nci: fix u8 underflow in nci_store_general_bytes_nfc_dep
>    nfc: llcp: fix TLV parsing in parse_gb_tlv and parse_connection_tlv
>    nfc: llcp: fix TLV parsing OOB in nfc_llcp_recv_snl
>    nfc: llcp: fix OOB read of DM reason byte in nfc_llcp_recv_dm
>    nfc: llcp: fix TLV parsing OOB in nfc_llcp_connect_sn
> 
>   net/nfc/llcp_commands.c | 24 ++++++++++++++++++++++--
>   net/nfc/llcp_core.c     | 35 ++++++++++++++++++++++++++++++++---
>   net/nfc/nci/ntf.c       |  6 ++++++
>   3 files changed, 60 insertions(+), 5 deletions(-)
> 
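The fixes listed above share one pattern: every length supplied by the peer is checked against the bytes that actually remain before it is used, and unsigned subtractions are guarded before they can wrap. Below is a minimal userspace sketch of that pattern. It assumes the LLCP TLV layout of a 1-byte type, a 1-byte length and a big-endian value. The helper names are hypothetical and are not the functions touched by the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk a buffer of TLVs (1-byte type, 1-byte length, value). Every TLV
 * header and value must fit in the bytes that remain; a truncated
 * header or an oversized length byte rejects the whole buffer. */
static int tlv_walk(const uint8_t *buf, size_t len, size_t *count)
{
	size_t off = 0;

	*count = 0;
	while (off < len) {
		uint8_t tlv_len;

		if (len - off < 2)
			return -1;	/* no room for type + length */
		tlv_len = buf[off + 1];
		if (len - off - 2 < tlv_len)
			return -1;	/* value runs past the buffer */
		(*count)++;
		off += 2 + (size_t)tlv_len;
	}
	return 0;
}

/* 16-bit accessor: refuse values shorter than two bytes instead of
 * reading past the end of the TLV. */
static int tlv16_get(const uint8_t *val, uint8_t tlv_len, uint16_t *out)
{
	if (tlv_len < 2)
		return -1;
	*out = (uint16_t)((val[0] << 8) | val[1]);	/* big-endian */
	return 0;
}

/* Derive a general-bytes length from a peer-supplied total length.
 * Check before subtracting so a short total cannot wrap around, then
 * clamp to the destination size. */
static int gb_len(uint8_t total_len, uint8_t gt_offset, uint8_t max_len,
		  uint8_t *out)
{
	uint8_t len;

	if (total_len < gt_offset)
		return -1;
	len = total_len - gt_offset;
	*out = len < max_len ? len : max_len;
	return 0;
}
```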

Hello Lekë,

could you please rebase this series against the NFC for-linus branch [1]?

Some checks have likely been added in the meantime, but I would love to
get the remaining ones in!

Don't forget to also add the NFC mailing list oe-linux-nfc@lists.linux.dev.

Thank you
David


* Re: [PATCH v3 2/7] net: wwan: t9xx: Add control plane transaction layer
From: Andrew Lunn @ 2026-06-24 16:00 UTC
  To: jackbb_wu
  Cc: Loic Poulain, Sergey Ryazanov, Johannes Berg, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Wen-Zhi Huang, Shi-Wei Yeh, Minano Tseng, Matthias Brugger,
	AngeloGioacchino Del Regno, Simon Horman, Jonathan Corbet,
	Shuah Khan, linux-kernel, netdev, linux-arm-kernel,
	linux-mediatek, linux-doc
In-Reply-To: <20260624-t9xx_driver_v1-v3-2-73ff03f60c48@compal.com>

> +static int __init mtk_common_drv_init(void)
> +{
> +	return 0;
> +}
> +module_init(mtk_common_drv_init);
> +
> +static void __exit mtk_common_drv_exit(void)
> +{
> +}
> +module_exit(mtk_common_drv_exit);

Since these don't do anything, they should not be needed.

> @@ -467,6 +468,7 @@ static u32 mtk_pci_ext_h2d_evt_hw_bits(u32 chs)
>  
>  	SET_HW_BITS(hw_bits, chs, MHCCIF_RC2EP_EVT_DEVICE_RESET,
>  		    DEV_EVT_H2D_DEVICE_RESET);
> +
>  	return LE32_TO_U32(cpu_to_le32(hw_bits));

Please don't add white space like this. I assume a previous patch
added this code, so move this to that patch.

> @@ -908,13 +910,11 @@ static int mtk_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	struct mtk_md_dev *mdev;
>  	int ret;
>  
> -	mdev = devm_kzalloc(dev, sizeof(*mdev), GFP_KERNEL);
> +	mdev = mtk_dev_alloc(dev, &pci_hw_ops);
>  	if (!mdev) {
>  		ret = -ENOMEM;
>  		goto log_err;
>  	}
> -	mdev->dev_ops = &pci_hw_ops;
> -	mdev->dev = dev;
>  
>  	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
>  	if (!priv) {
> @@ -991,7 +991,7 @@ static int mtk_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  free_priv_data:
>  	devm_kfree(dev, priv);
>  free_cntx_data:
> -	devm_kfree(dev, mdev);
> +	mtk_dev_free(mdev);

Why are you removing devm_ calls?

	Andrew
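Andrew's question turns on what the devm_ variants provide: allocations are tied to the struct device and released automatically, newest first, when probe fails or the driver is unbound, so error paths need no explicit free. The following is a simplified userspace model of that behaviour, not the kernel devres API, and every name in it is hypothetical. It illustrates that a constructor like mtk_dev_alloc() could be layered on a devm-style allocator and keep those semantics, in which case the mtk_dev_free() call on the error path would be unnecessary.

```c
#include <stdlib.h>

/* One managed allocation, kept on a per-device LIFO list. */
struct model_res {
	struct model_res *next;
	void *ptr;
};

/* Stand-in for struct device: only tracks managed resources. */
struct model_dev {
	struct model_res *res;
	int nr_res;
};

/* devm_kzalloc()-like: zeroed memory whose lifetime follows @dev. */
static void *model_devm_zalloc(struct model_dev *dev, size_t size)
{
	struct model_res *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->ptr = calloc(1, size);
	if (!r->ptr) {
		free(r);
		return NULL;
	}
	r->next = dev->res;
	dev->res = r;
	dev->nr_res++;
	return r->ptr;
}

/* What the driver core does after a failed probe or on unbind:
 * release every managed resource, newest first. */
static void model_devres_release_all(struct model_dev *dev)
{
	while (dev->res) {
		struct model_res *r = dev->res;

		dev->res = r->next;
		free(r->ptr);
		free(r);
		dev->nr_res--;
	}
}

/* Hypothetical mtk_dev_alloc() equivalent built on the managed allocator. */
struct model_md_dev {
	struct model_dev *dev;
	const void *dev_ops;
};

static struct model_md_dev *model_mtk_dev_alloc(struct model_dev *dev,
						const void *ops)
{
	struct model_md_dev *mdev = model_devm_zalloc(dev, sizeof(*mdev));

	if (!mdev)
		return NULL;
	mdev->dev = dev;
	mdev->dev_ops = ops;
	return mdev;
}
```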


* Re: [PATCH net-next v5 1/4] dpll: add DPLL_PIN_TYPE_INT_NCO pin type
From: Vadim Fedorenko @ 2026-06-24 15:57 UTC
  To: Ivan Vecera, Kubalewski, Arkadiusz, Jiri Pirko, Jakub Kicinski
  Cc: netdev@vger.kernel.org, Jiri Pirko, David S. Miller,
	Donald Hunter, Eric Dumazet, Schmidt, Michal, Paolo Abeni,
	Vaananen, Pasi, Oros, Petr, Prathosh Satish, Simon Horman,
	linux-kernel@vger.kernel.org
In-Reply-To: <23e47140-f69f-451d-9154-29071130c11c@redhat.com>

On 19/06/2026 18:07, Ivan Vecera wrote:
> On 6/17/26 1:59 PM, Kubalewski, Arkadiusz wrote:
>>> From: Ivan Vecera <ivecera@redhat.com>
>>> Sent: Monday, June 15, 2026 2:00 PM
>>>
>>> On 6/11/26 2:09 PM, Jiri Pirko wrote:
>>>> Wed, Jun 10, 2026 at 05:45:46PM +0200, ivecera@redhat.com wrote:
>>>>> On 6/10/26 3:04 PM, Kubalewski, Arkadiusz wrote:
>>>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>>>> Sent: Tuesday, June 9, 2026 4:59 PM
>>>>>>>
>>>>>>> On 6/9/26 4:00 PM, Kubalewski, Arkadiusz wrote:
>>>>>>>>> From: Jiri Pirko <jiri@resnulli.us>
>>>>>>>>> Sent: Tuesday, June 9, 2026 10:51 AM
>>>>>>>>>
>>>>>>>>> Mon, Jun 08, 2026 at 07:03:46PM +0200,
>>>>>>>>> arkadiusz.kubalewski@intel.com
>>>>>>>>> wrote:
>>>>>>>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>>>>>>>> Sent: Monday, June 8, 2026 5:48 PM
>>>>>>>>>>>
>>>>>>>>>>> On 6/8/26 4:43 PM, Kubalewski, Arkadiusz wrote:
>>>>>>>>>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>>>>>>>>>> Sent: Sunday, May 31, 2026 9:44 PM ...
>>>>>>>>>>>>>            -
>>>>>>>>>>>>>              name: gnss
>>>>>>>>>>>>>              doc: GNSS recovered clock
>>>>>>>>>>>>> +      -
>>>>>>>>>>>>> +        name: int-nco
>>>>>>>>>>>>> +        doc: |
>>>>>>>>>>>>> +          Device internal numerically controlled oscillator.
>>>>>>>>>>>>> +          When connected as a DPLL input, the DPLL enters NCO mode
>>>>>>>>>>>>> +          where the output frequency is adjusted by the host via
>>>>>>>>>>>>> +          the PTP clock interface.
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Ivan!
>>>>>>>>>>>>
>>>>>>>>>>>> How would you control this in case of automatic mode dpll?
>>>>>>>>>>>> Automatic mode DPLL shall be controlled on HW level, such pin
>>>>>>>>>>>> breaks that rule and requires some driver magic to show it is
>>>>>>>>>>>> higher priority than the rest of the pins?
>>>>>>>>>>>
>>>>>>>>>>> The NCO pin can be connected only in manual mode. In other words
>>>>>>>>>>> a DPLL in automatic mode cannot select NCO pin (switch to NCO
>>>>>>>>>>> mode) by its own.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Being picky on DPLL_MODE for enabling feature is not something we
>>>>>>>>>> can allow if it is not related to HW limitation, is it?
>>>>>>>>>> Could you please elaborate why it is not possible for AUTOMATIC
>>>>>>>>>> mode?
>>>>>>>>>
>>>>>>>>> In automatic mode, the pin selection logic is defined upon prio. I
>>>>>>>>> can imagine that if NCO pin has the highest prio of the available
>>>>>>>>> ones, it gets picked. I would be aligned 100% with automatic mode
>>>>>>>>> behaviour.
>>>>>>>>> Is there a real usecase for it?
>>>>>>>>>
>>>>>>>>> [..]
>>>>>>>>
>>>>>>>> This is not true. AUTOMATIC mode is HW solution, SW driver ONLY
>>>>>>>> configures priorities on the inputs, not manages the active inputs.
>>>>>>>> This breaks that behavior, the SW driver would have to manually
>>>>>>>> override the AUTOMATIC mode to be fed from such NCO pin as it
>>>>>>>> doesn't exist on its priority list, HW cannot pick or use it.
>>>>>>>
>>>>>>> Correct, AUTO mode is hardware feature and it should not be emulated
>>>>>>> by a driver. If the hardware does not support it then the switching
>>>>>>> between input references should be done by userspace (by monitoring
>>>>>>> ffo, phase_offset, operstate).
>>>>>>>
>>>>>>
>>>>>> Yes, exactly, so for AUTOMATIC mode HW it will not be possible to
>>>>>> create such pin, which means that NCO pin would serve only a MANUAL
>>>>>> mode implementation.
>>>>>> Basically this is something we shall not allow to happen. DPLL API
>>>>>> should be designed to cover the case where AUTO mode is able to
>>>>>> implement all features consistently.
>>>>>
>>>>> If you don't like the proposal from Jiri (NCO switch driven by NCO pin
>>>>> priority -> highest==enter_nco else leave_nco) then it could be
>>>>> possible to handle the switching by allowing the state 'connected' in
>>>>> AUTO mode for the NCO pin type. Then the implementation will be the
>>>>> same for both selection modes.
>>>>>
>>>>> Only difference would be that a user does not need to switch the
>>>>> device from the AUTO to MANUAL mode.
>>>>>
>>>>>>>> The real use case is that any DPLL can switch the mode to this one
>>>>>>>> instead of implementing MANUAL mode just to use the feature with a
>>>>>>>> 'virtual' pin.
>>>>>>>
>>>>>>> I don't expect this... but it is up to a driver. I don't plan such
>>>>>>> functionality in zl3073x as the NCO pin does not expose prio_get()
>>>>>>> and prio_set() callbacks - so it is clear that this pin cannot be
>>>>>>> part of the automatic selection.
>>>>>>>
>>>>>>> Ivan
>>>>>>
>>>>>> There is a difference between particular HW and API capabilities,
>>>>>> with the proposed API we would disallow the possibility of such
>>>>>> implementation for existing HW variants.
>>>>>>
>>>>>> DPLL NCO MODE would allow that but as pointed here by Ivan and by
>>>>>> Jiri in the other email it would also require the extra
>>>>>> implementation for some configuration - device level phase/ffo
>>>>>> handling.
>>>>>>
>>>>>> To summarize it all, I don't have such simple solution for it.
>>>>>>
>>>>>> First thing that comes to my mind is to combine both approaches.
>>>>>> Make it possible for AUTOMATIC mode to also set "CONNECTED" state
>>>>>> on certain kind of "OVERRIDE" pins, where it could be determined by
>>>>>> the type of PIN and embed that logic into the DPLL subsystem.
>>>>>
>>>>> The possible states for particular pins are now handled at a driver
>>>>> level so the driver decides if the requested state is correct or not.
>>>>> So it could be easy to implement this.
>>>>>
>>>>> For auto mode allowed states:
>>>>> - input references: selectable / disconnected
>>>>> - nco pin: connected / disconnected
>>>>>
>>>>>> Basically, if driver registers such NCO pin it would be always
>>>>>> selected manually, and in such case all the other pins are going to
>>>>>> disconnected state while DPLL mode is also a "OVERRIDE" or something
>>>>>> like it.
>>>>>
>>>>> I would leave this decision on the driver level... Imagine the
>>>>> potential HW that would allow to switch NCO mode if there is no valid
>>>>> input reference.
>>>>>
>>>>> Example:
>>>>>
>>>>> REF0 (prio 0) -> +------+ -> OUT0
>>>>> REF1 (prio 1) -> | DPLL | -> ...
>>>>> NCO  (prio 2) -> +------+ -> OUTn
>>>>>
>>>>> Such HW would prefer REF0 or REF1 and lock to one of them if they are
>>>>> qualified. But if they are NOT, then it switches to NCO mode.
>>
>> Now you said yourself "NCO mode" ... I agree that it would be a mode in
>> that case. Where instead of running on regular/built in XO dpll would run
>> on NCO and user could select it, and this would be addition to regular
>> behavior.
>>
>> I also agree that the pin approach might be better/easier to use,
>> assuming frequency offset for all the outputs given dpll drives, it
>> makes more sense to have it configurable on input side.
> 
> +1
> 
>>>>>
>>>>> In this situation the relevant driver would allow to configure
>>>>> priority and state 'selectable' for this NCO pin.
>>>>>
>>>>>> Perhaps the pin type could include OVERRIDE in its name to make it
>>>>>> less confusing and needs some extra documentation.
>>>>>>
>>>>>> Thoughts?
>>>>> I think _INT_ is ok. In the case of TYPE_INT_OSCILLATOR it is also
>>>>> obvious that it is not a standard input reference.
>>>>>
>>>>> Jiri, Vadim, Arek, thoughts?
>>>>
>>>> I agree with you, the driver should have the flexibility to implement
>>>> this according to his/hw's needs/capabilities. If it implements prio
>>>> selection in AUTO mode, let it have it. If it implements manual NCO pin
>>>> selection in AUTO mode using connected/disconnected override, let it
>>>> have it.
>>
>> I don't know 'current' HW that is capable of using AUTO mode as a part of
>> HW-based priority source selection and use such NCO input..
>> But as already explained above, this is special mode of regular XO, which
>> allows DPLL's output frequency offset configuration.
> 
> Let's keep this available for potential future HW. I can imagine a
> situation where a user will prefer an automatic switch to NCO mode
> if there is no qualified input reference - automatic switch means
> that HW will support this (not emulated by the driver).
> 
>>>>
>>>> Moreover, I actually like the "override" capability for pins in AUTO
>>>> mode in general. It may be handy for other usecases as well.
>>>>
>>> Arek? Vadim?
>>>
>>> Thanks,
>>> Ivan
>>
>> Agree, 'override' capability of a pin would be the way to go for this and
>> other similar further cases.
>>
>> I believe a single approach on this would be best, I mean if AUTO mode
>> needs a capability, to switch from regular behavior to 'OVERRIDE', and
>> 'OVERRIDE' is only pin capability that allows such behavior for AUTO
>> mode, then similar approach should be used on MANUAL mode, to make
>> userspace know that such pin is always available to set "CONNECTED"
>> and make the userspace implementation consistent on enabling it no matter
>> if AUTO or MANUAL mode dpll.
> 
> Proposal:
> 1) new pin capability
>     - name: state-connected-override
>     - doc: pin state can be changed to connected in any DPLL mode
> 
> 2) new NCO pin type to switch the DPLL to NCO mode when connected
> 
> 3) automatic-only DPLL
>     - should expose NCO pin with state-connected-override capability
> 
> 4) manual-only DPLL
>    - does not need to expose NCO pin with state-connected-override cap
> 
> 5) dual-mode DPLL (supporting mode switching)
>    - if it exposes NCO pin with the override cap then it has to support
>      switching to NCO mode directly from AUTO mode
>    - if does not expose NCO pin with the override cap then a user MUST
>      switch the DPLL mode from AUTO to MANUAL to be able to make NCO
>      pin connected to the DPLL

I still don't see a good reason for the pin. Even this sentence says
"DPLL mode", which keeps me wondering whether we should move it to a
special DPLL mode. All these items look like an overcomplication of a
simple function of the device itself. A DPLL is either in closed loop,
when one of the pins provides a signal to align to, or in open loop,
meaning that software controls the phase/frequency adjustments.
But that is definitely a property of the device; it is not a pin of any
kind...
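Whichever way it is modelled, the open-loop case ends up as software-driven frequency adjustment. The int-nco documentation quoted earlier in this thread routes it through the PTP clock interface, whose adjfine() callback takes scaled_ppm: parts per million with a 16-bit binary fraction, so 1 ppm is 65536. A minimal sketch of that arithmetic, assuming the DPLL output simply scales by the requested offset. The helper names are hypothetical, and the kernel's own scaled_ppm_to_ppb() may round differently.

```c
#include <stdint.h>

/* scaled_ppm is ppm with a 16-bit binary fraction: 1 ppm == 65536.
 * ppb = scaled_ppm * 1000 / 65536 (truncating toward zero). */
static int64_t model_scaled_ppm_to_ppb(int64_t scaled_ppm)
{
	return scaled_ppm * 1000 / 65536;
}

/* Output frequency of an open-loop (NCO) DPLL for a nominal
 * frequency and an offset in parts per billion. */
static double model_nco_output_hz(double nominal_hz, int64_t ppb)
{
	return nominal_hz + nominal_hz * (double)ppb / 1e9;
}
```

For example, a 10 MHz output with scaled_ppm = 65536 (1 ppm, 1000 ppb) is steered to 10000010 Hz.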

> 
> Vadim, Jiri, Arek - thoughts?
> 
> Thanks,
> Ivan
> 
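To make Ivan's proposal concrete, here is a small model of the rules it describes. In AUTOMATIC mode a regular input may only be selectable or disconnected, a pin carrying the proposed state-connected-override capability may also be set to connected, and a connected override pin wins over priority-based selection. In MANUAL mode pins are only connected or disconnected. The model also replays the REF0/REF1/NCO example from the thread, where lower prio values are preferred. This is a sketch of the proposal under discussion, not the DPLL netlink API, and every identifier is hypothetical. Vadim argues instead that NCO operation should be a device-level property.

```c
#include <stdbool.h>
#include <stddef.h>

enum model_mode { MODEL_MODE_AUTOMATIC, MODEL_MODE_MANUAL };
enum model_state { MODEL_CONNECTED, MODEL_DISCONNECTED, MODEL_SELECTABLE };

struct model_pin {
	bool override_cap;	/* proposed state-connected-override */
	unsigned int prio;	/* lower value is preferred */
	bool qualified;		/* input signal currently usable */
	enum model_state state;
};

/* Would a request to put @pin into @req be accepted in @mode?
 * A real driver may still refuse, e.g. a pin without prio callbacks. */
static bool model_state_allowed(enum model_mode mode,
				const struct model_pin *pin,
				enum model_state req)
{
	if (req == MODEL_DISCONNECTED)
		return true;
	if (mode == MODEL_MODE_MANUAL)
		return req == MODEL_CONNECTED;
	/* AUTOMATIC: selectable is fine, connected needs the override cap. */
	if (req == MODEL_SELECTABLE)
		return true;
	return pin->override_cap;
}

/* AUTOMATIC-mode input choice: a connected override pin wins,
 * otherwise the qualified selectable pin with the lowest prio.
 * Returns the pin index, or -1 if nothing is usable. */
static int model_select_input(const struct model_pin *pins, size_t n)
{
	int best = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (pins[i].state == MODEL_CONNECTED)
			return (int)i;
		if (pins[i].state != MODEL_SELECTABLE || !pins[i].qualified)
			continue;
		if (best < 0 || pins[i].prio < pins[best].prio)
			best = (int)i;
	}
	return best;
}
```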



* Re: [PATCH v3 1/7] net: wwan: t9xx: Add PCIe core
From: Andrew Lunn @ 2026-06-24 15:56 UTC
  To: jackbb_wu
  Cc: Loic Poulain, Sergey Ryazanov, Johannes Berg, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Wen-Zhi Huang, Shi-Wei Yeh, Minano Tseng, Matthias Brugger,
	AngeloGioacchino Del Regno, Simon Horman, Jonathan Corbet,
	Shuah Khan, linux-kernel, netdev, linux-arm-kernel,
	linux-mediatek, linux-doc
In-Reply-To: <20260624-t9xx_driver_v1-v3-1-73ff03f60c48@compal.com>
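One detail worth checking in mtk_pci_setup_atr() in the quoted patch is the size encoding. After raising the window to at least MTK_PCI_MINIMUM_ATR_SIZE (0x1000), atr_size is ffs(size) - 2 (or ffs of the upper 32 bits plus 30), which equals log2(size) - 1 for a power-of-two size; the transparent case writes 0x3F instead. A minimal userspace sketch of the same computation, assuming power-of-two window sizes. The helper name is hypothetical, and the power-of-two check is an addition: the patch only checks that the addresses are aligned to size, a test that is meaningful only for power-of-two sizes.

```c
#include <stdint.h>

/* Hypothetical model of the ATR size field computed in
 * mtk_pci_setup_atr(): log2(size) - 1, i.e. ffs(size) - 2 for a
 * power-of-two size. Returns -1 for non-power-of-two sizes. */
static int model_atr_size_field(uint64_t size)
{
	int log2 = 0;

	if (size < 0x1000)
		size = 0x1000;		/* MTK_PCI_MINIMUM_ATR_SIZE */
	if (size & (size - 1))
		return -1;		/* extra check, not in the patch */
	while (size >>= 1)
		log2++;
	return log2 - 1;
}
```

With this encoding a 4 KiB window gives a field value of 11 and a 4 GiB window gives 31.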


> From: Jack Wu <jackbb_wu@compal.com>
> 
> Register the T900 device driver with the kernel and set up the
> fundamental configuration for the device: the PCIe layer, the
> Modem Host Cross Core Interface (MHCCIF), the Reset Generation
> Unit (RGU), modem common control operations and the build
> infrastructure.
> 
> * PCIe layer code implements driver probe and removal, MSI-X
>   interrupt initialization and de-initialization, and the way
>   of resetting the device.
> * MHCCIF provides interrupt channels to communicate events
>   such as handshake, PM and port enumeration.
> * RGU provides interrupt channels to generate notifications
>   from the device so that the T900 driver could get the
>   device reset.
> * Modem common control operations provide the basic read/write
>   functions of the device's hardware registers,
>   mask/unmask/get/clear functions of the device's interrupt
>   registers and inquiry functions of the device's status.
> 
> Signed-off-by: Jack Wu <jackbb_wu@compal.com>
> ---
>  drivers/net/wwan/Kconfig                      |   12 +
>  drivers/net/wwan/Makefile                     |    1 +
>  drivers/net/wwan/t9xx/Makefile                |   10 +
>  drivers/net/wwan/t9xx/mtk_dev.h               |  108 +++
>  drivers/net/wwan/t9xx/pcie/mtk_pci.c          | 1049 +++++++++++++++++++++++++
>  drivers/net/wwan/t9xx/pcie/mtk_pci.h          |  234 ++++++
>  drivers/net/wwan/t9xx/pcie/mtk_pci_drv_m9xx.c |   69 ++
>  drivers/net/wwan/t9xx/pcie/mtk_pci_reg.h      |   70 ++
>  8 files changed, 1553 insertions(+)
> 
> diff --git a/drivers/net/wwan/Kconfig b/drivers/net/wwan/Kconfig
> index 88df55d78d90..4cee537c739f 100644
> --- a/drivers/net/wwan/Kconfig
> +++ b/drivers/net/wwan/Kconfig
> @@ -121,6 +121,18 @@ config MTK_T7XX
>  
>  	  If unsure, say N.
>  
> +config MTK_T9XX
> +	tristate "MediaTek PCIe 5G WWAN modem T9xx device"
> +	depends on PCI
> +	select NET_DEVLINK
> +	help
> +	  Enables MediaTek PCIe based 5G WWAN modem (T9xx series) device.
> +
> +	  To compile this driver as a module, choose M here: the module will be
> +	  called mtk_t9xx.
> +
> +	  If unsure, say N.
> +
>  endif # WWAN
>  
>  endmenu
> diff --git a/drivers/net/wwan/Makefile b/drivers/net/wwan/Makefile
> index 3960c0ae2445..7361eef4c472 100644
> --- a/drivers/net/wwan/Makefile
> +++ b/drivers/net/wwan/Makefile
> @@ -14,3 +14,4 @@ obj-$(CONFIG_QCOM_BAM_DMUX) += qcom_bam_dmux.o
>  obj-$(CONFIG_RPMSG_WWAN_CTRL) += rpmsg_wwan_ctrl.o
>  obj-$(CONFIG_IOSM) += iosm/
>  obj-$(CONFIG_MTK_T7XX) += t7xx/
> +obj-$(CONFIG_MTK_T9XX) += t9xx/
> diff --git a/drivers/net/wwan/t9xx/Makefile b/drivers/net/wwan/t9xx/Makefile
> new file mode 100644
> index 000000000000..6f2dd3f91454
> --- /dev/null
> +++ b/drivers/net/wwan/t9xx/Makefile
> @@ -0,0 +1,10 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +ccflags-y += -I$(src)/pcie
> +ccflags-y += -I$(src)
> +
> +obj-$(CONFIG_MTK_T9XX) += mtk_t9xx.o
> +
> +mtk_t9xx-y := \
> +	pcie/mtk_pci.o \
> +	pcie/mtk_pci_drv_m9xx.o
> diff --git a/drivers/net/wwan/t9xx/mtk_dev.h b/drivers/net/wwan/t9xx/mtk_dev.h
> new file mode 100644
> index 000000000000..8278a0e2875e
> --- /dev/null
> +++ b/drivers/net/wwan/t9xx/mtk_dev.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0-only
> + *
> + * Copyright (c) 2022, MediaTek Inc.
> + */
> +
> +#ifndef __MTK_DEV_H__
> +#define __MTK_DEV_H__
> +
> +#include <linux/dma-mapping.h>
> +#include <linux/dmapool.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +
> +#define MTK_DEV_STR_LEN 16
> +
> +enum mtk_user_id {
> +	MTK_USER_MIN,
> +	MTK_USER_CTRL,
> +	MTK_USER_DATA,
> +	MTK_USER_MAX
> +};
> +
> +enum mtk_dev_evt_h2d {
> +	DEV_EVT_H2D_DEVICE_RESET	= BIT(2),
> +	DEV_EVT_H2D_MAX			= BIT(5)
> +};
> +
> +enum mtk_dev_evt_d2h {
> +	DEV_EVT_D2H_BOOT_FLOW_SYNC	= BIT(4),
> +	DEV_EVT_D2H_ASYNC_HS_NOTIFY_SAP = BIT(5),
> +	DEV_EVT_D2H_ASYNC_HS_NOTIFY_MD	= BIT(6),
> +	DEV_EVT_D2H_MAX			= BIT(11)
> +};
> +
> +struct mtk_md_dev;
> +
> +struct mtk_dev_ops {
> +	u32 (*get_dev_state)(struct mtk_md_dev *mdev);
> +	void (*ack_dev_state)(struct mtk_md_dev *mdev, u32 state);
> +	u32 (*get_dev_cfg)(struct mtk_md_dev *mdev);
> +	int (*register_dev_evt)(struct mtk_md_dev *mdev, u32 dev_evt,
> +				int (*evt_cb)(u32 status, void *data), void *data);
> +	void (*unregister_dev_evt)(struct mtk_md_dev *mdev, u32 dev_evt);
> +	void (*mask_dev_evt)(struct mtk_md_dev *mdev, u32 dev_evt);
> +	void (*unmask_dev_evt)(struct mtk_md_dev *mdev, u32 dev_evt);
> +	void (*clear_dev_evt)(struct mtk_md_dev *mdev, u32 dev_evt);
> +	int (*send_dev_evt)(struct mtk_md_dev *mdev, u32 dev_evt);
> +};
> +
> +/* mtk_md_dev defines the structure of MTK modem device */
> +struct mtk_md_dev {
> +	struct device *dev;
> +	const struct mtk_dev_ops *dev_ops;
> +	void *hw_priv;
> +	u32 hw_ver;
> +	char dev_str[MTK_DEV_STR_LEN];
> +};
> +
> +static inline u32 mtk_dev_get_dev_state(struct mtk_md_dev *mdev)
> +{
> +	return mdev->dev_ops->get_dev_state(mdev);
> +}
> +
> +static inline void mtk_dev_ack_dev_state(struct mtk_md_dev *mdev, u32 state)
> +{
> +	return mdev->dev_ops->ack_dev_state(mdev, state);
> +}
> +
> +static inline u32 mtk_dev_get_dev_cfg(struct mtk_md_dev *mdev)
> +{
> +	return mdev->dev_ops->get_dev_cfg(mdev);
> +}
> +
> +static inline int mtk_dev_register_dev_evt(struct mtk_md_dev *mdev, u32 dev_evt,
> +					   int (*evt_cb)(u32 status, void *data), void *data)
> +{
> +	return mdev->dev_ops->register_dev_evt(mdev, dev_evt, evt_cb, data);
> +}
> +
> +static inline void mtk_dev_unregister_dev_evt(struct mtk_md_dev *mdev, u32 dev_evt)
> +{
> +	mdev->dev_ops->unregister_dev_evt(mdev, dev_evt);
> +}
> +
> +static inline void mtk_dev_mask_dev_evt(struct mtk_md_dev *mdev, u32 dev_evt)
> +{
> +	mdev->dev_ops->mask_dev_evt(mdev, dev_evt);
> +}
> +
> +static inline void mtk_dev_unmask_dev_evt(struct mtk_md_dev *mdev, u32 dev_evt)
> +{
> +	mdev->dev_ops->unmask_dev_evt(mdev, dev_evt);
> +}
> +
> +static inline void mtk_dev_clear_dev_evt(struct mtk_md_dev *mdev, u32 dev_evt)
> +{
> +	mdev->dev_ops->clear_dev_evt(mdev, dev_evt);
> +}
> +
> +static inline int mtk_dev_send_dev_evt(struct mtk_md_dev *mdev, u32 dev_evt)
> +{
> +	return mdev->dev_ops->send_dev_evt(mdev, dev_evt);
> +}
> +
> +#endif /* __MTK_DEV_H__ */
> diff --git a/drivers/net/wwan/t9xx/pcie/mtk_pci.c b/drivers/net/wwan/t9xx/pcie/mtk_pci.c
> new file mode 100644
> index 000000000000..c6a7196fcdd6
> --- /dev/null
> +++ b/drivers/net/wwan/t9xx/pcie/mtk_pci.c
> @@ -0,0 +1,1049 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2022, MediaTek Inc.
> + */
> +
> +#include <linux/acpi.h>
> +#include <linux/aer.h>
> +#include <linux/bitfield.h>
> +#include <linux/debugfs.h>
> +#include <linux/delay.h>
> +#include <linux/device.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +
> +#include "mtk_dev.h"
> +#include "mtk_pci.h"
> +#include "mtk_pci_reg.h"
> +
> +#define MTK_PCI_BAR_NUM		6
> +#define MTK_PCI_TRANSPARENT_ATR_SIZE	(0x3F)
> +#define MTK_PCI_MINIMUM_ATR_SIZE	(0x1000)
> +#define ATR_SIZE_LO32_MASK		GENMASK_ULL(31, 0)
> +#define ATR_SIZE_HI32_MASK		GENMASK_ULL(63, 32)
> +#define ATR_SIZE_BIAS_FROM_LO32		2
> +#define ATR_ADDR_ALIGN_MASK		0xFFFFF000
> +#define ATR_EN				BIT(0)
> +#define ATR_PARAM_OFFSET		16
> +/* Delay between ACPI PXP._OFF and _ON for modem power cycle stabilization */
> +#define MTK_PLDR_POWER_OFF_DELAY_MS	500
> +#define LE32_TO_U32(x) ((__force u32)(__le32)(x))
> +#define SET_HW_BITS(dest, chs, mhccif, dev)		\
> +	({						\
> +		if ((chs) & (dev))					\
> +			(dest) |= FIELD_PREP(mhccif, 1);		\
> +	})
> +
> +struct mtk_mhccif_cb {
> +	struct list_head entry;
> +	int (*evt_cb)(u32 status, void *data);
> +	void *data;
> +	u32 chs;
> +};
> +
> +/**
> + * mtk_pci_setup_atr() - Configure a PCIe address translation rule
> + * @mdev: MTK MD device
> + * @cfg: ATR configuration parameters
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int mtk_pci_setup_atr(struct mtk_md_dev *mdev, struct mtk_atr_cfg *cfg)
> +{
> +	struct mtk_pci_priv *priv = mdev->hw_priv;
> +	u32 addr, val, size_h, size_l;
> +	int atr_size, pos, offset;
> +
> +	if (cfg->transparent) {
> +		/* No address conversion is performed */
> +		atr_size = MTK_PCI_TRANSPARENT_ATR_SIZE;
> +	} else {
> +		if (cfg->size < MTK_PCI_MINIMUM_ATR_SIZE)
> +			cfg->size = MTK_PCI_MINIMUM_ATR_SIZE;
> +
> +		if (cfg->src_addr & (cfg->size - 1)) {
> +			dev_err((mdev)->dev, "Invalid atr src addr is not aligned to size\n");
> +			return -EFAULT;
> +		}
> +
> +		if (cfg->trsl_addr & (cfg->size - 1)) {
> +			dev_err((mdev)->dev,
> +				"Invalid atr trsl addr is not aligned to size, %llx, %llx\n",
> +				cfg->trsl_addr, cfg->size - 1);
> +			return -EFAULT;
> +		}
> +
> +		size_l = FIELD_GET(ATR_SIZE_LO32_MASK, cfg->size);
> +		size_h = FIELD_GET(ATR_SIZE_HI32_MASK, cfg->size);
> +		pos = ffs(size_l);
> +		if (pos) {
> +			atr_size = pos - ATR_SIZE_BIAS_FROM_LO32;
> +		} else {
> +			pos = ffs(size_h);
> +			atr_size = pos + 32 - ATR_SIZE_BIAS_FROM_LO32;
> +		}
> +	}
> +
> +	/* Calculate table offset */
> +	offset = ATR_PORT_OFFSET * cfg->port + ATR_TABLE_OFFSET * cfg->table;
> +	addr = REG_ATR_PCIE_WIN0_T0_SRC_ADDR_MSB + offset;
> +	val = (u32)(cfg->src_addr >> 32);
> +	mtk_pci_mac_write32(priv, addr, val);
> +
> +	addr = REG_ATR_PCIE_WIN0_T0_SRC_ADDR_LSB + offset;
> +	val = (u32)(cfg->src_addr & ATR_ADDR_ALIGN_MASK) | (atr_size << 1) | ATR_EN;
> +	mtk_pci_mac_write32(priv, addr, val);
> +
> +	addr = REG_ATR_PCIE_WIN0_T0_TRSL_ADDR_MSB + offset;
> +	val = (u32)(cfg->trsl_addr >> 32);
> +	mtk_pci_mac_write32(priv, addr, val);
> +
> +	addr = REG_ATR_PCIE_WIN0_T0_TRSL_ADDR_LSB + offset;
> +	val = (u32)(cfg->trsl_addr & ATR_ADDR_ALIGN_MASK);
> +	mtk_pci_mac_write32(priv, addr, val);
> +
> +	/* TRSL_PARAM */
> +	addr = REG_ATR_PCIE_WIN0_T0_TRSL_PARAM + offset;
> +	val = (cfg->trsl_param << ATR_PARAM_OFFSET) | cfg->trsl_id;
> +	mtk_pci_mac_write32(priv, addr, val);
> +
> +	return 0;
> +}
> +
> +/**
> + * mtk_pci_atr_disable() - Disable all PCIe address translation rules
> + * @priv: MTK PCI private data
> + */
> +void mtk_pci_atr_disable(struct mtk_pci_priv *priv)
> +{
> +	int port, tbl, offset;
> +	u32 val;
> +
> +	/* Disable all ATR table for all ports */
> +	for (port = ATR_SRC_PCI_WIN0; port <= ATR_SRC_AXIS_3; port++)
> +		for (tbl = 0; tbl < ATR_TABLE_NUM_PER_ATR; tbl++) {
> +			/* Calculate table offset */
> +			offset = ATR_PORT_OFFSET * port + ATR_TABLE_OFFSET * tbl;
> +			val = mtk_pci_mac_read32(priv, REG_ATR_PCIE_WIN0_T0_SRC_ADDR_LSB + offset);
> +			val = val & (~BIT(0));
> +			/* Disable table by SRC_ADDR_L */
> +			mtk_pci_mac_write32(priv, REG_ATR_PCIE_WIN0_T0_SRC_ADDR_LSB + offset, val);
> +		}
> +}
> +
> +static void mtk_pci_set_msix_merged(struct mtk_pci_priv *priv, int irq_cnt)
> +{
> +	mtk_pci_mac_write32(priv, REG_PCIE_CFG_MSIX, ffs(irq_cnt) * 2 - 1);
> +}
> +
> +/**
> + * mtk_pci_get_dev_state() - Read the device state from the modem
> + * @mdev: MTK MD device
> + *
> + * Return: Device state value.
> + */
> +u32 mtk_pci_get_dev_state(struct mtk_md_dev *mdev)
> +{
> +	return mtk_pci_mac_read32(mdev->hw_priv, REG_PCIE_DEBUG_DUMMY_7);
> +}
> +
> +/**
> + * mtk_pci_ack_dev_state() - Acknowledge the device state to the modem
> + * @mdev: MTK MD device
> + * @state: State value to acknowledge
> + */
> +void mtk_pci_ack_dev_state(struct mtk_md_dev *mdev, u32 state)
> +{
> +	mtk_pci_mac_write32(mdev->hw_priv, REG_PCIE_DEBUG_DUMMY_7, state);
> +}
> +
> +/**
> + * mtk_pci_get_irq_id() - Map an IRQ source to its hardware IRQ ID
> + * @mdev: MTK MD device
> + * @irq_src: IRQ source enum
> + *
> + * Return: IRQ ID on success, -EINVAL on failure.
> + */
> +int mtk_pci_get_irq_id(struct mtk_md_dev *mdev, enum mtk_irq_src irq_src)
> +{
> +	struct mtk_pci_priv *priv = mdev->hw_priv;
> +	const int *irq_tbl = priv->cfg->irq_tbl;
> +	int irq_id = -EINVAL;
> +
> +	if (irq_src > MTK_IRQ_SRC_MIN && irq_src < MTK_IRQ_SRC_MAX) {
> +		irq_id = irq_tbl[irq_src];
> +		if (irq_id < 0 || irq_id >= MTK_IRQ_CNT_MAX)
> +			irq_id = -EINVAL;
> +	}
> +
> +	return irq_id;
> +}
> +
> +/**
> + * mtk_pci_get_virq_id() - Get the Linux virtual IRQ for a hardware IRQ ID
> + * @mdev: MTK MD device
> + * @irq_id: Hardware IRQ ID
> + *
> + * Return: Virtual IRQ number on success, negative error code on failure.
> + */
> +int mtk_pci_get_virq_id(struct mtk_md_dev *mdev, int irq_id)
> +{
> +	struct pci_dev *pdev = to_pci_dev(mdev->dev);
> +	struct mtk_pci_priv *priv = mdev->hw_priv;
> +
> +	if (!priv->irq_cnt || irq_id < 0)
> +		return -EINVAL;
> +
> +	return pci_irq_vector(pdev, irq_id % priv->irq_cnt);
> +}
> +
> +/**
> + * mtk_pci_register_irq() - Register a callback for a hardware IRQ
> + * @mdev: MTK MD device
> + * @irq_id: Hardware IRQ ID
> + * @irq_cb: Callback function
> + * @data: Private data passed to callback
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int mtk_pci_register_irq(struct mtk_md_dev *mdev, int irq_id,
> +			 int (*irq_cb)(int irq_id, void *data), void *data)
> +{
> +	struct mtk_pci_priv *priv = mdev->hw_priv;
> +
> +	if ((irq_id < 0 || irq_id >= MTK_IRQ_CNT_MAX) || !irq_cb)
> +		return -EINVAL;
> +
> +	if (priv->irq_cb_list[irq_id]) {
> +		dev_err((mdev)->dev,
> +			"Unable to register irq, irq_id=%d, it's already been register by %ps.\n",
> +			irq_id, priv->irq_cb_list[irq_id]);
> +		return -EFAULT;
> +	}
> +	priv->irq_cb_list[irq_id] = irq_cb;
> +	priv->irq_cb_data[irq_id] = data;
> +
> +	return 0;
> +}
> +
> +/**
> + * mtk_pci_unregister_irq() - Unregister a hardware IRQ callback
> + * @mdev: MTK MD device
> + * @irq_id: Hardware IRQ ID
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int mtk_pci_unregister_irq(struct mtk_md_dev *mdev, int irq_id)
> +{
> +	struct mtk_pci_priv *priv = mdev->hw_priv;
> +
> +	if (irq_id < 0 || irq_id >= MTK_IRQ_CNT_MAX)
> +		return -EINVAL;
> +
> +	if (!priv->irq_cb_list[irq_id]) {
> +		dev_err((mdev)->dev, "irq_id=%d has not been registered\n", irq_id);
> +		return -EFAULT;
> +	}
> +	priv->irq_cb_list[irq_id] = NULL;
> +	priv->irq_cb_data[irq_id] = NULL;
> +
> +	return 0;
> +}
> +
> +/**
> + * mtk_pci_mask_irq() - Mask (disable) a hardware IRQ
> + * @mdev: MTK MD device
> + * @irq_id: Hardware IRQ ID
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int mtk_pci_mask_irq(struct mtk_md_dev *mdev, int irq_id)
> +{
> +	struct mtk_pci_priv *priv = mdev->hw_priv;
> +
> +	if (irq_id < 0 || irq_id >= MTK_IRQ_CNT_MAX ||
> +	    priv->irq_type != PCI_IRQ_MSIX) {
> +		dev_err(mdev->dev, "Failed to mask irq: input irq_id=%d\n", irq_id);
> +		return -EINVAL;
> +	}
> +
> +	mtk_pci_mac_write32(priv, REG_IMASK_HOST_MSIX_CLR_GRP0_0, BIT(irq_id));
> +
> +	return 0;
> +}
> +
> +/**
> + * mtk_pci_unmask_irq() - Unmask (enable) a hardware IRQ
> + * @mdev: MTK MD device
> + * @irq_id: Hardware IRQ ID
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int mtk_pci_unmask_irq(struct mtk_md_dev *mdev, int irq_id)
> +{
> +	struct mtk_pci_priv *priv = mdev->hw_priv;
> +
> +	if (irq_id < 0 || irq_id >= MTK_IRQ_CNT_MAX ||
> +	    priv->irq_type != PCI_IRQ_MSIX) {
> +		dev_err(mdev->dev, "Failed to unmask irq: input irq_id=%d\n", irq_id);
> +		return -EINVAL;
> +	}
> +
> +	mtk_pci_mac_write32(priv, REG_IMASK_HOST_MSIX_SET_GRP0_0, BIT(irq_id));
> +
> +	return 0;
> +}
> +
> +/**
> + * mtk_pci_clear_irq() - Clear (acknowledge) a hardware IRQ
> + * @mdev: MTK MD device
> + * @irq_id: Hardware IRQ ID
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int mtk_pci_clear_irq(struct mtk_md_dev *mdev, int irq_id)
> +{
> +	struct mtk_pci_priv *priv = mdev->hw_priv;
> +
> +	if (irq_id < 0 || irq_id >= MTK_IRQ_CNT_MAX ||
> +	    priv->irq_type != PCI_IRQ_MSIX) {
> +		dev_err(mdev->dev, "Failed to clear irq: input irq_id=%d\n", irq_id);
> +		return -EINVAL;
> +	}
> +
> +	mtk_pci_mac_write32(priv, REG_MSIX_ISTATUS_HOST_GRP0_0, BIT(irq_id));
> +
> +	return 0;
> +}
> +
> +static u32 mtk_pci_ext_d2h_evt_hw_bits(u32 chs)
> +{
> +	u32 hw_bits = 0;
> +
> +	SET_HW_BITS(hw_bits, chs, MHCCIF_EP2RC_EVT_BOOT_FLOW_SYNC,
> +		    DEV_EVT_D2H_BOOT_FLOW_SYNC);
> +	SET_HW_BITS(hw_bits, chs, MHCCIF_EP2RC_EVT_ASYNC_HS_NOTIFY_SAP,
> +		    DEV_EVT_D2H_ASYNC_HS_NOTIFY_SAP);
> +	SET_HW_BITS(hw_bits, chs, MHCCIF_EP2RC_EVT_ASYNC_HS_NOTIFY_MD,
> +		    DEV_EVT_D2H_ASYNC_HS_NOTIFY_MD);
> +
> +	return LE32_TO_U32(cpu_to_le32(hw_bits));
> +}
> +
> +static u32 mtk_pci_ext_d2h_evt_chs(u32 hw_bits)
> +{
> +	u32 chs = 0;
> +
> +	if (!hw_bits)
> +		return chs;
> +
> +	chs = FIELD_PREP(DEV_EVT_D2H_BOOT_FLOW_SYNC,
> +			 FIELD_GET(MHCCIF_EP2RC_EVT_BOOT_FLOW_SYNC, hw_bits)) |
> +	      FIELD_PREP(DEV_EVT_D2H_ASYNC_HS_NOTIFY_SAP,
> +			 FIELD_GET(MHCCIF_EP2RC_EVT_ASYNC_HS_NOTIFY_SAP, hw_bits)) |
> +	      FIELD_PREP(DEV_EVT_D2H_ASYNC_HS_NOTIFY_MD,
> +			 FIELD_GET(MHCCIF_EP2RC_EVT_ASYNC_HS_NOTIFY_MD, hw_bits));
> +
> +	return chs;
> +}
> +
> +/**
> + * mtk_pci_register_ext_evt() - Register a callback for MHCCIF device events
> + * @mdev: MTK MD device
> + * @chs: Bitmask of event channels to register
> + * @evt_cb: Callback function
> + * @data: Private data passed to callback
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int mtk_pci_register_ext_evt(struct mtk_md_dev *mdev, u32 chs,
> +			     int (*evt_cb)(u32 status, void *data), void *data)
> +{
> +	struct mtk_pci_priv *priv = mdev->hw_priv;
> +	struct mtk_mhccif_cb *cb;
> +	int ret = 0;
> +
> +	if (!chs || !evt_cb)
> +		return -EINVAL;
> +
> +	spin_lock_bh(&priv->mhccif_lock);
> +	list_for_each_entry(cb, &priv->mhccif_cb_list, entry) {
> +		if (cb->chs & chs) {
> +			ret = -EFAULT;
> +			dev_err((mdev)->dev,
> +				"Unable to register evt, intersection: chs=0x%08x&0x%08x cb=%ps\n",
> +				chs, cb->chs, cb->evt_cb);
> +			goto err_spin_unlock;
> +		}
> +	}
> +	cb = devm_kzalloc(mdev->dev, sizeof(*cb), GFP_ATOMIC);
> +	if (!cb) {
> +		ret = -ENOMEM;
> +		goto err_spin_unlock;
> +	}
> +	cb->evt_cb = evt_cb;
> +	cb->data = data;
> +	cb->chs = chs;
> +	list_add_tail(&cb->entry, &priv->mhccif_cb_list);
> +err_spin_unlock:
> +	spin_unlock_bh(&priv->mhccif_lock);
> +
> +	return ret;
> +}
> +
> +/**
> + * mtk_pci_unregister_ext_evt() - Unregister an MHCCIF device event callback
> + * @mdev: MTK MD device
> + * @chs: Bitmask of event channels to unregister
> + */
> +void mtk_pci_unregister_ext_evt(struct mtk_md_dev *mdev, u32 chs)
> +{
> +	struct mtk_pci_priv *priv = mdev->hw_priv;
> +	struct mtk_mhccif_cb *cb, *next;
> +
> +	if (!chs)
> +		return;
> +
> +	spin_lock_bh(&priv->mhccif_lock);
> +	list_for_each_entry_safe(cb, next, &priv->mhccif_cb_list, entry) {
> +		if (cb->chs == chs) {
> +			list_del(&cb->entry);
> +			devm_kfree(mdev->dev, cb);
> +			goto out;
> +		}
> +	}
> +	dev_warn((mdev)->dev,
> +		 "Unable to unregister evt, no chs=0x%08x has been registered.\n", chs);
> +out:
> +	spin_unlock_bh(&priv->mhccif_lock);
> +}
> +
> +/**
> + * mtk_pci_mask_ext_evt() - Mask (disable) MHCCIF device events
> + * @mdev: MTK MD device
> + * @chs: Bitmask of event channels to mask
> + */
> +void mtk_pci_mask_ext_evt(struct mtk_md_dev *mdev, u32 chs)
> +{
> +	struct mtk_pci_priv *priv = mdev->hw_priv;
> +	u32 hw_bits = mtk_pci_ext_d2h_evt_hw_bits(chs);
> +
> +	mtk_pci_write32(mdev, priv->cfg->mhccif_rc_base_addr +
> +			MHCCIF_EP2RC_SW_INT_EAP_MASK_SET, hw_bits);
> +}
> +
> +/**
> + * mtk_pci_unmask_ext_evt() - Unmask (enable) MHCCIF device events
> + * @mdev: MTK MD device
> + * @chs: Bitmask of event channels to unmask
> + */
> +void mtk_pci_unmask_ext_evt(struct mtk_md_dev *mdev, u32 chs)
> +{
> +	struct mtk_pci_priv *priv = mdev->hw_priv;
> +	u32 hw_bits = mtk_pci_ext_d2h_evt_hw_bits(chs);
> +
> +	mtk_pci_write32(mdev, priv->cfg->mhccif_rc_base_addr +
> +			MHCCIF_EP2RC_SW_INT_EAP_MASK_CLR, hw_bits);
> +}
> +
> +/**
> + * mtk_pci_clear_ext_evt() - Clear (acknowledge) MHCCIF device events
> + * @mdev: MTK MD device
> + * @chs: Bitmask of event channels to clear
> + */
> +void mtk_pci_clear_ext_evt(struct mtk_md_dev *mdev, u32 chs)
> +{
> +	struct mtk_pci_priv *priv = mdev->hw_priv;
> +	u32 hw_bits = mtk_pci_ext_d2h_evt_hw_bits(chs);
> +
> +	mtk_pci_write32(mdev, priv->cfg->mhccif_rc_base_addr +
> +			MHCCIF_EP2RC_SW_INT_ACK, hw_bits);
> +}
> +
> +static u32 mtk_pci_ext_h2d_evt_hw_bits(u32 chs)
> +{
> +	u32 hw_bits = 0;
> +
> +	SET_HW_BITS(hw_bits, chs, MHCCIF_RC2EP_EVT_DEVICE_RESET,
> +		    DEV_EVT_H2D_DEVICE_RESET);
> +	return LE32_TO_U32(cpu_to_le32(hw_bits));
> +}
> +
> +/**
> + * mtk_pci_send_ext_evt() - Send an MHCCIF event to the modem
> + * @mdev: MTK MD device
> + * @ch: Event channel to trigger (must be a single bit)
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int mtk_pci_send_ext_evt(struct mtk_md_dev *mdev, u32 ch)
> +{
> +	struct mtk_pci_priv *priv = mdev->hw_priv;
> +	u32 rc_base, hw_bits;
> +
> +	rc_base = priv->cfg->mhccif_rc_base_addr;
> +
> +	/* Only allow one ch to be triggered at a time */
> +	if (!is_power_of_2(ch)) {
> +		dev_err((mdev)->dev, "Unsupported ext evt ch=0x%08x\n", ch);
> +		return -EINVAL;
> +	}
> +
> +	hw_bits = mtk_pci_ext_h2d_evt_hw_bits(ch);
> +	mtk_pci_write32(mdev, rc_base + MHCCIF_RC2EP_SW_BSY, hw_bits);
> +	mtk_pci_write32(mdev, rc_base + MHCCIF_RC2EP_SW_TCHNUM, ffs(hw_bits) - 1);
> +	return 0;
> +}
> +
> +static u32 mtk_pci_get_ext_evt_hw_status(struct mtk_md_dev *mdev)
> +{
> +	struct mtk_pci_priv *priv = mdev->hw_priv;
> +
> +	return mtk_pci_read32(mdev, priv->cfg->mhccif_rc_base_addr +
> +			      MHCCIF_EP2RC_SW_INT_STS);
> +}
> +
> +/**
> + * mtk_pci_fldr() - Perform a Function Level Device Reset via ACPI _RST
> + * @mdev: MTK MD device
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int mtk_pci_fldr(struct mtk_md_dev *mdev)
> +{
> +#ifdef CONFIG_ACPI

...

> +#else /* !CONFIG_ACPI */
> +	dev_err((mdev)->dev, "Unsupported, CONFIG ACPI hasn't been set to 'y'\n");

Why not just have the Kconfig depend on ACPI?

> +	if (ret) {
> +		dev_err((mdev)->dev, "Failed to register mhccif_irq callback\n");

Why the () around mdev?


    Andrew

---
pw-bot: cr

^ permalink raw reply

* Re: [RFC net-next 08/15] ipxlat: add translation engine and dispatch core
From: Ralf Lici @ 2026-06-24 15:43 UTC (permalink / raw)
  To: Beniamino Galvani
  Cc: Toke Høiland-Jørgensen, netdev, Daniel Gröber,
	Antonio Quartulli, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel
In-Reply-To: <ajo-MgpZ8jA8YDwi@tp>

On Tue, 23 Jun 2026 10:05:06 +0200, Beniamino Galvani <bgalvani@redhat.com> wrote:
> On Mon, Jun 22, 2026 at 05:56:11PM +0200, Ralf Lici wrote:
> > While reading the BPF programs, two things stood out that would help
> > shape v2. On addressing, both implementations use a single NAT64/PLAT
> > prefix for destinations plus an explicit local_v4<>local_v6 mapping for
> > the host itself. ipxlat today maps both source and destination through
> > one RFC 6052 prefix, so this suggests v2 should probably support
> > explicit 1:1 address mappings (EAM, RFC 7757) alongside prefix
> > embedding. Is a single local mapping enough for your case, or do you
> > foresee needing several?
>
> Based on these:
>
>   https://datatracker.ietf.org/doc/html/rfc6877#section-6.3
>   https://www.ietf.org/archive/id/draft-ietf-v6ops-claton-14.html#name-obtaining-clat-ipv6-address
>
> there are two cases to consider for CLAT.
>
> If the node doing CLAT extends the IPv4 connectivity downstream, it
> should acquire a dedicated prefix via DHCPv6-PD for the downstream
> network, and use this prefix to generate IPv4-Embedded IPv6 Addresses
> (RFC 6052 2.2) for downlink hosts. In this case, the ipxlat device
> would need to be configured with two prefixes: one is the NAT64 prefix
> used to synthesize IPv6 destinations for IPv4 Internet addresses, and
> the other is the delegated prefix used to synthesize IPv6 source
> addresses for hosts on the downstream IPv4 network. Ideally, the
> ipxlat device should also be aware of the address range of the
> downstream IPv4 network, and check that the source addresses of
> packets belong to that network.
>
> If the node doing CLAT does not extend IPv4 connectivity downstream
> (or it does, but it also uses NAT44), the "downstream network" is
> basically just one host. The CLAT only needs to map a single
> locally-generated IPv4 address (usually in the 192.0.0.0/29 range) to
> a single SLAAC IPv6 address reserved for the CLAT. In this case, the
> ipxlat device would need to know the CLAT IPv4 address, the CLAT IPv6
> address and the NAT64 prefix.
>
> Currently, NetworkManager only uses a single IPv4 address and doesn't
> request a dedicated prefix via DHCPv6-PD for the CLAT. If it needs to
> provide downstream connectivity, it does IPv4 masquerading so that the
> traffic originates from a single IPv4 address. I think the ipxlat
> implementation should also support the delegated-prefix case, as this
> architecture is described in the standards.
>

I see. Your CLAT breakdown makes it clear that the single-prefix model
in this RFC is too narrow for v2. What I am currently thinking is to
shape the config as a far-side prefix plus an optional near-side map.
The two CLAT cases can then be represented roughly as:

    remote-prefix6 64:ff9b::/96
    local-map      192.0.0.1/32 2001:db8:1:2::1234/128

for the single-host CLAT case (which is an explicit 1:1 mapping), and:

    remote-prefix6 64:ff9b::/96
    local-map      192.0.2.0/24 2001:db8:100:64::/96

for the downstream/delegated-prefix case.

The idea is that remote-prefix6 is the RFC 6052 prefix used for the
NAT64/PLAT side, while local-map optionally describes the near-side
mapping. If local-map is omitted, the translator falls back to the
symmetric v1 behavior where the same RFC 6052 prefix is used for both
source and destination.

On the source-range check you mentioned: it falls out of the near-side
map for free. In the 4->6 direction the source is resolved through
local-map only, so a source outside its IPv4 domain (the /32 or the /24
above) simply has no near-side mapping and is dropped, so no separate
anti-spoofing knob should be needed. Anything broader than "is this a
valid near-side source" I'd leave to routing/nft rather than bake into
the translator.
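A minimal sketch of how a remote-prefix6/local-map split could drive 4->6 address synthesis, assuming a /96 RFC 6052 prefix (the IPv4 address occupies the last 32 bits) and a local-map that is either a /32 to /128 1:1 mapping or an IPv4 prefix mapped into a /96. The struct and function names are hypothetical, not the ipxlat uAPI; the sketch only shows how a source outside the near-side map simply has no translation and gets dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical config mirroring "remote-prefix6" + optional "local-map". */
struct xlat_local_map {
	uint32_t v4_net;	/* host byte order, e.g. 192.0.2.0 */
	unsigned int v4_plen;	/* 32 for the 1:1 case, e.g. 24 otherwise */
	uint8_t v6[16];		/* /128 address or /96 prefix */
	unsigned int v6_plen;	/* 128 or 96 in this sketch */
};

struct xlat_cfg {
	uint8_t remote_prefix6[16];	/* RFC 6052 /96 prefix, e.g. 64:ff9b::/96 */
	bool has_local_map;
	struct xlat_local_map local_map;
};

/* RFC 6052 embedding for a /96 prefix: IPv4 address in bits 96..127. */
static void embed_v4_in_96(const uint8_t prefix[16], uint32_t v4, uint8_t out[16])
{
	memcpy(out, prefix, 12);
	out[12] = (uint8_t)(v4 >> 24);
	out[13] = (uint8_t)(v4 >> 16);
	out[14] = (uint8_t)(v4 >> 8);
	out[15] = (uint8_t)v4;
}

static uint32_t v4_mask(unsigned int plen)
{
	assert(plen <= 32);
	return plen ? 0xffffffffu << (32 - plen) : 0;
}

/* Destination: always synthesized from the far-side (NAT64/PLAT) prefix. */
static void xlat_dst_4to6(const struct xlat_cfg *cfg, uint32_t dst, uint8_t out[16])
{
	embed_v4_in_96(cfg->remote_prefix6, dst, out);
}

/* Source: resolved through local-map only; returns false (drop) when the
 * source lies outside its IPv4 domain. Without a local-map, fall back to
 * the symmetric v1 behaviour and reuse the remote prefix. */
static bool xlat_src_4to6(const struct xlat_cfg *cfg, uint32_t src, uint8_t out[16])
{
	const struct xlat_local_map *m = &cfg->local_map;
	uint32_t mask;

	if (!cfg->has_local_map) {
		embed_v4_in_96(cfg->remote_prefix6, src, out);
		return true;
	}

	mask = v4_mask(m->v4_plen);
	if ((src & mask) != (m->v4_net & mask))
		return false;			/* no near-side mapping */

	if (m->v4_plen == 32 && m->v6_plen == 128) {
		memcpy(out, m->v6, 16);		/* explicit 1:1 (EAM-style) */
		return true;
	}
	if (m->v6_plen == 96) {
		embed_v4_in_96(m->v6, src, out);
		return true;
	}
	return false;			/* shapes this sketch does not handle */
}
```

With this shape, the anti-spoofing behaviour is just the prefix-match failure in xlat_src_4to6(); no extra knob is involved.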

Names and exact attribute shape are still WIP, but does this capture the
two CLAT cases you had in mind?

> > On the consumer side, is there anything in how NM models a connection
> > that would make a particular kernel model awkward to drive, e.g. needing
> > to attach to an already-managed interface, or conversely being able to
> > create and own a dedicated device? We're still settling the
> > kernel-facing model for v2, so consumer input here is genuinely
> > valuable.
>
> Any of the solutions mentioned in the thread (dedicated device,
> netfilter, LWT) would be fine from NetworkManager's point of
> view. Compared to what we are doing now, they would be a great
> simplification ;)
>

Nice, thanks for confirming!

-- 
Ralf Lici
Mandelbit Srl

* Re: [PATCH net] dt-bindings: net: renesas,ether: Drop example "ethernet-phy-ieee802.3-c22" fallback
From: Andrew Lunn @ 2026-06-24 15:47 UTC (permalink / raw)
  To: Rob Herring (Arm)
  Cc: Niklas Söderlund, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Krzysztof Kozlowski, Conor Dooley,
	Geert Uytterhoeven, Magnus Damm, Sergei Shtylyov, netdev,
	linux-renesas-soc, devicetree, linux-kernel
In-Reply-To: <20260624150250.131966-2-robh@kernel.org>

On Wed, Jun 24, 2026 at 10:02:50AM -0500, Rob Herring (Arm) wrote:
> Fix the Micrel PHY in the example which shouldn't have the
> fallback "ethernet-phy-ieee802.3-c22" compatible:
> 
> Documentation/devicetree/bindings/net/renesas,ether.example.dtb: ethernet-phy@1 \
>   (ethernet-phy-id0022.1537): compatible: ['ethernet-phy-id0022.1537', 'ethernet-phy-ieee802.3-c22'] is too long
>         from schema $id: http://devicetree.org/schemas/net/micrel.yaml
> 
> Signed-off-by: Rob Herring (Arm) <robh@kernel.org>

Fixes: 37a2fce09001 ("dt-bindings: sh_eth convert bindings to json-schema")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

* Re: [PATCH net v2 0/2] sctp: validate INIT in COOKIE-ECHO when auth disabled
From: Xin Long @ 2026-06-24 15:46 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: davem, kuba, Eric Dumazet, Paolo Abeni, Simon Horman,
	Marcelo Ricardo Leitner
In-Reply-To: <cover.1781968162.git.lucien.xin@gmail.com>

On Sat, Jun 20, 2026 at 11:10 AM Xin Long <lucien.xin@gmail.com> wrote:
>
> This series fixes a security gap in SCTP's COOKIE-ECHO handling when
> cookie authentication is disabled.
>
> Currently, INIT chunks embedded in cookies are not re-verified after
> unpacking, creating a vulnerability when cookie_auth_enable=0. This
> series first refactors error handling, then adds the missing validation.
>
> Changes in v2: see individual patch changelogs for details.
>
> Xin Long (2):
>   sctp: factor out INIT verification failure handling
>   sctp: add INIT verification after cookie unpacking
>
In v3 I will drop patch 1/2 and call sctp_sf_pdiscard() on the error path
in patch 2/2, as it makes more sense to discard the packet when validation
of the embedded INIT chunk fails, consistent with malformed cookie handling.

>  net/sctp/sm_make_chunk.c |   3 +-
>  net/sctp/sm_statefuns.c  | 220 ++++++++++++++++++++-------------------
>  2 files changed, 117 insertions(+), 106 deletions(-)
>
> --
> 2.47.1
>

* Re: [PATCH] xfrm: iptfs: propagate SKBFL_SHARED_FRAG in iptfs_skb_add_frags()
From: Simon Horman @ 2026-06-24 15:45 UTC (permalink / raw)
  To: 635381823cyj
  Cc: Simon Horman, steffen.klassert, herbert, davem, netdev, moomichen
In-Reply-To: <20260623092258.70507-1-635381823cyj@gmail.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
xfrm: iptfs: propagate SKBFL_SHARED_FRAG in iptfs_skb_add_frags()

This patch sets SKBFL_SHARED_FRAG on the destination SKB after
iptfs_skb_add_frags() transfers frag references from a frag walk, to
prevent ESP's no-COW fast path from decrypting in place over pages still
referenced by the outer IPTFS SKB.

> diff --git a/net/xfrm/xfrm_iptfs.c b/net/xfrm/xfrm_iptfs.c
> index ad810d1f97c0..0e0dcf47a470 100644
> --- a/net/xfrm/xfrm_iptfs.c
> +++ b/net/xfrm/xfrm_iptfs.c
> @@ -496,6 +496,10 @@ static int iptfs_skb_add_frags(struct sk_buff *skb,
>  		walk->past += frag->len;	/* careful, use src bv_len */
>  		walk->fragi++;
>  	}
> +
> +	if (skb_shinfo(skb)->nr_frags)
> +		skb_shinfo(skb)->flags |= SKBFL_SHARED_FRAG;
> +
>  	return len;
>  }

[High]
Does this placement cover every path that adds frags to the destination
SKB?  The new assignment sits after the while loop, but the loop has an
earlier exit that returns once the requested length is satisfied
mid-frag:

    while (len && walk->fragi < walk->nr_frags) {
        skb_frag_t *frag = &walk->frags[walk->fragi];
        skb_frag_t *tofrag = &shinfo->frags[shinfo->nr_frags];

        *tofrag = *frag;
        if (offset) {
            tofrag->offset += offset;
            tofrag->len -= offset;
            offset = 0;
        }
        __skb_frag_ref(tofrag);
        shinfo->nr_frags++;

        /* see if we are done */
        fraglen = tofrag->len;
        if (len < fraglen) {
            tofrag->len = len;
            skb->len += len;
            skb->data_len += len;
            return 0;
        }
        ...
    }

By this point __skb_frag_ref(tofrag) has run and shinfo->nr_frags has
been incremented, so frags have been attached, but the early return 0
bypasses the new SKBFL_SHARED_FRAG assignment.

The caller iptfs_pskb_add_frags() typically requests fewer bytes than
the current source frag holds, so this early-return path is the common
termination of the loop.  In that case the destination SKB still appears
to ESP as having privately owned frags, and the nested transport-mode
esp_input() no-COW path described in the commit message can still
decrypt in place over pages referenced by the outer IPTFS SKB.

Would it be more robust to set the flag immediately after
__skb_frag_ref(tofrag) / shinfo->nr_frags++, or duplicate the assignment
before the inner return 0, so every path that attaches a frag also
propagates SKBFL_SHARED_FRAG?
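To make the suggested placement concrete, here is a simplified userspace model of the frag-attach loop. The types and MODEL_* names are illustrative stand-ins, not kernel structures. The flag is set right next to the nr_frags increment, so both the mid-frag early return and the normal loop exit leave it set.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for skb_frag_t, skb_shared_info and the IPTFS frag
 * walk; these are illustrative types, not the kernel's. */
#define MODEL_MAX_FRAGS   17
#define MODEL_SHARED_FRAG 0x1u	/* plays the role of SKBFL_SHARED_FRAG */

struct model_frag {
	size_t offset;
	size_t len;
};

struct model_shinfo {
	struct model_frag frags[MODEL_MAX_FRAGS];
	unsigned int nr_frags;
	unsigned int flags;
};

struct model_walk {
	const struct model_frag *frags;
	unsigned int nr_frags;
	unsigned int fragi;
};

/* Attach @len bytes of the walk, starting @offset into the current source
 * frag. Returns the number of bytes still missing. The shared-frag flag is
 * set where a frag is attached, so the early return cannot bypass it. */
static size_t model_add_frags(struct model_shinfo *sh, struct model_walk *walk,
			      size_t offset, size_t len)
{
	while (len && walk->fragi < walk->nr_frags) {
		struct model_frag *to;

		assert(sh->nr_frags < MODEL_MAX_FRAGS);	/* caller ensures room */
		to = &sh->frags[sh->nr_frags];
		*to = walk->frags[walk->fragi];
		if (offset) {
			to->offset += offset;
			to->len -= offset;
			offset = 0;
		}
		/* the real code takes a page reference here (__skb_frag_ref) */
		sh->nr_frags++;
		sh->flags |= MODEL_SHARED_FRAG;

		if (len < to->len) {	/* done in the middle of this frag */
			to->len = len;
			return 0;
		}
		len -= to->len;
		walk->fragi++;
	}
	return len;
}
```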

* Re: [PATCH 7/7] ARM: dts: rockchip: Add Alientek DLRV1126
From: Andrew Lunn @ 2026-06-24 15:44 UTC (permalink / raw)
  To: Yanan He
  Cc: robh, krzk+dt, conor+dt, heiko, andrew+netdev, davem, edumazet,
	kuba, pabeni, david.wu, mcoquelin.stm32, alexandre.torgue,
	devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	netdev, linux-stm32
In-Reply-To: <20260624-rv1126-alientek-dlrv1126-v1-7-dc42d99f75a7@gmail.com>

> The board consists of a CLRV1126F core module and a DLRV1126 carrier
> board. The core module contains the RV1126 SoC, eMMC and RK809 PMIC,
> while the carrier board provides Ethernet, SD card, AP6212 WiFi and
> Bluetooth, PCF8563 RTC, ADC keys, GPIO LEDs and audio connectors.
> 
> The board has been tested with Ethernet/NFS boot, eMMC, SD card, SDIO
> WiFi enumeration, Bluetooth LE scanning, RTC, ADC keys, GPIO LEDs and
> RK809 audio card registration.

Ah, here are the networking nodes. But why was this patch not threaded
with the rest of the series?

> +&gmac {
> +	phy-mode = "rgmii";
> +	clock_in_out = "input";
> +	assigned-clocks = <&cru CLK_GMAC_SRC>, <&cru CLK_GMAC_TX_RX>,
> +			  <&cru CLK_GMAC_ETHERNET_OUT>;
> +	assigned-clock-parents = <&cru CLK_GMAC_SRC_M1>,
> +				 <&cru RGMII_MODE_CLK>;
> +	assigned-clock-rates = <125000000>, <0>, <25000000>;
> +	pinctrl-names = "default";
> +	pinctrl-0 = <&rgmiim1_miim &rgmiim1_bus2 &rgmiim1_bus4
> +		     &clk_out_ethernetm1_pins>;
> +	tx_delay = <0x2a>;
> +	rx_delay = <0x1a>;

As I predicted, this is wrong.

https://elixir.bootlin.com/linux/v6.15/source/Documentation/devicetree/bindings/net/ethernet-controller.yaml#L287

Please try removing tx_delay and rx_delay, and setting phy-mode to
rgmii-id.

	Andrew

* Re: [PATCH 6/7] ARM: dts: rockchip: Add RV1126 I2C5
From: Andrew Lunn @ 2026-06-24 15:42 UTC (permalink / raw)
  To: Yanan He
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, David Wu, Maxime Coquelin, Alexandre Torgue,
	devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	netdev, linux-stm32
In-Reply-To: <20260624-rv1126-alientek-dlrv1126-v1-6-5aef608a3f64@gmail.com>

On Wed, Jun 24, 2026 at 04:44:43PM +0800, Yanan He wrote:
> The controller is present in the SoC and can be used by boards for
> external peripherals, such as an RTC on the Alientek DLRV1126 carrier
> board.

This has nothing to do with networking, so please post it separately.

What I would actually like to see is the patch adding the networking
nodes, because my guess is that you have the RGMII delays wrong.

       Andrew

* Re: [PATCH 4/7] net: stmmac: dwmac-rk: Enable refout clock for RGMII
From: Andrew Lunn @ 2026-06-24 15:39 UTC (permalink / raw)
  To: Yanan He
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, David Wu, Maxime Coquelin, Alexandre Torgue,
	devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	netdev, linux-stm32
In-Reply-To: <20260624-rv1126-alientek-dlrv1126-v1-4-5aef608a3f64@gmail.com>

On Wed, Jun 24, 2026 at 04:44:41PM +0800, Yanan He wrote:
> Some Rockchip GMAC integrations use clk_mac_refout as an external PHY
> reference clock even when the MAC is configured for RGMII.
> 
> RV1126 boards can route CLK_GMAC_ETHERNET_OUT to the external PHY as a
> 25 MHz reference clock. If the driver does not acquire and enable this
> clock in RGMII mode, the common clock framework may disable it as unused
> and the PHY can lose its reference clock.
> 
> Enable the refout clock handling for RGMII in addition to RMII.
> 
> Signed-off-by: Yanan He <grumpycat921013@gmail.com>
> ---
>  drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> index 8d7042e68926..f6fdc0c5b475 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> @@ -1112,7 +1112,8 @@ static int rk_gmac_clk_init(struct plat_stmmacenet_data *plat)
>  	bsp_priv->clk_enabled = false;
>  
>  	bsp_priv->num_clks = ARRAY_SIZE(rk_clocks);
> -	if (phy_iface == PHY_INTERFACE_MODE_RMII)
> +	if (phy_iface == PHY_INTERFACE_MODE_RMII ||
> +	    phy_iface == PHY_INTERFACE_MODE_RGMII)

Apart from Heiko commenting that this patch is completely wrong, there
are four RGMII modes, not one. You should have used
phy_interface_mode_is_rgmii().
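For reference, a small model of why a single == PHY_INTERFACE_MODE_RGMII test is too narrow: the four RGMII variants sit next to each other in enum phy_interface_t, and phy_interface_mode_is_rgmii() checks that whole range. The enum below is a local stand-in with made-up MODEL_* names; it only illustrates the range check, not whether the refout clock change itself is right.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the relevant slice of enum phy_interface_t: the four
 * RGMII variants are consecutive, which the range check relies on. */
typedef enum {
	MODEL_IFACE_RMII,
	MODEL_IFACE_RGMII,
	MODEL_IFACE_RGMII_ID,	/* PHY adds both delays */
	MODEL_IFACE_RGMII_RXID,	/* PHY adds the RX delay only */
	MODEL_IFACE_RGMII_TXID,	/* PHY adds the TX delay only */
	MODEL_IFACE_SGMII,
} model_iface_t;

/* Same shape as phy_interface_mode_is_rgmii(): true for all four modes. */
static bool model_mode_is_rgmii(model_iface_t mode)
{
	return mode >= MODEL_IFACE_RGMII && mode <= MODEL_IFACE_RGMII_TXID;
}
```

A board using phy-mode = "rgmii-id" would miss a plain == MODEL_IFACE_RGMII comparison, but matches the range check.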

    Andrew

---
pw-bot: cr
 

* Re: [PATCH net 0/7] xsk: fix AF_XDP multi-buffer Tx descriptor reclaim
From: Stanislav Fomichev @ 2026-06-24 15:38 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: netdev, bpf, magnus.karlsson, stfomichev, kuba, pabeni, horms,
	kerneljasonxing, bjorn
In-Reply-To: <20260623133240.1048434-1-maciej.fijalkowski@intel.com>

On 06/23, Maciej Fijalkowski wrote:
> Hi,
> 
> This series fixes several AF_XDP multi-buffer Tx paths where descriptors
> consumed from the Tx ring are not consistently returned to userspace
> through the completion ring when the packet is later dropped as invalid.
> 
> The affected cases are invalid or oversized multi-buffer Tx packets in
> both the generic and zero-copy paths. In these cases, the kernel can
> consume one or more Tx descriptors while building or validating a
> multi-buffer packet, then drop the packet before it reaches the device.
> Userspace regains ownership of the UMEM buffers only after the
> corresponding addresses are returned through the CQ. Missing completions
> therefore make userspace lose track of those buffers.
> 
> The generic path fixes cover three related cases:
> * partially built multi-buffer skbs dropped by xsk_drop_skb();
> * continuation descriptors left in the Tx ring after xsk_build_skb()
>   reports overflow;
> * invalid descriptors encountered in the middle of a multi-buffer
>   packet, including the offending invalid descriptor itself.
> 
> The zero-copy path is handled separately. The batched Tx parser now
> distinguishes descriptors that can be passed to the driver from
> descriptors that are consumed only because they belong to an invalid
> multi-buffer packet. Reclaim-only descriptors are written to the CQ
> address area and published in completion order, after any earlier
> driver-visible Tx descriptors.
> 
> The ZC batching path can also retain drain state when userspace has not
> yet provided the end of an invalid multi-buffer packet. To keep this
> state local to the singular batched path, the series prevents a second
> Tx socket from joining the same pool while such drain state exists.
> During the singular-to-shared transition, Tx batching is gated,
> pre-existing readers are waited out, and bind fails with -EAGAIN if the
> existing socket still has pending drain state. This avoids adding
> multi-buffer drain handling to the shared-UMEM fallback path.
> 
> The last two patches update xskxceiver so the tests account invalid
> multi-buffer Tx packets as descriptors that must be reclaimed, while
> still not expecting those invalid packets on the Rx side.
> 
> This is a follow-up to Jason's changes [0] which were addressing generic
> xmit only and this set allows me to pass full xskxceiver test suite run
> against ice driver.

There is a fair amount of feedback from sashiko already :-( So the meta
question from me is: is it time to scrap our current approach where
we parse descriptor by descriptor? (and maintain half-baked skb and
half-consumed descriptor queues)

Should we:

1. peek descriptors into desc[MAX_SKB_FRAGS] with xskq_cons_peek_desc()
until we reach one without PKT_CONT (if the last one still has PKT_CONT,
return EOVERFLOW to userspace and do a full stop here)
2. now that we really know the number of valid descriptors -> reserve
the cq space (if not -> EAGAIN)
3. pre-allocate everything here (if at any point we have ENOMEM -> cleanup
locally, don't ever create semi-initialized skb)
4. construct the skb
5. xmit

If at any point there is an issue, the cleanup is straightforward. That
whole xs->skb goes away and there is no state between syscalls. Thoughts?
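A rough userspace model of steps 1 and 2 of this proposal, under stated assumptions: MODEL_PKT_CONT stands in for XDP_PKT_CONT, MODEL_MAX_DESCS for the MAX_SKB_FRAGS-sized array, and returning -EAGAIN for a packet whose last descriptor has not been posted yet is an assumption, not part of the proposal. It shows that nothing is consumed or reserved until the whole packet is known to be complete and to fit in the CQ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins: MODEL_PKT_CONT for XDP_PKT_CONT and
 * MODEL_MAX_DESCS for the MAX_SKB_FRAGS-sized desc[] array. */
#define MODEL_PKT_CONT  0x1u
#define MODEL_MAX_DESCS 4

struct model_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t options;
};

struct model_txq {
	const struct model_desc *ring;
	size_t cons;	/* first unconsumed descriptor */
	size_t prod;	/* one past the last descriptor userspace posted */
};

struct model_cq {
	size_t free;	/* completion slots still available */
};

/* Steps 1 and 2: peek (without consuming) every descriptor of the next
 * packet, then reserve CQ space for all of them. Returns the descriptor
 * count or a negative errno; on error nothing is consumed or reserved. */
static int model_peek_packet(const struct model_txq *txq, struct model_cq *cq,
			     struct model_desc desc[MODEL_MAX_DESCS])
{
	size_t n = 0;

	for (;;) {
		if (n == MODEL_MAX_DESCS)
			return -EOVERFLOW;	/* still PKT_CONT after the last slot */
		if (txq->cons + n == txq->prod)
			return -EAGAIN;		/* assumption: packet not fully posted */
		desc[n] = txq->ring[txq->cons + n];
		if (!(desc[n++].options & MODEL_PKT_CONT))
			break;			/* end of packet found */
	}

	if (cq->free < n)
		return -EAGAIN;
	cq->free -= n;
	return (int)n;
}

/* Steps 3-5 (allocate, build, xmit) would run in between; only after they
 * succeed is the Tx ring advanced. On failure the reservation is returned. */
static void model_commit(struct model_txq *txq, int n)
{
	assert(n > 0);
	txq->cons += (size_t)n;
}

static void model_abort(struct model_cq *cq, int n)
{
	assert(n > 0);
	cq->free += (size_t)n;
}
```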

* Re: Ethtool : PRBS feature
From: Alexander Duyck @ 2026-06-24 15:35 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Lee Trager, Das, Shubham, Maxime Chevallier,
	netdev@vger.kernel.org, mkubecek@suse.cz, D H, Siddaraju,
	Chintalapalle, Balaji, Lindberg, Magnus,
	niklas.damberg@ericsson.com
In-Reply-To: <38359a70-ebc7-49c5-bae5-0d3d7bf82fac@lunn.ch>

On Tue, Jun 23, 2026 at 7:30 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> >     To avoid race conditions, maybe some of these commands need combining.
> >     ethtool --phy-test eth1 tx-prbs prbs7 rx-prbs prbs7 bert start
> >
> >     The configuration is then atomic, with respect to the uAPI, so we
> >     don't get two users configuring it at the same time, ending up with a
> >     messed up configuration.
> >
> > Testing consumes the link so you really don't want anything done to the netdev
> > while testing is running. fbnic does the following.
> >
> > 1. Testing cannot start when the link is up
>
> That is not going to work in the generic case. Many MAC drivers don't
> bind to their PCS or PHY until open() is called. So there is no way to
> pass the uAPI calls onto the PCS or PHY if the interface is
> down. There are also some MACs which connect to multiple PCSs, and
> there can be multiple PHYs. So you need to somehow indicate which
> PCS/PHY should perform the PRBS. There was a discussion about loopback
> recently, which has the same issue, you can perform loopback testing
> in multiple places. So I expect the same concept will be used for
> this.

I would think something like this would still be usable. You would
just need to specify the phy address and possibly device address in
the case that you support doing such testing at multiple layers.
Basically it would be up to the driver to provide a way to connect the
request with the desired interface. I would imagine something similar
is the case for the loopback handling since there are so many layers
where you can hairpin things back to the port it came in on.

> > 2. Once testing starts the driver removes the netdev to prevent use. The netdev
> > is only added back when testing stops. The upstream solution will need
> > something that can keep the netdev but lock everything down while testing is
> > running.
>
> Probably IF_OPER_TESTING would be part of this. If the interface is in
> this state, you want many other things blocked. However, probably
> ksettings get/set need to work, so you can force the link into a
> specific mode.

I would imagine it depends on whether you want to enforce ordering on this
or not. I would say the set would probably need to be blocked as you
wouldn't normally want to be changing the setting in the middle of a
test as it would cause the error stats to climb quickly.

> > 3. Once testing starts you cannot change the test, even on an individual lane
> > basis. You must stop testing first.
> >
> >
> >     Traditionally, Unix does not offer a way to clear statistic counters
> >     back to zero. So i'm not sure about clear-stats. We also need to think
> >     about hardware which does not support that. And there is locking
> >     issues, can the stats be cleared while a test is active?
> >
> > fbnic actually has separate registers for PRBS test results. Results do need to
> > be clean between runs but I never created an explicit clear interface. Firmware
> > automatically reset the registers when a new test was started. This also allows
> > results to be viewed after testing has stopped.
>
> We should really take 802.3 as the model, but i've not had time yet to
> read what it says about the statistics.

I think most of this is all called out in the IEEE 802.3-2022 spec
under section 45.2.1.169 - 45.2.1.174. Basically the ability and
controls live in the 1500 range, Tx error statistics in the 1600 range, and
Rx statistics in the 1700 range.

> > Reading results was a little tricky due to roll over between two 32bit
> > registers.
>
> 802.3 makes this even more interesting, since those registers are 16
> bits.

Yeah, normally to deal with something like that we would likely be
looking at having to maintain a fairly high read frequency. Although
in theory the error counts shouldn't be climbing that fast anyway. The
spec calls out that the registers are clear on read and held at ~0 in
the event of overflow which would be a failing case for any reasonable
test anyway.
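As an illustration of the clear-on-read, saturating behaviour described above, a minimal accumulator sketch. The struct and function names are hypothetical, and the periodic register polling that feeds it is assumed. Each read is treated as a delta, and a read of 0xffff marks the running total as only a lower bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Software accumulator for one 16-bit error counter that clears on read
 * and sticks at 0xffff on overflow. The caller is assumed to feed it the
 * raw value from each (hypothetical) periodic register read. */
struct prbs_err_acc {
	uint64_t total;		/* errors summed across reads */
	bool saturated;		/* a read hit 0xffff: true count unknown */
};

static void prbs_err_acc_update(struct prbs_err_acc *acc, uint16_t raw)
{
	assert(acc);
	/* The read itself cleared the register, so each value is a delta. */
	acc->total += raw;
	/* Held at all-ones on overflow: some errors were lost, so the total
	 * is only a lower bound and the run should be treated as failing. */
	if (raw == UINT16_MAX)
		acc->saturated = true;
}
```

The read frequency only matters for the accuracy of total; any saturated read already indicates a failing link.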

> > When I spoke to hardware engineers at Meta they did not want a timeout. Testing
> > often occurred over days, so they wanted to be able to start it and explicitly
> > stop it. I'm not against a time out but I do think it should be optional.
> >
> > Since PRBS testing is handled by firmware one safety measure I added is if
> > firmware lost contact with the host testing was automatically stopped and TX
> > FIR values were reset to factory. This ensured that the NIC won't get stuck in
> > testing and on initialization the driver doesn't have to worry about testing
> > state.
>
> That will work for firmware, but not when Linux is driving the
> hardware. I don't know if netlink will allow it, or if RTNL will get
> in the way etc, but it could be we actually don't want a start and
> stop commands at all, it is a blocking netlink call, and the test runs
> until the user space process closes the socket?

What we would probably need to do is look at testing as a state rather
than an operation. Basically the NIC would be put into the testing
state and as a result it would just be sitting there emitting whatever
test pattern it is supposed to emit, and validating it is receiving
the pattern it expects to receive.

The statistics could probably just be a subset of the PHY statistics
that could be collected separately. Actually now that I think about it
I wonder if we couldn't look at putting together the interface similar
to how we currently handle FEC where you have the --set-fec interface
to configure things and the --show-fec interface with the -I option to
show the current state and also dump the statistics.

* [TEST] GRO on i40e looks off
From: Jakub Kicinski @ 2026-06-24 15:34 UTC (permalink / raw)
  To: Adrian Pielech, Przemyslaw Kitszel; +Cc: intel-wired-lan, netdev

Hi!

Looking through the stability reports, the GRO tests on i40e stand out.
It's bad across multiple test cases, but the IPv4 IP ID cases are
a very good example. These tests are solid across all platforms,
but on i40e they fail half of the time, both in Intel CI and
in netdev CI.

* [TEST] Weird RSS state on ice
From: Jakub Kicinski @ 2026-06-24 15:30 UTC (permalink / raw)
  To: Adrian Pielech, Przemyslaw Kitszel
  Cc: netdev@vger.kernel.org, intel-wired-lan

Hi!

I noticed in the netdev CI that the ice runner fails to run the
toeplitz tests because of the RSS config.

https://netdev-ci-results.intel.com/ice-results/net-next-hw-2026-06-23--00-00/ice-E810-CQ2/toeplitz.py/stdout

I added some extra debug on the branch:

net.lib.ynl.pyynl.lib.ynl.NlError: Netlink error: hash field config is not symmetric 16 304: Invalid argument {'bad-attr': '.input-xfrm'}

16, 304 means GTP flow, GTP_TEID field. So we are trying to disable
symmetric RSS, but the field configuration contains TEID. The problem
is this is an illegal configuration in the first place. We are
_disabling_ symmetric RSS, but the kernel tries to make sure that both
before and after states are correct (because the configuration involves
multiple calls to the drivers and may fail halfway through). If the
current config is illegal, net/ethtool/ won't even let us restore it to
a sane state.
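A toy model of the situation described above may help: the update path validates both the current and the requested state, so an illegal current state blocks even a "disable symmetric" request. This is a sketch, not net/ethtool's code; the FIELD_* names and bit values are hypothetical, and the rule (every hashed field must have a src/dst partner) is assumed to match the "not symmetric" check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hash-field bits; the real ethtool RXH_* values differ. */
#define FIELD_IP_SRC	(1u << 0)
#define FIELD_IP_DST	(1u << 1)
#define FIELD_L4_SRC	(1u << 2)
#define FIELD_L4_DST	(1u << 3)
#define FIELD_TEID	(1u << 4)	/* no src/dst counterpart */

struct rss_cfg {
	uint32_t fields;	/* hash fields for one flow type */
	bool symmetric;		/* symmetric input transform enabled */
};

/* Symmetric hashing is only valid if every field has a swap partner. */
static bool rss_cfg_valid(const struct rss_cfg *c)
{
	const uint32_t pairable = FIELD_IP_SRC | FIELD_IP_DST |
				  FIELD_L4_SRC | FIELD_L4_DST;

	if (!c->symmetric)
		return true;
	if (c->fields & ~pairable)
		return false;
	return !!(c->fields & FIELD_IP_SRC) == !!(c->fields & FIELD_IP_DST) &&
	       !!(c->fields & FIELD_L4_SRC) == !!(c->fields & FIELD_L4_DST);
}

/* Model of an update that checks both the old and the new state. */
static int rss_cfg_update(struct rss_cfg *cur, const struct rss_cfg *next)
{
	if (!rss_cfg_valid(cur) || !rss_cfg_valid(next))
		return -22;	/* -EINVAL */
	*cur = *next;
	return 0;
}
```

With a current state of { FIELD_TEID, symmetric }, even switching to { FIELD_TEID, not symmetric } returns -EINVAL, which is why the stuck state cannot be repaired from user space.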

So the question is how we got into this state. It does not happen
on netdev machines. And on Intel machines it happens randomly around
30% of the time.

I tried to look thru the driver code and I don't see how we could end
up with such a config.

Could y'all have a look and figure out / fix this? This has been
happening for a while, but I was waiting until the merge window
to poke at it first.

^ permalink raw reply

* Re: [PATCH net v2 2/2] octeontx2-af: suppress kpu profile loading warning
From: Simon Horman @ 2026-06-24 15:24 UTC (permalink / raw)
  To: nshettyj
  Cc: Simon Horman, netdev, linux-kernel, sgoutham, rkannoth, lcherian,
	gakula, hkelam, sbhatta, andrew+netdev, davem, edumazet, kuba,
	pabeni, Sunil.Goutham, naveenm, hkalra
In-Reply-To: <20260623040609.3090846-3-nshettyj@marvell.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
octeontx2-af: suppress kpu profile loading warning

This commit replaces request_firmware_direct() with
firmware_request_nowarn() in npc_load_kpu_profile_from_fs() to suppress
a warning observed when the KPU profile is loaded via the firmware
database method rather than being integrated into the kernel image.

> diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
> index c7bc0b3a29b9..007d3f22b0c9 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
> @@ -2246,7 +2246,7 @@ static int npc_load_kpu_profile_from_fs(struct rvu *rvu)
>  
>  	strcat(path, kpu_profile);
>  
> -	if (request_firmware_direct(&fw, path, rvu->dev))
> +	if (firmware_request_nowarn(&fw, path, rvu->dev))

[Medium]
Does this actually change the warning behavior the commit message
describes?

Looking at the two helpers in drivers/base/firmware_loader/main.c, both
pass FW_OPT_NO_WARN into _request_firmware():

  int firmware_request_nowarn(...) {
        ...
        ret = _request_firmware(firmware, name, device, NULL, 0, 0,
                                FW_OPT_UEVENT | FW_OPT_NO_WARN);
        ...
  }

  int request_firmware_direct(...) {
        ...
        ret = _request_firmware(firmware_p, name, device, NULL, 0, 0,
                                FW_OPT_UEVENT | FW_OPT_NO_WARN |
                                FW_OPT_NOFALLBACK_SYSFS);
        ...
  }

The dev_warn() calls in the firmware loader hot path (for example the
"Direct firmware load for %s failed" message, the "loading %s failed"
message, and the sysfs fallback notice) are all gated by
!(opt_flags & FW_OPT_NO_WARN), and have been since commit 1fe6e4f0b0c4
("firmware_loader: Suppress warning on FW_OPT_NO_WARN flag").

Given that request_firmware_direct() already sets FW_OPT_NO_WARN, where
does the spurious warning mentioned in the commit message
("causes a spurious warning from the Linux firmware subsystem ... due
to the absence of firmware in the kernel image") originate?

[Medium]
This change also silently drops FW_OPT_NOFALLBACK_SYSFS, which is not
mentioned in the commit message.

The kerneldoc for request_firmware_direct() says it "doesn't fall back
to usermode helper even if the firmware couldn't be loaded directly
from fs.  Hence it's useful for loading optional firmwares, which
aren't always present, without extra long timeouts of udev."

firmware_request_nowarn()'s kerneldoc states the opposite: "The sysfs
fallback mechanism is enabled if direct filesystem lookup fails."

On kernels built with CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y, or where
the force_sysfs_fallback sysctl is enabled, fw_run_sysfs_fallback() no
longer short-circuits on FW_OPT_NOFALLBACK_SYSFS, so the absent-firmware
case will now invoke firmware_fallback_sysfs() and wait on udev for the
loading timeout (default 60s) before returning -ENOENT.

The commit message itself states that "in most cases the profile is
loaded using the 2nd method", so this is precisely the optional-firmware
scenario that request_firmware_direct() was designed for. Was the
fallback-sysfs behavior change intentional, and is there a reason not
to keep the no-fallback semantics on configurations that enable the
usermode helper?

>  		return -ENOENT;
>  
>  	dev_info(rvu->dev, "Loading KPU profile from filesystem: %s\n",
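The flag comparison in the review can be checked mechanically: both helpers set FW_OPT_NO_WARN, so the only difference between the two calls is FW_OPT_NOFALLBACK_SYSFS. In the sketch below the bit positions are modelled on the FW_OPT_* enum in drivers/base/firmware_loader/firmware.h and should be treated as an assumption; the flag sets themselves are the ones quoted above.

```c
#include <assert.h>

/* Assumed bit positions, modelled on the kernel's enum fw_opt. */
enum fw_opt {
	FW_OPT_UEVENT		= 1u << 0,
	FW_OPT_NO_WARN		= 1u << 3,
	FW_OPT_NOFALLBACK_SYSFS	= 1u << 5,
};

/* Flag sets passed to _request_firmware(), as quoted in the review. */
static const unsigned int nowarn_opts = FW_OPT_UEVENT | FW_OPT_NO_WARN;
static const unsigned int direct_opts = FW_OPT_UEVENT | FW_OPT_NO_WARN |
					FW_OPT_NOFALLBACK_SYSFS;

/* Flags present in the old call but absent from the new one. */
static unsigned int dropped_opts(unsigned int old_opts, unsigned int new_opts)
{
	return old_opts & ~new_opts;
}
```

dropped_opts(direct_opts, nowarn_opts) yields only FW_OPT_NOFALLBACK_SYSFS: the switch does not add warning suppression, it removes the no-fallback behaviour.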

^ permalink raw reply

* Re: [PATCH 1/2] bug: Provide WARN_ON.*DEFERRED() macros for console deferred output
From: Sebastian Andrzej Siewior @ 2026-06-24 15:24 UTC (permalink / raw)
  To: Petr Mladek
  Cc: K Prateek Nayak, linux-arch, linux-kernel, sched-ext, netdev,
	David S . Miller, Andrea Righi, Andrew Morton, Arnd Bergmann,
	Ben Segall, Breno Leitao, Changwoo Min, David Vernet,
	Dietmar Eggemann, Eric Dumazet, Ingo Molnar, Jakub Kicinski,
	John Ogness, Juri Lelli, Paolo Abeni, Peter Zijlstra,
	Sergey Senozhatsky, Simon Horman, Steven Rostedt, Tejun Heo,
	Vincent Guittot, Vlad Poenaru
In-Reply-To: <ajugq8VAciqtMx9F@pathway.suse.cz>

On 2026-06-24 11:17:31 [+0200], Petr Mladek wrote:
> For Linus, it was a no-go, definitely.
> I would vote for adding the WARN_*DEFERRED() into the scheduler code
> at least until majority of console drivers are converted to nbcon API.

I see four nbcon serial console drivers (+netconsole, + drm_log). We
have at least four times that many console drivers. What is the
majority from your point of view? The 8250 should cover all of x86.

> Best Regards,
> Petr

Sebastian

^ permalink raw reply

* Re: [PATCH 0/18] pull request (net-next): ipsec-next 2026-06-12
From: Antony Antony @ 2026-06-24 15:10 UTC (permalink / raw)
  To: Jakub Kicinski, Steffen Klassert, Nathan Harold, Yan Yan
  Cc: Antony Antony, David Miller, Herbert Xu, netdev, Tobias Brunner,
	Sabrina Dubroca
In-Reply-To: <ajDlFUhMfJP36qA8@Antony2201.local>

On Tue, Jun 16, 2026 at 07:54:29AM +0200, Antony Antony wrote:
> On Sat, Jun 13, 2026 at 01:15:52PM -0700, Jakub Kicinski wrote:
> > On Fri, 12 Jun 2026 09:46:16 +0200 Steffen Klassert wrote:
> > > 3) Add a new netlink message XFRM_MSG_MIGRATE_STATE that
> > >    allows migrating individual IPsec SAs independently of
> > >    their policies. The existing XFRM_MSG_MIGRATE is tightly coupled
> > >    to policy+SA migration, lacks SPI for unique SA identification,
> > >    and cannot express reqid changes or migrate Transport mode
> > >    selectors. The new interface identifies the SA via SPI and mark,
> > >    supports reqid changes, address family changes, encap removal,
> > >    and uses an atomic create+install flow under x->lock to prevent
> > >    SN/IV reuse during AEAD SA migration.
> > >    From Antony Antony.
> > 
> > Hi! There are some Sashiko comments here, please follow up:
> > 
> > https://sashiko.dev/#/patchset/20260612074725.1760473-8-steffen.klassert@secunet.com
> > 
> 
> Thanks Jakub. I have fixes and testing them now. And I will send fixes soon.
> 
> The comments didn't click until I realized xfrm_user_state_lookup() only
> keys on mark.v & mark.m, so distinct (v, m) pairs collapse to the same
> masked value. A lookup key of {0, 0} matches a source SA with mark
> {0, 0xffffff} (both mask to 0), but reusing {0, 0} as the migrated mark 
> turns "match only mark 0x00" into "match all traffic".
> 
> The fix is to copy the mark from the old SA rather than from the old_mark
> passed along. This also pointed out more issues.

I have fixes queued up for the issues Sashiko found, to send once the
ipsec tree has net-next. What Sashiko pointed out are corner cases. IMO
a typical IKE/IPsec daemon would not trigger them, but they are worth fixing.

The fixes address all four High findings and the Medium in patch 16/18.
Finding 6 (patch 05/18, encap removal) was determined to be a false
positive; it has already been reviewed.

One tricky part worth noting: xfrm allows two SAs with the same SPI,
src, dst, and proto, but different marks:

  ip xfrm state add src 10.1.1.1 dst 10.1.1.2 spi 0x1000 .. mark 0x1 mask 0xff
  ip xfrm state add src 10.1.1.1 dst 10.1.1.2 spi 0x1000 .. mark 0x2 mask 0xff

  ip x s
  src 10.1.1.1 dst 10.1.1.2
      proto esp spi 0x00001000 reqid 100 mode tunnel
      replay-window 0
      mark 0x2/0xff
      aead rfc4106(gcm(aes)) 0x1111111111111111111111111111111111111111 96
      anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
      sel src 0.0.0.0/0 dst 0.0.0.0/0
  src 10.1.1.1 dst 10.1.1.2
      proto esp spi 0x00001000 reqid 100 mode tunnel
      replay-window 0
      mark 0x1/0xff
      aead rfc4106(gcm(aes)) 0x1111111111111111111111111111111111111111 96
      anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
      sel src 0.0.0.0/0 dst 0.0.0.0/0

Both are accepted: same SPI 0x1000, two distinct SAs with different
marks. Note that both SAs share the same key material and their
independent oseq counters both start at 0, so the encrypted packets
from each produce an identical AES-GCM IV.

Does anyone know whether this is intentional or accidental? Is there a
use case that requires two SAs with identical crypto and replay counters
but different marks?

This is also what makes state migration with a mark complex. Since xfrm
permits two SAs to share the same SPI with different marks, migrating
a mark must check whether the target slot is already occupied.
The fix "xfrm: check mark changes for SA tuple collisions in XFRM_MSG_MIGRATE_STATE" does
exactly that, using the effective lookup key m->v & m->m to detect a
collision before proceeding.
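The masking behaviour and the collision check can be illustrated with a small sketch. struct mark mirrors the shape of struct xfrm_mark (v, m); mark_collides() is a hypothetical helper, not the code in the fix, and it only follows the "effective key v & m" rule described in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same shape as struct xfrm_mark { __u32 v; __u32 m; }. */
struct mark {
	uint32_t v, m;
};

/* Effective lookup key: only the masked value participates. */
static uint32_t mark_key(struct mark k)
{
	return k.v & k.m;
}

/* Hypothetical helper: would migrating to new_mark land on an SA that is
 * already installed under the same SPI/src/dst/proto tuple? */
static bool mark_collides(const struct mark *installed, int n,
			  struct mark new_mark)
{
	for (int i = 0; i < n; i++)
		if (mark_key(installed[i]) == mark_key(new_mark))
			return true;
	return false;
}
```

Here {0, 0} and {0, 0xffffff} both reduce to key 0, and with SAs {0x1, 0xff} and {0x2, 0xff} installed, migrating to {0x101, 0xff} collides with the first one even though the raw values differ.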

Kernel selftests for this series are included in the tree. However,
extensive testing is difficult on my end, as *swan cannot easily create
these cases.

Yan/Nathan,
would you be able to run the Android test suite against this branch
to test migrating SAs with a mark set?

https://github.com/antonyantony/linux/tree/migrate-state-fixes-v0

-antony

^ permalink raw reply

* Re: [PATCH net v2] net/smc: avoid recursive sk_callback_lock in listen data_ready
From: Runyu Xiao @ 2026-06-24 10:37 UTC (permalink / raw)
  To: XIAO WU
  Cc: D. Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mahanta Jambigi, Tony Lu, Wen Gu, Simon Horman, Karsten Graul,
	linux-rdma, linux-s390, netdev, linux-kernel, jianhao.xu
In-Reply-To: <tencent_BD4B709F8D16281265EDBC0DC9EFC8758808@qq.com>

Hi Xiao,

> the error path in smc_listen() does not restore icsk_af_ops when
> kernel_listen() fails

Thanks, this looks like a real error-path bug. I will prepare it as a
separate fix for smc_listen() rather than folding it into this
sk_callback_lock patch.

Runyu


^ permalink raw reply

* Re: [PATCH net] net: pse-pd: scope pse_control regulator handle to kref lifetime
From: Simon Horman @ 2026-06-24 15:12 UTC (permalink / raw)
  To: github
  Cc: Simon Horman, o.rempel, kory.maincent, andrew+netdev, davem,
	edumazet, kuba, pabeni, corey, hkallweit1, linux, netdev,
	linux-kernel
In-Reply-To: <20260622192839.2508733-1-github@szelinsky.de>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
net: pse-pd: scope pse_control regulator handle to kref lifetime

This patch switches the pse_control consumer regulator handle from
devm_regulator_get_exclusive() / devm_regulator_put() to the plain
regulator_get_exclusive() / regulator_put() pair, so the put in
__pse_control_release() works regardless of whether the controller's
devres has already been unwound.

> diff --git a/drivers/net/pse-pd/pse_core.c b/drivers/net/pse-pd/pse_core.c
> index 69dbdbde9d71..a5e6d7b26b9f 100644
> --- a/drivers/net/pse-pd/pse_core.c
> +++ b/drivers/net/pse-pd/pse_core.c
> @@ -1367,7 +1367,7 @@ static void __pse_control_release(struct kref *kref)
>  
>  	if (psec->pcdev->pi[psec->id].admin_state_enabled)
>  		regulator_disable(psec->ps);
> -	devm_regulator_put(psec->ps);
> +	regulator_put(psec->ps);
>  
>  	module_put(psec->pcdev->owner);

[High]
Does this fix the use-after-free completely, or only the regulator-put
part of it?

The commit message describes the scenario where the controller is
unbound while a pse_control still has consumers, and explains that
pcdev->dev's devres is released LIFO before pse_controller_unregister()
runs.

In that same scenario, __pse_control_release() still reads
psec->pcdev->pi[psec->id].admin_state_enabled and psec->pcdev->owner
just before the regulator_put():

	if (psec->pcdev->pi[psec->id].admin_state_enabled)
		regulator_disable(psec->ps);
	regulator_put(psec->ps);

	module_put(psec->pcdev->owner);

pse_controller_unregister() calls pse_release_pis() which does
kfree(pcdev->pi), and pcdev->pi is plain kzalloc memory rather than
devres-managed. pcdev itself is typically a driver-owned allocation
tied to pcdev->dev's lifetime.

If a later pse_control_put() runs from the unbind path after
pse_controller_unregister() has freed pcdev->pi (and pcdev), would
these pcdev->pi[...] and pcdev->owner reads operate on already-freed
memory?

Similarly, would the regulator_put() then be called against an rdev
that has already been torn down through the controller's devres?

This pre-dates this patch, so it is not a regression introduced here,
but the commit message frames the change as making
__pse_control_release() correct "regardless of whether the
controller's devres has already been unwound", which seems to only
cover the regulator handle and not the pcdev->pi / pcdev->owner reads
on the same code path.

Would a more complete fix also need pse_controller_unregister() to
drain outstanding pse_control references, or have pse_control hold a
refcount on pcdev, so that psec cannot outlive pcdev->pi and pcdev?
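The second option (consumer handles pinning the controller) can be sketched in a single-threaded model. struct ctrl and struct handle are hypothetical stand-ins for pse_controller_dev and pse_control, and a plain int replaces kref; the point is only that the per-port array is freed when the last reference goes, not when the driver unbinds.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for pse_controller_dev / pse_control. */
struct ctrl {
	int refs;	/* 1 for the driver + 1 per consumer handle */
	int *pi;	/* per-port state, freed together with the controller */
};

struct handle {
	struct ctrl *ctrl;
	int id;
};

static int ctrl_frees;	/* counts completed frees, for illustration */

static struct ctrl *ctrl_new(int nports)
{
	struct ctrl *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	c->pi = calloc(nports, sizeof(*c->pi));
	if (!c->pi) {
		free(c);
		return NULL;
	}
	c->refs = 1;	/* reference owned by the driver */
	return c;
}

static void ctrl_put(struct ctrl *c)
{
	assert(c->refs > 0);
	if (--c->refs == 0) {
		free(c->pi);
		free(c);
		ctrl_frees++;
	}
}

static struct handle *handle_get(struct ctrl *c, int id)
{
	struct handle *h = malloc(sizeof(*h));

	if (!h)
		return NULL;
	c->refs++;	/* the handle pins the controller */
	h->ctrl = c;
	h->id = id;
	return h;
}

/* Safe even after the driver dropped its reference: pi is still live. */
static int handle_put(struct handle *h)
{
	int state = h->ctrl->pi[h->id];

	ctrl_put(h->ctrl);
	free(h);
	return state;
}
```

If the driver calls ctrl_put() first (unbind), the controller survives until the last handle_put(), so the pi[] read in the release path never touches freed memory.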

^ permalink raw reply

* [PATCH net-next] openvswitch: conntrack: annotate ct limit hlist traversal
From: Runyu Xiao @ 2026-06-24 15:01 UTC (permalink / raw)
  To: aconole, echaudro, i.maximets
  Cc: davem, edumazet, kuba, pabeni, horms, netdev, dev, linux-kernel,
	runyu.xiao, jianhao.xu

ct_limit_set() is documented as being called with ovs_mutex held. It
walks the ct limit hlist with hlist_for_each_entry_rcu(), but the
iterator does not currently pass the OVS lockdep condition used
elsewhere for RCU-protected OVS objects.

Pass lockdep_ovsl_is_held() to the iterator. This matches the function's
existing caller contract and lets CONFIG_PROVE_RCU_LIST distinguish the
ovs_mutex-protected update path from the RCU read-side ct_limit_get()
path.

This was found by our static analysis tool and then manually reviewed
against the current tree. In the reviewed CONFIG_PROVE_RCU_LIST triage
run, the writer-side ct limit update produced the expected "RCU-list
traversed in non-reader section!!" warning while ovs_mutex was held,
with the stack matching ct_limit_set() and ovs_ct_limit_set_zone_limit().
The change is limited to documenting the existing protection contract.

This is a lockdep annotation cleanup. It does not change the conntrack
limit list update or release behavior.

Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
---
 net/openvswitch/conntrack.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index c6fd9c424e8f..95697d4e16e6 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -883,7 +883,8 @@ static void ct_limit_set(const struct ovs_ct_limit_info *info,
 	struct hlist_head *head;
 
 	head = ct_limit_hash_bucket(info, new_ct_limit->zone);
-	hlist_for_each_entry_rcu(ct_limit, head, hlist_node) {
+	hlist_for_each_entry_rcu(ct_limit, head, hlist_node,
+				 lockdep_ovsl_is_held()) {
 		if (ct_limit->zone == new_ct_limit->zone) {
 			hlist_replace_rcu(&ct_limit->hlist_node,
 					  &new_ct_limit->hlist_node);
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH net v2] net: sungem: fix probe error cleanup
From: Simon Horman @ 2026-06-24 15:06 UTC (permalink / raw)
  To: Ruoyu Wang
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20260623025759.3468566-1-ruoyuw560@gmail.com>

On Tue, Jun 23, 2026 at 10:57:59AM +0800, Ruoyu Wang wrote:
> gem_init_one() calls gem_remove_one() when register_netdev() fails.
> gem_remove_one() unregisters and frees resources owned by the net_device,
> including the DMA block, MMIO mapping, PCI regions, and the net_device
> itself. gem_init_one() then falls through to its own cleanup labels and
> frees the same resources again.
> 
> Keep the register_netdev() error path in gem_init_one(): clear drvdata so
> PM/remove paths do not see a half-registered device, remove the NAPI
> instance added during probe, and let the existing cleanup labels release
> the resources once.
> 
> The issue was found by a local static-analysis checker for probe error
> paths. The reported path was manually inspected before sending this fix.
> 
> Compile-tested with CONFIG_SUNGEM=y. Runtime testing was not performed
> because no sungem hardware is available.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
> ---
> v2:
> - Add a Fixes tag.
> - Describe how the issue was found.
> - Add testing information.
> 
> v1: https://lore.kernel.org/netdev/20260620155326.80582-1-ruoyuw560@gmail.com/

Thanks for the update.

Reviewed-by: Simon Horman <horms@kernel.org>
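The pattern the patch restores, where a late failure jumps into the existing unwind labels instead of calling the full remove routine, can be sketched in user space. struct res, res_get() and the error values are hypothetical; the releases counter just shows that each resource is freed exactly once.

```c
#include <assert.h>
#include <stdlib.h>

struct res {
	int dummy;
};

static int releases;	/* counts resource releases, for illustration */

static struct res *res_get(void) { return malloc(sizeof(struct res)); }
static void res_put(struct res *r) { free(r); releases++; }

struct dev {
	struct res *mmio, *dma;
};

static void remove_dev(struct dev *d)
{
	res_put(d->dma);
	res_put(d->mmio);
}

/* fail_register simulates register_netdev() failing late in probe. */
static int probe(struct dev *d, int fail_register)
{
	int err = -12;	/* -ENOMEM */

	d->mmio = res_get();
	if (!d->mmio)
		goto err_out;
	d->dma = res_get();
	if (!d->dma)
		goto err_unmap;

	if (fail_register) {
		err = -5;	/* stand-in for the register_netdev() error */
		goto err_free_dma;	/* not remove_dev() plus the labels */
	}
	return 0;

err_free_dma:
	res_put(d->dma);
err_unmap:
	res_put(d->mmio);
err_out:
	return err;
}
```

Calling remove_dev() on the failure path and then falling through to the labels would release both resources twice, which is the double free the patch removes.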


^ permalink raw reply

* [PATCH net] dt-bindings: net: renesas,ether: Drop example "ethernet-phy-ieee802.3-c22" fallback
From: Rob Herring (Arm) @ 2026-06-24 15:02 UTC (permalink / raw)
  To: Niklas Söderlund, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Krzysztof Kozlowski, Conor Dooley,
	Geert Uytterhoeven, Magnus Damm, Sergei Shtylyov
  Cc: netdev, linux-renesas-soc, devicetree, linux-kernel

Fix the Micrel PHY in the example which shouldn't have the
fallback "ethernet-phy-ieee802.3-c22" compatible:

Documentation/devicetree/bindings/net/renesas,ether.example.dtb: ethernet-phy@1 \
  (ethernet-phy-id0022.1537): compatible: ['ethernet-phy-id0022.1537', 'ethernet-phy-ieee802.3-c22'] is too long
        from schema $id: http://devicetree.org/schemas/net/micrel.yaml

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
 Documentation/devicetree/bindings/net/renesas,ether.yaml | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/renesas,ether.yaml b/Documentation/devicetree/bindings/net/renesas,ether.yaml
index f0a52f47f95a..dd7187f12a67 100644
--- a/Documentation/devicetree/bindings/net/renesas,ether.yaml
+++ b/Documentation/devicetree/bindings/net/renesas,ether.yaml
@@ -121,8 +121,7 @@ examples:
         #size-cells = <0>;
 
         phy1: ethernet-phy@1 {
-            compatible = "ethernet-phy-id0022.1537",
-                         "ethernet-phy-ieee802.3-c22";
+            compatible = "ethernet-phy-id0022.1537";
             reg = <1>;
             interrupt-parent = <&irqc0>;
             interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
-- 
2.53.0


^ permalink raw reply related

* Re: [BUG] KFENCE: use-after-free read in udp_tunnel_nic_device_sync_work
From: Eric Dumazet @ 2026-06-24 15:00 UTC (permalink / raw)
  To: Sam Sun
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, netdev,
	linux-kernel, syzkaller
In-Reply-To: <CAEkJfYMXsNuJjKWJ5nvvw0afSP77F0WWT0gfj2-sQM3VyZ0brQ@mail.gmail.com>

On Wed, Jun 24, 2026 at 7:46 AM Sam Sun <samsun1006219@gmail.com> wrote:
>

> So we are still freeing struct udp_tunnel_nic while its embedded work_struct
> is active. debugobjects catches this at kfree() before the active work gets a
> chance to run later and dereference the freed utn.
>
> My read is that the conversion from bitfields to atomic bitops removes the
> plain bitfield data race, but UDP_TUNNEL_NIC_WORK_PENDING is still only one
> boolean state. It can represent "some work is pending", but it cannot
> distinguish between:
>   idle
>   queued
>   running
>   running and queued again
>
> In particular, the workqueue core clears WORK_STRUCT_PENDING before invoking
> the worker. At that point the same work item can be queued again by
> udp_tunnel_nic_device_sync(). If an already running instance later executes:
>
>   clear_bit(UDP_TUNNEL_NIC_WORK_PENDING, &utn->flags);
>
> it can still clear the bit that was set for the requeued instance. Then
> udp_tunnel_nic_unregister() may observe UDP_TUNNEL_NIC_WORK_PENDING clear and
> free utn, even though debugobjects still sees utn->work as active.
>
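The interleaving described above can be replayed deterministically in a small single-threaded model. Both flags are modelled as plain booleans (hypothetical names); wq_pending stands in for WORK_STRUCT_PENDING and utn_pending for UDP_TUNNEL_NIC_WORK_PENDING.

```c
#include <assert.h>
#include <stdbool.h>

struct model {
	bool wq_pending;	/* owned by the workqueue core */
	bool utn_pending;	/* driver-side single "work pending" bit */
};

/* queue_work() only queues if the core's PENDING bit is clear. */
static bool queue(struct model *m)
{
	if (m->wq_pending)
		return false;
	m->wq_pending = true;
	m->utn_pending = true;
	return true;
}

/* The workqueue core clears PENDING before invoking the worker. */
static void worker_start(struct model *m) { m->wq_pending = false; }

/* The running instance clears the driver bit unconditionally. */
static void worker_clear_flag(struct model *m) { m->utn_pending = false; }

/* unregister frees utn when the driver bit reads clear. */
static bool unregister_would_free(const struct model *m)
{
	return !m->utn_pending;
}
```

Replaying queue(), worker_start(), queue(), worker_clear_flag() leaves the second instance queued while unregister_would_free() returns true: one bit cannot represent "running and queued again".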

-ETOOMANYBUGS

OK, we could try to convert the pending bit to a refcount.

diff --git a/net/ipv4/udp_tunnel_nic.c b/net/ipv4/udp_tunnel_nic.c
index 9944ed923ddfd10f9adf6ad788c0740daeaf2adb..2e14686f35896cb0caba3f8f587ef8b369090fbf 100644
--- a/net/ipv4/udp_tunnel_nic.c
+++ b/net/ipv4/udp_tunnel_nic.c
@@ -3,6 +3,7 @@

 #include <linux/ethtool_netlink.h>
 #include <linux/netdevice.h>
+#include <linux/refcount.h>
 #include <linux/slab.h>
 #include <linux/types.h>
 #include <linux/workqueue.h>
@@ -30,9 +31,8 @@ struct udp_tunnel_nic_table_entry {
  * @work:      async work for talking to hardware from process context
  * @dev:       netdev pointer
  * @lock:      protects all fields
- * @need_sync: at least one port start changed
- * @need_replay: space was freed, we need a replay of all ports
- * @work_pending: @work is currently scheduled
+ * @flags:     sync, replay flags
+ * @refcnt:    reference count
  * @n_tables:  number of tables under @entries
  * @missed:    bitmap of tables which overflown
  * @entries:   table of tables of ports currently offloaded
@@ -44,9 +44,11 @@ struct udp_tunnel_nic {

        struct mutex lock;

-       u8 need_sync:1;
-       u8 need_replay:1;
-       u8 work_pending:1;
+       unsigned long flags;
+#define UDP_TUNNEL_NIC_NEED_SYNC       0
+#define UDP_TUNNEL_NIC_NEED_REPLAY     1
+
+       refcount_t refcnt;

        unsigned int n_tables;
        unsigned long missed;
@@ -116,7 +118,7 @@ udp_tunnel_nic_entry_queue(struct udp_tunnel_nic *utn,
                           unsigned int flag)
 {
        entry->flags |= flag;
-       utn->need_sync = 1;
+       set_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags);
 }

 static void
@@ -283,7 +285,7 @@ udp_tunnel_nic_device_sync_by_table(struct net_device *dev,
 static void
 __udp_tunnel_nic_device_sync(struct net_device *dev, struct udp_tunnel_nic *utn)
 {
-       if (!utn->need_sync)
+       if (!test_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags))
                return;

        if (dev->udp_tunnel_nic_info->sync_table)
@@ -291,21 +293,24 @@ __udp_tunnel_nic_device_sync(struct net_device *dev, struct udp_tunnel_nic *utn)
        else
                udp_tunnel_nic_device_sync_by_port(dev, utn);

-       utn->need_sync = 0;
+       clear_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags);
        /* Can't replay directly here, in case we come from the tunnel driver's
         * notification - trying to replay may deadlock inside tunnel driver.
         */
-       utn->need_replay = udp_tunnel_nic_should_replay(dev, utn);
+       if (udp_tunnel_nic_should_replay(dev, utn))
+               set_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);
+       else
+               clear_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);
 }

 static void
 udp_tunnel_nic_device_sync(struct net_device *dev, struct udp_tunnel_nic *utn)
 {
-       if (!utn->need_sync)
+       if (!test_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags))
                return;

-       queue_work(udp_tunnel_nic_workqueue, &utn->work);
-       utn->work_pending = 1;
+       if (queue_work(udp_tunnel_nic_workqueue, &utn->work))
+               refcount_inc(&utn->refcnt);
 }

 static bool
@@ -348,7 +353,7 @@ udp_tunnel_nic_has_collision(struct net_device *dev, struct udp_tunnel_nic *utn,
                        if (!udp_tunnel_nic_entry_is_free(entry) &&
                            entry->port == ti->port &&
                            entry->type != ti->type) {
-                               __set_bit(i, &utn->missed);
+                               set_bit(i, &utn->missed);
                                return true;
                        }
                }
@@ -483,7 +488,7 @@ udp_tunnel_nic_add_new(struct net_device *dev, struct udp_tunnel_nic *utn,
                 * are no devices currently which have multiple tables accepting
                 * the same tunnel type, and false positives are okay.
                 */
-               __set_bit(i, &utn->missed);
+               set_bit(i, &utn->missed);
        }

        return false;
@@ -552,7 +557,7 @@ static void __udp_tunnel_nic_reset_ntf(struct net_device *dev)

        mutex_lock(&utn->lock);

-       utn->need_sync = false;
+       clear_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags);
        for (i = 0; i < utn->n_tables; i++)
                for (j = 0; j < info->tables[i].n_entries; j++) {
                        struct udp_tunnel_nic_table_entry *entry;
@@ -696,8 +701,8 @@ udp_tunnel_nic_flush(struct net_device *dev, struct udp_tunnel_nic *utn)
        for (i = 0; i < utn->n_tables; i++)
                memset(utn->entries[i], 0, array_size(info->tables[i].n_entries,
                                                      sizeof(**utn->entries)));
-       WARN_ON(utn->need_sync);
-       utn->need_replay = 0;
+       WARN_ON(test_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags));
+       clear_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);
 }

 static void
@@ -713,8 +718,8 @@ udp_tunnel_nic_replay(struct net_device *dev, struct udp_tunnel_nic *utn)
        for (i = 0; i < utn->n_tables; i++)
                for (j = 0; j < info->tables[i].n_entries; j++)
                        udp_tunnel_nic_entry_freeze_used(&utn->entries[i][j]);
-       utn->missed = 0;
-       utn->need_replay = 0;
+       bitmap_zero(&utn->missed, UDP_TUNNEL_NIC_MAX_TABLES);
+       clear_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);

        if (!info->shared) {
                udp_tunnel_get_rx_info(dev);
@@ -728,6 +733,25 @@ udp_tunnel_nic_replay(struct net_device *dev, struct udp_tunnel_nic *utn)
                        udp_tunnel_nic_entry_unfreeze(&utn->entries[i][j]);
 }

+static void udp_tunnel_nic_free(struct udp_tunnel_nic *utn)
+{
+       unsigned int i;
+
+       for (i = 0; i < utn->n_tables; i++)
+               kfree(utn->entries[i]);
+
+       if (utn->dev)
+               dev_put(utn->dev);
+
+       kfree(utn);
+}
+
+static void udp_tunnel_nic_put(struct udp_tunnel_nic *utn)
+{
+       if (refcount_dec_and_test(&utn->refcnt))
+               udp_tunnel_nic_free(utn);
+}
+
 static void udp_tunnel_nic_device_sync_work(struct work_struct *work)
 {
        struct udp_tunnel_nic *utn =
@@ -736,14 +760,15 @@ static void udp_tunnel_nic_device_sync_work(struct work_struct *work)
        rtnl_lock();
        mutex_lock(&utn->lock);

-       utn->work_pending = 0;
        __udp_tunnel_nic_device_sync(utn->dev, utn);

-       if (utn->need_replay)
+       if (test_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags))
                udp_tunnel_nic_replay(utn->dev, utn);

        mutex_unlock(&utn->lock);
        rtnl_unlock();
+
+       udp_tunnel_nic_put(utn);
 }

 static struct udp_tunnel_nic *
@@ -759,6 +784,7 @@ udp_tunnel_nic_alloc(const struct udp_tunnel_nic_info *info,
        utn->n_tables = n_tables;
        INIT_WORK(&utn->work, udp_tunnel_nic_device_sync_work);
        mutex_init(&utn->lock);
+       refcount_set(&utn->refcnt, 1);

        for (i = 0; i < n_tables; i++) {
                utn->entries[i] = kzalloc_objs(*utn->entries[i],
@@ -776,15 +802,6 @@ udp_tunnel_nic_alloc(const struct udp_tunnel_nic_info *info,
        return NULL;
 }

-static void udp_tunnel_nic_free(struct udp_tunnel_nic *utn)
-{
-       unsigned int i;
-
-       for (i = 0; i < utn->n_tables; i++)
-               kfree(utn->entries[i]);
-       kfree(utn);
-}
-
 static int udp_tunnel_nic_register(struct net_device *dev)
 {
        const struct udp_tunnel_nic_info *info = dev->udp_tunnel_nic_info;
@@ -863,6 +880,7 @@ static void
 udp_tunnel_nic_unregister(struct net_device *dev, struct udp_tunnel_nic *utn)
 {
        const struct udp_tunnel_nic_info *info = dev->udp_tunnel_nic_info;
+       bool last = true;

        udp_tunnel_nic_lock(dev);

@@ -889,6 +907,7 @@ udp_tunnel_nic_unregister(struct net_device *dev, struct udp_tunnel_nic *utn)
                        udp_tunnel_drop_rx_info(dev);
                        utn->dev = first->dev;
                        udp_tunnel_nic_unlock(dev);
+                       last = false;
                        goto release_dev;
                }

@@ -901,16 +920,11 @@ udp_tunnel_nic_unregister(struct net_device *dev, struct udp_tunnel_nic *utn)
        udp_tunnel_nic_flush(dev, utn);
        udp_tunnel_nic_unlock(dev);

-       /* Wait for the work to be done using the state, netdev core will
-        * retry unregister until we give up our reference on this device.
-        */
-       if (utn->work_pending)
-               return;
-
-       udp_tunnel_nic_free(utn);
+       udp_tunnel_nic_put(utn);
 release_dev:
        dev->udp_tunnel_nic = NULL;
-       dev_put(dev);
+       if (!last)
+               dev_put(dev);
 }

 static int
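The lifetime rule in the patch above can be sketched as a single-threaded model: the netdev owns one reference, every successful queue_work() takes one more, and each worker run drops the reference its instance owned. The names and the plain-int refcount are hypothetical simplifications of refcount_t and the workqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct utn {
	int refcnt;
	bool queued;	/* stands in for WORK_STRUCT_PENDING */
};

static int frees;	/* counts completed frees, for illustration */

static struct utn *utn_alloc(void)
{
	struct utn *u = calloc(1, sizeof(*u));

	if (u)
		u->refcnt = 1;	/* reference owned by the netdev */
	return u;
}

static void utn_put(struct utn *u)
{
	assert(u->refcnt > 0);
	if (--u->refcnt == 0) {
		free(u);
		frees++;
	}
}

/* queue_work() returns true only when it newly queues the item,
 * so each queued instance owns exactly one reference. */
static void utn_sync(struct utn *u)
{
	if (!u->queued) {
		u->queued = true;
		u->refcnt++;
	}
}

/* The core clears PENDING before the worker runs; the worker drops
 * its instance's reference when it is done. */
static void utn_work_start(struct utn *u) { u->queued = false; }
static void utn_work_finish(struct utn *u) { utn_put(u); }
```

Replaying the problematic sequence (queue, start, requeue, finish, unregister) leaves the refcount at 1, so the object outlives unregister until the requeued instance finishes and drops the last reference.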

^ permalink raw reply

* [PATCH v6 10/10] rust: module: update MAINTAINERS to cover module.rs
From: Alvin Sun @ 2026-06-24 15:00 UTC (permalink / raw)
  To: Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
	Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe, Dave Ertman, Leon Romanovsky, Igor Korotin,
	FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
	linux-pci, Alvin Sun
In-Reply-To: <20260624-fix-fops-owner-v6-0-5295e333cb3e@linux.dev>

Module types now live in `rust/kernel/module.rs` alongside
`rust/kernel/module_param.rs`. Update the MODULE SUPPORT file pattern
from `rust/kernel/module_param.rs` to `rust/kernel/module*.rs` so both
files are covered.

Assisted-by: opencode:glm-5.2
Link: https://lore.kernel.org/rust-for-linux/8ea21b29-9baf-4926-a16f-7d21c5a1a1b8@suse.com
Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index e035a3be797c4..74733de3e41ee 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17984,7 +17984,7 @@ F:	include/linux/module*.h
 F:	kernel/module/
 F:	lib/test_kmod.c
 F:	lib/tests/module/
-F:	rust/kernel/module_param.rs
+F:	rust/kernel/module*.rs
 F:	rust/macros/module.rs
 F:	scripts/module*
 F:	tools/testing/selftests/kmod/
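A quick way to sanity-check the new pattern is POSIX fnmatch() with FNM_PATHNAME, assuming shell-style globbing where * does not cross a '/'. This is only an approximation; scripts/get_maintainer.pl does its own pattern matching.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>

/* Shell-style glob check used to approximate MAINTAINERS F: matching. */
static bool covered(const char *pattern, const char *path)
{
	return fnmatch(pattern, path, FNM_PATHNAME) == 0;
}
```

covered("rust/kernel/module*.rs", ...) accepts both rust/kernel/module.rs and rust/kernel/module_param.rs, while the old literal pattern only matched the latter.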

-- 
2.43.0



^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox