* Re: [PATCH net] dt-bindings: net: renesas,ether: Drop example "ethernet-phy-ieee802.3-c22" fallback
From: Andrew Lunn @ 2026-06-24 15:47 UTC (permalink / raw)
To: Rob Herring (Arm)
Cc: Niklas Söderlund, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Krzysztof Kozlowski, Conor Dooley,
Geert Uytterhoeven, Magnus Damm, Sergei Shtylyov, netdev,
linux-renesas-soc, devicetree, linux-kernel
In-Reply-To: <20260624150250.131966-2-robh@kernel.org>
On Wed, Jun 24, 2026 at 10:02:50AM -0500, Rob Herring (Arm) wrote:
> Fix the Micrel PHY in the example which shouldn't have the
> fallback "ethernet-phy-ieee802.3-c22" compatible:
>
> Documentation/devicetree/bindings/net/renesas,ether.example.dtb: ethernet-phy@1 \
> (ethernet-phy-id0022.1537): compatible: ['ethernet-phy-id0022.1537', 'ethernet-phy-ieee802.3-c22'] is too long
> from schema $id: http://devicetree.org/schemas/net/micrel.yaml
>
> Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Fixes: 37a2fce09001 ("dt-bindings: sh_eth convert bindings to json-schema")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Andrew
^ permalink raw reply
* Re: [RFC net-next 08/15] ipxlat: add translation engine and dispatch core
From: Ralf Lici @ 2026-06-24 15:43 UTC (permalink / raw)
To: Beniamino Galvani
Cc: Toke Høiland-Jørgensen, netdev, Daniel Gröber,
Antonio Quartulli, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, linux-kernel
In-Reply-To: <ajo-MgpZ8jA8YDwi@tp>
On Tue, 23 Jun 2026 10:05:06 +0200, Beniamino Galvani <bgalvani@redhat.com> wrote:
> On Mon, Jun 22, 2026 at 05:56:11PM +0200, Ralf Lici wrote:
> > While reading the BPF programs, two things stood out that would help
> > shape v2. On addressing, both implementations use a single NAT64/PLAT
> > prefix for destinations plus an explicit local_v4<>local_v6 mapping for
> > the host itself. ipxlat today maps both source and destination through
> > one RFC 6052 prefix, so this suggests v2 should probably support
> > explicit 1:1 address mappings (EAM, RFC 7757) alongside prefix
> > embedding. Is a single local mapping enough for your case, or do you
> > foresee needing several?
>
> Based on these:
>
> https://datatracker.ietf.org/doc/html/rfc6877#section-6.3
> https://www.ietf.org/archive/id/draft-ietf-v6ops-claton-14.html#name-obtaining-clat-ipv6-address
>
> there are two cases to consider for CLAT.
>
> If the node doing CLAT extends the IPv4 connectivity downstream, it
> should acquire a dedicated prefix via DHCPv6-PD for the downstream
> network, and use this prefix to generate IPv4-Embedded IPv6 Addresses
> (RFC 6052 2.2) for downlink hosts. In this case, the ipxlat device
> would need to be configured with two prefixes: one is the NAT64 prefix
> used to synthesize IPv6 destinations for IPv4 Internet addresses, and
> the other is the delegated prefix used to synthesize IPv6 source
> addresses for hosts on the downstream IPv4 network. Ideally, the
> ipxlat device should also be aware of the address range of the
> downstream IPv4 network, and check that the source addresses of
> packets belong to that network.
>
> If the node doing CLAT does not extend IPv4 connectivity downstream
> (or it does, but it also uses NAT44), the "downstream network" is
> basically just one host. The CLAT only needs to map a single
> locally-generated IPv4 address (usually in the 192.0.0.0/29 range) to
> a single SLAAC IPv6 address reserved for the CLAT. In this case, the
> ipxlat device would need to know the CLAT IPv4 address, the CLAT IPv6
> address and the NAT64 prefix.
>
> Currently, NetworkManager only uses a single IPv4 address and doesn't
> request a dedicated prefix via DHCPv6-PD for the CLAT. If it needs to
> provide downstream connectivity, it does IPv4 masquerading so that the
> traffic originates from a single IPv4 address. I think the ipxlat
> implementation should also support the delegated-prefix case, as this
> architecture is described in the standards.
>
I see. Your CLAT breakdown makes it clear that the single-prefix model
in this RFC is too narrow for v2. What I am currently thinking is to
shape the config as a far-side prefix plus an optional near-side map.
The 2 CLAT cases then can be represented roughly as:
remote-prefix6 64:ff9b::/96
local-map 192.0.0.1/32 2001:db8:1:2::1234/128
for the single-host CLAT case (which is an explicit 1:1 mapping), and:
remote-prefix6 64:ff9b::/96
local-map 192.0.2.0/24 2001:db8:100:64::/96
for the downstream/delegated-prefix case.
The idea is that remote-prefix6 is the RFC 6052 prefix used for the
NAT64/PLAT side, while local-map optionally describes the near-side
mapping. If local-map is omitted, the translator falls back to the
symmetric v1 behavior where the same RFC 6052 prefix is used for both
source and destination.
On the source-range check you mentioned: it falls out of the near-side
map for free. In the 4->6 direction the source is resolved through
local-map only, so a source outside its IPv4 domain (the /32 or the /24
above) simply has no near-side mapping and is dropped, so no separate
anti-spoofing knob should be needed. Anything broader than "is this a
valid near-side source" I'd leave to routing/nft rather than bake into
the translator.
Names and exact attribute shape are still WIP, but does this capture the
two CLAT cases you had in mind?
> > On the consumer side, is there anything in how NM models a connection
> > that would make a particular kernel model awkward to drive, e.g. needing
> > to attach to an already-managed interface, or conversely being able to
> > create and own a dedicated device? We're still settling the
> > kernel-facing model for v2, so consumer input here is genuinely
> > valuable.
>
> Any of the solutions mentioned in the thread (dedicated device,
> netfilter, LWT) would be fine from NetworkManager's point of
> view. Compared to what we are doing now, they would be a great
> simplification ;)
>
Nice, thanks for confirming!
--
Ralf Lici
Mandelbit Srl
^ permalink raw reply
* Re: [PATCH v3 1/7] net: wwan: t9xx: Add PCIe core
From: Andrew Lunn @ 2026-06-24 15:56 UTC (permalink / raw)
To: jackbb_wu
Cc: Loic Poulain, Sergey Ryazanov, Johannes Berg, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Wen-Zhi Huang, Shi-Wei Yeh, Minano Tseng, Matthias Brugger,
AngeloGioacchino Del Regno, Simon Horman, Jonathan Corbet,
Shuah Khan, linux-kernel, netdev, linux-arm-kernel,
linux-mediatek, linux-doc
In-Reply-To: <20260624-t9xx_driver_v1-v3-1-73ff03f60c48@compal.com>
> From: Jack Wu <jackbb_wu@compal.com>
>
> Registers the T900 device driver with the kernel. Set up all
> the fundamental configurations for the device: PCIe layer,
> Modem Host Cross Core Interface (MHCCIF), Reset Generation
> Unit (RGU), modem common control operations and build
> infrastructure.
>
> * PCIe layer code implements driver probe and removal, MSI-X
> interrupt initialization and de-initialization, and the way
> of resetting the device.
> * MHCCIF provides interrupt channels to communicate events
> such as handshake, PM and port enumeration.
> * RGU provides interrupt channels to generate notifications
> from the device so that the T900 driver could get the
> device reset.
> * Modem common control operations provide the basic read/write
> functions of the device's hardware registers,
> mask/unmask/get/clear functions of the device's interrupt
> registers and inquiry functions of the device's status.
>
> Signed-off-by: Jack Wu <jackbb_wu@compal.com>
> ---
> drivers/net/wwan/Kconfig | 12 +
> drivers/net/wwan/Makefile | 1 +
> drivers/net/wwan/t9xx/Makefile | 10 +
> drivers/net/wwan/t9xx/mtk_dev.h | 108 +++
> drivers/net/wwan/t9xx/pcie/mtk_pci.c | 1049 +++++++++++++++++++++++++
> drivers/net/wwan/t9xx/pcie/mtk_pci.h | 234 ++++++
> drivers/net/wwan/t9xx/pcie/mtk_pci_drv_m9xx.c | 69 ++
> drivers/net/wwan/t9xx/pcie/mtk_pci_reg.h | 70 ++
> 8 files changed, 1553 insertions(+)
>
> diff --git a/drivers/net/wwan/Kconfig b/drivers/net/wwan/Kconfig
> index 88df55d78d90..4cee537c739f 100644
> --- a/drivers/net/wwan/Kconfig
> +++ b/drivers/net/wwan/Kconfig
> @@ -121,6 +121,18 @@ config MTK_T7XX
>
> If unsure, say N.
>
> +config MTK_T9XX
> + tristate "MediaTek PCIe 5G WWAN modem T9xx device"
> + depends on PCI
> + select NET_DEVLINK
> + help
> + Enables MediaTek PCIe based 5G WWAN modem (T9xx series) device.
> +
> + To compile this driver as a module, choose M here: the module will be
> + called mtk_t9xx.
> +
> + If unsure, say N.
> +
> endif # WWAN
>
> endmenu
> diff --git a/drivers/net/wwan/Makefile b/drivers/net/wwan/Makefile
> index 3960c0ae2445..7361eef4c472 100644
> --- a/drivers/net/wwan/Makefile
> +++ b/drivers/net/wwan/Makefile
> @@ -14,3 +14,4 @@ obj-$(CONFIG_QCOM_BAM_DMUX) += qcom_bam_dmux.o
> obj-$(CONFIG_RPMSG_WWAN_CTRL) += rpmsg_wwan_ctrl.o
> obj-$(CONFIG_IOSM) += iosm/
> obj-$(CONFIG_MTK_T7XX) += t7xx/
> +obj-$(CONFIG_MTK_T9XX) += t9xx/
> diff --git a/drivers/net/wwan/t9xx/Makefile b/drivers/net/wwan/t9xx/Makefile
> new file mode 100644
> index 000000000000..6f2dd3f91454
> --- /dev/null
> +++ b/drivers/net/wwan/t9xx/Makefile
> @@ -0,0 +1,10 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +ccflags-y += -I$(src)/pcie
> +ccflags-y += -I$(src)
> +
> +obj-$(CONFIG_MTK_T9XX) += mtk_t9xx.o
> +
> +mtk_t9xx-y := \
> + pcie/mtk_pci.o \
> + pcie/mtk_pci_drv_m9xx.o
> diff --git a/drivers/net/wwan/t9xx/mtk_dev.h b/drivers/net/wwan/t9xx/mtk_dev.h
> new file mode 100644
> index 000000000000..8278a0e2875e
> --- /dev/null
> +++ b/drivers/net/wwan/t9xx/mtk_dev.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0-only
> + *
> + * Copyright (c) 2022, MediaTek Inc.
> + */
> +
> +#ifndef __MTK_DEV_H__
> +#define __MTK_DEV_H__
> +
> +#include <linux/dma-mapping.h>
> +#include <linux/dmapool.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +
> +#define MTK_DEV_STR_LEN 16
> +
> +enum mtk_user_id {
> + MTK_USER_MIN,
> + MTK_USER_CTRL,
> + MTK_USER_DATA,
> + MTK_USER_MAX
> +};
> +
> +enum mtk_dev_evt_h2d {
> + DEV_EVT_H2D_DEVICE_RESET = BIT(2),
> + DEV_EVT_H2D_MAX = BIT(5)
> +};
> +
> +enum mtk_dev_evt_d2h {
> + DEV_EVT_D2H_BOOT_FLOW_SYNC = BIT(4),
> + DEV_EVT_D2H_ASYNC_HS_NOTIFY_SAP = BIT(5),
> + DEV_EVT_D2H_ASYNC_HS_NOTIFY_MD = BIT(6),
> + DEV_EVT_D2H_MAX = BIT(11)
> +};
> +
> +struct mtk_md_dev;
> +
> +struct mtk_dev_ops {
> + u32 (*get_dev_state)(struct mtk_md_dev *mdev);
> + void (*ack_dev_state)(struct mtk_md_dev *mdev, u32 state);
> + u32 (*get_dev_cfg)(struct mtk_md_dev *mdev);
> + int (*register_dev_evt)(struct mtk_md_dev *mdev, u32 dev_evt,
> + int (*evt_cb)(u32 status, void *data), void *data);
> + void (*unregister_dev_evt)(struct mtk_md_dev *mdev, u32 dev_evt);
> + void (*mask_dev_evt)(struct mtk_md_dev *mdev, u32 dev_evt);
> + void (*unmask_dev_evt)(struct mtk_md_dev *mdev, u32 dev_evt);
> + void (*clear_dev_evt)(struct mtk_md_dev *mdev, u32 dev_evt);
> + int (*send_dev_evt)(struct mtk_md_dev *mdev, u32 dev_evt);
> +};
> +
> +/* mtk_md_dev defines the structure of MTK modem device */
> +struct mtk_md_dev {
> + struct device *dev;
> + const struct mtk_dev_ops *dev_ops;
> + void *hw_priv;
> + u32 hw_ver;
> + char dev_str[MTK_DEV_STR_LEN];
> +};
> +
> +static inline u32 mtk_dev_get_dev_state(struct mtk_md_dev *mdev)
> +{
> + return mdev->dev_ops->get_dev_state(mdev);
> +}
> +
> +static inline void mtk_dev_ack_dev_state(struct mtk_md_dev *mdev, u32 state)
> +{
> + return mdev->dev_ops->ack_dev_state(mdev, state);
> +}
> +
> +static inline u32 mtk_dev_get_dev_cfg(struct mtk_md_dev *mdev)
> +{
> + return mdev->dev_ops->get_dev_cfg(mdev);
> +}
> +
> +static inline int mtk_dev_register_dev_evt(struct mtk_md_dev *mdev, u32 dev_evt,
> + int (*evt_cb)(u32 status, void *data), void *data)
> +{
> + return mdev->dev_ops->register_dev_evt(mdev, dev_evt, evt_cb, data);
> +}
> +
> +static inline void mtk_dev_unregister_dev_evt(struct mtk_md_dev *mdev, u32 dev_evt)
> +{
> + mdev->dev_ops->unregister_dev_evt(mdev, dev_evt);
> +}
> +
> +static inline void mtk_dev_mask_dev_evt(struct mtk_md_dev *mdev, u32 dev_evt)
> +{
> + mdev->dev_ops->mask_dev_evt(mdev, dev_evt);
> +}
> +
> +static inline void mtk_dev_unmask_dev_evt(struct mtk_md_dev *mdev, u32 dev_evt)
> +{
> + mdev->dev_ops->unmask_dev_evt(mdev, dev_evt);
> +}
> +
> +static inline void mtk_dev_clear_dev_evt(struct mtk_md_dev *mdev, u32 dev_evt)
> +{
> + mdev->dev_ops->clear_dev_evt(mdev, dev_evt);
> +}
> +
> +static inline int mtk_dev_send_dev_evt(struct mtk_md_dev *mdev, u32 dev_evt)
> +{
> + return mdev->dev_ops->send_dev_evt(mdev, dev_evt);
> +}
> +
> +#endif /* __MTK_DEV_H__ */
> diff --git a/drivers/net/wwan/t9xx/pcie/mtk_pci.c b/drivers/net/wwan/t9xx/pcie/mtk_pci.c
> new file mode 100644
> index 000000000000..c6a7196fcdd6
> --- /dev/null
> +++ b/drivers/net/wwan/t9xx/pcie/mtk_pci.c
> @@ -0,0 +1,1049 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2022, MediaTek Inc.
> + */
> +
> +#include <linux/acpi.h>
> +#include <linux/aer.h>
> +#include <linux/bitfield.h>
> +#include <linux/debugfs.h>
> +#include <linux/delay.h>
> +#include <linux/device.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +
> +#include "mtk_dev.h"
> +#include "mtk_pci.h"
> +#include "mtk_pci_reg.h"
> +
> +#define MTK_PCI_BAR_NUM 6
> +#define MTK_PCI_TRANSPARENT_ATR_SIZE (0x3F)
> +#define MTK_PCI_MINIMUM_ATR_SIZE (0x1000)
> +#define ATR_SIZE_LO32_MASK GENMASK_ULL(31, 0)
> +#define ATR_SIZE_HI32_MASK GENMASK_ULL(63, 32)
> +#define ATR_SIZE_BIAS_FROM_LO32 2
> +#define ATR_ADDR_ALIGN_MASK 0xFFFFF000
> +#define ATR_EN BIT(0)
> +#define ATR_PARAM_OFFSET 16
> +/* Delay between ACPI PXP._OFF and _ON for modem power cycle stabilization */
> +#define MTK_PLDR_POWER_OFF_DELAY_MS 500
> +#define LE32_TO_U32(x) ((__force u32)(__le32)(x))
> +#define SET_HW_BITS(dest, chs, mhccif, dev) \
> + ({ \
> + if ((chs) & (dev)) \
> + (dest) |= FIELD_PREP(mhccif, 1); \
> + })
> +
> +struct mtk_mhccif_cb {
> + struct list_head entry;
> + int (*evt_cb)(u32 status, void *data);
> + void *data;
> + u32 chs;
> +};
> +
> +/**
> + * mtk_pci_setup_atr() - Configure a PCIe address translation rule
> + * @mdev: MTK MD device
> + * @cfg: ATR configuration parameters
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int mtk_pci_setup_atr(struct mtk_md_dev *mdev, struct mtk_atr_cfg *cfg)
> +{
> + struct mtk_pci_priv *priv = mdev->hw_priv;
> + u32 addr, val, size_h, size_l;
> + int atr_size, pos, offset;
> +
> + if (cfg->transparent) {
> + /* No address conversion is performed */
> + atr_size = MTK_PCI_TRANSPARENT_ATR_SIZE;
> + } else {
> + if (cfg->size < MTK_PCI_MINIMUM_ATR_SIZE)
> + cfg->size = MTK_PCI_MINIMUM_ATR_SIZE;
> +
> + if (cfg->src_addr & (cfg->size - 1)) {
> + dev_err((mdev)->dev, "Invalid atr src addr is not aligned to size\n");
> + return -EFAULT;
> + }
> +
> + if (cfg->trsl_addr & (cfg->size - 1)) {
> + dev_err((mdev)->dev,
> + "Invalid atr trsl addr is not aligned to size, %llx, %llx\n",
> + cfg->trsl_addr, cfg->size - 1);
> + return -EFAULT;
> + }
> +
> + size_l = FIELD_GET(ATR_SIZE_LO32_MASK, cfg->size);
> + size_h = FIELD_GET(ATR_SIZE_HI32_MASK, cfg->size);
> + pos = ffs(size_l);
> + if (pos) {
> + atr_size = pos - ATR_SIZE_BIAS_FROM_LO32;
> + } else {
> + pos = ffs(size_h);
> + atr_size = pos + 32 - ATR_SIZE_BIAS_FROM_LO32;
> + }
> + }
> +
> + /* Calculate table offset */
> + offset = ATR_PORT_OFFSET * cfg->port + ATR_TABLE_OFFSET * cfg->table;
> + addr = REG_ATR_PCIE_WIN0_T0_SRC_ADDR_MSB + offset;
> + val = (u32)(cfg->src_addr >> 32);
> + mtk_pci_mac_write32(priv, addr, val);
> +
> + addr = REG_ATR_PCIE_WIN0_T0_SRC_ADDR_LSB + offset;
> + val = (u32)(cfg->src_addr & ATR_ADDR_ALIGN_MASK) | (atr_size << 1) | ATR_EN;
> + mtk_pci_mac_write32(priv, addr, val);
> +
> + addr = REG_ATR_PCIE_WIN0_T0_TRSL_ADDR_MSB + offset;
> + val = (u32)(cfg->trsl_addr >> 32);
> + mtk_pci_mac_write32(priv, addr, val);
> +
> + addr = REG_ATR_PCIE_WIN0_T0_TRSL_ADDR_LSB + offset;
> + val = (u32)(cfg->trsl_addr & ATR_ADDR_ALIGN_MASK);
> + mtk_pci_mac_write32(priv, addr, val);
> +
> + /* TRSL_PARAM */
> + addr = REG_ATR_PCIE_WIN0_T0_TRSL_PARAM + offset;
> + val = (cfg->trsl_param << ATR_PARAM_OFFSET) | cfg->trsl_id;
> + mtk_pci_mac_write32(priv, addr, val);
> +
> + return 0;
> +}
> +
> +/**
> + * mtk_pci_atr_disable() - Disable all PCIe address translation rules
> + * @priv: MTK PCI private data
> + */
> +void mtk_pci_atr_disable(struct mtk_pci_priv *priv)
> +{
> + int port, tbl, offset;
> + u32 val;
> +
> + /* Disable all ATR table for all ports */
> + for (port = ATR_SRC_PCI_WIN0; port <= ATR_SRC_AXIS_3; port++)
> + for (tbl = 0; tbl < ATR_TABLE_NUM_PER_ATR; tbl++) {
> + /* Calculate table offset */
> + offset = ATR_PORT_OFFSET * port + ATR_TABLE_OFFSET * tbl;
> + val = mtk_pci_mac_read32(priv, REG_ATR_PCIE_WIN0_T0_SRC_ADDR_LSB + offset);
> + val = val & (~BIT(0));
> + /* Disable table by SRC_ADDR_L */
> + mtk_pci_mac_write32(priv, REG_ATR_PCIE_WIN0_T0_SRC_ADDR_LSB + offset, val);
> + }
> +}
> +
> +static void mtk_pci_set_msix_merged(struct mtk_pci_priv *priv, int irq_cnt)
> +{
> + mtk_pci_mac_write32(priv, REG_PCIE_CFG_MSIX, ffs(irq_cnt) * 2 - 1);
> +}
> +
> +/**
> + * mtk_pci_get_dev_state() - Read the device state from the modem
> + * @mdev: MTK MD device
> + *
> + * Return: Device state value.
> + */
> +u32 mtk_pci_get_dev_state(struct mtk_md_dev *mdev)
> +{
> + return mtk_pci_mac_read32(mdev->hw_priv, REG_PCIE_DEBUG_DUMMY_7);
> +}
> +
> +/**
> + * mtk_pci_ack_dev_state() - Acknowledge the device state to the modem
> + * @mdev: MTK MD device
> + * @state: State value to acknowledge
> + */
> +void mtk_pci_ack_dev_state(struct mtk_md_dev *mdev, u32 state)
> +{
> + mtk_pci_mac_write32(mdev->hw_priv, REG_PCIE_DEBUG_DUMMY_7, state);
> +}
> +
> +/**
> + * mtk_pci_get_irq_id() - Map an IRQ source to its hardware IRQ ID
> + * @mdev: MTK MD device
> + * @irq_src: IRQ source enum
> + *
> + * Return: IRQ ID on success, -EINVAL on failure.
> + */
> +int mtk_pci_get_irq_id(struct mtk_md_dev *mdev, enum mtk_irq_src irq_src)
> +{
> + struct mtk_pci_priv *priv = mdev->hw_priv;
> + const int *irq_tbl = priv->cfg->irq_tbl;
> + int irq_id = -EINVAL;
> +
> + if (irq_src > MTK_IRQ_SRC_MIN && irq_src < MTK_IRQ_SRC_MAX) {
> + irq_id = irq_tbl[irq_src];
> + if (irq_id < 0 || irq_id >= MTK_IRQ_CNT_MAX)
> + irq_id = -EINVAL;
> + }
> +
> + return irq_id;
> +}
> +
> +/**
> + * mtk_pci_get_virq_id() - Get the Linux virtual IRQ for a hardware IRQ ID
> + * @mdev: MTK MD device
> + * @irq_id: Hardware IRQ ID
> + *
> + * Return: Virtual IRQ number on success, negative error code on failure.
> + */
> +int mtk_pci_get_virq_id(struct mtk_md_dev *mdev, int irq_id)
> +{
> + struct pci_dev *pdev = to_pci_dev(mdev->dev);
> + struct mtk_pci_priv *priv = mdev->hw_priv;
> +
> + if (!priv->irq_cnt || irq_id < 0)
> + return -EINVAL;
> +
> + return pci_irq_vector(pdev, irq_id % priv->irq_cnt);
> +}
> +
> +/**
> + * mtk_pci_register_irq() - Register a callback for a hardware IRQ
> + * @mdev: MTK MD device
> + * @irq_id: Hardware IRQ ID
> + * @irq_cb: Callback function
> + * @data: Private data passed to callback
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int mtk_pci_register_irq(struct mtk_md_dev *mdev, int irq_id,
> + int (*irq_cb)(int irq_id, void *data), void *data)
> +{
> + struct mtk_pci_priv *priv = mdev->hw_priv;
> +
> + if ((irq_id < 0 || irq_id >= MTK_IRQ_CNT_MAX) || !irq_cb)
> + return -EINVAL;
> +
> + if (priv->irq_cb_list[irq_id]) {
> + dev_err((mdev)->dev,
> + "Unable to register irq, irq_id=%d, it's already been register by %ps.\n",
> + irq_id, priv->irq_cb_list[irq_id]);
> + return -EFAULT;
> + }
> + priv->irq_cb_list[irq_id] = irq_cb;
> + priv->irq_cb_data[irq_id] = data;
> +
> + return 0;
> +}
> +
> +/**
> + * mtk_pci_unregister_irq() - Unregister a hardware IRQ callback
> + * @mdev: MTK MD device
> + * @irq_id: Hardware IRQ ID
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int mtk_pci_unregister_irq(struct mtk_md_dev *mdev, int irq_id)
> +{
> + struct mtk_pci_priv *priv = mdev->hw_priv;
> +
> + if (irq_id < 0 || irq_id >= MTK_IRQ_CNT_MAX)
> + return -EINVAL;
> +
> + if (!priv->irq_cb_list[irq_id]) {
> + dev_err((mdev)->dev, "irq_id=%d has not been registered\n", irq_id);
> + return -EFAULT;
> + }
> + priv->irq_cb_list[irq_id] = NULL;
> + priv->irq_cb_data[irq_id] = NULL;
> +
> + return 0;
> +}
> +
> +/**
> + * mtk_pci_mask_irq() - Mask (disable) a hardware IRQ
> + * @mdev: MTK MD device
> + * @irq_id: Hardware IRQ ID
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int mtk_pci_mask_irq(struct mtk_md_dev *mdev, int irq_id)
> +{
> + struct mtk_pci_priv *priv = mdev->hw_priv;
> +
> + if (irq_id < 0 || irq_id >= MTK_IRQ_CNT_MAX ||
> + priv->irq_type != PCI_IRQ_MSIX) {
> + dev_err(mdev->dev, "Failed to mask irq: input irq_id=%d\n", irq_id);
> + return -EINVAL;
> + }
> +
> + mtk_pci_mac_write32(priv, REG_IMASK_HOST_MSIX_CLR_GRP0_0, BIT(irq_id));
> +
> + return 0;
> +}
> +
> +/**
> + * mtk_pci_unmask_irq() - Unmask (enable) a hardware IRQ
> + * @mdev: MTK MD device
> + * @irq_id: Hardware IRQ ID
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int mtk_pci_unmask_irq(struct mtk_md_dev *mdev, int irq_id)
> +{
> + struct mtk_pci_priv *priv = mdev->hw_priv;
> +
> + if (irq_id < 0 || irq_id >= MTK_IRQ_CNT_MAX ||
> + priv->irq_type != PCI_IRQ_MSIX) {
> + dev_err(mdev->dev, "Failed to unmask irq: input irq_id=%d\n", irq_id);
> + return -EINVAL;
> + }
> +
> + mtk_pci_mac_write32(priv, REG_IMASK_HOST_MSIX_SET_GRP0_0, BIT(irq_id));
> +
> + return 0;
> +}
> +
> +/**
> + * mtk_pci_clear_irq() - Clear (acknowledge) a hardware IRQ
> + * @mdev: MTK MD device
> + * @irq_id: Hardware IRQ ID
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int mtk_pci_clear_irq(struct mtk_md_dev *mdev, int irq_id)
> +{
> + struct mtk_pci_priv *priv = mdev->hw_priv;
> +
> + if (irq_id < 0 || irq_id >= MTK_IRQ_CNT_MAX ||
> + priv->irq_type != PCI_IRQ_MSIX) {
> + dev_err(mdev->dev, "Failed to clear irq: input irq_id=%d\n", irq_id);
> + return -EINVAL;
> + }
> +
> + mtk_pci_mac_write32(priv, REG_MSIX_ISTATUS_HOST_GRP0_0, BIT(irq_id));
> +
> + return 0;
> +}
> +
> +static u32 mtk_pci_ext_d2h_evt_hw_bits(u32 chs)
> +{
> + u32 hw_bits = 0;
> +
> + SET_HW_BITS(hw_bits, chs, MHCCIF_EP2RC_EVT_BOOT_FLOW_SYNC,
> + DEV_EVT_D2H_BOOT_FLOW_SYNC);
> + SET_HW_BITS(hw_bits, chs, MHCCIF_EP2RC_EVT_ASYNC_HS_NOTIFY_SAP,
> + DEV_EVT_D2H_ASYNC_HS_NOTIFY_SAP);
> + SET_HW_BITS(hw_bits, chs, MHCCIF_EP2RC_EVT_ASYNC_HS_NOTIFY_MD,
> + DEV_EVT_D2H_ASYNC_HS_NOTIFY_MD);
> +
> + return LE32_TO_U32(cpu_to_le32(hw_bits));
> +}
> +
> +static u32 mtk_pci_ext_d2h_evt_chs(u32 hw_bits)
> +{
> + u32 chs = 0;
> +
> + if (!hw_bits)
> + return chs;
> +
> + chs = FIELD_PREP(DEV_EVT_D2H_BOOT_FLOW_SYNC,
> + FIELD_GET(MHCCIF_EP2RC_EVT_BOOT_FLOW_SYNC, hw_bits)) |
> + FIELD_PREP(DEV_EVT_D2H_ASYNC_HS_NOTIFY_SAP,
> + FIELD_GET(MHCCIF_EP2RC_EVT_ASYNC_HS_NOTIFY_SAP, hw_bits)) |
> + FIELD_PREP(DEV_EVT_D2H_ASYNC_HS_NOTIFY_MD,
> + FIELD_GET(MHCCIF_EP2RC_EVT_ASYNC_HS_NOTIFY_MD, hw_bits));
> +
> + return chs;
> +}
> +
> +/**
> + * mtk_pci_register_ext_evt() - Register a callback for MHCCIF device events
> + * @mdev: MTK MD device
> + * @chs: Bitmask of event channels to register
> + * @evt_cb: Callback function
> + * @data: Private data passed to callback
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int mtk_pci_register_ext_evt(struct mtk_md_dev *mdev, u32 chs,
> + int (*evt_cb)(u32 status, void *data), void *data)
> +{
> + struct mtk_pci_priv *priv = mdev->hw_priv;
> + struct mtk_mhccif_cb *cb;
> + int ret = 0;
> +
> + if (!chs || !evt_cb)
> + return -EINVAL;
> +
> + spin_lock_bh(&priv->mhccif_lock);
> + list_for_each_entry(cb, &priv->mhccif_cb_list, entry) {
> + if (cb->chs & chs) {
> + ret = -EFAULT;
> + dev_err((mdev)->dev,
> + "Unable to register evt, intersection: chs=0x%08x&0x%08x cb=%ps\n",
> + chs, cb->chs, cb->evt_cb);
> + goto err_spin_unlock;
> + }
> + }
> + cb = devm_kzalloc(mdev->dev, sizeof(*cb), GFP_ATOMIC);
> + if (!cb) {
> + ret = -ENOMEM;
> + goto err_spin_unlock;
> + }
> + cb->evt_cb = evt_cb;
> + cb->data = data;
> + cb->chs = chs;
> + list_add_tail(&cb->entry, &priv->mhccif_cb_list);
> +err_spin_unlock:
> + spin_unlock_bh(&priv->mhccif_lock);
> +
> + return ret;
> +}
> +
> +/**
> + * mtk_pci_unregister_ext_evt() - Unregister an MHCCIF device event callback
> + * @mdev: MTK MD device
> + * @chs: Bitmask of event channels to unregister
> + */
> +void mtk_pci_unregister_ext_evt(struct mtk_md_dev *mdev, u32 chs)
> +{
> + struct mtk_pci_priv *priv = mdev->hw_priv;
> + struct mtk_mhccif_cb *cb, *next;
> +
> + if (!chs)
> + return;
> +
> + spin_lock_bh(&priv->mhccif_lock);
> + list_for_each_entry_safe(cb, next, &priv->mhccif_cb_list, entry) {
> + if (cb->chs == chs) {
> + list_del(&cb->entry);
> + devm_kfree(mdev->dev, cb);
> + goto out;
> + }
> + }
> + dev_warn((mdev)->dev,
> + "Unable to unregister evt, no chs=0x%08x has been registered.\n", chs);
> +out:
> + spin_unlock_bh(&priv->mhccif_lock);
> +}
> +
> +/**
> + * mtk_pci_mask_ext_evt() - Mask (disable) MHCCIF device events
> + * @mdev: MTK MD device
> + * @chs: Bitmask of event channels to mask
> + */
> +void mtk_pci_mask_ext_evt(struct mtk_md_dev *mdev, u32 chs)
> +{
> + struct mtk_pci_priv *priv = mdev->hw_priv;
> + u32 hw_bits = mtk_pci_ext_d2h_evt_hw_bits(chs);
> +
> + mtk_pci_write32(mdev, priv->cfg->mhccif_rc_base_addr +
> + MHCCIF_EP2RC_SW_INT_EAP_MASK_SET, hw_bits);
> +}
> +
> +/**
> + * mtk_pci_unmask_ext_evt() - Unmask (enable) MHCCIF device events
> + * @mdev: MTK MD device
> + * @chs: Bitmask of event channels to unmask
> + */
> +void mtk_pci_unmask_ext_evt(struct mtk_md_dev *mdev, u32 chs)
> +{
> + struct mtk_pci_priv *priv = mdev->hw_priv;
> + u32 hw_bits = mtk_pci_ext_d2h_evt_hw_bits(chs);
> +
> + mtk_pci_write32(mdev, priv->cfg->mhccif_rc_base_addr +
> + MHCCIF_EP2RC_SW_INT_EAP_MASK_CLR, hw_bits);
> +}
> +
> +/**
> + * mtk_pci_clear_ext_evt() - Clear (acknowledge) MHCCIF device events
> + * @mdev: MTK MD device
> + * @chs: Bitmask of event channels to clear
> + */
> +void mtk_pci_clear_ext_evt(struct mtk_md_dev *mdev, u32 chs)
> +{
> + struct mtk_pci_priv *priv = mdev->hw_priv;
> + u32 hw_bits = mtk_pci_ext_d2h_evt_hw_bits(chs);
> +
> + mtk_pci_write32(mdev, priv->cfg->mhccif_rc_base_addr +
> + MHCCIF_EP2RC_SW_INT_ACK, hw_bits);
> +}
> +
> +static u32 mtk_pci_ext_h2d_evt_hw_bits(u32 chs)
> +{
> + u32 hw_bits = 0;
> +
> + SET_HW_BITS(hw_bits, chs, MHCCIF_RC2EP_EVT_DEVICE_RESET,
> + DEV_EVT_H2D_DEVICE_RESET);
> + return LE32_TO_U32(cpu_to_le32(hw_bits));
> +}
> +
> +/**
> + * mtk_pci_send_ext_evt() - Send an MHCCIF event to the modem
> + * @mdev: MTK MD device
> + * @ch: Event channel to trigger (must be a single bit)
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int mtk_pci_send_ext_evt(struct mtk_md_dev *mdev, u32 ch)
> +{
> + struct mtk_pci_priv *priv = mdev->hw_priv;
> + u32 rc_base, hw_bits;
> +
> + rc_base = priv->cfg->mhccif_rc_base_addr;
> +
> + /* Only allow one ch to be triggered at a time */
> + if (!is_power_of_2(ch)) {
> + dev_err((mdev)->dev, "Unsupported ext evt ch=0x%08x\n", ch);
> + return -EINVAL;
> + }
> +
> + hw_bits = mtk_pci_ext_h2d_evt_hw_bits(ch);
> + mtk_pci_write32(mdev, rc_base + MHCCIF_RC2EP_SW_BSY, hw_bits);
> + mtk_pci_write32(mdev, rc_base + MHCCIF_RC2EP_SW_TCHNUM, ffs(hw_bits) - 1);
> + return 0;
> +}
> +
> +static u32 mtk_pci_get_ext_evt_hw_status(struct mtk_md_dev *mdev)
> +{
> + struct mtk_pci_priv *priv = mdev->hw_priv;
> +
> + return mtk_pci_read32(mdev, priv->cfg->mhccif_rc_base_addr +
> + MHCCIF_EP2RC_SW_INT_STS);
> +}
> +
> +/**
> + * mtk_pci_fldr() - Perform a Function Level Device Reset via ACPI _RST
> + * @mdev: MTK MD device
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int mtk_pci_fldr(struct mtk_md_dev *mdev)
> +{
> +#ifdef CONFIG_ACPI
...
> +#else /* !CONFIG_ACPI */
> + dev_err((mdev)->dev, "Unsupported, CONFIG ACPI hasn't been set to 'y'\n");
Why not just have the Kconfig depend on ACPI?
> + if (ret) {
> + dev_err((mdev)->dev, "Failed to register mhccif_irq callback\n");
Why the () around mdev?
Andrew
---
pw-bot: cr
^ permalink raw reply
* Re: [PATCH net-next v5 1/4] dpll: add DPLL_PIN_TYPE_INT_NCO pin type
From: Vadim Fedorenko @ 2026-06-24 15:57 UTC (permalink / raw)
To: Ivan Vecera, Kubalewski, Arkadiusz, Jiri Pirko, Jakub Kicinski
Cc: netdev@vger.kernel.org, Jiri Pirko, David S. Miller,
Donald Hunter, Eric Dumazet, Schmidt, Michal, Paolo Abeni,
Vaananen, Pasi, Oros, Petr, Prathosh Satish, Simon Horman,
linux-kernel@vger.kernel.org
In-Reply-To: <23e47140-f69f-451d-9154-29071130c11c@redhat.com>
On 19/06/2026 18:07, Ivan Vecera wrote:
> On 6/17/26 1:59 PM, Kubalewski, Arkadiusz wrote:
>>> From: Ivan Vecera <ivecera@redhat.com>
>>> Sent: Monday, June 15, 2026 2:00 PM
>>>
>>> On 6/11/26 2:09 PM, Jiri Pirko wrote:
>>>> Wed, Jun 10, 2026 at 05:45:46PM +0200, ivecera@redhat.com wrote:
>>>>> On 6/10/26 3:04 PM, Kubalewski, Arkadiusz wrote:
>>>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>>>> Sent: Tuesday, June 9, 2026 4:59 PM
>>>>>>>
>>>>>>> On 6/9/26 4:00 PM, Kubalewski, Arkadiusz wrote:
>>>>>>>>> From: Jiri Pirko <jiri@resnulli.us>
>>>>>>>>> Sent: Tuesday, June 9, 2026 10:51 AM
>>>>>>>>>
>>>>>>>>> Mon, Jun 08, 2026 at 07:03:46PM +0200,
>>>>>>>>> arkadiusz.kubalewski@intel.com
>>>>>>>>> wrote:
>>>>>>>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>>>>>>>> Sent: Monday, June 8, 2026 5:48 PM
>>>>>>>>>>>
>>>>>>>>>>> On 6/8/26 4:43 PM, Kubalewski, Arkadiusz wrote:
>>>>>>>>>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>>>>>>>>>> Sent: Sunday, May 31, 2026 9:44 PM ...
>>>>>>>>>>>>> -
>>>>>>>>>>>>> name: gnss
>>>>>>>>>>>>> doc: GNSS recovered clock
>>>>>>>>>>>>> + -
>>>>>>>>>>>>> + name: int-nco
>>>>>>>>>>>>> + doc: |
>>>>>>>>>>>>> + Device internal numerically controlled oscillator.
>>>>>>>>>>>>> + When connected as a DPLL input, the DPLL enters NCO
>>>>>>>>>>>>> mode
>>>>>>>>>>>>> + where the output frequency is adjusted by the host
>>>>>>>>>>>>> via
>>>>>>>>>>>>> + the PTP clock interface.
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Ivan!
>>>>>>>>>>>>
>>>>>>>>>>>> How would you control this in case of automatic mode dpll?
>>>>>>>>>>>> Automatic mode DPLL shall be controlled on HW level, such pin
>>>>>>>>>>>> brakes that rule and requires some driver magic to show it is
>>>>>>>>>>>> higher priority then the rest of the pins?
>>>>>>>>>>>
>>>>>>>>>>> The NCO pin can be connected only in manual mode. In other words
>>>>>>>>>>> a
>>>>>>>>>>> DPLL in automatic mode cannot select NCO pin (switch to NCO
>>>>>>>>>>> mode)
>>>>>>>>>>> by
>>>>>>>>>>> its own.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Being picky on DPLL_MODE for enabling feature is not something we
>>>>>>>>>> can allow if it is not related to HW limitation, is it?
>>>>>>>>>> Could you please elaborate why it is not possible for AUTOMATIC
>>>>>>>>>> mode?
>>>>>>>>>
>>>>>>>>> In automatic mode, the pin selection logic is defined upon prio. I
>>>>>>>>> can imagine that if NCO pin has the highest prio of the available
>>>>>>>>> ones, it gets picked. I would be aligned 100% with automatic mode
>>>>>>>>> behaviour.
>>>>>>>>> Is there a real usecase for it?
>>>>>>>>>
>>>>>>>>> [..]
>>>>>>>>
>>>>>>>> This is not true. AUTOMATIC mode is HW solution, SW driver ONLY
>>>>>>>> configures priorities on the inputs, not manages the active inputs.
>>>>>>>> This brakes that behavior, the SW driver would have to manually
>>>>>>>> override the AUTMATIC mode to be fed from such NCO pin as it
>>>>>>>> doesn't
>>>>>>>> exists on it's priority list, HW cannot pick or use it.
>>>>>>>
>>>>>>> Correct, AUTO mode is hardware feature and it should not be emulated
>>>>>>> by a
>>>>>>> driver. If the hardware does not support it then the switching
>>>>>>> between
>>>>>>> input references should be done by userspace (by monitoring ffo,
>>>>>>> phase_offset, operstate).
>>>>>>>
>>>>>>
>>>>>> Yes, exactly, so for AUTOMATIC mode HW it will not be possible to
>>>>>> create
>>>>>> such pin, which means that NCO pin would serve only a MANUAL mode
>>>>>> implementation.
>>>>>> Basically this is something we shall not allow to happen. DPLL API
>>>>>> should be designed to cover the case where AUTO mode is able to
>>>>>> implement
>>>>>> all features consistently.
>>>>>
>>>>> If you don't like the proposal from Jiri (NCO switch driven by NCO pin
>>>>> priority -> highest==enter_nco else leave_nco) then it could be
>>>>> possible
>>>>> to handle the switching by allowing the state 'connected' in AUTO mode
>>>>> for the NCO pin type. Then the implementation will be the same for
>>>>> both
>>>>> selection modes.
>>>>>
>>>>> Only difference would be that a user does not need to switch the
>>>>> device
>>>> >from the AUTO to MANUAL mode.
>>>>>
>>>>>>>> The real use case is that any DPLL can switch the mode to this one
>>>>>>>> instead of implementing MANUAL mode just to use the feature with a
>>>>>>>> 'virtual' pin.
>>>>>>>
>>>>>>> I don't expect this... but it is up to a driver. I don't plan such
>>>>>>> functionality in zl3073x as the NCO pin does not expose prio_get()
>>>>>>> and
>>>>>>> prio_set() callbacks - so it is clear that this pin cannot be
>>>>>>> part of
>>>>>>> the
>>>>>>> automatic selection.
>>>>>>>
>>>>>>> Ivan
>>>>>>
>>>>>> There is a difference between particular HW and API capabilities,
>>>>>> with
>>>>>> the
>>>>>> proposed API we would disallow the possibility of such implementation
>>>>>> for
>>>>>> existing HW variants.
>>>>>>
>>>>>> DPLL NCO MODE would allow that but as pointed here by Ivan and by
>>>>>> Jiri
>>>>>> in
>>>>>> the other email it would also require the extra implementation for
>>>>>> some
>>>>>> configuration - device level phase/ffo handling.
>>>>>>
>>>>>> To summarize it all, I don't have such simple solution for it.
>>>>>>
>>>>>> First thing that comes to my mind is to combine both approaches.
>>>>>> Make it possible for AUTMATIC mode to also set "CONNECTED" state
>>>>>> on certain kind of "OVERRIDE" pins, where it could be determined by
>>>>>> the type of PIN and embed that logic into the DPLL subsystem.
>>>>>
>>>>> The possible states for particual pins are now handled at a driver
>>>>> level
>>>>> so the driver decides if the requested state is correct or not. So it
>>>>> could be easy to implement this.
>>>>>
>>>>> For auto mode allowed states:
>>>>> - input references: selectable / disconnected
>>>>> - nco pin: connected / disconnected
>>>>>
>>>>>> Basically, if driver registers such NCO pin it would be always
>>>>>> selected
>>>>>> manually, and in such case all the other pins are going to
>>>>>> disconnected
>>>>>> state while DPLL mode is also a "OVERRIDE" or something like it.
>>>>>
>>>>> I would leave this decision on the driver level... Imagine the
>>>>> potential
>>>>> HW that would allow to switch NCO mode if there is no valid input
>>>>> reference.
>>>>>
>>>>> Example:
>>>>>
>>>>> REF0 (prio 0) -> +------+ -> OUT0
>>>>> REF1 (prio 1) -> | DPLL | -> ...
>>>>> NCO (prio 2) -> +------+ -> OUTn
>>>>>
>>>>> Such HW would prefer REF0 or REF1 and lock to one of them if they are
>>>>> qualified. But if they are NOT, then it switches to NCO mode.
>>
>> Now you said yourself "NCO mode" ... I agree that it would be a mode in
>> that case. Where instead of running on regular/built in XO dpll would run
>> on NCO and user could select it, and this would be addition to regular
>> behavior.
>>
>> I also agree that the pin approach might be better/easier to use,
>> assuming
>> frequency offset for all the outputs given dpll drives, it makes more
>> sense
>> to have it configurable on input side.
>
> +1
>
>>>>>
>>>>> In this situation the relevant driver would allow to configure
>>>>> priority
>>>>> and state 'selectable' for this NCO pin.
>>>>>
>>>>>> Perhaps the pin type could include OVERRIDE in it's name to make it
>>>>>> less
>>>>>> confusing and needs some extra documentation.
>>>>>>
>>>>>> Thoughts?
>>>>> I think _INT_ is ok. In the case of TYPE_INT_OSCILLATOR it is also
>>>>> obvious that it is not a standard input reference.
>>>>>
>>>>> Jiri, Vadim, Arek, thoughts?
>>>>
>>>> I agree with you, the driver should have the flexibility to implement
>>>> this according to his/hw's needs/capabilities. If it implements prio
>>>> selection in AUTO mode, let it have it. If it implements manual NCO pin
>>>> selection in AUTO mode using connected/disconnected override, let it
>>>> have it.
>>
>> I don't know 'current' HW that is capable of using AUTO mode as a part of
>> HW-based priority source selection and use such NCO input..
>> But as already explained above, this is special mode of regular XO, which
>> allows DPLL's output frequency offset configuration.
>
> Lets keep this available for potential future HW. I can imagine a
> situation where a user will prefer an automatic switch to NCO mode
> if there is no qualified input reference - automatic switch means
> that HW will support this (not emulated by the driver).
>
>>>>
>>>> Moreover, I actually like the "override" capability for pins in AUTO
>>>> mode in general. It may be handy for other usecases as well.
>>>>
>>> Arek? Vadim?
>>>
>>> Thanks,
>>> Ivan
>>
>> Agree, 'override' capability of a pin would be the way to go for this and
>> other similar further cases.
>>
>> I believe a single approach on this would be best, I mean if AUTO mode
>> needs a capability, to switch from regular behavior to 'OVERRIDE', and
>> 'OVERRIDE' is only pin capability that allows such behavior for AUTO
>> mode, then similar approach should be used on MANUAL mode, to make
>> userspace know that such pin is always available to set "CONNECTED"
>> and make the userspace implementation consistent on enabling it no matter
>> if AUTO or MANUAL mode dpll.
>
> Proposal:
> 1) new pin capability
> - name: state-connected-override
> - doc: pin state can be changed to connected in any DPLL mode
>
> 2) new NCO pin type to switch the DPLL to NCO mode when connected
>
> 3) automatic-only DPLL
> - should expose NCO pin with state-connected-override capability
>
> 4) manual-only DPLL
> - does not need to expose NCO pin with state-connected-override cap
>
> 5) dual-mode DPLL (supporting mode switching)
> - if it exposes NCO pin with the override cap then it has to support
> switching to NCO mode directly from AUTO mode
> - if does not expose NCO pin with the override cap then a user MUST
> switch the DPLL mode from AUTO to MANUAL to be able to make NCO
> pin connected to the DPLL
I still don't see good reasoning for the pin. Even this sentence says
"DPLL mode" which keeps me thinking whether we have to move it to a
special DPLL mode. All these items look like overcomplication of a
simple function of the device itself. DPLL can be either in the closed
loop when one of the pins provides a signal to align to, or in the open
loop meaning that software can control adjustments to phase/frequency.
But it's definitely a property of the device, and it's not a pin in any
kind...
>
> Vadim, Jiri, Arek - thoughts?
>
> Thanks,
> Ivan
>
^ permalink raw reply
* Re: [PATCH v3 2/7] net: wwan: t9xx: Add control plane transaction layer
From: Andrew Lunn @ 2026-06-24 16:00 UTC (permalink / raw)
To: jackbb_wu
Cc: Loic Poulain, Sergey Ryazanov, Johannes Berg, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Wen-Zhi Huang, Shi-Wei Yeh, Minano Tseng, Matthias Brugger,
AngeloGioacchino Del Regno, Simon Horman, Jonathan Corbet,
Shuah Khan, linux-kernel, netdev, linux-arm-kernel,
linux-mediatek, linux-doc
In-Reply-To: <20260624-t9xx_driver_v1-v3-2-73ff03f60c48@compal.com>
> +static int __init mtk_common_drv_init(void)
> +{
> + return 0;
> +}
> +module_init(mtk_common_drv_init);
> +
> +static void __exit mtk_common_drv_exit(void)
> +{
> +}
> +module_exit(mtk_common_drv_exit);
Since these don't do anything, they should not be needed.
> @@ -467,6 +468,7 @@ static u32 mtk_pci_ext_h2d_evt_hw_bits(u32 chs)
>
> SET_HW_BITS(hw_bits, chs, MHCCIF_RC2EP_EVT_DEVICE_RESET,
> DEV_EVT_H2D_DEVICE_RESET);
> +
> return LE32_TO_U32(cpu_to_le32(hw_bits));
Please don't add white space like this. I assume a previous patch
added this code, so move this to that patch.
> @@ -908,13 +910,11 @@ static int mtk_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> struct mtk_md_dev *mdev;
> int ret;
>
> - mdev = devm_kzalloc(dev, sizeof(*mdev), GFP_KERNEL);
> + mdev = mtk_dev_alloc(dev, &pci_hw_ops);
> if (!mdev) {
> ret = -ENOMEM;
> goto log_err;
> }
> - mdev->dev_ops = &pci_hw_ops;
> - mdev->dev = dev;
>
> priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> if (!priv) {
> @@ -991,7 +991,7 @@ static int mtk_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> free_priv_data:
> devm_kfree(dev, priv);
> free_cntx_data:
> - devm_kfree(dev, mdev);
> + mtk_dev_free(mdev);
Why are you removing devm_ calls?
Andrew
^ permalink raw reply
* Re: [PATCH net v4 0/5] nfc: fix multiple OOB reads in NCI and LLCP parsing paths
From: David Heidelberg @ 2026-06-24 16:11 UTC (permalink / raw)
To: Lekë Hapçiu, oe-linux-nfc
Cc: davem, edumazet, kuba, pabeni, krzk, horms, linux-kernel, netdev
In-Reply-To: <20260424180151.3808557-1-snowwlake@icloud.com>
On 24/04/2026 20:01, Lekë Hapçiu wrote:
> This series fixes five out-of-bounds / underflow bugs in the kernel NFC
> stack. All are reachable from a remote NFC peer that the local stack
> has already associated with; in the LLCP cases the peer only needs to
> send a malformed frame.
>
> 1/5 nci: u8 underflow in nci_store_general_bytes_nfc_dep() lets the
> attacker-controlled atr_res_len skip the GT-offset subtraction
> and cause an OOB read/write against general_bytes[].
> 2/5 llcp: parse_gb_tlv() / parse_connection_tlv() trust the TLV
> length byte without checking remaining buffer, and the tlv16
> accessors read past the end when length < 2.
> 3/5 llcp: nfc_llcp_recv_snl() has the same TLV-length trust bug, and
> its SDRES handler uses an unbounded "%.16s" pr_debug() that
> walks past service_name_len.
> 4/5 llcp: nfc_llcp_recv_dm() reads skb->data[3] without checking
> skb->len, giving a 1-byte heap OOB read.
> 5/5 llcp: nfc_llcp_connect_sn() walks the TLV array with no length
> validation; a crafted CONNECT frame drops it into OOB reads /
> an unbounded service-name pointer.
>
> The series applies on top of net/main.
>
> Lekë Hapçiu (5):
> nfc: nci: fix u8 underflow in nci_store_general_bytes_nfc_dep
> nfc: llcp: fix TLV parsing in parse_gb_tlv and parse_connection_tlv
> nfc: llcp: fix TLV parsing OOB in nfc_llcp_recv_snl
> nfc: llcp: fix OOB read of DM reason byte in nfc_llcp_recv_dm
> nfc: llcp: fix TLV parsing OOB in nfc_llcp_connect_sn
>
> net/nfc/llcp_commands.c | 24 ++++++++++++++++++++++--
> net/nfc/llcp_core.c | 35 ++++++++++++++++++++++++++++++++---
> net/nfc/nci/ntf.c | 6 ++++++
> 3 files changed, 60 insertions(+), 5 deletions(-)
>
Hello Lekë,
could you please rebase this series against NFC for-linus branch [1]?
Likely some checks has been added meanwhile, but I would love to get the
remaining ones in!
Don't forget also add NFC mailing list oe-linux-nfc@lists.linux.dev .
Thank you
David
^ permalink raw reply
* Re: [PATCH v3 3/7] net: wwan: t9xx: Add control DMA interface
From: Andrew Lunn @ 2026-06-24 16:15 UTC (permalink / raw)
To: jackbb_wu
Cc: Loic Poulain, Sergey Ryazanov, Johannes Berg, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Wen-Zhi Huang, Shi-Wei Yeh, Minano Tseng, Matthias Brugger,
AngeloGioacchino Del Regno, Simon Horman, Jonathan Corbet,
Shuah Khan, linux-kernel, netdev, linux-arm-kernel,
linux-mediatek, linux-doc
In-Reply-To: <20260624-t9xx_driver_v1-v3-3-73ff03f60c48@compal.com>
> diff --git a/drivers/net/wwan/t9xx/pcie/mtk_cldma.c b/drivers/net/wwan/t9xx/pcie/mtk_cldma.c
> +static inline void mtk_cldma_clr_bd_dsc(struct cldma_drv_info *drv_info,
> + struct bd_dsc *bd_dsc_pool, int nr_bds)
No inline functions in C files. Please let the compiler decide.
> +static int mtk_cldma_reload_rx_skb(struct mtk_md_dev *mdev, struct rxq *rxq,
> + struct rx_req *req)
> +{
> + struct sk_buff *tail = NULL;
> + struct bd_dsc *bd_dsc;
> + int nr_bds;
> + int i, ret;
> +
> + nr_bds = rxq->nr_bds;
> +
> + for (i = 0; i < nr_bds; i++) {
> + bd_dsc = req->bd_dsc_pool + i;
> + bd_dsc->skb = __dev_alloc_skb(req->frag_size, GFP_KERNEL);
> + if (!bd_dsc->skb) {
> + dev_warn((mdev)->dev, "Failed to alloc SKB\n");
You might want to rate limit this, and the other similar messages in
the data path, otherwise it could be a DOS.
> +u32 mtk_cldma_stop_queue(struct cldma_drv_info *drv_info, enum mtk_tx_rx dir, u32 qno)
> +{
> + u32 val = (qno == ALLQ) ? qno : BIT(qno);
> + struct cldma_hw_regs *hw_regs;
> + unsigned int active;
> + int cnt = 0;
> + int base;
> + u32 addr;
> +
> + hw_regs = drv_info->hw_regs;
> + base = drv_info->base_addr;
> +
> + if (dir == DIR_TX)
> + addr = base + hw_regs->reg_cldma_ul_stop_cmd;
> + else
> + addr = base + hw_regs->reg_cldma_so_stop_cmd;
> +
> + mtk_pci_write32(drv_info->mdev, addr, val);
> +
> + do {
> + active = drv_info->drv_ops->cldma_queue_status(drv_info, dir, qno);
> + if (active == LINK_ERROR_VAL || !active)
> + break;
> + usleep_range(WAIT_QUEUE_STOP, 2 * WAIT_QUEUE_STOP);
> + } while (++cnt < 10);
Please use one of the helpers from iopoll.h. Any loops waiting for an
event to happen should use those macros, since the open code
implementation is often wrong.
> +static int mtk_ctrl_trb_srv_init(struct mtk_ctrl_trans *trans)
> +{
> + struct srv_que *srv_que;
> + struct trb_srv *srv;
> + int i, j;
> + int ret;
> +
> + for (i = 0; i < trans->trb_srv_num; i++) {
> + srv = devm_kzalloc(trans->mdev->dev, sizeof(*srv), GFP_KERNEL);
> + if (!srv) {
> + ret = -ENOMEM;
> + goto err_free_srv;
> + }
> +
> + srv->trans = trans;
> + srv->srv_id = i;
> + trans->trb_srv[i] = srv;
> +
> + init_waitqueue_head(&srv->trb_waitq);
> + for (j = 0; j < NR_CLDMA; j++)
> + INIT_LIST_HEAD(&srv->srv_q_list[j]);
> + }
> +
> + for (i = 0; i < NR_CLDMA; i++)
> + for (j = 0; j < HW_QUE_NUM; j++) {
> + if (trans->srv_cfg[i][j] < 0 ||
> + trans->srv_cfg[i][j] >= trans->trb_srv_num)
> + trans->srv_cfg[i][j] = 0;
> + srv_que = devm_kzalloc(trans->mdev->dev, sizeof(*srv_que), GFP_KERNEL);
> + if (!srv_que) {
> + ret = -ENOMEM;
> + goto err_free_srv_que;
> + }
> + srv_que->hif_id = i;
> + srv_que->qno = j;
> + list_add_tail(&srv_que->list,
> + &trans->trb_srv[trans->srv_cfg[i][j]]->srv_q_list[i]);
> + }
> +
> + for (i = 0; i < trans->trb_srv_num; i++) {
> + trans->trb_srv[i]->trb_thread = kthread_run(mtk_ctrl_trb_thread, trans->trb_srv[i],
> + "mtk_trb_srv%d_%s", i,
> + trans->mdev->dev_str);
> + if (IS_ERR(trans->trb_srv[i]->trb_thread)) {
> + ret = PTR_ERR(trans->trb_srv[i]->trb_thread);
> + trans->trb_srv[i]->trb_thread = NULL;
> + goto err_stop_kthread;
> + }
> + }
> +
> + return 0;
> +err_stop_kthread:
> + while (--i >= 0)
> + kthread_stop(trans->trb_srv[i]->trb_thread);
> +err_free_srv_que:
> + for (i = 0; i < trans->trb_srv_num; i++) {
> + for (j = 0; j < NR_CLDMA; j++) {
> + struct srv_que *next_srv_que;
> +
> + list_for_each_entry_safe(srv_que, next_srv_que,
> + &trans->trb_srv[i]->srv_q_list[j], list) {
> + list_del(&srv_que->list);
> + devm_kfree(trans->mdev->dev, srv_que);
It is unusual to see devm_kfree(). Why is it needed?
> +static unsigned int ctrl_port_chl_mtu;
Is this a global variable? Why is it not part of priv?
> +module_param(ctrl_port_chl_mtu, uint, 0644);
> +MODULE_PARM_DESC(ctrl_port_chl_mtu, "This is used to config the ctrl port mtu!\n");
Ah. No modules parameters please. If this is an MTU, why not use the
normal networking interfaces to set the MTU?
Andrew
^ permalink raw reply
* Re: [PATCH net v2] nfc: llcp: fix OOB read and u8 offset wrap in TLV parsers
From: David Heidelberg @ 2026-06-24 16:16 UTC (permalink / raw)
To: Muhammad Bilal, oe-linux-nfc
Cc: davem, edumazet, kuba, pabeni, horms, krzk, linux-kernel, stable,
netdev
In-Reply-To: <20260622131802.239035-1-meatuni001@gmail.com>
On Mon, 22 Jun 2026 18:18:02 +0500, Muhammad Bilal wrote:
> nfc: llcp: fix OOB read and u8 offset wrap in TLV parsers
Applied, thanks!
[1/1] nfc: llcp: fix OOB read and u8 offset wrap in TLV parsers
commit: cb682b66aba4bc3f4dd4d7ae5612f49ad75b2087
Best regards,
--
David Heidelberg <david@ixit.cz>
^ permalink raw reply
* Re: [RFC net-next 08/15] ipxlat: add translation engine and dispatch core
From: Ralf Lici @ 2026-06-24 16:18 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: netdev, Daniel Gröber, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
Beniamino Galvani
In-Reply-To: <87v7b9c9jj.fsf@toke.dk>
On Tue, 23 Jun 2026 21:59:44 +0200, Toke Høiland-Jørgensen <toke@kernel.org> wrote:
> Ralf Lici <ralf@mandelbit.com> writes:
> > On the BPF point specifically: I agree a BPF program should be able to
> > decide whether to translate. What I am less sure about is whether
> > redirecting to a netdevice is the best way to expose that. A TC action
> > (yet another model, I know :)) gives you the same thing in-pipeline and
> > more directly:
> >
> > tc filter add dev wwan0 egress \
> > bpf obj match.o action ipxlat4to6 domain clat0
> >
> > Let BPF make the policy decision, with the native action doing the
> > translation work that the current BPF CLAT implementations have trouble
> > with: fragmentation, checksum corner cases, and ICMP error inner
> > headers (as explained by Beniamino).
> >
> > So TC clsact looks like the natural in-kernel replacement for today's
> > TC-BPF CLAT programs: no extra netdev, you attach to the existing
> > uplink, direction is explicit, and on egress you sit on the real route
> > dst, so the synthetic-dst and double-routing problems above just don't
> > arise. The cost is more moving parts than a single bpf_redirect since
> > userspace has to manage clsact, filters, priorities and action
> > lifecycle/cleanup.
>
> Hmm, so no one really uses the bpf filter mechanism, since you can just
> do everything from an action anyway (and with TCX attachment, you can
> even avoid the overhead of the TC filter/action infrastructure
> entirely). However, point taken wrt how to integrate this with BPF. I
> guess the most flexible thing would be to expose the functionality
> directly (as a kfunc callable from a BPF program). Which also fits with
> your point below:
>
Ah, I see, the cls_bpf example was dated, and I like the kfunc angle
better than a new TC action.
I would probably keep that as the minimal per-packet interface: BPF can
decide whether a packet should be translated, and the kfunc can do the
actual translation work for packets whose translated form still fits the
output MTU. The full 4->6 fragmentation case still looks like
output-path/harness territory to me, since it is a 1->N fan-out
operation.
> > For a gateway translator, though, I still think a device-bound model is
> > less natural. There the translation point is more like a forwarding
> > decision across routes and nexthops, so a route/LWT attachment, or
> > possibly a netfilter attachment seems easier to reason about. Also, as
> > you already pointed out while discussing LWT, an admin setting up NAT64
> > is more likely to reach for an nft rule than for a clsact filter on a
> > specific device.
> >
> > Taking a step back, ipxlat is really a generic translation engine plus a
> > thin harness around it. So rather than pick one attachment, it might be
> > worth structuring the engine so different harnesses can drive it.
> > There's interesting precedent for this shape:
> >
> > - ILA, again, is the closest sibling: stateless IPv6 address translation
> > with a shared core in ila_common.c, driven both by an LWT frontend in
> > ila_lwt.c and by an inline netfilter hook with a netlink-configured
> > mapping table in ila_xlat.c.
> >
> > - act_ct is the precedent for the TC side specifically: a TC action that
> > reuses the netfilter conntrack engine rather than reimplementing it.
> >
> > And act_nat is the cautionary counter-example: a standalone TC
> > reimplementation of stateless NAT that shares no code with nf_nat, and
> > carries a "would be nice to share code" comment :)
> >
> > So I am wondering whether the right direction is to factor the
> > translation engine cleanly, land it with one harness first, and keep the
> > other attachment points as follow-up work once the core semantics are
> > settled.
> >
> > Does that direction seem reasonable to you?
>
> Yes, reusable functionality that can be called from multiple places
> sounds like a good fit; let's try to structure it that way!
>
Great, that's the direction I'll take then.
> As for which hook to start with, well, let's see if we hear back from
> the netfilter devs, but either netfilter or the routing subsystem (LWT
> style) would be OK for me I think.
>
Works for me. The engine factoring is common to all of them, so I'll
start there. Once it's in shape I can sketch a harness against it to
sanity-check the interface.
--
Ralf Lici
Mandelbit Srl
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH net v7 2/4] i40e: skip unnecessary VF reset when setting trust
From: Simon Horman @ 2026-06-24 16:30 UTC (permalink / raw)
To: jtornosm
Cc: Simon Horman, netdev, intel-wired-lan, przemyslaw.kitszel,
aleksandr.loktionov, jacob.e.keller, anthony.l.nguyen, davem,
edumazet, kuba, pabeni
In-Reply-To: <20260623101800.991293-3-jtornosm@redhat.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
i40e: skip unnecessary VF reset when setting trust
This patch avoids the ~10 second VF reset delay on trust changes by
flipping the I40E_VIRTCHNL_VF_CAP_PRIVILEGE bit directly, and only
resorting to a full reset when revoking trust from a VF that has
advanced features (ADQ, cloud filters, UC/MC promisc) configured.
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> index a26c3d47ec15..0cc434b26eb8 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> @@ -4987,19 +5004,20 @@ int i40e_ndo_set_vf_trust(struct net_device *netdev, int vf_id, bool setting)
> set_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state);
> pf->vsi[vf->lan_vsi_idx]->flags |= I40E_VSI_FLAG_FILTER_CHANGED;
>
> - i40e_vc_reset_vf(vf, true);
> + /* Reset only if revoking trust and VF has advanced features configured */
> + if (!setting &&
> + (vf->adq_enabled || vf->num_cloud_filters > 0 ||
> + test_bit(I40E_VF_STATE_UC_PROMISC, &vf->vf_states) ||
> + test_bit(I40E_VF_STATE_MC_PROMISC, &vf->vf_states))) {
[High]
Does this revoke-trust heuristic miss MAC/VLAN filters that were only
installable while the VF was trusted?
A trusted VF is allowed to add many more MAC addresses than
I40E_VC_MAX_MAC_ADDR_PER_VF (i40e_check_vf_permission() uses
I40E_VC_MAX_MACVLAN_PER_TRUSTED_VF) and more than I40E_VC_MAX_VLAN_PER_VF
VLANs. These quotas are enforced only at add-time.
The previous always-reset path went through i40e_vc_reset_vf() ->
i40e_cleanup_reset_vf() -> i40e_free_vf_res() -> i40e_vsi_release(),
which destroyed the VSI and its mac_filter_hash, and reset
vf->num_vlan = 0.
The new fast path only flips the PRIVILEGE bit, leaving any
trusted-quota MAC/VLAN filters installed in hardware on an
now-untrusted VF. Should the heuristic also check
vf->num_vlan > I40E_VC_MAX_VLAN_PER_VF or the size of the lan VSI's
MAC filter hash before taking the no-reset branch?
[Medium]
Is the sampling of vf->adq_enabled, vf->num_cloud_filters and the
UC/MC_PROMISC bits race-free against the admin queue worker?
i40e_ndo_set_vf_trust() runs under rtnl_lock and
__I40E_VIRTCHNL_OP_PENDING, but i40e_vc_process_vf_msg() (called from
i40e_clean_adminq_subtask() in the service workqueue) takes neither.
A virtchnl handler such as i40e_vc_config_promiscuous_mode_msg(),
i40e_vc_add_cloud_filter() or i40e_vc_add_qch_msg() can be running
concurrently and may have already passed its CAP_PRIVILEGE gate but
not yet set vf_states / num_cloud_filters / adq_enabled when this
ndo samples them.
In that window the ndo would observe a "clean" VF, take the no-reset
branch, and clear CAP_PRIVILEGE while the in-flight handler still
goes on to enable promisc / install a cloud filter / set
adq_enabled = true on the now-untrusted VF.
The previous always-reset behaviour masked this because the reset
moved the VF to I40E_VF_STATE_DISABLED, after which
i40e_vc_process_vf_msg() rejects further messages. Is some additional
serialization (or a recheck after clearing the bit) needed here?
> + i40e_vc_reset_vf(vf, true);
> + i40e_del_all_cloud_filters(vf);
> + } else {
> + i40e_setup_vf_trust(vf, setting);
> + }
> +
> dev_info(&pf->pdev->dev, "VF %u is now %strusted\n",
> vf_id, setting ? "" : "un");
>
> - if (vf->adq_enabled) {
> - if (!vf->trusted) {
> - dev_info(&pf->pdev->dev,
> - "VF %u no longer Trusted, deleting all cloud filters\n",
> - vf_id);
> - i40e_del_all_cloud_filters(vf);
> - }
> - }
> -
[Low]
The dedicated "VF %u no longer Trusted, deleting all cloud filters"
dev_info() is dropped in this hunk, and the commit message does not
mention the logging change. Operators or log scrapers that key on
this specific string for audit of cloud filter teardown will no
longer see it; only the generic "VF %u is now untrusted" line
remains. Was removing this message intentional, and if so could
that be called out in the changelog?
> out:
> clear_bit(__I40E_VIRTCHNL_OP_PENDING, pf->state);
> return ret;
^ permalink raw reply
* Re: [BUG] KFENCE: use-after-free read in udp_tunnel_nic_device_sync_work
From: Sam Sun @ 2026-06-24 16:35 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, netdev,
linux-kernel, syzkaller
In-Reply-To: <CANn89iKgwNmTZ98MW92VVgC5zPqtQDBFBFspCphKpS-5i6+-7A@mail.gmail.com>
On Wed, Jun 24, 2026 at 11:01 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Jun 24, 2026 at 7:46 AM Sam Sun <samsun1006219@gmail.com> wrote:
> >
>
> > So we are still freeing struct udp_tunnel_nic while its embedded work_struct
> > is active. debugobjects catches this at kfree() before the active work gets a
> > chance to run later and dereference the freed utn.
> >
> > My read is that the conversion from bitfields to atomic bitops removes the
> > plain bitfield data race, but UDP_TUNNEL_NIC_WORK_PENDING is still only one
> > boolean state. It can represent "some work is pending", but it cannot
> > distinguish between:
> > idle
> > queued
> > running
> > running and queued again
> >
> > In particular, the workqueue core clears WORK_STRUCT_PENDING before invoking
> > the worker. At that point the same work item can be queued again by
> > udp_tunnel_nic_device_sync(). If an already running instance later executes:
> >
> > clear_bit(UDP_TUNNEL_NIC_WORK_PENDING, &utn->flags);
> >
> > it can still clear the bit that was set for the requeued instance. Then
> > udp_tunnel_nic_unregister() may observe UDP_TUNNEL_NIC_WORK_PENDING clear and
> > free utn, even though debugobjects still sees utn->work as active.
> >
>
> -ETOOMANYBUGS
>
> Ok, we could try to convert pending bit to a refcount.
>
> diff --git a/net/ipv4/udp_tunnel_nic.c b/net/ipv4/udp_tunnel_nic.c
> index 9944ed923ddfd10f9adf6ad788c0740daeaf2adb..2e14686f35896cb0caba3f8f587ef8b369090fbf
> 100644
> --- a/net/ipv4/udp_tunnel_nic.c
> +++ b/net/ipv4/udp_tunnel_nic.c
> @@ -3,6 +3,7 @@
>
> #include <linux/ethtool_netlink.h>
> #include <linux/netdevice.h>
> +#include <linux/refcount.h>
> #include <linux/slab.h>
> #include <linux/types.h>
> #include <linux/workqueue.h>
> @@ -30,9 +31,8 @@ struct udp_tunnel_nic_table_entry {
> * @work: async work for talking to hardware from process context
> * @dev: netdev pointer
> * @lock: protects all fields
> - * @need_sync: at least one port start changed
> - * @need_replay: space was freed, we need a replay of all ports
> - * @work_pending: @work is currently scheduled
> + * @flags: sync, replay flags
> + * @refcnt: reference count
> * @n_tables: number of tables under @entries
> * @missed: bitmap of tables which overflown
> * @entries: table of tables of ports currently offloaded
> @@ -44,9 +44,11 @@ struct udp_tunnel_nic {
>
> struct mutex lock;
>
> - u8 need_sync:1;
> - u8 need_replay:1;
> - u8 work_pending:1;
> + unsigned long flags;
> +#define UDP_TUNNEL_NIC_NEED_SYNC 0
> +#define UDP_TUNNEL_NIC_NEED_REPLAY 1
> +
> + refcount_t refcnt;
>
> unsigned int n_tables;
> unsigned long missed;
> @@ -116,7 +118,7 @@ udp_tunnel_nic_entry_queue(struct udp_tunnel_nic *utn,
> unsigned int flag)
> {
> entry->flags |= flag;
> - utn->need_sync = 1;
> + set_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags);
> }
>
> static void
> @@ -283,7 +285,7 @@ udp_tunnel_nic_device_sync_by_table(struct net_device *dev,
> static void
> __udp_tunnel_nic_device_sync(struct net_device *dev, struct
> udp_tunnel_nic *utn)
> {
> - if (!utn->need_sync)
> + if (!test_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags))
> return;
>
> if (dev->udp_tunnel_nic_info->sync_table)
> @@ -291,21 +293,24 @@ __udp_tunnel_nic_device_sync(struct net_device
> *dev, struct udp_tunnel_nic *utn)
> else
> udp_tunnel_nic_device_sync_by_port(dev, utn);
>
> - utn->need_sync = 0;
> + clear_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags);
> /* Can't replay directly here, in case we come from the tunnel driver's
> * notification - trying to replay may deadlock inside tunnel driver.
> */
> - utn->need_replay = udp_tunnel_nic_should_replay(dev, utn);
> + if (udp_tunnel_nic_should_replay(dev, utn))
> + set_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);
> + else
> + clear_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);
> }
>
> static void
> udp_tunnel_nic_device_sync(struct net_device *dev, struct udp_tunnel_nic *utn)
> {
> - if (!utn->need_sync)
> + if (!test_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags))
> return;
>
> - queue_work(udp_tunnel_nic_workqueue, &utn->work);
> - utn->work_pending = 1;
> + if (queue_work(udp_tunnel_nic_workqueue, &utn->work))
> + refcount_inc(&utn->refcnt);
> }
>
> static bool
> @@ -348,7 +353,7 @@ udp_tunnel_nic_has_collision(struct net_device
> *dev, struct udp_tunnel_nic *utn,
> if (!udp_tunnel_nic_entry_is_free(entry) &&
> entry->port == ti->port &&
> entry->type != ti->type) {
> - __set_bit(i, &utn->missed);
> + set_bit(i, &utn->missed);
> return true;
> }
> }
> @@ -483,7 +488,7 @@ udp_tunnel_nic_add_new(struct net_device *dev,
> struct udp_tunnel_nic *utn,
> * are no devices currently which have multiple tables accepting
> * the same tunnel type, and false positives are okay.
> */
> - __set_bit(i, &utn->missed);
> + set_bit(i, &utn->missed);
> }
>
> return false;
> @@ -552,7 +557,7 @@ static void __udp_tunnel_nic_reset_ntf(struct
> net_device *dev)
>
> mutex_lock(&utn->lock);
>
> - utn->need_sync = false;
> + clear_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags);
> for (i = 0; i < utn->n_tables; i++)
> for (j = 0; j < info->tables[i].n_entries; j++) {
> struct udp_tunnel_nic_table_entry *entry;
> @@ -696,8 +701,8 @@ udp_tunnel_nic_flush(struct net_device *dev,
> struct udp_tunnel_nic *utn)
> for (i = 0; i < utn->n_tables; i++)
> memset(utn->entries[i], 0, array_size(info->tables[i].n_entries,
> sizeof(**utn->entries)));
> - WARN_ON(utn->need_sync);
> - utn->need_replay = 0;
> + WARN_ON(test_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags));
> + clear_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);
> }
>
> static void
> @@ -713,8 +718,8 @@ udp_tunnel_nic_replay(struct net_device *dev,
> struct udp_tunnel_nic *utn)
> for (i = 0; i < utn->n_tables; i++)
> for (j = 0; j < info->tables[i].n_entries; j++)
> udp_tunnel_nic_entry_freeze_used(&utn->entries[i][j]);
> - utn->missed = 0;
> - utn->need_replay = 0;
> + bitmap_zero(&utn->missed, UDP_TUNNEL_NIC_MAX_TABLES);
> + clear_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);
>
> if (!info->shared) {
> udp_tunnel_get_rx_info(dev);
> @@ -728,6 +733,25 @@ udp_tunnel_nic_replay(struct net_device *dev,
> struct udp_tunnel_nic *utn)
> udp_tunnel_nic_entry_unfreeze(&utn->entries[i][j]);
> }
>
> +static void udp_tunnel_nic_free(struct udp_tunnel_nic *utn)
> +{
> + unsigned int i;
> +
> + for (i = 0; i < utn->n_tables; i++)
> + kfree(utn->entries[i]);
> +
> + if (utn->dev)
> + dev_put(utn->dev);
> +
> + kfree(utn);
> +}
> +
> +static void udp_tunnel_nic_put(struct udp_tunnel_nic *utn)
> +{
> + if (refcount_dec_and_test(&utn->refcnt))
> + udp_tunnel_nic_free(utn);
> +}
> +
> static void udp_tunnel_nic_device_sync_work(struct work_struct *work)
> {
> struct udp_tunnel_nic *utn =
> @@ -736,14 +760,15 @@ static void
> udp_tunnel_nic_device_sync_work(struct work_struct *work)
> rtnl_lock();
> mutex_lock(&utn->lock);
>
> - utn->work_pending = 0;
> __udp_tunnel_nic_device_sync(utn->dev, utn);
>
> - if (utn->need_replay)
> + if (test_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags))
> udp_tunnel_nic_replay(utn->dev, utn);
>
> mutex_unlock(&utn->lock);
> rtnl_unlock();
> +
> + udp_tunnel_nic_put(utn);
> }
>
> static struct udp_tunnel_nic *
> @@ -759,6 +784,7 @@ udp_tunnel_nic_alloc(const struct udp_tunnel_nic_info *info,
> utn->n_tables = n_tables;
> INIT_WORK(&utn->work, udp_tunnel_nic_device_sync_work);
> mutex_init(&utn->lock);
> + refcount_set(&utn->refcnt, 1);
>
> for (i = 0; i < n_tables; i++) {
> utn->entries[i] = kzalloc_objs(*utn->entries[i],
> @@ -776,15 +802,6 @@ udp_tunnel_nic_alloc(const struct
> udp_tunnel_nic_info *info,
> return NULL;
> }
>
> -static void udp_tunnel_nic_free(struct udp_tunnel_nic *utn)
> -{
> - unsigned int i;
> -
> - for (i = 0; i < utn->n_tables; i++)
> - kfree(utn->entries[i]);
> - kfree(utn);
> -}
> -
> static int udp_tunnel_nic_register(struct net_device *dev)
> {
> const struct udp_tunnel_nic_info *info = dev->udp_tunnel_nic_info;
> @@ -863,6 +880,7 @@ static void
> udp_tunnel_nic_unregister(struct net_device *dev, struct udp_tunnel_nic *utn)
> {
> const struct udp_tunnel_nic_info *info = dev->udp_tunnel_nic_info;
> + bool last = true;
>
> udp_tunnel_nic_lock(dev);
>
> @@ -889,6 +907,7 @@ udp_tunnel_nic_unregister(struct net_device *dev,
> struct udp_tunnel_nic *utn)
> udp_tunnel_drop_rx_info(dev);
> utn->dev = first->dev;
> udp_tunnel_nic_unlock(dev);
> + last = false;
> goto release_dev;
> }
>
> @@ -901,16 +920,11 @@ udp_tunnel_nic_unregister(struct net_device
> *dev, struct udp_tunnel_nic *utn)
> udp_tunnel_nic_flush(dev, utn);
> udp_tunnel_nic_unlock(dev);
>
> - /* Wait for the work to be done using the state, netdev core will
> - * retry unregister until we give up our reference on this device.
> - */
> - if (utn->work_pending)
> - return;
> -
> - udp_tunnel_nic_free(utn);
> + udp_tunnel_nic_put(utn);
> release_dev:
> dev->udp_tunnel_nic = NULL;
> - dev_put(dev);
> + if (!last)
> + dev_put(dev);
> }
>
> static int
I tested the refcount version, and I could no longer reproduce the bug with
the C reproducer. I think this patch fixes the specific lifetime problem
exposed by the reproducer.
Thanks,
Yue
^ permalink raw reply
* Re: [PATCH net] dt-bindings: net: renesas,ether: Drop example "ethernet-phy-ieee802.3-c22" fallback
From: Conor Dooley @ 2026-06-24 16:35 UTC (permalink / raw)
To: Rob Herring (Arm)
Cc: Niklas Söderlund, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Krzysztof Kozlowski, Conor Dooley,
Geert Uytterhoeven, Magnus Damm, Sergei Shtylyov, netdev,
linux-renesas-soc, devicetree, linux-kernel
In-Reply-To: <20260624150250.131966-2-robh@kernel.org>
[-- Attachment #1: Type: text/plain, Size: 52 bytes --]
Acked-by: Conor Dooley <conor.dooley@microchip.com>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply
* Re: [PATCH 3/7] dt-bindings: net: rockchip-dwmac: Allow 9 clocks
From: Conor Dooley @ 2026-06-24 16:37 UTC (permalink / raw)
To: Yanan He
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, David Wu, Maxime Coquelin, Alexandre Torgue,
devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
netdev, linux-stm32
In-Reply-To: <20260624-rv1126-alientek-dlrv1126-v1-3-5aef608a3f64@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 64 bytes --]
Per Heiko's comments,
pw-bot: changes-requested
Cheers,
Conor.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply
* Re: [PATCH net 0/7] xsk: fix AF_XDP multi-buffer Tx descriptor reclaim
From: Maciej Fijalkowski @ 2026-06-24 16:37 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: netdev, bpf, magnus.karlsson, stfomichev, kuba, pabeni, horms,
kerneljasonxing, bjorn
In-Reply-To: <ajv23_r4M3FGrFNN@devvm7509.cco0.facebook.com>
On Wed, Jun 24, 2026 at 08:38:20AM -0700, Stanislav Fomichev wrote:
> On 06/23, Maciej Fijalkowski wrote:
> > Hi,
> >
> > This series fixes several AF_XDP multi-buffer Tx paths where descriptors
> > consumed from the Tx ring are not consistently returned to userspace
> > through the completion ring when the packet is later dropped as invalid.
> >
> > The affected cases are invalid or oversized multi-buffer Tx packets in
> > both the generic and zero-copy paths. In these cases, the kernel can
> > consume one or more Tx descriptors while building or validating a
> > multi-buffer packet, then drop the packet before it reaches the device.
> > Userspace still owns the UMEM buffers only after the corresponding
> > addresses are returned through the CQ. Missing completions therefore
> > make userspace lose track of those buffers.
> >
> > The generic path fixes cover three related cases:
> > * partially built multi-buffer skbs dropped by xsk_drop_skb();
> > continuation descriptors left in the Tx ring after xsk_build_skb()
> > reports overflow;
> > * invalid descriptors encountered in the middle of a multi-buffer
> > packet, including the offending invalid descriptor itself.
> >
> > The zero-copy path is handled separately. The batched Tx parser now
> > distinguishes descriptors that can be passed to the driver from
> > descriptors that are consumed only because they belong to an invalid
> > multi-buffer packet. Reclaim-only descriptors are written to the CQ
> > address area and published in completion order, after any earlier
> > driver-visible Tx descriptors.
> >
> > The ZC batching path can also retain drain state when userspace has not
> > yet provided the end of an invalid multi-buffer packet. To keep this
> > state local to the singular batched path, the series prevents a second
> > Tx socket from joining the same pool while such drain state exists.
> > During the singular-to-shared transition, Tx batching is gated,
> > pre-existing readers are waited out, and bind fails with -EAGAIN if the
> > existing socket still has pending drain state. This avoids adding
> > multi-buffer drain handling to the shared-UMEM fallback path.
> >
> > The last two patches update xskxceiver so the tests account invalid
> > multi-buffer Tx packets as descriptors that must be reclaimed, while
> > still not expecting those invalid packets on the Rx side.
> >
> > This is a follow-up to Jason's changes [0] which were addressing generic
> > xmit only and this set allows me to pass full xskxceiver test suite run
> > against ice driver.
>
> There is a fair amount of feedback from sashiko already :-( So the meta
> question from me is: is it time to scrap our current approach where
> we parse descriptor by descriptor? (and maintain half-baked skb and
> half-consumed descriptor queues)
>
> Should we:
>
> 1. do desc[MAX_SKB_FRAGS] and xskq_cons_peek_desc until we exhaust
> PKT_CONT (if the last packet has PKT_CONT, return EOVERFLOW to userspace
> and do a full stop here)
> 2. now that we really know the number of valid descriptors -> reserve
> the cq space (if not -> EAGAIN)
> 3. pre-allocate everything here (if at any point we have ENOMEM -> cleanup
> locally, don't ever create semi-initialized skb)
> 4. construct the skb
> 5. xmit
Yeah generic xmit became utterly horrible, haven't gone through sashiko
reviews yet, but bare in mind this set also aligns zc side to what was
previously being addressed by Jason.
I believe planned logistics were to get these fixes onto net and then
Jason had an implementation of batching on generic xmit, directed towards
-next and that's where we could address current flow.
>
> If at any point there is an issue, the cleanup is straightforward. That
> whole xk->skb goes away, no state between syscalls. Thoughts?
^ permalink raw reply
* Re: [PATCH 2/7] dt-bindings: arm: rockchip: Add Alientek DLRV1126
From: Conor Dooley @ 2026-06-24 16:37 UTC (permalink / raw)
To: Yanan He
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, David Wu, Maxime Coquelin, Alexandre Torgue,
devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
netdev, linux-stm32
In-Reply-To: <20260624-rv1126-alientek-dlrv1126-v1-2-5aef608a3f64@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 75 bytes --]
Acked-by: Conor Dooley <conor.dooley@microchip.com>
pw-bot: not-applicable
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply
* Re: [PATCH 1/7] dt-bindings: vendor-prefixes: add alientek
From: Conor Dooley @ 2026-06-24 16:38 UTC (permalink / raw)
To: Yanan He
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, David Wu, Maxime Coquelin, Alexandre Torgue,
devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
netdev, linux-stm32
In-Reply-To: <20260624-rv1126-alientek-dlrv1126-v1-1-5aef608a3f64@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 75 bytes --]
Acked-by: Conor Dooley <conor.dooley@microchip.com>
pw-bot: not-applicable
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply
* [PATCH net v3 00/11] rxrpc: Miscellaneous fixes
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel
Here are some miscellaneous AF_RXRPC fixes for more stuff found by Sashiko[1][2]:
(1) Fix ACKALL handling by adding two more call states to simplify when
ACKs are valid.
(2) Fix connection leak from AF_RXRPC recvmsg userspace OOB handling.
(3) Fix double unlock in AF_RXRPC recvmsg userspace OOB handling.
(4) Fix AFS preallocate charge to flush the waitqueue after unlistening
the socket so that any charging thread that does manage to get started
will be waited for before socket destruction.
(5) Fix AFS OOB notify handling to cancel in-progress OOB notification
handling and then to flush the workqueue it's on.
(6) Fix handling of apparent reply reception before initial transmission
starts in client call.
(7) Fix OOB challenge leak in cleanup on notification failure.
(8) Fix infinite loop in recvmsg if OOB packet available, but no calls.
(9) Fix notify vs recvmsg race where notify thinks the call is already
queued.
(10) Fix MSG_PEEK call leak for calls with no content.
(11) Fix rxrpc_rotate_tx_window() to check that there's something in the Tx
buffer before attempting to rotate it.
Note that the latest Sashiko review[3] has some more pre-existing issues,
but I'd like to get this set of fixes committed before tackling those.
David
The patches can be found here also:
http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes
[1] https://sashiko.dev/#/patchset/20260609140911.838677-1-dhowells%40redhat.com
[2] https://sashiko.dev/#/patchset/20260616155749.2125907-1-dhowells%40redhat.com
[3] https://sashiko.dev/#/patchset/20260618134802.2477777-1-dhowells%40redhat.com
Changes
=======
ver #3)
- Rewrote ACKALL fix to add two new call states to make it easier to track
when ACKs are valid.
- Addressed the Sashiko review[3]:
- Use READ_ONCE() to access net->live.
- Added a patch to check the Tx buffer is occupied before rotating it.
ver #2)
- Addressed the Sashiko review[2] of ver #1:
- Added patches to fix more bugs that it found.
- Adjusted AFS preallocate charge cleanup to only cancel the preallocate
work item after unlistening rather than flushing the entire waitqueue
(which may be waiting on DNS lookup).
- Added extra cancel for OOB rx handler in AFS.
David Howells (10):
rxrpc: Fix leak of connection from OOB challenge
rxrpc: Fix double unlock in rxrpc_recvmsg()
afs: Fix further netns teardown to cancel the preallocation charger
afs: Fix uncancelled rxrpc OOB message handler
rxrpc: Fix the reception of a reply packet before data transmission
rxrpc: Fix oob challenge leak in cleanup after notification failure
rxrpc: Fix potential infinite loop in rxrpc_recvmsg()
rxrpc: Fix socket notification race
rxrpc: Fix leak of released call in recvmsg(MSG_PEEK)
rxrpc: Fix rxrpc_rotate_tx_rotate() to check there's something to
rotate
Wyatt Feng (1):
rxrpc: Fix ACKALL packet handling
fs/afs/cm_security.c | 3 ++-
fs/afs/rxrpc.c | 10 +++++++++-
net/rxrpc/ar-internal.h | 6 ++++--
net/rxrpc/call_event.c | 5 ++++-
net/rxrpc/call_object.c | 2 ++
net/rxrpc/conn_client.c | 2 +-
net/rxrpc/conn_event.c | 9 +++++++--
net/rxrpc/input.c | 39 ++++++++++++++++++++++++++++++++-------
net/rxrpc/oob.c | 12 ++++++++++--
net/rxrpc/recvmsg.c | 10 ++++------
net/rxrpc/sendmsg.c | 3 ++-
11 files changed, 77 insertions(+), 24 deletions(-)
^ permalink raw reply
* [PATCH net v3 01/11] rxrpc: Fix ACKALL packet handling
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
Wyatt Feng, Yuan Tan, Yifan Wu, Juefei Pu, Zhengchuan Liang,
Xin Liu, Ren Wei, stable
In-Reply-To: <20260624163819.3017002-1-dhowells@redhat.com>
From: Wyatt Feng <bronzed_45_vested@icloud.com>
rxrpc_input_ackall() accepts ACKALL packets without checking whether the
call is in a state that can legitimately have outstanding transmit buffers.
A forged ACKALL can therefore reach a new service call in
RXRPC_CALL_SERVER_RECV_REQUEST before any reply packets have been queued.
In that state call->tx_top is zero and call->tx_queue is NULL, so
rxrpc_rotate_tx_window() dereferences a NULL txqueue and triggers a
null-pointer dereference.
Fix the handling of ACKALL packets by the following means:
(1) Add two new call states: RXRPC_CALL_CLIENT_PRE_SEND which indicates
that the client call is connected, but nothing has been transmitted as
yet; and RXRPC_CALL_CLIENT_AWAIT_ACK, which indicates that everything
has been transmitted at least once, but we're now waiting for the
stuff remaining in the Tx buffer to be ACK'd (retransmissions may
still happen).
The RXRPC_CALL_CLIENT_PRE_SEND state is set when the call is assigned
a channel and transitions to RXRPC_CALL_CLIENT_SEND_REQUEST when the
first packet is transmitted.
RXRPC_CALL_CLIENT_AWAIT_REPLY is then narrowed in scope to indicate
that all Tx packets have been ACK'd and we're now waiting for the
reply to be received.
(2) As per Wyatt Feng's original patch[1], the ACKALL handler then checks
that the call state is one in which there might be stuff in the Tx
buffer to ACK, but now this includes AWAIT_ACK rather than
AWAIT_REPLY. ACKALL packets are ignored if received in the wrong
state.
Note that unlike Wyatt Feng's patch, it's no longer necessary to check
to see if the Tx buffer exists as this the state set now covers this.
(3) Make the ACKALL handler use call->tx_transmitted rather than
call->tx_top as the former is explicitly the highest packet seq number
transmitted, whereas the latter has a looser definition.
Thanks to Jeffrey Altman for a description of the history of the ACKALL
packet[1].
Fixes: b341a0263b1b ("rxrpc: Implement progressive transmission queue struct")
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Zhengchuan Liang <zcliangcn@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Wyatt Feng <bronzed_45_vested@icloud.com>
Co-developed-by: David Howells <dhowells@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ren Wei <n05ec@lzu.edu.cn>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20260616155749.2125907-2-dhowells@redhat.com/ [1]
Link: https://lore.kernel.org/r/c0fd4fec-1576-4070-b31e-a37d5506f5ed@auristor.com/ [2]
---
net/rxrpc/ar-internal.h | 2 ++
net/rxrpc/call_event.c | 5 ++++-
net/rxrpc/call_object.c | 2 ++
net/rxrpc/conn_client.c | 2 +-
net/rxrpc/input.c | 23 +++++++++++++++++++----
net/rxrpc/sendmsg.c | 3 ++-
6 files changed, 30 insertions(+), 7 deletions(-)
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 98f2165159d7..b6ccd8a8199b 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -650,7 +650,9 @@ enum rxrpc_call_event {
enum rxrpc_call_state {
RXRPC_CALL_UNINITIALISED,
RXRPC_CALL_CLIENT_AWAIT_CONN, /* - client waiting for connection to become available */
+ RXRPC_CALL_CLIENT_PRE_SEND, /* - client is connected, but hasn't sent anything yet */
RXRPC_CALL_CLIENT_SEND_REQUEST, /* - client sending request phase */
+ RXRPC_CALL_CLIENT_AWAIT_ACK, /* - client awaiting ACKs of request */
RXRPC_CALL_CLIENT_AWAIT_REPLY, /* - client awaiting reply */
RXRPC_CALL_CLIENT_RECV_REPLY, /* - client receiving reply phase */
RXRPC_CALL_SERVER_PREALLOC, /* - service preallocation */
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index fec59d9338b9..21be9c86d7a7 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -178,7 +178,7 @@ static void rxrpc_close_tx_phase(struct rxrpc_call *call)
switch (__rxrpc_call_state(call)) {
case RXRPC_CALL_CLIENT_SEND_REQUEST:
- rxrpc_set_call_state(call, RXRPC_CALL_CLIENT_AWAIT_REPLY);
+ rxrpc_set_call_state(call, RXRPC_CALL_CLIENT_AWAIT_ACK);
break;
case RXRPC_CALL_SERVER_SEND_REPLY:
rxrpc_set_call_state(call, RXRPC_CALL_SERVER_AWAIT_ACK);
@@ -244,6 +244,8 @@ static void rxrpc_transmit_fresh_data(struct rxrpc_call *call, unsigned int limi
break;
} while (req.n < limit && before(seq, send_top));
+ if (__rxrpc_call_state(call) == RXRPC_CALL_CLIENT_PRE_SEND)
+ rxrpc_set_call_state(call, RXRPC_CALL_CLIENT_SEND_REQUEST);
if (txb->flags & RXRPC_LAST_PACKET) {
rxrpc_close_tx_phase(call);
tq = NULL;
@@ -267,6 +269,7 @@ void rxrpc_transmit_some_data(struct rxrpc_call *call, unsigned int limit,
fallthrough;
case RXRPC_CALL_SERVER_SEND_REPLY:
+ case RXRPC_CALL_CLIENT_PRE_SEND:
case RXRPC_CALL_CLIENT_SEND_REQUEST:
if (!rxrpc_tx_window_space(call))
return;
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index fcb9d38bb521..817ed9acb91e 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -18,7 +18,9 @@
const char *const rxrpc_call_states[NR__RXRPC_CALL_STATES] = {
[RXRPC_CALL_UNINITIALISED] = "Uninit ",
[RXRPC_CALL_CLIENT_AWAIT_CONN] = "ClWtConn",
+ [RXRPC_CALL_CLIENT_PRE_SEND] = "ClPreSnd",
[RXRPC_CALL_CLIENT_SEND_REQUEST] = "ClSndReq",
+ [RXRPC_CALL_CLIENT_AWAIT_ACK] = "ClAwtAck",
[RXRPC_CALL_CLIENT_AWAIT_REPLY] = "ClAwtRpl",
[RXRPC_CALL_CLIENT_RECV_REPLY] = "ClRcvRpl",
[RXRPC_CALL_SERVER_PREALLOC] = "SvPrealc",
diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index 9b757798dedd..48519f0de185 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -449,7 +449,7 @@ static void rxrpc_activate_one_channel(struct rxrpc_connection *conn,
trace_rxrpc_connect_call(call);
call->tx_last_sent = ktime_get_real();
rxrpc_start_call_timer(call);
- rxrpc_set_call_state(call, RXRPC_CALL_CLIENT_SEND_REQUEST);
+ rxrpc_set_call_state(call, RXRPC_CALL_CLIENT_PRE_SEND);
wake_up(&call->waitq);
}
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index ce761466b02d..2eedab1b0919 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -181,7 +181,8 @@ void rxrpc_congestion_degrade(struct rxrpc_call *call)
if (call->cong_ca_state != RXRPC_CA_SLOW_START &&
call->cong_ca_state != RXRPC_CA_CONGEST_AVOIDANCE)
return;
- if (__rxrpc_call_state(call) == RXRPC_CALL_CLIENT_AWAIT_REPLY)
+ if (__rxrpc_call_state(call) == RXRPC_CALL_CLIENT_AWAIT_ACK ||
+ __rxrpc_call_state(call) == RXRPC_CALL_CLIENT_AWAIT_REPLY)
return;
rtt = ns_to_ktime(call->srtt_us * (NSEC_PER_USEC / 8));
@@ -356,6 +357,7 @@ static void rxrpc_end_tx_phase(struct rxrpc_call *call, bool reply_begun,
switch (__rxrpc_call_state(call)) {
case RXRPC_CALL_CLIENT_SEND_REQUEST:
+ case RXRPC_CALL_CLIENT_AWAIT_ACK:
case RXRPC_CALL_CLIENT_AWAIT_REPLY:
if (reply_begun) {
rxrpc_set_call_state(call, RXRPC_CALL_CLIENT_RECV_REPLY);
@@ -694,6 +696,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
switch (__rxrpc_call_state(call)) {
case RXRPC_CALL_CLIENT_SEND_REQUEST:
+ case RXRPC_CALL_CLIENT_AWAIT_ACK:
case RXRPC_CALL_CLIENT_AWAIT_REPLY:
/* Received data implicitly ACKs all of the request
* packets we sent when we're acting as a client.
@@ -1154,10 +1157,12 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
if (hard_ack + 1 == 0)
return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_zero);
- /* Ignore ACKs unless we are or have just been transmitting. */
+ /* Ignore ACKs unless we are transmitting or are waiting for
+ * acknowledgement of the packets we've just been transmitting.
+ */
switch (__rxrpc_call_state(call)) {
case RXRPC_CALL_CLIENT_SEND_REQUEST:
- case RXRPC_CALL_CLIENT_AWAIT_REPLY:
+ case RXRPC_CALL_CLIENT_AWAIT_ACK:
case RXRPC_CALL_SERVER_SEND_REPLY:
case RXRPC_CALL_SERVER_AWAIT_ACK:
break;
@@ -1215,7 +1220,17 @@ static void rxrpc_input_ackall(struct rxrpc_call *call, struct sk_buff *skb)
{
struct rxrpc_ack_summary summary = { 0 };
- if (rxrpc_rotate_tx_window(call, call->tx_top, &summary))
+ switch (__rxrpc_call_state(call)) {
+ case RXRPC_CALL_CLIENT_SEND_REQUEST:
+ case RXRPC_CALL_CLIENT_AWAIT_ACK:
+ case RXRPC_CALL_SERVER_SEND_REPLY:
+ case RXRPC_CALL_SERVER_AWAIT_ACK:
+ break;
+ default:
+ return;
+ }
+
+ if (rxrpc_rotate_tx_window(call, call->tx_transmitted, &summary))
rxrpc_end_tx_phase(call, false, rxrpc_eproto_unexpected_ackall);
}
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index c35de4fd75e3..ed2c9a51005a 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -366,7 +366,8 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
if (state >= RXRPC_CALL_COMPLETE)
goto maybe_error;
ret = -EPROTO;
- if (state != RXRPC_CALL_CLIENT_SEND_REQUEST &&
+ if (state != RXRPC_CALL_CLIENT_PRE_SEND &&
+ state != RXRPC_CALL_CLIENT_SEND_REQUEST &&
state != RXRPC_CALL_SERVER_ACK_REQUEST &&
state != RXRPC_CALL_SERVER_SEND_REPLY) {
/* Request phase complete for this client call */
^ permalink raw reply related
* [PATCH net v3 02/11] rxrpc: Fix leak of connection from OOB challenge
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
stable
In-Reply-To: <20260624163819.3017002-1-dhowells@redhat.com>
Fix leak of connection object from OOB challenge queue when response is
provided by userspace.
Fixes: 5800b1cf3fd8 ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE")
Link: https://sashiko.dev/#/patchset/20260609140911.838677-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
net/rxrpc/oob.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/rxrpc/oob.c b/net/rxrpc/oob.c
index 05ca9c1faa57..3318c8bd82ad 100644
--- a/net/rxrpc/oob.c
+++ b/net/rxrpc/oob.c
@@ -210,6 +210,11 @@ static int rxrpc_respond_to_oob(struct rxrpc_sock *rx,
break;
}
+ switch (skb->mark) {
+ case RXRPC_OOB_CHALLENGE:
+ rxrpc_put_connection(sp->chall.conn, rxrpc_conn_put_oob);
+ break;
+ }
rxrpc_free_skb(skb, rxrpc_skb_put_oob);
return ret;
}
^ permalink raw reply related
* [PATCH net v3 03/11] rxrpc: Fix double unlock in rxrpc_recvmsg()
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
stable
In-Reply-To: <20260624163819.3017002-1-dhowells@redhat.com>
Fix a double unlock in rxrpc_recvmsg() when dealing with OOB messages.
Fixes: 5800b1cf3fd8 ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE")
Link: https://sashiko.dev/#/patchset/20260609140911.838677-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
net/rxrpc/recvmsg.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 82614cbdb60f..39a03684432d 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -471,7 +471,7 @@ int rxrpc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
release_sock(&rx->sk);
if (ret == -EAGAIN)
goto try_again;
- goto error_no_call;
+ goto error_trace;
}
/* Find the next call and dequeue it if we're not just peeking. If we
^ permalink raw reply related
* [PATCH net v3 04/11] afs: Fix further netns teardown to cancel the preallocation charger
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
Li Daming, Ren Wei, Jeffrey Altman, stable
In-Reply-To: <20260624163819.3017002-1-dhowells@redhat.com>
When an afs network namespace is torn down, it cancels and waits for the
work item that keeps the preallocated rxrpc call/conn/peer queue charged
before disabling incoming (i.e. listen 0), but there's a small window in
which it can be requeued by an incoming call wending through the I/O
thread.
Fix this by cancelling the charger work item again after reducing the
listen backlog to zero.
Fixes: 47694fbc9d24 ("afs: Fix netns teardown to cancel the preallocation charger")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://sashiko.dev/#/patchset/20260609140911.838677-1-dhowells%40redhat.com
cc: Li Daming <d4n.for.sec@gmail.com>
cc: Ren Wei <n05ec@lzu.edu.cn>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
fs/afs/rxrpc.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index d5cfd24e815b..6714a189d58f 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -128,8 +128,13 @@ void afs_close_socket(struct afs_net *net)
_enter("");
cancel_work_sync(&net->charge_preallocation_work);
+ /* Future work items should now see ->live is false. */
+
kernel_listen(net->socket, 0);
+
+ /* Make sure work items are no longer running. */
flush_workqueue(afs_async_calls);
+ cancel_work_sync(&net->charge_preallocation_work);
if (net->spare_incoming_call) {
afs_put_call(net->spare_incoming_call);
^ permalink raw reply related
* [PATCH net v3 05/11] afs: Fix uncancelled rxrpc OOB message handler
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
Li Daming, Ren Wei, Jeffrey Altman, stable
In-Reply-To: <20260624163819.3017002-1-dhowells@redhat.com>
Fix AFS to cancel its OOB message processing (typically to respond to
security challenges). Also move OOB message processing to afs_wq so that
it's also waited for and make the OOB handler just return if the net
namespace is no longer live.
Fixes: 5800b1cf3fd8 ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE")
Link: https://sashiko.dev/#/patchset/20260609140911.838677-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Li Daming <d4n.for.sec@gmail.com>
cc: Ren Wei <n05ec@lzu.edu.cn>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
fs/afs/cm_security.c | 3 ++-
fs/afs/rxrpc.c | 5 ++++-
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/afs/cm_security.c b/fs/afs/cm_security.c
index edcbd249d202..103168c70dd4 100644
--- a/fs/afs/cm_security.c
+++ b/fs/afs/cm_security.c
@@ -101,7 +101,8 @@ void afs_process_oob_queue(struct work_struct *work)
struct sk_buff *oob;
enum rxrpc_oob_type type;
- while ((oob = rxrpc_kernel_dequeue_oob(net->socket, &type))) {
+ while (READ_ONCE(net->live) &&
+ (oob = rxrpc_kernel_dequeue_oob(net->socket, &type))) {
switch (type) {
case RXRPC_OOB_CHALLENGE:
afs_respond_to_challenge(oob);
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 6714a189d58f..d82916657a3d 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -128,6 +128,7 @@ void afs_close_socket(struct afs_net *net)
_enter("");
cancel_work_sync(&net->charge_preallocation_work);
+ cancel_work_sync(&net->rx_oob_work);
/* Future work items should now see ->live is false. */
kernel_listen(net->socket, 0);
@@ -148,6 +149,7 @@ void afs_close_socket(struct afs_net *net)
kernel_sock_shutdown(net->socket, SHUT_RDWR);
flush_workqueue(afs_async_calls);
+ cancel_work_sync(&net->rx_oob_work);
net->socket->sk->sk_user_data = NULL;
sock_release(net->socket);
key_put(net->fs_cm_token_key);
@@ -989,5 +991,6 @@ static void afs_rx_notify_oob(struct sock *sk, struct sk_buff *oob)
{
struct afs_net *net = sk->sk_user_data;
- schedule_work(&net->rx_oob_work);
+ if (READ_ONCE(net->live))
+ queue_work(afs_wq, &net->rx_oob_work);
}
^ permalink raw reply related
* [PATCH net v3 06/11] rxrpc: Fix the reception of a reply packet before data transmission
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
Jeffrey Altman, stable
In-Reply-To: <20260624163819.3017002-1-dhowells@redhat.com>
Fix rxrpc_receiving_reply() to handle the reception of an apparent reply
DATA packet before rxrpc has had a chance to send any request DATA packets
on a client call by checking to see if the call has been exposed yet by
sending the first packet.
Without this, rxrpc_rotate_tx_window() might oops.
Also fix rxrpc_rotate_tx_window() to handle the Tx queue being empty by
changing the do...while loop into a while loop, just in case a call is
abnormally terminated by an early reply before the last request packet is
transmitted.
Fixes: b341a0263b1b ("rxrpc: Implement progressive transmission queue struct")
Link: https://sashiko.dev/#/patchset/20260616155749.2125907-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
net/rxrpc/input.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 2eedab1b0919..9bd0f1b92463 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -248,7 +248,7 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
tq = call->tx_queue;
}
- do {
+ while (before_eq(seq, to)) {
unsigned int ix = seq - call->tx_qbase;
_debug("tq=%x seq=%x i=%d f=%x", tq->qbase, seq, ix, tq->bufs[ix]->flags);
@@ -318,8 +318,7 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
break;
}
}
-
- } while (before_eq(seq, to));
+ }
if (trace)
trace_rxrpc_rack_update(call, summary);
@@ -394,6 +393,14 @@ static bool rxrpc_receiving_reply(struct rxrpc_call *call)
trace_rxrpc_timer_can(call, rxrpc_timer_trace_delayed_ack);
}
+ /* Deal with an apparent reply coming in before we've got the request
+ * queued or transmitted.
+ */
+ if (!test_bit(RXRPC_CALL_EXPOSED, &call->flags)) {
+ rxrpc_proto_abort(call, top, rxrpc_eproto_early_reply);
+ return false;
+ }
+
if (!test_bit(RXRPC_CALL_TX_LAST, &call->flags)) {
if (!rxrpc_rotate_tx_window(call, top, &summary)) {
rxrpc_proto_abort(call, top, rxrpc_eproto_early_reply);
^ permalink raw reply related
* [PATCH net v3 07/11] rxrpc: Fix oob challenge leak in cleanup after notification failure
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
Jeffrey Altman, stable
In-Reply-To: <20260624163819.3017002-1-dhowells@redhat.com>
Fix rxrpc_notify_socket_oob() to return an indication of failure in the
event that it failed to queue a packet and fix rxrpc_post_challenge() to
clean up the connection ref in such an event.
Fixes: 5800b1cf3fd8 ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE")
Link: https://sashiko.dev/#/patchset/20260616155749.2125907-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
net/rxrpc/ar-internal.h | 4 ++--
net/rxrpc/conn_event.c | 9 +++++++--
net/rxrpc/oob.c | 7 +++++--
3 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index b6ccd8a8199b..d2b31d15851b 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -1357,9 +1357,9 @@ static inline struct rxrpc_net *rxrpc_net(struct net *net)
}
/*
- * out_of_band.c
+ * oob.c
*/
-void rxrpc_notify_socket_oob(struct rxrpc_call *call, struct sk_buff *skb);
+bool rxrpc_notify_socket_oob(struct rxrpc_call *call, struct sk_buff *skb);
void rxrpc_add_pending_oob(struct rxrpc_sock *rx, struct sk_buff *skb);
int rxrpc_sendmsg_oob(struct rxrpc_sock *rx, struct msghdr *msg, size_t len);
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index c96ca615b787..611c790bc6d0 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -436,7 +436,7 @@ static bool rxrpc_post_challenge(struct rxrpc_connection *conn,
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
struct rxrpc_call *call = NULL;
struct rxrpc_sock *rx;
- bool respond = false;
+ bool respond = false, queued = false;
sp->chall.conn =
rxrpc_get_connection(conn, rxrpc_conn_get_challenge_input);
@@ -472,8 +472,13 @@ static bool rxrpc_post_challenge(struct rxrpc_connection *conn,
}
if (call)
- rxrpc_notify_socket_oob(call, skb);
+ queued = rxrpc_notify_socket_oob(call, skb);
rcu_read_unlock();
+ if (call && !queued) {
+ rxrpc_put_connection(conn, rxrpc_conn_put_challenge_input);
+ sp->chall.conn = NULL;
+ return false;
+ }
if (!call)
rxrpc_post_packet_to_conn(conn, skb);
diff --git a/net/rxrpc/oob.c b/net/rxrpc/oob.c
index 3318c8bd82ad..c80ee2487d09 100644
--- a/net/rxrpc/oob.c
+++ b/net/rxrpc/oob.c
@@ -32,11 +32,12 @@ struct rxrpc_oob_params {
* Post an out-of-band message for attention by the socket or kernel service
* associated with a reference call.
*/
-void rxrpc_notify_socket_oob(struct rxrpc_call *call, struct sk_buff *skb)
+bool rxrpc_notify_socket_oob(struct rxrpc_call *call, struct sk_buff *skb)
{
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
struct rxrpc_sock *rx;
struct sock *sk;
+ bool queued = false;
rcu_read_lock();
@@ -49,6 +50,7 @@ void rxrpc_notify_socket_oob(struct rxrpc_call *call, struct sk_buff *skb)
skb->skb_mstamp_ns = rx->oob_id_counter++;
rxrpc_get_skb(skb, rxrpc_skb_get_post_oob);
skb_queue_tail(&rx->recvmsg_oobq, skb);
+ queued = true;
trace_rxrpc_notify_socket(call->debug_id, sp->hdr.serial);
if (rx->app_ops)
@@ -56,11 +58,12 @@ void rxrpc_notify_socket_oob(struct rxrpc_call *call, struct sk_buff *skb)
}
spin_unlock_irq(&rx->recvmsg_lock);
- if (!rx->app_ops && !sock_flag(sk, SOCK_DEAD))
+ if (queued && !rx->app_ops && !sock_flag(sk, SOCK_DEAD))
sk->sk_data_ready(sk);
}
rcu_read_unlock();
+ return queued;
}
/*
^ permalink raw reply related
* [PATCH net v3 08/11] rxrpc: Fix potential infinite loop in rxrpc_recvmsg()
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
To: netdev
Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
Jeffrey Altman, stable
In-Reply-To: <20260624163819.3017002-1-dhowells@redhat.com>
Fix the wait in rxrpc_recvmsg() also take check the oob queue.
Fixes: 5800b1cf3fd8 ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE")
Link: https://sashiko.dev/#/patchset/20260616155749.2125907-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
net/rxrpc/recvmsg.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 39a03684432d..f382a47c6eb0 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -438,7 +438,8 @@ int rxrpc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
return -EAGAIN;
}
- if (list_empty(&rx->recvmsg_q)) {
+ if (list_empty(&rx->recvmsg_q) &&
+ skb_queue_empty_lockless(&rx->recvmsg_oobq)) {
ret = -EWOULDBLOCK;
if (timeo == 0) {
call = NULL;
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox