Linux-ARM-Kernel Archive on lore.kernel.org


* Re: [PATCH RFC 3/3] arm64: Add HOTPLUG_PARALLEL support for secondary CPUs
From: Thomas Gleixner @ 2026-06-18 15:49 UTC
  To: Jinjie Ruan, catalin.marinas, will, tsbogend, pjw, palmer, aou,
	alex, mingo, bp, dave.hansen, hpa, peterz, kees, nathan, linusw,
	ojeda, ruanjinjie, david.kaplan, lukas.bulwahn, ryan.roberts, maz,
	timothy.hayes, lpieralisi, thuth, oupton, yeoreum.yun,
	miko.lenczewski, broonie, kevin.brodsky, james.clark, tabba,
	mrigendra.chaubey, arnd, anshuman.khandual, x86, linux-kernel,
	linux-arm-kernel, linux-mips, linux-riscv
In-Reply-To: <20260611133809.3854977-4-ruanjinjie@huawei.com>

On Thu, Jun 11 2026 at 21:38, Jinjie Ruan wrote:
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -113,6 +113,7 @@ config ARM64
>  	select CPUMASK_OFFSTACK if NR_CPUS > 256
>  	select DCACHE_WORD_ACCESS
>  	select HAVE_EXTRA_IPI_TRACEPOINTS
> +	select HOTPLUG_PARALLEL if SMP && HOTPLUG_CPU

Why do you tie that to HOTPLUG_CPU? HOTPLUG_CPU lets you unplug/plug
CPUs at runtime, but if it's disabled then an SMP system still has to
bring up the APs. So why should that fall back to the existing variant?

> +#ifdef CONFIG_HOTPLUG_PARALLEL
> +extern struct secondary_data cpu_boot_data[NR_CPUS];
> +#endif
> +
>  extern struct secondary_data secondary_data;
>  extern long __early_cpu_boot_status;
>  extern void secondary_entry(void);
> @@ -124,7 +128,11 @@ static inline void __noreturn cpu_park_loop(void)
>  
>  static inline void update_cpu_boot_status(unsigned int cpu, int val)
>  {
> +#ifdef CONFIG_HOTPLUG_PARALLEL
> +	WRITE_ONCE(cpu_boot_data[cpu].status, val);
> +#else
>  	WRITE_ONCE(secondary_data.status, val);
> +#endif

You're really a great fan of #ifdefs, right?

Just convert it over to the parallel mode unconditionally and get rid of
the existing cruft.
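
A rough user-space sketch of what the unconditional conversion could look
like: every CPU gets its own boot-data slot and update_cpu_boot_status()
always writes to it, with no #ifdef. NR_CPUS, the CPU_BOOT_SUCCESS value,
the WRITE_ONCE() stand-in and the one-field struct are simplified
assumptions for illustration, not the arm64 definitions.

```c
#include <assert.h>

#define NR_CPUS 8		/* assumption: small fixed value for the sketch */
#define CPU_BOOT_SUCCESS 1	/* assumption: placeholder value */

/* Simplified stand-in for the kernel's WRITE_ONCE(): a volatile store. */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Reduced to the one field the sketch needs. */
struct secondary_data {
	long status;
};

/* One slot per CPU, so concurrently booting CPUs never share state. */
static struct secondary_data cpu_boot_data[NR_CPUS];

static inline void update_cpu_boot_status(unsigned int cpu, int val)
{
	assert(cpu < NR_CPUS);
	WRITE_ONCE(cpu_boot_data[cpu].status, val);
}
```

With a single code path, the global secondary_data.status store and the
#else branch around it are no longer needed.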

>  	/*
>  	 * TTBR0 is only used for the identity mapping at this stage. Make it
>  	 * point to zero page to avoid speculatively fetching new entries.
> @@ -254,7 +276,9 @@ asmlinkage notrace void secondary_start_kernel(void)
>  					 read_cpuid_id());
>  	update_cpu_boot_status(cpu, CPU_BOOT_SUCCESS);
>  	set_cpu_online(cpu, true);
> +#ifndef CONFIG_HOTPLUG_PARALLEL
>  	complete(&cpu_running);
> +#endif

Just for the record. You can get rid of this completion w/o PARALLEL
hotplug by selecting HOTPLUG_SPLIT_STARTUP and implementing the
kick/sync parts.
  
Thanks,

        tglx



* Re: [PATCH v4 3/4] drivers/firmware: add SDEI cross-CPU NMI service for arm64
From: Kiryl Shutsemau @ 2026-06-18 15:48 UTC
  To: Julian Braha
  Cc: Catalin Marinas, Will Deacon, James Morse, Mark Rutland,
	Marc Zyngier, Doug Anderson, Petr Mladek, Thomas Gleixner,
	Andrew Morton, Baoquan He, Puranjay Mohan, Usama Arif,
	Breno Leitao, Julien Thierry, Lecopzer Chen, Sumit Garg,
	kernel-team, kexec, linux-arm-kernel, linux-kernel
In-Reply-To: <a372e209-fbf9-417f-ad08-226cb7025949@gmail.com>

On Thu, Jun 18, 2026 at 11:46:12AM +0100, Julian Braha wrote:
> Hi Kiryl,
> 
> On 6/17/26 20:20, Kiryl Shutsemau wrote:
> 
> > +config ARM_SDEI_NMI
> > +	bool "SDEI-based cross-CPU NMI service (arm64)"
> > +	depends on ARM64 && ARM_SDE_INTERFACE
> The dependency on ARM64 is redundant here since you already have the
> dependency on ARM_SDE_INTERFACE.

Fair. Will remove ARM64 in the next version.

> Maybe a comment instead, though I think
> it's pretty clear from the prompt...

I don't think the comment would contribute much value.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov



* Re: [PATCH v1 1/1] i2c: pnx: Use generic definitions for bus frequencies
From: Vladimir Zapolskiy @ 2026-06-18 15:48 UTC
  To: Andy Shevchenko, linux-arm-kernel, linux-i2c, linux-kernel
  Cc: Piotr Wojtaszczyk, Andi Shyti
In-Reply-To: <20260618130948.3199768-1-andriy.shevchenko@linux.intel.com>

On 6/18/26 16:09, Andy Shevchenko wrote:
> Since we have generic definitions for bus frequencies, let's use them.
> 
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

Reviewed-by: Vladimir Zapolskiy <vz@mleia.com>

-- 
Best wishes,
Vladimir



* Re: [PATCH v2 0/7] KVM: arm64: Fix missing ESR_ELx.IL in syndrome injection
From: Fuad Tabba @ 2026-06-18 15:47 UTC
  To: Marc Zyngier, Oliver Upton, Catalin Marinas, Will Deacon, kvmarm,
	linux-arm-kernel, linux-kernel
  Cc: Joey Gouly, Steffen Eiden, Suzuki K Poulose, Zenghui Yu,
	Vincent Donnefort, Sascha Bischoff
In-Reply-To: <20260618121643.4105064-1-tabba@google.com>

On Thu, 18 Jun 2026 at 13:16, Fuad Tabba <tabba@google.com> wrote:
>
> Hi folks,
>
> After sashiko caught the missing IL bug [1], I did an audit of all ESR
> syndrome construction sites in KVM/arm64 as Marc suggested. This series
> is the result of that audit.

FWIW, went through all the issues Sashiko raised [1]; the only real
bug it found is the one we already fixed [2].

Cheers,
/fuad

[1] https://sashiko.dev/#/patchset/20260618121643.4105064-1-tabba@google.com
[2] https://lore.kernel.org/all/20260615131116.390977-1-tabba@google.com/


>
> The ARM architecture mandates ESR_ELx.IL=1 for several exception
> classes regardless of instruction length: EC=Unknown, Instruction
> Aborts, Data Aborts with ISV=0, and SError. For FPAC (EC=0x1C), IL
> reflects instruction length, but FPAC can only be generated by A64
> instructions, so IL must also be 1.
>
> Patch 1 is the bug sashiko found: inject_undef64() in the pKVM hyp (EL2)
> path never set IL. Patch 2 makes the same fix to inject_undef64() in the
> normal host path, where IL was derived from the triggering trap's
> instruction length. No instruction that reaches undef injection has a
> 16-bit encoding, so patch 2 has no functional change today.
> Patch 3 makes the matching fix to inject_abt64(). Unlike undef
> injection, abort injection is reachable from a 16-bit T32 instruction (a
> 32-bit EL0 task under an AArch64 EL1 guest), so the old code there
> injects an abort with IL=0.
> Patch 4 fixes the FPAC syndrome constructed during nested ERET
> emulation, which did not set IL.
> Patches 5-6 fix SError injection in the emulated and nested paths,
> neither of which set IL.
> Patch 7 fixes a fake ESR used to exit to the host. The host does not
> read IL there, so it is not guest-visible.
>
> Changes since v1 [2]:
> - Patch 4: keep IL by masking it through from the trapped ERET's ESR
>   instead of OR-ing the bit in. The ERET trap (EC=0x1A) always reports
>   IL=1, so this preserves the source syndrome rather than adding the
>   bit unconditionally (Marc).
> - Rebased on v7.1.
>
> Cheers,
> /fuad
>
> [1] https://lore.kernel.org/all/87pl1t8q24.wl-maz@kernel.org/
> [2] https://lore.kernel.org/all/20260614163336.3490925-1-tabba@google.com/
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
>
> Fuad Tabba (7):
>   KVM: arm64: Set ESR_ELx.IL for injected undefined exceptions at EL2
>   KVM: arm64: Unconditionally set IL for injected undefined exceptions
>   KVM: arm64: Unconditionally set IL for injected abort exceptions
>   KVM: arm64: Set IL for injected FPAC exceptions during ERET emulation
>   KVM: arm64: Set IL for emulated SError injection
>   KVM: arm64: Set IL for nested SError injection
>   KVM: arm64: Set IL in fake ESR for pKVM memory sharing exit
>
>  arch/arm64/kvm/emulate-nested.c    |  4 ++--
>  arch/arm64/kvm/hyp/nvhe/pkvm.c     |  3 ++-
>  arch/arm64/kvm/hyp/nvhe/sys_regs.c |  2 +-
>  arch/arm64/kvm/inject_fault.c      | 18 +++++-------------
>  4 files changed, 10 insertions(+), 17 deletions(-)
>
> --
> 2.54.0.1189.g8c84645362-goog
>
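
The rule described in the cover letter can be illustrated with a small
standalone sketch: a syndrome built for one of these exception classes
always carries IL=1, whatever ISS it is given. The field positions (EC in
bits [31:26], IL in bit 25, ISS in bits [24:0]) follow the ESR_ELx layout;
make_esr() is a hypothetical helper for illustration, not a KVM function.

```c
#include <assert.h>
#include <stdint.h>

#define ESR_ELx_EC_SHIFT	26
#define ESR_ELx_IL		(UINT64_C(1) << 25)
#define ESR_ELx_ISS_MASK	((UINT64_C(1) << 25) - 1)

/* Exception class encodings used in the examples. */
#define EC_UNKNOWN	0x00
#define EC_FPAC		0x1C
#define EC_SERROR	0x2F

/* Hypothetical helper: build a syndrome with IL set unconditionally. */
static inline uint64_t make_esr(unsigned int ec, uint64_t iss)
{
	assert(ec < 64);	/* EC is a 6-bit field */
	return ((uint64_t)ec << ESR_ELx_EC_SHIFT) | ESR_ELx_IL |
	       (iss & ESR_ELx_ISS_MASK);
}
```

For example, make_esr(EC_UNKNOWN, 0) yields 0x02000000 and
make_esr(EC_FPAC, 0) yields 0x72000000; in both cases bit 25 is set.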



* Re: [PATCH net] net: airoha: fix BQL underflow and UAF in shared QDMA TX ring
From: Lorenzo Bianconi @ 2026-06-18 15:42 UTC
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Wayen Yan, linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260618-airoha-bql-fixes-v1-1-ffd2c2089518@kernel.org>

> When multiple netdevs share a QDMA TX ring and one device is stopped,
> netdev_tx_reset_subqueue() zeroes that device's BQL counters while its
> pending skbs remain in the shared HW TX ring. When NAPI later completes
> those skbs via netdev_tx_completed_queue(), the already-zeroed
> dql->num_queued counter underflows.
> Moreover, in the airoha_remove() path, netdevs are unregistered
> sequentially while skbs from previously unregistered netdevs may still
> reference freed net_device memory via skb->dev, causing a use-after-free
> during BQL accounting.
> Fix both issues:
> - Remove netdev_tx_reset_subqueue() from airoha_dev_stop() so pending
>   skbs are completed naturally by NAPI with proper BQL accounting.
> - Add netdev_tx_completed_queue() in airoha_qdma_cleanup_tx_queue()
>   to properly account for skbs freed during queue teardown.
> - Introduce airoha_qdma_tx_disable() to stop TX on all registered
>   netdevs for a given QDMA instance under RTNL lock.
> - Move DMA engine start/stop into probe/remove and
>   airoha_qdma_cleanup(), ensuring TX queues are cleaned up while all
>   netdevs are still registered and skb->dev is valid.
> 
> Fixes: 6df0488dc9dd ("net: airoha: fix BQL accounting in airoha_qdma_cleanup_tx_queue()")

This Fixes tag is wrong; the correct one is:
Fixes: a9c2ca61fec7 ("net: airoha: Support multiple net_devices for a single FE GDM port")

Regards,
Lorenzo

> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  drivers/net/ethernet/airoha/airoha_eth.c | 95 ++++++++++++++++++++++++--------
>  1 file changed, 72 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 64dde6464f3f..4d6a061cd779 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -1004,6 +1004,7 @@ static int airoha_qdma_tx_napi_poll(struct napi_struct *napi, int budget)
>  
>  		e = &q->entry[index];
>  		skb = e->skb;
> +		e->skb = NULL;
>  
>  		dma_unmap_single(eth->dev, e->dma_addr, e->dma_len,
>  				 DMA_TO_DEVICE);
> @@ -1147,6 +1148,42 @@ static int airoha_qdma_init_tx(struct airoha_qdma *qdma)
>  	return 0;
>  }
>  
> +static void airoha_qdma_tx_disable(struct airoha_qdma *qdma)
> +{
> +	struct airoha_eth *eth = qdma->eth;
> +	int i;
> +
> +	/* Protect netdev->reg_state and netif_tx_disable() calls. */
> +	rtnl_lock();
> +
> +	for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
> +		struct airoha_gdm_port *port = eth->ports[i];
> +		int j;
> +
> +		if (!port)
> +			continue;
> +
> +		for (j = 0; j < ARRAY_SIZE(port->devs); j++) {
> +			struct airoha_gdm_dev *dev = port->devs[j];
> +			struct net_device *netdev;
> +
> +			if (!dev)
> +				continue;
> +
> +			if (dev->qdma != qdma)
> +				continue;
> +
> +			netdev = netdev_from_priv(dev);
> +			if (netdev->reg_state != NETREG_REGISTERED)
> +				continue;
> +
> +			netif_tx_disable(netdev);
> +		}
> +	}
> +
> +	rtnl_unlock();
> +}
> +
>  static void airoha_qdma_cleanup_tx_queue(struct airoha_queue *q)
>  {
>  	struct airoha_qdma *qdma = q->qdma;
> @@ -1158,13 +1195,20 @@ static void airoha_qdma_cleanup_tx_queue(struct airoha_queue *q)
>  	for (i = 0; i < q->ndesc; i++) {
>  		struct airoha_queue_entry *e = &q->entry[i];
>  		struct airoha_qdma_desc *desc = &q->desc[i];
> +		struct sk_buff *skb = e->skb;
>  
>  		if (!e->dma_addr)
>  			continue;
>  
>  		dma_unmap_single(eth->dev, e->dma_addr, e->dma_len,
>  				 DMA_TO_DEVICE);
> -		dev_kfree_skb_any(e->skb);
> +		if (skb) {
> +			struct netdev_queue *txq;
> +
> +			txq = skb_get_tx_queue(skb->dev, skb);
> +			netdev_tx_completed_queue(txq, 1, skb->len);
> +			dev_kfree_skb_any(skb);
> +		}
>  		e->dma_addr = 0;
>  		e->skb = NULL;
>  		list_add_tail(&e->list, &q->tx_list);
> @@ -1527,6 +1571,23 @@ static void airoha_qdma_cleanup(struct airoha_qdma *qdma)
>  {
>  	int i;
>  
> +	if (test_bit(DEV_STATE_INITIALIZED, &qdma->eth->state)) {
> +		u32 status;
> +
> +		airoha_qdma_tx_disable(qdma);
> +
> +		airoha_qdma_clear(qdma, REG_QDMA_GLOBAL_CFG,
> +				  GLOBAL_CFG_TX_DMA_EN_MASK |
> +				  GLOBAL_CFG_RX_DMA_EN_MASK);
> +		if (read_poll_timeout(airoha_qdma_rr, status,
> +				      !(status & (GLOBAL_CFG_TX_DMA_BUSY_MASK |
> +						  GLOBAL_CFG_RX_DMA_BUSY_MASK)),
> +				      USEC_PER_MSEC, 50 * USEC_PER_MSEC, true,
> +				      qdma, REG_QDMA_GLOBAL_CFG))
> +			dev_warn(qdma->eth->dev,
> +				 "QDMA DMA engine busy timeout\n");
> +	}
> +
>  	for (i = 0; i < ARRAY_SIZE(qdma->q_rx); i++) {
>  		if (!qdma->q_rx[i].ndesc)
>  			continue;
> @@ -1837,9 +1898,6 @@ static int airoha_dev_open(struct net_device *netdev)
>  	}
>  	port->users++;
>  
> -	airoha_qdma_set(qdma, REG_QDMA_GLOBAL_CFG,
> -			GLOBAL_CFG_TX_DMA_EN_MASK |
> -			GLOBAL_CFG_RX_DMA_EN_MASK);
>  	qdma->users++;
>  
>  	if (!airoha_is_lan_gdm_dev(dev) &&
> @@ -1880,12 +1938,9 @@ static int airoha_dev_stop(struct net_device *netdev)
>  	struct airoha_gdm_dev *dev = netdev_priv(netdev);
>  	struct airoha_gdm_port *port = dev->port;
>  	struct airoha_qdma *qdma = dev->qdma;
> -	int i;
>  
>  	netif_tx_disable(netdev);
>  	airoha_set_vip_for_gdm_port(dev, false);
> -	for (i = 0; i < netdev->num_tx_queues; i++)
> -		netdev_tx_reset_subqueue(netdev, i);
>  
>  	if (--port->users)
>  		airoha_set_port_mtu(dev->eth, port);
> @@ -1893,19 +1948,7 @@ static int airoha_dev_stop(struct net_device *netdev)
>  		airoha_set_gdm_port_fwd_cfg(qdma->eth,
>  					    REG_GDM_FWD_CFG(port->id),
>  					    FE_PSE_PORT_DROP);
> -
> -	if (!--qdma->users) {
> -		airoha_qdma_clear(qdma, REG_QDMA_GLOBAL_CFG,
> -				  GLOBAL_CFG_TX_DMA_EN_MASK |
> -				  GLOBAL_CFG_RX_DMA_EN_MASK);
> -
> -		for (i = 0; i < ARRAY_SIZE(qdma->q_tx); i++) {
> -			if (!qdma->q_tx[i].ndesc)
> -				continue;
> -
> -			airoha_qdma_cleanup_tx_queue(&qdma->q_tx[i]);
> -		}
> -	}
> +	qdma->users--;
>  
>  	return 0;
>  }
> @@ -3413,8 +3456,12 @@ static int airoha_probe(struct platform_device *pdev)
>  	if (err)
>  		goto error_netdev_free;
>  
> -	for (i = 0; i < ARRAY_SIZE(eth->qdma); i++)
> +	for (i = 0; i < ARRAY_SIZE(eth->qdma); i++) {
>  		airoha_qdma_start_napi(&eth->qdma[i]);
> +		airoha_qdma_set(&eth->qdma[i], REG_QDMA_GLOBAL_CFG,
> +				GLOBAL_CFG_TX_DMA_EN_MASK |
> +				GLOBAL_CFG_RX_DMA_EN_MASK);
> +	}
>  
>  	for_each_child_of_node(pdev->dev.of_node, np) {
>  		if (!of_device_is_compatible(np, "airoha,eth-mac"))
> @@ -3440,6 +3487,8 @@ static int airoha_probe(struct platform_device *pdev)
>  	for (i = 0; i < ARRAY_SIZE(eth->qdma); i++)
>  		airoha_qdma_stop_napi(&eth->qdma[i]);
>  
> +	airoha_hw_cleanup(eth);
> +
>  	for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
>  		struct airoha_gdm_port *port = eth->ports[i];
>  		int j;
> @@ -3461,7 +3510,6 @@ static int airoha_probe(struct platform_device *pdev)
>  		}
>  		airoha_metadata_dst_free(port);
>  	}
> -	airoha_hw_cleanup(eth);
>  error_netdev_free:
>  	free_netdev(eth->napi_dev);
>  	platform_set_drvdata(pdev, NULL);
> @@ -3477,6 +3525,8 @@ static void airoha_remove(struct platform_device *pdev)
>  	for (i = 0; i < ARRAY_SIZE(eth->qdma); i++)
>  		airoha_qdma_stop_napi(&eth->qdma[i]);
>  
> +	airoha_hw_cleanup(eth);
> +
>  	for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
>  		struct airoha_gdm_port *port = eth->ports[i];
>  		int j;
> @@ -3497,7 +3547,6 @@ static void airoha_remove(struct platform_device *pdev)
>  		}
>  		airoha_metadata_dst_free(port);
>  	}
> -	airoha_hw_cleanup(eth);
>  
>  	free_netdev(eth->napi_dev);
>  	platform_set_drvdata(pdev, NULL);
> 
> ---
> base-commit: 7d8297e26b4e20b5d1c3c3fe51fe81a1c7fbc823
> change-id: 20260618-airoha-bql-fixes-f57b2d108573
> 
> Best regards,
> -- 
> Lorenzo Bianconi <lorenzo@kernel.org>
> 
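
The underflow described in the commit message can be reproduced with a toy
model of the queue accounting. This is only a sketch: struct toy_dql and
its helpers are hypothetical and track just the two counters that matter
here, not the real dynamic queue limits state.

```c
#include <stdbool.h>

/* Toy model: bytes handed to the ring and bytes reported as completed. */
struct toy_dql {
	unsigned int num_queued;
	unsigned int num_completed;
};

/* Account for bytes placed on the shared TX ring. */
static void toy_sent(struct toy_dql *q, unsigned int bytes)
{
	q->num_queued += bytes;
}

/* Models resetting the counters while packets are still in flight. */
static void toy_reset(struct toy_dql *q)
{
	q->num_queued = 0;
	q->num_completed = 0;
}

/*
 * Returns false when more bytes are completed than are outstanding,
 * which is the inconsistent state the patch avoids.
 */
static bool toy_completed(struct toy_dql *q, unsigned int bytes)
{
	if (bytes > q->num_queued - q->num_completed)
		return false;
	q->num_completed += bytes;
	return true;
}
```

Queueing 1500 bytes, resetting, and then completing those 1500 bytes makes
toy_completed() return false: the completion refers to bytes the counters
no longer know about. Leaving the counters alone until NAPI has completed
the pending skbs keeps the two sides in step.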


* [PATCH v4 1/3] perf: marvell: Add MPAM partid filtering to CN10K TAD PMU
From: Geetha sowjanya @ 2026-06-18 15:36 UTC
  To: linux-perf-users, linux-kernel, linux-arm-kernel, devicetree
  Cc: mark.rutland, will, krzk+dt, gakula
In-Reply-To: <20260618153610.13649-1-gakula@marvell.com>

From: Tanmay Jagdale <tanmay@marvell.com>

The TAD PMU exposes counters that can be filtered by MPAM partition id
for a subset of allocation and hit events.

Add a 9-bit partid format attribute (config1 bits 0-8) and a partid_en
enable bit (config1 bit 9), and route counter programming through
variant-specific ops so CN10K keeps MPAM-capable programming while Odyssey
keeps the reduced event set without advertising partid in sysfs.

Probe no longer mutates the platform_device MMIO resource (it walks a local
map_start instead), rejects a tad-cnt or page size of zero, validates the
memory window against tad-cnt, and adds the hotplug instance before
registering the perf PMU, removing the instance again if registration fails.

Example:
  perf stat -e tad/tad_alloc_any,partid=0x12,partid_en=1/ -- <program>
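
As an illustration of the config1 layout, the sketch below packs partid and
partid_en the way the format strings describe and applies the same checks
the patch adds to event_init for CN10K. The helper names tad_config1() and
tad_config1_valid() are hypothetical and exist only in this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define PARTID_MASK	UINT64_C(0x1ff)		/* config1 bits 0..8 */
#define PARTID_EN	(UINT64_C(1) << 9)	/* config1 bit 9 */

/* Hypothetical helper: pack partid and partid_en into config1. */
static uint64_t tad_config1(unsigned int partid, bool enable)
{
	return ((uint64_t)partid & PARTID_MASK) | (enable ? PARTID_EN : 0);
}

/* Hypothetical helper mirroring the CN10K checks in tad_pmu_event_init(). */
static bool tad_config1_valid(uint64_t cfg1, unsigned int event)
{
	/* Reserved bits above bit 9 must be zero. */
	if (cfg1 & ~(PARTID_MASK | PARTID_EN))
		return false;
	/* A non-zero partid without the enable bit is rejected. */
	if ((cfg1 & PARTID_MASK) && !(cfg1 & PARTID_EN))
		return false;
	/* Filtering is only accepted for events 0x1a..0x20. */
	if ((cfg1 & PARTID_EN) && (event <= 0x19 || event >= 0x21))
		return false;
	return true;
}
```

With partid=0x12 and partid_en=1 as in the perf command above, config1 is
0x212.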

Signed-off-by: Tanmay Jagdale <tanmay@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
---

Changelog (since v3)
--------------------
- Restore cpuhp_state_add_instance_nocalls before perf_pmu_register in probe
  so users cannot attach events before the hotplug instance exists; unwind
  removes the hotplug instance if perf registration fails.
- Add perf_ready: tad_pmu_offline_cpu skips perf_pmu_migrate_context until after
  successful perf_pmu_register, so a CPU offline between hotplug add and perf
  register does not touch perf core state for an unregistered PMU.

Changelog (since v2)
--------------------
- Validate the eventId using an appropriate mask to ensure
  it is restricted to 8 bits.

Changelog (since v1)
--------------------
- Fix config1 filter enable to use bit 9 consistently with the PMU format
  string (partid_en) and reject reserved bits with GENMASK(9, 0).
- Register perf_pmu_register before cpuhp_state_add_instance_nocalls and
  unregister on hotplug failure.

 drivers/perf/marvell_cn10k_tad_pmu.c | 220 +++++++++++++++++++++------
 1 file changed, 171 insertions(+), 49 deletions(-)

diff --git a/drivers/perf/marvell_cn10k_tad_pmu.c b/drivers/perf/marvell_cn10k_tad_pmu.c
index 51ccb0befa05..340be3776fe7 100644
--- a/drivers/perf/marvell_cn10k_tad_pmu.c
+++ b/drivers/perf/marvell_cn10k_tad_pmu.c
@@ -7,6 +7,8 @@
 #define pr_fmt(fmt) "tad_pmu: " fmt
 
 #include <linux/io.h>
+#include <linux/bits.h>
+#include <linux/compiler.h>
 #include <linux/module.h>
 #include <linux/of.h>
 #include <linux/cpuhotplug.h>
@@ -14,12 +16,20 @@
 #include <linux/platform_device.h>
 #include <linux/acpi.h>
 
-#define TAD_PFC_OFFSET		0x800
-#define TAD_PFC(counter)	(TAD_PFC_OFFSET | (counter << 3))
 #define TAD_PRF_OFFSET		0x900
-#define TAD_PRF(counter)	(TAD_PRF_OFFSET | (counter << 3))
+#define TAD_PFC_OFFSET		0x800
+#define TAD_PFC(base, counter)	((base) | ((u64)(counter) << 3))
+#define TAD_PRF(base, counter)	((base) | ((u64)(counter) << 3))
 #define TAD_PRF_CNTSEL_MASK	0xFF
+#define TAD_PRF_MATCH_PARTID	BIT(8)
+#define TAD_PRF_PARTID_NS	BIT(10)
+/*
+ * config1: bits 0..8 MPAM partition id (including 0); bit 9 requests
+ * filtering for MPAM-capable events. All-zero config1 means no filter.
+ */
+#define TAD_PARTID_FILTER_EN	BIT(9)
 #define TAD_MAX_COUNTERS	8
+#define TAD_EVENT_SEL_MASK	GENMASK(7, 0)
 
 #define to_tad_pmu(p) (container_of(p, struct tad_pmu, pmu))
 
@@ -27,30 +37,94 @@ struct tad_region {
 	void __iomem	*base;
 };
 
+enum mrvl_tad_pmu_version {
+	TAD_PMU_V1 = 1,
+	TAD_PMU_V2,
+};
+
+struct tad_pmu_data {
+	int id;
+	u64 tad_prf_offset;
+	u64 tad_pfc_offset;
+};
+
 struct tad_pmu {
 	struct pmu pmu;
 	struct tad_region *regions;
 	u32 region_cnt;
 	unsigned int cpu;
+	/* Set after successful perf_pmu_register(); gates offline migration. */
+	bool perf_ready;
+	const struct tad_pmu_ops *ops;
+	const struct tad_pmu_data *pdata;
 	struct hlist_node node;
 	struct perf_event *events[TAD_MAX_COUNTERS];
 	DECLARE_BITMAP(counters_map, TAD_MAX_COUNTERS);
 };
 
-enum mrvl_tad_pmu_version {
-	TAD_PMU_V1 = 1,
-	TAD_PMU_V2,
-};
-
-struct tad_pmu_data {
-	int id;
+struct tad_pmu_ops {
+	void (*start_counter)(struct tad_pmu *pmu, struct perf_event *event);
 };
 
 static int tad_pmu_cpuhp_state;
 
+static void tad_pmu_start_counter(struct tad_pmu *pmu,
+				  struct perf_event *event)
+{
+	const struct tad_pmu_data *pdata = pmu->pdata;
+	struct hw_perf_event *hwc = &event->hw;
+	u32 event_idx = (u32)(event->attr.config & TAD_EVENT_SEL_MASK);
+	u32 counter_idx = hwc->idx;
+	u64 partid_filter = 0;
+	u64 reg_val;
+	u64 cfg1 = event->attr.config1;
+	bool use_mpam = cfg1 & TAD_PARTID_FILTER_EN;
+	u32 partid = (u32)(cfg1 & GENMASK(8, 0));
+	int i;
+
+	for (i = 0; i < pmu->region_cnt; i++)
+		writeq_relaxed(0, pmu->regions[i].base +
+			       TAD_PFC(pdata->tad_pfc_offset, counter_idx));
+
+	if (use_mpam && event_idx > 0x19 && event_idx < 0x21) {
+		partid_filter = TAD_PRF_MATCH_PARTID | TAD_PRF_PARTID_NS |
+				((u64)partid << 11);
+	}
+
+
+	for (i = 0; i < pmu->region_cnt; i++) {
+		reg_val = event_idx & 0xFF;
+		reg_val |= partid_filter;
+		writeq_relaxed(reg_val, pmu->regions[i].base +
+			       TAD_PRF(pdata->tad_prf_offset, counter_idx));
+	}
+}
+
+static void tad_pmu_v2_start_counter(struct tad_pmu *pmu,
+				     struct perf_event *event)
+{
+	const struct tad_pmu_data *pdata = pmu->pdata;
+	struct hw_perf_event *hwc = &event->hw;
+	u32 event_idx = (u32)(event->attr.config & TAD_EVENT_SEL_MASK);
+	u32 counter_idx = hwc->idx;
+	u64 reg_val;
+	int i;
+
+	for (i = 0; i < pmu->region_cnt; i++)
+		writeq_relaxed(0, pmu->regions[i].base +
+			       TAD_PFC(pdata->tad_pfc_offset, counter_idx));
+
+	for (i = 0; i < pmu->region_cnt; i++) {
+		reg_val = event_idx & 0xFF;
+		writeq_relaxed(reg_val, pmu->regions[i].base +
+			       TAD_PRF(pdata->tad_prf_offset, counter_idx));
+	}
+}
+
 static void tad_pmu_event_counter_read(struct perf_event *event)
 {
 	struct tad_pmu *tad_pmu = to_tad_pmu(event->pmu);
+	const struct tad_pmu_data *pdata = tad_pmu->pdata;
 	struct hw_perf_event *hwc = &event->hw;
 	u32 counter_idx = hwc->idx;
 	u64 prev, new;
@@ -60,7 +134,7 @@ static void tad_pmu_event_counter_read(struct perf_event *event)
 		prev = local64_read(&hwc->prev_count);
 		for (i = 0, new = 0; i < tad_pmu->region_cnt; i++)
 			new += readq(tad_pmu->regions[i].base +
-				     TAD_PFC(counter_idx));
+				     TAD_PFC(pdata->tad_pfc_offset, counter_idx));
 	} while (local64_cmpxchg(&hwc->prev_count, prev, new) != prev);
 
 	local64_add(new - prev, &event->count);
@@ -69,16 +143,14 @@ static void tad_pmu_event_counter_read(struct perf_event *event)
 static void tad_pmu_event_counter_stop(struct perf_event *event, int flags)
 {
 	struct tad_pmu *tad_pmu = to_tad_pmu(event->pmu);
+	const struct tad_pmu_data *pdata = tad_pmu->pdata;
 	struct hw_perf_event *hwc = &event->hw;
 	u32 counter_idx = hwc->idx;
 	int i;
 
-	/* TAD()_PFC() stop counting on the write
-	 * which sets TAD()_PRF()[CNTSEL] == 0
-	 */
 	for (i = 0; i < tad_pmu->region_cnt; i++) {
 		writeq_relaxed(0, tad_pmu->regions[i].base +
-			       TAD_PRF(counter_idx));
+			       TAD_PRF(pdata->tad_prf_offset, counter_idx));
 	}
 
 	tad_pmu_event_counter_read(event);
@@ -89,26 +161,10 @@ static void tad_pmu_event_counter_start(struct perf_event *event, int flags)
 {
 	struct tad_pmu *tad_pmu = to_tad_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
-	u32 event_idx = event->attr.config;
-	u32 counter_idx = hwc->idx;
-	u64 reg_val;
-	int i;
 
 	hwc->state = 0;
 
-	/* Typically TAD_PFC() are zeroed to start counting */
-	for (i = 0; i < tad_pmu->region_cnt; i++)
-		writeq_relaxed(0, tad_pmu->regions[i].base +
-			       TAD_PFC(counter_idx));
-
-	/* TAD()_PFC() start counting on the write
-	 * which sets TAD()_PRF()[CNTSEL] != 0
-	 */
-	for (i = 0; i < tad_pmu->region_cnt; i++) {
-		reg_val = event_idx & 0xFF;
-		writeq_relaxed(reg_val,	tad_pmu->regions[i].base +
-			       TAD_PRF(counter_idx));
-	}
+	tad_pmu->ops->start_counter(tad_pmu, event);
 }
 
 static void tad_pmu_event_counter_del(struct perf_event *event, int flags)
@@ -128,7 +184,6 @@ static int tad_pmu_event_counter_add(struct perf_event *event, int flags)
 	struct hw_perf_event *hwc = &event->hw;
 	int idx;
 
-	/* Get a free counter for this event */
 	idx = find_first_zero_bit(tad_pmu->counters_map, TAD_MAX_COUNTERS);
 	if (idx == TAD_MAX_COUNTERS)
 		return -EAGAIN;
@@ -148,6 +203,9 @@ static int tad_pmu_event_counter_add(struct perf_event *event, int flags)
 static int tad_pmu_event_init(struct perf_event *event)
 {
 	struct tad_pmu *tad_pmu = to_tad_pmu(event->pmu);
+	const struct tad_pmu_data *pdata = tad_pmu->pdata;
+	u32 event_idx = (u32)(event->attr.config & TAD_EVENT_SEL_MASK);
+	u64 cfg1 = event->attr.config1;
 
 	if (event->attr.type != event->pmu->type)
 		return -ENOENT;
@@ -158,6 +216,23 @@ static int tad_pmu_event_init(struct perf_event *event)
 	if (event->state != PERF_EVENT_STATE_OFF)
 		return -EINVAL;
 
+	if (event->attr.config & ~TAD_EVENT_SEL_MASK)
+		return -EINVAL;
+
+	if (pdata->id == TAD_PMU_V2) {
+		if (cfg1)
+			return -EINVAL;
+	} else {
+		if ((cfg1 & GENMASK(8, 0)) && !(cfg1 & TAD_PARTID_FILTER_EN))
+			return -EINVAL;
+		if (cfg1 & TAD_PARTID_FILTER_EN) {
+			if (event_idx <= 0x19 || event_idx >= 0x21)
+				return -EINVAL;
+		}
+		if (cfg1 & ~GENMASK(9, 0))
+			return -EINVAL;
+	}
+
 	event->cpu = tad_pmu->cpu;
 	event->hw.idx = -1;
 	event->hw.config_base = event->attr.config;
@@ -232,7 +307,7 @@ static struct attribute *ody_tad_pmu_event_attrs[] = {
 	TAD_PMU_EVENT_ATTR(tad_hit_ltg, 0x1e),
 	TAD_PMU_EVENT_ATTR(tad_hit_any, 0x1f),
 	TAD_PMU_EVENT_ATTR(tad_tag_rd, 0x20),
-	TAD_PMU_EVENT_ATTR(tad_tot_cycle, 0xFF),
+	TAD_PMU_EVENT_ATTR(tad_tot_cycle, 0xff),
 	NULL
 };
 
@@ -242,9 +317,13 @@ static const struct attribute_group ody_tad_pmu_events_attr_group = {
 };
 
 PMU_FORMAT_ATTR(event, "config:0-7");
+PMU_FORMAT_ATTR(partid, "config1:0-8");
+PMU_FORMAT_ATTR(partid_en, "config1:9-9");
 
 static struct attribute *tad_pmu_format_attrs[] = {
 	&format_attr_event.attr,
+	&format_attr_partid.attr,
+	&format_attr_partid_en.attr,
 	NULL
 };
 
@@ -253,6 +332,16 @@ static struct attribute_group tad_pmu_format_attr_group = {
 	.attrs = tad_pmu_format_attrs,
 };
 
+static struct attribute *ody_tad_pmu_format_attrs[] = {
+	&format_attr_event.attr,
+	NULL
+};
+
+static struct attribute_group ody_tad_pmu_format_attr_group = {
+	.name = "format",
+	.attrs = ody_tad_pmu_format_attrs,
+};
+
 static ssize_t tad_pmu_cpumask_show(struct device *dev,
 				struct device_attribute *attr, char *buf)
 {
@@ -281,16 +370,25 @@ static const struct attribute_group *tad_pmu_attr_groups[] = {
 
 static const struct attribute_group *ody_tad_pmu_attr_groups[] = {
 	&ody_tad_pmu_events_attr_group,
-	&tad_pmu_format_attr_group,
+	&ody_tad_pmu_format_attr_group,
 	&tad_pmu_cpumask_attr_group,
 	NULL
 };
 
+static const struct tad_pmu_ops tad_pmu_ops = {
+	.start_counter = tad_pmu_start_counter,
+};
+
+static const struct tad_pmu_ops tad_pmu_v2_ops = {
+	.start_counter = tad_pmu_v2_start_counter,
+};
+
 static int tad_pmu_probe(struct platform_device *pdev)
 {
 	const struct tad_pmu_data *dev_data;
 	struct device *dev = &pdev->dev;
 	struct tad_region *regions;
+	resource_size_t map_start;
 	struct tad_pmu *tad_pmu;
 	struct resource *res;
 	u32 tad_pmu_page_size;
@@ -298,7 +396,6 @@ static int tad_pmu_probe(struct platform_device *pdev)
 	u32 tad_cnt;
 	int version;
 	int i, ret;
-	char *name;
 
 	tad_pmu = devm_kzalloc(&pdev->dev, sizeof(*tad_pmu), GFP_KERNEL);
 	if (!tad_pmu)
@@ -312,6 +409,7 @@ static int tad_pmu_probe(struct platform_device *pdev)
 		return -ENODEV;
 	}
 	version = dev_data->id;
+	tad_pmu->pdata = dev_data;
 
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	if (!res) {
@@ -338,22 +436,31 @@ static int tad_pmu_probe(struct platform_device *pdev)
 		dev_err(&pdev->dev, "Can't find tad-cnt property\n");
 		return ret;
 	}
+	if (!tad_cnt || !tad_page_size || !tad_pmu_page_size) {
+		dev_err(&pdev->dev, "Invalid tad-cnt or page size\n");
+		return -EINVAL;
+	}
 
 	regions = devm_kcalloc(&pdev->dev, tad_cnt,
 			       sizeof(*regions), GFP_KERNEL);
 	if (!regions)
 		return -ENOMEM;
 
-	/* ioremap the distributed TAD pmu regions */
-	for (i = 0; i < tad_cnt && res->start < res->end; i++) {
-		regions[i].base = devm_ioremap(&pdev->dev,
-					       res->start,
+	map_start = res->start;
+	for (i = 0; i < tad_cnt; i++) {
+		if (map_start > res->end ||
+		    tad_pmu_page_size > (resource_size_t)(res->end - map_start + 1)) {
+			dev_err(&pdev->dev, "TAD PMU mem window too small for tad-cnt=%u\n",
+				tad_cnt);
+			return -EINVAL;
+		}
+		regions[i].base = devm_ioremap(&pdev->dev, map_start,
 					       tad_pmu_page_size);
 		if (!regions[i].base) {
 			dev_err(&pdev->dev, "TAD%d ioremap fail\n", i);
 			return -ENOMEM;
 		}
-		res->start += tad_page_size;
+		map_start += tad_page_size;
 	}
 
 	tad_pmu->regions = regions;
@@ -374,14 +481,16 @@ static int tad_pmu_probe(struct platform_device *pdev)
 		.read		= tad_pmu_event_counter_read,
 	};
 
-	if (version == TAD_PMU_V1)
+	if (version == TAD_PMU_V1) {
 		tad_pmu->pmu.attr_groups = tad_pmu_attr_groups;
-	else
+		tad_pmu->ops		 = &tad_pmu_ops;
+	} else {
 		tad_pmu->pmu.attr_groups = ody_tad_pmu_attr_groups;
+		tad_pmu->ops		 = &tad_pmu_v2_ops;
+	}
 
 	tad_pmu->cpu = raw_smp_processor_id();
 
-	/* Register pmu instance for cpu hotplug */
 	ret = cpuhp_state_add_instance_nocalls(tad_pmu_cpuhp_state,
 					       &tad_pmu->node);
 	if (ret) {
@@ -389,19 +498,24 @@ static int tad_pmu_probe(struct platform_device *pdev)
 		return ret;
 	}
 
-	name = "tad";
-	ret = perf_pmu_register(&tad_pmu->pmu, name, -1);
-	if (ret)
+	ret = perf_pmu_register(&tad_pmu->pmu, "tad", -1);
+	if (ret) {
+		dev_err(&pdev->dev, "Error %d registering perf PMU\n", ret);
 		cpuhp_state_remove_instance_nocalls(tad_pmu_cpuhp_state,
 						    &tad_pmu->node);
+		return ret;
+	}
 
-	return ret;
+	WRITE_ONCE(tad_pmu->perf_ready, true);
+
+	return 0;
 }
 
 static void tad_pmu_remove(struct platform_device *pdev)
 {
 	struct tad_pmu *pmu = platform_get_drvdata(pdev);
 
+	WRITE_ONCE(pmu->perf_ready, false);
 	cpuhp_state_remove_instance_nocalls(tad_pmu_cpuhp_state,
 						&pmu->node);
 	perf_pmu_unregister(&pmu->pmu);
@@ -410,12 +524,17 @@ static void tad_pmu_remove(struct platform_device *pdev)
 #if defined(CONFIG_OF) || defined(CONFIG_ACPI)
 static const struct tad_pmu_data tad_pmu_data = {
 	.id   = TAD_PMU_V1,
+	.tad_prf_offset = TAD_PRF_OFFSET,
+	.tad_pfc_offset = TAD_PFC_OFFSET,
 };
+
 #endif
 
 #ifdef CONFIG_ACPI
 static const struct tad_pmu_data tad_pmu_v2_data = {
 	.id   = TAD_PMU_V2,
+	.tad_prf_offset = TAD_PRF_OFFSET,
+	.tad_pfc_offset = TAD_PFC_OFFSET,
 };
 #endif
 
@@ -451,6 +570,9 @@ static int tad_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	struct tad_pmu *pmu = hlist_entry_safe(node, struct tad_pmu, node);
 	unsigned int target;
 
+	if (!READ_ONCE(pmu->perf_ready))
+		return 0;
+
 	if (cpu != pmu->cpu)
 		return 0;
 
@@ -491,6 +613,6 @@ static void __exit tad_pmu_exit(void)
 module_init(tad_pmu_init);
 module_exit(tad_pmu_exit);
 
-MODULE_DESCRIPTION("Marvell CN10K LLC-TAD Perf driver");
+MODULE_DESCRIPTION("Marvell CN10K LLC-TAD perf driver");
 MODULE_AUTHOR("Bhaskara Budiredla <bbudiredla@marvell.com>");
 MODULE_LICENSE("GPL v2");
-- 
2.25.1




* [PATCH v4 2/3] perf: marvell: Add CN20K LLC-TAD PMU support
From: Geetha sowjanya @ 2026-06-18 15:36 UTC
  To: linux-perf-users, linux-kernel, linux-arm-kernel, devicetree
  Cc: mark.rutland, will, krzk+dt, gakula
In-Reply-To: <20260618153610.13649-1-gakula@marvell.com>

Add support for the LLC Tag-and-Data (TAD) PMU present in
Marvell CN20K SoCs.

The CN20K TAD PMU is based on the CN10K design but differs in the
layout of PFC/PRF register offsets relative to each TAD base, and
introduces additional events. These offsets are selected by the driver
based on the compatible string and are not described via DT properties.

Because of this, "marvell,cn10k-tad-pmu" cannot be used as a fallback
for CN20K, as it would result in incorrect register programming.

Add support for "marvell,cn20k-tad-pmu" by:
  - Introducing a TAD_PMU_V3 profile with CN20K-specific register bases
  - Extending the event map for new CN20K events
  - Matching the PMU via OF and ACPI (MRVL000F)

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
---

Changelog (since v2)
--------------------
- Validate the eventId using an appropriate mask to ensure
  it is restricted to 8 bits.

Changelog (since v1)
--------------------
- Hide V3-only events on CN10K via sysfs is_visible and reject them in
  event_init.
- Use CN20K-specific MPAM PRF bits (MATCH_MPAMNS, partid << 10) for V3;
  software partid is limited to nine bits so this does not collide with
  the fixed bit at 25.
- Reset hwc->prev_count when starting counters so reads match cleared HW.
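
As an illustration of the MPAM encoding described in the changelog
above, here is a minimal userspace sketch (not the driver code). The
bit definitions are copied from the patch; BIT_U64, PARTID_MASK and the
two helper names are hypothetical, and the explicit masking is an
assumption added so the sketch stands alone (the driver relies on
event_init validation instead).

```c
#include <assert.h>
#include <stdint.h>

#define BIT_U64(n)		(UINT64_C(1) << (n))	/* hypothetical helper */
#define TAD_PRF_MATCH_PARTID	BIT_U64(8)
#define TAD_PRF_PARTID_NS	BIT_U64(10)
#define TAD_PRF_MATCH_MPAMNS	BIT_U64(25)
#define PARTID_MASK		UINT64_C(0x1ff)	/* nine bits: config1 bits 0..8 */

/* CN10K-style filter: partid is placed at bit 11 and above. */
static uint64_t partid_filter_v1(uint64_t partid)
{
	return TAD_PRF_MATCH_PARTID | TAD_PRF_PARTID_NS |
	       ((partid & PARTID_MASK) << 11);
}

/* CN20K (V3) filter: partid is placed at bit 10 and above. */
static uint64_t partid_filter_v3(uint64_t partid)
{
	return TAD_PRF_MATCH_PARTID | TAD_PRF_MATCH_MPAMNS |
	       ((partid & PARTID_MASK) << 10);
}
```

With a nine-bit partid the V3 field occupies bits 10..18, so it cannot
reach the fixed bit at 25.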

 drivers/perf/marvell_cn10k_tad_pmu.c | 54 ++++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 2 deletions(-)

diff --git a/drivers/perf/marvell_cn10k_tad_pmu.c b/drivers/perf/marvell_cn10k_tad_pmu.c
index 340be3776fe7..b73ee2f58fd4 100644
--- a/drivers/perf/marvell_cn10k_tad_pmu.c
+++ b/drivers/perf/marvell_cn10k_tad_pmu.c
@@ -18,11 +18,14 @@
 
 #define TAD_PRF_OFFSET		0x900
 #define TAD_PFC_OFFSET		0x800
+#define TAD_PRF_NS_OFFSET	0x30900
+#define TAD_PFC_NS_OFFSET	0x30800
 #define TAD_PFC(base, counter)	((base) | ((u64)(counter) << 3))
 #define TAD_PRF(base, counter)	((base) | ((u64)(counter) << 3))
 #define TAD_PRF_CNTSEL_MASK	0xFF
 #define TAD_PRF_MATCH_PARTID	BIT(8)
 #define TAD_PRF_PARTID_NS	BIT(10)
+#define TAD_PRF_MATCH_MPAMNS	BIT(25)
 /*
  * config1: bits 0..8 MPAM partition id (including 0); bit 9 requests
  * filtering for MPAM-capable events. All-zero config1 means no filter.
@@ -40,6 +43,7 @@ struct tad_region {
 enum mrvl_tad_pmu_version {
 	TAD_PMU_V1 = 1,
 	TAD_PMU_V2,
+	TAD_PMU_V3,
 };
 
 struct tad_pmu_data {
@@ -89,8 +93,15 @@ static void tad_pmu_start_counter(struct tad_pmu *pmu,
 	if (use_mpam && event_idx > 0x19 && event_idx < 0x21) {
 		partid_filter = TAD_PRF_MATCH_PARTID | TAD_PRF_PARTID_NS |
 				((u64)partid << 11);
+
+		if (pdata->id == TAD_PMU_V3)
+			partid_filter = TAD_PRF_MATCH_PARTID | TAD_PRF_MATCH_MPAMNS |
+				((u64)partid << 10);
 	}
 
+	/* CN10K support events 0:24*/
+	if (pdata->id == TAD_PMU_V1 && event_idx >= 0x25)
+		return;
 
 	for (i = 0; i < pmu->region_cnt; i++) {
 		reg_val = event_idx & 0xFF;
@@ -163,6 +174,7 @@ static void tad_pmu_event_counter_start(struct perf_event *event, int flags)
 	struct hw_perf_event *hwc = &event->hw;
 
 	hwc->state = 0;
+	local64_set(&hwc->prev_count, 0);
 
 	tad_pmu->ops->start_counter(tad_pmu, event);
 }
@@ -223,6 +235,8 @@ static int tad_pmu_event_init(struct perf_event *event)
 		if (cfg1)
 			return -EINVAL;
 	} else {
+		if (pdata->id == TAD_PMU_V1 && event_idx >= 0x25)
+			return -EINVAL;
 		if ((cfg1 & GENMASK(8, 0)) && !(cfg1 & TAD_PARTID_FILTER_EN))
 			return -EINVAL;
 		if (cfg1 & TAD_PARTID_FILTER_EN) {
@@ -249,6 +263,22 @@ static ssize_t tad_pmu_event_show(struct device *dev,
 	return sysfs_emit(page, "event=0x%02llx\n", pmu_attr->id);
 }
 
+static umode_t tad_pmu_event_attr_is_visible(struct kobject *kobj,
+					     struct attribute *attr, int unused)
+{
+	struct pmu *pmu = dev_get_drvdata(kobj_to_dev(kobj));
+	struct tad_pmu *t = to_tad_pmu(pmu);
+	struct device_attribute *da = container_of(attr, struct device_attribute,
+						   attr);
+	struct perf_pmu_events_attr *e = container_of(da, struct perf_pmu_events_attr,
+						      attr);
+	u64 id = e->id;
+
+	if (t->pdata->id != TAD_PMU_V3 && id >= 0x25)
+		return 0;
+	return attr->mode;
+}
+
 #define TAD_PMU_EVENT_ATTR(name, config)			\
 	PMU_EVENT_ATTR_ID(name, tad_pmu_event_show, config)
 
@@ -290,12 +320,25 @@ static struct attribute *tad_pmu_event_attrs[] = {
 	TAD_PMU_EVENT_ATTR(tad_dat_rd_byp, 0x22),
 	TAD_PMU_EVENT_ATTR(tad_ifb_occ, 0x23),
 	TAD_PMU_EVENT_ATTR(tad_req_occ, 0x24),
+	TAD_PMU_EVENT_ATTR(tad_req_msh_out_dtg_evict, 0x25),
+	TAD_PMU_EVENT_ATTR(tad_req_msh_out_ltg_evict, 0x26),
+	TAD_PMU_EVENT_ATTR(tad_rsp_msh_out_mpam, 0x28),
+	TAD_PMU_EVENT_ATTR(tad_replays, 0x29),
+	TAD_PMU_EVENT_ATTR(tad_req_byp0, 0x2a),
+	TAD_PMU_EVENT_ATTR(tad_req_byp1, 0x2b),
+	TAD_PMU_EVENT_ATTR(tad_txreq_byp, 0x2c),
+	TAD_PMU_EVENT_ATTR(tad_time_in_dslp, 0x2d),
+	TAD_PMU_EVENT_ATTR(tad_time_elapsed, 0x2e),
+	TAD_PMU_EVENT_ATTR(tad_req_msh_out_dss_rd_128mrg, 0x2f),
+	TAD_PMU_EVENT_ATTR(tad_req_msh_out_dss_wr_128mrg, 0x30),
+	TAD_PMU_EVENT_ATTR(tad_tot_cycle, 0xff),
 	NULL
 };
 
 static const struct attribute_group tad_pmu_events_attr_group = {
 	.name = "events",
 	.attrs = tad_pmu_event_attrs,
+	.is_visible = tad_pmu_event_attr_is_visible,
 };
 
 static struct attribute *ody_tad_pmu_event_attrs[] = {
@@ -481,7 +524,7 @@ static int tad_pmu_probe(struct platform_device *pdev)
 		.read		= tad_pmu_event_counter_read,
 	};
 
-	if (version == TAD_PMU_V1) {
+	if (version == TAD_PMU_V1 || version == TAD_PMU_V3) {
 		tad_pmu->pmu.attr_groups = tad_pmu_attr_groups;
 		tad_pmu->ops		 = &tad_pmu_ops;
 	} else {
@@ -528,6 +571,11 @@ static const struct tad_pmu_data tad_pmu_data = {
 	.tad_pfc_offset = TAD_PFC_OFFSET,
 };
 
+static const struct tad_pmu_data tad_pmu_cn20k_data = {
+	.id   = TAD_PMU_V3,
+	.tad_prf_offset = TAD_PRF_NS_OFFSET,
+	.tad_pfc_offset = TAD_PFC_NS_OFFSET,
+};
 #endif
 
 #ifdef CONFIG_ACPI
@@ -541,6 +589,7 @@ static const struct tad_pmu_data tad_pmu_v2_data = {
 #ifdef CONFIG_OF
 static const struct of_device_id tad_pmu_of_match[] = {
 	{ .compatible = "marvell,cn10k-tad-pmu", .data = &tad_pmu_data },
+	{ .compatible = "marvell,cn20k-tad-pmu", .data = &tad_pmu_cn20k_data },
 	{},
 };
 #endif
@@ -549,6 +598,7 @@ static const struct of_device_id tad_pmu_of_match[] = {
 static const struct acpi_device_id tad_pmu_acpi_match[] = {
 	{"MRVL000B", (kernel_ulong_t)&tad_pmu_data},
 	{"MRVL000D", (kernel_ulong_t)&tad_pmu_v2_data},
+	{"MRVL000F", (kernel_ulong_t)&tad_pmu_cn20k_data},
 	{},
 };
 MODULE_DEVICE_TABLE(acpi, tad_pmu_acpi_match);
@@ -613,6 +663,6 @@ static void __exit tad_pmu_exit(void)
 module_init(tad_pmu_init);
 module_exit(tad_pmu_exit);
 
-MODULE_DESCRIPTION("Marvell CN10K LLC-TAD perf driver");
+MODULE_DESCRIPTION("Marvell CN10K/CN20K LLC-TAD perf driver");
 MODULE_AUTHOR("Bhaskara Budiredla <bbudiredla@marvell.com>");
 MODULE_LICENSE("GPL v2");
-- 
2.25.1



^ permalink raw reply related

* Re: [PATCH v6 00/20] dma-mapping: Use DMA_ATTR_CC_SHARED through direct, pool and swiotlb paths
From: Jason Gunthorpe @ 2026-06-18 15:37 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Alexey Kardashevskiy, Catalin Marinas, iommu, linux-arm-kernel,
	linux-kernel, linux-coco, Robin Murphy, Marek Szyprowski,
	Will Deacon, Marc Zyngier, Steven Price, Suzuki K Poulose,
	Jiri Pirko, Mostafa Saleh, Petr Tesarik, Dan Williams, Xu Yilun,
	linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <yq5aqzm4dz25.fsf@kernel.org>

On Thu, Jun 18, 2026 at 09:37:22AM +0100, Aneesh Kumar K.V wrote:
> Alexey Kardashevskiy <aik@amd.com> writes:
> 
> > On 10/6/26 00:47, Jason Gunthorpe wrote:
> >> On Tue, Jun 09, 2026 at 02:43:08PM +0100, Catalin Marinas wrote:
> >>> On Thu, Jun 04, 2026 at 02:09:39PM +0530, Aneesh Kumar K.V (Arm) wrote:
> >>>> This series propagates DMA_ATTR_CC_SHARED through the dma-direct,
> >>>> dma-pool, and swiotlb paths so that encrypted and decrypted DMA buffers
> >>>> are handled consistently.
> >>>>
> >>>> Today, the direct DMA path mostly relies on force_dma_unencrypted() for
> >>>> shared/decrypted buffer handling. This series consolidates the
> >>>> force_dma_unencrypted() checks in the top-level functions and ensures
> >>>> that the remaining DMA interfaces use DMA attributes to make the correct
> >>>> decisions.
> >>>
> >>> Please check Sashiko's reports, it has some good points:
> >>>
> >>> https://sashiko.dev/#/patchset/20260604083959.1265923-1-aneesh.kumar@kernel.org
> >>>
> >>> I think the main one is the swiotlb_tbl_map_single() changes which break
> >>> AMD SME host support. There cc_platform_has(CC_ATTR_MEM_ENCRYPT) is true
> >>> but force_dma_unencrypted() is false. Normally you'd not end up on this
> >>> path but you can have swiotlb=force.
> >> 
> >> IMHO that's an AMD issue, not with the design of this series..
> >> 
> >> The series is right, a device that is !force_dma_decrypted() must be
> >> considerd to be a trusted device and we must never place any DMA
> >> mappings for a trusted device into shared memory.
> >
> > swiotlb=force forces swiotlb, not decryption.

If force_dma_decrypted() == true then swiotlb must allocate from a
decrypted memory pool. It is right there in the name!

The hypervisor environment should *never* set force_dma_decrypted()
because all devices can access all hypervisor memory, up to their IOVA
limits.

> > So when I try "mem_encrypt=on iommu=pt swiotlb=force" with this
> > patchset, it fails to boot. But it boots with a hack like this:

On the host side I expect this to cause swiotlb to allocate encrypted
memory and bounce to it.

>  		u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
>  		u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
>  						dev->bus_dma_limit);
> +		/*
> +		 * With memory encryption enabled, SWIOTLB is marked decrypted.
> +		 * If SWIOTLB bouncing is forced, treat the device as requiring
> +		 * decrypted DMA.
> +		 */

And this is more insane logic. The right fix is to allocate the
swiotlb bounce from the *encrypted* pools when running on the
hypervisor which requires undoing this abuse of force_dma_decrypted().

Jason


^ permalink raw reply

* [PATCH v4 0/3] perf: marvell: LLC-TAD PMU MPAM filtering support
From: Geetha sowjanya @ 2026-06-18 15:36 UTC (permalink / raw)
  To: linux-perf-users, linux-kernel, linux-arm-kernel, devicetree
  Cc: mark.rutland, will, krzk+dt, gakula

This series extends the Marvell LLC-TAD performance driver used on CN10K
and CN20K systems.

Patch 1 adds optional MPAM partition-id filtering for the subset of TAD
events that support it, exposes partid / partid_en in the PMU format string,
and keeps the reduced Odyssey event surface without advertising partid where
it does not apply.  It also fixes probe resource handling (no in-place
mutation of platform_get_resource() bounds, validate MMIO window vs
tad-cnt), registers CPU hotplug before perf_pmu_register with unwind, and
aligns the filter-enable bit in config1 with the sysfs format (bit 9).

Patch 2 introduces CN20K LLC-TAD support: non-standard PFC/PRF offsets,
additional programmable events with visibility checks so CN10K does not
advertise V3-only events, CN20K-specific MPAM encoding for the V3 profile,
local64_set(prev_count) on counter start, and device discovery via OF and
ACPI.
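
A rough userspace sketch of the per-variant offset selection described
for patch 2 follows. The offset values and compatible strings are taken
from the patch; struct tad_variant, tad_variant_lookup() and
tad_counter_reg() are hypothetical names used only for illustration,
and the real driver gets its match data from the OF/ACPI tables rather
than from a string compare.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct tad_variant {			/* hypothetical, for illustration */
	const char *compatible;
	uint64_t prf_offset;
	uint64_t pfc_offset;
};

static const struct tad_variant tad_variants[] = {
	{ "marvell,cn10k-tad-pmu", 0x900,   0x800   },
	{ "marvell,cn20k-tad-pmu", 0x30900, 0x30800 },
};

/* Return the variant for a compatible string, or NULL if unknown. */
static const struct tad_variant *tad_variant_lookup(const char *compatible)
{
	size_t i;

	for (i = 0; i < sizeof(tad_variants) / sizeof(tad_variants[0]); i++)
		if (strcmp(tad_variants[i].compatible, compatible) == 0)
			return &tad_variants[i];
	return NULL;
}

/* Mirrors TAD_PFC()/TAD_PRF(): counter registers are 8 bytes apart. */
static uint64_t tad_counter_reg(uint64_t base, unsigned int counter)
{
	return base | ((uint64_t)counter << 3);
}
```

Because the two variants resolve to different register addresses for
the same counter index, treating one compatible as a fallback for the
other would program the wrong registers.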

Patch 3 extends the DeviceTree binding for marvell,cn20k-tad-pmu.

Changes since v3
----------------
- Add perf_ready: tad_pmu_offline_cpu skips perf_pmu_migrate_context until after
  successful perf_pmu_register, so a CPU offline between hotplug add and perf
  register does not touch perf core state for an unregistered PMU.
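
The perf_ready gating can be sketched in userspace as below. This is
only a model under stated assumptions: struct pmu_sketch and both
functions are hypothetical, C11 relaxed atomics stand in for
READ_ONCE()/WRITE_ONCE(), and the migration itself is reduced to a
counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct pmu_sketch {			/* hypothetical stand-in for struct tad_pmu */
	atomic_bool perf_ready;		/* set only after registration succeeded */
	unsigned int cpu;		/* CPU currently owning the PMU context */
	unsigned int migrations;	/* counts simulated context migrations */
};

/* Called once registration with the perf core has succeeded. */
static void pmu_mark_registered(struct pmu_sketch *pmu)
{
	atomic_store_explicit(&pmu->perf_ready, true, memory_order_relaxed);
}

/* Offline callback: returns true only if a migration was performed. */
static bool pmu_offline_cpu(struct pmu_sketch *pmu, unsigned int cpu,
			    unsigned int target)
{
	/* Not registered yet: leave perf core state alone. */
	if (!atomic_load_explicit(&pmu->perf_ready, memory_order_relaxed))
		return false;

	/* Only the owning CPU going offline requires a migration. */
	if (cpu != pmu->cpu)
		return false;

	pmu->cpu = target;
	pmu->migrations++;
	return true;
}
```

An offline event that arrives before pmu_mark_registered() is a no-op
in this model, which is the window the v4 change is meant to cover.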

Changes since v2
----------------
- Validate the eventId using an appropriate mask to ensure it is restricted to 8 bits.

Changes since v1
----------------
- config1: use bit 9 for MPAM filter enable consistently with partid_en in
  the PMU format; allow only bits 0..9 in event_init on CN10K/CN20K paths.
- Reject reserved bits in attr.config and use the same 8-bit event index in
  start_counter as in event_init so MPAM validation cannot be bypassed.
- Register CPU hotplug before perf_pmu_register in probe (mainline order); add
  perf_ready so offline migration is skipped until after perf registration
  (reconciles v1 vs v2 ordering feedback).
- Hide V3-only sysfs events on V1.
- Reset prev_count when starting counters after clearing hardware.
- DT binding: explain non-fallback compatibles for CN10K vs CN20K.

Tanmay Jagdale (1):
  perf: marvell: Add MPAM partid filtering to CN10K TAD PMU

Geetha sowjanya (2):
  perf: marvell: Add CN20K LLC-TAD PMU support
  dt-bindings: perf: marvell: Extend CN10K TAD PMU binding for CN20K

Signed-off-by: Geetha sowjanya <gakula@marvell.com>

-- 
2.25.1

^ permalink raw reply

* [PATCH v4 3/3] dt-bindings: perf: marvell: add CN20K TAD PMU support
From: Geetha sowjanya @ 2026-06-18 15:36 UTC (permalink / raw)
  To: linux-perf-users, linux-kernel, linux-arm-kernel, devicetree
  Cc: mark.rutland, will, krzk+dt, gakula
In-Reply-To: <20260618153610.13649-1-gakula@marvell.com>

Marvell CN20K SoCs integrate a Performance Monitoring Unit (PMU)
associated with the LLC Tag-and-Data (TAD) blocks. The PMU provides
hardware counters to monitor cache traffic and performance events
via a dedicated MMIO region.

The CN20K LLC-TAD PMU is largely similar to CN10K, but differs in the
layout of PFC/PRF register offsets relative to each TAD base. These
offsets are derived from the compatible string in the driver and are
not described through Devicetree properties.

Because of this, using "marvell,cn10k-tad-pmu" as a fallback for CN20K
would result in incorrect register programming. Therefore, add a
separate compatible string:

  "marvell,cn20k-tad-pmu"

Update the binding to document CN20K alongside CN10K.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
---
 .../bindings/perf/marvell-cn10k-tad.yaml      | 25 +++++++++++++------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/Documentation/devicetree/bindings/perf/marvell-cn10k-tad.yaml b/Documentation/devicetree/bindings/perf/marvell-cn10k-tad.yaml
index 362142252667..d11121a1e2c9 100644
--- a/Documentation/devicetree/bindings/perf/marvell-cn10k-tad.yaml
+++ b/Documentation/devicetree/bindings/perf/marvell-cn10k-tad.yaml
@@ -4,23 +4,32 @@
 $id: http://devicetree.org/schemas/perf/marvell-cn10k-tad.yaml#
 $schema: http://devicetree.org/meta-schemas/core.yaml#
 
-title: Marvell CN10K LLC-TAD performance monitor
+title: Marvell CN10K / CN20K LLC-TAD performance monitor
 
 maintainers:
   - Bhaskara Budiredla <bbudiredla@marvell.com>
+  - Geetha sowjanya <gakula@marvell.com>
 
 description: |
-  The Tag-and-Data units (TADs) maintain coherence and contain CN10K
-  shared on-chip last level cache (LLC). The tad pmu measures the
-  performance of last-level cache. Each tad pmu supports up to eight
-  counters.
+  The Tag-and-Data units (TADs) maintain coherence and contain the
+  shared on-chip last level cache (LLC) on Marvell CN10K and CN20K SoCs.
+  The TAD PMU measures last-level cache performance. Each TAD PMU
+  supports up to eight counters.
 
-  The DT setup comprises of number of tad blocks, the sizes of pmu
-  regions, tad blocks and overall base address of the HW.
+  The DT setup describes the number of TAD blocks, the sizes of PMU
+  regions and TAD pages, and the overall MMIO base of the hardware.
+
+  marvell,cn20k-tad-pmu is not a compatible fallback for
+  marvell,cn10k-tad-pmu (and vice versa): the driver selects different
+  PFC/PRF MMIO offsets from the compatible string, and those offsets are
+  not described by separate DT properties today.
 
 properties:
   compatible:
-    const: marvell,cn10k-tad-pmu
+    items:
+      - enum:
+          - marvell,cn10k-tad-pmu
+          - marvell,cn20k-tad-pmu
 
   reg:
     maxItems: 1
-- 
2.25.1



^ permalink raw reply related

* Re: [PATCH v2] arm64: tlbflush: Reset active_cpu on ASID rollover
From: Linu Cherian @ 2026-06-18 15:21 UTC (permalink / raw)
  To: Sayali Kulkarni
  Cc: catalin.marinas, linux-arm-kernel, linux-kernel, will,
	ryan.roberts, yang, cl, sskulkarni
In-Reply-To: <20260612232254.2856649-1-sk@gentwo.org>

Hi,

On Fri, Jun 12, 2026 at 04:21:06PM -0700, Sayali Kulkarni wrote:
> From: Sayali Kulkarni <sskulkarni@amperecomputing.com>
> 
> Hi Catalin,  
> 
> Thank you for the review. I’ve addressed your feedback in v2:  
> 
> - Moved `WRITE_ONCE(mm->context.active_cpu, ACTIVE_CPU_NONE)` from `check_and_switch_context()` to `new_context()` after the `set_asid` label. At this point, a brand new ASID has been allocated that no CPU has ever used, so the reset is safe even for multi-threaded processes where other CPUs may still be running with the old ASID via `reserved_asids`.  
> - Updated the commit message to correct the safety reasoning: `flush_context()` only sets `tlb_flush_pending`; it does not issue a global TLB flush.  
> 
> Thanks,  
> Sayali
> 
> 
> Once active_cpu flips to ACTIVE_CPU_MULTIPLE it never resets, even if
> the process settles back to one CPU. Reset it to ACTIVE_CPU_NONE in
> new_context() after a new ASID is allocated at the set_asid label.
> 
> At this point a brand new ASID has been assigned that no CPU has ever
> used, so ACTIVE_CPU_NONE accurately reflects reality. Any other threads
> of the same process continue running with the old ASID via
> reserved_asids and are unaffected.
> 
> This gives processes a fresh chance at the local-only flush fast path
> after each ASID generation rollover.
> 
> Signed-off-by: Sayali Kulkarni <sskulkarni@amperecomputing.com> (Ampere)
> ---
>  arch/arm64/mm/context.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
> index f34ed78393e0..46c7fd07b9bf 100644
> --- a/arch/arm64/mm/context.c
> +++ b/arch/arm64/mm/context.c
> @@ -209,6 +209,7 @@ static u64 new_context(struct mm_struct *mm)
>  set_asid:
>  	__set_bit(asid, asid_map);
>  	cur_idx = asid;
> +	WRITE_ONCE(mm->context.active_cpu, ACTIVE_CPU_NONE);

Can the above store race with a store to active_cpu from another thread
that updates it to ACTIVE_CPU_MULTIPLE?

Let's say we have two threads, both initially running on CPU 0.

Thread 1: Runs on CPU 0

Encounters a rollover, updates mm->context.active_cpu to ACTIVE_CPU_NONE and
updates mm->context.id to the new ASID.

Thread 2: Scheduled to run on CPU 1 for the first time

Observes the updated mm->context.id, which belongs to the current
generation (after the rollover), and hence proceeds to switch_mm_fastpath
and ends up updating active_cpu to ACTIVE_CPU_MULTIPLE.

If Thread 1 and Thread 2 race, can active_cpu get corrupted?

This seems possible because the writes to active_cpu and
mm->context.id can be reordered, so do we need to enforce ordering for
correctness?

Do you see this as a valid scenario ?
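
To make the ordering question concrete, here is a small userspace model
using C11 atomics. It is a sketch under assumptions, not a proposed
fix: struct ctx_sketch, both functions and the placeholder constant
values are hypothetical, the fast-path update rule is a guess at the
semantics under discussion, and release/acquire here only stands in
for whatever ordering primitive the kernel code would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define ACTIVE_CPU_NONE		(-1)	/* placeholder values for the sketch */
#define ACTIVE_CPU_MULTIPLE	(-2)

struct ctx_sketch {			/* hypothetical model of mm->context */
	_Atomic uint64_t id;
	_Atomic int active_cpu;
};

/* Rollover side: reset active_cpu first, then publish the new ASID. */
static void publish_new_asid(struct ctx_sketch *ctx, uint64_t new_id)
{
	atomic_store_explicit(&ctx->active_cpu, ACTIVE_CPU_NONE,
			      memory_order_relaxed);
	/* Release: the reset above is visible to whoever observes new_id. */
	atomic_store_explicit(&ctx->id, new_id, memory_order_release);
}

/* Fast-path side: read the ASID, then record which CPU is running. */
static uint64_t fastpath_note_cpu(struct ctx_sketch *ctx, int cpu)
{
	/* Acquire: pairs with the release store in publish_new_asid(). */
	uint64_t id = atomic_load_explicit(&ctx->id, memory_order_acquire);
	int prev = atomic_load_explicit(&ctx->active_cpu,
					memory_order_relaxed);

	if (prev == ACTIVE_CPU_NONE)
		atomic_store_explicit(&ctx->active_cpu, cpu,
				      memory_order_relaxed);
	else if (prev != cpu)
		atomic_store_explicit(&ctx->active_cpu, ACTIVE_CPU_MULTIPLE,
				      memory_order_relaxed);
	return id;
}
```

With this pairing, a thread that sees the new id cannot have its
active_cpu update overwritten by the earlier reset. Two concurrent
fast paths can still race on the read-modify-write, which this model
does not address.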

--
Thanks,
Linu Cherian.


^ permalink raw reply

* Re: [PATCH RFC 1/3] cpu/hotplug: Introduce CONFIG_PARALLEL_SMT_PRIMARY_FIRST
From: Thomas Gleixner @ 2026-06-18 15:17 UTC (permalink / raw)
  To: Jinjie Ruan, catalin.marinas, will, tsbogend, pjw, palmer, aou,
	alex, mingo, bp, dave.hansen, hpa, peterz, kees, nathan, linusw,
	ojeda, ruanjinjie, david.kaplan, lukas.bulwahn, ryan.roberts, maz,
	timothy.hayes, lpieralisi, thuth, oupton, yeoreum.yun,
	miko.lenczewski, broonie, kevin.brodsky, james.clark, tabba,
	mrigendra.chaubey, arnd, anshuman.khandual, x86, linux-kernel,
	linux-arm-kernel, linux-mips, linux-riscv
In-Reply-To: <20260611133809.3854977-2-ruanjinjie@huawei.com>

On Thu, Jun 11 2026 at 21:38, Jinjie Ruan wrote:

> During parallel CPU bringup, x86 requires primary SMT threads to boot
> first to avoid siblings stopping during microcode updates. This constraint
> is architecture-specific and unnecessary for other platforms
> like arm64.
>
> Introduce CONFIG_PARALLEL_SMT_PRIMARY_FIRST to decouple this constraint.
> Platforms requiring this temporal order (e.g., x86) can select it
> in Kconfig. Other architectures (e.g., arm64) can leave it unselected
> to entirely bypass the SMT branch via the preprocessor.
>
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  arch/Kconfig       | 4 ++++
>  arch/mips/Kconfig  | 1 +
>  arch/riscv/Kconfig | 1 +
>  arch/x86/Kconfig   | 1 +
>  kernel/cpu.c       | 6 +++++-
>  5 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index e86880045158..0365d2df2659 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -102,6 +102,10 @@ config HOTPLUG_PARALLEL
>  	bool
>  	select HOTPLUG_SPLIT_STARTUP
>  
> +config PARALLEL_SMT_PRIMARY_FIRST

Proper namespaces are overrated, right?

All related options start with HOTPLUG_....

> +	bool
> +	depends on HOTPLUG_PARALLEL
> +
>  config GENERIC_IRQ_ENTRY
>  	bool
>  
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index 4364f3dba688..84e11ac0cf71 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -642,6 +642,7 @@ config EYEQ
>  	select MIPS_CPU_SCACHE
>  	select MIPS_GIC
>  	select MIPS_L1_CACHE_SHIFT_7
> +	select PARALLEL_SMT_PRIMARY_FIRST if HOTPLUG_PARALLEL
>  	select PCI_DRIVERS_GENERIC
>  	select SMP_UP if SMP
>  	select SWAP_IO_SPACE
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index d235396c4514..0cc49aecc841 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -210,6 +210,7 @@ config RISCV
>  	select OF
>  	select OF_EARLY_FLATTREE
>  	select OF_IRQ
> +	select PARALLEL_SMT_PRIMARY_FIRST if HOTPLUG_PARALLEL

Why does RISCV need this? It does not select HOTPLUG_SMT to begin with.

> +#ifdef CONFIG_PARALLEL_SMT_PRIMARY_FIRST
>  #ifdef CONFIG_HOTPLUG_SMT

More #ifdeffery is better, right?

>  static inline bool cpuhp_smt_aware(void)
>  {
> @@ -1811,7 +1812,8 @@ static inline const struct cpumask *cpuhp_get_primary_thread_mask(void)
>  {
>  	return cpu_none_mask;
>  }
> -#endif
> +#endif /* CONFIG_HOTPLUG_SMT */
> +#endif /* CONFIG_PARALLEL_SMT_PRIMARY_FIRST */
>  
>  bool __weak arch_cpuhp_init_parallel_bringup(void)
>  {
> @@ -1837,6 +1839,7 @@ static bool __init cpuhp_bringup_cpus_parallel(unsigned int ncpus)
>  	if (!__cpuhp_parallel_bringup)
>  		return false;
>  
> +#ifdef CONFIG_PARALLEL_SMT_PRIMARY_FIRST

Seriously?

>  	if (cpuhp_smt_aware()) {
>  		const struct cpumask *pmask = cpuhp_get_primary_thread_mask();
>  		static struct cpumask tmp_mask __initdata;
> @@ -1857,6 +1860,7 @@ static bool __init cpuhp_bringup_cpus_parallel(unsigned int ncpus)
>  		cpumask_andnot(&tmp_mask, mask, pmask);
>  		mask = &tmp_mask;
>  	}
> +#endif /* CONFIG_PARALLEL_SMT_PRIMARY_FIRST */

Something simple like the uncompiled below should just work, no?

---
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -102,6 +102,10 @@ config HOTPLUG_PARALLEL
 	bool
 	select HOTPLUG_SPLIT_STARTUP
 
+config HOTPLUG_PARALLEL_SMT
+	bool
+	select HOTPLUG_PARALLEL
+
 config GENERIC_IRQ_ENTRY
 	bool
 
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -657,7 +657,7 @@ config EYEQ
 	select USB_UHCI_BIG_ENDIAN_DESC if CPU_BIG_ENDIAN
 	select USB_UHCI_BIG_ENDIAN_MMIO if CPU_BIG_ENDIAN
 	select USE_OF
-	select HOTPLUG_PARALLEL if HOTPLUG_CPU
+	select HOTPLUG_PARALLEL_SMT if HOTPLUG_CPU
 	help
 	  Select this to build a kernel supporting EyeQ SoC from Mobileye.
 
@@ -2295,7 +2295,6 @@ config MIPS_CPS
 	select MIPS_CM
 	select MIPS_CPS_PM if HOTPLUG_CPU
 	select SMP
-	select HOTPLUG_SMT if HOTPLUG_PARALLEL
 	select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
 	select SYNC_R4K if (CEVT_R4K || CSRC_R4K)
 	select SYS_SUPPORTS_HOTPLUG_CPU
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -304,7 +304,7 @@ config X86
 	select HAVE_USER_RETURN_NOTIFIER
 	select HAVE_GENERIC_VDSO
 	select VDSO_GETRANDOM			if X86_64
-	select HOTPLUG_PARALLEL			if SMP && X86_64
+	select HOTPLUG_PARALLEL_SMT		if SMP && X86_64
 	select HOTPLUG_SMT			if SMP
 	select HOTPLUG_SPLIT_STARTUP		if SMP && X86_32
 	select IRQ_FORCED_THREADING
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1792,7 +1792,7 @@ static int __init parallel_bringup_parse
 }
 early_param("cpuhp.parallel", parallel_bringup_parse_param);
 
-#ifdef CONFIG_HOTPLUG_SMT
+#ifdef CONFIG_HOTPLUG_PARALLEL_SMT
 static inline bool cpuhp_smt_aware(void)
 {
 	return cpu_smt_max_threads > 1;


^ permalink raw reply

* Re: [PATCHv2] arm64/entry: Fix arm64-specific rseq brokenness
From: Mark Rutland @ 2026-06-18 15:16 UTC (permalink / raw)
  To: Mathias Stearn
  Cc: Will Deacon, Jinjie Ruan, linux-arm-kernel, Catalin Marinas,
	Peter Zijlstra, Thomas Gleixner, ckennelly, dvyukov, linux-kernel,
	mathieu.desnoyers
In-Reply-To: <ajPv1tew-QAFE1ib@J2N7QTR9R3.cambridge.arm.com>

On Thu, Jun 18, 2026 at 02:17:10PM +0100, Mark Rutland wrote:
> On Thu, Jun 18, 2026 at 09:55:20AM +0200, Mathias Stearn wrote:
> > On Wed, Jun 10, 2026 at 2:37 PM Will Deacon <will@kernel.org> wrote:
> > >
> > > On Tue, Jun 09, 2026 at 02:04:23PM +0200, Mathias Stearn wrote:
> > > > Is it possible to get 411c1cf43039 backported to 7.0 or was it omitted
> > > > intentionally?
> > >
> > > You can send a backport to the stable maintainers:
> > >
> > > https://docs.kernel.org/process/stable-kernel-rules.html#procedure-for-submitting-patches-to-the-stable-tree
> > 
> > Who was that "you" directed at? I'm not used to the kernel development
> > process. Is that my responsibility as the bug reporter / interested
> > party, or something that Mark Rutland, author of the patch, should do?
> 
> I'll sort it out.

I've sent a backport for v7.0; when lore updates it should be visible at:

  https://lore.kernel.org/stable/20260618151426.308099-1-mark.rutland@arm.com/

Mark.


^ permalink raw reply

* Re: Question: pinctrl-backed GPIO set_config and gpio_chip::can_sleep
From: Runyu Xiao @ 2026-06-18 15:10 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Linus Walleij, Bartosz Golaszewski, Ludovic Desroches,
	Nicolas Ferre, Alexandre Belloni, Claudiu Beznea, Antonio Borneo,
	Maxime Coquelin, Alexandre Torgue, Chen-Yu Tsai, Jernej Skrabec,
	Samuel Holland, linux-arm-kernel, linux-gpio, linux-stm32,
	linux-sunxi, linux-kernel, jianhao.xu, runyu.xiao
In-Reply-To: <CAD++jLmW3vgTFryRAL24x2TbgbR1tbhjw-nFFH3askoZfSibaQ@mail.gmail.com>

Hi,

Thanks for checking this.

I agree that marking these memory-mapped controllers as can_sleep is too
broad if the only sleepable part is the pinctrl range lookup.  That would
make consumers treat otherwise MMIO-backed get/set paths as sleepable,
which is not the contract I want to change.

I will hold back the at91-pio4/stm32/sunxi can_sleep series and look at
the pinctrl core direction instead, specifically whether
pinctrldev_list_mutex can be replaced by a non-sleeping lock for
pinctrl_get_device_gpio_range().  That should also line up with the GPIO
direction callback case discussed in the other thread.

Thanks,
Runyu

^ permalink raw reply

* Re: [RFC PATCH 3/6] arm64: mm: fix restoring linear map permissions on execmem cache clean
From: Ryan Roberts @ 2026-06-18 15:05 UTC (permalink / raw)
  To: Adrian Barnaś, linux-arm-kernel
  Cc: linux-mm, Catalin Marinas, Will Deacon, David Hildenbrand,
	Mike Rapoport (Microsoft), Ard Biesheuvel, Christoph Lameter,
	Yang Shi, Brendan Jackman
In-Reply-To: <20260611130144.1385343-4-abarnas@google.com>

On 11/06/2026 14:01, Adrian Barnaś wrote:
> Strip the read-only attribute from the selected memory range when
> restoring the linear map after an execmem cache clean.
> 
> An execmem cache clean is performed when a cache block becomes empty
> after unloading a module. When making the memory valid again, the linear
> memory alias must also have its read-only attribute cleared.
> 
> Without this change, the linear memory alias remains read-only even
> after the execmem cache block itself is freed, which prevents subsequent
> allocations from writing to that memory.
> 
> Signed-off-by: Adrian Barnaś <abarnas@google.com>
> ---
>  arch/arm64/mm/pageattr.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> index 88720bbba892..eaefdf90b0d5 100644
> --- a/arch/arm64/mm/pageattr.c
> +++ b/arch/arm64/mm/pageattr.c
> @@ -239,6 +239,13 @@ int set_memory_x(unsigned long addr, int numpages)
>  					__pgprot(PTE_PXN));
>  }
>  
> +static int set_memory_default(unsigned long addr, int numpages)
> +{
> +	return __change_memory_common(addr, PAGE_SIZE * numpages,
> +				      __pgprot(PTE_VALID),
> +				      __pgprot(PTE_RDONLY));

This is not sufficient to convert an invalid entry to valid. As well as setting
the PTE_VALID bit, you would also need to clear the PTE_PRESENT_INVALID and set
PTE_MAYBE_NG.

e.g:

int set_memory_valid(unsigned long addr, int numpages, int enable)
{
	if (enable)
		return __change_memory_common(addr, PAGE_SIZE * numpages,
					__pgprot(PTE_PRESENT_VALID_KERNEL),
					__pgprot(PTE_PRESENT_INVALID));
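
The set/clear mask semantics behind this comment can be modelled in
userspace as follows. This is a sketch under assumptions: the SK_* bit
positions are placeholders chosen for the example and are not the
arm64 PTE layout, and apply_masks() is a hypothetical stand-in for the
clear-then-set update that __change_memory_common() performs on each
entry.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bits for the sketch only, NOT the real arm64 layout. */
#define SK_PTE_VALID		(UINT64_C(1) << 0)
#define SK_PTE_RDONLY		(UINT64_C(1) << 1)
#define SK_PTE_PRESENT_INVALID	(UINT64_C(1) << 2)
#define SK_PTE_MAYBE_NG		(UINT64_C(1) << 3)

/* Clear the bits in clear_mask, then set the bits in set_mask. */
static uint64_t apply_masks(uint64_t pte, uint64_t set_mask,
			    uint64_t clear_mask)
{
	return (pte & ~clear_mask) | set_mask;
}

/* Masks as used by the proposed set_memory_default(). */
static uint64_t make_default_as_posted(uint64_t pte)
{
	return apply_masks(pte, SK_PTE_VALID, SK_PTE_RDONLY);
}

/* Masks as used by the existing set_memory_valid(..., enable=1) path. */
static uint64_t make_valid_existing(uint64_t pte)
{
	return apply_masks(pte, SK_PTE_VALID | SK_PTE_MAYBE_NG,
			   SK_PTE_PRESENT_INVALID);
}
```

Starting from an entry that is read-only and marked present-invalid,
the posted masks leave the present-invalid marker in place, while the
existing valid path clears it but leaves the read-only bit set.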


> +}
> +
>  int set_memory_valid(unsigned long addr, int numpages, int enable)
>  {
>  	if (enable)
> @@ -362,7 +369,15 @@ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
>  	if (!can_set_direct_map())
>  		return 0;
>  
> -	return set_memory_valid(addr, nr, valid);
> +	/*
> +	 * Execmem cache uses this function to reset permissions on linear mapping
> +	 * when freeing unused cache block. On x86 it makes memory RW which is
> +	 * desirable. On ARM64 set_memory_valid() just change valid bit which
> +	 * leave direct mapping read-only so use set_memory_default instead.
> +	 */
> +
> +	return valid ? set_memory_default(addr, nr) :
> +		       set_memory_valid(addr, nr, false);

Surely execmem should just be using set_direct_map_default_noflush() if that's
the behaviour it wants?

I think that the current implementation of set_direct_map_default_noflush()
doesn't undo the effects of set_memory_nx() / set_memory_x(). That might be
worth checking?

Thanks,
Ryan


>  }
>  
>  #ifdef CONFIG_DEBUG_PAGEALLOC



^ permalink raw reply

* Re: Question: SPEAr PLGPIO irq_enable on PREEMPT_RT and regmap updates
From: Runyu Xiao @ 2026-06-18 14:49 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Mark Brown
  Cc: Herve Codina, Viresh Kumar, Viresh Kumar, Linus Walleij,
	Clark Williams, Steven Rostedt, linux-arm-kernel, soc, linux-gpio,
	linux-rt-devel, linux-kernel, jianhao.xu, runyu.xiao
In-Reply-To: <20260618081554.zifCwv4I@linutronix.de>

Hi,

Thanks everyone for the feedback.

I will not send the irq_bus_sync_unlock/shadow-state patch for now. From
Sebastian's comments, it sounds like the more important question is
whether this should be handled at the regmap locking/cache level, or by
using a raw lock only where the regmap path is known to be safe.

Herve, your point about other GPIO controllers is fair. I should not
treat PLGPIO as special without checking the wider pattern. I will look
at other irq_enable/irq_disable users that combine irqchip callbacks,
driver spinlocks and regmap_update_bits(), and compare whether they are
using MMIO/flat cache/raw regmap locking or a sleepable regmap path.

For the current PLGPIO draft, I will hold it back until I can answer
whether the right fix belongs in this driver or in the common regmap/GPIO
pattern.

Thanks,
Runyu

^ permalink raw reply

* [RFC v3 net-next] net: airoha: add HW GRO offload support
From: Lorenzo Bianconi @ 2026-06-18 14:42 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni
  Cc: lorenzo, aleksander.lobakin, linux-arm-kernel, linux-mediatek,
	netdev

Add hardware GRO offload support to the airoha_eth driver, leveraging
the EN7581/AN7583 SoC's 8 dedicated LRO hardware queues mapped to RX
queues 24-31. HW GRO offloading does not support Scatter-Gather (SG) so
it is required to increase the page_pool allocation order to 2 for RX
queues 24-31 (LRO queues).
Since HW GRO is configured per-QDMA and shared across all devices using
it, HW GRO is mutually exclusive with multiple active devices on the
same QDMA block. Call netdev_update_features() on sibling devices in
ndo_open/ndo_stop so that NETIF_F_GRO_HW availability is re-evaluated
when the QDMA user count changes.
Set CHECKSUM_PARTIAL with pseudo-header checksum on aggregated packets
so that L3-forwarded traffic is correctly handled by the GSO/TSO path
on the egress device.
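
For readers unfamiliar with the CHECKSUM_PARTIAL convention mentioned
above, the following userspace sketch computes an IPv4 TCP
pseudo-header sum. It is illustrative only and not taken from the
driver: the function names are hypothetical, addresses are passed in
host byte order as 32-bit values, and IPv6 is not covered.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit ones' complement accumulator down to 16 bits. */
static uint16_t csum_fold32(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Ones' complement sum over the IPv4 TCP pseudo-header: source address,
 * destination address, protocol (6 for TCP) and TCP length (header plus
 * payload). The result is not inverted; it is the seed over which the
 * TCP header and payload are later summed.
 */
static uint16_t tcp4_pseudo_csum(uint32_t saddr, uint32_t daddr,
				 uint16_t tcp_len)
{
	uint32_t sum = 0;

	sum += saddr >> 16;
	sum += saddr & 0xffff;
	sum += daddr >> 16;
	sum += daddr & 0xffff;
	sum += 6;		/* IPPROTO_TCP */
	sum += tcp_len;

	return csum_fold32(sum);
}
```

For example, 192.168.0.1 to 192.168.0.2 with a 20-byte TCP segment
gives 0xc0a8 + 0x0001 + 0xc0a8 + 0x0002 + 0x0006 + 0x0014 = 0x1816d,
which folds to 0x816e.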

Performance comparison between GRO and HW GRO has been carried out using
a 10Gbps NIC:
GRO: ~2.7 Gbps
HW GRO: ~8.1 Gbps

Tested-by: Madhur Agrawal <madhur.agrawal@airoha.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
Changes in RFC v3:
- Add missing TCP header length check.
- Fix TCP checksum calculation.
- Disable LRO when running the ndo_stop callback.
- Implement packet header split in order to support HW-GRO.
- Link to v2: https://lore.kernel.org/r/20260610-airoha-eth-lro-v2-1-54be99b9a2d5@kernel.org

Changes in v2:
- Rebase on top of net-next main branch.
- Link to v1: https://lore.kernel.org/r/20260606-airoha-eth-lro-v1-1-0ebceb0eafc3@kernel.org

Changes in v1:
- Please note this patch depends on the following patch not applied yet
  to net-next
  https://lore.kernel.org/netdev/20260606-airoha_qdma_users-no-atomic-v1-1-86e2d6a1bfaf@kernel.org/T/#u
- Restrict LRO to single user QDMA.
- Introduce some more sanity checks.
- Disable scatter-gather for LRO queues.
- Run netif_receive_skb() for LRO packets.
- Link to v3: https://lore.kernel.org/r/20260528-airoha-eth-lro-v3-1-dd09c1fb000e@kernel.org

Changes in RFC v3:
- Fix double-free of the page_pool if airoha_qdma_lro_rx_process()
  fails.
- Set AIROHA_LRO_PAGE_ORDER according to PAGE_SIZE.
- Add missing gso metadata for the LRO packet.
- Link to v2: https://lore.kernel.org/r/20260526-airoha-eth-lro-v2-1-24e2a9e7a397@kernel.org

Changes in RFC v2:
- Improve performance by fixing buf_size computation.
- Fix possible overflow in REG_CDM_LRO_LIMIT() register configuration.
- Require the device not to be running before configuring LRO.
- Fix configuration order in airoha_fe_lro_is_enabled().
- Check skb header length in airoha_qdma_lro_rx_process().
- Do not check net_device feature in airoha_qdma_rx_process() before
  executing airoha_qdma_lro_rx_process() but rely on
  airoha_qdma_lro_rx_process() logic.
- Fix possible double recycle in airoha_qdma_rx_process() for LRO
  packets.
- Always use AIROHA_RXQ_LRO_MAX_AGG_COUNT macro for max LRO aggregated
  fragments in airoha_fe_lro_init_rx_queue().
- Link to v1: https://lore.kernel.org/r/20260520-airoha-eth-lro-v1-1-129cc33766e9@kernel.org
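
As a reading aid for the page_pool sizing in this patch, here is a minimal
userspace sketch of how the allocation order, max_len and buf_size follow
from PAGE_SIZE. The helper names (order_base_2_sketch, rxq_buf_cfg_for) are
hypothetical and only mirror the arithmetic in airoha_qdma_init_rx_queue();
this is not driver code.

```c
#include <assert.h>

#define SZ_16K 16384ul

/* Userspace stand-in for the kernel's order_base_2(): smallest order with
 * (1 << order) >= n; both 0 and 1 map to order 0.
 */
static unsigned int order_base_2_sketch(unsigned long n)
{
	unsigned int order = 0;

	while ((1ul << order) < n)
		order++;
	return order;
}

struct rxq_buf_cfg {
	unsigned int order;     /* page_pool allocation order */
	unsigned long max_len;  /* PAGE_SIZE << order */
	unsigned long buf_size; /* per-descriptor buffer size */
};

/* Mirrors the patch: LRO queues get a 16K-capable pool and use the whole
 * allocation per buffer; other queues keep order 0 and half a page.
 */
static struct rxq_buf_cfg rxq_buf_cfg_for(unsigned long page_size, int lro_q)
{
	struct rxq_buf_cfg cfg;

	cfg.order = lro_q ? order_base_2_sketch(SZ_16K / page_size) : 0;
	cfg.max_len = page_size << cfg.order;
	cfg.buf_size = lro_q ? cfg.max_len : cfg.max_len / 2;
	return cfg;
}
```

With 4K pages an LRO queue ends up with order 2 and 16K buffers, while with
16K or 64K pages the order stays 0 and a single page is used per buffer.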
---
 drivers/net/ethernet/airoha/airoha_eth.c  | 364 ++++++++++++++++++++--
 drivers/net/ethernet/airoha/airoha_eth.h  |  24 ++
 drivers/net/ethernet/airoha/airoha_regs.h |  22 +-
 3 files changed, 386 insertions(+), 24 deletions(-)
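
To make the REG_CDM_LRO_RXQ()/LRO_RXQ_MASK() layout easier to check, the
sketch below recomputes the register offset and bit field for each LRO queue
index in plain C. GENMASK32, LRO_RXQ_REG_OFFSET and lro_rxq_field are
hypothetical userspace names; the offset is relative to CDM_BASE(), which is
left out on purpose.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK() on 32-bit values */
#define GENMASK32(h, l) \
	((uint32_t)((~0u << (l)) & (~0u >> (31 - (h)))))

/* Same arithmetic as the macros added to airoha_regs.h, minus CDM_BASE() */
#define LRO_RXQ_REG_OFFSET(m)	(0x78 + ((m) & 0x4))
#define LRO_RXQ_MASK(n)		GENMASK32(4 + (((n) & 0x3) << 3), ((n) & 0x3) << 3)

/* Value that ends up in the register field for a given LRO slot and RX qid */
static uint32_t lro_rxq_field(unsigned int lro_idx, unsigned int qid)
{
	unsigned int shift = (lro_idx & 0x3) << 3;

	return ((uint32_t)qid << shift) & LRO_RXQ_MASK(lro_idx);
}
```

Under these assumptions LRO slots 0-3 share the register at offset 0x78 and
slots 4-7 the one at 0x7c, each slot owning a 5-bit field in its own byte.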

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 64dde6464f3f..2aa6915d424e 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -10,8 +10,10 @@
 #include <linux/tcp.h>
 #include <linux/u64_stats_sync.h>
 #include <net/dst_metadata.h>
+#include <net/ip6_checksum.h>
 #include <net/page_pool/helpers.h>
 #include <net/pkt_cls.h>
+#include <net/tcp.h>
 #include <uapi/linux/ppp_defs.h>
 
 #include "airoha_regs.h"
@@ -486,6 +488,73 @@ static void airoha_fe_crsn_qsel_init(struct airoha_eth *eth)
 				 CDM_CRSN_QSEL_Q1));
 }
 
+static void airoha_fe_lro_rxq_enable(struct airoha_eth *eth, int qdma_id,
+				     int lro_queue_index, int qid,
+				     int buf_size)
+{
+	int id = qdma_id + 1;
+
+	airoha_fe_rmw(eth, REG_CDM_LRO_LIMIT(id),
+		      CDM_LRO_AGG_NUM_MASK | CDM_LRO_AGG_SIZE_MASK,
+		      FIELD_PREP(CDM_LRO_AGG_SIZE_MASK, buf_size) |
+		      FIELD_PREP(CDM_LRO_AGG_NUM_MASK,
+				 AIROHA_RXQ_LRO_MAX_AGG_COUNT));
+	airoha_fe_rmw(eth, REG_CDM_LRO_AGE_TIME(id),
+		      CDM_LRO_AGE_TIME_MASK | CDM_LRO_AGG_TIME_MASK,
+		      FIELD_PREP(CDM_LRO_AGE_TIME_MASK,
+				 AIROHA_RXQ_LRO_MAX_AGE_TIME) |
+		      FIELD_PREP(CDM_LRO_AGG_TIME_MASK,
+				 AIROHA_RXQ_LRO_MAX_AGG_TIME));
+	airoha_fe_rmw(eth, REG_CDM_LRO_RXQ(id, lro_queue_index),
+		      LRO_RXQ_MASK(lro_queue_index),
+		      __field_prep(LRO_RXQ_MASK(lro_queue_index), qid));
+	airoha_fe_set(eth, REG_CDM_LRO_EN(id), BIT(lro_queue_index));
+}
+
+static void airoha_fe_lro_disable(struct airoha_eth *eth, int qdma_id)
+{
+	int i, id = qdma_id + 1;
+
+	airoha_fe_clear(eth, REG_CDM_LRO_EN(id), LRO_RXQ_EN_MASK);
+	airoha_fe_clear(eth, REG_CDM_LRO_LIMIT(id),
+			CDM_LRO_AGG_NUM_MASK | CDM_LRO_AGG_SIZE_MASK);
+	airoha_fe_clear(eth, REG_CDM_LRO_AGE_TIME(id),
+			CDM_LRO_AGE_TIME_MASK | CDM_LRO_AGG_TIME_MASK);
+	for (i = 0; i < AIROHA_MAX_NUM_LRO_QUEUES; i++)
+		airoha_fe_clear(eth, REG_CDM_LRO_RXQ(id, i), LRO_RXQ_MASK(i));
+}
+
+static bool airoha_fe_lro_is_enabled(struct airoha_eth *eth, int qdma_id)
+{
+	return airoha_fe_get(eth, REG_CDM_LRO_EN(qdma_id + 1),
+			     LRO_RXQ_EN_MASK);
+}
+
+static void airoha_dev_lro_enable(struct airoha_gdm_dev *dev)
+{
+	struct airoha_qdma *qdma = dev->qdma;
+	struct airoha_eth *eth = qdma->eth;
+	int qdma_id = qdma - &eth->qdma[0];
+	int i, lro_queue_index = 0;
+
+	for (i = 0; i < ARRAY_SIZE(qdma->q_rx); i++) {
+		struct airoha_queue *q = &qdma->q_rx[i];
+		u32 size;
+
+		if (!q->ndesc)
+			continue;
+
+		if (!airoha_qdma_is_lro_queue(q))
+			continue;
+
+		size = SKB_WITH_OVERHEAD(AIROHA_RX_LEN(q->buf_size));
+		size = min_t(u32, size, CDM_LRO_AGG_SIZE_MASK);
+		airoha_fe_lro_rxq_enable(eth, qdma_id, lro_queue_index, i,
+					 size);
+		lro_queue_index++;
+	}
+}
+
 static int airoha_fe_init(struct airoha_eth *eth)
 {
 	airoha_fe_maccr_init(eth);
@@ -611,6 +680,7 @@ static int airoha_qdma_fill_rx_queue(struct airoha_queue *q)
 		e->dma_addr = page_pool_get_dma_addr(page) + offset;
 		e->dma_len = SKB_WITH_OVERHEAD(AIROHA_RX_LEN(q->buf_size));
 
+		WRITE_ONCE(desc->tcp_ts_reply, 0);
 		val = FIELD_PREP(QDMA_DESC_LEN_MASK, e->dma_len);
 		WRITE_ONCE(desc->ctrl, cpu_to_le32(val));
 		WRITE_ONCE(desc->addr, cpu_to_le32(e->dma_addr));
@@ -652,12 +722,173 @@ airoha_qdma_get_gdm_dev(struct airoha_eth *eth, struct airoha_qdma_desc *desc)
 	return port->devs[d] ? port->devs[d] : ERR_PTR(-ENODEV);
 }
 
+static struct sk_buff *airoha_qdma_lro_rx_skb(struct airoha_queue *q,
+					      struct airoha_qdma_desc *desc,
+					      struct airoha_queue_entry *e)
+{
+	u32 len, th_off, tcp_ack_seq, agg_count, data_off, data_len;
+	u32 desc_ctrl = le32_to_cpu(READ_ONCE(desc->ctrl));
+	u32 msg1 = le32_to_cpu(READ_ONCE(desc->msg1));
+	u32 msg2 = le32_to_cpu(READ_ONCE(desc->msg2));
+	u32 msg3 = le32_to_cpu(READ_ONCE(desc->msg3));
+	struct skb_shared_info *shinfo;
+	u16 tcp_win, l2_len;
+	struct sk_buff *skb;
+	struct tcphdr *th;
+	struct page *page;
+	bool ipv4, ipv6;
+
+	ipv4 = FIELD_GET(QDMA_ETH_RXMSG_IP4_MASK, msg1);
+	ipv6 = FIELD_GET(QDMA_ETH_RXMSG_IP6_MASK, msg1);
+	if (!ipv4 && !ipv6)
+		return NULL;
+
+	l2_len = FIELD_GET(QDMA_ETH_RXMSG_L2_LEN_MASK, msg2);
+	len = FIELD_GET(QDMA_DESC_LEN_MASK, desc_ctrl);
+
+	if (ipv4) {
+		struct iphdr *iph;
+
+		if (len < l2_len + sizeof(*iph))
+			return NULL;
+
+		iph = (struct iphdr *)(e->buf + l2_len);
+		if (iph->protocol != IPPROTO_TCP)
+			return NULL;
+
+		if (iph->ihl < 5)
+			return NULL;
+
+		th_off = l2_len + (iph->ihl << 2);
+		if (len < th_off)
+			return NULL;
+
+		iph->tot_len = cpu_to_be16(len - l2_len);
+		iph->check = 0;
+		iph->check = ip_fast_csum((void *)iph, iph->ihl);
+	} else {
+		struct ipv6hdr *ip6h;
+
+		th_off = l2_len + sizeof(*ip6h);
+		if (len < th_off)
+			return NULL;
+
+		ip6h = (struct ipv6hdr *)(e->buf + l2_len);
+		if (ip6h->nexthdr != NEXTHDR_TCP)
+			return NULL;
+
+		ip6h->payload_len = cpu_to_be16(len - th_off);
+	}
+
+	if (len < th_off + sizeof(*th))
+		return NULL;
+
+	th = (struct tcphdr *)(e->buf + th_off);
+	if (th->doff < 5)
+		return NULL;
+
+	data_off = th_off + (th->doff << 2);
+	if (len < data_off)
+		return NULL;
+
+	tcp_win = FIELD_GET(QDMA_ETH_RXMSG_TCP_WIN_MASK, msg3);
+	tcp_ack_seq = le32_to_cpu(READ_ONCE(desc->data));
+	th->ack_seq = cpu_to_be32(tcp_ack_seq);
+	th->window = cpu_to_be16(tcp_win);
+
+	/* Check tcp timestamp option */
+	if (th->doff == (sizeof(*th) + TCPOLEN_TSTAMP_ALIGNED) / 4) {
+		u32 topt = get_unaligned_be32(th + 1);
+
+		if (topt == ((TCPOPT_NOP << 24) | (TCPOPT_NOP << 16) |
+			     (TCPOPT_TIMESTAMP << 8) | TCPOLEN_TIMESTAMP)) {
+			u8 *ptr = (u8 *)th + sizeof(*th) + 2 * sizeof(__be32);
+			__le32 tcp_ts_reply = READ_ONCE(desc->tcp_ts_reply);
+
+			put_unaligned_be32(le32_to_cpu(tcp_ts_reply), ptr);
+		}
+	}
+
+	if (ipv4) {
+		struct iphdr *iph = (struct iphdr *)(e->buf + l2_len);
+
+		th->check = ~tcp_v4_check(len - th_off, iph->saddr,
+					  iph->daddr, 0);
+	} else {
+		struct ipv6hdr *ip6h = (struct ipv6hdr *)(e->buf + l2_len);
+
+		th->check = ~tcp_v6_check(len - th_off, &ip6h->saddr,
+					  &ip6h->daddr, 0);
+	}
+
+	/* Split network headers and payload to rely on GRO.
+	 * We need to do it in the driver since the NIC does
+	 * not support it.
+	 */
+	skb = napi_alloc_skb(&q->napi, data_off);
+	if (!skb)
+		return NULL;
+
+	__skb_put(skb, data_off);
+	memcpy(skb->data, e->buf, data_off);
+
+	page = virt_to_head_page(e->buf);
+	data_len = len - data_off;
+	shinfo = skb_shinfo(skb);
+	skb_add_rx_frag(skb, shinfo->nr_frags, page,
+			e->buf + data_off - page_address(page), data_len,
+			q->buf_size);
+
+	shinfo->gso_type = ipv4 ? SKB_GSO_TCPV4 : SKB_GSO_TCPV6;
+	agg_count = FIELD_GET(QDMA_ETH_RXMSG_AGG_COUNT_MASK, msg2);
+	shinfo->gso_size = DIV_ROUND_UP(data_len, agg_count);
+	shinfo->gso_segs = agg_count;
+
+	skb->csum_start = skb_headroom(skb) + th_off;
+	skb->csum_offset = offsetof(struct tcphdr, check);
+	skb->ip_summed = CHECKSUM_PARTIAL;
+
+	return skb;
+}
+
+static struct sk_buff *airoha_qdma_build_rx_skb(struct airoha_queue *q,
+						struct airoha_qdma_desc *desc,
+						struct airoha_queue_entry *e,
+						struct net_device *dev)
+{
+	u32 msg2 = le32_to_cpu(READ_ONCE(desc->msg2));
+	int qid = q - &q->qdma->q_rx[0];
+	struct sk_buff *skb;
+
+	if (FIELD_GET(QDMA_ETH_RXMSG_AGG_COUNT_MASK, msg2) > 1) { /* LRO */
+		skb = airoha_qdma_lro_rx_skb(q, desc, e);
+		if (!skb)
+			return NULL;
+	} else {
+		u32 desc_ctrl = le32_to_cpu(READ_ONCE(desc->ctrl));
+		u32 len = FIELD_GET(QDMA_DESC_LEN_MASK, desc_ctrl);
+
+		skb = napi_build_skb(e->buf - AIROHA_RX_HEADROOM, q->buf_size);
+		if (!skb)
+			return NULL;
+
+		skb_reserve(skb, AIROHA_RX_HEADROOM);
+		__skb_put(skb, len);
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+	}
+
+	skb_mark_for_recycle(skb);
+	skb->dev = dev;
+	skb_record_rx_queue(skb, qid);
+	skb->protocol = eth_type_trans(skb, dev);
+
+	return skb;
+}
+
 static int airoha_qdma_rx_process(struct airoha_queue *q, int budget)
 {
 	enum dma_data_direction dir = page_pool_get_dma_dir(q->page_pool);
-	struct airoha_qdma *qdma = q->qdma;
-	struct airoha_eth *eth = qdma->eth;
-	int qid = q - &qdma->q_rx[0];
+	struct airoha_eth *eth = q->qdma->eth;
 	int done = 0;
 
 	while (done < budget) {
@@ -693,18 +924,9 @@ static int airoha_qdma_rx_process(struct airoha_queue *q, int budget)
 
 		netdev = netdev_from_priv(dev);
 		if (!q->skb) { /* first buffer */
-			q->skb = napi_build_skb(e->buf - AIROHA_RX_HEADROOM,
-						q->buf_size);
+			q->skb = airoha_qdma_build_rx_skb(q, desc, e, netdev);
 			if (!q->skb)
 				goto free_frag;
-
-			skb_reserve(q->skb, AIROHA_RX_HEADROOM);
-			__skb_put(q->skb, len);
-			skb_mark_for_recycle(q->skb);
-			q->skb->dev = netdev;
-			q->skb->protocol = eth_type_trans(q->skb, netdev);
-			q->skb->ip_summed = CHECKSUM_UNNECESSARY;
-			skb_record_rx_queue(q->skb, qid);
 		} else { /* scattered frame */
 			struct skb_shared_info *shinfo = skb_shinfo(q->skb);
 			int nr_frags = shinfo->nr_frags;
@@ -795,12 +1017,10 @@ static int airoha_qdma_rx_napi_poll(struct napi_struct *napi, int budget)
 static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
 				     struct airoha_qdma *qdma, int ndesc)
 {
-	const struct page_pool_params pp_params = {
-		.order = 0,
+	struct page_pool_params pp_params = {
 		.pool_size = 256,
 		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
 		.dma_dir = DMA_FROM_DEVICE,
-		.max_len = PAGE_SIZE,
 		.nid = NUMA_NO_NODE,
 		.dev = qdma->eth->dev,
 		.napi = &q->napi,
@@ -808,9 +1028,10 @@ static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
 	struct airoha_eth *eth = qdma->eth;
 	int qid = q - &qdma->q_rx[0], thr;
 	dma_addr_t dma_addr;
+	bool lro_q;
 
-	q->buf_size = PAGE_SIZE / 2;
 	q->qdma = qdma;
+	lro_q = airoha_qdma_is_lro_queue(q);
 
 	q->entry = devm_kzalloc(eth->dev, ndesc * sizeof(*q->entry),
 				GFP_KERNEL);
@@ -822,6 +1043,9 @@ static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
 	if (!q->desc)
 		return -ENOMEM;
 
+	pp_params.order = lro_q ? AIROHA_LRO_PAGE_ORDER : 0;
+	pp_params.max_len = PAGE_SIZE << pp_params.order;
+
 	q->page_pool = page_pool_create(&pp_params);
 	if (IS_ERR(q->page_pool)) {
 		int err = PTR_ERR(q->page_pool);
@@ -830,6 +1054,7 @@ static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
 		return err;
 	}
 
+	q->buf_size = lro_q ? pp_params.max_len : pp_params.max_len / 2;
 	q->ndesc = ndesc;
 	netif_napi_add(eth->napi_dev, &q->napi, airoha_qdma_rx_napi_poll);
 
@@ -843,7 +1068,12 @@ static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
 			FIELD_PREP(RX_RING_THR_MASK, thr));
 	airoha_qdma_rmw(qdma, REG_RX_DMA_IDX(qid), RX_RING_DMA_IDX_MASK,
 			FIELD_PREP(RX_RING_DMA_IDX_MASK, q->head));
-	airoha_qdma_set(qdma, REG_RX_SCATTER_CFG(qid), RX_RING_SG_EN_MASK);
+	if (lro_q)
+		airoha_qdma_clear(qdma, REG_RX_SCATTER_CFG(qid),
+				  RX_RING_SG_EN_MASK);
+	else
+		airoha_qdma_set(qdma, REG_RX_SCATTER_CFG(qid),
+				RX_RING_SG_EN_MASK);
 
 	airoha_qdma_fill_rx_queue(q);
 
@@ -865,6 +1095,7 @@ static void airoha_qdma_cleanup_rx_queue(struct airoha_queue *q)
 					page_pool_get_dma_dir(q->page_pool));
 		page_pool_put_full_page(q->page_pool, page, false);
 		/* Reset DMA descriptor */
+		WRITE_ONCE(desc->tcp_ts_reply, 0);
 		WRITE_ONCE(desc->ctrl, 0);
 		WRITE_ONCE(desc->addr, 0);
 		WRITE_ONCE(desc->data, 0);
@@ -1802,6 +2033,37 @@ static void airoha_update_hw_stats(struct airoha_gdm_dev *dev)
 	spin_unlock(&port->stats_lock);
 }
 
+static void airoha_update_netdev_features(struct airoha_gdm_dev *dev)
+{
+	struct airoha_eth *eth = dev->eth;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
+		struct airoha_gdm_port *port = eth->ports[i];
+		int j;
+
+		if (!port)
+			continue;
+
+		for (j = 0; j < ARRAY_SIZE(port->devs); j++) {
+			struct airoha_gdm_dev *iter_dev = port->devs[j];
+			struct net_device *netdev;
+
+			if (!iter_dev || iter_dev == dev)
+				continue;
+
+			if (iter_dev->qdma != dev->qdma)
+				continue;
+
+			netdev = netdev_from_priv(iter_dev);
+			if (netdev->reg_state != NETREG_REGISTERED)
+				continue;
+
+			netdev_update_features(netdev);
+		}
+	}
+}
+
 static int airoha_dev_open(struct net_device *netdev)
 {
 	int err, len = ETH_HLEN + netdev->mtu + ETH_FCS_LEN;
@@ -1809,6 +2071,17 @@ static int airoha_dev_open(struct net_device *netdev)
 	struct airoha_gdm_port *port = dev->port;
 	u32 cur_len, pse_port = FE_PSE_PORT_PPE1;
 	struct airoha_qdma *qdma = dev->qdma;
+	int qdma_id = qdma - &qdma->eth->qdma[0];
+
+	/* HW GRO is configured on the QDMA and it is shared between
+	 * all the devices using it. Refuse to open a second device on
+	 * the same QDMA if HW GRO is enabled on any device sharing it.
+	 */
+	if (qdma->users && airoha_fe_lro_is_enabled(qdma->eth, qdma_id)) {
+		netdev_warn(netdev, "required to disable HW GRO on QDMA%d\n",
+			    qdma_id);
+		return -EBUSY;
+	}
 
 	netif_tx_start_all_queues(netdev);
 	err = airoha_set_vip_for_gdm_port(dev, true);
@@ -1848,6 +2121,11 @@ static int airoha_dev_open(struct net_device *netdev)
 	airoha_set_gdm_port_fwd_cfg(qdma->eth, REG_GDM_FWD_CFG(port->id),
 				    pse_port);
 
+	if (netdev->features & NETIF_F_GRO_HW)
+		airoha_dev_lro_enable(dev);
+
+	airoha_update_netdev_features(dev);
+
 	return 0;
 }
 
@@ -1895,6 +2173,9 @@ static int airoha_dev_stop(struct net_device *netdev)
 					    FE_PSE_PORT_DROP);
 
 	if (!--qdma->users) {
+		int qdma_id = qdma - &qdma->eth->qdma[0];
+
+		airoha_fe_lro_disable(qdma->eth, qdma_id);
 		airoha_qdma_clear(qdma, REG_QDMA_GLOBAL_CFG,
 				  GLOBAL_CFG_TX_DMA_EN_MASK |
 				  GLOBAL_CFG_RX_DMA_EN_MASK);
@@ -1907,6 +2188,8 @@ static int airoha_dev_stop(struct net_device *netdev)
 		}
 	}
 
+	airoha_update_netdev_features(dev);
+
 	return 0;
 }
 
@@ -2176,6 +2459,41 @@ int airoha_get_fe_port(struct airoha_gdm_dev *dev)
 	}
 }
 
+static netdev_features_t airoha_dev_fix_features(struct net_device *netdev,
+						 netdev_features_t features)
+{
+	struct airoha_gdm_dev *dev = netdev_priv(netdev);
+	struct airoha_qdma *qdma = dev->qdma;
+
+	if (qdma->users > 1)
+		features &= ~NETIF_F_GRO_HW;
+
+	return features;
+}
+
+static int airoha_dev_set_features(struct net_device *netdev,
+				   netdev_features_t features)
+{
+	netdev_features_t diff = netdev->features ^ features;
+	struct airoha_gdm_dev *dev = netdev_priv(netdev);
+
+	if (!(diff & NETIF_F_GRO_HW))
+		return 0;
+
+	if (!netif_running(netdev))
+		return 0;
+
+	if (features & NETIF_F_GRO_HW) {
+		airoha_dev_lro_enable(dev);
+	} else {
+		int qdma_id = dev->qdma - &dev->eth->qdma[0];
+
+		airoha_fe_lro_disable(dev->eth, qdma_id);
+	}
+
+	return 0;
+}
+
 static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
 				   struct net_device *netdev)
 {
@@ -3102,6 +3420,8 @@ static const struct net_device_ops airoha_netdev_ops = {
 	.ndo_stop		= airoha_dev_stop,
 	.ndo_change_mtu		= airoha_dev_change_mtu,
 	.ndo_select_queue	= airoha_dev_select_queue,
+	.ndo_fix_features	= airoha_dev_fix_features,
+	.ndo_set_features	= airoha_dev_set_features,
 	.ndo_start_xmit		= airoha_dev_xmit,
 	.ndo_get_stats64        = airoha_dev_get_stats64,
 	.ndo_set_mac_address	= airoha_dev_set_macaddr,
@@ -3189,11 +3509,9 @@ static int airoha_alloc_gdm_device(struct airoha_eth *eth,
 	netdev->ethtool_ops = &airoha_ethtool_ops;
 	netdev->max_mtu = AIROHA_MAX_MTU;
 	netdev->watchdog_timeo = 5 * HZ;
-	netdev->hw_features = NETIF_F_IP_CSUM | NETIF_F_RXCSUM | NETIF_F_TSO6 |
-			      NETIF_F_IPV6_CSUM | NETIF_F_SG | NETIF_F_TSO |
-			      NETIF_F_HW_TC;
-	netdev->features |= netdev->hw_features;
-	netdev->vlan_features = netdev->hw_features;
+	netdev->hw_features = AIROHA_HW_FEATURES | NETIF_F_GRO_HW;
+	netdev->features |= AIROHA_HW_FEATURES;
+	netdev->vlan_features = AIROHA_HW_FEATURES;
 	SET_NETDEV_DEV(netdev, eth->dev);
 
 	/* reserve hw queues for HTB offloading */
diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h
index 41d2e7a1f9fb..c13757a88aba 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.h
+++ b/drivers/net/ethernet/airoha/airoha_eth.h
@@ -44,6 +44,18 @@
 	 (_n) == 15 ? 128 :		\
 	 (_n) ==  0 ? 1024 : 16)
 
+#define AIROHA_LRO_PAGE_ORDER		order_base_2(SZ_16K / PAGE_SIZE)
+#define AIROHA_MAX_NUM_LRO_QUEUES	8
+#define AIROHA_RXQ_LRO_EN_MASK		GENMASK(31, 24)
+#define AIROHA_RXQ_LRO_MAX_AGG_COUNT	64
+#define AIROHA_RXQ_LRO_MAX_AGG_TIME	100
+#define AIROHA_RXQ_LRO_MAX_AGE_TIME	2000
+
+#define AIROHA_HW_FEATURES			\
+	(NETIF_F_IP_CSUM | NETIF_F_RXCSUM |	\
+	 NETIF_F_TSO6 | NETIF_F_IPV6_CSUM |	\
+	 NETIF_F_SG | NETIF_F_TSO | NETIF_F_HW_TC)
+
 #define PSE_RSV_PAGES			128
 #define PSE_QUEUE_RSV_PAGES		64
 
@@ -673,6 +685,18 @@ static inline bool airoha_is_7583(struct airoha_eth *eth)
 	return eth->soc->version == 0x7583;
 }
 
+static inline bool airoha_qdma_is_lro_queue(struct airoha_queue *q)
+{
+	struct airoha_qdma *qdma = q->qdma;
+	int qid = q - &qdma->q_rx[0];
+
+	/* EN7581 SoC supports at most 8 LRO rx queues */
+	BUILD_BUG_ON(hweight32(AIROHA_RXQ_LRO_EN_MASK) >
+		     AIROHA_MAX_NUM_LRO_QUEUES);
+
+	return !!(AIROHA_RXQ_LRO_EN_MASK & BIT(qid));
+}
+
 int airoha_get_fe_port(struct airoha_gdm_dev *dev);
 bool airoha_is_valid_gdm_dev(struct airoha_eth *eth,
 			     struct airoha_gdm_dev *dev);
diff --git a/drivers/net/ethernet/airoha/airoha_regs.h b/drivers/net/ethernet/airoha/airoha_regs.h
index 436f3c8779c1..dfc786583774 100644
--- a/drivers/net/ethernet/airoha/airoha_regs.h
+++ b/drivers/net/ethernet/airoha/airoha_regs.h
@@ -122,6 +122,20 @@
 #define CDM_CRSN_QSEL_REASON_MASK(_n)	\
 	GENMASK(4 + (((_n) % 4) << 3),	(((_n) % 4) << 3))
 
+#define REG_CDM_LRO_RXQ(_n, _m)		(CDM_BASE(_n) + 0x78 + ((_m) & 0x4))
+#define LRO_RXQ_MASK(_n)		GENMASK(4 + (((_n) & 0x3) << 3), ((_n) & 0x3) << 3)
+
+#define REG_CDM_LRO_EN(_n)		(CDM_BASE(_n) + 0x80)
+#define LRO_RXQ_EN_MASK			GENMASK(7, 0)
+
+#define REG_CDM_LRO_LIMIT(_n)		(CDM_BASE(_n) + 0x84)
+#define CDM_LRO_AGG_NUM_MASK		GENMASK(23, 16)
+#define CDM_LRO_AGG_SIZE_MASK		GENMASK(15, 0)
+
+#define REG_CDM_LRO_AGE_TIME(_n)	(CDM_BASE(_n) + 0x88)
+#define CDM_LRO_AGE_TIME_MASK		GENMASK(31, 16)
+#define CDM_LRO_AGG_TIME_MASK		GENMASK(15, 0)
+
 #define REG_GDM_FWD_CFG(_n)		GDM_BASE(_n)
 #define GDM_PAD_EN_MASK			BIT(28)
 #define GDM_DROP_CRC_ERR_MASK		BIT(23)
@@ -883,9 +897,15 @@
 #define QDMA_ETH_RXMSG_SPORT_MASK	GENMASK(25, 21)
 #define QDMA_ETH_RXMSG_CRSN_MASK	GENMASK(20, 16)
 #define QDMA_ETH_RXMSG_PPE_ENTRY_MASK	GENMASK(15, 0)
+/* RX MSG2 */
+#define QDMA_ETH_RXMSG_AGG_COUNT_MASK	GENMASK(31, 24)
+#define QDMA_ETH_RXMSG_L2_LEN_MASK	GENMASK(6, 0)
+/* RX MSG3 */
+#define QDMA_ETH_RXMSG_AGG_LEN_MASK	GENMASK(31, 16)
+#define QDMA_ETH_RXMSG_TCP_WIN_MASK	GENMASK(15, 0)
 
 struct airoha_qdma_desc {
-	__le32 rsv;
+	__le32 tcp_ts_reply;
 	__le32 ctrl;
 	__le32 addr;
 	__le32 data;
-- 
2.54.0
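
For readers checking the CHECKSUM_PARTIAL and GSO metadata setup in
airoha_qdma_lro_rx_skb(), here is a minimal userspace sketch of the two
computations involved: the IPv4 pseudo-header sum that is left in th->check,
and the gso_size derived from the aggregated payload. All function names are
hypothetical, and addresses are passed as host-order integers (for example
0xC0A80001 for 192.168.0.1) to keep byte-order handling out of the example.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum */
static uint16_t csum_fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Non-inverted pseudo-header sum, i.e. the value ~tcp_v4_check(len, saddr,
 * daddr, 0) yields in the kernel; the egress path later adds the TCP header
 * and payload on top of it.
 */
static uint16_t tcp4_pseudo_csum(uint32_t saddr, uint32_t daddr,
				 uint16_t tcp_len)
{
	uint32_t sum = 0;

	sum += saddr >> 16;
	sum += saddr & 0xffff;
	sum += daddr >> 16;
	sum += daddr & 0xffff;
	sum += 6;		/* IPPROTO_TCP */
	sum += tcp_len;		/* TCP header + payload length */
	return csum_fold16(sum);
}

/* Same rounding as DIV_ROUND_UP(data_len, agg_count); agg_count must be > 0 */
static uint32_t lro_gso_size(uint32_t data_len, uint32_t agg_count)
{
	return (data_len + agg_count - 1) / agg_count;
}
```

The rounding up matters when the last aggregated segment is shorter than the
others: gso_size then reflects the full-sized segments rather than the
truncated average.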




* Re: [PATCH 3/4] mfd: Add support for UGREEN NASync DH2300 MCU
From: Alexey Charkov @ 2026-06-18 14:39 UTC (permalink / raw)
  To: Lee Jones
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	Liam Girdwood, Mark Brown, devicetree, linux-kernel,
	linux-arm-kernel, linux-rockchip
In-Reply-To: <20260618124034.GI1672911@google.com>

On Thu, Jun 18, 2026 at 4:40 PM Lee Jones <lee@kernel.org> wrote:
>
> On Fri, 12 Jun 2026, Alexey Charkov wrote:
>
> > Add a driver for the HC32F005 MCU used as an embedded controller on the
> > UGREEN NASync DH2300 NAS.
> >
> > This part provides the shared I2C regmap to be used by function-specific
> > sub-devices, and instantiates the SATA drive-bay power gate regulator.
> > Implemented as an MFD to allow for other functions of the MCU to be added
> > later: vendor binaries imply that it also provides a hardware watchdog
> > and somehow serves as a wake source, but so far only the SATA power gating
> > function has been confirmed in absence of documentation and sources for the
> > vendor firmware.
> >
> > Signed-off-by: Alexey Charkov <alchark@flipper.net>
> > ---
> >  MAINTAINERS                     |  1 +
> >  drivers/mfd/Kconfig             | 16 +++++++++++
> >  drivers/mfd/Makefile            |  1 +
> >  drivers/mfd/ugreen-dh2300-mcu.c | 60 +++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 78 insertions(+)
>
> Did you see: drivers/mfd/simple-mfd-i2c.c ?

Oh. Now I did :-D

It's exactly what I needed, thanks a lot for the pointer. Will drop
the boilerplate in v2 and instead instantiate my tiny child device
from there.

Best regards,
Alexey



* Re: [PATCH v2 0/3] Optimize S2 page splitting
From: Leonardo Bras @ 2026-06-18 14:38 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Steffen Eiden,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Fuad Tabba, Raghavendra Rao Ananta
  Cc: Leonardo Bras, linux-arm-kernel, kvmarm, linux-kernel
In-Reply-To: <20260618131447.764085-1-leo.bras@arm.com>

On Thu, Jun 18, 2026 at 02:14:41PM +0100, Leonardo Bras wrote:
> While playing with dirty-bit tracking, I decided to take a look at how page
> splitting works. I found that all entries are walked, even though we can
> infer, for instance, that:
> - If a level-3 entry is walked, it means the parent level-2 entry is split
> - If a split just succeeded in a table entry, it means all child nodes
>   are already split
> 
> The idea of this series is to introduce new walking flags to skip pagetable
> levels 0-3.
> 
> The idea of skipping child nodes was also tested, but it was marginally
> slower than just skipping levels, so it was discarded.
> 
> Optimization measured on two scenarios involving eager-splitting on a
> VM with 1 memslot of 16GB:
> - Scenario 1: No manual protect, whole memslot split at dirty-track enable
>   (KVM_SET_USER_MEMORY_REGION2 ioctl with KVM_MEM_LOG_DIRTY_PAGES)
>   - Split happens only once, whole region
>   - Evaluates improved batch performance of splitting
> - Scenario 2: Manual protect, split happens during every dirty-bit clean
>   (KVM_CLEAR_DIRTY_LOG ioctl), average for 2 iterations.
>   - Split called multiple times, for smaller 64-page sections.
>   - Evaluate improved performance for multiple calls
> 
> Scenario 1, improvement on dirty-track enable ioctl for the memslot:
> - Memory was already split (4k pages):  -44.01% runtime (stdev 2.80%)
> - THP backed memory:                    -24.66% runtime (stdev 1.21%)
> - 16x1GB hugetlb memory:                -24.78% runtime (stdev 0.85%)
> 
> Scenario 2, improvement on dirty-log clean ioctl for the memslot:
> - Memory was already split (4k pages):  -38.98% runtime (stdev 1.91%)
> - THP backed memory:                    -25.49% runtime (stdev 0.65%)
> - 16x1GB hugetlb memory:                -24.24% runtime (stdev 0.65%)
> 
> For collecting the above numbers, the following script was run on both
> vanilla and patched kernels, with kernel parameter 'default_hugepagesz=1G',
> on a TX2 with 32GB RAM.
> 
> --- dirty_test.sh
> #!/bin/bash
> filename=$(uname -r |cut -d'-' -f 4-)
> 
> run_test(){
>   uname -a
>   cat /proc/cmdline
> 
>   #prepare
>   sudo bash -c 'echo 64 > /proc/sys/vm/nr_hugepages'
> 
>   ./dirty_log_perf_test -g -b 64G
>   ./dirty_log_perf_test -g -b 64G -s anonymous_thp
>   ./dirty_log_perf_test -g -b 64G -s shared_hugetlb
> 
>   ./dirty_log_perf_test -b 64G
>   ./dirty_log_perf_test -b 64G -s anonymous_thp
>   ./dirty_log_perf_test -b 64G -s shared_hugetlb
> }
> 
> run_test 2>&1 | tee ${filename}
> ---

s/64G/16G/ in the above script

> 
> Above dirty_log_perf_test command is the standard kvm selftest found in the
> kernel tree. It tested the following guest modes:
> Testing guest mode: PA-bits:40,  VA-bits:48,  4K pages
> Testing guest mode: PA-bits:40,  VA-bits:48, 64K pages
> Testing guest mode: PA-bits:36,  VA-bits:48,  4K pages
> Testing guest mode: PA-bits:36,  VA-bits:48, 64K pages
> 
> Performance numbers from the above modes were used to calculate the average
> and stdev shown in the optimization results.
> 
> Changes since v1:
> - Fixed inverted flag verification priority (Sashiko)
> - Fixed incorrectly skipping the POST call if a level was skipped (Sashiko); to that end:
> - New pre-patch that changes goto-out -> return to avoid re-testing walk_continue
> v1 Link: https://lore.kernel.org/lkml/20260610202112.2695205-2-leo.bras@arm.com/
> 
> Changes since RFC:
> - Changed approach from return value to walk flags (Will Deacon)
> - Discarded skip_child approach (Oliver Upton)
> - Measured in real hardware, and from userspace perspective (Marc Zyngier)
> - Better explanation of what and how numbers were collected
> RFC Link: https://lore.kernel.org/all/20260515195904.2466381-1-leo.bras@arm.com/
> 
> Thanks!
> Leo
> 
> Leonardo Bras (3):
>   KVM: arm64: Avoid re-testing walk_continue
>   KVM: arm64: Introduce KVM_PGTABLE_WALK_SKIP_LEVEL* walk flags
>   KVM: arm64: Make stage2_split_walker() skip unnecessary walks
> 
>  arch/arm64/include/asm/kvm_pgtable.h | 13 +++++++++++++
>  arch/arm64/kvm/hyp/pgtable.c         | 28 +++++++++++++++++++++-------
>  2 files changed, 34 insertions(+), 7 deletions(-)
> 
> 
> base-commit: 66affa37cfac0aec061cc4bcf4a065b0c52f7e19
> -- 
> 2.54.0
> 



* [PATCH v3] i2c: sun6i-p2wi: Fix device node reference leak in p2wi_probe
From: Uday Khare @ 2026-06-18 14:37 UTC (permalink / raw)
  To: Andi Shyti
  Cc: Chen-Yu Tsai, Jernej Skrabec, Samuel Holland, Arnd Bergmann,
	Boris Brezillon, Maxime Ripard, Wolfram Sang, linux-i2c,
	linux-arm-kernel, linux-sunxi, linux-kernel, Uday Khare

In p2wi_probe(), the device node reference obtained via
of_get_next_available_child() is stored in childnp. This reference is
never released, causing a device node reference leak.

Fix this by calling of_node_put(childnp) on both the error and success
paths.

Fixes: 3e833490fae5 ("i2c: sunxi: add P2WI (Push/Pull 2 Wire Interface) controller support")
Signed-off-by: Uday Khare <udaykhare77@gmail.com>
---
v3:
- Revert back to manual of_node_put() because the function uses goto-based
  error handling, and mixing the two styles is discouraged (reported by
  Sashiko-bot).
v2:
- Use __free(device_node) and include <linux/cleanup.h> to automate the device
  node reference cleanup instead of manually calling of_node_put() on error and
  success paths (suggested by Chen-Yu Tsai).

 drivers/i2c/busses/i2c-sun6i-p2wi.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-sun6i-p2wi.c b/drivers/i2c/busses/i2c-sun6i-p2wi.c
index dffbe776a195..7c0239f8937e 100644
--- a/drivers/i2c/busses/i2c-sun6i-p2wi.c
+++ b/drivers/i2c/busses/i2c-sun6i-p2wi.c
@@ -184,7 +184,6 @@ static int p2wi_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
 	struct device_node *np = dev->of_node;
-	struct device_node *childnp;
 	unsigned long parent_clk_freq;
 	u32 clk_freq = I2C_MAX_STANDARD_MODE_FREQ;
 	struct p2wi *p2wi;
@@ -217,14 +216,19 @@ static int p2wi_probe(struct platform_device *pdev)
 	 * In this case the target_addr is set to -1 and won't be checked when
 	 * launching a P2WI transfer.
 	 */
+	struct device_node *childnp;
+
 	childnp = of_get_next_available_child(np, NULL);
 	if (childnp) {
 		ret = of_property_read_u32(childnp, "reg", &target_addr);
-		if (ret)
+		if (ret) {
+			of_node_put(childnp);
 			return dev_err_probe(dev, -EINVAL,
 					     "invalid target address on node %pOF\n", childnp);
+		}
 
 		p2wi->target_addr = target_addr;
+		of_node_put(childnp);
 	}
 
 	p2wi->regs = devm_platform_ioremap_resource(pdev, 0);
-- 
2.54.0
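
The reference-balancing rule this fix relies on can be illustrated with a
toy userspace model. struct node, node_get(), node_put() and
read_target_addr() are all hypothetical stand-ins (not the OF API), and the
sketch uses a single exit path instead of the two of_node_put() calls in the
patch; it only shows that the count must return to its starting value on
both the error and the success path.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a refcounted device node */
struct node {
	int refcount;
	int has_reg;		/* whether a "reg" property exists */
	unsigned int reg;
};

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/* Returns 0 on success or when there is no child, -22 (EINVAL) when the
 * child lacks "reg". The reference taken here is dropped on every path.
 */
static int read_target_addr(struct node *child_in_tree, unsigned int *addr)
{
	struct node *child = node_get(child_in_tree);
	int ret = 0;

	if (!child)
		return 0;

	if (!child->has_reg)
		ret = -22;
	else
		*addr = child->reg;

	node_put(child);
	return ret;
}

/* Test helper: run the lookup and report the node's refcount afterwards */
static int refcount_after_lookup(int has_reg)
{
	struct node n = { .refcount = 1, .has_reg = has_reg, .reg = 0x34 };
	unsigned int addr = 0;

	(void)read_target_addr(&n, &addr);
	return n.refcount;
}
```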




* Re: [RFC PATCH 2/6] arm64: mm: allow huge vmap permission adjustments with bbml2_no_abort
From: Ryan Roberts @ 2026-06-18 14:21 UTC (permalink / raw)
  To: Adrian Barnaś, linux-arm-kernel
  Cc: linux-mm, Catalin Marinas, Will Deacon, David Hildenbrand,
	Mike Rapoport (Microsoft), Ard Biesheuvel, Christoph Lameter,
	Yang Shi, Brendan Jackman
In-Reply-To: <20260611130144.1385343-3-abarnas@google.com>

On 11/06/2026 14:01, Adrian Barnaś wrote:
> Remove the protection against huge vmap permission adjustments on
> systems that support the bbml2_no_abort CPU feature.
> 
> Splitting live kernel VA section mappings into page mappings was
> restricted because it could cause TLB Conflict Aborts. This forced
> permission adjustments on memory allocated with VM_ALLOW_HUGE_VMAP to be
> rejected, resulting in performance drops (e.g., when enforcing rodata=on
> disables huge mappings).
> 
> The bbml2_no_abort feature (which mirrors the architectural guarantees of
> FEAT_BBML3) ensures that changing between table and block sizes without
> following a break-before-make sequence will not generate a TLB Conflict
> Abort. This hardware guarantee makes it safe to allow dynamic permission
> adjustments on huge vmap regions.

FYI Linu Cherian has a series that renames bbml2_no_abort to bbml3. I think he's
planning to post at -rc1. Would be good to rebase this on top once merged.

> 
> Signed-off-by: Adrian Barnaś <abarnas@google.com>
> ---
>  arch/arm64/mm/pageattr.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> index 358d1dc9a576..88720bbba892 100644
> --- a/arch/arm64/mm/pageattr.c
> +++ b/arch/arm64/mm/pageattr.c
> @@ -157,23 +157,29 @@ static int change_memory_common(unsigned long addr, int numpages,
>  	}
>  
>  	/*
> -	 * Kernel VA mappings are always live, and splitting live section
> -	 * mappings into page mappings may cause TLB conflicts. This means
> -	 * we have to ensure that changing the permission bits of the range
> -	 * we are operating on does not result in such splitting.
> -	 *
>  	 * Let's restrict ourselves to mappings created by vmalloc (or vmap).
> -	 * Disallow VM_ALLOW_HUGE_VMAP mappings to guarantee that only page
> -	 * mappings are updated and splitting is never needed.
>  	 *
>  	 * So check whether the [addr, addr + size) interval is entirely
>  	 * covered by precisely one VM area that has the VM_ALLOC flag set.
>  	 */
>  	area = find_vm_area((void *)addr);
> +
>  	if (!area ||
>  	    ((unsigned long)kasan_reset_tag((void *)end) >
>  	     (unsigned long)kasan_reset_tag(area->addr) + area->size) ||
> -	    ((area->flags & (VM_ALLOC | VM_ALLOW_HUGE_VMAP)) != VM_ALLOC))
> +	    !(area->flags & VM_ALLOC))
> +		return -EINVAL;
> +
> +	/*
> +	 * Kernel VA mappings are always live, and splitting live section
> +	 * mappings into page mappings may cause TLB conflicts if bbml2_noabort
> +	 * is not present.
> +	 *
> +	 * While bbml2_noabort is not present disallow VM_ALLOW_HUGE_VMAP mappings
> +	 * to guarantee that only page mappings are updated and splitting is not
> +	 * needed.
> +	 */
> +	if (!system_supports_bbml2_noabort() && (area->flags & (VM_ALLOW_HUGE_VMAP)))

nit: no need for the parentheses around VM_ALLOW_HUGE_VMAP.

With that:

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

>  		return -EINVAL;
>  
>  	if (!numpages)



^ permalink raw reply

* [PATCH v1 1/1] i2c: nomadik: Use generic definitions for bus frequencies
From: Andy Shevchenko @ 2026-06-18 14:17 UTC (permalink / raw)
  To: Andy Shevchenko, linux-arm-kernel, linux-i2c, linux-kernel
  Cc: Linus Walleij, Andi Shyti

Since we have generic definitions for bus frequencies, let's use them.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
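A minimal userspace sketch of the threshold logic, for illustration only.
The two frequency macros carry the same values as the generic definitions in
include/linux/i2c.h; the enum and the pick_speed_mode() helper are
hypothetical stand-ins for the driver's I2C_EYEQ5_SPEED_* values and are not
part of this patch.

```c
/* Same values as the generic definitions in include/linux/i2c.h,
 * repeated here so the sketch builds outside the kernel tree. */
#define I2C_MAX_FAST_MODE_FREQ		400000
#define I2C_MAX_FAST_MODE_PLUS_FREQ	1000000

/* Hypothetical stand-ins for the driver's I2C_EYEQ5_SPEED_* values. */
enum speed_mode { SPEED_FAST, SPEED_FAST_PLUS, SPEED_HIGH_SPEED };

/* Hypothetical helper mirroring the if/else chain touched by the patch. */
static enum speed_mode pick_speed_mode(unsigned int clk_freq)
{
	if (clk_freq <= I2C_MAX_FAST_MODE_FREQ)
		return SPEED_FAST;
	if (clk_freq <= I2C_MAX_FAST_MODE_PLUS_FREQ)
		return SPEED_FAST_PLUS;
	return SPEED_HIGH_SPEED;
}
```

The named limits make the boundaries self-describing: 400 kHz still selects
fast mode and 1 MHz still selects fast mode plus, exactly as with the
literals they replace.
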
 drivers/i2c/busses/i2c-nomadik.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-nomadik.c b/drivers/i2c/busses/i2c-nomadik.c
index b63ee51c1652..404709179d73 100644
--- a/drivers/i2c/busses/i2c-nomadik.c
+++ b/drivers/i2c/busses/i2c-nomadik.c
@@ -1050,9 +1050,9 @@ static int nmk_i2c_eyeq5_probe(struct nmk_i2c_dev *priv)
 	if (id >= ARRAY_SIZE(nmk_i2c_eyeq5_masks))
 		return -ENOENT;
 
-	if (priv->clk_freq <= 400000)
+	if (priv->clk_freq <= I2C_MAX_FAST_MODE_FREQ)
 		speed_mode = I2C_EYEQ5_SPEED_FAST;
-	else if (priv->clk_freq <= 1000000)
+	else if (priv->clk_freq <= I2C_MAX_FAST_MODE_PLUS_FREQ)
 		speed_mode = I2C_EYEQ5_SPEED_FAST_PLUS;
 	else
 		speed_mode = I2C_EYEQ5_SPEED_HIGH_SPEED;
-- 
2.50.1



^ permalink raw reply related

* [RFC PATCH net-next v8 06/12] net: Document PCS subsystem
From: Christian Marangi @ 2026-06-18 12:57 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Simon Horman, Jonathan Corbet, Shuah Khan, Christian Marangi,
	Lorenzo Bianconi, Heiner Kallweit, Russell King, Saravana Kannan,
	Philipp Zabel, Nathan Chancellor, Nick Desaulniers, Bill Wendling,
	Justin Stitt, netdev, devicetree, linux-kernel, linux-doc,
	linux-arm-kernel, linux-mediatek, llvm, Maxime Chevallier
In-Reply-To: <20260618125752.1223-1-ansuelsmth@gmail.com>

Add extensive documentation of the new PCS subsystem and the fwnode
implementation with producer/consumer API.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
 Documentation/networking/index.rst |   1 +
 Documentation/networking/pcs.rst   | 229 +++++++++++++++++++++++++++++
 2 files changed, 230 insertions(+)
 create mode 100644 Documentation/networking/pcs.rst

diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 44a422ad3b05..3fce8f6ac089 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -28,6 +28,7 @@ Contents:
    net_failover
    page_pool
    phy
+   pcs
    sfp-phylink
    alias
    bridge
diff --git a/Documentation/networking/pcs.rst b/Documentation/networking/pcs.rst
new file mode 100644
index 000000000000..98592cdee3ef
--- /dev/null
+++ b/Documentation/networking/pcs.rst
@@ -0,0 +1,229 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+PCS Subsystem
+=============
+
+The PCS (Physical Coding Sublayer) subsystem handles the registration and lookup
+of PCS devices. These devices contain the upper sublayers of the Ethernet
+physical layer, generally handling framing, scrambling, and encoding tasks. PCS
+devices may also include PMA (Physical Medium Attachment) components. PCS
+devices transfer data between the link-layer MAC device and the rest of the
+physical layer, typically via a serdes. The output of the serdes may be
+connected more-or-less directly to the medium when using fiber-optic or
+backplane connections (1000BASE-SX, 1000BASE-KX, etc). It may also communicate
+with a separate PHY (such as over SGMII) which handles the connection to the
+medium (such as 1000BASE-T).
+
+Remark on usage of .mac_select_pcs and fwnode PCS
+-------------------------------------------------
+
+There are generally two ways to look up a PCS device.
+
+1. MAC OP struct .mac_select_pcs (considered legacy)
+2. firmware node (fwnode) PCS entirely handled by phylink
+
+Implementation 1 leaves the entire handling of the PCS to the MAC
+driver with the selection of the PCS driven by .mac_select_pcs.
+Custom implementations are required if the PCS is external to the MAC
+and needs to be handled by a separate driver.
+
+This implementation is considered legacy and it's suggested to
+switch to the new fwnode PCS.
+
+Looking up PCS Devices (fwnode implementation)
+-----------------------------------------------
+
+The lookup of a PCS device follows the common producer/consumer implementation
+used by similar subsystems with a ``#pcs-cells`` on the producer and a
+``pcs-handle`` property on the consumer::
+
+    pcs: pcs {
+        // ...
+        #pcs-cells = <0>;
+    };
+
+    ethernet-controller {
+        // ...
+        pcs-handle = <&pcs>;
+    };
+
+On :c:func:`phylink_create`, phylink will use the ``num_possible_pcs``
+value and ``fill_available_pcs`` helper function in
+:c:struct:`phylink_config` to compose the list of available PCS that can be
+used for the phylink instance.
+
+Phylink will then internally handle the selection of the correct PCS for
+the requested interface mode based on the interface modes configured in
+``pcs_interfaces`` in :c:struct:`phylink_config` struct and
+``supported_interfaces`` in :c:struct:`phylink_pcs` struct.
+
+A PCS is considered eligible when the requested interface mode is present
+in both ``pcs_interfaces`` in :c:struct:`phylink_config` struct and
+``supported_interfaces`` in :c:struct:`phylink_pcs` struct.
+
+``supported_interfaces`` describes all interface modes supported by the MAC,
+whereas ``pcs_interfaces`` identifies the subset that requires PCS selection.
+
+For the special implementation where the PCS is internal or part of the MAC
+and a dedicated driver is not needed, it's possible to leave the implementation
+of the PCS to the MAC driver and just implement the ``num_possible_pcs``
+value and ``fill_available_pcs`` helper function in
+:c:struct:`phylink_config` referencing the local :c:struct:`phylink_pcs`
+struct allocated from the MAC driver.
+
+Using PCS Devices
+-----------------
+
+It's mandatory to either implement the ``mac_select_pcs`` callback
+of :c:struct:`phylink_mac_ops` or ``num_possible_pcs`` and ``fill_available_pcs``
+of :c:struct:`phylink_config` to use a PCS for a MAC.
+
+The fwnode implementation exposes two simple helpers to parse the PCS from
+the fwnode: :c:func:`fwnode_phylink_pcs_count` and
+:c:func:`fwnode_phylink_pcs_parse`. The :c:func:`fwnode_phylink_pcs_count` helper
+takes the fwnode where the ``pcs-handle`` should be parsed and returns the
+number of PCS entries described in the fwnode.
+The :c:func:`fwnode_phylink_pcs_parse` helper takes three arguments:
+the fwnode where the ``pcs-handle`` should be parsed, an allocated array
+of :c:struct:`phylink_pcs` pointers in which to store the parsed PCS,
+and the maximum number of PCS to parse.
+Unlike :c:func:`fwnode_phylink_pcs_count`, the :c:func:`fwnode_phylink_pcs_parse`
+helper fills the allocated array with ONLY the available PCS and returns the
+number of available PCS found. A PCS whose lookup returns ``-ENODEV`` is skipped
+and is not inserted in the allocated array.
+
+A phylink instance may use multiple PCS devices. The maximum number is reported
+through ``num_possible_pcs``.
+
+It's mandatory to specify for which interface modes a PCS is needed. This is done
+by filling ``pcs_interfaces`` in the :c:struct:`phylink_config` struct.
+If the requested interface mode is not present in this bitmask, phylink does
+not search for a PCS for that specific mode (for example, a MAC that doesn't
+need a PCS for SGMII but requires one for USXGMII).
+
+With the use of the :c:func:`fwnode_phylink_pcs_parse` a common implementation
+is the following::
+
+   static int mac_fill_available_pcs(struct phylink_config *config,
+                                     struct phylink_pcs **available_pcs,
+                                     unsigned int num_possible_pcs)
+   {
+      struct device *dev = config->dev;
+
+      return fwnode_phylink_pcs_parse(dev_fwnode(dev), available_pcs,
+                                      num_possible_pcs);
+   }
+
+   static int mac_setup_phylink(struct net_device *netdev)
+   {
+      struct phylink_config *config;
+
+      // ...
+
+      config->dev = &netdev->dev;
+
+      // ...
+
+      // Parse possible PCS and fill num_possible_pcs.
+      config->num_possible_pcs = fwnode_phylink_pcs_count(dev_fwnode(&netdev->dev));
+      config->fill_available_pcs = mac_fill_available_pcs;
+
+      __set_bit(PHY_INTERFACE_MODE_INTERNAL, config->supported_interfaces);
+      __set_bit(PHY_INTERFACE_MODE_SGMII, config->supported_interfaces);
+      __set_bit(PHY_INTERFACE_MODE_1000BASEX, config->supported_interfaces);
+      __set_bit(PHY_INTERFACE_MODE_USXGMII, config->supported_interfaces);
+
+      // PCS required only for USXGMII
+      __set_bit(PHY_INTERFACE_MODE_USXGMII, config->pcs_interfaces);
+
+      phylink = phylink_create(config, //...
+
+Note that the phylink code takes care of allocating the array of
+:c:struct:`phylink_pcs` pointers passed to the ``fill_available_pcs``
+callback, based on the value set in ``num_possible_pcs`` in the
+:c:struct:`phylink_config` struct.
+
+The ``fill_available_pcs`` callback must not write more than
+``num_possible_pcs`` entries. The third argument may be used to validate
+that there is enough space to fill all the available PCS in the passed array
+of :c:struct:`phylink_pcs` pointers.
+
+The ``fill_available_pcs`` callback is called only on :c:func:`phylink_create`
+and is used only to compose the initial available PCS list. Ownership of PCS
+is held by phylink and :c:func:`phylink_release_pcs` should be used to release
+them.
+
+Writing PCS Drivers
+-------------------
+
+To write a PCS driver, first implement :c:struct:`phylink_pcs_ops`. Then,
+register your PCS in your probe function using :c:func:`fwnode_pcs_add_provider`.
+:c:func:`fwnode_pcs_add_provider` takes three arguments: the fwnode on which
+the PCS provider should be registered, a get function that returns the requested
+PCS based on ``#pcs-cells``, and a pointer to private data for the get
+function.
+
+The PCS is then added to a global list of PCS providers that the
+PCS fwnode implementation uses to look it up.
+
+For the simple case where the PCS driver exposes a single PCS,
+:c:func:`fwnode_pcs_simple_get` can be used as the get function.
+
+You must call :c:func:`fwnode_pcs_del_provider` from your remove function and
+release the PCS from any phylink instance under RTNL lock with
+:c:func:`phylink_release_pcs`::
+
+   fwnode_pcs_del_provider(dev_fwnode(&pdev->dev));
+
+   rtnl_lock();
+
+   for (i = 0; i < data->num_port; i++) {
+      struct pcs_port *port = &priv->ports[i];
+
+      phylink_release_pcs(&port->pcs);
+   }
+
+   rtnl_unlock();
+
+Late PCS registration handling
+------------------------------
+
+It's possible that a PCS becomes available after the MAC finished probing.
+Contrary to the usual producer/consumer implementation, when a PCS is not
+registered and can't be found, the fwnode parser helper returns ``-ENODEV``
+instead of ``-EPROBE_DEFER``.
+
+This prevents a race condition with particular devices that register the
+MAC and PCS over USB or PCIe and require the MAC to be registered before
+the PCS.
+
+The phylink logic handles this special case and keeps the phylink
+instance in a failure state until the PCS becomes available.
+
+The PCS fwnode implementation provides a notifier to which each phylink
+instance with a non-empty ``pcs_interfaces`` in :c:struct:`phylink_config`
+registers. When a new PCS provider is registered, the notifier is called,
+triggering the :c:func:`pcs_provider_notify` function.
+
+The :c:func:`pcs_provider_notify` function checks whether the newly added PCS
+should be used by the phylink instance. If so, the PCS is added to the
+internal list of available PCS and a phylink major config is
+forced.
+
+If a phylink instance was in a failure state and the newly added PCS is
+now part of phylink's internal list of available PCS, then, provided all
+other conditions are satisfied, the configuration is retried and the
+failure condition is cleared.
+
+API Reference
+-------------
+
+.. kernel-doc:: include/linux/phylink.h
+   :identifiers: phylink_pcs
+
+.. kernel-doc:: include/linux/pcs/pcs.h
+   :internal:
+
+.. kernel-doc:: include/linux/pcs/pcs-provider.h
+   :internal:
-- 
2.53.0



^ permalink raw reply related

* Re: [RFC PATCH 0/2] kasan: hw_tags: Add option to tag only at allocation time
From: Harry Yoo @ 2026-06-18 14:05 UTC (permalink / raw)
  To: Dev Jain, ryabinin.a.a, akpm, corbet
  Cc: glider, andreyknvl, dvyukov, vincenzo.frascino, kasan-dev,
	linux-mm, linux-kernel, skhan, workflows, linux-doc,
	linux-arm-kernel, ryan.roberts, anshuman.khandual, kaleshsingh,
	21cnbao, david, will, catalin.marinas
In-Reply-To: <b1502a60-09a1-4699-886b-93d041de7023@kernel.org>



On 6/18/26 10:35 PM, Harry Yoo wrote:
> 
> Hi Dev,
> 
> On 6/12/26 1:44 PM, Dev Jain wrote:
>> Introduce a boot option to tag only at allocation time of the objects. This
>> reduces KASAN MTE overhead, the tradeoff being reduced ability of
>> catching bugs.
> 
> I think most of the overhead when enabling MTE comes from loading and
> validating tags for every memory access (either in SYNC or ASYNC mode),
> rather than from storing tags.

Is there any reason not to use STGM instead of STG + DC GVA when
setting/clearing tags for large sizes when we know they are properly
aligned?

>> Now, when a memory object is freed, it retains the random tag it
>> had at allocation time. This compromises on catching UAF bugs until
>> the object is reallocated, at which point it gets a new
>> random tag.
>>
>> Hence, not catching "use-after-free-before-reallocation" and not catching
>> "double-free" will be the compromise for reduced KASAN overhead.
> 
> I doubt users who care about security enough to enable HW_TAGS KASAN
> are willing to compromise on security just to save a few instructions
> to store tags in the free path.
> 
> To me, it looks like too much of a compromise on security for little
> performance gain.
> 
>> This is an RFC because we are not clear about the performance benefit.
>>
>> Android folks, please help with testing!
>>
>> ---
>> Applies on Linus master (9716c086c8e8).
>>
>> Dev Jain (2):
>>   kasan: hw_tags: Use KASAN_PAGE_REDZONE for vmalloc redzoning
>>   kasan: hw_tags: Add boot option to elide free time poisoning
>>
>>  Documentation/dev-tools/kasan.rst |  4 +++
>>  mm/kasan/hw_tags.c                | 45 +++++++++++++++++++++++++++++--
>>  mm/kasan/kasan.h                  | 23 +++++++++++++++-
>>  3 files changed, 69 insertions(+), 3 deletions(-)
>>
> 

-- 
Cheers,
Harry / Hyeonggon



^ permalink raw reply

* [PATCH v2] i2c: sun6i-p2wi: Fix device node reference leak in p2wi_probe
From: Uday Khare @ 2026-06-18 14:05 UTC (permalink / raw)
  To: wens, andi.shyti, jernej.skrabec, samuel
  Cc: linux-i2c, linux-arm-kernel, linux-sunxi, linux-kernel,
	Uday Khare
In-Reply-To: <20260617194522.114984-1-udaykhare77@gmail.com>

In p2wi_probe(), the device node reference obtained via
of_get_next_available_child() is stored in childnp. This reference is
never released, causing a device node reference leak.

Fix this by declaring childnp with the __free(device_node) cleanup
attribute, which automatically releases the device node reference
when childnp goes out of scope.

Fixes: 3e833490fae5 ("i2c: sunxi: add P2WI (Push/Pull 2 Wire Interface) controller support")
Signed-off-by: Uday Khare <udaykhare77@gmail.com>
---
v2:
- Use __free(device_node) and include <linux/cleanup.h> to automate the device
  node reference cleanup instead of manually calling of_node_put() on error and
  success paths (suggested by Chen-Yu Tsai).
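
For readers unfamiliar with scope-based cleanup, the following is a minimal
userspace sketch of the mechanism, assuming GCC or Clang. The kernel's
__free() helper is built on the same compiler cleanup attribute. Every name
below (struct node, node_get(), node_put(), AUTO_NODE_PUT, probe_sketch()) is
hypothetical and only stands in for struct device_node, of_node_put() and
p2wi_probe(); it is not the kernel API.

```c
#include <stddef.h>

/* Hypothetical refcounted object standing in for struct device_node. */
struct node {
	int refcount;
};

/* Take a reference (NULL-safe), like of_get_next_available_child() would. */
static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

/* Drop a reference (NULL-safe), like of_node_put(). */
static void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/* The cleanup handler receives a pointer to the annotated variable. */
static void node_put_cleanup(struct node **np)
{
	node_put(*np);
}

/* Hypothetical macro playing the role of __free(device_node). */
#define AUTO_NODE_PUT __attribute__((cleanup(node_put_cleanup)))

/* Every return path drops the reference without an explicit node_put(). */
static int probe_sketch(struct node *first_child, int fail)
{
	struct node *child AUTO_NODE_PUT = node_get(first_child);

	if (!child)
		return 0;	/* no child: nothing to release */
	if (fail)
		return -1;	/* error path: reference dropped on scope exit */
	return 0;		/* success path: reference dropped on scope exit */
}
```

Because the handler runs whenever the variable goes out of scope, the error
and success paths need no manual put, which is why the variable is declared
at the point where the reference is taken.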

 drivers/i2c/busses/i2c-sun6i-p2wi.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-sun6i-p2wi.c b/drivers/i2c/busses/i2c-sun6i-p2wi.c
index dffbe776a195..8469a0ea98d7 100644
--- a/drivers/i2c/busses/i2c-sun6i-p2wi.c
+++ b/drivers/i2c/busses/i2c-sun6i-p2wi.c
@@ -21,6 +21,7 @@
  * PMIC).
  *
  */
+#include <linux/cleanup.h>
 #include <linux/clk.h>
 #include <linux/i2c.h>
 #include <linux/io.h>
@@ -184,7 +185,6 @@ static int p2wi_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
 	struct device_node *np = dev->of_node;
-	struct device_node *childnp;
 	unsigned long parent_clk_freq;
 	u32 clk_freq = I2C_MAX_STANDARD_MODE_FREQ;
 	struct p2wi *p2wi;
@@ -217,7 +217,9 @@ static int p2wi_probe(struct platform_device *pdev)
 	 * In this case the target_addr is set to -1 and won't be checked when
 	 * launching a P2WI transfer.
 	 */
-	childnp = of_get_next_available_child(np, NULL);
+	struct device_node *childnp __free(device_node) =
+		of_get_next_available_child(np, NULL);
+
 	if (childnp) {
 		ret = of_property_read_u32(childnp, "reg", &target_addr);
 		if (ret)
-- 
2.54.0



^ permalink raw reply related
