Linux-ARM-Kernel Archive on lore.kernel.org
* Re: [PATCH 1/7] dt-bindings: clock: qcom: Add video clock controller on Eliza SoC
From: Krzysztof Kozlowski @ 2026-04-07 13:40 UTC
  To: Taniya Das, Bjorn Andersson, Michael Turquette, Stephen Boyd,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Konrad Dybcio,
	Vladimir Zapolskiy
  Cc: Ajit Pandey, Imran Shaik, Jagadeesh Kona, linux-arm-msm,
	linux-clk, devicetree, linux-kernel, linux-arm-kernel
In-Reply-To: <20260317-eliza_mm_clock_controllers_v1-v1-1-4696eeda8cfb@oss.qualcomm.com>

On 17/03/2026 18:14, Taniya Das wrote:
> Add compatible string for Eliza video clock controller and the bindings
> for Eliza Qualcomm SoC.
> 
> Signed-off-by: Taniya Das <taniya.das@oss.qualcomm.com>
> ---
>  .../bindings/clock/qcom,sm8450-videocc.yaml        | 16 ++++++++++
>  include/dt-bindings/clock/qcom,eliza-videocc.h     | 37 ++++++++++++++++++++++
>  2 files changed, 53 insertions(+)
> 

I already suggested that this was not tested (and you never replied where
you tested it). I have now also checked, and this fails the checks: the
constraints are mismatched now.

Best regards,
Krzysztof




* Re: [PATCH V11 02/12] PCI: host-generic: Add common helpers for parsing Root Port properties
From: Manivannan Sadhasivam @ 2026-04-07 13:27 UTC
  To: Sherry Sun
  Cc: robh, krzk+dt, conor+dt, Frank.Li, s.hauer, kernel, festevam,
	lpieralisi, kwilczynski, bhelgaas, hongxing.zhu, l.stach, imx,
	linux-pci, linux-arm-kernel, devicetree, linux-kernel
In-Reply-To: <20260407104154.2842132-3-sherry.sun@nxp.com>

On Tue, Apr 07, 2026 at 06:41:44PM +0800, Sherry Sun wrote:
> Introduce generic helper functions to parse Root Port device tree nodes
> and extract common properties like reset GPIOs. This allows multiple
> PCI host controller drivers to share the same parsing logic.
> 
> Define struct pci_host_port to hold common Root Port properties
> (currently only reset GPIO descriptor) and add
> pci_host_common_parse_ports() to parse Root Port nodes from device tree.
> 
> Also add the 'ports' list to struct pci_host_bridge to better maintain
> parsed Root Port information.
> 
> Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
> ---
>  drivers/pci/controller/pci-host-common.c | 77 ++++++++++++++++++++++++
>  drivers/pci/controller/pci-host-common.h | 16 +++++
>  drivers/pci/probe.c                      |  1 +
>  include/linux/pci.h                      |  1 +
>  4 files changed, 95 insertions(+)
> 
> diff --git a/drivers/pci/controller/pci-host-common.c b/drivers/pci/controller/pci-host-common.c
> index d6258c1cffe5..0fb6991dde7b 100644
> --- a/drivers/pci/controller/pci-host-common.c
> +++ b/drivers/pci/controller/pci-host-common.c
> @@ -9,6 +9,7 @@
>  
>  #include <linux/kernel.h>
>  #include <linux/module.h>
> +#include <linux/gpio/consumer.h>
>  #include <linux/of.h>
>  #include <linux/of_address.h>
>  #include <linux/of_pci.h>
> @@ -17,6 +18,82 @@
>  
>  #include "pci-host-common.h"
>  
> +/**
> + * pci_host_common_delete_ports - Cleanup function for port list
> + * @data: Pointer to the port list head
> + */
> +void pci_host_common_delete_ports(void *data)
> +{
> +	struct list_head *ports = data;
> +	struct pci_host_port *port, *tmp;
> +
> +	list_for_each_entry_safe(port, tmp, ports, list)
> +		list_del(&port->list);
> +}
> +EXPORT_SYMBOL_GPL(pci_host_common_delete_ports);
> +
> +/**
> + * pci_host_common_parse_port - Parse a single Root Port node
> + * @dev: Device pointer
> + * @bridge: PCI host bridge
> + * @node: Device tree node of the Root Port
> + *
> + * Returns: 0 on success, negative error code on failure
> + */
> +static int pci_host_common_parse_port(struct device *dev,
> +				      struct pci_host_bridge *bridge,
> +				      struct device_node *node)
> +{
> +	struct pci_host_port *port;
> +	struct gpio_desc *reset;
> +
> +	reset = devm_fwnode_gpiod_get(dev, of_fwnode_handle(node),
> +				      "reset", GPIOD_ASIS, "PERST#");

Sorry, I missed this earlier.

Since PERST# is optional, checking for PERST# in the Root Port node alone cannot
reliably tell you whether the Root Port binding intentionally omitted the PERST#
GPIO or the legacy binding is in use.

So this helper should do 3 things:

1. If PERST# is found in the Root Port node, use it.
2. If not, check the RC node and, if PERST# is present there, return -ENOENT to
fall back to the legacy binding.
3. If it is found in neither node, assume that PERST# is not present in the
design, and proceed with parsing the Root Port binding further.
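
As a rough illustration of the three cases above, here is a user-space model
(not kernel code). The enum and function names are hypothetical, and the two
booleans merely stand in for the GPIO lookups on the Root Port and RC nodes:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome of looking for PERST# while parsing a Root Port. */
enum perst_source {
	PERST_FROM_ROOT_PORT,	/* case 1: use the GPIO from the Root Port node */
	PERST_ABSENT,		/* case 3: no PERST# in the design, keep parsing */
};

/*
 * Model of the three-way decision. Returns 0 and sets *src when the Root Port
 * binding should be used, or -ENOENT when the caller should fall back to the
 * legacy binding (case 2). *src is left untouched in the -ENOENT case.
 */
static int perst_lookup_model(bool in_root_port, bool in_rc,
			      enum perst_source *src)
{
	if (in_root_port) {
		*src = PERST_FROM_ROOT_PORT;
		return 0;
	}

	if (in_rc)
		return -ENOENT;

	*src = PERST_ABSENT;
	return 0;
}
```

The point of keeping case 3 distinct from case 2 is that only a PERST# in the
RC node signals the legacy binding; a missing PERST# on its own does not.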

But there is one more important limitation here. Right now, this API only
handles PERST#. If another vendor tries to use it and needs other properties
such as PHYs, clocks etc., this helper can only fetch those resources
optionally. If the controller has a hard dependency on those resources and they
are absent from the DT, the driver will fail to operate.

I don't think we can fix this limitation though. Those platforms should ensure
that the resource dependency is correctly modeled in the DT binding and that the
DTS is validated properly. It'd be good to mention this in the kernel-doc of
this API.

> +	if (IS_ERR(reset))
> +		return PTR_ERR(reset);
> +
> +	port = devm_kzalloc(dev, sizeof(*port), GFP_KERNEL);
> +	if (!port)
> +		return -ENOMEM;
> +
> +	port->reset = reset;
> +	INIT_LIST_HEAD(&port->list);
> +	list_add_tail(&port->list, &bridge->ports);
> +
> +	return 0;
> +}
> +
> +/**
> + * pci_host_common_parse_ports - Parse Root Port nodes from device tree
> + * @dev: Device pointer
> + * @bridge: PCI host bridge
> + *
> + * This function iterates through child nodes of the host bridge and parses
> + * Root Port properties (currently only reset GPIO).
> + *
> + * Returns: 0 on success, -ENOENT if no ports found, other negative error codes
> + * on failure
> + */
> +int pci_host_common_parse_ports(struct device *dev, struct pci_host_bridge *bridge)
> +{
> +	int ret = -ENOENT;
> +
> +	for_each_available_child_of_node_scoped(dev->of_node, of_port) {
> +		if (!of_node_is_type(of_port, "pci"))
> +			continue;
> +		ret = pci_host_common_parse_port(dev, bridge, of_port);
> +		if (ret)
> +			return ret;

As Sashiko flagged, you need to make sure that devm_add_action_or_reset() is
added even during the error path:
https://sashiko.dev/#/patchset/20260407104154.2842132-1-sherry.sun%40nxp.com?part=2
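
One way to read this, sketched as a user-space model rather than kernel code:
whatever was linked onto the list before a failure has to be unlinked again
before returning. All names here are hypothetical, and model_parse_one() only
simulates a per-port parse that fails at a chosen index:

```c
#include <errno.h>
#include <stddef.h>

struct model_port {
	struct model_port *next;
};

struct model_bridge {
	struct model_port *ports;	/* stand-in for bridge->ports */
};

/* Unlink every port; stands in for pci_host_common_delete_ports(). */
static void model_delete_ports(struct model_bridge *bridge)
{
	while (bridge->ports) {
		struct model_port *port = bridge->ports;

		bridge->ports = port->next;
		port->next = NULL;
	}
}

/* Simulated per-port parse: fails with -EINVAL when idx == fail_at. */
static int model_parse_one(struct model_bridge *bridge,
			   struct model_port *port,
			   size_t idx, size_t fail_at)
{
	if (idx == fail_at)
		return -EINVAL;

	port->next = bridge->ports;
	bridge->ports = port;
	return 0;
}

/* Parse n ports; on any failure, undo the partial list before returning. */
static int model_parse_ports(struct model_bridge *bridge,
			     struct model_port *ports,
			     size_t n, size_t fail_at)
{
	int ret = -ENOENT;

	for (size_t i = 0; i < n; i++) {
		ret = model_parse_one(bridge, &ports[i], i, fail_at);
		if (ret) {
			model_delete_ports(bridge);
			return ret;
		}
	}

	return ret;
}
```

In the kernel, the equivalent would presumably be either registering the devm
action before the loop or calling the cleanup on the error path; which of the
two fits this series better is not something the model decides.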

- Mani

> +	}
> +
> +	if (ret)
> +		return ret;
> +
> +	return devm_add_action_or_reset(dev, pci_host_common_delete_ports,
> +					&bridge->ports);
> +}
> +EXPORT_SYMBOL_GPL(pci_host_common_parse_ports);
> +
>  static void gen_pci_unmap_cfg(void *ptr)
>  {
>  	pci_ecam_free((struct pci_config_window *)ptr);
> diff --git a/drivers/pci/controller/pci-host-common.h b/drivers/pci/controller/pci-host-common.h
> index b5075d4bd7eb..37714bedb625 100644
> --- a/drivers/pci/controller/pci-host-common.h
> +++ b/drivers/pci/controller/pci-host-common.h
> @@ -12,6 +12,22 @@
>  
>  struct pci_ecam_ops;
>  
> +/**
> + * struct pci_host_port - Generic Root Port properties
> + * @list: List node for linking multiple ports
> + * @reset: GPIO descriptor for PERST# signal
> + *
> + * This structure contains common properties that can be parsed from
> + * Root Port device tree nodes.
> + */
> +struct pci_host_port {
> +	struct list_head	list;
> +	struct gpio_desc	*reset;
> +};
> +
> +void pci_host_common_delete_ports(void *data);
> +int pci_host_common_parse_ports(struct device *dev, struct pci_host_bridge *bridge);
> +
>  int pci_host_common_probe(struct platform_device *pdev);
>  int pci_host_common_init(struct platform_device *pdev,
>  			 struct pci_host_bridge *bridge,
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index eaa4a3d662e8..629ae08b7d35 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -677,6 +677,7 @@ static void pci_init_host_bridge(struct pci_host_bridge *bridge)
>  {
>  	INIT_LIST_HEAD(&bridge->windows);
>  	INIT_LIST_HEAD(&bridge->dma_ranges);
> +	INIT_LIST_HEAD(&bridge->ports);
>  
>  	/*
>  	 * We assume we can manage these PCIe features.  Some systems may
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 8f63de38f2d2..a73ea81ce88f 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -636,6 +636,7 @@ struct pci_host_bridge {
>  	int		domain_nr;
>  	struct list_head windows;	/* resource_entry */
>  	struct list_head dma_ranges;	/* dma ranges resource list */
> +	struct list_head ports;		/* Root Port list (pci_host_port) */
>  #ifdef CONFIG_PCI_IDE
>  	u16 nr_ide_streams; /* Max streams possibly active in @ide_stream_ida */
>  	struct ida ide_stream_ida;
> -- 
> 2.37.1
> 

-- 
மணிவண்ணன் சதாசிவம்



* [PATCH 10/10] arm64: Check DAIF (and PMR) at task-switch time
From: Mark Rutland @ 2026-04-07 13:16 UTC
  To: linux-arm-kernel, Catalin Marinas, Will Deacon
  Cc: mark.rutland, vladimir.murzin, peterz, ruanjinjie, linux-kernel,
	tglx, luto
In-Reply-To: <20260407131650.3813777-1-mark.rutland@arm.com>

When __switch_to() switches from a 'prev' task to a 'next' task, various
pieces of CPU state are expected to have specific values, such that
these do not need to be saved/restored. If any of these hold an
unexpected value when switching away from the prev task, they could lead
to surprising behaviour in the context of the next task, and it would be
difficult to determine where they were configured to their unexpected
value.

Add some checks for DAIF and PMR at task-switch time so that we can
detect such issues.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/process.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
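
The expectation being checked can be stated compactly as a pure function. The
sketch below is a user-space model, not the kernel code: the MODEL_ names are
hypothetical, the DAIF bit positions follow the architectural layout (I = bit 7,
F = bit 6), and the expected PMR value is passed in by the caller rather than
assumed, since the value of GIC_PRIO_IRQOFF is not shown in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PSR_F_BIT	(UINT64_C(1) << 6)
#define MODEL_PSR_I_BIT	(UINT64_C(1) << 7)

/* Assumed to mirror DAIF_PROCCTX_NOIRQ: IRQ and FIQ masked, D and A clear. */
#define MODEL_DAIF_PROCCTX_NOIRQ	(MODEL_PSR_I_BIT | MODEL_PSR_F_BIT)

struct model_switch_state {
	bool uses_prio_masking;
	uint64_t daif;
	uint64_t pmr;		/* only meaningful with priority masking */
};

/*
 * Returns true when the state matches what the task-switch check expects:
 * with priority masking, DAIF fully clear and PMR at the IRQ-off priority;
 * without it, DAIF with only I and F set.
 */
static bool model_switch_state_ok(const struct model_switch_state *s,
				  uint64_t pmr_irqoff)
{
	if (s->uses_prio_masking)
		return s->daif == 0 && s->pmr == pmr_irqoff;

	return s->daif == MODEL_DAIF_PROCCTX_NOIRQ;
}
```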

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 489554931231e..ba9038434d2fb 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -699,6 +699,29 @@ void update_sctlr_el1(u64 sctlr)
 	isb();
 }
 
+static inline void debug_switch_state(void)
+{
+	if (system_uses_irq_prio_masking()) {
+		unsigned long daif_expected = 0;
+		unsigned long daif_actual = read_sysreg(daif);
+		unsigned long pmr_expected = GIC_PRIO_IRQOFF;
+		unsigned long pmr_actual = read_sysreg_s(SYS_ICC_PMR_EL1);
+
+		WARN_ONCE(daif_actual != daif_expected ||
+			  pmr_actual != pmr_expected,
+			  "Unexpected DAIF + PMR: 0x%lx + 0x%lx (expected 0x%lx + 0x%lx)\n",
+			  daif_actual, pmr_actual,
+			  daif_expected, pmr_expected);
+	} else {
+		unsigned long daif_expected = DAIF_PROCCTX_NOIRQ;
+		unsigned long daif_actual = read_sysreg(daif);
+
+		WARN_ONCE(daif_actual != daif_expected,
+			  "Unexpected DAIF value: 0x%lx (expected 0x%lx)\n",
+			  daif_actual, daif_expected);
+	}
+}
+
 /*
  * Thread switching.
  */
@@ -708,6 +731,8 @@ struct task_struct *__switch_to(struct task_struct *prev,
 {
 	struct task_struct *last;
 
+	debug_switch_state();
+
 	fpsimd_thread_switch(next);
 	tls_thread_switch(next);
 	hw_breakpoint_thread_switch(next);
-- 
2.30.2




* [PATCH 09/10] arm64: entry: Use split preemption logic
From: Mark Rutland @ 2026-04-07 13:16 UTC
  To: linux-arm-kernel, Catalin Marinas, Will Deacon
  Cc: mark.rutland, vladimir.murzin, peterz, ruanjinjie, linux-kernel,
	tglx, luto
In-Reply-To: <20260407131650.3813777-1-mark.rutland@arm.com>

The generic irqentry code now provides
irqentry_exit_to_kernel_mode_preempt() and
irqentry_exit_to_kernel_mode_after_preempt(), which can be used
where architectures have different state requirements for involuntary
preemption and exception return, as is the case on arm64.

Use the new functions on arm64, aligning our exit to kernel mode logic
with the style of our exit to user mode logic. This removes the need for
the recently-added bodge in arch_irqentry_exit_need_resched(), and
allows preemption to occur when returning from any exception taken from
kernel mode, which is nicer for RT.

In an ideal world, we'd remove arch_irqentry_exit_need_resched(), and
fold the conditionality directly into the architecture-specific entry
code. That way all the logic necessary to avoid preempting from a
pseudo-NMI could be constrained specifically to the EL1 IRQ/FIQ paths,
avoiding redundant work for other exceptions, and making the flow a bit
clearer. At present it looks like that would require a larger
refactoring (e.g. for the PREEMPT_DYNAMIC logic), and so I've left that
as-is for now.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/entry-common.h | 21 ++++++++-------------
 arch/arm64/kernel/entry-common.c      | 12 ++++--------
 2 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/entry-common.h b/arch/arm64/include/asm/entry-common.h
index 20f0a7c7bde15..cab8cd78f6938 100644
--- a/arch/arm64/include/asm/entry-common.h
+++ b/arch/arm64/include/asm/entry-common.h
@@ -29,19 +29,14 @@ static __always_inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
 
 static inline bool arch_irqentry_exit_need_resched(void)
 {
-	if (system_uses_irq_prio_masking()) {
-		/*
-		 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
-		 * priority masking is used the GIC irqchip driver will clear DAIF.IF
-		 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
-		 * DAIF we must have handled an NMI, so skip preemption.
-		 */
-		if (read_sysreg(daif))
-			return false;
-	} else {
-		if (read_sysreg(daif) & (PSR_D_BIT | PSR_A_BIT))
-			return false;
-	}
+	/*
+	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
+	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
+	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
+	 * DAIF we must have handled an NMI, so skip preemption.
+	 */
+	if (system_uses_irq_prio_masking() && read_sysreg(daif))
+		return false;
 
 	/*
 	 * Preempting a task from an IRQ means we leave copies of PSTATE
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 16a65987a6a9b..f42ce7b5c67f3 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -54,8 +54,11 @@ static noinstr irqentry_state_t arm64_enter_from_kernel_mode(struct pt_regs *reg
 static void noinstr arm64_exit_to_kernel_mode(struct pt_regs *regs,
 					      irqentry_state_t state)
 {
+	local_irq_disable();
+	irqentry_exit_to_kernel_mode_preempt(regs, state);
+	local_daif_mask();
 	mte_check_tfsr_exit();
-	irqentry_exit_to_kernel_mode(regs, state);
+	irqentry_exit_to_kernel_mode_after_preempt(regs, state);
 }
 
 /*
@@ -301,7 +304,6 @@ static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
 	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_mem_abort(far, esr, regs);
-	local_daif_mask();
 	arm64_exit_to_kernel_mode(regs, state);
 }
 
@@ -313,7 +315,6 @@ static void noinstr el1_pc(struct pt_regs *regs, unsigned long esr)
 	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_sp_pc_abort(far, esr, regs);
-	local_daif_mask();
 	arm64_exit_to_kernel_mode(regs, state);
 }
 
@@ -324,7 +325,6 @@ static void noinstr el1_undef(struct pt_regs *regs, unsigned long esr)
 	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_undef(regs, esr);
-	local_daif_mask();
 	arm64_exit_to_kernel_mode(regs, state);
 }
 
@@ -335,7 +335,6 @@ static void noinstr el1_bti(struct pt_regs *regs, unsigned long esr)
 	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_bti(regs, esr);
-	local_daif_mask();
 	arm64_exit_to_kernel_mode(regs, state);
 }
 
@@ -346,7 +345,6 @@ static void noinstr el1_gcs(struct pt_regs *regs, unsigned long esr)
 	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_gcs(regs, esr);
-	local_daif_mask();
 	arm64_exit_to_kernel_mode(regs, state);
 }
 
@@ -357,7 +355,6 @@ static void noinstr el1_mops(struct pt_regs *regs, unsigned long esr)
 	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_mops(regs, esr);
-	local_daif_mask();
 	arm64_exit_to_kernel_mode(regs, state);
 }
 
@@ -423,7 +420,6 @@ static void noinstr el1_fpac(struct pt_regs *regs, unsigned long esr)
 	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_fpac(regs, esr);
-	local_daif_mask();
 	arm64_exit_to_kernel_mode(regs, state);
 }
 
-- 
2.30.2




* [PATCH 08/10] arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode()
From: Mark Rutland @ 2026-04-07 13:16 UTC
  To: linux-arm-kernel, Catalin Marinas, Will Deacon
  Cc: mark.rutland, vladimir.murzin, peterz, ruanjinjie, linux-kernel,
	tglx, luto
In-Reply-To: <20260407131650.3813777-1-mark.rutland@arm.com>

The generic irqentry code now provides irqentry_enter_from_kernel_mode()
and irqentry_exit_to_kernel_mode(), which can be used when an exception
is known to be taken from kernel mode. These can be inlined into
architecture-specific entry code, and avoid redundant work to test
whether the exception was taken from user mode.

Use these in arm64_enter_from_kernel_mode() and
arm64_exit_to_kernel_mode(), which are only used for exceptions known to
be taken from kernel mode. This will remove a small amount of redundant
work, and will permit further changes to arm64_exit_to_kernel_mode() in
subsequent patches.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/entry-common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 3d01cdacdc7a2..16a65987a6a9b 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -39,7 +39,7 @@ static noinstr irqentry_state_t arm64_enter_from_kernel_mode(struct pt_regs *reg
 {
 	irqentry_state_t state;
 
-	state = irqentry_enter(regs);
+	state = irqentry_enter_from_kernel_mode(regs);
 	mte_check_tfsr_entry();
 	mte_disable_tco_entry(current);
 
@@ -55,7 +55,7 @@ static void noinstr arm64_exit_to_kernel_mode(struct pt_regs *regs,
 					      irqentry_state_t state)
 {
 	mte_check_tfsr_exit();
-	irqentry_exit(regs, state);
+	irqentry_exit_to_kernel_mode(regs, state);
 }
 
 /*
-- 
2.30.2




* [PATCH 07/10] arm64: entry: Consistently prefix arm64-specific wrappers
From: Mark Rutland @ 2026-04-07 13:16 UTC
  To: linux-arm-kernel, Catalin Marinas, Will Deacon
  Cc: mark.rutland, vladimir.murzin, peterz, ruanjinjie, linux-kernel,
	tglx, luto
In-Reply-To: <20260407131650.3813777-1-mark.rutland@arm.com>

For historical reasons, arm64's entry code has arm64-specific functions
named enter_from_kernel_mode() and exit_to_kernel_mode(), which are
wrappers for similarly-named functions from the generic irqentry code.
Other arm64-specific wrappers have an 'arm64_' prefix to clearly
distinguish them from their generic counterparts, e.g.
arm64_enter_from_user_mode() and arm64_exit_to_user_mode().

For consistency and clarity, add an 'arm64_' prefix to these functions.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/entry-common.c | 38 ++++++++++++++++----------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 3625797e9ee8f..3d01cdacdc7a2 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -35,7 +35,7 @@
  * Before this function is called it is not safe to call regular kernel code,
  * instrumentable code, or any code which may trigger an exception.
  */
-static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
+static noinstr irqentry_state_t arm64_enter_from_kernel_mode(struct pt_regs *regs)
 {
 	irqentry_state_t state;
 
@@ -51,8 +51,8 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
  * After this function returns it is not safe to call regular kernel code,
  * instrumentable code, or any code which may trigger an exception.
  */
-static void noinstr exit_to_kernel_mode(struct pt_regs *regs,
-					irqentry_state_t state)
+static void noinstr arm64_exit_to_kernel_mode(struct pt_regs *regs,
+					      irqentry_state_t state)
 {
 	mte_check_tfsr_exit();
 	irqentry_exit(regs, state);
@@ -298,11 +298,11 @@ static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
 	unsigned long far = read_sysreg(far_el1);
 	irqentry_state_t state;
 
-	state = enter_from_kernel_mode(regs);
+	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_mem_abort(far, esr, regs);
 	local_daif_mask();
-	exit_to_kernel_mode(regs, state);
+	arm64_exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_pc(struct pt_regs *regs, unsigned long esr)
@@ -310,55 +310,55 @@ static void noinstr el1_pc(struct pt_regs *regs, unsigned long esr)
 	unsigned long far = read_sysreg(far_el1);
 	irqentry_state_t state;
 
-	state = enter_from_kernel_mode(regs);
+	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_sp_pc_abort(far, esr, regs);
 	local_daif_mask();
-	exit_to_kernel_mode(regs, state);
+	arm64_exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_undef(struct pt_regs *regs, unsigned long esr)
 {
 	irqentry_state_t state;
 
-	state = enter_from_kernel_mode(regs);
+	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_undef(regs, esr);
 	local_daif_mask();
-	exit_to_kernel_mode(regs, state);
+	arm64_exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_bti(struct pt_regs *regs, unsigned long esr)
 {
 	irqentry_state_t state;
 
-	state = enter_from_kernel_mode(regs);
+	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_bti(regs, esr);
 	local_daif_mask();
-	exit_to_kernel_mode(regs, state);
+	arm64_exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_gcs(struct pt_regs *regs, unsigned long esr)
 {
 	irqentry_state_t state;
 
-	state = enter_from_kernel_mode(regs);
+	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_gcs(regs, esr);
 	local_daif_mask();
-	exit_to_kernel_mode(regs, state);
+	arm64_exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_mops(struct pt_regs *regs, unsigned long esr)
 {
 	irqentry_state_t state;
 
-	state = enter_from_kernel_mode(regs);
+	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_mops(regs, esr);
 	local_daif_mask();
-	exit_to_kernel_mode(regs, state);
+	arm64_exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_breakpt(struct pt_regs *regs, unsigned long esr)
@@ -420,11 +420,11 @@ static void noinstr el1_fpac(struct pt_regs *regs, unsigned long esr)
 {
 	irqentry_state_t state;
 
-	state = enter_from_kernel_mode(regs);
+	state = arm64_enter_from_kernel_mode(regs);
 	local_daif_inherit(regs);
 	do_el1_fpac(regs, esr);
 	local_daif_mask();
-	exit_to_kernel_mode(regs, state);
+	arm64_exit_to_kernel_mode(regs, state);
 }
 
 asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
@@ -491,13 +491,13 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
 {
 	irqentry_state_t state;
 
-	state = enter_from_kernel_mode(regs);
+	state = arm64_enter_from_kernel_mode(regs);
 
 	irq_enter_rcu();
 	do_interrupt_handler(regs, handler);
 	irq_exit_rcu();
 
-	exit_to_kernel_mode(regs, state);
+	arm64_exit_to_kernel_mode(regs, state);
 }
 static void noinstr el1_interrupt(struct pt_regs *regs,
 				  void (*handler)(struct pt_regs *))
-- 
2.30.2




* [PATCH 06/10] arm64: entry: Don't preempt with SError or Debug masked
From: Mark Rutland @ 2026-04-07 13:16 UTC
  To: linux-arm-kernel, Catalin Marinas, Will Deacon
  Cc: mark.rutland, vladimir.murzin, peterz, ruanjinjie, linux-kernel,
	tglx, luto
In-Reply-To: <20260407131650.3813777-1-mark.rutland@arm.com>

On arm64, involuntary kernel preemption has been subtly broken since the
move to the generic irqentry code. When preemption occurs, the new task
may run with SError and Debug exceptions masked unexpectedly, leading to
a loss of RAS events, breakpoints, watchpoints, and single-step
exceptions.

Prior to moving to the generic irqentry code, involuntary preemption of
kernel mode would only occur when returning from regular interrupts, in
a state where interrupts were masked and all other arm64-specific
exceptions (SError, Debug, and pseudo-NMI) were unmasked. This is the
only state in which it is valid to switch tasks.

As part of moving to the generic irqentry code, the involuntary
preemption logic was moved such that involuntary preemption could occur
when returning from any (non-NMI) exception. As most exception handlers
mask all arm64-specific exceptions before this point, preemption could
occur in a state where arm64-specific exceptions were masked. This is
not a valid state in which to switch tasks, and resulted in the loss of the
exceptions described above.

As a temporary bodge, avoid the loss of exceptions by avoiding
involuntary preemption when SError and/or Debug exceptions are masked.
Practically speaking this means that involuntary preemption will only
occur when returning from regular interrupts, as was the case before
moving to the generic irqentry code.

Fixes: 99eb057ccd67 ("arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode()")
Reported-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/entry-common.h | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)
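
For readers following the logic outside the kernel tree, the masking check
added below can be modelled in plain C. This is only a sketch: the bit
positions are the architectural DAIF layout (D = bit 9, A = bit 8, I = bit 7,
F = bit 6), while the MODEL_ names and the function are hypothetical stand-ins,
and the later checks in arch_irqentry_exit_need_resched() are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PSR_F_BIT	(UINT64_C(1) << 6)
#define MODEL_PSR_I_BIT	(UINT64_C(1) << 7)
#define MODEL_PSR_A_BIT	(UINT64_C(1) << 8)
#define MODEL_PSR_D_BIT	(UINT64_C(1) << 9)

/*
 * Returns false when the DAIF check would skip preemption.
 * With priority masking, any DAIF bit being set means an NMI was handled.
 * Without it, only a masked Debug (D) or SError (A) blocks preemption.
 */
static bool model_daif_allows_preempt(bool uses_prio_masking, uint64_t daif)
{
	if (uses_prio_masking)
		return daif == 0;

	return !(daif & (MODEL_PSR_D_BIT | MODEL_PSR_A_BIT));
}
```

Returning from a regular interrupt without priority masking leaves only I and F
set, so that case still permits preemption, matching the behaviour described in
the commit message.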

diff --git a/arch/arm64/include/asm/entry-common.h b/arch/arm64/include/asm/entry-common.h
index cab8cd78f6938..20f0a7c7bde15 100644
--- a/arch/arm64/include/asm/entry-common.h
+++ b/arch/arm64/include/asm/entry-common.h
@@ -29,14 +29,19 @@ static __always_inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
 
 static inline bool arch_irqentry_exit_need_resched(void)
 {
-	/*
-	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
-	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
-	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
-	 * DAIF we must have handled an NMI, so skip preemption.
-	 */
-	if (system_uses_irq_prio_masking() && read_sysreg(daif))
-		return false;
+	if (system_uses_irq_prio_masking()) {
+		/*
+		 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
+		 * priority masking is used the GIC irqchip driver will clear DAIF.IF
+		 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
+		 * DAIF we must have handled an NMI, so skip preemption.
+		 */
+		if (read_sysreg(daif))
+			return false;
+	} else {
+		if (read_sysreg(daif) & (PSR_D_BIT | PSR_A_BIT))
+			return false;
+	}
 
 	/*
 	 * Preempting a task from an IRQ means we leave copies of PSTATE
-- 
2.30.2




* [PATCH 05/10] entry: Split preemption from irqentry_exit_to_kernel_mode()
From: Mark Rutland @ 2026-04-07 13:16 UTC
  To: linux-arm-kernel, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner
  Cc: mark.rutland, vladimir.murzin, catalin.marinas, ruanjinjie,
	linux-kernel, will
In-Reply-To: <20260407131650.3813777-1-mark.rutland@arm.com>

Some architecture-specific work needs to be performed between the state
management for exception entry/exit and the "real" work to handle the
exception. For example, arm64 needs to manipulate a number of exception
masking bits, with different exceptions requiring different masking.

Generally this can all be hidden in the architecture code, but for arm64
the current structure of irqentry_exit_to_kernel_mode() makes this
particularly difficult to handle in a way that is correct, maintainable,
and efficient.

The gory details are described in the thread surrounding:

  https://lore.kernel.org/lkml/acPAzdtjK5w-rNqC@J2N7QTR9R3/

The summary is:

* Currently, irqentry_exit_to_kernel_mode() handles both involuntary
  preemption AND state management necessary for exception return.

* When scheduling (including involuntary preemption), arm64 needs to
  have all arm64-specific exceptions unmasked, though regular interrupts
  must be masked.

* Prior to the state management for exception return, arm64 needs to
  mask a number of arm64-specific exceptions, and perform some work with
  these exceptions masked (with RCU watching, etc).

While in theory it is possible to handle this with a new arch_*() hook
called somewhere under irqentry_exit_to_kernel_mode(), this is fragile
and complicated, and doesn't match the flow used for exception return to
user mode, which has a separate 'prepare' step (where preemption can
occur) prior to the state management.

To solve this, refactor irqentry_exit_to_kernel_mode() to match the
style of {irqentry,syscall}_exit_to_user_mode(), moving preemption logic
into a new irqentry_exit_to_kernel_mode_preempt() function, and moving
state management into a new irqentry_exit_to_kernel_mode_after_preempt()
function. The existing irqentry_exit_to_kernel_mode() is left as a
caller of both of these, avoiding the need to modify existing callers.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 include/linux/irq-entry-common.h | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

Thomas/Peter/Andy, as mentioned on IRC, I haven't created kerneldoc
comments for these new functions because the existing comments don't
seem all that consistent (e.g. for user mode vs kernel mode), and I
suspect we want to rewrite them all in one go for wider consistency.

I'm happy to respin this, or to follow-up with that as per your
preference.
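
In case it helps review, the condition under which the new preempt step
attempts a reschedule can be modelled as a pure function. This is a user-space
sketch with hypothetical names; the three booleans stand in for
regs_irqs_disabled(regs), state.exit_rcu and IS_ENABLED(CONFIG_PREEMPTION):

```c
#include <stdbool.h>

/*
 * Model of the gating in irqentry_exit_to_kernel_mode_preempt(): no
 * reschedule attempt if the interrupted context had IRQs disabled or if RCU
 * was not watching on entry; otherwise only when preemption is configured.
 */
static bool model_tries_resched(bool regs_irqs_disabled, bool exit_rcu,
				bool preemption_enabled)
{
	if (regs_irqs_disabled || exit_rcu)
		return false;

	return preemption_enabled;
}
```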

Mark.

diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
index 2206150e526d8..24830baa539c6 100644
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -421,10 +421,18 @@ static __always_inline irqentry_state_t irqentry_enter_from_kernel_mode(struct p
 	return ret;
 }
 
-static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, irqentry_state_t state)
+static inline void irqentry_exit_to_kernel_mode_preempt(struct pt_regs *regs, irqentry_state_t state)
 {
-	lockdep_assert_irqs_disabled();
+	if (regs_irqs_disabled(regs) || state.exit_rcu)
+		return;
+
+	if (IS_ENABLED(CONFIG_PREEMPTION))
+		irqentry_exit_cond_resched();
+}
 
+static __always_inline void
+irqentry_exit_to_kernel_mode_after_preempt(struct pt_regs *regs, irqentry_state_t state)
+{
 	if (!regs_irqs_disabled(regs)) {
 		/*
 		 * If RCU was not watching on entry this needs to be done
@@ -443,9 +451,6 @@ static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, i
 		}
 
 		instrumentation_begin();
-		if (IS_ENABLED(CONFIG_PREEMPTION))
-			irqentry_exit_cond_resched();
-
 		/* Covers both tracing and lockdep */
 		trace_hardirqs_on();
 		instrumentation_end();
@@ -459,6 +464,17 @@ static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, i
 	}
 }
 
+static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, irqentry_state_t state)
+{
+	lockdep_assert_irqs_disabled();
+
+	instrumentation_begin();
+	irqentry_exit_to_kernel_mode_preempt(regs, state);
+	instrumentation_end();
+
+	irqentry_exit_to_kernel_mode_after_preempt(regs, state);
+}
+
 /**
  * irqentry_enter - Handle state tracking on ordinary interrupt entries
  * @regs:	Pointer to pt_regs of interrupted context
-- 
2.30.2



^ permalink raw reply related

* [PATCH 04/10] entry: Split kernel mode logic from irqentry_{enter,exit}()
From: Mark Rutland @ 2026-04-07 13:16 UTC (permalink / raw)
  To: linux-arm-kernel, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner
  Cc: mark.rutland, vladimir.murzin, catalin.marinas, ruanjinjie,
	linux-kernel, will
In-Reply-To: <20260407131650.3813777-1-mark.rutland@arm.com>

The generic irqentry code has entry/exit functions specifically for
exceptions taken from user mode, but doesn't have entry/exit functions
specifically for exceptions taken from kernel mode.

It would be helpful to have separate entry/exit functions specifically
for exceptions taken from kernel mode. This would make the structure of
the entry code more consistent, and would make it easier for
architectures to manage logic specific to exceptions taken from kernel
mode.

Move the logic specific to kernel mode out of irqentry_enter() and
irqentry_exit() into new irqentry_enter_from_kernel_mode() and
irqentry_exit_to_kernel_mode() functions. These are marked
__always_inline and placed in irq-entry-common.h, as with
irqentry_enter_from_user_mode() and irqentry_exit_to_user_mode(), so
that they can be inlined into architecture-specific wrappers. The
existing out-of-line irqentry_enter() and irqentry_exit() functions are
retained as callers of the new functions.

The lockdep assertion from irqentry_exit() is moved into
irqentry_exit_to_user_mode() and irqentry_exit_to_kernel_mode(). This
was previously missing from irqentry_exit_to_user_mode() when called
directly, and any new lockdep assertion failure resulting from this
change is a latent bug.

Aside from the lockdep change noted above, there should be no functional
change as a result of this patch.
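
To make the resulting structure concrete, here is a minimal userspace
sketch of the dispatch described above. It is a model under stated
assumptions, not the kernel implementation: the types are stand-ins, the
function bodies are reduced to the lockdep check, and
arch_exit_to_kernel_model() is a hypothetical architecture wrapper that
does not exist in the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for the kernel types (assumptions for the model). */
struct pt_regs { bool user_mode; };
typedef struct { bool exit_rcu; } irqentry_state_t;

static bool irqs_disabled = true;	/* models the CPU's IRQ mask */
static int lockdep_checks;		/* how many times the check ran */

/* Stand-in for the real lockdep assertion. */
static void lockdep_assert_irqs_disabled(void)
{
	lockdep_checks++;
	assert(irqs_disabled);
}

/* After the patch, both inline exit helpers perform the check. */
static inline void irqentry_exit_to_user_mode(const struct pt_regs *regs)
{
	(void)regs;
	lockdep_assert_irqs_disabled();
}

static inline void irqentry_exit_to_kernel_mode(const struct pt_regs *regs,
						irqentry_state_t state)
{
	(void)regs;
	(void)state;
	lockdep_assert_irqs_disabled();
}

/* The out-of-line function is now only a dispatcher. */
void irqentry_exit(const struct pt_regs *regs, irqentry_state_t state)
{
	if (regs->user_mode)
		irqentry_exit_to_user_mode(regs);
	else
		irqentry_exit_to_kernel_mode(regs, state);
}

/*
 * Hypothetical architecture wrapper: it already knows the exception was
 * taken from kernel mode, so it can call the inline helper directly.
 */
void arch_exit_to_kernel_model(const struct pt_regs *regs,
			       irqentry_state_t state)
{
	irqentry_exit_to_kernel_mode(regs, state);
}

int lockdep_check_count(void)
{
	return lockdep_checks;
}
```

In this model the check runs once per exit regardless of whether the
caller goes through irqentry_exit() or calls a mode-specific helper
directly, which is the property the lockdep change is after.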

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 include/linux/irq-entry-common.h | 103 +++++++++++++++++++++++++++++++
 kernel/entry/common.c            | 103 +++----------------------------
 2 files changed, 111 insertions(+), 95 deletions(-)

Thomas/Peter/Andy, as mentioned on IRC, I haven't created kerneldoc
comments for these new functions because the existing comments don't
seem all that consistent (e.g. for user mode vs kernel mode), and I
suspect we want to rewrite them all in one go for wider consistency.

I'm happy to respin this, or to follow-up with that as per your
preference.

Mark.

diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
index d1e8591a59195..2206150e526d8 100644
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -304,6 +304,8 @@ static __always_inline void irqentry_enter_from_user_mode(struct pt_regs *regs)
  */
 static __always_inline void irqentry_exit_to_user_mode(struct pt_regs *regs)
 {
+	lockdep_assert_irqs_disabled();
+
 	instrumentation_begin();
 	irqentry_exit_to_user_mode_prepare(regs);
 	instrumentation_end();
@@ -356,6 +358,107 @@ void dynamic_irqentry_exit_cond_resched(void);
 #define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
 #endif /* CONFIG_PREEMPT_DYNAMIC */
 
+static __always_inline irqentry_state_t irqentry_enter_from_kernel_mode(struct pt_regs *regs)
+{
+	irqentry_state_t ret = {
+		.exit_rcu = false,
+	};
+
+	/*
+	 * If this entry hit the idle task invoke ct_irq_enter() whether
+	 * RCU is watching or not.
+	 *
+	 * Interrupts can nest when the first interrupt invokes softirq
+	 * processing on return which enables interrupts.
+	 *
+	 * Scheduler ticks in the idle task can mark quiescent state and
+	 * terminate a grace period, if and only if the timer interrupt is
+	 * not nested into another interrupt.
+	 *
+	 * Checking for rcu_is_watching() here would prevent the nesting
+	 * interrupt to invoke ct_irq_enter(). If that nested interrupt is
+	 * the tick then rcu_flavor_sched_clock_irq() would wrongfully
+	 * assume that it is the first interrupt and eventually claim
+	 * quiescent state and end grace periods prematurely.
+	 *
+	 * Unconditionally invoke ct_irq_enter() so RCU state stays
+	 * consistent.
+	 *
+	 * TINY_RCU does not support EQS, so let the compiler eliminate
+	 * this part when enabled.
+	 */
+	if (!IS_ENABLED(CONFIG_TINY_RCU) &&
+	    (is_idle_task(current) || arch_in_rcu_eqs())) {
+		/*
+		 * If RCU is not watching then the same careful
+		 * sequence vs. lockdep and tracing is required
+		 * as in irqentry_enter_from_user_mode().
+		 */
+		lockdep_hardirqs_off(CALLER_ADDR0);
+		ct_irq_enter();
+		instrumentation_begin();
+		kmsan_unpoison_entry_regs(regs);
+		trace_hardirqs_off_finish();
+		instrumentation_end();
+
+		ret.exit_rcu = true;
+		return ret;
+	}
+
+	/*
+	 * If RCU is watching then RCU only wants to check whether it needs
+	 * to restart the tick in NOHZ mode. rcu_irq_enter_check_tick()
+	 * already contains a warning when RCU is not watching, so no point
+	 * in having another one here.
+	 */
+	lockdep_hardirqs_off(CALLER_ADDR0);
+	instrumentation_begin();
+	kmsan_unpoison_entry_regs(regs);
+	rcu_irq_enter_check_tick();
+	trace_hardirqs_off_finish();
+	instrumentation_end();
+
+	return ret;
+}
+
+static __always_inline void irqentry_exit_to_kernel_mode(struct pt_regs *regs, irqentry_state_t state)
+{
+	lockdep_assert_irqs_disabled();
+
+	if (!regs_irqs_disabled(regs)) {
+		/*
+		 * If RCU was not watching on entry this needs to be done
+		 * carefully and needs the same ordering of lockdep/tracing
+		 * and RCU as the return to user mode path.
+		 */
+		if (state.exit_rcu) {
+			instrumentation_begin();
+			/* Tell the tracer that IRET will enable interrupts */
+			trace_hardirqs_on_prepare();
+			lockdep_hardirqs_on_prepare();
+			instrumentation_end();
+			ct_irq_exit();
+			lockdep_hardirqs_on(CALLER_ADDR0);
+			return;
+		}
+
+		instrumentation_begin();
+		if (IS_ENABLED(CONFIG_PREEMPTION))
+			irqentry_exit_cond_resched();
+
+		/* Covers both tracing and lockdep */
+		trace_hardirqs_on();
+		instrumentation_end();
+	} else {
+		/*
+		 * IRQ flags state is correct already. Just tell RCU if it
+		 * was not watching on entry.
+		 */
+		if (state.exit_rcu)
+			ct_irq_exit();
+	}
+}
+
 /**
  * irqentry_enter - Handle state tracking on ordinary interrupt entries
  * @regs:	Pointer to pt_regs of interrupted context
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index b5e05d87ba391..1034be02eae84 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -105,70 +105,16 @@ __always_inline unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 
 noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 {
-	irqentry_state_t ret = {
-		.exit_rcu = false,
-	};
-
 	if (user_mode(regs)) {
-		irqentry_enter_from_user_mode(regs);
-		return ret;
-	}
+		irqentry_state_t ret = {
+			.exit_rcu = false,
+		};
 
-	/*
-	 * If this entry hit the idle task invoke ct_irq_enter() whether
-	 * RCU is watching or not.
-	 *
-	 * Interrupts can nest when the first interrupt invokes softirq
-	 * processing on return which enables interrupts.
-	 *
-	 * Scheduler ticks in the idle task can mark quiescent state and
-	 * terminate a grace period, if and only if the timer interrupt is
-	 * not nested into another interrupt.
-	 *
-	 * Checking for rcu_is_watching() here would prevent the nesting
-	 * interrupt to invoke ct_irq_enter(). If that nested interrupt is
-	 * the tick then rcu_flavor_sched_clock_irq() would wrongfully
-	 * assume that it is the first interrupt and eventually claim
-	 * quiescent state and end grace periods prematurely.
-	 *
-	 * Unconditionally invoke ct_irq_enter() so RCU state stays
-	 * consistent.
-	 *
-	 * TINY_RCU does not support EQS, so let the compiler eliminate
-	 * this part when enabled.
-	 */
-	if (!IS_ENABLED(CONFIG_TINY_RCU) &&
-	    (is_idle_task(current) || arch_in_rcu_eqs())) {
-		/*
-		 * If RCU is not watching then the same careful
-		 * sequence vs. lockdep and tracing is required
-		 * as in irqentry_enter_from_user_mode().
-		 */
-		lockdep_hardirqs_off(CALLER_ADDR0);
-		ct_irq_enter();
-		instrumentation_begin();
-		kmsan_unpoison_entry_regs(regs);
-		trace_hardirqs_off_finish();
-		instrumentation_end();
-
-		ret.exit_rcu = true;
+		irqentry_enter_from_user_mode(regs);
 		return ret;
 	}
 
-	/*
-	 * If RCU is watching then RCU only wants to check whether it needs
-	 * to restart the tick in NOHZ mode. rcu_irq_enter_check_tick()
-	 * already contains a warning when RCU is not watching, so no point
-	 * in having another one here.
-	 */
-	lockdep_hardirqs_off(CALLER_ADDR0);
-	instrumentation_begin();
-	kmsan_unpoison_entry_regs(regs);
-	rcu_irq_enter_check_tick();
-	trace_hardirqs_off_finish();
-	instrumentation_end();
-
-	return ret;
+	return irqentry_enter_from_kernel_mode(regs);
 }
 
 /**
@@ -212,43 +158,10 @@ void dynamic_irqentry_exit_cond_resched(void)
 
 noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
 {
-	lockdep_assert_irqs_disabled();
-
-	/* Check whether this returns to user mode */
-	if (user_mode(regs)) {
+	if (user_mode(regs))
 		irqentry_exit_to_user_mode(regs);
-	} else if (!regs_irqs_disabled(regs)) {
-		/*
-		 * If RCU was not watching on entry this needs to be done
-		 * carefully and needs the same ordering of lockdep/tracing
-		 * and RCU as the return to user mode path.
-		 */
-		if (state.exit_rcu) {
-			instrumentation_begin();
-			/* Tell the tracer that IRET will enable interrupts */
-			trace_hardirqs_on_prepare();
-			lockdep_hardirqs_on_prepare();
-			instrumentation_end();
-			ct_irq_exit();
-			lockdep_hardirqs_on(CALLER_ADDR0);
-			return;
-		}
-
-		instrumentation_begin();
-		if (IS_ENABLED(CONFIG_PREEMPTION))
-			irqentry_exit_cond_resched();
-
-		/* Covers both tracing and lockdep */
-		trace_hardirqs_on();
-		instrumentation_end();
-	} else {
-		/*
-		 * IRQ flags state is correct already. Just tell RCU if it
-		 * was not watching on entry.
-		 */
-		if (state.exit_rcu)
-			ct_irq_exit();
-	}
+	else
+		irqentry_exit_to_kernel_mode(regs, state);
 }
 
 irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
-- 
2.30.2



^ permalink raw reply related

* [PATCH 03/10] entry: Move irqentry_enter() prototype later
From: Mark Rutland @ 2026-04-07 13:16 UTC (permalink / raw)
  To: linux-arm-kernel, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner
  Cc: mark.rutland, vladimir.murzin, catalin.marinas, ruanjinjie,
	linux-kernel, will
In-Reply-To: <20260407131650.3813777-1-mark.rutland@arm.com>

Subsequent patches will rework the irqentry_*() functions. The end
result (and the intermediate diffs) will be much clearer if the
prototype for the irqentry_enter() function is moved later, immediately
before the prototype of the irqentry_exit() function.

Move the prototype later.

This is purely a move; there should be no functional change as a result
of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 include/linux/irq-entry-common.h | 44 ++++++++++++++++----------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
index 93b4b551f7ae4..d1e8591a59195 100644
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -334,6 +334,28 @@ typedef struct irqentry_state {
 } irqentry_state_t;
 #endif
 
+/**
+ * irqentry_exit_cond_resched - Conditionally reschedule on return from interrupt
+ *
+ * Conditional reschedule with additional sanity checks.
+ */
+void raw_irqentry_exit_cond_resched(void);
+
+#ifdef CONFIG_PREEMPT_DYNAMIC
+#if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
+#define irqentry_exit_cond_resched_dynamic_enabled	raw_irqentry_exit_cond_resched
+#define irqentry_exit_cond_resched_dynamic_disabled	NULL
+DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
+#define irqentry_exit_cond_resched()	static_call(irqentry_exit_cond_resched)()
+#elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
+DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
+void dynamic_irqentry_exit_cond_resched(void);
+#define irqentry_exit_cond_resched()	dynamic_irqentry_exit_cond_resched()
+#endif
+#else /* CONFIG_PREEMPT_DYNAMIC */
+#define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
+#endif /* CONFIG_PREEMPT_DYNAMIC */
+
 /**
  * irqentry_enter - Handle state tracking on ordinary interrupt entries
  * @regs:	Pointer to pt_regs of interrupted context
@@ -367,28 +389,6 @@ typedef struct irqentry_state {
  */
 irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
 
-/**
- * irqentry_exit_cond_resched - Conditionally reschedule on return from interrupt
- *
- * Conditional reschedule with additional sanity checks.
- */
-void raw_irqentry_exit_cond_resched(void);
-
-#ifdef CONFIG_PREEMPT_DYNAMIC
-#if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
-#define irqentry_exit_cond_resched_dynamic_enabled	raw_irqentry_exit_cond_resched
-#define irqentry_exit_cond_resched_dynamic_disabled	NULL
-DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
-#define irqentry_exit_cond_resched()	static_call(irqentry_exit_cond_resched)()
-#elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
-DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
-void dynamic_irqentry_exit_cond_resched(void);
-#define irqentry_exit_cond_resched()	dynamic_irqentry_exit_cond_resched()
-#endif
-#else /* CONFIG_PREEMPT_DYNAMIC */
-#define irqentry_exit_cond_resched()	raw_irqentry_exit_cond_resched()
-#endif /* CONFIG_PREEMPT_DYNAMIC */
-
 /**
  * irqentry_exit - Handle return from exception that used irqentry_enter()
  * @regs:	Pointer to pt_regs (exception entry regs)
-- 
2.30.2



^ permalink raw reply related

* [PATCH 02/10] entry: Remove local_irq_{enable,disable}_exit_to_user()
From: Mark Rutland @ 2026-04-07 13:16 UTC (permalink / raw)
  To: linux-arm-kernel, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner
  Cc: mark.rutland, vladimir.murzin, catalin.marinas, ruanjinjie,
	linux-kernel, will
In-Reply-To: <20260407131650.3813777-1-mark.rutland@arm.com>

The local_irq_enable_exit_to_user() and local_irq_disable_exit_to_user()
functions are never overridden by architecture code, and are always
equivalent to local_irq_enable() and local_irq_disable().

These functions were added on the assumption that arm64 would override
them to manage 'DAIF' exception masking, as described by Thomas Gleixner
in these threads:

  https://lore.kernel.org/all/20190919150809.340471236@linutronix.de/
  https://lore.kernel.org/all/alpine.DEB.2.21.1910240119090.1852@nanos.tec.linutronix.de/

In practice arm64 did not need to override either. Prior to moving to
the generic irqentry code, arm64's management of DAIF was reworked in
commit:

  97d935faacde ("arm64: Unmask Debug + SError in do_notify_resume()")

Since that commit, arm64 only masks interrupts during the 'prepare' step
when returning to user mode, and masks other DAIF exceptions later.
Within arm64_exit_to_user_mode(), the arm64 entry code is as follows:

	local_irq_disable();
	exit_to_user_mode_prepare_legacy(regs);
	local_daif_mask();
	mte_check_tfsr_exit();
	exit_to_user_mode();

Remove the unnecessary local_irq_enable_exit_to_user() and
local_irq_disable_exit_to_user() functions.
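
For readers unfamiliar with the override idiom being removed, the
following userspace sketch shows roughly how such a hook behaves. All
names ending in _model, and the MODEL_ARCH_OVERRIDE switch, are
hypothetical and exist only for this illustration. The generic default
is compiled only when no same-named macro has been defined, so with no
override present the hook is simply the plain function.

```c
#include <assert.h>

static int generic_calls;	/* calls that reached the generic default */
static int arch_calls;		/* calls that reached an arch override */

/* Stand-in for local_irq_disable(). */
static void local_irq_disable_model(void)
{
	generic_calls++;
}

/*
 * An architecture would opt in by providing its own function and a
 * same-named macro before the generic default. Build with
 * -DMODEL_ARCH_OVERRIDE to simulate that case.
 */
#ifdef MODEL_ARCH_OVERRIDE
static inline void irq_disable_exit_to_user_model(void)
{
	arch_calls++;
}
#define irq_disable_exit_to_user_model irq_disable_exit_to_user_model
#endif

/* Generic default, used only when nothing overrode it. */
#ifndef irq_disable_exit_to_user_model
static inline void irq_disable_exit_to_user_model(void)
{
	local_irq_disable_model();
}
#endif

/* Model of a caller on the exit path; returns the generic call count. */
int run_exit_path_model(void)
{
	irq_disable_exit_to_user_model();
	return generic_calls;
}

int arch_call_count(void)
{
	return arch_calls;
}
```

Without the override the call lands in the generic default every time,
which mirrors the situation the commit message describes: since no
architecture supplies an override, the indirection adds nothing.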

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 include/linux/entry-common.h     |  2 +-
 include/linux/irq-entry-common.h | 31 -------------------------------
 kernel/entry/common.c            |  4 ++--
 3 files changed, 3 insertions(+), 34 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index f83ca0abf2cdb..dbaa153100f44 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -321,7 +321,7 @@ static __always_inline void syscall_exit_to_user_mode(struct pt_regs *regs)
 {
 	instrumentation_begin();
 	syscall_exit_to_user_mode_work(regs);
-	local_irq_disable_exit_to_user();
+	local_irq_disable();
 	syscall_exit_to_user_mode_prepare(regs);
 	instrumentation_end();
 	exit_to_user_mode();
diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
index 3cf4d21168ba1..93b4b551f7ae4 100644
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -100,37 +100,6 @@ static __always_inline void enter_from_user_mode(struct pt_regs *regs)
 	instrumentation_end();
 }
 
-/**
- * local_irq_enable_exit_to_user - Exit to user variant of local_irq_enable()
- * @ti_work:	Cached TIF flags gathered with interrupts disabled
- *
- * Defaults to local_irq_enable(). Can be supplied by architecture specific
- * code.
- */
-static inline void local_irq_enable_exit_to_user(unsigned long ti_work);
-
-#ifndef local_irq_enable_exit_to_user
-static __always_inline void local_irq_enable_exit_to_user(unsigned long ti_work)
-{
-	local_irq_enable();
-}
-#endif
-
-/**
- * local_irq_disable_exit_to_user - Exit to user variant of local_irq_disable()
- *
- * Defaults to local_irq_disable(). Can be supplied by architecture specific
- * code.
- */
-static inline void local_irq_disable_exit_to_user(void);
-
-#ifndef local_irq_disable_exit_to_user
-static __always_inline void local_irq_disable_exit_to_user(void)
-{
-	local_irq_disable();
-}
-#endif
-
 /**
  * arch_exit_to_user_mode_work - Architecture specific TIF work for exit
  *				 to user mode.
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 9ef63e4147913..b5e05d87ba391 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -47,7 +47,7 @@ static __always_inline unsigned long __exit_to_user_mode_loop(struct pt_regs *re
 	 */
 	while (ti_work & EXIT_TO_USER_MODE_WORK_LOOP) {
 
-		local_irq_enable_exit_to_user(ti_work);
+		local_irq_enable();
 
 		if (ti_work & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)) {
 			if (!rseq_grant_slice_extension(ti_work & TIF_SLICE_EXT_DENY))
@@ -74,7 +74,7 @@ static __always_inline unsigned long __exit_to_user_mode_loop(struct pt_regs *re
 		 * might have changed while interrupts and preemption was
 		 * enabled above.
 		 */
-		local_irq_disable_exit_to_user();
+		local_irq_disable();
 
 		/* Check if any of the above work has queued a deferred wakeup */
 		tick_nohz_user_enter_prepare();
-- 
2.30.2



^ permalink raw reply related

* [PATCH 01/10] entry: Fix stale comment for irqentry_enter()
From: Mark Rutland @ 2026-04-07 13:16 UTC (permalink / raw)
  To: linux-arm-kernel, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner
  Cc: mark.rutland, vladimir.murzin, catalin.marinas, ruanjinjie,
	linux-kernel, will
In-Reply-To: <20260407131650.3813777-1-mark.rutland@arm.com>

The kerneldoc comment for irqentry_enter() refers to idtentry_exit(),
which is an accidental holdover from the x86 entry code that the generic
irqentry code was based on.

Correct this to refer to irqentry_exit().

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 include/linux/irq-entry-common.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
index d26d1b1bcbfb9..3cf4d21168ba1 100644
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -394,7 +394,7 @@ typedef struct irqentry_state {
  * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
  * would not be possible.
  *
- * Returns: An opaque object that must be passed to idtentry_exit()
+ * Returns: An opaque object that must be passed to irqentry_exit()
  */
 irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
 
-- 
2.30.2



^ permalink raw reply related

* [PATCH 00/10] arm64/entry:
From: Mark Rutland @ 2026-04-07 13:16 UTC (permalink / raw)
  To: linux-arm-kernel, Andy Lutomirski, Catalin Marinas,
	Peter Zijlstra, Thomas Gleixner, Will Deacon
  Cc: mark.rutland, ruanjinjie, vladimir.murzin, linux-kernel

Since the move to generic IRQ entry, arm64's involuntary kernel
preemption logic has been subtly broken, and preemption can lead to
tasks running with some exceptions masked unexpectedly.

The gory details were discussed in the thread for my earlier attempt to
fix this:

  https://lore.kernel.org/linux-arm-kernel/20260320113026.3219620-1-mark.rutland@arm.com/
  https://lore.kernel.org/linux-arm-kernel/ab1prenkP-tFgUzK@J2N7QTR9R3.cambridge.arm.com/
  https://lore.kernel.org/linux-arm-kernel/ab2EZAXvL6bYcuKt@J2N7QTR9R3.cambridge.arm.com/
  https://lore.kernel.org/linux-arm-kernel/acPAzdtjK5w-rNqC@J2N7QTR9R3/

In summary, due to the way arm64's exceptions work architecturally, and
due to some constraints on sequencing during entry/exit, fixing this
properly requires that arm64 handles more of the sequencing and
(architectural) state management itself.

This series attempts to make that possible by refactoring the generic
irqentry kernel mode entry/exit paths to look more like the user mode
entry/exit paths, with a separate 'prepare' step prior to return. The
refactoring also allows more of the generic irqentry code to be inlined
into architectural entry code, which can result in slightly better code
generation.

I've split the series into a prefix of changes for generic irqentry,
followed by changes to the arm64 code. I'm hoping that we can queue the
generic irqentry patches onto a stable branch, or take those via arm64.
The patches are as follows:

* Patches 1 and 2 are cleanup to the generic irqentry code. These have no
  functional impact, and I think these can be taken regardless of the
  rest of the series.

* Patches 3 to 5 refactor the generic irqentry code as described above,
  providing separate irqentry_{enter,exit}() functions and providing a
  split form of irqentry_exit_to_kernel_mode() similar to what exists
  for irqentry_exit_to_user_mode(). These patches alone should have no
  functional impact.

* Patch 6 is a minimal fix for the arm64 exception masking issues. This
  DOES NOT depend on the generic irqentry patches, and can be backported
  to stable.

* Patches 7 to 9 refactor the arm64 entry code and provide a more
  optimal fix (which permits preemption in more cases). These are split
  into separate patches to aid bisection.

* Patch 10 is a test which can detect exceptions being masked
  unexpectedly. I don't know whether we want to take this as-is, but
  I've included it here to aid testing and so that it gets archived for
  future reference.

The series is based on v7.0-rc3.

Thanks,
Mark.

Mark Rutland (10):
  entry: Fix stale comment for irqentry_enter()
  entry: Remove local_irq_{enable,disable}_exit_to_user()
  entry: Move irqentry_enter() prototype later
  entry: Split kernel mode logic from irqentry_{enter,exit}()
  entry: Split preemption from irqentry_exit_to_kernel_mode()
  arm64: entry: Don't preempt with SError or Debug masked
  arm64: entry: Consistently prefix arm64-specific wrappers
  arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode()
  arm64: entry: Use split preemption logic
  arm64: Check DAIF (and PMR) at task-switch time

 arch/arm64/kernel/entry-common.c |  52 ++++----
 arch/arm64/kernel/process.c      |  25 ++++
 include/linux/entry-common.h     |   2 +-
 include/linux/irq-entry-common.h | 196 ++++++++++++++++++++++---------
 kernel/entry/common.c            | 107 ++---------------
 5 files changed, 202 insertions(+), 180 deletions(-)

-- 
2.30.2



^ permalink raw reply

* Re: [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests
From: Ryan Roberts @ 2026-04-07 13:06 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Will Deacon, David Hildenbrand (Arm), Dev Jain, Yang Shi,
	Suzuki K Poulose, Jinjiang Tu, Kevin Brodsky, linux-arm-kernel,
	linux-kernel, stable
In-Reply-To: <adTh8d9k3y5ybemL@arm.com>

On 07/04/2026 11:52, Catalin Marinas wrote:
> On Tue, Apr 07, 2026 at 11:13:07AM +0100, Ryan Roberts wrote:
>> On 07/04/2026 10:32, Catalin Marinas wrote:
>>> On Tue, Apr 07, 2026 at 09:43:42AM +0100, Ryan Roberts wrote:
>>>> On 03/04/2026 11:31, Catalin Marinas wrote:
>>>>> On Thu, Apr 02, 2026 at 09:43:59PM +0100, Catalin Marinas wrote:
>>>>>> Another thing I couldn't get my head around - IIUC is_realm_world()
>>>>>> won't return true for map_mem() yet (if in a realm). Can we have realms
>>>>>> on hardware that does not support BBML2_NOABORT? We may not have
>>>>>> configuration with rodata_full set (it should be complementary to realm
>>>>>> support).
>>>>>
>>>>> With rodata_full==false, can_set_direct_map() returns false initially
>>>>> but after arm64_rsi_init() it starts returning true if is_realm_world().
>>>>> The side-effect is that map_mem() goes for block mappings and
>>>>> linear_map_requires_bbml2 set to false. Later on,
>>>>> linear_map_maybe_split_to_ptes() will skip the splitting.
>>>>>
>>>>> Unless I'm missing something, is_realm_world() calls in
>>>>> force_pte_mapping() and can_set_direct_map() are useless. I'd remove
>>>>> them and either require BBML2_NOABORT with CCA or get the user to force
>>>>> rodata_full when running in realms. Or move arm64_rsi_init() even
>>>>> earlier?
>>>>
>>>> I'd need Suzuki to comment on this. As I said in the other mail, I was treating
>>>> this like a pre-existing bug. But I guess linear_map_requires_bbml2 ending up
>>>> wrong is a problem here. I'm not sure it's quite as simple as requiring
>>>> BBML2_NOABORT with CCA as we still need can_set_direct_map() to return true if
>>>> we are in a realm.
>>>
>>> can_set_direct_map() == true is not a property of the realm but rather a
>>> requirement. 
>>
>> Yes indeed. It would be better to call it might_set_direct_map() or something
>> like that...
> 
> The way it is used means "is allowed to set the direct map". I guess
> "may set..." works as well. My reading of "might" is more like in
> might_sleep(), more of a hint than a permission check.

OK, I read it as "might" as in a hint that we might want to change the direct
map permissions.

> 
> If you only look at the linear_map_requires_bbml2 setting in map_mem(),
> yes, something like might_set_direct_map() makes sense but that's not
> how this function is used in the rest of the kernel (to reject the
> direct map change if not supported).

ACK.

> 
>>> In the absence of BBML2_NOABORT, I guess the test was added
>>> under the assumption that force_pte_mapping() also returns true if
>>> is_realm_world(). We might as well add a variable or static label to
>>> track whether can_set_direct_map() is possible and avoid tests that
>>> duplicate force_pte_mapping().
>>
>> I'm not sure I follow. We have linear_map_requires_bbml2 which is intended to
>> track this shape of thing;
> 
> As the name implies, linear_map_requires_bbml2 tracks only this -
> BBML2_NOABORT is required because the linear map uses large blocks.
> Prior to your patches, that's only used as far as
> linear_map_maybe_split_to_ptes() and if splitting took place, this
> variable is no longer relevant (should be turned to false but since it's
> not used, it doesn't matter).
> 
> With your patches, its use was extended to runtime and I think it
> remains true even if linear_map_maybe_split_to_ptes() changed the block
> mappings. Do we need this:

I'll admit it is ugly but it's not a bug; the system capabilities are finalized
by the time we call linear_map_maybe_split_to_ptes().

The "if (!linear_map_requires_bbml2 || is_kfence_address((void *)start))" check
in split_kernel_leaf_mapping() would ideally be "if (!force_pte_mapping() ||
is_kfence_address((void *)start))", but it is not safe to call
force_pte_mapping() from a secondary cpu prior to finalizing the system caps.
I'm reusing the flag that I already had available to work around that.

> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index dcee56bb622a..595d35fdd8c3 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -988,6 +988,7 @@ void __init linear_map_maybe_split_to_ptes(void)
>  	if (linear_map_requires_bbml2 && !system_supports_bbml2_noabort()) {
>  		init_idmap_kpti_bbml2_flag();
>  		stop_machine(linear_map_split_to_ptes, NULL, cpu_online_mask);
> +		linear_map_requires_bbml2 = false;
>  	}
>  }
>  
> 
>> if we have forced pte mapping then the value of
>> can_set_direct_map() is irrelevant - we will never need to split because we are
>> already pte-mapped.
> 
> can_set_direct_map() is used in other places, so its value is relevant,
> e.g. sys_memfd_secret() is rejected if this function returns false.
> 
>> But if can_set_direct_map() initially returns false because
>> is_realm_world() incorrectly returns false in the early boot environment, then
>> linear_map_requires_bbml2 will be set to false, and we will incorrectly
>> short-circuit splitting any block mappings in split_kernel_leaf_mapping().
>>
>> I think we are agreed on the problem. But I don't understand how tracking
>> can_set_direct_map() in a cached variable helps with that.
> 
> It's not about the map_mem() decision and linear_map_requires_bbml2
> setting but rather its other uses like sys_memfd_secret().
> 
>>> This won't solve the is_realm_world() changing polarity during boot but
>>> at least we know it won't suddenly make can_set_direct_map() return
>>> true when it shouldn't.
>>
>> But is_realm_world() _should_ make can_set_direct_map() return true, shouldn't
>> it?
> 
> Yes but not directly. If is_realm_world() is true, we either have
> (linear_map_requires_bbml2 && system_supports_bbml2_noabort()) or
> linear_map_requires_bbml2 is false and we have pte mappings. Adding
> is_realm_world() to can_set_direct_map() does not imply any of these.
> It's just a hope that something before actually ensured the conditions
> are true.
> 
> It might be better if we rename the current function to
> might_set_direct_map() and introduce a new can_set_direct_map() that
> actually tells the truth if all the conditions are met. I suggested a
> variable or static label but checking some conditions related to the
> actual linear map work as well, just not is_realm_world() directly.

I'm not sure I see the distinction between "might" and "can" with your
definition. But regardless, I think we are talking about the pre-existing
is_realm_world() bug, so I'm not personally planning to do anything further here
unless you shout.

Thanks,
Ryan




^ permalink raw reply

* [PATCH v4 5/5] PCI: qcom: Add D3cold support
From: Krishna Chaitanya Chundru @ 2026-04-07 13:03 UTC (permalink / raw)
  To: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
	Will Deacon
  Cc: linux-pci, linux-kernel, linux-arm-msm, linux-arm-kernel,
	jonathanh, bjorn.andersson, Krishna Chaitanya Chundru
In-Reply-To: <20260407-d3cold-v4-0-bb171f75b465@oss.qualcomm.com>

Add support for transitioning PCIe endpoints under the host bridge into
D3cold by integrating with the DWC core suspend/resume helpers.

Implement PME_TurnOff message generation via ELBI_SYS_CTRL and hook it
into the DWC host operations so the controller follows the standard
PME_TurnOff-based power-down sequence before entering D3cold.

When the device is suspended into D3cold, fully tear down the interconnect
bandwidth and OPP votes. If D3cold is not entered, retain the existing
behavior by keeping the required interconnect and OPP votes.

Use dw_pcie_rp::skip_pwrctrl_off to avoid powering off devices during
suspend, which preserves their wakeup capability, and to avoid powering
them on again in the init path.

Drop the qcom_pcie::suspended flag and rely on the existing
dw_pcie::suspended state, which now drives both the power-management
flow and the interconnect/OPP handling.
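
As a rough illustration (not part of the patch), the vote handling described
above can be modelled in plain C. The action names and suspend_actions() are
hypothetical; the sketch only mirrors which votes the suspend path drops in
each case and assumes the PCIe-MEM path exists.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical action bits, one per step the suspend path may take. */
enum {
	ACT_DISABLE_ICC_MEM = 1u << 0,	/* drop the PCIe-MEM vote entirely */
	ACT_DISABLE_ICC_CPU = 1u << 1,	/* drop the CPU-PCIe vote */
	ACT_DROP_OPP        = 1u << 2,	/* clear the OPP vote */
	ACT_MIN_MEM_BW      = 1u << 3,	/* keep a minimal PCIe-MEM vote */
};

/* Model of the decision taken after the DWC suspend helper has run. */
static unsigned int suspend_actions(bool d3cold_entered, bool s2ram,
				    bool use_pm_opp)
{
	unsigned int act = 0;

	if (d3cold_entered) {
		/* D3cold: tear down every vote. */
		act |= ACT_DISABLE_ICC_MEM | ACT_DISABLE_ICC_CPU;
		if (use_pm_opp)
			act |= ACT_DROP_OPP;
	} else {
		/* Link kept up: retain the pre-existing behaviour. */
		act |= ACT_MIN_MEM_BW;
		if (!s2ram) {
			act |= ACT_DISABLE_ICC_CPU;
			if (use_pm_opp)
				act |= ACT_DROP_OPP;
		}
	}

	/* The memory path is either disabled or held at minimum, never both. */
	assert(!((act & ACT_DISABLE_ICC_MEM) && (act & ACT_MIN_MEM_BW)));

	return act;
}
```

In this model the S2RAM case without D3cold keeps the CPU-PCIe path active,
matching the late-DBI-access comment retained in the driver.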

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
---
 drivers/pci/controller/dwc/pcie-qcom.c | 150 ++++++++++++++++++++-------------
 1 file changed, 92 insertions(+), 58 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index c14c3eb70f356b6ad8a2ffe48b107327d2babf77..e8d109c44dd270610272906244d1afeec3664f41 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -145,6 +145,7 @@
 
 /* ELBI_SYS_CTRL register fields */
 #define ELBI_SYS_CTRL_LT_ENABLE			BIT(0)
+#define ELBI_SYS_CTRL_PME_TURNOFF_MSG		BIT(4)
 
 /* AXI_MSTR_RESP_COMP_CTRL0 register fields */
 #define CFG_REMOTE_RD_REQ_BRIDGE_SIZE_2K	0x4
@@ -283,7 +284,6 @@ struct qcom_pcie {
 	const struct qcom_pcie_cfg *cfg;
 	struct dentry *debugfs;
 	struct list_head ports;
-	bool suspended;
 	bool use_pm_opp;
 };
 
@@ -1336,13 +1336,17 @@ static int qcom_pcie_host_init(struct dw_pcie_rp *pp)
 	if (ret)
 		goto err_deinit;
 
-	ret = pci_pwrctrl_create_devices(pci->dev);
-	if (ret)
-		goto err_disable_phy;
+	if (!pci->suspended) {
+		ret = pci_pwrctrl_create_devices(pci->dev);
+		if (ret)
+			goto err_disable_phy;
+	}
 
-	ret = pci_pwrctrl_power_on_devices(pci->dev);
-	if (ret)
-		goto err_pwrctrl_destroy;
+	if (!pp->skip_pwrctrl_off) {
+		ret = pci_pwrctrl_power_on_devices(pci->dev);
+		if (ret)
+			goto err_pwrctrl_destroy;
+	}
 
 	if (pcie->cfg->ops->post_init) {
 		ret = pcie->cfg->ops->post_init(pcie);
@@ -1386,11 +1390,14 @@ static void qcom_pcie_host_deinit(struct dw_pcie_rp *pp)
 
 	qcom_pcie_perst_assert(pcie);
 
-	/*
-	 * No need to destroy pwrctrl devices as this function only gets called
-	 * during system suspend as of now.
-	 */
-	pci_pwrctrl_power_off_devices(pci->dev);
+	if (!pci->pp.skip_pwrctrl_off) {
+		/*
+		 * No need to destroy pwrctrl devices as this function only gets called
+		 * during system suspend as of now.
+		 */
+		pci_pwrctrl_power_off_devices(pci->dev);
+	}
+
 	qcom_pcie_phy_power_off(pcie);
 	pcie->cfg->ops->deinit(pcie);
 }
@@ -1404,10 +1411,18 @@ static void qcom_pcie_host_post_init(struct dw_pcie_rp *pp)
 		pcie->cfg->ops->host_post_init(pcie);
 }
 
+static void qcom_pcie_host_pme_turn_off(struct dw_pcie_rp *pp)
+{
+	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+
+	writel(ELBI_SYS_CTRL_PME_TURNOFF_MSG, pci->elbi_base + ELBI_SYS_CTRL);
+}
+
 static const struct dw_pcie_host_ops qcom_pcie_dw_ops = {
 	.init		= qcom_pcie_host_init,
 	.deinit		= qcom_pcie_host_deinit,
 	.post_init	= qcom_pcie_host_post_init,
+	.pme_turn_off	= qcom_pcie_host_pme_turn_off,
 };
 
 /* Qcom IP rev.: 2.1.0	Synopsys IP rev.: 4.01a */
@@ -2072,53 +2087,51 @@ static int qcom_pcie_suspend_noirq(struct device *dev)
 	if (!pcie)
 		return 0;
 
-	/*
-	 * Set minimum bandwidth required to keep data path functional during
-	 * suspend.
-	 */
-	if (pcie->icc_mem) {
-		ret = icc_set_bw(pcie->icc_mem, 0, kBps_to_icc(1));
-		if (ret) {
-			dev_err(dev,
-				"Failed to set bandwidth for PCIe-MEM interconnect path: %d\n",
-				ret);
-			return ret;
-		}
-	}
+	ret = dw_pcie_suspend_noirq(pcie->pci);
+	if (ret)
+		return ret;
 
-	/*
-	 * Turn OFF the resources only for controllers without active PCIe
-	 * devices. For controllers with active devices, the resources are kept
-	 * ON and the link is expected to be in L0/L1 (sub)states.
-	 *
-	 * Turning OFF the resources for controllers with active PCIe devices
-	 * will trigger access violation during the end of the suspend cycle,
-	 * as kernel tries to access the PCIe devices config space for masking
-	 * MSIs.
-	 *
-	 * Also, it is not desirable to put the link into L2/L3 state as that
-	 * implies VDD supply will be removed and the devices may go into
-	 * powerdown state. This will affect the lifetime of the storage devices
-	 * like NVMe.
-	 */
-	if (!dw_pcie_link_up(pcie->pci)) {
-		qcom_pcie_host_deinit(&pcie->pci->pp);
-		pcie->suspended = true;
-	}
+	if (pcie->pci->suspended) {
+		ret = icc_disable(pcie->icc_mem);
+		if (ret)
+			dev_err(dev, "Failed to disable PCIe-MEM interconnect path: %d\n", ret);
 
-	/*
-	 * Only disable CPU-PCIe interconnect path if the suspend is non-S2RAM.
-	 * Because on some platforms, DBI access can happen very late during the
-	 * S2RAM and a non-active CPU-PCIe interconnect path may lead to NoC
-	 * error.
-	 */
-	if (pm_suspend_target_state != PM_SUSPEND_MEM) {
 		ret = icc_disable(pcie->icc_cpu);
 		if (ret)
 			dev_err(dev, "Failed to disable CPU-PCIe interconnect path: %d\n", ret);
 
 		if (pcie->use_pm_opp)
 			dev_pm_opp_set_opp(pcie->pci->dev, NULL);
+	} else {
+		/*
+		 * Set minimum bandwidth required to keep data path functional during
+		 * suspend.
+		 */
+		if (pcie->icc_mem) {
+			ret = icc_set_bw(pcie->icc_mem, 0, kBps_to_icc(1));
+			if (ret) {
+				dev_err(dev,
+					"Failed to set bandwidth for PCIe-MEM interconnect path: %d\n",
+					ret);
+				return ret;
+			}
+		}
+
+		/*
+		 * Only disable CPU-PCIe interconnect path if the suspend is non-S2RAM.
+		 * Because on some platforms, DBI access can happen very late during the
+		 * S2RAM and a non-active CPU-PCIe interconnect path may lead to NoC
+		 * error.
+		 */
+		if (pm_suspend_target_state != PM_SUSPEND_MEM) {
+			ret = icc_disable(pcie->icc_cpu);
+			if (ret)
+				dev_err(dev, "Failed to disable CPU-PCIe interconnect path: %d\n",
+					ret);
+
+			if (pcie->use_pm_opp)
+				dev_pm_opp_set_opp(pcie->pci->dev, NULL);
+		}
 	}
 	return ret;
 }
@@ -2132,25 +2145,46 @@ static int qcom_pcie_resume_noirq(struct device *dev)
 	if (!pcie)
 		return 0;
 
-	if (pm_suspend_target_state != PM_SUSPEND_MEM) {
+	if (pcie->pci->suspended) {
 		ret = icc_enable(pcie->icc_cpu);
 		if (ret) {
 			dev_err(dev, "Failed to enable CPU-PCIe interconnect path: %d\n", ret);
 			return ret;
 		}
-	}
 
-	if (pcie->suspended) {
-		ret = qcom_pcie_host_init(&pcie->pci->pp);
-		if (ret)
-			return ret;
+		ret = icc_enable(pcie->icc_mem);
+		if (ret) {
+			dev_err(dev, "Failed to enable PCIe-MEM interconnect path: %d\n", ret);
+			goto disable_icc_cpu;
+		}
 
-		pcie->suspended = false;
+		/*
+		 * Ignore -ENODEV & -EIO here since it is expected when no endpoint is
+		 * connected to the PCIe link.
+		 */
+		ret = dw_pcie_resume_noirq(pcie->pci);
+		if (ret && ret != -ENODEV && ret != -EIO)
+			goto disable_icc_mem;
+	} else {
+		if (pm_suspend_target_state != PM_SUSPEND_MEM) {
+			ret = icc_enable(pcie->icc_cpu);
+			if (ret) {
+				dev_err(dev, "Failed to enable CPU-PCIe interconnect path: %d\n",
+					ret);
+				return ret;
+			}
+		}
 	}
 
 	qcom_pcie_icc_opp_update(pcie);
 
 	return 0;
+disable_icc_mem:
+	icc_disable(pcie->icc_mem);
+disable_icc_cpu:
+	icc_disable(pcie->icc_cpu);
+
+	return ret;
 }
 
 static const struct of_device_id qcom_pcie_match[] = {

-- 
2.34.1



^ permalink raw reply related

* [PATCH v4 4/5] PCI: dwc: Use common D3cold eligibility helper in suspend path
From: Krishna Chaitanya Chundru @ 2026-04-07 13:03 UTC (permalink / raw)
  To: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
	Will Deacon
  Cc: linux-pci, linux-kernel, linux-arm-msm, linux-arm-kernel,
	jonathanh, bjorn.andersson, Krishna Chaitanya Chundru
In-Reply-To: <20260407-d3cold-v4-0-bb171f75b465@oss.qualcomm.com>

Previously, the driver skipped putting the link into L2 and the devices
into D3cold whenever L1 ASPM was enabled, since some devices (e.g. NVMe)
expect low resume latency and may not tolerate deeper power states.
However, such devices typically remain in D0 and are already covered by
the new helper's requirement that all endpoints be in D3hot before the
devices under the host bridge may enter D3cold.

So, replace the local L1/L1SS-based check in dw_pcie_suspend_noirq() with
the shared pci_host_common_d3cold_possible() helper to decide whether the
devices under host bridge can safely transition to D3cold.

In addition, propagate PME-from-D3cold capability information from the
helper and record it in skip_pwrctrl_off. Some devices (e.g. M.2 cards
without auxiliary power) may lose PME detection when main power is
removed, even if they advertise PME-from-D3cold support. This allows
controller power-off to be skipped when required to preserve wakeup
functionality.

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
---
 drivers/pci/controller/dwc/pcie-designware-host.c | 11 +++++------
 drivers/pci/controller/dwc/pcie-designware.h      |  1 +
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index 6ae6189e9b8a9021c99ece17504834650debd86b..ce3093cfd1608f1616001cbf5f541a4dc3eafea5 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -16,9 +16,11 @@
 #include <linux/msi.h>
 #include <linux/of_address.h>
 #include <linux/of_pci.h>
+#include <linux/pci.h>
 #include <linux/pci_regs.h>
 #include <linux/platform_device.h>
 
+#include "../pci-host-common.h"
 #include "../../pci.h"
 #include "pcie-designware.h"
 
@@ -1218,18 +1220,14 @@ static int dw_pcie_pme_turn_off(struct dw_pcie *pci)
 
 int dw_pcie_suspend_noirq(struct dw_pcie *pci)
 {
-	u8 offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP);
+	bool pme_capable = false;
 	int ret = 0;
 	u32 val;
 
 	if (!dw_pcie_link_up(pci))
 		goto stop_link;
 
-	/*
-	 * If L1SS is supported, then do not put the link into L2 as some
-	 * devices such as NVMe expect low resume latency.
-	 */
-	if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1)
+	if (!pci_host_common_d3cold_possible(pci->pp.bridge, &pme_capable))
 		return 0;
 
 	if (pci->pp.ops->pme_turn_off) {
@@ -1269,6 +1267,7 @@ int dw_pcie_suspend_noirq(struct dw_pcie *pci)
 	udelay(1);
 
 stop_link:
+	pci->pp.skip_pwrctrl_off = pme_capable;
 	dw_pcie_stop_link(pci);
 	if (pci->pp.ops->deinit)
 		pci->pp.ops->deinit(&pci->pp);
diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
index ae6389dd9caa5c27690f998d58729130ea863984..0af083018aee29c1f0f4385dacc6e878c8d040de 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -447,6 +447,7 @@ struct dw_pcie_rp {
 	bool			ecam_enabled;
 	bool			native_ecam;
 	bool                    skip_l23_ready;
+	bool			skip_pwrctrl_off;
 };
 
 struct dw_pcie_ep_ops {

-- 
2.34.1



^ permalink raw reply related

* [PATCH v4 3/5] PCI: qcom: Power down PHY via PARF_PHY_CTRL before disabling rails/clocks
From: Krishna Chaitanya Chundru @ 2026-04-07 13:03 UTC (permalink / raw)
  To: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
	Will Deacon
  Cc: linux-pci, linux-kernel, linux-arm-msm, linux-arm-kernel,
	jonathanh, bjorn.andersson, Krishna Chaitanya Chundru
In-Reply-To: <20260407-d3cold-v4-0-bb171f75b465@oss.qualcomm.com>

Some Qcom PCIe controller variants bring the PHY out of test power-down
(PHY_TEST_PWR_DOWN) during init. When the link is later transitioned
towards D3cold and the driver disables PCIe clocks and/or regulators
without explicitly re-asserting PHY_TEST_PWR_DOWN, the PHY can remain
partially powered, leading to avoidable power leakage.

Update the init-path comments to reflect that PARF_PHY_CTRL is used to
power the PHY on. Also, for controller revisions that enable PHY power
in init (2.3.2, 2.3.3, 2.7.0 and 2.9.0), explicitly power the PHY down
via PARF_PHY_CTRL in the deinit path before disabling clocks/regulators.

This ensures the PHY is put into a defined low-power state prior to
removing its supplies, preventing leakage when entering D3cold.
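
For readers unfamiliar with the pattern, the read-modify-write used on
PARF_PHY_CTRL can be sketched in user space as below. This is only an
illustration: the bit position of PHY_TEST_PWR_DOWN is assumed here, and the
two helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit position assumed for illustration; see the driver for the real one. */
#define PHY_TEST_PWR_DOWN	(1u << 0)

/* The read-modify-write below relies on this being a single-bit flag. */
static_assert((PHY_TEST_PWR_DOWN & (PHY_TEST_PWR_DOWN - 1)) == 0,
	      "single-bit flag expected");

/* Init path: clear the bit to bring the PHY out of test power-down. */
static uint32_t phy_ctrl_power_on(uint32_t val)
{
	return val & ~PHY_TEST_PWR_DOWN;
}

/* Deinit path: set the bit again before clocks and supplies are removed. */
static uint32_t phy_ctrl_power_down(uint32_t val)
{
	return val | PHY_TEST_PWR_DOWN;
}
```

Both helpers leave every other bit of the register value untouched, which is
why the driver reads the register before writing it back.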

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
---
 drivers/pci/controller/dwc/pcie-qcom.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index b00bf46637a5ff803a845719c5b0b5b82739244b..c14c3eb70f356b6ad8a2ffe48b107327d2babf77 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -513,7 +513,7 @@ static int qcom_pcie_post_init_2_1_0(struct qcom_pcie *pcie)
 	u32 val;
 	int ret;
 
-	/* enable PCIe clocks and resets */
+	/* Force PHY out of lowest power state */
 	val = readl(pcie->parf + PARF_PHY_CTRL);
 	val &= ~PHY_TEST_PWR_DOWN;
 	writel(val, pcie->parf + PARF_PHY_CTRL);
@@ -680,6 +680,12 @@ static int qcom_pcie_get_resources_2_3_2(struct qcom_pcie *pcie)
 static void qcom_pcie_deinit_2_3_2(struct qcom_pcie *pcie)
 {
 	struct qcom_pcie_resources_2_3_2 *res = &pcie->res.v2_3_2;
+	u32 val;
+
+	/* Force PHY to lowest power state */
+	val = readl(pcie->parf + PARF_PHY_CTRL);
+	val |= PHY_TEST_PWR_DOWN;
+	writel(val, pcie->parf + PARF_PHY_CTRL);
 
 	clk_bulk_disable_unprepare(res->num_clks, res->clks);
 	regulator_bulk_disable(ARRAY_SIZE(res->supplies), res->supplies);
@@ -712,7 +718,7 @@ static int qcom_pcie_post_init_2_3_2(struct qcom_pcie *pcie)
 {
 	u32 val;
 
-	/* enable PCIe clocks and resets */
+	/* Force PHY out of lowest power state */
 	val = readl(pcie->parf + PARF_PHY_CTRL);
 	val &= ~PHY_TEST_PWR_DOWN;
 	writel(val, pcie->parf + PARF_PHY_CTRL);
@@ -844,6 +850,12 @@ static int qcom_pcie_get_resources_2_3_3(struct qcom_pcie *pcie)
 static void qcom_pcie_deinit_2_3_3(struct qcom_pcie *pcie)
 {
 	struct qcom_pcie_resources_2_3_3 *res = &pcie->res.v2_3_3;
+	u32 val;
+
+	/* Force PHY to lowest power state */
+	val = readl(pcie->parf + PARF_PHY_CTRL);
+	val |= PHY_TEST_PWR_DOWN;
+	writel(val, pcie->parf + PARF_PHY_CTRL);
 
 	clk_bulk_disable_unprepare(res->num_clks, res->clks);
 }
@@ -899,6 +911,7 @@ static int qcom_pcie_post_init_2_3_3(struct qcom_pcie *pcie)
 	u16 offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP);
 	u32 val;
 
+	/* Force PHY out of lowest power state */
 	val = readl(pcie->parf + PARF_PHY_CTRL);
 	val &= ~PHY_TEST_PWR_DOWN;
 	writel(val, pcie->parf + PARF_PHY_CTRL);
@@ -994,7 +1007,7 @@ static int qcom_pcie_init_2_7_0(struct qcom_pcie *pcie)
 	/* configure PCIe to RC mode */
 	writel(DEVICE_TYPE_RC, pcie->parf + PARF_DEVICE_TYPE);
 
-	/* enable PCIe clocks and resets */
+	/* Force PHY out of lowest power state */
 	val = readl(pcie->parf + PARF_PHY_CTRL);
 	val &= ~PHY_TEST_PWR_DOWN;
 	writel(val, pcie->parf + PARF_PHY_CTRL);
@@ -1065,6 +1078,12 @@ static void qcom_pcie_host_post_init_2_7_0(struct qcom_pcie *pcie)
 static void qcom_pcie_deinit_2_7_0(struct qcom_pcie *pcie)
 {
 	struct qcom_pcie_resources_2_7_0 *res = &pcie->res.v2_7_0;
+	u32 val;
+
+	/* Force PHY to lowest power state */
+	val = readl(pcie->parf + PARF_PHY_CTRL);
+	val |= PHY_TEST_PWR_DOWN;
+	writel(val, pcie->parf + PARF_PHY_CTRL);
 
 	clk_bulk_disable_unprepare(res->num_clks, res->clks);
 
@@ -1169,6 +1188,12 @@ static int qcom_pcie_get_resources_2_9_0(struct qcom_pcie *pcie)
 static void qcom_pcie_deinit_2_9_0(struct qcom_pcie *pcie)
 {
 	struct qcom_pcie_resources_2_9_0 *res = &pcie->res.v2_9_0;
+	u32 val;
+
+	/* Force PHY to lowest power state */
+	val = readl(pcie->parf + PARF_PHY_CTRL);
+	val |= PHY_TEST_PWR_DOWN;
+	writel(val, pcie->parf + PARF_PHY_CTRL);
 
 	clk_bulk_disable_unprepare(res->num_clks, res->clks);
 }
@@ -1209,6 +1234,7 @@ static int qcom_pcie_post_init_2_9_0(struct qcom_pcie *pcie)
 	u32 val;
 	int i;
 
+	/* Force PHY out of lowest power state */
 	val = readl(pcie->parf + PARF_PHY_CTRL);
 	val &= ~PHY_TEST_PWR_DOWN;
 	writel(val, pcie->parf + PARF_PHY_CTRL);

-- 
2.34.1



^ permalink raw reply related

* [PATCH v4 2/5] PCI: qcom: Add .get_ltssm() helper
From: Krishna Chaitanya Chundru @ 2026-04-07 13:03 UTC (permalink / raw)
  To: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
	Will Deacon
  Cc: linux-pci, linux-kernel, linux-arm-msm, linux-arm-kernel,
	jonathanh, bjorn.andersson, Krishna Chaitanya Chundru
In-Reply-To: <20260407-d3cold-v4-0-bb171f75b465@oss.qualcomm.com>

On older targets such as sc7280, reading DBI registers after sending the
PME_TurnOff message causes a NoC error.

To avoid unsafe DBI accesses, introduce qcom_pcie_get_ltssm(), which
retrieves the LTSSM state from the PARF_LTSSM register instead.

This helper is used in place of direct DBI-based link state checks in
the D3cold path after sending PME turn-off message, ensuring the LTSSM
state can be queried safely even after DBI access is no longer valid.
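
A minimal user-space sketch of the field extraction, assuming only the layout
visible in this patch (state in bits [5:0], LTSSM_EN in bit 8); ltssm_state()
is a hypothetical name, and the kernel code uses FIELD_GET() instead of the
open-coded mask shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Layout taken from the patch: GENMASK(5, 0) and BIT(8). */
#define LTSSM_STATE_MASK	0x3fu
#define LTSSM_EN_BIT		(1u << 8)

/* The state field must not overlap the enable bit. */
static_assert((LTSSM_STATE_MASK & LTSSM_EN_BIT) == 0,
	      "state field overlaps LTSSM_EN");

/* The mask starts at bit 0, so no shift is needed after masking. */
static unsigned int ltssm_state(uint32_t parf_ltssm)
{
	return parf_ltssm & LTSSM_STATE_MASK;
}
```

With this layout, a raw value of 0x111 (LTSSM_EN set, state 0x11) yields
0x11, and the enable bit never leaks into the reported state.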

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
---
 drivers/pci/controller/dwc/pcie-qcom.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index 67a16af69ddc75fca1b123e70715e692a91a9135..b00bf46637a5ff803a845719c5b0b5b82739244b 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -131,6 +131,7 @@
 
 /* PARF_LTSSM register fields */
 #define LTSSM_EN				BIT(8)
+#define PARF_LTSSM_STATE_MASK			GENMASK(5, 0)
 
 /* PARF_NO_SNOOP_OVERRIDE register fields */
 #define WR_NO_SNOOP_OVERRIDE_EN			BIT(1)
@@ -1255,6 +1256,16 @@ static bool qcom_pcie_link_up(struct dw_pcie *pci)
 	return val & PCI_EXP_LNKSTA_DLLLA;
 }
 
+static enum dw_pcie_ltssm qcom_pcie_get_ltssm(struct dw_pcie *pci)
+{
+	struct qcom_pcie *pcie = to_qcom_pcie(pci);
+	u32 val;
+
+	val = readl(pcie->parf + PARF_LTSSM);
+
+	return (enum dw_pcie_ltssm)FIELD_GET(PARF_LTSSM_STATE_MASK, val);
+}
+
 static void qcom_pcie_phy_power_off(struct qcom_pcie *pcie)
 {
 	struct qcom_pcie_port *port;
@@ -1507,6 +1518,7 @@ static const struct qcom_pcie_cfg cfg_fw_managed = {
 static const struct dw_pcie_ops dw_pcie_ops = {
 	.link_up = qcom_pcie_link_up,
 	.start_link = qcom_pcie_start_link,
+	.get_ltssm = qcom_pcie_get_ltssm,
 };
 
 static int qcom_pcie_icc_init(struct qcom_pcie *pcie)

-- 
2.34.1



^ permalink raw reply related

* [PATCH v4 1/5] PCI: host-common: Add helper to determine host bridge D3cold eligibility
From: Krishna Chaitanya Chundru @ 2026-04-07 13:03 UTC (permalink / raw)
  To: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
	Will Deacon
  Cc: linux-pci, linux-kernel, linux-arm-msm, linux-arm-kernel,
	jonathanh, bjorn.andersson, Krishna Chaitanya Chundru
In-Reply-To: <20260407-d3cold-v4-0-bb171f75b465@oss.qualcomm.com>

Add a common helper, pci_host_common_d3cold_possible(), to determine
whether PCIe devices under the host bridge can safely transition to D3cold.

This helper is intended to be used by PCI host controller drivers to
decide whether they may safely put the host bridge into D3cold based on
the power state and wakeup capabilities of downstream endpoints.

The helper walks all devices on all buses below the bridge and only allows
the devices to enter D3cold if all PCIe endpoints are already in
PCI_D3hot. This ensures that we do not power off the host bridge while
any active endpoint still requires the link to remain powered.

For devices that may wake the system, the helper additionally requires
that the device supports PME wake from D3cold (via WAKE#). Devices that
do not have wakeup enabled are not restricted by this check and do not
block the devices under the host bridge from entering D3cold.

Devices that have no bound driver and have not been enabled via sysfs are
treated as inactive and therefore do not prevent the devices under the
host bridge from entering D3cold. This allows controllers to power down
more aggressively when there are no actively managed endpoints.

Some devices (e.g. M.2 cards without auxiliary power) lose PME detection when
main power is removed. Even if such devices advertise PME-from-D3cold
capability, entering D3cold may break wakeup. So, return PME-from-D3cold
capability via an output parameter so PCIe controller drivers can apply
platform-specific handling to preserve wakeup functionality.
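
The rule above can be modelled outside the kernel. The following is a
minimal user-space sketch, not the patch itself: struct fake_endpoint and its
fields are hypothetical stand-ins for the state the helper reads from
struct pci_dev, and a plain array replaces the bus walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define D3COLD_ALLOWED		(1u << 0)
#define PME_D3COLD_CAPABLE	(1u << 1)

/* Hypothetical stand-in for the per-endpoint state the helper inspects. */
struct fake_endpoint {
	bool has_driver;	/* a driver is bound */
	bool enabled;		/* device has been enabled */
	bool in_d3hot;		/* current power state is D3hot */
	bool may_wakeup;	/* wakeup is enabled for the device */
	bool pme_from_d3cold;	/* device can signal PME from D3cold */
};

/*
 * Returns true if D3cold is allowed. *pme_capable reports whether any
 * wakeup-enabled endpoint seen so far can signal PME from D3cold.
 */
static bool d3cold_possible(const struct fake_endpoint *eps, size_t n,
			    bool *pme_capable)
{
	unsigned int flags = D3COLD_ALLOWED;
	size_t i;

	assert(pme_capable != NULL);

	for (i = 0; i < n; i++) {
		const struct fake_endpoint *ep = &eps[i];

		/* Inactive endpoints are ignored. */
		if (!ep->has_driver && !ep->enabled)
			continue;

		/* An active endpoint outside D3hot blocks D3cold. */
		if (!ep->in_d3hot ||
		    (ep->may_wakeup && !ep->pme_from_d3cold)) {
			flags &= ~D3COLD_ALLOWED;
			break;	/* stop at the first blocker, like the walk */
		}

		if (ep->may_wakeup)
			flags |= PME_D3COLD_CAPABLE;
	}

	*pme_capable = !!(flags & PME_D3COLD_CAPABLE);

	return !!(flags & D3COLD_ALLOWED);
}
```

A caller would treat a false return as "keep the link up" and use
*pme_capable only to decide how much power may be removed.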

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
---
 drivers/pci/controller/pci-host-common.c | 63 ++++++++++++++++++++++++++++++++
 drivers/pci/controller/pci-host-common.h |  2 +
 2 files changed, 65 insertions(+)

diff --git a/drivers/pci/controller/pci-host-common.c b/drivers/pci/controller/pci-host-common.c
index d6258c1cffe5ec480fd2a7e50b3af39ef6ac4c8c..34e4c4c1d8c0fdead3e714525a497b722a41392e 100644
--- a/drivers/pci/controller/pci-host-common.c
+++ b/drivers/pci/controller/pci-host-common.c
@@ -17,6 +17,9 @@
 
 #include "pci-host-common.h"
 
+#define PCI_HOST_D3COLD_ALLOWED        BIT(0)
+#define PCI_HOST_PME_D3COLD_CAPABLE    BIT(1)
+
 static void gen_pci_unmap_cfg(void *ptr)
 {
 	pci_ecam_free((struct pci_config_window *)ptr);
@@ -106,5 +109,65 @@ void pci_host_common_remove(struct platform_device *pdev)
 }
 EXPORT_SYMBOL_GPL(pci_host_common_remove);
 
+static int __pci_host_common_d3cold_possible(struct pci_dev *pdev, void *userdata)
+{
+	u32 *flags = userdata;
+
+	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT)
+		return 0;
+
+	if (!pdev->dev.driver && !pci_is_enabled(pdev))
+		return 0;
+
+	if (pdev->current_state != PCI_D3hot)
+		goto exit;
+
+	if (device_may_wakeup(&pdev->dev)) {
+		if (!pci_pme_capable(pdev, PCI_D3cold))
+			goto exit;
+		else
+			*flags |= PCI_HOST_PME_D3COLD_CAPABLE;
+	}
+
+	return 0;
+
+exit:
+	*flags &= ~PCI_HOST_D3COLD_ALLOWED;
+
+	return -EOPNOTSUPP;
+}
+
+/**
+ * pci_host_common_d3cold_possible - Determine whether the host bridge can transition the
+ *				     devices into D3cold.
+ *
+ * @bridge: PCI host bridge to check
+ * @pme_capable: Pointer to update if there is any device which is capable of generating
+ *		 PME from D3cold.
+ *
+ * Walk downstream PCIe endpoint devices and determine whether the host bridge
+ * is permitted to transition the devices into D3cold.
+ *
+ * Devices under host bridge can enter D3cold only if all active PCIe endpoints are in
+ * PCI_D3hot and any wakeup-enabled endpoint is capable of generating PME from D3cold.
+ * Inactive endpoints are ignored.
+ *
+ * The @pme_capable output allows PCIe controller drivers to apply
+ * platform-specific handling to preserve wakeup functionality.
+ *
+ * Return: %true if the host bridge may enter D3cold, otherwise %false.
+ */
+bool pci_host_common_d3cold_possible(struct pci_host_bridge *bridge, bool *pme_capable)
+{
+	u32 flags = PCI_HOST_D3COLD_ALLOWED;
+
+	pci_walk_bus(bridge->bus, __pci_host_common_d3cold_possible, &flags);
+
+	*pme_capable = !!(flags & PCI_HOST_PME_D3COLD_CAPABLE);
+
+	return !!(flags & PCI_HOST_D3COLD_ALLOWED);
+}
+EXPORT_SYMBOL_GPL(pci_host_common_d3cold_possible);
+
 MODULE_DESCRIPTION("Common library for PCI host controller drivers");
 MODULE_LICENSE("GPL v2");
diff --git a/drivers/pci/controller/pci-host-common.h b/drivers/pci/controller/pci-host-common.h
index b5075d4bd7eb31fbf1dc946ef1a6afd5afb5b3c6..7eb5599b9ce4feb5c8ba2aa1f2e532b0cf3e1c03 100644
--- a/drivers/pci/controller/pci-host-common.h
+++ b/drivers/pci/controller/pci-host-common.h
@@ -20,4 +20,6 @@ void pci_host_common_remove(struct platform_device *pdev);
 
 struct pci_config_window *pci_host_common_ecam_create(struct device *dev,
 	struct pci_host_bridge *bridge, const struct pci_ecam_ops *ops);
+
+bool pci_host_common_d3cold_possible(struct pci_host_bridge *bridge, bool *pme_capable);
 #endif

-- 
2.34.1



^ permalink raw reply related

* [PATCH v4 0/5] PCI: qcom: Add D3cold support
From: Krishna Chaitanya Chundru @ 2026-04-07 13:03 UTC (permalink / raw)
  To: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
	Will Deacon
  Cc: linux-pci, linux-kernel, linux-arm-msm, linux-arm-kernel,
	jonathanh, bjorn.andersson, Krishna Chaitanya Chundru

This series adds support for putting Qualcomm PCIe host bridges into D3cold
when downstream conditions allow it, and introduces a small common helper
to determine D3cold eligibility based on endpoint state.

On Qualcomm platforms, PCIe host controllers are currently kept powered
even when there are no active endpoints (i.e. all endpoints are already in
PCI_D3hot). This prevents the SoC from entering deeper low-power states
such as CXPC.

While PCIe D3cold support exists in the PCI core, host controller drivers
lack a common mechanism to determine whether it is safe to power off the
host bridge without breaking active devices or wakeup functionality.
As a result, controllers either avoid entering D3cold or depend on rough,
driver-specific workarounds.

This series addresses that gap.

1. Introduces pci_host_common_d3cold_possible(), a helper that determines
   whether a host bridge may enter D3cold based on downstream PCIe endpoint
   state. The helper permits D3cold only when all *active* endpoints are
   already in PCI_D3hot, and any wakeup-enabled endpoint supports PME
   from D3cold.

2. Updates the DesignWare PCIe host driver to use this helper in the
   suspend_noirq() path, replacing the existing heuristic that blocked
   D3cold whenever L1 ASPM was enabled.

3. Enables D3cold support for Qualcomm PCIe controllers by wiring them into
   the DesignWare common suspend/resume flow and explicitly powering down
   controller resources when all endpoints are in D3hot.

The immediate outcome of this series is that Qualcomm PCIe host bridges can
enter D3cold when all endpoints are in D3hot.

This is a necessary but not sufficient step toward unblocking CXPC. With
this series applied, CXPC can be achieved on systems with no attached NVMe
devices. Support for NVMe-attached systems requires additional changes
in the NVMe driver, which are being worked on separately.

Tested on:
  - Qualcomm Lemans EVK, Monaco & sc7280 platforms.

Validation steps:
  - Boot without an NVMe device attached:
      * PCIe host enters D3cold during suspend
      * SoC is able to reach CXPC provided other drivers also remove
	their votes as part of suspend.

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
---
Changes in v4:
- Added a new argument to the API to report whether any wakeup-enabled
  device can generate PME from D3cold. This information is needed to
  decide whether to turn off power to the device.
- Couple of nits in commit text (Mani).
- Link to v3: https://lore.kernel.org/r/20260311-d3cold-v3-0-4d85dc7c2695@oss.qualcomm.com

Changes in v3:
- Changed the function name from pci_host_common_can_enter_d3cold() to
  pci_host_common_d3cold_possible() (Mani).
- Couple of nits for commit text, newlines etc. (Mani).
- Removed -ETIMEDOUT check and added -ENODEV & -EIO (Mani).
- Link to v2: https://lore.kernel.org/r/20260217-d3cold-v2-0-89b322864043@oss.qualcomm.com

Changes in v2:
- Updated the cover letter (Bjorn Andersson)
- Add get_ltssm helper function to read LTSSM state from PARF.
- Allow D3cold if there is no driver enabled for an endpoint.
- Added a separate patch to power down the PHY in the deinit path to avoid
  power leakage.
- Revert icc bw voting if resume fails (Bjorn Andersson).
- Link to v1: https://lore.kernel.org/r/20260128-d3cold-v1-0-dd8f3f0ce824@oss.qualcomm.com

---
Krishna Chaitanya Chundru (5):
      PCI: host-common: Add helper to determine host bridge D3cold eligibility
      PCI: qcom: Add .get_ltssm() helper
      PCI: qcom: Power down PHY via PARF_PHY_CTRL before disabling rails/clocks
      PCI: dwc: Use common D3cold eligibility helper in suspend path
      PCI: qcom: Add D3cold support

 drivers/pci/controller/dwc/pcie-designware-host.c |  11 +-
 drivers/pci/controller/dwc/pcie-designware.h      |   1 +
 drivers/pci/controller/dwc/pcie-qcom.c            | 194 +++++++++++++++-------
 drivers/pci/controller/pci-host-common.c          |  63 +++++++
 drivers/pci/controller/pci-host-common.h          |   2 +
 5 files changed, 204 insertions(+), 67 deletions(-)
---
base-commit: 3aae9383f42f687221c011d7ee87529398e826b3
change-id: 20251229-d3cold-bf99921960bb

Best regards,
-- 
Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>



^ permalink raw reply

* Re: [PATCH v12 2/2] arm: dts: aspeed: ventura: add Meta Ventura BMC
From: Andrew Lunn @ 2026-04-07 12:42 UTC (permalink / raw)
  To: PK Lee
  Cc: robh+dt, krzysztof.kozlowski+dt, conor+dt, joel, andrew,
	devicetree, linux-arm-kernel, linux-aspeed, linux-kernel,
	Jason-Hsu, p.k.lee
In-Reply-To: <CAK8yEODCyYxkggU+7=xzWFcXP6RMTpNbHyYRHZhahX7=b6reqA@mail.gmail.com>

On Tue, Apr 07, 2026 at 05:05:12PM +0800, PK Lee wrote:
> > > +&mac3 {
> > > +     status = "okay";
> > > +     phy-mode = "rmii";
> > > +     pinctrl-names = "default";
> > > +     pinctrl-0 = <&pinctrl_rmii4_default>;
> > > +     fixed-link {
> > > +             speed = <100>;
> > > +             full-duplex;
> > > +     };
> >
> > What is on the other end of this fixed link?
> 
> The other end of this fixed link is the CPU port of a Marvell 88E6393X
> switch. We are using this switch in unmanaged mode rather than using
> the DSA subsystem. Therefore, we use a fixed-link to force the mac3 to
> 100Mbps full-duplex RMII to match the CPU port configuration.

You are mixing up terms. The 88E6393X does not have a dedicated port
for connecting to the host CPU. Any port can be connected to the host,
using DSA tags. And all the ports are 1G or faster, so it seems odd to
limit it to 100Mbps. There is something considered a CPU port, but that
connects the internal Z80 CPU to the switch fabric.

> > > +};
> > > +
> > > +&mdio0 {
> > > +     status = "okay";
> > > +};
> >
> > If there are no devices on the bus, why enable it?
> 
> We intentionally enable it so user-space tools can access the switch
> registers. I have added a comment in v13 to clarify this.

Why would user space want to access the switch registers for an
unmanaged switch? It sounds like you are using Marvell's SDK in
userspace to manage the switch, rather than using DSA.

	Andrew


^ permalink raw reply

* Re: [PATCH v12 02/17] drm/bridge: Move legacy bridge driver out of imx directory for multi-platform use
From: Luca Ceresoli @ 2026-04-07 12:41 UTC (permalink / raw)
  To: Damon Ding, andrzej.hajda, neil.armstrong, rfoss,
	maarten.lankhorst, mripard, tzimmermann, airlied, simona,
	victor.liu, shawnguo, s.hauer, inki.dae, sw0312.kim,
	kyungmin.park, krzk, jingoohan1, p.zabel, hjc, heiko, andy.yan
  Cc: Laurent.pinchart, jonas, jernej.skrabec, kernel, festevam,
	alim.akhtar, dmitry.baryshkov, nicolas.frattaroli, dianders,
	m.szyprowski, linux-kernel, dri-devel, imx, linux-arm-kernel,
	linux-samsung-soc, linux-rockchip
In-Reply-To: <20260401091454.25730-3-damon.ding@rock-chips.com>

Hello Damon,

On Wed Apr 1, 2026 at 11:14 AM CEST, Damon Ding wrote:
> As suggested by Dmitry, the DRM legacy bridge driver can be pulled
> out of imx/ subdir for multi-platform use. The driver is also renamed
> to make it more generic and suitable for platforms other than i.MX.
>
> Signed-off-by: Damon Ding <damon.ding@rock-chips.com>
> Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
> Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Tested-by: Heiko Stuebner <heiko@sntech.de> (on rk3588)

I tried applying patches 1-9 to drm-misc-next but patch 2 does not apply due
to conflicts in the Kconfig file. Can you please rebase and send a new
iteration?

Luca

--
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply

* Re: [PATCH 3/4] perf arm_spe: Decode Arm N1 IMPDEF events
From: James Clark @ 2026-04-07 12:35 UTC (permalink / raw)
  To: Ian Rogers
  Cc: John Garry, Will Deacon, Mike Leach, Leo Yan, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Al Grant,
	linux-arm-kernel, linux-perf-users, linux-kernel
In-Reply-To: <CAP-5=fVcYOJ_3TWd6won5FVaH2MH6QwLhCoCRnNzzeE-9PgO1Q@mail.gmail.com>



On 02/04/2026 4:26 pm, Ian Rogers wrote:
> On Wed, Apr 1, 2026 at 7:26 AM James Clark <james.clark@linaro.org> wrote:
>>
>>  From the TRM [1], N1 has one IMPDEF event which isn't covered by the
>> common list. Add a framework so that more cores can be added in the
>> future and that the N1 IMPDEF event can be decoded. Also increase the
>> size of the buffer because we're adding more strings and if it gets
>> truncated it falls back to a hex dump only.
>>
>> [1]: https://developer.arm.com/documentation/100616/0401/Statistical-Profiling-Extension/implementation-defined-features-of-SPE
>> Suggested-by: Al Grant <al.grant@arm.com>
>> Signed-off-by: James Clark <james.clark@linaro.org>
>> ---
>>   tools/perf/util/arm-spe-decoder/Build              |  2 +
>>   .../util/arm-spe-decoder/arm-spe-pkt-decoder.c     | 45 ++++++++++++++++++++--
>>   .../util/arm-spe-decoder/arm-spe-pkt-decoder.h     |  5 ++-
>>   tools/perf/util/arm-spe.c                          | 13 ++++---
>>   4 files changed, 54 insertions(+), 11 deletions(-)
>>
>> diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
>> index ab500e0efe24..97a298d1e279 100644
>> --- a/tools/perf/util/arm-spe-decoder/Build
>> +++ b/tools/perf/util/arm-spe-decoder/Build
>> @@ -1 +1,3 @@
>>   perf-util-y += arm-spe-pkt-decoder.o arm-spe-decoder.o
>> +
>> +CFLAGS_arm-spe-pkt-decoder.o += -I$(srctree)/tools/arch/arm64/include/ -I$(OUTPUT)arch/arm64/include/generated/
>> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
>> index c880b0dec3a1..42a7501d4dfe 100644
>> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
>> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
>> @@ -15,6 +15,8 @@
>>
>>   #include "arm-spe-pkt-decoder.h"
>>
>> +#include "../../arm64/include/asm/cputype.h"
> 
> Sashiko spotted:
> https://sashiko.dev/#/patchset/20260401-james-spe-impdef-decode-v1-0-ad0d372c220c%40linaro.org
> """
> This isn't a bug, but does this include directive rely on accidental
> path normalization?
> 
> The relative path ../../arm64/include/asm/cputype.h does not exist relative
> to arm-spe-pkt-decoder.c. It only compiles because the Build file adds
> -I$(srctree)/tools/arch/arm64/include/ to CFLAGS.
> 
> Would it be cleaner to use #include <asm/cputype.h> to explicitly rely on
> the include path?
> [ ... ]
> """
> I wouldn't use <asm/cputype.h> due to cross-compilation and the like,
> instead just add the extra "../" into the include path.
> 

Do you mean change the #include to this?

   #include "../../../arm64/include/asm/cputype.h"

I still need to add:

   CFLAGS_arm-spe-pkt-decoder.o += -I$(srctree)/tools/arch/arm64/include/

To make this include in cputype.h work:

   #include <asm/sysreg.h>

Which probably only works because there isn't a sysreg.h on other 
architectures. But I'm not sure what the significance of ../../ vs
../../../ is if either compiles? arm-spe.c already does it with ../../
which is what I copied.

>> +
>>   static const char * const arm_spe_packet_name[] = {
>>          [ARM_SPE_PAD]           = "PAD",
>>          [ARM_SPE_END]           = "END",
>> @@ -307,6 +309,11 @@ static const struct ev_string common_ev_strings[] = {
>>          { .event = 0, .desc = NULL },
>>   };
>>
>> +static const struct ev_string n1_event_strings[] = {
>> +       { .event = 12, .desc = "LATE-PREFETCH" },
>> +       { .event = 0, .desc = NULL },
>> +};
>> +
>>   static u64 print_event_list(int *err, char **buf, size_t *buf_len,
>>                              const struct ev_string *ev_strings, u64 payload)
>>   {
>> @@ -318,14 +325,44 @@ static u64 print_event_list(int *err, char **buf, size_t *buf_len,
>>          return payload;
>>   }
>>
>> +struct event_print_handle {
>> +       const struct midr_range *midr_ranges;
>> +       const struct ev_string *ev_strings;
>> +};
>> +
>> +#define EV_PRINT(range, strings)                       \
>> +       {                                       \
>> +               .midr_ranges = range,           \
>> +               .ev_strings = strings,  \
>> +       }
>> +
>> +static const struct midr_range n1_event_encoding_cpus[] = {
>> +       MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
>> +       {},
>> +};
>> +
>> +static const struct event_print_handle event_print_handles[] = {
>> +       EV_PRINT(n1_event_encoding_cpus, n1_event_strings),
>> +};
>> +
>>   static int arm_spe_pkt_desc_event(const struct arm_spe_pkt *packet,
>> -                                 char *buf, size_t buf_len)
>> +                                 char *buf, size_t buf_len, u64 midr)
>>   {
>>          u64 payload = packet->payload;
>>          int err = 0;
>>
>>          arm_spe_pkt_out_string(&err, &buf, &buf_len, "EV");
>> -       print_event_list(&err, &buf, &buf_len, common_ev_strings, payload);
>> +       payload = print_event_list(&err, &buf, &buf_len, common_ev_strings,
>> +                                  payload);
>> +
>> +       /* Try to decode IMPDEF bits for known CPUs */
>> +       for (unsigned int i = 0; i < ARRAY_SIZE(event_print_handles); i++) {
>> +               if (is_midr_in_range_list(midr,
>> +                                         event_print_handles[i].midr_ranges))
>> +                       payload = print_event_list(&err, &buf, &buf_len,
>> +                                                  event_print_handles[i].ev_strings,
>> +                                                  payload);
>> +       }
>>
>>          return err;
>>   }
>> @@ -506,7 +543,7 @@ static int arm_spe_pkt_desc_counter(const struct arm_spe_pkt *packet,
>>   }
>>
>>   int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
>> -                    size_t buf_len)
>> +                    size_t buf_len, u64 midr)
>>   {
>>          int idx = packet->index;
>>          unsigned long long payload = packet->payload;
>> @@ -522,7 +559,7 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
>>                  arm_spe_pkt_out_string(&err, &buf, &blen, "%s", name);
>>                  break;
>>          case ARM_SPE_EVENTS:
>> -               err = arm_spe_pkt_desc_event(packet, buf, buf_len);
>> +               err = arm_spe_pkt_desc_event(packet, buf, buf_len, midr);
>>                  break;
>>          case ARM_SPE_OP_TYPE:
>>                  err = arm_spe_pkt_desc_op_type(packet, buf, buf_len);
>> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
>> index adf4cde320aa..17b067fe3c87 100644
>> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
>> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
>> @@ -11,7 +11,7 @@
>>   #include <stddef.h>
>>   #include <stdint.h>
>>
>> -#define ARM_SPE_PKT_DESC_MAX           256
>> +#define ARM_SPE_PKT_DESC_MAX           512
>>
>>   #define ARM_SPE_NEED_MORE_BYTES                -1
>>   #define ARM_SPE_BAD_PACKET             -2
>> @@ -186,5 +186,6 @@ const char *arm_spe_pkt_name(enum arm_spe_pkt_type);
>>   int arm_spe_get_packet(const unsigned char *buf, size_t len,
>>                         struct arm_spe_pkt *packet);
>>
>> -int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf, size_t len);
>> +int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf, size_t len,
>> +                    u64 midr);
>>   #endif
>> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
>> index 7447b000f9cd..46f0309c092b 100644
>> --- a/tools/perf/util/arm-spe.c
>> +++ b/tools/perf/util/arm-spe.c
>> @@ -135,7 +135,7 @@ struct data_source_handle {
>>          }
>>
>>   static void arm_spe_dump(struct arm_spe *spe __maybe_unused,
>> -                        unsigned char *buf, size_t len)
>> +                        unsigned char *buf, size_t len, u64 midr)
>>   {
>>          struct arm_spe_pkt packet;
>>          size_t pos = 0;
>> @@ -161,7 +161,7 @@ static void arm_spe_dump(struct arm_spe *spe __maybe_unused,
>>                          color_fprintf(stdout, color, "   ");
>>                  if (ret > 0) {
>>                          ret = arm_spe_pkt_desc(&packet, desc,
>> -                                              ARM_SPE_PKT_DESC_MAX);
>> +                                              ARM_SPE_PKT_DESC_MAX, midr);
>>                          if (!ret)
>>                                  color_fprintf(stdout, color, " %s\n", desc);
>>                  } else {
>> @@ -174,10 +174,10 @@ static void arm_spe_dump(struct arm_spe *spe __maybe_unused,
>>   }
>>
>>   static void arm_spe_dump_event(struct arm_spe *spe, unsigned char *buf,
>> -                              size_t len)
>> +                              size_t len, u64 midr)
>>   {
>>          printf(".\n");
>> -       arm_spe_dump(spe, buf, len);
>> +       arm_spe_dump(spe, buf, len, midr);
>>   }
>>
>>   static int arm_spe_get_trace(struct arm_spe_buffer *b, void *data)
>> @@ -1469,8 +1469,11 @@ static int arm_spe_process_auxtrace_event(struct perf_session *session,
>>                  /* Dump here now we have copied a piped trace out of the pipe */
>>                  if (dump_trace) {
>>                          if (auxtrace_buffer__get_data(buffer, fd)) {
>> +                               u64 midr = 0;
>> +
>> +                               arm_spe__get_midr(spe, buffer->cpu.cpu, &midr);
> 
> Sashiko claims to have spotted an issue here:
> """
> Is it possible for arm_spe__get_midr() to cause a segmentation fault here?
> 
> If the trace is from an older recording (metadata version 1) and the
> environment lacks a CPUID string (such as during cross-architecture
> analysis), perf_env__cpuid() returns NULL.
> 
> It appears arm_spe__get_midr() then passes this NULL pointer to
> strtol(cpuid, NULL, 16), which leads to undefined behavior.
> """
> 
> But this feels like, if this happens you're already having a bad time
> and these changes aren't necessarily making things worse.
> 
> Thanks,
> Ian
> 

Yeah I think it might be possible so I can add an error instead of a 
segfault. I'll check the rest of the Sashiko comments too.
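
As a rough illustration of the "error instead of a segfault" idea, here is
a minimal userspace sketch. parse_midr_hex() is a hypothetical helper, not
the actual arm_spe__get_midr() code, and it uses strtoull() rather than the
strtol() call mentioned in the report so that a failed conversion can be
detected through the end pointer:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not the perf implementation): parse a hex CPUID
 * string into a MIDR value. A missing or unparsable string yields
 * -EINVAL instead of being dereferenced.
 */
static int parse_midr_hex(const char *cpuid, uint64_t *midr)
{
	char *end;
	unsigned long long val;

	/* perf_env__cpuid() may return NULL, so check before parsing */
	if (!cpuid || !midr)
		return -EINVAL;

	errno = 0;
	val = strtoull(cpuid, &end, 16);
	/* end == cpuid means no hex digits were consumed at all */
	if (errno || end == cpuid)
		return -EINVAL;

	*midr = val;
	return 0;
}
```

A caller in the dump path could then leave midr at 0 on failure, which
would simply skip the IMPDEF decoding rather than crash.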

>>                                  arm_spe_dump_event(spe, buffer->data,
>> -                                               buffer->size);
>> +                                               buffer->size, midr);
>>                                  auxtrace_buffer__put_data(buffer);
>>                          }
>>                  }
>>
>> --
>> 2.34.1
>>



^ permalink raw reply

* Re: [PATCH v3 5/5] PCI: qcom: Add D3cold support
From: Konrad Dybcio @ 2026-04-07 12:27 UTC (permalink / raw)
  To: Krishna Chaitanya Chundru, Jingoo Han, Manivannan Sadhasivam,
	Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
	Bjorn Helgaas, Will Deacon
  Cc: linux-pci, linux-kernel, linux-arm-msm, linux-arm-kernel,
	jonathanh, bjorn.andersson
In-Reply-To: <2d193ac6-619c-4cd7-b0f1-39f5aa1ec02b@oss.qualcomm.com>

On 4/7/26 1:37 PM, Krishna Chaitanya Chundru wrote:
> 
> 
> On 4/7/2026 5:06 PM, Konrad Dybcio wrote:
>> On 4/6/26 11:08 AM, Krishna Chaitanya Chundru wrote:
>>>
>>> On 3/17/2026 2:45 PM, Konrad Dybcio wrote:
>>>> On 3/11/26 11:26 AM, Krishna Chaitanya Chundru wrote:
>>>>> Add support for transitioning PCIe endpoints & bridges into D3cold by
>>>>> integrating with the DWC core suspend/resume helpers.
>>>>>
>>>>> Implement PME_TurnOff message generation via ELBI_SYS_CTRL and hook it
>>>>> into the DWC host operations so the controller follows the standard
>>>>> PME_TurnOff-based power-down sequence before entering D3cold.
>>>>>
>>>>> When the device is suspended into D3cold, fully tear down interconnect
>>>>> bandwidth, OPP votes. If D3cold is not entered, retain existing behavior
>>>>> by keeping the required interconnect and OPP votes.
>>>>>
>>>>> Drop the qcom_pcie::suspended flag and rely on the existing
>>>>> dw_pcie::suspended state, which now drives both the power-management
>>>>> flow and the interconnect/OPP handling.
>>>>>
>>>>> Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
>>>>> ---
>>>> [...]
>>>>
>>>>>           ret = icc_disable(pcie->icc_cpu);
>>>>>           if (ret)
>>>>>               dev_err(dev, "Failed to disable CPU-PCIe interconnect path: %d\n", ret);
>>>>>             if (pcie->use_pm_opp)
>>>>>               dev_pm_opp_set_opp(pcie->pci->dev, NULL);
>>>> Does calling .suspend not drop the vote by itself?
>>> No, unlike genpd framework for power domains, opp votes will not be removed as part of suspend.
>> Hm, I would imagine the power vote goes down.. is that the ICC vote
>> that's still hanging if we don't do this?
> yes, ICC votes are still present

OK, thanks for confirming

Konrad


^ permalink raw reply

* Re: [PATCH v10 6/6] usb: typec: tcpm/tcpci_maxim: deprecate WAR for setting charger mode
From: Heikki Krogerus @ 2026-04-07 12:24 UTC (permalink / raw)
  To: Amit Sunil Dhamne
  Cc: André Draszik, Lee Jones, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Greg Kroah-Hartman, Jagan Sridharan, Mark Brown,
	Matti Vaittinen, Andrew Morton, Sebastian Reichel, Peter Griffin,
	Tudor Ambarus, Alim Akhtar, linux-kernel, devicetree, linux-usb,
	linux-pm, linux-arm-kernel, linux-samsung-soc, RD Babiera,
	Kyle Tso
In-Reply-To: <017b8552-87e2-4409-ae34-9a3ab7365a68@google.com>

Hi Amit,

On Thu, Apr 02, 2026 at 11:47:30AM -0700, Amit Sunil Dhamne wrote:
> Hi Heikki,
> 
> On 4/2/26 7:33 AM, Heikki Krogerus wrote:
> > Hi Amit,
> > 
> > > +static int get_vbus_regulator_handle(struct max_tcpci_chip *chip)
> > > +{
> > > +	if (IS_ERR_OR_NULL(chip->vbus_reg)) {
> > > +		chip->vbus_reg = devm_regulator_get_exclusive(chip->dev,
> > > +							      "vbus");
> > Sorry to go back to this, but why can't you just get the regulator in
> > max_tcpci_probe()?
> 
> Thanks for calling this out. This was an intentional design decision to
> break a circular dependency.
> 
> The charger driver is guaranteed to probe after the TCPC driver due to a
> power supply dependency (the TCPC is a supplier of power for the Battery
> Charger). However, the charger driver is also the regulator provider for
> VBUS out (when Type-C goes into source mode).
> 
> Because of this, the regulator handle will not be available during the TCPC
> driver's probe. If we tried to fetch it in max_tcpci_probe() and returned
> -EPROBE_DEFER, it would create a probe deadlock, as the charger would then
> never probe. Therefore, I made the decision to get the regulator handle
> lazily and on-demand.

Got it. Thanks for the explanation!
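
For readers following along, the lazy lookup described above can be
sketched in plain userspace C. Everything here is a hypothetical stand-in
(the fake_* names and the provider_ready flag are not kernel APIs); it only
shows the shape of the pattern: probe never asks for the handle, and the
first user fetches and caches it once the provider exists.

```c
#include <stddef.h>

/* Hypothetical stand-in for a regulator handle */
struct fake_regulator {
	int enabled;
};

static struct fake_regulator provider_storage;
static int provider_ready;	/* set once the "charger" has probed */

/* Returns NULL until the provider has registered */
static struct fake_regulator *fake_regulator_get(void)
{
	return provider_ready ? &provider_storage : NULL;
}

struct fake_tcpc {
	struct fake_regulator *vbus_reg;	/* NULL until first successful lookup */
};

/* Probe does not touch the regulator, so it cannot defer on it */
static int fake_tcpc_probe(struct fake_tcpc *chip)
{
	chip->vbus_reg = NULL;
	return 0;
}

/* Look the handle up on first use and cache it afterwards */
static int fake_tcpc_set_vbus(struct fake_tcpc *chip, int on)
{
	if (!chip->vbus_reg) {
		chip->vbus_reg = fake_regulator_get();
		if (!chip->vbus_reg)
			return -1;	/* provider not there yet, caller may retry */
	}
	chip->vbus_reg->enabled = on;
	return 0;
}
```

In this sketch a set_vbus request made before the provider shows up fails
cleanly and can be retried later, while probe itself always completes.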

-- 
heikki



^ permalink raw reply

