public inbox for linux-riscv@lists.infradead.org
* [PATCH 0/5] riscv: irqchip: Optimization of interrupt handling
@ 2025-01-13 15:09 Xu Lu
  2025-01-13 15:09 ` [PATCH 1/5] irqchip/riscv-intc: Balance priority and fairness during irq handling Xu Lu
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Xu Lu @ 2025-01-13 15:09 UTC (permalink / raw)
  To: daniel.lezcano, tglx, anup, paul.walmsley, palmer
  Cc: lihangjing, xieyongji, linux-riscv, linux-kernel, Xu Lu

This patch series optimizes the existing interrupt handling procedure.
First, it balances interrupt priority against fairness so that
lower-priority interrupts do not get starved. It also inserts barriers to
ensure ordering between normal memory writes and IPI issuing.

Xu Lu (5):
  irqchip/riscv-intc: Balance priority and fairness during irq handling
  irqchip/riscv-imsic: Add a threshold to ext irq handling times
  irqchip/riscv-imsic: Use wmb() to order normal writes and IPI writes
  irqchip/timer-clint: Use wmb() to order normal writes and IPI writes
  irqchip/aclint-sswi: Use wmb() to order normal writes and IPI writes

 drivers/clocksource/timer-clint.c            |  6 ++++
 drivers/irqchip/irq-riscv-imsic-early.c      | 37 +++++++++++++-------
 drivers/irqchip/irq-riscv-intc.c             | 32 +++++++++++++----
 drivers/irqchip/irq-thead-c900-aclint-sswi.c |  6 ++++
 4 files changed, 62 insertions(+), 19 deletions(-)

-- 
2.20.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv


* [PATCH 1/5] irqchip/riscv-intc: Balance priority and fairness during irq handling
  2025-01-13 15:09 [PATCH 0/5] riscv: irqchip: Optimization of interrupt handling Xu Lu
@ 2025-01-13 15:09 ` Xu Lu
  2025-01-15 11:39   ` Anup Patel
  2025-01-13 15:09 ` [PATCH 2/5] irqchip/riscv-imsic: Add a threshold to ext irq handling times Xu Lu
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Xu Lu @ 2025-01-13 15:09 UTC (permalink / raw)
  To: daniel.lezcano, tglx, anup, paul.walmsley, palmer
  Cc: lihangjing, xieyongji, linux-riscv, linux-kernel, Xu Lu

Both the cause CSR and the topi CSR record only the pending interrupt with
the highest priority. If high-priority interrupts arrive frequently within
a certain period of time, lower-priority interrupts never get a chance to
be handled.

For example, if external interrupts and software interrupts arrive very
frequently, timer interrupts will never be handled. The buddy watchdog on
another CPU will then report a hard lockup on the current CPU, even though
the current CPU can still receive irqs.

This commit solves the problem by handling all pending irqs in each round,
processing them in order of their priority.

Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
---
 drivers/irqchip/irq-riscv-intc.c | 32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/irqchip/irq-riscv-intc.c b/drivers/irqchip/irq-riscv-intc.c
index f653c13de62b..bc2ec26aa9e9 100644
--- a/drivers/irqchip/irq-riscv-intc.c
+++ b/drivers/irqchip/irq-riscv-intc.c
@@ -26,20 +26,40 @@ static unsigned int riscv_intc_nr_irqs __ro_after_init = BITS_PER_LONG;
 static unsigned int riscv_intc_custom_base __ro_after_init = BITS_PER_LONG;
 static unsigned int riscv_intc_custom_nr_irqs __ro_after_init;
 
+static unsigned int riscv_prio_irqs[] = {
+#ifdef CONFIG_RISCV_M_MODE
+	IRQ_M_EXT, IRQ_M_SOFT, IRQ_M_TIMER,
+#endif
+	IRQ_S_EXT, IRQ_S_SOFT, IRQ_S_TIMER, IRQ_S_GEXT,
+	IRQ_VS_EXT, IRQ_VS_SOFT, IRQ_VS_TIMER,
+	IRQ_PMU_OVF,
+};
+
 static void riscv_intc_irq(struct pt_regs *regs)
 {
-	unsigned long cause = regs->cause & ~CAUSE_IRQ_FLAG;
-
-	if (generic_handle_domain_irq(intc_domain, cause))
-		pr_warn_ratelimited("Failed to handle interrupt (cause: %ld)\n", cause);
+	unsigned long pending = csr_read(CSR_IP) & csr_read(CSR_IE);
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(riscv_prio_irqs); i++)
+		if (pending & (1UL << riscv_prio_irqs[i]))
+			if (generic_handle_domain_irq(intc_domain, riscv_prio_irqs[i]))
+				pr_warn_ratelimited("Failed to handle interrupt (cause: %u)\n",
+						    riscv_prio_irqs[i]);
 }
 
 static void riscv_intc_aia_irq(struct pt_regs *regs)
 {
 	unsigned long topi;
+	unsigned long pending;
+	unsigned int i;
+
+	while ((topi = csr_read(CSR_TOPI))) {
+		pending = csr_read(CSR_IP) & csr_read(CSR_IE);
 
-	while ((topi = csr_read(CSR_TOPI)))
-		generic_handle_domain_irq(intc_domain, topi >> TOPI_IID_SHIFT);
+		for (i = 0; i < ARRAY_SIZE(riscv_prio_irqs); i++)
+			if (pending & (1UL << riscv_prio_irqs[i]))
+				generic_handle_domain_irq(intc_domain, riscv_prio_irqs[i]);
+	}
 }
 
 /*
-- 
2.20.1



* [PATCH 2/5] irqchip/riscv-imsic: Add a threshold to ext irq handling times
  2025-01-13 15:09 [PATCH 0/5] riscv: irqchip: Optimization of interrupt handling Xu Lu
  2025-01-13 15:09 ` [PATCH 1/5] irqchip/riscv-intc: Balance priority and fairness during irq handling Xu Lu
@ 2025-01-13 15:09 ` Xu Lu
  2025-01-13 15:09 ` [PATCH 3/5] irqchip/riscv-imsic: Use wmb() to order normal writes and IPI writes Xu Lu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Xu Lu @ 2025-01-13 15:09 UTC (permalink / raw)
  To: daniel.lezcano, tglx, anup, paul.walmsley, palmer
  Cc: lihangjing, xieyongji, linux-riscv, linux-kernel, Xu Lu

On RISC-V, external interrupts have higher priority than software, timer,
and PMU overflow interrupts. In the existing implementation of the IMSIC
driver, if external irqs keep arriving within a certain period of time, no
other interrupt gets a chance to be handled.

This commit solves the problem by introducing a threshold on the number of
IMSIC interrupts that can be processed in each round.

Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
---
 drivers/irqchip/irq-riscv-imsic-early.c | 31 ++++++++++++++-----------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/irqchip/irq-riscv-imsic-early.c b/drivers/irqchip/irq-riscv-imsic-early.c
index c5c2e6929a2f..63097f2bbadf 100644
--- a/drivers/irqchip/irq-riscv-imsic-early.c
+++ b/drivers/irqchip/irq-riscv-imsic-early.c
@@ -20,6 +20,8 @@
 
 #include "irq-riscv-imsic-state.h"
 
+#define IMSIC_HANDLE_THRESHOLD		20
+
 static int imsic_parent_irq;
 
 #ifdef CONFIG_SMP
@@ -76,6 +78,7 @@ static void imsic_handle_irq(struct irq_desc *desc)
 	int err, cpu = smp_processor_id();
 	struct imsic_vector *vec;
 	unsigned long local_id;
+	unsigned int handled = 0;
 
 	chained_irq_enter(chip, desc);
 
@@ -85,21 +88,23 @@ static void imsic_handle_irq(struct irq_desc *desc)
 		if (local_id == IMSIC_IPI_ID) {
 			if (IS_ENABLED(CONFIG_SMP))
 				ipi_mux_process();
-			continue;
-		}
-
-		if (unlikely(!imsic->base_domain))
-			continue;
-
-		vec = imsic_vector_from_local_id(cpu, local_id);
-		if (!vec) {
-			pr_warn_ratelimited("vector not found for local ID 0x%lx\n", local_id);
-			continue;
+		} else if (likely(imsic->base_domain)) {
+			vec = imsic_vector_from_local_id(cpu, local_id);
+
+			if (unlikely(!vec))
+				pr_warn_ratelimited("vector not found for local ID 0x%lx\n",
+						    local_id);
+			else {
+				err = generic_handle_domain_irq(imsic->base_domain, vec->hwirq);
+
+				if (unlikely(err))
+					pr_warn_ratelimited("hwirq 0x%x mapping not found\n",
+							    vec->hwirq);
+			}
 		}
 
-		err = generic_handle_domain_irq(imsic->base_domain, vec->hwirq);
-		if (unlikely(err))
-			pr_warn_ratelimited("hwirq 0x%x mapping not found\n", vec->hwirq);
+		if (handled++ >= IMSIC_HANDLE_THRESHOLD)
+			break;
 	}
 
 	chained_irq_exit(chip, desc);
-- 
2.20.1



* [PATCH 3/5] irqchip/riscv-imsic: Use wmb() to order normal writes and IPI writes
  2025-01-13 15:09 [PATCH 0/5] riscv: irqchip: Optimization of interrupt handling Xu Lu
  2025-01-13 15:09 ` [PATCH 1/5] irqchip/riscv-intc: Balance priority and fairness during irq handling Xu Lu
  2025-01-13 15:09 ` [PATCH 2/5] irqchip/riscv-imsic: Add a threshold to ext irq handling times Xu Lu
@ 2025-01-13 15:09 ` Xu Lu
  2025-01-14  4:34   ` Anup Patel
  2025-01-13 15:09 ` [PATCH 4/5] irqchip/timer-clint: " Xu Lu
  2025-01-13 15:09 ` [PATCH 5/5] irqchip/aclint-sswi: " Xu Lu
  4 siblings, 1 reply; 16+ messages in thread
From: Xu Lu @ 2025-01-13 15:09 UTC (permalink / raw)
  To: daniel.lezcano, tglx, anup, paul.walmsley, palmer
  Cc: lihangjing, xieyongji, linux-riscv, linux-kernel, Xu Lu

During an IPI procedure, we need to ensure that all previous write
operations are visible to other CPUs before sending the actual IPI. Use a
wmb() barrier to ensure this, since the IMSIC issues IPIs via MMIO writes.

Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
---
 drivers/irqchip/irq-riscv-imsic-early.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/irqchip/irq-riscv-imsic-early.c b/drivers/irqchip/irq-riscv-imsic-early.c
index 63097f2bbadf..c6317cb557fb 100644
--- a/drivers/irqchip/irq-riscv-imsic-early.c
+++ b/drivers/irqchip/irq-riscv-imsic-early.c
@@ -29,6 +29,12 @@ static void imsic_ipi_send(unsigned int cpu)
 {
 	struct imsic_local_config *local = per_cpu_ptr(imsic->global.local, cpu);
 
+	/*
+	 * Ensure that stores to normal memory are visible to the other CPUs
+	 * before issuing IPI.
+	 */
+	wmb();
+
 	writel_relaxed(IMSIC_IPI_ID, local->msi_va);
 }
 
-- 
2.20.1



* [PATCH 4/5] irqchip/timer-clint: Use wmb() to order normal writes and IPI writes
  2025-01-13 15:09 [PATCH 0/5] riscv: irqchip: Optimization of interrupt handling Xu Lu
                   ` (2 preceding siblings ...)
  2025-01-13 15:09 ` [PATCH 3/5] irqchip/riscv-imsic: Use wmb() to order normal writes and IPI writes Xu Lu
@ 2025-01-13 15:09 ` Xu Lu
  2025-01-14  4:34   ` Anup Patel
  2025-01-13 15:09 ` [PATCH 5/5] irqchip/aclint-sswi: " Xu Lu
  4 siblings, 1 reply; 16+ messages in thread
From: Xu Lu @ 2025-01-13 15:09 UTC (permalink / raw)
  To: daniel.lezcano, tglx, anup, paul.walmsley, palmer
  Cc: lihangjing, xieyongji, linux-riscv, linux-kernel, Xu Lu

During an IPI procedure, we need to ensure that all previous write
operations are visible to other CPUs before sending the actual IPI. Use a
wmb() barrier to ensure this, since the CLINT issues IPIs via MMIO writes.

Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
---
 drivers/clocksource/timer-clint.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/clocksource/timer-clint.c b/drivers/clocksource/timer-clint.c
index 0bdd9d7ec545..8d73b45f9966 100644
--- a/drivers/clocksource/timer-clint.c
+++ b/drivers/clocksource/timer-clint.c
@@ -48,6 +48,12 @@ EXPORT_SYMBOL(clint_time_val);
 #ifdef CONFIG_SMP
 static void clint_send_ipi(unsigned int cpu)
 {
+	/*
+	 * Ensure that stores to normal memory are visible to the other CPUs
+	 * before issuing IPI.
+	 */
+	wmb();
+
 	writel(1, clint_ipi_base + cpuid_to_hartid_map(cpu));
 }
 
-- 
2.20.1



* [PATCH 5/5] irqchip/aclint-sswi: Use wmb() to order normal writes and IPI writes
  2025-01-13 15:09 [PATCH 0/5] riscv: irqchip: Optimization of interrupt handling Xu Lu
                   ` (3 preceding siblings ...)
  2025-01-13 15:09 ` [PATCH 4/5] irqchip/timer-clint: " Xu Lu
@ 2025-01-13 15:09 ` Xu Lu
  2025-01-14  4:34   ` Anup Patel
  4 siblings, 1 reply; 16+ messages in thread
From: Xu Lu @ 2025-01-13 15:09 UTC (permalink / raw)
  To: daniel.lezcano, tglx, anup, paul.walmsley, palmer
  Cc: lihangjing, xieyongji, linux-riscv, linux-kernel, Xu Lu

During an IPI procedure, we need to ensure that all previous write
operations are visible to other CPUs before sending the actual IPI. Use a
wmb() barrier to ensure this, since the ACLINT SSWI issues IPIs via MMIO
writes.

Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
---
 drivers/irqchip/irq-thead-c900-aclint-sswi.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/irqchip/irq-thead-c900-aclint-sswi.c b/drivers/irqchip/irq-thead-c900-aclint-sswi.c
index b0e366ade427..7246a008a0f0 100644
--- a/drivers/irqchip/irq-thead-c900-aclint-sswi.c
+++ b/drivers/irqchip/irq-thead-c900-aclint-sswi.c
@@ -31,6 +31,12 @@ static DEFINE_PER_CPU(void __iomem *, sswi_cpu_regs);
 
 static void thead_aclint_sswi_ipi_send(unsigned int cpu)
 {
+	/*
+	 * Ensure that stores to normal memory are visible to the other CPUs
+	 * before issuing IPI.
+	 */
+	wmb();
+
 	writel_relaxed(0x1, per_cpu(sswi_cpu_regs, cpu));
 }
 
-- 
2.20.1



* Re: [PATCH 3/5] irqchip/riscv-imsic: Use wmb() to order normal writes and IPI writes
  2025-01-13 15:09 ` [PATCH 3/5] irqchip/riscv-imsic: Use wmb() to order normal writes and IPI writes Xu Lu
@ 2025-01-14  4:34   ` Anup Patel
  2025-01-14  6:39     ` [External] " Xu Lu
  0 siblings, 1 reply; 16+ messages in thread
From: Anup Patel @ 2025-01-14  4:34 UTC (permalink / raw)
  To: Xu Lu
  Cc: daniel.lezcano, tglx, paul.walmsley, palmer, lihangjing,
	xieyongji, linux-riscv, linux-kernel

On Mon, Jan 13, 2025 at 8:39 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
>
> During an IPI procedure, we need to ensure all previous write operations
> are visible to other CPUs before sending a real IPI. We use wmb() barrier
> to ensure this as IMSIC issues IPI via mmio writes.
>
> Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
> ---
>  drivers/irqchip/irq-riscv-imsic-early.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/irqchip/irq-riscv-imsic-early.c b/drivers/irqchip/irq-riscv-imsic-early.c
> index 63097f2bbadf..c6317cb557fb 100644
> --- a/drivers/irqchip/irq-riscv-imsic-early.c
> +++ b/drivers/irqchip/irq-riscv-imsic-early.c
> @@ -29,6 +29,12 @@ static void imsic_ipi_send(unsigned int cpu)
>  {
>         struct imsic_local_config *local = per_cpu_ptr(imsic->global.local, cpu);
>
> +       /*
> +        * Ensure that stores to normal memory are visible to the other CPUs
> +        * before issuing IPI.
> +        */
> +       wmb();
> +

imsic_ipi_send() is called through ipi_mux_send_mask(), which does
smp_mb__after_atomic() before calling it, so no barrier is needed here.
Also, barriers need to come in pairs, so adding a single unpaired barrier
at an arbitrary location is inappropriate.
(Refer, https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/irq/ipi-mux.c?h=v6.13-rc7#n78)
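
For reference, a paraphrased sketch (not verbatim kernel code) of the
barrier pairing inside ipi_mux_send_mask() at the link above:

```c
/* Paraphrased from kernel/irq/ipi-mux.c (around v6.13-rc7); see link. */
for_each_cpu(cpu, mask) {
	icpu = per_cpu_ptr(ipi_mux_pcpu, cpu);

	/* Mark this virtual IPI pending for the target CPU ... */
	pending = atomic_fetch_or_release(ibit, &icpu->bits);

	/* ... and order that update before the physical IPI below. */
	smp_mb__after_atomic();

	/* Send the parent IPI only if one was not already pending. */
	if (!(pending & ibit))
		ipi_mux_send(cpu);
}
```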

Based on the above, this patch is not needed.

Regards,
Anup


* Re: [PATCH 4/5] irqchip/timer-clint: Use wmb() to order normal writes and IPI writes
  2025-01-13 15:09 ` [PATCH 4/5] irqchip/timer-clint: " Xu Lu
@ 2025-01-14  4:34   ` Anup Patel
  0 siblings, 0 replies; 16+ messages in thread
From: Anup Patel @ 2025-01-14  4:34 UTC (permalink / raw)
  To: Xu Lu
  Cc: daniel.lezcano, tglx, paul.walmsley, palmer, lihangjing,
	xieyongji, linux-riscv, linux-kernel

On Mon, Jan 13, 2025 at 8:39 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
>
> During an IPI procedure, we need to ensure all previous write operations
> are visible to other CPUs before sending a real IPI. We use wmb() barrier
> to ensure this as CLINT issues IPI via mmio writes.
>
> Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
> ---
>  drivers/clocksource/timer-clint.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/clocksource/timer-clint.c b/drivers/clocksource/timer-clint.c
> index 0bdd9d7ec545..8d73b45f9966 100644
> --- a/drivers/clocksource/timer-clint.c
> +++ b/drivers/clocksource/timer-clint.c
> @@ -48,6 +48,12 @@ EXPORT_SYMBOL(clint_time_val);
>  #ifdef CONFIG_SMP
>  static void clint_send_ipi(unsigned int cpu)
>  {
> +       /*
> +        * Ensure that stores to normal memory are visible to the other CPUs
> +        * before issuing IPI.
> +        */
> +       wmb();
> +

Same comment as PATCH3.

clint_send_ipi() is called through ipi_mux_send_mask(), which does
smp_mb__after_atomic() before calling it, so no barrier is needed here.
Also, barriers need to come in pairs, so adding a single unpaired barrier
at an arbitrary location is inappropriate.
(Refer, https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/irq/ipi-mux.c?h=v6.13-rc7#n78)

Based on the above, this patch is not needed.

Regards,
Anup


* Re: [PATCH 5/5] irqchip/aclint-sswi: Use wmb() to order normal writes and IPI writes
  2025-01-13 15:09 ` [PATCH 5/5] irqchip/aclint-sswi: " Xu Lu
@ 2025-01-14  4:34   ` Anup Patel
  0 siblings, 0 replies; 16+ messages in thread
From: Anup Patel @ 2025-01-14  4:34 UTC (permalink / raw)
  To: Xu Lu
  Cc: daniel.lezcano, tglx, paul.walmsley, palmer, lihangjing,
	xieyongji, linux-riscv, linux-kernel

On Mon, Jan 13, 2025 at 8:40 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
>
> During an IPI procedure, we need to ensure all previous write operations
> are visible to other CPUs before sending a real IPI. We use wmb() barrier
> to ensure this as ACLINT SSWI issues IPI via mmio writes.
>
> Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
> ---
>  drivers/irqchip/irq-thead-c900-aclint-sswi.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/irqchip/irq-thead-c900-aclint-sswi.c b/drivers/irqchip/irq-thead-c900-aclint-sswi.c
> index b0e366ade427..7246a008a0f0 100644
> --- a/drivers/irqchip/irq-thead-c900-aclint-sswi.c
> +++ b/drivers/irqchip/irq-thead-c900-aclint-sswi.c
> @@ -31,6 +31,12 @@ static DEFINE_PER_CPU(void __iomem *, sswi_cpu_regs);
>
>  static void thead_aclint_sswi_ipi_send(unsigned int cpu)
>  {
> +       /*
> +        * Ensure that stores to normal memory are visible to the other CPUs
> +        * before issuing IPI.
> +        */
> +       wmb();
> +

Same comment as PATCH3.

thead_aclint_sswi_ipi_send() is called through ipi_mux_send_mask(), which
does smp_mb__after_atomic() before calling it, so no barrier is needed
here. Also, barriers need to come in pairs, so adding a single unpaired
barrier at an arbitrary location is inappropriate.
(Refer, https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/irq/ipi-mux.c?h=v6.13-rc7#n78)

Based on the above, this patch is not needed.

Regards,
Anup


* Re: [External] Re: [PATCH 3/5] irqchip/riscv-imsic: Use wmb() to order normal writes and IPI writes
  2025-01-14  4:34   ` Anup Patel
@ 2025-01-14  6:39     ` Xu Lu
  2025-01-14  8:58       ` Anup Patel
  0 siblings, 1 reply; 16+ messages in thread
From: Xu Lu @ 2025-01-14  6:39 UTC (permalink / raw)
  To: Anup Patel
  Cc: daniel.lezcano, tglx, paul.walmsley, palmer, lihangjing,
	xieyongji, linux-riscv, linux-kernel

Hi Anup,

I want to ensure that when the receiver CPU traps into the interrupt
handler, it can see all memory updates that the sender made beforehand.

smp_mb__after_atomic() uses 'fence rw, rw', which orders normal memory
reads/writes. The IMSIC write is actually an IO write and is not ordered
by it.

So I worry that, in extreme cases, the receiver CPU could take the IPI
first, see no update to ipi_mux_pcpu->bits, and return immediately. That
IPI would then not be handled until another IPI arrives.

Best Regards,

Xu Lu

On Tue, Jan 14, 2025 at 12:34 PM Anup Patel <anup@brainfault.org> wrote:
>
> On Mon, Jan 13, 2025 at 8:39 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
> >
> > During an IPI procedure, we need to ensure all previous write operations
> > are visible to other CPUs before sending a real IPI. We use wmb() barrier
> > to ensure this as IMSIC issues IPI via mmio writes.
> >
> > Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
> > ---
> >  drivers/irqchip/irq-riscv-imsic-early.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/irqchip/irq-riscv-imsic-early.c b/drivers/irqchip/irq-riscv-imsic-early.c
> > index 63097f2bbadf..c6317cb557fb 100644
> > --- a/drivers/irqchip/irq-riscv-imsic-early.c
> > +++ b/drivers/irqchip/irq-riscv-imsic-early.c
> > @@ -29,6 +29,12 @@ static void imsic_ipi_send(unsigned int cpu)
> >  {
> >         struct imsic_local_config *local = per_cpu_ptr(imsic->global.local, cpu);
> >
> > +       /*
> > +        * Ensure that stores to normal memory are visible to the other CPUs
> > +        * before issuing IPI.
> > +        */
> > +       wmb();
> > +
>
> The imsic_ipi_send() is called through ipi_mux_send_mask()
> which does smp_mb__after_atomic() before calling so no need
> for any barrier here. Also, barriers need to be in-pair so adding
> a single barrier at random location is inappropriate.
> (Refer, https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/irq/ipi-mux.c?h=v6.13-rc7#n78)
>
> Based on the above, this patch is not needed.
>
> Regards,
> Anup


* Re: [External] Re: [PATCH 3/5] irqchip/riscv-imsic: Use wmb() to order normal writes and IPI writes
  2025-01-14  6:39     ` [External] " Xu Lu
@ 2025-01-14  8:58       ` Anup Patel
  2025-01-14  9:07         ` Xu Lu
  0 siblings, 1 reply; 16+ messages in thread
From: Anup Patel @ 2025-01-14  8:58 UTC (permalink / raw)
  To: Xu Lu
  Cc: daniel.lezcano, tglx, paul.walmsley, palmer, lihangjing,
	xieyongji, linux-riscv, linux-kernel

On Tue, Jan 14, 2025 at 12:09 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
>
> Hi Anup,
>
> I want to ensure when the receiver CPU traps into interrupt, it can
> see all memory updates made by the sender before.
>
> The smp_mb__after_atomic() uses 'fence rw, rw' which orders normal
> memory reads/writes. The IMSIC write is actually IO writes and is not
> ordered by it.
>
> So I worry in extreme cases, the receiver CPU can receive the IPI
> first, see no updates to ipi_mux_pcpu->bits and return immediately.
> Then this IPI will not be handled until another IPI comes.

Using wmb() is still inappropriate, because if you simply want to ensure
that IO writes are not ordered before the normal memory writes, then just
convert writel_relaxed() to writel().
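
For context, a simplified sketch (not verbatim kernel code) of why writel()
suffices here: the generic accessors wrap the relaxed MMIO store with a
write barrier, which on RISC-V expands to a `fence w,o` instruction:

```c
/* Simplified from asm-generic/io.h and the RISC-V I/O barrier macros. */
#define writel_relaxed(v, addr)	__raw_writel(v, addr)	/* no ordering */
#define writel(v, addr)		({ __io_bw(); __raw_writel(v, addr); })
/*
 * On RISC-V, __io_bw() is roughly 'fence w,o': it orders prior
 * normal-memory writes before the outgoing MMIO write.
 */
```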

Regards,
Anup

>
> Best Regards,
>
> Xu Lu
>
> On Tue, Jan 14, 2025 at 12:34 PM Anup Patel <anup@brainfault.org> wrote:
> >
> > On Mon, Jan 13, 2025 at 8:39 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
> > >
> > > During an IPI procedure, we need to ensure all previous write operations
> > > are visible to other CPUs before sending a real IPI. We use wmb() barrier
> > > to ensure this as IMSIC issues IPI via mmio writes.
> > >
> > > Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
> > > ---
> > >  drivers/irqchip/irq-riscv-imsic-early.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/drivers/irqchip/irq-riscv-imsic-early.c b/drivers/irqchip/irq-riscv-imsic-early.c
> > > index 63097f2bbadf..c6317cb557fb 100644
> > > --- a/drivers/irqchip/irq-riscv-imsic-early.c
> > > +++ b/drivers/irqchip/irq-riscv-imsic-early.c
> > > @@ -29,6 +29,12 @@ static void imsic_ipi_send(unsigned int cpu)
> > >  {
> > >         struct imsic_local_config *local = per_cpu_ptr(imsic->global.local, cpu);
> > >
> > > +       /*
> > > +        * Ensure that stores to normal memory are visible to the other CPUs
> > > +        * before issuing IPI.
> > > +        */
> > > +       wmb();
> > > +
> >
> > The imsic_ipi_send() is called through ipi_mux_send_mask()
> > which does smp_mb__after_atomic() before calling so no need
> > for any barrier here. Also, barriers need to be in-pair so adding
> > a single barrier at random location is inappropriate.
> > (Refer, https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/irq/ipi-mux.c?h=v6.13-rc7#n78)
> >
> > Based on the above, this patch is not needed.
> >
> > Regards,
> > Anup


* Re: [External] Re: [PATCH 3/5] irqchip/riscv-imsic: Use wmb() to order normal writes and IPI writes
  2025-01-14  8:58       ` Anup Patel
@ 2025-01-14  9:07         ` Xu Lu
  0 siblings, 0 replies; 16+ messages in thread
From: Xu Lu @ 2025-01-14  9:07 UTC (permalink / raw)
  To: Anup Patel
  Cc: daniel.lezcano, tglx, paul.walmsley, palmer, lihangjing,
	xieyongji, linux-riscv, linux-kernel

On Tue, Jan 14, 2025 at 4:59 PM Anup Patel <anup@brainfault.org> wrote:
>
> On Tue, Jan 14, 2025 at 12:09 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
> >
> > Hi Anup,
> >
> > I want to ensure when the receiver CPU traps into interrupt, it can
> > see all memory updates made by the sender before.
> >
> > The smp_mb__after_atomic() uses 'fence rw, rw' which orders normal
> > memory reads/writes. The IMSIC write is actually IO writes and is not
> > ordered by it.
> >
> > So I worry in extreme cases, the receiver CPU can receive the IPI
> > first, see no updates to ipi_mux_pcpu->bits and return immediately.
> > Then this IPI will not be handled until another IPI comes.
>
> Using wmb() is still inappropriate because if you want to simply
> ensure IO writes are not ordered before the normal memory writes
> > then just convert writel_relaxed() to writel().

Got it. I will use writel() instead.

Thanks!

Xu Lu

>
> Regards,
> Anup
>
> >
> > Best Regards,
> >
> > Xu Lu
> >
> > On Tue, Jan 14, 2025 at 12:34 PM Anup Patel <anup@brainfault.org> wrote:
> > >
> > > On Mon, Jan 13, 2025 at 8:39 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
> > > >
> > > > During an IPI procedure, we need to ensure all previous write operations
> > > > are visible to other CPUs before sending a real IPI. We use wmb() barrier
> > > > to ensure this as IMSIC issues IPI via mmio writes.
> > > >
> > > > Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
> > > > ---
> > > >  drivers/irqchip/irq-riscv-imsic-early.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/drivers/irqchip/irq-riscv-imsic-early.c b/drivers/irqchip/irq-riscv-imsic-early.c
> > > > index 63097f2bbadf..c6317cb557fb 100644
> > > > --- a/drivers/irqchip/irq-riscv-imsic-early.c
> > > > +++ b/drivers/irqchip/irq-riscv-imsic-early.c
> > > > @@ -29,6 +29,12 @@ static void imsic_ipi_send(unsigned int cpu)
> > > >  {
> > > >         struct imsic_local_config *local = per_cpu_ptr(imsic->global.local, cpu);
> > > >
> > > > +       /*
> > > > +        * Ensure that stores to normal memory are visible to the other CPUs
> > > > +        * before issuing IPI.
> > > > +        */
> > > > +       wmb();
> > > > +
> > >
> > > The imsic_ipi_send() is called through ipi_mux_send_mask()
> > > which does smp_mb__after_atomic() before calling so no need
> > > for any barrier here. Also, barriers need to be in-pair so adding
> > > a single barrier at random location is inappropriate.
> > > (Refer, https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/irq/ipi-mux.c?h=v6.13-rc7#n78)
> > >
> > > Based on the above, this patch is not needed.
> > >
> > > Regards,
> > > Anup


* Re: [PATCH 1/5] irqchip/riscv-intc: Balance priority and fairness during irq handling
  2025-01-13 15:09 ` [PATCH 1/5] irqchip/riscv-intc: Balance priority and fairness during irq handling Xu Lu
@ 2025-01-15 11:39   ` Anup Patel
  2025-01-15 12:37     ` [External] " Xu Lu
  0 siblings, 1 reply; 16+ messages in thread
From: Anup Patel @ 2025-01-15 11:39 UTC (permalink / raw)
  To: Xu Lu
  Cc: daniel.lezcano, tglx, paul.walmsley, palmer, lihangjing,
	xieyongji, linux-riscv, linux-kernel

On Mon, Jan 13, 2025 at 8:39 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
>
> Both csr cause and csr topi record the pending bit with the highest
> priority. If interrupts with high priority arrive frequently within a
> certain period of time, the interrupts with low priority won't get a
> chance to be handled.
>
> For example, if external interrupts and software interrupts arrive very
> frequently, the timer interrupts will never be handled. Then buddy
> watchdog on a buddy CPU will report a hardlockup on the current CPU while
> current CPU actually can receive irq.

Platforms with a proper HW watchdog will not see this issue, because the
HW watchdog interrupt will be an external interrupt.

There was an effort to standardize a watchdog for RISC-V platforms, but it
was deferred to future standardization. We should resume those discussions
within the RVI forums.

>
> This commit solves this problem by handling all pending irqs in a round.
> During each round, this commit handles pending irqs by their priority.
>
> Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
> ---
>  drivers/irqchip/irq-riscv-intc.c | 32 ++++++++++++++++++++++++++------
>  1 file changed, 26 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/irqchip/irq-riscv-intc.c b/drivers/irqchip/irq-riscv-intc.c
> index f653c13de62b..bc2ec26aa9e9 100644
> --- a/drivers/irqchip/irq-riscv-intc.c
> +++ b/drivers/irqchip/irq-riscv-intc.c
> @@ -26,20 +26,40 @@ static unsigned int riscv_intc_nr_irqs __ro_after_init = BITS_PER_LONG;
>  static unsigned int riscv_intc_custom_base __ro_after_init = BITS_PER_LONG;
>  static unsigned int riscv_intc_custom_nr_irqs __ro_after_init;
>
> +static unsigned int riscv_prio_irqs[] = {
> +#ifdef CONFIG_RISCV_M_MODE
> +       IRQ_M_EXT, IRQ_M_SOFT, IRQ_M_TIMER,
> +#endif
> +       IRQ_S_EXT, IRQ_S_SOFT, IRQ_S_TIMER, IRQ_S_GEXT,
> +       IRQ_VS_EXT, IRQ_VS_SOFT, IRQ_VS_TIMER,
> +       IRQ_PMU_OVF,
> +};
> +
>  static void riscv_intc_irq(struct pt_regs *regs)
>  {
> -       unsigned long cause = regs->cause & ~CAUSE_IRQ_FLAG;
> -
> -       if (generic_handle_domain_irq(intc_domain, cause))
> -               pr_warn_ratelimited("Failed to handle interrupt (cause: %ld)\n", cause);
> +       unsigned long pending = csr_read(CSR_IP) & csr_read(CSR_IE);
> +       unsigned int i;
> +
> +       for (i = 0; i < ARRAY_SIZE(riscv_prio_irqs); i++)
> +               if (pending & (1UL << riscv_prio_irqs[i]))
> +                       if (generic_handle_domain_irq(intc_domain, riscv_prio_irqs[i]))
> +                               pr_warn_ratelimited("Failed to handle interrupt (cause: %u)\n",
> +                                                   riscv_prio_irqs[i]);

This is overriding the builtin priority assignment for local interrupts
as defined by the RISC-V privileged specification for non-AIA systems,
which changes the expected behaviour on existing platforms.

>  }
>
>  static void riscv_intc_aia_irq(struct pt_regs *regs)
>  {
>         unsigned long topi;
> +       unsigned long pending;
> +       unsigned int i;
> +
> +       while ((topi = csr_read(CSR_TOPI))) {
> +               pending = csr_read(CSR_IP) & csr_read(CSR_IE);
>
> -       while ((topi = csr_read(CSR_TOPI)))
> -               generic_handle_domain_irq(intc_domain, topi >> TOPI_IID_SHIFT);
> +               for (i = 0; i < ARRAY_SIZE(riscv_prio_irqs); i++)
> +                       if (pending & (1UL << riscv_prio_irqs[i]))
> +                               generic_handle_domain_irq(intc_domain, riscv_prio_irqs[i]);
> +       }

The AIA specification already provides a mechanism to change
priorities of local interrupts. The firmware or bootloader can always
provide custom priorities to local interrupts.

In general, requiring certain local interrupts to have higher priority is
platform or use-case specific and should be done as AIA configuration
in firmware or bootloader.

NAK to this patch from my side since it is just adding a software
work-around for a missing standard watchdog in the RISC-V
ecosystem.

One possible approach is to let platforms have their implementation
specific M-mode watchdog and for supervisor software (both HS and
VS-mode) we can have SBI based watchdog where the supervisor
watchdog expiry is an SSE (higher priority than all local interrupts).

Regards,
Anup

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

* Re: [External] Re: [PATCH 1/5] irqchip/riscv-intc: Balance priority and fairness during irq handling
  2025-01-15 11:39   ` Anup Patel
@ 2025-01-15 12:37     ` Xu Lu
  2025-01-15 17:01       ` Anup Patel
  0 siblings, 1 reply; 16+ messages in thread
From: Xu Lu @ 2025-01-15 12:37 UTC (permalink / raw)
  To: Anup Patel
  Cc: daniel.lezcano, tglx, paul.walmsley, palmer, lihangjing,
	xieyongji, linux-riscv, linux-kernel

Hi Anup,

I agree that a standardized NMI or SSE event is the optimal solution
to the watchdog problem. Here I just use the buddy watchdog to
illustrate the possible consequences of low-priority interrupts
getting starved.

I believe that AIA or firmware can adjust the priorities of different
irqs. What I want to emphasize is that in the existing implementation,
the kernel will continue handling newly arrived high-priority irqs, no
matter how long lower-priority irqs have been waiting.

For example, suppose both an external irq and an IPI (assuming a
non-AIA architecture using software irqs) arrive at cycle 0; the
kernel will handle the external irq first. Then a new external irq
arrives during the first external irq handling procedure, say at
cycle 100. When the kernel finishes the first external irq handling
procedure at, say, cycle 200, it will handle the newly arrived
external irq instead of the already-pending IPI. The IPI won't be
handled until the second external irq is done, say at cycle 300.
Things get worse if external interrupts keep arriving.

I think it is better to provide a mechanism to avoid this. So I treat
interrupts arriving within a certain period as one round, and only
handle interrupts of a new round after all interrupts of the old round
have been handled. The interrupt priority only takes effect within the
same round.

Of course this may not be the optimal way. Please give some advice if
you also think it is necessary (for example, introducing an interrupt
priority decreasing mechanism instead).

Best Regards,

Xu Lu

On Wed, Jan 15, 2025 at 7:39 PM Anup Patel <anup@brainfault.org> wrote:
>
> On Mon, Jan 13, 2025 at 8:39 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
> >
> > Both csr cause and csr topi record the pending bit with the highest
> > priority. If interrupts with high priority arrive frequently within a
> > certain period of time, the interrupts with low priority won't get a
> > chance to be handled.
> >
> > For example, if external interrupts and software interrupts arrive very
> > frequently, the timer interrupts will never be handled. Then buddy
> > watchdog on a buddy CPU will report a hardlockup on the current CPU while
> > current CPU actually can receive irq.
>
> Platform with proper HW watchdog will not see this issue because HW
> watchdog interrupt will be an external interrupt.
>
> There was an effort to standardize watchdog for RISC-V platforms but it was
> deferred to future standardization. We should resume those discussions within
> RVI forums.
>
> >
> > This commit solves this problem by handling all pending irqs in a round.
> > During each round, this commit handles pending irqs by their priority.
> >
> > Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
> > ---
> >  drivers/irqchip/irq-riscv-intc.c | 32 ++++++++++++++++++++++++++------
> >  1 file changed, 26 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/irqchip/irq-riscv-intc.c b/drivers/irqchip/irq-riscv-intc.c
> > index f653c13de62b..bc2ec26aa9e9 100644
> > --- a/drivers/irqchip/irq-riscv-intc.c
> > +++ b/drivers/irqchip/irq-riscv-intc.c
> > @@ -26,20 +26,40 @@ static unsigned int riscv_intc_nr_irqs __ro_after_init = BITS_PER_LONG;
> >  static unsigned int riscv_intc_custom_base __ro_after_init = BITS_PER_LONG;
> >  static unsigned int riscv_intc_custom_nr_irqs __ro_after_init;
> >
> > +static unsigned int riscv_prio_irqs[] = {
> > +#ifdef CONFIG_RISCV_M_MODE
> > +       IRQ_M_EXT, IRQ_M_SOFT, IRQ_M_TIMER,
> > +#endif
> > +       IRQ_S_EXT, IRQ_S_SOFT, IRQ_S_TIMER, IRQ_S_GEXT,
> > +       IRQ_VS_EXT, IRQ_VS_SOFT, IRQ_VS_TIMER,
> > +       IRQ_PMU_OVF,
> > +};
> > +
> >  static void riscv_intc_irq(struct pt_regs *regs)
> >  {
> > -       unsigned long cause = regs->cause & ~CAUSE_IRQ_FLAG;
> > -
> > -       if (generic_handle_domain_irq(intc_domain, cause))
> > -               pr_warn_ratelimited("Failed to handle interrupt (cause: %ld)\n", cause);
> > +       unsigned long pending = csr_read(CSR_IP) & csr_read(CSR_IE);
> > +       unsigned int i;
> > +
> > +       for (i = 0; i < ARRAY_SIZE(riscv_prio_irqs); i++)
> > +               if (pending & (1UL << riscv_prio_irqs[i]))
> > +                       if (generic_handle_domain_irq(intc_domain, riscv_prio_irqs[i]))
> > +                               pr_warn_ratelimited("Failed to handle interrupt (cause: %u)\n",
> > +                                                   riscv_prio_irqs[i]);
>
> This is overriding the builtin priority assignment for local interrupts as
> defined by the RISC-V privileged specification for non-AIA systems
> which changes the expected behaviour on existing platforms.
>
> >  }
> >
> >  static void riscv_intc_aia_irq(struct pt_regs *regs)
> >  {
> >         unsigned long topi;
> > +       unsigned long pending;
> > +       unsigned int i;
> > +
> > +       while ((topi = csr_read(CSR_TOPI))) {
> > +               pending = csr_read(CSR_IP) & csr_read(CSR_IE);
> >
> > -       while ((topi = csr_read(CSR_TOPI)))
> > -               generic_handle_domain_irq(intc_domain, topi >> TOPI_IID_SHIFT);
> > +               for (i = 0; i < ARRAY_SIZE(riscv_prio_irqs); i++)
> > +                       if (pending & (1UL << riscv_prio_irqs[i]))
> > +                               generic_handle_domain_irq(intc_domain, riscv_prio_irqs[i]);
> > +       }
>
> The AIA specification already provides a mechanism to change
> priorities of local interrupts. The firmware or bootloader can always
> provide custom priorities to local interrupts.
>
> In general, requiring certain local interrupts to have higher priority is
> platform or use-case specific and should be done as AIA configuration
> in firmware or bootloader.
>
> NAK to this patch from my side since it is just adding a software
> work-around for a missing standard watchdog in the RISC-V
> ecosystem.
>
> One possible approach is to let platforms have their implementation
> specific M-mode watchdog and for supervisor software (both HS and
> VS-mode) we can have SBI based watchdog where the supervisor
> watchdog expiry is an SSE (higher priority than all local interrupts).
>
> Regards,
> Anup


* Re: [External] Re: [PATCH 1/5] irqchip/riscv-intc: Balance priority and fairness during irq handling
  2025-01-15 12:37     ` [External] " Xu Lu
@ 2025-01-15 17:01       ` Anup Patel
  2025-01-16  2:26         ` Xu Lu
  0 siblings, 1 reply; 16+ messages in thread
From: Anup Patel @ 2025-01-15 17:01 UTC (permalink / raw)
  To: Xu Lu
  Cc: daniel.lezcano, tglx, paul.walmsley, palmer, lihangjing,
	xieyongji, linux-riscv, linux-kernel

On Wed, Jan 15, 2025 at 6:08 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
>
> Hi Anup,
>
> I agree that a standardized NMI or SSE event is the optimal solution
> to the watchdog problem. Here I just use buddy watchdog to illustrate
> the possible consequences of interrupts with low priority getting
> starved.
>
> I believe that AIA or firmware can adjust the priority of different
> irqs. What I want to emphasize is that in existing implementation,
> kernel will continue handling newly arrived high priority irqs, no
> matter how long lower priority irqs have been waiting for.
>
> For example, if both external irq and IPI (assuming non AIA
> architecture using software irq) arrive in cycle 0, kernel will handle
> the external irq first. Then a new external irq arrives during the
> first external irq handling procedure, let's say cycle 100. Then
> kernel finishes the first external irq handling procedure, let's say
> cycle 200, it will handle the newly arrived external irq, instead of
> the already arrived IPI. The IPI won't be handled until the second
> external irq is handled, let's say cycle 300. Things get worse if
> external interrupts keep arriving.
>
> I think it is better to provide a mechanism to avoid this. So I regard
> interrupts arriving within a certain period as a round and only handle
> interrupts in the new round after all interrupts in the old round have
> been handled. The interrupt priority only takes effect in the same
> round.

Well, this flood of external interrupts from a device is a classical
problem across architectures. The best way to solve it is to improve
the device driver to use better bottom-half techniques, such as NAPI
in the case of network drivers, threaded IRQs, etc.
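
As a sketch of the threaded-IRQ approach (a kernel-side fragment, not
buildable standalone; the device names and handler bodies below are
hypothetical), the hard handler only quiesces the device and the bulk
of the work runs in a schedulable thread, so a device interrupt flood
no longer monopolizes the CPU:

```c
#include <linux/interrupt.h>

/* Hypothetical driver: top half acks the device and defers. */
static irqreturn_t mydev_hardirq(int irq, void *dev_id)
{
	/* ack/mask the device interrupt here */
	return IRQ_WAKE_THREAD;		/* hand off to the threaded handler */
}

static irqreturn_t mydev_thread_fn(int irq, void *dev_id)
{
	/* heavy processing runs here, preemptible and schedulable */
	return IRQ_HANDLED;
}

/* in probe():
 *	ret = request_threaded_irq(irq, mydev_hardirq, mydev_thread_fn,
 *				   IRQF_ONESHOT, "mydev", dev);
 */
```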

Working around this in the interrupt controller driver is not the
right way.

Regards,
Anup

>
> Of course this is not the optimal way. Please give some advice if you
> also think it is necessary (for example introduce an interrupt
> priority decreasing mechanism instead).
>
> Best Regards,
>
> Xu Lu
>
> On Wed, Jan 15, 2025 at 7:39 PM Anup Patel <anup@brainfault.org> wrote:
> >
> > On Mon, Jan 13, 2025 at 8:39 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
> > >
> > > Both csr cause and csr topi record the pending bit with the highest
> > > priority. If interrupts with high priority arrive frequently within a
> > > certain period of time, the interrupts with low priority won't get a
> > > chance to be handled.
> > >
> > > For example, if external interrupts and software interrupts arrive very
> > > frequently, the timer interrupts will never be handled. Then buddy
> > > watchdog on a buddy CPU will report a hardlockup on the current CPU while
> > > current CPU actually can receive irq.
> >
> > Platform with proper HW watchdog will not see this issue because HW
> > watchdog interrupt will be an external interrupt.
> >
> > There was an effort to standardize watchdog for RISC-V platforms but it was
> > deferred to future standardization. We should resume those discussions within
> > RVI forums.
> >
> > >
> > > This commit solves this problem by handling all pending irqs in a round.
> > > During each round, this commit handles pending irqs by their priority.
> > >
> > > Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
> > > ---
> > >  drivers/irqchip/irq-riscv-intc.c | 32 ++++++++++++++++++++++++++------
> > >  1 file changed, 26 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/irqchip/irq-riscv-intc.c b/drivers/irqchip/irq-riscv-intc.c
> > > index f653c13de62b..bc2ec26aa9e9 100644
> > > --- a/drivers/irqchip/irq-riscv-intc.c
> > > +++ b/drivers/irqchip/irq-riscv-intc.c
> > > @@ -26,20 +26,40 @@ static unsigned int riscv_intc_nr_irqs __ro_after_init = BITS_PER_LONG;
> > >  static unsigned int riscv_intc_custom_base __ro_after_init = BITS_PER_LONG;
> > >  static unsigned int riscv_intc_custom_nr_irqs __ro_after_init;
> > >
> > > +static unsigned int riscv_prio_irqs[] = {
> > > +#ifdef CONFIG_RISCV_M_MODE
> > > +       IRQ_M_EXT, IRQ_M_SOFT, IRQ_M_TIMER,
> > > +#endif
> > > +       IRQ_S_EXT, IRQ_S_SOFT, IRQ_S_TIMER, IRQ_S_GEXT,
> > > +       IRQ_VS_EXT, IRQ_VS_SOFT, IRQ_VS_TIMER,
> > > +       IRQ_PMU_OVF,
> > > +};
> > > +
> > >  static void riscv_intc_irq(struct pt_regs *regs)
> > >  {
> > > -       unsigned long cause = regs->cause & ~CAUSE_IRQ_FLAG;
> > > -
> > > -       if (generic_handle_domain_irq(intc_domain, cause))
> > > -               pr_warn_ratelimited("Failed to handle interrupt (cause: %ld)\n", cause);
> > > +       unsigned long pending = csr_read(CSR_IP) & csr_read(CSR_IE);
> > > +       unsigned int i;
> > > +
> > > +       for (i = 0; i < ARRAY_SIZE(riscv_prio_irqs); i++)
> > > +               if (pending & (1UL << riscv_prio_irqs[i]))
> > > +                       if (generic_handle_domain_irq(intc_domain, riscv_prio_irqs[i]))
> > > +                               pr_warn_ratelimited("Failed to handle interrupt (cause: %u)\n",
> > > +                                                   riscv_prio_irqs[i]);
> >
> > This is overriding the builtin priority assignment for local interrupts as
> > defined by the RISC-V privileged specification for non-AIA systems
> > which changes the expected behaviour on existing platforms.
> >
> > >  }
> > >
> > >  static void riscv_intc_aia_irq(struct pt_regs *regs)
> > >  {
> > >         unsigned long topi;
> > > +       unsigned long pending;
> > > +       unsigned int i;
> > > +
> > > +       while ((topi = csr_read(CSR_TOPI))) {
> > > +               pending = csr_read(CSR_IP) & csr_read(CSR_IE);
> > >
> > > -       while ((topi = csr_read(CSR_TOPI)))
> > > -               generic_handle_domain_irq(intc_domain, topi >> TOPI_IID_SHIFT);
> > > +               for (i = 0; i < ARRAY_SIZE(riscv_prio_irqs); i++)
> > > +                       if (pending & (1UL << riscv_prio_irqs[i]))
> > > +                               generic_handle_domain_irq(intc_domain, riscv_prio_irqs[i]);
> > > +       }
> >
> > The AIA specification already provides a mechanism to change
> > priorities of local interrupts. The firmware or bootloader can always
> > provide custom priorities to local interrupts.
> >
> > In general, requiring certain local interrupts to have higher priority is
> > platform or use-case specific and should be done as AIA configuration
> > in firmware or bootloader.
> >
> > NAK to this patch from my side since it is just adding a software
> > work-around for a missing standard watchdog in the RISC-V
> > ecosystem.
> >
> > One possible approach is to let platforms have their implementation
> > specific M-mode watchdog and for supervisor software (both HS and
> > VS-mode) we can have SBI based watchdog where the supervisor
> > watchdog expiry is an SSE (higher priority than all local interrupts).
> >
> > Regards,
> > Anup


* Re: [External] Re: [PATCH 1/5] irqchip/riscv-intc: Balance priority and fairness during irq handling
  2025-01-15 17:01       ` Anup Patel
@ 2025-01-16  2:26         ` Xu Lu
  0 siblings, 0 replies; 16+ messages in thread
From: Xu Lu @ 2025-01-16  2:26 UTC (permalink / raw)
  To: Anup Patel
  Cc: daniel.lezcano, tglx, paul.walmsley, palmer, lihangjing,
	xieyongji, linux-riscv, linux-kernel

On Thu, Jan 16, 2025 at 1:02 AM Anup Patel <anup@brainfault.org> wrote:
>
> On Wed, Jan 15, 2025 at 6:08 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
> >
> > Hi Anup,
> >
> > I agree that a standardized NMI or SSE event is the optimal solution
> > to the watchdog problem. Here I just use buddy watchdog to illustrate
> > the possible consequences of interrupts with low priority getting
> > starved.
> >
> > I believe that AIA or firmware can adjust the priority of different
> > irqs. What I want to emphasize is that in existing implementation,
> > kernel will continue handling newly arrived high priority irqs, no
> > matter how long lower priority irqs have been waiting for.
> >
> > For example, if both external irq and IPI (assuming non AIA
> > architecture using software irq) arrive in cycle 0, kernel will handle
> > the external irq first. Then a new external irq arrives during the
> > first external irq handling procedure, let's say cycle 100. Then
> > kernel finishes the first external irq handling procedure, let's say
> > cycle 200, it will handle the newly arrived external irq, instead of
> > the already arrived IPI. The IPI won't be handled until the second
> > external irq is handled, let's say cycle 300. Things get worse if
> > external interrupts keep arriving.
> >
> > I think it is better to provide a mechanism to avoid this. So I regard
> > interrupts arriving within a certain period as a round and only handle
> > interrupts in the new round after all interrupts in the old round have
> > been handled. The interrupt priority only takes effect in the same
> > round.
>
> Well, this flood of external interrupts from a device is a classical
> problem across architectures. The best way to solve this is improve
> the device driver to use better bottom-half techniques such as
> NAPI in-case of network driver, threaded IRQs, etc.
>
> Working around this in the interrupt controller driver is not the
> right way.
>
> Regards,
> Anup

OK. I will remove this commit and only replace 'writel_relaxed'
with 'writel' when sending an IPI.

Thanks!

Xu Lu

>
> >
> > Of course this is not the optimal way. Please give some advice if you
> > also think it is necessary (for example introduce an interrupt
> > priority decreasing mechanism instead).
> >
> > Best Regards,
> >
> > Xu Lu
> >
> > On Wed, Jan 15, 2025 at 7:39 PM Anup Patel <anup@brainfault.org> wrote:
> > >
> > > On Mon, Jan 13, 2025 at 8:39 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
> > > >
> > > > Both csr cause and csr topi record the pending bit with the highest
> > > > priority. If interrupts with high priority arrive frequently within a
> > > > certain period of time, the interrupts with low priority won't get a
> > > > chance to be handled.
> > > >
> > > > For example, if external interrupts and software interrupts arrive very
> > > > frequently, the timer interrupts will never be handled. Then buddy
> > > > watchdog on a buddy CPU will report a hardlockup on the current CPU while
> > > > current CPU actually can receive irq.
> > >
> > > Platform with proper HW watchdog will not see this issue because HW
> > > watchdog interrupt will be an external interrupt.
> > >
> > > There was an effort to standardize watchdog for RISC-V platforms but it was
> > > deferred to future standardization. We should resume those discussions within
> > > RVI forums.
> > >
> > > >
> > > > This commit solves this problem by handling all pending irqs in a round.
> > > > During each round, this commit handles pending irqs by their priority.
> > > >
> > > > Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
> > > > ---
> > > >  drivers/irqchip/irq-riscv-intc.c | 32 ++++++++++++++++++++++++++------
> > > >  1 file changed, 26 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/drivers/irqchip/irq-riscv-intc.c b/drivers/irqchip/irq-riscv-intc.c
> > > > index f653c13de62b..bc2ec26aa9e9 100644
> > > > --- a/drivers/irqchip/irq-riscv-intc.c
> > > > +++ b/drivers/irqchip/irq-riscv-intc.c
> > > > @@ -26,20 +26,40 @@ static unsigned int riscv_intc_nr_irqs __ro_after_init = BITS_PER_LONG;
> > > >  static unsigned int riscv_intc_custom_base __ro_after_init = BITS_PER_LONG;
> > > >  static unsigned int riscv_intc_custom_nr_irqs __ro_after_init;
> > > >
> > > > +static unsigned int riscv_prio_irqs[] = {
> > > > +#ifdef CONFIG_RISCV_M_MODE
> > > > +       IRQ_M_EXT, IRQ_M_SOFT, IRQ_M_TIMER,
> > > > +#endif
> > > > +       IRQ_S_EXT, IRQ_S_SOFT, IRQ_S_TIMER, IRQ_S_GEXT,
> > > > +       IRQ_VS_EXT, IRQ_VS_SOFT, IRQ_VS_TIMER,
> > > > +       IRQ_PMU_OVF,
> > > > +};
> > > > +
> > > >  static void riscv_intc_irq(struct pt_regs *regs)
> > > >  {
> > > > -       unsigned long cause = regs->cause & ~CAUSE_IRQ_FLAG;
> > > > -
> > > > -       if (generic_handle_domain_irq(intc_domain, cause))
> > > > -               pr_warn_ratelimited("Failed to handle interrupt (cause: %ld)\n", cause);
> > > > +       unsigned long pending = csr_read(CSR_IP) & csr_read(CSR_IE);
> > > > +       unsigned int i;
> > > > +
> > > > +       for (i = 0; i < ARRAY_SIZE(riscv_prio_irqs); i++)
> > > > +               if (pending & (1UL << riscv_prio_irqs[i]))
> > > > +                       if (generic_handle_domain_irq(intc_domain, riscv_prio_irqs[i]))
> > > > +                               pr_warn_ratelimited("Failed to handle interrupt (cause: %u)\n",
> > > > +                                                   riscv_prio_irqs[i]);
> > >
> > > This is overriding the builtin priority assignment for local interrupts as
> > > defined by the RISC-V privileged specification for non-AIA systems
> > > which changes the expected behaviour on existing platforms.
> > >
> > > >  }
> > > >
> > > >  static void riscv_intc_aia_irq(struct pt_regs *regs)
> > > >  {
> > > >         unsigned long topi;
> > > > +       unsigned long pending;
> > > > +       unsigned int i;
> > > > +
> > > > +       while ((topi = csr_read(CSR_TOPI))) {
> > > > +               pending = csr_read(CSR_IP) & csr_read(CSR_IE);
> > > >
> > > > -       while ((topi = csr_read(CSR_TOPI)))
> > > > -               generic_handle_domain_irq(intc_domain, topi >> TOPI_IID_SHIFT);
> > > > +               for (i = 0; i < ARRAY_SIZE(riscv_prio_irqs); i++)
> > > > +                       if (pending & (1UL << riscv_prio_irqs[i]))
> > > > +                               generic_handle_domain_irq(intc_domain, riscv_prio_irqs[i]);
> > > > +       }
> > >
> > > The AIA specification already provides a mechanism to change
> > > priorities of local interrupts. The firmware or bootloader can always
> > > provide custom priorities to local interrupts.
> > >
> > > In general, requiring certain local interrupts to have higher priority is
> > > platform or use-case specific and should be done as AIA configuration
> > > in firmware or bootloader.
> > >
> > > NAK to this patch from my side since it is just adding a software
> > > work-around for a missing standard watchdog in the RISC-V
> > > ecosystem.
> > >
> > > One possible approach is to let platforms have their implementation
> > > specific M-mode watchdog and for supervisor software (both HS and
> > > VS-mode) we can have SBI based watchdog where the supervisor
> > > watchdog expiry is an SSE (higher priority than all local interrupts).
> > >
> > > Regards,
> > > Anup


Thread overview: 16+ messages
2025-01-13 15:09 [PATCH 0/5] riscv: irqchip: Optimization of interrupt handling Xu Lu
2025-01-13 15:09 ` [PATCH 1/5] irqchip/riscv-intc: Balance priority and fairness during irq handling Xu Lu
2025-01-15 11:39   ` Anup Patel
2025-01-15 12:37     ` [External] " Xu Lu
2025-01-15 17:01       ` Anup Patel
2025-01-16  2:26         ` Xu Lu
2025-01-13 15:09 ` [PATCH 2/5] irqchip/riscv-imsic: Add a threshold to ext irq handling times Xu Lu
2025-01-13 15:09 ` [PATCH 3/5] irqchip/riscv-imsic: Use wmb() to order normal writes and IPI writes Xu Lu
2025-01-14  4:34   ` Anup Patel
2025-01-14  6:39     ` [External] " Xu Lu
2025-01-14  8:58       ` Anup Patel
2025-01-14  9:07         ` Xu Lu
2025-01-13 15:09 ` [PATCH 4/5] irqchip/timer-clint: " Xu Lu
2025-01-14  4:34   ` Anup Patel
2025-01-13 15:09 ` [PATCH 5/5] irqchip/aclint-sswi: " Xu Lu
2025-01-14  4:34   ` Anup Patel
