* [PATCH] riscv: add system error interrupt handler support
@ 2026-02-26 8:27 Rui Qi
2026-02-26 9:22 ` Conor Dooley
2026-02-26 19:09 ` kernel test robot
0 siblings, 2 replies; 6+ messages in thread
From: Rui Qi @ 2026-02-26 8:27 UTC (permalink / raw)
To: paul.walmsley, palmer, aou, alex, cyrilbur, tglx, peterz, debug,
andybnac, charlie, geomatsi, thuth, bjorn, songshuaishuai, martin,
masahiroy, kees
Cc: linux-arch, linux-riscv, linux-kernel, Rui Qi
Add a system error interrupt handler for RISC-V that panics
the system when hardware errors are detected. The implementation includes:
- Add IRQ_SYS_ERROR (23) interrupt definition to CSR header
- Implement sys_error.c module with panic handler
- Register per-CPU interrupt handler for system error interrupts
- Add module to kernel build system
When a system error interrupt occurs, the handler immediately panics
the system with a descriptive message to ensure the error is properly
captured and the system is halted safely.
Signed-off-by: Rui Qi <qirui.001@bytedance.com>
---
arch/riscv/include/asm/csr.h | 4 +-
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/sys_error.c | 80 +++++++++++++++++++++++++++++++++++
include/linux/cpuhotplug.h | 1 +
4 files changed, 85 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/kernel/sys_error.c
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 31b8988f4488..1f43c25b07ed 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -99,7 +99,8 @@
#define IRQ_M_EXT 11
#define IRQ_S_GEXT 12
#define IRQ_PMU_OVF 13
-#define IRQ_LOCAL_MAX (IRQ_PMU_OVF + 1)
+#define IRQ_SYS_ERROR 23
+#define IRQ_LOCAL_MAX (IRQ_SYS_ERROR + 1)
#define IRQ_LOCAL_MASK GENMASK((IRQ_LOCAL_MAX - 1), 0)
/* Exception causes */
@@ -535,6 +536,7 @@
# define RV_IRQ_TIMER IRQ_S_TIMER
# define RV_IRQ_EXT IRQ_S_EXT
# define RV_IRQ_PMU IRQ_PMU_OVF
+# define RV_IRQ_SYS_ERROR IRQ_SYS_ERROR
# define SIP_LCOFIP (_AC(0x1, UL) << IRQ_PMU_OVF)
#endif /* !CONFIG_RISCV_M_MODE */
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index cabb99cadfb6..3aaf16c75d6e 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -72,6 +72,7 @@ obj-y += vendor_extensions.o
obj-y += vendor_extensions/
obj-y += probes/
obj-y += tests/
+obj-y += sys_error.o
obj-$(CONFIG_MMU) += vdso.o vdso/
obj-$(CONFIG_RISCV_USER_CFI) += vdso_cfi/
diff --git a/arch/riscv/kernel/sys_error.c b/arch/riscv/kernel/sys_error.c
new file mode 100644
index 000000000000..5b88ff4a0e84
--- /dev/null
+++ b/arch/riscv/kernel/sys_error.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2026 Bytedance, Inc.
+ */
+#define pr_fmt(fmt) "riscv-sys-error: " fmt
+
+#include <linux/kernel.h>
+#include <linux/irq.h>
+#include <linux/irqdomain.h>
+#include <linux/interrupt.h>
+#include <linux/percpu.h>
+#include <linux/module.h>
+#include <asm/irq.h>
+#include <linux/cpuhotplug.h>
+#include <asm/csr.h>
+
+static unsigned int riscv_sys_error_irq;
+static DEFINE_PER_CPU_READ_MOSTLY(int, sys_error_dummy_dev);
+
+static irqreturn_t sys_error_irq_handler(int irq, void *dev)
+{
+ panic("RISC-V System Error Interrupt - System Error Detected");
+ return IRQ_HANDLED;
+}
+
+static int riscv_serror_starting_cpu(unsigned int cpu)
+{
+ csr_set(CSR_IE, BIT(RV_IRQ_SYS_ERROR));
+ enable_percpu_irq(riscv_sys_error_irq, irq_get_trigger_type(riscv_sys_error_irq));
+ return 0;
+}
+
+static int riscv_serror_dying_cpu(unsigned int cpu)
+{
+ csr_clear(CSR_IE, BIT(RV_IRQ_SYS_ERROR));
+ disable_percpu_irq(riscv_sys_error_irq);
+ return 0;
+}
+
+static int __init sys_error_init(void)
+{
+ int ret;
+ struct irq_domain *domain = NULL;
+
+ domain = irq_find_matching_fwnode(riscv_get_intc_hwnode(),
+ DOMAIN_BUS_ANY);
+ if (!domain) {
+ pr_err("Failed to find INTC IRQ root domain\n");
+ return -ENODEV;
+ }
+
+ riscv_sys_error_irq = irq_create_mapping(domain, RV_IRQ_SYS_ERROR);
+ if (!riscv_sys_error_irq) {
+ pr_err("Failed to map system error interrupt\n");
+ return -ENODEV;
+ }
+
+ ret = request_percpu_irq(riscv_sys_error_irq, sys_error_irq_handler,
+ "riscv-syserror", &sys_error_dummy_dev);
+ if (ret) {
+ pr_err("registering percpu irq failed [%d]\n", ret);
+ return ret;
+ }
+
+ ret = cpuhp_setup_state(CPUHP_AP_RISCV_SERROR_STARTING,
+ "riscv/sys_error:starting",
+ riscv_serror_starting_cpu, riscv_serror_dying_cpu);
+ if (ret) {
+ pr_err("cpuhp setup state failed [%d]\n", ret);
+ goto fail_free_irq;
+ }
+
+ return 0;
+
+fail_free_irq:
+ free_percpu_irq(riscv_sys_error_irq, &sys_error_dummy_dev);
+ return ret;
+}
+
+arch_initcall(sys_error_init);
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 62cd7b35a29c..f6d0c05f72df 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -174,6 +174,7 @@ enum cpuhp_state {
CPUHP_AP_REALTEK_TIMER_STARTING,
CPUHP_AP_RISCV_TIMER_STARTING,
CPUHP_AP_CLINT_TIMER_STARTING,
+ CPUHP_AP_RISCV_SERROR_STARTING,
CPUHP_AP_CSKY_TIMER_STARTING,
CPUHP_AP_TI_GP_TIMER_STARTING,
CPUHP_AP_HYPERV_TIMER_STARTING,
--
2.20.1
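A side effect of the csr.h hunk worth spelling out: raising IRQ_LOCAL_MAX from (IRQ_PMU_OVF + 1) to (IRQ_SYS_ERROR + 1) widens IRQ_LOCAL_MASK from bits 0..13 to bits 0..23, so the mask also covers bits 14..22, which the hunk does not define. The following is a minimal userspace sketch of that arithmetic only. genmask() and irq_local_mask() are stand-ins written for this illustration, not the kernel's macros, and whether the wider mask matters to existing users of IRQ_LOCAL_MASK is not discussed in the thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's GENMASK(h, l).
 * Valid for 0 <= l <= h <= 63.
 */
static uint64_t genmask(unsigned int h, unsigned int l)
{
	return (~UINT64_C(0) >> (63 - h)) & (~UINT64_C(0) << l);
}

/* Mirrors the shape of IRQ_LOCAL_MASK: GENMASK(IRQ_LOCAL_MAX - 1, 0). */
static uint64_t irq_local_mask(unsigned int irq_local_max)
{
	return genmask(irq_local_max - 1, 0);
}
```

Calling irq_local_mask(14) models the mask before the patch (IRQ_PMU_OVF is 13) and irq_local_mask(24) models it after (IRQ_SYS_ERROR is 23).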
* Re: [PATCH] riscv: add system error interrupt handler support
2026-02-26 8:27 [PATCH] riscv: add system error interrupt handler support Rui Qi
@ 2026-02-26 9:22 ` Conor Dooley
2026-02-27 7:54 ` Rui Qi
2026-02-26 19:09 ` kernel test robot
1 sibling, 1 reply; 6+ messages in thread
From: Conor Dooley @ 2026-02-26 9:22 UTC (permalink / raw)
To: Rui Qi
Cc: paul.walmsley, palmer, aou, alex, cyrilbur, tglx, peterz, debug,
andybnac, charlie, geomatsi, thuth, bjorn, songshuaishuai, martin,
masahiroy, kees, linux-arch, linux-riscv, linux-kernel
On Thu, Feb 26, 2026 at 04:27:35PM +0800, Rui Qi wrote:
> Add a system error interrupt handler for RISC-V that panics
> the system when hardware errors are detected. The implementation includes:
>
> - Add IRQ_SYS_ERROR (23) interrupt definition to CSR header
> - Implement sys_error.c module with panic handler
> - Register per-CPU interrupt handler for system error interrupts
> - Add module to kernel build system
>
> When a system error interrupt occurs, the handler immediately panics
> the system with a descriptive message to ensure the error is properly
> captured and the system is halted safely.
>
> Signed-off-by: Rui Qi <qirui.001@bytedance.com>
> ---
> arch/riscv/include/asm/csr.h | 4 +-
> arch/riscv/kernel/Makefile | 1 +
> arch/riscv/kernel/sys_error.c | 80 +++++++++++++++++++++++++++++++++++
> include/linux/cpuhotplug.h | 1 +
> 4 files changed, 85 insertions(+), 1 deletion(-)
> create mode 100644 arch/riscv/kernel/sys_error.c
>
> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> index 31b8988f4488..1f43c25b07ed 100644
> --- a/arch/riscv/include/asm/csr.h
> +++ b/arch/riscv/include/asm/csr.h
> @@ -99,7 +99,8 @@
> #define IRQ_M_EXT 11
> #define IRQ_S_GEXT 12
> #define IRQ_PMU_OVF 13
> -#define IRQ_LOCAL_MAX (IRQ_PMU_OVF + 1)
> +#define IRQ_SYS_ERROR 23
Hmmm, two problems I think with this. 23 is one of the interrupts that
has been reserved for use with AIA. I don't think they use it right now,
but in the future it might see use there.
The first problem is kind of moot though, because reserving 16-23 for
AIA is a retcon, and previously these interrupts were available for custom
use on any platform (as you have done here), so while it might be a
system error on your platform, it could be something completely innocuous
on mine!
With that in mind, does having this in arch code make sense at all?
Can this just be a normal driver, that'll only probe on your specific
platform?
Cheers,
Conor.
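The "only probe on your specific platform" idea can be modelled in plain C as a match against a table of compatible strings. This is a rough userspace sketch, not kernel code: in a real driver the match would normally go through the driver core and a device-tree match table. The string "vendor,example-soc-syserr" and the helper syserr_should_probe() are hypothetical names invented here, since the thread does not name the platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical compatible string; the thread does not name the platform. */
static const char *const syserr_compatibles[] = {
	"vendor,example-soc-syserr",
	NULL, /* sentinel terminating the table */
};

/*
 * Return true only when the platform advertises a listed compatible,
 * so the handler is never registered on platforms where interrupt 23
 * may mean something else.
 */
static bool syserr_should_probe(const char *platform_compatible)
{
	if (!platform_compatible)
		return false;

	for (size_t i = 0; syserr_compatibles[i]; i++) {
		if (strcmp(platform_compatible, syserr_compatibles[i]) == 0)
			return true;
	}
	return false;
}
```

The point of the design is that the decision to claim interrupt 23 comes from a platform description rather than from unconditional arch code.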
* Re: [PATCH] riscv: add system error interrupt handler support
2026-02-26 8:27 [PATCH] riscv: add system error interrupt handler support Rui Qi
2026-02-26 9:22 ` Conor Dooley
@ 2026-02-26 19:09 ` kernel test robot
1 sibling, 0 replies; 6+ messages in thread
From: kernel test robot @ 2026-02-26 19:09 UTC (permalink / raw)
To: Rui Qi, paul.walmsley, palmer, aou, alex, cyrilbur, tglx, peterz,
debug, andybnac, charlie, geomatsi, thuth, bjorn, songshuaishuai,
martin, masahiroy, kees
Cc: oe-kbuild-all, linux-arch, linux-riscv, linux-kernel, Rui Qi
Hi Rui,
kernel test robot noticed the following build errors:
[auto build test ERROR on linus/master]
[also build test ERROR on tip/smp/core v7.0-rc1 next-20260226]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Rui-Qi/riscv-add-system-error-interrupt-handler-support/20260226-163131
base: linus/master
patch link: https://lore.kernel.org/r/20260226082735.56108-1-qirui.001%40bytedance.com
patch subject: [PATCH] riscv: add system error interrupt handler support
config: riscv-randconfig-r064-20260226 (https://download.01.org/0day-ci/archive/20260227/202602270355.h8QSG2vl-lkp@intel.com/config)
compiler: riscv32-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260227/202602270355.h8QSG2vl-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202602270355.h8QSG2vl-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from arch/riscv/include/asm/errata_list.h:8,
from arch/riscv/include/asm/vdso/processor.h:8,
from include/vdso/processor.h:10,
from arch/riscv/include/asm/processor.h:13,
from arch/riscv/include/asm/cmpxchg.h:16,
from arch/riscv/include/asm/barrier.h:14,
from include/asm-generic/bitops/generic-non-atomic.h:7,
from include/linux/bitops.h:28,
from include/linux/kernel.h:23,
from arch/riscv/kernel/sys_error.c:7:
arch/riscv/kernel/sys_error.c: In function 'riscv_serror_starting_cpu':
>> arch/riscv/kernel/sys_error.c:28:22: error: 'RV_IRQ_SYS_ERROR' undeclared (first use in this function); did you mean 'IRQ_SYS_ERROR'?
csr_set(CSR_IE, BIT(RV_IRQ_SYS_ERROR));
^~~~~~~~~~~~~~~~
arch/riscv/include/asm/csr.h:588:38: note: in definition of macro 'csr_set'
unsigned long __v = (unsigned long)(val); \
^~~
arch/riscv/kernel/sys_error.c:28:18: note: in expansion of macro 'BIT'
csr_set(CSR_IE, BIT(RV_IRQ_SYS_ERROR));
^~~
arch/riscv/kernel/sys_error.c:28:22: note: each undeclared identifier is reported only once for each function it appears in
csr_set(CSR_IE, BIT(RV_IRQ_SYS_ERROR));
^~~~~~~~~~~~~~~~
arch/riscv/include/asm/csr.h:588:38: note: in definition of macro 'csr_set'
unsigned long __v = (unsigned long)(val); \
^~~
arch/riscv/kernel/sys_error.c:28:18: note: in expansion of macro 'BIT'
csr_set(CSR_IE, BIT(RV_IRQ_SYS_ERROR));
^~~
arch/riscv/kernel/sys_error.c: In function 'riscv_serror_dying_cpu':
arch/riscv/kernel/sys_error.c:35:24: error: 'RV_IRQ_SYS_ERROR' undeclared (first use in this function); did you mean 'IRQ_SYS_ERROR'?
csr_clear(CSR_IE, BIT(RV_IRQ_SYS_ERROR));
^~~~~~~~~~~~~~~~
arch/riscv/include/asm/csr.h:605:38: note: in definition of macro 'csr_clear'
unsigned long __v = (unsigned long)(val); \
^~~
arch/riscv/kernel/sys_error.c:35:20: note: in expansion of macro 'BIT'
csr_clear(CSR_IE, BIT(RV_IRQ_SYS_ERROR));
^~~
arch/riscv/kernel/sys_error.c: In function 'sys_error_init':
arch/riscv/kernel/sys_error.c:52:51: error: 'RV_IRQ_SYS_ERROR' undeclared (first use in this function); did you mean 'IRQ_SYS_ERROR'?
riscv_sys_error_irq = irq_create_mapping(domain, RV_IRQ_SYS_ERROR);
^~~~~~~~~~~~~~~~
IRQ_SYS_ERROR
vim +28 arch/riscv/kernel/sys_error.c
25
26 static int riscv_serror_starting_cpu(unsigned int cpu)
27 {
> 28 csr_set(CSR_IE, BIT(RV_IRQ_SYS_ERROR));
29 enable_percpu_irq(riscv_sys_error_irq, irq_get_trigger_type(riscv_sys_error_irq));
30 return 0;
31 }
32
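A likely reading of these errors, offered as an assumption rather than something the report states: the patch adds RV_IRQ_SYS_ERROR just before "#endif /* !CONFIG_RISCV_M_MODE */", so the macro would exist only in the non-M-mode branch, and a randconfig that enables M-mode would see no definition. A minimal userspace sketch of defining the alias in both branches follows. DEMO_M_MODE is a hypothetical stand-in for the kernel config symbol, and sys_error_ie_bit() is a helper invented for the illustration.

```c
#include <assert.h>

#define IRQ_SYS_ERROR 23

/* DEMO_M_MODE is a hypothetical stand-in for CONFIG_RISCV_M_MODE. */
#ifdef DEMO_M_MODE
# define RV_IRQ_SYS_ERROR IRQ_SYS_ERROR
#else
# define RV_IRQ_SYS_ERROR IRQ_SYS_ERROR
#endif

/* Bit that would be set in the interrupt-enable CSR for this interrupt. */
static unsigned long sys_error_ie_bit(void)
{
	return 1UL << RV_IRQ_SYS_ERROR;
}
```

Since the value is identical in both modes, using IRQ_SYS_ERROR directly, as the compiler hint suggests, would be an equally simple fix.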
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH] riscv: add system error interrupt handler support
2026-02-26 9:22 ` Conor Dooley
@ 2026-02-27 7:54 ` Rui Qi
2026-02-27 10:08 ` Conor Dooley
0 siblings, 1 reply; 6+ messages in thread
From: Rui Qi @ 2026-02-27 7:54 UTC (permalink / raw)
To: Conor Dooley
Cc: paul.walmsley, palmer, aou, alex, cyrilbur, tglx, peterz, debug,
andybnac, charlie, geomatsi, thuth, bjorn, songshuaishuai, martin,
masahiroy, kees, linux-arch, linux-riscv, linux-kernel
On 2/26/26 5:22 PM, Conor Dooley wrote:
> On Thu, Feb 26, 2026 at 04:27:35PM +0800, Rui Qi wrote:
>> Add a system error interrupt handler for RISC-V that panics
>> the system when hardware errors are detected. The implementation includes:
>>
>> - Add IRQ_SYS_ERROR (23) interrupt definition to CSR header
>> - Implement sys_error.c module with panic handler
>> - Register per-CPU interrupt handler for system error interrupts
>> - Add module to kernel build system
>>
>> When a system error interrupt occurs, the handler immediately panics
>> the system with a descriptive message to ensure the error is properly
>> captured and the system is halted safely.
>>
>> Signed-off-by: Rui Qi <qirui.001@bytedance.com>
>> ---
>> arch/riscv/include/asm/csr.h | 4 +-
>> arch/riscv/kernel/Makefile | 1 +
>> arch/riscv/kernel/sys_error.c | 80 +++++++++++++++++++++++++++++++++++
>> include/linux/cpuhotplug.h | 1 +
>> 4 files changed, 85 insertions(+), 1 deletion(-)
>> create mode 100644 arch/riscv/kernel/sys_error.c
>>
>> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
>> index 31b8988f4488..1f43c25b07ed 100644
>> --- a/arch/riscv/include/asm/csr.h
>> +++ b/arch/riscv/include/asm/csr.h
>> @@ -99,7 +99,8 @@
>> #define IRQ_M_EXT 11
>> #define IRQ_S_GEXT 12
>> #define IRQ_PMU_OVF 13
>> -#define IRQ_LOCAL_MAX (IRQ_PMU_OVF + 1)
>> +#define IRQ_SYS_ERROR 23
>
> Hmmm, two problems I think with this. 23 is one of the interrupts that
> has been reserved for use with AIA. I don't think they use it right now,
> but in the future it might see use there.
>
> The first problem is kind of moot though, because reserving 16-23 for
> AIA is a retcon, and previously these interrupts were available for custom
> use on any platform (as you have done here), so while it might be a
> system error on your platform, it could be something completely innocuous
> on mine!
>
> With that in mind, does having this in arch code make sense at all?
> Can this just be a normal driver, that'll only probe on your specific
> platform?
>
> Cheers,
> Conor.
>
Thanks for the comment.
I checked the latest RISC-V Interrupt Spec (2025-03-12). In that
version, interrupts 16–23 are defined as architectural local interrupts,
and interrupt 23 is tentatively proposed for a “Bus or system error”
type condition. That suggests this interrupt number is no longer just a
free, platform-defined slot — it now carries architectural intent and a
potential standardized meaning.
Given this context, my current implementation treats interrupt 23 as a
local condition that matches the spec’s intent for a system-level error
signal, rather than an arbitrary, custom platform interrupt. This seemed
reasonable as long as it aligns with the architectural semantics for
local interrupts.
That said, I’m open to the concern about placing this handling in
arch/riscv, and I’d like to understand your preference: do you think
this should be entirely moved into platform-specific code, or would a
conditional, spec-aware arch implementation (e.g., gated on the presence
of the relevant AIA/local interrupt support) be acceptable? Please let
me know what approach you’d suggest.
* Re: [PATCH] riscv: add system error interrupt handler support
2026-02-27 7:54 ` Rui Qi
@ 2026-02-27 10:08 ` Conor Dooley
2026-03-02 2:41 ` Rui Qi
0 siblings, 1 reply; 6+ messages in thread
From: Conor Dooley @ 2026-02-27 10:08 UTC (permalink / raw)
To: Rui Qi
Cc: paul.walmsley, palmer, aou, alex, cyrilbur, tglx, peterz, debug,
andybnac, charlie, geomatsi, thuth, bjorn, songshuaishuai, martin,
masahiroy, kees, linux-arch, linux-riscv, linux-kernel
On Fri, Feb 27, 2026 at 03:54:39PM +0800, Rui Qi wrote:
> On 2/26/26 5:22 PM, Conor Dooley wrote:
> > On Thu, Feb 26, 2026 at 04:27:35PM +0800, Rui Qi wrote:
> >> Add a system error interrupt handler for RISC-V that panics
> >> the system when hardware errors are detected. The implementation includes:
> >>
> >> - Add IRQ_SYS_ERROR (23) interrupt definition to CSR header
> >> - Implement sys_error.c module with panic handler
> >> - Register per-CPU interrupt handler for system error interrupts
> >> - Add module to kernel build system
> >>
> >> When a system error interrupt occurs, the handler immediately panics
> >> the system with a descriptive message to ensure the error is properly
> >> captured and the system is halted safely.
> >>
> >> Signed-off-by: Rui Qi <qirui.001@bytedance.com>
> >> ---
> >> arch/riscv/include/asm/csr.h | 4 +-
> >> arch/riscv/kernel/Makefile | 1 +
> >> arch/riscv/kernel/sys_error.c | 80 +++++++++++++++++++++++++++++++++++
> >> include/linux/cpuhotplug.h | 1 +
> >> 4 files changed, 85 insertions(+), 1 deletion(-)
> >> create mode 100644 arch/riscv/kernel/sys_error.c
> >>
> >> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> >> index 31b8988f4488..1f43c25b07ed 100644
> >> --- a/arch/riscv/include/asm/csr.h
> >> +++ b/arch/riscv/include/asm/csr.h
> >> @@ -99,7 +99,8 @@
> >> #define IRQ_M_EXT 11
> >> #define IRQ_S_GEXT 12
> >> #define IRQ_PMU_OVF 13
> >> -#define IRQ_LOCAL_MAX (IRQ_PMU_OVF + 1)
> >> +#define IRQ_SYS_ERROR 23
> >
> > Hmmm, two problems I think with this. 23 is one of the interrupts that
> > has been reserved for use with AIA. I don't think they use it right now,
> > but in the future it might see use there.
> >
> > The first problem is kind of moot though, because reserving 16-23 for
> > AIA is a retcon, and previously these interrupts were available for custom
> > use on any platform (as you have done here), so while it might be a
> > system error on your platform, it could be something completely innocuous
> > on mine!
> >
> > With that in mind, does having this in arch code make sense at all?
> > Can this just be a normal driver, that'll only probe on your specific
> > platform?
>
> Thanks for the comment.
>
> I checked the latest RISC-V Interrupt Spec (2025-03-12). In that
> version, interrupts 16–23 are defined as architectural local interrupts,
> and interrupt 23 is tentatively proposed for a “Bus or system error”
Right, tentative is kinda no use to either of us though. Do you have
hardware that does this?
> type condition. That suggests this interrupt number is no longer just a
> free, platform-defined slot — it now carries architectural intent and a
> potential standardized meaning.
Emphasis on "potential", of course ;) Still a platform defined slot for
anyone that doesn't implement AIA, I think...
I don't understand the interaction between extensions and the priv spec
here, since the priv spec still says that 16 and higher are platform
specific.
> Given this context, my current implementation treats interrupt 23 as a
> local condition that matches the spec’s intent for a system-level error
> signal, rather than an arbitrary, custom platform interrupt. This seemed
> reasonable as long as it aligns with the architectural semantics for
> local interrupts.
>
> That said, I’m open to the concern about placing this handling in
> arch/riscv, and I’d like to understand your preference: do you think
> this should be entirely moved into platform-specific code, or would a
> conditional, spec-aware arch implementation (e.g., gated on the presence
> of the relevant AIA/local interrupt support) be acceptable? Please let
It can't be gated on AIA, because AIA isn't the extension that says that
this is what interrupt 23 does (it specifically says that new extensions
are expected to define the use of the three interrupts it "proposes"), and
we don't permit extension related stuff that is not frozen. I couldn't
find an extension at any stage that set the behaviour in stone, are you
aware of one? If one exists, a spec-aware arch implementation would be
okay. If one does not, you'll have to make this specific to your
platform until that extension shows up.
> me know what approach you’d suggest.
* Re: [PATCH] riscv: add system error interrupt handler support
2026-02-27 10:08 ` Conor Dooley
@ 2026-03-02 2:41 ` Rui Qi
0 siblings, 0 replies; 6+ messages in thread
From: Rui Qi @ 2026-03-02 2:41 UTC (permalink / raw)
To: Conor Dooley
Cc: paul.walmsley, palmer, aou, alex, cyrilbur, tglx, peterz, debug,
andybnac, charlie, geomatsi, thuth, bjorn, songshuaishuai, martin,
masahiroy, kees, linux-arch, linux-riscv, linux-kernel
On 2/27/26 6:08 PM, Conor Dooley wrote:
> On Fri, Feb 27, 2026 at 03:54:39PM +0800, Rui Qi wrote:
>> On 2/26/26 5:22 PM, Conor Dooley wrote:
>>> On Thu, Feb 26, 2026 at 04:27:35PM +0800, Rui Qi wrote:
>>>> Add a system error interrupt handler for RISC-V that panics
>>>> the system when hardware errors are detected. The implementation includes:
>>>>
>>>> - Add IRQ_SYS_ERROR (23) interrupt definition to CSR header
>>>> - Implement sys_error.c module with panic handler
>>>> - Register per-CPU interrupt handler for system error interrupts
>>>> - Add module to kernel build system
>>>>
>>>> When a system error interrupt occurs, the handler immediately panics
>>>> the system with a descriptive message to ensure the error is properly
>>>> captured and the system is halted safely.
>>>>
>>>> Signed-off-by: Rui Qi <qirui.001@bytedance.com>
>>>> ---
>>>> arch/riscv/include/asm/csr.h | 4 +-
>>>> arch/riscv/kernel/Makefile | 1 +
>>>> arch/riscv/kernel/sys_error.c | 80 +++++++++++++++++++++++++++++++++++
>>>> include/linux/cpuhotplug.h | 1 +
>>>> 4 files changed, 85 insertions(+), 1 deletion(-)
>>>> create mode 100644 arch/riscv/kernel/sys_error.c
>>>>
>>>> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
>>>> index 31b8988f4488..1f43c25b07ed 100644
>>>> --- a/arch/riscv/include/asm/csr.h
>>>> +++ b/arch/riscv/include/asm/csr.h
>>>> @@ -99,7 +99,8 @@
>>>> #define IRQ_M_EXT 11
>>>> #define IRQ_S_GEXT 12
>>>> #define IRQ_PMU_OVF 13
>>>> -#define IRQ_LOCAL_MAX (IRQ_PMU_OVF + 1)
>>>> +#define IRQ_SYS_ERROR 23
>>>
>>> Hmmm, two problems I think with this. 23 is one of the interrupts that
>>> has been reserved for use with AIA. I don't think they use it right now,
>>> but in the future it might see use there.
>>>
>>> The first problem is kind of moot though, because reserving 16-23 for
>>> AIA is a retcon, and previously these interrupts were available for custom
>>> use on any platform (as you have done here), so while it might be a
>>> system error on your platform, it could be something completely innocuous
>>> on mine!
>>>
>>> With that in mind, does having this in arch code make sense at all?
>>> Can this just be a normal driver, that'll only probe on your specific
>>> platform?
>>
>> Thanks for the comment.
>>
>> I checked the latest RISC-V Interrupt Spec (2025-03-12). In that
>> version, interrupts 16–23 are defined as architectural local interrupts,
>> and interrupt 23 is tentatively proposed for a “Bus or system error”
>
> Right, tentative is kinda no use to either of us though. Do you have
> hardware that does this?
>
>> type condition. That suggests this interrupt number is no longer just a
>> free, platform-defined slot — it now carries architectural intent and a
>> potential standardized meaning.
>
> Emphasis on "potential", of course ;) Still a platform defined slot for
> anyone that doesn't implement AIA, I think...
> I don't understand the interaction between extensions and the priv spec
> here, since the priv spec still says that 16 and higher are platform
> specific.
>
>> Given this context, my current implementation treats interrupt 23 as a
>> local condition that matches the spec’s intent for a system-level error
>> signal, rather than an arbitrary, custom platform interrupt. This seemed
>> reasonable as long as it aligns with the architectural semantics for
>> local interrupts.
>>
>> That said, I’m open to the concern about placing this handling in
>> arch/riscv, and I’d like to understand your preference: do you think
>> this should be entirely moved into platform-specific code, or would a
>> conditional, spec-aware arch implementation (e.g., gated on the presence
>> of the relevant AIA/local interrupt support) be acceptable? Please let
>
> It can't be gated on AIA, because AIA isn't the extension that says that
> this is what interrupt 23 does (it specifically says that new extensions
> are expected to define the use of the three interrupts it "proposes"), and
> we don't permit extension related stuff that is not frozen. I couldn't
> find an extension at any stage that set the behaviour in stone, are you
> aware of one? If one exists, a spec-aware arch implementation would be
> okay. If one does not, you'll have to make this specific to your
> platform until that extension shows up.
>
>> me know what approach you’d suggest.
Thanks for the clarification — that makes sense.
You're right that AIA itself does not normatively define the semantics
of interrupt 23, and that the wording there is explicitly provisional. I
also understand the policy around not merging extension-related behavior
unless it is tied to a frozen specification.
My reasoning was based on the architectural direction indicated in the
interrupt spec, but I agree that “tentative” does not constitute a
stable architectural contract, and therefore isn't sufficient to justify
arch-level handling at this point.
Given that, I can move this into platform-specific code for now. If and
when a ratified extension formally assigns defined semantics to
interrupt 23, we could then revisit a spec-aware arch implementation
gated on that extension.
Please let me know if that approach aligns better with your expectations.