From: "Troy Mitchell" <troy.mitchell@linux.dev>
To: "Vivian Wang" <wangruikang@iscas.ac.cn>,
"Troy Mitchell" <troy.mitchell@linux.dev>,
"Paul Walmsley" <pjw@kernel.org>,
"Palmer Dabbelt" <palmer@dabbelt.com>,
"Albert Ou" <aou@eecs.berkeley.edu>,
"Alexandre Ghiti" <alex@ghiti.fr>
Cc: <linux-riscv@lists.infradead.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH RFC] riscv: disable local interrupts and stop other CPUs before restart
Date: Mon, 16 Mar 2026 15:23:48 +0800 [thread overview]
Message-ID: <DH40Z14EHV40.1I443BTRU720F@linux.dev> (raw)
In-Reply-To: <f9079774-1d16-4b84-bf85-bc8444e93446@iscas.ac.cn>
Hi Vivian,
Thanks for the review and the insights!
On Thu Mar 12, 2026 at 11:05 AM CST, Vivian Wang wrote:
> On 3/11/26 10:51, Troy Mitchell wrote:
>> Currently, the RISC-V implementation of machine_restart() directly calls
>> do_kernel_restart() without disabling local interrupts or stopping other
>> CPUs. This missing architectural setup causes fatal issues for systems
>> that rely on external peripherals (e.g., I2C PMICs) to execute the system
>> restart when CONFIG_PREEMPT_RCU is enabled.
>>
>> When a restart handler relies on the I2C subsystem, the I2C core checks
>> i2c_in_atomic_xfer_mode() to decide whether to use the sleepable xfer
>> or the polling atomic_xfer. This check evaluates to true if
>> (!preemptible() || irqs_disabled()).
>>
>> During do_kernel_restart(), the restart handlers are invoked via
>> atomic_notifier_call_chain(), which holds an RCU read lock.
>> The behavior diverges based on the preemption model:
>> 1. Under CONFIG_PREEMPT_VOLUNTARY or CONFIG_PREEMPT_NONE, rcu_read_lock()
>> implicitly disables preemption. preemptible() evaluates to false, and
>> the I2C core correctly routes to the atomic, polling transfer path.
>> 2. Under CONFIG_PREEMPT_RCU, rcu_read_lock() does NOT disable preemption.
>> Since machine_restart() left local interrupts enabled, irqs_disabled()
>> is false, and preempt_count is 0. Consequently, preemptible() evaluates
>> to true.
>>
>> As a result, the I2C core falsely assumes a sleepable context and routes
>> the transfer to the standard master_xfer path. This inevitably triggers a
>> schedule() call while holding the RCU read lock, resulting in a fatal splat:
>> "Voluntary context switch within RCU read-side critical section!" and
>> a system hang.
>>
>> Align RISC-V with other major architectures (e.g., ARM64) by adding
>> local_irq_disable() and smp_send_stop() to machine_restart().
>> - local_irq_disable() guarantees a strict atomic context, forcing sub-
>> systems like I2C to always fall back to polling mode.
>> - smp_send_stop() ensures exclusive hardware access by quiescing other
>> CPUs, preventing them from holding bus locks (e.g., I2C spinlocks)
>> during the final restart phase.
>
> Maybe while we're at it, we can fix the other functions in this file as
> well?
Nice catch. I'll fix other functions in the next version.
>
> I think the reason we ended up with the "unsafe" implementations of the
> reboot/shutdown functions is that on the backend it is usually SBI SRST
> calls, which can protect itself from other CPUs and interrupts. Since on
> K1 we're going to be poking I2C directly, we run into the problem
> described above. So all of these should disable interrupts and stop
> other CPUs before calling the handlers, and can't assume the handlers
> are all SBI SRST.
Yes, we cannot assume that all platforms rely on this.
>
>> Signed-off-by: Troy Mitchell <troy.mitchell@linux.dev>
>> ---
>> arch/riscv/kernel/reset.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/arch/riscv/kernel/reset.c b/arch/riscv/kernel/reset.c
>> index 912288572226..7a5dcfdc3674 100644
>> --- a/arch/riscv/kernel/reset.c
>> +++ b/arch/riscv/kernel/reset.c
>> @@ -5,6 +5,7 @@
>>
>> #include <linux/reboot.h>
>> #include <linux/pm.h>
>> +#include <linux/smp.h>
>>
>> static void default_power_off(void)
>> {
>> @@ -17,6 +18,10 @@ EXPORT_SYMBOL(pm_power_off);
>>
>> void machine_restart(char *cmd)
>> {
>> + /* Disable interrupts first */
>> + local_irq_disable();
>> + smp_send_stop();
>> +
>> do_kernel_restart(cmd);
>> while (1);
>> }
>
> So... one thing I'm not certain is that arm64 also has some EFI handling
> here. But I think the safe choice is to ignore EFI for now until it's
> needed, instead of preemptively doing it. Who knows what shenanigans
> firmwares can get up to.
I agree at all.
>
> Thanks for the patch!
I'll prepare v2 with those functions fixed. Looking forward to your further review
on the updated version
- Troy
>
> Vivian "dramforever" Wang
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
WARNING: multiple messages have this Message-ID (diff)
From: "Troy Mitchell" <troy.mitchell@linux.dev>
To: "Vivian Wang" <wangruikang@iscas.ac.cn>,
"Troy Mitchell" <troy.mitchell@linux.dev>,
"Paul Walmsley" <pjw@kernel.org>,
"Palmer Dabbelt" <palmer@dabbelt.com>,
"Albert Ou" <aou@eecs.berkeley.edu>,
"Alexandre Ghiti" <alex@ghiti.fr>
Cc: <linux-riscv@lists.infradead.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH RFC] riscv: disable local interrupts and stop other CPUs before restart
Date: Mon, 16 Mar 2026 15:23:48 +0800 [thread overview]
Message-ID: <DH40Z14EHV40.1I443BTRU720F@linux.dev> (raw)
In-Reply-To: <f9079774-1d16-4b84-bf85-bc8444e93446@iscas.ac.cn>
Hi Vivian,
Thanks for the review and the insights!
On Thu Mar 12, 2026 at 11:05 AM CST, Vivian Wang wrote:
> On 3/11/26 10:51, Troy Mitchell wrote:
>> Currently, the RISC-V implementation of machine_restart() directly calls
>> do_kernel_restart() without disabling local interrupts or stopping other
>> CPUs. This missing architectural setup causes fatal issues for systems
>> that rely on external peripherals (e.g., I2C PMICs) to execute the system
>> restart when CONFIG_PREEMPT_RCU is enabled.
>>
>> When a restart handler relies on the I2C subsystem, the I2C core checks
>> i2c_in_atomic_xfer_mode() to decide whether to use the sleepable xfer
>> or the polling atomic_xfer. This check evaluates to true if
>> (!preemptible() || irqs_disabled()).
>>
>> During do_kernel_restart(), the restart handlers are invoked via
>> atomic_notifier_call_chain(), which holds an RCU read lock.
>> The behavior diverges based on the preemption model:
>> 1. Under CONFIG_PREEMPT_VOLUNTARY or CONFIG_PREEMPT_NONE, rcu_read_lock()
>> implicitly disables preemption. preemptible() evaluates to false, and
>> the I2C core correctly routes to the atomic, polling transfer path.
>> 2. Under CONFIG_PREEMPT_RCU, rcu_read_lock() does NOT disable preemption.
>> Since machine_restart() left local interrupts enabled, irqs_disabled()
>> is false, and preempt_count is 0. Consequently, preemptible() evaluates
>> to true.
>>
>> As a result, the I2C core falsely assumes a sleepable context and routes
>> the transfer to the standard master_xfer path. This inevitably triggers a
>> schedule() call while holding the RCU read lock, resulting in a fatal splat:
>> "Voluntary context switch within RCU read-side critical section!" and
>> a system hang.
>>
>> Align RISC-V with other major architectures (e.g., ARM64) by adding
>> local_irq_disable() and smp_send_stop() to machine_restart().
>> - local_irq_disable() guarantees a strict atomic context, forcing sub-
>> systems like I2C to always fall back to polling mode.
>> - smp_send_stop() ensures exclusive hardware access by quiescing other
>> CPUs, preventing them from holding bus locks (e.g., I2C spinlocks)
>> during the final restart phase.
>
> Maybe while we're at it, we can fix the other functions in this file as
> well?
Nice catch. I'll fix other functions in the next version.
>
> I think the reason we ended up with the "unsafe" implementations of the
> reboot/shutdown functions is that on the backend it is usually SBI SRST
> calls, which can protect itself from other CPUs and interrupts. Since on
> K1 we're going to be poking I2C directly, we run into the problem
> described above. So all of these should disable interrupts and stop
> other CPUs before calling the handlers, and can't assume the handlers
> are all SBI SRST.
Yes, we cannot assume that all platforms rely on this.
>
>> Signed-off-by: Troy Mitchell <troy.mitchell@linux.dev>
>> ---
>> arch/riscv/kernel/reset.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/arch/riscv/kernel/reset.c b/arch/riscv/kernel/reset.c
>> index 912288572226..7a5dcfdc3674 100644
>> --- a/arch/riscv/kernel/reset.c
>> +++ b/arch/riscv/kernel/reset.c
>> @@ -5,6 +5,7 @@
>>
>> #include <linux/reboot.h>
>> #include <linux/pm.h>
>> +#include <linux/smp.h>
>>
>> static void default_power_off(void)
>> {
>> @@ -17,6 +18,10 @@ EXPORT_SYMBOL(pm_power_off);
>>
>> void machine_restart(char *cmd)
>> {
>> + /* Disable interrupts first */
>> + local_irq_disable();
>> + smp_send_stop();
>> +
>> do_kernel_restart(cmd);
>> while (1);
>> }
>
> So... one thing I'm not certain is that arm64 also has some EFI handling
> here. But I think the safe choice is to ignore EFI for now until it's
> needed, instead of preemptively doing it. Who knows what shenanigans
> firmwares can get up to.
I agree at all.
>
> Thanks for the patch!
I'll prepare v2 with those functions fixed. Looking forward to your further review
on the updated version
- Troy
>
> Vivian "dramforever" Wang
next prev parent reply other threads:[~2026-03-16 7:25 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-11 2:51 [PATCH RFC] riscv: disable local interrupts and stop other CPUs before restart Troy Mitchell
2026-03-11 2:51 ` Troy Mitchell
2026-03-11 6:47 ` Aurelien Jarno
2026-03-11 6:47 ` Aurelien Jarno
2026-03-11 9:49 ` Troy Mitchell
2026-03-11 9:49 ` Troy Mitchell
2026-03-11 9:54 ` Troy Mitchell
2026-03-11 9:54 ` Troy Mitchell
2026-03-12 3:05 ` Vivian Wang
2026-03-12 3:05 ` Vivian Wang
2026-03-16 7:23 ` Troy Mitchell [this message]
2026-03-16 7:23 ` Troy Mitchell
2026-03-16 13:19 ` Samuel Holland
2026-03-16 13:19 ` Samuel Holland
2026-03-17 2:45 ` Troy Mitchell
2026-03-17 2:45 ` Troy Mitchell
2026-03-17 4:07 ` Samuel Holland
2026-03-17 4:07 ` Samuel Holland
2026-03-17 7:42 ` Troy Mitchell
2026-03-17 7:42 ` Troy Mitchell
2026-04-23 20:49 ` Aurelien Jarno
2026-04-23 20:49 ` Aurelien Jarno
2026-04-25 17:42 ` Anand Moon
2026-04-25 17:42 ` Anand Moon
2026-04-26 12:20 ` Aurelien Jarno
2026-04-26 12:20 ` Aurelien Jarno
2026-04-27 3:47 ` Anand Moon
2026-04-27 3:47 ` Anand Moon
2026-05-04 13:27 ` Anand Moon
2026-05-04 13:27 ` Anand Moon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DH40Z14EHV40.1I443BTRU720F@linux.dev \
--to=troy.mitchell@linux.dev \
--cc=alex@ghiti.fr \
--cc=aou@eecs.berkeley.edu \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-riscv@lists.infradead.org \
--cc=palmer@dabbelt.com \
--cc=pjw@kernel.org \
--cc=wangruikang@iscas.ac.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.