public inbox for opensbi@lists.infradead.org
 help / color / mirror / Atom feed
From: Bo Gan <ganboing@gmail.com>
To: opensbi@lists.infradead.org
Cc: nick.hu@sifive.com, linmin@eswincomputing.com,
	gaohan@iscas.ac.cn, me@ziyao.cc
Subject: Re: [PATCH 2/2] platform: generic: eswin: Add eic770x_hsm and fix warm reset issues
Date: Tue, 3 Mar 2026 23:52:36 -0800	[thread overview]
Message-ID: <a5064a0a-be3c-45a5-9166-a965cbc158c7@gmail.com> (raw)
In-Reply-To: <20260302084029.428-3-ganboing@gmail.com>

Hi All,

An issue is reported by Linmin (ESWIN) in a separate discussion with me.
Specifically, the mcput_cease_from_tile_x bit will not properly reflect
hart cease status if the remote hart hasn't undergone a round of HSM
start->stop, and its 0x7c1 CSR is still at reset value. I'm working on
a fix and will push v2 shortly. Thanks.

Bo

On 3/2/26 00:40, Bo Gan wrote:
> During warm reset, my EIC770X/Hifive Premier P550 can sometimes
> encounter memory corruption issue crashing Linux boot. Currently the
> issue is mitigated by having a sbi_printf before writing to the reset
> register. I analyzed the issue further since then. From the SoC
> datasheet[1], it's recommended to implement power-down flow as:
> 
>    a. Designate a primary core, and let it broadcast requests to other
>       cores to execute a CEASE insn. Primary core also notifies an
>       "Externel Agent" to start monitoring.
>    b. Primary core waits for other cores to CEASE before it CEASEs.
>    c. "External Agent" waits for primary core to CEASE before resets
>       the Core Complex.
> 
> It's possible that EIC770X can trigger undefined behavior if the core
> complex is reset while the harts are actively running. The sbi_printf
> in the reset handler effectively hides the problem by delaying the
> reset -- by the time sbi_printf finishes, all other harts will have
> already landed in the loop in sbi_hsm_hart_wait(), which parks the hart.
> Without the sbi_printf, I confirmed that other harts haven't reached
> sbi_hsm_hart_wait yet before current hart resets the SoC. (by debugging)
> 
> To safely reset, and inspired by the datasheet, the warm reset logic
> in eic770x.c now use the current hart as both primary core and the
> "External Agent", and other harts as secondary cores. It leverages
> the HSM framework and a new eic770x_hsm device to CEASE other harts,
> and wait for them to CEASE before resets the SoC. with the sbi_printf
> before reset removed, and this logic in place, stress test shows that
> the memory corruption issue no longer occurs.
> 
> The new eic770x_hsm device is only used for the reset-CEASE logic at
> the moment, and may be extended to a fully functional HSM device in
> the future.
> 
> [1] https://github.com/eswincomputing/EIC7700X-SoC-Technical-Reference-Manual
> 
> Fixes: e5797e0688c1 ("platform: generic: eswin: add EIC7700")
> Signed-off-by: Bo Gan <ganboing@gmail.com>
> ---
>   platform/generic/eswin/eic770x.c         | 105 ++++++++++++++++++++---
>   platform/generic/eswin/hfp.c             |   4 +-
>   platform/generic/include/eswin/eic770x.h |  20 +++--
>   3 files changed, 113 insertions(+), 16 deletions(-)
> 
> diff --git a/platform/generic/eswin/eic770x.c b/platform/generic/eswin/eic770x.c
> index 7330df9f..237fb06d 100644
> --- a/platform/generic/eswin/eic770x.c
> +++ b/platform/generic/eswin/eic770x.c
> @@ -10,14 +10,96 @@
>   #include <sbi/sbi_console.h>
>   #include <sbi/sbi_system.h>
>   #include <sbi/sbi_math.h>
> +#include <sbi/sbi_hsm.h>
> +#include <sbi/sbi_ipi.h>
>   #include <sbi/sbi_hart_pmp.h>
>   #include <sbi/sbi_hart_protection.h>
> +#include <sbi_utils/hsm/fdt_hsm_sifive_inst.h>
>   #include <eswin/eic770x.h>
>   #include <eswin/hfp.h>
>   
>   static struct sbi_hart_protection eswin_eic7700_pmp_protection;
> +static volatile bool eic770x_power_down = false;
>   
> -static int eic770x_system_reset_check(u32 type, u32 reason)
> +static int eic770x_hart_start(u32 hartid, ulong saddr)
> +{
> +	u32 hartindex = sbi_hartid_to_hartindex(hartid);
> +
> +	/*
> +	 * saddr is ignored intentionally.
> +	 * For non-power-down scenarios, eic770x_hart_stop simply
> +	 * returns, putting the hart in atomic_read(&hdata->state)
> +	 * loop in sbi_hsm_hart_wait. We wake it up if it's in wfi()
> +	 */
> +	return sbi_ipi_raw_send(hartindex, true);
> +}
> +
> +static int eic770x_hart_stop()
> +{
> +	/*
> +	 * fence to enforce all previous ipi clears are done
> +	 * Refer to comments below in eic770x_cease_other_harts
> +	 */
> +	asm volatile ("fence o, r");
> +
> +	if (!eic770x_power_down)
> +		return SBI_ENOTSUPP;
> +
> +	sifive_cease();
> +}
> +
> +void eic770x_cease_other_harts(void)
> +{
> +	u32 to_cease[2] = {};
> +
> +	eic770x_power_down = true;
> +	sbi_for_each_hartindex(i) {
> +		u32 hartid = sbi_hartindex_to_hartid(i);
> +		u32 die    = hart_die(hartid);
> +		u32 core   = hart_core(hartid);
> +
> +		/* Only wait for other harts */
> +		if (i == current_hartindex())
> +			continue;
> +		/*
> +		 * Bring harts out of WFI in sbi_hsm_hart_wait
> +		 * Harts won't miss this IPI, because:
> +		 *  1. If hart goes to wfi() in sbi_hsm_hart_wait,
> +		 *     it must have not observed eic770x_power_down
> +		 *  2. If it hasn't observed eic770x_power_down,
> +		 *     then it must haven't observed the IPI sent,
> +		 *     given the wmb() in sbi_ipi_raw_send
> +		 *  3. Given the fence o, r, any previous ipi_clear
> +		 *     can't fall-through the read of eic770x_power_down
> +		 */
> +		sbi_ipi_raw_send(i, false);
> +		to_cease[die] |= EIC770X_MC_CEASE_BIT(core);
> +	}
> +
> +	for (u32 die = 0; die < array_size(to_cease); die++) {
> +		/*
> +		 * MCPU status indicates the wfi/debug/halt/cease status
> +		 * of each individual harts in the same die. The value
> +		 * can change on the fly, but for ceased harts, the cease
> +		 * bit remains high until reset
> +		 */
> +		u32 *status = (u32*)EIC770X_MCPU_STATUS(die);
> +
> +		if (!to_cease[die])
> +			continue;
> +
> +		/* Wait for mcput_cease_from_tile_x */
> +		while ((readl(status) & to_cease[die]) != to_cease[die]);
> +	}
> +}
> +
> +static const struct sbi_hsm_device eswin_eic770x_hsm = {
> +	.name	      = "eic770x_hsm",
> +	.hart_start   = eic770x_hart_start,
> +	.hart_stop    = eic770x_hart_stop,
> +};
> +
> +static int eic7700_system_reset_check(u32 type, u32 reason)
>   {
>   	switch (type) {
>   	case SBI_SRST_RESET_TYPE_COLD_REBOOT:
> @@ -28,23 +110,23 @@ static int eic770x_system_reset_check(u32 type, u32 reason)
>   	}
>   }
>   
> -static void eic770x_system_reset(u32 type, u32 reason)
> +static void eic7700_system_reset(u32 type, u32 reason)
>   {
>   	switch (type) {
>   	case SBI_SRST_RESET_TYPE_COLD_REBOOT:
>   	case SBI_SRST_RESET_TYPE_WARM_REBOOT:
> -		sbi_printf("%s: resetting...\n", __func__);
> -		writel(EIC770X_SYSCRG_RST_VAL, (void *)EIC770X_SYSCRG_RST);
> +		eic770x_cease_other_harts();
> +		writel(EIC770X_SYSRST_VAL, (void *)EIC770X_SYSCRG_SYSRST);
>   	}
>   
> -	sbi_hart_hang();
> +	sifive_cease();
>   }
>   
>   static struct sbi_system_reset_device *board_reset = NULL;
> -static struct sbi_system_reset_device eic770x_reset = {
> -	.name = "eic770x_reset",
> -	.system_reset_check = eic770x_system_reset_check,
> -	.system_reset = eic770x_system_reset,
> +static struct sbi_system_reset_device eic7700_reset = {
> +	.name = "eic7700_reset",
> +	.system_reset_check = eic7700_system_reset_check,
> +	.system_reset = eic7700_system_reset,
>   };
>   
>   #define add_root_mem_chk(...) do { \
> @@ -145,7 +227,7 @@ static int eswin_eic7700_early_init(bool cold_boot)
>   
>   	if (board_reset)
>   		sbi_system_reset_add_device(board_reset);
> -	sbi_system_reset_add_device(&eic770x_reset);
> +	sbi_system_reset_add_device(&eic7700_reset);
>   
>   	/* Enable bus blocker */
>   	writel(1, (void*)EIC770X_TL64D2D_OUT);
> @@ -231,6 +313,9 @@ static int eswin_eic7700_final_init(bool cold_boot)
>   	int rc;
>   
>   
> +	if (cold_boot)
> +		sbi_hsm_set_device(&eswin_eic770x_hsm);
> +
>   	/**
>   	 * Do generic_final_init stuff first, because it touchs FDT.
>   	 * After final_init, we'll block entire memory port with the
> diff --git a/platform/generic/eswin/hfp.c b/platform/generic/eswin/hfp.c
> index eabed191..a6e73e18 100644
> --- a/platform/generic/eswin/hfp.c
> +++ b/platform/generic/eswin/hfp.c
> @@ -11,6 +11,7 @@
>   #include <sbi/sbi_hart.h>
>   #include <sbi/sbi_ecall_interface.h>
>   #include <sbi_utils/serial/uart8250.h>
> +#include <sbi_utils/hsm/fdt_hsm_sifive_inst.h>
>   #include <eswin/eic770x.h>
>   #include <eswin/hfp.h>
>   
> @@ -94,6 +95,7 @@ static int hfp_system_reset_check(u32 type, u32 reason)
>   
>   static void hfp_system_reset(u32 type, u32 reason)
>   {
> +	eic770x_cease_other_harts();
>   	switch (type) {
>   	case SBI_SRST_RESET_TYPE_SHUTDOWN:
>   		hfp_send_bmc_msg(HFP_MSG_NOTIFY, HFP_CMD_POWER_OFF,
> @@ -104,7 +106,7 @@ static void hfp_system_reset(u32 type, u32 reason)
>   				NULL, 0);
>   		break;
>   	}
> -	sbi_hart_hang();
> +	sifive_cease();
>   }
>   
>   static struct sbi_system_reset_device hfp_reset = {
> diff --git a/platform/generic/include/eswin/eic770x.h b/platform/generic/include/eswin/eic770x.h
> index 67764ec0..fe7c1f58 100644
> --- a/platform/generic/include/eswin/eic770x.h
> +++ b/platform/generic/include/eswin/eic770x.h
> @@ -14,6 +14,8 @@ struct eic770x_board_override {
>   	struct sbi_system_reset_device *reset_dev;
>   };
>   
> +void eic770x_cease_other_harts(void);
> +
>   /* CSRs */
>   #define EIC770X_CSR_BRPREDICT	0x7c0
>   #define EIC770X_CSR_FEAT0	0x7c1
> @@ -55,11 +57,16 @@ struct eic770x_board_override {
>   #define EIC770X_UART_REG_SHIFT	2
>   #define EIC770X_UART_REG_WIDTH	4
>   
> -#define EIC770X_SYSCRG		(EIC770X_SYSPORT_LOCAL + 0x11828000UL)
> -#define EIC770X_SYSCRG_LSPCLK0	(EIC770X_SYSCRG + 0x200UL)
> -#define EIC770X_SYSCRG_SYSCLK	(EIC770X_SYSCRG + 0x20cUL)
> -#define EIC770X_SYSCRG_RST	(EIC770X_SYSCRG + 0x300UL)
> -#define EIC770X_SYSCRG_RST_VAL	0x1AC0FFE6UL
> +#define EIC770X_SYSCON(d)	(EIC770X_SYSPORT_BASE(d) + 0x11810000UL)
> +#define EIC770X_MCPU_STATUS(d)	(EIC770X_SYSCON(d) + 0x608UL)
> +#define EIC770X_MC_CEASE_BIT(c)	(1UL << (15 - c))
> +
> +#define EIC770X_SYSCRG(d)	(EIC770X_SYSPORT_BASE(d) + 0x11828000UL)
> +#define EIC770X_SYSCRG_LOCAL	(EIC770X_SYSPORT_LOCAL + 0x11828000UL)
> +#define EIC770X_SYSCRG_LSPCLK0	(EIC770X_SYSCRG_LOCAL + 0x200UL)
> +#define EIC770X_SYSCRG_MCCLK(d) (EIC770X_SYSCRG(d) + 0x208UL)
> +#define EIC770X_SYSCRG_SYSCLK	(EIC770X_SYSCRG_LOCAL + 0x20cUL)
> +#define EIC770X_SYSCRG_SYSRST	(EIC770X_SYSCRG_LOCAL + 0x300UL)
>   
>   /* Memory Ports */
>   #define EIC770X_MEMPORT_BASE	0x0080000000UL // 2G
> @@ -98,4 +105,7 @@ struct eic770x_board_override {
>   		divisor > 2 ? divisor : 2; 		\
>   	})
>   
> +/* Reset definitions */
> +#define EIC770X_SYSRST_VAL	0x1AC0FFE6UL
> +
>   #endif


-- 
opensbi mailing list
opensbi@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/opensbi

  reply	other threads:[~2026-03-04  7:54 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-02  8:40 Add eic770x_hsm and fix warm Bo Gan
2026-03-02  8:40 ` [PATCH 1/2] include: utils/hsm: Add __noreturn attribute for sifive_cease Bo Gan
2026-03-02  8:40 ` [PATCH 2/2] platform: generic: eswin: Add eic770x_hsm and fix warm reset issues Bo Gan
2026-03-04  7:52   ` Bo Gan [this message]
2026-04-06 11:05     ` Anup Patel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a5064a0a-be3c-45a5-9166-a965cbc158c7@gmail.com \
    --to=ganboing@gmail.com \
    --cc=gaohan@iscas.ac.cn \
    --cc=linmin@eswincomputing.com \
    --cc=me@ziyao.cc \
    --cc=nick.hu@sifive.com \
    --cc=opensbi@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox