From: Nikita Shubin <nikita.shubin@maquefel.me>
To: Rick Chen <rickchen36@gmail.com>
Cc: Lukas Auer <lukas.auer@aisec.fraunhofer.de>,
	U-Boot Mailing List <u-boot@lists.denx.de>,
	Heinrich Schuchardt <xypron.glpk@gmx.de>,
	Atish Patra <atishp@atishpatra.org>,
	Anup Patel <anup@brainfault.org>, Bin Meng <bmeng.cn@gmail.com>,
	Sean Anderson <seanga2@gmail.com>,
	Leo Liang <ycliang@andestech.com>, rick <rick@andestech.com>
Subject: Re: RISCV: the machanism of available_harts may cause other harts boot failure
Date: Mon, 5 Sep 2022 10:47:35 +0300
Message-ID: <20220905104735.5c2a260d@redslave.neermore.group>
In-Reply-To: <CAN5B=e++LtsbvCmK+P0k1bkNtYGsKLLFAADfzkHrhxUWYvjLfw@mail.gmail.com>

Hi Rick!

On Mon, 5 Sep 2022 14:22:41 +0800
Rick Chen <rickchen36@gmail.com> wrote:

> Hi,
> 
> When I free-run a SMP system, I once hit a failure case where some
> harts didn't boot to the kernel shell successfully.
> However it can't be duplicated anymore even if I try many times.
> 
> But when I set a break during debugging with GDB, it can trigger the
> failure case each time.

If a hart fails to register itself in available_harts before the main
hart reaches send_ipi_many:
https://elixir.bootlin.com/u-boot/v2022.10-rc3/source/arch/riscv/lib/smp.c#L50

it won't exit secondary_hart_loop:
https://elixir.bootlin.com/u-boot/v2022.10-rc3/source/arch/riscv/cpu/start.S#L433
because no IPI will be sent to it.

This might be exactly your case.
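
To make the ordering problem concrete, here is a minimal single-threaded
model of it in C. It is only a sketch, not the code in smp.c: the names
hart_register, send_ipi_many_model, stuck_harts and ipi_sent are made up
for illustration, and I am assuming the sender simply skips every hart
whose bit is not yet set in the mask.

```c
#include <assert.h>
#include <stdint.h>

#define NR_HARTS 8u

/* Hypothetical model state, not U-Boot's real global data. */
uint32_t available_harts;	/* bit n set: hart n has registered */
uint32_t ipi_sent;		/* bit n set: hart n was sent an IPI */

/* A secondary hart announces itself by setting its bit in the mask. */
void hart_register(unsigned int hart)
{
	assert(hart < NR_HARTS);
	available_harts |= 1u << hart;
}

/* Assumed behaviour: the main hart only signals harts already in the mask. */
void send_ipi_many_model(unsigned int boot_hart)
{
	for (unsigned int hart = 0; hart < NR_HARTS; hart++) {
		if (hart == boot_hart)
			continue;	/* never signal ourselves */
		if (!(available_harts & (1u << hart)))
			continue;	/* not registered yet: skipped for good */
		ipi_sent |= 1u << hart;
	}
}

/* Harts that never got an IPI and so would keep spinning in the wait loop. */
uint32_t stuck_harts(unsigned int boot_hart)
{
	uint32_t all = (1u << NR_HARTS) - 1u;

	return all & ~ipi_sent & ~(1u << boot_hart);
}
```

For example, if harts 5, 6 and 7 register, the main hart (4 in this
model) calls send_ipi_many_model(4), and hart 1 registers only
afterwards, then stuck_harts(4) still reports harts 0 to 3, including
the late hart 1. Registering after the send has no effect.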

> I think the mechanism of available_harts does not provide a method
> that guarantees the success of the SMP system.
> Maybe we shall think of a better way for the SMP booting or just
> remove it ?

I haven't experienced any unexplained problem with hart_lottery or
available_harts_lock unless:

1) harts are started non-simultaneously
2) SPL/U-Boot runs from some kind of TCM, OCRAM, etc. that is not
cleared on reset, which leaves available_harts dirty
3) something is wrong with atomics
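
As an illustration of case 3, the sketch below replays what happens to
a shared bitmask when two harts update it without a working atomic or
lock. It is a deterministic model under my own assumptions, not U-Boot
code: register_two_harts and its interleaved flag are hypothetical, and
the "interleaving" is just a fixed order of load and store steps.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: two harts each set their own bit in a shared mask.
 * With interleaved != 0 both harts load the old value before either one
 * stores, which is what a broken atomic or lock would permit.
 */
uint32_t register_two_harts(unsigned int a, unsigned int b, int interleaved)
{
	uint32_t mask = 0;

	assert(a < 32 && b < 32);

	if (interleaved) {
		uint32_t seen_by_a = mask;	/* hart a loads */
		uint32_t seen_by_b = mask;	/* hart b loads the same stale value */

		mask = seen_by_a | (1u << a);	/* hart a stores */
		mask = seen_by_b | (1u << b);	/* hart b overwrites, dropping a's bit */
	} else {
		mask |= 1u << a;		/* hart a: full read-modify-write */
		mask |= 1u << b;		/* hart b: sees a's bit already set */
	}

	return mask;
}
```

With harts 1 and 2, the serialized order yields 0x6 (both bits set),
while the interleaved order yields 0x4: hart 1's registration is lost,
so it would look unavailable even though it did try to register.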

Also there might be something wrong with IPI send/receive.

> 
> Thread 8 hit Breakpoint 1, harts_early_init ()
> 
> (gdb) c
> Continuing.
> [Switching to Thread 7]
> 
> Thread 7 hit Breakpoint 1, harts_early_init ()
> 
> (gdb)
> Continuing.
> [Switching to Thread 6]
> 
> Thread 6 hit Breakpoint 1, harts_early_init ()
> 
> (gdb)
> Continuing.
> [Switching to Thread 5]
> 
> Thread 5 hit Breakpoint 1, harts_early_init ()
> 
> (gdb)
> Continuing.
> [Switching to Thread 4]
> 
> Thread 4 hit Breakpoint 1, harts_early_init ()
> 
> (gdb)
> Continuing.
> [Switching to Thread 3]
> 
> Thread 3 hit Breakpoint 1, harts_early_init ()
> (gdb)
> Continuing.
> [Switching to Thread 2]
> 
> Thread 2 hit Breakpoint 1, harts_early_init ()
> (gdb)
> Continuing.
> [Switching to Thread 1]
> 
> Thread 1 hit Breakpoint 1, harts_early_init ()
> (gdb)
> Continuing.
> [Switching to Thread 5]
> 
> 
> Thread 5 hit Breakpoint 3, 0x0000000001200000 in ?? ()
> (gdb) info threads
>   Id   Target Id         Frame
>   1    Thread 1 (hart 1) secondary_hart_loop () at arch/riscv/cpu/start.S:436
>   2    Thread 2 (hart 2) secondary_hart_loop () at arch/riscv/cpu/start.S:436
>   3    Thread 3 (hart 3) secondary_hart_loop () at arch/riscv/cpu/start.S:436
>   4    Thread 4 (hart 4) secondary_hart_loop () at arch/riscv/cpu/start.S:436
> * 5    Thread 5 (hart 5) 0x0000000001200000 in ?? ()
>   6    Thread 6 (hart 6) 0x000000000000b650 in ?? ()
>   7    Thread 7 (hart 7) 0x000000000000b650 in ?? ()
>   8    Thread 8 (hart 8) 0x0000000000005fa0 in ?? ()
> (gdb) c
> Continuing.

Do all the "offline" harts remain in the SPL/U-Boot secondary_hart_loop?

> 
> 
> 
> [    0.175619] smp: Bringing up secondary CPUs ...
> [    1.230474] CPU1: failed to come online
> [    2.282349] CPU2: failed to come online
> [    3.334394] CPU3: failed to come online
> [    4.386783] CPU4: failed to come online
> [    4.427829] smp: Brought up 1 node, 4 CPUs
> 
> 
> /root # cat /proc/cpuinfo
> processor       : 0
> hart            : 4
> isa     : rv64i2p0m2p0a2p0c2p0xv5-1p1
> mmu             : sv39
> 
> processor       : 5
> hart            : 5
> isa     : rv64i2p0m2p0a2p0c2p0xv5-1p1
> mmu             : sv39
> 
> processor       : 6
> hart            : 6
> isa     : rv64i2p0m2p0a2p0c2p0xv5-1p1
> mmu             : sv39
> 
> processor       : 7
> hart            : 7
> isa     : rv64i2p0m2p0a2p0c2p0xv5-1p1
> mmu             : sv39
> 
> /root #
> 
> Thanks,
> Rick


