Possible race condition in aspeed ast2600 smp boot on TCG QEMU

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* Possible race condition in aspeed ast2600 smp boot on TCG QEMU
@ 2024-01-11 17:38 Stephen Longfield
  2024-01-11 20:49 ` Stephen Longfield
  2024-01-12 16:33 ` Cédric Le Goater
  0 siblings, 2 replies; 5+ messages in thread
From: Stephen Longfield @ 2024-01-11 17:38 UTC (permalink / raw)
  To: qemu-devel, Cédric Le Goater, peter.maydell, andrew, joel
  Cc: Joe Komlodi, Patrick Venture

[-- Attachment #1: Type: text/plain, Size: 3371 bytes --]

We’ve noticed inconsistent behavior when running a large number of aspeed
ast2600 executions, that seems to be tied to a race condition in the smp
boot when executing on TCG-QEMU, and were wondering what a good mediation
strategy might be.

The problem first shows up as part of SMP boot. On a run that’s likely to
later run into issues, we’ll see something like:

```
[    0.008350] smp: Bringing up secondary CPUs ...
[    1.168584] CPU1: failed to come online
[    1.187277] smp: Brought up 1 node, 1 CPU
```

Compared to the more likely to succeed:

```
[    0.080313] smp: Bringing up secondary CPUs ...
[    0.093166] smp: Brought up 1 node, 2 CPUs
[    0.093345] SMP: Total of 2 processors activated (4800.00 BogoMIPS).
```

It’s somewhat reliably reproducible by running the ast2600-evb with an
OpenBMC image, using ‘-icount auto’ to slow execution and make the race
condition more frequent (it happens without this, just easier to debug if
we can reproduce):

```
./aarch64-softmmu/qemu-system-aarch64 -machine ast2600-evb -nographic
-drive file=~/bmc-bin/image-obmc-ast2600,if=mtd,bus=0,unit=0,snapshot=on
-nic user -icount auto
```

Our current hypothesis is that the problem comes up in the platform uboot.
As part of the boot, the secondary core waits for the smp mailbox to get a
magic number written by the primary core:

https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-v2019.04/arch/arm/mach-aspeed/ast2600/platform.S#L168

However, this memory address is cleared on boot:

https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-v2019.04/arch/arm/mach-aspeed/ast2600/platform.S#L146

The race condition occurs if the primary core runs far ahead of the
secondary core: if the primary core gets to the point where it signals the
secondary core’s mailbox before the secondary core gets past the point
where it does the initial reset and starts waiting, the reset will clear
the signal, and then the secondary core will never get past the point where
it’s looping in `poll_smp_mbox_ready`.

We’ve observed this race happening by dumping all SCU reads and writes, and
validated that this is the problem by using a modified `platform.S` that
doesn’t clear the =SCU_SMP_READY mailbox on reset, but would rather not
have to use a modified version of SMP boot just for QEMU-TCG execution.

Is there a way to have QEMU insert a barrier synchronization at some point
in the bootloader?  I think getting both cores past the =SCU_SMP_READY
reset would get rid of this race, but I’m not aware of a way to do that
kind of thing in QEMU-TCG.

Thanks for any insights!

--Stephen

---

P.S. Additional note about the aspeed platform.S:

Clearing the mailbox was added in this patch:

https://github.com/AspeedTech-BMC/u-boot/commit/55825c55d1dabc00e37999a38495ed05c901bec2

At the time, the write to what was then known as
`=AST_SMP_MBOX_FIELD_READY` (now `=SCU_SMP_READY`) happened after
`scu_unlock`.  But, when the boot flow was revised in

https://github.com/AspeedTech-BMC/u-boot/commit/46a48bbe56c1e790c9bd1794364db86ec609c48e
the scu_unlock was moved to primary core boot, so, unless the primary core
wins the race, it doesn’t seem like the mailbox ready clear actually will
have any effect, since it’ll be writing while the SCU is locked.

[-- Attachment #2: Type: text/html, Size: 3949 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Possible race condition in aspeed ast2600 smp boot on TCG QEMU
  2024-01-11 17:38 Possible race condition in aspeed ast2600 smp boot on TCG QEMU Stephen Longfield
@ 2024-01-11 20:49 ` Stephen Longfield
  2024-01-12 16:33 ` Cédric Le Goater
  1 sibling, 0 replies; 5+ messages in thread
From: Stephen Longfield @ 2024-01-11 20:49 UTC (permalink / raw)
  To: qemu-devel, Cédric Le Goater, peter.maydell, andrew, joel
  Cc: Joe Komlodi, Patrick Venture

[-- Attachment #1: Type: text/plain, Size: 3979 bytes --]

Sort of resolved: We were able to find a good-enough workaround. In case
anyone else is running into this, here's what we did:

By dropping to the uboot console and running the command

```
mw.l 0x1e6e2188 0xbabecafe
```

The magic number is set in the SCU regardless of how the race goes, and the
2nd core gets released from its mailbox polling loop.

On Thu, Jan 11, 2024 at 9:38 AM Stephen Longfield <slongfield@google.com>
wrote:

> We’ve noticed inconsistent behavior when running a large number of aspeed
> ast2600 executions, that seems to be tied to a race condition in the smp
> boot when executing on TCG-QEMU, and were wondering what a good mediation
> strategy might be.
>
> The problem first shows up as part of SMP boot. On a run that’s likely to
> later run into issues, we’ll see something like:
>
> ```
> [    0.008350] smp: Bringing up secondary CPUs ...
> [    1.168584] CPU1: failed to come online
> [    1.187277] smp: Brought up 1 node, 1 CPU
> ```
>
> Compared to the more likely to succeed:
>
> ```
> [    0.080313] smp: Bringing up secondary CPUs ...
> [    0.093166] smp: Brought up 1 node, 2 CPUs
> [    0.093345] SMP: Total of 2 processors activated (4800.00 BogoMIPS).
> ```
>
> It’s somewhat reliably reproducible by running the ast2600-evb with an
> OpenBMC image, using ‘-icount auto’ to slow execution and make the race
> condition more frequent (it happens without this, just easier to debug if
> we can reproduce):
>
>
> ```
> ./aarch64-softmmu/qemu-system-aarch64 -machine ast2600-evb -nographic
> -drive file=~/bmc-bin/image-obmc-ast2600,if=mtd,bus=0,unit=0,snapshot=on
> -nic user -icount auto
> ```
>
> Our current hypothesis is that the problem comes up in the platform
> uboot.  As part of the boot, the secondary core waits for the smp mailbox
> to get a magic number written by the primary core:
>
>
> https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-v2019.04/arch/arm/mach-aspeed/ast2600/platform.S#L168
>
> However, this memory address is cleared on boot:
>
>
> https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-v2019.04/arch/arm/mach-aspeed/ast2600/platform.S#L146
>
> The race condition occurs if the primary core runs far ahead of the
> secondary core: if the primary core gets to the point where it signals the
> secondary core’s mailbox before the secondary core gets past the point
> where it does the initial reset and starts waiting, the reset will clear
> the signal, and then the secondary core will never get past the point where
> it’s looping in `poll_smp_mbox_ready`.
>
> We’ve observed this race happening by dumping all SCU reads and writes,
> and validated that this is the problem by using a modified `platform.S`
> that doesn’t clear the =SCU_SMP_READY mailbox on reset, but would rather
> not have to use a modified version of SMP boot just for QEMU-TCG execution.
>
> Is there a way to have QEMU insert a barrier synchronization at some point
> in the bootloader?  I think getting both cores past the =SCU_SMP_READY
> reset would get rid of this race, but I’m not aware of a way to do that
> kind of thing in QEMU-TCG.
>
> Thanks for any insights!
>
> --Stephen
>
> ---
>
> P.S. Additional note about the aspeed platform.S:
>
> Clearing the mailbox was added in this patch:
>
>
> https://github.com/AspeedTech-BMC/u-boot/commit/55825c55d1dabc00e37999a38495ed05c901bec2
>
> At the time, the write to what was then known as
> `=AST_SMP_MBOX_FIELD_READY` (now `=SCU_SMP_READY`) happened after
> `scu_unlock`.  But, when the boot flow was revised in
>
>
> https://github.com/AspeedTech-BMC/u-boot/commit/46a48bbe56c1e790c9bd1794364db86ec609c48e
> the scu_unlock was moved to primary core boot, so, unless the primary core
> wins the race, it doesn’t seem like the mailbox ready clear actually will
> have any effect, since it’ll be writing while the SCU is locked.
>

[-- Attachment #2: Type: text/html, Size: 4850 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Possible race condition in aspeed ast2600 smp boot on TCG QEMU
  2024-01-11 17:38 Possible race condition in aspeed ast2600 smp boot on TCG QEMU Stephen Longfield
  2024-01-11 20:49 ` Stephen Longfield
@ 2024-01-12 16:33 ` Cédric Le Goater
  2024-01-15  8:36   ` Troy Lee
  1 sibling, 1 reply; 5+ messages in thread
From: Cédric Le Goater @ 2024-01-12 16:33 UTC (permalink / raw)
  To: Stephen Longfield, qemu-devel, peter.maydell, andrew, joel
  Cc: Joe Komlodi, Patrick Venture, Ryan Chen, Jamin Lin, Troy Lee

Adding Aspeed Engineers. This reminds me of a discussion a while ago.

On 1/11/24 18:38, Stephen Longfield wrote:
> We’ve noticed inconsistent behavior when running a large number of aspeed ast2600 executions, that seems to be tied to a race condition in the smp boot when executing on TCG-QEMU, and were wondering what a good mediation strategy might be.
> 
> The problem first shows up as part of SMP boot. On a run that’s likely to later run into issues, we’ll see something like:
> 
> ```
> [    0.008350] smp: Bringing up secondary CPUs ...
> [    1.168584] CPU1: failed to come online
> [    1.187277] smp: Brought up 1 node, 1 CPU
> ```
> 
> Compared to the more likely to succeed:
> 
> ```
> [    0.080313] smp: Bringing up secondary CPUs ...
> [    0.093166] smp: Brought up 1 node, 2 CPUs
> [    0.093345] SMP: Total of 2 processors activated (4800.00 BogoMIPS).
> ```
> 
> It’s somewhat reliably reproducible by running the ast2600-evb with an OpenBMC image, using ‘-icount auto’ to slow execution and make the race condition more frequent (it happens without this, just easier to debug if we can reproduce):
> 
> 
> ```
> ./aarch64-softmmu/qemu-system-aarch64 -machine ast2600-evb -nographic -drive file=~/bmc-bin/image-obmc-ast2600,if=mtd,bus=0,unit=0,snapshot=on -nic user -icount auto
> ```
> 
> Our current hypothesis is that the problem comes up in the platform uboot.  As part of the boot, the secondary core waits for the smp mailbox to get a magic number written by the primary core:
> 
> https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-v2019.04/arch/arm/mach-aspeed/ast2600/platform.S#L168 <https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-v2019.04/arch/arm/mach-aspeed/ast2600/platform.S#L168>
> 
> However, this memory address is cleared on boot:
> 
> https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-v2019.04/arch/arm/mach-aspeed/ast2600/platform.S#L146 <https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-v2019.04/arch/arm/mach-aspeed/ast2600/platform.S#L146>
> 
> The race condition occurs if the primary core runs far ahead of the secondary core: if the primary core gets to the point where it signals the secondary core’s mailbox before the secondary core gets past the point where it does the initial reset and starts waiting, the reset will clear the signal, and then the secondary core will never get past the point where it’s looping in `poll_smp_mbox_ready`.
> 
> We’ve observed this race happening by dumping all SCU reads and writes, and validated that this is the problem by using a modified `platform.S` that doesn’t clear the =SCU_SMP_READY mailbox on reset, but would rather not have to use a modified version of SMP boot just for QEMU-TCG execution.

you could use '-trace aspeed_scu*' to collect the MMIO accesses on
the SCU unit. A TCG plugin also.

> Is there a way to have QEMU insert a barrier synchronization at some point in the bootloader?  I think getting both cores past the =SCU_SMP_READY reset would get rid of this race, but I’m not aware of a way to do that kind of thing in QEMU-TCG.
> 
> Thanks for any insights!

Could we change the default value to registers 0x180 ... 0x18C in
hw/misc/aspeed_scu.c to make sure the SMP regs are immune to the
race ?

Thanks,

C.






^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: Possible race condition in aspeed ast2600 smp boot on TCG QEMU
  2024-01-12 16:33 ` Cédric Le Goater
@ 2024-01-15  8:36   ` Troy Lee
  2024-01-16 17:52     ` Stephen Longfield
  0 siblings, 1 reply; 5+ messages in thread
From: Troy Lee @ 2024-01-15  8:36 UTC (permalink / raw)
  To: Cédric Le Goater, Stephen Longfield, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, andrew@codeconstruct.com.au,
	joel@jms.id.au
  Cc: Joe Komlodi, Patrick Venture, Ryan Chen, Jamin Lin

Hi Stephen and Cedric,

This issue haven't been found in real platform but sometime happens in
emulator, e.g. Simic. 

> Adding Aspeed Engineers. This reminds me of a discussion a while ago.
> 
> On 1/11/24 18:38, Stephen Longfield wrote:
> > We’ve noticed inconsistent behavior when running a large number of aspeed
> ast2600 executions, that seems to be tied to a race condition in the smp boot
> when executing on TCG-QEMU, and were wondering what a good mediation
> strategy might be.
> >
> > The problem first shows up as part of SMP boot. On a run that’s likely to
> later run into issues, we’ll see something like:
> >
> > ```
> > [    0.008350] smp: Bringing up secondary CPUs ...
> > [    1.168584] CPU1: failed to come online [    1.187277] smp: Brought
> > up 1 node, 1 CPU ```
> >
> > Compared to the more likely to succeed:
> >
> > ```
> > [    0.080313] smp: Bringing up secondary CPUs ...
> > [    0.093166] smp: Brought up 1 node, 2 CPUs [    0.093345] SMP:
> > Total of 2 processors activated (4800.00 BogoMIPS).
> > ```
> >
> > It’s somewhat reliably reproducible by running the ast2600-evb with an
> OpenBMC image, using ‘-icount auto’ to slow execution and make the race
> condition more frequent (it happens without this, just easier to debug if we
> can reproduce):
> >
> >
> > ```
> > ./aarch64-softmmu/qemu-system-aarch64 -machine ast2600-evb -
> nographic
> > -drive
> > file=~/bmc-bin/image-obmc-ast2600,if=mtd,bus=0,unit=0,snapshot=on -nic
> > user -icount auto ```

Have you try to run qemu with "-smp 2"?

> >
> > Our current hypothesis is that the problem comes up in the platform
> uboot.  As part of the boot, the secondary core waits for the smp mailbox to
> get a magic number written by the primary core:
> >
> > https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-
> v2019.04/a
> > rch/arm/mach-aspeed/ast2600/platform.S#L168
> > <https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-
> v2019.04/
> > arch/arm/mach-aspeed/ast2600/platform.S#L168>
> >
> > However, this memory address is cleared on boot:
> >
> > https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-
> v2019.04/a
> > rch/arm/mach-aspeed/ast2600/platform.S#L146
> > <https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-
> v2019.04/
> > arch/arm/mach-aspeed/ast2600/platform.S#L146>
> >
> > The race condition occurs if the primary core runs far ahead of the secondary
> core: if the primary core gets to the point where it signals the secondary core’s
> mailbox before the secondary core gets past the point where it does the initial
> reset and starts waiting, the reset will clear the signal, and then the secondary
> core will never get past the point where it’s looping in
> `poll_smp_mbox_ready`.
> >
> > We’ve observed this race happening by dumping all SCU reads and writes,
> and validated that this is the problem by using a modified `platform.S` that
> doesn’t clear the =SCU_SMP_READY mailbox on reset, but would rather not
> have to use a modified version of SMP boot just for QEMU-TCG execution.

To prevent the race condition described, SCU188 zeroization is conducted
as early as possible by both CPU#0 and CPU#1. After that, there are at 
least 100 instructions for CPU#0 to execute before it get the chance to
set SCU188 to 0xbabecafe. For real, parallel HW, it is unusual that CPU#1
will be slower than CPU#0 by 100 instruction cycles.

> 
> you could use '-trace aspeed_scu*' to collect the MMIO accesses on the SCU
> unit. A TCG plugin also.
> 
> > Is there a way to have QEMU insert a barrier synchronization at some point
> in the bootloader?  I think getting both cores past the =SCU_SMP_READY reset
> would get rid of this race, but I’m not aware of a way to do that kind of thing
> in QEMU-TCG.
> >
> > Thanks for any insights!
> 
> Could we change the default value to registers 0x180 ... 0x18C in
> hw/misc/aspeed_scu.c to make sure the SMP regs are immune to the race ?
> 
> Thanks,
> 
> C.

Thanks,
Troy Lee

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Possible race condition in aspeed ast2600 smp boot on TCG QEMU
  2024-01-15  8:36   ` Troy Lee
@ 2024-01-16 17:52     ` Stephen Longfield
  0 siblings, 0 replies; 5+ messages in thread
From: Stephen Longfield @ 2024-01-16 17:52 UTC (permalink / raw)
  To: Troy Lee
  Cc: Cédric Le Goater, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, andrew@codeconstruct.com.au,
	joel@jms.id.au, Joe Komlodi, Patrick Venture, Ryan Chen,
	Jamin Lin

On Mon, Jan 15, 2024 at 12:36 AM Troy Lee <troy_lee@aspeedtech.com> wrote:
>
> Hi Stephen and Cedric,
>
> This issue haven't been found in real platform but sometime happens in
> emulator, e.g. Simic.

Do you have a workaround that you use in other simulators that we
could also try using in QEMU?

> > Adding Aspeed Engineers. This reminds me of a discussion a while ago.
> >
> > On 1/11/24 18:38, Stephen Longfield wrote:
> > > We’ve noticed inconsistent behavior when running a large number of aspeed
> > ast2600 executions, that seems to be tied to a race condition in the smp boot
> > when executing on TCG-QEMU, and were wondering what a good mediation
> > strategy might be.
> > >
> > > The problem first shows up as part of SMP boot. On a run that’s likely to
> > later run into issues, we’ll see something like:
> > >
> > > ```
> > > [    0.008350] smp: Bringing up secondary CPUs ...
> > > [    1.168584] CPU1: failed to come online [    1.187277] smp: Brought
> > > up 1 node, 1 CPU ```
> > >
> > > Compared to the more likely to succeed:
> > >
> > > ```
> > > [    0.080313] smp: Bringing up secondary CPUs ...
> > > [    0.093166] smp: Brought up 1 node, 2 CPUs [    0.093345] SMP:
> > > Total of 2 processors activated (4800.00 BogoMIPS).
> > > ```
> > >
> > > It’s somewhat reliably reproducible by running the ast2600-evb with an
> > OpenBMC image, using ‘-icount auto’ to slow execution and make the race
> > condition more frequent (it happens without this, just easier to debug if we
> > can reproduce):
> > >
> > >
> > > ```
> > > ./aarch64-softmmu/qemu-system-aarch64 -machine ast2600-evb -
> > nographic
> > > -drive
> > > file=~/bmc-bin/image-obmc-ast2600,if=mtd,bus=0,unit=0,snapshot=on -nic
> > > user -icount auto ```
>
> Have you try to run qemu with "-smp 2"?

`-smp 2` lowers the probability, but does not get rid of the race.

> > >
> > > Our current hypothesis is that the problem comes up in the platform
> > uboot.  As part of the boot, the secondary core waits for the smp mailbox to
> > get a magic number written by the primary core:
> > >
> > > https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-
> > v2019.04/a
> > > rch/arm/mach-aspeed/ast2600/platform.S#L168
> > > <https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-
> > v2019.04/
> > > arch/arm/mach-aspeed/ast2600/platform.S#L168>
> > >
> > > However, this memory address is cleared on boot:
> > >
> > > https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-
> > v2019.04/a
> > > rch/arm/mach-aspeed/ast2600/platform.S#L146
> > > <https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-
> > v2019.04/
> > > arch/arm/mach-aspeed/ast2600/platform.S#L146>
> > >
> > > The race condition occurs if the primary core runs far ahead of the secondary
> > core: if the primary core gets to the point where it signals the secondary core’s
> > mailbox before the secondary core gets past the point where it does the initial
> > reset and starts waiting, the reset will clear the signal, and then the secondary
> > core will never get past the point where it’s looping in
> > `poll_smp_mbox_ready`.
> > >
> > > We’ve observed this race happening by dumping all SCU reads and writes,
> > and validated that this is the problem by using a modified `platform.S` that
> > doesn’t clear the =SCU_SMP_READY mailbox on reset, but would rather not
> > have to use a modified version of SMP boot just for QEMU-TCG execution.
>
> To prevent the race condition described, SCU188 zeroization is conducted
> as early as possible by both CPU#0 and CPU#1. After that, there are at
> least 100 instructions for CPU#0 to execute before it get the chance to
> set SCU188 to 0xbabecafe. For real, parallel HW, it is unusual that CPU#1
> will be slower than CPU#0 by 100 instruction cycles.
>

This doesn't really prevent the race, it just lowers the probability
of it causing problems.

If in hardware, the cores begin execution at the same time, then in
normal execution of Platform.S, the SCU188 zeroing will happen before
the scu_unlock macro runs. If I understand the SCU lock correctly:
that would mean that the store-zero to SCU188 normally has no effect
(since the registers are read only at that time), and only really
participates in this race. Since you mentioned seeing this same race
in other simulation environments, have you considered removing that
write?

> >
> > you could use '-trace aspeed_scu*' to collect the MMIO accesses on the SCU
> > unit. A TCG plugin also.
> >
> > > Is there a way to have QEMU insert a barrier synchronization at some point
> > in the bootloader?  I think getting both cores past the =SCU_SMP_READY reset
> > would get rid of this race, but I’m not aware of a way to do that kind of thing
> > in QEMU-TCG.
> > >
> > > Thanks for any insights!
> >
> > Could we change the default value to registers 0x180 ... 0x18C in
> > hw/misc/aspeed_scu.c to make sure the SMP regs are immune to the race ?

I don't believe that changing the default values would help. The race
occurs when the secondary core clears data that the primary core
writes--this would still happen even if the default value is what it
eventually would get written to.

We did find a workaround, sent to this list on Thursday (2024-01-11):
setting the magic value of the register at the uboot console (`mw.l
0x1e6e2188 0xbabecafe`). Since that executes after platform.S, but
before Linux boot, it's able to unwedge the secondary core and avoid
later issues.

> > Thanks,
> >
> > C.
>
> Thanks,
> Troy Lee


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-01-16 17:53 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-11 17:38 Possible race condition in aspeed ast2600 smp boot on TCG QEMU Stephen Longfield
2024-01-11 20:49 ` Stephen Longfield
2024-01-12 16:33 ` Cédric Le Goater
2024-01-15  8:36   ` Troy Lee
2024-01-16 17:52     ` Stephen Longfield

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).