From: "Cédric Le Goater" <clg@kaod.org>
To: Frederic Barrat <fbarrat@linux.ibm.com>, <danielhb413@gmail.com>,
	<qemu-ppc@nongnu.org>, <qemu-devel@nongnu.org>
Subject: Re: [PATCH v2] target/ppc: cpu_init: Clean up stop state on cpu reset
Date: Sat, 18 Jun 2022 13:35:38 +0200
Message-ID: <c84bcd17-87d4-053c-5c7b-72e2e420288e@kaod.org>
In-Reply-To: <20220617095222.612212-1-fbarrat@linux.ibm.com>

On 6/17/22 11:52, Frederic Barrat wrote:
> The 'resume_as_sreset' attribute of a cpu is set when a thread is
> entering a stop state on ppc books. It causes the thread to be
> re-routed to vector 0x100 when woken up by an exception. So it must be
> cleared on reset or a thread might be re-routed unexpectedly after a
> reset, when it was not in a stop state and/or when the appropriate
> exception handler isn't set up yet.
> 
> Using skiboot, it can be tested by resetting the system when it is
> quiet and most threads are idle and in stop state.
> 
> After the reset occurs, skiboot elects a primary thread and all the
> others wait in secondary_wait. The primary thread does all the system
> initialization from main_cpu_entry() and at some point, the
> decrementer interrupt starts ticking. The exception vector for the
> decrementer interrupt is in place, so that shouldn't be a
> problem. However, if that primary thread was in stop state prior to
> the reset, and because the resume_as_sreset parameter is still set,
> it is re-routed to exception vector 0x100, which, at that time, is
> still defined as the entry point for BML. So that primary thread
> restarts as new and ends up being treated like any other secondary
> thread. All threads are now waiting in secondary_wait.
> 
> It results in a full system hang with no message on the console, as
> the uart hasn't been init'ed yet. It's actually not obvious what's
> happening unless you trace resets (-d cpu_reset). The fix is
> simply to clear the 'resume_as_sreset' attribute on reset.
> 
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---
> Changelog:
> v2: rework commit message


Nice! This has been a long-standing bug. I chased it for weeks.
I was reproducing it with intensive I/O, doing an scp on an emulated
PowerNV machine. It hung after a while (unless using powersave=off).

Now, with this patch, a QEMU PowerNV POWER9 machine (SMP) running
Linux 5.18 sustains the load:

   $ scp ./ubuntu-22.04-ppc64le.qcow2 root@vm103:/dev/null
   root@vm103's password:
   ubuntu-22.04-ppc64le.qcow2                    100% 8581MB   5.8MB/s   24:39

Quite a few interrupts:

   # grep PNV-PCI-MSI  /proc/interrupts
    51:          9          0  PNV-PCI-MSI 403177472 Edge      nvme0q0
    52:          2          0  PNV-PCI-MSI 403177473 Edge      nvme0q1
    53:          0          0  PNV-PCI-MSI 403177474 Edge      nvme0q2
    54:    3427556          0  PNV-PCI-MSI 135315456 Edge      eth0-rx-0
    55:          0    4261742  PNV-PCI-MSI 135315457 Edge      eth0-tx-0
    56:          1          0  PNV-PCI-MSI 135315458 Edge      eth0
    57:          0         71  PNV-PCI-MSI 135299072 Edge      xhci_hcd
    58:          0          0  PNV-PCI-MSI 135299073 Edge      xhci_hcd
    59:          0          0  PNV-PCI-MSI 135299074 Edge      xhci_hcd


It would be nice to explain what you did to corner the issue. It would
help other people chasing similar bugs in QEMU or in the kernel.

Thanks,

C.





> 
> 
>   target/ppc/cpu_init.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> index 0f891afa04..c16cb8dbe7 100644
> --- a/target/ppc/cpu_init.c
> +++ b/target/ppc/cpu_init.c
> @@ -7186,6 +7186,9 @@ static void ppc_cpu_reset(DeviceState *dev)
>           }
>           pmu_update_summaries(env);
>       }
> +
> +    /* clean any pending stop state */
> +    env->resume_as_sreset = 0;
>   #endif
>       hreg_compute_hflags(env);
>       env->reserve_addr = (target_ulong)-1ULL;



