From: David Hildenbrand <david@redhat.com>
To: Keqian Zhu <zhukeqian1@huawei.com>,
qemu-devel@nongnu.org, Peter Maydell <peter.maydell@linaro.org>,
Igor Mammedov <imammedo@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>
Cc: wanghaibin.wang@huawei.com, Zenghui Yu <yuzenghui@huawei.com>,
jiangkunkun@huawei.com, salil.mehta@huawei.com
Subject: Re: [PATCH v1 1/2] system/cpus: Fix pause_all_vcpus() under concurrent environment
Date: Mon, 18 Mar 2024 11:10:34 +0100 [thread overview]
Message-ID: <4d7795bb-1dfa-40e7-a98e-4c0bafdf3db0@redhat.com> (raw)
In-Reply-To: <20240317083704.23244-2-zhukeqian1@huawei.com>
On 17.03.24 09:37, Keqian Zhu via wrote:
> Both main loop thread and vCPU thread are allowed to call
> pause_all_vcpus(), and in general resume_all_vcpus() is called
> after it. Two issues live in pause_all_vcpus():
In general, calling pause_all_vcpus() from VCPU threads is quite dangerous.
Do we have reproducers for the cases below?
>
> 1. There is possibility that during thread T1 waits on
> qemu_pause_cond with bql unlocked, other thread has called
> pause_all_vcpus() and resume_all_vcpus(), then thread T1 will
> stuck, because the condition all_vcpus_paused() is always false.
How can this happen?
Two threads calling pause_all_vcpus() is borderline broken, as you note.
IIRC, we should call pause_all_vcpus() only if some other mechanism
prevents these races. For example, based on runstate changes.
Just imagine one thread calling pause_all_vcpus() while another one
calls resume_all_vcpus(). It cannot possibly work.
>
> 2. After all_vcpus_paused() has been checked as true, we will
> unlock bql to relock replay_mutex. During the bql was unlocked,
> the vcpu's state may has been changed by other thread, so we
> must retry.
>
> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> ---
> system/cpus.c | 29 ++++++++++++++++++++++++-----
> 1 file changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/system/cpus.c b/system/cpus.c
> index 68d161d96b..4e41abe23e 100644
> --- a/system/cpus.c
> +++ b/system/cpus.c
> @@ -571,12 +571,14 @@ static bool all_vcpus_paused(void)
> return true;
> }
>
> -void pause_all_vcpus(void)
> +static void request_pause_all_vcpus(void)
> {
> CPUState *cpu;
>
> - qemu_clock_enable(QEMU_CLOCK_VIRTUAL, false);
> CPU_FOREACH(cpu) {
> + if (cpu->stopped) {
> + continue;
> + }
> if (qemu_cpu_is_self(cpu)) {
> qemu_cpu_stop(cpu, true);
> } else {
> @@ -584,6 +586,14 @@ void pause_all_vcpus(void)
> qemu_cpu_kick(cpu);
> }
> }
> +}
> +
> +void pause_all_vcpus(void)
> +{
> + qemu_clock_enable(QEMU_CLOCK_VIRTUAL, false);
> +
> +retry:
> + request_pause_all_vcpus();
>
> /* We need to drop the replay_lock so any vCPU threads woken up
> * can finish their replay tasks
> @@ -592,14 +602,23 @@ void pause_all_vcpus(void)
>
> while (!all_vcpus_paused()) {
> qemu_cond_wait(&qemu_pause_cond, &bql);
> - CPU_FOREACH(cpu) {
> - qemu_cpu_kick(cpu);
> - }
> + /* During we waited on qemu_pause_cond the bql was unlocked,
> + * the vcpu's state may has been changed by other thread, so
> + * we must request the pause state on all vcpus again.
> + */
> + request_pause_all_vcpus();
> }
>
> bql_unlock();
> replay_mutex_lock();
> bql_lock();
> +
> + /* During the bql was unlocked, the vcpu's state may has been
> + * changed by other thread, so we must retry.
> + */
> + if (!all_vcpus_paused()) {
> + goto retry;
> + }
> }
>
> void cpu_resume(CPUState *cpu)
--
Cheers,
David / dhildenb
next prev parent reply other threads:[~2024-03-18 10:11 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-17 8:37 [PATCH v1 0/2] Some fixes for pause and resume all vcpus Keqian Zhu via
2024-03-17 8:37 ` [PATCH v1 1/2] system/cpus: Fix pause_all_vcpus() under concurrent environment Keqian Zhu via
2024-03-18 10:10 ` David Hildenbrand [this message]
2024-03-19 5:06 ` 答复: " zhukeqian via
2024-03-19 9:24 ` David Hildenbrand
2024-03-19 13:23 ` David Hildenbrand
2024-03-19 14:23 ` Peter Maydell
2024-03-19 14:46 ` David Hildenbrand
2024-03-19 14:56 ` Peter Maydell
2024-03-17 8:37 ` [PATCH v1 2/2] system/cpus: Fix resume_all_vcpus() under vCPU hotplug condition Keqian Zhu via
2024-03-18 10:14 ` David Hildenbrand
2024-03-19 5:11 ` 答复: " zhukeqian via
2024-03-19 9:25 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4d7795bb-1dfa-40e7-a98e-4c0bafdf3db0@redhat.com \
--to=david@redhat.com \
--cc=imammedo@redhat.com \
--cc=jiangkunkun@huawei.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=salil.mehta@huawei.com \
--cc=stefanha@redhat.com \
--cc=wanghaibin.wang@huawei.com \
--cc=yuzenghui@huawei.com \
--cc=zhukeqian1@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).