From: Harsh Prateek Bora <harshpb@linux.ibm.com>
To: Ani Sinha <anisinha@redhat.com>
Cc: qemu-devel <qemu-devel@nongnu.org>,
qemu-ppc@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
npiggin@gmail.com, misanjum@linux.ibm.com, gautam@linux.ibm.com,
Peter Maydell <peter.maydell@linaro.org>
Subject: Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec
Date: Fri, 10 Apr 2026 13:48:08 +0530 [thread overview]
Message-ID: <424743ec-a34d-4af0-adfb-c8392ee5e5be@linux.ibm.com> (raw)
In-Reply-To: <451942FA-0056-466C-AD42-AB0BBE88472E@redhat.com>
On 10/04/26 12:05 pm, Ani Sinha wrote:
>
>
>> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>
>> Hi Ani,
>>
>> On 10/04/26 9:12 am, Ani Sinha wrote:
>>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>>>
>>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()
>>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).
>>>> This caused a lock imbalance where the vCPU thread would loop back to
>>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.
>>> I am not sure I understand this. It seems kvm_cpu_exec() expects the caller to hold the bql before calling it. Where is the lock imbalance?
>>
>> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.
>>
>> Normal execution path:
>>
>> int kvm_cpu_exec(CPUState *cpu)
>> {
>>     // BQL held on entry (from caller)
>>
>>     if (kvm_arch_process_async_events(cpu)) {
>>         return EXCP_HLT;  // ← Returns with BQL STILL HELD
>>     }
>>
>>     bql_unlock();         // ← Normal path unlocks here
>>     // ... KVM execution loop ...
>>     bql_lock();           // ← Re-acquires before returning
>>     return ret;
>> }
>
> Yes, the semantics of kvm_cpu_exec() are that it should always return with the bql in a locked state. This is because the caller, kvm_vcpu_thread_fn(), calls this function with the bql locked, and if you look at the end of kvm_vcpu_thread_fn(), it releases the lock.
>
> So if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. This makes the locking and unlocking symmetric.
>
>
>>
>> The lock imbalance:
>>
>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.
>
> Why should it unlock it before returning? In fact it’s the opposite: if the function had unlocked the bql, it should lock it again before returning.
>
>> This means the early return path keeps the BQL held,
>
> This would be the correct thing to do.
>
>> while the normal execution path releases and re-acquires it.
>
> Because the functions it calls after unlocking require the bql to be unlocked. Since it had to unlock it, it locks it again before returning.
It had to unlock it for the same reason: to give other threads a chance
to take the lock. We need to handle the failure/exception cases for the
same purpose as well.
>
>> The caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() again, but now the BQL is already held from the previous iteration.
>> This creates a situation where the BQL is never released between iterations when EXCP_HLT is returned.
>>
>> Why this matters:
>> On PowerPC pseries with halted secondary vCPUs (start-powered-off=true), these vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration accumulates BQL holds, preventing other threads (including CPU 0) from making progress.
>
> This seems like some kind of architectural issue with PowerPC. Shouldn’t qemu_process_cpu_events() -> qemu_cond_wait(cpu->halt_cond, &bql) block other secondary cpus? Then the main cpu does a qemu_cpu_kick() to make them active again at some point?
KVM vCPUs need to enter the kernel to handle the halted state and can
therefore run. On spapr, this is handled via the start-cpu RTAS call,
whose handler in QEMU does a qemu_cpu_kick(). However, CPU 0 needs to be
able to proceed before that stage is reached, yet it hangs while trying
to acquire the BQL in qemu_default_main(), while the secondary vCPU
spins with the BQL held, returning EXCP_HLT. This causes a deadlock.
>
>>
>>>>
>>>> The issue manifests as boot hangs on PowerPC pseries machines with multiple
>>>> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and
>>>> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held
>>>> the BQL, preventing other operations from proceeding.
>>>>
>>>> The fix has two parts:
>>>>
>>>> 1. In kvm_cpu_exec() (kvm-all.c):
>>>> Release the BQL before returning EXCP_HLT in the early return path,
>>>> matching the behavior of the normal execution path where bql_unlock()
>>>> is called before entering the main KVM execution loop.
>>>>
>>>> 2. In kvm_vcpu_thread_fn() (kvm-accel-ops.c):
>>>> Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the
>>>> loop expects to hold the BQL when calling kvm_cpu_exec() again.
>>>>
>>>> This ensures proper BQL lock/unlock pairing:
>>>> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()
>>>> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)
>>>> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned
>>>> - Next iteration has BQL held as expected
>>>>
>>>> This is a regression introduced by commit 98884e0cc1 ("accel/kvm: add
>>>> changes required to support KVM VM file descriptor change") which
>>>> refactored kvm_irqchip_create() and changed the initialization timing,
>>>> exposing this lock imbalance issue.
>>>>
>>>> Fixes: 98884e0cc1 ("accel/kvm: add changes required to support KVM VM file descriptor change")
>>> I do not think this is the right reference. The above commit may have exposed some underlying issue but is certainly not the cause of it. Further, as we have discussed in the other thread, the changes in that commit are not even getting executed.
>>> Personally I think the core issue is somewhere else. I am not convinced this is the proper fix.
>>
>> Regarding commit 98884e0cc1:
>>
>> Reverting the kvm_irqchip_create refactoring makes the problem go away. This commit may have changed timing that exposed the issue, but the root cause is the pre-existing BQL lock imbalance in kvm_cpu_exec(). We can either:
>>
>> Remove the "Fixes:" tag entirely, or
>
> Let's remove the Fixes: tag entirely unless you can pinpoint the exact commit that introduced the architectural issue.
>
>> Add a note that this is a pre-existing issue exposed by timing changes
>> The core fix (releasing BQL before returning EXCP_HLT) is correct and addresses the actual deadlock mechanism.
>>
>>>> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>
>>>> Reported-by: Gautam Menghani <gautam@linux.ibm.com>
>>>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>>>> ---
>>>> accel/kvm/kvm-accel-ops.c | 4 ++++
>>>> accel/kvm/kvm-all.c | 1 +
>>>> 2 files changed, 5 insertions(+)
>>>>
>>>> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c
>>>> index 6d9140e549..d684fd0840 100644
>>>> --- a/accel/kvm/kvm-accel-ops.c
>>>> +++ b/accel/kvm/kvm-accel-ops.c
>>>> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)
>>>>
>>>> if (cpu_can_run(cpu)) {
>>>> r = kvm_cpu_exec(cpu);
>>>> + if (r == EXCP_HLT) {
>>>> + /* kvm_cpu_exec() released BQL, re-acquire for next iteration */
>>>> + bql_lock();
>>>> + }
>>>> if (r == EXCP_DEBUG) {
>>>> cpu_handle_guest_debug(cpu);
>>>> }
>>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>>> index 774499d34f..00b8018664 100644
>>>> --- a/accel/kvm/kvm-all.c
>>>> +++ b/accel/kvm/kvm-all.c
>>>> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)
>>>> trace_kvm_cpu_exec();
>>>>
>>>> if (kvm_arch_process_async_events(cpu)) {
>>>> + bql_unlock();
>>>> return EXCP_HLT;
>>>> }
>>>>
>>>> --
>>>> 2.52.0
>
>
>