All of lore.kernel.org
 help / color / mirror / Atom feed
From: Harsh Prateek Bora <harshpb@linux.ibm.com>
To: Ani Sinha <anisinha@redhat.com>
Cc: qemu-devel <qemu-devel@nongnu.org>,
	qemu-ppc@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
	npiggin@gmail.com, misanjum@linux.ibm.com, gautam@linux.ibm.com,
	Peter Maydell <peter.maydell@linaro.org>
Subject: Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec
Date: Fri, 10 Apr 2026 13:48:08 +0530	[thread overview]
Message-ID: <424743ec-a34d-4af0-adfb-c8392ee5e5be@linux.ibm.com> (raw)
In-Reply-To: <451942FA-0056-466C-AD42-AB0BBE88472E@redhat.com>



On 10/04/26 12:05 pm, Ani Sinha wrote:
> 
> 
>> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>
>> Hi Ani,
>>
>> On 10/04/26 9:12 am, Ani Sinha wrote:
>>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>>>
>>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()
>>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).
>>>> This caused a lock imbalance where the vCPU thread would loop back to
>>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.
>>> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?
>>
>> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.
>>
>> Normal execution path:
>>
>> int kvm_cpu_exec(CPUState *cpu)
>> {
>>     // BQL held on entry (from caller)
>>
>>     if (kvm_arch_process_async_events(cpu)) {
>>         return EXCP_HLT;  // ← Returns with BQL STILL HELD
>>     }
>>
>>     bql_unlock();  // ← Normal path unlocks here
>>     // ... KVM execution loop ...
>>     bql_lock();    // ← Re-acquires before returning
>>     return ret;
>> }
> 
> Yes the semantics of the function kvm_cpu_exec() is that it should always return with bql in locked state. This is because the caller kvm_vcpu_thread_fn() calls this function with bql locked and if you see the end of kvm_vcpu_thread_fn(), it releases the lock.
> 
> So if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. This makes the locking and unlocking symmetric.
> 
> 
>>
>> The lock imbalance:
>>
>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.
> 
> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.
> 
>> This means the early return path keeps the BQL held,
> 
> This would be the correct thing to do.
> 
>> while the normal execution path releases and re-acquires it.
> 
> Because the functions it calls after unlocking requires bql to be unlocked. Since it had to unlock it, it locks it again before returning.

It had to unlock it for the same reason - to give others a chance to 
lock. We need to handle failure/exception cases for the same purpose as 
well.

> 
>> The caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() again, but now the BQL is already held from the previous iteration
>> This creates a situation where the BQL is never released between iterations when EXCP_HLT is returned.
>>
>> Why this matters:
>> On PowerPC pseries with halted secondary vCPUs (start-powered-off=true), these vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration accumulates BQL holds, preventing other threads (including CPU 0) from making progress.
> 
> This seems like some kind of architectural issue with PowerPC. Shouldn’t qemu_process_cpu_events() -> qemu_cond_wait(cpu->halt_cond, &bql) block other secondary cpus? Then the main cpu does a qemu_cpu_kick() to make them active again at some point?

KVM vCPUs need to enter the kernel to handle the halted state and 
therefore can run. On spapr, it is handled via start-cpu rtas call for 
which the handler in qemu does a qemu_cpu_kick(). However CPU 0 needs to 
be able to proceed before that stage is reached, but it hangs while 
trying to acquire bql_lock in qemu_default_main() whereas secondary vcpu 
is spinning with BQL held returning EXCP_HLT. This is causing deadlock.

> 
>>
>>>>
>>>> The issue manifests as boot hangs on PowerPC pseries machines with multiple
>>>> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and
>>>> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held
>>>> the BQL, preventing other operations from proceeding.
>>>>
>>>> The fix has two parts:
>>>>
>>>> 1. In kvm_cpu_exec() (kvm-all.c):
>>>>    Release the BQL before returning EXCP_HLT in the early return path,
>>>>    matching the behavior of the normal execution path where bql_unlock()
>>>>    is called before entering the main KVM execution loop.
>>>>
>>>> 2. In kvm_vcpu_thread_fn() (kvm-accel-ops.c):
>>>>    Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the
>>>>    loop expects to hold the BQL when calling kvm_cpu_exec() again.
>>>>
>>>> This ensures proper BQL lock/unlock pairing:
>>>> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()
>>>> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)
>>>> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned
>>>> - Next iteration has BQL held as expected
>>>>
>>>> This is a regression introduced by commit 98884e0cc1 ("accel/kvm: add
>>>> changes required to support KVM VM file descriptor change") which
>>>> refactored kvm_irqchip_create() and changed the initialization timing,
>>>> exposing this lock imbalance issue.
>>>>
>>>> Fixes: 98884e0cc1 ("accel/kvm: add changes required to support KVM VM file descriptor change")
>>> I do not think this is the right reference. The above commit may have exposed some underlying issue but is certainly not the cause of it. Further, as we have discussed in the other thread, the changes in that commit are not even getting executed.
>>> Personally I think the core issue is somewhere else. I am not convinced this is the proper fix.
>>
>> Regarding commit 98884e0cc1:
>>
>> Reverting the kvm_irqchip_create refactoring makes the problem go away. This commit may have changed timing that exposed the issue, but the root cause is the pre-existing BQL lock imbalance in kvm_cpu_exec(). We can either:
>>
>> Remove the "Fixes:" tag entirely, or
> 
> Lets remove fixes tag entirely unless you can pin point the exact commit that introduced the architectural issues.
> 
>> Add a note that this is a pre-existing issue exposed by timing changes
>> The core fix (releasing BQL before returning EXCP_HLT) is correct and addresses the actual deadlock mechanism.
>>
>>>> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>
>>>> Reported-by: Gautam Menghani <gautam@linux.ibm.com>
>>>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>>>> ---
>>>> accel/kvm/kvm-accel-ops.c | 4 ++++
>>>> accel/kvm/kvm-all.c       | 1 +
>>>> 2 files changed, 5 insertions(+)
>>>>
>>>> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c
>>>> index 6d9140e549..d684fd0840 100644
>>>> --- a/accel/kvm/kvm-accel-ops.c
>>>> +++ b/accel/kvm/kvm-accel-ops.c
>>>> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)
>>>>
>>>>          if (cpu_can_run(cpu)) {
>>>>              r = kvm_cpu_exec(cpu);
>>>> +            if (r == EXCP_HLT) {
>>>> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */
>>>> +                bql_lock();
>>>> +            }
>>>>              if (r == EXCP_DEBUG) {
>>>>                  cpu_handle_guest_debug(cpu);
>>>>              }
>>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>>> index 774499d34f..00b8018664 100644
>>>> --- a/accel/kvm/kvm-all.c
>>>> +++ b/accel/kvm/kvm-all.c
>>>> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)
>>>>      trace_kvm_cpu_exec();
>>>>
>>>>      if (kvm_arch_process_async_events(cpu)) {
>>>> +        bql_unlock();
>>>>          return EXCP_HLT;
>>>>      }
>>>>
>>>> -- 
>>>> 2.52.0
> 
> 
> 



  parent reply	other threads:[~2026-04-10  8:18 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-09 16:10 [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec Harsh Prateek Bora
2026-04-10  3:42 ` Ani Sinha
2026-04-10  5:25   ` Harsh Prateek Bora
2026-04-10  6:35     ` Ani Sinha
2026-04-10  8:15       ` Ani Sinha
2026-04-10  8:18       ` Harsh Prateek Bora [this message]
2026-04-10  8:29         ` Ani Sinha
2026-04-10  9:01           ` Harsh Prateek Bora
2026-04-10  9:31             ` Ani Sinha
2026-04-10 10:02               ` Harsh Prateek Bora
2026-04-10 10:05                 ` Ani Sinha
2026-04-10 10:16                   ` Harsh Prateek Bora
2026-04-10 13:04             ` BALATON Zoltan
2026-04-10 13:37               ` Ani Sinha
2026-04-10 15:07                 ` BALATON Zoltan
2026-04-10  7:16 ` Misbah Anjum N
2026-04-10 18:12 ` Fabiano Rosas
2026-04-13  5:44   ` Harsh Prateek Bora
2026-04-13  7:13     ` Ani Sinha
2026-04-13  7:39       ` Harsh Prateek Bora

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=424743ec-a34d-4af0-adfb-c8392ee5e5be@linux.ibm.com \
    --to=harshpb@linux.ibm.com \
    --cc=anisinha@redhat.com \
    --cc=gautam@linux.ibm.com \
    --cc=misanjum@linux.ibm.com \
    --cc=npiggin@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.