* [Qemu-devel] Holding the BQL for emulate_ppc_hypercall
@ 2016-10-24 14:41 Alex Bennée
  2016-10-24 14:44 ` Alex Bennée
  2016-10-25  3:43 ` Nikunj A Dadhania
  0 siblings, 2 replies; 6+ messages in thread

From: Alex Bennée @ 2016-10-24 14:41 UTC (permalink / raw)
To: Nikunj A Dadhania, Bharata B Rao, David Gibson; +Cc: qemu-ppc, Qemu Developers

Hi,

In the MTTCG patch set one of the big patches is to remove the
requirement to hold the BQL while running code:

  tcg: drop global lock during TCG code execution

And this broke the PPC code because emulate_ppc_hypercall can cause
changes to the global state. This function just calls spapr_hypercall()
and puts the results into the TCG register file. Normally
spapr_hypercall() is called under the BQL in KVM as
kvm_arch_handle_exit() does things with the BQL held.

I blithely wrapped the call in a lock/unlock pair only to find the
ppc64 check builds failed as the hypercall was made during the
cc->do_interrupt() code which also holds the BQL.

I'm a little confused by the nature of PPC hypercalls in TCG? Are they
not all detectable at code generation time? What is the case that causes
an exception to occur rather than the helper function doing the
hypercall?

I guess it comes down to can I avoid doing:

    /* If we come via cc->do_interrupt BQL may already be held */
    if (!qemu_mutex_iothread_locked()) {
        g_mutex_lock_iothread();
        env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
        g_muetx_unlock_iothread();
    } else {
        env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
    }

Any thoughts?

--
Alex Bennée
* Re: [Qemu-devel] Holding the BQL for emulate_ppc_hypercall
  2016-10-24 14:41 [Qemu-devel] Holding the BQL for emulate_ppc_hypercall Alex Bennée
@ 2016-10-24 14:44 ` Alex Bennée
  2016-10-25  2:47   ` David Gibson
  2016-10-25  3:43 ` Nikunj A Dadhania
  1 sibling, 1 reply; 6+ messages in thread

From: Alex Bennée @ 2016-10-24 14:44 UTC (permalink / raw)
To: Nikunj A Dadhania, Bharata B Rao, David Gibson; +Cc: qemu-ppc, Qemu Developers

Alex Bennée <alex.bennee@linaro.org> writes:

> Hi,
>
> In the MTTCG patch set one of the big patches is to remove the
> requirement to hold the BQL while running code:
>
>   tcg: drop global lock during TCG code execution
>
> And this broke the PPC code because emulate_ppc_hypercall can cause
> changes to the global state. This function just calls spapr_hypercall()
> and puts the results into the TCG register file. Normally
> spapr_hypercall() is called under the BQL in KVM as
> kvm_arch_handle_exit() does things with the BQL held.
>
> I blithely wrapped the call in a lock/unlock pair only to find the
> ppc64 check builds failed as the hypercall was made during the
> cc->do_interrupt() code which also holds the BQL.
>
> I'm a little confused by the nature of PPC hypercalls in TCG? Are they
> not all detectable at code generation time? What is the case that causes
> an exception to occur rather than the helper function doing the
> hypercall?
>
> I guess it comes down to can I avoid doing:
>
>     /* If we come via cc->do_interrupt BQL may already be held */
>     if (!qemu_mutex_iothread_locked()) {
>         g_mutex_lock_iothread();
>         env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
>         g_muetx_unlock_iothread();
>     } else {
>         env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
>     }

Of course I mean:

    /* If we come via cc->do_interrupt BQL may already be held */
    if (!qemu_mutex_iothread_locked()) {
        qemu_mutex_lock_iothread();
        env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
        qemu_mutex_unlock_iothread();
    } else {
        env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
    }

> Any thoughts?

--
Alex Bennée
* Re: [Qemu-devel] Holding the BQL for emulate_ppc_hypercall
  2016-10-24 14:44 ` Alex Bennée
@ 2016-10-25  2:47   ` David Gibson
  2016-10-25  8:36     ` Alex Bennée
  0 siblings, 1 reply; 6+ messages in thread

From: David Gibson @ 2016-10-25 2:47 UTC (permalink / raw)
To: Alex Bennée
Cc: Nikunj A Dadhania, Bharata B Rao, qemu-ppc, Qemu Developers

On Mon, Oct 24, 2016 at 03:44:01PM +0100, Alex Bennée wrote:
>
> Alex Bennée <alex.bennee@linaro.org> writes:
>
> > Hi,
> >
> > In the MTTCG patch set one of the big patches is to remove the
> > requirement to hold the BQL while running code:
> >
> >   tcg: drop global lock during TCG code execution
> >
> > And this broke the PPC code because emulate_ppc_hypercall can cause
> > changes to the global state. This function just calls spapr_hypercall()
> > and puts the results into the TCG register file. Normally
> > spapr_hypercall() is called under the BQL in KVM as
> > kvm_arch_handle_exit() does things with the BQL held.
> >
> > I blithely wrapped the call in a lock/unlock pair only to find the
> > ppc64 check builds failed as the hypercall was made during the
> > cc->do_interrupt() code which also holds the BQL.
> >
> > I'm a little confused by the nature of PPC hypercalls in TCG? Are they
> > not all detectable at code generation time? What is the case that causes
> > an exception to occur rather than the helper function doing the
> > hypercall?
> >
> > I guess it comes down to can I avoid doing:
> >
> >     /* If we come via cc->do_interrupt BQL may already be held */
> >     if (!qemu_mutex_iothread_locked()) {
> >         g_mutex_lock_iothread();
> >         env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
> >         g_muetx_unlock_iothread();
> >     } else {
> >         env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
> >     }
>
> Of course I mean:
>
>     /* If we come via cc->do_interrupt BQL may already be held */
>     if (!qemu_mutex_iothread_locked()) {
>         qemu_mutex_lock_iothread();
>         env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
>         qemu_mutex_unlock_iothread();
>     } else {
>         env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
>     }
>
> > Any thoughts?

So, I understand why the hypercall is being called from exception code
and therefore with the BQL held. On Power, the hypercall instruction
is the same as the guest-level system call instruction, just with a
flag bit set. System calls are, of course, treated as exceptions,
because they change the CPU's privilege mode. Likewise if we were
implementing a full host system (like the upcoming 'powernv' machine
type) we'd need to treat hypercalls as exceptions for the same reason.

We could detect hypercalls at translation time, but at present we
don't: we go into the exception path, then detect that it's a "level
1" (i.e. hypervisor) sc instruction and branch off to the hypercall
emulation code if that's been set up. It just seemed the simplest
approach at the time.

What I *don't* understand is how the hypercall code is ever being
invoked *without* the BQL. I grepped through and the only entry paths
I can see are the one in the exception handling and KVM.

Could you try to get a backtrace from the case where we're entering
the hypercall without the BQL?

--
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [Qemu-devel] Holding the BQL for emulate_ppc_hypercall
  2016-10-25  2:47 ` David Gibson
@ 2016-10-25  8:36   ` Alex Bennée
  0 siblings, 0 replies; 6+ messages in thread

From: Alex Bennée @ 2016-10-25 8:36 UTC (permalink / raw)
To: David Gibson; +Cc: Nikunj A Dadhania, Bharata B Rao, qemu-ppc, Qemu Developers

David Gibson <david@gibson.dropbear.id.au> writes:

> On Mon, Oct 24, 2016 at 03:44:01PM +0100, Alex Bennée wrote:
>>
>> Alex Bennée <alex.bennee@linaro.org> writes:
>>
>> > Hi,
>> >
>> > In the MTTCG patch set one of the big patches is to remove the
>> > requirement to hold the BQL while running code:
>> >
>> >   tcg: drop global lock during TCG code execution
>> >
>> > And this broke the PPC code because emulate_ppc_hypercall can cause
>> > changes to the global state. This function just calls spapr_hypercall()
>> > and puts the results into the TCG register file. Normally
>> > spapr_hypercall() is called under the BQL in KVM as
>> > kvm_arch_handle_exit() does things with the BQL held.
>> >
>> > I blithely wrapped the call in a lock/unlock pair only to find the
>> > ppc64 check builds failed as the hypercall was made during the
>> > cc->do_interrupt() code which also holds the BQL.
>> >
>> > I'm a little confused by the nature of PPC hypercalls in TCG? Are they
>> > not all detectable at code generation time? What is the case that causes
>> > an exception to occur rather than the helper function doing the
>> > hypercall?
>> >
>> > I guess it comes down to can I avoid doing:
>> >
>> >     /* If we come via cc->do_interrupt BQL may already be held */
>> >     if (!qemu_mutex_iothread_locked()) {
>> >         g_mutex_lock_iothread();
>> >         env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
>> >         g_muetx_unlock_iothread();
>> >     } else {
>> >         env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
>> >     }
>>
>> Of course I mean:
>>
>>     /* If we come via cc->do_interrupt BQL may already be held */
>>     if (!qemu_mutex_iothread_locked()) {
>>         qemu_mutex_lock_iothread();
>>         env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
>>         qemu_mutex_unlock_iothread();
>>     } else {
>>         env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
>>     }
>>
>> > Any thoughts?
>
> So, I understand why the hypercall is being called from exception code
> and therefore with the BQL held. On Power, the hypercall instruction
> is the same as the guest-level system call instruction, just with a
> flag bit set. System calls are, of course, treated as exceptions,
> because they change the CPU's privilege mode. Likewise if we were
> implementing a full host system (like the upcoming 'powernv' machine
> type) we'd need to treat hypercalls as exceptions for the same reason.
>
> We could detect hypercalls at translation time, but at present we
> don't: we go into the exception path, then detect that it's a "level
> 1" (i.e. hypervisor) sc instruction and branch off to the hypercall
> emulation code if that's been set up. It just seemed the simplest
> approach at the time.
>
> What I *don't* understand is how the hypercall code is ever being
> invoked *without* the BQL. I grepped through and the only entry paths
> I can see are the one in the exception handling and KVM.

Ahh, I think I have figured out what happened. In the pre-MTTCG world
TCG code always held the BQL. After Jan's patch the BQL was held for
interrupt processing but not exception handling. I've since fixed this.
I think I must have tried this hack before fixing the root cause,
because I had another assertion failure when trying to ensure all IRQ
processing has the BQL held (as the target-specific code can do all
sorts of things on an IRQ/exception).

> Could you try to get a backtrace from the case where we're entering
> the hypercall without the BQL?

I'll remove the hack, add an assert and see if it comes up again in
testing.

--
Alex Bennée
* Re: [Qemu-devel] Holding the BQL for emulate_ppc_hypercall
  2016-10-24 14:41 [Qemu-devel] Holding the BQL for emulate_ppc_hypercall Alex Bennée
  2016-10-24 14:44 ` Alex Bennée
@ 2016-10-25  3:43 ` Nikunj A Dadhania
  2016-10-25  8:39   ` Alex Bennée
  1 sibling, 1 reply; 6+ messages in thread

From: Nikunj A Dadhania @ 2016-10-25 3:43 UTC (permalink / raw)
To: Alex Bennée, Bharata B Rao, David Gibson; +Cc: qemu-ppc, Qemu Developers

Alex Bennée <alex.bennee@linaro.org> writes:

> Hi,
>
> In the MTTCG patch set one of the big patches is to remove the
> requirement to hold the BQL while running code:
>
>   tcg: drop global lock during TCG code execution
>
> And this broke the PPC code because emulate_ppc_hypercall can cause
> changes to the global state. This function just calls spapr_hypercall()
> and puts the results into the TCG register file. Normally
> spapr_hypercall() is called under the BQL in KVM as
> kvm_arch_handle_exit() does things with the BQL held.
>
> I blithely wrapped the call in a lock/unlock pair only to find the
> ppc64 check builds failed as the hypercall was made during the
> cc->do_interrupt() code which also holds the BQL.
>
> I'm a little confused by the nature of PPC hypercalls in TCG? Are they
> not all detectable at code generation time? What is the case that causes
> an exception to occur rather than the helper function doing the
> hypercall?
>
> I guess it comes down to can I avoid doing:
>
>     /* If we come via cc->do_interrupt BQL may already be held */
>     if (!qemu_mutex_iothread_locked()) {
>         g_mutex_lock_iothread();
>         env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
>         g_muetx_unlock_iothread();
>     } else {
>         env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
>     }
>
> Any thoughts?

Similar discussions happened on this patch:
https://lists.gnu.org/archive/html/qemu-ppc/2016-09/msg00015.html

That was only working for the TCG case and needs a fix for KVM; I would
need to handle the KVM case to avoid a deadlock.
Regards
Nikunj
* Re: [Qemu-devel] Holding the BQL for emulate_ppc_hypercall
  2016-10-25  3:43 ` Nikunj A Dadhania
@ 2016-10-25  8:39   ` Alex Bennée
  0 siblings, 0 replies; 6+ messages in thread

From: Alex Bennée @ 2016-10-25 8:39 UTC (permalink / raw)
To: Nikunj A Dadhania; +Cc: Bharata B Rao, David Gibson, qemu-ppc, Qemu Developers

Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> writes:

> Alex Bennée <alex.bennee@linaro.org> writes:
>
>> Hi,
>>
>> In the MTTCG patch set one of the big patches is to remove the
>> requirement to hold the BQL while running code:
>>
>>   tcg: drop global lock during TCG code execution
>>
>> And this broke the PPC code because emulate_ppc_hypercall can cause
>> changes to the global state. This function just calls spapr_hypercall()
>> and puts the results into the TCG register file. Normally
>> spapr_hypercall() is called under the BQL in KVM as
>> kvm_arch_handle_exit() does things with the BQL held.
>>
>> I blithely wrapped the call in a lock/unlock pair only to find the
>> ppc64 check builds failed as the hypercall was made during the
>> cc->do_interrupt() code which also holds the BQL.
>>
>> I'm a little confused by the nature of PPC hypercalls in TCG? Are they
>> not all detectable at code generation time? What is the case that causes
>> an exception to occur rather than the helper function doing the
>> hypercall?
>>
>> I guess it comes down to can I avoid doing:
>>
>>     /* If we come via cc->do_interrupt BQL may already be held */
>>     if (!qemu_mutex_iothread_locked()) {
>>         g_mutex_lock_iothread();
>>         env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
>>         g_muetx_unlock_iothread();
>>     } else {
>>         env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
>>     }
>>
>> Any thoughts?
>
> Similar discussions happened on this patch:
> https://lists.gnu.org/archive/html/qemu-ppc/2016-09/msg00015.html
>
> That was only working for the TCG case and needs a fix for KVM; I would
> need to handle the KVM case to avoid a deadlock.
Thanks for the pointer, I missed that before. But I think the fix there
is too far down the call stack, as spapr_hypercall is called by both
the TCG and KVM paths. As discussed in my reply to Dave, I think the
correct fix is to ensure cpu-exec.c:cpu_handle_exception also takes the
BQL when delivering exceptions:

    if (replay_exception()) {
        CPUClass *cc = CPU_GET_CLASS(cpu);
        qemu_mutex_lock_iothread();
        cc->do_interrupt(cpu);
        qemu_mutex_unlock_iothread();
        cpu->exception_index = -1;
    } else if (!replay_has_interrupt()) {

I got confused by the if (replay_exception()), which is a bit
non-obvious.

>
> Regards
> Nikunj

--
Alex Bennée