[Qemu-devel] MTTCG External Halt

qemu-devel.nongnu.org archive mirror

* [Qemu-devel] MTTCG External Halt
@ 2018-01-03 22:10 Alistair Francis
  2018-01-03 22:14 ` Peter Maydell
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Alistair Francis @ 2018-01-03 22:10 UTC (permalink / raw)
  To: qemu-devel@nongnu.org Developers; +Cc: Alex Bennée, Alistair Francis

Hey guys, I'm super stuck with an ugly MTTCG issue and was wondering
if anyone had any ideas.

In the Xilinx fork of QEMU (based on 2.11) we have a way for CPUs to
halt other CPUs. This is used for example when the power control unit
halts the ARM A53s. To do this we have internal GPIO signals that end
up calling a function that basically does this:

To halt:
    cpu->halted = true;
    cpu_interrupt(cpu, CPU_INTERRUPT_HALT);

To un-halt:
    cpu->halted = false;
    cpu_reset_interrupt(cpu, CPU_INTERRUPT_HALT);

We also have the standard ARM WFI (Wait For Interrupt) implementation
in op_helper.c:
    cs->halted = 1;
    cs->exception_index = EXCP_HLT;
    cpu_loop_exit(cs);

Before MTTCG this used to work great, but now either we end up with
the guest Linux complaining about CPU stalls or we hit:
ERROR:/scratch/alistai/master-qemu/cpus.c:1516:qemu_tcg_cpu_thread_fn:
assertion failed: (cpu->halted)

If I remove the instances of manually setting cpu->halted then I don't
see the asserts(), but the WFI instruction doesn't work correctly.
So it seems like setting the halted status externally from the CPU
causes the issue. I have tried setting it inside a lock, using atomic
operations and running the setter async on the CPU, but nothing works.

Any chance anyone has some insight into a way to externally set a
vCPU as halted/un-halted?

Thanks,
Alistair


* Re: [Qemu-devel] MTTCG External Halt
  2018-01-03 22:10 [Qemu-devel] MTTCG External Halt Alistair Francis
@ 2018-01-03 22:14 ` Peter Maydell
  2018-01-03 22:23   ` Alistair Francis
  2018-01-04 11:08 ` Alex Bennée
  2018-01-31 17:13 ` Paolo Bonzini
  2 siblings, 1 reply; 25+ messages in thread
From: Peter Maydell @ 2018-01-03 22:14 UTC (permalink / raw)
  To: Alistair Francis
  Cc: qemu-devel@nongnu.org Developers, Alex Bennée,
	Alistair Francis

On 3 January 2018 at 22:10, Alistair Francis <alistair23@gmail.com> wrote:
> Any chance anyone has some insight into a way to externally set a
> vCPU as halted/un-halted?

PSCI (where one vCPU can power off another) does this by
calling arm_set_cpu_off(). Does that (or some variation
on it) work?

thanks
-- PMM


* Re: [Qemu-devel] MTTCG External Halt
  2018-01-03 22:14 ` Peter Maydell
@ 2018-01-03 22:23   ` Alistair Francis
  2018-01-04  1:14     ` Alistair Francis
  0 siblings, 1 reply; 25+ messages in thread
From: Alistair Francis @ 2018-01-03 22:23 UTC (permalink / raw)
  To: Peter Maydell
  Cc: qemu-devel@nongnu.org Developers, Alex Bennée,
	Alistair Francis

On Wed, Jan 3, 2018 at 2:14 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 3 January 2018 at 22:10, Alistair Francis <alistair23@gmail.com> wrote:
>> Any chance anyone has some insight into a way to externally set a
>> vCPU as halted/un-halted?
>
> PSCI (where one vCPU can power off another) does this by
> calling arm_set_cpu_off(). Does that (or some variation
> on it) work?

It seems to help with the assert(), but I still see CPU stalls.

I also forgot to mention that we have an SEV implementation, which
might also be contributing.

Alistair

>
> thanks
> -- PMM


* Re: [Qemu-devel] MTTCG External Halt
  2018-01-03 22:23   ` Alistair Francis
@ 2018-01-04  1:14     ` Alistair Francis
  0 siblings, 0 replies; 25+ messages in thread
From: Alistair Francis @ 2018-01-04  1:14 UTC (permalink / raw)
  To: Peter Maydell
  Cc: qemu-devel@nongnu.org Developers, Alex Bennée,
	Alistair Francis

On Wed, Jan 3, 2018 at 2:23 PM, Alistair Francis <alistair23@gmail.com> wrote:
> On Wed, Jan 3, 2018 at 2:14 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 3 January 2018 at 22:10, Alistair Francis <alistair23@gmail.com> wrote:
>>> Any chance anyone has some insight into a way to externally set a
>>> vCPU as halted/un-halted?
>>
>> PSCI (where one vCPU can power off another) does this by
>> calling arm_set_cpu_off(). Does that (or some variation
>> on it) work?
>
> It seems to help with the assert(), but I still see CPU stalls.
>
> I also forgot to mention that we have a sev implementation, which also
> might be contributing.

I figured it out. We have the same thing for reset (a GPIO line can
reset the cores) and apparently resetting the same core twice in a row
was causing the assert(). Resetting the core twice was a bug, so I have
fixed that and I don't see the assert() any more. I'm still not sure
why that assert() was being hit after a reset and halt/un-halt though.

Thanks for your help Peter.

Alistair

>
> Alistair
>
>>
>> thanks
>> -- PMM


* Re: [Qemu-devel] MTTCG External Halt
  2018-01-03 22:10 [Qemu-devel] MTTCG External Halt Alistair Francis
  2018-01-03 22:14 ` Peter Maydell
@ 2018-01-04 11:08 ` Alex Bennée
  2018-01-06  2:23   ` Alistair Francis
  2018-01-31 17:13 ` Paolo Bonzini
  2 siblings, 1 reply; 25+ messages in thread
From: Alex Bennée @ 2018-01-04 11:08 UTC (permalink / raw)
  To: Alistair Francis; +Cc: qemu-devel@nongnu.org Developers, Alistair Francis

Alistair Francis <alistair23@gmail.com> writes:

> Hey guys, I'm super stuck with an ugly MTTCG issue and was wondering
> if anyone had any ideas.
>
> In the Xilinx fork of QEMU (based on 2.11) we have a way for CPUs to
> halt other CPUs. This is used for example when the power control unit
> halts the ARM A53s. To do this we have internal GPIO signals that end
> up calling a function that basically does this:
>
> To halt:
>     cpu->halted = true;
>     cpu_interrupt(cpu, CPU_INTERRUPT_HALT);

Hmm, I don't think you should be setting cpu->halted unless you know it
is safe to do so. Under MTTCG the other vCPUs free-run while you hold
the BQL, so taking the lock isn't enough for a cross-vCPU interaction.
However, you can safely schedule work to run in the target vCPU's
context.

That said, isn't the cpu_interrupt() call enough to trigger the target
vCPU to halt?

>
> To un-halt
>     cpu->halted = false;
>     cpu_reset_interrupt(cpu, CPU_INTERRUPT_HALT);

Again, if this is called from a cross-vCPU context, it needs to be
scheduled against the target vCPU.

>
> We also have the standard ARM WFI (Wait For Interrupt) implementation
> in op_helper.c:
>     cs->halted = 1;
>     cs->exception_index = EXCP_HLT;
>     cpu_loop_exit(cs);
>
> Before MTTCG this used to work great, but now either we end up with
> the guest Linux complaining about CPU stalls or we hit:
> ERROR:/scratch/alistai/master-qemu/cpus.c:1516:qemu_tcg_cpu_thread_fn:
> assertion failed: (cpu->halted)
>
> If I remove the instances of manually setting cpu->halted then I don't
> see the asserts(), but the WFI instruction doesn't work correctly.
> So it seems like setting the halted status externally from the CPU
> causes the issue.

  /* during start-up the vCPU is reset and the thread is
   * kicked several times. If we don't ensure we go back
   * to sleep in the halted state we won't cleanly
   * start-up when the vCPU is enabled.
   *
   * cpu->halted should ensure we sleep in wait_io_event
   */

I think what I'm trying to say is we should never be halted without
having gone via wait_io_event where we can sleep.

> I have tried setting it inside a lock, using atomic
> operations and running the setter async on the CPU, but nothing works.
>
> Any chance anyone has some insight into a way to externally set a
> vCPU as halted/un-halted?

See the PSCI code which uses the async interface for exactly this.

>
> Thanks,
> Alistair

--
Alex Bennée
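
A minimal, self-contained sketch of the "schedule it on the target vCPU"
idea discussed above. It is only a single-threaded model: the struct,
the queue and every defer_* name are hypothetical stand-ins, not QEMU's
CPUState or its async_run_on_cpu() implementation, and the locking a
real cross-thread queue needs is left out. The point it illustrates is
that the requester only records the change, and the halted flag is
written by the target vCPU itself at a safe point in its loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFER_MAX_WORK 8

struct defer_cpu;
typedef void (*defer_work_fn)(struct defer_cpu *cpu, bool arg);

/* One queued request: a callback plus its argument. */
struct defer_work_item {
    defer_work_fn fn;
    bool arg;
};

/* Hypothetical stand-in for a vCPU with a per-vCPU work queue. */
struct defer_cpu {
    bool halted;
    struct defer_work_item queue[DEFER_MAX_WORK];
    size_t queued;
};

/* Runs only in the target vCPU's context, so it may write halted. */
static void defer_set_halted(struct defer_cpu *cpu, bool halted)
{
    cpu->halted = halted;
}

/*
 * Called by the requester (e.g. a GPIO handler). It never touches
 * cpu->halted; it only queues the request. Returns false if the
 * queue is full.
 */
static bool defer_run_on_cpu(struct defer_cpu *cpu, defer_work_fn fn, bool arg)
{
    if (cpu->queued == DEFER_MAX_WORK) {
        return false;
    }
    cpu->queue[cpu->queued].fn = fn;
    cpu->queue[cpu->queued].arg = arg;
    cpu->queued++;
    return true;
}

/* Called by the target vCPU at a safe point in its own loop. */
static void defer_process_queued_work(struct defer_cpu *cpu)
{
    for (size_t i = 0; i < cpu->queued; i++) {
        assert(cpu->queue[i].fn != NULL);
        cpu->queue[i].fn(cpu, cpu->queue[i].arg);
    }
    cpu->queued = 0;
}
```

In this model a halt request is defer_run_on_cpu(&cpu, defer_set_halted,
true); cpu.halted stays false until the target calls
defer_process_queued_work(&cpu).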


* Re: [Qemu-devel] MTTCG External Halt
  2018-01-04 11:08 ` Alex Bennée
@ 2018-01-06  2:23   ` Alistair Francis
  2018-01-30 23:56     ` Alistair Francis
  0 siblings, 1 reply; 25+ messages in thread
From: Alistair Francis @ 2018-01-06  2:23 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel@nongnu.org Developers, Alistair Francis

On Thu, Jan 4, 2018 at 3:08 AM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Alistair Francis <alistair23@gmail.com> writes:
>
>> Hey guys, I'm super stuck with an ugly MTTCG issue and was wondering
>> if anyone had any ideas.
>>
>> In the Xilinx fork of QEMU (based on 2.11) we have a way for CPUs to
>> halt other CPUs. This is used for example when the power control unit
>> halts the ARM A53s. To do this we have internal GPIO signals that end
>> up calling a function that basically does this:
>>
>> To halt:
>>     cpu->halted = true;
>>     cpu_interrupt(cpu, CPU_INTERRUPT_HALT);
>
> Hmm I don't think you should be setting cpu->halted unless you know it
> is safe to do so. As the other CPUs free-run during BQL this isn't
> enough for a cross vCPU interaction. However you can schedule work to
> run in the target vCPUs context safely.

We actually pretty much only ever set it on reset.

>
> That said isn't the cpu_interrupt enough to trigger the target vCPU to
> halt?
>
>>
>> To un-halt
>>     cpu->halted = false;
>>     cpu_reset_interrupt(cpu, CPU_INTERRUPT_HALT);
>
> Again if cross vCPU context this needs to be scheduled against the
> target vCPU.
>
>>
>> We also have the standard ARM WFI (Wait For Interrupt) implementation
>> in op_helper.c:
>>     cs->halted = 1;
>>     cs->exception_index = EXCP_HLT;
>>     cpu_loop_exit(cs);
>>
>> Before MTTCG this used to work great, but now either we end up with
>> the guest Linux complaining about CPU stalls or we hit:
>> ERROR:/scratch/alistai/master-qemu/cpus.c:1516:qemu_tcg_cpu_thread_fn:
>> assertion failed: (cpu->halted)
>>
>> If I remove the instances of manually setting cpu->halted then I don't
>> see the asserts(), but the WFI instruction doesn't work correctly.
>> So it seems like setting the halted status externally from the CPU
>> causes the issue.
>
>   /* during start-up the vCPU is reset and the thread is
>    * kicked several times. If we don't ensure we go back
>    * to sleep in the halted state we won't cleanly
>    * start-up when the vCPU is enabled.
>    *
>    * cpu->halted should ensure we sleep in wait_io_event
>    */
>
> I think what I'm trying to say is we should never be halted without
> having gone via wait_io_event where we can sleep.
>
>
>> I have tried setting it inside a lock, using atomic
>> operations and running the setter async on the CPU, but nothing works.
>>
>> Any chance anyone has some insight into a way to externally set a
>> vCPU as halted/un-halted?
>
> See the PSCI code which uses the async interface for exactly this.

Yeah, that and a fix to our weird double reset fixed it.

What I don't get is how a double reset would cause the assert() to be hit.

Alistair

>
>>
>> Thanks,
>> Alistair
>
>
> --
> Alex Bennée


* Re: [Qemu-devel] MTTCG External Halt
  2018-01-06  2:23   ` Alistair Francis
@ 2018-01-30 23:56     ` Alistair Francis
  2018-01-31  4:26       ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Alistair Francis @ 2018-01-30 23:56 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel@nongnu.org Developers, Alistair Francis

On Fri, Jan 5, 2018 at 6:23 PM, Alistair Francis <alistair23@gmail.com> wrote:
> On Thu, Jan 4, 2018 at 3:08 AM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>> Alistair Francis <alistair23@gmail.com> writes:
>>
>>> Hey guys, I'm super stuck with an ugly MTTCG issue and was wondering
>>> if anyone had any ideas.
>>>
>>> In the Xilinx fork of QEMU (based on 2.11) we have a way for CPUs to
>>> halt other CPUs. This is used for example when the power control unit
>>> halts the ARM A53s. To do this we have internal GPIO signals that end
>>> up calling a function that basically does this:
>>>
>>> To halt:
>>>     cpu->halted = true;
>>>     cpu_interrupt(cpu, CPU_INTERRUPT_HALT);
>>
>> Hmm I don't think you should be setting cpu->halted unless you know it
>> is safe to do so. As the other CPUs free-run during BQL this isn't
>> enough for a cross vCPU interaction. However you can schedule work to
>> run in the target vCPUs context safely.
>
> We actually pretty much only ever set it on reset.
>
>>
>> That said isn't the cpu_interrupt enough to trigger the target vCPU to
>> halt?
>>
>>>
>>> To un-halt
>>>     cpu->halted = false;
>>>     cpu_reset_interrupt(cpu, CPU_INTERRUPT_HALT);
>>
>> Again if cross vCPU context this needs to be scheduled against the
>> target vCPU.
>>
>>>
>>> We also have the standard ARM WFI (Wait For Interrupt) implementation
>>> in op_helper.c:
>>>     cs->halted = 1;
>>>     cs->exception_index = EXCP_HLT;
>>>     cpu_loop_exit(cs);
>>>
>>> Before MTTCG this used to work great, but now either we end up with
>>> the guest Linux complaining about CPU stalls or we hit:
>>> ERROR:/scratch/alistai/master-qemu/cpus.c:1516:qemu_tcg_cpu_thread_fn:
>>> assertion failed: (cpu->halted)
>>>
>>> If I remove the instances of manually setting cpu->halted then I don't
>>> see the asserts(), but the WFI instruction doesn't work correctly.
>>> So it seems like setting the halted status externally from the CPU
>>> causes the issue.
>>
>>   /* during start-up the vCPU is reset and the thread is
>>    * kicked several times. If we don't ensure we go back
>>    * to sleep in the halted state we won't cleanly
>>    * start-up when the vCPU is enabled.
>>    *
>>    * cpu->halted should ensure we sleep in wait_io_event
>>    */
>>
>> I think what I'm trying to say is we should never be halted without
>> having gone via wait_io_event where we can sleep.
>>
>>
>>> I have tried setting it inside a lock, using atomic
>>> operations and running the setter async on the CPU, but nothing works.
>>>
>>> Any chance anyone has some insight into a way to externally set a
>>> vCPU as halted/un-halted?
>>
>> See the PSCI code which uses the async interface for exactly this.

Grr... It's back.

I narrowed it down: a reset (triggered by an external GPIO) is
causing the problem. Apparently QEMU doesn't like halted CPUs being
reset while spinning around qemu_tcg_cpu_thread_fn().

I don't have a good solution though, as setting CPU_INTERRUPT_RESET
doesn't help (that isn't handled while we are halted) and
async_run_on_cpu()/run_on_cpu() doesn't reliably reset the CPU when we
want.

I've even tried pausing all CPUs before resetting the CPU and then
resuming them all, but that doesn't seem to work either. Is there
anything I'm missing? Is there no reliable way to reset a CPU?

Alistair

>
> Yeah, that and a fix to our weird double reset fixed it.
>
> What I don't get is how a double reset would cause the assert() to be hit.
>
> Alistair
>
>>
>>>
>>> Thanks,
>>> Alistair
>>
>>
>> --
>> Alex Bennée


* Re: [Qemu-devel] MTTCG External Halt
  2018-01-30 23:56     ` Alistair Francis
@ 2018-01-31  4:26       ` Paolo Bonzini
  2018-01-31 16:08         ` Alistair Francis
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2018-01-31  4:26 UTC (permalink / raw)
  To: Alistair Francis, Alex Bennée; +Cc: qemu-devel@nongnu.org Developers

On 30/01/2018 18:56, Alistair Francis wrote:
> 
> I don't have a good solution though, as setting CPU_INTERRUPT_RESET
> doesn't help (that isn't handled while we are halted) and
> async_run_on_cpu()/run_on_cpu() doesn't reliably reset the CPU when we
> want.
> 
> I've even tried pausing all CPUs before resetting the CPU and then
> resuming them all, but that doesn't seem to work either.

async_safe_run_on_cpu would be like async_run_on_cpu, except that it
takes care of stopping all other CPUs while the function runs.

> Is there
> anything I'm missing? Is there no reliable way to reset a CPU?

What do you mean by reliable?  Executing no instruction after the one
you were at?

Paolo


* Re: [Qemu-devel] MTTCG External Halt
  2018-01-31  4:26       ` Paolo Bonzini
@ 2018-01-31 16:08         ` Alistair Francis
  2018-01-31 20:32           ` Alex Bennée
  0 siblings, 1 reply; 25+ messages in thread
From: Alistair Francis @ 2018-01-31 16:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alistair Francis, Alex Bennée,
	qemu-devel@nongnu.org Developers

On Tue, Jan 30, 2018 at 8:26 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 30/01/2018 18:56, Alistair Francis wrote:
>>
>> I don't have a good solution though, as setting CPU_INTERRUPT_RESET
>> doesn't help (that isn't handled while we are halted) and
>> async_run_on_cpu()/run_on_cpu() doesn't reliably reset the CPU when we
>> want.
>>
>> I've even tried pausing all CPUs before resetting the CPU and then
>> resuming them all, but that doesn't seem to work either.
>
> async_safe_run_on_cpu would be like async_run_on_cpu, except that it
> takes care of stopping all other CPUs while the function runs.
>
>> Is there
>> anything I'm missing? Is there no reliable way to reset a CPU?
>
> What do you mean by reliable?  Executing no instruction after the one
> you were at?

The reset is called by a GPIO line, so I need the reset to be called
basically as quickly as the GPIO line changes. The async_ and
async_safe_ functions seem to not run quickly enough, even if I run a
process_work_queue() function afterwards.

Is there a way to kick the CPU to act on the async_*?

Thanks,
Alistair

>
> Paolo
>


* Re: [Qemu-devel] MTTCG External Halt
  2018-01-03 22:10 [Qemu-devel] MTTCG External Halt Alistair Francis
  2018-01-03 22:14 ` Peter Maydell
  2018-01-04 11:08 ` Alex Bennée
@ 2018-01-31 17:13 ` Paolo Bonzini
  2018-01-31 18:17   ` Alistair Francis
  2 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2018-01-31 17:13 UTC (permalink / raw)
  To: Alistair Francis, qemu-devel@nongnu.org Developers
  Cc: Alex Bennée, Alistair Francis

On 03/01/2018 17:10, Alistair Francis wrote:
> Hey guys, I'm super stuck with an ugly MTTCG issue and was wondering
> if anyone had any ideas.
> 
> In the Xilinx fork of QEMU (based on 2.11) we have a way for CPUs to
> halt other CPUs. This is used for example when the power control unit
> halts the ARM A53s. To do this we have internal GPIO signals that end
> up calling a function that basically does this:
> 
> To halt:
>     cpu->halted = true;
>     cpu_interrupt(cpu, CPU_INTERRUPT_HALT);

cpu->halted = true should not be needed here.  It will be set by
cpu_handle_interrupt when processing CPU_INTERRUPT_HALT.

> To un-halt
>     cpu->halted = false;
>     cpu_reset_interrupt(cpu, CPU_INTERRUPT_HALT);

cpu->halted = false likewise should not be needed here, but you cannot
just clear CPU_INTERRUPT_HALT either.  You need to set a *different*
interrupt request bit (the dummy CPU_INTERRUPT_EXITTB will do) and
cpu_handle_halt will clear cpu->halted.

Paolo
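
The halt/un-halt flow described above can be sketched as a small
self-contained model. This is an illustration under assumptions, not
QEMU code: the irq_* names, the struct and the bit values are made up
for the example, and the "has work" test is simplified to "any request
bit is pending" (in QEMU that check is target-specific). The functions
loosely mirror the roles of cpu_interrupt(), cpu_handle_interrupt() and
cpu_handle_halt() named in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; the real constants live in QEMU's headers. */
#define IRQ_MODEL_EXITTB 0x0004u
#define IRQ_MODEL_HALT   0x0020u

/* Hypothetical stand-in for the parts of a vCPU this sketch needs. */
struct irq_cpu {
    bool halted;
    uint32_t interrupt_request;
};

/* External requester: only raises a request bit, never writes halted. */
static void irq_raise(struct irq_cpu *cpu, uint32_t mask)
{
    assert(mask != 0);
    cpu->interrupt_request |= mask;
}

/* Simplified wake-up test: any pending request counts as work. */
static bool irq_has_work(const struct irq_cpu *cpu)
{
    return cpu->interrupt_request != 0;
}

/*
 * Run by the vCPU itself. Consumes a pending HALT request and enters
 * the halted state; returns true if the execution loop should exit.
 */
static bool irq_handle_interrupt(struct irq_cpu *cpu)
{
    if (cpu->interrupt_request & IRQ_MODEL_HALT) {
        cpu->interrupt_request &= ~IRQ_MODEL_HALT;
        cpu->halted = true;
        return true;
    }
    /* The dummy EXITTB request carries no action; just consume it. */
    cpu->interrupt_request &= ~IRQ_MODEL_EXITTB;
    return false;
}

/*
 * Run by the vCPU itself before executing code. Returns true while the
 * vCPU must stay halted; clears halted once some request is pending.
 */
static bool irq_handle_halt(struct irq_cpu *cpu)
{
    if (cpu->halted) {
        if (!irq_has_work(cpu)) {
            return true;
        }
        cpu->halted = false;
    }
    return false;
}
```

In this model, raising IRQ_MODEL_HALT halts the vCPU on its next
irq_handle_interrupt(), and merely clearing that bit afterwards would
leave it halted; raising a different bit such as IRQ_MODEL_EXITTB is
what lets irq_handle_halt() clear the flag.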

> We also have the standard ARM WFI (Wait For Interrupt) implementation
> in op_helper.c:
>     cs->halted = 1;
>     cs->exception_index = EXCP_HLT;
>     cpu_loop_exit(cs);
> 
> Before MTTCG this used to work great, but now either we end up with
> the guest Linux complaining about CPU stalls or we hit:
> ERROR:/scratch/alistai/master-qemu/cpus.c:1516:qemu_tcg_cpu_thread_fn:
> assertion failed: (cpu->halted)
> 
> If I remove the instances of manually setting cpu->halted then I don't
> see the asserts(), but the WFI instruction doesn't work correctly.
> So it seems like setting the halted status externally from the CPU
> causes the issue. I have tried setting it inside a lock, using atomic
> operations and running the setter async on the CPU, but nothing works.
> 
> Any chance anyone has some insight into a way to externally set a
> vCPU as halted/un-halted?
> 
> Thanks,
> Alistair
> 


* Re: [Qemu-devel] MTTCG External Halt
  2018-01-31 17:13 ` Paolo Bonzini
@ 2018-01-31 18:17   ` Alistair Francis
  2018-01-31 18:48     ` Peter Maydell
  0 siblings, 1 reply; 25+ messages in thread
From: Alistair Francis @ 2018-01-31 18:17 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel@nongnu.org Developers, Alex Bennée,
	Alistair Francis

On Wed, Jan 31, 2018 at 9:13 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 03/01/2018 17:10, Alistair Francis wrote:
>> Hey guys, I'm super stuck with an ugly MTTCG issue and was wondering
>> if anyone had any ideas.
>>
>> In the Xilinx fork of QEMU (based on 2.11) we have a way for CPUs to
>> halt other CPUs. This is used for example when the power control unit
>> halts the ARM A53s. To do this we have internal GPIO signals that end
>> up calling a function that basically does this:
>>
>> To halt:
>>     cpu->halted = true;
>>     cpu_interrupt(cpu, CPU_INTERRUPT_HALT);
>
> cpu->halted = true should not be needed here.  It will be set by
> cpu_handle_interrupt when processing CPU_INTERRUPT_HALT.
>
>> To un-halt
>>     cpu->halted = false;
>>     cpu_reset_interrupt(cpu, CPU_INTERRUPT_HALT);
>
> cpu->halted = false likewise should not be needed here, but you cannot
> just clear CPU_INTERRUPT_HALT either.  You need to set a *different*
> interrupt request bit (the dummy CPU_INTERRUPT_EXITTB will do) and
> cpu_handle_halt will clear cpu->halted.

The problem with that is that I hit this assert for ARM CPUs:

qemu-system-aarch64: ./target/arm/cpu.h:1446: arm_el_is_aa64:
Assertion `el >= 1 && el <= 3' failed.

Alistair

>
> Paolo
>
>> We also have the standard ARM WFI (Wait For Interrupt) implementation
>> in op_helper.c:
>>     cs->halted = 1;
>>     cs->exception_index = EXCP_HLT;
>>     cpu_loop_exit(cs);
>>
>> Before MTTCG this used to work great, but now either we end up with
>> the guest Linux complaining about CPU stalls or we hit:
>> ERROR:/scratch/alistai/master-qemu/cpus.c:1516:qemu_tcg_cpu_thread_fn:
>> assertion failed: (cpu->halted)
>>
>> If I remove the instances of manually setting cpu->halted then I don't
>> see the asserts(), but the WFI instruction doesn't work correctly.
>> So it seems like setting the halted status externally from the CPU
>> causes the issue. I have tried setting it inside a lock, using atomic
>> operations and running the setter async on the CPU, but nothing works.
>>
>> Any chance anyone has some insight into a way to externally set a
>> vCPU as halted/un-halted?
>>
>> Thanks,
>> Alistair
>>
>


* Re: [Qemu-devel] MTTCG External Halt
  2018-01-31 18:17   ` Alistair Francis
@ 2018-01-31 18:48     ` Peter Maydell
  2018-01-31 18:51       ` Alistair Francis
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Maydell @ 2018-01-31 18:48 UTC (permalink / raw)
  To: Alistair Francis
  Cc: Paolo Bonzini, Alex Bennée, qemu-devel@nongnu.org Developers,
	Alistair Francis

On 31 January 2018 at 18:17, Alistair Francis <alistair23@gmail.com> wrote:
> On Wed, Jan 31, 2018 at 9:13 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> cpu->halted = false likewise should not be needed here, but you cannot
>> just clear CPU_INTERRUPT_HALT either.  You need to set a *different*
>> interrupt request bit (the dummy CPU_INTERRUPT_EXITTB will do) and
>> cpu_handle_halt will clear cpu->halted.
>
> The problem with that is that I hit this assert for ARM CPUs:
>
> qemu-system-aarch64: ./target/arm/cpu.h:1446: arm_el_is_aa64:
> Assertion `el >= 1 && el <= 3' failed.

Backtrace from when you hit that might be useful...

thanks
-- PMM


* Re: [Qemu-devel] MTTCG External Halt
  2018-01-31 18:48     ` Peter Maydell
@ 2018-01-31 18:51       ` Alistair Francis
  2018-01-31 18:56         ` Alistair Francis
  2018-01-31 18:59         ` Peter Maydell
  0 siblings, 2 replies; 25+ messages in thread
From: Alistair Francis @ 2018-01-31 18:51 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Paolo Bonzini, Alex Bennée, qemu-devel@nongnu.org Developers,
	Alistair Francis

On Wed, Jan 31, 2018 at 10:48 AM, Peter Maydell
<peter.maydell@linaro.org> wrote:
> On 31 January 2018 at 18:17, Alistair Francis <alistair23@gmail.com> wrote:
>> On Wed, Jan 31, 2018 at 9:13 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> cpu->halted = false likewise should not be needed here, but you cannot
>>> just clear CPU_INTERRUPT_HALT either.  You need to set a *different*
>>> interrupt request bit (the dummy CPU_INTERRUPT_EXITTB will do) and
>>> cpu_handle_halt will clear cpu->halted.
>>
>> The problem with that is that I hit this assert for ARM CPUs:
>>
>> qemu-system-aarch64: ./target/arm/cpu.h:1446: arm_el_is_aa64:
>> Assertion `el >= 1 && el <= 3' failed.
>
> Backtrace from when you hit that might be useful...

Here it is:

(gdb) bt
#0  0x00007ffff1a030bb in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff1a04f5d in __GI_abort () at abort.c:90
#2  0x00007ffff19faf17 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x555555cf86c4 "el >= 1 && el <= 3", file=file@entry=0x555555cf8660 "/scratch/alistai/master-qemu/target/arm/cpu.h", line=line@entry=1446, function=function@entry=0x555555d314e8 <__PRETTY_FUNCTION__.24916> "arm_el_is_aa64") at assert.c:92
#3  0x00007ffff19fafc2 in __GI___assert_fail (assertion=assertion@entry=0x555555cf86c4 "el >= 1 && el <= 3", file=file@entry=0x555555cf8660 "/scratch/alistai/master-qemu/target/arm/cpu.h", line=line@entry=1446, function=function@entry=0x555555d314e8 <__PRETTY_FUNCTION__.24916> "arm_el_is_aa64") at assert.c:101
#4  0x00005555557eb872 in arm_el_is_aa64 (el=0, env=0x55555723c7f8) at /scratch/alistai/master-qemu/target/arm/cpu.h:1446
#5  0x0000555555951233 in arm_el_is_aa64 (el=0, env=0x55555723c7f8) at /scratch/alistai/master-qemu/target/arm/cpu.h:1838
#6  0x0000555555951233 in arm_cpu_do_interrupt (cs=0x555557234550) at /scratch/alistai/master-qemu/target/arm/helper.c:8020
#7  0x000055555585e75b in cpu_handle_exception (ret=<synthetic pointer>, cpu=0x555556c64200) at /scratch/alistai/master-qemu/accel/tcg/cpu-exec.c:532
#8  0x000055555585e75b in cpu_exec (cpu=cpu@entry=0x555557234550) at /scratch/alistai/master-qemu/accel/tcg/cpu-exec.c:748
#9  0x000055555582d963 in tcg_cpu_exec (cpu=0x555557234550) at /scratch/alistai/master-qemu/cpus.c:1297
#10 0x000055555582d963 in qemu_tcg_cpu_thread_fn (arg=0x555557234550) at /scratch/alistai/master-qemu/cpus.c:1502
#11 0x00007ffff1db37fc in start_thread (arg=0x7ffef6b43700) at pthread_create.c:465
#12 0x00007ffff1ae0b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Alistair

>
> thanks
> -- PMM


* Re: [Qemu-devel] MTTCG External Halt
  2018-01-31 18:51       ` Alistair Francis
@ 2018-01-31 18:56         ` Alistair Francis
  2018-01-31 18:59         ` Peter Maydell
  1 sibling, 0 replies; 25+ messages in thread
From: Alistair Francis @ 2018-01-31 18:56 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Paolo Bonzini, Alex Bennée, qemu-devel@nongnu.org Developers,
	Alistair Francis

On Wed, Jan 31, 2018 at 10:51 AM, Alistair Francis <alistair23@gmail.com> wrote:
> On Wed, Jan 31, 2018 at 10:48 AM, Peter Maydell
> <peter.maydell@linaro.org> wrote:
>> On 31 January 2018 at 18:17, Alistair Francis <alistair23@gmail.com> wrote:
>>> On Wed, Jan 31, 2018 at 9:13 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>> cpu->halted = false likewise should not be needed here, but you cannot
>>>> just clear CPU_INTERRUPT_HALT either.  You need to set a *different*
>>>> interrupt request bit (the dummy CPU_INTERRUPT_EXITTB will do) and
>>>> cpu_handle_halt will clear cpu->halted.
>>>
>>> The problem with that is that I hit this assert for ARM CPUs:
>>>
>>> qemu-system-aarch64: ./target/arm/cpu.h:1446: arm_el_is_aa64:
>>> Assertion `el >= 1 && el <= 3' failed.
>>
>> Backtrace from when you hit that might be useful...
>
> Here it is:
>
> (gdb) bt
> #0  0x00007ffff1a030bb in __GI_raise (sig=sig@entry=6) at
> ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x00007ffff1a04f5d in __GI_abort () at abort.c:90
> #2  0x00007ffff19faf17 in __assert_fail_base (fmt=<optimized out>,
> assertion=assertion@entry=0x555555cf86c4 "el >= 1 && el <= 3",
> file=file@entry=0x555555cf8660
> "/scratch/alistai/master-qemu/target/arm/cpu.h", line=line@entry=1446,
> function=function@entry=0x555555d314e8 <__PRETTY_FUNCTION__.24916>
> "arm_el_is_aa64") at assert.c:92
> #3  0x00007ffff19fafc2 in __GI___assert_fail
> (assertion=assertion@entry=0x555555cf86c4 "el >= 1 && el <= 3",
> file=file@entry=0x555555cf8660
> "/scratch/alistai/master-qemu/target/arm/cpu.h", line=line@entry=1446,
> function=function@entry=0x555555d314e8 <__PRETTY_FUNCTION__.24916>
> "arm_el_is_aa64") at assert.c:101
> #4  0x00005555557eb872 in arm_el_is_aa64 (el=0, env=0x55555723c7f8) at
> /scratch/alistai/master-qemu/target/arm/cpu.h:1446
> #5  0x0000555555951233 in arm_el_is_aa64 (el=0, env=0x55555723c7f8) at
> /scratch/alistai/master-qemu/target/arm/cpu.h:1838
> #6  0x0000555555951233 in arm_cpu_do_interrupt (cs=0x555557234550) at
> /scratch/alistai/master-qemu/target/arm/helper.c:8020
> #7  0x000055555585e75b in cpu_handle_exception (ret=<synthetic
> pointer>, cpu=0x555556c64200)
>     at /scratch/alistai/master-qemu/accel/tcg/cpu-exec.c:532
> #8  0x000055555585e75b in cpu_exec (cpu=cpu@entry=0x555557234550) at
> /scratch/alistai/master-qemu/accel/tcg/cpu-exec.c:748
> #9  0x000055555582d963 in tcg_cpu_exec (cpu=0x555557234550) at
> /scratch/alistai/master-qemu/cpus.c:1297
> #10 0x000055555582d963 in qemu_tcg_cpu_thread_fn (arg=0x555557234550)
> at /scratch/alistai/master-qemu/cpus.c:1502
> #11 0x00007ffff1db37fc in start_thread (arg=0x7ffef6b43700) at
> pthread_create.c:465
> #12 0x00007ffff1ae0b5f in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

This diff works around it, at least for now:

diff --git a/target/arm/helper.c b/target/arm/helper.c
index eebc898b37..06b40809d9 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -8015,6 +8015,10 @@ void arm_cpu_do_interrupt(CPUState *cs)
         return;
     }

+    if (is_a64(env) && new_el == 0) {
+        return;
+    }
+
     assert(!excp_is_internal(cs->exception_index));
     if (arm_el_is_aa64(env, new_el)) {
         arm_cpu_do_interrupt_aarch64(cs);


>
> Alistair
>
>>
>> thanks
>> -- PMM


* Re: [Qemu-devel] MTTCG External Halt
  2018-01-31 18:51       ` Alistair Francis
  2018-01-31 18:56         ` Alistair Francis
@ 2018-01-31 18:59         ` Peter Maydell
  2018-01-31 19:37           ` Alistair Francis
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Maydell @ 2018-01-31 18:59 UTC (permalink / raw)
  To: Alistair Francis
  Cc: Paolo Bonzini, Alex Bennée, qemu-devel@nongnu.org Developers,
	Alistair Francis

On 31 January 2018 at 18:51, Alistair Francis <alistair23@gmail.com> wrote:
> On Wed, Jan 31, 2018 at 10:48 AM, Peter Maydell
> <peter.maydell@linaro.org> wrote:
>> On 31 January 2018 at 18:17, Alistair Francis <alistair23@gmail.com> wrote:
>>> On Wed, Jan 31, 2018 at 9:13 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>> cpu->halted = false likewise should not be needed here, but you cannot
>>>> just clear CPU_INTERRUPT_HALT either.  You need to set a *different*
>>>> interrupt request bit (the dummy CPU_INTERRUPT_EXITTB will do) and
>>>> cpu_handle_halt will clear cpu->halted.
>>>
>>> The problem with that is that I hit this assert for ARM CPUs:
>>>
>>> qemu-system-aarch64: ./target/arm/cpu.h:1446: arm_el_is_aa64:
>>> Assertion `el >= 1 && el <= 3' failed.
>>
>> Backtrace from when you hit that might be useful...
>
> Here it is:
>
> (gdb) bt
> #0  0x00007ffff1a030bb in __GI_raise (sig=sig@entry=6) at
> ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x00007ffff1a04f5d in __GI_abort () at abort.c:90
> #2  0x00007ffff19faf17 in __assert_fail_base (fmt=<optimized out>,
> assertion=assertion@entry=0x555555cf86c4 "el >= 1 && el <= 3",
> file=file@entry=0x555555cf8660
> "/scratch/alistai/master-qemu/target/arm/cpu.h", line=line@entry=1446,
> function=function@entry=0x555555d314e8 <__PRETTY_FUNCTION__.24916>
> "arm_el_is_aa64") at assert.c:92
> #3  0x00007ffff19fafc2 in __GI___assert_fail
> (assertion=assertion@entry=0x555555cf86c4 "el >= 1 && el <= 3",
> file=file@entry=0x555555cf8660
> "/scratch/alistai/master-qemu/target/arm/cpu.h", line=line@entry=1446,
> function=function@entry=0x555555d314e8 <__PRETTY_FUNCTION__.24916>
> "arm_el_is_aa64") at assert.c:101
> #4  0x00005555557eb872 in arm_el_is_aa64 (el=0, env=0x55555723c7f8) at
> /scratch/alistai/master-qemu/target/arm/cpu.h:1446
> #5  0x0000555555951233 in arm_el_is_aa64 (el=0, env=0x55555723c7f8) at
> /scratch/alistai/master-qemu/target/arm/cpu.h:1838
> #6  0x0000555555951233 in arm_cpu_do_interrupt (cs=0x555557234550) at
> /scratch/alistai/master-qemu/target/arm/helper.c:8020

The problem is here (or further down the callstack) -- you
definitely don't want to be trying to take an interrupt from
the guest's perspective, which is what arm_cpu_do_interrupt()
is for...

This is probably happening because cpu->exception_index isn't
right at this point (though the arm code has a habit of leaving
it set to whatever its value was last...)

> #7  0x000055555585e75b in cpu_handle_exception (ret=<synthetic
> pointer>, cpu=0x555556c64200)
>     at /scratch/alistai/master-qemu/accel/tcg/cpu-exec.c:532
> #8  0x000055555585e75b in cpu_exec (cpu=cpu@entry=0x555557234550) at
> /scratch/alistai/master-qemu/accel/tcg/cpu-exec.c:748
> #9  0x000055555582d963 in tcg_cpu_exec (cpu=0x555557234550) at
> /scratch/alistai/master-qemu/cpus.c:1297
> #10 0x000055555582d963 in qemu_tcg_cpu_thread_fn (arg=0x555557234550)
> at /scratch/alistai/master-qemu/cpus.c:1502
> #11 0x00007ffff1db37fc in start_thread (arg=0x7ffef6b43700) at
> pthread_create.c:465
> #12 0x00007ffff1ae0b5f in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

thanks
-- PMM


* Re: [Qemu-devel] MTTCG External Halt
  2018-01-31 18:59         ` Peter Maydell
@ 2018-01-31 19:37           ` Alistair Francis
  0 siblings, 0 replies; 25+ messages in thread
From: Alistair Francis @ 2018-01-31 19:37 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Paolo Bonzini, Alex Bennée, qemu-devel@nongnu.org Developers,
	Alistair Francis

On Wed, Jan 31, 2018 at 10:59 AM, Peter Maydell
<peter.maydell@linaro.org> wrote:
> On 31 January 2018 at 18:51, Alistair Francis <alistair23@gmail.com> wrote:
>> On Wed, Jan 31, 2018 at 10:48 AM, Peter Maydell
>> <peter.maydell@linaro.org> wrote:
>>> On 31 January 2018 at 18:17, Alistair Francis <alistair23@gmail.com> wrote:
>>>> On Wed, Jan 31, 2018 at 9:13 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>> cpu->halted = false likewise should not be needed here, but you cannot
>>>>> just clear CPU_INTERRUPT_HALT either.  You need to set a *different*
>>>>> interrupt request bit (the dummy CPU_INTERRUPT_EXITTB will do) and
>>>>> cpu_handle_halt will clear cpu->halted.
>>>>
>>>> The problem with that is that I hit this assert for ARM CPUs:
>>>>
>>>> qemu-system-aarch64: ./target/arm/cpu.h:1446: arm_el_is_aa64:
>>>> Assertion `el >= 1 && el <= 3' failed.
>>>
>>> Backtrace from when you hit that might be useful...
>>
>> Here it is:
>>
>> (gdb) bt
>> #0  0x00007ffff1a030bb in __GI_raise (sig=sig@entry=6) at
>> ../sysdeps/unix/sysv/linux/raise.c:51
>> #1  0x00007ffff1a04f5d in __GI_abort () at abort.c:90
>> #2  0x00007ffff19faf17 in __assert_fail_base (fmt=<optimized out>,
>> assertion=assertion@entry=0x555555cf86c4 "el >= 1 && el <= 3",
>> file=file@entry=0x555555cf8660
>> "/scratch/alistai/master-qemu/target/arm/cpu.h", line=line@entry=1446,
>> function=function@entry=0x555555d314e8 <__PRETTY_FUNCTION__.24916>
>> "arm_el_is_aa64") at assert.c:92
>> #3  0x00007ffff19fafc2 in __GI___assert_fail
>> (assertion=assertion@entry=0x555555cf86c4 "el >= 1 && el <= 3",
>> file=file@entry=0x555555cf8660
>> "/scratch/alistai/master-qemu/target/arm/cpu.h", line=line@entry=1446,
>> function=function@entry=0x555555d314e8 <__PRETTY_FUNCTION__.24916>
>> "arm_el_is_aa64") at assert.c:101
>> #4  0x00005555557eb872 in arm_el_is_aa64 (el=0, env=0x55555723c7f8) at
>> /scratch/alistai/master-qemu/target/arm/cpu.h:1446
>> #5  0x0000555555951233 in arm_el_is_aa64 (el=0, env=0x55555723c7f8) at
>> /scratch/alistai/master-qemu/target/arm/cpu.h:1838
>> #6  0x0000555555951233 in arm_cpu_do_interrupt (cs=0x555557234550) at
>> /scratch/alistai/master-qemu/target/arm/helper.c:8020
>
> The problem is here (or further down the callstack) -- you
> definitely don't want to be trying to take an interrupt from
> the guest's perspective, which is what arm_cpu_do_interrupt()
> is for...
>
> This is probably happening because cpu->exception_index isn't
> right at this point (though the arm code has a habit of leaving
> it set to whatever its value was last...)

Ok, adding a cpu->exception_index = -1 seems to fix the assert.
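
As far as I understand it, the reason this helps can be sketched with a
toy model. This is not QEMU code: ToyCPU and the toy_* functions are
made-up names, and the dispatch step is heavily simplified. The only
assumption carried over from the discussion is that -1 means "no
exception pending" and that a stale non-negative value gets treated as a
guest exception to deliver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the per-CPU state; hypothetical, not QEMU's CPUState. */
typedef struct ToyCPU {
    int exception_index;         /* -1 means "no exception pending" */
    int guest_exceptions_taken;  /* how often the guest path was entered */
} ToyCPU;

/*
 * Simplified dispatch step: any non-negative index is delivered to the
 * guest, then the index is cleared. Returns true if something was taken.
 */
static bool toy_handle_exception(ToyCPU *cpu)
{
    assert(cpu != NULL);
    if (cpu->exception_index < 0) {
        return false;            /* nothing pending, keep executing */
    }
    cpu->guest_exceptions_taken++;
    cpu->exception_index = -1;
    return true;
}

/*
 * Hypothetical external wake-up: clear whatever index was left behind so
 * the next dispatch step does not deliver a stale exception to the guest.
 */
static void toy_external_wake(ToyCPU *cpu)
{
    assert(cpu != NULL);
    cpu->exception_index = -1;
}
```

Without the call to toy_external_wake() the leftover index is delivered
as if the guest had just raised it, which is the kind of path the
backtrace above ends up in.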

Thanks for that Peter.

Now I'm just left with a hang :(

Alistair

>
>> #7  0x000055555585e75b in cpu_handle_exception (ret=<synthetic
>> pointer>, cpu=0x555556c64200)
>>     at /scratch/alistai/master-qemu/accel/tcg/cpu-exec.c:532
>> #8  0x000055555585e75b in cpu_exec (cpu=cpu@entry=0x555557234550) at
>> /scratch/alistai/master-qemu/accel/tcg/cpu-exec.c:748
>> #9  0x000055555582d963 in tcg_cpu_exec (cpu=0x555557234550) at
>> /scratch/alistai/master-qemu/cpus.c:1297
>> #10 0x000055555582d963 in qemu_tcg_cpu_thread_fn (arg=0x555557234550)
>> at /scratch/alistai/master-qemu/cpus.c:1502
>> #11 0x00007ffff1db37fc in start_thread (arg=0x7ffef6b43700) at
>> pthread_create.c:465
>> #12 0x00007ffff1ae0b5f in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
> thanks
> -- PMM


* Re: [Qemu-devel] MTTCG External Halt
  2018-01-31 16:08         ` Alistair Francis
@ 2018-01-31 20:32           ` Alex Bennée
  2018-01-31 22:31             ` Alistair Francis
  0 siblings, 1 reply; 25+ messages in thread
From: Alex Bennée @ 2018-01-31 20:32 UTC (permalink / raw)
  To: Alistair Francis; +Cc: Paolo Bonzini, qemu-devel@nongnu.org Developers


Alistair Francis <alistair.francis@xilinx.com> writes:

> On Tue, Jan 30, 2018 at 8:26 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 30/01/2018 18:56, Alistair Francis wrote:
>>>
>>> I don't have a good solution though, as setting CPU_INTERRUPT_RESET
>>> doesn't help (that isn't handled while we are halted) and
>>> async_run_on_cpu()/run_on_cpu() doesn't reliably reset the CPU when we
>>> want.
>>>
>>> I've ever tried pausing all CPUs before reseting the CPU and them
>>> resuming them all but that doesn't seem to to work either.
>>
>> async_safe_run_on_cpu would be like async_run_on_cpu, except that it
>> takes care of stopping all other CPUs while the function runs.
>>
>>> Is there
>>> anything I'm missing? Is there no reliable way to reset a CPU?
>>
>> What do you mean by reliable?  Executing no instruction after the one
>> you were at?
>
> The reset is called by a GPIO line, so I need the reset to be called
> basically as quickly as the GPIO line changes. The async_ and
> async_safe_ functions seem to not run quickly enough, even if I run a
> process_work_queue() function afterwards.
>
> Is there a way to kick the CPU to act on the async_*?

Define quickly enough? The async_(safe) functions kick the vCPUs so they
will all exit the run loop as they enter the next TB (even if they loop
to themselves).

From an external vCPU's point of view those extra instructions have
already executed. If the resetting vCPU needs them to have reset by the
time it executes its next instruction it should either cpu_loop_exit at
that point or ensure it is the last instruction in its TB (which is
what we do for the MMU flush cases in ARM, they all end the TB at that
point).
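
A rough sketch of that timing, as a single-threaded toy model rather
than the real implementation (ToyVCPU and every toy_* name here are
invented for illustration, and the queue holds only one item): queuing
work sets an exit request, the request is only checked on entry to the
next TB, and the queued function runs once the vCPU is back outside the
execution loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyVCPU ToyVCPU;
typedef void (*toy_work_fn)(ToyVCPU *cpu);

/* Toy vCPU; hypothetical, not QEMU's CPUState. */
struct ToyVCPU {
    toy_work_fn pending;   /* at most one queued work item in this toy */
    bool exit_request;     /* set by the "kick" */
    int insns_executed;
    int resets;
};

/* Queue work for the vCPU and kick it; the work does not run here. */
static void toy_async_run_on_cpu(ToyVCPU *cpu, toy_work_fn fn)
{
    assert(cpu != NULL && fn != NULL);
    cpu->pending = fn;
    cpu->exit_request = true;
}

/* Example work item: count a reset. */
static void toy_reset(ToyVCPU *cpu)
{
    cpu->resets++;
}

/*
 * Run one TB. The exit request is only looked at on entry, so a TB that
 * has already started always runs to completion. Returns false if the
 * vCPU left the run loop instead of executing.
 */
static bool toy_exec_tb(ToyVCPU *cpu, int insns_in_tb)
{
    if (cpu->exit_request) {
        cpu->exit_request = false;
        return false;
    }
    cpu->insns_executed += insns_in_tb;
    return true;
}

/* Outside the run loop: run the queued work, if any. */
static void toy_process_queued_work(ToyVCPU *cpu)
{
    if (cpu->pending != NULL) {
        toy_work_fn fn = cpu->pending;
        cpu->pending = NULL;
        fn(cpu);
    }
}
```

In this model the reset only takes effect after toy_exec_tb() has
refused to start the next TB, so anything the current TB still had to
execute counts as already done from the requester's side.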


>
> Thanks,
> Alistair
>
>>
>> Paolo
>>


--
Alex Bennée


* Re: [Qemu-devel] MTTCG External Halt
  2018-01-31 20:32           ` Alex Bennée
@ 2018-01-31 22:31             ` Alistair Francis
  2018-02-01 12:01               ` Alex Bennée
  0 siblings, 1 reply; 25+ messages in thread
From: Alistair Francis @ 2018-01-31 22:31 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Alistair Francis, Paolo Bonzini, qemu-devel@nongnu.org Developers

On Wed, Jan 31, 2018 at 12:32 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Alistair Francis <alistair.francis@xilinx.com> writes:
>
>> On Tue, Jan 30, 2018 at 8:26 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> On 30/01/2018 18:56, Alistair Francis wrote:
>>>>
>>>> I don't have a good solution though, as setting CPU_INTERRUPT_RESET
>>>> doesn't help (that isn't handled while we are halted) and
>>>> async_run_on_cpu()/run_on_cpu() doesn't reliably reset the CPU when we
>>>> want.
>>>>
>>>> I've ever tried pausing all CPUs before reseting the CPU and them
>>>> resuming them all but that doesn't seem to to work either.
>>>
>>> async_safe_run_on_cpu would be like async_run_on_cpu, except that it
>>> takes care of stopping all other CPUs while the function runs.
>>>
>>>> Is there
>>>> anything I'm missing? Is there no reliable way to reset a CPU?
>>>
>>> What do you mean by reliable?  Executing no instruction after the one
>>> you were at?
>>
>> The reset is called by a GPIO line, so I need the reset to be called
>> basically as quickly as the GPIO line changes. The async_ and
>> async_safe_ functions seem to not run quickly enough, even if I run a
>> process_work_queue() function afterwards.
>>
>> Is there a way to kick the CPU to act on the async_*?
>
> Define quickly enough? The async_(safe) functions kick the vCPUs so they
> will all exit the run loop as they enter the next TB (even if they loop
> to themselves).

We have a special power controller CPU that wakes all the CPUs up and
at boot the async_* functions don't wake the CPUs up. If I just use
the cpu_reset() function directly everything starts fine (but then I
hit issues later).

If I forcefully run process_queued_cpu_work() then I can get the CPUs
up, but I don't think that is the right solution.

>
> From an external vCPUs point of view those extra instructions have
> already executed. If the resetting vCPU needs them to have reset by the
> time it executes it's next instruction it should either cpu_loop_exit at
> that point or ensure it is the last instruction in it's TB (which is
> what we do for the MMU flush cases in ARM, they all end the TB at that
> point).

cpu_loop_exit() sounds like it would help, but as I'm not in the CPU
context it just seg faults.

Alistair

>
>
>>
>> Thanks,
>> Alistair
>>
>>>
>>> Paolo
>>>
>
>
> --
> Alex Bennée
>


* Re: [Qemu-devel] MTTCG External Halt
  2018-01-31 22:31             ` Alistair Francis
@ 2018-02-01 12:01               ` Alex Bennée
  2018-02-01 17:13                 ` Alistair Francis
  0 siblings, 1 reply; 25+ messages in thread
From: Alex Bennée @ 2018-02-01 12:01 UTC (permalink / raw)
  To: Alistair Francis; +Cc: Paolo Bonzini, qemu-devel@nongnu.org Developers


Alistair Francis <alistair.francis@xilinx.com> writes:

> On Wed, Jan 31, 2018 at 12:32 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>> Alistair Francis <alistair.francis@xilinx.com> writes:
>>
>>> On Tue, Jan 30, 2018 at 8:26 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>> On 30/01/2018 18:56, Alistair Francis wrote:
>>>>>
>>>>> I don't have a good solution though, as setting CPU_INTERRUPT_RESET
>>>>> doesn't help (that isn't handled while we are halted) and
>>>>> async_run_on_cpu()/run_on_cpu() doesn't reliably reset the CPU when we
>>>>> want.
>>>>>
>>>>> I've ever tried pausing all CPUs before reseting the CPU and them
>>>>> resuming them all but that doesn't seem to to work either.
>>>>
>>>> async_safe_run_on_cpu would be like async_run_on_cpu, except that it
>>>> takes care of stopping all other CPUs while the function runs.
>>>>
>>>>> Is there
>>>>> anything I'm missing? Is there no reliable way to reset a CPU?
>>>>
>>>> What do you mean by reliable?  Executing no instruction after the one
>>>> you were at?
>>>
>>> The reset is called by a GPIO line, so I need the reset to be called
>>> basically as quickly as the GPIO line changes. The async_ and
>>> async_safe_ functions seem to not run quickly enough, even if I run a
>>> process_work_queue() function afterwards.
>>>
>>> Is there a way to kick the CPU to act on the async_*?
>>
>> Define quickly enough? The async_(safe) functions kick the vCPUs so they
>> will all exit the run loop as they enter the next TB (even if they loop
>> to themselves).
>
> We have a special power controller CPU that wakes all the CPUs up and
> at boot the async_* functions don't wake the CPUs up. If I just use
> the cpu_rest() function directly everything starts fine (but then I
> hit issues later).
>
> If I forcefully run process_queued_cpu_work() then I can get the CPUs
> up, but I don't think that is the right solution.
>
>>
>> From an external vCPUs point of view those extra instructions have
>> already executed. If the resetting vCPU needs them to have reset by the
>> time it executes it's next instruction it should either cpu_loop_exit at
>> that point or ensure it is the last instruction in it's TB (which is
>> what we do for the MMU flush cases in ARM, they all end the TB at that
>> point).
>
> cpu_loop_exit() sounds like it would help, but as I'm not in the CPU
> context it just seg faults.

What context are you in? gdb-stub does have to do something like this.

>
> Alistair
>
>>
>>
>>>
>>> Thanks,
>>> Alistair
>>>
>>>>
>>>> Paolo
>>>>
>>
>>
>> --
>> Alex Bennée
>>


--
Alex Bennée


* Re: [Qemu-devel] MTTCG External Halt
  2018-02-01 12:01               ` Alex Bennée
@ 2018-02-01 17:13                 ` Alistair Francis
  2018-02-01 21:00                   ` Alistair Francis
  0 siblings, 1 reply; 25+ messages in thread
From: Alistair Francis @ 2018-02-01 17:13 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Alistair Francis, Paolo Bonzini, qemu-devel@nongnu.org Developers

On Thu, Feb 1, 2018 at 4:01 AM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Alistair Francis <alistair.francis@xilinx.com> writes:
>
>> On Wed, Jan 31, 2018 at 12:32 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>
>>> Alistair Francis <alistair.francis@xilinx.com> writes:
>>>
>>>> On Tue, Jan 30, 2018 at 8:26 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>> On 30/01/2018 18:56, Alistair Francis wrote:
>>>>>>
>>>>>> I don't have a good solution though, as setting CPU_INTERRUPT_RESET
>>>>>> doesn't help (that isn't handled while we are halted) and
>>>>>> async_run_on_cpu()/run_on_cpu() doesn't reliably reset the CPU when we
>>>>>> want.
>>>>>>
>>>>>> I've ever tried pausing all CPUs before reseting the CPU and them
>>>>>> resuming them all but that doesn't seem to to work either.
>>>>>
>>>>> async_safe_run_on_cpu would be like async_run_on_cpu, except that it
>>>>> takes care of stopping all other CPUs while the function runs.
>>>>>
>>>>>> Is there
>>>>>> anything I'm missing? Is there no reliable way to reset a CPU?
>>>>>
>>>>> What do you mean by reliable?  Executing no instruction after the one
>>>>> you were at?
>>>>
>>>> The reset is called by a GPIO line, so I need the reset to be called
>>>> basically as quickly as the GPIO line changes. The async_ and
>>>> async_safe_ functions seem to not run quickly enough, even if I run a
>>>> process_work_queue() function afterwards.
>>>>
>>>> Is there a way to kick the CPU to act on the async_*?
>>>
>>> Define quickly enough? The async_(safe) functions kick the vCPUs so they
>>> will all exit the run loop as they enter the next TB (even if they loop
>>> to themselves).
>>
>> We have a special power controller CPU that wakes all the CPUs up and
>> at boot the async_* functions don't wake the CPUs up. If I just use
>> the cpu_rest() function directly everything starts fine (but then I
>> hit issues later).
>>
>> If I forcefully run process_queued_cpu_work() then I can get the CPUs
>> up, but I don't think that is the right solution.
>>
>>>
>>> From an external vCPUs point of view those extra instructions have
>>> already executed. If the resetting vCPU needs them to have reset by the
>>> time it executes it's next instruction it should either cpu_loop_exit at
>>> that point or ensure it is the last instruction in it's TB (which is
>>> what we do for the MMU flush cases in ARM, they all end the TB at that
>>> point).
>>
>> cpu_loop_exit() sounds like it would help, but as I'm not in the CPU
>> context it just seg faults.
>
> What context are you in? gdb-stub does have to something like this.

gdb-stub just seems to use vm_stop() and vm_start().

That fixes all hangs/asserts, but now Linux only brings up 1 CPU (instead of 4).

Alistair


* Re: [Qemu-devel] MTTCG External Halt
  2018-02-01 17:13                 ` Alistair Francis
@ 2018-02-01 21:00                   ` Alistair Francis
  2018-02-02 20:37                     ` Alex Bennée
  0 siblings, 1 reply; 25+ messages in thread
From: Alistair Francis @ 2018-02-01 21:00 UTC (permalink / raw)
  To: Alistair Francis
  Cc: Alex Bennée, Paolo Bonzini, qemu-devel@nongnu.org Developers

On Thu, Feb 1, 2018 at 9:13 AM, Alistair Francis
<alistair.francis@xilinx.com> wrote:
> On Thu, Feb 1, 2018 at 4:01 AM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>> Alistair Francis <alistair.francis@xilinx.com> writes:
>>
>>> On Wed, Jan 31, 2018 at 12:32 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>>
>>>> Alistair Francis <alistair.francis@xilinx.com> writes:
>>>>
>>>>> On Tue, Jan 30, 2018 at 8:26 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>>> On 30/01/2018 18:56, Alistair Francis wrote:
>>>>>>>
>>>>>>> I don't have a good solution though, as setting CPU_INTERRUPT_RESET
>>>>>>> doesn't help (that isn't handled while we are halted) and
>>>>>>> async_run_on_cpu()/run_on_cpu() doesn't reliably reset the CPU when we
>>>>>>> want.
>>>>>>>
>>>>>>> I've ever tried pausing all CPUs before reseting the CPU and them
>>>>>>> resuming them all but that doesn't seem to to work either.
>>>>>>
>>>>>> async_safe_run_on_cpu would be like async_run_on_cpu, except that it
>>>>>> takes care of stopping all other CPUs while the function runs.
>>>>>>
>>>>>>> Is there
>>>>>>> anything I'm missing? Is there no reliable way to reset a CPU?
>>>>>>
>>>>>> What do you mean by reliable?  Executing no instruction after the one
>>>>>> you were at?
>>>>>
>>>>> The reset is called by a GPIO line, so I need the reset to be called
>>>>> basically as quickly as the GPIO line changes. The async_ and
>>>>> async_safe_ functions seem to not run quickly enough, even if I run a
>>>>> process_work_queue() function afterwards.
>>>>>
>>>>> Is there a way to kick the CPU to act on the async_*?
>>>>
>>>> Define quickly enough? The async_(safe) functions kick the vCPUs so they
>>>> will all exit the run loop as they enter the next TB (even if they loop
>>>> to themselves).
>>>
>>> We have a special power controller CPU that wakes all the CPUs up and
>>> at boot the async_* functions don't wake the CPUs up. If I just use
>>> the cpu_rest() function directly everything starts fine (but then I
>>> hit issues later).
>>>
>>> If I forcefully run process_queued_cpu_work() then I can get the CPUs
>>> up, but I don't think that is the right solution.
>>>
>>>>
>>>> From an external vCPUs point of view those extra instructions have
>>>> already executed. If the resetting vCPU needs them to have reset by the
>>>> time it executes it's next instruction it should either cpu_loop_exit at
>>>> that point or ensure it is the last instruction in it's TB (which is
>>>> what we do for the MMU flush cases in ARM, they all end the TB at that
>>>> point).
>>>
>>> cpu_loop_exit() sounds like it would help, but as I'm not in the CPU
>>> context it just seg faults.
>>
>> What context are you in? gdb-stub does have to something like this.
>
> gdb-stub just seems to use vm_stop() and vm_start().
>
> That fixes all hangs/asserts, but now Linux only brings up 1 CPU (instead of 4).

Hmmm... Interesting. If I do this on reset events:

        pause_all_vcpus();
        cpu_reset(cpu);
        resume_all_vcpus();

it hangs, while if I do this

        if (runstate_is_running()) {
            vm_stop(RUN_STATE_PAUSED);
        }
        cpu_reset(cpu);
        if (!runstate_needs_reset()) {
            vm_start();
        }

it doesn't hang but CPU bringup doesn't work.

Alistair

>
> Alistair


* Re: [Qemu-devel] MTTCG External Halt
  2018-02-01 21:00                   ` Alistair Francis
@ 2018-02-02 20:37                     ` Alex Bennée
  2018-02-02 21:49                       ` Alistair Francis
  0 siblings, 1 reply; 25+ messages in thread
From: Alex Bennée @ 2018-02-02 20:37 UTC (permalink / raw)
  To: Alistair Francis; +Cc: Paolo Bonzini, qemu-devel@nongnu.org Developers


Alistair Francis <alistair.francis@xilinx.com> writes:

> On Thu, Feb 1, 2018 at 9:13 AM, Alistair Francis
> <alistair.francis@xilinx.com> wrote:
>> On Thu, Feb 1, 2018 at 4:01 AM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>
>>> Alistair Francis <alistair.francis@xilinx.com> writes:
>>>
>>>> On Wed, Jan 31, 2018 at 12:32 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>>>
>>>>> Alistair Francis <alistair.francis@xilinx.com> writes:
>>>>>
>>>>>> On Tue, Jan 30, 2018 at 8:26 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>>>> On 30/01/2018 18:56, Alistair Francis wrote:
>>>>>>>>
>>>>>>>> I don't have a good solution though, as setting CPU_INTERRUPT_RESET
>>>>>>>> doesn't help (that isn't handled while we are halted) and
>>>>>>>> async_run_on_cpu()/run_on_cpu() doesn't reliably reset the CPU when we
>>>>>>>> want.
>>>>>>>>
>>>>>>>> I've ever tried pausing all CPUs before reseting the CPU and them
>>>>>>>> resuming them all but that doesn't seem to to work either.
>>>>>>>
>>>>>>> async_safe_run_on_cpu would be like async_run_on_cpu, except that it
>>>>>>> takes care of stopping all other CPUs while the function runs.
>>>>>>>
>>>>>>>> Is there
>>>>>>>> anything I'm missing? Is there no reliable way to reset a CPU?
>>>>>>>
>>>>>>> What do you mean by reliable?  Executing no instruction after the one
>>>>>>> you were at?
>>>>>>
>>>>>> The reset is called by a GPIO line, so I need the reset to be called
>>>>>> basically as quickly as the GPIO line changes. The async_ and
>>>>>> async_safe_ functions seem to not run quickly enough, even if I run a
>>>>>> process_work_queue() function afterwards.
>>>>>>
>>>>>> Is there a way to kick the CPU to act on the async_*?
>>>>>
>>>>> Define quickly enough? The async_(safe) functions kick the vCPUs so they
>>>>> will all exit the run loop as they enter the next TB (even if they loop
>>>>> to themselves).
>>>>
>>>> We have a special power controller CPU that wakes all the CPUs up and
>>>> at boot the async_* functions don't wake the CPUs up. If I just use
>>>> the cpu_rest() function directly everything starts fine (but then I
>>>> hit issues later).
>>>>
>>>> If I forcefully run process_queued_cpu_work() then I can get the CPUs
>>>> up, but I don't think that is the right solution.
>>>>
>>>>>
>>>>> From an external vCPUs point of view those extra instructions have
>>>>> already executed. If the resetting vCPU needs them to have reset by the
>>>>> time it executes it's next instruction it should either cpu_loop_exit at
>>>>> that point or ensure it is the last instruction in it's TB (which is
>>>>> what we do for the MMU flush cases in ARM, they all end the TB at that
>>>>> point).
>>>>
>>>> cpu_loop_exit() sounds like it would help, but as I'm not in the CPU
>>>> context it just seg faults.
>>>
>>> What context are you in? gdb-stub does have to something like this.
>>
>> gdb-stub just seems to use vm_stop() and vm_start().
>>
>> That fixes all hangs/asserts, but now Linux only brings up 1 CPU (instead of 4).
>
> Hmmm... Interesting if I do this on reset events:
>
>         pause_all_vcpus();
>         cpu_reset(cpu);
>         resume_all_vcpus();
>
> it hangs, while if I do this
>
>         if (runstate_is_running()) {
>             vm_stop(RUN_STATE_PAUSED);
>         }
>         cpu_reset(cpu);
>         if (!runstate_needs_reset()) {
>             vm_start();
>         }
>
> it doesn't hang but CPU bringup doesn't work.

Hmm, I'm still confused about what context you are in. Is this an
externally triggered reset via the (qemu) prompt or something?

>
> Alistair
>
>>
>> Alistair


--
Alex Bennée


* Re: [Qemu-devel] MTTCG External Halt
  2018-02-02 20:37                     ` Alex Bennée
@ 2018-02-02 21:49                       ` Alistair Francis
  2018-02-02 21:59                         ` Alistair Francis
  0 siblings, 1 reply; 25+ messages in thread
From: Alistair Francis @ 2018-02-02 21:49 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Alistair Francis, Paolo Bonzini, qemu-devel@nongnu.org Developers

On Fri, Feb 2, 2018 at 12:37 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Alistair Francis <alistair.francis@xilinx.com> writes:
>
>> On Thu, Feb 1, 2018 at 9:13 AM, Alistair Francis
>> <alistair.francis@xilinx.com> wrote:
>>> On Thu, Feb 1, 2018 at 4:01 AM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>>
>>>> Alistair Francis <alistair.francis@xilinx.com> writes:
>>>>
>>>>> On Wed, Jan 31, 2018 at 12:32 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>>>>
>>>>>> Alistair Francis <alistair.francis@xilinx.com> writes:
>>>>>>
>>>>>>> On Tue, Jan 30, 2018 at 8:26 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>>>>> On 30/01/2018 18:56, Alistair Francis wrote:
>>>>>>>>>
>>>>>>>>> I don't have a good solution though, as setting CPU_INTERRUPT_RESET
>>>>>>>>> doesn't help (that isn't handled while we are halted) and
>>>>>>>>> async_run_on_cpu()/run_on_cpu() doesn't reliably reset the CPU when we
>>>>>>>>> want.
>>>>>>>>>
>>>>>>>>> I've ever tried pausing all CPUs before reseting the CPU and them
>>>>>>>>> resuming them all but that doesn't seem to to work either.
>>>>>>>>
>>>>>>>> async_safe_run_on_cpu would be like async_run_on_cpu, except that it
>>>>>>>> takes care of stopping all other CPUs while the function runs.
>>>>>>>>
>>>>>>>>> Is there
>>>>>>>>> anything I'm missing? Is there no reliable way to reset a CPU?
>>>>>>>>
>>>>>>>> What do you mean by reliable?  Executing no instruction after the one
>>>>>>>> you were at?
>>>>>>>
>>>>>>> The reset is called by a GPIO line, so I need the reset to be called
>>>>>>> basically as quickly as the GPIO line changes. The async_ and
>>>>>>> async_safe_ functions seem to not run quickly enough, even if I run a
>>>>>>> process_work_queue() function afterwards.
>>>>>>>
>>>>>>> Is there a way to kick the CPU to act on the async_*?
>>>>>>
>>>>>> Define quickly enough? The async_(safe) functions kick the vCPUs so they
>>>>>> will all exit the run loop as they enter the next TB (even if they loop
>>>>>> to themselves).
>>>>>
>>>>> We have a special power controller CPU that wakes all the CPUs up and
>>>>> at boot the async_* functions don't wake the CPUs up. If I just use
>>>>> the cpu_rest() function directly everything starts fine (but then I
>>>>> hit issues later).
>>>>>
>>>>> If I forcefully run process_queued_cpu_work() then I can get the CPUs
>>>>> up, but I don't think that is the right solution.
>>>>>
>>>>>>
>>>>>> From an external vCPUs point of view those extra instructions have
>>>>>> already executed. If the resetting vCPU needs them to have reset by the
>>>>>> time it executes it's next instruction it should either cpu_loop_exit at
>>>>>> that point or ensure it is the last instruction in it's TB (which is
>>>>>> what we do for the MMU flush cases in ARM, they all end the TB at that
>>>>>> point).
>>>>>
>>>>> cpu_loop_exit() sounds like it would help, but as I'm not in the CPU
>>>>> context it just seg faults.
>>>>
>>>> What context are you in? gdb-stub does have to something like this.
>>>
>>> gdb-stub just seems to use vm_stop() and vm_start().
>>>
>>> That fixes all hangs/asserts, but now Linux only brings up 1 CPU (instead of 4).
>>
>> Hmmm... Interesting if I do this on reset events:
>>
>>         pause_all_vcpus();
>>         cpu_reset(cpu);
>>         resume_all_vcpus();
>>
>> it hangs, while if I do this
>>
>>         if (runstate_is_running()) {
>>             vm_stop(RUN_STATE_PAUSED);
>>         }
>>         cpu_reset(cpu);
>>         if (!runstate_needs_reset()) {
>>             vm_start();
>>         }
>>
>> it doesn't hang but CPU bringup doesn't work.
>
> Hmm I'm still confused what context you are in. Is this an externally
> triggered reset via the (qemu) prompt or something?

This gets called from a variety of places. But most likely it's called
from a second QEMU process that is triggering an interrupt through a
device.

Alistair

>
>>
>> Alistair
>>
>>>
>>> Alistair
>
>
> --
> Alex Bennée
>


* Re: [Qemu-devel] MTTCG External Halt
  2018-02-02 21:49                       ` Alistair Francis
@ 2018-02-02 21:59                         ` Alistair Francis
  2018-04-22 23:03                           ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 25+ messages in thread
From: Alistair Francis @ 2018-02-02 21:59 UTC (permalink / raw)
  To: Alistair Francis
  Cc: Alex Bennée, Paolo Bonzini, qemu-devel@nongnu.org Developers

On Fri, Feb 2, 2018 at 1:49 PM, Alistair Francis
<alistair.francis@xilinx.com> wrote:
> On Fri, Feb 2, 2018 at 12:37 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>> Alistair Francis <alistair.francis@xilinx.com> writes:
>>
>>> On Thu, Feb 1, 2018 at 9:13 AM, Alistair Francis
>>> <alistair.francis@xilinx.com> wrote:
>>>> On Thu, Feb 1, 2018 at 4:01 AM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>>>
>>>>> Alistair Francis <alistair.francis@xilinx.com> writes:
>>>>>
>>>>>> On Wed, Jan 31, 2018 at 12:32 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>>>>>
>>>>>>> Alistair Francis <alistair.francis@xilinx.com> writes:
>>>>>>>
>>>>>>>> On Tue, Jan 30, 2018 at 8:26 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>>>>>> On 30/01/2018 18:56, Alistair Francis wrote:
>>>>>>>>>>
>>>>>>>>>> I don't have a good solution though, as setting CPU_INTERRUPT_RESET
>>>>>>>>>> doesn't help (that isn't handled while we are halted) and
>>>>>>>>>> async_run_on_cpu()/run_on_cpu() doesn't reliably reset the CPU when we
>>>>>>>>>> want.
>>>>>>>>>>
>>>>>>>>>> I've ever tried pausing all CPUs before reseting the CPU and them
>>>>>>>>>> resuming them all but that doesn't seem to to work either.
>>>>>>>>>
>>>>>>>>> async_safe_run_on_cpu would be like async_run_on_cpu, except that it
>>>>>>>>> takes care of stopping all other CPUs while the function runs.
>>>>>>>>>
>>>>>>>>>> Is there
>>>>>>>>>> anything I'm missing? Is there no reliable way to reset a CPU?
>>>>>>>>>
>>>>>>>>> What do you mean by reliable?  Executing no instruction after the one
>>>>>>>>> you were at?
>>>>>>>>
>>>>>>>> The reset is called by a GPIO line, so I need the reset to be called
>>>>>>>> basically as quickly as the GPIO line changes. The async_ and
>>>>>>>> async_safe_ functions seem to not run quickly enough, even if I run a
>>>>>>>> process_work_queue() function afterwards.
>>>>>>>>
>>>>>>>> Is there a way to kick the CPU to act on the async_*?
>>>>>>>
>>>>>>> Define quickly enough? The async_(safe) functions kick the vCPUs so they
>>>>>>> will all exit the run loop as they enter the next TB (even if they loop
>>>>>>> to themselves).
>>>>>>
>>>>>> We have a special power controller CPU that wakes all the CPUs up and
>>>>>> at boot the async_* functions don't wake the CPUs up. If I just use
>>>>>> the cpu_reset() function directly everything starts fine (but then I
>>>>>> hit issues later).
>>>>>>
>>>>>> If I forcefully run process_queued_cpu_work() then I can get the CPUs
>>>>>> up, but I don't think that is the right solution.
>>>>>>
>>>>>>>
>>>>>>> From an external vCPU's point of view those extra instructions have
>>>>>>> already executed. If the resetting vCPU needs them to have reset by the
>>>>>>> time it executes its next instruction it should either cpu_loop_exit at
>>>>>>> that point or ensure it is the last instruction in its TB (which is
>>>>>>> what we do for the MMU flush cases in ARM, they all end the TB at that
>>>>>>> point).
>>>>>>
>>>>>> cpu_loop_exit() sounds like it would help, but as I'm not in the CPU
>>>>>> context it just seg faults.
>>>>>
>>>>> What context are you in? gdb-stub does have to do something like this.
>>>>
>>>> gdb-stub just seems to use vm_stop() and vm_start().
>>>>
>>>> That fixes all hangs/asserts, but now Linux only brings up 1 CPU (instead of 4).
>>>
>>> Hmmm... Interesting if I do this on reset events:
>>>
>>>         pause_all_vcpus();
>>>         cpu_reset(cpu);
>>>         resume_all_vcpus();
>>>
>>> it hangs, while if I do this
>>>
>>>         if (runstate_is_running()) {
>>>             vm_stop(RUN_STATE_PAUSED);
>>>         }
>>>         cpu_reset(cpu);
>>>         if (!runstate_needs_reset()) {
>>>             vm_start();
>>>         }
>>>
>>> it doesn't hang but CPU bringup doesn't work.
>>
>> Hmm I'm still confused what context you are in. Is this an externally
>> triggered reset via the (qemu) prompt or something?
>
> This gets called from a variety of places. But most likely it's called
> from a second QEMU process that is triggering an interrupt through a
> device.

Something like this:

#0  0x0000555555807350 in cpu_reset_gpio (opaque=0x555557272100,
irq=0, level=0) at /scratch/alistai/master-qemu/exec.c:3853
#1  0x0000555555a20336 in dep_register_refresh_gpios
(reg=reg@entry=0x555556fa5ad0, old_value=old_value@entry=2147496974)
    at hw/core/register-dep.c:246
#2  0x0000555555a2067b in dep_register_write (reg=0x555556fa5ad0,
val=<optimized out>, we=<optimized out>)
    at hw/core/register-dep.c:142
#3  0x0000555555841ae8 in memory_region_write_accessor
(mr=0x555556fa5b80, addr=0, value=<optimized out>, size=4,
shift=<optimized out>, mask=<optimized out>, attrs=...) at
/scratch/alistai/master-qemu/memory.c:617
#4  0x000055555583e57d in access_with_adjusted_size
(addr=addr@entry=0, value=value@entry=0x7fffffffd218,
size=size@entry=4, access_size_min=<optimized out>,
access_size_max=<optimized out>, access_fn=
    0x555555841a70 <memory_region_write_accessor>, mr=0x555556fa5b80,
attrs=...) at /scratch/alistai/master-qemu/memory.c:684
#5  0x0000555555843cda in memory_region_dispatch_write
(mr=0x555556fa5b80, addr=0, data=<optimized out>, size=4, attrs=...)
    at /scratch/alistai/master-qemu/memory.c:1789
#6  0x00005555557fbcb1 in flatview_write_continue (mr=0x555556fa5b80,
l=<optimized out>, addr1=<optimized out>, len=4, buf=0x7fff900047c0
"\f4", attrs=..., addr=4246339844, fv=0x5555574cdc10) at
/scratch/alistai/master-qemu/exec.c:3076
#7  0x00005555557fbcb1 in flatview_write (fv=0x5555574cdc10,
addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized
out>) at /scratch/alistai/master-qemu/exec.c:3145
#8  0x000055555586eb1b in dma_memory_rw_relaxed_attr (attr=...,
dir=DMA_DIRECTION_FROM_DEVICE, len=<optimized out>,
buf=0x7fff900047c0, addr=<optimized out>, as=<optimized out>) at
/scratch/alistai/master-qemu/include/sysemu/dma.h:96
#9  0x000055555586eb1b in dma_memory_rw_attr (attr=...,
dir=DMA_DIRECTION_FROM_DEVICE, len=<optimized out>,
buf=0x7fff900047c0, addr=<optimized out>, as=<optimized out>) at
/scratch/alistai/master-qemu/include/sysemu/dma.h:120
#10 0x000055555586eb1b in rp_cmd_rw (s=0x555556d0bb90,
pkt=0x7fff90004770, dir=DMA_DIRECTION_FROM_DEVICE)
    at /scratch/alistai/master-qemu/hw/core/remote-port-memory-slave.c:93
#11 0x000055555586db53 in rp_process (s=<optimized out>) at
/scratch/alistai/master-qemu/hw/core/remote-port.c:424
#12 0x000055555586db53 in rp_event_read (opaque=<optimized out>) at
/scratch/alistai/master-qemu/hw/core/remote-port.c:460
#13 0x0000555555c5de14 in aio_dispatch_handlers
(ctx=ctx@entry=0x555556cf7750) at util/aio-posix.c:406
#14 0x0000555555c5e6e8 in aio_dispatch (ctx=0x555556cf7750) at
util/aio-posix.c:437
#15 0x0000555555c5b6ae in aio_ctx_dispatch (source=<optimized out>,
callback=<optimized out>, user_data=<optimized out>)
    at util/async.c:261
#16 0x00007ffff27a4fb7 in g_main_context_dispatch () at
/lib/x86_64-linux-gnu/libglib-2.0.so.0
#17 0x0000555555c5d937 in glib_pollfds_poll () at util/main-loop.c:215
#18 0x0000555555c5d937 in os_host_main_loop_wait (timeout=<optimized
out>) at util/main-loop.c:262
#19 0x0000555555c5d937 in main_loop_wait (nonblocking=<optimized out>)
at util/main-loop.c:516
#20 0x00005555557f4c76 in main_loop () at vl.c:2002
#21 0x00005555557f4c76 in main (argc=<optimized out>, argv=<optimized
out>, envp=<optimized out>) at vl.c:4949


Alistair
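
The backtrace above shows cpu_reset_gpio() being reached from
main_loop_wait(), i.e. on the I/O main-loop thread rather than on a vCPU
thread. As a rough illustration of the deferral pattern discussed in this
thread (hand the reset to the vCPU's own thread, wake it if it is halted,
and wait for it to finish), here is a toy pthreads model. Every name in it
(ToyCPU, toy_run_on_cpu, toy_do_reset, ...) is hypothetical and it does not
use or reproduce any QEMU code; it is a sketch of the idea only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a vCPU. None of these names exist in QEMU. */
typedef struct ToyCPU ToyCPU;
typedef void (*toy_work_fn)(ToyCPU *cpu);

struct ToyCPU {
    pthread_mutex_t lock;
    pthread_cond_t cond;        /* signalled on new work, completion, quit */
    toy_work_fn pending;        /* single-slot work queue, NULL when empty */
    unsigned long submitted;    /* tickets handed out to requesters */
    unsigned long completed;    /* work items the vCPU thread has run */
    bool halted;                /* only ever written by the vCPU thread */
    bool quit;
    int pc;                     /* pretend architectural state */
    int resets;
};

void toy_cpu_init(ToyCPU *cpu)
{
    pthread_mutex_init(&cpu->lock, NULL);
    pthread_cond_init(&cpu->cond, NULL);
    cpu->pending = NULL;
    cpu->submitted = cpu->completed = 0;
    cpu->halted = true;         /* held off until something resets it */
    cpu->quit = false;
    cpu->pc = 0;
    cpu->resets = 0;
}

/* Runs on the vCPU thread only, with cpu->lock held. */
static void toy_do_reset(ToyCPU *cpu)
{
    cpu->pc = 0;
    cpu->halted = false;
    cpu->resets++;
}

/* The vCPU thread: drain queued work first, then sleep if halted, else run. */
void *toy_cpu_thread(void *opaque)
{
    ToyCPU *cpu = opaque;

    pthread_mutex_lock(&cpu->lock);
    while (!cpu->quit) {
        if (cpu->pending) {
            toy_work_fn fn = cpu->pending;
            cpu->pending = NULL;
            fn(cpu);
            cpu->completed++;
            pthread_cond_broadcast(&cpu->cond);   /* wake the requester */
        } else if (cpu->halted) {
            /* Queued work also signals cond, so a halted CPU still sees it. */
            pthread_cond_wait(&cpu->cond, &cpu->lock);
        } else {
            if (++cpu->pc == 100) {
                cpu->halted = true;               /* pretend the guest hit WFI */
            }
            pthread_mutex_unlock(&cpu->lock);     /* boundary between blocks */
            pthread_mutex_lock(&cpu->lock);
        }
    }
    pthread_mutex_unlock(&cpu->lock);
    return NULL;
}

/* Called from a non-vCPU thread; returns once fn has run on the vCPU thread. */
void toy_run_on_cpu(ToyCPU *cpu, toy_work_fn fn)
{
    unsigned long ticket;

    assert(fn != NULL);
    pthread_mutex_lock(&cpu->lock);
    while (cpu->pending) {                        /* wait for the free slot */
        pthread_cond_wait(&cpu->cond, &cpu->lock);
    }
    cpu->pending = fn;
    ticket = ++cpu->submitted;
    pthread_cond_broadcast(&cpu->cond);           /* the "kick" */
    while (cpu->completed < ticket) {
        pthread_cond_wait(&cpu->cond, &cpu->lock);
    }
    pthread_mutex_unlock(&cpu->lock);
}

void toy_cpu_stop(ToyCPU *cpu, pthread_t tid)
{
    pthread_mutex_lock(&cpu->lock);
    cpu->quit = true;
    pthread_cond_broadcast(&cpu->cond);
    pthread_mutex_unlock(&cpu->lock);
    pthread_join(tid, NULL);
}
```

In this model the caller would start the thread with pthread_create(),
request resets with toy_run_on_cpu(&cpu, toy_do_reset), and finish with
toy_cpu_stop(). The halted flag is never written from outside the vCPU
thread, and queued work is checked before the halted state, so a sleeping
CPU still acts on a reset request.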

>
> Alistair
>
>>
>>>
>>> Alistair
>>>
>>>>
>>>> Alistair
>>
>>
>> --
>> Alex Bennée
>>


* Re: [Qemu-devel] MTTCG External Halt
  2018-02-02 21:59                         ` Alistair Francis
@ 2018-04-22 23:03                           ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 25+ messages in thread
From: Philippe Mathieu-Daudé @ 2018-04-22 23:03 UTC (permalink / raw)
  To: Alistair Francis, Stefan Hajnoczi
  Cc: Paolo Bonzini, Alex Bennée, qemu-devel@nongnu.org Developers

> On Fri, Feb 2, 2018 at 1:49 PM, Alistair Francis
> <alistair.francis@xilinx.com> wrote:
>> On Fri, Feb 2, 2018 at 12:37 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>
>>> Alistair Francis <alistair.francis@xilinx.com> writes:
>>>
>>>> On Thu, Feb 1, 2018 at 9:13 AM, Alistair Francis
>>>> <alistair.francis@xilinx.com> wrote:
>>>>> On Thu, Feb 1, 2018 at 4:01 AM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>>>>
>>>>>> Alistair Francis <alistair.francis@xilinx.com> writes:
>>>>>>
>>>>>>> On Wed, Jan 31, 2018 at 12:32 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>>>>>>
>>>>>>>> Alistair Francis <alistair.francis@xilinx.com> writes:
>>>>>>>>
>>>>>>>>> On Tue, Jan 30, 2018 at 8:26 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>>>>>>> On 30/01/2018 18:56, Alistair Francis wrote:
>>>>>>>>>>>
>>>>>>>>>>> I don't have a good solution though, as setting CPU_INTERRUPT_RESET
>>>>>>>>>>> doesn't help (that isn't handled while we are halted) and
>>>>>>>>>>> async_run_on_cpu()/run_on_cpu() doesn't reliably reset the CPU when we
>>>>>>>>>>> want.
>>>>>>>>>>>
>>>>>>>>>>> I've even tried pausing all CPUs before resetting the CPU and then
>>>>>>>>>>> resuming them all but that doesn't seem to work either.
>>>>>>>>>>
>>>>>>>>>> async_safe_run_on_cpu would be like async_run_on_cpu, except that it
>>>>>>>>>> takes care of stopping all other CPUs while the function runs.
>>>>>>>>>>
>>>>>>>>>>> Is there
>>>>>>>>>>> anything I'm missing? Is there no reliable way to reset a CPU?
>>>>>>>>>>
>>>>>>>>>> What do you mean by reliable?  Executing no instruction after the one
>>>>>>>>>> you were at?
>>>>>>>>>
>>>>>>>>> The reset is called by a GPIO line, so I need the reset to be called
>>>>>>>>> basically as quickly as the GPIO line changes. The async_ and
>>>>>>>>> async_safe_ functions seem to not run quickly enough, even if I run a
>>>>>>>>> process_work_queue() function afterwards.
>>>>>>>>>
>>>>>>>>> Is there a way to kick the CPU to act on the async_*?
>>>>>>>>
>>>>>>>> Define quickly enough? The async_(safe) functions kick the vCPUs so they
>>>>>>>> will all exit the run loop as they enter the next TB (even if they loop
>>>>>>>> to themselves).
>>>>>>>
>>>>>>> We have a special power controller CPU that wakes all the CPUs up and
>>>>>>> at boot the async_* functions don't wake the CPUs up. If I just use
>>>>>>> the cpu_reset() function directly everything starts fine (but then I
>>>>>>> hit issues later).
>>>>>>>
>>>>>>> If I forcefully run process_queued_cpu_work() then I can get the CPUs
>>>>>>> up, but I don't think that is the right solution.
>>>>>>>
>>>>>>>>
>>>>>>>> From an external vCPU's point of view those extra instructions have
>>>>>>>> already executed. If the resetting vCPU needs them to have reset by the
>>>>>>>> time it executes its next instruction it should either cpu_loop_exit at
>>>>>>>> that point or ensure it is the last instruction in its TB (which is
>>>>>>>> what we do for the MMU flush cases in ARM, they all end the TB at that
>>>>>>>> point).
>>>>>>>
>>>>>>> cpu_loop_exit() sounds like it would help, but as I'm not in the CPU
>>>>>>> context it just seg faults.
>>>>>>
>>>>>> What context are you in? gdb-stub does have to do something like this.
>>>>>
>>>>> gdb-stub just seems to use vm_stop() and vm_start().
>>>>>
>>>>> That fixes all hangs/asserts, but now Linux only brings up 1 CPU (instead of 4).
>>>>
>>>> Hmmm... Interesting if I do this on reset events:
>>>>
>>>>         pause_all_vcpus();
>>>>         cpu_reset(cpu);
>>>>         resume_all_vcpus();
>>>>
>>>> it hangs, while if I do this
>>>>
>>>>         if (runstate_is_running()) {
>>>>             vm_stop(RUN_STATE_PAUSED);
>>>>         }
>>>>         cpu_reset(cpu);
>>>>         if (!runstate_needs_reset()) {
>>>>             vm_start();
>>>>         }
>>>>
>>>> it doesn't hang but CPU bringup doesn't work.
>>>
>>> Hmm I'm still confused what context you are in. Is this an externally
>>> triggered reset via the (qemu) prompt or something?
>>
>> This gets called from a variety of places. But most likely it's called
>> from a second QEMU process that is triggering an interrupt through a
>> device.
> 
> Something like this:
> 
> #0  0x0000555555807350 in cpu_reset_gpio (opaque=0x555557272100,
> irq=0, level=0) at /scratch/alistai/master-qemu/exec.c:3853
> #1  0x0000555555a20336 in dep_register_refresh_gpios
> (reg=reg@entry=0x555556fa5ad0, old_value=old_value@entry=2147496974)
>     at hw/core/register-dep.c:246
> #2  0x0000555555a2067b in dep_register_write (reg=0x555556fa5ad0,
> val=<optimized out>, we=<optimized out>)
>     at hw/core/register-dep.c:142
> #3  0x0000555555841ae8 in memory_region_write_accessor
> (mr=0x555556fa5b80, addr=0, value=<optimized out>, size=4,
> shift=<optimized out>, mask=<optimized out>, attrs=...) at
> /scratch/alistai/master-qemu/memory.c:617
> #4  0x000055555583e57d in access_with_adjusted_size
> (addr=addr@entry=0, value=value@entry=0x7fffffffd218,
> size=size@entry=4, access_size_min=<optimized out>,
> access_size_max=<optimized out>, access_fn=
>     0x555555841a70 <memory_region_write_accessor>, mr=0x555556fa5b80,
> attrs=...) at /scratch/alistai/master-qemu/memory.c:684
> #5  0x0000555555843cda in memory_region_dispatch_write
> (mr=0x555556fa5b80, addr=0, data=<optimized out>, size=4, attrs=...)
>     at /scratch/alistai/master-qemu/memory.c:1789
> #6  0x00005555557fbcb1 in flatview_write_continue (mr=0x555556fa5b80,
> l=<optimized out>, addr1=<optimized out>, len=4, buf=0x7fff900047c0
> "\f4", attrs=..., addr=4246339844, fv=0x5555574cdc10) at
> /scratch/alistai/master-qemu/exec.c:3076
> #7  0x00005555557fbcb1 in flatview_write (fv=0x5555574cdc10,
> addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized
> out>) at /scratch/alistai/master-qemu/exec.c:3145
> #8  0x000055555586eb1b in dma_memory_rw_relaxed_attr (attr=...,
> dir=DMA_DIRECTION_FROM_DEVICE, len=<optimized out>,
> buf=0x7fff900047c0, addr=<optimized out>, as=<optimized out>) at
> /scratch/alistai/master-qemu/include/sysemu/dma.h:96
> #9  0x000055555586eb1b in dma_memory_rw_attr (attr=...,
> dir=DMA_DIRECTION_FROM_DEVICE, len=<optimized out>,
> buf=0x7fff900047c0, addr=<optimized out>, as=<optimized out>) at
> /scratch/alistai/master-qemu/include/sysemu/dma.h:120

Cc'ing Stefan for this part:

> #10 0x000055555586eb1b in rp_cmd_rw (s=0x555556d0bb90,
> pkt=0x7fff90004770, dir=DMA_DIRECTION_FROM_DEVICE)
>     at /scratch/alistai/master-qemu/hw/core/remote-port-memory-slave.c:93
> #11 0x000055555586db53 in rp_process (s=<optimized out>) at
> /scratch/alistai/master-qemu/hw/core/remote-port.c:424
> #12 0x000055555586db53 in rp_event_read (opaque=<optimized out>) at
> /scratch/alistai/master-qemu/hw/core/remote-port.c:460
> #13 0x0000555555c5de14 in aio_dispatch_handlers
> (ctx=ctx@entry=0x555556cf7750) at util/aio-posix.c:406
> #14 0x0000555555c5e6e8 in aio_dispatch (ctx=0x555556cf7750) at
> util/aio-posix.c:437
> #15 0x0000555555c5b6ae in aio_ctx_dispatch (source=<optimized out>,
> callback=<optimized out>, user_data=<optimized out>)
>     at util/async.c:261
> #16 0x00007ffff27a4fb7 in g_main_context_dispatch () at
> /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #17 0x0000555555c5d937 in glib_pollfds_poll () at util/main-loop.c:215
> #18 0x0000555555c5d937 in os_host_main_loop_wait (timeout=<optimized
> out>) at util/main-loop.c:262
> #19 0x0000555555c5d937 in main_loop_wait (nonblocking=<optimized out>)
> at util/main-loop.c:516
> #20 0x00005555557f4c76 in main_loop () at vl.c:2002
> #21 0x00005555557f4c76 in main (argc=<optimized out>, argv=<optimized
> out>, envp=<optimized out>) at vl.c:4949



Thread overview: 25+ messages
2018-01-03 22:10 [Qemu-devel] MTTCG External Halt Alistair Francis
2018-01-03 22:14 ` Peter Maydell
2018-01-03 22:23   ` Alistair Francis
2018-01-04  1:14     ` Alistair Francis
2018-01-04 11:08 ` Alex Bennée
2018-01-06  2:23   ` Alistair Francis
2018-01-30 23:56     ` Alistair Francis
2018-01-31  4:26       ` Paolo Bonzini
2018-01-31 16:08         ` Alistair Francis
2018-01-31 20:32           ` Alex Bennée
2018-01-31 22:31             ` Alistair Francis
2018-02-01 12:01               ` Alex Bennée
2018-02-01 17:13                 ` Alistair Francis
2018-02-01 21:00                   ` Alistair Francis
2018-02-02 20:37                     ` Alex Bennée
2018-02-02 21:49                       ` Alistair Francis
2018-02-02 21:59                         ` Alistair Francis
2018-04-22 23:03                           ` Philippe Mathieu-Daudé
2018-01-31 17:13 ` Paolo Bonzini
2018-01-31 18:17   ` Alistair Francis
2018-01-31 18:48     ` Peter Maydell
2018-01-31 18:51       ` Alistair Francis
2018-01-31 18:56         ` Alistair Francis
2018-01-31 18:59         ` Peter Maydell
2018-01-31 19:37           ` Alistair Francis
