[Qemu-devel] [Bug 1647683] [NEW] Bad interaction between tb flushing & gdb stub

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [Bug 1647683] [NEW] Bad interaction between tb flushing & gdb stub
@ 2016-12-06 11:39 Julian Brown
  2016-12-06 11:46 ` [Qemu-devel] [Bug 1647683] " Julian Brown
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Julian Brown @ 2016-12-06 11:39 UTC (permalink / raw)
  To: qemu-devel

Public bug reported:

I have been working on a series of patches for ARM big-endian system
mode support, using QEMU as a bare-metal simulator for the GDB test
suite. At some point I realised that these tests were not running
reliably on the QEMU master branch, even without my patches applied.
(I.e., in little-endian mode.)

Running QEMU under GDB in the test harness via Valgrind, using something
akin to:

(gdb) target remote | valgrind --tool=memcheck qemu-arm-system [...]

leads to intermittent (and quite hard-to-reproduce) segfaults in QEMU of
the form:

==52333== Process terminating with default action of signal 11 (SIGSEGV)
==52333==  Access not within mapped region at address 0x24
==52333==    at 0x1D55F2: tb_page_remove (translate-all.c:1026)
==52333==    by 0x1D58B4: tb_phys_invalidate (translate-all.c:1119)
==52333==    by 0x1D63AA: tb_invalidate_phys_page_range (translate-all.c:1519)
==52333==    by 0x1D66D7: tb_invalidate_phys_addr (translate-all.c:1714)
==52333==    by 0x1CBA7F: breakpoint_invalidate (exec.c:704)
==52333==    by 0x1CC01F: cpu_breakpoint_remove_by_ref (exec.c:869)
==52333==    by 0x1CBF97: cpu_breakpoint_remove (exec.c:857)
==52333==    by 0x218FAA: gdb_breakpoint_remove (gdbstub.c:717)
==52333==    by 0x219E35: gdb_handle_packet (gdbstub.c:1035)
==52333==    by 0x21AF62: gdb_read_byte (gdbstub.c:1459)
==52333==    by 0x21B096: gdb_chr_receive (gdbstub.c:1672)
==52333==    by 0x3AF2BC: qemu_chr_be_write_impl (qemu-char.c:419)

These crashes didn't happen on a 2.6-era QEMU, so I bisected and
discovered the commit 3359baad36889b83df40b637ed993a4b816c4906 ("tcg:
Make tb_flush() thread safe") appears to be the thing that triggers this
intermittent failure. Reverting the patch on the branch tip makes the
crashes go away.

Unfortunately I don't currently have a way to trigger the segfaults
outside of Mentor Graphics's test infrastructure, which I can't share.

Does anyone know a reason that this might be happening, or suggestions
of how I might further debug this? Maybe a missed tb flush in the gdb
stub code, somewhere?

Thanks!

Julian

** Affects: qemu
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1647683

Title:
  Bad interaction between tb flushing & gdb stub

Status in QEMU:
  New

Bug description:
  I have been working on a series of patches for ARM big-endian system
  mode support, using QEMU as a bare-metal simulator for the GDB test
  suite. At some point I realised that these tests were not running
  reliably on the QEMU master branch, even without my patches applied.
  (I.e., in little-endian mode.)

  Running QEMU under GDB in the test harness via Valgrind, using
  something akin to:

  (gdb) target remote | valgrind --tool=memcheck qemu-arm-system [...]

  leads to intermittent (and quite hard-to-reproduce) segfaults in QEMU
  of the form:

  ==52333== Process terminating with default action of signal 11 (SIGSEGV)
  ==52333==  Access not within mapped region at address 0x24
  ==52333==    at 0x1D55F2: tb_page_remove (translate-all.c:1026)
  ==52333==    by 0x1D58B4: tb_phys_invalidate (translate-all.c:1119)
  ==52333==    by 0x1D63AA: tb_invalidate_phys_page_range (translate-all.c:1519)
  ==52333==    by 0x1D66D7: tb_invalidate_phys_addr (translate-all.c:1714)
  ==52333==    by 0x1CBA7F: breakpoint_invalidate (exec.c:704)
  ==52333==    by 0x1CC01F: cpu_breakpoint_remove_by_ref (exec.c:869)
  ==52333==    by 0x1CBF97: cpu_breakpoint_remove (exec.c:857)
  ==52333==    by 0x218FAA: gdb_breakpoint_remove (gdbstub.c:717)
  ==52333==    by 0x219E35: gdb_handle_packet (gdbstub.c:1035)
  ==52333==    by 0x21AF62: gdb_read_byte (gdbstub.c:1459)
  ==52333==    by 0x21B096: gdb_chr_receive (gdbstub.c:1672)
  ==52333==    by 0x3AF2BC: qemu_chr_be_write_impl (qemu-char.c:419)

  These crashes didn't happen on a 2.6-era QEMU, so I bisected and
  discovered the commit 3359baad36889b83df40b637ed993a4b816c4906 ("tcg:
  Make tb_flush() thread safe") appears to be the thing that triggers
  this intermittent failure. Reverting the patch on the branch tip makes
  the crashes go away.

  Unfortunately I don't currently have a way to trigger the segfaults
  outside of Mentor Graphics's test infrastructure, which I can't share.

  Does anyone know a reason that this might be happening, or suggestions
  of how I might further debug this? Maybe a missed tb flush in the gdb
  stub code, somewhere?

  Thanks!

  Julian

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1647683/+subscriptions

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] [Bug 1647683] Re: Bad interaction between tb flushing & gdb stub
  2016-12-06 11:39 [Qemu-devel] [Bug 1647683] [NEW] Bad interaction between tb flushing & gdb stub Julian Brown
@ 2016-12-06 11:46 ` Julian Brown
  2016-12-06 14:51   ` Alex Bennée
  2016-12-06 12:34 ` [Qemu-devel] [Bug 1647683] [NEW] " Peter Maydell
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Julian Brown @ 2016-12-06 11:46 UTC (permalink / raw)
  To: qemu-devel

(FAOD, the crashes happen without Valgrind too, and the above backtrace
may be a red herring.)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1647683

Title:
  Bad interaction between tb flushing & gdb stub

Status in QEMU:
  New

Bug description:
  I have been working on a series of patches for ARM big-endian system
  mode support, using QEMU as a bare-metal simulator for the GDB test
  suite. At some point I realised that these tests were not running
  reliably on the QEMU master branch, even without my patches applied.
  (I.e., in little-endian mode.)

  Running QEMU under GDB in the test harness via Valgrind, using
  something akin to:

  (gdb) target remote | valgrind --tool=memcheck qemu-arm-system [...]

  leads to intermittent (and quite hard-to-reproduce) segfaults in QEMU
  of the form:

  ==52333== Process terminating with default action of signal 11 (SIGSEGV)
  ==52333==  Access not within mapped region at address 0x24
  ==52333==    at 0x1D55F2: tb_page_remove (translate-all.c:1026)
  ==52333==    by 0x1D58B4: tb_phys_invalidate (translate-all.c:1119)
  ==52333==    by 0x1D63AA: tb_invalidate_phys_page_range (translate-all.c:1519)
  ==52333==    by 0x1D66D7: tb_invalidate_phys_addr (translate-all.c:1714)
  ==52333==    by 0x1CBA7F: breakpoint_invalidate (exec.c:704)
  ==52333==    by 0x1CC01F: cpu_breakpoint_remove_by_ref (exec.c:869)
  ==52333==    by 0x1CBF97: cpu_breakpoint_remove (exec.c:857)
  ==52333==    by 0x218FAA: gdb_breakpoint_remove (gdbstub.c:717)
  ==52333==    by 0x219E35: gdb_handle_packet (gdbstub.c:1035)
  ==52333==    by 0x21AF62: gdb_read_byte (gdbstub.c:1459)
  ==52333==    by 0x21B096: gdb_chr_receive (gdbstub.c:1672)
  ==52333==    by 0x3AF2BC: qemu_chr_be_write_impl (qemu-char.c:419)

  These crashes didn't happen on a 2.6-era QEMU, so I bisected and
  discovered the commit 3359baad36889b83df40b637ed993a4b816c4906 ("tcg:
  Make tb_flush() thread safe") appears to be the thing that triggers
  this intermittent failure. Reverting the patch on the branch tip makes
  the crashes go away.

  Unfortunately I don't currently have a way to trigger the segfaults
  outside of Mentor Graphics's test infrastructure, which I can't share.

  Does anyone know a reason that this might be happening, or suggestions
  of how I might further debug this? Maybe a missed tb flush in the gdb
  stub code, somewhere?

  Thanks!

  Julian

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1647683/+subscriptions

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [Bug 1647683] [NEW] Bad interaction between tb flushing & gdb stub
  2016-12-06 11:39 [Qemu-devel] [Bug 1647683] [NEW] Bad interaction between tb flushing & gdb stub Julian Brown
  2016-12-06 11:46 ` [Qemu-devel] [Bug 1647683] " Julian Brown
@ 2016-12-06 12:34 ` Peter Maydell
  2016-12-06 14:09   ` Peter Maydell
  2016-12-06 14:51 ` [Qemu-devel] [PATCH] exec.c: simplify the breakpoint invalidation logic Alex Bennée
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Peter Maydell @ 2016-12-06 12:34 UTC (permalink / raw)
  To: Bug 1647683
  Cc: QEMU Developers, Fedorov Sergey, Richard Henderson, Paolo Bonzini,
	Alex Bennée

On 6 December 2016 at 11:39, Julian Brown <1647683@bugs.launchpad.net> wrote:
> Running QEMU under GDB in the test harness via Valgrind, using something
> akin to:
>
> (gdb) target remote | valgrind --tool=memcheck qemu-arm-system [...]
>
> leads to intermittent (and quite hard-to-reproduce) segfaults in QEMU of
> the form:
>
> ==52333== Process terminating with default action of signal 11 (SIGSEGV)
> ==52333==  Access not within mapped region at address 0x24
> ==52333==    at 0x1D55F2: tb_page_remove (translate-all.c:1026)
> ==52333==    by 0x1D58B4: tb_phys_invalidate (translate-all.c:1119)
> ==52333==    by 0x1D63AA: tb_invalidate_phys_page_range (translate-all.c:1519)
> ==52333==    by 0x1D66D7: tb_invalidate_phys_addr (translate-all.c:1714)
> ==52333==    by 0x1CBA7F: breakpoint_invalidate (exec.c:704)
> ==52333==    by 0x1CC01F: cpu_breakpoint_remove_by_ref (exec.c:869)
> ==52333==    by 0x1CBF97: cpu_breakpoint_remove (exec.c:857)
> ==52333==    by 0x218FAA: gdb_breakpoint_remove (gdbstub.c:717)
> ==52333==    by 0x219E35: gdb_handle_packet (gdbstub.c:1035)
> ==52333==    by 0x21AF62: gdb_read_byte (gdbstub.c:1459)
> ==52333==    by 0x21B096: gdb_chr_receive (gdbstub.c:1672)
> ==52333==    by 0x3AF2BC: qemu_chr_be_write_impl (qemu-char.c:419)
>
> These crashes didn't happen on a 2.6-era QEMU, so I bisected and
> discovered the commit 3359baad36889b83df40b637ed993a4b816c4906 ("tcg:
> Make tb_flush() thread safe") appears to be the thing that triggers this
> intermittent failure. Reverting the patch on the branch tip makes the
> crashes go away.

I saw something similar the other day as well, not involving valgrind,
just a simple gdb connected to the gdbstub.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [Bug 1647683] [NEW] Bad interaction between tb flushing & gdb stub
  2016-12-06 12:34 ` [Qemu-devel] [Bug 1647683] [NEW] " Peter Maydell
@ 2016-12-06 14:09   ` Peter Maydell
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Maydell @ 2016-12-06 14:09 UTC (permalink / raw)
  To: Bug 1647683
  Cc: QEMU Developers, Fedorov Sergey, Richard Henderson, Paolo Bonzini,
	Alex Bennée

On 6 December 2016 at 12:34, Peter Maydell <peter.maydell@linaro.org> wrote:
> I saw something similar the other day as well, not involving valgrind,
> just a simple gdb connected to the gdbstub.

http://people.linaro.org/~peter.maydell/gdbstub-bug.tgz
is a repro case for this (with an aarch64 kernel guest).
Segfaults every time when the guest hits the breakpoint.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH] exec.c: simplify the breakpoint invalidation logic
  2016-12-06 11:39 [Qemu-devel] [Bug 1647683] [NEW] Bad interaction between tb flushing & gdb stub Julian Brown
  2016-12-06 11:46 ` [Qemu-devel] [Bug 1647683] " Julian Brown
  2016-12-06 12:34 ` [Qemu-devel] [Bug 1647683] [NEW] " Peter Maydell
@ 2016-12-06 14:51 ` Alex Bennée
  2016-12-06 14:54   ` Peter Maydell
  2016-12-06 15:43 ` [Qemu-devel] [Bug 1647683] Re: Bad interaction between tb flushing & gdb stub Julian Brown
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Alex Bennée @ 2016-12-06 14:51 UTC (permalink / raw)
  To: 1647683
  Cc: peter.maydell, pbonzini, rth, Alex Bennée, Peter Crosthwaite,
	open list:Overall

A bug (1647683) was reported showing a crash when removing
breakpoints. The reproducer was bisected to 3359baad when tb_flush was
finally made thread safe. While in MTTCG the locking in
breakpoint_invalidate should have prevented any problems currently
tb_lock() is a NOP for system emulation.

On top of it all the invalidation code was special-cased between user
and system emulation which was ugly. As we now have a safe tb_flush()
we might as well use that when breakpoints and added and removed. This
is the same thing we do for single-stepping and as this is during
debugging it isn't a major performance concern.

Reported-by: Julian Brown
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 exec.c | 31 ++++---------------------------
 1 file changed, 4 insertions(+), 27 deletions(-)

diff --git a/exec.c b/exec.c
index 3d867f1..e991221 100644
--- a/exec.c
+++ b/exec.c
@@ -685,29 +685,6 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
 }
 
 #if defined(CONFIG_USER_ONLY)
-static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
-{
-    mmap_lock();
-    tb_lock();
-    tb_invalidate_phys_page_range(pc, pc + 1, 0);
-    tb_unlock();
-    mmap_unlock();
-}
-#else
-static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
-{
-    MemTxAttrs attrs;
-    hwaddr phys = cpu_get_phys_page_attrs_debug(cpu, pc, &attrs);
-    int asidx = cpu_asidx_from_attrs(cpu, attrs);
-    if (phys != -1) {
-        /* Locks grabbed by tb_invalidate_phys_addr */
-        tb_invalidate_phys_addr(cpu->cpu_ases[asidx].as,
-                                phys | (pc & ~TARGET_PAGE_MASK));
-    }
-}
-#endif
-
-#if defined(CONFIG_USER_ONLY)
 void cpu_watchpoint_remove_all(CPUState *cpu, int mask)
 
 {
@@ -839,7 +816,7 @@ int cpu_breakpoint_insert(CPUState *cpu, vaddr pc, int flags,
         QTAILQ_INSERT_TAIL(&cpu->breakpoints, bp, entry);
     }
 
-    breakpoint_invalidate(cpu, pc);
+    tb_flush(cpu);
 
     if (breakpoint) {
         *breakpoint = bp;
@@ -855,6 +832,7 @@ int cpu_breakpoint_remove(CPUState *cpu, vaddr pc, int flags)
     QTAILQ_FOREACH(bp, &cpu->breakpoints, entry) {
         if (bp->pc == pc && bp->flags == flags) {
             cpu_breakpoint_remove_by_ref(cpu, bp);
+            tb_flush(cpu);
             return 0;
         }
     }
@@ -865,9 +843,6 @@ int cpu_breakpoint_remove(CPUState *cpu, vaddr pc, int flags)
 void cpu_breakpoint_remove_by_ref(CPUState *cpu, CPUBreakpoint *breakpoint)
 {
     QTAILQ_REMOVE(&cpu->breakpoints, breakpoint, entry);
-
-    breakpoint_invalidate(cpu, breakpoint->pc);
-
     g_free(breakpoint);
 }
 
@@ -881,6 +856,8 @@ void cpu_breakpoint_remove_all(CPUState *cpu, int mask)
             cpu_breakpoint_remove_by_ref(cpu, bp);
         }
     }
+
+    tb_flush(cpu);
 }
 
 /* enable or disable single step mode. EXCP_DEBUG is returned by the
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [Bug 1647683] Re: Bad interaction between tb flushing & gdb stub
  2016-12-06 11:46 ` [Qemu-devel] [Bug 1647683] " Julian Brown
@ 2016-12-06 14:51   ` Alex Bennée
  0 siblings, 0 replies; 13+ messages in thread
From: Alex Bennée @ 2016-12-06 14:51 UTC (permalink / raw)
  To: Bug 1647683; +Cc: qemu-devel


Julian Brown <1647683@bugs.launchpad.net> writes:

> (FAOD, the crashes happen without Valgrind too, and the above backtrace
> may be a red herring.)

I've sent a patch that should fix this. Could you please test?

--
Alex Bennée

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH] exec.c: simplify the breakpoint invalidation logic
  2016-12-06 14:51 ` [Qemu-devel] [PATCH] exec.c: simplify the breakpoint invalidation logic Alex Bennée
@ 2016-12-06 14:54   ` Peter Maydell
  2016-12-06 15:14     ` Alex Bennée
  2016-12-07 11:32     ` Paolo Bonzini
  0 siblings, 2 replies; 13+ messages in thread
From: Peter Maydell @ 2016-12-06 14:54 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Bug 1647683, Paolo Bonzini, Richard Henderson, Peter Crosthwaite,
	open list:Overall

On 6 December 2016 at 14:51, Alex Bennée <alex.bennee@linaro.org> wrote:
> A bug (1647683) was reported showing a crash when removing
> breakpoints. The reproducer was bisected to 3359baad when tb_flush was
> finally made thread safe. While in MTTCG the locking in
> breakpoint_invalidate should have prevented any problems currently
> tb_lock() is a NOP for system emulation.
>
> On top of it all the invalidation code was special-cased between user
> and system emulation which was ugly. As we now have a safe tb_flush()
> we might as well use that when breakpoints and added and removed. This
> is the same thing we do for single-stepping and as this is during
> debugging it isn't a major performance concern.
>
> Reported-by: Julian Brown
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  exec.c | 31 ++++---------------------------
>  1 file changed, 4 insertions(+), 27 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 3d867f1..e991221 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -685,29 +685,6 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
>  }
>
>  #if defined(CONFIG_USER_ONLY)
> -static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
> -{
> -    mmap_lock();
> -    tb_lock();
> -    tb_invalidate_phys_page_range(pc, pc + 1, 0);
> -    tb_unlock();
> -    mmap_unlock();
> -}
> -#else
> -static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
> -{
> -    MemTxAttrs attrs;
> -    hwaddr phys = cpu_get_phys_page_attrs_debug(cpu, pc, &attrs);
> -    int asidx = cpu_asidx_from_attrs(cpu, attrs);
> -    if (phys != -1) {
> -        /* Locks grabbed by tb_invalidate_phys_addr */
> -        tb_invalidate_phys_addr(cpu->cpu_ases[asidx].as,
> -                                phys | (pc & ~TARGET_PAGE_MASK));
> -    }
> -}
> -#endif
> -
> -#if defined(CONFIG_USER_ONLY)
>  void cpu_watchpoint_remove_all(CPUState *cpu, int mask)
>
>  {
> @@ -839,7 +816,7 @@ int cpu_breakpoint_insert(CPUState *cpu, vaddr pc, int flags,
>          QTAILQ_INSERT_TAIL(&cpu->breakpoints, bp, entry);
>      }
>
> -    breakpoint_invalidate(cpu, pc);
> +    tb_flush(cpu);
>
>      if (breakpoint) {
>          *breakpoint = bp;
> @@ -855,6 +832,7 @@ int cpu_breakpoint_remove(CPUState *cpu, vaddr pc, int flags)
>      QTAILQ_FOREACH(bp, &cpu->breakpoints, entry) {
>          if (bp->pc == pc && bp->flags == flags) {
>              cpu_breakpoint_remove_by_ref(cpu, bp);
> +            tb_flush(cpu);
>              return 0;
>          }
>      }
> @@ -865,9 +843,6 @@ int cpu_breakpoint_remove(CPUState *cpu, vaddr pc, int flags)
>  void cpu_breakpoint_remove_by_ref(CPUState *cpu, CPUBreakpoint *breakpoint)
>  {
>      QTAILQ_REMOVE(&cpu->breakpoints, breakpoint, entry);
> -
> -    breakpoint_invalidate(cpu, breakpoint->pc);
> -
>      g_free(breakpoint);
>  }
>
> @@ -881,6 +856,8 @@ void cpu_breakpoint_remove_all(CPUState *cpu, int mask)
>              cpu_breakpoint_remove_by_ref(cpu, bp);
>          }
>      }
> +
> +    tb_flush(cpu);
>  }
>
>  /* enable or disable single step mode. EXCP_DEBUG is returned by the

I think this is the wrong direction. We only need to invalidate
the TBs for the specific location the breakpoint is at.
Even if we for the moment want to apply a big hammer of doing
a full tb flush on breakpoint to fix this issue for this release,
we shouldn't drop the indirection through breakpoint_invalidate(),
because then we can't go back to invalidating just the parts we need
easily later.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH] exec.c: simplify the breakpoint invalidation logic
  2016-12-06 14:54   ` Peter Maydell
@ 2016-12-06 15:14     ` Alex Bennée
  2016-12-06 16:09       ` Peter Maydell
  2016-12-07 11:32     ` Paolo Bonzini
  1 sibling, 1 reply; 13+ messages in thread
From: Alex Bennée @ 2016-12-06 15:14 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Bug 1647683, Paolo Bonzini, Richard Henderson, Peter Crosthwaite,
	open list:Overall


Peter Maydell <peter.maydell@linaro.org> writes:

> On 6 December 2016 at 14:51, Alex Bennée <alex.bennee@linaro.org> wrote:
>> A bug (1647683) was reported showing a crash when removing
>> breakpoints. The reproducer was bisected to 3359baad when tb_flush was
>> finally made thread safe. While in MTTCG the locking in
>> breakpoint_invalidate should have prevented any problems currently
>> tb_lock() is a NOP for system emulation.
>>
>> On top of it all the invalidation code was special-cased between user
>> and system emulation which was ugly. As we now have a safe tb_flush()
>> we might as well use that when breakpoints and added and removed. This
>> is the same thing we do for single-stepping and as this is during
>> debugging it isn't a major performance concern.
>>
>> Reported-by: Julian Brown
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> ---
>>  exec.c | 31 ++++---------------------------
>>  1 file changed, 4 insertions(+), 27 deletions(-)
>>
>> diff --git a/exec.c b/exec.c
>> index 3d867f1..e991221 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -685,29 +685,6 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
>>  }
>>
>>  #if defined(CONFIG_USER_ONLY)
>> -static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
>> -{
>> -    mmap_lock();
>> -    tb_lock();
>> -    tb_invalidate_phys_page_range(pc, pc + 1, 0);
>> -    tb_unlock();
>> -    mmap_unlock();
>> -}
>> -#else
>> -static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
>> -{
>> -    MemTxAttrs attrs;
>> -    hwaddr phys = cpu_get_phys_page_attrs_debug(cpu, pc, &attrs);
>> -    int asidx = cpu_asidx_from_attrs(cpu, attrs);
>> -    if (phys != -1) {
>> -        /* Locks grabbed by tb_invalidate_phys_addr */
>> -        tb_invalidate_phys_addr(cpu->cpu_ases[asidx].as,
>> -                                phys | (pc & ~TARGET_PAGE_MASK));
>> -    }
>> -}
>> -#endif
>> -
>> -#if defined(CONFIG_USER_ONLY)
>>  void cpu_watchpoint_remove_all(CPUState *cpu, int mask)
>>
>>  {
>> @@ -839,7 +816,7 @@ int cpu_breakpoint_insert(CPUState *cpu, vaddr pc, int flags,
>>          QTAILQ_INSERT_TAIL(&cpu->breakpoints, bp, entry);
>>      }
>>
>> -    breakpoint_invalidate(cpu, pc);
>> +    tb_flush(cpu);
>>
>>      if (breakpoint) {
>>          *breakpoint = bp;
>> @@ -855,6 +832,7 @@ int cpu_breakpoint_remove(CPUState *cpu, vaddr pc, int flags)
>>      QTAILQ_FOREACH(bp, &cpu->breakpoints, entry) {
>>          if (bp->pc == pc && bp->flags == flags) {
>>              cpu_breakpoint_remove_by_ref(cpu, bp);
>> +            tb_flush(cpu);
>>              return 0;
>>          }
>>      }
>> @@ -865,9 +843,6 @@ int cpu_breakpoint_remove(CPUState *cpu, vaddr pc, int flags)
>>  void cpu_breakpoint_remove_by_ref(CPUState *cpu, CPUBreakpoint *breakpoint)
>>  {
>>      QTAILQ_REMOVE(&cpu->breakpoints, breakpoint, entry);
>> -
>> -    breakpoint_invalidate(cpu, breakpoint->pc);
>> -
>>      g_free(breakpoint);
>>  }
>>
>> @@ -881,6 +856,8 @@ void cpu_breakpoint_remove_all(CPUState *cpu, int mask)
>>              cpu_breakpoint_remove_by_ref(cpu, bp);
>>          }
>>      }
>> +
>> +    tb_flush(cpu);
>>  }
>>
>>  /* enable or disable single step mode. EXCP_DEBUG is returned by the
>
> I think this is the wrong direction. We only need to invalidate
> the TBs for the specific location the breakpoint is at.
> Even if we for the moment want to apply a big hammer of doing
> a full tb flush on breakpoint to fix this issue for this release,
> we shouldn't drop the indirection through breakpoint_invalidate(),
> because then we can't go back to invalidating just the parts we need
> easily later.

Why would we? It's not like fresh code generation is particularly slow,
especially in a debugging situation when you've just stopped the
machine.

The self-modifying-code paths make more sense to be partial in their
invalidations although I've never really measured quite how pathalogical
running a JIT inside QEMU is.

--
Alex Bennée

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] [Bug 1647683] Re: Bad interaction between tb flushing & gdb stub
  2016-12-06 11:39 [Qemu-devel] [Bug 1647683] [NEW] Bad interaction between tb flushing & gdb stub Julian Brown
                   ` (2 preceding siblings ...)
  2016-12-06 14:51 ` [Qemu-devel] [PATCH] exec.c: simplify the breakpoint invalidation logic Alex Bennée
@ 2016-12-06 15:43 ` Julian Brown
  2016-12-06 16:05 ` Julian Brown
  2020-11-08  9:39 ` Thomas Huth
  5 siblings, 0 replies; 13+ messages in thread
From: Julian Brown @ 2016-12-06 15:43 UTC (permalink / raw)
  To: qemu-devel

I'm testing the patch now, thank you! I'll report back on how it goes.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1647683

Title:
  Bad interaction between tb flushing & gdb stub

Status in QEMU:
  New

Bug description:
  I have been working on a series of patches for ARM big-endian system
  mode support, using QEMU as a bare-metal simulator for the GDB test
  suite. At some point I realised that these tests were not running
  reliably on the QEMU master branch, even without my patches applied.
  (I.e., in little-endian mode.)

  Running QEMU under GDB in the test harness via Valgrind, using
  something akin to:

  (gdb) target remote | valgrind --tool=memcheck qemu-arm-system [...]

  leads to intermittent (and quite hard-to-reproduce) segfaults in QEMU
  of the form:

  ==52333== Process terminating with default action of signal 11 (SIGSEGV)
  ==52333==  Access not within mapped region at address 0x24
  ==52333==    at 0x1D55F2: tb_page_remove (translate-all.c:1026)
  ==52333==    by 0x1D58B4: tb_phys_invalidate (translate-all.c:1119)
  ==52333==    by 0x1D63AA: tb_invalidate_phys_page_range (translate-all.c:1519)
  ==52333==    by 0x1D66D7: tb_invalidate_phys_addr (translate-all.c:1714)
  ==52333==    by 0x1CBA7F: breakpoint_invalidate (exec.c:704)
  ==52333==    by 0x1CC01F: cpu_breakpoint_remove_by_ref (exec.c:869)
  ==52333==    by 0x1CBF97: cpu_breakpoint_remove (exec.c:857)
  ==52333==    by 0x218FAA: gdb_breakpoint_remove (gdbstub.c:717)
  ==52333==    by 0x219E35: gdb_handle_packet (gdbstub.c:1035)
  ==52333==    by 0x21AF62: gdb_read_byte (gdbstub.c:1459)
  ==52333==    by 0x21B096: gdb_chr_receive (gdbstub.c:1672)
  ==52333==    by 0x3AF2BC: qemu_chr_be_write_impl (qemu-char.c:419)

  These crashes didn't happen on a 2.6-era QEMU, so I bisected and
  discovered the commit 3359baad36889b83df40b637ed993a4b816c4906 ("tcg:
  Make tb_flush() thread safe") appears to be the thing that triggers
  this intermittent failure. Reverting the patch on the branch tip makes
  the crashes go away.

  Unfortunately I don't currently have a way to trigger the segfaults
  outside of Mentor Graphics's test infrastructure, which I can't share.

  Does anyone know a reason that this might be happening, or suggestions
  of how I might further debug this? Maybe a missed tb flush in the gdb
  stub code, somewhere?

  Thanks!

  Julian

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1647683/+subscriptions

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] [Bug 1647683] Re: Bad interaction between tb flushing & gdb stub
  2016-12-06 11:39 [Qemu-devel] [Bug 1647683] [NEW] Bad interaction between tb flushing & gdb stub Julian Brown
                   ` (3 preceding siblings ...)
  2016-12-06 15:43 ` [Qemu-devel] [Bug 1647683] Re: Bad interaction between tb flushing & gdb stub Julian Brown
@ 2016-12-06 16:05 ` Julian Brown
  2020-11-08  9:39 ` Thomas Huth
  5 siblings, 0 replies; 13+ messages in thread
From: Julian Brown @ 2016-12-06 16:05 UTC (permalink / raw)
  To: qemu-devel

The patch works for me! Thank you!

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1647683

Title:
  Bad interaction between tb flushing & gdb stub

Status in QEMU:
  New

Bug description:
  I have been working on a series of patches for ARM big-endian system
  mode support, using QEMU as a bare-metal simulator for the GDB test
  suite. At some point I realised that these tests were not running
  reliably on the QEMU master branch, even without my patches applied.
  (I.e., in little-endian mode.)

  Running QEMU under GDB in the test harness via Valgrind, using
  something akin to:

  (gdb) target remote | valgrind --tool=memcheck qemu-arm-system [...]

  leads to intermittent (and quite hard-to-reproduce) segfaults in QEMU
  of the form:

  ==52333== Process terminating with default action of signal 11 (SIGSEGV)
  ==52333==  Access not within mapped region at address 0x24
  ==52333==    at 0x1D55F2: tb_page_remove (translate-all.c:1026)
  ==52333==    by 0x1D58B4: tb_phys_invalidate (translate-all.c:1119)
  ==52333==    by 0x1D63AA: tb_invalidate_phys_page_range (translate-all.c:1519)
  ==52333==    by 0x1D66D7: tb_invalidate_phys_addr (translate-all.c:1714)
  ==52333==    by 0x1CBA7F: breakpoint_invalidate (exec.c:704)
  ==52333==    by 0x1CC01F: cpu_breakpoint_remove_by_ref (exec.c:869)
  ==52333==    by 0x1CBF97: cpu_breakpoint_remove (exec.c:857)
  ==52333==    by 0x218FAA: gdb_breakpoint_remove (gdbstub.c:717)
  ==52333==    by 0x219E35: gdb_handle_packet (gdbstub.c:1035)
  ==52333==    by 0x21AF62: gdb_read_byte (gdbstub.c:1459)
  ==52333==    by 0x21B096: gdb_chr_receive (gdbstub.c:1672)
  ==52333==    by 0x3AF2BC: qemu_chr_be_write_impl (qemu-char.c:419)

  These crashes didn't happen on a 2.6-era QEMU, so I bisected and
  discovered the commit 3359baad36889b83df40b637ed993a4b816c4906 ("tcg:
  Make tb_flush() thread safe") appears to be the thing that triggers
  this intermittent failure. Reverting the patch on the branch tip makes
  the crashes go away.

  Unfortunately I don't currently have a way to trigger the segfaults
  outside of Mentor Graphics's test infrastructure, which I can't share.

  Does anyone know a reason that this might be happening, or suggestions
  of how I might further debug this? Maybe a missed tb flush in the gdb
  stub code, somewhere?

  Thanks!

  Julian

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1647683/+subscriptions

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH] exec.c: simplify the breakpoint invalidation logic
  2016-12-06 15:14     ` Alex Bennée
@ 2016-12-06 16:09       ` Peter Maydell
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Maydell @ 2016-12-06 16:09 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Bug 1647683, Paolo Bonzini, Richard Henderson, Peter Crosthwaite,
	open list:Overall

On 6 December 2016 at 15:14, Alex Bennée <alex.bennee@linaro.org> wrote:
> Peter Maydell <peter.maydell@linaro.org> writes:
>> I think this is the wrong direction. We only need to invalidate
>> the TBs for the specific location the breakpoint is at.
>> Even if we for the moment want to apply a big hammer of doing
>> a full tb flush on breakpoint to fix this issue for this release,
>> we shouldn't drop the indirection through breakpoint_invalidate(),
>> because then we can't go back to invalidating just the parts we need
>> easily later.
>
> Why would we? It's not like fresh code generation is particularly slow,
> especially in a debugging situation when you've just stopped the
> machine.

Because a wholescale tb_flush() is a "this is probably wrong"
flag, and a great way to hide bugs. It also makes the gdbstub
more invasive than it strictly has to be.

We have less than 10 calls to tb_flush() in the whole system
and I think they could all use examination to see whether
they're really correct.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH] exec.c: simplify the breakpoint invalidation logic
  2016-12-06 14:54   ` Peter Maydell
  2016-12-06 15:14     ` Alex Bennée
@ 2016-12-07 11:32     ` Paolo Bonzini
  1 sibling, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2016-12-07 11:32 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Alex Bennée, Bug 1647683, Richard Henderson,
	Peter Crosthwaite, open list:Overall


> I think this is the wrong direction. We only need to invalidate
> the TBs for the specific location the breakpoint is at.
> Even if we for the moment want to apply a big hammer of doing
> a full tb flush on breakpoint to fix this issue for this release,
> we shouldn't drop the indirection through breakpoint_invalidate(),
> because then we can't go back to invalidating just the parts we need
> easily later.

I agree with Peter, but still this patch should be fine for
2.8 and we can revert it when MTTCG locking is merged.

Paolo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug 1647683] Re: Bad interaction between tb flushing & gdb stub
  2016-12-06 11:39 [Qemu-devel] [Bug 1647683] [NEW] Bad interaction between tb flushing & gdb stub Julian Brown
                   ` (4 preceding siblings ...)
  2016-12-06 16:05 ` Julian Brown
@ 2020-11-08  9:39 ` Thomas Huth
  5 siblings, 0 replies; 13+ messages in thread
From: Thomas Huth @ 2020-11-08  9:39 UTC (permalink / raw)
  To: qemu-devel

Fix had been included here:
https://git.qemu.org/?p=qemu.git;a=commitdiff;h=a9353fe897ca2687e5b338

** Changed in: qemu
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1647683

Title:
  Bad interaction between tb flushing & gdb stub

Status in QEMU:
  Fix Released

Bug description:
  I have been working on a series of patches for ARM big-endian system
  mode support, using QEMU as a bare-metal simulator for the GDB test
  suite. At some point I realised that these tests were not running
  reliably on the QEMU master branch, even without my patches applied.
  (I.e., in little-endian mode.)

  Running QEMU under GDB in the test harness via Valgrind, using
  something akin to:

  (gdb) target remote | valgrind --tool=memcheck qemu-arm-system [...]

  leads to intermittent (and quite hard-to-reproduce) segfaults in QEMU
  of the form:

  ==52333== Process terminating with default action of signal 11 (SIGSEGV)
  ==52333==  Access not within mapped region at address 0x24
  ==52333==    at 0x1D55F2: tb_page_remove (translate-all.c:1026)
  ==52333==    by 0x1D58B4: tb_phys_invalidate (translate-all.c:1119)
  ==52333==    by 0x1D63AA: tb_invalidate_phys_page_range (translate-all.c:1519)
  ==52333==    by 0x1D66D7: tb_invalidate_phys_addr (translate-all.c:1714)
  ==52333==    by 0x1CBA7F: breakpoint_invalidate (exec.c:704)
  ==52333==    by 0x1CC01F: cpu_breakpoint_remove_by_ref (exec.c:869)
  ==52333==    by 0x1CBF97: cpu_breakpoint_remove (exec.c:857)
  ==52333==    by 0x218FAA: gdb_breakpoint_remove (gdbstub.c:717)
  ==52333==    by 0x219E35: gdb_handle_packet (gdbstub.c:1035)
  ==52333==    by 0x21AF62: gdb_read_byte (gdbstub.c:1459)
  ==52333==    by 0x21B096: gdb_chr_receive (gdbstub.c:1672)
  ==52333==    by 0x3AF2BC: qemu_chr_be_write_impl (qemu-char.c:419)

  These crashes didn't happen on a 2.6-era QEMU, so I bisected and
  discovered the commit 3359baad36889b83df40b637ed993a4b816c4906 ("tcg:
  Make tb_flush() thread safe") appears to be the thing that triggers
  this intermittent failure. Reverting the patch on the branch tip makes
  the crashes go away.

  Unfortunately I don't currently have a way to trigger the segfaults
  outside of Mentor Graphics's test infrastructure, which I can't share.

  Does anyone know a reason that this might be happening, or suggestions
  of how I might further debug this? Maybe a missed tb flush in the gdb
  stub code, somewhere?

  Thanks!

  Julian

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1647683/+subscriptions

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-11-08  9:46 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-12-06 11:39 [Qemu-devel] [Bug 1647683] [NEW] Bad interaction between tb flushing & gdb stub Julian Brown
2016-12-06 11:46 ` [Qemu-devel] [Bug 1647683] " Julian Brown
2016-12-06 14:51   ` Alex Bennée
2016-12-06 12:34 ` [Qemu-devel] [Bug 1647683] [NEW] " Peter Maydell
2016-12-06 14:09   ` Peter Maydell
2016-12-06 14:51 ` [Qemu-devel] [PATCH] exec.c: simplify the breakpoint invalidation logic Alex Bennée
2016-12-06 14:54   ` Peter Maydell
2016-12-06 15:14     ` Alex Bennée
2016-12-06 16:09       ` Peter Maydell
2016-12-07 11:32     ` Paolo Bonzini
2016-12-06 15:43 ` [Qemu-devel] [Bug 1647683] Re: Bad interaction between tb flushing & gdb stub Julian Brown
2016-12-06 16:05 ` Julian Brown
2020-11-08  9:39 ` Thomas Huth

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).