All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Jan-Simon Möller" <dl9pf@gmx.de>
To: qemu-devel@nongnu.org
Subject: Re: locking for TCG (was: Re: [Qemu-devel] [Bug 668799] Re: qemu-arm segfaults executing msgmerge (gettext))
Date: Wed, 01 Dec 2010 21:15:57 -0000	[thread overview]
Message-ID: <201012012215.57673.dl9pf@gmx.de> (raw)
In-Reply-To: AANLkTi=-xrZY5j4Qxf5JKNp8fV9NP4JxS3H0TPgBr0CB@mail.gmail.com

Am Mittwoch, 1. Dezember 2010, 20:40:37 schrieb Peter Maydell:
> On 28 November 2010 11:24, Peter Maydell <peter.maydell@linaro.org> wrote:
> > 2010/11/28 Brian Harring <ferringb@gmail.com>:
> >> Additional note... it *looks* like the deadlock potential is there
> >> already in 0.13, it's just heavily exacerbated by this patch- out of
> >> about 600 builds I've seen 2 lockup in the same fashion (rate was far
> >> higher with the patch on this ticket).
> 
> > (I think that running multithreaded programs under user-mode
> > emulation is effectively hitting a lot of the same locking issues
> > that you would get for emulating an MP core in multiple threads.)
> 
> Having looked in a bit more detail at the code, I definitely think
> the locking is insufficient for the TCG TranslationBlock structures.
> In particular:
>  * cpu_exit() updates the linked lists in the TB ->jmp_next[] arrays
>  * cpu_exit() is called:
>    + by other threads, in linux-user mode
>    + by signal handlers (both in linux user mode for taking signals
> and in system mode via qemu_notify_event() when timers expire)
>    + by normal generated code

Thanks for investigation this further!

> At the moment nothing blocks signals when it is modifying the TB
> jmp_next arrays (either via cpu_exit() or tb_add_jump()), so if you're
> unlucky and you take a signal while you're in the middle of modifying
> a jmp_next list you might end up with the list corrupted. This is
> more likely to happen with multithreaded linux-user mode code I
> think, but there's still a possibility there even in pure system mode.
> 
> I'm not sure what the best approach to solving this is. We could
> add "block signals; take mutex" around all the places in the code
> that touch the TB data structures. That seems a bit heavyweight
> and it's also not totally clear to me what the best points in the
> exec.c code to put the locking are; but it would fix the problem.


Adding locks everywhere is probably the "save but horribly slow" solution.

 
> Alternatively we could try making cpu_exit() not have to actually
> fiddle with the TB graph. To do that you'd need to do one of:
>  * explicit checks for a "should we exit" flag at backwards and
> indirect branches and every few hundred insns. This is extra
> straight-line-code overhead, but on the other hand you get to
> avoid having all your cached next-tb links trashed every time
> something has to call cpu_exit(), so it's not totally obvious that
> it would be a net loss
>  * have cpu_exit() force itself to be running in the thread for that
> virtual CPU by sending a signal, to avoid the "thread has executed
> its way out of the TB" problem that currently requires us to trace
> through the whole TB call graph. Then we could do something
> simpler and atomic/reentrant to stop the cpu rather than chasing
> and editing linked lists
> 
> I think on balance I maybe favour the last one, but I'm not
> sure. Does anybody have an opinion?

Sounds reasonable to me.

Brainstorming:
Would per-thread data-structures make any sense ?


-- JSM

-- 
qemu-arm segfaults executing msgmerge (gettext)
https://bugs.launchpad.net/bugs/668799
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New

Bug description:
upstream qemu.git revision b45e9c05dbacba8e992f0bffeca04c6379c3ad45

Starting program: /usr/bin/qemu-arm msgmerge-static ar.po anjuta.pot

[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff4bc3ff0 (LWP 26108)]
[New Thread 0x7ffff4b8aff0 (LWP 26109)]
[New Thread 0x7ffff4b51ff0 (LWP 26110)]
[New Thread 0x7ffff4b18ff0 (LWP 26111)]
[New Thread 0x7ffff4adfff0 (LWP 26112)]
[New Thread 0x7ffff4aa6ff0 (LWP 26113)]
[New Thread 0x7ffff4a6dff0 (LWP 26114)]
[New Thread 0x7ffff4a34ff0 (LWP 26115)]
[New Thread 0x7ffff49fbff0 (LWP 26116)]
[New Thread 0x7ffff49c2ff0 (LWP 26117)]
[New Thread 0x7ffff4989ff0 (LWP 26118)]
[New Thread 0x7ffff4950ff0 (LWP 26119)]
[New Thread 0x7ffff4917ff0 (LWP 26120)]
[New Thread 0x7ffff48deff0 (LWP 26121)]
[New Thread 0x7ffff48a5ff0 (LWP 26122)]
[New Thread 0x7ffff486cff0 (LWP 26123)]
[New Thread 0x7ffff4833ff0 (LWP 26124)]
[New Thread 0x7ffff47faff0 (LWP 26125)]
[New Thread 0x7ffff47c1ff0 (LWP 26126)]
[New Thread 0x7ffff4788ff0 (LWP 26127)]
[New Thread 0x7ffff474fff0 (LWP 26128)]
[New Thread 0x7ffff4716ff0 (LWP 26129)]
[New Thread 0x7ffff46ddff0 (LWP 26130)]
.........................
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff4aa6ff0 (LWP 26113)]
0x00000000600480d4 in tb_reset_jump_recursive2 (tb=0x7ffff4c63540, n=0)
    at /home/user/git/qemu/exec.c:1333
1333                tb1 = tb1->jmp_next[n1];

(gdb) bt
#0  0x00000000600480d4 in tb_reset_jump_recursive2 (tb=0x7ffff4c63540, n=0)
    at /home/user/git/qemu/exec.c:1333
#1  0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c63540)
    at /home/user/git/qemu/exec.c:1361
#2  0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c634d8, n=0)
    at /home/user/git/qemu/exec.c:1355
#3  0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c634d8)
    at /home/user/git/qemu/exec.c:1361
#4  0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c63470, n=0)
    at /home/user/git/qemu/exec.c:1355
#5  0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c63470)
    at /home/user/git/qemu/exec.c:1361
#6  0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c63408, n=1)
    at /home/user/git/qemu/exec.c:1355
#7  0x00000000600481d1 in tb_reset_jump_recursive (tb=0x7ffff4c63408)
    at /home/user/git/qemu/exec.c:1362
#8  0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c633a0, n=0)
    at /home/user/git/qemu/exec.c:1355
#9  0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c633a0)
    at /home/user/git/qemu/exec.c:1361
#10 0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c63338, n=0)
    at /home/user/git/qemu/exec.c:1355
#11 0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c63338)
    at /home/user/git/qemu/exec.c:1361
#12 0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c632d0, n=0)
    at /home/user/git/qemu/exec.c:1355
---Type <return> to continue, or q <return> to quit---
#13 0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c632d0)
    at /home/user/git/qemu/exec.c:1361
#14 0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c63268, n=1)
    at /home/user/git/qemu/exec.c:1355
#15 0x00000000600481d1 in tb_reset_jump_recursive (tb=0x7ffff4c63268)
    at /home/user/git/qemu/exec.c:1362
#16 0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c63200, n=0)
    at /home/user/git/qemu/exec.c:1355
#17 0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c63200)
    at /home/user/git/qemu/exec.c:1361
#18 0x00000000600487c5 in cpu_unlink_tb (env=0x62385400) at /home/user/git/qemu/exec.c:1617
#19 0x00000000600488e8 in cpu_exit (env=0x62385400) at /home/user/git/qemu/exec.c:1662
#20 0x0000000060000798 in start_exclusive () at /home/user/git/qemu/linux-user/main.c:152
#21 0x0000000060000a4b in do_kernel_trap (env=0x62359940)
    at /home/user/git/qemu/linux-user/main.c:493
#22 0x00000000600023f3 in cpu_loop (env=0x62359940) at /home/user/git/qemu/linux-user/main.c:797
#23 0x00000000600123df in clone_func (arg=0x7ffffffd76e0)
    at /home/user/git/qemu/linux-user/syscall.c:3561
#24 0x00000000600b382d in start_thread (arg=<value optimized out>) at pthread_create.c:297
#25 0x00000000600f1809 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#26 0x0000000000000000 in ?? ()
(gdb) 



Its interesting to see this :
#0  0x00000000600480d4 in tb_reset_jump_recursive2 (tb=0x7ffff4c63540, n=0)
    at /home/user/git/qemu/exec.c:1333
        tb1 = 0x0                                           <<<<<<<<<<
        tb_next = 0xf4c63610                        <<<<<<<<<<
        ptb = 0x60341c91                              <<<<<<<<<<
        n1 = 0
#1  0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c63540)
    at /home/user/git/qemu/exec.c:1361
No locals.
#2  0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c634d8, n=0)
    at /home/user/git/qemu/exec.c:1355
        tb1 = 0x7ffff4c634d8                          <<<<<<<<<<<
        tb_next = 0x7ffff4c63540                    <<<<<<<<<<<
        ptb = 0x7ffff4c63860                           <<<<<<<<<<<
        n1 = 0
#3  0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c634d8)
    at /home/user/git/qemu/exec.c:1361
No locals.
#4  0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c63470, n=0)
    at /home/user/git/qemu/exec.c:1355
        tb1 = 0x7ffff4c63470
        tb_next = 0x7ffff4c634d8
        ptb = 0x7ffff4c63530
        n1 = 0
#5  0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c63470)
    at /home/user/git/qemu/exec.c:1361

  reply	other threads:[~2010-12-01 21:26 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-10-30 16:42 [Qemu-devel] [Bug 668799] [NEW] qemu-arm segfaults executing msgmerge (gettext) Jan-Simon Möller
2010-11-07 12:13 ` [Qemu-devel] [Bug 668799] " Jan-Simon Möller
2010-11-07 12:22 ` Jan-Simon Möller
2010-11-14 20:47 ` Peter Maydell
2010-11-14 21:32 ` Jan-Simon Möller
2010-11-14 21:35 ` Jan-Simon Möller
2010-11-19 17:50 ` Peter Maydell
2010-11-28  9:41 ` Brian Harring
2010-11-28 10:37 ` Brian Harring
2010-11-28 11:24   ` Peter Maydell
2010-11-28 17:31   ` Martin Mohring
2010-12-01 19:40 ` locking for TCG (was: Re: [Qemu-devel] [Bug 668799] Re: qemu-arm segfaults executing msgmerge (gettext)) Peter Maydell
2010-12-01 21:15   ` Jan-Simon Möller [this message]
2012-04-03 13:14 ` [Qemu-devel] [Bug 668799] Re: qemu-arm segfaults executing msgmerge (gettext) Cédric VINCENT
2012-11-28 13:11 ` Ying-Chun Liu
2013-01-13 12:00 ` Erik de Castro Lopo
2013-03-03 22:38 ` Peter Maydell
2013-03-07  4:32 ` Peter Maydell
2013-04-26 16:59 ` Peter Maydell
2013-06-13  9:33 ` Peter Maydell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201012012215.57673.dl9pf@gmx.de \
    --to=dl9pf@gmx.de \
    --cc=668799@bugs.launchpad.net \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.