* [Qemu-devel] [PATCH] sh4-linux-user: fix multi-threading regression.
@ 2012-03-26 13:07 Cédric VINCENT
  2012-03-26 17:23 ` Peter Maydell
  2012-03-30 13:36 ` [Qemu-devel] [PATCH] sh4-linux-user: fix multi-threading regression cedric.vincent
  0 siblings, 2 replies; 6+ messages in thread
From: Cédric VINCENT @ 2012-03-26 13:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: Cédric VINCENT, Riku Voipio
This reverts commit fd4bab10 "target-sh4: optimize exceptions": the
function cpu_restore_state() isn't expected to be called in user-mode,
as a consequence it isn't protected from race conditions.  For
information, syscalls are exceptions on Linux/SH4.
There were two possible fixes: either "tb_lock" is acquired/released
around the call to cpu_restore_state() [1] or the commit that
introduced this regression is reverted [2].  The performance impact of
these two approaches were compared with two different tests:
    +-------------------------+--------+-----------------+-------------------+
    |                         | v1.0.1 | [1]      v1.0.1 | [2]        v1.0.1 |
    | Test                    |        | + tb_lock patch | + reverted commit |
    +-------------------------+--------+-----------------+-------------------+
    | bzip2 17MB.mp3 (v1.0.5) |   ~46s |            ~46s |              ~46s |
    | df_dok/X11 (v1.4.12)    |  crash |           ~257s |             ~256s |
    +-------------------------+--------+-----------------+-------------------+
Since I didn't see any performance impact, I prefered not to add extra
calls to spin_lock/spin_unlock that were specific to the user-mode.
Signed-off-by: Cédric VINCENT <cedric.vincent@st.com>
---
 target-sh4/op_helper.c |   22 ++++++++++------------
 target-sh4/translate.c |    5 +++++
 2 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/target-sh4/op_helper.c b/target-sh4/op_helper.c
index b299576..b8b6e4a 100644
--- a/target-sh4/op_helper.c
+++ b/target-sh4/op_helper.c
@@ -84,31 +84,28 @@ void helper_ldtlb(void)
 #endif
 }
 
-static inline void raise_exception(int index, void *retaddr)
-{
-    env->exception_index = index;
-    cpu_restore_state_from_retaddr(retaddr);
-    cpu_loop_exit(env);
-}
-
 void helper_raise_illegal_instruction(void)
 {
-    raise_exception(0x180, GETPC());
+    env->exception_index = 0x180;
+    cpu_loop_exit(env);
 }
 
 void helper_raise_slot_illegal_instruction(void)
 {
-    raise_exception(0x1a0, GETPC());
+    env->exception_index = 0x1a0;
+    cpu_loop_exit(env);
 }
 
 void helper_raise_fpu_disable(void)
 {
-    raise_exception(0x800, GETPC());
+  env->exception_index = 0x800;
+  cpu_loop_exit(env);
 }
 
 void helper_raise_slot_fpu_disable(void)
 {
-    raise_exception(0x820, GETPC());
+  env->exception_index = 0x820;
+  cpu_loop_exit(env);
 }
 
 void helper_debug(void)
@@ -129,7 +126,8 @@ void helper_sleep(uint32_t next_pc)
 void helper_trapa(uint32_t tra)
 {
     env->tra = tra << 2;
-    raise_exception(0x160, GETPC());
+    env->exception_index = 0x160;
+    cpu_loop_exit(env);
 }
 
 void helper_movcal(uint32_t address, uint32_t value)
diff --git a/target-sh4/translate.c b/target-sh4/translate.c
index e04a6e0..5f4ad0e 100644
--- a/target-sh4/translate.c
+++ b/target-sh4/translate.c
@@ -466,6 +466,7 @@ static inline void gen_store_fpr64 (TCGv_i64 t, int reg)
 #define CHECK_NOT_DELAY_SLOT \
   if (ctx->flags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL))     \
   {                                                           \
+      tcg_gen_movi_i32(cpu_pc, ctx->pc);                      \
       gen_helper_raise_slot_illegal_instruction();            \
       ctx->bstate = BS_EXCP;                                  \
       return;                                                 \
@@ -473,6 +474,7 @@ static inline void gen_store_fpr64 (TCGv_i64 t, int reg)
 
 #define CHECK_PRIVILEGED                                        \
   if (IS_USER(ctx)) {                                           \
+      tcg_gen_movi_i32(cpu_pc, ctx->pc);                        \
       if (ctx->flags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) { \
          gen_helper_raise_slot_illegal_instruction();           \
       } else {                                                  \
@@ -484,6 +486,7 @@ static inline void gen_store_fpr64 (TCGv_i64 t, int reg)
 
 #define CHECK_FPU_ENABLED                                       \
   if (ctx->flags & SR_FD) {                                     \
+      tcg_gen_movi_i32(cpu_pc, ctx->pc);                        \
       if (ctx->flags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) { \
           gen_helper_raise_slot_fpu_disable();                  \
       } else {                                                  \
@@ -1384,6 +1387,7 @@ static void _decode_opc(DisasContext * ctx)
 	{
 	    TCGv imm;
 	    CHECK_NOT_DELAY_SLOT
+	    tcg_gen_movi_i32(cpu_pc, ctx->pc);
 	    imm = tcg_const_i32(B7_0);
 	    gen_helper_trapa(imm);
 	    tcg_temp_free(imm);
@@ -1881,6 +1885,7 @@ static void _decode_opc(DisasContext * ctx)
 	    ctx->opcode, ctx->pc);
     fflush(stderr);
 #endif
+    tcg_gen_movi_i32(cpu_pc, ctx->pc);
     if (ctx->flags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) {
        gen_helper_raise_slot_illegal_instruction();
     } else {
-- 
1.7.5.1
^ permalink raw reply related	[flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH] sh4-linux-user: fix multi-threading regression.
  2012-03-26 13:07 [Qemu-devel] [PATCH] sh4-linux-user: fix multi-threading regression Cédric VINCENT
@ 2012-03-26 17:23 ` Peter Maydell
  2012-03-27  8:15   ` cedric.vincent
  2012-04-03 12:28   ` [Qemu-devel] linux-user: threading issue in the signal handler (Was: [PATCH] sh4-linux-user: fix multi-threading) regression cedric.vincent
  2012-03-30 13:36 ` [Qemu-devel] [PATCH] sh4-linux-user: fix multi-threading regression cedric.vincent
  1 sibling, 2 replies; 6+ messages in thread
From: Peter Maydell @ 2012-03-26 17:23 UTC (permalink / raw)
  To: Cédric VINCENT; +Cc: Riku Voipio, qemu-devel, Aurelien Jarno
2012/3/26 Cédric VINCENT <cedric.vincent@st.com>:
> This reverts commit fd4bab10 "target-sh4: optimize exceptions":
[cc'ing Aurelien as the author of that commit]
> the function cpu_restore_state() isn't expected to be called in user-mode,
Is this really true? host_signal_handler() calls cpu_signal_handler()
calls handle_cpu_signala) calls cpu_restore_state() so hopefully
it's OK to be called in at least *some* situations...
> as a consequence it isn't protected from race conditions.  For
> information, syscalls are exceptions on Linux/SH4.
>
> There were two possible fixes: either "tb_lock" is acquired/released
> around the call to cpu_restore_state() [1] or the commit that
> introduced this regression is reverted [2].
Can you explain a bit further what the race condition is that occurs
here?
NB the whole tb_lock thing is broken anyway.
thanks
-- PMM
^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH] sh4-linux-user: fix multi-threading regression.
  2012-03-26 17:23 ` Peter Maydell
@ 2012-03-27  8:15   ` cedric.vincent
  2012-04-03 12:28   ` [Qemu-devel] linux-user: threading issue in the signal handler (Was: [PATCH] sh4-linux-user: fix multi-threading) regression cedric.vincent
  1 sibling, 0 replies; 6+ messages in thread
From: cedric.vincent @ 2012-03-27  8:15 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Riku Voipio, qemu-devel, Aurelien Jarno
On Mon, Mar 26, 2012 at 07:23:58PM +0200, Peter Maydell wrote:
> 2012/3/26 Cédric VINCENT <cedric.vincent@st.com>:
> > This reverts commit fd4bab10 "target-sh4: optimize exceptions":
> 
> [cc'ing Aurelien as the author of that commit]
> 
> > the function cpu_restore_state() isn't expected to be called in user-mode,
> 
> Is this really true? host_signal_handler() calls cpu_signal_handler()
> calls handle_cpu_signala) calls cpu_restore_state() so hopefully
> it's OK to be called in at least *some* situations...
> 
> > as a consequence it isn't protected from race conditions.  For
> > information, syscalls are exceptions on Linux/SH4.
> >
> > There were two possible fixes: either "tb_lock" is acquired/released
> > around the call to cpu_restore_state() [1] or the commit that
> > introduced this regression is reverted [2].
> 
> Can you explain a bit further what the race condition is that occurs
> here?
Here is what I observed on a "real" application:
thread2: "jump to a new block"       | thread1: "do a syscall"
  cpu_exec [tb_lock acquired]        |   helper_trapa
    tb_find_fast                     |     raise_exception
      tb_find_slow                   |       cpu_restore_state_from_retaddr
        tb_gen_code                  |         cpu_translate_state
          cpu_gen_code               |            gen_intermediate_code_pc
            gen_intermediate_code_pc | /!\ thread1 and thread2 modify
                                     |     gen_opc_buf[] at the same time
                                     |     since thread1 didn't acquire tb_lock.
Actually you can reproduce easily this race-condition with the source
code below, where both threads do syscalls repeatedly:
8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>
    void *routine(void *arg)
    {
        int thread_number = (int)arg;
        while (1) {
            printf("hello, I'm %d\n", thread_number);
            sleep(1);
       }
    }
    int main(void)
    {
        int status;
        pthread_t thread;
        status = pthread_create(&thread, NULL, routine, (void *)2);
        if (status) {
            puts("can't create a thread, bye!");
            return 1;
        }
        routine((void *)1);
        return 0; /* Never reached.  */
    }
8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----
Before the fix:
    $ qemu-sh4 ./a.out                                   
    hello, I'm 1
    hello, I'm 2
    hello, I'm 1
    hello, I'm 2
    hello, I'm 1
    qemu: uncaught target signal 11 (Segmentation fault) - core dumped
    Segmentation fault
    $ qemu-sh4 ./a.out
    qemu/tcg/tcg.c:1369: tcg fatal error
    qemu: uncaught target signal 11 (Segmentation fault) - core dumped
    Segmentation fault
^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH] sh4-linux-user: fix multi-threading regression.
  2012-03-26 13:07 [Qemu-devel] [PATCH] sh4-linux-user: fix multi-threading regression Cédric VINCENT
  2012-03-26 17:23 ` Peter Maydell
@ 2012-03-30 13:36 ` cedric.vincent
  1 sibling, 0 replies; 6+ messages in thread
From: cedric.vincent @ 2012-03-30 13:36 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell; +Cc: Riku Voipio, Aurelien Jarno
On Mon, Mar 26, 2012 at 03:07:18PM +0200, Cedric VINCENT wrote:
> This reverts commit fd4bab10 "target-sh4: optimize exceptions": the
> function cpu_restore_state() isn't expected to be called in user-mode,
> as a consequence it isn't protected from race conditions.  For
> information, syscalls are exceptions on Linux/SH4.
> 
> There were two possible fixes: either "tb_lock" is acquired/released
> around the call to cpu_restore_state() [1] or the commit that
> introduced this regression is reverted [2].
I will submit a new version of this patch where the commit fd4bab10
isn't reverted and the call to cpu_restore_state() is protected by
tb_lock instead.  Actually most of the SH4 FPU helpers may call
cpu_restore_state() indirectly, this is the same problem as with
helper_trapa().
Regards,
Cédric.
^ permalink raw reply	[flat|nested] 6+ messages in thread
* [Qemu-devel] linux-user: threading issue in the signal handler (Was: [PATCH] sh4-linux-user: fix multi-threading) regression.
  2012-03-26 17:23 ` Peter Maydell
  2012-03-27  8:15   ` cedric.vincent
@ 2012-04-03 12:28   ` cedric.vincent
  2012-04-03 12:48     ` Peter Maydell
  1 sibling, 1 reply; 6+ messages in thread
From: cedric.vincent @ 2012-04-03 12:28 UTC (permalink / raw)
  To: Peter Maydell, Riku Voipio; +Cc: qemu-devel, Aurelien Jarno
On Mon, Mar 26, 2012 at 07:23:58PM +0200, Peter Maydell wrote:
> 2012/3/26 Cédric VINCENT <cedric.vincent@st.com>:
> > the function cpu_restore_state() isn't expected to be called in user-mode,
> 
> Is this really true? host_signal_handler() calls cpu_signal_handler()
> calls handle_cpu_signala) calls cpu_restore_state() so hopefully
> it's OK to be called in at least *some* situations...
Actually it's not that OK, the code below [1] exposes a threading
issue in this specific part of QEMU:
    * the first thread makes invalid memory references, that way QEMU
      reaches the given situation (host_signal_handler ->
      cpu_signal_handler -> handle_cpu_signal -> cpu_restore_state ->
      gen_intermediate_code) without acquiring "tb_lock".
    * at the same time, the second thread executes code that wasn't
      translated before, that way the TCG is invoked with "tb_lock"
      acquired.  Note that in this example I used a self modifying
      block just to simulate a not-yet-translated code.
This example triggers an invalid memory reference and/or an assertion
in the TCG part of QEMU.
> NB the whole tb_lock thing is broken anyway.
Because of such bugs or are there other reasons?  Maybe we could add a
simple sanity check in gen_intermediate_code() functions to be sure
that "tb_lock" is acquired when they are called.
Regards,
Cédric.
[1] I wrote this example for ARM and SH4, but this problem appears for
    any guest CPU:
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <signal.h>
#include <sys/syscall.h>
#include <signal.h>
#include <stdlib.h>
#include <strings.h>
#include <stdint.h>
pid_t gettid(void)
{
	return syscall(SYS_gettid);
}
void handler(int signo)
{
	printf("Thread %d received signal %d\n", gettid(), signo);
}
void *routine(void *arg)
{
	pid_t tid = (pid_t)arg;
	while (1) {
            *(int *)NULL = 1;
            sleep(1);
        }
}
int main(void)
{
    struct sigaction action;
    pthread_t thread;
    int status;
    bzero(&action, sizeof(action));
    action.sa_handler = handler;
    sigfillset(&action.sa_mask);
    status = sigaction(SIGSEGV, &action, NULL);
    if (status < 0) {
        puts("can't regsiter sighandler, bye!");
        exit(EXIT_FAILURE);
    }
    status = pthread_create(&thread, NULL, routine, (void *)gettid());
    if (status) {
	    puts("can't create a thread, bye!");
	    exit(EXIT_FAILURE);
    }
#if defined(__SH4__)
    uint8_t self_modifying_code[] = {
        // _start
        0x00, 0xc7,  // mova    @(4, pc), r0
        0x01, 0x61,  // mov.w   @r0, r1
        0x11, 0x20,  // mov.w   r1, @r0
        0xfb, 0xaf,  // bra     _start
        0x09, 0x00,  // nop
    };
    asm volatile ("jmp @%0 \n nop" : : "r" (self_modifying_code));
#elif defined(__arm__)
    uint32_t self_modifying_code[] = {
        // _start:
        0xe1a0000f,  // mov     r0, pc
        0xe5901000,  // ldr     r1, [r0]
        0xe5801000,  // str     r1, [r0]
        0xeafffffb,  // b       _start
    };
    asm volatile ("mov pc, %0" : : "r" (self_modifying_code));
#else
    #error "unsupported architecture."
#endif
    puts("should'nt be reached, bye!");
    exit(EXIT_FAILURE);
}
^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: [Qemu-devel] linux-user: threading issue in the signal handler (Was: [PATCH] sh4-linux-user: fix multi-threading) regression.
  2012-04-03 12:28   ` [Qemu-devel] linux-user: threading issue in the signal handler (Was: [PATCH] sh4-linux-user: fix multi-threading) regression cedric.vincent
@ 2012-04-03 12:48     ` Peter Maydell
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Maydell @ 2012-04-03 12:48 UTC (permalink / raw)
  To: cedric.vincent, Peter Maydell, Riku Voipio, qemu-devel,
	Aurelien Jarno
On 3 April 2012 13:28,  <cedric.vincent@st.com> wrote:
> On Mon, Mar 26, 2012 at 07:23:58PM +0200, Peter Maydell wrote:
>> 2012/3/26 Cédric VINCENT <cedric.vincent@st.com>:
>> > the function cpu_restore_state() isn't expected to be called in user-mode,
>>
>> Is this really true? host_signal_handler() calls cpu_signal_handler()
>> calls handle_cpu_signala) calls cpu_restore_state() so hopefully
>> it's OK to be called in at least *some* situations...
>
> Actually it's not that OK, the code below [1] exposes a threading
> issue in this specific part of QEMU:
>
>    * the first thread makes invalid memory references, that way QEMU
>      reaches the given situation (host_signal_handler ->
>      cpu_signal_handler -> handle_cpu_signal -> cpu_restore_state ->
>      gen_intermediate_code) without acquiring "tb_lock".
>
>    * at the same time, the second thread executes code that wasn't
>      translated before, that way the TCG is invoked with "tb_lock"
>      acquired.  Note that in this example I used a self modifying
>      block just to simulate a not-yet-translated code.
>
> This example triggers an invalid memory reference and/or an assertion
> in the TCG part of QEMU.
Mmm, I rather suspected you had a larger problem than the one
you first encountered.
>> NB the whole tb_lock thing is broken anyway.
>
> Because of such bugs or are there other reasons?  Maybe we could add a
> simple sanity check in gen_intermediate_code() functions to be sure
> that "tb_lock" is acquired when they are called.
Eg https://bugs.launchpad.net/qemu/+bug/668799
Basically at the moment there are places where we try to update
the TB graph in functions called from signal handlers. This isn't
safe because we can't take a lock in the signal handler (we'd deadlock).
Instead we do no locking at all and if you make heavy use of threads
and signals we crash.
-- PMM
^ permalink raw reply	[flat|nested] 6+ messages in thread
end of thread, other threads:[~2012-04-03 12:48 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-26 13:07 [Qemu-devel] [PATCH] sh4-linux-user: fix multi-threading regression Cédric VINCENT
2012-03-26 17:23 ` Peter Maydell
2012-03-27  8:15   ` cedric.vincent
2012-04-03 12:28   ` [Qemu-devel] linux-user: threading issue in the signal handler (Was: [PATCH] sh4-linux-user: fix multi-threading) regression cedric.vincent
2012-04-03 12:48     ` Peter Maydell
2012-03-30 13:36 ` [Qemu-devel] [PATCH] sh4-linux-user: fix multi-threading regression cedric.vincent
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).