qemu-devel.nongnu.org archive mirror
* qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
@ 2023-06-23 11:09 Anushree Mathur
  2023-06-23 13:46 ` Cédric Le Goater
  2023-06-24 14:29 ` Michael Tokarev
  0 siblings, 2 replies; 7+ messages in thread
From: Anushree Mathur @ 2023-06-23 11:09 UTC (permalink / raw)
  To: qemu-ppc, qemu-devel, richard.henderson, alex.bennee
  Cc: Daniel Henrique Barboza, Nicholas Piggin, harshpb

Hi everyone,

I was trying to boot a RHEL 9.3 image with upstream qemu-system-ppc64 and
the -smp 2 option, and observed a segfault (QEMU crash).

qemu command line used:

qemu-system-ppc64 -name Rhel9.3.ppc64le -smp 2 -m 16G -vga none 
-nographic -machine pseries -cpu POWER10 -accel tcg -device 
virtio-scsi-pci -drive file=/home/rh93.qcow2,if=none,format=qcow2,id=hd0 
-device scsi-hd,drive=hd0 -boot c

After running git bisect, I found that the first bad commit, which
introduced this issue, is the one below:

[qemu]# git bisect good
20b6643324a79860dcdfe811ffe4a79942bca21e is the first bad commit
commit 20b6643324a79860dcdfe811ffe4a79942bca21e
Author: Richard Henderson <richard.henderson@linaro.org>
Date:   Mon Dec 5 17:45:02 2022 -0600

     tcg/ppc: Reorg goto_tb implementation

     The old ppc64 implementation replaces 2 or 4 insns, which leaves a race
     condition in which a thread could be stopped at a PC in the middle of
     the sequence, and when restarted does not see the complete address
     computation and branches to nowhere.

     The new implemetation replaces only one insn, swapping between

             b       <dest>
     and
             mtctr   r31

     falling through to a general-case indirect branch.

     Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
     Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

  tcg/ppc/tcg-target.c.inc | 152 +++++++++++++----------------------------------
  tcg/ppc/tcg-target.h     |   3 +-
  2 files changed, 41 insertions(+), 114 deletions(-)
[qemu]#
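
For illustration, the single-instruction choice described in the commit
message can be sketched as below. This is a minimal Python sketch and not
QEMU's code: the function names are made up for this example, the encodings
are the standard Power ISA ones for `b` and `mtctr`, and r31 is used only
because the commit message names it. Because only one aligned 32-bit word
changes, a thread running concurrently would be expected to see either the
old instruction or the new one, never a half-updated sequence.

```python
# Illustrative only: hypothetical helper names, not taken from QEMU.
B_OPCODE = 0x48000000    # I-form "b" with AA=0 and LK=0
MTCTR_R0 = 0x7C0903A6    # "mtctr r0" (mtspr to CTR); source register sits at bit 21


def in_range_b(diff: int) -> bool:
    """True if a byte displacement fits the signed 26-bit field of `b`."""
    return -(1 << 25) <= diff < (1 << 25) and diff % 4 == 0


def encode_b(diff: int) -> int:
    """Encode a relative `b` to a word-aligned displacement."""
    return B_OPCODE | (diff & 0x03FFFFFC)


def encode_mtctr(reg: int) -> int:
    """Encode `mtctr reg`."""
    return MTCTR_R0 | (reg << 21)


def goto_tb_insn(jmp_addr: int, dest: int, reg: int = 31) -> int:
    """Pick the single instruction word to store at the patch site."""
    diff = dest - jmp_addr
    if in_range_b(diff):
        return encode_b(diff)     # direct branch when the target is close enough
    return encode_mtctr(reg)      # otherwise fall through to the indirect branch


print(hex(goto_tb_insn(0x1000, 0x1010)))
print(hex(goto_tb_insn(0x1000, 0x1000 + (1 << 25))))
```

The second call is just past the reach of `b` (plus or minus 32 MiB), so the
sketch selects the `mtctr r31` word instead.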

Can someone please take a look and suggest a fix to resolve this issue?

Thanks in advance.
Regards,
Anushree-Mathur




* Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
  2023-06-23 11:09 qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e Anushree Mathur
@ 2023-06-23 13:46 ` Cédric Le Goater
  2023-06-23 15:22   ` Alex Bennée
  2023-06-26  5:17   ` Anushree Mathur
  2023-06-24 14:29 ` Michael Tokarev
  1 sibling, 2 replies; 7+ messages in thread
From: Cédric Le Goater @ 2023-06-23 13:46 UTC (permalink / raw)
  To: Anushree Mathur, qemu-ppc, qemu-devel, richard.henderson,
	alex.bennee
  Cc: Daniel Henrique Barboza, Nicholas Piggin, harshpb

Hello Anushree,

On 6/23/23 13:09, Anushree Mathur wrote:
> Hi everyone,
> 
> I was trying to boot rhel9.3 image with upstream qemu-system-ppc64 -smp 2 option and observed a segfault (qemu crash).
> 
> qemu command line used:
> 
> qemu-system-ppc64 -name Rhel9.3.ppc64le -smp 2 -m 16G -vga none -nographic -machine pseries -cpu POWER10 -accel tcg -device virtio-scsi-pci -drive file=/home/rh93.qcow2,if=none,format=qcow2,id=hd0 -device scsi-hd,drive=hd0 -boot c
> 
> After doing a git bisect, I found the first bad commit which introduced this issue is below:

Could you please open a GitLab issue on the QEMU project?

  https://gitlab.com/qemu-project/qemu/-/issues

Thanks,

C.





* Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
  2023-06-23 13:46 ` Cédric Le Goater
@ 2023-06-23 15:22   ` Alex Bennée
  2023-07-12  8:34     ` Anushree Mathur
  2023-06-26  5:17   ` Anushree Mathur
  1 sibling, 1 reply; 7+ messages in thread
From: Alex Bennée @ 2023-06-23 15:22 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Anushree Mathur, qemu-ppc, qemu-devel, richard.henderson,
	Daniel Henrique Barboza, Nicholas Piggin, harshpb


Cédric Le Goater <clg@kaod.org> writes:

> Hello Anushree,
>
> On 6/23/23 13:09, Anushree Mathur wrote:
>> Hi everyone,
>> I was trying to boot rhel9.3 image with upstream qemu-system-ppc64
>> -smp 2 option and observed a segfault (qemu crash).
>> qemu command line used:
>> qemu-system-ppc64 -name Rhel9.3.ppc64le -smp 2 -m 16G -vga none
>> -nographic -machine pseries -cpu POWER10 -accel tcg -device
>> virtio-scsi-pci -drive
>> file=/home/rh93.qcow2,if=none,format=qcow2,id=hd0 -device
>> scsi-hd,drive=hd0 -boot c
>> After doing a git bisect, I found the first bad commit which
>> introduced this issue is below:
>
> Could you please open a gitlab issue on QEMU project ?
>
>  https://gitlab.com/qemu-project/qemu/-/issues

Is it broken generated code that faults or does the goto_tb code break
the execution sequence in some subtle way further down the line?

If you can isolate the guest address, the output from:

  -dfilter 0xBADADDR+0x100 -d in_asm,op,out_asm

would be useful for the bug report. Although conceivably the out_asm
output might make sense at translation time and then be broken when it
is patched. Having rr on power would be really useful to debug this sort
of thing. 
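
As a small illustration of how those options could be assembled once a
guest address is known, here is a hedged Python sketch. The helper name, the
example address, and the log file name are placeholders made up for this
example; `-D` (write the log to a file) is an addition that is not part of
the suggestion above.

```python
# Illustrative only: builds the extra QEMU arguments for a filtered TCG log.
def tcg_debug_args(guest_addr: int, length: int = 0x100,
                   logfile: str = "tcg.log") -> list[str]:
    """Return -d/-dfilter/-D arguments covering [guest_addr, guest_addr + length)."""
    return [
        "-d", "in_asm,op,out_asm",                     # log items named in the request
        "-dfilter", f"{guest_addr:#x}+{length:#x}",    # start+length range syntax
        "-D", logfile,                                 # assumption: log to a file
    ]


# 0xc000000000001234 is a placeholder, not an address taken from this report.
print(" ".join(tcg_debug_args(0xC000000000001234)))
```

The resulting list would be appended to the qemu-system-ppc64 command line
already used to reproduce the crash.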



-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



* Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
  2023-06-23 11:09 qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e Anushree Mathur
  2023-06-23 13:46 ` Cédric Le Goater
@ 2023-06-24 14:29 ` Michael Tokarev
  1 sibling, 0 replies; 7+ messages in thread
From: Michael Tokarev @ 2023-06-24 14:29 UTC (permalink / raw)
  To: Anushree Mathur, qemu-ppc, qemu-devel, richard.henderson,
	alex.bennee
  Cc: Daniel Henrique Barboza, Nicholas Piggin, harshpb

23.06.2023 14:09, Anushree Mathur wrote:
> Hi everyone,
> 
> I was trying to boot rhel9.3 image with upstream qemu-system-ppc64 -smp 2 option and observed a segfault (qemu crash).
> 
> qemu command line used:
> 
> qemu-system-ppc64 -name Rhel9.3.ppc64le -smp 2 -m 16G -vga none -nographic -machine pseries -cpu POWER10 -accel tcg -device virtio-scsi-pci -drive 
> file=/home/rh93.qcow2,if=none,format=qcow2,id=hd0 -device scsi-hd,drive=hd0 -boot c
> 
> After doing a git bisect, I found the first bad commit which introduced this issue is below:
> 
> [qemu]# git bisect good
> 20b6643324a79860dcdfe811ffe4a79942bca21e is the first bad commit
> commit 20b6643324a79860dcdfe811ffe4a79942bca21e
> Author: Richard Henderson <richard.henderson@linaro.org>
> Date:   Mon Dec 5 17:45:02 2022 -0600
> 
>      tcg/ppc: Reorg goto_tb implementation

I've got another case which leads to this same commit, with similar results,
on a Debian ppc64 machine with QEMU 8.0 and master.

The crash doesn't happen every time; sometimes it needs 20+ iterations
to trigger (so my bisection was rather painful, initially pointing to
an entirely innocent commit).  So far it only occurs on an actual ppc64
machine: I wasn't able to reproduce it on amd64.
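
Because the failure is intermittent, each bisection step needs many runs
before a commit can be called good. A minimal Python sketch of such a
repeat-until-failure loop follows; the function name, the iteration count,
and the timeout handling are assumptions for this example, not the procedure
actually used here.

```python
import subprocess


def run_until_failure(cmd, max_iters=30, timeout=None):
    """Run cmd up to max_iters times.

    Returns (iteration, returncode) for the first non-zero exit, or None if
    every run exited with 0. On POSIX a negative returncode means the process
    was killed by that signal number (for example -11 for SIGSEGV).
    """
    for i in range(1, max_iters + 1):
        try:
            rc = subprocess.run(
                cmd,
                stdin=subprocess.DEVNULL,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            ).returncode
        except subprocess.TimeoutExpired:
            # Assumption: a run that survives until the timeout did not crash.
            continue
        if rc != 0:
            return i, rc
    return None
```

A wrapper for `git bisect run` could call this with the QEMU command line
and exit with 1 when a failure is returned and 0 otherwise, so that a commit
is only marked good after all iterations pass.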

Most often it ends with SIGSEGV, but sometimes it fails with Illegal
Instruction instead.  Examining it with gdb, it looks more like stack
corruption.

I triggered it by just booting a Linux system. When it fails, most often
it fails somewhere near the end of boot, but sometimes it fails the moment
the kernel spawns /init from the initramfs and that one (a shell script)
executes its first program.



[  OK  ] Finished systemd-journal-f…ush Journal to Persistent Storage.
          Starting systemd-tmpfiles-… Volatile Files and Directories...
[  OK  ] Finished systemd-udev-trig…e - Coldplug All udev Devices.
[  OK  ] Finished systemd-tmpfiles-…te Volatile Files and Directories.
          Starting systemd-resolved.…e - Network Name Resolution...
          Starting systemd-update-ut…rd System Boot/Shutdown in UTMP...
[  OK  ] Started systemd-udevd.serv…nager for Device Events and Files.
          Starting systemd-networkd.…ice - Network Configuration...
Segmentation fault (core dumped)

...
Core was generated by `qemu-system-ppc64 -append root=LABEL=debvm rw -nographic -smp 2 -machine accel='.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fff3462395c in code_gen_buffer ()
[Current thread is 1 (Thread 0x7fff79c6e7c0 (LWP 922586))]
(gdb) bt
#0  0x00007fff3462395c in code_gen_buffer ()
#1  0x00000001076cbd2c in cpu_tb_exec (cpu=cpu@entry=0x1001d98b320, itb=itb@entry=0x7fff4b378480 <code_gen_buffer+383812548>,
     tb_exit=tb_exit@entry=0x7fff79c6d8c0) at accel/tcg/cpu-exec.c:460
#2  0x00000001076cc348 in cpu_loop_exec_tb (tb_exit=0x7fff79c6d8c0, last_tb=<synthetic pointer>, pc=140736355546736,
     tb=0x7fff4b378480 <code_gen_buffer+383812548>, cpu=<optimized out>) at accel/tcg/cpu-exec.c:893
#3  cpu_exec_loop (cpu=cpu@entry=0x1001d98b320, sc=sc@entry=0x7fff79c6da10) at accel/tcg/cpu-exec.c:1013
#4  0x00000001076ccd98 in cpu_exec_setjmp (cpu=cpu@entry=0x1001d98b320, sc=sc@entry=0x7fff79c6da10)
     at accel/tcg/cpu-exec.c:1043
#5  0x00000001076cd5ec in cpu_exec (cpu=0x1001d98b320) at accel/tcg/cpu-exec.c:1069
#6  0x0000000107705d30 in tcg_cpus_exec (cpu=0x1001d98b320) at accel/tcg/tcg-accel-ops.c:81
#7  0x0000000107705f20 in mttcg_cpu_thread_fn (arg=0x1001d98b320) at accel/tcg/tcg-accel-ops-mttcg.c:95
#8  0x000000010793ed7c in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:541
#9  0x00007fff81673d0c in ?? () from /lib/powerpc64le-linux-gnu/libc.so.6
#10 0x00007fff81724350 in clone () from /lib/powerpc64le-linux-gnu/libc.so.6

(gdb) l
32	
33	int qemu_default_main(void)
34	{
35	   int status;
36	
37	   status = qemu_main_loop();
38	   qemu_cleanup();
39	
40	   return status;
41	}

(gdb) frame 1
#1  0x00000001076cbd2c in cpu_tb_exec (cpu=cpu@entry=0x1001d98b320, itb=itb@entry=0x7fff4b378480 <code_gen_buffer+383812548>,
     tb_exit=tb_exit@entry=0x7fff79c6d8c0) at accel/tcg/cpu-exec.c:460
460	   ret = tcg_qemu_tb_exec(env, tb_ptr);
(gdb) l
455	   if (qemu_loglevel_mask(CPU_LOG_TB_CPU | CPU_LOG_EXEC)) {
456	       log_cpu_exec(log_pc(cpu, itb), cpu, itb);
457	   }
458	
459	   qemu_thread_jit_execute();
460	   ret = tcg_qemu_tb_exec(env, tb_ptr);
461	   cpu->can_do_io = 1;
462	   qemu_plugin_disable_mem_helpers(cpu);
463	   /*
464	    * TODO: Delay swapping back to the read-write region of the TB


(this is 8.0.2, the same happens with master).

Here, frame #0 appears corrupt.

In other attempts, the stack frame is sometimes so corrupt that gdb can't
decode it at all.

I need help debugging this further.

Thanks,

/mjt



* Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
  2023-06-23 13:46 ` Cédric Le Goater
  2023-06-23 15:22   ` Alex Bennée
@ 2023-06-26  5:17   ` Anushree Mathur
  2023-06-26  6:18     ` Cédric Le Goater
  1 sibling, 1 reply; 7+ messages in thread
From: Anushree Mathur @ 2023-06-26  5:17 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, Daniel Henrique Barboza, Nicholas Piggin,
	harshpb, richard.henderson, alex.bennee


On 6/23/23 19:16, Cédric Le Goater wrote:
> Hello Anushree,
>
> On 6/23/23 13:09, Anushree Mathur wrote:
>> Hi everyone,
>>
>> I was trying to boot rhel9.3 image with upstream qemu-system-ppc64 
>> -smp 2 option and observed a segfault (qemu crash).
>>
>> qemu command line used:
>>
>> qemu-system-ppc64 -name Rhel9.3.ppc64le -smp 2 -m 16G -vga none 
>> -nographic -machine pseries -cpu POWER10 -accel tcg -device 
>> virtio-scsi-pci -drive 
>> file=/home/rh93.qcow2,if=none,format=qcow2,id=hd0 -device 
>> scsi-hd,drive=hd0 -boot c
>>
>> After doing a git bisect, I found the first bad commit which 
>> introduced this issue is below:
>
> Could you please open a gitlab issue on QEMU project ?
>
>  https://gitlab.com/qemu-project/qemu/-/issues
>
> Thanks,
>
> C.
Hello Cédric,

As per your mail, I have created the GitLab issue
https://gitlab.com/qemu-project/qemu/-/issues/1726.

Thanks & Regards,

Anushree-Mathur

>



* Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
  2023-06-26  5:17   ` Anushree Mathur
@ 2023-06-26  6:18     ` Cédric Le Goater
  0 siblings, 0 replies; 7+ messages in thread
From: Cédric Le Goater @ 2023-06-26  6:18 UTC (permalink / raw)
  To: Anushree Mathur
  Cc: qemu-ppc, qemu-devel, Daniel Henrique Barboza, Nicholas Piggin,
	harshpb, richard.henderson, alex.bennee

Hello Anushree


> Hello Cedric,
>
> As per your mail, I have created the gitlab issue https://gitlab.com/qemu-project/qemu/-/issues/1726.

Alex had a request for the bug report. If you have the time to provide
the data, it should help in analyzing the issue.

Thanks,

C.




* Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
  2023-06-23 15:22   ` Alex Bennée
@ 2023-07-12  8:34     ` Anushree Mathur
  0 siblings, 0 replies; 7+ messages in thread
From: Anushree Mathur @ 2023-07-12  8:34 UTC (permalink / raw)
  To: Alex Bennée, Cédric Le Goater
  Cc: qemu-ppc, qemu-devel, richard.henderson, Daniel Henrique Barboza,
	Nicholas Piggin, harshpb

Hi Alex,

On 6/23/23 20:52, Alex Bennée wrote:
> Cédric Le Goater <clg@kaod.org> writes:
>
>> Hello Anushree,
>>
>> On 6/23/23 13:09, Anushree Mathur wrote:
>>> Hi everyone,
>>> I was trying to boot rhel9.3 image with upstream qemu-system-ppc64
>>> -smp 2 option and observed a segfault (qemu crash).
>>> qemu command line used:
>>> qemu-system-ppc64 -name Rhel9.3.ppc64le -smp 2 -m 16G -vga none
>>> -nographic -machine pseries -cpu POWER10 -accel tcg -device
>>> virtio-scsi-pci -drive
>>> file=/home/rh93.qcow2,if=none,format=qcow2,id=hd0 -device
>>> scsi-hd,drive=hd0 -boot c
>>> After doing a git bisect, I found the first bad commit which
>>> introduced this issue is below:
>> Could you please open a gitlab issue on QEMU project ?
>>
>>   https://gitlab.com/qemu-project/qemu/-/issues
> Is it broken generated code that faults or does the goto_tb code break
> the execution sequence in some subtle way further down the line?
>
> If you can isolate the guest address the output from:
>
>    -dfilter 0xBADADDR+0x100 -d in_asm,op,out_asm

I tried the options suggested above but was not able to collect much information.

I have shared my observations on the GitLab issue page.

https://gitlab.com/qemu-project/qemu/-/issues/1726


Thanks,

Anushree-Mathur

> would be useful for the bug report. Although conceivably the out_asm
> output might make sense at translation time and then be broken when it
> is patched. Having rr on power would be really useful to debug this sort
> of thing.

