* qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
@ 2023-06-23 11:09 Anushree Mathur
2023-06-23 13:46 ` Cédric Le Goater
2023-06-24 14:29 ` Michael Tokarev
0 siblings, 2 replies; 7+ messages in thread
From: Anushree Mathur @ 2023-06-23 11:09 UTC (permalink / raw)
To: qemu-ppc, qemu-devel, richard.henderson, alex.bennee
Cc: Daniel Henrique Barboza, Nicholas Piggin, harshpb
Hi everyone,
I was trying to boot a RHEL 9.3 image with upstream qemu-system-ppc64 using
the -smp 2 option and observed a segfault (QEMU crash).
qemu command line used:
qemu-system-ppc64 -name Rhel9.3.ppc64le -smp 2 -m 16G -vga none
-nographic -machine pseries -cpu POWER10 -accel tcg -device
virtio-scsi-pci -drive file=/home/rh93.qcow2,if=none,format=qcow2,id=hd0
-device scsi-hd,drive=hd0 -boot c
After running git bisect, I found that the first bad commit, which introduced
this issue, is the following:
[qemu]# git bisect good
20b6643324a79860dcdfe811ffe4a79942bca21e is the first bad commit
commit 20b6643324a79860dcdfe811ffe4a79942bca21e
Author: Richard Henderson <richard.henderson@linaro.org>
Date: Mon Dec 5 17:45:02 2022 -0600
tcg/ppc: Reorg goto_tb implementation
The old ppc64 implementation replaces 2 or 4 insns, which leaves a race
condition in which a thread could be stopped at a PC in the middle of
the sequence, and when restarted does not see the complete address
computation and branches to nowhere.
The new implemetation replaces only one insn, swapping between
b <dest>
and
mtctr r31
falling through to a general-case indirect branch.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
tcg/ppc/tcg-target.c.inc | 152 +++++++++++++----------------------------------
tcg/ppc/tcg-target.h | 3 +-
2 files changed, 41 insertions(+), 114 deletions(-)
[qemu]#
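For readers unfamiliar with the two instructions named in the commit message, here is a minimal sketch of how a single 32-bit slot could be chosen between `b <dest>` and `mtctr r31`. It is not QEMU's code: the helper names and the addresses in the usage note are hypothetical, and only the instruction encodings (I-form branch with a word-aligned 26-bit displacement; mtctr as mtspr to SPR 9) are taken from the Power ISA.

```python
# Illustrative only: hypothetical helper names, not taken from tcg/ppc.

B_OPCODE = 0x48000000      # I-form "b": primary opcode 18, AA=0, LK=0
MTCTR_R31 = 0x7FE903A6     # "mtctr r31", i.e. mtspr 9, r31


def in_range_b(disp: int) -> bool:
    """True if a byte displacement fits the signed 26-bit reach of `b`."""
    return disp % 4 == 0 and -(1 << 25) <= disp < (1 << 25)


def encode_b(insn_addr: int, dest_addr: int) -> int:
    """Encode `b dest_addr` for an instruction located at insn_addr."""
    disp = dest_addr - insn_addr
    if not in_range_b(disp):
        raise ValueError("destination out of range for a direct branch")
    return B_OPCODE | (disp & 0x03FFFFFC)


def pick_patch_insn(insn_addr: int, dest_addr: int) -> int:
    """Choose the one word to store in the patchable slot.

    A reachable destination gets a direct branch; otherwise the slot holds
    `mtctr r31` and execution falls through to an indirect branch.
    """
    if in_range_b(dest_addr - insn_addr):
        return encode_b(insn_addr, dest_addr)
    return MTCTR_R31
```

With made-up addresses, `hex(pick_patch_insn(0x1000, 0x1010))` gives `0x48000010`, while a destination 64 MiB away is out of reach for `b` and yields `0x7fe903a6`. The sketch only shows the encoding choice; it says nothing about how the store is made visible to other threads, which is where a bug of the kind reported here could plausibly hide.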
Can someone please take a look and suggest a fix to resolve this issue?
Thanks in advance.
Regards,
Anushree-Mathur
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
2023-06-23 11:09 qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e Anushree Mathur
@ 2023-06-23 13:46 ` Cédric Le Goater
2023-06-23 15:22 ` Alex Bennée
2023-06-26 5:17 ` Anushree Mathur
2023-06-24 14:29 ` Michael Tokarev
1 sibling, 2 replies; 7+ messages in thread
From: Cédric Le Goater @ 2023-06-23 13:46 UTC (permalink / raw)
To: Anushree Mathur, qemu-ppc, qemu-devel, richard.henderson,
alex.bennee
Cc: Daniel Henrique Barboza, Nicholas Piggin, harshpb
Hello Anushree,
On 6/23/23 13:09, Anushree Mathur wrote:
> Hi everyone,
>
> I was trying to boot rhel9.3 image with upstream qemu-system-ppc64 -smp 2 option and observed a segfault (qemu crash).
>
> qemu command line used:
>
> qemu-system-ppc64 -name Rhel9.3.ppc64le -smp 2 -m 16G -vga none -nographic -machine pseries -cpu POWER10 -accel tcg -device virtio-scsi-pci -drive file=/home/rh93.qcow2,if=none,format=qcow2,id=hd0 -device scsi-hd,drive=hd0 -boot c
>
> After doing a git bisect, I found the first bad commit which introduced this issue is below:
Could you please open a GitLab issue on the QEMU project?
https://gitlab.com/qemu-project/qemu/-/issues
Thanks,
C.
> [qemu]# git bisect good
> 20b6643324a79860dcdfe811ffe4a79942bca21e is the first bad commit
> commit 20b6643324a79860dcdfe811ffe4a79942bca21e
> Author: Richard Henderson <richard.henderson@linaro.org>
> Date: Mon Dec 5 17:45:02 2022 -0600
>
> tcg/ppc: Reorg goto_tb implementation
>
> The old ppc64 implementation replaces 2 or 4 insns, which leaves a race
> condition in which a thread could be stopped at a PC in the middle of
> the sequence, and when restarted does not see the complete address
> computation and branches to nowhere.
>
> The new implemetation replaces only one insn, swapping between
>
> b <dest>
> and
> mtctr r31
>
> falling through to a general-case indirect branch.
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>
> tcg/ppc/tcg-target.c.inc | 152 +++++++++++++----------------------------------
> tcg/ppc/tcg-target.h | 3 +-
> 2 files changed, 41 insertions(+), 114 deletions(-)
> [qemu]#
>
> Can someone please take a look and suggest a fix to resolve this issue?
>
> Thanks in advance.
> Regards,
> Anushree-Mathur
>
>
* Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
2023-06-23 13:46 ` Cédric Le Goater
@ 2023-06-23 15:22 ` Alex Bennée
2023-07-12 8:34 ` Anushree Mathur
2023-06-26 5:17 ` Anushree Mathur
1 sibling, 1 reply; 7+ messages in thread
From: Alex Bennée @ 2023-06-23 15:22 UTC (permalink / raw)
To: Cédric Le Goater
Cc: Anushree Mathur, qemu-ppc, qemu-devel, richard.henderson,
Daniel Henrique Barboza, Nicholas Piggin, harshpb
Cédric Le Goater <clg@kaod.org> writes:
> Hello Anushree,
>
> On 6/23/23 13:09, Anushree Mathur wrote:
>> Hi everyone,
>> I was trying to boot rhel9.3 image with upstream qemu-system-ppc64
>> -smp 2 option and observed a segfault (qemu crash).
>> qemu command line used:
>> qemu-system-ppc64 -name Rhel9.3.ppc64le -smp 2 -m 16G -vga none
>> -nographic -machine pseries -cpu POWER10 -accel tcg -device
>> virtio-scsi-pci -drive
>> file=/home/rh93.qcow2,if=none,format=qcow2,id=hd0 -device
>> scsi-hd,drive=hd0 -boot c
>> After doing a git bisect, I found the first bad commit which
>> introduced this issue is below:
>
> Could you please open a gitlab issue on QEMU project ?
>
> https://gitlab.com/qemu-project/qemu/-/issues
Is it broken generated code that faults or does the goto_tb code break
the execution sequence in some subtle way further down the line?
If you can isolate the guest address, the output from:
-dfilter 0xBADADDR+0x100 -d in_asm,op,out_asm
would be useful for the bug report. Although conceivably the out_asm
output might make sense at translation time and then be broken when it
is patched. Having rr on power would be really useful to debug this sort
of thing.
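As a sketch of the logging invocation suggested above, the snippet below assembles the extra arguments for one guest address range. The address passed in is a placeholder (the thread never identifies the faulting guest address), the helper name and log file name are hypothetical, and the `-D` option, which sends the log to a file, is an addition that is not part of the suggestion in this thread.

```python
def tcg_debug_args(guest_addr, length=0x100, logfile="tcg-debug.log"):
    """Build QEMU arguments that log TCG translation for one address range."""
    return [
        "-dfilter", f"{guest_addr:#x}+{length:#x}",  # range given as start+size
        "-d", "in_asm,op,out_asm",                   # guest asm, TCG ops, host asm
        "-D", logfile,                               # write the log to a file
    ]
```

The returned list can be appended to the existing qemu-system-ppc64 argument list. As noted above, the out_asm output reflects translation time, so it may not show what the code looks like after it has been patched.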
>
> Thanks,
>
> C.
>
>> [qemu]# git bisect good
>> 20b6643324a79860dcdfe811ffe4a79942bca21e is the first bad commit
>> commit 20b6643324a79860dcdfe811ffe4a79942bca21e
>> Author: Richard Henderson <richard.henderson@linaro.org>
>> Date: Mon Dec 5 17:45:02 2022 -0600
>> tcg/ppc: Reorg goto_tb implementation
>> The old ppc64 implementation replaces 2 or 4 insns, which
>> leaves a race
>> condition in which a thread could be stopped at a PC in the middle of
>> the sequence, and when restarted does not see the complete address
>> computation and branches to nowhere.
>> The new implemetation replaces only one insn, swapping between
>> b <dest>
>> and
>> mtctr r31
>> falling through to a general-case indirect branch.
>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> tcg/ppc/tcg-target.c.inc | 152
>> +++++++++++++----------------------------------
>> tcg/ppc/tcg-target.h | 3 +-
>> 2 files changed, 41 insertions(+), 114 deletions(-)
>> [qemu]#
>> Can someone please take a look and suggest a fix to resolve this
>> issue?
>> Thanks in advance.
>> Regards,
>> Anushree-Mathur
>>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
* Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
2023-06-23 11:09 qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e Anushree Mathur
2023-06-23 13:46 ` Cédric Le Goater
@ 2023-06-24 14:29 ` Michael Tokarev
1 sibling, 0 replies; 7+ messages in thread
From: Michael Tokarev @ 2023-06-24 14:29 UTC (permalink / raw)
To: Anushree Mathur, qemu-ppc, qemu-devel, richard.henderson,
alex.bennee
Cc: Daniel Henrique Barboza, Nicholas Piggin, harshpb
23.06.2023 14:09, Anushree Mathur wrote:
> Hi everyone,
>
> I was trying to boot rhel9.3 image with upstream qemu-system-ppc64 -smp 2 option and observed a segfault (qemu crash).
>
> qemu command line used:
>
> qemu-system-ppc64 -name Rhel9.3.ppc64le -smp 2 -m 16G -vga none -nographic -machine pseries -cpu POWER10 -accel tcg -device virtio-scsi-pci -drive
> file=/home/rh93.qcow2,if=none,format=qcow2,id=hd0 -device scsi-hd,drive=hd0 -boot c
>
> After doing a git bisect, I found the first bad commit which introduced this issue is below:
>
> [qemu]# git bisect good
> 20b6643324a79860dcdfe811ffe4a79942bca21e is the first bad commit
> commit 20b6643324a79860dcdfe811ffe4a79942bca21e
> Author: Richard Henderson <richard.henderson@linaro.org>
> Date: Mon Dec 5 17:45:02 2022 -0600
>
> tcg/ppc: Reorg goto_tb implementation
I've got another case which leads to this same commit, with similar results,
on a Debian ppc64 machine with QEMU 8.0 and master.
The crash doesn't happen every time; sometimes it needs 20+ iterations
to trigger (so my bisection was rather painful, initially pointing to
an entirely innocent commit). So far it only occurs on an actual ppc64
machine; I wasn't able to reproduce it on amd64.
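Since the failure is intermittent, a repeat-until-crash harness makes each bisection step less error-prone. The following is a minimal sketch, not something used in this thread: the helper name, the run count, and the timeout are assumptions, and a run that is still alive at the timeout is simply counted as "no crash".

```python
import signal
import subprocess


def first_crash(cmd, max_runs=30, timeout=600):
    """Run cmd up to max_runs times.

    Returns (run_number, signal_name) for the first run killed by a signal,
    or None if every run exited normally or outlived the timeout.
    """
    for run in range(1, max_runs + 1):
        try:
            proc = subprocess.run(cmd, stdin=subprocess.DEVNULL,
                                  stdout=subprocess.DEVNULL,
                                  stderr=subprocess.DEVNULL,
                                  timeout=timeout)
        except subprocess.TimeoutExpired:
            continue  # child was killed at the deadline: count as "no crash"
        if proc.returncode < 0:  # POSIX: negative value is minus the signal number
            return run, signal.Signals(-proc.returncode).name
    return None
```

A wrapper around this can back `git bisect run` by exiting 1 when a crash is seen and 0 otherwise. With a 1-in-20 failure rate, `max_runs` needs to be well above 20 before a clean result can be trusted as "good".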
Sometimes (more often) it ends with SIGSEGV, but sometimes it also fails
with Illegal Instruction. Examining it with gdb, it looks more like
stack corruption.
I triggered it by just booting a Linux system. When it fails, most often
it fails somewhere at the end of boot, but sometimes it does so the moment
the kernel spawns /init from the initramfs and that one (a shell script)
executes its first program.
[ OK ] Finished systemd-journal-f…ush Journal to Persistent Storage.
Starting systemd-tmpfiles-… Volatile Files and Directories...
[ OK ] Finished systemd-udev-trig…e - Coldplug All udev Devices.
[ OK ] Finished systemd-tmpfiles-…te Volatile Files and Directories.
Starting systemd-resolved.…e - Network Name Resolution...
Starting systemd-update-ut…rd System Boot/Shutdown in UTMP...
[ OK ] Started systemd-udevd.serv…nager for Device Events and Files.
Starting systemd-networkd.…ice - Network Configuration...
Segmentation fault (core dumped)
...
Core was generated by `qemu-system-ppc64 -append root=LABEL=debvm rw -nographic -smp 2 -machine accel='.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fff3462395c in code_gen_buffer ()
[Current thread is 1 (Thread 0x7fff79c6e7c0 (LWP 922586))]
(gdb) bt
#0 0x00007fff3462395c in code_gen_buffer ()
#1 0x00000001076cbd2c in cpu_tb_exec (cpu=cpu@entry=0x1001d98b320, itb=itb@entry=0x7fff4b378480 <code_gen_buffer+383812548>,
tb_exit=tb_exit@entry=0x7fff79c6d8c0) at accel/tcg/cpu-exec.c:460
#2 0x00000001076cc348 in cpu_loop_exec_tb (tb_exit=0x7fff79c6d8c0, last_tb=<synthetic pointer>, pc=140736355546736,
tb=0x7fff4b378480 <code_gen_buffer+383812548>, cpu=<optimized out>) at accel/tcg/cpu-exec.c:893
#3 cpu_exec_loop (cpu=cpu@entry=0x1001d98b320, sc=sc@entry=0x7fff79c6da10) at accel/tcg/cpu-exec.c:1013
#4 0x00000001076ccd98 in cpu_exec_setjmp (cpu=cpu@entry=0x1001d98b320, sc=sc@entry=0x7fff79c6da10)
at accel/tcg/cpu-exec.c:1043
#5 0x00000001076cd5ec in cpu_exec (cpu=0x1001d98b320) at accel/tcg/cpu-exec.c:1069
#6 0x0000000107705d30 in tcg_cpus_exec (cpu=0x1001d98b320) at accel/tcg/tcg-accel-ops.c:81
#7 0x0000000107705f20 in mttcg_cpu_thread_fn (arg=0x1001d98b320) at accel/tcg/tcg-accel-ops-mttcg.c:95
#8 0x000000010793ed7c in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:541
#9 0x00007fff81673d0c in ?? () from /lib/powerpc64le-linux-gnu/libc.so.6
#10 0x00007fff81724350 in clone () from /lib/powerpc64le-linux-gnu/libc.so.6
(gdb) l
32
33 int qemu_default_main(void)
34 {
35 int status;
36
37 status = qemu_main_loop();
38 qemu_cleanup();
39
40 return status;
41 }
(gdb) frame 1
#1 0x00000001076cbd2c in cpu_tb_exec (cpu=cpu@entry=0x1001d98b320, itb=itb@entry=0x7fff4b378480 <code_gen_buffer+383812548>,
tb_exit=tb_exit@entry=0x7fff79c6d8c0) at accel/tcg/cpu-exec.c:460
460 ret = tcg_qemu_tb_exec(env, tb_ptr);
(gdb) l
455 if (qemu_loglevel_mask(CPU_LOG_TB_CPU | CPU_LOG_EXEC)) {
456 log_cpu_exec(log_pc(cpu, itb), cpu, itb);
457 }
458
459 qemu_thread_jit_execute();
460 ret = tcg_qemu_tb_exec(env, tb_ptr);
461 cpu->can_do_io = 1;
462 qemu_plugin_disable_mem_helpers(cpu);
463 /*
464 * TODO: Delay swapping back to the read-write region of the TB
(this is 8.0.2, the same happens with master).
Here, frame #0 appears corrupt.
In other attempts, the stack frame is sometimes so corrupt that gdb can't
decode it at all.
I need help debugging this further.
Thanks,
/mjt
* Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
2023-06-23 13:46 ` Cédric Le Goater
2023-06-23 15:22 ` Alex Bennée
@ 2023-06-26 5:17 ` Anushree Mathur
2023-06-26 6:18 ` Cédric Le Goater
1 sibling, 1 reply; 7+ messages in thread
From: Anushree Mathur @ 2023-06-26 5:17 UTC (permalink / raw)
To: Cédric Le Goater
Cc: qemu-ppc, qemu-devel, Daniel Henrique Barboza, Nicholas Piggin,
harshpb, richard.henderson, alex.bennee
On 6/23/23 19:16, Cédric Le Goater wrote:
> Hello Anushree,
>
> On 6/23/23 13:09, Anushree Mathur wrote:
>> Hi everyone,
>>
>> I was trying to boot rhel9.3 image with upstream qemu-system-ppc64
>> -smp 2 option and observed a segfault (qemu crash).
>>
>> qemu command line used:
>>
>> qemu-system-ppc64 -name Rhel9.3.ppc64le -smp 2 -m 16G -vga none
>> -nographic -machine pseries -cpu POWER10 -accel tcg -device
>> virtio-scsi-pci -drive
>> file=/home/rh93.qcow2,if=none,format=qcow2,id=hd0 -device
>> scsi-hd,drive=hd0 -boot c
>>
>> After doing a git bisect, I found the first bad commit which
>> introduced this issue is below:
>
> Could you please open a gitlab issue on QEMU project ?
>
> https://gitlab.com/qemu-project/qemu/-/issues
>
> Thanks,
>
> C.
>
>> [qemu]# git bisect good
>> 20b6643324a79860dcdfe811ffe4a79942bca21e is the first bad commit
>> commit 20b6643324a79860dcdfe811ffe4a79942bca21e
>> Author: Richard Henderson <richard.henderson@linaro.org>
>> Date: Mon Dec 5 17:45:02 2022 -0600
>>
>> tcg/ppc: Reorg goto_tb implementation
>>
>> The old ppc64 implementation replaces 2 or 4 insns, which leaves
>> a race
>> condition in which a thread could be stopped at a PC in the
>> middle of
>> the sequence, and when restarted does not see the complete address
>> computation and branches to nowhere.
>>
>> The new implemetation replaces only one insn, swapping between
>>
>> b <dest>
>> and
>> mtctr r31
>>
>> falling through to a general-case indirect branch.
>>
>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>
>> tcg/ppc/tcg-target.c.inc | 152
>> +++++++++++++----------------------------------
>> tcg/ppc/tcg-target.h | 3 +-
>> 2 files changed, 41 insertions(+), 114 deletions(-)
>> [qemu]#
>>
>> Can someone please take a look and suggest a fix to resolve this issue?
>>
>> Thanks in advance.
>> Regards,
>> Anushree-Mathur
>>
>>
Hello Cedric,
As per your mail, I have created the gitlab issue
https://gitlab.com/qemu-project/qemu/-/issues/1726.
Thanks & Regards,
Anushree-Mathur
* Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
2023-06-26 5:17 ` Anushree Mathur
@ 2023-06-26 6:18 ` Cédric Le Goater
0 siblings, 0 replies; 7+ messages in thread
From: Cédric Le Goater @ 2023-06-26 6:18 UTC (permalink / raw)
To: Anushree Mathur
Cc: qemu-ppc, qemu-devel, Daniel Henrique Barboza, Nicholas Piggin,
harshpb, richard.henderson, alex.bennee
Hello Anushree
> Hello Cedric,
>
> As per your mail, I have created the gitlab issue https://gitlab.com/qemu-project/qemu/-/issues/1726.
Alex had a request for the bug report. If you have time to provide
the data, it should help in analyzing the issue.
Thanks,
C.
* Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
2023-06-23 15:22 ` Alex Bennée
@ 2023-07-12 8:34 ` Anushree Mathur
0 siblings, 0 replies; 7+ messages in thread
From: Anushree Mathur @ 2023-07-12 8:34 UTC (permalink / raw)
To: Alex Bennée, Cédric Le Goater
Cc: qemu-ppc, qemu-devel, richard.henderson, Daniel Henrique Barboza,
Nicholas Piggin, harshpb
Hi Alex,
On 6/23/23 20:52, Alex Bennée wrote:
> Cédric Le Goater <clg@kaod.org> writes:
>
>> Hello Anushree,
>>
>> On 6/23/23 13:09, Anushree Mathur wrote:
>>> Hi everyone,
>>> I was trying to boot rhel9.3 image with upstream qemu-system-ppc64
>>> -smp 2 option and observed a segfault (qemu crash).
>>> qemu command line used:
>>> qemu-system-ppc64 -name Rhel9.3.ppc64le -smp 2 -m 16G -vga none
>>> -nographic -machine pseries -cpu POWER10 -accel tcg -device
>>> virtio-scsi-pci -drive
>>> file=/home/rh93.qcow2,if=none,format=qcow2,id=hd0 -device
>>> scsi-hd,drive=hd0 -boot c
>>> After doing a git bisect, I found the first bad commit which
>>> introduced this issue is below:
>> Could you please open a gitlab issue on QEMU project ?
>>
>> https://gitlab.com/qemu-project/qemu/-/issues
> Is it broken generated code that faults or does the goto_tb code break
> the execution sequence in some subtle way further down the line?
>
> If you can isolate the guest address the output from:
>
> -dfilter 0xBADADDR+0x100 -d in_asm,op,out_asm
I tried the options suggested above but could not collect much information.
I have shared my observations on the GitLab issue page:
https://gitlab.com/qemu-project/qemu/-/issues/1726
Thanks,
Anushree-Mathur
> would be useful for the bug report. Although conceivably the out_asm
> output might make sense at translation time and then be broken when it
> is patched. Having rr on power would be really useful to debug this sort
> of thing.
>
>> Thanks,
>>
>> C.
>>
>>> [qemu]# git bisect good
>>> 20b6643324a79860dcdfe811ffe4a79942bca21e is the first bad commit
>>> commit 20b6643324a79860dcdfe811ffe4a79942bca21e
>>> Author: Richard Henderson <richard.henderson@linaro.org>
>>> Date: Mon Dec 5 17:45:02 2022 -0600
>>> tcg/ppc: Reorg goto_tb implementation
>>> The old ppc64 implementation replaces 2 or 4 insns, which
>>> leaves a race
>>> condition in which a thread could be stopped at a PC in the middle of
>>> the sequence, and when restarted does not see the complete address
>>> computation and branches to nowhere.
>>> The new implemetation replaces only one insn, swapping between
>>> b <dest>
>>> and
>>> mtctr r31
>>> falling through to a general-case indirect branch.
>>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>> tcg/ppc/tcg-target.c.inc | 152
>>> +++++++++++++----------------------------------
>>> tcg/ppc/tcg-target.h | 3 +-
>>> 2 files changed, 41 insertions(+), 114 deletions(-)
>>> [qemu]#
>>> Can someone please take a look and suggest a fix to resolve this
>>> issue?
>>> Thanks in advance.
>>> Regards,
>>> Anushree-Mathur
>>>
>