* qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
From: Anushree Mathur @ 2023-06-23 11:09 UTC
To: qemu-ppc, qemu-devel, richard.henderson, alex.bennee
Cc: Daniel Henrique Barboza, Nicholas Piggin, harshpb

Hi everyone,

I was trying to boot a RHEL 9.3 image with upstream qemu-system-ppc64 using the -smp 2 option and observed a segfault (QEMU crash).

QEMU command line used:

qemu-system-ppc64 -name Rhel9.3.ppc64le -smp 2 -m 16G -vga none -nographic -machine pseries -cpu POWER10 -accel tcg -device virtio-scsi-pci -drive file=/home/rh93.qcow2,if=none,format=qcow2,id=hd0 -device scsi-hd,drive=hd0 -boot c

After doing a git bisect, I found that the first bad commit, which introduced this issue, is:

[qemu]# git bisect good
20b6643324a79860dcdfe811ffe4a79942bca21e is the first bad commit
commit 20b6643324a79860dcdfe811ffe4a79942bca21e
Author: Richard Henderson <richard.henderson@linaro.org>
Date:   Mon Dec 5 17:45:02 2022 -0600

    tcg/ppc: Reorg goto_tb implementation

    The old ppc64 implementation replaces 2 or 4 insns, which leaves a race
    condition in which a thread could be stopped at a PC in the middle of
    the sequence, and when restarted does not see the complete address
    computation and branches to nowhere.

    The new implemetation replaces only one insn, swapping between

        b       <dest>
    and
        mtctr   r31

    falling through to a general-case indirect branch.

    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

 tcg/ppc/tcg-target.c.inc | 152 +++++++++++++---------------------------------
 tcg/ppc/tcg-target.h     |   3 +-
 2 files changed, 41 insertions(+), 114 deletions(-)
[qemu]#

Can someone please take a look and suggest a fix for this issue?

Thanks in advance.
Regards,
Anushree-Mathur
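The key point of the bisected commit is that a goto_tb jump slot is now patched by rewriting exactly one aligned 32-bit instruction word. That word is either a direct `b <dest>`, used when the target lies within the ±32 MiB reach of a PowerPC relative branch, or `mtctr r31`, which falls through to the general-case indirect branch. The C sketch below illustrates that choice under stated assumptions. The names `goto_tb_insn`, `patch_goto_tb` and `fits_direct_branch` are hypothetical. The opcode constants are standard Power ISA encodings, not code copied from QEMU. The instruction-cache flush that a real JIT needs after patching is omitted. The sketch neither reproduces nor explains the crash reported in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Standard Power ISA encodings (assumed here, not taken from QEMU's source). */
#define PPC_B          0x48000000u  /* "b disp": primary opcode 18, AA=0, LK=0 */
#define PPC_MTCTR_R31  0x7FE903A6u  /* "mtctr r31", i.e. mtspr CTR(9), r31 */

/* A relative "b" encodes a signed 26-bit, word-aligned byte displacement,
 * so it reaches targets within [-32 MiB, +32 MiB). */
static int fits_direct_branch(intptr_t disp)
{
    return disp >= -0x2000000 && disp < 0x2000000;
}

/* Pick the single instruction word for a goto_tb slot at 'slot_addr'
 * that should transfer control to 'dest' (hypothetical helper). */
static uint32_t goto_tb_insn(uintptr_t slot_addr, uintptr_t dest)
{
    intptr_t disp = (intptr_t)(dest - slot_addr);

    assert((disp & 3) == 0);  /* PowerPC instructions are 4-byte aligned */
    if (fits_direct_branch(disp)) {
        /* The mask keeps the displacement field and clears AA/LK. */
        return PPC_B | ((uint32_t)disp & 0x03FFFFFCu);
    }
    /* Out of range: emit "mtctr r31" and fall through to the general-case
     * indirect branch that follows the slot (not modeled here). */
    return PPC_MTCTR_R31;
}

/* Replace the slot with a single aligned 32-bit store. A vCPU thread that
 * races with the patch observes either the old word or the new one, never a
 * half-updated multi-instruction address computation. */
static void patch_goto_tb(_Atomic uint32_t *slot, uintptr_t slot_addr,
                          uintptr_t dest)
{
    atomic_store_explicit(slot, goto_tb_insn(slot_addr, dest),
                          memory_order_relaxed);
    /* A real JIT must also flush the instruction cache for this word. */
}
```

For example, a slot at 0x1000 targeting 0x1010 gets 0x48000010 (`b .+16`), while a target 64 MiB away gets `mtctr r31`. The design relies on an aligned 4-byte store being single-copy atomic. That is why one instruction can be swapped safely, whereas the older 2- or 4-instruction sequence could be observed half-patched.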
* Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
From: Cédric Le Goater @ 2023-06-23 13:46 UTC
To: Anushree Mathur, qemu-ppc, qemu-devel, richard.henderson, alex.bennee
Cc: Daniel Henrique Barboza, Nicholas Piggin, harshpb

Hello Anushree,

On 6/23/23 13:09, Anushree Mathur wrote:
> Hi everyone,
>
> I was trying to boot rhel9.3 image with upstream qemu-system-ppc64 -smp 2 option and observed a segfault (qemu crash).
>
> qemu command line used:
>
> qemu-system-ppc64 -name Rhel9.3.ppc64le -smp 2 -m 16G -vga none -nographic -machine pseries -cpu POWER10 -accel tcg -device virtio-scsi-pci -drive file=/home/rh93.qcow2,if=none,format=qcow2,id=hd0 -device scsi-hd,drive=hd0 -boot c
>
> After doing a git bisect, I found the first bad commit which introduced this issue is below:

Could you please open a GitLab issue on the QEMU project?

https://gitlab.com/qemu-project/qemu/-/issues

Thanks,

C.
* Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
From: Alex Bennée @ 2023-06-23 15:22 UTC
To: Cédric Le Goater
Cc: Anushree Mathur, qemu-ppc, qemu-devel, richard.henderson, Daniel Henrique Barboza, Nicholas Piggin, harshpb

Cédric Le Goater <clg@kaod.org> writes:

> Hello Anushree,
>
> On 6/23/23 13:09, Anushree Mathur wrote:
>> Hi everyone,
>> I was trying to boot rhel9.3 image with upstream qemu-system-ppc64
>> -smp 2 option and observed a segfault (qemu crash).
>> qemu command line used:
>> qemu-system-ppc64 -name Rhel9.3.ppc64le -smp 2 -m 16G -vga none
>> -nographic -machine pseries -cpu POWER10 -accel tcg -device
>> virtio-scsi-pci -drive
>> file=/home/rh93.qcow2,if=none,format=qcow2,id=hd0 -device
>> scsi-hd,drive=hd0 -boot c
>> After doing a git bisect, I found the first bad commit which
>> introduced this issue is below:
>
> Could you please open a gitlab issue on QEMU project ?
>
> https://gitlab.com/qemu-project/qemu/-/issues

Is it broken generated code that faults, or does the goto_tb code break the execution sequence in some subtle way further down the line?

If you can isolate the guest address, the output from:

  -dfilter 0xBADADDR+0x100 -d in_asm,op,out_asm

would be useful for the bug report. That said, the out_asm output could conceivably look correct at translation time and only become broken once it is patched. Having rr on POWER would be really useful for debugging this sort of thing.

--
Alex Bennée
Virtualisation Tech Lead @ Linaro
* Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
From: Anushree Mathur @ 2023-07-12 8:34 UTC
To: Alex Bennée, Cédric Le Goater
Cc: qemu-ppc, qemu-devel, richard.henderson, Daniel Henrique Barboza, Nicholas Piggin, harshpb

Hi Alex,

On 6/23/23 20:52, Alex Bennée wrote:
> Is it broken generated code that faults or does the goto_tb code break
> the execution sequence in some subtle way further down the line?
>
> If you can isolate the guest address the output from:
>
> -dfilter 0xBADADDR+0x100 -d in_asm,op,out_asm

I tried as suggested above but didn't manage to collect much information. I have shared my observations on the GitLab issue page:
https://gitlab.com/qemu-project/qemu/-/issues/1726

Thanks,
Anushree-Mathur

> would be useful for the bug report. Although conceivably the out_asm
> output might make sense at translation time and then be broken when it
> is patched. Having rr on power would be really useful to debug this sort
> of thing.
* Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
From: Anushree Mathur @ 2023-06-26 5:17 UTC
To: Cédric Le Goater
Cc: qemu-ppc, qemu-devel, Daniel Henrique Barboza, Nicholas Piggin, harshpb, richard.henderson, alex.bennee

On 6/23/23 19:16, Cédric Le Goater wrote:
> Could you please open a gitlab issue on QEMU project ?
>
> https://gitlab.com/qemu-project/qemu/-/issues

Hello Cedric,

As per your mail, I have created the GitLab issue https://gitlab.com/qemu-project/qemu/-/issues/1726.

Thanks & Regards,
Anushree-Mathur
* Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
From: Cédric Le Goater @ 2023-06-26 6:18 UTC
To: Anushree Mathur
Cc: qemu-ppc, qemu-devel, Daniel Henrique Barboza, Nicholas Piggin, harshpb, richard.henderson, alex.bennee

Hello Anushree,

> Hello Cedric,
>
> As per your mail, I have created the gitlab issue https://gitlab.com/qemu-project/qemu/-/issues/1726.

Alex had a request for the bug report. If you have the time to provide the data, it should help in analyzing the issue.

Thanks,

C.
* Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
From: Michael Tokarev @ 2023-06-24 14:29 UTC
To: Anushree Mathur, qemu-ppc, qemu-devel, richard.henderson, alex.bennee
Cc: Daniel Henrique Barboza, Nicholas Piggin, harshpb

23.06.2023 14:09, Anushree Mathur wrote:
> Hi everyone,
>
> I was trying to boot rhel9.3 image with upstream qemu-system-ppc64 -smp 2 option and observed a segfault (qemu crash).
>
> qemu command line used:
>
> qemu-system-ppc64 -name Rhel9.3.ppc64le -smp 2 -m 16G -vga none -nographic -machine pseries -cpu POWER10 -accel tcg -device virtio-scsi-pci -drive
> file=/home/rh93.qcow2,if=none,format=qcow2,id=hd0 -device scsi-hd,drive=hd0 -boot c
>
> After doing a git bisect, I found the first bad commit which introduced this issue is below:
>
> [qemu]# git bisect good
> 20b6643324a79860dcdfe811ffe4a79942bca21e is the first bad commit
> commit 20b6643324a79860dcdfe811ffe4a79942bca21e
> Author: Richard Henderson <richard.henderson@linaro.org>
> Date:   Mon Dec 5 17:45:02 2022 -0600
>
>     tcg/ppc: Reorg goto_tb implementation

I've got another case that leads to this same commit, with similar results, on a Debian ppc64 machine with QEMU 8.0 and master. The crash doesn't happen every time; sometimes it takes 20+ iterations to trigger (so my bisection was rather painful, initially pointing to an entirely innocent commit).

So far it only occurs on an actual ppc64 machine; I wasn't able to reproduce it on amd64. More often it ends with SIGSEGV, but sometimes it fails with Illegal Instruction instead. Examining it with gdb, it looks more like stack corruption.

I triggered it just by booting a Linux system. When it fails, it most often does so near the end of boot, but sometimes it fails the moment the kernel spawns /init from the initramfs and that shell script executes its first program.

[ OK ] Finished systemd-journal-f…ush Journal to Persistent Storage.
       Starting systemd-tmpfiles-… Volatile Files and Directories...
[ OK ] Finished systemd-udev-trig…e - Coldplug All udev Devices.
[ OK ] Finished systemd-tmpfiles-…te Volatile Files and Directories.
       Starting systemd-resolved.…e - Network Name Resolution...
       Starting systemd-update-ut…rd System Boot/Shutdown in UTMP...
[ OK ] Started systemd-udevd.serv…nager for Device Events and Files.
       Starting systemd-networkd.…ice - Network Configuration...
Segmentation fault (core dumped)

...

Core was generated by `qemu-system-ppc64 -append root=LABEL=debvm rw -nographic -smp 2 -machine accel='.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fff3462395c in code_gen_buffer ()
[Current thread is 1 (Thread 0x7fff79c6e7c0 (LWP 922586))]
(gdb) bt
#0  0x00007fff3462395c in code_gen_buffer ()
#1  0x00000001076cbd2c in cpu_tb_exec (cpu=cpu@entry=0x1001d98b320, itb=itb@entry=0x7fff4b378480 <code_gen_buffer+383812548>, tb_exit=tb_exit@entry=0x7fff79c6d8c0)
    at accel/tcg/cpu-exec.c:460
#2  0x00000001076cc348 in cpu_loop_exec_tb (tb_exit=0x7fff79c6d8c0, last_tb=<synthetic pointer>, pc=140736355546736, tb=0x7fff4b378480 <code_gen_buffer+383812548>, cpu=<optimized out>)
    at accel/tcg/cpu-exec.c:893
#3  cpu_exec_loop (cpu=cpu@entry=0x1001d98b320, sc=sc@entry=0x7fff79c6da10)
    at accel/tcg/cpu-exec.c:1013
#4  0x00000001076ccd98 in cpu_exec_setjmp (cpu=cpu@entry=0x1001d98b320, sc=sc@entry=0x7fff79c6da10)
    at accel/tcg/cpu-exec.c:1043
#5  0x00000001076cd5ec in cpu_exec (cpu=0x1001d98b320) at accel/tcg/cpu-exec.c:1069
#6  0x0000000107705d30 in tcg_cpus_exec (cpu=0x1001d98b320) at accel/tcg/tcg-accel-ops.c:81
#7  0x0000000107705f20 in mttcg_cpu_thread_fn (arg=0x1001d98b320) at accel/tcg/tcg-accel-ops-mttcg.c:95
#8  0x000000010793ed7c in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:541
#9  0x00007fff81673d0c in ?? () from /lib/powerpc64le-linux-gnu/libc.so.6
#10 0x00007fff81724350 in clone () from /lib/powerpc64le-linux-gnu/libc.so.6
(gdb) l
32
33      int qemu_default_main(void)
34      {
35          int status;
36
37          status = qemu_main_loop();
38          qemu_cleanup();
39
40          return status;
41      }
(gdb) frame 1
#1  0x00000001076cbd2c in cpu_tb_exec (cpu=cpu@entry=0x1001d98b320, itb=itb@entry=0x7fff4b378480 <code_gen_buffer+383812548>, tb_exit=tb_exit@entry=0x7fff79c6d8c0)
    at accel/tcg/cpu-exec.c:460
460         ret = tcg_qemu_tb_exec(env, tb_ptr);
(gdb) l
455         if (qemu_loglevel_mask(CPU_LOG_TB_CPU | CPU_LOG_EXEC)) {
456             log_cpu_exec(log_pc(cpu, itb), cpu, itb);
457         }
458
459         qemu_thread_jit_execute();
460         ret = tcg_qemu_tb_exec(env, tb_ptr);
461         cpu->can_do_io = 1;
462         qemu_plugin_disable_mem_helpers(cpu);
463         /*
464          * TODO: Delay swapping back to the read-write region of the TB

(This is 8.0.2; the same happens with master.)

Here, frame #0 appears corrupt. In other attempts, the stack is sometimes corrupted so badly that gdb can't decode it at all.

I need help debugging this further.

Thanks,

/mjt
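Michael's remark that a crash may need 20+ boots to appear also explains how his first bisection could blame an innocent commit: a few clean boots are weak evidence that a revision is good. As a rough sketch, assume each boot of a buggy build crashes independently with a fixed probability p. The thread gives no such figure, and independence is an idealization. Under that assumption, a revision should only be marked good after n clean runs with (1 - p)^n < eps. The hypothetical C helper `runs_needed` computes that n:

```c
#include <assert.h>

/* Hypothetical helper: smallest number of consecutive clean runs n such that
 * (1 - p)^n < eps, where p is the assumed per-run crash probability of a
 * buggy build and eps is the acceptable chance of wrongly calling it good.
 * Assumes runs are independent, which real race conditions may not be. */
static unsigned runs_needed(double p, double eps)
{
    assert(p > 0.0 && p <= 1.0);
    assert(eps > 0.0 && eps < 1.0);

    unsigned n = 0;
    double miss = 1.0;  /* P(all n runs pass even though the bug is present) */

    while (miss >= eps) {
        miss *= 1.0 - p;
        n++;
    }
    return n;
}
```

For instance, with p = 0.05 (about one crash in 20 boots) and eps = 0.05, `runs_needed(0.05, 0.05)` returns 59, so roughly 59 clean boots would be needed per "good" verdict. A much flakier build with p = 0.5 and eps = 0.01 needs only 7.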