From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: "Alex Bennée" <alex.bennee@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>,
	Sajjan Rao <sajjanr@gmail.com>,
	Gregory Price <gregory.price@memverge.com>,
	"Dimitrios Palyvos" <dimitrios.palyvos@zptcorp.com>,
	<linux-cxl@vger.kernel.org>, <qemu-devel@nongnu.org>,
	<richard.henderson@linaro.org>
Subject: Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1
Date: Thu, 1 Feb 2024 15:29:31 +0000
Message-ID: <20240201152931.00001511@Huawei.com>
In-Reply-To: <87msskkyce.fsf@draig.linaro.org>

On Thu, 01 Feb 2024 15:17:53 +0000
Alex Bennée <alex.bennee@linaro.org> wrote:

> Peter Maydell <peter.maydell@linaro.org> writes:
> 
> > On Thu, 1 Feb 2024 at 14:01, Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:  
> >> > Can you run QEMU under gdb and give the backtrace when it stops
> >> > on the abort() ? That will probably have a helpful clue. I
> >> > suspect something is failing to pass a valid retaddr in
> >> > when it calls a load/store function.  
> >  
> >> [Switching to Thread 0x7ffff56ff6c0 (LWP 21916)]
> >> __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
> >> 44      ./nptl/pthread_kill.c: No such file or directory.
> >> (gdb) bt
> >> #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
> >> #1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
> >> #2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
> >> #3  0x00007ffff77c43b6 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
> >> #4  0x00007ffff77aa87c in __GI_abort () at ./stdlib/abort.c:79
> >> #5  0x0000555555c0d4ce in cpu_abort
> >>     (cpu=cpu@entry=0x555556fd9000, fmt=fmt@entry=0x555555fe3378 "cpu_io_recompile: could not find TB for pc=%p")
> >>     at ../../cpu-target.c:359
> >> #6  0x0000555555c59435 in cpu_io_recompile (cpu=cpu@entry=0x555556fd9000, retaddr=retaddr@entry=0) at ../../accel/tcg/translate-all.c:611
> >> #7  0x0000555555c5c956 in io_prepare
> >>     (retaddr=0, addr=19595792376, attrs=..., xlat=<optimized out>, cpu=0x555556fd9000, out_offset=<synthetic pointer>)
> >>     at ../../accel/tcg/cputlb.c:1339  
> <snip>
> >> #21 tb_htable_lookup (cpu=<optimized out>, pc=pc@entry=18446744072116178925, cs_base=0, flags=415285936, cflags=4278353920)
> >>     at ../../accel/tcg/cpu-exec.c:231
> >> #22 0x0000555555c50c08 in tb_lookup
> >>     (cpu=cpu@entry=0x555556fd9000, pc=pc@entry=18446744072116178925, cs_base=cs_base@entry=0, flags=<optimized out>, cflags=<optimized out>) at ../../accel/tcg/cpu-exec.c:267
> >> #23 0x0000555555c51e23 in helper_lookup_tb_ptr (env=0x555556fdb7c0) at ../../accel/tcg/cpu-exec.c:423
> >> #24 0x00007fffa9076ead in code_gen_buffer ()
> >> #25 0x0000555555c50fab in cpu_tb_exec (cpu=cpu@entry=0x555556fd9000, itb=<optimized out>, tb_exit=tb_exit@entry=0x7ffff56fe708)
> >>     at ../../accel/tcg/cpu-exec.c:458
> >> #26 0x0000555555c51492 in cpu_loop_exec_tb
> >>     (tb_exit=0x7ffff56fe708, last_tb=<synthetic pointer>, pc=18446744072116179169, tb=<optimized out>, cpu=0x555556fd9000)
> >>     at ../../accel/tcg/cpu-exec.c:920
> >> #27 cpu_exec_loop (cpu=cpu@entry=0x555556fd9000, sc=sc@entry=0x7ffff56fe7a0) at ../../accel/tcg/cpu-exec.c:1041
> >> #28 0x0000555555c51d11 in cpu_exec_setjmp (cpu=cpu@entry=0x555556fd9000, sc=sc@entry=0x7ffff56fe7a0) at ../../accel/tcg/cpu-exec.c:1058
> >> #29 0x0000555555c523b4 in cpu_exec (cpu=cpu@entry=0x555556fd9000) at ../../accel/tcg/cpu-exec.c:1084
> >> #30 0x0000555555c74053 in tcg_cpus_exec (cpu=cpu@entry=0x555556fd9000) at ../../accel/tcg/tcg-accel-ops.c:76
> >> #31 0x0000555555c741a0 in mttcg_cpu_thread_fn (arg=arg@entry=0x555556fd9000) at ../../accel/tcg/tcg-accel-ops-mttcg.c:95
> >> #32 0x0000555555dfb580 in qemu_thread_start (args=0x55555703c3e0) at ../../util/qemu-thread-posix.c:541
> >> #33 0x00007ffff78176ba in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
> >> #34 0x00007ffff78a60d0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81  
> >
> > So, that looks like:
> >  * we call cpu_tb_exec(), which executes some generated code
> >  * that generated code calls the lookup_tb_ptr helper to see
> >    if we have a generated TB already for the address we're going
> >    to execute next
> >  * lookup_tb_ptr probes the TLB to see if we know the host RAM
> >    address for the guest address
> >  * this results in a TLB walk for an instruction fetch
> >  * the page table descriptor load is to IO memory
> >  * io_prepare assumes it needs to do a TB recompile, because
> >    can_do_io is clear
> >
> > I am not surprised that the corner case of "the guest put its
> > page tables in an MMIO device" has not yet come up :-)
> >
> > I'm really not sure how the icount handling should interact
> > with that...  
> 
> It's not just icount - we need to handle it for all modes now. That said,
> seeing as we are at the end of a block, shouldn't can_do_io be set?
> 
> Does:
> 
> modified   accel/tcg/translator.c
> @@ -201,6 +201,8 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
>          }
>      }
>  
> +    set_can_do_io(db, true);
> +
>      /* Emit code to exit the TB, as indicated by db->is_jmp.  */
>      ops->tb_stop(db, cpu);
>      gen_tb_end(tb, cflags, icount_start_insn, db->num_insns);
> 
> do the trick?

no :(
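
For reference, the failure path Peter describes above can be sketched as a toy model. This is only an illustration under stated assumptions: ToyCPU, AccessResult, toy_io_prepare() and toy_walk_from_tb_lookup() are hypothetical names, not QEMU's real types or functions, and the "nonzero retaddr always finds a TB" rule is a simplification of what the backtrace shows (cpu_io_recompile() aborting when called with retaddr=0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins only; none of these are QEMU's real types or functions. */
typedef enum {
    ACCESS_OK,        /* the access may proceed */
    ACCESS_RECOMPILE, /* the TB would be regenerated around the I/O access */
    ACCESS_ABORT      /* no TB found for retaddr: the cpu_abort() case */
} AccessResult;

typedef struct {
    bool can_do_io;   /* models the per-vCPU can_do_io flag */
} ToyCPU;

/*
 * Models the decision seen in the backtrace: with can_do_io clear, a
 * recompile is attempted, and that needs a host return address inside
 * generated code to locate the TB. Assumption: any nonzero retaddr
 * locates a TB, which is a simplification.
 */
static AccessResult toy_io_prepare(const ToyCPU *cpu, uintptr_t retaddr)
{
    assert(cpu != NULL);
    if (cpu->can_do_io) {
        return ACCESS_OK;
    }
    return retaddr != 0 ? ACCESS_RECOMPILE : ACCESS_ABORT;
}

/*
 * Models a page table walk started by the TB lookup helper rather than
 * by a guest load/store inside a TB, so there is no return address to
 * pass down and retaddr is 0.
 */
static AccessResult toy_walk_from_tb_lookup(const ToyCPU *cpu,
                                            bool page_table_in_io_memory)
{
    if (!page_table_in_io_memory) {
        return ACCESS_OK; /* descriptor load hits RAM, no I/O path taken */
    }
    return toy_io_prepare(cpu, 0);
}
```

In this model, a walk from the lookup helper with page tables in I/O memory and can_do_io clear gives ACCESS_ABORT, which corresponds to the "could not find TB" abort in frame #5. The same walk with page tables in RAM, or with can_do_io set, gives ACCESS_OK.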

> 
> >
> > -- PMM  
> 

