* [PATCH 0/3 qemu] tcg/i386: Page tables in MMIO memory fixes (CXL) @ 2024-02-15 15:01 Jonathan Cameron via 2024-02-15 15:01 ` [PATCH 1/3] accel/tcg: Set can_do_io at at start of lookup_tb_ptr helper Jonathan Cameron via ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Jonathan Cameron via @ 2024-02-15 15:01 UTC (permalink / raw) To: qemu-devel, Peter Maydell, Gregory Price, Alex Bennée, Sajjan Rao, Dimitrios Palyvos, richard.henderson, Paolo Bonzini, Eduardo Habkost Cc: linux-cxl CXL memory is interleaved at granularities as fine as 64 bytes. To emulate this each read and write access undergoes address translation similar to that used in physical hardware. This is done using cfmws_ops for a memory region per CXL Fixed Memory Window (the PA address range in the host that is interleaved across host bridges and beyond. The OS programs interleaved decoders in the CXL Root Bridges, switch upstream ports and the corresponding decoders CXL type 3 devices who have to know the Host PA to Device PA mappings). Unfortunately this CXL memory may be used as normal memory and anything that can end up in RAM can be placed within it. As Linux has become more capable of handling this memory we've started to get quite a few bug reports for the QEMU support. However terrible the performance is people seem to like running actual software stacks on it :( This doesn't work for KVM - so for now CXL emulation remains TCG only. (unless you are very careful on how it is used!) I plan to add some safety guards at a later date to make it slightly harder for people to shoot themselves in the foot + a more limited set of CXL functionality that is safe (no interleaving!) Previously we had some issues with TCG reading instructions from CXL memory but that is now all working. This time the issues are around the Page Tables being in the CXL memory + DMA buffers being placed in it. The test setup I've been using is simple 2 way interleave via 2 root ports below a single CXL root complex. After configuration in Linux these are mapped to their own Numa Node and numactl --membind=1 ls followed by powering down the machine is sufficient to hit all the bugs addressed in this series. Thanks to Gregory, Peter and Alex for their help figuring this lot out. Note that I've included one patch from Peter without a SoB because so far it has only be posted in the discussion thread. Whilst thread started back at: https://lore.kernel.org/all/CAAg4PaqsGZvkDk_=PH+Oz-yeEUVcVsrumncAgegRKuxe_YoFhA@mail.gmail.com/ The QEMU part is from. https://lore.kernel.org/all/20240201130438.00001384@Huawei.com/ arm64 equivalent to follow. Gregory Price (1): target/i386: Enable page walking from MMIO memory Jonathan Cameron (1): tcg: Avoid double lock if page tables happen to be in mmio memory. Peter Maydell (1): accel/tcg: Set can_do_io at at start of lookup_tb_ptr helper accel/tcg/cpu-exec.c | 8 ++++ accel/tcg/cputlb.c | 9 ++++- target/i386/tcg/sysemu/excp_helper.c | 57 +++++++++++++++------------- 3 files changed, 45 insertions(+), 29 deletions(-) -- 2.39.2 ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/3] accel/tcg: Set can_do_io at at start of lookup_tb_ptr helper 2024-02-15 15:01 [PATCH 0/3 qemu] tcg/i386: Page tables in MMIO memory fixes (CXL) Jonathan Cameron via @ 2024-02-15 15:01 ` Jonathan Cameron via 2024-02-15 15:11 ` Peter Maydell 2024-02-15 19:11 ` Richard Henderson 2024-02-15 15:01 ` [PATCH 2/3] target/i386: Enable page walking from MMIO memory Jonathan Cameron via 2024-02-15 15:01 ` [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory Jonathan Cameron via 2 siblings, 2 replies; 18+ messages in thread From: Jonathan Cameron via @ 2024-02-15 15:01 UTC (permalink / raw) To: qemu-devel, Peter Maydell, Gregory Price, Alex Bennée, Sajjan Rao, Dimitrios Palyvos, richard.henderson, Paolo Bonzini, Eduardo Habkost Cc: linux-cxl From: Peter Maydell <peter.maydell@linaro.org> Peter posted this in the thread trying to fix x86 TCG handling of page tables in MMIO space (specifically emulated CXL interleaved memory) https://lore.kernel.org/qemu-devel/CAFEAcA_a_AyQ=Epz3_+CheAT8Crsk9mOu894wbNW_FywamkZiw@mail.gmail.com/#t Peter, are you happy to give your SoB on this one? Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> --- accel/tcg/cpu-exec.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c index 977576ca14..52239a441f 100644 --- a/accel/tcg/cpu-exec.c +++ b/accel/tcg/cpu-exec.c @@ -396,6 +396,14 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env) uint64_t cs_base; uint32_t flags, cflags; + /* + * By definition we've just finished a TB, so I/O is OK. + * Avoid the possibility of calling cpu_io_recompile() if + * a page table walk triggered by tb_lookup() calling + * probe_access_internal() happens to touch an MMIO device. + * The next TB, if we chain to it, will clear the flag again. + */ + cpu->neg.can_do_io = true; cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags); cflags = curr_cflags(cpu); -- 2.39.2 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] accel/tcg: Set can_do_io at at start of lookup_tb_ptr helper 2024-02-15 15:01 ` [PATCH 1/3] accel/tcg: Set can_do_io at at start of lookup_tb_ptr helper Jonathan Cameron via @ 2024-02-15 15:11 ` Peter Maydell 2024-02-15 16:04 ` Jonathan Cameron via 2024-02-15 19:11 ` Richard Henderson 1 sibling, 1 reply; 18+ messages in thread From: Peter Maydell @ 2024-02-15 15:11 UTC (permalink / raw) To: Jonathan Cameron Cc: qemu-devel, Gregory Price, Alex Bennée, Sajjan Rao, Dimitrios Palyvos, richard.henderson, Paolo Bonzini, Eduardo Habkost, linux-cxl On Thu, 15 Feb 2024 at 15:02, Jonathan Cameron via <qemu-devel@nongnu.org> wrote: > > From: Peter Maydell <peter.maydell@linaro.org> > > Peter posted this in the thread trying to fix x86 TCG handling > of page tables in MMIO space (specifically emulated CXL interleaved memory) > https://lore.kernel.org/qemu-devel/CAFEAcA_a_AyQ=Epz3_+CheAT8Crsk9mOu894wbNW_FywamkZiw@mail.gmail.com/#t > > Peter, are you happy to give your SoB on this one? > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > --- > accel/tcg/cpu-exec.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c > index 977576ca14..52239a441f 100644 > --- a/accel/tcg/cpu-exec.c > +++ b/accel/tcg/cpu-exec.c > @@ -396,6 +396,14 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env) > uint64_t cs_base; > uint32_t flags, cflags; > > + /* > + * By definition we've just finished a TB, so I/O is OK. > + * Avoid the possibility of calling cpu_io_recompile() if > + * a page table walk triggered by tb_lookup() calling > + * probe_access_internal() happens to touch an MMIO device. > + * The next TB, if we chain to it, will clear the flag again. > + */ > + cpu->neg.can_do_io = true; > cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags); > > cflags = curr_cflags(cpu); > -- Happy to provide a Signed-off-by: Peter Maydell <peter.maydell@linaro.org> but I'd appreciate RTH's review to confirm this is the right way to deal with the problem. thanks -- PMM ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] accel/tcg: Set can_do_io at at start of lookup_tb_ptr helper 2024-02-15 15:11 ` Peter Maydell @ 2024-02-15 16:04 ` Jonathan Cameron via 0 siblings, 0 replies; 18+ messages in thread From: Jonathan Cameron via @ 2024-02-15 16:04 UTC (permalink / raw) To: Peter Maydell Cc: qemu-devel, Gregory Price, Alex Bennée, Sajjan Rao, Dimitrios Palyvos, richard.henderson, Paolo Bonzini, Eduardo Habkost, linux-cxl On Thu, 15 Feb 2024 15:11:17 +0000 Peter Maydell <peter.maydell@linaro.org> wrote: > On Thu, 15 Feb 2024 at 15:02, Jonathan Cameron via > <qemu-devel@nongnu.org> wrote: > > > > From: Peter Maydell <peter.maydell@linaro.org> > > > > Peter posted this in the thread trying to fix x86 TCG handling > > of page tables in MMIO space (specifically emulated CXL interleaved memory) > > https://lore.kernel.org/qemu-devel/CAFEAcA_a_AyQ=Epz3_+CheAT8Crsk9mOu894wbNW_FywamkZiw@mail.gmail.com/#t > > > > Peter, are you happy to give your SoB on this one? > > Thanks, I'll also add a summary of your description of why there is a bug based on your email to v2 as the above doesn't really provide any useful info :( If a page table is in IO memory and lookup_tb_ptr probes the TLB it can result in a page table walk for the instruction fetch. If this hits IO memory and io_prepare falsely assumes it needs to do a TLB recompile. Avoid that by setting can_do_io at the start of lookup_tb_ptr. > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > --- > > accel/tcg/cpu-exec.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c > > index 977576ca14..52239a441f 100644 > > --- a/accel/tcg/cpu-exec.c > > +++ b/accel/tcg/cpu-exec.c > > @@ -396,6 +396,14 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env) > > uint64_t cs_base; > > uint32_t flags, cflags; > > > > + /* > > + * By definition we've just finished a TB, so I/O is OK. > > + * Avoid the possibility of calling cpu_io_recompile() if > > + * a page table walk triggered by tb_lookup() calling > > + * probe_access_internal() happens to touch an MMIO device. > > + * The next TB, if we chain to it, will clear the flag again. > > + */ > > + cpu->neg.can_do_io = true; > > cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags); > > > > cflags = curr_cflags(cpu); > > -- > > Happy to provide a > Signed-off-by: Peter Maydell <peter.maydell@linaro.org> > > but I'd appreciate RTH's review to confirm this is the right > way to deal with the problem. > > thanks > -- PMM ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] accel/tcg: Set can_do_io at at start of lookup_tb_ptr helper 2024-02-15 15:01 ` [PATCH 1/3] accel/tcg: Set can_do_io at at start of lookup_tb_ptr helper Jonathan Cameron via 2024-02-15 15:11 ` Peter Maydell @ 2024-02-15 19:11 ` Richard Henderson 1 sibling, 0 replies; 18+ messages in thread From: Richard Henderson @ 2024-02-15 19:11 UTC (permalink / raw) To: Jonathan Cameron, qemu-devel, Peter Maydell, Gregory Price, Alex Bennée, Sajjan Rao, Dimitrios Palyvos, Paolo Bonzini, Eduardo Habkost Cc: linux-cxl On 2/15/24 05:01, Jonathan Cameron wrote: > From: Peter Maydell <peter.maydell@linaro.org> > > Peter posted this in the thread trying to fix x86 TCG handling > of page tables in MMIO space (specifically emulated CXL interleaved memory) > https://lore.kernel.org/qemu-devel/CAFEAcA_a_AyQ=Epz3_+CheAT8Crsk9mOu894wbNW_FywamkZiw@mail.gmail.com/#t > > Peter, are you happy to give your SoB on this one? > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > --- > accel/tcg/cpu-exec.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c > index 977576ca14..52239a441f 100644 > --- a/accel/tcg/cpu-exec.c > +++ b/accel/tcg/cpu-exec.c > @@ -396,6 +396,14 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env) > uint64_t cs_base; > uint32_t flags, cflags; > > + /* > + * By definition we've just finished a TB, so I/O is OK. > + * Avoid the possibility of calling cpu_io_recompile() if > + * a page table walk triggered by tb_lookup() calling > + * probe_access_internal() happens to touch an MMIO device. > + * The next TB, if we chain to it, will clear the flag again. > + */ > + cpu->neg.can_do_io = true; Yes, this mirrors what is done in cpu_loop_exit(), and I can see how get_page_addr_code() and the ptw within would require can_do_io set properly. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> r~ ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/3] target/i386: Enable page walking from MMIO memory 2024-02-15 15:01 [PATCH 0/3 qemu] tcg/i386: Page tables in MMIO memory fixes (CXL) Jonathan Cameron via 2024-02-15 15:01 ` [PATCH 1/3] accel/tcg: Set can_do_io at at start of lookup_tb_ptr helper Jonathan Cameron via @ 2024-02-15 15:01 ` Jonathan Cameron via 2024-02-15 15:31 ` Philippe Mathieu-Daudé 2024-02-15 19:21 ` Richard Henderson 2024-02-15 15:01 ` [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory Jonathan Cameron via 2 siblings, 2 replies; 18+ messages in thread From: Jonathan Cameron via @ 2024-02-15 15:01 UTC (permalink / raw) To: qemu-devel, Peter Maydell, Gregory Price, Alex Bennée, Sajjan Rao, Dimitrios Palyvos, richard.henderson, Paolo Bonzini, Eduardo Habkost Cc: linux-cxl From: Gregory Price <gregory.price@memverge.com> CXL emulation of interleave requires read and write hooks due to requirement for subpage granularity. The Linux kernel stack now enables using this memory as conventional memory in a separate NUMA node. If a process is deliberately forced to run from that node $ numactl --membind=1 ls the page table walk on i386 fails. Useful part of backtrace: (cpu=cpu@entry=0x555556fd9000, fmt=fmt@entry=0x555555fe3378 "cpu_io_recompile: could not find TB for pc=%p") at ../../cpu-target.c:359 (retaddr=0, addr=19595792376, attrs=..., xlat=<optimized out>, cpu=0x555556fd9000, out_offset=<synthetic pointer>) at ../../accel/tcg/cputlb.c:1339 (cpu=0x555556fd9000, full=0x7fffee0d96e0, ret_be=ret_be@entry=0, addr=19595792376, size=size@entry=8, mmu_idx=4, type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2030 (cpu=cpu@entry=0x555556fd9000, p=p@entry=0x7ffff56fddc0, mmu_idx=<optimized out>, type=type@entry=MMU_DATA_LOAD, memop=<optimized out>, ra=ra@entry=0) at ../../accel/tcg/cputlb.c:2356 (cpu=cpu@entry=0x555556fd9000, addr=addr@entry=19595792376, oi=oi@entry=52, ra=ra@entry=0, access_type=access_type@entry=MMU_DATA_LOAD) at ../../accel/tcg/cputlb.c:2439 at ../../accel/tcg/ldst_common.c.inc:301 at ../../target/i386/tcg/sysemu/excp_helper.c:173 (err=0x7ffff56fdf80, out=0x7ffff56fdf70, mmu_idx=0, access_type=MMU_INST_FETCH, addr=18446744072116178925, env=0x555556fdb7c0) at ../../target/i386/tcg/sysemu/excp_helper.c:578 (cs=0x555556fd9000, addr=18446744072116178925, size=<optimized out>, access_type=MMU_INST_FETCH, mmu_idx=0, probe=<optimized out>, retaddr=0) at ../../target/i386/tcg/sysemu/excp_helper.c:604 Avoid this by plumbing the address all the way down from x86_cpu_tlb_fill() where is available as retaddr to the actual accessors which provide it to probe_access_full() which already handles MMIO accesses. Signed-off-by: Gregory Price <gregory.price@memverge.com> --- Patch posted in reply to thread: https://lore.kernel.org/qemu-devel/ZbvpSaOXzZkqDd6c@memverge.com/ I checked Gregory was fine with me adding Sign-off / author via the CXL discord. --- target/i386/tcg/sysemu/excp_helper.c | 57 +++++++++++++++------------- 1 file changed, 30 insertions(+), 27 deletions(-) diff --git a/target/i386/tcg/sysemu/excp_helper.c b/target/i386/tcg/sysemu/excp_helper.c index 5b86f439ad..b3bce020f4 100644 --- a/target/i386/tcg/sysemu/excp_helper.c +++ b/target/i386/tcg/sysemu/excp_helper.c @@ -59,14 +59,14 @@ typedef struct PTETranslate { hwaddr gaddr; } PTETranslate; -static bool ptw_translate(PTETranslate *inout, hwaddr addr) +static bool ptw_translate(PTETranslate *inout, hwaddr addr, uint64_t ra) { CPUTLBEntryFull *full; int flags; inout->gaddr = addr; flags = probe_access_full(inout->env, addr, 0, MMU_DATA_STORE, - inout->ptw_idx, true, &inout->haddr, &full, 0); + inout->ptw_idx, true, &inout->haddr, &full, ra); if (unlikely(flags & TLB_INVALID_MASK)) { TranslateFault *err = inout->err; @@ -82,20 +82,20 @@ static bool ptw_translate(PTETranslate *inout, hwaddr addr) return true; } -static inline uint32_t ptw_ldl(const PTETranslate *in) +static inline uint32_t ptw_ldl(const PTETranslate *in, uint64_t ra) { if (likely(in->haddr)) { return ldl_p(in->haddr); } - return cpu_ldl_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, 0); + return cpu_ldl_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, ra); } -static inline uint64_t ptw_ldq(const PTETranslate *in) +static inline uint64_t ptw_ldq(const PTETranslate *in, uint64_t ra) { if (likely(in->haddr)) { return ldq_p(in->haddr); } - return cpu_ldq_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, 0); + return cpu_ldq_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, ra); } /* @@ -132,7 +132,8 @@ static inline bool ptw_setl(const PTETranslate *in, uint32_t old, uint32_t set) } static bool mmu_translate(CPUX86State *env, const TranslateParams *in, - TranslateResult *out, TranslateFault *err) + TranslateResult *out, TranslateFault *err, + uint64_t ra) { const int32_t a20_mask = x86_get_a20_mask(env); const target_ulong addr = in->addr; @@ -166,11 +167,11 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in, */ pte_addr = ((in->cr3 & ~0xfff) + (((addr >> 48) & 0x1ff) << 3)) & a20_mask; - if (!ptw_translate(&pte_trans, pte_addr)) { + if (!ptw_translate(&pte_trans, pte_addr, ra)) { return false; } restart_5: - pte = ptw_ldq(&pte_trans); + pte = ptw_ldq(&pte_trans, ra); if (!(pte & PG_PRESENT_MASK)) { goto do_fault; } @@ -191,11 +192,11 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in, */ pte_addr = ((pte & PG_ADDRESS_MASK) + (((addr >> 39) & 0x1ff) << 3)) & a20_mask; - if (!ptw_translate(&pte_trans, pte_addr)) { + if (!ptw_translate(&pte_trans, pte_addr, ra)) { return false; } restart_4: - pte = ptw_ldq(&pte_trans); + pte = ptw_ldq(&pte_trans, ra); if (!(pte & PG_PRESENT_MASK)) { goto do_fault; } @@ -212,11 +213,11 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in, */ pte_addr = ((pte & PG_ADDRESS_MASK) + (((addr >> 30) & 0x1ff) << 3)) & a20_mask; - if (!ptw_translate(&pte_trans, pte_addr)) { + if (!ptw_translate(&pte_trans, pte_addr, ra)) { return false; } restart_3_lma: - pte = ptw_ldq(&pte_trans); + pte = ptw_ldq(&pte_trans, ra); if (!(pte & PG_PRESENT_MASK)) { goto do_fault; } @@ -239,12 +240,12 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in, * Page table level 3 */ pte_addr = ((in->cr3 & ~0x1f) + ((addr >> 27) & 0x18)) & a20_mask; - if (!ptw_translate(&pte_trans, pte_addr)) { + if (!ptw_translate(&pte_trans, pte_addr, ra)) { return false; } rsvd_mask |= PG_HI_USER_MASK; restart_3_nolma: - pte = ptw_ldq(&pte_trans); + pte = ptw_ldq(&pte_trans, ra); if (!(pte & PG_PRESENT_MASK)) { goto do_fault; } @@ -262,11 +263,11 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in, */ pte_addr = ((pte & PG_ADDRESS_MASK) + (((addr >> 21) & 0x1ff) << 3)) & a20_mask; - if (!ptw_translate(&pte_trans, pte_addr)) { + if (!ptw_translate(&pte_trans, pte_addr, ra)) { return false; } restart_2_pae: - pte = ptw_ldq(&pte_trans); + pte = ptw_ldq(&pte_trans, ra); if (!(pte & PG_PRESENT_MASK)) { goto do_fault; } @@ -289,10 +290,10 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in, */ pte_addr = ((pte & PG_ADDRESS_MASK) + (((addr >> 12) & 0x1ff) << 3)) & a20_mask; - if (!ptw_translate(&pte_trans, pte_addr)) { + if (!ptw_translate(&pte_trans, pte_addr, ra)) { return false; } - pte = ptw_ldq(&pte_trans); + pte = ptw_ldq(&pte_trans, ra); if (!(pte & PG_PRESENT_MASK)) { goto do_fault; } @@ -307,11 +308,11 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in, * Page table level 2 */ pte_addr = ((in->cr3 & ~0xfff) + ((addr >> 20) & 0xffc)) & a20_mask; - if (!ptw_translate(&pte_trans, pte_addr)) { + if (!ptw_translate(&pte_trans, pte_addr, ra)) { return false; } restart_2_nopae: - pte = ptw_ldl(&pte_trans); + pte = ptw_ldl(&pte_trans, ra); if (!(pte & PG_PRESENT_MASK)) { goto do_fault; } @@ -336,10 +337,10 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in, * Page table level 1 */ pte_addr = ((pte & ~0xfffu) + ((addr >> 10) & 0xffc)) & a20_mask; - if (!ptw_translate(&pte_trans, pte_addr)) { + if (!ptw_translate(&pte_trans, pte_addr, ra)) { return false; } - pte = ptw_ldl(&pte_trans); + pte = ptw_ldl(&pte_trans, ra); if (!(pte & PG_PRESENT_MASK)) { goto do_fault; } @@ -529,7 +530,8 @@ static G_NORETURN void raise_stage2(CPUX86State *env, TranslateFault *err, static bool get_physical_address(CPUX86State *env, vaddr addr, MMUAccessType access_type, int mmu_idx, - TranslateResult *out, TranslateFault *err) + TranslateResult *out, TranslateFault *err, + uint64_t ra) { TranslateParams in; bool use_stage2 = env->hflags2 & HF2_NPT_MASK; @@ -548,7 +550,7 @@ static bool get_physical_address(CPUX86State *env, vaddr addr, in.mmu_idx = MMU_USER_IDX; in.ptw_idx = MMU_PHYS_IDX; - if (!mmu_translate(env, &in, out, err)) { + if (!mmu_translate(env, &in, out, err, ra)) { err->stage2 = S2_GPA; return false; } @@ -575,7 +577,7 @@ static bool get_physical_address(CPUX86State *env, vaddr addr, return false; } } - return mmu_translate(env, &in, out, err); + return mmu_translate(env, &in, out, err, ra); } break; } @@ -601,7 +603,8 @@ bool x86_cpu_tlb_fill(CPUState *cs, vaddr addr, int size, TranslateResult out; TranslateFault err; - if (get_physical_address(env, addr, access_type, mmu_idx, &out, &err)) { + if (get_physical_address(env, addr, access_type, mmu_idx, &out, &err, + retaddr)) { /* * Even if 4MB pages, we map only one 4KB page in the cache to * avoid filling it too fast. -- 2.39.2 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 2/3] target/i386: Enable page walking from MMIO memory 2024-02-15 15:01 ` [PATCH 2/3] target/i386: Enable page walking from MMIO memory Jonathan Cameron via @ 2024-02-15 15:31 ` Philippe Mathieu-Daudé 2024-02-15 15:56 ` Jonathan Cameron via 2024-02-15 19:21 ` Richard Henderson 1 sibling, 1 reply; 18+ messages in thread From: Philippe Mathieu-Daudé @ 2024-02-15 15:31 UTC (permalink / raw) To: Jonathan Cameron, qemu-devel, Peter Maydell, Gregory Price, Alex Bennée, Sajjan Rao, Dimitrios Palyvos, richard.henderson, Paolo Bonzini, Eduardo Habkost Cc: linux-cxl On 15/2/24 16:01, Jonathan Cameron via wrote: > From: Gregory Price <gregory.price@memverge.com> > > CXL emulation of interleave requires read and write hooks due to > requirement for subpage granularity. The Linux kernel stack now enables > using this memory as conventional memory in a separate NUMA node. If a > process is deliberately forced to run from that node > $ numactl --membind=1 ls > the page table walk on i386 fails. > > Useful part of backtrace: > > (cpu=cpu@entry=0x555556fd9000, fmt=fmt@entry=0x555555fe3378 "cpu_io_recompile: could not find TB for pc=%p") > at ../../cpu-target.c:359 > (retaddr=0, addr=19595792376, attrs=..., xlat=<optimized out>, cpu=0x555556fd9000, out_offset=<synthetic pointer>) > at ../../accel/tcg/cputlb.c:1339 > (cpu=0x555556fd9000, full=0x7fffee0d96e0, ret_be=ret_be@entry=0, addr=19595792376, size=size@entry=8, mmu_idx=4, type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2030 > (cpu=cpu@entry=0x555556fd9000, p=p@entry=0x7ffff56fddc0, mmu_idx=<optimized out>, type=type@entry=MMU_DATA_LOAD, memop=<optimized out>, ra=ra@entry=0) at ../../accel/tcg/cputlb.c:2356 > (cpu=cpu@entry=0x555556fd9000, addr=addr@entry=19595792376, oi=oi@entry=52, ra=ra@entry=0, access_type=access_type@entry=MMU_DATA_LOAD) at ../../accel/tcg/cputlb.c:2439 > at ../../accel/tcg/ldst_common.c.inc:301 > at ../../target/i386/tcg/sysemu/excp_helper.c:173 > (err=0x7ffff56fdf80, out=0x7ffff56fdf70, mmu_idx=0, access_type=MMU_INST_FETCH, addr=18446744072116178925, env=0x555556fdb7c0) > at ../../target/i386/tcg/sysemu/excp_helper.c:578 > (cs=0x555556fd9000, addr=18446744072116178925, size=<optimized out>, access_type=MMU_INST_FETCH, mmu_idx=0, probe=<optimized out>, retaddr=0) at ../../target/i386/tcg/sysemu/excp_helper.c:604 > > Avoid this by plumbing the address all the way down from > x86_cpu_tlb_fill() where is available as retaddr to the actual accessors > which provide it to probe_access_full() which already handles MMIO accesses. > Suggested-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> > Signed-off-by: Gregory Price <gregory.price@memverge.com> > > --- > Patch posted in reply to thread: > https://lore.kernel.org/qemu-devel/ZbvpSaOXzZkqDd6c@memverge.com/ > > I checked Gregory was fine with me adding Sign-off / author via the CXL discord. > --- > target/i386/tcg/sysemu/excp_helper.c | 57 +++++++++++++++------------- > 1 file changed, 30 insertions(+), 27 deletions(-) ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/3] target/i386: Enable page walking from MMIO memory 2024-02-15 15:31 ` Philippe Mathieu-Daudé @ 2024-02-15 15:56 ` Jonathan Cameron via 0 siblings, 0 replies; 18+ messages in thread From: Jonathan Cameron via @ 2024-02-15 15:56 UTC (permalink / raw) To: Philippe Mathieu-Daudé Cc: qemu-devel, Peter Maydell, Gregory Price, Alex Bennée, Sajjan Rao, Dimitrios Palyvos, richard.henderson, Paolo Bonzini, Eduardo Habkost, linux-cxl On Thu, 15 Feb 2024 16:31:26 +0100 Philippe Mathieu-Daudé <philmd@linaro.org> wrote: > On 15/2/24 16:01, Jonathan Cameron via wrote: > > From: Gregory Price <gregory.price@memverge.com> > > > > CXL emulation of interleave requires read and write hooks due to > > requirement for subpage granularity. The Linux kernel stack now enables > > using this memory as conventional memory in a separate NUMA node. If a > > process is deliberately forced to run from that node > > $ numactl --membind=1 ls > > the page table walk on i386 fails. > > > > Useful part of backtrace: > > > > (cpu=cpu@entry=0x555556fd9000, fmt=fmt@entry=0x555555fe3378 "cpu_io_recompile: could not find TB for pc=%p") > > at ../../cpu-target.c:359 > > (retaddr=0, addr=19595792376, attrs=..., xlat=<optimized out>, cpu=0x555556fd9000, out_offset=<synthetic pointer>) > > at ../../accel/tcg/cputlb.c:1339 > > (cpu=0x555556fd9000, full=0x7fffee0d96e0, ret_be=ret_be@entry=0, addr=19595792376, size=size@entry=8, mmu_idx=4, type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2030 > > (cpu=cpu@entry=0x555556fd9000, p=p@entry=0x7ffff56fddc0, mmu_idx=<optimized out>, type=type@entry=MMU_DATA_LOAD, memop=<optimized out>, ra=ra@entry=0) at ../../accel/tcg/cputlb.c:2356 > > (cpu=cpu@entry=0x555556fd9000, addr=addr@entry=19595792376, oi=oi@entry=52, ra=ra@entry=0, access_type=access_type@entry=MMU_DATA_LOAD) at ../../accel/tcg/cputlb.c:2439 > > at ../../accel/tcg/ldst_common.c.inc:301 > > at ../../target/i386/tcg/sysemu/excp_helper.c:173 > > (err=0x7ffff56fdf80, out=0x7ffff56fdf70, mmu_idx=0, access_type=MMU_INST_FETCH, addr=18446744072116178925, env=0x555556fdb7c0) > > at ../../target/i386/tcg/sysemu/excp_helper.c:578 > > (cs=0x555556fd9000, addr=18446744072116178925, size=<optimized out>, access_type=MMU_INST_FETCH, mmu_idx=0, probe=<optimized out>, retaddr=0) at ../../target/i386/tcg/sysemu/excp_helper.c:604 > > > > Avoid this by plumbing the address all the way down from > > x86_cpu_tlb_fill() where is available as retaddr to the actual accessors > > which provide it to probe_access_full() which already handles MMIO accesses. > > > > Suggested-by: Peter Maydell <peter.maydell@linaro.org> Good point! Sorry Peter. > Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Thanks > > > Signed-off-by: Gregory Price <gregory.price@memverge.com> > > > > --- > > Patch posted in reply to thread: > > https://lore.kernel.org/qemu-devel/ZbvpSaOXzZkqDd6c@memverge.com/ > > > > I checked Gregory was fine with me adding Sign-off / author via the CXL discord. > > --- > > target/i386/tcg/sysemu/excp_helper.c | 57 +++++++++++++++------------- > > 1 file changed, 30 insertions(+), 27 deletions(-) > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/3] target/i386: Enable page walking from MMIO memory 2024-02-15 15:01 ` [PATCH 2/3] target/i386: Enable page walking from MMIO memory Jonathan Cameron via 2024-02-15 15:31 ` Philippe Mathieu-Daudé @ 2024-02-15 19:21 ` Richard Henderson 2024-02-15 19:31 ` Richard Henderson 1 sibling, 1 reply; 18+ messages in thread From: Richard Henderson @ 2024-02-15 19:21 UTC (permalink / raw) To: Jonathan Cameron, qemu-devel, Peter Maydell, Gregory Price, Alex Bennée, Sajjan Rao, Dimitrios Palyvos, Paolo Bonzini, Eduardo Habkost Cc: linux-cxl On 2/15/24 05:01, Jonathan Cameron wrote: > -static bool ptw_translate(PTETranslate *inout, hwaddr addr) > +static bool ptw_translate(PTETranslate *inout, hwaddr addr, uint64_t ra) You do not need to pass in 'ra' here... > flags = probe_access_full(inout->env, addr, 0, MMU_DATA_STORE, > - inout->ptw_idx, true, &inout->haddr, &full, 0); > + inout->ptw_idx, true, &inout->haddr, &full, ra); ... because this ^^^^ indicates a non-fault probe. A return address is only required for a faulting probe to raise an exception. > -static inline uint32_t ptw_ldl(const PTETranslate *in) > +static inline uint32_t ptw_ldl(const PTETranslate *in, uint64_t ra) > { > if (likely(in->haddr)) { > return ldl_p(in->haddr); > } > - return cpu_ldl_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, 0); > + return cpu_ldl_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, ra); > } However the loads do require 'ra' for the io_recompile, as you saw from the backtrace. r~ ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/3] target/i386: Enable page walking from MMIO memory 2024-02-15 19:21 ` Richard Henderson @ 2024-02-15 19:31 ` Richard Henderson 0 siblings, 0 replies; 18+ messages in thread From: Richard Henderson @ 2024-02-15 19:31 UTC (permalink / raw) To: Jonathan Cameron, qemu-devel, Peter Maydell, Gregory Price, Alex Bennée, Sajjan Rao, Dimitrios Palyvos, Paolo Bonzini, Eduardo Habkost Cc: linux-cxl On 2/15/24 09:21, Richard Henderson wrote: > On 2/15/24 05:01, Jonathan Cameron wrote: >> -static bool ptw_translate(PTETranslate *inout, hwaddr addr) >> +static bool ptw_translate(PTETranslate *inout, hwaddr addr, uint64_t ra) > > You do not need to pass in 'ra' here... > >> flags = probe_access_full(inout->env, addr, 0, MMU_DATA_STORE, >> - inout->ptw_idx, true, &inout->haddr, &full, 0); >> + inout->ptw_idx, true, &inout->haddr, &full, ra); > > ... because this ^^^^ > indicates a non-fault probe. > A return address is only required for a faulting probe to raise an exception. Bah. We can always recurse and hit the load case too. So: Reviewed-by: Richard Henderson <richard.henderson@linaro.org> r~ > >> -static inline uint32_t ptw_ldl(const PTETranslate *in) >> +static inline uint32_t ptw_ldl(const PTETranslate *in, uint64_t ra) >> { >> if (likely(in->haddr)) { >> return ldl_p(in->haddr); >> } >> - return cpu_ldl_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, 0); >> + return cpu_ldl_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, ra); >> } > > However the loads do require 'ra' for the io_recompile, as you saw from the backtrace. > > > r~ ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory. 2024-02-15 15:01 [PATCH 0/3 qemu] tcg/i386: Page tables in MMIO memory fixes (CXL) Jonathan Cameron via 2024-02-15 15:01 ` [PATCH 1/3] accel/tcg: Set can_do_io at at start of lookup_tb_ptr helper Jonathan Cameron via 2024-02-15 15:01 ` [PATCH 2/3] target/i386: Enable page walking from MMIO memory Jonathan Cameron via @ 2024-02-15 15:01 ` Jonathan Cameron via 2024-02-15 15:33 ` Philippe Mathieu-Daudé ` (2 more replies) 2 siblings, 3 replies; 18+ messages in thread From: Jonathan Cameron via @ 2024-02-15 15:01 UTC (permalink / raw) To: qemu-devel, Peter Maydell, Gregory Price, Alex Bennée, Sajjan Rao, Dimitrios Palyvos, richard.henderson, Paolo Bonzini, Eduardo Habkost Cc: linux-cxl On i386, after fixing the page walking code to work with pages in MMIO memory (specifically CXL emulated interleaved memory), a crash was seen in an interrupt handling path. Useful part of bt Peter identified this as being due to the BQL already being held when the page table walker encounters MMIO memory and attempts to take the lock again. There are other examples of similar paths TCG, so this follows the approach taken in those of simply checking if the lock is already held and if it is, don't take it again. Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> --- accel/tcg/cputlb.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c index 047cd2cc0a..3b8d178707 100644 --- a/accel/tcg/cputlb.c +++ b/accel/tcg/cputlb.c @@ -2019,6 +2019,7 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full, int mmu_idx, MMUAccessType type, uintptr_t ra) { MemoryRegionSection *section; + bool locked = bql_locked(); MemoryRegion *mr; hwaddr mr_offset; MemTxAttrs attrs; @@ -2030,10 +2031,14 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full, section = io_prepare(&mr_offset, cpu, full->xlat_section, attrs, addr, ra); mr = section->mr; - bql_lock(); + if (!locked) { + bql_lock(); + } ret = int_ld_mmio_beN(cpu, full, ret_be, addr, size, mmu_idx, type, ra, mr, mr_offset); - bql_unlock(); + if (!locked) { + bql_unlock(); + } return ret; } -- 2.39.2 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory. 2024-02-15 15:01 ` [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory Jonathan Cameron via @ 2024-02-15 15:33 ` Philippe Mathieu-Daudé 2024-02-15 16:11 ` Jonathan Cameron via 2024-02-15 16:11 ` Peter Maydell 2024-02-15 19:30 ` Richard Henderson 2 siblings, 1 reply; 18+ messages in thread From: Philippe Mathieu-Daudé @ 2024-02-15 15:33 UTC (permalink / raw) To: Jonathan Cameron, qemu-devel, Peter Maydell, Gregory Price, Alex Bennée, Sajjan Rao, Dimitrios Palyvos, richard.henderson, Paolo Bonzini, Eduardo Habkost Cc: linux-cxl On 15/2/24 16:01, Jonathan Cameron via wrote: > On i386, after fixing the page walking code to work with pages in > MMIO memory (specifically CXL emulated interleaved memory), > a crash was seen in an interrupt handling path. > > Useful part of bt > > Peter identified this as being due to the BQL already being > held when the page table walker encounters MMIO memory and attempts > to take the lock again. There are other examples of similar paths > TCG, so this follows the approach taken in those of simply checking > if the lock is already held and if it is, don't take it again. > > Suggested-by: Peter Maydell <peter.maydell@linaro.org> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > --- > accel/tcg/cputlb.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c > index 047cd2cc0a..3b8d178707 100644 > --- a/accel/tcg/cputlb.c > +++ b/accel/tcg/cputlb.c > @@ -2019,6 +2019,7 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full, > int mmu_idx, MMUAccessType type, uintptr_t ra) > { > MemoryRegionSection *section; > + bool locked = bql_locked(); Maybe clearer as: bool need_lock = !bql_locked(); > MemoryRegion *mr; > hwaddr mr_offset; > MemTxAttrs attrs; > @@ -2030,10 +2031,14 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full, > section = io_prepare(&mr_offset, cpu, full->xlat_section, attrs, addr, ra); > mr = section->mr; > > - bql_lock(); > + if (!locked) { if (unlikely(need_lock)) { > + bql_lock(); > + } > ret = int_ld_mmio_beN(cpu, full, ret_be, addr, size, mmu_idx, > type, ra, mr, mr_offset); > - bql_unlock(); > + if (!locked) { Ditto. > + bql_unlock(); > + } > > return ret; > } ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory. 2024-02-15 15:33 ` Philippe Mathieu-Daudé @ 2024-02-15 16:11 ` Jonathan Cameron via 0 siblings, 0 replies; 18+ messages in thread From: Jonathan Cameron via @ 2024-02-15 16:11 UTC (permalink / raw) To: Philippe Mathieu-Daudé Cc: qemu-devel, Peter Maydell, Gregory Price, Alex Bennée, Sajjan Rao, Dimitrios Palyvos, richard.henderson, Paolo Bonzini, Eduardo Habkost, linux-cxl On Thu, 15 Feb 2024 16:33:45 +0100 Philippe Mathieu-Daudé <philmd@linaro.org> wrote: > On 15/2/24 16:01, Jonathan Cameron via wrote: > > On i386, after fixing the page walking code to work with pages in > > MMIO memory (specifically CXL emulated interleaved memory), > > a crash was seen in an interrupt handling path. > > > > Useful part of bt > > > > Peter identified this as being due to the BQL already being > > held when the page table walker encounters MMIO memory and attempts > > to take the lock again. There are other examples of similar paths > > TCG, so this follows the approach taken in those of simply checking > > if the lock is already held and if it is, don't take it again. > > > > Suggested-by: Peter Maydell <peter.maydell@linaro.org> > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > --- > > accel/tcg/cputlb.c | 9 +++++++-- > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c > > index 047cd2cc0a..3b8d178707 100644 > > --- a/accel/tcg/cputlb.c > > +++ b/accel/tcg/cputlb.c > > @@ -2019,6 +2019,7 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full, > > int mmu_idx, MMUAccessType type, uintptr_t ra) > > { > > MemoryRegionSection *section; > > + bool locked = bql_locked(); > > Maybe clearer as: > > bool need_lock = !bql_locked(); > > > MemoryRegion *mr; > > hwaddr mr_offset; > > MemTxAttrs attrs; > > @@ -2030,10 +2031,14 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full, > > section = io_prepare(&mr_offset, cpu, full->xlat_section, attrs, addr, ra); > > mr = section->mr; > > > > - bql_lock(); > > + if (!locked) { > > if (unlikely(need_lock)) { Isn't this reversed? Until now we've always taken the lock here so I'm guessing it normally is needed. if (likely(need_lock))? if we are going to mark it one way or the other. > > > + bql_lock(); > > + } > > ret = int_ld_mmio_beN(cpu, full, ret_be, addr, size, mmu_idx, > > type, ra, mr, mr_offset); > > - bql_unlock(); > > + if (!locked) { > > Ditto. > > > + bql_unlock(); > > + } > > > > return ret; > > } > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory. 2024-02-15 15:01 ` [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory Jonathan Cameron via 2024-02-15 15:33 ` Philippe Mathieu-Daudé @ 2024-02-15 16:11 ` Peter Maydell 2024-02-15 17:34 ` Jonathan Cameron via 2024-02-15 19:30 ` Richard Henderson 2 siblings, 1 reply; 18+ messages in thread From: Peter Maydell @ 2024-02-15 16:11 UTC (permalink / raw) To: Jonathan Cameron Cc: qemu-devel, Gregory Price, Alex Bennée, Sajjan Rao, Dimitrios Palyvos, richard.henderson, Paolo Bonzini, Eduardo Habkost, linux-cxl On Thu, 15 Feb 2024 at 15:03, Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: > > On i386, after fixing the page walking code to work with pages in > MMIO memory (specifically CXL emulated interleaved memory), > a crash was seen in an interrupt handling path. > > Useful part of bt Did you intend to put in a backtrace here? > > Peter identified this as being due to the BQL already being > held when the page table walker encounters MMIO memory and attempts > to take the lock again. There are other examples of similar paths > TCG, so this follows the approach taken in those of simply checking > if the lock is already held and if it is, don't take it again. > > Suggested-by: Peter Maydell <peter.maydell@linaro.org> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > --- > accel/tcg/cputlb.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c > index 047cd2cc0a..3b8d178707 100644 > --- a/accel/tcg/cputlb.c > +++ b/accel/tcg/cputlb.c > @@ -2019,6 +2019,7 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full, > int mmu_idx, MMUAccessType type, uintptr_t ra) > { > MemoryRegionSection *section; > + bool locked = bql_locked(); > MemoryRegion *mr; > hwaddr mr_offset; > MemTxAttrs attrs; > @@ -2030,10 +2031,14 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full, > section = io_prepare(&mr_offset, cpu, full->xlat_section, attrs, addr, ra); > mr = section->mr; > > - bql_lock(); > + if (!locked) { > + bql_lock(); > + } > ret = int_ld_mmio_beN(cpu, full, ret_be, addr, size, mmu_idx, > type, ra, mr, mr_offset); > - bql_unlock(); > + if (!locked) { > + bql_unlock(); > + } > > return ret; > } Can we do this consistently across all four functions do_ld_mmio_beN, do_ld16_mmio_beN, do_st_mmio_leN, do_st16_mmio_leN, please ? It happens that your workload only needs to do an 8-byte load but conceptually the same thing applies in all these cases. thanks -- PMM ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory. 2024-02-15 16:11 ` Peter Maydell @ 2024-02-15 17:34 ` Jonathan Cameron via 0 siblings, 0 replies; 18+ messages in thread From: Jonathan Cameron via @ 2024-02-15 17:34 UTC (permalink / raw) To: Peter Maydell Cc: qemu-devel, Gregory Price, Alex Bennée, Sajjan Rao, Dimitrios Palyvos, richard.henderson, Paolo Bonzini, Eduardo Habkost, linux-cxl On Thu, 15 Feb 2024 16:11:26 +0000 Peter Maydell <peter.maydell@linaro.org> wrote: > On Thu, 15 Feb 2024 at 15:03, Jonathan Cameron > <Jonathan.Cameron@huawei.com> wrote: > > > > On i386, after fixing the page walking code to work with pages in > > MMIO memory (specifically CXL emulated interleaved memory), > > a crash was seen in an interrupt handling path. > > > > Useful part of bt > > Did you intend to put in a backtrace here? ah. Indeed. Forgot that the # at start of a bt is a comment in a git message oops. I'll put those back in (hash removed) for v2. 7 0x0000555555ab1929 in bql_lock_impl (file=0x555556049122 "../../accel/tcg/cputlb.c", line=2033) at ../../system/cpus.c:524 8 bql_lock_impl (file=file@entry=0x555556049122 "../../accel/tcg/cputlb.c", line=line@entry=2033) at ../../system/cpus.c:520 9 0x0000555555c9f7d6 in do_ld_mmio_beN (cpu=0x5555578e0cb0, full=0x7ffe88012950, ret_be=ret_be@entry=0, addr=19595792376, size=size@entry=8, mmu_idx=4, type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2033 10 0x0000555555ca0fbd in do_ld_8 (cpu=cpu@entry=0x5555578e0cb0, p=p@entry=0x7ffff4efd1d0, mmu_idx=<optimized out>, type=type@entry=MMU_DATA_LOAD, memop=<optimized out>, ra=ra@entry=0) at ../../accel/tcg/cputlb.c:2356 11 0x0000555555ca341f in do_ld8_mmu (cpu=cpu@entry=0x5555578e0cb0, addr=addr@entry=19595792376, oi=oi@entry=52, ra=0, ra@entry=52, access_type=access_type@entry=MMU_DATA_LOAD) at ../../accel/tcg/cputlb.c:2439 12 0x0000555555ca5f59 in cpu_ldq_mmu (ra=52, oi=52, addr=19595792376, env=0x5555578e3470) at ../../accel/tcg/ldst_common.c.inc:169 13 cpu_ldq_le_mmuidx_ra (env=0x5555578e3470, addr=19595792376, mmu_idx=<optimized out>, ra=ra@entry=0) at ../../accel/tcg/ldst_common.c.inc:301 14 0x0000555555b4b5fc in ptw_ldq (ra=0, in=0x7ffff4efd320) at ../../target/i386/tcg/sysemu/excp_helper.c:98 15 ptw_ldq (ra=0, in=0x7ffff4efd320) at ../../target/i386/tcg/sysemu/excp_helper.c:93 16 mmu_translate (env=env@entry=0x5555578e3470, in=0x7ffff4efd3e0, out=0x7ffff4efd3b0, err=err@entry=0x7ffff4efd3c0, ra=ra@entry=0) at ../../target/i386/tcg/sysemu/excp_helper.c:174 17 0x0000555555b4c4b3 in get_physical_address (ra=0, err=0x7ffff4efd3c0, out=0x7ffff4efd3b0, mmu_idx=0, access_type=MMU_DATA_LOAD, addr=18446741874686299840, env=0x5555578e3470) at ../../target/i386/tcg/sysemu/excp_helper.c:580 18 x86_cpu_tlb_fill (cs=0x5555578e0cb0, addr=18446741874686299840, size=<optimized out>, access_type=MMU_DATA_LOAD, mmu_idx=0, probe=<optimized out>, retaddr=0) at ../../target/i386/tcg/sysemu/excp_helper.c:606 19 0x0000555555ca0ee9 in tlb_fill (retaddr=0, mmu_idx=0, access_type=MMU_DATA_LOAD, size=<optimized out>, addr=18446741874686299840, cpu=0x7ffff4efd540) at ../../accel/tcg/cputlb.c:1315 20 mmu_lookup1 (cpu=cpu@entry=0x5555578e0cb0, data=data@entry=0x7ffff4efd540, mmu_idx=0, access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at ../../accel/tcg/cputlb.c:1713 21 0x0000555555ca2c61 in mmu_lookup (cpu=cpu@entry=0x5555578e0cb0, addr=addr@entry=18446741874686299840, oi=oi@entry=32, ra=ra@entry=0, type=type@entry=MMU_DATA_LOAD, l=l@entry=0x7ffff4efd540) at ../../accel/tcg/cputlb.c:1803 22 0x0000555555ca3165 in do_ld4_mmu (cpu=cpu@entry=0x5555578e0cb0, addr=addr@entry=18446741874686299840, oi=oi@entry=32, ra=ra@entry=0, access_type=access_type@entry=MMU_DATA_LOAD) at ../../accel/tcg/cputlb.c:2416 23 0x0000555555ca5ef9 in cpu_ldl_mmu (ra=0, oi=32, addr=18446741874686299840, env=0x5555578e3470) at ../../accel/tcg/ldst_common.c.inc:158 24 cpu_ldl_le_mmuidx_ra (env=env@entry=0x5555578e3470, addr=addr@entry=18446741874686299840, mmu_idx=<optimized out>, ra=ra@entry=0) at ../../accel/tcg/ldst_common.c.inc:294 25 0x0000555555bb6cdd in do_interrupt64 (is_hw=1, next_eip=18446744072399775809, error_code=0, is_int=0, intno=236, env=0x5555578e3470) at ../../target/i386/tcg/seg_helper.c:889 26 do_interrupt_all (cpu=cpu@entry=0x5555578e0cb0, intno=236, is_int=is_int@entry=0, error_code=error_code@entry=0, next_eip=next_eip@entry=0, is_hw=is_hw@entry=1) at ../../target/i386/tcg/seg_helper.c:1130 27 0x0000555555bb87da in do_interrupt_x86_hardirq (env=env@entry=0x5555578e3470, intno=<optimized out>, is_hw=is_hw@entry=1) at ../../target/i386/tcg/seg_helper.c:1162 28 0x0000555555b5039c in x86_cpu_exec_interrupt (cs=0x5555578e0cb0, interrupt_request=<optimized out>) at ../../target/i386/tcg/sysemu/seg_helper.c:197 29 0x0000555555c94480 in cpu_handle_interrupt (last_tb=<synthetic pointer>, cpu=0x5555578e0cb0) at ../../accel/tcg/cpu-exec.c:844 > > > > > Peter identified this as being due to the BQL already being > > held when the page table walker encounters MMIO memory and attempts > > to take the lock again. There are other examples of similar paths > > TCG, so this follows the approach taken in those of simply checking > > if the lock is already held and if it is, don't take it again. > > > > Suggested-by: Peter Maydell <peter.maydell@linaro.org> > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > --- > > accel/tcg/cputlb.c | 9 +++++++-- > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c > > index 047cd2cc0a..3b8d178707 100644 > > --- a/accel/tcg/cputlb.c > > +++ b/accel/tcg/cputlb.c > > @@ -2019,6 +2019,7 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full, > > int mmu_idx, MMUAccessType type, uintptr_t ra) > > { > > MemoryRegionSection *section; > > + bool locked = bql_locked(); > > MemoryRegion *mr; > > hwaddr mr_offset; > > MemTxAttrs attrs; > > @@ -2030,10 +2031,14 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full, > > section = io_prepare(&mr_offset, cpu, full->xlat_section, attrs, addr, ra); > > mr = section->mr; > > > > - bql_lock(); > > + if (!locked) { > > + bql_lock(); > > + } > > ret = int_ld_mmio_beN(cpu, full, ret_be, addr, size, mmu_idx, > > type, ra, mr, mr_offset); > > - bql_unlock(); > > + if (!locked) { > > + bql_unlock(); > > + } > > > > return ret; > > } > > Can we do this consistently across all four functions > do_ld_mmio_beN, do_ld16_mmio_beN, do_st_mmio_leN, > do_st16_mmio_leN, please ? It happens that your workload > only needs to do an 8-byte load but conceptually the same > thing applies in all these cases. Sure, Jonathan > > thanks > -- PMM ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory. 2024-02-15 15:01 ` [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory Jonathan Cameron via 2024-02-15 15:33 ` Philippe Mathieu-Daudé 2024-02-15 16:11 ` Peter Maydell @ 2024-02-15 19:30 ` Richard Henderson 2024-02-19 12:14 ` Jonathan Cameron via 2 siblings, 1 reply; 18+ messages in thread From: Richard Henderson @ 2024-02-15 19:30 UTC (permalink / raw) To: Jonathan Cameron, qemu-devel, Peter Maydell, Gregory Price, Alex Bennée, Sajjan Rao, Dimitrios Palyvos, Paolo Bonzini, Eduardo Habkost Cc: linux-cxl On 2/15/24 05:01, Jonathan Cameron wrote: > On i386, after fixing the page walking code to work with pages in > MMIO memory (specifically CXL emulated interleaved memory), > a crash was seen in an interrupt handling path. > > Useful part of bt > > Peter identified this as being due to the BQL already being > held when the page table walker encounters MMIO memory and attempts > to take the lock again. There are other examples of similar paths > TCG, so this follows the approach taken in those of simply checking > if the lock is already held and if it is, don't take it again. > > Suggested-by: Peter Maydell <peter.maydell@linaro.org> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > --- > accel/tcg/cputlb.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c > index 047cd2cc0a..3b8d178707 100644 > --- a/accel/tcg/cputlb.c > +++ b/accel/tcg/cputlb.c > @@ -2019,6 +2019,7 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full, > int mmu_idx, MMUAccessType type, uintptr_t ra) > { > MemoryRegionSection *section; > + bool locked = bql_locked(); > MemoryRegion *mr; > hwaddr mr_offset; > MemTxAttrs attrs; > @@ -2030,10 +2031,14 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full, > section = io_prepare(&mr_offset, cpu, full->xlat_section, attrs, addr, ra); > mr = section->mr; > > - bql_lock(); > + if (!locked) { > + bql_lock(); > + } > ret = int_ld_mmio_beN(cpu, full, ret_be, addr, size, mmu_idx, > type, ra, mr, mr_offset); > - bql_unlock(); > + if (!locked) { > + bql_unlock(); > + } On top of other comments, I'm never keen on this type of test/lock/test/unlock. When this kind of thing is encountered, it means we should have been using a recursive lock in the first place. r~ ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory. 2024-02-15 19:30 ` Richard Henderson @ 2024-02-19 12:14 ` Jonathan Cameron via 2024-02-20 11:56 ` Alex Bennée 0 siblings, 1 reply; 18+ messages in thread From: Jonathan Cameron via @ 2024-02-19 12:14 UTC (permalink / raw) To: Richard Henderson Cc: qemu-devel, Peter Maydell, Gregory Price, Alex Bennée, Sajjan Rao, Dimitrios Palyvos, Paolo Bonzini, Eduardo Habkost, linux-cxl On Thu, 15 Feb 2024 09:30:27 -1000 Richard Henderson <richard.henderson@linaro.org> wrote: > On 2/15/24 05:01, Jonathan Cameron wrote: > > On i386, after fixing the page walking code to work with pages in > > MMIO memory (specifically CXL emulated interleaved memory), > > a crash was seen in an interrupt handling path. > > > > Useful part of bt > > > > Peter identified this as being due to the BQL already being > > held when the page table walker encounters MMIO memory and attempts > > to take the lock again. There are other examples of similar paths > > TCG, so this follows the approach taken in those of simply checking > > if the lock is already held and if it is, don't take it again. > > > > Suggested-by: Peter Maydell <peter.maydell@linaro.org> > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > --- > > accel/tcg/cputlb.c | 9 +++++++-- > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c > > index 047cd2cc0a..3b8d178707 100644 > > --- a/accel/tcg/cputlb.c > > +++ b/accel/tcg/cputlb.c > > @@ -2019,6 +2019,7 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full, > > int mmu_idx, MMUAccessType type, uintptr_t ra) > > { > > MemoryRegionSection *section; > > + bool locked = bql_locked(); > > MemoryRegion *mr; > > hwaddr mr_offset; > > MemTxAttrs attrs; > > @@ -2030,10 +2031,14 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full, > > section = io_prepare(&mr_offset, cpu, full->xlat_section, attrs, addr, ra); > > mr = section->mr; > > > > - bql_lock(); > > + if (!locked) { > > + bql_lock(); > > + } > > ret = int_ld_mmio_beN(cpu, full, ret_be, addr, size, mmu_idx, > > type, ra, mr, mr_offset); > > - bql_unlock(); > > + if (!locked) { > > + bql_unlock(); > > + } > > On top of other comments, I'm never keen on this type of test/lock/test/unlock. When this > kind of thing is encountered, it means we should have been using a recursive lock in the > first place. Hi Richard, Whilst I agree this stuff is really ugly, is it practical to fix it for this case? Or was intent here to make a general comment on QEMU locking? Jonathan > > > r~ ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory. 2024-02-19 12:14 ` Jonathan Cameron via @ 2024-02-20 11:56 ` Alex Bennée 0 siblings, 0 replies; 18+ messages in thread From: Alex Bennée @ 2024-02-20 11:56 UTC (permalink / raw) To: Jonathan Cameron Cc: Richard Henderson, qemu-devel, Peter Maydell, Gregory Price, Sajjan Rao, Dimitrios Palyvos, Paolo Bonzini, Eduardo Habkost, linux-cxl Jonathan Cameron <Jonathan.Cameron@Huawei.com> writes: > On Thu, 15 Feb 2024 09:30:27 -1000 > Richard Henderson <richard.henderson@linaro.org> wrote: > >> On 2/15/24 05:01, Jonathan Cameron wrote: >> > On i386, after fixing the page walking code to work with pages in >> > MMIO memory (specifically CXL emulated interleaved memory), >> > a crash was seen in an interrupt handling path. >> > >> > Useful part of bt >> > >> > Peter identified this as being due to the BQL already being >> > held when the page table walker encounters MMIO memory and attempts >> > to take the lock again. There are other examples of similar paths >> > TCG, so this follows the approach taken in those of simply checking >> > if the lock is already held and if it is, don't take it again. >> > >> > Suggested-by: Peter Maydell <peter.maydell@linaro.org> >> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> >> > --- >> > accel/tcg/cputlb.c | 9 +++++++-- >> > 1 file changed, 7 insertions(+), 2 deletions(-) >> > >> > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c >> > index 047cd2cc0a..3b8d178707 100644 >> > --- a/accel/tcg/cputlb.c >> > +++ b/accel/tcg/cputlb.c >> > @@ -2019,6 +2019,7 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full, >> > int mmu_idx, MMUAccessType type, uintptr_t ra) >> > { >> > MemoryRegionSection *section; >> > + bool locked = bql_locked(); >> > MemoryRegion *mr; >> > hwaddr mr_offset; >> > MemTxAttrs attrs; >> > @@ -2030,10 +2031,14 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full, >> > section = io_prepare(&mr_offset, cpu, full->xlat_section, attrs, addr, ra); >> > mr = section->mr; >> > >> > - bql_lock(); >> > + if (!locked) { >> > + bql_lock(); >> > + } >> > ret = int_ld_mmio_beN(cpu, full, ret_be, addr, size, mmu_idx, >> > type, ra, mr, mr_offset); >> > - bql_unlock(); >> > + if (!locked) { >> > + bql_unlock(); >> > + } >> >> On top of other comments, I'm never keen on this type of test/lock/test/unlock. When this >> kind of thing is encountered, it means we should have been using a recursive lock in the >> first place. > > Hi Richard, > > Whilst I agree this stuff is really ugly, is it practical to fix it > for this case? You can use: BQL_LOCK_GUARD(); which does all the recursive checking and clean-up and for free also ensures you don't miss an unlock leg. > Or was intent here to make a general comment on QEMU locking? > > Jonathan > > >> >> >> r~ -- Alex Bennée Virtualisation Tech Lead @ Linaro ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-02-20 11:56 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-02-15 15:01 [PATCH 0/3 qemu] tcg/i386: Page tables in MMIO memory fixes (CXL) Jonathan Cameron via 2024-02-15 15:01 ` [PATCH 1/3] accel/tcg: Set can_do_io at at start of lookup_tb_ptr helper Jonathan Cameron via 2024-02-15 15:11 ` Peter Maydell 2024-02-15 16:04 ` Jonathan Cameron via 2024-02-15 19:11 ` Richard Henderson 2024-02-15 15:01 ` [PATCH 2/3] target/i386: Enable page walking from MMIO memory Jonathan Cameron via 2024-02-15 15:31 ` Philippe Mathieu-Daudé 2024-02-15 15:56 ` Jonathan Cameron via 2024-02-15 19:21 ` Richard Henderson 2024-02-15 19:31 ` Richard Henderson 2024-02-15 15:01 ` [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory Jonathan Cameron via 2024-02-15 15:33 ` Philippe Mathieu-Daudé 2024-02-15 16:11 ` Jonathan Cameron via 2024-02-15 16:11 ` Peter Maydell 2024-02-15 17:34 ` Jonathan Cameron via 2024-02-15 19:30 ` Richard Henderson 2024-02-19 12:14 ` Jonathan Cameron via 2024-02-20 11:56 ` Alex Bennée
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).