* [PATCH v3 0/1] target/i386: Fix page walking from MMIO memory.
From: Jonathan Cameron via @ 2024-03-07 15:53 UTC
To: Paolo Bonzini, Eduardo Habkost, qemu-devel, richard.henderson
Cc: Peter Maydell, Gregory Price, Alex Bennée, linuxarm

Previously: tcg/i386: Page tables in MMIO memory fixes (CXL)
Richard Henderson picked up patches 1 and 3 which were architecture independent
leaving just this x86 specific patch.

No change to the patch. Resending because it's hard to spot individual
unapplied patches in a larger series.

Original cover letter (edited).

CXL memory is interleaved at granularities as fine as 64 bytes. To emulate
this, each read and write access undergoes address translation similar to
that used in physical hardware. This is done using cfmws_ops for a memory
region per CXL Fixed Memory Window (the PA address range in the host that
is interleaved across host bridges and beyond). The OS programs interleaved
decoders in the CXL Root Bridges, switch upstream ports and the
corresponding decoders in CXL type 3 devices, which have to know the
Host PA to Device PA mappings.

Unfortunately this CXL memory may be used as normal memory and anything
that can end up in RAM can be placed within it. As Linux has become more
capable of handling this memory we've started to get quite a few bug
reports for the QEMU support. However terrible the performance is, people
seem to like running actual software stacks on it :(

This doesn't work for KVM, so for now CXL emulation remains TCG only
(unless you are very careful about how it is used!). I plan to add some
safety guards at a later date to make it slightly harder for people to
shoot themselves in the foot, plus a more limited set of CXL functionality
that is safe (no interleaving!).

Previously we had some issues with TCG reading instructions from CXL
memory but that is now all working. This time the issues are around
the Page Tables being in the CXL memory + DMA buffers being placed in it.

The test setup I've been using is a simple 2-way interleave via 2 root
ports below a single CXL root complex. After configuration in Linux
these are mapped to their own NUMA node and
  numactl --membind=1 ls
followed by powering down the machine is sufficient to hit all the bugs
addressed in this series.

Thanks to Gregory, Peter and Alex for their help figuring this lot
out.

Whilst the thread started back at:
https://lore.kernel.org/all/CAAg4PaqsGZvkDk_=PH+Oz-yeEUVcVsrumncAgegRKuxe_YoFhA@mail.gmail.com/
the QEMU part is from:
https://lore.kernel.org/all/20240201130438.00001384@Huawei.com/

Gregory Price (1):
  target/i386: Enable page walking from MMIO memory

 target/i386/tcg/sysemu/excp_helper.c | 57 +++++++++++++++-------------
 1 file changed, 30 insertions(+), 27 deletions(-)

-- 
2.39.2

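The address translation described in the cover letter can be illustrated with a minimal sketch. It assumes plain modulo interleaving across equally sized targets; the names `InterleaveWindow`, `DecodedAddr` and `interleave_decode` are hypothetical and are not taken from QEMU's cfmws_ops code or from the CXL specification's decoder definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one interleaved window (illustrative only). */
typedef struct {
    uint64_t base;        /* host physical address where the window starts */
    uint64_t granularity; /* interleave granularity in bytes, e.g. 64 */
    unsigned ways;        /* number of interleave targets */
} InterleaveWindow;

typedef struct {
    unsigned target;      /* index of the target the access is routed to */
    uint64_t dev_offset;  /* offset within that target's share of the window */
} DecodedAddr;

/*
 * Assumed scheme: consecutive granularity-sized chunks rotate across the
 * targets, so chunk N lands on target N % ways.
 */
static DecodedAddr interleave_decode(const InterleaveWindow *w, uint64_t hpa)
{
    assert(w->granularity != 0 && w->ways != 0 && hpa >= w->base);

    uint64_t offset = hpa - w->base;          /* offset into the window */
    uint64_t chunk = offset / w->granularity; /* which chunk is accessed */
    DecodedAddr d;

    d.target = (unsigned)(chunk % w->ways);
    /* Chunks owned by one target are packed back to back on that target. */
    d.dev_offset = (chunk / w->ways) * w->granularity
                   + offset % w->granularity;
    return d;
}
```

With a 64-byte granularity, two neighbouring 8-byte page table entries can end up on different targets, which is why a direct host pointer to the backing memory is not available and every access has to go through read/write hooks.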
* [PATCH v3 1/1] target/i386: Enable page walking from MMIO memory
From: Jonathan Cameron via @ 2024-03-07 15:53 UTC
To: Paolo Bonzini, Eduardo Habkost, qemu-devel, richard.henderson
Cc: Peter Maydell, Gregory Price, Alex Bennée, linuxarm

From: Gregory Price <gregory.price@memverge.com>

CXL emulation of interleave requires read and write hooks due to the
requirement for subpage granularity. The Linux kernel stack now enables
using this memory as conventional memory in a separate NUMA node. If a
process is deliberately forced to run from that node
$ numactl --membind=1 ls
the page table walk on i386 fails.

Useful part of backtrace:

    (cpu=cpu@entry=0x555556fd9000, fmt=fmt@entry=0x555555fe3378 "cpu_io_recompile: could not find TB for pc=%p")
    at ../../cpu-target.c:359
    (retaddr=0, addr=19595792376, attrs=..., xlat=<optimized out>, cpu=0x555556fd9000, out_offset=<synthetic pointer>)
    at ../../accel/tcg/cputlb.c:1339
    (cpu=0x555556fd9000, full=0x7fffee0d96e0, ret_be=ret_be@entry=0, addr=19595792376, size=size@entry=8, mmu_idx=4, type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2030
    (cpu=cpu@entry=0x555556fd9000, p=p@entry=0x7ffff56fddc0, mmu_idx=<optimized out>, type=type@entry=MMU_DATA_LOAD, memop=<optimized out>, ra=ra@entry=0) at ../../accel/tcg/cputlb.c:2356
    (cpu=cpu@entry=0x555556fd9000, addr=addr@entry=19595792376, oi=oi@entry=52, ra=ra@entry=0, access_type=access_type@entry=MMU_DATA_LOAD) at ../../accel/tcg/cputlb.c:2439
    at ../../accel/tcg/ldst_common.c.inc:301
    at ../../target/i386/tcg/sysemu/excp_helper.c:173
    (err=0x7ffff56fdf80, out=0x7ffff56fdf70, mmu_idx=0, access_type=MMU_INST_FETCH, addr=18446744072116178925, env=0x555556fdb7c0)
    at ../../target/i386/tcg/sysemu/excp_helper.c:578
    (cs=0x555556fd9000, addr=18446744072116178925, size=<optimized out>, access_type=MMU_INST_FETCH, mmu_idx=0, probe=<optimized out>, retaddr=0) at ../../target/i386/tcg/sysemu/excp_helper.c:604

Avoid this by plumbing the address all the way down from
x86_cpu_tlb_fill() where it is available as retaddr to the actual accessors
which provide it to probe_access_full() which already handles MMIO accesses.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
v3: No change.

 target/i386/tcg/sysemu/excp_helper.c | 57 +++++++++++++++-------------
 1 file changed, 30 insertions(+), 27 deletions(-)

diff --git a/target/i386/tcg/sysemu/excp_helper.c b/target/i386/tcg/sysemu/excp_helper.c
index 8f7011d966..7a57b7dd10 100644
--- a/target/i386/tcg/sysemu/excp_helper.c
+++ b/target/i386/tcg/sysemu/excp_helper.c
@@ -59,14 +59,14 @@ typedef struct PTETranslate {
     hwaddr gaddr;
 } PTETranslate;
 
-static bool ptw_translate(PTETranslate *inout, hwaddr addr)
+static bool ptw_translate(PTETranslate *inout, hwaddr addr, uint64_t ra)
 {
     CPUTLBEntryFull *full;
     int flags;
 
     inout->gaddr = addr;
     flags = probe_access_full(inout->env, addr, 0, MMU_DATA_STORE,
-                              inout->ptw_idx, true, &inout->haddr, &full, 0);
+                              inout->ptw_idx, true, &inout->haddr, &full, ra);
 
     if (unlikely(flags & TLB_INVALID_MASK)) {
         TranslateFault *err = inout->err;
@@ -82,20 +82,20 @@ static bool ptw_translate(PTETranslate *inout, hwaddr addr)
     return true;
 }
 
-static inline uint32_t ptw_ldl(const PTETranslate *in)
+static inline uint32_t ptw_ldl(const PTETranslate *in, uint64_t ra)
 {
     if (likely(in->haddr)) {
         return ldl_p(in->haddr);
     }
-    return cpu_ldl_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, 0);
+    return cpu_ldl_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, ra);
 }
 
-static inline uint64_t ptw_ldq(const PTETranslate *in)
+static inline uint64_t ptw_ldq(const PTETranslate *in, uint64_t ra)
 {
     if (likely(in->haddr)) {
         return ldq_p(in->haddr);
     }
-    return cpu_ldq_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, 0);
+    return cpu_ldq_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, ra);
 }
 
 /*
@@ -132,7 +132,8 @@ static inline bool ptw_setl(const PTETranslate *in, uint32_t old, uint32_t set)
 }
 
 static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
-                          TranslateResult *out, TranslateFault *err)
+                          TranslateResult *out, TranslateFault *err,
+                          uint64_t ra)
 {
     const target_ulong addr = in->addr;
     const int pg_mode = in->pg_mode;
@@ -164,11 +165,11 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
                  * Page table level 5
                  */
                 pte_addr = (in->cr3 & ~0xfff) + (((addr >> 48) & 0x1ff) << 3);
-                if (!ptw_translate(&pte_trans, pte_addr)) {
+                if (!ptw_translate(&pte_trans, pte_addr, ra)) {
                     return false;
                 }
             restart_5:
-                pte = ptw_ldq(&pte_trans);
+                pte = ptw_ldq(&pte_trans, ra);
                 if (!(pte & PG_PRESENT_MASK)) {
                     goto do_fault;
                 }
@@ -188,11 +189,11 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
              * Page table level 4
              */
             pte_addr = (pte & PG_ADDRESS_MASK) + (((addr >> 39) & 0x1ff) << 3);
-            if (!ptw_translate(&pte_trans, pte_addr)) {
+            if (!ptw_translate(&pte_trans, pte_addr, ra)) {
                 return false;
             }
         restart_4:
-            pte = ptw_ldq(&pte_trans);
+            pte = ptw_ldq(&pte_trans, ra);
             if (!(pte & PG_PRESENT_MASK)) {
                 goto do_fault;
             }
@@ -208,11 +209,11 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
              * Page table level 3
              */
             pte_addr = (pte & PG_ADDRESS_MASK) + (((addr >> 30) & 0x1ff) << 3);
-            if (!ptw_translate(&pte_trans, pte_addr)) {
+            if (!ptw_translate(&pte_trans, pte_addr, ra)) {
                 return false;
             }
         restart_3_lma:
-            pte = ptw_ldq(&pte_trans);
+            pte = ptw_ldq(&pte_trans, ra);
             if (!(pte & PG_PRESENT_MASK)) {
                 goto do_fault;
             }
@@ -235,12 +236,12 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
              * Page table level 3
              */
             pte_addr = (in->cr3 & 0xffffffe0ULL) + ((addr >> 27) & 0x18);
-            if (!ptw_translate(&pte_trans, pte_addr)) {
+            if (!ptw_translate(&pte_trans, pte_addr, ra)) {
                 return false;
             }
             rsvd_mask |= PG_HI_USER_MASK;
         restart_3_nolma:
-            pte = ptw_ldq(&pte_trans);
+            pte = ptw_ldq(&pte_trans, ra);
             if (!(pte & PG_PRESENT_MASK)) {
                 goto do_fault;
             }
@@ -257,11 +258,11 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
          * Page table level 2
          */
         pte_addr = (pte & PG_ADDRESS_MASK) + (((addr >> 21) & 0x1ff) << 3);
-        if (!ptw_translate(&pte_trans, pte_addr)) {
+        if (!ptw_translate(&pte_trans, pte_addr, ra)) {
             return false;
         }
     restart_2_pae:
-        pte = ptw_ldq(&pte_trans);
+        pte = ptw_ldq(&pte_trans, ra);
         if (!(pte & PG_PRESENT_MASK)) {
             goto do_fault;
         }
@@ -283,10 +284,10 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
          * Page table level 1
          */
         pte_addr = (pte & PG_ADDRESS_MASK) + (((addr >> 12) & 0x1ff) << 3);
-        if (!ptw_translate(&pte_trans, pte_addr)) {
+        if (!ptw_translate(&pte_trans, pte_addr, ra)) {
             return false;
         }
-        pte = ptw_ldq(&pte_trans);
+        pte = ptw_ldq(&pte_trans, ra);
         if (!(pte & PG_PRESENT_MASK)) {
             goto do_fault;
         }
@@ -301,11 +302,11 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
          * Page table level 2
          */
         pte_addr = (in->cr3 & 0xfffff000ULL) + ((addr >> 20) & 0xffc);
-        if (!ptw_translate(&pte_trans, pte_addr)) {
+        if (!ptw_translate(&pte_trans, pte_addr, ra)) {
             return false;
         }
     restart_2_nopae:
-        pte = ptw_ldl(&pte_trans);
+        pte = ptw_ldl(&pte_trans, ra);
         if (!(pte & PG_PRESENT_MASK)) {
             goto do_fault;
         }
@@ -330,10 +331,10 @@ static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
          * Page table level 1
          */
         pte_addr = (pte & ~0xfffu) + ((addr >> 10) & 0xffc);
-        if (!ptw_translate(&pte_trans, pte_addr)) {
+        if (!ptw_translate(&pte_trans, pte_addr, ra)) {
             return false;
         }
-        pte = ptw_ldl(&pte_trans);
+        pte = ptw_ldl(&pte_trans, ra);
         if (!(pte & PG_PRESENT_MASK)) {
             goto do_fault;
         }
@@ -526,7 +527,8 @@ static G_NORETURN void raise_stage2(CPUX86State *env, TranslateFault *err,
 
 static bool get_physical_address(CPUX86State *env, vaddr addr,
                                  MMUAccessType access_type, int mmu_idx,
-                                 TranslateResult *out, TranslateFault *err)
+                                 TranslateResult *out, TranslateFault *err,
+                                 uint64_t ra)
 {
     TranslateParams in;
     bool use_stage2 = env->hflags2 & HF2_NPT_MASK;
@@ -546,7 +548,7 @@ static bool get_physical_address(CPUX86State *env, vaddr addr,
                 env->nested_pg_mode & PG_MODE_LMA ? MMU_USER64_IDX : MMU_USER32_IDX;
             in.ptw_idx = MMU_PHYS_IDX;
 
-            if (!mmu_translate(env, &in, out, err)) {
+            if (!mmu_translate(env, &in, out, err, ra)) {
                 err->stage2 = S2_GPA;
                 return false;
             }
@@ -577,7 +579,7 @@ static bool get_physical_address(CPUX86State *env, vaddr addr,
                     return false;
                 }
             }
-            return mmu_translate(env, &in, out, err);
+            return mmu_translate(env, &in, out, err, ra);
         }
         break;
     }
@@ -597,7 +599,8 @@ bool x86_cpu_tlb_fill(CPUState *cs, vaddr addr, int size,
     TranslateResult out;
     TranslateFault err;
 
-    if (get_physical_address(env, addr, access_type, mmu_idx, &out, &err)) {
+    if (get_physical_address(env, addr, access_type, mmu_idx, &out, &err,
+                             retaddr)) {
         /*
          * Even if 4MB pages, we map only one 4KB page in the cache to
          * avoid filling it too fast.
-- 
2.39.2

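The shape of the fix, passing the return address from the TLB fill entry point down to the page table entry accessors, can be sketched with a toy model. This is not QEMU code: every `toy_*` name and `ToyPte` is hypothetical, `uintptr_t` is used where the patch uses `uint64_t`, and the rule that the slow path fails when `ra` is 0 is a simplification of the `cpu_io_recompile` failure seen in the backtrace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the location of one page table entry. */
typedef struct {
    const uint64_t *haddr; /* direct host pointer, NULL when MMIO-backed */
    uint64_t mmio_value;   /* value the toy "device" returns on a read */
} ToyPte;

/* Records the return address the slow path saw (for illustration). */
static uintptr_t last_slow_path_ra;

/* Slow path: models an accessor that needs a valid return address. */
static bool toy_mmio_load(const ToyPte *p, uintptr_t ra, uint64_t *val)
{
    last_slow_path_ra = ra;
    if (ra == 0) {
        return false; /* assumed: caller cannot be identified without ra */
    }
    *val = p->mmio_value;
    return true;
}

/* Loads one entry: direct pointer when available, slow path otherwise. */
static bool toy_ptw_ldq(const ToyPte *p, uintptr_t ra, uint64_t *val)
{
    if (p->haddr) {
        *val = *p->haddr; /* ordinary RAM: ra is not consulted */
        return true;
    }
    return toy_mmio_load(p, ra, val);
}

/* Walks every level, handing the same ra to each entry load. */
static bool toy_translate(const ToyPte *levels, int nlevels, uintptr_t ra,
                          uint64_t *leaf)
{
    assert(nlevels > 0);
    for (int i = 0; i < nlevels; i++) {
        if (!toy_ptw_ldq(&levels[i], ra, leaf)) {
            return false;
        }
    }
    return true;
}

/* Entry point: the only layer that knows retaddr, so it passes it down. */
static bool toy_tlb_fill(const ToyPte *levels, int nlevels,
                         uintptr_t retaddr, uint64_t *leaf)
{
    return toy_translate(levels, nlevels, retaddr, leaf);
}
```

In this model a walk over RAM-backed entries succeeds with any `ra`, while a walk that touches an MMIO-backed entry succeeds only when the caller's `retaddr` has been threaded through every layer, which mirrors why the patch adds an `ra` parameter to each function between `x86_cpu_tlb_fill()` and the loads.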
* Re: [PATCH v3 1/1] target/i386: Enable page walking from MMIO memory
From: Richard Henderson @ 2024-03-13 19:30 UTC
To: Jonathan Cameron, Paolo Bonzini, Eduardo Habkost, qemu-devel
Cc: Peter Maydell, Gregory Price, Alex Bennée, linuxarm

On 3/7/24 05:53, Jonathan Cameron wrote:
> From: Gregory Price <gregory.price@memverge.com>
>
> CXL emulation of interleave requires read and write hooks due to
> requirement for subpage granularity. The Linux kernel stack now enables
> using this memory as conventional memory in a separate NUMA node. If a
> process is deliberately forced to run from that node
> $ numactl --membind=1 ls
> the page table walk on i386 fails.
>
> Useful part of backtrace:
>
> (cpu=cpu@entry=0x555556fd9000, fmt=fmt@entry=0x555555fe3378 "cpu_io_recompile: could not find TB for pc=%p")
> at ../../cpu-target.c:359
> (retaddr=0, addr=19595792376, attrs=..., xlat=<optimized out>, cpu=0x555556fd9000, out_offset=<synthetic pointer>)
> at ../../accel/tcg/cputlb.c:1339
> (cpu=0x555556fd9000, full=0x7fffee0d96e0, ret_be=ret_be@entry=0, addr=19595792376, size=size@entry=8, mmu_idx=4, type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2030
> (cpu=cpu@entry=0x555556fd9000, p=p@entry=0x7ffff56fddc0, mmu_idx=<optimized out>, type=type@entry=MMU_DATA_LOAD, memop=<optimized out>, ra=ra@entry=0) at ../../accel/tcg/cputlb.c:2356
> (cpu=cpu@entry=0x555556fd9000, addr=addr@entry=19595792376, oi=oi@entry=52, ra=ra@entry=0, access_type=access_type@entry=MMU_DATA_LOAD) at ../../accel/tcg/cputlb.c:2439
> at ../../accel/tcg/ldst_common.c.inc:301
> at ../../target/i386/tcg/sysemu/excp_helper.c:173
> (err=0x7ffff56fdf80, out=0x7ffff56fdf70, mmu_idx=0, access_type=MMU_INST_FETCH, addr=18446744072116178925, env=0x555556fdb7c0)
> at ../../target/i386/tcg/sysemu/excp_helper.c:578
> (cs=0x555556fd9000, addr=18446744072116178925, size=<optimized out>, access_type=MMU_INST_FETCH, mmu_idx=0, probe=<optimized out>, retaddr=0) at ../../target/i386/tcg/sysemu/excp_helper.c:604
>
> Avoid this by plumbing the address all the way down from
> x86_cpu_tlb_fill() where is available as retaddr to the actual accessors
> which provide it to probe_access_full() which already handles MMIO accesses.
>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Gregory Price <gregory.price@memverge.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> v3: No change.

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2180
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2220

r~

* Re: [PATCH v3 1/1] target/i386: Enable page walking from MMIO memory
From: Richard Henderson @ 2024-03-21 2:34 UTC
To: Jonathan Cameron, Paolo Bonzini, Eduardo Habkost, qemu-devel
Cc: Peter Maydell, Gregory Price, Alex Bennée, linuxarm

Paolo, ping!

On 3/13/24 09:30, Richard Henderson wrote:
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2180
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2220
>
> r~

* Re: [PATCH v3 0/1] target/i386: Fix page walking from MMIO memory.
From: Philippe Mathieu-Daudé @ 2024-03-26 13:24 UTC
To: Jonathan Cameron, Paolo Bonzini, Eduardo Habkost, qemu-devel, richard.henderson
Cc: Peter Maydell, Gregory Price, Alex Bennée, linuxarm

On 7/3/24 16:53, Jonathan Cameron via wrote:
> Previously: tcg/i386: Page tables in MMIO memory fixes (CXL)
> Richard Henderson picked up patches 1 and 3 which were architecture independent
> leaving just this x86 specific patch.
>
> No change to the patch. Resending because it's hard to spot individual
> unapplied patches in a larger series.

Thanks, patch queued!