* [PATCH 0/2] x86/pvh: workaround missing MMIO regions in dom0 p2m @ 2025-02-14 9:29 Roger Pau Monne 2025-02-14 9:29 ` [PATCH 1/2] x86/emul: dump unhandled memory accesses for PVH dom0 Roger Pau Monne 2025-02-14 9:29 ` [PATCH 2/2] x86/dom0: attempt to fixup p2m page-faults " Roger Pau Monne 0 siblings, 2 replies; 14+ messages in thread From: Roger Pau Monne @ 2025-02-14 9:29 UTC (permalink / raw) To: xen-devel Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Oleksii Kurochko, Community Manager, Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini Hello, The aim of this series is to provide a workaround for better handling missing MMIO regions in a PVH dom0 p2m. Xen doesn't know the complete host memory layout, as it's not able to parse any information from dynamic ACPI tables. Hence the p2m built for a PVH dom0 might be missing some MMIO regions only described in ACPI. Since a PVH dom0 has no way to request the mapping of such regions (and adding one would also require dom0 kernel modifications) instead provide an option for Xen to add those MMIO regions as part of handling p2m page-faults. The option is currently off by default. Thanks, Roger. Roger Pau Monne (2): x86/emul: dump unhandled memory accesses for PVH dom0 x86/dom0: attempt to fixup p2m page-faults for PVH dom0 CHANGELOG.md | 10 +++++++++ docs/misc/xen-command-line.pandoc | 16 ++++++++++++++- xen/arch/x86/dom0_build.c | 4 ++++ xen/arch/x86/hvm/emulate.c | 33 ++++++++++++++++++++++++++++++ xen/arch/x86/hvm/hvm.c | 31 ++++++++++++++++++++++++++++ xen/arch/x86/include/asm/hvm/hvm.h | 5 +++++ 6 files changed, 98 insertions(+), 1 deletion(-) -- 2.46.0 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/2] x86/emul: dump unhandled memory accesses for PVH dom0 2025-02-14 9:29 [PATCH 0/2] x86/pvh: workaround missing MMIO regions in dom0 p2m Roger Pau Monne @ 2025-02-14 9:29 ` Roger Pau Monne 2025-02-14 11:23 ` Jan Beulich 2025-02-14 9:29 ` [PATCH 2/2] x86/dom0: attempt to fixup p2m page-faults " Roger Pau Monne 1 sibling, 1 reply; 14+ messages in thread From: Roger Pau Monne @ 2025-02-14 9:29 UTC (permalink / raw) To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper A PV dom0 can map any host memory as long as it's allowed by the IO capability range in d->iomem_caps. On the other hand, a PVH dom0 has no way to populate MMIO region onto it's p2m, so it's limited to what Xen initially populates on the p2m based on the host memory map and the enabled device BARs. Introduce a new debug build only printk that reports attempts by dom0 to access addresses not populated on the p2m, and not handled by any emulator. This is for information purposes only, but might allow getting an idea of what MMIO ranges might be missing on the p2m. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- xen/arch/x86/hvm/emulate.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index 0d90cc4598be..8aa7e49c056c 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -337,6 +337,9 @@ static int hvmemul_do_io( /* If there is no suitable backing DM, just ignore accesses */ if ( !s ) { + if ( is_mmio && is_hardware_domain(currd) ) + gdprintk(XENLOG_DEBUG, "unhandled memory %s to %#lx size %u\n", + dir ? "read" : "write", addr, size); rc = hvm_process_io_intercept(&null_handler, &p); vio->req.state = STATE_IOREQ_NONE; } -- 2.46.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] x86/emul: dump unhandled memory accesses for PVH dom0 2025-02-14 9:29 ` [PATCH 1/2] x86/emul: dump unhandled memory accesses for PVH dom0 Roger Pau Monne @ 2025-02-14 11:23 ` Jan Beulich 0 siblings, 0 replies; 14+ messages in thread From: Jan Beulich @ 2025-02-14 11:23 UTC (permalink / raw) To: Roger Pau Monne; +Cc: Andrew Cooper, xen-devel On 14.02.2025 10:29, Roger Pau Monne wrote: > A PV dom0 can map any host memory as long as it's allowed by the IO > capability range in d->iomem_caps. On the other hand, a PVH dom0 has no > way to populate MMIO region onto it's p2m, so it's limited to what Xen > initially populates on the p2m based on the host memory map and the enabled > device BARs. > > Introduce a new debug build only printk that reports attempts by dom0 to > access addresses not populated on the p2m, and not handled by any emulator. > This is for information purposes only, but might allow getting an idea of > what MMIO ranges might be missing on the p2m. > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Hmm, yes, why not: Acked-by: Jan Beulich <jbeulich@suse.com> with one suggestion: > --- a/xen/arch/x86/hvm/emulate.c > +++ b/xen/arch/x86/hvm/emulate.c > @@ -337,6 +337,9 @@ static int hvmemul_do_io( > /* If there is no suitable backing DM, just ignore accesses */ > if ( !s ) > { > + if ( is_mmio && is_hardware_domain(currd) ) > + gdprintk(XENLOG_DEBUG, "unhandled memory %s to %#lx size %u\n", > + dir ? "read" : "write", addr, size); Can we make it "read from" and "write to"? Jan ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/2] x86/dom0: attempt to fixup p2m page-faults for PVH dom0 2025-02-14 9:29 [PATCH 0/2] x86/pvh: workaround missing MMIO regions in dom0 p2m Roger Pau Monne 2025-02-14 9:29 ` [PATCH 1/2] x86/emul: dump unhandled memory accesses for PVH dom0 Roger Pau Monne @ 2025-02-14 9:29 ` Roger Pau Monne 2025-02-14 11:53 ` Jan Beulich 2025-02-14 14:42 ` Andrew Cooper 1 sibling, 2 replies; 14+ messages in thread From: Roger Pau Monne @ 2025-02-14 9:29 UTC (permalink / raw) To: xen-devel Cc: Roger Pau Monne, Oleksii Kurochko, Community Manager, Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich, Julien Grall, Stefano Stabellini When building a PVH dom0 Xen attempts to map all (relevant) MMIO regions into the p2m for dom0 access. However the information Xen has about the host memory map is limited. Xen doesn't have access to any resources described in ACPI dynamic tables, and hence the p2m mappings provided might not be complete. PV doesn't suffer from this issue because a PV dom0 is capable of mapping into it's page-tables any address not explicitly banned in d->iomem_caps. Introduce a new command line options that allows Xen to attempt to fixup the p2m page-faults, by creating p2m identity maps in response to p2m page-faults. This is aimed as a workaround to small ACPI regions Xen doesn't know about. Note that missing large MMIO regions mapped in this way will lead to slowness due to the VM exit processing, plus the mappings will always use small pages. The ultimate aim is to attempt to bring better parity with a classic PV dom0. Note such fixup rely on the CPU doing the access to the unpopulated address. If the access is attempted from a device instead there's no possible way to fixup, as IOMMU page-fault are asynchronous. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- Only slightly tested on my local PVH dom0 deployment. --- CHANGELOG.md | 10 +++++++++ docs/misc/xen-command-line.pandoc | 16 +++++++++++++- xen/arch/x86/dom0_build.c | 4 ++++ xen/arch/x86/hvm/emulate.c | 34 ++++++++++++++++++++++++++++-- xen/arch/x86/hvm/hvm.c | 31 +++++++++++++++++++++++++++ xen/arch/x86/include/asm/hvm/hvm.h | 5 +++++ 6 files changed, 97 insertions(+), 3 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 1de1d1eca17f..e5e6ab3a8902 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,16 @@ Notable changes to Xen will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) +## [4.21.0 UNRELEASED](https://xenbits.xenproject.org/gitweb/?p=xen.git;a=shortlog;h=staging) - TBD + +### Changed + +### Added + - On x86: + - Option to attempt to fixup p2m page-faults on PVH dom0. + +### Removed + ## [4.20.0 UNRELEASED](https://xenbits.xenproject.org/gitweb/?p=xen.git;a=shortlog;h=staging) - TBD ### Changed diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc index 9bbd00baef91..e9884de07e9e 100644 --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -822,7 +822,8 @@ Specify the bit width of the DMA heap. ### dom0 = List of [ pv | pvh, shadow=<bool>, verbose=<bool>, - cpuid-faulting=<bool>, msr-relaxed=<bool> ] (x86) + cpuid-faulting=<bool>, msr-relaxed=<bool>, + pf-fixup=<bool> ] (x86) = List of [ sve=<integer> ] (Arm64) @@ -883,6 +884,19 @@ Controls for how dom0 is constructed on x86 systems. If using this option is necessary to fix an issue, please report a bug. +* The `pf-fixup` boolean is only applicable when using a PVH dom0 and + defaults to false. + + When running dom0 in PVH mode the dom0 kernel has no way to map MMIO + regions into the p2m, such mode relies on Xen dom0 builder populating + the p2m with all MMIO regions that dom0 should access. However Xen + doesn't have a complete picture of the host memory map, due to not + being able to process ACPI dynamic tables. + + The `pf-fixup` option allows Xen to attempt to add missing MMIO regions + to the p2m in response to page-faults generated by dom0 trying to access + unpopulated entries in the p2m. + Enables features on dom0 on Arm systems. * The `sve` integer parameter enables Arm SVE usage for Dom0 and sets the diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c index d1b4ef83b2d0..34b6166f4922 100644 --- a/xen/arch/x86/dom0_build.c +++ b/xen/arch/x86/dom0_build.c @@ -286,6 +286,10 @@ int __init parse_arch_dom0_param(const char *s, const char *e) opt_dom0_cpuid_faulting = val; else if ( (val = parse_boolean("msr-relaxed", s, e)) >= 0 ) opt_dom0_msr_relaxed = val; +#ifdef CONFIG_HVM + else if ( (val = parse_boolean("pf-fixup", s, e)) >= 0 ) + opt_dom0_pf_fixup = val; +#endif else return -EINVAL; diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index 8aa7e49c056c..aa16ed0e9cac 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -338,8 +338,38 @@ static int hvmemul_do_io( if ( !s ) { if ( is_mmio && is_hardware_domain(currd) ) - gdprintk(XENLOG_DEBUG, "unhandled memory %s to %#lx size %u\n", - dir ? "read" : "write", addr, size); + { + /* + * PVH dom0 is likely missing MMIO mappings on the p2m, due to + * the incomplete information Xen has about the memory layout. + * + * Either print a message to note dom0 attempted to access an + * unpopulated GPA, or try to fixup the p2m by creating an + * identity mapping for the faulting GPA. + */ + if ( opt_dom0_pf_fixup ) + { + int inner_rc = hvm_hwdom_fixup_p2m(addr); + + if ( !inner_rc ) + { + gdprintk(XENLOG_DEBUG, + "fixup p2m mapping for page %lx added\n", + paddr_to_pfn(addr)); + rc = X86EMUL_RETRY; + vio->req.state = STATE_IOREQ_NONE; + break; + } + + gprintk(XENLOG_WARNING, + "unable to fixup memory %s to %#lx size %u: %d\n", + dir ? "read" : "write", addr, size, inner_rc); + } + else + gdprintk(XENLOG_DEBUG, + "unhandled memory %s to %#lx size %u\n", + dir ? "read" : "write", addr, size); + } rc = hvm_process_io_intercept(&null_handler, &p); vio->req.state = STATE_IOREQ_NONE; } diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 39e39ce4ce36..4505868f025c 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -13,6 +13,7 @@ #include <xen/lib.h> #include <xen/trace.h> #include <xen/sched.h> +#include <xen/iocap.h> #include <xen/irq.h> #include <xen/softirq.h> #include <xen/domain.h> @@ -5458,6 +5459,36 @@ int hvm_copy_context_and_params(struct domain *dst, struct domain *src) return rc; } +bool __ro_after_init opt_dom0_pf_fixup; +int hvm_hwdom_fixup_p2m(paddr_t addr) +{ + unsigned long gfn = paddr_to_pfn(addr); + struct domain *currd = current->domain; + p2m_type_t type; + mfn_t mfn; + int rc; + + ASSERT(is_hardware_domain(currd)); + ASSERT(!altp2m_active(currd)); + + /* + * Fixups are only applied for MMIO holes, and rely on the hardware domain + * having identity mappings for non RAM regions (gfn == mfn). + */ + if ( !iomem_access_permitted(currd, gfn, gfn) || + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) + return -EPERM; + + mfn = get_gfn(currd, gfn, &type); + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; + else + rc = set_mmio_p2m_entry(currd, _gfn(gfn), _mfn(gfn), 0); + put_gfn(currd, gfn); + + return rc; +} + /* * Local variables: * mode: C diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h index cad3a9427801..e084e1c7d665 100644 --- a/xen/arch/x86/include/asm/hvm/hvm.h +++ b/xen/arch/x86/include/asm/hvm/hvm.h @@ -772,6 +772,11 @@ static inline int hvm_vmtrace_reset(struct vcpu *v) return -EOPNOTSUPP; } +/* For PVH dom0: signal whether to attempt fixup of p2m page-faults. */ +extern bool opt_dom0_pf_fixup; +/* Attempt to fixup a p2m page-fault by adding an identity mapping entry. */ +int hvm_hwdom_fixup_p2m(paddr_t addr); + /* * Accessors for registers which have per-guest-type or per-vendor locations * (e.g. VMCS, msr load/save lists, VMCB, VMLOAD lazy, etc). -- 2.46.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86/dom0: attempt to fixup p2m page-faults for PVH dom0 2025-02-14 9:29 ` [PATCH 2/2] x86/dom0: attempt to fixup p2m page-faults " Roger Pau Monne @ 2025-02-14 11:53 ` Jan Beulich 2025-02-14 12:38 ` Roger Pau Monné 2025-02-14 14:42 ` Andrew Cooper 1 sibling, 1 reply; 14+ messages in thread From: Jan Beulich @ 2025-02-14 11:53 UTC (permalink / raw) To: Roger Pau Monne Cc: Oleksii Kurochko, Community Manager, Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini, xen-devel On 14.02.2025 10:29, Roger Pau Monne wrote: > --- a/docs/misc/xen-command-line.pandoc > +++ b/docs/misc/xen-command-line.pandoc > @@ -822,7 +822,8 @@ Specify the bit width of the DMA heap. > > ### dom0 > = List of [ pv | pvh, shadow=<bool>, verbose=<bool>, > - cpuid-faulting=<bool>, msr-relaxed=<bool> ] (x86) > + cpuid-faulting=<bool>, msr-relaxed=<bool>, > + pf-fixup=<bool> ] (x86) > > = List of [ sve=<integer> ] (Arm64) > > @@ -883,6 +884,19 @@ Controls for how dom0 is constructed on x86 systems. > > If using this option is necessary to fix an issue, please report a bug. > > +* The `pf-fixup` boolean is only applicable when using a PVH dom0 and > + defaults to false. > + > + When running dom0 in PVH mode the dom0 kernel has no way to map MMIO > + regions into the p2m, such mode relies on Xen dom0 builder populating > + the p2m with all MMIO regions that dom0 should access. However Xen > + doesn't have a complete picture of the host memory map, due to not > + being able to process ACPI dynamic tables. > + > + The `pf-fixup` option allows Xen to attempt to add missing MMIO regions > + to the p2m in response to page-faults generated by dom0 trying to access > + unpopulated entries in the p2m. I wonder if this is to implementation focused for a command line option doc. In particular the multiple uses of "p2m" are standing out in this regard. > --- a/xen/arch/x86/dom0_build.c > +++ b/xen/arch/x86/dom0_build.c > @@ -286,6 +286,10 @@ int __init parse_arch_dom0_param(const char *s, const char *e) > opt_dom0_cpuid_faulting = val; > else if ( (val = parse_boolean("msr-relaxed", s, e)) >= 0 ) > opt_dom0_msr_relaxed = val; > +#ifdef CONFIG_HVM > + else if ( (val = parse_boolean("pf-fixup", s, e)) >= 0 ) > + opt_dom0_pf_fixup = val; > +#endif > else > return -EINVAL; I fear the scope of these sub-options is getting increasingly confusing. opt_dom0_msr_relaxed is what its name says - specific to Dom0. opt_dom0_cpuid_faulting, otoh, is a control domain option (i.e. also applicable to a [hypothetical?] late ctrldom). Now you add an option that's applicable to the hardware domain, i.e. also coverting late-hwdom. > --- a/xen/arch/x86/hvm/emulate.c > +++ b/xen/arch/x86/hvm/emulate.c > @@ -338,8 +338,38 @@ static int hvmemul_do_io( > if ( !s ) > { > if ( is_mmio && is_hardware_domain(currd) ) > - gdprintk(XENLOG_DEBUG, "unhandled memory %s to %#lx size %u\n", > - dir ? "read" : "write", addr, size); > + { > + /* > + * PVH dom0 is likely missing MMIO mappings on the p2m, due to > + * the incomplete information Xen has about the memory layout. > + * > + * Either print a message to note dom0 attempted to access an > + * unpopulated GPA, or try to fixup the p2m by creating an > + * identity mapping for the faulting GPA. > + */ > + if ( opt_dom0_pf_fixup ) > + { > + int inner_rc = hvm_hwdom_fixup_p2m(addr); Why not use rc, as we do elsewhere in the function? > --- a/xen/arch/x86/hvm/hvm.c > +++ b/xen/arch/x86/hvm/hvm.c > @@ -13,6 +13,7 @@ > #include <xen/lib.h> > #include <xen/trace.h> > #include <xen/sched.h> > +#include <xen/iocap.h> > #include <xen/irq.h> > #include <xen/softirq.h> > #include <xen/domain.h> > @@ -5458,6 +5459,36 @@ int hvm_copy_context_and_params(struct domain *dst, struct domain *src) > return rc; > } > > +bool __ro_after_init opt_dom0_pf_fixup; > +int hvm_hwdom_fixup_p2m(paddr_t addr) The placement here looks odd to me. Why not as static function in emulate.c? Or alternatively why not as p2m_hwdom_fixup() in mm/p2m.c? > +{ > + unsigned long gfn = paddr_to_pfn(addr); > + struct domain *currd = current->domain; > + p2m_type_t type; > + mfn_t mfn; > + int rc; > + > + ASSERT(is_hardware_domain(currd)); > + ASSERT(!altp2m_active(currd)); > + > + /* > + * Fixups are only applied for MMIO holes, and rely on the hardware domain > + * having identity mappings for non RAM regions (gfn == mfn). > + */ > + if ( !iomem_access_permitted(currd, gfn, gfn) || > + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) > + return -EPERM; > + > + mfn = get_gfn(currd, gfn, &type); > + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) > + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; I understand this is to cover the case where two vCPU-s access the same GFN at about the same time. However, the "success" log message at the call site being debug-only means we may be silently hiding bugs in release builds, if e.g. we get here despite the GFN having had an identity mapping already for ages. Jan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86/dom0: attempt to fixup p2m page-faults for PVH dom0 2025-02-14 11:53 ` Jan Beulich @ 2025-02-14 12:38 ` Roger Pau Monné 2025-02-14 13:07 ` Jan Beulich 0 siblings, 1 reply; 14+ messages in thread From: Roger Pau Monné @ 2025-02-14 12:38 UTC (permalink / raw) To: Jan Beulich Cc: Oleksii Kurochko, Community Manager, Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini, xen-devel On Fri, Feb 14, 2025 at 12:53:01PM +0100, Jan Beulich wrote: > On 14.02.2025 10:29, Roger Pau Monne wrote: > > --- a/docs/misc/xen-command-line.pandoc > > +++ b/docs/misc/xen-command-line.pandoc > > @@ -822,7 +822,8 @@ Specify the bit width of the DMA heap. > > > > ### dom0 > > = List of [ pv | pvh, shadow=<bool>, verbose=<bool>, > > - cpuid-faulting=<bool>, msr-relaxed=<bool> ] (x86) > > + cpuid-faulting=<bool>, msr-relaxed=<bool>, > > + pf-fixup=<bool> ] (x86) > > > > = List of [ sve=<integer> ] (Arm64) > > > > @@ -883,6 +884,19 @@ Controls for how dom0 is constructed on x86 systems. > > > > If using this option is necessary to fix an issue, please report a bug. > > > > +* The `pf-fixup` boolean is only applicable when using a PVH dom0 and > > + defaults to false. > > + > > + When running dom0 in PVH mode the dom0 kernel has no way to map MMIO > > + regions into the p2m, such mode relies on Xen dom0 builder populating > > + the p2m with all MMIO regions that dom0 should access. However Xen > > + doesn't have a complete picture of the host memory map, due to not > > + being able to process ACPI dynamic tables. > > + > > + The `pf-fixup` option allows Xen to attempt to add missing MMIO regions > > + to the p2m in response to page-faults generated by dom0 trying to access > > + unpopulated entries in the p2m. > > I wonder if this is to implementation focused for a command line option doc. > In particular the multiple uses of "p2m" are standing out in this regard. Hm, let me try to change p2m with 'dom0 physical memory map' or similar. > > --- a/xen/arch/x86/dom0_build.c > > +++ b/xen/arch/x86/dom0_build.c > > @@ -286,6 +286,10 @@ int __init parse_arch_dom0_param(const char *s, const char *e) > > opt_dom0_cpuid_faulting = val; > > else if ( (val = parse_boolean("msr-relaxed", s, e)) >= 0 ) > > opt_dom0_msr_relaxed = val; > > +#ifdef CONFIG_HVM > > + else if ( (val = parse_boolean("pf-fixup", s, e)) >= 0 ) > > + opt_dom0_pf_fixup = val; > > +#endif > > else > > return -EINVAL; > > I fear the scope of these sub-options is getting increasingly confusing. > opt_dom0_msr_relaxed is what its name says - specific to Dom0. > opt_dom0_cpuid_faulting, otoh, is a control domain option (i.e. also > applicable to a [hypothetical?] late ctrldom). Now you add an option > that's applicable to the hardware domain, i.e. also coverting late-hwdom. It's kind of a mixed bag, but ATM I would leave it as-is because it's likely easier for users to find the options if they are grouped together. WE can always add more fine grained options if there's a desired for them. > > --- a/xen/arch/x86/hvm/emulate.c > > +++ b/xen/arch/x86/hvm/emulate.c > > @@ -338,8 +338,38 @@ static int hvmemul_do_io( > > if ( !s ) > > { > > if ( is_mmio && is_hardware_domain(currd) ) > > - gdprintk(XENLOG_DEBUG, "unhandled memory %s to %#lx size %u\n", > > - dir ? "read" : "write", addr, size); > > + { > > + /* > > + * PVH dom0 is likely missing MMIO mappings on the p2m, due to > > + * the incomplete information Xen has about the memory layout. > > + * > > + * Either print a message to note dom0 attempted to access an > > + * unpopulated GPA, or try to fixup the p2m by creating an > > + * identity mapping for the faulting GPA. > > + */ > > + if ( opt_dom0_pf_fixup ) > > + { > > + int inner_rc = hvm_hwdom_fixup_p2m(addr); > > Why not use rc, as we do elsewhere in the function? hvm_hwdom_fixup_p2m() returns an errno, while rc in this context contains X86EMUL_ values. I could indeed re-use rc, it just felt wrong to mix different error address spaces on the same variable. > > --- a/xen/arch/x86/hvm/hvm.c > > +++ b/xen/arch/x86/hvm/hvm.c > > @@ -13,6 +13,7 @@ > > #include <xen/lib.h> > > #include <xen/trace.h> > > #include <xen/sched.h> > > +#include <xen/iocap.h> > > #include <xen/irq.h> > > #include <xen/softirq.h> > > #include <xen/domain.h> > > @@ -5458,6 +5459,36 @@ int hvm_copy_context_and_params(struct domain *dst, struct domain *src) > > return rc; > > } > > > > +bool __ro_after_init opt_dom0_pf_fixup; > > +int hvm_hwdom_fixup_p2m(paddr_t addr) > > The placement here looks odd to me. Why not as static function in emulate.c? > Or alternatively why not as p2m_hwdom_fixup() in mm/p2m.c? I don't have a strong opinion, if you are fine with it a static function in emulate.c might be the best then. > > +{ > > + unsigned long gfn = paddr_to_pfn(addr); > > + struct domain *currd = current->domain; > > + p2m_type_t type; > > + mfn_t mfn; > > + int rc; > > + > > + ASSERT(is_hardware_domain(currd)); > > + ASSERT(!altp2m_active(currd)); > > + > > + /* > > + * Fixups are only applied for MMIO holes, and rely on the hardware domain > > + * having identity mappings for non RAM regions (gfn == mfn). > > + */ > > + if ( !iomem_access_permitted(currd, gfn, gfn) || > > + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) > > + return -EPERM; > > + > > + mfn = get_gfn(currd, gfn, &type); > > + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) > > + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; > > I understand this is to cover the case where two vCPU-s access the same GFN > at about the same time. However, the "success" log message at the call site > being debug-only means we may be silently hiding bugs in release builds, if > e.g. we get here despite the GFN having had an identity mapping already for > ages. Possibly, but what would be your suggestion to fix this? I will think about it, but I can't immediately see a solution that's not simply to make the message printed by the caller to be gprintk() instead of gdprintk() so catch such bugs. Would you agree to that? Thanks, Roger. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86/dom0: attempt to fixup p2m page-faults for PVH dom0 2025-02-14 12:38 ` Roger Pau Monné @ 2025-02-14 13:07 ` Jan Beulich 2025-02-17 8:25 ` Roger Pau Monné 0 siblings, 1 reply; 14+ messages in thread From: Jan Beulich @ 2025-02-14 13:07 UTC (permalink / raw) To: Roger Pau Monné Cc: Oleksii Kurochko, Community Manager, Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini, xen-devel On 14.02.2025 13:38, Roger Pau Monné wrote: > On Fri, Feb 14, 2025 at 12:53:01PM +0100, Jan Beulich wrote: >> On 14.02.2025 10:29, Roger Pau Monne wrote: >>> --- a/xen/arch/x86/hvm/emulate.c >>> +++ b/xen/arch/x86/hvm/emulate.c >>> @@ -338,8 +338,38 @@ static int hvmemul_do_io( >>> if ( !s ) >>> { >>> if ( is_mmio && is_hardware_domain(currd) ) >>> - gdprintk(XENLOG_DEBUG, "unhandled memory %s to %#lx size %u\n", >>> - dir ? "read" : "write", addr, size); >>> + { >>> + /* >>> + * PVH dom0 is likely missing MMIO mappings on the p2m, due to >>> + * the incomplete information Xen has about the memory layout. >>> + * >>> + * Either print a message to note dom0 attempted to access an >>> + * unpopulated GPA, or try to fixup the p2m by creating an >>> + * identity mapping for the faulting GPA. >>> + */ >>> + if ( opt_dom0_pf_fixup ) >>> + { >>> + int inner_rc = hvm_hwdom_fixup_p2m(addr); >> >> Why not use rc, as we do elsewhere in the function? > > hvm_hwdom_fixup_p2m() returns an errno, while rc in this context > contains X86EMUL_ values. I could indeed re-use rc, it just felt > wrong to mix different error address spaces on the same variable. Hmm, yes, I see. >>> --- a/xen/arch/x86/hvm/hvm.c >>> +++ b/xen/arch/x86/hvm/hvm.c >>> @@ -13,6 +13,7 @@ >>> #include <xen/lib.h> >>> #include <xen/trace.h> >>> #include <xen/sched.h> >>> +#include <xen/iocap.h> >>> #include <xen/irq.h> >>> #include <xen/softirq.h> >>> #include <xen/domain.h> >>> @@ -5458,6 +5459,36 @@ int hvm_copy_context_and_params(struct domain *dst, struct domain *src) >>> return rc; >>> } >>> >>> +bool __ro_after_init opt_dom0_pf_fixup; >>> +int hvm_hwdom_fixup_p2m(paddr_t addr) >> >> The placement here looks odd to me. Why not as static function in emulate.c? >> Or alternatively why not as p2m_hwdom_fixup() in mm/p2m.c? > > I don't have a strong opinion, if you are fine with it a static > function in emulate.c might be the best then. I'd be fine with either of the suggested options. mm/p2m.c is perhaps the more logical home for such a function, yet the option of having it static is quite appealing, too. Hence why I came to think of that one first. >>> +{ >>> + unsigned long gfn = paddr_to_pfn(addr); >>> + struct domain *currd = current->domain; >>> + p2m_type_t type; >>> + mfn_t mfn; >>> + int rc; >>> + >>> + ASSERT(is_hardware_domain(currd)); >>> + ASSERT(!altp2m_active(currd)); >>> + >>> + /* >>> + * Fixups are only applied for MMIO holes, and rely on the hardware domain >>> + * having identity mappings for non RAM regions (gfn == mfn). >>> + */ >>> + if ( !iomem_access_permitted(currd, gfn, gfn) || >>> + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) >>> + return -EPERM; >>> + >>> + mfn = get_gfn(currd, gfn, &type); >>> + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) >>> + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; >> >> I understand this is to cover the case where two vCPU-s access the same GFN >> at about the same time. However, the "success" log message at the call site >> being debug-only means we may be silently hiding bugs in release builds, if >> e.g. we get here despite the GFN having had an identity mapping already for >> ages. > > Possibly, but what would be your suggestion to fix this? I will think > about it, but I can't immediately see a solution that's not simply to > make the message printed by the caller to be gprintk() instead of > gdprintk() so catch such bugs. Would you agree to that? My thinking was that it might be best to propagate a distinguishable error code (perhaps -EEXIST, with its present use then replaced) out of the function, and make the choice of gprintk() vs gdprintk() depend on that. Accompanied by a comment explaining things a little. Jan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86/dom0: attempt to fixup p2m page-faults for PVH dom0 2025-02-14 13:07 ` Jan Beulich @ 2025-02-17 8:25 ` Roger Pau Monné 2025-02-17 8:44 ` Jan Beulich 0 siblings, 1 reply; 14+ messages in thread From: Roger Pau Monné @ 2025-02-17 8:25 UTC (permalink / raw) To: Jan Beulich Cc: Oleksii Kurochko, Community Manager, Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini, xen-devel On Fri, Feb 14, 2025 at 02:07:05PM +0100, Jan Beulich wrote: > On 14.02.2025 13:38, Roger Pau Monné wrote: > > On Fri, Feb 14, 2025 at 12:53:01PM +0100, Jan Beulich wrote: > >> On 14.02.2025 10:29, Roger Pau Monne wrote: > >>> +{ > >>> + unsigned long gfn = paddr_to_pfn(addr); > >>> + struct domain *currd = current->domain; > >>> + p2m_type_t type; > >>> + mfn_t mfn; > >>> + int rc; > >>> + > >>> + ASSERT(is_hardware_domain(currd)); > >>> + ASSERT(!altp2m_active(currd)); > >>> + > >>> + /* > >>> + * Fixups are only applied for MMIO holes, and rely on the hardware domain > >>> + * having identity mappings for non RAM regions (gfn == mfn). > >>> + */ > >>> + if ( !iomem_access_permitted(currd, gfn, gfn) || > >>> + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) > >>> + return -EPERM; > >>> + > >>> + mfn = get_gfn(currd, gfn, &type); > >>> + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) > >>> + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; > >> > >> I understand this is to cover the case where two vCPU-s access the same GFN > >> at about the same time. However, the "success" log message at the call site > >> being debug-only means we may be silently hiding bugs in release builds, if > >> e.g. we get here despite the GFN having had an identity mapping already for > >> ages. > > > > Possibly, but what would be your suggestion to fix this? I will think > > about it, but I can't immediately see a solution that's not simply to > > make the message printed by the caller to be gprintk() instead of > > gdprintk() so catch such bugs. Would you agree to that? > > My thinking was that it might be best to propagate a distinguishable error > code (perhaps -EEXIST, with its present use then replaced) out of the function, > and make the choice of gprintk() vs gdprintk() depend on that. Accompanied by a > comment explaining things a little. I think it would be easier if I just made those gprintk() instead of gdprintk(), all with severity XENLOG_DEBUG except for the one that reports the failure of the fixup function that is XENLOG_WARNING. Would you be OK with that? Thanks, Roger. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86/dom0: attempt to fixup p2m page-faults for PVH dom0 2025-02-17 8:25 ` Roger Pau Monné @ 2025-02-17 8:44 ` Jan Beulich 2025-02-17 10:20 ` Roger Pau Monné 0 siblings, 1 reply; 14+ messages in thread From: Jan Beulich @ 2025-02-17 8:44 UTC (permalink / raw) To: Roger Pau Monné Cc: Oleksii Kurochko, Community Manager, Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini, xen-devel On 17.02.2025 09:25, Roger Pau Monné wrote: > On Fri, Feb 14, 2025 at 02:07:05PM +0100, Jan Beulich wrote: >> On 14.02.2025 13:38, Roger Pau Monné wrote: >>> On Fri, Feb 14, 2025 at 12:53:01PM +0100, Jan Beulich wrote: >>>> On 14.02.2025 10:29, Roger Pau Monne wrote: >>>>> +{ >>>>> + unsigned long gfn = paddr_to_pfn(addr); >>>>> + struct domain *currd = current->domain; >>>>> + p2m_type_t type; >>>>> + mfn_t mfn; >>>>> + int rc; >>>>> + >>>>> + ASSERT(is_hardware_domain(currd)); >>>>> + ASSERT(!altp2m_active(currd)); >>>>> + >>>>> + /* >>>>> + * Fixups are only applied for MMIO holes, and rely on the hardware domain >>>>> + * having identity mappings for non RAM regions (gfn == mfn). >>>>> + */ >>>>> + if ( !iomem_access_permitted(currd, gfn, gfn) || >>>>> + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) >>>>> + return -EPERM; >>>>> + >>>>> + mfn = get_gfn(currd, gfn, &type); >>>>> + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) >>>>> + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; >>>> >>>> I understand this is to cover the case where two vCPU-s access the same GFN >>>> at about the same time. However, the "success" log message at the call site >>>> being debug-only means we may be silently hiding bugs in release builds, if >>>> e.g. we get here despite the GFN having had an identity mapping already for >>>> ages. >>> >>> Possibly, but what would be your suggestion to fix this? I will think >>> about it, but I can't immediately see a solution that's not simply to >>> make the message printed by the caller to be gprintk() instead of >>> gdprintk() so catch such bugs. Would you agree to that? >> >> My thinking was that it might be best to propagate a distinguishable error >> code (perhaps -EEXIST, with its present use then replaced) out of the function, >> and make the choice of gprintk() vs gdprintk() depend on that. Accompanied by a >> comment explaining things a little. > > I think it would be easier if I just made those gprintk() instead of > gdprintk(), all with severity XENLOG_DEBUG except for the one that > reports the failure of the fixup function that is XENLOG_WARNING. > Would you be OK with that? Hmm. Okay-ish at best. Even if debug+guest-level messages are suppressed by default, I think it wouldn't be nice if many of them might appear in release builds with guest_loglevel=all. What I find difficult is to predict how high the chances are to see any of them (and then possibly multiple times). Jan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86/dom0: attempt to fixup p2m page-faults for PVH dom0 2025-02-17 8:44 ` Jan Beulich @ 2025-02-17 10:20 ` Roger Pau Monné 2025-02-17 10:27 ` Jan Beulich 0 siblings, 1 reply; 14+ messages in thread From: Roger Pau Monné @ 2025-02-17 10:20 UTC (permalink / raw) To: Jan Beulich Cc: Oleksii Kurochko, Community Manager, Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini, xen-devel On Mon, Feb 17, 2025 at 09:44:28AM +0100, Jan Beulich wrote: > On 17.02.2025 09:25, Roger Pau Monné wrote: > > On Fri, Feb 14, 2025 at 02:07:05PM +0100, Jan Beulich wrote: > >> On 14.02.2025 13:38, Roger Pau Monné wrote: > >>> On Fri, Feb 14, 2025 at 12:53:01PM +0100, Jan Beulich wrote: > >>>> On 14.02.2025 10:29, Roger Pau Monne wrote: > >>>>> +{ > >>>>> + unsigned long gfn = paddr_to_pfn(addr); > >>>>> + struct domain *currd = current->domain; > >>>>> + p2m_type_t type; > >>>>> + mfn_t mfn; > >>>>> + int rc; > >>>>> + > >>>>> + ASSERT(is_hardware_domain(currd)); > >>>>> + ASSERT(!altp2m_active(currd)); > >>>>> + > >>>>> + /* > >>>>> + * Fixups are only applied for MMIO holes, and rely on the hardware domain > >>>>> + * having identity mappings for non RAM regions (gfn == mfn). > >>>>> + */ > >>>>> + if ( !iomem_access_permitted(currd, gfn, gfn) || > >>>>> + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) > >>>>> + return -EPERM; > >>>>> + > >>>>> + mfn = get_gfn(currd, gfn, &type); > >>>>> + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) > >>>>> + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; > >>>> > >>>> I understand this is to cover the case where two vCPU-s access the same GFN > >>>> at about the same time. However, the "success" log message at the call site > >>>> being debug-only means we may be silently hiding bugs in release builds, if > >>>> e.g. we get here despite the GFN having had an identity mapping already for > >>>> ages. > >>> > >>> Possibly, but what would be your suggestion to fix this? I will think > >>> about it, but I can't immediately see a solution that's not simply to > >>> make the message printed by the caller to be gprintk() instead of > >>> gdprintk() so catch such bugs. Would you agree to that? > >> > >> My thinking was that it might be best to propagate a distinguishable error > >> code (perhaps -EEXIST, with its present use then replaced) out of the function, > >> and make the choice of gprintk() vs gdprintk() depend on that. Accompanied by a > >> comment explaining things a little. > > > > I think it would be easier if I just made those gprintk() instead of > > gdprintk(), all with severity XENLOG_DEBUG except for the one that > > reports the failure of the fixup function that is XENLOG_WARNING. > > Would you be OK with that? > > Hmm. Okay-ish at best. Even if debug+guest-level messages are suppressed by > default, I think it wouldn't be nice if many of them might appear in release > builds with guest_loglevel=all. What I find difficult is to predict how high > the chances are to see any of them (and then possibly multiple times). I think getting those messages even in non-debug builds might be helpful for debugging purposes. Sometimes it's difficult for users to switch to a debug build of Xen if not provided by their upstream. FWIW, on my Intel NUC I see three of those: (XEN) [ 5.418855] arch/x86/hvm/emulate.c:391:d0v0 fixup p2m mapping for page fedc7 added (XEN) [ 5.474574] arch/x86/hvm/emulate.c:391:d0v0 fixup p2m mapping for page fd6a0 added (XEN) [ 8.712784] arch/x86/hvm/emulate.c:391:d0v2 fixup p2m mapping for page fe410 added Would you be fine with this approach: bool __ro_after_init opt_dom0_pf_fixup; static int hwdom_fixup_p2m(paddr_t addr) { unsigned long gfn = paddr_to_pfn(addr); struct domain *currd = current->domain; p2m_type_t type; mfn_t mfn; int rc; ASSERT(is_hardware_domain(currd)); ASSERT(!altp2m_active(currd)); /* * Fixups are only applied for MMIO holes, and rely on the hardware domain * having identity mappings for non RAM regions (gfn == mfn). */ if ( !iomem_access_permitted(currd, gfn, gfn) || !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) return -EPERM; mfn = get_gfn(currd, gfn, &type); if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) rc = mfn_eq(mfn, _mfn(gfn)) ? -EEXIST : -ENOTEMPTY; else rc = set_mmio_p2m_entry(currd, _gfn(gfn), _mfn(gfn), 0); put_gfn(currd, gfn); return rc; } [...] int inner_rc = hwdom_fixup_p2m(addr); if ( !inner_rc || inner_rc == -EEXIST ) { gdprintk(XENLOG_DEBUG, "fixup p2m mapping for page %lx %s\n", paddr_to_pfn(addr), !inner_rc ? "added" : "already present"); rc = X86EMUL_RETRY; vio->req.state = STATE_IOREQ_NONE; break; } gprintk(XENLOG_WARNING, "unable to fixup memory %s to %#lx size %u: %d\n", dir ? "read" : "write", addr, size, inner_rc); Thanks, Roger. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86/dom0: attempt to fixup p2m page-faults for PVH dom0 2025-02-17 10:20 ` Roger Pau Monné @ 2025-02-17 10:27 ` Jan Beulich 2025-02-17 10:51 ` Roger Pau Monné 0 siblings, 1 reply; 14+ messages in thread From: Jan Beulich @ 2025-02-17 10:27 UTC (permalink / raw) To: Roger Pau Monné Cc: Oleksii Kurochko, Community Manager, Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini, xen-devel On 17.02.2025 11:20, Roger Pau Monné wrote: > On Mon, Feb 17, 2025 at 09:44:28AM +0100, Jan Beulich wrote: >> On 17.02.2025 09:25, Roger Pau Monné wrote: >>> On Fri, Feb 14, 2025 at 02:07:05PM +0100, Jan Beulich wrote: >>>> On 14.02.2025 13:38, Roger Pau Monné wrote: >>>>> On Fri, Feb 14, 2025 at 12:53:01PM +0100, Jan Beulich wrote: >>>>>> On 14.02.2025 10:29, Roger Pau Monne wrote: >>>>>>> +{ >>>>>>> + unsigned long gfn = paddr_to_pfn(addr); >>>>>>> + struct domain *currd = current->domain; >>>>>>> + p2m_type_t type; >>>>>>> + mfn_t mfn; >>>>>>> + int rc; >>>>>>> + >>>>>>> + ASSERT(is_hardware_domain(currd)); >>>>>>> + ASSERT(!altp2m_active(currd)); >>>>>>> + >>>>>>> + /* >>>>>>> + * Fixups are only applied for MMIO holes, and rely on the hardware domain >>>>>>> + * having identity mappings for non RAM regions (gfn == mfn). >>>>>>> + */ >>>>>>> + if ( !iomem_access_permitted(currd, gfn, gfn) || >>>>>>> + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) >>>>>>> + return -EPERM; >>>>>>> + >>>>>>> + mfn = get_gfn(currd, gfn, &type); >>>>>>> + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) >>>>>>> + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; >>>>>> >>>>>> I understand this is to cover the case where two vCPU-s access the same GFN >>>>>> at about the same time. However, the "success" log message at the call site >>>>>> being debug-only means we may be silently hiding bugs in release builds, if >>>>>> e.g. we get here despite the GFN having had an identity mapping already for >>>>>> ages. >>>>> >>>>> Possibly, but what would be your suggestion to fix this? I will think >>>>> about it, but I can't immediately see a solution that's not simply to >>>>> make the message printed by the caller to be gprintk() instead of >>>>> gdprintk() so catch such bugs. Would you agree to that? >>>> >>>> My thinking was that it might be best to propagate a distinguishable error >>>> code (perhaps -EEXIST, with its present use then replaced) out of the function, >>>> and make the choice of gprintk() vs gdprintk() depend on that. Accompanied by a >>>> comment explaining things a little. >>> >>> I think it would be easier if I just made those gprintk() instead of >>> gdprintk(), all with severity XENLOG_DEBUG except for the one that >>> reports the failure of the fixup function that is XENLOG_WARNING. >>> Would you be OK with that? >> >> Hmm. Okay-ish at best. Even if debug+guest-level messages are suppressed by >> default, I think it wouldn't be nice if many of them might appear in release >> builds with guest_loglevel=all. What I find difficult is to predict how high >> the chances are to see any of them (and then possibly multiple times). > > I think getting those messages even in non-debug builds might be > helpful for debugging purposes. Sometimes it's difficult for users to > switch to a debug build of Xen if not provided by their upstream. > > FWIW, on my Intel NUC I see three of those: > > (XEN) [ 5.418855] arch/x86/hvm/emulate.c:391:d0v0 fixup p2m mapping for page fedc7 added > (XEN) [ 5.474574] arch/x86/hvm/emulate.c:391:d0v0 fixup p2m mapping for page fd6a0 added > (XEN) [ 8.712784] arch/x86/hvm/emulate.c:391:d0v2 fixup p2m mapping for page fe410 added For my understanding: Did Xen with a PVH Dom0 not work on the NUC before? Or else how come it survived without this fixing up of mappings? > Would you be fine with this approach: > > bool __ro_after_init opt_dom0_pf_fixup; > static int hwdom_fixup_p2m(paddr_t addr) > { > unsigned long gfn = paddr_to_pfn(addr); > struct domain *currd = current->domain; > p2m_type_t type; > mfn_t mfn; > int rc; > > ASSERT(is_hardware_domain(currd)); > ASSERT(!altp2m_active(currd)); > > /* > * Fixups are only applied for MMIO holes, and rely on the hardware domain > * having identity mappings for non RAM regions (gfn == mfn). > */ > if ( !iomem_access_permitted(currd, gfn, gfn) || > !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) > return -EPERM; > > mfn = get_gfn(currd, gfn, &type); > if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) > rc = mfn_eq(mfn, _mfn(gfn)) ? -EEXIST : -ENOTEMPTY; > else > rc = set_mmio_p2m_entry(currd, _gfn(gfn), _mfn(gfn), 0); > put_gfn(currd, gfn); > > return rc; > } > [...] > int inner_rc = hwdom_fixup_p2m(addr); > > if ( !inner_rc || inner_rc == -EEXIST ) > { > gdprintk(XENLOG_DEBUG, > "fixup p2m mapping for page %lx %s\n", > paddr_to_pfn(addr), > !inner_rc ? "added" : "already present"); As before, I think the "already present" message wants to be present also in release build logs. As opposed to the "added" one. Yet at the same time, if e.g. you and Andrew agree on the shape above, I won't stand in the way. Jan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86/dom0: attempt to fixup p2m page-faults for PVH dom0 2025-02-17 10:27 ` Jan Beulich @ 2025-02-17 10:51 ` Roger Pau Monné 2025-02-17 10:58 ` Jan Beulich 0 siblings, 1 reply; 14+ messages in thread From: Roger Pau Monné @ 2025-02-17 10:51 UTC (permalink / raw) To: Jan Beulich Cc: Oleksii Kurochko, Community Manager, Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini, xen-devel On Mon, Feb 17, 2025 at 11:27:52AM +0100, Jan Beulich wrote: > On 17.02.2025 11:20, Roger Pau Monné wrote: > > On Mon, Feb 17, 2025 at 09:44:28AM +0100, Jan Beulich wrote: > >> On 17.02.2025 09:25, Roger Pau Monné wrote: > >>> On Fri, Feb 14, 2025 at 02:07:05PM +0100, Jan Beulich wrote: > >>>> On 14.02.2025 13:38, Roger Pau Monné wrote: > >>>>> On Fri, Feb 14, 2025 at 12:53:01PM +0100, Jan Beulich wrote: > >>>>>> On 14.02.2025 10:29, Roger Pau Monne wrote: > >>>>>>> +{ > >>>>>>> + unsigned long gfn = paddr_to_pfn(addr); > >>>>>>> + struct domain *currd = current->domain; > >>>>>>> + p2m_type_t type; > >>>>>>> + mfn_t mfn; > >>>>>>> + int rc; > >>>>>>> + > >>>>>>> + ASSERT(is_hardware_domain(currd)); > >>>>>>> + ASSERT(!altp2m_active(currd)); > >>>>>>> + > >>>>>>> + /* > >>>>>>> + * Fixups are only applied for MMIO holes, and rely on the hardware domain > >>>>>>> + * having identity mappings for non RAM regions (gfn == mfn). > >>>>>>> + */ > >>>>>>> + if ( !iomem_access_permitted(currd, gfn, gfn) || > >>>>>>> + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) > >>>>>>> + return -EPERM; > >>>>>>> + > >>>>>>> + mfn = get_gfn(currd, gfn, &type); > >>>>>>> + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) > >>>>>>> + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; > >>>>>> > >>>>>> I understand this is to cover the case where two vCPU-s access the same GFN > >>>>>> at about the same time. However, the "success" log message at the call site > >>>>>> being debug-only means we may be silently hiding bugs in release builds, if > >>>>>> e.g. we get here despite the GFN having had an identity mapping already for > >>>>>> ages. > >>>>> > >>>>> Possibly, but what would be your suggestion to fix this? I will think > >>>>> about it, but I can't immediately see a solution that's not simply to > >>>>> make the message printed by the caller to be gprintk() instead of > >>>>> gdprintk() so catch such bugs. Would you agree to that? > >>>> > >>>> My thinking was that it might be best to propagate a distinguishable error > >>>> code (perhaps -EEXIST, with its present use then replaced) out of the function, > >>>> and make the choice of gprintk() vs gdprintk() depend on that. Accompanied by a > >>>> comment explaining things a little. > >>> > >>> I think it would be easier if I just made those gprintk() instead of > >>> gdprintk(), all with severity XENLOG_DEBUG except for the one that > >>> reports the failure of the fixup function that is XENLOG_WARNING. > >>> Would you be OK with that? > >> > >> Hmm. Okay-ish at best. Even if debug+guest-level messages are suppressed by > >> default, I think it wouldn't be nice if many of them might appear in release > >> builds with guest_loglevel=all. What I find difficult is to predict how high > >> the chances are to see any of them (and then possibly multiple times). > > > > I think getting those messages even in non-debug builds might be > > helpful for debugging purposes. Sometimes it's difficult for users to > > switch to a debug build of Xen if not provided by their upstream. > > > > FWIW, on my Intel NUC I see three of those: > > > > (XEN) [ 5.418855] arch/x86/hvm/emulate.c:391:d0v0 fixup p2m mapping for page fedc7 added > > (XEN) [ 5.474574] arch/x86/hvm/emulate.c:391:d0v0 fixup p2m mapping for page fd6a0 added > > (XEN) [ 8.712784] arch/x86/hvm/emulate.c:391:d0v2 fixup p2m mapping for page fe410 added > > For my understanding: Did Xen with a PVH Dom0 not work on the NUC before? Or > else how come it survived without this fixing up of mappings? I've got no idea what's in those addresses. It did survive and seems to work fine without those identity mappings in the p2m. I assume that returning ~0 for reads and ignoring writes what good enough. > > > Would you be fine with this approach: > > > > bool __ro_after_init opt_dom0_pf_fixup; > > static int hwdom_fixup_p2m(paddr_t addr) > > { > > unsigned long gfn = paddr_to_pfn(addr); > > struct domain *currd = current->domain; > > p2m_type_t type; > > mfn_t mfn; > > int rc; > > > > ASSERT(is_hardware_domain(currd)); > > ASSERT(!altp2m_active(currd)); > > > > /* > > * Fixups are only applied for MMIO holes, and rely on the hardware domain > > * having identity mappings for non RAM regions (gfn == mfn). > > */ > > if ( !iomem_access_permitted(currd, gfn, gfn) || > > !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) > > return -EPERM; > > > > mfn = get_gfn(currd, gfn, &type); > > if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) > > rc = mfn_eq(mfn, _mfn(gfn)) ? -EEXIST : -ENOTEMPTY; > > else > > rc = set_mmio_p2m_entry(currd, _gfn(gfn), _mfn(gfn), 0); > > put_gfn(currd, gfn); > > > > return rc; > > } > > [...] > > int inner_rc = hwdom_fixup_p2m(addr); > > > > if ( !inner_rc || inner_rc == -EEXIST ) > > { > > gdprintk(XENLOG_DEBUG, > > "fixup p2m mapping for page %lx %s\n", > > paddr_to_pfn(addr), > > !inner_rc ? "added" : "already present"); > > As before, I think the "already present" message wants to be present also in > release build logs. As opposed to the "added" one. Yet at the same time, if > e.g. you and Andrew agree on the shape above, I won't stand in the way. I didn't want to add yet another level of indentation, as it then becomes: int inner_rc = hwdom_fixup_p2m(addr); if ( !inner_rc || inner_rc == -EEXIST ) { if ( !inner_rc ) gdprintk(XENLOG_DEBUG, "fixup p2m mapping for page %lx added\n", paddr_to_pfn(addr)); else gprintk(XENLOG_INFO, "fixup p2m mapping for page %lx already present\n", paddr_to_pfn(addr)); Would you be OK with the above proposal then? Thanks, Roger. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86/dom0: attempt to fixup p2m page-faults for PVH dom0 2025-02-17 10:51 ` Roger Pau Monné @ 2025-02-17 10:58 ` Jan Beulich 0 siblings, 0 replies; 14+ messages in thread From: Jan Beulich @ 2025-02-17 10:58 UTC (permalink / raw) To: Roger Pau Monné Cc: Oleksii Kurochko, Community Manager, Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini, xen-devel On 17.02.2025 11:51, Roger Pau Monné wrote: > On Mon, Feb 17, 2025 at 11:27:52AM +0100, Jan Beulich wrote: >> On 17.02.2025 11:20, Roger Pau Monné wrote: >>> On Mon, Feb 17, 2025 at 09:44:28AM +0100, Jan Beulich wrote: >>>> On 17.02.2025 09:25, Roger Pau Monné wrote: >>>>> On Fri, Feb 14, 2025 at 02:07:05PM +0100, Jan Beulich wrote: >>>>>> On 14.02.2025 13:38, Roger Pau Monné wrote: >>>>>>> On Fri, Feb 14, 2025 at 12:53:01PM +0100, Jan Beulich wrote: >>>>>>>> On 14.02.2025 10:29, Roger Pau Monne wrote: >>>>>>>>> +{ >>>>>>>>> + unsigned long gfn = paddr_to_pfn(addr); >>>>>>>>> + struct domain *currd = current->domain; >>>>>>>>> + p2m_type_t type; >>>>>>>>> + mfn_t mfn; >>>>>>>>> + int rc; >>>>>>>>> + >>>>>>>>> + ASSERT(is_hardware_domain(currd)); >>>>>>>>> + ASSERT(!altp2m_active(currd)); >>>>>>>>> + >>>>>>>>> + /* >>>>>>>>> + * Fixups are only applied for MMIO holes, and rely on the hardware domain >>>>>>>>> + * having identity mappings for non RAM regions (gfn == mfn). >>>>>>>>> + */ >>>>>>>>> + if ( !iomem_access_permitted(currd, gfn, gfn) || >>>>>>>>> + !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) >>>>>>>>> + return -EPERM; >>>>>>>>> + >>>>>>>>> + mfn = get_gfn(currd, gfn, &type); >>>>>>>>> + if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) >>>>>>>>> + rc = mfn_eq(mfn, _mfn(gfn)) ? 0 : -EEXIST; >>>>>>>> >>>>>>>> I understand this is to cover the case where two vCPU-s access the same GFN >>>>>>>> at about the same time. However, the "success" log message at the call site >>>>>>>> being debug-only means we may be silently hiding bugs in release builds, if >>>>>>>> e.g. we get here despite the GFN having had an identity mapping already for >>>>>>>> ages. >>>>>>> >>>>>>> Possibly, but what would be your suggestion to fix this? I will think >>>>>>> about it, but I can't immediately see a solution that's not simply to >>>>>>> make the message printed by the caller to be gprintk() instead of >>>>>>> gdprintk() so catch such bugs. Would you agree to that? >>>>>> >>>>>> My thinking was that it might be best to propagate a distinguishable error >>>>>> code (perhaps -EEXIST, with its present use then replaced) out of the function, >>>>>> and make the choice of gprintk() vs gdprintk() depend on that. Accompanied by a >>>>>> comment explaining things a little. >>>>> >>>>> I think it would be easier if I just made those gprintk() instead of >>>>> gdprintk(), all with severity XENLOG_DEBUG except for the one that >>>>> reports the failure of the fixup function that is XENLOG_WARNING. >>>>> Would you be OK with that? >>>> >>>> Hmm. Okay-ish at best. Even if debug+guest-level messages are suppressed by >>>> default, I think it wouldn't be nice if many of them might appear in release >>>> builds with guest_loglevel=all. What I find difficult is to predict how high >>>> the chances are to see any of them (and then possibly multiple times). >>> >>> I think getting those messages even in non-debug builds might be >>> helpful for debugging purposes. Sometimes it's difficult for users to >>> switch to a debug build of Xen if not provided by their upstream. >>> >>> FWIW, on my Intel NUC I see three of those: >>> >>> (XEN) [ 5.418855] arch/x86/hvm/emulate.c:391:d0v0 fixup p2m mapping for page fedc7 added >>> (XEN) [ 5.474574] arch/x86/hvm/emulate.c:391:d0v0 fixup p2m mapping for page fd6a0 added >>> (XEN) [ 8.712784] arch/x86/hvm/emulate.c:391:d0v2 fixup p2m mapping for page fe410 added >> >> For my understanding: Did Xen with a PVH Dom0 not work on the NUC before? Or >> else how come it survived without this fixing up of mappings? > > I've got no idea what's in those addresses. It did survive and seems > to work fine without those identity mappings in the p2m. I assume > that returning ~0 for reads and ignoring writes what good enough. On one hand I find this concerning. Otoh this way we'll maybe learn what issues were so far papered over by said read/write behavior. (Of course there's also a small chance of this opening up new problem areas.) >>> Would you be fine with this approach: >>> >>> bool __ro_after_init opt_dom0_pf_fixup; >>> static int hwdom_fixup_p2m(paddr_t addr) >>> { >>> unsigned long gfn = paddr_to_pfn(addr); >>> struct domain *currd = current->domain; >>> p2m_type_t type; >>> mfn_t mfn; >>> int rc; >>> >>> ASSERT(is_hardware_domain(currd)); >>> ASSERT(!altp2m_active(currd)); >>> >>> /* >>> * Fixups are only applied for MMIO holes, and rely on the hardware domain >>> * having identity mappings for non RAM regions (gfn == mfn). >>> */ >>> if ( !iomem_access_permitted(currd, gfn, gfn) || >>> !is_memory_hole(_mfn(gfn), _mfn(gfn)) ) >>> return -EPERM; >>> >>> mfn = get_gfn(currd, gfn, &type); >>> if ( !mfn_eq(mfn, INVALID_MFN) || !p2m_is_hole(type) ) >>> rc = mfn_eq(mfn, _mfn(gfn)) ? -EEXIST : -ENOTEMPTY; >>> else >>> rc = set_mmio_p2m_entry(currd, _gfn(gfn), _mfn(gfn), 0); >>> put_gfn(currd, gfn); >>> >>> return rc; >>> } >>> [...] >>> int inner_rc = hwdom_fixup_p2m(addr); >>> >>> if ( !inner_rc || inner_rc == -EEXIST ) >>> { >>> gdprintk(XENLOG_DEBUG, >>> "fixup p2m mapping for page %lx %s\n", >>> paddr_to_pfn(addr), >>> !inner_rc ? "added" : "already present"); >> >> As before, I think the "already present" message wants to be present also in >> release build logs. As opposed to the "added" one. Yet at the same time, if >> e.g. you and Andrew agree on the shape above, I won't stand in the way. > > I didn't want to add yet another level of indentation, as it then > becomes: > > int inner_rc = hwdom_fixup_p2m(addr); > > if ( !inner_rc || inner_rc == -EEXIST ) > { > if ( !inner_rc ) > gdprintk(XENLOG_DEBUG, > "fixup p2m mapping for page %lx added\n", > paddr_to_pfn(addr)); > else > gprintk(XENLOG_INFO, > "fixup p2m mapping for page %lx already present\n", > paddr_to_pfn(addr)); > > Would you be OK with the above proposal then? Yes (with off-by-1 indentation corrected). It's unfortunate that this can't be written in a more compact form. Jan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] x86/dom0: attempt to fixup p2m page-faults for PVH dom0 2025-02-14 9:29 ` [PATCH 2/2] x86/dom0: attempt to fixup p2m page-faults " Roger Pau Monne 2025-02-14 11:53 ` Jan Beulich @ 2025-02-14 14:42 ` Andrew Cooper 1 sibling, 0 replies; 14+ messages in thread From: Andrew Cooper @ 2025-02-14 14:42 UTC (permalink / raw) To: Roger Pau Monne, xen-devel Cc: Oleksii Kurochko, Community Manager, Anthony PERARD, Michal Orzel, Jan Beulich, Julien Grall, Stefano Stabellini On 14/02/2025 9:29 am, Roger Pau Monne wrote: > diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc > index 9bbd00baef91..e9884de07e9e 100644 > --- a/docs/misc/xen-command-line.pandoc > +++ b/docs/misc/xen-command-line.pandoc > @@ -822,7 +822,8 @@ Specify the bit width of the DMA heap. > > ### dom0 > = List of [ pv | pvh, shadow=<bool>, verbose=<bool>, > - cpuid-faulting=<bool>, msr-relaxed=<bool> ] (x86) > + cpuid-faulting=<bool>, msr-relaxed=<bool>, > + pf-fixup=<bool> ] (x86) > > = List of [ sve=<integer> ] (Arm64) > Looking at this, we need to make dom0= disjoint between architectures like we do for other options. I'll do a patch. ~Andrew ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-02-17 10:59 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-02-14 9:29 [PATCH 0/2] x86/pvh: workaround missing MMIO regions in dom0 p2m Roger Pau Monne 2025-02-14 9:29 ` [PATCH 1/2] x86/emul: dump unhandled memory accesses for PVH dom0 Roger Pau Monne 2025-02-14 11:23 ` Jan Beulich 2025-02-14 9:29 ` [PATCH 2/2] x86/dom0: attempt to fixup p2m page-faults " Roger Pau Monne 2025-02-14 11:53 ` Jan Beulich 2025-02-14 12:38 ` Roger Pau Monné 2025-02-14 13:07 ` Jan Beulich 2025-02-17 8:25 ` Roger Pau Monné 2025-02-17 8:44 ` Jan Beulich 2025-02-17 10:20 ` Roger Pau Monné 2025-02-17 10:27 ` Jan Beulich 2025-02-17 10:51 ` Roger Pau Monné 2025-02-17 10:58 ` Jan Beulich 2025-02-14 14:42 ` Andrew Cooper
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.