[PATCH 0/2] x86/hvm: fixes for RO MMIO emulation

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/2] x86/hvm: fixes for RO MMIO emulation
@ 2025-04-08  9:31 Roger Pau Monne
  2025-04-08  9:31 ` [PATCH 1/2] x86/hvm: remove unreachable MMCFG write emulation Roger Pau Monne
  2025-04-08  9:31 ` [PATCH 2/2] x86/hvm: fix write emulation of RO ranges Roger Pau Monne
  0 siblings, 2 replies; 19+ messages in thread
From: Roger Pau Monne @ 2025-04-08  9:31 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper

Hello,

Two small fixes for RO MMIO emulation for HVM guests. First one removes
unreachable code from the HVM path, second fixes an issue in the
mmio_ro_emulated_write() handler when used in HVM guests context.

Thanks, Roger.

Roger Pau Monne (2):
  x86/hvm: remove unreachable MMCFG write emulation
  x86/hvm: fix write emulation of RO ranges

 xen/arch/x86/hvm/emulate.c      | 22 +++++++------------
 xen/arch/x86/include/asm/mm.h   |  3 ---
 xen/arch/x86/mm.c               | 38 ++++++---------------------------
 xen/arch/x86/pv/ro-page-fault.c | 31 +++++++++++++++++++++++++++
 4 files changed, 45 insertions(+), 49 deletions(-)

-- 
2.48.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/2] x86/hvm: remove unreachable MMCFG write emulation
  2025-04-08  9:31 [PATCH 0/2] x86/hvm: fixes for RO MMIO emulation Roger Pau Monne
@ 2025-04-08  9:31 ` Roger Pau Monne
  2025-04-08 13:42   ` Jan Beulich
  2025-04-08  9:31 ` [PATCH 2/2] x86/hvm: fix write emulation of RO ranges Roger Pau Monne
  1 sibling, 1 reply; 19+ messages in thread
From: Roger Pau Monne @ 2025-04-08  9:31 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper

The current implementation of PVH dom0 relies on vPCI to trap and handle
accesses to the MMCFG area.  Previous implementation of PVH dom0 (v1)
didn't have vPCI, and as a classic PV dom0, relied on the MMCFG range being
RO.  As such hvm_emulate_one_mmio() had to special case write accesses to
the MMCFG area.

With PVH dom0 using vPCI, and the MMCFG accesses being fully handled there,
hvm_emulate_one_mmio() should never handle accesses to MMCFG, making the
code effectively unreachable.

Remove it and leave an ASSERT to make sure MMCFG accesses never get into
hvm_emulate_one_mmio().  As a result of the removal of one of the users of
mmcfg_intercept_write(), the function can now be moved into the same
translation unit where it's solely used, allowing it to be made static and
effectively built only when PV support is enabled.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/hvm/emulate.c      | 22 ++++++++--------------
 xen/arch/x86/include/asm/mm.h   |  3 ---
 xen/arch/x86/mm.c               | 31 -------------------------------
 xen/arch/x86/pv/ro-page-fault.c | 31 +++++++++++++++++++++++++++++++
 4 files changed, 39 insertions(+), 48 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 04d07ccaf6a4..2c9fbacce7fb 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -2858,12 +2858,6 @@ int hvm_emulate_one(
 
 int hvm_emulate_one_mmio(unsigned long mfn, unsigned long gla)
 {
-    static const struct x86_emulate_ops hvm_intercept_ops_mmcfg = {
-        .read       = x86emul_unhandleable_rw,
-        .insn_fetch = hvmemul_insn_fetch,
-        .write      = mmcfg_intercept_write,
-        .validate   = hvmemul_validate,
-    };
     static const struct x86_emulate_ops hvm_ro_emulate_ops_mmio = {
         .read       = x86emul_unhandleable_rw,
         .insn_fetch = hvmemul_insn_fetch,
@@ -2872,28 +2866,28 @@ int hvm_emulate_one_mmio(unsigned long mfn, unsigned long gla)
     };
     struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { .cr2 = gla, .mfn = _mfn(mfn) };
     struct hvm_emulate_ctxt ctxt;
-    const struct x86_emulate_ops *ops;
     unsigned int seg, bdf;
     int rc;
 
     if ( pci_ro_mmcfg_decode(mfn, &seg, &bdf) )
     {
-        mmio_ro_ctxt.seg = seg;
-        mmio_ro_ctxt.bdf = bdf;
-        ops = &hvm_intercept_ops_mmcfg;
+        /* Should be always handled by vPCI for PVH dom0. */
+        gdprintk(XENLOG_ERR, "unhandled MMCFG access for %pp\n",
+                 &PCI_SBDF(seg, bdf));
+        ASSERT_UNREACHABLE();
+        return X86EMUL_UNHANDLEABLE;
     }
-    else
-        ops = &hvm_ro_emulate_ops_mmio;
 
     hvm_emulate_init_once(&ctxt, x86_insn_is_mem_write,
                           guest_cpu_user_regs());
     ctxt.ctxt.data = &mmio_ro_ctxt;
 
-    switch ( rc = _hvm_emulate_one(&ctxt, ops, VIO_no_completion) )
+    switch ( rc = _hvm_emulate_one(&ctxt, &hvm_ro_emulate_ops_mmio,
+                                   VIO_no_completion) )
     {
     case X86EMUL_UNHANDLEABLE:
     case X86EMUL_UNIMPLEMENTED:
-        hvm_dump_emulation_state(XENLOG_G_WARNING, "MMCFG", &ctxt, rc);
+        hvm_dump_emulation_state(XENLOG_G_WARNING, "RO MMIO", &ctxt, rc);
         break;
     case X86EMUL_EXCEPTION:
         hvm_inject_event(&ctxt.ctxt.event);
diff --git a/xen/arch/x86/include/asm/mm.h b/xen/arch/x86/include/asm/mm.h
index 6c7e66ee21ab..a1bc8cc27451 100644
--- a/xen/arch/x86/include/asm/mm.h
+++ b/xen/arch/x86/include/asm/mm.h
@@ -553,9 +553,6 @@ struct mmio_ro_emulate_ctxt {
 int cf_check mmio_ro_emulated_write(
     enum x86_segment seg, unsigned long offset, void *p_data,
     unsigned int bytes, struct x86_emulate_ctxt *ctxt);
-int cf_check mmcfg_intercept_write(
-    enum x86_segment seg, unsigned long offset, void *p_data,
-    unsigned int bytes, struct x86_emulate_ctxt *ctxt);
 
 int audit_adjust_pgtables(struct domain *d, int dir, int noisy);
 
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index b294497a140d..4fecd37aeca0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5200,37 +5200,6 @@ int cf_check mmio_ro_emulated_write(
     return X86EMUL_OKAY;
 }
 
-int cf_check mmcfg_intercept_write(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    struct mmio_ro_emulate_ctxt *mmio_ctxt = ctxt->data;
-
-    /*
-     * Only allow naturally-aligned stores no wider than 4 bytes to the
-     * original %cr2 address.
-     */
-    if ( ((bytes | offset) & (bytes - 1)) || bytes > 4 || !bytes ||
-         offset != mmio_ctxt->cr2 )
-    {
-        gdprintk(XENLOG_WARNING, "bad write (cr2=%lx, addr=%lx, bytes=%u)\n",
-                mmio_ctxt->cr2, offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    offset &= 0xfff;
-    if ( pci_conf_write_intercept(mmio_ctxt->seg, mmio_ctxt->bdf,
-                                  offset, bytes, p_data) >= 0 )
-        pci_mmcfg_write(mmio_ctxt->seg, PCI_BUS(mmio_ctxt->bdf),
-                        PCI_DEVFN(mmio_ctxt->bdf), offset, bytes,
-                        *(uint32_t *)p_data);
-
-    return X86EMUL_OKAY;
-}
-
 /*
  * For these PTE APIs, the caller must follow the alloc-map-unmap-free
  * lifecycle, which means explicitly mapping the PTE pages before accessing
diff --git a/xen/arch/x86/pv/ro-page-fault.c b/xen/arch/x86/pv/ro-page-fault.c
index 372187e9a096..11b01c479e43 100644
--- a/xen/arch/x86/pv/ro-page-fault.c
+++ b/xen/arch/x86/pv/ro-page-fault.c
@@ -298,6 +298,37 @@ static int ptwr_do_page_fault(struct x86_emulate_ctxt *ctxt,
  * fault handling for read-only MMIO pages
  */
 
+static int cf_check mmcfg_intercept_write(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    struct mmio_ro_emulate_ctxt *mmio_ctxt = ctxt->data;
+
+    /*
+     * Only allow naturally-aligned stores no wider than 4 bytes to the
+     * original %cr2 address.
+     */
+    if ( ((bytes | offset) & (bytes - 1)) || bytes > 4 || !bytes ||
+         offset != mmio_ctxt->cr2 )
+    {
+        gdprintk(XENLOG_WARNING, "bad write (cr2=%lx, addr=%lx, bytes=%u)\n",
+                mmio_ctxt->cr2, offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    offset &= 0xfff;
+    if ( pci_conf_write_intercept(mmio_ctxt->seg, mmio_ctxt->bdf,
+                                  offset, bytes, p_data) >= 0 )
+        pci_mmcfg_write(mmio_ctxt->seg, PCI_BUS(mmio_ctxt->bdf),
+                        PCI_DEVFN(mmio_ctxt->bdf), offset, bytes,
+                        *(uint32_t *)p_data);
+
+    return X86EMUL_OKAY;
+}
+
 static const struct x86_emulate_ops mmio_ro_emulate_ops = {
     .read       = x86emul_unhandleable_rw,
     .insn_fetch = ptwr_emulated_insn_fetch,
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/2] x86/hvm: fix write emulation of RO ranges
  2025-04-08  9:31 [PATCH 0/2] x86/hvm: fixes for RO MMIO emulation Roger Pau Monne
  2025-04-08  9:31 ` [PATCH 1/2] x86/hvm: remove unreachable MMCFG write emulation Roger Pau Monne
@ 2025-04-08  9:31 ` Roger Pau Monne
  2025-04-08 13:57   ` Jan Beulich
  2025-04-08 14:00   ` Andrew Cooper
  1 sibling, 2 replies; 19+ messages in thread
From: Roger Pau Monne @ 2025-04-08  9:31 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper

When running on AMD hardware in HVM mode the guest linear address (GLA)
will not be provided to hvm_emulate_one_mmio(), and instead is
unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
always report an error, as the fault GLA generated by the emulation of the
access won't be ~0.

Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
when the guest is PV.

Fixes: 33c19df9a5a0 ('x86/PCI: intercept accesses to RO MMIO from dom0s in HVM containers')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/mm.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 4fecd37aeca0..79836705c51e 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5187,7 +5187,12 @@ int cf_check mmio_ro_emulated_write(
 
     /* Only allow naturally-aligned stores at the original %cr2 address. */
     if ( ((bytes | offset) & (bytes - 1)) || !bytes ||
-         offset != mmio_ro_ctxt->cr2 )
+         /*
+          * HVM domains might not have a valid fault GLA in the context, as AMD
+          * NPT faults don't report the faulting GLA.  It's also possible for
+          * the fault to happen in non-paging modes.
+          */
+         (is_pv_domain(current->domain) && offset != mmio_ro_ctxt->cr2) )
     {
         gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
                 mmio_ro_ctxt->cr2, offset, bytes);
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] x86/hvm: remove unreachable MMCFG write emulation
  2025-04-08  9:31 ` [PATCH 1/2] x86/hvm: remove unreachable MMCFG write emulation Roger Pau Monne
@ 2025-04-08 13:42   ` Jan Beulich
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2025-04-08 13:42 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, xen-devel

On 08.04.2025 11:31, Roger Pau Monne wrote:
> The current implementation of PVH dom0 relies on vPCI to trap and handle
> accesses to the MMCFG area.  Previous implementation of PVH dom0 (v1)
> didn't have vPCI, and as a classic PV dom0, relied on the MMCFG range being
> RO.  As such hvm_emulate_one_mmio() had to special case write accesses to
> the MMCFG area.
> 
> With PVH dom0 using vPCI, and the MMCFG accesses being fully handled there,
> hvm_emulate_one_mmio() should never handle accesses to MMCFG, making the
> code effectively unreachable.
> 
> Remove it and leave an ASSERT to make sure MMCFG accesses never get into
> hvm_emulate_one_mmio().  As a result of the removal of one of the users of
> mmcfg_intercept_write(), the function can now be moved into the same
> translation unit where it's solely used, allowing it to be made static and
> effectively built only when PV support is enabled.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with one cosmetic suggestion:

> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -2858,12 +2858,6 @@ int hvm_emulate_one(
>  
>  int hvm_emulate_one_mmio(unsigned long mfn, unsigned long gla)
>  {
> -    static const struct x86_emulate_ops hvm_intercept_ops_mmcfg = {
> -        .read       = x86emul_unhandleable_rw,
> -        .insn_fetch = hvmemul_insn_fetch,
> -        .write      = mmcfg_intercept_write,
> -        .validate   = hvmemul_validate,
> -    };
>      static const struct x86_emulate_ops hvm_ro_emulate_ops_mmio = {
>          .read       = x86emul_unhandleable_rw,
>          .insn_fetch = hvmemul_insn_fetch,
> @@ -2872,28 +2866,28 @@ int hvm_emulate_one_mmio(unsigned long mfn, unsigned long gla)
>      };
>      struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { .cr2 = gla, .mfn = _mfn(mfn) };
>      struct hvm_emulate_ctxt ctxt;
> -    const struct x86_emulate_ops *ops;
>      unsigned int seg, bdf;
>      int rc;
>  
>      if ( pci_ro_mmcfg_decode(mfn, &seg, &bdf) )
>      {
> -        mmio_ro_ctxt.seg = seg;
> -        mmio_ro_ctxt.bdf = bdf;
> -        ops = &hvm_intercept_ops_mmcfg;
> +        /* Should be always handled by vPCI for PVH dom0. */
> +        gdprintk(XENLOG_ERR, "unhandled MMCFG access for %pp\n",
> +                 &PCI_SBDF(seg, bdf));
> +        ASSERT_UNREACHABLE();
> +        return X86EMUL_UNHANDLEABLE;
>      }
> -    else
> -        ops = &hvm_ro_emulate_ops_mmio;
>  
>      hvm_emulate_init_once(&ctxt, x86_insn_is_mem_write,
>                            guest_cpu_user_regs());
>      ctxt.ctxt.data = &mmio_ro_ctxt;
>  
> -    switch ( rc = _hvm_emulate_one(&ctxt, ops, VIO_no_completion) )
> +    switch ( rc = _hvm_emulate_one(&ctxt, &hvm_ro_emulate_ops_mmio,
> +                                   VIO_no_completion) )
>      {
>      case X86EMUL_UNHANDLEABLE:
>      case X86EMUL_UNIMPLEMENTED:
> -        hvm_dump_emulation_state(XENLOG_G_WARNING, "MMCFG", &ctxt, rc);
> +        hvm_dump_emulation_state(XENLOG_G_WARNING, "RO MMIO", &ctxt, rc);

The string doesn't need to be all capitals (see other uses of the function).
How about "r/o MMIO"?

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix write emulation of RO ranges
  2025-04-08  9:31 ` [PATCH 2/2] x86/hvm: fix write emulation of RO ranges Roger Pau Monne
@ 2025-04-08 13:57   ` Jan Beulich
  2025-04-09  9:07     ` Roger Pau Monné
  2025-04-09 14:27     ` Marek Marczykowski
  2025-04-08 14:00   ` Andrew Cooper
  1 sibling, 2 replies; 19+ messages in thread
From: Jan Beulich @ 2025-04-08 13:57 UTC (permalink / raw)
  To: Roger Pau Monne, Marek Marczykowski; +Cc: Andrew Cooper, xen-devel

On 08.04.2025 11:31, Roger Pau Monne wrote:
> When running on AMD hardware in HVM mode the guest linear address (GLA)
> will not be provided to hvm_emulate_one_mmio(), and instead is
> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
> always report an error, as the fault GLA generated by the emulation of the
> access won't be ~0.

Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
generally whenever .gla_valid isn't set).

> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
> when the guest is PV.

This narrows checking too much, imo. For VT-x we could continue to do so,
provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
the gla_valid flag visible there.

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5187,7 +5187,12 @@ int cf_check mmio_ro_emulated_write(
>  
>      /* Only allow naturally-aligned stores at the original %cr2 address. */
>      if ( ((bytes | offset) & (bytes - 1)) || !bytes ||
> -         offset != mmio_ro_ctxt->cr2 )
> +         /*
> +          * HVM domains might not have a valid fault GLA in the context, as AMD
> +          * NPT faults don't report the faulting GLA.  It's also possible for
> +          * the fault to happen in non-paging modes.
> +          */
> +         (is_pv_domain(current->domain) && offset != mmio_ro_ctxt->cr2) )
>      {
>          gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
>                  mmio_ro_ctxt->cr2, offset, bytes);

Is logging the supposed CR2 value useful then for cases where the GLA
isn't valid? I fear it might be more confusing than helpful.

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix write emulation of RO ranges
  2025-04-08  9:31 ` [PATCH 2/2] x86/hvm: fix write emulation of RO ranges Roger Pau Monne
  2025-04-08 13:57   ` Jan Beulich
@ 2025-04-08 14:00   ` Andrew Cooper
  2025-04-09  9:47     ` Roger Pau Monné
  1 sibling, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2025-04-08 14:00 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel; +Cc: Jan Beulich

On 08/04/2025 10:31 am, Roger Pau Monne wrote:
> When running on AMD hardware in HVM mode the guest linear address (GLA)
> will not be provided to hvm_emulate_one_mmio(), and instead is
> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
> always report an error, as the fault GLA generated by the emulation of the
> access won't be ~0.
>
> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
> when the guest is PV.
>
> Fixes: 33c19df9a5a0 ('x86/PCI: intercept accesses to RO MMIO from dom0s in HVM containers')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

I think there are several bugs here.

We do get %cr2 reliably for PV and Shadow guests.

Intel EPT may or may not give us GLA.  e.g. writes for pagetable A/D
updates don't get GLA.

Defaulting to ~0 isn't ok.  We need some kind of GLA-valid signal,
except for HAP guests, it isn't even the GLA we care about, it's the GPA
which identifies the MMIO region.

We shouldn't terminate the emulation if there's no GLA to check.  In the
case that we don't have a GLA, we should translate the memory operand
and cross-check the GPA.  We'll definitely have one of the two to hand.

~Andrew

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix write emulation of RO ranges
  2025-04-08 13:57   ` Jan Beulich
@ 2025-04-09  9:07     ` Roger Pau Monné
  2025-04-09 10:00       ` Jan Beulich
  2025-04-09 14:27     ` Marek Marczykowski
  1 sibling, 1 reply; 19+ messages in thread
From: Roger Pau Monné @ 2025-04-09  9:07 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Marek Marczykowski, Andrew Cooper, xen-devel

On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
> On 08.04.2025 11:31, Roger Pau Monne wrote:
> > When running on AMD hardware in HVM mode the guest linear address (GLA)
> > will not be provided to hvm_emulate_one_mmio(), and instead is
> > unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
> > always report an error, as the fault GLA generated by the emulation of the
> > access won't be ~0.
> 
> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
> generally whenever .gla_valid isn't set).

Oh, yes, good catch.  I didn't notice that one.  We should move all
those checks to use a paddr rather than a gla.

> > Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
> > when the guest is PV.
> 
> This narrows checking too much, imo. For VT-x we could continue to do so,
> provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
> the gla_valid flag visible there.

I don't think we should rely on the gla at all in
mmio_ro_emulated_write(), and instead just use the physical address.

> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -5187,7 +5187,12 @@ int cf_check mmio_ro_emulated_write(
> >  
> >      /* Only allow naturally-aligned stores at the original %cr2 address. */
> >      if ( ((bytes | offset) & (bytes - 1)) || !bytes ||
> > -         offset != mmio_ro_ctxt->cr2 )
> > +         /*
> > +          * HVM domains might not have a valid fault GLA in the context, as AMD
> > +          * NPT faults don't report the faulting GLA.  It's also possible for
> > +          * the fault to happen in non-paging modes.
> > +          */
> > +         (is_pv_domain(current->domain) && offset != mmio_ro_ctxt->cr2) )
> >      {
> >          gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
> >                  mmio_ro_ctxt->cr2, offset, bytes);
> 
> Is logging the supposed CR2 value useful then for cases where the GLA
> isn't valid? I fear it might be more confusing than helpful.

Since it was a debug printk I was kind of OK with leaving it, but
given the other comments I think I will have to rework
mmio_ro_emulated_write() so that it doesn't use the gla anymore.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix write emulation of RO ranges
  2025-04-08 14:00   ` Andrew Cooper
@ 2025-04-09  9:47     ` Roger Pau Monné
  0 siblings, 0 replies; 19+ messages in thread
From: Roger Pau Monné @ 2025-04-09  9:47 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jan Beulich

On Tue, Apr 08, 2025 at 03:00:28PM +0100, Andrew Cooper wrote:
> On 08/04/2025 10:31 am, Roger Pau Monne wrote:
> > When running on AMD hardware in HVM mode the guest linear address (GLA)
> > will not be provided to hvm_emulate_one_mmio(), and instead is
> > unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
> > always report an error, as the fault GLA generated by the emulation of the
> > access won't be ~0.
> >
> > Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
> > when the guest is PV.
> >
> > Fixes: 33c19df9a5a0 ('x86/PCI: intercept accesses to RO MMIO from dom0s in HVM containers')
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> I think there are several bugs here.
> 
> We do get %cr2 reliably for PV and Shadow guests.
> 
> Intel EPT may or may not give us GLA.  e.g. writes for pagetable A/D
> updates don't get GLA.
> 
> Defaulting to ~0 isn't ok.  We need some kind of GLA-valid signal,
> except for HAP guests, it isn't even the GLA we care about, it's the GPA
> which identifies the MMIO region.
> 
> We shouldn't terminate the emulation if there's no GLA to check.  In the
> case that we don't have a GLA, we should translate the memory operand
> and cross-check the GPA.  We'll definitely have one of the two to hand.

I guess I will have to switch to a more complex approach for HVM and
use logic similar to hvmemul_write() to figure out the mfn, and
compare it with the fault provided one.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix write emulation of RO ranges
  2025-04-09  9:07     ` Roger Pau Monné
@ 2025-04-09 10:00       ` Jan Beulich
  2025-04-09 10:39         ` Roger Pau Monné
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2025-04-09 10:00 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Marek Marczykowski, Andrew Cooper, xen-devel

On 09.04.2025 11:07, Roger Pau Monné wrote:
> On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
>> On 08.04.2025 11:31, Roger Pau Monne wrote:
>>> When running on AMD hardware in HVM mode the guest linear address (GLA)
>>> will not be provided to hvm_emulate_one_mmio(), and instead is
>>> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
>>> always report an error, as the fault GLA generated by the emulation of the
>>> access won't be ~0.
>>
>> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
>> generally whenever .gla_valid isn't set).
> 
> Oh, yes, good catch.  I didn't notice that one.  We should move all
> those checks to use a paddr rather than a gla.

Really that function could just be passed the offset into the page.

>>> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
>>> when the guest is PV.
>>
>> This narrows checking too much, imo. For VT-x we could continue to do so,
>> provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
>> the gla_valid flag visible there.
> 
> I don't think we should rely on the gla at all in
> mmio_ro_emulated_write(), and instead just use the physical address.

But you can't validate a physical address against a CR2 value. And I view
this validation as meaningful, to guard (best effort, but still) against
e.g. insn re-writing under our feet.

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix write emulation of RO ranges
  2025-04-09 10:00       ` Jan Beulich
@ 2025-04-09 10:39         ` Roger Pau Monné
  2025-04-09 12:59           ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: Roger Pau Monné @ 2025-04-09 10:39 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Marek Marczykowski, Andrew Cooper, xen-devel

On Wed, Apr 09, 2025 at 12:00:16PM +0200, Jan Beulich wrote:
> On 09.04.2025 11:07, Roger Pau Monné wrote:
> > On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
> >> On 08.04.2025 11:31, Roger Pau Monne wrote:
> >>> When running on AMD hardware in HVM mode the guest linear address (GLA)
> >>> will not be provided to hvm_emulate_one_mmio(), and instead is
> >>> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
> >>> always report an error, as the fault GLA generated by the emulation of the
> >>> access won't be ~0.
> >>
> >> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
> >> generally whenever .gla_valid isn't set).
> > 
> > Oh, yes, good catch.  I didn't notice that one.  We should move all
> > those checks to use a paddr rather than a gla.
> 
> Really that function could just be passed the offset into the page.
> 
> >>> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
> >>> when the guest is PV.
> >>
> >> This narrows checking too much, imo. For VT-x we could continue to do so,
> >> provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
> >> the gla_valid flag visible there.
> > 
> > I don't think we should rely on the gla at all in
> > mmio_ro_emulated_write(), and instead just use the physical address.
> 
> But you can't validate a physical address against a CR2 value. And I view
> this validation as meaningful, to guard (best effort, but still) against
> e.g. insn re-writing under our feet.

But we have the mfn in mmio_ro_ctxt, and could possibly use that to
validate?  I could expand the context to include the offset also, so
that we could fully validate it.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix write emulation of RO ranges
  2025-04-09 10:39         ` Roger Pau Monné
@ 2025-04-09 12:59           ` Jan Beulich
  2025-04-09 13:33             ` Roger Pau Monné
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2025-04-09 12:59 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Marek Marczykowski, Andrew Cooper, xen-devel

On 09.04.2025 12:39, Roger Pau Monné wrote:
> On Wed, Apr 09, 2025 at 12:00:16PM +0200, Jan Beulich wrote:
>> On 09.04.2025 11:07, Roger Pau Monné wrote:
>>> On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
>>>> On 08.04.2025 11:31, Roger Pau Monne wrote:
>>>>> When running on AMD hardware in HVM mode the guest linear address (GLA)
>>>>> will not be provided to hvm_emulate_one_mmio(), and instead is
>>>>> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
>>>>> always report an error, as the fault GLA generated by the emulation of the
>>>>> access won't be ~0.
>>>>
>>>> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
>>>> generally whenever .gla_valid isn't set).
>>>
>>> Oh, yes, good catch.  I didn't notice that one.  We should move all
>>> those checks to use a paddr rather than a gla.
>>
>> Really that function could just be passed the offset into the page.
>>
>>>>> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
>>>>> when the guest is PV.
>>>>
>>>> This narrows checking too much, imo. For VT-x we could continue to do so,
>>>> provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
>>>> the gla_valid flag visible there.
>>>
>>> I don't think we should rely on the gla at all in
>>> mmio_ro_emulated_write(), and instead just use the physical address.
>>
>> But you can't validate a physical address against a CR2 value. And I view
>> this validation as meaningful, to guard (best effort, but still) against
>> e.g. insn re-writing under our feet.
> 
> But we have the mfn in mmio_ro_ctxt, and could possibly use that to
> validate?  I could expand the context to include the offset also, so
> that we could fully validate it.

How would you use the MFN to validate against the VA in CR2?

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix write emulation of RO ranges
  2025-04-09 12:59           ` Jan Beulich
@ 2025-04-09 13:33             ` Roger Pau Monné
  2025-04-09 13:50               ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: Roger Pau Monné @ 2025-04-09 13:33 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Marek Marczykowski, Andrew Cooper, xen-devel

On Wed, Apr 09, 2025 at 02:59:45PM +0200, Jan Beulich wrote:
> On 09.04.2025 12:39, Roger Pau Monné wrote:
> > On Wed, Apr 09, 2025 at 12:00:16PM +0200, Jan Beulich wrote:
> >> On 09.04.2025 11:07, Roger Pau Monné wrote:
> >>> On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
> >>>> On 08.04.2025 11:31, Roger Pau Monne wrote:
> >>>>> When running on AMD hardware in HVM mode the guest linear address (GLA)
> >>>>> will not be provided to hvm_emulate_one_mmio(), and instead is
> >>>>> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
> >>>>> always report an error, as the fault GLA generated by the emulation of the
> >>>>> access won't be ~0.
> >>>>
> >>>> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
> >>>> generally whenever .gla_valid isn't set).
> >>>
> >>> Oh, yes, good catch.  I didn't notice that one.  We should move all
> >>> those checks to use a paddr rather than a gla.
> >>
> >> Really that function could just be passed the offset into the page.
> >>
> >>>>> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
> >>>>> when the guest is PV.
> >>>>
> >>>> This narrows checking too much, imo. For VT-x we could continue to do so,
> >>>> provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
> >>>> the gla_valid flag visible there.
> >>>
> >>> I don't think we should rely on the gla at all in
> >>> mmio_ro_emulated_write(), and instead just use the physical address.
> >>
> >> But you can't validate a physical address against a CR2 value. And I view
> >> this validation as meaningful, to guard (best effort, but still) against
> >> e.g. insn re-writing under our feet.
> > 
> > But we have the mfn in mmio_ro_ctxt, and could possibly use that to
> > validate?  I could expand the context to include the offset also, so
> > that we could fully validate it.
> 
> How would you use the MFN to validate against the VA in CR2?

I would use hvmemul_virtual_to_linear() and hvm_translate_get_page()
to get the underlying mfn of the linear address.  But maybe there's a
part of this that I'm missing, I've certainly haven't tried to
implement any of it.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix write emulation of RO ranges
  2025-04-09 13:33             ` Roger Pau Monné
@ 2025-04-09 13:50               ` Jan Beulich
  2025-04-09 14:01                 ` Roger Pau Monné
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2025-04-09 13:50 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Marek Marczykowski, Andrew Cooper, xen-devel

On 09.04.2025 15:33, Roger Pau Monné wrote:
> On Wed, Apr 09, 2025 at 02:59:45PM +0200, Jan Beulich wrote:
>> On 09.04.2025 12:39, Roger Pau Monné wrote:
>>> On Wed, Apr 09, 2025 at 12:00:16PM +0200, Jan Beulich wrote:
>>>> On 09.04.2025 11:07, Roger Pau Monné wrote:
>>>>> On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
>>>>>> On 08.04.2025 11:31, Roger Pau Monne wrote:
>>>>>>> When running on AMD hardware in HVM mode the guest linear address (GLA)
>>>>>>> will not be provided to hvm_emulate_one_mmio(), and instead is
>>>>>>> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
>>>>>>> always report an error, as the fault GLA generated by the emulation of the
>>>>>>> access won't be ~0.
>>>>>>
>>>>>> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
>>>>>> generally whenever .gla_valid isn't set).
>>>>>
>>>>> Oh, yes, good catch.  I didn't notice that one.  We should move all
>>>>> those checks to use a paddr rather than a gla.
>>>>
>>>> Really that function could just be passed the offset into the page.
>>>>
>>>>>>> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
>>>>>>> when the guest is PV.
>>>>>>
>>>>>> This narrows checking too much, imo. For VT-x we could continue to do so,
>>>>>> provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
>>>>>> the gla_valid flag visible there.
>>>>>
>>>>> I don't think we should rely on the gla at all in
>>>>> mmio_ro_emulated_write(), and instead just use the physical address.
>>>>
>>>> But you can't validate a physical address against a CR2 value. And I view
>>>> this validation as meaningful, to guard (best effort, but still) against
>>>> e.g. insn re-writing under our feet.
>>>
>>> But we have the mfn in mmio_ro_ctxt, and could possibly use that to
>>> validate?  I could expand the context to include the offset also, so
>>> that we could fully validate it.
>>
>> How would you use the MFN to validate against the VA in CR2?
> 
> I would use hvmemul_virtual_to_linear()

If you mean to use the CR2 as input, you wouldn't need this. I said VA in
my earlier reply, yes, but strictly speaking that's a linear address.

> and hvm_translate_get_page()
> to get the underlying mfn of the linear address.  But maybe there's a
> part of this that I'm missing, I've certainly haven't tried to
> implement any of it.

Hmm, I see. I didn't think of doing it this way round. There's certainly
at least one caveat with this approach: Multiple linear addresses can all
map to the same GFN and hence MFN. Checking against the original linear
address (when available) doesn't have such an issue.

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix write emulation of RO ranges
  2025-04-09 13:50               ` Jan Beulich
@ 2025-04-09 14:01                 ` Roger Pau Monné
  2025-04-09 14:08                   ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: Roger Pau Monné @ 2025-04-09 14:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Marek Marczykowski, Andrew Cooper, xen-devel

On Wed, Apr 09, 2025 at 03:50:13PM +0200, Jan Beulich wrote:
> On 09.04.2025 15:33, Roger Pau Monné wrote:
> > On Wed, Apr 09, 2025 at 02:59:45PM +0200, Jan Beulich wrote:
> >> On 09.04.2025 12:39, Roger Pau Monné wrote:
> >>> On Wed, Apr 09, 2025 at 12:00:16PM +0200, Jan Beulich wrote:
> >>>> On 09.04.2025 11:07, Roger Pau Monné wrote:
> >>>>> On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
> >>>>>> On 08.04.2025 11:31, Roger Pau Monne wrote:
> >>>>>>> When running on AMD hardware in HVM mode the guest linear address (GLA)
> >>>>>>> will not be provided to hvm_emulate_one_mmio(), and instead is
> >>>>>>> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
> >>>>>>> always report an error, as the fault GLA generated by the emulation of the
> >>>>>>> access won't be ~0.
> >>>>>>
> >>>>>> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
> >>>>>> generally whenever .gla_valid isn't set).
> >>>>>
> >>>>> Oh, yes, good catch.  I didn't notice that one.  We should move all
> >>>>> those checks to use a paddr rather than a gla.
> >>>>
> >>>> Really that function could just be passed the offset into the page.
> >>>>
> >>>>>>> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
> >>>>>>> when the guest is PV.
> >>>>>>
> >>>>>> This narrows checking too much, imo. For VT-x we could continue to do so,
> >>>>>> provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
> >>>>>> the gla_valid flag visible there.
> >>>>>
> >>>>> I don't think we should rely on the gla at all in
> >>>>> mmio_ro_emulated_write(), and instead just use the physical address.
> >>>>
> >>>> But you can't validate a physical address against a CR2 value. And I view
> >>>> this validation as meaningful, to guard (best effort, but still) against
> >>>> e.g. insn re-writing under our feet.
> >>>
> >>> But we have the mfn in mmio_ro_ctxt, and could possibly use that to
> >>> validate?  I could expand the context to include the offset also, so
> >>> that we could fully validate it.
> >>
> >> How would you use the MFN to validate against the VA in CR2?
> > 
> > I would use hvmemul_virtual_to_linear()
> 
> If you mean to use the CR2 as input, you wouldn't need this. I said VA in
> my earlier reply, yes, but strictly speaking that's a linear address.

I was thinking about using the segment and offset parameters of the
mmio_ro_emulated_write() call.

> > and hvm_translate_get_page()
> > to get the underlying mfn of the linear address.  But maybe there's a
> > part of this that I'm missing, I've certainly haven't tried to
> > implement any of it.
> 
> Hmm, I see. I didn't think of doing it this way round. There's certainly
> at least one caveat with this approach: Multiple linear addresses can all
> map to the same GFN and hence MFN. Checking against the original linear
> address (when available) doesn't have such an issue.

I see... Yet for AMD that address is not uniformly available as part
of the vmexit information?  As I understand the checks done in
mmio_ro_emulated_write() are to ensure correctness, but carrying the
access even when the %cr2 check fail wouldn't cause issues to Xen
itself?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix write emulation of RO ranges
  2025-04-09 14:01                 ` Roger Pau Monné
@ 2025-04-09 14:08                   ` Jan Beulich
  2025-04-09 15:33                     ` Roger Pau Monné
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2025-04-09 14:08 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Marek Marczykowski, Andrew Cooper, xen-devel

On 09.04.2025 16:01, Roger Pau Monné wrote:
> On Wed, Apr 09, 2025 at 03:50:13PM +0200, Jan Beulich wrote:
>> On 09.04.2025 15:33, Roger Pau Monné wrote:
>>> On Wed, Apr 09, 2025 at 02:59:45PM +0200, Jan Beulich wrote:
>>>> On 09.04.2025 12:39, Roger Pau Monné wrote:
>>>>> On Wed, Apr 09, 2025 at 12:00:16PM +0200, Jan Beulich wrote:
>>>>>> On 09.04.2025 11:07, Roger Pau Monné wrote:
>>>>>>> On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
>>>>>>>> On 08.04.2025 11:31, Roger Pau Monne wrote:
>>>>>>>>> When running on AMD hardware in HVM mode the guest linear address (GLA)
>>>>>>>>> will not be provided to hvm_emulate_one_mmio(), and instead is
>>>>>>>>> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
>>>>>>>>> always report an error, as the fault GLA generated by the emulation of the
>>>>>>>>> access won't be ~0.
>>>>>>>>
>>>>>>>> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
>>>>>>>> generally whenever .gla_valid isn't set).
>>>>>>>
>>>>>>> Oh, yes, good catch.  I didn't notice that one.  We should move all
>>>>>>> those checks to use a paddr rather than a gla.
>>>>>>
>>>>>> Really that function could just be passed the offset into the page.
>>>>>>
>>>>>>>>> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
>>>>>>>>> when the guest is PV.
>>>>>>>>
>>>>>>>> This narrows checking too much, imo. For VT-x we could continue to do so,
>>>>>>>> provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
>>>>>>>> the gla_valid flag visible there.
>>>>>>>
>>>>>>> I don't think we should rely on the gla at all in
>>>>>>> mmio_ro_emulated_write(), and instead just use the physical address.
>>>>>>
>>>>>> But you can't validate a physical address against a CR2 value. And I view
>>>>>> this validation as meaningful, to guard (best effort, but still) against
>>>>>> e.g. insn re-writing under our feet.
>>>>>
>>>>> But we have the mfn in mmio_ro_ctxt, and could possibly use that to
>>>>> validate?  I could expand the context to include the offset also, so
>>>>> that we could fully validate it.
>>>>
>>>> How would you use the MFN to validate against the VA in CR2?
>>>
>>> I would use hvmemul_virtual_to_linear()
>>
>> If you mean to use the CR2 as input, you wouldn't need this. I said VA in
>> my earlier reply, yes, but strictly speaking that's a linear address.
> 
> I was thinking about using the segment and offset parameters of the
> mmio_ro_emulated_write() call.
> 
>>> and hvm_translate_get_page()
>>> to get the underlying mfn of the linear address.  But maybe there's a
>>> part of this that I'm missing, I've certainly haven't tried to
>>> implement any of it.
>>
>> Hmm, I see. I didn't think of doing it this way round. There's certainly
>> at least one caveat with this approach: Multiple linear addresses can all
>> map to the same GFN and hence MFN. Checking against the original linear
>> address (when available) doesn't have such an issue.
> 
> I see... Yet for AMD that address is not uniformly available as part
> of the vmexit information?

Even stronger, I thought: It's uniformly not available.

>  As I understand the checks done in
> mmio_ro_emulated_write() are to ensure correctness, but carrying the
> access even when the %cr2 check fail wouldn't cause issues to Xen
> itself?

Well, "wouldn't" is too strong for my taste, "shouldn't" would fit. The
checking is there to avoid guests playing games. Whether that prevents
merely in-guest just-bugs or actual XSAs we can't know until we find a
case where the game playing might make us do something wrong. I expect
it's unlikely for Xen itself to be affected. But an in-guest privilege
escalation would already be bad enough.

So why don't we do the linear address check as we do today (provided a
linear address is available), and only use the alternative approach when
no linear address is available?

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix write emulation of RO ranges
  2025-04-08 13:57   ` Jan Beulich
  2025-04-09  9:07     ` Roger Pau Monné
@ 2025-04-09 14:27     ` Marek Marczykowski
  1 sibling, 0 replies; 19+ messages in thread
From: Marek Marczykowski @ 2025-04-09 14:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Roger Pau Monne, Andrew Cooper, xen-devel

[-- Attachment #1: Type: text/plain, Size: 1203 bytes --]

On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
> On 08.04.2025 11:31, Roger Pau Monne wrote:
> > When running on AMD hardware in HVM mode the guest linear address (GLA)
> > will not be provided to hvm_emulate_one_mmio(), and instead is
> > unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
> > always report an error, as the fault GLA generated by the emulation of the
> > access won't be ~0.
> 
> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
> generally whenever .gla_valid isn't set).

That may explain issues I see when using XHCI console on AMD (I can
crash the whole thing using sequence of driver binding/unbinding in
dom0). That's actually the hw12 runner in the other series, but tests
that are included in gitlab do not trigger the issue (fortunately?). But
also, it may be a different issue, as it affects PV dom0 too...

Anyway, I can probably test a patch if subpage_mmio_write_accept() works
as intended (I'll need to check if that path is exercised on AMD too as
it depends on xhci caps layout - it was definitely used on Intel).

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix write emulation of RO ranges
  2025-04-09 14:08                   ` Jan Beulich
@ 2025-04-09 15:33                     ` Roger Pau Monné
  2025-04-10  6:27                       ` Jan Beulich
  0 siblings, 1 reply; 19+ messages in thread
From: Roger Pau Monné @ 2025-04-09 15:33 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Marek Marczykowski, Andrew Cooper, xen-devel

On Wed, Apr 09, 2025 at 04:08:47PM +0200, Jan Beulich wrote:
> On 09.04.2025 16:01, Roger Pau Monné wrote:
> > On Wed, Apr 09, 2025 at 03:50:13PM +0200, Jan Beulich wrote:
> >> On 09.04.2025 15:33, Roger Pau Monné wrote:
> >>> On Wed, Apr 09, 2025 at 02:59:45PM +0200, Jan Beulich wrote:
> >>>> On 09.04.2025 12:39, Roger Pau Monné wrote:
> >>>>> On Wed, Apr 09, 2025 at 12:00:16PM +0200, Jan Beulich wrote:
> >>>>>> On 09.04.2025 11:07, Roger Pau Monné wrote:
> >>>>>>> On Tue, Apr 08, 2025 at 03:57:17PM +0200, Jan Beulich wrote:
> >>>>>>>> On 08.04.2025 11:31, Roger Pau Monne wrote:
> >>>>>>>>> When running on AMD hardware in HVM mode the guest linear address (GLA)
> >>>>>>>>> will not be provided to hvm_emulate_one_mmio(), and instead is
> >>>>>>>>> unconditionally set of ~0.  As a consequence mmio_ro_emulated_write() will
> >>>>>>>>> always report an error, as the fault GLA generated by the emulation of the
> >>>>>>>>> access won't be ~0.
> >>>>>>>>
> >>>>>>>> Which means subpage_mmio_write_accept() is flawed, too, on AMD (or more
> >>>>>>>> generally whenever .gla_valid isn't set).
> >>>>>>>
> >>>>>>> Oh, yes, good catch.  I didn't notice that one.  We should move all
> >>>>>>> those checks to use a paddr rather than a gla.
> >>>>>>
> >>>>>> Really that function could just be passed the offset into the page.
> >>>>>>
> >>>>>>>>> Fix this by only checking for the fault GLA in mmio_ro_emulated_write()
> >>>>>>>>> when the guest is PV.
> >>>>>>>>
> >>>>>>>> This narrows checking too much, imo. For VT-x we could continue to do so,
> >>>>>>>> provided we pass e.g. npfec down into hvm_emulate_one_mmio(), i.e. make
> >>>>>>>> the gla_valid flag visible there.
> >>>>>>>
> >>>>>>> I don't think we should rely on the gla at all in
> >>>>>>> mmio_ro_emulated_write(), and instead just use the physical address.
> >>>>>>
> >>>>>> But you can't validate a physical address against a CR2 value. And I view
> >>>>>> this validation as meaningful, to guard (best effort, but still) against
> >>>>>> e.g. insn re-writing under our feet.
> >>>>>
> >>>>> But we have the mfn in mmio_ro_ctxt, and could possibly use that to
> >>>>> validate?  I could expand the context to include the offset also, so
> >>>>> that we could fully validate it.
> >>>>
> >>>> How would you use the MFN to validate against the VA in CR2?
> >>>
> >>> I would use hvmemul_virtual_to_linear()
> >>
> >> If you mean to use the CR2 as input, you wouldn't need this. I said VA in
> >> my earlier reply, yes, but strictly speaking that's a linear address.
> > 
> > I was thinking about using the segment and offset parameters of the
> > mmio_ro_emulated_write() call.
> > 
> >>> and hvm_translate_get_page()
> >>> to get the underlying mfn of the linear address.  But maybe there's a
> >>> part of this that I'm missing, I've certainly haven't tried to
> >>> implement any of it.
> >>
> >> Hmm, I see. I didn't think of doing it this way round. There's certainly
> >> at least one caveat with this approach: Multiple linear addresses can all
> >> map to the same GFN and hence MFN. Checking against the original linear
> >> address (when available) doesn't have such an issue.
> > 
> > I see... Yet for AMD that address is not uniformly available as part
> > of the vmexit information?
> 
> Even stronger, I thought: It's uniformly not available.

Oh yes, that's what I meant to say but got the words the other way
around.

> >  As I understand the checks done in
> > mmio_ro_emulated_write() are to ensure correctness, but carrying the
> > access even when the %cr2 check fail wouldn't cause issues to Xen
> > itself?
> 
> Well, "wouldn't" is too strong for my taste, "shouldn't" would fit. The
> checking is there to avoid guests playing games. Whether that prevents
> merely in-guest just-bugs or actual XSAs we can't know until we find a
> case where the game playing might make us do something wrong. I expect
> it's unlikely for Xen itself to be affected. But an in-guest privilege
> escalation would already be bad enough.

I see.  That was kind of my understanding of the checks.  At least on
HVM it feels a bit weird to handle r/o regions this way.  It would IMO
be more natural to use an hvm_io_handler, but that's maybe because I'm
more familiar with those.

And in that regard, hvm_io_handler don't seem to do any of the extra
checking that mmio_ro_emulated_write() does with the %cr2, but maybe
that's done by some higher layer?  AFAICT that would ultimately get
called by hvmemul_read(), and there are no checks there at all.

> So why don't we do the linear address check as we do today (provided a
> linear address is available), and only use the alternative approach when
> no linear address is available?

I can try to do that, albeit as said above, at least for HVM guests
that checking of %cr2 seems to be quite inconsistent, as
hvmemul_{read,write}() won't do any of it.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix write emulation of RO ranges
  2025-04-09 15:33                     ` Roger Pau Monné
@ 2025-04-10  6:27                       ` Jan Beulich
  2025-04-10  7:02                         ` Roger Pau Monné
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2025-04-10  6:27 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Marek Marczykowski, Andrew Cooper, xen-devel

On 09.04.2025 17:33, Roger Pau Monné wrote:
> On Wed, Apr 09, 2025 at 04:08:47PM +0200, Jan Beulich wrote:
>> On 09.04.2025 16:01, Roger Pau Monné wrote:
>>>  As I understand the checks done in
>>> mmio_ro_emulated_write() are to ensure correctness, but carrying the
>>> access even when the %cr2 check fail wouldn't cause issues to Xen
>>> itself?
>>
>> Well, "wouldn't" is too strong for my taste, "shouldn't" would fit. The
>> checking is there to avoid guests playing games. Whether that prevents
>> merely in-guest just-bugs or actual XSAs we can't know until we find a
>> case where the game playing might make us do something wrong. I expect
>> it's unlikely for Xen itself to be affected. But an in-guest privilege
>> escalation would already be bad enough.
> 
> I see.  That was kind of my understanding of the checks.  At least on
> HVM it feels a bit weird to handle r/o regions this way.  It would IMO
> be more natural to use an hvm_io_handler, but that's maybe because I'm
> more familiar with those.

I guess this would be an option; I assume it's the way it is because PVHv1
inherited it from PV, and PVHv2 inherited it from PVHv1.

> And in that regard, hvm_io_handler don't seem to do any of the extra
> checking that mmio_ro_emulated_write() does with the %cr2, but maybe
> that's done by some higher layer?  AFAICT that would ultimately get
> called by hvmemul_read(), and there are no checks there at all.

That more general framework isn't page-fault specific, and hence there's
no CR2 recorded to check against.

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix write emulation of RO ranges
  2025-04-10  6:27                       ` Jan Beulich
@ 2025-04-10  7:02                         ` Roger Pau Monné
  0 siblings, 0 replies; 19+ messages in thread
From: Roger Pau Monné @ 2025-04-10  7:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Marek Marczykowski, Andrew Cooper, xen-devel

On Thu, Apr 10, 2025 at 08:27:49AM +0200, Jan Beulich wrote:
> On 09.04.2025 17:33, Roger Pau Monné wrote:
> > On Wed, Apr 09, 2025 at 04:08:47PM +0200, Jan Beulich wrote:
> >> On 09.04.2025 16:01, Roger Pau Monné wrote:
> >>>  As I understand the checks done in
> >>> mmio_ro_emulated_write() are to ensure correctness, but carrying the
> >>> access even when the %cr2 check fail wouldn't cause issues to Xen
> >>> itself?
> >>
> >> Well, "wouldn't" is too strong for my taste, "shouldn't" would fit. The
> >> checking is there to avoid guests playing games. Whether that prevents
> >> merely in-guest just-bugs or actual XSAs we can't know until we find a
> >> case where the game playing might make us do something wrong. I expect
> >> it's unlikely for Xen itself to be affected. But an in-guest privilege
> >> escalation would already be bad enough.
> > 
> > I see.  That was kind of my understanding of the checks.  At least on
> > HVM it feels a bit weird to handle r/o regions this way.  It would IMO
> > be more natural to use an hvm_io_handler, but that's maybe because I'm
> > more familiar with those.
> 
> I guess this would be an option; I assume it's the way it is because PVHv1
> inherited it from PV, and PVHv2 inherited it from PVHv1.

I have a draft with this approach, and it seems quite better, as it
allows to get rid of hvm_emulate_one_mmio() and the special casing
done in hvm_hap_nested_page_fault().

Thanks, Roger.


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2025-04-10  7:03 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-08  9:31 [PATCH 0/2] x86/hvm: fixes for RO MMIO emulation Roger Pau Monne
2025-04-08  9:31 ` [PATCH 1/2] x86/hvm: remove unreachable MMCFG write emulation Roger Pau Monne
2025-04-08 13:42   ` Jan Beulich
2025-04-08  9:31 ` [PATCH 2/2] x86/hvm: fix write emulation of RO ranges Roger Pau Monne
2025-04-08 13:57   ` Jan Beulich
2025-04-09  9:07     ` Roger Pau Monné
2025-04-09 10:00       ` Jan Beulich
2025-04-09 10:39         ` Roger Pau Monné
2025-04-09 12:59           ` Jan Beulich
2025-04-09 13:33             ` Roger Pau Monné
2025-04-09 13:50               ` Jan Beulich
2025-04-09 14:01                 ` Roger Pau Monné
2025-04-09 14:08                   ` Jan Beulich
2025-04-09 15:33                     ` Roger Pau Monné
2025-04-10  6:27                       ` Jan Beulich
2025-04-10  7:02                         ` Roger Pau Monné
2025-04-09 14:27     ` Marek Marczykowski
2025-04-08 14:00   ` Andrew Cooper
2025-04-09  9:47     ` Roger Pau Monné

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.