* [PATCH 0/2] x86/vlapic: Fixes to APIC_ESR handling
@ 2024-11-28  0:47 Andrew Cooper
  2024-11-28  0:47 ` [PATCH 1/2] x86/vlapic: Fix handling of writes to APIC_ESR Andrew Cooper
  2024-11-28  0:47 ` [PATCH 2/2] x86/vlapic: Drop vlapic->esr_lock Andrew Cooper
  0 siblings, 2 replies; 15+ messages in thread
From: Andrew Cooper @ 2024-11-28  0:47 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné

Found because of yesterday's Pentium errata fun, and trying to
complete/publish the XSA-462 PoC.

Andrew Cooper (2):
  x86/vlapic: Fix handling of writes to APIC_ESR
  x86/vlapic: Drop vlapic->esr_lock

 xen/arch/x86/hvm/vlapic.c              | 27 +++++++++++---------------
 xen/arch/x86/include/asm/hvm/vlapic.h  |  1 -
 xen/include/public/arch-x86/hvm/save.h |  1 +
 3 files changed, 12 insertions(+), 17 deletions(-)

-- 
2.39.5




* [PATCH 1/2] x86/vlapic: Fix handling of writes to APIC_ESR
  2024-11-28  0:47 [PATCH 0/2] x86/vlapic: Fixes to APIC_ESR handling Andrew Cooper
@ 2024-11-28  0:47 ` Andrew Cooper
  2024-11-28  9:03   ` Roger Pau Monné
  2024-11-28 10:31   ` Jan Beulich
  2024-11-28  0:47 ` [PATCH 2/2] x86/vlapic: Drop vlapic->esr_lock Andrew Cooper
  1 sibling, 2 replies; 15+ messages in thread
From: Andrew Cooper @ 2024-11-28  0:47 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné

Xen currently presents APIC_ESR to guests as a simple read/write register.

This is incorrect.  The SDM states:

  The ESR is a write/read register. Before attempt to read from the ESR,
  software should first write to it. (The value written does not affect the
  values read subsequently; only zero may be written in x2APIC mode.) This
  write clears any previously logged errors and updates the ESR with any
  errors detected since the last write to the ESR. This write also rearms the
  APIC error interrupt triggering mechanism.

Introduce a new pending_esr field in hvm_hw_lapic.  Update vlapic_error() to
accumulate errors here, and extend vlapic_reg_write() to discard the written
value, and instead transfer pending_esr into APIC_ESR.  Reads are still as
before.

Importantly, this means that a guest no longer destroys the ESR value it's
looking for in the LVTERR handler when following the SDM instructions.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

Slightly RFC.  This collides with Alejandro's patch which adds the apic_id
field to hvm_hw_lapic too.  However, this is a far more obvious backport
candidate.

lapic_check_hidden() might in principle want to audit this field, but it's not
clear what to check.  While prior Xen will never have produced it in the
migration stream, Intel APIC-V will set APIC_ESR_ILLREGA above and beyond what
Xen will currently emulate.

I've checked that this does behave correctly under Intel APIC-V.  Writes to
APIC_ESR drop the written value into the backing page then take a trap-style
EXIT_REASON_APIC_WRITE which allows us to sample/latch properly.
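
For illustration, the write-to-latch semantics described in the commit message
can be modelled in isolation.  This is a minimal sketch and not the Xen code:
struct esr_model and the model_*() helpers are hypothetical names, and 0x40 is
used as the "receive illegal vector" bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the vLAPIC state touched by this patch. */
struct esr_model {
    uint32_t esr;          /* value the guest reads back from APIC_ESR */
    uint32_t pending_esr;  /* errors accumulated since the last ESR write */
};

/* Record an error.  It only becomes visible after the next ESR write. */
static void model_error(struct esr_model *m, uint32_t errmask)
{
    m->pending_esr |= errmask;
}

/* Guest write to APIC_ESR: the written value is discarded. */
static void model_write_esr(struct esr_model *m, uint32_t val)
{
    (void)val;
    m->esr = m->pending_esr;   /* latch what has accumulated */
    m->pending_esr = 0;        /* start accumulating afresh */
}

/* Guest read of APIC_ESR: returns the latched value, unchanged. */
static uint32_t model_read_esr(const struct esr_model *m)
{
    return m->esr;
}

/* Walk through the write-then-read sequence an LVTERR handler would use. */
static void model_demo(void)
{
    struct esr_model m = { 0, 0 };

    model_error(&m, 0x40);
    assert(model_read_esr(&m) == 0);     /* not visible before a write */
    model_write_esr(&m, 0);
    assert(model_read_esr(&m) == 0x40);  /* latched by the write */
    model_write_esr(&m, 0);
    assert(model_read_esr(&m) == 0);     /* nothing new since last write */
}
```

The point of the split is visible in model_demo(): the handler's write no
longer wipes out the error it is about to read.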
---
 xen/arch/x86/hvm/vlapic.c              | 17 +++++++++++++++--
 xen/include/public/arch-x86/hvm/save.h |  1 +
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 3363926b487b..98394ed26a52 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -108,7 +108,7 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
     uint32_t esr;
 
     spin_lock_irqsave(&vlapic->esr_lock, flags);
-    esr = vlapic_get_reg(vlapic, APIC_ESR);
+    esr = vlapic->hw.pending_esr;
     if ( (esr & errmask) != errmask )
     {
         uint32_t lvterr = vlapic_get_reg(vlapic, APIC_LVTERR);
@@ -127,7 +127,7 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
                  errmask |= APIC_ESR_RECVILL;
         }
 
-        vlapic_set_reg(vlapic, APIC_ESR, esr | errmask);
+        vlapic->hw.pending_esr |= errmask;
 
         if ( inj )
             vlapic_set_irq(vlapic, lvterr & APIC_VECTOR_MASK, 0);
@@ -802,6 +802,19 @@ void vlapic_reg_write(struct vcpu *v, unsigned int reg, uint32_t val)
         vlapic_set_reg(vlapic, APIC_ID, val);
         break;
 
+    case APIC_ESR:
+    {
+        unsigned long flags;
+
+        spin_lock_irqsave(&vlapic->esr_lock, flags);
+        val = vlapic->hw.pending_esr;
+        vlapic->hw.pending_esr = 0;
+        spin_unlock_irqrestore(&vlapic->esr_lock, flags);
+
+        vlapic_set_reg(vlapic, APIC_ESR, val);
+        break;
+    }
+
     case APIC_TASKPRI:
         vlapic_set_reg(vlapic, APIC_TASKPRI, val & 0xff);
         break;
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 7ecacadde165..9c4bfc7ebdac 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -394,6 +394,7 @@ struct hvm_hw_lapic {
     uint32_t             disabled; /* VLAPIC_xx_DISABLED */
     uint32_t             timer_divisor;
     uint64_t             tdt_msr;
+    uint32_t             pending_esr;
 };
 
 DECLARE_HVM_SAVE_TYPE(LAPIC, 5, struct hvm_hw_lapic);
-- 
2.39.5




* [PATCH 2/2] x86/vlapic: Drop vlapic->esr_lock
  2024-11-28  0:47 [PATCH 0/2] x86/vlapic: Fixes to APIC_ESR handling Andrew Cooper
  2024-11-28  0:47 ` [PATCH 1/2] x86/vlapic: Fix handling of writes to APIC_ESR Andrew Cooper
@ 2024-11-28  0:47 ` Andrew Cooper
  2024-11-28  9:26   ` Roger Pau Monné
  1 sibling, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2024-11-28  0:47 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné

With vlapic->hw.pending_esr held outside of the main regs page, it's much
easier to use atomic operations.

Use xchg() in vlapic_reg_write(), and *set_bit() in vlapic_error().

The only interesting change is that vlapic_error() now needs to take an
err_bit rather than an errmask, but that's fine for all current callers and
foreseeable changes.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

It turns out that XSA-462 had an indentation bug in it.

Our spinlock infrastructure is obscenely large.  Bloat-o-meter reports:

  add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-111 (-111)
  Function                                     old     new   delta
  vlapic_init                                  208     190     -18
  vlapic_error                                 112      67     -45
  vlapic_reg_write                            1145    1097     -48

In principle we could revert the XSA-462 patch now, and remove the LVTERR
vector handling special case.  MISRA is going to complain either way, because
it will see the cycle through vlapic_set_irq() without considering the
surrounding logic.
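
As a rough illustration of the lockless pattern, the two operations can be
written with C11 atomics.  This is a sketch under assumptions: the sketch_*()
helpers are hypothetical stand-ins for Xen's test_and_set_bit() and xchg(), not
their implementations, and the variable stands in for vlapic->hw.pending_esr.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for vlapic->hw.pending_esr. */
static _Atomic uint32_t pending_esr;

/*
 * Analogue of test_and_set_bit(): atomically set one bit and report whether
 * it was already set beforehand.
 */
static bool sketch_test_and_set_bit(unsigned int bit, _Atomic uint32_t *word)
{
    uint32_t mask;

    assert(bit < 32);
    mask = UINT32_C(1) << bit;

    return atomic_fetch_or(word, mask) & mask;
}

/* Analogue of xchg(&pending_esr, 0): take all pending errors, leave zero. */
static uint32_t sketch_take_pending(_Atomic uint32_t *word)
{
    return atomic_exchange(word, 0);
}
```

An error reported concurrently with the exchange lands either in the value
returned or in the freshly zeroed word, so it is not lost in either ordering.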
---
 xen/arch/x86/hvm/vlapic.c             | 32 ++++++---------------------
 xen/arch/x86/include/asm/hvm/vlapic.h |  1 -
 2 files changed, 7 insertions(+), 26 deletions(-)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 98394ed26a52..f41a5d4619bb 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -102,14 +102,9 @@ static int vlapic_find_highest_irr(struct vlapic *vlapic)
     return vlapic_find_highest_vector(&vlapic->regs->data[APIC_IRR]);
 }
 
-static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
+static void vlapic_error(struct vlapic *vlapic, unsigned int err_bit)
 {
-    unsigned long flags;
-    uint32_t esr;
-
-    spin_lock_irqsave(&vlapic->esr_lock, flags);
-    esr = vlapic->hw.pending_esr;
-    if ( (esr & errmask) != errmask )
+    if ( !test_and_set_bit(err_bit, &vlapic->hw.pending_esr) )
     {
         uint32_t lvterr = vlapic_get_reg(vlapic, APIC_LVTERR);
         bool inj = false;
@@ -124,15 +119,12 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
             if ( (lvterr & APIC_VECTOR_MASK) >= 16 )
                  inj = true;
             else
-                 errmask |= APIC_ESR_RECVILL;
+                set_bit(ilog2(APIC_ESR_RECVILL), &vlapic->hw.pending_esr);
         }
 
-        vlapic->hw.pending_esr |= errmask;
-
         if ( inj )
             vlapic_set_irq(vlapic, lvterr & APIC_VECTOR_MASK, 0);
     }
-    spin_unlock_irqrestore(&vlapic->esr_lock, flags);
 }
 
 bool vlapic_test_irq(const struct vlapic *vlapic, uint8_t vec)
@@ -153,7 +145,7 @@ void vlapic_set_irq(struct vlapic *vlapic, uint8_t vec, uint8_t trig)
 
     if ( unlikely(vec < 16) )
     {
-        vlapic_error(vlapic, APIC_ESR_RECVILL);
+        vlapic_error(vlapic, ilog2(APIC_ESR_RECVILL));
         return;
     }
 
@@ -525,7 +517,7 @@ void vlapic_ipi(
             vlapic_domain(vlapic), vlapic, short_hand, dest, dest_mode);
 
         if ( unlikely((icr_low & APIC_VECTOR_MASK) < 16) )
-            vlapic_error(vlapic, APIC_ESR_SENDILL);
+            vlapic_error(vlapic, ilog2(APIC_ESR_SENDILL));
         else if ( target )
             vlapic_accept_irq(vlapic_vcpu(target), icr_low);
         break;
@@ -534,7 +526,7 @@ void vlapic_ipi(
     case APIC_DM_FIXED:
         if ( unlikely((icr_low & APIC_VECTOR_MASK) < 16) )
         {
-            vlapic_error(vlapic, APIC_ESR_SENDILL);
+            vlapic_error(vlapic, ilog2(APIC_ESR_SENDILL));
             break;
         }
         /* fall through */
@@ -803,17 +795,9 @@ void vlapic_reg_write(struct vcpu *v, unsigned int reg, uint32_t val)
         break;
 
     case APIC_ESR:
-    {
-        unsigned long flags;
-
-        spin_lock_irqsave(&vlapic->esr_lock, flags);
-        val = vlapic->hw.pending_esr;
-        vlapic->hw.pending_esr = 0;
-        spin_unlock_irqrestore(&vlapic->esr_lock, flags);
-
+        val = xchg(&vlapic->hw.pending_esr, 0);
         vlapic_set_reg(vlapic, APIC_ESR, val);
         break;
-    }
 
     case APIC_TASKPRI:
         vlapic_set_reg(vlapic, APIC_TASKPRI, val & 0xff);
@@ -1716,8 +1700,6 @@ int vlapic_init(struct vcpu *v)
 
     vlapic_reset(vlapic);
 
-    spin_lock_init(&vlapic->esr_lock);
-
     tasklet_init(&vlapic->init_sipi.tasklet, vlapic_init_sipi_action, v);
 
     if ( v->vcpu_id == 0 )
diff --git a/xen/arch/x86/include/asm/hvm/vlapic.h b/xen/arch/x86/include/asm/hvm/vlapic.h
index 2c4ff94ae7a8..c38855119836 100644
--- a/xen/arch/x86/include/asm/hvm/vlapic.h
+++ b/xen/arch/x86/include/asm/hvm/vlapic.h
@@ -69,7 +69,6 @@ struct vlapic {
         bool                 hw, regs;
         uint32_t             id, ldr;
     }                        loaded;
-    spinlock_t               esr_lock;
     struct periodic_time     pt;
     s_time_t                 timer_last_update;
     struct page_info         *regs_page;
-- 
2.39.5




* Re: [PATCH 1/2] x86/vlapic: Fix handling of writes to APIC_ESR
  2024-11-28  0:47 ` [PATCH 1/2] x86/vlapic: Fix handling of writes to APIC_ESR Andrew Cooper
@ 2024-11-28  9:03   ` Roger Pau Monné
  2024-11-28 11:01     ` Andrew Cooper
  2024-11-28 10:31   ` Jan Beulich
  1 sibling, 1 reply; 15+ messages in thread
From: Roger Pau Monné @ 2024-11-28  9:03 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Jan Beulich

On Thu, Nov 28, 2024 at 12:47:36AM +0000, Andrew Cooper wrote:
> Xen currently presents APIC_ESR to guests as a simple read/write register.
> 
> This is incorrect.  The SDM states:
> 
>   The ESR is a write/read register. Before attempt to read from the ESR,
>   software should first write to it. (The value written does not affect the
>   values read subsequently; only zero may be written in x2APIC mode.) This
>   write clears any previously logged errors and updates the ESR with any
>   errors detected since the last write to the ESR. This write also rearms the
>   APIC error interrupt triggering mechanism.
> 
> Introduce a new pending_esr field in hvm_hw_lapic.  Update vlapic_error() to
> accumulate errors here, and extend vlapic_reg_write() to discard the written
> value, and instead transfer pending_esr into APIC_ESR.  Reads are still as
> before.
> 
> Importantly, this means that a guest no longer destroys the ESR value it's
> looking for in the LVTERR handler when following the SDM instructions.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> 
> Slightly RFC.  This collides with Alejandro's patch which adds the apic_id
> field to hvm_hw_lapic too.  However, this is a far more obvious backport
> candidate.
> 
> lapic_check_hidden() might in principle want to audit this field, but it's not
> clear what to check.  While prior Xen will never have produced it in the
> migration stream, Intel APIC-V will set APIC_ESR_ILLREGA above and beyond what
> Xen will currently emulate.
> 
> I've checked that this does behave correctly under Intel APIC-V.  Writes to
> APIC_ESR drop the written value into the backing page then take a trap-style
> EXIT_REASON_APIC_WRITE which allows us to sample/latch properly.
> ---
>  xen/arch/x86/hvm/vlapic.c              | 17 +++++++++++++++--
>  xen/include/public/arch-x86/hvm/save.h |  1 +
>  2 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index 3363926b487b..98394ed26a52 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -108,7 +108,7 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
>      uint32_t esr;
>  
>      spin_lock_irqsave(&vlapic->esr_lock, flags);
> -    esr = vlapic_get_reg(vlapic, APIC_ESR);
> +    esr = vlapic->hw.pending_esr;
>      if ( (esr & errmask) != errmask )
>      {
>          uint32_t lvterr = vlapic_get_reg(vlapic, APIC_LVTERR);
> @@ -127,7 +127,7 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
>                   errmask |= APIC_ESR_RECVILL;
>          }
>  
> -        vlapic_set_reg(vlapic, APIC_ESR, esr | errmask);
> +        vlapic->hw.pending_esr |= errmask;
>  
>          if ( inj )
>              vlapic_set_irq(vlapic, lvterr & APIC_VECTOR_MASK, 0);

The SDM also contains:

"This write also rearms the APIC error interrupt triggering
mechanism."

Where "this write" is a write to the ESR register.  My understanding
is that the error vector will only be injected for the first reported
error. I think the logic regarding whether to inject the lvterr vector
needs to additionally be gated on whether vlapic->hw.pending_esr ==
0.
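
To make the suggestion concrete, the gating would amount to something like the
sketch below.  It is illustrative only: struct gate_model and the gate_*()
helpers are hypothetical, the LVTERR mask and vector checks are left out, and
it encodes my reading of the SDM rather than verified hardware behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for vlapic->hw.pending_esr. */
struct gate_model {
    uint32_t pending_esr;
};

/*
 * Accumulate an error and report whether the LVTERR vector would be injected
 * under the suggested rule: only when nothing was pending beforehand.
 */
static bool gate_error(struct gate_model *m, uint32_t errmask)
{
    bool inject = (m->pending_esr == 0);

    assert(errmask != 0);
    m->pending_esr |= errmask;

    return inject;
}

/* Guest write to APIC_ESR: drains pending errors, which rearms the gate. */
static uint32_t gate_write_esr(struct gate_model *m)
{
    uint32_t latched = m->pending_esr;

    m->pending_esr = 0;

    return latched;
}
```

With this rule a second, different error raised before the guest writes ESR
accumulates silently, whereas the patch as posted would inject again.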

Thanks, Roger.



* Re: [PATCH 2/2] x86/vlapic: Drop vlapic->esr_lock
  2024-11-28  0:47 ` [PATCH 2/2] x86/vlapic: Drop vlapic->esr_lock Andrew Cooper
@ 2024-11-28  9:26   ` Roger Pau Monné
  2024-11-28 10:10     ` Andrew Cooper
  0 siblings, 1 reply; 15+ messages in thread
From: Roger Pau Monné @ 2024-11-28  9:26 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Jan Beulich

On Thu, Nov 28, 2024 at 12:47:37AM +0000, Andrew Cooper wrote:
> With vlapic->hw.pending_esr held outside of the main regs page, it's much
> easier to use atomic operations.
> 
> Use xchg() in vlapic_reg_write(), and *set_bit() in vlapic_error().
> 
> The only interesting change is that vlapic_error() now needs to take an
> err_bit rather than an errmask, but that's fine for all current callers and
> foreseeable changes.
> 
> No practical change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> 
> It turns out that XSA-462 had an indentation bug in it.
> 
> Our spinlock infrastructure is obscenely large.  Bloat-o-meter reports:
> 
>   add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-111 (-111)
>   Function                                     old     new   delta
>   vlapic_init                                  208     190     -18
>   vlapic_error                                 112      67     -45
>   vlapic_reg_write                            1145    1097     -48
> 
> In principle we could revert the XSA-462 patch now, and remove the LVTERR
> vector handling special case.  MISRA is going to complain either way, because
> it will see the cycle through vlapic_set_irq() without considering the
> surrounding logic.
> ---
>  xen/arch/x86/hvm/vlapic.c             | 32 ++++++---------------------
>  xen/arch/x86/include/asm/hvm/vlapic.h |  1 -
>  2 files changed, 7 insertions(+), 26 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index 98394ed26a52..f41a5d4619bb 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -102,14 +102,9 @@ static int vlapic_find_highest_irr(struct vlapic *vlapic)
>      return vlapic_find_highest_vector(&vlapic->regs->data[APIC_IRR]);
>  }
>  
> -static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
> +static void vlapic_error(struct vlapic *vlapic, unsigned int err_bit)

Having to use ilog2() in the callers is kind of ugly.  I would rather
keep the same function parameter (a mask), and then either assert that
it only has one bit set, or iterate over all possible set bits on the
mask.

I assume you had a preference for doing it at the caller because it
would then be done by the preprocessor as the passed values are
macros.  Maybe we could add a wrapper around it:

static void vlapic_set_error_bit(struct vlapic *vlapic, unsigned int bit)
{ ... }

#define vlapic_error(v, m) ({         \
    BUILD_BUG_ON((m) & ((m) - 1));    \
    vlapic_set_error_bit(v, ilog2(m));\
})
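
A standalone version of the same idea, for illustration.  This is a sketch
under assumptions: static_assert stands in for BUILD_BUG_ON(), the GCC/Clang
builtin __builtin_ctz() stands in for ilog2() (they agree for a single set
bit), and the ESR_* values, set_error_bit() and report_error() are hypothetical
names.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative error masks, one bit each. */
#define ESR_SENDILL 0x20u
#define ESR_RECVILL 0x40u

/* Bit-indexed worker, the part that could later become an atomic bitop. */
static void set_error_bit(uint32_t *word, unsigned int bit)
{
    assert(bit < 32);
    *word |= UINT32_C(1) << bit;
}

/*
 * Mask-taking wrapper: callers keep passing masks, and a mask with zero or
 * several bits set is rejected at build time.
 */
#define report_error(word, m) do {                              \
    static_assert((m) != 0 && ((m) & ((m) - 1)) == 0,           \
                  "exactly one error bit expected");            \
    set_error_bit(word, (unsigned int)__builtin_ctz(m));        \
} while ( 0 )
```

A call such as report_error(&w, ESR_SENDILL | ESR_RECVILL) fails to compile,
which is the property the wrapper is meant to provide.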


>  {
> -    unsigned long flags;
> -    uint32_t esr;
> -
> -    spin_lock_irqsave(&vlapic->esr_lock, flags);
> -    esr = vlapic->hw.pending_esr;
> -    if ( (esr & errmask) != errmask )
> +    if ( !test_and_set_bit(err_bit, &vlapic->hw.pending_esr) )
>      {
>          uint32_t lvterr = vlapic_get_reg(vlapic, APIC_LVTERR);
>          bool inj = false;
> @@ -124,15 +119,12 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
>              if ( (lvterr & APIC_VECTOR_MASK) >= 16 )
>                   inj = true;

The line above also has bogus indentation.

Thanks, Roger.



* Re: [PATCH 2/2] x86/vlapic: Drop vlapic->esr_lock
  2024-11-28  9:26   ` Roger Pau Monné
@ 2024-11-28 10:10     ` Andrew Cooper
  2024-11-28 10:25       ` Roger Pau Monné
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2024-11-28 10:10 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Xen-devel, Jan Beulich

On 28/11/2024 9:26 am, Roger Pau Monné wrote:
> On Thu, Nov 28, 2024 at 12:47:37AM +0000, Andrew Cooper wrote:
>> With vlapic->hw.pending_esr held outside of the main regs page, it's much
>> easier to use atomic operations.
>>
>> Use xchg() in vlapic_reg_write(), and *set_bit() in vlapic_error().
>>
>> The only interesting change is that vlapic_error() now needs to take an
>> err_bit rather than an errmask, but that's fine for all current callers and
>> foreseeable changes.
>>
>> No practical change.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>
>> It turns out that XSA-462 had an indentation bug in it.
>>
>> Our spinlock infrastructure is obscenely large.  Bloat-o-meter reports:
>>
>>   add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-111 (-111)
>>   Function                                     old     new   delta
>>   vlapic_init                                  208     190     -18
>>   vlapic_error                                 112      67     -45
>>   vlapic_reg_write                            1145    1097     -48
>>
>> In principle we could revert the XSA-462 patch now, and remove the LVTERR
>> vector handling special case.  MISRA is going to complain either way, because
>> it will see the cycle through vlapic_set_irq() without considering the
>> surrounding logic.
>> ---
>>  xen/arch/x86/hvm/vlapic.c             | 32 ++++++---------------------
>>  xen/arch/x86/include/asm/hvm/vlapic.h |  1 -
>>  2 files changed, 7 insertions(+), 26 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
>> index 98394ed26a52..f41a5d4619bb 100644
>> --- a/xen/arch/x86/hvm/vlapic.c
>> +++ b/xen/arch/x86/hvm/vlapic.c
>> @@ -102,14 +102,9 @@ static int vlapic_find_highest_irr(struct vlapic *vlapic)
>>      return vlapic_find_highest_vector(&vlapic->regs->data[APIC_IRR]);
>>  }
>>  
>> -static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
>> +static void vlapic_error(struct vlapic *vlapic, unsigned int err_bit)
> Having to use ilog2() in the callers is kind of ugly.  I would rather
> keep the same function parameter (a mask), and then either assert that
> it only has one bit set, or iterate over all possible set bits on the
> mask.

It can't stay as a mask, or we can't convert the logic to be lockless.
There's no such thing as test_and_set_mask() (until we get into next
year's processors).

If you really don't like ilog2(), then we need a parallel set of
APIC_ESR_*_BIT constants, but I considered ilog2() to be the lesser of
these two evils.

> I assume you had a preference for doing it at the caller because it
> would then be done by the preprocessor as the passed values are
> macros.  Maybe we could add a wrapper around it:
>
> static void vlapic_set_error_bit(struct vlapic *vlapic, unsigned int bit)
> { ... }
>
> #define vlapic_error(v, m) ({         \
>     BUILD_BUG_ON((m) & ((m) - 1));    \
>     vlapic_set_error_bit(v, ilog2(m));\
> })

This is overkill IMO.  There are 3 callers and they're all local to
vlapic.c (hopefully soon to gain a 4th, but still).

>>  {
>> -    unsigned long flags;
>> -    uint32_t esr;
>> -
>> -    spin_lock_irqsave(&vlapic->esr_lock, flags);
>> -    esr = vlapic->hw.pending_esr;
>> -    if ( (esr & errmask) != errmask )
>> +    if ( !test_and_set_bit(err_bit, &vlapic->hw.pending_esr) )
>>      {
>>          uint32_t lvterr = vlapic_get_reg(vlapic, APIC_LVTERR);
>>          bool inj = false;
>> @@ -124,15 +119,12 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
>>              if ( (lvterr & APIC_VECTOR_MASK) >= 16 )
>>                   inj = true;
> The line above also has bogus indentation.

Yes, that was mentioned.

~Andrew



* Re: [PATCH 2/2] x86/vlapic: Drop vlapic->esr_lock
  2024-11-28 10:10     ` Andrew Cooper
@ 2024-11-28 10:25       ` Roger Pau Monné
  2024-11-28 10:47         ` Andrew Cooper
  0 siblings, 1 reply; 15+ messages in thread
From: Roger Pau Monné @ 2024-11-28 10:25 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Jan Beulich

On Thu, Nov 28, 2024 at 10:10:39AM +0000, Andrew Cooper wrote:
> On 28/11/2024 9:26 am, Roger Pau Monné wrote:
> > On Thu, Nov 28, 2024 at 12:47:37AM +0000, Andrew Cooper wrote:
> >> With vlapic->hw.pending_esr held outside of the main regs page, it's much
> >> easier to use atomic operations.
> >>
> >> Use xchg() in vlapic_reg_write(), and *set_bit() in vlapic_error().
> >>
> >> The only interesting change is that vlapic_error() now needs to take an
> >> err_bit rather than an errmask, but that's fine for all current callers and
> >> foreseeable changes.
> >>
> >> No practical change.
> >>
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >> ---
> >> CC: Jan Beulich <JBeulich@suse.com>
> >> CC: Roger Pau Monné <roger.pau@citrix.com>
> >>
> >> It turns out that XSA-462 had an indentation bug in it.
> >>
> >> Our spinlock infrastructure is obscenely large.  Bloat-o-meter reports:
> >>
> >>   add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-111 (-111)
> >>   Function                                     old     new   delta
> >>   vlapic_init                                  208     190     -18
> >>   vlapic_error                                 112      67     -45
> >>   vlapic_reg_write                            1145    1097     -48
> >>
> >> In principle we could revert the XSA-462 patch now, and remove the LVTERR
> >> vector handling special case.  MISRA is going to complain either way, because
> >> it will see the cycle through vlapic_set_irq() without considering the
> >> surrounding logic.
> >> ---
> >>  xen/arch/x86/hvm/vlapic.c             | 32 ++++++---------------------
> >>  xen/arch/x86/include/asm/hvm/vlapic.h |  1 -
> >>  2 files changed, 7 insertions(+), 26 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> >> index 98394ed26a52..f41a5d4619bb 100644
> >> --- a/xen/arch/x86/hvm/vlapic.c
> >> +++ b/xen/arch/x86/hvm/vlapic.c
> >> @@ -102,14 +102,9 @@ static int vlapic_find_highest_irr(struct vlapic *vlapic)
> >>      return vlapic_find_highest_vector(&vlapic->regs->data[APIC_IRR]);
> >>  }
> >>  
> >> -static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
> >> +static void vlapic_error(struct vlapic *vlapic, unsigned int err_bit)
> > Having to use ilog2() in the callers is kind of ugly.  I would rather
> > keep the same function parameter (a mask), and then either assert that
> > it only has one bit set, or iterate over all possible set bits on the
> > mask.
> 
> It can't stay as a mask, or we can't convert the logic to be lockless.
> There's no such thing as test_and_set_mask() (until we get into next
> year's processors).

The test_and_set_bit() will also need to be changed if you agree with
my comment on patch 1, as the interrupt should only be injected when
vlapic->hw.pending_esr == 0 rather than whether the specific error is
set in ESR.

> If you really don't like ilog2(), then we need a parallel set of
> APIC_ESR_*_BIT constants, but I considered ilog2() to be the lesser of
> these two evils.
> 
> > I assume you had a preference for doing it at the caller because it
> > would then be done by the preprocessor as the passed values are
> > macros.  Maybe we could add a wrapper around it:
> >
> > static void vlapic_set_error_bit(struct vlapic *vlapic, unsigned int bit)
> > { ... }
> >
> > #define vlapic_error(v, m) ({         \
> >     BUILD_BUG_ON((m) & ((m) - 1));    \
> >     vlapic_set_error_bit(v, ilog2(m));\
> > })
> 
> This is overkill IMO.  There are 3 callers and they're all local to
> vlapic.c (hopefully soon to gain a 4th, but still).

My recommendation would be a local macro in vlapic.c, but I'm
certainly not going to block the patch over this.

Thanks, Roger.



* Re: [PATCH 1/2] x86/vlapic: Fix handling of writes to APIC_ESR
  2024-11-28  0:47 ` [PATCH 1/2] x86/vlapic: Fix handling of writes to APIC_ESR Andrew Cooper
  2024-11-28  9:03   ` Roger Pau Monné
@ 2024-11-28 10:31   ` Jan Beulich
  2024-11-28 11:10     ` Andrew Cooper
  1 sibling, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2024-11-28 10:31 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Xen-devel

On 28.11.2024 01:47, Andrew Cooper wrote:
> Xen currently presents APIC_ESR to guests as a simple read/write register.
> 
> This is incorrect.  The SDM states:
> 
>   The ESR is a write/read register. Before attempt to read from the ESR,
>   software should first write to it. (The value written does not affect the
>   values read subsequently; only zero may be written in x2APIC mode.) This
>   write clears any previously logged errors and updates the ESR with any
>   errors detected since the last write to the ESR. This write also rearms the
>   APIC error interrupt triggering mechanism.
> 
> Introduce a new pending_esr field in hvm_hw_lapic.  Update vlapic_error() to
> accumulate errors here, and extend vlapic_reg_write() to discard the written
> value, and instead transfer pending_esr into APIC_ESR.  Reads are still as
> before.
> 
> Importantly, this means that a guest no longer destroys the ESR value it's
> looking for in the LVTERR handler when following the SDM instructions.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

No Fixes: tag presumably because the issue had been there forever?

> ---
> Slightly RFC.  This collides with Alejandro's patch which adds the apic_id
> field to hvm_hw_lapic too.  However, this is a far more obvious backport
> candidate.
> 
> lapic_check_hidden() might in principle want to audit this field, but it's not
> clear what to check.  While prior Xen will never have produced it in the
> migration stream, Intel APIC-V will set APIC_ESR_ILLREGA above and beyond what
> Xen will currently emulate.

The ESR really is an 8-bit value (in a 32-bit register), so checking the
upper bits may be necessary. Plus ...

> --- a/xen/include/public/arch-x86/hvm/save.h
> +++ b/xen/include/public/arch-x86/hvm/save.h
> @@ -394,6 +394,7 @@ struct hvm_hw_lapic {
>      uint32_t             disabled; /* VLAPIC_xx_DISABLED */
>      uint32_t             timer_divisor;
>      uint64_t             tdt_msr;
> +    uint32_t             pending_esr;
>  };

... I think you need to make padding explicit here, and then check that
to be zero.
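
Roughly what I have in mind, as a sketch only.  The struct below is a
hypothetical cut-down record covering just the fields visible in the hunk, not
the real save ABI, and rsvd_zero and tail_is_valid() are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record mirroring the tail of hvm_hw_lapic. */
struct lapic_tail_sketch {
    uint32_t disabled;
    uint32_t timer_divisor;
    uint64_t tdt_msr;
    uint32_t pending_esr;
    uint32_t rsvd_zero;     /* explicit padding, must be zero */
};

/* With the padding spelled out, the layout no longer depends on the ABI. */
static_assert(sizeof(struct lapic_tail_sketch) == 24,
              "unexpected implicit padding");
static_assert(offsetof(struct lapic_tail_sketch, rsvd_zero) == 20,
              "padding field is not directly after pending_esr");

/* Audit an incoming record: only the low 8 ESR bits may be set. */
static bool tail_is_valid(const struct lapic_tail_sketch *s)
{
    return !(s->pending_esr & ~UINT32_C(0xff)) && !s->rsvd_zero;
}
```

Checking the padding for zero now keeps it usable for a future field without
another layout change.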

Jan



* Re: [PATCH 2/2] x86/vlapic: Drop vlapic->esr_lock
  2024-11-28 10:25       ` Roger Pau Monné
@ 2024-11-28 10:47         ` Andrew Cooper
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Cooper @ 2024-11-28 10:47 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Xen-devel, Jan Beulich

On 28/11/2024 10:25 am, Roger Pau Monné wrote:
> On Thu, Nov 28, 2024 at 10:10:39AM +0000, Andrew Cooper wrote:
>> On 28/11/2024 9:26 am, Roger Pau Monné wrote:
>>> On Thu, Nov 28, 2024 at 12:47:37AM +0000, Andrew Cooper wrote:
>>>> With vlapic->hw.pending_esr held outside of the main regs page, it's much
>>>> easier to use atomic operations.
>>>>
>>>> Use xchg() in vlapic_reg_write(), and *set_bit() in vlapic_error().
>>>>
>>>> The only interesting change is that vlapic_error() now needs to take an
>>>> err_bit rather than an errmask, but that's fine for all current callers and
>>>> foreseeable changes.
>>>>
>>>> No practical change.
>>>>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> ---
>>>> CC: Jan Beulich <JBeulich@suse.com>
>>>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>>>
>>>> It turns out that XSA-462 had an indentation bug in it.
>>>>
>>>> Our spinlock infrastructure is obscenely large.  Bloat-o-meter reports:
>>>>
>>>>   add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-111 (-111)
>>>>   Function                                     old     new   delta
>>>>   vlapic_init                                  208     190     -18
>>>>   vlapic_error                                 112      67     -45
>>>>   vlapic_reg_write                            1145    1097     -48
>>>>
>>>> In principle we could revert the XSA-462 patch now, and remove the LVTERR
>>>> vector handling special case.  MISRA is going to complain either way, because
>>>> it will see the cycle through vlapic_set_irq() without considering the
>>>> surrounding logic.
>>>> ---
>>>>  xen/arch/x86/hvm/vlapic.c             | 32 ++++++---------------------
>>>>  xen/arch/x86/include/asm/hvm/vlapic.h |  1 -
>>>>  2 files changed, 7 insertions(+), 26 deletions(-)
>>>>
>>>> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
>>>> index 98394ed26a52..f41a5d4619bb 100644
>>>> --- a/xen/arch/x86/hvm/vlapic.c
>>>> +++ b/xen/arch/x86/hvm/vlapic.c
>>>> @@ -102,14 +102,9 @@ static int vlapic_find_highest_irr(struct vlapic *vlapic)
>>>>      return vlapic_find_highest_vector(&vlapic->regs->data[APIC_IRR]);
>>>>  }
>>>>  
>>>> -static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
>>>> +static void vlapic_error(struct vlapic *vlapic, unsigned int err_bit)
>>> Having to use ilog2() in the callers is kind of ugly.  I would rather
>>> keep the same function parameter (a mask), and then either assert that
>>> it only has one bit set, or iterate over all possible set bits on the
>>> mask.
>> It can't stay as a mask, or we can't convert the logic to be lockless.
>> There's no such thing as test_and_set_mask()  (until we get into next
>> year's processors).
> The test_and_set_bit() will also need to be changed if you agree with
> my comment on patch 1, as the interrupt should only be injected when
> vlapic->hw.pending_esr == 0 rather than whether the specific error is
> set in ESR.

I'm writing a longer email explaining why that's not correct :)

~Andrew
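
The lockless scheme discussed above (set one error bit atomically in vlapic_error(), exchange the accumulated value on an ESR write) can be sketched roughly as follows. This is only an illustration, not the patch itself: C11 atomics stand in for Xen's test_and_set_bit() and xchg(), and every model_* name is hypothetical. Bit 5 and bit 6 are the SENDILL and RECVILL positions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the vlapic state; not Xen's real structure. */
struct model_vlapic {
    _Atomic uint32_t pending_esr;
};

/*
 * Atomically set a single bit and report whether it was already set.
 * Same contract as test_and_set_bit(), built on C11 atomic_fetch_or().
 */
static bool model_test_and_set_bit(unsigned int bit, _Atomic uint32_t *word)
{
    uint32_t mask = UINT32_C(1) << bit;

    return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Returns true when this error was not already pending. */
static bool model_error(struct model_vlapic *v, unsigned int err_bit)
{
    return !model_test_and_set_bit(err_bit, &v->pending_esr);
}

/* ESR write: take the accumulated errors and leave zero behind. */
static uint32_t model_esr_write(struct model_vlapic *v)
{
    return atomic_exchange(&v->pending_esr, 0);
}

/* Small self-check of the behaviour described above. */
static void model_lockless_demo(void)
{
    struct model_vlapic v = { 0 };

    assert(model_error(&v, 5));         /* first SENDILL: newly pending */
    assert(!model_error(&v, 5));        /* repeat of the same bit */
    assert(model_error(&v, 6));         /* RECVILL is a different bit */
    assert(model_esr_write(&v) == 0x60);
    assert(model_esr_write(&v) == 0);   /* nothing left after the exchange */
}
```

Taking a bit number rather than a mask is what keeps each operation a single atomic read-modify-write in this sketch, which is the reason given above for changing the vlapic_error() parameter.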



* Re: [PATCH 1/2] x86/vlapic: Fix handling of writes to APIC_ESR
  2024-11-28  9:03   ` Roger Pau Monné
@ 2024-11-28 11:01     ` Andrew Cooper
  2024-11-28 11:09       ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2024-11-28 11:01 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Xen-devel, Jan Beulich

On 28/11/2024 9:03 am, Roger Pau Monné wrote:
> On Thu, Nov 28, 2024 at 12:47:36AM +0000, Andrew Cooper wrote:
>> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
>> index 3363926b487b..98394ed26a52 100644
>> --- a/xen/arch/x86/hvm/vlapic.c
>> +++ b/xen/arch/x86/hvm/vlapic.c
>> @@ -108,7 +108,7 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
>>      uint32_t esr;
>>  
>>      spin_lock_irqsave(&vlapic->esr_lock, flags);
>> -    esr = vlapic_get_reg(vlapic, APIC_ESR);
>> +    esr = vlapic->hw.pending_esr;
>>      if ( (esr & errmask) != errmask )
>>      {
>>          uint32_t lvterr = vlapic_get_reg(vlapic, APIC_LVTERR);
>> @@ -127,7 +127,7 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
>>                   errmask |= APIC_ESR_RECVILL;
>>          }
>>  
>> -        vlapic_set_reg(vlapic, APIC_ESR, esr | errmask);
>> +        vlapic->hw.pending_esr |= errmask;
>>  
>>          if ( inj )
>>              vlapic_set_irq(vlapic, lvterr & APIC_VECTOR_MASK, 0);
> The SDM also contains:
>
> "This write also rearms the APIC error interrupt triggering
> mechanism."
>
> Where "this write" is a write to the ESR register.

Correct.

> My understanding
> is that the error vector will only be injected for the first reported
> error. I think the logic regarding whether to inject the lvterr vector
> needs to additionally be gated on whether vlapic->hw.pending_esr ==
> 0.

I think it's clumsy wording.

Bits being set mask subsequent LVTERR's of the same type.  That's what
the "if ( (esr & errmask) != errmask )" guard is doing above.

What I think it's referring to is that writing APIC_ESR will zero
pending_esr and thus any subsequent error will cause LVTERR to deliver.


Having said all that, I can't find anything in the current SDM/APM which
states this.  I think I need to go back to the older manuals.

~Andrew
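
The two readings of the SDM sentence under discussion can be put side by side in a small model. This is a sketch under assumptions: it ignores the LVTERR mask bit and the illegal-vector special case in the real vlapic_error(), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_ESR_SENDILL 0x20u /* bit 5 */
#define MODEL_ESR_RECVILL 0x40u /* bit 6 */

/* Hypothetical model of the two pieces of state in patch 1. */
struct model_lapic {
    uint32_t esr;         /* value the guest reads */
    uint32_t pending_esr; /* errors accumulated since the last ESR write */
};

/* Reading A: only a pending error of the same type suppresses LVTERR. */
static bool model_error_per_bit(struct model_lapic *l, uint32_t errmask)
{
    bool inject = (l->pending_esr & errmask) != errmask;

    l->pending_esr |= errmask;
    return inject;
}

/* Reading B: any pending error suppresses LVTERR until ESR is written. */
static bool model_error_first_only(struct model_lapic *l, uint32_t errmask)
{
    bool inject = (l->pending_esr == 0);

    l->pending_esr |= errmask;
    return inject;
}

/* Guest write to ESR: the written value is discarded. */
static void model_esr_latch(struct model_lapic *l)
{
    l->esr = l->pending_esr;
    l->pending_esr = 0;
}

static void model_readings_demo(void)
{
    struct model_lapic a = { 0 }, b = { 0 };

    assert(model_error_per_bit(&a, MODEL_ESR_SENDILL));
    assert(model_error_per_bit(&a, MODEL_ESR_RECVILL));     /* other type: injects */
    assert(!model_error_per_bit(&a, MODEL_ESR_RECVILL));    /* same type: masked */

    assert(model_error_first_only(&b, MODEL_ESR_SENDILL));
    assert(!model_error_first_only(&b, MODEL_ESR_RECVILL)); /* masked by SENDILL */

    model_esr_latch(&b);
    assert(b.esr == 0x60 && b.pending_esr == 0);
    assert(model_error_first_only(&b, MODEL_ESR_RECVILL));  /* rearmed by the write */
}
```

Both variants rearm on an ESR write; they differ only in whether a second error of a different type injects LVTERR while the first is still pending.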



* Re: [PATCH 1/2] x86/vlapic: Fix handling of writes to APIC_ESR
  2024-11-28 11:01     ` Andrew Cooper
@ 2024-11-28 11:09       ` Jan Beulich
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2024-11-28 11:09 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Roger Pau Monné

On 28.11.2024 12:01, Andrew Cooper wrote:
> On 28/11/2024 9:03 am, Roger Pau Monné wrote:
>> On Thu, Nov 28, 2024 at 12:47:36AM +0000, Andrew Cooper wrote:
>>> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
>>> index 3363926b487b..98394ed26a52 100644
>>> --- a/xen/arch/x86/hvm/vlapic.c
>>> +++ b/xen/arch/x86/hvm/vlapic.c
>>> @@ -108,7 +108,7 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
>>>      uint32_t esr;
>>>  
>>>      spin_lock_irqsave(&vlapic->esr_lock, flags);
>>> -    esr = vlapic_get_reg(vlapic, APIC_ESR);
>>> +    esr = vlapic->hw.pending_esr;
>>>      if ( (esr & errmask) != errmask )
>>>      {
>>>          uint32_t lvterr = vlapic_get_reg(vlapic, APIC_LVTERR);
>>> @@ -127,7 +127,7 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
>>>                   errmask |= APIC_ESR_RECVILL;
>>>          }
>>>  
>>> -        vlapic_set_reg(vlapic, APIC_ESR, esr | errmask);
>>> +        vlapic->hw.pending_esr |= errmask;
>>>  
>>>          if ( inj )
>>>              vlapic_set_irq(vlapic, lvterr & APIC_VECTOR_MASK, 0);
>> The SDM also contains:
>>
>> "This write also rearms the APIC error interrupt triggering
>> mechanism."
>>
>> Where "this write" is a write to the ESR register.
> 
> Correct.
> 
>> My understanding
>> is that the error vector will only be injected for the first reported
>> error. I think the logic regarding whether to inject the lvterr vector
>> needs to additionally be gated on whether vlapic->hw.pending_esr ==
>> 0.
> 
> I think it's clumsy wording.
> 
> Bits being set mask subsequent LVTERR's of the same type.  That's what
> the "if ( (esr & errmask) != errmask )" guard is doing above.

That's what we do, yes, but is that correct? I agree with Roger's reading
of that sentence.

> What I think it's referring to is that writing APIC_ESR will zero
> pending_esr and thus any subsequent error will cause LVTERR to deliver.

..., while at the same time preventing LVTERR delivery when there was
another error already pending.

Jan



* Re: [PATCH 1/2] x86/vlapic: Fix handling of writes to APIC_ESR
  2024-11-28 10:31   ` Jan Beulich
@ 2024-11-28 11:10     ` Andrew Cooper
  2024-11-28 11:50       ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2024-11-28 11:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Roger Pau Monné, Xen-devel

On 28/11/2024 10:31 am, Jan Beulich wrote:
> On 28.11.2024 01:47, Andrew Cooper wrote:
>> Xen currently presents APIC_ESR to guests as a simple read/write register.
>>
>> This is incorrect.  The SDM states:
>>
>>   The ESR is a write/read register. Before attempt to read from the ESR,
>>   software should first write to it. (The value written does not affect the
>>   values read subsequently; only zero may be written in x2APIC mode.) This
>>   write clears any previously logged errors and updates the ESR with any
>>   errors detected since the last write to the ESR. This write also rearms the
>>   APIC error interrupt triggering mechanism.
>>
>> Introduce a new pending_esr field in hvm_hw_lapic.  Update vlapic_error() to
>> accumulate errors here, and extend vlapic_reg_write() to discard the written
>> value, and instead transfer pending_esr into APIC_ESR.  Reads are still as
>> before.
>>
>> Importantly, this means that a guest no longer destroys the ESR value it's
>> looking for in the LVTERR handler when following the SDM instructions.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> No Fixes: tag presumably because the issue had been there forever?

Oh, I forgot to note that.

I can't decide between forever, or since the introduction of the ESR
support (so Xen 4.5 like XSA-462, and still basically forever).

>> ---
>> Slightly RFC.  This collides with Alejandro's patch which adds the apic_id
>> field to hvm_hw_lapic too.  However, this is a far more obvious backport
>> candidate.
>>
>> lapic_check_hidden() might in principle want to audit this field, but it's not
>> clear what to check.  While prior Xen will never have produced it in the
>> migration stream, Intel APIC-V will set APIC_ESR_ILLREGA above and beyond what
>> Xen will currently emulate.
> The ESR really is an 8-bit value (in a 32-bit register), so checking the
> upper bits may be necessary.

It is now, but it may not be in the future.

My concern is that this value is generated by microcode, so we can't
audit based on which reserved bits we think prior versions of Xen never set.

I don't particularly care about a toolstack deciding to feed ~0 in
here.  But, if any bit beyond 7 gets allocated in the future, then
auditing the bottom byte would lead to a migration failure of what is in
practice a correct value.

>  Plus ...
>
>> --- a/xen/include/public/arch-x86/hvm/save.h
>> +++ b/xen/include/public/arch-x86/hvm/save.h
>> @@ -394,6 +394,7 @@ struct hvm_hw_lapic {
>>      uint32_t             disabled; /* VLAPIC_xx_DISABLED */
>>      uint32_t             timer_divisor;
>>      uint64_t             tdt_msr;
>> +    uint32_t             pending_esr;
>>  };
> ... I think you need to make padding explicit here, and then check that
> to be zero.

On further consideration I need to merge this with Alejandro's change. 
His depends on spotting the need to zero-extend beyond tdt_msr to
identify the compatibility case.

~Andrew
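
As a rough illustration of the "make padding explicit, then check it is zero" suggestion above: the layout below is a model, not the real struct hvm_hw_lapic. The leading field, the rsvd_zero name and the check helper are all assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a save record with the new trailing field. */
struct model_hw_lapic {
    uint64_t apic_base_msr; /* assumed leading field */
    uint32_t disabled;
    uint32_t timer_divisor;
    uint64_t tdt_msr;
    uint32_t pending_esr;
    uint32_t rsvd_zero;     /* explicit padding; must be zero in the stream */
};

/* With the padding spelled out, the layout has no compiler-inserted holes. */
_Static_assert(sizeof(struct model_hw_lapic) == 32, "unexpected record size");
_Static_assert(offsetof(struct model_hw_lapic, rsvd_zero) == 28,
               "padding must follow pending_esr directly");

/* Audit step: reject incoming records whose reserved field is not zero. */
static bool model_check_padding(const struct model_hw_lapic *s)
{
    return s->rsvd_zero == 0;
}

static void model_padding_demo(void)
{
    struct model_hw_lapic ok = { .pending_esr = 0x40 };
    struct model_hw_lapic bad = { .rsvd_zero = 1 };

    assert(model_check_padding(&ok));
    assert(!model_check_padding(&bad));
}
```

Naming the padding makes the record size independent of implicit alignment rules and gives the restore path a concrete field to audit.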



* Re: [PATCH 1/2] x86/vlapic: Fix handling of writes to APIC_ESR
  2024-11-28 11:10     ` Andrew Cooper
@ 2024-11-28 11:50       ` Jan Beulich
  2024-11-28 11:57         ` Andrew Cooper
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2024-11-28 11:50 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Xen-devel

On 28.11.2024 12:10, Andrew Cooper wrote:
> On 28/11/2024 10:31 am, Jan Beulich wrote:
>> On 28.11.2024 01:47, Andrew Cooper wrote:
>>> Xen currently presents APIC_ESR to guests as a simple read/write register.
>>>
>>> This is incorrect.  The SDM states:
>>>
>>>   The ESR is a write/read register. Before attempt to read from the ESR,
>>>   software should first write to it. (The value written does not affect the
>>>   values read subsequently; only zero may be written in x2APIC mode.) This
>>>   write clears any previously logged errors and updates the ESR with any
>>>   errors detected since the last write to the ESR. This write also rearms the
>>>   APIC error interrupt triggering mechanism.
>>>
>>> Introduce a new pending_esr field in hvm_hw_lapic.  Update vlapic_error() to
>>> accumulate errors here, and extend vlapic_reg_write() to discard the written
>>> value, and instead transfer pending_esr into APIC_ESR.  Reads are still as
>>> before.
>>>
>>> Importantly, this means that a guest no longer destroys the ESR value it's
>>> looking for in the LVTERR handler when following the SDM instructions.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> No Fixes: tag presumably because the issue had been there forever?
> 
> Oh, I forgot to note that.
> 
> I can't decide between forever, or since the introduction of the ESR
> support (so Xen 4.5 like XSA-462, and still basically forever).
>>> ---
>>> Slightly RFC.  This collides with Alejandro's patch which adds the apic_id
>>> field to hvm_hw_lapic too.  However, this is a far more obvious backport
>>> candidate.
>>>
>>> lapic_check_hidden() might in principle want to audit this field, but it's not
>>> clear what to check.  While prior Xen will never have produced it in the
>>> migration stream, Intel APIC-V will set APIC_ESR_ILLREGA above and beyond what
>>> Xen will currently emulate.
>> The ESR really is an 8-bit value (in a 32-bit register), so checking the
>> upper bits may be necessary.
> 
> It is now, but it may not be in the future.
> 
> My concern is that this value is generated by microcode, so we can't
> audit based on which reserved bits we think prior versions of Xen never set.
> 
> I don't particularly care about a toolstack deciding to feed ~0 in
> here.  But, if any bit beyond 7 gets allocated in the future, then
> auditing the bottom byte would lead to a migration failure of what is in
> practice a correct value.

If a bit beyond zero got allocated, then it being set in an incoming stream
will, for an unaware Xen version, still be illegal. Such a guest simply can't
be migrated to a Xen version unaware of the bit. Once Xen becomes aware, the
auditing would (of course) also need adjustment.

Jan



* Re: [PATCH 1/2] x86/vlapic: Fix handling of writes to APIC_ESR
  2024-11-28 11:50       ` Jan Beulich
@ 2024-11-28 11:57         ` Andrew Cooper
  2024-11-28 12:16           ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2024-11-28 11:57 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Roger Pau Monné, Xen-devel

On 28/11/2024 11:50 am, Jan Beulich wrote:
> On 28.11.2024 12:10, Andrew Cooper wrote:
>> On 28/11/2024 10:31 am, Jan Beulich wrote:
>>> On 28.11.2024 01:47, Andrew Cooper wrote:
>>>> Xen currently presents APIC_ESR to guests as a simple read/write register.
>>>>
>>>> This is incorrect.  The SDM states:
>>>>
>>>>   The ESR is a write/read register. Before attempt to read from the ESR,
>>>>   software should first write to it. (The value written does not affect the
>>>>   values read subsequently; only zero may be written in x2APIC mode.) This
>>>>   write clears any previously logged errors and updates the ESR with any
>>>>   errors detected since the last write to the ESR. This write also rearms the
>>>>   APIC error interrupt triggering mechanism.
>>>>
>>>> Introduce a new pending_esr field in hvm_hw_lapic.  Update vlapic_error() to
>>>> accumulate errors here, and extend vlapic_reg_write() to discard the written
>>>> value, and instead transfer pending_esr into APIC_ESR.  Reads are still as
>>>> before.
>>>>
>>>> Importantly, this means that a guest no longer destroys the ESR value it's
>>>> looking for in the LVTERR handler when following the SDM instructions.
>>>>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> No Fixes: tag presumably because the issue had been there forever?
>> Oh, I forgot to note that.
>>
>> I can't decide between forever, or since the introduction of the ESR
>> support (so Xen 4.5 like XSA-462, and still basically forever).
>>>> ---
>>>> Slightly RFC.  This collides with Alejandro's patch which adds the apic_id
>>>> field to hvm_hw_lapic too.  However, this is a far more obvious backport
>>>> candidate.
>>>>
>>>> lapic_check_hidden() might in principle want to audit this field, but it's not
>>>> clear what to check.  While prior Xen will never have produced it in the
>>>> migration stream, Intel APIC-V will set APIC_ESR_ILLREGA above and beyond what
>>>> Xen will currently emulate.
>>> The ESR really is an 8-bit value (in a 32-bit register), so checking the
>>> upper bits may be necessary.
>> It is now, but it may not be in the future.
>>
>> My concern is that this value is generated by microcode, so we can't
>> audit based on which reserved bits we think prior versions of Xen never set.
>>
>> I don't particularly care about a toolstack deciding to feed ~0 in
>> here.  But, if any bit beyond 7 gets allocated in the future, then
>> auditing the bottom byte would lead to a migration failure of what is in
>> practice a correct value.
> If a bit beyond zero got allocated, then it being set in an incoming stream
> will, for an unaware Xen version, still be illegal. Such a guest simply can't
> be migrated to a Xen version unaware of the bit. Once Xen becomes aware, the
> auditing would (of course) also need adjustment.

That's the whole point.  It's not about Xen's awareness; it's what
APIC-V/AVIC might do *in existing configurations* on future hardware
without taking a VMExit.

If there were no APIC-V support to begin with, this would be easy and
auditing would be limited to SENDILL|RECVILL as those are the only two
bits Xen knows about.

~Andrew



* Re: [PATCH 1/2] x86/vlapic: Fix handling of writes to APIC_ESR
  2024-11-28 11:57         ` Andrew Cooper
@ 2024-11-28 12:16           ` Jan Beulich
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2024-11-28 12:16 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Xen-devel

On 28.11.2024 12:57, Andrew Cooper wrote:
> On 28/11/2024 11:50 am, Jan Beulich wrote:
>> On 28.11.2024 12:10, Andrew Cooper wrote:
>>> On 28/11/2024 10:31 am, Jan Beulich wrote:
>>>> On 28.11.2024 01:47, Andrew Cooper wrote:
>>>>> Xen currently presents APIC_ESR to guests as a simple read/write register.
>>>>>
>>>>> This is incorrect.  The SDM states:
>>>>>
>>>>>   The ESR is a write/read register. Before attempt to read from the ESR,
>>>>>   software should first write to it. (The value written does not affect the
>>>>>   values read subsequently; only zero may be written in x2APIC mode.) This
>>>>>   write clears any previously logged errors and updates the ESR with any
>>>>>   errors detected since the last write to the ESR. This write also rearms the
>>>>>   APIC error interrupt triggering mechanism.
>>>>>
>>>>> Introduce a new pending_esr field in hvm_hw_lapic.  Update vlapic_error() to
>>>>> accumulate errors here, and extend vlapic_reg_write() to discard the written
>>>>> value, and instead transfer pending_esr into APIC_ESR.  Reads are still as
>>>>> before.
>>>>>
>>>>> Importantly, this means that a guest no longer destroys the ESR value it's
>>>>> looking for in the LVTERR handler when following the SDM instructions.
>>>>>
>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> No Fixes: tag presumably because the issue had been there forever?
>>> Oh, I forgot to note that.
>>>
>>> I can't decide between forever, or since the introduction of the ESR
>>> support (so Xen 4.5 like XSA-462, and still basically forever).
>>>>> ---
>>>>> Slightly RFC.  This collides with Alejandro's patch which adds the apic_id
>>>>> field to hvm_hw_lapic too.  However, this is a far more obvious backport
>>>>> candidate.
>>>>>
>>>>> lapic_check_hidden() might in principle want to audit this field, but it's not
>>>>> clear what to check.  While prior Xen will never have produced it in the
>>>>> migration stream, Intel APIC-V will set APIC_ESR_ILLREGA above and beyond what
>>>>> Xen will currently emulate.
>>>> The ESR really is an 8-bit value (in a 32-bit register), so checking the
>>>> upper bits may be necessary.
>>> It is now, but it may not be in the future.
>>>
>>> My concern is that this value is generated by microcode, so we can't
>>> audit based on which reserved bits we think prior versions of Xen never set.
>>>
>>> I don't particularly care about a toolstack deciding to feed ~0 in
>>> here.  But, if any bit beyond 7 gets allocated in the future, then
>>> auditing the bottom byte would lead to a migration failure of what is in
>>> practice a correct value.
>> If a bit beyond zero got allocated, then it being set in an incoming stream
>> will, for an unaware Xen version, still be illegal. Such a guest simply can't
>> be migrated to a Xen version unaware of the bit. Once Xen becomes aware, the
>> auditing would (of course) also need adjustment.
> 
> That's the whole point.  It's not about Xen's awareness; it's what
> APIC-V/AVIC might do *in existing configurations* on future hardware
> without taking a VMExit.

How would you migrate such a guest to arbitrary other hardware, i.e.
potentially lacking support for that bit? If LVTERR triggering is as per
Roger's reading of the SDM, without knowing how many bits hardware
presently checks we couldn't guarantee correctness. Bits from 8 up being
reserved right now even leaves me wondering what happens on present
hardware when one of those top 24 bits is set.

> If there were no APIC-V support to begin with, this would be easy and
> auditing would be limited to SENDILL|RECVILL as those are the only two
> bits Xen knows about.

Limiting to just these two bits would be wrong; future Xen might make
use of more of them, and a guest should then still migrate correctly
(just that, after these extra bits are initially set, it would never
again see any of them becoming set).

Jan
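
The audit options debated in this sub-thread can be compared with a short sketch. It is not code from lapic_check_hidden(); the function names are hypothetical, and the masks simply encode "low 8 bits" versus "only the two bits Xen emulates".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_ESR_SENDILL 0x20u /* bit 5 */
#define MODEL_ESR_RECVILL 0x40u /* bit 6 */
#define MODEL_ESR_ILLREGA 0x80u /* bit 7, set by APIC-V per the commit notes */

/* Option 1: accept any value in the architectural low byte. */
static bool model_audit_low_byte(uint32_t pending_esr)
{
    return (pending_esr & ~UINT32_C(0xff)) == 0;
}

/* Option 2: accept only the bits Xen itself emulates today. */
static bool model_audit_emulated_only(uint32_t pending_esr)
{
    return (pending_esr & ~(MODEL_ESR_SENDILL | MODEL_ESR_RECVILL)) == 0;
}

static void model_audit_demo(void)
{
    /* Both options accept what Xen's emulation can produce. */
    assert(model_audit_low_byte(MODEL_ESR_SENDILL | MODEL_ESR_RECVILL));
    assert(model_audit_emulated_only(MODEL_ESR_SENDILL | MODEL_ESR_RECVILL));

    /* A hardware-set ILLREGA passes option 1 but fails option 2. */
    assert(model_audit_low_byte(MODEL_ESR_ILLREGA));
    assert(!model_audit_emulated_only(MODEL_ESR_ILLREGA));

    /* A bit above 7 is rejected by both; this is the disputed case. */
    assert(!model_audit_low_byte(UINT32_C(0x100)));
    assert(!model_audit_emulated_only(UINT32_C(0x100)));
}
```

The disagreement above concerns the last case: whether a bit beyond 7, set by future hardware without a VMExit, should fail the restore or be let through.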



