* [PATCH] vmx, apicv: fix save/restore issue with apicv
@ 2014-10-11 7:54 Yang Zhang
2014-10-13 8:33 ` Jan Beulich
2014-10-14 14:19 ` Olaf Hering
0 siblings, 2 replies; 11+ messages in thread
From: Yang Zhang @ 2014-10-11 7:54 UTC (permalink / raw)
To: xen-devel
Cc: kevin.tian, Andrew.Cooper3, olaf, JBeulich, Yang Zhang,
malcolm.crossley
From: Yang Zhang <yang.z.zhang@Intel.com>
This patch fix two issues:
1. Interrupts on PIR are lost during save/restore. Syncing the PIR
into IRR during save will fix it.
2. EOI exit bitmap doesn't setup correctly after restore. Here we
will construct the eoi exit bimap via (IRR | ISR). Though it may cause
unnecessary eoi exit of the interrupts that pending in IRR or ISR during
save/resotre, each pending interrupt only causes one vmexit. The
subsequent interrupts will adjust the eoi exit bitmap correctly. So
the performace hurt can be ignored.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
---
Hi Olaf and Malcolm, Can you help to retest it? Because the key part
is changed in this patch, so i'm not sure whether it still works in
your side.
Also, Olaf, is it ok to add your SOB in this patch? If yes, I can resend
it to do it.
xen/arch/x86/hvm/vlapic.c | 3 +++
xen/arch/x86/hvm/vmx/vmx.c | 27 ++++++++++++++++++++++++++-
2 files changed, 29 insertions(+), 1 deletions(-)
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 99ae1be..e702ed3 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1259,6 +1259,9 @@ static int lapic_save_regs(struct domain *d, hvm_domain_context_t *h)
for_each_vcpu ( d, v )
{
+ if ( hvm_funcs.sync_pir_to_irr )
+ hvm_funcs.sync_pir_to_irr(v);
+
s = vcpu_vlapic(v);
if ( (rc = hvm_save_entry(LAPIC_REGS, v->vcpu_id, h, s->regs)) != 0 )
break;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 304aeea..fb38d15 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1583,7 +1583,9 @@ static int vmx_virtual_intr_delivery_enabled(void)
static void vmx_process_isr(int isr, struct vcpu *v)
{
unsigned long status;
- u8 old;
+ u8 old, i;
+ unsigned int *eoi_exit_bitmap, val;
+ struct vlapic *vlapic = vcpu_vlapic(v);
if ( isr < 0 )
isr = 0;
@@ -1597,6 +1599,29 @@ static void vmx_process_isr(int isr, struct vcpu *v)
status |= isr << VMX_GUEST_INTR_STATUS_SVI_OFFSET;
__vmwrite(GUEST_INTR_STATUS, status);
}
+
+ eoi_exit_bitmap = (unsigned int *)v->arch.hvm_vmx.eoi_exit_bitmap;
+ /*
+ * We cannot simple copy the TMR as eoi exit bitmap due to the special
+ * periodic timer interrupt which is edge trigger but need a mandatory
+ * EOI-induced VM exit to let pending periodic timer interrupts have the
+ * chance to be injected for compensation.
+ * Here we will construct the eoi exit bimap via (IRR | ISR). Though it
+ * may cause unnecessary eoi exit of the interrupts that pending in IRR or
+ * ISR during save/resotre, each pending interrupt only causes one vmexit.
+ * The subsequent interrupts will adjust the eoi exit bitmap correctly. So
+ * the performace hurt can be ingored.
+ */
+ for ( i = 0x10; i < 0x80; i += 0x10 )
+ {
+ val = vlapic_get_reg(vlapic, APIC_IRR + i);
+ val |= vlapic_get_reg(vlapic, APIC_ISR + i);
+ eoi_exit_bitmap[i >> 4] |= val;
+ }
+
+ for ( i = 0; i < 4; i++ )
+ __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
+
vmx_vmcs_exit(v);
}
--
1.7.6.3
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH] vmx, apicv: fix save/restore issue with apicv
2014-10-11 7:54 [PATCH] vmx, apicv: fix save/restore issue with apicv Yang Zhang
@ 2014-10-13 8:33 ` Jan Beulich
2014-10-14 5:43 ` Zhang, Yang Z
2014-10-14 14:19 ` Olaf Hering
1 sibling, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2014-10-13 8:33 UTC (permalink / raw)
To: Yang Zhang; +Cc: kevin.tian, Andrew.Cooper3, olaf, malcolm.crossley, xen-devel
>>> On 11.10.14 at 09:54, <yang.z.zhang@intel.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1583,7 +1583,9 @@ static int vmx_virtual_intr_delivery_enabled(void)
> static void vmx_process_isr(int isr, struct vcpu *v)
> {
> unsigned long status;
> - u8 old;
> + u8 old, i;
There is absolutely no point in i being u8 - this can be plain unsigned
int.
> + unsigned int *eoi_exit_bitmap, val;
> + struct vlapic *vlapic = vcpu_vlapic(v);
>
> if ( isr < 0 )
> isr = 0;
> @@ -1597,6 +1599,29 @@ static void vmx_process_isr(int isr, struct vcpu *v)
> status |= isr << VMX_GUEST_INTR_STATUS_SVI_OFFSET;
> __vmwrite(GUEST_INTR_STATUS, status);
> }
> +
> + eoi_exit_bitmap = (unsigned int *)v->arch.hvm_vmx.eoi_exit_bitmap;
No casts like this please. This is a bitmap and would hence preferably
be treated that way consistently. The code here isn't performance
critical.
> + /*
> + * We cannot simple copy the TMR as eoi exit bitmap due to the special
> + * periodic timer interrupt which is edge trigger but need a mandatory
> + * EOI-induced VM exit to let pending periodic timer interrupts have the
> + * chance to be injected for compensation.
> + * Here we will construct the eoi exit bimap via (IRR | ISR). Though it
> + * may cause unnecessary eoi exit of the interrupts that pending in IRR or
> + * ISR during save/resotre, each pending interrupt only causes one vmexit.
> + * The subsequent interrupts will adjust the eoi exit bitmap correctly. So
> + * the performace hurt can be ingored.
> + */
> + for ( i = 0x10; i < 0x80; i += 0x10 )
So you skip the first 32 vectors instead of just the first 16. That's not
in line with the APIC definitions. Also basing the loop variable on APIC
register numbers is kind of odd. I'd really suggest using something more
natural, like vector space (which would also allow you to use existing
helper functions/macros).
> + {
> + val = vlapic_get_reg(vlapic, APIC_IRR + i);
> + val |= vlapic_get_reg(vlapic, APIC_ISR + i);
The comment above doesn't really make clear why not just IRR.
Also please check the spelling in that comment.
Jan
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] vmx, apicv: fix save/restore issue with apicv
2014-10-13 8:33 ` Jan Beulich
@ 2014-10-14 5:43 ` Zhang, Yang Z
2014-10-14 9:18 ` Andrew Cooper
2014-10-14 9:53 ` Jan Beulich
0 siblings, 2 replies; 11+ messages in thread
From: Zhang, Yang Z @ 2014-10-14 5:43 UTC (permalink / raw)
To: Jan Beulich
Cc: Tian, Kevin, Andrew.Cooper3@citrix.com, olaf@aepfle.de,
malcolm.crossley@citrix.com, xen-devel@lists.xen.org
Jan Beulich wrote on 2014-10-13:
> >>> On 11.10.14 at 09:54, <yang.z.zhang@intel.com> wrote:
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -1583,7 +1583,9 @@ static int
> > vmx_virtual_intr_delivery_enabled(void)
> > static void vmx_process_isr(int isr, struct vcpu *v) {
> > unsigned long status;
> > - u8 old;
> > + u8 old, i;
>
> There is absolutely no point in i being u8 - this can be plain unsigned int.
It's ok to change it to unsigned int.
>
> > + unsigned int *eoi_exit_bitmap, val;
> > + struct vlapic *vlapic = vcpu_vlapic(v);
> >
> > if ( isr < 0 )
> > isr = 0;
> > @@ -1597,6 +1599,29 @@ static void vmx_process_isr(int isr, struct vcpu
> *v)
> > status |= isr << VMX_GUEST_INTR_STATUS_SVI_OFFSET;
> > __vmwrite(GUEST_INTR_STATUS, status);
> > }
> > +
> > + eoi_exit_bitmap = (unsigned int
> > + *)v->arch.hvm_vmx.eoi_exit_bitmap;
>
> No casts like this please. This is a bitmap and would hence preferably be
> treated that way consistently. The code here isn't performance critical.
Yes, it's performance critical from the live migration's point and that's why I use the cast here and
>
> > + /*
> > + * We cannot simple copy the TMR as eoi exit bitmap due to the special
> > + * periodic timer interrupt which is edge trigger but need a mandatory
> > + * EOI-induced VM exit to let pending periodic timer interrupts have
> the
> > + * chance to be injected for compensation.
> > + * Here we will construct the eoi exit bimap via (IRR | ISR). Though it
> > + * may cause unnecessary eoi exit of the interrupts that pending in IRR
> or
> > + * ISR during save/resotre, each pending interrupt only causes one
> vmexit.
> > + * The subsequent interrupts will adjust the eoi exit bitmap correctly.
> So
> > + * the performace hurt can be ingored.
> > + */
> > + for ( i = 0x10; i < 0x80; i += 0x10 )
>
> So you skip the first 32 vectors instead of just the first 16. That's not in line with
> the APIC definitions. Also basing the loop variable on APIC register numbers is
> kind of odd. I'd really suggest using something more natural, like vector space
> (which would also allow you to use existing helper functions/macros).
>
...skip the first 32 vectors and operates based on group instead vector
Also, Spec says "21-31 - Intel reserved. Do not use." So it's ok to skip them.
> > + {
> > + val = vlapic_get_reg(vlapic, APIC_IRR + i);
> > + val |= vlapic_get_reg(vlapic, APIC_ISR + i);
>
> The comment above doesn't really make clear why not just IRR.
Upon acceptance of an interrupt into the IRR, the corresponding TMR will be changed. This means an interrupt either in IRR or in ISR has no chance to update the TMR and eoi exit bitmap. So we need to check IRR + ISR while rebuilding the eoi exit bitmap. It is not enough to check only IRR.
>
> Also please check the spelling in that comment.
Sorry for my poor English.
>
> Jan
Best regards,
Yang
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] vmx, apicv: fix save/restore issue with apicv
2014-10-14 5:43 ` Zhang, Yang Z
@ 2014-10-14 9:18 ` Andrew Cooper
2014-10-14 12:54 ` Zhang, Yang Z
2014-10-14 9:53 ` Jan Beulich
1 sibling, 1 reply; 11+ messages in thread
From: Andrew Cooper @ 2014-10-14 9:18 UTC (permalink / raw)
To: Zhang, Yang Z, Jan Beulich
Cc: Tian, Kevin, olaf@aepfle.de, malcolm.crossley@citrix.com,
xen-devel@lists.xen.org
On 14/10/14 06:43, Zhang, Yang Z wrote:
> Jan Beulich wrote on 2014-10-13:
>>>>> On 11.10.14 at 09:54, <yang.z.zhang@intel.com> wrote:
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -1583,7 +1583,9 @@ static int
>>> vmx_virtual_intr_delivery_enabled(void)
>>> static void vmx_process_isr(int isr, struct vcpu *v) {
>>> unsigned long status;
>>> - u8 old;
>>> + u8 old, i;
>> There is absolutely no point in i being u8 - this can be plain unsigned int.
> It's ok to change it to unsigned int.
>
>>> + unsigned int *eoi_exit_bitmap, val;
>>> + struct vlapic *vlapic = vcpu_vlapic(v);
>>>
>>> if ( isr < 0 )
>>> isr = 0;
>>> @@ -1597,6 +1599,29 @@ static void vmx_process_isr(int isr, struct vcpu
>> *v)
>>> status |= isr << VMX_GUEST_INTR_STATUS_SVI_OFFSET;
>>> __vmwrite(GUEST_INTR_STATUS, status);
>>> }
>>> +
>>> + eoi_exit_bitmap = (unsigned int
>>> + *)v->arch.hvm_vmx.eoi_exit_bitmap;
>> No casts like this please. This is a bitmap and would hence preferably be
>> treated that way consistently. The code here isn't performance critical.
> Yes, it's performance critical from the live migration's point and that's why I use the cast here and
It is run once per vLAPIC on restore. I would not qualify it as
performance critical, but I would also advocate the efficient method of
word accesses over the inefficient bit accesses where it makes sense.
>
>>> + /*
>>> + * We cannot simple copy the TMR as eoi exit bitmap due to the special
>>> + * periodic timer interrupt which is edge trigger but need a mandatory
>>> + * EOI-induced VM exit to let pending periodic timer interrupts have
>> the
>>> + * chance to be injected for compensation.
>>> + * Here we will construct the eoi exit bimap via (IRR | ISR). Though it
>>> + * may cause unnecessary eoi exit of the interrupts that pending in IRR
>> or
>>> + * ISR during save/resotre, each pending interrupt only causes one
>> vmexit.
>>> + * The subsequent interrupts will adjust the eoi exit bitmap correctly.
>> So
>>> + * the performace hurt can be ingored.
>>> + */
>>> + for ( i = 0x10; i < 0x80; i += 0x10 )
>> So you skip the first 32 vectors instead of just the first 16. That's not in line with
>
>> the APIC definitions. Also basing the loop variable on APIC register numbers is
>> kind of odd. I'd really suggest using something more natural, like vector space
>> (which would also allow you to use existing helper functions/macros).
>>
> ...skip the first 32 vectors and operates based on group instead vector
>
> Also, Spec says "21-31 - Intel reserved. Do not use." So it's ok to skip them.
And what do you expect will happen to DoS guests which continue to use
the legacy PICs?
This range cannot possibly be skipped.
~Andrew
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] vmx, apicv: fix save/restore issue with apicv
2014-10-14 9:18 ` Andrew Cooper
@ 2014-10-14 12:54 ` Zhang, Yang Z
0 siblings, 0 replies; 11+ messages in thread
From: Zhang, Yang Z @ 2014-10-14 12:54 UTC (permalink / raw)
To: Andrew Cooper, Jan Beulich
Cc: Tian, Kevin, olaf@aepfle.de, malcolm.crossley@citrix.com,
xen-devel@lists.xen.org
Andrew Cooper wrote on 2014-10-14:
> On 14/10/14 06:43, Zhang, Yang Z wrote:
>> Jan Beulich wrote on 2014-10-13:
>>>>>> On 11.10.14 at 09:54, <yang.z.zhang@intel.com> wrote:
>>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>>> @@ -1583,7 +1583,9 @@ static int
>>>> vmx_virtual_intr_delivery_enabled(void)
>>>> static void vmx_process_isr(int isr, struct vcpu *v) {
>>>> unsigned long status;
>>>> - u8 old;
>>>> + u8 old, i;
>>> There is absolutely no point in i being u8 - this can be plain unsigned int.
>> It's ok to change it to unsigned int.
>>
>>>> + unsigned int *eoi_exit_bitmap, val;
>>>> + struct vlapic *vlapic = vcpu_vlapic(v);
>>>>
>>>> if ( isr < 0 )
>>>> isr = 0;
>>>> @@ -1597,6 +1599,29 @@ static void vmx_process_isr(int isr, struct
>>>> vcpu
>>> *v)
>>>> status |= isr << VMX_GUEST_INTR_STATUS_SVI_OFFSET;
>>>> __vmwrite(GUEST_INTR_STATUS, status);
>>>> }
>>>> +
>>>> + eoi_exit_bitmap = (unsigned int
>>>> + *)v->arch.hvm_vmx.eoi_exit_bitmap;
>>> No casts like this please. This is a bitmap and would hence
>>> preferably be treated that way consistently. The code here isn't
> performance critical.
>> Yes, it's performance critical from the live migration's point and
>> that's why I use the cast here and
>
> It is run once per vLAPIC on restore. I would not qualify it as
> performance critical, but I would also advocate the efficient method
Since the live migration is largely used in cloud, and the downtime of live migration is very important to cloud user. So it is worth to put the restore code path as performance critical area in future. BTW, I am working on KVM to reduce the downtime now.
> of word accesses over the inefficient bit accesses where it makes sense.
>
>>
>>>> + /* + * We cannot simple copy the TMR as eoi exit bitmap due
>>>> to + the special + * periodic timer interrupt which is edge
>>>> trigger but need a mandatory + * EOI-induced VM exit to let
>>>> pending periodic timer + interrupts have the + * chance to be
>>>> injected for compensation. + * Here we will construct the eoi
>>>> exit bimap via (IRR | ISR). Though it + * may cause unnecessary
>>>> eoi exit of the interrupts that + pending in IRR or + * ISR
>>>> during save/resotre, each pending interrupt only + causes one vmexit.
>>>> + * The subsequent interrupts will adjust the eoi exit bitmap
> correctly.
>>> So
>>>> + * the performace hurt can be ingored.
>>>> + */
>>>> + for ( i = 0x10; i < 0x80; i += 0x10 )
>>> So you skip the first 32 vectors instead of just the first 16.
>>> That's not in line with
>>>
>>> the APIC definitions. Also basing the loop variable on APIC register
>>> numbers is kind of odd. I'd really suggest using something more
>>> natural, like vector space (which would also allow you to use existing
>>> helper functions/macros).
>>>
>> ...skip the first 32 vectors and operates based on group instead
>> vector
>>
>> Also, Spec says "21-31 - Intel reserved. Do not use." So it's ok to skip them.
>
> And what do you expect will happen to DoS guests which continue to use
> the legacy PICs?
I didn't get you point. Can you explain more?
>
> This range cannot possibly be skipped.
>
> ~Andrew
Best regards,
Yang
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] vmx, apicv: fix save/restore issue with apicv
2014-10-14 5:43 ` Zhang, Yang Z
2014-10-14 9:18 ` Andrew Cooper
@ 2014-10-14 9:53 ` Jan Beulich
2014-10-14 13:10 ` Zhang, Yang Z
1 sibling, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2014-10-14 9:53 UTC (permalink / raw)
To: Yang Z Zhang
Cc: Kevin Tian, Andrew.Cooper3@citrix.com, olaf@aepfle.de,
malcolm.crossley@citrix.com, xen-devel@lists.xen.org
>>> On 14.10.14 at 07:43, <yang.z.zhang@intel.com> wrote:
> Jan Beulich wrote on 2014-10-13:
>> >>> On 11.10.14 at 09:54, <yang.z.zhang@intel.com> wrote:
>> > @@ -1597,6 +1599,29 @@ static void vmx_process_isr(int isr, struct vcpu
>> *v)
>> > status |= isr << VMX_GUEST_INTR_STATUS_SVI_OFFSET;
>> > __vmwrite(GUEST_INTR_STATUS, status);
>> > }
>> > +
>> > + eoi_exit_bitmap = (unsigned int
>> > + *)v->arch.hvm_vmx.eoi_exit_bitmap;
>>
>> No casts like this please. This is a bitmap and would hence preferably be
>> treated that way consistently. The code here isn't performance critical.
>
> Yes, it's performance critical from the live migration's point and that's
> why I use the cast here and
Compared to everything else involved in migration, a few extra
cycles here surely don't matter at all. What does matter is
maintainability of this already not always easy to understand code.
>> > + /*
>> > + * We cannot simple copy the TMR as eoi exit bitmap due to the special
>> > + * periodic timer interrupt which is edge trigger but need a mandatory
>> > + * EOI-induced VM exit to let pending periodic timer interrupts have
>> the
>> > + * chance to be injected for compensation.
>> > + * Here we will construct the eoi exit bimap via (IRR | ISR). Though it
>> > + * may cause unnecessary eoi exit of the interrupts that pending in IRR
>> or
>> > + * ISR during save/resotre, each pending interrupt only causes one
>> vmexit.
>> > + * The subsequent interrupts will adjust the eoi exit bitmap correctly.
>> So
>> > + * the performace hurt can be ingored.
>> > + */
>> > + for ( i = 0x10; i < 0x80; i += 0x10 )
>>
>> So you skip the first 32 vectors instead of just the first 16. That's not in line with
>> the APIC definitions. Also basing the loop variable on APIC register numbers is
>> kind of odd. I'd really suggest using something more natural, like vector space
>> (which would also allow you to use existing helper functions/macros).
>>
>
> ...skip the first 32 vectors and operates based on group instead vector
>
> Also, Spec says "21-31 - Intel reserved. Do not use." So it's ok to skip them.
Please distinguish APIC specification and CPU specification here.
The APIC one only really precludes 0...15 to be used.
>> > + {
>> > + val = vlapic_get_reg(vlapic, APIC_IRR + i);
>> > + val |= vlapic_get_reg(vlapic, APIC_ISR + i);
>>
>> The comment above doesn't really make clear why not just IRR.
>
> Upon acceptance of an interrupt into the IRR, the corresponding TMR will be
> changed. This means an interrupt either in IRR or in ISR has no chance to
> update the TMR and eoi exit bitmap. So we need to check IRR + ISR while
> rebuilding the eoi exit bitmap. It is not enough to check only IRR.
But then why not look at TMR right away? It's being kept up to
date as long as the guest is running, and gets migrated over
afterwards. Considering what vmx_update_eoi_exit_bitmap())
does, you really want to or TMR into EOI_EXIT_BITMAP (which
post-migration could just be an assignment rather than an or).
Jan
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] vmx, apicv: fix save/restore issue with apicv
2014-10-14 9:53 ` Jan Beulich
@ 2014-10-14 13:10 ` Zhang, Yang Z
2014-10-14 13:31 ` Jan Beulich
0 siblings, 1 reply; 11+ messages in thread
From: Zhang, Yang Z @ 2014-10-14 13:10 UTC (permalink / raw)
To: Jan Beulich
Cc: Tian, Kevin, Andrew.Cooper3@citrix.com, olaf@aepfle.de,
malcolm.crossley@citrix.com, xen-devel@lists.xen.org
Jan Beulich wrote on 2014-10-14:
> >>> On 14.10.14 at 07:43, <yang.z.zhang@intel.com> wrote:
> > Jan Beulich wrote on 2014-10-13:
> >> >>> On 11.10.14 at 09:54, <yang.z.zhang@intel.com> wrote:
> >> > @@ -1597,6 +1599,29 @@ static void vmx_process_isr(int isr, struct
> >> > vcpu
> >> *v)
> >> > status |= isr << VMX_GUEST_INTR_STATUS_SVI_OFFSET;
> >> > __vmwrite(GUEST_INTR_STATUS, status);
> >> > }
> >> > +
> >> > + eoi_exit_bitmap = (unsigned int
> >> > + *)v->arch.hvm_vmx.eoi_exit_bitmap;
> >>
> >> No casts like this please. This is a bitmap and would hence
> >> preferably be treated that way consistently. The code here isn't
> performance critical.
> >
> > Yes, it's performance critical from the live migration's point and
> > that's why I use the cast here and
>
> Compared to everything else involved in migration, a few extra cycles here
Xen is far behind KVM on live migration. If we want to remain the competitive, I'd suggest doing a cleanup on current migration logic to make it more effective.
> surely don't matter at all. What does matter is maintainability of this already
> not always easy to understand code.
>
> >> > + /*
> >> > + * We cannot simple copy the TMR as eoi exit bitmap due to the
> special
> >> > + * periodic timer interrupt which is edge trigger but need a
> mandatory
> >> > + * EOI-induced VM exit to let pending periodic timer
> >> > + interrupts have
> >> the
> >> > + * chance to be injected for compensation.
> >> > + * Here we will construct the eoi exit bimap via (IRR | ISR). Though
> it
> >> > + * may cause unnecessary eoi exit of the interrupts that
> >> > + pending in IRR
> >> or
> >> > + * ISR during save/resotre, each pending interrupt only causes
> >> > + one
> >> vmexit.
> >> > + * The subsequent interrupts will adjust the eoi exit bitmap
> correctly.
> >> So
> >> > + * the performace hurt can be ingored.
> >> > + */
> >> > + for ( i = 0x10; i < 0x80; i += 0x10 )
> >>
> >> So you skip the first 32 vectors instead of just the first 16. That's
> >> not in line with the APIC definitions. Also basing the loop variable
> >> on APIC register numbers is kind of odd. I'd really suggest using
> >> something more natural, like vector space (which would also allow you to
> use existing helper functions/macros).
> >>
> >
> > ...skip the first 32 vectors and operates based on group instead
> > vector
> >
> > Also, Spec says "21-31 - Intel reserved. Do not use." So it's ok to skip them.
>
> Please distinguish APIC specification and CPU specification here.
> The APIC one only really precludes 0...15 to be used.
Ok. I will check the APIC specification.
>
> >> > + {
> >> > + val = vlapic_get_reg(vlapic, APIC_IRR + i);
> >> > + val |= vlapic_get_reg(vlapic, APIC_ISR + i);
> >>
> >> The comment above doesn't really make clear why not just IRR.
> >
> > Upon acceptance of an interrupt into the IRR, the corresponding TMR
> > will be changed. This means an interrupt either in IRR or in ISR has
> > no chance to update the TMR and eoi exit bitmap. So we need to check
> > IRR + ISR while rebuilding the eoi exit bitmap. It is not enough to check only
> IRR.
>
> But then why not look at TMR right away? It's being kept up to date as long as
> the guest is running, and gets migrated over afterwards. Considering what
> vmx_update_eoi_exit_bitmap()) does, you really want to or TMR into
> EOI_EXIT_BITMAP (which post-migration could just be an assignment rather
> than an or).
No, we cannot use TMR directly. As I said in the comments, the periodic timer interrupt is tricky which is edge-trigger but need an EOI vmexit to let the pending interrupts have the chance to be injected for compensation.
>
> Jan
Best regards,
Yang
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] vmx, apicv: fix save/restore issue with apicv
2014-10-14 13:10 ` Zhang, Yang Z
@ 2014-10-14 13:31 ` Jan Beulich
2014-10-16 6:38 ` Zhang, Yang Z
0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2014-10-14 13:31 UTC (permalink / raw)
To: Yang Z Zhang
Cc: Kevin Tian, Andrew.Cooper3@citrix.com, olaf@aepfle.de,
malcolm.crossley@citrix.com, xen-devel@lists.xen.org
>>> On 14.10.14 at 15:10, <yang.z.zhang@intel.com> wrote:
> Jan Beulich wrote on 2014-10-14:
>> >>> On 14.10.14 at 07:43, <yang.z.zhang@intel.com> wrote:
>> > Jan Beulich wrote on 2014-10-13:
>> >> >>> On 11.10.14 at 09:54, <yang.z.zhang@intel.com> wrote:
>> >> > @@ -1597,6 +1599,29 @@ static void vmx_process_isr(int isr, struct
>> >> > vcpu
>> >> *v)
>> >> > status |= isr << VMX_GUEST_INTR_STATUS_SVI_OFFSET;
>> >> > __vmwrite(GUEST_INTR_STATUS, status);
>> >> > }
>> >> > +
>> >> > + eoi_exit_bitmap = (unsigned int
>> >> > + *)v->arch.hvm_vmx.eoi_exit_bitmap;
>> >>
>> >> No casts like this please. This is a bitmap and would hence
>> >> preferably be treated that way consistently. The code here isn't
>> performance critical.
>> >
>> > Yes, it's performance critical from the live migration's point and
>> > that's why I use the cast here and
>>
>> Compared to everything else involved in migration, a few extra cycles here
>
> Xen is far behind KVM on live migration. If we want to remain the
> competitive, I'd suggest doing a cleanup on current migration logic to make
> it more effective.
But that surely needs to be done elsewhere, where more than a
couple of nanoseconds are to be gained.
>> >> > + {
>> >> > + val = vlapic_get_reg(vlapic, APIC_IRR + i);
>> >> > + val |= vlapic_get_reg(vlapic, APIC_ISR + i);
>> >>
>> >> The comment above doesn't really make clear why not just IRR.
>> >
>> > Upon acceptance of an interrupt into the IRR, the corresponding TMR
>> > will be changed. This means an interrupt either in IRR or in ISR has
>> > no chance to update the TMR and eoi exit bitmap. So we need to check
>> > IRR + ISR while rebuilding the eoi exit bitmap. It is not enough to check only
>> IRR.
>>
>> But then why not look at TMR right away? It's being kept up to date as long as
>> the guest is running, and gets migrated over afterwards. Considering what
>> vmx_update_eoi_exit_bitmap()) does, you really want to or TMR into
>> EOI_EXIT_BITMAP (which post-migration could just be an assignment rather
>> than an or).
>
> No, we cannot use TMR directly. As I said in the comments, the periodic
> timer interrupt is tricky which is edge-trigger but need an EOI vmexit to let
> the pending interrupts have the chance to be injected for compensation.
The respective "normal" (non-migration) handling of this being in
vmx_intr_assist(), involving pt_vector? If so, I think I understand
now.
Thanks, Jan
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] vmx, apicv: fix save/restore issue with apicv
2014-10-14 13:31 ` Jan Beulich
@ 2014-10-16 6:38 ` Zhang, Yang Z
2014-10-16 9:53 ` Andrew Cooper
0 siblings, 1 reply; 11+ messages in thread
From: Zhang, Yang Z @ 2014-10-16 6:38 UTC (permalink / raw)
To: Jan Beulich
Cc: Tian, Kevin, Andrew.Cooper3@citrix.com, olaf@aepfle.de,
malcolm.crossley@citrix.com, xen-devel@lists.xen.org
Jan Beulich wrote on 2014-10-14:
>>>> On 14.10.14 at 15:10, <yang.z.zhang@intel.com> wrote:
>> Jan Beulich wrote on 2014-10-14:
>>>>>> On 14.10.14 at 07:43, <yang.z.zhang@intel.com> wrote:
>>>> Jan Beulich wrote on 2014-10-13:
>>>>>>>> On 11.10.14 at 09:54, <yang.z.zhang@intel.com> wrote:
>>>>>> @@ -1597,6 +1599,29 @@ static void vmx_process_isr(int isr,
>>>>>> struct vcpu
>>>>> *v)
>>>>>> status |= isr << VMX_GUEST_INTR_STATUS_SVI_OFFSET;
>>>>>> __vmwrite(GUEST_INTR_STATUS, status);
>>>>>> }
>>>>>> +
>>>>>> + eoi_exit_bitmap = (unsigned int
>>>>>> + *)v->arch.hvm_vmx.eoi_exit_bitmap;
>>>>>
>>>>> No casts like this please. This is a bitmap and would hence
>>>>> preferably be treated that way consistently. The code here isn't
>>> performance critical.
>>>>
>>>> Yes, it's performance critical from the live migration's point
>>>> and that's why I use the cast here and
>>>
>>> Compared to everything else involved in migration, a few extra
>>> cycles here
>>
>> Xen is far behind KVM on live migration. If we want to remain the
>> competitive, I'd suggest doing a cleanup on current migration logic
>> to make it more effective.
>
> But that surely needs to be done elsewhere, where more than a couple
> of nanoseconds are to be gained.
If you think it is ok, I will use vector based operations:
+ for ( vector = 0x10; vector < NR_VECTORS; vector++ )
+ if (vlapic_test_vector(vector, &s->regs->data[APIC_IRR]) ||
+ vlapic_test_vector(vector, &s->regs->data[APIC_ISR]))
+ set_bit(vector, v->arch.hvm_vmx.eoi_exit_bitmap);
Andrew, is it ok for you?
>
>>>>>> + {
>>>>>> + val = vlapic_get_reg(vlapic, APIC_IRR + i);
>>>>>> + val |= vlapic_get_reg(vlapic, APIC_ISR + i);
>>>>>
>>>>> The comment above doesn't really make clear why not just IRR.
>>>>
>>>> Upon acceptance of an interrupt into the IRR, the corresponding
>>>> TMR will be changed. This means an interrupt either in IRR or in
>>>> ISR has no chance to update the TMR and eoi exit bitmap. So we
>>>> need to check IRR + ISR while rebuilding the eoi exit bitmap. It
>>>> is not enough to check only
>>> IRR.
>>>
>>> But then why not look at TMR right away? It's being kept up to date
>>> as long as the guest is running, and gets migrated over afterwards.
>>> Considering what
>>> vmx_update_eoi_exit_bitmap()) does, you really want to or TMR into
>>> EOI_EXIT_BITMAP (which post-migration could just be an assignment
>>> rather than an or).
>>
>> No, we cannot use TMR directly. As I said in the comments, the
>> periodic timer interrupt is tricky which is edge-trigger but need an
>> EOI vmexit to let the pending interrupts have the chance to be
>> injected for
> compensation.
>
> The respective "normal" (non-migration) handling of this being in
> vmx_intr_assist(), involving pt_vector? If so, I think I understand now.
Yes.
>
> Thanks, Jan
Best regards,
Yang
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] vmx, apicv: fix save/restore issue with apicv
2014-10-16 6:38 ` Zhang, Yang Z
@ 2014-10-16 9:53 ` Andrew Cooper
0 siblings, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2014-10-16 9:53 UTC (permalink / raw)
To: Zhang, Yang Z, Jan Beulich
Cc: Tian, Kevin, olaf@aepfle.de, malcolm.crossley@citrix.com,
xen-devel@lists.xen.org
On 16/10/14 07:38, Zhang, Yang Z wrote:
> Jan Beulich wrote on 2014-10-14:
>>>>> On 14.10.14 at 15:10, <yang.z.zhang@intel.com> wrote:
>>> Jan Beulich wrote on 2014-10-14:
>>>>>>> On 14.10.14 at 07:43, <yang.z.zhang@intel.com> wrote:
>>>>> Jan Beulich wrote on 2014-10-13:
>>>>>>>>> On 11.10.14 at 09:54, <yang.z.zhang@intel.com> wrote:
>>>>>>> @@ -1597,6 +1599,29 @@ static void vmx_process_isr(int isr,
>>>>>>> struct vcpu
>>>>>> *v)
>>>>>>> status |= isr << VMX_GUEST_INTR_STATUS_SVI_OFFSET;
>>>>>>> __vmwrite(GUEST_INTR_STATUS, status);
>>>>>>> }
>>>>>>> +
>>>>>>> + eoi_exit_bitmap = (unsigned int
>>>>>>> + *)v->arch.hvm_vmx.eoi_exit_bitmap;
>>>>>> No casts like this please. This is a bitmap and would hence
>>>>>> preferably be treated that way consistently. The code here isn't
>>>> performance critical.
>>>>> Yes, it's performance critical from the live migration's point
>>>>> and that's why I use the cast here and
>>>> Compared to everything else involved in migration, a few extra
>>>> cycles here
>>> Xen is far behind KVM on live migration. If we want to remain the
>>> competitive, I'd suggest doing a cleanup on current migration logic
>>> to make it more effective.
>> But that surely needs to be done elsewhere, where more than a couple
>> of nanoseconds are to be gained.
> If you think it is ok, I will use vector based operations:
>
> + for ( vector = 0x10; vector < NR_VECTORS; vector++ )
> + if (vlapic_test_vector(vector, &s->regs->data[APIC_IRR]) ||
> + vlapic_test_vector(vector, &s->regs->data[APIC_ISR]))
> + set_bit(vector, v->arch.hvm_vmx.eoi_exit_bitmap);
>
> Andrew, is it ok for you?
That covers the appropriate vector range, (and getting a working fix is
far more important than arguing its efficiency at this point in the
release).
This looks fine to me.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] vmx, apicv: fix save/restore issue with apicv
2014-10-11 7:54 [PATCH] vmx, apicv: fix save/restore issue with apicv Yang Zhang
2014-10-13 8:33 ` Jan Beulich
@ 2014-10-14 14:19 ` Olaf Hering
1 sibling, 0 replies; 11+ messages in thread
From: Olaf Hering @ 2014-10-14 14:19 UTC (permalink / raw)
To: Yang Zhang
Cc: Andrew.Cooper3, kevin.tian, malcolm.crossley, JBeulich, xen-devel
On Sat, Oct 11, Yang Zhang wrote:
> From: Yang Zhang <yang.z.zhang@Intel.com>
>
> This patch fix two issues:
>
> 1. Interrupts on PIR are lost during save/restore. Syncing the PIR
> into IRR during save will fix it.
>
> 2. EOI exit bitmap doesn't setup correctly after restore. Here we
> will construct the eoi exit bimap via (IRR | ISR). Though it may cause
> unnecessary eoi exit of the interrupts that pending in IRR or ISR during
> save/resotre, each pending interrupt only causes one vmexit. The
> subsequent interrupts will adjust the eoi exit bitmap correctly. So
> the performace hurt can be ignored.
>
> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
> ---
> Hi Olaf and Malcolm, Can you help to retest it? Because the key part
> is changed in this patch, so i'm not sure whether it still works in
> your side.
>
> Also, Olaf, is it ok to add your SOB in this patch? If yes, I can resend
> it to do it.
Yes, thats fine.
I have tested this version and save/restore still works with current
staging.
Olaf
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2014-10-16 9:53 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-10-11 7:54 [PATCH] vmx, apicv: fix save/restore issue with apicv Yang Zhang
2014-10-13 8:33 ` Jan Beulich
2014-10-14 5:43 ` Zhang, Yang Z
2014-10-14 9:18 ` Andrew Cooper
2014-10-14 12:54 ` Zhang, Yang Z
2014-10-14 9:53 ` Jan Beulich
2014-10-14 13:10 ` Zhang, Yang Z
2014-10-14 13:31 ` Jan Beulich
2014-10-16 6:38 ` Zhang, Yang Z
2014-10-16 9:53 ` Andrew Cooper
2014-10-14 14:19 ` Olaf Hering
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.