* [PATCH 0/3] KVM: x86: Fix and optimize in-kernel NMI watchdog support
@ 2008-10-15 14:27 Jan Kiszka
2008-10-15 14:27 ` [PATCH 1/3] KVM: x86: Relax accept conditions of kvm_apic_accept_pic_intr Jan Kiszka
` (3 more replies)
0 siblings, 4 replies; 30+ messages in thread
From: Jan Kiszka @ 2008-10-15 14:27 UTC (permalink / raw)
To: kvm; +Cc: avi, jiajun.xu, sheng
Bug tracker reports 2149609 and 2168057 pointed out boot issues of
Windows 64-bit versions. This series fixes the underlying problem.
It furthermore stacks the earlier posted optimization on top which
shortens the PIT IRQ delivery path in case the NMI-watchdog-via-PIT
trick is not used by the guest.
Jan
--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 1/3] KVM: x86: Relax accept conditions of kvm_apic_accept_pic_intr
2008-10-15 14:27 [PATCH 0/3] KVM: x86: Fix and optimize in-kernel NMI watchdog support Jan Kiszka
@ 2008-10-15 14:27 ` Jan Kiszka
2008-10-17 5:11 ` Sheng Yang
2008-10-15 14:27 ` [PATCH 2/3] KVM: x86: Dont deliver PIT IRQs to masked LVT0s Jan Kiszka
` (2 subsequent siblings)
3 siblings, 1 reply; 30+ messages in thread
From: Jan Kiszka @ 2008-10-15 14:27 UTC (permalink / raw)
To: kvm; +Cc: avi, jiajun.xu, sheng, Jan Kiszka
[-- Attachment #1: kvm-x86-relax-kvm_apic_accept_pic_intr.patch --]
[-- Type: text/plain, Size: 1096 bytes --]
Aligning in-kernel kvm_apic_accept_pic_intr with its user space mate,
this patch relaxes the conditions under which PIC IRQs are accepted
by LVT0. This reflects reality and allows to reuse the service for the
NMI watchdog use case.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
arch/x86/kvm/lapic.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
Index: b/arch/x86/kvm/lapic.c
===================================================================
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1072,16 +1072,11 @@ int kvm_apic_has_interrupt(struct kvm_vc
int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
{
u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
- int r = 0;
- if (vcpu->vcpu_id == 0) {
- if (!apic_hw_enabled(vcpu->arch.apic))
- r = 1;
- if ((lvt0 & APIC_LVT_MASKED) == 0 &&
- GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
- r = 1;
- }
- return r;
+ if (!apic_hw_enabled(vcpu->arch.apic) ||
+ (lvt0 & APIC_LVT_MASKED) == 0)
+ return 1;
+ return 0;
}
void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 2/3] KVM: x86: Dont deliver PIT IRQs to masked LVT0s
2008-10-15 14:27 [PATCH 0/3] KVM: x86: Fix and optimize in-kernel NMI watchdog support Jan Kiszka
2008-10-15 14:27 ` [PATCH 1/3] KVM: x86: Relax accept conditions of kvm_apic_accept_pic_intr Jan Kiszka
@ 2008-10-15 14:27 ` Jan Kiszka
2008-10-17 15:23 ` Alexander Graf
2008-10-15 14:27 ` [PATCH 3/3] KVM: x86: Optimize NMI watchdog delivery Jan Kiszka
2008-10-19 11:13 ` [PATCH 0/3] KVM: x86: Fix and optimize in-kernel NMI watchdog support Avi Kivity
3 siblings, 1 reply; 30+ messages in thread
From: Jan Kiszka @ 2008-10-15 14:27 UTC (permalink / raw)
To: kvm; +Cc: avi, jiajun.xu, sheng, Jan Kiszka
[-- Attachment #1: kvm-x86-dont-deliver-pit-to-masked-lvt0.patch --]
[-- Type: text/plain, Size: 656 bytes --]
This fixes Windows 64-bit boot regressions: PIT IRQs must not be
delivered via LVT0 lines if they are masked.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
arch/x86/kvm/i8254.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: b/arch/x86/kvm/i8254.c
===================================================================
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -609,7 +609,7 @@ static void __inject_pit_timer_intr(stru
*/
for (i = 0; i < KVM_MAX_VCPUS; ++i) {
vcpu = kvm->vcpus[i];
- if (!vcpu)
+ if (!vcpu || !kvm_apic_accept_pic_intr(vcpu))
continue;
kvm_apic_local_deliver(vcpu, APIC_LVT0);
}
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 3/3] KVM: x86: Optimize NMI watchdog delivery
2008-10-15 14:27 [PATCH 0/3] KVM: x86: Fix and optimize in-kernel NMI watchdog support Jan Kiszka
2008-10-15 14:27 ` [PATCH 1/3] KVM: x86: Relax accept conditions of kvm_apic_accept_pic_intr Jan Kiszka
2008-10-15 14:27 ` [PATCH 2/3] KVM: x86: Dont deliver PIT IRQs to masked LVT0s Jan Kiszka
@ 2008-10-15 14:27 ` Jan Kiszka
2008-10-17 17:06 ` Sheng Yang
2008-10-19 11:13 ` [PATCH 0/3] KVM: x86: Fix and optimize in-kernel NMI watchdog support Avi Kivity
3 siblings, 1 reply; 30+ messages in thread
From: Jan Kiszka @ 2008-10-15 14:27 UTC (permalink / raw)
To: kvm; +Cc: avi, jiajun.xu, sheng, Jan Kiszka
[-- Attachment #1: kvm-x86-optimize-NMI-watchdog-delivery.patch --]
[-- Type: text/plain, Size: 3150 bytes --]
As suggested by Avi, this patch introduces a counter of VCPUs that have
LVT0 set to NMI mode. Only if the counter > 0, we push the PIT ticks via
all LAPIC LVT0 lines to enable NMI watchdog support.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
arch/x86/kvm/i8254.c | 13 +++++++------
arch/x86/kvm/lapic.c | 23 ++++++++++++++++++++---
include/asm-x86/kvm_host.h | 1 +
3 files changed, 28 insertions(+), 9 deletions(-)
Index: b/arch/x86/kvm/i8254.c
===================================================================
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -607,12 +607,13 @@ static void __inject_pit_timer_intr(stru
* The route is: PIT -> PIC -> LVT0 in NMI mode,
* timer IRQs will continue to flow through the IOAPIC.
*/
- for (i = 0; i < KVM_MAX_VCPUS; ++i) {
- vcpu = kvm->vcpus[i];
- if (!vcpu || !kvm_apic_accept_pic_intr(vcpu))
- continue;
- kvm_apic_local_deliver(vcpu, APIC_LVT0);
- }
+ if (kvm->arch.vapics_in_nmi_mode > 0)
+ for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+ vcpu = kvm->vcpus[i];
+ if (!vcpu || !kvm_apic_accept_pic_intr(vcpu))
+ continue;
+ kvm_apic_local_deliver(vcpu, APIC_LVT0);
+ }
}
void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu)
Index: b/arch/x86/kvm/lapic.c
===================================================================
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -130,6 +130,11 @@ static inline int apic_lvtt_period(struc
return apic_get_reg(apic, APIC_LVTT) & APIC_LVT_TIMER_PERIODIC;
}
+static inline int apic_lvt_nmi_mode(u32 lvt_val)
+{
+ return (lvt_val & (APIC_MODE_MASK | APIC_LVT_MASKED)) == APIC_DM_NMI;
+}
+
static unsigned int apic_lvt_mask[APIC_LVT_NUM] = {
LVT_MASK | APIC_LVT_TIMER_PERIODIC, /* LVTT */
LVT_MASK | APIC_MODE_MASK, /* LVTTHMR */
@@ -672,6 +677,20 @@ static void start_apic_timer(struct kvm_
apic->timer.period)));
}
+static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val)
+{
+ int nmi_wd_enabled = apic_lvt_nmi_mode(apic_get_reg(apic, APIC_LVT0));
+
+ if (apic_lvt_nmi_mode(lvt0_val)) {
+ if (!nmi_wd_enabled) {
+ apic_debug("Receive NMI setting on APIC_LVT0 "
+ "for cpu %d\n", apic->vcpu->vcpu_id);
+ apic->vcpu->kvm->arch.vapics_in_nmi_mode++;
+ }
+ } else if (nmi_wd_enabled)
+ apic->vcpu->kvm->arch.vapics_in_nmi_mode--;
+}
+
static void apic_mmio_write(struct kvm_io_device *this,
gpa_t address, int len, const void *data)
{
@@ -753,9 +772,7 @@ static void apic_mmio_write(struct kvm_i
break;
case APIC_LVT0:
- if (val == APIC_DM_NMI)
- apic_debug("Receive NMI setting on APIC_LVT0 "
- "for cpu %d\n", apic->vcpu->vcpu_id);
+ apic_manage_nmi_watchdog(apic, val);
case APIC_LVTT:
case APIC_LVTTHMR:
case APIC_LVTPC:
Index: b/include/asm-x86/kvm_host.h
===================================================================
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -359,6 +359,7 @@ struct kvm_arch{
struct kvm_ioapic *vioapic;
struct kvm_pit *vpit;
struct hlist_head irq_ack_notifier_list;
+ int vapics_in_nmi_mode;
int round_robin_prev_vcpu;
unsigned int tss_addr;
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/3] KVM: x86: Relax accept conditions of kvm_apic_accept_pic_intr
2008-10-15 14:27 ` [PATCH 1/3] KVM: x86: Relax accept conditions of kvm_apic_accept_pic_intr Jan Kiszka
@ 2008-10-17 5:11 ` Sheng Yang
2008-10-17 8:10 ` Jan Kiszka
2008-10-17 15:31 ` Xu, Jiajun
0 siblings, 2 replies; 30+ messages in thread
From: Sheng Yang @ 2008-10-17 5:11 UTC (permalink / raw)
To: kvm; +Cc: Jan Kiszka, avi, jiajun.xu
On Wednesday 15 October 2008 22:27:49 Jan Kiszka wrote:
> Aligning in-kernel kvm_apic_accept_pic_intr with its user space mate,
> this patch relaxes the conditions under which PIC IRQs are accepted
> by LVT0. This reflects reality and allows to reuse the service for the
> NMI watchdog use case.
>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
> arch/x86/kvm/lapic.c | 13 ++++---------
> 1 file changed, 4 insertions(+), 9 deletions(-)
>
> Index: b/arch/x86/kvm/lapic.c
> ===================================================================
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1072,16 +1072,11 @@ int kvm_apic_has_interrupt(struct kvm_vc
> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
> {
> u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
> - int r = 0;
>
> - if (vcpu->vcpu_id == 0) {
> - if (!apic_hw_enabled(vcpu->arch.apic))
> - r = 1;
> - if ((lvt0 & APIC_LVT_MASKED) == 0 &&
> - GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
> - r = 1;
> - }
> - return r;
> + if (!apic_hw_enabled(vcpu->arch.apic) ||
> + (lvt0 & APIC_LVT_MASKED) == 0)
> + return 1;
> + return 0;
> }
>
> void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
>
(sorry for late review...)
Thanks to find out the root cause of BSOD!
But I am a little concern about this change. As you know, PIC only connect to
cpu0. So I think it's not proper to make it generic.
Maybe you can use kvm_apic_accept_pic_intr(vcpu0) in later patch?
--
regards
Yang, Sheng
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/3] KVM: x86: Relax accept conditions of kvm_apic_accept_pic_intr
2008-10-17 5:11 ` Sheng Yang
@ 2008-10-17 8:10 ` Jan Kiszka
2008-10-17 16:35 ` Sheng Yang
2008-10-17 15:31 ` Xu, Jiajun
1 sibling, 1 reply; 30+ messages in thread
From: Jan Kiszka @ 2008-10-17 8:10 UTC (permalink / raw)
To: Sheng Yang; +Cc: kvm, avi, jiajun.xu
Sheng Yang wrote:
> On Wednesday 15 October 2008 22:27:49 Jan Kiszka wrote:
>> Aligning in-kernel kvm_apic_accept_pic_intr with its user space mate,
>> this patch relaxes the conditions under which PIC IRQs are accepted
>> by LVT0. This reflects reality and allows to reuse the service for the
>> NMI watchdog use case.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>> arch/x86/kvm/lapic.c | 13 ++++---------
>> 1 file changed, 4 insertions(+), 9 deletions(-)
>>
>> Index: b/arch/x86/kvm/lapic.c
>> ===================================================================
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -1072,16 +1072,11 @@ int kvm_apic_has_interrupt(struct kvm_vc
>> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
>> {
>> u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
>> - int r = 0;
>>
>> - if (vcpu->vcpu_id == 0) {
>> - if (!apic_hw_enabled(vcpu->arch.apic))
>> - r = 1;
>> - if ((lvt0 & APIC_LVT_MASKED) == 0 &&
>> - GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
>> - r = 1;
>> - }
>> - return r;
>> + if (!apic_hw_enabled(vcpu->arch.apic) ||
>> + (lvt0 & APIC_LVT_MASKED) == 0)
>> + return 1;
>> + return 0;
>> }
>>
>> void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
>>
> (sorry for late review...)
>
> Thanks to find out the root cause of BSOD!
>
> But I am a little concern about this change. As you know, PIC only connect to
> cpu0. So I think it's not proper to make it generic.
I don't think so - and if it were true, qemu would have a bug then, see
its corresponding code.
>
> Maybe you can use kvm_apic_accept_pic_intr(vcpu0) in later patch?
Sorry, don't get what you mean with this (independent of the above).
Jan
--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/3] KVM: x86: Dont deliver PIT IRQs to masked LVT0s
2008-10-15 14:27 ` [PATCH 2/3] KVM: x86: Dont deliver PIT IRQs to masked LVT0s Jan Kiszka
@ 2008-10-17 15:23 ` Alexander Graf
2008-10-17 15:37 ` Jan Kiszka
0 siblings, 1 reply; 30+ messages in thread
From: Alexander Graf @ 2008-10-17 15:23 UTC (permalink / raw)
To: Jan Kiszka; +Cc: kvm, avi, jiajun.xu, sheng
On 15.10.2008, at 16:27, Jan Kiszka wrote:
> This fixes Windows 64-bit boot regressions: PIT IRQs must not be
> delivered via LVT0 lines if they are masked.
Hum, I still run into issues when booting the free Hyper-V system (or
Windows Server 2008). It BSODs on bootup in 9 out of 10 cases, but
boots just fine with -no-kvm-pit, even though I have your patches
applied.
Any ideas?
Alex
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [PATCH 1/3] KVM: x86: Relax accept conditions of kvm_apic_accept_pic_intr
2008-10-17 5:11 ` Sheng Yang
2008-10-17 8:10 ` Jan Kiszka
@ 2008-10-17 15:31 ` Xu, Jiajun
1 sibling, 0 replies; 30+ messages in thread
From: Xu, Jiajun @ 2008-10-17 15:31 UTC (permalink / raw)
To: 'Sheng Yang', 'kvm@vger.kernel.org'
Cc: 'Jan Kiszka', 'avi@redhat.com'
On Friday, October 17, 2008 1:11 PM Sheng Yang wrote:
> On Wednesday 15 October 2008 22:27:49 Jan Kiszka wrote:
>> Aligning in-kernel kvm_apic_accept_pic_intr with its user space mate,
>> this patch relaxes the conditions under which PIC IRQs are accepted
>> by LVT0. This reflects reality and allows to reuse the service for
>> the NMI watchdog use case.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>> arch/x86/kvm/lapic.c | 13 ++++---------
>> 1 file changed, 4 insertions(+), 9 deletions(-)
>>
>> Index: b/arch/x86/kvm/lapic.c
>> ===================================================================
>> --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c
>> @@ -1072,16 +1072,11 @@ int kvm_apic_has_interrupt(struct kvm_vc
>> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu) {
>> u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0); - int
>> r = 0;
>>
>> - if (vcpu->vcpu_id == 0) {
>> - if (!apic_hw_enabled(vcpu->arch.apic))
>> - r = 1;
>> - if ((lvt0 & APIC_LVT_MASKED) == 0 &&
>> - GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
>> - r = 1;
>> - }
>> - return r;
>> + if (!apic_hw_enabled(vcpu->arch.apic) ||
>> + (lvt0 & APIC_LVT_MASKED) == 0)
>> + return 1;
>> + return 0;
>> }
>>
>> void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
>>
> (sorry for late review...)
>
> Thanks to find out the root cause of BSOD!
With the patch, we can boot up 64bit windows (XP, Win2k3, Vista, Win2k8) without BSOD now.
Best Regards
Jiajun
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/3] KVM: x86: Dont deliver PIT IRQs to masked LVT0s
2008-10-17 15:23 ` Alexander Graf
@ 2008-10-17 15:37 ` Jan Kiszka
2008-10-17 15:44 ` Alexander Graf
0 siblings, 1 reply; 30+ messages in thread
From: Jan Kiszka @ 2008-10-17 15:37 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, avi, jiajun.xu, sheng
Alexander Graf wrote:
>
> On 15.10.2008, at 16:27, Jan Kiszka wrote:
>
>> This fixes Windows 64-bit boot regressions: PIT IRQs must not be
>> delivered via LVT0 lines if they are masked.
>
> Hum, I still run into issues when booting the free Hyper-V system (or
> Windows Server 2008). It BSODs on bootup in 9 out of 10 cases, but boots
> just fine with -no-kvm-pit, even though I have your patches applied.
>
> Any ideas?
Grmbl. What is the last working commit? And does commenting out the PIT
forwarding in __inject_pit_timer_intr change the picture?
"Free Hyper-V" - is this something I can simply download somewhere? URLs
welcome. My colleague with all the MSDN CDs is out of office.
Jan
--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/3] KVM: x86: Dont deliver PIT IRQs to masked LVT0s
2008-10-17 15:37 ` Jan Kiszka
@ 2008-10-17 15:44 ` Alexander Graf
2008-10-17 18:14 ` Alexander Graf
0 siblings, 1 reply; 30+ messages in thread
From: Alexander Graf @ 2008-10-17 15:44 UTC (permalink / raw)
To: Jan Kiszka; +Cc: kvm, avi, jiajun.xu, sheng
On 17.10.2008, at 17:37, Jan Kiszka wrote:
> Alexander Graf wrote:
>>
>> On 15.10.2008, at 16:27, Jan Kiszka wrote:
>>
>>> This fixes Windows 64-bit boot regressions: PIT IRQs must not be
>>> delivered via LVT0 lines if they are masked.
>>
>> Hum, I still run into issues when booting the free Hyper-V system (or
>> Windows Server 2008). It BSODs on bootup in 9 out of 10 cases, but
>> boots
>> just fine with -no-kvm-pit, even though I have your patches applied.
>>
>> Any ideas?
>
> Grmbl. What is the last working commit? And does commenting out the
> PIT
> forwarding in __inject_pit_timer_intr change the picture?
I'll try to bisect things later on - but I need to do other stuff
before that, so don't expect anything before monday.
> "Free Hyper-V" - is this something I can simply download somewhere?
> URLs
> welcome. My colleague with all the MSDN CDs is out of office.
Yes, you can just download it at http://www.microsoft.com/servers/hyper-v-server/how-to-get.mspx
. I only installed it with my nested SVM patches applied, but I guess
it should install without too. The installed system definitely runs
without.
Alex
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/3] KVM: x86: Relax accept conditions of kvm_apic_accept_pic_intr
2008-10-17 8:10 ` Jan Kiszka
@ 2008-10-17 16:35 ` Sheng Yang
2008-10-17 17:35 ` Jan Kiszka
0 siblings, 1 reply; 30+ messages in thread
From: Sheng Yang @ 2008-10-17 16:35 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Sheng Yang, kvm, avi, jiajun.xu
On Fri, Oct 17, 2008 at 10:10:54AM +0200, Jan Kiszka wrote:
> Sheng Yang wrote:
> > On Wednesday 15 October 2008 22:27:49 Jan Kiszka wrote:
> >> Aligning in-kernel kvm_apic_accept_pic_intr with its user space mate,
> >> this patch relaxes the conditions under which PIC IRQs are accepted
> >> by LVT0. This reflects reality and allows to reuse the service for the
> >> NMI watchdog use case.
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >> ---
> >> arch/x86/kvm/lapic.c | 13 ++++---------
> >> 1 file changed, 4 insertions(+), 9 deletions(-)
> >>
> >> Index: b/arch/x86/kvm/lapic.c
> >> ===================================================================
> >> --- a/arch/x86/kvm/lapic.c
> >> +++ b/arch/x86/kvm/lapic.c
> >> @@ -1072,16 +1072,11 @@ int kvm_apic_has_interrupt(struct kvm_vc
> >> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
> >> {
> >> u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
> >> - int r = 0;
> >>
> >> - if (vcpu->vcpu_id == 0) {
> >> - if (!apic_hw_enabled(vcpu->arch.apic))
> >> - r = 1;
> >> - if ((lvt0 & APIC_LVT_MASKED) == 0 &&
> >> - GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
> >> - r = 1;
> >> - }
> >> - return r;
> >> + if (!apic_hw_enabled(vcpu->arch.apic) ||
> >> + (lvt0 & APIC_LVT_MASKED) == 0)
> >> + return 1;
> >> + return 0;
> >> }
> >>
> >> void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
> >>
> > (sorry for late review...)
> >
> > Thanks to find out the root cause of BSOD!
> >
> > But I am a little concern about this change. As you know, PIC only connect to
> > cpu0. So I think it's not proper to make it generic.
>
> I don't think so - and if it were true, qemu would have a bug then, see
> its corresponding code.
You can refer to Intel MP spec, virtual wire mode. Google
"MP spec" can find it.
Normally PIC is only used in BSP boot up for SMP guest(PIC can't afford SMP,
otherwise we won't need IOAPIC/LAPIC). After that, it should be disabled.
And virtual wire mode works with APIC_MODE_EXTINT on LVT0 of BSP lapic, so
that's why you see
GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT
KVM follow virtual wire mode exactly.
For QEmu, it just check if lapic LVT0 is masked, and don't check vcpu0.
That's indeed a little problematic, for it's not that sufficient to
determine if it's programmed as virtual wire mode and used for deliver
interrupts from PIC. Well, in most condition, it can work. But maybe
it's not clean in logic.
For NMI watchdog here, we use a little more tricky way other than normal
PIC/LAPIC interaction. IIRC, NMI watchdog don't mask PIC after enable
IOAPIC, it also don't mask LVT0 of every LAPIC. It use physical connection
of PIT to PIC then to LAPIC LVT0 to send NMI. Program LVT0 to NMI, then
every PIT interrupt would go through PIC, arrive at LVT0, trig a NMI.
So I think the key problem for Windows is, they don't need it, but we send
the NMIs. We send the NMI when LVT0 is masked. Base on this, I think your
optimize patch also can resolve this issue? It's already including necessary
judgment. We will try it next week.
--
regards
Yang, Sheng
>
> >
> > Maybe you can use kvm_apic_accept_pic_intr(vcpu0) in later patch?
>
> Sorry, don't get what you mean with this (independent of the above).
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT SE 2
> Corporate Competence Center Embedded Linux
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/3] KVM: x86: Optimize NMI watchdog delivery
2008-10-15 14:27 ` [PATCH 3/3] KVM: x86: Optimize NMI watchdog delivery Jan Kiszka
@ 2008-10-17 17:06 ` Sheng Yang
2008-10-17 17:23 ` Jan Kiszka
0 siblings, 1 reply; 30+ messages in thread
From: Sheng Yang @ 2008-10-17 17:06 UTC (permalink / raw)
To: Jan Kiszka; +Cc: kvm, avi, jiajun.xu, sheng
On Wed, Oct 15, 2008 at 04:27:51PM +0200, Jan Kiszka wrote:
> As suggested by Avi, this patch introduces a counter of VCPUs that have
> LVT0 set to NMI mode. Only if the counter > 0, we push the PIT ticks via
> all LAPIC LVT0 lines to enable NMI watchdog support.
>
I feel a little strange about: if *counter > 0*, we push to *all*. Can we
only push NMIs to the ones that set NMI for LVT0?
How about add a field in struct kvm_lapic? We can quickly know if we need to
inject NMI for this vcpu. Well, though kernel mostly enable NMI watchdog on
all vcpu, I think this is more precise, and match the logic, and avoid one
more field in kvm_arch...
--
regards
Yang, Sheng
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
> arch/x86/kvm/i8254.c | 13 +++++++------
> arch/x86/kvm/lapic.c | 23 ++++++++++++++++++++---
> include/asm-x86/kvm_host.h | 1 +
> 3 files changed, 28 insertions(+), 9 deletions(-)
>
> Index: b/arch/x86/kvm/i8254.c
> ===================================================================
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -607,12 +607,13 @@ static void __inject_pit_timer_intr(stru
> * The route is: PIT -> PIC -> LVT0 in NMI mode,
> * timer IRQs will continue to flow through the IOAPIC.
> */
> - for (i = 0; i < KVM_MAX_VCPUS; ++i) {
> - vcpu = kvm->vcpus[i];
> - if (!vcpu || !kvm_apic_accept_pic_intr(vcpu))
> - continue;
> - kvm_apic_local_deliver(vcpu, APIC_LVT0);
> - }
> + if (kvm->arch.vapics_in_nmi_mode > 0)
> + for (i = 0; i < KVM_MAX_VCPUS; ++i) {
> + vcpu = kvm->vcpus[i];
> + if (!vcpu || !kvm_apic_accept_pic_intr(vcpu))
> + continue;
> + kvm_apic_local_deliver(vcpu, APIC_LVT0);
> + }
> }
>
> void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu)
> Index: b/arch/x86/kvm/lapic.c
> ===================================================================
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -130,6 +130,11 @@ static inline int apic_lvtt_period(struc
> return apic_get_reg(apic, APIC_LVTT) & APIC_LVT_TIMER_PERIODIC;
> }
>
> +static inline int apic_lvt_nmi_mode(u32 lvt_val)
> +{
> + return (lvt_val & (APIC_MODE_MASK | APIC_LVT_MASKED)) == APIC_DM_NMI;
> +}
> +
> static unsigned int apic_lvt_mask[APIC_LVT_NUM] = {
> LVT_MASK | APIC_LVT_TIMER_PERIODIC, /* LVTT */
> LVT_MASK | APIC_MODE_MASK, /* LVTTHMR */
> @@ -672,6 +677,20 @@ static void start_apic_timer(struct kvm_
> apic->timer.period)));
> }
>
> +static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val)
> +{
> + int nmi_wd_enabled = apic_lvt_nmi_mode(apic_get_reg(apic, APIC_LVT0));
> +
> + if (apic_lvt_nmi_mode(lvt0_val)) {
> + if (!nmi_wd_enabled) {
> + apic_debug("Receive NMI setting on APIC_LVT0 "
> + "for cpu %d\n", apic->vcpu->vcpu_id);
> + apic->vcpu->kvm->arch.vapics_in_nmi_mode++;
> + }
> + } else if (nmi_wd_enabled)
> + apic->vcpu->kvm->arch.vapics_in_nmi_mode--;
> +}
> +
> static void apic_mmio_write(struct kvm_io_device *this,
> gpa_t address, int len, const void *data)
> {
> @@ -753,9 +772,7 @@ static void apic_mmio_write(struct kvm_i
> break;
>
> case APIC_LVT0:
> - if (val == APIC_DM_NMI)
> - apic_debug("Receive NMI setting on APIC_LVT0 "
> - "for cpu %d\n", apic->vcpu->vcpu_id);
> + apic_manage_nmi_watchdog(apic, val);
> case APIC_LVTT:
> case APIC_LVTTHMR:
> case APIC_LVTPC:
> Index: b/include/asm-x86/kvm_host.h
> ===================================================================
> --- a/include/asm-x86/kvm_host.h
> +++ b/include/asm-x86/kvm_host.h
> @@ -359,6 +359,7 @@ struct kvm_arch{
> struct kvm_ioapic *vioapic;
> struct kvm_pit *vpit;
> struct hlist_head irq_ack_notifier_list;
> + int vapics_in_nmi_mode;
>
> int round_robin_prev_vcpu;
> unsigned int tss_addr;
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/3] KVM: x86: Optimize NMI watchdog delivery
2008-10-17 17:06 ` Sheng Yang
@ 2008-10-17 17:23 ` Jan Kiszka
2008-10-17 17:34 ` Sheng Yang
0 siblings, 1 reply; 30+ messages in thread
From: Jan Kiszka @ 2008-10-17 17:23 UTC (permalink / raw)
To: sheng; +Cc: Jan Kiszka, kvm, avi, jiajun.xu
[-- Attachment #1: Type: text/plain, Size: 4881 bytes --]
Sheng Yang wrote:
> On Wed, Oct 15, 2008 at 04:27:51PM +0200, Jan Kiszka wrote:
>> As suggested by Avi, this patch introduces a counter of VCPUs that have
>> LVT0 set to NMI mode. Only if the counter > 0, we push the PIT ticks via
>> all LAPIC LVT0 lines to enable NMI watchdog support.
>>
>
> I feel a little strange about: if *counter > 0*, we push to *all*. Can we
> only push NMIs to the ones that set NMI for LVT0?
We don't do that due to !kvm_apic_accept_pic_intr(). The counter is only
about optimizing that case where we don't have to walk the whole chain,
asking every vcpu if it would like to receive the IRQ.
>
> How about add a field in struct kvm_lapic? We can quickly know if we need to
> inject NMI for this vcpu. Well, though kernel mostly enable NMI watchdog on
> all vcpu, I think this is more precise, and match the logic, and avoid one
> more field in kvm_arch...
The point of this patch is to avoid touching vcpu structures AT ALL when
there is no interest in the NMI watchdog (normally, OSes will either
enable the WD trick for all CPUSs or keep it off).
Jan
>
> --
> regards
> Yang, Sheng
>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>> arch/x86/kvm/i8254.c | 13 +++++++------
>> arch/x86/kvm/lapic.c | 23 ++++++++++++++++++++---
>> include/asm-x86/kvm_host.h | 1 +
>> 3 files changed, 28 insertions(+), 9 deletions(-)
>>
>> Index: b/arch/x86/kvm/i8254.c
>> ===================================================================
>> --- a/arch/x86/kvm/i8254.c
>> +++ b/arch/x86/kvm/i8254.c
>> @@ -607,12 +607,13 @@ static void __inject_pit_timer_intr(stru
>> * The route is: PIT -> PIC -> LVT0 in NMI mode,
>> * timer IRQs will continue to flow through the IOAPIC.
>> */
>> - for (i = 0; i < KVM_MAX_VCPUS; ++i) {
>> - vcpu = kvm->vcpus[i];
>> - if (!vcpu || !kvm_apic_accept_pic_intr(vcpu))
>> - continue;
>> - kvm_apic_local_deliver(vcpu, APIC_LVT0);
>> - }
>> + if (kvm->arch.vapics_in_nmi_mode > 0)
>> + for (i = 0; i < KVM_MAX_VCPUS; ++i) {
>> + vcpu = kvm->vcpus[i];
>> + if (!vcpu || !kvm_apic_accept_pic_intr(vcpu))
>> + continue;
>> + kvm_apic_local_deliver(vcpu, APIC_LVT0);
>> + }
>> }
>>
>> void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu)
>> Index: b/arch/x86/kvm/lapic.c
>> ===================================================================
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -130,6 +130,11 @@ static inline int apic_lvtt_period(struc
>> return apic_get_reg(apic, APIC_LVTT) & APIC_LVT_TIMER_PERIODIC;
>> }
>>
>> +static inline int apic_lvt_nmi_mode(u32 lvt_val)
>> +{
>> + return (lvt_val & (APIC_MODE_MASK | APIC_LVT_MASKED)) == APIC_DM_NMI;
>> +}
>> +
>> static unsigned int apic_lvt_mask[APIC_LVT_NUM] = {
>> LVT_MASK | APIC_LVT_TIMER_PERIODIC, /* LVTT */
>> LVT_MASK | APIC_MODE_MASK, /* LVTTHMR */
>> @@ -672,6 +677,20 @@ static void start_apic_timer(struct kvm_
>> apic->timer.period)));
>> }
>>
>> +static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val)
>> +{
>> + int nmi_wd_enabled = apic_lvt_nmi_mode(apic_get_reg(apic, APIC_LVT0));
>> +
>> + if (apic_lvt_nmi_mode(lvt0_val)) {
>> + if (!nmi_wd_enabled) {
>> + apic_debug("Receive NMI setting on APIC_LVT0 "
>> + "for cpu %d\n", apic->vcpu->vcpu_id);
>> + apic->vcpu->kvm->arch.vapics_in_nmi_mode++;
>> + }
>> + } else if (nmi_wd_enabled)
>> + apic->vcpu->kvm->arch.vapics_in_nmi_mode--;
>> +}
>> +
>> static void apic_mmio_write(struct kvm_io_device *this,
>> gpa_t address, int len, const void *data)
>> {
>> @@ -753,9 +772,7 @@ static void apic_mmio_write(struct kvm_i
>> break;
>>
>> case APIC_LVT0:
>> - if (val == APIC_DM_NMI)
>> - apic_debug("Receive NMI setting on APIC_LVT0 "
>> - "for cpu %d\n", apic->vcpu->vcpu_id);
>> + apic_manage_nmi_watchdog(apic, val);
>> case APIC_LVTT:
>> case APIC_LVTTHMR:
>> case APIC_LVTPC:
>> Index: b/include/asm-x86/kvm_host.h
>> ===================================================================
>> --- a/include/asm-x86/kvm_host.h
>> +++ b/include/asm-x86/kvm_host.h
>> @@ -359,6 +359,7 @@ struct kvm_arch{
>> struct kvm_ioapic *vioapic;
>> struct kvm_pit *vpit;
>> struct hlist_head irq_ack_notifier_list;
>> + int vapics_in_nmi_mode;
>>
>> int round_robin_prev_vcpu;
>> unsigned int tss_addr;
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/3] KVM: x86: Optimize NMI watchdog delivery
2008-10-17 17:23 ` Jan Kiszka
@ 2008-10-17 17:34 ` Sheng Yang
2008-10-17 17:40 ` Jan Kiszka
0 siblings, 1 reply; 30+ messages in thread
From: Sheng Yang @ 2008-10-17 17:34 UTC (permalink / raw)
To: Jan Kiszka; +Cc: sheng, Jan Kiszka, kvm, avi, jiajun.xu
On Fri, Oct 17, 2008 at 07:23:01PM +0200, Jan Kiszka wrote:
> Sheng Yang wrote:
> > On Wed, Oct 15, 2008 at 04:27:51PM +0200, Jan Kiszka wrote:
> >> As suggested by Avi, this patch introduces a counter of VCPUs that have
> >> LVT0 set to NMI mode. Only if the counter > 0, we push the PIT ticks via
> >> all LAPIC LVT0 lines to enable NMI watchdog support.
> >>
> >
> > I feel a little strange about: if *counter > 0*, we push to *all*. Can we
> > only push NMIs to the ones that set NMI for LVT0?
>
> We don't do that due to !kvm_apic_accept_pic_intr(). The counter is only
> about optimizing that case where we don't have to walk the whole chain,
> asking every vcpu if it would like to receive the IRQ.
I don't agree to use kvm_apic_accept_pic_intr() here, as I explained in the
first mail. It's not a normal path, and current KVM handle it well.
>
> >
> > How about add a field in struct kvm_lapic? We can quickly know if we need to
> > inject NMI for this vcpu. Well, though kernel mostly enable NMI watchdog on
> > all vcpu, I think this is more precise, and match the logic, and avoid one
> > more field in kvm_arch...
>
> The point of this patch is to avoid touching vcpu structures AT ALL when
> there is no interest in the NMI watchdog (normally, OSes will either
> enable the WD trick for all CPUSs or keep it off).
Logically, I think lapic is more proper place. And put a bool there won't
affect much. I think we can do it more straightly here.
--
regards
Yang, Sheng
>
> Jan
>
> >
> > --
> > regards
> > Yang, Sheng
> >
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >> ---
> >> arch/x86/kvm/i8254.c | 13 +++++++------
> >> arch/x86/kvm/lapic.c | 23 ++++++++++++++++++++---
> >> include/asm-x86/kvm_host.h | 1 +
> >> 3 files changed, 28 insertions(+), 9 deletions(-)
> >>
> >> Index: b/arch/x86/kvm/i8254.c
> >> ===================================================================
> >> --- a/arch/x86/kvm/i8254.c
> >> +++ b/arch/x86/kvm/i8254.c
> >> @@ -607,12 +607,13 @@ static void __inject_pit_timer_intr(stru
> >> * The route is: PIT -> PIC -> LVT0 in NMI mode,
> >> * timer IRQs will continue to flow through the IOAPIC.
> >> */
> >> - for (i = 0; i < KVM_MAX_VCPUS; ++i) {
> >> - vcpu = kvm->vcpus[i];
> >> - if (!vcpu || !kvm_apic_accept_pic_intr(vcpu))
> >> - continue;
> >> - kvm_apic_local_deliver(vcpu, APIC_LVT0);
> >> - }
> >> + if (kvm->arch.vapics_in_nmi_mode > 0)
> >> + for (i = 0; i < KVM_MAX_VCPUS; ++i) {
> >> + vcpu = kvm->vcpus[i];
> >> + if (!vcpu || !kvm_apic_accept_pic_intr(vcpu))
> >> + continue;
> >> + kvm_apic_local_deliver(vcpu, APIC_LVT0);
> >> + }
> >> }
> >>
> >> void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu)
> >> Index: b/arch/x86/kvm/lapic.c
> >> ===================================================================
> >> --- a/arch/x86/kvm/lapic.c
> >> +++ b/arch/x86/kvm/lapic.c
> >> @@ -130,6 +130,11 @@ static inline int apic_lvtt_period(struc
> >> return apic_get_reg(apic, APIC_LVTT) & APIC_LVT_TIMER_PERIODIC;
> >> }
> >>
> >> +static inline int apic_lvt_nmi_mode(u32 lvt_val)
> >> +{
> >> + return (lvt_val & (APIC_MODE_MASK | APIC_LVT_MASKED)) == APIC_DM_NMI;
> >> +}
> >> +
> >> static unsigned int apic_lvt_mask[APIC_LVT_NUM] = {
> >> LVT_MASK | APIC_LVT_TIMER_PERIODIC, /* LVTT */
> >> LVT_MASK | APIC_MODE_MASK, /* LVTTHMR */
> >> @@ -672,6 +677,20 @@ static void start_apic_timer(struct kvm_
> >> apic->timer.period)));
> >> }
> >>
> >> +static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val)
> >> +{
> >> + int nmi_wd_enabled = apic_lvt_nmi_mode(apic_get_reg(apic, APIC_LVT0));
> >> +
> >> + if (apic_lvt_nmi_mode(lvt0_val)) {
> >> + if (!nmi_wd_enabled) {
> >> + apic_debug("Receive NMI setting on APIC_LVT0 "
> >> + "for cpu %d\n", apic->vcpu->vcpu_id);
> >> + apic->vcpu->kvm->arch.vapics_in_nmi_mode++;
> >> + }
> >> + } else if (nmi_wd_enabled)
> >> + apic->vcpu->kvm->arch.vapics_in_nmi_mode--;
> >> +}
> >> +
> >> static void apic_mmio_write(struct kvm_io_device *this,
> >> gpa_t address, int len, const void *data)
> >> {
> >> @@ -753,9 +772,7 @@ static void apic_mmio_write(struct kvm_i
> >> break;
> >>
> >> case APIC_LVT0:
> >> - if (val == APIC_DM_NMI)
> >> - apic_debug("Receive NMI setting on APIC_LVT0 "
> >> - "for cpu %d\n", apic->vcpu->vcpu_id);
> >> + apic_manage_nmi_watchdog(apic, val);
> >> case APIC_LVTT:
> >> case APIC_LVTTHMR:
> >> case APIC_LVTPC:
> >> Index: b/include/asm-x86/kvm_host.h
> >> ===================================================================
> >> --- a/include/asm-x86/kvm_host.h
> >> +++ b/include/asm-x86/kvm_host.h
> >> @@ -359,6 +359,7 @@ struct kvm_arch{
> >> struct kvm_ioapic *vioapic;
> >> struct kvm_pit *vpit;
> >> struct hlist_head irq_ack_notifier_list;
> >> + int vapics_in_nmi_mode;
> >>
> >> int round_robin_prev_vcpu;
> >> unsigned int tss_addr;
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe kvm" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/3] KVM: x86: Relax accept conditions of kvm_apic_accept_pic_intr
2008-10-17 16:35 ` Sheng Yang
@ 2008-10-17 17:35 ` Jan Kiszka
2008-10-17 17:47 ` Sheng Yang
0 siblings, 1 reply; 30+ messages in thread
From: Jan Kiszka @ 2008-10-17 17:35 UTC (permalink / raw)
To: Sheng Yang; +Cc: Jan Kiszka, kvm, avi, jiajun.xu
[-- Attachment #1: Type: text/plain, Size: 4263 bytes --]
Sheng Yang wrote:
> On Fri, Oct 17, 2008 at 10:10:54AM +0200, Jan Kiszka wrote:
>> Sheng Yang wrote:
>>> On Wednesday 15 October 2008 22:27:49 Jan Kiszka wrote:
>>>> Aligning in-kernel kvm_apic_accept_pic_intr with its user space mate,
>>>> this patch relaxes the conditions under which PIC IRQs are accepted
>>>> by LVT0. This reflects reality and allows to reuse the service for the
>>>> NMI watchdog use case.
>>>>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>> ---
>>>> arch/x86/kvm/lapic.c | 13 ++++---------
>>>> 1 file changed, 4 insertions(+), 9 deletions(-)
>>>>
>>>> Index: b/arch/x86/kvm/lapic.c
>>>> ===================================================================
>>>> --- a/arch/x86/kvm/lapic.c
>>>> +++ b/arch/x86/kvm/lapic.c
>>>> @@ -1072,16 +1072,11 @@ int kvm_apic_has_interrupt(struct kvm_vc
>>>> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
>>>> {
>>>> u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
>>>> - int r = 0;
>>>>
>>>> - if (vcpu->vcpu_id == 0) {
>>>> - if (!apic_hw_enabled(vcpu->arch.apic))
>>>> - r = 1;
>>>> - if ((lvt0 & APIC_LVT_MASKED) == 0 &&
>>>> - GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
>>>> - r = 1;
>>>> - }
>>>> - return r;
>>>> + if (!apic_hw_enabled(vcpu->arch.apic) ||
>>>> + (lvt0 & APIC_LVT_MASKED) == 0)
>>>> + return 1;
>>>> + return 0;
>>>> }
>>>>
>>>> void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
>>>>
>>> (sorry for late review...)
>>>
>>> Thanks to find out the root cause of BSOD!
>>>
>>> But I am a little concern about this change. As you know, PIC only connect to
>>> cpu0. So I think it's not proper to make it generic.
>> I don't think so - and if it were true, qemu would have a bug then, see
>> its corresponding code.
>
> You can refer to Intel MP spec, virtual wire mode. Google
> "MP spec" can find it.
Ah, good reference.
>
> Normally PIC is only used in BSP boot up for SMP guest(PIC can't afford SMP,
> otherwise we won't need IOAPIC/LAPIC). After that, it should be disabled.
> And virtual wire mode works with APIC_MODE_EXTINT on LVT0 of BSP lapic, so
> that's why you see
>
> GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT
>
> KVM follow virtual wire mode exactly.
According to my understanding of the spec, the virtual wire mode means
that the PIC signal is delivered via LVT0, and thus can be received by
_all_ CPUs in the system. However, only the BSP usually enables LVT0,
thus is receiving the IRQ. When Linux switches to NMI watchdog mode 1,
it also unmasks the other CPUs (and reprograms all to deliver NMIs
instead of EXTINTs).
Then there is also the "PIC Mode", ie. direct delivery to the BSP, and
only the latter. That mode is obviously target by the current
kvm_apic_accept_pic_intr implementation. But I find no indication in the
spec yet that both modes cannot exists in the same system. But I also
fail to understand how one could switch between both modes (via software).
>
> For QEmu, it just check if lapic LVT0 is masked, and don't check vcpu0.
> That's indeed a little problematic, for it's not that sufficient to
> determine if it's programmed as virtual wire mode and used for deliver
> interrupts from PIC. Well, in most condition, it can work. But maybe
> it's not clean in logic.
>
> For NMI watchdog here, we use a little more tricky way other than normal
> PIC/LAPIC interaction. IIRC, NMI watchdog don't mask PIC after enable
> IOAPIC, it also don't mask LVT0 of every LAPIC. It use physical connection
> of PIT to PIC then to LAPIC LVT0 to send NMI. Program LVT0 to NMI, then
> every PIT interrupt would go through PIC, arrive at LVT0, trig a NMI.
>
> So I think the key problem for Windows is, they don't need it, but we send
> the NMIs. We send the NMI when LVT0 is masked. Base on this, I think your
> optimize patch also can resolve this issue? It's already including necessary
> judgment. We will try it next week.
The key problem for Windows was most probably not NMI, but the fact that
we forwarded _any_ PIC IRQ (emulating virtual wire mode) without
checking for the LAPICs' mask state.
OK, this requires a few more thoughts and a bit more reading.
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/3] KVM: x86: Optimize NMI watchdog delivery
2008-10-17 17:34 ` Sheng Yang
@ 2008-10-17 17:40 ` Jan Kiszka
2008-10-17 18:26 ` Sheng Yang
2008-10-19 11:15 ` Avi Kivity
0 siblings, 2 replies; 30+ messages in thread
From: Jan Kiszka @ 2008-10-17 17:40 UTC (permalink / raw)
To: sheng; +Cc: Jan Kiszka, kvm, avi, jiajun.xu
[-- Attachment #1: Type: text/plain, Size: 1999 bytes --]
Sheng Yang wrote:
> On Fri, Oct 17, 2008 at 07:23:01PM +0200, Jan Kiszka wrote:
>> Sheng Yang wrote:
>>> On Wed, Oct 15, 2008 at 04:27:51PM +0200, Jan Kiszka wrote:
>>>> As suggested by Avi, this patch introduces a counter of VCPUs that have
>>>> LVT0 set to NMI mode. Only if the counter > 0, we push the PIT ticks via
>>>> all LAPIC LVT0 lines to enable NMI watchdog support.
>>>>
>>> I feel a little strange about: if *counter > 0*, we push to *all*. Can we
>>> only push NMIs to the ones that set NMI for LVT0?
>> We don't do that due to !kvm_apic_accept_pic_intr(). The counter is only
>> about optimizing that case where we don't have to walk the whole chain,
>> asking every vcpu if it would like to receive the IRQ.
>
> I don't agree to use kvm_apic_accept_pic_intr() here, as I explained in the
> first mail. It's not a normal path, and current KVM handle it well.
Current KVM only support PIC Mode, which is fine, but not sufficient for
NMI watchdog support. We need to get the Virtual Wire Mode in, but
correctly.
>>> How about add a field in struct kvm_lapic? We can quickly know if we need to
>>> inject NMI for this vcpu. Well, though kernel mostly enable NMI watchdog on
>>> all vcpu, I think this is more precise, and match the logic, and avoid one
>>> more field in kvm_arch...
>> The point of this patch is to avoid touching vcpu structures AT ALL when
>> there is no interest in the NMI watchdog (normally, OSes will either
>> enable the WD trick for all CPUSs or keep it off).
>
> Logically, I think lapic is more proper place. And put a bool there won't
> affect much. I think we can do it more straightly here.
If you have dozens of lapics, you don't want to check them all if they
are ALL switched of anyway. That information is better encoded in a
single, (virtual) system-wide bool. That's the most common case we want
to speed up. And it is the core of the optimization Avi suggested
(unless I totally misunderstood him).
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/3] KVM: x86: Relax accept conditions of kvm_apic_accept_pic_intr
2008-10-17 17:35 ` Jan Kiszka
@ 2008-10-17 17:47 ` Sheng Yang
2008-10-17 17:56 ` Jan Kiszka
0 siblings, 1 reply; 30+ messages in thread
From: Sheng Yang @ 2008-10-17 17:47 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Sheng Yang, Jan Kiszka, kvm, avi, jiajun.xu
On Fri, Oct 17, 2008 at 07:35:01PM +0200, Jan Kiszka wrote:
> Sheng Yang wrote:
> > On Fri, Oct 17, 2008 at 10:10:54AM +0200, Jan Kiszka wrote:
> >> Sheng Yang wrote:
> >>> On Wednesday 15 October 2008 22:27:49 Jan Kiszka wrote:
> >>>> Aligning in-kernel kvm_apic_accept_pic_intr with its user space mate,
> >>>> this patch relaxes the conditions under which PIC IRQs are accepted
> >>>> by LVT0. This reflects reality and allows to reuse the service for the
> >>>> NMI watchdog use case.
> >>>>
> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>> ---
> >>>> arch/x86/kvm/lapic.c | 13 ++++---------
> >>>> 1 file changed, 4 insertions(+), 9 deletions(-)
> >>>>
> >>>> Index: b/arch/x86/kvm/lapic.c
> >>>> ===================================================================
> >>>> --- a/arch/x86/kvm/lapic.c
> >>>> +++ b/arch/x86/kvm/lapic.c
> >>>> @@ -1072,16 +1072,11 @@ int kvm_apic_has_interrupt(struct kvm_vc
> >>>> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
> >>>> {
> >>>> u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
> >>>> - int r = 0;
> >>>>
> >>>> - if (vcpu->vcpu_id == 0) {
> >>>> - if (!apic_hw_enabled(vcpu->arch.apic))
> >>>> - r = 1;
> >>>> - if ((lvt0 & APIC_LVT_MASKED) == 0 &&
> >>>> - GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
> >>>> - r = 1;
> >>>> - }
> >>>> - return r;
> >>>> + if (!apic_hw_enabled(vcpu->arch.apic) ||
> >>>> + (lvt0 & APIC_LVT_MASKED) == 0)
> >>>> + return 1;
> >>>> + return 0;
> >>>> }
> >>>>
> >>>> void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
> >>>>
> >>> (sorry for late review...)
> >>>
> >>> Thanks to find out the root cause of BSOD!
> >>>
> >>> But I am a little concern about this change. As you know, PIC only connect to
> >>> cpu0. So I think it's not proper to make it generic.
> >> I don't think so - and if it were true, qemu would have a bug then, see
> >> its corresponding code.
> >
> > You can refer to Intel MP spec, virtual wire mode. Google
> > "MP spec" can find it.
>
> Ah, good reference.
>
> >
> > Normally PIC is only used in BSP boot up for SMP guest(PIC can't afford SMP,
> > otherwise we won't need IOAPIC/LAPIC). After that, it should be disabled.
> > And virtual wire mode works with APIC_MODE_EXTINT on LVT0 of BSP lapic, so
> > that's why you see
> >
> > GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT
> >
> > KVM follow virtual wire mode exactly.
>
> According to my understanding of the spec, the virtual wire mode means
> that the PIC signal is delivered via LVT0, and thus can be received by
> _all_ CPUs in the system. However, only the BSP usually enables LVT0,
> thus is receiving the IRQ. When Linux switches to NMI watchdog mode 1,
> it also unmasks the other CPUs (and reprograms all to deliver NMIs
> instead of EXTINTs).
>
> Then there is also the "PIC Mode", ie. direct delivery to the BSP, and
> only the latter. That mode is obviously target by the current
> kvm_apic_accept_pic_intr implementation. But I find no indication in the
> spec yet that both modes cannot exists in the same system. But I also
> fail to understand how one could switch between both modes (via software).
No. If so, we don't need to check LVT0.
--
regards
Yang, Sheng
>
> >
> > For QEmu, it just check if lapic LVT0 is masked, and don't check vcpu0.
> > That's indeed a little problematic, for it's not that sufficient to
> > determine if it's programmed as virtual wire mode and used for deliver
> > interrupts from PIC. Well, in most condition, it can work. But maybe
> > it's not clean in logic.
> >
> > For NMI watchdog here, we use a little more tricky way other than normal
> > PIC/LAPIC interaction. IIRC, NMI watchdog don't mask PIC after enable
> > IOAPIC, it also don't mask LVT0 of every LAPIC. It use physical connection
> > of PIT to PIC then to LAPIC LVT0 to send NMI. Program LVT0 to NMI, then
> > every PIT interrupt would go through PIC, arrive at LVT0, trig a NMI.
> >
> > So I think the key problem for Windows is, they don't need it, but we send
> > the NMIs. We send the NMI when LVT0 is masked. Base on this, I think your
> > optimize patch also can resolve this issue? It's already including necessary
> > judgment. We will try it next week.
>
> The key problem for Windows was most probably not NMI, but the fact that
> we forwarded _any_ PIC IRQ (emulating virtual wire mode) without
> checking for the LAPICs' mask state.
>
> OK, this requires a few more thoughts and a bit more reading.
>
> Jan
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/3] KVM: x86: Relax accept conditions of kvm_apic_accept_pic_intr
2008-10-17 17:47 ` Sheng Yang
@ 2008-10-17 17:56 ` Jan Kiszka
2008-10-17 18:12 ` Jan Kiszka
2008-10-17 18:15 ` Sheng Yang
0 siblings, 2 replies; 30+ messages in thread
From: Jan Kiszka @ 2008-10-17 17:56 UTC (permalink / raw)
To: Sheng Yang; +Cc: Jan Kiszka, kvm, avi, jiajun.xu
[-- Attachment #1: Type: text/plain, Size: 3764 bytes --]
Sheng Yang wrote:
> On Fri, Oct 17, 2008 at 07:35:01PM +0200, Jan Kiszka wrote:
>> Sheng Yang wrote:
>>> On Fri, Oct 17, 2008 at 10:10:54AM +0200, Jan Kiszka wrote:
>>>> Sheng Yang wrote:
>>>>> On Wednesday 15 October 2008 22:27:49 Jan Kiszka wrote:
>>>>>> Aligning in-kernel kvm_apic_accept_pic_intr with its user space mate,
>>>>>> this patch relaxes the conditions under which PIC IRQs are accepted
>>>>>> by LVT0. This reflects reality and allows to reuse the service for the
>>>>>> NMI watchdog use case.
>>>>>>
>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>> ---
>>>>>> arch/x86/kvm/lapic.c | 13 ++++---------
>>>>>> 1 file changed, 4 insertions(+), 9 deletions(-)
>>>>>>
>>>>>> Index: b/arch/x86/kvm/lapic.c
>>>>>> ===================================================================
>>>>>> --- a/arch/x86/kvm/lapic.c
>>>>>> +++ b/arch/x86/kvm/lapic.c
>>>>>> @@ -1072,16 +1072,11 @@ int kvm_apic_has_interrupt(struct kvm_vc
>>>>>> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
>>>>>> {
>>>>>> u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
>>>>>> - int r = 0;
>>>>>>
>>>>>> - if (vcpu->vcpu_id == 0) {
>>>>>> - if (!apic_hw_enabled(vcpu->arch.apic))
>>>>>> - r = 1;
>>>>>> - if ((lvt0 & APIC_LVT_MASKED) == 0 &&
>>>>>> - GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
>>>>>> - r = 1;
>>>>>> - }
>>>>>> - return r;
>>>>>> + if (!apic_hw_enabled(vcpu->arch.apic) ||
>>>>>> + (lvt0 & APIC_LVT_MASKED) == 0)
>>>>>> + return 1;
>>>>>> + return 0;
>>>>>> }
>>>>>>
>>>>>> void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
>>>>>>
>>>>> (sorry for late review...)
>>>>>
>>>>> Thanks to find out the root cause of BSOD!
>>>>>
>>>>> But I am a little concern about this change. As you know, PIC only connect to
>>>>> cpu0. So I think it's not proper to make it generic.
>>>> I don't think so - and if it were true, qemu would have a bug then, see
>>>> its corresponding code.
>>> You can refer to Intel MP spec, virtual wire mode. Google
>>> "MP spec" can find it.
>> Ah, good reference.
>>
>>> Normally PIC is only used in BSP boot up for SMP guest(PIC can't afford SMP,
>>> otherwise we won't need IOAPIC/LAPIC). After that, it should be disabled.
>>> And virtual wire mode works with APIC_MODE_EXTINT on LVT0 of BSP lapic, so
>>> that's why you see
>>>
>>> GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT
>>>
>>> KVM follow virtual wire mode exactly.
>> According to my understanding of the spec, the virtual wire mode means
>> that the PIC signal is delivered via LVT0, and thus can be received by
>> _all_ CPUs in the system. However, only the BSP usually enables LVT0,
>> thus is receiving the IRQ. When Linux switches to NMI watchdog mode 1,
>> it also unmasks the other CPUs (and reprograms all to deliver NMIs
>> instead of EXTINTs).
>>
>> Then there is also the "PIC Mode", ie. direct delivery to the BSP, and
>> only the latter. That mode is obviously target by the current
>> kvm_apic_accept_pic_intr implementation. But I find no indication in the
>> spec yet that both modes cannot exists in the same system. But I also
>> fail to understand how one could switch between both modes (via software).
>
> No. If so, we don't need to check LVT0.
OK, there is that IMCR register to enable/disable the PIC Mode - but
neither KVM nor QEMU implement it (which may indicate they both want to
provide Virtual Wire Mode?). When enabled, Virtual Wire Mode is
effectively disabled (for the BSP at least) as the LAPIC is disconnected
(from the BSP).
Still, I find not trace in the spec that says only the BSP can receive
PIC interrupts in Virtual Wire Mode (the LVT0 line is connected to _all_
CPUs).
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/3] KVM: x86: Relax accept conditions of kvm_apic_accept_pic_intr
2008-10-17 17:56 ` Jan Kiszka
@ 2008-10-17 18:12 ` Jan Kiszka
2008-10-17 18:14 ` Jan Kiszka
2008-10-18 2:44 ` Sheng Yang
2008-10-17 18:15 ` Sheng Yang
1 sibling, 2 replies; 30+ messages in thread
From: Jan Kiszka @ 2008-10-17 18:12 UTC (permalink / raw)
To: Sheng Yang; +Cc: kvm, avi, jiajun.xu, Jan Kiszka
[-- Attachment #1: Type: text/plain, Size: 4347 bytes --]
Jan Kiszka wrote:
> Sheng Yang wrote:
>> On Fri, Oct 17, 2008 at 07:35:01PM +0200, Jan Kiszka wrote:
>>> Sheng Yang wrote:
>>>> On Fri, Oct 17, 2008 at 10:10:54AM +0200, Jan Kiszka wrote:
>>>>> Sheng Yang wrote:
>>>>>> On Wednesday 15 October 2008 22:27:49 Jan Kiszka wrote:
>>>>>>> Aligning in-kernel kvm_apic_accept_pic_intr with its user space mate,
>>>>>>> this patch relaxes the conditions under which PIC IRQs are accepted
>>>>>>> by LVT0. This reflects reality and allows to reuse the service for the
>>>>>>> NMI watchdog use case.
>>>>>>>
>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>> ---
>>>>>>> arch/x86/kvm/lapic.c | 13 ++++---------
>>>>>>> 1 file changed, 4 insertions(+), 9 deletions(-)
>>>>>>>
>>>>>>> Index: b/arch/x86/kvm/lapic.c
>>>>>>> ===================================================================
>>>>>>> --- a/arch/x86/kvm/lapic.c
>>>>>>> +++ b/arch/x86/kvm/lapic.c
>>>>>>> @@ -1072,16 +1072,11 @@ int kvm_apic_has_interrupt(struct kvm_vc
>>>>>>> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
>>>>>>> {
>>>>>>> u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
>>>>>>> - int r = 0;
>>>>>>>
>>>>>>> - if (vcpu->vcpu_id == 0) {
>>>>>>> - if (!apic_hw_enabled(vcpu->arch.apic))
>>>>>>> - r = 1;
>>>>>>> - if ((lvt0 & APIC_LVT_MASKED) == 0 &&
>>>>>>> - GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
>>>>>>> - r = 1;
>>>>>>> - }
>>>>>>> - return r;
>>>>>>> + if (!apic_hw_enabled(vcpu->arch.apic) ||
>>>>>>> + (lvt0 & APIC_LVT_MASKED) == 0)
>>>>>>> + return 1;
>>>>>>> + return 0;
>>>>>>> }
>>>>>>>
>>>>>>> void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
>>>>>>>
>>>>>> (sorry for late review...)
>>>>>>
>>>>>> Thanks to find out the root cause of BSOD!
>>>>>>
>>>>>> But I am a little concern about this change. As you know, PIC only connect to
>>>>>> cpu0. So I think it's not proper to make it generic.
>>>>> I don't think so - and if it were true, qemu would have a bug then, see
>>>>> its corresponding code.
>>>> You can refer to Intel MP spec, virtual wire mode. Google
>>>> "MP spec" can find it.
>>> Ah, good reference.
>>>
>>>> Normally PIC is only used in BSP boot up for SMP guest(PIC can't afford SMP,
>>>> otherwise we won't need IOAPIC/LAPIC). After that, it should be disabled.
>>>> And virtual wire mode works with APIC_MODE_EXTINT on LVT0 of BSP lapic, so
>>>> that's why you see
>>>>
>>>> GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT
>>>>
>>>> KVM follow virtual wire mode exactly.
>>> According to my understanding of the spec, the virtual wire mode means
>>> that the PIC signal is delivered via LVT0, and thus can be received by
>>> _all_ CPUs in the system. However, only the BSP usually enables LVT0,
>>> thus is receiving the IRQ. When Linux switches to NMI watchdog mode 1,
>>> it also unmasks the other CPUs (and reprograms all to deliver NMIs
>>> instead of EXTINTs).
>>>
>>> Then there is also the "PIC Mode", ie. direct delivery to the BSP, and
>>> only the latter. That mode is obviously target by the current
>>> kvm_apic_accept_pic_intr implementation. But I find no indication in the
>>> spec yet that both modes cannot exists in the same system. But I also
>>> fail to understand how one could switch between both modes (via software).
>> No. If so, we don't need to check LVT0.
>
> OK, there is that IMCR register to enable/disable the PIC Mode - but
> neither KVM nor QEMU implement it (which may indicate they both want to
> provide Virtual Wire Mode?). When enabled, Virtual Wire Mode is
> effectively disabled (for the BSP at least) as the LAPIC is disconnected
> (from the BSP).
>
> Still, I find not trace in the spec that says only the BSP can receive
> PIC interrupts in Virtual Wire Mode (the LVT0 line is connected to _all_
> CPUs).
Now I checked also the BIOS KVM is shipping, and the MP Feature byte 2,
bit 7 (IMCRP) is cleared, thus KVM is providing the Virtual Wire mode.
Looking at Figure 3-3 of the MP spec, one can see that the PIC's output
is connected to the LVT0 line in this mode, and that this line is
connected to all CPUs in the system. So I can't help concluding that a)
QEMU's implementation is correct and b) my patch is correct as well. Or
please tell me where I'm wrong now...
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/3] KVM: x86: Dont deliver PIT IRQs to masked LVT0s
2008-10-17 15:44 ` Alexander Graf
@ 2008-10-17 18:14 ` Alexander Graf
0 siblings, 0 replies; 30+ messages in thread
From: Alexander Graf @ 2008-10-17 18:14 UTC (permalink / raw)
To: Jan Kiszka; +Cc: KVM list, Avi Kivity, jiajun.xu, sheng
On 17.10.2008, at 17:44, Alexander Graf wrote:
>
> On 17.10.2008, at 17:37, Jan Kiszka wrote:
>
>> Alexander Graf wrote:
>>>
>>> On 15.10.2008, at 16:27, Jan Kiszka wrote:
>>>
>>>> This fixes Windows 64-bit boot regressions: PIT IRQs must not be
>>>> delivered via LVT0 lines if they are masked.
>>>
>>> Hum, I still run into issues when booting the free Hyper-V system
>>> (or
>>> Windows Server 2008). It BSODs on bootup in 9 out of 10 cases, but
>>> boots
>>> just fine with -no-kvm-pit, even though I have your patches applied.
>>>
>>> Any ideas?
>>
>> Grmbl. What is the last working commit? And does commenting out the
>> PIT
>> forwarding in __inject_pit_timer_intr change the picture?
>
I'm slowly losing faith in myself - I simply rebuilt it after I merged
some other random stuff in (mostly IA64) and now the Hyper-V server
simply boots.
I'll come back to you as soon as I find it broken again ;-).
Thanks a lot and sorry for the fuss,
Alex
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/3] KVM: x86: Relax accept conditions of kvm_apic_accept_pic_intr
2008-10-17 18:12 ` Jan Kiszka
@ 2008-10-17 18:14 ` Jan Kiszka
2008-10-18 2:44 ` Sheng Yang
1 sibling, 0 replies; 30+ messages in thread
From: Jan Kiszka @ 2008-10-17 18:14 UTC (permalink / raw)
To: Sheng Yang; +Cc: kvm, avi, jiajun.xu, Jan Kiszka
[-- Attachment #1: Type: text/plain, Size: 555 bytes --]
Jan Kiszka wrote:
> ...
> Now I checked also the BIOS KVM is shipping, and the MP Feature byte 2,
> bit 7 (IMCRP) is cleared, thus KVM is providing the Virtual Wire mode.
> Looking at Figure 3-3 of the MP spec, one can see that the PIC's output
> is connected to the LVT0 line in this mode, and that this line is
s/LVT0/LINTIN0/
> connected to all CPUs in the system. So I can't help concluding that a)
> QEMU's implementation is correct and b) my patch is correct as well. Or
> please tell me where I'm wrong now...
>
> Jan
>
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/3] KVM: x86: Relax accept conditions of kvm_apic_accept_pic_intr
2008-10-17 17:56 ` Jan Kiszka
2008-10-17 18:12 ` Jan Kiszka
@ 2008-10-17 18:15 ` Sheng Yang
1 sibling, 0 replies; 30+ messages in thread
From: Sheng Yang @ 2008-10-17 18:15 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Sheng Yang, Jan Kiszka, kvm, avi, jiajun.xu
On Fri, Oct 17, 2008 at 07:56:13PM +0200, Jan Kiszka wrote:
> Sheng Yang wrote:
> > On Fri, Oct 17, 2008 at 07:35:01PM +0200, Jan Kiszka wrote:
> >> Sheng Yang wrote:
> >>> On Fri, Oct 17, 2008 at 10:10:54AM +0200, Jan Kiszka wrote:
> >>>> Sheng Yang wrote:
> >>>>> On Wednesday 15 October 2008 22:27:49 Jan Kiszka wrote:
> >>>>>> Aligning in-kernel kvm_apic_accept_pic_intr with its user space mate,
> >>>>>> this patch relaxes the conditions under which PIC IRQs are accepted
> >>>>>> by LVT0. This reflects reality and allows to reuse the service for the
> >>>>>> NMI watchdog use case.
> >>>>>>
> >>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>>>> ---
> >>>>>> arch/x86/kvm/lapic.c | 13 ++++---------
> >>>>>> 1 file changed, 4 insertions(+), 9 deletions(-)
> >>>>>>
> >>>>>> Index: b/arch/x86/kvm/lapic.c
> >>>>>> ===================================================================
> >>>>>> --- a/arch/x86/kvm/lapic.c
> >>>>>> +++ b/arch/x86/kvm/lapic.c
> >>>>>> @@ -1072,16 +1072,11 @@ int kvm_apic_has_interrupt(struct kvm_vc
> >>>>>> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
> >>>>>> {
> >>>>>> u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
> >>>>>> - int r = 0;
> >>>>>>
> >>>>>> - if (vcpu->vcpu_id == 0) {
> >>>>>> - if (!apic_hw_enabled(vcpu->arch.apic))
> >>>>>> - r = 1;
> >>>>>> - if ((lvt0 & APIC_LVT_MASKED) == 0 &&
> >>>>>> - GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
> >>>>>> - r = 1;
> >>>>>> - }
> >>>>>> - return r;
> >>>>>> + if (!apic_hw_enabled(vcpu->arch.apic) ||
> >>>>>> + (lvt0 & APIC_LVT_MASKED) == 0)
> >>>>>> + return 1;
> >>>>>> + return 0;
> >>>>>> }
> >>>>>>
> >>>>>> void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
> >>>>>>
> >>>>> (sorry for late review...)
> >>>>>
> >>>>> Thanks to find out the root cause of BSOD!
> >>>>>
> >>>>> But I am a little concern about this change. As you know, PIC only connect to
> >>>>> cpu0. So I think it's not proper to make it generic.
> >>>> I don't think so - and if it were true, qemu would have a bug then, see
> >>>> its corresponding code.
> >>> You can refer to Intel MP spec, virtual wire mode. Google
> >>> "MP spec" can find it.
> >> Ah, good reference.
> >>
> >>> Normally PIC is only used in BSP boot up for SMP guest(PIC can't afford SMP,
> >>> otherwise we won't need IOAPIC/LAPIC). After that, it should be disabled.
> >>> And virtual wire mode works with APIC_MODE_EXTINT on LVT0 of BSP lapic, so
> >>> that's why you see
> >>>
> >>> GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT
> >>>
> >>> KVM follow virtual wire mode exactly.
> >> According to my understanding of the spec, the virtual wire mode means
> >> that the PIC signal is delivered via LVT0, and thus can be received by
> >> _all_ CPUs in the system. However, only the BSP usually enables LVT0,
> >> thus is receiving the IRQ. When Linux switches to NMI watchdog mode 1,
> >> it also unmasks the other CPUs (and reprograms all to deliver NMIs
> >> instead of EXTINTs).
> >>
> >> Then there is also the "PIC Mode", ie. direct delivery to the BSP, and
> >> only the latter. That mode is obviously target by the current
> >> kvm_apic_accept_pic_intr implementation. But I find no indication in the
> >> spec yet that both modes cannot exists in the same system. But I also
> >> fail to understand how one could switch between both modes (via software).
> >
> > No. If so, we don't need to check LVT0.
>
> OK, there is that IMCR register to enable/disable the PIC Mode - but
> neither KVM nor QEMU implement it (which may indicate they both want to
> provide Virtual Wire Mode?). When enabled, Virtual Wire Mode is
> effectively disabled (for the BSP at least) as the LAPIC is disconnected
> (from the BSP).
Please check the figure for the difference between PIC mode and virtual wire
mode.
>
> Still, I find not trace in the spec that says only the BSP can receive
> PIC interrupts in Virtual Wire Mode (the LVT0 line is connected to _all_
> CPUs).
>
Only the one to BSP works normally. And I am not meant to go throught PIC
when implement NMI Watchdog, otherwise I won't use that
apic_local_deliver(). NMI watchdog is too specific case, I don't want to mix
them up.
--
regards
Yang, Sheng
> Jan
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/3] KVM: x86: Optimize NMI watchdog delivery
2008-10-17 17:40 ` Jan Kiszka
@ 2008-10-17 18:26 ` Sheng Yang
2008-10-17 18:39 ` Jan Kiszka
2008-10-19 11:15 ` Avi Kivity
1 sibling, 1 reply; 30+ messages in thread
From: Sheng Yang @ 2008-10-17 18:26 UTC (permalink / raw)
To: Jan Kiszka; +Cc: sheng, Jan Kiszka, kvm, avi, jiajun.xu
On Fri, Oct 17, 2008 at 07:40:00PM +0200, Jan Kiszka wrote:
> Sheng Yang wrote:
> > On Fri, Oct 17, 2008 at 07:23:01PM +0200, Jan Kiszka wrote:
> >> Sheng Yang wrote:
> >>> On Wed, Oct 15, 2008 at 04:27:51PM +0200, Jan Kiszka wrote:
> >>>> As suggested by Avi, this patch introduces a counter of VCPUs that have
> >>>> LVT0 set to NMI mode. Only if the counter > 0, we push the PIT ticks via
> >>>> all LAPIC LVT0 lines to enable NMI watchdog support.
> >>>>
> >>> I feel a little strange about: if *counter > 0*, we push to *all*. Can we
> >>> only push NMIs to the ones that set NMI for LVT0?
> >> We don't do that due to !kvm_apic_accept_pic_intr(). The counter is only
> >> about optimizing that case where we don't have to walk the whole chain,
> >> asking every vcpu if it would like to receive the IRQ.
> >
> > I don't agree to use kvm_apic_accept_pic_intr() here, as I explained in the
> > first mail. It's not a normal path, and current KVM handle it well.
>
> Current KVM only support PIC Mode, which is fine, but not sufficient for
> NMI watchdog support. We need to get the Virtual Wire Mode in, but
> correctly.
>
> >>> How about add a field in struct kvm_lapic? We can quickly know if we need to
> >>> inject NMI for this vcpu. Well, though kernel mostly enable NMI watchdog on
> >>> all vcpu, I think this is more precise, and match the logic, and avoid one
> >>> more field in kvm_arch...
> >> The point of this patch is to avoid touching vcpu structures AT ALL when
> >> there is no interest in the NMI watchdog (normally, OSes will either
> >> enable the WD trick for all CPUSs or keep it off).
> >
> > Logically, I think lapic is more proper place. And put a bool there won't
> > affect much. I think we can do it more straightly here.
>
> If you have dozens of lapics, you don't want to check them all if they
> are ALL switched of anyway. That information is better encoded in a
> single, (virtual) system-wide bool. That's the most common case we want
> to speed up. And it is the core of the optimization Avi suggested
> (unless I totally misunderstood him).
Yeah, I am agree on this point now. But for the above one, NO... :)
Using apic_local_deliver() also means I ignored PIC and make it transpent.
Please don't involve it in again. It's *not* the normal usage. I want to
keep the impact as small as possible.
--
regards
Yang, Sheng
> Jan
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/3] KVM: x86: Optimize NMI watchdog delivery
2008-10-17 18:26 ` Sheng Yang
@ 2008-10-17 18:39 ` Jan Kiszka
0 siblings, 0 replies; 30+ messages in thread
From: Jan Kiszka @ 2008-10-17 18:39 UTC (permalink / raw)
To: sheng; +Cc: Jan Kiszka, kvm, avi, jiajun.xu
[-- Attachment #1: Type: text/plain, Size: 2582 bytes --]
Sheng Yang wrote:
> On Fri, Oct 17, 2008 at 07:40:00PM +0200, Jan Kiszka wrote:
>> Sheng Yang wrote:
>>> On Fri, Oct 17, 2008 at 07:23:01PM +0200, Jan Kiszka wrote:
>>>> Sheng Yang wrote:
>>>>> On Wed, Oct 15, 2008 at 04:27:51PM +0200, Jan Kiszka wrote:
>>>>>> As suggested by Avi, this patch introduces a counter of VCPUs that have
>>>>>> LVT0 set to NMI mode. Only if the counter > 0, we push the PIT ticks via
>>>>>> all LAPIC LVT0 lines to enable NMI watchdog support.
>>>>>>
>>>>> I feel a little strange about: if *counter > 0*, we push to *all*. Can we
>>>>> only push NMIs to the ones that set NMI for LVT0?
>>>> We don't do that due to !kvm_apic_accept_pic_intr(). The counter is only
>>>> about optimizing that case where we don't have to walk the whole chain,
>>>> asking every vcpu if it would like to receive the IRQ.
>>> I don't agree to use kvm_apic_accept_pic_intr() here, as I explained in the
>>> first mail. It's not a normal path, and current KVM handle it well.
>> Current KVM only support PIC Mode, which is fine, but not sufficient for
>> NMI watchdog support. We need to get the Virtual Wire Mode in, but
>> correctly.
>>
>>>>> How about add a field in struct kvm_lapic? We can quickly know if we need to
>>>>> inject NMI for this vcpu. Well, though kernel mostly enable NMI watchdog on
>>>>> all vcpu, I think this is more precise, and match the logic, and avoid one
>>>>> more field in kvm_arch...
>>>> The point of this patch is to avoid touching vcpu structures AT ALL when
>>>> there is no interest in the NMI watchdog (normally, OSes will either
>>>> enable the WD trick for all CPUSs or keep it off).
>>> Logically, I think lapic is more proper place. And put a bool there won't
>>> affect much. I think we can do it more straightly here.
>> If you have dozens of lapics, you don't want to check them all if they
>> are ALL switched of anyway. That information is better encoded in a
>> single, (virtual) system-wide bool. That's the most common case we want
>> to speed up. And it is the core of the optimization Avi suggested
>> (unless I totally misunderstood him).
>
> Yeah, I am agree on this point now. But for the above one, NO... :)
>
> Using apic_local_deliver() also means I ignored PIC and make it transpent.
> Please don't involve it in again. It's *not* the normal usage. I want to
> keep the impact as small as possible.
??? You lost me again. Are we still talking about the changes of this
particular patch? In what way does it "involve" apic_local_deliver
again? And into what?
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/3] KVM: x86: Relax accept conditions of kvm_apic_accept_pic_intr
2008-10-17 18:12 ` Jan Kiszka
2008-10-17 18:14 ` Jan Kiszka
@ 2008-10-18 2:44 ` Sheng Yang
2008-10-18 3:02 ` Sheng Yang
2008-10-18 8:29 ` Jan Kiszka
1 sibling, 2 replies; 30+ messages in thread
From: Sheng Yang @ 2008-10-18 2:44 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Sheng Yang, kvm, avi, jiajun.xu, Jan Kiszka
On Fri, Oct 17, 2008 at 08:12:00PM +0200, Jan Kiszka wrote:
> Jan Kiszka wrote:
> > Sheng Yang wrote:
> >> On Fri, Oct 17, 2008 at 07:35:01PM +0200, Jan Kiszka wrote:
> >>> Sheng Yang wrote:
> >>>> On Fri, Oct 17, 2008 at 10:10:54AM +0200, Jan Kiszka wrote:
> >>>>> Sheng Yang wrote:
> >>>>>> On Wednesday 15 October 2008 22:27:49 Jan Kiszka wrote:
> >>>>>>> Aligning in-kernel kvm_apic_accept_pic_intr with its user space mate,
> >>>>>>> this patch relaxes the conditions under which PIC IRQs are accepted
> >>>>>>> by LVT0. This reflects reality and allows to reuse the service for the
> >>>>>>> NMI watchdog use case.
> >>>>>>>
> >>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>>>>> ---
> >>>>>>> arch/x86/kvm/lapic.c | 13 ++++---------
> >>>>>>> 1 file changed, 4 insertions(+), 9 deletions(-)
> >>>>>>>
> >>>>>>> Index: b/arch/x86/kvm/lapic.c
> >>>>>>> ===================================================================
> >>>>>>> --- a/arch/x86/kvm/lapic.c
> >>>>>>> +++ b/arch/x86/kvm/lapic.c
> >>>>>>> @@ -1072,16 +1072,11 @@ int kvm_apic_has_interrupt(struct kvm_vc
> >>>>>>> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
> >>>>>>> {
> >>>>>>> u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
> >>>>>>> - int r = 0;
> >>>>>>>
> >>>>>>> - if (vcpu->vcpu_id == 0) {
> >>>>>>> - if (!apic_hw_enabled(vcpu->arch.apic))
> >>>>>>> - r = 1;
> >>>>>>> - if ((lvt0 & APIC_LVT_MASKED) == 0 &&
> >>>>>>> - GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
> >>>>>>> - r = 1;
> >>>>>>> - }
> >>>>>>> - return r;
> >>>>>>> + if (!apic_hw_enabled(vcpu->arch.apic) ||
> >>>>>>> + (lvt0 & APIC_LVT_MASKED) == 0)
> >>>>>>> + return 1;
> >>>>>>> + return 0;
> >>>>>>> }
> >>>>>>>
> >>>>>>> void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
> >>>>>>>
> >>>>>> (sorry for late review...)
> >>>>>>
> >>>>>> Thanks to find out the root cause of BSOD!
> >>>>>>
> >>>>>> But I am a little concern about this change. As you know, PIC only connect to
> >>>>>> cpu0. So I think it's not proper to make it generic.
> >>>>> I don't think so - and if it were true, qemu would have a bug then, see
> >>>>> its corresponding code.
> >>>> You can refer to Intel MP spec, virtual wire mode. Google
> >>>> "MP spec" can find it.
> >>> Ah, good reference.
> >>>
> >>>> Normally PIC is only used in BSP boot up for SMP guest(PIC can't afford SMP,
> >>>> otherwise we won't need IOAPIC/LAPIC). After that, it should be disabled.
> >>>> And virtual wire mode works with APIC_MODE_EXTINT on LVT0 of BSP lapic, so
> >>>> that's why you see
> >>>>
> >>>> GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT
> >>>>
> >>>> KVM follow virtual wire mode exactly.
> >>> According to my understanding of the spec, the virtual wire mode means
> >>> that the PIC signal is delivered via LVT0, and thus can be received by
> >>> _all_ CPUs in the system. However, only the BSP usually enables LVT0,
> >>> thus is receiving the IRQ. When Linux switches to NMI watchdog mode 1,
> >>> it also unmasks the other CPUs (and reprograms all to deliver NMIs
> >>> instead of EXTINTs).
> >>>
> >>> Then there is also the "PIC Mode", ie. direct delivery to the BSP, and
> >>> only the latter. That mode is obviously target by the current
> >>> kvm_apic_accept_pic_intr implementation. But I find no indication in the
> >>> spec yet that both modes cannot exists in the same system. But I also
> >>> fail to understand how one could switch between both modes (via software).
> >> No. If so, we don't need to check LVT0.
> >
> > OK, there is that IMCR register to enable/disable the PIC Mode - but
> > neither KVM nor QEMU implement it (which may indicate they both want to
> > provide Virtual Wire Mode?). When enabled, Virtual Wire Mode is
> > effectively disabled (for the BSP at least) as the LAPIC is disconnected
> > (from the BSP).
> >
> > Still, I find not trace in the spec that says only the BSP can receive
> > PIC interrupts in Virtual Wire Mode (the LVT0 line is connected to _all_
> > CPUs).
>
> Now I checked also the BIOS KVM is shipping, and the MP Feature byte 2,
> bit 7 (IMCRP) is cleared, thus KVM is providing the Virtual Wire mode.
> Looking at Figure 3-3 of the MP spec, one can see that the PIC's output
> is connected to the LVT0 line in this mode, and that this line is
> connected to all CPUs in the system. So I can't help concluding that a)
> QEMU's implementation is correct and b) my patch is correct as well. Or
> please tell me where I'm wrong now...
Frankly speaking, here are two apporoaches. Both are OK to work. You
insisted the QEmu method, emulate that line connect all lapic's LVT0. And I
insisted to follow the current solution, the dot-line of virtual wire mode
in the spec, then make NMI watchdog as a separate thing, impact others as
small as possible.
When I wrote NMI watchdog, I don't want to involve PIC, for it's special
case of PIC usage. So I think it's OK to not emulate the path here, then use
apic_local_deliver() to send the interrupt directly, not through the PIC
path. If PIC involved, that's another path. Current QEmu covered this,
pic_request_irq() send to every vcpu, emulate that whole LVT0 line. Our KVM
choose a different way, we just assume PIC only connect to LVT0 of BSP, for
others should be disabled. That's save a lot when you have a lot of vcpus,
as you said.
So currently, QEmu emulate virtual wire mode well, and KVM do some
simplification, only connect to BSP. Both of them follow this in each's
code. And for KVM, the change to kvm_apic_accept_pic_intr() broke this
assumption. Now we only work PIC with BSP, but check all the vcpus. I don't
think that's a good combination. I think we are not likely do more to
improve our PIC connection method, so NMI watchdog in KVM was designed as a
separate thing, as a special case, and should be the only special case.
kvm_cpu_has_interrupt() called every time before VM entry to check if
there are any intr can be injected. If lapic got none,
it would check kvm_apic_accept_pic_intr(). Check every vcpu or only check
vcpu0, would bring about (vcpu_nr - 1) * ((vm_exit_nr - lapic_has_intr_nr) /
vcpu_nr)(if we assume vmexit on every vcpu is the mostly compatiable) more
times to do the judgment on other vcpus here. And normally, the latter
number would tens of thousand to hundreds of thousands. If you care about
1000 per vcpu's touch in pit, why you don't care about them here?
--
regards
Yang, Sheng
>
> Jan
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/3] KVM: x86: Relax accept conditions of kvm_apic_accept_pic_intr
2008-10-18 2:44 ` Sheng Yang
@ 2008-10-18 3:02 ` Sheng Yang
2008-10-18 8:29 ` Jan Kiszka
1 sibling, 0 replies; 30+ messages in thread
From: Sheng Yang @ 2008-10-18 3:02 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Sheng Yang, kvm, avi, jiajun.xu, Jan Kiszka
On Sat, Oct 18, 2008 at 10:44 AM, Sheng Yang <yasker@gmail.com> wrote:
> On Fri, Oct 17, 2008 at 08:12:00PM +0200, Jan Kiszka wrote:
>> Jan Kiszka wrote:
>> > Sheng Yang wrote:
>> >> On Fri, Oct 17, 2008 at 07:35:01PM +0200, Jan Kiszka wrote:
>> >>> Sheng Yang wrote:
>> >>>> On Fri, Oct 17, 2008 at 10:10:54AM +0200, Jan Kiszka wrote:
>> >>>>> Sheng Yang wrote:
>> >>>>>> On Wednesday 15 October 2008 22:27:49 Jan Kiszka wrote:
>> >>>>>>> Aligning in-kernel kvm_apic_accept_pic_intr with its user space mate,
>> >>>>>>> this patch relaxes the conditions under which PIC IRQs are accepted
>> >>>>>>> by LVT0. This reflects reality and allows to reuse the service for the
>> >>>>>>> NMI watchdog use case.
>> >>>>>>>
>> >>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> >>>>>>> ---
>> >>>>>>> arch/x86/kvm/lapic.c | 13 ++++---------
>> >>>>>>> 1 file changed, 4 insertions(+), 9 deletions(-)
>> >>>>>>>
>> >>>>>>> Index: b/arch/x86/kvm/lapic.c
>> >>>>>>> ===================================================================
>> >>>>>>> --- a/arch/x86/kvm/lapic.c
>> >>>>>>> +++ b/arch/x86/kvm/lapic.c
>> >>>>>>> @@ -1072,16 +1072,11 @@ int kvm_apic_has_interrupt(struct kvm_vc
>> >>>>>>> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
>> >>>>>>> {
>> >>>>>>> u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
>> >>>>>>> - int r = 0;
>> >>>>>>>
>> >>>>>>> - if (vcpu->vcpu_id == 0) {
>> >>>>>>> - if (!apic_hw_enabled(vcpu->arch.apic))
>> >>>>>>> - r = 1;
>> >>>>>>> - if ((lvt0 & APIC_LVT_MASKED) == 0 &&
>> >>>>>>> - GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
>> >>>>>>> - r = 1;
>> >>>>>>> - }
>> >>>>>>> - return r;
>> >>>>>>> + if (!apic_hw_enabled(vcpu->arch.apic) ||
>> >>>>>>> + (lvt0 & APIC_LVT_MASKED) == 0)
>> >>>>>>> + return 1;
>> >>>>>>> + return 0;
>> >>>>>>> }
>> >>>>>>>
>> >>>>>>> void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
>> >>>>>>>
>> >>>>>> (sorry for late review...)
>> >>>>>>
>> >>>>>> Thanks to find out the root cause of BSOD!
>> >>>>>>
>> >>>>>> But I am a little concern about this change. As you know, PIC only connect to
>> >>>>>> cpu0. So I think it's not proper to make it generic.
>> >>>>> I don't think so - and if it were true, qemu would have a bug then, see
>> >>>>> its corresponding code.
>> >>>> You can refer to Intel MP spec, virtual wire mode. Google
>> >>>> "MP spec" can find it.
>> >>> Ah, good reference.
>> >>>
>> >>>> Normally PIC is only used in BSP boot up for SMP guest(PIC can't afford SMP,
>> >>>> otherwise we won't need IOAPIC/LAPIC). After that, it should be disabled.
>> >>>> And virtual wire mode works with APIC_MODE_EXTINT on LVT0 of BSP lapic, so
>> >>>> that's why you see
>> >>>>
>> >>>> GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT
>> >>>>
>> >>>> KVM follow virtual wire mode exactly.
>> >>> According to my understanding of the spec, the virtual wire mode means
>> >>> that the PIC signal is delivered via LVT0, and thus can be received by
>> >>> _all_ CPUs in the system. However, only the BSP usually enables LVT0,
>> >>> thus is receiving the IRQ. When Linux switches to NMI watchdog mode 1,
>> >>> it also unmasks the other CPUs (and reprograms all to deliver NMIs
>> >>> instead of EXTINTs).
>> >>>
>> >>> Then there is also the "PIC Mode", ie. direct delivery to the BSP, and
>> >>> only the latter. That mode is obviously target by the current
>> >>> kvm_apic_accept_pic_intr implementation. But I find no indication in the
>> >>> spec yet that both modes cannot exists in the same system. But I also
>> >>> fail to understand how one could switch between both modes (via software).
>> >> No. If so, we don't need to check LVT0.
>> >
>> > OK, there is that IMCR register to enable/disable the PIC Mode - but
>> > neither KVM nor QEMU implement it (which may indicate they both want to
>> > provide Virtual Wire Mode?). When enabled, Virtual Wire Mode is
>> > effectively disabled (for the BSP at least) as the LAPIC is disconnected
>> > (from the BSP).
>> >
>> > Still, I find not trace in the spec that says only the BSP can receive
>> > PIC interrupts in Virtual Wire Mode (the LVT0 line is connected to _all_
>> > CPUs).
>>
>> Now I checked also the BIOS KVM is shipping, and the MP Feature byte 2,
>> bit 7 (IMCRP) is cleared, thus KVM is providing the Virtual Wire mode.
>> Looking at Figure 3-3 of the MP spec, one can see that the PIC's output
>> is connected to the LVT0 line in this mode, and that this line is
>> connected to all CPUs in the system. So I can't help concluding that a)
>> QEMU's implementation is correct and b) my patch is correct as well. Or
>> please tell me where I'm wrong now...
>
> Frankly speaking, here are two apporoaches. Both are OK to work. You
> insisted the QEmu method, emulate that line connect all lapic's LVT0. And I
> insisted to follow the current solution, the dot-line of virtual wire mode
> in the spec, then make NMI watchdog as a separate thing, impact others as
> small as possible.
>
> When I wrote NMI watchdog, I don't want to involve PIC, for it's special
> case of PIC usage. So I think it's OK to not emulate the path here, then use
> apic_local_deliver() to send the interrupt directly, not through the PIC
> path. If PIC involved, that's another path. Current QEmu covered this,
> pic_request_irq() send to every vcpu, emulate that whole LVT0 line. Our KVM
> choose a different way, we just assume PIC only connect to LVT0 of BSP, for
> others should be disabled. That's save a lot when you have a lot of vcpus,
> as you said.
>
> So currently, QEmu emulate virtual wire mode well, and KVM do some
> simplification, only connect to BSP. Both of them follow this in each's
> code. And for KVM, the change to kvm_apic_accept_pic_intr() broke this
> assumption. Now we only work PIC with BSP, but check all the vcpus. I don't
> think that's a good combination. I think we are not likely do more to
> improve our PIC connection method, so NMI watchdog in KVM was designed as a
> separate thing, as a special case, and should be the only special case.
>
> kvm_cpu_has_interrupt() called every time before VM entry to check if
> there are any intr can be injected. If lapic got none,
> it would check kvm_apic_accept_pic_intr(). Check every vcpu or only check
> vcpu0, would bring about (vcpu_nr - 1) * ((vm_exit_nr - lapic_has_intr_nr) /
> vcpu_nr)(if we assume vmexit on every vcpu is the mostly compatiable) more
comparable... not compatiable...
> times to do the judgment on other vcpus here. And normally, the latter
> number would tens of thousand to hundreds of thousands. If you care about
> 1000 per vcpu's touch in pit, why you don't care about them here?
>
Well, I just hope you can understand my thought.
Thanks.
--
regards,
Yang, Sheng
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/3] KVM: x86: Relax accept conditions of kvm_apic_accept_pic_intr
2008-10-18 2:44 ` Sheng Yang
2008-10-18 3:02 ` Sheng Yang
@ 2008-10-18 8:29 ` Jan Kiszka
1 sibling, 0 replies; 30+ messages in thread
From: Jan Kiszka @ 2008-10-18 8:29 UTC (permalink / raw)
To: Sheng Yang; +Cc: kvm, avi, jiajun.xu, Jan Kiszka
[-- Attachment #1: Type: text/plain, Size: 4497 bytes --]
Sheng Yang wrote:
> On Fri, Oct 17, 2008 at 08:12:00PM +0200, Jan Kiszka wrote:
>> Now I checked also the BIOS KVM is shipping, and the MP Feature byte 2,
>> bit 7 (IMCRP) is cleared, thus KVM is providing the Virtual Wire mode.
>> Looking at Figure 3-3 of the MP spec, one can see that the PIC's output
>> is connected to the LVT0 line in this mode, and that this line is
>> connected to all CPUs in the system. So I can't help concluding that a)
>> QEMU's implementation is correct and b) my patch is correct as well. Or
>> please tell me where I'm wrong now...
>
> Frankly speaking, here are two apporoaches. Both are OK to work. You
> insisted the QEmu method, emulate that line connect all lapic's LVT0. And I
> insisted to follow the current solution, the dot-line of virtual wire mode
> in the spec, then make NMI watchdog as a separate thing, impact others as
> small as possible.
Ack.
>
> When I wrote NMI watchdog, I don't want to involve PIC, for it's special
> case of PIC usage. So I think it's OK to not emulate the path here, then use
> apic_local_deliver() to send the interrupt directly, not through the PIC
> path. If PIC involved, that's another path. Current QEmu covered this,
> pic_request_irq() send to every vcpu, emulate that whole LVT0 line. Our KVM
> choose a different way, we just assume PIC only connect to LVT0 of BSP, for
> others should be disabled. That's save a lot when you have a lot of vcpus,
> as you said.
Yes, I came across this assumption that only the BSP can receive PIC
interrupts as well in the meantime. I tried to first enhance the
accuracy of KVMs virtual wire mode and then optimize it the way proposed
for the NMI watchdog. However, I had to give up as I realized the this
assumption is too deeply hooked into the KVM design.
Nevertheless, one minor inaccuracy can and should be fixed (will repost
as true patch after more testing): If the APIC is disabled, there will
be no PIC interrupt forwarding. This should also be fixed in QEMU.
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1071,17 +1071,15 @@ int kvm_apic_has_interrupt(struct kvm_vc
int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
{
+ struct kvm_lapic *apic = vcpu->arch.apic;
u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
- int r = 0;
- if (vcpu->vcpu_id == 0) {
- if (!apic_hw_enabled(vcpu->arch.apic))
- r = 1;
- if ((lvt0 & APIC_LVT_MASKED) == 0 &&
- GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
- r = 1;
- }
- return r;
+ /* Virtual Wire mode, but we only deliver to the BSP. */
+ if (vcpu->vcpu_id == 0 && apic_hw_enabled(apic)
+ && !(lvt0 & APIC_LVT_MASKED)
+ && GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
+ return 1;
+ return 0;
}
void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
>
> So currently, QEmu emulate virtual wire mode well, and KVM do some
> simplification, only connect to BSP. Both of them follow this in each's
> code. And for KVM, the change to kvm_apic_accept_pic_intr() broke this
> assumption. Now we only work PIC with BSP, but check all the vcpus. I don't
> think that's a good combination. I think we are not likely do more to
> improve our PIC connection method, so NMI watchdog in KVM was designed as a
> separate thing, as a special case, and should be the only special case.
Agreed. I'm preparing patches to take this into account while fixing the
current NMI watchdog implementation.
>
> kvm_cpu_has_interrupt() called every time before VM entry to check if
> there are any intr can be injected. If lapic got none,
> it would check kvm_apic_accept_pic_intr(). Check every vcpu or only check
> vcpu0, would bring about (vcpu_nr - 1) * ((vm_exit_nr - lapic_has_intr_nr) /
> vcpu_nr)(if we assume vmexit on every vcpu is the mostly compatiable) more
> times to do the judgment on other vcpus here. And normally, the latter
> number would tens of thousand to hundreds of thousands. If you care about
> 1000 per vcpu's touch in pit, why you don't care about them here?
As I said, that case would have only mattered in an improved version if
any VCPU > 1 had its LVT0 unmasked - similar optimization like for NMI
WD. But things are more tricky as the PIC code and its users are not
prepared to dispatch the PIC vector to multiple sinks. That finally
convinced me stopping my rework. The effort became too high compared to
the accuracy gain that hardly any OS may need.
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/3] KVM: x86: Fix and optimize in-kernel NMI watchdog support
2008-10-15 14:27 [PATCH 0/3] KVM: x86: Fix and optimize in-kernel NMI watchdog support Jan Kiszka
` (2 preceding siblings ...)
2008-10-15 14:27 ` [PATCH 3/3] KVM: x86: Optimize NMI watchdog delivery Jan Kiszka
@ 2008-10-19 11:13 ` Avi Kivity
2008-10-19 13:03 ` Sheng Yang
3 siblings, 1 reply; 30+ messages in thread
From: Avi Kivity @ 2008-10-19 11:13 UTC (permalink / raw)
To: Jan Kiszka; +Cc: kvm, jiajun.xu, sheng
Jan Kiszka wrote:
> Bug tracker reports 2149609 and 2168057 pointed out boot issues of
> Windows 64-bit versions. This series fixes the underlying problem.
>
> It furthermore stacks the earlier posted optimization on top which
> shortens the PIT IRQ delivery path in case the NMI-watchdog-via-PIT
> trick is not used by the guest.
>
I'd like to apply this. I've seen all the back-and-forth, but not
having read the specs, I can't tell which side is right...
Sheng, do you still object to the patches?
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/3] KVM: x86: Optimize NMI watchdog delivery
2008-10-17 17:40 ` Jan Kiszka
2008-10-17 18:26 ` Sheng Yang
@ 2008-10-19 11:15 ` Avi Kivity
1 sibling, 0 replies; 30+ messages in thread
From: Avi Kivity @ 2008-10-19 11:15 UTC (permalink / raw)
To: Jan Kiszka; +Cc: sheng, Jan Kiszka, kvm, jiajun.xu
Jan Kiszka wrote:
> If you have dozens of lapics, you don't want to check them all if they
> are ALL switched of anyway. That information is better encoded in a
> single, (virtual) system-wide bool. That's the most common case we want
> to speed up. And it is the core of the optimization Avi suggested
> (unless I totally misunderstood him).
>
>
Yes, that's what I meant. Touching vcpu data is expensive since it's
likely to be in MESI (or whatever it's called now) exclusive mode on
some other cpu.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/3] KVM: x86: Fix and optimize in-kernel NMI watchdog support
2008-10-19 11:13 ` [PATCH 0/3] KVM: x86: Fix and optimize in-kernel NMI watchdog support Avi Kivity
@ 2008-10-19 13:03 ` Sheng Yang
0 siblings, 0 replies; 30+ messages in thread
From: Sheng Yang @ 2008-10-19 13:03 UTC (permalink / raw)
To: Avi Kivity; +Cc: Jan Kiszka, kvm, jiajun.xu, sheng
On Sun, Oct 19, 2008 at 01:13:57PM +0200, Avi Kivity wrote:
> Jan Kiszka wrote:
>> Bug tracker reports 2149609 and 2168057 pointed out boot issues of
>> Windows 64-bit versions. This series fixes the underlying problem.
>>
>> It furthermore stacks the earlier posted optimization on top which
>> shortens the PIT IRQ delivery path in case the NMI-watchdog-via-PIT
>> trick is not used by the guest.
>>
>
> I'd like to apply this. I've seen all the back-and-forth, but not
> having read the specs, I can't tell which side is right...
>
> Sheng, do you still object to the patches?
I think Jan understood and agreed my thoughts now. And he is preparing a
updated patch to leave NMI watchdog as a single part without impact our
assumption on our PIC only connect to BSP.
Thanks Jan for understanding. :)
--
regards
Yang, Sheng
>
> --
> error compiling committee.c: too many arguments to function
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2008-10-19 13:03 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-10-15 14:27 [PATCH 0/3] KVM: x86: Fix and optimize in-kernel NMI watchdog support Jan Kiszka
2008-10-15 14:27 ` [PATCH 1/3] KVM: x86: Relax accept conditions of kvm_apic_accept_pic_intr Jan Kiszka
2008-10-17 5:11 ` Sheng Yang
2008-10-17 8:10 ` Jan Kiszka
2008-10-17 16:35 ` Sheng Yang
2008-10-17 17:35 ` Jan Kiszka
2008-10-17 17:47 ` Sheng Yang
2008-10-17 17:56 ` Jan Kiszka
2008-10-17 18:12 ` Jan Kiszka
2008-10-17 18:14 ` Jan Kiszka
2008-10-18 2:44 ` Sheng Yang
2008-10-18 3:02 ` Sheng Yang
2008-10-18 8:29 ` Jan Kiszka
2008-10-17 18:15 ` Sheng Yang
2008-10-17 15:31 ` Xu, Jiajun
2008-10-15 14:27 ` [PATCH 2/3] KVM: x86: Dont deliver PIT IRQs to masked LVT0s Jan Kiszka
2008-10-17 15:23 ` Alexander Graf
2008-10-17 15:37 ` Jan Kiszka
2008-10-17 15:44 ` Alexander Graf
2008-10-17 18:14 ` Alexander Graf
2008-10-15 14:27 ` [PATCH 3/3] KVM: x86: Optimize NMI watchdog delivery Jan Kiszka
2008-10-17 17:06 ` Sheng Yang
2008-10-17 17:23 ` Jan Kiszka
2008-10-17 17:34 ` Sheng Yang
2008-10-17 17:40 ` Jan Kiszka
2008-10-17 18:26 ` Sheng Yang
2008-10-17 18:39 ` Jan Kiszka
2008-10-19 11:15 ` Avi Kivity
2008-10-19 11:13 ` [PATCH 0/3] KVM: x86: Fix and optimize in-kernel NMI watchdog support Avi Kivity
2008-10-19 13:03 ` Sheng Yang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox