* [PATCH 0/4] KVM: x86: change and fix PIT discard tick policy
@ 2016-02-03 16:23 Radim Krčmář
2016-02-03 16:23 ` [PATCH 1/4] KVM: x86: fix interrupt dropping race in PIT Radim Krčmář
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Radim Krčmář @ 2016-02-03 16:23 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya
[3/4] fixes a bug where the legacy NMI watchdog doesn't receive NMIs if the
CPU that handles PIT interrupts hangs with interrupts masked.
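(Illustration, not part of the series: a minimal guest-side sketch of the
hang in question, with a made-up function name. With interrupts masked no
EOI ever arrives, so under the old discard policy the PIT stops ticking and
the LVT0-driven NMI watchdog stays silent.)

  /* Hypothetical guest code, for illustration only. */
  static void __attribute__((noreturn)) hang_with_irqs_masked(void)
  {
          asm volatile("cli");            /* mask maskable interrupts */
          for (;;)
                  asm volatile("hlt");    /* wakes only for NMI/SMI/INIT */
  }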
The remaining patches make the code nicer where necessary.
([1/4] looks like it fixes a bug, but that bug would also be fixed by
skipping the path in [3/4].)
[3/4] only changes the discard policy. A change that allows the legacy NMI
watchdog to work with the reinject policy is below, but my conscience doesn't
allow me to sign it off.
# diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
@@ -274,6 +274,10 @@ static void pit_do_work(struct kthread_work *work)
int i;
struct kvm_kpit_state *ps = &pit->pit_state;
+ if (atomic_read(&kvm->arch.vapics_in_nmi_mode) > 0)
+ kvm_for_each_vcpu(i, vcpu, kvm)
+ kvm_apic_nmi_wd_deliver(vcpu);
+
if (ps->reinject && !atomic_xchg(&ps->irq_ack, 0))
return;
@@ -289,9 +293,6 @@ static void pit_do_work(struct kthread_work *work)
* VCPUs and only when LVT0 is in NMI mode. The interrupt can
* also be simultaneously delivered through PIC and IOAPIC.
*/
- if (atomic_read(&kvm->arch.vapics_in_nmi_mode) > 0)
- kvm_for_each_vcpu(i, vcpu, kvm)
- kvm_apic_nmi_wd_deliver(vcpu);
}
static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
Radim Krčmář (4):
KVM: x86: fix interrupt dropping race in PIT
KVM: x86: refactor PIT state inject_lock
KVM: x86: change PIT discard tick policy
KVM: x86: remove notifiers from PIT discard policy
arch/x86/kvm/i8254.c | 122 ++++++++++++++++++++++++++-------------------------
arch/x86/kvm/i8254.h | 4 +-
arch/x86/kvm/x86.c | 6 +--
3 files changed, 67 insertions(+), 65 deletions(-)
--
2.7.0
* [PATCH 1/4] KVM: x86: fix interrupt dropping race in PIT
2016-02-03 16:23 [PATCH 0/4] KVM: x86: change and fix PIT discard tick policy Radim Krčmář
@ 2016-02-03 16:23 ` Radim Krčmář
2016-02-04 13:35 ` Radim Krčmář
2016-02-03 16:23 ` [PATCH 2/4] KVM: x86: refactor PIT state inject_lock Radim Krčmář
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Radim Krčmář @ 2016-02-03 16:23 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya
A misuse of atomic operations opened a race window:

  kvm_pit_ack_irq:                          | pit_timer_fn:
  value = atomic_dec_return(&ps->pending);  |
                                            | !atomic_read(&ps->pending)
  if (value < 0) atomic_inc(&ps->pending);  |
If ps->pending starts as 0 and we are using the discard policy, we don't
inject any interrupt in kvm_pit_ack_irq or pit_timer_fn, leading to a
missed PIT cycle.
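(Illustration only, not part of the patch: a user-space model of the two
schemes, using C11 atomics to stand in for the kernel's atomic_t helpers.
It shows why atomic_add_unless() never lets a concurrent reader observe a
transient negative value, while dec-then-fixup does.)

  /* Sketch; the kernel helper is only mimicked here. */
  #include <stdatomic.h>
  #include <stdio.h>

  /* Model of atomic_add_unless(v, a, u): add a to *v unless *v == u,
   * return nonzero iff the addition happened (a cmpxchg loop). */
  static int atomic_add_unless(atomic_int *v, int a, int u)
  {
          int c = atomic_load(v);

          while (c != u)
                  if (atomic_compare_exchange_weak(v, &c, c + a))
                          return 1;
          return 0;
  }

  int main(void)
  {
          atomic_int pending = 0;

          /* Old scheme: dec-then-fixup briefly exposes pending == -1, so a
           * concurrent !atomic_read(&ps->pending) check in pit_timer_fn
           * fails and a tick is skipped. */
          int value = atomic_fetch_sub(&pending, 1) - 1;   /* dec_return */
          printf("old scheme, transient value: %d\n", value);
          if (value < 0)
                  atomic_fetch_add(&pending, 1);

          /* New scheme: pending can never be observed below zero. */
          printf("decremented: %d, pending now: %d\n",
                 atomic_add_unless(&pending, -1, 0),
                 atomic_load(&pending));
          return 0;
  }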
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
arch/x86/kvm/i8254.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index b0ea42b78ccd..de5f5018026f 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -239,13 +239,7 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
int value;
spin_lock(&ps->inject_lock);
- value = atomic_dec_return(&ps->pending);
- if (value < 0)
- /* spurious acks can be generated if, for example, the
- * PIC is being reset. Handle it gracefully here
- */
- atomic_inc(&ps->pending);
- else if (value > 0)
+ if (atomic_add_unless(&ps->pending, -1, 0))
/* in this case, we had multiple outstanding pit interrupts
* that we needed to inject. Reinject
*/
--
2.7.0
* [PATCH 2/4] KVM: x86: refactor PIT state inject_lock
2016-02-03 16:23 [PATCH 0/4] KVM: x86: change and fix PIT discard tick policy Radim Krčmář
2016-02-03 16:23 ` [PATCH 1/4] KVM: x86: fix interrupt dropping race in PIT Radim Krčmář
@ 2016-02-03 16:23 ` Radim Krčmář
2016-02-03 16:45 ` Paolo Bonzini
2016-02-03 16:23 ` [PATCH 3/4] KVM: x86: change PIT discard tick policy Radim Krčmář
2016-02-03 16:23 ` [PATCH 4/4] KVM: x86: remove notifiers from PIT discard policy Radim Krčmář
3 siblings, 1 reply; 11+ messages in thread
From: Radim Krčmář @ 2016-02-03 16:23 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya
The following patches would be even uglier if inject_lock didn't go away.
The patch also changes the virtual wire comment to better describe our
situation.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
arch/x86/kvm/i8254.c | 67 ++++++++++++++++++++++------------------------------
arch/x86/kvm/i8254.h | 3 +--
2 files changed, 29 insertions(+), 41 deletions(-)
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index de5f5018026f..a137eb381012 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -236,16 +236,13 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
{
struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state,
irq_ack_notifier);
- int value;
- spin_lock(&ps->inject_lock);
+ atomic_set(&ps->irq_ack, 1);
if (atomic_add_unless(&ps->pending, -1, 0))
/* in this case, we had multiple outstanding pit interrupts
* that we needed to inject. Reinject
*/
queue_kthread_work(&ps->pit->worker, &ps->pit->expired);
- ps->irq_ack = 1;
- spin_unlock(&ps->inject_lock);
}
void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
@@ -276,34 +273,25 @@ static void pit_do_work(struct kthread_work *work)
struct kvm_vcpu *vcpu;
int i;
struct kvm_kpit_state *ps = &pit->pit_state;
- int inject = 0;
- /* Try to inject pending interrupts when
- * last one has been acked.
+ if (!atomic_xchg(&ps->irq_ack, 0))
+ return;
+
+ kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1, false);
+ kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0, false);
+
+ /*
+ * Provides NMI watchdog support via Virtual Wire mode.
+ * The route is: PIT -> LVT0 in NMI mode.
+ *
+ * Note: Our Virtual Wire implementation does not follow
+ * the MP specification. We propagate a PIT interrupt to all
+ * VCPUs and only when LVT0 is in NMI mode. The interrupt can
+ * also be simultaneously delivered through PIC and IOAPIC.
*/
- spin_lock(&ps->inject_lock);
- if (ps->irq_ack) {
- ps->irq_ack = 0;
- inject = 1;
- }
- spin_unlock(&ps->inject_lock);
- if (inject) {
- kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1, false);
- kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0, false);
-
- /*
- * Provides NMI watchdog support via Virtual Wire mode.
- * The route is: PIT -> PIC -> LVT0 in NMI mode.
- *
- * Note: Our Virtual Wire implementation is simplified, only
- * propagating PIT interrupts to all VCPUs when they have set
- * LVT0 to NMI delivery. Other PIC interrupts are just sent to
- * VCPU0, and only if its LVT0 is in EXTINT mode.
- */
- if (atomic_read(&kvm->arch.vapics_in_nmi_mode) > 0)
- kvm_for_each_vcpu(i, vcpu, kvm)
- kvm_apic_nmi_wd_deliver(vcpu);
- }
+ if (atomic_read(&kvm->arch.vapics_in_nmi_mode) > 0)
+ kvm_for_each_vcpu(i, vcpu, kvm)
+ kvm_apic_nmi_wd_deliver(vcpu);
}
static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
@@ -323,6 +311,12 @@ static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
return HRTIMER_NORESTART;
}
+static void kvm_pit_reset_reinject(struct kvm_pit *pit)
+{
+ atomic_set(&pit->pit_state.pending, 0);
+ atomic_set(&pit->pit_state.irq_ack, 1);
+}
+
static void create_pit_timer(struct kvm *kvm, u32 val, int is_period)
{
struct kvm_kpit_state *ps = &kvm->arch.vpit->pit_state;
@@ -345,8 +339,7 @@ static void create_pit_timer(struct kvm *kvm, u32 val, int is_period)
ps->timer.function = pit_timer_fn;
ps->kvm = ps->pit->kvm;
- atomic_set(&ps->pending, 0);
- ps->irq_ack = 1;
+ kvm_pit_reset_reinject(ps->pit);
/*
* Do not allow the guest to program periodic timers with small
@@ -646,18 +639,15 @@ void kvm_pit_reset(struct kvm_pit *pit)
}
mutex_unlock(&pit->pit_state.lock);
- atomic_set(&pit->pit_state.pending, 0);
- pit->pit_state.irq_ack = 1;
+ kvm_pit_reset_reinject(pit);
}
static void pit_mask_notifer(struct kvm_irq_mask_notifier *kimn, bool mask)
{
struct kvm_pit *pit = container_of(kimn, struct kvm_pit, mask_notifier);
- if (!mask) {
- atomic_set(&pit->pit_state.pending, 0);
- pit->pit_state.irq_ack = 1;
- }
+ if (!mask)
+ kvm_pit_reset_reinject(pit);
}
static const struct kvm_io_device_ops pit_dev_ops = {
@@ -691,7 +681,6 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
mutex_init(&pit->pit_state.lock);
mutex_lock(&pit->pit_state.lock);
- spin_lock_init(&pit->pit_state.inject_lock);
pid = get_pid(task_tgid(current));
pid_nr = pid_vnr(pid);
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index c84990b42b5b..f8cf4b84f435 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -33,8 +33,7 @@ struct kvm_kpit_state {
u32 speaker_data_on;
struct mutex lock;
struct kvm_pit *pit;
- spinlock_t inject_lock;
- unsigned long irq_ack;
+ atomic_t irq_ack;
struct kvm_irq_ack_notifier irq_ack_notifier;
};
--
2.7.0
* [PATCH 3/4] KVM: x86: change PIT discard tick policy
2016-02-03 16:23 [PATCH 0/4] KVM: x86: change and fix PIT discard tick policy Radim Krčmář
2016-02-03 16:23 ` [PATCH 1/4] KVM: x86: fix interrupt dropping race in PIT Radim Krčmář
2016-02-03 16:23 ` [PATCH 2/4] KVM: x86: refactor PIT state inject_lock Radim Krčmář
@ 2016-02-03 16:23 ` Radim Krčmář
2016-02-03 16:48 ` Paolo Bonzini
2016-02-03 16:23 ` [PATCH 4/4] KVM: x86: remove notifiers from PIT discard policy Radim Krčmář
3 siblings, 1 reply; 11+ messages in thread
From: Radim Krčmář @ 2016-02-03 16:23 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya
Discard policy uses ack_notifiers to prevent injection of PIT interrupts
before EOI from the last one.
This patch changes the policy to always try to deliver the interrupt,
which makes a difference when its vector is already in the ISR.
The old implementation would drop the interrupt, but the proposed one
injects it into the IRR, as real hardware would.
The old policy breaks legacy NMI watchdogs, where the PIT is used through
the virtual wire (LVT0): the PIT never sends another interrupt before
receiving an EOI, so a guest deadlock with interrupts disabled also stops
NMIs.
Note that NMI doesn't do EOI, so PIT also had to send a normal interrupt
through IOAPIC. (KVM's PIT is deeply rotten and luckily not used much
in modern systems.)
Even though there is a chance of regressions, I think we can fix the
LVT0 NMI bug without introducing a new tick policy.
Reported-by: Yuki Shibuya <shibuya.yk@ncos.nec.co.jp>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
arch/x86/kvm/i8254.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index a137eb381012..fc554fbf71a7 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -237,6 +237,9 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state,
irq_ack_notifier);
+ if (!ps->reinject)
+ return;
+
atomic_set(&ps->irq_ack, 1);
if (atomic_add_unless(&ps->pending, -1, 0))
/* in this case, we had multiple outstanding pit interrupts
@@ -274,7 +277,7 @@ static void pit_do_work(struct kthread_work *work)
int i;
struct kvm_kpit_state *ps = &pit->pit_state;
- if (!atomic_xchg(&ps->irq_ack, 0))
+ if (ps->reinject && !atomic_xchg(&ps->irq_ack, 0))
return;
kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1, false);
@@ -299,10 +302,10 @@ static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
struct kvm_kpit_state *ps = container_of(data, struct kvm_kpit_state, timer);
struct kvm_pit *pt = ps->kvm->arch.vpit;
- if (ps->reinject || !atomic_read(&ps->pending)) {
+ if (ps->reinject)
atomic_inc(&ps->pending);
- queue_kthread_work(&pt->worker, &pt->expired);
- }
+
+ queue_kthread_work(&pt->worker, &pt->expired);
if (ps->is_periodic) {
hrtimer_add_expires_ns(&ps->timer, ps->period);
--
2.7.0
* [PATCH 4/4] KVM: x86: remove notifiers from PIT discard policy
2016-02-03 16:23 [PATCH 0/4] KVM: x86: change and fix PIT discard tick policy Radim Krčmář
` (2 preceding siblings ...)
2016-02-03 16:23 ` [PATCH 3/4] KVM: x86: change PIT discard tick policy Radim Krčmář
@ 2016-02-03 16:23 ` Radim Krčmář
2016-02-03 16:51 ` Paolo Bonzini
3 siblings, 1 reply; 11+ messages in thread
From: Radim Krčmář @ 2016-02-03 16:23 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya
Discard policy doesn't rely on information from notifiers, so we don't
need to register notifiers unconditionally.
Use of ps->lock doesn't make sense, but isn't any worse than before.
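(For context, a hedged sketch of the user-space side of the ioctl touched
in x86.c below; vm_fd and the error handling are assumptions, and it relies
on an in-kernel PIT already created with KVM_CREATE_PIT2. This is how a VMM
would pick the discard policy that this series changes.)

  /* Sketch: selecting the PIT tick policy from user space. */
  #include <linux/kvm.h>
  #include <sys/ioctl.h>
  #include <string.h>
  #include <stdio.h>

  static int pit_set_reinject(int vm_fd, int reinject)
  {
          struct kvm_reinject_control control;

          memset(&control, 0, sizeof(control));
          control.pit_reinject = reinject;   /* 0 = discard, 1 = reinject */

          if (ioctl(vm_fd, KVM_REINJECT_CONTROL, &control) < 0) {
                  perror("KVM_REINJECT_CONTROL");
                  return -1;
          }
          return 0;
  }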
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
arch/x86/kvm/i8254.c | 44 ++++++++++++++++++++++++++++++--------------
arch/x86/kvm/i8254.h | 1 +
arch/x86/kvm/x86.c | 6 +++---
3 files changed, 34 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index fc554fbf71a7..f1dfc250bbe0 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -237,9 +237,6 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state,
irq_ack_notifier);
- if (!ps->reinject)
- return;
-
atomic_set(&ps->irq_ack, 1);
if (atomic_add_unless(&ps->pending, -1, 0))
/* in this case, we had multiple outstanding pit interrupts
@@ -708,14 +705,13 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
hrtimer_init(&pit_state->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
pit_state->irq_ack_notifier.gsi = 0;
pit_state->irq_ack_notifier.irq_acked = kvm_pit_ack_irq;
- kvm_register_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier);
- pit_state->reinject = true;
mutex_unlock(&pit->pit_state.lock);
- kvm_pit_reset(pit);
-
pit->mask_notifier.func = pit_mask_notifer;
- kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
+
+ kvm_pit_set_reinject(pit, true);
+
+ kvm_pit_reset(pit);
kvm_iodevice_init(&pit->dev, &pit_dev_ops);
ret = kvm_io_bus_register_dev(kvm, KVM_PIO_BUS, KVM_PIT_BASE_ADDRESS,
@@ -738,14 +734,37 @@ fail_unregister:
kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &pit->dev);
fail:
- kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
- kvm_unregister_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier);
+ kvm_pit_set_reinject(pit, false);
kvm_free_irq_source_id(kvm, pit->irq_source_id);
kthread_stop(pit->worker_task);
kfree(pit);
return NULL;
}
+void kvm_pit_set_reinject(struct kvm_pit *pit, bool reinject)
+{
+ /* Implicit BUG on NULL dereference. */
+ struct kvm_kpit_state *ps = &pit->pit_state;
+ struct kvm *kvm = pit->kvm;
+
+ mutex_lock(&ps->lock);
+
+ if (ps->reinject == reinject)
+ goto out;
+
+ ps->reinject = reinject;
+ if (reinject) {
+ kvm_pit_reset_reinject(pit);
+ kvm_register_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
+ kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
+ } else {
+ kvm_unregister_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
+ kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
+ }
+out:
+ mutex_unlock(&ps->lock);
+}
+
void kvm_free_pit(struct kvm *kvm)
{
struct hrtimer *timer;
@@ -754,10 +773,7 @@ void kvm_free_pit(struct kvm *kvm)
kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &kvm->arch.vpit->dev);
kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS,
&kvm->arch.vpit->speaker_dev);
- kvm_unregister_irq_mask_notifier(kvm, 0,
- &kvm->arch.vpit->mask_notifier);
- kvm_unregister_irq_ack_notifier(kvm,
- &kvm->arch.vpit->pit_state.irq_ack_notifier);
+ kvm_pit_set_reinject(kvm->arch.vpit, false);
mutex_lock(&kvm->arch.vpit->pit_state.lock);
timer = &kvm->arch.vpit->pit_state.timer;
hrtimer_cancel(timer);
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index f8cf4b84f435..7ac94bc0799f 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -60,5 +60,6 @@ void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_s
struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags);
void kvm_free_pit(struct kvm *kvm);
void kvm_pit_reset(struct kvm_pit *pit);
+void kvm_pit_set_reinject(struct kvm_pit *pit, bool reinject);
#endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5b937fdebc66..6bc03141e809 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3652,9 +3652,9 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
{
if (!kvm->arch.vpit)
return -ENXIO;
- mutex_lock(&kvm->arch.vpit->pit_state.lock);
- kvm->arch.vpit->pit_state.reinject = control->pit_reinject;
- mutex_unlock(&kvm->arch.vpit->pit_state.lock);
+
+ kvm_pit_set_reinject(kvm->arch.vpit, control->pit_reinject);
+
return 0;
}
--
2.7.0
* Re: [PATCH 2/4] KVM: x86: refactor PIT state inject_lock
2016-02-03 16:23 ` [PATCH 2/4] KVM: x86: refactor PIT state inject_lock Radim Krčmář
@ 2016-02-03 16:45 ` Paolo Bonzini
2016-02-04 13:13 ` Radim Krčmář
0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2016-02-03 16:45 UTC (permalink / raw)
To: Radim Krčmář, linux-kernel; +Cc: kvm, Yuki Shibuya
On 03/02/2016 17:23, Radim Krčmář wrote:
> The following patches would be even uglier if inject_lock didn't go away.
>
> The patch also changes the virtual wire comment to better describe our
> situation.
>
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
> arch/x86/kvm/i8254.c | 67 ++++++++++++++++++++++------------------------------
> arch/x86/kvm/i8254.h | 3 +--
> 2 files changed, 29 insertions(+), 41 deletions(-)
>
> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> index de5f5018026f..a137eb381012 100644
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -236,16 +236,13 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
> {
> struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state,
> irq_ack_notifier);
> - int value;
>
> - spin_lock(&ps->inject_lock);
> + atomic_set(&ps->irq_ack, 1);
smp_mb__before_atomic();
> if (atomic_add_unless(&ps->pending, -1, 0))
> /* in this case, we had multiple outstanding pit interrupts
> * that we needed to inject. Reinject
> */
> queue_kthread_work(&ps->pit->worker, &ps->pit->expired);
> - ps->irq_ack = 1;
> - spin_unlock(&ps->inject_lock);
> }
>
> void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
> @@ -276,34 +273,25 @@ static void pit_do_work(struct kthread_work *work)
> struct kvm_vcpu *vcpu;
> int i;
> struct kvm_kpit_state *ps = &pit->pit_state;
> - int inject = 0;
>
> - /* Try to inject pending interrupts when
> - * last one has been acked.
> + if (!atomic_xchg(&ps->irq_ack, 0))
> + return;
> +
> + kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1, false);
> + kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0, false);
> +
> + /*
> + * Provides NMI watchdog support via Virtual Wire mode.
> + * The route is: PIT -> LVT0 in NMI mode.
> + *
> + * Note: Our Virtual Wire implementation does not follow
> + * the MP specification. We propagate a PIT interrupt to all
> + * VCPUs and only when LVT0 is in NMI mode. The interrupt can
> + * also be simultaneously delivered through PIC and IOAPIC.
> */
> - spin_lock(&ps->inject_lock);
> - if (ps->irq_ack) {
> - ps->irq_ack = 0;
> - inject = 1;
> - }
> - spin_unlock(&ps->inject_lock);
> - if (inject) {
> - kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1, false);
> - kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0, false);
> -
> - /*
> - * Provides NMI watchdog support via Virtual Wire mode.
> - * The route is: PIT -> PIC -> LVT0 in NMI mode.
> - *
> - * Note: Our Virtual Wire implementation is simplified, only
> - * propagating PIT interrupts to all VCPUs when they have set
> - * LVT0 to NMI delivery. Other PIC interrupts are just sent to
> - * VCPU0, and only if its LVT0 is in EXTINT mode.
> - */
> - if (atomic_read(&kvm->arch.vapics_in_nmi_mode) > 0)
> - kvm_for_each_vcpu(i, vcpu, kvm)
> - kvm_apic_nmi_wd_deliver(vcpu);
> - }
> + if (atomic_read(&kvm->arch.vapics_in_nmi_mode) > 0)
> + kvm_for_each_vcpu(i, vcpu, kvm)
> + kvm_apic_nmi_wd_deliver(vcpu);
> }
>
> static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
> @@ -323,6 +311,12 @@ static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
> return HRTIMER_NORESTART;
> }
>
> +static void kvm_pit_reset_reinject(struct kvm_pit *pit)
> +{
> + atomic_set(&pit->pit_state.pending, 0);
smp_wmb()?
Looks safe otherwise. (Please also add a comment before the memory
barriers to show the pairing).
Paolo
* Re: [PATCH 3/4] KVM: x86: change PIT discard tick policy
2016-02-03 16:23 ` [PATCH 3/4] KVM: x86: change PIT discard tick policy Radim Krčmář
@ 2016-02-03 16:48 ` Paolo Bonzini
0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2016-02-03 16:48 UTC (permalink / raw)
To: Radim Krčmář, linux-kernel; +Cc: kvm, Yuki Shibuya
On 03/02/2016 17:23, Radim Krčmář wrote:
> Discard policy uses ack_notifiers to prevent injection of PIT interrupts
> before EOI from the last one.
>
> This patch changes the policy to always try to deliver the interrupt,
> which makes a difference when its vector is already in the ISR.
> The old implementation would drop the interrupt, but the proposed one
> injects it into the IRR, as real hardware would.
>
> The old policy breaks legacy NMI watchdogs, where the PIT is used through
> the virtual wire (LVT0): the PIT never sends another interrupt before
> receiving an EOI, so a guest deadlock with interrupts disabled also stops
> NMIs.
>
> Note that NMI doesn't do EOI, so PIT also had to send a normal interrupt
> through IOAPIC. (KVM's PIT is deeply rotten and luckily not used much
> in modern systems.)
>
> Even though there is a chance of regressions, I think we can fix the
> LVT0 NMI bug without introducing a new tick policy.
>
> Reported-by: Yuki Shibuya <shibuya.yk@ncos.nec.co.jp>
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Haven't looked at the patch yet, but this is definitely how DISCARD is
supposed to work.
Paolo
* Re: [PATCH 4/4] KVM: x86: remove notifiers from PIT discard policy
2016-02-03 16:23 ` [PATCH 4/4] KVM: x86: remove notifiers from PIT discard policy Radim Krčmář
@ 2016-02-03 16:51 ` Paolo Bonzini
2016-02-04 13:15 ` Radim Krčmář
0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2016-02-03 16:51 UTC (permalink / raw)
To: Radim Krčmář, linux-kernel; +Cc: kvm, Yuki Shibuya
On 03/02/2016 17:23, Radim Krčmář wrote:
> Discard policy doesn't rely on information from notifiers, so we don't
> need to register notifiers unconditionally.
>
> Use of ps->lock doesn't make sense, but isn't any worse than before.
Oh, it's perfectly okay. Too fine-grained locks are bad, and lock
contention on ps->lock is a non-issue.
Can you however add a patch that says what fields of kvm_kpit_state are
protected by which locks? Then this patch will just add
/* Protected by kvm_kpit_state lock. */
above the reinject field.
Otherwise
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
> arch/x86/kvm/i8254.c | 44 ++++++++++++++++++++++++++++++--------------
> arch/x86/kvm/i8254.h | 1 +
> arch/x86/kvm/x86.c | 6 +++---
> 3 files changed, 34 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> index fc554fbf71a7..f1dfc250bbe0 100644
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -237,9 +237,6 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
> struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state,
> irq_ack_notifier);
>
> - if (!ps->reinject)
> - return;
> -
> atomic_set(&ps->irq_ack, 1);
> if (atomic_add_unless(&ps->pending, -1, 0))
> /* in this case, we had multiple outstanding pit interrupts
> @@ -708,14 +705,13 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
> hrtimer_init(&pit_state->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
> pit_state->irq_ack_notifier.gsi = 0;
> pit_state->irq_ack_notifier.irq_acked = kvm_pit_ack_irq;
> - kvm_register_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier);
> - pit_state->reinject = true;
> mutex_unlock(&pit->pit_state.lock);
>
> - kvm_pit_reset(pit);
> -
> pit->mask_notifier.func = pit_mask_notifer;
> - kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
> +
> + kvm_pit_set_reinject(pit, true);
> +
> + kvm_pit_reset(pit);
>
> kvm_iodevice_init(&pit->dev, &pit_dev_ops);
> ret = kvm_io_bus_register_dev(kvm, KVM_PIO_BUS, KVM_PIT_BASE_ADDRESS,
> @@ -738,14 +734,37 @@ fail_unregister:
> kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &pit->dev);
>
> fail:
> - kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
> - kvm_unregister_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier);
> + kvm_pit_set_reinject(pit, false);
> kvm_free_irq_source_id(kvm, pit->irq_source_id);
> kthread_stop(pit->worker_task);
> kfree(pit);
> return NULL;
> }
>
> +void kvm_pit_set_reinject(struct kvm_pit *pit, bool reinject)
> +{
> + /* Implicit BUG on NULL dereference. */
> + struct kvm_kpit_state *ps = &pit->pit_state;
> + struct kvm *kvm = pit->kvm;
> +
> + mutex_lock(&ps->lock);
> +
> + if (ps->reinject == reinject)
> + goto out;
> +
> + ps->reinject = reinject;
> + if (reinject) {
> + kvm_pit_reset_reinject(pit);
> + kvm_register_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
> + kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
> + } else {
> + kvm_unregister_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
> + kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
> + }
> +out:
> + mutex_unlock(&ps->lock);
> +}
> +
> void kvm_free_pit(struct kvm *kvm)
> {
> struct hrtimer *timer;
> @@ -754,10 +773,7 @@ void kvm_free_pit(struct kvm *kvm)
> kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &kvm->arch.vpit->dev);
> kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS,
> &kvm->arch.vpit->speaker_dev);
> - kvm_unregister_irq_mask_notifier(kvm, 0,
> - &kvm->arch.vpit->mask_notifier);
> - kvm_unregister_irq_ack_notifier(kvm,
> - &kvm->arch.vpit->pit_state.irq_ack_notifier);
> + kvm_pit_set_reinject(kvm->arch.vpit, false);
> mutex_lock(&kvm->arch.vpit->pit_state.lock);
> timer = &kvm->arch.vpit->pit_state.timer;
> hrtimer_cancel(timer);
> diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
> index f8cf4b84f435..7ac94bc0799f 100644
> --- a/arch/x86/kvm/i8254.h
> +++ b/arch/x86/kvm/i8254.h
> @@ -60,5 +60,6 @@ void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_s
> struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags);
> void kvm_free_pit(struct kvm *kvm);
> void kvm_pit_reset(struct kvm_pit *pit);
> +void kvm_pit_set_reinject(struct kvm_pit *pit, bool reinject);
>
> #endif
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5b937fdebc66..6bc03141e809 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3652,9 +3652,9 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
> {
> if (!kvm->arch.vpit)
> return -ENXIO;
> - mutex_lock(&kvm->arch.vpit->pit_state.lock);
> - kvm->arch.vpit->pit_state.reinject = control->pit_reinject;
> - mutex_unlock(&kvm->arch.vpit->pit_state.lock);
> +
> + kvm_pit_set_reinject(kvm->arch.vpit, control->pit_reinject);
> +
> return 0;
> }
>
>
* Re: [PATCH 2/4] KVM: x86: refactor PIT state inject_lock
2016-02-03 16:45 ` Paolo Bonzini
@ 2016-02-04 13:13 ` Radim Krčmář
0 siblings, 0 replies; 11+ messages in thread
From: Radim Krčmář @ 2016-02-04 13:13 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: linux-kernel, kvm, Yuki Shibuya
2016-02-03 17:45+0100, Paolo Bonzini:
> On 03/02/2016 17:23, Radim Krčmář wrote:
>> The following patches would be even uglier if inject_lock didn't go away.
>>
>> The patch also changes the virtual wire comment to better describe our
>> situation.
>>
>> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
>> ---
>> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
>> @@ -236,16 +236,13 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
>> {
>> struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state,
>> irq_ack_notifier);
>> - int value;
>>
>> - spin_lock(&ps->inject_lock);
(Our code doesn't need a barrier between dereferencing the pointer and
reading its contents, and this bug cannot happen on x86.)
>> + atomic_set(&ps->irq_ack, 1);
>
> smp_mb__before_atomic();
irq_ack has to be set before queueing pit_do_work, or we could lose the
interrupt.
atomic_add_unless implies full barriers, so I think we're ok here.
>> if (atomic_add_unless(&ps->pending, -1, 0))
>> queue_kthread_work(&ps->pit->worker, &ps->pit->expired);
>> void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
>> @@ -323,6 +311,12 @@ static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
>> return HRTIMER_NORESTART;
>> }
>>
>> +static void kvm_pit_reset_reinject(struct kvm_pit *pit)
>> +{
>> + atomic_set(&pit->pit_state.pending, 0);
>
> smp_wmb()?
I don't think that we need to ensure the order of pending and irq_ack
because there isn't a dependency between these two. The reset is a slap
out of nowhere ... I think we'd need locks to make it behave correctly.
The worst that can happen is one extra virtual wire NMI injection while the
IOAPIC interrupt is still in the IRR, so the number of injections won't
match. The race goes like this:
kvm_pit_ack_irq is running and we have pending > 0, so pit_do_work is
scheduled and executed. pit_do_work sets irq_ack from 1 to 0.
Then pit_timer_fn gets executed, increases pending and queues
pit_do_work. Before pit_do_work has a chance to run, we set pending=0
and irq_ack=1 in kvm_pit_reset_reinject. pit_do_work is executed, sets
irq_ack from 1 to 0, but injects only the NMI, because interrupt is
already in IRR. When the interrupt does EOI, we don't reinject it,
because pending==0.
Barriers can't solve this, locking would be pretty awkward and I think
that having one interrupt more or less is ok for the delay policy. :)
| + atomic_set(&pit->pit_state.irq_ack, 1);
Callers of kvm_pit_reset_reinject are providing barriers, so assignment
can't be reordered into inappropriate places, but it wouldn't hurt to
add barriers here.
> Looks safe otherwise. (Please also add a comment before the memory
> barriers to show the pairing).
Thanks, I'll comment what I can.
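(A sketch of the handoff discussed above, using C11 atomics; not kernel
code. C11 atomics are seq_cst by default, so the ordering question doesn't
arise here; in the kernel it is the full barrier implied by
atomic_add_unless() that keeps the irq_ack store ordered before the work is
queued.)

  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stdio.h>

  static atomic_int irq_ack = 1;    /* 1: previous tick was EOI'd */
  static atomic_int pending;        /* ticks still owed to the guest */
  static bool work_queued;

  static void queue_work(void) { work_queued = true; }   /* stand-in */

  static void do_work(void)         /* models pit_do_work() */
  {
          work_queued = false;
          if (!atomic_exchange(&irq_ack, 0))
                  return;                          /* not acked yet */
          printf("inject PIT interrupt\n");        /* kvm_set_irq() pulse */
  }

  static void ack_irq(void)         /* models kvm_pit_ack_irq() */
  {
          atomic_store(&irq_ack, 1);               /* must come first */
          int c = atomic_load(&pending);           /* atomic_add_unless */
          while (c != 0 &&
                 !atomic_compare_exchange_weak(&pending, &c, c - 1))
                  ;
          if (c != 0)
                  queue_work();                    /* reinject owed tick */
  }

  int main(void)
  {
          atomic_store(&pending, 1);
          do_work();                /* work queued earlier by the timer */
          ack_irq();                /* guest EOIs the injected tick */
          if (work_queued)
                  do_work();        /* the owed tick gets reinjected */
          return 0;
  }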
* Re: [PATCH 4/4] KVM: x86: remove notifiers from PIT discard policy
2016-02-03 16:51 ` Paolo Bonzini
@ 2016-02-04 13:15 ` Radim Krčmář
0 siblings, 0 replies; 11+ messages in thread
From: Radim Krčmář @ 2016-02-04 13:15 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: linux-kernel, kvm, Yuki Shibuya
2016-02-03 17:51+0100, Paolo Bonzini:
> On 03/02/2016 17:23, Radim Krčmář wrote:
>> Discard policy doesn't rely on information from notifiers, so we don't
>> need to register notifiers unconditionally.
>>
>> Use of ps->lock doesn't make sense, but isn't any worse than before.
Oops, it is worse than before ... toggling KVM_REINJECT_CONTROL when the
guest is running and reading reinject without locking is now far more
complex. This patch should have also ignored KVM_REINJECT_CONTROL when
PIT has been started.
> Oh, it's perfectly okay. Too fine-grained locks are bad, and lock
> contention on ps->lock is a non-issue.
>
> Can you however add a patch that says what fields of kvm_kpit_state are
> protected by which locks?
Ok. (I'll be careful to not rewrite the whole PIT while at it. :])
> Then this patch will just add
>
> /* Protected by kvm_kpit_state lock. */
>
> above the reinject field.
There was no need to lock reinject in the past and v2 will hopefully
achieve it again.
> Otherwise
>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Thanks. (Might not be applicable to v2, though; sorry.)
* Re: [PATCH 1/4] KVM: x86: fix interrupt dropping race in PIT
2016-02-03 16:23 ` [PATCH 1/4] KVM: x86: fix interrupt dropping race in PIT Radim Krčmář
@ 2016-02-04 13:35 ` Radim Krčmář
0 siblings, 0 replies; 11+ messages in thread
From: Radim Krčmář @ 2016-02-04 13:35 UTC (permalink / raw)
To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya
2016-02-03 17:23+0100, Radim Krčmář:
> A misuse of atomic operations opened a race window:
>
>   kvm_pit_ack_irq:                          | pit_timer_fn:
>   value = atomic_dec_return(&ps->pending);  |
>                                             | !atomic_read(&ps->pending)
>   if (value < 0) atomic_inc(&ps->pending);  |
>
> If ps->pending starts as 0 and we are using the discard policy, we don't
> inject any interrupt in kvm_pit_ack_irq or pit_timer_fn, leading to a
> missed PIT cycle.
>
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> @@ -239,13 +239,7 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
> int value;
>
> spin_lock(&ps->inject_lock);
> - value = atomic_dec_return(&ps->pending);
> - if (value < 0)
> - /* spurious acks can be generated if, for example, the
> - * PIC is being reset. Handle it gracefully here
> - */
> - atomic_inc(&ps->pending);
> - else if (value > 0)
> + if (atomic_add_unless(&ps->pending, -1, 0))
This is not right, I'll redo the series. Sorry.