* [PATCH 0/4] KVM: SVM: improve NMI window singlestep
@ 2017-06-15 11:20 Ladi Prosek
2017-06-15 11:20 ` [PATCH 1/4] KVM: SVM: introduce disable_nmi_singlestep helper Ladi Prosek
` (4 more replies)
0 siblings, 5 replies; 23+ messages in thread
From: Ladi Prosek @ 2017-06-15 11:20 UTC (permalink / raw)
To: kvm; +Cc: rkrcmar
On AMD hardware, Hyper-V doesn't work nested on KVM if L1 is started with
more than one vCPU. The problem is in NMI which is heavily used by Windows
on SMP systems.
This series fixes three related issues with the current NMI singlestep
logic and makes Windows with Hyper-V happy. The whole thing is far from
perfect, though, especially considering the interaction with user-mode
singlestepping (KVM_GUESTDBG_SINGLESTEP) which also uses the TRAP flag.
High-level, both KVM_GUESTDBG_SINGLESTEP and NMI window singlestep set
the TRAP flag to make the CPU exit after executing a single instruction.
But, in the absence of RFLAGS shadowing support in the hardware, for this
to work reliably KVM should make sure that this is 100% transparent to the
guest, i.e. the guest will never be able to tell that the TRAP flag is set.
NMI window singlestep kind of works with these patches because it's meant
only for short sequences (I believe that the original intention was to
step over an IRET but I doubt it's that simple anymore) so we can get
away with half-butting it. In particular, it's unlikely that the guest
would set the TRAP flag while the NMI window is closed. Properly handling
KVM_GUESTDBG_SINGLESTEP would likely involve intercepting PUSHF & POPF,
clearing the TRAP flag from the stack on interrupt entry, and possibly more.
Each of the following may be active, independently:
1) NMI injection looking for the window to open
2) User-mode singlestepping the guest using KVM_GUESTDBG_SINGLESTEP
3) The guest OS singlestepping a program
so an SVM_EXIT_EXCP_BASE + DB_VECTOR exit should be de-multiplexed to do
possibly several things: inject an NMI, notify user-mode, inject nested
exit or DB exception.
Ladi Prosek (4):
KVM: SVM: introduce disable_nmi_singlestep helper
KVM: nSVM: do not forward NMI window singlestep VM exits to L1
KVM: SVM: hide TF/RF flags used by NMI singlestep
KVM: SVM: don't NMI singlestep over event injection
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/4] KVM: SVM: introduce disable_nmi_singlestep helper
2017-06-15 11:20 [PATCH 0/4] KVM: SVM: improve NMI window singlestep Ladi Prosek
@ 2017-06-15 11:20 ` Ladi Prosek
2017-06-15 11:20 ` [PATCH 2/4] KVM: nSVM: do not forward NMI window singlestep VM exits to L1 Ladi Prosek
` (3 subsequent siblings)
4 siblings, 0 replies; 23+ messages in thread
From: Ladi Prosek @ 2017-06-15 11:20 UTC (permalink / raw)
To: kvm; +Cc: rkrcmar
Just moving the code to a new helper in preparation for subsequent
commits.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
arch/x86/kvm/svm.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ba9891a..27521a0 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -963,6 +963,14 @@ static void svm_disable_lbrv(struct vcpu_svm *svm)
set_msr_interception(msrpm, MSR_IA32_LASTINTTOIP, 0, 0);
}
+static void disable_nmi_singlestep(struct vcpu_svm *svm)
+{
+ svm->nmi_singlestep = false;
+ if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
+ svm->vmcb->save.rflags &=
+ ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
+}
+
/* Note:
* This hash table is used to map VM_ID to a struct kvm_arch,
* when handling AMD IOMMU GALOG notification to schedule in
@@ -2111,10 +2119,7 @@ static int db_interception(struct vcpu_svm *svm)
}
if (svm->nmi_singlestep) {
- svm->nmi_singlestep = false;
- if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
- svm->vmcb->save.rflags &=
- ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
+ disable_nmi_singlestep(svm);
}
if (svm->vcpu.guest_debug &
--
2.9.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 2/4] KVM: nSVM: do not forward NMI window singlestep VM exits to L1
2017-06-15 11:20 [PATCH 0/4] KVM: SVM: improve NMI window singlestep Ladi Prosek
2017-06-15 11:20 ` [PATCH 1/4] KVM: SVM: introduce disable_nmi_singlestep helper Ladi Prosek
@ 2017-06-15 11:20 ` Ladi Prosek
2017-06-15 12:08 ` Paolo Bonzini
2017-06-16 13:26 ` Radim Krčmář
2017-06-15 11:20 ` [PATCH 3/4] KVM: SVM: hide TF/RF flags used by NMI singlestep Ladi Prosek
` (2 subsequent siblings)
4 siblings, 2 replies; 23+ messages in thread
From: Ladi Prosek @ 2017-06-15 11:20 UTC (permalink / raw)
To: kvm; +Cc: rkrcmar
Nested hypervisor should not see singlestep VM exits if singlestepping
was enabled internally by KVM. Windows is particularly sensitive to this
and known to bluescreen on unexpected VM exits.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
arch/x86/kvm/svm.c | 45 ++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 40 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 27521a0..512cc61 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -189,6 +189,7 @@ struct vcpu_svm {
struct nested_state nested;
bool nmi_singlestep;
+ u64 nmi_singlestep_guest_rflags;
unsigned int3_injected;
unsigned long int3_rip;
@@ -966,9 +967,13 @@ static void svm_disable_lbrv(struct vcpu_svm *svm)
static void disable_nmi_singlestep(struct vcpu_svm *svm)
{
svm->nmi_singlestep = false;
- if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
- svm->vmcb->save.rflags &=
- ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
+ if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
+ /* Clear our flags if they were not set by the guest */
+ if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF))
+ svm->vmcb->save.rflags &= ~X86_EFLAGS_TF;
+ if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_RF))
+ svm->vmcb->save.rflags &= ~X86_EFLAGS_RF;
+ }
}
/* Note:
@@ -2538,6 +2543,31 @@ static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
return (value & mask) ? NESTED_EXIT_DONE : NESTED_EXIT_HOST;
}
+/* DB exceptions for our internal use must not cause vmexit */
+static int nested_svm_intercept_db(struct vcpu_svm *svm)
+{
+ unsigned long dr6;
+
+ /* If we're not singlestepping, it's not ours */
+ if (!svm->nmi_singlestep)
+ return NESTED_EXIT_DONE;
+
+ /* If it's not a singlestep exception, it's not ours */
+ if (kvm_get_dr(&svm->vcpu, 6, &dr6))
+ return NESTED_EXIT_DONE;
+ if (!(dr6 & DR6_BS))
+ return NESTED_EXIT_DONE;
+
+ /* If the guest is singlestepping, we should forward the vmexit */
+ if (svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF) {
+ disable_nmi_singlestep(svm);
+ return NESTED_EXIT_DONE;
+ }
+
+ /* It's ours, nested hypervisor must not see this vmexit */
+ return NESTED_EXIT_HOST;
+}
+
static int nested_svm_exit_special(struct vcpu_svm *svm)
{
u32 exit_code = svm->vmcb->control.exit_code;
@@ -2593,8 +2623,12 @@ static int nested_svm_intercept(struct vcpu_svm *svm)
}
case SVM_EXIT_EXCP_BASE ... SVM_EXIT_EXCP_BASE + 0x1f: {
u32 excp_bits = 1 << (exit_code - SVM_EXIT_EXCP_BASE);
- if (svm->nested.intercept_exceptions & excp_bits)
- vmexit = NESTED_EXIT_DONE;
+ if (svm->nested.intercept_exceptions & excp_bits) {
+ if (exit_code == SVM_EXIT_EXCP_BASE + DB_VECTOR)
+ vmexit = nested_svm_intercept_db(svm);
+ else
+ vmexit = NESTED_EXIT_DONE;
+ }
/* async page fault always cause vmexit */
else if ((exit_code == SVM_EXIT_EXCP_BASE + PF_VECTOR) &&
svm->apf_reason != 0)
@@ -4635,6 +4669,7 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
* Something prevents NMI from been injected. Single step over possible
* problem (IRET or exception injection or interrupt shadow)
*/
+ svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
svm->nmi_singlestep = true;
svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
}
--
2.9.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 3/4] KVM: SVM: hide TF/RF flags used by NMI singlestep
2017-06-15 11:20 [PATCH 0/4] KVM: SVM: improve NMI window singlestep Ladi Prosek
2017-06-15 11:20 ` [PATCH 1/4] KVM: SVM: introduce disable_nmi_singlestep helper Ladi Prosek
2017-06-15 11:20 ` [PATCH 2/4] KVM: nSVM: do not forward NMI window singlestep VM exits to L1 Ladi Prosek
@ 2017-06-15 11:20 ` Ladi Prosek
2017-06-15 12:09 ` Paolo Bonzini
2017-06-15 11:20 ` [PATCH 4/4] KVM: SVM: don't NMI singlestep over event injection Ladi Prosek
2017-06-15 12:03 ` [PATCH 0/4] KVM: SVM: improve NMI window singlestep Paolo Bonzini
4 siblings, 1 reply; 23+ messages in thread
From: Ladi Prosek @ 2017-06-15 11:20 UTC (permalink / raw)
To: kvm; +Cc: rkrcmar
These flags are used internally by SVM so it's cleaner to not leak
them to callers of svm_get_rflags. This is similar to how the TF
flag is handled on KVM_GUESTDBG_SINGLESTEP by kvm_get_rflags and
kvm_set_rflags.
Without this change, the flags may propagate from host VMCB to nested
VMCB or vice versa while singlestepping over a nested VM enter/exit,
and then get stuck in inappropriate places.
Example: NMI singlestepping is enabled while running L1 guest. The
instruction to step over is VMRUN and nested vmrun emulation stashes
rflags to hsave->save.rflags. Then if singlestepping is disabled
while still in L2, TF/RF will be cleared from the nested VMCB but the
next nested VM exit will restore them from hsave->save.rflags and
cause an unexpected DB exception.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
arch/x86/kvm/svm.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 512cc61..739010a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1725,11 +1725,24 @@ static void svm_vcpu_unblocking(struct kvm_vcpu *vcpu)
static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
{
- return to_svm(vcpu)->vmcb->save.rflags;
+ struct vcpu_svm *svm = to_svm(vcpu);
+ unsigned long rflags = svm->vmcb->save.rflags;
+
+ if (svm->nmi_singlestep) {
+ /* Hide our flags if they were not set by the guest */
+ if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF))
+ rflags &= ~X86_EFLAGS_TF;
+ if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_RF))
+ rflags &= ~X86_EFLAGS_RF;
+ }
+ return rflags;
}
static void svm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
{
+ if (to_svm(vcpu)->nmi_singlestep)
+ rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
+
/*
* Any change of EFLAGS.VM is accompanied by a reload of SS
* (caused by either a task switch or an inter-privilege IRET),
--
2.9.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 4/4] KVM: SVM: don't NMI singlestep over event injection
2017-06-15 11:20 [PATCH 0/4] KVM: SVM: improve NMI window singlestep Ladi Prosek
` (2 preceding siblings ...)
2017-06-15 11:20 ` [PATCH 3/4] KVM: SVM: hide TF/RF flags used by NMI singlestep Ladi Prosek
@ 2017-06-15 11:20 ` Ladi Prosek
2017-06-15 12:05 ` Paolo Bonzini
2017-06-15 12:03 ` [PATCH 0/4] KVM: SVM: improve NMI window singlestep Paolo Bonzini
4 siblings, 1 reply; 23+ messages in thread
From: Ladi Prosek @ 2017-06-15 11:20 UTC (permalink / raw)
To: kvm; +Cc: rkrcmar
Singlestepping is enabled by setting the TF flag and care must be
taken to not let the guest see (and reuse at an inconvenient time)
the modified rflags value. One such case is event injection, as part
of which flags are pushed on the stack to be restored later on iret.
This commit disables singlestepping when we're about to inject an
event and enables iret interception to still have a chance to
resume our NMI injection efforts.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
arch/x86/kvm/svm.c | 27 +++++++++++++++++++++++----
1 file changed, 23 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 739010a..c5ef56a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3271,11 +3271,14 @@ static int cpuid_interception(struct vcpu_svm *svm)
static int iret_interception(struct vcpu_svm *svm)
{
- ++svm->vcpu.stat.nmi_window_exits;
clr_intercept(svm, INTERCEPT_IRET);
- svm->vcpu.arch.hflags |= HF_IRET_MASK;
- svm->nmi_iret_rip = kvm_rip_read(&svm->vcpu);
- kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
+
+ if (svm->vcpu.arch.hflags & HF_NMI_MASK) {
+ ++svm->vcpu.stat.nmi_window_exits;
+ svm->vcpu.arch.hflags |= HF_IRET_MASK;
+ svm->nmi_iret_rip = kvm_rip_read(&svm->vcpu);
+ kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
+ }
return 1;
}
@@ -4823,6 +4826,22 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
if (unlikely(svm->nested.exit_required))
return;
+ /*
+ * Disable singlestep if we're injecting an interrupt/exception.
+ * We don't want our modified rflags to be pushed on the stack where
+ * we might not be able to easily reset them if we disabled NMI
+ * singlestep later.
+ */
+ if (svm->nmi_singlestep && svm->vmcb->control.event_inj) {
+ /*
+ * We enabled NMI singlestepping because the NMI window was
+ * closed. It's unlikely that injecting another event will make
+ * it any better. Try again later, on next iret at the latest.
+ */
+ disable_nmi_singlestep(svm);
+ set_intercept(svm, INTERCEPT_IRET);
+ }
+
pre_svm_run(svm);
sync_lapic_to_cr8(vcpu);
--
2.9.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 0/4] KVM: SVM: improve NMI window singlestep
2017-06-15 11:20 [PATCH 0/4] KVM: SVM: improve NMI window singlestep Ladi Prosek
` (3 preceding siblings ...)
2017-06-15 11:20 ` [PATCH 4/4] KVM: SVM: don't NMI singlestep over event injection Ladi Prosek
@ 2017-06-15 12:03 ` Paolo Bonzini
2017-06-15 12:10 ` Ladi Prosek
4 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2017-06-15 12:03 UTC (permalink / raw)
To: Ladi Prosek, KVM list
On 15/06/2017 13:20, Ladi Prosek wrote:
> NMI window singlestep kind of works with these patches because it's meant
> only for short sequences (I believe that the original intention was to
> step over an IRET but I doubt it's that simple anymore)
Yes, it was meant to step over an IRET or an interrupt shadow.
One extra case that may cause NMI singlestep these days is GIF=0, but
that is also solved easily:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d1efe2c62b3f..15a2f7f8e539 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4622,6 +4622,9 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_IRET_MASK))
== HF_NMI_MASK)
return; /* IRET will cause a vm exit */
+ if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_GIF_MASK))
+ == HF_NMI_MASK)
+ return; /* STGI will cause a vm exit */
so you could include this change in your series.
Paolo
> so we can get
> away with half-butting it. In particular, it's unlikely that the guest
> would set the TRAP flag while the NMI window is closed. Properly handling
> KVM_GUESTDBG_SINGLESTEP would likely involve intercepting PUSHF & POPF,
> clearing the TRAP flag from the stack on interrupt entry, and possibly more.
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 4/4] KVM: SVM: don't NMI singlestep over event injection
2017-06-15 11:20 ` [PATCH 4/4] KVM: SVM: don't NMI singlestep over event injection Ladi Prosek
@ 2017-06-15 12:05 ` Paolo Bonzini
2017-06-15 12:38 ` Ladi Prosek
0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2017-06-15 12:05 UTC (permalink / raw)
To: Ladi Prosek, kvm; +Cc: rkrcmar
On 15/06/2017 13:20, Ladi Prosek wrote:
> @@ -4823,6 +4826,22 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
> if (unlikely(svm->nested.exit_required))
> return;
>
> + /*
> + * Disable singlestep if we're injecting an interrupt/exception.
> + * We don't want our modified rflags to be pushed on the stack where
> + * we might not be able to easily reset them if we disabled NMI
> + * singlestep later.
> + */
> + if (svm->nmi_singlestep && svm->vmcb->control.event_inj) {
> + /*
> + * We enabled NMI singlestepping because the NMI window was
> + * closed. It's unlikely that injecting another event will make
> + * it any better. Try again later, on next iret at the latest.
> + */
> + disable_nmi_singlestep(svm);
> + set_intercept(svm, INTERCEPT_IRET);
> + }
> +
> pre_svm_run(svm);
>
> sync_lapic_to_cr8(vcpu);
>
I wonder if we could just force an immediate vmexit instead of asking
for one at the next IRET. Based on the AMD manual, event injection
happens before external interrupts cause a vmexit.
Interrupts here are disabled (through either IF or GIF) until VMRUN, so
just a
smp_send_reschedule(vcpu->cpu);
should be enough after disable_nmi_singlestep. Or if you put the code
in pre_svm_run you have a cpu local variable.
Paolo
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] KVM: nSVM: do not forward NMI window singlestep VM exits to L1
2017-06-15 11:20 ` [PATCH 2/4] KVM: nSVM: do not forward NMI window singlestep VM exits to L1 Ladi Prosek
@ 2017-06-15 12:08 ` Paolo Bonzini
2017-06-16 13:26 ` Radim Krčmář
1 sibling, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2017-06-15 12:08 UTC (permalink / raw)
To: Ladi Prosek, kvm; +Cc: rkrcmar
On 15/06/2017 13:20, Ladi Prosek wrote:
> @@ -4635,6 +4669,7 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
> * Something prevents NMI from been injected. Single step over possible
> * problem (IRET or exception injection or interrupt shadow)
> */
> + svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> svm->nmi_singlestep = true;
I like this, and I think it should be extended (not by you necessarily
:)) to KVM_GUESTDBG_SINGLESTEP. The current way of dropping TF from the
guest altogether kinda works, because you're not going to run nested
gdb, but is conceptually broken.
Paolo
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/4] KVM: SVM: hide TF/RF flags used by NMI singlestep
2017-06-15 11:20 ` [PATCH 3/4] KVM: SVM: hide TF/RF flags used by NMI singlestep Ladi Prosek
@ 2017-06-15 12:09 ` Paolo Bonzini
2017-06-15 13:02 ` Ladi Prosek
0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2017-06-15 12:09 UTC (permalink / raw)
To: Ladi Prosek, kvm; +Cc: rkrcmar
On 15/06/2017 13:20, Ladi Prosek wrote:
> Example: NMI singlestepping is enabled while running L1 guest. The
> instruction to step over is VMRUN and nested vmrun emulation stashes
> rflags to hsave->save.rflags. Then if singlestepping is disabled
> while still in L2, TF/RF will be cleared from the nested VMCB but the
> next nested VM exit will restore them from hsave->save.rflags and
> cause an unexpected DB exception.
Stupid question ahead, why is NMI singlestepping even using RF?
I should fire up my AMD box and see whether changing it breaks
eventinj.flat...
Paolo
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/4] KVM: SVM: improve NMI window singlestep
2017-06-15 12:03 ` [PATCH 0/4] KVM: SVM: improve NMI window singlestep Paolo Bonzini
@ 2017-06-15 12:10 ` Ladi Prosek
0 siblings, 0 replies; 23+ messages in thread
From: Ladi Prosek @ 2017-06-15 12:10 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: KVM list
On Thu, Jun 15, 2017 at 2:03 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 15/06/2017 13:20, Ladi Prosek wrote:
>> NMI window singlestep kind of works with these patches because it's meant
>> only for short sequences (I believe that the original intention was to
>> step over an IRET but I doubt it's that simple anymore)
>
> Yes, it was meant to step over an IRET or an interrupt shadow.
>
> One extra case that may cause NMI singlestep these days is GIF=0, but
> that is also solved easily:
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index d1efe2c62b3f..15a2f7f8e539 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -4622,6 +4622,9 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
> if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_IRET_MASK))
> == HF_NMI_MASK)
> return; /* IRET will cause a vm exit */
> + if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_GIF_MASK))
> + == HF_NMI_MASK)
> + return; /* STGI will cause a vm exit */
>
> so you could include this change in your series.
Will do, thanks!
> Paolo
>
>> so we can get
>> away with half-butting it. In particular, it's unlikely that the guest
>> would set the TRAP flag while the NMI window is closed. Properly handling
>> KVM_GUESTDBG_SINGLESTEP would likely involve intercepting PUSHF & POPF,
>> clearing the TRAP flag from the stack on interrupt entry, and possibly more.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/4] KVM: SVM: don't NMI singlestep over event injection
2017-06-15 12:05 ` Paolo Bonzini
@ 2017-06-15 12:38 ` Ladi Prosek
2017-06-15 13:21 ` Paolo Bonzini
0 siblings, 1 reply; 23+ messages in thread
From: Ladi Prosek @ 2017-06-15 12:38 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: KVM list, Radim Krcmar
On Thu, Jun 15, 2017 at 2:05 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 15/06/2017 13:20, Ladi Prosek wrote:
>> @@ -4823,6 +4826,22 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
>> if (unlikely(svm->nested.exit_required))
>> return;
>>
>> + /*
>> + * Disable singlestep if we're injecting an interrupt/exception.
>> + * We don't want our modified rflags to be pushed on the stack where
>> + * we might not be able to easily reset them if we disabled NMI
>> + * singlestep later.
>> + */
>> + if (svm->nmi_singlestep && svm->vmcb->control.event_inj) {
>> + /*
>> + * We enabled NMI singlestepping because the NMI window was
>> + * closed. It's unlikely that injecting another event will make
>> + * it any better. Try again later, on next iret at the latest.
>> + */
>> + disable_nmi_singlestep(svm);
>> + set_intercept(svm, INTERCEPT_IRET);
>> + }
>> +
>> pre_svm_run(svm);
>>
>> sync_lapic_to_cr8(vcpu);
>>
>
> I wonder if we could just force an immediate vmexit instead of asking
> for one at the next IRET. Based on the AMD manual, event injection
> happens before external interrupts cause a vmexit.
>
> Interrupts here are disabled (through either IF or GIF) until VMRUN, so
> just a
>
> smp_send_reschedule(vcpu->cpu);
>
> should be enough after disable_nmi_singlestep. Or if you put the code
> in pre_svm_run you have a cpu local variable.
Yup, this seems to work. Thanks!
> Paolo
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/4] KVM: SVM: hide TF/RF flags used by NMI singlestep
2017-06-15 12:09 ` Paolo Bonzini
@ 2017-06-15 13:02 ` Ladi Prosek
2017-06-15 13:32 ` Paolo Bonzini
0 siblings, 1 reply; 23+ messages in thread
From: Ladi Prosek @ 2017-06-15 13:02 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: KVM list, Radim Krcmar
On Thu, Jun 15, 2017 at 2:09 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 15/06/2017 13:20, Ladi Prosek wrote:
>> Example: NMI singlestepping is enabled while running L1 guest. The
>> instruction to step over is VMRUN and nested vmrun emulation stashes
>> rflags to hsave->save.rflags. Then if singlestepping is disabled
>> while still in L2, TF/RF will be cleared from the nested VMCB but the
>> next nested VM exit will restore them from hsave->save.rflags and
>> cause an unexpected DB exception.
>
> Stupid question ahead, why is NMI singlestepping even using RF?
To be sure that the DB is really going to be singlestep and not a
regular breakpoint? Otherwise db_interception would have to do more
checks and maybe inject DB_VECTOR even if nmi_singlestep is on. Hmm..
but we fail to deliver such a regular breakpoint DB to the guest right
now, don't we :/
> I should fire up my AMD box and see whether changing it breaks
> eventinj.flat...
>
> Paolo
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/4] KVM: SVM: don't NMI singlestep over event injection
2017-06-15 12:38 ` Ladi Prosek
@ 2017-06-15 13:21 ` Paolo Bonzini
0 siblings, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2017-06-15 13:21 UTC (permalink / raw)
To: Ladi Prosek; +Cc: KVM list, Radim Krcmar
On 15/06/2017 14:38, Ladi Prosek wrote:
> On Thu, Jun 15, 2017 at 2:05 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>>
>> On 15/06/2017 13:20, Ladi Prosek wrote:
>>> @@ -4823,6 +4826,22 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
>>> if (unlikely(svm->nested.exit_required))
>>> return;
>>>
>>> + /*
>>> + * Disable singlestep if we're injecting an interrupt/exception.
>>> + * We don't want our modified rflags to be pushed on the stack where
>>> + * we might not be able to easily reset them if we disabled NMI
>>> + * singlestep later.
>>> + */
>>> + if (svm->nmi_singlestep && svm->vmcb->control.event_inj) {
>>> + /*
>>> + * We enabled NMI singlestepping because the NMI window was
>>> + * closed. It's unlikely that injecting another event will make
>>> + * it any better. Try again later, on next iret at the latest.
>>> + */
>>> + disable_nmi_singlestep(svm);
>>> + set_intercept(svm, INTERCEPT_IRET);
>>> + }
>>> +
>>> pre_svm_run(svm);
>>>
>>> sync_lapic_to_cr8(vcpu);
>>>
>>
>> I wonder if we could just force an immediate vmexit instead of asking
>> for one at the next IRET. Based on the AMD manual, event injection
>> happens before external interrupts cause a vmexit.
>>
>> Interrupts here are disabled (through either IF or GIF) until VMRUN, so
>> just a
>>
>> smp_send_reschedule(vcpu->cpu);
>>
>> should be enough after disable_nmi_singlestep. Or if you put the code
>> in pre_svm_run you have a cpu local variable.
>
> Yup, this seems to work. Thanks!
Make sure to add a comment. :)
Paolo
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/4] KVM: SVM: hide TF/RF flags used by NMI singlestep
2017-06-15 13:02 ` Ladi Prosek
@ 2017-06-15 13:32 ` Paolo Bonzini
0 siblings, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2017-06-15 13:32 UTC (permalink / raw)
To: Ladi Prosek; +Cc: KVM list, Radim Krcmar
On 15/06/2017 15:02, Ladi Prosek wrote:
>> Stupid question ahead, why is NMI singlestepping even using RF?
> To be sure that the DB is really going to be singlestep and not a
> regular breakpoint?
Indeed, you would have to check DR6.BS before clearing nmi_singlestep in
db_interception.
Paolo
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] KVM: nSVM: do not forward NMI window singlestep VM exits to L1
2017-06-15 11:20 ` [PATCH 2/4] KVM: nSVM: do not forward NMI window singlestep VM exits to L1 Ladi Prosek
2017-06-15 12:08 ` Paolo Bonzini
@ 2017-06-16 13:26 ` Radim Krčmář
2017-06-19 12:50 ` Ladi Prosek
1 sibling, 1 reply; 23+ messages in thread
From: Radim Krčmář @ 2017-06-16 13:26 UTC (permalink / raw)
To: Ladi Prosek; +Cc: kvm
2017-06-15 13:20+0200, Ladi Prosek:
> Nested hypervisor should not see singlestep VM exits if singlestepping
> was enabled internally by KVM. Windows is particularly sensitive to this
> and known to bluescreen on unexpected VM exits.
>
> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
> ---
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> @@ -966,9 +967,13 @@ static void svm_disable_lbrv(struct vcpu_svm *svm)
> static void disable_nmi_singlestep(struct vcpu_svm *svm)
> {
> svm->nmi_singlestep = false;
> - if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
> - svm->vmcb->save.rflags &=
> - ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
> + if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
> + /* Clear our flags if they were not set by the guest */
> + if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF))
> + svm->vmcb->save.rflags &= ~X86_EFLAGS_TF;
> + if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_RF))
> + svm->vmcb->save.rflags &= ~X86_EFLAGS_RF;
IIUC, we intercept/fault on IRET, disable the interception, set TF+RF
and enter again, the CPU executes IRET and then we get a #DB exit.
IRET pops EFLAGS from before the NMI -- doesn't the CPU properly restore
EFLAGS, so we do not need this part here?
Thanks.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] KVM: nSVM: do not forward NMI window singlestep VM exits to L1
2017-06-16 13:26 ` Radim Krčmář
@ 2017-06-19 12:50 ` Ladi Prosek
2017-06-19 13:05 ` Ladi Prosek
0 siblings, 1 reply; 23+ messages in thread
From: Ladi Prosek @ 2017-06-19 12:50 UTC (permalink / raw)
To: Radim Krčmář; +Cc: KVM list
On Fri, Jun 16, 2017 at 3:26 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> 2017-06-15 13:20+0200, Ladi Prosek:
>> Nested hypervisor should not see singlestep VM exits if singlestepping
>> was enabled internally by KVM. Windows is particularly sensitive to this
>> and known to bluescreen on unexpected VM exits.
>>
>> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
>> ---
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> @@ -966,9 +967,13 @@ static void svm_disable_lbrv(struct vcpu_svm *svm)
>> static void disable_nmi_singlestep(struct vcpu_svm *svm)
>> {
>> svm->nmi_singlestep = false;
>> - if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
>> - svm->vmcb->save.rflags &=
>> - ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
>> + if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
>> + /* Clear our flags if they were not set by the guest */
>> + if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF))
>> + svm->vmcb->save.rflags &= ~X86_EFLAGS_TF;
>> + if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_RF))
>> + svm->vmcb->save.rflags &= ~X86_EFLAGS_RF;
>
> IIUC, we intercept/fault on IRET, disable the interception, set TF+RF
> and enter again, the CPU executes IRET and then we get a #DB exit.
>
> IRET pops EFLAGS from before the NMI -- doesn't the CPU properly restore
> EFLAGS, so we do not need this part here?
My test VM doesn't work without this part, even with the change that
Paolo suggested in 0/4:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d1efe2c62b3f..15a2f7f8e539 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4622,6 +4622,9 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_IRET_MASK))
== HF_NMI_MASK)
return; /* IRET will cause a vm exit */
+ if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_GIF_MASK))
+ == HF_NMI_MASK)
+ return; /* STGI will cause a vm exit */
Let's see what we're singlestepping over (Windows Server 2016 with Hyper-V).
1. The very first time we enable NMI singlestep is when running
nested. The nested guest's rip is just after 'pause' in a 'pause; cmp;
jne' loop. svm_nmi_allowed returns false because nested_svm_nmi
returns false - we don't want to inject NMI now because
svm->nested.intercept has the INTERCEPT_NMI bit set.
2. Then we find ourselves in L1 with the rip just after 'vmrun'.
svm_nmi_allowed returns false because gif_set returns false (hflags ==
HF_HIF_MASK | HF_VINTR_MASK). So we, again, enable NMI singlestep
(without having disabled it yet).
3. We singlestep over the instruction immediately following 'vmrun'
(it's a 'mov rax, [rsp+0x20]' in this trace) and finally disable NMI
singlestep on this vcpu. We inject the NMI when the guest executes
stgi.
I'll see if I can short this out. Setting the trap flag to step over
an instruction which has no other significance than following a
'vmrun' is indeed unnecessary.
Thanks!
Ladi
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] KVM: nSVM: do not forward NMI window singlestep VM exits to L1
2017-06-19 12:50 ` Ladi Prosek
@ 2017-06-19 13:05 ` Ladi Prosek
2017-06-19 13:52 ` Paolo Bonzini
2017-06-19 16:17 ` Radim Krčmář
0 siblings, 2 replies; 23+ messages in thread
From: Ladi Prosek @ 2017-06-19 13:05 UTC (permalink / raw)
To: Radim Krčmář; +Cc: KVM list
On Mon, Jun 19, 2017 at 2:50 PM, Ladi Prosek <lprosek@redhat.com> wrote:
> On Fri, Jun 16, 2017 at 3:26 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
>> 2017-06-15 13:20+0200, Ladi Prosek:
>>> Nested hypervisor should not see singlestep VM exits if singlestepping
>>> was enabled internally by KVM. Windows is particularly sensitive to this
>>> and known to bluescreen on unexpected VM exits.
>>>
>>> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
>>> ---
>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>> @@ -966,9 +967,13 @@ static void svm_disable_lbrv(struct vcpu_svm *svm)
>>> static void disable_nmi_singlestep(struct vcpu_svm *svm)
>>> {
>>> svm->nmi_singlestep = false;
>>> - if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
>>> - svm->vmcb->save.rflags &=
>>> - ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
>>> + if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
>>> + /* Clear our flags if they were not set by the guest */
>>> + if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF))
>>> + svm->vmcb->save.rflags &= ~X86_EFLAGS_TF;
>>> + if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_RF))
>>> + svm->vmcb->save.rflags &= ~X86_EFLAGS_RF;
>>
>> IIUC, we intercept/fault on IRET, disable the interception, set TF+RF
>> and enter again, the CPU executes IRET and then we get a #DB exit.
>>
>> IRET pops EFLAGS from before the NMI -- doesn't the CPU properly restore
>> EFLAGS, so we do not need this part here?
>
> My test VM doesn't work without this part, even with the change that
> Paolo suggested in 0/4:
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index d1efe2c62b3f..15a2f7f8e539 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -4622,6 +4622,9 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
> if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_IRET_MASK))
> == HF_NMI_MASK)
> return; /* IRET will cause a vm exit */
> + if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_GIF_MASK))
> + == HF_NMI_MASK)
> + return; /* STGI will cause a vm exit */
>
>
> Let's see what we're singlestepping over (Windows Server 2016 with Hyper-V).
>
> 1. The very first time we enable NMI singlestep is when running
> nested. The nested guest's rip is just after 'pause' in a 'pause; cmp;
> jne' loop. svm_nmi_allowed returns false because nested_svm_nmi
> returns false - we don't want to inject NMI now because
> svm->nested.intercept has the INTERCEPT_NMI bit set.
>
> 2. Then we find ourselves in L1 with the rip just after 'vmrun'.
> svm_nmi_allowed returns false because gif_set returns false (hflags ==
> HF_HIF_MASK | HF_VINTR_MASK). So we, again, enable NMI singlestep
> (without having disabled it yet).
>
> 3. We singlestep over the instruction immediately following 'vmrun'
> (it's a 'mov rax, [rsp+0x20]' in this trace) and finally disable NMI
> singlestep on this vcpu. We inject the NMI when the guest executes
> stgi.
>
>
> I'll see if I can short this out. Setting the trap flag to step over
> an instruction which has no other significance than following a
> 'vmrun' is indeed unnecessary.
Ok, I think I understand it.
First, Paolo's GIF change should be checking only HF_GIF_MASK. Whether
we're currently in an NMI or not does not make a difference, NMI is
not allowed and we have interception in place for when GIF is set so
we don't need to singlestep.
Second, enable_nmi_window can also do nothing if
svm->nested.exit_required. We're not really going to run the guest in
this case so no need to singlestep.
With these two tweaks my test VM does not generate any DB exits. We
still occasionally set TF if we have an NMI and interrupt pending at
the same time (see inject_pending_event where we check
vcpu->arch.interrupt.pending, then later vcpu->arch.nmi_pending) but
we clear it immediately in the new code added in patch 4.
What a complex state machine! I'll prepare v2.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] KVM: nSVM: do not forward NMI window singlestep VM exits to L1
2017-06-19 13:05 ` Ladi Prosek
@ 2017-06-19 13:52 ` Paolo Bonzini
2017-06-19 16:17 ` Radim Krčmář
1 sibling, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2017-06-19 13:52 UTC (permalink / raw)
To: Ladi Prosek, Radim Krčmář; +Cc: KVM list
On 19/06/2017 15:05, Ladi Prosek wrote:
> With these two tweaks my test VM does not generate any DB exits. We
> still occasionally set TF if we have an NMI and interrupt pending at
> the same time (see inject_pending_event where we check
> vcpu->arch.interrupt.pending, then later vcpu->arch.nmi_pending) but
> we clear it immediately in the new code added in patch 4.
Good, please check that eventinj.flat still works on non-nested and
doesn't regress on nested (I'm not sure if it works right now). I'll
check that too before applying, though.
Paolo
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] KVM: nSVM: do not forward NMI window singlestep VM exits to L1
2017-06-19 13:05 ` Ladi Prosek
2017-06-19 13:52 ` Paolo Bonzini
@ 2017-06-19 16:17 ` Radim Krčmář
2017-06-19 17:17 ` Paolo Bonzini
2017-06-20 7:41 ` Ladi Prosek
1 sibling, 2 replies; 23+ messages in thread
From: Radim Krčmář @ 2017-06-19 16:17 UTC (permalink / raw)
To: Ladi Prosek; +Cc: KVM list
2017-06-19 15:05+0200, Ladi Prosek:
> On Mon, Jun 19, 2017 at 2:50 PM, Ladi Prosek <lprosek@redhat.com> wrote:
> > On Fri, Jun 16, 2017 at 3:26 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> >> 2017-06-15 13:20+0200, Ladi Prosek:
> >>> Nested hypervisor should not see singlestep VM exits if singlestepping
> >>> was enabled internally by KVM. Windows is particularly sensitive to this
> >>> and known to bluescreen on unexpected VM exits.
> >>>
> >>> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
> >>> ---
> >>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >>> @@ -966,9 +967,13 @@ static void svm_disable_lbrv(struct vcpu_svm *svm)
> >>> static void disable_nmi_singlestep(struct vcpu_svm *svm)
> >>> {
> >>> svm->nmi_singlestep = false;
> >>> - if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
> >>> - svm->vmcb->save.rflags &=
> >>> - ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
> >>> + if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
> >>> + /* Clear our flags if they were not set by the guest */
> >>> + if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF))
> >>> + svm->vmcb->save.rflags &= ~X86_EFLAGS_TF;
> >>> + if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_RF))
> >>> + svm->vmcb->save.rflags &= ~X86_EFLAGS_RF;
> >>
> >> IIUC, we intercept/fault on IRET, disable the interception, set TF+RF
> >> and enter again, the CPU executes IRET and then we get a #DB exit.
> >>
> >> IRET pops EFLAGS from before the NMI -- doesn't the CPU properly restore
> >> EFLAGS, so we do not need this part here?
> >
> > My test VM doesn't work without this part, even with the change that
> > Paolo suggested in 0/4:
> >
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index d1efe2c62b3f..15a2f7f8e539 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -4622,6 +4622,9 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
> > if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_IRET_MASK))
> > == HF_NMI_MASK)
> > return; /* IRET will cause a vm exit */
> > + if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_GIF_MASK))
> > + == HF_NMI_MASK)
> > + return; /* STGI will cause a vm exit */
> >
> >
> > Let's see what we're singlestepping over (Windows Server 2016 with Hyper-V).
> >
> > 1. The very first time we enable NMI singlestep is when running
> > nested. The nested guest's rip is just after 'pause' in a 'pause; cmp;
> > jne' loop. svm_nmi_allowed returns false because nested_svm_nmi
> > returns false - we don't want to inject NMI now because
> > svm->nested.intercept has the INTERCEPT_NMI bit set.
> >
> > 2. Then we find ourselves in L1 with the rip just after 'vmrun'.
> > svm_nmi_allowed returns false because gif_set returns false (hflags ==
> > HF_HIF_MASK | HF_VINTR_MASK). So we, again, enable NMI singlestep
> > (without having disabled it yet).
> >
> > 3. We singlestep over the instruction immediately following 'vmrun'
> > (it's a 'mov rax, [rsp+0x20]' in this trace) and finally disable NMI
> > singlestep on this vcpu. We inject the NMI when the guest executes
> > stgi.
> >
> >
> > I'll see if I can short this out. Setting the trap flag to step over
> > an instruction which has no other significance than following a
> > 'vmrun' is indeed unnecessary.
>
> Ok, I think I understand it.
>
> First, Paolo's GIF change should be checking only HF_GIF_MASK. Whether
> we're currently in an NMI or not does not make a difference, NMI is
> not allowed and we have interception in place for when GIF is set so
> we don't need to singlestep.
>
> Second, enable_nmi_window can also do nothing if
> svm->nested.exit_required. We're not really going to run the guest in
> this case so no need to singlestep.
Makes sense.
> With these two tweaks my test VM does not generate any DB exits. We
> still occasionally set TF if we have an NMI and interrupt pending at
> the same time (see inject_pending_event where we check
> vcpu->arch.interrupt.pending, then later vcpu->arch.nmi_pending) but
> we clear it immediately in the new code added in patch 4.
Right, we only need the single step over IRET and interrupt shadow.
Btw. instead of single-stepping over IRET/interrupt shadow, could we set
INTERRUPT_SHADOW in VMCB, inject the NMI, and let it execute?
This mechanism would explain why AMD didn't provide a trap for IRET ...
APM 15.20 says "Injected events are treated in every way as though they
had occurred normally in the guest", which makes me think that
INTERRUPT_SHADOW blocks them, if it blocks NMIs at all on AMD.
e.g.
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 6e3095d1bad4..b564613b4e65 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4624,14 +4624,16 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_IRET_MASK))
== HF_NMI_MASK)
- return; /* IRET will cause a vm exit */
+ return 0; /* IRET will cause a vm exit */
- /*
- * Something prevents NMI from been injected. Single step over possible
- * problem (IRET or exception injection or interrupt shadow)
- */
- svm->nmi_singlestep = true;
- svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
+ if (svm->vmcb->control.event_inj)
+ return 1; /* will request VM exit immediately after injection */
+
+ --vcpu->arch.nmi_pending;
+ svm_inject_nmi(vcpu);
+ svm->vmcb->control.int_state |= SVM_INTERRUPT_SHADOW_MASK;
+
+ return 0;
}
static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr)
(Of course we'd want to refactor it for the final patch. :])
> What a complex state machine! I'll prepare v2.
Yes. :/ Looking forward to v2,
thanks.
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] KVM: nSVM: do not forward NMI window singlestep VM exits to L1
2017-06-19 16:17 ` Radim Krčmář
@ 2017-06-19 17:17 ` Paolo Bonzini
2017-06-19 17:46 ` Radim Krčmář
2017-06-20 7:41 ` Ladi Prosek
1 sibling, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2017-06-19 17:17 UTC (permalink / raw)
To: Radim Krčmář, Ladi Prosek; +Cc: KVM list
On 19/06/2017 18:17, Radim Krčmář wrote:
> Right, we only need the single step over IRET and interrupt shadow.
>
> Btw. instead of single-stepping over IRET/interrupt shadow, could we set
> INTERRUPT_SHADOW in VMCB, inject the NMI, and let it execute?
> This mechanism would explain why AMD didn't provide a trap for IRET ...
You mean they didn't provide a trap-like VMEXIT for IRET, only fault-like?
Thanks,
Paolo
> APM 15.20 says "Injected events are treated in every way as though they
> had occurred normally in the guest", which makes me think that
> INTERRUPT_SHADOW blocks them, if it blocks NMIs at all on AMD.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] KVM: nSVM: do not forward NMI window singlestep VM exits to L1
2017-06-19 17:17 ` Paolo Bonzini
@ 2017-06-19 17:46 ` Radim Krčmář
0 siblings, 0 replies; 23+ messages in thread
From: Radim Krčmář @ 2017-06-19 17:46 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Ladi Prosek, KVM list
2017-06-19 19:17+0200, Paolo Bonzini:
> On 19/06/2017 18:17, Radim Krčmář wrote:
> > Right, we only need the single step over IRET and interrupt shadow.
> >
> > Btw. instead of single-stepping over IRET/interrupt shadow, could we set
> > INTERRUPT_SHADOW in VMCB, inject the NMI, and let it execute?
> > This mechanism would explain why AMD didn't provide a trap for IRET ...
>
> You mean they didn't provide a trap-like VMEXIT for IRET, only fault-like?
Yes. SVM has trap-like VM exit, so I didn't understand why it was not
used for IRET. Forcing the hypervisor to have two VM exits and a clumsy
single-step felt out of place when the rest was designed nicely ...
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] KVM: nSVM: do not forward NMI window singlestep VM exits to L1
2017-06-19 16:17 ` Radim Krčmář
2017-06-19 17:17 ` Paolo Bonzini
@ 2017-06-20 7:41 ` Ladi Prosek
2017-06-20 13:01 ` Radim Krčmář
1 sibling, 1 reply; 23+ messages in thread
From: Ladi Prosek @ 2017-06-20 7:41 UTC (permalink / raw)
To: Radim Krčmář; +Cc: KVM list
On Mon, Jun 19, 2017 at 6:17 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> 2017-06-19 15:05+0200, Ladi Prosek:
>> On Mon, Jun 19, 2017 at 2:50 PM, Ladi Prosek <lprosek@redhat.com> wrote:
>> > On Fri, Jun 16, 2017 at 3:26 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
>> >> 2017-06-15 13:20+0200, Ladi Prosek:
>> >>> Nested hypervisor should not see singlestep VM exits if singlestepping
>> >>> was enabled internally by KVM. Windows is particularly sensitive to this
>> >>> and known to bluescreen on unexpected VM exits.
>> >>>
>> >>> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
>> >>> ---
>> >>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> >>> @@ -966,9 +967,13 @@ static void svm_disable_lbrv(struct vcpu_svm *svm)
>> >>> static void disable_nmi_singlestep(struct vcpu_svm *svm)
>> >>> {
>> >>> svm->nmi_singlestep = false;
>> >>> - if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
>> >>> - svm->vmcb->save.rflags &=
>> >>> - ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
>> >>> + if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
>> >>> + /* Clear our flags if they were not set by the guest */
>> >>> + if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF))
>> >>> + svm->vmcb->save.rflags &= ~X86_EFLAGS_TF;
>> >>> + if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_RF))
>> >>> + svm->vmcb->save.rflags &= ~X86_EFLAGS_RF;
>> >>
>> >> IIUC, we intercept/fault on IRET, disable the interception, set TF+RF
>> >> and enter again, the CPU executes IRET and then we get a #DB exit.
>> >>
>> >> IRET pops EFLAGS from before the NMI -- doesn't the CPU properly restore
>> >> EFLAGS, so we do not need this part here?
>> >
>> > My test VM doesn't work without this part, even with the change that
>> > Paolo suggested in 0/4:
>> >
>> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> > index d1efe2c62b3f..15a2f7f8e539 100644
>> > --- a/arch/x86/kvm/svm.c
>> > +++ b/arch/x86/kvm/svm.c
>> > @@ -4622,6 +4622,9 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
>> > if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_IRET_MASK))
>> > == HF_NMI_MASK)
>> > return; /* IRET will cause a vm exit */
>> > + if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_GIF_MASK))
>> > + == HF_NMI_MASK)
>> > + return; /* STGI will cause a vm exit */
>> >
>> >
>> > Let's see what we're singlestepping over (Windows Server 2016 with Hyper-V).
>> >
>> > 1. The very first time we enable NMI singlestep is when running
>> > nested. The nested guest's rip is just after 'pause' in a 'pause; cmp;
>> > jne' loop. svm_nmi_allowed returns false because nested_svm_nmi
>> > returns false - we don't want to inject NMI now because
>> > svm->nested.intercept has the INTERCEPT_NMI bit set.
>> >
>> > 2. Then we find ourselves in L1 with the rip just after 'vmrun'.
>> > svm_nmi_allowed returns false because gif_set returns false (hflags ==
>> > HF_HIF_MASK | HF_VINTR_MASK). So we, again, enable NMI singlestep
>> > (without having disabled it yet).
>> >
>> > 3. We singlestep over the instruction immediately following 'vmrun'
>> > (it's a 'mov rax, [rsp+0x20]' in this trace) and finally disable NMI
>> > singlestep on this vcpu. We inject the NMI when the guest executes
>> > stgi.
>> >
>> >
>> > I'll see if I can short this out. Setting the trap flag to step over
>> > an instruction which has no other significance than following a
>> > 'vmrun' is indeed unnecessary.
>>
>> Ok, I think I understand it.
>>
>> First, Paolo's GIF change should be checking only HF_GIF_MASK. Whether
>> we're currently in an NMI or not does not make a difference, NMI is
>> not allowed and we have interception in place for when GIF is set so
>> we don't need to singlestep.
>>
>> Second, enable_nmi_window can also do nothing if
>> svm->nested.exit_required. We're not really going to run the guest in
>> this case so no need to singlestep.
>
> Makes sense.
>
>> With these two tweaks my test VM does not generate any DB exits. We
>> still occasionally set TF if we have an NMI and interrupt pending at
>> the same time (see inject_pending_event where we check
>> vcpu->arch.interrupt.pending, then later vcpu->arch.nmi_pending) but
>> we clear it immediately in the new code added in patch 4.
>
> Right, we only need the single step over IRET and interrupt shadow.
>
> Btw. instead of single-stepping over IRET/interrupt shadow, could we set
> INTERRUPT_SHADOW in VMCB, inject the NMI, and let it execute?
> This mechanism would explain why AMD didn't provide a trap for IRET ...
>
> APM 15.20 says "Injected events are treated in every way as though they
> had occurred normally in the guest", which makes me think that
> INTERRUPT_SHADOW blocks them, if it blocks NMIs at all on AMD.
Unfortunately this does not seem to work, eventinj.flat fails with
your change. There may be more vmexits before rip finally moves
forward so I have added a bool flag "nmi_int_shadow" and keep setting
SVM_INTERRUPT_SHADOW_MASK on entry as long as the flag is set. The
flag is reset in svm_complete_interrupts which conveniently has the
logic for this already. As far as I can tell, NMI is still injected
before the IRET.
eventinj.flat output:
Before NMI IRET test
Sending NMI to self
NMI isr running stack 0x461000
Sending nested NMI to self
After nested NMI to self
Nested NMI isr running rip=40038e (expecting 400390)
...
and my primitive debug output ("Running guest" is printed before
vmrun, "Guest exited" after):
[436685.009323] Running guest at rip 402146 shadow 0
[436685.009325] Guest exited with 7b
[436685.009438] Running guest at rip 40038e shadow 0
[436685.009440] Guest exited with 400
[436685.009452] Running guest at rip 40038e shadow 0
[436685.009454] Guest exited with 400
[436685.009467] Running guest at rip 40038e shadow 0
[436685.009468] Guest exited with 400
[436685.009470] Running guest at rip 40038e shadow 0
[436685.009472] Guest exited with 400
[436685.009474] Running guest at rip 40038e shadow 0
[436685.009476] Guest exited with 74
[436685.009478] Running guest at rip 40038e shadow 1
[436685.009481] Guest exited with 400
[436685.009484] Running guest at rip 40038e shadow 1
[436685.009486] Guest exited with 400
[436685.009494] Running guest at rip 40038e shadow 1
[436685.009496] Guest exited with 400
[436685.009499] Running guest at rip 402e20 shadow 0
[436685.009501] Guest exited with 400
> e.g.
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 6e3095d1bad4..b564613b4e65 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -4624,14 +4624,16 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
>
> if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_IRET_MASK))
> == HF_NMI_MASK)
> - return; /* IRET will cause a vm exit */
> + return 0; /* IRET will cause a vm exit */
>
> - /*
> - * Something prevents NMI from been injected. Single step over possible
> - * problem (IRET or exception injection or interrupt shadow)
> - */
> - svm->nmi_singlestep = true;
> - svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> + if (svm->vmcb->control.event_inj)
> + return 1; /* will request VM exit immediately after injection */
> +
> + --vcpu->arch.nmi_pending;
> + svm_inject_nmi(vcpu);
> + svm->vmcb->control.int_state |= SVM_INTERRUPT_SHADOW_MASK;
> +
> + return 0;
> }
>
> static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr)
>
> (Of course we'd want to refactor it for the final patch. :])
>
>> What a complex state machine! I'll prepare v2.
>
> Yes. :/ Looking forward to v2,
>
> thanks.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] KVM: nSVM: do not forward NMI window singlestep VM exits to L1
2017-06-20 7:41 ` Ladi Prosek
@ 2017-06-20 13:01 ` Radim Krčmář
0 siblings, 0 replies; 23+ messages in thread
From: Radim Krčmář @ 2017-06-20 13:01 UTC (permalink / raw)
To: Ladi Prosek; +Cc: KVM list
2017-06-20 09:41+0200, Ladi Prosek:
> On Mon, Jun 19, 2017 at 6:17 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> > 2017-06-19 15:05+0200, Ladi Prosek:
> >> On Mon, Jun 19, 2017 at 2:50 PM, Ladi Prosek <lprosek@redhat.com> wrote:
> >> > On Fri, Jun 16, 2017 at 3:26 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> >> >> 2017-06-15 13:20+0200, Ladi Prosek:
> >> >>> Nested hypervisor should not see singlestep VM exits if singlestepping
> >> >>> was enabled internally by KVM. Windows is particularly sensitive to this
> >> >>> and known to bluescreen on unexpected VM exits.
> >> >>>
> >> >>> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
> >> >>> ---
> >> >>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >> >>> @@ -966,9 +967,13 @@ static void svm_disable_lbrv(struct vcpu_svm *svm)
> >> >>> static void disable_nmi_singlestep(struct vcpu_svm *svm)
> >> >>> {
> >> >>> svm->nmi_singlestep = false;
> >> >>> - if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
> >> >>> - svm->vmcb->save.rflags &=
> >> >>> - ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
> >> >>> + if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
> >> >>> + /* Clear our flags if they were not set by the guest */
> >> >>> + if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF))
> >> >>> + svm->vmcb->save.rflags &= ~X86_EFLAGS_TF;
> >> >>> + if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_RF))
> >> >>> + svm->vmcb->save.rflags &= ~X86_EFLAGS_RF;
> >> >>
> >> >> IIUC, we intercept/fault on IRET, disable the interception, set TF+RF
> >> >> and enter again, the CPU executes IRET and then we get a #DB exit.
> >> >>
> >> >> IRET pops EFLAGS from before the NMI -- doesn't the CPU properly restore
> >> >> EFLAGS, so we do not need this part here?
> >> >
> >> > My test VM doesn't work without this part, even with the change that
> >> > Paolo suggested in 0/4:
> >> >
> >> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >> > index d1efe2c62b3f..15a2f7f8e539 100644
> >> > --- a/arch/x86/kvm/svm.c
> >> > +++ b/arch/x86/kvm/svm.c
> >> > @@ -4622,6 +4622,9 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
> >> > if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_IRET_MASK))
> >> > == HF_NMI_MASK)
> >> > return; /* IRET will cause a vm exit */
> >> > + if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_GIF_MASK))
> >> > + == HF_NMI_MASK)
> >> > + return; /* STGI will cause a vm exit */
> >> >
> >> >
> >> > Let's see what we're singlestepping over (Windows Server 2016 with Hyper-V).
> >> >
> >> > 1. The very first time we enable NMI singlestep is when running
> >> > nested. The nested guest's rip is just after 'pause' in a 'pause; cmp;
> >> > jne' loop. svm_nmi_allowed returns false because nested_svm_nmi
> >> > returns false - we don't want to inject NMI now because
> >> > svm->nested.intercept has the INTERCEPT_NMI bit set.
> >> >
> >> > 2. Then we find ourselves in L1 with the rip just after 'vmrun'.
> >> > svm_nmi_allowed returns false because gif_set returns false (hflags ==
> >> > HF_HIF_MASK | HF_VINTR_MASK). So we, again, enable NMI singlestep
> >> > (without having disabled it yet).
> >> >
> >> > 3. We singlestep over the instruction immediately following 'vmrun'
> >> > (it's a 'mov rax, [rsp+0x20]' in this trace) and finally disable NMI
> >> > singlestep on this vcpu. We inject the NMI when the guest executes
> >> > stgi.
> >> >
> >> >
> >> > I'll see if I can short this out. Setting the trap flag to step over
> >> > an instruction which has no other significance than following a
> >> > 'vmrun' is indeed unnecessary.
> >>
> >> Ok, I think I understand it.
> >>
> >> First, Paolo's GIF change should be checking only HF_GIF_MASK. Whether
> >> we're currently in an NMI or not does not make a difference, NMI is
> >> not allowed and we have interception in place for when GIF is set so
> >> we don't need to singlestep.
> >>
> >> Second, enable_nmi_window can also do nothing if
> >> svm->nested.exit_required. We're not really going to run the guest in
> >> this case so no need to singlestep.
> >
> > Makes sense.
> >
> >> With these two tweaks my test VM does not generate any DB exits. We
> >> still occasionally set TF if we have an NMI and interrupt pending at
> >> the same time (see inject_pending_event where we check
> >> vcpu->arch.interrupt.pending, then later vcpu->arch.nmi_pending) but
> >> we clear it immediately in the new code added in patch 4.
> >
> > Right, we only need the single step over IRET and interrupt shadow.
> >
> > Btw. instead of single-stepping over IRET/interrupt shadow, could we set
> > INTERRUPT_SHADOW in VMCB, inject the NMI, and let it execute?
> > This mechanism would explain why AMD didn't provide a trap for IRET ...
> >
> > APM 15.20 says "Injected events are treated in every way as though they
> > had occurred normally in the guest", which makes me think that
> > INTERRUPT_SHADOW blocks them, if it blocks NMIs at all on AMD.
>
> Unfortunately this does not seem to work, eventinj.flat fails with
> your change. There may be more vmexits before rip finally moves
> forward so I have added a bool flag "nmi_int_shadow" and keep setting
> SVM_INTERRUPT_SHADOW_MASK on entry as long as the flag is set. The
> flag is reset in svm_complete_interrupts which conveniently has the
> logic for this already. As far as I can tell, NMI is still injected
> before the IRET.
>
> eventinj.flat output:
>
> Before NMI IRET test
> Sending NMI to self
> NMI isr running stack 0x461000
> Sending nested NMI to self
> After nested NMI to self
> Nested NMI isr running rip=40038e (expecting 400390)
> ...
At least we know the current code isn't bad for no reason. :)
Thanks for trying it out!
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2017-06-20 13:01 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-06-15 11:20 [PATCH 0/4] KVM: SVM: improve NMI window singlestep Ladi Prosek
2017-06-15 11:20 ` [PATCH 1/4] KVM: SVM: introduce disable_nmi_singlestep helper Ladi Prosek
2017-06-15 11:20 ` [PATCH 2/4] KVM: nSVM: do not forward NMI window singlestep VM exits to L1 Ladi Prosek
2017-06-15 12:08 ` Paolo Bonzini
2017-06-16 13:26 ` Radim Krčmář
2017-06-19 12:50 ` Ladi Prosek
2017-06-19 13:05 ` Ladi Prosek
2017-06-19 13:52 ` Paolo Bonzini
2017-06-19 16:17 ` Radim Krčmář
2017-06-19 17:17 ` Paolo Bonzini
2017-06-19 17:46 ` Radim Krčmář
2017-06-20 7:41 ` Ladi Prosek
2017-06-20 13:01 ` Radim Krčmář
2017-06-15 11:20 ` [PATCH 3/4] KVM: SVM: hide TF/RF flags used by NMI singlestep Ladi Prosek
2017-06-15 12:09 ` Paolo Bonzini
2017-06-15 13:02 ` Ladi Prosek
2017-06-15 13:32 ` Paolo Bonzini
2017-06-15 11:20 ` [PATCH 4/4] KVM: SVM: don't NMI singlestep over event injection Ladi Prosek
2017-06-15 12:05 ` Paolo Bonzini
2017-06-15 12:38 ` Ladi Prosek
2017-06-15 13:21 ` Paolo Bonzini
2017-06-15 12:03 ` [PATCH 0/4] KVM: SVM: improve NMI window singlestep Paolo Bonzini
2017-06-15 12:10 ` Ladi Prosek
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).