[PATCH 0/6] KVM: Enhancements and fixes around guest debugging

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/6] KVM: Enhancements and fixes around guest debugging
@ 2010-02-22 17:51 Jan Kiszka
  2010-02-22 17:51 ` [PATCH 1/6] KVM: VMX: Update instruction length on intercepted BP Jan Kiszka
                   ` (5 more replies)
  0 siblings, 6 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-22 17:51 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Gleb Natapov

Patch 1&2 are well-known, but I think no longer disputed. Patch 3 was
requested by Avi while merging some qemu-kvm bits of mine, it should be
pushed into 2.6.33 as well. And the other patches came out of my attempt
to step over some APIC MMIO access of a guest.

So far my "stable" bits, the rest is currently waiting for feedback by
CPU vendors (NMI stepping on AMD, NMI injection vs. STI shadow on
Intel).

You may pull these patches from

	git://git.kiszka.org/linux-kvm gdb

Jan Kiszka (6):
  KVM: VMX: Update instruction length on intercepted BP
  KVM: SVM: Emulate nRIP feature when reinjecting INT3
  KVM: x86: Add KVM_CAP_X86_ROBUST_SINGLESTEP
  KVM: x86: Drop RF manipulation for guest single-stepping
  KVM: x86: Preserve injected TF across emulation
  KVM: x86: Emulator support for TF

 arch/x86/include/asm/kvm_host.h |    5 +++
 arch/x86/kvm/svm.c              |   72 +++++++++++++++++++++++++++----------
 arch/x86/kvm/vmx.c              |   19 ++++++++++
 arch/x86/kvm/x86.c              |   74 ++++++++++++++++++++++++++++++++------
 include/linux/kvm.h             |    1 +
 5 files changed, 139 insertions(+), 32 deletions(-)


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/6] KVM: VMX: Update instruction length on intercepted BP
  2010-02-22 17:51 [PATCH 0/6] KVM: Enhancements and fixes around guest debugging Jan Kiszka
@ 2010-02-22 17:51 ` Jan Kiszka
  2010-02-22 17:51 ` [PATCH 2/6] KVM: SVM: Emulate nRIP feature when reinjecting INT3 Jan Kiszka
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-22 17:51 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Gleb Natapov

We intercept #BP while in guest debugging mode. As VM exits due to
intercepted exceptions do not necessarily come with valid
idt_vectoring, we have to update event_exit_inst_len explicitly in such
cases. At least in the absence of migration, this ensures that
re-injections of #BP will find and use the correct instruction length.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/kvm/vmx.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ce5ec41..d772476 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2775,6 +2775,12 @@ static int handle_rmode_exception(struct kvm_vcpu *vcpu,
 		kvm_queue_exception(vcpu, vec);
 		return 1;
 	case BP_VECTOR:
+		/*
+		 * Update instruction length as we may reinject the exception
+		 * from user space while in guest debugging mode.
+		 */
+		to_vmx(vcpu)->vcpu.arch.event_exit_inst_len =
+			vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
 		if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
 			return 0;
 		/* fall through */
@@ -2897,6 +2903,13 @@ static int handle_exception(struct kvm_vcpu *vcpu)
 		kvm_run->debug.arch.dr7 = vmcs_readl(GUEST_DR7);
 		/* fall through */
 	case BP_VECTOR:
+		/*
+		 * Update instruction length as we may reinject #BP from
+		 * user space while in guest debugging mode. Reading it for
+		 * #DB as well causes no harm, it is not used in that case.
+		 */
+		vmx->vcpu.arch.event_exit_inst_len =
+			vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
 		kvm_run->exit_reason = KVM_EXIT_DEBUG;
 		kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip;
 		kvm_run->debug.arch.exception = ex_no;
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/6] KVM: SVM: Emulate nRIP feature when reinjecting INT3
  2010-02-22 17:51 [PATCH 0/6] KVM: Enhancements and fixes around guest debugging Jan Kiszka
  2010-02-22 17:51 ` [PATCH 1/6] KVM: VMX: Update instruction length on intercepted BP Jan Kiszka
@ 2010-02-22 17:51 ` Jan Kiszka
  2010-02-23 10:13   ` Gleb Natapov
  2010-02-22 17:51 ` [PATCH 3/6] KVM: x86: Add KVM_CAP_X86_ROBUST_SINGLESTEP Jan Kiszka
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2010-02-22 17:51 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Gleb Natapov

When in guest debugging mode, we have to reinject those #BP software
exceptions that are caused by guest-injected INT3. As older AMD
processors to not support the required nRIP VMCB field, try to emulate
it by moving RIP by one on injection. Fix it up again in case the
injection failed and we were able to catch this. This does not work for
unintercepted faults, but it is better than doing nothing.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/kvm/svm.c |   72 +++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 52 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1d76899..754e2f7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -46,6 +46,7 @@ MODULE_LICENSE("GPL");
 #define SVM_FEATURE_NPT  (1 << 0)
 #define SVM_FEATURE_LBRV (1 << 1)
 #define SVM_FEATURE_SVML (1 << 2)
+#define SVM_FEATURE_NRIP (1 << 3)
 #define SVM_FEATURE_PAUSE_FILTER (1 << 10)
 
 #define NESTED_EXIT_HOST	0	/* Exit handled on host level */
@@ -109,6 +110,10 @@ struct vcpu_svm {
 	struct nested_state nested;
 
 	bool nmi_singlestep;
+
+	unsigned int3_injected;
+	u16 int3_cs;
+	u64 int3_rip;
 };
 
 /* enable NPT for AMD64 and X86 with PAE */
@@ -235,23 +240,6 @@ static void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 	vcpu->arch.efer = efer;
 }
 
-static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
-				bool has_error_code, u32 error_code)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-
-	/* If we are within a nested VM we'd better #VMEXIT and let the
-	   guest handle the exception */
-	if (nested_svm_check_exception(svm, nr, has_error_code, error_code))
-		return;
-
-	svm->vmcb->control.event_inj = nr
-		| SVM_EVTINJ_VALID
-		| (has_error_code ? SVM_EVTINJ_VALID_ERR : 0)
-		| SVM_EVTINJ_TYPE_EXEPT;
-	svm->vmcb->control.event_inj_err = error_code;
-}
-
 static int is_external_interrupt(u32 info)
 {
 	info &= SVM_EVTINJ_TYPE_MASK | SVM_EVTINJ_VALID;
@@ -297,6 +285,39 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	svm_set_interrupt_shadow(vcpu, 0);
 }
 
+static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
+				bool has_error_code, u32 error_code)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	/* If we are within a nested VM we'd better #VMEXIT and let the
+	   guest handle the exception */
+	if (nested_svm_check_exception(svm, nr, has_error_code, error_code))
+		return;
+
+	if (nr == BP_VECTOR && !svm_has(SVM_FEATURE_NRIP)) {
+		u64 old_rip = kvm_rip_read(&svm->vcpu);
+
+		/*
+		 * For guest debugging where we have to reinject #BP if some
+		 * INT3 is guest-owned:
+		 * Emulate nRIP by moving RIP forward. Will fail if injection
+		 * raises a fault that is not intercepted. Still better than
+		 * failing in all cases.
+		 */
+		skip_emulated_instruction(&svm->vcpu);
+		svm->int3_cs = svm->vmcb->save.cs.selector;
+		svm->int3_rip = kvm_rip_read(&svm->vcpu);
+		svm->int3_injected = svm->int3_rip - old_rip;
+	}
+
+	svm->vmcb->control.event_inj = nr
+		| SVM_EVTINJ_VALID
+		| (has_error_code ? SVM_EVTINJ_VALID_ERR : 0)
+		| SVM_EVTINJ_TYPE_EXEPT;
+	svm->vmcb->control.event_inj_err = error_code;
+}
+
 static int has_svm(void)
 {
 	const char *msg;
@@ -2703,8 +2724,10 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
 	kvm_clear_exception_queue(&svm->vcpu);
 	kvm_clear_interrupt_queue(&svm->vcpu);
 
-	if (!(exitintinfo & SVM_EXITINTINFO_VALID))
+	if (!(exitintinfo & SVM_EXITINTINFO_VALID)) {
+		svm->int3_injected = 0;
 		return;
+	}
 
 	vector = exitintinfo & SVM_EXITINTINFO_VEC_MASK;
 	type = exitintinfo & SVM_EXITINTINFO_TYPE_MASK;
@@ -2714,12 +2737,20 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
 		svm->vcpu.arch.nmi_injected = true;
 		break;
 	case SVM_EXITINTINFO_TYPE_EXEPT:
-		/* In case of software exception do not reinject an exception
-		   vector, but re-execute and instruction instead */
 		if (is_nested(svm))
 			break;
 		if (kvm_exception_is_soft(vector))
+			if (vector == BP_VECTOR && svm->int3_injected &&
+			    svm->vmcb->save.cs.selector == svm->int3_cs &&
+			    kvm_rip_read(&svm->vcpu) == svm->int3_rip)
+				kvm_rip_write(&svm->vcpu,
+					      kvm_rip_read(&svm->vcpu) -
+					      svm->int3_injected);
 			break;
+		/*
+		 * In case of other software exceptions, do not reinject the
+		 * vector, but re-execute the instruction instead.
+		 */
 		if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) {
 			u32 err = svm->vmcb->control.exit_int_info_err;
 			kvm_queue_exception_e(&svm->vcpu, vector, err);
@@ -2733,6 +2764,7 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
 	default:
 		break;
 	}
+	svm->int3_injected = 0;
 }
 
 #ifdef CONFIG_X86_64
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/6] KVM: SVM: Emulate nRIP feature when reinjecting INT3
  2010-02-22 17:51 ` [PATCH 2/6] KVM: SVM: Emulate nRIP feature when reinjecting INT3 Jan Kiszka
@ 2010-02-23 10:13   ` Gleb Natapov
  2010-02-23 10:17     ` Jan Kiszka
  0 siblings, 1 reply; 29+ messages in thread
From: Gleb Natapov @ 2010-02-23 10:13 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Mon, Feb 22, 2010 at 06:51:19PM +0100, Jan Kiszka wrote:
> When in guest debugging mode, we have to reinject those #BP software
> exceptions that are caused by guest-injected INT3. As older AMD
> processors to not support the required nRIP VMCB field, try to emulate
> it by moving RIP by one on injection. Fix it up again in case the
> injection failed and we were able to catch this. This does not work for
> unintercepted faults, but it is better than doing nothing.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  arch/x86/kvm/svm.c |   72 +++++++++++++++++++++++++++++++++++++--------------
>  1 files changed, 52 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 1d76899..754e2f7 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -46,6 +46,7 @@ MODULE_LICENSE("GPL");
>  #define SVM_FEATURE_NPT  (1 << 0)
>  #define SVM_FEATURE_LBRV (1 << 1)
>  #define SVM_FEATURE_SVML (1 << 2)
> +#define SVM_FEATURE_NRIP (1 << 3)
>  #define SVM_FEATURE_PAUSE_FILTER (1 << 10)
>  
>  #define NESTED_EXIT_HOST	0	/* Exit handled on host level */
> @@ -109,6 +110,10 @@ struct vcpu_svm {
>  	struct nested_state nested;
>  
>  	bool nmi_singlestep;
> +
> +	unsigned int3_injected;
> +	u16 int3_cs;
> +	u64 int3_rip;
>  };
>  
>  /* enable NPT for AMD64 and X86 with PAE */
> @@ -235,23 +240,6 @@ static void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
>  	vcpu->arch.efer = efer;
>  }
>  
> -static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
> -				bool has_error_code, u32 error_code)
> -{
> -	struct vcpu_svm *svm = to_svm(vcpu);
> -
> -	/* If we are within a nested VM we'd better #VMEXIT and let the
> -	   guest handle the exception */
> -	if (nested_svm_check_exception(svm, nr, has_error_code, error_code))
> -		return;
> -
> -	svm->vmcb->control.event_inj = nr
> -		| SVM_EVTINJ_VALID
> -		| (has_error_code ? SVM_EVTINJ_VALID_ERR : 0)
> -		| SVM_EVTINJ_TYPE_EXEPT;
> -	svm->vmcb->control.event_inj_err = error_code;
> -}
> -
Why have you moved svm_queue_exception() function?

>  static int is_external_interrupt(u32 info)
>  {
>  	info &= SVM_EVTINJ_TYPE_MASK | SVM_EVTINJ_VALID;
> @@ -297,6 +285,39 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
>  	svm_set_interrupt_shadow(vcpu, 0);
>  }
>  
> +static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
> +				bool has_error_code, u32 error_code)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	/* If we are within a nested VM we'd better #VMEXIT and let the
> +	   guest handle the exception */
> +	if (nested_svm_check_exception(svm, nr, has_error_code, error_code))
> +		return;
> +
> +	if (nr == BP_VECTOR && !svm_has(SVM_FEATURE_NRIP)) {
> +		u64 old_rip = kvm_rip_read(&svm->vcpu);
> +
> +		/*
> +		 * For guest debugging where we have to reinject #BP if some
> +		 * INT3 is guest-owned:
> +		 * Emulate nRIP by moving RIP forward. Will fail if injection
> +		 * raises a fault that is not intercepted. Still better than
> +		 * failing in all cases.
> +		 */
> +		skip_emulated_instruction(&svm->vcpu);
> +		svm->int3_cs = svm->vmcb->save.cs.selector;
> +		svm->int3_rip = kvm_rip_read(&svm->vcpu);
> +		svm->int3_injected = svm->int3_rip - old_rip;
> +	}
> +
> +	svm->vmcb->control.event_inj = nr
> +		| SVM_EVTINJ_VALID
> +		| (has_error_code ? SVM_EVTINJ_VALID_ERR : 0)
> +		| SVM_EVTINJ_TYPE_EXEPT;
> +	svm->vmcb->control.event_inj_err = error_code;
> +}
> +
>  static int has_svm(void)
>  {
>  	const char *msg;
> @@ -2703,8 +2724,10 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
>  	kvm_clear_exception_queue(&svm->vcpu);
>  	kvm_clear_interrupt_queue(&svm->vcpu);
> 
int int3_injected = svm->int3_injected;
svm->int3_injected = 0;

Will save us from zeroing svm->int3_injected in two places.

> -	if (!(exitintinfo & SVM_EXITINTINFO_VALID))
> +	if (!(exitintinfo & SVM_EXITINTINFO_VALID)) {
> +		svm->int3_injected = 0;
>  		return;
> +	}
>  
>  	vector = exitintinfo & SVM_EXITINTINFO_VEC_MASK;
>  	type = exitintinfo & SVM_EXITINTINFO_TYPE_MASK;
> @@ -2714,12 +2737,20 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
>  		svm->vcpu.arch.nmi_injected = true;
>  		break;
>  	case SVM_EXITINTINFO_TYPE_EXEPT:
> -		/* In case of software exception do not reinject an exception
> -		   vector, but re-execute and instruction instead */
>  		if (is_nested(svm))
>  			break;
>  		if (kvm_exception_is_soft(vector))
> +			if (vector == BP_VECTOR && svm->int3_injected &&
> +			    svm->vmcb->save.cs.selector == svm->int3_cs &&
> +			    kvm_rip_read(&svm->vcpu) == svm->int3_rip)
> +				kvm_rip_write(&svm->vcpu,
> +					      kvm_rip_read(&svm->vcpu) -
> +					      svm->int3_injected);
>  			break;
Something is very wrong here. break does not belong to any of above ifs,
so code below is unreachable.

> +		/*
> +		 * In case of other software exceptions, do not reinject the
> +		 * vector, but re-execute the instruction instead.
> +		 */
>  		if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) {
>  			u32 err = svm->vmcb->control.exit_int_info_err;
>  			kvm_queue_exception_e(&svm->vcpu, vector, err);
> @@ -2733,6 +2764,7 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
>  	default:
>  		break;
>  	}
> +	svm->int3_injected = 0;
>  }
>  
>  #ifdef CONFIG_X86_64
> -- 
> 1.6.0.2

--
			Gleb.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/6] KVM: SVM: Emulate nRIP feature when reinjecting INT3
  2010-02-23 10:13   ` Gleb Natapov
@ 2010-02-23 10:17     ` Jan Kiszka
  2010-02-23 10:23       ` Avi Kivity
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2010-02-23 10:17 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Avi Kivity, Marcelo Tosatti, kvm

Gleb Natapov wrote:
> On Mon, Feb 22, 2010 at 06:51:19PM +0100, Jan Kiszka wrote:
>> When in guest debugging mode, we have to reinject those #BP software
>> exceptions that are caused by guest-injected INT3. As older AMD
>> processors to not support the required nRIP VMCB field, try to emulate
>> it by moving RIP by one on injection. Fix it up again in case the
>> injection failed and we were able to catch this. This does not work for
>> unintercepted faults, but it is better than doing nothing.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>  arch/x86/kvm/svm.c |   72 +++++++++++++++++++++++++++++++++++++--------------
>>  1 files changed, 52 insertions(+), 20 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 1d76899..754e2f7 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -46,6 +46,7 @@ MODULE_LICENSE("GPL");
>>  #define SVM_FEATURE_NPT  (1 << 0)
>>  #define SVM_FEATURE_LBRV (1 << 1)
>>  #define SVM_FEATURE_SVML (1 << 2)
>> +#define SVM_FEATURE_NRIP (1 << 3)
>>  #define SVM_FEATURE_PAUSE_FILTER (1 << 10)
>>  
>>  #define NESTED_EXIT_HOST	0	/* Exit handled on host level */
>> @@ -109,6 +110,10 @@ struct vcpu_svm {
>>  	struct nested_state nested;
>>  
>>  	bool nmi_singlestep;
>> +
>> +	unsigned int3_injected;
>> +	u16 int3_cs;
>> +	u64 int3_rip;
>>  };
>>  
>>  /* enable NPT for AMD64 and X86 with PAE */
>> @@ -235,23 +240,6 @@ static void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
>>  	vcpu->arch.efer = efer;
>>  }
>>  
>> -static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
>> -				bool has_error_code, u32 error_code)
>> -{
>> -	struct vcpu_svm *svm = to_svm(vcpu);
>> -
>> -	/* If we are within a nested VM we'd better #VMEXIT and let the
>> -	   guest handle the exception */
>> -	if (nested_svm_check_exception(svm, nr, has_error_code, error_code))
>> -		return;
>> -
>> -	svm->vmcb->control.event_inj = nr
>> -		| SVM_EVTINJ_VALID
>> -		| (has_error_code ? SVM_EVTINJ_VALID_ERR : 0)
>> -		| SVM_EVTINJ_TYPE_EXEPT;
>> -	svm->vmcb->control.event_inj_err = error_code;
>> -}
>> -
> Why have you moved svm_queue_exception() function?
> 

To avoid adding a prototype.

>>  static int is_external_interrupt(u32 info)
>>  {
>>  	info &= SVM_EVTINJ_TYPE_MASK | SVM_EVTINJ_VALID;
>> @@ -297,6 +285,39 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
>>  	svm_set_interrupt_shadow(vcpu, 0);
>>  }
>>  
>> +static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
>> +				bool has_error_code, u32 error_code)
>> +{
>> +	struct vcpu_svm *svm = to_svm(vcpu);
>> +
>> +	/* If we are within a nested VM we'd better #VMEXIT and let the
>> +	   guest handle the exception */
>> +	if (nested_svm_check_exception(svm, nr, has_error_code, error_code))
>> +		return;
>> +
>> +	if (nr == BP_VECTOR && !svm_has(SVM_FEATURE_NRIP)) {
>> +		u64 old_rip = kvm_rip_read(&svm->vcpu);
>> +
>> +		/*
>> +		 * For guest debugging where we have to reinject #BP if some
>> +		 * INT3 is guest-owned:
>> +		 * Emulate nRIP by moving RIP forward. Will fail if injection
>> +		 * raises a fault that is not intercepted. Still better than
>> +		 * failing in all cases.
>> +		 */
>> +		skip_emulated_instruction(&svm->vcpu);
>> +		svm->int3_cs = svm->vmcb->save.cs.selector;
>> +		svm->int3_rip = kvm_rip_read(&svm->vcpu);
>> +		svm->int3_injected = svm->int3_rip - old_rip;
>> +	}
>> +
>> +	svm->vmcb->control.event_inj = nr
>> +		| SVM_EVTINJ_VALID
>> +		| (has_error_code ? SVM_EVTINJ_VALID_ERR : 0)
>> +		| SVM_EVTINJ_TYPE_EXEPT;
>> +	svm->vmcb->control.event_inj_err = error_code;
>> +}
>> +
>>  static int has_svm(void)
>>  {
>>  	const char *msg;
>> @@ -2703,8 +2724,10 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
>>  	kvm_clear_exception_queue(&svm->vcpu);
>>  	kvm_clear_interrupt_queue(&svm->vcpu);
>>
> int int3_injected = svm->int3_injected;
> svm->int3_injected = 0;
> 
> Will save us from zeroing svm->int3_injected in two places.

OK.

> 
>> -	if (!(exitintinfo & SVM_EXITINTINFO_VALID))
>> +	if (!(exitintinfo & SVM_EXITINTINFO_VALID)) {
>> +		svm->int3_injected = 0;
>>  		return;
>> +	}
>>  
>>  	vector = exitintinfo & SVM_EXITINTINFO_VEC_MASK;
>>  	type = exitintinfo & SVM_EXITINTINFO_TYPE_MASK;
>> @@ -2714,12 +2737,20 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
>>  		svm->vcpu.arch.nmi_injected = true;
>>  		break;
>>  	case SVM_EXITINTINFO_TYPE_EXEPT:
>> -		/* In case of software exception do not reinject an exception
>> -		   vector, but re-execute and instruction instead */
>>  		if (is_nested(svm))
>>  			break;
>>  		if (kvm_exception_is_soft(vector))
>> +			if (vector == BP_VECTOR && svm->int3_injected &&
>> +			    svm->vmcb->save.cs.selector == svm->int3_cs &&
>> +			    kvm_rip_read(&svm->vcpu) == svm->int3_rip)
>> +				kvm_rip_write(&svm->vcpu,
>> +					      kvm_rip_read(&svm->vcpu) -
>> +					      svm->int3_injected);
>>  			break;
> Something is very wrong here. break does not belong to any of above ifs,
> so code below is unreachable.

Ouch, forgotten braces. Will fix.

> 
>> +		/*
>> +		 * In case of other software exceptions, do not reinject the
>> +		 * vector, but re-execute the instruction instead.
>> +		 */
>>  		if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) {
>>  			u32 err = svm->vmcb->control.exit_int_info_err;
>>  			kvm_queue_exception_e(&svm->vcpu, vector, err);
>> @@ -2733,6 +2764,7 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
>>  	default:
>>  		break;
>>  	}
>> +	svm->int3_injected = 0;
>>  }
>>  
>>  #ifdef CONFIG_X86_64
>> -- 
>> 1.6.0.2
> 
> --
> 			Gleb.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/6] KVM: SVM: Emulate nRIP feature when reinjecting INT3
  2010-02-23 10:17     ` Jan Kiszka
@ 2010-02-23 10:23       ` Avi Kivity
  0 siblings, 0 replies; 29+ messages in thread
From: Avi Kivity @ 2010-02-23 10:23 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Gleb Natapov, Marcelo Tosatti, kvm

On 02/23/2010 12:17 PM, Jan Kiszka wrote:
>>
>> Why have you moved svm_queue_exception() function?
>>
>>      
> To avoid adding a prototype.
>    

It's often better to do this in a separate patch: the changelog answers 
the question above, and the following patch is smaller.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 3/6] KVM: x86: Add KVM_CAP_X86_ROBUST_SINGLESTEP
  2010-02-22 17:51 [PATCH 0/6] KVM: Enhancements and fixes around guest debugging Jan Kiszka
  2010-02-22 17:51 ` [PATCH 1/6] KVM: VMX: Update instruction length on intercepted BP Jan Kiszka
  2010-02-22 17:51 ` [PATCH 2/6] KVM: SVM: Emulate nRIP feature when reinjecting INT3 Jan Kiszka
@ 2010-02-22 17:51 ` Jan Kiszka
  2010-02-22 17:51 ` [PATCH 4/6] KVM: x86: Drop RF manipulation for guest single-stepping Jan Kiszka
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-22 17:51 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Gleb Natapov

This marks the guest single-step API improvement of 94fe45da and
91586a3b with a capability flag to allow reliable detection by user
space.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/kvm/x86.c  |    1 +
 include/linux/kvm.h |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e25a522..ca07f11 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1571,6 +1571,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_HYPERV_SPIN:
 	case KVM_CAP_PCI_SEGMENT:
 	case KVM_CAP_DEBUGREGS:
+	case KVM_CAP_X86_ROBUST_SINGLESTEP:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index d25912e..ce28767 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -506,6 +506,7 @@ struct kvm_ioeventfd {
 #ifdef __KVM_HAVE_DEBUGREGS
 #define KVM_CAP_DEBUGREGS 50
 #endif
+#define KVM_CAP_X86_ROBUST_SINGLESTEP 51
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 4/6] KVM: x86: Drop RF manipulation for guest single-stepping
  2010-02-22 17:51 [PATCH 0/6] KVM: Enhancements and fixes around guest debugging Jan Kiszka
                   ` (2 preceding siblings ...)
  2010-02-22 17:51 ` [PATCH 3/6] KVM: x86: Add KVM_CAP_X86_ROBUST_SINGLESTEP Jan Kiszka
@ 2010-02-22 17:51 ` Jan Kiszka
  2010-02-22 17:51 ` [PATCH 5/6] KVM: x86: Preserve injected TF across emulation Jan Kiszka
  2010-02-22 17:51 ` [PATCH 6/6] KVM: x86: Emulator support for TF Jan Kiszka
  5 siblings, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-22 17:51 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Gleb Natapov

RF is not required for injecting TF as the latter will trigger only
after an instruction execution anyway. So do not touch RF when arming or
disarming guest single-step mode.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/kvm/x86.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ca07f11..e2e03a4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5892,7 +5892,7 @@ unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
 
 	rflags = kvm_x86_ops->get_rflags(vcpu);
 	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
-		rflags &= ~(unsigned long)(X86_EFLAGS_TF | X86_EFLAGS_RF);
+		rflags &= ~X86_EFLAGS_TF;
 	return rflags;
 }
 EXPORT_SYMBOL_GPL(kvm_get_rflags);
@@ -5903,7 +5903,7 @@ void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 	    vcpu->arch.singlestep_cs ==
 			get_segment_selector(vcpu, VCPU_SREG_CS) &&
 	    vcpu->arch.singlestep_rip == kvm_rip_read(vcpu))
-		rflags |= X86_EFLAGS_TF | X86_EFLAGS_RF;
+		rflags |= X86_EFLAGS_TF;
 	kvm_x86_ops->set_rflags(vcpu, rflags);
 }
 EXPORT_SYMBOL_GPL(kvm_set_rflags);
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 5/6] KVM: x86: Preserve injected TF across emulation
  2010-02-22 17:51 [PATCH 0/6] KVM: Enhancements and fixes around guest debugging Jan Kiszka
                   ` (3 preceding siblings ...)
  2010-02-22 17:51 ` [PATCH 4/6] KVM: x86: Drop RF manipulation for guest single-stepping Jan Kiszka
@ 2010-02-22 17:51 ` Jan Kiszka
  2010-02-23 10:00   ` Gleb Natapov
  2010-02-22 17:51 ` [PATCH 6/6] KVM: x86: Emulator support for TF Jan Kiszka
  5 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2010-02-22 17:51 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Gleb Natapov

Call directly into the vendor services for getting/setting rflags in
emulate_instruction to ensure injected TF survives the emulation.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/kvm/x86.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e2e03a4..19e8b28 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3468,7 +3468,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 		kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
 
 		vcpu->arch.emulate_ctxt.vcpu = vcpu;
-		vcpu->arch.emulate_ctxt.eflags = kvm_get_rflags(vcpu);
+		vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
 		vcpu->arch.emulate_ctxt.mode =
 			(!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
 			(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
@@ -3547,7 +3547,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 		return EMULATE_DO_MMIO;
 	}
 
-	kvm_set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
+	kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
 
 	if (vcpu->mmio_is_write) {
 		vcpu->mmio_needed = 0;
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/6] KVM: x86: Preserve injected TF across emulation
  2010-02-22 17:51 ` [PATCH 5/6] KVM: x86: Preserve injected TF across emulation Jan Kiszka
@ 2010-02-23 10:00   ` Gleb Natapov
  2010-02-23 10:13     ` Jan Kiszka
  0 siblings, 1 reply; 29+ messages in thread
From: Gleb Natapov @ 2010-02-23 10:00 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Mon, Feb 22, 2010 at 06:51:22PM +0100, Jan Kiszka wrote:
> Call directly into the vendor services for getting/setting rflags in
> emulate_instruction to ensure injected TF survives the emulation.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  arch/x86/kvm/x86.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e2e03a4..19e8b28 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3468,7 +3468,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>  		kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
>  
>  		vcpu->arch.emulate_ctxt.vcpu = vcpu;
> -		vcpu->arch.emulate_ctxt.eflags = kvm_get_rflags(vcpu);
> +		vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
So now emulator runs with injected TF? Hmm, then may be emulator should
inject DB when appropriate and caller of emulate_instruction() should
emulate DB intercept if external debugging is going on?

>  		vcpu->arch.emulate_ctxt.mode =
>  			(!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
>  			(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
> @@ -3547,7 +3547,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>  		return EMULATE_DO_MMIO;
>  	}
>  
> -	kvm_set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
> +	kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
>  
>  	if (vcpu->mmio_is_write) {
>  		vcpu->mmio_needed = 0;
> -- 
> 1.6.0.2

--
			Gleb.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/6] KVM: x86: Preserve injected TF across emulation
  2010-02-23 10:00   ` Gleb Natapov
@ 2010-02-23 10:13     ` Jan Kiszka
  2010-02-23 10:31       ` Gleb Natapov
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2010-02-23 10:13 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Avi Kivity, Marcelo Tosatti, kvm

Gleb Natapov wrote:
> On Mon, Feb 22, 2010 at 06:51:22PM +0100, Jan Kiszka wrote:
>> Call directly into the vendor services for getting/setting rflags in
>> emulate_instruction to ensure injected TF survives the emulation.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>  arch/x86/kvm/x86.c |    4 ++--
>>  1 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index e2e03a4..19e8b28 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3468,7 +3468,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>>  		kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
>>  
>>  		vcpu->arch.emulate_ctxt.vcpu = vcpu;
>> -		vcpu->arch.emulate_ctxt.eflags = kvm_get_rflags(vcpu);
>> +		vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
> So now emulator runs with injected TF? Hmm, then may be emulator should
> inject DB when appropriate and caller of emulate_instruction() should
> emulate DB intercept if external debugging is going on?

That is what patch 6 aims at, both for external as well as
guest-internal debugging.

> 
>>  		vcpu->arch.emulate_ctxt.mode =
>>  			(!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
>>  			(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
>> @@ -3547,7 +3547,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>>  		return EMULATE_DO_MMIO;
>>  	}
>>  
>> -	kvm_set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
>> +	kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
>>  
>>  	if (vcpu->mmio_is_write) {
>>  		vcpu->mmio_needed = 0;
>> -- 
>> 1.6.0.2

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/6] KVM: x86: Preserve injected TF across emulation
  2010-02-23 10:13     ` Jan Kiszka
@ 2010-02-23 10:31       ` Gleb Natapov
  2010-02-23 10:40         ` Jan Kiszka
  0 siblings, 1 reply; 29+ messages in thread
From: Gleb Natapov @ 2010-02-23 10:31 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Tue, Feb 23, 2010 at 11:13:13AM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Mon, Feb 22, 2010 at 06:51:22PM +0100, Jan Kiszka wrote:
> >> Call directly into the vendor services for getting/setting rflags in
> >> emulate_instruction to ensure injected TF survives the emulation.
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >> ---
> >>  arch/x86/kvm/x86.c |    4 ++--
> >>  1 files changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index e2e03a4..19e8b28 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -3468,7 +3468,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
> >>  		kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
> >>  
> >>  		vcpu->arch.emulate_ctxt.vcpu = vcpu;
> >> -		vcpu->arch.emulate_ctxt.eflags = kvm_get_rflags(vcpu);
> >> +		vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
> > So now emulator runs with injected TF? Hmm, then may be emulator should
> > inject DB when appropriate and caller of emulate_instruction() should
> > emulate DB intercept if external debugging is going on?
> 
> That is what patch 6 aims at, both for external as well as
> guest-internal debugging.
> 
It does at the wrong level. It tries to do it above emulator, but only emulator
knows what is the sate of instruction emulation and when emulation is completed.
If at this point TF is set it inject DB trap. The code above checks if
DB intercept is enabled when DB is injected and calls usual DB intercept
path.

> > 
> >>  		vcpu->arch.emulate_ctxt.mode =
> >>  			(!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
> >>  			(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
> >> @@ -3547,7 +3547,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
> >>  		return EMULATE_DO_MMIO;
> >>  	}
> >>  
> >> -	kvm_set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
> >> +	kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
> >>  
> >>  	if (vcpu->mmio_is_write) {
> >>  		vcpu->mmio_needed = 0;
> >> -- 
> >> 1.6.0.2
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

--
			Gleb.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/6] KVM: x86: Preserve injected TF across emulation
  2010-02-23 10:31       ` Gleb Natapov
@ 2010-02-23 10:40         ` Jan Kiszka
  2010-02-23 11:03           ` Gleb Natapov
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2010-02-23 10:40 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Avi Kivity, Marcelo Tosatti, kvm

Gleb Natapov wrote:
> On Tue, Feb 23, 2010 at 11:13:13AM +0100, Jan Kiszka wrote:
>> Gleb Natapov wrote:
>>> On Mon, Feb 22, 2010 at 06:51:22PM +0100, Jan Kiszka wrote:
>>>> Call directly into the vendor services for getting/setting rflags in
>>>> emulate_instruction to ensure injected TF survives the emulation.
>>>>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>> ---
>>>>  arch/x86/kvm/x86.c |    4 ++--
>>>>  1 files changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index e2e03a4..19e8b28 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -3468,7 +3468,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>>>>  		kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
>>>>  
>>>>  		vcpu->arch.emulate_ctxt.vcpu = vcpu;
>>>> -		vcpu->arch.emulate_ctxt.eflags = kvm_get_rflags(vcpu);
>>>> +		vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
>>> So now emulator runs with injected TF? Hmm, then may be emulator should
>>> inject DB when appropriate and caller of emulate_instruction() should
>>> emulate DB intercept if external debugging is going on?
>> That is what patch 6 aims at, both for external as well as
>> guest-internal debugging.
>>
> It does at the wrong level. It tries to do it above emulator, but only emulator
> knows what is the sate of instruction emulation and when emulation is completed.
> If at this point TF is set it inject DB trap. The code above checks if
> DB intercept is enabled when DB is injected and calls usual DB intercept
> path.

Sorry, can't follow, your description does not match the above code for me.

This patch just ensures that we do not loose TF across emulation and, by
itself, already improves guest debugging: it allows to step over
emulated blocks with one instruction offset. And it has no side effects
as the emulator does not evaluate TF yet (before patch 6).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/6] KVM: x86: Preserve injected TF across emulation
  2010-02-23 10:40         ` Jan Kiszka
@ 2010-02-23 11:03           ` Gleb Natapov
  0 siblings, 0 replies; 29+ messages in thread
From: Gleb Natapov @ 2010-02-23 11:03 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Tue, Feb 23, 2010 at 11:40:56AM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Tue, Feb 23, 2010 at 11:13:13AM +0100, Jan Kiszka wrote:
> >> Gleb Natapov wrote:
> >>> On Mon, Feb 22, 2010 at 06:51:22PM +0100, Jan Kiszka wrote:
> >>>> Call directly into the vendor services for getting/setting rflags in
> >>>> emulate_instruction to ensure injected TF survives the emulation.
> >>>>
> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>> ---
> >>>>  arch/x86/kvm/x86.c |    4 ++--
> >>>>  1 files changed, 2 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>>> index e2e03a4..19e8b28 100644
> >>>> --- a/arch/x86/kvm/x86.c
> >>>> +++ b/arch/x86/kvm/x86.c
> >>>> @@ -3468,7 +3468,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
> >>>>  		kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
> >>>>  
> >>>>  		vcpu->arch.emulate_ctxt.vcpu = vcpu;
> >>>> -		vcpu->arch.emulate_ctxt.eflags = kvm_get_rflags(vcpu);
> >>>> +		vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
> >>> So now emulator runs with injected TF? Hmm, then may be emulator should
> >>> inject DB when appropriate and caller of emulate_instruction() should
> >>> emulate DB intercept if external debugging is going on?
> >> That is what patch 6 aims at, both for external as well as
> >> guest-internal debugging.
> >>
> > It does at the wrong level. It tries to do it above emulator, but only emulator
> > knows what is the sate of instruction emulation and when emulation is completed.
> > If at this point TF is set it inject DB trap. The code above checks if
> > DB intercept is enabled when DB is injected and calls usual DB intercept
> > path.
> 
> Sorry, can't follow, your description does not match the above code for me.
> 
> This patch just ensures that we do not loose TF across emulation and, by
> itself, already improves guest debugging: it allows to step over
> emulated blocks with one instruction offset. And it has no side effects
> as the emulator does not evaluate TF yet (before patch 6).
> 
My comment was not about this patch, but about patch 6. This is the job
of emulator to emulate CPU and inject exceptions/traps, not the code above
it.

--
			Gleb.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 6/6] KVM: x86: Emulator support for TF
  2010-02-22 17:51 [PATCH 0/6] KVM: Enhancements and fixes around guest debugging Jan Kiszka
                   ` (4 preceding siblings ...)
  2010-02-22 17:51 ` [PATCH 5/6] KVM: x86: Preserve injected TF across emulation Jan Kiszka
@ 2010-02-22 17:51 ` Jan Kiszka
  2010-02-23  9:55   ` Gleb Natapov
  5 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2010-02-22 17:51 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Gleb Natapov

Support both guest- as well as host-owned EFLAGS.TF while emulating
instructions. For guest-owned TF, we simply inject DB and update DR6.BS
after completing an instruction that has TF set on entry. To support
guest single-stepping under host control, we store the pending step
along with its CS and RIP and trigger a corresponding user space exit
once guest execution is about to resume. This check is is also required
in the VMX emulation loop during invalid guest states.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/include/asm/kvm_host.h |    5 +++
 arch/x86/kvm/vmx.c              |    6 +++
 arch/x86/kvm/x86.c              |   65 ++++++++++++++++++++++++++++++++++-----
 3 files changed, 68 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d46e791..d69d8aa 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -362,8 +362,11 @@ struct kvm_vcpu_arch {
 	u64 *mce_banks;
 
 	/* used for guest single stepping over the given code position */
+	bool singlestep_pending;
 	u16 singlestep_cs;
+	u16 singlestep_pending_cs;
 	unsigned long singlestep_rip;
+	unsigned long singlestep_pending_rip;
 	/* fields used by HYPER-V emulation */
 	u64 hv_vapic;
 };
@@ -820,4 +823,6 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
 void kvm_define_shared_msr(unsigned index, u32 msr);
 void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
 
+int kvm_check_guest_singlestep(struct kvm_vcpu *vcpu);
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d772476..317828f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3489,6 +3489,12 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
 			goto out;
 		if (need_resched())
 			schedule();
+
+		if (unlikely(vcpu->arch.singlestep_pending)) {
+			ret = kvm_check_guest_singlestep(vcpu);
+			if (ret == 0)
+				goto out;
+		}
 	}
 
 	vmx->emulation_required = 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 19e8b28..6ebebb9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3441,6 +3441,27 @@ static void cache_all_regs(struct kvm_vcpu *vcpu)
 	vcpu->arch.regs_dirty = ~0;
 }
 
+static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg)
+{
+	struct kvm_segment kvm_seg;
+
+	kvm_get_segment(vcpu, &kvm_seg, seg);
+	return kvm_seg.selector;
+}
+
+static void queue_singlestep(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
+		vcpu->arch.singlestep_pending = true;
+		vcpu->arch.singlestep_pending_cs =
+			get_segment_selector(vcpu, VCPU_SREG_CS);
+		vcpu->arch.singlestep_pending_rip = kvm_rip_read(vcpu);
+	} else {
+		vcpu->arch.dr6 |= DR6_BS;
+		kvm_queue_exception(vcpu, DB_VECTOR);
+	}
+}
+
 int emulate_instruction(struct kvm_vcpu *vcpu,
 			unsigned long cr2,
 			u16 error_code,
@@ -3449,6 +3470,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 	int r, shadow_mask;
 	struct decode_cache *c;
 	struct kvm_run *run = vcpu->run;
+	bool singlestep;
 
 	kvm_clear_exception_queue(vcpu);
 	vcpu->arch.mmio_fault_cr2 = cr2;
@@ -3515,8 +3537,12 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 		}
 	}
 
+	singlestep = vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_TF;
+
 	if (emulation_type & EMULTYPE_SKIP) {
 		kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.decode.eip);
+		if (singlestep)
+			queue_singlestep(vcpu);
 		return EMULATE_DONE;
 	}
 
@@ -3549,6 +3575,9 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 
 	kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
 
+	if (singlestep)
+		queue_singlestep(vcpu);
+
 	if (vcpu->mmio_is_write) {
 		vcpu->mmio_needed = 0;
 		return EMULATE_DO_MMIO;
@@ -4450,6 +4479,26 @@ out:
 	return r;
 }
 
+int kvm_check_guest_singlestep(struct kvm_vcpu *vcpu)
+{
+	unsigned long rip = kvm_rip_read(vcpu);
+
+	vcpu->arch.singlestep_pending = false;
+
+	if (vcpu->arch.singlestep_pending_cs !=
+		get_segment_selector(vcpu, VCPU_SREG_CS) ||
+	    vcpu->arch.singlestep_pending_rip != rip)
+		return 1;
+
+	vcpu->run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1;
+	vcpu->run->debug.arch.dr7 = 0;
+	vcpu->run->exit_reason = KVM_EXIT_DEBUG;
+	vcpu->run->debug.arch.pc = get_segment_base(vcpu, VCPU_SREG_CS) + rip;
+	vcpu->run->debug.arch.exception = DB_VECTOR;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_check_guest_singlestep);
 
 static int __vcpu_run(struct kvm_vcpu *vcpu)
 {
@@ -4471,6 +4520,12 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
 
 	r = 1;
 	while (r > 0) {
+		if (unlikely(vcpu->arch.singlestep_pending)) {
+			r = kvm_check_guest_singlestep(vcpu);
+			if (r == 0)
+				break;
+		}
+
 		if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE)
 			r = vcpu_enter_guest(vcpu);
 		else {
@@ -4828,14 +4883,6 @@ static gpa_t get_tss_base_addr_read(struct kvm_vcpu *vcpu,
 	return kvm_mmu_gva_to_gpa_read(vcpu, base_addr, NULL);
 }
 
-static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg)
-{
-	struct kvm_segment kvm_seg;
-
-	kvm_get_segment(vcpu, &kvm_seg, seg);
-	return kvm_seg.selector;
-}
-
 static int kvm_load_realmode_segment(struct kvm_vcpu *vcpu, u16 selector, int seg)
 {
 	struct kvm_segment segvar = {
@@ -5607,6 +5654,8 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
 	vcpu->arch.dr6 = DR6_FIXED_1;
 	vcpu->arch.dr7 = DR7_FIXED_1;
 
+	vcpu->arch.singlestep_pending = false;
+
 	return kvm_x86_ops->vcpu_reset(vcpu);
 }
 
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] KVM: x86: Emulator support for TF
  2010-02-22 17:51 ` [PATCH 6/6] KVM: x86: Emulator support for TF Jan Kiszka
@ 2010-02-23  9:55   ` Gleb Natapov
  2010-02-23 10:10     ` Jan Kiszka
  0 siblings, 1 reply; 29+ messages in thread
From: Gleb Natapov @ 2010-02-23  9:55 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Mon, Feb 22, 2010 at 06:51:23PM +0100, Jan Kiszka wrote:
> Support both guest- as well as host-owned EFLAGS.TF while emulating
> instructions. For guest-owned TF, we simply inject DB and update DR6.BS
> after completing an instruction that has TF set on entry. To support
> guest single-stepping under host control, we store the pending step
> along with its CS and RIP and trigger a corresponding user space exit
> once guest execution is about to resume. This check is is also required
> in the VMX emulation loop during invalid guest states.
> 
Emulator currently is a total mess. It is not a good time to add more mess
there right now IMO.

> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  arch/x86/include/asm/kvm_host.h |    5 +++
>  arch/x86/kvm/vmx.c              |    6 +++
>  arch/x86/kvm/x86.c              |   65 ++++++++++++++++++++++++++++++++++-----
>  3 files changed, 68 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d46e791..d69d8aa 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -362,8 +362,11 @@ struct kvm_vcpu_arch {
>  	u64 *mce_banks;
>  
>  	/* used for guest single stepping over the given code position */
> +	bool singlestep_pending;
>  	u16 singlestep_cs;
> +	u16 singlestep_pending_cs;
>  	unsigned long singlestep_rip;
> +	unsigned long singlestep_pending_rip;
If we are going to have many of those rip/cs pairs may be it is better
to add structure linear_ip and have functions is_same_ip().
 
>  	/* fields used by HYPER-V emulation */
>  	u64 hv_vapic;
>  };
> @@ -820,4 +823,6 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
>  void kvm_define_shared_msr(unsigned index, u32 msr);
>  void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
>  
> +int kvm_check_guest_singlestep(struct kvm_vcpu *vcpu);
> +
>  #endif /* _ASM_X86_KVM_HOST_H */
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index d772476..317828f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3489,6 +3489,12 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>  			goto out;
>  		if (need_resched())
>  			schedule();
> +
> +		if (unlikely(vcpu->arch.singlestep_pending)) {
> +			ret = kvm_check_guest_singlestep(vcpu);
> +			if (ret == 0)
> +				goto out;
> +		}
>  	}
>  
>  	vmx->emulation_required = 0;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 19e8b28..6ebebb9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3441,6 +3441,27 @@ static void cache_all_regs(struct kvm_vcpu *vcpu)
>  	vcpu->arch.regs_dirty = ~0;
>  }
>  
> +static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg)
> +{
> +	struct kvm_segment kvm_seg;
> +
> +	kvm_get_segment(vcpu, &kvm_seg, seg);
> +	return kvm_seg.selector;
> +}
> +
> +static void queue_singlestep(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
> +		vcpu->arch.singlestep_pending = true;
> +		vcpu->arch.singlestep_pending_cs =
> +			get_segment_selector(vcpu, VCPU_SREG_CS);
> +		vcpu->arch.singlestep_pending_rip = kvm_rip_read(vcpu);
Why should we remember rip where TF happened? We should exit
immediately to userspace anyway, no?

> +	} else {
> +		vcpu->arch.dr6 |= DR6_BS;
> +		kvm_queue_exception(vcpu, DB_VECTOR);
What if instruction emulation generated fault?

> +	}
> +}
> +
>  int emulate_instruction(struct kvm_vcpu *vcpu,
>  			unsigned long cr2,
>  			u16 error_code,
> @@ -3449,6 +3470,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>  	int r, shadow_mask;
>  	struct decode_cache *c;
>  	struct kvm_run *run = vcpu->run;
> +	bool singlestep;
>  
>  	kvm_clear_exception_queue(vcpu);
>  	vcpu->arch.mmio_fault_cr2 = cr2;
> @@ -3515,8 +3537,12 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>  		}
>  	}
>  
> +	singlestep = vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_TF;
> +
>  	if (emulation_type & EMULTYPE_SKIP) {
>  		kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.decode.eip);
> +		if (singlestep)
> +			queue_singlestep(vcpu);
Instruction that wasn't emulated shouldn't generate faults.

>  		return EMULATE_DONE;
>  	}
>  
> @@ -3549,6 +3575,9 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>  
>  	kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
>  
> +	if (singlestep)
> +		queue_singlestep(vcpu);
> +
if vcpu->mmio_is_write == true we can still exit with DO_MMIO, so
instruction is not yes completely executed. This queue_singlestep
mechanism looks bogus anyway. emulate_instruction() caller should
initiate exit to userspace space if required.

>  	if (vcpu->mmio_is_write) {
>  		vcpu->mmio_needed = 0;
>  		return EMULATE_DO_MMIO;
> @@ -4450,6 +4479,26 @@ out:
>  	return r;
>  }
>  
> +int kvm_check_guest_singlestep(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long rip = kvm_rip_read(vcpu);
> +
> +	vcpu->arch.singlestep_pending = false;
> +
> +	if (vcpu->arch.singlestep_pending_cs !=
> +		get_segment_selector(vcpu, VCPU_SREG_CS) ||
> +	    vcpu->arch.singlestep_pending_rip != rip)
> +		return 1;
> +
Again how this check can be false?

> +	vcpu->run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1;
> +	vcpu->run->debug.arch.dr7 = 0;
> +	vcpu->run->exit_reason = KVM_EXIT_DEBUG;
> +	vcpu->run->debug.arch.pc = get_segment_base(vcpu, VCPU_SREG_CS) + rip;
> +	vcpu->run->debug.arch.exception = DB_VECTOR;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(kvm_check_guest_singlestep);
>  
>  static int __vcpu_run(struct kvm_vcpu *vcpu)
>  {
> @@ -4471,6 +4520,12 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>  
>  	r = 1;
>  	while (r > 0) {
> +		if (unlikely(vcpu->arch.singlestep_pending)) {
> +			r = kvm_check_guest_singlestep(vcpu);
> +			if (r == 0)
> +				break;
> +		}
> +
Why not use existing mechanism to cause run loop to exit to userspace
i.e return 0 from vcpu_enter_guest(), instead of adding special cases here?

>  		if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE)
>  			r = vcpu_enter_guest(vcpu);
>  		else {
> @@ -4828,14 +4883,6 @@ static gpa_t get_tss_base_addr_read(struct kvm_vcpu *vcpu,
>  	return kvm_mmu_gva_to_gpa_read(vcpu, base_addr, NULL);
>  }
>  
> -static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg)
> -{
> -	struct kvm_segment kvm_seg;
> -
> -	kvm_get_segment(vcpu, &kvm_seg, seg);
> -	return kvm_seg.selector;
> -}
> -
>  static int kvm_load_realmode_segment(struct kvm_vcpu *vcpu, u16 selector, int seg)
>  {
>  	struct kvm_segment segvar = {
> @@ -5607,6 +5654,8 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
>  	vcpu->arch.dr6 = DR6_FIXED_1;
>  	vcpu->arch.dr7 = DR7_FIXED_1;
>  
> +	vcpu->arch.singlestep_pending = false;
> +
>  	return kvm_x86_ops->vcpu_reset(vcpu);
>  }
>  
> -- 
> 1.6.0.2

--
			Gleb.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] KVM: x86: Emulator support for TF
  2010-02-23  9:55   ` Gleb Natapov
@ 2010-02-23 10:10     ` Jan Kiszka
  2010-02-23 10:26       ` Gleb Natapov
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2010-02-23 10:10 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Avi Kivity, Marcelo Tosatti, kvm

Gleb Natapov wrote:
> On Mon, Feb 22, 2010 at 06:51:23PM +0100, Jan Kiszka wrote:
>> Support both guest- as well as host-owned EFLAGS.TF while emulating
>> instructions. For guest-owned TF, we simply inject DB and update DR6.BS
>> after completing an instruction that has TF set on entry. To support
>> guest single-stepping under host control, we store the pending step
>> along with its CS and RIP and trigger a corresponding user space exit
>> once guest execution is about to resume. This check is is also required
>> in the VMX emulation loop during invalid guest states.
>>
> Emulator currently is a total mess. It is not a good time to add more mess
> there right now IMO.

Then let's clean up what you consider "mess" in this feature. Unless
there are plans to clean up the emulator for the next or next-but-one
kernel release, I do not want to wait for this.

> 
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h |    5 +++
>>  arch/x86/kvm/vmx.c              |    6 +++
>>  arch/x86/kvm/x86.c              |   65 ++++++++++++++++++++++++++++++++++-----
>>  3 files changed, 68 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index d46e791..d69d8aa 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -362,8 +362,11 @@ struct kvm_vcpu_arch {
>>  	u64 *mce_banks;
>>  
>>  	/* used for guest single stepping over the given code position */
>> +	bool singlestep_pending;
>>  	u16 singlestep_cs;
>> +	u16 singlestep_pending_cs;
>>  	unsigned long singlestep_rip;
>> +	unsigned long singlestep_pending_rip;
> If we are going to have many of those rip/cs pairs may be it is better
> to add structure linear_ip and have functions is_same_ip().

Agreed.

>  
>>  	/* fields used by HYPER-V emulation */
>>  	u64 hv_vapic;
>>  };
>> @@ -820,4 +823,6 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
>>  void kvm_define_shared_msr(unsigned index, u32 msr);
>>  void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
>>  
>> +int kvm_check_guest_singlestep(struct kvm_vcpu *vcpu);
>> +
>>  #endif /* _ASM_X86_KVM_HOST_H */
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index d772476..317828f 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -3489,6 +3489,12 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>>  			goto out;
>>  		if (need_resched())
>>  			schedule();
>> +
>> +		if (unlikely(vcpu->arch.singlestep_pending)) {
>> +			ret = kvm_check_guest_singlestep(vcpu);
>> +			if (ret == 0)
>> +				goto out;
>> +		}
>>  	}
>>  
>>  	vmx->emulation_required = 0;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 19e8b28..6ebebb9 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3441,6 +3441,27 @@ static void cache_all_regs(struct kvm_vcpu *vcpu)
>>  	vcpu->arch.regs_dirty = ~0;
>>  }
>>  
>> +static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg)
>> +{
>> +	struct kvm_segment kvm_seg;
>> +
>> +	kvm_get_segment(vcpu, &kvm_seg, seg);
>> +	return kvm_seg.selector;
>> +}
>> +
>> +static void queue_singlestep(struct kvm_vcpu *vcpu)
>> +{
>> +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
>> +		vcpu->arch.singlestep_pending = true;
>> +		vcpu->arch.singlestep_pending_cs =
>> +			get_segment_selector(vcpu, VCPU_SREG_CS);
>> +		vcpu->arch.singlestep_pending_rip = kvm_rip_read(vcpu);
> Why should we remember rip where TF happened? We should exit
> immediately to userspace anyway, no?

I think MMIO exits takes precedence, so this is intended to exit after
they completed, ie. after the instruction is fully finished.

> 
>> +	} else {
>> +		vcpu->arch.dr6 |= DR6_BS;
>> +		kvm_queue_exception(vcpu, DB_VECTOR);
> What if instruction emulation generated fault?

Fault-like exceptions will trigger before that, and the instruction
won't complete. Do we have any trap-like exceptions to worry about?

> 
>> +	}
>> +}
>> +
>>  int emulate_instruction(struct kvm_vcpu *vcpu,
>>  			unsigned long cr2,
>>  			u16 error_code,
>> @@ -3449,6 +3470,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>>  	int r, shadow_mask;
>>  	struct decode_cache *c;
>>  	struct kvm_run *run = vcpu->run;
>> +	bool singlestep;
>>  
>>  	kvm_clear_exception_queue(vcpu);
>>  	vcpu->arch.mmio_fault_cr2 = cr2;
>> @@ -3515,8 +3537,12 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>>  		}
>>  	}
>>  
>> +	singlestep = vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_TF;
>> +
>>  	if (emulation_type & EMULTYPE_SKIP) {
>>  		kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.decode.eip);
>> +		if (singlestep)
>> +			queue_singlestep(vcpu);
> Instruction that wasn't emulated shouldn't generate faults.
> 

Skipping here doesn't mean it's not emulated. A valid question might be
if we should catch it here or in skip_emulated_instruction.

>>  		return EMULATE_DONE;
>>  	}
>>  
>> @@ -3549,6 +3575,9 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>>  
>>  	kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
>>  
>> +	if (singlestep)
>> +		queue_singlestep(vcpu);
>> +
> if vcpu->mmio_is_write == true we can still exit with DO_MMIO, so
> instruction is not yes completely executed. This queue_singlestep
> mechanism looks bogus anyway. emulate_instruction() caller should
> initiate exit to userspace space if required.

That's while it is _queued_, not immediately delivered: MMIO exits will
continue to take precedence.

> 
>>  	if (vcpu->mmio_is_write) {
>>  		vcpu->mmio_needed = 0;
>>  		return EMULATE_DO_MMIO;
>> @@ -4450,6 +4479,26 @@ out:
>>  	return r;
>>  }
>>  
>> +int kvm_check_guest_singlestep(struct kvm_vcpu *vcpu)
>> +{
>> +	unsigned long rip = kvm_rip_read(vcpu);
>> +
>> +	vcpu->arch.singlestep_pending = false;
>> +
>> +	if (vcpu->arch.singlestep_pending_cs !=
>> +		get_segment_selector(vcpu, VCPU_SREG_CS) ||
>> +	    vcpu->arch.singlestep_pending_rip != rip)
>> +		return 1;
>> +
> Again how this check can be false?

E.g. someone fiddled with the VCPU state, resetting the guest.

> 
>> +	vcpu->run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1;
>> +	vcpu->run->debug.arch.dr7 = 0;
>> +	vcpu->run->exit_reason = KVM_EXIT_DEBUG;
>> +	vcpu->run->debug.arch.pc = get_segment_base(vcpu, VCPU_SREG_CS) + rip;
>> +	vcpu->run->debug.arch.exception = DB_VECTOR;
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_check_guest_singlestep);
>>  
>>  static int __vcpu_run(struct kvm_vcpu *vcpu)
>>  {
>> @@ -4471,6 +4520,12 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>  
>>  	r = 1;
>>  	while (r > 0) {
>> +		if (unlikely(vcpu->arch.singlestep_pending)) {
>> +			r = kvm_check_guest_singlestep(vcpu);
>> +			if (r == 0)
>> +				break;
>> +		}
>> +
> Why not use existing mechanism to cause run loop to exit to userspace
> i.e return 0 from vcpu_enter_guest(), instead of adding special cases here?
> 

I wanted to exit in case of vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE
as well, but thinking about this again it's actually more reasonable to
exit once the VCPU unblocks again, e.g. once halt resumes.

>>  		if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE)
>>  			r = vcpu_enter_guest(vcpu);
>>  		else {
>> @@ -4828,14 +4883,6 @@ static gpa_t get_tss_base_addr_read(struct kvm_vcpu *vcpu,
>>  	return kvm_mmu_gva_to_gpa_read(vcpu, base_addr, NULL);
>>  }
>>  
>> -static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg)
>> -{
>> -	struct kvm_segment kvm_seg;
>> -
>> -	kvm_get_segment(vcpu, &kvm_seg, seg);
>> -	return kvm_seg.selector;
>> -}
>> -
>>  static int kvm_load_realmode_segment(struct kvm_vcpu *vcpu, u16 selector, int seg)
>>  {
>>  	struct kvm_segment segvar = {
>> @@ -5607,6 +5654,8 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
>>  	vcpu->arch.dr6 = DR6_FIXED_1;
>>  	vcpu->arch.dr7 = DR7_FIXED_1;
>>  
>> +	vcpu->arch.singlestep_pending = false;
>> +
>>  	return kvm_x86_ops->vcpu_reset(vcpu);
>>  }
>>  
>> -- 
>> 1.6.0.2
> 
> --
> 			Gleb.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] KVM: x86: Emulator support for TF
  2010-02-23 10:10     ` Jan Kiszka
@ 2010-02-23 10:26       ` Gleb Natapov
  2010-02-23 10:29         ` Avi Kivity
  2010-02-23 10:37         ` Jan Kiszka
  0 siblings, 2 replies; 29+ messages in thread
From: Gleb Natapov @ 2010-02-23 10:26 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Tue, Feb 23, 2010 at 11:10:57AM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Mon, Feb 22, 2010 at 06:51:23PM +0100, Jan Kiszka wrote:
> >> Support both guest- as well as host-owned EFLAGS.TF while emulating
> >> instructions. For guest-owned TF, we simply inject DB and update DR6.BS
> >> after completing an instruction that has TF set on entry. To support
> >> guest single-stepping under host control, we store the pending step
> >> along with its CS and RIP and trigger a corresponding user space exit
> >> once guest execution is about to resume. This check is is also required
> >> in the VMX emulation loop during invalid guest states.
> >>
> > Emulator currently is a total mess. It is not a good time to add more mess
> > there right now IMO.
> 
> Then let's clean up what you consider "mess" in this feature. Unless
> there are plans to clean up the emulator for the next or next-but-one
> kernel release, I do not want to wait for this.
> 
There are plans to cleanup the emulator.

> > 
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >> ---
> >>  arch/x86/include/asm/kvm_host.h |    5 +++
> >>  arch/x86/kvm/vmx.c              |    6 +++
> >>  arch/x86/kvm/x86.c              |   65 ++++++++++++++++++++++++++++++++++-----
> >>  3 files changed, 68 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >> index d46e791..d69d8aa 100644
> >> --- a/arch/x86/include/asm/kvm_host.h
> >> +++ b/arch/x86/include/asm/kvm_host.h
> >> @@ -362,8 +362,11 @@ struct kvm_vcpu_arch {
> >>  	u64 *mce_banks;
> >>  
> >>  	/* used for guest single stepping over the given code position */
> >> +	bool singlestep_pending;
> >>  	u16 singlestep_cs;
> >> +	u16 singlestep_pending_cs;
> >>  	unsigned long singlestep_rip;
> >> +	unsigned long singlestep_pending_rip;
> > If we are going to have many of those rip/cs pairs may be it is better
> > to add structure linear_ip and have functions is_same_ip().
> 
> Agreed.
> 
> >  
> >>  	/* fields used by HYPER-V emulation */
> >>  	u64 hv_vapic;
> >>  };
> >> @@ -820,4 +823,6 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
> >>  void kvm_define_shared_msr(unsigned index, u32 msr);
> >>  void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
> >>  
> >> +int kvm_check_guest_singlestep(struct kvm_vcpu *vcpu);
> >> +
> >>  #endif /* _ASM_X86_KVM_HOST_H */
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> index d772476..317828f 100644
> >> --- a/arch/x86/kvm/vmx.c
> >> +++ b/arch/x86/kvm/vmx.c
> >> @@ -3489,6 +3489,12 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
> >>  			goto out;
> >>  		if (need_resched())
> >>  			schedule();
> >> +
> >> +		if (unlikely(vcpu->arch.singlestep_pending)) {
> >> +			ret = kvm_check_guest_singlestep(vcpu);
> >> +			if (ret == 0)
> >> +				goto out;
> >> +		}
> >>  	}
> >>  
> >>  	vmx->emulation_required = 0;
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index 19e8b28..6ebebb9 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -3441,6 +3441,27 @@ static void cache_all_regs(struct kvm_vcpu *vcpu)
> >>  	vcpu->arch.regs_dirty = ~0;
> >>  }
> >>  
> >> +static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg)
> >> +{
> >> +	struct kvm_segment kvm_seg;
> >> +
> >> +	kvm_get_segment(vcpu, &kvm_seg, seg);
> >> +	return kvm_seg.selector;
> >> +}
> >> +
> >> +static void queue_singlestep(struct kvm_vcpu *vcpu)
> >> +{
> >> +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
> >> +		vcpu->arch.singlestep_pending = true;
> >> +		vcpu->arch.singlestep_pending_cs =
> >> +			get_segment_selector(vcpu, VCPU_SREG_CS);
> >> +		vcpu->arch.singlestep_pending_rip = kvm_rip_read(vcpu);
> > Why should we remember rip where TF happened? We should exit
> > immediately to userspace anyway, no?
> 
> I think MMIO exits takes precedence, so this is intended to exit after
> they completed, ie. after the instruction is fully finished.
> 
> > 
> >> +	} else {
> >> +		vcpu->arch.dr6 |= DR6_BS;
> >> +		kvm_queue_exception(vcpu, DB_VECTOR);
> > What if instruction emulation generated fault?
> 
> Fault-like exceptions will trigger before that, and the instruction
> won't complete. Do we have any trap-like exceptions to worry about?
> 
They will not trigger before that. They will be queued for the next
entry and queuing another one will either overwrite the previous one,
or will queue double fault (depending on what what the first exception).

> > 
> >> +	}
> >> +}
> >> +
> >>  int emulate_instruction(struct kvm_vcpu *vcpu,
> >>  			unsigned long cr2,
> >>  			u16 error_code,
> >> @@ -3449,6 +3470,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
> >>  	int r, shadow_mask;
> >>  	struct decode_cache *c;
> >>  	struct kvm_run *run = vcpu->run;
> >> +	bool singlestep;
> >>  
> >>  	kvm_clear_exception_queue(vcpu);
> >>  	vcpu->arch.mmio_fault_cr2 = cr2;
> >> @@ -3515,8 +3537,12 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
> >>  		}
> >>  	}
> >>  
> >> +	singlestep = vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_TF;
> >> +
> >>  	if (emulation_type & EMULTYPE_SKIP) {
> >>  		kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.decode.eip);
> >> +		if (singlestep)
> >> +			queue_singlestep(vcpu);
> > Instruction that wasn't emulated shouldn't generate faults.
> > 
> 
> Skipping here doesn't mean it's not emulated. A valid question might be
> if we should catch it here or in skip_emulated_instruction.
In skip_emulated_instruction() or even above that. Only at the point we
are sure we actually emulate instruction.

> 
> >>  		return EMULATE_DONE;
> >>  	}
> >>  
> >> @@ -3549,6 +3575,9 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
> >>  
> >>  	kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
> >>  
> >> +	if (singlestep)
> >> +		queue_singlestep(vcpu);
> >> +
> > if vcpu->mmio_is_write == true we can still exit with DO_MMIO, so
> > instruction is not yes completely executed. This queue_singlestep
> > mechanism looks bogus anyway. emulate_instruction() caller should
> > initiate exit to userspace space if required.
> 
> That's while it is _queued_, not immediately delivered: MMIO exits will
> continue to take precedence.
That is bogus. We should queue TF only after instruction is completely
emulated, not during emulation. Instruction may generate dozen MMIO
exits and eventually be aborted by exception, so DB shouldn't be generated
at all in this case.

> 
> > 
> >>  	if (vcpu->mmio_is_write) {
> >>  		vcpu->mmio_needed = 0;
> >>  		return EMULATE_DO_MMIO;
> >> @@ -4450,6 +4479,26 @@ out:
> >>  	return r;
> >>  }
> >>  
> >> +int kvm_check_guest_singlestep(struct kvm_vcpu *vcpu)
> >> +{
> >> +	unsigned long rip = kvm_rip_read(vcpu);
> >> +
> >> +	vcpu->arch.singlestep_pending = false;
> >> +
> >> +	if (vcpu->arch.singlestep_pending_cs !=
> >> +		get_segment_selector(vcpu, VCPU_SREG_CS) ||
> >> +	    vcpu->arch.singlestep_pending_rip != rip)
> >> +		return 1;
> >> +
> > Again how this check can be false?
> 
> E.g. someone fiddled with the VCPU state, resetting the guest.
How is this someone? It should reset this state too then.

> 
> > 
> >> +	vcpu->run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1;
> >> +	vcpu->run->debug.arch.dr7 = 0;
> >> +	vcpu->run->exit_reason = KVM_EXIT_DEBUG;
> >> +	vcpu->run->debug.arch.pc = get_segment_base(vcpu, VCPU_SREG_CS) + rip;
> >> +	vcpu->run->debug.arch.exception = DB_VECTOR;
> >> +
> >> +	return 0;
> >> +}
> >> +EXPORT_SYMBOL_GPL(kvm_check_guest_singlestep);
> >>  
> >>  static int __vcpu_run(struct kvm_vcpu *vcpu)
> >>  {
> >> @@ -4471,6 +4520,12 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
> >>  
> >>  	r = 1;
> >>  	while (r > 0) {
> >> +		if (unlikely(vcpu->arch.singlestep_pending)) {
> >> +			r = kvm_check_guest_singlestep(vcpu);
> >> +			if (r == 0)
> >> +				break;
> >> +		}
> >> +
> > Why not use existing mechanism to cause run loop to exit to userspace
> > i.e return 0 from vcpu_enter_guest(), instead of adding special cases here?
> > 
> 
> I wanted to exit in case of vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE
> as well, but thinking about this again it's actually more reasonable to
> exit once the VCPU unblocks again, e.g. once halt resumes.
> 
We will exit immediately after halt if return value is 0, why would you
want to exit on vcpu entry after that?
 
> >>  		if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE)
> >>  			r = vcpu_enter_guest(vcpu);
> >>  		else {
> >> @@ -4828,14 +4883,6 @@ static gpa_t get_tss_base_addr_read(struct kvm_vcpu *vcpu,
> >>  	return kvm_mmu_gva_to_gpa_read(vcpu, base_addr, NULL);
> >>  }
> >>  
> >> -static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg)
> >> -{
> >> -	struct kvm_segment kvm_seg;
> >> -
> >> -	kvm_get_segment(vcpu, &kvm_seg, seg);
> >> -	return kvm_seg.selector;
> >> -}
> >> -
> >>  static int kvm_load_realmode_segment(struct kvm_vcpu *vcpu, u16 selector, int seg)
> >>  {
> >>  	struct kvm_segment segvar = {
> >> @@ -5607,6 +5654,8 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
> >>  	vcpu->arch.dr6 = DR6_FIXED_1;
> >>  	vcpu->arch.dr7 = DR7_FIXED_1;
> >>  
> >> +	vcpu->arch.singlestep_pending = false;
> >> +
> >>  	return kvm_x86_ops->vcpu_reset(vcpu);
> >>  }
> >>  
> >> -- 
> >> 1.6.0.2
> > 
> > --
> > 			Gleb.
> 
> Thanks,
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

--
			Gleb.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] KVM: x86: Emulator support for TF
  2010-02-23 10:26       ` Gleb Natapov
@ 2010-02-23 10:29         ` Avi Kivity
  2010-02-23 10:32           ` Gleb Natapov
  2010-02-23 10:37         ` Jan Kiszka
  1 sibling, 1 reply; 29+ messages in thread
From: Avi Kivity @ 2010-02-23 10:29 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Jan Kiszka, Marcelo Tosatti, kvm

On 02/23/2010 12:26 PM, Gleb Natapov wrote:
> On Tue, Feb 23, 2010 at 11:10:57AM +0100, Jan Kiszka wrote:
>    
>> Gleb Natapov wrote:
>>      
>>> On Mon, Feb 22, 2010 at 06:51:23PM +0100, Jan Kiszka wrote:
>>>        
>>>> Support both guest- as well as host-owned EFLAGS.TF while emulating
>>>> instructions. For guest-owned TF, we simply inject DB and update DR6.BS
>>>> after completing an instruction that has TF set on entry. To support
>>>> guest single-stepping under host control, we store the pending step
>>>> along with its CS and RIP and trigger a corresponding user space exit
>>>> once guest execution is about to resume. This check is is also required
>>>> in the VMX emulation loop during invalid guest states.
>>>>
>>>>          
>>> Emulator currently is a total mess. It is not a good time to add more mess
>>> there right now IMO.
>>>        
>> Then let's clean up what you consider "mess" in this feature. Unless
>> there are plans to clean up the emulator for the next or next-but-one
>> kernel release, I do not want to wait for this.
>>
>>      
> There are plans to cleanup the emulator.
>    

That shouldn't stop the code.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] KVM: x86: Emulator support for TF
  2010-02-23 10:29         ` Avi Kivity
@ 2010-02-23 10:32           ` Gleb Natapov
  2010-02-23 10:34             ` Avi Kivity
  0 siblings, 1 reply; 29+ messages in thread
From: Gleb Natapov @ 2010-02-23 10:32 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Jan Kiszka, Marcelo Tosatti, kvm

On Tue, Feb 23, 2010 at 12:29:25PM +0200, Avi Kivity wrote:
> On 02/23/2010 12:26 PM, Gleb Natapov wrote:
> >On Tue, Feb 23, 2010 at 11:10:57AM +0100, Jan Kiszka wrote:
> >>Gleb Natapov wrote:
> >>>On Mon, Feb 22, 2010 at 06:51:23PM +0100, Jan Kiszka wrote:
> >>>>Support both guest- as well as host-owned EFLAGS.TF while emulating
> >>>>instructions. For guest-owned TF, we simply inject DB and update DR6.BS
> >>>>after completing an instruction that has TF set on entry. To support
> >>>>guest single-stepping under host control, we store the pending step
> >>>>along with its CS and RIP and trigger a corresponding user space exit
> >>>>once guest execution is about to resume. This check is is also required
> >>>>in the VMX emulation loop during invalid guest states.
> >>>>
> >>>Emulator currently is a total mess. It is not a good time to add more mess
> >>>there right now IMO.
> >>Then let's clean up what you consider "mess" in this feature. Unless
> >>there are plans to clean up the emulator for the next or next-but-one
> >>kernel release, I do not want to wait for this.
> >>
> >There are plans to cleanup the emulator.
> 
> That shouldn't stop the code.
> 
If the code is correct and doesn't make emulator code even messier.

--
			Gleb.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] KVM: x86: Emulator support for TF
  2010-02-23 10:32           ` Gleb Natapov
@ 2010-02-23 10:34             ` Avi Kivity
  0 siblings, 0 replies; 29+ messages in thread
From: Avi Kivity @ 2010-02-23 10:34 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Jan Kiszka, Marcelo Tosatti, kvm

On 02/23/2010 12:32 PM, Gleb Natapov wrote:
>>>>>
>>>>> Emulator currently is a total mess. It is not a good time to add more mess
>>>>> there right now IMO.
>>>>>            
>>>> Then let's clean up what you consider "mess" in this feature. Unless
>>>> there are plans to clean up the emulator for the next or next-but-one
>>>> kernel release, I do not want to wait for this.
>>>>
>>>>          
>>> There are plans to cleanup the emulator.
>>>        
>> That shouldn't stop the code.
>>
>>      
> If the code is correct and doesn't make emulator code even messier.
>    

Definitely.  I'm sure Jan will address concrete comments.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] KVM: x86: Emulator support for TF
  2010-02-23 10:26       ` Gleb Natapov
  2010-02-23 10:29         ` Avi Kivity
@ 2010-02-23 10:37         ` Jan Kiszka
  2010-02-23 11:00           ` Gleb Natapov
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2010-02-23 10:37 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Avi Kivity, Marcelo Tosatti, kvm

Gleb Natapov wrote:
> On Tue, Feb 23, 2010 at 11:10:57AM +0100, Jan Kiszka wrote:
>> Gleb Natapov wrote:
>>> On Mon, Feb 22, 2010 at 06:51:23PM +0100, Jan Kiszka wrote:
>>>> Support both guest- as well as host-owned EFLAGS.TF while emulating
>>>> instructions. For guest-owned TF, we simply inject DB and update DR6.BS
>>>> after completing an instruction that has TF set on entry. To support
>>>> guest single-stepping under host control, we store the pending step
>>>> along with its CS and RIP and trigger a corresponding user space exit
>>>> once guest execution is about to resume. This check is is also required
>>>> in the VMX emulation loop during invalid guest states.
>>>>
>>> Emulator currently is a total mess. It is not a good time to add more mess
>>> there right now IMO.
>> Then let's clean up what you consider "mess" in this feature. Unless
>> there are plans to clean up the emulator for the next or next-but-one
>> kernel release, I do not want to wait for this.
>>
> There are plans to cleanup the emulator.

When?

> 
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>> ---
>>>>  arch/x86/include/asm/kvm_host.h |    5 +++
>>>>  arch/x86/kvm/vmx.c              |    6 +++
>>>>  arch/x86/kvm/x86.c              |   65 ++++++++++++++++++++++++++++++++++-----
>>>>  3 files changed, 68 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>> index d46e791..d69d8aa 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -362,8 +362,11 @@ struct kvm_vcpu_arch {
>>>>  	u64 *mce_banks;
>>>>  
>>>>  	/* used for guest single stepping over the given code position */
>>>> +	bool singlestep_pending;
>>>>  	u16 singlestep_cs;
>>>> +	u16 singlestep_pending_cs;
>>>>  	unsigned long singlestep_rip;
>>>> +	unsigned long singlestep_pending_rip;
>>> If we are going to have many of those rip/cs pairs may be it is better
>>> to add structure linear_ip and have functions is_same_ip().
>> Agreed.
>>
>>>  
>>>>  	/* fields used by HYPER-V emulation */
>>>>  	u64 hv_vapic;
>>>>  };
>>>> @@ -820,4 +823,6 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
>>>>  void kvm_define_shared_msr(unsigned index, u32 msr);
>>>>  void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
>>>>  
>>>> +int kvm_check_guest_singlestep(struct kvm_vcpu *vcpu);
>>>> +
>>>>  #endif /* _ASM_X86_KVM_HOST_H */
>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>> index d772476..317828f 100644
>>>> --- a/arch/x86/kvm/vmx.c
>>>> +++ b/arch/x86/kvm/vmx.c
>>>> @@ -3489,6 +3489,12 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>>>>  			goto out;
>>>>  		if (need_resched())
>>>>  			schedule();
>>>> +
>>>> +		if (unlikely(vcpu->arch.singlestep_pending)) {
>>>> +			ret = kvm_check_guest_singlestep(vcpu);
>>>> +			if (ret == 0)
>>>> +				goto out;
>>>> +		}
>>>>  	}
>>>>  
>>>>  	vmx->emulation_required = 0;
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index 19e8b28..6ebebb9 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -3441,6 +3441,27 @@ static void cache_all_regs(struct kvm_vcpu *vcpu)
>>>>  	vcpu->arch.regs_dirty = ~0;
>>>>  }
>>>>  
>>>> +static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg)
>>>> +{
>>>> +	struct kvm_segment kvm_seg;
>>>> +
>>>> +	kvm_get_segment(vcpu, &kvm_seg, seg);
>>>> +	return kvm_seg.selector;
>>>> +}
>>>> +
>>>> +static void queue_singlestep(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
>>>> +		vcpu->arch.singlestep_pending = true;
>>>> +		vcpu->arch.singlestep_pending_cs =
>>>> +			get_segment_selector(vcpu, VCPU_SREG_CS);
>>>> +		vcpu->arch.singlestep_pending_rip = kvm_rip_read(vcpu);
>>> Why should we remember rip where TF happened? We should exit
>>> immediately to userspace anyway, no?
>> I think MMIO exits takes precedence, so this is intended to exit after
>> they completed, ie. after the instruction is fully finished.
>>
>>>> +	} else {
>>>> +		vcpu->arch.dr6 |= DR6_BS;
>>>> +		kvm_queue_exception(vcpu, DB_VECTOR);
>>> What if instruction emulation generated fault?
>> Fault-like exceptions will trigger before that, and the instruction
>> won't complete. Do we have any trap-like exceptions to worry about?
>>
> They will not trigger before that. They will be queued for the next
> entry and queuing another one will either overwrite the previous one,
> or will queue double fault (depending on what what the first exception).

The will not stack as the instruction failed, thus no singlestep will be
queued as well.

> 
>>>> +	}
>>>> +}
>>>> +
>>>>  int emulate_instruction(struct kvm_vcpu *vcpu,
>>>>  			unsigned long cr2,
>>>>  			u16 error_code,
>>>> @@ -3449,6 +3470,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>>>>  	int r, shadow_mask;
>>>>  	struct decode_cache *c;
>>>>  	struct kvm_run *run = vcpu->run;
>>>> +	bool singlestep;
>>>>  
>>>>  	kvm_clear_exception_queue(vcpu);
>>>>  	vcpu->arch.mmio_fault_cr2 = cr2;
>>>> @@ -3515,8 +3537,12 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>>>>  		}
>>>>  	}
>>>>  
>>>> +	singlestep = vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_TF;
>>>> +
>>>>  	if (emulation_type & EMULTYPE_SKIP) {
>>>>  		kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.decode.eip);
>>>> +		if (singlestep)
>>>> +			queue_singlestep(vcpu);
>>> Instruction that wasn't emulated shouldn't generate faults.
>>>
>> Skipping here doesn't mean it's not emulated. A valid question might be
>> if we should catch it here or in skip_emulated_instruction.
> In skip_emulated_instruction() or even above that. Only at the point we
> are sure we actually emulate instruction.
> 
>>>>  		return EMULATE_DONE;
>>>>  	}
>>>>  
>>>> @@ -3549,6 +3575,9 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>>>>  
>>>>  	kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
>>>>  
>>>> +	if (singlestep)
>>>> +		queue_singlestep(vcpu);
>>>> +
>>> if vcpu->mmio_is_write == true we can still exit with DO_MMIO, so
>>> instruction is not yes completely executed. This queue_singlestep
>>> mechanism looks bogus anyway. emulate_instruction() caller should
>>> initiate exit to userspace space if required.
>> That's while it is _queued_, not immediately delivered: MMIO exits will
>> continue to take precedence.
> That is bogus. We should queue TF only after instruction is completely
> emulated, not during emulation. Instruction may generate dozen MMIO
> exits and eventually be aborted by exception, so DB shouldn't be generated
> at all in this case.

How to detect this? This potential looping over user space caused sever
headache will designing the TF feature. I thought I got all cases.

The current model is: if we first execute a different instruction than
the one that singlestep targets at, singlestep should be cleared without
having any effect. Re-executing the stepped instruction after a
potential exception resolution should then reassert singlestep and
requeue it properly.

> 
>>>>  	if (vcpu->mmio_is_write) {
>>>>  		vcpu->mmio_needed = 0;
>>>>  		return EMULATE_DO_MMIO;
>>>> @@ -4450,6 +4479,26 @@ out:
>>>>  	return r;
>>>>  }
>>>>  
>>>> +int kvm_check_guest_singlestep(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	unsigned long rip = kvm_rip_read(vcpu);
>>>> +
>>>> +	vcpu->arch.singlestep_pending = false;
>>>> +
>>>> +	if (vcpu->arch.singlestep_pending_cs !=
>>>> +		get_segment_selector(vcpu, VCPU_SREG_CS) ||
>>>> +	    vcpu->arch.singlestep_pending_rip != rip)
>>>> +		return 1;
>>>> +
>>> Again how this check can be false?
>> E.g. someone fiddled with the VCPU state, resetting the guest.
> How is this someone? It should reset this state too then.

If somehow possible, I do not want to make this state part of the user
visible VCPU state but rather translate it into a pending #DB + set
DR6.BS (for guest-owned TF) or drop it (for host-owned - not critical here).

> 
>>>> +	vcpu->run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1;
>>>> +	vcpu->run->debug.arch.dr7 = 0;
>>>> +	vcpu->run->exit_reason = KVM_EXIT_DEBUG;
>>>> +	vcpu->run->debug.arch.pc = get_segment_base(vcpu, VCPU_SREG_CS) + rip;
>>>> +	vcpu->run->debug.arch.exception = DB_VECTOR;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(kvm_check_guest_singlestep);
>>>>  
>>>>  static int __vcpu_run(struct kvm_vcpu *vcpu)
>>>>  {
>>>> @@ -4471,6 +4520,12 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>>>  
>>>>  	r = 1;
>>>>  	while (r > 0) {
>>>> +		if (unlikely(vcpu->arch.singlestep_pending)) {
>>>> +			r = kvm_check_guest_singlestep(vcpu);
>>>> +			if (r == 0)
>>>> +				break;
>>>> +		}
>>>> +
>>> Why not use existing mechanism to cause run loop to exit to userspace
>>> i.e return 0 from vcpu_enter_guest(), instead of adding special cases here?
>>>
>> I wanted to exit in case of vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE
>> as well, but thinking about this again it's actually more reasonable to
>> exit once the VCPU unblocks again, e.g. once halt resumes.
>>
> We will exit immediately after halt if return value is 0, why would you
> want to exit on vcpu entry after that?

As I said: I will move this into vcpu_enter_guest.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] KVM: x86: Emulator support for TF
  2010-02-23 10:37         ` Jan Kiszka
@ 2010-02-23 11:00           ` Gleb Natapov
  2010-02-23 11:04             ` Avi Kivity
  2010-02-23 11:30             ` Jan Kiszka
  0 siblings, 2 replies; 29+ messages in thread
From: Gleb Natapov @ 2010-02-23 11:00 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Tue, Feb 23, 2010 at 11:37:21AM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Tue, Feb 23, 2010 at 11:10:57AM +0100, Jan Kiszka wrote:
> >> Gleb Natapov wrote:
> >>> On Mon, Feb 22, 2010 at 06:51:23PM +0100, Jan Kiszka wrote:
> >>>> Support both guest- as well as host-owned EFLAGS.TF while emulating
> >>>> instructions. For guest-owned TF, we simply inject DB and update DR6.BS
> >>>> after completing an instruction that has TF set on entry. To support
> >>>> guest single-stepping under host control, we store the pending step
> >>>> along with its CS and RIP and trigger a corresponding user space exit
> >>>> once guest execution is about to resume. This check is is also required
> >>>> in the VMX emulation loop during invalid guest states.
> >>>>
> >>> Emulator currently is a total mess. It is not a good time to add more mess
> >>> there right now IMO.
> >> Then let's clean up what you consider "mess" in this feature. Unless
> >> there are plans to clean up the emulator for the next or next-but-one
> >> kernel release, I do not want to wait for this.
> >>
> > There are plans to cleanup the emulator.
> 
> When?
ASAP :) I am looking into that, but it will not be easy.

> 
> > 
> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>> ---
> >>>>  arch/x86/include/asm/kvm_host.h |    5 +++
> >>>>  arch/x86/kvm/vmx.c              |    6 +++
> >>>>  arch/x86/kvm/x86.c              |   65 ++++++++++++++++++++++++++++++++++-----
> >>>>  3 files changed, 68 insertions(+), 8 deletions(-)
> >>>>
> >>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >>>> index d46e791..d69d8aa 100644
> >>>> --- a/arch/x86/include/asm/kvm_host.h
> >>>> +++ b/arch/x86/include/asm/kvm_host.h
> >>>> @@ -362,8 +362,11 @@ struct kvm_vcpu_arch {
> >>>>  	u64 *mce_banks;
> >>>>  
> >>>>  	/* used for guest single stepping over the given code position */
> >>>> +	bool singlestep_pending;
> >>>>  	u16 singlestep_cs;
> >>>> +	u16 singlestep_pending_cs;
> >>>>  	unsigned long singlestep_rip;
> >>>> +	unsigned long singlestep_pending_rip;
> >>> If we are going to have many of those rip/cs pairs may be it is better
> >>> to add structure linear_ip and have functions is_same_ip().
> >> Agreed.
> >>
> >>>  
> >>>>  	/* fields used by HYPER-V emulation */
> >>>>  	u64 hv_vapic;
> >>>>  };
> >>>> @@ -820,4 +823,6 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
> >>>>  void kvm_define_shared_msr(unsigned index, u32 msr);
> >>>>  void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
> >>>>  
> >>>> +int kvm_check_guest_singlestep(struct kvm_vcpu *vcpu);
> >>>> +
> >>>>  #endif /* _ASM_X86_KVM_HOST_H */
> >>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>>> index d772476..317828f 100644
> >>>> --- a/arch/x86/kvm/vmx.c
> >>>> +++ b/arch/x86/kvm/vmx.c
> >>>> @@ -3489,6 +3489,12 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
> >>>>  			goto out;
> >>>>  		if (need_resched())
> >>>>  			schedule();
> >>>> +
> >>>> +		if (unlikely(vcpu->arch.singlestep_pending)) {
> >>>> +			ret = kvm_check_guest_singlestep(vcpu);
> >>>> +			if (ret == 0)
> >>>> +				goto out;
> >>>> +		}
> >>>>  	}
> >>>>  
> >>>>  	vmx->emulation_required = 0;
> >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>>> index 19e8b28..6ebebb9 100644
> >>>> --- a/arch/x86/kvm/x86.c
> >>>> +++ b/arch/x86/kvm/x86.c
> >>>> @@ -3441,6 +3441,27 @@ static void cache_all_regs(struct kvm_vcpu *vcpu)
> >>>>  	vcpu->arch.regs_dirty = ~0;
> >>>>  }
> >>>>  
> >>>> +static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg)
> >>>> +{
> >>>> +	struct kvm_segment kvm_seg;
> >>>> +
> >>>> +	kvm_get_segment(vcpu, &kvm_seg, seg);
> >>>> +	return kvm_seg.selector;
> >>>> +}
> >>>> +
> >>>> +static void queue_singlestep(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
> >>>> +		vcpu->arch.singlestep_pending = true;
> >>>> +		vcpu->arch.singlestep_pending_cs =
> >>>> +			get_segment_selector(vcpu, VCPU_SREG_CS);
> >>>> +		vcpu->arch.singlestep_pending_rip = kvm_rip_read(vcpu);
> >>> Why should we remember rip where TF happened? We should exit
> >>> immediately to userspace anyway, no?
> >> I think MMIO exits takes precedence, so this is intended to exit after
> >> they completed, ie. after the instruction is fully finished.
> >>
> >>>> +	} else {
> >>>> +		vcpu->arch.dr6 |= DR6_BS;
> >>>> +		kvm_queue_exception(vcpu, DB_VECTOR);
> >>> What if instruction emulation generated fault?
> >> Fault-like exceptions will trigger before that, and the instruction
> >> won't complete. Do we have any trap-like exceptions to worry about?
> >>
> > They will not trigger before that. They will be queued for the next
> > entry and queuing another one will either overwrite the previous one,
> > or will queue double fault (depending on what what the first exception).
> 
> The will not stack as the instruction failed, thus no singlestep will be
> queued as well.
Instruction failed doesn't mean emulation failed, so lets see what
happens when you single step over instruction that generates page fault.
#PF is queued and x86_emulate_insn() returns 0 to emulate_instruction()
no you call queue_singlestep() which calls kvm_queue_exception(vcpu, DB_VECTOR);
and this cause #DF to be injected.

> 
> > 
> >>>> +	}
> >>>> +}
> >>>> +
> >>>>  int emulate_instruction(struct kvm_vcpu *vcpu,
> >>>>  			unsigned long cr2,
> >>>>  			u16 error_code,
> >>>> @@ -3449,6 +3470,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
> >>>>  	int r, shadow_mask;
> >>>>  	struct decode_cache *c;
> >>>>  	struct kvm_run *run = vcpu->run;
> >>>> +	bool singlestep;
> >>>>  
> >>>>  	kvm_clear_exception_queue(vcpu);
> >>>>  	vcpu->arch.mmio_fault_cr2 = cr2;
> >>>> @@ -3515,8 +3537,12 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
> >>>>  		}
> >>>>  	}
> >>>>  
> >>>> +	singlestep = vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_TF;
> >>>> +
> >>>>  	if (emulation_type & EMULTYPE_SKIP) {
> >>>>  		kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.decode.eip);
> >>>> +		if (singlestep)
> >>>> +			queue_singlestep(vcpu);
> >>> Instruction that wasn't emulated shouldn't generate faults.
> >>>
> >> Skipping here doesn't mean it's not emulated. A valid question might be
> >> if we should catch it here or in skip_emulated_instruction.
> > In skip_emulated_instruction() or even above that. Only at the point we
> > are sure we actually emulate instruction.
> > 
> >>>>  		return EMULATE_DONE;
> >>>>  	}
> >>>>  
> >>>> @@ -3549,6 +3575,9 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
> >>>>  
> >>>>  	kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
> >>>>  
> >>>> +	if (singlestep)
> >>>> +		queue_singlestep(vcpu);
> >>>> +
> >>> if vcpu->mmio_is_write == true we can still exit with DO_MMIO, so
> >>> instruction is not yes completely executed. This queue_singlestep
> >>> mechanism looks bogus anyway. emulate_instruction() caller should
> >>> initiate exit to userspace space if required.
> >> That's while it is _queued_, not immediately delivered: MMIO exits will
> >> continue to take precedence.
> > That is bogus. We should queue TF only after instruction is completely
This should have been "We should queue #DB ..".

> > emulated, not during emulation. Instruction may generate dozen MMIO
> > exits and eventually be aborted by exception, so DB shouldn't be generated
> > at all in this case.
> 
> How to detect this? This potential looping over user space caused sever
> headache will designing the TF feature. I thought I got all cases.
> 
How to detect that emulation is complete? emulate_instruction() should
return EMULATE_DONE in this case.

> The current model is: if we first execute a different instruction than
> the one that singlestep targets at, singlestep should be cleared without
> having any effect. Re-executing the stepped instruction after a
> potential exception resolution should then reassert singlestep and
> requeue it properly.
> 
> > 
> >>>>  	if (vcpu->mmio_is_write) {
> >>>>  		vcpu->mmio_needed = 0;
> >>>>  		return EMULATE_DO_MMIO;
> >>>> @@ -4450,6 +4479,26 @@ out:
> >>>>  	return r;
> >>>>  }
> >>>>  
> >>>> +int kvm_check_guest_singlestep(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> +	unsigned long rip = kvm_rip_read(vcpu);
> >>>> +
> >>>> +	vcpu->arch.singlestep_pending = false;
> >>>> +
> >>>> +	if (vcpu->arch.singlestep_pending_cs !=
> >>>> +		get_segment_selector(vcpu, VCPU_SREG_CS) ||
> >>>> +	    vcpu->arch.singlestep_pending_rip != rip)
> >>>> +		return 1;
> >>>> +
> >>> Again how this check can be false?
> >> E.g. someone fiddled with the VCPU state, resetting the guest.
> > How is this someone? It should reset this state too then.
> 
> If somehow possible, I do not want to make this state part of the user
> visible VCPU state but rather translate it into a pending #DB + set
> DR6.BS (for guest-owned TF) or drop it (for host-owned - not critical here).
As far as I can see for guest owned TF you always inject #DB immediately
in queue_singlestep().  And I think this should always be the case. At
the end of instruction emulator should inject #DB. When we enter vcpu
 or emulator again  if #DB interception is enables and #DB is pending
intercept function should be called.

> 
> > 
> >>>> +	vcpu->run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1;
> >>>> +	vcpu->run->debug.arch.dr7 = 0;
> >>>> +	vcpu->run->exit_reason = KVM_EXIT_DEBUG;
> >>>> +	vcpu->run->debug.arch.pc = get_segment_base(vcpu, VCPU_SREG_CS) + rip;
> >>>> +	vcpu->run->debug.arch.exception = DB_VECTOR;
> >>>> +
> >>>> +	return 0;
> >>>> +}
> >>>> +EXPORT_SYMBOL_GPL(kvm_check_guest_singlestep);
> >>>>  
> >>>>  static int __vcpu_run(struct kvm_vcpu *vcpu)
> >>>>  {
> >>>> @@ -4471,6 +4520,12 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
> >>>>  
> >>>>  	r = 1;
> >>>>  	while (r > 0) {
> >>>> +		if (unlikely(vcpu->arch.singlestep_pending)) {
> >>>> +			r = kvm_check_guest_singlestep(vcpu);
> >>>> +			if (r == 0)
> >>>> +				break;
> >>>> +		}
> >>>> +
> >>> Why not use existing mechanism to cause run loop to exit to userspace
> >>> i.e return 0 from vcpu_enter_guest(), instead of adding special cases here?
> >>>
> >> I wanted to exit in case of vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE
> >> as well, but thinking about this again it's actually more reasonable to
> >> exit once the VCPU unblocks again, e.g. once halt resumes.
> >>
> > We will exit immediately after halt if return value is 0, why would you
> > want to exit on vcpu entry after that?
> 
> As I said: I will move this into vcpu_enter_guest.
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

--
			Gleb.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] KVM: x86: Emulator support for TF
  2010-02-23 11:00           ` Gleb Natapov
@ 2010-02-23 11:04             ` Avi Kivity
  2010-02-23 11:30             ` Jan Kiszka
  1 sibling, 0 replies; 29+ messages in thread
From: Avi Kivity @ 2010-02-23 11:04 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Jan Kiszka, Marcelo Tosatti, kvm

On 02/23/2010 01:00 PM, Gleb Natapov wrote:
>>>
>>> They will not trigger before that. They will be queued for the next
>>> entry and queuing another one will either overwrite the previous one,
>>> or will queue double fault (depending on what what the first exception).
>>>        
>> The will not stack as the instruction failed, thus no singlestep will be
>> queued as well.
>>      
> Instruction failed doesn't mean emulation failed, so lets see what
> happens when you single step over instruction that generates page fault.
> #PF is queued and x86_emulate_insn() returns 0 to emulate_instruction()
> no you call queue_singlestep() which calls kvm_queue_exception(vcpu, DB_VECTOR);
> and this cause #DF to be injected.
>    

This looks like a test case btw.

We need to test, at least:

- unemulated single step
- emulated single step successful insn
- emulated single step faulting insn

and probably more.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] KVM: x86: Emulator support for TF
  2010-02-23 11:00           ` Gleb Natapov
  2010-02-23 11:04             ` Avi Kivity
@ 2010-02-23 11:30             ` Jan Kiszka
  2010-02-23 11:41               ` Avi Kivity
  2010-02-23 12:02               ` Gleb Natapov
  1 sibling, 2 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-23 11:30 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Avi Kivity, Marcelo Tosatti, kvm

Gleb Natapov wrote:
> On Tue, Feb 23, 2010 at 11:37:21AM +0100, Jan Kiszka wrote:
>> Gleb Natapov wrote:
>>> On Tue, Feb 23, 2010 at 11:10:57AM +0100, Jan Kiszka wrote:
>>>> Gleb Natapov wrote:
>>>>> On Mon, Feb 22, 2010 at 06:51:23PM +0100, Jan Kiszka wrote:
>>>>>> Support both guest- as well as host-owned EFLAGS.TF while emulating
>>>>>> instructions. For guest-owned TF, we simply inject DB and update DR6.BS
>>>>>> after completing an instruction that has TF set on entry. To support
>>>>>> guest single-stepping under host control, we store the pending step
>>>>>> along with its CS and RIP and trigger a corresponding user space exit
>>>>>> once guest execution is about to resume. This check is is also required
>>>>>> in the VMX emulation loop during invalid guest states.
>>>>>>
>>>>> Emulator currently is a total mess. It is not a good time to add more mess
>>>>> there right now IMO.
>>>> Then let's clean up what you consider "mess" in this feature. Unless
>>>> there are plans to clean up the emulator for the next or next-but-one
>>>> kernel release, I do not want to wait for this.
>>>>
>>> There are plans to cleanup the emulator.
>> When?
> ASAP :) I am looking into that, but it will not be easy.

Ok, so you are targeting 2.6.35? Then I'm fine to wait for this, keeping
the patch for local use so far.

But we should then merge patch 5 as a workaround so that guest debugging
is at least not completely broken when stepping over emulated instructions.

> 
>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>> ---
>>>>>>  arch/x86/include/asm/kvm_host.h |    5 +++
>>>>>>  arch/x86/kvm/vmx.c              |    6 +++
>>>>>>  arch/x86/kvm/x86.c              |   65 ++++++++++++++++++++++++++++++++++-----
>>>>>>  3 files changed, 68 insertions(+), 8 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>>>> index d46e791..d69d8aa 100644
>>>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>>>> @@ -362,8 +362,11 @@ struct kvm_vcpu_arch {
>>>>>>  	u64 *mce_banks;
>>>>>>  
>>>>>>  	/* used for guest single stepping over the given code position */
>>>>>> +	bool singlestep_pending;
>>>>>>  	u16 singlestep_cs;
>>>>>> +	u16 singlestep_pending_cs;
>>>>>>  	unsigned long singlestep_rip;
>>>>>> +	unsigned long singlestep_pending_rip;
>>>>> If we are going to have many of those rip/cs pairs may be it is better
>>>>> to add structure linear_ip and have functions is_same_ip().
>>>> Agreed.
>>>>
>>>>>  
>>>>>>  	/* fields used by HYPER-V emulation */
>>>>>>  	u64 hv_vapic;
>>>>>>  };
>>>>>> @@ -820,4 +823,6 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
>>>>>>  void kvm_define_shared_msr(unsigned index, u32 msr);
>>>>>>  void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
>>>>>>  
>>>>>> +int kvm_check_guest_singlestep(struct kvm_vcpu *vcpu);
>>>>>> +
>>>>>>  #endif /* _ASM_X86_KVM_HOST_H */
>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>>> index d772476..317828f 100644
>>>>>> --- a/arch/x86/kvm/vmx.c
>>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>>> @@ -3489,6 +3489,12 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>>>>>>  			goto out;
>>>>>>  		if (need_resched())
>>>>>>  			schedule();
>>>>>> +
>>>>>> +		if (unlikely(vcpu->arch.singlestep_pending)) {
>>>>>> +			ret = kvm_check_guest_singlestep(vcpu);
>>>>>> +			if (ret == 0)
>>>>>> +				goto out;
>>>>>> +		}
>>>>>>  	}
>>>>>>  
>>>>>>  	vmx->emulation_required = 0;
>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>>> index 19e8b28..6ebebb9 100644
>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>> @@ -3441,6 +3441,27 @@ static void cache_all_regs(struct kvm_vcpu *vcpu)
>>>>>>  	vcpu->arch.regs_dirty = ~0;
>>>>>>  }
>>>>>>  
>>>>>> +static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg)
>>>>>> +{
>>>>>> +	struct kvm_segment kvm_seg;
>>>>>> +
>>>>>> +	kvm_get_segment(vcpu, &kvm_seg, seg);
>>>>>> +	return kvm_seg.selector;
>>>>>> +}
>>>>>> +
>>>>>> +static void queue_singlestep(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
>>>>>> +		vcpu->arch.singlestep_pending = true;
>>>>>> +		vcpu->arch.singlestep_pending_cs =
>>>>>> +			get_segment_selector(vcpu, VCPU_SREG_CS);
>>>>>> +		vcpu->arch.singlestep_pending_rip = kvm_rip_read(vcpu);
>>>>> Why should we remember rip where TF happened? We should exit
>>>>> immediately to userspace anyway, no?
>>>> I think MMIO exits takes precedence, so this is intended to exit after
>>>> they completed, ie. after the instruction is fully finished.
>>>>
>>>>>> +	} else {
>>>>>> +		vcpu->arch.dr6 |= DR6_BS;
>>>>>> +		kvm_queue_exception(vcpu, DB_VECTOR);
>>>>> What if instruction emulation generated fault?
>>>> Fault-like exceptions will trigger before that, and the instruction
>>>> won't complete. Do we have any trap-like exceptions to worry about?
>>>>
>>> They will not trigger before that. They will be queued for the next
>>> entry and queuing another one will either overwrite the previous one,
>>> or will queue double fault (depending on what what the first exception).
>> The will not stack as the instruction failed, thus no singlestep will be
>> queued as well.
> Instruction failed doesn't mean emulation failed, so lets see what
> happens when you single step over instruction that generates page fault.
> #PF is queued and x86_emulate_insn() returns 0 to emulate_instruction()
> no you call queue_singlestep() which calls kvm_queue_exception(vcpu, DB_VECTOR);
> and this cause #DF to be injected.

Right, this doesn't work yet for guest-owned TF. So we also need to
check if RIP progressed before queuing #DB.

>>>>>> +	}
>>>>>> +}
>>>>>> +
>>>>>>  int emulate_instruction(struct kvm_vcpu *vcpu,
>>>>>>  			unsigned long cr2,
>>>>>>  			u16 error_code,
>>>>>> @@ -3449,6 +3470,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>>>>>>  	int r, shadow_mask;
>>>>>>  	struct decode_cache *c;
>>>>>>  	struct kvm_run *run = vcpu->run;
>>>>>> +	bool singlestep;
>>>>>>  
>>>>>>  	kvm_clear_exception_queue(vcpu);
>>>>>>  	vcpu->arch.mmio_fault_cr2 = cr2;
>>>>>> @@ -3515,8 +3537,12 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>>>>>>  		}
>>>>>>  	}
>>>>>>  
>>>>>> +	singlestep = vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_TF;
>>>>>> +
>>>>>>  	if (emulation_type & EMULTYPE_SKIP) {
>>>>>>  		kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.decode.eip);
>>>>>> +		if (singlestep)
>>>>>> +			queue_singlestep(vcpu);
>>>>> Instruction that wasn't emulated shouldn't generate faults.
>>>>>
>>>> Skipping here doesn't mean it's not emulated. A valid question might be
>>>> if we should catch it here or in skip_emulated_instruction.
>>> In skip_emulated_instruction() or even above that. Only at the point we
>>> are sure we actually emulate instruction.
>>>
>>>>>>  		return EMULATE_DONE;
>>>>>>  	}
>>>>>>  
>>>>>> @@ -3549,6 +3575,9 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
>>>>>>  
>>>>>>  	kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
>>>>>>  
>>>>>> +	if (singlestep)
>>>>>> +		queue_singlestep(vcpu);
>>>>>> +
>>>>> if vcpu->mmio_is_write == true we can still exit with DO_MMIO, so
>>>>> instruction is not yes completely executed. This queue_singlestep
>>>>> mechanism looks bogus anyway. emulate_instruction() caller should
>>>>> initiate exit to userspace space if required.
>>>> That's while it is _queued_, not immediately delivered: MMIO exits will
>>>> continue to take precedence.
>>> That is bogus. We should queue TF only after instruction is completely
> This should have been "We should queue #DB ..".
> 
>>> emulated, not during emulation. Instruction may generate dozen MMIO
>>> exits and eventually be aborted by exception, so DB shouldn't be generated
>>> at all in this case.
>> How to detect this? This potential looping over user space caused sever
>> headache will designing the TF feature. I thought I got all cases.
>>
> How to detect that emulation is complete? emulate_instruction() should
> return EMULATE_DONE in this case.

...*and* RIP moved forward.

> 
>> The current model is: if we first execute a different instruction than
>> the one that singlestep targets at, singlestep should be cleared without
>> having any effect. Re-executing the stepped instruction after a
>> potential exception resolution should then reassert singlestep and
>> requeue it properly.
>>
>>>>>>  	if (vcpu->mmio_is_write) {
>>>>>>  		vcpu->mmio_needed = 0;
>>>>>>  		return EMULATE_DO_MMIO;
>>>>>> @@ -4450,6 +4479,26 @@ out:
>>>>>>  	return r;
>>>>>>  }
>>>>>>  
>>>>>> +int kvm_check_guest_singlestep(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +	unsigned long rip = kvm_rip_read(vcpu);
>>>>>> +
>>>>>> +	vcpu->arch.singlestep_pending = false;
>>>>>> +
>>>>>> +	if (vcpu->arch.singlestep_pending_cs !=
>>>>>> +		get_segment_selector(vcpu, VCPU_SREG_CS) ||
>>>>>> +	    vcpu->arch.singlestep_pending_rip != rip)
>>>>>> +		return 1;
>>>>>> +
>>>>> Again how this check can be false?
>>>> E.g. someone fiddled with the VCPU state, resetting the guest.
>>> How is this someone? It should reset this state too then.
>> If somehow possible, I do not want to make this state part of the user
>> visible VCPU state but rather translate it into a pending #DB + set
>> DR6.BS (for guest-owned TF) or drop it (for host-owned - not critical here).
> As far as I can see for guest owned TF you always inject #DB immediately
> in queue_singlestep().  And I think this should always be the case. At
> the end of instruction emulator should inject #DB. When we enter vcpu
>  or emulator again  if #DB interception is enables and #DB is pending
> intercept function should be called.

I'm trying to avoid the need for this by handling the reason for #DB
interception separately. We have to anyway as we need to emulate the
effects like DR6 updates etc. differently (#DB interceptions come with
vendor-specific state changes / exit codes).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] KVM: x86: Emulator support for TF
  2010-02-23 11:30             ` Jan Kiszka
@ 2010-02-23 11:41               ` Avi Kivity
  2010-02-23 12:03                 ` Jan Kiszka
  2010-02-23 12:05                 ` Gleb Natapov
  2010-02-23 12:02               ` Gleb Natapov
  1 sibling, 2 replies; 29+ messages in thread
From: Avi Kivity @ 2010-02-23 11:41 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Gleb Natapov, Marcelo Tosatti, kvm

On 02/23/2010 01:30 PM, Jan Kiszka wrote:
>>
>> How to detect that emulation is complete? emulate_instruction() should
>> return EMULATE_DONE in this case.
>>      
> ...*and* RIP moved forward.
>    

A branch or rep instruction can successfully execute and not change rip.

Btw, do we expect a #DB on every iteration of rep?  In this case we need 
to modify the code, currently we'll batch rep;ins and rep;outs up to a 
page's worth.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] KVM: x86: Emulator support for TF
  2010-02-23 11:41               ` Avi Kivity
@ 2010-02-23 12:03                 ` Jan Kiszka
  2010-02-23 12:05                 ` Gleb Natapov
  1 sibling, 0 replies; 29+ messages in thread
From: Jan Kiszka @ 2010-02-23 12:03 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gleb Natapov, Marcelo Tosatti, kvm

Avi Kivity wrote:
> On 02/23/2010 01:30 PM, Jan Kiszka wrote:
>>> How to detect that emulation is complete? emulate_instruction() should
>>> return EMULATE_DONE in this case.
>>>      
>> ...*and* RIP moved forward.
>>    
> 
> A branch or rep instruction can successfully execute and not change rip.
> 
> Btw, do we expect a #DB on every iteration of rep?  In this case we need 
> to modify the code, currently we'll batch rep;ins and rep;outs up to a 
> page's worth.

Right, TF triggers after each rep step. So we need to break this up when
tracing is on. That's now definitely something we need a deeper emulator
change.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] KVM: x86: Emulator support for TF
  2010-02-23 11:41               ` Avi Kivity
  2010-02-23 12:03                 ` Jan Kiszka
@ 2010-02-23 12:05                 ` Gleb Natapov
  1 sibling, 0 replies; 29+ messages in thread
From: Gleb Natapov @ 2010-02-23 12:05 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Jan Kiszka, Marcelo Tosatti, kvm

On Tue, Feb 23, 2010 at 01:41:44PM +0200, Avi Kivity wrote:
> On 02/23/2010 01:30 PM, Jan Kiszka wrote:
> >>
> >>How to detect that emulation is complete? emulate_instruction() should
> >>return EMULATE_DONE in this case.
> >...*and* RIP moved forward.
> 
> A branch or rep instruction can successfully execute and not change rip.
> 
Jmp instruction can successfully execute and not change rip :) Also
I think exception generation should be considered as part of instruction
execution and sometimes even intended effect (ud instructions for
instance).

> Btw, do we expect a #DB on every iteration of rep?  In this case we
> need to modify the code, currently we'll batch rep;ins and rep;outs
> up to a page's worth.
> 
> -- 
> error compiling committee.c: too many arguments to function

--
			Gleb.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] KVM: x86: Emulator support for TF
  2010-02-23 11:30             ` Jan Kiszka
  2010-02-23 11:41               ` Avi Kivity
@ 2010-02-23 12:02               ` Gleb Natapov
  1 sibling, 0 replies; 29+ messages in thread
From: Gleb Natapov @ 2010-02-23 12:02 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Tue, Feb 23, 2010 at 12:30:27PM +0100, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Tue, Feb 23, 2010 at 11:37:21AM +0100, Jan Kiszka wrote:
> >> Gleb Natapov wrote:
> >>> On Tue, Feb 23, 2010 at 11:10:57AM +0100, Jan Kiszka wrote:
> >>>> Gleb Natapov wrote:
> >>>>> On Mon, Feb 22, 2010 at 06:51:23PM +0100, Jan Kiszka wrote:
> >>>>>> Support both guest- as well as host-owned EFLAGS.TF while emulating
> >>>>>> instructions. For guest-owned TF, we simply inject DB and update DR6.BS
> >>>>>> after completing an instruction that has TF set on entry. To support
> >>>>>> guest single-stepping under host control, we store the pending step
> >>>>>> along with its CS and RIP and trigger a corresponding user space exit
> >>>>>> once guest execution is about to resume. This check is is also required
> >>>>>> in the VMX emulation loop during invalid guest states.
> >>>>>>
> >>>>> Emulator currently is a total mess. It is not a good time to add more mess
> >>>>> there right now IMO.
> >>>> Then let's clean up what you consider "mess" in this feature. Unless
> >>>> there are plans to clean up the emulator for the next or next-but-one
> >>>> kernel release, I do not want to wait for this.
> >>>>
> >>> There are plans to cleanup the emulator.
> >> When?
> > ASAP :) I am looking into that, but it will not be easy.
> 
> Ok, so you are targeting 2.6.35? Then I'm fine to wait for this, keeping
> the patch for local use so far.
> 
As I said ASAP :) I can't as of yet tell how much time it will take.
Right now I am trying to revive emulator tests that we have, since
touching this code without test cases is madness.

> But we should then merge patch 5 as a workaround so that guest debugging
> is at least not completely broken when stepping over emulated instructions.
Yes, patch 5 is fine.

--
			Gleb.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2010-02-23 12:05 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-02-22 17:51 [PATCH 0/6] KVM: Enhancements and fixes around guest debugging Jan Kiszka
2010-02-22 17:51 ` [PATCH 1/6] KVM: VMX: Update instruction length on intercepted BP Jan Kiszka
2010-02-22 17:51 ` [PATCH 2/6] KVM: SVM: Emulate nRIP feature when reinjecting INT3 Jan Kiszka
2010-02-23 10:13   ` Gleb Natapov
2010-02-23 10:17     ` Jan Kiszka
2010-02-23 10:23       ` Avi Kivity
2010-02-22 17:51 ` [PATCH 3/6] KVM: x86: Add KVM_CAP_X86_ROBUST_SINGLESTEP Jan Kiszka
2010-02-22 17:51 ` [PATCH 4/6] KVM: x86: Drop RF manipulation for guest single-stepping Jan Kiszka
2010-02-22 17:51 ` [PATCH 5/6] KVM: x86: Preserve injected TF across emulation Jan Kiszka
2010-02-23 10:00   ` Gleb Natapov
2010-02-23 10:13     ` Jan Kiszka
2010-02-23 10:31       ` Gleb Natapov
2010-02-23 10:40         ` Jan Kiszka
2010-02-23 11:03           ` Gleb Natapov
2010-02-22 17:51 ` [PATCH 6/6] KVM: x86: Emulator support for TF Jan Kiszka
2010-02-23  9:55   ` Gleb Natapov
2010-02-23 10:10     ` Jan Kiszka
2010-02-23 10:26       ` Gleb Natapov
2010-02-23 10:29         ` Avi Kivity
2010-02-23 10:32           ` Gleb Natapov
2010-02-23 10:34             ` Avi Kivity
2010-02-23 10:37         ` Jan Kiszka
2010-02-23 11:00           ` Gleb Natapov
2010-02-23 11:04             ` Avi Kivity
2010-02-23 11:30             ` Jan Kiszka
2010-02-23 11:41               ` Avi Kivity
2010-02-23 12:03                 ` Jan Kiszka
2010-02-23 12:05                 ` Gleb Natapov
2010-02-23 12:02               ` Gleb Natapov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox