linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] KVM: x86: restore IP after all far jump failures
@ 2016-11-22 19:21 Radim Krčmář
  2016-11-22 19:34 ` Paolo Bonzini
  2016-11-22 20:03 ` Jim Mattson
  0 siblings, 2 replies; 8+ messages in thread
From: Radim Krčmář @ 2016-11-22 19:21 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, stable, Nadav Amit, Dmitry Vyukov,
	Steve Rutherford, Andrew Honig, Prasad Pandit

em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
bit mode, but syzkaller proved otherwise (and SDM agrees).
Code segment was restored upon failure, but it was left uninitialized
outside of long mode, which could lead to a leak of host kernel stack.

Found by syzkaller:

  WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
  Kernel panic - not syncing: panic_on_warn set ...

  CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
   [...]
  Call Trace:
   [...] __dump_stack lib/dump_stack.c:15
   [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
   [...] panic+0x1b7/0x3a3 kernel/panic.c:179
   [...] __warn+0x1c4/0x1e0 kernel/panic.c:542
   [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
   [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
   [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
   [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
   [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
   [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
   [...] complete_emulated_io arch/x86/kvm/x86.c:6870
   [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
   [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
   [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
   [...] vfs_ioctl fs/ioctl.c:43
   [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
   [...] SYSC_ioctl fs/ioctl.c:694
   [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
   [...] entry_SYSCALL_64_fastpath+0x1f/0xc2

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Fixes: d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far jumps")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Steve Rutherford <srutherford@google.com>
Cc: Andrew Honig <ahonig@google.com>
Cc: Prasad Pandit <ppandit@redhat.com>
---
 arch/x86/kvm/emulate.c | 20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7d4f9b7f06ee..d8a812b6204d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2137,10 +2137,7 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
 	const struct x86_emulate_ops *ops = ctxt->ops;
 	u8 cpl = ctxt->ops->cpl(ctxt);
 
-	/* Assignment of RIP may only fail in 64-bit mode */
-	if (ctxt->mode == X86EMUL_MODE_PROT64)
-		ops->get_segment(ctxt, &old_sel, &old_desc, NULL,
-				 VCPU_SREG_CS);
+	ops->get_segment(ctxt, &old_sel, &old_desc, NULL, VCPU_SREG_CS);
 
 	memcpy(&sel, ctxt->src.valptr + ctxt->op_bytes, 2);
 
@@ -2151,12 +2148,10 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
 		return rc;
 
 	rc = assign_eip_far(ctxt, ctxt->src.val, &new_desc);
-	if (rc != X86EMUL_CONTINUE) {
-		WARN_ON(ctxt->mode != X86EMUL_MODE_PROT64);
+	if (rc != X86EMUL_CONTINUE)
 		/* assigning eip failed; restore the old cs */
 		ops->set_segment(ctxt, old_sel, &old_desc, 0, VCPU_SREG_CS);
-		return rc;
-	}
+
 	return rc;
 }
 
@@ -2221,9 +2216,7 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
 	struct desc_struct old_desc, new_desc;
 	const struct x86_emulate_ops *ops = ctxt->ops;
 
-	if (ctxt->mode == X86EMUL_MODE_PROT64)
-		ops->get_segment(ctxt, &old_cs, &old_desc, NULL,
-				 VCPU_SREG_CS);
+	ops->get_segment(ctxt, &old_cs, &old_desc, NULL, VCPU_SREG_CS);
 
 	rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
 	if (rc != X86EMUL_CONTINUE)
@@ -2240,10 +2233,9 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
 	rc = assign_eip_far(ctxt, eip, &new_desc);
-	if (rc != X86EMUL_CONTINUE) {
-		WARN_ON(ctxt->mode != X86EMUL_MODE_PROT64);
+	if (rc != X86EMUL_CONTINUE)
 		ops->set_segment(ctxt, old_cs, &old_desc, 0, VCPU_SREG_CS);
-	}
+
 	return rc;
 }
 
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: x86: restore IP after all far jump failures
  2016-11-22 19:21 [PATCH] KVM: x86: restore IP after all far jump failures Radim Krčmář
@ 2016-11-22 19:34 ` Paolo Bonzini
  2016-11-22 19:44   ` Nadav Amit
       [not found]   ` <F346D675-FC84-4595-BC1D-C1049F7D6B7E@gmail.com>
  2016-11-22 20:03 ` Jim Mattson
  1 sibling, 2 replies; 8+ messages in thread
From: Paolo Bonzini @ 2016-11-22 19:34 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: stable, Nadav Amit, Dmitry Vyukov, Steve Rutherford, Andrew Honig,
	Prasad Pandit



On 22/11/2016 20:21, Radim Krčmář wrote:
> em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
> bit mode, but syzkaller proved otherwise (and SDM agrees).
> Code segment was restored upon failure, but it was left uninitialized
> outside of long mode, which could lead to a leak of host kernel stack.
> 
> Found by syzkaller:
> 
>   WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
>   Kernel panic - not syncing: panic_on_warn set ...
> 
>   CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>    [...]
>   Call Trace:
>    [...] __dump_stack lib/dump_stack.c:15
>    [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
>    [...] panic+0x1b7/0x3a3 kernel/panic.c:179
>    [...] __warn+0x1c4/0x1e0 kernel/panic.c:542
>    [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
>    [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
>    [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
>    [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
>    [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
>    [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
>    [...] complete_emulated_io arch/x86/kvm/x86.c:6870
>    [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
>    [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
>    [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
>    [...] vfs_ioctl fs/ioctl.c:43
>    [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
>    [...] SYSC_ioctl fs/ioctl.c:694
>    [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
>    [...] entry_SYSCALL_64_fastpath+0x1f/0xc2
> 
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Cc: stable@vger.kernel.org
> Fixes: d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far jumps")
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
> Cc: Nadav Amit <nadav.amit@gmail.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Steve Rutherford <srutherford@google.com>
> Cc: Andrew Honig <ahonig@google.com>
> Cc: Prasad Pandit <ppandit@redhat.com>
> ---
>  arch/x86/kvm/emulate.c | 20 ++++++--------------
>  1 file changed, 6 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 7d4f9b7f06ee..d8a812b6204d 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2137,10 +2137,7 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
>  	const struct x86_emulate_ops *ops = ctxt->ops;
>  	u8 cpl = ctxt->ops->cpl(ctxt);
>  
> -	/* Assignment of RIP may only fail in 64-bit mode */
> -	if (ctxt->mode == X86EMUL_MODE_PROT64)
> -		ops->get_segment(ctxt, &old_sel, &old_desc, NULL,
> -				 VCPU_SREG_CS);
> +	ops->get_segment(ctxt, &old_sel, &old_desc, NULL, VCPU_SREG_CS);
>  
>  	memcpy(&sel, ctxt->src.valptr + ctxt->op_bytes, 2);
>  
> @@ -2151,12 +2148,10 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
>  		return rc;
>  
>  	rc = assign_eip_far(ctxt, ctxt->src.val, &new_desc);
> -	if (rc != X86EMUL_CONTINUE) {
> -		WARN_ON(ctxt->mode != X86EMUL_MODE_PROT64);
> +	if (rc != X86EMUL_CONTINUE)
>  		/* assigning eip failed; restore the old cs */
>  		ops->set_segment(ctxt, old_sel, &old_desc, 0, VCPU_SREG_CS);
> -		return rc;
> -	}
> +
>  	return rc;
>  }
>  
> @@ -2221,9 +2216,7 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
>  	struct desc_struct old_desc, new_desc;
>  	const struct x86_emulate_ops *ops = ctxt->ops;
>  
> -	if (ctxt->mode == X86EMUL_MODE_PROT64)
> -		ops->get_segment(ctxt, &old_cs, &old_desc, NULL,
> -				 VCPU_SREG_CS);
> +	ops->get_segment(ctxt, &old_cs, &old_desc, NULL, VCPU_SREG_CS);
>  
>  	rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
>  	if (rc != X86EMUL_CONTINUE)
> @@ -2240,10 +2233,9 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
>  	if (rc != X86EMUL_CONTINUE)
>  		return rc;
>  	rc = assign_eip_far(ctxt, eip, &new_desc);
> -	if (rc != X86EMUL_CONTINUE) {
> -		WARN_ON(ctxt->mode != X86EMUL_MODE_PROT64);
> +	if (rc != X86EMUL_CONTINUE)
>  		ops->set_segment(ctxt, old_cs, &old_desc, 0, VCPU_SREG_CS);
> -	}
> +
>  	return rc;
>  }
>  
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: x86: restore IP after all far jump failures
  2016-11-22 19:34 ` Paolo Bonzini
@ 2016-11-22 19:44   ` Nadav Amit
       [not found]   ` <F346D675-FC84-4595-BC1D-C1049F7D6B7E@gmail.com>
  1 sibling, 0 replies; 8+ messages in thread
From: Nadav Amit @ 2016-11-22 19:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Radim Krčmář, linux-kernel, kvm, stable,
	Dmitry Vyukov, Steve Rutherford, Andrew Honig, Prasad Pandit

I admit my wrongdoings, but I still think the fix should have been to
remove the entire recovery logic and just return X86EMUL_UNHANDLEABLE if
something goes wrong (exception). This will kill the misbehaving process
but keep the VM running.

Otherwise, a malicious VM process, which can somehow control descriptors
(LDT?) may modify the descriptor during the emulation and get the system
to inconsistent state and prevent the VM-entry.

No?

Nadav

> On Nov 22, 2016, at 11:34 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> 
> 
> On 22/11/2016 20:21, Radim Krčmář wrote:
>> em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
>> bit mode, but syzkaller proved otherwise (and SDM agrees).
>> Code segment was restored upon failure, but it was left uninitialized
>> outside of long mode, which could lead to a leak of host kernel stack.
>> 
>> Found by syzkaller:
>> 
>>  WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
>>  Kernel panic - not syncing: panic_on_warn set ...
>> 
>>  CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
>>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>   [...]
>>  Call Trace:
>>   [...] __dump_stack lib/dump_stack.c:15
>>   [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
>>   [...] panic+0x1b7/0x3a3 kernel/panic.c:179
>>   [...] __warn+0x1c4/0x1e0 kernel/panic.c:542
>>   [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
>>   [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
>>   [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
>>   [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
>>   [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
>>   [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
>>   [...] complete_emulated_io arch/x86/kvm/x86.c:6870
>>   [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
>>   [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
>>   [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
>>   [...] vfs_ioctl fs/ioctl.c:43
>>   [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
>>   [...] SYSC_ioctl fs/ioctl.c:694
>>   [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
>>   [...] entry_SYSCALL_64_fastpath+0x1f/0xc2
>> 
>> Reported-by: Dmitry Vyukov <dvyukov@google.com>
>> Cc: stable@vger.kernel.org
>> Fixes: d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far jumps")
>> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
>> ---
>> Cc: Nadav Amit <nadav.amit@gmail.com>
>> Cc: Dmitry Vyukov <dvyukov@google.com>
>> Cc: Steve Rutherford <srutherford@google.com>
>> Cc: Andrew Honig <ahonig@google.com>
>> Cc: Prasad Pandit <ppandit@redhat.com>
>> ---
>> arch/x86/kvm/emulate.c | 20 ++++++--------------
>> 1 file changed, 6 insertions(+), 14 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index 7d4f9b7f06ee..d8a812b6204d 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -2137,10 +2137,7 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
>> 	const struct x86_emulate_ops *ops = ctxt->ops;
>> 	u8 cpl = ctxt->ops->cpl(ctxt);
>> 
>> -	/* Assignment of RIP may only fail in 64-bit mode */
>> -	if (ctxt->mode == X86EMUL_MODE_PROT64)
>> -		ops->get_segment(ctxt, &old_sel, &old_desc, NULL,
>> -				 VCPU_SREG_CS);
>> +	ops->get_segment(ctxt, &old_sel, &old_desc, NULL, VCPU_SREG_CS);
>> 
>> 	memcpy(&sel, ctxt->src.valptr + ctxt->op_bytes, 2);
>> 
>> @@ -2151,12 +2148,10 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
>> 		return rc;
>> 
>> 	rc = assign_eip_far(ctxt, ctxt->src.val, &new_desc);
>> -	if (rc != X86EMUL_CONTINUE) {
>> -		WARN_ON(ctxt->mode != X86EMUL_MODE_PROT64);
>> +	if (rc != X86EMUL_CONTINUE)
>> 		/* assigning eip failed; restore the old cs */
>> 		ops->set_segment(ctxt, old_sel, &old_desc, 0, VCPU_SREG_CS);
>> -		return rc;
>> -	}
>> +
>> 	return rc;
>> }
>> 
>> @@ -2221,9 +2216,7 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
>> 	struct desc_struct old_desc, new_desc;
>> 	const struct x86_emulate_ops *ops = ctxt->ops;
>> 
>> -	if (ctxt->mode == X86EMUL_MODE_PROT64)
>> -		ops->get_segment(ctxt, &old_cs, &old_desc, NULL,
>> -				 VCPU_SREG_CS);
>> +	ops->get_segment(ctxt, &old_cs, &old_desc, NULL, VCPU_SREG_CS);
>> 
>> 	rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
>> 	if (rc != X86EMUL_CONTINUE)
>> @@ -2240,10 +2233,9 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
>> 	if (rc != X86EMUL_CONTINUE)
>> 		return rc;
>> 	rc = assign_eip_far(ctxt, eip, &new_desc);
>> -	if (rc != X86EMUL_CONTINUE) {
>> -		WARN_ON(ctxt->mode != X86EMUL_MODE_PROT64);
>> +	if (rc != X86EMUL_CONTINUE)
>> 		ops->set_segment(ctxt, old_cs, &old_desc, 0, VCPU_SREG_CS);
>> -	}
>> +
>> 	return rc;
>> }
>> 
>> 
> 
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: x86: restore IP after all far jump failures
  2016-11-22 19:21 [PATCH] KVM: x86: restore IP after all far jump failures Radim Krčmář
  2016-11-22 19:34 ` Paolo Bonzini
@ 2016-11-22 20:03 ` Jim Mattson
  2016-11-22 20:58   ` Radim Krčmář
  1 sibling, 1 reply; 8+ messages in thread
From: Jim Mattson @ 2016-11-22 20:03 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, kvm, Paolo Bonzini, stable, Nadav Amit,
	Dmitry Vyukov, Steve Rutherford, Andrew Honig, Prasad Pandit

Should the subject read: "KVM: x86: restore CS after all far jump failures"?

On Tue, Nov 22, 2016 at 11:21 AM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
> bit mode, but syzkaller proved otherwise (and SDM agrees).
> Code segment was restored upon failure, but it was left uninitialized
> outside of long mode, which could lead to a leak of host kernel stack.
>
> Found by syzkaller:
>
>   WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
>   Kernel panic - not syncing: panic_on_warn set ...
>
>   CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>    [...]
>   Call Trace:
>    [...] __dump_stack lib/dump_stack.c:15
>    [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
>    [...] panic+0x1b7/0x3a3 kernel/panic.c:179
>    [...] __warn+0x1c4/0x1e0 kernel/panic.c:542
>    [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
>    [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
>    [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
>    [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
>    [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
>    [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
>    [...] complete_emulated_io arch/x86/kvm/x86.c:6870
>    [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
>    [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
>    [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
>    [...] vfs_ioctl fs/ioctl.c:43
>    [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
>    [...] SYSC_ioctl fs/ioctl.c:694
>    [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
>    [...] entry_SYSCALL_64_fastpath+0x1f/0xc2
>
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Cc: stable@vger.kernel.org
> Fixes: d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far jumps")
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
> Cc: Nadav Amit <nadav.amit@gmail.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Steve Rutherford <srutherford@google.com>
> Cc: Andrew Honig <ahonig@google.com>
> Cc: Prasad Pandit <ppandit@redhat.com>
> ---
>  arch/x86/kvm/emulate.c | 20 ++++++--------------
>  1 file changed, 6 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 7d4f9b7f06ee..d8a812b6204d 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2137,10 +2137,7 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
>         const struct x86_emulate_ops *ops = ctxt->ops;
>         u8 cpl = ctxt->ops->cpl(ctxt);
>
> -       /* Assignment of RIP may only fail in 64-bit mode */
> -       if (ctxt->mode == X86EMUL_MODE_PROT64)
> -               ops->get_segment(ctxt, &old_sel, &old_desc, NULL,
> -                                VCPU_SREG_CS);
> +       ops->get_segment(ctxt, &old_sel, &old_desc, NULL, VCPU_SREG_CS);
>
>         memcpy(&sel, ctxt->src.valptr + ctxt->op_bytes, 2);
>
> @@ -2151,12 +2148,10 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
>                 return rc;
>
>         rc = assign_eip_far(ctxt, ctxt->src.val, &new_desc);
> -       if (rc != X86EMUL_CONTINUE) {
> -               WARN_ON(ctxt->mode != X86EMUL_MODE_PROT64);
> +       if (rc != X86EMUL_CONTINUE)
>                 /* assigning eip failed; restore the old cs */
>                 ops->set_segment(ctxt, old_sel, &old_desc, 0, VCPU_SREG_CS);
> -               return rc;
> -       }
> +
>         return rc;
>  }
>
> @@ -2221,9 +2216,7 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
>         struct desc_struct old_desc, new_desc;
>         const struct x86_emulate_ops *ops = ctxt->ops;
>
> -       if (ctxt->mode == X86EMUL_MODE_PROT64)
> -               ops->get_segment(ctxt, &old_cs, &old_desc, NULL,
> -                                VCPU_SREG_CS);
> +       ops->get_segment(ctxt, &old_cs, &old_desc, NULL, VCPU_SREG_CS);
>
>         rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
>         if (rc != X86EMUL_CONTINUE)
> @@ -2240,10 +2233,9 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
>         if (rc != X86EMUL_CONTINUE)
>                 return rc;
>         rc = assign_eip_far(ctxt, eip, &new_desc);
> -       if (rc != X86EMUL_CONTINUE) {
> -               WARN_ON(ctxt->mode != X86EMUL_MODE_PROT64);
> +       if (rc != X86EMUL_CONTINUE)
>                 ops->set_segment(ctxt, old_cs, &old_desc, 0, VCPU_SREG_CS);
> -       }
> +
>         return rc;
>  }
>
> --
> 2.10.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: x86: restore IP after all far jump failures
       [not found]   ` <F346D675-FC84-4595-BC1D-C1049F7D6B7E@gmail.com>
@ 2016-11-22 20:56     ` Radim Krčmář
  2016-11-22 23:18       ` Nadav Amit
  0 siblings, 1 reply; 8+ messages in thread
From: Radim Krčmář @ 2016-11-22 20:56 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Paolo Bonzini, LKML, KVM list, stable, Dmitry Vyukov,
	Steve Rutherford, Andrew Honig, Prasad Pandit

2016-11-22 11:43-0800, Nadav Amit:
> I admit my wrongdoings, but I still think the fix should have been to
> remove the entire recovery logic and just return X86EMUL_UNHANDLEABLE if
> something goes wrong (exception). This will kill the misbehaving process
> but keep the VM running.

X86EMUL_UNHANDLEABLE will kill the whole VM (on QEMU, other userspaces
might handle the instruction and resume KVM).

The recovery path is in the spec, which means that nothing goes wrong.
I think we implement the spec quite well now, so keeping the #GP and CS
recovery is slightly better, although not safer.

> Otherwise, a malicious VM process, which can somehow control descriptors
> (LDT?) may modify the descriptor during the emulation and get the system
> to inconsistent state and prevent the VM-entry.

We restore the original CS -- malicious guest would get killed on a
failed VM entry anyway, so the difference is only in KVM internal error
code (assuming there are no other bugs).

Am I misunderstanding something?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: x86: restore IP after all far jump failures
  2016-11-22 20:03 ` Jim Mattson
@ 2016-11-22 20:58   ` Radim Krčmář
  0 siblings, 0 replies; 8+ messages in thread
From: Radim Krčmář @ 2016-11-22 20:58 UTC (permalink / raw)
  To: Jim Mattson
  Cc: linux-kernel, kvm, Paolo Bonzini, stable, Nadav Amit,
	Dmitry Vyukov, Steve Rutherford, Andrew Honig, Prasad Pandit

2016-11-22 12:03-0800, Jim Mattson:
> Should the subject read: "KVM: x86: restore CS after all far jump failures"?

Yes, thanks.  Maybe "preserve CS" would be even better.

I'll fix it when applying if there are no other problems.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: x86: restore IP after all far jump failures
  2016-11-22 20:56     ` Radim Krčmář
@ 2016-11-22 23:18       ` Nadav Amit
  2016-11-23 16:23         ` Radim Krčmář
  0 siblings, 1 reply; 8+ messages in thread
From: Nadav Amit @ 2016-11-22 23:18 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Paolo Bonzini, LKML, KVM list, stable, Dmitry Vyukov,
	Steve Rutherford, Andrew Honig, Prasad Pandit


> On Nov 22, 2016, at 12:56 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> 
> 2016-11-22 11:43-0800, Nadav Amit:
>> I admit my wrongdoings, but I still think the fix should have been to
>> remove the entire recovery logic and just return X86EMUL_UNHANDLEABLE if
>> something goes wrong (exception). This will kill the misbehaving process
>> but keep the VM running.
> 
> X86EMUL_UNHANDLEABLE will kill the whole VM (on QEMU, other userspaces
> might handle the instruction and resume KVM).

I don’t think so. If CPL is not 0, handle_emulation_failure() will be called
and will inject #UD.

> 
> The recovery path is in the spec, which means that nothing goes wrong.
> I think we implement the spec quite well now, so keeping the #GP and CS
> recovery is slightly better, although not safer.
> 
>> Otherwise, a malicious VM process, which can somehow control descriptors
>> (LDT?) may modify the descriptor during the emulation and get the system
>> to inconsistent state and prevent the VM-entry.
> 
> We restore the original CS -- malicious guest would get killed on a
> failed VM entry anyway, so the difference is only in KVM internal error
> code (assuming there are no other bugs).
> 
> Am I misunderstanding something?

Most likely you are right, but after doing one mistake, I don’t want
to be accountable for another.

Note there is another problematic case in em_ret_far(). If an exception
occurs when RIP is assigned, the RSP updates (of emulate_pop() ) are not 
going to be reverted. Can it be used for anything malicious? I don’t know.

Nadav

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: x86: restore IP after all far jump failures
  2016-11-22 23:18       ` Nadav Amit
@ 2016-11-23 16:23         ` Radim Krčmář
  0 siblings, 0 replies; 8+ messages in thread
From: Radim Krčmář @ 2016-11-23 16:23 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Paolo Bonzini, LKML, KVM list, stable, Dmitry Vyukov,
	Steve Rutherford, Andrew Honig, Prasad Pandit

2016-11-22 15:18-0800, Nadav Amit:
>> On Nov 22, 2016, at 12:56 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
>> 
>> 2016-11-22 11:43-0800, Nadav Amit:
>>> I admit my wrongdoings, but I still think the fix should have been to
>>> remove the entire recovery logic and just return X86EMUL_UNHANDLEABLE if
>>> something goes wrong (exception). This will kill the misbehaving process
>>> but keep the VM running.
>> 
>> X86EMUL_UNHANDLEABLE will kill the whole VM (on QEMU, other userspaces
>> might handle the instruction and resume KVM).
> 
> I don’t think so. If CPL is not 0, handle_emulation_failure() will be called
> and will inject #UD.

This part of the emulator should only be used on Intels without
unrestricted guest, for real and unpaged protected mode.

It is unsafe as we don't make a clear distinction which instructions are
emulated for MMIO and which for old CPUs, but anyway, we should not be
emulating it only any OS that considers security.

(x86 can use pretty much any instruction for MMIO, but I think that KVM
 should let a userspace emulator handle those exotic OSes.)

>> The recovery path is in the spec, which means that nothing goes wrong.
>> I think we implement the spec quite well now, so keeping the #GP and CS
>> recovery is slightly better, although not safer.
>> 
>>> Otherwise, a malicious VM process, which can somehow control descriptors
>>> (LDT?) may modify the descriptor during the emulation and get the system
>>> to inconsistent state and prevent the VM-entry.
>> 
>> We restore the original CS -- malicious guest would get killed on a
>> failed VM entry anyway, so the difference is only in KVM internal error
>> code (assuming there are no other bugs).
>> 
>> Am I misunderstanding something?
> 
> Most likely you are right, but after doing one mistake, I don’t want
> to be accountable for another.
> 
> Note there is another problematic case in em_ret_far(). If an exception
> occurs when RIP is assigned, the RSP updates (of emulate_pop() ) are not 
> going to be reverted. Can it be used for anything malicious? I don’t know.

True, we are not emulating it right ... security is just side-effect.
Benefits of emulation are not worth the work so I'll prepare a patch
that just returns X86EMUL_UNHANDLEABLE.  It's not going to fix other
bugs with stack handling, but at least we won't be making it crazier.

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-11-23 16:24 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-22 19:21 [PATCH] KVM: x86: restore IP after all far jump failures Radim Krčmář
2016-11-22 19:34 ` Paolo Bonzini
2016-11-22 19:44   ` Nadav Amit
     [not found]   ` <F346D675-FC84-4595-BC1D-C1049F7D6B7E@gmail.com>
2016-11-22 20:56     ` Radim Krčmář
2016-11-22 23:18       ` Nadav Amit
2016-11-23 16:23         ` Radim Krčmář
2016-11-22 20:03 ` Jim Mattson
2016-11-22 20:58   ` Radim Krčmář

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).