From: Eduardo Habkost <ehabkost@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
kvm <kvm@vger.kernel.org>,
yfu@redhat.com
Subject: Re: [PATCH] KVM: x86: inject exceptions produced by x86_decode_insn
Date: Thu, 30 Nov 2017 18:33:45 -0200
Message-ID: <20171130203345.GG3037@localhost.localdomain>
In-Reply-To: <20171129184216.GC3037@localhost.localdomain>

On Wed, Nov 29, 2017 at 04:42:16PM -0200, Eduardo Habkost wrote:
> On Wed, Nov 29, 2017 at 12:44:42PM +0100, Paolo Bonzini wrote:
> > On 29/11/2017 12:44, Eduardo Habkost wrote:
> > > On Mon, Nov 13, 2017 at 09:32:09AM +0100, Paolo Bonzini wrote:
> > >> On 13/11/2017 08:15, Wanpeng Li wrote:
> > >>> 2017-11-10 17:49 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> > >>>> Sometimes, a processor might execute an instruction while another
> > >>>> processor is updating the page tables for that instruction's code page,
> > >>>> but before the TLB shootdown completes. The interesting case happens
> > >>>> if the page is in the TLB.
> > >>>>
> > >>>> In general, the processor will succeed in executing the instruction and
> > >>>> nothing bad happens. However, what if the instruction is an MMIO access?
> > >>>> If *that* happens, KVM invokes the emulator, and the emulator gets the
> > >>>> updated page tables. If the update side had marked the code page as non
> > >>>> present, the page table walk then will fail and so will x86_decode_insn.
> > >>>>
> > >>>> Unfortunately, even though kvm_fetch_guest_virt is correctly returning
> > >>>> X86EMUL_PROPAGATE_FAULT, x86_decode_insn's caller treats the failure as
> > >>>> a fatal error if the instruction cannot simply be reexecuted (as is the
> > >>>> case for MMIO). And this in fact happened sometimes when rebooting
> > >>>> Windows 2012r2 guests. Just checking ctxt->have_exception and injecting
> > >>>> the exception if true is enough to fix the case.
> > >>>
> > >>> I found the only place which can set ctxt->have_exception is in the
> > >>> function x86_emulate_insn(), and x86_decode_insn() will not set
> > >>> ctxt->have_exception even if kvm_fetch_guest_virt() returns
> > >>> X86EMUL_PROPAGATE_FAULT.
> > >>
> > >> Hmm, you're right. Looks like Yanan has been (un)lucky when trying out
> > >> this patch! :(
> > >>
> > >> Yanan, can you double check that you can reproduce the issue with an
> > >> unpatched kernel? I will work on a kvm-unit-tests testcase.
> > >
> > > We don't have a kvm-unit-tests reproducer for this yet, right?
> > >
> > > I'm considering trying to write one, but I don't want to
> > > duplicate work.
> >
> > No, I haven't written one yet.
>
> The reproducer (not a full test case) is quite simple, see patch below.
>
> Now, I've noticed something interesting when running the
> reproducer:

There's something else that makes the bug hard to reproduce: as
soon as I set RSP to a valid address in inregs before calling
trap_emulator(), the bug is not reproducible anymore.

But if I keep RSP=0, I won't be able to validate the bug fix
because I won't be able to configure a working #PF handler.

This alone makes the bug not reproducible anymore:

diff --git a/x86/emulator.c b/x86/emulator.c
index 72cb035..a7e61ff 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -1104,6 +1104,8 @@ static void test_illegal_movbe(void)
 static void test_fetch_failure(void *mem, void *alt_insn_page)
 {
+	void *stack = alloc_page();
+	inregs = (struct regs){ .rsp = (u64)stack+1024 };
 	trap_emulator(mem, NULL, NULL);
 }

This is what I see:

When we don't have a stack (inregs.rsp=0),
reexecute_instruction() is preventing the emulation failure from
happening on the I/O instruction VM exits, and KVM keeps entering
the VM in a loop (getting thousands of I/O instruction VM exits)
until we finally get an EPT misconfig VM exit on GVA
0xfffffffffffffff8.
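
A side note on the 0xfffffffffffffff8 address: it is what an 8-byte
push lands on when RSP is 0, because 64-bit stack arithmetic wraps
modulo 2^64. That matches the RSP=0 case, but it is an inference from
the arithmetic, not something the traces confirm. A minimal sketch in
plain C (push_slot() and stack_top() are hypothetical helper names,
and the 4096-byte page size is assumed from the vmap() calls in
emulator.c):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096ULL /* assumed page size */

/* Address written by a single 8-byte push when the stack pointer is rsp.
 * uint64_t arithmetic wraps modulo 2^64, so rsp == 0 yields an address
 * at the very top of the address space. */
uint64_t push_slot(uint64_t rsp)
{
	return rsp - 8;
}

/* Initial stack pointer placed `offset` bytes into a page-aligned page,
 * mirroring "(u64)stack+1024" in the diff above. */
uint64_t stack_top(uint64_t page, uint64_t offset)
{
	assert(page % PAGE_BYTES == 0);
	assert(offset <= PAGE_BYTES);
	return page + offset;
}
```

With these, push_slot(0) is 0xfffffffffffffff8, while
push_slot(stack_top(0x200000, 1024)) is 0x2003f8, an address inside
the page (0x200000 is an arbitrary example address, not one taken
from the test).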

When we set up inregs.rsp, reexecute_instruction() also prevents
the emulation from failing on the I/O instruction VM exits, but
instead of an EPT misconfig VM exit, we get an EPT violation VM exit
after a few thousand iterations, and the page fault is delivered
to the VCPU.

I don't know why KVM loops so many times on I/O instruction VM
exits before finally getting an emulation failure (or finally
delivering a page fault, if a stack is available), but this might
explain why the bug is so hard to reproduce under normal
circumstances.
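
To keep the three outcomes discussed in this thread apart, here is a
toy model in plain C of what can happen after a decode failure. It is
only a sketch of the behaviour described above, not KVM code: the
enum, the struct, and after_decode_failure() are hypothetical names,
and the order of the checks is an assumption based on the patch
description (re-execution is tried first, and the fix only matters
when re-execution is not possible). Whether have_exception is actually
set on a fetch failure is the open question raised earlier in the
thread, so the model just takes it as an input.

```c
#include <assert.h> /* for the checks in the usage note */
#include <stdbool.h>

/* Toy model only: these names are hypothetical and are not KVM's. */
enum outcome {
	RESUME_GUEST,      /* re-enter the VM and let the CPU retry */
	INJECT_EXCEPTION,  /* deliver the pending fault (e.g. #PF) to the guest */
	EMULATION_FAILURE, /* reported to userspace as "KVM internal error" */
};

struct decode_failure {
	bool can_reexecute;  /* stands in for reexecute_instruction() succeeding */
	bool have_exception; /* stands in for ctxt->have_exception */
};

/* Decide what happens after instruction decode failed.
 * with_fix selects between the unpatched and the patched behaviour. */
enum outcome after_decode_failure(struct decode_failure f, bool with_fix)
{
	if (f.can_reexecute)
		return RESUME_GUEST;
	if (with_fix && f.have_exception)
		return INJECT_EXCEPTION;
	return EMULATION_FAILURE;
}
```

For example, with f = { .can_reexecute = false, .have_exception = true },
assert(after_decode_failure(f, false) == EMULATION_FAILURE) and
assert(after_decode_failure(f, true) == INJECT_EXCEPTION) both hold.
As long as can_reexecute is true the model always returns RESUME_GUEST,
which corresponds to the re-entry loop seen in the traces.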
>
> If the test_fetch_failure() call happens before we touch
> pci-testdev through *mem (like in the patch below), we get an
> emulation failure like the one Yanan saw:
>
> $ /usr/bin/qemu-system-x86_64 -nodefaults -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -machine accel=kvm -kernel ./x86/emulator.flat # -initrd /tmp/tmp.RCPjppRp8i
> enabling apic
> paging enabled
> cr0 = 80010011
> cr3 = 45e000
> cr4 = 20
> KVM internal error. Suberror: 1
> emulation failure
> RAX=0000000000000000 RBX=0000000000000000 RCX=0000000000000000 RDX=0000000000000000
> RSI=0000000000000000 RDI=0000000000000000 RBP=0000000000000000 RSP=0000000000000000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=ffffffffffffc08a RFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> CS =0008 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> DS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> FS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> GS =0010 0000000000454d60 ffffffff 00c09300 DPL=0 DS [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0080 000000000041148a 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT= 000000000041100a 0000047f
> IDT= 0000000000000000 00000fff
> CR0=80010011 CR2=ffffffffffffc08a CR3=000000000045e000 CR4=00000020
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>
> but if I call test_fetch_failure() after touching *mem, like this:
>
> diff --git a/x86/emulator.c b/x86/emulator.c
> index 977ec75..72cb035 100644
> --- a/x86/emulator.c
> +++ b/x86/emulator.c
> @@ -1124,7 +1124,6 @@ int main()
> alt_insn_page = alloc_page();
> insn_ram = vmap(virt_to_phys(insn_page), 4096);
>
> - test_fetch_failure(mem, alt_insn_page);
>
> // test mov reg, r/m and mov r/m, reg
> t1 = 0x123456789abcdef;
> @@ -1135,6 +1134,8 @@ int main()
> : "memory");
> report("mov reg, r/m (1)", t2 == 0x123456789abcdef);
>
> + test_fetch_failure(mem, alt_insn_page);
> +
> test_simplealu(mem);
> test_cmps(mem);
> test_scas(mem);
>
> then I get a KVM_INTERNAL_ERROR_DELIVERY_EV:
>
> $ /usr/bin/qemu-system-x86_64 -nodefaults -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -machine accel=kvm -kernel ./x86/emulator.flat # -initrd /tmp/tmp.lmXZa46TEA
> enabling apic
> paging enabled
> cr0 = 80010011
> cr3 = 45e000
> cr4 = 20
> PASS: mov reg, r/m (1)
> KVM internal error. Suberror: 3
> extra data[0]: 80000b0e
> extra data[1]: 31
> extra data[2]: 182
> extra data[3]: ff000ff8
> RAX=0000000000000000 RBX=0000000000000000 RCX=0000000000000000 RDX=0000000000000000
> RSI=0000000000000000 RDI=0000000000000000 RBP=0000000000000000 RSP=0000000000000000
> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
> RIP=ffffffffffffc08a RFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> CS =0008 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> DS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> FS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
> GS =0010 0000000000454d60 ffffffff 00c09300 DPL=0 DS [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0080 000000000041148a 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT= 000000000041100a 0000047f
> IDT= 0000000000000000 00000fff
> CR0=80010011 CR2=ffffffffffffc08a CR3=000000000045e000 CR4=00000020
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000500
> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> ^C
>
> Also, if I run the reproducer using ept=0, it gets stuck in a
> loop re-entering the same "in (%dx),%al" instruction over and
> over again. trace-cmd report output:
>
> qemu-system-x86-18185 [001] 1057573.830491: kvm_exit: reason IO_INSTRUCTION rip 0xffffffffffffc08a info 8 0
> qemu-system-x86-18185 [001] 1057573.830494: kvm_emulate_insn: 0:ffffffffffffc08a: 4d 89 2c 24
> qemu-system-x86-18185 [001] 1057573.830503: kvm_entry: vcpu 0
> qemu-system-x86-18185 [001] 1057573.830504: kvm_exit: reason IO_INSTRUCTION rip 0xffffffffffffc08a info 8 0
> qemu-system-x86-18185 [001] 1057573.830505: kvm_emulate_insn: 0:ffffffffffffc08a: 4d 89 2c 24
> qemu-system-x86-18185 [001] 1057573.830506: kvm_entry: vcpu 0
> qemu-system-x86-18185 [001] 1057573.830507: kvm_exit: reason IO_INSTRUCTION rip 0xffffffffffffc08a info 8 0
> qemu-system-x86-18185 [001] 1057573.830508: kvm_emulate_insn: 0:ffffffffffffc08a: 4d 89 2c 24
> qemu-system-x86-18185 [001] 1057573.830509: kvm_entry: vcpu 0
> qemu-system-x86-18185 [001] 1057573.830510: kvm_exit: reason IO_INSTRUCTION rip 0xffffffffffffc08a info 8 0
> qemu-system-x86-18185 [001] 1057573.830511: kvm_emulate_insn: 0:ffffffffffffc08a: 4d 89 2c 24
> qemu-system-x86-18185 [001] 1057573.830511: kvm_entry: vcpu 0
> qemu-system-x86-18185 [001] 1057573.830512: kvm_exit: reason IO_INSTRUCTION rip 0xffffffffffffc08a info 8 0
> qemu-system-x86-18185 [001] 1057573.830513: kvm_emulate_insn: 0:ffffffffffffc08a: 4d 89 2c 24
> qemu-system-x86-18185 [001] 1057573.830514: kvm_entry: vcpu 0
> qemu-system-x86-18185 [001] 1057573.830514: kvm_exit: reason IO_INSTRUCTION rip 0xffffffffffffc08a info 8 0
> qemu-system-x86-18185 [001] 1057573.830515: kvm_emulate_insn: 0:ffffffffffffc08a: 4d 89 2c 24
> qemu-system-x86-18185 [001] 1057573.830516: kvm_entry: vcpu 0
> qemu-system-x86-18185 [001] 1057573.830517: kvm_exit: reason IO_INSTRUCTION rip 0xffffffffffffc08a info 8 0
> qemu-system-x86-18185 [001] 1057573.830518: kvm_emulate_insn: 0:ffffffffffffc08a: 4d 89 2c 24
> qemu-system-x86-18185 [001] 1057573.830519: kvm_entry: vcpu 0
> qemu-system-x86-18185 [001] 1057573.830521: kvm_exit: reason IO_INSTRUCTION rip 0xffffffffffffc08a info 8 0
> qemu-system-x86-18185 [001] 1057573.830522: kvm_emulate_insn: 0:ffffffffffffc08a: 4d 89 2c 24
> qemu-system-x86-18185 [001] 1057573.830523: kvm_entry: vcpu 0
> [...]
>
> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
> ---
> x86/emulator.c | 21 +++++++++++++++++----
> 1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/x86/emulator.c b/x86/emulator.c
> index e6f27cc..977ec75 100644
> --- a/x86/emulator.c
> +++ b/x86/emulator.c
> @@ -792,9 +792,11 @@ static void trap_emulator(uint64_t *mem, void *alt_insn_page,
> extern u8 insn_page[], test_insn[];
>
> insn_ram = vmap(virt_to_phys(insn_page), 4096);
> - memcpy(alt_insn_page, insn_page, 4096);
> - memcpy(alt_insn_page + (test_insn - insn_page),
> - (void *)(alt_insn->ptr), alt_insn->len);
> + if (alt_insn_page) {
> + memcpy(alt_insn_page, insn_page, 4096);
> + memcpy(alt_insn_page + (test_insn - insn_page),
> + (void *)(alt_insn->ptr), alt_insn->len);
> + }
> save = inregs;
>
> /* Load the code TLB with insn_page, but point the page tables at
> @@ -805,7 +807,11 @@ static void trap_emulator(uint64_t *mem, void *alt_insn_page,
> invlpg(insn_ram);
> /* Load code TLB */
> asm volatile("call *%0" : : "r"(insn_ram));
> - install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
> + if (alt_insn_page) {
> + install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
> + } else {
> + install_pte(cr3, 1, insn_ram, PT_USER_MASK, 0);
> + }
> /* Trap, let hypervisor emulate at alt_insn_page */
> asm volatile("call *%0": : "r"(insn_ram+1));
>
> @@ -1096,6 +1102,11 @@ static void test_illegal_movbe(void)
> handle_exception(UD_VECTOR, 0);
> }
>
> +static void test_fetch_failure(void *mem, void *alt_insn_page)
> +{
> + trap_emulator(mem, NULL, NULL);
> +}
> +
> int main()
> {
> void *mem;
> @@ -1113,6 +1124,8 @@ int main()
> alt_insn_page = alloc_page();
> insn_ram = vmap(virt_to_phys(insn_page), 4096);
>
> + test_fetch_failure(mem, alt_insn_page);
> +
> // test mov reg, r/m and mov r/m, reg
> t1 = 0x123456789abcdef;
> asm volatile("mov %[t1], (%[mem]) \n\t"
> --
> 2.13.6
>
>
> --
> Eduardo
--
Eduardo