From mboxrd@z Thu Jan 1 00:00:00 1970 From: Paolo Bonzini Subject: Re: BUG: sleeping function called from invalid context at ./include/linux/uaccess.h:LINE Date: Mon, 6 Nov 2017 17:14:35 +0100 Message-ID: References: <001a11447f06b06409055cd621ed@google.com> <367fa416-1adb-53db-4295-c25cec17b893@redhat.com> <12909e30-82ec-f39b-743e-5fb1426b1b89@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Cc: syzbot , borntraeger@de.ibm.com, hpa@zytor.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, lstoakes@gmail.com, mhocko@suse.com, mingo@redhat.com, rkrcmar@redhat.com, syzkaller-bugs@googlegroups.com, tglx@linutronix.de, x86@kernel.org To: David Hildenbrand , Nick Desaulniers Return-path: In-Reply-To: <12909e30-82ec-f39b-743e-5fb1426b1b89@redhat.com> Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org List-Id: kvm.vger.kernel.org On 06/11/2017 17:01, David Hildenbrand wrote: > On 06.11.2017 16:10, Nick Desaulniers wrote: >> Does it have to be stack allocated? > > We can't use kmalloc and friends in emulate.c. We would have to > introduce new emulator callbacks. > > a) for malloc and free. hmmm. > b) for carrying out the fxrstr/fixup. > > Paolo, what do you suggest? You can use kmalloc. Any userspace user of emulate.c would have to write a wrapper. But I'm not sure it's useful... maybe the asm_safe+memcpy could be moved to a separate noinline function, so that segmented_read_std is invoked with a leaner stack. Paolo >> >> On Nov 6, 2017 3:52 AM, "David Hildenbrand" > > wrote: >> >> On 31.10.2017 12:34, syzbot wrote: >> > Hello, >> > >> > syzkaller hit the following crash on >> > 91dfed74eabcdae9378131546c446442c29bf769 >> > >> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master >> >> > compiler: gcc (GCC) 7.1.1 20170620 >> > .config is attached >> > Raw console output is attached. >> > C reproducer is attached >> > syzkaller reproducer is attached. See https://goo.gl/kgGztJ >> > for information about syzkaller reproducers >> > >> > >> > in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: syzkaller879109 >> > 2 locks held by syzkaller879109/2909: >> >   #0:  (&vcpu->mutex){+.+.}, at: [] >> vcpu_load+0x1c/0x70 >> > arch/x86/kvm/../../../virt/kvm/kvm_main.c:154 >> >   #1:  (&kvm->srcu){....}, at: [] vcpu_enter_guest >> > arch/x86/kvm/x86.c:6983 [inline] >> >   #1:  (&kvm->srcu){....}, at: [] vcpu_run >> > arch/x86/kvm/x86.c:7061 [inline] >> >   #1:  (&kvm->srcu){....}, at: [] >> > kvm_arch_vcpu_ioctl_run+0x1bc2/0x58b0 arch/x86/kvm/x86.c:7222 >> > CPU: 1 PID: 2909 Comm: syzkaller879109 Not tainted >> 4.13.0-rc4-next-20170811 >> > #1 >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs >> 01/01/2011 >> > Call Trace: >> >   __dump_stack lib/dump_stack.c:16 [inline] >> >   dump_stack+0x194/0x257 lib/dump_stack.c:52 >> >   ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6014 >> >   __might_sleep+0x95/0x190 kernel/sched/core.c:5967 >> >   __might_fault+0xab/0x1d0 mm/memory.c:4383 >> >   __copy_from_user include/linux/uaccess.h:71 [inline] >> >   __kvm_read_guest_page+0x58/0xa0 >> > arch/x86/kvm/../../../virt/kvm/kvm_main.c:1771 >> >   kvm_vcpu_read_guest_page+0x44/0x60 >> > arch/x86/kvm/../../../virt/kvm/kvm_main.c:1791 >> >   kvm_read_guest_virt_helper+0x76/0x140 arch/x86/kvm/x86.c:4407 >> >   kvm_read_guest_virt_system+0x3c/0x50 arch/x86/kvm/x86.c:4466 >> >   segmented_read_std+0x10c/0x180 arch/x86/kvm/emulate.c:819 >> >   em_fxrstor+0x27b/0x410 arch/x86/kvm/emulate.c:4022 >> >> >> In em_fxrstor, we do a get_fpu(), which in return disables preemption. >> >> With preempt_disable(), we do a >> segmented_read_std()->kvm_vcpu_read_guest_page(), triggering the >> warning. >> >> >   x86_emulate_insn+0x55d/0x3c50 arch/x86/kvm/emulate.c:5471 >> >   x86_emulate_instruction+0x411/0x1ca0 arch/x86/kvm/x86.c:5698 >> >   kvm_mmu_page_fault+0x18b/0x2c0 arch/x86/kvm/mmu.c:4854 >> >   handle_ept_violation+0x1fc/0x5e0 arch/x86/kvm/vmx.c:6400 >> >   vmx_handle_exit+0x281/0x1ab0 arch/x86/kvm/vmx.c:8718 >> >   vcpu_enter_guest arch/x86/kvm/x86.c:6999 [inline] >> >   vcpu_run arch/x86/kvm/x86.c:7061 [inline] >> >   kvm_arch_vcpu_ioctl_run+0x1cee/0x58b0 arch/x86/kvm/x86.c:7222 >> >   kvm_vcpu_ioctl+0x64c/0x1010 >> arch/x86/kvm/../../../virt/kvm/kvm_main.c:2591 >> >   vfs_ioctl fs/ioctl.c:45 [inline] >> >   do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685 >> >   SYSC_ioctl fs/ioctl.c:700 [inline] >> >   SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 >> >   entry_SYSCALL_64_fastpath+0x1f/0xbe >> >> I don't really see a way to avoid two fxstate variables. Unloading the >> fpu in between the fxstore/fxrstr could lead to host values getting >> overwritten. Loading/saving the fpu in kvm_arch_vcpu_ioctl_run() would >> most probably also not work, as the relevant portions of fxregs_state >> would not get saved/restored. So the preemption would still be needed. >> >> >> So all I can offer for now is the following (untested, can send as >> proper patch if needed): >> >> >> From f32d06c8c621c6d68e073e9bdb81a6280b6c9544 Mon Sep 17 00:00:00 2001 >> From: David Hildenbrand > >> Date: Mon, 6 Nov 2017 12:35:39 +0100 >> Subject: [PATCH v1] KVM: x86: fix em_fxstor sleeping while in atomic >> >> Commit 9d643f63128b tried to optimize the stack size, but introduced a >> guest memory access which might sleep while in atomic. >> >> Let's undo that part of the commit but keep the cleanups. >> >> Reported by syzbot: >> >> in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: syzkaller879109 >> 2 locks held by syzkaller879109/2909: >>   #0:  (&vcpu->mutex){+.+.}, at: [] >> vcpu_load+0x1c/0x70 >> arch/x86/kvm/../../../virt/kvm/kvm_main.c:154 >>   #1:  (&kvm->srcu){....}, at: [] vcpu_enter_guest >> arch/x86/kvm/x86.c:6983 [inline] >>   #1:  (&kvm->srcu){....}, at: [] vcpu_run >> arch/x86/kvm/x86.c:7061 [inline] >>   #1:  (&kvm->srcu){....}, at: [] >> kvm_arch_vcpu_ioctl_run+0x1bc2/0x58b0 arch/x86/kvm/x86.c:7222 >> CPU: 1 PID: 2909 Comm: syzkaller879109 Not tainted >> 4.13.0-rc4-next-20170811 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs >> 01/01/2011 >> Call Trace: >>   __dump_stack lib/dump_stack.c:16 [inline] >>   dump_stack+0x194/0x257 lib/dump_stack.c:52 >>   ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6014 >>   __might_sleep+0x95/0x190 kernel/sched/core.c:5967 >>   __might_fault+0xab/0x1d0 mm/memory.c:4383 >>   __copy_from_user include/linux/uaccess.h:71 [inline] >>   __kvm_read_guest_page+0x58/0xa0 >> arch/x86/kvm/../../../virt/kvm/kvm_main.c:1771 >>   kvm_vcpu_read_guest_page+0x44/0x60 >> arch/x86/kvm/../../../virt/kvm/kvm_main.c:1791 >>   kvm_read_guest_virt_helper+0x76/0x140 arch/x86/kvm/x86.c:4407 >>   kvm_read_guest_virt_system+0x3c/0x50 arch/x86/kvm/x86.c:4466 >>   segmented_read_std+0x10c/0x180 arch/x86/kvm/emulate.c:819 >>   em_fxrstor+0x27b/0x410 arch/x86/kvm/emulate.c:4022 >>   x86_emulate_insn+0x55d/0x3c50 arch/x86/kvm/emulate.c:5471 >>   x86_emulate_instruction+0x411/0x1ca0 arch/x86/kvm/x86.c:5698 >>   kvm_mmu_page_fault+0x18b/0x2c0 arch/x86/kvm/mmu.c:4854 >>   handle_ept_violation+0x1fc/0x5e0 arch/x86/kvm/vmx.c:6400 >>   vmx_handle_exit+0x281/0x1ab0 arch/x86/kvm/vmx.c:8718 >>   vcpu_enter_guest arch/x86/kvm/x86.c:6999 [inline] >>   vcpu_run arch/x86/kvm/x86.c:7061 [inline] >>   kvm_arch_vcpu_ioctl_run+0x1cee/0x58b0 arch/x86/kvm/x86.c:7222 >>   kvm_vcpu_ioctl+0x64c/0x1010 >> arch/x86/kvm/../../../virt/kvm/kvm_main.c:2591 >>   vfs_ioctl fs/ioctl.c:45 [inline] >>   do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685 >>   SYSC_ioctl fs/ioctl.c:700 [inline] >>   SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 >>   entry_SYSCALL_64_fastpath+0x1f/0xbe >> RIP: 0033:0x437fc9 >> RSP: 002b:00007ffc7b4d5ab8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 >> RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000437fc9 >> RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005 >> RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000020ae8000 >> R10: 0000000000009120 R11: 0000000000000206 R12: 0000000000000000 >> R13: 0000000000000004 R14: 0000000000000004 R15: 0000000020077000 >> >> Fixes: 9d643f63128b ("KVM: x86: avoid large stack allocations in >> em_fxrstor") >> Signed-off-by: David Hildenbrand > > >> --- >>  arch/x86/kvm/emulate.c | 16 +++++++++------- >>  1 file changed, 9 insertions(+), 7 deletions(-) >> >> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c >> index fb0055953fbc..d87f01a2d6f4 100644 >> --- a/arch/x86/kvm/emulate.c >> +++ b/arch/x86/kvm/emulate.c >> @@ -4002,7 +4002,7 @@ static int em_fxsave(struct x86_emulate_ctxt >> *ctxt) >> >>  static int em_fxrstor(struct x86_emulate_ctxt *ctxt) >>  { >> -       struct fxregs_state fx_state; >> +       struct fxregs_state fx_state, fx_old; >>         int rc; >>         size_t size; >> >> @@ -4010,19 +4010,21 @@ static int em_fxrstor(struct >> x86_emulate_ctxt *ctxt) >>         if (rc != X86EMUL_CONTINUE) >>                 return rc; >> >> +       size = fxstate_size(ctxt); >> +       rc = segmented_read_std(ctxt, ctxt->memop.addr.mem, >> &fx_state, size); >> +       if (rc != X86EMUL_CONTINUE) >> +               return rc; >> + >>         ctxt->ops->get_fpu(ctxt); >> >> -       size = fxstate_size(ctxt); >>         if (size < __fxstate_size(16)) { >> -               rc = asm_safe("fxsave %[fx]", , [fx] "+m"(fx_state)); >> +               rc = asm_safe("fxsave %[fx]", , [fx] "+m"(fx_old)); >>                 if (rc != X86EMUL_CONTINUE) >>                         goto out; >> +               memcpy(((void *)&fx_state) + size, ((void *)&fx_old) >> + size, >> +                      __fxstate_size(16) - size); >>         } >> >> -       rc = segmented_read_std(ctxt, ctxt->memop.addr.mem, >> &fx_state, size); >> -       if (rc != X86EMUL_CONTINUE) >> -               goto out; >> - >>         if (fx_state.mxcsr >> 16) { >>                 rc = emulate_gp(ctxt, 0); >>                 goto out; >> -- >> 2.13.6 >> >> >> >> -- >> >> Thanks, >> >> David / dhildenb >> > >