From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexander Graf
Date: Mon, 14 Jul 2014 11:24:07 +0000
Subject: KVM HV crash
Message-Id: <53C3BDD7.4020609@suse.de>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: kvm-ppc@vger.kernel.org

Hi Paul,

I've just seen a crash on POWER7 in HV KVM on a host where I run HV and PR
KVM VMs in parallel based on the latest code (linus/master merged with
for-3.16 merged with kvm-ppc-queue plus some PR patches):

Unable to handle kernel paging request for data at address 0x0000000c
Faulting instruction address: 0xc0000000000667d8
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS24 NUMA PowerNV
Modules linked in: kvm_hv kvm_pr kvm
CPU: 4 PID: 4973 Comm: qemu-system-ppc Not tainted 3.16.0-rc4-1.1-default+ #84
task: c000000f170addb0 ti: c000000f17154000 task.ti: c000000f17154000
NIP: c0000000000667d8 LR: d000000007d61ee8 CTR: c0000000000667c0
REGS: c000000f171574a0 TRAP: 0300   Not tainted  (3.16.0-rc4-1.1-default+)
MSR: 9000000000009032  CR: 24002428  XER: 00000000
CFAR: c000000000009358 DAR: 000000000000000c DSISR: 42000000 SOFTE: 1
GPR00: d000000007d63be8 c000000f17157720 c0000000018c47d8 0000000000006020
GPR04: 0000000000000000 0000000000000001 c00000000190fb18 0000000000000000
GPR08: 0000000000000c00 0000000000000000 0000000000000004 d000000007d6a420
GPR12: c0000000000667c0 c00000000fdc2400 d000000007d73ef8 0000000000000007
GPR16: c000000f17157858 c000000f19db1000 c000000001948f88 d000000007d73ef8
GPR20: d000000007d73ef8 0000000000000004 0000000000000004 c000000001948584
GPR24: c000000f171577d0 c000000f19db0ff8 c00000000fdc3f00 c000000f15941da0
GPR28: c000000f15941da0 c000000f19db0fc0 c000000f19db0fc0 c000000f15941da0
NIP [c0000000000667d8] .xics_wake_cpu+0x18/0x30
LR [d000000007d61ee8] .kvmppc_start_thread+0x78/0xd0 [kvm_hv]
Call Trace:
[c000000f17157720] [c000000f171577d0] 0xc000000f171577d0 (unreliable)
[c000000f171577a0] [d000000007d63be8] .kvmppc_run_vcpu+0x5f8/0xef0 [kvm_hv]
[c000000f17157920] [d000000007d650c4] .kvmppc_vcpu_run_hv+0x344/0x740 [kvm_hv]
[c000000f171579f0] [d0000000079bd7cc] .kvmppc_vcpu_run+0x2c/0x40 [kvm]
[c000000f17157a60] [d0000000079ba154] .kvm_arch_vcpu_ioctl_run+0x54/0x1a0 [kvm]
[c000000f17157af0] [d0000000079b4b58] .kvm_vcpu_ioctl+0x4b8/0x740 [kvm]
[c000000f17157cb0] [c000000000278858] .do_vfs_ioctl+0xa8/0x710
[c000000f17157d90] [c000000000278f84] .SyS_ioctl+0xc4/0xe0
[c000000f17157e30] [c00000000000a260] syscall_exit+0x0/0xa0
Instruction dump:
7c0004ac 91490004 39200001 992d028c 4e800020 60000000 3d22001a 78631f24
3929a788 39400004 7d29182a 7c0004ac <9949000c> 39200001 992d028c 4e800020
---[ end trace c05a9dd08ee8f5fc ]---

(gdb) x /10i xics_wake_cpu
   0xc0000000000667c0:	addis   r9,r2,26
   0xc0000000000667c4:	rldicr  r3,r3,3,60
   0xc0000000000667c8:	addi    r9,r9,-22648
   0xc0000000000667cc:	li      r10,4
   0xc0000000000667d0:	ldx     r9,r9,r3
   0xc0000000000667d4:	sync
   0xc0000000000667d8:	stb     r10,12(r9)
   0xc0000000000667dc:	li      r9,1
   0xc0000000000667e0:	stb     r9,652(r13)
   0xc0000000000667e4:	blr

(gdb) x /5i 0xc0000000000667d8
   0xc0000000000667d8:	stb     r10,12(r9)

(gdb) l *0xc0000000000667d8
0xc0000000000667d8 is in xics_wake_cpu (./arch/powerpc/include/asm/io.h:169).
169	DEF_MMIO_OUT_D(out_8, 8, stb);

(gdb) l *(kvmppc_run_vcpu+0x5f8)
0x3be8 is in kvmppc_run_vcpu (arch/powerpc/kvm/book3s_hv.c:1631).
1626
1627
1628		vc->pcpu = smp_processor_id();
1629		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
1630			kvmppc_start_thread(vcpu);
1631			kvmppc_create_dtl_entry(vcpu, vc);
1632		}
1633
1634		/* Set this explicitly in case thread 0 doesn't have a vcpu */
1635		get_paca()->kvm_hstate.kvm_vcore = vc;

So r9 is a NULL pointer. R9 is derived from the CPU ID. R3 at the point
in time of the fault is 0x6020. If I read the rldicr correctly that
means that cpu_id was 3076 which is a ridiculous number. The system only
has 16 cores.
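To double-check the rldicr reading, here is a small user-space C sketch
(not kernel code; the helper names rldicr() and cpu_from_r3() are made up
for illustration). It models the instruction as a rotate left followed by
clearing every bit to the right of the mask end, and it assumes the upper
bits of the CPU number are zero so that the rotate behaves like a plain
shift by 3:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of "rldicr RA,RS,SH,ME": rotate RS left by SH bits, then keep
 * only bits 0..ME in IBM numbering (bit 0 is the most significant bit).
 */
uint64_t rldicr(uint64_t rs, unsigned sh, unsigned me)
{
	assert(sh < 64 && me < 64);
	/* "& 63" avoids an undefined 64-bit shift when sh == 0 */
	uint64_t rot = (rs << sh) | (rs >> ((64 - sh) & 63));
	uint64_t mask = ~UINT64_C(0) << (63 - me);
	return rot & mask;
}

/*
 * Undo "rldicr r3,r3,3,60" to recover the index that was passed in.
 * Assumption: the original value fit in the low 61 bits, so nothing
 * wrapped around during the rotate.
 */
uint64_t cpu_from_r3(uint64_t r3_after)
{
	return r3_after >> 3;
}
```

With r3 = 0x6020, cpu_from_r3() gives 0xc04 = 3076, and rldicr(3076, 3, 60)
gives 0x6020 again, which matches the GPR03 value in the oops.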
So I guess something in this calculation in kvmppc_start_thread() went
wrong:

	cpu = vc->pcpu + vcpu->arch.ptid;

Either one of the pointers was incorrect (vc or vcpu) or the values
inside. Have you seen this before? Any idea what could be going wrong?


Alex