From mboxrd@z Thu Jan 1 00:00:00 1970 From: Janosch Frank Subject: Re: [PATCH 2/2] KVM: Create debugfs dir and stat files for each VM Date: Mon, 30 Nov 2015 09:37:30 +0100 Message-ID: <565C0ACA.9080204@linux.vnet.ibm.com> References: <56576FD6.40506@de.ibm.com> <56581A59.4080705@de.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Cc: kvm@vger.kernel.org, pbonzini@redhat.com, pmorel@linux.vnet.ibm.com, dan.carpenter@oracle.com, "Kevin's boot bot" To: Tyler Baker , Christian Borntraeger Return-path: Received: from e06smtp10.uk.ibm.com ([195.75.94.106]:38110 "EHLO e06smtp10.uk.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753085AbbK3Ihh (ORCPT ); Mon, 30 Nov 2015 03:37:37 -0500 Received: from localhost by e06smtp10.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 30 Nov 2015 08:37:35 -0000 Received: from b06cxnps3075.portsmouth.uk.ibm.com (d06relay10.portsmouth.uk.ibm.com [9.149.109.195]) by d06dlp02.portsmouth.uk.ibm.com (Postfix) with ESMTP id F3907219005F for ; Mon, 30 Nov 2015 08:37:25 +0000 (GMT) Received: from d06av06.portsmouth.uk.ibm.com (d06av06.portsmouth.uk.ibm.com [9.149.37.217]) by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id tAU8bWT562259252 for ; Mon, 30 Nov 2015 08:37:32 GMT Received: from d06av06.portsmouth.uk.ibm.com (localhost [127.0.0.1]) by d06av06.portsmouth.uk.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id tAU8bVuM028400 for ; Mon, 30 Nov 2015 01:37:32 -0700 In-Reply-To: Sender: kvm-owner@vger.kernel.org List-ID: On 11/27/2015 09:42 PM, Tyler Baker wrote: > On 27 November 2015 at 10:53, Tyler Baker wrote: >> On 27 November 2015 at 09:08, Tyler Baker wrote: >>> On 27 November 2015 at 00:54, Christian Borntraeger >>> wrote: >>>> On 11/26/2015 09:47 PM, Christian Borntraeger wrote: >>>>> On 11/26/2015 05:17 PM, Tyler Baker wrote: >>>>>> Hi Christian, >>>>>> >>>>>> The kernelci.org bot recently has been reporting kvm guest boot >>>>>> failures[1] on various arm64 platforms in next-20151126. The bot >>>>>> bisected[2] the failures to the commit in -next titled "KVM: Create >>>>>> debugfs dir and stat files for each VM". I confirmed by reverting this >>>>>> commit on top of next-20151126 it resolves the boot issue. >>>>>> >>>>>> In this test case the host and guest are booted with the same kernel. >>>>>> The host is booted over nfs, installs qemu (qemu-system arm64 2.4.0), >>>>>> and launches a guest. The host is booting fine, but when the guest is >>>>>> launched it errors with "Failed to retrieve host CPU features!". I >>>>>> checked the host logs, and found an "Unable to handle kernel paging >>>>>> request" splat[3] which occurs when the guest is attempting to start. >>>>>> >>>>>> I scanned the patch in question but nothing obvious jumped out at me, >>>>>> any thoughts? >>>>> Not really. >>>>> Do you have processing running that do read the files in /sys/kernel/debug/kvm/* ? >>>>> >>>>> If I read the arm oops message correctly it oopsed inside >>>>> __srcu_read_lock. there is actually nothing in there that can oops, >>>>> except the access to the preempt count. I am just guessing right now, >>>>> but maybe the preempt variable is no longer available (as the process >>>>> is gone). As long as a debugfs file is open, we hold a reference to >>>>> the kvm, which holds a reference to the mm, so the mm might be killed >>>>> after the process. But this is supposed to work, so maybe its something >>>>> different. An objdump of __srcu_read_lock might help. >>>> Hmm, the preempt thing is done in srcu_read_lock, but the crash is in >>>> __srcu_read_lock. This function gets the srcu struct from mmu_notifier.c, >>>> which must be present and is initialized during boot. >>>> >>>> >>>> int __srcu_read_lock(struct srcu_struct *sp) >>>> { >>>> int idx; >>>> >>>> idx = READ_ONCE(sp->completed) & 0x1; >>>> __this_cpu_inc(sp->per_cpu_ref->c[idx]); >>>> smp_mb(); /* B */ /* Avoid leaking the critical section. */ >>>> __this_cpu_inc(sp->per_cpu_ref->seq[idx]); >>>> return idx; >>>> } >>>> >>>> Looking at the code I have no clue why the patch does make a difference. >>>> Can you try to get an objdump -S for__Srcu_read_lock? >> Some other interesting finding below... >> >> On the host, I do _not_ have any nodes under /sys/kernel/debug/kvm/ >> >> Running strace on the qemu command I use to launch the guest yields >> the following. >> >> [pid 5963] 1448649724.405537 mmap(NULL, 65536, PROT_READ|PROT_WRITE, >> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6652a000 >> [pid 5963] 1448649724.405586 read(13, "MemTotal: 16414616 >> kB\nMemF"..., 1024) = 1024 >> [pid 5963] 1448649724.405699 close(13) = 0 >> [pid 5963] 1448649724.405755 munmap(0x7f6652a000, 65536) = 0 >> [pid 5963] 1448649724.405947 brk(0x2552f000) = 0x2552f000 >> [pid 5963] 1448649724.406148 openat(AT_FDCWD, "/dev/kvm", >> O_RDWR|O_CLOEXEC) = 13 >> [pid 5963] 1448649724.406209 ioctl(13, KVM_CREATE_VM, 0) = -1 ENOMEM >> (Cannot allocate memory) > If I comment the call to kvm_create_vm_debugfs(kvm) the guest boots > fine. I put some printk's in the kvm_create_vm_debugfs() function and > it's returning -ENOMEM after it evaluates !kvm->debugfs_dentry. I was > chatting with some folks from the Linaro virtualization team and they > mentioned that ARM is a bit special as the same PID creates two vms in > quick succession, the first one is a scratch vm, and the other is the > 'real' vm. With that bit of info, I suspect we may be trying to create > the debugfs directory twice, and the second time it's failing because > it already exists. > > Cheers, > > Tyler > After a quick look into qemu I guess I've found the problem: kvm_init creates a vm, does checking and self initialization and then calls kvm_arch_init. The arch initialization indirectly calls kvm_arm_create_scratch_host_vcpu and that's where the trouble begins, as it also creates a VM. My assumption was, that nobody would create multiple VMs under the same PID. Christian and I are working on a solution on kernel side. Cheers Janosch