From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mm01.cs.columbia.edu (mm01.cs.columbia.edu [128.59.11.253]) by smtp.lore.kernel.org (Postfix) with ESMTP id A265CC19F2D for ; Sat, 13 Aug 2022 13:32:30 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by mm01.cs.columbia.edu (Postfix) with ESMTP id 015344F5B2; Sat, 13 Aug 2022 09:32:30 -0400 (EDT) X-Virus-Scanned: at lists.cs.columbia.edu Authentication-Results: mm01.cs.columbia.edu (amavisd-new); dkim=softfail (fail, message has been altered) header.i=@kernel.org Received: from mm01.cs.columbia.edu ([127.0.0.1]) by localhost (mm01.cs.columbia.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id MN+x1hdEahLt; Sat, 13 Aug 2022 09:32:28 -0400 (EDT) Received: from mm01.cs.columbia.edu (localhost [127.0.0.1]) by mm01.cs.columbia.edu (Postfix) with ESMTP id 79C234F5B3; Sat, 13 Aug 2022 09:32:28 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by mm01.cs.columbia.edu (Postfix) with ESMTP id 2B9A44F5B1 for ; Sat, 13 Aug 2022 09:32:27 -0400 (EDT) X-Virus-Scanned: at lists.cs.columbia.edu Received: from mm01.cs.columbia.edu ([127.0.0.1]) by localhost (mm01.cs.columbia.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id qKPzvAyE7KVy for ; Sat, 13 Aug 2022 09:32:25 -0400 (EDT) Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by mm01.cs.columbia.edu (Postfix) with ESMTPS id 9C90C4F5B0 for ; Sat, 13 Aug 2022 09:32:25 -0400 (EDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id EF318B800E2; Sat, 13 Aug 2022 13:32:23 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 95D22C433C1; Sat, 13 Aug 2022 13:32:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1660397542; bh=4tt7rfBTVdPNcYX1M4pLZ2pBDu1cWuKg6CQTeYZJTL8=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=pQAAWq9Q3rJzp7x/PtLEfq3vmnHAh0bW8QQEYaQhkuFYyo6EoeKg2W+Z9XiyxR1ZX xq52lNyN6iYxFzCSQPMS8zQEq/rCIbadzx+kwKfhLYjKGbBbTrcVrqfqdrSlu5Hs0Z zCIhVBiPuqEf2V5tNzqVdWJueqzUMOUyhgwJIKTV84xhTxOAuytriiAcjLhBOEO+uV 8ZPBpRm9dfEXftW7mn24xtGOfBzNA2bnRDKllwx0nYw4NQfCECyLY5w/Xb7g0R3yUI bdVtH5hcfNQzk8gLd7O7mdQCYt0ExHt+kEj9F9QdhI8y0e2DV5p/IVk+rVwX4sKQ6S 2IyOSx7r0NsDw== Received: from sofa.misterjones.org ([185.219.108.64] helo=wait-a-minute.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1oMrFU-002iqS-AU; Sat, 13 Aug 2022 14:32:20 +0100 Date: Sat, 13 Aug 2022 14:32:19 +0100 Message-ID: <8735e0s1zw.wl-maz@kernel.org> From: Marc Zyngier To: Vitaly Chikunov Subject: Re: qemu-system-aarch64: Failed to retrieve host CPU features In-Reply-To: <20220813111137.5plgwrfnosolj2bp@altlinux.org> References: <20220812021427.cwenhciuftgtaj64@altlinux.org> <20220812084529.ur5qcyws5qvoyvuc@altlinux.org> <87h72hv71u.wl-maz@kernel.org> <20220813111137.5plgwrfnosolj2bp@altlinux.org> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/27.1 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") X-SA-Exim-Connect-IP: 185.219.108.64 X-SA-Exim-Rcpt-To: vt@altlinux.org, peter.maydell@linaro.org, qemu-arm@nongnu.org, kvmarm@lists.cs.columbia.edu, qemu-devel@nongnu.org, ldv@altlinux.org X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false Cc: qemu-arm@nongnu.org, "Dmitry V. Levin" , kvmarm , QEMU Developers X-BeenThere: kvmarm@lists.cs.columbia.edu X-Mailman-Version: 2.1.14 Precedence: list List-Id: Where KVM/ARM decisions are made List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: kvmarm-bounces@lists.cs.columbia.edu Sender: kvmarm-bounces@lists.cs.columbia.edu On Sat, 13 Aug 2022 12:11:37 +0100, Vitaly Chikunov wrote: > > Marc, > > On Fri, Aug 12, 2022 at 04:02:37PM +0100, Marc Zyngier wrote: > > On Fri, 12 Aug 2022 10:25:55 +0100, > > Peter Maydell wrote: > > > > > > I've added some more relevant mailing lists to the cc. > > > > > > On Fri, 12 Aug 2022 at 09:45, Vitaly Chikunov wrote: > > > > On Fri, Aug 12, 2022 at 05:14:27AM +0300, Vitaly Chikunov wrote: > > > > > I noticed that we starting to get many errors like this: > > > > > > > > > > qemu-system-aarch64: Failed to retrieve host CPU features > > > > > > > > > > Where many is 1-2% per run, depends on host, host is Kunpeng-920, and > > > > > Linux kernel is v5.15.59, but it started to appear months before that. > > > > > > > > > > strace shows in erroneous case: > > > > > > > > > > 1152244 ioctl(9, KVM_CREATE_VM, 0x30) = -1 EINTR (Interrupted system call) > > > > > > > > > > And I see in target/arm/kvm.c:kvm_arm_create_scratch_host_vcpu: > > > > > > > > > > vmfd = ioctl(kvmfd, KVM_CREATE_VM, max_vm_pa_size); > > > > > if (vmfd < 0) { > > > > > goto err; > > > > > } > > > > > > > > > > Maybe it should restart ioctl on EINTR? > > > > > > > > > > I don't see EINTR documented in ioctl(2) nor in Linux' > > > > > Documentation/virt/kvm/api.rst for KVM_CREATE_VM, but for KVM_RUN it > > > > > says "an unmasked signal is pending". > > > > > > > > I am suggested that almost any blocking syscall could return EINTR, so I > > > > checked the strace log and it does not show evidence of arriving a signal, > > > > the log ends like this: > > > > > > > > 1152244 openat(AT_FDCWD, "/dev/kvm", O_RDWR|O_CLOEXEC) = 9 > > > > 1152244 ioctl(9, KVM_CHECK_EXTENSION, KVM_CAP_ARM_VM_IPA_SIZE) = 48 > > > > 1152244 ioctl(9, KVM_CREATE_VM, 0x30) = -1 EINTR (Interrupted system call) > > > > 1152244 close(9) = 0 > > > > 1152244 newfstatat(2, "", {st_dev=makedev(0, 0xd), st_ino=57869925, st_mode=S_IFIFO|0600, st_nlink=1, st_uid=517, st_gid=517, st_blksize=4096, st_blocks=0, st_size=0, st_atime=1660268019 /* 2022-08-12T01:33:39.850436293+0000 */, st_atime_nsec=850436293, st_mtime=1660268019 /* 2022-08-12T01:33:39.850436293+0000 */, st_mtime_nsec=850436293, st_ctime=1660268019 /* 2022-08-12T01:33:39.850436293+0000 */, st_ctime_nsec=850436293}, AT_EMPTY_PATH) = 0 > > > > 1152244 write(2, "qemu-system-aarch64: Failed to r"..., 58) = 58 > > > > 1152244 exit_group(1) = ? > > > > 1152245 <... clock_nanosleep resumed> ) = ? > > > > 1152245 +++ exited with 1 +++ > > > > 1152244 +++ exited with 1 +++ > > > > > > KVM folks: should we expect that KVM_CREATE_VM might fail EINTR > > > and need retrying? > > > > In general, yes. But for this particular one, this is pretty odd. > > > > The only path I can so far see that would match this behaviour is if > > mm_take_all_locks() (called from __mmu_notifier_register()) was > > getting interrupted by a signal (I'm looking at a 5.19-ish kernel, > > which may slightly differ from the 5.15 mentioned above). > > > > But as Vitaly points out, it doesn't seem to be a signal delivered > > here. > > > > Vitaly: could you please share your exact test case (full qemu command > > line), and instrument your kernel to see if mm_take_all_locks() is the > > one failing? > > Full command is `qemu-system-aarch64 -M accel=kvm:tcg -m 4096M -smp > cores=8 -nodefaults -nographic -no-reboot -fsdev > local,id=root,path=/,security_model=none,multidevs=remap -device > virtio-9p-pci,fsdev=root,mount_tag=/dev/root -device virtio-rng-pci > -serial mon:stdio -kernel /boot/vmlinuz-5.18.16-un-def-alt1 -initrd > /usr/src/tmp/initramfs-5.18.16-un-def-alt1.img -sandbox on,spawn=deny -M > virt,gic-version=3 -cpu max -append 'console=ttyAMA0 mitigations=off > nokaslr quiet panic=-1 SCRIPT=/usr/src/tmp/tmp.458pkF5r8d'`. > > But a minified reproducer is `qemu-system-aarch64 -M virt,accel=kvm -cpu max -kernel qwe`. Ah, couldn't be smaller! ;-) Thanks for that! > > I traced with kprobe event tracer with the probes: > > echo 'p kvm_dev_ioctl' > kprobe_events > echo 'r kvm_dev_ioctl $retval' >> kprobe_events > echo 'p mm_take_all_locks' >> kprobe_events > echo 'r mm_take_all_locks $retval' >> kprobe_events > > Then run reproducer in the loop > > for ((i=0;i<100;i++));do echo $i; strace -fvo bb.$i qemu-system-aarch64 -M virt,accel=kvm -cpu max -kernel qwe 2>&1 | grep Failed && break; done > > And on 97th iteration it's failed, strace log to check that PID is the same with > the following trace: > > 3611463 ioctl(9, KVM_CREATE_VM, 0x30) = -1 EINTR (Interrupted system call) > > Event trace log: > > qemu-system-aar-3611463 [002] d.... 342920.535549: p_kvm_dev_ioctl_0: (kvm_dev_ioctl+0x0/0x910) > qemu-system-aar-3611463 [002] d.... 342920.535635: p_mm_take_all_locks_0: (mm_take_all_locks+0x0/0x390) > qemu-system-aar-3611463 [002] ..... 342920.535657: r_mm_take_all_locks_0: (__mmu_notifier_register+0x44/0x200 <- mm_take_all_locks) arg1=0xfffffffc > qemu-system-aar-3611463 [002] ..... 342920.535695: r_kvm_dev_ioctl_0: (__arm64_sys_ioctl+0xcc/0x124 <- kvm_dev_ioctl) arg1=0xfffffffffffffffc > > So that's mm_take_all_locks returns EINTR. > > (I tried also to make C reproducer which opens /dev/kvm and call > `ioctl(kvm, KVM_CREATE_VM, 0)` in the loop and it does not reproduce the > problem, so it seems it's also something additional that QEMU does.) On my fast test box, it takes several hundreds of iterations to fire, but that's not completely impossible to trigger (strace actually helps here). I was expecting it to fire more on a slow one, but that's not the case. And indeed, no signal seems to be delivered. I suspect that FIT_NOTIFY_SIGNAL is set, but I need to investigate more. With a reproducer, it should be easier to diagnose fully. But we probably need to handle EINTR when creating the mini VM. M. -- Without deviation from the norm, progress is not possible. _______________________________________________ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm