From mboxrd@z Thu Jan  1 00:00:00 1970
From: bugzilla-daemon@kernel.org
To: kvm@vger.kernel.org
Subject: [Bug 219588] [6.13.0-rc2+]WARNING: CPU: 52 PID: 12253 at arch/x86/kvm/mmu/tdp_mmu.c:1001 tdp_mmu_map_handle_target_level+0x1f0/0x310 [kvm]
Date: Wed, 11 Dec 2024 16:12:14 +0000
X-Bugzilla-Reason: None
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: AssignedTo virtualization_kvm@kernel-bugs.osdl.org
X-Bugzilla-Product: Virtualization
X-Bugzilla-Component: kvm
X-Bugzilla-Version: unspecified
X-Bugzilla-Severity: normal
X-Bugzilla-Who: seanjc@google.com
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: virtualization_kvm@kernel-bugs.osdl.org
X-Bugzilla-URL: https://bugzilla.kernel.org/
Content-Type: text/plain; charset="UTF-8"
Auto-Submitted: auto-generated
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
MIME-Version: 1.0

https://bugzilla.kernel.org/show_bug.cgi?id=219588

--- Comment #1 from Sean Christopherson (seanjc@google.com) ---
On Wed, Dec 11, 2024, bugzilla-daemon@kernel.org wrote:
> I hit a bug on the intel host, this problem occurs randomly:
> [  406.127925] ------------[ cut here ]------------
> [  406.132572] WARNING: CPU: 52 PID: 12253 at arch/x86/kvm/mmu/tdp_mmu.c:1001
> tdp_mmu_map_handle_target_level+0x1f0/0x310 [kvm]

Can you describe the host activity at the time of the WARN?  E.g. is it under
memory pressure and potentially swapping, is KSM or NUMA balancing active?
I have a sound theory for how the scenario occurs on KVM's end, but I still
think it's wrong for KVM to overwrite a writable SPTE with a read-only SPTE in
this situation.

And does the VM have device memory or any other type of VM_PFNMAP or VM_IO
memory exposed to it?  E.g. an assigned device?  If so, can you provide the
register state from the other WARNs?  If the PFNs are all in the same range,
then maybe this is something funky with the VM_PFNMAP | VM_IO path.

The WARN is a sanity check I added because it should be impossible for KVM to
install a non-writable SPTE overtop an existing writable SPTE.  Or so I thought.
The WARN is benign in the sense that nothing bad will happen _in KVM_; KVM
correctly handles the unexpected change, the WARN is there purely to flag that
something unexpected happened.

        if (new_spte == iter->old_spte)
                ret = RET_PF_SPURIOUS;
        else if (tdp_mmu_set_spte_atomic(vcpu->kvm, iter, new_spte))
                return RET_PF_RETRY;
        else if (is_shadow_present_pte(iter->old_spte) &&
                 (!is_last_spte(iter->old_spte, iter->level) ||
                  WARN_ON_ONCE(leaf_spte_change_needs_tlb_flush(iter->old_spte, new_spte)))) <====
                kvm_flush_remote_tlbs_gfn(vcpu->kvm, iter->gfn, iter->level);

Cross referencing the register state

  RAX: 860000025e000bf7 RBX: ff4af92c619cf920 RCX: 0400000000000000
  RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000015
  RBP: ff4af92c619cf9e8 R08: 800000025e0009f5 R09: 0000000000000002
  R10: 000000005e000901 R11: 0000000000000001 R12: ff1e70694fc68000
  R13: 0000000000000005 R14: 0000000000000000 R15: ff4af92c619a1000

with the disassembly

  4885C8          TEST RAX,RCX
  0F84EEFEFFFF    JE 0000000000000-F1
  4985C8          TEST R8,RCX
  0F85E5FEFFFF    JNE 0000000000000-F1
  0F0B            UD2

RAX is the old SPTE and R8 is the new SPTE (RCX holds the bit 58 mask that both
are tested against), i.e.
the SPTE change is:

  860000025e000bf7
  800000025e0009f5

On Intel, bits 57 and 58 are the host-writable and MMU-writable flags

  #define EPT_SPTE_HOST_WRITABLE  BIT_ULL(57)
  #define EPT_SPTE_MMU_WRITABLE   BIT_ULL(58)

which means KVM is overwriting a writable SPTE with a non-writable SPTE because
the current vCPU (a) hit a READ or EXEC fault on a non-present SPTE and (b)
retrieved a non-writable PFN from the primary MMU, and that fault raced with a
WRITE fault on a different vCPU that retrieved and installed a writable PFN.

On a READ or EXEC fault, this code in hva_to_pfn_slow() should get a writable
PFN.  Given that KVM has a valid writable SPTE, the corresponding PTE in the
primary MMU *must* be writable; otherwise there's a missing mmu_notifier
invalidation.

        /* map read fault as writable if possible */
        if (!(flags & FOLL_WRITE) && kfp->map_writable &&
            get_user_page_fast_only(kfp->hva, FOLL_WRITE, &wpage)) {
                put_page(page);
                page = wpage;
                flags |= FOLL_WRITE;
        }

  out:
        *pfn = kvm_resolve_pfn(kfp, page, NULL, flags & FOLL_WRITE);
        return npages;

Hmm, gup_fast_folio_allowed() has a few conditions where it will reject fast
GUP, but they should be mutually exclusive with KVM having a writable SPTE.  If
the mapping is truncated or the folio is swapped out, secondary MMUs need to be
invalidated before folio->mapping is nullified.

        /*
         * The mapping may have been truncated, in any case we cannot determine
         * if this mapping is safe - fall back to slow path to determine how to
         * proceed.
         */
        if (!mapping)
                return false;

And secretmem can't be GUP'd, and it's not a long-term pin, so these checks
don't apply either:

        if (check_secretmem && secretmem_mapping(mapping))
                return false;
        /* The only remaining allowed file system is shmem. */
        return !reject_file_backed || shmem_mapping(mapping);

Similarly, hva_to_pfn_remapped() should get a writable PFN if said PFN is
writable in the primary MMU, regardless of the fault type.
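
For anyone who wants to double-check the SPTE decode above, here is a minimal
userspace sketch (not kernel code).  BIT_ULL() is redefined locally, and
spte_drops_bit() and spte_changed_bits() are hypothetical helpers made up for
this illustration; they are not KVM functions.  The two SPTE values are the
ones from the register dump.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT_ULL(); assumption for this sketch. */
#define BIT_ULL(n)              (1ULL << (n))
#define EPT_SPTE_HOST_WRITABLE  BIT_ULL(57)
#define EPT_SPTE_MMU_WRITABLE   BIT_ULL(58)

/* SPTE values taken from the register dump in the report. */
#define OLD_SPTE 0x860000025e000bf7ULL
#define NEW_SPTE 0x800000025e0009f5ULL

/* Hypothetical helper: true if 'bit' is set in the old SPTE but not the new. */
static bool spte_drops_bit(uint64_t old_spte, uint64_t new_spte, uint64_t bit)
{
        return (old_spte & bit) && !(new_spte & bit);
}

/* Hypothetical helper: mask of every bit that differs between the SPTEs. */
static uint64_t spte_changed_bits(uint64_t old_spte, uint64_t new_spte)
{
        return old_spte ^ new_spte;
}
```

Calling spte_drops_bit(OLD_SPTE, NEW_SPTE, ...) with either flag returns true,
i.e. the new SPTE loses both the host-writable and the MMU-writable bit.
spte_changed_bits() additionally shows that the only other bits that differ
are in the low 12 bits, so the PFN itself is unchanged.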

If this turns out to be a legitimate scenario, then I think it makes sense to
add an is_access_allowed() check and treat the fault as spurious.  But I would
like to try to bottom out on what exactly is happening, because I'm mildly
concerned something is buggy in the primary MMU.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.