From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 20 May 2026 07:21:31 -0700
Precedence: bulk
X-Mailing-List: linux-doc@vger.kernel.org
Mime-Version: 1.0
References:
 <20260507-gmem-inplace-conversion-v6-0-91ab5a8b19a4@google.com>
 <20260507-gmem-inplace-conversion-v6-6-91ab5a8b19a4@google.com>
Subject: Re: [PATCH v6 06/43] KVM: x86/mmu: Bug the VM if gmem attributes are
 queried to determine max mapping level
From: Sean Christopherson
To: Fuad Tabba
Cc: ackerleytng@google.com, aik@amd.com, andrew.jones@linux.dev,
 binbin.wu@linux.intel.com, brauner@kernel.org, chao.p.peng@linux.intel.com,
 david@kernel.org, ira.weiny@intel.com, jmattson@google.com,
 jthoughton@google.com, michael.roth@amd.com, oupton@kernel.org,
 pankaj.gupta@amd.com, qperret@google.com, rick.p.edgecombe@intel.com,
 rientjes@google.com, shivankg@amd.com, steven.price@arm.com,
 willy@infradead.org, wyihan@google.com, yan.y.zhao@intel.com,
 forkloop@google.com, pratyush@kernel.org, suzuki.poulose@arm.com,
 aneesh.kumar@kernel.org, liam@infradead.org, Paolo Bonzini,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org,
 "H. Peter Anvin", Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
 Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve, Andrew Morton,
 Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
 Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
 Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-mm@kvack.org, linux-coco@lists.linux.dev
Content-Type: text/plain; charset="us-ascii"

On Wed, May 20, 2026, Fuad Tabba wrote:
> On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay wrote:
> >
> > From: Ackerley Tng
> >
> > When the maximum mapping level is queried, KVM's MMU lock is held, and
> > while the MMU lock is held, guest_memfd cannot take the
> > filemap_invalidate_lock() to look up the current shared/private state of
> > the gfn, for these reasons:
> >
> > + The MMU lock is a spinlock or rwlock and cannot be held while taking a
> >   lock that can sleep.
> > + In guest_memfd's code paths (such as truncate), the
> >   filemap_invalidate_lock() is held while taking the MMU lock, and taking
> >   the locks in reverse order would introduce a AB-BA deadlock.
> >
> > Currently, the maximum mapping level is only queried from guest_memfd in
> > the process of recovering huge pages, if dirty logging is disabled on a
> > memslot. Dirty logging is not currently supported for guest_memfd, and
> > guest_memfd memslots also cannot be updated.
> >
> > For now, bug the VM if guest_memfd needs to be queried to determine the
> > maximum mapping level. This guard can be removed if/when support is added.
> >
> > Signed-off-by: Ackerley Tng
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index a80a876ab4ad6..153bcc5369985 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -3357,6 +3357,15 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
> >  		max_level = fault->max_level;
> >  		is_private = fault->is_private;
> >  	} else {
> > +		/*
> > +		 * Memory attributes cannot be obtained from guest_memfd while
> > +		 * the MMU lock is held.
> > +		 */
> > +		if (KVM_BUG_ON(static_call_query(__kvm_get_memory_attributes) ==
> > +			       kvm_gmem_get_memory_attributes, kvm)) {
> > +			return 0;
> > +		}
> > +
>
> This directly takes the address of kvm_gmem_get_memory_attributes,
> which is only compiled if CONFIG_KVM_GUEST_MEMFD=y. This breaks
> ARCH=i386.

And this bleeds guest_memfd implementation details into places they don't
belong. The right way to deal with this is to use lockdep_assert_not_held() in
whatever code mustn't run with mmu_lock held. E.g.

diff --git virt/kvm/guest_memfd.c virt/kvm/guest_memfd.c
index c9f155c2dc5c..3bea9c1137ef 100644
--- virt/kvm/guest_memfd.c
+++ virt/kvm/guest_memfd.c
@@ -547,6 +547,9 @@ unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
 	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
 	struct inode *inode;
 
+	/* Comment goes here. */
+	lockdep_assert_not_held(&kvm->mmu_lock);
+
 	/*
 	 * If this gfn has no associated memslot, there's no chance of the gfn
 	 * being backed by private memory, since guest_memfd must be used for

But I'm confused, because kvm_gmem_get_memory_attributes() doesn't actually take
filemap_invalidate_lock(), so what exactly is the problem?

> >  		max_level = PG_LEVEL_NUM;
> >  		is_private = kvm_mem_is_private(kvm, gfn);
> >  	}
> >
> > --
> > 2.54.0.563.g4f69b47b94-goog
> >
> >
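
For anyone following along, here is a rough userspace sketch of the idea behind
the lockdep_assert_not_held() suggestion above: the check lives in the callee
that must not run under the lock, so callers need no knowledge of the callee's
implementation. Everything named sketch_* is hypothetical and exists only for
illustration. Real lockdep tracks held locks per task and only checks when
lockdep is enabled; this toy version uses a single flag and assumes one thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a lock whose held state can be queried. */
struct sketch_lock {
	const char *name;
	bool held;	/* true while the (single) caller holds the lock */
};

static void sketch_lock_acquire(struct sketch_lock *l)
{
	l->held = true;
}

static void sketch_lock_release(struct sketch_lock *l)
{
	l->held = false;
}

/*
 * Toy analogue of a "must not be held" assertion: warn and report failure
 * if the lock is held, otherwise report success.
 */
static bool sketch_assert_not_held(const struct sketch_lock *l)
{
	if (l->held) {
		fprintf(stderr, "WARN: %s held where it must not be\n", l->name);
		return false;
	}
	return true;
}

/*
 * Hypothetical lookup that must not run with mmu_lock held.  The callee
 * states its own locking rule; the caller is not special-cased.
 */
static bool sketch_get_attributes(const struct sketch_lock *mmu_lock)
{
	return sketch_assert_not_held(mmu_lock);
}
```

Calling sketch_get_attributes() between sketch_lock_acquire() and
sketch_lock_release() trips the warning, and calling it outside that window
does not. That is the property the one-line assertion in the diff above is
meant to document and enforce.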