From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 23 Sep 2024 11:37:14 -0700
In-Reply-To: <20240703021043.13881-1-yan.y.zhao@intel.com>
Precedence: bulk
X-Mailing-List:
linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20240703020921.13855-1-yan.y.zhao@intel.com> <20240703021043.13881-1-yan.y.zhao@intel.com>
Subject: Re: [PATCH v2 1/4] KVM: x86/mmu: Introduce a quirk to control memslot zap behavior
From: Sean Christopherson
To: Yan Zhao
Cc: pbonzini@redhat.com, rick.p.edgecombe@intel.com, kai.huang@intel.com,
 isaku.yamahata@intel.com, dmatlack@google.com, sagis@google.com,
 erdemaktas@google.com, graf@amazon.com, linux-kernel@vger.kernel.org,
 kvm@vger.kernel.org
Content-Type: text/plain; charset="us-ascii"

On Wed, Jul 03, 2024, Yan Zhao wrote:
> Introduce the quirk KVM_X86_QUIRK_SLOT_ZAP_ALL to allow users to select
> KVM's behavior when a memslot is moved or deleted for KVM_X86_DEFAULT_VM
> VMs. Make sure KVM behave as if the quirk is always disabled for
> non-KVM_X86_DEFAULT_VM VMs.

...

> Suggested-by: Kai Huang
> Suggested-by: Sean Christopherson

Bad Sean, bad.

> +/*
> + * Zapping leaf SPTEs with memslot range when a memslot is moved/deleted.
> + *
> + * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst
> + * case scenario we'll have unused shadow pages lying around until they
> + * are recycled due to age or when the VM is destroyed.
> + */
> +static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *slot)
> +{
> +	struct kvm_gfn_range range = {
> +		.slot = slot,
> +		.start = slot->base_gfn,
> +		.end = slot->base_gfn + slot->npages,
> +		.may_block = true,
> +	};
> +	bool flush = false;
> +
> +	write_lock(&kvm->mmu_lock);
> +
> +	if (kvm_memslots_have_rmaps(kvm))
> +		flush = kvm_handle_gfn_range(kvm, &range, kvm_zap_rmap);

This, and Paolo's merged variant, break shadow paging.  As was tried in commit
4e103134b862 ("KVM: x86/mmu: Zap only the relevant pages when removing a memslot"),
all shadow pages, i.e. non-leaf SPTEs, need to be zapped.  All of the accounting
for a shadow page is tied to the memslot, i.e. the shadow page holds a reference
to the memslot, for all intents and purposes.  Deleting the memslot without
removing all relevant shadow pages results in NULL pointer derefs when tearing
down the VM.

Note, that commit is/was buggy, and I suspect my follow-up attempt[*] was as well.

[*] https://lore.kernel.org/all/20190820200318.GA15808@linux.intel.com

Rather than trying to get this functional for shadow paging (which includes
nested TDP), I think we should scrap the quirk idea and simply make this the
behavior for S-EPT and nothing else.

  BUG: kernel NULL pointer dereference, address: 00000000000000b0
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 6085f43067 P4D 608c080067 PUD 608c081067 PMD 0
  Oops: Oops: 0000 [#1] SMP NOPTI
  CPU: 79 UID: 0 PID: 187063 Comm: set_memory_regi Tainted: G W 6.11.0-smp--24867312d167-cpl #395
  Tainted: [W]=WARN
  Hardware name: Google Astoria/astoria, BIOS 0.20240617.0-0 06/17/2024
  RIP: 0010:__kvm_mmu_prepare_zap_page+0x3a9/0x7b0 [kvm]
  Code: <48> 8b 8e b0 00 00 00 48 8b 96 e0 00 00 00 48 c1 e9 09 48 29 c8 8b
  RSP: 0018:ff314a25b19f7c28 EFLAGS: 00010212
  Call Trace:
   kvm_arch_flush_shadow_all+0x7a/0xf0 [kvm]
   kvm_mmu_notifier_release+0x6c/0xb0 [kvm]
   mmu_notifier_unregister+0x85/0x140
   kvm_put_kvm+0x263/0x410 [kvm]
   kvm_vm_release+0x21/0x30 [kvm]
   __fput+0x8d/0x2c0
   __se_sys_close+0x71/0xc0
   do_syscall_64+0x83/0x160
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
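
The failure mode described in this mail (shadow-page accounting tied to a
memslot that has since been deleted) can be sketched in user space. This is
a toy model, not KVM code: every toy_* name is hypothetical, and the assumption
that teardown re-looks-up the slot by gfn is a simplification made for
illustration only. Where the toy returns false, the real kernel would
dereference the NULL slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SLOTS 4

/* Hypothetical stand-in for a memslot; not struct kvm_memory_slot. */
struct toy_memslot {
	unsigned long base_gfn;
	unsigned long npages;
	long nr_shadow_pages;	/* per-slot accounting updated by shadow pages */
};

/* Hypothetical stand-in for a shadow page that shadows one gfn. */
struct toy_shadow_page {
	unsigned long gfn;
	bool zapped;
};

struct toy_vm {
	struct toy_memslot *slots[TOY_MAX_SLOTS];
};

/* Return the slot covering gfn, or NULL if no live slot covers it. */
struct toy_memslot *toy_gfn_to_memslot(struct toy_vm *vm, unsigned long gfn)
{
	for (size_t i = 0; i < TOY_MAX_SLOTS; i++) {
		struct toy_memslot *s = vm->slots[i];

		if (s && gfn >= s->base_gfn && gfn < s->base_gfn + s->npages)
			return s;
	}
	return NULL;
}

/* Creating a shadow page bumps the accounting of the slot it maps into. */
bool toy_account_shadow_page(struct toy_vm *vm, struct toy_shadow_page *sp)
{
	struct toy_memslot *slot = toy_gfn_to_memslot(vm, sp->gfn);

	if (!slot)
		return false;
	slot->nr_shadow_pages++;
	return true;
}

/*
 * Zapping a shadow page must undo that accounting, so it needs the slot.
 * The toy reports failure when the slot is gone instead of crashing.
 */
bool toy_zap_shadow_page(struct toy_vm *vm, struct toy_shadow_page *sp)
{
	struct toy_memslot *slot = toy_gfn_to_memslot(vm, sp->gfn);

	if (!slot)
		return false;
	slot->nr_shadow_pages--;
	sp->zapped = true;
	return true;
}

/* Delete a slot WITHOUT zapping the shadow pages that still refer to it. */
void toy_delete_memslot(struct toy_vm *vm, size_t idx)
{
	assert(idx < TOY_MAX_SLOTS);
	vm->slots[idx] = NULL;
}
```

In this model, accounting a shadow page, deleting its slot, and then zapping
the page at "VM teardown" makes the zap fail its slot lookup. Zapping every
shadow page that refers to the slot before deleting it avoids the dangling
reference.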