From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 27 Sep 2024 07:32:10 -0700
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <6eecc450d0326c9bedfbb34096a0279410923c8d.1726182754.git.isaku.yamahata@intel.com>
Message-ID:
Subject: Re: [PATCH] KVM: x86/tdp_mmu: Trigger the callback only when an interesting change
From: Sean Christopherson
To: Yan Zhao
Cc: Isaku Yamahata, kvm@vger.kernel.org, sagis@google.com, chao.gao@intel.com,
	pbonzini@redhat.com, rick.p.edgecombe@intel.com, linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="us-ascii"

On Fri, Sep 27, 2024, Sean Christopherson wrote:
> On Thu, Sep 26, 2024, Yan Zhao wrote:
> > On Thu, Sep 12, 2024 at 05:07:57PM -0700, Sean Christopherson wrote:
> > > On Thu, Sep 12, 2024, Isaku Yamahata wrote:
> > > Right now, the fixes for make_spte() are sitting toward the end of the massive
> > > kvm_follow_pfn() rework (80+ patches and counting), but despite the size, I am
> > > fairly confident that series can land in 6.13 (lots and lots of small patches).
> > >
> > > ---
> > > Author:     Sean Christopherson
> > > AuthorDate: Thu Sep 12 16:23:21 2024 -0700
> > > Commit:     Sean Christopherson
> > > CommitDate: Thu Sep 12 16:35:06 2024 -0700
> > >
> > >     KVM: x86/mmu: Flush TLBs if resolving a TDP MMU fault clears W or D bits
> > >
> > >     Do a remote TLB flush if installing a leaf SPTE overwrites an existing
> > >     leaf SPTE (with the same target pfn) and clears the Writable bit or the
> > >     Dirty bit.  KVM isn't _supposed_ to clear Writable or Dirty bits in such
> > >     a scenario, but make_spte() has a flaw where it will fail to set the Dirty
> > >     if the existing SPTE is writable.
> > >
> > >     E.g. if two vCPUs race to handle faults, the KVM will install a W=1,D=1
> > >     SPTE for the first vCPU, and then overwrite it with a W=1,D=0 SPTE for the
> > >     second vCPU.  If the first vCPU (or another vCPU) accesses memory using
> > >     the W=1,D=1 SPTE, i.e. creates a writable, dirty TLB entry, and that is
> > >     the only SPTE that is dirty at the time of the next relevant clearing of
> > >     the dirty logs, then clear_dirty_gfn_range() will not modify any SPTEs
> > >     because it sees the D=0 SPTE, and thus will complete the clearing of the
> > >     dirty logs without performing a TLB flush.
> > But it looks that kvm_flush_remote_tlbs_memslot() will always be invoked no
> > matter clear_dirty_gfn_range() finds a D bit or not.
>
> Oh, right, I forgot about that.  I'll tweak the changelog to call that out before
> posting.  Hmm, and I'll drop the Cc: stable@ too, as commit b64d740ea7dd ("kvm:
> x86: mmu: Always flush TLBs when enabling dirty logging") was a bug fix, i.e. if
> anything should be backported it's that commit.

Actually, a better idea.  I think it makes sense to fully commit to not flushing
when overwriting SPTEs, and instead rely on the dirty logging logic to do a remote
TLB flush.

E.g. on top of this change in the mega-series is a cleanup to unify the TDP MMU
and shadow MMU logic for clearing Writable and Dirty bits, with this comment
(which is a massaged version of an existing comment for mmu_spte_update()):

/*
 * Whenever an MMU-writable SPTE is overwritten with a read-only SPTE, remote
 * TLBs must be flushed.  Otherwise write-protecting the gfn may find a read-
 * only SPTE, even though the writable SPTE might be cached in a CPU's TLB.
 *
 * Remote TLBs also need to be flushed if the Dirty bit is cleared, as false
 * negatives are not acceptable, e.g. if KVM is using D-bit based PML on VMX.
 *
 * Don't flush if the Accessed bit is cleared, as access tracking tolerates
 * false negatives, and the one path that does care about TLB flushes,
 * kvm_mmu_notifier_clear_flush_young(), uses mmu_spte_update_no_track().
 *
 * Note, this logic only applies to leaf SPTEs.  The caller is responsible for
 * determining whether or not a TLB flush is required when modifying a shadow-
 * present non-leaf SPTE.
 */

But that comment was made stale by commit b64d740ea7dd.  And looking through the
dirty logging logic, KVM (luckily? thankfully?) flushes based on whether or not
dirty bitmap/ring entries are reaped, not based on whether or not SPTEs were
modified.
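
To make the difference between the two flush triggers concrete, here is a
minimal standalone sketch.  It is a toy model under stated assumptions, not KVM
code: the TOY_* bit positions and all toy_*() helpers are hypothetical names
invented for illustration, and it assumes the gfn was already recorded in the
dirty bitmap when the first writable SPTE was installed.  The first helper
encodes the rule from the comment above (flush when W or D is cleared, not when
only A is cleared); the other two contrast "flush if an SPTE was modified" with
"flush if dirty bitmap bits were reaped".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for this sketch only; not KVM's real SPTE format. */
#define TOY_SPTE_WRITABLE (1ull << 1)
#define TOY_SPTE_ACCESSED (1ull << 8)
#define TOY_SPTE_DIRTY    (1ull << 9)
#define TOY_SPTE_PFN_MASK (~0xfffull)

/*
 * Rule from the quoted comment: overwriting a leaf SPTE needs a remote TLB
 * flush if the Writable or Dirty bit is cleared, but not if only the
 * Accessed bit is cleared.
 */
static bool toy_overwrite_needs_flush(uint64_t old_spte, uint64_t new_spte)
{
	uint64_t cleared = old_spte & ~new_spte;

	/* The sketch only models overwrites that keep the same target pfn. */
	assert((old_spte & TOY_SPTE_PFN_MASK) == (new_spte & TOY_SPTE_PFN_MASK));

	return (cleared & (TOY_SPTE_WRITABLE | TOY_SPTE_DIRTY)) != 0;
}

/*
 * Trigger keyed on SPTE modification: clear the Dirty bit if it is set and
 * report whether the SPTE changed.  A D=0 SPTE is left alone, so a caller
 * relying on this return value would skip the flush.
 */
static bool toy_clear_dirty_spte(uint64_t *spte)
{
	if (!(*spte & TOY_SPTE_DIRTY))
		return false;

	*spte &= ~TOY_SPTE_DIRTY;
	return true;
}

/*
 * Trigger keyed on reaped dirty-log state: take all bits from one bitmap
 * word and report whether any were set, regardless of what the SPTEs say.
 */
static bool toy_reap_dirty_bitmap(uint64_t *bitmap_word)
{
	uint64_t reaped = *bitmap_word;

	*bitmap_word = 0;
	return reaped != 0;
}
```

In the race from the changelog, the surviving SPTE is W=1,D=0 while a stale
W=1,D=1 translation may still sit in a TLB.  In this model the SPTE-based
trigger reports "nothing modified", whereas the bitmap-based trigger still
requests a flush because the gfn's bit was reaped.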