Re: [PATCH v2 4/6] KVM: x86/mmu: Recover TDP MMU huge page mappings in-place instead of zapping

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Vipin Sharma <vipinsh@google.com>
Cc: David Matlack <dmatlack@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org
Subject: Re: [PATCH v2 4/6] KVM: x86/mmu: Recover TDP MMU huge page mappings in-place instead of zapping
Date: Mon, 4 Nov 2024 14:39:02 -0800	[thread overview]
Message-ID: <ZylNBqRnL1lD-T1n@google.com> (raw)
In-Reply-To: <CAHVum0fYkpU-UAyuqrRx+VGi2BFVSupwMhKQ+Q0hY9+15GSTCg@mail.gmail.com>

On Fri, Nov 01, 2024, Vipin Sharma wrote:
> On Wed, Oct 30, 2024 at 4:42 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, Oct 09, 2024, Vipin Sharma wrote:
> >
> > Coming back to this, I opted to match the behavior of make_small_spte() and do:
> >
> >         KVM_BUG_ON(!is_shadow_present_pte(small_spte) || level == PG_LEVEL_4K, kvm);
> >
> 
> Should these be two separate KVM_BUG_ON(), to aid in debugging?

I smushed them together to may make_huge_page_split_spte()'s style, and because
practically speaking the assertion will never fail.  And if it does fail, figuring
out what failed wil be quite easy as the two variables being checked are function
parameters, i.e. are all but guaranteed to be in RSI and RDX.

> > As explained in commit 3d4415ed75a57, the scenario is meant to be impossible.
> > If the check fails in production, odds are good there's SPTE memory corruption
> > and we _want_ to kill the VM.
> >
> >     KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE
> >
> >     Bug the VM instead of simply warning if KVM tries to split a SPTE that is
> >     non-present or not-huge.  KVM is guaranteed to end up in a broken state as
> >     the callers fully expect a valid SPTE, e.g. the shadow MMU will add an
> >     rmap entry, and all MMUs will account the expected small page.  Returning
> >     '0' is also technically wrong now that SHADOW_NONPRESENT_VALUE exists,
> >     i.e. would cause KVM to create a potential #VE SPTE.
> >
> >     While it would be possible to have the callers gracefully handle failure,
> >     doing so would provide no practical value as the scenario really should be
> >     impossible, while the error handling would add a non-trivial amount of
> >     noise.
> >
> > There's also no need to return SHADOW_NONPRESENT_VALUE.  KVM_BUG_ON() ensures
> > all vCPUs are kicked out of the guest, so while the return SPTE may be a bit
> > nonsensical, it will never be consumed by hardware.  Theoretically, KVM could
> > wander down a weird path in the future, but again, the most likely scenario is
> > that there was host memory corruption, so potential weird paths are the least of
> > KVM's worries at that point.
> >
> > More importantly, in the _current_ code, returning SHADOW_NONPRESENT_VALUE happens
> > to be benign, but that's 100% due to make_huge_spte() only being used by the TDP
> > MMU.  If the shaduw MMU ever started using make_huge_spte(), returning a !present
> > SPTE would be all but guaranteed to cause fatal problems.
> 
> I think the caller should be given the opportunity to handle a
> failure. In the current code, TDP is able to handle the error
> condition, so penalizing a VM seems wrong. We have gone from a state
> of reduced performance to either very good performance or VM being
> killed.

The context of how the failure can happens matters.  The only way this helper can
fail is if there is a blatant bug in the caller, or if there is data corruption
of some form.  The use of KVM_BUG_ON() is a very clear signal to developers and
readers that the caller _must_ ensure the SPTE is shadow-present SPTE, and that
the new SPTE will be a huge SPTE.  Critically, treating bad input as fatal to the
VM also allow the caller to assume success.

> If shadow MMU starts using make_huge_spte() and doesn't add logic to
> handle this scenario (killing vm or something else) then that is a
> coding bug of that feature which should be fixed.

No, because allowing make_huge_spte() to fail, even if there is a WARN, adds
non-trivial complexity with zero real-world benefit.  At some point, success must
be assumed/guaranteed.  Forcing a future developer to think about the best way to
handle a failure that "can't" happen is a waste of their time.

E.g. as an example where allowing failure is both more absurd and more painful,
imagine if kvm_mmu_prepare_zap_page() return an error if mmu_lock weren't held.
Trying to gracefully handle error would be madness, and so it simply asserts that
mmu_lock is held.

next prev parent reply	other threads:[~2024-11-04 22:39 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-23 23:56 [PATCH v2 0/6] KVM: x86/mmu: Optimize TDP MMU huge page recovery during disable-dirty-log David Matlack
2024-08-23 23:56 ` [PATCH v2 1/6] KVM: x86/mmu: Drop @max_level from kvm_mmu_max_mapping_level() David Matlack
2024-08-23 23:56 ` [PATCH v2 2/6] KVM: x86/mmu: Batch TLB flushes when zapping collapsible TDP MMU SPTEs David Matlack
2024-08-23 23:56 ` [PATCH v2 3/6] KVM: x86/mmu: Refactor TDP MMU iter need resched check David Matlack
2024-10-31  0:04   ` Sean Christopherson
2024-08-23 23:56 ` [PATCH v2 4/6] KVM: x86/mmu: Recover TDP MMU huge page mappings in-place instead of zapping David Matlack
2024-10-09 16:23   ` Vipin Sharma
2024-10-09 17:35     ` Sean Christopherson
2024-10-09 20:21       ` Vipin Sharma
2024-10-30 23:42         ` Sean Christopherson
2024-11-01 20:37           ` Vipin Sharma
2024-11-04 22:39             ` Sean Christopherson [this message]
2024-08-23 23:56 ` [PATCH v2 5/6] KVM: x86/mmu: Rename make_huge_page_split_spte() to make_small_spte() David Matlack
2024-08-23 23:56 ` [PATCH v2 6/6] KVM: x86/mmu: WARN if huge page recovery triggered during dirty logging David Matlack
2024-10-31 20:00 ` [PATCH v2 0/6] KVM: x86/mmu: Optimize TDP MMU huge page recovery during disable-dirty-log Sean Christopherson
2024-11-05  5:56 ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZylNBqRnL1lD-T1n@google.com \
    --to=seanjc@google.com \
    --cc=dmatlack@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=vipinsh@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.