All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mingwei Zhang <mizhang@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	David Matlack <dmatlack@google.com>,
	Jim Mattson <jmattson@google.com>
Subject: Re: [PATCH 9/9] KVM: x86/mmu: BUG() in rmap helpers iff CONFIG_BUG_ON_DATA_CORRUPTION=y
Date: Thu, 18 May 2023 19:05:23 +0000	[thread overview]
Message-ID: <ZGZ2834xLw/woerO@google.com> (raw)
In-Reply-To: <20230511235917.639770-10-seanjc@google.com>

On Thu, May 11, 2023, Sean Christopherson wrote:
> Introduce KVM_BUG_ON_DATA_CORRUPTION() and use it in the low-level rmap
> helpers to convert the existing BUG()s to WARN_ON_ONCE() when the kernel
> is built with CONFIG_BUG_ON_DATA_CORRUPTION=n, i.e. does NOT want to BUG()
> on corruption of host kernel data structures.  Environments that don't
> have infrastructure to automatically capture crash dumps, i.e. aren't
> likely to enable CONFIG_BUG_ON_DATA_CORRUPTION=y, are typically better
> served overall by WARN-and-continue behavior (for the kernel, the VM is
> dead regardless), as a BUG() while holding mmu_lock all but guarantees
> the _best_ case scenario is a panic().
> 
> Make the BUG()s conditional instead of removing/replacing them entirely as
> there's a non-zero chance (though by no means a guarantee) that the damage
> isn't contained to the target VM, e.g. if no rmap is found for a SPTE then
> KVM may be double-zapping the SPTE, i.e. has already freed the memory the
> SPTE pointed at and thus KVM is reading/writing memory that KVM no longer
> owns.
> 
> Link: https://lore.kernel.org/all/20221129191237.31447-1-mizhang@google.com
> Suggested-by: Mingwei Zhang <mizhang@google.com>
> Cc: David Matlack <dmatlack@google.com>
> Cc: Jim Mattson <jmattson@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Mingwei Zhang <mizhang@google.com>
> +/*
> + * Note, "data corruption" refers to corruption of host kernel data structures,
> + * not guest data.  Guest data corruption, suspected or confirmed, that is tied
> + * and contained to a single VM should *never* BUG() and potentially panic the
> + * host, i.e. use this variant of KVM_BUG() if and only if a KVM data structure
> + * is corrupted and that corruption can have a cascading effect to other parts
> + * of the hosts and/or to other VMs.
> + */
> +#define KVM_BUG_ON_DATA_CORRUPTION(cond, kvm)			\
> +({								\
> +	bool __ret = !!(cond);					\
> +								\
> +	if (IS_ENABLED(CONFIG_BUG_ON_DATA_CORRUPTION))		\
> +		BUG_ON(__ret);					\
> +	else if (WARN_ON_ONCE(__ret && !(kvm)->vm_bugged))	\
> +		kvm_vm_bugged(kvm);				\
> +	unlikely(__ret);					\
> +})
> +
Previously, my concern was that people might abuse this feature by
generating lots of KVM_BUG_ON_DATA_CORRUPTION() in the code, with the
execuse that "hey, it is not a BUG_ON(), just turn off
CONFIG_BUG_ON_DATA_CORRUPTION." In reality, especially in production, no
one will take that risk by completely turning off the KCONFIG, so
KVM_BUG_ON_DATA_CORRUPTION() is still a BUG_ON() but with people having
execuses to add more.

Later I realize that this worry is purely based on hypothesis, so I
choose to not worry about that anymore. Overall, making BUG_ON()
tunable is still a very good progress. Thank you and David for the
help.

-Mingwei

>  static inline void kvm_vcpu_srcu_read_lock(struct kvm_vcpu *vcpu)
>  {
>  #ifdef CONFIG_PROVE_RCU
> -- 
> 2.40.1.606.ga4b1b128d6-goog
> 

      reply	other threads:[~2023-05-18 19:05 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-11 23:59 [PATCH 0/9] KVM: x86/mmu: Clean up MMU_DEBUG and BUG/WARN usage Sean Christopherson
2023-05-11 23:59 ` [PATCH 1/9] KVM: x86/mmu: Delete pgprintk() and all its usage Sean Christopherson
2023-05-11 23:59 ` [PATCH 2/9] KVM: x86/mmu: Delete rmap_printk() " Sean Christopherson
2023-05-11 23:59 ` [PATCH 3/9] KVM: x86/mmu: Delete the "dbg" module param Sean Christopherson
2023-05-11 23:59 ` [PATCH 4/9] KVM: x86/mmu: Rename MMU_WARN_ON() to KVM_MMU_WARN_ON() Sean Christopherson
2023-05-12 23:23   ` David Matlack
2023-05-12 23:30     ` Sean Christopherson
2023-05-12 23:35       ` David Matlack
2023-05-11 23:59 ` [PATCH 5/9] KVM: x86/mmu: Convert "runtime" WARN_ON() assertions to WARN_ON_ONCE() Sean Christopherson
2023-05-12 23:14   ` David Matlack
2023-05-12 23:18     ` Sean Christopherson
2023-05-12 23:24       ` David Matlack
2023-05-11 23:59 ` [PATCH 6/9] KVM: x86/mmu: Bug the VM if a vCPU ends up in long mode without PAE enabled Sean Christopherson
2023-05-12 23:33   ` David Matlack
2023-05-12 23:40     ` Sean Christopherson
2023-05-11 23:59 ` [PATCH 7/9] KVM: x86/mmu: Replace MMU_DEBUG with proper KVM_PROVE_MMU Kconfig Sean Christopherson
2023-05-11 23:59 ` [PATCH 8/9] KVM: x86/mmu: Plumb "struct kvm" all the way to pte_list_remove() Sean Christopherson
2023-05-11 23:59 ` [PATCH 9/9] KVM: x86/mmu: BUG() in rmap helpers iff CONFIG_BUG_ON_DATA_CORRUPTION=y Sean Christopherson
2023-05-18 19:05   ` Mingwei Zhang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZGZ2834xLw/woerO@google.com \
    --to=mizhang@google.com \
    --cc=dmatlack@google.com \
    --cc=jmattson@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.