Re: [PATCH v2 3/4] KVM: x86: Add support for MSR_IA32_MCx_CTL2 MSRs.

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Jue Wang <juew@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Tony Luck <tony.luck@intel.com>,
	kvm@vger.kernel.org
Subject: Re: [PATCH v2 3/4] KVM: x86: Add support for MSR_IA32_MCx_CTL2 MSRs.
Date: Wed, 11 May 2022 19:00:43 +0000	[thread overview]
Message-ID: <YnwH2zRlGvwkjCAC@google.com> (raw)
In-Reply-To: <20220412223134.1736547-4-juew@google.com>

On Tue, Apr 12, 2022, Jue Wang wrote:
> Note the support of CMCI (MCG_CMCI_P) is not enabled in
> kvm_mce_cap_supported yet.

You can probably guess what I would say here :-)  A reader should not have to go
wade through Intel's SDM to understand the relationship between CTL2 and CMCI.

> Signed-off-by: Jue Wang <juew@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/x86.c              | 50 +++++++++++++++++++++++++--------
>  2 files changed, 40 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ec9830d2aabf..639ef92d01d1 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -800,6 +800,7 @@ struct kvm_vcpu_arch {
>  	u64 mcg_ctl;
>  	u64 mcg_ext_ctl;
>  	u64 *mce_banks;
> +	u64 *mci_ctl2_banks;
>  
>  	/* Cache MMIO info */
>  	u64 mmio_gva;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index eb4029660bd9..73c64d2b9e60 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3167,6 +3167,7 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	unsigned bank_num = mcg_cap & 0xff;
>  	u32 msr = msr_info->index;
>  	u64 data = msr_info->data;
> +	u32 offset;
>  
>  	switch (msr) {
>  	case MSR_IA32_MCG_STATUS:
> @@ -3180,10 +3181,22 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  			return 1;
>  		vcpu->arch.mcg_ctl = data;
>  		break;
> +	case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:

This is wrong.  Well, incomplete.  It will allow writes to MSRs that do not exist,
i.e. if bank_num >= offset < KVM_MAX_MCE_BANKS.

I believe this will suffice and is also the simplest?

		if (msr >= MSR_IA32_MCx_CTL2(bank_num))
			return 1;

Actually, tying in with the array_index_nopsec() comments below, if this captures
the last MSR in a variable, then the overrun is much more reasonable (sample idea
at the bottom).

> +		if (!(mcg_cap & MCG_CMCI_P) &&
> +				(data || !msr_info->host_initiated))

Funky indentation.  Should be:

		if (!(mcg_cap & MCG_CMCI_P) &&
		    (data || !msr_info->host_initiated))
			return 1;

> +			return 1;
> +		/* An attempt to write a 1 to a reserved bit raises #GP */
> +		if (data & ~(MCI_CTL2_CMCI_EN | MCI_CTL2_CMCI_THRESHOLD_MASK))
> +			return 1;
> +		offset = array_index_nospec(
> +				msr - MSR_IA32_MC0_CTL2,
> +				MSR_IA32_MCx_CTL2(bank_num) - MSR_IA32_MC0_CTL2);

My preference would be to let this run over (by a fair amount)

		offset = array_index_nospec(msr - MSR_IA32_MC0_CTL2,
					    MSR_IA32_MCx_CTL2(bank_num) - MSR_IA32_MC0_CTL2);

But if we use a local variable, there's no overrun:

	case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
		last_msr = MSR_IA32_MCx_CTL2(bank_num) - 1;
		if (msr > last_msr)
			return 1;

		if (!(mcg_cap & MCG_CMCI_P) && (data || !msr_info->host_initiated))
			return 1;
		/* An attempt to write a 1 to a reserved bit raises #GP */
		if (data & ~(MCI_CTL2_CMCI_EN | MCI_CTL2_CMCI_THRESHOLD_MASK))
			return 1;
		offset = array_index_nospec(msr - MSR_IA32_MC0_CTL2,
					    last_msr + 1 - MSR_IA32_MC0_CTL2);
		vcpu->arch.mci_ctl2_banks[offset] = data;
		break;


And if we go that route, in a follow-up patch at the end of the series, clean up
the "default" path to hoist the if-statement into a proper case statement (unless
I've misread the code), e.g. with some opportunstic cleanup:

	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
		last_msr = MSR_IA32_MCx_CTL(bank_num) - 1;
		if (msr > last_msr)
			return 1;

		offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
					    last_msr + 1 - MSR_IA32_MC0_CTL);

		/*
		 * Only 0 or all 1s can be written to IA32_MCi_CTL, though some
		 * Linux kernels clear bit 10 in bank 4 to workaround a BIOS/GART
		 * TLB issue on AMD K8s, ignore this to avoid an uncatched #GP in
		 * the guest.
		 */
		if ((offset & 0x3) == 0 &&
		    data != 0 && (data | (1 << 10)) != ~(u64)0)
			return -1;

		/* MCi_STATUS */
		if (!msr_info->host_initiated && (offset & 0x3) == 1 &&
		    data != 0 && !can_set_mci_status(vcpu))
			return -1;

		vcpu->arch.mce_banks[offset] = data;
		break;

Actually, rereading that, isn't "return -1" wrong?  That will cause kvm_emulate_wrmsr()
to exit to userspace, not inject a #GP.

*sigh*  Yep, indeed, the -1 gets interpreted as -EPERM and kills the guest.

Scratch the idea of doing the above on top, I'll send separate patches and a KUT
testcase.  I'll Cc you on the patches, it'd save Paolo a merge conflict (or you a
rebase) if this series is based on top of that bugfix + cleanup.

> +		vcpu->arch.mci_ctl2_banks[offset] = data;
> +		break;
>  	default:
>  		if (msr >= MSR_IA32_MC0_CTL &&
>  		    msr < MSR_IA32_MCx_CTL(bank_num)) {
> -			u32 offset = array_index_nospec(
> +			offset = array_index_nospec(
>  				msr - MSR_IA32_MC0_CTL,
>  				MSR_IA32_MCx_CTL(bank_num) - MSR_IA32_MC0_CTL);
>  
> @@ -3489,7 +3502,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  			return 1;
>  		}
>  		break;
> -	case 0x200 ... 0x2ff:
> +	case 0x200 ... MSR_IA32_MC0_CTL2 - 1:
> +	case MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) ... 0x2ff:
>  		return kvm_mtrr_set_msr(vcpu, msr, data);
>  	case MSR_IA32_APICBASE:
>  		return kvm_set_apic_base(vcpu, msr_info);
> @@ -3646,6 +3660,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_IA32_MCG_CTL:
>  	case MSR_IA32_MCG_STATUS:
>  	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
> +	case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
>  		return set_msr_mce(vcpu, msr_info);
>  
>  	case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
> @@ -3750,6 +3765,7 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
>  	u64 data;
>  	u64 mcg_cap = vcpu->arch.mcg_cap;
>  	unsigned bank_num = mcg_cap & 0xff;
> +	u32 offset;
>  
>  	switch (msr) {
>  	case MSR_IA32_P5_MC_ADDR:
> @@ -3767,10 +3783,18 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
>  	case MSR_IA32_MCG_STATUS:
>  		data = vcpu->arch.mcg_status;
>  		break;
> +	case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:

Same comments as the WRMSR path.  I'll also handle the "default" case in my cleanup.

> +		if (!(mcg_cap & MCG_CMCI_P) && !host)
> +			return 1;
> +		offset = array_index_nospec(
> +				msr - MSR_IA32_MC0_CTL2,
> +				MSR_IA32_MCx_CTL2(bank_num) - MSR_IA32_MC0_CTL2);
> +		data = vcpu->arch.mci_ctl2_banks[offset];
> +		break;
>  	default:
>  		if (msr >= MSR_IA32_MC0_CTL &&
>  		    msr < MSR_IA32_MCx_CTL(bank_num)) {
> -			u32 offset = array_index_nospec(
> +			offset = array_index_nospec(
>  				msr - MSR_IA32_MC0_CTL,
>  				MSR_IA32_MCx_CTL(bank_num) - MSR_IA32_MC0_CTL);
>  

...

> @@ -11126,9 +11152,11 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>  		goto fail_free_lapic;
>  	vcpu->arch.pio_data = page_address(page);
>  
> -	vcpu->arch.mce_banks = kzalloc(KVM_MAX_MCE_BANKS * sizeof(u64) * 4,
> +	vcpu->arch.mce_banks = kcalloc(KVM_MAX_MCE_BANKS * 4, sizeof(u64),
> +				       GFP_KERNEL_ACCOUNT);

Switching to kcalloc() should be a separate patch.

> +	vcpu->arch.mci_ctl2_banks = kcalloc(KVM_MAX_MCE_BANKS, sizeof(u64),
>  				       GFP_KERNEL_ACCOUNT);
> -	if (!vcpu->arch.mce_banks)
> +	if (!vcpu->arch.mce_banks | !vcpu->arch.mci_ctl2_banks)

This wants to be a logical-OR, not a bitwise-OR.

Oh, and vcpu->arch.mci_ctl2_banks needs to be freed if a later step fails.

>  		goto fail_free_pio_data;
>  	vcpu->arch.mcg_cap = KVM_MAX_MCE_BANKS;
>  
> -- 
> 2.35.1.1178.g4f1659d476-goog
>

next prev parent reply	other threads:[~2022-05-11 19:01 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-12 22:31 [PATCH v2 0/4] KVM: x86: Add support of CMCI signaling and UCNA Jue Wang
2022-04-12 22:31 ` [PATCH v2 1/4] KVM: x86: Clean up KVM APIC LVT logic Jue Wang
2022-05-11 17:38   ` Sean Christopherson
2022-05-12 17:57     ` Jue Wang
2022-04-12 22:31 ` [PATCH v2 2/4] KVM: x86: Add LVTCMCI support Jue Wang
2022-05-11 17:55   ` Sean Christopherson
2022-05-12 18:01     ` Jue Wang
2022-04-12 22:31 ` [PATCH v2 3/4] KVM: x86: Add support for MSR_IA32_MCx_CTL2 MSRs Jue Wang
2022-05-11 19:00   ` Sean Christopherson [this message]
2022-05-13  4:45     ` Jue Wang
2022-04-12 22:31 ` [PATCH v2 4/4] KVM: x86: Add support for MCG_CMCI_P and handling of injected UCNAs Jue Wang
2022-05-11 19:20   ` Sean Christopherson
2022-05-13  4:54     ` Jue Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YnwH2zRlGvwkjCAC@google.com \
    --to=seanjc@google.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=juew@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=tony.luck@intel.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.