Re: MCG_CAP ABI breakage (was Re: [Qemu-devel] [PATCH] target-i386: Do not set MCG_SER_P by default)

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Borislav Petkov <bp@alien8.de>
To: Eduardo Habkost <ehabkost@redhat.com>
Cc: KVM <kvm@vger.kernel.org>,
	"Marcelo Tosatti" <mtosatti@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	"Huang Ying" <ying.huang@intel.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Andreas Färber" <afaerber@suse.de>,
	"Richard Henderson" <rth@twiddle.net>
Subject: Re: MCG_CAP ABI breakage (was Re: [Qemu-devel] [PATCH] target-i386: Do not set MCG_SER_P by default)
Date: Mon, 23 Nov 2015 17:43:14 +0100	[thread overview]
Message-ID: <20151123164314.GF5134@pd.tnic> (raw)
In-Reply-To: <20151123151127.GM20436@thinpad.lan.raisama.net>

On Mon, Nov 23, 2015 at 01:11:27PM -0200, Eduardo Habkost wrote:
> On Mon, Nov 23, 2015 at 11:22:37AM -0200, Eduardo Habkost wrote:
> [...]
> > In the case of this code, it looks like it's already broken
> > because the resulting mcg_cap depends on host kernel capabilities
> > (the ones reported by kvm_get_mce_cap_supported()), and the data
> > initialized by target-i386/cpu.c:mce_init() is silently
> > overwritten by kvm_arch_init_vcpu(). So we would need to fix that
> > before implementing a proper compatibility mechanism for
> > mcg_cap.
> 
> Fortunately, when running Linux v2.6.37 and later,
> kvm_arch_init_vcpu() won't actually change mcg_cap (see details
> below).
> 
> But the code is broken if running on Linux between v2.6.32 and
> v2.6.36: it will clear MCG_SER_P silently (and silently enable
> MCG_SER_P when migrating to a newer host).
> 
> But I don't know what we should do on those cases. If we abort
> initialization when the host doesn't support MCG_SER_P, all CPU
> models with MCE and MCA enabled will become unrunnable on Linux
> between v2.6.32 and v2.6.36. Should we do that, and simply ask
> people to upgrade their kernels (or explicitly disable MCE) if
> they want to run latest QEMU?
> 
> For reference, these are the capabilities returned by Linux:
> * KVM_MAX_MCE_BANKS is 32 since
>   890ca9aefa78f7831f8f633cab9e4803636dffe4 (v2.6.32-rc1~693^2~199)
> * KVM_MCE_CAP_SUPPORTED is (MCG_CTL_P | MCG_SER_P) since
>   5854dbca9b235f8cdd414a0961018763d2d5bf77 (v2.6.37-rc1~142^2~3)

The commit message of that one says that there is MCG_SER_P support in
the kernel.

The previous commit talks about MCE injection with KVM_X86_SET_MCE
ioctl but frankly, I don't see that. From looking at the current code,
KVM_X86_SET_MCE does kvm_vcpu_ioctl_x86_setup_mce() which simply sets
MCG_CAP. And it gets those from KVM_X86_GET_MCE_CAP_SUPPORTED which is

#define KVM_MCE_CAP_SUPPORTED (MCG_CTL_P | MCG_SER_P)

So it basically sets those two supported bits. But how is

	supported == actually present

?!?!

That soo doesn't make any sense.

> * KVM_MCE_CAP_SUPPORTED is MCG_CTL_P between
>   890ca9aefa78f7831f8f633cab9e4803636dffe4 (v2.6.32-rc1~693^2~199)
>   and 5854dbca9b235f8cdd414a0961018763d2d5bf77 (v2.6.37-rc1~142^2~3)
> 
> The current definitions in QEMU are:
>     #define MCE_CAP_DEF     (MCG_CTL_P|MCG_SER_P)
>     #define MCE_BANKS_DEF   10

That's also wrong. The number of banks is, of course,
generation-specific. The qemu commit adding that is

79c4f6b08009 ("QEMU: MCE: Add MCE simulation to qemu/tcg")

and I really don't get what the reasoning behind it is? Is this supposed
to mimick a certain default CPU which is not really resembling a real
CPU or some generic CPU or what is it?

Because the moment you start qemu with -cpu <something AMD>, all that
MCA information is totally wrong.

> The target-i386/cpu.c:mce_init() code sets mcg_cap to:
>     env->mcg_cap == MCE_CAP_DEF | MCE_BANKS_DEF;
>                  == (MCG_CTL_P|MCG_SER_P) | 10;
> 
> The kvm_arch_init_vcpu() code that changes mcg_cap does the following:
>     kvm_get_mce_cap_supported(cs->kvm_state, &mcg_cap, &banks);
>     if (banks > MCE_BANKS_DEF) {
>         banks = MCE_BANKS_DEF;
>     }
>     mcg_cap &= MCE_CAP_DEF;
>     mcg_cap |= banks;
>     env->mcg_cap = mcg_cap;
>   * Therefore, if running Linux v2.6.37 or newer, this will be
>     the result:
>     banks == 10;
>     mcg_cap == (MCG_CTL_P|MCG_SER_P) | banks
>             == (MCG_CTL_P|MCG_SER_P) | 10;
>     * That's the same value set by mce_init(), so fortunately
>       kvm_arch_init_vcpu() isn't actually changing mcg_cap
>       if running Linux v.2.6.37 and newer.
>   * However, if running Linux between v2.6.32 and v2.6.37,
>     kvm_arch_init_vcpu() will silently clear MCG_SER_P.

I don't understand - you want that cleared when emulating a !Intel CPU
or an Intel CPU which doesn't support SER. This bit should be clear
there. Why should be set at all there?

Hmmm?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

WARNING: multiple messages have this Message-ID (diff)

From: Borislav Petkov <bp@alien8.de>
To: Eduardo Habkost <ehabkost@redhat.com>
Cc: KVM <kvm@vger.kernel.org>,
	"Marcelo Tosatti" <mtosatti@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	"Huang Ying" <ying.huang@intel.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Andreas Färber" <afaerber@suse.de>,
	"Richard Henderson" <rth@twiddle.net>
Subject: Re: [Qemu-devel] MCG_CAP ABI breakage (was Re: [PATCH] target-i386: Do not set MCG_SER_P by default)
Date: Mon, 23 Nov 2015 17:43:14 +0100	[thread overview]
Message-ID: <20151123164314.GF5134@pd.tnic> (raw)
In-Reply-To: <20151123151127.GM20436@thinpad.lan.raisama.net>

On Mon, Nov 23, 2015 at 01:11:27PM -0200, Eduardo Habkost wrote:
> On Mon, Nov 23, 2015 at 11:22:37AM -0200, Eduardo Habkost wrote:
> [...]
> > In the case of this code, it looks like it's already broken
> > because the resulting mcg_cap depends on host kernel capabilities
> > (the ones reported by kvm_get_mce_cap_supported()), and the data
> > initialized by target-i386/cpu.c:mce_init() is silently
> > overwritten by kvm_arch_init_vcpu(). So we would need to fix that
> > before implementing a proper compatibility mechanism for
> > mcg_cap.
> 
> Fortunately, when running Linux v2.6.37 and later,
> kvm_arch_init_vcpu() won't actually change mcg_cap (see details
> below).
> 
> But the code is broken if running on Linux between v2.6.32 and
> v2.6.36: it will clear MCG_SER_P silently (and silently enable
> MCG_SER_P when migrating to a newer host).
> 
> But I don't know what we should do on those cases. If we abort
> initialization when the host doesn't support MCG_SER_P, all CPU
> models with MCE and MCA enabled will become unrunnable on Linux
> between v2.6.32 and v2.6.36. Should we do that, and simply ask
> people to upgrade their kernels (or explicitly disable MCE) if
> they want to run latest QEMU?
> 
> For reference, these are the capabilities returned by Linux:
> * KVM_MAX_MCE_BANKS is 32 since
>   890ca9aefa78f7831f8f633cab9e4803636dffe4 (v2.6.32-rc1~693^2~199)
> * KVM_MCE_CAP_SUPPORTED is (MCG_CTL_P | MCG_SER_P) since
>   5854dbca9b235f8cdd414a0961018763d2d5bf77 (v2.6.37-rc1~142^2~3)

The commit message of that one says that there is MCG_SER_P support in
the kernel.

The previous commit talks about MCE injection with KVM_X86_SET_MCE
ioctl but frankly, I don't see that. From looking at the current code,
KVM_X86_SET_MCE does kvm_vcpu_ioctl_x86_setup_mce() which simply sets
MCG_CAP. And it gets those from KVM_X86_GET_MCE_CAP_SUPPORTED which is

#define KVM_MCE_CAP_SUPPORTED (MCG_CTL_P | MCG_SER_P)

So it basically sets those two supported bits. But how is

	supported == actually present

?!?!

That soo doesn't make any sense.

> * KVM_MCE_CAP_SUPPORTED is MCG_CTL_P between
>   890ca9aefa78f7831f8f633cab9e4803636dffe4 (v2.6.32-rc1~693^2~199)
>   and 5854dbca9b235f8cdd414a0961018763d2d5bf77 (v2.6.37-rc1~142^2~3)
> 
> The current definitions in QEMU are:
>     #define MCE_CAP_DEF     (MCG_CTL_P|MCG_SER_P)
>     #define MCE_BANKS_DEF   10

That's also wrong. The number of banks is, of course,
generation-specific. The qemu commit adding that is

79c4f6b08009 ("QEMU: MCE: Add MCE simulation to qemu/tcg")

and I really don't get what the reasoning behind it is? Is this supposed
to mimick a certain default CPU which is not really resembling a real
CPU or some generic CPU or what is it?

Because the moment you start qemu with -cpu <something AMD>, all that
MCA information is totally wrong.

> The target-i386/cpu.c:mce_init() code sets mcg_cap to:
>     env->mcg_cap == MCE_CAP_DEF | MCE_BANKS_DEF;
>                  == (MCG_CTL_P|MCG_SER_P) | 10;
> 
> The kvm_arch_init_vcpu() code that changes mcg_cap does the following:
>     kvm_get_mce_cap_supported(cs->kvm_state, &mcg_cap, &banks);
>     if (banks > MCE_BANKS_DEF) {
>         banks = MCE_BANKS_DEF;
>     }
>     mcg_cap &= MCE_CAP_DEF;
>     mcg_cap |= banks;
>     env->mcg_cap = mcg_cap;
>   * Therefore, if running Linux v2.6.37 or newer, this will be
>     the result:
>     banks == 10;
>     mcg_cap == (MCG_CTL_P|MCG_SER_P) | banks
>             == (MCG_CTL_P|MCG_SER_P) | 10;
>     * That's the same value set by mce_init(), so fortunately
>       kvm_arch_init_vcpu() isn't actually changing mcg_cap
>       if running Linux v.2.6.37 and newer.
>   * However, if running Linux between v2.6.32 and v2.6.37,
>     kvm_arch_init_vcpu() will silently clear MCG_SER_P.

I don't understand - you want that cleared when emulating a !Intel CPU
or an Intel CPU which doesn't support SER. This bit should be clear
there. Why should be set at all there?

Hmmm?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

next prev parent reply	other threads:[~2015-11-23 16:43 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-20 23:01 [PATCH] target-i386: Do not set MCG_SER_P by default Borislav Petkov
2015-11-20 23:11 ` Andreas Färber
2015-11-20 23:11   ` [Qemu-devel] " Andreas Färber
2015-11-21  1:09   ` Borislav Petkov
2015-11-21  1:09     ` [Qemu-devel] " Borislav Petkov
2015-11-23 13:22     ` Eduardo Habkost
2015-11-23 13:22       ` [Qemu-devel] " Eduardo Habkost
2015-11-23 14:47       ` Paolo Bonzini
2015-11-23 14:47         ` [Qemu-devel] " Paolo Bonzini
2015-11-23 15:03         ` Borislav Petkov
2015-11-23 15:03           ` [Qemu-devel] " Borislav Petkov
2015-11-23 15:11       ` MCG_CAP ABI breakage (was Re: [Qemu-devel] [PATCH] target-i386: Do not set MCG_SER_P by default) Eduardo Habkost
2015-11-23 15:11         ` [Qemu-devel] MCG_CAP ABI breakage (was " Eduardo Habkost
2015-11-23 16:43         ` Borislav Petkov [this message]
2015-11-23 16:43           ` Borislav Petkov
2015-11-23 19:42           ` MCG_CAP ABI breakage (was Re: [Qemu-devel] " Eduardo Habkost
2015-11-23 19:42             ` [Qemu-devel] MCG_CAP ABI breakage (was " Eduardo Habkost
2015-11-23 20:46             ` MCG_CAP ABI breakage (was Re: [Qemu-devel] " Borislav Petkov
2015-11-23 20:46               ` [Qemu-devel] MCG_CAP ABI breakage (was " Borislav Petkov
2015-11-24 16:36           ` MCG_CAP ABI breakage (was Re: [Qemu-devel] " Eduardo Habkost
2015-11-24 16:36             ` [Qemu-devel] MCG_CAP ABI breakage (was " Eduardo Habkost
2015-11-24 18:44             ` MCG_CAP ABI breakage (was Re: [Qemu-devel] " Borislav Petkov
2015-11-24 18:44               ` [Qemu-devel] MCG_CAP ABI breakage (was " Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151123164314.GF5134@pd.tnic \
    --to=bp@alien8.de \
    --cc=afaerber@suse.de \
    --cc=ehabkost@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.