From: Marcelo Tosatti <mtosatti@redhat.com>
To: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Avi Kivity <avi@redhat.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
Huang Ying <ying.huang@intel.com>,
Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>,
Jin Dongming <jin.dongming@np.css.fujitsu.com>
Subject: Re: [PATCH 08/13] kvm: x86: Inject pending MCE events on state writeback
Date: Thu, 17 Feb 2011 15:55:59 -0200
Message-ID: <20110217175559.GA12113@amt.cnet>
In-Reply-To: <4D5D558B.10406@siemens.com>
On Thu, Feb 17, 2011 at 06:06:19PM +0100, Jan Kiszka wrote:
> On 2011-02-17 17:35, Marcelo Tosatti wrote:
> > On Tue, Feb 15, 2011 at 09:23:32AM +0100, Jan Kiszka wrote:
> >> The current way of injecting MCE events without updating of and
> >> synchronizing with the CPUState is broken and causes spurious
> >> corruptions of the MCE-related parts of the CPUState.
> >
> > Can you explain how? The current problem with MCE is that it bypasses
> > writeback code, but corruption has nothing to do with that.
>
> It's precisely the same scenario as with the old debug exception
> re-injection: If we update the pending exception state via
> KVM_SET_VCPU_EVENTS, we must not inject it via any other path. Otherwise
> we end up with overwritten/lost events - which is extremely critical for
> these rarely taken code paths.
>
> Just like parts of KVM_SET_GUEST_DEBUG, KVM_X86_SET_MCE pre-dates
> KVM_SET_VCPU_EVENTS which obsoleted all other exception injection
> mechanisms.
OK.
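
To spell out the lost-event case, here is a toy model, not QEMU or KVM code. It assumes a single pending-exception slot in the kernel and a cached copy in userspace; every toy_* name is made up for illustration. A direct injection in the style of KVM_X86_SET_MCE only touches the kernel slot, so a later writeback in the style of KVM_SET_VCPU_EVENTS overwrites it with the stale cached state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these types are hypothetical stand-ins, not KVM structures. */
enum { TOY_EXCP_NONE = -1, TOY_EXCP_MCHK = 18 /* #MC vector */ };

struct toy_kernel   { int pending_nr; };          /* kernel-side pending slot */
struct toy_cpustate { int exception_injected; };  /* userspace cached copy    */

/* Direct injection: updates the kernel slot, leaves the cache stale. */
void toy_inject_direct(struct toy_kernel *k)
{
    k->pending_nr = TOY_EXCP_MCHK;
}

/* Writeback: copies the cached state over whatever the kernel holds. */
void toy_writeback(struct toy_kernel *k, const struct toy_cpustate *env)
{
    k->pending_nr = env->exception_injected;
}

/* Two injection paths: returns true if the MCE is still pending afterwards. */
bool toy_direct_then_writeback(void)
{
    struct toy_kernel k = { TOY_EXCP_NONE };
    struct toy_cpustate env = { TOY_EXCP_NONE };

    toy_inject_direct(&k);   /* MCE goes straight to the kernel */
    toy_writeback(&k, &env); /* stale cache overwrites it       */
    return k.pending_nr == TOY_EXCP_MCHK;
}

/* Single injection path: record the MCE in the cache, then write back. */
bool toy_cache_then_writeback(void)
{
    struct toy_kernel k = { TOY_EXCP_NONE };
    struct toy_cpustate env = { TOY_EXCP_NONE };

    env.exception_injected = TOY_EXCP_MCHK;
    toy_writeback(&k, &env);
    return k.pending_nr == TOY_EXCP_MCHK;
}
```

In this model the event survives only when the cached state is the sole injection path.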
> >
> >> As a first step towards a fix, enhance the state writeback code with
> >> support for injecting events that are pending in the CPUState. A pending
> >> exception will then be signaled via cpu_interrupt(CPU_INTERRUPT_MCE).
> >> And, just like for TCG, we need to leave the halt state when
> >> CPU_INTERRUPT_MCE is pending (left broken for the to-be-removed old KVM
> >> code).
> >>
> >> This will also allow unifying the TCG and KVM injection code.
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >> CC: Huang Ying <ying.huang@intel.com>
> >> CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
> >> CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
> >> ---
> >> target-i386/kvm.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++---
> >> 1 files changed, 70 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> >> index f909661..46f45db 100644
> >> --- a/target-i386/kvm.c
> >> +++ b/target-i386/kvm.c
> >> @@ -467,6 +467,44 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
> >>  #endif /* !KVM_CAP_MCE*/
> >>  }
> >>
> >> +static int kvm_inject_mce_oldstyle(CPUState *env)
> >> +{
> >> +#ifdef KVM_CAP_MCE
> >> +    if (kvm_has_vcpu_events()) {
> >> +        return 0;
> >> +    }
> >> +    if (env->interrupt_request & CPU_INTERRUPT_MCE) {
> >> +        unsigned int bank, bank_num = env->mcg_cap & 0xff;
> >> +        struct kvm_x86_mce mce;
> >> +
> >> +        /* We must not raise CPU_INTERRUPT_MCE if it's not supported. */
> >> +        assert(env->mcg_cap);
> >> +
> >> +        env->interrupt_request &= ~CPU_INTERRUPT_MCE;
> >> +
> >> +        /*
> >> +         * There must be at least one bank in use if CPU_INTERRUPT_MCE was set.
> >> +         * Find it and use its values for the event injection.
> >> +         */
> >> +        for (bank = 0; bank < bank_num; bank++) {
> >> +            if (env->mce_banks[bank * 4 + 1] & MCI_STATUS_VAL) {
> >> +                break;
> >> +            }
> >> +        }
> >> +        assert(bank < bank_num);
> >> +
> >> +        mce.bank = bank;
> >> +        mce.status = env->mce_banks[bank * 4 + 1];
> >> +        mce.mcg_status = env->mcg_status;
> >> +        mce.addr = env->mce_banks[bank * 4 + 2];
> >> +        mce.misc = env->mce_banks[bank * 4 + 3];
> >> +
> >> +        return kvm_vcpu_ioctl(env, KVM_X86_SET_MCE, &mce);
> >> +    }
> >> +#endif /* KVM_CAP_MCE */
> >> +    return 0;
> >> +}
> >> +
> >>  static void cpu_update_state(void *opaque, int running, int reason)
> >>  {
> >>      CPUState *env = opaque;
> >> @@ -1375,10 +1413,25 @@ static int kvm_put_vcpu_events(CPUState *env, int level)
> >>          return 0;
> >>      }
> >>
> >> -    events.exception.injected = (env->exception_injected >= 0);
> >> -    events.exception.nr = env->exception_injected;
> >> -    events.exception.has_error_code = env->has_error_code;
> >> -    events.exception.error_code = env->error_code;
> >> +    if (env->interrupt_request & CPU_INTERRUPT_MCE) {
> >> +        /* We must not raise CPU_INTERRUPT_MCE if it's not supported. */
> >> +        assert(env->mcg_cap);
> >> +
> >> +        env->interrupt_request &= ~CPU_INTERRUPT_MCE;
> >> +        if (env->exception_injected == EXCP08_DBLE) {
> >> +            /* this means triple fault */
> >> +            qemu_system_reset_request();
> >> +            env->exit_request = 1;
> >> +        }
> >> +        events.exception.injected = 1;
> >> +        events.exception.nr = EXCP12_MCHK;
> >> +        events.exception.has_error_code = 0;
> >> +    } else {
> >> +        events.exception.injected = (env->exception_injected >= 0);
> >> +        events.exception.nr = env->exception_injected;
> >> +        events.exception.has_error_code = env->has_error_code;
> >> +        events.exception.error_code = env->error_code;
> >> +    }
> >
> > IMO it is important to maintain a scope for kvm_put_vcpu_events /
> > kvm_get_vcpu_events: they synchronize state to/from the kernel. Not more
> > than that. Whatever you're trying to do here should be higher in the
> > vcpu loop code.
>
> We pick up CPU_INTERRUPT_MCE and translate it into the right exception
> that put_vcpu_events is about to sync to the kernel. Which of those steps
> should be done earlier? Calculating env->exception_injected?
Everything but writeback. Update env->exception_injected/nr in
process_irqchip_events, or in a separate kvm_arch_update_exceptions.
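
A rough, untested sketch of that split follows. It uses stand-in types so it compiles on its own: sketch_env, sketch_events, SKETCH_INTERRUPT_MCE and the reset_requested flag are hypothetical placeholders, not the QEMU definitions, and the flag value is arbitrary. The first function does the translation from the vcpu loop; the second is left as a plain copy into the structure that gets synced to the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for CPUState and struct kvm_vcpu_events. */
#define SKETCH_INTERRUPT_MCE 0x400u /* arbitrary flag value for the sketch */
enum { SKETCH_EXCP_DBLE = 8, SKETCH_EXCP_MCHK = 18 };

struct sketch_env {
    uint32_t interrupt_request;
    uint64_t mcg_cap;
    int exception_injected; /* -1 means no exception pending */
    int has_error_code;
    uint32_t error_code;
    int reset_requested;    /* stands in for qemu_system_reset_request() */
};

struct sketch_events {
    uint8_t injected;
    uint8_t nr;
    uint8_t has_error_code;
    uint32_t error_code;
};

/* Step 1: called from the vcpu loop, before any state writeback. */
void sketch_update_exceptions(struct sketch_env *env)
{
    if (!(env->interrupt_request & SKETCH_INTERRUPT_MCE)) {
        return;
    }
    /* The MCE flag must not be raised if MCE is not supported. */
    assert(env->mcg_cap);

    env->interrupt_request &= ~SKETCH_INTERRUPT_MCE;
    if (env->exception_injected == SKETCH_EXCP_DBLE) {
        /* MCE on top of a double fault: treated as triple fault. */
        env->reset_requested = 1;
    }
    env->exception_injected = SKETCH_EXCP_MCHK;
    env->has_error_code = 0;
}

/* Step 2: writeback only, no decisions about what is pending. */
void sketch_put_vcpu_events(const struct sketch_env *env,
                            struct sketch_events *events)
{
    events->injected = (env->exception_injected >= 0);
    events->nr = events->injected ? (uint8_t)env->exception_injected : 0;
    events->has_error_code = (uint8_t)env->has_error_code;
    events->error_code = env->error_code;
}
```

With this shape the put function stays symmetric with the get function, and the MCE handling lives in one place.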
> >>      return ret;
> >> @@ -1678,10 +1736,17 @@ void kvm_arch_post_run(CPUState *env, struct kvm_run *run)
> >>  int kvm_arch_process_irqchip_events(CPUState *env)
> >>  {
> >>      if (kvm_irqchip_in_kernel()) {
> >> +        if (env->interrupt_request & CPU_INTERRUPT_MCE) {
> >> +            kvm_cpu_synchronize_state(env);
> >> +            if (env->mp_state == KVM_MP_STATE_HALTED) {
> >> +                env->mp_state = KVM_MP_STATE_RUNNABLE;
> >> +            }
> >> +        }
> >
> > Should not manipulate mp_state of a running vcpu (should only do that
> > for migration when vcpu is stopped), since it's managed by the kernel
> > in the irqchip case.
>
> Not for asynchronously injected MCEs. The target CPU would simply
> oversleep them. MCEs are not in the scope of the in-kernel irqchip.
A pending MCE exception could break out of the in-kernel halt emulation.