From: Jan Kiszka <jan.kiszka@siemens.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>,
Marcelo Tosatti <mtosatti@redhat.com>, kvm <kvm@vger.kernel.org>
Subject: Re: [PATCH] KVM: x86: Convert INIT and SIPI signals into synchronously handled events
Date: Tue, 12 Mar 2013 13:58:48 +0100 [thread overview]
Message-ID: <513F2688.2080902@siemens.com> (raw)
In-Reply-To: <513F1A63.9070107@redhat.com>
On 2013-03-12 13:06, Paolo Bonzini wrote:
> Il 12/03/2013 12:44, Jan Kiszka ha scritto:
>> A VCPU sending INIT or SIPI to some other VCPU races for setting the
>> remote VCPU's mp_state. When we were unlucky, KVM_MP_STATE_INIT_RECEIVED
>> was overwritten by kvm_emulate_halt and, thus, got lost.
>>
>> This introduces APIC events for those two signals, keeping them in
>> kvm_apic until kvm_apic_accept_events is run over the target vcpu
>> context. kvm_apic_has_events reports to kvm_arch_vcpu_runnable if there
>> are pending events, thus if vcpu blocking should end.
>>
>> The patch comes with the side effect of effectively obsoleting
>> KVM_MP_STATE_SIPI_RECEIVED. We still accept it from user space, but
>> immediately translate it to KVM_MP_STATE_INIT_RECEIVED + KVM_APIC_SIPI.
>> The vcpu itself will no longer enter the KVM_MP_STATE_SIPI_RECEIVED
>> state. That also means we no longer exit to user space after receiving a
>> SIPI event.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>
>> This doesn't fix the wrong behaviour on INIT for the BSP yet. Leaving
>> this honor to Paolo.
>>
>> I didn't try porting any INIT handling for nested on top yet but I
>> think it should be feasible - once we know their semantics for sure, at
>> least on Intel...
>>
>> arch/x86/include/asm/kvm_host.h | 1 +
>> arch/x86/kvm/lapic.c | 41 +++++++++++++++++++++++++++++++-------
>> arch/x86/kvm/lapic.h | 7 ++++++
>> arch/x86/kvm/x86.c | 39 ++++++++++++++++++++----------------
>> 4 files changed, 63 insertions(+), 25 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 348d859..2d28e76 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1002,6 +1002,7 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
>> int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
>> int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
>> int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
>> +void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
>>
>> void kvm_define_shared_msr(unsigned index, u32 msr);
>> void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 02b51dd..4a21a6b 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -731,7 +731,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>> case APIC_DM_INIT:
>> if (!trig_mode || level) {
>> result = 1;
>> - vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>> + set_bit(KVM_APIC_INIT, &apic->pending_events);
>
> I think this should clear pending SIPIs, unless KVM_APIC_INIT was
> already set in which case it should be a no-op. Something like:
>
> e = apic->pending_events;
> while (!(e & KVM_APIC_INIT))
> e = cmpxchg(&apic->pending_events, e,
> (e | KVM_APIC_INIT) & ~KVM_APIC_SIPI);
>
> If you do this, better make pending_events an atomic_t.
OK (looks ugly but necessary).
>
>> kvm_make_request(KVM_REQ_EVENT, vcpu);
>> kvm_vcpu_kick(vcpu);
>> } else {
>> @@ -743,13 +743,13 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>> case APIC_DM_STARTUP:
>> apic_debug("SIPI to vcpu %d vector 0x%02x\n",
>> vcpu->vcpu_id, vector);
>> - if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>> - result = 1;
>> - vcpu->arch.sipi_vector = vector;
>> - vcpu->arch.mp_state = KVM_MP_STATE_SIPI_RECEIVED;
>> - kvm_make_request(KVM_REQ_EVENT, vcpu);
>> - kvm_vcpu_kick(vcpu);
>> - }
>> + result = 1;
>> + apic->sipi_vector = vector;
>> + /* make sure sipi_vector is visible for the receiver */
>> + smp_wmb();
>> + set_bit(KVM_APIC_SIPI, &apic->pending_events);
>> + kvm_make_request(KVM_REQ_EVENT, vcpu);
>> + kvm_vcpu_kick(vcpu);
>> break;
>>
>> case APIC_DM_EXTINT:
>> @@ -1860,6 +1860,31 @@ int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data)
>> addr);
>> }
>>
>> +bool kvm_apic_has_events(struct kvm_vcpu *vcpu)
>> +{
>> + return kvm_vcpu_has_lapic(vcpu) && vcpu->arch.apic->pending_events;
>> +}
>> +
>> +void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
>> +{
>> + struct kvm_lapic *apic = vcpu->arch.apic;
>> +
>> + if (!kvm_vcpu_has_lapic(vcpu))
>> + return;
>> +
>> + if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events))
>> + vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>> + if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
>> + vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>> + vcpu->arch.sipi_vector = apic->sipi_vector;
>> + pr_debug("vcpu %d received sipi with vector # %x\n",
>> + vcpu->vcpu_id, vcpu->arch.sipi_vector);
>> + kvm_lapic_reset(vcpu);
>> + kvm_vcpu_reset(vcpu);
>> + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>> + }
>> +}
>> +
>> void kvm_lapic_init(void)
>> {
>> /* do not patch jump label more than once per second */
>> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
>> index 1676d34..ef3f4ef 100644
>> --- a/arch/x86/kvm/lapic.h
>> +++ b/arch/x86/kvm/lapic.h
>> @@ -5,6 +5,9 @@
>>
>> #include <linux/kvm_host.h>
>>
>> +#define KVM_APIC_INIT 0
>> +#define KVM_APIC_SIPI 1
>> +
>> struct kvm_timer {
>> struct hrtimer timer;
>> s64 period; /* unit: ns */
>> @@ -32,13 +35,17 @@ struct kvm_lapic {
>> void *regs;
>> gpa_t vapic_addr;
>> struct page *vapic_page;
>> + unsigned long pending_events;
>> + unsigned int sipi_vector;
>> };
>> int kvm_create_lapic(struct kvm_vcpu *vcpu);
>> void kvm_free_lapic(struct kvm_vcpu *vcpu);
>>
>> int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
>> +bool kvm_apic_has_events(struct kvm_vcpu *vcpu);
>> int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
>> int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
>> +void kvm_apic_accept_events(struct kvm_vcpu *vcpu);
>> void kvm_lapic_reset(struct kvm_vcpu *vcpu);
>> u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
>> void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index b891ac3..a0b8041 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -162,8 +162,6 @@ u64 __read_mostly host_xcr0;
>>
>> static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
>>
>> -static void kvm_vcpu_reset(struct kvm_vcpu *vcpu);
>> -
>> static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
>> {
>> int i;
>> @@ -5663,6 +5661,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>> bool req_immediate_exit = 0;
>>
>> if (vcpu->requests) {
>> + kvm_apic_accept_events(vcpu);
>> + if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>> + r = 1;
>> + goto out;
>> + }
>> if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
>> kvm_mmu_unload(vcpu);
>> if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
>> @@ -5847,14 +5850,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>> int r;
>> struct kvm *kvm = vcpu->kvm;
>>
>> - if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED)) {
>> - pr_debug("vcpu %d received sipi with vector # %x\n",
>> - vcpu->vcpu_id, vcpu->arch.sipi_vector);
>> - kvm_lapic_reset(vcpu);
>> - kvm_vcpu_reset(vcpu);
>> - vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>> - }
>> -
>> vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>> r = vapic_enter(vcpu);
>> if (r) {
>> @@ -5871,8 +5866,8 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>> srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>> kvm_vcpu_block(vcpu);
>> vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>> - if (kvm_check_request(KVM_REQ_UNHALT, vcpu))
>> - {
>> + if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
>> + kvm_apic_accept_events(vcpu);
>> switch(vcpu->arch.mp_state) {
>> case KVM_MP_STATE_HALTED:
>> vcpu->arch.mp_state =
>> @@ -5880,7 +5875,8 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>> case KVM_MP_STATE_RUNNABLE:
>> vcpu->arch.apf.halted = false;
>> break;
>> - case KVM_MP_STATE_SIPI_RECEIVED:
>> + case KVM_MP_STATE_INIT_RECEIVED:
>> + break;
>> default:
>> r = -EINTR;
>> break;
>> @@ -5901,7 +5897,8 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>> ++vcpu->stat.request_irq_exits;
>> }
>>
>> - kvm_check_async_pf_completion(vcpu);
>> + if (vcpu->arch.mp_state != KVM_MP_STATE_INIT_RECEIVED)
>> + kvm_check_async_pf_completion(vcpu);
>
> Can you instead do another change that affects the conditions of
> kvm_check_async_pf_completion?
>
> For example, should kvm_arch_interrupt_allowed return zero if the VCPU
> is in the INIT_RECEIVED state?
Yeah, that probably makes sense beyond async_pf.
>
>>
>> if (signal_pending(current)) {
>> r = -EINTR;
>> @@ -6015,6 +6012,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>
>> if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
>> kvm_vcpu_block(vcpu);
>> + kvm_apic_accept_events(vcpu);
>> clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
>> r = -EAGAIN;
>> goto out;
>> @@ -6171,6 +6169,7 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
>> int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
>> struct kvm_mp_state *mp_state)
>> {
>> + kvm_apic_accept_events(vcpu);
>
> Is this racy against __vcpu_run?
No, as Gleb already explained.
>
>> mp_state->mp_state = vcpu->arch.mp_state;
>> return 0;
>> }
>> @@ -6178,7 +6177,13 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
>> int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>> struct kvm_mp_state *mp_state)
>> {
>> - vcpu->arch.mp_state = mp_state->mp_state;
>> + if (mp_state->mp_state == KVM_MP_STATE_SIPI_RECEIVED) {
>> + if (!kvm_vcpu_has_lapic(vcpu))
>> + return -EINVAL;
>> + vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>> + set_bit(KVM_APIC_SIPI, &vcpu->arch.apic->pending_events);
>> + } else
>> + vcpu->arch.mp_state = mp_state->mp_state;
>
> Should INIT_RECEIVED also be invalid without an in-kernel LAPIC?
Good question. But, then, how do we reset those CPUs (BSPs only)?
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
next prev parent reply other threads:[~2013-03-12 12:58 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-12 11:44 [PATCH] KVM: x86: Convert INIT and SIPI signals into synchronously handled events Jan Kiszka
2013-03-12 12:06 ` Paolo Bonzini
2013-03-12 12:29 ` Gleb Natapov
2013-03-12 12:29 ` Paolo Bonzini
2013-03-12 12:46 ` Jan Kiszka
2013-03-12 12:49 ` Gleb Natapov
2013-03-12 12:52 ` Jan Kiszka
2013-03-12 12:53 ` Gleb Natapov
2013-03-12 13:09 ` Paolo Bonzini
2013-03-12 13:13 ` Jan Kiszka
2013-03-12 12:58 ` Jan Kiszka [this message]
2013-03-12 13:01 ` Jan Kiszka
2013-03-12 13:13 ` Paolo Bonzini
2013-03-12 13:16 ` Jan Kiszka
2013-03-12 13:25 ` Gleb Natapov
2013-03-12 13:27 ` Jan Kiszka
2013-03-12 13:37 ` Gleb Natapov
2013-03-13 7:39 ` Jan Kiszka
2013-03-13 9:03 ` Paolo Bonzini
2013-03-13 9:08 ` Jan Kiszka
2013-03-12 13:12 ` Gleb Natapov
2013-03-12 13:21 ` Jan Kiszka
2013-03-12 13:41 ` Gleb Natapov
2013-03-12 13:43 ` Paolo Bonzini
2013-03-12 13:52 ` Gleb Natapov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=513F2688.2080902@siemens.com \
--to=jan.kiszka@siemens.com \
--cc=gleb@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=mtosatti@redhat.com \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.