From: Gleb Natapov <gleb@redhat.com>
To: Jan Kiszka <jan.kiszka@web.de>
Cc: Marcelo Tosatti <mtosatti@redhat.com>, kvm <kvm@vger.kernel.org>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH v2] KVM: x86: Convert INIT and SIPI signals into synchronously handled requests
Date: Tue, 5 Mar 2013 11:37:02 +0200 [thread overview]
Message-ID: <20130305093702.GY23616@redhat.com> (raw)
In-Reply-To: <5135B6E5.10001@web.de>
On Tue, Mar 05, 2013 at 10:12:05AM +0100, Jan Kiszka wrote:
> On 2013-03-05 09:46, Gleb Natapov wrote:
> > On Tue, Mar 05, 2013 at 09:24:40AM +0100, Jan Kiszka wrote:
> >> On 2013-03-05 08:57, Gleb Natapov wrote:
> >>> On Tue, Mar 05, 2013 at 12:00:10AM +0100, Jan Kiszka wrote:
> >>>> On 2013-03-04 22:41, Jan Kiszka wrote:
> >>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
> >>>>>
> >>>>> A VCPU sending INIT or SIPI to some other VCPU races for setting the
> >>>>> remote VCPU's mp_state. When we were unlucky, KVM_MP_STATE_INIT_RECEIVED
> >>>>> was overwritten by kvm_emulate_halt and, thus, got lost.
> >>>>>
> >>>>> Fix this by raising requests on the sender side that will then be
> >>>>> handled synchronously over the target VCPU context.
> >>>>>
> >>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>>> ---
> >>>>>
> >>>>> Changes in v2:
> >>>>> - check transition to INIT_RECEIVED in vcpu_enter_guest
> >>>>> - removed return value of kvm_check_init_and_sipi - caller has to
> >>>>> check for relevant transition afterward
> >>>>> - add write barrier after setting sipi_vector
> >>>>>
> >>>>> arch/x86/kvm/lapic.c | 11 ++++++-----
> >>>>> arch/x86/kvm/x86.c | 15 +++++++++++++++
> >>>>> include/linux/kvm_host.h | 2 ++
> >>>>> 3 files changed, 23 insertions(+), 5 deletions(-)
> >>>>>
> >>>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> >>>>> index 02b51dd..7986c9f 100644
> >>>>> --- a/arch/x86/kvm/lapic.c
> >>>>> +++ b/arch/x86/kvm/lapic.c
> >>>>> @@ -731,8 +731,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
> >>>>> case APIC_DM_INIT:
> >>>>> if (!trig_mode || level) {
> >>>>> result = 1;
> >>>>> - vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> >>>>> - kvm_make_request(KVM_REQ_EVENT, vcpu);
> >>>>> + kvm_make_request(KVM_REQ_INIT, vcpu);
> >>>>> kvm_vcpu_kick(vcpu);
> >>>>> } else {
> >>>>> apic_debug("Ignoring de-assert INIT to vcpu %d\n",
> >>>>> @@ -743,11 +742,13 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
> >>>>> case APIC_DM_STARTUP:
> >>>>> apic_debug("SIPI to vcpu %d vector 0x%02x\n",
> >>>>> vcpu->vcpu_id, vector);
> >>>>> - if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> >>>>> + if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED ||
> >>>>> + test_bit(KVM_REQ_INIT, &vcpu->requests)) {
> >>>>> result = 1;
> >>>>> vcpu->arch.sipi_vector = vector;
> >>>>> - vcpu->arch.mp_state = KVM_MP_STATE_SIPI_RECEIVED;
> >>>>> - kvm_make_request(KVM_REQ_EVENT, vcpu);
> >>>>> + /* make sure sipi_vector is visible for the receiver */
> >>>>> + smp_wmb();
> >>>>> + kvm_make_request(KVM_REQ_SIPI, vcpu);
> >>>>> kvm_vcpu_kick(vcpu);
> >>>>> }
> >>>>> break;
> >>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>>>> index d0cf737..0be04b9 100644
> >>>>> --- a/arch/x86/kvm/x86.c
> >>>>> +++ b/arch/x86/kvm/x86.c
> >>>>> @@ -5641,6 +5641,15 @@ static void update_eoi_exitmap(struct kvm_vcpu *vcpu)
> >>>>> kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap);
> >>>>> }
> >>>>>
> >>>>> +static void kvm_check_init_and_sipi(struct kvm_vcpu *vcpu)
> >>>>> +{
> >>>>> + if (kvm_check_request(KVM_REQ_INIT, vcpu))
> >>>>> + vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> >>>>
> >>>> And here is a small race between clearing REQ_INIT and setting
> >>>> INIT_RECEIVED. It can make the LAPIC drop the SIPI incorrectly. Need to
> >>>> break up test and clear, doing the clear after mp_state update. Yeah...
> >>>>
> >>> You also need to call kvm_check_init_and_sipi() in
> >>> kvm_arch_vcpu_ioctl_get_mpstate(),
> >>
> >> Indeed.
> >>
> >>> which means you now have three places
> >>> where you transfer INIT/SIPI state from requests to mp_state. All the
> >>> problems arise from the fact that now you have two places where you
> >>> are storing current state.
> >>
> >> Not at all. I'm keeping the state in a single place, mp_state. I just
> >> have to make sure that I do not loose asynchronous events - what INIT
> >> and SIPI are.
> >>
> > As evident from this code:
> > + if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED ||
> > + test_bit(KVM_REQ_INIT, &vcpu->requests)) {
> > the state is in two places.
>
> That's just to protect the content of sipi_vector during delivery. We
> could drop the complete if clause if we protected that variable differently.
>
I understand why the code is here. I am saying that this is the evidence
that the state is in two places.
> >
> >>> To overcome this we can either deprecated
> >>> KVM_MP_STATE_INIT_RECEIVED/KVM_MP_STATE_SIPI_RECEIVED values for mp_state
> >>> (use it only for migration purposes) and use separate state in APIC
> >>> to hold those event, like with nmi, or why not go with Paolo's simple
> >>> cmpxchg one?
> >>
> >> We need to replace most, if not all, manipulations of mp_state with
> >> cmpxchg, verifying the state transitions there. And the request-based
> >> approach still looks cleaner to me when it comes to implementing INIT
> >> handling for nested modes. That will just trivially hook into
> >> kvm_check_init_and_sipi.
> >>
> > The mp_state changes are rare, do not see the problem replacing all
> > state changes with cmpxchg. I do not like request-based approach as
> > implemented since we keep state in two places and constantly sync it
> > back.
>
> And I like it more as it avoids spurious state changes toward INIT. That
> will happen if we misuse mp_state for event signaling, like we do so
> far, having to fix it up later again because the INIT event turned out
> to become an INIT VM-exit.
>
Yes, I agree we abuse mp_state for signaling. I do not want to abuse
->request for that too. So what about other idea about treating init/sipi
just like any other APIC event (that's what they are) and add state to
lapic to track init/sipi signaling?
--
Gleb.
next prev parent reply other threads:[~2013-03-05 9:37 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-04 21:41 [PATCH v2] KVM: x86: Convert INIT and SIPI signals into synchronously handled requests Jan Kiszka
2013-03-04 23:00 ` Jan Kiszka
2013-03-05 7:57 ` Gleb Natapov
2013-03-05 8:24 ` Jan Kiszka
2013-03-05 8:46 ` Gleb Natapov
2013-03-05 9:12 ` Jan Kiszka
2013-03-05 9:37 ` Gleb Natapov [this message]
2013-03-05 10:50 ` Paolo Bonzini
2013-03-05 13:25 ` Jan Kiszka
2013-03-05 13:33 ` Gleb Natapov
2013-03-05 13:28 ` Gleb Natapov
2013-03-05 23:16 ` Marcelo Tosatti
2013-03-06 0:01 ` Marcelo Tosatti
2013-03-06 0:06 ` Marcelo Tosatti
2013-03-06 6:12 ` Paolo Bonzini
2013-03-06 7:57 ` Jan Kiszka
2013-03-06 21:30 ` Marcelo Tosatti
2013-03-06 21:39 ` Jan Kiszka
2013-03-06 21:50 ` Marcelo Tosatti
2013-03-06 21:58 ` Jan Kiszka
2013-03-06 21:19 ` Marcelo Tosatti
2013-03-06 22:43 ` Paolo Bonzini
2013-03-06 23:04 ` Marcelo Tosatti
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130305093702.GY23616@redhat.com \
--to=gleb@redhat.com \
--cc=jan.kiszka@web.de \
--cc=kvm@vger.kernel.org \
--cc=mtosatti@redhat.com \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.