From: Paolo Bonzini <pbonzini@redhat.com>
To: Gleb Natapov <gleb@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
mtosatti@redhat.com, jan.kiszka@siemens.com
Subject: Re: [PATCH] x86: kvm: reset the bootstrap processor when it gets an INIT
Date: Mon, 11 Mar 2013 18:39:44 +0100
Message-ID: <513E16E0.2050703@redhat.com>
In-Reply-To: <20130311172034.GR31619@redhat.com>
On 11/03/2013 18:20, Gleb Natapov wrote:
> On Mon, Mar 11, 2013 at 03:28:03PM +0100, Paolo Bonzini wrote:
>> On 11/03/2013 14:54, Gleb Natapov wrote:
>>>> Setting the mp_state to INIT_RECEIVED is that interface, and it already
>>>> works, for APs at least. This patch extends it to work for the BSP as well.
>>>
>>> It does not for AP either. If the AP has VMX on, mp_state should not be set to
>>> INIT_RECEIVED. mp_state is a state, as you can see from its name, and we
>>> already had a discussion on the generic device API about importance of
>>> separating sending commands from setting state. There is a difference
>>> between setting mp_state during migration and signaling INIT#.
>>
>> What does migration have to do with this?
>
> get|set_mpstate is used by migration. Actually, this is the primary reason
> for this interface's existence.
Does it have to be the only one?
>>>> In the corresponding userspace patch, I don't need to touch the CPU
>>>> state at all. I can just signal the kernel. If I touch the CPU, I'll
>>>> break the nested case, no matter how it is implemented. So far, the
>>>> userspace did not have to worry about nested, and that's something that
>>>> should be kept that way.
>>> We are discussing two different things here. I'll try to separate them.
>>> 1. BSP is broken WRT #INIT
>>> 2. nested is broken WRT #INIT
>>>
>>> You are fixing 1 with your patches; for that I proposed a much easier
>>> solution (at least from the kernel's point of view): if it is the BSP, reset it in
>>> userspace and make it runnable. Nested virt is still broken, but this is
>>> not what you are fixing.
>>
>> It's not what I'm fixing, but I don't want to make the fix for nested
>
> What are you fixing then?
Nested virt is not what I am fixing, but I'm trying to keep an eye on
that (and the other INIT race) while doing these patches.
>> virt unnecessarily more complicated. Nested virt needs to know about
>> INIT and SIPI; redefining the meaning of INIT_RECEIVED and SIPI_RECEIVED
>> makes it more complicated to reflect these events to L1.
>>
>>> For 2 a much more involved fix is needed. Jan is fixing it, and it will require
>>> signaling INIT# from userspace by other means than mp_state, because
>>> signaling INIT# does not automatically mean that mp_state changes to
>>> INIT_RECEIVED.
>>
>> In your interpretation of INIT_RECEIVED, no. In mine, yes...
>
> Your code shows otherwise. With your patch, setting mp_state to
> INIT_RECEIVED makes vcpu non tunable. This is incorrect if INIT_RECEIVED
> is the "INIT# is triggered" interface.
What do you mean by "non tunable"? In non-nested mode, the VCPU will
reset immediately, as soon as it is re-entered. In nested mode, the
VCPU will eat the INIT_RECEIVED and turn it into a vmexit.
At least according to AMD's docs, the VMM has to reassert INIT if it
wants the processor to actually process it [15.20.8 INIT support].
Intel's docs do not say so explicitly, but they do not say the opposite
either. It seems to be the only interpretation that makes sense.
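To make the two cases concrete, here is a minimal sketch in plain C of the
semantics described above. It is a toy model, not KVM code: struct
vcpu_model, process_pending_init() and the enum names are all hypothetical,
and it assumes the VCPU is the BSP, which becomes runnable again right
after the reset.

```c
#include <stdbool.h>

/* Hypothetical, simplified mirror of the mp_state values under discussion. */
enum model_mp_state {
    MODEL_RUNNABLE,
    MODEL_INIT_RECEIVED,   /* used here as "an INIT has been signaled" */
};

/* What happened to the pending INIT when the VCPU was re-entered. */
enum init_outcome {
    INIT_NOTHING_PENDING,
    INIT_CAUSED_RESET,          /* non-nested: the VCPU is reset */
    INIT_CAUSED_NESTED_VMEXIT,  /* nested: the INIT is reflected to L1 */
};

struct vcpu_model {
    enum model_mp_state mp_state;
    bool in_nested_guest;  /* true while the VCPU is running L2 */
};

/* Models what would happen as soon as the VCPU is re-entered. */
static enum init_outcome process_pending_init(struct vcpu_model *v)
{
    if (v->mp_state != MODEL_INIT_RECEIVED)
        return INIT_NOTHING_PENDING;

    /* Either way the signal is consumed and the VCPU runs again
     * (assumption: BSP only, so no wait-for-SIPI phase is modeled). */
    v->mp_state = MODEL_RUNNABLE;

    if (v->in_nested_guest) {
        /* The INIT is "eaten" and turned into a vmexit to L1. */
        v->in_nested_guest = false;
        return INIT_CAUSED_NESTED_VMEXIT;
    }
    return INIT_CAUSED_RESET;
}
```

In both branches the caller only sets INIT_RECEIVED; what the INIT turns
into is decided on the kernel side, so userspace never has to know whether
the guest is nested.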
>>>> If we move away from the INIT_RECEIVED and SIPI_RECEIVED states for
>>>> in-kernel APIC -> VCPU communication, then the KVM_SET_MP_STATE ioctl
>>>> will have to convert them to the right bits in the requests field or in
>>>> the APIC state. But I'm starting to see less benefit from moving away
>>>> from mp_state.
>>>>
>>> We are not moving away from mp_state, we are moving away from using
>>> mp_state for signaling
>>
>> That's what I meant; sorry for the unclear abbreviation.
>
> Then we disagree.
We do. Let's see _where_ exactly we disagree.
>>> because with nested virt INIT does not always
>>> change mp_state
>>
>> Why not?
>
> Because mp_state is the current state the vcpu is in. It can be
> uninitialized, runnable, halted or wait for sipi. SDM says that
> if nested virt is enabled vcpu does not enter wait for sipi state
> on INIT#.
Yes, but it still has to do something (a vmexit) and go back to RUNNING.
So it needs signaling from userspace to the kernel.
>> Which is why it's good to have the reset done in kernel space,
>> not in user space.
>
> Without nested virt it does not really matter, and if it does not
> really matter you do not add code to the kernel just because it is good.
> With nested virt INIT# processing needs to go to the kernel. In some
> cases INIT will cause reset, but you do not "do reset in kernel space",
> you do "INIT# handling in kernel space".
We agree on this. What I add is: let's define the API so that it is
nested-friendly. This means having a signaling mechanism for userspace.
I think you do not want mp_state to be this signaling mechanism. Why
not? Can an existing ioctl be the alternative or do we need to invent a
new one?
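
For reference, the existing interface under discussion looks roughly like
this from userspace. This is only a sketch: the helper name
signal_init_via_mp_state() is made up, vcpu_fd is assumed to be a VCPU file
descriptor obtained with KVM_CREATE_VCPU, and whether this ioctl should
double as the INIT signal is exactly the open question.

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: ask the kernel to put the VCPU behind vcpu_fd
 * into INIT_RECEIVED using the existing KVM_SET_MP_STATE ioctl.
 * Returns 0 on success, -1 on error with errno set by ioctl().
 */
static int signal_init_via_mp_state(int vcpu_fd)
{
    struct kvm_mp_state mp = {
        .mp_state = KVM_MP_STATE_INIT_RECEIVED,
    };

    return ioctl(vcpu_fd, KVM_SET_MP_STATE, &mp);
}
```

Note that userspace does not touch any register state here; it only hands
the event to the kernel.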
Paolo